US20200005901A1 - Cancer classifier models, machine learning systems and methods of use - Google Patents

Info

Publication number
US20200005901A1
Authority
US
United States
Prior art keywords
cancer
patient
biomarkers
classifier model
panel
Legal status
Pending
Application number
US16/458,589
Other languages
English (en)
Inventor
Jonathan Cohen
Victoria Doseeva
Peichang SHI
Current Assignee
20/20 GeneSystems Inc
Original Assignee
20/20 GeneSystems Inc
Application filed by 20/20 GeneSystems Inc
Priority to US16/458,589 (published as US20200005901A1)
Publication of US20200005901A1
Priority to US18/213,882 (published as US20240040068A1)
Status: Pending (current legal status)

Classifications

    • G16B40/20 Supervised data analysis
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N3/045 Combinations of networks
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • This application pertains generally to classifier models generated by a machine learning system, trained with longitudinal data, for identifying patients with an increased risk of developing cancer, and the type of cancer, especially patients who are otherwise asymptomatic or only vaguely symptomatic.
  • Imaging and diagnostic tests have been introduced into medical practice in an attempt to help physicians detect cancer early. These include various imaging modalities, such as mammography, as well as diagnostic tests that identify cancer-specific “biomarkers” in the blood and other bodily fluids, such as the prostate specific antigen (PSA) test.
  • PSA: prostate specific antigen
  • The value of many of these tests is often questioned, particularly with regard to whether the costs and risks associated with false positives, false negatives, etc. outweigh the potential benefits in terms of actual lives saved.
  • Cancer detection poses significant technical challenges compared with detecting viral or bacterial infections, because cancer cells, unlike viruses and bacteria, are biologically similar to, and hard to distinguish from, normal healthy cells. For this reason, tests used for the early detection of cancer often produce more false positives and false negatives than comparable tests for viral or bacterial infections, or than tests that measure genetic, enzymatic, or hormonal abnormalities. This often confuses healthcare practitioners and their patients, leading in some cases to unnecessary, expensive, and invasive follow-up testing and in other cases to follow-up testing being disregarded entirely, so that cancers are detected too late for useful intervention.
  • Physicians and patients welcome tests that yield a binary decision or result, e.g., the patient is either positive or negative for a condition, as in over-the-counter pregnancy test kits, which present an immunoassay result as a plus sign or a minus sign to indicate pregnancy or not.
  • However, most cancer tests cannot reach the very high accuracy that a reliable binary output requires, so for these tests such binary outputs can be highly misleading or inaccurate.
  • Machine learning systems comprising diagnostic decision-support systems may use clinical decision formulas, rules, trees, or other processes for assisting a physician with making a diagnosis.
  • Although decision-making systems have been developed, such systems are not widely used in medical practice because they suffer from limitations that prevent them from being integrated into the day-to-day operations of health organizations.
  • Decision-making systems may provide an unmanageable volume of data, rely on analysis that is only marginally significant, and correlate poorly with complex multimorbidity (Greenhalgh, T. Evidence based medicine: a movement in crisis? BMJ (2014) 348:g3725).
  • Patient data may be scattered across different computer systems in both structured and unstructured form.
  • Such systems are also difficult to interact with (Berner, 2006; Shortliffe, 2006).
  • Entering patient data is difficult, the list of diagnostic suggestions may be too long, and the reasoning behind diagnostic suggestions is not always transparent. Further, the systems are not focused enough on next actions and do not help the clinician figure out what to do to help the patient (Shortliffe, 2006).
  • Disclosed herein are classifier models, machine learning systems, computer-implemented systems, and methods thereof.
  • A method in a computer-implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement one or more classifier models to predict an increased risk of having or developing cancer for an asymptomatic patient, comprises obtaining measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample; obtaining clinical parameters corresponding to the patient including at least age and gender; classifying the patient into a risk category of having or developing cancer using a first classifier model, wherein the first classifier model is generated by a machine learning system using first training data that comprises values of a panel of at least two biomarkers, age, and a diagnostic indicator, for a population of patients; and, wherein the first classifier model classifies the patient in an increased risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the
  • In certain embodiments, the method further comprises iteratively regenerating the first classifier model with the machine learning system by training the first classifier model with new training data to improve its performance.
  • When the classifier model is iteratively regenerated, the method further comprises obtaining one or more test results from the diagnostic testing that confirm or deny the presence of cancer in the patient; incorporating the one or more test results into the first training data for further training of the first classifier model of the machine learning system; and generating an improved first classifier model by the machine learning system.
  • In certain embodiments, the training data used to train the classifier model generated by the machine learning system comprises a group of data from a group of patients with no cancer diagnosis three or more months after providing a sample. In certain other embodiments, the training data comprises a group of data from a group of patients with a cancer diagnosis three or more months after providing a sample.
  • A method in a computer-implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement one or more classifier models to predict an organ system-based malignancy for a patient with an increased risk of having or developing cancer, comprises:
  • wherein the cancer classifier model is generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients;
  • wherein the cancer classifier model assigns the organ system class membership using input variables of age and the measured values of the panel of biomarkers from the patient;
  • A machine learning system comprising at least one processor for predicting an organ system-based malignancy for a patient with an increased risk of having or developing cancer, wherein the processor is configured to:
  • a) obtain measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample;
  • e) provide a notification to a user for diagnostic testing of the patient.
  • FIGS. 1A and 1B show Receiver Operating Characteristic (ROC) curves for the best-performing machine learning models, Ridge Logistic Regression (AUC 0.875, Youden Index 0.628) (FIG. 1A) and an SVM model (AUC 0.816, Youden Index 0.631) (FIG. 1B), for a male subject's likelihood of developing cancer within about 2 years of the testing date. See Example 1 and Table 4.
  • ROC: Receiver Operating Characteristic
  • kNN: k-nearest neighbors pattern recognition algorithm
  • FIG. 3 shows a table of input variables (biomarker measurements and age) for the classifier model and the classification of each patient into a risk category based on the output (probability value). See Example 3.
  • FIG. 4 shows a workflow for performing methods to predict an increased risk of having or developing cancer for an asymptomatic patient using the present classifier models.
  • FIGS. 5A and 5B show that the present male classifier model significantly improves sensitivity and specificity (FIG. 5A) compared with measurement of individual biomarkers (“any marker high” methods) for predicting cancer, with a corresponding area under the curve (AUC) value of 0.87 (FIG. 5B). See Example 4.
  • FIGS. 6A and 6B show that the present male classifier model was able to distinguish cancers from noncancers with 82% sensitivity and 81% specificity at a threshold value of 0.5.
  • FIGS. 7A and 7B show that the present female classifier model is significantly better at predicting cancer development within one year than measurement of a panel of individual biomarkers from the same subjects (FIG. 7A), with a corresponding AUC value of 0.67 (FIG. 7B).
  • The present female classifier model is an improvement over the individual-biomarker “single threshold” method: its sensitivity represents a 4-fold increase over the single threshold method.
  • The present female classifier model identifies 4× more cancers in female patients than the conventional “any marker high” method.
  • FIGS. 8A and 8B show that the present female classifier model was able to distinguish cancers from noncancers with 50% sensitivity and 74% specificity at a threshold value of 0.5.
  • Disclosed herein are classifier models and their use with asymptomatic patients for the early prediction of tumors and/or occult cancer.
  • The classifier models were generated by a machine learning system using training data that comprises values of a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
  • The present classifier models were trained with biomarkers that were measured at least 3 months, if not longer, before patients received a diagnosis.
  • The training data comprises a group of data from a group of patients with no cancer diagnosis three or more months after providing a sample.
  • The training data comprises a group of data from a group of patients with a cancer diagnosis three or more months after providing a sample. See Example 1A.
  • The classifier models are “trained” by machine learning systems, which build a model from inputs.
  • Those inputs may be longitudinal data, wherein a known diagnosis of cancer (including matched controls) is determined months, if not years, after data from the patients' measured biomarkers and clinical factors are collected. See Examples 1A and 2 for training of the present classifier models using longitudinal cancer patient data.
  • The classifier model has a performance, as measured by a Receiver Operating Characteristic (ROC) curve, with a sensitivity value of at least 0.8 and a specificity value of at least 0.8.
  • A first classifier model, generated by a machine learning system, classifies a patient into a risk category of having or developing cancer.
  • Using input variables of age and the measured values of a panel of biomarkers from the patient, the classifier model classifies the patient in an increased risk category when an output of the classifier model is above a threshold value.
  • The classifier model classifies a patient in a low risk category, using input variables of age and the measured values of a panel of biomarkers from the patient, when an output of the classifier model is below a threshold value.
  • The term “increased risk” refers to an increase in the likelihood of the presence, or development, of the cancer compared with the known prevalence of that particular cancer across the population cohort. See Example 3.
  • A second classifier model, generated by a machine learning system, classifies a patient into an organ system or specific cancer class membership.
  • The second classifier model assigns the organ system or specific cancer class membership using input variables of age and the measured values of the panel of biomarkers from the patient.
  • A patient is classified into an organ system or specific cancer class membership using a second classifier model when the patient was classified into an increased risk category by the first classifier model, wherein the second classifier model is generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
  • In certain embodiments, the classifier model is static, and its use is implemented by a computer-implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement the classifier model.
  • In other embodiments, a machine learning system iteratively regenerates the classifier model by training it with new training data to improve its performance.
  • The present methods, using a first classifier model in a computer-implemented system comprising at least one processor and at least one memory, the at least one memory comprising instructions executed by the at least one processor to cause the at least one processor to implement one or more classifier models to predict an increased risk of having or developing cancer for an asymptomatic patient, comprise obtaining measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample, obtaining clinical parameters corresponding to the patient including at least age and gender, classifying the patient into a risk category of having or developing cancer using a first classifier model, wherein the first classifier model is generated by a machine learning system using first training data that comprises values of a panel of at least two biomarkers, age, and a diagnostic indicator, for a population of patients; and, wherein the first classifier model classifies the patient in an increased risk category using input variables of age and the measured values of a panel of bio
  • The first classifier model yields a numerical risk score for each patient tested, which physicians can use to further inform screening procedures to better predict and diagnose early stage cancer in asymptomatic patients. Patients classified into an increased risk category may be further classified into a class membership using the second classifier model. That class membership may be an organ system malignancy or a specific cancer type. Also, as disclosed in more detail herein, the machine learning system is adapted to receive additional data as the system is used in a real-world clinical setting and to recalculate and improve its performance, so that the classifier model becomes “smarter” the more it is used.
  • The term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.
  • The term “about” is used to refer to an amount that is approximately, nearly, almost, or in the vicinity of being equal to or is equal to a stated amount, e.g., the stated amount plus or minus about 5%, about 4%, about 3%, about 2% or about 1%.
  • “Asymptomatic” refers to a patient or human subject who has not previously been diagnosed with the cancer whose risk is now being quantified and categorized.
  • For example, human subjects who show signs such as coughing, fatigue, or pain, but who have not previously been diagnosed with lung cancer and are now undergoing screening to categorize their increased risk for the presence of cancer, are still considered “asymptomatic” for the present methods.
  • The term “AUC” refers to the Area Under the Curve, for example, of a ROC curve. That value can assess the merit or performance of a test on a given sample population, with a value of 1 representing a perfect test and values ranging down to 0.5, which means the test is providing a random response in classifying test subjects. Since the range of the AUC is only 0.5 to 1.0, a small change in AUC has greater significance than a similar change in a metric that ranges from 0 to 1 or 0 to 100%. When the % change in the AUC is given, it is calculated based on the fact that the full range of the metric is 0.5 to 1.0.
  • A variety of statistics packages, such as JMP™ or Analyse-it™, can calculate the AUC for a ROC curve.
  • AUC can be used to compare the accuracy of the classification model across the complete data range. Classification models with greater AUC have, by definition, a greater capacity to classify unknowns correctly between the two groups of interest (disease and no disease).
  • The terms “biological sample” and “test sample” refer to all biological fluids and excretions isolated from any given subject.
  • Such samples include, but are not limited to, blood, blood serum, blood plasma, urine, tears, saliva, sweat, biopsy, ascites, cerebrospinal fluid, milk, lymph, bronchial and other lavage samples, or tissue extract samples.
  • Blood, serum, plasma and bronchial lavage or other liquid samples are convenient test samples for use in the context of the present methods.
  • A “biomarker measure” is information relating to a biomarker that is useful for characterizing the presence or absence of a disease. Such information may include measured values that are, or are proportional to, concentration, or that otherwise provide qualitative or quantitative indications of expression of the biomarker in tissues or biologic fluids.
  • The terms “cancer” and “cancerous” refer to or describe the physiological condition in mammals that is typically characterized by unregulated cell growth.
  • Examples of cancer include, but are not limited to, lung cancer, breast cancer, colon cancer, prostate cancer, hepatocellular cancer, gastric cancer, pancreatic cancer, cervical cancer, ovarian cancer, liver cancer, bladder cancer, cancer of the urinary tract, thyroid cancer, renal cancer, carcinoma, melanoma, and brain cancer.
  • The term “cohort” or “cohort population” refers to a group or segment of human subjects with shared factors or influences, such as age, family history, cancer risk factors, environmental influences, medical histories, etc.
  • A “cohort” may refer to a group of human subjects with shared cancer risk factors; this is also referred to herein as a “disease cohort”.
  • A “cohort” may also refer to a normal population group matched, for example by age, to the cancer risk cohort; this is also referred to herein as a “normal cohort”.
  • A “same cohort” refers to a group of human subjects having the same shared cancer risk factors as the individual undergoing assessment for a risk of having a disease such as cancer.
  • “Machine learning” refers to algorithms that give a computer the ability to learn without being explicitly programmed, including algorithms that learn from and make predictions about data.
  • Machine learning algorithms include, but are not limited to, decision tree learning, artificial neural networks (ANN) (also referred to herein as a “neural net”), deep learning neural networks, support vector machines, rule-based machine learning, random forest, logistic regression, pattern recognition algorithms, etc.
  • ANN: artificial neural network, also referred to herein as a “neural net”
  • Linear regression or logistic regression can be used as part of a machine learning process.
  • Using linear regression or another algorithm as part of a machine learning process is distinct from performing a statistical analysis, such as regression, with a spreadsheet program such as Excel.
  • The machine learning process has the ability to continually learn and adjust the classifier model as new data become available and does not rely on explicit or rules-based programming.
  • Statistical modeling relies on finding relationships between variables (e.g., mathematical equations) to predict an outcome.
  • Medical history refers to any type of medical information associated with a patient.
  • The medical history may be stored in an electronic medical records database.
  • Medical history may include clinical data (e.g., imaging modalities, blood work, biomarkers, cancerous samples and control samples, labs, etc.), clinical notes, symptoms, severity of symptoms, number of years smoking, family history of a disease, history of illness, treatment and outcomes, an ICD code indicating a particular diagnosis, history of other diseases, radiology reports, imaging studies, reports, medical histories, genetic risk factors identified from genetic testing, genetic mutations, etc.
  • The term “increased risk” refers to an increase in the risk level, for a human subject after analysis by the classifier model, for the presence, or development, of a cancer relative to a population's known prevalence of that cancer before testing.
  • For example, a human subject's risk for cancer before biomarker testing and/or data analysis may be 1% (based on the understood prevalence of cancer in the population), but after analysis using the classifier model the patient's risk for the presence of cancer may be 8%, alternatively reported as an 8-fold increase compared with the cohort.
  • How the machine learning system calculates the 8% risk of having the cancer, and the 8-fold increased risk relative to the population or cohort population, is described in more detail herein.
  • “Markers” refer to molecules that can be evaluated in a sample and are associated with a physical condition.
  • Markers include expressed genes or their products (e.g., proteins), or autoantibodies to those proteins, that can be detected in human samples, such as blood, serum, solid tissue, and the like, and that are associated with a physical or disease condition.
  • Biomarkers include, but are not limited to, biomolecules comprising nucleotides, amino acids, sugars, fatty acids, steroids, metabolites, polypeptides, proteins (such as, but not limited to, antigens and antibodies), carbohydrates, lipids, hormones, antibodies, regions of interest which serve as surrogates for biological molecules, combinations thereof (e.g., glycoproteins, ribonucleoproteins, lipoproteins) and any complexes involving any such biomolecules, such as, but not limited to, a complex formed between an antigen and an autoantibody that binds to an available epitope on said antigen.
  • A biomarker can also refer to a portion of a polypeptide (parent) sequence that comprises at least 5 consecutive amino acid residues, preferably at least 10 consecutive amino acid residues, more preferably at least 15 consecutive amino acid residues, and retains a biological activity and/or some functional characteristics of the parent polypeptide, e.g. antigenicity or structural domain characteristics.
  • The present markers refer both to tumor antigens present on or in cancerous cells and to those that have been shed from the cancerous cells into bodily fluids such as blood or serum.
  • The present markers, as used herein, also refer to autoantibodies produced by the body against those tumor antigens.
  • A “marker” as used herein refers to both tumor antigens and autoantibodies that are capable of being detected in the serum of a human subject. It is also understood in the present methods that the markers in a panel may each contribute equally to the classifier model, or certain biomarkers may be weighted so that the markers in a panel contribute different weights or amounts to the classifier model.
  • Biomarkers may include any biological substance indicative of the presence of cancer, including but not limited to genetic, epigenetic, proteomic, glycomic or imaging biomarkers. Biomarkers include molecules secreted by tumors or cancer, including cell-free DNA, mRNA, and protein-based products (tumor markers or antigens), etc.
  • The pathology of (tumor) cancer includes all phenomena that compromise the well-being of the patient. This includes, without limitation, abnormal or uncontrollable cell growth, metastasis, interference with the normal functioning of neighboring cells, release of cytokines or other secretory products at abnormal levels, suppression or aggravation of inflammatory or immunological response, neoplasia, premalignancy, malignancy, invasion of surrounding or distant tissues or organs, such as lymph nodes, etc.
  • A “physiological sample” includes samples from biological fluids and tissues.
  • Biological fluids include whole blood, blood plasma, blood serum, sputum, urine, sweat, lymph, and alveolar lavage.
  • Tissue samples include biopsies from solid lung tissue or other solid tissues, lymph node biopsy tissues, biopsies of metastatic foci. Methods of obtaining physiological samples are well known.
  • a positive predictive score As used herein, the term “a positive predictive score,” “a positive predictive value,” or “PPV” refers to the likelihood that a score within a certain range on a biomarker test is a true positive result. It is defined as the number of true positive results divided by the number of total positive results. True positive results can be calculated by multiplying the test sensitivity times the prevalence of disease in the test population. False positives can be calculated by multiplying ( 1 minus the specificity) times (1 ⁇ the prevalence of disease in the test population). Total positive results equal True Positives plus False Positives.
  • ROC curve: Receiver Operating Characteristic curve
  • ROC curves can be generated for a single feature as well as for other single outputs, for example, a combination of two or more features that are combined (such as, added, subtracted, multiplied, weighted, etc.) to provide a single combined value which can be plotted in a ROC curve.
  • the ROC curve is a plot of the true positive rate (sensitivity) of a test against the false positive rate (1 − specificity) of the test.
  • ROC curves provide another means to quickly screen a data set.
  • performance of the present classifier models is determined using computed ROC curves with sensitivity and specificity values. The performance is used to compare models and, importantly, to compare models built with different variables, so as to select the classifier model with the highest accuracy in predicting whether a patient has or will develop cancer.
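The ROC construction described above can be sketched without any libraries. This is a minimal illustration under the assumption that each patient has a single score (one marker value, or a combined value such as a classifier output) and a binary diagnostic label; the function names and toy data are hypothetical. Each distinct score is tried as a threshold, giving one (1 − specificity, sensitivity) point, and the AUC is taken by the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return ROC points as (1 - specificity, sensitivity), sweeping the threshold from high to low.

    A patient is called positive when score >= threshold; labels are 1 (cancer) or 0 (no cancer).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    points = [(0.0, 0.0)]  # threshold above every score: nobody is called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]  # e.g. classifier outputs or a combined marker value
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(roc_points(scores, labels)), 3))  # 0.889
```

In practice a library routine such as scikit-learn's `roc_curve` and `roc_auc_score` would normally be used; the hand-rolled version only makes the threshold sweep explicit.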
  • classifier models for classifying asymptomatic patients into a risk category for having or developing cancer and/or classifying a patient with an increased risk of having or developing cancer into an organ system-based malignancy class membership and/or into a specific cancer class membership.
  • the machine learning system disclosed herein generated the present classifier models using longitudinal data from a cohort of over 12,000 asymptomatic male patients and over 15,000 asymptomatic female patients. See Examples 1A and 2.
  • biomarkers were measured, and follow-up of the patients was performed to provide a diagnostic indicator in the future (e.g. no cancer development, or diagnosis of a specific cancer).
  • Using biomarkers obtained months, or even years, before cancer was detected provided a powerful tool to train the classifier models resulting in highly accurate classifier models as measured by ROC curve analysis.
  • training data comprises data from a group of patients with no cancer diagnosis three or more months after providing a sample.
  • training data comprises data from a group of patients with a cancer diagnosis three or more months after providing a sample.
  • the cohort of asymptomatic female patients was used to train a classifier model to be used with female patients and the cohort of asymptomatic male patients was used to train a classifier model to be used with male patients.
  • the gender of the patient is used to select the classifier model.
  • training data comprises a greater number of patients without cancer than with cancer, wherein training of the classifier models comprises reprocessing the training data by using a stratified sampling technique to improve selection of negative samples.
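The source does not spell out the stratification variable, so the following is only a sketch under stated assumptions: negatives are grouped by a caller-chosen stratum (here, hypothetically, age decade), and a fixed number of negatives is drawn in proportion to each stratum's size, so that the reduced negative set keeps the cohort's age mix while all positives are retained. The function name, the age-decade stratum and the 10:1 negative-to-positive ratio are illustrative assumptions.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, TypeVar

T = TypeVar("T")


def stratified_negative_sample(
    records: List[T],
    is_positive: Callable[[T], bool],
    stratum: Callable[[T], Hashable],
    n_negatives: int,
    seed: int = 0,
) -> List[T]:
    """Keep every positive record and draw about n_negatives negatives,
    allocated across strata in proportion to each stratum's share of all negatives."""
    rng = random.Random(seed)
    positives = [r for r in records if is_positive(r)]
    strata = defaultdict(list)
    for r in records:
        if not is_positive(r):
            strata[stratum(r)].append(r)
    total_neg = sum(len(group) for group in strata.values())
    n_negatives = min(n_negatives, total_neg)
    sampled: List[T] = []
    for key in sorted(strata, key=repr):  # fixed order so a given seed is reproducible
        group = strata[key]
        k = round(n_negatives * len(group) / total_neg)  # proportional allocation (rounded)
        sampled.extend(rng.sample(group, min(k, len(group))))
    return positives + sampled


# Toy cohort: 990 cancer-free patients in two age bands, 10 later diagnosed with cancer.
cohort = ([{"age": 45, "cancer": False} for _ in range(495)]
          + [{"age": 55, "cancer": False} for _ in range(495)]
          + [{"age": 60, "cancer": True} for _ in range(10)])
train = stratified_negative_sample(cohort, lambda p: p["cancer"], lambda p: p["age"] // 10, n_negatives=100)
print(len(train))  # 110
```

Because allocations are rounded per stratum, the number of sampled negatives can differ slightly from `n_negatives` when strata sizes do not divide evenly.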
  • the classifier model achieves, on Receiver Operating Characteristic (ROC) curve analysis, a sensitivity value of at least 0.8 and a specificity value of at least 0.8.
  • the machine learning system generates a classifier model that may be static.
  • the classifier model is trained and then implemented with a computer-implemented system wherein patient data (e.g., biomarker measurements and age) are input and the classifier model provides an output that is used to classify patients.
  • the classifier models are continuously, or routinely, being updated and improved wherein the input values, output values, along with a diagnostic indicator from patients are used to further train the classifier models.
  • the classifier model has improved performance on Receiver Operating Characteristic (ROC) curve analysis, with a sensitivity value of at least 0.85 and a specificity value of at least 0.8.
  • the classifier model is further trained and improved by the machine learning system comprising (1) obtaining one or more test results from the diagnostic testing which confirm or deny the presence of cancer in the patient, (2) incorporating the one or more test results into the training data for further training of the classifier model of the machine learning system; and (3) generating an improved classifier model by the machine learning system.
  • diagnostic testing comprises radiography screening or tissue biopsy.
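The three-step update loop above can be sketched as a small wrapper around whatever learning algorithm is used. This is a hypothetical structure, not the system's actual implementation: `FeedbackTrainer`, `Example` and the placeholder `prevalence_model` learner are illustrative names, and the trivial learner merely stands in for a real classifier.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Model = Callable[[Sequence[float]], float]  # maps an input vector to a predicted probability


@dataclass
class Example:
    features: List[float]  # e.g. age and measured biomarker values
    label: int             # diagnostic indicator: 1 = cancer confirmed, 0 = cancer ruled out


@dataclass
class FeedbackTrainer:
    train_fn: Callable[[List[Example]], Model]  # any learning algorithm (hypothetical hook)
    training_data: List[Example] = field(default_factory=list)
    model: Optional[Model] = None

    def fit(self) -> Model:
        self.model = self.train_fn(self.training_data)
        return self.model

    def add_confirmed_result(self, features: Sequence[float], cancer_confirmed: bool) -> None:
        """Steps (1) and (2): fold a biopsy or radiography result back into the training data."""
        self.training_data.append(Example(list(features), int(cancer_confirmed)))

    def retrain(self) -> Model:
        """Step (3): generate an updated model from the enlarged training set."""
        return self.fit()


def prevalence_model(data: List[Example]) -> Model:
    """Placeholder learner: predicts the training-set cancer rate for every patient."""
    rate = sum(e.label for e in data) / len(data)
    return lambda features: rate


trainer = FeedbackTrainer(prevalence_model, [Example([60.0], 0), Example([65.0], 1)])
print(trainer.fit()([70.0]))  # 0.5
trainer.add_confirmed_result([72.0], cancer_confirmed=True)
print(round(trainer.retrain()([70.0]), 3))  # 0.667
```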
  • this first classifier model is generated by a machine learning system using training data that comprises values of a panel of at least two biomarkers, age, and a diagnostic indicator, for a population of patients.
  • the first classifier model was trained using data from only a male cohort or a female cohort.
  • the training data comprises values of a panel of at least six biomarkers.
  • the training data comprises values from a panel of biomarkers selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
  • a first classifier model is generated by a machine learning system using training data that comprises a male cohort only, values of a panel of six biomarkers comprising AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC, and age.
  • a first classifier model is generated by a machine learning system using training data that comprises a female cohort only, values of a panel of seven biomarkers comprising AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1 and SCC, and age.
  • the first classifier model classifies the patient in an increased risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the first classifier model is above a threshold. In embodiments, the first classifier model classifies the patient in a low (e.g., no increased risk) risk category using input variables of age and the measured values of a panel of biomarkers from the patient when an output of the first classifier model is below a threshold.
  • the output is a probability value, wherein the threshold is set to separate patients into a low risk category (patients whose risk is no greater than that of a population reflective of the training data) and an increased risk category (patients with an increased risk of having or developing cancer as compared to a population reflective of the training data). See Example 3 and FIG. 3.
  • the increased risk category may be further subdivided, such as a moderate risk category and a high-risk category.
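A minimal sketch of the thresholding step, assuming the first classifier's output is a probability between 0 and 1. The threshold values shown (0.01 and 0.05) and the function name are hypothetical; the document states only that the threshold is set to separate low-risk from increased-risk patients and that the increased-risk category may be further subdivided. Treating an output exactly at the threshold as low risk follows the "at or below is normal" convention used for risk scores in this section.

```python
from typing import Optional


def risk_category(probability: float, threshold: float, high_threshold: Optional[float] = None) -> str:
    """Map a first-classifier output (a predicted probability) to a risk category.

    Outputs at or below `threshold` are "low"; outputs above it are "increased",
    optionally subdivided into "moderate" and "high" at `high_threshold`.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if high_threshold is not None and high_threshold <= threshold:
        raise ValueError("high_threshold must exceed threshold")
    if probability <= threshold:
        return "low"
    if high_threshold is None:
        return "increased"
    return "high" if probability > high_threshold else "moderate"


# Hypothetical cut-offs; in the described system thresholds are set empirically from longitudinal data.
for p in (0.004, 0.02, 0.08):
    print(p, risk_category(p, threshold=0.01, high_threshold=0.05))
```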
  • those patients classified into an increased risk category may be assigned a risk score, such as a percent, e.g., X of 100, or multiplier number.
  • a patient may be assigned a 2 to 10% risk score (of having or developing cancer) wherein the incidence of cancer in the population used to train the classifier model is about 1%.
  • those percentage risk scores may be presented as X of 100, e.g. 3 out of 100 wherein a patient with that score has an approximately 3 out of 100 risk of developing cancer within one year from when the biomarkers were measured.
  • a threshold cut-off may be set such that a risk score at or below it is considered normal and a risk score above it is considered an increased risk.
  • the threshold cut-off value may be 1 out of 100, corresponding to a “normal” risk of having cancer of 1% in a heterogeneous population.
  • the patient may be assigned a multiplier number.
  • the risk score is not an output value, but a value assigned to a risk category, such as an increased risk category, wherein the output value is used to classify a patient into the risk category.
  • an output value is a predicted probability value that may range from 0 to 1, wherein that value is used to classify a patient into a risk category. The risk score assigned to a risk category is then calculated by comparing the predicted probability assigned to a risk category to the prevalence of cancer in a population. See Example 3.
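The comparison of a category's predicted probability with the population prevalence can be written as a short calculation. This is an assumption-laden sketch: the document does not give the exact formula, so the hypothetical `risk_score` simply reports the probability as "X out of 100" and divides it by the prevalence to obtain a multiplier; the 3% and 1% inputs mirror the examples above but are illustrative.

```python
from typing import Tuple


def risk_score(category_probability: float, population_prevalence: float) -> Tuple[int, float]:
    """Express a risk category's predicted probability as 'X out of 100' and as a multiple of baseline prevalence."""
    if not 0.0 < population_prevalence <= 1.0:
        raise ValueError("population_prevalence must be in (0, 1]")
    out_of_100 = round(category_probability * 100)
    multiplier = round(category_probability / population_prevalence, 1)
    return out_of_100, multiplier


# Illustrative: a category with a 3% predicted probability in a population with ~1% incidence.
x, m = risk_score(0.03, 0.01)
print(f"{x} out of 100 (about {m}x the population risk)")  # 3 out of 100 (about 3.0x the population risk)
```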
  • a patient may have an increased risk of having or developing cancer selected from the group consisting of: breast cancer, bile duct cancer, bone cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, kidney cancer, liver or hepatocellular cancer, lobular carcinoma, lung cancer, melanoma, ovarian cancer, pancreatic cancer, prostate cancer, skin cancer, and testicular cancer.
  • the classifier model is selected based on the gender of the patient.
  • the input variables for a male patient comprise measured values from a panel of at least six biomarkers and age.
  • the panel of biomarkers is selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
  • the input variables for a male patient comprise measured values of AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC, and age.
  • the input variables for a female patient comprise measured values from a panel of at least six biomarkers and age.
  • the input variables for a female patient comprise measured values of AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1 and SCC, and age.
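The gender-based selection of panel and model can be sketched as a lookup keyed on sex that also fixes the order of the input variables. The dictionary name, function name and the convention of putting age first are hypothetical; the same key would be used to pick the male or female classifier model.

```python
from typing import Dict, List, Mapping, Tuple

# Sex-specific panels as listed above; the order of inputs (age first) is an assumption.
PANELS: Dict[str, Tuple[str, ...]] = {
    "male": ("AFP", "CEA", "CA19-9", "CYFRA21-1", "PSA", "SCC"),
    "female": ("AFP", "CEA", "CA125", "CA19-9", "CA 15-3", "CYFRA21-1", "SCC"),
}


def build_input_vector(sex: str, age: float, measurements: Mapping[str, float]) -> List[float]:
    """Select the panel for the patient's sex and return [age, marker_1, ..., marker_n]."""
    try:
        panel = PANELS[sex]
    except KeyError:
        raise ValueError(f"no classifier model for sex {sex!r}") from None
    missing = [m for m in panel if m not in measurements]
    if missing:
        raise ValueError(f"missing biomarker values: {missing}")
    return [float(age)] + [float(measurements[m]) for m in panel]


values = {"AFP": 3.1, "CEA": 2.4, "CA19-9": 12.0, "CYFRA21-1": 1.9, "PSA": 1.2, "SCC": 0.8}  # made-up values
print(build_input_vector("male", 62, values))
```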
  • the first classifier model comprises a support vector machine, a decision tree, a random forest, a neural network, a deep learning neural network, or a logistic regression algorithm.
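Of the algorithms listed, logistic regression is the simplest to show end to end. The following dependency-free sketch fits a logistic regression by batch gradient descent; it illustrates the technique, not the model described in the examples, and it assumes the inputs (age and biomarker values) have already been standardized, since the raw markers are on very different scales. The learning rate, epoch count and function names are arbitrary choices.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.1, epochs: int = 2000
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # dLoss/dz for one patient
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted probability of having or developing cancer for one standardized input vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


w, b = train_logistic_regression([[-1.5], [-0.5], [0.5], [1.5]], [0, 0, 1, 1])  # toy data
print(predict_proba(w, b, [-1.5]) < 0.5 < predict_proba(w, b, [1.5]))
```

The returned probability is what would be compared with the first classifier's threshold; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used.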
  • a second classifier model to predict at least one most likely organ system malignancy and/or a specific cancer.
  • the second classifier model is applied to patients that are classified into an increased risk category for having or developing cancer.
  • the second classifier model was trained with measured biomarkers from a longitudinal study, and age, wherein one classifier model was trained from and for female patients and another classifier model was trained from and for male patients.
  • the second classifier model was generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
  • the second classifier model was trained using data from only a male cohort or only a female cohort.
  • the training data comprises values of a panel of at least six biomarkers.
  • the training data comprises values from a panel of biomarkers selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
  • a second classifier model is generated by a machine learning system using training data that comprises a male cohort only, values of a panel of six biomarkers comprising AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC, and age.
  • a second classifier model is generated by a machine learning system using training data that comprises a female cohort only, values of a panel of seven biomarkers comprising AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1 and SCC, and age.
  • the second classifier model achieves, on Receiver Operating Characteristic (ROC) curve analysis, a sensitivity value of at least 0.8 and a specificity value of at least 0.7.
  • the second classifier model assigns a patient into an organ system class membership using input variables of age and the measured values of the panel of biomarkers from the patient. In certain embodiments, the second classifier model assigns a patient into a specific cancer class membership using input variables of age and the measured values of the panel of biomarkers from the patient. In embodiments, the class membership is for an organ system selected from genitourinary (GU), gastrointestinal (GI), pulmonary, dermatological, hematological, nervous system, gynecological, or general. See Example 3.
  • the class membership is for a cancer selected from breast cancer, bile duct cancer, bone cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, kidney cancer, liver or hepatocellular cancer, lobular carcinoma, lung cancer, melanoma, ovarian cancer, pancreatic cancer, prostate cancer, skin cancer, or testicular cancer.
  • the second classifier model is selected based on the gender of the patient.
  • the input variables for a male patient comprise measured values from a panel of at least six biomarkers and age.
  • the panel of biomarkers is selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
  • the input variables for a male patient comprise measured values of AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC, and age.
  • the input variables for a female patient comprise measured values from a panel of at least six biomarkers and age.
  • the input variables for a female patient comprise measured values of AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1 and SCC, and age.
  • the second classifier model comprises a pattern recognition algorithm.
  • the second classifier model comprises k-Nearest Neighbors algorithm (kNN).
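A k-nearest-neighbours class assignment for the second classifier can be sketched as follows. This is a generic kNN illustration, assuming standardized input vectors (age plus the panel values) and organ-system labels such as "GI" or "pulmonary"; the value of k, the Euclidean metric and the toy data are assumptions, not parameters taken from the examples.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(
    train_X: Sequence[Sequence[float]], train_labels: Sequence[str], x: Sequence[float], k: int = 5
) -> str:
    """Return the majority class among the k training patients nearest to x (Euclidean distance).

    Ties in the vote go to the class whose member appears first in distance order.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training patients")
    neighbours = sorted(zip(train_X, train_labels), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train_X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
train_labels = ["GI", "GI", "GI", "pulmonary", "pulmonary", "pulmonary"]
print(knn_predict(train_X, train_labels, [0.5, 0.5], k=3))  # GI
```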
  • the second classifier model comprises a support vector machine, a decision tree, a random forest, a neural network, a deep learning neural network, or a logistic regression algorithm.
  • a machine learning system comprising at least one processor for predicting an increased risk for cancer, and/or an organ system-based malignancy, and/or a specific cancer.
  • the processor is configured to obtain measured values of a panel of biomarkers in a sample from a patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample, obtain clinical parameters from the patient including age and gender, and generate a first classifier model by the machine learning system to classify the patient into a risk category of having or developing cancer, wherein the first classifier model classifies a patient into an increased risk category when the output of the first classifier model is greater than a threshold, and wherein the first classifier model is generated by the machine learning system using training data that comprises values from a panel of at least two biomarkers, age, gender and a diagnostic indicator for a population of patients.
  • the training data is from a longitudinal study wherein the biomarker measurements are obtained months, or years, before a cancer diagnosis is confirmed (or not) for a patient in the training data cohort.
  • the processor is configured to obtain measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample; obtain clinical parameters from the patient including age and gender, and generate a second classifier model by the machine learning system to classify the patient into an organ system class membership, wherein the second classifier model assigns the organ system class membership using input variables of age and the measured values of the panel of biomarkers from the patient, and wherein the second classifier model is generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
  • the processor is configured to obtain measured values of a panel of biomarkers in a sample from the patient, wherein a value of a biomarker corresponds to a level of the biomarker in the sample; obtain clinical parameters from the patient including age and gender, and generate a second classifier model by the machine learning system to classify the patient into a specific cancer class membership, wherein the second classifier model assigns the specific cancer class membership using input variables of age and the measured values of the panel of biomarkers from the patient, and wherein the second classifier model is generated by a machine learning system using training data that comprises values from a panel of at least two biomarkers, age, and a diagnostic indicator for a population of patients.
  • a panel of markers from an asymptomatic human subject may be measured, for example gene expression (e.g., mRNA), resulting gene products (e.g., polypeptides or proteins), or tumor antigens (e.g., CEA, CA-125, PSA, etc.).
  • testing is preferably conducted using an automated immunoassay analyzer from a company with a large installed base.
  • Representative analyzers include the Elecsys® system from Roche Diagnostics or the Architect® Analyzer from Abbott Diagnostics. Using such standardized platforms permits the results from one laboratory or hospital to be transferable to other laboratories around the world.
  • the methods provided herein are not limited to any one assay format or to any particular set of markers that comprise a panel. For example, PCT International Pat. Pub. No. WO 2009/006323; US Pat. Pub. No. 2012/0071334; US Pat. Pub. No. 2008/0160546; US Pat. Pub. No. 2008/0133141; and US Pat. Pub. No. 2007/0178504 (each herein incorporated by reference) teach a multiplex lung cancer assay using beads as the solid phase and fluorescence or color as the reporter in an immunoassay format. Hence, the degree of fluorescence or color can be provided in the form of a qualitative score rather than an actual quantitative value of reporter presence and amount.
  • the presence and quantification of one or more antigens or antibodies in a test sample can be determined using one or more immunoassays that are known in the art.
  • Immunoassays typically comprise: (a) providing an antibody (or antigen) that specifically binds to the biomarker (namely, an antigen or an antibody); (b) contacting a test sample with the antibody or antigen; and (c) detecting the presence of a complex of the antibody bound to the antigen in the test sample or a complex of the antigen bound to the antibody in the test sample.
  • Well known immunological binding assays include, for example, an enzyme linked immunosorbent assay (ELISA), which is also known as a “sandwich assay”, an enzyme immunoassay (EIA), a radioimmunoassay (RIA), a fluoroimmunoassay (FIA), a chemiluminescent immunoassay (CLIA), a counting immunoassay (CIA), a filter media enzyme immunoassay (META), a fluorescence-linked immunosorbent assay (FLISA), agglutination immunoassays and multiplex fluorescent immunoassays (such as the Luminex Lab MAP), immunohistochemistry, etc.
  • the immunoassay can be used to determine a test amount of an antigen in a sample from a subject.
  • a test amount of an antigen in a sample can be detected using the immunoassay methods described above. If an antigen is present in the sample, it will form an antibody-antigen complex with an antibody that specifically binds the antigen under suitable incubation conditions as described herein. The amount, activity, or concentration, etc. of an antibody-antigen complex can be determined by comparing the measured value to a standard or control.
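Comparing a measured value to a standard is commonly done through a calibration (standard) curve. The sketch below uses piecewise-linear interpolation between calibrator points purely for illustration, assuming the signal rises with concentration as in a sandwich format; commercial immunoassay analyzers often fit non-linear curves (for example four-parameter logistic models) in their own software, and the function name and calibrator values shown are made up.

```python
from bisect import bisect_left
from typing import List, Tuple


def interpolate_concentration(signal: float, calibrators: List[Tuple[float, float]]) -> float:
    """Convert an assay signal to a concentration by piecewise-linear interpolation on a standard curve.

    `calibrators` holds (signal, concentration) pairs with strictly increasing signal.
    Signals outside the calibrated range are rejected rather than extrapolated.
    """
    signals = [s for s, _ in calibrators]
    if len(signals) < 2 or any(b <= a for a, b in zip(signals, signals[1:])):
        raise ValueError("need at least two calibrators with strictly increasing signal")
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal outside the calibrated range")
    i = bisect_left(signals, signal)  # first calibrator with signal >= the measured signal
    if signals[i] == signal:
        return calibrators[i][1]
    (s0, c0), (s1, c1) = calibrators[i - 1], calibrators[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


calibrators = [(100.0, 0.0), (500.0, 10.0), (1300.0, 50.0)]  # hypothetical (signal, ng/mL) pairs
print(interpolate_concentration(900.0, calibrators))  # 30.0
```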
  • the AUC for the antigen can then be calculated using known techniques, such as, but not limited to, ROC analysis.
  • gene expression of markers is measured in a sample from a human subject.
  • gene expression profiling methods for use with paraffin-embedded tissue include quantitative reverse transcriptase polymerase chain reaction (qRT-PCR); however, other technology platforms, including mass spectrometry and DNA microarrays, can also be used. These methods include, but are not limited to, PCR, microarrays, Serial Analysis of Gene Expression (SAGE), and Gene Expression Analysis by Massively Parallel Signature Sequencing (MPSS).
  • the sample from the human subject is a tissue section such as from a biopsy.
  • the sample from the human subject is a bodily fluid such as blood, serum, plasma or a part or fraction thereof.
  • the sample is a blood or serum and the markers are proteins measured therefrom.
  • the sample is a tissue section and the markers are mRNA expressed therein. Many other combinations of sample forms from the human subjects and the form of the markers are contemplated.
  • a panel can be selected, or as was done by the present Applicants, a panel can be selected based on measurement of individual markers in longitudinal clinical samples wherein a panel is generated based on empirical data for a desired disease such as cancer.
  • examples of biomarkers include molecules detectable, for example, in a body fluid sample, such as antibodies, antigens, small molecules, proteins, hormones, enzymes, genes and so on.
  • the use of tumor antigens has many advantages due to their widespread use over many years and the fact that validated, standardized detection kits are available for many of them for use with the aforementioned automated immunoassay platforms.
  • a panel of biomarkers is selected from AFP, CEA, CA125, CA19-9, CA 15-3, CYFRA21-1, PSA and SCC.
  • the panel of biomarkers is selected from anti-p53, anti-NY-ESO-1, anti-ras, anti-Neu, anti-MAPKAPK3, cytokeratin 8, cytokeratin 19, cytokeratin 18, CEA, CA125, CA15-3, CA19-9, Cyfra 21-1, serum amyloid A, proGRP and α1-antitrypsin (US 20120071334; US 20080160546; US 20080133141; US 20070178504 (each herein incorporated by reference)).
  • Additional tumor markers include human epididymal protein 4; calcitonin, PAP, BR 27.29, Her-2; and HE-4.
  • Autoantibodies that are proposed to be circulating markers for lung cancer include p53, NY-ESO-1, CAGE, GBU4-5, Annexin 1, SOX2 and IMPDH, phosphoglycerate mutase, ubiquillin, Annexin I, Annexin II, and heat shock protein 70-9B (HSP70-9B).
  • a panel of markers comprises markers associated with a cancer selected from bile duct cancer, bone cancer, pancreatic cancer, cervical cancer, colon cancer, colorectal cancer, gallbladder cancer, liver or hepatocellular cancer, ovarian cancer, testicular cancer, lobular carcinoma, prostate cancer, and skin cancer or melanoma.
  • a panel of markers comprises markers associated with breast cancer.
  • a panel of biomarkers comprises markers associated with “pan cancer”.
  • the patients were tested with the following biomarkers: AFP, CA 15-3, CA125, PSA, SCC, CEA, CA 19-9, and CYFRA21-1, using kits available from Roche Diagnostics, Abbott Diagnostics, and Siemens Healthcare Diagnostics.
  • the sensitivity of the panel for identifying the four most commonly diagnosed malignancies in that region was 90.9%, 75.0%, 100% and 76%, respectively.
  • Subjects with at least one of the markers showing values above the cut-off point were considered positive for the assay. No algorithm was reported. Moreover, neither clinical parameters nor biomarker velocity were factored in with this test.
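For contrast with the classifier models, the rule used by the earlier panel (a subject is positive if at least one marker exceeds its cut-off) can be written in a few lines. The function name and cut-off numbers are illustrative assumptions; the point is that every marker is compared independently, with no weighting, clinical parameters or biomarker velocity.

```python
from typing import List, Mapping, Tuple


def any_marker_positive(values: Mapping[str, float], cutoffs: Mapping[str, float]) -> Tuple[bool, List[str]]:
    """Unweighted 'any marker above its cut-off' rule: no algorithm, no age, no biomarker velocity."""
    missing = [m for m in cutoffs if m not in values]
    if missing:
        raise ValueError(f"missing biomarker values: {missing}")
    above = sorted(m for m, cutoff in cutoffs.items() if values[m] > cutoff)
    return bool(above), above


# Placeholder cut-offs for illustration only; they are not clinical reference limits.
cutoffs = {"AFP": 10.0, "CEA": 5.0, "PSA": 4.0}
print(any_marker_positive({"AFP": 3.0, "CEA": 6.2, "PSA": 1.1}, cutoffs))  # (True, ['CEA'])
```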
  • the methods and machine learning systems according to the present invention can improve and enhance the pan-cancer biomarker panel reported by the Taiwanese group and readily permit its use in other parts of the world.
  • an algorithm that combines biomarker values with clinical parameters could be employed that automatically improves using the machine learning software.
  • a panel can comprise any number of markers as a design choice, seeking, for example, to maximize specificity or sensitivity of the classifier model.
  • the present methods may ask for presence of at least one of two or more biomarkers, three or more biomarkers, four or more biomarkers, five or more biomarkers, six or more biomarkers, seven or more biomarkers, eight biomarkers or more as a design choice.
  • the panel of biomarkers may comprise at least two, at least three, at least four, at least five, at least six, at least seven, at least eight, at least nine or at least ten or more different markers. In one embodiment, the panel of biomarkers comprises about two to ten different markers. In another embodiment, the panel of biomarkers comprises about four to eight different markers. In yet another embodiment, the panel of markers comprises about six or about seven different markers.
  • a sample is committed to the assay and the results can be a range of numbers reflecting the presence and level (e.g., concentration, amount, activity, etc.) of each of the biomarkers of the panel in the sample.
  • each marker in the panel is measured and normalized wherein none of the markers are given any specific weight. In this instance each marker has a weight of 1.
  • the choice of the markers may be based on the understanding that each marker, when measured and normalized, contributes unequally as an input variable to the classifier model.
  • a particular marker in the panel can either be weighted as a fraction of 1 (for example if the relative contribution is low), a multiple of 1 (for example if the relative contribution is high) or as 1 (for example when the relative contribution is neutral compared to the other markers in the panel).
  • a machine learning system may analyze values from biomarker panels without normalization of the values.
  • the raw value obtained from the instrumentation to make the measurement may be analyzed directly.
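The normalization and weighting options above can be sketched as a per-marker transform applied before the values enter the classifier. Dividing by a per-marker reference value is only one possible normalization (a z-score against a reference population would be another), and the function name, reference values and weights in the example are hypothetical; a weight of 1 leaves a marker's contribution neutral.

```python
from typing import Dict, Mapping, Optional


def weighted_normalized_inputs(
    values: Mapping[str, float],
    reference: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> Dict[str, float]:
    """Normalize each marker by a reference value, then scale by its weight (default 1)."""
    weights = weights or {}
    result = {}
    for marker, value in values.items():
        ref = reference[marker]
        if ref <= 0:
            raise ValueError(f"reference value for {marker} must be positive")
        # weight < 1: low relative contribution; > 1: high; == 1: neutral
        result[marker] = weights.get(marker, 1.0) * value / ref
    return result


print(weighted_normalized_inputs({"CEA": 6.0, "AFP": 5.0}, {"CEA": 3.0, "AFP": 10.0}, {"CEA": 0.5}))
```

Passing a reference of 1.0 for every marker and no weights returns the raw values unchanged, which corresponds to analyzing instrument readings directly.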
  • Primary care healthcare practitioners who may include physicians specializing in internal medicine or family practice as well as physician assistants and nurse practitioners, are among the users of the techniques disclosed herein. These primary care providers typically see a large volume of patients each day. In one instance these patients are at risk for lung cancer due to smoking history, age, and other lifestyle factors. In 2012 about 18% of the U.S. population was current smokers and many more were former smokers with a lung cancer risk profile above that of a population that has never smoked.
  • a blood sample from a patient, such as a patient 50 years of age or older, is sent to a laboratory qualified to test the sample using a panel of biomarkers, such as those used to train the present classifier models generated by a machine learning system.
  • other suitable bodily fluids, such as sputum or saliva, might also be utilized.
  • the measured values of the biomarkers are then used as input values, along with age, to be used with the first classifier model in a computer implemented system.
  • An output value is obtained and compared to a threshold value wherein the threshold is empirically determined and set to separate patients in a low risk category from those in an increased risk for having or developing cancer.
  • the threshold value is empirically determined using longitudinal clinical data. If the risk calculation is to be made at the point of care, rather than at the laboratory, a software application compatible with mobile devices (e.g. a tablet or smart phone) may be employed.
  • the input variables of measured biomarkers and age may be used with the second classifier model in a computer implemented system.
  • An output value is obtained, compared to the longitudinal clinical data used to train the second classifier model, and assigned a class membership, wherein the class memberships are organ systems.
  • the class membership is further defined by a specific cancer type, e.g. lung cancer.
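The two-stage workflow above (first classifier for risk, second classifier only for increased-risk patients) can be sketched as a small orchestration function. The models are passed in as plain callables, so any of the algorithms mentioned could be plugged in; the names `assess_patient` and `Assessment`, the stand-in models and the example threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Assessment:
    probability: float                  # output of the first classifier model
    increased_risk: bool
    organ_system: Optional[str] = None  # set only when the second classifier model is run


def assess_patient(
    x: Sequence[float],
    first_model: Callable[[Sequence[float]], float],
    second_model: Callable[[Sequence[float]], str],
    threshold: float,
) -> Assessment:
    """Classify risk with the first model; assign an organ system only for increased-risk patients."""
    p = first_model(x)
    if p <= threshold:
        return Assessment(p, increased_risk=False)
    return Assessment(p, increased_risk=True, organ_system=second_model(x))


# Stand-in models for illustration only.
result = assess_patient([62.0, 3.1, 6.2], lambda x: 0.04, lambda x: "GI", threshold=0.01)
print(result)
```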
  • Embodiments of the present invention further provide for an apparatus for assessing a subject's risk level for the presence of cancer and correlating the risk level with an increase or decrease of the presence of cancer after testing relative to a population or a cohort population.
  • the apparatus may comprise a processor configured to execute computer readable media instructions (e.g., a computer program or software application, such as a machine learning system) to receive the concentration values from the evaluation of biomarkers in a sample and, in combination with other risk factors (e.g., medical history of the patient, publicly available sources of information pertaining to a risk of developing cancer, etc.), determine a risk score and compare it to a grouping of a stratified cohort population comprising multiple risk categories.
  • the apparatus can take any of a variety of forms, for example, a handheld device, a tablet, or any other type of computer or electronic device.
  • the apparatus may also comprise a processor configured to execute instructions (e.g., a computer software product, an application for a handheld device, a handheld device configured to perform the method, a world-wide-web (WWW) page or other cloud or network accessible location, or any computing device).
  • the apparatus may include a handheld device, a tablet, or any other type of computer or electronic device for accessing a machine learning system provided as a software as a service (SaaS) deployment.
  • the correlation may be displayed as a graphical representation, which, in some embodiments, is stored in a database or memory, such as a random access memory, read-only memory, disk, virtual memory, etc.
  • Other suitable representations, or exemplifications known in the art may also be used.
  • the apparatus may further comprise a storage means for storing the correlation, an input means, and a display means for displaying the status of the subject in terms of the particular medical condition.
  • the storage means can be, for example, random access memory, read-only memory, a cache, a buffer, a disk, virtual memory, or a database.
  • the input means can be, for example, a keypad, a keyboard, stored data, a touch screen, a voice-activated system, a downloadable program, downloadable data, a digital interface, a hand-held device, or an infrared signal device.
  • the display means can be, for example, a computer monitor, a cathode ray tube (CRT), a digital screen, a light-emitting diode (LED), a liquid crystal display (LCD), an X-ray, a compressed digitized image, a video image, or a hand-held device.
  • the apparatus can further comprise or communicate with a database, wherein the database stores the correlation of factors and is accessible to the user.
  • the apparatus is a computing device, for example, in the form of a computer or hand-held device that includes a processing unit, memory, and storage.
  • the computing device can include or have access to a computing environment that comprises a variety of computer-readable media, such as volatile memory and non-volatile memory, removable storage and/or non-removable storage.
  • Computer storage includes, for example, RAM, ROM, EPROM & EEPROM, flash memory or other memory technologies, CD ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other medium known in the art to be capable of storing computer-readable instructions.
  • the computing device can also include or have access to a computing environment that comprises input, output, and/or a communication connection.
  • the input can be one or several devices, such as a keyboard, mouse, touch screen, or stylus.
  • the output can also be one or several devices, such as a video display, a printer, an audio output device, a touch stimulation output device, or a screen reading output device.
  • the computing device can be configured to operate in a networked environment using a communication connection to connect to one or more remote computers.
  • the communication connection can be, for example, a Local Area Network (LAN), a Wide Area Network (WAN) or other networks and can operate over the cloud, a wired network, wireless radio frequency network, and/or an infrared network.
  • Artificial intelligence systems include computer systems configured to perform tasks usually accomplished by humans, e.g., speech recognition, decision making, language translation, image processing and recognition, etc.
  • artificial intelligence systems have the capacity to learn, to maintain and access a large repository of information, to perform reasoning and analysis in order to make decisions, as well as the ability to self-correct.
  • Artificial intelligence systems may include knowledge representation systems and machine learning systems.
  • Knowledge representation systems generally provide structure to capture and encode information used to support decision making.
  • Machine learning systems are capable of analyzing data to identify new trends and patterns in the data.
  • machine learning systems may include neural networks, induction algorithms, genetic algorithms, etc. and may derive solutions by analyzing patterns in data.
  • the present classifier models comprise an algorithm such as a support vector machine, a decision tree, a random forest, a neural network, a deep learning neural network, a logistic regression or a pattern recognition algorithm.
  • the present classifier models may be used to classify an individual patient into one of a plurality of categories, e.g., a category indicative of a likelihood of cancer or a category indicating that cancer is not likely.
  • Inputs to the classifier model may include a panel of biomarkers associated with the presence of cancer as well as clinical parameters. See Example 3.
  • clinical parameters include one or more of the following: (1) age; (2) gender; (3) smoking history in years; (4) number of packs per year; (5) symptoms; (6) family history of cancer; (7) concomitant illnesses; (8) number of nodules; (9) size of nodules; and (10) imaging data, and so forth.
  • the clinical parameter used as an input value is age, and gender is used to train the classifier model, providing a classifier model for male patients and a separate classifier model for female patients.
  • the clinical parameters include smoking history in years, number of packs per year, and age.
  • the panel of biomarkers comprises any two, any three, any four, any five, any six, any seven, any eight, any nine, or any ten biomarkers.
  • the panel of biomarkers comprises two or more biomarkers selected from the group consisting of: AFP, CA125, CA 15-3, CA 19-9, CEA, CYFRA 21-1, HE-4, NSE, Pro-GRP, PSA, SCC, anti-Cyclin E2, anti-MAPKAPK3, anti-NY-ESO-1, and anti-p53.
  • the panel of biomarkers comprises CA 19-9, CEA, CYFRA 21-1, NSE, Pro-GRP, and SCC. In still other embodiments, the panel of biomarkers comprises AFP, CA125, CA 15-3, CA-19-9, CEA, HE-4, and PSA. In yet other embodiments, the panel of biomarkers comprises AFP, CA125, CA 15-3, CA-19-9, Calcitonin, CEA, PAP, and PSA. In other embodiments, the panel of biomarkers comprises AFP, BR 27.29, CA 125 II, CA 15-3, CA-19-9, Calcitonin, CEA, Her-2, and PSA.
  • SVMs are supervised learning models that analyze data for classification and regression analysis.
  • SVMs may plot a collection of data points in n-dimensional space (e.g., where n is the number of biomarkers and clinical parameters), and classification is performed by finding a hyperplane that can separate the collection of data points into classes.
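  • in some embodiments the separating hyperplane is linear, while in other embodiments the decision boundary is non-linear (e.g., a hyperplane found in a higher-dimensional space induced by a kernel function).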
  • SVMs are effective in high dimensional spaces, are effective in cases in which the number of dimensions is higher than the number of data points, and generally work well on data sets with clear margins of separation.
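The following toy sketch illustrates the hyperplane search described above. It trains a linear SVM with the Pegasos stochastic sub-gradient method on the regularized hinge loss. It is not the classifier used in this application, and several choices are assumptions:

- The feature vectors stand in for already-scaled biomarker and clinical values.
- Labels are +1 (cancer) and -1 (no cancer).
- The bias is folded into the weight vector through a constant feature, a common simplification.
- `lam`, `epochs` and `seed` are hypothetical defaults.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style SVM: minimizes lam/2*||w||^2 + mean hinge loss (labels must be +1/-1)."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]  # constant 1.0 so the last weight acts as the bias
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step for the L2 term
            if margin < 1.0:  # point inside the margin: hinge-loss sub-gradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Which side of the hyperplane w . [x, 1] = 0 the point falls on."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

A non-linear SVM would replace the dot products with kernel evaluations. In practice, a maintained library implementation would normally be preferred over hand-written training code.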
  • Decision trees are a type of supervised learning algorithm that is also used in classification problems. Decision trees may be used to identify the most significant variable, i.e., the variable whose split produces the most homogeneous sets of data. Decision trees split groups of data points into two or more subsets, may then split each subset further, and so forth, until forming terminal nodes (i.e., nodes that do not split). Various criteria may be used to decide where a split occurs, including the Gini Index (used for binary splits), Chi-Square, Information Gain, or Reduction in Variance. Decision trees can rapidly identify the most significant variables among a large number of variables, as well as relationships between two or more variables. Additionally, decision trees can handle both numerical and non-numerical data. The technique is generally considered non-parametric, i.e., the data do not have to fit a normal distribution.
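The Gini-based split search mentioned above can be sketched for a single numeric variable as follows. This is an illustration under stated assumptions, not the application's implementation:

- Labels are arbitrary class codes (e.g., 1 = cancer, 0 = no cancer).
- Candidate thresholds are the observed values.
- The split that minimizes the size-weighted Gini impurity of the two child nodes is kept.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) for the best 'value <= threshold' split.

    The threshold is None when no split improves on the unsplit node.
    """
    n = len(values)
    best_t, best_score = None, gini(labels)
    for t in sorted(set(values))[:-1]:  # splitting at the maximum would leave one side empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

For example, for marker values `[1.0, 2.0, 3.0, 4.0]` with labels `[0, 0, 1, 1]`, the best threshold is 2.0 and it yields two pure nodes.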
  • Random forest (or random decision forest) is a suitable approach for both classification and regression.
  • the random forest method constructs a collection of decision trees with controlled variance.
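  • at each split, a random subset of nvar variables, where nvar is less than M (the total number of input variables), is used to split groups of data points. The best split among those variables is selected, and the process is repeated until a terminal node is reached.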
  • Random forest is particularly suited to process a large number of input variables (e.g., thousands) to identify the most significant variables. Random forest is also effective for estimating missing data.
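A compact sketch of the random-forest procedure described above is shown below. Each tree is grown on a bootstrap sample, and each split considers only a random subset of nvar < M variables. To keep the sketch short, every tree is a single-split "stump", an assumption that departs from the full-depth trees the text describes.

The following are hypothetical defaults, not values from this application:

- the 0/1 label coding;
- `n_trees`;
- `seed`.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A stump is (feature index, threshold, label if value <= threshold, label otherwise).
Stump = Tuple[int, float, int, int]


def gini(labels: Sequence[int]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels: Sequence[int], default: int) -> int:
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features: List[int]) -> Stump:
    """Best single Gini split, searching only the randomly chosen features."""
    best, best_score = None, float("inf")
    fallback = majority(y, 0)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score = score
                best = (f, t, majority(left, fallback), majority(right, fallback))
    return best


def fit_forest(X, y, n_trees: int = 25, nvar: int = 1, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    m = len(X[0])  # M: total number of input variables
    if not 1 <= nvar < m:
        raise ValueError("nvar must satisfy 1 <= nvar < M")
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        features = rng.sample(range(m), nvar)  # random subset of nvar variables
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], features))
    return forest


def predict_forest(forest: List[Stump], x: Sequence[float]) -> int:
    """Majority vote over all trees."""
    votes = Counter(lo if x[f] <= t else hi for f, t, lo, hi in forest)
    return votes.most_common(1)[0][0]
```

Randomizing both the rows (bootstrap) and the variables (nvar) decorrelates the trees. This is the sense in which the forest's variance is "controlled".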
  • Neural nets, also referred to as artificial neural nets (ANNs), are described throughout this application.
  • a neural net, which is a non-deterministic machine learning technique, utilizes one or more layers of hidden nodes to compute outputs. Inputs are selected and weights are assigned to each input. Training data are used to train the neural network, and the inputs and weights are adjusted until specified metrics are reached, e.g., a suitable specificity and sensitivity.
  • ANNs may be used to classify data in cases in which correlation between dependent and independent variables is not linear or in which classification cannot be easily performed using an equation. More than 25 different types of ANNs exist, with each ANN yielding different results based on different training algorithms, activation/transfer functions, number of hidden layers, etc. In some embodiments, more than 15 types of transfer functions are available for use with the neural network. Prediction of the likelihood of having cancer is based upon one or more of the type of ANN, the activation/transfer function, the number of hidden layers, the number of neurons/nodes, and other customizable parameters.
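To make "layers of hidden nodes compute outputs" concrete, here is a minimal forward pass through a fully connected network. The following are assumptions for illustration:

- a logistic (sigmoid) transfer function in every layer;
- a single output node read as a cancer score in (0, 1);
- the layer sizes and weights in the usage example.

Training, i.e., adjusting the weights until the target sensitivity and specificity are met, is omitted.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix, biases), one row per node


def sigmoid(z: float) -> float:
    """Numerically stable logistic transfer function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dense(inputs: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer: weighted sum of the inputs plus bias, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x: Sequence[float], layers: List[Layer]) -> float:
    h = list(x)
    for weights, biases in layers:  # hidden layer(s), then the output layer
        h = dense(h, weights, biases)
    return h[0]  # single output node
```

Usage: with one hidden layer of two nodes, `forward([1.0, 2.0], [([[0.4, -0.2], [0.1, 0.3]], [0.0, -0.1]), ([[1.5, -2.0]], [0.2])])` returns a score strictly between 0 and 1. Swapping the transfer function or adding layers changes the model family, which is the kind of customization the paragraph above refers to.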
  • Deep learning neural networks, another machine learning technique, are similar to regular neural nets but more complex (e.g., they typically have multiple hidden layers), and they can perform operations such as feature extraction automatically, generally requiring less interaction with a user than a traditional neural net.
  • inputs may be selected in order to improve the performance of the classifier model. For example, rather than picking the set of inputs that achieves the highest possible sensitivity at a clinically relevant specificity such as 80% or greater, the inputs are first selected to reach a sensitivity threshold (e.g., 80% or greater); once this threshold is reached, further selection of inputs optimizes the remaining aspects of classifier performance.
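The two-stage selection rule above can be sketched as follows:

1. Discard candidate input sets below the sensitivity threshold.
2. Among the rest, pick the best set by a secondary criterion.

The text does not specify that criterion. Here specificity, with ties broken by the smaller panel, is a hypothetical choice. The candidate sets and their (sensitivity, specificity) values in the usage example are invented for illustration.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Metrics = Tuple[float, float]  # (sensitivity, specificity) measured for a candidate input set


def select_inputs(candidates: Dict[FrozenSet[str], Metrics],
                  sens_threshold: float = 0.80) -> Optional[FrozenSet[str]]:
    # Stage 1: keep only input sets that reach the sensitivity threshold.
    eligible = {s: m for s, m in candidates.items() if m[0] >= sens_threshold}
    if not eligible:
        return None
    # Stage 2: optimize a secondary criterion (here: specificity, then fewer inputs).
    return max(eligible, key=lambda s: (eligible[s][1], -len(s)))
```

Consider three hypothetical candidates with (sensitivity, specificity):

- `{AFP, CEA}`: (0.90, 0.70)
- `{AFP, CEA, age}`: (0.82, 0.81)
- `{PSA}`: (0.60, 0.95)

Picking by sensitivity alone would choose the first set. This rule chooses `{AFP, CEA, age}`.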
  • a set of data is stored in a memory accessible by the classifier model or machine learning system. The set of data comprises a plurality of patient records, each including a plurality of parameters and corresponding values for a patient, and also includes a diagnostic indicator indicating whether or not the patient has been diagnosed with cancer.
  • the plurality of parameters includes various biomarkers, clinical factors and other factors which may be selected as inputs into the classifier model.
  • the diagnostic indicator is an affirmative indicator that the patient has cancer, e.g., a lung X-ray and/or biopsy confirming a diagnosis of cancer.
  • a subset of the plurality of parameters is selected for inputs into the machine learning system, wherein the subset includes a panel of at least two different biomarkers and at least one clinical parameter, such as age.
  • the set of data (e.g. longitudinal) is randomly partitioned into training data and validation data.
  • the classifier model is generated using the machine learning system based on the training data, the subset of inputs and other parameters associated with the machine learning system as described herein. It is determined whether the classifier meets certain performance criteria, such as a predetermined Receiver Operator Characteristic (ROC) statistic, specifying a sensitivity and a specificity, for correct classification of patients. In embodiments, the specificity is at least 80% and the sensitivity is at least 75%. See Example 1A and 2.
  • the classifier may be iteratively regenerated based on the training data and a different subset of inputs until the classifier meets the pre-determined ROC statistic.
  • a static configuration of the classifier may be generated. This static configuration may be deployed to a physician's office for use in identifying patients at risk of having lung cancer, or stored on a remote server that can be accessed by the physician's office.
  • the classifier model may be validated using the validation data.
  • the validation data also includes a plurality of parameters and corresponding values for a patient, and includes a diagnostic indicator indicating whether or not the patient has been diagnosed with cancer.
  • the validation data may be classified using the classifier model, and it may be determined whether the classifier meets the predetermined performance criteria such as a ROC statistic based on this data.
  • the classifier may be iteratively regenerated based on the training data and a different subset of the plurality of parameters, until the regenerated classifier meets the predetermined ROC statistic. The validation process may then be repeated.
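The store, partition, train, check and regenerate loop described above can be sketched as below. The following are all assumptions:

- the record layout;
- the `train` callable, which stands in for whichever machine learning system is used;
- the stratified split, a swapped-in variant of the random partition in the text, used so that both sets contain patients with and without cancer;
- the default thresholds, taken from the 75% sensitivity / 80% specificity embodiment.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Record = Tuple[Dict[str, float], bool]  # (parameter values, diagnostic indicator: True = cancer)
Classifier = Callable[[Dict[str, float]], bool]
Trainer = Callable[[List[Record], Sequence[str]], Classifier]


def sens_spec(clf: Classifier, records: List[Record]) -> Tuple[float, float]:
    tp = sum(1 for x, y in records if y and clf(x))
    fn = sum(1 for x, y in records if y and not clf(x))
    tn = sum(1 for x, y in records if not y and not clf(x))
    fp = sum(1 for x, y in records if not y and clf(x))
    return tp / max(tp + fn, 1), tn / max(tn + fp, 1)


def stratified_split(records: List[Record], val_fraction: float, rng: random.Random):
    pos = [r for r in records if r[1]]
    neg = [r for r in records if not r[1]]
    rng.shuffle(pos)
    rng.shuffle(neg)
    k_pos = max(1, int(len(pos) * val_fraction))
    k_neg = max(1, int(len(neg) * val_fraction))
    return pos[k_pos:] + neg[k_neg:], pos[:k_pos] + neg[:k_neg]  # (training, validation)


def build_classifier(records: List[Record], candidate_subsets: List[Sequence[str]],
                     train: Trainer, min_sens: float = 0.75, min_spec: float = 0.80,
                     val_fraction: float = 0.3,
                     seed: int = 0) -> Optional[Tuple[Sequence[str], Classifier]]:
    training, validation = stratified_split(records, val_fraction, random.Random(seed))
    for subset in candidate_subsets:  # regenerate with a different input subset each time
        clf = train(training, subset)
        meets = all(s >= min_sens and p >= min_spec
                    for s, p in (sens_spec(clf, training), sens_spec(clf, validation)))
        if meets:
            return subset, clf  # would then be frozen as the static configuration
    return None  # no candidate subset met the performance criteria
```

The loop stops at the first input subset whose classifier meets the criteria on both training and validation data. Ordering `candidate_subsets` therefore encodes a preference, e.g., smaller panels first.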
  • a user may enter input values corresponding to a patient into the computing device.
  • the patient may then be classified, using the static classifier, into a risk category indicative of a likelihood of having cancer or into another risk category indicative of a likelihood of not having cancer.
  • the system may then send a notification to the user (e.g., a physician) recommending additional diagnostic testing (e.g., a CT scan, a chest x-ray or biopsy) when the patient is classified into the category indicative of a likelihood of having cancer.
  • the classifier model generated by the machine learning system may be continuously trained over time. Test results obtained from the diagnostic testing, which confirm or deny the presence of cancer, may be incorporated into the training data set for further training of the machine learning system, and to generate an improved classifier by the machine learning system.
  • the values of a panel of biomarkers in a sample from a patient are measured.
  • a classifier model is generated by a machine learning system to classify the patient into a risk category for having or developing cancer, wherein the classifier model has a performance of a ROC curve with a sensitivity of at least 80% and a specificity of at least 80%, and wherein the classifier is generated using the panel of biomarkers comprising at least two different biomarkers, and at least one clinical parameter, such as age.
  • a notification to a user for diagnostic testing is provided.
  • the risk category for having or developing cancer may be further categorized into qualitative groups (e.g. high, low, medium, etc.) for the likelihood of having cancer, or into quantitative groups (e.g. a percentage, multiplier, risk score, composite score) of the likelihood of having cancer.
  • a second classifier model is generated by a machine learning system to assign patients to an organ system and/or specific cancer class membership, wherein the classifier model has a performance of a ROC curve with a sensitivity of at least 70% and a specificity of at least 80%, and wherein the classifier is generated using the panel of biomarkers comprising at least two different biomarkers, and at least one clinical parameter, such as age.
  • a notification to a user for diagnostic testing is provided.
  • a computer implemented method for predicting a risk of having or developing cancer in a subject using a computer system having one or more processors coupled to a memory storing one or more computer readable instructions for execution by the one or more processors, the one or more computer readable instructions comprising instructions for: storing a set of data comprising a plurality of patient records, each patient record including a plurality of parameters for a patient, and wherein the set of data also includes a diagnostic indicator indicating whether or not the patient has been diagnosed with cancer; selecting a plurality of parameters for inputs into a machine learning system, wherein the parameters include a panel of at least two different biomarker values and at least one type of clinical data; and generating a classifier using the machine learning system, wherein the classifier comprises a sensitivity of at least 70% and a specificity of at least 80%, and wherein the classifier is based on a subset of the inputs.
  • the machine learning system may have the capability to deploy improved predictions on a scheduled basis.
  • the techniques used by the machine learning system to determine risk may remain static for a period of time, allowing consistency with regard to determination of a risk score.
  • the machine learning system may deploy updated techniques that incorporate analysis of new data to produce an improved risk score.
  • the machine learning systems described herein may operate: (1) in a static manner; (2) in a semi-static manner, in which the classifier is updated according to a prescribed schedule (e.g., at a specific time); or (3) in a continuous manner, being updated as new data is available.
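The three operating modes above differ only in when the classifier is rebuilt. The sketch below encodes that as a retraining policy.

- In continuous mode, "new labelled records" means follow-up test results confirming or ruling out cancer, as in the continuous-training embodiment.
- The 90-day schedule for the semi-static mode is a hypothetical value.

```python
from datetime import datetime, timedelta
from enum import Enum


class UpdateMode(Enum):
    STATIC = "static"            # classifier frozen after deployment
    SEMI_STATIC = "semi-static"  # retrained on a prescribed schedule
    CONTINUOUS = "continuous"    # retrained whenever new labelled data arrive


def should_retrain(mode: UpdateMode, last_trained: datetime, now: datetime,
                   new_labelled_records: int,
                   schedule: timedelta = timedelta(days=90)) -> bool:
    if mode is UpdateMode.STATIC:
        return False
    if mode is UpdateMode.SEMI_STATIC:
        return now - last_trained >= schedule
    return new_labelled_records > 0
```

Keeping the static and semi-static policies separate from the training code preserves consistent risk scores between scheduled deployments, while still allowing new data to be incorporated.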
  • Example 1A Development of a Multi-Marker Model for Classifying Asymptomatic Patients as to Developing Cancer: “Pan Cancer” Test
  • using a multi-marker classification model and method for identifying asymptomatic patients with an increased risk of developing cancer, patients can be categorized as “low”, “medium/moderate” or “high” risk for developing cancer, wherein the ranges for those categories may be based on, for example, the probability of developing cancer within 6 months to a year, measured against the baseline level of cancer in the heterogeneous population. It is understood in the art that the rate of cancer is about 1% in the general population. The prevalence of cancer in the cohort used to develop the present Pan Cancer test was about 1.5%. See the examples below for more detail on the use of the test and probability values.
  • the development of the classifier model, and the selection of markers, may be based on a combination of accuracy, area under the curve (AUC), sensitivity, specificity values, and/or Youden index (Sensitivity + Specificity − 1) that provide a measure of the performance of the classifier model.
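The performance measures listed above all derive from the four counts of a confusion matrix. The sketch below computes them. The example counts are hypothetical, chosen only so that sensitivity and specificity come out at 0.82 and 0.81.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # cancer, classified as increased risk
    fn: int  # cancer, classified as low risk
    tn: int  # no cancer, classified as low risk
    fp: int  # no cancer, classified as increased risk

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def youden(self) -> float:
        # Youden index J = sensitivity + specificity - 1
        return self.sensitivity + self.specificity - 1.0
```

For example, `Confusion(tp=82, fn=18, tn=81, fp=19)` gives a Youden index of about 0.63.

At a prevalence near 1.5%, accuracy is dominated by the non-cancer majority. A model that labels everyone "low risk" would be about 98.5% accurate yet have zero sensitivity. Sensitivity, specificity and the Youden index avoid this pitfall.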
  • the development and continued learning by the classifier model of the Pan Cancer Test was performed using longitudinal data and/or retrospective data over a 12-year period wherein biomarkers were measured (along with gender and age), statistical analysis performed, and that data correlated to those individuals that developed cancer. From that, a model comprising an algorithm was generated and trained to identify those individuals with an increased risk of developing cancer over the following 6 months to a year. The same principle is applied to continually increase the accuracy of the model, wherein individuals and their biomarker measurements are added to the cohort and further train the model.
  • the present “pan cancer” model was developed using data from 12,622 asymptomatic males and 15,316 asymptomatic females who had sera biomarkers measured based on a tumor marker panel over a 12-year period in Taiwan.
  • the male cohort had a panel of six markers measured (AFP, CEA, CA19-9, CA15-3, CA125, PSA, SCC, and CYFRA21-1) and the female cohort had a panel of seven markers measured (AFP, CEA, CA19-9, CA125, CA15-3, SCC, and CYFRA21-1). All tumor markers were measured using commercially available in vitro diagnostic (IVD) kits and instrumentation manufactured by either Roche or Abbott Diagnostics. All assays of tumor markers met the requirements of the College of American Pathologists (CAP) Laboratory Accreditation Program. Outcome data were obtained from a cancer registry to determine whether each patient had received a new diagnosis of malignancy within 1 year of the tumor markers test.
  • the biomarker panel AFP, CEA, CA19-9, CYFRA21-1, SCC and PSA were measured for all 12,622 male individuals and the biomarker panel AFP, CEA, CA19-9, CA125, CA15-3, SCC, and CYFRA21-1 were measured for all 15,316 female individuals.
  • a variable selection process was applied to select robust variables from those serum tumor markers to design cancer detection models. The accuracy, sensitivity, specificity, AUC (area under the curve), and Youden index were compared to select the best machine learning models.
  • the Youden index was used as a performance indicator for selecting the variables used in the classifier models in this study.
  • the ML models are amenable to periodic review and redefinition. Using a larger data set by combining the US and Asian cohorts, the accuracy of the pan cancer model may be further improved for females by leveraging additional data and expanding the number of clinical factor predictors. It is also possible, without wishing to be bound by a theory, that a model for females may optionally account for fluctuations in hormones, such as during pregnancy or menstrual cycles, to further improve performance.
  • the developed pan cancer model can be applied to the panel of measured biomarkers, along with age and gender, to determine the likelihood that an individual is at risk for developing cancer.
  • the time frame for developing cancer is a few months, such as within 3 months, and up to about 2 years.
  • the “likelihood” an individual is at risk for developing cancer is a probability above background that the individual tested will develop cancer within a few months to about 2 years.
  • an individual may be classified as “moderate risk” when their probability of developing cancer is five times (5×) the baseline, wherein the baseline is about 1% in the general population.
  • for example, a tested individual classified as “moderate risk” has a 5% risk of developing cancer, compared with a 1% risk for a “low risk” individual over the same time period.
  • individuals identified as “moderate risk” or “high risk” may then be selected for further analysis for predicting organ system-based malignancy for a patient with an increased risk of having cancer.
  • individuals with a probability above 0.5 (50%) using the selected model of Table 5 were classified as “moderate risk” or “high risk”.
  • Individuals with a probability value below 0.5 (50%) were classified as “low risk”.
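The 0.5 cutoff above separates "low risk" from "moderate or high risk". The text does not give the boundary between "moderate" and "high" here, so `high_cutoff=0.8` in this sketch is a hypothetical placeholder. The sketch treats exactly 0.5 as increased risk, following the "0.5 or greater" wording of the method below. Scores reported on a 0 to 100 scale would be divided by 100 first.

```python
def risk_category(probability: float, moderate_cutoff: float = 0.5,
                  high_cutoff: float = 0.8) -> str:
    """Map a classifier probability to a qualitative risk group.

    high_cutoff is a hypothetical placeholder, not a value from this application.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if probability < moderate_cutoff:
        return "low"
    return "high" if probability >= high_cutoff else "moderate"
```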
  • the performance of the selected models had a sensitivity value of 0.82 and a specificity value of 0.81.
  • a method for predicting an increased risk of having cancer for an asymptomatic patient comprising measuring values of a panel of biomarkers in a sample from a patient; obtaining clinical parameters from the patient including age and gender; utilizing a classifier generated by a machine learning system to classify the patient into a low risk, moderate risk or high risk category of having or developing cancer, wherein the classifier provides a probability value and those individuals with a probability of 0.5 or greater are classified as moderate risk or high risk, and wherein the classifier is generated using a panel of at least six biomarkers, age, gender and a diagnostic indicator from a plurality of patient records and wherein the classifier has a performance based on a Receiver Operator Characteristic (ROC) curve of a sensitivity value of at least 0.8 and a specificity value of at least 0.8; and providing a notification to a user for diagnostic testing.
  • the present classifier model comprises the following importance factor for each variable, and for each gender.
  • Example 1B Improvement of a Multi-Marker Model for Classifying Asymptomatic Patients as to Developing Cancer: Inclusion of Clinical Factor “Age” in Model
  • the classifier model using only measured sera biomarkers helped 1 in 125-200 males whereas 1 in 4-7 were harmed (false diagnosis); and, 1 in 200-333 females were helped whereas 1 in 3-8 females were harmed.
  • age was used in the present classifier model along with the measured sera biomarkers AFP, CEA, CA19-9, CYFRA 21-1 and SCC along with PSA for men and CA 15-3 and CA125 for women.
  • Table 1 shows a comparison of various models that includes all 6 biomarkers (AFP, CEA, CA19-9, CYFRA21-1, PSA and SCC) and age, wherein the classifier model performance was significantly increased with a sensitivity value of at least 0.8 and a specificity value of at least 0.8 (of a ROC curve).
  • Example 2 Development of a Model for Predicting Organ System-Based Malignancy for Individuals in the “High Risk” and “Moderate Risk” Category Based on the Pan Cancer Test
  • Provided herein are techniques for predicting organ system-based malignancy for a patient with an increased risk of having cancer as identified in Example 1. That information can then be used to refer patients to a specialist for more invasive diagnostic testing.
  • a k-Nearest Neighbors algorithm (kNN) was used to determine the top three organ systems most likely to be involved in cancer in the “moderate risk” and “high risk” classified groups; the test had a sensitivity of 81% and a specificity of 72%.
  • a method for predicting organ system-based malignancy for a patient with an increased risk of having cancer comprising: measuring values of a panel of biomarkers in a sample from a patient; obtaining clinical parameters from the patient including age and gender; utilizing a machine learning system to classify patient with an increased risk of having or developing cancer into an appropriate category, to identify at least one most likely organ system malignancy for that patient, wherein the classifier provides a class membership, and wherein the classifier is generated using a panel of at least six biomarkers, age, gender and a diagnostic indicator from a plurality of patient records and wherein the classifier has a performance based on a Receiver Operator Characteristic (ROC) curve of a sensitivity value of at least 0.8 and a specificity value of at least 0.7; and, providing a notification to a user for diagnostic testing.
  • Example 3 Screening Patients for Likelihood of Developing Cancer and Predicting Most Likely Organ Involved in Cancer Using a Two-Step Model
  • a method for predicting organ system-based malignancy for a patient with an increased risk of having cancer wherein a model trained from the cohort in Example 1 is applied to the measured panel of biomarkers and the clinical factors of age and gender to identify those patients with an increased risk of having or developing cancer; the pan cancer test.
  • the model trained using the cohort of Example 2 is applied to the measured panel of biomarkers and the clinical factors of age and gender to provide a class membership (e.g. the organ system most likely (or top 2 or 3 organ systems)) to be involved in the cancer; the organ system-based malignancy test.
  • the trained model predicts the top three organ systems.
  • the output of the model may provide a class membership in one organ system (wherein the top three organ systems are all the same), in two organ systems (wherein two of the top three organ systems are the same) or in three organ systems (wherein the top three organ system predicted by the model are all different). See Table 6 for a list of organ systems (class membership) and representative cancer types within each class.
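One possible reading of the "top three" output is sketched below: the organ-system labels of the three nearest training patients in a kNN model. Under that reading, the three labels can coincide, so class membership spans one, two or three organ systems, as described above. This interpretation is an assumption. The sketch further assumes the following:

- features are standardized biomarker values plus age;
- distance is Euclidean;
- the labels ("GI", "Lung", "Breast") are placeholders for the classes of Table 6.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (standardized feature vector, organ-system label)


def three_nearest_labels(train: List[Example], x: Sequence[float], k: int = 3) -> List[str]:
    """Organ-system labels of the k training patients closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    return [label for _, label in nearest]


def class_membership(labels: List[str]) -> List[str]:
    """Distinct organ systems among the neighbours, most frequent first (1, 2 or 3 systems)."""
    return [label for label, _ in Counter(labels).most_common()]
```

For example, if the three nearest patients are labelled GI, GI and Lung, the class membership is two organ systems: GI first, then Lung.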
  • asymptomatic patients (5 male and 3 female) were first screened using the pan cancer test according to Example 1, and then those categorized as moderate or high risk were further screened using the organ system-based malignancy test according to Example 2.
  • Health history: hypertension, diabetes, chronic pancreatitis, colorectal polyps, Crohn's disease, ulcerative colitis, COPD, chronic bronchitis, emphysema, etc.
  • Cancer screening history: colonoscopy, sigmoidoscopy, mammogram, X-ray or CT scan for lung cancer, PAP/HPV test
  • for a male patient with a probability value categorized as low risk, fewer than 1% of individuals with a probability value in that range will likely be found to have cancer. That risk level is no different from that of the general heterogeneous population; in other words, the low risk category represents no increased risk for a male patient as compared to baseline.
  • for a male patient with a probability value categorized as moderate risk, approximately 5 out of 100 individuals with a probability value in that range were diagnosed with cancer within one year of having biomarkers measured. That corresponds to a risk of approximately 5% of having or developing cancer within one year, or a five-fold (5×) increase as compared to the low risk category.
  • for a probability value categorized as high risk, approximately 10 out of 100 individuals with a probability value in that range were diagnosed with cancer within one year of having those biomarkers measured. That corresponds to a risk of approximately 10% of having or developing cancer within one year, or a ten-fold (10×) increase as compared to the low risk category.
  • the current iteration of the application of the pan cancer test model provides the following probability ranges for each category for female patients:
  • for a female patient with a probability value categorized as low risk, fewer than 1% of individuals with a probability value in that range will likely be found to have cancer. That risk level is no different from that of the general heterogeneous population; in other words, the low risk category represents no increased risk for a female patient as compared to baseline.
  • for a female patient with a probability value categorized as moderate risk, approximately 2 out of 100 individuals with a probability value in that range were diagnosed with cancer within one year of having biomarkers measured. That corresponds to a risk of approximately 2% of having or developing cancer within one year, or a two-fold (2×) increase as compared to the low risk category.
  • for a female patient with a probability value categorized as high risk, approximately 8 out of 100 individuals with a probability value in that range were diagnosed with cancer within one year of having those biomarkers measured. That corresponds to a risk of approximately 8% of having or developing cancer within one year, or an eight-fold (8×) increase as compared to the low risk category.
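The fold increases quoted for each category are simple ratios of one-year cancer rates to the roughly 1% low-risk baseline. The sketch below reproduces that arithmetic using the approximate rates stated above. Treating the baseline as exactly 1 per 100 is a simplification, since the low-risk rate is described as "less than 1%".

```python
from fractions import Fraction

# Approximate one-year cancer rates stated above, as cases per 100 individuals.
REPORTED_RATES = {
    ("male", "moderate"): 5, ("male", "high"): 10,
    ("female", "moderate"): 2, ("female", "high"): 8,
}
BASELINE_PER_100 = 1  # low-risk / general-population reference rate (about 1%)


def fold_increase(cases_per_100: int, baseline_per_100: int = BASELINE_PER_100) -> Fraction:
    """Risk in a category relative to the low-risk baseline."""
    return Fraction(cases_per_100, 100) / Fraction(baseline_per_100, 100)


for (sex, category), cases in REPORTED_RATES.items():
    print(f"{sex:6} {category:8} {cases}% risk -> {fold_increase(cases)}x baseline")
```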
  • the trained pattern recognition model of Example 2 was applied to the high and moderate risk male patients and the high-risk female patient. Those same variables of FIG. 3 were used as input for the organ system-based malignancy test model.
  • the output, a class membership in an organ system that represents a group of cancer types, may be used to suggest a specialist for follow-up care, which may include radiography or invasive diagnostic tests.
  • a method for predicting organ system-based malignancy for a patient with an increased risk of having cancer that utilizes a two-step machine learning process wherein a first machine learning model is applied using measured sera biomarkers and age as input variables, wherein gender is used to select the measured biomarkers and to train the classifier, to categorize patients as low risk (no increased risk) or moderate or high risk wherein the latter two categories represent an increased risk of having or developing cancer within one year as compared to baseline (low risk). For those patients categorized as moderate or high risk a second machine learning classifier is applied using the measured biomarkers, age and gender as input variables and providing a class membership for an organ system that represents a number of different cancer types.
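The two-step process above can be expressed as a small orchestration function. Both models are passed in as callables; their internals (e.g., the classifiers sketched earlier) are outside this sketch. Gender selects the first-step model, following the separate male and female classifiers described above. The dataclass, the type aliases and the stub models in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Patient:
    biomarkers: Dict[str, float]  # measured serum biomarker values
    age: float
    gender: str                   # "male" or "female"


RiskModel = Callable[[Dict[str, float], float], float]            # -> probability of cancer
OrganModel = Callable[[Dict[str, float], float, str], List[str]]  # -> organ-system membership


def two_step_screen(patient: Patient, risk_models: Dict[str, RiskModel],
                    organ_model: OrganModel,
                    cutoff: float = 0.5) -> Tuple[float, Optional[List[str]]]:
    # Step 1: gender-specific pan cancer model; inputs are biomarkers and age.
    p = risk_models[patient.gender](patient.biomarkers, patient.age)
    if p < cutoff:
        return p, None  # low risk: the organ-system step is not run
    # Step 2: organ-system model; inputs are biomarkers, age and gender.
    return p, organ_model(patient.biomarkers, patient.age, patient.gender)
```

For example, with stub models `{"male": lambda b, a: 0.7, "female": lambda b, a: 0.2}` and `lambda b, a, g: ["GI"]`, a male patient is routed to the second step, while a female patient stops after step one.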
  • a method for predicting organ system-based malignancy for a patient with an increased risk of having cancer comprising: a) measuring values of a panel of biomarkers in a sample from a patient; b) obtaining clinical parameters from the patient including age and gender; c) utilizing a first classifier generated by a machine learning system to classify the patient into a low risk, moderate risk or high risk of having or developing cancer, wherein the classifier provides a probability value and those individuals with a probability of 0.5 or greater are classified as moderate risk or high risk, and wherein the classifier is generated using a panel of at least six biomarkers, age, gender and a diagnostic indicator from a plurality of patient records; utilizing a second classifier generated by a machine learning system, when a patient is classified into a medium or high risk category of developing cancer in step c), to identify at least one most likely organ system malignancy for that patient, wherein the classifier provides a class membership, and wherein the classifier is generated using a panel of
  • the machine learning system comprises one or more machine learning processors.
  • the machine learning processors are deep learning processors.
  • the one or more deep learning processors train one or more classification models using training data.
  • the machine learning system generates one or more classifiers to predict a likelihood of having cancer or developing cancer, of class membership, or of both.
  • the machine learning model may comprise one or more classifiers, one or more inputs, and one or more weighting factors for weighting of the inputs, along with one or more classification models.
  • the machine learning model may be continuously improved as new training data is available.
  • Example 4 Male Classifier Model is Superior to a Single Threshold Method of Measuring Biomarkers for Prediction of Cancer
  • Provided herein is a demonstration that the present male classifier model, as developed in Example 1, is significantly better at predicting cancer development within one year than measurement of a panel of individual biomarkers from the same subjects.
  • the present methods and classifier models aggregate biomarker measurements and clinical factors, such as age, to predict a patient's cancer risk, whereas previous methods may measure the same panel of markers but predict, or deem a patient an increased risk for developing cancer, if any one measured biomarker is “high”.
  • any one biomarker above a threshold deemed to be clinically relevant would indicate a positive test for an increased risk of developing cancer.
  • Table 8 below provides a normal range for well-validated tumor markers; measurement of a given marker above the normal range would indicate an increased likelihood of developing cancer.
  • the present male classifier model according to Example 1, and used in Example 3, provides a significant improvement in sensitivity and specificity for predicting cancer as compared to "any marker high" methods. See FIG. 5.
  • Table 8
    Biomarker | Normal range | Cancers
    AFP | ≤ 8.3 ng/ml | Liver, testicular and ovarian cancers
    CA 19-9 | ≤ 35 U/ml | Pancreatic, colorectal, stomach, liver and bile duct cancers
    CEA | ≤ 4.7 ng/ml (non-smokers); ≤ 5.6 ng/ml (smokers) | Colorectal, pancreatic and gastrointestinal cancers; lung cancer
    CYFRA 21-1 | ≤ 3.3 ng/ml | Lung, head and neck (H&N), uterine, esophageal and bladder cancers; mesothelioma; some lymphomas and sarcomas
    PSA | ≤ 4 ng/ml | Prostate cancer
  • the present male classifier model provides a substantial improvement in diagnostic accuracy over conventional methods, e.g., "any marker high" methods; sensitivity is improved such that 2× more cancers in males are detected. Moreover, the present male classifier model was able to distinguish cancers from noncancers with 82% sensitivity and 81% specificity. See FIG. 6. In this figure, the cutoff between low risk and moderate or high risk was 50, or 0.5. The risk score may be provided on a scale from 0 to 1 or from 0 to 100.
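The "any marker high" comparator described above reduces to a per-marker threshold check against the upper limits in Table 8. The following is a minimal Python sketch. It assumes each marker is reported in the units of Table 8 and reads "above the normal range" as strictly greater than the upper limit. The function names `any_marker_high` and `sensitivity_specificity` are illustrative, not taken from the source. The sensitivity and specificity helper uses the standard confusion-matrix definitions.

```python
from typing import Mapping, Sequence, Tuple

# Upper limits of the normal range for the male panel, from Table 8.
UPPER_LIMITS = {
    "AFP": 8.3,         # ng/ml
    "CA 19-9": 35.0,    # U/ml
    "CYFRA 21-1": 3.3,  # ng/ml
    "PSA": 4.0,         # ng/ml
}
# CEA has a smoker-specific limit: 4.7 ng/ml (non-smokers), 5.6 ng/ml (smokers).
CEA_LIMIT = {False: 4.7, True: 5.6}


def any_marker_high(values: Mapping[str, float], smoker: bool) -> bool:
    """Single-threshold rule: positive if any measured marker exceeds its upper limit.

    Markers absent from `values` are simply skipped.
    """
    limits = dict(UPPER_LIMITS)
    limits["CEA"] = CEA_LIMIT[smoker]
    return any(values[name] > limit for name, limit in limits.items() if name in values)


def sensitivity_specificity(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp / (tp + fn), tn / (tn + fp)
```

To compare the classifier with this baseline on the same subjects, its scores could be dichotomized at the 0.5 cutoff (for example `[s >= 0.5 for s in scores]`). Both sets of calls could then be passed to `sensitivity_specificity` together with the one-year cancer outcomes.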
  • Example 5 Female Classifier Model is Superior to a Single Threshold Method of Measuring Biomarkers for Prediction of Cancer
  • the present female classifier model as developed in Example 1, is significantly better at predicting cancer development within one year than measurement of a panel of individual biomarkers from the same subjects.
  • the present female classifier model improves on the individual-biomarker "single threshold" method, with a 4-fold increase in sensitivity as compared to the single threshold method.
  • the present female classifier model identifies 4× more cancers in female patients than the conventional "any marker high" methods. See FIG. 7.
  • Table 9 below provides a normal range for well-validated tumor markers; using conventional methods, measurement of a given marker above the normal range would indicate an increased likelihood of developing cancer.
  • Table 9
    Biomarker | Normal range | Cancers
    AFP | ≤ 8.3 ng/ml | Liver, testicular and ovarian cancers
    CA 19-9 | ≤ 35 U/ml | Pancreatic, colorectal, stomach, liver and bile duct cancers
    CEA | ≤ 4.7 ng/ml (non-smokers); ≤ 5.6 ng/ml (smokers) | Colorectal, pancreatic and gastrointestinal cancers; lung cancer
    CYFRA 21-1 | ≤ 3.3 ng/ml | Lung, head and neck (H&N), uterine, esophageal and bladder cancers; mesothelioma; some lymphomas and sarcomas
    CA 125 | ≤ 38 U/ml | Ovarian and lung cancers
    CA15-3 | ≤ 25 U/ml | Breast cancer
  • the present female classifier model provides a substantial improvement in diagnostic accuracy over conventional methods, e.g., "any marker high" methods; sensitivity is improved such that 4× more cancers in females are detected. Moreover, the present female classifier model was able to distinguish cancers from noncancers with 50% sensitivity and 74% specificity. See FIG. 8. In this figure, the cutoff between low risk and moderate or high risk was 50, or 0.5.
  • the risk score may be provided on a scale from 0 to 1 or from 0 to 100, or as X out of 100, where X is the number out of every 100 patients in the population used to develop the algorithm who scored at or above the patient's score and were diagnosed with cancer within one year of having these biomarkers tested.
  • a heterogeneous population has a cancer incidence of 1 out of 100; accordingly, a risk score of 1 out of 100 is considered normal risk, i.e., not an increased risk.
  • a risk score of 2 out of 100 or greater classifies a patient in an increased risk category.
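The "X out of 100" form of the risk score described above can be read as an empirical positive predictive value. Among development-population patients who scored at or above a given score, it is the fraction who were diagnosed with cancer within one year, scaled to 100. The Python sketch below rests on that reading. It assumes the development scores and one-year outcomes are available as parallel sequences. The names `risk_out_of_100` and `is_increased_risk` are hypothetical. The cutoffs (1 out of 100 as normal, 2 out of 100 or greater as increased) are the ones stated above.

```python
from typing import Sequence

BASELINE_PER_100 = 1.0        # cancer incidence in a heterogeneous population
INCREASED_RISK_PER_100 = 2.0  # "2 out of 100 or greater" -> increased risk


def risk_out_of_100(score: float, dev_scores: Sequence[float], dev_cancer: Sequence[bool]) -> float:
    """Per 100 development patients scoring at or above `score`, how many
    were diagnosed with cancer within one year (an empirical PPV x 100)."""
    if len(dev_scores) != len(dev_cancer):
        raise ValueError("scores and outcomes must have the same length")
    at_or_above = [c for s, c in zip(dev_scores, dev_cancer) if s >= score]
    if not at_or_above:
        raise ValueError("no development patients scored at or above this score")
    return 100.0 * sum(at_or_above) / len(at_or_above)


def is_increased_risk(risk_per_100: float) -> bool:
    """Apply the stated cutoff: 2 out of 100 or greater is increased risk."""
    return risk_per_100 >= INCREASED_RISK_PER_100
```

Because the denominator counts everyone at or above the patient's score, X can only rise as the score rises if the classifier ranks patients well. This is what makes the "X out of 100" wording interpretable as a patient-facing risk.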
  • Example 6 Screening Patients for Likelihood of Developing Cancer and Identifying Patients with an Increased Risk of Developing Cancer when all Measured Biomarkers are in the Normal Range
  • this method and the present classifier model use, as input variables, measured biomarker values that are within the normal clinical range; the pan cancer classifier model classifies the patient in an increased risk category, using age and the measured values of a panel of biomarkers from the patient as input variables, when an output of the first classifier model is above a threshold.
  • four asymptomatic patients (2 male and 2 female) were screened using the pan cancer test according to Example 1 and Example 3.
  • for the two male patients, the biomarkers of Table 8 were all measured within the normal range; however, the present male classifier model classified both patients in an increased risk category using a threshold of 1% (the cancer rate in a heterogeneous population).
  • One patient (mp #1) was classified as having an increased risk of having cancer of 5 out of 100 (positive predictive value), and the other (mp #2) as having an increased risk of 12 out of 100.
  • Mp #1 was subsequently diagnosed with stage 1 liver cancer and mp #2 was subsequently diagnosed with stage 1 bladder cancer.
  • the present male classifier model classified the male patients at high risk, even though tumor markers that are all low would normally not raise concern.
  • for the two female patients, the biomarkers of Table 9 were all measured within the normal range; however, the present female classifier model classified both patients in an increased risk category using a threshold of 1% (the cancer rate in a heterogeneous population).
  • One patient (fp #1) was classified as having an increased risk of having cancer of 2 out of 100 (positive predictive value), and the other (fp #2) as having an increased risk of 3 out of 100.
  • Fp #1 was subsequently diagnosed with stage 1B lung cancer and fp #2 was subsequently diagnosed with stage 2B breast cancer.
  • the present female classifier model classified the female patients at high risk, even though tumor markers that are all low would normally not raise concern.
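The four Example 6 cases can be checked against the 1% threshold with simple arithmetic. The Python sketch below uses only the risk scores reported above. Expressing each score as a multiple of the 1-in-100 population baseline is our own illustrative framing; the source does not state it. The names `REPORTED_RISK_PER_100` and `fold_over_baseline` are hypothetical. Because every tumor marker in these patients was within its normal range, the "any marker high" rule would have been negative for all four.

```python
# Risk scores reported in Example 6, expressed as X out of 100. For all four
# patients every tumor marker was within its normal range, so the
# "any marker high" rule would have been negative.
REPORTED_RISK_PER_100 = {"mp #1": 5, "mp #2": 12, "fp #1": 2, "fp #2": 3}

# Cancer incidence in a heterogeneous population: 1 out of 100 (1%).
BASELINE_PER_100 = 1


def fold_over_baseline(risk_per_100: float, baseline: float = BASELINE_PER_100) -> float:
    """How many times the population baseline a patient's risk score is."""
    return risk_per_100 / baseline


for patient, risk in REPORTED_RISK_PER_100.items():
    increased = risk > BASELINE_PER_100  # above the 1% threshold
    print(f"{patient}: {risk}/100 = {fold_over_baseline(risk):.0f}x baseline; increased risk: {increased}")
```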

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Epidemiology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioethics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Chemical & Material Sciences (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
US16/458,589 2018-06-30 2019-07-01 Cancer classifier models, machine learning systems and methods of use Pending US20200005901A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/458,589 US20200005901A1 (en) 2018-06-30 2019-07-01 Cancer classifier models, machine learning systems and methods of use
US18/213,882 US20240040068A1 (en) 2018-10-29 2023-06-25 Fast and/or slow motion compensating timer display

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862692683P 2018-06-30 2018-06-30
US16/458,589 US20200005901A1 (en) 2018-06-30 2019-07-01 Cancer classifier models, machine learning systems and methods of use

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US16/173,033 Continuation US10388322B1 (en) 2018-10-29 2018-10-29 Real time video special effects system and method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/793,747 Continuation-In-Part US11218646B2 (en) 2018-10-29 2020-02-18 Real time video special effects system and method

Publications (1)

Publication Number Publication Date
US20200005901A1 true US20200005901A1 (en) 2020-01-02

Family

ID=68987635

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/458,589 Pending US20200005901A1 (en) 2018-06-30 2019-07-01 Cancer classifier models, machine learning systems and methods of use

Country Status (4)

Country Link
US (1) US20200005901A1 (ja)
JP (1) JP7431760B2 (ja)
CN (1) CN112970067A (ja)
WO (1) WO2020006547A1 (ja)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111222575A (zh) * 2020-01-07 2020-06-02 北京联合大学 KLXS multi-model fusion method and system based on HRRP target recognition
US20200185059A1 (en) * 2018-12-10 2020-06-11 Grail, Inc. Systems and methods for classifying patients with respect to multiple cancer classes
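CN111276243A (zh) * 2020-01-22 2020-06-12 首都医科大学附属北京佑安医院 Biomarker-based multivariate classification system and method
CN111584064A (zh) * 2020-03-27 2020-08-25 湖州市中心医院 Colon and rectal cancer metastasis prediction system and method of use thereof
CN111583993A (zh) * 2020-05-29 2020-08-25 杭州广科安德生物科技有限公司 Method for constructing a mathematical model for in vitro cancer detection, and application thereof
CN112259221A (zh) * 2020-10-21 2021-01-22 北京大学第一医院 Lung cancer diagnosis system based on multiple machine learning algorithms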
US20210057100A1 (en) * 2019-08-22 2021-02-25 Kenneth Neumann Methods and systems for generating a descriptor trail using artificial intelligence
US20210057099A1 (en) * 2019-08-22 2021-02-25 Kenneth Neumann Methods and systems for generating a descriptor trail using artificial intelligence
CN112652361A (zh) * 2020-12-29 2021-04-13 中国医科大学附属盛京医院 High-risk myeloma screening method based on a GBDT model, and application thereof
US20210241046A1 (en) * 2019-11-26 2021-08-05 University Of North Texas Compositions and methods for cancer detection and classification using neural networks
WO2021206925A1 (en) * 2020-04-06 2021-10-14 General Genomics, Inc. Predicting susceptibility of living organisms to medical conditions using machine learning models
CN113539493A (zh) * 2021-06-23 2021-10-22 吾征智能技术(北京)有限公司 System for inferring cancer risk probability using multimodal risk factors
US20210345925A1 (en) * 2018-09-21 2021-11-11 Carnegie Mellon University A data processing system for detecting health risks and causing treatment responsive to the detection
WO2021247577A1 (en) * 2020-06-01 2021-12-09 2020 Genesystems Methods and software systems to optimize and personalize the frequency of cancer screening blood tests
CN113913518A (zh) * 2021-08-31 2022-01-11 广州市金域转化医学研究院有限公司 Typing markers for mature B-cell tumors and application thereof
WO2022015700A1 (en) * 2020-07-13 2022-01-20 20/20 GeneSystems Universal pan cancer classifier models, machine learning systems and methods of use
US20220084632A1 (en) * 2019-06-27 2022-03-17 Veracyte, Inc. Clinical classfiers and genomic classifiers and uses thereof
CN114974589A (zh) * 2022-06-10 2022-08-30 燕山大学 Cervical cancer prediction method
US11475302B2 (en) * 2019-04-05 2022-10-18 Koninklijke Philips N.V. Multilayer perceptron based network to identify baseline illness risk
US11487608B2 (en) * 2018-12-11 2022-11-01 Rovi Guides, Inc. Entity resolution framework for data matching
WO2022251633A1 (en) * 2021-05-28 2022-12-01 University Of Southern California A radiomic-based machine learing algorithm to reliably differentiate benign renal masses from renal carcinoma
WO2022241264A3 (en) * 2021-05-13 2023-01-26 Arizona Board Of Regents On Behalf Of The University Of Arizona Method of targeted multi-panel approach and tiered a.i. use for differential diagnosis and prognosis
CN116259414A (zh) * 2023-05-09 2023-06-13 南京诺源医疗器械有限公司 Metastatic lymph node discrimination model, construction method and application
US20230207128A1 (en) * 2021-12-29 2023-06-29 AiOnco, Inc. Processing encrypted data for artificial intelligence-based analysis
US20230243830A1 (en) * 2020-10-05 2023-08-03 Freenome Holdings, Inc. Markers for the early detection of colon cell proliferative disorders
CN116779179A (zh) * 2023-08-22 2023-09-19 聊城市第二人民医院 Support vector machine-based renal cell tumor background information analysis system
US11783915B2 (en) 2018-06-01 2023-10-10 Grail, Llc Convolutional neural network systems and methods for data classification
TWI818203B (zh) * 2020-10-23 2023-10-11 國立臺灣大學醫學院附設醫院 Method for establishing a classification model based on patient condition
US11817214B1 (en) 2019-09-23 2023-11-14 FOXO Labs Inc. Machine learning model trained to determine a biochemical state and/or medical condition using DNA epigenetic data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11621080B2 (en) * 2014-12-08 2023-04-04 20/20 GeneSystems Methods and machine learning systems for predicting the likelihood or risk of having cancer

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5983211A (en) * 1996-01-24 1999-11-09 Heseltine; Gary L. Method and apparatus for the diagnosis of colorectal cancer
US20090061422A1 (en) * 2005-04-19 2009-03-05 Linke Steven P Diagnostic markers of breast cancer treatment and progression and methods of use thereof
KR101401561B1 (ko) * 2010-12-30 2014-06-11 주식회사 바이오인프라 Method for generating cancer diagnosis information using complex biomarkers, and cancer diagnosis prediction system apparatus
IL278227B (en) * 2011-04-29 2022-07-01 Cancer Prevention & Cure Ltd Data classification systems for identifying biomarkers and diagnosing diseases
US9753043B2 (en) * 2011-12-18 2017-09-05 20/20 Genesystems, Inc. Methods and algorithms for aiding in the detection of cancer
US9753037B2 (en) * 2013-03-15 2017-09-05 Rush University Medical Center Biomarker panel for detecting lung cancer
WO2015066564A1 (en) * 2013-10-31 2015-05-07 Cancer Prevention And Cure, Ltd. Methods of identification and diagnosis of lung diseases using classification systems and kits thereof
DK3071973T3 (da) * 2013-11-21 2021-01-11 Pacific Edge Ltd Triage of patients with asymptomatic hematuria using genotype and phenotype biomarkers
TWI630501B (zh) * 2016-07-29 2018-07-21 長庚醫療財團法人林口長庚紀念醫院 Establishment of a cancer prediction model and a method for analyzing cancer detection results in combination with a tumor marker set

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Cairns, S. R., British Society of Gastroenterology, & Association of Coloproctology for Great Britain and Ireland (2010). Guidelines for colorectal cancer screening and surveillance in moderate and high risk groups (update from 2002). Gut, 59(5), 666–689. (Year: 2010) *
Kovalchik, Stephanie A., et al. "A regression model for risk difference estimation in population-based case–control studies clarifies gender differences in lung cancer risk of smokers and never smokers." BMC medical research methodology 13.1 (2013): 1-8 (Year: 2013) *
Prescott, Eva, et al. "Gender and smoking-related risk of lung cancer." Epidemiology (1998): 79-83 (Year: 1998) *
Wen, Y. H., Chang, P. Y., Hsu, C. M., Wang, H. Y., Chiu, C. T., & Lu, J. J. (2015). Cancer screening through a multi-analyte serum biomarker panel during health check-up examinations: Results from a 12-year experience. Clinica chimica acta; international journal of clinical chemistry, 450, 273–276 (Year: 2015) *
Yan, S., Qian, W., Guan, Y., & Zheng, B. (2016). Improving lung cancer prognosis assessment by incorporating synthetic minority oversampling technique and score fusion method. Medical physics, 43(6) (Year: 2016) *

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11783915B2 (en) 2018-06-01 2023-10-10 Grail, Llc Convolutional neural network systems and methods for data classification
US20210345925A1 (en) * 2018-09-21 2021-11-11 Carnegie Mellon University A data processing system for detecting health risks and causing treatment responsive to the detection
US20200185059A1 (en) * 2018-12-10 2020-06-11 Grail, Inc. Systems and methods for classifying patients with respect to multiple cancer classes
US11581062B2 (en) * 2018-12-10 2023-02-14 Grail, Llc Systems and methods for classifying patients with respect to multiple cancer classes
US11487608B2 (en) * 2018-12-11 2022-11-01 Rovi Guides, Inc. Entity resolution framework for data matching
US11475302B2 (en) * 2019-04-05 2022-10-18 Koninklijke Philips N.V. Multilayer perceptron based network to identify baseline illness risk
US20220084632A1 (en) * 2019-06-27 2022-03-17 Veracyte, Inc. Clinical classfiers and genomic classifiers and uses thereof
US20210057100A1 (en) * 2019-08-22 2021-02-25 Kenneth Neumann Methods and systems for generating a descriptor trail using artificial intelligence
US20210057099A1 (en) * 2019-08-22 2021-02-25 Kenneth Neumann Methods and systems for generating a descriptor trail using artificial intelligence
US11810669B2 (en) * 2019-08-22 2023-11-07 Kenneth Neumann Methods and systems for generating a descriptor trail using artificial intelligence
US11581094B2 (en) * 2019-08-22 2023-02-14 Kpn Innovations, Llc. Methods and systems for generating a descriptor trail using artificial intelligence
US11817214B1 (en) 2019-09-23 2023-11-14 FOXO Labs Inc. Machine learning model trained to determine a biochemical state and/or medical condition using DNA epigenetic data
US20210241046A1 (en) * 2019-11-26 2021-08-05 University Of North Texas Compositions and methods for cancer detection and classification using neural networks
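CN111222575A (zh) * 2020-01-07 2020-06-02 北京联合大学 KLXS multi-model fusion method and system based on HRRP target recognition
CN111276243A (zh) * 2020-01-22 2020-06-12 首都医科大学附属北京佑安医院 Biomarker-based multivariate classification system and method
CN111584064A (zh) * 2020-03-27 2020-08-25 湖州市中心医院 Colon and rectal cancer metastasis prediction system and method of use thereof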
WO2021206925A1 (en) * 2020-04-06 2021-10-14 General Genomics, Inc. Predicting susceptibility of living organisms to medical conditions using machine learning models
CN111583993A (zh) * 2020-05-29 2020-08-25 杭州广科安德生物科技有限公司 Method for constructing a mathematical model for in vitro cancer detection, and application thereof
WO2021247577A1 (en) * 2020-06-01 2021-12-09 2020 Genesystems Methods and software systems to optimize and personalize the frequency of cancer screening blood tests
WO2022015700A1 (en) * 2020-07-13 2022-01-20 20/20 GeneSystems Universal pan cancer classifier models, machine learning systems and methods of use
US20230243830A1 (en) * 2020-10-05 2023-08-03 Freenome Holdings, Inc. Markers for the early detection of colon cell proliferative disorders
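CN112259221A (zh) * 2020-10-21 2021-01-22 北京大学第一医院 Lung cancer diagnosis system based on multiple machine learning algorithms
TWI818203B (zh) * 2020-10-23 2023-10-11 國立臺灣大學醫學院附設醫院 Method for establishing a classification model based on patient condition
CN112652361A (zh) * 2020-12-29 2021-04-13 中国医科大学附属盛京医院 High-risk myeloma screening method based on a GBDT model, and application thereof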
WO2022241264A3 (en) * 2021-05-13 2023-01-26 Arizona Board Of Regents On Behalf Of The University Of Arizona Method of targeted multi-panel approach and tiered a.i. use for differential diagnosis and prognosis
WO2022251633A1 (en) * 2021-05-28 2022-12-01 University Of Southern California A radiomic-based machine learing algorithm to reliably differentiate benign renal masses from renal carcinoma
CN113539493A (zh) * 2021-06-23 2021-10-22 吾征智能技术(北京)有限公司 System for inferring cancer risk probability using multimodal risk factors
CN113913518A (zh) * 2021-08-31 2022-01-11 广州市金域转化医学研究院有限公司 Typing markers for mature B-cell tumors and application thereof
US20230207128A1 (en) * 2021-12-29 2023-06-29 AiOnco, Inc. Processing encrypted data for artificial intelligence-based analysis
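CN114974589A (zh) * 2022-06-10 2022-08-30 燕山大学 Cervical cancer prediction method
CN116259414A (zh) * 2023-05-09 2023-06-13 南京诺源医疗器械有限公司 Metastatic lymph node discrimination model, construction method and application
CN116779179A (zh) * 2023-08-22 2023-09-19 聊城市第二人民医院 Support vector machine-based renal cell tumor background information analysis system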

Also Published As

Publication number Publication date
JP7431760B2 (ja) 2024-02-15
CN112970067A (zh) 2021-06-15
JP2021529954A (ja) 2021-11-04
WO2020006547A1 (en) 2020-01-02

Similar Documents

Publication Publication Date Title
JP7431760B2 (ja) Cancer classifier models, machine learning systems, and methods of use
US20240112811A1 (en) Methods and machine learning systems for predicting the likelihood or risk of having cancer
Xiao et al. Comparison and development of machine learning tools in the prediction of chronic kidney disease progression
JP7250693B2 (ja) Plasma-based protein profiling for early-stage lung cancer diagnosis
US20230263477A1 (en) Universal pan cancer classifier models, machine learning systems and methods of use
US20190072554A1 (en) Methods of Identification and Diagnosis of Lung Diseases Using Classification Systems and Kits Thereof
Ostrin et al. Contribution of a blood-based protein biomarker panel to the classification of indeterminate pulmonary nodules
Kiessling The changing face of cancer diagnosis: from computational image analysis to systems biology
US20230243830A1 (en) Markers for the early detection of colon cell proliferative disorders
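CN113270188A (zh) Method and device for constructing a prognosis prediction model for patients after radical resection of esophageal squamous cell carcinoma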
Rashid et al. Artificial intelligence in acute respiratory distress syndrome: A systematic review
Tang et al. Diagnosis of hepatocellular carcinoma based on salivary protein glycopatterns and machine learning algorithms
CA3202255A1 (en) Markers for the early detection of colon cell proliferative disorders
He et al. A novel clinical model for predicting malignancy of solitary pulmonary nodules: a multicenter study in Chinese population
Wang et al. Survival risk prediction model for ESCC based on relief feature selection and CNN
US20230223145A1 (en) Methods and software systems to optimize and personalize the frequency of cancer screening blood tests
Popa et al. A new approach to predict ulcerative colitis activity through standard clinical–biological parameters using a robust neural network model
US20130080101A1 (en) System, method and computer-accessible medium for evaluating a malignancy status in at-risk populations and during patient treatment management
Kanellakis et al. Management of incidental nodules in lung cancer screening: ready for prime-time?
Yadav et al. Artificial Intelligence: A Promising Tool in Diagnosis of Respiratory Diseases
Nayak et al. Computational Intelligence in Cancer Diagnosis: Progress and Challenges
Liu et al. Detection of Nasopharyngeal Carcinoma Using Routine Medical Tests via Machine Learning
CN117831690A (zh) Computer-implemented method for quantifying abnormal signals in a blood sample under test
Gray Validating and Updating Lung Cancer Prediction Models
CN115862838A (zh) Cholangiocarcinoma diagnosis model based on a machine learning algorithm, and construction method and application thereof

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: ADVISORY ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED