WO2022047352A1 - Method for early treatment and detection of women specific cancers - Google Patents

Info

Publication number
WO2022047352A1
WO2022047352A1 PCT/US2021/048337 US2021048337W WO2022047352A1 WO 2022047352 A1 WO2022047352 A1 WO 2022047352A1 US 2021048337 W US2021048337 W US 2021048337W WO 2022047352 A1 WO2022047352 A1 WO 2022047352A1
Authority
WO
WIPO (PCT)
Prior art keywords
samples
diseased
cancer
cancerous
metabolite
Prior art date
Application number
PCT/US2021/048337
Other languages
French (fr)
Inventor
Kanury Venkata Subba Rao
Rajat ANAND
Najmuddin Mohd Saquib
Zaved SIDDIQUI
Ganga SAGAR
Sujata Nayak
Ankur Gupta
Original Assignee
Predomix, Inc
Priority date
Filing date
Publication date
Application filed by Predomix, Inc
Publication of WO2022047352A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G16H20/40 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to mechanical, radiation or invasive therapies, e.g. surgery, laser therapy, dialysis or acupuncture

Definitions

  • The present invention relates to the field of metabolomics and, more particularly, to metabolite bio-signatures and their use for the detection of early-stage cancers in women.
  • Cancer is a leading cause of death worldwide, with the disease burden expanding in countries of all income levels due to growth and aging.
  • Cancer is the second most important cause of death globally, with about 7 million new cases and over 3.5 million deaths being recorded each year (Cancer Epidemiol Biomarkers Prev 26(4). doi: 10.1158/1055-9965.EPI-16-0858).
  • The greatest numbers of cancer cases and deaths among females are in Eastern Asia, followed by North America and South-Central Asia.
  • The leading female-specific cancers are breast cancer, cervical cancer, uterine or endometrial cancer, and ovarian cancer. Of these, breast cancer is the most frequently diagnosed and accounts for 25% of cancer cases, along with 15% of cancer-related deaths among women across the world (CancerBase No.
  • Cervical cancer is the fourth most frequently diagnosed cancer in women, with an estimate of over 500,000 cases worldwide.
  • Uterine cancer accounts for about 5% and 2% of global cancer incidence and mortality among women, respectively, whereas ovarian cancer accounts for about 4% of cancers in women.
  • Metabolomics is an emerging field and is broadly defined as the comprehensive measurement of all metabolites and low-molecular-weight molecules in a biological specimen. Metabolomics affords profiling of much larger numbers of metabolites than are presently covered in standard clinical laboratory techniques. Hence it facilitates comprehensive coverage of biological processes and metabolic pathways. Consequently, it holds promise to serve as an essential objective lens in the molecular microscope for precision medicine. This is particularly relevant as metabolites have been described as proximal reporters of disease because their abundances in biological specimens are often directly related to pathogenic mechanisms.
  • Metabolomics is an especially relevant technique for cancer detection. Cancer cells have significantly altered metabolism and, therefore, the pattern of metabolites produced can yield a "signature" that is indicative of the cancer's presence or behavior. Importantly, and in contrast to gene expression profiling as a risk stratifier, this is a signal that originates directly or indirectly from micrometastatic disease, rather than one derived from features of the primary tumor. As a result, metabolome-derived signatures provide a high-precision risk stratifier for disease, with an accuracy that can far exceed that of methods based on DNA or protein markers. Untargeted metabolome profiles, however, are complex and multivariate in nature, and cannot be accurately analyzed by linear analytical methods. Such data are nevertheless readily amenable to the application of AI-based methodologies. By exploring non-linear variables in the data that correlate with defined clinical states, one can potentially extract metabolite signatures that are characteristic of a given disease state.
  • Metabolomics is now frequently used in oncology research, with particular emphasis on early diagnosis, monitoring, and prognosis of cancers. For example, several studies have exploited metabolomics analysis for both diagnosis and prognosis of breast cancer. Collectively, however, these studies have suffered from variability in results, as well as limited accuracy. Similarly, the application of metabolomics to endometrial cancer resulted in the identification of metabolites that could predict the presence of cancer, tumor behavior, and also the pathological characteristics. These findings, however, await validation. A recent analysis identified metabolite signatures for cervical intraepithelial neoplasia and cervical cancer. The sample sizes, though, were relatively small and the discriminatory capacity of the test was suboptimal. Metabolomic approaches for diagnosis of ovarian cancer have recently been reviewed. The inference was that, while metabolomics offers significant new opportunities for ovarian cancer diagnosis, further work needs to be done.
  • US 9459255 discloses amino acids that are useful in discriminating between breast cancer and breast cancer-free individuals. A multivariate discriminant was found, which included the concentrations of the identified amino acids as explanatory variables, that correlated significantly with the state of breast cancer. The sensitivity of the method, however, was only about 87% whereas the specificity was about 85%.
  • US 2011/0143444 discloses a method for evaluating female genital cancer, by using the amino acid concentrations in blood collected from subjects. This method evaluates the state of female genital cancer including at least one of cervical cancer, endometrial cancer, and ovarian cancer in the subject. The total number of subject samples tested, however, was small and the discriminatory power of the method was weak; ranging from 55% to 81% for the individual cancers.
  • US 2017/0003291 is drawn to a method for diagnosing endometrial cancer by detecting, in a biological sample from a patient, variations in concentrations of specific lipids and some small metabolites.
  • US 2017/0097355 describes methods for measuring metabolic changes useful in the differentiation between ovarian cancer and benign ovarian tumor.
  • Two independent LC-MS-based metabolomics platforms, including a global lipidomics approach, were used to screen for differentially abundant plasma metabolites between cases with serous ovarian carcinoma and controls with benign serous ovarian tumor. While the combination of small-molecule and lipidome profiling yielded a test with good sensitivity (95%), the specificity was less than 50%. This limits the utility of the test for patient screening.
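The sensitivity and specificity figures quoted for these earlier methods are derived from a test's confusion-matrix counts. The sketch below shows the standard definitions; the counts are hypothetical, chosen only to mimic a test with high sensitivity and low specificity, and are not data from any of the cited documents.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: cancer cases that the test flags / misses.
    tn/fp: controls that the test clears / wrongly flags.
    """
    sensitivity = tp / (tp + fn)   # share of true cases that are detected
    specificity = tn / (tn + fp)   # share of controls that are correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "accuracy": accuracy}


# Hypothetical cohort: 100 cases and 100 controls (illustrative numbers only).
metrics = screening_metrics(tp=95, fn=5, tn=45, fp=55)
print(metrics)
```

With counts like these, more than half of the controls would be flagged, which is why low specificity limits the usefulness of a test for population screening.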
  • The present invention relates to a process that may be implemented to differentiate the women-specific cancer samples (breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and, further, to differentiate among the women-specific cancer samples so as to detect stage 0/1 endometrial cancer, breast cancer, cervical cancer and ovarian cancer from disease control individuals among adult women.
  • Figure 1 illustrates a schematic representation of the metabolomics process implemented in the present invention for differentiating the diseased cancer samples (for example, breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and further to identify each specific disease sample from among the disease cancer samples, in accordance with an embodiment of the present invention;
  • Figure 2 depicts an exemplary flowchart illustrating a method for the metabolomics process implemented in the present invention for differentiating the diseased cancer samples (for example, breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and further to identify each specific disease sample from among the disease cancer samples, in accordance with an embodiment of the present invention;
  • Figures 3A-E each depict an exemplary bar graph illustrating the age-wise distribution of serum samples among normal controls (A) and cancer individuals (B)-(E), in accordance with an embodiment of the present invention;
  • Figure 4 depicts exemplary representative images of the chromatograms obtained from normal control and the individual cancer samples, in accordance with an embodiment of the present invention;
  • Figures 5A-E each depict an exemplary graphical representation of the age-wise distribution of total ions detected during untargeted metabolomics profiling, in accordance with an embodiment of the present invention;
  • Figure 6, Panels A and B depict the mass and retention time index for each ion box.
  • The figure depicts the indices for each ion box, which are used to calculate the mass window (Panel A) and the retention time window (Panel B) for each ion box.
  • Figure 6, Panel C depicts an exemplary graphical representation of the data preprocessing pipeline used to make the data amenable to AI models, in accordance with an embodiment of the present invention;
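The idea of indexing each ion box by mass and retention time can be sketched as follows. This is an illustration only: the window widths `MASS_WINDOW` and `RT_WINDOW`, the function names, and the summing of intensities within a box are all assumptions, since the text here does not state how the boxes are sized or aggregated.

```python
from collections import defaultdict

# Hypothetical window widths; the values actually used are not given in the text.
MASS_WINDOW = 0.01  # width of one box along the m/z axis
RT_WINDOW = 0.5     # width of one box along the retention time axis (minutes)


def ion_box(mz, rt):
    """Map an ion to the (mass index, retention time index) of its box."""
    return int(mz // MASS_WINDOW), int(rt // RT_WINDOW)


def box_windows(box):
    """Recover the mass window and retention time window from a box's indices."""
    mass_idx, rt_idx = box
    mass_window = (mass_idx * MASS_WINDOW, (mass_idx + 1) * MASS_WINDOW)
    rt_window = (rt_idx * RT_WINDOW, (rt_idx + 1) * RT_WINDOW)
    return mass_window, rt_window


def bin_ions(ions):
    """Sum the intensities of all ions that fall into the same box.

    `ions` is an iterable of (mz, retention_time, intensity) tuples.
    """
    boxes = defaultdict(float)
    for mz, rt, intensity in ions:
        boxes[ion_box(mz, rt)] += intensity
    return dict(boxes)


# Made-up ions: the first two land in the same box, the third in another.
binned = bin_ions([
    (180.063, 5.21, 1000.0),
    (180.066, 5.34, 500.0),
    (255.232, 12.8, 200.0),
])
print(binned)
```

Binning every sample onto the same grid of boxes gives each sample a vector over a shared set of features, which is the kind of sample-by-feature matrix that downstream models expect.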
  • Figure 7 depicts exemplary graphical representation illustrating PCA Plot of the matrix of samples and metabolites versus metabolite intensity showing the clear separation of samples on the basis of their clinical information, in accordance with an embodiment of the present invention
  • Figure 8, Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease cancer samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to build and test the AI models that distinguish the Cancer group from the Normal Controls, in accordance with an embodiment of the present invention;
  • Figure 8, Panel B depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation of the disease cancers from the controls, in accordance with an embodiment of the present invention;
  • Figure 9, Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease cancer samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to build and test the multiclass trained model for separation of Endometrial Cancers, in accordance with an embodiment of the present invention;
  • Figure 9, Panel B depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation based on model scores; specifically, it shows the multiclass trained model separating Endometrial Cancers from Other Cancers based on the model’s Endometrial scores, where the confusion matrix obtained on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
  • Figure 10, Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease cancer samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to build and test the multiclass trained model for separation of Breast Cancers, in accordance with an embodiment of the present invention;
  • Figure 10, Panel B depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation based on model scores; specifically, it shows the multiclass trained model separating Breast Cancers from Other Cancers based on the model’s Breast scores, where the confusion matrix obtained on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
  • Figure 11, Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease cancer samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to build and test the multiclass trained model for separation of Cervical Cancers, in accordance with an embodiment of the present invention;
  • Figure 11, Panel B depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation based on model scores; specifically, it shows the multiclass trained model separating Cervical Cancers from Other Cancers based on the model’s Cervical scores, where the confusion matrix obtained on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
  • Figure 12, Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease cancer samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to build and test the multiclass trained model for separation of Ovarian Cancers, in accordance with an embodiment of the present invention;
  • Figure 12, Panel B depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation based on model scores; specifically, it shows the multiclass trained model separating Ovarian Cancers from Other Cancers based on the model’s Ovarian scores, where the confusion matrix obtained on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
  • Figure 13, Panel A depicts an exemplary block diagram illustrating an AI workflow, using the total sample set (see Fig. 16, Table A), for detection of the disease cancer samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to build and test the multiclass trained model for distinguishing between the individual Cancer groups (for example, between breast cancer, endometrial cancer, cervical cancer and ovarian cancer), in accordance with an embodiment of the present invention;
  • Figure 13, Panel B depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation based on model scores; specifically, it shows the multiclass trained model separating Endometrial Cancers from Other Cancers based on the model’s Endometrial scores, where the confusion matrix obtained on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
  • Figure 13, Panel C depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation based on model scores; specifically, it shows the multiclass trained model separating Breast Cancers from Other Cancers based on the model’s Breast scores, where the confusion matrix obtained on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
  • Figure 13, Panel D depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation based on model scores; specifically, it shows the multiclass trained model separating Cervical Cancers from Other Cancers based on the model’s Cervical scores, where the confusion matrix obtained on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
  • Figure 13, Panel E depicts an exemplary representation of testing the trained model for disease cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls, showing clear separation based on model scores; specifically, it shows the multiclass trained model separating Ovarian Cancers from Other Cancers based on the model’s Ovarian scores, where the confusion matrix obtained on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
  • Figure 14 depicts an exemplary graphical representation illustrating the coefficient of each metabolite in the signature of the BECO Cancer group, in accordance with an embodiment of the present invention;
  • Figure 15, Panel A depicts an exemplary graphical representation illustrating the coefficient of each metabolite in the signature of Endometrial Cancer, in accordance with an embodiment of the present invention;
  • Figure 15, Panel B depicts an exemplary graphical representation illustrating the coefficient of each metabolite in the signature of Breast Cancer, in accordance with an embodiment of the present invention;
  • Figure 15, Panel C depicts an exemplary graphical representation illustrating the coefficient of each metabolite in the signature of Cervical Cancer, in accordance with an embodiment of the present invention;
  • Figure 15, Panel D depicts an exemplary graphical representation illustrating the coefficient of each metabolite in the signature of Ovarian Cancer, in accordance with an embodiment of the present invention;
  • Figure 16 shows Table A, which depicts an exemplary table distribution illustrating the ethnicity and demographic distribution of samples under study, in accordance with an embodiment of the present invention
  • Figure 17 shows Table B, which depicts an exemplary table distribution illustrating division of samples as training sets and testing sets for disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer).
  • Table B depicts the ethnicity and demographic distribution of samples under study when divided into training and test sets to distinguish disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from normal controls, in accordance with an embodiment of the present invention;
  • Figure 18 shows Table C, which depicts an exemplary table distribution illustrating division of samples as training sets and testing sets for distinguishing within disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer).
  • Table C depicts the ethnicity and demographic distribution of samples under study when divided into training and test sets to distinguish within disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer), in accordance with an embodiment of the present invention.
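The division of samples into training and testing sets described for Tables B and C can be illustrated with a stratified split, in which each class contributes the same share of its samples to the test set. This is a sketch under assumptions: the text here does not say how the split was performed, and `stratified_split`, `test_fraction`, `seed` and the class sizes are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split sample indices into train and test sets, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


# Hypothetical cohort with 10 samples per class (illustrative sizes only).
classes = ["normal", "breast", "endometrial", "cervical", "ovarian"]
labels = [name for name in classes for _ in range(10)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Keeping the test samples out of training is what allows the confusion matrices reported for the test sets to be read as estimates of performance on unseen samples.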
  • The present invention discloses embodiments that enable simultaneous screening for endometrial cancer, breast cancer, cervical cancer, and ovarian cancer in a single analysis.
  • The present invention relates to a system and a method that may integrate global metabolome profiling with machine-learning-powered data analysis to capture the disease-specific signatures.
  • The invention may provide an integrated method for the simultaneous detection of early stages of the four most prominent cancers in women. This method may further elaborate the process of untargeted metabolomics for detecting and measuring metabolic changes that are not only useful in the broad differentiation between cancer and healthy individuals but also effectively, and simultaneously, distinguish each individual cancer from normal controls as well as from the other women-specific cancers.
  • The detailed description herein explains and relates to the four women-specific cancers, which are endometrial cancer, breast cancer, cervical cancer, and ovarian cancer; however, the method explained here may not be restricted to the detection of these four women-specific cancers only, and may be applied to the segregation and detection of other cancers in a biological mammal specimen from normal controls.
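The two levels of discrimination described above (cancer versus normal control, then each individual cancer versus the others) can be sketched as a two-stage decision on model scores. This is a hedged illustration: the function `screen_sample`, the score values and both thresholds are hypothetical, and the scores are assumed to come from already trained models that are not shown.

```python
CANCER_TYPES = ("breast", "endometrial", "cervical", "ovarian")


def screen_sample(cancer_score, type_scores,
                  cancer_threshold=0.5, type_threshold=0.5):
    """Two-stage call for one sample.

    Stage 1: compare the cancer-vs-normal score with a threshold.
    Stage 2: for samples called cancer, flag every cancer type whose
             type-specific score reaches its threshold.
    """
    if cancer_score < cancer_threshold:
        return {"call": "normal control", "flagged_types": []}
    flagged = [t for t in CANCER_TYPES if type_scores[t] >= type_threshold]
    return {"call": "cancer", "flagged_types": flagged}


# Hypothetical scores for two samples.
healthy = screen_sample(
    0.12, {"breast": 0.2, "endometrial": 0.1, "cervical": 0.3, "ovarian": 0.1})
patient = screen_sample(
    0.91, {"breast": 0.1, "endometrial": 0.8, "cervical": 0.2, "ovarian": 0.05})
print(healthy)
print(patient)
```

Thresholding each type-specific score separately mirrors the "one cancer versus other cancers" comparisons described for the figures; picking the single highest-scoring type would be an alternative design.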
  • LC-MS: liquid chromatography with mass spectrometry
  • a sample refers to one or more samples, i.e., a single sample and multiple samples.
  • this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
  • sample as used herein relates to a material or mixture of materials, typically, although not necessarily, in liquid form, containing one or more analytes of interest.
  • the term as used in its broadest sense refers to any mammalian material containing cells or producing cellular metabolites, such as, for example, tissue or fluid isolated from an individual (including without limitation plasma, serum, cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro cell culture constituents, as well as samples from the environment.
  • sample may also refer to a “biological sample”.
  • a biological sample refers to a whole organism or a subset of its tissues, cells or component parts (e.g.
  • a “biological sample” can also refer to a homogenate, lysate or extract prepared from a whole organism or a subset of its tissues, cells or component parts, or a fraction or portion thereof, including but not limited to, for example, plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, blood cells, tumors, organs.
  • the sample has been removed from an animal.
  • Biological samples of the invention include cells.
  • Metabolite profile as used in the invention should be understood to be any defined set of values of quantitative results for metabolites that can be used for comparison to reference values or profiles derived from another sample or a group of samples. For instance, a metabolite profile of a sample from a diseased patient might be significantly different from a metabolite profile of a sample from a similarly matched healthy patient. Metabolites that can be detected and/or quantified include, but are not limited to, amino acids, peptides, acylcarnitines, monosaccharides, lipids and phospholipids, prostaglandins, steroids, bile acids, and glyco- and phospholipids.
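One simple way to compare a sample's metabolite profile with reference profiles is a per-metabolite z-score against a reference group. This is only an illustrative assumption: the text does not prescribe a comparison statistic, and the metabolite names and values below are made up.

```python
from statistics import mean, stdev


def profile_z_scores(sample, reference_profiles):
    """Z-score each metabolite in `sample` against a group of reference profiles.

    `sample` maps metabolite name to a quantitative value;
    `reference_profiles` is a list of such mappings from reference samples.
    """
    z_scores = {}
    for metabolite, value in sample.items():
        ref_values = [profile[metabolite] for profile in reference_profiles]
        z_scores[metabolite] = (value - mean(ref_values)) / stdev(ref_values)
    return z_scores


# Hypothetical reference group of three samples and one test sample.
reference = [
    {"alanine": 10.0, "glucose": 100.0},
    {"alanine": 12.0, "glucose": 110.0},
    {"alanine": 14.0, "glucose": 90.0},
]
z = profile_z_scores({"alanine": 18.0, "glucose": 95.0}, reference)
print(z)
```

A large absolute z-score marks a metabolite whose level in the sample departs from the reference group, which is the kind of difference a profile comparison is meant to surface.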
  • untargeted metabolomics studies are characterized by the simultaneous measurement of many metabolites from biological samples. This strategy, known as top-down strategy, avoids the need for a prior specific hypothesis on a particular set of metabolites and, instead, analyses the global metabolomic profile. Consequently, these studies are characterized by the generation of large amounts of data. This data is not only characterized by its volume but also by its complexity and, therefore, there is a need for high performance bioinformatic tools.
  • chromatography refers to a process in which a chemical mixture carried by a liquid or gas is separated into components as a result of differential distribution of the chemical entities as they flow around or over a stationary liquid or solid phase.
  • HPLC high performance liquid chromatography
  • ultra-high performance liquid chromatography or UPLC or UHPLC refers to HPLC which occurs at much higher pressures than traditional HPLC techniques.
  • sample injection refers to introducing an aliquot of a single sample into an analytical instrument, for example a mass spectrometer. This introduction may occur directly or indirectly. An indirect sample injection may be accomplished, for example, by injecting an aliquot of a sample into a HPLC or UPLC analytical column that is connected to a mass spectrometer in an on-line fashion.
  • MS mass spectrometry
  • MS refers to an analytical technique to identify compounds by their mass.
  • MS refers to methods of filtering, detecting and measuring ions based on their mass-to-charge ratio or m/z.
  • the term operating in positive ion mode refers to those mass spectrometry methods where positive ions are generated and detected.
  • the term electron ionization or EI refers to methods in which an analyte of interest in a gaseous or vapor phase interacts with a flow of electrons. Impact of the electrons with the analyte produces analyte ions, which may then be subjected to a mass spectrometry technique.
  • electrospray ionization refers to methods in which a solution is passed along a short length of capillary tube, to the end of which is applied a high positive or negative electric potential. Solution reaching the end of the tube is vaporized (nebulized) into a jet or spray of very small droplets of solution in solvent vapor. This mist of droplets flows through an evaporation chamber, which is heated slightly to prevent condensation and to evaporate solvent. As the droplets get smaller, the electrical surface charge density increases until such time that the natural repulsion between like charges causes ions as well as neutral molecules to be released.
  • noise filters reduce the data based on a calculated noise threshold.
  • data below a certain signal to noise ratio is filtered.
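  • A minimal sketch of such a filter follows. The source does not say how the noise threshold is calculated, so this sketch assumes the noise level is the median intensity and keeps peaks whose intensity divided by that level meets a chosen ratio. The function name and the default ratio of 3 are hypothetical.

```python
from statistics import median

def filter_by_signal_to_noise(intensities, min_snr=3.0):
    """Keep only intensities whose signal-to-noise ratio is at least min_snr.

    Assumption: noise is estimated as the median of all intensities.
    """
    if not intensities:
        return []
    noise = median(intensities)
    if noise <= 0:
        # No usable noise estimate; keep everything rather than divide by zero.
        return list(intensities)
    return [x for x in intensities if x / noise >= min_snr]

# Made-up intensities: mostly baseline with two clear peaks.
peaks = [1.0, 1.2, 0.9, 1.1, 15.0, 1.0, 40.0]
kept = filter_by_signal_to_noise(peaks)
```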
  • Content-based filtering of the results leverages, for example, disease-specific knowledge to concentrate on relevant metabolite aspects of the disease under investigation.
  • samples are derived from patients participating in a clinical trial, where a novel drug compound is under investigation and compared to an approved drug.
  • AI (artificial intelligence) is, at its core, the technical discipline that researches and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
  • the use of AI in research is likely to perform complex tasks that require human cognitive ability.
  • the major core concepts of AI are machine learning and deep learning.
  • machine learning is the study of algorithms that learn from examples and experience. Additionally, machine learning is based on the idea that there exist patterns in the data that can be identified and used for future predictions.
  • deep learning uses different layers to learn from the data. The depth of the model is represented by the number of layers in the model. In deep learning, the learning phase is done through a neural network.
  • a neural network is an architecture where the layers are stacked on top of each other.
  • FIG. 1 illustrates a schematic representation of a system for implementing metabolomics process for differentiating the diseased cancer samples (for example, breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and further to identify each specific disease sample from among the disease cancer samples, in accordance with an embodiment of the present invention.
  • the FIG. 1 shows a metabolomics system 100 that may comprise at least one or more components performing one or more functions from the following:
  • At least one sample collecting device 102 for collecting and storing biological samples from one or more organisms or individuals;
  • At least one Liquid Chromatography device 110 with a mass spectrometer 112 (abbreviated, herein after, as LC-MS) for analyzing the dried metabolite extract using the LC-MS technique.
  • At least one computing device 114 with compound discoverer software for automated data extraction, for example, automated extraction of metabolite ions and their related features using the compound discoverer software.
  • the present system 100 may distinguish each individual cancer from normal controls as well as the other cancer samples.
  • FIGs. 1-15 will be explained taking examples and hence, should not be considered as limiting to those specific examples only.
  • the FIGs. 1-15 are described, herein, considering a sample size of 1369 taken from adult female volunteers.
  • the present system 100 may be implemented to differentiate between stage 0 / I endometrial cancer, breast cancer, cervical cancer and ovarian cancer from healthy/disease control individuals among adult women.
  • sample collecting device 102 may be a test tube or similar tube.
  • the present system 100 may include a metabolite extraction which may be achieved by precipitating serum proteins with chilled methanol, according to an embodiment.
  • the precipitation device 104 may be used here in order to extract metabolite from the samples collected by precipitating serum proteins with chilled methanol.
  • the precipitation device 104 may be a test tube or similar tube.
  • the supernatant may be collected as the metabolite extract, and may further be dried before use.
  • the phase separation device 106 may be used that may dry the metabolite extract using speed vacuum.
  • the dried extract may be reconstituted in an aqueous solution in a mobile phase using a device 108.
  • the ion spectrum of the resultant samples, derived from the reconstitution phase, may be generated by UHPLC-HRMS, where samples may be first resolved by Ultra High-Performance Liquid Chromatography (abbreviated as UHPLC) device 110, and then, the ion spectra may be subsequently obtained through high-resolution mass spectrometer 112 (abbreviated as HRMS).
  • UHPLC Ultra High-Performance Liquid Chromatography
  • HRMS high- resolution mass spectrometer
  • the features of the ion spectra accumulated in the metabolic profile may be extracted using the computing device 114 that may execute, using one or more processors, compound discoverer software (for example, the Compound Discoverer software of Thermo Fisher Scientific).
  • the masses obtained for the ions in the metabolome profile, using the UHPLC device 110 and the HRMS 112, may be aligned across all the samples. This may be done to enable comparison of the peak intensity of each ion across all the samples. For example, a pool of known internal standards may be used for retention time (RT) alignment with an error window of ±0.02 min, followed by peak picking and identification of metabolites.
  • the present system 100 may also include functions for minimizing the errors that may be generated in measurement of the masses for the ions.
  • a sophisticated parts per million (ppm) error-based approach may be used, according to an embodiment.
  • ppm parts per million
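  • For reference, the ppm mass error is conventionally defined relative to a reference mass, so a fixed ppm tolerance corresponds to an absolute mass window that widens with mass. The symbols below (observed mass, reference mass, tolerance τ) are introduced here for illustration and do not appear in the source.

```latex
\[
\delta_{\mathrm{ppm}} = \frac{m_{\mathrm{obs}} - m_{\mathrm{ref}}}{m_{\mathrm{ref}}} \times 10^{6},
\qquad
\left| m_{\mathrm{obs}} - m_{\mathrm{ref}} \right| \le \tau \cdot m_{\mathrm{ref}} \times 10^{-6}
\]
```

  • As an illustration, a tolerance of τ = 5 ppm corresponds to a window of ±0.0005 Da at 100 Da and ±0.005 Da at 1000 Da.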
  • a modified virtual lock mass-based approach may also be used. This is based on the principle that mass errors are known to increase with mass.
  • This modified virtual lock mass-based approach may be used and adapted according to the datasets in examples of the invention. This may be done by combining the traditional virtual lock mass approach with metabolite identification from the Human Metabolome database (HMDB).
  • HMDB Human Metabolome database
  • the virtual lock mass boxes may be defined using the masses of metabolites identified by HMDB database search across multiple samples. Subsequently, the metabolite ions may be filtered based on the frequency of presence in samples. In an embodiment, a 20% cutoff may be used for metabolite ions filtering; meaning ions present in greater than 20% of samples may be used in subsequent analysis.
  • AI/ML models are applied for statistical analysis of the samples.
  • the computing device 116 may execute one or more AI/ML algorithms for applying the AI/ML models for statistical analysis of the samples.
  • one or more first AI/ML models may be generated to distinguish the cancer samples (breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls.
  • one or more second AI/ML models may be generated that may be layered on the one or more first AI/ML models to further distinguish and identify a particular cancer sample (e.g., breast cancer) from the other cancer samples (e.g., endometrial cancer, cervical cancer, and ovarian cancer) and from the normal controls.
  • the computing device 116 may follow one or more of the following steps: i. While applying the AI model, a logistic regression function may be applied on a training dataset to find a function separating cancer samples from normal control samples; ii. Class balancing parameters may be configured in the AI model to deal with the imbalance of classes in the training dataset.
  • This may generate a first AI Model Layer I that may separate normal control samples from cancer samples.
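  • The two steps above can be sketched as a class-weighted logistic regression. This is an illustrative sketch, not the applicant's implementation. The weighting rule (n_samples / (n_classes × class_count)), the learning rate, the epoch count and all function names are assumptions, and the data are made up.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Assumption: this common 'balanced' heuristic stands in for the
    unspecified class balancing parameters in the source.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by gradient descent on class-weighted log loss.

    X: list of feature vectors; y: list of 0 (normal) / 1 (cancer) labels.
    """
    cw = balanced_class_weights(y)
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    total_weight = sum(cw[label] for label in y)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = cw[yi] * (p - yi)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / total_weight for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total_weight
    return w, b

def cancer_score(model, x):
    """Return the modelled probability that sample x is a cancer sample."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy, made-up data: one feature, imbalanced classes (2 normal, 6 cancer).
X = [[0.1], [0.3], [1.8], [2.0], [2.2], [2.4], [2.6], [3.0]]
y = [0, 0, 1, 1, 1, 1, 1, 1]
model = train_logistic_regression(X, y)
```

  • With the weights applied, the two normal samples contribute as much to the loss as the six cancer samples, so the decision boundary is not pulled toward the majority class.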
  • a second AI Model Layer II may also be generated that may be layered on the first AI Model Layer I to distinguish and identify a particular cancer sample (e.g., breast cancer) from the other cancer samples (e.g., endometrial cancer, cervical cancer, and ovarian cancer) and from the normal controls.
  • the second AI Model Layer II may be generated in a similar way as the first AI Model Layer I, and may further include a one versus rest (OVR) multiclass classification model that may be built using the training samples to give the second AI Model Layer II.
  • OVR one versus rest
  • a two-layered modeling scheme may be applied on the test set, in an embodiment. That is, firstly, AI Model I differentiating cancer samples versus normal samples may be applied on the test set. Then, AI Model II may be applied on the resulting predicted cancer samples. Now, if four cancer types are considered, for example breast cancer, endometrial cancer, cervical cancer, and ovarian cancer, then this two-layered modeling scheme may result in four scores for each sample, with each score defining the probability of the respective sample belonging to one of the four classes.
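  • The two-layered scheme can be sketched as follows. The trained models are represented by hypothetical stand-in callables, the 0.5 decision threshold is an assumption, and the per-class scorers follow a one-versus-rest layout with one scorer per cancer type.

```python
CANCER_CLASSES = ("breast", "endometrial", "cervical", "ovarian")

def two_layer_predict(sample, model_one, model_two, threshold=0.5):
    """Apply Model I (cancer vs. normal), then Model II (one score per cancer).

    model_one: callable returning the probability that the sample is cancer.
    model_two: dict mapping each cancer class to a one-versus-rest scorer.
    Both are hypothetical stand-ins for trained models.
    """
    cancer_probability = model_one(sample)
    if cancer_probability < threshold:
        # Predicted normal: Model II is not applied.
        return {"label": "normal", "scores": None}
    scores = {name: model_two[name](sample) for name in CANCER_CLASSES}
    best = max(scores, key=scores.get)
    return {"label": best, "scores": scores}

# Made-up stand-in scorers that simply read values from the sample dict.
model_one = lambda s: s["cancer_signal"]
model_two = {name: (lambda s, n=name: s[n]) for name in CANCER_CLASSES}

sample = {"cancer_signal": 0.9, "breast": 0.7, "endometrial": 0.1,
          "cervical": 0.15, "ovarian": 0.05}
result = two_layer_predict(sample, model_one, model_two)
```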
  • in an example, the above process as implemented by the system 100 was performed as follows: out of a total of 1369 samples, 1119 samples were breast, endometrial, cervical or ovarian cancer samples, and the remaining 250 samples were without any disease. The data was randomly partitioned into training and test datasets in equal proportion.
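  • A random partition in equal proportion can be sketched as below. The seed, the function name and the rule that the training set receives the extra sample when the total is odd are assumptions; the source does not state how the split was drawn.

```python
import random

def split_in_half(sample_ids, seed=0):
    """Randomly partition sample identifiers into training and test halves."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    midpoint = (len(ids) + 1) // 2  # training set gets the extra sample if odd
    return ids[:midpoint], ids[midpoint:]

# 1369 is the total sample count stated in the text; the ids are placeholders.
train_ids, test_ids = split_in_half(range(1369))
```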
  • Class balancing parameters may be configured in the AI Model I to deal with the imbalance of classes in the training dataset.
  • an AI Model I may first be trained using the training dataset of samples.
  • the trained model / algorithm I may find a score for each sample.
  • the trained model I may be evaluated on a test set to determine the accuracy.
  • the sensitivity, specificity and accuracy obtained in this example were 98%, 98%, and 98% respectively.
  • another multiclass AI Model II may be layered on top of the earlier AI Model I.
  • the AI Model II acted on the predicted cancer samples from AI Model I (breast, endometrial, cervical or ovarian) and gave a multiclass score to each sample: one score for each disease class denoting the probability of the sample belonging to the respective disease class.
  • 304 samples were Endometrial Cancer, 303 Breast Cancer, 250 Cervical Cancer and 262 Ovarian Cancer. The data was randomly partitioned into training and test datasets in equal proportion.
  • the system 100 may be further implemented for determining the accuracy of the multiclass model in differentiating specifically endometrial cancer from the other three cancers within the BECO group, as well as differentiating BECO samples from normal controls disclosed herein.
  • the scores obtained from multiclass model were plotted.
  • a plot of the multiclass model Endometrial Cancer Score for endometrial cancer samples against the scores for breast, cervical and ovarian (BCO) cancer samples gives scores that clearly differentiate endometrial cancer from the other three women-specific cancers (breast cancer, cervical cancer, and ovarian cancer). Applying a threshold to differentiate between the two types results in a confusion matrix.
  • the normal samples are also added in the controls to get the sensitivity, specificity values for endometrial cancer versus all the other groups including normal controls.
  • Sensitivity, Specificity, and Accuracy were calculated to be 100%, 97%, and 98% respectively (See FIG. 9).
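  • The step from thresholded scores to a confusion matrix, and from there to sensitivity, specificity and accuracy, can be sketched as follows. The scores, labels and the 0.5 threshold are made up for illustration and are not data from the study.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive.

    labels: True for the target cancer, False for every other sample.
    """
    tp = fp = tn = fn = 0
    for score, is_target in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_target:
            tp += 1
        elif predicted:
            fp += 1
        elif is_target:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity_specificity_accuracy(tp, fp, tn, fn):
    """Derive the three summary metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of target samples detected
    specificity = tn / (tn + fp)   # fraction of non-target samples rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy

# Made-up scores: three target samples followed by five non-target samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.3, 0.6]
labels = [True, True, True, False, False, False, False, False]
tp, fp, tn, fn = confusion_counts(scores, labels, threshold=0.5)
sens, spec, acc = sensitivity_specificity_accuracy(tp, fp, tn, fn)
```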
  • the training set included 152 endometrial cancer samples versus 410 of the BCO samples.
  • the testing set included 152 endometrial cancer samples versus 405 of BCO samples and 124 normal controls (See FIG. 13A).
  • the Sensitivity, Specificity and Accuracy were calculated to be 87%, 93%, and 91.6% respectively. (See e.g. FIG. 13B)
  • the system 100 may be further implemented for determining the accuracy of the multiclass model in differentiating specifically breast cancer from the other three cancers within the BECO group, as well as differentiating BECO samples from normal controls disclosed herein.
  • the scores obtained from multiclass model were plotted.
  • a plot of the multiclass model Breast Score for breast cancer samples against the scores for the set of endometrial, cervical and ovarian (ECO) cancer samples gives scores that clearly differentiate breast cancer from the other three women-specific cancers (endometrial cancer, cervical cancer, and ovarian cancer). Applying a threshold to differentiate between the two types results in a confusion matrix.
  • the normal samples are also added in the controls to get the sensitivity, specificity values for Breast cancer versus all the other groups including normal controls.
  • Sensitivity, Specificity and Accuracy were calculated to be 97%, 100%, and 99% respectively (See FIG. 10).
  • the training set included 152 breast cancer samples versus 410 of the ECO samples.
  • the testing set included 151 breast cancer samples versus 406 of ECO samples and 124 normal controls (See FIG. 13A).
  • the Sensitivity, Specificity and Accuracy were calculated to be 93%, 95%, and 94.4% respectively. (See e.g., FIG. 13C).
  • the system 100 may be further implemented for specifically identifying cervical cancer cases from the other cancers in the BECO group.
  • Scores from the multiclass model for cervical cancer samples (Cervical Score) were plotted against the scores for endometrial, breast and ovarian (EBO) cancer samples.
  • the model scores clearly differentiated between cervical and the EBO cancer samples. Applying a threshold to differentiate between the two types results in a confusion matrix.
  • the normal samples are also added in the controls to get the sensitivity and specificity values for Cervical versus rest. Sensitivity, Specificity and Accuracy were calculated to be 87%, 100%, and 98% respectively (See FIG. 11).
  • the training set included 127 cervical cancer samples versus 435 of the EBO samples.
  • the testing set included 123 cervical cancer samples versus 434 of EBO samples and 124 normal controls (See FIG. 13A).
  • the Sensitivity, Specificity and Accuracy were calculated to be 87%, 90%, and 87.6% respectively. (See e.g., FIG. 13D).
  • the system 100 may be further implemented for specifically discriminating ovarian cancer samples from the other three cancers, and from control cases.
  • Ovarian Score the scores for the ovarian cancer samples
  • EBC endometrial cancer, breast cancer, cervical cancer
  • the normal samples are also added in the controls to get the sensitivity, specificity values for Ovarian versus rest. Sensitivity, Specificity and Accuracy were calculated to be 100%, 99%, and 99% respectively (See FIG. 12).
  • the training set included 131 ovarian cancer samples versus 431 of the EBC samples.
  • the testing set included 131 ovarian cancer samples versus 426 of EBC samples and 124 normal controls.
  • the Sensitivity, Specificity and Accuracy were calculated to be 86%, 93%, and 92% respectively. (See e.g., FIG. 13E).
  • Figure 2 illustrates a flow chart for implementing a metabolomics process for differentiating the diseased cancer samples (for example, breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and further to identify each specific disease sample from among the disease cancer samples, in accordance with an embodiment of the present invention.
  • the Figure 2 should be read and understood in conjunction with the Figures 1-15, and also may include at least one or more embodiments of the Figures 1-15, without deviating from the meaning and scope of the present invention.
  • the method 200 may include at least one or more steps 202-212, individually or in combination.
  • the method 200 is explained by taking an example of women-specific four cancers including Breast cancer, Endometrial cancer, Cervical cancer and Ovarian cancer (abbreviated as BECO), and should not be considered to limit the meaning and scope of the present invention.
  • the samples are collected and stored in the sample collecting device 102.
  • the method includes a step 204 of metabolite extraction, which may be achieved by precipitating serum proteins with chilled methanol.
  • the precipitation device 104 may be a test tube.
  • the supernatant may be collected as the metabolite extract.
  • the metabolite extract may be dried before use.
  • the phase separation device 106 may be used that may dry the metabolite extract using speed vacuum.
  • the dried extract may be reconstituted in an aqueous solution in a mobile phase using a device 108.
  • the UHPLC-HRMS analysis of the resultant samples, derived from the reconstitution phase, may be performed by the UHPLC-HRMS.
  • the reconstituted samples may be first resolved by Ultra High-Performance Liquid Chromatography (abbreviated as UHPLC) device 110, and then, at a step 212, the ion spectra may be subsequently obtained through high-resolution mass spectrometer 112 (abbreviated as HRMS).
  • HRMS high-resolution mass spectrometer
  • the features of the ion spectra accumulated in metabolic profile may be extracted using the computing device 114 that may execute, using one or more processors, compound discoverer software.
  • the method 200 may include a step 216 of aligning the masses obtained for the ions in the metabolome profile, using the UHPLC device 110 and the HRMS 112, across all the samples. This may be done to enable comparison of the peak intensity of each ion across all the samples.
  • the method 200 may include a step 218 of minimizing the errors that may be generated in measurement of the masses for the ions.
  • a sophisticated approach of using parts per million (ppm) error-based approach may be used, according to an embodiment.
  • ppm parts per million
  • a modified virtual lock mass-based approach may also be used. This is based on the principle that mass errors are known to increase with mass. This modified virtual lock mass-based approach may be used and adapted according to the datasets in examples of the invention. This may be done by combining the traditional virtual lock mass approach with metabolite identification from the Human Metabolome database (HMDB).
  • HMDB Human Metabolome database
  • the virtual lock mass boxes may be defined using the masses of metabolites identified by HMDB database search across multiple samples. Subsequently, the metabolite ions may be filtered based on the frequency of presence in samples. In an embodiment, a 20% cutoff may be used for metabolite ions filtering; meaning ions present in greater than 20% of samples may be used in subsequent analysis.
  • the steps 216 and 218 may be optionally included in the method 200. Further, the flow of the steps 216 and 218 may be altered, and may not be restricted to as shown in the method 200 in figure 2.
  • the method 200 may furthermore include a step 220 of applying AI/ML models / algorithms on the obtained, measured (also, e.g., aligned, corrected) and featured metabolite ions, which are measured and aligned as explained above.
  • the step 220 may include applying AI/ML models for statistical analysis of the samples.
  • the computing device 116 may execute one or more AI/ML algorithms for applying the AI/ML models for statistical analysis of the samples.
  • the step 220 of applying AI/ML models / algorithms may include creating and applying at least two AI models, namely a first AI Model I and a second AI Model II.
  • one or more first AI/ML models may be generated to distinguish the cancer samples (breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls.
  • one or more second AI/ML models may be generated that may be layered on the one or more first AI/ML models to further distinguish and identify a particular cancer sample (e.g., breast cancer) from the other cancer samples (e.g., endometrial cancer, cervical cancer, and ovarian cancer) and from the normal controls.
  • the computing device 116 may follow one or more of the following steps: i. While applying the AI model, a logistic regression function may be applied on a training dataset to find a function separating cancer samples from normal control samples; ii. Class balancing parameters may be configured in the AI model to deal with the imbalance of classes in the training dataset.
  • This may generate a first AI Model Layer I, at the step 220, that may separate normal control samples from cancer samples.
  • Another AI model layer, which may be termed a second AI Model Layer II, may also be generated at step 220 and layered on the first AI Model Layer I to distinguish and identify a particular cancer sample (e.g., breast cancer) from the other cancer samples (e.g., endometrial cancer, cervical cancer, and ovarian cancer) and from the normal controls.
  • the second AI Model Layer II may be generated in a similar way as the first AI Model Layer I, and may further include a one versus rest (OVR) multiclass classification model that may be built using the training samples to give the second AI Model Layer II.
  • OVR one versus rest
  • a two-layered modeling scheme may be applied on the test set, in an embodiment. That is, firstly, AI Model I differentiating cancer samples versus normal samples may be applied on the test set. Then, AI Model II may be applied on the resulting predicted cancer samples. Now, if four cancer types are considered, for example breast cancer, endometrial cancer, cervical cancer, and ovarian cancer, then this two-layered modeling scheme may result in four scores for each sample, with each score defining the probability of the respective sample belonging to one of the four classes.
  • the cancer is ovarian. In some embodiments, after diagnosing or detecting the cancer, thereafter is performed surgery to remove one ovary or both ovaries. In some embodiments, the surgery includes removing at least one affected ovary and its fallopian tube. In some embodiments, surgery removes both ovaries and the uterus.
  • the chemotherapy drugs are injected into a vein, into the abdomen (intraperitoneal chemotherapy), or taken by mouth.
  • the cancer is endometrial cancer.
  • after diagnosing or detecting the endometrial cancer thereafter is performed surgery to remove the uterus (hysterectomy), and in some embodiments to also remove the fallopian tubes and/or one or more ovaries (salpingo-oophorectomy).
  • radiation therapy using a machine outside the body to administer radiation to the endometrial cancer.
  • radiation placed inside the body for example, internal radiation (brachytherapy) involving placing a radiation-filled device, such as small seeds, wires or a cylinder, inside the patient's vagina.
  • the one or more chemotherapy drugs are administered orally or through veins (intravenously).
  • the cancer is cervical cancer.
  • after diagnosing or detecting the cervical cancer thereafter is performed surgery to remove the cancer only.
  • the surgery is a cone biopsy, which leaves most of the cervix intact.
  • the surgery removes the cervix (trachelectomy).
  • the surgery is a radical trachelectomy procedure, which removes the cervix and some surrounding tissue.
  • the surgery removes the cervix and uterus (hysterectomy).
  • the radiation therapy is performed using a machine to administer radiation to the cervical cancer.
  • the radiation is external, by directing a radiation beam at the affected area of the body (external beam radiation therapy).
  • the radiation is internal, by placing a device filled with radioactive material inside the patient's vagina, usually for only a few minutes (brachytherapy).
  • after diagnosing or detecting the cervical cancer thereafter is administered one or more chemotherapy drugs for cervical cancer.
  • the one or more chemotherapy drugs are administered orally or through veins (intravenously).
  • the cancer is breast cancer.
  • after diagnosing or detecting the breast cancer thereafter is performed surgery to remove the breast cancer (lumpectomy), optionally with a comparatively (with respect to the cancer) small margin of surrounding healthy tissue and/or optionally undergoing chemotherapy before surgery to shrink a tumor and make it possible to remove cancer completely with a lumpectomy procedure.
  • after diagnosing or detecting the breast cancer, thereafter is performed surgery to remove the entire breast (mastectomy). In some embodiments, surgery removes all of the breast tissue: the lobules, ducts, fatty tissue and some skin, including the nipple and areola (total or simple mastectomy). In some embodiments, further surgery removes a limited number of lymph nodes (sentinel node biopsy).
  • the untargeted metabolomics approach (See e.g., Figure-4) generated a large list of metabolites, which was further divided into subsets for normal control, endometrial cancer, breast cancer, cervical cancer and ovarian cancer (See e.g., Figure-5).
  • the mass and retention time index for these metabolite ions are shown in Figure 6A&B.
  • the numbers of identified metabolites in these groups were 5895, 5971, 5982, 6300 and 6336 respectively.
  • the total number of unique metabolites identified across all groups in the present study was 7596.
  • the plant and drug metabolites were removed from this database.
  • the data was passed through our data processing pipeline (See e.g., Figure-6C).
  • samples were aligned using a combination of the VLM approach along with identified metabolites to make a matrix of 1369 samples and 6893 metabolites along with the corresponding intensity information. These intensity values were transformed into log10 scale. Then, in an embodiment, metabolite ion filtering was performed to find metabolites consistently present in samples. Then, in an embodiment, data normalization and missing value imputation were performed on the data. This resulted in a matrix of a total of 2823 metabolites across 1369 samples. Out of 1369 samples, 304 samples were of endometrial cancer, 303 were breast cancer, 250 were cervical cancer, 262 were ovarian cancer, and 250 were normal control samples (Fig. 16, Table-A).
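  • The log10 transformation step can be sketched as below. How undetected ions are represented is not stated in the source, so this sketch assumes that missing, zero or negative intensities mean "not detected" and carries them forward as None for later imputation. The function name is hypothetical.

```python
import math

def log10_transform(intensity_matrix):
    """Convert positive intensities to log10; keep missing values as None.

    Assumption: zero, negative, or None entries mean 'not detected' and
    are left as None for the later imputation step.
    """
    return [
        [math.log10(v) if v is not None and v > 0 else None for v in row]
        for row in intensity_matrix
    ]

# Made-up intensities: rows are samples, columns are metabolite ions.
raw = [[1000.0, None, 1e6], [0.0, 100.0, 10.0]]
logged = log10_transform(raw)
```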
  • BECO cancer Endometrial Cancer +Breast cancer + Cervical cancer + Ovarian cancer
  • a multiclass classifier was also built to distinguish within BECO cancers.
  • a model e.g., AI Model II
  • AI Model II was built with a total of 304 endometrial, 303 breast, 250 cervical and 262 ovarian cancer samples. These study samples were randomly divided (50%) into the training and testing sets.
  • the training sets contain a total of 152 Endometrial, 152 Breast, 127 Cervical and 131 Ovarian cancer samples.
  • For testing sets a total of 152 Endometrial, 151 Breast, 123 Cervical and 131 Ovarian cancer samples were grouped for test cases (Fig. 18, Table-C, Figure 13A).
  • a set of 124 normal samples were also kept in test set to test the accuracy of applying first BECO versus normal model and then applying multiclass model to distinguish between BECO cancers.
  • a multivariate classifier was derived from the training sets, and evaluated in the testing sets.
  • the multiclass model gave four scores to each sample: Endometrial score, Breast score, Cervical score and ovarian score.
  • the system 100 and the method 200 may efficiently detect and distinguish cancer samples from the normal controls using a first AI Model I, and further may efficiently detect and distinguish each individual cancer sample from the other cancer samples and the normal controls using a second AI Model II applied over the first AI Model I.
  • Serum samples were collected from two different US based clinical centers. The demographic and ethnic distribution of the specimens were shown in Fig. 16, Table-A. Controls and disease cases were catalogued according to age-group, BMI, ethnicity and stages of cancer. All diagnoses were made in accordance with uniform histological and pathological guidelines.
  • Serum samples were collected and processed according to standardized protocols. Serum samples selected for analysis were distributed into five batches, such as normal, endometrial cancer, breast cancer, cervical cancer and ovarian cancer. Each sample was assigned a unique laboratory identification number, which specified the order of processing and blinded laboratory personnel to sample identity. Samples were stored at -80C until use.
  • Metabolite extraction from serum was performed as explained previously. Briefly, all the serum samples were thawed on ice and mixed properly. 10 pl of each serum sample was taken in microfuge tube (1.5ml), (Genaxy, Cat No. GEN-MT-150-C. S) and then 30pl of chilled Methanol, (Merck, Cat.No.1.06018.1000) to the sample, vortexed briefly and then kept at - 20°C for 60 minutes.
  • the sample was then centrifuged (Sorvall Legend Microl7, Thermo Fisher Scientific, Cat.No. Ligend Micro 17) at 10000 rpm for 10 minutes. After centrifugation 27ul supernatant was collected in separate microfuge tube without disturbing the pellet and dried using Speed Vacuum, (ThermoFisher Scientific, Cat.No. SPD1030-230) at low energy for 30-35 minutes. Samples pellets were then re-suspended using 30ul methanol: water (1 : 1, water: methanol) mixture for injection. Or the samples can be stored at -20°C without re-suspending it.
  • The mobile phase was kept isocratic at 5% B for 1 min, increased to 95% B in 7 min, and held at 95% B for another 2 min; the mobile phase composition returned to 5% B in 14 min.
  • The ESI voltage was 4 kV.
  • The mass accuracy of the QExactive mass spectrometer was less than 5 ppm, and the instrument was calibrated on the recommended schedule prior to each batch run.
  • The mass scan range was 66.7-1000 Da, and the resolution was set to 35000.
  • The maximum inject time for the orbitrap was 100 ms, while the AGC target was optimized at 1e6. Representative images of the chromatograms obtained from normal control and the individual cancer samples are shown in Figure 4.
  • Mass errors are known to be present in metabolomics data. This means that the same identified metabolite would have a slightly different mass in different samples. This creates problems when the intensity of the same metabolite has to be compared across samples. This intensity comparison is required in the downstream AI-based analysis.
  • A fixed mass window size can be used to align the samples, but here a parts per million (ppm) error-based approach was used instead.
  • ppm: parts per million
  • We have adapted the virtual lock mass (VLM) based approach. This is based on the principle that mass errors are known to increase with mass. The approach was adapted to our datasets by combining the traditional VLM based approach with metabolite identification from the HMDB database.
  • The VLM boxes were defined using the masses of metabolites identified by HMDB database search across multiple samples.
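The difference between a fixed mass window and a ppm-based window can be sketched in a few lines. This is a minimal illustration only, not the alignment procedure of the invention: the function names, the 5 ppm tolerance and the example masses are assumptions chosen for the sketch.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of an observed m/z relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_ppm(observed_mz, reference_mz, tol_ppm=5.0):
    """True if the observed mass lies inside the ppm window around the reference.

    tol_ppm is a hypothetical tolerance chosen for illustration.
    """
    return abs(ppm_error(observed_mz, reference_mz)) <= tol_ppm


def assign_to_reference(observed_mzs, reference_mzs, tol_ppm=5.0):
    """Map each observed m/z to the closest reference mass, or None if no
    reference lies within the ppm tolerance."""
    assignments = []
    for mz in observed_mzs:
        best = min(reference_mzs, key=lambda ref: abs(ppm_error(mz, ref)))
        assignments.append(best if within_ppm(mz, best, tol_ppm) else None)
    return assignments


# The same absolute offset of 0.004 Da is 4 ppm at m/z 1000 but 40 ppm at m/z 100,
# which is why a ppm window scales with mass while a fixed Da window does not.
print(within_ppm(1000.004, 1000.0), within_ppm(100.004, 100.0))
```

In a VLM-style scheme the reference masses would come from ions seen consistently across samples; here they are simply passed in as a list.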
  • Metabolite Ions Filtering: The metabolite ions were filtered based on their frequency of presence in samples. A 20% cutoff was used, wherein only those metabolites present in more than 20% of samples were used in subsequent analysis.
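The 20% presence cutoff can be illustrated with a short sketch. The data layout (one list of intensities per sample, with None marking an undetected ion) and the function name are assumptions made for this example.

```python
def filter_by_presence(intensity_rows, min_fraction=0.20):
    """Return the column indices of ions detected in more than min_fraction of samples.

    intensity_rows: list of samples, each a list of intensities (None = not detected).
    """
    n_samples = len(intensity_rows)
    n_ions = len(intensity_rows[0])
    kept = []
    for j in range(n_ions):
        present = sum(1 for row in intensity_rows if row[j] is not None)
        # Strictly greater than the cutoff, matching "present in more than 20%".
        if present / n_samples > min_fraction:
            kept.append(j)
    return kept


# Toy matrix: 5 samples x 3 ions (values are made up).
toy = [
    [10.0, 3.0, None],
    [11.0, None, 7.0],
    [9.5, None, None],
    [10.2, None, 6.5],
    [10.8, None, None],
]
print(filter_by_presence(toy))
```

In the toy matrix the second ion is detected in exactly 20% of samples and is therefore dropped, since the cutoff is strict.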
  • Missing value imputation: Missing values in untargeted metabolomics data are known to be problematic. A k-nearest neighbors (KNN) approach was applied to impute the missing values in the data to make the data more homogeneous and amenable to AI-based analysis.
  • KNN: k-nearest neighbors
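A simplified version of KNN imputation is sketched below. The source does not state the value of k, the distance metric or the software used, so those choices (k = 2, root-mean-square distance over shared features, mean of the neighbors' values) are assumptions for illustration only.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest samples.

    Distance is the root-mean-square difference over features observed in both
    samples; k and the metric are illustrative assumptions.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other samples that have this feature observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy intensity matrix (values are made up); the first sample lacks the third ion.
toy = [
    [1.0, 2.0, None],
    [1.0, 2.0, 10.0],
    [1.1, 2.1, 12.0],
    [9.0, 9.0, 100.0],
]
print(knn_impute(toy)[0])
```

The missing value is filled from the two samples whose observed intensities are most similar, so the distant fourth sample does not influence it.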
  • Class balancing parameters were configured in the model to deal with the imbalance of classes in the training dataset.
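The source does not say which class balancing parameters were used. One common heuristic, shown here purely as an assumption, is inverse-frequency weighting, where each class weight equals n_samples / (n_classes * class_count); this is the same rule scikit-learn documents for `class_weight="balanced"`. The counts below are the whole-study totals from the example study (250 normal controls and 1119 cancer cases), not the training split.

```python
def balanced_class_weights(class_counts):
    """Inverse-frequency class weights: total / (n_classes * count) per class."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * count)
            for label, count in class_counts.items()}


# Whole-study counts used only for illustration; the actual training split differs.
weights = balanced_class_weights({"normal": 250, "BECO": 1119})
print(weights)
```

Under this weighting the minority class (normal controls) receives the larger weight, so errors on it cost more during training.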
  • $x_0$ is a constant number
  • A one-versus-rest (OVR) multiclass classification model was built using the training samples to give AI model 2.
  • A two-layered modeling scheme was applied on the test set. That is, firstly, AI model 1, differentiating BECO versus normal samples, was applied on the test set. Then, AI model 2 was applied on the resulting predicted BECO samples. This resulted in 4 scores for each sample, with each score defining the probability of the respective sample belonging to one of the four classes.
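  • $\mathrm{y\_score1} = y_0 + y_1 I_1 + y_2 I_2 + y_3 I_3 + \dots + y_{2823} I_{2823}$
  • $\mathrm{y\_score2} = z_0 + z_1 I_1 + z_2 I_2 + z_3 I_3 + \dots + z_{2823} I_{2823}$
  • $\mathrm{y\_score3} = a_0 + a_1 I_1 + a_2 I_2 + a_3 I_3 + \dots + a_{2823} I_{2823}$
  • $\mathrm{y\_score4} = b_0 + b_1 I_1 + b_2 I_2 + b_3 I_3 + \dots + b_{2823} I_{2823}$
  • $y_0$, $z_0$, $a_0$, $b_0$ are constant numbers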
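The two-layered scheme and the four linear scores can be sketched as follows. This is a toy illustration under stated assumptions: the coefficients, the two-feature inputs, the decision threshold of 0 for the first model, and the rule of picking the class with the highest score are all hypothetical, and the inputs are assumed to be metabolite ion intensities. The source does not specify how scores are converted to probabilities, so raw linear scores are used here.

```python
def linear_score(intercept, coefficients, intensities):
    """score = c0 + c1*I1 + c2*I2 + ... for one sample."""
    return intercept + sum(c * x for c, x in zip(coefficients, intensities))


def two_layer_predict(intensities, model1, model2, threshold=0.0):
    """Layer 1: BECO versus normal. Layer 2: one score per cancer class.

    model1: (intercept, coefficients)
    model2: dict mapping class name -> (intercept, coefficients)
    threshold: hypothetical cutoff on the layer-1 score.
    """
    s1 = linear_score(model1[0], model1[1], intensities)
    if s1 <= threshold:
        return "normal", {}
    scores = {name: linear_score(b0, coefs, intensities)
              for name, (b0, coefs) in model2.items()}
    return max(scores, key=scores.get), scores


# Made-up coefficients for two features (the source models use 2823).
model1 = (-1.0, [1.0, 1.0])
model2 = {
    "breast": (0.0, [1.0, 0.0]),
    "endometrial": (0.0, [0.0, 1.0]),
    "cervical": (0.5, [0.0, 0.0]),
    "ovarian": (-1.0, [0.5, 0.5]),
}
print(two_layer_predict([1.0, 2.0], model1, model2))
print(two_layer_predict([0.1, 0.2], model1, model2))
```

Only samples that the first model predicts as BECO reach the second model, which mirrors the order of application described for the test set.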
  • The multiclass model's Breast Score was plotted for Breast samples and for the set of Endometrial, Cervical and Ovarian (ECO) Cancer samples.
  • The model scores are clearly different between Breast and ECO Cancer samples; applying a threshold to differentiate between the two types results in a confusion matrix as shown.
  • The normal samples are also added to the controls to obtain the sensitivity and specificity values for Breast versus rest. Sensitivity, Specificity and Accuracy can be calculated from the formulae below:
  • The multiclass model's Cervical Score was plotted for Cervical samples and for the set of Endometrial, Breast and Ovarian (EBO) Cancer samples.
  • The model scores are clearly different between Cervical and EBO Cancer samples; applying a threshold to differentiate between the two types results in a confusion matrix as shown.
  • The normal samples are also added to the controls to obtain the sensitivity and specificity values for Cervical versus rest. Sensitivity, Specificity and Accuracy can be calculated from the formulae below:
  • The multiclass model differentiates BECO samples from each other, as well as BECO samples from normal.
  • The scores obtained from the multiclass model were plotted.
  • The multiclass model's Ovarian Score was plotted for Ovarian samples and for the set of Endometrial, Breast and Cervical (EBC) Cancer samples.
  • The model scores are clearly different between Ovarian and EBC Cancer samples; applying a threshold to differentiate between the two types results in a confusion matrix as shown.
  • The normal samples are also added to the controls to obtain the sensitivity and specificity values for Ovarian versus rest. Sensitivity, Specificity and Accuracy can be calculated from the formulae below:
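The formulae themselves are not reproduced in this excerpt. The standard definitions from a confusion matrix are sensitivity = TP / (TP + FN), specificity = TN / (TN + FP) and accuracy = (TP + TN) / (TP + FP + TN + FN), and they are assumed here to be the ones intended. The counts in the sketch are hypothetical and are not results from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # fraction of true cases detected
    specificity = tn / (tn + fp)          # fraction of controls correctly rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts for one "cancer type versus rest" comparison.
sens, spec, acc = confusion_metrics(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

For a one-versus-rest comparison, the "rest" group (other cancers plus normal samples) supplies the true negatives and false positives.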

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

An integrated method for the treatment and simultaneous detection of early stages of the four most prominent women cancers is described. The cancers that can be diagnosed by this method are endometrial cancer, breast cancer, cervical cancer, and ovarian cancer. The method combines global metabolome profiling of serum samples, from either controls or cases with the individual cancer, with data analysis using a machine learning algorithm to capture the complex metabolite signatures that specifically characterize early stages of the individual cancers. The detection accuracy obtained with this method is significantly superior to that of other existing methods. Additionally, the method enables simultaneous screening for all the four cancers in a single analysis.

Description

METHOD FOR EARLY TREATMENT AND DETECTION OF WOMEN SPECIFIC CANCERS
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority benefit of United States Application no. 63/072,482 filed 31 August 2020, the content of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD OF THE INVENTION
The present invention relates to the field of metabolomics, and more particularly the present invention relates to metabolite bio-signatures and their use for detection of early stage cancers in women.
BACKGROUND OF THE INVENTION
Cancer is a leading cause of death worldwide, with the disease burden expanding in countries of all income levels due to population growth and aging. In females, cancer is the second most important cause of death globally, with about 7 million new cases and over 3.5 million deaths being recorded each year (Cancer Epidemiol Biomarkers Prev 26(4). doi: 10.1158/1055-9965.EPI-16-0858). The greatest numbers of cancer cases and deaths among females are in Eastern Asia, followed by North America and South-Central Asia. The leading female-specific cancers are breast cancer, cervical cancer, uterine or endometrial cancer, and ovarian cancer. Of these, breast cancer is the most frequently diagnosed and accounts for 25% of cancer cases, along with 15% of cancer-related deaths among women across the world (CancerBase No. 11 [Internet], [cited 2015 July 30], http://globocan.iarc.fr). In comparison, cervical cancer is the fourth most frequently diagnosed cancer in women, with an estimate of over 500,000 cases worldwide. Uterine cancer accounts for about 5% and 2% of global cancer incidence and mortality among women, respectively, whereas ovarian cancer accounts for about 4% of cancers in women.
It is widely recognized that the most critical point for best prognosis is to identify cancer in its early stage as this could reduce death rates significantly in the long-term. Unfortunately, however, effective methodologies for early-stage cancer detection are either lacking, or not sufficiently sensitive, for many of the cancers that are of major public health relevance. For instance, in the case of female-specific cancers, there is yet no reliable method for detection of early-stage ovarian cancer, as well as for asymptomatic endometrial cancers. Similarly, for breast cancer, existing detection methods suffer from limitations that include either high cost, time consumption, and/or inadequate efficacy. For instance, although mammography is the commonly recommended method for early detection, the relatively high false-positive and false-negative rates, particularly in patients with dense breasts, present a problem. The efficacy of biomarker-based approaches, employing either DNA or protein markers, is also similarly compromised due either to poor penetration in the risk groups (DNA), or low circulating concentrations (proteins). Finally, while screening strategies for early-stage cervical cancer do exist, their impact has been limited in less developed regions of the world where about 85% of new diagnoses occur. These factors, therefore, emphasize the need for developing new methods that can detect early-stage female-specific cancers with a high degree of accuracy, and that are economical enough to be affordable to women across the economic spectrum. In this context, an integrated test that can simultaneously screen for all of the four female-specific cancers would provide a distinct advantage.
Metabolomics is an emerging field and is broadly defined as the comprehensive measurement of all metabolites and low-molecular-weight molecules in a biological specimen. Metabolomics affords profiling of much larger numbers of metabolites than are presently covered in standard clinical laboratory techniques. Hence it facilitates comprehensive coverage of biological processes and metabolic pathways. Consequently, it holds promise to serve as an essential objective lens in the molecular microscope for precision medicine. This is particularly relevant as metabolites have been described as proximal reporters of disease because their abundances in biological specimens are often directly related to pathogenic mechanisms.
The idea that the metabolite composition of biological fluids reflects the health of an individual has existed for a long time. Confidence in this supposition comes from experience with recent applications to find early metabolic indicators of disease in longitudinal cohorts, years before symptoms are clinically apparent — for example, in pancreatic cancer, type 2 diabetes, cardiovascular disease, memory impairment, and many other conditions. Metabolomics studies have also inspired work revealing novel insights into relationships between diet and disease, such as observations linking elevated branched chain amino acids and obesity to insulin resistance. Such studies therefore, provide strong support that metabolomics - coupled with multivariate statistical analysis - provides a relatively simple and efficient way to identify risk factors and/or biomarkers for disease.
Metabolomics is an especially relevant technique for cancer detection. Cancer cells have significantly altered metabolism and, therefore, the pattern of metabolites produced can yield a "signature" that is indicative of the cancer's presence or behavior. Importantly, and in contrast to gene expression profiling as a risk stratifier, this is a signal that originates directly or indirectly from micrometastatic disease, rather than one derived from features of the primary tumor. As a result, metabolome derived signatures provide a high-precision risk-stratifier for disease, with an accuracy that can far exceed those of methods based on DNA or protein markers. Untargeted metabolome profiles, however, are complex and multivariate in nature, and cannot be accurately analyzed by linear analytical methods. Such data, however, is readily amenable to the application of AI-based methodologies. By exploring non-linear variables in the data that correlate with defined clinical states, one can potentially extract metabolite signatures that are characteristic of a given disease state.
Metabolomics is now frequently used in oncology research, with particular emphasis on early diagnosis, monitoring, and prognosis of cancers. For example, several studies have exploited metabolomics analysis for both diagnosis and prognosis of breast cancer. Collectively, however, these studies have suffered from a variability in results, as well as limited accuracy. Similarly, the application of metabolomics for endometrial cancer resulted in the identification of metabolites that could predict the presence of cancer, tumor behavior, and also the pathological characteristics. These findings, however, await validation. A recent analysis identified metabolite signatures for cervical intraepithelial neoplasia and cervical cancer. The sample sizes though were relatively small and the discriminatory capacity of the test was suboptimal. Metabolomic approaches for diagnosis of ovarian cancer have recently been reviewed. The inference was that while metabolomics offers significant new opportunities for ovarian cancer diagnosis, further work needed to be done.
US 9459255 discloses amino acids that are useful in discriminating between breast cancer and breast cancer-free individuals. A multivariate discriminant was found, which included the concentrations of the identified amino acids as explanatory variables, that correlated significantly with the state of breast cancer. The sensitivity of the method, however, was only about 87% whereas the specificity was about 85%.
US 2011/0143444 discloses a method for evaluating female genital cancer, by using the amino acid concentrations in blood collected from subjects. This method evaluates the state of female genital cancer including at least one of cervical cancer, endometrial cancer, and ovarian cancer in the subject. The total number of subject samples tested, however, was small and the discriminatory power of the method was weak, ranging from 55% to 81% for the individual cancers. US 2017/0003291 is drawn to a method for diagnosing endometrial cancer by detecting, in a biological sample from a patient, variations in concentrations of specific lipids and some small metabolites. Using combined NMR and Mass spectrometry (MS) based metabolomics analysis, statistically significant changes were found in the serum of endometrial cancer patients in comparison with unaffected controls. However, despite the fact that two separate metabolome analysis techniques were employed, the resultant sensitivity and specificity of the method ranged only between 70% and 80%.
US 2017/0097355 describes methods for measuring metabolic changes useful in the differentiation between ovarian cancer and benign ovarian tumor. Two independent LC-MS-based metabolomics platforms, including a global lipidomics approach, were used to screen for differentially abundant plasma metabolites between cases with serous ovarian carcinoma and controls with benign serous ovarian tumor. While the combination of small molecule with lipidome profiling yielded a test with good sensitivity (95%), the specificity was less than 50%. This limits the utility of the test for patient screening.
Thus, it is clear from all of these studies that better methods, with higher fidelity, are required for early-stage diagnosis of female-specific cancers. Furthermore, it is also evident that screening for early-stage women-specific cancers would greatly benefit from the development of a single test that can simultaneously screen for all the four cancers.
SUMMARY OF THE INVENTION:
The present invention relates to a process that may be implemented to differentiate the women-specific cancer samples (breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls, and further to differentiate between the women-specific cancer samples to detect stage 0 / 1 endometrial cancer, breast cancer, cervical cancer and ovarian cancer from disease control individuals among adult women.
BRIEF DESCRIPTION OF THE DRAWINGS:
The present invention, by way of example, is described with reference to the following drawings. These drawings and the following description are added as example and merely to illustrate and understand the invention. However, the drawings and the following description should not be construed to limit the scope of the invention.
Figure 1 illustrates a schematic representation of the metabolomics process implemented in the present invention for differentiating the diseased cancer samples (for example, breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and further to identify each specific disease sample from among the disease cancer samples, in accordance with an embodiment of the present invention;
Figure 2 depicts an exemplary flowchart illustrating a method for the metabolomics process implemented in the present invention for differentiating the diseased cancer samples (for example, breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and further to identify each specific disease sample from among the disease cancer samples, in accordance with an embodiment of the present invention;
Figures 3 A-E each depict an exemplary bar graph illustrating the age-wise distribution of serum samples among normal controls (A) and cancer individuals (B)-(E), in accordance with an embodiment of the present invention;
Figure 4 depicts exemplary representative images of the chromatograms obtained from normal control and the individual cancer samples, in accordance with an embodiment of the present invention;
Figures 5 A-E each depicts exemplary graphical representation of age-wise distribution of total ions detected during untargeted metabolomics profiling, in accordance with an embodiment of the present invention;
Figure 6 Panels A and B depict the mass and retention time index for each ion box. The figure depicts the indices for each ion box which are used to calculate the mass (Panel A) and retention time windows (Panel B) for each ion box. Panel C depicts an exemplary graphical representation of the data preprocessing pipeline used to make the data amenable to AI models, in accordance with an embodiment of the present invention;
Figure 7 depicts exemplary graphical representation illustrating PCA Plot of the matrix of samples and metabolites versus metabolite intensity showing the clear separation of samples on the basis of their clinical information, in accordance with an embodiment of the present invention;
Figure 8. Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to make and test the AI models for distinguishing the Cancer group from the Normal controls, in accordance with an embodiment of the present invention; Panel B depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls, in accordance with an embodiment of the present invention;
Figure 9. Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to make and test the AI models for testing of the multiclass trained model separation of Endometrial Cancers, in accordance with an embodiment of the present invention;
Panel B depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls based on model scores, specifically testing of the multiclass trained model separation of Endometrial Cancers versus Other Cancers based on model’s Endometrial scores, with a resulting confusion matrix on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
Figure 10. Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to make and test the AI models for testing of the multiclass trained model separation of Breast Cancers, in accordance with an embodiment of the present invention;
Panel B depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls based on model scores, specifically testing of the multiclass trained model separation of Breast Cancers versus Other Cancers based on model’s Breast scores, with a resulting confusion matrix on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
Figure 11. Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to make and test the AI models for testing of the multiclass trained model separation of Cervical Cancers, in accordance with an embodiment of the present invention;
Panel B depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls based on model scores, specifically testing of the multiclass trained model separation of Cervical Cancers versus Other Cancers based on model’s Cervical scores, with a resulting confusion matrix on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
Figure 12. Panel A depicts an exemplary block diagram illustrating an AI workflow for detection of the disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to make and test the AI models for testing of the multiclass trained model separation of Ovarian Cancers, in accordance with an embodiment of the present invention;
Panel B depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls based on model scores, specifically testing of the multiclass trained model separation of Ovarian Cancers versus Other Cancers based on model’s Ovarian scores, with a resulting confusion matrix on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
Figure 13. Panel A depicts an exemplary block diagram illustrating an AI workflow, using the total sample set (See Fig. 16, Table-A), for detection of the disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from Normal Controls, specifically the AI workflow used to make and test the AI models for testing of the multiclass trained model for distinguishing between the individual Cancer groups (for example, between breast cancer, endometrial cancer, cervical cancer and ovarian cancer), in accordance with an embodiment of the present invention;
Panel B depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls based on model scores, specifically testing of the multiclass trained model separation of Endometrial Cancers versus Other Cancers based on model’s Endometrial scores, with a resulting confusion matrix on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
Panel C depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls based on model scores, specifically testing of the multiclass trained model separation of Breast Cancers versus Other Cancers based on model’s Breast scores, with a resulting confusion matrix on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
Panel D depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls based on model scores, specifically testing of the multiclass trained model separation of Cervical Cancers versus Other Cancers based on model’s Cervical scores, with a resulting confusion matrix on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
Panel E depicts an exemplary representation illustrating testing the trained model for disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Normal Controls showing clear separation of disease Cancers (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) versus Controls based on model scores, specifically testing of the multiclass trained model separation of Ovarian Cancers versus Other Cancers based on model’s Ovarian scores, with a resulting confusion matrix on applying a threshold shows high accuracy, sensitivity and specificity, in accordance with an embodiment of the present invention;
Figure 14. Depicts an exemplary graphical representation illustrating Coefficient of each metabolite in the signature of the BECO Cancer group, in accordance with an embodiment of the present invention;
Figure 15 Panel A depicts an exemplary graphical representation illustrating Coefficient of each metabolite in the signature of Endometrial Cancer, in accordance with an embodiment of the present invention;
Panel B depicts an exemplary graphical representation illustrating Coefficient of each metabolite in the signature of Breast Cancer, in accordance with an embodiment of the present invention;
Panel C depicts an exemplary graphical representation illustrating Coefficient of each metabolite in the signature of Cervical Cancer, in accordance with an embodiment of the present invention;
Panel D depicts an exemplary graphical representation illustrating Coefficient of each metabolite in the signature of Ovarian Cancer, in accordance with an embodiment of the present invention.
Figure 16 shows Table A, which depicts an exemplary table distribution illustrating the ethnicity and demographic distribution of samples under study, in accordance with an embodiment of the present invention;
Figure 17 shows Table B, which depicts an exemplary table distribution illustrating division of samples as training sets and testing sets for disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer). Table B depicts the ethnicity and demographic distribution of samples under study when divided into training and test sets to distinguish disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer) from normal controls, in accordance with an embodiment of the present invention;
Figure 18 shows Table C, which depicts an exemplary table distribution illustrating division of samples as training sets and testing sets for distinguishing within disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer). Table C depicts the ethnicity and demographic distribution of samples under study when divided into training and test sets to distinguish within disease Cancers samples (for example, breast cancer, endometrial cancer, cervical cancer and ovarian cancer), in accordance with an embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION:
Although specific terms are used in the following description for the sake of clarity, these terms are intended to refer only to the particular structure of the invention selected for illustration in the drawings, and are not intended to define or limit the scope of the invention.
References in the specification to “one embodiment” or “an embodiment” mean that a particular feature, structure, characteristic, or function described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
The present invention discloses embodiments that enable simultaneous screening for endometrial cancer, breast cancer, cervical cancer, and ovarian cancer in a single analysis. The present invention relates to a system and a method that may integrate global metabolome profiling with machine learning powered data analysis, to capture the disease-specific signatures.
In an embodiment, the invention may provide an integrated method for the simultaneous detection of early stages of the four most prominent women cancers. This method may further elaborate the process of untargeted metabolomics for detecting and measuring metabolic changes that are not only useful in the broad differentiation between cancer and healthy individual but also effectively, and simultaneously, distinguish each individual cancer from normal controls as well as the other women-specific cancers.
Although the detailed description herein explains and relates to the four women-specific cancers, which are endometrial cancer, breast cancer, cervical cancer, and ovarian cancer, the method explained here may not be restricted to detection of these four women-specific cancers only, and may be applied to the segregation and detection of other cancers in a biological mammal specimen from normal controls.
As described in some of the examples below, a Liquid Chromatography with mass spectrometry (abbreviated as LC-MS) based untargeted metabolomics approach may be used to screen differentially abundant serum metabolites from control cases (normal controls and disease controls) and test cases (i.e. either endometrial cancer, breast cancer, cervical cancer, or ovarian cancer). In a particular example study, which is conducted and presented herein, a total of 1369 serum samples were collected from participants. Among them, 250 were designated as normal controls, while 1119 were selected as test cases. In the test cases, the distribution of cancer serum samples was 304, 303, 250 and 262 for endometrial cancer, breast cancer, cervical cancer and ovarian cancer respectively. The potential utility of derived metabolite profiles to discriminate between cases and controls, in the example study of the present invention, was investigated through construction and evaluation of a multivariate classification matrix.
Before describing exemplary embodiments in greater detail, the following definitions are set forth to illustrate and define the meaning and scope of the terms used in the description. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Singleton, et al., DICTIONARY OF MICROBIOLOGY AND MOLECULAR BIOLOGY, 2D ED., John Wiley and Sons, New York (1994), and Hale & Markham, THE HARPER COLLINS DICTIONARY OF BIOLOGY, Harper Perennial, N.Y. (1991) provide one of skill with the general meaning of many of the terms used herein. Still, certain terms are defined below for the sake of clarity and ease of reference.
It must be noted that as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. For example, the term “a sample” refers to one or more samples, i.e., a single sample and multiple samples. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
The term “sample” as used herein relates to a material or mixture of materials, typically, although not necessarily, in liquid form, containing one or more analytes of interest. In one embodiment, the term as used in its broadest sense, refers to any mammalian material containing cells or producing cellular metabolites, such as, for example, tissue or fluid isolated from an individual (including without limitation plasma, serum, cerebrospinal fluid, lymph, tears, saliva and tissue sections) or from in vitro cell culture constituents, as well as samples from the environment. The term “sample” may also refer to a “biological sample”. As used herein, the term “a biological sample” refers to a whole organism or a subset of its tissues, cells or component parts (e.g. body fluids, including but not limited to blood, mucus, lymphatic fluid, synovial fluid, cerebrospinal fluid, saliva, amniotic fluid, amniotic cord blood, urine, vaginal fluid and semen). A “biological sample” can also refer to a homogenate, lysate or extract prepared from a whole organism or a subset of its tissues, cells or component parts, or a fraction or portion thereof, including but not limited to, for example, plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, blood cells, tumors, organs. In certain embodiments, the sample has been removed from an animal. Biological samples of the invention include cells.
Metabolite profile as used in the invention should be understood to be any defined set of values of quantitative results for metabolites that can be used for comparison to reference values or profiles derived from another sample or a group of samples. For instance, a metabolite profile of a sample from a diseased patient might be significantly different from a metabolite profile of a sample from a similarly matched healthy patient. Metabolites such as, but not limited to, amino acids, peptides, acylcarnitines, monosaccharides, lipids and phospholipids, prostaglandins, steroids, bile acids, and glyco- and phospholipids can be detected and/or quantified.
As used herein, untargeted metabolomics studies are characterized by the simultaneous measurement of many metabolites from biological samples. This strategy, known as top-down strategy, avoids the need for a prior specific hypothesis on a particular set of metabolites and, instead, analyses the global metabolomic profile. Consequently, these studies are characterized by the generation of large amounts of data. This data is not only characterized by its volume but also by its complexity and, therefore, there is a need for high performance bioinformatic tools.
As used herein, the term chromatography refers to a process in which a chemical mixture carried by a liquid or gas is separated into components as a result of differential distribution of the chemical entities as they flow around or over a stationary liquid or solid phase.
As used herein, the term high performance liquid chromatography or HPLC (also sometimes known as high pressure liquid chromatography) refers to liquid chromatography in which the degree of separation is increased by forcing the mobile phase under pressure through a stationary phase, typically a densely packed column. As used herein the term ultra-high performance liquid chromatography or UPLC or UHPLC (sometimes known as ultra-high pressure liquid chromatography) refers to HPLC which occurs at much higher pressures than traditional HPLC techniques. As used herein, the term sample injection refers to introducing an aliquot of a single sample into an analytical instrument, for example a mass spectrometer. This introduction may occur directly or indirectly. An indirect sample injection may be accomplished, for example, by injecting an aliquot of a sample into a HPLC or UPLC analytical column that is connected to a mass spectrometer in an on-line fashion.
As used herein, the term mass spectrometry or MS refers to an analytical technique to identify compounds by their mass. MS refers to methods of filtering, detecting and measuring ions based on their mass-to-charge ratio or m/z.
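As a worked illustration of the mass-to-charge ratio, the following minimal sketch computes the m/z of a protonated ion from a neutral monoisotopic mass. The function name `mz_protonated` and the glucose example are illustrative assumptions and are not part of the described method; only protonated ions of the form [M + zH]^z+ are modeled.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da (rounded)


def mz_protonated(neutral_mass: float, charge: int = 1) -> float:
    """Return m/z for an [M + zH]^z+ ion (protonation only)."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


# Glucose (C6H12O6) has a monoisotopic mass of about 180.0634 Da.
glucose_mz = mz_protonated(180.0634)
```

For a singly charged ion the m/z is simply the neutral mass plus one proton mass; for a doubly charged ion the same sum is divided by two, which is why multiply charged ions appear at lower m/z values.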
As used herein, the term operating in positive ion mode refers to those mass spectrometry methods where positive ions are generated and detected.
As discussed herein, the term electron ionization or EI refers to methods in which an analyte of interest in a gaseous or vapor phase interacts with a flow of electrons. Impact of the electrons with the analyte produces analyte ions, which may then be subjected to a mass spectrometry technique.
As used herein, the term electrospray ionization or ESI refers to methods in which a solution is passed along a short length of capillary tube, to the end of which is applied a high positive or negative electric potential. Solution reaching the end of the tube is vaporized (nebulized) into a jet or spray of very small droplets of solution in solvent vapor. This mist of droplets flows through an evaporation chamber, which is heated slightly to prevent condensation and to evaporate solvent. As the droplets get smaller, the electrical surface charge density increases until such time that the natural repulsion between like charges causes ions as well as neutral molecules to be released.
As used herein, data processing typically involves a data reduction step called filtering. Noise filters reduce the data based on a calculated noise threshold. In this respect, data below a certain signal-to-noise ratio is filtered out. Content-based filtering of the results leverages, for example, disease-specific knowledge to concentrate on relevant metabolite aspects of the disease under investigation.
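The noise-filtering idea can be sketched as follows. This is a minimal illustration only: the function name `filter_by_signal_to_noise`, the median-based noise estimate, and the default threshold of 3 are assumptions made for the example, since the text does not specify how the noise threshold is calculated.

```python
from statistics import median


def filter_by_signal_to_noise(intensities, min_snr=3.0):
    """Keep (index, intensity) pairs whose intensity / noise >= min_snr.

    Assumption: noise is estimated as the median intensity of the trace;
    real pipelines may use a different noise model.
    """
    noise = median(intensities)
    if noise <= 0:
        raise ValueError("noise estimate must be positive")
    return [(i, x) for i, x in enumerate(intensities) if x / noise >= min_snr]


# Two peaks (100 and 50) stand out against a baseline of roughly 10.
kept = filter_by_signal_to_noise([10, 12, 11, 100, 9, 50, 10])
```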
After pre-processed data derived from mass spectrometry analysis has been technically validated, statistical analysis can proceed. Depending on the design of a metabolite profiling study, a sample or several samples derived from healthy controls and patients are compared to reveal differences, i.e. biomarkers that can be utilized to characterize a disease at the molecular level. In another embodiment, samples are derived from patients participating in a clinical trial, where a novel drug compound is under investigation and compared to an approved drug.
Artificial intelligence (AI) is, at its core, a technical discipline that researches and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. AI is used in research to perform complex tasks that would otherwise require human cognitive ability. The major core concepts of AI are machine learning and deep learning. Machine learning is the study of algorithms that learn from examples and experience. It is based on the idea that there exist patterns in the data that can be identified and used for future predictions. Deep learning, in turn, uses multiple layers to learn from the data. The depth of the model is represented by the number of layers in the model. In deep learning, the learning phase is done through a neural network, which is an architecture in which the layers are stacked on top of each other.
FIG. 1 illustrates a schematic representation of a system for implementing a metabolomics process for differentiating the diseased cancer samples (for example, breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and further identifying each specific disease sample from among the diseased cancer samples, in accordance with an embodiment of the present invention. FIG. 1 shows a metabolomics system 100 that may comprise one or more components performing one or more of the following functions:
1. At least one sample collecting device 102 for collecting and storing biological samples from one or more organisms or individuals;
2. At least one precipitating device 104 for protein precipitation and extraction of metabolite;
3. At least one phase separation device 106 for drying the metabolite extract extracted from the precipitating device;
4. At least one Liquid Chromatography device 110 with a mass spectrometer 112 (abbreviated, hereinafter, as LC-MS) for analyzing the dried metabolite extract using the LC-MS technique;
5. At least one computing device 114 with Compound Discoverer software for automated data extraction, for example, automated data extraction for metabolite ions and their related features using the Compound Discoverer software; and
6. At least one computing device 116 that may execute one or more AI/ML algorithms for AI-based pattern recognition for finally identifying, differentiating and presenting the cancer samples from the normal control samples and further to identify, differentiate and present individual cancer samples within the identified cancer samples.
Thus, not only can the detection of, and differentiation between, cancer and healthy individuals be achieved with the present system 100, but the present system 100 may also effectively, and simultaneously, distinguish each individual cancer from normal controls as well as from the other cancer samples.
It should be again noted that FIGs. 1-15 will be explained taking examples and hence, should not be considered as limiting to those specific examples only. For example, the FIGs. 1-15 are described, herein, considering a sample size of 1369 taken from adult female volunteers.
In an embodiment, the present system 100 may be implemented to differentiate between stage 0 / I endometrial cancer, breast cancer, cervical cancer and ovarian cancer from healthy/disease control individuals among adult women.
A number of samples are acquired from adult female volunteers who are either free of any cancer (normal control) (n = 250), have stage 0/I of endometrial cancer (n = 304), breast cancer (n = 303), cervical cancer (n = 250) or ovarian cancer (n = 262), or are at early stages of any other cancer (disease controls). The samples are collected and stored in the sample collecting device 102. In an embodiment, the sample collecting device 102 may be a test tube or similar tube.
Further, the present system 100 may include a metabolite extraction which may be achieved by precipitating serum proteins with chilled methanol, according to an embodiment. Thus, the precipitation device 104 may be used here in order to extract metabolite from the samples collected by precipitating serum proteins with chilled methanol. In an embodiment, the precipitation device 104 may be a test tube or similar tube.
The supernatant may be collected as the metabolite extract, and may further be dried before use. For the process of drying, in an embodiment, the phase separation device 106 may be used that may dry the metabolite extract using speed vacuum.
Further, in an embodiment, the dried extract may be reconstituted in an aqueous solution in a mobile phase using a device 108. Thereafter, the ion spectrum of the resultant samples, derived from the reconstitution phase, may be generated by UHPLC-HRMS, where samples may be first resolved by an Ultra High-Performance Liquid Chromatography (abbreviated as UHPLC) device 110, and then the ion spectra may be subsequently obtained through a high-resolution mass spectrometer 112 (abbreviated as HRMS). Using the UHPLC device 110 and the HRMS 112, ions in the metabolite extract may be measured, and the masses for the ions may be measured based on their mass-to-charge ratio or m/z.
Thereafter, the features of the ion spectra accumulated in the metabolic profile may be extracted using the computing device 114 that may execute, using one or more processors, compound discoverer software (for example, the Compound Discoverer software from Thermo Fisher Scientific).
The masses obtained for the ions in the metabolome profile, using the UHPLC device 110 and the HRMS 112, may be aligned across all the samples. This may be done to enable comparison of the peak intensity of each ion across all the samples. For example, a pool of known internal standards may be used for retention time (RT) alignment with an error window of ±0.02 min, followed by peak picking and identification of metabolites.
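One way to picture RT alignment against internal standards is sketched here. The text does not state how the alignment is computed, so the constant-offset correction (median offset of the internal standards) is an assumption made for illustration, as are all function names; only the ±0.02 min window comes from the description.

```python
from statistics import median

RT_WINDOW_MIN = 0.02  # error window from the text, in minutes


def rt_shift(expected_rts, observed_rts):
    """Median retention-time offset of the internal standards in one sample."""
    return median(obs - exp for exp, obs in zip(expected_rts, observed_rts))


def align_sample(feature_rts, expected_rts, observed_rts):
    """Subtract the sample's internal-standard offset from every feature RT."""
    shift = rt_shift(expected_rts, observed_rts)
    return [rt - shift for rt in feature_rts]


def same_feature(rt_a, rt_b, window=RT_WINDOW_MIN):
    """True when two aligned retention times fall within the error window."""
    return abs(rt_a - rt_b) <= window


# Internal standards elute 0.05 min late in this hypothetical sample.
aligned = align_sample([2.05, 7.55], [1.00, 5.00, 9.00], [1.05, 5.05, 9.05])
```

After the correction, features from different samples can be compared with `same_feature`, so that peak intensities are matched ion by ion before peak picking and identification.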
The present system 100 may also include functions for minimizing the errors that may be generated in measurement of the masses for the ions. To normalize for unavoidable, but minor, variations in mass (m/z), a parts per million (ppm) error-based approach may be used, according to an embodiment. In another embodiment, a modified virtual lock mass-based approach may also be used. This is based on the principle that mass errors are known to increase with mass. This modified virtual lock mass-based approach may be used and adapted according to the datasets in examples of the invention. This may be done by combining the traditional virtual lock mass approach with metabolite identification from the Human Metabolome Database (HMDB). Specifically, the virtual lock mass boxes may be defined using the masses of metabolites identified by an HMDB database search across multiple samples. Subsequently, the metabolite ions may be filtered based on the frequency of presence in samples. In an embodiment, a 20% cutoff may be used for metabolite ion filtering, meaning that ions present in greater than 20% of samples may be used in subsequent analysis.
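The ppm error and the presence-based filter can be illustrated with a short sketch. The function names, the 5 ppm default tolerance, and the dictionary layout of the intensity table are assumptions for the example; only the 20% presence cutoff is taken from the description. The virtual lock mass correction itself is not modeled here.

```python
def ppm_error(observed_mz, reference_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_ppm(observed_mz, reference_mz, tolerance_ppm=5.0):
    """True when the observed m/z matches the reference within the tolerance."""
    return abs(ppm_error(observed_mz, reference_mz)) <= tolerance_ppm


def filter_by_presence(intensity_table, min_fraction=0.20):
    """Keep ions detected in more than min_fraction of the samples.

    intensity_table maps an ion identifier to its per-sample intensities;
    a value of 0 or None means the ion was not detected in that sample.
    """
    kept = {}
    for ion, values in intensity_table.items():
        present = sum(1 for v in values if v)
        if present / len(values) > min_fraction:
            kept[ion] = values
    return kept


table = {"a": [1, 0, 0, 0, 0], "b": [1, 2, 0, 0, 0], "c": [0, 0, 0, 0, 0]}
retained = filter_by_presence(table)  # only "b" exceeds the 20% cutoff
```

Because a ppm tolerance is relative, the same tolerance corresponds to a wider absolute window at higher m/z, which is consistent with the principle that mass errors increase with mass.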
Thereafter, AI/ML models are applied, for statistical analysis of the samples, to the featured metabolite ions that have been obtained, measured, aligned, and corrected as explained above. The computing device 116 may execute one or more AI/ML algorithms for applying the AI/ML models for statistical analysis of the samples. By executing the one or more AI/ML algorithms, using one or more processors, at the computing device 116, one or more first AI/ML models may be generated to distinguish the cancer samples (breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls. Further, in another embodiment, by executing the one or more AI/ML algorithms, using one or more processors, at the computing device 116, one or more second AI/ML models may be generated that may be layered on the one or more first AI/ML models to further distinguish and identify a particular cancer sample (e.g., breast cancer) from the other cancer samples (e.g., endometrial cancer, cervical cancer, and ovarian cancer) and from the normal controls.
While generating the AI/ML models, the computing device 116 may follow one or more of the following steps:
i. While applying the AI model, a logistic regression function may be applied on a training dataset to find a function separating cancer samples from normal control samples;
ii. Class balancing parameters may be configured in the AI model to deal with the imbalance of classes in the training dataset.
This may generate a first AI Model Layer I that may separate normal control samples from cancer samples.
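The two steps above (logistic regression plus class balancing) can be sketched in plain Python. This is a hedged illustration and not the implementation used in the example study: the gradient-descent training loop, the learning rate, the epoch count, and the "balanced" weighting heuristic n_samples / (n_classes × class_count) are all assumptions, and every function name is hypothetical.

```python
import math


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, epochs=1000, lr=0.5):
    """Fit weights and bias by gradient descent on a class-weighted log-loss.

    X: list of feature vectors; y: list of 0 (normal control) / 1 (cancer).
    """
    cw = balanced_class_weights(y)
    w = [0.0] * len(X[0])
    b = 0.0
    total = sum(cw[label] for label in y)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = (sigmoid(z) - yi) * cw[yi]
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def cancer_score(w, b, x):
    """Score in (0, 1) that sample x belongs to the cancer class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

With balanced weights, the minority class (here, the normal controls) contributes as much to the loss as the majority class, so the fitted boundary is not pulled toward predicting every sample as cancer.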
Another AI model layer, which may be termed the second AI Model Layer II, may also be generated and layered on the first AI Model Layer I to distinguish and identify a particular cancer sample (e.g., breast cancer) from the other cancer samples (e.g., endometrial cancer, cervical cancer, and ovarian cancer) and from the normal controls. In an embodiment, the second AI Model Layer II may be generated in a similar way as the first AI Model Layer I, and may further include a one-versus-rest (OVR) multiclass classification model that may be built using the training samples to give the second AI Model Layer II.
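The one-versus-rest idea can be sketched as a thin wrapper that trains one binary scorer per cancer class. To keep the example self-contained, a toy centroid-based scorer stands in for the binary model; this stand-in, the function names, and the two-dimensional toy data are assumptions for illustration, whereas the description itself refers to a model generated in a similar way as Layer I.

```python
import math


def train_centroid_scorer(X, y):
    """Toy binary scorer, a stand-in for any binary model.

    Returns a function mapping x to a score in (0, 1) that is higher when x
    is closer to the positive-class centroid than to the negative one.
    """
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = centroid([x for x, label in zip(X, y) if label == 1])
    neg = centroid([x for x, label in zip(X, y) if label == 0])

    def score(x):
        return 1.0 / (1.0 + math.exp(math.dist(x, pos) - math.dist(x, neg)))

    return score


def train_one_vs_rest(X, labels, train_binary=train_centroid_scorer):
    """Train one binary scorer per class: that class (1) versus the rest (0)."""
    return {
        cls: train_binary(X, [1 if label == cls else 0 for label in labels])
        for cls in sorted(set(labels))
    }


def class_scores(models, x):
    """One score per class for sample x; scores are not forced to sum to 1."""
    return {cls: score(x) for cls, score in models.items()}


X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1], [0, 10], [1, 10]]
labels = ["breast"] * 2 + ["endometrial"] * 2 + ["cervical"] * 2 + ["ovarian"] * 2
models = train_one_vs_rest(X, labels)
scores = class_scores(models, [5, 5.5])
```

Each sample thus receives four scores, one per cancer class, and a threshold can be applied to any one of them to separate that cancer from the rest.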
Thus, a two-layered modeling scheme may be applied on the test set, in an embodiment. That is, firstly, AI Model I differentiating cancer samples versus normal samples may be applied on the test set. Then, AI Model II may be applied on the resulting predicted cancer samples. Now, if four cancer types are taken, for example breast cancer, endometrial cancer, cervical cancer, and ovarian cancer, then this two-layered modeling scheme may result in four scores for each sample, with each score defining the probability of the respective sample belonging to one of the four classes.
In a particular example, the above process as implemented by the system 100 was performed: out of a total of 1369 samples, 1119 samples were breast, endometrial, cervical, or ovarian cancer samples, and the remaining 250 were normal controls without any disease. The data was randomly partitioned into training and test datasets in approximately equal proportion. This resulted in 562 BECO (abbreviation for Breast cancer, Endometrial cancer, Cervical cancer, and Ovarian cancer) cancer samples and 126 normal controls in the training set, and 557 BECO cancer samples and 124 normal controls in the test set. AI Model I was trained on the training set (see, e.g., Figure 8A) and tested on the test set to obtain accuracy, sensitivity, and specificity values. While applying AI Model I, a logistic regression function may be applied on the training dataset to find a function separating BECO cancer samples from normal control samples.
Further, class balancing parameters may be configured in AI Model I to deal with the imbalance of classes in the training dataset. Thus, AI Model I may first be trained using the training dataset of samples. The trained Model I may find a score for each sample. Then, the trained Model I may be evaluated on a test set to determine the accuracy. The sensitivity, specificity, and accuracy obtained in this example were 98%, 98%, and 98%, respectively.
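The random partition and the evaluation metrics can be sketched as follows. The helper names and the fixed seed are assumptions, and the split shown here is a plain random halving, so it does not reproduce the exact training and test counts reported in the example. The metric definitions are the standard ones: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), and accuracy = (TP + TN) / N.

```python
import random


def split_half(sample_ids, seed=0):
    """Randomly partition sample IDs into two roughly equal halves."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    mid = (len(ids) + 1) // 2
    return ids[:mid], ids[mid:]


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for 1 (cancer) / 0 (control).

    Assumes both classes occur in y_true, so no denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


train_ids, test_ids = split_half(range(1369))
metrics = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                         [1, 1, 1, 0, 0, 0, 0, 0, 0, 1])
```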
In yet another exemplary embodiment, another multiclass AI Model II may be layered on top of the earlier AI Model I. AI Model II acted on the predicted cancer samples from AI Model I (breast, endometrial, cervical, or ovarian) and gave a multiclass score to each sample: one score for each disease class, denoting the probability of the sample belonging to the respective disease class. Here, out of a total of 1119 BECO samples, 304 samples were endometrial cancer, 303 breast cancer, 250 cervical cancer, and 262 ovarian cancer. The data was randomly partitioned into training and test datasets in approximately equal proportion. This resulted in 152 endometrial cancer, 152 breast cancer, 127 cervical cancer, and 131 ovarian cancer samples in the training set, and 152 endometrial cancer, 151 breast cancer, 123 cervical cancer, and 131 ovarian cancer samples in the test set. Another set of 124 normal control samples was added to the test set. Then, a one-versus-rest (OVR) multiclass classification model was built using the training samples to give AI Model II. Then, a two-layered modeling scheme (the AI Model II layer over the AI Model I layer) was applied on the test set. That is, firstly, AI Model I differentiating BECO versus normal samples was applied on the test set. Then, AI Model II was applied on the resulting predicted BECO samples. This resulted in four scores for each sample, with each score defining the probability of the respective sample belonging to one of the four classes. In some embodiments, the system 100 may be further implemented for determining the accuracy of the multiclass model in differentiating specifically endometrial cancer from the other three cancers within the BECO group, as well as differentiating BECO samples from the normal controls disclosed herein. The scores obtained from the multiclass model were plotted.
A plot of the multiclass model Endometrial Cancer Score for endometrial cancer samples against the scores for breast, cervical, and ovarian (BCO) cancer samples gives scores that clearly differentiate endometrial cancer from the other three women-specific cancers (breast cancer, cervical cancer, and ovarian cancer). Applying a threshold to differentiate between the two types results in a confusion matrix. Here, the normal samples are also added to the controls to get the sensitivity and specificity values for endometrial cancer versus all the other groups, including normal controls. Sensitivity, specificity, and accuracy were calculated to be 100%, 97%, and 98%, respectively (see FIG. 9). We also tested the accuracy of the multiclass model in differentiating specifically endometrial cancer using the entire set of samples for testing and training. In this exercise, the training set included 152 endometrial cancer samples versus 410 BCO samples. The testing set included 152 endometrial cancer samples versus 405 BCO samples and 124 normal controls (see FIG. 13A). The sensitivity, specificity, and accuracy were calculated to be 87%, 93%, and 91.6%, respectively (see, e.g., FIG. 13B).
In some embodiments, the system 100 may be further implemented for determining the accuracy of the multiclass model in differentiating specifically breast cancer from the other three cancers within the BECO group, as well as differentiating BECO samples from the normal controls disclosed herein. The scores obtained from the multiclass model were plotted. A plot of the multiclass model Breast Score for breast cancer samples against the scores for the set of endometrial, cervical, and ovarian (ECO) cancer samples gives scores that clearly differentiate breast cancer from the other three women-specific cancers (endometrial cancer, cervical cancer, and ovarian cancer). Applying a threshold to differentiate between the two types results in a confusion matrix. Here, the normal samples are also added to the controls to get the sensitivity and specificity values for breast cancer versus all the other groups, including normal controls. Sensitivity, specificity, and accuracy were calculated to be 97%, 100%, and 99%, respectively (see FIG. 10). We also tested the accuracy of the multiclass model in differentiating specifically breast cancer using the entire set of samples for testing and training. In this exercise, the training set included 152 breast cancer samples versus 410 ECO samples. The testing set included 151 breast cancer samples versus 406 ECO samples and 124 normal controls (see FIG. 13A). The sensitivity, specificity, and accuracy were calculated to be 93%, 95%, and 94.4%, respectively (see, e.g., FIG. 13C).
In another embodiment, the system 100 may be further implemented for specifically identifying cervical cancer cases from the other cancers in the BECO group. Scores from the multiclass model for cervical cancer samples (Cervical Score) were plotted against the scores for endometrial, breast, and ovarian (EBO) cancer samples. The model scores clearly differentiated between cervical and EBO cancer samples. Applying a threshold to differentiate between the two types results in a confusion matrix. Here, the normal samples are also added to the controls to get the sensitivity and specificity values for cervical cancer versus the rest. Sensitivity, specificity, and accuracy were calculated to be 87%, 100%, and 98%, respectively (see FIG. 11). We also tested the accuracy of the multiclass model in differentiating specifically cervical cancer using the entire set of samples for testing and training. In this exercise, the training set included 127 cervical cancer samples versus 435 EBO samples. The testing set included 123 cervical cancer samples versus 434 EBO samples and 124 normal controls (see FIG. 13A). The sensitivity, specificity, and accuracy were calculated to be 87%, 90%, and 87.6%, respectively (see, e.g., FIG. 13D).
In another similar embodiment, the system 100 may be further implemented for specifically discriminating ovarian cancer samples from the other three cancers, and from control cases. For this, we plotted the scores from the multiclass model for the ovarian cancer samples (Ovarian Score) against the scores for the other three women-specific cancers, EBC (endometrial cancer, breast cancer, cervical cancer). The model scores differentiated between ovarian and EBC cancer samples. Applying a threshold to differentiate between the two types results in a confusion matrix. Here, the normal samples are also added to the controls to get the sensitivity and specificity values for ovarian cancer versus the rest. Sensitivity, specificity, and accuracy were calculated to be 100%, 99%, and 99%, respectively (see FIG. 12). We also tested the accuracy of the multiclass model in differentiating specifically ovarian cancer using the entire set of samples for testing and training. In this exercise, the training set included 131 ovarian cancer samples versus 431 EBC samples. The testing set included 131 ovarian cancer samples versus 426 EBC samples and 124 normal controls. The sensitivity, specificity, and accuracy were calculated to be 86%, 93%, and 92%, respectively (see, e.g., FIG. 13E).
Figure 2 illustrates a flow chart for implementing a metabolomics process for differentiating the diseased cancer samples (for example, breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls and further identifying each specific disease sample from among the diseased cancer samples, in accordance with an embodiment of the present invention. Figure 2 should be read and understood in conjunction with Figures 1-15, and may also include at least one or more embodiments of Figures 1-15, without deviating from the meaning and scope of the present invention.
Further, the method 200 may include at least one or more steps 202-212, individually or in combination.
Also, the method 200 is explained by taking an example of women-specific four cancers including Breast cancer, Endometrial cancer, Cervical cancer and Ovarian cancer (abbreviated as BECO), and should not be considered to limit the meaning and scope of the present invention.
Figure 2 shows a metabolomics process 200 that may include a step 202 for collecting and storing a number of samples from adult female volunteers who are either free of any cancer (normal control) (n = 250), have stage 0/I of endometrial cancer (n = 304), breast cancer (n = 303), cervical cancer (n = 250) or ovarian cancer (n = 262), or are at early stages of any other cancer (disease controls). The samples are collected and stored in the sample collecting device 102.
Further, the method includes a step 204 of extracting metabolites, which may be achieved by precipitating serum proteins with chilled methanol. In an embodiment, the precipitation device 104 may be a test tube. The supernatant may be collected as the metabolite extract. Thereafter, at a step 206, the metabolite extract may be dried before use. For the process of drying, in an embodiment, the phase separation device 106 may be used, which may dry the metabolite extract using speed vacuum. At a step 208, in an embodiment, the dried extract may be reconstituted in an aqueous solution in a mobile phase using a device 108. Thereafter, at steps 210-212, analysis of the resultant samples, derived from the reconstitution phase, may be performed by UHPLC-HRMS. At step 210, the reconstituted samples may be first resolved by the Ultra High-Performance Liquid Chromatography (abbreviated as UHPLC) device 110, and then, at a step 212, the ion spectra may be subsequently obtained through the high-resolution mass spectrometer 112 (abbreviated as HRMS). Using the UHPLC device 110 and the HRMS 112, ions in the metabolite extract may be measured, and the masses for the ions may be measured based on their mass-to-charge ratio or m/z. Thereafter, at a step 214, the features of the ion spectra accumulated in the metabolic profile may be extracted using the computing device 114 that may execute, using one or more processors, compound discoverer software. Furthermore, in an embodiment, the method 200 may include a step 216 of aligning the masses obtained for the ions in the metabolome profile, using the UHPLC device 110 and the HRMS 112, across all the samples. This may be done to enable comparison of the peak intensity of each ion across all the samples.
Further, in an embodiment, the method 200 may include a step 218 of minimizing the errors that may be generated in measurement of the masses for the ions. To normalize for unavoidable, but minor, variations in mass (m/z), a parts per million (ppm) error-based approach may be used, according to an embodiment. In another embodiment, a modified virtual lock mass-based approach may also be used. This is based on the principle that mass errors are known to increase with mass. This modified virtual lock mass-based approach may be used and adapted according to the datasets in examples of the invention. This may be done by combining the traditional virtual lock mass approach with metabolite identification from the Human Metabolome Database (HMDB). Specifically, the virtual lock mass boxes may be defined using the masses of metabolites identified by an HMDB database search across multiple samples. Subsequently, the metabolite ions may be filtered based on the frequency of presence in samples. In an embodiment, a 20% cutoff may be used for metabolite ion filtering, meaning that ions present in greater than 20% of samples may be used in subsequent analysis.
The steps 216 and 218 may be optionally included in the method 200. Further, the flow of the steps 216 and 218 may be altered, and may not be restricted to as shown in the method 200 in figure 2.
The method 200 may furthermore include a step 220 of applying AI/ML models or algorithms on the featured metabolite ions that have been obtained, measured, aligned, and corrected as explained above. The step 220 may include applying AI/ML models for statistical analysis of the samples. The computing device 116 may execute one or more AI/ML algorithms for applying the AI/ML models for statistical analysis of the samples.
The step 220 of applying AI/ML models or algorithms may include creating and applying at least two AI models, namely a first AI Model I and a second AI Model II. By executing the one or more AI/ML algorithms, using one or more processors, at the computing device 116, one or more first AI/ML models may be generated to distinguish the cancer samples (breast cancer, endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls. Further, in another embodiment, by executing the one or more AI/ML algorithms, using one or more processors, at the computing device 116, one or more second AI/ML models may be generated that may be layered on the one or more first AI/ML models to further distinguish and identify a particular cancer sample (e.g., breast cancer) from the other cancer samples (e.g., endometrial cancer, cervical cancer, and ovarian cancer) and from the normal controls.
While generating the AI/ML models at step 220, the computing device 116 may follow one or more of the following steps:
i. While applying the AI model, a logistic regression function may be applied on a training dataset to find a function separating cancer samples from normal control samples;
ii. Class balancing parameters may be configured in the AI model to deal with the imbalance of classes in the training dataset.
This may generate a first Al Model Layer I, at the step 220, that may separate normal control samples from cancer Samples. Another Al Model layer, may be termed as second Al Model Layer II, may also be generated, at step 220 that may be layered on the first Al Model Layer 1 to distinguish and identify a particular cancer sample (e.g., breast cancer) from the other cancer samples (e.g., endometrial cancer, cervical cancer, and ovarian cancer) from the normal controls. In an embodiment, the second Al Model Layer II may be generated in a similar way as the first Al Model Layer I, and may further include a one versus rest (OVR) classifier multiclass classification model that may be made using the training samples to give the second Al Model Layer II.
Thus, a two layered modeling scheme may be applied on the test set, in an embodiment. That is, firstly, Al model I differentiating cancer samples versus normal samples may be applied on the test set. Then, Al model II may be applied on the resulting predicted cancer samples. Now, if 4 cancer samples are taken, for example breast cancer, endometrial cancer, cervical cancer, and ovarian cancer, then this two-layered modeling scheme may result in 4 scores for each sample, with each score defining probability of the respective sample belonging to one of the four classes.
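The control flow of the two-layered scheme can be sketched as follows. The two scoring callables stand in for trained models and are hypothetical, as are the threshold value and the choice of reporting the highest-scoring class; the Examples instead evaluate each class score against its own threshold.

```python
def two_layer_predict(sample, layer1_score, layer2_scores, threshold):
    """Apply Model I first; only samples predicted as cancer reach Model II.

    layer1_score:  callable, sample -> cancer-versus-normal score
    layer2_scores: callable, sample -> dict mapping cancer type to score
    Returns (label, scores); scores is None for predicted normal samples.
    """
    if layer1_score(sample) < threshold:
        return "normal", None
    scores = layer2_scores(sample)
    # Assumption: report the class with the highest score.
    return max(scores, key=scores.get), scores


# Toy stand-ins for trained models (hypothetical values, not study data).
def toy_layer1(sample):
    return sample["cancer_score"]


def toy_layer2(sample):
    return sample["class_scores"]


normal_like = {"cancer_score": 0.1, "class_scores": {}}
cancer_like = {
    "cancer_score": 0.9,
    "class_scores": {"breast": 0.70, "endometrial": 0.10,
                     "cervical": 0.15, "ovarian": 0.05},
}
first = two_layer_predict(normal_like, toy_layer1, toy_layer2, threshold=0.5)
second = two_layer_predict(cancer_like, toy_layer1, toy_layer2, threshold=0.5)
```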
In some embodiments, the cancer is ovarian. In some embodiments, after diagnosing or detecting the cancer, thereafter is performed surgery to remove one ovary or both ovaries. In some embodiments, the surgery includes removing at least one affected ovary and its fallopian tube. In some embodiments, surgery removes both ovaries and the uterus.
In some embodiments, after diagnosing or detecting the ovarian cancer, thereafter is administered one or more chemotherapy drugs for ovarian cancer. In some embodiments, the chemotherapy drugs are injected into a vein, into the abdomen (intraperitoneal chemotherapy), or taken by mouth.
In some embodiments, the cancer is endometrial cancer. In some embodiments, after diagnosing or detecting the endometrial cancer, thereafter is performed surgery to remove the uterus (hysterectomy), and in some embodiments to also remove the fallopian tubes and/or one or more ovaries (salpingo-oophorectomy).
In some embodiments, after diagnosing or detecting the endometrial cancer, thereafter is performed radiation therapy using a machine outside the body to administer radiation to the endometrial cancer. In some embodiments, radiation is placed inside the body, for example, by internal radiation (brachytherapy), which involves placing a radiation-filled device, such as small seeds, wires or a cylinder, inside the patient's vagina.
In some embodiments, after diagnosing or detecting the endometrial cancer, thereafter is administered one or more chemotherapy drugs for endometrial cancer. In some embodiments, the one or more chemotherapy drugs are administered orally or through veins (intravenously).
In some embodiments, the cancer is cervical cancer. In some embodiments, after diagnosing or detecting the cervical cancer, thereafter is performed surgery to remove the cancer only. In some embodiments, the surgery is a cone biopsy, which leaves most of the cervix intact. In some embodiments, the surgery removes the cervix (trachelectomy). In some embodiments, the surgery is a radical trachelectomy procedure, which removes the cervix and some surrounding tissue. In some embodiments the surgery removes the cervix and uterus (hysterectomy).
In some embodiments, after diagnosing or detecting the cervical cancer, thereafter is performed radiation therapy using a machine to administer radiation to the cervical cancer. In some embodiments, the radiation is external, by directing a radiation beam at the affected area of the body (external beam radiation therapy). In some embodiments, the radiation is internal, by placing a device filled with radioactive material inside the patient's vagina, usually for only a few minutes (brachytherapy).
In some embodiments, after diagnosing or detecting the cervical cancer, thereafter is administered one or more chemotherapy drugs for cervical cancer. In some embodiments, the one or more chemotherapy drugs are administered orally or through veins (intravenously).
In some embodiments, the cancer is breast cancer. In some embodiments, after diagnosing or detecting the breast cancer, thereafter is performed surgery to remove the breast cancer (lumpectomy), optionally with a comparatively (with respect to the cancer) small margin of surrounding healthy tissue and/or optionally undergoing chemotherapy before surgery to shrink a tumor and make it possible to remove cancer completely with a lumpectomy procedure.
In some embodiments, after diagnosing or detecting the breast cancer, thereafter is performed surgery to remove the entire breast (mastectomy). In some embodiments, surgery removes all of the breast tissue — the lobules, ducts, fatty tissue and some skin, including the nipple and areola (total or simple mastectomy). In some embodiments, further surgery removes one or more limited number of lymph nodes (sentinel node biopsy).
The present invention is illustrated by examples. The examples are meant for illustrative purposes only and should not be construed as limiting. The examples below are described in detail, while implementing the system and method described in Figures 1-2, respectively.
EXAMPLES
Untargeted metabolomics of serum using liquid chromatography-mass spectrometry was used for early-stage distinction of women-specific cancers from the control cases. The numbers of specimens collected from endometrial cancer, breast cancer, cervical cancer and ovarian cancer subjects were 304, 303, 250 and 262, respectively (Fig. 16, Table A). The number of specimens collected for normal control cases was 250 (Fig. 16, Table A).
Fig. 16, Table A: Details of the specimen collected.
The untargeted metabolomics approach (See e.g., Figure-4) generated a large list of metabolites, which was further divided into subsets for normal control, endometrial cancer, breast cancer, cervical cancer and ovarian cancer (See e.g., Figure-5). The mass and retention time index for these metabolite ions are shown in Figure 6A&B. The numbers of identified metabolites in these groups were 5895, 5971, 5982, 6300 and 6336, respectively. The total number of unique metabolites identified across all groups in the present study was 7596. The plant and drug metabolites were removed from this database. Next, the data was passed through our data processing pipeline (See e.g., Figure-6C). Briefly, samples were first aligned using a combination of the VLM approach and the identified metabolites to make a matrix of 1369 samples and 6893 metabolites along with the corresponding intensity information. These intensity values were transformed to the log10 scale. Then, in an embodiment, metabolite ion filtering was performed to find metabolites consistently present in samples. Then, in an embodiment, data normalization and missing value imputation were performed on the data. This resulted in a matrix of a total of 2823 metabolites across 1369 samples. Out of 1369 samples, 304 were endometrial cancer, 303 were breast cancer, 250 were cervical cancer, 262 were ovarian cancer, and 250 were normal control samples (Fig. 16, Table A).
To find whether there was any difference in these samples based on the metabolite profiles, the matrix generated above was used. A PCA plot was made using the matrix as shown in Figure-7. Figure 7 clearly shows that each cancer can be distinguished from the others as well as from healthy samples based on their metabolic data. To quantify how well these can be distinguished, an AI analysis (See e.g., Figures 1-2, 8-15) was done on the data as described below to find common patterns in metabolite variations within cancer samples that differ from control samples. Furthermore, a classification model was built on the detected metabolite ions with random distribution of samples into testing and training sets (See e.g., Figures 1-2, 8-15). The first such model (e.g., AI Model I) was built for the BECO cancers (Endometrial cancer + Breast cancer + Cervical cancer + Ovarian cancer), taking 1119 BECO cancer samples and 250 control cases into consideration. Further, these study samples (n=1369) were randomly divided (50%) into training and testing sets (See e.g., Figure-8A) (Fig. 17, Table-B). A multivariate classifier was derived from the training set and evaluated on the testing set, and a confusion matrix with predicted and true labels was generated. This ultimately distinguished BECO cancer candidates from the controls with 98%, 98.3% and 98% sensitivity, specificity and accuracy, respectively (See e.g., Figure-8B).
Further, a multiclass classifier was also built to distinguish within BECO cancers. Here, a model (e.g., AI Model II) was built with a total of 304 Endometrial, 303 Breast, 250 Cervical and 262 Ovarian cancer samples. These study samples were randomly divided (50%) into the training and testing sets. The training set contained a total of 152 Endometrial, 152 Breast, 127 Cervical and 131 Ovarian cancer samples. Similarly, the testing set contained a total of 152 Endometrial, 151 Breast, 123 Cervical and 131 Ovarian cancer samples (Fig. 18, Table-C, Figure 13A). A set of 124 normal samples was also kept in the test set to test the accuracy of first applying the BECO versus normal model and then applying the multiclass model to distinguish between BECO cancers. A multivariate classifier was derived from the training set and evaluated on the testing set. The multiclass model gave four scores to each sample: Endometrial score, Breast score, Cervical score and Ovarian score.
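A random division of samples into two halves can be sketched as below. The function name and the seed are assumptions. A purely random split like this one does not fix the number of samples per class, so the per-class counts it yields depend on the seed and will generally differ from those reported in Table C.

```python
import random


def random_half_split(sample_ids, seed=0):
    """Shuffle the sample identifiers and cut them into two halves.

    A seeded generator is used so that the partition is reproducible.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    half = (len(ids) + 1) // 2
    return ids[:half], ids[half:]


# 1369 hypothetical sample identifiers, matching the study size.
train_ids, test_ids = random_half_split(range(1369), seed=42)
```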
To test the accuracy of using endometrial cancer scores to find endometrial cancers, a confusion matrix with predicted and true label was generated based on applying a threshold on endometrial scores which ultimately leads to distinction of endometrial cancer candidate from the others with 87%, 93% and 91.6% of sensitivity, specificity and accuracy respectively (See e.g., Figures 1-2, 13B).
To test the accuracy of using breast cancer scores to find breast cancers, a confusion matrix with predicted and true label was generated based on applying a threshold on breast scores which ultimately leads to distinction of breast cancer candidate from the others with 93%, 95% and 94.4% of sensitivity, specificity and accuracy respectively (See e.g., Figures 1-2, 13C).
To test the accuracy of using cervical cancer scores to find cervical cancers, a confusion matrix with predicted and true label was generated based on applying a threshold on cervical scores which ultimately leads to distinction of cervical cancer candidate from the others with 87%, 90% and 87.6% of sensitivity, specificity and accuracy respectively (See e.g., Figures 1-2, 13D).
To test the accuracy of using ovarian cancer scores to find ovarian cancers, a confusion matrix with predicted and true label was generated based on applying a threshold on ovarian scores which ultimately leads to distinction of ovarian cancer candidate from the others with 86%, 93% and 92% of sensitivity, specificity, and accuracy respectively (See e.g., Figures 1-2, 13E).
Hence, as explained above, the system 100 and the method 200 may efficiently detect and distinguish cancer samples from the normal controls using a first AI Model I, and further may efficiently detect and distinguish each individual cancer sample from the other cancer samples and the normal controls using a second AI Model II applied over the first AI Model I.
The following is an explanation of the exemplary processes and devices that may be used in the system 100 and the method 200 for executing the metabolomics process, and that were used in the present study.
Subjects and Methods
Serum samples were collected from two different US-based clinical centers. The demographic and ethnic distribution of the specimens is shown in Fig. 16, Table-A. Controls and disease cases were catalogued according to age group, BMI, ethnicity and stage of cancer. All diagnoses were made in accordance with uniform histological and pathological guidelines.
Serum Specimens
Blood samples were collected and processed according to standardized protocols. Serum samples selected for analysis were distributed into five batches: normal, endometrial cancer, breast cancer, cervical cancer and ovarian cancer. Each sample was assigned a unique laboratory identification number, which specified the order of processing and blinded laboratory personnel to sample identity. Samples were stored at -80°C until use.
Sample Preparation
Metabolite extraction from serum was performed as explained previously. Briefly, all the serum samples were thawed on ice and mixed properly. 10 µl of each serum sample was taken in a microfuge tube (1.5 ml) (Genaxy, Cat No. GEN-MT-150-C. S), and then 30 µl of chilled methanol (Merck, Cat.No.1.06018.1000) was added to the sample, which was vortexed briefly and then kept at -20°C for 60 minutes.
The sample was then centrifuged (Sorvall Legend Micro 17, Thermo Fisher Scientific, Cat.No. Legend Micro 17) at 10000 rpm for 10 minutes. After centrifugation, 27 µl of supernatant was collected in a separate microfuge tube without disturbing the pellet and dried using a Speed Vacuum (ThermoFisher Scientific, Cat.No. SPD1030-230) at low energy for 30-35 minutes. Sample pellets were then re-suspended using 30 µl of a methanol:water (1:1) mixture for injection. Alternatively, the samples can be stored at -20°C without re-suspension.
LC-MS/MS Analysis
Untargeted LC-MS/MS metabolomics experiments were performed using a Dionex LC system (Ultimate 3000) coupled online with a QExactive Plus (Thermo Scientific). Each extracted metabolite sample was injected (10 µl for positive ESI ionization) onto an Acquity UPLC HSS T3 column from Waters (1.8 micron, dimensions 2.1 x 100 mm, Part No. 186003539), which was heated to 40°C. The flow rate was 0.3 ml/min. Mobile phase A was water + 0.1% formic acid, and mobile phase B was methanol + 0.1% formic acid. The mobile phase was kept isocratic at 5% B for 1 min, was increased to 95% B in 7 min and kept for another 2 min at 95% B, and the mobile phase composition returned to 5% B in 14 min. The ESI voltage was 4 kV. The mass accuracy of the QExactive mass spectrometer was less than 5 ppm, and it was calibrated on the recommended schedule prior to each batch run. The mass scan range was 66.7-1000 Da, and the resolution was set to 35000. The maximum inject time for the orbitrap was 100 ms, while the AGC target was optimized at 1e6. Representative images of the chromatograms obtained from normal control and the individual cancer samples are shown in Figure-4.
Results
The demographic and ethnicity distributions (Figure-3) (Fig. 16, Table A) for control and cancer cases were well balanced with respect to frequency-matching covariates such as age, race, BMI and stage of cancer. None of the observed differences in the distribution of these variables between control and disease cases, in either the training or the testing set, reached statistical significance. The majority of the disease cases (~80%) were from stage I. A schematic of the entire method is depicted in Figure-1, where the major steps of the process are explained graphically. The extracted metabolites from the sera were injected into the Dionex LC system coupled online with the QExactive Plus mass spectrometer. A representative image of the chromatogram (Figure-4) is shown to locate the spectral changes in the control and disease cases. The differences in the chromatograms were further significantly enhanced using the Compound Discoverer (Thermo Scientific) software.
The data was first subjected to preprocessing as shown schematically in Fig. 6. The various steps in data preprocessing are mentioned below:
1. Incorporating mass errors in the data:
Mass errors are known to be present in metabolomics data. This means that the same identified metabolite in different samples would have slightly different mass. This creates problems when the intensity of the same metabolite has to be compared across samples. This intensity comparison is required in the downstream AI-based analysis. Usually, a fixed mass window size is used to align the samples, but here we have used a parts per million (ppm) error-based approach. Briefly, we have adapted the virtual lock mass (VLM) based approach. This is based on the principle that mass errors are known to increase with mass. This approach was adapted according to our datasets by combining the traditional VLM-based approach with metabolite identification from the HMDB database. Specifically, the VLM boxes were defined using the masses of metabolites identified by HMDB database search across multiple samples.
2. Metabolite Ions Filtering: The metabolite ions were filtered based on the frequency of presence in samples. A 20% cutoff was used, wherein only those metabolites present in greater than 20% of samples were used in subsequent analysis.
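The frequency-of-presence filter can be sketched as follows, assuming the aligned data is held as a list of sample rows with `None` marking a metabolite ion that was not detected in a sample. The data layout and the function name are assumptions; the strict "greater than 20%" comparison follows the text.

```python
def filter_by_presence(matrix, min_fraction=0.20):
    """Return indices of metabolite columns present in more than
    min_fraction of samples.

    matrix: list of rows (one per sample); None marks an undetected ion.
    """
    n_samples = len(matrix)
    n_features = len(matrix[0]) if matrix else 0
    kept = []
    for j in range(n_features):
        present = sum(1 for row in matrix if row[j] is not None)
        if present / n_samples > min_fraction:
            kept.append(j)
    return kept


# Five hypothetical samples and three metabolite ions.
toy = [
    [7.1, None, 5.0],
    [7.3, None, None],
    [6.9, 4.2, None],
    [7.0, None, 5.2],
    [7.2, None, None],
]
kept_columns = filter_by_presence(toy)
# Column 1 is present in exactly 20% of samples, so it is dropped.
```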
3. Data Normalization: Owing to the variations in the metabolic data across various conditions of the mass spectrometer, normalization methods are needed to minimize the variations in the data. Different normalization methods were tried such as Quantile Normalization, Variance Stabilization Normalization, Best Normalization, Probabilistic Quotient Normalization. Quantile Normalization (QN) was selected as the one performing best across various conditions of the experiment. QN method was further adapted to our datasets to enable normalization of new samples with respect to training datasets and testing of one sample at a time.
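One way to normalize a new sample against the training data, one sample at a time, is to store a reference distribution derived from the training set and map each new sample onto it by rank. The sketch below follows that idea; it is not the authors' adapted QN method, and it assumes complete samples of equal length with no special handling of ties.

```python
def reference_distribution(training_samples):
    """Mean of the k-th smallest value across training samples, per rank k."""
    sorted_rows = [sorted(row) for row in training_samples]
    n = len(sorted_rows)
    return [sum(column) / n for column in zip(*sorted_rows)]


def quantile_normalize(sample, reference):
    """Replace each value in the sample by the reference value at its rank."""
    order = sorted(range(len(sample)), key=lambda j: sample[j])
    normalized = [0.0] * len(sample)
    for rank, j in enumerate(order):
        normalized[j] = reference[rank]
    return normalized


# Hypothetical log-intensities for two training samples and one new sample.
reference = reference_distribution([[5, 2, 3], [4, 1, 6]])
new_sample = quantile_normalize([10, 30, 20], reference)
```

Because the reference is computed once from the training set, a test sample can be normalized without access to any other test sample.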
4. Missing value imputation: Missing values in untargeted metabolomics data are known to be problematic. A k-nearest neighbors (KNN) approach was applied to impute the missing values in the data to make the data more homogeneous and amenable to AI-based analysis.
5. AI modeling of the data: Now with the above data, AI models were made to differentiate cancers from normal and then between the individual cancers.
Keeping in mind clinical applications of the AI model in the present invention, a layered approach was used here in which first, an AI model was developed to differentiate BECO cancers from normal controls. Out of a total of 1369 samples, 304 samples were Endometrial Cancer, 303 Breast Cancer, 250 Cervical Cancer, 262 Ovarian Cancer and 250 Normal Control samples. To determine whether there is any difference in these samples based on metabolic data, the matrix generated above was used. A PCA plot was made using the matrix as shown in Figure-7. The figure clearly shows that each cancer can be distinguished from the others as well as from normal control samples based on their metabolic data. To quantify how well these can be distinguished, an AI analysis was done on the data as described below to find common patterns in metabolite variations within cancer samples that differ from normal control samples.
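A minimal KNN imputation sketch is given below. The source does not state the value of k, the distance measure, or whether neighbours are samples or metabolites; this version assumes neighbours are samples, uses a root-mean-square distance over the features both rows have observed, and fills a gap with the mean of the k nearest rows in which that feature is observed.

```python
import math


def knn_impute(matrix, k=2):
    """Fill None entries using the k nearest rows that observe the feature."""

    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(matrix)
                if m != i and other[j] is not None
            )[:k]
            if donors:
                imputed[i][j] = sum(v for _, v in donors) / len(donors)
    return imputed


# Hypothetical intensities; the first sample lacks the third metabolite.
toy = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [9.0, 9.0, 40.0],
]
filled = knn_impute(toy, k=2)
```

Distances are computed on the original matrix, so values imputed earlier in the loop do not influence later imputations.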
Distinguishing women-specific cancer samples from controls
Out of a total of 1369 samples, 1119 samples were Breast, Endometrial, Cervical or Ovarian Cancer (BECO) and 250 were normal controls. The controls included normal samples without any disease. The data was randomly partitioned into training and test datasets in equal proportion. This resulted in 562 BECO Cancer samples and 126 Controls in the training set, and 557 BECO Cancer samples and 124 Controls in the test set (Fig. 17, Table B). The AI model was trained on the training set (See e.g., Fig. 8A) and tested on the test set to obtain Accuracy, Sensitivity and Specificity values. The logistic regression function was applied on the training data to find a function separating BECO Cancer samples versus Control samples. Class balancing parameters were configured in the model to deal with the imbalance of classes in the training dataset. The trained algorithm finds a score for each sample according to the formula below:
$$y_{\mathrm{score}} = x_0 + x_1 I_1 + x_2 I_2 + x_3 I_3 + \dots + x_{2823} I_{2823}$$
Here, $x_0$ is a constant and $I_i$ ($1 \le i \le 2823$) is the intensity of metabolite $i$ present in the respective sample. Figure-14 gives the value of the coefficient $x_i$ ($1 \le i \le 2823$) for each metabolite. Any value near this value may be used as a signature for differentiating between BECO Cancer cases versus Controls.
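The score is a constant plus a weighted sum of metabolite intensities, which can be computed as in the sketch below. The three coefficients and intensities are hypothetical placeholders, not values from Figure-14, and the function names are assumptions. The sigmoid is the standard logistic regression mapping from such a linear score to a probability.

```python
import math


def model_score(intercept, coefficients, intensities):
    """Linear score: intercept plus the sum of coefficient * intensity."""
    if len(coefficients) != len(intensities):
        raise ValueError("one coefficient is needed per metabolite intensity")
    return intercept + sum(c * v for c, v in zip(coefficients, intensities))


def sigmoid(score):
    """Map a linear score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical three-metabolite example (the study used 2823 metabolites).
score = model_score(0.5, [2.0, -1.0, 0.5], [1.0, 3.0, 4.0])
probability = sigmoid(score)
```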
The evaluation of the trained model as applied on the test set for a single partition of the data is shown, for example, in Figure-8B. The scatter plot shows the Model Score for Controls and BECO Cancer cases. The model scores are clearly seen to be different between Controls and BECO Cancer samples, and applying a threshold of 5 to differentiate between the two types results in the confusion matrix shown. Sensitivity, Specificity and Accuracy can be calculated from the formulae below:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
This results in Accuracy of 98%, Sensitivity of 98% and Specificity of 98.3%. (Fig. 8B)
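The accuracy, sensitivity and specificity definitions translate directly into code. The confusion-matrix counts used below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the counts behind Figure-8B.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts: 100 true cancer samples and 50 true controls.
metrics = classification_metrics(tp=90, tn=45, fp=5, fn=10)
```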
Differentiating Endometrial, Breast, Cervical, and Ovarian Cancer from each other
In the second step, another multiclass AI model was layered on top of the first model; it acted on the predicted cancer samples from the first model (breast, endometrial, cervical or ovarian) and gave a multiclass score to each sample: one score for each disease class denoting the probability of the sample belonging to the respective disease class.
Here, out of a total of 1119 BECO samples, 304 samples were Endometrial Cancer, 303 Breast Cancer, 250 Cervical Cancer and 262 Ovarian Cancer. The data was randomly partitioned into training and test datasets in equal proportion as shown, for example, in Figs. 9-13. This resulted in 152 Endometrial Cancer, 152 Breast Cancer, 127 Cervical Cancer and 131 Ovarian Cancer samples in the training set, and 152 Endometrial Cancer, 151 Breast Cancer, 123 Cervical Cancer and 131 Ovarian Cancer samples in the test set (Fig. 18, Table C) (Fig. 13A). Another set of 124 normal control samples was added to the test set (Fig. 18, Table C) (Fig. 13A). Then, a one versus rest (OVR) multiclass classification model was made using the training samples to give AI model 2. Then, a two-layered modeling scheme was applied on the test set. That is, firstly, AI model 1 differentiating BECO versus normal samples was applied on the test set. Then, AI model 2 was applied on the resulting predicted BECO samples. This resulted in four scores for each sample, with each score defining the probability of the respective sample belonging to one of the four classes.
For the multiclass model, AI model 2, a one versus rest (OVR) multiclass classification model was made using the training samples. The trained algorithm finds four scores for each sample according to the formulae below:
$$y_{\mathrm{score1}} = y_0 + y_1 I_1 + y_2 I_2 + y_3 I_3 + \dots + y_{2823} I_{2823}$$
$$y_{\mathrm{score2}} = z_0 + z_1 I_1 + z_2 I_2 + z_3 I_3 + \dots + z_{2823} I_{2823}$$
$$y_{\mathrm{score3}} = a_0 + a_1 I_1 + a_2 I_2 + a_3 I_3 + \dots + a_{2823} I_{2823}$$
$$y_{\mathrm{score4}} = b_0 + b_1 I_1 + b_2 I_2 + b_3 I_3 + \dots + b_{2823} I_{2823}$$
Here, $y_0$, $z_0$, $a_0$ and $b_0$ are constants and $I_i$ ($1 \le i \le 2823$) is the intensity of metabolite $i$ present in the respective sample. Figure-15 A-D gives the values of the coefficients $y_i$, $z_i$, $a_i$ and $b_i$ ($1 \le i \le 2823$) for each metabolite. Any value near this value may be used as a signature for differentiating within BECO samples.
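In a one versus rest arrangement, each class has its own intercept and coefficient vector, and the same intensities are scored once per class. The sketch below computes four such scores and passes each through a sigmoid to obtain a probability-like value; that final mapping, the function name and the two-metabolite toy coefficients are assumptions, not values from Figure-15.

```python
import math


def ovr_scores(models, intensities):
    """Score one sample against every class-specific linear model.

    models: dict mapping class name to (intercept, coefficient list).
    Returns a dict mapping class name to a value between 0 and 1.
    """
    scores = {}
    for name, (intercept, coefficients) in models.items():
        linear = intercept + sum(
            c * v for c, v in zip(coefficients, intensities))
        scores[name] = 1.0 / (1.0 + math.exp(-linear))
    return scores


# Hypothetical models over two metabolites (the study used 2823).
toy_models = {
    "endometrial": (0.0, [1.0, -1.0]),
    "breast": (0.0, [-1.0, 1.0]),
    "cervical": (-1.0, [0.0, 0.0]),
    "ovarian": (-2.0, [0.0, 0.0]),
}
sample_scores = ovr_scores(toy_models, [2.0, 0.5])
top_class = max(sample_scores, key=sample_scores.get)
```

Each score is produced independently of the others, so the four values need not sum to one unless they are normalized afterwards.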
To find out how well this multiclass model differentiates BECO samples from each other as well as from normal samples, the scores obtained from the multiclass model were plotted. As shown, for example, in Figs. 9 and 13, the multiclass model Endometrial Score was plotted for Endometrial samples and for the set of Breast, Cervical and Ovarian (BCO) Cancer samples. The model scores are clearly seen to be different between Endometrial and BCO Cancer samples, and applying a threshold to differentiate between the two types results in the confusion matrix shown. Here, the normal samples are also added to the controls to get the sensitivity and specificity values for Endometrial versus rest. Sensitivity, Specificity and Accuracy can be calculated from the formulae below:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
This results in Accuracy of 91.6%, Sensitivity of 87% and Specificity of 93%. (Fig. 13B)
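Applying a threshold to one class score yields a one versus rest confusion matrix in which the other cancer types and the normal samples together form the negative group. A minimal sketch, with hypothetical scores, labels and threshold:

```python
def confusion_from_scores(scores, true_labels, positive_class, threshold):
    """Count TP, TN, FP and FN for one class versus all other labels.

    A sample is predicted positive when its score is at or above threshold.
    """
    tp = tn = fp = fn = 0
    for score, label in zip(scores, true_labels):
        predicted_positive = score >= threshold
        actual_positive = label == positive_class
        if predicted_positive and actual_positive:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif actual_positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


# Hypothetical Endometrial scores for seven test samples; normal samples
# are counted with the other cancers as negatives.
endometrial_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1]
labels = ["endometrial", "endometrial", "endometrial",
          "breast", "cervical", "ovarian", "normal"]
counts = confusion_from_scores(
    endometrial_scores, labels, "endometrial", threshold=0.5)
```

Raising the threshold trades sensitivity for specificity, which is why each class score is evaluated with its own threshold.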
To find out how well the multiclass model differentiates BECO samples from each other as well as from normal samples, the scores obtained from the multiclass model were plotted. As shown, for example, in Figs. 10 and 13, the multiclass model Breast Score was plotted for Breast samples and for the set of Endometrial, Cervical and Ovarian (ECO) Cancer samples. The model scores are clearly seen to be different between Breast and ECO Cancer samples, and applying a threshold to differentiate between the two types results in the confusion matrix shown. Here, the normal samples are also added to the controls to get the sensitivity and specificity values for Breast versus rest. Sensitivity, Specificity and Accuracy can be calculated from the formulae below:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
This results in Accuracy of 94.4%, Sensitivity of 93% and Specificity of 95%. (Fig. 13C)
To find out how well the multiclass model differentiates BECO samples from each other as well as from normal controls, the scores obtained from the multiclass model were plotted. As shown, for example, in Figs. 11 and 13, the multiclass model Cervical Score was plotted for Cervical samples and for the set of Endometrial, Breast and Ovarian (EBO) Cancer samples. The model scores are clearly seen to be different between Cervical and EBO Cancer samples, and applying a threshold to differentiate between the two types results in the confusion matrix shown. Here, the normal samples are also added to the controls to get the sensitivity and specificity values for Cervical versus rest. Sensitivity, Specificity and Accuracy can be calculated from the formulae below:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
This results in Accuracy of 87.6%, Sensitivity of 87% and Specificity of 90%. (Fig. 13D)
To find out how well the multiclass model differentiates BECO samples from each other as well as from normal samples, the scores obtained from the multiclass model were plotted. As shown, for example, in Figs. 12 and 13, the multiclass model Ovarian Score was plotted for Ovarian samples and for the set of Endometrial, Breast and Cervical (EBC) Cancer samples. The model scores are clearly seen to be different between Ovarian and EBC Cancer samples, and applying a threshold to differentiate between the two types results in the confusion matrix shown. Here, the normal samples are also added to the controls to get the sensitivity and specificity values for Ovarian versus rest. Sensitivity, Specificity and Accuracy can be calculated from the formulae below:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
This results in Accuracy of 92%, Sensitivity of 86% and Specificity of 93%. (Fig. 13E)
Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any component(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature or component of any or all the claims.
It is intended that the disclosure and examples be considered as exemplary only. Though the present disclosure includes examples from manufacturing systems, such as CNC machines, the system and method disclosed herein may be employed for various businesses as would be appreciated by one skilled in the art. The references to manufacturing systems and CNC machines used here are intended to be applied or extended to the larger scope and should not be construed as restricting the scope and practice of present invention.

Claims

What is claimed is:
1. A system for detection of early stages cancer in an organism in a single analysis, the system comprising: at least one sample collecting device for collecting one or more biological fluid samples from one or more organisms or individuals; at least one precipitating device for extraction of one or more metabolite extracts from the one or more biological fluid samples by precipitation of protein present in the biological fluid samples with chilled alcohol including at least methanol; at least one phase separation device for drying the one or more metabolite extracts extracted from the at least one precipitating device; at least one device for reconstituting the one or more dried metabolite extracts in aqueous solutions in a mobile phase; at least one Liquid Chromatography (LC) device with a mass spectrometer (MS) (abbreviated, herein after, as LC-MS) for analysing one or more resultant reconstituted metabolites, after the reconstituting, using the LC-MS technique, wherein the LC-MS device is configured to: resolve the one or more resultant reconstituted metabolites by an Ultra High-Performance Liquid Chromatography using the LC device; obtain ion spectra of the one or more resultant reconstituted metabolites through the MS device; and measure masses for metabolite ions present in the ion spectra, of the one or more resultant reconstituted metabolites, based on their mass-to-charge ratio or m/z through the MS device; at least one computing device for executing one or more AI/ML algorithms on the measured metabolite ions to create at least one or more AI Models for identifying and differentiating diseased cancerous samples from non-diseased normal samples and further to identify and differentiate individual diseased cancerous sample from other diseased cancerous samples and the non-diseased normal samples, where the measured metabolite ions data is randomly divided into a training dataset and a test dataset, and wherein the at least one computing device is configured to: apply a logistic regression function by executing the AI/ML algorithms on the training dataset of the metabolite ions to find a function separating diseased cancerous samples from non-diseased normal samples; configure one or more class balancing parameters to balance the imbalance of classes in the training dataset, and thereby creating a first AI Model from the training dataset; apply the first AI Model on the test dataset of the metabolite ions for identifying and differentiating the diseased cancerous samples from the non-diseased normal samples based on the function separating diseased cancerous samples from non-diseased normal samples; configure a one versus rest (OVR) classifier multiclass classification model using the training dataset, and thereby creating a second AI Model from the training dataset; and apply the second AI Model over the first AI Model to identify and differentiate individual diseased cancerous sample from the other diseased cancerous samples and the non-diseased normal samples by obtaining scores assigned to each of the individual diseased cancerous sample.
2. The system of claim 1, wherein the one or more biological fluid samples include a material or mixture of materials including in liquid or solid form, containing one or more analytes of interest, the one or more biological fluid samples refers to any mammalian material containing cells or producing cellular metabolites that include at least one of the group containing a tissue or a fluid isolated from an individual including plasma, serum, cerebrospinal fluid, lymph, tears, saliva and tissue sections or from in vitro cell culture constituents, as well as samples from the environment, body fluids, including blood, mucus, lymphatic fluid, synovial fluid, cerebrospinal fluid, saliva, amniotic fluid, amniotic cord blood, urine, vaginal fluid and semen, or also refer to a homogenate, lysate or extract prepared from a whole organism or a subset of its tissues, cells or component parts, or a fraction or portion thereof, including plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, blood cells, tumors, organs.
3. The system of claim 2, wherein the at least one computing device executing the AI/ML algorithms is further configured to determine accuracy of the multiclass classification model in the second AI Model in differentiating individual diseased cancerous sample from the other diseased cancerous samples and the non-diseased normal samples, wherein in determining the accuracy, the at least one computing device executing the AI/ML algorithms is further configured to: obtain and plot the scores obtained from the multiclass classification model in the second AI Model, and wherein the plot of the multiclass classification model of one individual diseased cancerous sample against the scores for other diseased cancerous samples give scores that clearly differentiate the one individual diseased cancerous sample from the other diseased cancerous sample upon applying a threshold to differentiate between two types results in a confusion matrix; and wherein, scores of the non-diseased normal samples are also added to get sensitivity, specificity values for the one individual diseased cancerous sample versus all the other diseased cancerous samples including the non-diseased normal samples.
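The thresholding step recited in claim 3 can be illustrated with a minimal Python sketch. It assumes each sample already carries a score from a one-versus-rest model for one cancer type; the scores, the 0.5 threshold, and the function names are hypothetical and are not taken from the application.

```python
def confusion_from_scores(scores, is_target, threshold=0.5):
    """Count TP/FP/TN/FN when samples scoring >= threshold are called the target type."""
    tp = fp = tn = fn = 0
    for score, target in zip(scores, is_target):
        predicted = score >= threshold
        if predicted and target:
            tp += 1
        elif predicted and not target:
            fp += 1
        elif not predicted and target:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def sensitivity_specificity(cm):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pos = cm["tp"] + cm["fn"]
    neg = cm["tn"] + cm["fp"]
    sens = cm["tp"] / pos if pos else float("nan")
    spec = cm["tn"] / neg if neg else float("nan")
    return sens, spec


# Hypothetical scores: the first three samples are the cancer type of interest,
# the rest are other cancer types and non-diseased normal samples pooled together.
scores = [0.91, 0.78, 0.40, 0.35, 0.62, 0.10, 0.05, 0.20]
is_target = [True, True, True, False, False, False, False, False]

cm = confusion_from_scores(scores, is_target, threshold=0.5)
sens, spec = sensitivity_specificity(cm)
```

Pooling the normal samples into the "rest" group mirrors the claim's wording that normal scores are added before sensitivity and specificity are computed.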
4. The system of claim 3, wherein the sample collecting device is a test tube; the precipitation device is a test tube; and wherein the phase separation device is used to dry the metabolite extract using speed vacuum.
5. The system of claim 3, wherein the system further includes at least one computing device with compound discoverer software for automated data extraction of the measured metabolite ions and their related features using the compound discoverer software; and wherein the at least one computing device with compound discoverer software significantly enhances any differences in chromatogram obtained using the LC device.
6. The system of claim 3, wherein the system further includes at least one device for aligning the masses measured for the measured metabolite ions, measured using LC device and the MS device, wherein the alignment is done across all the one or more biological fluids samples in order to enable comparison of peak intensity of each ion across all the biological fluids samples.
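The claim does not say how the masses are aligned. As one possible illustration only, the Python sketch below groups peaks from all samples into shared m/z bins using a parts-per-million tolerance, so that the intensity of each ion can be compared across samples. The greedy binning rule, the 10 ppm tolerance, and all names are assumptions.

```python
def align_peaks(samples, ppm=10.0):
    """Group peaks from all samples into shared m/z bins.

    samples: dict mapping sample name -> list of (mz, intensity) tuples.
    Returns (centers, table): centers[i] is the mean m/z of bin i and
    table[name][i] is that sample's intensity in bin i, or None if absent.
    """
    peaks = sorted((mz, inten, name)
                   for name, pk in samples.items() for mz, inten in pk)
    bins = []
    for peak in peaks:
        if bins:
            anchor = bins[-1][0][0]  # lowest m/z in the current bin
            if (peak[0] - anchor) / anchor * 1e6 <= ppm:
                bins[-1].append(peak)
                continue
        bins.append([peak])
    centers = [sum(p[0] for p in b) / len(b) for b in bins]
    table = {name: [None] * len(bins) for name in samples}
    for i, b in enumerate(bins):
        for _, inten, name in b:
            prev = table[name][i]
            # If a sample has two peaks in one bin, keep the more intense one.
            table[name][i] = inten if prev is None else max(prev, inten)
    return centers, table


# Hypothetical peak lists for three samples.
samples = {
    "s1": [(100.0000, 5.0), (200.0010, 7.0)],
    "s2": [(100.0005, 6.0), (300.0000, 1.0)],
    "s3": [(200.0000, 2.0)],
}
centers, table = align_peaks(samples)
```

Each row of `table` then has the same length, which is the property needed to compare peak intensity ion by ion across samples.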
7. The system of claim 3, wherein the system further includes at least one computing device for minimizing any errors generated in measurement of the masses for the metabolite ions, and wherein in minimizing the errors, the at least one computing device is configured to normalize for variations in the masses (m/z) by applying a modified virtual lock mass-based approach that includes: combining a traditional virtual lock mass approach with metabolite identification from the Human Metabolome database (HMDB); defining virtual lock mass boxes using the masses of metabolites identified by HMDB database search across multiple samples; and filtering the metabolite ions based on the frequency of presence of ions in the biological fluid samples for use in subsequent AI/ML analysis.
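The final filtering step of claim 7 can be sketched in Python as follows. The sketch covers only the frequency-of-presence filter, not the virtual lock mass correction, and the 75% cut-off is an assumed value because the claim does not state one.

```python
def filter_by_presence(table, min_fraction=0.75):
    """Return indices of ions detected in at least min_fraction of samples.

    table: dict mapping sample name -> list of intensities of equal length,
    with None marking an ion that was not detected in that sample.
    """
    names = list(table)
    n_ions = len(table[names[0]])
    keep = []
    for i in range(n_ions):
        present = sum(1 for name in names if table[name][i] is not None)
        if present / len(names) >= min_fraction:
            keep.append(i)
    return keep


# Hypothetical aligned intensity table: four samples, four ions.
table = {
    "s1": [5.0, 7.0, None, 3.0],
    "s2": [6.0, None, 1.0, 2.0],
    "s3": [None, 2.0, None, 4.0],
    "s4": [1.0, None, None, 2.5],
}
kept_strict = filter_by_presence(table, min_fraction=0.75)
kept_loose = filter_by_presence(table, min_fraction=0.5)
```

Only the retained ion indices would then be passed on to model training.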
8. The system of claim 1, wherein the system further includes at least one computing device for minimizing any errors generated in measurement of the intensity of the masses for the metabolite ions, and wherein in minimizing the errors, the at least one computing device is configured to normalize for variations in the intensity of masses (m/z) by applying a data normalization procedure that includes: utilizing one or more of the following methods: Quantile Normalization, Variance Stabilization Normalization, Best Normalization, Probabilistic Quotient Normalization; applying Quantile Normalization method for normalizing the intensity of masses across samples; adapting the Quantile Normalization method to our datasets to enable normalization of new samples with respect to training datasets thus enabling normalization of one sample at a time.
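One way to read "normalization of one sample at a time" is to freeze a reference distribution computed from the training samples and map each new sample onto it by rank. The Python sketch below shows that reading. It is an assumption about the adaptation, not the applicant's implementation, and it ignores tied values for brevity.

```python
def reference_distribution(training):
    """Mean of the k-th smallest intensity across training samples, for every rank k.

    training: list of samples, each a list of intensities of equal length.
    """
    sorted_rows = [sorted(row) for row in training]
    n = len(sorted_rows)
    return [sum(col) / n for col in zip(*sorted_rows)]


def quantile_normalize_one(sample, reference):
    """Replace each value in one sample by the reference value at the same rank."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    out = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        out[idx] = reference[rank]
    return out


# Hypothetical training intensities (three samples, three ions).
training = [
    [5.0, 2.0, 3.0],
    [4.0, 1.0, 6.0],
    [3.0, 4.0, 8.0],
]
reference = reference_distribution(training)

# A new sample is normalized on its own, without touching the training data.
new_sample = [10.0, 30.0, 20.0]
normalized = quantile_normalize_one(new_sample, reference)
```

Because the reference is fixed after training, a sample that arrives later receives the same treatment as the training samples without renormalizing the whole cohort.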
9. The system of claim 1, wherein the system further includes at least one computing device for minimizing any errors generated during measurement of the masses for the metabolite ions, and wherein in minimizing the errors, the at least one computing device is configured to generate data of any missing metabolite masses in a sample that includes:
- utilizing a nearest neighbours (K-Nearest Neighbours) approach to impute the missing values in the data to make the data more homogenous and amenable to AI-based analysis; and
- adapting the K-Nearest Neighbours approach to our datasets to enable imputing the missing values in new samples with respect to training datasets, thus enabling imputation of one sample at a time.
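The two steps above can be sketched in Python under a simplifying assumption: the training rows are complete, and a new sample's missing values are filled from its k nearest training rows, with distance computed only over the ions the new sample actually has. The value of k, the Euclidean distance, and the data are illustrative choices.

```python
import math


def knn_impute_one(sample, training, k=3):
    """Fill None entries of one sample using its k nearest training rows.

    sample: list of intensities with None for missing values.
    training: list of complete rows with the same length as sample.
    """
    observed = [i for i, v in enumerate(sample) if v is not None]

    def dist(row):
        # Euclidean distance restricted to the ions observed in the new sample.
        return math.sqrt(sum((sample[i] - row[i]) ** 2 for i in observed))

    neighbours = sorted(training, key=dist)[:k]
    return [
        v if v is not None else sum(r[i] for r in neighbours) / len(neighbours)
        for i, v in enumerate(sample)
    ]


# Hypothetical training intensities (four samples, three ions).
training = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 9.0],
    [0.9, 1.9, 4.0],
]
new_sample = [1.0, 2.0, None]
imputed = knn_impute_one(new_sample, training, k=3)
```

The distant row `[9.0, 9.0, 9.0]` is excluded from the neighbour set, so the missing value is the mean of 3.0, 5.0 and 4.0.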
10. A method for detection of early stages cancer in an organism in a single analysis, the method comprising: collecting, using at least one sample collecting device, one or more biological fluid samples from one or more organisms or individuals; extracting, using at least one precipitating device, of one or more metabolite extracts from the one or more biological fluid samples by precipitation of protein present in the biological fluid samples with chilled alcohol including at least methanol; drying, using at least one phase separation device, the one or more metabolite extracts extracted from the at least one precipitating device; reconstituting, using at least one device, the one or more dried metabolite extracts in aqueous solutions in a mobile phase; analysing, using at least one Liquid Chromatography (LC) device with a mass spectrometer (MS) (abbreviated, herein after, as LC-MS), one or more resultant reconstituted metabolites, after the reconstituting, using the LC-MS technique, wherein the LC-MS device is configured to: resolve the one or more resultant reconstituted metabolites by an Ultra High-Performance Liquid Chromatography using the LC device; obtain ion spectra of the one or more resultant reconstituted metabolites through the MS device; and measure masses for metabolite ions present in the ion spectra, of the one or more resultant reconstituted metabolites, based on their mass-to-charge ratio or m/z through the MS device; executing one or more AI/ML algorithms on the measured metabolite ions, using at least one computing device, to create at least one or more AI Models for identifying and differentiating diseased cancerous samples from non-diseased normal samples and further to identify and differentiate individual diseased cancerous sample from other diseased cancerous samples and the non-diseased normal samples, wherein the at least one computing device is configured to: divide the measured metabolite ions into a training dataset and a test dataset; apply a logistic regression function by executing the AI/ML algorithms on the training dataset of the metabolite ions to find a function separating diseased cancerous samples from non-diseased normal samples; configure one or more class balancing parameters to balance the imbalance of classes in the training dataset, and thereby creating a first AI Model from the training dataset; apply the first AI Model on the test dataset of the metabolite ions for identifying and differentiating the diseased cancerous samples from the non-diseased normal samples based on the function separating diseased cancerous samples from non-diseased normal samples; configure a one versus rest (OVR) classifier multiclass classification model using the training dataset, and thereby creating a second AI Model from the training dataset; and apply the second AI Model over the first AI Model to identify and differentiate individual diseased cancerous sample from the other diseased cancerous samples and the non-diseased normal samples by obtaining scores assigned to each of the individual diseased cancerous sample.
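The modelling steps of claim 10 (train/test split, class-balanced logistic regression, and a one-versus-rest classifier) can be sketched in plain Python. This is a hedged illustration on synthetic two-feature data, not the applicant's pipeline. The inverse-frequency class weights, the gradient-descent settings, the class names `cancer_A` and `cancer_B`, and every function name are assumptions.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, epochs=1000, lr=0.1):
    """Binary logistic regression fitted by batch gradient descent.

    Class balancing (assumed scheme): each sample is weighted by
    n_samples / (2 * size_of_its_class), so both classes pull equally.
    """
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    class_weight = {1: n / (2.0 * n_pos), 0: n / (2.0 * (n - n_pos))}
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = class_weight[yi] * (p - yi)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(model, x):
    """Score in (0, 1) assigned to sample x by a fitted binary model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_one_vs_rest(X, labels):
    """One binary model per label: that label (1) versus all other labels (0)."""
    return {c: fit_logistic(X, [1 if lab == c else 0 for lab in labels])
            for c in sorted(set(labels))}


# Synthetic, imbalanced data: 30 normal samples and 10 of each cancer type.
rng = random.Random(0)
centres = {"normal": (0.0, 0.0), "cancer_A": (4.0, 0.0), "cancer_B": (0.0, 4.0)}
counts = {"normal": 30, "cancer_A": 10, "cancer_B": 10}
data = [([rng.gauss(cx, 0.3), rng.gauss(cy, 0.3)], lab)
        for lab, (cx, cy) in centres.items() for _ in range(counts[lab])]

# Random division into training and test datasets (roughly 70/30).
rng.shuffle(data)
split = int(0.7 * len(data))
train, test = data[:split], data[split:]
X_train = [x for x, _ in train]
labels_train = [lab for _, lab in train]

# First model: cancerous versus normal. Second model: one-versus-rest.
first_model = fit_logistic(X_train, [0 if lab == "normal" else 1 for lab in labels_train])
second_model = fit_one_vs_rest(X_train, labels_train)

binary_hits = ovr_hits = 0
for x, lab in test:
    is_cancer = score(first_model, x) >= 0.5
    binary_hits += is_cancer == (lab != "normal")
    class_scores = {c: score(m, x) for c, m in second_model.items()}
    ovr_hits += max(class_scores, key=class_scores.get) == lab
binary_accuracy = binary_hits / len(test)
ovr_accuracy = ovr_hits / len(test)
```

Here the per-class scores from the second model are what a downstream thresholding step would consume. How the second model is applied "over" the first is not spelled out in the claim, so the sketch simply evaluates both models on every test sample.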
11. The method of claim 10, wherein the one or more biological fluid samples include a material or mixture of materials including in liquid or solid form, containing one or more analytes of interest, the one or more biological fluid samples refers to any mammalian material containing cells or producing cellular metabolites that include at least one of the group containing a tissue or a fluid isolated from an individual including plasma, serum, cerebrospinal fluid, lymph, tears, saliva and tissue sections or from in vitro cell culture constituents, as well as samples from the environment, body fluids, including blood, mucus, lymphatic fluid, synovial fluid, cerebrospinal fluid, saliva, amniotic fluid, amniotic cord blood, urine, vaginal fluid and semen, or also refer to a homogenate, lysate or extract prepared from a whole organism or a subset of its tissues, cells or component parts, or a fraction or portion thereof, including plasma, serum, spinal fluid, lymph fluid, the external sections of the skin, respiratory, intestinal, and genitourinary tracts, tears, saliva, milk, blood cells, tumors, organs.
12. The method of claim 10, wherein the at least one computing device executing the AI/ML algorithms is further configured to determine accuracy of the multiclass classification model in the second AI Model in differentiating individual diseased cancerous sample from the other diseased cancerous samples and the non-diseased normal samples, wherein in determining the accuracy, the at least one computing device executing the AI/ML algorithms is further configured to: obtain and plot the scores obtained from the multiclass classification model in the second AI Model, and wherein the plot of the multiclass classification model of one individual diseased cancerous sample against the scores for other diseased cancerous samples give scores that clearly differentiate the one individual diseased cancerous sample from the other diseased cancerous sample upon applying a threshold to differentiate between two types results in a confusion matrix; and wherein, scores of the non-diseased normal samples are also added to get sensitivity, specificity values for the one individual diseased cancerous sample versus all the other diseased cancerous samples including the non-diseased normal samples.
13. The method of claim 10, wherein the sample collecting device is a test tube; the precipitation device is a test tube; and wherein the phase separation device is used to dry the metabolite extract using speed vacuum.
14. The method of claim 10, wherein the method further includes automated data extraction of the measured metabolite ions and their related features using at least one computing device with compound discoverer software; and wherein the at least one computing device with compound discoverer software significantly enhances any differences in chromatogram obtained using the LC device.
15. The method of claim 10, wherein the method further includes aligning, using at least one device, the masses measured for the measured metabolite ions, measured using LC device and the MS device, wherein the alignment is done across all the one or more biological fluids samples in order to enable comparison of peak intensity of each ion across all the biological fluids samples.
16. The method of claim 10, wherein the method further includes minimizing, using at least one computing device, any errors generated in measurement of the masses for the metabolite ions, and wherein in minimizing the errors, the at least one computing device is configured to normalize for variations in the masses (m/z) by applying a modified virtual lock mass-based approach that includes: combining a traditional virtual lock mass approach with metabolite identification from the Human Metabolome database (HMDB); defining virtual lock mass boxes using the masses of metabolites identified by HMDB database search across multiple samples; and filtering the metabolite ions based on the frequency of presence of ions in the biological fluid samples for use in subsequent AI/ML analysis.
17. A system for detection of early stages cancer in an organism in a single analysis, the system comprising: at least one Liquid Chromatography (LC) device with a mass spectrometer (MS) (abbreviated, herein after, as LC-MS) for analysing one or more metabolites present in one or more biological samples of one or more organisms, using an LC-MS technique, wherein the LC-MS device is configured to: resolve the one or more metabolites by an Ultra High-Performance Liquid Chromatography using the LC device; obtain ion spectra of the one or more metabolites through the MS device; and measure masses for metabolite ions present in the ion spectra, of the one or more metabolites, based on their mass-to-charge ratio or m/z through the MS device; at least one computing device for executing one or more AI/ML algorithms on the measured metabolite ions to create at least one or more AI Models for identifying and differentiating diseased cancerous samples from non-diseased normal samples and further to identify and differentiate individual diseased cancerous sample from other diseased cancerous samples and the non-diseased normal samples, and wherein the at least one computing device is configured to: randomly divide the measured metabolite ions into a training dataset and a test dataset; apply a logistic regression function by executing the AI/ML algorithms on the training dataset of the metabolite ions to find a function separating diseased cancerous samples from non-diseased normal samples; configure one or more class balancing parameters to balance the imbalance of classes in the training dataset, and thereby creating a first AI Model from the training dataset; apply the first AI Model on the test dataset of the metabolite ions for identifying and differentiating the diseased cancerous samples from the non-diseased normal samples based on the function separating diseased cancerous samples from non-diseased normal samples; configure a one versus rest (OVR) classifier multiclass classification model using the training dataset, and thereby creating a second AI Model from the training dataset; and apply the second AI Model over the first AI Model to identify and differentiate individual diseased cancerous sample from the other diseased cancerous samples and the non-diseased normal samples by obtaining scores assigned to each of the individual diseased cancerous sample.
18. The system of claim 17, wherein the system prepares a biological sample before subjecting the biological sample to the LC-MS device, wherein the system includes at least one or more of the following components which are configured to perform one or more of the following steps, individually or in any combination, for sample preparation: at least one sample collecting device for collecting one or more biological fluid samples from one or more organisms or individuals; at least one precipitating device for extraction of one or more metabolite extracts from the one or more biological fluid samples by precipitation of protein present in the biological fluid samples with chilled alcohol including at least methanol; at least one phase separation device for drying the one or more metabolite extracts extracted from the at least one precipitating device; and at least one device for reconstituting the one or more dried metabolite extracts in aqueous solutions in a mobile phase, wherein resultant reconstituted metabolites are provided to the LC-MS device after the reconstitution.
19. The system of claim 17, wherein the at least one computing device executing the AI/ML algorithms is further configured to determine accuracy of the multiclass classification model in the second AI Model in differentiating individual diseased cancerous sample from the other diseased cancerous samples and the non-diseased normal samples, wherein in determining the accuracy, the at least one computing device executing the AI/ML algorithms is further configured to: obtain and plot the scores obtained from the multiclass classification model in the second AI Model, and wherein the plot of the multiclass classification model of one individual diseased cancerous sample against the scores for other diseased cancerous samples give scores that clearly differentiate the one individual diseased cancerous sample from the other diseased cancerous sample upon applying a threshold to differentiate between two types results in a confusion matrix; and wherein, scores of the non-diseased normal samples are also added to get sensitivity, specificity values for the one individual diseased cancerous sample versus all the other diseased cancerous samples including the non-diseased normal samples.
20. The system of claim 17, wherein the system further includes at least one computing device with compound discoverer software for automated data extraction of the measured metabolite ions and their related features using the compound discoverer software; and wherein the at least one computing device with compound discoverer software significantly enhances any differences in chromatogram obtained using the LC device.
21. The system of claim 17, wherein the system further includes at least one device for aligning the masses measured for the measured metabolite ions, measured using LC device and the MS device, wherein the alignment is done across all the one or more biological fluids samples in order to enable comparison of peak intensity of each ion across all the biological fluids samples.
22. The system of claim 17, wherein the system further includes at least one computing device for minimizing any errors generated in measurement of the masses for the metabolite ions, and wherein in minimizing the errors, the at least one computing device is configured to normalize for variations in the masses (m/z) by applying a modified virtual lock mass-based approach that includes: combining a traditional virtual lock mass approach with metabolite identification from the Human Metabolome database (HMDB); defining virtual lock mass boxes using the masses of metabolites identified by HMDB database search across multiple samples; and filtering the metabolite ions based on the frequency of presence of ions in the biological fluid samples for use in subsequent AI/ML analysis.
PCT/US2021/048337 2020-08-31 2021-08-31 Method for early treatment and detection of women specific cancers WO2022047352A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063072482P 2020-08-31 2020-08-31
US63/072,482 2020-08-31

Publications (1)

Publication Number Publication Date
WO2022047352A1 true WO2022047352A1 (en) 2022-03-03

Family

ID=80355705

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/048337 WO2022047352A1 (en) 2020-08-31 2021-08-31 Method for early treatment and detection of women specific cancers

Country Status (1)

Country Link
WO (1) WO2022047352A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150330984A1 (en) * 2012-12-06 2015-11-19 Dana-Farber Cancer Institute, Inc. Metabolomic profiling defines oncogenes driving prostate tumors
US20190027249A1 (en) * 2016-01-22 2019-01-24 OTraces, Inc Systems and methods for improving diseases diagnosis
WO2019142136A1 (en) * 2018-01-17 2019-07-25 Ods Medical Inc. System and methods for real-time raman spectroscopy for cancer detection

Similar Documents

Publication Publication Date Title
CN111562338B (en) Application of transparent renal cell carcinoma metabolic marker in renal cell carcinoma early screening and diagnosis product
CN113960235B (en) Application and method of biomarker in preparation of lung cancer detection reagent
US20150056605A1 (en) Identification of blood based metabolite biomarkers of pancreatic cancer
CN112183616B (en) Diagnostic marker and kit for diagnosis of glioma, screening method and construction method of glioma diagnostic model
CN112305121B (en) Application of metabolic marker in atherosclerotic cerebral infarction
Liang et al. Serum metabolomics uncovering specific metabolite signatures of intra-and extrahepatic cholangiocarcinoma
CN113711044A (en) Biomarker for detecting colorectal cancer or adenoma and method thereof
CN113567585A (en) Esophageal squamous carcinoma screening marker and kit based on peripheral blood
CN112599239A (en) Metabolite marker and application thereof in cerebral infarction diagnosis
Zhang et al. Altered phosphatidylcholines expression in sputum for diagnosis of non-small cell lung cancer
ES2841950T3 (en) A diagnostic procedure for pancreatic cancer based on lipidomic analysis of a body fluid
WO2022047352A1 (en) Method for early treatment and detection of women specific cancers
CN109946467B (en) Biomarker for ossification diagnosis of thoracic vertebra ligamentum flavum
CN113466370A (en) Marker and detection kit for early screening of esophageal squamous carcinoma
CN113533560A (en) Esophageal cancer early screening marker based on metabonomics and kit thereof
CN113804901A (en) Serum lipid marker for early noninvasive diagnosis of oral squamous cell carcinoma and application thereof
Pyatnitskiy et al. Identification of differential signs of squamous cell lung carcinoma by means of the mass spectrometry profiling of blood plasma
CN112834652B (en) Acute aortic dissection patient-specific biomarker composition and application thereof
CN117388495B (en) Application of metabolic marker for diagnosing lung cancer stage and kit
CN113447586B (en) Marker for cardiac cancer screening and detection kit
CN110632231B (en) Metabolic marker of glioblastoma in urine and use thereof in early diagnosis
Lokhov et al. Metabolic fingerprinting of blood plasma from patients with prostate cancer
CN112147344B (en) Metabolic marker of atherosclerotic cerebral infarction and application of metabolic marker in diagnosis and treatment
CN116183922B (en) Construction method of oral squamous cell carcinoma diagnosis model, marker and application thereof
CN113433239A (en) Marker and kit for diagnosing cardia cancer

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21862949

Country of ref document: EP

Kind code of ref document: A1