EP4587839A2 - Diagnosis of ovarian cancer by targeted quantification of site-specific protein glycosylation

Diagnosis of ovarian cancer by targeted quantification of site-specific protein glycosylation

Info

Publication number
EP4587839A2
Authority
EP
European Patent Office
Prior art keywords
peptide
ovarian cancer
structures
peptide structure
data
Legal status
Pending
Application number
EP23866505.3A
Other languages
English (en)
French (fr)
Inventor
Chirag DHAR
Prasanna Ramachandran
Tomislav CAVAL
Current Assignee
Venn Biosciences Corp
Original Assignee
Venn Biosciences Corp

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6842 Proteomic analysis of subsets of protein mixtures with reduced complexity, e.g. membrane proteins, phosphoproteins, organelle proteins
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding

Definitions

  • Embodiments of the present disclosure generally relate to methods and systems for analyzing peptide structures for diagnosing and/or treating ovarian cancer. More particularly, embodiments of the present disclosure relate to analyzing quantification data for a set of peptide structures detected in a biological sample obtained from a subject for use in diagnosing and/or treating the subject, the set of peptide structures being associated with ovarian cancer.
  • Glycoprotein analysis is fraught with challenges on several levels.
  • a single glycan composition in a peptide can contain a large number of isomeric structures due to different glycosidic linkages, branching patterns, and/or multiple monosaccharides having the same mass.
  • the presence of multiple glycans that share the same peptide backbone can lead to assay signals from various glycoforms, lowering their individual abundances compared to aglycosylated peptides. Accordingly, the development of algorithms that can identify glycan structures on peptide fragments remains elusive.
  • a method for diagnosing a subject with respect to an ovarian cancer disease state includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data can be analyzed using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences the ovarian cancer disease state of having early stage or late stage ovarian cancer based on at least one peptide structure selected from one of a group of peptide structures identified in Tables 3B, 3C, or 3D.
  • a diagnosis output can be generated based on the disease indicator.
  • the disease indicator can include a score.
  • the method of generating the diagnosis output can include determining that the score falls above a selected threshold and generating the diagnosis output based on the score falling above the selected threshold, wherein the diagnosis output includes a classification of late stage ovarian cancer disease state.
  • the method of generating the diagnosis output can include determining that the score falls below a selected threshold and generating the diagnosis output based on the score falling below the selected threshold, wherein the diagnosis output includes a classification of early stage ovarian cancer disease state.
  • the score may include a probability score and the selected threshold is 0.5. Alternatively, the selected threshold may fall within a range between 0.30 and 0.65.
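The thresholding described above can be sketched in Python as follows. This is a minimal illustration rather than the disclosed implementation: the function name `classify_stage` is hypothetical, and treating a score exactly equal to the threshold as early stage is an assumption, since the text only covers scores falling above or below the threshold.

```python
def classify_stage(score: float, threshold: float = 0.5) -> str:
    """Map a probability score to an early/late stage classification.

    Assumption: a score equal to the threshold is reported as early stage;
    the source text does not say how ties are handled.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("a probability score must lie in [0, 1]")
    # Scores above the selected threshold are classified as late stage,
    # scores below it as early stage.
    return "late stage" if score > threshold else "early stage"
```

The default of 0.5 mirrors the threshold named in the text; a caller could pass any other value, for example one from the 0.30 to 0.65 range mentioned as an alternative.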
  • analyzing the peptide structure data can include using a binary classification model.
  • the peptide structure of the at least one peptide structure can include a glycopeptide structure defined by a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 3D, with the peptide sequence being one of SEQ ID NOS: 126-175 in Table 3D as defined in Table 5.
  • the method can include training the supervised machine learning model using training data, wherein the training data comprises a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects, wherein the plurality of subject diagnoses includes a diagnosis for any subject of the plurality of subjects determined to have early stage or late stage ovarian cancer.
  • the method can include performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the classification of early stage ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the classification of late stage ovarian cancer disease state; identifying a training group of peptide structures based on the differential expression analysis for use as prognostic markers for the ovarian cancer disease state; and forming the training data based on the training group of peptide structures identified.
  • the training of the supervised machine learning model can include reducing the training group of peptide structures to a final group of peptide structures identified in Tables 3B, 3C, or 3D.
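As a rough illustration of the marker-selection step, the sketch below compares early stage and late stage groups by log2 fold change of mean abundance and keeps the structures that exceed a cutoff. The text does not state which statistic or cutoff the differential expression analysis uses, so the fold-change criterion, the `min_abs_log2fc` parameter, the function names, and the example structure names are all assumptions made for illustration.

```python
import math
from statistics import mean


def log2_fold_changes(early, late):
    """Return log2(mean late abundance / mean early abundance) per structure.

    `early` and `late` map a peptide structure name to a list of positive
    abundance values, one value per subject in that group.
    """
    return {
        name: math.log2(mean(late[name]) / mean(early[name]))
        for name in early
        if name in late
    }


def select_training_group(early, late, min_abs_log2fc=1.0):
    """Keep structures whose absolute log2 fold change meets the cutoff.

    The cutoff is a hypothetical choice; results are ordered from the
    largest to the smallest absolute fold change.
    """
    fc = log2_fold_changes(early, late)
    kept = [name for name, value in fc.items() if abs(value) >= min_abs_log2fc]
    return sorted(kept, key=lambda name: -abs(fc[name]))


# Hypothetical abundances for two made-up structures.
early_group = {"PS-A": [1.0, 1.0], "PS-B": [2.0, 2.0]}
late_group = {"PS-A": [4.0, 4.0], "PS-B": [2.2, 1.8]}
selected = select_training_group(early_group, late_group)
```

A real analysis would normally add a significance test and multiple-testing correction before reducing the training group to a final panel.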
  • the first group of peptide structures in Tables 3B, 3C, or 3D is used to distinguish between the ovarian cancer disease state being late stage or early stage.
  • the quantification data for a peptide structure of the set of peptide structures can include at least one of an abundance, a relative abundance, a normalized abundance, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the method can include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
  • the method of classifying early and late stage ovarian cancer can be implemented after the subject has already been diagnosed as having ovarian cancer.
  • the subject can be initially diagnosed as having ovarian cancer using one or more biomarkers in Tables 1, 2, or 3.
  • generating the diagnosis output can include generating a report identifying that the biological sample evidences the early stage or late stage ovarian cancer disease state.
  • a treatment output can be generated based on at least one of the diagnosis output or the disease indicator.
  • the treatment output can include at least one of an identification of a treatment to treat the subject or a treatment plan.
  • the treatment can include at least one of surgery, radiation therapy, a targeted drug therapy, chemotherapy, immunotherapy, hormone therapy, or neoadjuvant therapy.
  • the group of peptide structures in Tables 3B, 3C, or 3D is listed in order of relative significance to the disease indicator.
  • the method can further include preparing a sample of the biological sample using reduction, alkylation, and enzymatic digestion to form a prepared sample that includes a set of peptide structures.
  • the method can further include generating the peptide structure data from the prepared sample using multiple reaction monitoring mass spectrometry (MRM-MS).
  • a method of training a model to diagnose a subject with respect to an ovarian cancer disease state having a malignant pelvic tumor is described.
  • the method can include receiving quantification data for a panel of peptide structures for a plurality of samples for a plurality of subjects.
  • the plurality of subjects includes a first portion diagnosed with a classification of early stage ovarian cancer disease state and a second portion diagnosed with a classification of late stage ovarian cancer disease state.
  • a bottommost N-acetylglucosamine (dark square) of the glycan structure in Table 7 is attached to a linking site position in the peptide sequence in accordance with Tables 1, 2, 3, 3B, 3C, 3D, and 5.
  • Figure 17 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict early stage v. late stage ovarian cancer in accordance with one or more embodiments.
  • Figure 18 is a graph illustrating the fold changes for a plurality of glycopeptides bearing tri- and tetra-antennary glycans that were either non-fucosylated or fucosylated.
  • Figure 19A is a graph illustrating the fold changes for pairs of glycopeptides bearing tri- and tetra-antennary glycans that were either non-fucosylated or fucosylated.
  • Figure 19B is a graph illustrating the fold changes for triplets of glycopeptides bearing tri- and tetra-antennary glycans that were either non-fucosylated, mono-fucosylated, or di-fucosylated. Both mono-fucosylated and di-fucosylated markers have median fold changes (FCs) above 1, suggesting correlation of these markers with malignant EOC.
  • Figure 20 is an illustration of a diagram showing the probability distributions for early stage v. late stage ovarian cancer using the training data set and the testing data set in accordance with one or more embodiments using the biomarkers of Table 3D.
  • Figures 21A to 21E are graphs of the relative abundance of five distinct types of fucosylated glycopeptides in benign tumors, early stage EOC, and late stage EOC.
  • An example of the positive diagnosis includes the subject suffering from a form of ovarian cancer (e.g., epithelial ovarian cancer (EOC)).
  • a diagnosis can also assess a malignancy status of a previously identified pelvic (or adnexal) tumor (or mass).
  • Mass spectrometry can be used to analyze serum for various glycoproteins and/or glycopeptides to differentiate between benign and malignant adnexal masses.
  • a distinct signature was found with the circulating N-glycoproteins that allows a differentiation between late stage (metastatic disease of stage III/IV) and early stage (stage I/II) epithelial ovarian cancer (EOC).
  • Using Qiagen's Ingenuity Pathway Analysis package on this data, it was predicted that the signature markers are downstream of cytokine signaling.
  • the markers also suggest the presence of the sialyl Lewis X (sLex) epitope on N-glycans of certain liver-derived circulatory glycoproteins.
  • the methods, systems, and compositions provided by the embodiments described herein may enable an earlier and more accurate diagnosis of ovarian cancer in a subject as compared to currently available diagnostic modalities (e.g., imaging, biochemical tests) used for determining whether surgical intervention is indicated.
  • various currently available non-invasive tests to distinguish between benign and malignant pelvic tumors rely on detection of the biomarker cancer antigen 125 (CA125).
  • serum CA125 is not elevated in over 20% of ovarian carcinomas and is elevated in a variety of other malignant and non-malignant conditions.
  • amino acid generally refers to any organic compound that includes an amino group (e.g., -NH2), a carboxyl group (-COOH), and a side chain group (R) which varies based on a specific amino acid. Amino acids can be linked using peptide bonds.
  • biological sample generally refers to a specimen taken by sampling so as to be representative of the source of the specimen, typically, from a subject.
  • a biological sample can be representative of an organism as a whole, specific tissue, cell type, or category or sub-category of interest.
  • the biological sample can include a matrix (e.g., a gel or polymer matrix) comprising a cell or one or more constituents from a cell (e.g., cell bead), such as DNA, RNA, organelles, proteins, or any combination thereof, from the cell.
  • the biological sample may be obtained from a tissue of a subject.
  • the biological sample can include a hardened cell. Such hardened cells may or may not include a cell wall or cell membrane.
  • the biological sample can include one or more constituents of a cell but may not include other constituents of the cell. An example of such constituents may include a nucleus or an organelle.
  • the biological sample may include a live cell.
  • the live cell can be capable of being cultured.
  • biomarker generally refers to any measurable substance taken as a sample from a subject whose presence is indicative of some phenomenon. Non-limiting examples of such phenomena can include a disease state, a condition, or exposure to a compound or environmental condition. In various embodiments described herein, biomarkers may be used for diagnostic purposes (e.g., to diagnose a health state, a disease state). The term “biomarker” can be used interchangeably with the term “marker.”
  • digesting a peptide generally refers to a biological process that employs enzymes to break specific amino acid peptide bonds.
  • digesting a peptide includes contacting the peptide with a digesting enzyme (e.g., trypsin) to produce fragments of the glycopeptide.
  • a protease enzyme is used to digest a glycopeptide.
  • protease enzyme refers to an enzyme that performs proteolysis or breakdown of large peptides into smaller polypeptides or individual amino acids.
  • protease examples include, but are not limited to, one or more of a serine protease, threonine protease, cysteine protease, aspartate protease, glutamic acid protease, metalloprotease, asparagine peptide lyase, and any combinations of the foregoing.
  • Enzymatic digestion may be used in preparation for mass spectrometry using trypsin digestion protocols. Proteins may be digested using other proteases in preparation for mass spectrometry if access to cleavage sites is limited.
  • disease state generally refers to a condition that affects the structure or function of an organism.
  • causes of disease states may include pathogens, immune system dysfunctions, cell damage caused by aging, and cell damage caused by other factors (e.g., trauma and cancer).
  • Disease states can include any state of a disease whether symptomatic or asymptomatic.
  • Disease states can include disease stages of a disease progression. Disease states can cause minor, moderate, or severe disruptions in structure or function of an organism (e.g., a subject).
  • glycopeptide or “glycopolypeptide” as used herein, generally refers to a peptide or polypeptide comprising at least one glycan residue.
  • glycopeptides comprise carbohydrate moieties (e.g., one or more glycans) covalently attached to a side chain (i.e. R group) of an amino acid residue.
  • glycopeptide fragment or “glycosylated peptide fragment” or “glycopeptide” as used herein, generally refers to a glycosylated peptide (or glycopeptide) having an amino acid sequence that is the same as part (but not all) of the amino acid sequence of the glycosylated protein from which the glycosylated peptide is obtained, e.g., by ion fragmentation within an MRM-MS instrument.
  • MRM refers to multiple-reaction-monitoring.
  • glycopeptide fragments or “fragments of a glycopeptide” refer to the fragments produced directly by using a mass spectrometer optionally after the glycoprotein has been digested enzymatically to produce the glycopeptides.
  • glycoprotein generally refers to a protein having at least one glycan residue bonded thereto.
  • a glycoprotein is a protein with at least one oligosaccharide chain covalently bonded thereto. Examples of glycoproteins include but are not limited to the peptide structures including glycan molecules shown in the various Tables presented herein.
  • a glycopeptide, as used herein, refers to a fragment of a glycoprotein, unless specified otherwise to the contrary.
  • mass spectrometry generally refers to an analytical technique used to identify molecules. In various embodiments described herein, mass spectrometry can be involved in characterization and sequencing of proteins.
  • m/z or “mass-to-charge ratio,” as used herein, generally refers to an output value from a mass spectrometry instrument.
  • m/z can represent a relationship between the mass of a given ion and the number of elementary charges that it carries.
  • the “m” in m/z stands for mass and the “z” stands for charge.
  • m/z can be displayed on an x-axis of a mass spectrum.
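The relationship between mass and charge can be made concrete with a short calculation. The sketch below assumes positive-mode ions formed by adding one proton per charge, which the text does not specify; the function name `protonated_mz` is hypothetical, and the proton mass constant (about 1.007276 Da) is the standard physical value.

```python
PROTON_MASS_DA = 1.007276  # mass of one proton in daltons


def protonated_mz(neutral_mass_da: float, charge: int) -> float:
    """Return m/z for a molecule carrying `charge` added protons.

    Assumption: the ion is an [M + zH]z+ species; other adducts or
    negative-mode ions would need a different formula.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Each added proton contributes both mass and one elementary charge.
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge
```

Under this assumption, the same glycopeptide appears at a lower m/z as its charge state increases, which is why one species can show several peaks along the x-axis of a mass spectrum.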
  • the term “patient,” as used herein, generally refers to a mammalian subject.
  • the mammal can be a human, or an animal including, but not limited to an equine, porcine, canine, feline, ungulate, and primate animal.
  • the individual is a human.
  • the methods and uses described herein are useful for both medical and veterinary uses.
  • a “patient” is a human subject unless specified to the contrary.
  • peptide generally refers to amino acids linked by peptide bonds.
  • Peptides can include amino acid chains between 10 and 50 residues.
  • Peptides can include amino acid chains shorter than 10 residues, including oligopeptides, dipeptides, tripeptides, and tetrapeptides.
  • Peptides can include chains longer than 50 residues and may be referred to as “polypeptides” or “proteins.”
  • the term “peptide” is meant to include glycopeptides unless stated otherwise.
  • Protein or “polypeptide” or “peptide” may be used interchangeably herein and generally refer to a molecule including at least three amino acid residues. Proteins can include polymer chains made of amino acid sequences linked together by peptide bonds. Proteins may be digested in preparation for mass spectrometry using trypsin digestion protocols. Proteins may be digested using other proteases in preparation for mass spectrometry if access to cleavage sites is limited.
  • peptide structure generally refers to peptides or a portion thereof or glycopeptides or a portion thereof.
  • a peptide structure can include any molecule comprising at least two amino acids in sequence.
  • reduction generally refers to the gain of an electron by a substance.
  • a sugar can directly bind to a protein, thereby reducing the amino acid to which it binds. Such reducing reactions can occur in glycosylation. In various embodiments, reduction may be used to break disulfide bonds between two cysteines.
  • sample generally refers to a sample from a subject of interest and may include a biological sample of a subject.
  • the sample may include a cell sample.
  • the sample may include a cell line or cell culture sample.
  • the sample can include one or more cells.
  • the sample can include one or more microbes.
  • the sample may include a nucleic acid sample or protein sample.
  • the sample may also include a carbohydrate sample or a lipid sample.
  • the sample may be derived from another sample.
  • the sample may include a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate.
  • the sample may include a fluid sample, such as a blood sample, urine sample, or saliva sample.
  • the sample may include a skin sample.
  • the sample may include a cheek swab.
  • the sample may include a plasma or serum sample.
  • the sample may include a cell-free sample.
  • a cell-free sample may include extracellular polynucleotides.
  • the sample may originate from blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool, or tears.
  • the sample may originate from red blood cells or white blood cells.
  • the sample may originate from feces, spinal fluid, CNS fluid, gastric fluid, amniotic fluid, cyst fluid, peritoneal fluid, marrow, bile, other body fluids, tissue obtained from a biopsy, skin, or hair.
  • sequence generally refers to a biological sequence including one-dimensional monomers that can be assembled to generate a polymer.
  • sequences include nucleotide sequences (e.g., ssDNA, dsDNA, and RNA), amino acid sequences (e.g., proteins, peptides, and polypeptides), and carbohydrates (e.g., compounds including Cm(H2O)n).
  • the term “subject,” as used herein, generally refers to an animal, such as a mammal (e.g., human) or avian (e.g., bird), or other organism, such as a plant.
  • the subject can include a vertebrate, a mammal, a rodent (e.g., a mouse), a primate, a simian or a human.
  • Animals may include, but are not limited to, farm animals, sport animals, and pets.
  • a subject can include a healthy or asymptomatic individual, an individual that has or is suspected of having a disease (e.g., cancer) or a pre-disposition to the disease, and/or an individual that is in need of therapy or suspected of needing therapy.
  • a subject can be a patient.
  • a subject can include a microorganism or microbe (e.g., bacteria, fungi, archaea, viruses). However, in the context of diagnosing ovarian cancer, the subject is female unless explicitly specified otherwise.
  • a subject may be one who has been previously identified as having a disease or a condition, and optionally has already undergone, or is undergoing, a therapeutic intervention for the disease or condition.
  • a subject can also be one who has not been previously diagnosed as having a disease or a condition.
  • a subject can be one who exhibits one or more risk factors for a disease or a condition, or a subject who does not exhibit disease risk factors, or a subject who is asymptomatic for a disease or a condition.
  • a subject can also be one who is suffering from or at risk of developing a disease or a condition.
  • computer system 400 can include a bus 402 or other communication mechanism for communicating information, and a processor 404 coupled with bus 402 for processing information.
  • computer system 400 can also include a memory, which can be a random-access memory (RAM) 406 or other dynamic storage device, coupled to bus 402 for determining instructions to be executed by processor 404. Memory also can be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 404.
  • computer system 400 can further include a read only memory (ROM) 408 or other static storage device coupled to bus 402 for storing static information and instructions for processor 404.
  • a storage device 410 such as a magnetic disk or optical disk, can be provided and coupled to bus 402 for storing information and instructions.
  • This input device 414 typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
  • input devices 414 allowing for three-dimensional (e.g., x, y, and z) cursor movement are also contemplated herein.
  • computer-readable medium (e.g., data store, data storage, storage device, data storage device, etc.) or “computer-readable storage medium” refers to any medium that participates in providing instructions to processor 404 for execution.
  • Such a medium can take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
  • non-volatile media can include, but are not limited to, optical, solid-state, and magnetic disks, such as storage device 410.
  • volatile media can include, but are not limited to, dynamic memory, such as RAM 406.
  • transmission media can include, but are not limited to, coaxial cables, copper wire, and fiber optics, including the wires that comprise bus 402.
  • instructions or data can be provided as signals on transmission media included in a communications apparatus or system to provide sequences of one or more instructions to processor 404 of computer system 400 for execution.
  • a communication apparatus may include a transceiver having signals indicative of instructions and data.
  • the instructions and data are configured to cause one or more processors to implement the functions outlined in the disclosure herein.
  • Representative examples of data communications transmission connections can include, but are not limited to, telephone modem connections, wide area networks (WAN), local area networks (LAN), infrared data connections, NFC connections, optical communications connections, etc.
  • Figure 5 is a flowchart of a process for diagnosing a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 500 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • Process 500 may be used to generate a final output that includes at least a diagnosis output for the subject.
  • Step 502 includes receiving peptide structure data corresponding to a biological sample obtained from the subject.
  • the peptide structure data may be, for example, an implementation of peptide structure data 310 in Figure 3.
  • the peptide structure data may include quantification data for each peptide structure of a plurality of peptide structures.
  • the quantification data may include, for example, one or more quantification metrics for each peptide structure of the plurality of peptide structures.
  • a quantification metric for a peptide structure may be, for example, but is not limited to, a relative quantity, an adjusted quantity, a normalized quantity, a relative concentration, an adjusted concentration, or a normalized concentration.
  • the quantification data for a given peptide structure provides an indication of the abundance of the peptide structure in the biological sample.
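One of the listed quantification metrics, relative abundance, can be sketched as each structure's share of a summed signal. The text does not define how any metric is computed, so the choice of denominator (the total across the supplied structures), the function name, and the example values are assumptions for illustration only.

```python
def relative_abundances(raw_signal):
    """Convert raw signals into fractions of their combined total.

    `raw_signal` maps a peptide structure name to a non-negative raw
    intensity. Assumption: relative abundance means the fraction of the
    summed signal; the source text does not define the metric.
    """
    total = sum(raw_signal.values())
    if total <= 0:
        raise ValueError("total signal must be positive")
    return {name: value / total for name, value in raw_signal.items()}


# Hypothetical raw intensities for two made-up glycoforms of one site.
example = relative_abundances({"glycoform-1": 1.0, "glycoform-2": 3.0})
```

In practice the denominator might instead be the non-glycosylated peptide or an internal standard; the same function shape applies with a different total.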
  • at least one peptide structure includes a glycopeptide structure having a peptide sequence and a glycan structure linked to the peptide sequence at a linking site of the peptide sequence, as identified in Table 1 or Table 2, with the peptide sequence being one of SEQ ID NOS: 11-19 in Table 1 or one of SEQ ID NOS: 14, 15, and 31-46 in Table 2, the SEQ ID NOS being defined in Table 5 below.
  • Step 504 includes analyzing the peptide structure data using a supervised machine learning model to generate a disease indicator that indicates whether the biological sample evidences an ovarian cancer disease state based on at least three peptide structures selected from a first group of peptide structures identified in Table 1 (below) or a second group of peptide structures identified in Table 2 (below).
  • the first and second groups of peptide structures are associated with the ovarian cancer disease state.
  • the first group of peptide structures is listed in Table 1 with respect to relative significance to the disease indicator.
  • the second group of peptide structures is listed in Table 2 with respect to relative significance to the disease indicator.
  • the first group of peptide structures in Table 1 includes peptide structures that have been determined relevant to distinguishing at least between ovarian cancer (e.g., EOC) and a healthy state.
  • the first group of peptide structures may be used to predict the probability of EOC for use in clinically screening patients.
  • the first group of peptide structures in Table 1 may also be peptide structures that have been determined relevant to distinguishing between ovarian cancer (e.g., EOC) and a benign tumor state (e.g., a benign pelvic tumor).
  • the first group of peptide structures may be used to clinically triage patients that have been identified as having pelvic tumors to determine the probability that such a tumor evidences EOC.
  • the second group of peptide structures in Table 2 includes peptide structures that have been determined relevant to distinguishing at least between ovarian cancer (e.g., EOC) and the benign tumor state (e.g., a benign pelvic tumor).
  • the second group of peptide structures may be used to clinically triage patients that have been identified as having pelvic tumors to determine the probability that such a tumor evidences EOC. In this manner, the second group of peptide structures may predict malignancy of an identified pelvic tumor.
  • the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures PS-1 to PS-10 in Table 1.
  • the at least 3 peptide structures include at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures PS-5 and PS-11 through PS-34 in Table 2.
  • the at least 3 peptide structures include at least PS-5, which is present in both Table 1 and Table 2.
  • step 504 may be implemented using a binary classification model (e.g., a regression model).
  • the regression model may be, for example, a penalized multivariable regression model.
  • the disease indicator may be computed using a weight coefficient associated with each peptide structure of the at least 3 peptide structures; the weight coefficient of a corresponding peptide structure may indicate the relative significance of that peptide structure to the disease indicator.
  • step 504 may include computing a peptide structure profile for the biological sample that identifies a weighted value for each peptide structure of the at least 3 peptide structures.
  • the weighted value for a peptide structure of the at least 3 peptide structures may be a product of a quantification metric for the peptide structure identified from the peptide structure data and a weight coefficient for the peptide structure.
  • the disease indicator may be computed using the peptide structure profile.
  • the disease indicator may be a logit equal to the sum of the weighted values for the peptide structures plus an intercept value. The intercept value may be determined during the training of the model.
  • the peptide structure profile for a given peptide structure may include a corresponding feature (e.g., relative abundance, concentration, or site occupancy) for that peptide structure.
  • the relative abundance may be a normalized relative abundance; the concentration may be a normalized concentration.
  • two peptide structure profiles may be computed for the same peptide structure, each profile corresponding to a different feature.
  • a first peptide structure profile may include a relative abundance for a corresponding peptide structure and a second peptide structure profile may include a concentration for the same corresponding peptide structure.
  • the disease indicator comprises a probability that the biological sample is positive for the ovarian cancer disease state and the supervised machine learning model is configured to generate an output that identifies the biological sample as either evidencing (“positive for”) the ovarian cancer disease state when the disease indicator is greater than a selected threshold or not evidencing (“negative for”) the ovarian cancer disease state when the disease indicator is not greater than the selected threshold.
  • the selected threshold may be, for example, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, or some other threshold between 0.30 and 0.65. In one or more embodiments, the selected threshold is 0.5.
  • Generating the diagnosis output in step 506 may include determining that the score falls above (or at or above) a selected threshold and generating a positive diagnosis for the ovarian cancer disease state.
  • step 506 can include determining that the score falls below (or at or below) a selected threshold and generating a negative diagnosis for the ovarian cancer disease state.
  • the score can include a probability score and the selected threshold can be 0.5. In other scoring systems, the selected threshold can fall within a range between 0.30 and 0.65.
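The scoring described above (weighted values summed with an intercept to give a logit, then compared against a threshold) can be sketched as follows. This is a minimal illustration, not the trained model: the weight coefficients, intercept, and quantification values are hypothetical placeholders, the function names are invented for this example, and the logistic (sigmoid) conversion from logit to probability is an assumption that is standard for logistic regression.

```python
import math

def disease_indicator(quantities, weights, intercept):
    """Return (logit, probability) for one sample.

    Each weighted value is the product of a quantification metric and the
    weight coefficient for that peptide structure; the logit is their sum
    plus the intercept.
    """
    weighted_values = {ps: weights[ps] * quantities[ps] for ps in weights}
    logit = intercept + sum(weighted_values.values())
    probability = 1.0 / (1.0 + math.exp(-logit))  # logistic link (assumed)
    return logit, probability

def classify(probability, threshold=0.5):
    """Positive only when the indicator is greater than the threshold."""
    return "positive" if probability > threshold else "negative"

# Hypothetical coefficients and measurements, for illustration only.
weights = {"PS-1": 1.2, "PS-2": -0.8, "PS-3": 0.5}
intercept = -0.3
sample = {"PS-1": 0.9, "PS-2": 0.4, "PS-3": 1.1}

logit, prob = disease_indicator(sample, weights, intercept)
print(f"logit={logit:.2f}, probability={prob:.3f}, call={classify(prob)}")
```

A sample whose probability equals the threshold exactly is reported as negative here, which follows the "not greater than the selected threshold" wording; the "at or above" variant would use `>=` instead.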
  • Peptide Sequence is a number that refers to the sequential position of an amino acid of the corresponding peptide in which a glycan is attached.
  • the amino acid position of the peptide sequence is defined by the sequentially numbered order of amino acids for the peptide sequence.
  • Glycan Structure GL No. is a number that corresponds to a symbol structure and a composition of the glycan as indicated in Table 7.
  • the term AGP12 for SEQ ID NOs: 68-69 represents that the glycopeptide is a fragment of either AGP1 or AGP2.
  • the Glycan Linking Site Pos. in Protein Sequence column should be used for identification of the peptide.
  • If the second number subsequent to the second underscore in the Peptide Structure (PS) NAME is inconsistent with the Glycan Structure GL NO column, then the Glycan Structure GL NO column should be used for identification of the glycan portion of the glycopeptide. If the Peptide Structure (PS) NAME does not contain any numbers, then the peptide is non-glycosylated. In some instances of the Peptide Structure (PS) NAME, subsequent to the prefix, there is a number with the notation MC, which indicates that there was a missed cleavage at the position in the peptide sequence given by that number.
  • Figure 7 is a flowchart of a process for training a model to diagnose a subject with respect to an ovarian cancer disease state in accordance with one or more embodiments.
  • Process 700 may be implemented using, for example, at least a portion of workflow 100 as described in Figures 1, 2A, and 2B and/or analysis system 300 as described in Figure 3.
  • process 700 may be one example of an implementation for training the model used in the process 500 in Figures 5, 6, or 6B.
  • Step 702 includes receiving quantification data for a panel of peptide structures for a plurality of subjects.
  • the plurality of subjects may include a first portion diagnosed with a negative diagnosis of an ovarian cancer disease state and a second portion diagnosed with a positive diagnosis of the ovarian cancer disease state.
  • the plurality of subjects may include a first portion having early stage EOC and a second portion having late stage EOC.
  • the quantification data comprises an initial plurality of peptide structure profiles for the plurality of subjects.
  • a peptide structure profile in the initial plurality of peptide structure profiles may include a feature associated with a corresponding peptide structure.
  • the feature may be relative abundance, concentration, site occupancy, or some other quantification-based feature.
  • the initial plurality of peptide structure profiles may include one, two, three, or more profiles for a given peptide structure.
  • Step 704 includes training a machine learning model using the quantification data to diagnose a biological sample with respect to the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state (e.g., the first group of peptide structures is identified in Table 1, the second group of peptide structures is identified in Table 2, the third group of peptide structures is identified in Table 3).
  • the first, second, and third groups of peptide structures are listed in Tables 1, 2, and 3, respectively, with respect to relative significance to diagnosing the biological sample as evidencing malignancy (e.g., EOC).
  • Step 704 can include training the machine learning model using a portion of the quantification data corresponding to a training group of peptide structures included in the plurality of peptide structures.
  • Step 704 can include training a machine learning model using the quantification data to assess a biological sample with respect to the staging of the ovarian cancer disease state using a group of peptide structures associated with the ovarian cancer disease state such as a group of peptide structures identified in Table 3B, 3C, or 3D.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 1 above.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Table 2 above.
  • Step 704 may include reducing the plurality of peptide structure profiles using LASSO regression to identify a final group of peptide structures identified in Tables 3B, 3C, or 3D above.
  • Training data can be used for training the supervised machine learning model.
  • the training data can include a plurality of peptide structure profiles for a plurality of subjects and a plurality of subject diagnoses for the plurality of subjects.
  • the plurality of subject diagnoses can include a positive diagnosis for any subject of the plurality of subjects determined to have the ovarian cancer disease state and a negative diagnosis for any subject of the plurality of subjects determined not to have the ovarian cancer disease state.
  • the machine learning model can include a binary classification model.
  • Some binary classification models can include logistic regression models.
  • Some logistic regression models can include LASSO regression models.
  • An alternative or additional step in process 700 can include filtering the initial plurality of peptide structure profiles by a coefficient of variation to generate a plurality of peptide structure profiles for use in training the machine learning model. As one example, only those peptide structure profiles having a low coefficient of variation (a 20% cutoff) were included in the plurality of peptide structure profiles used for training.
  • An alternative or additional step in process 700 can include performing a differential expression analysis using initial training data to compare a first portion of the plurality of subjects diagnosed with the positive diagnosis for the ovarian cancer disease state versus a second portion of the plurality of subjects diagnosed with the negative diagnosis for the ovarian cancer disease state.
  • An alternative or additional step in process 700 can include identifying a first portion of the plurality of samples for subjects with benign pelvic tumors and malignant pelvic tumors and a second portion of the plurality of samples for subjects with a healthy status.
  • An alternative or additional step in process 700 can include generating a training set of peptide structure profiles for 80% of the first portion and a test set of peptide structure profiles for a remaining 20% of the first portion and the second portion.
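The filtering and splitting steps above can be sketched with the Python standard library. Several details here are assumptions rather than statements from the source: the coefficient of variation is computed across repeated measurements of a reference sample, the 20% cutoff is treated as inclusive, the split is a simple seeded shuffle, and all names and numbers are hypothetical.

```python
import random
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a fraction."""
    return statistics.stdev(values) / statistics.mean(values)

def filter_by_cv(reference_runs, max_cv=0.20):
    """Keep features whose CV across repeated reference runs is within max_cv."""
    return [name for name, values in reference_runs.items()
            if coefficient_of_variation(values) <= max_cv]

def split_samples(tumor_ids, healthy_ids, train_fraction=0.8, seed=0):
    """Train on 80% of the tumor samples (benign + malignant).

    The test set is the remaining 20% of tumor samples plus all healthy
    samples, so healthy subjects never appear in training.
    """
    rng = random.Random(seed)
    shuffled = list(tumor_ids)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    train = shuffled[:n_train]
    test = shuffled[n_train:] + list(healthy_ids)
    return train, test

# Hypothetical repeated measurements of two features in a reference sample.
reference_runs = {
    "feature_a": [100.0, 104.0, 98.0, 101.0],   # reproducible
    "feature_b": [100.0, 160.0, 55.0, 120.0],   # noisy
}
kept = filter_by_cv(reference_runs)

tumor_ids = [f"T{i}" for i in range(10)]
healthy_ids = ["H0", "H1"]
train_ids, test_ids = split_samples(tumor_ids, healthy_ids)
print(kept, len(train_ids), len(test_ids))
```

A stratified split (keeping the benign-to-malignant ratio equal in both sets) would be a natural refinement, but the source does not say whether one was used.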
  • the machine learning model is a supervised machine learning model that is trained to determine weight coefficients for a panel of peptide structures such that a first portion of the weight coefficients for a first portion of the panel of peptide structures are non-zero and a second portion of the weight coefficients for a second portion of the panel of peptide structures are zero (or, alternatively, substantially close to zero so as to not be statistically significant).
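To illustrate how an L1 (LASSO) penalty drives some weight coefficients to exactly zero, the following is a small self-contained sketch of L1-penalized logistic regression fitted by proximal gradient descent. The optimizer, the penalty strength `lam`, the learning rate, and the toy data are all assumptions chosen for illustration; the source does not state which solver or hyperparameters were used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, amount):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > amount:
        return value - amount
    if value < -amount:
        return value + amount
    return 0.0

def fit_l1_logistic(X, y, lam=0.1, lr=0.1, n_iter=2000):
    """Proximal gradient descent; the intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        residuals = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                     for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                  for j in range(d)]
        grad_b = sum(residuals) / n
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data: column 0 separates the classes, column 1 is uninformative
# by construction (its pattern is mirrored across the two classes).
X = [[-2.0, 1.0], [-1.5, -1.0], [-1.0, 1.0], [-0.5, -1.0],
     [0.5, -1.0], [1.0, 1.0], [1.5, -1.0], [2.0, 1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, intercept = fit_l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, intercept, selected)
```

Features with non-zero coefficients after fitting form the retained panel; the zeroed features drop out, which is the selection behavior described above.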
  • the final output generated in step 506 in Figure 5 or in step 606 in Figure 6 may include a treatment output.
  • the treatment output may identify one or more treatment types for a subject based on the disease indicator and/or diagnosis output generated via process 500 in Figure 5 or process 600 in Figure 6, respectively.
  • Treatment for ovarian cancer e.g., EOC
  • the treatment output may include, for example, a treatment plan.
  • the treatment plan may include, for example, but is not limited to, a timeline or schedule for administering the treatment, dosing information, other treatment-related information, or a combination thereof.
  • Being able to accurately predict malignancy via the process 500 in Figure 5 and/or the process 600 in Figure 6 may allow treatment for malignant pelvic tumors (e.g., EOC) to be started earlier without requiring, in many or most cases, further invasive testing such as a biopsy.
  • a patient biological sample is obtained from a subject.
  • the biological sample may be processed (e.g., via digestion and fragmentation) such that one or more peptide structures of interest are detected. For example, detection and quantification may be performed for one or more peptide structures from Table 1, Table 2, Table 3, Table 3B, Table 3C, and/or Table 3D.
  • the quantification data that is generated for these peptide structures may be input into a trained binary classification model to generate a disease indicator, which may be, for example, a probability score.
  • a determination may be made as to whether the disease indicator (e.g., score) is above or below a selected threshold (e.g., 0.5). If the disease indicator is above the selected threshold, the biological sample may be classified as evidencing malignant pelvic tumor.
  • this classification may further include a classification that the subject is in need of treatment. If the subject is in need of treatment based on the classification, treatment is administered. For example, a therapeutically effective amount of a therapeutic agent is administered to the patient, where the therapeutic agent is selected from a chemotherapeutic agent, an immunotherapeutic agent, a hormone therapy, a targeted therapeutic agent, a neoadjuvant therapy, or a combination.
  • compositions comprising one or more of the peptide structures listed in Table 1, in Table 2, in Table 3, in Table 3B, in Table 3C, or in Table 3D.
  • a composition comprises a plurality of the peptide structures listed in Table 1, a plurality of the peptide structures listed in Table 2, or a plurality of the peptide structures listed in Table 3.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, or 90 of the peptide structures listed in Tables 1, 2, 3, 3B, 3C, and 3D.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, or all 36 of the peptide structures listed in Table 3B. In one or more embodiments, a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, or all 25 of the glycopeptide structures listed in Table 3C.
  • a composition comprises 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or all 50 of the glycopeptide structures listed in Table 3D.
  • a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 11-19, 31-46, 53-65, 68-77, 101-125, and 126-175 listed in Tables 1, 2, 3, 3B, 3C, and 3D.
  • a composition comprises a peptide structure having an amino acid sequence with at least 80% sequence identity, such as, for example, at least 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% sequence identity to any one of SEQ ID NOs: 131-134, 137, 139, 140, 143, 151, 165-167 listed in Table 3D.
  • compositions comprising one or more precursor ions having a defined charge and/or defined mass-to-charge (m/z) ratio, as listed in Tables 4, 4B, and 4C.
  • compositions comprising one or more product ions having a defined mass-to-charge (m/z) ratio, which product ions are produced by converting a peptide structure described herein (e.g., a peptide structure listed in Tables 1, 2, 3, 3B, 3C, or 3D) into a gas phase ion in a mass spectrometry system.
  • Conversion of the peptide structure into a gas phase ion can take place using any of a variety of techniques, including, but not limited to, matrix assisted laser desorption ionization (MALDI); electron ionization (EI); electrospray ionization (ESI); atmospheric pressure chemical ionization (APCI); and/or atmospheric pressure photo ionization (APPI).
  • compositions comprising one or more product ions produced from one or more of the peptide structures described herein (e.g., a peptide structure listed in Tables 1, 2, 3, 3B, 3C, or 3D).
  • a composition comprises a set of the product ions listed in Table 4, 4B, or 4C having an m/z ratio selected from the list provided for each peptide structure in Table 4, 4B, or 4C.
  • a composition comprises at least one of peptide structures PS-1 to PS-10 identified in Table 1. In some embodiments, a composition comprises at least one of peptide structures PS-11 to PS-34 and PS-5 identified in Table 2. In some embodiments, a composition comprises at least one of peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3. In some embodiments, a composition comprises at least one of peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 identified in Table 3B.
  • a composition comprises at least one of peptide structures of SEQ ID NOS 101-125 identified in Table 3C. In some embodiments, a composition comprises at least one of peptide structures of PS-ID 91 to 140 identified in Table 3D. In some embodiments, a composition comprises at least one of peptide structures of PS-ID NO: 96-99, 102, 104, 105, 108, 116, and 130-132 identified in Table 3D. In one or more embodiments, a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or all 10 of the peptide structures PS-1 to PS-10 identified in Table 1.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures PS-11 to PS-34 and PS-5 identified in Table 2.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, at least 36, at least 37, or all 38 of the peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
  • the at least 3 peptide structures additionally include at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, or all 7 of the remaining peptide structures PS-1, PS-5, PS-11, PS-15, PS-20, PS-25, PS-28, PS-29, PS-30, PS-31, PS-32, and PS-35 to PS-61 identified in Table 3.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, at least 25, at least 26, at least 27, at least 28, at least 29, at least 30, at least 31, at least 32, at least 33, at least 34, at least 35, or all 36 of the peptide structures PS-4, PS-8, PS-
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, at least 20, at least 21, at least 22, at least 23, at least 24, or all 25 of the peptide structures of SEQ ID NOS 101-125 identified in Table 3C.
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least
  • a composition comprises at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, or all 12 of the peptide structures of SEQ ID NOS 131-134, 137, 139, 140, 143, 151, 165-167 identified in Table 3D
  • the product ion is selected as one from a group consisting of product ions identified in Tables 4, 4B, and 4C including product ions falling within an identified m/z range of the m/z ratio identified in Tables 4, 4B, and 4C and characterized as having a precursor ion having an m/z ratio within an identified m/z range of the m/z ratio identified in Tables 4, 4B, and 4C.
  • a first range for the product ion m/z ratio may be ±0.5.
  • a second range for the product ion m/z ratio may be ±0.8.
  • a third range for the product ion m/z ratio may be ±1.0.
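Matching an observed transition against listed precursor and product m/z values within a range can be sketched as below. The sketch assumes the ranges of 0.5, 0.8, and 1.0 are symmetric tolerances around the listed m/z ratio and that the bounds are inclusive. The transition names and m/z values are hypothetical and are not taken from Tables 4, 4B, or 4C.

```python
def within_tolerance(observed_mz, listed_mz, tolerance=0.5):
    """True when the observed m/z lies within +/- tolerance of the listed m/z."""
    return abs(observed_mz - listed_mz) <= tolerance

def match_transition(obs_precursor, obs_product, transitions, tolerance=0.5):
    """Return names of transitions whose precursor AND product m/z both match."""
    return [name for name, (precursor, product) in transitions.items()
            if within_tolerance(obs_precursor, precursor, tolerance)
            and within_tolerance(obs_product, product, tolerance)]

# Hypothetical transitions: name -> (precursor m/z, product m/z).
transitions = {
    "PS-A": (1000.0, 366.1),
    "PS-B": (1000.6, 204.1),
}

hits = match_transition(1000.3, 366.3, transitions)
print(hits)
```

Requiring both the precursor and the product ion to fall inside the window mirrors the wording that a product ion is characterized by its own m/z ratio and by the m/z ratio of its precursor ion.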
  • Table 6 identifies the proteins of SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67 from at least one of Tables 1, 2, 3, 3B, 3C, and 3D.
  • Table 6 identifies a corresponding protein abbreviation and protein name for each of protein SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67.
  • Table 6 identifies a corresponding Uniprot ID and protein sequence for each of protein SEQ ID NOS: 1-10, 20-30, 47-52, and 66-67.
  • Table 7 illustrates the symbol structure and composition of detected glycan moieties that correspond to glycopeptides of Tables 1, 2, 3, 3B, 3C, and 3D based on the Glycan GL NO.
  • the term Symbol Structure illustrates a geometric linking structure of the carbohydrates where the bottommost carbohydrate such as N-acetylglucosamine is bound to the designated amino acid for an N-linked glycan and the rightmost carbohydrate such as N-acetylgalactosamine is bound to the designated amino acid for an O-linked glycan.
  • Glycan Structure GL NO 1102 is an O-linked glycan, and the rest of the glycans in Table 7 are N-linked glycans.
  • N-linked glycans have a glycan attached to the amino acid asparagine and O-linked glycans have a glycan attached to either a serine or a threonine.
  • the identity of the various monosaccharides is illustrated by the Legend section located at the end of Table 7.
  • the abbreviations of the Legend are Glc that represents glucose and is indicated by a dark circle, Gal that represents galactose and is indicated by an open circle, Man that represents mannose and is indicated by a circle with intermediate grey shading, Fuc that represents fucose and is indicated by a dark triangle, Neu5Ac that represents N-acetylneuraminic acid and is indicated by a dark diamond, GlcNAc that represents N-acetylglucosamine and is indicated by a dark square, GalNAc that represents N-acetylgalactosamine and is indicated by an open square, and ManNAc that represents N-acetylmannosamine and is indicated by a square with intermediate grey shading.
  • Composition refers to the number of various classes of carbohydrates that make up the glycan.
  • the quantity for each class of carbohydrate is depicted as a number in parenthesis to the right of an abbreviation that corresponds to the class of the carbohydrate.
  • the abbreviations for these classes are Hex, HexNAc, Fuc, and NeuAc that respectively correspond to hexose, N-acetylhexosamine, fucose, and N-acetylneuraminic acid.
  • hexose sugars include glucose, galactose, and mannose; and N-acetylhexosamine sugars include N-acetylglucosamine, N-acetylgalactosamine, and N-acetylmannosamine.
  • the terms Neu5Ac, NeuAc, and N-acetylneuraminic acid may be referred to as sialic acid.
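The Composition notation described above (a class abbreviation followed by a count in parentheses) lends itself to a small parser. The following is a sketch under assumptions: the exact string layout in Table 7 is not reproduced here, so the example string and the helper names are hypothetical.

```python
import re

CLASSES = ("Hex", "HexNAc", "Fuc", "NeuAc")

# HexNAc is listed before Hex so the longer abbreviation is tried first.
_TOKEN = re.compile(r"(HexNAc|Hex|Fuc|NeuAc)\((\d+)\)")

def parse_composition(text):
    """Return a dict of carbohydrate class -> count for a composition string."""
    counts = {name: 0 for name in CLASSES}
    for name, number in _TOKEN.findall(text):
        counts[name] += int(number)
    return counts

def is_sialylated(counts):
    """A glycan carrying at least one N-acetylneuraminic (sialic) acid."""
    return counts["NeuAc"] > 0

# Hypothetical composition string in the class(count) style.
example = parse_composition("Hex(5)HexNAc(4)Fuc(1)NeuAc(2)")
print(example, is_sialylated(example))
```

Classes absent from the string are reported with a count of zero, which keeps downstream comparisons between glycans uniform.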
  • Two symbol structures may be provided for one Glycan Structure GL NO such as, for example, Glycan Structure GL NO 3510.
  • the identity of a peptide that references a Glycan Structure GL NO that has two symbol structures could be either one of the two possibilities based on the MRM of the LC-MS analysis.
  • a bracket symbol is used as part of the Symbol Structure to indicate that the precise bonding linkage is not exactly known, but that the linking line segment is attached to one of the plurality of adjacent carbohydrates immediately adjacent to the bracket.
  • the fucose of Glycan Structure GL NO 3510 could have either a core fucose or an outer-arm fucose linkage.
  • the fucose orientation of either core or outer-arm linkage can be specified.
  • the methods for analyzing one or more peptide structures involve detecting a set of product ions generated by a reaction monitoring mass spectrometry system in which one or more product ions may correspond to each of the one or more peptide structures that have been inputted into the mass spectrometry system.
  • each peptide structure can be converted into a set of product ions having a defined m/z ratio, as provided in Tables 4, 4B, and 4C, or an m/z ratio within an identified m/z range of the m/z ratio provided in Tables 4, 4B, and 4C.
  • the methods involve generating quantification (e.g., abundance) data for the one or more product ions detected using the reaction monitoring mass spectrometry system.
  • the methods further comprise generating a diagnosis output using the quantification data and a model that has been trained using supervised or unsupervised machine learning.
  • the reaction monitoring mass spectrometry system may include multiple/selected reaction monitoring mass spectrometry (MRM/SRM-MS) to detect the one or more product ions and generate the quantification data.
  • Figure 8 is a table describing the distribution of the samples acquired in this exemplary retrospective analysis in accordance with one or more embodiments.
  • serum samples were acquired from a commercial biobank for 151 women with benign pelvic masses, 145 women with malignant epithelial ovarian cancer (EOC), and 55 healthy controls. Information on stage of EOC was available in 98 of the 145 patients with EOC. All samples were obtained prior to therapeutic intervention. Information on the benign or malignant nature of tumors was based on histopathological analysis of tissue specimens.
  • Sample processing involved pooled human serum/plasma (e.g., glycoprotein standards purified from human serum/plasma) for assay normalization, dithiothreitol (DTT), iodoacetamide (IAA), sequencing-grade trypsin, LC-MS-grade water and acetonitrile, and formic acid (LC-MS grade). Serum samples were treated with DTT and IAA to reduce disulfide bonds and to inhibit cysteine proteases, respectively, followed by digestion with trypsin at 37°C for 18 hours. The digestion was quenched by adding formic acid to each sample to a final concentration of 1% (v/v).
  • LC-MS analysis included separating digested serum samples over an Agilent ZORBAX Eclipse Plus C18 column (2.1 mm x 150 mm i.d., 1.8 µm particle size) using an Agilent 1290 Infinity UHPLC system.
  • the mobile phase A consisted of 3% acetonitrile, 0.1% formic acid in water (v/v), and the mobile phase B of 90% acetonitrile, 0.1% formic acid in water (v/v), with the flow rate set at 0.5 mL/minute.
  • the binary solvent composition was set at 100% mobile phase A at the beginning of the run, linearly shifting to 20% B at 20 minutes, 30% B at 40 minutes, and 44% B at 47 minutes.
  • the column was flushed with 100% B and equilibrated with 100% A for a total run time of 70 minutes.
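The linear gradient described above can be written as a piecewise-linear function of time. This sketch covers only the stated breakpoints (0% B at the start, 20% B at 20 minutes, 30% B at 40 minutes, 44% B at 47 minutes); the timing of the flush and re-equilibration within the 70-minute run is not given, so it is not modeled. The function and constant names are hypothetical.

```python
# (time in minutes, percent mobile phase B) breakpoints from the description.
GRADIENT = [(0.0, 0.0), (20.0, 20.0), (40.0, 30.0), (47.0, 44.0)]

def percent_b(t_min, gradient=GRADIENT):
    """Linearly interpolate %B at time t_min within the programmed gradient."""
    if t_min < gradient[0][0] or t_min > gradient[-1][0]:
        raise ValueError("time is outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)

def percent_a(t_min, gradient=GRADIENT):
    """Binary solvent system: %A is the remainder."""
    return 100.0 - percent_b(t_min, gradient)

for t in (0, 10, 20, 30, 40, 47):
    print(t, percent_b(t), percent_a(t))
```

The slope changes at each breakpoint: 1% B per minute for the first 20 minutes, 0.5% B per minute until 40 minutes, then 2% B per minute until 47 minutes.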
  • samples were injected into an Agilent 6495B triple quadrupole MS operated in dynamic multiple reaction monitoring (dMRM) mode.
  • the MRM transitions comprised 513 glycopeptide structures which were normalized by comparing them with the abundance of 71 non-glycosylated peptide structures, representing each of 71 proteins from which the glycopeptides monitored were derived.
  • Samples were injected in randomized order with respect to underlying phenotype, and reference pooled serum digests were injected interspersed with study samples.
  • VIII.A.3. Data Analysis
  • This subset included 976 features, with each feature being a concentration, relative abundance, or site occupancy for a corresponding peptide structure and where some peptide structures correspond with multiple features.
  • a given peptide structure may be associated with one, two, or three features within the subset of the 976 features.
  • Figure 9 is a plot diagram illustrating the results of a principal component analysis performed to assess the segregation between healthy, benign pelvic tumor, and EOC samples across first and second principal components in accordance with one or more embodiments.
  • EOC samples segregated distinctly from healthy control samples, while most benign pelvic tumors did not segregate as distinctly from healthy control samples.
  • Figure 10 is a plot diagram illustrating the results of a principal component analysis performed to assess segregation between healthy, benign pelvic tumor, early EOC, late EOC, and missing (undocumented) samples.
  • EOC samples (and in particular late stage EOC samples) segregated distinctly from healthy control samples, while most benign pelvic tumors did not segregate as distinctly from healthy control samples.
  • FIG 11 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • the multivariable model that was built may be used accurately and reliably to predict malignant EOC and distinguish such malignancy from a healthy status.
  • diagnostic power may be used to reduce the need for unnecessary invasive testing.
  • diagnostic information can be used to identify patients with EOC earlier, which may lead to earlier treatment, improved treatment recommendations, and improved treatment plans.
  • Table 8 below provides the fold changes, FDRs, and p-values for the 10 peptide structures PS-1 to PS-10 (same as those in Table 1 above) based on differential expression analysis (DEA).
  • Table 8 Peptide Structure Markers for Regression Model to distinguish between Epithelial Ovarian Cancer and Healthy State
  • FIG. 13 is an illustration of a receiver operating characteristic (ROC) diagram corresponding to the multivariable model built to predict malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • the multivariable model that was built may be used accurately and reliably to triage pelvic tumors and distinguish those that are malignant from those that are benign.
  • diagnostic power may be used to reduce the need for invasive testing (e.g., biopsy) before treatment can be administered.
  • diagnostic information can be used to improve treatment recommendations and treatment plans (e.g., earlier treatment in the case of malignant EOC) and reduce indications for unnecessary treatment (e.g., no indication for surgery when the pelvic tumor is benign).
  • FIG 14 is an illustration of a diagram showing the probability distributions for the various groups using the multivariable model for predicting malignancy v. benign status of pelvic tumors in accordance with one or more embodiments.
  • the probability distributions for benign pelvic tumor, healthy, missing (undocumented), stage 1 EOC, stage 2 EOC, stage 3 EOC, and stage 4 EOC samples increased with cancer stage, with probability distributions being similar across training and test sets.
  • applying the built multivariable model to healthy patients, who were not utilized in the training, resulted in few misclassifications and a spread nearly equivalent to that of the benign pelvic tumor cases.
  • Such results indicate that the glycoproteomic signature of the 25 peptide structures for the LASSO regression model solidly predicts malignancy and severity of disease.
  • Table 9 below provides the fold changes, FDRs, and p-values for the 25 peptide structures PS-5 and PS-11 to PS-34 (same as those in Table 2 above) based on differential expression analysis (DEA).
  • Table 10 below provides the fold changes, FDRs, and p-values for the 36 peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 (same as those in Table 3B above) based on differential expression analysis (DEA).
  • the peptide structures PS-4, PS-8, PS-18, PS-36, PS-37, PS-41, PS-56, PS-62 to PS-90 are ordered in Table 10 with respect to relative significance to the p-value score generated by the model.
  • Table 10 Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3B.
  • Table 10B below provides the fold changes, FDRs, and p-values for the 25 peptide structures denoted by SEQ ID NO 101-125 (in accordance with Table 3C above) using differential expression analysis (DEA).
  • Table 10B Peptide Structure Markers for Regression Model to distinguish between late stage (3/4) EOC and early stage EOC (1/2) using the biomarkers of Table 3C.
  • Table 10C below provides the fold changes, FDRs, and p-values for the 50 peptide structures denoted by SEQ ID NO 126-175 (in accordance with Table 3D above) using differential expression analysis (DEA).
  • DEA differential expression analysis
  • Table 10D below provides the fold changes, FDRs, and p-values for the 12 peptide structures denoted by SEQ ID NO 131-134, 137, 139, 140, 143, 151, 165-167 (in accordance with Table 3D above) using differential expression analysis (DEA).
  • n is a number of biomarkers having a unique PS-ID No.
  • i is an index number for each of the biomarkers, Table 11.
  • the markers from Table 3C were used to train a regularized regression model (e.g., LASSO regression model).
  • Coefficients for the regularized regression model (e.g., LASSO regression model)
  • Coefficients for the regularized regression model (e.g., LASSO regression model) are provided in Table 11B.
  • a probability for one of the states can be determined by summing together the product of the concentration of each biomarker in the sample and the respective coefficient (of one column) and then adding the summation and the intercept to yield the logit of a probability score (see equation 1).
  • FIG. 17 illustrates a receiver-operating-characteristic (ROC) curve and the area under curve (AUC) for the regularized regression model (e.g., LASSO regression model) for early stage and late stage ovarian cancer samples using testing case data and training case data.
  • ROC receiver-operating-characteristic
  • AUC area under curve
  • Table 12 shows the accuracy, sensitivity, specificity and precision for the training data set and the testing data set.
  • Table 13 shows the training accuracy and testing accuracy for the early stage and late stage cohort for ovarian cancer.
  • predicted probability was generated for early stage and late stage ovarian cancer samples showing a stratification in predicted probabilities between the two cohorts as is illustrated in Figure 20.
  • predicted probability can be generated for classifying early stage and late stage ovarian cancer samples using the markers with non-zero coefficients such as SEQ ID NOs 130-135, 137, 139, 140, 143, 148, 149, 155, 158-162, 166, and 171.
  • a logistic regression model was used with the glycopeptides of Table 3D where the glycopeptides had 1 or more sialic acids and zero or more fucosylations for the early and late stage EOC cohorts.
  • the sialic acid is connected to galactose
  • galactose is connected to N-acetylglucosamine
  • N-acetylglucosamine is connected to fucose.
  • the 4 glycan breakdown fragment represents a single antennary branch having a fucose in an outer arm fucose position where the aggregate glycan was cleaved at a linkage between a mannose and an N-acetylglucosamine.
  • the 3 glycan breakdown fragment includes galactose, N-acetylglucosamine, and fucose (m/z value of 512.198).
  • the galactose is connected to N-acetylglucosamine
  • N-acetylglucosamine is connected to fucose.
  • the presence of the 4 glycan breakdown fragment and the 3 glycan breakdown fragment as shown in Figure 22 indicates the presence of outer arm fucosylation.
  • SEQ ID NOS. 131, 137, 143, 155, 158, 159, 162, 166, and 171 correspond to glycopeptides that each have a non-zero coefficient along with one fucose.
  • SEQ ID NO 131, 137, 143, 155, 159, 162, 166, and 171 each correspond to a glycopeptide having an outer arm fucosylation format.
  • glycopeptide biomarkers with outer arm fucosylation can provide better prediction of ovarian cancer disease states.
  • a predicted probability can be generated for early stage and late stage ovarian cancer samples showing a stratification in predicted probabilities between the two cohorts.
  • predicted probability can be generated for classifying early stage and late stage ovarian cancer samples using the markers
  • glycopeptides of Table 3D were found to be associated with EOC.
  • glycopeptides that included fucose and also carried tri- and tetra-antennary glycan structures were found to be more strongly associated with EOC.
  • Table 15 shows the accuracy, sensitivity, and specificity for the training data set and the testing data set.
  • the relative performance shown in Table 15 is better than that shown in Table 14, indicating that the subset of biomarkers using predominantly tri- and tetra-antennary glycans generated a better model for determining early vs. late stage EOC.
  • a validation study was conducted using both retrospective patient samples and samples collected prospectively in the ongoing Clinical Validation of the InterVenn Ovarian CAncer Liquid Biopsy (VOCAL) study. Samples included those from patients with malignant EOC and patients with benign pelvic tumors. Samples were processed in a manner similar to the manner described for the Exemplary Retrospective Analysis in Section VII. A above.
  • Table 10 below provides the fold changes and p-values for the 38 peptide structures also identified in Table 3 above based on differential expression analysis (DEA). These peptide structures are ordered both in Table 3 and in Table 10 with respect to relative significance to the probability score generated by the model based on p-values. In this context, more significant peptide structures have lower p-values, while less significant peptide structures have higher p-values. In other words, relative significance to the probability score decreased with increasing p-values.

IX. Additional Considerations
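
The description of the regularized regression model states that a probability is obtained by summing the products of each biomarker concentration and its coefficient, then adding the intercept to yield the logit of a probability score. The following is a minimal Python sketch of that calculation, not the applicant's implementation. The marker names, coefficient values, concentrations, and intercept are hypothetical placeholders and are not taken from Table 11 or Table 11B; the conversion from logit to probability uses the standard logistic function.

```python
import math


def predicted_probability(concentrations, coefficients, intercept):
    """Return (logit, probability) for one sample.

    concentrations: dict mapping marker name -> measured concentration
    coefficients:   dict mapping marker name -> model coefficient
    intercept:      model intercept
    """
    missing = set(coefficients) - set(concentrations)
    if missing:
        raise KeyError(f"no concentration supplied for: {sorted(missing)}")

    # logit = intercept + sum over markers of (coefficient * concentration)
    logit = intercept + sum(
        coefficients[name] * concentrations[name] for name in coefficients
    )

    # Standard logistic function, written to avoid overflow for large |logit|.
    if logit >= 0:
        probability = 1.0 / (1.0 + math.exp(-logit))
    else:
        exp_logit = math.exp(logit)
        probability = exp_logit / (1.0 + exp_logit)
    return logit, probability


# Hypothetical example values (assumptions for illustration only).
example_coefficients = {"marker_a": 0.8, "marker_b": -1.2, "marker_c": 0.0}
example_concentrations = {"marker_a": 2.0, "marker_b": 0.5, "marker_c": 3.0}
example_intercept = -0.5

example_logit, example_probability = predicted_probability(
    example_concentrations, example_coefficients, example_intercept
)
print(f"logit = {example_logit:.3f}, probability = {example_probability:.3f}")
```

In this sketch a marker with a zero coefficient (here the hypothetical `marker_c`) contributes nothing to the logit, which is consistent with the description's focus on markers having non-zero coefficients.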

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Genetics & Genomics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Chemical & Material Sciences (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Urology & Nephrology (AREA)
  • Immunology (AREA)
  • Hematology (AREA)
  • Databases & Information Systems (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Cell Biology (AREA)
  • Artificial Intelligence (AREA)
  • Primary Health Care (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Microbiology (AREA)
  • Evolutionary Computation (AREA)
  • Food Science & Technology (AREA)
  • Medicinal Chemistry (AREA)
  • Biochemistry (AREA)
  • General Physics & Mathematics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
EP23866505.3A 2022-09-16 2023-09-14 Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation Pending EP4587839A2 (de)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202263376053P 2022-09-16 2022-09-16
US202363489712P 2023-03-10 2023-03-10
US202363517859P 2023-08-04 2023-08-04
PCT/US2023/074251 WO2024059750A2 (en) 2022-09-16 2023-09-14 Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation

Publications (1)

Publication Number Publication Date
EP4587839A2 true EP4587839A2 (de) 2025-07-23

Family

ID=90275934

Family Applications (1)

Application Number Title Priority Date Filing Date
EP23866505.3A Pending EP4587839A2 (de) 2022-09-16 2023-09-14 Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation

Country Status (2)

Country Link
EP (1) EP4587839A2 (de)
WO (1) WO2024059750A2 (de)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5266465A (en) * 1989-06-23 1993-11-30 The Trustees Of The University Of Pennsylvania α-1-antichymotrypsin, analogues and methods of production
JP2006516089A (ja) * 2002-10-02 2006-06-22 Genentech, Inc. Compositions and methods for the diagnosis and treatment of tumors
AU2019246007C1 (en) * 2018-03-26 2025-08-07 Glycanostics S.R.O. Means and methods for glycoprofiling of a protein
CA3128367A1 (en) * 2019-02-01 2020-08-06 Venn Biosciences Corporation Biomarkers for diagnosing ovarian cancer
CA3219354A1 (en) * 2021-05-18 2022-11-24 Daniel SERIE Biomarkers for diagnosing ovarian cancer

Also Published As

Publication number Publication date
WO2024059750A3 (en) 2024-06-13
WO2024059750A2 (en) 2024-03-21

Similar Documents

Publication Publication Date Title
US20230055572A1 (en) Biomarkers for diagnosing ovarian cancer
US20220310230A1 (en) Biomarkers for determining an immuno-onocology response
US11774459B2 (en) Biomarkers for diagnosing non-alcoholic steatohepatitis (NASH) or hepatocellular carcinoma (HCC)
US20260004885A1 (en) Biomarkers for determining a cancer disease state, response to immuno-oncology, stages of fibrosis in non-alcoholic steatohepatitis, or application of age or sex related biomarker panel for quality control
US12578346B2 (en) Systems and methods for glycopeptide concentration determination, normalized abundance determination, and LC/MS run sample preparation
WO2024059750A2 (en) Diagnosis of ovarian cancer using targeted quantification of site-specific protein glycosylation
CN116456895A (zh) Biomarkers for diagnosing non-alcoholic steatohepatitis (NASH) or hepatocellular carcinoma (HCC)
US20240379228A1 (en) Diagnosis of colorectal cancer using targeted quantification of peptides
AU2022399828A1 (en) Diagnosis of pancreatic cancer using targeted quantification of site-specific protein glycosylation
US20250111901A1 (en) De novo glycopeptide sequencing
US20250087363A1 (en) Predicting sarcoma treatment response using targeted quantification of site-specific protein glycosylation
US20250232874A1 (en) Ai-driven glycoproteomics liquid biopsy in nasopharyngeal carcinoma
HK40098154A (zh) Biomarkers for diagnosing non-alcoholic steatohepatitis (NASH) or hepatocellular carcinoma (HCC)
WO2025024433A1 (en) Fucosylated pd-1 variants for determining an immuno-oncology response
CN117561449A (zh) Biomarkers for determining an immuno-oncology response
HK40109183A (zh) Biomarkers for determining an immuno-oncology response

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20250416

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)