WO2021202620A1 - Metabolomics approach combined with machine learning for recognizing a medical condition - Google Patents

Metabolomics approach combined with machine learning for recognizing a medical condition

Info

Publication number
WO2021202620A1
Authority
WO
WIPO (PCT)
Prior art keywords
influenza
medical condition
compound
metabolite
equivalent
Prior art date
Application number
PCT/US2021/025015
Other languages
English (en)
Inventor
Pranav Rajpurkar
Benjamin Alan PINSKY
Catherine HOGAN
Anthony T. Le
Tina M. Cowan
Original Assignee
The Board Of Trustees Of The Leland Stanford Junior University
Priority date
Filing date
Publication date
Application filed by The Board Of Trustees Of The Leland Stanford Junior University
Publication of WO2021202620A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Definitions

  • Accurate testing is particularly important for respiratory viruses including influenza, which are estimated to have caused over 35 million symptomatic illnesses during the 2018- 2019 season alone in the United States (Estimated Influenza Illnesses, Medical visits, Hospitalizations, and Deaths in the United States — 2018-2019 influenza season. (2019)).
  • rapid diagnostic methods for COVID-19 are currently limited to targeted molecular tests (RT-PCR) that detect the viral RNA genome, or serologic tests that detect anti-SARS-CoV-2 antibodies.
  • RT-PCR reverse transcription polymerase chain reaction (a targeted molecular test)
  • up to 30% of COVID-19 cases may be missed by molecular methods, and specific antibodies are not reliably identified until 2 weeks after the onset of symptoms.
  • a method comprising, or alternatively consisting essentially of, or yet further consisting of (a) generating a training dataset based on one or more metabolites isolated from a plurality of biological samples isolated from a plurality of subjects, wherein the plurality of subjects comprises subjects having a medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through one or more tests run on the biological samples; (b) producing, using a machine learning system, a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising, or alternatively consisting essentially of, or yet further consisting of the set of identified features; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and (c) storing, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.
  • a method comprising, or alternatively consisting essentially of, or yet further consisting of (a) running one or more types of tests on biological samples from subjects having a medical condition, wherein the biological samples comprise, or alternatively consist essentially of, or yet further consist of a metabolite profile that changed in the subjects as a result of the medical condition; (b) generating a training dataset comprising, or alternatively consisting essentially of, or yet further consisting of a set of features identified through the one or more tests run on the biological samples; (c) producing a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising the set of identified features; (ii) selecting a subset of the set of features having contributions to model predictions exceeding a threshold; and (iii) generating the metabolite biomarker signature based on the subset of features; and (d) applying the metabolite biomarker signature to a biological sample from a patient to recognize the medical condition.
  • a method comprising, or alternatively consisting essentially of, or yet further consisting of producing, using a machine learning system, a metabolite biomarker signature by: (i) applying one or more machine learning models to a training dataset based on one or more metabolites in a plurality of biological samples from a plurality of subjects, wherein the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having a medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through the one or more tests run on the biological samples;
  • a method comprising, or alternatively consisting essentially of, or yet further consisting of applying a metabolite biomarker signature to a biological sample from a patient to recognize the medical condition.
  • the metabolite biomarker signature is produced by a method as disclosed herein.
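The signature-producing steps above (apply a model, keep the features whose contribution to its predictions exceeds a threshold, and store the result with the condition) can be sketched in a few lines. This is a minimal stdlib-only sketch, not the patented implementation: it assumes per-sample contribution scores (for example SHAP values from an already trained model) are given, measures each feature's contribution as its mean absolute score expressed as a percentage of the total, and keeps features above a hypothetical percentage threshold. The function names, the JSON layout, and the contribution values are illustrative assumptions; only the m/z@RT feature labels are taken from this document.

```python
import json

def percent_importance(contributions):
    """Mean absolute contribution per feature, as a percentage of the total.

    contributions: list of per-sample dicts {feature_name: contribution}, e.g. SHAP values.
    """
    totals = {}
    for row in contributions:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    means = {f: s / len(contributions) for f, s in totals.items()}
    grand_total = sum(means.values())
    if grand_total == 0:
        return {f: 0.0 for f in means}
    return {f: 100.0 * m / grand_total for f, m in means.items()}

def build_signature(contributions, threshold_pct):
    """Steps (ii)-(iii): keep features whose importance exceeds the threshold, most important first."""
    importance = percent_importance(contributions)
    kept = [f for f, pct in importance.items() if pct > threshold_pct]
    return sorted(kept, key=importance.get, reverse=True)

def store_signature(path, condition, signature):
    """Step (c): persist the signature with its condition (hypothetical JSON layout)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"condition": condition, "features": signature}, fh)

# Invented per-sample contributions for three ion features.
example = [
    {"84.0447@0.81": -0.9, "106.0865@10.34": 0.4, "201.0740@3.21": 0.05},
    {"84.0447@0.81": 0.7, "106.0865@10.34": -0.3, "201.0740@3.21": 0.02},
]
signature = build_signature(example, threshold_pct=10.0)
print(signature)  # ['84.0447@0.81', '106.0865@10.34']
```

Using absolute values means a feature counts as important whether it pushes predictions toward or away from the condition, which matches how SHAP importance rankings are usually summarized.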
  • a method for selecting a subject for an anti-influenza treatment comprises, or alternatively consists essentially of, or yet further consists of determining in a biological sample isolated from a subject suspected of having a medical condition (such as being infected with an influenza virus) a feature of a metabolite.
  • the metabolite is selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having a mass-to-charge ratio (m/z) of 106.0865 and a retention time (RT) of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 144.0935 and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 145.0935 and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound
  • an altered level of the metabolite in the sample as compared to a control level of the metabolite indicates that the subject is suitable for an anti-influenza treatment.
  • the in-source fragment ion of pyroglutamic acid is pyroglutamic acid-D5.
  • the method further comprises administering the subject having the infection an anti-influenza therapy.
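The metabolite list above identifies unnamed compounds by m/z and RT "or an equivalent thereof", and an altered level relative to a control level indicates suitability for anti-influenza treatment. One way to express that matching and comparison is sketched below. The 10 ppm m/z tolerance, 0.1 min RT tolerance, 2-fold cutoff, control levels, and observed peaks are all hypothetical values chosen for illustration; the document does not specify them.

```python
# (m/z, RT in minutes) pairs copied from the metabolite list above.
TARGETS = [(106.0865, 10.34), (130.0507, 0.81), (178.1441, 10.33), (201.0740, 3.21)]

def matches(mz, rt, target, ppm_tol=10.0, rt_tol=0.1):
    """True if an observed peak lies within hypothetical m/z (ppm) and RT (min) tolerances."""
    target_mz, target_rt = target
    ppm = abs(mz - target_mz) / target_mz * 1e6
    return ppm <= ppm_tol and abs(rt - target_rt) <= rt_tol

def is_altered(level, control_level, fold_cutoff=2.0):
    """Altered = at least fold_cutoff higher or lower than the control level."""
    if level <= 0 or control_level <= 0:
        raise ValueError("levels must be positive")
    ratio = level / control_level
    return ratio >= fold_cutoff or ratio <= 1.0 / fold_cutoff

def altered_targets(peaks, controls, fold_cutoff=2.0):
    """Targets whose first matching peak has an altered level versus its control.

    peaks: list of (mz, rt, level); controls: {target: control_level}.
    """
    hits = []
    for target in TARGETS:
        for mz, rt, level in peaks:
            if matches(mz, rt, target):
                if is_altered(level, controls[target], fold_cutoff):
                    hits.append(target)
                break  # use only the first peak that matches this target
    return hits

controls = {target: 1000.0 for target in TARGETS}  # hypothetical control levels
peaks = [(106.0868, 10.36, 5000.0), (130.0505, 0.80, 900.0)]
print(altered_targets(peaks, controls))
```

Here the first peak matches 106.0865@10.34 and is 5-fold above control, while the second matches 130.0507@0.81 but stays within 2-fold, so only the first target is reported.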
  • a system comprising, or alternatively consisting essentially of, or yet further consisting of a processor and a memory.
  • the memory comprises, or alternatively consists essentially of, or yet further consists of instructions that are executable by the processor to cause the machine learning system to: (a) generate a training dataset based on one or more tests run on biological samples from a plurality of subjects having a medical condition, wherein the biological samples comprise, or alternatively consist essentially of, or yet further consist of metabolites from the medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through the one or more tests run on the biological samples; (b) produce a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising the set of identified features; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the sub
  • the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having the medical condition. In further embodiments, the plurality of subjects further comprises subjects without the medical condition. In some embodiments, the subject has been treated with a therapy neutralizing a pathogen causing the medical condition. Additionally or alternatively, the subject is immune-compromised. In some embodiments, the subject is a human. In some embodiments, the subject is an adult. In other embodiments, the subject is a child. In some embodiments, the medical condition is an infection by an influenza virus.
  • the medical condition is an infection by a coronavirus, such as HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63, severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1), Middle East respiratory syndrome coronavirus (MERS-CoV), or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
  • the feature of the biological sample comprises a concentration (absolute or normalized), for example an extracellular concentration, of the metabolite.
  • a metabolite is an intracellular metabolite.
  • the machine learning models comprise, or alternatively consist essentially of, or yet further consist of a boosted or bagged decision tree, such as Light Gradient Boosting Machine (LightGBM), XGBoost, random forest, or Adaptive Boosting (AdaBoost).
  • the selecting the subset of features comprises performing feature importance analysis.
  • the method further comprises applying a Shapley Additive Explanation (SHAP) procedure and selecting the subset of features.
  • SHAP Shapley Additive Explanation
  • the method further comprises administering to the patient having the medical condition a therapy specifically for treating the condition.
  • the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent neutralizing a pathogen causing the medical condition.
  • the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent not neutralizing a pathogen causing the medical condition.
  • the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent not neutralizing an extracellular pathogen causing the medical condition.
  • LC/MS Liquid Chromatography/Mass Spectrometry
  • FIGs. 1A-1D show principal component analysis (PCA) of unpaired t-test comparison based on 4 key metabolites of nasopharyngeal swabs positive for a respiratory virus (as marked; FIG. 1A: adenovirus; FIG. 1B: coronavirus; FIG. 1C: RSV) and nasopharyngeal swabs negative for respiratory viruses by RT-PCR (as marked). Influenza A 2009 H1N1 was also differentiated from influenza A H3N2 using the same methodology (FIG. 1D).
  • PCA principal component analysis
  • FIGs. 2A - 2B provide a conceptual diagram of the classification analysis by gradient boosted decision tree implemented in GBM (FIG. 2A) and a conceptual diagram of the study from data collection to interpretation (FIG. 2B). The phases of data collection, model development, and interpretation are illustrated.
  • LC/Q-TOF liquid chromatography quadrupole time-of-flight
  • LC-MS/MS liquid chromatography-tandem mass spectrometry
  • RF random forests
  • ROC receiver operating characteristic curve
  • SHAP Shapley Additive explanation.
  • FIGs. 3A - 3D show area under the receiver operating characteristic curve test performance of the biomarker discovery set.
  • ROC curves comparing the performance of the machine learning models (RF, LightGBM) with the traditional linear models (Lasso, Ridge) on the test set; bracketed values are 95% AUC confidence intervals calculated from a normal fit of the curves.
  • FIGs. 3B-3C are ROC curves comparing LightGBM's performance on the test set stratified by subgroup pairs: pediatrics (FIG. 3B) and immunocompromised (FIG. 3C). 95% confidence intervals are shown in brackets.
  • FIG. 3D provides ROC curves comparing LightGBM's performance on the prospective test set; bracketed values are 95% AUC confidence intervals calculated from a normal fit of the curves.
  • AUC area under the receiver operating characteristic curve; RF: random forests; ROC: receiver operating characteristic curve.
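The AUCs above are reported with 95% confidence intervals. As a minimal sketch, the AUC can be computed as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half), and an interval can be formed from the Hanley-McNeil standard error under a normal approximation. The document says only that the intervals come from "a normal fit of the curves", so the Hanley-McNeil formula is a swapped-in, commonly used approximation and not necessarily the authors' procedure; the scores are invented.

```python
import math

def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def auc_ci_hanley_mcneil(auc, n_pos, n_neg, z=1.96):
    """Approximate CI from the Hanley-McNeil standard error, clipped to [0, 1]."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)

# Invented classifier scores for influenza-positive and -negative samples.
positives = [0.9, 0.8, 0.7, 0.4]
negatives = [0.5, 0.3, 0.2, 0.1]
auc = auc_mann_whitney(positives, negatives)
print(auc, auc_ci_hanley_mcneil(auc, len(positives), len(negatives)))
```

With only four samples per class the interval is very wide, which illustrates why the AUC intervals reported for small subgroups deserve cautious reading.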
  • FIGs. 4A - 4D provide feature importance analysis by SHapley Additive explanation (SHAP) values.
  • FIGs. 4A-4B list the top 20 ion features by percentage importance using the SHAP method. Ion features are identified by accurate mass @ retention time (m/z), and the grey scale indicates the association between feature value and positive influenza classification. For example, low values of 84.0447@0.81 are indicative of positive classification, while the relative value of 106.0865@10.34 does not have a clear interpretation, despite being an important feature.
  • FIG. 4C provides AUC and 95% confidence interval of parsimonious decision tree models as a function of number of features used for training. For each set, the left bar indicates data from the discovery set while the right bar indicates data from the validation set.
  • FIG. 4D provides an example decision tree model trained using only the top feature and a maximum depth of 1 that has an AUC of greater than 0.9 on the test set.
  • AUC area under the receiver operating characteristic curve; m/z: mass over charge ratio; RT: retention time; SHAP: SHapley Additive explanation analysis.
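FIG. 4D describes a decision tree with a maximum depth of 1 trained on only the top feature. Such a tree is a single threshold on one feature, so it can be fitted by trying every midpoint between sorted feature values. This sketch minimizes the misclassification count, whereas libraries such as LightGBM or scikit-learn use impurity- or gradient-based split criteria; the intensities are invented, chosen so that low values indicate a positive classification, as described above for 84.0447@0.81.

```python
def fit_stump(values, labels):
    """Depth-1 tree on one feature: returns (threshold, positive_if_below) with fewest errors.

    labels: 1 = influenza-positive, 0 = negative.
    """
    distinct = sorted(set(values))
    # Candidate thresholds: one below all values, then midpoints between neighbours.
    thresholds = [distinct[0] - 1.0] + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = None
    for t in thresholds:
        for positive_if_below in (True, False):
            errors = sum(
                (int(v < t) if positive_if_below else int(v >= t)) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, t, positive_if_below)
    return best[1], best[2]

def predict_stump(stump, value):
    threshold, positive_if_below = stump
    return int(value < threshold) if positive_if_below else int(value >= threshold)

# Invented intensities: low values of the feature go with positive samples.
values = [100.0, 150.0, 120.0, 400.0, 380.0, 500.0]
labels = [1, 1, 1, 0, 0, 0]
stump = fit_stump(values, labels)
print(stump, predict_stump(stump, 90.0), predict_stump(stump, 600.0))
```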
  • FIGs. 5A - 5B show area under the receiver operating characteristic curve test performance of the validation set.
  • ROC curves demonstrate LightGBM’s performance on the 96-sample validation test set in Laboratory 1.
  • ROC curves demonstrate LightGBM’s performance on the 96-sample validation test set in Laboratory 2.
  • FIG. 6 is a heatmap of nasopharyngeal metabolites. This heatmap was generated from metabolomics analysis of nasopharyngeal samples from children and adults with and without influenza infection, clustered by correlation distance and average linkage. The accurate mass and retention time (accurate mass @ retention time) are listed for each compound on the right, the hierarchical cluster tree appears on the left, and the influenza virus type or subtype is listed at the bottom.
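The clustering used for the heatmap (correlation distance, average linkage) can be illustrated with a naive agglomerative implementation over compound profiles. This is an illustrative sketch with invented intensity profiles; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `metric="correlation"` would normally be used, and a profile with zero variance would need special handling.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (each must vary)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)

def average_linkage(profiles):
    """Agglomerative clustering on correlation distance (1 - r) with average linkage.

    profiles: {compound: [intensity per sample]}. Returns merges as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    dist = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            dist[a, b] = dist[b, a] = 1.0 - pearson(profiles[a], profiles[b])
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster member pairs.
                pairs = [dist[a, b] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

profiles = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.5],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [8.0, 6.0, 4.0, 2.2],
}
for left, right, distance in average_linkage(profiles):
    print(left, right, round(distance, 3))
```

Correlation distance groups compounds that rise and fall together across samples regardless of their absolute intensity, which is why A and B (and likewise C and D) merge first even though their scales differ.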
  • FIG. 7 provides the LC/Q-TOF experimental workflow from sample collection to data analysis.
  • FIG. 8 provides area under the receiver operating characteristic (AUC) data with viral transport medium subtraction. This model subtracted the mean viral transport medium (VTM) data to assess the impact of the background matrix in the analysis. The estimates presented are similar to those without VTM subtraction.
  • AUC area under the receiver operating characteristic curve
  • VTM viral transport medium
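FIG. 8 describes subtracting the mean viral transport medium (VTM) signal to assess background-matrix effects. A minimal sketch of that background subtraction follows, assuming per-feature intensities stored as dictionaries, assuming a feature missing from a blank counts as zero in that blank, and assuming negative results are clipped to zero; none of these details is specified in the document, and the intensities are invented.

```python
def subtract_vtm_background(samples, vtm_blanks, clip_at_zero=True):
    """Subtract each feature's mean VTM-blank intensity from every sample.

    samples, vtm_blanks: lists of {feature: intensity} dicts.
    """
    features = {f for blank in vtm_blanks for f in blank}
    mean_background = {
        f: sum(blank.get(f, 0.0) for blank in vtm_blanks) / len(vtm_blanks)
        for f in features
    }
    corrected = []
    for sample in samples:
        row = {}
        for f, value in sample.items():
            adjusted = value - mean_background.get(f, 0.0)
            row[f] = max(adjusted, 0.0) if clip_at_zero else adjusted
        corrected.append(row)
    return corrected

blanks = [{"130.0507@0.81": 100.0}, {"130.0507@0.81": 300.0}]
samples = [{"130.0507@0.81": 1000.0, "106.0865@10.34": 50.0}]
print(subtract_vtm_background(samples, blanks))
```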
  • FIGs. 9A-9D provide pyroglutamic acid concentration by LC-MS/MS area under the curve analysis in influenza-positive vs. influenza-negative specimens in Laboratory 1. Results are shown for the overall classification of influenza-positive vs. influenza-negative (FIG. 9A, pyroglutamic acid; and FIG. 9C, in-source fragment ion of pyroglutamic acid) and for classification by influenza type and subtype (FIG. 9B, pyroglutamic acid; and FIG. 9D, in-source fragment ion of pyroglutamic acid). P-values were calculated by the Mann-Whitney U test.
  • FIGs. 10A-10D provide pyroglutamic acid concentration by LC-MS/MS standard curve analysis in influenza-positive vs. influenza-negative specimens in Laboratory 2. Results are shown for the overall classification of influenza-positive vs. influenza-negative (FIG. 10A, pyroglutamic acid; and FIG. 10C, in-source fragment ion of pyroglutamic acid) and for classification by influenza type and subtype (FIG. 10B, pyroglutamic acid; and FIG. 10D, in-source fragment ion of pyroglutamic acid). P-values were calculated by the Mann-Whitney U test.
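The P-values in FIGs. 9A-10D come from the Mann-Whitney U test. The sketch below computes U from average ranks (handling ties) and a two-sided P-value from the normal approximation with tie correction and no continuity correction. The document does not state whether exact or asymptotic P-values were used (SciPy's `scipy.stats.mannwhitneyu`, for example, can compute either), so the asymptotic choice here is an assumption, and the concentrations are invented.

```python
import math

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided P) using the tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p

positive = [5.1, 6.3, 7.2, 8.0, 9.4]  # invented concentrations, influenza-positive
negative = [1.2, 2.5, 3.1, 4.0, 4.8]  # invented concentrations, influenza-negative
print(mann_whitney_u(positive, negative))
```

Dividing U by n1 * n2 gives the same AUC as the pairwise comparison of scores, which is why the Mann-Whitney test pairs naturally with ROC analysis.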
  • FIG. 11A is a block diagram depicting an embodiment of a network environment comprising a client device in communication with a server device.
  • FIG. 11B is a block diagram depicting a cloud computing environment comprising a client device in communication with cloud service providers.
  • FIGs. 11C-11D are block diagrams depicting embodiments of computing devices useful in connection with the methods and systems described herein.
  • FIG. 12 illustrates a system including a computing device and a sample processing system according to various potential embodiments.
  • FIG. 13 shows a flowchart for an example process employing a machine learning approach according to various potential embodiments.
  • a cell includes a plurality of cells, including mixtures thereof.
  • comprising is intended to mean that the compounds, compositions and methods include the recited elements, but do not exclude others.
  • Consisting essentially of when used to define compounds, compositions and methods, shall mean excluding other elements of any essential significance to the combination. Thus, a composition consisting essentially of the elements as defined herein would not exclude trace contaminants, e.g., from the isolation and purification method and pharmaceutically acceptable carriers, preservatives, and the like. “Consisting of” shall mean excluding more than trace elements of other ingredients. Embodiments defined by each of these transition terms are within the scope of this technology.
  • deviations of 20 percent may be considered insubstantial deviations, while in certain embodiments, deviations of 15 percent may be considered insubstantial deviations, and in other embodiments, deviations of 10 percent may be considered insubstantial deviations, and in some embodiments, deviations of 5 percent may be considered insubstantial deviations.
  • deviations may be acceptable when they achieve the intended results or advantages, or are otherwise consistent with the spirit or nature of the embodiments.
  • substantially or “essentially” means nearly totally or completely, for instance, 95% or greater of some given quantity. In some embodiments, “substantially” or “essentially” means 95%, 96%, 97%, 98%, 99%, 99.5%, or 99.9%.
  • comparative terms as used herein can refer to certain variation from the reference.
  • such variation can refer to about 10%, or about 20%, or about 30%, or about 40%, or about 50%, or about 60%, or about 70%, or about 80%, or about 90%, or about 1 fold, or about 2 folds, or about 3 folds, or about 4 folds, or about 5 folds, or about 6 folds, or about 7 folds, or about 8 folds, or about 9 folds, or about 10 folds, or about 20 folds, or about 30 folds, or about 40 folds, or about 50 folds, or about 60 folds, or about 70 folds, or about 80 folds, or about 90 folds, or about 100 folds or more higher than the reference.
  • such variation can refer to about 1%, or about 2%, or about 3%, or about 4%, or about 5%, or about 6%, or about 7%, or about 8%, or about 9%, or about 10%, or about 20%, or about 30%, or about 40%, or about 50%, or about 60%, or about 70%, or about 75%, or about 80%, or about 85%, or about 90%, or about 95%, or about 96%, or about 97%, or about 98%, or about 99% of the reference.
  • the term “animal” refers to living multi-cellular vertebrate organisms, a category that includes, for example, mammals and birds.
  • the term “mammal” includes both human and non-human mammals.
  • a mammal is a human.
  • mammals include humans, non-human primates (e.g., apes, gibbons, chimpanzees, orangutans, monkeys, macaques, and the like), domestic animals (e.g., dogs and cats), farm animals (e.g., horses, cows, goats, sheep, pigs) and experimental animals (e.g., mouse, rat, rabbit, guinea pig).
  • a mammal can be any age or at any stage of development (e.g., an adult, teen, child, infant, or a mammal in utero).
  • a mammal can be male or female.
  • a subject is a human.
  • a subject is suspected of having a medical condition.
  • the subject may be asymptomatic.
  • the subject may be symptomatic, i.e., showing a symptom of the medical condition.
  • the subject is immune-compromised. Additionally or alternatively, the subject has been treated with a therapy neutralizing the pathogen causing the medical condition.
  • composition refers to an active agent, such as a compound as disclosed herein and a carrier, inert or active.
  • the carrier can be, without limitation, solid such as a bead or resin, or liquid, such as phosphate buffered saline.
  • an “effective amount” is an amount sufficient to effect beneficial or desired results.
  • An effective amount can be administered in one or more administrations, applications or dosages. Such delivery is dependent on a number of variables including the time period for which the individual dosage unit is to be used, the bioavailability of the therapeutic agent, the route of administration, etc. It is understood, however, that specific dose levels of the therapeutic agents disclosed herein for any particular subject depends upon a variety of factors including the activity of the specific compound employed, bioavailability of the compound, the route of administration, the age of the animal and its body weight, general health, sex, the diet of the animal, the time of administration, the rate of excretion, the drug combination, and the severity of the particular disorder being treated and form of administration.
  • “Therapeutically effective amount” of a drug or an agent refers to an amount of the drug or the agent that is an amount sufficient to obtain a pharmacological response; or alternatively, is an amount of the drug or agent that, when administered to a patient with a specified disorder or disease, is sufficient to have the intended effect, e.g., treatment, alleviation, amelioration, palliation or elimination of one or more manifestations of the specified disorder or disease in the patient.
  • a therapeutic effect does not necessarily occur by administration of one dose, and may occur only after administration of a series of doses. Thus, a therapeutically effective amount may be administered in one or more administrations.
  • treating or “treatment” of a disease in a subject refers to (1) preventing the symptoms or disease from occurring in a subject that is predisposed or does not yet display symptoms of the disease; (2) inhibiting the disease or arresting its development; or (3) ameliorating or causing regression of the disease or the symptoms of the disease.
  • treatment is an approach for obtaining beneficial or desired results, including clinical results.
  • beneficial or desired results can include, but are not limited to, one or more of: alleviation or amelioration of one or more symptoms, diminishment of extent of a condition (including a disease), stabilized (i.e., not worsening) state of a condition (including disease), delay or slowing of condition (including disease) progression, amelioration or palliation of the condition (including disease) state, and remission (whether partial or total), whether detectable or undetectable.
  • the disease is cancer
  • the following clinical end points are non-limiting examples of treatment: reduction in tumor burden, slowing of tumor growth, longer overall survival, longer time to tumor progression, inhibition of metastasis or a reduction in metastasis of the tumor.
  • treatment excludes prophylaxis.
  • a biological sample is obtained from a subject.
  • exemplary samples include, but are not limited to, cell sample, tissue sample, tumor biopsy, liquid samples such as blood and other liquid samples of biological origin, including, but not limited to, ocular fluids (aqueous and vitreous humor), peripheral blood, sera, plasma, ascites, urine, cerebrospinal fluid (CSF), sputum, saliva, bone marrow, synovial fluid, aqueous humor, amniotic fluid, cerumen, breast milk, bronchoalveolar lavage fluid, semen, prostatic fluid, Cowper’s fluid or pre-ejaculatory fluid, female ejaculate, sweat, tears, cyst fluid, pleural and peritoneal fluid, pericardial fluid, ascites, lymph, chyme, chyle, bile, interstitial fluid, menses, pus, sebum, vomit, vaginal secretions/flushing, synovial fluid, mucosal
  • contacting means direct or indirect binding or interaction between two or more.
  • a particular example of direct interaction is binding.
  • a particular example of an indirect interaction is where one entity acts upon an intermediary molecule, which in turn acts upon the second referenced entity.
  • Contacting as used herein includes in solution, in solid phase, in vitro, ex vivo, in a cell and in vivo. Contacting in vivo can be referred to as administering, or administration.
  • extracting is also used herein to refer to polynucleotides, polypeptides, proteins, metabolites, cells, tissues, or any combination thereof (such as a biological sample) that are isolated from other polynucleotides, polypeptides, proteins, metabolites, cells, tissues, or any combination thereof with which they are normally associated in nature.
  • extracting a biological sample from a subject may refer to obtaining polynucleotides, polypeptides, proteins, metabolites, cells, tissues, or any combination thereof from the subject and having the obtained biological materials ex vivo or in vitro.
  • a “cancer” is a disease state characterized by the presence in a subject of cells demonstrating abnormal uncontrolled replication and in some aspects, the term may be used interchangeably with the term “tumor.”
  • a metabolite refers to an intermediate of metabolism, or end product of metabolism, or any substance involved in metabolism.
  • a metabolite refers to a small molecule, which is an organic compound having a low molecular weight, such as lower than 900 daltons, or size on the order of 1 nm.
  • a metabolite can be measured by a test as disclosed herein.
  • MS mass spectrometry
  • MS refers to an analytical technique to identify compounds by their mass.
  • MS refers to methods of filtering, detecting, and measuring ions based on their mass-to-charge ratio, or "m/z”.
  • MS technology generally includes (1) ionizing the compounds to form charged compounds; and (2) detecting the molecular weight of the charged compounds and calculating a mass-to-charge ratio. The compounds may be ionized and detected by any suitable means.
  • a “mass spectrometer” generally includes an ionizer and an ion detector.
  • one or more molecules of interest are ionized, and the ions are subsequently introduced into a mass spectrographic instrument where, due to a combination of magnetic and electric fields, the ions follow a path in space that is dependent upon mass (“m”) and charge (“z").
  • mass m
  • z charge
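The relation between neutral mass, charge, and observed m/z can be made concrete with a short calculation. The sketch below assumes positive-mode protonated ions ([M+zH]z+), which the document does not specify for every feature, and uses standard monoisotopic element masses and an approximate proton mass. Pyroglutamic acid (C5H7NO3), named earlier in this document, serves as the example; its calculated [M+H]+ m/z is about 130.0499.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass (approximate).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688

def monoisotopic_mass(formula):
    """Neutral monoisotopic mass from an element-count dict, e.g. {"C": 5, "H": 7, "N": 1, "O": 3}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())

def mz_protonated(neutral_mass, z=1):
    """m/z of an [M + zH]^z+ ion: add z protons, then divide by the charge z."""
    return (neutral_mass + z * PROTON) / z

def ppm_error(observed_mz, theoretical_mz):
    """Mass accuracy in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

pyroglutamic_acid = {"C": 5, "H": 7, "N": 1, "O": 3}
print(round(mz_protonated(monoisotopic_mass(pyroglutamic_acid)), 4))  # prints 130.0499
```

A ppm error, rather than an absolute difference, is the usual way to express "an equivalent" m/z, because instrument accuracy scales with mass.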
  • Retention time is a measure of the time taken for a solute to pass through a chromatography column. It is calculated as the time from injection to detection.
  • the RT for a compound is not fixed, as many factors can influence it even if the same gas chromatography (GC) system and column are used. These include, but are not limited to, gas flow rate, temperature differences in the oven and column, column degradation, and column length. These factors can make it difficult to compare retention times.
  • an RT as used herein is determined by a qualitative analysis. Qualitative analysis relies on comparing the retention times of the peaks in an unknown sample with those of known standards. If the retention time of a peak in the unknown sample is the same as that of the standard, a positive identification can be made.
  • an RT as used herein is a relative RT.
  • the use of the relative retention time (RRT) reduces the effects of some of the variables that can affect the retention time.
  • RRT is an expression of a sample’s retention time relative to the standard’s retention time.
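  • a sample matrix is made up by mixing the sample with an internal standard (IS), and the following calculation can then be performed: $\mathrm{RRT} = t_R(\text{analyte}) / t_R(\text{IS})$, where both retention times are measured in the same chromatographic run.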
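Because the analyte and the internal standard experience the same run conditions, their retention times tend to drift together, so their ratio is more stable than either absolute value. A minimal sketch follows; the internal-standard retention times and the 2% relative matching tolerance are hypothetical, and the analyte RT of 8.36 is borrowed from the metabolite list purely as a number.

```python
def relative_retention_time(rt_analyte, rt_internal_standard):
    """RRT = RT(analyte) / RT(internal standard), both from the same chromatographic run."""
    if rt_internal_standard <= 0:
        raise ValueError("internal standard retention time must be positive")
    return rt_analyte / rt_internal_standard

def rrt_match(rrt_sample, rrt_reference, rel_tol=0.02):
    """Qualitative identification: RRTs agree within a hypothetical relative tolerance."""
    return abs(rrt_sample - rrt_reference) <= rel_tol * rrt_reference

# Hypothetical runs: absolute RTs drift by about 0.15 min, but the ratio barely moves.
reference = relative_retention_time(8.36, 8.00)
drifted = relative_retention_time(8.52, 8.15)
print(round(reference, 4), round(drifted, 4), rrt_match(drifted, reference))
```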
  • a decision tree is a tree-like model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility, displaying an algorithm that only contains conditional control statements.
  • Ensemble methods combine several decision trees to produce better predictive performance than utilizing a single decision tree. Ensembled decision trees may be bagged or boosted.
  • Bagging Bootstrap Aggregation
  • bagging a decision tree, for example, means creating several subsets of data from the training sample, chosen randomly with replacement, using each subset to train a decision tree, and thereby ending up with an ensemble of different models. The average of all the predictions from the different trees is used, which is more robust than a single decision tree.
  • One non-limiting example of bagged decision trees is random forest, which takes one extra step by using a random selection of features, rather than all features, to grow trees.
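The two ideas named above, bootstrap sampling of the training rows and random selection of features, can be illustrated with depth-1 trees (stumps) as base learners. This is a simplified stdlib-only sketch, not a full random forest: the stump learner, tree count, seed, and data are illustrative assumptions, and libraries such as scikit-learn's `RandomForestClassifier` grow deeper trees and draw random features at every split.

```python
import math
import random

def fit_stump(X, y, feature):
    """Best threshold on one feature. Returns (errors, (feature, threshold, p_left, p_right)),
    where p_left / p_right are the fractions of positive labels on each side."""
    values = sorted(set(row[feature] for row in X))
    best = None
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [label for row, label in zip(X, y) if row[feature] < t]
        right = [label for row, label in zip(X, y) if row[feature] >= t]
        # Misclassifications if each side predicts its majority class.
        err = min(sum(left), len(left) - sum(left)) + min(sum(right), len(right) - sum(right))
        if best is None or err < best[0]:
            best = (err, (feature, t, sum(left) / len(left), sum(right) / len(right)))
    if best is None:  # feature is constant in this sample: predict the overall positive rate
        p = sum(y) / len(y)
        best = (min(sum(y), len(y) - sum(y)), (feature, math.inf, p, p))
    return best

def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    k = max(1, int(math.sqrt(p)))  # features considered per tree, as in a random forest
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows drawn with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        candidates = rng.sample(range(p), k)
        _, stump = min((fit_stump(Xb, yb, f) for f in candidates), key=lambda r: r[0])
        ensemble.append(stump)
    return ensemble

def predict_proba(ensemble, row):
    """Bagging step: average the per-stump probabilities of the positive class."""
    probs = [p_left if row[f] < t else p_right for f, t, p_left, p_right in ensemble]
    return sum(probs) / len(probs)

X = [[1.0, 10.0], [2.0, 11.0], [1.5, 12.0], [8.0, 1.0], [9.0, 2.0], [8.5, 3.0]]
y = [1, 1, 1, 0, 0, 0]
model = fit_bagged_stumps(X, y)
print(predict_proba(model, [1.2, 11.0]), predict_proba(model, [8.8, 2.5]))
```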
  • Boosting is another ensemble technique to create a collection of predictors.
  • learners are learned sequentially with early learners fitting simple models to the data and then analyzing data for errors.
  • Consecutive trees are each fit on a random sample of the data.
  • at each step, the goal is to solve for the net error from the prior tree.
  • Gradient Boosting is an extension of the boosting method that uses a gradient descent algorithm, which can optimize any differentiable loss function.
  • An ensemble of trees is built one by one, and the individual trees are summed sequentially. Each subsequent tree tries to recover the loss (the difference between actual and predicted values).
  • Non-limiting examples of gradient boosting include Light Gradient Boosting Machine (LightGBM), XGBoost, or Adaptive Boosting (AdaBoost).
  • LightGBM Light Gradient Boosting Machine
  • XGBoost eXtreme Gradient Boosting
  • AdaBoost Adaptive Boosting
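The residual-fitting loop behind gradient boosting can be sketched as follows for squared-error loss, where the negative gradient of the loss with respect to the current prediction is simply the residual (actual minus predicted value). This is an illustrative, single-feature sketch with hypothetical function names and default settings (50 rounds, learning rate 0.1); it is not how LightGBM, XGBoost, or AdaBoost are implemented, and AdaBoost in particular reweights training samples rather than fitting residuals.

```python
from typing import Callable, List


def fit_regression_stump(xs: List[float], targets: List[float]) -> Callable[[float], float]:
    """Fit a one-split regression tree on a single feature by minimising squared error."""
    best = None  # (sum of squared errors, threshold, left mean, right mean)
    for threshold in sorted(set(xs)):
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else left_mean
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs: List[float], ys: List[float], n_rounds: int = 50,
                   learning_rate: float = 0.1) -> Callable[[float], float]:
    """Gradient boosting with squared loss: each new tree is fit to the current residuals."""
    base = sum(ys) / len(ys)  # F_0: the constant that minimises squared loss
    trees = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        # For L = (y - F)^2 / 2, the negative gradient -dL/dF equals the residual y - F.
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        # Add the new tree's (shrunken) output to the running sum of trees.
        predictions = [p + learning_rate * tree(x) for x, p in zip(xs, predictions)]

    def predict(x: float) -> float:
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict
```

On a toy step function (inputs 1 to 6, targets 1, 1, 1, 5, 5, 5), each round shrinks the remaining residuals by a factor of 1 - learning_rate = 0.9, so after 50 rounds the training predictions are within about 0.01 of the targets.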
  • SHAP SHapley Additive exPlanations
  • SHAP is a method to explain individual predictions.
  • SHAP is based on the game theoretically optimal Shapley Values.
  • the goal of SHAP is to explain the prediction of an instance x by computing the contribution of each feature to the prediction.
  • the SHAP explanation method computes Shapley values from coalitional game theory.
  • the feature values of a data instance act as players in a coalition.
  • a player can be an individual feature value, e.g. for tabular data.
  • a player can also be a group of feature values. For example to explain an image, pixels can be grouped to super pixels and the prediction distributed among them.
  • SHAP represents the Shapley value explanation as an additive feature attribution method, i.e., a linear model. That view connects LIME and Shapley values.
  • SHAP specifies the explanation as $g(z') = \phi_0 + \sum_{j=1}^{M} \phi_j z'_j$, where $g$ is the explanation model, $z' \in \{0, 1\}^M$ is the coalition vector, $M$ is the maximum coalition size, $\phi_0$ is the base value (the expected model output), and $\phi_j \in \mathbb{R}$ is the feature attribution for a feature $j$, i.e., the Shapley values.
  • in the coalition vector, an entry of 1 means that the corresponding feature value is "present" and 0 that it is "absent".
  • To compute Shapley values, it is simulated that only some feature values are playing ("present") and some are not ("absent"). More details are available at shap.readthedocs.io/en/latest/.
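The Shapley values that SHAP estimates can be computed exactly for a small number of features by enumerating every coalition, weighting each marginal contribution $v(S \cup \{j\}) - v(S)$ by $|S|!\,(M-|S|-1)!/M!$. The sketch below assumes that an "absent" feature is replaced by a single baseline value; the shap library instead typically averages over a background dataset and uses approximations, because exact enumeration needs on the order of M·2^M model evaluations. The function name `shapley_values` is hypothetical and is not part of the shap API.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

Model = Callable[[List[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values phi_1..phi_M of model(x) relative to model(baseline)."""
    M = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition are "present" (take x); the rest are "absent" (take baseline).
        z = [x[j] if j in coalition else baseline[j] for j in range(M)]
        return model(z)

    phi = []
    for j in range(M):
        others = [k for k in range(M) if k != j]
        total = 0.0
        for size in range(M):  # coalition sizes 0..M-1 that do not contain j
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))  # marginal contribution of j
        phi.append(total)
    return phi


if __name__ == "__main__":
    def linear(z: List[float]) -> float:
        return 2 * z[0] + 3 * z[1] - z[2]

    x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
    phi = shapley_values(linear, x, base)
    # Additivity: phi_0 (= model(baseline) here) plus the attributions recovers model(x).
    print(phi, linear(base) + sum(phi), linear(x))
```

For a linear model such as f(z) = 2z_0 + 3z_1 - z_2 explained at x = (1, 1, 1) against a zero baseline, the attributions are (2, 3, -1), and f(baseline) plus their sum equals f(x), which is the additivity property of the explanation model g above.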
  • Metabolomics, or the large-scale study of small molecules, represents a change in paradigm from routine clinical virology diagnostics, as it detects the host metabolic response rather than directly detecting the pathogen (Sinem Nalbantoglu (August 7th 2019). Metabolomics: Basic Principles and Strategies, Molecular Medicine, Sinem Nalbantoglu and Hakima Amri, IntechOpen). Metabolomics theoretically holds promise for infectious disease applications, as it can be performed directly on patient specimens using a minimal sample volume, is inexpensive to run, provides a real-time assessment of the host response, and may accurately differentiate active infection from colonization (Pacchiarotta et al. Bioanalysis 4, 919-925 (2012); and Zurfluh et al. Expert Rev Anti Infect Ther 16, 133-142 (2016)).
  • VTM viral transport medium
  • Applicant used this LC/Q-TOF method to generate data to develop and validate machine learning (ML) algorithms for classification of influenza infection status, and an interpretation method for biomarker discovery (FIG. 2B).
  • the developed top-20 biomarker signature was then adapted to testing on simpler, targeted triple quadrupole mass spectrometry instruments (LC/MS-MS; referred to as tandem mass spectrometry) in two distinct laboratories for validation on upper respiratory tract specimens.
  • LC/MS-MS triple quadrupole mass spectrometry instruments
  • the metabolomic method of this disclosure presents multiple novel aspects that have the potential to fill the diagnostic gap and significantly improve the way infectious diseases such as influenza or SARS-CoV-2 infection/COVID-19 are diagnosed and monitored.
  • metabolic signature discovery is based on a novel in-line, two-column metabolomics method that enables testing to be performed in a single run. This approach reduces turnaround time and increases precision compared to current standard of care in metabolomics, where testing must be performed separately for polar and non-polar compounds.
  • the method shows promise to improve the way SARS-CoV-2 diagnostics are performed by directly characterizing the host metabolic response to infection in a non-invasive manner that optimizes sensitivity.
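The disclosure mentions an interpretation method for biomarker discovery and a top-20 biomarker signature but does not, in this section, spell out how the top 20 features were chosen. One common approach, shown here purely as an assumption, is to rank features by their mean absolute SHAP value across samples and keep the k highest. The function name, the input layout (one row of SHAP values per sample, one column per metabolite feature) and the feature names are hypothetical.

```python
from typing import List


def top_k_features(shap_matrix: List[List[float]], feature_names: List[str], k: int = 20) -> List[str]:
    """Rank features by mean |SHAP value| over samples and return the k most important."""
    n_samples = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_samples
        for j, name in enumerate(feature_names)
    }
    # sorted() is stable even with reverse=True, so ties keep their original column order.
    return sorted(feature_names, key=lambda name: importance[name], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical SHAP values for two specimens and three metabolite features.
    shap_matrix = [[0.5, -2.0, 0.1], [-0.5, 1.0, 0.0]]
    names = ["metabolite_1", "metabolite_2", "metabolite_3"]
    print(top_k_features(shap_matrix, names, k=2))
```

Taking the absolute value before averaging keeps features that push some predictions up and others down from cancelling out, so a metabolite that matters strongly in either direction still ranks highly.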
  • Referring to FIG. 11A, an embodiment of a network environment is depicted.
  • the network environment includes one or more clients 102a-102n (also generally referred to as local machine(s) 102, client(s) 102, client node(s) 102, client machine(s) 102, client computer(s) 102, client device(s) 102, endpoint(s) 102, or endpoint node(s) 102) in communication with one or more servers 106a-106n (also generally referred to as server(s) 106, node 106, or remote machine(s) 106) via one or more networks 104.
  • a client 102 has the capacity to function as both a client node seeking access to resources provided by a server and as a server providing access to hosted resources for other clients 102a-102n.
  • FIG. 11A shows a network 104 between the clients 102 and the servers 106.
  • the clients 102 and the servers 106 may be on the same network 104.
  • a network 104’ (not shown) may be a private network and a network 104 may be a public network.
  • a network 104 may be a private network and a network 104’ a public network.
  • networks 104 and 104’ may both be private networks.
  • the network 104 may be connected via wired or wireless links.
  • Wired links may include Digital Subscriber Line (DSL), coaxial cable lines, or optical fiber lines.
  • the wireless links may include BLUETOOTH, Wi-Fi, Worldwide Interoperability for Microwave Access (WiMAX), an infrared channel or satellite band.
  • the wireless links may also include any cellular network standards used to communicate among mobile devices, including standards that qualify as 1G, 2G, 3G, 4G, or 5G.
  • the network standards may qualify as one or more generations of mobile telecommunication standards by fulfilling a specification or standards, such as the specifications maintained by the International Telecommunication Union.
  • the 3G standards may correspond to the International Mobile Telecommunications-2000 (IMT-2000) specification, and the 4G standards may correspond to the International Mobile Telecommunications Advanced (IMT-Advanced) specification.
  • Examples of cellular network standards include AMPS, GSM, GPRS, UMTS, LTE, LTE Advanced, Mobile WiMAX, and WiMAX-Advanced.
  • Cellular network standards may use various channel access methods e.g. FDMA, TDMA, CDMA, or SDMA.
  • different types of data may be transmitted via different links and standards.
  • the same types of data may be transmitted via different links and standards.
  • the network 104 may be any type and/or form of network.
  • the geographical scope of the network 104 may vary widely and the network 104 can be a body area network (BAN), a personal area network (PAN), a local-area network (LAN), e.g. Intranet, a metropolitan area network (MAN), a wide area network (WAN), or the Internet.
  • the topology of the network 104 may be of any form and may include, e.g., any of the following: point-to-point, bus, star, ring, mesh, or tree.
  • the network 104 may be an overlay network which is virtual and sits on top of one or more layers of other networks 104’.
  • the network 104 may be of any such network topology as known to those ordinarily skilled in the art capable of supporting the operations described herein.
  • the network 104 may utilize different techniques and layers or stacks of protocols, including, e.g., the Ethernet protocol, the internet protocol suite (TCP/IP), the ATM (Asynchronous Transfer Mode) technique, the SONET (Synchronous Optical Networking) protocol, or the SDH (Synchronous Digital Hierarchy) protocol.
  • the TCP/IP internet protocol suite may include application layer, transport layer, internet layer (including, e.g., IPv6), or the link layer.
  • the network 104 may be a type of a broadcast network, a telecommunications network, a data communication network, or a computer network.
  • the system may include multiple, logically-grouped servers 106.
  • the logical group of servers may be referred to as a server farm 38 or a machine farm 38.
  • the servers 106 may be geographically dispersed.
  • a machine farm 38 may be administered as a single entity.
  • the machine farm 38 includes a plurality of machine farms 38.
  • the servers 106 within each machine farm 38 can be heterogeneous - one or more of the servers 106 or machines 106 can operate according to one type of operating system platform (e.g., WINDOWS NT, manufactured by Microsoft Corp. of Redmond, Washington), while one or more of the other servers 106 can operate according to another type of operating system platform (e.g., Unix, Linux, or Mac OS X).
  • servers 106 in the machine farm 38 may be stored in high-density rack systems, along with associated storage systems, and located in an enterprise data center. In this embodiment, consolidating the servers 106 in this way may improve system manageability, data security, the physical security of the system, and system performance by locating servers 106 and high performance storage systems on localized high performance networks. Centralizing the servers 106 and storage systems and coupling them with advanced system management tools allows more efficient use of server resources.
  • the servers 106 of each machine farm 38 do not need to be physically proximate to another server 106 in the same machine farm 38.
  • the group of servers 106 logically grouped as a machine farm 38 may be interconnected using a wide-area network (WAN) connection or a metropolitan-area network (MAN) connection.
  • WAN wide-area network
  • MAN metropolitan-area network
  • a machine farm 38 may include servers 106 physically located in different continents or different regions of a continent, country, state, city, campus, or room. Data transmission speeds between servers 106 in the machine farm 38 can be increased if the servers 106 are connected using a local-area network (LAN) connection or some form of direct connection.
  • LAN local-area network
  • a heterogeneous machine farm 38 may include one or more servers 106 operating according to a type of operating system, while one or more other servers 106 execute one or more types of hypervisors rather than operating systems.
  • hypervisors may be used to emulate virtual hardware, partition physical hardware, virtualize physical hardware, and execute virtual machines that provide access to computing environments, allowing multiple operating systems to run concurrently on a host computer.
  • Native hypervisors may run directly on the host computer.
  • Hypervisors may include VMware ESX/ESXi, manufactured by VMware, Inc., of Palo Alto, California; the Xen hypervisor, an open source product whose development is overseen by Citrix Systems, Inc.; the HYPER-V hypervisors provided by Microsoft; or others.
  • Hosted hypervisors may run within an operating system on a second software level. Examples of hosted hypervisors may include VMware Workstation and VIRTUALBOX.
  • Management of the machine farm 38 may be de-centralized.
  • one or more servers 106 may comprise components, subsystems and modules to support one or more management services for the machine farm 38.
  • one or more servers 106 provide functionality for management of dynamic data, including techniques for handling failover, data replication, and increasing the robustness of the machine farm 38.
  • Each server 106 may communicate with a persistent store and, in some embodiments, with a dynamic store.
  • Server 106 may be a file server, application server, web server, proxy server, appliance, network appliance, gateway, gateway server, virtualization server, deployment server, SSL VPN server, or firewall.
  • the server 106 may be referred to as a remote machine or a node.
  • a plurality of nodes 290 may be in the path between any two communicating servers.
  • a cloud computing environment may provide client 102 with one or more resources provided by a network environment.
  • the cloud computing environment may include one or more clients 102a-102n, in communication with the cloud 108 over one or more networks 104.
  • Clients 102 may include, e.g., thick clients, thin clients, and zero clients.
  • a thick client may provide at least some functionality even when disconnected from the cloud 108 or servers 106.
  • a thin client or a zero client may depend on the connection to the cloud 108 or server 106 to provide functionality.
  • a zero client may depend on the cloud 108 or other networks 104 or servers 106 to retrieve operating system data for the client device.
  • the cloud 108 may include back end platforms, e.g., servers 106, storage, server farms or data centers.
  • the cloud 108 may be public, private, or hybrid.
  • Public clouds may include public servers 106 that are maintained by third parties to the clients 102 or the owners of the clients.
  • the servers 106 may be located off-site in remote geographical locations as disclosed above or otherwise.
  • Public clouds may be connected to the servers 106 over a public network.
  • Private clouds may include private servers 106 that are physically maintained by clients 102 or owners of clients.
  • Private clouds may be connected to the servers 106 over a private network 104.
  • Hybrid clouds 108 may include both the private and public networks 104 and servers 106.
  • the cloud 108 may also include a cloud based delivery, e.g. Software as a Service (SaaS) 110, Platform as a Service (PaaS) 112, and Infrastructure as a Service (IaaS) 114.
  • SaaS Software as a Service
  • PaaS Platform as a Service
  • IaaS Infrastructure as a Service
  • IaaS may refer to a user renting the use of infrastructure resources that are needed during a specified time period.
  • IaaS providers may offer storage, networking, servers or virtualization resources from large pools, allowing the users to quickly scale up by accessing more resources as needed.
  • IaaS can include infrastructure and services (e.g., EG-32) provided by OVH HOSTING of Montreal, Quebec, Canada, AMAZON WEB SERVICES provided by Amazon.com, Inc., of Seattle, Washington, RACKSPACE CLOUD provided by Rackspace US, Inc., of San Antonio, Texas, Google Compute Engine provided by Google Inc. of Mountain View, California, or RIGHTSCALE provided by RightScale, Inc., of Santa Barbara, California.
  • PaaS providers may offer functionality provided by IaaS, including, e.g., storage, networking, servers or virtualization, as well as additional resources such as, e.g., the operating system, middleware, or runtime resources.
  • PaaS examples include WINDOWS AZURE provided by Microsoft Corporation of Redmond, Washington, Google App Engine provided by Google Inc., and HEROKU provided by Heroku, Inc. of San Francisco, California.
  • SaaS providers may offer the resources that PaaS provides, including storage, networking, servers, virtualization, operating system, middleware, or runtime resources. In some embodiments, SaaS providers may offer additional resources including, e.g., data and application resources. Examples of SaaS include GOOGLE APPS provided by Google Inc., SALESFORCE provided by Salesforce.com Inc. of San Francisco, California, or OFFICE 365 provided by Microsoft Corporation. Examples of SaaS may also include data storage providers, e.g. DROPBOX provided by Dropbox, Inc. of San Francisco, California, Microsoft SKYDRIVE provided by Microsoft Corporation, Google Drive provided by Google Inc., or Apple ICLOUD provided by Apple Inc. of Cupertino, California.
  • Clients 102 may access IaaS resources with one or more IaaS standards, including, e.g., Amazon Elastic Compute Cloud (EC2), Open Cloud Computing Interface (OCCI), Cloud Infrastructure Management Interface (CIMI), or OpenStack standards.
  • IaaS standards may allow clients access to resources over HTTP, and may use Representational State Transfer (REST) protocol or Simple Object Access Protocol (SOAP).
  • REST Representational State Transfer
  • SOAP Simple Object Access Protocol
  • Clients 102 may access PaaS resources with different PaaS interfaces.
  • Some PaaS interfaces use HTTP packages, standard Java APIs, JavaMail API, Java Data Objects (JDO), Java Persistence API (JPA), Python APIs, web integration APIs for different programming languages including, e.g., Rack for Ruby, WSGI for Python, or PSGI for Perl, or other APIs that may be built on REST, HTTP, XML, or other protocols.
  • Clients 102 may access SaaS resources through the use of web-based user interfaces, provided by a web browser (e.g., GOOGLE CHROME, Microsoft INTERNET EXPLORER, or Mozilla Firefox provided by Mozilla Foundation of Mountain View, California).
  • Clients 102 may also access SaaS resources through smartphone or tablet applications, including, e.g., Salesforce Sales Cloud, or Google Drive app.
  • Clients 102 may also access SaaS resources through the client operating system, including, e.g., Windows file system for DROPBOX.
  • access to IaaS, PaaS, or SaaS resources may be authenticated.
  • a server or authentication server may authenticate a user via security certificates, HTTPS, or API keys.
  • API keys may include various encryption standards such as, e.g., Advanced Encryption Standard (AES).
  • Data resources may be sent over Transport Layer Security (TLS) or Secure Sockets Layer (SSL).
  • TLS Transport Layer Security
  • SSL Secure Sockets Layer
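Server-side API-key authentication, mentioned above alongside HTTPS and TLS, can be sketched with Python's standard library. This is an illustrative assumption rather than a scheme specified in this disclosure: keys are generated with `secrets.token_urlsafe`, only their SHA-256 digests are stored, and a presented key is checked with `hmac.compare_digest`, which compares in constant time so that response timing does not reveal how much of a guess was correct. The function names are hypothetical, and the key itself should only travel over a TLS-protected connection.

```python
import hashlib
import hmac
import secrets
from typing import Iterable


def generate_api_key() -> str:
    """Create a random, URL-safe API key from 32 random bytes (256 bits of entropy)."""
    return secrets.token_urlsafe(32)


def hash_api_key(api_key: str) -> str:
    """Store only a digest of the key, never the key itself."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def is_authorized(presented_key: str, stored_hashes: Iterable[str]) -> bool:
    """Check a presented key against stored digests using constant-time comparison."""
    presented = hash_api_key(presented_key)
    return any(hmac.compare_digest(presented, stored) for stored in stored_hashes)


if __name__ == "__main__":
    key = generate_api_key()          # handed to the client once
    stored = [hash_api_key(key)]      # what the authentication server keeps
    print(is_authorized(key, stored), is_authorized("not-the-key", stored))
```

Storing digests instead of raw keys means a leaked key table does not directly expose usable credentials; because the keys are long and random, a fast hash such as SHA-256 is generally considered adequate here, unlike for human-chosen passwords.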
  • the client 102 and server 106 may be deployed as and/or executed on any type and form of computing device, e.g. a computer, network device or appliance capable of communicating on any type and form of network and performing the operations described herein.
  • FIGs. 11C and 11D depict block diagrams of a computing device 100 useful for practicing an embodiment of the client 102 or a server 106. As shown in FIGs. 11C and 11D, each computing device 100 includes a central processing unit 121 and a main memory unit 122.
  • a computing device 100 may include a storage device 128, an installation device 116, a network interface 118, an I/O controller 123, display devices 124a-124n, a keyboard 126 and a pointing device 127, e.g. a mouse.
  • the storage device 128 may include, without limitation, an operating system, software, and software of a genomic data processing system 120.
  • each computing device 100 may also include additional optional elements, e.g. a memory port 103, a bridge 170, one or more input/output devices 130a-130n (generally referred to using reference numeral 130), and a cache memory 140 in communication with the central processing unit 121.
  • the central processing unit 121 is any logic circuitry that responds to and processes instructions fetched from the main memory unit 122.
  • the central processing unit 121 is provided by a microprocessor unit, e.g.: those manufactured by Intel Corporation of Santa Clara, California; those manufactured by Motorola Corporation of Schaumburg, Illinois; the ARM processor and TEGRA system on a chip (SoC) manufactured by Nvidia of Santa Clara, California; the POWER7 processor, those manufactured by International Business Machines of Armonk, New York; or those manufactured by Advanced Micro Devices of Sunnyvale, California.
  • the computing device 100 may be based on any of these processors, or any other processor capable of operating as described herein.
  • the central processing unit 121 may utilize instruction level parallelism, thread level parallelism, different levels of cache, and multi-core processors.
  • a multi-core processor may include two or more processing units on a single computing component. Examples of multi-core processors include the AMD PHENOM II X2, INTEL CORE i5 and INTEL CORE i7.
  • Main memory unit or memory device 122 may include one or more memory chips capable of storing data and allowing any storage location to be directly accessed by the microprocessor 121.
  • Main memory unit or device 122 may be volatile and faster than storage 128 memory.
  • Main memory units or devices 122 may be Dynamic random access memory (DRAM), static random access memory (SRAM), or any of their variants, including Burst SRAM or SynchBurst SRAM (BSRAM), Fast Page Mode DRAM (FPM DRAM), Enhanced DRAM (EDRAM), and Extended Data Output DRAM (EDO DRAM).
  • DRAM Dynamic random access memory
  • SRAM static random access memory
  • BSRAM SynchBurst SRAM
  • FPM DRAM Fast Page Mode DRAM
  • EDRAM Enhanced DRAM
  • EDO DRAM Extended Data Output DRAM
  • the main memory 122 or the storage 128 may be non-volatile, e.g., non-volatile random access memory (NVRAM), flash memory, non-volatile static RAM (nvSRAM), Ferroelectric RAM (FeRAM), Magnetoresistive RAM (MRAM), Phase-change memory (PRAM), conductive-bridging RAM (CBRAM), Silicon-Oxide-Nitride-Oxide-Silicon (SONOS), Resistive RAM (RRAM), Racetrack, Nano-RAM (NRAM), or Millipede memory.
  • NVRAM non-volatile random access memory
  • nvSRAM non-volatile static RAM
  • FeRAM Ferroelectric RAM
  • MRAM Magnetoresistive RAM
  • PRAM Phase-change memory
  • CBRAM conductive-bridging RAM
  • SONOS Silicon-Oxide-Nitride-Oxide-Silicon
  • RRAM Resistive RAM
  • the main memory 122 may be based on any of the above described memory chips, or any other available memory chips capable of operating as described herein.
  • the processor 121 communicates with main memory 122 via a system bus 150 (described in more detail below).
  • FIG. 11D depicts an embodiment of a computing device 100 in which the processor communicates directly with main memory 122 via a memory port 103.
  • the main memory 122 may be Direct Rambus DRAM (DRDRAM).
  • FIG. 11D depicts an embodiment in which the main processor 121 communicates directly with cache memory 140 via a secondary bus, sometimes referred to as a backside bus.
  • the main processor 121 communicates with cache memory 140 using the system bus 150.
  • Cache memory 140 typically has a faster response time than main memory 122 and is typically provided by SRAM, BSRAM, or EDRAM.
  • the processor 121 communicates with various I/O devices 130 via a local system bus 150.
  • Various buses may be used to connect the central processing unit 121 to any of the I/O devices 130, including a PCI bus, a PCI-X bus, a PCI-Express bus, or a NuBus.
  • the processor 121 may use an Advanced Graphics Port (AGP) to communicate with the display 124 or the I/O controller 123 for the display 124.
  • AGP Advanced Graphics Port
  • FIG. 11D depicts an embodiment of a computer 100 in which the main processor 121 communicates directly with I/O device 130b or other processors 121’ via HYPERTRANSPORT, RAPIDIO, or INFINIBAND communications technology.
  • FIG. 11D also depicts an embodiment in which local busses and direct communication are mixed: the processor 121 communicates with I/O device 130a using a local interconnect bus while communicating with I/O device 130b directly.
  • Input devices may include keyboards, mice, trackpads, trackballs, touchpads, touch mice, multi-touch touchpads and touch mice, microphones, multi-array microphones, drawing tablets, cameras, single-lens reflex camera (SLR), digital SLR (DSLR), CMOS sensors, accelerometers, infrared optical sensors, pressure sensors, magnetometer sensors, angular rate sensors, depth sensors, proximity sensors, ambient light sensors, gyroscopic sensors, or other sensors.
  • Output devices may include video displays, graphical displays, speakers, headphones, inkjet printers, laser printers, and 3D printers.
  • Devices 130a-130n may include a combination of multiple input or output devices, including, e.g., Microsoft KINECT, Nintendo Wiimote for the WII, Nintendo WII U GAMEPAD, or Apple IPHONE. Some devices 130a-130n allow gesture recognition inputs through combining some of the inputs and outputs. Some devices 130a-130n provide facial recognition, which may be utilized as an input for different purposes including authentication and other commands. Some devices 130a-130n provide voice recognition and inputs, including, e.g., Microsoft KINECT, SIRI for IPHONE by Apple, Google Now or Google Voice Search.
  • Additional devices 130a-130n have both input and output capabilities, including, e.g., haptic feedback devices, touchscreen displays, or multi-touch displays.
  • Touchscreen, multi-touch displays, touchpads, touch mice, or other touch sensing devices may use different technologies to sense touch, including, e.g., capacitive, surface capacitive, projected capacitive touch (PCT), in-cell capacitive, resistive, infrared, waveguide, dispersive signal touch (DST), in-cell optical, surface acoustic wave (SAW), bending wave touch (BWT), or force-based sensing technologies.
  • PCT projected capacitive touch
  • DST dispersive signal touch
  • SAW surface acoustic wave
  • BWT bending wave touch
  • Some multi-touch devices may allow two or more contact points with the surface, allowing advanced functionality including, e.g., pinch, spread, rotate, scroll, or other gestures.
  • Some touchscreen devices including, e.g., Microsoft PIXELSENSE or Multi-Touch Collaboration Wall, may have larger surfaces, such as on a table-top or on a wall, and may also interact with other electronic devices.
  • Some I/O devices 130a-130n, display devices 124a-124n or groups of devices may be augmented reality devices.
  • the I/O devices may be controlled by an I/O controller 123 as shown in FIG. 11C.
  • the I/O controller may control one or more I/O devices, such as, e.g., a keyboard 126 and a pointing device 127, e.g., a mouse or optical pen.
  • an I/O device may also provide storage and/or an installation medium 116 for the computing device 100.
  • the computing device 100 may provide USB connections (not shown) to receive handheld USB storage devices.
  • an I/O device 130 may be a bridge between the system bus 150 and an external communication bus, e.g. a USB bus, a SCSI bus, a FireWire bus, an Ethernet bus, a Gigabit Ethernet bus, a Fibre Channel bus, or a Thunderbolt bus.
  • display devices 124a-124n may be connected to I/O controller 123.
  • Display devices may include, e.g., liquid crystal displays (LCD), thin film transistor LCD (TFT-LCD), blue phase LCD, electronic paper (e-ink) displays, flexible displays, light emitting diode displays (LED), digital light processing (DLP) displays, liquid crystal on silicon (LCOS) displays, organic light-emitting diode (OLED) displays, active-matrix organic light-emitting diode (AMOLED) displays, liquid crystal laser displays, time-multiplexed optical shutter (TMOS) displays, or 3D displays. Examples of 3D displays may use, e.g.
  • Display devices 124a-124n may also be a head-mounted display (HMD).
  • display devices 124a-124n or the corresponding I/O controllers 123 may be controlled through or have hardware support for OPENGL or DIRECTX API or other graphics libraries.
  • the computing device 100 may include or connect to multiple display devices 124a-124n, which each may be of the same or different type and/or form.
  • any of the I/O devices 130a-130n and/or the I/O controller 123 may include any type and/or form of suitable hardware, software, or combination of hardware and software to support, enable or provide for the connection and use of multiple display devices 124a-124n by the computing device 100.
  • the computing device 100 may include any type and/or form of video adapter, video card, driver, and/or library to interface, communicate, connect or otherwise use the display devices 124a-124n.
  • a video adapter may include multiple connectors to interface to multiple display devices 124a-124n.
  • the computing device 100 may include multiple video adapters, with each video adapter connected to one or more of the display devices 124a-124n. In some embodiments, any portion of the operating system of the computing device 100 may be configured for using multiple displays 124a-124n. In other embodiments, one or more of the display devices 124a-124n may be provided by one or more other computing devices 100a or 100b connected to the computing device 100, via the network 104. In some embodiments software may be designed and constructed to use another computer’s display device as a second display device 124a for the computing device 100. For example, in one embodiment, an Apple iPad may connect to a computing device 100 and use the display of the device 100 as an additional display screen that may be used as an extended desktop.
  • a computing device 100 may be configured to have multiple display devices 124a-124n.
  • the computing device 100 may comprise a storage device 128 (e.g. one or more hard disk drives or redundant arrays of independent disks) for storing an operating system or other related software, and for storing application software programs such as any program related to the software for the genomic data processing system 120.
  • storage device 128 include, e.g., hard disk drive (HDD); optical drive including CD drive, DVD drive, or BLU-RAY drive; solid-state drive (SSD); USB flash drive; or any other device suitable for storing data.
  • Some storage devices may include multiple volatile and non-volatile memories, including, e.g., solid state hybrid drives that combine hard disks with solid state cache.
  • Some storage devices 128 may be non-volatile, mutable, or read-only. Some storage devices 128 may be internal and connect to the computing device 100 via a bus 150. Some storage devices 128 may be external and connect to the computing device 100 via an I/O device 130 that provides an external bus. Some storage devices 128 may connect to the computing device 100 via the network interface 118 over a network 104, including, e.g., the Remote Disk for MACBOOK AIR by Apple. Some client devices 100 may not require a non-volatile storage device 128 and may be thin clients or zero clients 102. Some storage devices 128 may also be used as an installation device 116, and may be suitable for installing software and programs.
  • the operating system and the software can be run from a bootable medium, for example, a bootable CD, e.g. KNOPPIX, a bootable CD for GNU/Linux that is available as a GNU/Linux distribution from knoppix.net.
  • Client device 100 may also install software or application from an application distribution platform.
  • application distribution platforms include the App Store for iOS provided by Apple, Inc., the Mac App Store provided by Apple, Inc., GOOGLE PLAY for Android OS provided by Google Inc., Chrome Webstore for CHROME OS provided by Google Inc., and Amazon Appstore for Android OS and KINDLE FIRE provided by Amazon.com, Inc.
  • An application distribution platform may facilitate installation of software on a client device 102.
  • An application distribution platform may include a repository of applications on a server 106 or a cloud 108, which the clients 102a- 102n may access over a network 104.
  • An application distribution platform may include application developed and provided by various developers. A user of a client device 102 may select, purchase and/or download an application via the application distribution platform.
  • the computing device 100 may include a network interface 118 to interface to the network 104 through a variety of connections including, but not limited to, standard telephone lines, LAN or WAN links (e.g., 802.11, T1, T3, Gigabit Ethernet, InfiniBand), broadband connections (e.g., ISDN, Frame Relay, ATM, Gigabit Ethernet, Ethernet-over-SONET, ADSL, VDSL, BPON, GPON, fiber optical including FiOS), wireless connections, or some combination of any or all of the above.
  • Connections can be established using a variety of communication protocols (e.g., TCP/IP, Ethernet, ARCNET, SONET, SDH, Fiber Distributed Data Interface (FDDI), IEEE 802.11a/b/g/n/ac, CDMA, GSM, WiMax and direct asynchronous connections).
  • the computing device 100 communicates with other computing devices 100’ via any type and/or form of gateway or tunneling protocol e.g. Secure Socket Layer (SSL) or Transport Layer Security (TLS), or the Citrix Gateway Protocol manufactured by Citrix Systems, Inc. of Ft. Lauderdale, Florida.
  • the network interface 118 may comprise a built-in network adapter, network interface card, PCMCIA network card, EXPRESSCARD network card, card bus network adapter, wireless network adapter, USB network adapter, modem or any other device suitable for interfacing the computing device 100 to any type of network capable of communication and performing the operations described herein.
  • a computing device 100 of the sort depicted in FIGs. 11B and 11C may operate under the control of an operating system, which controls scheduling of tasks and access to system resources.
  • the computing device 100 can be running any operating system such as any of the versions of the MICROSOFT WINDOWS operating systems, the different releases of the Unix and Linux operating systems, any version of the MAC OS for Macintosh computers, any embedded operating system, any real-time operating system, any open source operating system, any proprietary operating system, any operating systems for mobile computing devices, or any other operating system capable of running on the computing device and performing the operations described herein.
  • Typical operating systems include, but are not limited to: WINDOWS 2000, WINDOWS Server 2022, WINDOWS CE, WINDOWS Phone, WINDOWS XP, WINDOWS VISTA, and WINDOWS 7, WINDOWS RT, and WINDOWS 8 all of which are manufactured by Microsoft Corporation of Redmond, Washington; MAC OS and iOS, manufactured by Apple, Inc. of Cupertino, California; and Linux, a freely-available operating system, e.g. Linux Mint distribution (“distro”) or Ubuntu, distributed by Canonical Ltd. of London, United Kingdom; or Unix or other Unix-like derivative operating systems; and Android, designed by Google, of Mountain View, California, among others.
  • the computer system 100 can be any workstation, telephone, desktop computer, laptop or notebook computer, netbook, ULTRABOOK, tablet, server, handheld computer, mobile telephone, smartphone or other portable telecommunications device, media playing device, a gaming system, mobile computing device, or any other type and/or form of computing, telecommunications or media device that is capable of communication.
  • the computer system 100 has sufficient processor power and memory capacity to perform the operations described herein.
  • the computer system 100 can be of any suitable size, such as a standard desktop computer or a Raspberry Pi 4 manufactured by Raspberry Pi Foundation, of Cambridge, United Kingdom.
  • the computing device 100 may have different processors, operating systems, and input devices consistent with the device.
  • For example, Samsung GALAXY smartphones operate under the control of the Android operating system developed by Google, Inc.
  • GALAXY smartphones receive input via a touch interface.
  • the computing device 100 is a gaming system.
  • the computer system 100 may comprise a PLAYSTATION 3, or PERSONAL PLAYSTATION PORTABLE (PSP), or a PLAYSTATION VITA device manufactured by the Sony Corporation of Tokyo, Japan, a NINTENDO DS, NINTENDO 3DS, NINTENDO WII, or a NINTENDO WII U device manufactured by Nintendo Co., Ltd., of Kyoto, Japan, an XBOX 360 device manufactured by the Microsoft Corporation of Redmond, Washington.
  • the computing device 100 is a digital audio player such as the Apple IPOD, IPOD Touch, and IPOD NANO lines of devices, manufactured by Apple Computer of Cupertino, California.
  • Some digital audio players may have other functionality, including, e.g., a gaming system or any functionality made available by an application from a digital application distribution platform.
  • the IPOD Touch may access the Apple App Store.
  • the computing device 100 is a portable media player or digital audio player supporting file formats including, but not limited to, MP3, WAV, M4A/AAC, WMA, Protected AAC, AIFF, Audible audiobook, and Apple Lossless audio file formats, and .mov, .m4v, and .mp4 MPEG-4 (H.264/MPEG-4 AVC) video file formats.
  • the computing device 100 is a tablet e.g. the IPAD line of devices by Apple; GALAXY TAB family of devices by Samsung; or KINDLE FIRE, by Amazon.com, Inc. of Seattle, Washington.
  • the computing device 100 is an eBook reader, e.g. the KINDLE family of devices by Amazon.com, or NOOK family of devices by Barnes & Noble, Inc. of New York City, New York.
  • the communications device 102 includes a combination of devices, e.g. a smartphone combined with a digital audio player or portable media player.
  • Examples of smartphones include the IPHONE family of smartphones manufactured by Apple, Inc.; the Samsung GALAXY family of smartphones manufactured by Samsung, Inc.; and the Motorola DROID family of smartphones.
  • the communications device 102 is a laptop or desktop computer equipped with a web browser and a microphone and speaker system, e.g. a telephony headset.
  • the communications devices 102 are web-enabled and can receive and initiate phone calls.
  • a laptop or desktop computer is also equipped with a webcam or other video capture device that enables video chat and video call.
  • the status of one or more machines 102, 106 in the network 104 is monitored, generally as part of network management.
  • the status of a machine may include an identification of load information (e.g., the number of processes on the machine, CPU and memory utilization), of port information (e.g., the number of available communication ports and the port addresses), or of session status (e.g., the duration and type of processes, and whether a process is active or idle).
  • this information may be identified by a plurality of metrics, and the plurality of metrics can be applied at least in part towards decisions in load distribution, network traffic management, and network failure recovery as well as any aspects of operations of the present solution described herein.
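The description does not say how status metrics feed into load distribution. As one illustration only, the sketch below combines CPU and memory utilization into a single score and routes work to the least-loaded active machine. The `MachineStatus` record, the equal 0.5/0.5 weighting, and the tie-break on process count are all hypothetical choices, not taken from the source.

```python
from dataclasses import dataclass


@dataclass
class MachineStatus:
    """Hypothetical status record for one machine (e.g. a client 102 or server 106)."""
    name: str
    process_count: int
    cpu_utilization: float     # fraction in [0, 1]
    memory_utilization: float  # fraction in [0, 1]
    active: bool = True


def load_score(s: MachineStatus, cpu_weight: float = 0.5, mem_weight: float = 0.5) -> float:
    """Combine load metrics into one number; lower means less loaded (weights are assumptions)."""
    return cpu_weight * s.cpu_utilization + mem_weight * s.memory_utilization


def pick_machine(statuses: list[MachineStatus]) -> str:
    """Return the name of the least-loaded active machine."""
    candidates = [s for s in statuses if s.active]
    if not candidates:
        raise ValueError("no active machines available")
    # Ties on the load score are broken by the smaller number of running processes.
    best = min(candidates, key=lambda s: (load_score(s), s.process_count))
    return best.name
```

For example, `pick_machine([MachineStatus("106a", 12, 0.9, 0.7), MachineStatus("106b", 3, 0.2, 0.4)])` selects `"106b"`, whose combined score is 0.3 against 0.8.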
  • a system 1200 may include a computing device 1210 (or multiple computing devices, co-located or remote to each other) and a sample processing system 1280.
  • computing device 1210 (or components thereof) may be integrated with the sample processing system 1280 (or components thereof).
  • Components of computing device 1210 may be implemented by various combinations of computing hardware and software.
  • the sample processing system 1280 may include, may be, or may employ, for example, chromatography, mass spectroscopy, in situ hybridization, PCR, next-generation sequencing, northern blotting, microarray, dot or slot blots, FISH, and/or electrophoresis, etc., on such biological sample as, for example, phlegm, saliva, blood (or components thereof), tissue, and/or cells.
  • the sample processing system 1280 may be, or may include, devices or systems for performing liquid chromatography and mass spectrometry, and may be used for extracting metabolites in biological samples.
  • the control unit 1215 may be used to control, and receive signals acquired via, components of sample processing system 1280.
  • the control unit 1215 may include one or more processors and one or more volatile and non-volatile memories for storing computing code and data that are captured, acquired, recorded, and/or generated.
  • the computing device 1210 may include a control unit 1215 that is configured to exchange control signals with sample processing system 1280, allowing the computing device 1210 to be used to control, for example, processing of samples and/or delivery of data generated and/or acquired through processing of samples.
  • a raw data analyzer 1220 may be used, for example, to perform analyses of data captured via sample processing system 1280, and may employ, for example, alignment, peak picking, and normalization procedures as discussed herein.
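The peak-picking step is named here but its algorithm is not given. The sketch below shows one simple, assumed approach on a single 1-D intensity trace (for example, one extracted-ion chromatogram). It keeps local maxima above a height threshold and suppresses smaller peaks that fall within a minimum distance of a taller one. Production LC-MS software also performs alignment across samples and m/z binning, which this sketch does not attempt. The function name and parameters are hypothetical.

```python
def pick_peaks(intensities, min_height, min_distance=1):
    """Return indices of local maxima in a 1-D intensity trace.

    A point counts as a peak if it is at least min_height, strictly greater
    than its left neighbour, and greater than or equal to its right neighbour
    (so a flat-topped peak is reported once, at its first point). Among peaks
    closer together than min_distance, only the tallest is kept.
    """
    candidates = []
    for i in range(1, len(intensities) - 1):
        x = intensities[i]
        if x >= min_height and x > intensities[i - 1] and x >= intensities[i + 1]:
            candidates.append(i)
    # Visit peaks from tallest to shortest and drop any too close to a kept one.
    kept = []
    for i in sorted(candidates, key=lambda j: intensities[j], reverse=True):
        if all(abs(i - k) >= min_distance for k in kept):
            kept.append(i)
    return sorted(kept)
```

On the trace `[0, 2, 8, 3, 1, 5, 12, 4, 0, 1, 0]` with `min_height=3`, this returns indices 2 and 6. With `min_distance=5` only index 6 survives, because the smaller peak at index 2 lies within 4 points of it.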
  • data may be generated as a multi-dimensional array or vector with values representing concentrations or levels of each of a plurality of metabolites or other signature components, and in many instances, such levels may have widely different scales (e.g. parts per thousand, parts per million, etc.).
  • values may be normalized to a predetermined range (e.g. 0-1, 0-100, or any other such range).
  • the normalization may comprise linear rescaling, or may be a more complex function (e.g. based on an average concentration of the particular metabolite in a sample data set).
  • dimension reduction may be performed to reduce large and sparse arrays or vectors.
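The two normalization options above (linear rescaling to a predetermined range, or scaling relative to a metabolite's average concentration across the sample set) can be sketched per feature column as follows. The source does not specify a dimension-reduction method. Dropping features that are zero in most samples is used here only as one simple, assumed way to shrink sparse arrays; PCA or similar methods would be alternatives. All function names are hypothetical.

```python
def min_max_normalize(values, lo=0.0, hi=1.0):
    """Linearly rescale one feature's values into [lo, hi] (e.g. 0-1 or 0-100)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant feature carries no information; map it to the lower bound.
        return [lo for _ in values]
    scale = (hi - lo) / (vmax - vmin)
    return [lo + (v - vmin) * scale for v in values]


def mean_normalize(values):
    """Divide each value by the feature's mean across the sample set."""
    mean = sum(values) / len(values)
    return [v / mean for v in values] if mean else list(values)


def drop_sparse_features(matrix, min_nonzero_fraction=0.5):
    """Keep only columns that are non-zero in at least min_nonzero_fraction of samples.

    matrix is a list of equal-length rows (one row per sample).
    Returns (kept_column_indices, reduced_matrix).
    """
    n_rows = len(matrix)
    kept = [
        j for j in range(len(matrix[0]))
        if sum(1 for row in matrix if row[j] != 0) / n_rows >= min_nonzero_fraction
    ]
    return kept, [[row[j] for j in kept] for row in matrix]
```

For instance, `min_max_normalize([2.0, 4.0, 6.0], 0, 100)` gives `[0.0, 50.0, 100.0]`. `drop_sparse_features([[1, 0, 3], [2, 0, 0], [4, 5, 6]])` keeps columns 0 and 2, because column 1 is non-zero in only one of three samples.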
  • a machine learning modeler 1225 may be used to implement various machine learning functionality discussed herein.
  • a model training and testing engine 1230 may be used to apply various machine learning techniques (which may comprise, e.g., Light Gradient Boosting Machine (LightGBM) and/or random forests techniques) to one or more training datasets (e.g., datasets comprising processed data from raw data analyzer 1220 and/or various features such as ion features) to train and test machine learning models for various predictions or other classifications.
  • a classification engine 1235 may employ a machine learning model (e.g., classifiers trained via model training and testing engine 1230) to analyze data on metabolites (based on, e.g., tests on samples from subjects) to make various predictions or other classifications (e.g., regarding presence of various medical conditions).
  • a feature analyzer 1240 may be used to evaluate features by, for example, quantifying the impact of each feature on the developed model.
  • Feature analyzer 1240 may, for example, uncover clinically important ion features that were globally predictive of the outcome, and may determine, for example, Shapley values for all features, or for the top features (e.g., the top 2, top 5, top 10, top 15, top 20, top 25, top 30, etc.), on individual predictions and provide the percent contribution of each feature.
  • Shapley analysis may provide improvements in analysis accuracy and/or efficiency in many implementations. Classification systems not utilizing the systems and methods discussed herein may only identify a top set of contributing features and the magnitude of change for each ion feature, which may be limiting in some instances.
  • a biomarker signature module 1245 may generate a biomarker signature based on selected features as disclosed herein. Features may be selected based on a threshold, such as a percent contribution to predicting a medical condition, e.g., 0.5%, 1%, 2%, 5%, 10%, etc.
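The threshold-based selection above can be sketched as follows, given per-sample Shapley values from any explainer. A common convention, assumed here because the source does not define "percent contribution", is to take each feature's mean absolute Shapley value as a share of the total. Features at or above the threshold are kept in ranked order, with an optional top-k cap matching the "top N" variants described elsewhere. The function names and the sample data are hypothetical.

```python
def percent_contributions(shap_rows, feature_names):
    """Percent contribution of each feature = its share of the total mean |SHAP value|.

    shap_rows: one list of Shapley values per sample, aligned with feature_names.
    """
    n = len(shap_rows)
    mean_abs = [sum(abs(row[j]) for row in shap_rows) / n for j in range(len(feature_names))]
    total = sum(mean_abs)
    if total == 0:
        raise ValueError("all Shapley values are zero")
    return {name: 100.0 * m / total for name, m in zip(feature_names, mean_abs)}


def select_signature(contributions, threshold_percent=1.0, top_k=None):
    """Rank features by contribution, keep those >= threshold, optionally cap at top_k."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    selected = [name for name, pct in ranked if pct >= threshold_percent]
    return selected[:top_k] if top_k is not None else selected
```

With toy Shapley rows `[[0.6, -0.2, 0.01], [-0.4, 0.2, -0.03]]` for features `ion_a`, `ion_b`, and `ion_c`, the contributions are roughly 69%, 28%, and 3%. A 5% threshold therefore keeps `ion_a` and `ion_b`.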
  • a transceiver 1250 allows the computing device 1210 to exchange readings, control commands, and/or other data with sample processing system 1280 (or components thereof).
  • the transceiver 1250 may additionally or alternatively include a network interface permitting the computing device 1210 to communicate with other remote devices and systems via, for example, a telecommunications network such as the internet.
  • One or more user interfaces 1255 allow the computing device 1210 to receive user inputs (e.g., via a keyboard, touchscreen, microphone, camera, etc.) and provide outputs (e.g., via display screen, audio speakers, etc.).
  • the computing device 1210 may additionally include one or more databases 1260 (stored in, e.g., one or more computer-readable non-volatile memory devices) for storing, for example, data and analyses obtained from or via raw data analyzer 1220, machine learning modeler 1225 (e.g., model training and testing engine 1230 and/or classification engine 1235), feature analyzer 1240, biomarker signature module 1245, and/or sample processing system 1280.
  • database 1260 (or portions thereof) may alternatively or additionally be part of another computing device that is co-located or remote and in communication with computing device 1210 and/or sample processing system 1280 (or components thereof).
  • a flowchart for an example process 1300 according to various potential embodiments is shown in FIG. 13.
  • biological samples from subjects in a cohort may be analyzed or otherwise processed.
  • the cohort may include subjects with a medical condition as well as control subjects without the medical condition. Processing the biological samples may include suitable tests for extracting, for example, various metabolites in the samples.
  • the control unit 1215 may, for example, instruct sample processing system 1280 to process samples and provide test results to computing device 1210.
  • raw test results may be received and processed (e.g., by or via raw data analyzer 1220).
  • the raw test results may be analyzed by, for example, alignment, peak picking, and normalization procedures as discussed herein.
  • a training dataset may be generated from the processed test results and a machine learning model may be developed as disclosed herein (e.g., by or via machine learning modeler 1225).
  • feature analysis may be performed to identify features having sufficient predictive value (e.g., contributing at least a certain threshold percent), and a metabolite biomarker signature may be generated based on identified features.
  • Feature analysis may be performed by or via, for example, feature analyzer 1240, and the metabolite biomarker signature may be generated by or via, for example, biomarker signature module 1245.
  • the metabolite biomarker signature may be stored in database 1260 in association with the medical condition, for subsequent application to patient samples for recognizing the medical condition based on a metabolite profile of the samples.
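Storing a signature "in association with the medical condition" can be as simple as a keyed table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for database 1260. The table name, the column layout, and the choice to serialize the feature list as JSON are assumptions made for illustration only.

```python
import json
import sqlite3


def store_signature(conn, condition, features):
    """Store a signature (list of feature names) keyed by medical condition."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS signatures ("
        " condition TEXT PRIMARY KEY, features TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps a single, most recent signature per condition.
    conn.execute(
        "INSERT OR REPLACE INTO signatures (condition, features) VALUES (?, ?)",
        (condition, json.dumps(features)),
    )
    conn.commit()


def load_signature(conn, condition):
    """Return the stored feature list for a condition, or None if absent."""
    row = conn.execute(
        "SELECT features FROM signatures WHERE condition = ?", (condition,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Usage: `conn = sqlite3.connect(":memory:")`, then `store_signature(conn, "influenza", ["ion_a", "ion_b"])`. A later `load_signature(conn, "influenza")` returns that list for application to new patient samples.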
  • the metabolite biomarker signature may be incorporated into a report, presented graphically or otherwise via user interfaces 1255, and/or transmitted to another device through a network via transceiver 1250.
  • one or more biological samples from a patient may be processed by running one or more tests.
  • the control unit 1215 may instruct sample processing system 1280 to process the patient sample(s) and provide test results to computing device 1210.
  • the metabolite biomarker signature from step 1320 may be applied (by, e.g., biomarker signature module 1245) to the data obtained from tests on the patient’s sample(s) to determine whether the patient has the medical condition that is associated with the metabolite biomarker signature.
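The description leaves open how the signature is "applied" to a patient's data. One plausible reading, sketched here as an assumption, is to restrict the patient's measured features to the signature features (in a fixed order) and score that vector with a classifier trained on the same features. Features not detected in the patient sample are filled with 0.0, and the 0.5 decision threshold is a placeholder. The `classifier` argument stands in for any trained model's scoring function; it is not an API from the source.

```python
from typing import Callable, Mapping, Sequence


def apply_signature(
    patient_features: Mapping[str, float],
    signature: Sequence[str],
    classifier: Callable[[list[float]], float],
    threshold: float = 0.5,
) -> tuple[float, bool]:
    """Score a patient sample using only the signature features.

    Features absent from the patient's profile are treated as 0.0 (not detected).
    Returns the classifier score and whether it meets the decision threshold.
    """
    vector = [float(patient_features.get(name, 0.0)) for name in signature]
    score = classifier(vector)
    return score, score >= threshold
```

Features measured in the patient sample but absent from the signature (for example, an unrelated `ion_z`) are ignored, so the classifier always receives inputs in the same order it was trained on.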
  • a method comprising, or alternatively consisting essentially of, or yet further consisting of (a) generating a training dataset based on one or more metabolites isolated from a plurality of biological samples isolated from subjects, wherein the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having a medical condition, and wherein the training dataset comprises, or alternatively consists essentially of, or yet further consists of a set of features identified through one or more tests run on the biological samples; (b) producing, using a machine learning system, a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising, or alternatively consisting essentially of, or yet further consisting of the set of identified features; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and (c) storing, in a computer-
  • a method comprising, or alternatively consisting essentially of, or yet further consisting of (a) running one or more types of tests on biological samples from subjects having a medical condition, wherein the biological samples comprise, or alternatively consist essentially of, or yet further consists of a metabolite profile that changed in the subjects as a result of the medical condition; (b) generating a training dataset comprising, or alternatively consisting essentially of, or yet further consisting of a set of features identified through the one or more tests run on the biological samples; (c) producing a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising, or alternatively consisting essentially of, or yet further consisting of the set of identified features; (ii) selecting a subset of the set of features having contributions to model predictions exceeding a threshold; and (iii) generating the metabolite biomarker signature based on the subset of features; and (d) applying the metabolite biomark
  • a method comprising, or alternatively consisting essentially of, or yet further consisting of producing, using a machine learning system, a metabolite biomarker signature by: (i) applying one or more machine learning models to a training dataset based on one or more metabolites in a plurality of biological samples from a plurality of subjects, wherein the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having a medical condition, and wherein the training dataset comprises, or alternatively consists of, or yet further consists of a set of features identified through the one or more tests run on the biological samples; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on the subset of features; and storing, in a computer-readable storage medium, the metabolite biomarker signature in association with the medical condition.
  • the method further comprises generating the training dataset based on the one or more metabolites.
  • the training dataset comprises, or alternatively consists, or yet further consists of a feature of the one or more metabolites in a biological sample isolated from a subject.
  • the training dataset further comprises whether the subject has a medical condition or not.
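A training dataset that pairs each sample's metabolite features with whether the subject has the medical condition can be assembled as a feature matrix plus a label vector. In this minimal sketch, each sample arrives as a dictionary of feature name to level. A missing feature is assumed to mean "not detected" and is filled with 0.0, which is an assumption rather than something the source specifies. Feature names are sorted so that columns are reproducible.

```python
def build_training_dataset(samples):
    """Turn per-sample feature dicts into (feature_names, X, y).

    samples: list of (features: dict[str, float], has_condition: bool).
    Features missing from a sample are filled with 0.0 (assumed not detected).
    """
    feature_names = sorted({name for features, _ in samples for name in features})
    X = [[float(features.get(name, 0.0)) for name in feature_names] for features, _ in samples]
    y = [1 if has_condition else 0 for _, has_condition in samples]
    return feature_names, X, y
```

For two samples `({"ion_b": 2.0, "ion_a": 1.0}, True)` and `({"ion_c": 3.0}, False)`, the columns are `ion_a`, `ion_b`, and `ion_c`, and the labels are `[1, 0]`.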
  • the method further comprises isolating the one or more metabolites from the plurality of biological samples.
  • a subset of feature(s) lacks at least one feature compared to the reference set of features.
  • a subset of feature(s) comprises, or alternatively consists essentially of, or yet further consists of the top 200, or the top 150, or the top 100, or the top 90, or the top 80, or the top 70, or the top 60, or the top 59, or the top 58, or the top 57, or the top 56, or the top 55, or the top 54, or the top 53, or the top 52, or the top 51, or the top 50, or the top 49, or the top 48, or the top 47, or the top 46, or the top 45, or the top 44, or the top 43, or the top 42, or the top 41, or the top 40, or the top 39, or the top 38, or the top 37, or the top 36, or the top 35, or the top 34, or the top 33, or the top 32, or the top 31, or the top 30, or the top 29, or the top 28, or the top 27, or the top 26, or the top 25, or the top 24, or the top 23, or the top 22, or the top 21, or the top
  • selecting the subset of features comprises, or alternatively consists essentially of, or yet further consists of performing feature importance analysis.
  • selecting the subset of features comprises applying a Shapley Additive Explanation (SHAP) procedure.
  • the primary measure of model performance or the feature performance is the area under the receiver operating characteristic curve (AUC), which illustrates the diagnostic discriminative performance of the models.
  • Performance measures for the models or the feature subsets also included sensitivity, specificity, or accuracy at a high-sensitivity operating point used to binarize the model predictions.
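The AUC can be computed directly as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. A "high-sensitivity operating point" is not defined numerically in the text. The sketch assumes it is the highest score threshold that still reaches a target sensitivity (0.9 by default), and reports sensitivity, specificity, and accuracy at that threshold. Library routines such as scikit-learn's `roc_auc_score` and `roc_curve` compute the same quantities more efficiently.

```python
def roc_auc(labels, scores):
    """AUC = P(random positive outscores random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def high_sensitivity_operating_point(labels, scores, target_sensitivity=0.9):
    """Highest threshold reaching target sensitivity.

    Returns (threshold, sensitivity, specificity, accuracy); predictions are
    binarized as positive when score >= threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    for t in sorted(set(scores), reverse=True):
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
        sens = tp / n_pos
        if sens >= target_sensitivity:
            return t, sens, tn / n_neg, (tp + tn) / len(labels)
    raise ValueError("target sensitivity not reachable")
```

With labels `[1, 1, 1, 0, 0, 0]` and scores `[0.9, 0.8, 0.4, 0.5, 0.3, 0.1]`, the AUC is 8/9. The 0.9-sensitivity operating point is the threshold 0.4, where sensitivity is 1.0 and specificity is 2/3.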
  • the SHAP procedure provides a ranked list of the top features, the magnitude of the change for each ion feature, the direction of the change (higher or lower in infected samples), or any combination thereof.
  • a traditional classification approach provides a feature importance ranking but not the direction of change at the same time.
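To show why Shapley values carry direction as well as magnitude, the sketch below computes exact Shapley values for one sample by enumerating every feature coalition. Features outside a coalition are set to a baseline value, which is one common, assumed way to define the "missing feature" game. A positive value means the feature pushed that prediction up, and a negative value means it pushed it down. The cost grows exponentially with the number of features, so this is for illustration only; SHAP libraries use faster algorithms such as TreeSHAP for tree ensembles like LightGBM.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict at sample x relative to baseline.

    predict: function taking a list of feature values and returning a number.
    Features outside a coalition are replaced by their baseline values.
    """
    n = len(x)

    def value(coalition):
        z = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model `2*z0 - 3*z1 + 0.5*z2` at `x = [1, 1, 1]` against a zero baseline, the values are `[2, -3, 0.5]`. The sign shows that the second feature lowers the prediction. The values also sum to `predict(x) - predict(baseline)`.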
  • the subject has been treated with a therapy neutralizing a pathogen causing the medical condition. Additionally or alternatively, the subject is immune-compromised. In some embodiments, the subject is a human. In some embodiments, the subject is an adult. In other embodiments, the subject is a child.
  • immune-compromisation or any grammatical variation thereof refers to a state in which the immune system’s ability to fight infectious disease or cancer is compromised or entirely absent.
  • a subject or patient who is immune-compromised may mount no immune response, or a weaker immune response than a subject having a healthy immune system, against a pathogen upon infection with that pathogen.
  • a subject who is immune-compromised may mount no immune response, or a weaker immune response than a subject having a healthy immune system, against cancer cells upon developing such a cancer.
  • a commonly used diagnostic method relying on detecting the immune response would fail to identify a subject having the medical condition (such as an infection or cancer).
  • a method as disclosed herein relying on a metabolite feature would have no difficulty in selecting the subject or patient having the medical condition, no matter whether the subject or patient is immune-compromised or not.
  • a subject or patient who has received a therapy neutralizing the pathogen causing the medical condition may not present such pathogen, or may not have a higher level (such as, absolute amount, or raw concentration, or normalized concentration) of such pathogen compared to that of a subject free of the pathogen, in a biological sample isolated from the subject or patient.
  • a commonly used diagnostic method relying on detecting the pathogen would fail to identify a subject having the medical condition caused by the pathogen.
  • a method as disclosed herein relying on a metabolite feature would have no difficulty in selecting the subject or patient having the medical condition, no matter whether the subject or patient has been treated with a therapy neutralizing the pathogen or not.
  • the feature of the biological sample comprises, or alternatively consists essentially of, or yet further consists of an extracellular concentration of the metabolite.
  • a metabolite is an intracellular metabolite.
  • the method further comprises extracting and analyzing one or more biological samples of a patient using the metabolite biomarker signature.
  • analyzing a biological sample refers to obtaining the metabolite feature(s) of the biological sample.
  • the metabolite feature(s) is determined by a test as disclosed herein, such as LC/MS-MS or LC-Q-TOF-MS.
  • the machine learning models comprise, or alternatively consist essentially of, or yet further consist of boosted or bagged decision trees.
  • the boosted or bagged decision trees are selected from the group consisting of Light Gradient Boosting Machine (LightGBM), XGBoost, random forest, or Adaptive Boosting (AdaBoost).
  • a linear model, optionally selected from least absolute shrinkage and selection operator (LASSO) or Ridge, may be substituted for a machine learning model.
  • the medical condition is selected from the group consisting of: an infection caused by a pathogen selected from the group consisting of a bacterium, a virus, a fungus, or a parasite; a cancer; or a chronic disease.
  • the medical condition is selected from the group consisting of tuberculosis, a human papillomavirus (HPV) infection, or malaria.
  • the medical condition is an infection by a virus selected from adenovirus, coronavirus, influenza A H1N1, influenza A H3N2, influenza B, human metapneumovirus, parainfluenza 1, parainfluenza 2, parainfluenza 3, parainfluenza 4, respiratory syncytial virus (RSV), or rhinovirus.
  • the medical condition comprises, or alternatively consists essentially of, or yet further consists of an infection by a respiratory virus.
  • the respiratory virus is the pathogen causing the medical condition.
  • the respiratory virus is selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus.
  • the influenza is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D).
  • the subtype of influenza A is selected from H1, optionally H1N1; H3, optionally H3N2; H5, optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7, optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9, optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9.
  • the lineage of influenza B is selected from Victoria or Yamagata.
  • the medical condition comprises, or alternatively consists essentially of, or yet further consists of an infection by a coronavirus.
  • the coronavirus is referred to herein as the pathogen causing the medical condition.
  • the coronavirus is selected from the group of common cold coronaviruses, optionally any one of human coronavirus (HCoV) HCoV-OC43, HCoV-HKU1, HCoV-229E, or HCoV-NL63.
  • the coronavirus is selected from the group of severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1), Middle East respiratory syndrome coronavirus (MERS-CoV) or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
  • the medical condition comprises, or alternatively consists essentially of, or yet further consists of severe acute respiratory syndrome (SARS) caused by severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1); Middle East respiratory syndrome (MERS) caused by Middle East respiratory syndrome coronavirus (MERS-CoV); or Coronavirus Disease 2019 (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
  • the medical condition comprises, or alternatively consists essentially of, or yet further consists of a cancer.
  • the cancer is selected from a cancer of: circulatory system, for example, heart (sarcoma (angiosarcoma, fibrosarcoma, rhabdomyosarcoma, liposarcoma), myxoma, rhabdomyoma, fibroma, lipoma or malignant teratoma), mediastinum, pleura, or other intrathoracic organs, vascular tumors, tumor-associated vascular tissue; respiratory tract, for example, nasal cavity, middle ear, accessory sinuses, larynx, trachea, bronchus or lung such as small cell lung cancer (SCLC), non-small cell lung cancer (NSCLC), bronchogenic carcinoma (squamous cell, undifferentiated small cell, undifferentiated large cell, adenocarcinoma), alveolar (bronchiolar) carcinoma, bron
  • the cancer is a solid tumor. In other embodiments, the cancer is a liquid cancer. Additionally or alternatively, the cancer is a primary cancer or a metastasis. In some embodiments, the cancer comprises a carcinoma, a sarcoma, a myeloma, a leukemia, or a lymphoma. Accordingly, the cancer or the cell may be referred to herein as the pathogen causing the medical condition.
  • a chronic disease refers to a medical condition that lasts 1 year or more and requires ongoing medical attention, limits activities of daily living, or both.
  • a chronic disease is a heart disease, or a stroke, or other cardiovascular disease.
  • a chronic disease is high blood pressure.
  • a chronic disease is high cholesterol.
  • a chronic disease is a cancer.
  • a chronic disease is diabetes.
  • the one or more tests comprises, or alternatively consists essentially of, or yet further consists of a liquid chromatography (LC), or a mass spectrometry (MS), or both.
  • the one or more tests comprises, or alternatively consists essentially of, or yet further consists of liquid chromatography tandem mass spectrometry (LC/MS-MS).
  • the one or more tests comprises, or alternatively consists essentially of, or yet further consists of liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS).
  • the metabolite features are ion features.
  • a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of presence or absence of the metabolite.
  • a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of absolute amount of the metabolite in the biological sample.
  • a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of a concentration of the metabolite in the biological sample.
  • a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of a compound abundance of the metabolite in the biological sample.
  • a metabolite feature comprises, or alternatively consists essentially of, or yet further consists of an absolute amount or a concentration or a compound abundance of the metabolite in the biological sample normalized, for example to an internal standard or to the mean compound abundance. Additionally, in some embodiments, the concentration or amount or compound abundance of the metabolite may be an extracellular or an intracellular one. Additionally or alternatively, the level (such as the absolute amount, or the compound abundance, or the concentration) is normalized by subtracting a level of a negative control. In further embodiments, the negative control is a subject free of the medical condition. In other embodiments, the negative control is a solution immersing or diluting the biological sample prior to performing the test. In some embodiments, the internal standard is a labeled standard D5-pyroglutamic acid.
  • the biological sample is a nasopharyngeal sample, a blood sample, a serum sample, a plasma sample, or a urine sample.
  • the biological sample is a nasopharyngeal swab or viral transport medium (VTM) immersing a nasopharyngeal swab.
  • a metabolite feature is normalized by subtracting a level of the viral transport medium.
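The normalizations described above, subtracting the level measured in the viral transport medium (or another negative control) and normalizing to an internal standard such as D5-pyroglutamic acid, can be combined per sample as shown. The order of operations (blank subtraction first, then division by the internal-standard signal) and the clipping of negative results to zero are assumptions for this sketch. The source does not fix either choice.

```python
def normalize_abundance(raw, blank, internal_standard):
    """Blank-subtract each feature, then scale by the internal-standard signal.

    raw, blank: dict of feature name -> compound abundance (blank = VTM or other
    negative control). internal_standard: measured abundance of the spiked
    standard (e.g. D5-pyroglutamic acid) in the same run. Values that fall
    below zero after subtraction are clipped to 0 (treated as not detected).
    """
    if internal_standard <= 0:
        raise ValueError("internal standard signal must be positive")
    return {
        name: max(value - blank.get(name, 0.0), 0.0) / internal_standard
        for name, value in raw.items()
    }
```

For example, a feature at 1500 in the sample and 500 in the VTM blank, with an internal-standard signal of 1000, normalizes to 1.0. A feature below its blank level becomes 0.0.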
  • a metabolite profile comprises, or alternatively consists essentially of, or yet further consists of a feature of more than one metabolite. Additionally or alternatively, a metabolite profile comprises, or alternatively consists essentially of, or yet further consists of more than one feature of a metabolite. In some embodiments, a metabolite profile comprises, or alternatively consists essentially of, or yet further consists of a set or a subset of features as disclosed herein.
  • presence or absence of the metabolite(s) is determined.
  • an absolute amount of the metabolite(s) in the biological sample is determined.
  • a level (such as an absolute amount or a compound abundance or concentration) of the extracellular metabolite(s) in the biological sample is determined.
  • the level is an extracellular level.
  • a metabolite is an intracellular metabolite.
  • the level is normalized, for example, to an internal standard or to the mean compound abundance.
  • the internal standard is a labeled standard D5-pyroglutamic acid.
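The normalization options listed above can be combined in more than one order. The minimal Python sketch below assumes one plausible order: subtract the level measured in a negative control (such as VTM with no specimen), then divide by the abundance of the D5-pyroglutamic acid internal standard. The function names are hypothetical, and "normalized to the mean compound abundance" is interpreted here, as an assumption, as dividing one compound's abundances across samples by their mean.

```python
from statistics import mean


def subtract_negative_control(level: float, control_level: float) -> float:
    """Subtract the level measured in a negative control, such as VTM alone."""
    return level - control_level


def normalize_to_internal_standard(level: float, istd_level: float) -> float:
    """Divide by the abundance of a spiked internal standard (e.g., D5-pyroglutamic acid)."""
    if istd_level <= 0:
        raise ValueError("internal standard abundance must be positive")
    return level / istd_level


def normalize_to_mean_abundance(levels: list[float]) -> list[float]:
    """Scale one compound's abundances across samples by their mean (assumed interpretation)."""
    avg = mean(levels)
    if avg == 0:
        raise ValueError("mean abundance is zero")
    return [x / avg for x in levels]


# Hypothetical raw abundances for one feature in one sample.
blank_corrected = subtract_negative_control(1200.0, 200.0)
normalized = normalize_to_internal_standard(blank_corrected, 500.0)
print(normalized)  # 2.0
print(normalize_to_mean_abundance([2.0, 4.0, 6.0]))  # [0.5, 1.0, 1.5]
```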
  • a method comprising, or alternatively consisting essentially of, or yet further consisting of applying a metabolite biomarker signature to a biological sample from a patient to recognize the medical condition.
  • the metabolite biomarker signature is produced by a method as disclosed herein.
  • the method further comprises performing the one or more tests, for example, liquid chromatography tandem mass spectrometry (LC/MS-MS) or liquid chromatography quadrupole time-of-flight mass spectrometry (LC-Q-TOF-MS), on the biological sample from the patient.
  • the patient is not one of the subjects used in determining or selecting the metabolite biomarker signature.
  • the patient is suspected of having a medical condition.
  • the patient is immune-compromised.
  • the patient is not immune-compromised.
  • the patient has been treated with a therapy neutralizing a pathogen causing the medical condition. In other embodiments, the patient has not been treated with a therapy neutralizing a pathogen causing the medical condition. In some embodiments, the patient is asymptomatic. In some embodiments, the patient is a human. In some embodiments, the patient is an adult. In other embodiments, the patient is a child.
  • a metabolite biomarker signature can be used to indicate the medical condition or to select a subject or patient having the medical condition from among those suspected of having such a medical condition.
  • a metabolite biomarker signature may be discovered by a method as disclosed herein.
  • a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of one or more certain metabolite feature(s).
  • a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of an altered (increased or decreased) level (such as absolute amount, concentration, or compound abundance, normalized or not) of a metabolite compared to a negative control.
  • a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a level (such as absolute amount, concentration, or compound abundance, normalized or not) of a metabolite similar to (such as at least 50%, or at least 60%, or at least 70%, or at least 80%, or at least 90%, or at least 95%, or at least 97%, or at least 99%, or at least 100%; additionally or alternatively no more than 110%, or no more than 120%, or no more than 150%, or no more than 200% of) that of a positive control.
  • a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a metabolite at a level (such as absolute amount, concentration, or compound abundance, normalized or not) higher than a certain threshold. Additionally or alternatively, a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a metabolite at a level (such as absolute amount, concentration, or compound abundance, normalized or not) lower than a certain threshold.
  • a metabolite biomarker signature comprises, or alternatively consists essentially of, or yet further consists of a metabolite at a level (such as absolute amount, concentration, or compound abundance, normalized or not) within a certain range.
  • the threshold and range can be determined by a machine learning method as disclosed herein.
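Threshold, range, and "similar to a positive control" criteria can each be encoded as per-feature bounds. The sketch below is hypothetical: the feature labels are compounds named elsewhere in this description, but every bound is an illustrative placeholder rather than a value from the examples, and requiring all rules to hold at once is an assumption. As stated above, the actual thresholds and ranges would be learned by a machine learning method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureRule:
    """One hypothetical signature criterion on a normalized metabolite level."""
    feature: str
    lower: Optional[float] = None  # level must be >= lower when set
    upper: Optional[float] = None  # level must be <= upper when set

    def matches(self, level: float) -> bool:
        if self.lower is not None and level < self.lower:
            return False
        if self.upper is not None and level > self.upper:
            return False
        return True


def similar_to_positive_control(feature: str, positive_level: float,
                                min_pct: float = 50.0, max_pct: float = 200.0) -> FeatureRule:
    """Range rule: level between min_pct% and max_pct% of a positive control's level."""
    return FeatureRule(feature, positive_level * min_pct / 100, positive_level * max_pct / 100)


def matches_signature(sample: dict[str, float], rules: list[FeatureRule]) -> bool:
    """Assume a sample fits only if every feature is present and every rule holds."""
    return all(r.feature in sample and r.matches(sample[r.feature]) for r in rules)


# Placeholder bounds; "a lower level indicates influenza" motivates the upper bounds.
rules = [
    FeatureRule("350.0774@9.34", upper=0.8),
    FeatureRule("957.3750@9.28", upper=0.6),
    similar_to_positive_control("106.0865@10.34", positive_level=2.0),  # 1.0 to 4.0
]
sample = {"350.0774@9.34": 0.5, "957.3750@9.28": 0.4, "106.0865@10.34": 3.0}
print(matches_signature(sample, rules))  # True
```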
  • having a metabolite biomarker signature as disclosed herein in a biological sample isolated from a subject or patient is indicative of the subject or patient having a medical condition, or having a high possibility or risk (such as more than 50%, or more than 60%, or more than 70%, or more than 75%, or more than 80%, or more than 85%, or more than 90%, or more than 95%, or more than 96%, or more than 97%, or more than 98%, or more than 99%, or about 100%) of having a medical condition.
  • having a feature profile, in a biological sample isolated from a subject or patient, that does not fall within a metabolite biomarker signature as disclosed herein is indicative of the subject or patient not having a medical condition, or having a low possibility or risk (such as less than 50%, or less than 45%, or less than 40%, or less than 35%, or less than 30%, or less than 25%, or less than 20%, or less than 15%, or less than 10%, or less than 5%, or less than 1%) of having a medical condition.
  • the metabolite biomarker signature comprises, or alternatively consists of, or yet further consists of a decision tree or ensembled decision trees as determined by a machine learning method as disclosed herein.
  • applying or using a metabolite biomarker signature as used herein refers to comparing feature(s) of a biological sample from a patient with the metabolite biomarker signature, or inputting feature(s) of a biological sample from a patient into the decision tree or ensemble of decision trees of the metabolite biomarker signature, and optionally determining or outputting whether the patient has the medical condition, or the possibility or risk of the patient having the medical condition.
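Inputting a patient's features into a decision tree or an ensemble of trees amounts to walking each tree from root to leaf and combining the leaf outputs. The sketch below uses a hypothetical nested-dict tree encoding, averages leaf probabilities as a bagging-style ensemble would (a gradient-boosted ensemble would instead sum per-tree scores on the log-odds scale), and applies a 50% risk cutoff. All split thresholds and leaf probabilities are invented for illustration.

```python
from statistics import mean
from typing import Any

Node = dict[str, Any]  # hypothetical encoding: split nodes, or {"p": probability} leaves


def predict_tree(node: Node, features: dict[str, float]) -> float:
    """Follow splits (value <= threshold goes left) down to a leaf and return its probability."""
    while "p" not in node:
        value = features[node["feature"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["p"]


def predict_ensemble(trees: list[Node], features: dict[str, float]) -> float:
    """Average the per-tree probabilities of having the medical condition."""
    return mean(predict_tree(tree, features) for tree in trees)


tree_a = {"feature": "350.0774@9.34", "threshold": 0.8,
          "left": {"p": 0.9}, "right": {"p": 0.2}}
tree_b = {"feature": "84.0447@0.81", "threshold": 1.5,
          "left": {"p": 0.3},
          "right": {"feature": "350.0774@9.34", "threshold": 1.0,
                    "left": {"p": 0.95}, "right": {"p": 0.4}}}

patient = {"350.0774@9.34": 0.5, "84.0447@0.81": 2.0}
risk = predict_ensemble([tree_a, tree_b], patient)
has_condition = risk > 0.5  # the "more than 50%" risk example from the text
print(round(risk, 3), has_condition)  # 0.925 True
```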
  • the method further comprises identifying, using the metabolite biomarker signature, the medical condition in the patient based on an analysis of the biological sample.
  • the medical condition is a microbial infection.
  • the method further comprises recognizing, based on the metabolite biomarker signature, in the patient the medical condition that is an infection by a respiratory virus optionally selected from the group of influenza virus, respiratory syncytial virus, parainfluenza virus, metapneumovirus, rhinovirus, coronavirus, adenovirus, or bocavirus.
  • influenza is selected from influenza type A (influenza A) or a subtype thereof, influenza type B (influenza B) or a lineage thereof, influenza type C (influenza C), or influenza type D (influenza D).
  • the subtype of influenza A is selected from H1, optionally H1N1; H3, optionally H3N2; H5, optionally selected from H5N1, H5N2, H5N3, H5N4, H5N5, H5N6, H5N7, H5N8, or H5N9; H7, optionally selected from H7N1, H7N2, H7N3, H7N4, H7N5, H7N6, H7N7, H7N8, or H7N9; or H9, optionally selected from H9N1, H9N2, H9N3, H9N4, H9N5, H9N6, H9N7, H9N8, or H9N9.
  • the lineage of influenza B is selected from Victoria or Yamagata.
  • the coronavirus is selected from the group of HCoV-OC43, HCoV-HKU1, HCoV-229E, HCoV-NL63, severe acute respiratory syndrome coronavirus (SARS-CoV or SARS-CoV-1), Middle East respiratory syndrome coronavirus (MERS-CoV), or severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).
  • the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects having the medical condition. Additionally or alternatively, the plurality of subjects comprises, or alternatively consists essentially of, or yet further consists of subjects without the medical condition. In some embodiments, the plurality of subjects may comprise, or alternatively consist essentially of, or yet further consist of subjects who are immunocompromised. Additionally or alternatively, the plurality of subjects may comprise, or alternatively consist essentially of, or yet further consist of subjects who are not immunocompromised. In some embodiments, the plurality of subjects may comprise, or alternatively consist essentially of, or yet further consist of subjects having been treated with a therapy neutralizing the pathogen causing the medical condition. Additionally or alternatively, the plurality of subjects may comprise, or alternatively consist essentially of, or yet further consist of subjects not having been treated with a therapy neutralizing the pathogen causing the medical condition.
  • the method further comprises determining or isolating or both determining and isolating the one or more metabolites from the plurality of biological samples.
  • determining or analyzing a metabolite refers to obtaining a feature of the metabolite, such as presence or absence of the metabolite or level (such as absolute amount, or concentration, or compound abundance, normalized or not) of the metabolite.
  • Such a feature can be obtained via a test as disclosed herein.
  • the method further comprises detecting a pathogen causing the medical condition in the biological sample, for example by reverse transcription polymerase chain reaction (RT-PCR) or an immunofluorescence assay.
  • detection of the pathogen further indicates the subject has the medical condition.
  • no detection of the pathogen further indicates the subject does not have the medical condition.
  • the method further comprises culturing the biological sample under a condition suitable for growth of the pathogen causing the medical condition prior to the detecting step.
  • the method further comprises detecting an immunoglobulin or an immune cell specifically recognizing and binding a pathogen causing the medical condition in the biological sample, for example, by an immunofluorescence assay.
  • detection of the immunoglobulin or the immune cell or both further indicates the subject has the medical condition.
  • no detection of the immunoglobulin, the immune cell, or both further indicates the subject does not have the medical condition.
  • the method further comprises administering to the patient having the medical condition a therapy specifically for treating the condition.
  • the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent neutralizing a pathogen causing the medical condition.
  • the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent not neutralizing the pathogen, such as the influenza virus.
  • the therapy comprises, or alternatively consists essentially of, or yet further consists of a pharmaceutical agent not neutralizing the extracellular pathogen or the extracellular influenza virus.
  • the therapy or the pharmaceutical agent is administered in an effective amount, for example in treating the medical condition, or in changing a metabolite feature to one indicative of not having the medical condition using the metabolite biomarker signature, or both.
  • the medical condition is an infection caused by a pathogen.
  • the therapy is specifically for treating the pathogen.
  • corresponding therapies are known to one of skill in the art and are listed, for example, at www.drugs.com.
  • the medical condition is an infection caused by a bacterium.
  • the therapy is specifically for treating a bacterial infection.
  • the therapy is an antibiotic, or an antibody or an antigen binding fragment thereof specifically recognizing and binding the bacterium, or both.
  • the medical condition is a viral infection.
  • the therapy is an anti-viral therapy.
  • the anti-viral therapy is selected from an antibody or an antigen binding fragment thereof specifically recognizing and binding the virus; an inhibitor that inhibits transcription of the viral genome, such as a DNA polymerase inhibitor or a reverse transcriptase inhibitor; a protease inhibitor that inhibits the post-translational processing of the virus; an agent that inhibits the virus from attaching to or penetrating the host cell; an immunomodulator that induces production of host cell enzymes which stop viral reproduction; an integrase strand transfer inhibitor that prevents integration of the viral DNA into the host DNA by inhibiting the viral enzyme integrase; a neuraminidase inhibitor that blocks viral enzymes and inhibits reproduction of the virus; or any combination thereof.
  • the anti-viral therapy is selected from an adamantane antiviral, an antiviral booster, an antiviral interferon, a chemokine receptor antagonist, an integrase strand transfer inhibitor, a miscellaneous antiviral, a neuraminidase inhibitor, a non-nucleoside reverse transcriptase inhibitor (NNRTI), a non-structural protein 5A (NS5A) inhibitor, a nucleoside reverse transcriptase inhibitor (NRTI), a protease inhibitor, a purine nucleoside, or any combination thereof.
  • drugs.com/drug-class/adamantane-antivirals.html, drugs.com/drug-class/antiviral-boosters.html, drugs.com/drug-class/antiviral-combinations.html, drugs.com/drug-class/antiviral-interferons.html, drugs.com/drug-class/chemokine-receptor-antagonist.html, drugs.com/drug-class/integrase-strand-transfer-inhibitor.html, drugs.com/drug-class/miscellaneous-antivirals.html, drugs.com/drug-class/neuraminidase-inhibitors, drugs.com/drug-class/nnrtis.html, drugs.com/drug-class/ns5a-inhibitors.html, drugs.com/drug-class/nrtis.html, drugs.com/drug-class/protease-inhibitors.html, and drugs.com/drug-class/purine-nucleosides.html, each of which is incorporated herein by reference in its entirety.
  • a method for selecting a subject for an anti-influenza treatment comprises, or alternatively consists essentially of, or yet further consists of determining a feature of a metabolite in a biological sample isolated from a subject suspected of having a medical condition, namely infection with an influenza virus. Accordingly, the influenza virus may be referred to herein as a pathogen causing the medical condition.
  • the metabolite is selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having a mass-to-charge ratio (m/z) of 106.0865 and a retention time (RT) of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 145.0935 and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound having an m/z of 214.1306 and an RT of 10.85 or an equivalent thereof, a compound having
  • the metabolite is selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having a mass-to-charge ratio (m/z) of 106.0865 and a retention time (RT) of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 144.0935h and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound having an m/z of 214.1306 and an RT of 10.85 or an equivalent thereof, a compound
  • the metabolite is selected from one or more of: pyroglutamic acid, an in-source fragment ion of pyroglutamic acid, formylmethyl glutathione, a compound having a mass-to-charge ratio (m/z) of 106.0865 and a retention time (RT) of 10.34 or an equivalent thereof, a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 144.0935h and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 145.0935 and an RT of 8.36 or an equivalent thereof, a compound having an m/z of 178.1441 and an RT of 10.33 or an equivalent thereof, a compound having an m/z of 201.0740 and an RT of 3.21 or an equivalent thereof, a compound having an m/z of 211.1376 and an RT of 8.65 or an equivalent thereof, a compound
  • the metabolite is selected from one or more of: a compound having an m/z of 130.0507 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 84.0447 and an RT of 0.81 or an equivalent thereof, a compound having an m/z of 106.0865 and an RT of 10.34 or an equivalent thereof, a compound having an m/z of 422.1307 and an RT of 4.73 or an equivalent thereof, a compound having an m/z of 350.0774 and an RT of 9.34 or an equivalent thereof, a compound having an m/z of 249.1085 and an RT of 10.87 or an equivalent thereof, or a compound having an m/z of 957.3750 and an RT of 9.28 or an equivalent thereof
  • an altered (such as increased or decreased) level of the metabolite in the sample as compared to a control level of the metabolite indicates that the subject is suitable for an anti-influenza treatment.
  • the in-source fragment ion of pyroglutamic acid is pyroglutamic acid-D5.
  • a lower level of pyroglutamic acid indicates the subject is suitable for an anti-influenza treatment.
  • a lower level of a compound having an m/z of 349.0774h and an RT of 9.34 or an equivalent thereof, for example, compared to a control indicates the subject has the medical condition (such as influenza).
  • a lower level of a compound having an m/z of 350.0774 and an RT of 9.34 or an equivalent thereof, for example, compared to a control indicates the subject has the medical condition (such as influenza).
  • a lower level of a compound having an m/z of 956.3750h and an RT of 9.28 or an equivalent thereof, for example, compared to a control indicates the subject has the medical condition (such as influenza).
  • a lower level of a compound having an m/z of 957.3750 and an RT of 9.28 or an equivalent thereof, for example, compared to a control indicates the subject has the medical condition (such as influenza).
  • a lower level of a compound having an m/z of 352.2131h and an RT of 10.89 or an equivalent thereof, for example, compared to a control indicates the subject has the medical condition (such as influenza).
  • a lower level of a compound having an m/z of 353.2131 and an RT of 10.89 or an equivalent thereof, for example, compared to a control indicates the subject has the medical condition (such as influenza).
  • the subject is immune-compromised. Additionally or alternatively, the subject has been treated with a therapy neutralizing the influenza viral infection.
  • the subject is a human. In some embodiments, the subject is an adult. In other embodiments, the subject is a child.
  • a metabolite compound is identified by its m/z and RT.
  • a metabolite compound as disclosed herein is identified by its m/z and RT under the experimental setting as detailed in the examples.
  • an equivalent of a reference metabolite compound may refer to a metabolite compound identified by its m/z and RT under an experimental setting different from the one detailed in the examples.
  • the equivalent’s m/z or RT or both differ from the m/z or RT or both of the reference.
  • the equivalent shares the same m/z and RT as the reference.
  • an equivalent of a metabolite compound is the actual compound, such as pyroglutamic acid, having the same m/z and RT.
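Because compounds here are identified by m/z and RT, checking a newly measured feature against the reference compounds is a tolerance search. The sketch below matches an observed feature against the seven m/z and RT pairs listed above for influenza. The 10 ppm mass tolerance and 0.2 min RT window are assumptions, not values from the source, and an equivalent measured under a different chromatographic setting would generally need its RT aligned to the reference conditions before such a comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float  # mass-to-charge ratio
    rt: float  # retention time, in the units used by the examples (assumed minutes)

    @property
    def label(self) -> str:
        return f"{self.mz:.4f}@{self.rt:.2f}"


# m/z and RT pairs listed in this description for influenza.
REFERENCE = [
    Feature(130.0507, 0.81), Feature(84.0447, 0.81), Feature(106.0865, 10.34),
    Feature(422.1307, 4.73), Feature(350.0774, 9.34), Feature(249.1085, 10.87),
    Feature(957.3750, 9.28),
]


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Mass error in parts per million relative to the reference m/z."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def match_feature(obs: Feature, refs: list[Feature],
                  ppm_tol: float = 10.0, rt_tol: float = 0.2) -> list[Feature]:
    """Return references within both tolerances, closest in mass first."""
    hits = [r for r in refs
            if abs(ppm_error(obs.mz, r.mz)) <= ppm_tol and abs(obs.rt - r.rt) <= rt_tol]
    return sorted(hits, key=lambda r: abs(ppm_error(obs.mz, r.mz)))


print([r.label for r in match_feature(Feature(130.0510, 0.83), REFERENCE)])  # ['130.0507@0.81']
```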
  • the control is the level of the metabolite as measured in a sample isolated from a subject not suffering from an influenza infection, or an average level measured from a plurality of subjects not suffering from an influenza infection.
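Comparing a sample with a control defined as the average level in subjects without influenza can be expressed as a ratio. In the minimal sketch below, the function names are hypothetical and the 0.5 cutoff for calling a level "lower" is a placeholder; the source does not specify how large a decrease is required.

```python
from statistics import mean


def ratio_to_control(level: float, control_levels: list[float]) -> float:
    """Sample level divided by the average level in influenza-negative subjects."""
    control = mean(control_levels)
    if control <= 0:
        raise ValueError("control level must be positive")
    return level / control


def is_lower_than_control(level: float, control_levels: list[float],
                          max_ratio: float = 0.5) -> bool:
    """Flag a 'lower level' when the ratio is at or below a hypothetical cutoff."""
    return ratio_to_control(level, control_levels) <= max_ratio


# Hypothetical normalized levels of the 350.0774@9.34 feature.
negative_controls = [1.8, 2.2, 2.0]
print(is_lower_than_control(0.6, negative_controls))  # True
print(is_lower_than_control(1.9, negative_controls))  # False
```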
  • the method further comprises detecting the influenza virus in the biological sample by reverse transcription polymerase chain reaction (RT-PCR) or an immunofluorescence assay.
  • detection of the influenza virus further indicates the subject has the infection.
  • no detection of the influenza virus further indicates the subject does not have the infection.
  • the method further comprises culturing the biological sample under a condition suitable for growth of the influenza virus prior to the detecting step by RT-PCR or an immunofluorescence assay.
  • the method further comprises detecting an immunoglobulin or an immune cell specifically recognizing and binding the influenza virus in the biological sample by an immunofluorescence assay.
  • detection of the immunoglobulin or the immune cell or both further indicates the subject has the infection.
  • no detection of the immunoglobulin, the immune cell, or both further indicates the subject does not have the infection.
  • the method further comprises administering to the subject having the infection an anti-influenza therapy.
  • the anti-influenza therapy comprises, or alternatively consists essentially of, or yet further consists of a neuraminidase inhibitor, an M2 channel blocker, an antibody neutralizing an influenza virus, or any combination thereof.
  • the anti-influenza therapy, which optionally does not neutralize an influenza virus, comprises, or alternatively consists essentially of, or yet further consists of oseltamivir, oseltamivir phosphate, zanamivir, rimantadine, amantadine, peramivir, baloxavir marboxil, acetaminophen, dextromethorphan, pseudoephedrine, guaifenesin, phenylephrine, chlorpheniramine, diphenhydramine, or any combination thereof.
  • the medical condition is influenza A infection.
  • the anti-influenza therapy comprises, or alternatively consists essentially of, or yet further consists of amantadine, rimantadine, or any combination thereof.
  • the therapy or the pharmaceutical agent is administered in an effective amount, for example in treating the influenza infection, or in changing a metabolite feature to one indicative of not having the influenza infection using the metabolite biomarker signature, or both.
  • the feature of the biological sample comprises, or alternatively consists essentially of, or yet further consists of presence or absence of one or more of the metabolites.
  • the feature of the biological sample comprises, or alternatively consists essentially of, or yet further consists of a level (such as absolute amount, compound abundance, or concentration) of one or more of the metabolites.
  • the level is an extracellular level.
  • a metabolite is an intracellular metabolite.
  • the level is normalized to an internal standard, or to the mean compound abundance. Additionally or alternatively, the level is normalized by subtracting a level of a negative control.
  • the negative control is a subject free of the medical condition. In other embodiments, the negative control is a solution immersing or diluting the biological sample prior to performing the test. In some embodiments, the internal standard is a labeled standard D5-pyroglutamic acid.
  • the biological sample is a nasopharyngeal sample, a blood sample, a serum sample, a plasma sample, or a urine sample.
  • the biological sample is a nasopharyngeal swab or viral transport medium (VTM) immersing a nasopharyngeal swab.
  • a metabolite feature is normalized by subtracting a level of the viral transport medium.
  • a system comprising, or alternatively consisting essentially of, or yet further consisting of a processor and a memory.
  • the memory comprises, or alternatively consists essentially of, or yet further consists of instructions that are executable by the processor to cause the machine learning system to: (a) generate a training dataset based on one or more tests run on biological samples from a plurality of subjects having a medical condition, wherein the biological samples comprise, or alternatively consist essentially of, or yet further consist of metabolites from the medical condition, and wherein the training dataset comprises, or alternatively consists of, or yet further consists of a set of features identified through the one or more tests run on the biological samples; (b) produce a metabolite biomarker signature by: (i) applying one or more machine learning models to the training dataset comprising the set of identified features; (ii) selecting a subset of the set of features based on contributions to model predictions; and (iii) generating the metabolite biomarker signature based on
  • a kit for use in a method as disclosed herein comprises, or alternatively consists essentially of, or yet further consists of one or more of an instruction for use, methanol, formic acid, ammonium formate salt, ammonium hydroxide, water, acetonitrile, isopropanol, an MS calibration and reference mass solution, or an MS metabolite library of standards. Additionally or alternatively, the kit comprises, or alternatively consists essentially of, or yet further consists of a system as disclosed herein.
  • a non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform a method or a step thereof as described herein.
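The system described above selects a subset of features "based on contributions to model predictions", and Example 2 uses SHAP for that purpose. SHAP approximates Shapley values efficiently (for tree ensembles, via TreeSHAP); the sketch below instead computes exact Shapley values by enumerating every feature coalition, which is only feasible for a handful of features. Replacing absent features with a single baseline vector is a simplifying assumption (SHAP typically averages over a background dataset), and the three-feature toy model is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x); features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical risk score with an interaction between features 0 and 2."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x, baseline = [1.0, 2.0, 2.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])  # [2.5, -2.0, 0.5]
# Efficiency property: contributions sum to model(x) - model(baseline).
print(round(sum(phi), 6), toy_model(x) - toy_model(baseline))  # 1.0 1.0
```

The interaction term is split equally between features 0 and 2, which is the behavior that makes Shapley values a principled basis for ranking features by their contribution.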
  • a rapid and simple diagnosis of a medical condition, such as a respiratory viral infection, in children and adults.
  • This technique has many potential advantages. For example, it can be developed into a low-complexity, rapid point-of-care test; it is cheaper than existing diagnostics; it is less invasive (for example, it could be adapted for urine-based diagnosis instead of a nasopharyngeal swab); and it allows identification of metabolic pathways involved in a medical condition, such as pathways used by respiratory viruses, that could pave the way for new therapeutic targets in the future.
  • a more streamlined LC/MS system is developed to accelerate turnaround time.
  • the method is validated using a blind trial set.
  • the method and system as disclosed herein are suitable for a point-of-care test in a simple format.
  • the technology is expanded to transplant viruses, tropical medicine and diagnosis of congenital infections.
  • the human metabolome encompasses lipids, carbohydrates, and metabolic intermediates (e.g., organic acids, amino acids, and acylcarnitines). Detection of these diverse compound classes using liquid chromatography-mass spectrometry (LC-MS) currently requires multiple chromatographic techniques. Commonly, lipidomic methods use reversed-phase (RP) chromatography, hydrophilic interaction chromatography (HILIC), or direct infusion, and glycomic methods use HILIC. Because methods based on either RP or HILIC alone can miss key metabolites, results from the independent use of these two approaches are often combined to capture and detect the full range of compounds by full-scan MS. However, this approach requires separate sample preparations for each chromatographic technique and leads to overlapping datasets (i.e., the same metabolite detected by both techniques) that must be meticulously curated to achieve a single, unique result set.
  • Various chromatographic strategies have been investigated to address limitations of the independent use of RP and HILIC. These include RP methods using mobile phase modifiers, such as ion-pairing reagents or ammonium fluoride, and columns with increased polar retention, such as C18-pentafluorophenyl (PFP) and porous graphitic carbon (Hypercarb), as well as combined RP-HILIC or HILIC-RP arrangements. While these strategies expand metabolome coverage, they are unable to resolve key pathognomonic metabolites (e.g., alloisoleucine, seen in maple syrup urine disease) without sacrificing negative mode ionization, or they require at least two LC systems to overcome mobile phase incompatibility.
  • Ion-exchange (IEX) chromatography and mixed-mode IEX have also been investigated to widen metabolite coverage, especially to retain highly charged metabolites, but, under the conditions studied, were associated with prolonged retention of hydrophobic or highly charged compounds, or the lack of hydrophobic retention.
  • An IEX-RP configuration using a single LC system has been used to increase peak capacity in proteomic applications. In such an arrangement, the IEX column would retain highly charged metabolites, and the RP column would separate the remaining, less polar metabolites.
  • the method comprises, or alternatively consists essentially of, or yet further consists of: separating components of the biological sample via reversed-phase (RP) chromatography to obtain an eluate; subjecting the eluate to separation via ion-exchange (IEX) chromatography or mixed-mode IEX chromatography; and detecting the separated compounds to determine the components of the biological sample.
  • the biological sample comprises, or alternatively consists essentially of, or yet further consists of lipids, carbohydrates, and metabolic intermediates.
  • the biological sample comprises, or alternatively consists essentially of, or yet further consists of polar and non-polar metabolites.
  • the detecting step is performed using mass spectrometry.
  • the detecting step includes qualitative analysis.
  • the biological sample is separated in the RP chromatography and IEX chromatography with one solvent gradient. In some embodiments, there is no switching valve between the RP chromatography and IEX chromatography. In some embodiments, isomers of metabolites in the biological sample are separated. Representative methods are described in US Patent Application No. 17/207,295 filed March 19, 2021, which is incorporated herein by reference in its entirety.
  • Example 1 Novel metabolomics approach for the diagnosis of respiratory viruses directly from nasopharyngeal specimens
  • Respiratory virus infections, including influenza A and B, are important causes of morbidity and mortality among pediatric and adult patients. These viruses infect respiratory epithelial cells, where they may induce metabolite alterations.
  • The use of liquid chromatography (LC) combined with mass spectrometry (MS) was investigated for the study of host cell metabolite alterations to diagnose and differentiate respiratory viruses. Rapid identification of respiratory viruses may have important implications for patient management, optimization of infection control measures, and antimicrobial stewardship.
  • Table 1 Liquid chromatography (LC) Gradient-time table.
  • Results A total of 130 samples were tested by Q-TOF LC/MS, including 120 positive samples (10 samples of each viral target), as well as 10 clinical specimens collected from patients with acute respiratory symptoms and negative respiratory virus RT-PCR.
  • Viruses tested included adenovirus, coronavirus, influenza A H1N1 and H3N2, influenza B, human metapneumovirus, parainfluenza 1, 2, 3 and 4, respiratory syncytial virus (RSV), and rhinovirus.
  • Q-TOF LC/MS allowed identification of key metabolites that distinguished all virus-positive samples from the negative group (FIGs. 1A-1C), as well as differentiating these respiratory viruses from one another. Clear differentiation was also seen between influenza A H1N1 and H3N2 subtypes (FIG. 1D), and between parainfluenza types.
  • Example 2 Novel metabolomics approach combined with machine learning for the diagnosis of influenza from nasopharyngeal specimens
  • Respiratory virus infections are important causes of morbidity and mortality and may induce host metabolite alterations by infecting respiratory epithelial cells.
  • machine learning was investigated to identify distinct metabolic signatures from nasopharyngeal samples for the diagnosis of respiratory tract infection.
  • Nasopharyngeal swab samples positive and negative for influenza A and B were analyzed by LC/Q-TOF to identify distinct metabolic signatures for diagnosis of acute illness.
  • Machine learning models were used for classification, followed by SHapley Additive exPlanations (SHAP) analysis to assess feature importance and support biomarker discovery.
  • A total of 236 samples were tested in the discovery phase by LC/Q-TOF, including 118 positive samples (40 influenza A 2009 H1N1, 39 influenza A H3, and 39 influenza B) and 118 age- and sex-matched negative controls with acute respiratory illness.
  • LC/Q-TOF combined with machine learning analysis allowed identification of key metabolites that distinguished influenza-positive from negative samples, with an area under the receiver operating characteristic curve (AUC) of 0.94 (95% CI 0.88, 1.00).
  • This signature showed an AUC of 0.99 (95% CI 0.97, 1.00), a sensitivity of 1.00 (95% CI 0.93, 1.00), and a specificity of 0.69 (95% CI 0.55, 0.80).
  • This metabolomic approach may be used for diagnostic applications in infectious disease testing directly from patient samples and may eventually be adapted for point-of-care testing.
  • A metabolomics method for the diagnosis of infectious diseases is described, based on an in-line, two-column chromatographic arrangement that captures both non-polar and polar compounds in a single 20-minute run.
  • This method is used for the characterization of host metabolite signatures directly from patient specimens using Liquid Chromatography Quadrupole Time-of-Flight (LC/Q-TOF), followed by a machine learning (ML) algorithm developed for metabolomics classification analysis and biomarker discovery.
  • The LC/Q-TOF method was used to compare metabolite profiles between influenza-positive samples (influenza A H1N1, influenza A H3, and influenza B) and negative samples.
  • Untargeted metabolomics discovery analysis identified a total of 3318 features. After all LC/Q-TOF features were ranked by importance, the top 20 ion features associated with classification were retained, of which only 13 contributed more than 1% to model predictions (FIGs. 4A-4B). A model trained using only the top feature, 84.0447@0.81 (accurate mass @ retention time), had an AUC of 0.92 (95% CI 0.84, 1.00). Models trained using the top 3, 5, and 7 features obtained AUCs of 0.98 (95% CI 0.96, 1.00), 1.00 (95% CI 0.99, 1.00), and 0.99 (95% CI 0.99, 1.00), respectively (FIG. 4C).
  • The top 20 biomarker signature showed an overall AUC of 1.00 (95% CI 0.998, 1.00), a sensitivity of 0.94 (95% CI 0.83, 0.98), and a specificity of 1.00 (95% CI 0.93, 1.00) (FIG. 5).
  • Heatmap analysis showed that the top 20 biomarker signature varied slightly by influenza subtype compared with the negative subgroup (FIG. 6).
  • Metabolite identification through in-house library matching revealed a tier 1 match for compound 130.0507@0.81 as pyroglutamic acid, and identified compound 84.0447@0.81 as an in-source fragment ion of pyroglutamic acid. Furthermore, compound 350.0774@9.34 was found to be consistent with formylmethyl glutathione; confirmation of this identification is under investigation. Further annotation work will be required for the other listed metabolites, as these did not definitively match the in-house library or large database screening (Table 3).
  • The top 20 biomarker signature identified by LC/Q-TOF was adapted to LC/MS-MS testing on a 96-sample set and sustained high performance. Because LC/MS-MS is already employed in multiple laboratories for routine clinical testing, this proof of concept supports the feasibility of adapting and rolling out the method to other centralized laboratory facilities (Seger et al. Clin Biochem. 2020;82:2-11; and Garg et al. Methods Mol Biol. 2016;1383:1-10).
  • Molecular testing has revolutionized viral diagnostics in clinical laboratories, with multiplexed reverse-transcriptase polymerase chain reaction (RT-PCR) representing the current standard of care for the diagnosis of respiratory viral infections.
  • RT-PCR reverse-transcriptase polymerase chain reaction
  • Metabolomics, the large-scale study of small molecules, is the ‘-omics’ technology closest to phenotype and thus holds promise for addressing current gaps in molecular testing for infectious diseases (Johnson et al. Nat Rev Mol Cell Biol 2016; 17(7): 451-9; Patti et al. Nat Rev Mol Cell Biol 2012; 13(4): 263-9; and Fiehn et al. Plant Mol Biol 2002; 48(1-2): 155-71). This is particularly important given the significant burden of respiratory viruses in the U.S. and internationally (Centers for Disease Control and Prevention. Disease Burden of Influenza. Available at: cdc.gov/flu/about/burden/index.html. Accessed March 6, 2020).
  • The top 20 ion features retained in the biomarker signature likely represent a heterogeneous group of compounds from a variety of biological pathways.
  • The top two ion features were identified through in-house library matching as pyroglutamic acid (130.0507@0.81) and an in-source fragment ion of pyroglutamic acid (84.0447@0.81), both of which were decreased in specimens from influenza-infected individuals.
  • Pyroglutamic acid (synonyms: pidolic acid, 5-oxoproline) is a cyclized derivative of L-glutamic acid that can form in the living cell in one of three ways: from the degradation of glutathione, from incomplete reactions following glutamate activation, or from the degradation of proteins carrying pyroglutamic acid at the N-terminus (Kumar. Current Science. 2012;102(2):288-97).
  • ROS reactive oxygen species
  • The infected A549 cells were washed and lysed prior to metabolite analysis, and showed upregulation of glutathione metabolism with an increase in the intracellular concentration of pyroglutamic acid.
  • In contrast, the results herein show a decrease in pyroglutamic acid in NP swabs from influenza-infected individuals. Because the specimens used herein were not washed or lysed, this decrease may reflect lower extracellular concentrations resulting from increased glutathione use in the intracellular space. Alternatively, a more complex mechanism involving oxidative stress and upstream metabolic effects may be at play.
  • This study was based on a real-world, diverse population of individuals naturally infected with influenza, which may better reflect metabolic changes than studies in experimentally infected, previously healthy volunteers. The population included children and adults in inpatient and outpatient settings, with a high proportion of immunocompromised individuals. In addition, the study used a newly adapted in-line, two-column LC arrangement that provides highly accurate results in a single injection. Standard practice in untargeted metabolomics is to perform a minimum of 4 runs (positive mode, negative mode, polar, and non-polar), which increases imprecision, turnaround time, and the complexity of downstream analysis (Gertsman et al.
  • The research objective was to assess the diagnostic performance of LC/Q-TOF (biomarker discovery cohort) and targeted analysis (validation cohort) for distinguishing influenza-infected from uninfected individuals, and to identify key metabolites for classifying these two groups.
  • The target sample size was determined before the experiments to achieve over 90% power, based on an AUC of 0.925, for detecting a difference in the primary outcome of influenza infection vs no infection.
  • A secondary endpoint of influenza A vs influenza B was established in the study design phase and used as an exploratory endpoint. The target sample size was not changed during the study.
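The source gives only the target power (over 90%) and the anticipated AUC (0.925); it does not name the statistical test or the null AUC. As one hedged illustration, the sketch below uses the Hanley-McNeil variance approximation and a z-test against an assumed null AUC of 0.5 with two-sided alpha = 0.05. The function names are hypothetical, and different assumptions (for example a null AUC above 0.5, or powering subgroup analyses) would give very different sample sizes.

```python
from math import sqrt
from statistics import NormalDist


def hanley_mcneil_var(auc: float, n_pos: int, n_neg: int) -> float:
    """Approximate variance of an empirical AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc**2 / (1 + auc)
    num = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc**2)
           + (n_neg - 1) * (q2 - auc**2))
    return num / (n_pos * n_neg)


def auc_power(n_pos: int, n_neg: int, auc_alt: float,
              auc_null: float = 0.5, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test of H0: AUC == auc_null."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se0 = sqrt(hanley_mcneil_var(auc_null, n_pos, n_neg))
    se1 = sqrt(hanley_mcneil_var(auc_alt, n_pos, n_neg))
    # H0 is rejected when the observed AUC exceeds auc_null + z_crit * se0;
    # power is the probability of that event when the true AUC is auc_alt.
    # The (negligible) lower rejection tail is ignored.
    return 1 - z.cdf((auc_null + z_crit * se0 - auc_alt) / se1)


def min_group_size(auc_alt: float, target_power: float = 0.9,
                   auc_null: float = 0.5, alpha: float = 0.05) -> int:
    """Smallest equal per-group size whose approximate power exceeds target_power."""
    if not auc_null < auc_alt < 1:
        raise ValueError("need auc_null < auc_alt < 1")
    n = 2
    while auc_power(n, n, auc_alt, auc_null, alpha) <= target_power:
        n += 1
    return n
```

Under the null (AUC = 0.5) the Hanley-McNeil formula reduces to the familiar Mann-Whitney variance (n_pos + n_neg + 1) / (12 n_pos n_neg), which is a quick sanity check on the implementation.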
  • NAAT nucleic acid amplification test
  • LC/Q-TOF testing was performed to generate raw data on mass-to-charge ratio and retention time for each sample tested. Single-replicate testing was performed, and outlier data points were included in the analysis.
  • LC/MS-MS testing was performed to generate raw data on mass-to-charge ratio and retention time for each sample tested. Single-replicate testing was performed, and outlier data points were included in the analysis. This method served to confirm the LC/Q-TOF results in a separate participant cohort.
  • LC/Q-TOF method: Liquid chromatography (LC) separation was performed on an Agilent 1290 Quaternary LC system (Agilent Technologies). In this unique chromatographic arrangement, two columns were used in-line: a reverse-phase (RP) HSS T3 column, 2.1 x 50 mm, 1.8 µm (Waters Corporation, Milford, MA), was placed first, followed by an ion exchange (IEX) Intrada column, 2.0 x 30 mm, 3 µm (Imtakt USA, Portland, OR). The columns were joined with EXP2 fittings (Optimize Technologies, OR). Mass spectrometry was performed on an Agilent 6545 Q-TOF instrument with electrospray ionization.
  • RP reverse-phase
  • IEX ion exchange
  • The optimized mobile phases were (A) 150 mg of ammonium formate per liter of water with 0.4% formic acid (v/v), (B) 1.2 g of ammonium formate per liter of methanol with 0.2% formic acid, and (C) water with 1% each of formic acid and ammonium hydroxide, as previously described (Le et al. J Chromatogr B Analyt Technol Biomed Life Sci.
  • MS was performed on an Agilent 6545 Q-TOF with dual Agilent JetStream electrospray ionization, as previously described (Le et al. J Chromatogr B Analyt Technol Biomed Life Sci.
  • LC/Q-TOF metabolite extraction and analysis: A 100 µL volume of nasopharyngeal sample eluted in VTM was processed by centrifugal ultrafiltration using Pall Omega 3 kDa centrifugal devices (VWR, Radnor, PA) at 4°C for 15 minutes at 17,000 x g. The filtrate was transferred to glass vials and analyzed, with each sample run once. Two quality control (QC) samples, a pooled QC sample and an independent normalization QC, were used to assess batch effects.
  • VWR Pall Omega 3kDa centrifugal devices
  • The pooled QC was created by pooling equal-volume aliquots from all samples included in the run. Unsupervised principal component analysis was performed to visually assess the performance of the pooled QC. In addition, blank VTM was run in triplicate to generate a mean background spectral distribution. Progenesis QI software (Waters Corporation) was used for run alignment, peak picking (automatic, level 4), adduct deconvolution, and feature identification. Positive-polarity analysis was performed using the [M+H], [M+NH4], and [M+Na] adducts. Metabolite identification was first performed using a previously developed authentic standard library (Le et al. J Chromatogr B Analyt Technol Biomed Life Sci.
  • LC-MS/MS targeted methods: In some embodiments, the targeted analysis was performed using a clinically validated method that detects pyroglutamic acid (Mak et al. Methods Mol Biol. 2019;2030:85-109; and Le et al. J Chromatogr B Analyt Technol Biomed Life Sci. 2014;944:166-74). Mass spectrometry was performed on an Agilent 6460 Triple Quadrupole mass spectrometer equipped with Agilent JetStream electrospray ionization, as previously described (Mak et al. Methods Mol Biol. 2019;2030:85-109; and Le et al. J Chromatogr B Analyt Technol Biomed Life Sci.
  • Second-dimension separation used a Waters BEH C18 column, 2.1 x 100 mm, 2.5 µm (Waters Corporation).
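The source describes the pooled-QC check only as a visual principal component analysis. The sketch below computes the PCA scores one would plot, assuming (our choice, not stated in the source) that PCA is fit on the study samples only and the QC injections are then projected onto the same axes. It approximates the equal-volume pool as the mean intensity profile and uses pure-Python power iteration instead of a library PCA; all function names are hypothetical.

```python
import random
from math import sqrt


def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def covariance(rows, means):
    n, p = len(rows), len(means)
    c = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(row[i] * row[j] for row in c) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_components(cov, k=2, iters=500, seed=0):
    """Top-k eigenvectors of a symmetric matrix via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    mat = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [rng.random() for _ in range(p)]
        for _ in range(iters):
            w = [sum(mat[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        eigval = sum(v[i] * mat[i][j] * v[j] for i in range(p) for j in range(p))
        comps.append(v)
        # Remove the component just found so the next pass converges to the next one.
        mat = [[mat[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps


def pca_scores(samples, qc_rows, k=2):
    """Fit PCA on study samples only, then project samples and QC injections."""
    means = column_means(samples)
    comps = leading_components(covariance(samples, means), k)

    def project(row):
        return [sum((x - m) * c for x, m, c in zip(row, means, comp)) for comp in comps]

    return [project(r) for r in samples], [project(r) for r in qc_rows]
```

On a score plot, a well-behaved pooled QC should sit near the centre of the sample cloud, with repeated QC injections clustered tightly; drift of the QC points across a run is a visual cue for batch effects.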
  • Mobile phase A (0.03% perfluoroheptanoic acid in water) is identical for pumps 1 and 2.
  • Mobile phase B (acetonitrile) is identical for pumps 1 and 2.
  • The data were acquired using MassHunter Workstation Acquisition version B.08.02 (Agilent) and exported for ML analysis.
  • LC-MS/MS targeted methods: In some embodiments, the targeted analysis was performed using the same method described for LC/Q-TOF above, adapted for LC/MS-MS with selected reaction monitoring (SRM). Mass spectrometry was performed on a Waters Xevo TQ-XS mass spectrometer equipped with electrospray ionization. Ion features that were not detected by this method in the pooled sample tested on the Xevo TQ-XS were removed from the SRM pairs. Liquid chromatography separation was performed on a Waters Acquity H-class quaternary LC system (Waters Corp.). The two-column arrangement described for LC/Q-TOF was replicated.
  • LC-MS/MS metabolite extraction and analysis: A 100 µL volume of respiratory specimen eluted in VTM or phosphate buffered saline (PBS) and 10 µL of pyroglutamic acid-D5 0.025nm/L as internal standard (Cambridge Isotope Laboratories,
  • Machine learning methods were developed for the task of determining whether a sample was positive or negative for influenza based on its metabolic profile.
  • Machine learning is a class of techniques that uses data to learn a model that maps an input (the metabolic profile of a sample; includes mass-to-charge ratio (m/z) and retention time for each sample) to its associated output (the influenza infection outcome of the sample), and uses this learned model on new inputs (the metabolic profiles of new samples) to make predictions of new outputs (the influenza outcomes of new samples).
  • Two machine learning methods were implemented: gradient boosted decision trees and random forests.
  • Gradient boosted decision trees and random forests are both ensemble learning methods that improve upon the performance of decision tree models.
  • Decision tree learners construct a model by iteratively identifying which feature most effectively divides the data into groups with low within-group variation in the outcome and high between-group variation in outcome, and then repeat the process within each group.
  • Gradient boosted decision trees (GBDT) construct several decision trees such that each tree learns from the errors of the prior tree (Ke et al. Neural Information Processing Systems Foundations 2017).
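The idea that "each tree learns from the errors of the prior tree" can be sketched as classic gradient boosting on the log-loss: each new tree is fit to the residuals y - p of the current ensemble, which are the negative gradient of the log-loss with respect to the raw score. This is a minimal pure-Python illustration with depth-1 trees and arbitrary hyperparameters (50 trees, learning rate 0.3); function names are hypothetical. LightGBM (the LGBM implementation abbreviated in Table 5) additionally uses second-order gradient information, histogram-based split finding, and leaf-wise tree growth, none of which is shown here.

```python
from math import exp


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def fit_regression_stump(X, targets):
    """Depth-1 regression tree minimising squared error to the targets."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [v for row, v in zip(X, targets) if row[j] <= t]
            right = [v for row, v in zip(X, targets) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no feature can split the data: predict the mean
        mean = sum(targets) / len(targets)
        return lambda row: mean
    _, j, t, lv, rv = best
    return lambda row: lv if row[j] <= t else rv


def fit_gbdt(X, y, n_trees=50, learning_rate=0.3):
    """Binary gradient boosting: each stump is fit to the current residuals y - p."""
    trees = []

    def raw_score(row):
        return learning_rate * sum(tree(row) for tree in trees)

    for _ in range(n_trees):
        # y - p is the negative gradient of the log-loss w.r.t. the raw score,
        # i.e. the "errors" of the ensemble built so far.
        residuals = [yi - sigmoid(raw_score(row)) for row, yi in zip(X, y)]
        trees.append(fit_regression_stump(X, residuals))
    return lambda row: sigmoid(raw_score(row))
```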
  • Random forests (RFs) construct several decision trees such that each tree is constructed using different subsets of the data.
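A random forest can be sketched in the same spirit: each tree is grown on a bootstrap resample of the data, chooses the split that most reduces within-group variation in the outcome (here measured by Gini impurity), and the trees' outputs are averaged. The random feature subset per tree follows Breiman's standard formulation and is an assumption, as the source mentions only data subsets. For brevity each tree is a single split (a stump), whereas real forests grow much deeper trees; the function names and defaults are hypothetical.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(X, y, features):
    """(feature, threshold) with the lowest size-weighted Gini impurity, or (None, None)."""
    best_score, best_j, best_t = float("inf"), None, None
    for j in features:
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score, best_j, best_t = score, j, t
    return best_j, best_t


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagged depth-1 trees; each sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap resample
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        feats = rng.sample(range(p), max_features)  # random feature subset
        j, t = best_split(Xb, yb, feats)
        if j is None:
            continue  # this resample cannot be split on the chosen features
        left = [yi for row, yi in zip(Xb, yb) if row[j] <= t]
        right = [yi for row, yi in zip(Xb, yb) if row[j] > t]
        stumps.append((j, t, sum(left) / len(left), sum(right) / len(right)))

    def predict_proba(row):
        # Average the per-tree leaf frequencies of the positive class.
        votes = [lv if row[j] <= t else rv for j, t, lv, rv in stumps]
        return sum(votes) / len(votes) if votes else 0.5

    return predict_proba
```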
  • GBDT and RF were chosen over alternative machine learning methods because they can handle mixtures of categorical and continuous covariates, capture nonlinear relationships, and scale well to large amounts of data.
  • Dataset splitting: Ion features showing zero values across all samples tested were removed from the dataset. The remaining, unnormalized dataset was partitioned into a training set used to develop machine learning models and a holdout test set used to evaluate their predictive performance. The partitioning was random, with 80% of the samples in the training set and the other 20% in the test set. There was no overlap in samples or patients between the two sets.
  • All models were developed on the training set, and their final performance was reported on the holdout test set and/or the prospective cohort. Within the training set, cross-validation was used to develop the models and avoid overfitting to the training set.
  • Machine learning methods vs traditional linear models: To determine the usefulness of capturing non-linear relationships with machine learning models, two machine learning methods (gradient boosted decision trees and random forests) were compared with two traditional linear models, least absolute shrinkage and selection operator (LASSO) and Ridge. These models are variants of logistic regression, a statistical model that uses the logistic function to model the outcome, assuming a linear relationship between the features and the log-odds of the outcome. LASSO makes the same linear assumption but alters the model-fitting process to select only a subset of the features for the final model rather than using all of them. Unlike LASSO, Ridge does not produce a sparse model; instead, it addresses multicollinearity in the features by shrinking the weights assigned to correlated variables. The training and test sets and the cross-validation strategy were identical across the machine learning and traditional linear models.
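The splitting steps can be sketched as follows: drop features that are zero in every sample, make a random 80/20 split in which no patient contributes to both sets, then generate cross-validation folds inside the training set only. Enforcing "no overlap" by splitting at the patient level, the default k = 5, and sample-level folds (which assume one sample per patient in the training set; otherwise folds should also be grouped) are assumptions, and the helper names are hypothetical.

```python
import random


def drop_all_zero_features(X, names):
    """Remove feature columns that are zero in every sample."""
    keep = [j for j in range(len(names)) if any(row[j] != 0 for row in X)]
    return [[row[j] for j in keep] for row in X], [names[j] for j in keep]


def patient_split(patient_ids, test_frac=0.2, seed=0):
    """Random split by patient, so no patient appears in both train and test."""
    patients = sorted(set(patient_ids))
    random.Random(seed).shuffle(patients)
    n_test = round(len(patients) * test_frac)
    test_patients = set(patients[:n_test])
    train = [i for i, p in enumerate(patient_ids) if p not in test_patients]
    test = [i for i, p in enumerate(patient_ids) if p in test_patients]
    return train, test


def kfold(indices, k=5, seed=0):
    """Yield (fit, validation) index lists for k-fold CV inside the training set."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        fit = [i for g in range(k) if g != f for i in folds[g]]
        yield fit, folds[f]
```

The holdout indices returned by `patient_split` are never passed to `kfold`, which mirrors the source's rule that all model development happens on the training set.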
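The practical difference between LASSO and Ridge is the penalty added to the logistic log-loss: an L1 penalty can drive coefficients to exactly zero (a sparse model), while an L2 penalty only shrinks them. The sketch below shows this with (proximal) gradient descent and an unpenalized intercept. The solver, learning rate, iteration count, and function names are illustrative assumptions; the source does not say how its LASSO and Ridge models were fit.

```python
from math import exp


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def fit_penalized_logistic(X, y, lam, penalty, lr=0.5, iters=2000):
    """Logistic regression with an L1 (LASSO) or L2 (Ridge) penalty.

    Minimises mean log-loss + lam * ||w||_1 (L1) or + lam / 2 * ||w||_2^2 (L2);
    the intercept b is not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        err = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
               for row, yi in zip(X, y)]
        grad = [sum(e * row[j] for e, row in zip(err, X)) / n for j in range(p)]
        b -= lr * sum(err) / n
        if penalty == "l2":
            # Plain gradient step on log-loss + L2 term: weights shrink but stay non-zero.
            w = [wj - lr * (g + lam * wj) for wj, g in zip(w, grad)]
        elif penalty == "l1":
            # Gradient step on the log-loss, then soft-thresholding (the L1 proximal step),
            # which sets small coefficients to exactly zero.
            z = [wj - lr * g for wj, g in zip(w, grad)]
            w = [(abs(v) - lr * lam) * (1 if v > 0 else -1) if abs(v) > lr * lam else 0.0
                 for v in z]
        else:
            raise ValueError("penalty must be 'l1' or 'l2'")
    return w, b
```

With a large enough `lam` the L1 fit returns all-zero weights, while the L2 fit at the same `lam` keeps a non-zero weight on an informative feature; this is the sparse vs non-sparse contrast described above.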
  • LASSO least absolute shrinkage and selection operator
  • Feature importance: The SHAP (SHapley Additive exPlanations) method was used to quantify the impact of each feature on the models. The method explains each prediction by allocating credit among the input features; feature credit is calculated using Shapley values (Lundberg et al. Nat Biomed Eng 2018; 2(10): 749-60; and Lundberg. Neural Information Processing Systems Foundations 2017; 30) as the change in the expected value of the model’s prediction when a feature is observed versus unknown.
  • SHAP SHapley Additive explanations
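The Shapley credit described above can be computed exactly for a small number of features by enumerating feature subsets. In this sketch "unknown" features are replaced by a single baseline vector, which is a simplification: SHAP proper averages over a background distribution, and tools such as the `shap` library's `TreeExplainer` compute these values efficiently for tree ensembles. The cost is exponential in the number of features, so this is only for illustration; the function name is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, treating 'unknown' features as baseline values.

    phi_i = sum over subsets S of the other features of
            |S|! (M - |S| - 1)! / M! * [v(S with i) - v(S)],
    where v(S) evaluates f with every feature outside S set to its baseline value.
    """
    m = len(x)

    def value(known):
        return f([x[j] if j in known else baseline[j] for j in range(m)])

    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for combo in combinations(others, size):
                known = set(combo)
                total += weight * (value(known | {i}) - value(known))
        phi.append(total)
    return phi
```

The values satisfy the additivity property that gives SHAP its name: they sum to f(x) minus f(baseline), so each prediction is fully decomposed into per-feature contributions.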
  • Parsimonious model: A set of parsimonious models was developed using only a small subset of the features identified as important by the feature importance method. The top k features with the highest overall importance to the machine learning models were used, with k values of 1, 3, 5, and 7. For each choice of k, a single decision tree model was trained using the previously described cross-validation strategy. Maximum depth was restricted to k, and additional hyperparameters were optimized by grid search during cross-validation. The performance of the parsimonious models was compared with that of the full models.
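Selecting the inputs for the parsimonious models amounts to ranking features by global importance and slicing the top k. The sketch assumes that "overall importance" means the mean absolute SHAP value of each feature across samples (a common global summary, not confirmed by the source) and also computes each feature's share of total importance, the kind of quantity behind "13 contributed more than 1%". The feature names and SHAP values in any example are hypothetical, and the grid search over other tree hyperparameters is not shown.

```python
def rank_by_mean_abs_shap(shap_matrix, names):
    """Global importance = mean |SHAP| per feature; returns names sorted high to low."""
    n = len(shap_matrix)
    importance = {name: sum(abs(row[j]) for row in shap_matrix) / n
                  for j, name in enumerate(names)}
    ranked = sorted(names, key=lambda name: importance[name], reverse=True)
    return ranked, importance


def share_of_importance(importance):
    """Fraction of total importance per feature (e.g. to count features above 1%)."""
    total = sum(importance.values())
    return {name: value / total for name, value in importance.items()}


def parsimonious_configs(ranked, ks=(1, 3, 5, 7)):
    """Top-k feature subsets, each paired with a maximum tree depth of k."""
    return {k: {"features": ranked[:k], "max_depth": k} for k in ks}
```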
  • Subgroup analysis was used to evaluate variation in model performance across patient subpopulations.
  • A LightGBM (LGBM) model was trained on the discovery training set using the previously described cross-validation strategy and used to generate predictions on the discovery test set.
  • The test samples were then split into disjoint subpopulations, and the AUC and its DeLong confidence interval were reported for each subgroup.
  • The following subgroups were investigated: adult vs pediatric, immunocompromised vs not, ICU-admitted vs not, antibiotic-treated vs not, bacterial coinfection vs not, and time since symptom onset at respiratory viral testing (≤7 days vs >7 days).
  • AUC area under the receiver operating characteristic curve
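Per-subgroup AUCs with DeLong intervals can be sketched as below: the empirical AUC is the fraction of positive-negative pairs ranked correctly (ties count one half), and DeLong's method estimates its variance from each sample's "structural component". The plain normal-approximation interval truncated to [0, 1] is an assumption (the source does not say whether a transformation such as logit was applied), and the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, variance


def delong_auc_ci(pos_scores, neg_scores, level=0.95):
    """Empirical AUC with a DeLong (1988) variance and normal-approximation CI."""
    def psi(x, y):
        return 1.0 if x > y else 0.5 if x == y else 0.0

    m, n = len(pos_scores), len(neg_scores)
    # Structural components: each positive vs all negatives, and vice versa.
    v10 = [sum(psi(x, y) for y in neg_scores) / n for x in pos_scores]
    v01 = [sum(psi(x, y) for x in pos_scores) / m for y in neg_scores]
    auc = sum(v10) / m
    var = variance(v10) / m + variance(v01) / n
    half = NormalDist().inv_cdf(0.5 + level / 2) * sqrt(var)
    return auc, max(0.0, auc - half), min(1.0, auc + half)


def subgroup_aucs(scores, labels, groups):
    """AUC and CI for each disjoint subgroup of the test set."""
    out = {}
    for g in sorted(set(groups)):
        pos = [s for s, y, gg in zip(scores, labels, groups) if gg == g and y == 1]
        neg = [s for s, y, gg in zip(scores, labels, groups) if gg == g and y == 0]
        out[g] = delong_auc_ci(pos, neg)
    return out
```

Each subgroup needs at least two positives and two negatives for the variance estimate, which is one reason small subgroups produce wide intervals.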
  • Performance measures for the models also included sensitivity, specificity, and accuracy at a high-sensitivity operating point used to binarize the model predictions.
  • The high-sensitivity operating point was selected on the training set by aggregating the predictions across the k validation folds and then picking the threshold that produced a model sensitivity closest to 0.9.
  • Alternatively, the high-sensitivity operating point was selected by choosing an operating point on each of the k validation folds and averaging them: on each validation fold, the operating point that maximized Youden’s J statistic while producing a sensitivity of at least 0.9 was selected.
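Both operating-point rules can be sketched directly. Method 1 pools the out-of-fold predictions and picks the threshold whose sensitivity is closest to 0.9; method 2 picks, on each validation fold, the threshold maximizing Youden's J (sensitivity + specificity - 1) among those with sensitivity of at least 0.9, then averages the per-fold thresholds. Calling a score positive when it is at or above the threshold, and using the observed scores as candidate thresholds, are assumptions; the function names are hypothetical.

```python
def sens_spec(scores, labels, thr):
    """Sensitivity and specificity when scores >= thr are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
    pos = sum(labels)
    return tp / pos, tn / (len(labels) - pos)


def threshold_closest_sensitivity(scores, labels, target=0.9):
    """Method 1: pooled out-of-fold predictions, sensitivity closest to target."""
    return min(sorted(set(scores)),
               key=lambda t: abs(sens_spec(scores, labels, t)[0] - target))


def threshold_youden_min_sens(folds, min_sens=0.9):
    """Method 2: per fold, maximise Youden's J subject to sensitivity >= min_sens; average."""
    chosen = []
    for scores, labels in folds:
        candidates = []
        for t in sorted(set(scores)):
            sens, spec = sens_spec(scores, labels, t)
            if sens >= min_sens:
                candidates.append((sens + spec - 1, t))
        # The lowest observed score always gives sensitivity 1, so candidates is non-empty.
        chosen.append(max(candidates)[1])
    return sum(chosen) / len(chosen)
```

Method 1 targets a sensitivity of about 0.9 directly, while method 2 trades some sensitivity above the floor for better specificity, so the two can give quite different thresholds on the same data.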
  • 95% Wilson score confidence intervals were provided for sensitivity, specificity, and accuracy and 95% DeLong confidence intervals were provided for AUC (DeLong et al. Biometrics. 1988;44(3):837-45).
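The Wilson score interval used for sensitivity, specificity, and accuracy has a closed form. This is a minimal sketch of the standard formula; the function name is hypothetical. Unlike the simple Wald interval it behaves sensibly at proportions of exactly 0 or 1, which matters for estimates such as a sensitivity of 1.00.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes: int, n: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion such as TP / (TP + FN)."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

For example, `wilson_ci(tp, tp + fn)` gives the interval for sensitivity, `wilson_ci(tn, tn + fp)` for specificity, and `wilson_ci(tp + tn, n_total)` for accuracy.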
  • ED emergency department
  • ICU intensive care unit
  • IQR inter-quartile range
  • SD standard deviation
  • yo years-old.
  • Table 3 Metabolite annotation of the top compounds associated with differentiation of influenza-positive from negative samples.
  • Da daltons
  • m/z mass to charge ratio
  • Table 5 Subgroup analyses for AUC data for adult vs pediatrics, immunocompromised vs non-immunocompromised individuals, ICU-admitted vs non-ICU-admitted individuals, presence of bacterial coinfection or colonization or not, antibiotic treatment vs no antibiotic treatment, and time since symptom onset.
  • Bacterial coinfection or colonization was defined as a positive respiratory culture or positive molecular test for a bacterial pathogen within 7 days of the index respiratory viral testing. The number (n) corresponds to the size of the test set.
  • ATBx antibiotic
  • AUC area under the receiver operating characteristic curve
  • Cl confidence interval
  • coinfx coinfection
  • d days
  • IC immunocompromised
  • ICU intensive care unit
  • LGBM LightGBM
  • Peds pediatrics
  • RF random forests.
  • Table 8 Key resources table
  • Example 3: Novel metabolomics method for the diagnosis of SARS-CoV-2 infection and/or COVID-19 disease
  • The research objective is to assess the diagnostic performance of LC/Q-TOF (biomarker discovery cohort) and targeted analysis (validation cohort) for distinguishing SARS-CoV-2-infected from uninfected individuals, and to identify key metabolites for classifying these two groups.
  • The target sample size is determined before the experiments to achieve over 90% power, based on an AUC of 0.925, for detecting a difference in the primary outcome of SARS-CoV-2 infection vs no infection.
  • Nasopharyngeal samples collected from adults and children are processed per routine clinical procedures. Briefly, a flocked swab is inserted into the nasal passage, rotated for 10-15 seconds to collect cells, and placed in viral transport medium (MicroTest M4RT, Remel Inc., San Diego, CA). Specimens are aliquoted and stored at -80°C for subsequent LC/Q-TOF testing.
  • The following methods and analyses are performed as described in Example 2: LC/Q-TOF methods, LC/Q-TOF metabolite extraction and analysis, LC-MS/MS targeted methods, LC-MS/MS metabolite extraction and analysis, statistical analysis, machine learning analysis, dataset splitting, comparison of machine learning methods vs traditional linear models, determination of feature importance, subgroup analysis, and multivariable analysis.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Medicines Containing Antibodies Or Antigens For Use As Internal Diagnostic Agents (AREA)
  • Pharmaceuticals Containing Other Organic And Inorganic Compounds (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)

Abstract

The invention relates to methods, compositions, systems, and devices comprising applying a metabolite biomarker signature, determined by machine learning, to a biological sample from a patient in order to recognize a medical condition.
PCT/US2021/025015 2020-03-31 2021-03-30 Metabolomic approach combined with machine learning for recognizing a medical condition WO2021202620A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US202063002913P 2020-03-31 2020-03-31
US63/002,913 2020-03-31
US202063006641P 2020-04-07 2020-04-07
US63/006,641 2020-04-07

Publications (1)

Publication Number Publication Date
WO2021202620A1 true WO2021202620A1 (fr) 2021-10-07

Family

ID=77927591

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/025015 WO2021202620A1 (fr) 2020-03-31 2021-03-30 Metabolomic approach combined with machine learning for recognizing a medical condition

Country Status (1)

Country Link
WO (1) WO2021202620A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120197539A1 (en) * 2009-10-09 2012-08-02 Carolyn Slupsky Methods for diagnosis, treatment and monitoring of patient health using metabolomics
US20170241043A1 (en) * 2016-02-18 2017-08-24 The Regents Of The University Of California Predicting metabolic side effects of transported drugs
US20180107783A1 (en) * 2015-05-28 2018-04-19 Immunexpress Pty Ltd Validating biomarker measurement
US20190101544A1 (en) * 2017-09-01 2019-04-04 Venn Biosciences Corporation Identification and use of glycopeptides as biomarkers for diagnosis and treatment monitoring
WO2019200410A1 (fr) * 2018-04-13 2019-10-17 Freenome Holdings, Inc. Implementation of machine learning for multi-analyte assay of biological samples

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022240891A1 (fr) * 2021-05-10 2022-11-17 The Cleveland Clinic Foundation Salivary metabolites as non-invasive biomarkers of HCC
WO2023060297A1 (fr) * 2021-10-13 2023-04-20 Omniscient Neurotechnology Pty Limited Brain data visualization
US11754536B2 (en) 2021-11-01 2023-09-12 Matterworks Inc Methods and compositions for analyte quantification
US12100484B2 (en) 2021-11-01 2024-09-24 Matterworks Inc Methods and compositions for analyte quantification
CN114137137A (zh) * 2021-11-15 2022-03-04 上海交通大学 Method for constructing a retinoblastoma staging model, and markers
CN114243702A (zh) * 2022-01-28 2022-03-25 国网湖南省电力有限公司 Method, system, and storage medium for predicting operating parameters of a power-grid AVC system
CN114664382A (zh) * 2022-04-28 2022-06-24 中国人民解放军总医院 Multi-omics joint analysis method, apparatus, and computing device
CN116338210A (zh) * 2023-05-22 2023-06-27 天津云检医学检验所有限公司 Biomarkers and detection kit for diagnosing primary central nervous system lymphoma
CN116338210B (zh) * 2023-05-22 2023-08-11 天津云检医学检验所有限公司 Biomarkers and detection kit for diagnosing primary central nervous system lymphoma
TWI855954B (zh) 2024-01-10 2024-09-11 廣達電腦股份有限公司 System and method for building an artificial intelligence prediction model

Similar Documents

Publication Publication Date Title
WO2021202620A1 (fr) Metabolomic approach combined with machine learning for recognizing a medical condition
Grant et al. SARS-CoV-2 coronavirus nucleocapsid antigen-detecting half-strip lateral flow assay toward the development of point of care tests using commercially available reagents
Niu et al. Noninvasive proteomic biomarkers for alcohol-related liver disease
Ihling et al. Mass spectrometric identification of SARS-CoV-2 proteins from gargle solution samples of COVID-19 patients
Yan et al. Rapid detection of COVID-19 using MALDI-TOF-based serum peptidome profiling
Yadav et al. SERS based lateral flow immunoassay for point-of-care detection of SARS-CoV-2 in clinical samples
Li et al. Microfluidic magneto immunosensor for rapid, high sensitivity measurements of SARS-CoV-2 nucleocapsid protein in serum
Cui et al. Challenges and emergent solutions for LC‐MS/MS based untargeted metabolomics in diseases
Lin et al. Microfluidic immunoassays for sensitive and simultaneous detection of IgG/IgM/antigen of SARS-CoV-2 within 15 min
Karakioulaki et al. Biomarkers in pneumonia—beyond procalcitonin
Weiner 3rd et al. Metabolite changes in blood predict the onset of tuberculosis
Aggarwal et al. Role of multiomics data to understand host–pathogen interactions in COVID-19 pathogenesis
Cazares et al. Development of a parallel reaction monitoring mass spectrometry assay for the detection of SARS-CoV-2 spike glycoprotein and nucleoprotein
Chen et al. Analytical pipeline for discovery and verification of glycoproteins from plasma-derived extracellular vesicles as breast cancer biomarkers
Renuse et al. A mass spectrometry-based targeted assay for detection of SARS-CoV-2 antigen from clinical specimens
Van Puyvelde et al. Cov-MS: a community-based template assay for mass-spectrometry-based protein detection in SARS-CoV-2 patients
Su et al. Value of serum procalcitonin levels in predicting spontaneous bacterial peritonitis.
US20220392580A1 (en) Computational model trained to predict interacting pairs based on weakly-correlated features
Grenga et al. Taxonomical and functional changes in COVID‐19 faecal microbiome could be related to SARS‐CoV‐2 faecal load
US20170046476A1 (en) Systems, apparatus, and methods for analyzing and predicting cellular pathways
Wang et al. Ultrasensitive antibiotic perceiving based on aptamer-functionalized ultraclean graphene field-effect transistor biosensor
Foster et al. Targeted proteomics of human metapneumovirus in clinical samples and viral cultures
Zang et al. Early detection of cystic fibrosis acute pulmonary exacerbations by exhaled breath condensate metabolomics
Lin et al. Progress in understanding COVID-19: insights from the omics approach
Burke et al. Nasopharyngeal protein biomarkers of acute respiratory virus infection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21778760

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21778760

Country of ref document: EP

Kind code of ref document: A1