WO2024010854A1 - Rapid generation of breath-based health reports and systems for use in the same - Google Patents


Info

Publication number
WO2024010854A1
Authority
WO
WIPO (PCT)
Prior art keywords
breath
subject
biopsy
health
machine learning
Prior art date
Application number
PCT/US2023/027001
Other languages
French (fr)
Inventor
Chris Wheeler
Karl-Magnus LARSSON
Kevin BUNDY
Luke Clauson
Original Assignee
Diagnose Early, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Diagnose Early, Inc.

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/08 Detecting, measuring or recording devices for evaluating the respiratory organs
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/483 Physical analysis of biological material
    • G01N33/497 Physical analysis of biological material of gaseous biological material, e.g. breath
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H15/00 ICT specially adapted for medical reports, e.g. generation or transmission thereof
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Definitions

  • Medical diagnosis is the process of determining which disease or condition explains a person's symptoms and signs.
  • the information required for diagnosis is typically collected from a history and physical examination of the person seeking medical care. Often, one or more diagnostic procedures, such as medical tests, are also done during the process.
  • a diagnosis in the sense of diagnostic procedure, can be regarded as an attempt at classification of an individual's condition into separate and distinct categories that allow medical decisions about treatment and prognosis to be made. Diagnosis is often challenging because many signs and symptoms are nonspecific. For example, redness of the skin (erythema), by itself, is a sign of many disorders and thus does not tell the healthcare professional what is wrong. Thus differential diagnosis, in which several possible explanations are compared and contrasted, must be performed. This involves the correlation of various pieces of information followed by the recognition and differentiation of patterns.
  • aspects of the methods include: analyzing breath samples from one or a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files; obtaining a health record associated with a disease or condition for the one or each subject; training a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; and applying the trained machine learning model to a breath biopsy output file, different from the breath biopsy output files used to train the model, to generate a health report regarding the disease or condition for a subject or subjects.
  • aspects of the present invention further include methods of generating the breath biopsy output file and methods of generating real time feedback to enhance accuracy and monitor data quality. Also provided are systems for use in practicing methods of the invention.
  • FIG. 1 depicts an overview of the results obtained from an identifier associated breath biopsy output file generated from a breath sample assay in accordance with an embodiment of the invention.
  • FIG. 2 provides a flow diagram depicting a method for generating an intuitive data set from an identifier associated breath biopsy output file in accordance with an embodiment of the invention.
  • FIG. 3 depicts a health report obtained at least in part from a breath biopsy output file generated from a breath sample assay in accordance with an embodiment of the invention.
  • FIG. 4 illustrates various metabolic profiles of a health report obtained at least in part from a breath biopsy output file generated from a breath sample assay in accordance with an embodiment of the invention.
  • FIG. 5 depicts a section of a health report breaking down the results of the breath sample assay as they relate to COPD in accordance with an embodiment of the invention.
  • FIGS. 6A-6B illustrate a metabolic profile of toxins (i.e., a toxin panel) of a health report obtained at least in part from a breath biopsy output file generated from a breath sample assay in accordance with an embodiment of the invention.
  • FIG. 7 provides a flow diagram depicting a method for dynamically adjusting breath collection automatically based on real-time feedback in accordance with an embodiment of the invention.
  • FIGS. 8A-8B illustrate selected ion monitoring (SIM) automatically performed based on real-time feedback in accordance with an embodiment of the invention.
  • FIG. 9 provides a flow diagram depicting a method for training a machine learning model using generated breath biopsy output files and obtained health records in accordance with an embodiment of the invention.
  • aspects of the methods include: analyzing breath samples from one or a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files; obtaining a health record associated with a disease or condition for one or each subject; training a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; applying the trained machine learning model to a breath biopsy output file, different from the breath biopsy output files used to train the model, to generate a health report regarding the disease or condition for a subject.
  • aspects of the methods include: analyzing breath samples from a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files; obtaining a health record associated with a disease or condition for each subject; training a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; applying the trained machine learning model to a breath biopsy output file, different from the breath biopsy output files used to train the model, to generate a health report regarding the disease or condition for a subject.
  • aspects of the present invention further include methods of generating the breath biopsy output file and methods of generating real time feedback to enhance accuracy and monitor data quality.
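  • The train-then-apply sequence described above can be sketched as follows. This is a minimal illustration only: the nearest-centroid classifier stands in for whatever machine learning model an embodiment uses, and the subject identifiers, two-value feature vectors, and the `train` and `health_report` names are hypothetical, not taken from the application.

```python
from math import dist

# Hypothetical features extracted from breath biopsy output files
# (e.g., relative intensities of two m/z peaks), keyed by subject.
breath_features = {
    "s1": [0.9, 0.1],
    "s2": [0.8, 0.2],
    "s3": [0.1, 0.9],
    "s4": [0.2, 0.7],
}

# Hypothetical health records for the same subjects.
health_records = {
    "s1": "condition",
    "s2": "condition",
    "s3": "no condition",
    "s4": "no condition",
}


def train(features, records):
    """Join features with records by subject and average each label's vectors."""
    grouped = {}
    for subject, vec in features.items():
        grouped.setdefault(records[subject], []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def health_report(model, vec):
    """Report the label whose centroid is closest to a new feature vector."""
    label = min(model, key=lambda name: dist(model[name], vec))
    return {"finding": label, "distance": round(dist(model[label], vec), 3)}


model = train(breath_features, health_records)

# Apply the model to a file that was not part of the training set.
report = health_report(model, [0.85, 0.15])
print(report["finding"])
```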
  • embodiments of the methods include analyzing breath samples from a plurality of subjects with a breath analyzer.
  • the breath sample of the subject or subjects that is analyzed may vary, and may be made up of 1 or more breaths, where in some instances the number of breaths ranges from 1 to 25, such as 1 to 20, including 1 to 15, e.g., 1 to 10, including 1 to 5 exhaled breaths.
  • the period of time between each exhaled breath received for the breath assay may vary, where in some instances the time between each received exhaled breath ranges from 1 to 180 seconds, such as 10 to 120, including 15 to 100, e.g., 20 to 90, including 20 to 60 seconds.
  • each exhaled breath of the breath sample may be received consecutively with respect to the previously received exhaled breath.
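  • As a rough illustration of the ranges given above, a collection plan could be checked before the assay starts. The sketch below uses the broadest stated ranges (1 to 25 breaths, 1 to 180 seconds between breaths); the function name and the idea of validating a plan up front are assumptions, not part of the described method.

```python
def check_breath_sample(n_breaths, gaps_s):
    """Return a list of problems with a planned breath sample (empty if none)."""
    problems = []
    if not 1 <= n_breaths <= 25:
        problems.append(f"{n_breaths} breaths is outside 1 to 25")
    # One gap is expected between each pair of consecutive breaths.
    if len(gaps_s) != max(n_breaths - 1, 0):
        problems.append("expected one gap between each pair of consecutive breaths")
    for gap in gaps_s:
        if not 1 <= gap <= 180:
            problems.append(f"gap of {gap} s is outside 1 to 180 s")
    return problems


print(check_breath_sample(5, [30, 30, 30, 30]))
print(check_breath_sample(2, [200]))
```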
  • the breath sample may be a gaseous breath sample or an exhaled breath condensate (EBC) of the breath sample.
  • the EBC may be collected by having the subject exhale into a container, cooling the container, then collecting the EBC on the inside walls of the cooled container.
  • the container may be cooled by, e.g., chilling the container in a freezer or refrigerator, with dry ice, or using liquid nitrogen.
  • the EBC may be stored for a period of time before assaying.
  • the EBC is stored for a period of time such as 24 hours or more, or 48 hours or more, or 72 hours or more, or 4 days or more, or 5 days or more, or 6 days or more, or 1 week or more, or 2 weeks or more, or 3 weeks or more, or 4 weeks or more, or 1 month or more.
  • methods may include aerosolization of the condensate prior to assaying using, e.g., a nebulizer.
  • Embodiments of the method may further include shipping the breath sample (e.g., EBC) to a remote location for assaying.
  • a “remote location” is a location other than the location at which the breath sample is collected.
  • a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc.
  • the two items can be in the same room but separated, or at least in different rooms or different buildings, and can be at least one mile, ten miles, or one hundred miles or more apart.
  • Breath analyzers, in accordance with embodiments of the methods, may vary.
  • the analyzer includes a Raman spectroscopy analyzer, a breathalyzer, an optical absorbance sensing analyzer, a gas chromatography analyzer, electronic sensing using an electronic nose, a nuclear magnetic resonance spectroscopy analyzer, or a mass spectrometry analyzer.
  • the breath analyzer includes a mass spectrometry analyzer such as, e.g., a high-resolution mass spectrometry (HRMS) analyzer.
  • the mass spectrometry method/technique employed by the analyzer may vary and the analyzer may be coupled with or include (e.g., may be configured to perform) one or more of: ion mobility spectrometry (IMS), gas chromatography (GC), liquid chromatography (LC), differential mobility spectrometry (DMS), field asymmetric ion mobility spectrometry (FAIMS), selective-ion flow tube mass spectrometry (SIFT-MS), proton-transfer-reaction mass spectrometry (PTR-MS), time-of-flight mass spectrometry (TOF-MS), etc.
  • the mass spectrometry analyzer may perform IMS-mass spectrometry (IMS-MS), GC-mass spectrometry (GC-MS), LC-mass spectrometry (LC-MS), etc.
  • tandem mass spectrometry may be performed using, e.g., two or more mass spectrometry analyzers.
  • the ionization method/technique employed by the analyzer may vary and may include matrix-assisted laser desorption/ionization (MALDI), atmospheric pressure chemical ionization (APCI), atmospheric pressure photoionization (APPI), electrospray ionization (ESI), secondary electrospray ionization (SESI), etc.
  • the ionization technique employed is a soft ionization technique.
  • the mass spectrometry analyzer may be configured to perform SESI such as, e.g., SESI-HRMS or SESI-TOF-HRMS.
  • the breath sample may be a gaseous breath sample (e.g., collected directly from the subject or aerosolized after being collected as an EBC).
  • the mass spectrometry analyzer may include a SUPER SESI™ (e.g., SUPER SESI™-HRMS) device.
  • the mass spectrometry analyzer may be configured to perform SESI mass spectrometry (e.g., SESI-HRMS).
  • SESI mass spectrometry may be run in positive-ion mode (i.e., wherein ionization occurs through protonation, or positive ions enter the mass spectrometer) or negative-ion mode (i.e., wherein ionization occurs through deprotonation, or negative ions enter the mass spectrometer).
  • the SESI mass spectrometry analyzer is run in negative-ion mode.
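  • For a singly charged ion, protonation adds one proton mass to the neutral molecule and deprotonation removes one, so the expected m/z in each mode can be estimated as sketched below. The neutral mass used in the example and the function name are illustrative assumptions; adducts and multiply charged ions are not covered.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton in daltons


def expected_mz(neutral_mass_da, mode):
    """Estimate m/z of a singly charged [M+H]+ or [M-H]- ion."""
    if mode == "positive":
        return neutral_mass_da + PROTON_MASS_DA  # protonation
    if mode == "negative":
        return neutral_mass_da - PROTON_MASS_DA  # deprotonation
    raise ValueError("mode must be 'positive' or 'negative'")


# Illustrative neutral monoisotopic mass (hypothetical compound of interest).
neutral = 180.06339
print(round(expected_mz(neutral, "positive"), 5))
print(round(expected_mz(neutral, "negative"), 5))
```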
  • the ionization agent may vary.
  • the ionization agent includes water. In some embodiments, the ionization agent includes formic acid. In embodiments where the ionization agent includes formic acid, the formic acid may be diluted in water, such as diluted to achieve a ratio ranging from 0.01-1.0% volume over volume (v/v) of formic acid to water, such as 0.05-0.5% v/v of formic acid to water, or 0.1-0.2% v/v of formic acid to water.
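  • The dilution arithmetic can be sketched as below. The sketch assumes the percentage is the volume of formic acid relative to the volume of water, which is one reading of “v/v of formic acid to water”; the function name is hypothetical and the range check uses the broadest stated range.

```python
def formic_acid_volume_ml(water_ml, percent_vv):
    """Volume of formic acid to add to a given volume of water.

    Assumes percent_vv is formic acid volume as a percentage of water volume.
    """
    if not 0.01 <= percent_vv <= 1.0:
        raise ValueError("percent_vv is outside the 0.01 to 1.0% range")
    return water_ml * percent_vv / 100.0


# Formic acid needed for 1 litre of water at 0.1% and at 0.2% v/v.
print(formic_acid_volume_ml(1000.0, 0.1))
print(formic_acid_volume_ml(1000.0, 0.2))
```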
  • mass spectrometry techniques that may be employed include, but are not limited to, those disclosed in U.S. Patent No. 11,075,068 and the patent documents cited therein, which methods are incorporated herein by reference; and Singh, K.D., Tancev, G., Decrue, F. et al. Standardization procedures for real-time breath analysis by secondary electrospray ionization high-resolution mass spectrometry.
  • the mass spectrometry analyzer may be a Thermo Scientific high-resolution mass spectrometer (e.g., Thermo Scientific Exactive™ or Q-Exactive™) or a SCIEX high-resolution mass spectrometer (e.g., a TripleTOF® mass spectrometer system).
  • the breath sample is assayed in real time with respect to the subject providing the breath sample. Assaying the breath sample in real time with respect to the subject providing the breath sample may, e.g., minimize any chemical changes taking place which may impact the results of the breath sample assay. In these embodiments, compounds that are exhaled from deeper in the lungs may be detected relatively later in the assay.
  • the time of detection of a compound in the breath sample assay is used to identify and validate the detection of the compound or provide other information, e.g., related to the fingerprint of a compound, toxin source, disease, or condition in the breath sample or the pharmacokinetics of a compound.
  • real-time feedback of the measurements of the mass spectrometry analyzer may be generated and used to enhance the accuracy of relevant measurements.
  • relevant measurement is meant a mass-to-charge ratio (m/z) measurement of a feature of interest.
  • the feature of interest may be a compound of interest (e.g., the m/z of the compound of interest or a metabolite thereof).
  • the feature of interest may be one or more m/z measurements of a compound, toxin source, disease, or condition fingerprint.
  • fingerprint is meant a unique set of identified (e.g., as unique compounds or metabolites thereof) and/or unidentified m/z peaks or measurements and the context of the m/z peaks or measurements (e.g., the relative intensities of the m/z peaks, the temporal position of the m/z peaks in a breath, or any other context determined to be significant by a machine learning model during training whether known or unknown, as discussed in greater detail below) that are unique to a specific subject, sample type, compound and/or circumstance.
  • a subject’s breath may have a specific fingerprint
  • a compound may have a specific fingerprint such that it is able to be identified in a subject’s breath
  • a toxin source or a toxin may have a specific fingerprint such that exposure of a subject to the toxin source or toxin may be determined using the subject’s breath
  • a disease or condition may have a specific fingerprint such that, e.g., the risk the subject has of developing the disease or condition, or the diagnosis of the disease or condition, may be determined using the subject’s breath, etc.
  • the fingerprint may include the abundance (e.g., concentration) of a unique set of metabolites or other compounds in relation to each other or in relation to other compounds found in the subject’s breath (i.e., the relative abundance of the set of metabolites or other compounds or combinations thereof) determined using identified m/z peaks.
  • the fingerprint may include a temporal component. For example, the relative intensity of a set of m/z peaks or measurements of a fingerprint may change with the time of detection (e.g., as air is exhaled from deeper portions of the lungs).
  • the fingerprint may be generated by a machine learning model.
  • the real-time measurements may be fed to the trained machine learning model in order to generate features of interest (i.e., relevant measurements) for which accuracy may be enhanced, as discussed in greater detail below.
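  • One simple way to picture a fingerprint is as a set of reference m/z values with expected relative intensities, matched against measured peaks. The sketch below is an illustration under stated assumptions: the 5 ppm mass tolerance, the 0.1 intensity tolerance, the example peaks, and the `match_fingerprint` name are all hypothetical, and a fingerprint produced by a machine learning model could take a quite different form (for example, including temporal context).

```python
def match_fingerprint(fingerprint, measured, ppm=5.0, intensity_tol=0.1):
    """Check measured peaks against a fingerprint.

    fingerprint: dict of reference m/z -> expected relative intensity (0 to 1).
    measured: list of (m/z, intensity) tuples from one breath.
    """
    matched = {}
    for ref_mz in fingerprint:
        window = ref_mz * ppm * 1e-6  # mass tolerance in m/z units
        hits = [inten for mz, inten in measured if abs(mz - ref_mz) <= window]
        if not hits:
            return False  # a required peak is missing
        matched[ref_mz] = max(hits)
    top = max(matched.values())
    if top <= 0:
        return False
    # Compare intensities relative to the strongest matched peak.
    return all(
        abs(matched[mz] / top - rel) <= intensity_tol
        for mz, rel in fingerprint.items()
    )


fingerprint = {100.0: 1.0, 150.0: 0.5}
breath = [(100.0002, 2000.0), (150.0001, 1050.0), (300.0, 9000.0)]
print(match_fingerprint(fingerprint, breath))
```

  • Peaks outside the fingerprint (such as the one at 300 m/z above) are ignored in this sketch, so an unrelated strong signal does not distort the relative intensities.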
  • the mass spectrometry analyzer is dynamically adjusted in real-time based on real-time measurement feedback provided, e.g., for each breath assayed from a subject.
  • selected ion monitoring may be performed in order to enhance the accuracy of relevant measurements.
  • measurements of a subject’s breath generated by the mass spectrometer may be analyzed in real-time in order to search for compounds and fingerprints of interest. If evidence of a compound and/or fingerprint is found, the mass spectrometry analyzer may be configured to only measure and/or transmit one or more m/z values of select features of interest (or, e.g., limited ranges of m/z values containing selected features) in a subsequent breath sample provided by the subject.
  • the mass spectrometry analyzer may be configured to measure the select features (or, e.g., select range of m/z values containing features) with enhanced sensitivity and accuracy, i.e., when compared with the measurements taken before SIM. For example, by limiting the range of detected m/z values, the mass spectrometry analyzer may boost or amplify the signal of selected features of interest.
  • the SIM may be dynamic within a single breath and, e.g., the selected features of interest may change throughout a single breath.
  • the SIM may change to monitor different m/z ranges as the time of detection within a single breath changes.
  • SIM is performed automatically.
  • mass spectrometry measurements may be transmitted directly to a processor configured to search for compounds and fingerprints of interest.
  • the processor may then configure the mass spectrometry analyzer to limit detection to, and amplify the signal of, one or more select features (e.g., of compounds or fingerprints of interest) for which evidence is found thereof.
  • the processor may be configured to automatically perform SIM using a trained machine learning model, as discussed in greater detail below.
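  • The automatic SIM step can be sketched as choosing narrow m/z windows around features for which evidence was found in a first breath. This is only an illustration: the window half-width, the noise level, the merging of overlapping windows, and the `sim_windows` name are assumptions, and how an actual analyzer is configured will depend on its control software.

```python
def sim_windows(spectrum, targets, noise_level, half_width=0.5):
    """Return merged (low, high) m/z windows for targets seen above noise.

    spectrum: list of (m/z, intensity) tuples from an initial full scan.
    targets: m/z values of features of interest.
    """
    found = sorted(
        t
        for t in targets
        if any(
            abs(mz - t) <= half_width and inten > noise_level
            for mz, inten in spectrum
        )
    )
    windows = []
    for t in found:
        low, high = t - half_width, t + half_width
        if windows and low <= windows[-1][1]:
            # Overlaps the previous window, so extend it instead.
            windows[-1] = (windows[-1][0], max(windows[-1][1], high))
        else:
            windows.append((low, high))
    return windows


first_breath = [(520.2, 900.0), (520.9, 40.0), (700.1, 30.0)]
print(sim_windows(first_breath, [520.0, 520.8, 700.0], noise_level=100.0))
```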
  • real-time feedback of the mass spectrometry analyzer may be generated and used to enhance the accuracy of relevant measurements.
  • fragmentation may be performed in order to enhance the accuracy of relevant features.
  • fragmentation is performed on all breath samples using, e.g., tandem mass spectrometry.
  • fragmentation may be performed based on real-time feedback as discussed above. For example, if evidence of a compound and/or fingerprint of interest is found, the mass spectrometry analyzer may be configured to perform a fragmentation run on the compound of interest or compounds of the fingerprint of interest.
  • Fragmentation may vary depending on the compound or fingerprint of interest and may include, but is not limited to, collision-induced dissociation (CID), surface-induced dissociation (SID), laser induced dissociation, electron-capture dissociation (ECD), electron-transfer dissociation (ETD), negative electron-transfer dissociation (NETD), electron-detachment dissociation (EDD), photodissociation (e.g., infrared multiphoton dissociation (IRMPD) or blackbody infrared radiative dissociation (BIRD)), higher-energy C-trap dissociation (HCD), EISA, and/or charge remote fragmentation.
  • fragmentation is performed automatically.
  • mass spectrometry measurements may be transmitted directly to a processor configured to search for compounds and fingerprints of interest.
  • the processor may then configure the mass spectrometry analyzer to perform fragmentation of the compound of interest or compounds of the fingerprint of interest for which evidence is found thereof.
  • the processor may be configured to automatically perform fragmentation using a trained machine learning model, as discussed in greater detail below.
  • the processor may be configured to automatically perform SIM and fragmentation.
  • the processor may perform SIM (e.g., as discussed above) to amplify the signal of m/z measurements pertaining to compounds and fingerprints of interest for which evidence is found thereof after receiving measurements pertaining to a first breath or group of breaths provided by a subject.
  • the processor may then configure the mass spectrometry analyzer to perform fragmentation of the compound of interest or compounds of the fingerprint of interest in order to confirm the presence of the identified compound(s) and/or fingerprint(s) of interest in the subject’s breath.
  • one or more analyzers may be used to further verify the presence of the identified compound(s) and/or fingerprint(s) of interest. For example, after the method for dynamically adjusting breath collection automatically based on real-time feedback (e.g., as described above) is run, a further breath sample may be collected and analyzed using gas chromatography (GC) or liquid chromatography (LC) techniques, such as GC-MS or LC-MS. In some cases, the GC-MS or LC-MS may be coupled with SESI-HRMS including, e.g., in tandem with the SESI-HRMS.
  • real-time feedback of measurements of the mass spectrometry analyzer may be generated and used to monitor data quality.
  • real-time feedback of the mass spectrometry analyzer may be automatically monitored in order to determine if the breath sample (or individual breaths thereof) is of a sufficient quality.
  • sufficient quality is meant capable of producing accurate breath assay results.
  • data quality may be monitored using a machine learning model as discussed in greater detail below. For example, real-time measurements may be fed to a trained machine learning model in order to determine if the measurements of an individual breath are of sufficient quality.
  • the subject may be prompted to provide an additional breath or additional breaths if a breath sample (or individual breaths thereof) is not of sufficient quality.
  • a technician or operator may monitor real-time feedback of the mass spectrometry analyzer in order to determine if the breath sample is of a sufficient quality or if one or more settings of the mass spectrometry analyzer should be adjusted.
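  • The quality gate described above can be pictured as a loop that keeps prompting for breaths until enough acceptable ones are collected. In this sketch a fixed rule (minimum peak count and minimum total signal) stands in for the trained machine learning model; the thresholds, the prompt limit, and the function names are hypothetical.

```python
def is_sufficient(breath, min_total=1000.0, min_peaks=3):
    """Stand-in quality rule: enough peaks and enough total signal."""
    total = sum(inten for _, inten in breath)
    return len(breath) >= min_peaks and total >= min_total


def collect_breaths(breath_source, needed=1, max_prompts=5):
    """Accept breaths of sufficient quality, prompting again otherwise."""
    accepted, prompts = [], 0
    for breath in breath_source:
        if prompts >= max_prompts:
            break
        prompts += 1  # one prompt to the subject per breath provided
        if is_sufficient(breath):
            accepted.append(breath)
        if len(accepted) >= needed:
            break
    return accepted, prompts


breaths = [
    [(100.0, 10.0)],  # too weak, so another breath is requested
    [(100.0, 500.0), (200.0, 400.0), (300.0, 300.0)],
]
accepted, prompts = collect_breaths(breaths)
print(len(accepted), prompts)
```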
  • the subject is a human.
  • the human is a protective service professional, a healthcare professional, a construction professional, a production professional, or a military professional, e.g., as is further detailed at: https://www.bls.gov/soc/2018/major_groups.htm.
  • a protective service professional such as a firefighter.
  • the methods of the invention may be employed on a subject wherein there is evidence the subject has a disease or condition or is at an elevated risk of developing a disease or condition.
  • the plurality of subjects may include two or more subjects. In some instances, the plurality of subjects may include ten or more subjects, such as twenty or more, or fifty or more, or one hundred or more, or two hundred or more, or five hundred or more, or one thousand or more, or five thousand or more, or ten thousand or more, or one hundred thousand or more.
  • the plurality of subjects may include the subjects of any demographic or cohort. For example, the subjects may be of any sex, gender, age, ethnicity, or race.
  • the plurality of subjects may include subjects associated with, or belonging to, a population or cohort of interest.
  • population or cohort of interest is meant a group of people banded together or treated as a group, such as a specific demographic of individuals.
  • the cohort of interest may be individuals experiencing or affected by (e.g., at risk for) a specific disease or condition.
  • the plurality of subjects may consist of only subjects belonging to a cohort of interest.
  • FIG. 7 provides a depiction of a method for dynamically adjusting breath collection automatically based on real-time feedback in accordance with an embodiment of the invention.
  • the subject supplies one or more initial breaths to the mass spectrometry analyzer for analysis.
  • real-time analysis is performed on the measurements generated by the mass spectrometry analyzer in order to identify one or more compounds or fingerprints of interest.
  • the compounds of interest may include toxins, and the fingerprints of interest may be generated using a machine learning model.
  • a check is done as to whether there is evidence for the presence of a compound or fingerprint of interest. For compounds of interest, any relevant m/z signal above a predetermined level associated with noise may be considered evidence of the compound of interest.
  • the mass spectrometry analyzer is automatically adjusted to “zoom in” on (e.g., limit detection to) one or more features of interest at step S5.
  • the features of interest may be determined using a machine learning model. For example, compounds for which a minor alteration in detected intensity would change the identified fingerprint of interest may be classified as features of interest and “zoomed in” on.
  • a visual display such as a liquid crystal display (LCD) screen, prompts the subject to provide one or more additional breaths to the mass spectrometry analyzer. Steps S1 and S2 are then repeated, and the subject provides another breath or set of breaths for which real-time analysis is performed.
  • a check is done as to whether there is still evidence for the presence of a compound or fingerprint of interest after the measurements for the “zoomed-in” compound or compounds are received or updated. If evidence for the presence of a compound or fingerprint of interest is still present after SIM, the mass spectrometry analyzer is automatically configured to perform fragmentation for one or more features of interest at step S7.
  • Steps S6, S1, and S2 are then repeated in order to verify the presence of the compound or fingerprint of interest, and the assay is ended.
  • the subject may be prompted to provide an additional breath or set of breaths prior to SIM, during SIM, and/or during fragmentation measurements as needed (e.g., at step S4). For example, if a trained machine learning algorithm or an operator monitoring breath collection determines an individual breath or set of breaths is not of a sufficient quality, another breath or set of breaths may be provided without resetting the automatic dynamic breath collection process.
  • the subject may be prompted to provide multiple breaths or series of breaths to support SIM (e.g., to enhance the statistical significance of results) or to gather additional data for deep learning, as described in greater detail below.
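  • The escalation described above (full scan, then SIM, then fragmentation, each gated by a check for signal above a noise level) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the mode labels, and the m/z tolerance `tol` are assumptions introduced for the example.

```python
def has_evidence(peaks, target_mz, noise_level, tol=0.01):
    """Return True if any (m/z, intensity) peak within `tol` of `target_mz`
    has an intensity above the predetermined noise level."""
    return any(
        abs(mz - target_mz) <= tol and intensity > noise_level
        for mz, intensity in peaks
    )


# Hypothetical labels for the acquisition modes, in escalation order.
MODES = ["full_scan", "sim", "fragmentation"]


def next_mode(mode, evidence):
    """Escalate to the next mode while evidence persists; otherwise end."""
    if not evidence or mode == MODES[-1]:
        return "end"
    return MODES[MODES.index(mode) + 1]
```

  • In this sketch the assay ends either when a check finds no evidence or after the fragmentation measurement has been verified; prompting the subject for additional breaths is left outside the two functions.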
  • FIG. 8A and 8B provide an example of SIM.
  • a range from 0 m/z to roughly 1750 m/z is measured in a single scan.
  • a smaller range from roughly 500 m/z to 750 m/z is measured in a single scan, allowing for greater sensitivity and the distinction of compounds similar in m/z value.
  • embodiments of the methods include analyzing breath samples from a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files.
  • the methods and techniques by which a breath biopsy file may be generated and analyzed, in accordance with embodiments of the invention, may vary.
  • breath assay data may be generated and analyzed in real-time, e.g., as described in United States Provisional Application Serial Nos. 63/359,134 and 63/416,185 (Attorney docket nos. DIAG-003PRV and DIAG-003PRV2, respectively); the disclosures of which are herein incorporated by reference.
  • the breath assay includes mass spectrometry such as, e.g., SESI-MS.
  • the breath sample may be assayed by a mass spectrometry analyzer to generate a breath biopsy output file.
  • the breath biopsy output file is a RAW file.
  • By RAW file is meant a file that has not been compressed, encrypted, or processed.
  • the breath biopsy output file (e.g., RAW file) may then be automatically detected.
  • the automatically detected breath biopsy output file may then be associated with an identifier of the subject to produce an identifier associated breath biopsy output file.
  • associating the automatically detected generated breath biopsy output file with an identifier of the subject includes: receiving an identifier from the subject; and confirming that the generated breath biopsy output file is from analysis of the breath sample obtained from the subject.
  • the identifier is associated with the automatically detected generated breath biopsy output file by a human operator, while in other instances the identifier is associated with the automatically detected generated breath biopsy output file by a program (e.g., after confirmation). In other cases, the automatically detected breath biopsy output file is automatically associated with the subject identifier without confirmation from a human operator or technician (e.g., by a program).
  • a breath biopsy output file (e.g., RAW file) may be automatically detected and subsequently associated with the subject (i.e., an identifier of the subject) to produce an identifier associated breath biopsy output file.
  • the identifier of the subject may vary, where examples of identifiers include, but are not limited to alpha/numeric identifiers (e.g., an identification number or a string of letters and/or numbers), codes such as, e.g., QR codes, barcodes, etc.
  • the identifier may identify the subject through association with identifying information of the subject such as, but not limited to, the subject’s full legal name, contact information, home address, social security number, etc.
  • the association may occur in a database or in a datasheet (e.g., wherein the identifying information may be found by searching for the identifier). In these cases, it may be relatively difficult or impossible to associate the identifying information of the subject with the identifier without access to the database or the datasheet (i.e., the database or datasheet is secured and/or protected).
  • the identifier is generated for or assigned to the subject during the session or appointment in which the breath sample is collected (and, e.g., subsequently analyzed wherein the breath biopsy output file is produced). In other embodiments, the identifier is generated for or assigned to the subject before the session or appointment in which the breath sample is collected.
  • the subject may provide their identifying information through any number of means including, e.g., by navigating to a web address or via email, wherein an identifier is generated for or assigned to the subject after the subject has provided their identifying information.
  • the subject may provide the identifier to a technician or operator prior to the collection and analysis (i.e., assaying) of the breath sample of the subject.
  • the subject may provide a QR code to an operator or technician, wherein by scanning the QR code the identifier is received from the subject.
  • the identifier may be automatically generated for or assigned to the subject after the subject has provided their identifying information.
  • the subject may fill in or submit an initial health information questionnaire that may be associated with the identifier of the subject.
  • the method includes associating the identifier with a prior health record of the subject.
  • the file may be converted to an open XML-based format such as, e.g., mzML format.
  • metadata associated with the identifier associated breath biopsy output file may be obtained.
  • the obtained metadata may include, but is not limited to, the subject’s identifier and/or identifying information, a health questionnaire submitted by the subject, mass spectrometer status/settings, temperature, humidity, etc.
  • the metadata is saved in a file (e.g., a logfile) associated with the identifier associated breath biopsy output file (e.g., labeled with the subject’s identifier, a timestamp, a lab identifier, a machine identifier, etc.).
  • a technician may be enabled to enter comments to the metadata file if desired (e.g., indicating the breath sample assayed was contaminated).
  • the metadata file may be in a readable format such as, e.g., JSON, XML, CSV, CSON, TXT, etc.
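  • As one illustration of such a metadata file, the sketch below writes session metadata as JSON using the Python standard library. The field names, the filename pattern, and the helper names are hypothetical and chosen only for this example; the source does not prescribe a schema.

```python
import json


def build_metadata(subject_id, timestamp, lab_id, machine_id,
                   temperature_c, humidity_pct, comments=None):
    """Collect session metadata into a plain dictionary."""
    return {
        "subject_id": subject_id,
        "timestamp": timestamp,
        "lab_id": lab_id,
        "machine_id": machine_id,
        "environment": {
            "temperature_c": temperature_c,
            "humidity_pct": humidity_pct,
        },
        # Free-text technician notes, e.g. a suspected contamination.
        "comments": list(comments or []),
    }


def metadata_filename(record):
    """Label the logfile with the identifiers carried in the record."""
    return "{subject_id}_{timestamp}_{lab_id}_{machine_id}.json".format(**record)


def to_json(record):
    """Serialize the record as human-readable JSON."""
    return json.dumps(record, indent=2, sort_keys=True)
```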
  • an intuitive data set is generated from the identifier associated (and, e.g., converted) breath biopsy output file.
  • the intuitive data set may be structured and formatted in order to be compatible with the subsequent steps of the invention.
  • the intuitive data set may be structured and formatted in order to train a machine learning model, as discussed in greater detail below.
  • the intuitive data set (e.g., and the metadata file associated therewith) is used to generate a health report as described in greater detail below.
  • the intuitive data set is generated, at least in part, by reducing the data of the identifier associated breath biopsy output file.
  • the reduction may vary. In some embodiments, the reduction may depend on one or more components of the training and/or configuration of the machine learning model, as discussed in greater detail below. In embodiments wherein the breath sample is collected directly from the subject (i.e., without a phase transition), the reduction may include the processing step of automatically identifying individual breaths in the sample. Breath identification may occur by finding plateau signatures in the time-dependent total ion current (TIC) data received from the mass spectrometry analyzer and contained in the identifier associated breath biopsy output file.
  • By TIC is meant the summed intensity across the entire range of masses (m/z values) being detected at a single point in time.
  • plateaus may be identified by detecting large increases or decreases in TIC between different timepoints, e.g., before or after timepoints reflecting a relatively uniform TIC.
  • the identified breaths in the sample may then be assigned breath identification numbers.
  • a breath duration is determined for each identified breath indicating the time from the onset of the breath to the end of the breath. In these instances, data or measurements generated at the beginning or end of each breath duration (i.e., data or measurements at the shoulders of each identified breath) may be excluded or discarded.
  • the time elapsed from the beginning of an identified breath to the generation of given data (e.g., a measurement or a peak) is determined and/or recorded. In these instances, this elapsed time is used to distinguish between data received from deep and shallow portions of the exhaled breath.
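  • The breath identification described above can be sketched as follows, assuming the TIC is available as a list with one value per scan. A breath starts at a large rise between consecutive scans and ends at a large drop; the threshold `jump` and the number of shoulder scans `n_edge` are hypothetical parameters, and a real trace with gradual onsets would likely need smoothing first.

```python
def find_breaths(tic, jump):
    """Return (start, end) scan-index pairs, end inclusive, for each plateau
    bounded by a rise of at least `jump` and a drop of at least `jump`."""
    breaths = []
    start = None
    for i in range(1, len(tic)):
        delta = tic[i] - tic[i - 1]
        if start is None and delta >= jump:
            start = i  # first scan on the plateau
        elif start is not None and delta <= -jump:
            breaths.append((start, i - 1))  # last scan before the drop
            start = None
    return breaths


def trim_shoulders(breath, n_edge):
    """Drop `n_edge` scans from each end of a breath; return None if the
    breath is too short to have any scans left."""
    start, end = breath
    if end - start + 1 <= 2 * n_edge:
        return None
    return (start + n_edge, end - n_edge)
```

  • The position of each breath in the returned list can serve as its breath identification number, and the offset of a scan from `start` gives the time from breath onset in units of scans.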
  • the reduction process may include the step of automatically identifying all features (i.e., peaks or measurements) of the breath sample from the identifier associated breath biopsy output file. Statistical measures of the identified features may then be determined. For example, a per-breath average and standard deviation describing specific features in each identified breath may then be determined.
  • the automatically identified features of the breath sample may be matched or associated with compounds, e.g., using the mass to charge ratio (m/z) of each peak and/or the time from the beginning of an identified breath at which each peak was generated.
  • a value of abundance is generated for the identified peaks matched or associated with compounds, e.g., using the intensity of each peak and/or the identity of the associated compound.
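  • A minimal sketch of the peak matching and per-breath statistics described above is given below. It matches each peak to the nearest entry of a reference library within an m/z tolerance and then computes a per-breath mean and standard deviation of intensity. The library contents, the tolerance `tol`, and the function names are assumptions made for illustration; matching on m/z alone is a simplification.

```python
from statistics import mean, stdev


def match_compound(mz, library, tol=0.005):
    """Return the library compound closest to `mz` within `tol`, else None."""
    best = None
    for name, ref_mz in library.items():
        err = abs(mz - ref_mz)
        if err <= tol and (best is None or err < best[1]):
            best = (name, err)
    return best[0] if best else None


def per_breath_stats(peaks, library, tol=0.005):
    """peaks: iterable of (breath_id, mz, intensity).
    Returns {(breath_id, compound): (mean intensity, standard deviation)}.
    Peaks that match no library compound are left out."""
    grouped = {}
    for breath_id, mz, intensity in peaks:
        name = match_compound(mz, library, tol)
        if name is None:
            continue
        grouped.setdefault((breath_id, name), []).append(intensity)
    return {
        key: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for key, vals in grouped.items()
    }
```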
  • the reduction process may include the step of rectifying or correcting spectra such as, e.g., reducing noise or correcting the m/z or intensity value of an identified peak or peaks.
  • the mass spectrometry analyzer may generate a plurality of scans during the breath sample assay. These scans may be uniquely and adaptively sampled in the m/z space.
  • rectifying or correcting spectra may include the resampling and interpolation of all scans to a single m/z space or axis (e.g., a common m/z array).
  • each individual scan is processed and analyzed in its own unique m/z space, and the sample scans are linked from one scan to the next (e.g., temporally).
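  • Resampling scans onto a common m/z array can be sketched with plain linear interpolation, as below. The sketch assumes each scan is a pair of ascending m/z values and matching intensities (profile-mode data) and assigns zero intensity outside a scan's measured range; both choices, and the function names, are assumptions for the example rather than details from the source.

```python
from bisect import bisect_left


def interpolate(x, xs, ys):
    """Linearly interpolate the curve (xs, ys) at x; 0.0 outside its range.
    `xs` must be sorted in ascending order."""
    if x < xs[0] or x > xs[-1]:
        return 0.0
    j = bisect_left(xs, x)
    if xs[j] == x:
        return ys[j]
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def resample_scans(scans, common_mz):
    """scans: list of (mz_values, intensities) pairs, each on its own axis.
    Returns one intensity row per scan, all on the shared `common_mz` axis."""
    return [
        [interpolate(m, mz, inten) for m in common_mz]
        for mz, inten in scans
    ]
```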
  • the reduction may include the step of omitting or excluding (e.g., deleting) data determined to not be necessary for further analysis (e.g., the training of a machine learning model, as described below) after a processing step or processing steps (e.g., as described above) have been performed or executed.
  • identified features (i.e., peaks) of the breath sample that cannot be matched or associated with compounds may be deleted or omitted.
  • data determined to not be necessary for the generating of a health report may be deleted or omitted.
  • a code or program may be configured to reduce the data, e.g., as described above.
  • the code may be wrapped (i.e., the code may be encapsulated in a wrapper function).
  • the reduction code or program automatically reduces the identifier associated breath biopsy output file to generate an intuitive data set.
  • an overview of the results of the breath sample assay may be generated from the data of the converted identifier associated breath biopsy output file or the intuitive data set generated therefrom.
  • the overview may include the number of peaks found, the peaks found at different m/z values over the time the assay was run, total ion current, various statistical analyses, the number of matched or associated compounds detected per identified breath, an intensity distribution, a histogram of the number of features per m/z value, etc.
  • the overview may additionally contain data from the breath collection device or system.
  • the overview may contain the flow rate at which a breath sample was collected, the volume of a breath sample, the temperature of a breath sample, a value of abundance of water vapor or carbon dioxide in a breath sample (e.g., the percentage of water vapor or carbon dioxide in a breath sample), etc.
  • the overview may display or convey the results of the breath sample assay on a per assayed breath basis. In some cases, this may allow outlier breaths to be identified and potentially excluded from the health report in order to, e.g., enhance the accuracy of the results.
  • outlier breaths are identified using a machine learning model such as, e.g., a machine learning model trained or including architecture as described below.
  • outlier breaths may be identified using a rules-based system.
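  • One possible rules-based check is sketched below: a breath is flagged when its peak count lies more than `k` median absolute deviations from the median count across breaths. The rule, the default `k`, and the function name are assumptions chosen for illustration; the source does not specify which rules are used.

```python
from statistics import median


def flag_outlier_breaths(peak_counts, k=3.0):
    """peak_counts: {breath_id: number of peaks found in that breath}.
    Returns the sorted ids of breaths whose count deviates from the median
    by more than k times the median absolute deviation (MAD)."""
    values = list(peak_counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to judge against
    return sorted(
        breath_id
        for breath_id, count in peak_counts.items()
        if abs(count - med) > k * mad
    )
```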
  • the overview may indicate potential problems including, but not limited to, problems associated with the breath sample quality, possible contamination, etc.
  • an operator or technician may choose to adjust the machine configuration or capture additional breath samples based, at least in part, on feedback provided by the overview.
  • the overview may be generated in real time. By real time is meant the overview is generated during or immediately following the breath sample assay (e.g., during collection of the breath sample or while the breath sample is being analyzed using, e.g., a mass spectrometry analyzer). In some instances, the overview is generated in two hours or less.
  • the overview is generated in one hour or less, such as thirty minutes or less, or twenty minutes or less, or ten minutes or less, or five minutes or less, or one minute or less.
  • one or more of the identifier associated breath biopsy output file, the intuitive data set generated from the breath biopsy output file, the metadata file associated with the breath biopsy output file, or the overview of the results of the breath sample assay may be saved or archived to a database such as, e.g., a database including a data warehouse.
  • one or more non-breath assay health records of the subject are associated with the identifier of the identifier associated breath biopsy output file, the intuitive data set, the metadata file, and/or the overview. The one or more non-breath assay health records of the subject are then saved or archived to the database (e.g., data warehouse) with the breath biopsy files.
  • FIG. 1 provides a depiction of an overview of the results obtained from an identifier associated breath biopsy output file in accordance with an embodiment of the invention.
  • Overview (i.e., Quicklook) 100 includes header 101 and selectable menu 102 provided to assist a viewer in navigating between sections of a health report when, e.g., the report and the overview are both displayed on an electronic viewing device (e.g., a computer or a smart phone).
  • Session summary 103 provides information pertaining to the session in which the breath sample assay was performed.
  • the overview further includes the identifier of the subject 104 as well as various charts and graphs depicting data of the intuitive data set generated from the breath sample assay.
  • Graph 105 depicts the TIC per sample number (i.e., scan number), with the orange line indicating sample numbers wherein an exhaled breath is received by the mass spectrometer.
  • Graph 106 depicts the m/z value of compounds detected by the mass spectrometer over time.
  • Graph 107 depicts the total number of peaks found per identified exhaled breath received by the mass spectrometer.
  • Graph 108 depicts a histogram of the number of features detected per m/z value, with colors indicating which identified exhaled breath each bin belongs to.
  • the overview may be generated, at least in part, using a trained machine learning algorithm. In these cases, the overview may further indicate breaths determined to not be of a sufficient quality that were excluded from downstream analysis (e.g., to generate a health report).
  • FIG. 2 provides a depiction of a method for generating an intuitive data set from an identifier associated breath biopsy output file in accordance with an embodiment of the invention.
  • an identifier associated breath biopsy output file (produced from a RAW file as described above) is converted to mzML format.
  • data from the mzML file is passed to a wrapper function configured to reduce the data of the mzML file and produce an intuitive data set using, e.g., rules-based approaches.
  • the wrapper function performs peak analysis in order to, e.g., identify and associate peaks of the breath sample assay with target compounds. The results of the peak analysis are then passed into a data frame at step 203.
  • the wrapper function automatically identifies and labels individual breaths in the sample.
  • the results of the breath identification are then passed into a data frame at step 205.
  • the wrapper function rectifies or corrects spectra in order to, e.g., reduce noise or correct the m/z or intensity value of an identified peak or peaks.
  • the results of the spectral rectification are then passed into a data frame at step 207.
  • Steps 202, 204, and 206 may be performed concurrently or sequentially in any order. For example, step 204 may be performed prior to step 202. In this case, the peaks not occurring within an identified breath may be omitted from the peak analysis.
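  • The case in which breath identification runs before peak analysis can be sketched as a simple filter: peaks recorded outside every identified breath are dropped before they reach the peak analysis. The tuple layouts and the function name are hypothetical.

```python
def peaks_within_breaths(peaks, breaths):
    """Keep only peaks whose scan index falls inside an identified breath.

    peaks:   list of (scan_index, mz, intensity)
    breaths: list of (start_scan, end_scan) pairs, end inclusive
    """
    return [
        peak for peak in peaks
        if any(start <= peak[0] <= end for start, end in breaths)
    ]
```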
  • metadata associated with the identifier associated breath biopsy output file may be captured or obtained.
  • the metadata e.g., a metadata file
  • the output of the reducing wrapped function i.e., the intuitive data set
  • the metadata and intuitive data set may be saved via local storage and/or cloud storage and, e.g., may be saved to a database such as a data warehouse.
  • the metadata and intuitive data set are associated with one or more non-breath assay health records of the subject before being saved or archived with the non-breath assay health record(s).
  • the one or more non-breath assay health records may be associated with an identifier of the subject (e.g., as discussed above) and saved before or after the breath assay data.
  • the association of the non-breath assay data with the breath assay data may then be made in the database.
  • the metadata and intuitive data set are used to generate an overview of the results of the breath sample assay (i.e., a Quicklook or a Quicklook report).
  • the metadata and intuitive data set are used to generate other reports such as, e.g., a health report as described in greater detail below.
  • the health report may be generated based on correlations and relationships determined from the previously stored metadata, breath biopsy output files and/or intuitive data sets of a plurality of subjects in combination with one or more non-breath assay health records of each subject.
  • a dynamic model such as a machine learning model (e.g., as described below) may be trained and updated each time step 209 is run (i.e., whenever new data is stored or archived).
  • the health report may then be generated, at least in part, using the trained machine learning model.
  • embodiments of the methods include obtaining a health record associated with a disease or condition for each subject.
  • By obtain is meant to make the health record(s) accessible or available for the subsequent steps of the methods (e.g., available for training the machine learning model).
  • health records associated with a disease or condition are health records that indicate a diagnosis of the disease or condition in the subject.
  • health records associated with a disease or condition are health records that disclose the manifestation of signs or symptoms of the disease or condition in the subject.
  • the disease or condition may be the relative condition of the subject’s overall health or the health or condition of an organ or system of the subject’s body.
  • the disease or condition may be any disease or condition that impairs or affects the normal functioning of the body.
  • the disease or condition may be, e.g., an infectious disease, deficiency disease, hereditary disease, or physiological disease.
  • the infectious disease may be, e.g., a bacterial disease or infection (such as, e.g., syphilis, pneumonia, tetanus, and/or tuberculosis), a viral disease or infection (such as, e.g., chickenpox, measles, herpes, the common cold, or COVID-19), a fungal disease or infection (such as, e.g., ringworm infection, athlete’s foot, or yeast infections), or a parasite or parasitic disease (such as, e.g., malaria).
  • where the disease or condition is a deficiency disease, the deficiency disease may be, e.g., malnutrition, scurvy, rickets, osteoporosis, or a birth defect.
  • the hereditary disease may be, e.g., cystic fibrosis, Huntington’s Disease, sickle cell anemia, a birth defect, etc.
  • the disease or condition may be affected by, but not unilaterally caused by, genetics or may be a polygenic disease.
  • the disease or condition may be caused by a combination of genetic and environmental factors and may be asthma, an autoimmune disease such as multiple sclerosis, cancer (e.g., colon, skin, or lung cancer), ciliopathy, cleft palate, diabetes, chronic obstructive pulmonary disease, heart disease, hypertension, inflammatory bowel disease, an intellectual disability, a mood disorder, obesity, refractive error, infertility, schizophrenia, or any number of a variety of mental disorders.
  • where the disease or condition is a physiological disease, the physiological disease may be, e.g., diabetes, cancer, hypertension, or heart disease.
  • the disease or condition may include any disease or condition caused by environmental factors, behavior, or diet.
  • the disease or condition may be a psychological disease or condition such as, e.g., an anxiety disorder, depression, bipolar disorder, post-traumatic stress disorder (PTSD), schizophrenia, an eating disorder, a disruptive behavior and/or dissocial disorder, or a neurodevelopmental disorder.
  • the disease or condition may be hypothermia, hyperthermia, or may otherwise result from exposure to prolonged or extreme hot or cold temperatures.
  • the disease or condition may result from an injury or may affect mobility.
  • the disease or condition may be toxin exposure or may result from the exposure of the subject to one or more toxins or sources of toxins.
  • the disease or condition may be the presence of a compound of interest, such as a toxin, in the breath and/or body of the subject.
  • the one or more toxins includes one or more carcinogens.
  • Carcinogens of interest include, but are not limited to, carcinogens classified as being Group 1 carcinogens by the International Agency for Research on Cancer (IARC).
  • a Group 1 classification indicates that an agent (e.g., a compound) exhibits sufficient evidence of carcinogenicity in humans.
  • Carcinogens of interest also include, but are not limited to, carcinogens classified as Group 2A carcinogens by the IARC.
  • Group 2A classification indicates that an agent (e.g., a toxin) is probably carcinogenic.
  • embodiments of the methods include obtaining a health record associated with a disease or condition for each subject.
  • the health record includes one or more of a personal health record (PHR), electronic medical record (EMR), or electronic health record (EHR) of the subject.
  • the health record includes self-reported health data such as, e.g., the subject’s responses to a survey or a health information questionnaire (e.g., as described above).
  • the health record may include non-health data.
  • the non-health data may include information regarding the subject that has the potential to affect, or be affected by, the subject’s health.
  • the non-health data may include one or more cohorts to which the subject belongs such as, e.g., the subject’s profession, the various tasks or responsibilities associated with the subject’s profession, or the location in which the subject lives or works (e.g., country, state, city, local geography, proximity to locations of interest such as, e.g., industrial facilities, etc.).
  • the health record includes one or more non-breath health assessments. While the one or more non-breath health assessments may vary, in some instances, the one or more health assessments may include a health assessment selected from the group consisting of a lung health assessment, an assessment of fitness for a given task(s), a medical imaging assessment (e.g., an ultrasound assessment), a biological sample assessment (e.g., urine tests, feces tests, blood tests, biopsies, etc.) and combinations thereof. In some instances, the biological sample assessment may include a blood panel such as, e.g., a complete blood count (CBC).
  • the CBC may include counts of white blood cells, red blood cells and platelets, the concentration of hemoglobin, the hematocrit, red blood cell indices, white blood cell differentials, etc.
  • the non-breath health assessment may include a microbiome test or assay (e.g., 16S sequencing or shotgun metagenomic sequencing).
  • the non-breath health assessment may include a genetic test or DNA testing.
  • the health record may include physiological data, such as, but not limited to, one or more of heart rate, blood glucose, blood pressure, respiration rate, body temperature, blood volume, sound pressure, photoplethysmography, electroencephalogram, electrocardiogram, blood oxygen saturation, and skin conductance.
  • physiological data may be obtained using a wearable device.
  • Wearable devices in accordance with embodiments of the methods may include, but are not limited to, smartwatches (e.g., Apple watches, Garmin watches, or Fitbit® watches), sleep trackers (e.g., Oura rings), or heart rate monitors.
  • the wearable device may include motion sensors (e.g., accelerometers and gyroscopes), electrical sensors (e.g., electrocardiogram sensors), or light sensors (e.g., photoplethysmography (PPG) sensors).
  • the wearable device is a medical Internet of Things (IoT) device.
  • Medical IoT devices of interest may include, but are not limited to, implanted medical devices (IMDs) (e.g., insulin pumps or defibrillators), wearable medical devices (e.g., continuous glucose monitors), and discrete devices (e.g., IoT enabled blood pressure cuffs).
  • the non-breath health assessments and/or physiological data may be associated with the diagnosis of a disease or the assessment of a condition in the subject.
  • the health assessments and/or physiological data may have been used to inform the diagnosis of a disease or assess a condition in the subject.
  • the health assessments and/or physiological data may reflect a sign or symptom of a disease or condition in the subject.
  • the health assessments and/or physiological data may regard a subject diagnosed with a disease or condition or having been assessed as having a given condition of overall health, organ health, or system health (e.g., lung health is excellent, overall good, somewhat poor, overall poor, etc.).
  • the health assessments and/or physiological data may regard a subject known to be free of a disease or condition (e.g., the subject is healthy, the subject does not have COPD, etc.).
  • the health records may be obtained directly or indirectly from the subject, a caregiver or provider of the subject, or a database or data warehouse (e.g., as described above).
  • the health records may be obtained, at least in part, by converting the health records to a form compatible with a subsequent step or steps of the methods.
  • the health records may be converted from a format difficult for machines to interpret to a format in a standard computer language that can be read automatically by a machine.
  • OCR optical character recognition
  • the health records data may be converted to a JSON format, an XML format, a CSV format, a CSON format, an HTML format, etc.
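  • As a small illustration of such a conversion, the sketch below turns health record rows held as CSV text into JSON with the Python standard library. The column names and the function name are hypothetical; real records would carry whatever fields the source system exports.

```python
import csv
import io
import json


def csv_records_to_json(csv_text):
    """Parse CSV text with a header row and return the rows as a JSON array
    of objects, one object per record."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)
```

  • Each resulting object keeps the column names as keys, so date or diagnosis-code fields remain available for categorizing the record.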
  • organizational or categorical information structuring or classifying the health records data may be manually entered.
  • one or more components of a health record (e.g., as discussed above) or a section thereof, may be categorized using date or diagnosis codes (such as, e.g., diagnosis codes associated with a disease or condition).
  • organizational or categorical information may be automatically identified from structured digital health records data and used to identify or classify one or more components of a health record, or sections thereof, using, e.g., lines of computer code and rules-based approaches or supervised machine learning approaches paired with natural language processing software.
  • the EHR data may be obtained by scanning or imaging a plurality of health records existing in hard copy form, followed optionally by conversion of the resulting image files in any of the manners discussed above.
  • embodiments of the methods include obtaining a health record associated with a disease or condition for each subject.
  • Health records associated with a disease or condition may be, e.g., health records that indicate a diagnosis of the disease or condition in the subject or disclose the manifestation of signs or symptoms of the disease or condition in the subject.
  • the condition may be the relative condition of the subject’s overall health, an organ of the subject, or a system of the subject’s body.
  • the disease or condition may be any disease or condition that impairs or affects the normal functioning of the body.
  • the health record may include a personal health record (PHR), electronic medical record (EMR), electronic health record (EHR), self-reported health data, non-health data, non-breath health assessment and/or physiological data regarding the subject and may be provided by the subject, a caregiver or provider of the subject, or a database or data warehouse as described above.
  • obtaining the health records may include converting the health records to a form that can be read automatically by a machine and is compatible with a subsequent step or steps of the methods (e.g., automatic supervised training of a machine learning model).
  • the health records obtained for each subject, together with the breath biopsy output files generated for each subject may then be used to train a machine learning model to identify a relationship between breath samples and a disease or condition of interest, as discussed in greater detail below.
  • embodiments of the methods include training a machine learning model to identify a relationship between breath samples and a disease or condition using generated breath biopsy files and obtained health records.
  • By training is meant providing or feeding the breath biopsy output files and one or more elements of the obtained health records to the machine learning model so that the model can adjust one or more of its components (e.g., weights or biases) in order to effectively (e.g., accurately or efficiently) perform a task.
  • the machine learning model in accordance with embodiments of the methods, may vary and may include, but is not limited to, any of the models discussed below.
  • the training may further include validating and testing.
  • the obtained health records are used to interpret the findings or inferences generated by a machine learning model using the subject’s breath.
  • the findings or inferences generated by the machine learning model using the subject’s breath may include changes in a health state or a condition of health of the subject.
  • the machine learning model may be trained to indicate a change in the fingerprint of a subject’s breath using unsupervised machine learning techniques.
  • the subject may provide breath samples (e.g., to generate breath biopsy output files) at two or more timepoints such that the most recent sample provided by the subject can be compared to a baseline.
  • the baseline may include breath sample data generated from a breath sample provided by the subject 1 day prior to the most recently provided breath sample, 1 week prior to the most recent breath sample, 1 month prior to the most recent breath sample, 6 months prior to the most recent breath sample, 1 year prior to the most recent breath sample, 5 years prior to the most recent breath sample, etc.
  • the machine learning model may then use data generated from the most recent breath sample provided by the subject and the baseline in order to look for temporal changes of the subject’s breath fingerprint.
  • the obtained health records (including, e.g., health records obtained at the time the baseline breath sample was provided and/or health records obtained at the time the most recent breath sample was provided) may then be used to interpret any identified temporal changes.
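  • A minimal sketch of the baseline comparison described above. It assumes each breath sample has already been reduced to a vector of relative abundances over the same set of features, and it uses a plain cosine distance as a stand-in for the unsupervised model. The function names and the threshold of 0.1 are hypothetical and do not come from the source.

```python
import math


def cosine_distance(baseline, recent):
    """Return 1 - cosine similarity of two equal-length abundance vectors."""
    if len(baseline) != len(recent):
        raise ValueError("vectors must cover the same features")
    dot = sum(x * y for x, y in zip(baseline, recent))
    norm_b = math.sqrt(sum(x * x for x in baseline))
    norm_r = math.sqrt(sum(y * y for y in recent))
    if norm_b == 0 or norm_r == 0:
        raise ValueError("abundance vectors must be non-zero")
    return 1.0 - dot / (norm_b * norm_r)


def fingerprint_changed(baseline, recent, threshold=0.1):
    """Flag a temporal change when the distance exceeds an assumed threshold."""
    return cosine_distance(baseline, recent) > threshold
```

A flagged change would then be interpreted against the health records obtained at the two timepoints, as the preceding passage describes.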
  • the tasks performed by the machine learning model may depend on the nature of the disease or condition of interest.
  • the machine learning model may be trained to identify features of a breath sample (e.g., the relative abundance of a set of metabolites or other compounds) that correspond or correlate with a diagnosis of the disease or condition in order to, e.g., identify a signature of the disease or condition.
  • the machine learning model may then be applied to a breath biopsy output file generated for a subject (i.e., separate from the breath biopsy output files used for training) in order to indicate a diagnosis of the disease or condition in the subject using the identified features.
  • the machine learning model may be applied to a breath biopsy output file generated for a subject in order to indicate the likelihood that the subject has a disease or condition, or a prediction as to whether the subject may develop a disease or condition (e.g., if they maintain their current lifestyle).
  • the condition is the relative condition of the subject’s overall health or the relative condition of an organ or system of the subject’s body.
  • the machine learning model may be trained to classify a breath biopsy output file using a numerical score representative of the overall health or the relative condition of an organ or system of the subject providing the breath sample.
  • the tasks performed by the machine learning model may depend on the nature of health records obtained for each subject.
  • the machine learning model may be trained to identify relationships between features of a breath sample and features of the health assessment in order to classify the breath sample as belonging to a subject having the disease or condition.
  • the machine learning model may be trained to identify features in the breath sample that correspond to the presence of specific bacteria or genes in the microbiome.
  • the trained machine learning model may then be able to identify specific bacteria or genes in the microbiome of a subject by analyzing the subject’s breath (i.e., a breath biopsy output file generated from the subject’s breath).
  • the machine learning model may be trained to only identify features in the breath sample that correspond to the presence of specific bacteria or genes in the microbiome that are indicative of a disease or condition of interest (e.g., using a microbiome assessment and a disease or condition diagnosis).
  • the machine learning model may be trained to utilize both the health assessment and the breath biopsy file in order to identify subjects at risk for, or having, a disease or condition of interest.
  • the machine learning model may be trained to identify breath assay data of insufficient quality.
  • bad breath assay data may be labeled (e.g., automatically or by a person of skill in the art) in order to train the machine learning model to recognize data of insufficient quality as the result of, e.g., ambient air or contamination.
  • the machine learning model may be trained to identify bad data (e.g., data of insufficient quality) using any of the techniques or methods used to train the machine learning model as described below (e.g., the machine learning model may be trained to determine a fingerprint for bad data).
  • the machine learning model in accordance with embodiments of the methods, may vary and may include, but is not limited to, any of the models discussed below or any standard machine learning model, as well as combinations thereof, as is known in the art.
  • the machine learning model may depend on, e.g., the nature of the obtained health records and the disease(s) or condition(s) of interest.
  • the relationships between features of the breath samples and features of the health records identified by the machine learning model may be obtained or extracted for downstream analysis.
  • the machine learning model may include, or be configured to employ, a linear and/or logistic regression algorithm, a linear discriminant analysis algorithm, a support vector machine (SVM) algorithm, a random forest algorithm, a K-Nearest Neighbors algorithm, a decision tree algorithm, or an XGBoost algorithm.
  • the relationships between features of the breath samples and features of the health records identified by the machine learning model may be difficult to obtain or extract and/or may be unknown to the individuals implementing the model (e.g., the relationships may be too complex to be understood or interpreted by a human or the relationships may be contained in a component of the machine learning model considered to be a “black box”).
  • the features of interest (e.g., of a compound, toxin source, disease, or condition fingerprint) may include unidentified peaks or measurements (i.e., m/z signals).
  • the machine learning model may include an artificial neural network (NN).
  • the machine learning model is a deep learning model.
  • the model may be three or more layers deep, such as five or more layers deep, or ten or more, or twelve or more, or thirty or more, or fifty or more, or one hundred or more.
  • the data of the breath biopsy output files may be provided in an image format (e.g., as a total ion current (TIC) chromatogram or spider diagram).
  • the machine learning model may be configured to process images and may include, or be based on, a convolutional neural network (CNN), recurrent neural network (RNN), region-convolutional neural network (R-CNN), etc.
  • the machine learning model is configured to process sequential input data.
  • the machine learning model may include, or be based on, a recurrent neural network (RNN) model or a transformer model.
  • the RNN may include, e.g., long short-term memory (LSTM) architecture, gated recurrent units (GRUs), or attention (i.e., may employ the attention technique or include an attention unit).
  • the machine learning model may include, or be based on, the architecture of a transformer model.
  • the machine learning model may be configured to process sequential input data.
  • the sequential input data may be a sequence of scans presented, e.g., as temporally linked numerical matrices or images.
  • the machine learning model may be configured to learn from the contextual information of a scan (i.e., the scans before or after a given scan sequentially/temporally).
  • the machine learning model may learn from the contextual information of a scan and, e.g., may learn from the past to present context of a scan and/or the present to past context of a scan.
  • the machine learning model may learn from both the past to present context and the present to past context of a scan (i.e., the machine learning model may be bidirectional).
  • the machine learning model may include, or be based on, e.g., a bi-directional LSTM model, an RNN model with attention, a convolutional recurrent neural network model with attention (CRNN-A), or a transformer model.
  • the transformer model may include decoder blocks, encoder blocks and/or encoder/decoder architecture.
  • the machine learning model may be trained using supervised learning methods.
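  • relevant data of interest (e.g., disease diagnoses, gene expression, microbiome bacteria, etc.) may be extracted from the health records and used to label the corresponding breath biopsy output file of each subject with the labels or categories of interest, and the labeled breath biopsy data may then be used to train the machine learning algorithm.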
  • the extraction of the labels, association of the extracted labels with the generated breath biopsy output files, and training of the machine learning model are performed automatically using, e.g., lines of computer code and rules-based approaches or supervised machine learning approaches paired with natural language processing software.
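  • A rough sketch of a rules-based version of the automatic labeling described above. The rule table, label names, subject identifiers, and file names are all hypothetical, and the pattern matching is deliberately naive: it does not handle negation (e.g., "no history of COPD") the way a natural language processing pipeline would.

```python
import re

# Hypothetical rules: label name -> pattern searched for in free-text records.
LABEL_RULES = {
    "copd": re.compile(r"\b(copd|chronic obstructive pulmonary disease)\b", re.I),
    "colon_cancer": re.compile(r"\bcolon cancer\b", re.I),
}


def extract_labels(record_text):
    """Return the sorted labels whose pattern occurs in the record text."""
    return sorted(
        label for label, pattern in LABEL_RULES.items()
        if pattern.search(record_text)
    )


def label_breath_files(health_records, breath_files):
    """Associate each breath biopsy output file with its subject's labels.

    health_records: subject identifier -> health record text
    breath_files:   subject identifier -> breath biopsy output file name
    """
    return {
        file_name: extract_labels(health_records.get(subject_id, ""))
        for subject_id, file_name in breath_files.items()
    }
```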
  • the health records that include relevant data of interest may be scarce.
  • semi-supervised learning methods may be employed.
  • unsupervised learning methods may be employed and, e.g., the categories or classifications generated by training the machine learning model may be correlated or associated with certain characteristics of patient cohorts or certain components of obtained health records after training.
  • both supervised and unsupervised learning methods may be employed.
  • unsupervised learning methods may be used to detect any temporal changes in breath fingerprints that occur in the plurality of subjects (e.g., as described above). Characteristics of the temporal changes may then be extracted and labeled (e.g., using labels extracted from health records) in order to train a machine learning model using supervised machine learning techniques.
  • the model training algorithms and hyperparameters used to control the training may depend on, e.g., the nature or architecture of the machine learning model, the tasks the machine learning model is trained to perform, the desired accuracy or efficiency of the machine learning model, and/or the nature or size of the training data set.
  • the training may include methods of preventing data overfitting such as, e.g., dilution and dropout techniques.
  • the training and/or the training data set (e.g., the labeled breath biopsy output files) may be modified or altered to address class imbalance.
  • by class imbalance is meant a skewed proportion of the classes that make up a data set.
  • labeled breath biopsy data reflecting a specific relationship or classification may be relatively uncommon in the data set.
  • the training may be modified or altered to address class imbalance.
  • the optimization loss may be weighted based on class distributions. In these cases, the weighting may be learned dynamically, e.g., during training.
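  • One common way to weight a loss by class distribution is sketched below, assuming static inverse-frequency weights rather than weights learned dynamically during training. The function names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean negative log-likelihood.

    probs[i] maps each class to the predicted probability for sample i.
    Probabilities are clamped to avoid log(0).
    """
    total = sum(
        -weights[y] * math.log(max(p[y], 1e-12))
        for p, y in zip(probs, labels)
    )
    return total / sum(weights[y] for y in labels)
```

With eight majority samples and two minority samples, the minority class receives a weight four times larger, so errors on rare labels contribute more to the loss.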
  • the training data set may be modified to address class imbalance. In these instances, the majority class may be undersampled.
  • breath biopsy data not labeled with the diagnosis of a disease or condition may be randomly undersampled.
  • the majority class or classes may be randomly undersampled to achieve a ratio of one to five minority class (i.e., rare relationship or classification) to majority class(es) or less.
  • the majority class(es) may be undersampled to achieve a ratio of one to fifty minority class (i.e., rare relationship or classification) to majority class(es) or less, such as one to twenty, or one to ten, or one to five, or one to four.
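  • The random undersampling described above can be sketched as follows. This is a minimal illustration assuming a single minority label; the function name, parameter names, and fixed seed are hypothetical.

```python
import random


def undersample_majority(samples, labels, minority_label,
                         majority_per_minority=5, seed=0):
    """Randomly drop majority samples until the minority-to-majority
    ratio is at most 1 : majority_per_minority."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    keep_n = min(len(majority), majority_per_minority * len(minority))
    kept = sorted(minority + rng.sample(majority, keep_n))
    return [samples[i] for i in kept], [labels[i] for i in kept]
```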
  • the training may further include testing the trained machine learning model or machine learning models.
  • by testing in this context is meant evaluating the trained machine learning model using labeled breath biopsy data different from the labeled breath biopsy data used for training after the machine learning model has finished training.
  • a first subset of the labeled breath biopsy data is used for training and a second subset of the labeled breath biopsy data is used for testing.
  • the testing may use one or more metrics to evaluate the performance of the trained machine learning model or machine learning models.
  • the one or more metrics may vary and may depend on the tasks performed by the trained machine learning model, the training methods employed to train the machine learning model, and the architecture of the machine learning model.
  • the metric may include the number, or percent, of true positives, false positives, true negatives, or false negatives for one or more classes.
  • the metric may include a sensitivity, specificity, accuracy and/or F-score.
  • a metric may be determined per class.
  • the F-score may include a macro F1-score.
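  • The metrics named above follow from the confusion counts for each class. The sketch below uses the standard definitions; the function names are hypothetical, and undefined ratios are returned as 0.0 by assumption.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def per_class_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, and F1 treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return _ratio(sum(t == p for t, p in zip(y_true, y_pred)), len(y_true))


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(per_class_metrics(y_true, y_pred, c)["f1"] for c in classes) / len(classes)
```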
  • the metric may include a silhouette coefficient or any other method of evaluating an unsupervised machine learning model such as, e.g., any of the methods found in: Palacio-Nino, J., Galiano, F.B. Evaluation Metrics for Unsupervised Learning Algorithms, which are herein incorporated by reference.
  • the metric may be used to determine if the trained machine learning model performs sufficiently using, e.g., a predetermined threshold (i.e., requirement). In these instances, if the trained machine learning model does not meet the predetermined threshold, the model may be discarded and/or another model may be trained.
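  • For reference, the silhouette coefficient can be computed directly from pairwise distances. The sketch below assumes Euclidean distance and numeric feature vectors, and it scores singleton clusters as 0.0 by convention. The function name is hypothetical.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points (requires at least two clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Values near 1 indicate compact, well-separated clusters; values near 0 or below indicate overlapping or misassigned clusters.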
  • one or more of the model architecture, training and/or the training data set may be modified prior to training.
  • machine learning models are trained until a trained machine learning model meets the predetermined threshold.
  • the division between the first and second subsets of the labeled breath biopsy data used for training and testing, respectively, may vary. In some cases, roughly 80% of the labeled breath biopsy data may be used for training and roughly 20% for testing. In some instances, roughly 70% of the labeled breath biopsy data may be used for training and roughly 30% for testing.
  • the training may further include validating the trained machine learning model or machine learning models.
  • by validating in this context is meant evaluating the machine learning model during training using labeled breath biopsy data different from the labeled breath biopsy data used for training and testing.
  • a first subset of the labeled breath biopsy data is used for training, a second subset of the labeled breath biopsy data is used for testing, and a third subset of the labeled breath biopsy data is used for validating.
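  • A minimal sketch of dividing labeled data into the three disjoint subsets described above. The 70/20/10 proportions are an assumption chosen for illustration; the source gives only the two-way 80/20 and 70/30 examples. The function name and seed are hypothetical.

```python
import random


def split_dataset(items, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle and split items into disjoint train, test, and validation subsets.

    Whatever remains after the train and test subsets goes to validation.
    """
    if train_frac <= 0 or test_frac < 0 or train_frac + test_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation
```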
  • the validating may use one or more metrics to evaluate the performance of the machine learning model or machine learning models such as, e.g., any of the metrics discussed above for testing.
  • the machine learning model may be continuously updated based, e.g., on newly generated breath biopsy output files and newly obtained health records.
  • the machine learning model may be continuously updated based, e.g., on the data saved or archived to a database, or data warehouse, as discussed above.
  • the machine learning model may be updated by training incrementally as new data comes in, in batches once a certain amount of new data is available, or the machine learning model may be retrained from scratch once a certain amount of new data is available.
  • the machine learning model may be updated incrementally or in batches, and then completely retrained once a certain amount of new data is available (e.g., every certain number of batch updates).
  • embodiments of the methods include training a machine learning model to identify a relationship between breath samples and a disease or condition using generated breath biopsy files and obtained health records.
  • the relationship may be difficult to obtain or extract (e.g., the relationship may be too complex to be understood or interpreted by a human or the relationship may be contained in a component of the machine learning model considered to be a “black box” such as within multiple layers of a NN).
  • the machine learning model may be trained to perform any task associated with assessing a subject’s health including any task demonstrated, or enabled, by the obtained health records as described above.
  • the machine learning model may be trained to identify relationships between features of a breath sample (e.g., the relative abundance of a set of metabolites or other compounds) and the diagnosis of a disease or condition.
  • the machine learning model may include, but is not limited to, any of the discussed models or any standard machine learning model, as well as combinations thereof, as is known in the art.
  • the machine learning model may include an artificial neural network (NN).
  • the machine learning model may include, or be based on the architecture of a recurrent neural network (RNN) or a transformer model. Training may depend on, e.g., the nature or architecture of the machine learning model, the nature of the obtained health records, and/or the nature of the disease or condition of interest.
  • the machine learning model may be trained using supervised learning methods and relevant data of interest (e.g., disease diagnoses, gene expression, microbiome bacteria) may be extracted from the health records and used to label the corresponding breath biopsy output file of each subject.
  • the machine learning model may be trained using unsupervised approaches. In some cases, both supervised and unsupervised approaches may be utilized in order to assess a subject’s health (e.g., to diagnose a disease or condition).
  • the training may further include validating and testing of the machine learning model.
  • the trained machine learning model may be applied to a breath biopsy output file (e.g., different from the files used for training) to generate a health report, as discussed in greater detail below.
  • the extracted labels are then associated with the breath biopsy output file corresponding to the patient for which the health record used to extract each label was obtained.
  • a machine learning model such as, e.g., a RNN, CNN, transformer, or regression model
  • components of the obtained health records such as, e.g., other non-breath health assessments or physiological data, are also labeled and used to train the machine learning algorithm along with/in addition to the labeled breath biopsy output files.
  • another breath biopsy output file separate from the breath biopsy output files used for training, is generated from a subject.
  • the trained machine learning algorithm is applied to the breath biopsy output file in order to classify the breath biopsy output file (step 909).
  • the breath biopsy output file is classified as, e.g., generated by a subject diagnosed with a disease or condition, reflecting one or more components of a non-breath health assessment, etc.
  • the classified breath biopsy output file, along with any other health records obtained for the subject may then be saved to a database or a data warehouse (e.g., as discussed above) in order to continuously train the machine learning model or train other machine learning models that may be applied to future breath biopsy output files.
  • embodiments of the invention include applying a trained machine learning model to a breath biopsy output file to generate a health report for the subject.
  • the health report is a qualitative or quantitative determination regarding one or more health related matters pertaining to the subject.
  • the health report generated in accordance with embodiments of the methods, may vary.
  • the health report may be generated for the subject from the data of the converted (e.g., to mzML format) identifier associated breath biopsy output file such as, e.g., from the intuitive data set generated from the breath biopsy output file.
  • the health report may be generated for the subject from the identifier associated breath biopsy output file and the metadata file associated therewith.
  • the health report may be generated or obtained based at least in part on the breath biopsy output file (i.e., breath assay data) as described above and/or on non-breath assay data (e.g., data not obtained from a breath sample).
  • a health report may be generated or obtained at two or more timepoints.
  • a health report may be generated or obtained at three or more timepoints (i.e., to generate three or more health reports, such as four or more, or five or more, or ten or more).
  • the two or more timepoints may be at least a day apart from each other, such as at least a week apart from each other, or at least a month apart from each other, or at least a year apart from each other.
  • a first timepoint of the two or more timepoints may occur after a potential exposure of the subject to a source of toxins or an indication that the subject may have a disease or condition.
  • a first timepoint of the two or more timepoints occurs before a potential exposure of the subject to a source of toxins or an indication that the subject may have a disease or condition in order to, e.g., function as a baseline as discussed above.
  • the first timepoint may occur prior to the subject initiating employment (e.g., as a firefighter) or moving to a new location.
  • the subject may be assayed (i.e., a timepoint may occur) every set number of days or months while they are at a certain location or working a certain profession (e.g., firefighting).
  • the non-breath assay data may vary.
  • the health report includes one or more non-breath health assessments. While the one or more additional health assessments may vary, in some instances, the one or more additional health assessments may include a health assessment selected from the group consisting of a lung health assessment, an assessment of fitness for a given task(s), a medical imaging assessment (e.g., an ultrasound assessment), a biological sample assessment (e.g., urine tests, feces tests, blood tests, biopsies, etc.) and combinations thereof.
  • the non-breath assay data may include a microbiome test or assay.
  • the non-breath assay data may include the medical history or health records of the subject.
  • the non-breath assay data may include physiological data, such as, but not limited to, one or more of heart rate, blood glucose, blood pressure, respiration rate, body temperature, blood volume, sound pressure, photoplethysmography, electroencephalogram, electrocardiogram, blood oxygen saturation, and skin conductance.
  • the physiological data may be obtained using a wearable device.
  • Wearable devices in accordance with embodiments of the methods may include, but are not limited to, smartwatches (e.g., Apple watches, Garmin watches, or Fitbit® watches), sleep trackers (e.g., Oura rings), or heart rate monitors.
  • the wearable device is a smartwatch such as, e.g., a Fitbit® watch.
  • the wearable device may include motion sensors (e.g., accelerometers and gyroscopes), electrical sensors (e.g., electrocardiogram sensors), or light sensors (e.g., photoplethysmography (PPG) sensors).
  • the wearable device is a medical Internet of Things (IoT) device.
  • Medical IoT devices of interest may include, but are not limited to, implanted medical devices (IMDs) (e.g., insulin pumps or defibrillators), wearable medical devices (e.g., continuous glucose monitors), and discrete devices (e.g., IoT enabled blood pressure cuffs).
  • the health report may include data from the breath biopsy output file (i.e., breath assay data) and non-breath assay data (e.g., other health assessments, the subject’s medical history, data gathered from wearable devices, etc.).
  • the health report includes an interpretation of the breath assay data and non-breath assay data.
  • the interpretation may be derived based on the breath assay data and non-breath assay data either individually and/or in combination with one another.
  • the interpretation may include the likelihood that the subject has a disease or condition (e.g., a potential diagnosis). In these instances, the interpretation may include the severity or stage of the disease or condition.
  • the interpretation may include the likelihood or risk level the subject may have of developing a disease or condition.
  • the presence of one or more compounds and the abundance (e.g., concentration) of each compound relative to one another in a breath sample may be correlated with a disease or condition fingerprint (e.g., using a machine learning model as described above).
  • the potential diagnosis and/or risk level is generated by analyzing or assaying the breath sample for the presence of one or more compounds (e.g., or unidentified m/z peaks or measurements) of a disease or condition fingerprint.
  • the potential diagnosis and/or risk level may be generated by comparing the fingerprint of the disease or condition to the m/z peaks or measurements generated from the breath sample provided by the subject (e.g., the compounds, and the values of abundance thereof, detected in the breath sample assay as indicated by the identifier associated breath biopsy output file and intuitive data set generated therefrom) using the trained machine learning algorithm.
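  • Before any model is applied, a comparison of this kind needs the sample's m/z peaks aligned with the fingerprint's m/z values. The sketch below shows one simple way to do that, assuming a parts-per-million mass tolerance. It is a preprocessing illustration only and is not the trained machine learning algorithm; the function names, the 10 ppm default, and the coverage measure are hypothetical.

```python
def match_fingerprint(sample_peaks, fingerprint_mz, tol_ppm=10.0):
    """Map each fingerprint m/z value to the summed intensity of sample
    peaks falling within the ppm tolerance (0.0 when nothing matches).

    sample_peaks: list of (mz, intensity) tuples
    """
    matched = {}
    for target in fingerprint_mz:
        tolerance = target * tol_ppm / 1e6
        matched[target] = sum(
            intensity for mz, intensity in sample_peaks
            if abs(mz - target) <= tolerance
        )
    return matched


def fingerprint_coverage(sample_peaks, fingerprint_mz, tol_ppm=10.0):
    """Fraction of fingerprint m/z values detected in the sample."""
    matched = match_fingerprint(sample_peaks, fingerprint_mz, tol_ppm)
    return sum(1 for value in matched.values() if value > 0) / len(fingerprint_mz)
```

The matched intensities could then serve as the feature vector passed to a trained model.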
  • the health report may include an interpretation of the breath assay data alone or in combination with non-breath assay data.
  • This interpretation may be generated using the trained machine learning algorithm (e.g., as discussed above) and may include a potential diagnosis and/or a risk level of a disease or condition generated, e.g., by comparing the fingerprint of a disease or condition to the determined presence of one or more compounds of a disease or condition fingerprint (e.g., and the values of abundance thereof) in the breath sample.
  • a potential diagnosis and/or a risk level for a cancer (such as, e.g., colon cancer) can be generated by comparing the determined presence of one or more compounds in the breath sample to compounds associated or correlated with colon cancer when found in breath (i.e., a determined colon cancer fingerprint of compounds or metabolites).
  • the correlation or association of compounds found in a breath sample to a specific disease or condition (i.e., the relationship between compounds found in breath and a disease or condition) can be determined by comparing the determined presence of compounds (e.g., and their relative abundances) found in the breath samples of healthy patients with the determined presence of compounds found in the breath samples of patients diagnosed with a disease or condition.
  • the correlation or association may be generated using a dynamic algorithm, such as, e.g., a machine learning model as discussed above.
  • a potential diagnosis and/or a risk level for chronic obstructive pulmonary disease may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: 2-hydroxyisobutyric acid, aspartic acid semialdehyde, acetohydroxybutanoic acid, 11-hydroxyundecanoic acid, (+)-γ-hydroxy-L-homoarginine, oxo-tetradecenoic acid, hexadecatrienoic acid, or oxo-heptadecanoic acid in the breath sample.
  • a machine learning model trained as discussed above, may be configured to provide the potential diagnosis and/or risk level using the determined presence and/or the relative abundance of one or more of the compounds disclosed above.
  • a potential diagnosis and/or a risk level for obstructive sleep apnea may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: hexonate, hexonolactone, pentose, deoxypentose, hexose, butyrylcarnitine, propionylcarnitine, acryloylcarnitine, acetylcarnitine, carnitine, dehydrocarnitine, pentitol, deoxyhexose, hexuronate, hexitol, malonate semialdehyde, hydroxypropanoate, propanoate, hydroxybutyrate, succinate semialdehyde, methylaconitate, methylcitrate, aconitate, (iso)citrate, oxoglutarate, succinate, fumarate, malate, oxaloa
  • a potential diagnosis and/or a risk level for coronavirus disease (COVID) and/or long COVID resulting from an infection of SARS-CoV-2 may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: tryptophan, glutamine, glutamic acid, citrulline, histidine, phenylalanine, neopterin, aspartic acid, or nicotinic acid in the breath sample.
  • a machine learning model trained as discussed above, may be configured to provide the potential diagnosis and/or risk level using the determined presence and/or the relative abundance of one or more of the compounds disclosed above.
  • a potential diagnosis and/or a risk level for myalgic encephalomyelitis (ME), chronic fatigue syndrome (CFS), ME/CFS, Lyme disease, or post-treatment Lyme disease may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: 1-pyrroline-5-carboxylate acid, 13-carboxy-alpha-tocopherol, 2-aminobutyric acid, 2-hydroxy-3-methylbutyrate, 2-methylglutaconic acid, 2-octenoylcarnitine, 3-hydroxylaurate, 4-hydroxyperoxy-2-nonenal, 4-hydroxyphenyllactic, 4-imidazolone-5-propanoate, 5,6-dihydrothymine, acetamidopropanal, aconitic acid, adenosine, alanine, alpha-ketoglutarate, arginine, asx (asparagine/aspartic acid), beta-
  • the breath assay data may be used to help distinguish among long COVID, ME/CFS, Lyme disease, and post-treatment Lyme disease when, e.g., a subject is experiencing symptoms of fatigue.
  • the differential diagnosis may be informed or generated, at least in part, using the trained machine learning model as discussed above.
  • the interpretation may include a general assessment of a subject’s fitness for performing a task (e.g., driving, running, etc.) or undertaking a duty or responsibility (e.g., firefighting, piloting a vehicle, policing, construction, manufacturing, etc.).
  • by fitness is meant the ability of the subject to perform, and/or the risks associated with the subject undertaking (e.g., the potential risks to themselves, others, property, etc.), a task or tasks associated with the duty or responsibility.
  • the interpretation may include a general assessment regarding the fitness of a firefighter for duty.
  • the suggested course of action may include an explanation regarding typical manners in which an individual may develop a higher risk of developing a disease or condition or a higher risk of being exposed to a toxin (e.g., sources of the toxin) and steps the subject may take to avoid or mitigate the risk.
  • the suggested course of action may include preventative measures, such as, e.g., a recommended diet or recommended personal protective equipment (PPE).
  • the suggested course of action may include a potential treatment regimen or therapy recommendation.
  • by treatment regimen is meant a treatment plan that specifies the quantity, the schedule, and the duration of treatment.
  • the treatment regimen may include a suggested drug regimen, a detoxification process, or a suggested lifestyle change (e.g., dietary or exercise plans, etc.).
  • the health report may include one or more health scores.
  • By “health score” is meant a quantitative evaluation, compared with a baseline, of the subject’s overall health, the health or condition of an organ or system of the subject’s body, a health risk facing the subject, or the subject’s fitness for performing a task or undertaking a duty or responsibility.
  • the baseline may vary, and in some instances includes the average of data associated with a cohort, such as an average level or amount of a given toxin found in a population or cohort of interest, a likelihood of developing a disease or condition in a population or cohort of interest, or the average resting or peak heart rate found in a population or cohort of interest.
  • the baseline includes prior data obtained for the subject, e.g., prior data obtained for the subject 1 day prior to generating the health report, 1 week prior to generating the health report, 1 month prior to generating the health report, 6 months prior to generating the health report, 1 year prior to generating the health report, 5 years prior to generating the health report, etc.
  • a health score is generated for the subject’s overall health, lung health, exposure to toxins, risk of developing a disease or condition, or fitness for the duty associated with their employment (e.g., firefighting).
  • the health score may be generated or obtained using the trained machine learning model as discussed above and breath assay data and/or non-breath assay data.
  • an overall health score may be generated that is a composite of the findings of the trained machine learning algorithm (e.g., applied to the breath assay data) and one or more additional health assessments (e.g., as discussed above).
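As a minimal sketch of how a health score might be computed against a cohort baseline and then combined into a composite, the Python below maps a measurement to a 0-100 scale and takes a weighted average of component scores. The function names, the 0-100 scale, the 25-points-per-standard-deviation rule, and the weights are all illustrative assumptions; the disclosure does not specify a scoring formula.

```python
from statistics import mean, pstdev


def score_against_baseline(value, cohort_values, higher_is_worse=True):
    """Map a measurement to a 0-100 score relative to a cohort baseline.

    Hypothetical rule: 50 means "equal to the cohort average", and each
    cohort standard deviation shifts the score by 25 points.
    """
    baseline = mean(cohort_values)
    spread = pstdev(cohort_values)
    if spread == 0:
        return 50.0  # no variation in the cohort, so no deviation to score
    z = (value - baseline) / spread
    if higher_is_worse:
        z = -z  # e.g., a toxin level above the cohort average lowers the score
    return max(0.0, min(100.0, 50.0 + 25.0 * z))


def composite_score(component_scores, weights):
    """Weighted average of component scores (the weights are assumptions)."""
    total_weight = sum(weights[name] for name in component_scores)
    weighted_sum = sum(score * weights[name] for name, score in component_scores.items())
    return weighted_sum / total_weight


# Illustrative values only; these are not data from the disclosure.
cohort_toxin_levels = [2.0, 4.0, 6.0]
toxin_score = score_against_baseline(4.0, cohort_toxin_levels)
overall = composite_score(
    {"lung": 80.0, "toxin": toxin_score},
    {"lung": 1.0, "toxin": 3.0},
)
```

The same function could take the subject's own prior measurements in place of `cohort_values` when the baseline is prior data for the subject rather than a cohort average.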
  • the health report may include one or more personalized insights.
  • a personalized insight may vary and includes, but is not limited to, the detection of an anomaly, a classification, the detection of a cluster, or a forecast.
  • the personalized insight includes an insight regarding the subject individually.
  • the personalized insight includes an insight regarding a group or cohort in which the subject belongs.
  • the insight may include the identification of unusual data.
  • the insight may be that a specific toxin is detected at a higher level or concentration than usual or the risk of developing a disease or condition is elevated (e.g., when compared to a baseline as described above).
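One simple way to flag "unusual data" of the kind described above is to compare a current value with the subject's own prior values. The sketch below is only an illustration: the threshold rule (mean plus `k` sample standard deviations), the default `k=2.0`, and the compound names are assumptions, and the disclosure may rely on the trained machine learning model instead.

```python
from statistics import mean, stdev


def is_unusual(prior_values, current_value, k=2.0):
    """Return True when the current value exceeds the prior mean by more than k sample std devs."""
    if len(prior_values) < 2:
        return False  # not enough history to define "usual"
    return current_value > mean(prior_values) + k * stdev(prior_values)


def insights(prior_by_compound, current_by_compound, k=2.0):
    """Build one insight string per compound detected at a higher level than usual."""
    return [
        f"{name}: detected at a higher level than usual"
        for name, value in sorted(current_by_compound.items())
        if is_unusual(prior_by_compound.get(name, []), value, k)
    ]


# Hypothetical compounds and abundance values, for illustration only.
prior = {"toxin_a": [1.0, 1.2, 0.8, 1.0], "toxin_b": [3.0, 3.1, 2.9]}
current = {"toxin_a": 2.0, "toxin_b": 3.05}
found = insights(prior, current)
```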
  • the predicted health outcome may be that the subject has a high risk of developing a specific disease or condition (e.g., chronic obstructive pulmonary disease (COPD) or a myocardial infarction (heart attack)).
  • the health outcome can be predicted at least in part using the trained machine learning algorithm, as discussed above.
  • the health report is used to determine if a particular event or source of toxin exposure has affected the subject's predicted health outcomes.
  • the two or more health reports may be used to, e.g., determine changes in exposure of the subject to toxins over time, determine a clearance time of toxins from the subject, or predict one or more health outcomes for the subject using some combination of the two or more health reports. In some cases, some combination of the two or more health reports is used to determine if a particular event or source of toxin exposure has affected the subject’s predicted health outcomes.
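To illustrate how two or more health reports might be combined to estimate a clearance time, the sketch below treats each report as a (time, abundance) pair for one toxin and returns the earliest time from which every later measurement is below the limit of detection. The data layout, the use of `None` for non-detection, and the definition of clearance are assumptions made for this example.

```python
def clearance_time(measurements, limit_of_detection):
    """Estimate when a toxin cleared, from a series of health reports.

    measurements: iterable of (time, abundance) pairs, where abundance is
    None for non-detection. Returns the earliest time from which all later
    measurements stay below the limit of detection, or None if not cleared.
    """
    cleared_at = None
    for time_point, abundance in sorted(measurements, key=lambda m: m[0]):
        below = abundance is None or abundance < limit_of_detection
        if below and cleared_at is None:
            cleared_at = time_point  # first report in the current run below the limit
        elif not below:
            cleared_at = None  # toxin detected again, so reset the estimate
    return cleared_at


# Hypothetical series: days since a known exposure and measured abundance.
reports = [(0, 5.0), (7, 2.0), (14, 0.4), (30, None)]
days_to_clear = clearance_time(reports, limit_of_detection=0.5)
```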
  • the health report may include a metabolic profile or metabolic profiles of the breath sample of the subject.
  • By “metabolic profile” is meant a higher-level view of the state of metabolic pathways or the presence of various groupings of compounds in the individual at the time the breath is collected.
  • a metabolic profile may compare a particular breath or breaths obtained from the subject to a baseline (e.g., as described above).
  • Abnormal metabolic profiles may help identify the causes of certain symptoms, screen for disease, and guide treatment regimens.
  • the metabolic profiles may be tailored to assist medical professionals with decision making. For example, compounds associated with specific diseases or symptoms, or falling under the same category of toxin, may be grouped together and intuitively displayed, e.g., with their determined levels or values of abundance.
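The grouping described above can be sketched as a small transformation from per-compound abundances to per-category lists. The compound names, categories, and values below are placeholders, and the sort order (most abundant first) is an assumption about what an "intuitive" display might look like.

```python
from collections import defaultdict


def group_compounds(abundances, categories):
    """Group compounds by category, listing the most abundant first.

    abundances: dict of compound name -> value of abundance
    categories: dict of compound name -> category label (hypothetical mapping)
    """
    grouped = defaultdict(list)
    for compound, value in abundances.items():
        grouped[categories.get(compound, "uncategorized")].append((compound, value))
    return {
        category: sorted(items, key=lambda item: item[1], reverse=True)
        for category, items in grouped.items()
    }


# Placeholder names and values, for illustration only.
profile = group_compounds(
    {"compound_a": 3.0, "compound_b": 7.0, "compound_c": 1.0},
    {"compound_a": "Group 1 carcinogens", "compound_b": "Group 1 carcinogens", "compound_c": "PFAS"},
)
```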
  • the health report may be obtained or generated, at least in part, using the trained machine learning model as discussed above.
  • any of the components of the health report (e.g., any of the components described above) may be generated or obtained, at least in part, using the trained machine learning model.
  • the classification or detection may be generated or obtained using the trained machine learning model.
  • the health report is generated in real-time, e.g., as described in United States Provisional Application Serial Nos. 63/359,134 and 63/416,185 (Attorney docket nos. DIAG-003PRV and DIAG-003PRV2, respectively); the disclosures of which are herein incorporated by reference.
  • the method further includes suggesting preventative measures based on the health report, such as, e.g., recommended personal protective equipment (PPE) to avoid potential future exposure to a toxin or the development of a disease or condition.
  • the method further includes providing a therapy recommendation to the subject based on the health report. While the therapy recommendation may vary, in some instances the therapy recommendation includes recommendations regarding the specifics of administering some existing standard of care for the treatment of a disease or condition.
  • the method further includes administering the treatment to the subject.
  • Embodiments of the methods may further include transmitting the health report, e.g., to a health care practitioner, to the subject, to an agent of the subject, etc.
  • the health report is received by a computer or mobile device application, such as a smart phone or computer app.
  • the health report is received by mail, electronic mail, fax machine, etc.
  • aspects of the invention further include methods of obtaining a health report, e.g., by breathing into a system of the invention as discussed in greater detail below; and receiving a health report from the system.
  • FIG. 3 provides a depiction of a health report obtained at least in part from a breath biopsy output file in accordance with an embodiment of the invention.
  • first page 300 of the health report includes header 301 including information pertaining to the session in which the health report was generated and identifying information of the subject.
  • Diagnostics section 302 includes breath assay data 303 including a chart summarizing results of a toxin screening and a chart depicting compounds detected in the breath assay associated with various diseases or conditions.
  • interpretation section 304 explains the significance of the breath assay data (and, e.g., the non-breath assay data) on the subject’s lung health and the health risks toxins may pose to the subject.
  • the second page 305 of the health report includes toxin health risk evolution 306 and various health scores 307 obtained, e.g., as described above.
  • personal insights 308 are also provided as charts depicting evolutions of the subject’s overall health and lung health over the previous year and up to the present timepoint the depicted health report was obtained.
  • Spider diagrams 405 depict the presence and relative abundance of compounds associated with pulmonary fibrosis, COPD, COVID/long COVID, and OSA.
  • the shape of a spider diagram may aid in the diagnosis of a disease such as, e.g., through differential diagnosis with non-breath assay data.
  • Chart 406 summarizes the results of a toxin panel.
  • Chart 407 summarizes the results of a metabolic profile including a wide variety of various compounds.
  • FIG. 5 provides a section of a health report breaking down the results of the breath sample assay as they relate to COPD, obtained at least in part from a breath biopsy output file in accordance with an embodiment of the invention.
  • FIGS. 6A-6B provide a depiction of a metabolic profile of toxins (i.e., a toxin panel) of a health report obtained at least in part from a breath biopsy output file in accordance with an embodiment of the invention.
  • first page 600 of the toxin panel includes header 601 and selectable menu 602 provided to assist a viewer in navigating between sections of the health report when, e.g., the report is displayed on an electronic viewing device (e.g., a computer or a smart phone).
  • Background section 604 is provided to explain the purpose of the toxin panel to the viewer (e.g., the subject) and session summary 603 is included providing information pertaining to the session in which the breath sample assay was performed.
  • the first page of the toxin panel further includes table 605 summarizing the findings of the toxin panel.
  • Table 605 lists each selected toxin in a row with an assigned detection level as described above, a history of toxin presence in previous breath samples provided by the subject (e.g., as determined by the findings of one or more previous health reports), and an explanation regarding the toxin as described above.
  • second page 606 of the toxin panel breaks each selected toxin into one of tables 607-609 based on a classification of each toxin (e.g., as Group 1 or Group 2A carcinogens as classified by the International Agency for Research on Cancer (IARC)).
  • Each of tables 607-609 lists the selected toxins classified in the respective category, one per row, with an assigned detection level (e.g., as described above) and a note highlighting any change in detected toxin level from a previous breath sample provided by the subject (i.e., a temporal change).
  • the second page of the toxin panel further includes chart 610 summarizing the results of the toxin panel.
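The rows of the toxin panel tables, each with a detection level and a temporal-change note, could be assembled along the following lines. This is a sketch under stated assumptions: the ordered list of detection levels and the wording of the notes are hypothetical, since the actual detection levels are defined elsewhere in the disclosure.

```python
# Hypothetical ordering of detection levels, from lowest to highest.
LEVELS = ["not detected", "trace", "elevated", "high"]


def temporal_note(previous_level, current_level):
    """Describe how a toxin's detection level changed since the previous sample."""
    delta = LEVELS.index(current_level) - LEVELS.index(previous_level)
    if delta > 0:
        return "increased since previous sample"
    if delta < 0:
        return "decreased since previous sample"
    return "unchanged"


def build_rows(current_levels, previous_levels):
    """Build one table row per toxin: name, detection level, temporal-change note."""
    rows = []
    for toxin, level in sorted(current_levels.items()):
        previous = previous_levels.get(toxin)
        note = temporal_note(previous, level) if previous is not None else "no previous sample"
        rows.append({"toxin": toxin, "level": level, "note": note})
    return rows


# Placeholder toxins and levels, for illustration only.
rows = build_rows(
    {"toxin_a": "elevated", "toxin_b": "trace", "toxin_c": "not detected"},
    {"toxin_a": "trace", "toxin_b": "trace"},
)
```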
  • aspects of the present disclosure further include systems, such as computer-controlled systems, for practicing embodiments of the above methods.
  • aspects of the systems include: a particle analyzer configured to receive a breath sample; a processor configured to receive the measurements generated by the particle analyzer; and memory operably coupled to the processor wherein the memory includes instructions stored thereon, which when executed by the processor, cause the processor to: analyze breath samples from a plurality of subjects to generate a plurality of breath biopsy output files; obtain a health record associated with a disease or condition for each subject; train a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; and apply the trained machine learning model to a breath biopsy output file to generate a health report regarding the disease or condition for a subject.
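The train-then-apply sequence recited above can be illustrated with a deliberately simple stand-in model. The sketch below uses a nearest-centroid classifier written in plain Python; the disclosure does not state which machine learning model is used, so the classifier, the feature layout (one relative abundance per compound), and the label strings are all assumptions chosen for illustration.

```python
from math import dist


def train_centroids(feature_vectors, labels):
    """Train a nearest-centroid model: one mean feature vector per health-record label."""
    sums, counts = {}, {}
    for vector, label in zip(feature_vectors, labels):
        running = sums.setdefault(label, [0.0] * len(vector))
        for i, x in enumerate(vector):
            running[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in running] for label, running in sums.items()}


def predict(centroids, vector):
    """Return the label whose centroid is closest (Euclidean distance) to the vector."""
    return min(centroids, key=lambda label: dist(centroids[label], vector))


def generate_report(centroids, subject_id, vector):
    """Apply the trained model to one subject's features and wrap the result."""
    return {"subject": subject_id, "finding": predict(centroids, vector)}


# Hypothetical features derived from breath biopsy output files, with labels
# taken from the corresponding health records.
training_features = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
training_labels = ["condition", "condition", "no condition", "no condition"]
model = train_centroids(training_features, training_labels)
report = generate_report(model, "subject-001", [0.15, 0.85])
```

A production system would more likely use an established machine learning library and a held-out validation set; the point here is only the order of operations: features and labels in, trained model out, then the model applied to a new breath biopsy output file.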
  • the particle analyzer may be a mass spectrometer.
  • the mass spectrometer may be configured to perform a variety of techniques/methods.
  • the mass spectrometer includes a high-resolution mass spectrometer (HRMS).
  • the mass spectrometer may be coupled to or include one or more of: an ion mobility spectrometer (IMS), a gas chromatograph (GC), a liquid chromatograph (LC), a differential mobility spectrometer (DMS), a field asymmetric ion mobility spectrometer (FAIMS), a selected-ion flow tube mass spectrometer (SIFT-MS), a proton-transfer-reaction mass spectrometer (PTR-MS), a time-of-flight mass spectrometer (TOF-MS), etc.
  • the mass spectrometer may be a Thermo Scientific high-resolution mass spectrometer (e.g., Thermo Scientific Exactive™, Q-Exactive™, Exploris™) or a SCIEX high-resolution mass spectrometer (e.g., TripleTOF® mass spectrometer system).
  • the ionizer is configured to perform SESI.
  • the ionizer may be a SUPER SESI™ device (e.g., a SUPER SESI™ QE or SUPER SESI™-X device).
  • the ionizer may be configured to ionize particles in the breath sample, wherein the mass spectrometer may be configured to generate measurements of the mass-to-charge ratio of the ionized particles.
  • the mass spectrometer is configured to provide real-time feedback of the breath sample assay related to the quality of the breath sample.
  • the ionizer and mass spectrometer are configured to assay the breath sample in real time with respect to the subject providing the breath sample.
  • the mass spectrometer is configured to measure the time of detection of a toxin or toxin associated compound in the breath sample assay.
  • the systems may further include means for delivering a breath sample (e.g., one or more exhaled breaths of the breath sample) from the subject to the particle analyzer.
  • these delivery means may include a mouthpiece configured to seal to the lips of a subject and receive the breath sample from the subject.
  • the delivery means may additionally include a breath chamber configured to receive the breath sample from the mouthpiece.
  • the breath chamber is operably coupled to the ionizer.
  • the delivery means may further include a valve configured to do one or more of: direct the breath sample along a desired flow path, control the flow rate of the breath sample into the ionizer, or block the flow of ambient air/the breath sample.
  • the breath chamber is configured to produce exhaled breath condensate (EBC) from the breath sample.
  • the system may include means for chilling the breath chamber. Chilling means may include, but are not limited to, a freezer or refrigerator, dry ice, or liquid nitrogen.
  • the system may further include aerosolization means configured to aerosolize the EBC prior to ionization such as, e.g., a nebulizer.
  • the system may further include means for stably storing the EBC such as, e.g., a refrigerator or a freezer.
  • the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to generate an identifier-associated breath biopsy output file, an intuitive data set generated from the breath biopsy output file, and/or a metadata file associated with the breath biopsy output file according to any of the methods as discussed above.
  • the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to generate a plurality of breath biopsy output files for a plurality of subjects and obtain a health record associated with a disease or condition for each subject according to any of the methods as discussed above.
  • the instructions, when executed by the processor, may cause the processor to train a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records according to any of the methods as discussed above.
  • the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to generate a health report regarding the disease or condition for a subject according to any of the methods as discussed above.
  • the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to generate an intuitive data set based on the identifier-associated breath biopsy output file according to any of the methods as discussed above.
  • the instructions, when executed by the processor, may cause the processor to reduce the data of the identifier-associated breath biopsy output file in order to generate the intuitive data set according to any of the methods as discussed above.
  • the instructions, when executed by the processor, may cause the processor to first generate the intuitive data set before generating the health report according to any of the methods as discussed above.
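One plausible reading of "reducing the data" of an identifier-associated breath biopsy output file is to keep only the signal for a panel of target compounds. The sketch below does that by matching peaks to target m/z values within a tolerance. The peak layout, the parts-per-million tolerance, the target names, and the m/z values are all hypothetical; the actual reduction method is described elsewhere in the disclosure.

```python
def reduce_peaks(peaks, targets, tolerance_ppm=5.0):
    """Keep, for each target compound, the strongest peak within the m/z tolerance.

    peaks: list of (mz, intensity) pairs from the output file (assumed layout)
    targets: dict of compound name -> expected m/z (hypothetical values)
    Returns a dict of compound name -> intensity, or None for non-detection.
    """
    reduced = {}
    for name, target_mz in targets.items():
        window = target_mz * tolerance_ppm / 1e6
        matches = [intensity for mz, intensity in peaks if abs(mz - target_mz) <= window]
        reduced[name] = max(matches) if matches else None
    return reduced


def make_intuitive_data_set(identifier, peaks, targets):
    """Generate the reduced data set first; a report step would consume this afterwards."""
    return {"identifier": identifier, "compounds": reduce_peaks(peaks, targets)}


# Placeholder peaks and targets, for illustration only.
raw_peaks = [(100.0002, 500.0), (100.0003, 700.0), (150.0, 900.0), (200.01, 50.0)]
data_set = make_intuitive_data_set(
    "sample-0001", raw_peaks, {"compound_a": 100.0, "compound_b": 200.0}
)
```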
  • systems further include one or more computers for complete automation or partial automation of the methods described herein.
  • systems include a computer having a computer readable storage medium with a computer program stored thereon.
  • the system includes an input module, a processing module and an output module.
  • the subject systems may include both hardware and software components, where the hardware components may take the form of one or more platforms, e.g., in the form of servers, such that the functional elements, i.e., those elements of the system that carry out specific tasks (such as managing input and output of information, processing information, etc.), may be carried out by the execution of software applications on and across the one or more computer platforms of the system.
  • the processing module includes a processor which has access to a memory having instructions stored thereon for performing the steps of the subject methods.
  • the processing module may include an operating system, a graphical user interface (GUI) controller, a system memory, memory storage devices, and input-output controllers, cache memory, a data backup unit, and many other devices.
  • the processor may be a commercially available processor or it may be one of other processors that are or will become available.
  • the processor executes the operating system and the operating system interfaces with firmware and hardware in a well-known manner, and facilitates the processor in coordinating and executing the functions of various computer programs that may be written in a variety of programming languages, such as Java, Perl, C++, Python, other high-level or low-level languages, as well as combinations thereof, as is known in the art.
  • the operating system, typically in cooperation with the processor, coordinates and executes functions of the other components of the computer.
  • the operating system also provides scheduling, input-output control, file and data management, memory management, and communication control and related services, all in accordance with known techniques.
  • the processor may be any suitable analog or digital system.
  • the processor includes analog electronics which provide feedback control, such as for example negative feedback control.
  • some embodiments include a computer program product including a computer usable medium having control logic (a computer software program, including program code) stored therein.
  • the control logic, when executed by the processor of the computer, causes the processor to perform functions described herein.
  • some functions are implemented primarily in hardware using, for example, a hardware state machine. Implementation of the hardware state machine so as to perform the functions described herein will be apparent to those skilled in the relevant arts.
  • Memory may be any suitable device in which the processor can store and retrieve data, such as magnetic, optical, or solid-state storage devices (including magnetic or optical disks or tape or RAM, or any other suitable device, either fixed or portable).
  • the processor may include a general-purpose digital microprocessor suitably programmed from a computer readable medium carrying necessary program code. Programming can be provided remotely to the processor through a communication channel, or previously saved in a computer program product such as memory or some other portable or fixed computer readable storage medium using any of those devices in connection with memory.
  • a magnetic or optical disk may carry the programming, and can be read by a disk writer/reader.
  • Systems of the invention also include programming, e.g., in the form of computer program products, algorithms for use in practicing the methods as described above.
  • Programming according to the present invention can be recorded on computer readable media, e.g., any medium that can be read and accessed directly by a computer.
  • Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; portable flash drive; and hybrids of these categories such as magnetic/optical storage media.
  • the processor may also have access to a communication channel to communicate with a user at a remote location.
  • By “remote location” is meant the user is not directly in contact with the system and relays input information to an input manager from an external device, such as a computer connected to a Wide Area Network (“WAN”), telephone network, satellite network, or any other suitable communication channel, including a mobile telephone (e.g., a smartphone).
  • systems according to the present disclosure may be configured to include a communication interface.
  • the communication interface includes a receiver and/or transmitter for communicating with a network and/or another device.
  • the communication interface can be configured for wired or wireless communication, including, but not limited to, radio frequency (RF) communication (e.g., Radio-Frequency Identification (RFID), Zigbee communication protocols, WiFi, infrared, wireless Universal Serial Bus (USB), Ultra Wide Band (UWB), Bluetooth® communication protocols, and cellular communication, such as code division multiple access (CDMA) or Global System for Mobile communications (GSM)).
  • the communication interface is configured to include one or more communication ports, e.g., physical ports or interfaces such as a USB port, an RS-232 port, or any other suitable electrical connection port to allow data communication between the subject systems and other external devices such as a computer terminal (for example, at a physician’s office or in hospital environment) that is configured for similar complementary data communication.
  • the communication interface is configured for infrared communication, Bluetooth® communication, or any other suitable wireless communication protocol to enable the subject systems to communicate with other devices such as computer terminals and/or networks, communication enabled mobile telephones, personal digital assistants, or any other communication devices which the user may use in conjunction.
  • the communication interface is configured to provide a connection for data transfer utilizing Internet Protocol (IP) through a cell phone network, Short Message Service (SMS), wireless connection to a personal computer (PC) on a Local Area Network (LAN) which is connected to the internet, or WiFi connection to the internet at a WiFi hotspot.
  • the subject systems are configured to wirelessly communicate with a server device via the communication interface, e.g., using a common standard such as 802.11 or Bluetooth® RF protocol, or an IrDA infrared protocol.
  • the server device may be another portable device, such as a smart phone, Personal Digital Assistant (PDA) or notebook computer; or a larger device such as a desktop computer, appliance, etc.
  • the server device has a display, such as a liquid crystal display (LCD), as well as an input device, such as buttons, a keyboard, mouse or touch-screen.
  • the communication interface is configured to automatically or semi-automatically communicate data stored in the subject systems, e.g., in an optional data storage unit, with a network or server device using one or more of the communication protocols and/or mechanisms described above.
  • Output controllers may include controllers for any of a variety of known display devices for presenting information to a user, whether a human or a machine, whether local or remote. If one of the display devices provides visual information, this information typically may be logically and/or physically organized as an array of picture elements.
  • a graphical user interface (GUI) controller may include any of a variety of known or future software programs for providing graphical input and output interfaces between the system and a user, and for processing user inputs.
  • the functional elements of the computer may communicate with each other via a system bus. Some of these communications may be accomplished in alternative embodiments using network or other types of remote communications.
  • the output manager may also provide information generated by the processing module to a user at a remote location, e.g., over the Internet, phone or satellite network, in accordance with known techniques.
  • the presentation of data by the output manager may be implemented in accordance with a variety of known techniques.
  • data may include SQL, HTML or XML documents, email or other files, or data in other forms.
  • the data may include Internet URL addresses so that a user may retrieve additional SQL, HTML, XML, or other documents or data from remote sources.
  • the one or more platforms present in the subject systems may be any type of known computer platform or a type to be developed in the future, although they typically will be of a class of computer commonly referred to as servers.
  • They may also be a mainframe computer, a workstation, or other computer type. They may be connected via any known or future type of cabling or other communication system including wireless systems, either networked or otherwise. They may be co-located or they may be physically separated.
  • Various operating systems may be employed on any of the computer platforms, possibly depending on the type and/or make of computer platform chosen. Appropriate operating systems include Windows, iOS, Oracle Solaris, Linux, IBM i, Unix, and others.
  • Aspects of the present disclosure further include non-transitory computer readable storage media having instructions for practicing the subject methods.
  • Computer readable storage mediums may be employed on one or more computers for complete automation or partial automation of a system for practicing methods described herein.
  • instructions in accordance with the method described herein can be coded onto a computer-readable medium in the form of “programming”, where the term “computer readable medium” as used herein refers to any non-transitory storage medium that participates in providing instructions and data to a computer for execution and processing.
  • Non-transitory computer-readable media include all computer-readable media except for a transitory, propagating signal.
  • examples of non-transitory storage media include a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, non-volatile memory card, ROM, DVD-ROM, Blu-ray disk, solid state disk, and network attached storage (NAS), whether or not such devices are internal or external to the computer.
  • a file containing information can be “stored” on computer readable medium, where “storing” means recording information such that it is accessible and retrievable at a later date by a computer.
  • the computer-implemented method described herein can be executed using programming that can be written in one or more of any number of computer programming languages. Such languages include, for example, Python, Java, JavaScript, C, C#, C++, Go, R, Swift, PHP, as well as many others.
  • the non-transitory computer readable storage medium may be employed on one or more computer systems having a display and operator input device. Operator input devices may, for example, be a keyboard, mouse, or the like.
  • the processing module includes a processor which has access to a memory having instructions stored thereon for performing the steps of the subject methods.
  • the processing module may include an operating system, a graphical user interface (GUI) controller, a system memory, memory storage devices, input-output controllers, cache memory, a data backup unit, and many other devices.
  • the processor may be a commercially available processor or it may be one of other processors that are or will become available.
  • the methods and systems of the invention find use in a variety of applications where it is desirable to make a qualitative or quantitative determination regarding one or more health-related matters pertaining to a subject.
  • the methods and systems described herein find use when it is desirable to enhance the accuracy of differential diagnoses.
  • Embodiments of the present disclosure find use in applications wherein it is desired to acquire additional health information through non-invasive diagnostic procedures in order to, e.g., detect exposure to toxins or facilitate the early diagnosis of various diseases and conditions and, correspondingly, provide for improvements in patient outcomes.
  • the subject methods and systems may facilitate carcinogen exposure testing of a subject or the generation of data useful for the diagnosis of a disease or condition by low/minimally trained technicians.
  • the subject methods and systems may facilitate diagnosis for one or more conditions, insight on one or more health risks, or recommendations for one or more therapies or treatments.
  • the breath sample of a healthy subject was assayed for the presence of twelve Group 1 carcinogens.
  • the breath sample assay was run using a high-resolution mass spectrometer coupled to a SUPER SESI™ device run in negative-ion mode.
  • the numbers reflect a value of abundance and a “ — ” denotes non-detection. Non-detection indicates that the respective compound was not present in the breath sample above the limit of detection.
  • Most carcinogens are detected at trace levels, some in only one or two of the five breaths assayed.
  • The results of the breath sample assay appear in Table 1, below:
  • breath samples of two subjects were assayed for the presence of six PFAS compounds.
  • the breath sample assay was run using a high-resolution mass spectrometer coupled to a SUPER SESI™ device run in negative-ion mode.
  • the numbers in the peak m/z column reflect the absolute value of a ratio of mass (i.e., Daltons) to charge at the center of the peak determined to correspond to the relevant compound.
  • the numbers in the integrated IEC column reflect the area appearing under each respective peak on a produced extracted ion chromatogram, indicating relative abundance of the respective PFAS compound in the breath sample.
  • a “ — ” denotes non-detection. Non-detection indicates that the respective compound was not present in the breath sample above the limit of detection.
  • PFAS compounds are detected at trace levels. Some PFAS compounds are not detected in the breath sample assay, which may indicate a high elimination rate of the PFAS compound in the human body or a limited exposure of the subject to the PFAS compound.
  • the results of the breath sample assay appear in Table 2 for Subject 1 and Table 3 for Subject 2, as can be seen below:
  • the breath sample of a healthy subject was assayed for the presence of TCE and six TCE-associated byproducts.
  • the breath sample assay was run using a high-resolution mass spectrometer coupled to a SUPER SESI™ device run in negative-ion mode.
  • the numbers reflect a value of abundance and a “ — ” denotes non-detection. Non-detection indicates that the respective compound was not present in the breath sample above the limit of detection.
  • Most agents are detected at trace levels, some in only one or two of the five breaths assayed. Some agents are not detected in any breaths of the breath sample assay, which may indicate a high elimination rate of the agent in the human body or a limited exposure of the subject to the agent.
  • Table 4 Detection of TCE and associated byproducts
  • the breath sample of a healthy subject was assayed for the presence of compounds associated with COPD, Pulmonary Fibrosis, COVID/Long COVID, and OSA when found in the human breath.
  • the breath sample assay was run using a high-resolution mass spectrometer coupled to a SUPER SESI™ device.
  • FIG. 4 provides the results of assaying the breath of the subject for the presence of various disease related compounds in accordance with an embodiment of invention.
  • Spider diagrams 405 depict the presence and relative abundance of compounds associated with pulmonary fibrosis, COPD, COVID/long COVID, and OSA.
  • the shape of a spider diagram may aid in the diagnosis of a disease such as, e.g., through differential diagnosis with non-breath assay data.
  • FIG. 5 provides the results of assaying the breath of the subject for the presence of various compounds associated with COPD in accordance with an embodiment of invention.
  • the box plot in results section 506 intuitively displays the determined presence of various compounds associated with COPD when found in breath.
  • a health report was generated based in part on an identifier associated breath biopsy output file generated from a breath sample assay in accordance with embodiments of the invention.
  • FIG. 3 provides a depiction of the health report obtained in part from the identifier associated breath biopsy output file.
  • first page 300 of the health report includes header 301 including information pertaining to the session in which the health evaluation was generated and identifying information of the subject.
  • Diagnostics section 302 includes breath assay data 303 including a chart summarizing results of a toxin screening and a chart depicting compounds detected in the breath assay associated with various diseases or conditions.
  • Interpretation section 304 explains the significance of the breath assay data on the subject’s lung health and the health risks toxins may pose to the subject.
  • the second page 305 of the health report includes toxin health risk evolution 306 and various health scores 307 obtained as described above.
  • Personal insights 308 are provided as charts depicting evolutions of the subject’s overall health and lung health over the previous year and up to the present timepoint the depicted health evaluation was obtained.
  • FIGS. 6A-6B provide a depiction of a metabolic profile of toxins (i.e., a toxin panel) of a health report obtained from the breath sample assay.
  • first page 600 of the toxin panel includes header 601 and selectable menu 602 for navigating between sections of the health report when it is displayed on an electronic viewing device.
  • Background section 604 is also provided along with session summary 603 providing information pertaining to the session in which the breath sample assay was performed.
  • Table 605 summarizes the findings of the toxin panel, listing each selected toxin in a row with an assigned detection level reflecting a relative value of abundance for the toxin.
  • second page 606 of the toxin panel breaks each selected toxin into one of tables 607-609 based on a classification of each toxin.
  • Each of tables 607-609 list selected toxins classified in the respective category in a row with the assigned detection level and a note highlighting any changes in detected toxin level from a previous breath sample of the subject.
  • Chart 610 summarizes the results of the breath sample assay.
  • a range includes each individual member.
  • a group having 1-3 articles refers to groups having 1, 2, or 3 articles.
  • a group having 1-5 articles refers to groups having 1, 2, 3, 4, or 5 articles, and so forth.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Public Health (AREA)
  • Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Molecular Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Chemical & Material Sciences (AREA)
  • Computational Linguistics (AREA)
  • Urology & Nephrology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Biochemistry (AREA)
  • Pulmonology (AREA)
  • Immunology (AREA)
  • Medicinal Chemistry (AREA)
  • Food Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • Hematology (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Surgery (AREA)
  • Veterinary Medicine (AREA)
  • Physiology (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

Methods of generating a health report for a subject are provided. Aspects of the methods include: analyzing breath samples from one or a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files; obtaining a health record associated with a disease or condition for the or each subject; training a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; applying the trained machine learning model to a breath biopsy output file to generate a health report regarding the disease or condition for a subject. Aspects of the present invention further include methods of generating the breath biopsy output file and methods of generating real time feedback to enhance accuracy and monitor data quality. Also provided are systems for use in practicing methods of the invention.

Description

RAPID GENERATION OF BREATH-BASED HEALTH REPORTS AND
SYSTEMS FOR USE IN THE SAME
CROSS-REFERENCE TO RELATED APPLICATIONS
Pursuant to 35 U.S.C. § 119(e), this application claims priority to the filing dates of United States Provisional Patent Application Serial No. 63/461,498, filed on April 24, 2023; United States Provisional Patent Application Serial No. 63/416,185, filed on October 14, 2022; and United States Provisional Patent Application Serial No. 63/359,134, filed on July 7, 2022, the disclosures of which applications are herein incorporated by reference.
INTRODUCTION
Medical diagnosis is the process of determining which disease or condition explains a person's symptoms and signs. The information required for diagnosis is typically collected from a history and physical examination of the person seeking medical care. Often, one or more diagnostic procedures, such as medical tests, are also done during the process.
A diagnosis, in the sense of diagnostic procedure, can be regarded as an attempt at classification of an individual's condition into separate and distinct categories that allow medical decisions about treatment and prognosis to be made. Diagnosis is often challenging because many signs and symptoms are nonspecific. For example, redness of the skin (erythema), by itself, is a sign of many disorders and thus does not tell the healthcare professional what is wrong. Thus differential diagnosis, in which several possible explanations are compared and contrasted, must be performed. This involves the correlation of various pieces of information followed by the recognition and differentiation of patterns.
Early diagnosis of diseases and conditions can be important in the successful treatment thereof. As such, there is continued interest in the development of protocols and systems which can provide for rapid generation of relevant information that can be used in the diagnosis of diseases and conditions, including the early diagnosis of diseases and conditions.
SUMMARY
The inventors have realized that the rapid acquisition of health information through non-invasive diagnostic procedures would greatly enhance the accuracy of differential diagnoses for many diseases and conditions. As such, new rapid and non-invasive diagnostic procedures are needed that facilitate the early diagnosis of various diseases and conditions and, correspondingly, provide for improvements in patient outcomes. Embodiments of the invention meet this need.
Methods of generating a health report for a subject are provided. Aspects of the methods include: analyzing breath samples from one or a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files; obtaining a health record associated with a disease or condition for the one or each subject; training a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; and applying the trained machine learning model to a breath biopsy output file, different from the breath biopsy output files used to train the model, to generate a health report regarding the disease or condition for a subject or subjects. Aspects of the present invention further include methods of generating the breath biopsy output file and methods of generating real time feedback to enhance accuracy and monitor data quality. Also provided are systems for use in practicing methods of the invention.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 depicts an overview of the results obtained from an identifier associated breath biopsy output file generated from a breath sample assay in accordance with an embodiment of the invention.
FIG. 2 provides a flow diagram depicting a method for generating an intuitive data set from an identifier associated breath biopsy output file in accordance with an embodiment of the invention.
FIG. 3 depicts a health report obtained at least in part from a breath biopsy output file generated from a breath sample assay in accordance with an embodiment of invention.
FIG. 4 illustrates various metabolic profiles of a health report obtained at least in part from a breath biopsy output file generated from a breath sample assay in accordance with an embodiment of invention.
FIG. 5 depicts a section of a health report breaking down the results of the breath sample assay as they relate to COPD in accordance with an embodiment of invention.
FIGS. 6A-6B illustrate a metabolic profile of toxins (i.e., a toxin panel) of a health report obtained at least in part from a breath biopsy output file generated from a breath sample assay in accordance with an embodiment of invention.
FIG. 7 provides a flow diagram depicting a method for dynamically adjusting breath collection automatically based on real-time feedback in accordance with an embodiment of the invention.
FIGS. 8A-8B illustrate selected ion monitoring (SIM) automatically performed based on real-time feedback in accordance with an embodiment of the invention.
FIG. 9 provides a flow diagram depicting a method for training a machine learning model using generated breath biopsy output files and obtained health records in accordance with an embodiment of the invention.
DETAILED DESCRIPTION
Methods of generating a health report for a subject are provided. Aspects of the methods include: analyzing breath samples from one or a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files; obtaining a health record associated with a disease or condition for one or each subject; training a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; applying the trained machine learning model to a breath biopsy output file, different from the breath biopsy output files used to train the model, to generate a health report regarding the disease or condition for a subject. Aspects of the present invention further include methods of generating the breath biopsy output file and methods of generating real time feedback to enhance accuracy and monitor data quality. Also provided are systems for use in practicing methods of the invention.
Before the present invention is described in greater detail, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Certain ranges are presented herein with numerical values being preceded by the term "about." The term "about" is used herein to provide literal support for the exact number that it precedes, as well as a number that is near to or approximately the number that the term precedes. In determining whether a number is near to or approximately a specifically recited number, the near or approximating unrecited number may be a number which, in the context in which it is presented, provides the substantial equivalent of the specifically recited number.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, representative illustrative methods and materials are now described.
All publications and patents cited in this specification are herein incorporated by reference as if each individual publication or patent were specifically and individually indicated to be incorporated by reference and are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited. The citation of any publication is for its disclosure prior to the filing date and should not be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed.
It is noted that, as used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.
As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. Any recited method can be carried out in the order of events recited or in any other order which is logically possible. While the apparatus and method have been or will be described for the sake of grammatical fluidity with functional explanations, it is to be expressly understood that the claims, unless expressly formulated under 35 U.S.C. §112, are not to be construed as necessarily limited in any way by the construction of "means" or "steps" limitations, but are to be accorded the full scope of the meaning and equivalents of the definition provided by the claims under the judicial doctrine of equivalents, and in the case where the claims are expressly formulated under 35 U.S.C. §112 are to be accorded full statutory equivalents under 35 U.S.C. §112.
METHODS
As summarized above, methods of generating a health report for a subject are provided. Aspects of the methods include: analyzing breath samples from a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files; obtaining a health record associated with a disease or condition for each subject; training a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; applying the trained machine learning model to a breath biopsy output file, different from the breath biopsy output files used to train the model, to generate a health report regarding the disease or condition for a subject. Aspects of the present invention further include methods of generating the breath biopsy output file and methods of generating real time feedback to enhance accuracy and monitor data quality.
Analyzing the Breath Sample
As described above, embodiments of the methods include analyzing breath samples from a plurality of subjects with a breath analyzer. The breath sample of the subject or subjects that is analyzed (i.e., assayed) may vary, and may be made up of 1 or more breaths, where in some instances the number of breaths ranges from 1 to 25, such as 1 to 20, including 1 to 15, e.g., 1 to 10, including 1 to 5 exhaled breaths. In some instances, the period of time between each exhaled breath received for the breath assay may vary, where in some instances the time between each received exhaled breath ranges from 1 to 180 seconds, such as 10 to 120, including 15 to 100, e.g., 20 to 90, including 20 to 60 seconds. In other embodiments, each exhaled breath of the breath sample may be received consecutively with respect to the previously received exhaled breath.
The breath sample may be a gaseous breath sample or an exhaled breath condensate (EBC) of the breath sample. In embodiments wherein the breath sample is an EBC, the EBC may be collected by having the subject exhale into a container, cooling the container, then collecting the EBC on the inside walls of the cooled container. The container may be cooled by, e.g., chilling the container in a freezer or refrigerator, with dry ice, or using liquid nitrogen. In some embodiments, the EBC may be stored for a period of time before assaying. In some instances, the EBC is stored for a period of time such as 24 hours or more, or 48 hours or more, or 72 hours or more, or 4 days or more, or 5 days or more, or 6 days or more, or 1 week or more, or 2 weeks or more, or 3 weeks or more, or 4 weeks or more, or 1 month or more. In embodiments where the breath sample is a condensate, methods may include aerosolization of the condensate prior to assaying using, e.g., a nebulizer.
Embodiments of the method may further include shipping the breath sample (e.g., EBC) to a remote location for assaying. A “remote location” is a location other than the location at which the breath sample is collected. For example, a remote location could be another location (e.g., office, lab, etc.) in the same city, another location in a different city, another location in a different state, another location in a different country, etc. As such, when one item is indicated as being “remote” from another, what is meant is that the two items can be in the same room but separated, or at least in different rooms or different buildings, and can be at least one mile, ten miles, or one hundred miles or more apart.
Breath analyzers, in accordance with embodiments of the methods, may vary. In some embodiments, the analyzer includes a Raman spectroscopy analyzer, a breathalyzer, an optical absorbance sensing analyzer, a gas chromatography analyzer, electronic sensing using an electronic nose, a nuclear magnetic resonance spectroscopy analyzer, or a mass spectrometry analyzer. In some embodiments, the breath analyzer includes a mass spectrometry analyzer such as, e.g., a high-resolution mass spectrometry (HRMS) analyzer. In embodiments where the breath analyzer includes a mass spectrometry analyzer (i.e., the analyzer is configured to perform mass spectrometry), the mass spectrometry method/technique employed by the analyzer may vary and the analyzer may be coupled with or include (e.g., may be configured to perform) one or more of: ion mobility spectrometry (IMS), gas chromatography (GC), liquid chromatography (LC), differential mobility spectrometry (DMS), field asymmetric ion mobility spectrometry (FAIMS), a selective-ion flow tube (i.e., SIFT-MS), a proton-transfer-reaction (i.e., PTR-MS), time-of-flight mass spectrometry (TOF-MS), etc. For example, the mass spectrometry analyzer may perform IMS-mass spectrometry (IMS-MS), GC-mass spectrometry (GC-MS), LC-mass spectrometry (LC-MS), etc. In some embodiments, tandem mass spectrometry may be performed using, e.g., two or more mass spectrometry analyzers. In embodiments where the breath analyzer includes a mass spectrometry analyzer, the ionization method/technique employed by the analyzer may vary and may include matrix-assisted laser desorption/ionization (MALDI), atmospheric pressure chemical ionization (APCI), atmospheric pressure photoionization (APPI), electrospray ionization (ESI), secondary electrospray ionization (SESI), etc. In some embodiments, the ionization technique employed is a soft ionization technique.
In these instances, the mass spectrometry analyzer may be configured to perform SESI such as, e.g., SESI-HRMS or SESI-TOF-HRMS. In embodiments where the ionization method/technique employed by the mass spectrometry analyzer is SESI, the breath sample may be a gaseous breath sample (e.g., collected directly from the subject or aerosolized after being collected as an EBC). In embodiments where the ionization method/technique is SESI, the mass spectrometry analyzer may include a SUPER SESI™ (e.g., SUPER SESI™-HRMS) device.
As described above, the mass spectrometry analyzer may be configured to perform SESI mass spectrometry (e.g., SESI-HRMS). The SESI mass spectrometry may be run in positive-ion mode (i.e., wherein ionization occurs through protonation, or positive ions enter the mass spectrometer) or negative-ion mode (i.e., wherein ionization occurs through deprotonation, or negative ions enter the mass spectrometer). In some embodiments, the SESI mass spectrometry analyzer is run in negative-ion mode. In embodiments where the breath assay includes SESI mass spectrometry run in negative-ion mode, the ionization agent may vary. In some embodiments, the ionization agent includes water. In some embodiments, the ionization agent includes formic acid. In embodiments where the ionization agent includes formic acid, the formic acid may be diluted in water, such as diluted to achieve a ratio ranging from 0.01-1.0% volume over volume (v/v) of formic acid to water, such as 0.05-0.5% v/v of formic acid to water, or 0.1-0.2% v/v of formic acid to water.
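By way of illustration only, the dilution arithmetic implied by the v/v ranges above can be sketched as follows. This is a minimal sketch, not part of the disclosed methods: the function name `formic_acid_volume_ml` is hypothetical, and the percentage is interpreted as volume of formic acid relative to volume of water, which is an assumption about how the stated ratio is applied.

```python
def formic_acid_volume_ml(water_ml: float, percent_vv: float) -> float:
    """Volume of formic acid (mL) to add to `water_ml` of water.

    Assumption: percent_vv is the formic acid volume expressed as a
    percentage of the water volume, restricted to the broadest range
    stated in the text (0.01-1.0% v/v).
    """
    if water_ml <= 0:
        raise ValueError("water volume must be positive")
    if not 0.01 <= percent_vv <= 1.0:
        raise ValueError("percent_vv is outside the 0.01-1.0% v/v range")
    return water_ml * percent_vv / 100.0


# Hypothetical example: 0.1% v/v in 1000 mL of water.
volume = formic_acid_volume_ml(1000.0, 0.1)
```

Under these assumptions, 1000 mL of water at 0.1% v/v corresponds to about 1 mL of formic acid.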
As discussed above, mass spectrometry methods and techniques employed in accordance with embodiments of the methods may vary. In some embodiments, mass spectrometry techniques that may be employed include, but are not limited to, those disclosed in U.S. Patent No. 11,075,068 and the patent documents cited therein, which methods are incorporated herein by reference; and Singh, K.D., Tancev, G., Decrue, F. et al. Standardization procedures for real-time breath analysis by secondary electrospray ionization high-resolution mass spectrometry. Anal Bioanal Chem 411, 4883-4898 (2019). https://doi.org/10.1007/s00216-019-01764-8. In some embodiments, the mass spectrometry analyzer may be a Thermo Scientific high-resolution mass spectrometer (e.g., Thermo Scientific Exactive™ or Q-Exactive™) or a SCIEX high-resolution mass spectrometer (e.g., a TripleTOF® mass spectrometer system). In some embodiments, the breath sample is assayed in real time with respect to the subject providing the breath sample. Assaying the breath sample in real time with respect to the subject providing the breath sample may, e.g., minimize any chemical changes taking place which may impact the results of the breath sample assay. In these embodiments, compounds that are exhaled from deeper in the lungs may be detected relatively later in the assay. In some embodiments, the time of detection of a compound in the breath sample assay is used to identify and validate the detection of the compound or provide other information, e.g., related to the fingerprint of a compound, toxin source, disease, or condition in the breath sample or the pharmacokinetics of a compound.
In some embodiments, real-time feedback of the measurements of the mass spectrometry analyzer may be generated and used to enhance the accuracy of relevant measurements. By relevant measurement is meant a mass-to-charge ratio (m/z) measurement of a feature of interest. In some embodiments, the feature of interest may be a compound of interest (e.g., the m/z of the compound of interest or a metabolite thereof). In some embodiments, the feature of interest may be one or more m/z measurements of a compound, toxin source, disease, or condition fingerprint. By fingerprint is meant a unique set of identified (e.g., as unique compounds or metabolites thereof) and/or unidentified m/z peaks or measurements and the context of the m/z peaks or measurements (e.g., the relative intensities of the m/z peaks, the temporal position of the m/z peaks in a breath, or any other context determined to be significant by a machine learning model during training whether known or unknown, as discussed in greater detail below) that are unique to a specific subject, sample type, compound and/or circumstance. For example, a subject’s breath may have a specific fingerprint, a compound may have a specific fingerprint such that it is able to be identified in a subject’s breath, a toxin source or a toxin may have a specific fingerprint such that exposure of a subject to the toxin source or toxin may be determined using the subject’s breath, a disease or condition may have a specific fingerprint such that, e.g., the risk the subject has of developing the disease or condition or the diagnosis of the disease or condition may be determined using the subject’s breath, etc.
In some cases, the fingerprint may include the abundance (e.g., concentration) of a unique set of metabolites or other compounds in relation to each other or in relation to other compounds found in the subject’s breath (i.e., the relative abundance of the set of metabolites or other compounds or combinations thereof) determined using identified m/z peaks. In some instances, the fingerprint may include a temporal component. For example, the relative intensity of a set of m/z peaks or measurements of a fingerprint may change with the time of detection (e.g., as air is exhaled from deeper portions of the lungs). In some embodiments, the fingerprint may be generated by a machine learning model. In these instances, the real-time measurements may be fed to the trained machine learning model in order to generate features of interest (i.e., relevant measurements) for which accuracy may be enhanced, as discussed in greater detail below. In some instances, the mass spectrometry analyzer is dynamically adjusted in real-time based on real-time measurement feedback provided, e.g., for each breath assayed from a subject.
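As a non-limiting illustration of the relative-abundance and temporal aspects of a fingerprint described above, the following minimal sketch normalizes a set of m/z peak intensities and compares an early and a late portion of a single breath. The function names, the m/z values, and the intensities are hypothetical and chosen only for illustration; the sum-of-absolute-differences comparison is an assumption and is not stated in this disclosure.

```python
from typing import Dict


def relative_abundance(peaks: Dict[float, float]) -> Dict[float, float]:
    """Scale peak intensities (keyed by m/z) so that they sum to 1."""
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {mz: intensity / total for mz, intensity in peaks.items()}


def fingerprint_distance(a: Dict[float, float], b: Dict[float, float]) -> float:
    """Sum of absolute differences between two relative-abundance profiles.

    Peaks missing from one profile are treated as zero abundance.
    """
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)


# Hypothetical intensities for the same two m/z peaks, measured early and
# late in one exhalation (e.g., as air arrives from deeper in the lungs).
early = relative_abundance({89.024: 300.0, 117.055: 100.0})
late = relative_abundance({89.024: 100.0, 117.055: 300.0})
shift = fingerprint_distance(early, late)
```

In this sketch a larger `shift` indicates that the relative intensities of the peaks changed more between the two time windows.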
In some embodiments, selected ion monitoring (SIM) may be performed in order to enhance the accuracy of relevant measurements. In these instances, measurements of a subject’s breath generated by the mass spectrometer may be analyzed in real-time in order to search for compounds and fingerprints of interest. If evidence of a compound and/or fingerprint is found, the mass spectrometry analyzer may be configured to only measure and/or transmit one or more m/z values of select features of interest (or, e.g., limited ranges of m/z values containing selected features) in a subsequent breath sample provided by the subject. In these instances, the mass spectrometry analyzer may be configured to measure the select features (or, e.g., select range of m/z values containing features) with enhanced sensitivity and accuracy, i.e., when compared with the measurements taken before SIM. For example, by limiting the range of detected m/z values, the mass spectrometry analyzer may boost or amplify the signal of selected features of interest. In some cases, the SIM may be dynamic within a single breath and, e.g., the selected features of interest may change throughout a single breath. For example, in embodiments where an identified fingerprint of interest is characterized by different m/z measurements (e.g., identified compounds) as air is exhaled from deeper portions of the lungs, the SIM may change to monitor different m/z ranges as the time of detection within a single breath changes.
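The narrowing of detection to limited m/z ranges described above might be sketched as follows. This is an illustrative sketch only: `sim_windows` and `in_any_window` are hypothetical helper names, the half-width of 0.5 m/z units is an arbitrary assumption, and no actual instrument control interface is represented.

```python
from typing import List, Sequence, Tuple

Window = Tuple[float, float]


def sim_windows(target_mzs: Sequence[float], half_width: float = 0.5) -> List[Window]:
    """Build m/z windows around features of interest, merging any overlaps."""
    intervals = sorted((mz - half_width, mz + half_width) for mz in target_mzs)
    merged: List[Window] = []
    for low, high in intervals:
        if merged and low <= merged[-1][1]:
            # Overlaps the previous window: extend it rather than add a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], high))
        else:
            merged.append((low, high))
    return merged


def in_any_window(mz: float, windows: Sequence[Window]) -> bool:
    """Return True if an m/z value falls inside one of the monitored windows."""
    return any(low <= mz <= high for low, high in windows)


# Hypothetical features of interest found in a first breath.
windows = sim_windows([100.0, 100.6, 250.0], half_width=0.5)
```

Here the two nearby features are covered by a single merged window, so a subsequent breath would be monitored over two limited m/z ranges rather than the full scan range.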
In some embodiments, SIM is performed automatically. For example, mass spectrometry measurements may be transmitted directly to a processor configured to search for compounds and fingerprints of interest. The processor may then configure the mass spectrometry analyzer to limit detection to, and amplify the signal of, one or more select features (e.g., of compounds or fingerprints of interest) for which evidence is found thereof. In some instances, the processor may be configured to automatically perform SIM using a trained machine learning model, as discussed in greater detail below.
As discussed above, real-time feedback of the mass spectrometry analyzer may be generated and used to enhance the accuracy of relevant measurements. In some embodiments, fragmentation may be performed in order to enhance the accuracy of relevant features. In some instances, fragmentation is performed on all breath samples using, e.g., tandem mass spectrometry. In other cases, fragmentation may be performed based on real-time feedback as discussed above. For example, if evidence of a compound and/or fingerprint of interest is found, the mass spectrometry analyzer may be configured to perform a fragmentation run on the compound of interest or compounds of the fingerprint of interest. Fragmentation may vary depending on the compound or fingerprint of interest and may include, but is not limited to, collision-induced dissociation (CID), surface-induced dissociation (SID), laser induced dissociation, electron-capture dissociation (ECD), electron-transfer dissociation (ETD), negative electron-transfer dissociation (NETD), electron-detachment dissociation (EDD), photodissociation (e.g., infrared multiphoton dissociation (IRMPD) or blackbody infrared radiative dissociation (BIRD)), higher-energy C-trap dissociation (HCD), EISA, and/or charge remote fragmentation.
In some embodiments, fragmentation is performed automatically. For example, mass spectrometry measurements may be transmitted directly to a processor configured to search for compounds and fingerprints of interest. The processor may then configure the mass spectrometry analyzer to perform fragmentation of the compound of interest or compounds of the fingerprint of interest for which evidence is found thereof. In some instances, the processor may be configured to automatically perform fragmentation using a trained machine learning model, as discussed in greater detail below. In some cases the processor may be configured to automatically perform SIM and fragmentation. For example, the processor may perform SIM (e.g., as discussed above) to amplify the signal of m/z measurements pertaining to compounds and fingerprints of interest for which evidence is found thereof after receiving measurements pertaining to a first breath or group of breaths provided by a subject. If the SIM further verifies the presence of the identified compound(s) and/or fingerprint(s) of interest, the processor may then configure the mass spectrometry analyzer to perform fragmentation of the compound of interest or compounds of the fingerprint of interest in order to confirm the presence of the identified compound(s) and/or fingerprint(s) of interest in the subject’s breath.
In some instances, one or more analyzers (e.g., as described above) may be used to further verify the presence of the identified compound(s) and/or fingerprint(s) of interest. For example, after the method for dynamically adjusting breath collection automatically based on real-time feedback (e.g., as described above) is run, a further breath sample may be collected and analyzed using gas chromatography (GC) or liquid chromatography (LC) techniques, such as GC-MS or LC-MS. In some cases, the GC-MS or LC-MS may be coupled with SESI-HRMS including, e.g., in tandem with the SESI-HRMS.
In some embodiments, real-time feedback of measurements of the mass spectrometry analyzer may be generated and used to monitor data quality. In some embodiments, real-time feedback of the mass spectrometry analyzer may be automatically monitored in order to determine if the breath sample (or individual breaths thereof) is of a sufficient quality. By sufficient quality is meant capable of producing accurate breath assay results. In some embodiments, data quality may be monitored using a machine learning model as discussed in greater detail below. For example, real-time measurements may be fed to a trained machine learning model in order to determine if the measurements of an individual breath are of sufficient quality. In some embodiments, the subject may be prompted to provide an additional breath or additional breaths if a breath sample (or individual breaths thereof) is not of sufficient quality. In some embodiments, a technician or operator may monitor real-time feedback of the mass spectrometry analyzer in order to determine if the breath sample is of a sufficient quality or if one or more settings of the mass spectrometry analyzer should be adjusted.
While the methods of the invention may be employed on a variety of subjects, in some instances, the subject is a human. In some instances, the human is a protective service professional, a healthcare professional, a construction professional, a production professional, or a military professional, e.g., as is further detailed at: https://www.bls.gov/soc/2018/major_groups.htm. Of interest in certain embodiments is where the human is a protective service professional, such as a firefighter. In some instances, the methods of the invention may be employed on a subject wherein there is evidence the subject has a disease or condition or is at an elevated risk of developing a disease or condition.
In some embodiments, the plurality of subjects may include two or more subjects. In some instances, the plurality of subjects may include ten or more subjects, such as twenty or more, or fifty or more, or one hundred or more, or two hundred or more, or five hundred or more, or one thousand or more, or five thousand or more, or ten thousand or more, or one hundred thousand or more. The plurality of subjects may include the subjects of any demographic or cohort. For example, the subjects may be of any sex, gender, age, ethnicity, or race. In some cases, the plurality of subjects may include subjects associated with, or belonging to, a population or cohort of interest. By population or cohort of interest is meant a group of people banded together or treated as a group, such as a specific demographic of individuals. For example, the cohort of interest may be individuals experiencing or affected by (e.g., at risk for) a specific disease or condition. In some instances, the plurality of subjects may consist of only subjects belonging to a cohort of interest.
FIG. 7 provides a depiction of a method for dynamically adjusting breath collection automatically based on real-time feedback in accordance with an embodiment of the invention. At step S1, the subject supplies one or more initial breaths to the mass spectrometry analyzer for analysis. At step S2, real-time analysis is performed on the measurements generated by the mass spectrometry analyzer in order to identify one or more compounds or fingerprints of interest. The compounds of interest may include toxins, and the fingerprints of interest may be generated using a machine learning model. At step S3, a check is done as to whether there is evidence for the presence of a compound or fingerprint of interest. For compounds of interest, any relevant m/z signal above a predetermined level associated with noise may be considered evidence of the compound of interest. For fingerprints of interest generated using a machine learning model, if a minor adjustment of a small number of m/z signals (e.g., the intensity and/or m/z value of the signals) would result in the identification of a fingerprint of interest, there may be considered evidence of the fingerprint of interest. If evidence is found for a compound or fingerprint of interest, and SIM has not yet been performed (step S4), the mass spectrometry analyzer is automatically adjusted to “zoom in” on (e.g., limit detection to) one or more features of interest at step S5. The features of interest may be determined using a machine learning model. For example, compounds for which a minor alteration in detected intensity would change the identified fingerprint of interest may be classified as features of interest and “zoomed in” on. At step S6, a visual display, such as a liquid crystal display (LCD) screen, prompts the subject to provide one or more additional breaths to the mass spectrometry analyzer.
Steps S1 and S2 are then repeated, and the subject provides another breath or set of breaths for which real-time analysis is performed. At step S3, a check is done as to whether there is still evidence for the presence of a compound or fingerprint of interest after the measurements for the “zoomed in” on compound or compounds are received or updated. If evidence for the presence of a compound or fingerprint of interest is still present after SIM, the mass spectrometry analyzer is automatically configured to perform fragmentation for one or more features of interest at step S7. Steps S6, S1, and S2 are then repeated in order to verify the presence of the compound or fingerprint of interest, and the assay is ended. In some cases, the subject may be prompted to provide an additional breath or set of breaths prior to SIM, during SIM, and/or during fragmentation measurements as needed (e.g., at step S4). For example, if a trained machine learning algorithm or an operator monitoring breath collection determines an individual breath or set of breaths is not of a sufficient quality, another breath or set of breaths may be provided without resetting the automatic dynamic breath collection process. In some cases, the subject may be prompted to provide multiple breaths or series of breaths to support SIM (e.g., to enhance the statistical significance of results) or to gather additional data for deep learning, as described in greater detail below.
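By way of non-limiting illustration, the control flow of FIG. 7 may be sketched as a small state machine. The sketch covers only the compound-of-interest case, in which any signal above a noise level counts as evidence. The names `Mode`, `has_evidence`, `next_mode`, and `run_assay` are hypothetical, and ending the assay when no evidence is found is an assumption that the description above does not state.

```python
from enum import Enum


class Mode(Enum):
    FULL_SCAN = "full_scan"
    SIM = "sim"
    FRAGMENTATION = "fragmentation"
    DONE = "done"


def has_evidence(intensities, noise_level):
    """Step S3, compound case: any relevant signal above the noise level."""
    return any(value > noise_level for value in intensities)


def next_mode(mode, evidence):
    """Choose the analyzer mode for the next breath (hypothetical helper)."""
    if mode is Mode.DONE or not evidence:
        # Assumption: with no evidence the assay simply ends.
        return Mode.DONE
    if mode is Mode.FULL_SCAN:
        return Mode.SIM              # steps S4 and S5: zoom in on features
    if mode is Mode.SIM:
        return Mode.FRAGMENTATION    # step S7: fragmentation run
    return Mode.DONE                 # fragmentation breath analyzed; assay ends


def run_assay(breaths, noise_level):
    """Return the mode used for each breath in a sequence of intensity lists."""
    mode = Mode.FULL_SCAN
    history = []
    for intensities in breaths:
        if mode is Mode.DONE:
            break
        history.append(mode)
        mode = next_mode(mode, has_evidence(intensities, noise_level))
    return history
```

In this sketch each element of `breaths` stands for the measurements of one breath or set of breaths supplied after the prompt at step S6. A quality check that requests a repeat breath without resetting the mode could be added inside the loop.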
FIGS. 8A and 8B provide an example of SIM. In FIG. 8A, a range from 0 m/z to roughly 1750 m/z is measured in a single scan. In FIG. 8B, a smaller range from roughly 500 m/z to 750 m/z is measured in a single scan, allowing for greater sensitivity and the distinction of compounds similar in m/z value.
Data Generation and Analysis
As described above, embodiments of the methods include analyzing breath samples from a plurality of subjects with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files. The methods and techniques by which a breath biopsy file may be generated and analyzed, in accordance with embodiments of the invention, may vary. In embodiments where the breath assay includes mass spectrometry (i.e., where the breath analyzer includes a mass spectrometry analyzer), breath assay data may be generated and analyzed in real-time, e.g., as described in United States Provisional Application Serial Nos. 63/359,134 and 63/416,185 (Attorney docket nos. DIAG-003PRV and DIAG-003PRV2, respectively); the disclosures of which are herein incorporated by reference.
As summarized above, methods of generating a health report for a subject are provided. In embodiments where the breath assay includes mass spectrometry such as, e.g., SESI-MS, the breath sample may be assayed by a mass spectrometry analyzer to generate a breath biopsy output file. In some embodiments, the breath biopsy output file is a RAW file. By RAW file is meant a file that has not been compressed, encrypted, or processed. The breath biopsy output file (e.g., RAW file) may then be automatically detected. The automatically detected breath biopsy output file may then be associated with an identifier of the subject to produce an identifier associated breath biopsy output file. In some embodiments of the methods, associating the automatically detected generated breath biopsy output file with an identifier of the subject includes: receiving an identifier from the subject; and confirming that the generated breath biopsy output file is from analysis of the breath sample obtained from the subject. In some embodiments, computer code (e.g., a program) may be configured to automatically detect the breath biopsy output file and, e.g., prompt a technician or operator regarding the association of the generated output file with a subject (e.g., an identifier of the subject, such as an identification number or code). The technician or operator may then confirm the detected breath biopsy output file is from analysis of the breath sample obtained from the subject and the automatically detected breath biopsy output file may then be associated with the identifier of the subject to produce an identifier associated breath biopsy output file. In some instances, the identifier is associated with the automatically detected generated breath biopsy output file by a human operator, while in other instances the identifier is associated with the automatically detected generated breath biopsy output file by a program (e.g., after confirmation). 
In other cases, the automatically detected breath biopsy output file is automatically associated with the subject identifier without confirmation from a human operator or technician (e.g., by a program).
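By way of non-limiting illustration, automatic detection of an output file and its association with a subject identifier may be sketched as follows. The function names, the use of a `.raw` file extension, and the `confirm` callback standing in for the operator prompt are all assumptions made for the example.

```python
from pathlib import Path


def detect_new_raw_files(directory, already_seen):
    """Return output files in `directory` whose names are not in `already_seen`.

    Assumption: output files are recognized by a '.raw' extension.
    """
    found = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".raw"
    )
    return [p for p in found if p.name not in already_seen]


def associate(raw_path, subject_id, confirm=None):
    """Pair a detected file with a subject identifier.

    `confirm` is an optional callable representing operator confirmation.
    When it is None, the association is made without confirmation.
    Returns None when the operator rejects the pairing.
    """
    if confirm is not None and not confirm(raw_path, subject_id):
        return None
    return {"subject_id": subject_id, "file": raw_path.name}
```

Passing `confirm=None` corresponds to the fully automatic case, while passing a callable corresponds to the case in which a technician or operator confirms the association first.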
As discussed above, a breath biopsy output file (e.g., RAW file) may be automatically detected and subsequently associated with the subject (i.e., an identifier of the subject) to produce an identifier associated breath biopsy output file. The identifier of the subject may vary, where examples of identifiers include, but are not limited to, alphanumeric identifiers (e.g., an identification number or a string of letters and/or numbers), codes such as, e.g., QR codes, barcodes, etc. In some embodiments, the identifier may identify the subject through association with identifying information of the subject such as, but not limited to, the subject’s full legal name, contact information, home address, social security number, etc. In these embodiments, the association may occur in a database or in a datasheet (e.g., wherein the identifying information may be found by searching for the identifier). In these cases, it may be relatively difficult or impossible to associate the identifying information of the subject with the identifier without access to the database or the datasheet (i.e., the database or datasheet is secured and/or protected). In some embodiments, the identifier is generated for or assigned to the subject during the session or appointment in which the breath sample is collected (and, e.g., subsequently analyzed wherein the breath biopsy output file is produced). In other embodiments, the identifier is generated for or assigned to the subject before the session or appointment in which the breath sample is collected. In these embodiments, the subject may provide their identifying information through any number of means including, e.g., by navigating to a web address or via email, wherein an identifier is generated for or assigned to the subject after the subject has provided their identifying information. 
In these instances, the subject may provide the identifier to a technician or operator prior to the collection and analysis (i.e., assaying) of the breath sample of the subject. For example, the subject may provide a QR code to an operator or technician, wherein by scanning the QR code the identifier is received from the subject. In some instances, the identifier may be automatically generated for or assigned to the subject after the subject has provided their identifying information. In some embodiments, the subject may fill in or submit an initial health information questionnaire that may be associated with the identifier of the subject. In some embodiments, the method includes associating the identifier with a prior health record of the subject.
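By way of non-limiting illustration, an identifier that reveals nothing about the subject without access to a separate lookup table may be sketched as follows. The in-memory dictionary `registry` stands in for the secured database or datasheet, and the function names and the identifier length are assumptions made for the example.

```python
import secrets


def assign_identifier(registry, identifying_info, n_bytes=8):
    """Create a random hexadecimal identifier and record the mapping.

    `registry` is a hypothetical stand-in for a secured database.
    """
    identifier = secrets.token_hex(n_bytes)
    while identifier in registry:        # regenerate on the rare collision
        identifier = secrets.token_hex(n_bytes)
    registry[identifier] = dict(identifying_info)
    return identifier


def lookup(registry, identifier):
    """Return the identifying information, or None for an unknown identifier."""
    return registry.get(identifier)
```

Because the identifier is random, the identifying information can be recovered only through the registry. The same string could be rendered as a QR code or barcode for presentation to an operator.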
After the identifier associated breath biopsy output file is produced (e.g., as described above), the file may be converted to an open XML-based format such as, e.g., mzML format. In some embodiments, metadata associated with the identifier associated breath biopsy output file may be obtained. The obtained metadata may include, but is not limited to, the subject’s identifier and/or identifying information, a health questionnaire submitted by the subject, mass spectrometer status/settings, temperature, humidity, etc. In some embodiments, the metadata is saved in a file (e.g., a logfile) associated with the identifier associated breath biopsy output file (e.g., labeled with the subject’s identifier, a timestamp, a lab identifier, a machine identifier, etc.). In some embodiments, a technician may be enabled to enter comments to the metadata file if desired (e.g., indicating the breath sample assayed was contaminated). The metadata file may be in a readable format such as, e.g., JSON, XML, CSV, CSON, TXT, etc. In some embodiments, an intuitive data set is generated from the identifier associated (and, e.g., converted) breath biopsy output file. The intuitive data set may be structured and formatted in order to be compatible with the subsequent steps of the invention. For example, the intuitive data set may be structured and formatted in order to train a machine learning model, as discussed in greater detail below. In some instances, the intuitive data set (e.g., and the metadata file associated therewith) is used to generate a health report as described in greater detail below. In some embodiments, the intuitive data set is generated, at least in part, by reducing the data of the identifier associated breath biopsy output file.
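By way of non-limiting illustration, a metadata file in JSON format may be assembled as sketched below. The field names are hypothetical and were chosen to mirror the items listed above; no particular schema is implied.

```python
import json


def build_metadata(subject_id, timestamp, lab_id, machine_id,
                   settings, temperature_c, humidity_pct, comments=()):
    """Collect session metadata into a plain dictionary (hypothetical schema)."""
    return {
        "subject_id": subject_id,
        "timestamp": timestamp,
        "lab_id": lab_id,
        "machine_id": machine_id,
        "spectrometer_settings": dict(settings),
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
        "technician_comments": list(comments),
    }


def add_comment(metadata, text):
    """Append a technician comment, e.g. a note about contamination."""
    metadata["technician_comments"].append(text)
    return metadata


def to_json(metadata):
    """Serialize the metadata to a readable JSON string."""
    return json.dumps(metadata, indent=2, sort_keys=True)
```

The resulting string could be written to a logfile named with the subject’s identifier and a timestamp so that it remains associated with the breath biopsy output file.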
In embodiments where generating the intuitive data set includes reducing the data of the identifier associated (and, e.g., converted) breath biopsy output file, the reduction may vary. In some embodiments, the reduction may depend on one or more components of the training and/or configuration of the machine learning model, as discussed in greater detail below. In embodiments wherein the breath sample is collected directly from the subject (i.e., without a phase transition), the reduction may include the processing step of automatically identifying individual breaths in the sample. Breath identification may occur by finding plateau signatures in the time-dependent total ion current (TIC) data received from the mass spectrometry analyzer and contained in the identifier associated breath biopsy output file. By TIC is meant the summed intensity across the entire range of masses (m/z values) being detected at a single point in time. In some embodiments, plateaus may be identified by detecting large increases or decreases in TIC between different timepoints, e.g., before or after timepoints reflecting a relatively uniform TIC. The identified breaths in the sample may then be assigned breath identification numbers. In some embodiments, a breath duration is determined for each identified breath indicating the time from the onset of the breath to the end of the breath. In these instances, data or measurements generated at the beginning or end of each breath duration (i.e., data or measurements at the shoulders of each identified breath) may be excluded or discarded. In some embodiments, the time given data of an identified breath (e.g., a measurement or a peak) is generated from the beginning of the breath is determined and/or recorded. In these instances, the time from the beginning of an identified breath the given data is generated is used to distinguish between data received from deep and shallow portions of the exhaled breath.
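By way of non-limiting illustration, breath identification from TIC plateaus may be sketched as follows. A breath is taken to begin at a large rise in TIC between consecutive scans and to end at a large fall. The jump threshold `min_jump`, the number of shoulder scans removed (`trim`), and the function names are assumptions made for the example.

```python
def find_breaths(times, tic, min_jump, trim=1):
    """Locate TIC plateaus bounded by a large rise and a large fall.

    Returns one dict per breath with a 1-based breath identification
    number, onset and end times, duration, and the scan indices kept
    after discarding `trim` scans at each shoulder.
    """
    breaths = []
    start = None
    for i in range(1, len(tic)):
        delta = tic[i] - tic[i - 1]
        if start is None and delta >= min_jump:
            start = i                    # first scan on the plateau
        elif start is not None and -delta >= min_jump:
            end = i - 1                  # last scan on the plateau
            breaths.append({
                "breath_id": len(breaths) + 1,
                "onset": times[start],
                "end": times[end],
                "duration": times[end] - times[start],
                "kept_scans": list(range(start + trim, end - trim + 1)),
            })
            start = None
    return breaths


def time_into_breath(times, breath, scan_index):
    """Time elapsed between the onset of a breath and a given scan."""
    return times[scan_index] - breath["onset"]
```

The value returned by `time_into_breath` could be compared against a cutoff to separate data from shallow and deep portions of an exhaled breath. A plateau that has not ended by the final scan is not reported in this sketch.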
In some embodiments, the reduction process may include the step of automatically identifying all features (i.e., peaks or measurements) of the breath sample from the identifier associated breath biopsy output file. Statistical measures of the identified features may then be determined. For example, a per-breath average and standard deviation describing specific features in each identified breath may then be determined. The automatically identified features of the breath sample may be matched or associated with compounds, e.g., using the mass to charge ratio (m/z) of each peak and/or the time from the beginning of an identified breath each peak was generated. In some embodiments, a value of abundance is generated for the identified peaks matched or associated with compounds, e.g., using the intensity of each peak and/or the identity of the associated compound. By value of abundance is meant a quantitative value related to, in some instances indicating, the number or amount of the matched or associated compound in the breath sample. In some embodiments, the reduction process may include the step of rectifying or correcting spectra such as, e.g., reducing noise or correcting the m/z or intensity value of an identified peak or peaks. In some instances, the mass spectrometry analyzer may generate a plurality of scans during the breath sample assay. These scans may be uniquely and adaptively sampled in the m/z space. In some embodiments, rectifying or correcting spectra may include the resampling and interpolation of all scans to a single m/z space or axis (e.g., a common m/z array). In other embodiments, each individual scan is processed and analyzed in their own unique m/z space, and the sample scans are linked from one scan to the next (e.g., temporally).
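By way of non-limiting illustration, resampling a scan onto a common m/z array and computing per-breath statistics may be sketched as follows. Linear interpolation and a value of zero outside the measured range are assumptions, as are the function names. The sketch expects each scan’s m/z values to be sorted in strictly increasing order.

```python
from bisect import bisect_left
from statistics import mean, stdev


def resample(mz, intensity, grid):
    """Linearly interpolate one scan onto a shared m/z grid.

    Grid points outside the scan's measured range are set to 0.0.
    """
    out = []
    for x in grid:
        if x < mz[0] or x > mz[-1]:
            out.append(0.0)
            continue
        j = bisect_left(mz, x)
        if mz[j] == x:
            out.append(float(intensity[j]))
        else:
            x0, x1 = mz[j - 1], mz[j]
            y0, y1 = intensity[j - 1], intensity[j]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out


def per_breath_stats(values_by_breath):
    """Mean and sample standard deviation of a feature for each breath.

    Breaths with fewer than two values get a standard deviation of None.
    """
    stats = {}
    for breath_id, values in values_by_breath.items():
        spread = stdev(values) if len(values) > 1 else None
        stats[breath_id] = (mean(values), spread)
    return stats
```

Once every scan shares one m/z axis, the intensity of a feature can be read at the same grid index in each scan, grouped by breath identification number, and passed to `per_breath_stats`.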
In some embodiments, the reduction may include the step of omitting or excluding (e.g., deleting) data determined to not be necessary for further analysis (e.g., the training of a machine learning model, as described below) after a processing step or processing steps (e.g., as described above) have been performed or executed. For example, after the processing step of automatically identifying breaths in the sample data has been executed, data (e.g., peaks or scans) not generated during an identified breath may be deleted or omitted. In some embodiments, identified features (i.e., peaks) of the breath sample that cannot be matched or associated with compounds may be deleted or omitted. In some embodiments, identified features (i.e., peaks) of the breath sample are deemed to be spurious (e.g., by some statistical means) and may be deleted or omitted. In some instances, data determined to not be necessary for the generating of a health report (e.g., as described in greater detail below) may be deleted or omitted. In some embodiments, a code or program may be configured to reduce the data, e.g., as described above. In some embodiments, the code may be wrapped (i.e., the code may be encapsulated in a wrapper function). In these instances, data (e.g., arguments) from the identifier associated breath biopsy output file and/or the metadata file associated with the breath biopsy output file may be passed to the wrapper function. In some cases, the reduction code or program automatically reduces the identifier associated breath biopsy output file to generate an intuitive data set.
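By way of non-limiting illustration, the omission of unneeded data may be sketched as a filter that keeps only peaks generated during an identified breath and matched with a compound. The dictionary keys and the function name are hypothetical.

```python
def reduce_peaks(peaks, breath_windows):
    """Drop peaks outside every breath window or without a matched compound.

    `peaks` is a list of dicts with 'time', 'mz', and 'compound' keys.
    `breath_windows` maps a breath identification number to (onset, end).
    Kept peaks are returned as copies labeled with their breath number.
    """
    kept = []
    for peak in peaks:
        breath_id = next(
            (bid for bid, (t0, t1) in breath_windows.items()
             if t0 <= peak["time"] <= t1),
            None,
        )
        if breath_id is None or peak.get("compound") is None:
            continue
        kept.append({**peak, "breath_id": breath_id})
    return kept
```

A statistical test for spurious peaks could be added as a further condition in the same loop.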
In some embodiments, an overview of the results of the breath sample assay may be generated from the data of the converted identifier associated breath biopsy output file or the intuitive data set generated therefrom. For example, the overview may include the number of peaks found, the peaks found at different m/z values over the time the assay was run, total ion current, various statistical analyses, the number of matched or associated compounds detected per identified breath, an intensity distribution, a histogram of the number of features per m/z value, etc. In some instances, the overview may additionally contain data from the breath collection device or system. For example, the overview may contain the flow rate a breath sample was collected at, the volume of a breath sample, the temperature of a breath sample, a value of abundance of water vapor or carbon dioxide in a breath sample (e.g., the percentage of water vapor or carbon dioxide in a breath sample), etc. In some embodiments, the overview may display or convey the results of the breath sample assay on a per assayed breath basis. In some cases, this may allow outlier breaths to be identified and potentially excluded from the health report in order to, e.g., enhance the accuracy of the results. In some cases, outlier breaths are identified using a machine learning model such as, e.g., a machine learning model trained or including architecture as described below. In some instances, outlier breaths may be identified using a rules-based system. In some embodiments, the overview may indicate potential problems including, but not limited to, problems associated with the breath sample quality, possible contamination, etc. In these instances, an operator or technician may choose to adjust the machine configuration or capture additional breath samples based, at least in part, on feedback provided by the overview. In some cases, the overview may be generated in real time. 
By real time is meant the overview is generated during or immediately following the breath sample assay (e.g., during collection of the breath sample or while the breath sample is being analyzed using, e.g., a mass spectrometry analyzer). In some instances, the overview is generated in two hours or less. In some cases, the overview is generated in one hour or less, such as thirty minutes or less, or twenty minutes or less, or ten minutes or less, or five minutes or less, or one minute or less. In some instances, one or more of the identifier associated breath biopsy output file, the intuitive data set generated from the breath biopsy output file, the metadata file associated with the breath biopsy output file, or the overview of the results of the breath sample assay may be saved or archived to a database such as, e.g., a database including a data warehouse. In some cases, one or more non-breath assay health records of the subject are associated with the identifier of the identifier associated breath biopsy output file, the intuitive data set, the metadata file, and/or the overview. The one or more non-breath assay health records of the subject are then saved or archived to the database (e.g., data warehouse) with the breath biopsy files.
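By way of non-limiting illustration, a rules-based identification of outlier breaths may be sketched as follows. The rule used here, flagging a breath whose peak count differs from the median by more than `k` times the median absolute deviation, is an assumption chosen for the example and is not taken from the description above.

```python
from statistics import median


def outlier_breaths(peak_counts, k=3.0):
    """Return ids of breaths whose peak count is far from the median.

    `peak_counts` maps a breath identification number to the number of
    peaks found in that breath. When the median absolute deviation is
    zero, any breath whose count differs from the median is flagged.
    """
    ids = list(peak_counts)
    values = [peak_counts[i] for i in ids]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [i for i in ids if peak_counts[i] != med]
    return [i for i in ids if abs(peak_counts[i] - med) > k * mad]
```

The flagged breath numbers could be listed in the overview so that an operator may decide whether to exclude them or collect additional breaths.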
FIG. 1 provides a depiction of an overview of the results obtained from an identifier associated breath biopsy output file in accordance with an embodiment of the invention. Overview (i.e., Quicklook) 100 includes header 101 and selectable menu 102 provided to assist a viewer in navigating between sections of a health report when, e.g., the report and the overview are both displayed on an electronic viewing device (e.g., a computer or a smart phone). Session summary 103 provides information pertaining to the session in which the breath sample assay was performed. The overview further includes the identifier of the subject 104 as well as various charts and graphs depicting data of the intuitive data set generated from the breath sample assay. Graph 105 depicts the TIC per sample number (i.e., scan number), with the orange line indicating sample numbers wherein an exhaled breath is received by the mass spectrometer. Graph 106 depicts the m/z value of compounds detected by the mass spectrometer over time. Graph 107 depicts the total number of peaks found per identified exhaled breath received by the mass spectrometer. Graph 108 depicts a histogram of the number of features detected per m/z value, with colors indicating which identified exhaled breath each bin belongs to. In some cases, the overview may be generated, at least in part, using a trained machine learning algorithm. In these cases, the overview may further indicate breaths determined to not be of a sufficient quality that were excluded from downstream analysis (e.g., to generate a health report).
FIG. 2 provides a depiction of a method for generating an intuitive data set from an identifier associated breath biopsy output file in accordance with an embodiment of the invention. At step 200, an identifier associated breath biopsy output file (produced from a RAW file as described above) is converted to mzML format. At step 201, data from the mzML file is passed to a wrapper function configured to reduce the data of the mzML file and produce an intuitive data set using, e.g., rules-based approaches. At step 202, the wrapper function performs peak analysis in order to, e.g., identify and associate peaks of the breath sample assay with target compounds. The results of the peak analysis are then passed into a data frame at step 203. At step 204, the wrapper function automatically identifies and labels individual breaths in the sample. The results of the breath identification are then passed into a data frame at step 205. At step 206, the wrapper function rectifies or corrects spectra in order to, e.g., reduce noise or correct the m/z or intensity value of an identified peak or peaks. The results of the spectral correction are then passed into a data frame at step 207. Steps 202, 204, and 206 may be performed concurrently or sequentially in any order. For example, step 204 may be performed prior to step 202. In this case, the peaks not occurring within an identified breath may be omitted from the peak analysis. At step 208, metadata associated with the identifier associated breath biopsy output file may be captured or obtained. At step 209, the metadata (e.g., a metadata file) and the output of the reducing wrapper function (i.e., the intuitive data set) are saved or archived. The metadata and intuitive data set may be saved via local storage and/or cloud storage and, e.g., may be saved to a database such as a data warehouse. 
In some embodiments, the metadata and intuitive data set are associated with one or more non-breath assay health records of the subject before being saved or archived with the non-breath assay health record(s). In some instances, the one or more non-breath assay health records may be associated with an identifier of the subject (e.g., as discussed above) and saved before or after the breath assay data. The association of the non-breath assay data with the breath assay data may then be made in the database. At step 210, the metadata and intuitive data set are used to generate an overview of the results of the breath sample assay (i.e., a Quicklook or a Quicklook report). At step 211, the metadata and intuitive data set are used to generate other reports such as, e.g., a health report as described in greater detail below. The health report may be generated based on correlations and relationships determined from the previously stored metadata, breath biopsy output files and/or intuitive data sets of a plurality of subjects in combination with one or more non-breath assay health records of each subject. For example, a dynamic model, such as a machine learning model (e.g., as described below), may be trained and updated each time step 209 is run (i.e., whenever new data is stored or archived). The health report may then be generated, at least in part, using the trained machine learning model.
Obtaining Health Records
As described above, embodiments of the methods include obtaining a health record associated with a disease or condition for each subject. By obtain is meant to make the health record(s) accessible or available for the subsequent steps of the methods (e.g., available for training the machine learning model). In some instances, health records associated with a disease or condition are health records that indicate a diagnosis of the disease or condition in the subject. In some instances, health records associated with a disease or condition are health records that disclose the manifestation of signs or symptoms of the disease or condition in the subject. In some cases, the disease or condition may be the relative condition of the subject’s overall health or the health or condition of an organ or system of the subject’s body. In some instances, the disease or condition may be any disease or condition that impairs or affects the normal functioning of the body. In some instances, the disease or condition may be, e.g., an infectious disease, deficiency disease, hereditary disease, or physiological disease.
In embodiments where the disease or condition is an infectious disease, the infectious disease may be, e.g., a bacterial disease or infection (such as, e.g., syphilis, pneumonia, tetanus, and/or tuberculosis), a viral disease or infection (such as, e.g., chickenpox, measles, herpes, the common cold, or COVID-19), a fungal disease or infection (such as, e.g., ringworm infection, athlete’s foot, or yeast infections), or a parasite or parasitic disease (such as, e.g., malaria).
In embodiments where the disease or condition is a deficiency disease, the deficiency disease may be, e.g., malnutrition, scurvy, rickets, osteoporosis, or a birth defect. In embodiments where the disease or condition is a hereditary disease, the hereditary disease may be, e.g., cystic fibrosis, Huntington’s Disease, sickle cell anemia, a birth defect, etc. In some cases, the disease or condition may be affected by, but not unilaterally caused by, genetics or may be a polygenic disease. In these instances, the disease or condition may be caused by a combination of genetic and environmental factors and may be asthma, an autoimmune disease such as multiple sclerosis, cancer (e.g., colon, skin, or lung cancer), ciliopathy, cleft palate, diabetes, chronic obstructive pulmonary disease, heart disease, hypertension, inflammatory bowel disease, an intellectual disability, a mood disorder, obesity, refractive error, infertility, schizophrenia, or any number of a variety of mental disorders. In embodiments where the disease or condition is a physiological disease, the physiological disease may be, e.g., diabetes, cancer, hypertension, or heart disease. In some cases, the disease or condition may include any disease or condition caused by environmental factors, behavior, or diet.
In some cases, the disease or condition may be a psychological disease or condition such as, e.g., an anxiety disorder, depression, bipolar disorder, post-traumatic stress disorder (PTSD), schizophrenia, an eating disorder, a disruptive behavior and/or dissocial disorder, or a neurodevelopmental disorder. In some instances, the disease or condition may be hypothermia, hyperthermia, or may otherwise result from exposure to prolonged or extreme hot or cold temperatures. In some cases, the disease or condition may result from an injury or may affect mobility. For example, the disease or condition may include, but is not limited to, a burn, cuts or scrapes, internal bleeding, traumatic injuries, arthritis, tendonitis, a tendon or myotendinous tear, a hernia, old age, paralysis, chronic health problems such as, e.g., chronic pain or chronic problems associated with bad posture, etc.
In some cases, the disease or condition may be toxin exposure or may result from the exposure of the subject to one or more toxins or sources of toxins. In some cases, the disease or condition may be the presence of a compound of interest, such as a toxin, in the breath and/or body of the subject. In some instances, the one or more toxins includes one or more carcinogens. Carcinogens of interest include, but are not limited to, carcinogens classified as being Group 1 carcinogens by the International Agency for Research on Cancer (IARC). A Group 1 classification indicates that an agent (e.g., a compound) exhibits sufficient evidence of carcinogenicity in humans. Carcinogens of interest also include, but are not limited to, carcinogens classified as Group 2A carcinogens by the IARC. A Group 2A classification indicates that an agent (e.g., a toxin) is probably carcinogenic to humans.
As discussed above, embodiments of the methods include obtaining a health record associated with a disease or condition for each subject. In some embodiments, the health record includes one or more of a personal health record (PHR), electronic medical record (EMR), or electronic health record (EHR) of the subject. In some instances, the health record includes self-reported health data such as, e.g., the subject’s responses to a survey or a health information questionnaire (e.g., as described above). In some cases, the health record may include non-health data. In these instances, the non-health data may include information regarding the subject that has the potential to affect, or be affected by, the subject’s health. For example, the non-health data may include one or more cohorts to which the subject belongs such as, e.g., the subject’s profession, the various tasks or responsibilities associated with the subject’s profession, or the location in which the subject lives or works (e.g., country, state, city, local geography, proximity to locations of interest such as, e.g., industrial facilities, etc.).
In some embodiments the health record includes one or more non-breath health assessments. While the one or more non-breath health assessments may vary, in some instances, the one or more health assessments may include a health assessment selected from the group consisting of a lung health assessment, an assessment of fitness for a given task(s), a medical imaging assessment (e.g., an ultrasound assessment), a biological sample assessment (e.g., urine tests, feces tests, blood tests, biopsies, etc.) and combinations thereof. In some instances, the biological sample assessment may include a blood panel such as, e.g., a complete blood count (CBC). In these instances, the CBC may include counts of white blood cells, red blood cells and platelets, the concentration of hemoglobin, the hematocrit, red blood cell indices, white blood cell differentials, etc. In some embodiments, the non-breath health assessment may include a microbiome test or assay (e.g., 16S sequencing or shotgun metagenomic sequencing). In some embodiments, the non-breath health assessment may include a genetic test or DNA testing.
In some embodiments, the health record may include physiological data, such as, but not limited to, one or more of heart rate, blood glucose, blood pressure, respiration rate, body temperature, blood volume, sound pressure, photoplethysmography, electroencephalogram, electrocardiogram, blood oxygen saturation, and skin conductance. In some embodiments, the physiological data may be obtained using a wearable device. Wearable devices in accordance with embodiments of the methods may include, but are not limited to, smartwatches (e.g., Apple watches, Garmin watches, or Fitbit® watches), sleep trackers (e.g., Oura rings), or heart rate monitors. In some embodiments, the wearable device may include motion sensors (e.g., accelerometers and gyroscopes), electrical sensors (e.g., electrocardiogram sensors), or light sensors (e.g., photoplethysmography (PPG) sensors). In some embodiments, the wearable device is a medical Internet of Things (IoT) device. Medical IoT devices of interest may include, but are not limited to, implanted medical devices (IMDs) (e.g., insulin pumps or defibrillators), wearable medical devices (e.g., continuous glucose monitors), and discrete devices (e.g., IoT-enabled blood pressure cuffs).
In some embodiments, the non-breath health assessments and/or physiological data may be associated with the diagnosis of a disease or the assessment of a condition in the subject. In these instances, the health assessments and/or physiological data may have been used to inform the diagnosis of a disease or assess a condition in the subject. In some cases, the health assessments and/or physiological data may reflect a sign or symptom of a disease or condition in the subject. In some instances, the health assessments and/or physiological data may regard a subject diagnosed with a disease or condition or having been assessed as having a given condition of overall health, organ health, or system health (e.g., lung health is excellent, overall good, somewhat poor, overall poor, etc.). In some instances, the health assessments and/or physiological data may regard a subject known to be free of a disease or condition (e.g., the subject is healthy, the subject does not have COPD, etc.).
In some embodiments, the health records may be obtained directly or indirectly from the subject, a caregiver or provider of the subject, or a database or data warehouse (e.g., as described above). In some instances, the health records (e.g., associated with a disease or condition for each of the plurality of subjects) may be obtained, at least in part, by converting the health records to a form compatible with a subsequent step or steps of the methods. In some embodiments, the health records may be converted from a format difficult for machines to interpret to a format in a standard computer language that can be read automatically by a machine. In some cases, optical character recognition (OCR) software may be used to convert health records to a form compatible with a subsequent step or steps of the methods. For example, in cases where a health record is stored in an image format (e.g., a PDF or JPEG format), the health records data may be converted to a JSON format, an XML format, a CSV format, a CSON format, an HTML format, etc. In these cases, organizational or categorical information structuring or classifying the health records data may be manually entered. For example, one or more components of a health record (e.g., as discussed above), or a section thereof, may be categorized using date or diagnosis codes (such as, e.g., diagnosis codes associated with a disease or condition). In some cases, organizational or categorical information may be automatically identified from structured digital health records data and used to identify or classify one or more components of a health record, or sections thereof, using, e.g., lines of computer code and rules-based approaches or supervised machine learning approaches paired with natural language processing software.
In some instances, the EHR data may be obtained by scanning or imaging a plurality of health records existing in hard copy form, followed optionally by conversion of the resulting image files in any of the manners discussed above.
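By way of illustration only, the conversion of health record text (e.g., text produced by OCR software) to a machine-readable JSON format, with rules-based categorization by date and diagnosis code, might be sketched as follows. The function name, the field names, and the two regular expressions (an ICD-10-style code pattern and an ISO date pattern) are assumptions made for this example; the methods are not limited to any particular coding system or record layout.

```python
import json
import re

# Assumed pattern for ICD-10-style diagnosis codes (e.g., "J44.9").
# The coding system is hypothetical; real records may use other codes.
DIAGNOSIS_CODE = re.compile(r"\b[A-TV-Z][0-9]{2}(?:\.[0-9A-Z]{1,4})?\b")
# Assumed ISO date pattern (YYYY-MM-DD).
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def record_text_to_json(subject_id: str, record_text: str) -> str:
    """Convert free text of a health record into a JSON document.

    Dates and diagnosis codes are pulled out with simple rules so that
    later steps can categorize or label the record automatically.
    """
    record = {
        "subject_id": subject_id,
        "dates": sorted(set(ISO_DATE.findall(record_text))),
        "diagnosis_codes": sorted(set(DIAGNOSIS_CODE.findall(record_text))),
        "raw_text": record_text,
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    text = "Visit 2023-07-06. Assessment: COPD (J44.9). Follow-up 2023-08-01."
    print(record_text_to_json("S001", text))
```

A supervised machine learning approach paired with natural language processing software could take the place of the regular expressions where records are less regular.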
As discussed above, embodiments of the methods include obtaining a health record associated with a disease or condition for each subject. Health records associated with a disease or condition may be, e.g., health records that indicate a diagnosis of the disease or condition in the subject or disclose the manifestation of signs or symptoms of the disease or condition in the subject. In some embodiments, the condition may be the relative condition of the subject’s overall health, an organ of the subject, or a system of the subject’s body. In other embodiments, the disease or condition may be any disease or condition that impairs or affects the normal functioning of the body. The health record may include a personal health record (PHR), electronic medical record (EMR), electronic health record (EHR), self-reported health data, non-health data, non-breath health assessment and/or physiological data regarding the subject and may be provided by the subject, a caregiver or provider of the subject, or a database or data warehouse as described above. In some instances, obtaining the health records may include converting the health records to a form that can be read automatically by a machine and is compatible with a subsequent step or steps of the methods (e.g., automatic supervised training of a machine learning model). The health records obtained for each subject, together with the breath biopsy output files generated for each subject, may then be used to train a machine learning model to identify a relationship between breath samples and a disease or condition of interest, as discussed in greater detail below.
Machine Learning Models
As described above, embodiments of the methods include training a machine learning model to identify a relationship between breath samples and a disease or condition using generated breath biopsy files and obtained health records. By training is meant providing or feeding the breath biopsy output files and one or more elements of the obtained health records to the machine learning model so that the model can adjust one or more of its components (e.g., weights or biases) in order to effectively (e.g., accurately or efficiently) perform a task. The machine learning model, in accordance with embodiments of the methods, may vary and may include, but is not limited to, any of the models discussed below. In some embodiments, the training may further include validating and testing.
The tasks the machine learning model is trained to perform, in accordance with embodiments of the invention, may vary. In some instances, the machine learning model may be trained to perform any task associated with assessing a subject’s health including any task demonstrated, or enabled, by the obtained health records as described above. For example, if the health records provide the diagnosis of a disease or condition, the machine learning model may be trained to diagnose the disease or condition in a subject using the subject’s breath (i.e., a breath biopsy file generated from the subject’s breath). In embodiments where the health record reflects a sign or symptom of a disease or condition in the subject, the machine learning model may be trained to indicate the manifestation of the sign or symptom of the disease or condition in the subject using the subject’s breath.
In some cases, the obtained health records (e.g., as described above) are used to interpret the findings or inferences generated by a machine learning model using the subject’s breath. In some embodiments, the findings or inferences generated by the machine learning model using the subject’s breath may include changes in a health state or a condition of health of the subject. In these cases, the machine learning model may be trained to indicate a change in the fingerprint of a subject’s breath using unsupervised machine learning techniques. For example, the subject may provide breath samples (e.g., to generate breath biopsy output files) at two or more timepoints such that the most recent sample provided by the subject can be compared to a baseline. In some instances, the baseline may include breath sample data generated from a breath sample provided by the subject 1 day prior to the most recently provided breath sample, 1 week prior to the most recent breath sample, 1 month prior to the most recent breath sample, 6 months prior to the most recent breath sample, 1 year prior to the most recent breath sample, 5 years prior to the most recent breath sample, etc. The machine learning model may then use data generated from the most recent breath sample provided by the subject and the baseline in order to look for temporal changes of the subject’s breath fingerprint. The obtained health records (including, e.g., health records obtained at the time the baseline breath sample was provided and/or health records obtained at the time the most recent breath sample was provided) may then be used to interpret any identified temporal changes. In some embodiments, the tasks performed by the machine learning model may depend on the nature of the disease or condition of interest.
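As a simplified illustration of comparing a most recent breath sample to a baseline, the following sketch computes a cosine distance between two breath fingerprints, each assumed to be a vector of feature intensities aligned on the same m/z bins. The distance measure, the function names, and the threshold value are assumptions chosen for the example; this is a plain distance comparison standing in for the unsupervised machine learning techniques described above, not an implementation of them.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equally long intensity vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must be aligned on the same features")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("fingerprint has no signal")
    return 1.0 - dot / (norm_a * norm_b)


def fingerprint_changed(
    baseline: Sequence[float],
    recent: Sequence[float],
    threshold: float = 0.05,  # hypothetical cut-off, not taken from the source
) -> bool:
    """Flag a temporal change when the distance exceeds the threshold."""
    return cosine_distance(baseline, recent) > threshold


if __name__ == "__main__":
    baseline = [1.0, 0.0, 2.0]
    same_shape = [2.0, 0.0, 4.0]   # same relative abundances, higher signal
    different = [0.0, 3.0, 0.0]    # a different set of features dominates
    print(fingerprint_changed(baseline, same_shape))
    print(fingerprint_changed(baseline, different))
```

Because cosine distance depends only on relative abundances, a uniformly stronger or weaker signal is not flagged as a change; any flagged change would then be interpreted using the obtained health records.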
In embodiments where the health record indicates a diagnosis of the disease or condition in the subject, the machine learning model may be trained to identify features of a breath sample (e.g., the relative abundance of a set of metabolites or other compounds) that correspond or correlate with a diagnosis of the disease or condition in order to, e.g., identify a signature of the disease or condition. The machine learning model may then be applied to a breath biopsy output file generated for a subject (i.e., separate from the breath biopsy output files used for training) in order to indicate a diagnosis of the disease or condition in the subject using the identified features. In some instances, the machine learning model may be applied to a breath biopsy output file generated for a subject in order to indicate the likelihood the subject has a disease or condition, or a prediction as to whether the subject may develop a disease or condition (e.g., if they maintain their current lifestyle). In embodiments where the condition is the relative condition of the subject’s overall health or the relative condition of an organ or system of the subject’s body, the machine learning model may be trained to classify a breath biopsy output file using a numerical score representative of the overall health or the relative condition of an organ or system of the subject providing the breath sample.
In some embodiments, the tasks performed by the machine learning model may depend on the nature of health records obtained for each subject. In embodiments where the health record includes a health assessment used to inform the diagnosis of a disease or assess a condition in the subject, the machine learning model may be trained to identify relationships between features of a breath sample and features of the health assessment in order to classify the breath sample as belonging to a subject having the disease or condition. For example, in instances where the health record includes, e.g., a microbiome assessment, the machine learning model may be trained to identify features in the breath sample that correspond to the presence of specific bacteria or genes in the microbiome. The trained machine learning model may then be able to identify specific bacteria or genes in the microbiome of a subject by analyzing the subject’s breath (i.e., a breath biopsy file generated from the subject’s breath). In some cases, the machine learning model may be trained to only identify features in the breath sample that correspond to the presence of specific bacteria or genes in the microbiome that are indicative of a disease or condition of interest (e.g., using a microbiome assessment and a disease or condition diagnosis). In some cases, the machine learning model may be trained to utilize both the health assessment and the breath biopsy file in order to identify subjects at risk for, or having, a disease or condition of interest. In some embodiments, the machine learning model may be trained to identify breath assay data of insufficient quality. In these instances, bad breath assay data may be labeled (e.g., automatically or by a person of skill in the art) in order to train the machine learning model to recognize data of insufficient quality as the result of, e.g., ambient air or contamination.
In some cases, the machine learning model may be trained to identify bad data (e.g., data of insufficient quality) using any of the techniques or methods used to train the machine learning model as described below (e.g., the machine learning model may be trained to determine a fingerprint for bad data).
As discussed above, the machine learning model, in accordance with embodiments of the methods, may vary and may include, but is not limited to, any of the models discussed below or any standard machine learning model, as well as combinations thereof, as is known in the art. In some embodiments, the machine learning model may depend on, e.g., the nature of the obtained health records and the disease(s) or condition(s) of interest.
In some embodiments, the relationships between features of the breath samples and features of the health records identified by the machine learning model may be obtained or extracted for downstream analysis. In these instances, the machine learning model may include, or be configured to employ, a linear and/or logistic regression algorithm, a linear discriminant analysis algorithm, a support vector machine (SVM) algorithm, a random forest algorithm, a K-Nearest Neighbors algorithm, a decision tree algorithm, or an XGBoost algorithm. In some embodiments, the relationships between features of the breath samples and features of the health records identified by the machine learning model (e.g., during training) may be difficult to obtain or extract and/or may be unknown to the individuals implementing the model (e.g., the relationships may be too complex to be understood or interpreted by a human or the relationships may be contained in a component of the machine learning model considered to be a “black box”). In some instances, the features of interest (e.g., of a compound, toxin source, disease, or condition fingerprint) identified by the machine learning model and used to classify or identify components of the breath sample may not be correlated or associated with known or identified compounds. In these cases, the features of interest may include unidentified peaks or measurements (i.e., m/z signals).
In some embodiments, the machine learning model may include an artificial neural network (NN). In some embodiments, the machine learning model is a deep learning model. In these cases, the model may be three or more layers deep, such as five or more layers deep, or ten or more, or twelve or more, or thirty or more, or fifty or more, or one hundred or more. In some embodiments, the data of the breath biopsy output files may be provided in an image format (e.g., as a total ion current (TIC) chromatogram or spider diagram). In these instances, the machine learning model may be configured to process images and may include, or be based on, a convolutional neural network (CNN), recurrent neural network (RNN), region-convolutional neural network (R-CNN), etc.
In some embodiments, the machine learning model is configured to process sequential input data. In these instances, the machine learning model may include, or be based on, a recurrent neural network (RNN) model or a transformer model. In embodiments where the machine learning model includes an RNN, the RNN may include, e.g., long short-term memory (LSTM) architecture, gated recurrent units (GRUs), or attention (i.e., may employ the attention technique or include an attention unit). In some embodiments, the machine learning model may include, or be based on, the architecture of a transformer model.
As discussed above, the machine learning model may be configured to process sequential input data. The sequential input data may be a sequence of scans presented, e.g., as temporally linked numerical matrices or images. In these instances, the machine learning model may be configured to learn from the contextual information of a scan (i.e., the scans before or after a given scan sequentially/temporally). The machine learning model may learn from the contextual information of a scan and, e.g., may learn from the past to present context of a scan and/or the present to past context of a scan. In some embodiments, the machine learning model may learn from both the past to present context and the present to past context of a scan (i.e., the machine learning model may be bidirectional). For example, the machine learning model may include, or be based on, e.g., a bi-directional LSTM model, an RNN model with attention, a convolutional recurrent neural network model with attention (CRNN-A), or a transformer model. In embodiments where the bi-directional machine learning model includes, or is based on, a transformer model, the transformer model may include decoder blocks, encoder blocks and/or encoder/decoder architecture.
In some embodiments, the machine learning model is configured to select and process an individual scan of the breath biopsy output file. In some instances, the data of the breath biopsy output files may be provided as three-dimensional images (e.g., or multi-dimensional numerical matrices) aggregating or compiling data from all scans of a breath biopsy output file, or select scans (i.e., chosen randomly or using rules-based approaches) of a breath biopsy output file. In some embodiments, including any of the embodiments discussed above, the machine learning model may be configured to process or analyze the breath biopsy output file(s) on a per breath basis. Training may depend on, e.g., the nature or architecture of the machine learning model, the nature of the obtained health records, and/or the nature of the disease or condition of interest. In some cases, the machine learning model may be trained using supervised learning methods. In these cases, relevant data of interest (e.g., disease diagnoses, gene expression, microbiome bacteria, etc.) may be extracted from the health records obtained for each of a plurality of subjects and used to label the corresponding breath biopsy output file of each subject. The labels or categories of interest, and the labeled breath biopsy data, may then be used to train the machine learning algorithm. In some embodiments, the extraction of the labels, association of the extracted labels with the generated breath biopsy output files, and training of the machine learning model, are performed automatically using, e.g., lines of computer code and rules-based approaches or supervised machine learning approaches paired with natural language processing software. In some instances, the health records that include relevant data of interest may be scarce. In these instances, semi-supervised learning methods may be employed.
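The automatic association of extracted labels with breath biopsy output files may be illustrated, purely as a sketch, by the following rules-based routine. The dictionary layout (subject identifier to file path, subject identifier to record fields), the `label_key` parameter, and the function name are hypothetical. Files whose health record lacks the label of interest are returned separately so that they remain available for semi-supervised learning.

```python
from typing import Dict, List, Tuple


def label_breath_files(
    breath_files: Dict[str, str],
    health_records: Dict[str, dict],
    label_key: str,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair each breath biopsy output file with a label from the health record.

    Returns (labeled, unlabeled): `labeled` holds (file_path, label) pairs and
    `unlabeled` holds paths for subjects whose record lacks the label.
    """
    labeled: List[Tuple[str, str]] = []
    unlabeled: List[str] = []
    for subject_id, path in sorted(breath_files.items()):
        label = health_records.get(subject_id, {}).get(label_key)
        if label is None:
            unlabeled.append(path)
        else:
            labeled.append((path, label))
    return labeled, unlabeled


if __name__ == "__main__":
    # Hypothetical file names and record fields, for illustration only.
    files = {"S001": "s001.raw", "S002": "s002.raw", "S003": "s003.raw"}
    records = {
        "S001": {"diagnosis": "COPD"},
        "S002": {"diagnosis": "healthy"},
        "S003": {"profession": "firefighter"},  # no diagnosis recorded
    }
    print(label_breath_files(files, records, "diagnosis"))
```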
In some embodiments, unsupervised learning methods may be employed and, e.g., the categories or classifications generated by training the machine learning model may be correlated or associated with certain characteristics of patient cohorts or certain components of obtained health records after training. In some embodiments, both supervised and unsupervised learning methods may be employed. For example, unsupervised learning methods may be used to detect any temporal changes in breath fingerprints that occur in the plurality of subjects (e.g., as described above). Characteristics of the temporal changes may then be extracted and labeled (e.g., using labels extracted from health records) in order to train a machine learning model using supervised machine learning techniques. In this way, machine learning models may be trained to assess a subject’s health using temporal changes of a subject’s breath fingerprint (e.g., by comparing data obtained from a recent breath sample provided by the subject to a baseline). In other words, a disease or condition or, e.g., the exposure of the subject to a toxin source or a toxin may have a temporal change fingerprint that can be determined using machine learning techniques and utilized to assess the health of a subject.
In some embodiments, the model training algorithms and hyperparameters used to control the training may depend on, e.g., the nature or architecture of the machine learning model, the tasks the machine learning model is trained to perform, the desired accuracy or efficiency of the machine learning model, and/or the nature or size of the training data set. In some cases, the training may include methods of preventing data overfitting such as, e.g., dilution and dropout techniques. In some cases, the training and/or the training data set (e.g., the labeled breath biopsy output files) may be modified or altered to address class imbalance. By class imbalance is meant a skewed proportion of the classes that make up a data set. For example, labeled breath biopsy data reflecting a specific relationship or classification (e.g., biopsy data labeled with the diagnosis of a disease or condition extracted from the obtained health record of the subject) may be relatively uncommon in the data set. In some embodiments, the training may be modified or altered to address class imbalance. For example, the optimization loss may be weighted based on class distributions. In these cases, the weighting may be learned dynamically, e.g., during training. In some embodiments, the training data set may be modified to address class imbalance. In these instances, the majority class may be undersampled. For example, in embodiments where breath biopsy data labeled with the diagnosis of a disease or condition are relatively rare, breath biopsy data not labeled with the diagnosis of a disease or condition may be randomly undersampled. In some embodiments, the majority class or classes may be randomly undersampled to achieve a ratio of one to five minority class (i.e., rare relationship or classification) to majority class(es) or less. 
In some instances, the majority class(es) may be undersampled to achieve a ratio of one to fifty minority class (i.e., rare relationship or classification) to majority class(es) or less, such as one to twenty, or one to ten, or one to five, or one to four.
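The two approaches to class imbalance described above, random undersampling of the majority class(es) and weighting of the optimization loss by class distribution, may be sketched as follows. The function names are hypothetical, the ratio is interpreted here as "at most N majority examples per minority example", and the inverse-frequency formula is one common choice of static weighting rather than the dynamically learned weighting also contemplated above.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def undersample_majority(
    labels: Sequence[Hashable],
    minority_label: Hashable,
    max_majority_per_minority: int = 5,
    seed: int = 0,
) -> List[int]:
    """Return indices to keep so that majority examples number at most
    `max_majority_per_minority` times the minority examples."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority_label]
    majority_idx = [i for i, y in enumerate(labels) if y != minority_label]
    cap = max_majority_per_minority * len(minority_idx)
    if len(majority_idx) > cap:
        majority_idx = rng.sample(majority_idx, cap)  # random undersampling
    return sorted(minority_idx + majority_idx)


def inverse_frequency_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Per-class loss weights, n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


if __name__ == "__main__":
    labels = ["disease"] * 2 + ["healthy"] * 40
    kept = undersample_majority(labels, "disease", max_majority_per_minority=5)
    print(Counter(labels[i] for i in kept))
    print(inverse_frequency_weights(labels))
```

Undersampling discards data, whereas loss weighting keeps every labeled breath biopsy output file; which is preferable may depend on the size of the training data set.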
In some embodiments, the training may further include testing the trained machine learning model or machine learning models. By testing in this context is meant evaluating the trained machine learning model using labeled breath biopsy data different from the labeled breath biopsy data used for training after the machine learning model has finished training. In some embodiments, a first subset of the labeled breath biopsy data is used for training and a second subset of the labeled breath biopsy data is used for testing. The testing may use one or more metrics to evaluate the performance of the trained machine learning model or machine learning models. The one or more metrics may vary and may depend on the tasks performed by the trained machine learning model, the training methods employed to train the machine learning model, and the architecture of the machine learning model. For example, in embodiments where the model performs a classification task, the metric may include the number, or percent, of true positives, false positives, true negatives, or false negatives for one or more classes. In some embodiments, the metric may include a sensitivity, specificity, accuracy and/or f-score. In some instances, a metric may be determined per class. In embodiments where the metric includes an f-score, the f-score may include a macro F1-score. In embodiments where the model performs clustering or anomaly tasks, the metric may include a silhouette coefficient or any other method of evaluating an unsupervised machine learning model such as, e.g., any of the methods found in: Palacio-Nino, J., Galiano, F.B. Evaluation Metrics for Unsupervised Learning Algorithms, which are herein incorporated by reference. In some embodiments, the metric may be used to determine if the trained machine learning model performs sufficiently using, e.g., a predetermined threshold (i.e., requirement).
In these instances, if the trained machine learning model does not meet the predetermined threshold, the model may be discarded and/or another model may be trained. In embodiments where another machine learning model is trained, one or more of the model architecture, training and/or the training data set may be modified prior to training. In some instances, machine learning models are trained until a trained machine learning model meets the predetermined threshold. The division between the first and second subsets of the labeled breath biopsy data used for training and testing, respectively, may vary. In some cases, roughly 80% of the labeled breath biopsy data may be used for training and roughly 20% for testing. In some instances, roughly 70% of the labeled breath biopsy data may be used for training and roughly 30% for testing.
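For a classification task, the per-class metrics named above and the comparison against a predetermined threshold may be computed as in the following sketch. The function names, the report layout, and the example threshold of 0.8 are assumptions for illustration; ratios with a zero denominator are reported as 0.0 by convention.

```python
from typing import Dict, Hashable, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable], positive: Hashable
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN), treating `positive` as the class of interest."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def evaluate(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict:
    """Accuracy, macro F1, and per-class sensitivity, specificity, and F1."""
    per_class = {}
    for c in sorted(set(y_true) | set(y_pred), key=str):
        tp, fp, tn, fn = confusion_counts(y_true, y_pred, c)
        sensitivity = _ratio(tp, tp + fn)
        precision = _ratio(tp, tp + fp)
        per_class[c] = {
            "sensitivity": sensitivity,
            "specificity": _ratio(tn, tn + fp),
            "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        }
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": _ratio(correct, len(y_true)),
        "macro_f1": _ratio(sum(m["f1"] for m in per_class.values()), len(per_class)),
        "per_class": per_class,
    }


def meets_threshold(report: Dict, metric: str = "macro_f1", threshold: float = 0.8) -> bool:
    """True if the chosen metric satisfies the predetermined requirement."""
    return report[metric] >= threshold


if __name__ == "__main__":
    truth = ["pos", "pos", "neg", "neg", "neg"]
    preds = ["pos", "neg", "neg", "neg", "pos"]
    report = evaluate(truth, preds)
    print(report, meets_threshold(report))
```

A model whose report fails `meets_threshold` would, under the approach described above, be discarded or replaced by a model trained with a modified architecture, training procedure, or training data set.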
In some embodiments, the training may further include validating the trained machine learning model or machine learning models. By validating in this context is meant evaluating the machine learning model during training using labeled breath biopsy data different from the labeled breath biopsy data used for training and testing. In some embodiments, a first subset of the labeled breath biopsy data is used for training, a second subset of the labeled breath biopsy data is used for testing, and a third subset of the labeled breath biopsy data is used for validating. The validating may use one or more metrics to evaluate the performance of the machine learning model or machine learning models such as, e.g., any of the metrics discussed above for testing. In some embodiments, the validating may be used to, e.g., select model parameters (e.g., select one or more machine learning algorithms to continue training), optimize or tune hyperparameters (e.g., model hyperparameters or algorithm hyperparameters), etc. The division between the first, second, and third subsets of the labeled breath biopsy data used for training, testing, and validating, respectively, may vary. In some cases, roughly 80% of the labeled breath biopsy data may be used for training, roughly 10% for testing, and roughly 10% for validating.
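A division of the labeled breath biopsy data into training, testing, and validating subsets of roughly 80%, 10%, and 10% may be sketched as below. The function name, the default fractions, and the fixed random seed are illustrative assumptions; the three subsets are disjoint and together cover the whole data set.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Iterable[T],
    train_frac: float = 0.8,
    test_frac: float = 0.1,
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle and split into (training, testing, validating) subsets.

    Whatever remains after the training and testing subsets are taken
    becomes the validating subset.
    """
    if not 0.0 < train_frac < 1.0 or test_frac < 0.0 or train_frac + test_frac > 1.0:
        raise ValueError("fractions must lie in [0, 1] and sum to at most 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


if __name__ == "__main__":
    train, test, validation = split_dataset(range(100))
    print(len(train), len(test), len(validation))
```

Where classes are imbalanced, a stratified split that preserves class proportions in each subset may be preferable to the simple shuffle shown here.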
In some embodiments, the machine learning model is trained using the outputs from other analytical tools in addition to the breath biopsy output file (i.e., the output from SESI-MS). In other words, both the breath biopsy output file and data obtained using other analytical tools are used as inputs for training the machine learning model and subsequently using the trained machine learning model to perform tasks in any of the manners as described above. In some embodiments, the other analytical tools may include a Raman spectroscopy analyzer, a breathalyzer, an optical absorbance sensing analyzer, a gas chromatography analyzer, electronic sensing using an electronic nose, a nuclear magnetic resonance spectroscopy analyzer, or any of the other breath analyzers as described above. In some cases, the other analytical tools may include analyzers configured to perform tests and generate outputs from non-breath samples provided by the subject such as, e.g., blood samples, hair samples, urine samples, DNA samples, etc.
In some cases, the machine learning model may be continuously updated based, e.g., on newly generated breath biopsy output files and newly obtained health records. In some instances, the machine learning model may be continuously updated based, e.g., on the data saved or archived to a database, or data warehouse, as discussed above. For example, the machine learning model may be updated by training incrementally as new data comes in, in batches once a certain amount of new data is available, or the machine learning model may be retrained from scratch once a certain amount of new data is available. In some cases, the machine learning model may be updated incrementally or in batches, and then completely retrained once a certain amount of new data is available (e.g., every certain number of batch updates). As discussed above, embodiments of the methods include training a machine learning model to identify a relationship between breath samples and a disease or condition using generated breath biopsy files and obtained health records. In some embodiments, the relationship may be difficult to obtain or extract (e.g., the relationship may be too complex to be understood or interpreted by a human or the relationship may be contained in a component of the machine learning model considered to be a “black box” such as within multiple layers of a NN). The machine learning model may be trained to perform any task associated with assessing a subject’s health including any task demonstrated, or enabled, by the obtained health records as described above. For example, the machine learning model may be trained to identify relationships between features of a breath sample (e.g., the relative abundance of a set of metabolites or other compounds) and the diagnosis of a disease or condition. The machine learning model may include, but is not limited to, any of the discussed models or any standard machine learning model, as well as combinations thereof, as is known in the art.
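The update strategy described above, in which the model is updated in batches and then completely retrained every certain number of batch updates, may be expressed as a small scheduling rule. The class name, the returned action strings, and the default batch size and retraining interval in the sketch below are hypothetical; the sketch only decides which action to take and does not itself train a model.

```python
from dataclasses import dataclass


@dataclass
class UpdateScheduler:
    """Decide when to update or fully retrain as new labeled data arrives."""

    batch_size: int = 50               # new samples needed to trigger an update
    batches_per_full_retrain: int = 4  # full retrain every N batch updates
    pending: int = 0
    batches_since_retrain: int = 0

    def on_new_samples(self, n: int) -> str:
        """Register `n` new samples; return 'wait', 'batch_update', or 'full_retrain'."""
        self.pending += n
        if self.pending < self.batch_size:
            return "wait"
        self.pending = 0  # all pending samples are consumed by this update
        self.batches_since_retrain += 1
        if self.batches_since_retrain >= self.batches_per_full_retrain:
            self.batches_since_retrain = 0
            return "full_retrain"
        return "batch_update"


if __name__ == "__main__":
    scheduler = UpdateScheduler(batch_size=10, batches_per_full_retrain=2)
    for arriving in (4, 6, 10, 3):
        print(arriving, scheduler.on_new_samples(arriving))
```

Setting `batch_size` to 1 approximates incremental updating as each new breath biopsy output file and health record becomes available.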
In some embodiments, the machine learning model may include an artificial neural network (NN). For example, the machine learning model may include, or be based on the architecture of, a recurrent neural network (RNN) or a transformer model. Training may depend on, e.g., the nature or architecture of the machine learning model, the nature of the obtained health records, and/or the nature of the disease or condition of interest. In some cases, the machine learning model may be trained using supervised learning methods, in which relevant data of interest (e.g., disease diagnoses, gene expression, microbiome bacteria) may be extracted from the health records and used to label the corresponding breath biopsy output file of each subject. In some instances, the machine learning model may be trained using unsupervised approaches. In some cases, both supervised and unsupervised approaches may be utilized in order to assess a subject’s health (e.g., to diagnose a disease or condition). In some embodiments, the training may further include validating and testing the machine learning model. The trained machine learning model may be applied to a breath biopsy output file (e.g., different from the files used for training) to generate a health report, as discussed in greater detail below.
FIG. 9 provides a flow diagram depicting a method for training a machine learning model using generated breath biopsy output files and obtained health records in accordance with an embodiment of the invention. At step 901, breath samples from a plurality of subjects are analyzed with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a plurality of breath biopsy output files. At step 903, a health record associated with a disease or condition of interest is obtained for each subject. The obtained health records are then used to select labels of interest enabled by the health records at step 904, and to extract the labels of interest from each health record at step 905. At step 902, each extracted label (e.g., a disease or condition diagnosis) is then associated with the breath biopsy output file of the subject from whose health record the label was extracted. At step 906, a machine learning model (such as, e.g., an RNN, CNN, transformer, or regression model) may then be trained using supervised or semi-supervised machine learning methods, the labeled breath biopsy output files, and the selected labels of interest. In some embodiments, components of the obtained health records such as, e.g., other non-breath health assessments or physiological data, are also labeled and used to train the machine learning algorithm in addition to the labeled breath biopsy output files. At step 907, another breath biopsy output file, separate from the breath biopsy output files used for training, is generated from a subject. At step 908, the trained machine learning algorithm is applied to the breath biopsy output file in order to classify the breath biopsy output file (step 909). In some cases, the breath biopsy output file is classified as, e.g., generated by a subject diagnosed with a disease or condition, reflecting one or more components of a non-breath health assessment, etc.
The classified breath biopsy output file, along with any other health records obtained for the subject, may then be saved to a database or a data warehouse (e.g., as discussed above) in order to continuously train the machine learning model or train other machine learning models that may be applied to future breath biopsy output files.
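By way of illustration only, the flow of FIG. 9 may be sketched as follows. The sketch assumes that each breath biopsy output file has already been reduced to a short numeric feature vector, and it substitutes a simple nearest-centroid classifier for the RNN, CNN, transformer, or regression model named above. All function names, subject identifiers, and values are hypothetical.

```python
from collections import defaultdict


def extract_label(health_record, label_of_interest):
    # Steps 904-905: pull the selected label out of one health record.
    return health_record[label_of_interest]


def label_files(breath_files, health_records, label_of_interest):
    # Step 902: pair each subject's feature vector with that subject's label.
    return [
        (features, extract_label(health_records[subject], label_of_interest))
        for subject, features in breath_files.items()
    ]


def train_centroids(labeled):
    # Step 906 (stand-in model): average the feature vectors of each label.
    sums, counts = {}, defaultdict(int)
    for features, label in labeled:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [a + b for a, b in zip(sums[label], features)]
        counts[label] += 1
    return {label: [v / counts[label] for v in total]
            for label, total in sums.items()}


def classify(model, features):
    # Steps 908-909: assign the label of the nearest centroid
    # (squared Euclidean distance).
    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(model, key=lambda label: distance(model[label]))


# Hypothetical training data: two features per breath biopsy output file.
breath_files = {"s1": [0.9, 0.1], "s2": [0.8, 0.2],
                "s3": [0.1, 0.9], "s4": [0.2, 0.7]}
health_records = {"s1": {"diagnosis": "condition"},
                  "s2": {"diagnosis": "condition"},
                  "s3": {"diagnosis": "healthy"},
                  "s4": {"diagnosis": "healthy"}}

model = train_centroids(label_files(breath_files, health_records, "diagnosis"))
new_file = [0.7, 0.3]          # Step 907: a file not used for training
result = classify(model, new_file)
print(result)
```

The nearest-centroid model is chosen only because it fits in a few lines; the ordering of label selection, label association, training, and classification is the part that mirrors the figure.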
Generating a Health Report
As discussed above, embodiments of the invention include applying a trained machine learning model to a breath biopsy output file to generate a health report for the subject. The health report is a qualitative or quantitative determination regarding one or more health related matters pertaining to the subject. The health report, generated in accordance with embodiments of the methods, may vary. In some embodiments, the health report may be generated for the subject from the data of the converted (e.g., to mzML format) identifier associated breath biopsy output file such as, e.g., from the intuitive data set generated from the breath biopsy output file. In some cases, the health report may be generated for the subject from the identifier associated breath biopsy output file and the metadata file associated therewith. In some embodiments, the health report may be generated or obtained based at least in part on the breath biopsy output file (i.e., breath assay data) as described above and/or on non-breath assay data (e.g., data not obtained from a breath sample).
A health report may be generated or obtained at two or more timepoints. In some instances, a health report may be generated or obtained at three or more timepoints (i.e., to generate three or more health reports, such as four or more, or five or more, or ten or more). The two or more timepoints may be at least a day apart from each other, such as at least a week apart from each other, or at least a month apart from each other, or at least a year apart from each other. In some instances, a first timepoint of the two or more timepoints may occur after a potential exposure of the subject to a source of toxins or an indication that the subject may have a disease or condition. In other cases, a first timepoint of the two or more timepoints occurs before a potential exposure of the subject to a source of toxins or an indication that the subject may have a disease or condition in order to, e.g., function as a baseline as discussed above. In these instances, the first timepoint may occur prior to the subject initiating employment (e.g., as a firefighter) or moving to a new location. The subject may be assayed (i.e., a timepoint may occur) every set number of days or months while they are at a certain location or working a certain profession (e.g., firefighting).
In embodiments where the health report may be generated or obtained based at least in part on non-breath assay data, the non-breath assay data may vary. For example, in some embodiments the health report includes one or more non-breath health assessments. While the one or more additional health assessments may vary, in some instances, the one or more additional health assessments may include a health assessment selected from the group consisting of a lung health assessment, an assessment of fitness for a given task(s), a medical imaging assessment (e.g., an ultrasound assessment), a biological sample assessment (e.g., urine tests, feces tests, blood tests, biopsies, etc.) and combinations thereof. In some embodiments, the non-breath assay data may include a microbiome test or assay. In some embodiments, the non-breath assay data may include the medical history or health records of the subject. In certain cases, the non-breath assay data may include physiological data, such as, but not limited to, one or more of heart rate, blood glucose, blood pressure, respiration rate, body temperature, blood volume, sound pressure, photoplethysmography, electroencephalogram, electrocardiogram, blood oxygen saturation, and skin conductance. In some embodiments, the physiological data may be obtained using a wearable device. Wearable devices in accordance with embodiments of the methods may include, but are not limited to, smartwatches (e.g., Apple watches, Garmin watches, or Fitbit® watches), sleep trackers (e.g., Oura rings), or heart rate monitors. In some embodiments, the wearable device is a smartwatch such as, e.g., a Fitbit® watch. In some embodiments, the wearable device may include motion sensors (e.g., accelerometers and gyroscopes), electrical sensors (e.g., electrocardiogram sensors), or light sensors (e.g., photoplethysmography (PPG) sensors). In some embodiments, the wearable device is a medical Internet of Things (IoT) device.
Medical IoT devices of interest may include, but are not limited to, implanted medical devices (IMDs) (e.g., insulin pumps or defibrillators), wearable medical devices (e.g., continuous glucose monitors), and discrete devices (e.g., IoT-enabled blood pressure cuffs).
As described above, the health report may include data from the breath biopsy output file (i.e., breath assay data) and non-breath assay data (e.g., other health assessments, the subject’s medical history, data gathered from wearable devices, etc.). In some instances, the health report includes an interpretation of the breath assay data and non-breath assay data. The interpretation may be derived based on the breath assay data and non-breath assay data individually and/or in combination with one another. In some embodiments, the interpretation may include the likelihood that the subject has a disease or condition (e.g., a potential diagnosis). In these instances, the interpretation may include the severity or stage of the disease or condition. In some embodiments, the interpretation may include the likelihood or risk level the subject may have of developing a disease or condition. In some cases, the presence of one or more compounds and the abundance (e.g., concentration) of each compound relative to one another in a breath sample may be correlated with a disease or condition fingerprint (e.g., using a machine learning model as described above). In some embodiments, the potential diagnosis and/or risk level is generated by analyzing or assaying the breath sample for the presence of one or more compounds (or, e.g., unidentified m/z peaks or measurements) of a disease or condition fingerprint. In these cases, the potential diagnosis and/or risk level may be generated by comparing the fingerprint of the disease or condition to the m/z peaks or measurements generated from the breath sample provided by the subject (e.g., the compounds, and the values of abundance thereof, detected in the breath sample assay as indicated by the identifier associated breath biopsy output file and intuitive data set generated therefrom) using the trained machine learning algorithm.
As described above, the health report may include an interpretation of the breath assay data alone or in combination with non-breath assay data. This interpretation may be generated using the trained machine learning algorithm (e.g., as discussed above) and may include a potential diagnosis and/or a risk level of a disease or condition generated, e.g., by comparing the fingerprint of a disease or condition to the determined presence of one or more compounds of a disease or condition fingerprint (e.g., and the values of abundance thereof) in the breath sample. For example, a potential diagnosis and/or a risk level for a cancer such as, e.g., colon cancer, can be generated by comparing the determined presence of one or more compounds in the breath sample to compounds associated or correlated with colon cancer when found in breath (i.e., a determined colon cancer fingerprint of compounds or metabolites). In some embodiments, the correlation or association of compounds found in a breath sample to a specific disease or condition (i.e., the relationship between compounds found in a breath and a disease or condition) is determined using previously generated breath biopsy output files. For example, the correlation or association can be determined by comparing the determined presence of compounds (e.g., and their relative abundances) found in the breath samples of healthy patients with the determined presence of compounds found in the breath samples of patients diagnosed with a disease or condition. In some cases, the correlation or association may be generated using a dynamic algorithm, such as, e.g., a machine learning model as discussed above.
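By way of illustration only, comparing the compounds detected in a breath sample with a disease or condition fingerprint may be sketched as follows. The sketch normalizes raw signal intensities to relative abundances and scores them against a fingerprint using cosine similarity, which is an assumed stand-in for the trained machine learning model described above. The compound names and numeric values are placeholders, not measured data.

```python
import math


def relative_abundance(intensities):
    """Scale raw intensities so that the values sum to 1."""
    total = sum(intensities.values())
    return {name: value / total for name, value in intensities.items()}


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical fingerprint expressed as relative abundances.
fingerprint = {"compound_a": 0.5, "compound_b": 0.25, "compound_c": 0.25}

# Hypothetical raw intensities detected in one breath sample.
sample = relative_abundance(
    {"compound_a": 200.0, "compound_b": 100.0, "compound_c": 100.0}
)

match_score = cosine_similarity(sample, fingerprint)
unrelated_score = cosine_similarity({"compound_d": 1.0}, fingerprint)
print(round(match_score, 3), unrelated_score)
```

A score near 1 indicates that the sample has the same relative composition as the fingerprint, and a score of 0 indicates no shared compounds; any threshold for reporting a potential diagnosis or risk level would need to be established separately.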
In some embodiments, a potential diagnosis and/or a risk level for chronic obstructive pulmonary disease (COPD) may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: 2-hydroxyisobutyric acid, aspartic acid semialdehyde, acetohydroxybutanoic acid, 11-hydroxyundecanoic acid, (+)-γ-hydroxy-L-homoarginine, oxo-tetradecenoic acid, hexadecatrienoic acid, or oxo-heptadecanoic acid in the breath sample. In these instances, a machine learning model, trained as discussed above, may be configured to provide the potential diagnosis and/or risk level using the determined presence and/or the relative abundance of one or more of the compounds disclosed above.
In some embodiments, a potential diagnosis and/or a risk level for pulmonary fibrosis may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: 4-hydroxyproline, allysine, proline, phenylalanine, pyroglutamic acid, valine, or leucine in the breath sample. In these instances, a machine learning model, trained as discussed above, may be configured to provide the potential diagnosis and/or risk level using the determined presence and/or the relative abundance of one or more of the compounds disclosed above.
In some embodiments, a potential diagnosis and/or a risk level for obstructive sleep apnea (OSA) may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: hexonate, hexonolactone, pentose, deoxypentose, hexose, butyrylcarnitine, propionylcarnitine, acryloylcarnitine, acetylcarnitine, carnitine, dehydrocarnitine, pentitol, deoxyhexose, hexuronate, hexitol, malonate semialdehyde, hydroxypropanoate, propanoate, hydroxybutyrate, succinate semialdehyde, methylaconitate, methylcitrate, aconitate, (iso)citrate, oxoglutarate, succinate, fumarate, malate, oxaloacetate, lactate, acetoacetate, pyruvate, acetolactate, glyoxylate, hydroxypyruvate, oxalate, methylcitrate, glycerate, or aminobutanoate in the breath sample. In these instances, a machine learning model, trained as discussed above, may be configured to provide the potential diagnosis and/or risk level using the determined presence and/or the relative abundance of one or more of the compounds disclosed above.
In some embodiments, a potential diagnosis and/or a risk level for coronavirus disease (COVID) and/or long COVID resulting from an infection of SARS-CoV-2 may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: tryptophan, glutamine, glutamic acid, citrulline, histidine, phenylalanine, neopterin, aspartic acid, or nicotinic acid in the breath sample. In these instances, a machine learning model, trained as discussed above, may be configured to provide the potential diagnosis and/or risk level using the determined presence and/or the relative abundance of one or more of the compounds disclosed above.
In some embodiments, a potential diagnosis and/or a risk level for myalgic encephalomyelitis (ME), chronic fatigue syndrome (CFS), ME/CFS, Lyme disease, or post-treatment Lyme disease may be generated for the subject based, at least in part, on the determined presence and/or the relative abundance of one or more of: 1-pyrroline-5-carboxylic acid, 13-carboxy-alpha-tocopherol, 2-aminobutyric acid, 2-hydroxy-3-methylbutyrate, 2-methylglutaconic acid, 2-octenoylcarnitine, 3-hydroxylaurate, 4-hydroxyperoxy-2-nonenal, 4-hydroxyphenyllactic, 4-imidazolone-5-propanoate, 5,6-dihydrothymine, acetamidopropanal, aconitic acid, adenosine, alanine, alpha-ketoglutarate, arginine, asx (asparagine/aspartic acid), beta-alanine, biotin, chenodeoxycholic acid, citrate, cysteinylglycine, deoxyguanosine, dimethyl sulfone, ethanolamine, fructoseglycine, gamma-aminobutyric acid, gamma-CEHC, gamma-glutamyl-threonine, glutamate, glutamine, histidine, hydroxyisocaproic acid, hydroxyproline, isoleucine, indolelactate, inosine 5’-monophosphate, isocitrate, L-kynurenine/formyl-5-hydroxykynurenamine, lactate, leucine, lithocholate, lysine, malate, N-carbamoylalanine, nicotinamide, octanoylcarnitine (C8), ornithine, oxaloacetate, phenylalanine, phenylalanylalanine, phenylalanylglycine, phenyllactic acid, proline, pyroglutamic acid, pyruvate, serine, stachydrine, succinic acid, succinylcarnitine, threonine, tiglylcarnitine, tyrosine, urocanate, valine, valylleucine, or 4-hydroxyphenylacetate/2-hydroxyphenylacetate/3,4-dihydroxyphenylacetate in the breath sample. In these instances, a machine learning model, trained as discussed above, may be configured to provide the potential diagnosis and/or risk level using the determined presence and/or the relative abundance of one or more of the compounds disclosed above.
As discussed above, the presence of one or more compounds and the abundance (e.g., concentration) of each compound relative to one another in a breath sample may be used to generate a potential diagnosis and/or a risk level of a disease or condition (e.g., using the trained machine learning model as discussed above). In some embodiments, the presence of one or more compounds and the abundance (e.g., concentration) of each compound relative to one another in a breath sample may be used to inform a differential diagnosis of a disease or condition, e.g., along with non-breath assay data as discussed above (e.g., microbiome tests, wearable device data, the subject’s medical history, etc.). For example, the breath assay data may be used to help distinguish among long COVID, ME/CFS, Lyme disease, and post-treatment Lyme disease when, e.g., a subject is experiencing symptoms of fatigue. In these instances, the differential diagnosis may be informed or generated, at least in part, using the trained machine learning model as discussed above.
As discussed above, the health report may include an interpretation derived based on the breath and non-breath assay data individually and/or in combination with one another. In some embodiments, the interpretation may include a general assessment of the subject’s overall health or the health or condition of an organ or system of the subject’s body. For example, the interpretation may include a general assessment of the subject’s lung health (e.g., lung health is excellent, overall good, somewhat poor, overall poor, etc.). In some embodiments, the interpretation may include a general assessment of a specific category of health risk or health threat to the subject. For example, the interpretation may include a general assessment regarding the threat or risk of toxin exposure (e.g., groups of toxins and/or specific toxins) to the subject. By toxin is meant an agent (e.g., a compound) known or suspected of being harmful to the subject being assayed. In some embodiments, a qualitative or quantitative determination regarding whether one or more toxins or toxin associated compounds (e.g., metabolites of the one or more toxins) are present in a breath sample may be obtained from the identifier associated breath biopsy output file (i.e., breath assay data), where the resultant determination is employed as an indicator of whether the subject from which the assayed breath sample was obtained has been exposed to the one or more toxins. In some instances, the qualitative or quantitative determination regarding the one or more toxins is obtained, at least in part, using the trained machine learning model as discussed above. Toxins and toxin associated compounds that may be detected in a breath sample, in accordance with embodiments of the methods, may vary and include, but are not limited to, carcinogens, PFAS compounds, and trichloroethylene and/or metabolites thereof.
In these cases, the general assessment may include the nature of risk posed by the one or more toxins and/or the level of exposure to the one or more toxins relative to a baseline (e.g., levels of probable carcinogen dichloromethane are unusually high, toxin exposure is low relative to the subject’s cohort, etc.). The baseline may include previous data generated from the subject (e.g., as described above) or may be a cohort average value such as an average level or amount of a given toxin found in a population or cohort of interest, e.g., firefighters.
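By way of illustration only, expressing a subject's toxin level relative to a cohort baseline may be sketched as follows. The sketch uses a z-score against the cohort mean, which is an assumed choice of statistic; the function name `exposure_vs_cohort`, the threshold of 2.0, and the cohort values are hypothetical.

```python
from statistics import mean, stdev


def exposure_vs_cohort(subject_level, cohort_levels, threshold=2.0):
    """Return (z_score, flag) for one toxin level against a cohort baseline.

    threshold is a hypothetical cut-off, in cohort standard deviations.
    """
    mu = mean(cohort_levels)
    sigma = stdev(cohort_levels)     # sample standard deviation
    if sigma == 0:
        raise ValueError("cohort levels are all identical")
    z = (subject_level - mu) / sigma
    if z >= threshold:
        flag = "unusually high"
    elif z <= -threshold:
        flag = "low relative to cohort"
    else:
        flag = "typical"
    return z, flag


# Hypothetical cohort levels of one toxin (arbitrary units).
cohort = [10, 12, 11, 9, 13]

z_high, flag_high = exposure_vs_cohort(16, cohort)
z_typical, flag_typical = exposure_vs_cohort(11, cohort)
print(round(z_high, 2), flag_high)
print(round(z_typical, 2), flag_typical)
```

The same routine could be applied with the subject's own earlier measurements in place of the cohort values when the baseline is previous data generated from the subject.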
In some embodiments, the interpretation may include a general assessment of a subject’s fitness for performing a task (e.g., driving, running, etc.) or undertaking a duty or responsibility (e.g., firefighting, piloting a vehicle, policing, construction, manufacturing, etc.). By fitness is meant the ability of the subject to perform and/or the risks associated with the subject undertaking (e.g., the potential risks to themselves, others, property, etc.) a task or tasks associated with the duty or responsibility. For example, the interpretation may include a general assessment regarding the fitness of a firefighter for duty.
In some embodiments, the health report may include a suggested next course of action. In embodiments where a next course of action is suggested, the suggested course of action may vary. In some instances, the course of action includes obtaining additional tests or consulting with additional medical professionals. For example, the suggested course of action may include consulting a specialist wherein a secondary opinion may be obtained, or additional testing may be recommended or ordered. In some embodiments, the suggested course of action may include a temporary or permanent modification to the subject’s responsibilities of employment. For example, the suggested course of action may include a period of time wherein the subject should avoid any potential for smoke inhalation if the subject is, e.g., a firefighter. In some embodiments, the suggested course of action may include an explanation regarding typical manners in which an individual may develop a higher risk of developing a disease or condition or a higher risk of being exposed to a toxin (e.g., sources of the toxin) and steps the subject may take to avoid or mitigate the risk. For example, the suggested course of action may include preventative measures, such as, e.g., a recommended diet or recommended personal protective equipment (PPE). In some embodiments, the suggested course of action may include a potential treatment regimen or therapy recommendation. By treatment regimen is meant a treatment plan that specifies the quantity, the schedule, and the duration of treatment. For example, the treatment regimen may include a suggested drug regimen, a detoxification process, or a suggested lifestyle change (e.g., dietary or exercise plans, etc.).
In some instances, the health report may include an evolution of a health risk, a disease or condition severity, or a likelihood of developing a disease or condition. By evolution is meant a progression of a metric over time, e.g., the progression of a health risk, a condition or disease severity, or a likelihood of developing a disease or condition over time. In some cases, the evolution is generated based at least in part on one or more previously obtained health reports. In some embodiments, the health evolution includes an explanation of how the relevant metric has changed over time. For example, the health evolution may include a peak, periods of decline or incline, and whether the metric is in a period of incline or decline at the time the present health report was obtained. In some embodiments, the health evolution may include an assessment of the effectiveness of a previously suggested next course of action (e.g., as described above). For example, the health evolution may include an assessment of the effectiveness of previously suggested preventative measures or detoxification processes. The assessment of effectiveness may be obtained based on whether the health evolution indicates the level of a metric is in a period of incline or decline at the time the present health report was obtained.
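By way of illustration only, summarizing the evolution of a metric across previously obtained health reports may be sketched as follows. The sketch locates the peak and determines whether the metric is in a period of incline or decline at the most recent timepoint by comparing the last two values, which is an assumed simplification. The function name and the example values are hypothetical.

```python
def summarize_evolution(series):
    """Summarize a metric over time.

    series: list of (timepoint_label, value) pairs in chronological order,
    containing at least two entries.
    """
    if len(series) < 2:
        raise ValueError("at least two timepoints are required")
    values = [value for _, value in series]
    # Index of the highest value (first occurrence if there is a tie).
    peak_index = max(range(len(values)), key=values.__getitem__)
    last_change = values[-1] - values[-2]
    if last_change > 0:
        trend = "incline"
    elif last_change < 0:
        trend = "decline"
    else:
        trend = "flat"
    return {
        "peak_timepoint": series[peak_index][0],
        "peak_value": values[peak_index],
        "current_trend": trend,
    }


# Hypothetical toxin-exposure metric taken from four health reports.
history = [("month 0", 2.0), ("month 1", 5.0),
           ("month 2", 4.0), ("month 3", 3.0)]
summary = summarize_evolution(history)
print(summary)
```

For a metric that a previously suggested course of action was meant to lower, a current trend of decline would be consistent with the measure being effective, in line with the assessment of effectiveness described above.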
In some embodiments, the health report may include one or more health scores. By health score is meant a quantitative evaluation of the subject’s overall health, the health or condition of an organ or system of the subject’s body, a health risk facing the subject, or the subject’s fitness for performing a task or undertaking a duty or responsibility compared with a baseline. The baseline may vary, and in some instances includes the average of data associated with a cohort, such as an average level or amount of a given toxin found in a population or cohort of interest, a likelihood of developing a disease or condition in a population or cohort of interest, or the average resting or peak heart rate found in a population or cohort of interest. In some instances, the baseline includes prior data obtained for the subject, e.g., prior data obtained for the subject 1 day prior to generating the health report, 1 week prior to generating the health report, 1 month prior to generating the health report, 6 months prior to generating the health report, 1 year prior to generating the health report, 5 years prior to generating the health report, etc. In some embodiments, a health score is generated for the subject’s overall health, lung health, exposure to toxins, risk of developing a disease or condition, or fitness for the duty associated with their employment (e.g., firefighting). The health score may be generated or obtained using the trained machine learning model as discussed above and breath assay data and/or non-breath assay data. For example, an overall health score may be generated that is a composite of the findings of the trained machine learning algorithm (e.g., applied to the breath assay data) and one or more additional health assessments (e.g., as discussed above).
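By way of illustration only, an overall health score that is a composite of the findings of the trained machine learning algorithm and one or more additional health assessments may be sketched as a weighted average. The weighting scheme, the component names, and the numeric values are assumptions made for the example and are not specified in the disclosure.

```python
def composite_score(component_scores, weights):
    """Weighted average of component scores that share a common scale.

    component_scores and weights are dicts keyed by component name.
    """
    total_weight = sum(weights[name] for name in component_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(score * weights[name]
                       for name, score in component_scores.items())
    return weighted_sum / total_weight


# Hypothetical component scores on a 0-100 scale.
scores = {"breath_model": 80.0, "lung_function": 70.0, "wearable": 90.0}
# Hypothetical weights giving the breath-based finding half of the total.
weights = {"breath_model": 0.5, "lung_function": 0.25, "wearable": 0.25}

overall = composite_score(scores, weights)
print(overall)
```

Because the result is divided by the total weight, the weights need not sum to one, and a component that is unavailable for a given subject can simply be omitted from `component_scores`.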
In some instances, the health report may include one or more personalized insights. A personalized insight may vary and includes, but is not limited to, the detection of an anomaly, a classification, the detection of a cluster, or a forecast. In some instances, the personalized insight includes an insight regarding the subject individually. In other instances, the personalized insight includes an insight regarding a group or cohort in which the subject belongs. In embodiments where the insight includes the detection of an anomaly, the insight may include the identification of unusual data. For example, the insight may be that a specific toxin is detected at a higher level or concentration than usual or the risk of developing a disease or condition is elevated (e.g., when compared to a baseline as described above). In embodiments where the insight includes a classification, the insight may include the identification of a group with similar data to the subject and, e.g., assigning and comparing the results and/or data of the subject to the group. For example, the insight may be that the subject has better overall health than 70% of firefighters (e.g., when the subject is a firefighter). In embodiments where the insight includes the detection of a cluster, the insight may include finding groups with similar results. For example, the insight may be that a city or location has the highest rate of a disease or exposure to a toxin.
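By way of illustration only, an insight of the form "the subject has better overall health than 70% of firefighters" may be computed by ranking the subject's score within the cohort, as sketched below. The function name and the cohort scores are hypothetical.

```python
def percent_of_cohort_below(subject_score, cohort_scores):
    """Percentage of cohort members whose score is strictly lower."""
    if not cohort_scores:
        raise ValueError("cohort must not be empty")
    below = sum(1 for score in cohort_scores if score < subject_score)
    return 100.0 * below / len(cohort_scores)


# Hypothetical overall health scores for a cohort of ten firefighters.
cohort_scores = [55, 60, 62, 65, 70, 72, 75, 80, 85, 90]
subject_score = 78

percentile = percent_of_cohort_below(subject_score, cohort_scores)
print(f"Better overall health than {percentile:.0f}% of the cohort")
```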
As discussed above, the health report may include one or more personalized insights. In some embodiments, the personalized insight may include a forecast. The forecast may include an evolution of a health risk, a disease or condition severity, or a likelihood of developing a disease or condition as described above. In some embodiments, the forecast may include an evolution of the subject’s overall health, the health or condition of an organ or system of the subject’s body, or the subject’s fitness for performing a task or undertaking a duty or responsibility compared with a baseline. In some embodiments, the forecast may include a predicted future outcome such as, e.g., a health outcome prediction for the subject. The health outcome can be predicted at least in part using a health report obtained as discussed above. For example, the predicted health outcome may be that the subject has a high risk of developing a specific disease or condition (e.g., chronic obstructive pulmonary disease (COPD) or a myocardial infarction (heart attack)). In some instances, the health outcome can be predicted at least in part using the trained machine learning algorithm, as discussed above. In some instances, the health report is used to determine if a particular event or source of toxin exposure has affected the subject's predicted health outcomes. In instances where the subject is assayed at two or more timepoints to generate two or more health reports the two or more health reports may be used to, e.g., determine changes in exposure of the subject to toxins over time, determine a clearance time of toxins from the subject, or predict one or more health outcomes for the subject using some combination of the two or more health reports. In some cases, some combination of the two or more health reports is used to determine if a particular event or source of toxin exposure has affected the subject’s predicted health outcomes.
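By way of illustration only, determining a clearance time of a toxin from health reports obtained at two timepoints may be sketched as follows. The sketch assumes first-order (exponential) elimination, i.e., level(t) = level1 · exp(-k · (t - t1)), which is a modeling assumption not stated in the disclosure and may not hold for a given toxin. The function name and the numeric values are hypothetical.

```python
import math


def clearance_estimate(t1_days, level1, t2_days, level2, target_level):
    """Estimate half-life and time to reach target_level.

    Assumes exponential decay between two measurements. Returns
    (half_life_days, days_from_t1_until_target).
    """
    if t2_days <= t1_days:
        raise ValueError("second timepoint must follow the first")
    if not level1 > level2 > 0:
        raise ValueError("levels must be positive and decreasing")
    if not 0 < target_level <= level1:
        raise ValueError("target must be positive and not above level1")
    # Decay constant from the two measurements.
    k = math.log(level1 / level2) / (t2_days - t1_days)
    half_life = math.log(2) / k
    days_to_target = math.log(level1 / target_level) / k
    return half_life, days_to_target


# Hypothetical measurements: level 8.0 on day 0 and 4.0 on day 10.
half_life, days_to_target = clearance_estimate(0, 8.0, 10, 4.0, target_level=1.0)
print(round(half_life, 2), round(days_to_target, 2))
```

With three or more timepoints, a fit over all measurements would generally be preferable to the two-point estimate shown here.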
In some embodiments, the health report may include a metabolic profile or metabolic profiles of the breath sample of the subject. By metabolic profile is meant a higher-level view of the state of metabolic pathways or presence of various groupings of compounds in the individual at the time the breath is collected. A metabolic profile may compare a particular breath or breaths obtained from the subject to a baseline (e.g., as described above). Abnormal metabolic profiles may help identify the causes of certain symptoms, screen for disease, and guide treatment regimens. The metabolic profiles may be tailored to assist medical professionals with decision making. For example, compounds associated with specific diseases or symptoms, or falling under the same category of toxin, may be grouped together and intuitively displayed, e.g., with their determined levels or values of abundance.
In some embodiments, the health report may include notes or explanations aiding the subject, or a person associated with the subject, in interpreting the results of the health report. For example, the report may include an explanation regarding typical manners in which the risk of developing a disease or condition is enhanced, steps the subject may take to avoid risk enhancement, and/or symptoms associated with different diseases or conditions. In some cases, the health report may include notes indicating information relevant specifically to the subject. For example, the report may include a note indicating a difference between the level of risk the subject may have of developing a disease or condition relative to a baseline, or symptoms associated with the determined level of one or more compounds in the breath sample (e.g., dizziness, headaches, skin lesions, etc.). In some embodiments, the health report may include a background section such as, e.g., a background section explaining the purpose of the health report and the implication of certain results. In some embodiments, the health report may include visual means aiding the subject, or a person associated with the subject, in interpreting the findings of the health report and/or the results of the breath sample assay (e.g., figures, charts, images, etc.). The visual means may be a component of, or accompany, any of the components the health report is comprised of such as, e.g., any of the components described above. For example, in embodiments where the health report includes a health score, a figure including a number line with the relevant ranges displayed and an indication of the subject’s score may accompany the health score (or, e.g., the health score may consist of the figure). In some embodiments, the health report may be obtained or generated, at least in part, using the trained machine learning model as discussed above. 
In these instances, any of the components the health report is comprised of such as, e.g., any of the components described above may be generated or obtained, at least in part, using the trained machine learning model. For example, in embodiments where the health report includes a classification or the detection of a cluster as described above, the classification or detection may be generated or obtained using the trained machine learning model.
As discussed above, a health report can be generated for a subject from breath assay data. In some instances, the health report is generated in real time. By real time is meant the health report is generated during or immediately following the breath sample assay (e.g., during collection of the breath sample or while the breath sample is being assayed using, e.g., a mass spectrometer). In some instances, the health report is generated in two hours or less. In some cases, the health report is generated in one hour or less, such as thirty minutes or less, or twenty minutes or less, or ten minutes or less, or five minutes or less, or one minute or less following generation of the breath biopsy output file. In some instances, the health report is generated in real-time, e.g., as described in United States Provisional Application Serial Nos. 63/359,134 and 63/416,185 (Attorney docket nos. DIAG-003PRV and DIAG-003PRV2, respectively); the disclosures of which are herein incorporated by reference.
In some instances, the health report is associated with an identifier of the subject. The health report and associated identifier may be saved to a database such as, e.g., a database including a data warehouse. In some instances, the data warehouse is used to determine a relationship between health outcomes and compounds (e.g., and the values of abundance thereof) detected in breath samples. The relationship may be determined, at least in part, using a trained machine learning model (e.g., as discussed above). In some instances, the determined relationship may be used to generate a subsequent health report.
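As a minimal sketch of saving a health report together with a subject identifier, the following uses an in-memory SQLite database as a stand-in for the database or data warehouse. The table name, column names, and helper functions (`open_store`, `save_report`, `reports_for`) are assumptions made for illustration and are not taken from the disclosure.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a report store; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS health_reports ("
        " report_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " subject_id TEXT NOT NULL,"
        " created_at TEXT NOT NULL,"
        " report_json TEXT NOT NULL)"
    )
    return conn


def save_report(conn, subject_id, created_at, report):
    """Save one health report under the subject's identifier."""
    cur = conn.execute(
        "INSERT INTO health_reports (subject_id, created_at, report_json)"
        " VALUES (?, ?, ?)",
        (subject_id, created_at, json.dumps(report)),
    )
    conn.commit()
    return cur.lastrowid


def reports_for(conn, subject_id):
    """Return a subject's reports, oldest first, e.g. to inform a later report."""
    rows = conn.execute(
        "SELECT report_json FROM health_reports"
        " WHERE subject_id = ? ORDER BY created_at",
        (subject_id,),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]
```

Retrieving earlier reports by identifier is one way a previously stored result could feed into a subsequent health report; the disclosure does not specify a storage schema.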
In some embodiments, the method further includes suggesting preventative measures based on the health report, such as, e.g., recommended personal protective equipment (PPE) to avoid potential future exposure to a toxin or the development of a disease or condition. In some embodiments, the method further includes providing a therapy recommendation to the subject based on the health report. While the therapy recommendation may vary, in some instances the therapy recommendation includes recommendations regarding the specifics of administering some existing standard of care for the treatment of a disease or condition. In some instances, the method further includes administering the treatment to the subject. Embodiments of the methods may further include transmitting the health report, e.g., to a health care practitioner, to the subject, to an agent of the subject, etc. In some instances, the health report is received by a computer or mobile device application, such as a smart phone or computer app. In some cases, the health report is received by mail, electronic mail, fax machine, etc. Aspects of the invention further include methods of obtaining a health report, e.g., by breathing into a system of the invention as discussed in greater detail below; and receiving a health report from the system.
FIG. 3 provides a depiction of a health report obtained at least in part from a breath biopsy output file in accordance with an embodiment of the invention. In FIG. 3, first page 300 of the health report includes header 301 including information pertaining to the session in which the health report was generated and identifying information of the subject. Diagnostics section 302 includes breath assay data 303 including a chart summarizing results of a toxin screening and a chart depicting compounds detected in the breath assay associated with various diseases or conditions. Following diagnostics section 302, interpretation section 304 explains the significance of the breath assay data (and, e.g., the non-breath assay data) on the subject’s lung health and the health risks toxins may pose to the subject. The second page 305 of the health report includes toxin health risk evolution 306 and various health scores 307 obtained, e.g., as described above. Personal insights 308 are also provided as charts depicting evolutions of the subject’s overall health and lung health over the previous year and up to the time point at which the depicted health report was obtained.
FIG. 4 provides a depiction of various metabolic profiles that may be included in a health report obtained at least in part from a breath biopsy output file in accordance with an embodiment of the invention. Metabolic profile section 400 includes header 401 and selectable menu 402 provided to assist a viewer in navigating between sections of the health report when, e.g., the report is displayed on an electronic viewing device (e.g., a computer or a smart phone). Session summary 403 provides information pertaining to the session in which the breath sample assay was performed. The metabolic profile breakdown further includes the identifier of the subject 404 as well as various charts and graphs depicting metabolic profiles intuitively displayed in order to, e.g., assist medical professionals with decision making. Spider diagrams 405 depict the presence and relative abundance of compounds associated with pulmonary fibrosis, COPD, COVID/long COVID, and OSA. In some instances, the shape of a spider diagram may aid in the diagnosis of a disease such as, e.g., through differential diagnosis with non-breath assay data. Chart 406 summarizes the results of a toxin panel. Chart 407 summarizes the results of a metabolic profile including a wide variety of compounds.
FIG. 5 provides a section of a health report, obtained at least in part from a breath biopsy output file in accordance with an embodiment of the invention, breaking down the results of the breath sample assay as they relate to COPD. COPD breakdown 500 includes header 501 and selectable menu 502 provided to assist a viewer in navigating between sections of the health report when, e.g., the report is displayed on an electronic viewing device (e.g., a computer or a smart phone). Session summary 503 provides information pertaining to the session in which the breath sample assay was performed. The COPD breakdown further includes the identifier of the subject 504 as well as background section 505 and summary of results 506 including a risk level for the subject, the confidence level of the risk level, and a box plot intuitively displaying the determined presence of various compounds associated with COPD when found in breath.
FIGS. 6A-6B provide a depiction of a metabolic profile of toxins (i.e., a toxin panel) of a health report obtained at least in part from a breath biopsy output file in accordance with an embodiment of the invention. In FIG. 6A, first page 600 of the toxin panel includes header 601 and selectable menu 602 provided to assist a viewer in navigating between sections of the health report when, e.g., the report is displayed on an electronic viewing device (e.g., a computer or a smart phone). Background section 604 is provided to explain the purpose of the toxin panel to the viewer (e.g., the subject) and session summary 603 is included providing information pertaining to the session in which the breath sample assay was performed. The first page of the toxin panel further includes table 605 summarizing the findings of the toxin panel. Table 605 lists each selected toxin in a row with an assigned detection level as described above, a history of toxin presence in previous breath samples provided by the subject (e.g., as determined by the findings of one or more previous health reports), and an explanation regarding the toxin as described above. In FIG. 6B, second page 606 of the toxin panel breaks each selected toxin into one of tables 607-609 based on a classification of each toxin (e.g., as Group 1 or Group 2A carcinogens as classified by the International Agency for Research on Cancer (IARC)). Each of tables 607-609 lists selected toxins classified in the respective category in a row with an assigned detection level (e.g., as described above) and a note highlighting any changes in detected toxin level from a previous breath sample provided by the subject (i.e., a temporal change). The second page of the toxin panel further includes chart 610 summarizing the results of the toxin panel.
SYSTEMS
Aspects of the present disclosure further include systems, such as computer-controlled systems, for practicing embodiments of the above methods. Aspects of the systems include: a particle analyzer configured to receive a breath sample; a processor configured to receive the measurements generated by the particle analyzer; and memory operably coupled to the processor wherein the memory includes instructions stored thereon, which when executed by the processor, cause the processor to: analyze breath samples from a plurality of subjects to generate a plurality of breath biopsy output files; obtain a health record associated with a disease or condition for each subject; train a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; and apply the trained machine learning model to a breath biopsy output file to generate a health report regarding the disease or condition for a subject.
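The train-then-apply sequence recited above can be illustrated with a deliberately simple stand-in model. The sketch below assumes each breath biopsy output file has already been reduced to a mapping from compound name to abundance and each health record to a single label. It uses a nearest-centroid classifier purely for illustration; the disclosure does not limit the machine learning model to this technique, and the function names `train_centroids` and `generate_report` are hypothetical.

```python
from collections import defaultdict
from math import dist


def train_centroids(output_files, health_records):
    """Learn one mean abundance vector (centroid) per health-record label.

    output_files: list of dicts mapping compound name -> abundance (assumed format).
    health_records: list of labels, one per output file (assumed format).
    """
    compounds = sorted({c for f in output_files for c in f})
    sums = defaultdict(lambda: [0.0] * len(compounds))
    counts = defaultdict(int)
    for f, label in zip(output_files, health_records):
        vec = [f.get(c, 0.0) for c in compounds]  # missing compound -> 0.0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in sums[label]] for label in counts
    }
    return compounds, centroids


def generate_report(model, output_file):
    """Apply the trained model to one output file and return a tiny report."""
    compounds, centroids = model
    vec = [output_file.get(c, 0.0) for c in compounds]
    distances = {label: dist(vec, c) for label, c in centroids.items()}
    label = min(distances, key=distances.get)  # closest centroid wins
    return {"classification": label, "distances": distances}
```

In practice the feature extraction, model family, and report contents would follow the methods described above; this sketch only shows how training data from a plurality of subjects and a later single-subject file relate to each other.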
In some embodiments, the particle analyzer may be a mass spectrometer. The mass spectrometer may be configured to perform a variety of techniques/methods. In some embodiments, the mass spectrometer includes a high-resolution mass spectrometer (HRMS). In some embodiments, the mass spectrometer may be coupled to or include one or more of: an ion mobility spectrometer (IMS), a gas chromatograph (GC), a liquid chromatograph (LC), a differential mobility spectrometer (DMS), a field asymmetric ion mobility spectrometer (FAIMS), a selected-ion flow tube mass spectrometer (SIFT-MS), a proton-transfer-reaction mass spectrometer (PTR-MS), a time-of-flight mass spectrometer (TOF-MS), etc. In some embodiments, the mass spectrometer may be a Thermo Scientific high-resolution mass spectrometer (e.g., Thermo Scientific Exactive™, Q Exactive™, Exploris™) or a SCIEX high-resolution mass spectrometer (e.g., TripleTOF® mass spectrometer system).
In order for a mass spectrometer to measure the mass to charge ratio of a particle or particles (e.g., toxins), the particle(s) must first be ionized or charged using, e.g., an ionizer. The ionizer (e.g., ionization source) coupled to the mass spectrometer in accordance with embodiments of the invention may vary. In some embodiments, the ionizer is configured to perform matrix-assisted laser desorption/ionization (MALDI), atmospheric pressure chemical ionization (APCI), atmospheric pressure photoionization (APPI), electrospray ionization (ESI), secondary electrospray ionization (SESI), etc. In some embodiments, the ionizer is configured to perform SESI. In these instances, the ionizer may be a SUPER SESI™ device (e.g., a SUPER SESI™ QE or SUPER SESI™-X device). The ionizer may be configured to ionize particles in the breath sample, wherein the mass spectrometer may be configured to generate measurements of the mass-to-charge ratio of the ionized particles. In some embodiments, the mass spectrometer is configured to provide real-time feedback of the breath sample assay related to the quality of the breath sample. In some embodiments, the ionizer and mass spectrometer are configured to assay the breath sample in real time with respect to the subject providing the breath sample. In these embodiments, compounds that are exhaled from deeper in the lungs may be detected relatively later in the assay. In some embodiments, the mass spectrometer is configured to measure the time of detection of a toxin or toxin associated compound in the breath sample assay.
The systems may further include means for delivering a breath sample (e.g., one or more exhaled breaths of the breath sample) from the subject to the particle analyzer. In some instances, these delivery means may include a mouthpiece configured to seal to the lips of a subject and receive the breath sample from the subject. The delivery means may additionally include a breath chamber configured to receive the breath sample from the mouthpiece. In some instances, the breath chamber is operably coupled to the ionizer. In these cases, the delivery means may further include a valve configured to do one or more of: direct the breath sample along a desired flow path, control the flow rate of the breath sample into the ionizer, or block the flow of ambient air/the breath sample. In other cases, the breath chamber is configured to produce exhaled breath condensate (EBC) from the breath sample. In these instances, the system may include means for chilling the breath chamber. Chilling means may include, but are not limited to, a freezer or refrigerator, dry ice, or liquid nitrogen. In embodiments where an EBC is used, the system may further include aerosolization means configured to aerosolize the EBC prior to ionization such as, e.g., a nebulizer. In embodiments where an EBC is used, the system may further include means for stably storing the EBC such as, e.g., a refrigerator or a freezer.
In some embodiments, the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to generate an identifier associated breath biopsy output file, an intuitive data set generated from the breath biopsy output file, and/or a metadata file associated with the breath biopsy output file according to any of the methods as discussed above. In some embodiments, the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to generate a plurality of breath biopsy output files for a plurality of subjects and obtain a health record associated with a disease or condition for each subject according to any of the methods as discussed above. In these instances, the instructions, when executed by the processor, may cause the processor to train a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records according to any of the methods as discussed above. In some embodiments, the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to generate a health report regarding the disease or condition for a subject according to any of the methods as discussed above. In some embodiments, the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to generate an intuitive data set based on the identifier associated breath biopsy output file according to any of the methods as discussed above. In these instances, the instructions, when executed by the processor, may cause the processor to reduce the data of the identifier associated breath biopsy output file in order to generate the intuitive data set according to any of the methods as discussed above.
In some embodiments, the instructions, when executed by the processor, may cause the processor to first generate the intuitive data set before generating the health report according to any of the methods as discussed above.
In some embodiments, the memory includes instructions stored thereon, which when executed by the processor, further cause the processor to dynamically adjust breath collection automatically based on real-time feedback according to any of the methods as discussed above.
In some instances the systems further include one or more computers for complete automation or partial automation of the methods described herein. In some embodiments, systems include a computer having a computer readable storage medium with a computer program stored thereon.
In embodiments, the system includes an input module, a processing module and an output module. The subject systems may include both hardware and software components, where the hardware components may take the form of one or more platforms, e.g., in the form of servers, such that the functional elements of the system, i.e., those elements that carry out specific tasks (such as managing input and output of information, processing information, etc.), may be carried out by the execution of software applications on and across the one or more computer platforms of the system.
Systems may include a display and operator input device. Operator input devices may, for example, be a keyboard, mouse, or the like. The processing module includes a processor which has access to a memory having instructions stored thereon for performing the steps of the subject methods. The processing module may include an operating system, a graphical user interface (GUI) controller, a system memory, memory storage devices, and input-output controllers, cache memory, a data backup unit, and many other devices. The processor may be a commercially available processor or it may be one of other processors that are or will become available. The processor executes the operating system and the operating system interfaces with firmware and hardware in a well-known manner, and facilitates the processor in coordinating and executing the functions of various computer programs that may be written in a variety of programming languages, such as Java, Perl, C++, Python, other high-level or low-level languages, as well as combinations thereof, as is known in the art. The operating system, typically in cooperation with the processor, coordinates and executes functions of the other components of the computer. The operating system also provides scheduling, input-output control, file and data management, memory management, and communication control and related services, all in accordance with known techniques. The processor may be any suitable analog or digital system. In some embodiments, the processor includes analog electronics which provide feedback control, such as for example negative feedback control.
The system memory may be any of a variety of known or future memory storage devices. Examples include any commonly available random access memory (RAM), magnetic medium such as a resident hard disk or tape, an optical medium such as a read and write compact disc, flash memory devices, or other memory storage device. The memory storage device may be any of a variety of known or future devices, including a compact disk drive, a tape drive, a removable hard disk drive, or a diskette drive. Such types of memory storage devices typically read from, and/or write to, a program storage medium (not shown) such as, respectively, a compact disk, magnetic tape, removable hard disk, or floppy diskette. Any of these program storage media, or others now in use or that may later be developed, may be considered a computer program product. As will be appreciated, these program storage media typically store a computer software program and/or data. Computer software programs, also called computer control logic, typically are stored in system memory and/or the program storage device used in conjunction with the memory storage device.
In some embodiments, a computer program product is described including a computer usable medium having control logic (computer software program, including program code) stored therein. The control logic, when executed by the processor of the computer, causes the processor to perform functions described herein. In other embodiments, some functions are implemented primarily in hardware using, for example, a hardware state machine. Implementation of the hardware state machine so as to perform the functions described herein will be apparent to those skilled in the relevant arts.
Memory may be any suitable device in which the processor can store and retrieve data, such as magnetic, optical, or solid-state storage devices (including magnetic or optical disks or tape or RAM, or any other suitable device, either fixed or portable). The processor may include a general-purpose digital microprocessor suitably programmed from a computer readable medium carrying necessary program code. Programming can be provided remotely to the processor through a communication channel, or previously saved in a computer program product such as memory or some other portable or fixed computer readable storage medium using any of those devices in connection with memory. For example, a magnetic or optical disk may carry the programming, and can be read by a disk writer/reader. Systems of the invention also include programming, e.g., in the form of computer program products, algorithms for use in practicing the methods as described above. Programming according to the present invention can be recorded on computer readable media, e.g., any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; portable flash drive; and hybrids of these categories such as magnetic/optical storage media.
The processor may also have access to a communication channel to communicate with a user at a remote location. By remote location is meant the user is not directly in contact with the system and relays input information to an input manager from an external device, such as a computer connected to a Wide Area Network (“WAN”), telephone network, satellite network, or any other suitable communication channel, including a mobile telephone (i.e., smartphone).
In some embodiments, systems according to the present disclosure may be configured to include a communication interface. In some embodiments, the communication interface includes a receiver and/or transmitter for communicating with a network and/or another device. The communication interface can be configured for wired or wireless communication, including, but not limited to, radio frequency (RF) communication (e.g., Radio-Frequency Identification (RFID), Zigbee communication protocols, WiFi, infrared, wireless Universal Serial Bus (USB), Ultra Wide Band (UWB), Bluetooth® communication protocols, and cellular communication, such as code division multiple access (CDMA) or Global System for Mobile communications (GSM)).
In one embodiment, the communication interface is configured to include one or more communication ports, e.g., physical ports or interfaces such as a USB port, an RS-232 port, or any other suitable electrical connection port to allow data communication between the subject systems and other external devices such as a computer terminal (for example, at a physician’s office or in hospital environment) that is configured for similar complementary data communication.
In one embodiment, the communication interface is configured for infrared communication, Bluetooth® communication, or any other suitable wireless communication protocol to enable the subject systems to communicate with other devices such as computer terminals and/or networks, communication enabled mobile telephones, personal digital assistants, or any other communication devices which the user may use in conjunction.
In one embodiment, the communication interface is configured to provide a connection for data transfer utilizing Internet Protocol (IP) through a cell phone network, Short Message Service (SMS), wireless connection to a personal computer (PC) on a Local Area Network (LAN) which is connected to the internet, or WiFi connection to the internet at a WiFi hotspot.
In one embodiment, the subject systems are configured to wirelessly communicate with a server device via the communication interface, e.g., using a common standard such as 802.11 or Bluetooth® RF protocol, or an IrDA infrared protocol. The server device may be another portable device, such as a smart phone, Personal Digital Assistant (PDA) or notebook computer; or a larger device such as a desktop computer, appliance, etc. In some embodiments, the server device has a display, such as a liquid crystal display (LCD), as well as an input device, such as buttons, a keyboard, mouse or touch-screen.
In some embodiments, the communication interface is configured to automatically or semi-automatically communicate data stored in the subject systems, e.g., in an optional data storage unit, with a network or server device using one or more of the communication protocols and/or mechanisms described above.
Output controllers may include controllers for any of a variety of known display devices for presenting information to a user, whether a human or a machine, whether local or remote. If one of the display devices provides visual information, this information typically may be logically and/or physically organized as an array of picture elements. A graphical user interface (GUI) controller may include any of a variety of known or future software programs for providing graphical input and output interfaces between the system and a user, and for processing user inputs. The functional elements of the computer may communicate with each other via system bus. Some of these communications may be accomplished in alternative embodiments using network or other types of remote communications. The output manager may also provide information generated by the processing module to a user at a remote location, e.g., over the Internet, phone or satellite network, in accordance with known techniques. The presentation of data by the output manager may be implemented in accordance with a variety of known techniques. As some examples, data may include SQL, HTML or XML documents, email or other files, or data in other forms. The data may include Internet URL addresses so that a user may retrieve additional SQL, HTML, XML, or other documents or data from remote sources. The one or more platforms present in the subject systems may be any type of known computer platform or a type to be developed in the future, although they typically will be of a class of computer commonly referred to as servers. However, they may also be a main-frame computer, a workstation, or other computer type. They may be connected via any known or future type of cabling or other communication system including wireless systems, either networked or otherwise. They may be co-located or they may be physically separated. 
Various operating systems may be employed on any of the computer platforms, possibly depending on the type and/or make of computer platform chosen. Appropriate operating systems include Windows, iOS, Oracle Solaris, Linux, IBM i, Unix, and others.
Aspects of the present disclosure further include non-transitory computer readable storage mediums having instructions for practicing the subject methods. Computer readable storage mediums may be employed on one or more computers for complete automation or partial automation of a system for practicing methods described herein. In certain embodiments, instructions in accordance with the method described herein can be coded onto a computer-readable medium in the form of “programming”, where the term “computer readable medium” as used herein refers to any non-transitory storage medium that participates in providing instructions and data to a computer for execution and processing. Non-transitory computer-readable media include all computer-readable media except for a transitory, propagating signal. Examples of suitable non-transitory storage media include a floppy disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, magnetic tape, non-volatile memory card, ROM, DVD-ROM, Blu-ray disc, solid state disk, and network attached storage (NAS), whether or not such devices are internal or external to the computer. A file containing information can be “stored” on computer readable medium, where “storing” means recording information such that it is accessible and retrievable at a later date by a computer. The computer-implemented method described herein can be executed using programming that can be written in one or more of any number of computer programming languages. Such languages include, for example, Python, Java, JavaScript, C, C#, C++, Go, R, Swift, PHP, as well as many others.
The non-transitory computer readable storage medium may be employed on one or more computer systems having a display and operator input device. Operator input devices may, for example, be a keyboard, mouse, or the like. The processing module includes a processor which has access to a memory having instructions stored thereon for performing the steps of the subject methods. The processing module may include an operating system, a graphical user interface (GUI) controller, a system memory, memory storage devices, input-output controllers, cache memory, a data backup unit, and many other devices. The processor may be a commercially available processor or it may be one of other processors that are or will become available. The processor executes the operating system and the operating system interfaces with firmware and hardware in a well-known manner, and facilitates the processor in coordinating and executing the functions of various computer programs that may be written in a variety of programming languages, such as those mentioned above, other high level or low level languages, as well as combinations thereof, as is known in the art. The operating system, typically in cooperation with the processor, coordinates and executes functions of the other components of the computer. The operating system also provides scheduling, input-output control, file and data management, memory management, and communication control and related services, all in accordance with known techniques.
UTILITY
The methods and systems of the invention, e.g., as described above, find use in a variety of applications where it is desirable to make a qualitative or quantitative determination regarding one or more health-related matters pertaining to a subject. In some embodiments, the methods and systems described herein find use when it is desirable to enhance the accuracy of differential diagnoses. Embodiments of the present disclosure find use in applications wherein it is desired to acquire additional health information through non-invasive diagnostic procedures in order to, e.g., detect exposure to toxins or facilitate the early diagnosis of various diseases and conditions and, correspondingly, provide for improvements in patient outcomes. In some embodiments, the subject methods and systems may facilitate carcinogen exposure testing of a subject or the generation of data useful for the diagnosis of a disease or condition by low/minimally trained technicians. In some embodiments, the subject methods and systems may facilitate diagnosis for one or more conditions, insight on one or more health risks, or recommendations for one or more therapies or treatments.
EXAMPLES
The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to use the present invention and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.) but some experimental errors and deviations should be accounted for.
I. Detection of Compounds in Breath
Group 1 carcinogens
The breath sample of a healthy subject was assayed for the presence of twelve Group 1 carcinogens. The breath sample assay was run using a high-resolution mass spectrometer coupled to a SUPER SESI™ device run on negative-ion mode. The numbers reflect a value of abundance and a “ — ” denotes non-detection. Non-detection indicates that the respective compound was not present in the breath sample above the limit of detection. Most carcinogens are detected at trace levels, some in only one or two of the five breaths assayed. The results of the breath sample assay appear in Table 1, below:
Table 1: Detection of Group 1 carcinogens
[Table 1 is provided as an image (imgf000055_0001) in the original publication.]
PFAS
Breath samples from two subjects were assayed for the presence of six PFAS compounds. The breath sample assay was run using a high-resolution mass spectrometer coupled to a SUPER SESI™ device run on negative-ion mode. The numbers in the peak m/z column reflect the absolute value of a ratio of mass (i.e., Daltons) to charge at the center of the peak determined to correspond to the relevant compound. The numbers in the integrated IEC column reflect the area appearing under each respective peak on a produced extracted ion chromatogram, indicating relative abundance of the respective PFAS compound in the breath sample. A “ — ” denotes non-detection. Non-detection indicates that the respective compound was not present in the breath sample above the limit of detection. Most PFAS compounds are detected at trace levels. Some PFAS compounds are not detected in the breath sample assay, which may indicate a high elimination rate of the PFAS compound in the human body or a limited exposure of the subject to the PFAS compound. The results of the breath sample assay appear in Table 2 for Subject 1 and Table 3 for Subject 2, below:
Table 2: Detection of PFAS compounds in Subject 1
[Table 2 is provided as an image (imgf000056_0001) in the original publication.]
Table 3: Detection of PFAS compounds in Subject 2
[Table 3 is provided as an image (imgf000056_0002) in the original publication.]
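As an illustration of how an integrated peak area of the kind reported in Tables 2 and 3 might be computed, the following minimal Python sketch builds an extracted ion chromatogram for one target m/z and integrates it with the trapezoidal rule. It is not the applicant's implementation: the scan layout, the function names, the 5 ppm tolerance, and the numeric values are all assumptions made for the example.

```python
from typing import List, Tuple

# One scan = (acquisition time in seconds, list of (m/z, intensity) peaks).
# This in-memory layout is an assumption made for the sketch.
Scan = Tuple[float, List[Tuple[float, float]]]


def extract_ion_chromatogram(
    scans: List[Scan], target_mz: float, tol_ppm: float = 5.0
) -> List[Tuple[float, float]]:
    """Sum, per scan, the intensity of peaks within tol_ppm of target_mz."""
    tol = target_mz * tol_ppm * 1e-6  # absolute m/z window half-width
    eic = []
    for time_s, peaks in scans:
        total = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        eic.append((time_s, total))
    return eic


def integrate_eic(eic: List[Tuple[float, float]]) -> float:
    """Area under the chromatogram trace, by the trapezoidal rule."""
    area = 0.0
    for (t0, y0), (t1, y1) in zip(eic, eic[1:]):
        area += 0.5 * (y0 + y1) * (t1 - t0)
    return area


if __name__ == "__main__":
    # Made-up scans around a hypothetical target at m/z 400.0000.
    scans = [
        (0.0, [(400.0000, 0.0)]),
        (1.0, [(400.0004, 10.0), (350.0000, 99.0)]),  # off-target peak ignored
        (2.0, [(399.9996, 10.0)]),
        (3.0, [(400.0000, 0.0)]),
    ]
    eic = extract_ion_chromatogram(scans, target_mz=400.0)
    print(integrate_eic(eic))
```

A fixed ppm window is only one way to select the peak; a production pipeline would likely also subtract a baseline and restrict the integration to the detected peak boundaries.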
TCE and associated byproducts
The breath sample of a healthy subject was assayed for the presence of TCE and six TCE-associated byproducts. The breath sample assay was run using a high-resolution mass spectrometer coupled to a SUPER SESI™ device run on negative-ion mode. The numbers reflect a value of abundance and a “ — ” denotes non-detection. Non-detection indicates that the respective compound was not present in the breath sample above the limit of detection. Most agents are detected at trace levels, some in only one or two of the five breaths assayed. Some agents are not detected in any breaths of the breath sample assay, which may indicate a high elimination rate of the agent in the human body or a limited exposure of the subject to the agent. The results of the breath sample assay appear in Table 4, below:
Table 4: Detection of TCE and associated byproducts
[Table 4 is provided as an image (imgf000057_0001) in the original publication.]
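The per-breath detection logic described above (a compound counts as detected in a breath only when its signal exceeds the limit of detection, and some compounds appear in only one or two of the five breaths) can be sketched as follows. The function names, the data layout, and every numeric value are hypothetical; the publication does not state how the limit of detection was determined.

```python
from typing import Dict, List, Optional


def apply_lod(signal: float, lod: float) -> Optional[float]:
    """Return the signal if it exceeds the limit of detection, else None."""
    return signal if signal > lod else None


def summarize_detection(
    signals: Dict[str, List[float]], lod: Dict[str, float]
) -> Dict[str, dict]:
    """Count, per compound, the breaths in which it was detected."""
    summary = {}
    for compound, values in signals.items():
        detected = [
            v for v in (apply_lod(s, lod[compound]) for s in values)
            if v is not None
        ]
        summary[compound] = {
            "breaths_assayed": len(values),
            "breaths_detected": len(detected),
            # Mean over detected breaths only; None marks non-detection.
            "mean_abundance": sum(detected) / len(detected) if detected else None,
        }
    return summary


if __name__ == "__main__":
    # Made-up signals for five breaths of two placeholder compounds.
    signals = {
        "compound_a": [120.0, 3.0, 95.0, 1.0, 2.0],
        "compound_b": [1.0, 2.0, 0.5, 1.5, 0.0],
    }
    lod = {"compound_a": 10.0, "compound_b": 10.0}
    for name, row in summarize_detection(signals, lod).items():
        print(name, row)
```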
Disease related compounds
The breath sample of a healthy subject was assayed for the presence of compounds associated with COPD, Pulmonary Fibrosis, COVID/Long COVID, and OSA when found in the human breath. The breath sample assay was run using a high-resolution mass spectrometer coupled to a SUPER SESI™ device.
FIG. 4 provides the results of assaying the breath of the subject for the presence of various disease related compounds in accordance with an embodiment of the invention. Spider diagrams 405 depict the presence and relative abundance of compounds associated with pulmonary fibrosis, COPD, COVID/long COVID, and OSA. In some instances, the shape of a spider diagram may aid in the diagnosis of a disease such as, e.g., through differential diagnosis with non-breath assay data.
FIG. 5 provides the results of assaying the breath of the subject for the presence of various compounds associated with COPD in accordance with an embodiment of the invention. The box plot in results section 506 intuitively displays the determined presence of various compounds associated with COPD when found in breath.
II. Generating a Health Report
A health report was generated based in part on an identifier associated breath biopsy output file generated from a breath sample assay in accordance with embodiments of the invention.
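The step of detecting a newly generated breath biopsy output file and associating it with a subject identifier could be sketched as below. This is only an illustration under stated assumptions: the watched directory, the `.raw` suffix, the JSON sidecar file, and the function names are hypothetical and are not taken from the disclosure.

```python
import json
from pathlib import Path
from typing import List, Set


def detect_new_output_files(
    watch_dir: Path, seen: Set[str], suffix: str = ".raw"
) -> List[Path]:
    """Return output files not seen on earlier polls, and remember them."""
    new_files = sorted(
        p for p in watch_dir.glob(f"*{suffix}") if p.name not in seen
    )
    seen.update(p.name for p in new_files)
    return new_files


def associate_identifier(output_file: Path, subject_id: str) -> Path:
    """Record the subject identifier in a JSON sidecar next to the file."""
    sidecar = output_file.with_name(output_file.name + ".subject.json")
    record = {"subject_id": subject_id, "output_file": output_file.name}
    sidecar.write_text(json.dumps(record))
    return sidecar


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        watch_dir = Path(tmp)
        (watch_dir / "session_001.raw").write_bytes(b"")  # stand-in file
        seen: Set[str] = set()
        for path in detect_new_output_files(watch_dir, seen):
            print(associate_identifier(path, subject_id="SUBJ-0001").name)
```

Polling a directory is the simplest option to show here; an operating-system file-change notification would serve the same purpose.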
FIG. 3 provides a depiction of the health report obtained in part from the identifier associated breath biopsy output file. In FIG. 3, first page 300 of the health report includes header 301 including information pertaining to the session in which the health evaluation was generated and identifying information of the subject. Diagnostics section 302 includes breath assay data 303 including a chart summarizing results of a toxin screening and a chart depicting compounds detected in the breath assay associated with various diseases or conditions. Interpretation section 304 explains the significance of the breath assay data on the subject’s lung health and the health risks toxins may pose to the subject. The second page 305 of the health report includes toxin health risk evolution 306 and various health scores 307 obtained as described above. Personal insights 308 are provided as charts depicting evolutions of the subject’s overall health and lung health over the previous year and up to the timepoint at which the depicted health evaluation was obtained.
FIGS. 6A-6B provide a depiction of a metabolic profile of toxins (i.e., a toxin panel) of a health report obtained from the breath sample assay. In FIG. 6A, first page 600 of the toxin panel includes header 601 and selectable menu 602 for navigating between sections of the health report when it is displayed on an electronic viewing device. Background section 604 is also provided along with session summary 603 providing information pertaining to the session in which the breath sample assay was performed. Table 605 summarizes the findings of the toxin panel, listing each selected toxin in a row with an assigned detection level reflecting a relative value of abundance for the toxin. In FIG. 6B, second page 606 of the toxin panel breaks each selected toxin into one of tables 607-609 based on a classification of each toxin. Each of tables 607-609 lists selected toxins classified in the respective category in a row with the assigned detection level and a note highlighting any changes in detected toxin level from a previous breath sample of the subject. Chart 610 summarizes the results of the breath sample assay.
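The toxin panel rows described above, each carrying an assigned detection level and a note on changes since a previous breath sample, might be assembled along the following lines. The level names, the 0.33 and 0.66 cut-offs, and the function names are assumptions chosen for the sketch; the publication does not specify how detection levels are assigned.

```python
from typing import Dict, List, Optional

# Ordered from lowest to highest; the names are hypothetical.
LEVELS = ["not detected", "low", "moderate", "high"]


def detection_level(rel_abundance: Optional[float]) -> str:
    """Map a relative abundance in [0, 1] (None = non-detection) to a level."""
    if rel_abundance is None:
        return "not detected"
    if rel_abundance < 0.33:
        return "low"
    if rel_abundance < 0.66:
        return "moderate"
    return "high"


def change_note(current: str, previous: str) -> str:
    """Describe how the level moved relative to the previous sample."""
    cur, prev = LEVELS.index(current), LEVELS.index(previous)
    if cur > prev:
        return f"increased from {previous}"
    if cur < prev:
        return f"decreased from {previous}"
    return "no change"


def build_panel(
    current: Dict[str, Optional[float]],
    previous: Dict[str, Optional[float]],
) -> List[dict]:
    """One row per toxin: name, assigned level, and change note."""
    rows = []
    for toxin, value in current.items():
        level = detection_level(value)
        prev_level = detection_level(previous.get(toxin))
        rows.append(
            {"toxin": toxin, "level": level,
             "note": change_note(level, prev_level)}
        )
    return rows


if __name__ == "__main__":
    # Made-up relative abundances for two placeholder toxins.
    now = {"toxin_x": 0.70, "toxin_y": None}
    before = {"toxin_x": 0.20, "toxin_y": 0.10}
    for row in build_panel(now, before):
        print(row)
```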
In at least some of the previously described embodiments, one or more elements used in an embodiment can interchangeably be used in another embodiment unless such a replacement is not technically feasible. It will be appreciated by those skilled in the art that various other omissions, additions and modifications may be made to the methods and structures described above without departing from the scope of the claimed subject matter. All such modifications and changes are intended to fall within the scope of the subject matter, as defined by the appended claims.
It will be understood by those within the art that, in general, terms used herein, and especially in the appended claims (e.g., bodies of the appended claims) are generally intended as “open” terms (e.g., the term “including” should be interpreted as “including but not limited to,” the term “having” should be interpreted as “having at least,” the term “includes” should be interpreted as “includes but is not limited to,” etc.). It will be further understood by those within the art that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the following appended claims may contain usage of the introductory phrases “at least one” and “one or more” to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an” (e.g., “a” and/or “an” should be interpreted to mean “at least one” or “one or more”); the same holds true for the use of definite articles used to introduce claim recitations. In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of “two recitations,” without other modifiers, means at least two recitations, or two or more recitations).
Furthermore, in those instances where a convention analogous to “at least one of A, B, and C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, and C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). In those instances where a convention analogous to “at least one of A, B, or C, etc.” is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., “a system having at least one of A, B, or C” would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). It will be further understood by those within the art that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is also thereby described in terms of any individual member or subgroup of members of the Markush group.
As will be understood by one skilled in the art, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, middle third and upper third, etc. As will also be understood by one skilled in the art, all language such as “up to,” “at least,” “greater than,” “less than,” and the like include the number recited and refer to ranges which can be subsequently broken down into sub-ranges as discussed above. Finally, as will be understood by one skilled in the art, a range includes each individual member. Thus, for example, a group having 1-3 articles refers to groups having 1, 2, or 3 articles. Similarly, a group having 1-5 articles refers to groups having 1, 2, 3, 4, or 5 articles, and so forth.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it is readily apparent to those of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit or scope of the appended claims.
Accordingly, the preceding merely illustrates the principles of the invention. It will be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within its spirit and scope. Furthermore, all examples and conditional language recited herein are principally intended to aid the reader in understanding the principles of the invention and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions. Moreover, all statements herein reciting principles, aspects, and embodiments of the invention as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.
The scope of the present invention, therefore, is not intended to be limited to the exemplary embodiments shown and described herein. Rather, the scope and spirit of the present invention is embodied by the appended claims. In the claims, 35 U.S.C. §112(f) or 35 U.S.C. §112(6) is expressly defined as being invoked for a limitation in the claim only when the exact phrase "means for" or the exact phrase "step for" is recited at the beginning of such limitation in the claim; if such exact phrase is not used in a limitation in the claim, then 35 U.S.C. §112(f) or 35 U.S.C. §112(6) is not invoked.

Claims

WHAT IS CLAIMED IS:
1. A method of generating a health report for a subject, the method comprising: analyzing a breath sample from a subject with a secondary electrospray ionization-high-resolution mass spectrometry analyzer to generate a breath biopsy output file; automatically detecting the generated breath biopsy output file; associating the automatically detected generated breath biopsy output file with an identifier of the subject to produce an identifier associated breath biopsy output file; and producing the health report for the subject from the identifier associated breath biopsy output file.
2. The method according to Claim 1, wherein the method further comprises analyzing breath samples from a plurality of subjects to produce a plurality of identifier associated breath biopsy output files.
3. The method according to Claim 2, wherein the method further comprises: obtaining a health record associated with a disease or condition for each subject; training a machine learning model to identify a relationship between the breath samples and the disease or condition using the breath biopsy output files and the obtained health records; and applying the trained machine learning model to a breath biopsy output file, different from the breath biopsy output files used to train the model, to generate a health report regarding the disease or condition for a subject.
4. The method according to any of Claims 1 to 3, wherein the analyzing further comprises automatically configuring the mass spectrometry analyzer to perform selected ion monitoring based on real-time feedback of the measurements of the mass spectrometry analyzer.
5. The method according to Claim 4, wherein the automatic selected ion monitoring restricts the mass spectrometry analyzer to measure a select range of m/z values comprising one or more features of interest.
6. The method according to Claims 4 or 5, wherein the analyzing further comprises automatically configuring the mass spectrometry analyzer to perform fragmentation if the selected ion monitoring indicates the breath sample comprises one or more features of interest.
7. The method according to Claim 3, wherein one or more labels are extracted from each obtained health record and associated with the corresponding breath biopsy output file of each subject.
8. The method according to Claim 7, wherein the labeled breath biopsy output files are used to train the machine learning algorithm using supervised or semi-supervised learning.
9. The method according to Claims 7 or 8, wherein the breath biopsy output files are configured as a sequence of temporally linked scans.
10. The method according to Claim 9, wherein the machine learning algorithm is configured to process sequential input data.
11. The method according to Claim 10, wherein the machine learning algorithm is based on a recurrent neural network (RNN) model or a transformer model.
12. The method according to any of Claims 7 to 11, wherein each obtained health record comprises a non-breath health assessment.
13. The method according to Claim 12, wherein the one or more extracted labels comprises a component of the non-breath health assessment.
14. The method according to Claim 13, wherein the component of the non-breath health assessment is used to classify the breath biopsy output file different from the breath biopsy output files used to train the model.
15. The method according to Claim 12, wherein the one or more extracted labels are associated with each non-breath health assessment.
16. The method according to Claim 15, wherein the labeled breath biopsy output files and the labeled non-breath health assessments are used to train the machine learning algorithm using supervised or semi-supervised learning.
17. The method according to any of the preceding claims, wherein producing the health report for the subject from the identifier associated breath biopsy output file comprises converting the identifier associated breath biopsy output file to an open XML-based format.
18. The method according to Claim 17, wherein the open XML-based format is mzML format.
19. The method according to any of the preceding claims, wherein producing the health report comprises generating an intuitive data set from the identifier associated breath biopsy output file.
20. The method according to any of the preceding claims, wherein the method further comprises obtaining metadata associated with the identifier associated breath biopsy output file.
21. The method according to any of the preceding claims, wherein associating the automatically detected generated breath biopsy output file with an identifier of the subject comprises: receiving an identifier from the subject; and confirming that the generated breath biopsy output file is from analysis of the breath sample obtained from the subject.
22. The method according to Claim 21, wherein the identifier received from the subject comprises an alpha/numeric identifier.
23. The method according to Claim 21, wherein the identifier received from the subject comprises a code.
24. The method according to Claim 23, wherein the code comprises a QR code.
25. The method according to any of the preceding claims, wherein the identifier is associated with the automatically detected generated breath biopsy output file by a human operator.
26. The method according to any of the preceding claims, wherein the identifier is associated with the automatically detected generated breath biopsy output file by a program.
27. The method according to any of the preceding claims, wherein the method comprises associating the identifier with a prior health record of the subject.
28. The method according to any of the preceding claims, wherein the health report is generated in 10 minutes or less following generation of the breath biopsy output file.
29. The method according to Claim 28, wherein the health report is generated in 5 minutes or less following generation of the breath biopsy output file.
30. The method according to any of the preceding claims, wherein the health report is generated using a machine learning algorithm.
31. The method according to any of the preceding claims, wherein the method further comprises employing additional non-breath data from the subject to generate the health report.
32. The method according to Claim 31, wherein the additional non-breath data comprises physiological data.
33. The method according to Claim 32, wherein the physiological data comprises one or more of heart rate, blood glucose, blood pressure, respiration rate, body temperature, blood volume, sound pressure, photoplethysmography, electroencephalogram, electrocardiogram, blood oxygen saturation, and skin conductance.
34. The method according to any of Claims 32 and 33, wherein the physiological data has been obtained from a wearable device.
35. The method according to any of Claims 31 to 34, wherein the additional non-breath data comprises microbiome data.
36. The method according to any of the preceding claims, wherein the method further comprises transmitting the health report.
37. The method according to Claim 36, wherein the health report is transmitted to a health care practitioner.
38. The method according to any of Claims 36 and 37, wherein the health report is transmitted to the subject.
39. The method according to any of Claims 36 to 38, wherein the health report is received by a mobile device application.
40. A system configured to perform a method according to any of Claims 1 to 39.
41. A method comprising: breathing into a system according to Claim 40; and receiving a health report from the system according to Claim 40.
42. A non-transitory computer readable storage medium comprising instructions stored thereon for performing a method according to any of Claims 1 to 39.
PCT/US2023/027001 2022-07-07 2023-07-06 Rapid generation of breath-based health reports and systems for use in the same WO2024010854A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US202263359134P 2022-07-07 2022-07-07
US63/359,134 2022-07-07
US202263416185P 2022-10-14 2022-10-14
US63/416,185 2022-10-14
US202363461498P 2023-04-24 2023-04-24
US63/461,498 2023-04-24

Publications (1)

Publication Number Publication Date
WO2024010854A1 (en) 2024-01-11

Family

ID=89454064

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/027001 WO2024010854A1 (en) 2022-07-07 2023-07-06 Rapid generation of breath-based health reports and systems for use in the same

Country Status (1)

Country Link
WO (1) WO2024010854A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150051920A1 (en) * 2013-08-16 2015-02-19 Sohi, Llc System and method for communication between hub, office, and laboratory
US20170059535A1 (en) * 2015-09-02 2017-03-02 Labsystems Diagnostics Oy Novel methods and kits for detecting of urea cycle disorders using mass spectrometry
US20180275143A1 (en) * 2010-07-09 2018-09-27 Somalogic, Inc. Lung Cancer Biomarkers and Uses Thereof
CN109142503B (en) * 2018-08-23 2020-10-16 厦门大学 In-situ mass spectrometry detection device for heterogeneous catalytic reaction intermediate and product
US20210393235A1 (en) * 2020-06-19 2021-12-23 Ultrasound AI, Inc. Premature Birth Prediction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23836087

Country of ref document: EP

Kind code of ref document: A1