WO2025229620A1 - System and method for early diagnosis of endometriosis - Google Patents

System and method for early diagnosis of endometriosis

Info

Publication number
WO2025229620A1
Authority
WO
WIPO (PCT)
Prior art keywords
machine learning
data
learning model
subjects
patient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/IB2025/054638
Other languages
French (fr)
Inventor
Isabelle Katherine Marquez CHICKANOSKY
David A. Vorp
Nicole Michelle DONNELLAN
Timothy Kwang-Joon CHUNG
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Pittsburgh
Original Assignee
University of Pittsburgh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Pittsburgh

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/40 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for data related to laboratory analysis, e.g. patient specimen analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Definitions

  • Endometriosis (EM) is a painful gynecological disease affecting 10% of people with uteruses (~200 million) worldwide. EM is characterized by the presence of endometrial-like tissue outside the uterus in the peritoneal cavity, and patients with EM often suffer from infertility and chronic pelvic pain while awaiting diagnosis and treatment. The average time to EM diagnosis is 6.7 years from the onset of symptoms, owing to the general lack of knowledge of the disease, the excessive cost of diagnosis, and confounding symptoms that lead to misdiagnoses. The only currently accepted method of EM diagnosis is surgery with histological confirmation of disease.
  • Ultrasound and magnetic resonance imaging can be used to inform surgeons, but provide unreliable visualization of EM lesions, oftentimes only identifying high-stage, non-specific lesions.
  • Approximately 20-40 percent of patients who undergo the diagnostic surgery are found not to have EM, resulting in an unnecessary, invasive exploratory procedure. See Albee et al., Laparoscopic Excision of Lesions Suggestive of Endometriosis or Otherwise Atypical in Appearance: Relationship Between Visual Findings and Final Histologic Diagnosis, Minimally Invasive Gynecology, 2008, 15(1): 32-37.
  • An improved method for early, cost-effective, non-invasive, and accurate EM diagnosis would be a technological advance for patients and clinicians.
  • a computer-implemented method of training a machine learning model for identifying the presence of endometriosis in a symptomatic patient including receiving, with at least one processor, biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, receiving, with at least one processor, clinical data relating to one or more clinical features of one or more subjects, receiving, with at least one processor, survey data relating to one or more validated, subjective parameters of one or more subjects, training, with at least one processor and based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
  • Also provided herein is a computer-implemented method of identifying the presence of endometriosis in a symptomatic patient, including training, with at least one processor, a machine learning model based at least on biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects.
  • the method further includes applying, with at least one processor, the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient, and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis.
  • a system including at least one processor programmed or configured to receive biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, receive clinical data relating to one or more clinical features of one or more subjects, receive survey data relating to one or more validated, subjective parameters of one or more subjects, train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
  • a system including at least one processor programmed or configured to train a machine learning model based at least on biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects.
  • the at least one processor is further programmed or configured to apply the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient, and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis.
  • Also provided herein is a computer-implemented method of identifying the presence of endometriosis in a patient, including applying, with at least one processor, a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects.
  • a system comprising at least one processor programmed or configured to apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects.
  • Also provided herein is a non-transitory, computer-readable medium including programming instructions that, when executed by at least one processor, cause the at least one processor to receive biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, receive clinical data relating to one or more clinical features of one or more subjects, receive survey data relating to one or more validated, subjective parameters of one or more subjects, and train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
  • a non-transitory, computer-readable medium including programming instructions that, when executed by at least one processor, cause the at least one processor to apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient, and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects.
  • a computer-implemented method of training a machine learning model for identifying the presence of endometriosis in a patient comprising: receiving, with at least one processor, biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; receiving, with at least one processor, clinical data relating to one or more clinical features of one or more subjects; receiving, with at least one processor, survey data relating to one or more validated, subjective parameters of one or more subjects; and training, with at least one processor and based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
  • a computer-implemented method of identifying the presence of endometriosis in a patient comprising: training, with at least one processor, a machine learning model based at least on: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects; applying, with at least one processor, the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient; and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis.
  • a system comprising at least one processor programmed or configured to: receive biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; receive clinical data relating to one or more clinical features of one or more subjects; receive survey data relating to one or more validated, subjective parameters of one or more subjects; and train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
  • a system comprising at least one processor programmed or configured to: train a machine learning model based at least on: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects; apply the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis.
  • a computer-implemented method of identifying the presence of endometriosis in a patient comprising: applying, with at least one processor, a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
  • a system comprising at least one processor programmed or configured to: apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
  • a non-transitory, computer-readable medium comprising programming instructions that, when executed by at least one processor, cause the at least one processor to: receive biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; receive clinical data relating to one or more clinical features of one or more subjects; receive survey data relating to one or more validated, subjective parameters of one or more subjects; and train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
  • a non-transitory, computer-readable medium comprising programming instructions that, when executed by at least one processor, cause the at least one processor to: apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
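The train-then-apply flow recited above can be sketched in a few lines of Python. This is a minimal illustration only: the text here does not name a model type, so the logistic regression, the feature layout (one SNP flag, one clinical flag, one scaled survey score), the function names, and every toy value below are assumptions made for the example, not data or methods taken from the application.

```python
import math

def combine(biomarker, clinical, survey):
    """Concatenate the three feature groups into one vector for a subject."""
    return list(biomarker) + list(clinical) + list(survey)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, x):
    """Linear score; the last weight is the bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + weights[-1]

def train(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, label in zip(X, y):
            err = sigmoid(score(weights, x)) - label
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights

def has_em(weights, x, cutoff=0.5):
    """Binary determination from the predicted probability."""
    return sigmoid(score(weights, x)) >= cutoff

# Hypothetical toy subjects: (biomarker, clinical, survey, label).
SUBJECTS = [
    ((1,), (1,), (0.9,), 1),
    ((1,), (0,), (0.8,), 1),
    ((0,), (1,), (0.7,), 1),
    ((0,), (0,), (0.2,), 0),
    ((0,), (0,), (0.1,), 0),
    ((0,), (0,), (0.3,), 0),
]
X = [combine(b, c, s) for b, c, s, _ in SUBJECTS]
y = [label for *_, label in SUBJECTS]
model = train(X, y)

# Apply the trained model to a new (hypothetical) patient.
patient = combine((1,), (1,), (0.8,))
print(has_em(model, patient))
```

Keeping the three data sources as separate arguments to `combine` mirrors the separate "receive" steps in the text; a real pipeline would also need feature scaling and handling of missing values, which this sketch omits.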
  • FIG. 1 is a schematic depicting sources of inputs for systems and methods according to non-limiting embodiments described herein;
  • FIG. 2 is a schematic depicting a machine learning pipeline useful in implementing the systems and methods according to non-limiting embodiments described herein;
  • FIG. 3 is a schematic depicting a machine learning pipeline useful in implementing the systems and methods for identifying the presence of endometriosis (EM) according to non-limiting embodiments described herein;
  • FIG. 4 is a schematic depicting a machine learning pipeline useful in implementing the systems and methods for staging EM according to non-limiting embodiments described herein;
  • FIG. 5 is a schematic depicting a machine learning pipeline useful in implementing the systems and methods according to non-limiting embodiments described herein;
  • FIG. 6 is a schematic diagram of example components of one or more devices useful in systems and methods according to non-limiting embodiments described herein;
  • FIG. 7 shows a confusion matrix (panel A) and an ROC curve (panel B) for a machine learning model according to non-limiting embodiments described herein;
  • FIGS. 8A-8B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein;
  • FIGS. 9A-9B show a confusion matrix (A) and an ROC curve (B) for an additional machine learning model according to non-limiting embodiments described herein;
  • FIGS. 10A-10B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein;
  • FIGS. 11A-11B show a confusion matrix (A) and an ROC curve (B) for an additional machine learning model according to non-limiting embodiments described herein;
  • FIGS. 12A-12B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein;
  • FIGS. 13A-13B show a confusion matrix (A) and an ROC curve (B) for an additional machine learning model according to non-limiting embodiments described herein;
  • FIGS. 14A-14B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein;
  • FIGS. 15A-15B show a confusion matrix (A) and an ROC curve (B) for an additional machine learning model according to non-limiting embodiments described herein;
  • FIGS. 16A-16B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein.
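For readers unfamiliar with the two evaluation artifacts named in these figure descriptions, the following Python sketch shows how a confusion matrix and an ROC curve (with its area under the curve) can be computed from true labels and model scores. The labels, scores, and 0.5 cutoff are made-up illustrative values, not results from the application.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded 0/1."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by sweeping the cutoff over each distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= cutoff and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= cutoff and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical labels (1 = EM confirmed) and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_matrix(y_true, y_pred))
print(round(auc(roc_points(y_true, scores)), 4))
```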
  • satisfying a threshold may refer to a value being greater than the threshold, more than the threshold, higher than the threshold, greater than or equal to the threshold, less than the threshold, fewer than the threshold, lower than the threshold, less than or equal to the threshold, equal to the threshold, etc.
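Because "satisfying a threshold" can mean any of several comparisons, an implementation has to state which one it uses. A minimal Python sketch, with hypothetical mode names chosen for this example:

```python
import operator

# Hypothetical mode names mapped to the comparisons listed in the definition.
COMPARATORS = {
    "gt": operator.gt,  # greater than the threshold
    "ge": operator.ge,  # greater than or equal to the threshold
    "lt": operator.lt,  # less than the threshold
    "le": operator.le,  # less than or equal to the threshold
    "eq": operator.eq,  # equal to the threshold
}

def satisfies_threshold(value, threshold, mode="ge"):
    """Return True when `value` satisfies `threshold` under the chosen comparison."""
    return COMPARATORS[mode](value, threshold)

print(satisfies_threshold(0.5, 0.5, "ge"))
print(satisfies_threshold(0.5, 0.5, "gt"))
```

Making the comparison an explicit parameter avoids the ambiguity at the boundary, where "greater than" and "greater than or equal to" give different answers.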
  • the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based at least partially on” unless explicitly stated otherwise.
  • reference to an action being “based on” a condition may refer to the action being “in response to” the condition.
  • the phrases “based on” and “in response to” may, in some non-limiting embodiments or aspects, refer to a condition for automatically triggering an action (e.g., a specific operation of an electronic device, such as a computing device, a processor, and/or the like).
  • the term “communication” may refer to the reception, receipt, transmission, transfer, provision, and/or the like of data (e.g., information, signals, messages, instructions, commands, and/or the like).
  • one unit (e.g., a device, a system, a component of a device or system, combinations thereof, and/or the like) being in communication with another unit may refer to a direct or indirect connection (e.g., a direct communication connection, an indirect communication connection, and/or the like) that is wired and/or wireless in nature.
  • two units may be in communication with each other even though the information transmitted may be modified, processed, relayed, and/or routed between the first and second unit.
  • a first unit may be in communication with a second unit even though the first unit passively receives information and does not actively transmit information to the second unit.
  • a first unit may be in communication with a second unit if at least one intermediary unit processes information received from the first unit and communicates the processed information to the second unit.
  • a message may refer to a network packet (e.g., a data packet and/or the like) that includes data. It will be appreciated that numerous other arrangements are possible. Communication may include one or more wired and/or wireless networks.
  • communication may include a cellular network (e.g., a long-term evolution (LTE) network, a third-generation (3G) network, a fourth-generation (4G) network, a fifth-generation (5G) network, a code division multiple access (CDMA) network, etc.), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the public switched telephone network (PSTN) and/or the like), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, a cloud computing network, and/or the like, and/or a combination of some or all of these or other types of networks.
  • computing device may refer to one or more electronic devices configured to process data.
  • a computing device may, in some examples, include the necessary components to receive, process, and output data, such as a processor, a display, a memory, an input device, a network interface, and/or the like.
  • a computing device may be a mobile device.
  • a mobile device may include a cellular phone (e.g., a smartphone or standard cellular phone), a portable computer, a wearable device (e.g., watches, glasses, lenses, clothing, and/or the like), a personal digital assistant (PDA), and/or other like devices.
  • a computing device may also be a desktop computer or other form of non-mobile computer.
  • server may refer to or include one or more computing devices that are operated by or facilitate communication and processing for multiple parties in a network environment, such as the Internet, although it will be appreciated that communication may be facilitated over one or more public or private network environments and that various other arrangements are possible. Further, multiple computing devices (e.g., servers, mobile devices, etc.) directly or indirectly communicating in the network environment may constitute a “system.”
  • system may refer to one or more computing devices or combinations of computing devices (e.g., processors, servers, client devices, software applications, components of such, and/or the like).
  • references to “a device,” “a server,” “a processor,” and/or the like, as used herein, may refer to a previously-recited device, server, or processor that is recited as performing a previous step or function, a different device, server, or processor, and/or a combination of devices, servers, and/or processors.
  • a first device, a first server, or a first processor that is recited as performing a first step or a first function may refer to the same or different device, server, or processor recited as performing a second step or a second function.
  • machine learning may refer to a field of computer science that uses statistical techniques to provide a computer system with the ability to learn (e.g., to progressively improve performance of) a task with data without the computer system being explicitly programmed to perform the task.
  • a machine learning model may be developed for a set of data so that the machine learning model may perform a task (e.g., a task associated with a prediction) with regard to the set of data.
  • a machine learning model such as a predictive machine learning model, may be used to make a prediction regarding a risk or an opportunity based on a large amount of data (e.g., a large-scale dataset).
  • a predictive machine learning model may be used to analyze a relationship between the performance of a unit based on a large-scale dataset associated with the unit and one or more known features of the unit. The objective of the predictive machine learning model may be to assess the likelihood that a similar unit will exhibit the same or similar performance as the unit.
  • the large-scale dataset may be segmented so that the predictive machine learning model may be trained on data that is appropriate.
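The text does not say how the dataset is segmented. One common reading, assumed here purely for illustration, is a stratified split that holds out part of each class for evaluation so that the training segment keeps the same class balance. The function name, the 25% fraction, and the toy labels are hypothetical.

```python
import random

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split sample indices into train/test segments, preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Hold out at least one sample per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical labels: 8 subjects with EM (1) and 4 without (0).
labels = [1] * 8 + [0] * 4
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```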
  • Provided herein are systems and methods that utilize machine learning to train a model, based on biomarker data, clinical data, and survey data, for early and accurate identification of the presence of endometriosis in a patient.
  • Such systems and methods provide significant improvements in speed and accuracy of identification, as well as a reduction in invasiveness for the patient and in resources required by healthcare professionals, over existing methods of identifying the presence of endometriosis, which allows for earlier, more effective treatment (e.g., more time-efficient surgical identification and/or excision), a wider ability to provide healthcare services to a variety of populations, and a conservation of resources.
  • FIG. 1 is a schematic illustrating potential sources of data to be input to the systems and methods disclosed herein, for example to train a machine learning model capable of identifying and/or staging the presence of endometriosis (EM) in a patient.
  • the machine learning model is trained with data obtained from one or more subjects, who may or may not have EM.
  • the sources of data include biomarker data.
  • the biomarker data may include data obtained from one or more tissues and/or bodily fluids, for example, and without limitation, a subject’s blood, urine, and/or saliva.
  • Collected fluids, for example blood, may be peripheral or local to the uterus and/or peritoneal cavity. Collection and analysis of biomarkers from these sources is within the level of skill of those in the field, as set forth in, for example, Tian et al., Current biomarkers for the detection of endometriosis, Chinese Medical Journal 133(19) (2020) 2346-2352; Leyendecker et al., Endometriosis results from the dislocation of basal endometrium, Human Reproduction 17(10) (2002) 2725-2736; Stefansson et al., Genetic factors contribute to the risk of developing endometriosis, Human Reproduction 17(3) (2002) 555-559; and Sapkota et al., Meta-analysis identifies five novel loci associated with endometriosis highlighting key genes involved in hormone metabolism, Nature Communications 8(1) (2017) 1-12. As will be appreciated by those of skill in the art, numerous biomarkers that may be potentially relevant to identification
  • useful biomarkers may include genetic markers, for example the presence of single nucleotide polymorphisms (SNPs).
  • SNPs of interest may include those in WNT4, GREB1, and/or KDR.
  • the SNPs of interest may include one or more of rs12037376, rs11674184, rs6546324, rs10167914, rs1903068, rs760794, rs12700667, rs1537377, rs4762326, rs1250241, rs1971256, rs71575922, rs74491657, and rs74485684.
  • the SNPs of interest may include one or more of rs1519761, rs7512902, rs4141819, rs7739264, rs12700667, rs1537377, rs10965235, rs10859871, rs17773813, rs519664, and rs6542095.
  • Processes and assays known to those of skill in the art, for example polymerase chain reaction (PCR) and/or sequencing, may be used to identify genetic markers of interest.
  • useful biomarkers may include the presence, quantity, and/or concentration of extracellular vesicles, for example extracellular vesicles in the endometrium, hormones, for example estrogen, urocortin, and/or progesterone, neurotransmitters, growth factors, for example vascular endothelial growth factor, peptides, cytokines and/or immune factors, for example interleukin-1 beta (IL-1β), interleukin-1 receptor antagonist protein (IL-1RN), interleukin-2 (IL-2), interleukin-4 (IL-4), interleukin-6 (IL-6), interleukin-8 (IL-8), interleukin-10 (IL-10), interleukin-12 (IL-12 and/or IL-12p70), interleukin-17 alpha (IL-17a), interferon gamma (IFN-γ), leptin, glycodelin, chemokine ligand 20 (CCL-20), granulocyte colony-stimulating factor
  • Processes and assays known to those of skill in the art, for example chromatography, immunohistochemistry, immunofluorescent assays, immunosorbent assays (including ELISA), blotting, and/or binding assays, may be used to identify and/or quantify such biomarkers.
  • useful biomarkers may include the presence, quantity, and/or concentration of one or more cells.
  • the one or more cells may include stromal cells, for example endometrial stromal cells or endometrial epithelial cells (obtained, for example, with a non-invasive and/or minimally-invasive endometrial biopsy or menstrual effluent collection), circulating endometrium cells, and/or immune cells, for example neutrophils, eosinophils, basophils, mast cells, monocytes, macrophages, dendritic cells, natural killer cells, and/or lymphocytes. Processes and assays known to those of skill in the art, for example flow cytometry, may be used to identify and/or quantify such cells.
  • a prediction machine learning model may be generated to provide a prediction of whether a subject has EM and/or a stage of EM based on a training dataset.
  • the prediction machine learning model may include a machine learning model designed to receive, as an input, data associated with a subject (e.g., biomarker data, survey data, and/or clinical data associated with a subject) and provide, as an output, a prediction of whether a subject has EM and/or a stage of EM.
  • the prediction machine learning model may be designed to receive data associated with a subject during a time interval and provide an output that includes the prediction of whether a subject has EM and/or a stage of EM.
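  • As an illustration of the input described above, the following minimal sketch groups one subject's biomarker, clinical, and survey data and flattens it into an ordered feature vector. The class name, field names, feature names, and the choice to fill missing features with 0.0 are all assumptions made for this example; the source does not specify a data layout or an imputation rule.

```python
from dataclasses import dataclass, field


@dataclass
class SubjectRecord:
    """Hypothetical container for one subject's inputs (all names are illustrative)."""
    biomarker: dict = field(default_factory=dict)
    clinical: dict = field(default_factory=dict)
    survey: dict = field(default_factory=dict)


def to_feature_vector(record, feature_names):
    """Flatten a record into an ordered list of floats.

    Assumption: a feature that is absent from the record is encoded as 0.0.
    """
    merged = {**record.biomarker, **record.clinical, **record.survey}
    return [float(merged.get(name, 0.0)) for name in feature_names]


record = SubjectRecord(
    biomarker={"il6_pg_ml": 4.2},
    clinical={"age": 29, "took_muscle_relaxants": 1},
    survey={"ehp_total": 61.5},
)
feature_names = ["age", "took_muscle_relaxants", "il6_pg_ml", "ehp_total", "has_ibs"]
vector = to_feature_vector(record, feature_names)
print(vector)
```

  • Fixing the order of `feature_names` keeps the vector layout identical between training and inference, which most classifiers require.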
  • a system may store the prediction machine learning model (e.g., for later use).
  • the system may process data associated with a subject during a time interval (e.g., historical data associated with a subject) to obtain training data (e.g., a training dataset) for the prediction machine learning model.
  • the data may be processed by a system to change the data into a format that may be analyzed to generate the prediction machine learning model.
  • the data that is changed (e.g., the data that results from the change) may be referred to as training data.
  • a system may process the data associated with a subject during a time interval to obtain the training data based on receiving the data.
  • a system may process the data to obtain the training data based on the system receiving an indication, from a user (e.g., a user associated with a user device) of the system, that the system is to process the data, such as when the system receives an indication to generate a prediction machine learning model for a time interval corresponding to the data associated with a subject.
  • the system may process data associated with a subject by determining an EM prediction variable based on the data.
  • An EM prediction variable may include a metric, associated with a diagnosis of EM, which may be derived based on the data associated with a subject.
  • the EM prediction variable may be analyzed to generate a prediction machine learning model.
  • the EM prediction variable may include a variable associated with a particular medical condition of a subject, a variable associated with a diagnosis of a medical condition of a subject, a variable associated with whether a subject took a medication, a variable associated with responses to a survey (e.g., a survey associated with a determination of EM in a subject) provided by a subject, and/or the like.
  • the system may analyze the training data to generate the prediction machine learning model. For example, the system may use machine learning techniques to analyze the training data to generate the prediction machine learning model. In some non-limiting embodiments, generating the prediction machine learning model (e.g., based on training data obtained from historical data associated with a subject during a previous time interval) may be referred to as training the prediction machine learning model.
  • the machine learning techniques may include, for example, supervised and/or unsupervised techniques, such as decision trees, random forests, logistic regressions, linear regression, gradient boosting, support-vector machines, extra-trees (e.g., an extension of random forests), Bayesian statistics, learning automata, Hidden Markov Modeling, linear classifiers, quadratic classifiers, association rule learning, and/or the like.
  • the prediction machine learning model may include a model that is specific to a particular characteristic, for example, a model that is specific to a particular subject, a particular geographical area of subjects, a particular time interval during which a diagnosis of a medical condition may have been made for a subject, and/or the like.
  • the prediction machine learning model may be specific to a particular entity (e.g., a subject that fits into a demographic category, such as an age group).
  • the system may generate one or more prediction machine learning models for one or more subjects, a particular group of subjects, and/or one or more subjects of a particular group of subjects.
  • the system may identify one or more variables (e.g., one or more independent variables) as predictor variables (e.g., features) that may be used to make a prediction when analyzing the training data.
  • values of the predictor variables may be inputs to the prediction machine learning model.
  • the system may identify a subset (e.g., a proper subset) of the variables as the predictor variables that may be used to accurately predict whether EM is present in a subject and/or the stage of EM that is present in the subject.
  • the predictor variables may include one or more of the prediction variables, as discussed above, that have a significant impact (e.g., an impact satisfying a threshold) on a prediction of whether a subject has EM and/or a stage of EM as determined by the system.
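  • The selection of predictor variables whose impact satisfies a threshold can be sketched as follows. The source does not say how impact is measured; this example assumes, purely for illustration, that impact is the absolute Pearson correlation between a variable and the EM label. The feature names, the data values, and the 0.5 threshold are hypothetical.

```python
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def select_predictors(columns, labels, threshold):
    """Keep the variables whose assumed impact score satisfies the threshold."""
    scores = {name: abs_correlation(values, labels) for name, values in columns.items()}
    selected = sorted(name for name, score in scores.items() if score >= threshold)
    return selected, scores


# Hypothetical toy data: one column per candidate variable, one label per subject.
columns = {
    "took_muscle_relaxants": [1, 1, 1, 0, 0, 0],
    "age": [30, 25, 41, 33, 28, 39],
    "constant_feature": [1, 1, 1, 1, 1, 1],
}
labels = [1, 1, 1, 0, 0, 0]  # 1 = EM present, 0 = EM absent
selected, scores = select_predictors(columns, labels, threshold=0.5)
print(selected)
```

  • The result is a proper subset of the candidate variables; any other importance measure (for example, one derived from a tree ensemble) could be substituted for the correlation score.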
  • the system may validate the prediction machine learning model. For example, the system may validate the prediction machine learning model after the system generates the prediction machine learning model. In some non-limiting embodiments, the system may validate the prediction machine learning model based on a portion of the training data to be used for validation. For example, the system may partition the training data into a first portion and a second portion, where the first portion may be used to generate the prediction machine learning model, as described above. In this example, the second portion of the training data (e.g., the validation data) may be used to validate the prediction machine learning model.
  • the system may validate the prediction machine learning model by providing validation data associated with a user (e.g., data associated with one or more subjects) as input to the prediction machine learning model, and determining, based on an output of the prediction machine learning model, whether the prediction machine learning model correctly, or incorrectly, predicted whether a subject has EM and/or a stage of EM.
  • the system may validate the prediction machine learning model based on a validation threshold.
  • the system may be configured to validate the prediction machine learning model when the determination of whether a subject has EM and/or a stage of EM (as identified by the validation data) are correctly predicted by the prediction machine learning model (e.g., when the prediction machine learning model correctly predicts 50% of a data set as to whether a subject has EM and/or a stage of EM, 70% of a data set as to whether a subject has EM and/or a stage of EM, a threshold quantity of a population of a data set as to whether a subject has EM and/or a stage of EM, and/or the like).
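  • The validation steps described above (partitioning the data into two portions, scoring predictions on the second portion, and comparing the result to a validation threshold) can be sketched as follows. The `predict` rule is a hypothetical stand-in for a trained model, and the records, the feature name `ehp_total`, and the split size are invented for this example.

```python
def partition(records, train_count):
    """Split records into a first (training) portion and a second (validation) portion."""
    return records[:train_count], records[train_count:]


def validation_accuracy(predict, validation):
    """Fraction of validation records whose label the model predicts correctly."""
    correct = sum(1 for features, label in validation if predict(features) == label)
    return correct / len(validation)


def is_validated(predict, validation, threshold):
    """The model is treated as validated when its accuracy satisfies the threshold."""
    return validation_accuracy(predict, validation) >= threshold


def predict(features):
    """Hypothetical stand-in for a trained model: flags EM when the EHP total is high."""
    return 1 if features["ehp_total"] >= 50 else 0


# Hypothetical labeled records: (features, label), where 1 = EM present.
records = [({"ehp_total": total}, label) for total, label in
           [(70, 1), (20, 0), (65, 1), (30, 0), (80, 1), (10, 0), (45, 0),
            (60, 1), (40, 0), (55, 0)]]
first_portion, validation = partition(records, train_count=7)
accuracy = validation_accuracy(predict, validation)
print(accuracy, is_validated(predict, validation, 0.50), is_validated(predict, validation, 0.70))
```

  • In this toy run the stand-in model gets two of three validation records right, so it satisfies a 50% threshold but not a 70% threshold.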
  • the system may generate one or more additional prediction machine learning models.
  • the system may further train the prediction machine learning model and/or generate new prediction machine learning models based on receiving new training data.
  • the new training data may include additional data associated with one or more subjects.
  • the new training data may include data associated with whether a subject has EM and/or a stage of EM.
  • the system may use the prediction machine learning model to predict whether a subject has EM and/or a stage of EM and compare an output of a prediction machine learning model to the new training data that includes data associated with whether a subject has EM and/or a stage of EM.
  • the system may update one or more prediction machine learning models based on the new training data.
  • the system may store the prediction machine learning model.
  • the system may store the prediction machine learning model in a data structure (e.g., a database, a linked list, a tree, and/or the like).
  • the data structure may be located within the system or external to (e.g., remote from) the system.
  • the sources of data useful in the systems and methods disclosed herein may include clinical data.
  • clinical data may include data that may be part of a subject’s electronic health record (EHR).
  • the clinical data may include one or more of the subject’s age, race, family history of endometriosis, medication history, for example history of contraceptive use, antidepressant use, aspirin use, metformin use, muscle relaxant use, non-steroidal anti-inflammatory drug (NSAID) use, opioid use, steroid use (e.g., prednisone), and/or statin use, age of first menstrual cycle, weight, height, body mass index, previously-diagnosed diseases or conditions and/or comorbidities, for example asthma, cancer (e.g., breast cancer, ovarian cancer, and/or uterine cancer), depression, bipolar disorder, anxiety, post-traumatic stress disorder (PTSD), thyroid disorders, gastroesophageal reflux disease (GERD), irritable bowel syndrome (IBS), Crohn’s Disease, ulcerative colitis, fibromyalgia, chronic fatigue, history of migraines, diabetes, cardiovascular disease (e.g., hypertension), hyperlipidemia, and/
  • the sources of data useful in the systems and methods disclosed herein may include survey data, including subject self-reported data and validated, subjective parameters.
  • validated, subjective parameters may be objectified through use of a visual analog scale.
  • one or more scores of the visual analog scale may include one or more numeric ratings of the one or more subjects’ self-perception of pain, self-perception of control and/or powerlessness, self-perception of social support, self-perception of emotional well-being, self-image, duration of pain, location of pain, menstrual cycle length, and/or menstrual cycle irregularities before and/or after hormone therapy.
  • the validated, subjective parameters are ascertained through use of the Endometriosis Health Profile-30 (EHP30) (see, e.g., Jones et al., Development of an endometriosis quality-of-life instrument: The Endometriosis Health Profile-30, Obstetrics & Gynecology 2001, 98(2): 258-264).
  • the core EHP30 (30 questions) is utilized.
  • the long-form EHP30 is utilized.
  • the short-form EHP-5 is utilized.
  • one or more EHP30 supplementary modules are utilized as an alternative and/or in addition to the core EHP30 and/or the EHP-5.
  • sources of data described herein may be used to build a database for training and/or validating a machine learning model as described herein.
  • additional data may be added to the database as the machine learning model is applied to patients. That is, the processes described herein may be iterative.
  • where the data used to build a database as described herein includes identifying information, the data may be encrypted or de-identified.
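  • One possible de-identification step before records enter the database is sketched below: direct identifiers are dropped and the subject identifier is replaced by a salted SHA-256 token. The field names, the set of identifiers to drop, the salt handling, and the 16-character token length are assumptions made for this example; the source does not describe a specific de-identification or encryption scheme.

```python
import hashlib

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address"}


def deidentify(record, salt):
    """Drop direct identifiers and replace subject_id with a salted hash token."""
    digest = hashlib.sha256((salt + str(record["subject_id"])).encode("utf-8")).hexdigest()
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "subject_id"
    }
    cleaned["subject_token"] = digest[:16]
    return cleaned


raw = {
    "subject_id": 1042,
    "name": "Jane Doe",
    "date_of_birth": "1995-03-14",
    "age": 29,
    "ehp_total": 61.5,
}
clean = deidentify(raw, salt="example-salt")
print(sorted(clean))
```

  • Because the token is deterministic for a given salt, later records from the same subject can still be linked, which matters when additional data is added to the database over time.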
  • FIG. 2 shows a schematic representation of a process of training a machine learning model with data from one or more subjects, the data including at least those described herein above.
  • the machine learning model is trained with at least biomarker data, survey data, and/or clinical data as described herein.
  • data from one or more subjects e.g., biomarker data, survey data, and/or clinical data associated with a subject, biomarker data, survey data, and/or clinical data associated with a plurality of subjects, biomarker data, survey data, and/or clinical data associated with a specified group of subjects, etc.
  • the data set is made up of only symptomatic patients (e.g., there are no patients that are healthy controls).
  • the machine learning model (e.g., the trained machine learning model) may be configured to provide, based on an input, an output that includes a prediction of whether a subject will have endometriosis (e.g., a prediction of whether the subject is to be diagnosed as having endometriosis).
  • the machine learning model may be configured to provide, based on an input, an output that includes a prediction of which stage of endometriosis a subject may have (e.g., a prediction of whether the subject is to be diagnosed as Stage 1, Stage 2, Stage 3, or Stage 4 endometriosis).
  • the training data set may be associated with a population of subjects that includes a plurality of data records associated with a plurality of features.
  • the training data set may include a large number of data records, such as 100 data records, 500 data records, 1,000 data records, 5,000 data records, 10,000 data records, 25,000 data records, 50,000 data records, 100,000 data records, 1,000,000 data records, and/or the like.
  • the plurality of features may represent variables associated with biomarker data, survey data, and/or clinical data associated with one or more subjects.
  • the plurality of features may include data associated with whether a subject took muscle relaxants, data associated with whether a subject took sleep medication, data associated with whether a subject took hormonal medication, data associated with whether a subject has (e.g., was diagnosed with) post-traumatic stress disorder (PTSD), data associated with whether a subject took contraceptive medication, data associated with whether a subject took attention deficit hyperactivity disorder (ADHD) medication, data associated with whether a subject took asthma medication, data associated with whether a subject has interstitial cystitis, data associated with whether a subject has asthma, data associated with responses to an Endometriosis Health Profile (EHP) survey (e.g., an EHP infertility survey, an EHP self-perception survey, an EHP social survey, an EHP pain survey, an EHP work survey, an EHP control survey, an EHP pain survey, an EHP sexual survey, an EHP family survey, an EHP treatment survey, an EHP medical treatment survey, an EHP emotion survey, an EHP survey total, etc.)
  • the EHP survey suite is a Health Related Quality of Life patient self-report survey that can be used to assess areas of concern within women with endometriosis (e.g., available at https://innovation.ox.ac.uk/outcome-measures/endometriosis-health-profile-ehp/).
  • the EHP surveys may allow for an understanding of the perception of a patient’s pain and experiences of the burden of endometriosis-like symptoms during their search for a diagnosis.
  • the EHP highlights areas of concern for patients, such as the impact of their symptoms on infertility, self-perception, social life, pain persistence, work life, sexual life, family life, control of oneself, medical treatment, and emotions.
  • the visual analog scale (VAS) survey suite provides information about the patient’s perception of their pain during menstruation, intercourse, and bowel movements and at rest, as well as an overall combination of each of these responses. These responses may be collected on a scale from 1 to 10 through an electronic survey.
  • a machine learning model for determining the presence of endometriosis in a patient may be trained with and/or may be applied to clinical data as described herein.
  • a machine learning model for determining a stage of endometriosis in a patient may be trained with and/or may be applied to clinical data as described herein.
  • a machine learning model for determining a stage of endometriosis may be trained with and/or may be applied to survey data (e.g., EHP data and/or VAS data as described herein).
  • medication data may be included in a training data set (with clinical and/or survey data) for determining the presence and/or stage of endometriosis.
  • Subjects from whom data may be collected for training may include those with or without symptoms of EM, those diagnosed with EM, and/or those who have not been diagnosed with EM.
  • data from patients diagnosed with EM is compared to patients diagnosed as not having EM.
  • the machine learning model may include a particular architecture
  • the machine learning model may include a linear support vector machine (SVM) (e.g., an efficient linear SVM, a Gaussian SVM, such as a medium Gaussian SVM, a Quadratic SVM, a Cubic SVM, etc.), a subspace discriminant (e.g., an ensemble subspace discriminant), a neural network (e.g., a deep learning neural network, a narrow neural network, a medium neural network, a bi-layered neural network, a tri-layered neural network, etc.), a decision tree (e.g., a gradient boosted tree, an ensemble boosted tree, etc.), and/or a K-Nearest Neighbors (KNN) (e.g., a fine KNN) model architecture.
  • the machine learning model is trained, with data described herein from one or more subjects, with a k-nearest neighbors algorithm, a naive Bayes algorithm, a random forest algorithm, a boosted gradient algorithm, and/or a neural network.
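  • Of the algorithms listed above, k-nearest neighbors is the simplest to show in full. The sketch below classifies a new subject by majority vote among the k closest training subjects under Euclidean distance. The two-dimensional feature vectors, the labels, and the choice of k = 3 are hypothetical; real inputs would be scaled feature vectors built from the biomarker, clinical, and survey data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (feature vector, label), where 1 = EM present.
train = [
    ([1.0, 1.0], 0), ([1.0, 2.0], 0), ([2.0, 1.0], 0),
    ([8.0, 8.0], 1), ([9.0, 8.0], 1), ([8.0, 9.0], 1),
]
low_risk = knn_predict(train, [2.0, 2.0], k=3)
high_risk = knn_predict(train, [8.5, 8.5], k=3)
print(low_risk, high_risk)
```

  • An odd k avoids tied votes in the two-class case; a staging model with more than two classes would need an explicit tie-breaking rule.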
  • the machine learning model may be trained with random forest trees, boosted gradient ensemble models, and neural networks using hold-out validation.
  • the machine learning model may be compared against multinomial logistic regression models with regards to ability to accurately identify patients with and/or without EM, and, in non-limiting embodiments, to predict EM stage.
  • EM may be staged in up to four stages, with Stage 1 being the most mild (few superficial implants), Stages 2 (more and deeper implants) and 3 (many deep implants, small cysts on one or both ovaries, and presence of filmy adhesions) being more severe, and Stage 4 (many deep implants, large cysts on one or both ovaries, many dense adhesions) being the most severe.
  • the machine learning model may classify and/or categorize the EM, from Category I (peritoneal EM), through Category II (ovarian EM), Category III (deep infiltrating EM I), to Category IV (deep infiltrating EM II).
  • the machine learning model is trained with one or more data points shown in Table 1, below.
  • the machine learning model that is trained is a classification model.
  • multiple machine learning models are trained, validated, and/or employed.
  • the multiple models may include a model for identifying the presence of EM, a model for staging EM in a patient, and/or a model allowing a patient to, with one or more of the data points described herein, enter information into software, for example an application executable on a computing device as described herein, to obtain at least a preliminary indication of presence and/or staging of EM.
  • the machine learning model may accommodate longitudinal data, where two or more timepoints collecting the biomarker, clinical, and survey data can be used to improve long-term predictions using recurrent neural networks or other forecasting architectures.
  • the training and testing protocol may require that a robust hold-out validation be performed (i.e., separating a testing and training dataset a priori).
  • a probability density function may be fit to any continuous variable (i.e., non-binary variables) that enables histogram matching of the respective datasets:
  • x is the continuous variable
  • e is the natural exponent.
  • a random number generator to extract a proportion of the dataset may then be used to split the testing and training dataset incrementally such that the training and testing data split does not exceed 70/30.
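  • A hold-out split driven by a random number generator, with the training share capped at 70%, might look like the following. The function name, the fixed seed, and the use of integer arithmetic to enforce the cap are choices made for this sketch; the histogram-matching step based on the fitted probability density function is not reproduced here.

```python
import random


def holdout_split(records, max_train_percent=70, seed=0):
    """Shuffle with a seeded RNG and split so training never exceeds the cap.

    Integer floor division guarantees len(train) / len(records) <= max_train_percent / 100.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * max_train_percent // 100
    return shuffled[:n_train], shuffled[n_train:]


records = list(range(10))  # stand-ins for ten subject records
train, test = holdout_split(records)
print(len(train), len(test))
```

  • Performing the split once, before any training, keeps the testing records separated a priori as the protocol requires.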
  • a minimum AUC of 0.70 may be required to exhibit reasonable discriminability of patient outcomes for the presence of EM and the stage of EM (stages 1-4 or pooling stages 1 and 2, and 3 and 4) for a classification-based ML model.
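  • The 0.70 AUC requirement can be checked with a short rank-based calculation: the AUC equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative subject, with ties counted as one half. The labels and scores below are invented for this example, and `MIN_AUC` is simply a name for the 0.70 figure stated above.

```python
MIN_AUC = 0.70  # minimum AUC stated in the text


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical hold-out results: 1 = EM surgically confirmed, scores = model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
meets_minimum = auc >= MIN_AUC
print(round(auc, 3), meets_minimum)
```

  • Here eight of the nine positive-negative pairs are ordered correctly, giving an AUC of 8/9, which satisfies the minimum. For staging, the same check could be applied per stage or to the pooled stage groups.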
  • intermediate AUCs of 0.70 and changes in status may be important in matching the temporal changes to a patient’s overall health to reflect the precise time when EM is non-invasively diagnosed (as matched to the available data on surgical confirmation of the presence of EM and respective stage).
  • the data described herein is used to train and/or validate a machine learning model for identifying the presence of EM in a patient.
  • the method may include receiving biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, receiving clinical data relating to one or more clinical features of one or more subjects, and receiving survey data relating to one or more validated, subjective parameters of one or more subjects.
  • the method may further include training, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model, for example a classification model. The training may be completed with any useful algorithm, for example those described herein.
  • the machine learning model is applied to one or more of biomarker data, clinical data, and/or survey data obtained from the patient and, based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, the method may include determining that the patient has EM or that the patient does not have EM. In non-limiting embodiments, after determining that the patient has EM, the method may include determining, with the machine learning model or another machine learning model, a stage of the patient’s EM.
  • a feature importance and/or dimensionality reduction process may be conducted in conjunction with training of the machine learning model. Feature importance of data inputs will be determined, and through dimensionality reduction, a number of features may be determined to be unnecessary. The resulting subset of input data will be used to train a machine learning model. In parallel with the machine learning training and testing, multinomial logistic regression may be performed, using the database described above as inputs, to assess its efficacy in diagnosing disease presence and stage in comparison to the machine learning models. Lastly, the best performing machine learning model will be validated using hold-out validation, a preferred method of machine learning validation where the dataset being used was not involved in the training of the models.
  • FIG. 3 shows a schematic representation of a process of performing an inference task that results in a prediction of a diagnosis of endometriosis for a subject.
  • new data e.g., input data that was not used in training the machine learning model, which may include biomarker data, clinical data, and survey data
  • the trained machine learning classification model may be configured to provide a prediction of a diagnosis of endometriosis for the patient.
  • the trained machine learning classification model may be configured to receive an input that is based on a feature importance and/or dimensionality reduction procedure.
  • the trained machine learning classification model may be configured to receive an input that includes a plurality of features that are a subset of all features in the training data set based on the feature importance and/or dimensionality reduction procedure.
  • the trained machine learning classification model may be configured to receive an input that includes a feature associated with whether a subject took muscle relaxants, a feature associated with whether a subject took asthma medication, a feature associated with whether a subject has (e.g., was diagnosed with) post-traumatic stress disorder (PTSD), a feature associated with whether a subject took hormonal medication, a feature associated with whether a subject took attention deficit hyperactivity disorder (ADHD) medication, and/or a feature associated with whether a subject was diagnosed with ADHD.
  • the trained machine learning classification model may be configured to receive an input that includes a feature associated with whether a subject took muscle relaxants, a feature associated with whether a subject took asthma medication, a feature associated with whether a subject has (e.g., was diagnosed with) post-traumatic stress disorder (PTSD), a feature associated with whether a subject took hormonal medication, and/or a feature associated with a cycle frequency of a menstrual cycle of a subject.
  • FIG. 4 shows a schematic representation of a process of performing an inference task that results in a prediction of a diagnosis of a stage (e.g., stage 1, stage 2, stage 3, stage 4, stages 1 or 2, stages 3 or 4, etc.) of endometriosis for a subject.
  • new data e.g., input data that was not used in training the machine learning model, which may include biomarker data, clinical data, and survey data
  • the trained machine learning classification model may be configured to provide a prediction of a diagnosis of a stage of endometriosis for the patient.
  • the trained machine learning classification model may be configured to receive an input that is based on a feature importance and/or dimensionality reduction procedure.
  • the trained machine learning classification model may be configured to receive an input that includes a plurality of features that are a subset of all features in the training data set based on the feature importance and/or dimensionality reduction procedure.
  • the trained machine learning classification model may be configured to receive an input that includes a feature associated with the age of a subject, a feature associated with responses to an EHP pre-sexual survey by a subject, a feature associated with responses to an EHP survey total by a subject, a feature associated with responses to an EHP pre-treatment survey by a subject, and/or a feature associated with hormone information of a subject.
  • the trained machine learning classification model may be configured to receive an input that includes a feature associated with the age of a subject, a feature associated with hormones self-reported by a subject, a feature associated with responses to an EHP pre-sexual survey by a subject, a feature associated with whether a subject took contraceptive medication, and/or a feature associated with whether a subject has irritable bowel syndrome (IBS).
  • FIG. 5 shows a schematic representation of a process of performing an inference task that results in a prediction of a diagnosis of endometriosis and then a diagnosis of a stage (e.g., no endometriosis, stage 1, stage 2, stage 3, stage 4, stages 1 or 2, stages 3 or 4, etc.) of endometriosis for a subject.
  • new data e.g., input data that was not used in training the machine learning model, which may include biomarker data, clinical data, and survey data
  • the first trained machine learning classification model may be configured to provide a prediction of a diagnosis of endometriosis for the patient.
  • the second trained machine learning classification model may be configured to provide a prediction of a diagnosis of a stage of endometriosis for the patient based on a prediction of a diagnosis of endometriosis for the patient from the first trained machine learning classification model.
  • the first trained machine learning classification model and/or the second trained machine learning classification model may be configured to receive an input that is based on a feature importance and/or dimensionality reduction procedure.
  • the first and/or second trained machine learning classification models may be configured to receive an input that includes a plurality of features that are a subset of all features in the training data set based on the feature importance and/or dimensionality reduction procedure.
  • the first and/or second trained machine learning classification models may be configured to receive an input that includes a feature associated with the age of a subject, a feature associated with responses to an EHP pre-sexual survey by a subject, a feature associated with the use of hormonal medication self-reported by a subject, a feature associated with responses to a VAS pre-dyspareunia survey by a subject, and/or a feature associated with responses to an EHP survey total by a subject.
  • the trained machine learning classification model may be configured to receive an input that includes a feature associated with whether a subject took contraceptive medication, a feature associated with whether a subject took muscle relaxants, a feature associated with hormones self-reported by a subject, a feature associated with the age of a subject, and/or a feature associated with whether a subject has irritable bowel syndrome (IBS).
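The two-stage inference flow of FIG. 5 can be sketched as a simple cascade: the stage classifier is only consulted when the presence classifier predicts endometriosis. This is a minimal illustration, not the disclosed implementation; the function names, the feature keys (`age`, `ehp_total`), and the toy decision rules are all hypothetical stand-ins for trained models.

```python
from typing import Callable, Mapping, Optional

Features = Mapping[str, float]


def cascade_predict(
    features: Features,
    presence_model: Callable[[Features], bool],
    stage_model: Callable[[Features], str],
) -> Optional[str]:
    """Return None when no endometriosis is predicted, else a stage label."""
    if not presence_model(features):
        return None
    # The second model runs only for patients flagged by the first model.
    return stage_model(features)


# Hypothetical stand-in models (NOT the trained classifiers of the disclosure).
def toy_presence_model(features: Features) -> bool:
    return features["ehp_total"] >= 50.0


def toy_stage_model(features: Features) -> str:
    return "stage 3 or 4" if features["age"] >= 35 else "stage 1 or 2"


patient = {"age": 29.0, "ehp_total": 62.5}
result = cascade_predict(patient, toy_presence_model, toy_stage_model)
negative = cascade_predict(
    {"age": 40.0, "ehp_total": 10.0}, toy_presence_model, toy_stage_model
)
```

Passing the models in as callables keeps the cascade independent of how each classifier was trained.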
  • FIG. 6 shows a diagram of example components of a device 200 for training, validating, and/or applying a machine learning model according to non-limiting embodiments described herein.
  • a system useful for training, validating, and/or applying a machine learning model as described herein may include any number of devices, with one or more components as shown in FIG. 6, and that the device(s) may communicate with one or more other device(s) through wired or wireless connections as are known in the art.
  • one or more client devices may be used to input data as described herein, and one or more aspects of the machine learning model may be stored and/or executed by the client device, by another device (e.g., a server) in communication with the client device, and/or both.
  • a device may be a mobile device, such as those described herein, and may receive data input by a user.
  • one or more aspects of the machine learning model may be stored and/or executed by the mobile device, by another device (e.g., a server) in communication with the mobile device, and/or both.
  • Device 200 may correspond to any element of any system or device described herein, including any computing device and/or server, for example those configured to collect data for training, validating, and/or applying a machine learning model as described herein.
  • such systems or devices may include at least one device 200 and/or at least one component of device 200.
  • the number and arrangement of components shown are provided as an example.
  • device 200 may include additional components, fewer components, different components, or differently arranged components than those shown.
  • a set of components (e.g., one or more components) of device 200 may perform one or more functions described as being performed by another set of components of device 200.
  • device 200 may include a bus 202, a processor 204, memory 206, a storage component 208, an input component 210, an output component 212, and a communication interface 214.
  • Bus 202 may include a component that permits communication among the components of device 200.
  • processor 204 may be implemented in hardware, firmware, or a combination of hardware and software.
  • processor 204 may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), etc.), a microprocessor, a digital signal processor (DSP), and/or any processing component (e.g., a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.) that can be programmed to perform a function.
  • Memory 206 may include random access memory (RAM), read only memory (ROM), and/or another type of dynamic or static storage device (e.g., flash memory, magnetic memory, optical memory, etc.) that stores information and/or instructions for use by processor 204.
  • storage component 208 may store information and/or software related to the operation and use of device 200.
  • storage component 208 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, a solid-state disk, etc.) and/or another type of computer-readable medium.
  • Input component 210 may include a component that permits device 200 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, a microphone, etc.). Additionally, or alternatively, input component 210 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, an actuator, etc.). Sensors useful here may include biochemical sensors, electrochemical sensors, sensors for detecting autonomic tone, sensors for detecting sympathetic tone, and/or the like.
  • Output component 212 may include a component that provides output information from device 200 (e.g., a display, a speaker, one or more light-emitting diodes (LEDs), etc.).
  • Communication interface 214 may include a transceiver-like component (e.g., a transceiver, a separate receiver and transmitter, etc.) that enables device 200 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections.
  • Communication interface 214 may permit device 200 to receive information from another device and/or provide information to another device.
  • communication interface 214 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi® interface, a cellular network interface, and/or the like.
  • Device 200 may perform one or more processes described herein. Device 200 may perform these processes based on processor 204 executing software instructions stored by a computer-readable medium, such as memory 206 and/or storage component 208.
  • a computer-readable medium may include any non-transitory memory device.
  • a memory device includes memory space located inside of a single physical storage device or memory space spread across multiple physical storage devices.
  • Software instructions may be read into memory 206 and/or storage component 208 from another computer-readable medium or from another device via communication interface 214. When executed, software instructions stored in memory 206 and/or storage component 208 may cause processor 204 to perform one or more processes described herein. Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein.
  • embodiments described herein are not limited to any specific combination of hardware circuitry and software.
  • the term “configured to,” as used herein, may refer to an arrangement of software, device(s), and/or hardware for performing and/or enabling one or more functions (e.g., actions, processes, steps of a process, and/or the like).
  • a processor configured to may refer to a processor that executes software instructions (e.g., program code) that cause the processor to perform one or more functions.
  • Clinical Data: Patient clinical data will be collected from University of Pittsburgh Medical Center (UPMC) Clinical Analytics, which structures various streams of UPMC data into easily visualizable data for the purposes of clinical, quality, and operational improvement. Clinical Analytics will provide for each enrolled patient the indices shown in Table 1 above.
  • the pre-surgical survey includes the EHP-30 and the Visual Analog Scales for pain.
  • the survey includes questions detailing the patient’s experience of pain (see Table 1 above).
  • the patient-generated data from the pre-surgical survey customizes the data based on personal experiences of the biological and physical presentation of the disease. Studies have shown that 86.2% of patients presented with multiple symptoms before their diagnosis.
  • the scores will be calculated by summing and averaging patient responses. De-identified responses will be input into a secured and encrypted database.
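As a small illustration of the sum-and-average scoring step, the sketch below computes both quantities for one patient's item responses. The function name and the example responses are hypothetical, and this does not reproduce the official scoring rules of any specific instrument.

```python
from statistics import mean


def survey_scores(responses):
    """Return (sum, average) of a patient's numeric item responses."""
    if not responses:
        raise ValueError("at least one response is required")
    return sum(responses), mean(responses)


# Hypothetical responses to five survey items.
total, average = survey_scores([3, 4, 2, 0, 1])
```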
  • Biological samples will be collected under sterile conditions during the surgical procedure to minimize disruption to the standard of care of the patient. Peripheral blood will be collected via routine venous peripheral blood draws. Endometrial biopsy is a non-surgical technique of obtaining endometrial tissue using a small suctioning swab inserted through the cervix into the uterus. All samples will be stored in de-identified containers, either blood vials or sterile conical tubes with tissue biopsy medium. Blood will be kept at room temperature for 30 minutes prior to further processing and tissue will be stored on ice.
  • EM is an estrogen-dependent and progesterone-resistant disease that drives inflammation and pelvic pain and impacts the fertility of EM patients. It is thought that EM has a 50% heritability rate and 19 SNPs have been identified as explaining half of EM disease variants.
  • EnSCs shed into the peritoneal cavity during retrograde menstruation and activate macrophages, driving inflammatory release of IL-6, TNF-α, and VEGF which leads to disease invasion and growth.
  • EM risk has been related to increased PGP-9.5+ nerve cell size and density in endometrial biopsies, demonstrating a potential relationship between patient symptom survey data and biomarker data that ML could identify.
  • Flow cytometry will be used to measure EnSCs and PGP-9.5 receptor cells and Luminex immunoassays will be used to measure pro-inflammatory cytokines.
  • Genetic testing of blood and endometrial biopsy samples will be performed using TaqMan SNP genotyping and assaying. Patient biomarker results will be added to the database after all biomarker analysis for the patient has been completed.
  • patient-specific data inputs will be determined, and it may be discovered that many of the variables are extraneous through dimensionality reduction, a filtering process which will include principal component analysis.
  • the resulting subset of patient-specific input data will be used to train the final predictive ML models.
  • multinomial logistic regression will be performed, using the database described above as inputs, to assess its efficacy in diagnosing disease presence and stage in comparison to the ML models.
  • the best performing ML model will be validated using hold-out validation, a preferred method of ML validation where the dataset being used was not involved in the training of the models.
  • two ML models may be produced, one each capable of diagnosing EM presence and stage, as well as two multinomial logistic regression models, one each for disease presence and stage.
  • This will allow for direct comparison of the ML models to the regression models for each prediction using receiver operating characteristic (ROC) curves and their areas under the curve (AUC), accuracy, specificity, and sensitivity. It is hypothesized that ML will improve upon multinomial logistic regression prediction of the presence and stage of EM based on these assessments.
  • the training data, which will be a random selection of 80% (160 patients) of the patient-specific data with and without EM as described herein, will be input to the ML pipeline (FIG. 2).
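The random 80/20 partition described here (160 training patients, leaving 40 for hold-out validation) can be sketched as follows. The function name, the integer patient identifiers, and the fixed seed are assumptions for illustration; a real pipeline might also stratify by diagnosis.

```python
import random


def holdout_split(patient_ids, train_fraction=0.8, seed=0):
    """Shuffle patient IDs reproducibly and split into (train, hold-out)."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split can be repeated
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 200 hypothetical patient identifiers.
train_ids, holdout_ids = holdout_split(range(200))
```

Splitting on patient identifiers rather than rows helps ensure that no patient contributes data to both partitions.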
  • the Python library, Gini Importance will be used to determine the most important features, or input variables, in the ML models iteratively using feature importance and dimensionality reduction. Input variables with correlation values greater than 0.6 will be considered important to the model and will be retained for training of the models. It is expected that the results of this step will mimic the results seen in the multinomial logistic regression model.
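Gini importance is commonly defined as the decrease in Gini impurity that a feature's splits produce in a tree-based model. The sketch below computes that decrease for a single threshold split in plain Python, as an illustration of the quantity rather than of the pipeline actually used; the function names, toy feature values, and threshold are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(feature_values, labels, threshold):
    """Impurity decrease from splitting on feature <= threshold."""
    left = [y for x, y in zip(feature_values, labels) if x <= threshold]
    right = [y for x, y in zip(feature_values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) + (
        len(right) / n
    ) * gini_impurity(right)
    return gini_impurity(labels) - weighted


labels = [0, 0, 1, 1]  # toy labels: 0 = no EM, 1 = EM
informative = gini_decrease([1.0, 2.0, 8.0, 9.0], labels, threshold=5.0)
uninformative = gini_decrease([1.0, 8.0, 2.0, 9.0], labels, threshold=5.0)
```

A feature whose split separates the classes perfectly yields the largest decrease, while a feature unrelated to the label yields none; tree ensembles aggregate such decreases over all splits to rank features.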
  • ML Model Training: The ML models will be trained and tested using 80% of the dataset described herein, a total of 160 patients. One ML model classifies whether a patient does or does not have EM (FIGS. 3 and 5). For a patient identified as having EM, a second ML model predicts the stage of EM in the patient (FIGS. 4 and 5). A final subset of indices will be used to train ML classification models for determining presence and/or staging of disease. Python libraries along with in-house Python scripts will leverage scikit-learn (sklearn), XGBoost, and the Tree-based Pipeline Optimization Tool (TPOT). TPOT may not be implemented given the sample size, though AutoML performed as described herein will determine this.
  • the desired model outcome would include a receiver operating characteristic (ROC) curve with an area under the curve (AUC) greater than 0.85 (typically, an AUC of greater than 0.70 would reveal a positive direction in clinical utility), which would reveal a significant improvement in accurate diagnosis since the current rate of diagnosis is 40%.
  • the final outputs for the training of the ML models will be the AUC of the ROCs, accuracy, sensitivity, and specificity.
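The four reported outputs can be computed from true labels, predicted labels, and predicted scores as sketched below. This is a minimal pure-Python illustration that assumes binary labels (1 = EM, 0 = no EM) with both classes present; the function names, toy data, and the 0.5 decision threshold are hypothetical. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative case.

```python
def accuracy_sensitivity_specificity(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return accuracy, sensitivity, specificity


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (not study data).
y_true = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
acc, sens, spec = accuracy_sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike accuracy, sensitivity, and specificity, the AUC does not depend on the chosen decision threshold, which is why it is reported alongside them.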
  • Referring to FIG. 7, a proof-of-concept prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the proposed method.
  • This initial ML model relied on 72 patients with 7-fold cross-validation and resulted in a 0.752 AUC for diagnosing EM accurately (FIG. 7, panel A).
  • the AUC of 0.752 indicates that the model has a degree of discriminability of outcomes, but the initial results are based on internal cross-validation.
  • the overall model had an accuracy of 76.4% (FIG. 7, panel B) when predicting the presence of EM (accurately predicting 9.7% with no EM and 66.7% with EM).
  • the proof-of-concept prediction model pipeline serves as the basis for training the larger models and inputting unseen data to perform a more rigorous hold-out-validation.
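The internal k-fold cross-validation used for the proof-of-concept model (7 folds over 72 patients) partitions the data so that every patient is used for testing exactly once. The sketch below generates such folds in plain Python; the function name is hypothetical, and in practice the indices would typically be shuffled (and often stratified by diagnosis) before folding.

```python
def k_fold_indices(n_samples, n_folds):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for i in range(n_folds):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = k_fold_indices(72, 7)
fold_sizes = [len(test_idx) for _, test_idx in folds]
```

Because each fold's model is still tuned on the same overall dataset, cross-validated results are generally treated as an internal estimate, which is why a separate hold-out set is planned.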
  • Referring to FIGS. 8A-8B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, an efficient linear support vector machine (SVM), relied on 241 patients with 5-fold cross-validation and resulted in a 0.6453 AUC for diagnosing the presence of EM accurately (FIG. 8A).
  • the overall model had an accuracy of 72.6% (FIG.
  • Referring to FIGS. 9A-9B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, a subspace discriminant, relied on 241 patients with 5-fold cross-validation and resulted in a 0.5812 AUC for diagnosing the presence of EM accurately, a 0.7162 AUC for diagnosing whether the subject has stages 1 or 2 of EM accurately, and a 0.7782 AUC for diagnosing whether the subject has stages 3 or 4 of EM accurately (FIG. 9A).
  • the overall model had an accuracy of 52.4% (FIG. 9B) when predicting the presence and stage of EM (accurately predicting 23.8% with no EM, 72.7% with stage 1 or 2 of EM, and 50% with stage 3 or 4 of EM).
  • Referring to FIGS. 10A-10B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, a subspace discriminant, relied on 177 patients with 5-fold cross-validation and resulted in a 0.7645 AUC for diagnosing the stage (e.g., stages 1 or 2 or stages 3 or 4) of EM accurately among known endometriotic patients (FIG. 10A).
  • the overall model had an accuracy of 72.9% (FIG. 10B) when predicting the stage of EM (accurately predicting 83.6% with stage 1 or 2 of EM and 55.2% with stage 3 or 4 of EM).
  • Referring to FIGS. 11A-11B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, a subspace discriminant, relied on 238 patients with 5-fold cross-validation and resulted in a 0.6429 AUC for diagnosing the presence of EM accurately (FIG. 11A).
  • the overall model had an accuracy of 74.4% (FIG. 11B) when predicting the presence of EM (accurately predicting 21.7% with no EM and 92.1% with EM).
  • Referring to FIGS. 12A-12B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, a linear SVM, relied on 239 patients with 5-fold cross-validation and resulted in a 0.5981 AUC for diagnosing the presence of EM accurately, a 0.6544 AUC for diagnosing whether the subject has stages 1 or 2 of EM accurately, and a 0.7352 AUC for diagnosing whether the subject has stages 3 or 4 of EM accurately (FIG. 12A).
  • the overall model had an accuracy of 53.6% (FIG. 12B) when predicting the presence and stage of EM (accurately predicting 19.7% with no EM, 76.6% with stage 1 or 2 of EM, and 46.3% with stage 3 or 4 of EM).
  • Referring to FIGS. 13A-13B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, a subspace discriminant, relied on 177 patients with 5-fold cross-validation and resulted in a 0.7462 AUC for diagnosing the stage (e.g., stages 1 or 2 or stages 3 or 4) of EM accurately among known endometriotic patients (FIG. 13A).
  • the overall model had an accuracy of 74% (FIG. 13B) when predicting the stage of EM (accurately predicting 85.3% with stage 1 or 2 of EM and 55.9% with stage 3 or 4 of EM).
  • Referring to FIGS. 14A-14B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, an efficient linear SVM, relied on 236 patients with 5-fold cross-validation and resulted in a 0.6165 AUC for diagnosing the presence of EM accurately (FIG. 14A).
  • the overall model had an accuracy of 76.7% (FIG. 14B) when predicting the presence of EM (accurately predicting 34.4% with no EM and 89.9% with EM).
  • Referring to FIGS. 15A-15B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, an efficient linear SVM, relied on 239 patients with 5-fold cross-validation and resulted in a 0.5825 AUC for diagnosing the presence of EM accurately, a 0.6128 AUC for diagnosing whether the subject has stages 1 or 2 of EM accurately, and a 0.7839 AUC for diagnosing whether the subject has stages 3 or 4 of EM accurately (FIG. 15A).
  • the overall model had an accuracy of 52.3% (FIG. 15B) when predicting the presence and stage of EM (accurately predicting 32.8% with no EM, 61.8% with stage 1 or 2 of EM, and 54.4% with stage 3 or 4 of EM).
  • Referring to FIGS. 16A-16B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure.
  • the ML model, a subspace discriminant, relied on 176 patients with 5-fold cross-validation and resulted in a 0.7236 AUC for diagnosing the stage (e.g., stages 1 or 2 or stages 3 or 4) of EM accurately among known endometriotic patients (FIG. 16A).
  • the overall model had an accuracy of 69.9% (FIG. 16B) when predicting the stage of EM (accurately predicting 76.6% with stage 1 or 2 of EM and 58.8% with stage 3 or 4 of EM).
  • Multinomial Logistic Regression Analysis of Patient-Specific Inputs to Identify Disease Presence: Statistical analyses, namely statistical classification models, have previously been used as EM clinical screening tools using symptom-based inputs. However, combining non-invasive biomarkers, clinical data, and patient-generated survey responses has yet to be included in a stochastic model for EM presence or stage prediction. Multinomial logistic regression analysis will be used to compare against the ML models being developed. This multinomial logistic regression model will rely on the same inputs as the ML models described herein, will also undergo feature importance and dimensionality reduction, and will output an AUC of the ROC, accuracy, specificity, and sensitivity to be compared against the ML models.
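For reference, a multinomial logistic regression model turns one linear score per class into class probabilities with the softmax function and predicts the most probable class. The sketch below shows only that prediction step; the class labels follow the groupings used above, while the two-feature input, weights, and biases are hypothetical values chosen for illustration, not fitted coefficients.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, weights, biases):
    """One linear score per class, followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


classes = ["no EM", "stage 1 or 2", "stage 3 or 4"]
weights = [[0.0, 0.0], [0.5, 0.1], [1.0, 0.3]]  # hypothetical, one row per class
biases = [0.0, -0.5, -2.0]  # hypothetical
probs = predict_proba([1.0, 2.0], weights, biases)
predicted = classes[probs.index(max(probs))]
```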
  • Model for Prediction Accuracy: The remaining 20% of the dataset, totaling 40 patients, that was not randomly selected for use to train and test the model will be utilized. This technique is known as hold-out validation and is considered a common and preferred method of validating ML models for clinical approaches. Hold-out validation will be the primary method of model validation for the entirety of development. A model will be considered validated when the prediction from the model aligns with the outcome the patient experienced (+/- EM presence and Stage 1, 2, 3, or 4).
  • a framework for a non-invasive diagnostic tool for EM using ML and multinomial logistic regression may be created.
  • This framework utilizes clinical data, patient-generated survey data, and/or biomarker data to determine both the presence and stage of EM.
  • the desired marker for a successfully trained ML algorithm would be a model that has an ROC with an AUC greater than 0.85.
  • Two trained and validated ML models may be produced, which have been compared to standard statistical techniques and will identify the presence and severity of EM.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Chemical & Material Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Analytical Chemistry (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Genetics & Genomics (AREA)
  • Biotechnology (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Microbiology (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Immunology (AREA)

Abstract

Provided herein is a computer-implemented method of training a machine learning model for identifying the presence of endometriosis in a symptomatic patient, including receiving, with at least one processor, biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, receiving, with at least one processor, clinical data relating to one or more clinical features of one or more subjects, receiving, with at least one processor, survey data relating to one or more validated, subjective parameters of one or more subjects, training, with at least one processor and based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.

Description

SYSTEM AND METHOD FOR EARLY DIAGNOSIS OF ENDOMETRIOSIS
CROSS REFERENCE TO RELATED APPLICATION
[0001] The present application claims the benefit of United States Provisional Patent Application No. 63/642,330, filed on May 3, 2024, the disclosure of which is hereby incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
Field of the Invention
[0002] Provided herein are systems and methods for early and accurate diagnosis of endometriosis.
Description of Related Art
[0003] Endometriosis (EM) is a painful gynecological disease affecting 10% of people with uteruses (~200 million) worldwide. Characterized by the presence of endometrial-like tissue outside the uterus in the peritoneal cavity, patients with EM often suffer from infertility and chronic pelvic pain while awaiting their diagnosis and treatment. The average time to EM diagnosis is 6.7 years from the onset of symptoms due to the general lack of knowledge of the disease, excessive cost of diagnosis, and confounding symptoms leading to misdiagnoses. The only currently accepted method of EM diagnosis is through surgery and histological confirmation of disease. Ultrasound and magnetic resonance imaging can be used to inform surgeons, but provide unreliable visualization of EM lesions, oftentimes only identifying high-stage, non-specific lesions. Approximately 20-40 percent of patients who undergo the diagnostic surgery are found not to have EM, resulting in an unnecessary, invasive exploratory procedure. See Albee et al., Laparoscopic Excision of Lesions Suggestive of Endometriosis or Otherwise Atypical in Appearance: Relationship Between Visual Findings and Final Histologic Diagnosis, Minimally Invasive Gynecology, 2008, 15(1): 32-37. An improved method for early, cost-effective, non-invasive, and accurate EM diagnosis would be groundbreaking for patients and clinicians.
[0004] Attempts at non-invasive diagnosis and staging of EM by investigating one of three critical classes of patient information - clinical data, symptoms, and biomarkers - have been made, though none have been successful in meeting clinical standards. Progress in EM diagnostics has been limited in that: i) there are no known databases containing all three classes of patient information in one place that have confirmed the presence or absence of endometriosis surgically; ii) there is no research indicating which of the classes provide the clearest prediction of disease state; and iii) there is no documented relationship between these classes with the diagnosis or presence and stage of disease that patients receive. Accordingly, there is a need in the art for solutions that allow for early and accurate diagnosis of EM.
SUMMARY OF THE INVENTION
[0005] Provided herein is a computer-implemented method of training a machine learning model for identifying the presence of endometriosis in a symptomatic patient, including receiving, with at least one processor, biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, receiving, with at least one processor, clinical data relating to one or more clinical features of one or more subjects, receiving, with at least one processor, survey data relating to one or more validated, subjective parameters of one or more subjects, training, with at least one processor and based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
[0006] Also provided herein is a computer-implemented method of identifying the presence of endometriosis in a symptomatic patient, including training, with at least one processor, a machine learning model based at least on biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects. The method further includes applying, with at least one processor, the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient, and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis.
[0007] Also provided herein is a system including at least one processor programmed or configured to receive biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, receive clinical data relating to one or more clinical features of one or more subjects, receive survey data relating to one or more validated, subjective parameters of one or more subjects, train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
[0008] Also provided herein is a system including at least one processor programmed or configured to train a machine learning model based at least on biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects. The at least one processor is further programmed or configured to apply the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient, and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis.
[0009] Also provided herein is a computer-implemented method of identifying the presence of endometriosis in a patient, including applying, with at least one processor, a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects.
[0010] Also provided herein is a system comprising at least one processor programmed or configured to apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects.
[0011] Also provided herein is a non-transitory, computer-readable medium including programming instructions that, when executed by at least one processor, cause the at least one processor to receive clinical data relating to one or more clinical features of one or more subjects, receive survey data relating to one or more validated, subjective parameters of one or more subjects, train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
[0012] Also provided herein is a non-transitory, computer-readable medium including programming instructions that, when executed by at least one processor, cause the at least one processor to apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient, and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, clinical data relating to one or more clinical features of one or more subjects, and survey data relating to one or more validated, subjective parameters of one or more subjects.
[0013] Further non-limiting embodiments are set forth in the following numbered clauses:
[0014] 1. A computer-implemented method of training a machine learning model for identifying the presence of endometriosis in a patient, comprising: receiving, with at least one processor, biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; receiving, with at least one processor, clinical data relating to one or more clinical features of one or more subjects; receiving, with at least one processor, survey data relating to one or more validated, subjective parameters of one or more subjects; and training, with at least one processor and based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
[0015] 2. The computer-implemented method of clause 1, wherein the one or more biomarkers are obtained from the one or more subjects’ saliva, urine, and/or blood.
[0016] 3. The computer-implemented method of clause 1 or clause 2, wherein the one or more biomarkers comprise one or more genetic markers.
[0017] 4. The computer-implemented method of any of clauses 1-3, wherein the one or more biomarkers comprise one or more single nucleotide polymorphisms (SNPs).
[0018] 5. The computer-implemented method of any of clauses 1-4, wherein the one or more SNPs comprise one or more SNPs in WNT4, GREB1, and/or KDR.
[0019] 6. The computer-implemented method of any of clauses 1-5, wherein the one or more biomarkers comprise one or more extracellular vesicles, hormones, neurotransmitters, growth factors, peptides, cytokines, glycoproteins, and/or enzymes.
[0020] 7. The computer-implemented method of any of clauses 1-6, wherein the one or more biomarkers comprise presence and/or levels of one or more cells.
[0021] 8. The computer-implemented method of any of clauses 1-7, wherein the one or more cells comprise one or more immune cells and/or stromal cells.
[0022] 9. The computer-implemented method of any of clauses 1-8, wherein the one or more clinical features comprises one or more of the one or more subjects’ age, race, family history of endometriosis, medication history, age of first menstrual cycle, weight, height, body mass index, previously-diagnosed diseases or conditions, comorbidities, obstetric history, and/or substance use history.
[0023] 10. The computer-implemented method of any of clauses 1-9, wherein the one or more validated, subjective parameters comprise one or more scores of a visual analog scale.
[0024] 11. The computer-implemented method of any of clauses 1-10, wherein the one or more scores of the visual analog scale comprise one or more numeric ratings of the one or more subjects’ self-perception of pain, self-perception of control and/or powerlessness, self-perception of social support, self-perception of emotional wellbeing, self-image, duration of pain, location of pain, menstrual cycle length, and/or menstrual cycle irregularities before and/or after hormone therapy.
[0025] 12. The computer-implemented method of any of clauses 1-11, wherein the machine learning model comprises a classification model.
[0026] 13. The computer-implemented method of any of clauses 1-12, wherein the machine learning model is trained with one or more of a k-nearest algorithm, a naive bayes algorithm, and/or a neural network.
[0027] 14. A computer-implemented method of identifying the presence of endometriosis in a patient, comprising: training, with at least one processor, a machine learning model based at least on: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects; applying, with at least one processor, the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient; and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis.
[0028] 15. The computer-implemented method of clause 14, wherein the one or more biomarkers are obtained from the one or more subjects’ saliva, urine, and/or blood.
[0029] 16. The computer-implemented method of clause 14 or clause 15, wherein the one or more biomarkers comprise one or more genetic markers.
[0030] 17. The computer-implemented method of any of clauses 14-16, wherein the one or more biomarkers comprise one or more single nucleotide polymorphisms (SNPs).
[0031] 18. The computer-implemented method of any of clauses 14-17, wherein the one or more SNPs comprise one or more SNPs in WNT4, GREB1, and/or KDR.
[0032] 19. The computer-implemented method of any of clauses 14-18, wherein the one or more biomarkers comprise one or more extracellular vesicles, hormones, neurotransmitters, growth factors, peptides, cytokines, glycoproteins, and/or enzymes.
[0033] 20. The computer-implemented method of any of clauses 14-19, wherein the one or more biomarkers comprise presence and/or levels of one or more cells.
[0034] 21. The computer-implemented method of any of clauses 14-20, wherein the one or more cells comprise one or more immune cells and/or stromal cells.
[0035] 22. The computer-implemented method of any of clauses 14-21, wherein the one or more clinical features comprises one or more of the one or more subjects’ age, race, family history of endometriosis, medication history, age of first menstrual cycle, weight, height, body mass index, previously-diagnosed diseases or conditions, comorbidities, obstetric history, and/or substance use history.
[0036] 23. The computer-implemented method of any of clauses 14-22, wherein the one or more validated, subjective parameters comprise one or more scores of a visual analog scale.
[0037] 24. The computer-implemented method of any of clauses 14-23, wherein the one or more scores of the visual analog scale comprise one or more numeric ratings of the one or more subjects’ self-perception of pain, self-perception of control and/or powerlessness, self-perception of social support, self-perception of emotional wellbeing, self-image, duration of pain, location of pain, menstrual cycle length, and/or menstrual cycle irregularities before and/or after hormone therapy.
[0038] 25. The computer-implemented method of any of clauses 14-24, wherein the machine learning model comprises a classification model.
[0039] 26. The computer-implemented method of any of clauses 14-25, wherein the machine learning model is trained with one or more of a k-nearest algorithm, a naive bayes algorithm, and/or a neural network.
[0040] 27. The computer-implemented method of any of clauses 14-26, further comprising, after determining that the patient has endometriosis, determining, with at least one processor, a stage of the patient’s endometriosis.
[0041] 28. A system comprising at least one processor programmed or configured to: receive biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; receive clinical data relating to one or more clinical features of one or more subjects; receive survey data relating to one or more validated, subjective parameters of one or more subjects; and train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
[0042] 29. The system of clause 28, wherein the at least one processor is programmed or configured to train the machine learning model using one or more of a k-nearest algorithm, a naive bayes algorithm, and/or a neural network.
[0043] 30. The system of clause 28 or clause 29, wherein the machine learning model is a classification model.
[0044] 31. A system comprising at least one processor programmed or configured to: train a machine learning model based at least on: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects; apply the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis.
[0045] 32. The system of clause 31, wherein the at least one processor is programmed or configured to train the machine learning model using one or more of a k-nearest algorithm, a naive bayes algorithm, and/or a neural network.
[0046] 33. The system of clause 31 or clause 32, wherein the machine learning model is a classification model.
[0047] 34. The system of any of clauses 31-33, wherein the at least one processor is further programmed or configured to, after determining that the patient has endometriosis, determine a stage of the patient’s endometriosis.
[0048] 35. A computer-implemented method of identifying the presence of endometriosis in a patient, comprising: applying, with at least one processor, a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
[0049] 36. A system comprising at least one processor programmed or configured to: apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
[0050] 37. A non-transitory, computer-readable medium comprising programming instructions that, when executed by at least one processor, cause the at least one processor to: receive clinical data relating to one or more clinical features of one or more subjects; receive survey data relating to one or more validated, subjective parameters of one or more subjects; and train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
[0051] 38. A non-transitory, computer-readable medium comprising programming instructions that, when executed by at least one processor, cause the at least one processor to: apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
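As a non-limiting illustration of the train-then-apply flow recited in the clauses above, the following is a minimal sketch of a k-nearest classifier written with the Python standard library only. The Euclidean distance metric, the value k=3, the function names, and every feature value shown are assumptions made for illustration and are not taken from this disclosure.

```python
import math
from collections import Counter

def train_knn(features, labels):
    """'Training' a k-nearest model amounts to storing the labeled subjects."""
    return list(zip(features, labels))

def predict_knn(model, patient, k=3):
    """Return the majority label among the k stored subjects closest to the patient."""
    nearest = sorted(model, key=lambda row: math.dist(row[0], patient))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled vectors: [biomarker level, clinical feature, survey score]
subjects = [
    [0.9, 0.4, 0.8], [0.8, 0.5, 0.9], [0.7, 0.3, 0.7],   # labeled as having EM
    [0.2, 0.4, 0.1], [0.1, 0.6, 0.2], [0.3, 0.5, 0.3],   # labeled as not having EM
]
has_em = [True, True, True, False, False, False]

model = train_knn(subjects, has_em)
patient_result = predict_knn(model, [0.85, 0.45, 0.75], k=3)
control_result = predict_knn(model, [0.15, 0.50, 0.20], k=3)
```

In this sketch the determination for a patient is simply the majority label of the nearest stored subjects; a naive bayes algorithm or a neural network could be substituted behind the same two functions.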
BRIEF DESCRIPTION OF THE DRAWINGS
[0052] FIG. 1 is a schematic depicting sources of inputs for systems and methods according to non-limiting embodiments described herein;
[0053] FIG. 2 is a schematic depicting a machine learning pipeline useful in implementing the systems and methods according to non-limiting embodiments described herein;
[0054] FIG. 3 is a schematic depicting a machine learning pipeline useful in implementing the systems and methods for identifying the presence of endometriosis (EM) according to non-limiting embodiments described herein;
[0055] FIG. 4 is a schematic depicting a machine learning pipeline useful in implementing the systems and methods for staging EM according to non-limiting embodiments described herein;
[0056] FIG. 5 is a schematic depicting a machine learning pipeline useful in implementing the systems and methods according to non-limiting embodiments described herein;
[0057] FIG. 6 is a schematic diagram of example components of one or more devices useful in non-limiting embodiments of systems and methods according to non-limiting embodiments described herein;
[0058] FIG. 7 shows a confusion matrix (panel A) and an ROC curve (panel B) for a machine learning model according to non-limiting embodiments described herein;
[0059] FIGS. 8A-8B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein;
[0060] FIGS. 9A-9B show a confusion matrix (A) and an ROC curve (B) for an additional machine learning model according to non-limiting embodiments described herein;
[0061] FIGS. 10A-10B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein;
[0062] FIGS. 11A-11B show a confusion matrix (A) and an ROC curve (B) for an additional machine learning model according to non-limiting embodiments described herein;
[0063] FIGS. 12A-12B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein;
[0064] FIGS. 13A-13B show a confusion matrix (A) and an ROC curve (B) for an additional machine learning model according to non-limiting embodiments described herein;
[0065] FIGS. 14A-14B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein;
[0066] FIGS. 15A-15B show a confusion matrix (A) and an ROC curve (B) for an additional machine learning model according to non-limiting embodiments described herein; and
[0067] FIGS. 16A-16B show a confusion matrix (A) and an ROC curve (B) for another machine learning model according to non-limiting embodiments described herein.
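For readers unfamiliar with the two kinds of plot listed above, the following minimal sketch shows how a confusion matrix and an ROC curve may be computed from binary labels and model scores. The labels, scores, and 0.5 decision threshold are hypothetical and do not reproduce any result shown in the figures.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) counts for binary labels (1 = EM, 0 = no EM)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def roc_points(y_true, scores):
    """Sweep the threshold over every distinct score, returning (FPR, TPR) points."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tp, fp, _, _ = confusion_matrix(y_true, y_pred)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical labels and model scores for six subjects
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

matrix = confusion_matrix(y_true, y_pred)
area = auc(roc_points(y_true, scores))
```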
DESCRIPTION OF THE INVENTION
[0068] For purposes of the description hereinafter, the terms “end,” “upper,” “lower,” “right,” “left,” “vertical,” “horizontal,” “top,” “bottom,” “lateral,” “longitudinal,” and derivatives thereof shall relate to the embodiments as they are oriented in the drawing figures. However, it is to be understood that the present disclosure may assume various alternative variations and step sequences, except where expressly specified to the contrary. It is also to be understood that the specific devices and processes illustrated in the attached drawings, and described in the following specification, are simply exemplary and non-limiting embodiments or aspects of the disclosed subject matter. Hence, specific dimensions and other physical characteristics related to the embodiments or aspects disclosed herein are not to be considered as limiting.
[0069] Some non-limiting embodiments or aspects are described herein in connection with thresholds. As used herein, satisfying a threshold may refer to a value being greater than the threshold, more than the threshold, higher than the threshold, greater than or equal to the threshold, less than the threshold, fewer than the threshold, lower than the threshold, less than or equal to the threshold, equal to the threshold, etc.
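Because satisfying a threshold may refer to any of several comparisons, an implementation may make the comparison an explicit parameter. The following is a minimal sketch under that assumption; the function name, the comparison symbols, and the default of greater-than-or-equal are illustrative choices, not requirements of this disclosure.

```python
import operator

# Each entry corresponds to one sense in which a value may "satisfy" a threshold.
COMPARATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}

def satisfies_threshold(value, threshold, comparison=">="):
    """Return True when `value` satisfies `threshold` under the chosen comparison."""
    return COMPARATORS[comparison](value, threshold)
```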
[0070] No aspect, component, element, structure, act, step, function, instruction, and/or the like used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items and may be used interchangeably with “one or more” and “at least one.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, a combination of related and unrelated items, and/or the like) and may be used interchangeably with “one or more” or “at least one.” Where only one item is intended, the term “one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based at least partially on” unless explicitly stated otherwise. In addition, reference to an action being “based on” a condition may refer to the action being “in response to” the condition. For example, the phrases “based on” and “in response to” may, in some non-limiting embodiments or aspects, refer to a condition for automatically triggering an action (e.g., a specific operation of an electronic device, such as a computing device, a processor, and/or the like).
[0071] As used herein, the term “communication” may refer to the reception, receipt, transmission, transfer, provision, and/or the like of data (e.g., information, signals, messages, instructions, commands, and/or the like). For one unit (e.g., a device, a system, a component of a device or system, combinations thereof, and/or the like) to be in communication with another unit means that the one unit is able to directly or indirectly receive information from and/or transmit information to the other unit. This may refer to a direct or indirect connection (e.g., a direct communication connection, an indirect communication connection, and/or the like) that is wired and/or wireless in nature. Additionally, two units may be in communication with each other even though the information transmitted may be modified, processed, relayed, and/or routed between the first and second unit. For example, a first unit may be in communication with a second unit even though the first unit passively receives information and does not actively transmit information to the second unit. As another example, a first unit may be in communication with a second unit if at least one intermediary unit processes information received from the first unit and communicates the processed information to the second unit. In some non-limiting embodiments or aspects, a message may refer to a network packet (e.g., a data packet and/or the like) that includes data. It will be appreciated that numerous other arrangements are possible. Communication may include one or more wired and/or wireless networks. 
For example, communication may include a cellular network (e.g., a long-term evolution (LTE) network, a third-generation (3G) network, a fourth-generation (4G) network, a fifth-generation (5G) network, a code division multiple access (CDMA) network, etc.), a public land mobile network (PLMN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a telephone network (e.g., the public switched telephone network (PSTN) and/or the like), a private network, an ad hoc network, an intranet, the Internet, a fiber optic-based network, a cloud computing network, and/or the like, and/or a combination of some or all of these or other types of networks.
[0072] As used herein, the term “computing device” may refer to one or more electronic devices configured to process data. A computing device may, in some examples, include the necessary components to receive, process, and output data, such as a processor, a display, a memory, an input device, a network interface, and/or the like. A computing device may be a mobile device. As an example, a mobile device may include a cellular phone (e.g., a smartphone or standard cellular phone), a portable computer, a wearable device (e.g., watches, glasses, lenses, clothing, and/or the like), a personal digital assistant (PDA), and/or other like devices. A computing device may also be a desktop computer or other form of non-mobile computer.
[0073] As used herein, the term “server” may refer to or include one or more computing devices that are operated by or facilitate communication and processing for multiple parties in a network environment, such as the Internet, although it will be appreciated that communication may be facilitated over one or more public or private network environments and that various other arrangements are possible. Further, multiple computing devices (e.g., servers, mobile devices, etc.) directly or indirectly communicating in the network environment may constitute a “system.”
[0074] As used herein, the term “system” may refer to one or more computing devices or combinations of computing devices (e.g., processors, servers, client devices, software applications, components of such, and/or the like). Reference to “a device,” “a server,” “a processor,” and/or the like, as used herein, may refer to a previously-recited device, server, or processor that is recited as performing a previous step or function, a different device, server, or processor, and/or a combination of devices, servers, and/or processors. For example, as used in the specification and the claims, a first device, a first server, or a first processor that is recited as performing a first step or a first function may refer to the same or different device, server, or processor recited as performing a second step or a second function.
[0075] As used herein, “machine learning” may refer to a field of computer science that uses statistical techniques to provide a computer system with the ability to learn (e.g., to progressively improve performance of) a task with data without the computer system being explicitly programmed to perform the task. In some instances, a machine learning model may be developed for a set of data so that the machine learning model may perform a task (e.g., a task associated with a prediction) with regard to the set of data.
[0076] In some instances, a machine learning model, such as a predictive machine learning model, may be used to make a prediction regarding a risk or an opportunity based on a large amount of data (e.g., a large-scale dataset). A predictive machine learning model may be used to analyze a relationship between the performance of a unit based on a large-scale dataset associated with the unit and one or more known features of the unit. The objective of the predictive machine learning model may be to assess the likelihood that a similar unit will exhibit the same or similar performance as the unit. In order to generate the predictive machine learning model, the large-scale dataset may be segmented so that the predictive machine learning model may be trained on data that is appropriate.
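One possible way to segment a dataset before training, offered only as a sketch, is a stratified split that keeps the proportion of each label similar in the training and held-out portions. The record layout, the 25% held-out fraction, and the fixed random seed below are assumptions for illustration.

```python
import random

def stratified_split(records, label_of, test_fraction=0.25, seed=0):
    """Split records into train/test lists while keeping each label's share similar."""
    rng = random.Random(seed)
    by_label = {}
    for record in records:
        by_label.setdefault(label_of(record), []).append(record)
    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        # Hold out at least one record per label so every label is evaluated.
        n_test = max(1, round(len(shuffled) * test_fraction))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test

# Hypothetical records: sixteen subjects, half labeled as having EM
records = [{"id": i, "has_em": i % 2 == 0} for i in range(16)]
train, test = stratified_split(records, lambda r: r["has_em"])
```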
[0077] Provided herein are systems and methods that utilize machine learning to train a model, based on biomarker data, clinical data, and survey data, for early and accurate identification of the presence of endometriosis in a patient. Such systems and methods provide significant improvements in speed and accuracy of identification, as well as a reduction in invasiveness for the patient and resources required by healthcare professionals, over existing methods of identifying the presence of endometriosis, which allows for earlier, more effective treatment (e.g., more time-efficient surgical identification and/or excision), a wider ability to provide healthcare services to a variety of populations, and a conservation of resources.
[0078] Turning to FIG. 1, shown is a schematic illustrating potential sources of data to be input to the systems and methods disclosed herein, for example to train a machine learning model capable of identifying and/or staging the presence of endometriosis (EM) in a patient. In non-limiting embodiments, the machine learning model is trained with data obtained from one or more subjects, who may or may not have EM. In non-limiting embodiments, the sources of data include biomarker data. In non-limiting embodiments, the biomarker data may include data obtained from one or more tissues and/or bodily fluids, for example, and without limitation, a subject’s blood, urine, and/or saliva. Collected fluids, for example blood, may be peripheral or local to the uterus and/or peritoneal cavity. Collection and analysis of biomarkers from these sources is within the level of skill of those in the field, as set forth in, for example, Tian et al., Current biomarkers for the detection of endometriosis, Chinese Medical Journal 133(19) (2020) 2346-2352; Leyendecker et al., Endometriosis results from the dislocation of basal endometrium, Human Reproduction 17(10) (2002) 2725-2736; Stefansson et al., Genetic factors contribute to the risk of developing endometriosis, Human Reproduction 17(3) (2002) 555-559; and Sapkota et al., Meta-analysis identifies five novel loci associated with endometriosis highlighting key genes involved in hormone metabolism, Nature Communications 8(1) (2017) 1-12. As will be appreciated by those of skill in the art, numerous biomarkers that may be potentially relevant to identification and/or staging of EM may be detected and/or quantified based on samples from such sources.
[0079] In non-limiting embodiments, useful biomarkers may include genetic markers, for example the presence of single nucleotide polymorphisms (SNPs). In non-limiting embodiments, SNPs of interest may include those in WNT4, GREB1, and/or KDR. In non-limiting embodiments, the SNPs of interest may include one or more of rs12037376, rs11674184, rs6546324, rs10167914, rs1903068, rs760794, rs12700667, rs1537377, rs4762326, rs1250241, rs1971256, rs71575922, rs74491657, and rs74485684. In non-limiting embodiments the SNPs of interest may include one or more of rs1519761, rs7512902, rs4141819, rs7739264, rs12700667, rs1537377, rs10965235, rs10859871, rs17773813, rs519664, and rs6542095. Processes and assays known to those of skill in the art, for example polymerase chain reaction (PCR) and/or sequencing, may be used to identify genetic markers of interest.
[0080] In non-limiting embodiments, useful biomarkers may include the presence, quantity, and/or concentration of extracellular vesicles, for example extracellular vesicles in the endometrium, hormones, for example estrogen, urocortin, and/or progesterone, neurotransmitters, growth factors, for example vascular endothelial growth factor, peptides, cytokines and/or immune factors, for example interleukin-1 beta (IL-1β), interleukin-1 receptor agonist protein (IL-1RN), interleukin-2 (IL-2), interleukin-4 (IL-4), interleukin-6 (IL-6), interleukin-8 (IL-8), interleukin-10 (IL-10), interleukin-12 (IL-12 and/or IL-12p70), interleukin-17 alpha (IL-17a), interferon gamma (IFN-γ), leptin, glycodelin, chemokine ligand 20 (CCL-20), granulocyte colony-stimulating factor (G-CSF), macrophage inflammatory protein-1 alpha (MIP-1α), tumor growth factor beta (TGF-β), macrophage migration inhibitory factor (MIF), c-reactive protein (CRP), monocyte chemoattractant protein (MCP-1), and/or tumor necrosis factor-alpha (TNF-α), glycoproteins, for example glycoproteins CA72-4, CA-199 and/or CA-125, proteins, for example Vitamin D binding protein, and/or enzymes, for example non-neuronal enolase. Processes and assays known to those of skill in the art, for example chromatography, immunohistochemistry, immunofluorescent assays, immunosorbent assays (including ELISA), blotting, and/or binding assays, may be used to identify and/or quantify such biomarkers.
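Before genetic markers can be supplied to a machine learning model, genotype calls generally need a numeric representation. The following sketch assumes an additive encoding, in which each SNP becomes the number of copies (0, 1, or 2) of a counted allele; this encoding is a common convention and is not stated in this disclosure. The rs identifiers are among those listed above, but the counted alleles and the subject's genotypes are arbitrary placeholders, not published risk alleles.

```python
def encode_genotype(genotype, counted_allele):
    """Additive encoding: number of copies (0, 1, or 2) of the counted allele."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == counted_allele)

def encode_panel(genotypes, counted_alleles):
    """Encode a subject's genotypes in a fixed SNP order; missing calls become None."""
    return [
        encode_genotype(genotypes[snp], allele) if snp in genotypes else None
        for snp, allele in counted_alleles.items()
    ]

# Placeholder alleles chosen arbitrarily for illustration only
COUNTED_ALLELES = {"rs12700667": "A", "rs1537377": "C", "rs10859871": "C"}

# Hypothetical subject with no call for the third SNP
subject = {"rs12700667": "AG", "rs1537377": "CC"}
encoded = encode_panel(subject, COUNTED_ALLELES)
```

Keeping the SNP order fixed means that every subject's encoded vector has the same layout, which is what a downstream model expects.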
[0081] In non-limiting embodiments, useful biomarkers may include the presence, quantity, and/or concentration of one or more cells. In non-limiting embodiments, the one or more cells may include stromal cells, for example endometrial stromal cells or endometrial epithelial cells (obtained, for example, with a non-invasive and/or minimally-invasive endometrial biopsy or menstrual effluent collection), circulating endometrium cells, and/or immune cells, for example neutrophils, eosinophils, basophils, mast cells, monocytes, macrophages, dendritic cells, natural killer cells, and/or lymphocytes. Processes and assays known to those of skill in the art, for example flow cytometry, may be used to identify and/or quantify such cells.
[0082] Those of skill will appreciate that any number and combination of markers described herein may be included in a machine learning model. In non-limiting embodiments, at least eight blood-based biomarkers, twenty-one genetic markers, and two cell markers are utilized and input into the machine learning model.
[0083] According to non-limiting embodiments, a prediction machine learning model may be generated to provide a prediction of whether a subject has EM and/or a stage of EM based on a training dataset. In some non-limiting embodiments, the prediction machine learning model may include a machine learning model designed to receive, as an input, data associated with a subject (e.g., biomarker data, survey data, and/or clinical data associated with a subject) and provide, as an output, a prediction of whether a subject has EM and/or a stage of EM. For example, the prediction machine learning model may be designed to receive data associated with a subject during a time interval and provide an output that includes the prediction of whether a subject has EM and/or a stage of EM. In some non-limiting embodiments, a system may store the prediction machine learning model (e.g., for later use).
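The input and output described above may be sketched as follows: the three marker groups are concatenated into a fixed-length vector (8 + 21 + 2 = 31 values), and a staging model is consulted only when the presence model predicts EM. The class name, function names, stand-in models, and the stage value are hypothetical and serve only to show the data flow.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Prediction:
    has_em: bool
    stage: Optional[int] = None   # only populated when EM is predicted

def build_feature_vector(blood, genetic, cell):
    """Concatenate marker groups in a fixed order so every subject has the same layout."""
    if (len(blood), len(genetic), len(cell)) != (8, 21, 2):
        raise ValueError("expected 8 blood-based, 21 genetic, and 2 cell markers")
    return list(blood) + list(genetic) + list(cell)

def predict(features, presence_model, stage_model):
    """Run the presence model first; consult the staging model only for positive cases."""
    if not presence_model(features):
        return Prediction(has_em=False)
    return Prediction(has_em=True, stage=stage_model(features))

# Stand-in models for illustration; real models would be trained on subject data.
features = build_feature_vector([0.6] * 8, [1] * 21, [0.4] * 2)
result = predict(
    features,
    presence_model=lambda f: sum(f) / len(f) > 0.5,
    stage_model=lambda f: 2,
)
```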
[0084] In some non-limiting embodiments, as described herein, the system may process data associated with a subject during a time interval (e.g., historical data associated with a subject) to obtain training data (e.g., a training dataset) for the prediction machine learning model. For example, the data may be processed by a system to change the data into a format that may be analyzed to generate the prediction machine learning model. The data that is changed (e.g., the data that results from the change) may be referred to as training data. In some non-limiting embodiments, a system may process the data associated with a subject during a time interval to obtain the training data based on receiving the data. Additionally or alternatively, a system may process the data to obtain the training data based on the system receiving an indication, from a user (e.g., a user associated with a user device) of the system, that the system is to process the data, such as when the system receives an indication to generate a prediction machine learning model for a time interval corresponding to the data associated with a subject.
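One way such a change of format might look, offered as a sketch only, is to fill missing values with the column mean and rescale each feature to the range [0, 1]. Mean imputation and min-max scaling are assumptions chosen for simplicity, and the field names and values below are hypothetical.

```python
def to_training_data(records, feature_names, label_name):
    """Turn raw subject records into scaled numeric rows plus a label list."""
    columns = []
    for name in feature_names:
        values = [r.get(name) for r in records]
        known = [v for v in values if v is not None]
        if not known:
            raise ValueError(f"no values recorded for feature {name!r}")
        mean = sum(known) / len(known)
        filled = [mean if v is None else v for v in values]   # mean imputation
        low, high = min(filled), max(filled)
        span = (high - low) or 1.0                            # avoid division by zero
        columns.append([(v - low) / span for v in filled])    # min-max scaling
    rows = [list(row) for row in zip(*columns)]
    labels = [1 if r[label_name] else 0 for r in records]
    return rows, labels

# Hypothetical raw records; the second biomarker value is missing for one subject
raw = [
    {"age": 25, "ca125": 10.0, "pain_vas": 2, "has_em": False},
    {"age": 35, "ca125": 50.0, "pain_vas": 8, "has_em": True},
    {"age": 30, "ca125": None, "pain_vas": 5, "has_em": True},
]
rows, labels = to_training_data(raw, ["age", "ca125", "pain_vas"], "has_em")
```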
[0085] In some non-limiting embodiments, the system may process data associated with a subject by determining an EM prediction variable based on the data. An EM prediction variable may include a metric, associated with a diagnosis of EM, which may be derived based on the data associated with a subject. The EM prediction variable may be analyzed to generate a prediction machine learning model. For example, the EM prediction variable may include a variable associated with a particular medical condition of a subject, a variable associated with a diagnosis of a medical condition of a subject, a variable associated with whether a subject took a medication, a variable associated with responses to a survey (e.g., a survey associated with a determination of EM in a subject) provided by a subject, and/or the like.
[0086] In some non-limiting embodiments, the system may analyze the training data to generate the prediction machine learning model. For example, the system may use machine learning techniques to analyze the training data to generate the prediction machine learning model. In some non-limiting embodiments, generating the prediction machine learning model (e.g., based on training data obtained from historical data associated with a subject during a previous time interval) may be referred to as training the prediction machine learning model. The machine learning techniques may include, for example, supervised and/or unsupervised techniques, such as decision trees, random forests, logistic regression, linear regression, gradient boosting, support vector machines, extra-trees (e.g., an extension of random forests), Bayesian statistics, learning automata, Hidden Markov Modeling, linear classifiers, quadratic classifiers, association rule learning, and/or the like. In some non-limiting embodiments, the prediction machine learning model may include a model that is specific to a particular characteristic, for example, a model that is specific to a particular subject, a particular geographical area of subjects, a particular time interval during which a diagnosis of a medical condition may have been made for a subject, and/or the like. Additionally, or alternatively, the prediction machine learning model may be specific to a particular entity (e.g., a subject that fits into a demographic category, such as an age group). In some non-limiting embodiments, the system may generate one or more prediction machine learning models for one or more subjects, a particular group of subjects, and/or one or more subjects of a particular group of subjects.
[0087] Additionally or alternatively, when analyzing the training data, the system may identify one or more variables (e.g., one or more independent variables) as predictor variables (e.g., features) that may be used to make a prediction when analyzing the training data. In some non-limiting embodiments, values of the predictor variables may be inputs to the prediction machine learning model. For example, the system may identify a subset (e.g., a proper subset) of the variables as the predictor variables that may be used to accurately predict a determination of whether EM is present in or stage of EM that is present in a subject. In some non-limiting embodiments, the predictor variables may include one or more of the prediction variables, as discussed above, that have a significant impact (e.g., an impact satisfying a threshold) on a prediction of whether a subject has EM and/or a stage of EM as determined by the system.
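The selection of predictor variables whose impact satisfies a threshold can be illustrated with a minimal sketch. The importance measure used here (absolute Pearson correlation with the EM label), the threshold value, and the feature names are assumptions made for illustration only; the disclosure does not specify them.

```python
from math import sqrt
from typing import Dict, List, Sequence


def abs_correlation(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Absolute Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / sqrt(var_x * var_y)


def select_predictors(
    columns: Dict[str, Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.3,  # hypothetical threshold, not taken from the disclosure
) -> List[str]:
    """Keep only the variables whose importance score satisfies the threshold."""
    return [
        name
        for name, values in columns.items()
        if abs_correlation(values, labels) >= threshold
    ]


# Toy data: hypothetical feature names and values, 1 = EM present, 0 = EM absent.
columns = {
    "took_muscle_relaxants": [1, 1, 0, 0],
    "constant_feature": [30, 30, 30, 30],
    "uninformative_feature": [1, 0, 1, 0],
}
labels = [1, 1, 0, 0]
selected = select_predictors(columns, labels)
```

In this toy data only the first column tracks the label, so it is the only one retained; any other importance measure (for example, one derived from a tree ensemble) could be substituted for `abs_correlation`.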
[0088] In some non-limiting embodiments, the system may validate the prediction machine learning model. For example, the system may validate the prediction machine learning model after the system generates the prediction machine learning model. In some non-limiting embodiments, the system may validate the prediction machine learning model based on a portion of the training data to be used for validation. For example, the system may partition the training data into a first portion and a second portion, where the first portion may be used to generate the prediction machine learning model, as described above. In this example, the second portion of the training data (e.g., the validation data) may be used to validate the prediction machine learning model.
[0089] In some non-limiting embodiments, the system may validate the prediction machine learning model by providing validation data associated with a user (e.g., data associated with one or more subjects) as input to the prediction machine learning model, and determining, based on an output of the prediction machine learning model, whether the prediction machine learning model correctly, or incorrectly, predicted whether a subject has EM and/or a stage of EM. In some non-limiting embodiments, the system may validate the prediction machine learning model based on a validation threshold. For example, the system may be configured to validate the prediction machine learning model when the determination of whether a subject has EM and/or a stage of EM (as identified by the validation data) is correctly predicted by the prediction machine learning model (e.g., when the prediction machine learning model correctly predicts 50% of a data set as to whether a subject has EM and/or a stage of EM, 70% of a data set as to whether a subject has EM and/or a stage of EM, a threshold quantity of a population of a data set as to whether a subject has EM and/or a stage of EM, and/or the like).
[0090] In some non-limiting embodiments, if the system does not validate the prediction machine learning model (e.g., when a percentage of correctly predicted subjects does not satisfy the validation threshold), then the system may generate one or more additional prediction machine learning models.
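The partition-and-validate procedure of the preceding paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the 30% validation fraction, and the stand-in model are hypothetical, and the 70% validation threshold is one of the example values mentioned above.

```python
import random
from typing import Callable, List, Sequence, Tuple

Record = Tuple[Tuple[float, ...], int]  # (feature values, label: 1 = EM, 0 = no EM)


def partition(
    records: Sequence[Record], validation_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[Record], List[Record]]:
    """Randomly split records into a first (training) and second (validation) portion."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def is_validated(
    model: Callable[[Tuple[float, ...]], int],
    validation: Sequence[Record],
    threshold: float = 0.70,
) -> bool:
    """Validate the model when the share of correct predictions satisfies the threshold."""
    correct = sum(1 for features, label in validation if model(features) == label)
    return correct / len(validation) >= threshold


# Toy records and a stand-in "model" (hypothetical; a real model would be trained
# on the first portion rather than written by hand).
records = [((float(i),), i % 2) for i in range(10)]
training, validation = partition(records)
good_model = lambda features: int(features[0]) % 2
bad_model = lambda features: 1 - int(features[0]) % 2

if not is_validated(bad_model, validation):
    # Per the paragraph above, the system may generate additional models here.
    pass
```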
[0091] In some non-limiting embodiments, once the prediction machine learning model has been validated, the system may further train the prediction machine learning model and/or generate new prediction machine learning models based on receiving new training data. The new training data may include additional data associated with one or more subjects. In some non-limiting embodiments, the new training data may include data associated with whether a subject has EM and/or a stage of EM. The system may use the prediction machine learning model to predict whether a subject has EM and/or a stage of EM and compare an output of a prediction machine learning model to the new training data that includes data associated with whether a subject has EM and/or a stage of EM. In such an example, the system may update one or more prediction machine learning models based on the new training data.
[0092] In some non-limiting embodiments, the system may store the prediction machine learning model. For example, the system may store the prediction machine learning model in a data structure (e.g., a database, a linked list, a tree, and/or the like). The data structure may be located within the system or external to (e.g., remote from) the system.
[0093] With continuing reference to FIG. 1, in non-limiting embodiments the sources of data useful in the systems and methods disclosed herein may include clinical data. In non-limiting embodiments, clinical data may include data that may be part of a subject’s electronic health record (EHR). In non-limiting embodiments, the clinical data may include one or more of the subject’s age, race, family history of endometriosis, medication history, for example history of contraceptive use, antidepressant use, aspirin use, metformin use, muscle relaxant use, non-steroidal anti-inflammatory drug (NSAID) use, opioid use, steroid use (e.g., prednisone), and/or statin use, age of first menstrual cycle, weight, height, body mass index, previously-diagnosed diseases or conditions and/or comorbidities, for example asthma, cancer (e.g., breast cancer, ovarian cancer, and/or uterine cancer), depression, bipolar disorder, anxiety, post-traumatic stress disorder (PTSD), thyroid disorders, gastroesophageal reflux disease (GERD), irritable bowel syndrome (IBS), Crohn’s Disease, ulcerative colitis, fibromyalgia, chronic fatigue, history of migraines, diabetes, cardiovascular disease (e.g., hypertension), hyperlipidemia, and/or osteoporosis, obstetric history (e.g., pregnancy, number of pregnancies, including those that reach viable gestational age, spontaneous vaginal delivery (SVD), caesarean sections, ectopic pregnancies, spontaneous abortions (SABs), and/or therapeutic abortions (TABs)), for example prior pregnancy, prior EM diagnosis and/or treatment, and/or substance use history.
[0094] In non-limiting embodiments the sources of data useful in the systems and methods disclosed herein may include survey data, including subject self-reported data, validated, subjective parameters. As will be understood by those of skill in the art, validated, subjective parameters may be objectified through use of a visual analog scale. In non-limiting embodiments, one or more scores of the visual analog scale may include one or more numeric ratings of the one or more subjects’ self-perception of pain, self-perception of control and/or powerlessness, self-perception of social support, self-perception of emotional well-being, self-image, duration of pain, location of pain, menstrual cycle length, and/or menstrual cycle irregularities before and/or after hormone therapy. In non-limiting embodiments, the validated, subjective parameters are ascertained through use of the Endometriosis Health Profile-30 (EHP30) (see, e.g., Jones et al., Development of an endometriosis quality-of-life instrument: The Endometriosis Health Profile-30, Obstetrics & Gynecology 2001, 98(2): 258-264). In non-limiting embodiments, the core EHP30 (30 questions) is utilized. In non-limiting embodiments the long-form EHP30 is utilized. In non-limiting embodiments, the short-form EHP-5 is utilized. In non-limiting embodiments, one or more EHP30 supplementary modules are utilized as an alternative and/or in addition to the core EHP30 and/or the EHP-5.
[0095] In non-limiting embodiments, sources of data described herein may be used to build a database for training and/or validating a machine learning model as described herein. In non-limiting embodiments, additional data may be added to the database as the machine learning model is applied to patients. That is, the processes described herein may be iterative. In non-limiting embodiments, because the data used to build a database as described herein may include identifying information, the data may be encrypted or de-identified.
[0096] With reference to FIG. 2, shown is a schematic representation of a process of training a machine learning model with data from one or more subjects, the data including at least those described herein above. In non-limiting embodiments, the machine learning model is trained with at least biomarker data, survey data, and/or clinical data as described herein. In non-limiting embodiments, data from one or more subjects (e.g., biomarker data, survey data, and/or clinical data associated with a subject, biomarker data, survey data, and/or clinical data associated with a plurality of subjects, biomarker data, survey data, and/or clinical data associated with a specified group of subjects, etc.) is utilized as a training data set, to train a machine learning model, through one or more algorithms. In some non-limiting embodiments, the data set is made up of only symptomatic patients (e.g., there are no patients that are healthy controls).
[0097] In non-limiting embodiments, the machine learning model (e.g., the trained machine learning model) may be configured to provide, based on an input, an output that includes a prediction of whether a subject will have endometriosis (e.g., a prediction of whether the subject is to be diagnosed as having endometriosis). In non-limiting embodiments, the machine learning model may be configured to provide, based on an input, an output that includes a prediction of which stage of endometriosis a subject may have (e.g., a prediction of whether the subject is to be diagnosed as Stage 1, Stage 2, Stage 3, or Stage 4 endometriosis).
[0098] In non-limiting embodiments, the training data set may be associated with a population of subjects that includes a plurality of data records associated with a plurality of features. In some examples, the training data set may include a large number of data records, such as 100 data records, 500 data records, 1,000 data records, 5,000 data records, 10,000 data records, 25,000 data records, 50,000 data records, 100,000 data records, 1,000,000 data records, and/or the like. In non-limiting embodiments, the plurality of features may represent variables associated with biomarker data, survey data, and/or clinical data associated with one or more subjects. In non-limiting embodiments, the plurality of features may include data associated with whether a subject took muscle relaxants, data associated with whether a subject took sleep medication, data associated with whether a subject took hormonal medication, data associated with whether a subject has (e.g., was diagnosed with) post-traumatic stress disorder (PTSD), data associated with whether a subject took contraceptive medication, data associated with whether a subject took attention deficit hyperactivity disorder (ADHD) medication, data associated with whether a subject took asthma medication, data associated with whether a subject has interstitial cystitis, data associated with whether a subject has asthma, data associated with responses to an Endometriosis Health Profile (EHP) survey (e.g., an EHP infertility survey, an EHP self-perception survey, an EHP social survey, an EHP pain survey, an EHP work survey, an EHP control survey, an EHP pain survey, an EHP sexual survey, an EHP family survey, an EHP treatment survey, an EHP medical treatment survey, an EHP emotion survey, an EHP survey total, etc.) by a subject, data associated with whether a subject has seasonal allergies, data associated with whether a subject has gastroesophageal reflux disease (GERD), data associated with whether a subject has anemia, data associated with whether a subject has cardiovascular disease (CVD), data associated with whether a subject has migraines, data associated with whether a subject took migraine medication, data associated with whether a subject has heavy bleeding during a menstrual cycle, data associated with a body mass index of a subject, data associated with whether a subject has had a spontaneous vaginal delivery (SVD), data associated with a reproductive history of a subject (e.g., data associated with para for a subject), data associated with a cycle frequency of a menstrual cycle of a subject, data associated with whether a subject has had a spontaneous abortion (SAB), data associated with responses to a Visual Analogue Scale (VAS) survey (e.g., a VAS overall survey, a VAS dyspareunia survey, a VAS dysmenorrhea survey, a VAS non-menstrual survey, a VAS dyschezia survey, etc.) by a subject, data associated with the race of a subject, data associated with the age of a subject, data associated with whether a subject has had depression, data associated with whether a subject has had anxiety, data associated with whether a subject has fibromyalgia, data associated with whether a subject has had an abnormal pap smear result, data associated with whether a subject has irritable bowel syndrome (IBS), data associated with a hormone self-report (e.g., data associated with use of hormonal medication self-reported) by a subject, data associated with hormones of a subject (e.g., data associated with hormone information of a subject), data associated with whether a subject has engaged in postmenopausal hormone use (PMH), data associated with whether a subject has taken narcotics, and/or data associated with whether a subject is currently pregnant (e.g., gravid).
[0099] In some non-limiting embodiments, the EHP survey suite is a Health Related Quality of Life patient self-report survey that can be used to assess areas of concern within women with endometriosis (e.g., available at https://innovation.ox.ac.uk/outcome-measures/endometriosis-health-profile-ehp/).
The EHP surveys may allow for an understanding of the perception of a patient’s pain and experiences of the burden of endometriosis-like symptoms during their search for a diagnosis. The EHP highlights areas of concern for patients, such as the impact of their symptoms on infertility, self-perception, social life, pain persistence, work life, sexual life, family life, control of oneself, medical treatment, and emotions.
[00100] In some non-limiting embodiments, the VAS survey suite provides information about the patient’s perception of their pain during menstruation, intercourse, bowel movements, and at rest, and an overall combination of each of these responses. These responses may be collected on a scale from 1 to 10 through an electronic survey.
[00101] In non-limiting embodiments, a machine learning model for determining the presence of endometriosis in a patient may be trained with and/or may be applied to clinical data as described herein. In non-limiting embodiments, a machine learning model for determining a stage of endometriosis in a patient may be trained with and/or may be applied to clinical data as described herein. In non-limiting embodiments, a machine learning model for determining a stage of endometriosis may be trained with and/or may be applied to survey data (e.g., EHP data and/or VAS data as described herein). In non-limiting embodiments, medication data may be included in a training data set (with clinical and/or survey data) for determining the presence and/or stage of endometriosis.
[00102] Subjects from whom data may be collected for training may include those with or without symptoms of EM, those diagnosed with EM, and/or those who have not been diagnosed with EM. In non-limiting embodiments, data from patients diagnosed with EM is compared to patients diagnosed as not having EM. In non-limiting embodiments, the machine learning model may include a particular architecture, for example, the machine learning model may include a linear support vector machine (SVM) (e.g., an efficient linear SVM, a Gaussian SVM, such as a medium Gaussian SVM, a Quadratic SVM, a Cubic SVM, etc.), a subspace discriminant (e.g., an ensemble subspace discriminant), a neural network (e.g., a deep learning neural network, a narrow neural network, a medium neural network, a bi-layered neural network, a tri-layered neural network, etc.), a decision tree (e.g., a gradient boosted tree, an ensemble boosted tree, etc.), and/or a K-Nearest Neighbors (KNN) (e.g., a fine KNN) model architecture.
[00103] In non-limiting embodiments, the machine learning model is trained, with data described herein from one or more subjects, with a k-nearest neighbors algorithm, a naive Bayes algorithm, a random forest algorithm, a boosted gradient algorithm, and/or a neural network. In non-limiting embodiments, the machine learning model may be trained with random forest trees, boosted gradient ensemble models, and neural networks using hold-out validation. In non-limiting embodiments, the machine learning model may be compared against multinomial logistic regression models with regard to ability to accurately identify patients with and/or without EM, and, in non-limiting embodiments, to predict EM stage. Those of skill in the art will appreciate that EM may be staged in up to four stages, with Stage 1 being the most mild (few superficial implants), Stages 2 (more and deeper implants) and 3 (many deep implants, small cysts on one or both ovaries, and presence of filmy adhesions) being more severe, and Stage 4 (many deep implants, large cysts on one or both ovaries, many dense adhesions) being the most severe. In non-limiting embodiments, rather than staging, the machine learning model may classify and/or categorize the EM, from Category I (peritoneal EM), through Category II (ovarian EM), Category III (deep infiltrating EM I), to Category IV (deep infiltrating EM II).
[00104] In non-limiting embodiments, the machine learning model is trained with one or more data points shown in Table 1, below.
Table 1
[00105] In non-limiting embodiments, the machine learning model that is trained is a classification model. In non-limiting embodiments, multiple machine learning models are trained, validated, and/or employed. In non-limiting embodiments, the multiple models may include a model for identifying the presence of EM, a model for staging EM in a patient, and/or a model allowing a patient to, with one or more of the data points described herein, enter information into software, for example an application executable on a computing device as described herein, to obtain at least a preliminary indication of presence and/or staging of EM. Furthermore, in non-limiting embodiments, the machine learning model may accommodate longitudinal data, where two or more timepoints collecting the biomarker, clinical, and survey data can be used to improve long-term predictions using recurrent neural networks or other forecasting architectures.
[00106] The training and testing protocol may require that a robust hold-out validation be performed (i.e., separating a testing and training dataset a priori). To closely match the characteristics of the training and testing datasets, a probability density function may be fit to any continuous variable (i.e., non-binary variables) that enables histogram matching of the respective datasets:
[00108] where x is the continuous variable, e is the natural exponent. A random number generator to extract a proportion of the dataset may then be used to split the testing and training dataset incrementally such that the training and testing data split does not exceed 70/30.
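One way to picture the hold-out split described above is the following minimal sketch. The probability density function referred to above is not reproduced in this text, so the sketch does not attempt to recreate it; as an assumption, it compares simple binned histograms of a continuous variable instead. It also reads "does not exceed 70/30" as meaning that the training share is at most 70%. The function names, bin edges, and number of attempts are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized bin frequencies; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / len(values) for c in counts]


def histogram_distance(a: Sequence[float], b: Sequence[float], edges: Sequence[float]) -> float:
    """Sum of absolute differences between two normalized histograms (0 = identical)."""
    return sum(abs(p - q) for p, q in zip(histogram(a, edges), histogram(b, edges)))


def matched_split(
    values: Sequence[float],
    edges: Sequence[float],
    train_percent: int = 70,
    attempts: int = 200,
    seed: int = 0,
) -> Tuple[float, List[int], List[int]]:
    """Draw random splits and keep the one whose train/test histograms agree best."""
    n_train = len(values) * train_percent // 100  # floor keeps training share <= 70%
    if n_train == 0 or n_train == len(values):
        raise ValueError("dataset too small to split")
    rng = random.Random(seed)
    best = None
    for _ in range(attempts):
        order = list(range(len(values)))
        rng.shuffle(order)
        train_idx, test_idx = order[:n_train], order[n_train:]
        distance = histogram_distance(
            [values[i] for i in train_idx], [values[i] for i in test_idx], edges
        )
        if best is None or distance < best[0]:
            best = (distance, sorted(train_idx), sorted(test_idx))
    return best


# Toy continuous variable (e.g., ages between 20 and 49; values are illustrative only).
ages = [20 + (i % 30) for i in range(100)]
distance, train_idx, test_idx = matched_split(ages, edges=[20, 30, 40, 50])
```

The split is fixed before any model is trained, which matches the a priori separation of testing and training data described above.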
[00109] In non-limiting embodiments, a minimum AUC of 0.70 may be required to exhibit reasonable discriminability of patient outcomes for the presence of EM and the stage of EM (stages 1-4 or pooling stages 1 and 2, and 3 and 4) for a classification-based ML model. For longitudinal classification modeling or forecasting, intermediate AUCs of 0.70 and changes in status may be important in matching the temporal changes to a patient’s overall health to reflect the precise time when EM is non-invasively diagnosed (as matched to the available data on surgical confirmation of the presence of EM and respective stage).
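The minimum-AUC criterion for a binary presence-of-EM classifier can be checked with a short sketch. The AUC is computed here by the standard pairwise (Mann-Whitney) formulation; the function names and the toy labels and scores are hypothetical and are not data from the disclosure.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC requires at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def meets_minimum_auc(labels: Sequence[int], scores: Sequence[float], minimum: float = 0.70) -> bool:
    """True when the hold-out AUC satisfies the minimum stated above."""
    return roc_auc(labels, scores) >= minimum


# Toy hold-out set: 1 = surgically confirmed EM, 0 = no EM (illustrative values only).
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

For stage prediction with more than two classes, a multi-class extension (for example, one-versus-rest averaging) would be needed; that choice is not specified in the text.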
[00110] In non-limiting embodiments, the data described herein is used to train and/or validate a machine learning model for identifying the presence of EM in a patient. In non-limiting embodiments, the method may include receiving biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects, receiving clinical data relating to one or more clinical features of one or more subjects, and receiving survey data relating to one or more validated, subjective parameters of one or more subjects. In non-limiting embodiments, the method may further include training, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model, for example a classification model. The training may be completed with any useful algorithm, for example those described herein.
[00111] Also provided herein are methods of identifying the presence of EM in a patient with a model trained, for example, as described herein. In non-limiting embodiments, the machine learning model is applied to one or more of biomarker data, clinical data, and/or survey data obtained from the patient and, based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, the method may include determining that the patient has EM or that the patient does not have EM. In non-limiting embodiments, after determining that the patient has EM, the method may include determining, with the machine learning model or another machine learning model, a stage of the patient’s EM.
[00112] As further shown in FIG. 2, a feature importance and/or dimensionality reduction process may be conducted in conjunction with training of the machine learning model. Feature importance of data inputs will be determined, and through dimensionality reduction, a number of features may be determined to be unnecessary. The resulting subset of input data will be used to train a machine learning model. In parallel with the machine learning training and testing, multinomial logistic regression may be performed, using the database described above as inputs, to assess its efficacy in diagnosing disease presence and stage in comparison to the machine learning models. Lastly, the best performing machine learning model will be validated using hold-out validation, a preferred method of machine learning validation where the dataset being used was not involved in the training of the models.
[00113] With reference to FIG. 3, shown is a schematic representation of a process of performing an inference task that results in a prediction of a diagnosis of endometriosis for a subject. As shown in FIG. 3, new data (e.g., input data that was not used in training the machine learning model, which may include biomarker data, clinical data, and survey data) associated with a patient may be provided as an input to a trained machine learning classification model. The trained machine learning classification model may be configured to provide a prediction of a diagnosis of endometriosis for the patient. In non-limiting embodiments, the trained machine learning classification model may be configured to receive an input that is based on a feature importance and/or dimensionality reduction procedure. For example, the trained machine learning classification model may be configured to receive an input that includes a plurality of features that are a subset of all features in the training data set based on the feature importance and/or dimensionality reduction procedure. In one example, the trained machine learning classification model may be configured to receive an input that includes a feature associated with whether a subject took muscle relaxants, a feature associated with whether a subject took asthma medication, a feature associated with whether a subject has (e.g., was diagnosed with) post-traumatic stress disorder (PTSD), a feature associated with whether a subject took hormonal medication, a feature associated with whether a subject took attention deficit hyperactivity disorder (ADHD) medication, and/or a feature associated with whether a subject was diagnosed with ADHD. 
In another example, the trained machine learning classification model may be configured to receive an input that includes a feature associated with whether a subject took muscle relaxants, a feature associated with whether a subject took asthma medication, a feature associated with whether a subject has (e.g., was diagnosed with) post-traumatic stress disorder (PTSD), a feature associated with whether a subject took hormonal medication, and/or a feature associated with a cycle frequency of a menstrual cycle of a subject.
[00114] With reference to FIG. 4, shown is a schematic representation of a process of performing an inference task that results in a prediction of a diagnosis of a stage (e.g., stage 1, stage 2, stage 3, stage 4, stages 1 or 2, stages 3 or 4, etc.) of endometriosis for a subject. As shown in FIG. 4, new data (e.g., input data that was not used in training the machine learning model, which may include biomarker data, clinical data, and survey data) associated with a patient may be provided as an input to a trained machine learning classification model. The trained machine learning classification model may be configured to provide a prediction of a diagnosis of a stage of endometriosis for the patient. In non-limiting embodiments, the trained machine learning classification model may be configured to receive an input that is based on a feature importance and/or dimensionality reduction procedure. For example, the trained machine learning classification model may be configured to receive an input that includes a plurality of features that are a subset of all features in the training data set based on the feature importance and/or dimensionality reduction procedure. In one example, the trained machine learning classification model may be configured to receive an input that includes a feature associated with the age of a subject, a feature associated with responses to an EHP pre-sexual survey by a subject, a feature associated with responses to an EHP survey total by a subject, a feature associated with responses to an EHP pre-treatment survey by a subject, and/or a feature associated with hormone information of a subject. 
In another example, the trained machine learning classification model may be configured to receive an input that includes a feature associated with the age of a subject, a feature associated with hormones self-reported by a subject, a feature associated with responses to an EHP pre-sexual survey by a subject, a feature associated with whether a subject took contraceptive medication, and/or a feature associated with whether a subject has irritable bowel syndrome (IBS).
[00115] With reference to FIG. 5, shown is a schematic representation of a process of performing an inference task that results in a prediction of a diagnosis of endometriosis and then a diagnosis of a stage (e.g., no endometriosis, stage 1, stage 2, stage 3, stage 4, stages 1 or 2, stages 3 or 4, etc.) of endometriosis for a subject. As shown in FIG. 5, new data (e.g., input data that was not used in training the machine learning model, which may include biomarker data, clinical data, and survey data) associated with a patient may be provided as an input to a first trained machine learning classification model. The first trained machine learning classification model may be configured to provide a prediction of a diagnosis of endometriosis for the patient. A second trained machine learning classification model may be configured to provide a prediction of a diagnosis of a stage of endometriosis for the patient based on a prediction of a diagnosis of endometriosis for the patient from the first trained machine learning classification model.
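The two-model arrangement of FIG. 5 can be pictured as a simple cascade in which the stage model is consulted only when the presence model predicts endometriosis. The sketch below is illustrative only: the function names, the feature keys, and the stand-in decision rules are hypothetical placeholders with no clinical meaning, and real models would be trained as described herein.

```python
from typing import Callable, Mapping

Features = Mapping[str, float]


def cascade_predict(
    features: Features,
    presence_model: Callable[[Features], bool],
    stage_model: Callable[[Features], int],
) -> str:
    """Run the first model; call the second model only for a positive prediction."""
    if not presence_model(features):
        return "no endometriosis"
    return f"stage {stage_model(features)}"


# Stand-in models (arbitrary placeholder rules, NOT clinically derived).
def toy_presence_model(features: Features) -> bool:
    return features["ehp_survey_total"] > 50


def toy_stage_model(features: Features) -> int:
    return 3 if features["age"] >= 35 else 1


patient_a = {"ehp_survey_total": 80.0, "age": 40.0}
patient_b = {"ehp_survey_total": 10.0, "age": 40.0}

result_a = cascade_predict(patient_a, toy_presence_model, toy_stage_model)
result_b = cascade_predict(patient_b, toy_presence_model, toy_stage_model)
```

Keeping the two models separate allows each to receive its own reduced feature subset, consistent with the feature importance and/or dimensionality reduction procedure described herein.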
[00116] In non-limiting embodiments, the first trained machine learning classification model and/or the second trained machine learning classification model may be configured to receive an input that is based on a feature importance and/or dimensionality reduction procedure. For example, the first and/or second trained machine learning classification models may be configured to receive an input that includes a plurality of features that are a subset of all features in the training data set based on the feature importance and/or dimensionality reduction procedure. In one example, the first and/or second trained machine learning classification models may be configured to receive an input that includes a feature associated with the age of a subject, a feature associated with responses to an EHP pre-sexual survey by a subject, a feature associated with the use of hormonal medication self-reported by a subject, a feature associated with responses to a VAS pre-dyspareunia survey by a subject, and/or a feature associated with responses to an EHP survey total by a subject. In another example, the trained machine learning classification model may be configured to receive an input that includes a feature associated with whether a subject took contraceptive medication, a feature associated with whether a subject took muscle relaxants, a feature associated with hormones self-reported by a subject, a feature associated with the age of a subject, and/or a feature associated with whether a subject has irritable bowel syndrome (IBS).
[00117] As may be appreciated, as methods of training and applying a machine learning model are described herein, devices and systems for implementing these methods are also within the scope of the present disclosure. To this end, FIG. 6 shows a diagram of example components of a device 200 for training, validating, and/or applying a machine learning model according to non-limiting embodiments described herein. It is to be appreciated that a system useful for training, validating, and/or applying a machine learning model as described herein may include any number of devices, with one or more components as shown in FIG. 6, and that the device(s) may communicate with one or more other device(s) through wired or wireless connections as are known in the art. For example, and without limitation, one or more client devices may be used to input data as described herein, and one or more aspects of the machine learning model may be stored and/or executed by the client device, by another device (e.g., a server) in communication with the client device, and/or both. In non-limiting embodiments, a device may be a mobile device, such as those described herein, and may receive data input by a user. In such non-limiting embodiments, one or more aspects of the machine learning model may be stored and/or executed by the mobile device, by another device (e.g., a server) in communication with the mobile device, and/or both.
[00118] Device 200 may correspond to any element of any system or device described herein, including any computing device and/or server, for example those configured to collect data for training, validating, and/or applying a machine learning model as described herein. In some non-limiting embodiments, such systems or devices may include at least one device 200 and/or at least one component of device 200. The number and arrangement of components shown are provided as an example. In some non-limiting embodiments, device 200 may include additional components, fewer components, different components, or differently arranged components than those shown. Additionally, or alternatively, a set of components (e.g., one or more components) of device 200 may perform one or more functions described as being performed by another set of components of device 200.
[00119] As shown in FIG. 6, device 200 may include a bus 202, a processor 204, memory 206, a storage component 208, an input component 210, an output component 212, and a communication interface 214. Bus 202 may include a component that permits communication among the components of device 200. In some non-limiting embodiments, processor 204 may be implemented in hardware, firmware, or a combination of hardware and software. For example, processor 204 may include a processor (e.g., a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), etc.), a microprocessor, a digital signal processor (DSP), and/or any processing component (e.g., a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), etc.) that can be programmed to perform a function. Memory 206 may include random access memory (RAM), read only memory (ROM), and/or another type of dynamic or static storage device (e.g., flash memory, magnetic memory, optical memory, etc.) that stores information and/or instructions for use by processor 204. It is to be understood that programming instructions, stored in memory (e.g., RAM and/or ROM), which may be executed by a processor (e.g., processor 204) to receive data, analyze data, train a machine learning model, validate a machine learning model, and/or apply a machine learning model, are within the scope of the present disclosure.
[00120] With continued reference to FIG. 6, storage component 208 may store information and/or software related to the operation and use of device 200. For example, storage component 208 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, a solid-state disk, etc.) and/or another type of computer-readable medium. Input component 210 may include a component that permits device 200 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, a microphone, etc.). Additionally, or alternatively, input component 210 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, an actuator, etc.). Sensors useful here may include biochemical sensors, electrochemical sensors, sensors for detecting autonomic tone, sensors for detecting sympathetic tone, and/or the like. Output component 212 may include a component that provides output information from device 200 (e.g., a display, a speaker, one or more light-emitting diodes (LEDs), etc.). Communication interface 214 may include a transceiver-like component (e.g., a transceiver, a separate receiver and transmitter, etc.) that enables device 200 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. Communication interface 214 may permit device 200 to receive information from another device and/or provide information to another device. For example, communication interface 214 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi® interface, a cellular network interface, and/or the like.
[00121] Device 200 may perform one or more processes described herein. Device 200 may perform these processes based on processor 204 executing software instructions stored by a computer-readable medium, such as memory 206 and/or storage component 208. A computer-readable medium may include any non-transitory memory device. A memory device includes memory space located inside of a single physical storage device or memory space spread across multiple physical storage devices. Software instructions may be read into memory 206 and/or storage component 208 from another computer-readable medium or from another device via communication interface 214. When executed, software instructions stored in memory 206 and/or storage component 208 may cause processor 204 to perform one or more processes described herein. Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, embodiments described herein are not limited to any specific combination of hardware circuitry and software. The term “configured to,” as used herein, may refer to an arrangement of software, device(s), and/or hardware for performing and/or enabling one or more functions (e.g., actions, processes, steps of a process, and/or the like). For example, “a processor configured to” may refer to a processor that executes software instructions (e.g., program code) that cause the processor to perform one or more functions.
Example
[00122] The current standard for diagnosing EM is through surgical visualization and pathological confirmation, which is highly invasive, expensive, and unnecessary for a staggering 20-40% of patients who end up not having the disease. An alternative method of reliable disease detection is greatly needed. Machine learning (ML) has been shown to improve upon statistical analysis through the identification, conversion, and implementation of data patterns into relevant evidence. While artificial intelligence use is expanding in the field of medicine, no known ML algorithm exists in the clinical setting to diagnose both the presence and stage of EM in symptomatic patients using biomarker, clinical, and survey inputs. The proposed ML-based tool consists of two predictive models: one to identify disease presence and the other to classify the disease stage. Presence prediction is necessary to immediately inform patients and clinicians of the answer to their symptoms and to reduce costs associated with surgical diagnosis. Stage prediction is necessary to enable clinical preparedness for scheduling surgeries, to inform surgeons of anticipated challenges, and to support cost projection of the performed procedure.
[00123] Clinical Data: Patient clinical data will be collected from University of Pittsburgh Medical Center (UPMC) Clinical Analytics, which structures various streams of UPMC data into easily visualizable data for the purposes of clinical, quality, and operational improvement. Clinical Analytics will provide for each enrolled patient the indices shown in Table 1 above.
[00124] Patient-Generated Data via Preoperative Surveys: The pre-surgical survey includes the EHP-30 and the Visual Analog Scales for pain. The survey includes questions detailing the patient’s experience of pain (see Table 1 above). The patient-generated data from the pre-surgical survey customizes the data based on personal experiences of the biological and physical presentation of the disease. Studies have shown that 86.2% of patients presented with multiple symptoms before their diagnosis. Upon completion of the ~15 min survey, the scores will be calculated by summing and averaging patient responses. De-identified responses will be input into a secured and encrypted database.
[00125] Patient Biological Samples: Biological samples will be collected under sterile conditions during the surgical procedure to minimize disruption to the standard of care of the patient. Peripheral blood will be collected via routine venous peripheral blood draws. Endometrial biopsy is a non-surgical technique of obtaining endometrial tissue using a small suctioning swab inserted through the cervix into the uterus. All samples will be stored in de-identified containers, either blood vials or sterile conical tubes with tissue biopsy medium. Blood will be kept at room temperature for 30 minutes prior to further processing and tissue will be stored on ice.
[00126] Quantification of Patient-Specific Biomarkers: From these samples, a minimum of eight blood biomarkers, 21 genetic markers, and two cell markers will be evaluated for inclusion in the database (see Table 1 above). EM is an estrogen-dependent and progesterone-resistant disease which drives inflammation and pelvic pain and impacts the fertility of EM patients. It is thought that EM has a 50% heritability rate and 19 SNPs have been identified as explaining half of EM disease variants. When the disease forms, EnSCs shed into the peritoneal cavity during retrograde menstruation and activate macrophages, driving inflammatory release of IL-6, TNF-α, and VEGF which leads to disease invasion and growth. EM risk has been related to increased PGP-9.5+ nerve cell size and density in endometrial biopsies, demonstrating a potential relationship between patient symptom survey data and biomarker data that ML could identify. Flow cytometry will be used to measure EnSCs and PGP-9.5 receptor cells and Luminex immunoassays will be used to measure pro-inflammatory cytokines. Genetic testing of blood and endometrial biopsy samples will be performed using TaqMan SNP genotyping and assaying. Patient biomarker results will be added to the database after all biomarker analysis for the patient has been completed.
[00127] Comparing Classes of Patient Information in EM and non-EM Patients: All quantified data added to the database will be compared between EM and non-EM patients to determine the relevance and significance of that data type. Standard statistical analysis will use t-tests to analyze presence of disease and analysis of variance (ANOVA) to assess presentation of biomarkers at varying stages.
[00128] Training and validation of ML algorithms and multinomial logistic regression models to predict the presence and stage of EM in symptomatic patients will utilize a database compiled based on the data described herein as inputs. Firstly, the highest performing preliminarily trained ML models will be selected. Secondly, feature importance of patient-specific data inputs will be determined, and it may be discovered that many of the variables are extraneous through dimensionality reduction, a filtering process which will include principal component analysis. Next, the resulting subset of patient-specific input data will be used to train the final predictive ML models. In parallel with the ML training and testing, multinomial logistic regression will be performed, using the database described above as inputs, to assess its efficacy in diagnosing disease presence and stage in comparison to the ML models. Lastly, the best performing ML model will be validated using hold-out validation, a preferred method of ML validation where the dataset being used was not involved in the training of the models.
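The group comparisons named above (t-tests between EM and non-EM patients, ANOVA across stages) rest on two test statistics that can be sketched with the standard library alone. The sketch below assumes the classic pooled-variance two-sample t-statistic and a one-way ANOVA F-statistic; the choice of these variants and the function names are assumptions, and p-values (which need the t and F distributions) are left to a statistics package.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def sum_sq_dev(values: Sequence[float]) -> float:
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def pooled_t_statistic(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-sample t-statistic with pooled variance (e.g., EM vs. non-EM)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (sum_sq_dev(group_a) + sum_sq_dev(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))


def anova_f_statistic(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F-statistic (e.g., one group of biomarker values per stage)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum_sq_dev(g) for g in groups)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```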
[00129] In non-limiting embodiments, two ML models may be produced, one each capable of diagnosing EM presence and stage, as well as two multinomial logistic regression models, one each for disease presence and stage. This will allow for direct comparison of the ML models to the regression models for each prediction using receiver operating characteristic (ROC) curves and their areas under the curve (AUC), accuracy, specificity, and sensitivity. It is hypothesized that ML will improve upon multinomial logistic regression prediction of the presence and stage of EM based on these assessments.
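As an illustration of how the AUC used in this comparison can be obtained, the following sketch computes it from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the example scores are assumptions for illustration only.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))
```

Applying the same function to the predicted probabilities of an ML model and of a regression model on the same patients would give two directly comparable AUC values.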
[00130] Training and Selection of the Optimal ML Models: To build the ML models, the proper algorithm framework must be used to train and test models. In compliance with the FDA’s “Locked Model Policy” initiated by the Software as a Medical Device (SaMD) group, the appropriate model for the database may be selected at the outset of the study. To determine the optimal algorithm, automated ML (AutoML) will be used to analyze all known models with the data to generate the best hold-out-validation score and the most important input data for the models. It is expected that classification models for disease presence and/or stage will be trained based on random forest, gradient-boosted ensemble, or neural networks. The output of AutoML will be the framework for the ML models.
[00131] Determine Feature Importance of Patient-Specific Inputs: The training data, which will be a random selection of 80%, or 160 patients, of the patient-specific data with and without EM as described herein, will be input to the ML pipeline (FIG. 2). Gini importance, computed in Python, will be used to determine the most important features, or input variables, in the ML models iteratively using feature importance and dimensionality reduction. Input variables with correlation values greater than 0.6 will be considered important to the model and will be retained for training of the models. It is expected that the results of this step will mimic the results seen in the multinomial logistic regression model.
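The random 80% selection and the 0.6 retention threshold described above can be sketched as follows. This is a minimal illustration under stated assumptions: the total of 200 patients is inferred from the 160/40 split, the "correlation value" is taken to be the absolute Pearson correlation between a feature and the outcome label, and all function names are hypothetical. It does not compute Gini importance.

```python
import math
import random
from typing import Dict, Iterable, List, Sequence, Tuple


def split_80_20(patient_ids: Iterable[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign 80% of patients to training and hold out the remaining 20%."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    cut = int(round(0.8 * len(ids)))
    return ids[:cut], ids[cut:]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(
    columns: Dict[str, Sequence[float]], labels: Sequence[float], threshold: float = 0.6
) -> List[str]:
    """Keep features whose absolute correlation with the label exceeds the threshold."""
    return [name for name, values in columns.items() if abs(pearson(values, labels)) > threshold]
```

For example, `split_80_20(range(200))` yields 160 training identifiers and 40 held-out identifiers, matching the patient counts stated in the text.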
[00132] ML Model Training: The ML models will be trained and tested using 80% of the dataset described herein, a total of 160 patients. One ML model classifies whether a patient does or does not have EM (FIGS. 3 and 5). For a patient identified as having EM, a second ML model predicts the stage of EM in the patient (FIGS. 4 and 5). A final subset of indices will be used to train ML classification models for determining presence and/or staging of disease. Python libraries along with in-house Python scripts will leverage scikit-learn (sklearn), XGBoost, and the Tree-based Pipeline Optimization Tool (TPOT). TPOT may not be implemented given the sample size, though AutoML performed as described herein will determine this. The desired model outcome would include a receiver operating characteristic (ROC) curve with an area under the curve (AUC) greater than 0.85 (typically, an AUC of greater than 0.70 would reveal a positive direction in clinical utility), which would reveal a significant improvement in accurate diagnosis since the current rate of diagnosis is 40%. The final outputs for the training of the ML models will be the AUC of the ROCs, accuracy, sensitivity, and specificity.
[00133] As shown in FIG. 7, a proof-of-concept prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the proposed method. This initial ML model relied on 72 patients with 7-fold cross-validation and resulted in a 0.752 AUC for diagnosing EM accurately (FIG. 7, panel A). The AUC of 0.752 indicates that the model has a degree of discriminability of outcomes, but the initial results are based on internal cross-validation. The overall model had an accuracy of 76.4% (FIG. 7, panel B) when predicting the presence of EM (accurately predicting 9.7% with no EM and 66.7% with EM). The proof-of-concept prediction model pipeline serves as the basis for training the larger models and inputting unseen data to perform a more rigorous hold-out-validation.
[00134] As shown in FIGS. 8A-8B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, an efficient linear support vector machine (SVM), relied on 241 patients with 5-fold cross-validation and resulted in a 0.6453 AUC for diagnosing the presence of EM accurately (FIG. 8A). The overall model had an accuracy of 72.6% (FIG. 8B) when predicting the presence of EM (accurately predicting 28.6% (18/(18+45) = 0.286 * 100 = 28.6%) with no EM and 88.2% (157/(157+21) = 0.882 * 100 = 88.2%) with EM).
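The percentages reported for FIG. 8B follow from the four confusion-matrix counts given in the text (18, 45, 157, and 21, which sum to the 241 patients). The sketch below reproduces that arithmetic; the function name and the labeling of the counts as true/false negatives and positives are interpretive assumptions based on how the fractions are written.

```python
from typing import Dict


def presence_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Specificity, sensitivity, and overall accuracy from binary confusion-matrix counts."""
    return {
        "specificity": tn / (tn + fp),          # correctly predicted among patients with no EM
        "sensitivity": tp / (tp + fn),          # correctly predicted among patients with EM
        "accuracy": (tn + tp) / (tn + fp + fn + tp),
    }


# Counts taken from the worked fractions in the text: 18/(18+45) and 157/(157+21).
fig_8b = presence_metrics(tn=18, fp=45, fn=21, tp=157)
fig_8b_percent = {name: round(100 * value, 1) for name, value in fig_8b.items()}
```

Here `fig_8b_percent` holds 28.6 for specificity, 88.2 for sensitivity, and 72.6 for accuracy, consistent with the figures stated above.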
[00135] As shown in FIGS. 9A-9B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, a subspace discriminant, relied on 241 patients with 5-fold cross-validation and resulted in a 0.5812 AUC for diagnosing the presence of EM accurately, a 0.7162 AUC for diagnosing whether the subject has stages 1 or 2 of EM accurately, and a 0.7782 AUC for diagnosing whether the subject has stages 3 or 4 of EM accurately (FIG. 9A). The overall model had an accuracy of 52.4% (FIG. 9B) when predicting the presence and stage of EM (accurately predicting 23.8% with no EM, 72.7% with stage 1 or 2 of EM, and 50% with stage 3 or 4 of EM).
[00136] As shown in FIGS. 10A-10B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, a subspace discriminant, relied on 177 patients with 5-fold cross-validation and resulted in a 0.7645 AUC for diagnosing the stage (e.g., stages 1 or 2 or stages 3 or 4) of EM accurately among known endometriotic patients (FIG. 10A). The overall model had an accuracy of 72.9% (FIG. 10B) when predicting the stage of EM (accurately predicting 83.6% with stage 1 or 2 of EM and 55.2% with stage 3 or 4 of EM).
[00137] As shown in FIGS. 11A-11B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, a subspace discriminant, relied on 238 patients with 5-fold cross-validation and resulted in a 0.6429 AUC for diagnosing the presence of EM accurately (FIG. 11A). The overall model had an accuracy of 74.4% (FIG. 11B) when predicting the presence of EM (accurately predicting 21.7% with no EM and 92.1% with EM).
[00138] As shown in FIGS. 12A-12B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, a linear SVM, relied on 239 patients with 5-fold cross-validation and resulted in a 0.5981 AUC for diagnosing the presence of EM accurately, a 0.6544 AUC for diagnosing whether the subject has stages 1 or 2 of EM accurately, and a 0.7352 AUC for diagnosing whether the subject has stages 3 or 4 of EM accurately (FIG. 12A). The overall model had an accuracy of 53.6% (FIG. 12B) when predicting the presence and stage of EM (accurately predicting 19.7% with no EM, 76.6% with stage 1 or 2 of EM, and 46.3% with stage 3 or 4 of EM).
[00139] As shown in FIGS. 13A-13B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, a subspace discriminant, relied on 177 patients with 5-fold cross-validation and resulted in a 0.7462 AUC for diagnosing the stage (e.g., stages 1 or 2 or stages 3 or 4) of EM accurately among known endometriotic patients (FIG. 13A). The overall model had an accuracy of 74% (FIG. 13B) when predicting the stage of EM (accurately predicting 85.3% with stage 1 or 2 of EM and 55.9% with stage 3 or 4 of EM).
[00140] As shown in FIGS. 14A-14B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, an efficient linear SVM, relied on 236 patients with 5-fold cross-validation and resulted in a 0.6165 AUC for diagnosing the presence of EM accurately (FIG. 14A). The overall model had an accuracy of 76.7% (FIG. 14B) when predicting the presence of EM (accurately predicting 34.4% with no EM and 89.9% with EM).
[00141] As shown in FIGS. 15A-15B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, an efficient linear SVM, relied on 239 patients with 5-fold cross-validation and resulted in a 0.5825 AUC for diagnosing the presence of EM accurately, a 0.6128 AUC for diagnosing whether the subject has stages 1 or 2 of EM accurately, and a 0.7839 AUC for diagnosing whether the subject has stages 3 or 4 of EM accurately (FIG. 15A). The overall model had an accuracy of 52.3% (FIG. 15B) when predicting the presence and stage of EM (accurately predicting 32.8% with no EM, 61.8% with stage 1 or 2 of EM, and 54.4% with stage 3 or 4 of EM).
[00142] As shown in FIGS. 16A-16B, another prediction model of this disease has been established using a preliminary database of clinical data and survey data to demonstrate the clinical relevance and accuracy of the present disclosure. The ML model, a subspace discriminant, relied on 176 patients with 5-fold cross-validation and resulted in a 0.7236 AUC for diagnosing the stage (e.g., stages 1 or 2 or stages 3 or 4) of EM accurately among known endometriotic patients (FIG. 16A). The overall model had an accuracy of 69.9% (FIG. 16B) when predicting the stage of EM (accurately predicting 76.6% with stage 1 or 2 of EM and 58.8% with stage 3 or 4 of EM).
[00143] Multinomial Logistic Regression Analysis of Patient-Specific Inputs to Identify Disease Presence: Statistical analyses, namely statistical classification models, have previously been used as EM clinical screening tools using symptom-based inputs. However, a combination of non-invasive biomarkers, clinical data, and patient-generated survey responses has yet to be included in a stochastic model for EM presence or stage prediction. Multinomial logistic regression analysis will be used to compare against the ML models being developed. This multinomial logistic regression model will rely on the same inputs as the ML models described herein, will also undergo feature importance and dimensionality reduction, and will output an AUC of the ROC, accuracy, specificity, and sensitivity to be compared against the ML models.
[00144] Validation of Model for Prediction Accuracy: The remaining 20% of the dataset, which was not randomly selected for use in training and testing the model and totals 40 patients, will be utilized for validation. This technique is known as hold-out validation and is considered a common and preferred method of validating ML models for clinical approaches. Hold-out validation will be the primary method of model validation for the entirety of development. A model will be considered validated when the prediction from the model aligns with the outcome the patient experienced (+/- EM presence and Stage 1, 2, 3, or 4).
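The hold-out check described above, in which each model prediction is compared against the outcome the patient experienced, can be sketched as a simple agreement rate over the held-out patients. The function name, the feature key `score`, and the toy predictor are hypothetical and serve only to show the comparison.

```python
from typing import Callable, Iterable, Mapping, Tuple

Features = Mapping[str, float]


def holdout_agreement(
    predict: Callable[[Features], str],
    holdout: Iterable[Tuple[Features, str]],
) -> float:
    """Fraction of held-out patients whose predicted label matches the observed outcome."""
    pairs = list(holdout)
    if not pairs:
        raise ValueError("hold-out set is empty")
    matches = sum(1 for features, outcome in pairs if predict(features) == outcome)
    return matches / len(pairs)


# Hypothetical predictor and four made-up hold-out records (assumptions for illustration).
def toy_predict(features: Features) -> str:
    return "EM" if features["score"] > 5 else "no EM"


toy_holdout = [
    ({"score": 8}, "EM"),
    ({"score": 2}, "no EM"),
    ({"score": 7}, "no EM"),
    ({"score": 1}, "no EM"),
]
```

The same function applies to stage labels (e.g., "stage 1" through "stage 4") because it only compares predicted and observed labels for equality.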
[00145] In non-limiting embodiments, a framework for a non-invasive diagnostic tool for EM using ML and multinomial logistic regression may be created. This framework utilizes clinical data, patient-generated survey data, and/or biomarker data to determine both the presence and stage of EM. The desired marker for a successfully trained ML algorithm would be a model that has an ROC with an AUC greater than 0.85. Two trained and validated ML models may be produced, which have been compared to standard statistical techniques and will identify the presence and severity of EM.
[00146] Although embodiments have been described in detail for the purpose of illustration, it is to be understood that such detail is solely for that purpose and that the disclosure is not limited to the disclosed embodiments or aspects, but, on the contrary, is intended to cover modifications and equivalent arrangements that are within the spirit and scope of the appended claims. For example, it is to be understood that the present disclosure contemplates that, to the extent possible, one or more features of any embodiment or aspect can be combined with one or more features of any other embodiment or aspect.

Claims

THE INVENTION CLAIMED IS
1. A computer-implemented method of training a machine learning model for identifying the presence of endometriosis in a patient, comprising: receiving, with at least one processor, biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; receiving, with at least one processor, clinical data relating to one or more clinical features of one or more subjects; receiving, with at least one processor, survey data relating to one or more validated, subjective parameters of one or more subjects; and training, with at least one processor and based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
2. The computer-implemented method of claim 1, wherein the one or more biomarkers are obtained from the one or more subjects’ saliva, urine, and/or blood.
3. The computer-implemented method of claim 1, wherein the one or more biomarkers comprise one or more genetic markers.
4. The computer-implemented method of claim 1, wherein the one or more biomarkers comprise one or more single nucleotide polymorphisms (SNPs).
5. The computer-implemented method of claim 4, wherein the one or more SNPs comprise one or more SNPs in WNT4, GREB1, and/or KDR.
6. The computer-implemented method of claim 1, wherein the one or more biomarkers comprise one or more extracellular vesicles, hormones, neurotransmitters, growth factors, peptides, cytokines, glycoproteins, and/or enzymes.
7. The computer-implemented method of claim 1, wherein the one or more biomarkers comprise presence and/or levels of one or more cells.
8. The computer-implemented method of claim 7, wherein the one or more cells comprise one or more immune cells and/or stromal cells.
9. The computer-implemented method of claim 1, wherein the one or more clinical features comprises one or more of the one or more subjects’ age, race, family history of endometriosis, medication history, age of first menstrual cycle, weight, height, body mass index, previously-diagnosed diseases or conditions, comorbidities, obstetric history, and/or substance use history.
10. The computer-implemented method of claim 1, wherein the one or more validated, subjective parameters comprise one or more scores of a visual analog scale.
11. The computer-implemented method of claim 10, wherein the one or more scores of the visual analog scale comprise one or more numeric ratings of the one or more subjects’ self-perception of pain, self-perception of control and/or powerlessness, self-perception of social support, self-perception of emotional wellbeing, self-image, duration of pain, location of pain, menstrual cycle length, and/or menstrual cycle irregularities before and/or after hormone therapy.
12. The computer-implemented method of claim 1, wherein the machine learning model comprises a classification model.
13. The computer-implemented method of claim 1, wherein the machine learning model is trained with one or more of a k-nearest algorithm, a naive bayes algorithm, and/or a neural network.
14. A computer-implemented method of identifying the presence of endometriosis in a patient, comprising: training, with at least one processor, a machine learning model based at least on: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects; applying, with at least one processor, the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient; and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis.
15. The computer-implemented method of claim 14, wherein the one or more biomarkers are obtained from the one or more subjects’ saliva, urine, and/or blood.
16. The computer-implemented method of claim 14, wherein the one or more biomarkers comprise one or more genetic markers.
17. The computer-implemented method of claim 14, wherein the one or more biomarkers comprise one or more single nucleotide polymorphisms (SNPs).
18. The computer-implemented method of claim 17, wherein the one or more SNPs comprise one or more SNPs in WNT4, GREB1, and/or KDR.
19. The computer-implemented method of claim 14, wherein the one or more biomarkers comprise one or more extracellular vesicles, hormones, neurotransmitters, growth factors, peptides, cytokines, glycoproteins, and/or enzymes.
20. The computer-implemented method of claim 14, wherein the one or more biomarkers comprise presence and/or levels of one or more cells.
21. The computer-implemented method of claim 20, wherein the one or more cells comprise one or more immune cells and/or stromal cells.
22. The computer-implemented method of claim 14, wherein the one or more clinical features comprises one or more of the one or more subjects’ age, race, family history of endometriosis, medication history, age of first menstrual cycle, weight, height, body mass index, previously-diagnosed diseases or conditions, comorbidities, obstetric history, and/or substance use history.
23. The computer-implemented method of claim 14, wherein the one or more validated, subjective parameters comprise one or more scores of a visual analog scale.
24. The computer-implemented method of claim 23, wherein the one or more scores of the visual analog scale comprise one or more numeric ratings of the one or more subjects’ self-perception of pain, self-perception of control and/or powerlessness, self-perception of social support, self-perception of emotional wellbeing, self-image, duration of pain, location of pain, menstrual cycle length, and/or menstrual cycle irregularities before and/or after hormone therapy.
25. The computer-implemented method of claim 14, wherein the machine learning model comprises a classification model.
26. The computer-implemented method of claim 14, wherein the machine learning model is trained with one or more of a k-nearest algorithm, a naive bayes algorithm, and/or a neural network.
27. The computer-implemented method of claim 14, further comprising, after determining that the patient has endometriosis, determining, with at least one processor, a stage of the patient’s endometriosis.
28. A system comprising at least one processor programmed or configured to: receive biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; receive clinical data relating to one or more clinical features of one or more subjects; receive survey data relating to one or more validated, subjective parameters of one or more subjects; and train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
29. The system of claim 28, wherein the at least one processor is programmed or configured to train the machine learning model using one or more of a k-nearest algorithm, a naive Bayes algorithm, and/or a neural network.
30. The system of claim 28, wherein the machine learning model is a classification model.
31. A system comprising at least one processor programmed or configured to: train a machine learning model based at least on: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects; apply the machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis.
32. The system of claim 31, wherein the at least one processor is programmed or configured to train the machine learning model using one or more of a k-nearest algorithm, a naive Bayes algorithm, and/or a neural network.
33. The system of claim 31, wherein the machine learning model is a classification model.
34. The system of claim 31, wherein the at least one processor is further programmed or configured to, after determining that the patient has endometriosis, determine a stage of the patient’s endometriosis.
35. A computer-implemented method of identifying the presence of endometriosis in a patient, comprising: applying, with at least one processor, a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from the patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determining, with at least one processor, that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
36. A system comprising at least one processor programmed or configured to: apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
37. A non-transitory, computer-readable medium comprising programming instructions that, when executed by at least one processor, cause the at least one processor to: receive biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; receive clinical data relating to one or more clinical features of one or more subjects; receive survey data relating to one or more validated, subjective parameters of one or more subjects; and train, based at least on the biomarker data, the clinical data, and the survey data, a machine learning model.
38. A non-transitory, computer-readable medium comprising programming instructions that, when executed by at least one processor, cause the at least one processor to: apply a trained machine learning model to one or more of biomarker data, clinical data, and/or survey data obtained from a patient; and based on applying the trained machine learning model to the biomarker data, clinical data, and/or survey data obtained from the patient, determine that the patient has endometriosis or that the patient does not have endometriosis, wherein the machine learning model is trained with at least: biomarker data relating to presence and/or levels of one or more biomarkers obtained from one or more subjects; clinical data relating to one or more clinical features of one or more subjects; and survey data relating to one or more validated, subjective parameters of one or more subjects.
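By way of non-limiting illustration only, the training and applying steps recited in the claims above might be sketched as follows. The sketch assumes a simple k-nearest-neighbors classifier written in plain Python; the `Subject` record, the `KNearestClassifier` class, the numeric feature encodings, the min-max scaling, and every data value are hypothetical and are not taken from the application.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subject:
    """Hypothetical record holding the three data types named in the claims."""
    biomarker: List[float]   # e.g. SNP presence (0/1) and a biomarker level
    clinical: List[float]    # e.g. age and body mass index
    survey: List[float]      # e.g. visual analog scale scores
    has_endometriosis: Optional[bool] = None  # label; None for a new patient


def features(s: Subject) -> List[float]:
    # Concatenate biomarker, clinical, and survey data into one vector.
    return s.biomarker + s.clinical + s.survey


class KNearestClassifier:
    """Minimal k-nearest-neighbors classifier (illustrative only)."""

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, subjects: List[Subject]) -> "KNearestClassifier":
        rows = [features(s) for s in subjects]
        cols = list(zip(*rows))
        # Min-max scaling so that no single feature dominates the distance.
        self.lo = [min(c) for c in cols]
        self.span = [(max(c) - min(c)) or 1.0 for c in cols]
        self.points = [(self._scale(r), s.has_endometriosis)
                       for r, s in zip(rows, subjects)]
        return self

    def _scale(self, row: List[float]) -> List[float]:
        return [(v - lo) / span
                for v, lo, span in zip(row, self.lo, self.span)]

    def predict(self, patient: Subject) -> bool:
        q = self._scale(features(patient))
        # Majority vote among the k closest labeled subjects.
        nearest = sorted(self.points, key=lambda p: math.dist(q, p[0]))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Synthetic training subjects (values invented for illustration).
subjects = [
    Subject([1.0, 0.8], [29, 22.5], [8, 7], True),
    Subject([1.0, 0.9], [34, 24.0], [9, 6], True),
    Subject([0.0, 0.7], [31, 23.1], [7, 8], True),
    Subject([0.0, 0.2], [27, 21.0], [2, 1], False),
    Subject([0.0, 0.1], [40, 26.3], [3, 2], False),
    Subject([1.0, 0.3], [36, 25.0], [1, 2], False),
]

model = KNearestClassifier(k=3).fit(subjects)

# Apply the trained model to data obtained from a (hypothetical) patient.
patient = Subject([1.0, 0.85], [30, 23.0], [8, 7])
print("patient has endometriosis:", model.predict(patient))
```

A naive Bayes algorithm or a neural network, both also recited in the claims, could be substituted for the classifier in this sketch; a multi-class label in place of the Boolean label would be one possible way to represent a stage of endometriosis.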

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463642330P 2024-05-03 2024-05-03
US63/642,330 2024-05-03

Publications (1)

Publication Number Publication Date
WO2025229620A1 (en) 2025-11-06

Family

ID=97561246

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2025/054638 Pending WO2025229620A1 (en) 2024-05-03 2025-05-02 System and method for early diagnosis of endometriosis

Country Status (1)

Country Link
WO (1) WO2025229620A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016198749A1 (en) * 2015-06-12 2016-12-15 Turun Yliopisto Diagnostic biomarkers, clinical variables, and techniques for selecting and using them
WO2020051530A1 (en) * 2018-09-07 2020-03-12 Juneau Biosciences, L.L.C. Methods of using genetic markers associated with endometriosis
US20210232934A1 (en) * 2020-01-28 2021-07-29 Color Genomics, Inc. Systems and methods for enhanced user specific predictions using machine learning techniques

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "EndoDx: A New Frontier in Endometriosis Diagnosis", NEWS.ENGINEERING.PITT.EDU, 10 April 2024 (2024-04-10), XP093370977, Retrieved from the Internet <URL:https://news.engineering.pitt.edu/endodx-a-new-frontier-in-endometriosis-diagnosis/#> *
BENDIFALLAH SOFIANE, PUCHAR ANNE, SUISSE STÉPHANE, DELBOS LÉA, POILBLANC MATHIEU, DESCAMPS PHILIPPE, GOLFIER FRANCOIS, TOUBOUL CYR: "Machine learning algorithms as new screening approach for patients with endometriosis", SCIENTIFIC REPORTS, NATURE PUBLISHING GROUP, US, vol. 12, no. 1, XP093370979, ISSN: 2045-2322, DOI: 10.1038/s41598-021-04637-2 *
GOLDSTEIN ANAT, COHEN SHANI: "Self-report symptom-based endometriosis prediction using machine learning", SCIENTIFIC REPORTS, NATURE PUBLISHING GROUP, US, vol. 13, no. 1, XP093370981, ISSN: 2045-2322, DOI: 10.1038/s41598-023-32761-8 *

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 25798153

Country of ref document: EP

Kind code of ref document: A1