WO2022250446A1 - Method and diagnostic device for determining the presence or absence of gastrointestinal disorders using a machine learning model - Google Patents

Method and diagnostic device for determining the presence or absence of gastrointestinal disorders using a machine learning model

Info

Publication number
WO2022250446A1
Authority
WO
WIPO (PCT)
Prior art keywords
machine learning
learning model
absence
microorganism
confirmed
Prior art date
Application number
PCT/KR2022/007418
Other languages
English (en)
Korean (ko)
Inventor
지요셉
박소영
Original Assignee
주식회사 에이치이엠파마
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 에이치이엠파마
Publication of WO2022250446A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • The present invention relates to a method and a diagnostic device for determining the presence or absence of a digestive disorder using a machine learning model.
  • A gastrointestinal disorder refers to abnormal digestion-related symptoms caused by abnormalities in digestive organs such as the stomach, intestines, duodenum, and liver. Digestive diseases are caused by irregular eating habits, mental stress, and irregular sleep and life patterns. Gastrointestinal disorders commonly occur in the esophagus, stomach, and duodenum, and also occur in the lower gastrointestinal tract, pancreas, and biliary tract.
  • A representative digestive disorder is functional gastrointestinal disorder.
  • Functional gastrointestinal disorder has become more common in modern times, and one in four adults has it.
  • A major characteristic of this disorder is that, although digestive function is poor, it is difficult to determine whether a disorder is present even when the stomach or intestines are examined by endoscopy.
  • Digestive disorders can develop into gastric ulcer, duodenal ulcer, gastric cancer, and the like, so determining whether a digestive disorder is present is important for disease prevention.
  • "Genome" refers to the genes contained in chromosomes.
  • "Microbiota" refers to the microbial community (microflora) in a given environment.
  • "Microbiome" refers to the genome of the total microbial community in a given environment.
  • The microbiome may also mean the combination of a genome and a microbiota.
  • Patent Registration No. 10-2057047, a prior art document, relates to a disease prediction device and a disease prediction method using the same, and discloses a prediction method that predicts a specific person's disease by comparing a vector extracted from that person's biosignal with a learning vector.
  • However, bacterial metagenome analysis is conventionally performed without a dedicated process such as culturing the sample, and the large bias between the samples of individual subjects makes it difficult to derive an accurate causative factor for digestive disorders.
  • The present invention is intended to solve the above problems by improving the performance of a machine learning model for diagnosing the presence or absence of digestive disorders, by selecting microorganism-related variables from a plurality of microorganism data based on the analysis result of a mixture obtained by mixing a sample with a composition similar to the intestinal environment.
  • One embodiment of the present invention is a method for determining the presence or absence of a digestive disorder using a machine learning model. The method may include: analyzing a mixture obtained by mixing an intestinal-derived substance collected from an individual with a composition similar to the intestinal environment; extracting a plurality of microbial data based on the analysis result of the mixture; selecting, from among the plurality of microbial data, microorganism-related variables to be used in a machine learning model based on a predetermined variable selection algorithm; training the machine learning model using the microorganism-related variables; and determining the presence or absence of a digestive disorder by inputting microbial data collected from the subject to be examined into the trained machine learning model.
  • The microorganism-related variables may include the content of one or more microorganisms selected from genera belonging to the families RF39, Lachnospiraceae, Enterobacteriaceae, Barnesiellaceae, Butyricicoccaceae, Bacteroidaceae, Streptococcaceae, and Anaerovoracaceae.
  • Another embodiment of the present invention is a device for diagnosing the presence or absence of a digestive disorder using a machine learning model, which collects a plurality of microbial data based on the analysis result of a mixture obtained by mixing an intestinal-derived material collected from an individual with a composition similar to the intestinal environment.
  • In this embodiment as well, the microorganism-related variables may include the content of one or more microorganisms selected from genera belonging to the families RF39, Lachnospiraceae, Enterobacteriaceae, Barnesiellaceae, Butyricicoccaceae, Bacteroidaceae, Streptococcaceae, and Anaerovoracaceae.
  • According to the present invention, the performance of a machine learning model for diagnosing the presence or absence of digestive disorders can be improved by selecting microorganism-related variables from a plurality of microorganism data based on the analysis result of a mixture obtained by mixing a sample with a composition similar to the intestinal environment.
  • FIG. 1 is a block diagram of a diagnostic device according to an embodiment of the present invention.
  • FIG. 2 is a diagram showing an MCMOD technique according to an embodiment of the present invention.
  • FIG. 3 is a diagram for explaining sample analysis through the MCMOD technique according to an embodiment of the present invention.
  • FIG. 4 is a diagram for explaining interpretation of sample analysis results through the MCMOD technique according to an embodiment of the present invention.
  • FIG. 5 is a diagram showing the result of determining the optimal range of the number of variables by checking, with a binomial deviance plot, the error value according to the number of variables for the analysis results according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and the method of a comparative example.
  • FIG. 6A is a diagram for explaining the importance of selected microorganism-related variables.
  • FIG. 6B is a diagram for explaining the importance of selected microorganism-related variables.
  • FIG. 7 is a diagram comparing analysis results of each sample according to a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and a method of a comparative example.
  • FIG. 8 is a diagram comparing analysis results of each sample according to a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and a method of a comparative example.
  • FIG. 9 is a diagram showing a receiver operating characteristic (ROC) curve and an area under a ROC curve (AUC) score of each of the XGB models according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and the method of a comparative example.
  • FIG. 10 is a diagram comparing performance of a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and an XGB model according to a method of a comparative example.
  • FIG. 11 is a diagram comparing performance of a machine learning model according to a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and a method of a comparative example.
  • FIG. 12A is a diagram showing LEfSe (linear discriminant analysis effect size) results according to a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
  • FIG. 12B is a diagram showing LEfSe (linear discriminant analysis effect size) results according to a method of a comparative example of the present invention.
  • FIG. 13A is a diagram showing the Pearson correlation for the distribution of microorganisms according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention.
  • FIG. 13B is a diagram showing the Pearson correlation for the distribution of microorganisms according to the method of a comparative example of the present invention.
  • FIG. 14A is a diagram showing the Pearson correlation for each microbial gene pathway prediction according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention.
  • FIG. 14B is a diagram showing the Pearson correlation for each microbial gene pathway prediction according to the method of a comparative example of the present invention.
  • FIG. 15 is a diagram comparing the amount of short-chain fatty acids (SCFAs) according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and the method of a comparative example.
  • FIG. 16 is a flowchart illustrating a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
  • A "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.
  • Some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to the terminal or device.
  • Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.
  • The diagnostic device 1 may include a microorganism data extraction unit 100, a variable selection unit 110, a learning unit 120, and a diagnosis unit 130.
  • The diagnostic device 1 may be a determination device for determining whether a digestive disorder is present.
  • Examples of the diagnostic device 1 may include a personal computer such as a desktop or laptop computer, as well as a mobile terminal capable of wired or wireless communication.
  • A mobile terminal is a wireless communication device that guarantees portability and mobility, and may include not only smartphones, tablet PCs, and wearable devices, but also various devices equipped with communication modules such as Bluetooth (BLE, Bluetooth Low Energy), NFC, RFID, ultrasonic, infrared, Wi-Fi, and LiFi.
  • The diagnostic device 1 is not limited to the form shown in FIG. 1 or to the examples given above.
  • The diagnostic device 1 may detect, in a sample collected from an individual, a biomarker for diagnosing the presence or absence of a digestive disorder caused by an abnormality in the intestinal environment.
  • The diagnostic device 1 may diagnose the presence or absence of a digestive disorder based on data derived through a sample preparation process, a sample preprocessing process, a sample analysis process, and a data analysis process.
  • Here, diagnosis may mean determining or predicting the presence or absence of a digestive disorder from an output value of a machine learning model.
  • The biomarker may be a substance detected in the intestine, and may specifically include, but is not limited to, intestinal flora, endotoxin, hydrogen sulfide, intestinal microbial metabolites, and short-chain fatty acids.
  • The microbial data extraction unit 100 may extract a plurality of microbial data based on an analysis result of a mixture obtained by mixing a sample collected from an individual with a composition similar to the intestinal environment.
  • The plurality of microbial data may be divided into training data (training set) used for learning and test data (test set). The split ratio may vary, for example 9:1, 7:3, or 5:5, and is preferably 7:3.
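The 7:3 division described above can be sketched as follows. This is a minimal illustration that assumes a seeded random shuffle; the source does not state how records are assigned to each set, and the function name `split_train_test` and the sample identifiers are hypothetical.

```python
import random


def split_train_test(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(samples)
    # A fixed seed keeps the split reproducible (an assumption for this sketch).
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers standing in for 160 microbial data records.
samples = [f"sample_{i:03d}" for i in range(160)]
train_set, test_set = split_train_test(samples)
print(len(train_set), len(test_set))  # 112 48
```

Passing `train_fraction=0.9` or `0.5` gives the other ratios mentioned in the text.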
  • A pretreatment is performed to analyze a mixture in which a sample is mixed with an intestinal environment-like composition.
  • The pretreatment may be referred to as MCMOD (Meta-culture Multi-Omics Diagnose).
  • In the pretreatment, the fecal-derived microbiome and metabolites are analyzed in vitro using fecal samples from humans and various animals, which most readily represent the microbial environment in the body.
  • "Subject" means any organism that has an abnormality in the intestinal environment, has developed or may develop a disease due to such an abnormality, or needs improvement of the intestinal environment. Specific examples include, without limitation, mammals such as mice, monkeys, cattle, pigs, mini-pigs, livestock, and humans, as well as birds and farmed fish.
  • "Sample" means a material derived from the subject, and may be, for example, a material derived from the intestine.
  • The sample may specifically be cells, urine, feces, or the like, but its type is not limited as long as substances present in the intestine, such as intestinal flora, intestinal microbial metabolites, endotoxins, and short-chain fatty acids, can be detected in it.
  • The composition similar to the intestinal environment may be a composition for mimicking, in vitro, an environment identical or similar to the intestinal environment of the subject.
  • The intestinal environment-like composition may be a culture medium composition, but is not limited thereto.
  • The intestinal environment-like composition may include L-cysteine hydrochloride and mucin.
  • L-cysteine hydrochloride is one of the amino acid enhancers. It plays an important role in metabolism as a component of glutathione in vivo, and is also used to prevent browning of fruit juice and oxidation of vitamin C.
  • L-cysteine hydrochloride may be included at a concentration of, for example, 0.001% (w/v) to 5% (w/v), specifically 0.01% (w/v) to 0.1% (w/v).
  • L-cysteine hydrochloride is one of various formulations or forms of L-cysteine, and the composition may include not only L-cysteine itself but also L-cysteine in the form of other salts.
  • Mucin is a mucous substance secreted from mucous membranes; types include submandibular gland mucin, gastric mucin, and small intestine mucin. It is known to be an energy source that can be used as a carbon source and nitrogen source.
  • Mucin may be included at a concentration of, for example, 0.01% (w/v) to 5% (w/v), specifically 0.1% (w/v) to 1% (w/v), but is not limited thereto.
  • The intestinal environment-like composition may contain no nutrients other than mucin, and may specifically be characterized in that it does not contain nitrogen sources and/or carbon sources such as proteins and carbohydrates.
  • The protein serving as the carbon source and nitrogen source may be one or more of tryptone, peptone, and yeast extract, but is not limited thereto, and may specifically be tryptone.
  • The carbohydrate serving as a carbon source may be one or more of monosaccharides such as glucose, fructose, and galactose, and disaccharides such as maltose and lactose, but is not limited thereto, and may specifically be glucose.
  • The intestinal environment-like composition may not contain glucose and tryptone, but is not limited thereto.
  • The composition similar to the intestinal environment may further include at least one selected from the group consisting of sodium chloride (NaCl), sodium bicarbonate (NaHCO3), potassium chloride (KCl), and hemin. For example, sodium chloride may be included at a concentration of 10 to 100 mM, sodium bicarbonate at 10 to 100 mM, potassium chloride at 1 to 30 mM, and hemin at 1×10^-6 g/L to 1×10^-4 g/L, but the concentrations are not limited thereto.
  • The mixture can be incubated for 18 to 24 hours under anaerobic conditions.
  • In an anaerobic chamber, equal amounts of a homogenized mixture of feces and medium are dispensed into a culture plate such as a 96-well plate.
  • The culture may be carried out for 12 to 48 hours, specifically for 18 to 24 hours, but is not limited thereto.
  • Each experimental group is fermented and cultured by incubating the plate under anaerobic conditions with temperature, humidity, and motion similar to those of the intestinal environment.
  • The culture in which the mixture was grown is then analyzed.
  • The analysis of the culture may determine, for example, the content, concentration, and type of one or more of endotoxin, hydrogen sulfide, short-chain fatty acids (SCFAs), and intestinal flora-derived metabolites contained in the culture.
  • Endotoxin is a toxic substance found inside bacterial cells and is an antigen composed of a complex of proteins, polysaccharides, and lipids.
  • The endotoxin may include, but is not limited to, LPS (lipopolysaccharide), and the LPS may specifically be Gram-negative and pro-inflammatory.
  • Short-chain fatty acids are fatty acids having six or fewer carbon atoms and are representative metabolites produced by intestinal microorganisms. They have useful functions in the body, such as increasing immunity, stabilizing intestinal lymphocytes, lowering insulin signaling, and stimulating sympathetic nerves.
  • The short-chain fatty acids may include one or more selected from the group consisting of formate, acetate, propionate, butyrate, isobutyrate, valerate, and isovalerate, but are not limited thereto.
  • Various analytical methods available to those skilled in the art can be used for the analysis, such as absorbance analysis, chromatography, gene analysis such as next-generation sequencing, and metagenomic analysis.
  • The supernatant and the precipitate of the culture can each be analyzed.
  • Metabolites, short-chain fatty acids, toxic substances, and the like may be analyzed from the supernatant, and intestinal flora analysis may be performed on the precipitate.
  • For the intestinal flora analysis, after extracting all the genomes in the sample, the intestinal bacteria can be analyzed through genome-based analysis such as real-time PCR using the bacteria-specific primers suggested in the GULDA method, or through metagenome analysis such as next-generation sequencing.
  • According to the present invention, deviation between learning data can be reduced by optimizing the learning data before machine learning, through analysis of cultures in a state in which the intestinal environment is reproduced in vitro using the intestinal environment-like composition.
  • This facilitates the selection of the microorganism-related variables described later, and training the machine learning model on these variables can improve its performance. The accuracy of diagnosing the presence or absence of digestive disorders with the trained machine learning model can therefore be increased.
  • The variable selection unit 110 may select microorganism-related variables from among the plurality of microorganism data as the variables to be used in the machine learning model (i.e., feature selection), based on a preset variable selection algorithm.
  • The number of microorganism-related variables can be between 3 and 10.
  • The optimal number of microorganism-related variables may be 10.
  • If too many variables (features or attributes) are used, problems such as overfitting of the machine learning model or a decrease in prediction accuracy occur.
  • The variable selection algorithm may include, for example, at least one of a Boruta algorithm and a recursive feature elimination (RFE) algorithm.
  • The microorganism-related variables selected by the preset variable selection algorithm may include the content of one or more microorganisms selected from genera belonging to the families RF39, Lachnospiraceae, Enterobacteriaceae, Barnesiellaceae, Butyricicoccaceae, Bacteroidaceae, Streptococcaceae, and Anaerovoracaceae.
  • The microorganism-related variables selected by the preset variable selection algorithm may further include, for example, the content of one or more microorganisms selected from species belonging to the genera Coprobacter, Ruminococcus, Butyricicoccus, Bacteroides, and Streptococcus.
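To make the idea of recursive feature elimination concrete, the following is a minimal pure-Python sketch. It assumes a small logistic regression fitted by gradient descent as the ranking model and uses the absolute weight as the importance score; the source does not specify which model drives the elimination, and the taxon names and toy abundances are hypothetical.

```python
import math


def standardize(X):
    """Scale each column to zero mean and unit variance."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit_logistic(X, y, epochs=300, lr=0.1):
    """Fit logistic regression by batch gradient descent; return the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def recursive_feature_elimination(X, y, names, n_keep):
    """Refit the model repeatedly, dropping the weakest feature each round."""
    Xs = standardize(X)
    active = list(range(len(names)))
    while len(active) > n_keep:
        sub = [[row[j] for j in active] for row in Xs]
        weights = fit_logistic(sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(weights[k]))
        del active[weakest]
    return [names[j] for j in active]


# Toy data: taxon_A and taxon_C differ between classes, taxon_B does not.
names = ["taxon_A", "taxon_B", "taxon_C"]
base = [(0.1, 0.3, 0), (0.2, 0.4, 0), (0.8, 0.6, 1), (0.9, 0.7, 1)]
X = [[a, b, c] for a, c, _ in base for b in (0.4, 0.6)]
y = [label for _, _, label in base for _ in (0.4, 0.6)]
print(recursive_feature_elimination(X, y, names, n_keep=2))  # ['taxon_A', 'taxon_C']
```

Setting `n_keep` between 3 and 10 would correspond to the range of variable counts given in the text.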
  • The learning unit 120 may train a machine learning model using the microorganism-related variables.
  • The learning unit 120 may train the machine learning model to predict the presence or absence of a digestive disorder for each microbial data record by performing supervised learning based on the label indicating the presence or absence of a digestive disorder for each record of microbial data (learning data) and the content of the microorganisms corresponding to the selected variables.
  • The machine learning model may include, for example, at least one of a linear regression analysis (LRA) model, a random forest model, a generalized linear (GLMNET) model, a gradient boosting model, and an extreme gradient boost (XGB) model.
  • The diagnosis unit 130 may diagnose the presence or absence of a digestive disorder by inputting the microbial data collected from the subject to be examined into the trained machine learning model.
  • The diagnosis unit 130 may diagnose a digestive disorder based on the output value of the machine learning model. That is, the diagnosis unit 130 may determine the presence or absence of a digestive disorder in the subject to be examined, or predict the probability that a digestive disorder occurs in that subject, based on the output value of the machine learning model.
  • Example 1: Microorganism-related variables selected with a recursive feature elimination algorithm, with and without MCMOD treatment
  • A pretreatment is performed to analyze a mixture in which a sample is mixed with an intestinal environment-like composition.
  • This pretreatment may be referred to as MCMOD.
  • The comparative example relates to a method for determining the presence or absence of digestive disorders using microbial data extracted after only a conventional pretreatment, without performing the above-described pretreatment on the sample.
  • The conventional pretreatment used in the comparative example is named SMOD.
  • The samples were MCMOD and SMOD microbial data from a simple clinical data set (feces) based on self-reported responses from 44 patients with gastrointestinal tract disorders (disease group) and 154 healthy people (normal group). Oversampling and undersampling were performed on the data set to resolve class imbalance, converting it into a data set of 160 records in total: 82 normal records and 78 digestive disorder records.
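The class balancing described above can be sketched as follows. The source gives only the group sizes before and after resampling; this sketch assumes plain random undersampling of the normal group and random oversampling (with replacement) of the disease group, and the function name `rebalance` and the record identifiers are hypothetical.

```python
import random


def rebalance(majority, minority, n_majority, n_minority, seed=0):
    """Randomly undersample `majority` and oversample `minority`."""
    if n_majority > len(majority) or n_minority < len(minority):
        raise ValueError("targets must shrink the majority and grow the minority")
    rng = random.Random(seed)
    # Undersampling: keep a random subset without replacement.
    kept = rng.sample(majority, n_majority)
    # Oversampling: keep every original record and add random duplicates.
    extra = [rng.choice(minority) for _ in range(n_minority - len(minority))]
    return kept, list(minority) + extra


normal = [f"normal_{i}" for i in range(154)]
disease = [f"disease_{i}" for i in range(44)]
normal_bal, disease_bal = rebalance(normal, disease, n_majority=82, n_minority=78)
print(len(normal_bal), len(disease_bal), len(normal_bal) + len(disease_bal))  # 82 78 160
```

Other resampling schemes, such as synthetic oversampling, would produce the same counts; the choice here is only for illustration.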
  • The microbial data were divided into training data (training set) used for learning and test data (test set) at a ratio of 7:3.
  • Variable selection was performed on the training data using the Boruta algorithm, a binomial deviance plot, and the XGB model to select the microorganism-related variables to be used in the machine learning model. The test data were used to evaluate the performance of the machine learning model, as described below.
  • Table 1 shows the results of primary variable selection through the Boruta algorithm for the analysis results according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention.
  • Table 2 shows the results of primary variable selection through the Boruta algorithm for the analysis results according to the comparative example method.
  • FIG. 5 shows the result of identifying the optimal number of variables by checking, with a binomial deviance plot, the error value according to the number of variables for the analysis results according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and the method of the comparative example. The optimal range of the number of variables was 3 to 10 for MCMOD and 1 to 5 for SMOD.
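The error value plotted in a binomial deviance plot can be computed as in the sketch below. It assumes the mean binomial deviance, -2/n · Σ[y·ln(p) + (1-y)·ln(1-p)], as the error measure; the source does not give the formula, and the deviance values per variable count are made-up numbers for illustration, not results from the patent.

```python
import math


def binomial_deviance(y_true, p_pred, eps=1e-12):
    """Mean binomial deviance: -2/n * sum(y*ln(p) + (1-y)*ln(1-p))."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total / len(y_true)


def best_variable_count(deviance_by_count):
    """Return the number of variables with the lowest deviance."""
    return min(deviance_by_count, key=deviance_by_count.get)


# An uninformative prediction of 0.5 for every sample gives 2*ln(2).
print(round(binomial_deviance([0, 1], [0.5, 0.5]), 4))  # 1.3863

# Hypothetical deviance values per number of variables.
print(best_variable_count({3: 0.9, 5: 0.7, 10: 0.8}))  # 5
```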
  • FIGS. 6A and 6B show the importance of the selected microorganism-related variables.
  • A plurality of microorganism-related variables may be selected through the XGB model. The figures show the 10 microorganism-related variables with high accuracy for MCMOD and the 5 for SMOD.
  • Among the selected microorganism-related variables, one with high accuracy may be a microorganism of the RF39 family.
  • FIG. 7 and FIG. 8 are diagrams comparing the analysis results of each sample according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and the method of the comparative example.
  • The beta diversity of each fecal sample is expressed as a PCoA plot using the unweighted UniFrac distance. As shown in the PCoA plot of FIG. 7(a), the MCMOD-treated fecal samples are relatively clustered, whereas the fecal samples not treated with MCMOD are relatively scattered.
  • FIG. 7(c) shows the distance between eight points in each group (example and comparative example) on the PCoA plot.
  • With MCMOD treatment, the bias between the fecal samples is small, so the fecal samples have relatively little noise and therefore little variability.
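One way to quantify how tightly a group of points is clustered on an ordination plot is the mean pairwise distance within the group. The sketch below assumes Euclidean distance between plot coordinates; the source does not say how the distances in FIG. 7(c) were computed, and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of points in a group."""
    pairs = list(combinations(points, 2))
    if not pairs:
        raise ValueError("need at least two points")
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical coordinates: a tight group and the same shape spread 3x wider.
clustered = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
scattered = [(3.0 * x, 3.0 * y) for x, y in clustered]
print(mean_pairwise_distance(clustered) < mean_pairwise_distance(scattered))  # True
```

A smaller value indicates less sample-to-sample variability within the group.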
  • MCMOD treatment of fecal samples before variable selection and machine learning therefore facilitates variable selection, and training the machine learning model on the selected variables can improve its performance, as described later.
  • Comparative Example 2: Comparison of the performance of machine learning models trained on learning data obtained from fecal samples with and without MCMOD treatment
  • Microbial data were extracted from the fecal samples collected in Example 1 with MCMOD treatment (example) and without MCMOD treatment (comparative example).
  • The optimal number of variables was set through a binomial deviance plot, and a plurality of microorganism-related variables were selected for the XGB model.
  • FIG. 9 is a diagram showing a receiver operating characteristic (ROC) curve and an area under a ROC curve (AUC) score of each of the XGB models according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and the method of a comparative example.
  • FIG. 10 is a diagram comparing the performance of the XGB models according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and the method of a comparative example.
  • FIG. 11 is a diagram comparing the performance of machine learning models according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and the method of a comparative example.
  • FIG. 12a is a diagram showing LEfSe according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
  • FIG. 12b is a diagram showing LEfSe according to the method of a comparative example.
  • FIG. 13a is a diagram showing the Pearson's correlation for the distribution of microorganisms according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
  • FIG. 13b is a diagram showing the Pearson's correlation for the distribution of microorganisms according to the method of a comparative example.
  • FIG. 14a is a diagram showing the Pearson's correlation for each microbial gene pathway prediction according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
  • FIG. 14b is a diagram showing the Pearson's correlation for each microbial gene pathway prediction according to the method of a comparative example.
  • FIG. 15 is a diagram comparing the amount of short-chain fatty acids (SCFAs) according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and the method of a comparative example.
  • The average sensitivity (average true positive rate), average specificity (average true negative rate), accuracy, and AUC are all higher in the example than in the comparative example, indicating that the microbial data of the example outperform those of the comparative example.
  • That is, with the microbial data of the example, the XGB model's ability to discriminate the presence or absence of a digestive disorder increases.
  • FIG. 11 shows the ROC curves and AUC scores of each machine learning model. As shown in FIG. 11, when the machine learning models are trained using the microbial data of the example, the performance of every machine learning model is higher than with the data of the comparative example.
  • FIGS. 12a and 12b show the microorganisms characteristically found in the disease group and in the normal group. Referring to FIGS. 12a and 12b, more microbial taxa are identified by the LEfSe analysis of the example than by that of the comparative example.
  • That is, the example distinguishes the normal group from the patient group more clearly than the comparative example.
  • FIGS. 14a and 14b compare the Pearson correlation between each microbial gene pathway abundance and the above-described numerical data. Referring to FIGS. 13a, 13b, 14a, and 14b, the Pearson correlation of the example data is higher than that of the comparative example, which shows that the method for determining the presence or absence of a digestive disorder according to the embodiment is more advantageous than the method according to the comparative example.
  • FIG. 15 is a diagram comparing the amount of short-chain fatty acids in the data of the example and the data of the comparative example. In general, a higher absolute amount of short-chain fatty acids (acetic acid, propionic acid, butyric acid) is known to be more beneficial.
  • In the comparative example, the disease group has a higher amount than the normal group, whereas in the example either the average of the normal group is higher or, even where the disease group has a higher amount, the difference is smaller than in the comparative example.
  • FIG. 16 is a flowchart illustrating a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
  • The method for determining the presence or absence of a digestive disorder according to the embodiment shown in FIG. 16 includes steps processed in time sequence in the diagnosis device shown in FIG. 1. Therefore, content described above for the diagnosis device, even if omitted below, also applies to the method for determining the presence or absence of a digestive disorder performed according to the embodiment shown in FIG. 16.
  • In step S1700, a mixture of the intestine-derived material collected from the subject and a composition simulating the intestinal environment may be analyzed.
  • In step S1710, data of a plurality of microorganisms may be extracted based on the analysis result of the mixture.
  • A microorganism-related variable to be used in the machine learning model may be selected from the plurality of microorganism data based on a preset variable selection algorithm.
  • The machine learning model may be trained using the microorganism-related variable.
  • The method for determining the presence or absence of a digestive disorder described with reference to FIG. 16 may be implemented in the form of a computer program stored in a medium, or in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer.
  • Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may include computer storage media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
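
The performance comparison above is stated in terms of sensitivity, specificity, accuracy, and AUC. As a minimal sketch of how these four quantities can be computed for any binary classifier, the following uses only the Python standard library. The labels, scores, function names, and the 0.5 decision threshold are hypothetical illustrations; they are not the data, code, or settings of the described embodiment.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_score: Sequence[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn); label 1 = disorder present, 0 = normal."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(y_true: Sequence[int], y_score: Sequence[float],
                    threshold: float = 0.5) -> Dict[str, float]:
    """Sensitivity (true positive rate), specificity (true negative rate), accuracy."""
    tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (the Mann-Whitney formulation of the ROC area).
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

print(summary_metrics(y_true, y_score))
print(roc_auc(y_true, y_score))
```

Running the same functions on the scores of two models, one trained on each data set, gives a like-for-like comparison of the kind reported for the example and the comparative example. The pairwise AUC is quadratic in the number of samples, which is acceptable for small cohorts; a rank-based computation would be preferable for large ones.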

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Primary Health Care (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioethics (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

A method for determining the presence or absence of gastrointestinal disorders using a machine learning model may comprise the steps of: analyzing a mixture of an intestine-derived material collected from a subject and a simulated intestinal environment composition; extracting a plurality of microorganism data based on the analysis result of the mixture; selecting, from the plurality of microorganism data, a microorganism-related variable to be used in a machine learning model based on a preset variable selection algorithm; training the machine learning model using the microorganism-related variable; and determining the presence or absence of gastrointestinal disorders by inputting microorganism data collected from a subject to be tested into the trained machine learning model. The microorganism-related variable may include the content of at least one of the genera belonging to the families RF39, Lachnospiraceae, Enterobacteriaceae, Barnesiellaceae, Butyricicoccaceae, Bacteroidaceae, Streptococcaceae, and Anaerovoracaceae.
PCT/KR2022/007418 2021-05-25 2022-05-25 Diagnostic method and device for determining the presence or absence of gastrointestinal disorders using a machine learning model WO2022250446A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210066614A KR20220158950A (ko) 2021-05-25 2021-05-25 Method and diagnostic device for determining the presence or absence of digestive disorders using a machine learning model
KR10-2021-0066614 2021-05-25

Publications (1)

Publication Number Publication Date
WO2022250446A1 (fr) 2022-12-01

Family

ID=84228971

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/007418 WO2022250446A1 (fr) 2021-05-25 2022-05-25 Diagnostic method and device for determining the presence or absence of gastrointestinal disorders using a machine learning model

Country Status (2)

Country Link
KR (1) KR20220158950A (fr)
WO (1) WO2022250446A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
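WO2012115885A1 (fr) * 2011-02-22 2012-08-30 Caris Life Sciences Luxembourg Holdings, S.A.R.L. Circulating biomarkers
JP2020507308A (ja) * 2016-12-28 2020-03-12 Ascus Biosciences, Inc. Methods, apparatuses, and systems for analyzing microorganism strains of complex heterogeneous communities, determining their functional relationships and interactions, and diagnostics and biological state management based thereon
KR20200054203A (ko) * 2017-08-14 2020-05-19 Psomagen, Inc. Disease-associated microbiome characterization process
KR20200090135A (ko) * 2019-01-18 2020-07-28 ChunLab, Inc. Irritable bowel syndrome-specific microbial biomarkers and method for predicting the risk of irritable bowel syndrome using the same
KR102241357B1 (ko) * 2020-10-20 2021-04-16 HEM Inc. Method and apparatus for diagnosing colorectal polyps using a machine learning model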

Also Published As

Publication number Publication date
KR20220158950A (ko) 2022-12-02

Similar Documents

Publication Publication Date Title
WO2022203351A1 (fr) Diagnostic method and device for determining the presence or absence of enteritis using a machine learning model
WO2022085941A1 (fr) Method and apparatus for determining the presence or absence of colon polyps using a machine learning model
Rodrigues et al. Transkingdom interactions between Lactobacilli and hepatic mitochondria attenuate western diet-induced diabetes
Tyler et al. Analyzing the human microbiome: a “how to” guide for physicians
WO2022203350A1 (fr) Diagnostic method and device for determining the presence or absence of atopy using a machine learning model
Mai et al. Distortions in development of intestinal microbiota associated with late onset sepsis in preterm infants
Sacchetti et al. Gut microbiome investigation in celiac disease: from methods to its pathogenetic role
WO2021040159A1 (fr) Method for screening a personalized intestinal environment-improving substance using a PMAS method
Sheth et al. Evidence of transmission of Clostridium difficile in asymptomatic patients following admission screening in a tertiary care hospital
Guard et al. HORSE SPECIES SYMPOSIUM: Canine intestinal microbiology and metagenomics: From phylogeny to function
Hong et al. Identification of Neisseria meningitidis by MALDI-TOF MS may not be reliable
WO2019160284A1 (fr) Method for diagnosing stroke through bacterial metagenome analysis
WO2018155950A1 (fr) Method for diagnosing diabetes through microbial metagenome analysis
WO2022203353A1 (fr) Diagnostic method and device for determining the presence or absence of constipation using a machine learning model
Asakura et al. Long-term grow-out affects Campylobacter jejuni colonization fitness in coincidence with altered microbiota and lipid composition in the cecum of laying hens
WO2022203306A1 (fr) Diagnostic method and device for determining hyperglycemia using a machine learning model
Nouioui et al. Streptacidiphilus bronchialis sp. nov., a ciprofloxacin-resistant bacterium from a human clinical specimen; reclassification of Streptomyces griseoplanus as Streptacidiphilus griseoplanus comb. nov. and emended description of the genus Streptacidiphilus
WO2022250446A1 (fr) Diagnostic method and device for determining the presence or absence of gastrointestinal disorders using a machine learning model
WO2022250447A1 (fr) Diagnostic method and apparatus for determining the presence of intestinal disease using a machine learning model
WO2022250444A1 (fr) Diagnostic method and device for determining the presence or absence of abdominal distension using a machine learning model
WO2022250445A1 (fr) Diagnostic method and apparatus for determining the presence of stomachache using a machine learning model
WO2022203307A1 (fr) Method for determining whether obesity is present using a machine learning model, and diagnostic device
WO2021049834A1 (fr) Method for diagnosing colorectal cancer based on metagenome and metabolite of extracellular vesicles
WO2018155967A1 (fr) Method for diagnosing chronic obstructive respiratory disease through bacterial metagenome analysis
Wongkuna et al. Taxono-genomics description of Olsenella lakotia SW165 T sp. nov., a new anaerobic bacterium isolated from cecum of feral chicken

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22811639

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 22811639

Country of ref document: EP

Kind code of ref document: A1