WO2022250446A1 - Method and diagnostic device for determining the presence or absence of gastrointestinal disorders using a machine learning model - Google Patents
- Publication number: WO2022250446A1 (international application PCT/KR2022/007418)
- Authority: WO, WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- The present invention relates to a method and a diagnostic device for determining the presence or absence of a digestive disorder using a machine learning model.
- A gastrointestinal disorder is the occurrence of abnormal digestion-related symptoms caused by abnormalities in the digestive system, such as the stomach, intestines, duodenum, and liver. Digestive disorders can be caused by irregular eating habits, mental stress, and irregular sleep and lifestyle patterns. They commonly occur in the esophagus, stomach, and duodenum, and also occur in the lower gastrointestinal tract, pancreas, and biliary tract.
- A representative digestive disorder is functional gastrointestinal disorder.
- Functional gastrointestinal disorder has become more common in modern times; one in four adults has it.
- A major characteristic of this disorder is that, even though digestive function is impaired, it is difficult to determine whether a disorder is present even when the stomach or intestines are examined by endoscopy.
- Because digestive disorders can progress to gastric ulcer, duodenal ulcer, gastric cancer, and the like, determining whether a digestive disorder is present is important for disease prevention.
- The genome refers to the genes contained in chromosomes.
- The microbiota, or microflora, refers to the microbial community in an environment.
- The microbiome refers to the genomes of the entire microbial community in an environment.
- The microbiome may also mean the combination of a genome and a microbiota.
- Patent Registration No. 10-2057047, a prior-art document, discloses a disease prediction device and a disease prediction method using the same, in which a vector extracted from a specific person's biosignal is compared with a learning vector to predict that person's disease.
- In such approaches, bacterial metagenome analysis is performed without any special processing such as culturing the sample, and the large bias between the samples of individual subjects makes it difficult to derive accurate causative factors for digestive disorders.
- The present invention is intended to solve the above problems by improving the performance of a machine learning model that diagnoses the presence or absence of digestive disorders, through selecting microorganism-related variables from a plurality of microorganism data obtained from the analysis result of a mixture of a sample and a composition similar to the intestinal environment.
- One embodiment of the present invention is a method for determining the presence or absence of digestive disorders using a machine learning model. The method may include: analyzing a mixture obtained by mixing an intestinal-derived material collected from an individual with a composition similar to the intestinal environment; extracting a plurality of microbial data based on the analysis result of the mixture; selecting, based on a predetermined variable selection algorithm, microorganism-related variables to be used in the machine learning model from among the plurality of microbial data; training the machine learning model using the microorganism-related variables; and determining whether there is a digestive disorder by inputting microbial data collected from a subject to be examined into the trained machine learning model.
- The microorganism-related variables may include the content of one or more microorganisms selected from genera belonging to the families RF39, Lachnospiraceae, Enterobacteriaceae, Barnesiellaceae, Butyricicoccaceae, Bacteroidaceae, Streptococcaceae, and Anaerovoracaceae.
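To make the listed variables concrete, the sketch below turns one sample's taxonomic read counts into genus-level relative abundances restricted to the eight families named above. The input format `{(family, genus): read_count}` and the function name `genus_content_features` are hypothetical, and normalising by the sample's total reads is an assumption: the description does not state how "content" is computed.

```python
# Families listed in the description; "RF39" is kept exactly as the source writes it.
TARGET_FAMILIES = frozenset({
    "RF39", "Lachnospiraceae", "Enterobacteriaceae", "Barnesiellaceae",
    "Butyricicoccaceae", "Bacteroidaceae", "Streptococcaceae", "Anaerovoracaceae",
})


def genus_content_features(counts, families=TARGET_FAMILIES):
    """Relative abundance (0..1) of every genus whose family is in `families`.

    `counts` is a hypothetical per-sample mapping {(family, genus): read_count}.
    Abundances are normalised by the sample's total reads (an assumption), so
    genera from other families still count toward the denominator.
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    features = {}
    for (family, genus), reads in counts.items():
        if family in families:
            features[genus] = features.get(genus, 0.0) + reads / total
    return dict(sorted(features.items()))


# Illustrative counts only, not data from the patent.
sample = {
    ("Bacteroidaceae", "Bacteroides"): 50,
    ("Streptococcaceae", "Streptococcus"): 20,
    ("Ruminococcaceae", "Ruminococcus"): 30,  # family not in the list, excluded
}
print(genus_content_features(sample))  # {'Bacteroides': 0.5, 'Streptococcus': 0.2}
```

Keeping the full read total in the denominator means each feature stays comparable across samples even when the selected families make up different shares of the community.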
- Another embodiment of the present invention is a device for diagnosing the presence or absence of digestive disorders using a machine learning model, which collects a plurality of microbial data based on the analysis result of a mixture obtained by mixing an intestinal-derived material collected from an individual with a composition similar to the intestinal environment.
- In this device as well, the microorganism-related variables may include the content of one or more microorganisms selected from genera belonging to the families RF39, Lachnospiraceae, Enterobacteriaceae, Barnesiellaceae, Butyricicoccaceae, Bacteroidaceae, Streptococcaceae, and Anaerovoracaceae.
- According to the present invention, the performance of a machine learning model for diagnosing the presence or absence of digestive disorders can be improved by selecting microorganism-related variables from a plurality of microorganism data based on the analysis result of a mixture obtained by mixing a sample with a composition similar to the intestinal environment.
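The description leaves the "predetermined variable selection algorithm" and the model type open at this point (the figures later refer to XGB models). The following stdlib-only sketch works within those gaps: it ranks variables with a hypothetical mean-difference criterion, keeps the top k, and fits a plain logistic regression by gradient descent as a stand-in classifier. The names `rank_variables`, `train_logistic`, `fit_selected`, and `diagnose` are illustrative, not the patent's.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rank_variables(X, y):
    """Order feature indices by |mean(disorder) - mean(control)|, largest first.

    Hypothetical stand-in for the unspecified variable selection algorithm.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def gap(j):
        return abs(sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg))

    return sorted(range(len(X[0])), key=gap, reverse=True)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias with full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fit_selected(X, y, k):
    """Keep the top-k ranked variables, then train on those columns only."""
    selected = rank_variables(X, y)[:k]
    w, b = train_logistic([[row[j] for j in selected] for row in X], y)
    return selected, w, b


def diagnose(model, row, threshold=0.5):
    """True if the predicted probability of a disorder reaches the threshold."""
    selected, w, b = model
    z = b + sum(wj * row[j] for wj, j in zip(w, selected))
    return sigmoid(z) >= threshold


# Toy rows [feature_0, feature_1, feature_2]; label 1 = disorder (illustrative only).
X = [[0.90, 0.30, 0.5], [0.80, 0.40, 0.4], [0.85, 0.35, 0.6],
     [0.10, 0.50, 0.5], [0.20, 0.60, 0.6], [0.15, 0.55, 0.4]]
y = [1, 1, 1, 0, 0, 0]
model = fit_selected(X, y, k=1)
print(model[0], diagnose(model, [0.88, 0.5, 0.5]), diagnose(model, [0.12, 0.5, 0.5]))
```

Training only on the selected columns is the point the paragraph makes: fewer, more informative variables give the classifier less noise to fit. A gradient-boosted model could replace `train_logistic` without changing the selection step.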
- FIG. 1 is a block diagram of a diagnostic device according to an embodiment of the present invention.
- FIG. 2 is a diagram showing an MCMOD technique according to an embodiment of the present invention.
- FIG. 3 is a diagram for explaining sample analysis through the MCMOD technique according to an embodiment of the present invention.
- FIG. 4 is a diagram for explaining interpretation of sample analysis results through the MCMOD technique according to an embodiment of the present invention.
- FIG. 5 is a binomial deviance plot showing the error value as a function of the number of variables, for the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and for the method of a comparative example, used to determine the optimal range of the number of variables.
- FIG. 6A is a diagram for explaining the importance of the selected microorganism-related variables.
- FIG. 6B is a diagram for explaining the importance of the selected microorganism-related variables.
- FIG. 7 is a diagram comparing the per-sample analysis results of the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention with those of the method of a comparative example.
- FIG. 8 is a diagram comparing the per-sample analysis results of the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention with those of the method of a comparative example.
- FIG. 9 is a diagram showing a receiver operating characteristic (ROC) curve and an area under a ROC curve (AUC) score of each of the XGB models according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and the method of a comparative example.
- ROC: receiver operating characteristic
- AUC: area under the ROC curve
- FIG. 10 is a diagram comparing the performance of the XGB models according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and the method of a comparative example.
- FIG. 11 is a diagram comparing the performance of machine learning models according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and the method of a comparative example.
- FIG. 12A is a diagram showing the LEfSe (linear discriminant analysis effect size) results according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
- FIG. 12B is a diagram showing the LEfSe (linear discriminant analysis effect size) results according to the method of a comparative example.
- FIG. 13A is a diagram showing the Pearson correlation of microorganism distributions according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention.
- FIG. 13B is a diagram showing the Pearson correlation of microorganism distributions according to the method of a comparative example.
- FIG. 14A is a diagram showing the Pearson correlation of microbial gene pathway predictions according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention.
- FIG. 14B is a diagram showing the Pearson correlation of microbial gene pathway predictions according to the method of a comparative example.
- SCFAs: short-chain fatty acids
- FIG. 16 is a flowchart illustrating a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
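FIG. 5 and FIG. 9 rely on two standard evaluation quantities: binomial deviance as a function of the number of variables, $D = -2\sum_i \left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$, and the area under the ROC curve. A minimal sketch of both follows, assuming the usual definitions for 0/1 labels. How the patent computes its error values (for example, whether by cross-validation) is not described in this section, and the deviance table in the usage line is made up for illustration.

```python
import math


def binomial_deviance(y, p, eps=1e-12):
    """Binomial deviance -2 * sum[y*log(p) + (1 - y)*log(1 - p)] for 0/1 labels."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # clip so log() never sees 0
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -2.0 * total


def roc_auc(y, scores):
    """AUC = P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def best_num_variables(deviance_by_k):
    """Return the variable count whose held-out deviance (assumed input) is smallest."""
    return min(deviance_by_k, key=deviance_by_k.get)


print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.3, 0.1]))  # 0.75
print(best_num_variables({5: 1.20, 10: 0.90, 20: 1.10}))  # 10 (made-up values)
```

The pairwise AUC is quadratic in the sample count, which is fine for cohort-sized data; a rank-based formula gives the same value in O(n log n).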
- A "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. Further, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.
- Some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to the terminal or device.
- Likewise, some of the operations or functions described as being performed by the server may be performed in a terminal or device connected to that server.
- The diagnosis device 1 may include a microorganism data extraction unit 100, a variable selection unit 110, a learning unit 120, and a diagnosis unit 130.
- The diagnosis device 1 may be a determination device for determining whether a digestive disorder is present.
- Examples of the diagnosis device 1 include personal computers such as desktop or laptop computers, as well as mobile terminals capable of wired/wireless communication.
- A mobile terminal is a wireless communication device that ensures portability and mobility. It includes not only smartphones, tablet PCs, and wearable devices but also various devices equipped with communication modules such as Bluetooth (including BLE, Bluetooth Low Energy), NFC, RFID, ultrasonic, infrared, Wi-Fi, and LiFi modules.
- The diagnosis device 1 is not limited to the form shown in FIG. 1 or to the examples given above.
- The diagnosis device 1 may detect, in a sample collected from an individual, a biomarker for diagnosing the presence or absence of a digestive disorder caused by an abnormality in the intestinal environment.
- The diagnosis device 1 may diagnose the presence or absence of a digestive disorder based on the data derived from a sample preparation process, a sample pretreatment process, a sample analysis process, and a data analysis process.
- Here, diagnosis may mean determining or predicting the presence or absence of a digestive disorder from the output value of a machine learning model.
- The biomarker may be a substance detected in the intestine; specifically, it may include intestinal flora, endotoxins, hydrogen sulfide, intestinal microbial metabolites, short-chain fatty acids, and the like, but is not limited thereto.
- The microorganism data extraction unit 100 may extract a plurality of microorganism data based on the analysis result of a mixture obtained by mixing a sample collected from an individual with an intestinal environment-like composition.
- The plurality of microorganism data may be divided into training data (training set) and test data (test set) for learning. The split ratio may vary, for example 9:1, 7:3, or 5:5, and is preferably 7:3.
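A minimal sketch of the 7:3 division, assuming each microorganism data item carries a 0/1 label (1 = digestive disorder). The text gives only the ratio; splitting each class separately (stratification) so that both sets keep the class balance is an added assumption, and `stratified_split` is a hypothetical helper. The class sizes reuse the 78 disorder and 82 normal data items reported in Example 1.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_ratio=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class separately so that
    both sets keep roughly the original class proportions (assumed, not stated in the text)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(len(idxs) * train_ratio)  # 7:3 by default
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


# Hypothetical labels: 1 = digestive disorder (78 items), 0 = normal (82 items).
labels = [1] * 78 + [0] * 82
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 112 48
```

With 160 items this yields 112 training and 48 test items, and each class is split close to 7:3 on its own.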
- A pretreatment is performed so that a mixture of the sample and the intestinal environment-like composition can be analyzed.
- This pretreatment may be referred to as MCMOD (Meta-culture Multi-Omics Diagnose).
- In this pretreatment, fecal samples from humans and various animals, which most readily represent the microbial environment in the body, are used to analyze fecal-derived microbiomes and metabolites in vitro.
- A subject means any organism that has an abnormality in the intestinal environment, that has developed or may develop a disease due to such an abnormality, or whose intestinal environment needs improvement. Specific examples include, without limitation, mammals such as mice, monkeys, cattle, pigs, mini-pigs, other livestock, and humans, as well as birds and farmed fish.
- A sample means a material derived from the subject, for example a material derived from the intestine.
- Specifically, the sample may be cells, urine, feces, or the like, but its type is not limited as long as substances present in the intestine, such as intestinal flora, intestinal microbial metabolites, endotoxins, and short-chain fatty acids, can be detected in it.
- The intestinal environment-like composition may be a composition for reproducing, in vitro, an environment identical or similar to the intestinal environment of the subject.
- The intestinal environment-like composition may be a culture medium composition, but is not limited thereto.
- The intestinal environment-like composition may include L-cysteine hydrochloride and mucin.
- L-cysteine hydrochloride is an amino acid fortifier. In vivo, it plays an important role in metabolism as a component of glutathione, and it is also used to prevent the browning of fruit juice and the oxidation of vitamin C.
- L-cysteine hydrochloride may be included at a concentration of, for example, 0.001% (w/v) to 5% (w/v), and specifically 0.01% (w/v) to 0.1% (w/v).
- L-cysteine hydrochloride is one of several formulations of L-cysteine; the composition may include L-cysteine itself as well as L-cysteine in the form of other salts.
- Mucin is a mucous substance secreted by mucous membranes; types include submandibular gland mucin, gastric mucin, and small intestinal mucin. It is known to be an energy source that can serve as both a carbon source and a nitrogen source.
- Mucin may be included at a concentration of, for example, 0.01% (w/v) to 5% (w/v), and specifically 0.1% (w/v) to 1% (w/v), but is not limited thereto.
- The intestinal environment-like composition may contain no nutrients other than mucin; specifically, it may contain no nitrogen and/or carbon sources such as proteins and carbohydrates.
- The protein serving as a carbon and nitrogen source may be one or more of tryptone, peptone, and yeast extract, specifically tryptone, but is not limited thereto.
- The carbohydrate serving as a carbon source may be one or more of monosaccharides such as glucose, fructose, and galactose and disaccharides such as maltose and lactose, specifically glucose, but is not limited thereto.
- For example, the intestinal environment-like composition may contain neither glucose nor tryptone, but is not limited thereto.
- The intestinal environment-like composition may further include at least one selected from the group consisting of sodium chloride (NaCl), sodium bicarbonate (NaHCO3), potassium chloride (KCl), and hemin. For example, sodium chloride may be included at 10 to 100 mM, sodium bicarbonate at 10 to 100 mM, potassium chloride at 1 to 30 mM, and hemin at 1 × 10⁻⁶ g/L to 1 × 10⁻⁴ g/L, but the concentrations are not limited thereto.
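To turn the stated concentration ranges into weighing amounts, a small helper can convert % (w/v) and mM values into grams for a chosen medium volume. This is a sketch under stated assumptions: it uses standard molar masses (NaCl 58.44, NaHCO3 84.01, KCl 74.55 g/mol), applies the narrower "specifically" ranges where the text gives them, and the recipe values in the usage lines are illustrative, not taken from the document. `grams_needed` is a hypothetical name.

```python
# Standard molar masses in g/mol.
MOLAR_MASS = {"NaCl": 58.44, "NaHCO3": 84.01, "KCl": 74.55}

# (unit, low, high): the narrower ranges from the text where one is given.
RANGES = {
    "L-cysteine HCl": ("%w/v", 0.01, 0.1),
    "mucin": ("%w/v", 0.1, 1.0),
    "NaCl": ("mM", 10, 100),
    "NaHCO3": ("mM", 10, 100),
    "KCl": ("mM", 1, 30),
    "hemin": ("g/L", 1e-6, 1e-4),
}


def grams_needed(component: str, value: float, volume_l: float) -> float:
    """Grams of `component` to weigh out for `volume_l` litres at concentration `value`."""
    unit, low, high = RANGES[component]
    if not low <= value <= high:
        raise ValueError(f"{component}: {value} {unit} is outside {low}-{high} {unit}")
    if unit == "%w/v":
        g_per_l = value * 10.0  # 1 % (w/v) = 1 g per 100 mL = 10 g/L
    elif unit == "mM":
        g_per_l = value / 1000.0 * MOLAR_MASS[component]  # mmol/L -> mol/L -> g/L
    else:
        g_per_l = value  # already in g/L
    return g_per_l * volume_l


# Illustrative 0.5 L batch with concentrations chosen inside the stated ranges.
recipe = {"L-cysteine HCl": 0.05, "mucin": 0.5, "NaCl": 50, "NaHCO3": 50,
          "KCl": 10, "hemin": 1e-5}
for name, conc in recipe.items():
    print(f"{name}: {grams_needed(name, conc, 0.5):.6g} g")
```

Raising an error for out-of-range values keeps a typo (for example 50 mM KCl instead of 5 mM) from silently producing a medium outside the described composition.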
- The mixture may be cultured under anaerobic conditions for 18 to 24 hours.
- In an anaerobic chamber, equal amounts of the homogenized mixture of feces and medium are dispensed into a culture plate such as a 96-well plate.
- The culture may be carried out for 12 to 48 hours, specifically 18 to 24 hours, but is not limited thereto.
- Each experimental group is fermented by incubating the plate under anaerobic conditions at a temperature, humidity, and motion similar to those of the intestinal environment.
- The culture in which the mixture has been grown is then analyzed.
- The analysis of the culture may determine, for example, the content, concentration, and type of one or more of the endotoxins, hydrogen sulfide, short-chain fatty acids (SCFAs), and intestinal flora-derived metabolites contained in the culture.
- Endotoxin is a toxic substance found inside bacterial cells; it is an antigen composed of a complex of proteins, polysaccharides, and lipids.
- The endotoxin may include LPS (lipopolysaccharide), but is not limited thereto; specifically, the LPS may be a pro-inflammatory LPS derived from Gram-negative bacteria.
- Short-chain fatty acids are fatty acids with six or fewer carbon atoms and are representative metabolites produced by intestinal microorganisms. They have useful functions in the body, such as enhancing immunity, stabilizing intestinal lymphocytes, lowering insulin signaling, and stimulating sympathetic nerves.
- The short-chain fatty acids may include one or more selected from the group consisting of formate, acetate, propionate, butyrate, isobutyrate, valerate, and isovalerate, but are not limited thereto.
- For the analysis, any analytical method available to those skilled in the art may be used, such as absorbance analysis, chromatography, and genetic analysis including next-generation sequencing and metagenomic analysis.
- The supernatant and the precipitate of the culture may each be analyzed.
- Metabolites, short-chain fatty acids, toxic substances, and the like may be analyzed from the supernatant, and intestinal flora analysis may be performed on the precipitate.
- For the intestinal flora, after all genomic DNA in the sample has been extracted, the intestinal bacteria can be identified and analyzed by gene-based analysis such as real-time PCR with the bacteria-specific primers proposed in the GULDA method, or by metagenomic analysis such as next-generation sequencing.
- According to the present invention, cultures in which the intestinal environment has been reproduced in vitro with the intestinal environment-like composition are analyzed, which optimizes the learning data before machine learning and reduces the deviation between learning data.
- This facilitates the selection of the microorganism-related variables described below, and training the machine learning model on these variables improves its performance. Accordingly, the trained machine learning model can diagnose the presence or absence of digestive disorders with higher accuracy.
- The variable selection unit 110 may select, based on a preset variable selection algorithm, microorganism-related variables from among the plurality of microorganism data as the variables to be used in the machine learning model (i.e., perform feature selection).
- The number of microorganism-related variables may be between 3 and 10.
- The optimal number of microorganism-related variables may be 10.
- Here, variables may also be referred to as features or attributes.
- If variables are not selected appropriately, problems such as overfitting of the machine learning model or reduced prediction accuracy may occur.
- The preset variable selection algorithm may include, for example, at least one of the Boruta algorithm and the recursive feature elimination (RFE) algorithm.
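Recursive feature elimination repeatedly fits a model, discards the least important variable, and refits on what remains. The sketch below shows that loop in plain Python under stated assumptions: it swaps in an L2-regularized logistic regression (importance = absolute coefficient on standardized inputs) for the XGB or random-forest estimators a real pipeline would more likely use, and the data are synthetic, with only column 0 informative. All function names are hypothetical.

```python
import math
import random


def standardize(X):
    """Scale every column to zero mean and unit variance."""
    stats = []
    for col in zip(*X):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        stats.append((mean, sd))
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]


def fit_logistic(X, y, lr=0.5, epochs=200, l2=0.01):
    """Full-batch gradient descent for L2-regularized logistic regression; returns weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w


def rfe(X, y, n_keep):
    """Drop the feature with the smallest |weight| and refit until n_keep features remain."""
    Xs = standardize(X)
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        w = fit_logistic([[row[j] for j in remaining] for row in Xs], y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining


# Synthetic data: column 0 tracks the label, columns 1-3 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(200)]
X = [[2.0 * yi + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(3)] for yi in y]
selected = rfe(X, y, n_keep=1)
print(selected)
```

Refitting after every removal is what distinguishes RFE from ranking once by a single fit. scikit-learn's `sklearn.feature_selection.RFE` wraps the same loop around any estimator that exposes `coef_` or `feature_importances_`, which is how an XGB or random-forest estimator would typically be plugged in.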
- The microorganism-related variables selected by the preset variable selection algorithm may include the content of one or more microorganisms selected from the genera belonging to the families RF39, Lachnospiraceae, Enterobacteriaceae, Barnesiellaceae, Butyricicoccaceae, Bacteroidaceae, Streptococcaceae, and Anaerovoracaceae.
- The microorganism-related variables selected by the preset variable selection algorithm may further include, for example, the content of one or more microorganisms selected from the species belonging to the genera Coprobacter, Ruminococcus, Butyricicoccus, Bacteroides, and Streptococcus.
- The learning unit 120 may train a machine learning model using the microorganism-related variables.
- The learning unit 120 may train the machine learning model to predict the presence or absence of a digestive disorder for each microorganism data item by performing supervised learning based on the label indicating the presence or absence of a digestive disorder for each microorganism data item (learning data) and on the content of the microorganisms corresponding to the selected variables.
- The machine learning model may include, for example, at least one of a linear regression analysis (LRA) model, a random forest model, a generalized linear model (GLMNET), a gradient boosting model, and an extreme gradient boosting (XGB) model.
- The diagnosis unit 130 may diagnose the presence or absence of a digestive disorder by inputting microorganism data collected from a subject to be tested into the trained machine learning model.
- The diagnosis unit 130 may diagnose a digestive disorder based on the output value of the machine learning model, which indicates the presence or absence of a digestive disorder. That is, based on this output value, the diagnosis unit 130 may determine whether the subject to be tested has a digestive disorder or predict the probability that a digestive disorder will occur in the subject.
- Example 1: Microorganism-related variables selected by the recursive feature elimination algorithm with and without MCMOD treatment
- As described above, a pretreatment is performed so that a mixture of the sample and the intestinal environment-like composition can be analyzed.
- This pretreatment is referred to as MCMOD.
- The comparative example relates to a method for determining the presence or absence of digestive disorders using microbial data extracted after only a conventional pretreatment of the sample, without the above-described pretreatment.
- The conventional pretreatment of the comparative example is named SMOD.
- The samples were simple clinical data sets (feces) based on self-reported results from 44 patients with gastrointestinal tract disorders (disease group) and 154 healthy individuals (normal group), and the MCMOD and SMOD microbial data of these samples were used. To resolve the class imbalance, the data set was oversampled and undersampled, converting it into a total of 160 data sets comprising 82 normal data and 78 digestive disorder data.
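The class-balancing step can be illustrated with simple random resampling. The text does not say which over- or undersampling technique was used, so this sketch assumes plain random oversampling with replacement for the 44-sample disease group and random undersampling without replacement for the 154-sample normal group, reaching the reported 78 and 82 samples. `resample_to` is a hypothetical helper.

```python
import random


def resample_to(indices, target, rng):
    """Randomly undersample (without replacement) or oversample (with replacement) to `target`."""
    if target <= len(indices):
        return rng.sample(indices, target)
    extra = [rng.choice(indices) for _ in range(target - len(indices))]
    return list(indices) + extra


rng = random.Random(42)
disease = list(range(44))           # indices of the 44 self-reported disorder samples
normal = list(range(44, 44 + 154))  # indices of the 154 normal samples
balanced_disease = resample_to(disease, 78, rng)  # oversampled: 44 originals + 34 copies
balanced_normal = resample_to(normal, 82, rng)    # undersampled: 82 distinct samples
print(len(balanced_disease), len(balanced_normal), len(balanced_disease) + len(balanced_normal))
# 78 82 160
```

Because oversampling duplicates records, resampling only the training portion after the 7:3 split prevents copies of the same sample from landing in both the training and test sets.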
- The microbial data were divided into training data (train set) and test data (test set) at a ratio of 7:3.
- For the training data, variable selection was performed using the Boruta algorithm, a binomial deviance plot, and the XGB model to select the microorganism-related variables to be used in the machine learning model. The test data were used to evaluate the performance of the machine learning model, as described below.
- Table 1 shows the results of the first-round variable selection by the Boruta algorithm for the analysis results obtained with the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention.
- Table 2 shows the results of the first-round variable selection by the Boruta algorithm for the analysis results obtained with the method of the comparative example.
- FIG. 5 shows, for the analysis results obtained with the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and with the method of the comparative example, how the optimal number of variables was determined by checking the error value for each number of variables on a binomial deviance plot.
- The optimal number of variables was 3 to 10 for MCMOD and 1 to 5 for SMOD.
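The binomial deviance on such a plot is minus twice the log-likelihood of the 0/1 labels under the model's predicted probabilities, D = -2 Σ [y log p + (1 - y) log(1 - p)], so lower is better. A sketch of computing it and picking the variable count with the smallest value; the per-count deviances in the usage lines are made-up placeholders rather than results from the document, and in practice they would typically come from cross-validation. Both function names are hypothetical.

```python
import math


def binomial_deviance(y_true, p_pred, eps=1e-15):
    """-2 * log-likelihood of 0/1 labels given predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -2.0 * total


def best_variable_count(deviance_by_count):
    """Return the number of variables whose (e.g. cross-validated) deviance is lowest."""
    return min(deviance_by_count, key=deviance_by_count.get)


print(round(binomial_deviance([1, 0], [0.5, 0.5]), 4))  # 2.7726 (= 4 ln 2)
print(best_variable_count({1: 3.0, 3: 2.4, 5: 2.6}))    # 3
```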
- FIGS. 6A and 6B show the importance of the selected microorganism-related variables.
- A plurality of microorganism-related variables may be selected through the XGB model; the figures show the 10 microorganism-related variables with high accuracy for MCMOD and the 5 for SMOD.
- Among the selected microorganism-related variables, a variable with high accuracy may be a microorganism of the RF39 family.
- FIG. 7 is a diagram comparing the analysis results of each sample obtained with the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and with the method of the comparative example.
- The beta diversity of the fecal samples is shown as a PCoA plot based on the unweighted UniFrac distance. As the PCoA plot of FIG. 7(a) shows, the MCMOD-treated fecal samples are relatively clustered, whereas the untreated fecal samples are relatively scattered.
- FIG. 7(c) shows the distances between eight points in each group (example and comparative example) on the PCoA plot.
- For the MCMOD-treated fecal samples, the bias between samples is small, so the samples contain relatively little noise and show little variability.
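The tighter clustering in FIG. 7 can be quantified as the mean pairwise distance within each group. Computing unweighted UniFrac needs a phylogenetic tree, so this sketch substitutes the Jaccard distance on sets of observed taxa, which is what unweighted UniFrac reduces to on a star-shaped tree whose branches all have equal length. The taxa sets are toy values, not data from the document, and the function names are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of observed taxa."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def mean_within_group_distance(samples):
    """Average distance over all sample pairs in one group (smaller = tighter cluster)."""
    pairs = list(combinations(samples, 2))
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


tight = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"}]
spread = [{"A", "B"}, {"C", "D"}, {"A", "E"}]
print(round(mean_within_group_distance(tight), 4))   # 0.2222
print(round(mean_within_group_distance(spread), 4))  # 0.8889
```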
- Thus, MCMOD treatment of fecal samples before variable selection and model training facilitates variable selection, and training the machine learning model as described below can improve its performance.
- Comparative Example 2: Performance of machine learning models trained on learning data obtained from fecal samples with and without MCMOD treatment
- Microbial data were extracted from the fecal samples collected in Example 1 both with MCMOD treatment (example) and without MCMOD treatment (comparative example).
- The optimal number of variables was set with a binomial deviance plot, and a plurality of microorganism-related variables were selected for the XGB model.
- FIG. 9 is a diagram showing the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) score of each XGB model obtained with the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and with the method of a comparative example.
- FIG. 10 is a diagram comparing the performance of XGB models obtained with the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and with the method of a comparative example.
- FIG. 11 is a diagram comparing the performance of machine learning models obtained with the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention and with the method of a comparative example.
- FIG. 12A is a diagram showing LEfSe according to the method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
- FIG. 12B is a diagram showing LEfSe according to the method of the comparative example.
- FIG. 13A is a diagram showing Pearson's correlation for the distribution of microorganisms according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention.
- FIG. 13B is a diagram showing Pearson's correlation for the distribution of microorganisms according to the method of the comparative example.
- FIG. 14A is a diagram showing Pearson's correlation for each microbial gene pathway prediction according to the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention.
- FIG. 14B is a diagram showing Pearson's correlation for each microbial gene pathway prediction according to the method of the comparative example.
- FIG. 15 is a diagram comparing the amount of short-chain fatty acids (SCFAs) obtained with the method for determining the presence or absence of digestive disorders according to an embodiment of the present invention and with the method of the comparative example.
- The average sensitivity (average true positive rate), average specificity (1 - average false positive rate), accuracy, and AUC values are all higher in the example than in the comparative example; that is, with the microorganism data of the example, the XGB model discriminates the presence or absence of a digestive disorder better than with the data of the comparative example.
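The metrics compared above can all be computed from a model's predicted probabilities. This sketch computes AUC through its rank interpretation (the probability that a randomly chosen disorder sample scores higher than a randomly chosen normal one, ties counting one half) and sensitivity, specificity, and accuracy at a decision threshold. The 0.5 threshold, the six scores, and the function names are illustrative assumptions.

```python
def auc(labels, scores):
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity and accuracy when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # 1 - false positive rate
        "accuracy": (tp + tn) / len(labels),
    }


labels = [1, 1, 1, 0, 0, 0]  # 1 = digestive disorder
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
print(round(auc(labels, scores), 4))  # 0.8889
print(confusion_metrics(labels, scores))  # all three values are 2/3 for these scores
```

Specificity equals 1 minus the false positive rate, so a higher average specificity corresponds to a lower average false positive rate.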
- FIG. 11 shows the ROC curves and AUC scores of each machine learning model. As FIG. 11 shows, when the machine learning models are trained on the microbial data of the example, every model performs better than with the data of the comparative example.
- FIGS. 12A and 12B show the microorganisms that characteristically distinguish the disease group from the normal group. Referring to FIGS. 12A and 12B, more microbial taxa are identified in the LEfSe analysis of the example than in that of the comparative example.
- Therefore, the example distinguishes the normal group from the patient group more clearly than the comparative example.
- FIGS. 14A and 14B compare the Pearson correlation between the abundance of each microbial gene pathway and the above-described numerical data. Referring to FIGS. 13A, 13B, 14A, and 14B, the Pearson correlations of the example data are higher than those of the comparative example, which indicates that the digestive disorder determination method according to the embodiment is more advantageous than the method according to the comparative example.
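Pearson's correlation coefficient is the covariance of two variables divided by the product of their standard deviations. A dependency-free sketch for two equal-length abundance vectors follows (Python 3.10 and later also provide `statistics.correlation` for the same calculation); the pathway vectors are toy values, not data from the document, and `pearson` is a hypothetical name.

```python
import math


def pearson(x, y):
    """Pearson's r between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


pathway_a = [0.12, 0.30, 0.25, 0.40]  # toy abundances across four samples
pathway_b = [0.10, 0.28, 0.27, 0.45]
print(round(pearson(pathway_a, pathway_b), 3))
```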
- FIG. 15 compares the amounts of short-chain fatty acids in the data of the example and of the comparative example. In general, higher absolute amounts of short-chain fatty acids (acetic acid, propionic acid, butyric acid) are known to be more beneficial.
- The disease group has higher amounts than the normal group; in the example, however, either the mean of the normal group is higher or, where the disease group's amount is larger, the difference is smaller than in the comparative example.
- FIG. 16 is a flowchart illustrating a method for determining the presence or absence of a digestive disorder according to an embodiment of the present invention.
- The method according to the embodiment shown in FIG. 16 includes steps processed in time series by the diagnosis device shown in FIG. 1. Therefore, even where omitted below, the description given above also applies to the digestive disorder determination method performed according to the embodiment shown in FIG. 16.
- In step S1700, a mixture obtained by mixing the intestine-derived material collected from the subject with the intestinal environment-like composition may be analyzed.
- In step S1710, a plurality of microorganism data may be extracted based on the analysis result of the mixture.
- Next, microorganism-related variables to be used in the machine learning model may be selected from the plurality of microorganism data based on a preset variable selection algorithm.
- The machine learning model may then be trained using the microorganism-related variables.
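- Finally, the presence or absence of a digestive disorder may be determined by inputting microorganism data collected from a subject to be tested into the trained machine learning model.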
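The flow of FIG. 16 can be sketched end to end with deliberately simple stand-ins. Variable selection ranks taxa by the absolute difference of class means (in place of Boruta or RFE), and the model is a nearest-centroid classifier (in place of XGB or the other listed models). The sample dictionaries, taxon names, and function names are hypothetical; the sketch only shows how the selected variables connect training and diagnosis.

```python
from statistics import mean


def select_variables(samples, labels, k):
    """Stand-in selector: rank taxa by |mean(disorder) - mean(normal)| and keep the top k."""
    taxa = sorted({t for s in samples for t in s})

    def gap(taxon):
        pos = [s.get(taxon, 0.0) for s, y in zip(samples, labels) if y == 1]
        neg = [s.get(taxon, 0.0) for s, y in zip(samples, labels) if y == 0]
        return abs(mean(pos) - mean(neg))

    return sorted(taxa, key=gap, reverse=True)[:k]


def to_vector(sample, variables):
    """Content of each selected taxon, 0.0 when the taxon was not observed."""
    return [sample.get(t, 0.0) for t in variables]


def train(samples, labels, variables):
    """Stand-in model: one centroid per class (0 = normal, 1 = digestive disorder)."""
    model = {}
    for cls in (0, 1):
        rows = [to_vector(s, variables) for s, y in zip(samples, labels) if y == cls]
        model[cls] = [mean(col) for col in zip(*rows)]
    return model


def diagnose(model, sample, variables):
    """Return 1 (disorder) if the sample is closer to the disorder centroid, else 0."""
    x = to_vector(sample, variables)
    dist = {cls: sum((a - b) ** 2 for a, b in zip(x, c)) for cls, c in model.items()}
    return 1 if dist[1] < dist[0] else 0


# Toy relative abundances standing in for the extracted microorganism data.
samples = [
    {"RF39": 0.01, "Bacteroides": 0.30, "Escherichia": 0.01},
    {"RF39": 0.02, "Bacteroides": 0.28, "Escherichia": 0.02},
    {"RF39": 0.10, "Bacteroides": 0.29, "Escherichia": 0.08},
    {"RF39": 0.12, "Bacteroides": 0.31, "Escherichia": 0.07},
]
labels = [0, 0, 1, 1]
variables = select_variables(samples, labels, k=2)
model = train(samples, labels, variables)
print(variables)  # ['RF39', 'Escherichia']
print(diagnose(model, {"RF39": 0.09, "Escherichia": 0.06}, variables))  # 1
```

Keeping the selected variable list as an explicit value passed to both training and diagnosis ensures that a test subject's data is projected onto exactly the variables the model was trained on.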
- The digestive disorder determination method described with reference to FIG. 16 may be implemented in the form of a computer program stored in a medium, or in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer.
- Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media.
- Computer readable media may include computer storage media.
- Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Biomedical Technology (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Epidemiology (AREA)
- Pathology (AREA)
- Primary Health Care (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioethics (AREA)
- Molecular Biology (AREA)
- Physiology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
A method for determining the presence or absence of gastrointestinal disorders using a machine learning model may comprise the steps of: analyzing a mixture of an intestine-derived material collected from a subject and an intestinal environment-simulating composition; extracting a plurality of microorganism data on the basis of the result of analyzing the mixture; selecting, from among the plurality of microorganism data, a microorganism-related variable to be used in a machine learning model on the basis of a preset variable selection algorithm; training the machine learning model using the microorganism-related variable; and determining the presence or absence of gastrointestinal disorders by inputting microorganism data collected from a subject to be tested into the trained machine learning model. The microorganism-related variable may comprise the content of at least one of the genera belonging to the families RF39, Lachnospiraceae, Enterobacteriaceae, Barnesiellaceae, Butyricicoccaceae, Bacteroidaceae, Streptococcaceae, and Anaerovoracaceae.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020210066614A KR20220158950A (ko) | 2021-05-25 | 2021-05-25 | Method and diagnostic device for determining the presence or absence of digestive disorders using a machine learning model |
KR10-2021-0066614 | 2021-05-25 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022250446A1 (fr) | 2022-12-01 |
Family ID: 84228971
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2022/007418 WO2022250446A1 (fr) | Method and diagnostic device for determining the presence or absence of gastrointestinal disorders using a machine learning model | 2021-05-25 | 2022-05-25 |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR20220158950A (fr) |
WO (1) | WO2022250446A1 (fr) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
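WO2012115885A1 (fr) * | 2011-02-22 | 2012-08-30 | Caris Life Sciences Luxembourg Holdings, S.A.R.L. | Circulating biomarkers |
JP2020507308A (ja) * | 2016-12-28 | 2020-03-12 | アスカス バイオサイエンシーズ, インコーポレイテッド | Methods, apparatuses, and systems for analyzing microbial strains of complex heterogeneous communities, determining their functional relevance and interactions, and diagnosing and managing biological states based thereon |
KR20200054203A (ko) * | 2017-08-14 | 2020-05-19 | 소마젠 인크 | Disease-associated microbiome characterization process |
KR20200090135A (ko) * | 2019-01-18 | 2020-07-28 | 주식회사 천랩 | Irritable bowel syndrome-specific microbial biomarkers and method for predicting the risk of irritable bowel syndrome using the same |
KR102241357B1 (ko) * | 2020-10-20 | 2021-04-16 | 주식회사 에이치이엠 | Method and apparatus for diagnosing colorectal polyps using a machine learning model |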
- 2021-05-25: KR1020210066614A, patent KR20220158950A (ko), not active (application discontinuation)
- 2022-05-25: PCT/KR2022/007418, patent WO2022250446A1 (fr), active (application filing)
Also Published As
Publication number | Publication date |
---|---|
KR20220158950A (ko) | 2022-12-02 |
Similar Documents
Publication | Title |
---|---|
WO2022203351A1 (fr) | Method and diagnostic device for determining the presence or absence of enteritis using a machine learning model |
WO2022085941A1 (fr) | Method and apparatus for determining the presence or absence of colon polyps by means of a machine learning model |
Rodrigues et al. | Transkingdom interactions between Lactobacilli and hepatic mitochondria attenuate western diet-induced diabetes |
Tyler et al. | Analyzing the human microbiome: a "how to" guide for physicians |
WO2022203350A1 (fr) | Method and diagnostic device for determining the presence or absence of atopy using a machine learning model |
Mai et al. | Distortions in development of intestinal microbiota associated with late onset sepsis in preterm infants |
Sacchetti et al. | Gut microbiome investigation in celiac disease: from methods to its pathogenetic role |
WO2021040159A1 (fr) | Method for screening a personalized intestinal environment-improving substance using a PMAS method |
Sheth et al. | Evidence of transmission of Clostridium difficile in asymptomatic patients following admission screening in a tertiary care hospital |
Guard et al. | HORSE SPECIES SYMPOSIUM: Canine intestinal microbiology and metagenomics: From phylogeny to function |
Hong et al. | Identification of Neisseria meningitidis by MALDI-TOF MS may not be reliable |
WO2019160284A1 (fr) | Method for diagnosing cerebral stroke through bacterial metagenome analysis |
WO2018155950A1 (fr) | Method for diagnosing diabetes through microbial metagenome analysis |
WO2022203353A1 (fr) | Method and diagnostic device for determining the presence or absence of constipation using a machine learning model |
Asakura et al. | Long-term grow-out affects Campylobacter jejuni colonization fitness in coincidence with altered microbiota and lipid composition in the cecum of laying hens |
WO2022203306A1 (fr) | Method and diagnostic device for determining hyperglycemia using a machine learning model |
Nouioui et al. | Streptacidiphilus bronchialis sp. nov., a ciprofloxacin-resistant bacterium from a human clinical specimen; reclassification of Streptomyces griseoplanus as Streptacidiphilus griseoplanus comb. nov. and emended description of the genus Streptacidiphilus |
WO2022250446A1 (fr) | Method and diagnostic device for determining the presence or absence of gastrointestinal disorders using a machine learning model |
WO2022250447A1 (fr) | Diagnostic method and apparatus for determining the presence of an intestinal disease using a machine learning model |
WO2022250444A1 (fr) | Method and diagnostic device for determining the presence or absence of abdominal distension using a machine learning model |
WO2022250445A1 (fr) | Diagnostic method and apparatus for determining the presence of stomach aches using a machine learning model |
WO2022203307A1 (fr) | Method for determining whether obesity is present using a machine learning model, and diagnostic device |
WO2021049834A1 (fr) | Method for diagnosing colorectal cancer on the basis of the metagenome and metabolites of extracellular vesicles |
WO2018155967A1 (fr) | Method for diagnosing chronic obstructive respiratory disease through bacterial metagenome analysis |
Wongkuna et al. | Taxono-genomics description of Olsenella lakotia SW165T sp. nov., a new anaerobic bacterium isolated from cecum of feral chicken |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22811639; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 22811639; Country of ref document: EP; Kind code of ref document: A1 |