WO2021075574A1 - Artificial intelligence model for predicting indications of a test substance in humans - Google Patents
- Publication number
- WO2021075574A1 (PCT/JP2020/039179)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- group
- data group
- training
- test
- training data
Links
- 239000000126 substance Substances 0.000 title claims abstract description 273
- 238000012360 testing method Methods 0.000 title claims abstract description 197
- 238000013473 artificial intelligence Methods 0.000 title claims abstract description 107
- 238000012549 training Methods 0.000 claims abstract description 267
- 210000000056 organ Anatomy 0.000 claims abstract description 160
- 239000000090 biomarker Substances 0.000 claims abstract description 98
- 238000000034 method Methods 0.000 claims abstract description 87
- 241000282412 Homo Species 0.000 claims abstract description 58
- 230000002411 adverse Effects 0.000 claims abstract description 42
- 238000012545 processing Methods 0.000 claims description 84
- 238000004891 communication Methods 0.000 claims description 53
- 230000009471 action Effects 0.000 claims description 27
- 238000004590 computer program Methods 0.000 claims description 18
- 230000008569 process Effects 0.000 claims description 10
- 229940079593 drug Drugs 0.000 description 81
- 239000003814 drug Substances 0.000 description 79
- 238000003860 storage Methods 0.000 description 66
- 230000006399 behavior Effects 0.000 description 47
- 108090000623 proteins and genes Proteins 0.000 description 21
- 230000014509 gene expression Effects 0.000 description 18
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 16
- 238000005259 measurement Methods 0.000 description 16
- 238000011814 C57BL/6N mouse Methods 0.000 description 15
- 201000010099 disease Diseases 0.000 description 15
- 230000006870 function Effects 0.000 description 15
- 241000699670 Mus sp. Species 0.000 description 13
- 230000005540 biological transmission Effects 0.000 description 12
- 239000002207 metabolite Substances 0.000 description 12
- 210000001519 tissue Anatomy 0.000 description 12
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 11
- 241001465754 Metazoa Species 0.000 description 11
- 238000001647 drug administration Methods 0.000 description 10
- 238000004458 analytical method Methods 0.000 description 9
- 210000004027 cell Anatomy 0.000 description 9
- CEUORZQYGODEFX-UHFFFAOYSA-N Aripirazole Chemical compound ClC1=CC=CC(N2CCN(CCCCOC=3C=C4NC(=O)CCC4=CC=3)CC2)=C1Cl CEUORZQYGODEFX-UHFFFAOYSA-N 0.000 description 8
- 229960004372 aripiprazole Drugs 0.000 description 8
- 210000001124 body fluid Anatomy 0.000 description 8
- 239000010839 body fluid Substances 0.000 description 8
- RZVAJINKPMORJF-UHFFFAOYSA-N Acetaminophen Chemical compound CC(=O)NC1=CC=C(O)C=C1 RZVAJINKPMORJF-UHFFFAOYSA-N 0.000 description 7
- 229930186217 Glycolipid Natural products 0.000 description 7
- 238000007796 conventional method Methods 0.000 description 7
- 230000000694 effects Effects 0.000 description 7
- 229920002134 Carboxymethyl cellulose Polymers 0.000 description 6
- IAZDPXIOMUYVGZ-UHFFFAOYSA-N Dimethylsulphoxide Chemical compound CS(C)=O IAZDPXIOMUYVGZ-UHFFFAOYSA-N 0.000 description 6
- 241000699666 Mus <mouse, genus> Species 0.000 description 6
- 239000001768 carboxy methyl cellulose Substances 0.000 description 6
- 235000010948 carboxy methyl cellulose Nutrition 0.000 description 6
- 239000008112 carboxymethyl-cellulose Substances 0.000 description 6
- 210000002216 heart Anatomy 0.000 description 6
- 210000004072 lung Anatomy 0.000 description 6
- 239000002547 new drug Substances 0.000 description 6
- 108020004707 nucleic acids Proteins 0.000 description 6
- 102000039446 nucleic acids Human genes 0.000 description 6
- 150000007523 nucleic acids Chemical class 0.000 description 6
- OGSPWJRAVKPPFI-UHFFFAOYSA-N Alendronic Acid Chemical compound NCCCC(O)(P(O)(O)=O)P(O)(O)=O OGSPWJRAVKPPFI-UHFFFAOYSA-N 0.000 description 5
- 238000003559 RNA-seq method Methods 0.000 description 5
- 210000004100 adrenal gland Anatomy 0.000 description 5
- 229940062527 alendronate Drugs 0.000 description 5
- 210000004556 brain Anatomy 0.000 description 5
- 229960004170 clozapine Drugs 0.000 description 5
- QZUDBNBUXVUHMW-UHFFFAOYSA-N clozapine Chemical compound C1CN(C)CCN1C1=NC2=CC(Cl)=CC=C2NC2=CC=CC=C12 QZUDBNBUXVUHMW-UHFFFAOYSA-N 0.000 description 5
- 238000009511 drug repositioning Methods 0.000 description 5
- 210000004185 liver Anatomy 0.000 description 5
- 238000003058 natural language processing Methods 0.000 description 5
- 239000013642 negative control Substances 0.000 description 5
- 210000002027 skeletal muscle Anatomy 0.000 description 5
- 239000000243 solution Substances 0.000 description 5
- 210000001685 thyroid gland Anatomy 0.000 description 5
- 239000013598 vector Substances 0.000 description 5
- XLYOFNOQVPJJNP-UHFFFAOYSA-N water Substances O XLYOFNOQVPJJNP-UHFFFAOYSA-N 0.000 description 5
- 206010062767 Hypophysitis Diseases 0.000 description 4
- 108090001030 Lipoproteins Proteins 0.000 description 4
- 102000004895 Lipoproteins Human genes 0.000 description 4
- 241000124008 Mammalia Species 0.000 description 4
- 206010028980 Neoplasm Diseases 0.000 description 4
- 150000001413 amino acids Chemical class 0.000 description 4
- 210000000709 aorta Anatomy 0.000 description 4
- 238000011161 development Methods 0.000 description 4
- 229960003722 doxycycline Drugs 0.000 description 4
- XQTWDDCIUJNLTR-CVHRZJFOSA-N doxycycline monohydrate Chemical compound O.O=C1C2=C(O)C=CC=C2[C@H](C)[C@@H]2C1=C(O)[C@]1(O)C(=O)C(C(N)=O)=C(O)[C@@H](N(C)C)[C@@H]1[C@H]2O XQTWDDCIUJNLTR-CVHRZJFOSA-N 0.000 description 4
- 238000004453 electron probe microanalysis Methods 0.000 description 4
- UFZOPKFMKMAWLU-UHFFFAOYSA-N ethoxy(methyl)phosphinic acid Chemical compound CCOP(C)(O)=O UFZOPKFMKMAWLU-UHFFFAOYSA-N 0.000 description 4
- GOTYRUGSSMKFNF-UHFFFAOYSA-N lenalidomide Chemical compound C1C=2C(N)=CC=CC=2C(=O)N1C1CCC(=O)NC1=O GOTYRUGSSMKFNF-UHFFFAOYSA-N 0.000 description 4
- 229960004942 lenalidomide Drugs 0.000 description 4
- 150000002632 lipids Chemical class 0.000 description 4
- 108020004999 messenger RNA Proteins 0.000 description 4
- 210000000496 pancreas Anatomy 0.000 description 4
- 210000003635 pituitary gland Anatomy 0.000 description 4
- 150000008442 polyphenolic compounds Chemical class 0.000 description 4
- 235000013824 polyphenols Nutrition 0.000 description 4
- 238000002360 preparation method Methods 0.000 description 4
- 108090000765 processed proteins & peptides Proteins 0.000 description 4
- 102000004169 proteins and genes Human genes 0.000 description 4
- 239000002994 raw material Substances 0.000 description 4
- 230000004044 response Effects 0.000 description 4
- 210000000952 spleen Anatomy 0.000 description 4
- 210000002784 stomach Anatomy 0.000 description 4
- 235000000346 sugar Nutrition 0.000 description 4
- 150000008163 sugars Chemical class 0.000 description 4
- 208000024891 symptom Diseases 0.000 description 4
- 210000001550 testis Anatomy 0.000 description 4
- 210000001541 thymus gland Anatomy 0.000 description 4
- 238000011222 transcriptome analysis Methods 0.000 description 4
- 102000019034 Chemokines Human genes 0.000 description 3
- 108010012236 Chemokines Proteins 0.000 description 3
- 108700011259 MicroRNAs Proteins 0.000 description 3
- 241000700159 Rattus Species 0.000 description 3
- HEMHJVSKTPXQMS-UHFFFAOYSA-M Sodium hydroxide Chemical compound [OH-].[Na+] HEMHJVSKTPXQMS-UHFFFAOYSA-M 0.000 description 3
- 108010049264 Teriparatide Proteins 0.000 description 3
- 210000000593 adipose tissue white Anatomy 0.000 description 3
- 210000000988 bone and bone Anatomy 0.000 description 3
- 210000001185 bone marrow Anatomy 0.000 description 3
- 210000005252 bulbus oculi Anatomy 0.000 description 3
- DQLATGHUWYMOKM-UHFFFAOYSA-L cisplatin Chemical compound N[Pt](N)(Cl)Cl DQLATGHUWYMOKM-UHFFFAOYSA-L 0.000 description 3
- 229960004316 cisplatin Drugs 0.000 description 3
- 229960003345 empagliflozin Drugs 0.000 description 3
- OBWASQILIWPZMG-QZMOQZSNSA-N empagliflozin Chemical compound O[C@@H]1[C@@H](O)[C@H](O)[C@@H](CO)O[C@H]1C1=CC=C(Cl)C(CC=2C=CC(O[C@@H]3COCC3)=CC=2)=C1 OBWASQILIWPZMG-QZMOQZSNSA-N 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 235000013305 food Nutrition 0.000 description 3
- 210000003405 ileum Anatomy 0.000 description 3
- 238000001727 in vivo Methods 0.000 description 3
- 210000001630 jejunum Anatomy 0.000 description 3
- 210000002429 large intestine Anatomy 0.000 description 3
- 229910021645 metal ion Inorganic materials 0.000 description 3
- 239000002679 microRNA Substances 0.000 description 3
- 229960005489 paracetamol Drugs 0.000 description 3
- 210000003681 parotid gland Anatomy 0.000 description 3
- 239000002504 physiological saline solution Substances 0.000 description 3
- 102000004196 processed proteins & peptides Human genes 0.000 description 3
- 210000003491 skin Anatomy 0.000 description 3
- 210000003625 skull Anatomy 0.000 description 3
- OGBMKVWORPGQRR-UMXFMPSGSA-N teriparatide Chemical compound C([C@H](NC(=O)[C@H](CCSC)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@@H](NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](NC(=O)[C@@H](N)CO)C(C)C)[C@@H](C)CC)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC=1N=CNC=1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CO)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CC=1N=CNC=1)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CC=1C=CC=CC=1)C(O)=O)C1=CNC=N1 OGBMKVWORPGQRR-UMXFMPSGSA-N 0.000 description 3
- 229960005460 teriparatide Drugs 0.000 description 3
- VSWBSWWIRNCQIJ-HUUCEWRRSA-N (S,S)-asenapine Chemical compound O1C2=CC=CC=C2[C@H]2CN(C)C[C@@H]2C2=CC(Cl)=CC=C21 VSWBSWWIRNCQIJ-HUUCEWRRSA-N 0.000 description 2
- 241000283690 Bos taurus Species 0.000 description 2
- 241000282472 Canis lupus familiaris Species 0.000 description 2
- 241000283086 Equidae Species 0.000 description 2
- 241000282326 Felis catus Species 0.000 description 2
- 208000028389 Nerve injury Diseases 0.000 description 2
- IIDJRNMFWXDHID-UHFFFAOYSA-N Risedronic acid Chemical compound OP(=O)(O)C(P(O)(O)=O)(O)CC1=CC=CN=C1 IIDJRNMFWXDHID-UHFFFAOYSA-N 0.000 description 2
- FAPWRFPIFSIZLT-UHFFFAOYSA-M Sodium chloride Chemical compound [Na+].[Cl-] FAPWRFPIFSIZLT-UHFFFAOYSA-M 0.000 description 2
- 229930006000 Sucrose Natural products 0.000 description 2
- CZMRCDWAGMRECN-UGDNZRGBSA-N Sucrose Chemical compound O[C@H]1[C@H](O)[C@@H](CO)O[C@@]1(CO)O[C@@H]1[C@H](O)[C@@H](O)[C@H](O)[C@@H](CO)O1 CZMRCDWAGMRECN-UGDNZRGBSA-N 0.000 description 2
- 241000282887 Suidae Species 0.000 description 2
- 108020004417 Untranslated RNA Proteins 0.000 description 2
- 102000039634 Untranslated RNA Human genes 0.000 description 2
- 229960005245 asenapine Drugs 0.000 description 2
- 210000000941 bile Anatomy 0.000 description 2
- 210000004369 blood Anatomy 0.000 description 2
- 239000008280 blood Substances 0.000 description 2
- 210000002798 bone marrow cell Anatomy 0.000 description 2
- 230000002490 cerebral effect Effects 0.000 description 2
- 238000010276 construction Methods 0.000 description 2
- 229960002027 evolocumab Drugs 0.000 description 2
- 238000002474 experimental method Methods 0.000 description 2
- 238000010195 expression analysis Methods 0.000 description 2
- 239000012530 fluid Substances 0.000 description 2
- 210000004907 gland Anatomy 0.000 description 2
- 230000036541 health Effects 0.000 description 2
- 210000003734 kidney Anatomy 0.000 description 2
- 229960001432 lurasidone Drugs 0.000 description 2
- PQXKDMSYBGKCJA-CVTJIBDQSA-N lurasidone Chemical compound C1=CC=C2C(N3CCN(CC3)C[C@@H]3CCCC[C@H]3CN3C(=O)[C@@H]4[C@H]5CC[C@H](C5)[C@@H]4C3=O)=NSC2=C1 PQXKDMSYBGKCJA-CVTJIBDQSA-N 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 238000013507 mapping Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 230000010534 mechanism of action Effects 0.000 description 2
- 210000003205 muscle Anatomy 0.000 description 2
- 230000008764 nerve damage Effects 0.000 description 2
- 229960005017 olanzapine Drugs 0.000 description 2
- KVWDHTXUZHCGIO-UHFFFAOYSA-N olanzapine Chemical compound C1CN(C)CCN1C1=NC2=CC=CC=C2NC2=C1C=C(C)S2 KVWDHTXUZHCGIO-UHFFFAOYSA-N 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 235000010482 polyoxyethylene sorbitan monooleate Nutrition 0.000 description 2
- 229920000053 polysorbate 80 Polymers 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 229940017164 repatha Drugs 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 238000012163 sequencing technique Methods 0.000 description 2
- 239000011780 sodium chloride Substances 0.000 description 2
- 241000894007 species Species 0.000 description 2
- 239000005720 sucrose Substances 0.000 description 2
- 230000007704 transition Effects 0.000 description 2
- 208000001072 type 2 diabetes mellitus Diseases 0.000 description 2
- 210000003462 vein Anatomy 0.000 description 2
- GMDCDXMAFMEDAG-CHHFXETESA-N (S,S)-asenapine maleate Chemical compound OC(=O)\C=C/C(O)=O.O1C2=CC=CC=C2[C@H]2CN(C)C[C@@H]2C2=CC(Cl)=CC=C21 GMDCDXMAFMEDAG-CHHFXETESA-N 0.000 description 1
- PTNZGHXUZDHMIQ-UHFFFAOYSA-N 4-(dimethylamino)-1,5,10,11,12a-pentahydroxy-6-methyl-3,12-dioxo-4a,5,5a,6-tetrahydro-4h-tetracene-2-carboxamide;hydrochloride Chemical compound Cl.C1=CC=C2C(C)C(C(O)C3C(C(O)=C(C(N)=O)C(=O)C3N(C)C)(O)C3=O)C3=C(O)C2=C1O PTNZGHXUZDHMIQ-UHFFFAOYSA-N 0.000 description 1
- QTBSBXVTEAMEQO-UHFFFAOYSA-N Acetic acid Chemical compound CC(O)=O QTBSBXVTEAMEQO-UHFFFAOYSA-N 0.000 description 1
- 101150100721 Alas2 gene Proteins 0.000 description 1
- 206010052613 Allergic bronchitis Diseases 0.000 description 1
- 208000024827 Alzheimer disease Diseases 0.000 description 1
- 244000099147 Ananas comosus Species 0.000 description 1
- 235000007119 Ananas comosus Nutrition 0.000 description 1
- 206010002329 Aneurysm Diseases 0.000 description 1
- 206010003445 Ascites Diseases 0.000 description 1
- 208000023275 Autoimmune disease Diseases 0.000 description 1
- 241000271566 Aves Species 0.000 description 1
- 241000894006 Bacteria Species 0.000 description 1
- 241000283707 Capra Species 0.000 description 1
- 201000009030 Carcinoma Diseases 0.000 description 1
- 241000606161 Chlamydia Species 0.000 description 1
- 241000725101 Clea Species 0.000 description 1
- 208000035473 Communicable disease Diseases 0.000 description 1
- 206010012289 Dementia Diseases 0.000 description 1
- 238000002965 ELISA Methods 0.000 description 1
- 206010014418 Electrolyte imbalance Diseases 0.000 description 1
- 208000005189 Embolism Diseases 0.000 description 1
- 241000233866 Fungi Species 0.000 description 1
- 241000287828 Gallus gallus Species 0.000 description 1
- 206010018364 Glomerulonephritis Diseases 0.000 description 1
- WQZGKKKJIJFFOK-GASJEMHNSA-N Glucose Natural products OC[C@H]1OC(O)[C@H](O)[C@@H](O)[C@@H]1O WQZGKKKJIJFFOK-GASJEMHNSA-N 0.000 description 1
- 108090000288 Glycoproteins Proteins 0.000 description 1
- 102000003886 Glycoproteins Human genes 0.000 description 1
- 208000032843 Hemorrhage Diseases 0.000 description 1
- 101001135770 Homo sapiens Parathyroid hormone Proteins 0.000 description 1
- 101001135995 Homo sapiens Probable peptidyl-tRNA hydrolase Proteins 0.000 description 1
- 241000283973 Oryctolagus cuniculus Species 0.000 description 1
- 208000018737 Parkinson disease Diseases 0.000 description 1
- 208000031481 Pathologic Constriction Diseases 0.000 description 1
- 241001494479 Pecora Species 0.000 description 1
- 208000002151 Pleural effusion Diseases 0.000 description 1
- 208000007536 Thrombosis Diseases 0.000 description 1
- 206010062174 Venous aneurysm Diseases 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 210000000577 adipose tissue Anatomy 0.000 description 1
- 210000003486 adipose tissue brown Anatomy 0.000 description 1
- 230000001919 adrenal effect Effects 0.000 description 1
- DCSBSVSZJRSITC-UHFFFAOYSA-M alendronate sodium trihydrate Chemical compound O.O.O.[Na+].NCCCC(O)(P(O)(O)=O)P(O)([O-])=O DCSBSVSZJRSITC-UHFFFAOYSA-M 0.000 description 1
- 208000026935 allergic disease Diseases 0.000 description 1
- 210000000436 anus Anatomy 0.000 description 1
- 210000001367 artery Anatomy 0.000 description 1
- 210000001815 ascending colon Anatomy 0.000 description 1
- 229960001615 asenapine maleate Drugs 0.000 description 1
- 206010003549 asthenia Diseases 0.000 description 1
- 208000029618 autoimmune pulmonary alveolar proteinosis Diseases 0.000 description 1
- 210000000467 autonomic pathway Anatomy 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- -1 bile Substances 0.000 description 1
- 210000000013 bile duct Anatomy 0.000 description 1
- 210000003445 biliary tract Anatomy 0.000 description 1
- 208000034158 bleeding Diseases 0.000 description 1
- 230000000740 bleeding effect Effects 0.000 description 1
- 210000000133 brain stem Anatomy 0.000 description 1
- 210000000481 breast Anatomy 0.000 description 1
- 210000000621 bronchi Anatomy 0.000 description 1
- 210000000845 cartilage Anatomy 0.000 description 1
- 210000004534 cecum Anatomy 0.000 description 1
- 210000001175 cerebrospinal fluid Anatomy 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 210000003467 cheek Anatomy 0.000 description 1
- 235000013330 chicken meat Nutrition 0.000 description 1
- 210000001072 colon Anatomy 0.000 description 1
- 150000001875 compounds Chemical class 0.000 description 1
- 208000027744 congestion Diseases 0.000 description 1
- 210000002808 connective tissue Anatomy 0.000 description 1
- 230000001276 controlling effect Effects 0.000 description 1
- 239000002537 cosmetic Substances 0.000 description 1
- 210000004748 cultured cell Anatomy 0.000 description 1
- 230000003247 decreasing effect Effects 0.000 description 1
- 210000000188 diaphragm Anatomy 0.000 description 1
- 235000005911 diet Nutrition 0.000 description 1
- 230000037213 diet Effects 0.000 description 1
- 230000009274 differential gene expression Effects 0.000 description 1
- 230000004069 differentiation Effects 0.000 description 1
- 210000002249 digestive system Anatomy 0.000 description 1
- 208000035475 disorder Diseases 0.000 description 1
- 229960004082 doxycycline hydrochloride Drugs 0.000 description 1
- 238000007876 drug discovery Methods 0.000 description 1
- 210000001198 duodenum Anatomy 0.000 description 1
- 210000000883 ear external Anatomy 0.000 description 1
- 210000003027 ear inner Anatomy 0.000 description 1
- 210000000959 ear middle Anatomy 0.000 description 1
- 230000002500 effect on skin Effects 0.000 description 1
- 210000000750 endocrine system Anatomy 0.000 description 1
- 210000003238 esophagus Anatomy 0.000 description 1
- 210000003722 extracellular fluid Anatomy 0.000 description 1
- 210000001508 eye Anatomy 0.000 description 1
- 210000000744 eyelid Anatomy 0.000 description 1
- 239000012634 fragment Substances 0.000 description 1
- 230000002496 gastric effect Effects 0.000 description 1
- 210000004195 gingiva Anatomy 0.000 description 1
- 239000008103 glucose Substances 0.000 description 1
- 210000002149 gonad Anatomy 0.000 description 1
- 230000003394 haemopoietic effect Effects 0.000 description 1
- 102000058004 human PTH Human genes 0.000 description 1
- 238000010191 image analysis Methods 0.000 description 1
- 210000000987 immune system Anatomy 0.000 description 1
- 210000000936 intestine Anatomy 0.000 description 1
- 208000023589 ischemic disease Diseases 0.000 description 1
- 208000017169 kidney disease Diseases 0.000 description 1
- 201000010901 lateral sclerosis Diseases 0.000 description 1
- 210000003041 ligament Anatomy 0.000 description 1
- 210000000088 lip Anatomy 0.000 description 1
- 229960002863 lurasidone hydrochloride Drugs 0.000 description 1
- NEKCRUIRPWNMLK-SCIYSFAVSA-N lurasidone hydrochloride Chemical compound Cl.C1=CC=C2C(N3CCN(CC3)C[C@@H]3CCCC[C@H]3CN3C(=O)[C@@H]4[C@H]5CC[C@H](C5)[C@@H]4C3=O)=NSC2=C1 NEKCRUIRPWNMLK-SCIYSFAVSA-N 0.000 description 1
- 210000002751 lymph Anatomy 0.000 description 1
- 210000004880 lymph fluid Anatomy 0.000 description 1
- 210000001165 lymph node Anatomy 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 210000004995 male reproductive system Anatomy 0.000 description 1
- 230000003211 malignant effect Effects 0.000 description 1
- 238000004949 mass spectrometry Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 210000004379 membrane Anatomy 0.000 description 1
- 239000012528 membrane Substances 0.000 description 1
- 244000005700 microbiome Species 0.000 description 1
- 235000013336 milk Nutrition 0.000 description 1
- 210000004080 milk Anatomy 0.000 description 1
- 239000008267 milk Substances 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012544 monitoring process Methods 0.000 description 1
- 208000005264 motor neuron disease Diseases 0.000 description 1
- 201000006417 multiple sclerosis Diseases 0.000 description 1
- 210000003928 nasal cavity Anatomy 0.000 description 1
- 210000005036 nerve Anatomy 0.000 description 1
- 210000000653 nervous system Anatomy 0.000 description 1
- 208000015122 neurodegenerative disease Diseases 0.000 description 1
- 108091027963 non-coding RNA Proteins 0.000 description 1
- 102000042567 non-coding RNA Human genes 0.000 description 1
- 210000001672 ovary Anatomy 0.000 description 1
- 210000003101 oviduct Anatomy 0.000 description 1
- 210000003254 palate Anatomy 0.000 description 1
- 210000002741 palatine tonsil Anatomy 0.000 description 1
- 244000045947 parasite Species 0.000 description 1
- 210000003899 penis Anatomy 0.000 description 1
- 210000000578 peripheral nerve Anatomy 0.000 description 1
- 210000004303 peritoneum Anatomy 0.000 description 1
- 210000003800 pharynx Anatomy 0.000 description 1
- 210000002381 plasma Anatomy 0.000 description 1
- 238000002203 pretreatment Methods 0.000 description 1
- 238000000513 principal component analysis Methods 0.000 description 1
- 210000002307 prostate Anatomy 0.000 description 1
- 238000003753 real-time PCR Methods 0.000 description 1
- 210000000664 rectum Anatomy 0.000 description 1
- 230000001105 regulatory effect Effects 0.000 description 1
- 230000001850 reproductive effect Effects 0.000 description 1
- 230000000241 respiratory effect Effects 0.000 description 1
- 210000001525 retina Anatomy 0.000 description 1
- 238000012552 review Methods 0.000 description 1
- 229940089617 risedronate Drugs 0.000 description 1
- 229960000759 risedronic acid Drugs 0.000 description 1
- 210000003296 saliva Anatomy 0.000 description 1
- 210000003079 salivary gland Anatomy 0.000 description 1
- 238000013515 script Methods 0.000 description 1
- 210000000697 sensory organ Anatomy 0.000 description 1
- 210000002966 serum Anatomy 0.000 description 1
- 210000001599 sigmoid colon Anatomy 0.000 description 1
- 229960002063 sofosbuvir Drugs 0.000 description 1
- TTZHDVOVKQGIBA-IQWMDFIBSA-N sofosbuvir Chemical compound N1([C@@H]2O[C@@H]([C@H]([C@]2(F)C)O)CO[P@@](=O)(N[C@@H](C)C(=O)OC(C)C)OC=2C=CC=CC=2)C=CC(=O)NC1=O TTZHDVOVKQGIBA-IQWMDFIBSA-N 0.000 description 1
- 210000000278 spinal cord Anatomy 0.000 description 1
- 238000011425 standardization method Methods 0.000 description 1
- 230000036262 stenosis Effects 0.000 description 1
- 208000037804 stenosis Diseases 0.000 description 1
- 238000010254 subcutaneous injection Methods 0.000 description 1
- 239000007929 subcutaneous injection Substances 0.000 description 1
- 238000012706 support-vector machine Methods 0.000 description 1
- 230000009885 systemic effect Effects 0.000 description 1
- 210000001138 tear Anatomy 0.000 description 1
- 210000002435 tendon Anatomy 0.000 description 1
- 210000000115 thoracic cavity Anatomy 0.000 description 1
- 210000002105 tongue Anatomy 0.000 description 1
- 210000000515 tooth Anatomy 0.000 description 1
- 210000003437 trachea Anatomy 0.000 description 1
- 238000013518 transcription Methods 0.000 description 1
- 230000035897 transcription Effects 0.000 description 1
- 210000003384 transverse colon Anatomy 0.000 description 1
- 210000003932 urinary bladder Anatomy 0.000 description 1
- 230000002485 urinary effect Effects 0.000 description 1
- 210000001635 urinary tract Anatomy 0.000 description 1
- 210000002700 urine Anatomy 0.000 description 1
- 210000004291 uterus Anatomy 0.000 description 1
- 210000001215 vagina Anatomy 0.000 description 1
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01K—ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
- A01K67/00—Rearing or breeding animals, not otherwise provided for; New or modified breeds of animals
- A01K67/027—New or modified breeds of vertebrates
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6809—Methods for determination or identification of nucleic acids involving differential detection
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/15—Medicinal preparations ; Physical properties thereof, e.g. dissolubility
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N33/00—Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
- G01N33/48—Biological material, e.g. blood, urine; Haemocytometers
- G01N33/50—Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H70/00—ICT specially adapted for the handling or processing of medical references
- G16H70/40—ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
-
- A—HUMAN NECESSITIES
- A01—AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
- A01K—ANIMAL HUSBANDRY; AVICULTURE; APICULTURE; PISCICULTURE; FISHING; REARING OR BREEDING ANIMALS, NOT OTHERWISE PROVIDED FOR; NEW BREEDS OF ANIMALS
- A01K2267/00—Animals characterised by purpose
- A01K2267/03—Animal model, e.g. for test or diseases
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
Definitions
- The present disclosure relates to a method, a device, and a program for predicting the indication of a test substance in humans.
- Methods of training the artificial intelligence models used to predict the indication of the test substance in humans, and the trained artificial intelligence models, are also disclosed.
- Patent Document 1 discloses a method for predicting the efficacy or side effects of a test substance. Test data of organ-related index factors, obtained from cells or tissues derived from one or more organs of an individual to which the test substance has been administered, are compared with predetermined standard data of the corresponding organ-related index factors to calculate the pattern similarity of the organ-related index factors. Using this pattern similarity as an index, the method predicts the efficacy or side effects of the test substance in the one or more organs and/or in organs other than the one or more organs.
- Patent Document 2 discloses an artificial intelligence model for predicting one or more actions of a test substance in humans. The model is trained by inputting, as training data, a data group showing the behavior of the transcriptome in a plurality of different organs collected from non-human animals to which a plurality of existing substances whose actions in humans are known were individually administered, together with data showing the known action of each existing substance in humans. The trained model predicts one or more actions of the test substance in humans from the behavior of the transcriptome in a plurality of different organs of a non-human animal to which the test substance was administered, these organs corresponding to the organs collected when the training data were prepared.
- One object of the present disclosure is to efficiently predict the indication of the test substance in humans from the behavior of the biomarker when the test substance is administered to an animal other than human.
- Another object of the present invention is to enable prediction even when the test substance has an effect that is unknown for the existing substances used when acquiring the training data.
- The present invention may include the following embodiments.
- the training method includes associating the first training data group, the second training data group, and the third training data group and inputting them into the artificial intelligence model to train the artificial intelligence model, and the first training data.
- the group includes a group of data showing the behavior of biomarkers in one or more different organs collected from each non-human animal individually administered with a plurality of predetermined existing substances having known indications in humans, as described above. It is a group of data associated with a label indicating each name of a predetermined existing substance administered, and the second training data group includes a label indicating each name of the plurality of predetermined existing substances and the above.
- The first training data group and the third training data group are linked via the second training data group to generate a fourth training data group, and the fourth training data group is input into the artificial intelligence model.
- the information regarding the adverse event includes a label indicating the adverse event and the presence / absence or frequency of occurrence of the adverse event in the indication.
- the biomarker is a transcriptome.
- the artificial intelligence model is One-Class SVM.
- One embodiment of the present invention relates to a training device for an artificial intelligence model.
- the training device includes a processing unit, and the processing unit associates the first training data group, the second training data group, and the third training data group and inputs them into the artificial intelligence model to train the artificial intelligence model.
- the first training data group showed the behavior of biomarkers in one or more different organs collected from each non-human animal individually administered with a plurality of predetermined pre-existing substances of known indications in humans. It is a group of data in which the group of the data to be shown and the label indicating the name of each of the predetermined existing substances administered are linked, and the second training data group is each of the plurality of predetermined existing substances. A group of data in which a label indicating a name and a label indicating the indication reported for each of the plurality of predetermined existing substances are associated with each other, and the third training data group is the plurality of predetermined data.
- the first training data group is the behavior of biomarkers in one or more different organs taken from each non-human animal individually administered with multiple predetermined pre-existing substances of known indications in humans.
- the second training data group is a group of the plurality of predetermined existing substances.
- the third training data group is the plurality of data. It is a group of data in which labels indicating the indications reported for each of the predetermined existing substances and information on adverse events reported corresponding to each of these indications are linked, and the artificial intelligence.
- the model is for predicting the indication of the test substance in humans. Item 8.
- One embodiment of the invention relates to a method of predicting the indication of a test substance in humans.
- the method is a step of acquiring a first test data group, wherein the first test data group is data showing the behavior of a biomarker in one or more organs collected from a non-human animal to which a test substance has been administered.
- the steps, the first test data group, and the second test data group were input to the artificial intelligence model trained by the method according to any one of Items 1 to 5, and the training was performed.
- It is a step of predicting the indication of the test substance in humans based on the input first test data group and the second test data group by the artificial intelligence model, and the second test data group is a plurality of known known.
- the test substance includes a processing unit, and the processing unit uses an artificial intelligence model in which a first test data group and a second test data group are trained by the method according to any one of Items 1 to 5.
- the second test data group is a group of data showing the behavior of the biomarker in the above, and the second test data group is reported corresponding to each of a plurality of known indication labels and the plurality of known indications. This is a group of data associated with information on adverse events acquired when the training data group was generated. Item 12.
- An embodiment of the present invention comprises an artificial intelligence model in which a first test data group and a second test data group are trained by the method according to any one of items 1 to 5 when executed by a computer.
- the group is a group of data showing the behavior of biomarkers in one or more organs collected from non-human animals to which the test substance was administered, and corresponds to the organ collected at the time of generation of the first training data group1.
- the present invention relates to a predictive system for predicting the indication of a test substance in humans.
- the system is a server device that transmits a first test data group, and the first test data group is data showing the behavior of biomarkers in one or more organs collected from a non-human animal to which the test substance was administered.
- a server device which is a group of the above, and a prediction device connected to the server device via a network for predicting the action of the test substance in humans.
- the server device includes a communication unit for transmitting the first test data group, the prediction device includes a processing unit and a communication unit, and the processing unit transmits via the communication unit of the server device.
- the method according to any one of Items 1 to 5, wherein the obtained first test data group is acquired via the communication unit of the prediction device, and the acquired first test data group and the second test data group are obtained.
- the first test data group is a group of data showing the behavior of biomarkers in one or more organs collected from non-human animals to which the test substance was administered, and is an organ collected at the time of generation of the first training data group. It is a group of data showing the behavior of the biomarker in one or a plurality of organs corresponding to, and the second test data group corresponds to a plurality of known indication labels and each of the plurality of known indications. This is a group of data associated with information on adverse events acquired at the time of generation of the third training data group reported in the above. Item 14.
- the name of the existing substance administered when acquiring a group of data showing the behavior of the biomarker in one or a plurality of different organs and a group of data showing the behavior of the biomarker is given.
- the first training data group which is a group of data associated with the indicated label
- the one or more different organs individually administer a plurality of predetermined existing substances having known indications in humans.
- the first training data group labels indicating the names of the plurality of predetermined existing substances, and the indications reported for each of the plurality of predetermined existing substances, which are collected from each of the non-human animals.
- the second training data group which is a group of data associated with the label indicating the disease, and the information on the adverse events reported corresponding to each of the label indicating the indication and the indication are associated.
- the third training data group which is a group of the obtained data, relates to a method of using the third training data group for training an artificial intelligence model for predicting the indication of a test substance in humans.
- Item 15. The present invention relates to a method of using a first test data group and a second test data group as test data for predicting the indication of a test substance in humans.
- the first test data group is a group of data showing the behavior of biomarkers in one or more organs collected from a non-human animal to which the test substance was administered, and generation of the first training data group.
- Even if the test substance has an effect that is unknown for the existing substances used when acquiring the training data, that effect can be predicted.
- the outline of the present invention is shown.
- the outline of the invention (conventional technique) described in Patent Document 2 is shown.
- An example of training data is shown.
- (A) shows an example of the first training data.
- (B) shows an example of the second training data.
- (C) shows an example of the third training data.
- (D) shows an example of the 4th training data.
- (A) shows the hardware configuration of the training system.
- (B) shows the hardware configuration of the prediction system.
- the hardware configuration of the training device is shown. It is a flowchart which shows the process flow of a training program.
- the hardware configuration of the prediction device is shown. It is a flowchart which shows the processing flow of a prediction program.
- the hardware configuration of the server device is shown.
- the prediction result of the artificial intelligence trained without using the transcriptome data of the test drug is shown.
- the prediction results of artificial intelligence trained using the transcriptome data of the test drug are shown.
- Some of the decision function values of the alendronate are shown.
- the prediction method predicts the indication of the test substance in humans.
- The prediction method is based on the behavior of the biomarker in non-human animals administered an existing substance whose action in humans is known, the known indications, and the adverse events reported for those known indications.
- The prediction is achieved using an artificial intelligence model.
- The artificial intelligence model used for prediction is preferably trained on a set of data in which three types of training data groups, a first training data group, a second training data group, and a third training data group, are associated with one another.
- The first training data group is a group of data in which data showing the behavior of a biomarker in each of one or more different organs, collected from non-human animals to which a plurality of predetermined existing substances with known indications in humans were individually administered, are associated with labels indicating the name of each administered existing substance.
- For example, drugs A, B, and C are individually administered to non-human animals such as mice as predetermined existing substances, and organs, or tissue that is part of an organ, are collected from each of the non-human animals.
- FIG. 3A shows a specific example of the first training data group.
- the leftmost column is the first column.
- the drug name “Aripiprazole” and the drug name “EMPA” are shown as examples.
- the second and subsequent columns show the expression level of RNA in each organ.
- "Heart” and “Skin” are labels for organ names
- "Alas2" and "Apod” are labels for gene names whose expression has been analyzed.
- values indicating the expression level of each gene are input as elements.
- [label indicating the organ name and label indicating the gene name] and [value indicating the expression level of each gene] correspond to the label indicating the drug name.
- The second training data group is a group of data in which the labels (first column of FIG. 3A) indicating the name of each of the plurality of predetermined existing substances administered when the first training data group was acquired are associated with labels indicating the indication reported for each of those existing substances.
- FIG. 3B shows a specific example of the second training data group.
- the leftmost column is the first column.
- the drug name “Aripiprazole” and the drug name “EMPA” are shown as examples.
- the second and subsequent columns are the indications reported for each of the drugs listed in the first column.
- "Nerve injury” is indicated as a label indicating the indication of the drug name "Aripiprazole”
- Type 2 diabetes mellitus is indicated as the label of the name indicating the indication of the drug name "EMPA”.
- The third training data group is a group of data in which the labels indicating the indications shown in FIG. 3 (B), reported for each of the plurality of predetermined existing substances administered when acquiring the first training data group, are associated with information on the adverse events reported for each of these indications.
- the information about the adverse event may include a label indicating the name of the adverse event, the presence or absence of the adverse event, or the frequency of occurrence.
- FIG. 3C shows a more specific example of the third training data group. In the example of the third training data group shown in FIG. 3C, the leftmost column is the first column. “Nerve injury”, which is an indication for the drug name “Aripiprazole” described in “Indication 1” in FIG.
- What is input to the artificial intelligence model is the fourth training data group, generated by associating the first training data group and the third training data group via the second training data group.
- An example of the fourth training data group is shown in FIG. 3 (D).
- the leftmost column is the first column.
- labels indicating the names of the adverse events shown in FIG. 3C and the frequency of occurrence of each are shown.
- A label indicating the name of the organ shown in FIG. 3A, a label indicating the name of the gene, and the expression level of the gene are shown.
- FIG. 3 (D) is data in which the frequency of occurrence of adverse events in the second and subsequent columns shown in FIG. 3 (C) is substituted for the label in the first column indicating the drug name in FIG. 3 (A).
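The linking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the disclosure: the dictionary layout, the function name `build_fourth_group`, the adverse event names (`Headache`, `Nausea`), and all numeric values are hypothetical, and it assumes one row per (substance, indication) pair with the adverse-event frequencies placed ahead of the biomarker values.

```python
# Hypothetical toy data; layouts and values are illustrative only.
# First training data group: substance name -> {(organ, gene): expression value}
tr1 = {
    "Aripiprazole": {("Heart", "Alas2"): 0.8, ("Skin", "Apod"): -1.2},
    "EMPA": {("Heart", "Alas2"): 0.1, ("Skin", "Apod"): 0.4},
}
# Second training data group: substance name -> reported indications
tr2 = {
    "Aripiprazole": ["Nerve injury"],
    "EMPA": ["Type 2 diabetes mellitus"],
}
# Third training data group: indication -> {adverse event: frequency}
tr3 = {
    "Nerve injury": {"Headache": 0.10, "Nausea": 0.02},
    "Type 2 diabetes mellitus": {"Headache": 0.00, "Nausea": 0.05},
}


def build_fourth_group(tr1, tr2, tr3):
    """Link tr1 and tr3 through tr2 and return (column names, rows)."""
    adverse_events = sorted({ae for freqs in tr3.values() for ae in freqs})
    features = sorted({key for values in tr1.values() for key in values})
    rows = []
    for substance, biomarkers in tr1.items():
        for indication in tr2.get(substance, []):
            freqs = tr3.get(indication)
            if freqs is None:
                continue  # no adverse-event information for this indication
            # The substance name is replaced by the adverse-event frequencies.
            row = [freqs.get(ae, 0.0) for ae in adverse_events]
            row += [biomarkers.get(f, 0.0) for f in features]
            rows.append((indication, row))
    columns = adverse_events + [f"{organ}:{gene}" for organ, gene in features]
    return columns, rows


columns, rows = build_fourth_group(tr1, tr2, tr3)
```

Each resulting row keeps the indication label alongside a feature vector in which the substance name no longer appears, which mirrors the substitution described for FIG. 3 (D).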
- the artificial intelligence model trained in (1) is used to predict the indication of the test substance in humans.
- The test data groups input to the trained artificial intelligence model when predicting the indication are the first test data group and the second test data group.
- the first test data group is input to the trained artificial intelligence model together with the second test data group.
- the first test data group is a group of data showing the behavior of biomarkers in one or more organs collected from non-human animals to which the test substance was administered.
- The one or more organs correspond to the organs collected at the time of generation of the first training data group.
- The first test data is obtained by administering one test substance to a non-human animal, collecting one or more organs, and analyzing the transcriptome, and consists of [label indicating the organ name and label indicating the gene name] and [value indicating the expression level of each gene].
- The second test data group is a group of data in which labels for a plurality of known indications are associated with the information on adverse events, acquired at the time of generation of the third training data group, reported for each of the plurality of known indications.
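One possible way to combine the two test data groups is sketched below. The text states only that the two groups are input together; pairing the test substance's biomarker vector with the adverse-event vector of every known indication, giving one candidate row per indication, is an assumption made here so that the rows share the layout of the fourth training data group. The function name `build_candidate_rows` and all values are hypothetical.

```python
def build_candidate_rows(test_biomarkers, indication_adverse_events,
                         adverse_event_order, feature_order):
    """Return {indication: feature row} for one test substance.

    test_biomarkers: {(organ, gene): value} from the first test data group.
    indication_adverse_events: {indication: {adverse event: frequency}}
        from the second test data group.
    adverse_event_order, feature_order: column orders fixed at training time.
    """
    candidates = {}
    for indication, freqs in indication_adverse_events.items():
        row = [freqs.get(ae, 0.0) for ae in adverse_event_order]
        row += [test_biomarkers.get(f, 0.0) for f in feature_order]
        candidates[indication] = row
    return candidates


# Hypothetical example values.
test_biomarkers = {("Heart", "Alas2"): 0.5, ("Skin", "Apod"): -0.3}
second_test_group = {
    "Nerve injury": {"Headache": 0.10, "Nausea": 0.02},
    "Type 2 diabetes mellitus": {"Headache": 0.00, "Nausea": 0.05},
}
candidates = build_candidate_rows(
    test_biomarkers,
    second_test_group,
    adverse_event_order=["Headache", "Nausea"],
    feature_order=[("Heart", "Alas2"), ("Skin", "Apod")],
)
```

Under this assumption, the trained model would score each candidate row, and indications whose rows score highly would be reported as predicted indications.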
- the plurality of known indications may include not only indications used as the second training data but also known indications registered in an external database.
- “Plurality” can mean, for example, 100, 500, 1000, 2000, or even more.
- In the prediction method, the test substance need not be an existing substance or an equivalent substance of an existing substance. If the test substance is neither, the prediction method is a method for predicting the indication of a new substance.
- In the prediction method, the test substance may also be an existing substance or an equivalent substance of an existing substance.
- In this case, the prediction method is a drug repositioning method for searching for a new indication of an existing substance or an equivalent substance of an existing substance.
- In the prediction method described herein, it is preferable that the test substance is also included among the existing substances administered to obtain the first training data group. By doing so, the prediction accuracy can be improved.
- the conventional method shown in FIG. 2 is the method described in Patent Document 2.
- Drugs A, B, and C, as existing substances, are individually administered to non-human animals such as mice, and an organ, or tissue that is part of an organ, is collected from each non-human animal.
- the behavior of the biomarker in the collected organ or tissue is analyzed, and the first training data group is generated.
- a second training data is generated from a human clinical database of adverse events, indications, pharmacokinetics, indications, etc. of existing substances.
- the artificial intelligence model shown in FIG. 2 is generated by training using the first training data group and the second training data.
- the conventional method builds an artificial intelligence model by associating the behavior of the biomarker with each of the adverse events, indications, pharmacokinetics, or indications of the existing substance.
- The test data used in the conventional method is data showing the behavior of the biomarker in one or more different organs of a non-human animal to which the test substance was administered, the organs corresponding to those collected at the time of generation of the first training data group.
- This embodiment differs from the conventional method in that not only the behavior of the biomarker but also the information on the adverse event assigned to the indication name is used as the training data. Also, as test data, not only the behavior of biomarkers but also information on a plurality of known indications and adverse events will be used.
- Even if the test substance has an indication that is unknown for the existing substances used when acquiring the training data, that indication can be predicted.
- Non-human animals are not limited in the present disclosure. Examples thereof include mammals such as mice, rats, dogs, cats, rabbits, cows, horses, goats, sheep and pigs, and birds such as chickens. Mammals such as mice, rats, dogs, cats, cows, horses and pigs are preferable, mice or rats are more preferable, and mice are even more preferable. Non-human animals also include fetuses, chicks and the like of the animals.
- “Substances” include, for example, compounds; nucleic acids; sugars; lipids; glycoproteins; glycolipids; lipoproteins; amino acids; peptides; proteins; polyphenols; chemokines; at least one metabolite selected from the group consisting of terminal metabolites, intermediate metabolites, and synthetic raw materials of these substances; metal ions; microorganisms; and the like.
- the substance may be a simple substance or a mixture of a plurality of kinds of substances.
- the “substance” includes pharmaceuticals, quasi-drugs, medicinal cosmetics, foods, foods for specified health uses, foods with functional claims, and candidate products thereof.
- The “substance” may also include a substance whose study was discontinued or suspended in preclinical or clinical trials for regulatory approval.
- Existing substance is not limited as long as it is an existing substance. Preferably, it is a substance whose action in humans is known.
- the "equivalent substance of an existing substance” may include a substance having a structure similar to that of the existing substance and having an action similar to that of the existing substance.
- A similar action means an action of the same kind as that of the existing substance, whether or not the strength of the action is the same.
- Adverse events are not limited as long as they are actions that are judged to be harmful to humans.
- For example, adverse events listed in external databases such as FAERS (https://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm) and clinicaltrials.gov (https://clinicaltrials.gov/) can be exemplified.
- “Indications” are not limited as long as they are intended to reduce, treat, stop or prevent diseases and symptoms in humans.
- For example, diseases or symptoms listed in external databases such as the above-mentioned FAERS, DAILYMED all drug labels (https://dailymed.nlm.nih.gov/dailymed/spl-resources-all-drug-labels.cfm), Medical Subject Headings (https://www.nlm.nih.gov/mesh/meshhome.html), Drugs@FDA (https://www.accessdata.fda.gov/scripts/cder/daf/), and the International Classification of Diseases (https://www.who.int/health-topics/international-classification-of-diseases) can be exemplified.
- the indications are ischemic diseases such as thrombosis, embolism, stenosis (particularly heart, brain, lung, large intestine, etc.); circulatory disorders such as aneurysm, venous aneurysm, congestion, bleeding (aorta).
- Symptoms or diseases associated with infectious diseases (bacteria, viruses, rickettsia, chlamydia, fungi, protozoa, parasites, etc.).
- Renal diseases; autoimmune diseases such as systemic lupus erythematosus and multiple sclerosis; and the like can be exemplified.
- the incidence of adverse events can be determined by the following method.
- Words indicating the names of adverse events are extracted from databases such as the above-mentioned clinicaltrials.gov, FAERS, and DAILYMED's all drug labels by text extraction or the like.
- the "organ” is not limited as long as it is an organ existing in the body of the mammal or bird described above.
- The organs include circulatory organs (heart, arteries, veins, lymph vessels, etc.), respiratory organs (nasal cavity, paranasal sinuses, larynx, trachea, bronchi, lungs, etc.), digestive system organs ().
- The "organs" include at least one selected from bone marrow, pancreas, skull, liver, skin, brain, pituitary gland, adrenal gland, thyroid gland, spleen, thymus, heart, lung, aorta, skeletal muscle, testis, peri-mitral fat, eyeball, ileum, stomach, jejunum, large intestine, adrenal gland, and parotid gland.
- The plurality of organs is not limited as long as there are two or more. For example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, or 24 kinds of organs can be selected.
- "Organ-derived" means, for example, that the material was collected from an organ, or that it was cultured from cells, tissues, or body fluid of the collected organ.
- the "body fluid” includes serum, plasma, urine, cerebrospinal fluid, ascites, pleural effusion, saliva, gastric fluid, pancreatic fluid, bile, milk, lymph fluid, interstitial fluid, and the like.
- Biomarker refers to an in vivo substance that can fluctuate in cells or tissues of each organ and / or body fluid depending on the administration of the substance.
- In vivo substances that can be “biomarkers” include at least one selected from nucleic acids; sugars; lipids; glycoproteins; glycolipids; lipoproteins; amino acids; peptides; proteins; polyphenols; chemokines; at least one metabolite selected from the group consisting of terminal metabolites, intermediate metabolites, and synthetic raw materials of these substances; metal ions; and the like. More preferably, it is a nucleic acid.
- The biomarker is preferably a group of in vivo substances that can vary in the cells or tissues of each organ and / or in body fluids, depending on the administration of the substance.
- Examples of the group of in vivo substances include at least one group selected from nucleic acids; sugars; lipids; glycoproteins; glycolipids; lipoproteins; amino acids; peptides; proteins; polyphenols; chemokines; at least one metabolite selected from the group consisting of terminal metabolites, intermediate metabolites, and synthetic raw materials of these substances; metal ions; and the like.
- the "nucleic acid” is preferably a group of RNA contained in a transcriptome such as mRNA, non-coding RNA, and microRNA, and more preferably a group of mRNA.
- the RNA is preferably mRNA, untranslated RNA and / or microRNA that can be expressed in cells or tissues of the above organs, or cells in body fluids, and more preferably mRNA or untranslated RNA that can be detected by RNA-Seq or the like.
- the "group of data showing the behavior of the biomarker” is intended to be a group of data showing that the biomarker fluctuated or did not fluctuate according to the administration of the existing substance.
- the behavior of the biomarker indicates that the biomarker fluctuated with administration of the existing substance.
- The data can be obtained, for example, by the following method. The abundance or concentration of each biomarker is measured in tissues, cells, body fluids, etc. derived from a given organ collected from a non-human animal to which an existing substance has been administered, giving the measured values for each organ of the administered individual. Similarly, the abundance or concentration of each biomarker is measured in tissues, cells, body fluids, etc. derived from the corresponding organ of a non-human animal to which the existing substance has not been administered, giving the measured values of the non-administered individual. The measured value of each biomarker derived from each organ of the administered individual is compared with the measured value of the corresponding biomarker in the corresponding organ of the non-administered individual, and a value indicating the difference is obtained as the data.
- "Corresponding" means that the organ and the biomarker are identical or of the same type.
- The difference can be indicated by the ratio (e.g., the quotient) of the measured value of each biomarker derived from the existing substance-administered individual to the measured value of the corresponding biomarker in the non-administered individual.
- For example, the data is the value obtained by dividing the measured value of biomarker A of organ A derived from an existing substance-administered individual by the measured value of biomarker A of organ A derived from a non-administered individual.
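The ratio described above can be computed as in the following sketch. The dictionary layout keyed by (organ, biomarker), the function name `fold_change`, and the measured values are hypothetical; the log2 transform is added only for illustration, since log2(fold) values are used elsewhere in this description.

```python
import math


def fold_change(administered, non_administered):
    """Return {(organ, biomarker): ratio of administered to non-administered}."""
    ratios = {}
    for key, treated in administered.items():
        control = non_administered.get(key)
        if control is None or control == 0:
            continue  # no corresponding measurement, or ratio undefined
        ratios[key] = treated / control
    return ratios


# Hypothetical measured values keyed by (organ, biomarker).
administered = {("Heart", "Alas2"): 8.0, ("Skin", "Apod"): 3.0}
non_administered = {("Heart", "Alas2"): 2.0, ("Skin", "Apod"): 6.0}

ratios = fold_change(administered, non_administered)
log2_ratios = {key: math.log2(value) for key, value in ratios.items()}
```

Pairs without a corresponding non-administered measurement are skipped rather than guessed, which is a design choice of this sketch and not something the text specifies.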
- Total RNA that can be analyzed by RNA-Seq may be used as the RNA. Alternatively, the expression of the RNA may be analyzed using, for example, WGCNA (https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/) and divided into subsets (modules) of data showing the behavior of each RNA associated with the organ name and the gene name. For each module divided by WGCNA, Pearson's correlation coefficient with the 1-of-K representation of each existing substance is calculated, and the module with the highest absolute value of the correlation coefficient is selected for each existing substance.
- The RNA in each organ contained in the selected module may be used as a biomarker.
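The module selection step can be sketched as follows. This does not call WGCNA itself; it assumes each module has already been summarized as one value per sample (a hypothetical per-sample module profile), and the names `pearson` and `select_modules` and all values are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / (sd_x * sd_y)


def select_modules(module_profiles, sample_substances):
    """For each substance, pick the module with the largest |correlation|.

    module_profiles: {module name: [value per sample]}
    sample_substances: [substance administered to each sample]
    """
    selected = {}
    for substance in sorted(set(sample_substances)):
        # 1-of-K (one-hot) indicator of this substance across samples.
        one_hot = [1.0 if s == substance else 0.0 for s in sample_substances]
        selected[substance] = max(
            module_profiles,
            key=lambda m: abs(pearson(module_profiles[m], one_hot)),
        )
    return selected


# Hypothetical data: six samples, two per substance.
sample_substances = ["A", "A", "B", "B", "C", "C"]
module_profiles = {
    "M1": [1.0, 0.9, 0.1, 0.0, 0.1, 0.0],
    "M2": [0.0, 0.1, -1.0, -0.9, 0.1, 0.0],
    "M3": [0.0, 0.1, 0.0, 0.1, 1.0, 0.9],
}
selected = select_modules(module_profiles, sample_substances)
```

Because the absolute value is used, a module that is strongly down-regulated by a substance (such as `M2` for substance `B` in this toy data) is selected just as readily as one that is up-regulated.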
- The change in the transcriptome in each organ of an existing substance-administered animal, compared with a non-administered animal, should be analyzed using DESeq2.
- The expression level of RNA in each organ collected from an animal to which an existing substance was administered and the expression level of the gene in each corresponding organ collected from an animal to which the existing substance was not administered are quantified by htseq-count to obtain count data for each. The expression levels are then compared for each organ and for each gene in each organ.
- For each gene in each organ, the log2(fold) value of the change in gene expression in the existing substance-administered animal and the p-value, which is an index of the certainty of each change, are output. Based on the log2(fold) value, it is possible to determine the presence or absence of the behavior of a biomarker such as a transcriptome.
- the measured value of the biomarker can be obtained by a known method.
- When the biomarker is a nucleic acid, the measured value can be obtained by sequencing such as RNA-Seq, quantitative PCR, or the like.
- When the biomarker is at least one selected from sugars; lipids; glycolipids; amino acids; polyphenols; chemokines; and at least one metabolite selected from the group consisting of terminal metabolites, intermediate metabolites, and synthetic raw materials of these substances, the measured value can be obtained by mass spectrometry or the like.
- When the biomarker is a glycoprotein, lipoprotein, peptide, protein or the like, the measured value can be obtained by an ELISA method (Enzyme-Linked Immunosorbent Assay) or the like.
- a method for collecting tissue, cells, or body fluid derived from an organ used for measurement, and a pretreatment method for measuring biomarkers are also known.
- Test substance is a substance to be evaluated for its action.
- the test substance may be an existing substance, an equivalent of an existing substance, or a novel substance.
- the action of the test substance in humans can be predicted even when the relationship between the action of the test substance and the action of the existing substance or the equivalent substance of the existing substance is not found.
- When the test substance is one selected from an existing substance or an equivalent of an existing substance, an unknown action of that existing substance or equivalent can be found.
- the unknown action may be one or more.
- the unknown effect is preferably a new application.
- Drug repositioning can also be performed by predicting new indications for the test substance in humans. Administration of the test substance to non-human animals is known.
- the data showing the behavior of the biomarker in one or more organs collected from the non-human animal to which the test substance was administered shows the behavior of the biomarker in one or more organs collected from the non-human animal to which the existing substance was administered. It can be obtained in the same way as the data indicating.
- The first training data group is composed of a group of data indicating the behavior of biomarkers in one or a plurality of different organs and labels indicating the existing substance names. The one or more different organs can be harvested from each non-human animal individually administered with a plurality of existing substances known to act in humans.
- the first training data group can be stored as the database TR1 in the auxiliary storage unit 104 of the training device 10 shown in FIG.
- The group of data showing the behavior of biomarkers in one or more different organs can be obtained by the method described in 1.(4) above.
- Each of the data showing the behavior of the biomarker in each of the organs can be associated with information on the name of the existing substance administered, information on the name of the collected organ, information on the name of the biomarker, and the like.
- the information about the name may be the name itself, a label such as an abbreviation, or a label value corresponding to each name.
- Each data included in the group of data showing the behavior of the biomarker is an element constituting a matrix in the first training data group of the artificial intelligence model described later.
- When the biomarker is a transcriptome, the expression level of each RNA corresponds to the data included in the group of data showing the behavior of the biomarker, and becomes an element of the matrix constituting the first training data group.
- In that case, the log2(fold) value of each existing substance obtained by DESeq2 analysis may be used as each element of the first training data group.
- An example of the first training data group is as shown in 1. (1) above and FIG. 3 (A).
- The measured value of the biomarker may be used as it is as an element of the first training data group, or it may be used after standardization, dimension reduction, or the like.
- As a standardization method, for example, a method of converting the data showing expression differences so that the mean is 0 and the variance is 1 can be exemplified.
- the mean value in the standardization can be the mean value in each organ, the mean value in each gene, or the mean value in all the data.
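The standardization described above (mean 0, variance 1, with the mean taken per organ or over all data) can be sketched as follows. This is a minimal illustration, not the method of the specification: the function names, the organ-keyed dictionary layout, and the example values are all assumptions.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: the result has mean 0 and (population) variance 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant series carries no variation; map it to zeros.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def standardize_per_organ(data):
    """Standardize each organ's values separately (mean taken per organ)."""
    return {organ: standardize(vals) for organ, vals in data.items()}


def standardize_all(data):
    """Standardize with the mean and variance of all data pooled together."""
    pooled = [v for vals in data.values() for v in vals]
    mu, sigma = mean(pooled), pstdev(pooled)
    return {
        organ: [(v - mu) / sigma if sigma else 0.0 for v in vals]
        for organ, vals in data.items()
    }


# Hypothetical expression-difference values, keyed by organ name.
example = {"liver": [1.0, 2.0, 3.0], "kidney": [10.0, 10.5, 11.0]}
per_organ = standardize_per_organ(example)
pooled_result = standardize_all(example)
```

Per-organ scaling removes differences in overall expression level between organs, while pooled scaling preserves them; which population is appropriate depends on the analysis.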
- the dimension reduction can be performed by statistical processing such as principal component analysis.
- the population for statistical processing can be organ-by-organ, gene-by-gene, or whole data.
- When the biomarker is a transcriptome, genes whose p-value for the log2(fold) value of each existing substance obtained by DESeq2 analysis is equal to or less than a predetermined value may be used as elements of the first training data group.
- The predetermined value can be, for example, 10⁻³ or 10⁻⁴, and is preferably 10⁻⁴.
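The p-value filter described above can be sketched as a simple threshold on per-gene results. The gene names, the values, and the `(log2_fold, p_value)` tuple layout below are hypothetical; the sketch assumes the DESeq2 output has already been exported into a Python dictionary.

```python
def select_genes(deseq2_results, threshold=1e-4):
    """Keep genes whose p-value is at or below the threshold.

    deseq2_results maps gene name -> (log2_fold, p_value).
    Returns gene name -> log2_fold for the retained genes.
    """
    return {
        gene: log2_fold
        for gene, (log2_fold, p_value) in deseq2_results.items()
        if p_value is not None and p_value <= threshold
    }


# Hypothetical results for one existing substance in one organ.
results = {
    "GeneA": (2.1, 3e-6),
    "GeneB": (-0.2, 0.4),
    "GeneC": (1.5, 1e-4),
    "GeneD": (0.9, 5e-4),
}
kept_strict = select_genes(results)                  # threshold 10^-4
kept_loose = select_genes(results, threshold=1e-3)   # threshold 10^-3
```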
- the label indicating the name of each of the predetermined existing substances administered, which is included in the first training data group may be the name of the substance itself, or may be encoded.
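As one possible reading of "encoded", each substance name can be mapped to an integer label value. The sketch below is illustrative only; the function name and the first-appearance ordering are assumptions, and the substance names are taken from the examples later in this description.

```python
def encode_labels(names):
    """Assign each distinct name an integer label value, in order of first appearance."""
    codes = {}
    for name in names:
        if name not in codes:
            codes[name] = len(codes)
    return codes


substances = ["alendronate", "acetaminophen", "aripiprazole", "alendronate"]
codes = encode_labels(substances)
encoded = [codes[name] for name in substances]
```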
- the first training data group can be updated by updating existing substances and adding data showing the behavior of new biomarkers.
- the second training data group is described in the above 1.
- As described in 1. (4) above, the indications for existing substances can be obtained by searching external databases such as FAERS, DAILYMED's all drug labels, Medical Subject Headings, Drugs@FDA, and International Classification of Diseases for each existing substance and retrieving the label of the corresponding indication name. There can be one or more indications for one existing substance.
- the two or more indications constitute the second training data group.
- Labels indicating the indications reported for each of a plurality of predetermined existing substances can be obtained by performing text extraction, natural language processing, digitizing processing, image analysis processing, etc. on the data group stored in the database. For example, when a label indicating the name of each indication corresponding to each existing substance administered to the non-human animal when generating the first training data group is registered in the external database as part of a sentence, the registered sentence may be subjected to parsing, word division, semantic analysis, etc. by natural language processing, and then the text corresponding to the action may be extracted.
- As described in 1. above, the third training data group is a group of data in which the labels indicating the indications reported for each of the plurality of predetermined existing substances administered when acquiring the first training data group (shown in FIG. 3 (B)) are associated with information on the adverse events reported for each of these indications.
- The indications reported for each of the plurality of predetermined existing substances can be obtained by searching external databases such as FAERS, DAILYMED's all drug labels, Medical Subject Headings, Drugs@FDA, and International Classification of Diseases by the name of each existing substance and retrieving the label of the corresponding indication name.
- Labels indicating the adverse events reported for each of these indications can be searched for and obtained from an external database such as FAERS or clinicaltrials.gov using the label indicating the indication name. In addition, when a label indicating the name of an indication or an adverse event is registered as part of a sentence, syntactic analysis, word division, semantic analysis, etc. may be performed on the registered sentence by natural language processing, and then the text corresponding to the action may be extracted. The frequency of adverse events can be calculated by the method described in 1. (4) above.
- As described in 1. (1) above and FIG. 3 (D), the fourth training data group is generated by associating the label indicating the drug name included in the first training data group (the first column showing the drug name in FIG. 3 (A)) with the frequency of adverse events reported for the indications corresponding to the label of the existing substance administered to obtain the training data (the frequency of occurrence of adverse events in the second and subsequent columns shown in FIG. 3 (C)).
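One way to picture how the fourth training data group is generated is as a join: each existing substance's biomarker vector is paired with the adverse-event frequency vector of every indication reported for that substance. The sketch below is an interpretation under stated assumptions, not the procedure of the specification. The dictionary layouts, the function name, and the drug and indication names are all hypothetical.

```python
def build_fourth_training_group(first, second, third):
    """Pair each drug's biomarker vector with the adverse-event frequency
    vector of every indication reported for that drug.

    first:  drug name -> biomarker vector (first training data group)
    second: drug name -> list of indication labels (second training data group)
    third:  indication label -> adverse-event frequency vector (third group)
    """
    rows = []
    for drug, biomarker_vector in first.items():
        for indication in second.get(drug, []):
            if indication in third:
                rows.append((drug, indication, biomarker_vector, third[indication]))
    return rows


# Hypothetical toy data.
first = {"drug_X": [0.5, -1.2, 0.0], "drug_Y": [1.1, 0.3, -0.7]}
second = {"drug_X": ["indication_1"], "drug_Y": ["indication_1", "indication_2"]}
third = {"indication_1": [0.10, 0.02], "indication_2": [0.00, 0.30]}
fourth = build_fourth_training_group(first, second, third)
```

Each resulting row is a known drug-indication link, which is the kind of positive example a link-prediction model is trained on.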
- The artificial intelligence model is not limited as long as it can solve the problem according to the present invention. In this embodiment, it is preferable to use an artificial intelligence model capable of performing link prediction.
- One-Class SVM one-class support vector machine
- the data to be input to the One-class SVM is input to the One-class SVM as the fourth training data group by associating the first training data group and the third training data group with the kernel function of the following equation.
- $k(g_A d_1,\ g_B d_2) = \langle g_A, g_B \rangle \, \langle d_1, d_2 \rangle$
- Here, $\langle \cdot, \cdot \rangle$ denotes an operator that scales each vector so that its L2 norm becomes 1, and then takes the inner product of the two scaled vectors.
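The kernel can be sketched in a few lines, assuming it is the product of the two normalized inner products. Here `g` is taken to be the biomarker vector of a substance and `d` the adverse-event frequency vector of an indication; that reading of the symbols, the function names, and the zero-vector handling are assumptions of this sketch.

```python
import math


def normalized_inner(u, v):
    """Scale u and v to unit L2 norm, then take their inner product."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero vector has no direction, so treat similarity as 0.
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def pair_kernel(g_a, d_1, g_b, d_2):
    """k(g_A d_1, g_B d_2) = <g_A, g_B> * <d_1, d_2> on normalized vectors."""
    return normalized_inner(g_a, g_b) * normalized_inner(d_1, d_2)


# Two (substance, indication) pairs pointing in the same directions.
same = pair_kernel([1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 5.0])
# Orthogonal substance vectors give a kernel value of 0.
orthogonal = pair_kernel([1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0])
```

Because each factor is a cosine similarity, the kernel value is large only when both the substances and the indications of the two pairs resemble each other.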
- Training system for artificial intelligence model
- FIG. 4 (A) shows the hardware configuration of the training system 50.
- The training system 50 includes a measuring unit 30, such as a next-generation sequencer, for acquiring measurement data of a biomarker, and a training device 10.
- The training device 10 and the measuring unit 30 may be communicably connected by a wireless or wired network, or the data acquired by the measuring unit 30 may be transferred to the training device 10 via a storage medium such as a CD-R.
- (1) Artificial intelligence model training device
- Training of the artificial intelligence model can be performed using, for example, a training device 10 (hereinafter, also referred to as a device 10).
- the device 10 includes at least a processing unit 101 and a storage unit.
- the storage unit is composed of a main storage unit 102 and / or an auxiliary storage unit 104.
- FIG. 5 shows the hardware configuration of the device 10.
- the device 10 may be connected to the input unit 111, the output unit 112, and the storage medium 113. Further, it may be connected to a measuring unit 30 such as a next-generation sequencer or a mass spectrometer.
- The device 10 may be communicably connected to an external database 60 such as FAERS, DAILYMED's all drug labels, Medical Subject Headings, Drugs@FDA, International Classification of Diseases, and clinicaltrials.gov.
- the output interface (I / F) 107 and the media interface (I / F) 108 are connected to each other by a bus 109 so as to be capable of data communication.
- the processing unit 101 is composed of a CPU, an MPU, or the like.
- the processing of the processing unit 101 may be assisted by the GPU.
- the device 10 functions when the processing unit 101 executes a computer program stored in the auxiliary storage unit 104 or the ROM 103 and processes the acquired data.
- The processing unit 101 acquires, as training data, the group of data showing the behavior of biomarkers in a plurality of different organs collected from non-human animals to which the existing substance was administered, as described in 1. above, and the known action of the existing substance in humans.
- the artificial intelligence model is trained using the above two training data.
- the ROM 103 is composed of a mask ROM, a PROM, an EPROM, an EEPROM, and the like, and records a computer program executed by the processing unit 101 and data used for the computer program.
- the ROM 103 stores a boot program executed by the processing unit 101 when the device 10 is started, a program related to the operation of the hardware of the device 10, a setting, and the like.
- the main storage unit 102 is composed of a RAM (Random access memory) such as a SRAM or a DRAM.
- the main storage unit 102 is used for reading the computer program recorded in the ROM 103 and the auxiliary storage unit 104. Further, the main storage unit 102 is used as a work area when the processing unit 101 executes these computer programs.
- The main storage unit 102 temporarily stores the training data acquired via the network, the functions of the artificial intelligence model read from the auxiliary storage unit 104, and the like.
- the auxiliary storage unit 104 is composed of a hard disk, a semiconductor memory element such as a flash memory, an optical disk, or the like.
- the auxiliary storage unit 104 stores various computer programs to be executed by the processing unit 101 and various setting data used for executing the computer programs.
- the database TR3 that stores the training data group is stored non-volatilely.
- the training program TP cooperates with the operation software (OS) 1041 to perform artificial intelligence training processing described later.
- The communication I / F 105 is composed of a serial interface such as USB, IEEE1394, or RS-232C, a parallel interface such as SCSI, IDE, or IEEE1284, an analog interface including a D / A converter and an A / D converter, a network interface controller (NIC), and the like.
- The communication I / F 105 functions as a communication unit 105. Under the control of the processing unit 101, it receives data from the measuring unit 30 or another external device and, as needed, transmits or displays information stored or generated by the device 10 to the measuring unit 30 or to the outside.
- the communication I / F 105 may communicate with the measuring unit 30 or another external device (not shown, for example, another computer or a cloud system) via a network.
- the input I / F 106 is composed of, for example, a serial interface such as USB, IEEE1394, RS-232C, a parallel interface such as SCSI, IDE, IEEE1284, and an analog interface including a D / A converter and an A / D converter.
- the input I / F 106 accepts character input, click, voice input, and the like from the input unit 111.
- the received input contents are stored in the main storage unit 102 or the auxiliary storage unit 104.
- the input unit 111 is composed of a touch panel, a keyboard, a mouse, a pen tablet, a microphone, and the like, and inputs characters or voices to the device 10.
- the input unit 111 may be connected from the outside of the device 10 or may be integrated with the device 10.
- the output I / F 107 is composed of an interface similar to that of the input I / F 106, for example.
- the output I / F 107 outputs the information generated by the processing unit 101 to the output unit 112.
- the output I / F 107 outputs the information generated by the processing unit 101 and stored in the auxiliary storage unit 104 to the output unit 112.
- the output unit 112 is composed of, for example, a display, a printer, etc., and displays the measurement results transmitted from the measurement unit 30, various operation windows in the device 10, each training data, an artificial intelligence model, and the like.
- the media I / F 108 reads, for example, application software stored in the storage medium 113.
- the read application software and the like are stored in the main storage unit 102 or the auxiliary storage unit 104. Further, the media I / F 108 writes the information generated by the processing unit 101 into the storage medium 113.
- the media I / F 108 writes the information generated by the processing unit 101 and stored in the auxiliary storage unit 104 to the storage medium 113.
- the storage medium 113 is composed of a flexible disk, a CD-ROM, a DVD-ROM, or the like.
- the storage medium 113 is connected to the media I / F 108 by a flexible disk drive, a CD-ROM drive, a DVD-ROM drive, or the like.
- the storage medium 113 may store an application program or the like for the computer to execute an operation.
- the processing unit 101 may acquire the application software and various settings necessary for controlling the device 10 via the network instead of reading from the ROM 103 or the auxiliary storage unit 104.
- For example, the application program may be stored in the auxiliary storage unit of a server computer on the network, and the device 10 can access the server computer, download the computer program, and store it in the ROM 103 or the auxiliary storage unit 104.
- In the ROM 103 or the auxiliary storage unit 104, an operating system that provides a graphical user interface environment, such as Windows (registered trademark) manufactured and sold by Microsoft Corporation in the United States, is installed.
- the application program according to the second embodiment shall run on the operating system. That is, the device 10 can be a personal computer or the like.
- In step S1, the processing unit 101 receives the processing start command input by the operator from the input unit 111, and acquires the first training data group, the second training data group, and the third training data group from the first training data group database TR1, the second training data group database TR2, and the third training data group database TR3, respectively, which are stored in the auxiliary storage unit 104.
- the processing unit 101 receives the generation start command of the fourth training data group input from the input unit 111 by the operator, and generates the fourth training data group in step S2.
- The processing unit 101 receives an input command for the fourth training data group input by the operator from the input unit 111, and in step S3 inputs the fourth training data group to the artificial intelligence model AI1 to train the artificial intelligence model.
- the processing unit 101 stores the trained artificial intelligence model in the auxiliary storage unit 104.
- the transition between the steps may be performed by the operator by inputting a command, but the processing unit 101 may automatically proceed by using the completion of the previous step as a trigger.
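The two transition modes described above (operator command versus automatic progression triggered by completion of the previous step) can be sketched with a small step runner. This is only an illustration. The `run_steps` function, the `confirm` callback standing in for an operator command, and the placeholder step bodies are hypothetical, and S4 is assumed here to be the step that stores the trained model.

```python
def run_steps(steps, confirm=None):
    """Run steps in order, passing each step's result to the next.

    confirm: optional callable(step_name) -> bool that stands in for an
    operator command. If None, each step starts automatically as soon as
    the previous one completes.
    """
    state = None
    completed = []
    for name, step in steps:
        if confirm is not None and not confirm(name):
            break  # the operator did not issue the command for this step
        state = step(state)
        completed.append(name)
    return state, completed


# Placeholder step bodies; real steps would load data and train a model.
steps = [
    ("S1", lambda _: {"groups": ["first", "second", "third"]}),
    ("S2", lambda s: {**s, "fourth": "generated"}),
    ("S3", lambda s: {**s, "model": "trained"}),
    ("S4", lambda s: {**s, "stored": True}),
]

final_state, done = run_steps(steps)  # automatic transitions
partial_state, partial = run_steps(steps, confirm=lambda n: n != "S3")
```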
- The first test data group is a group of data showing the behavior of biomarkers in one or a plurality of different organs, and can be obtained from organs corresponding to the one or more different organs from which the first training data was acquired.
- The group of data showing the behavior of biomarkers in each organ can be obtained by the method described in 1. (4) above, in the same manner as the data group showing the behavior of the biomarker used as the first training data.
- As described in 1. above, the second test data is a group of data in which labels of a plurality of known indications are linked with information on the adverse events reported for each of the plurality of known indications.
- Labels of the plurality of known indications, and labels indicating the adverse events reported for each of these indications, can be searched for and obtained from an external database such as FAERS or clinicaltrials.gov using a label indicating the indication name.
- syntactic analysis, word division, semantic analysis, etc. are performed on the registered sentence by natural language processing. After that, the text corresponding to the action may be extracted.
- The frequency of adverse events can be calculated by the method described in 1. (4) above.
- FIG. 4A shows the hardware configuration of the prediction system 51.
- The prediction system 51 includes a measurement unit 30, such as a next-generation sequencer, for acquiring measurement data of a biomarker, and a prediction device 20.
- the prediction device 20 and the measurement unit 30 may be connected by a wireless or wired network, but the data acquired by the measurement unit 30 may be acquired via a storage medium such as a CD-R.
- the prediction of the indication can be performed using, for example, a prediction device 20 (hereinafter, may be simply referred to as a device 20).
- FIG. 7 shows the hardware configuration of the prediction device 20 (hereinafter, also referred to as the device 20).
- the device 20 includes at least a processing unit 201 and a storage unit.
- the storage unit is composed of a main storage unit 202 and / or an auxiliary storage unit 204.
- the device 20 may be connected to the input unit 211, the output unit 212, and the storage medium 213. Further, it may be connected to a measuring unit 30 such as a next-generation sequencer or a mass spectrometer.
- the output interface (I / F) 207 and the media interface (I / F) 208 are connected to each other by bus 209 so as to be capable of data communication.
- the communication interface 205 functions as a communication unit 205.
- In place of the operation software (OS) 1041, the training program TP, the artificial intelligence model AI1, the database TR1 that stores the first training data group, and the databases that store the other training data groups, the auxiliary storage unit 204 of the device 20 stores, in a non-volatile manner, the operation software (OS) 2041, the prediction program PP, the trained artificial intelligence model AI2, and the database TS1 that stores the first test data group.
- the prediction program PP cooperates with the operation software (OS) 2041 to perform prediction processing of indications described later.
- the processing unit 201 receives the processing start command input from the input unit 211 by the operator, and acquires the first test data group and the second test data group stored in the auxiliary storage unit 204 in step S51.
- The processing unit 201 receives the prediction start command input by the operator from the input unit 211, and in step S52 inputs the first test data group and the second test data group, held in the first test data group database TS1 and the second test data group database TS2, into the trained artificial intelligence model AI2 to predict the indication of the test substance.
- the trained artificial intelligence model AI2 individually determines whether or not the target test substance is effective for all the indications input as the second test data. Specifically, the trained artificial intelligence model AI2 determines whether or not there is a link between the target drug and the individual indication in the LP problem.
- the processing unit 201 stores the result in the storage unit.
- As the result derived by the processing unit 201, the trained artificial intelligence model AI2 returns the label "1" if the test substance works for a certain indication, and the label "-1" if it does not. That is, an indication marked with "1" is a predicted indication of the test substance.
- When the artificial intelligence model is a One-Class SVM, decision function values indicating the reliability of the prediction are also calculated.
- The higher this value, the higher the predicted likelihood that the indication applies.
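The labeling and ranking described above can be sketched as follows. The sketch assumes that the label is obtained by thresholding the decision function value at zero, as is common in one-class SVM implementations; the specification does not state this. The indication names and scores are hypothetical.

```python
def rank_predicted_indications(decision_values):
    """Label each indication 1 or -1 and rank the positives by score.

    decision_values maps indication label -> decision function value.
    Assumption: a value >= 0 means the link is predicted to exist.
    """
    labels = {
        indication: (1 if score >= 0 else -1)
        for indication, score in decision_values.items()
    }
    ranked = sorted(
        (indication for indication, label in labels.items() if label == 1),
        key=lambda indication: decision_values[indication],
        reverse=True,
    )
    return labels, ranked


# Hypothetical decision function values for one test substance.
scores = {"indication_1": 0.8, "indication_2": -0.3, "indication_3": 0.2}
labels, ranked = rank_predicted_indications(scores)
```

Here `ranked` lists only the indications labeled "1", from the most to the least confident prediction.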
- Another drug having a mechanism of action similar to that of the target test substance is administered to non-human animals as a test substance, and the transcriptome in one or more collected organs is obtained.
- The prediction result of the target test substance may be compared with the prediction result of the other test substance having a similar mechanism of action, and the indications common to both may be used as the prediction result.
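Taking the indications common to both prediction results amounts to a set intersection. A minimal sketch, with a hypothetical function name and hypothetical indication labels:

```python
def common_indications(target_predicted, similar_predicted):
    """Return the indications predicted for both substances, sorted by name."""
    return sorted(set(target_predicted) & set(similar_predicted))


# Hypothetical predicted indications (those labeled "1") for each substance.
target = ["indication_1", "indication_3", "indication_4"]
similar = ["indication_3", "indication_2", "indication_1"]
shared = common_indications(target, similar)
```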
- FIG. 4B shows the configuration of the prediction system 400.
- the measurement unit 30, the training device 10, the prediction device 20, and the server device 40 that transmits a data group indicating the behavior of the biomarker are communicably connected to each other.
- the training device 10 and the prediction device 20 acquire the data acquired by the measuring unit 30 via the server device 40.
- Server device
- Regarding the server device 40 (hereinafter, may be simply referred to as the device 40), for terms that are common to the terms described in 1. and 2-1. above, the above description is incorporated herein by reference.
- FIG. 9 shows the hardware configuration of the server device 40 (hereinafter, also referred to as the device 40).
- the device 40 includes at least a processing unit 401 and a storage unit.
- the storage unit is composed of a main storage unit 402 and / or an auxiliary storage unit 404.
- The device 40 may be connected to the input unit 411, the output unit 412, and the storage medium 413. In addition, it can be communicably connected to the measuring unit 30, such as a next-generation sequencer or a mass spectrometer, via a wireless or wired network.
- the output interface (I / F) 407 and the media interface (I / F) 408 are connected to each other by a bus 409 so as to be capable of data communication.
- the communication interface 405 functions as a communication unit 405.
- In place of the operation software (OS) 1041, the training program TP, the artificial intelligence model AI1, the database TR1 that stores the first training data group, and the databases that store the other training data groups, the auxiliary storage unit 404 of the device 40 stores, in a non-volatile manner, the operation software (OS) 4041 and the database TS1 that stores the first test data group.
- In step S81, the measuring unit 30 acquires the measured values of the biomarkers of each organ of the non-human animal to which the existing substance has been administered.
- The acquisition of the measured values in the measuring unit 30 can be started by the operator inputting a measurement start instruction.
- In step S82, the measuring unit 30 transmits the acquired measured values to the server device 40.
- The transmission process can be started by the operator inputting a transmission start instruction.
- In step S83, the processing unit 401 of the server device 40 acquires the measured values via the communication I / F 405.
- The communication I / F 405 functions as the communication unit 405.
- In step S84, in response to an instruction to start acquisition of the measured values input by the operator from the input unit 111 of the training device 10, the processing unit 101 of the training device 10 sends a signal to start measured value transmission from the communication I / F 105 to the server device 40.
- The processing unit 401 of the server device 40 receives the input for starting the measured value transmission via the communication I / F 405, and starts the transmission of the measured values from the communication I / F 405.
- The communication I / F 105 and the communication I / F 405 function as the communication unit 105 and the communication unit 405, respectively.
- In step S85, the processing unit 101 of the training device 10 acquires, from the external database 60 via the communication I / F 105, information on the indications of the existing substance administered to the non-human animal and on the adverse events corresponding to those indications.
- In step S86, the processing unit 101 of the training device 10 acquires the measured values transmitted from the server device 40 via the communication I / F 105 and stores them in the storage unit of the training device 10.
- Step S86 may be performed before step S85.
- In step S87 of FIG. 14, the processing unit 101 of the training device 10 generates the first training data group, the second training data group, and the third training data group according to the processing shown in step S1 of FIG. 6.
- The description of step S1 of FIG. 6 is incorporated herein by reference.
- In step S88 of FIG. 14, the processing unit 101 of the training device 10 generates the fourth training data group from the first training data group, the second training data group, and the third training data group according to the processing shown in step S2 of FIG. 6.
- The description of step S2 of FIG. 6 is incorporated herein by reference.
- In step S89 of FIG. 14, the processing unit 101 of the training device 10 inputs the fourth training data group to the artificial intelligence model according to the processing shown in steps S3 to S4 of FIG. 6, trains the artificial intelligence model, and stores the trained artificial intelligence model in the storage unit. The description of steps S3 to S4 in FIG. 6 is incorporated herein by reference.
- In step S90 of FIG. 14, after receiving the instruction from the prediction device 20 to start transmitting the artificial intelligence model, the processing unit 101 of the training device 10 transmits the stored trained artificial intelligence model to the prediction device 20 via the communication I / F 105. At this time, the communication I / F 105 functions as the communication unit 105.
- In step S91, the measuring unit 30 acquires the measured values of the biomarkers of each organ of the non-human animal to which the test substance has been administered.
- The acquisition of the measured values in the measuring unit 30 can be started by the operator inputting a measurement start instruction.
- In step S92, the measuring unit 30 transmits the acquired measured values to the server device 40.
- The transmission process can be started by the operator inputting a transmission start instruction.
- In step S93, the processing unit 401 of the server device 40 acquires the measured values via the communication I / F 405.
- The communication I / F 405 functions as the communication unit 405.
- In step S94, in response to an instruction to start acquisition of the measured values input by the operator from the input unit 211 of the prediction device 20, the processing unit 201 of the prediction device 20 sends a signal to start measured value transmission from the communication I / F 205 to the server device 40.
- The processing unit 401 of the server device 40 receives the input for starting the measured value transmission via the communication I / F 405, and starts the transmission of the measured values from the communication I / F 405.
- The communication I / F 205 and the communication I / F 405 function as the communication unit 205 and the communication unit 405, respectively.
- the processing unit 201 of the prediction device 20 acquires the measured value via the communication I / F 205 and stores it in the storage unit of the prediction device 20. Subsequently, the processing unit 201 of the prediction device 20 generates the first test data group. The generation of the first test data group is described in 2-4.
- In step S95, the processing unit 201 of the prediction device 20 transmits an artificial intelligence model transmission start instruction to the training device 10 via the communication I / F 205.
- When the processing unit 101 of the training device 10 receives the artificial intelligence model transmission start instruction from the prediction device 20, it transmits the trained artificial intelligence model to the prediction device 20 via the communication I / F 105 of the training device 10.
- The prediction device 20 acquires the trained artificial intelligence model via the communication I / F 205.
- Step S95 may be performed before step S94.
- In step S96, the processing unit 201 of the prediction device 20 inputs the first test data generated in step S94 and the second test data stored in the storage unit into the trained artificial intelligence model AI2 acquired in step S95, and predicts the action of the test substance in humans according to step S52 described above.
- In step S97, the processing unit 201 of the prediction device 20 outputs the result.
- In steps S94 to S97 of FIG. 14, the processing unit 201 of the prediction device 20 may perform steps S62 to S67 described in FIG. 13 to obtain a prediction result regarding a new application of the existing substance.
- the method of constructing a prediction system includes a step of preparing a training device 10 and a prediction device 20.
- The construction method may further include a preparation step for obtaining a measured value of a biomarker in one or more organs of a non-human animal to which an existing substance has been administered, or a measured value of a biomarker in one or more organs of a non-human animal to which a test substance has been administered.
- the training program TP is a computer program that causes the computer to function as the training device 10 by causing the computer to execute the processes including steps S1 to S4 of FIG. 6 described in the training of the artificial intelligence model.
- the prediction program PP is a computer program that causes the computer to function as the prediction device 20 by causing the computer to execute the processes including steps S51 to S53 described in the prediction of the action of the test substance.
- Storage medium for storing a computer program
- The present invention relates to a storage medium for storing a computer program.
- The computer program is stored in a storage medium such as a hard disk, a semiconductor memory element such as a flash memory, or an optical disk. Further, the computer program may be stored in a storage medium, such as a cloud server, that can be connected to via a network.
- the computer program may be a program product in download format or stored in a storage medium.
- The storage format of the program in the storage medium is not limited as long as the device can read the program.
- The storage in the storage medium is preferably non-volatile.
- 6. Modification example 2
- The embodiments above show the training device 10 and the prediction device 20 as different computers. However, one computer may both train the artificial intelligence model and perform the prediction. In the present specification, the same reference numerals attached to the hardware indicate the same parts or the same functions.
- Preparation of drug-administered mice and gene expression analysis
- 1. Administration of drug
- (1) Alendronate: Alendronate sodium salt trihydrate (Wako) dissolved in PBS (Nacalai Tesque) was injected subcutaneously into 11-week-old male C57BL / 6N mice at a dose of 1.0 mg / kg every 3 or 4 days for 8 days. The drug was freshly prepared for each dose. Each organ was collected in the afternoon of the 8th day after drug administration.
- Acetaminophen 10-week-old male C57BL / 6N mice were fasted for 12 hours, during which time they were allowed to freely ingest water.
- acetaminophen (Wako) dissolved in physiological saline (Otsuka Pharmaceutical Co., Ltd.) was administered intraperitoneally in a single dose of 300 mg / kg.
- the mice were allowed to freely ingest the usual diet.
- Administration was performed by noon, and organs were collected 2 hours after administration.
- Aripiprazole: Aripiprazole (Sigma-Aldrich) dissolved in 0.5% (w/v) carboxymethyl cellulose 400 solution (Wako) was administered intraperitoneally to 11-week-old male C57BL/6N mice as a single dose of 0.3 mg/kg. The drug was administered in the afternoon, and organs were collected 2 hours later.
- Cisplatin: Cisplatin (Bristol-Myers Squibb) was administered intraperitoneally to 11-week-old male C57BL/6N mice as a single dose of 20 mg/kg. Organs were collected in the afternoon of the third day after drug administration.
- Clozapine: Clozapine (Sigma-Aldrich) was administered subcutaneously to 11-week-old male C57BL/6N mice as a single dose of 0.3 mg/kg. Clozapine was first dissolved in acetic acid, then diluted with saline and adjusted to pH 6 with 1 M NaOH. Organs were collected in the afternoon, 2 hours after drug administration.
- Doxycycline: 9-week-old male C57BL/6N mice were given RO water containing 5% sucrose (Nacalai Tesque) and 2 mg/mL doxycycline hydrochloride n-hydrate (Wako) for 2 weeks. The drug-containing RO water was replaced with fresh water every week. Organs were collected in the afternoon of the 13th day after drug administration. The negative control group was given RO water supplemented with 5% sucrose (Nacalai Tesque).
- Lenalidomide: Lenalidomide (Wako) dissolved in a solution containing 0.5% carboxymethyl cellulose and 0.25% Tween-80 (Nacalai Tesque) was administered by oral gavage to 8-week-old male C57BL/6N mice at 50 mg/kg daily for 69 days. The drug was freshly prepared for each dose. Organs were collected in the afternoon of the 69th day after the start of drug administration. The negative control group received a solution containing 0.5% carboxymethyl cellulose and 0.25% Tween-80.
- Lurasidone: Lurasidone hydrochloride (Medchemexpress) dissolved in 0.5% carboxymethyl cellulose solution was administered by oral gavage to 11-week-old male C57BL/6N mice as a single dose of 0.3 mg/kg. Organs were collected in the afternoon, 2 hours after drug administration.
- Olanzapine: Olanzapine (Tokyo Chemical Industry Co., Ltd.) dissolved in 0.5% carboxymethyl cellulose solution was administered by oral gavage as a single dose of 0.3 mg/kg. Organs were collected in the afternoon, 2 hours after drug administration.
- Evolocumab (Repatha(TM)): Repatha(TM) (Astellas Pharma Inc.) dissolved in saline was administered subcutaneously to 11-week-old male C57BL/6N mice at a dose of 10 mg/kg every 10 days for 4 weeks. Organs were collected in the afternoon, 4 weeks after drug administration.
- Sofosbuvir: Sofosbuvir (LKT) was administered intraperitoneally to 7-week-old male C57BL/6N mice at a dose of 20 mg/kg daily for 10 days. Sofosbuvir was first diluted with DMSO (Nacalai Tesque) and then diluted 100-fold with PBS before administration (final concentration: 1.0% DMSO/PBS). Organs were collected in the afternoon of the 10th day after the start of administration.
- Teriparatide: Human parathyroid hormone fragment 1-34 (teriparatide) (Sigma-Aldrich) dissolved in physiological saline was administered subcutaneously to 10-week-old male C57BL/6N mice at a dose of 40 μg/kg daily. Organs were collected in the afternoon, 4 weeks after the start of drug administration. Negative controls received saline.
- Wild-type mice: Organs were collected in the afternoon from 11-week-old male C57BL/6N mice to which no drug had been administered.
- Experiments using mice, organ extraction, and transcriptome analysis were performed according to the method described in Patent Document 1.
- The 24 organs are the adrenal gland, aorta, bone marrow cells (BM), brain, colon, eyes, heart, ileum, jejunum, left kidney, liver, lung, pancreas, parotid gland, pituitary gland, skeletal muscle, skin, skull, spleen, stomach, left testis, thymus, thyroid, and gonadal white adipose tissue (WAT).
- All mice were housed in a temperature-controlled room at approximately 25 °C with a 12-hour light-dark cycle and had free access to water and normal feed (CE-2, CLEA Japan, Inc., Tokyo, Japan).
- Transcriptome analysis was performed using the QuantSeq 3' mRNA-Seq Library Prep Kit for Illumina (FWD) (cat# 015.384, LEXOGEN) and an Illumina NextSeq 500 (75 bp single-read, ca. 400 million reads/run, NextSeq 500/550 High Output Kit v2.5, cat# 20024906).
- RNA-seq data processing: transcript mapping and counting
- Second training data: the label of the name of each drug administered to the mice in 1. above and the label of the name of each drug's indication were paired and used as the second training data.
- The indication names corresponding to the drug names followed the FDA Adverse Event Reporting System (FAERS: https://open.fda.gov/data/faers/).
- Third training data: adverse event report data from 2014Q2 to 2018Q1 were downloaded from FAERS (https://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm). Words denoting adverse events corresponding to the indication names of each drug administered to the mice in 1. above were extracted from the report data. Each extracted word was treated as one reported adverse event, and the frequency of occurrence (%) of each adverse event was calculated as (number of cases in which that adverse event was reported for the indication name of a drug) / (total number of adverse events reported for that indication name).
- When the drug names are, for example, A and B, $g_A$ and $g_B$ denote the transcriptome patterns of the 24 organs when drugs A and B are administered (first training data group).
- The indication of drug A is represented by "1" and the indication of drug B by "2".
- When the elements of the adverse events (AE) reported for indication 1 are represented by i, ii, ..., N, the indication vectors are $d_1 = (d_{1i}, d_{1ii}, \ldots, d_{1N})$ and $d_2 = (d_{2i}, d_{2ii}, \ldots, d_{2N})$ (third training data group).
- The second training data group pairs the label indicating the name of drug A with the label indicating the name of indication 1, and the label indicating the name of drug B with the label indicating the name of indication 2. The pairs can therefore be written as $g_A d_1$ and $g_B d_2$, respectively (second training data group).
- An indication was regarded as positive (indicated) when FAERS contained more than 10 records of drug A taken by patients with indication 1.
- One-class SVM: the data input to the One-class SVM were formed by associating the first training data group with the third training data group through the kernel function below, and were input to the One-class SVM as the fourth training data group.
- $k(g_A d_1, g_B d_2) = \langle g_A, g_B \rangle \langle d_1, d_2 \rangle$
- Here, $\langle \cdot, \cdot \rangle$ denotes an operator that scales each vector so that its $\ell_2$ norm becomes 1 and then takes the inner product of the two scaled vectors.
- Example 1: In Example 1, predictions were made on the assumption that the indication of one of the drugs administered in 1. above was unknown. In other words, a One-class SVM was first trained using, as training data, the data on the 14 drugs that remained after excluding one of the drugs administered in 1. above. The excluded drug was then treated as the target drug, and the transcriptome pattern obtained when the target drug was administered was input, as the first test data, into the trained One-class SVM together with the second test data to predict the indications. The results are shown in FIG. 11. In FIG. 11, TN denotes true negatives, TP true positives, FN false negatives, and FP false positives.
- True negatives are the number of items that are "not an indication" and were predicted to be "not an indication".
- True positives are the number of items that are "an indication" and were predicted to be "an indication".
- False negatives are the number of items that are "an indication" but were predicted to be "not an indication".
- False positives are the number of items that are "not an indication" but were predicted to be "an indication".
- The accuracy score indicates how accurate the predictions are.
- The recall score indicates the coverage of the items predicted to be "an indication".
- The precision score indicates the reliability of a prediction of "an indication".
- The prediction method of the present invention is useful for predicting the indications of a new substance whose indications are unknown.
- Example 2: We evaluated whether the present invention is useful for so-called drug repositioning, which seeks new indications for known substances. Artificial intelligence was trained using the data from all 15 drugs described in 1. above, and the indications of the individual drugs were predicted. The results are shown in FIG. 12. The symbols in the figure are the same as those in FIG. 11.
- The number of TPs increased and the number of FNs decreased for all drugs.
- The recall score also improved.
- Accuracy scores and recall scores improved for all drugs, ranging from 0.770 to 1.000. This result indicates that both reported and unreported indications can be captured with a probability of 77% or higher.
- Precision scores were low because of the large number of FPs.
- FPs point to potential new indications that have not been reported so far. Because the number of FPs is relatively large, when the candidates need to be narrowed down, they can be narrowed further by calculating the decision function value of each indication among the FPs and ranking the indications of each drug.
- FIG. 13 shows an example of the decision function values for alendronate.
- Predicted FP indications that are shared among drugs already known to have similar mechanisms of action are also considered highly likely to be repositionable indications.
- Reference signs: 10 Training device; 20 Prediction device; 40 Server device; 101 Processing unit; 201 Processing unit; 401 Processing unit; 400 Prediction system; 105 Communication unit; 405 Communication unit
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Medical Informatics (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Epidemiology (AREA)
- Biophysics (AREA)
- Biotechnology (AREA)
- Analytical Chemistry (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Immunology (AREA)
- Zoology (AREA)
- Organic Chemistry (AREA)
- Pathology (AREA)
- Medicinal Chemistry (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Biochemistry (AREA)
- Primary Health Care (AREA)
- Bioethics (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Pharmacology & Pharmacy (AREA)
- Environmental Sciences (AREA)
- Wood Science & Technology (AREA)
- General Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Food Science & Technology (AREA)
- Microbiology (AREA)
Abstract
Description
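Item 2. In the training method according to Item 1, in the training, the first training data group and the third training data group are linked by the second training data group to generate a fourth training data group, and the fourth training data group is input to the artificial intelligence.
Item 3. In the training method according to Item 1 or 2, the information on the adverse events includes a label indicating each adverse event and the presence or absence, or the frequency of occurrence, of that adverse event in the indication.
Item 4. In the training method according to any one of Items 1 to 3, the biomarker is a transcriptome.
Item 5. In the training method according to any one of Items 1 to 4, the artificial intelligence model is a One-Class SVM.
Item 6. An embodiment of the present invention relates to a training device for an artificial intelligence model. The training device includes a processing unit, and the processing unit trains the artificial intelligence model by associating a first training data group, a second training data group, and a third training data group with one another and inputting them into the artificial intelligence model. The first training data group is a group of data in which a group of data indicating the behavior of a biomarker in each of one or more different organs, collected from each non-human animal individually administered one of a plurality of predetermined existing substances whose indications in humans are known, is linked with a label indicating the name of each administered predetermined existing substance. The second training data group is a group of data in which a label indicating the name of each of the plurality of predetermined existing substances is linked with a label indicating the indication reported for each of the plurality of predetermined existing substances. The third training data group is a group of data in which the label indicating the indication reported for each of the plurality of predetermined existing substances is linked with information on the adverse events reported for each of these indications. The artificial intelligence model is for predicting an indication of a test substance in humans.
Item 7. An embodiment of the present invention relates to a training program for an artificial intelligence model that, when executed by a computer, causes the computer to execute a step of training the artificial intelligence model by associating a first training data group, a second training data group, and a third training data group with one another and inputting them into the artificial intelligence model. In the program, the first training data group is a group of data in which a group of data indicating the behavior of a biomarker in each of one or more different organs, collected from each non-human animal individually administered one of a plurality of predetermined existing substances whose indications in humans are known, is linked with a label indicating the name of each administered predetermined existing substance. The second training data group is a group of data in which a label indicating the name of each of the plurality of predetermined existing substances is linked with a label indicating the indication reported for each of the plurality of predetermined existing substances. The third training data group is a group of data in which the label indicating the indication reported for each of the plurality of predetermined existing substances is linked with information on the adverse events reported for each of these indications. The artificial intelligence model is for predicting an indication of a test substance in humans.
Item 8. An embodiment of the present invention relates to a method for predicting an indication of a test substance in humans. The method includes: a step of acquiring a first test data group, the first test data group being a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance; and a step of inputting the first test data group and a second test data group into an artificial intelligence model trained by the method according to any one of Items 1 to 5 and predicting, with the trained artificial intelligence model, an indication of the test substance in humans based on the input first and second test data groups, the second test data group being a group of data in which labels of a plurality of known indications are linked with information on the adverse events reported for each of the plurality of known indications.
Item 9. In the prediction method according to Item 8, the test substance does not include an existing substance or an equivalent of an existing substance.
Item 10. In the prediction method according to Item 8 or 9, the test substance is one selected from existing substances and equivalents of existing substances.
Item 11. An embodiment of the present invention relates to a prediction device that predicts an indication of a test substance in humans. The prediction device includes a processing unit. The processing unit inputs a first test data group and a second test data group into an artificial intelligence model trained by the method according to any one of Items 1 to 5 and predicts, with the trained artificial intelligence model, an indication of the test substance in humans based on the input first and second test data groups. The first test data group is a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance, the one or more organs corresponding to the organs collected when the first training data group was generated. The second test data group is a group of data in which labels of a plurality of known indications are linked with the information on the adverse events reported for each of the plurality of known indications, the information having been acquired when the third training data group was generated.
Item 12. An embodiment of the present invention relates to a computer program for predicting an indication of a test substance in humans that, when executed by a computer, causes the computer to execute a step of inputting a first test data group and a second test data group into an artificial intelligence model trained by the method according to any one of Items 1 to 5 and predicting, with the trained artificial intelligence model, an indication of the test substance in humans based on the input first and second test data groups. The first test data group is a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance, the one or more organs corresponding to the organs collected when the first training data group was generated. The second test data group is a group of data in which labels of a plurality of known indications are linked with information on the adverse events reported for each of the plurality of known indications.
Item 13. An embodiment of the present invention relates to a prediction system for predicting an indication of a test substance in humans. The system includes: a server device that transmits a first test data group, the first test data group being a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance; and a prediction device, connected to the server device via a network, for predicting the action of the test substance in humans. The server device includes a communication unit for transmitting the first test data group. The prediction device includes a processing unit and a communication unit. The processing unit acquires, via the communication unit of the prediction device, the first test data group transmitted via the communication unit of the server device, inputs the acquired first test data group and a second test data group into an artificial intelligence model trained by the method according to any one of Items 1 to 5, and predicts, with the trained artificial intelligence model, an indication of the test substance in humans based on the input first and second test data groups. The first test data group is a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance, the one or more organs corresponding to the organs collected when the first training data group was generated. The second test data group is a group of data in which labels of a plurality of known indications are linked with the information on the adverse events reported for each of the plurality of known indications, the information having been acquired when the third training data group was generated.
Item 14. An embodiment of the present invention relates to a method of using the following for training an artificial intelligence model for predicting an indication of a test substance in humans: a first training data group, which is a group of data in which a group of data indicating the behavior of a biomarker in each of one or more different organs is linked with a label indicating the name of the existing substance administered when that group of data was acquired, the one or more different organs being collected from each non-human animal individually administered one of a plurality of predetermined existing substances whose indications in humans are known; a second training data group, which is a group of data in which a label indicating the name of each of the plurality of predetermined existing substances is linked with a label indicating the indication reported for each of the plurality of predetermined existing substances; and a third training data group, which is a group of data in which the label indicating each indication is linked with information on the adverse events reported for that indication.
Item 15. An embodiment relates to a method of using a first test data group and a second test data group as test data for predicting an indication of a test substance in humans. In the method, the first test data group is a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance, the one or more organs corresponding to the organs collected when the first training data group was generated. The second test data group is a group of data in which labels of a plurality of known indications are linked with information on the adverse events reported for each of the plurality of known indications.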
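First, an overview is given of the artificial intelligence training method and the prediction method according to an embodiment of the present disclosure. The differences between the conventional method and the training and prediction methods included in the present disclosure are also described.
As shown in FIG. 1, the artificial intelligence model used for prediction is preferably trained with a data group that associates three kinds of training data groups: the first training data group, the second training data group, and the third training data group.
The indication of a test substance in humans is predicted using the artificial intelligence model trained in 1.(1) above. The test data groups input to the trained artificial intelligence model when predicting an indication are the first test data group and the second test data group. The first test data group is input to the trained artificial intelligence model together with the second test data group.
The conventional method shown in FIG. 2 is the method described in Patent Document 2. For example, drugs A, B, and C are individually administered as existing substances to non-human animals such as mice, and organs, or tissues that are parts of organs, are collected from each of the non-human animals. The behavior of biomarkers in the collected organs or tissues is then analyzed to generate a first training data group. Second training data are generated from human clinical databases covering adverse events, indications, pharmacokinetics, and the like of existing substances. The artificial intelligence model shown in FIG. 2 is then generated by training with the first training data group and the second training data. In other words, the conventional method builds an artificial intelligence model by associating the behavior of biomarkers with the adverse events, indications, or pharmacokinetics of existing substances one at a time. The test data used in the conventional method are data indicating the behavior of biomarkers in one or more different organs of a non-human animal administered the test substance, the organs corresponding to those collected when the first training data group was generated.
In the present disclosure, the non-human animal is not limited. Examples include mammals such as mice, rats, dogs, cats, rabbits, cattle, horses, goats, sheep, and pigs, and birds such as chickens. Mammals such as mice, rats, dogs, cats, cattle, horses, and pigs are preferred; mice and rats are more preferred; and mice are even more preferred. Non-human animals also include fetuses, chicks, and the like of these animals.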
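2-1. Generation of training data
(1) Generation of the first training data group
The first training data group consists of a group of data indicating the behavior of biomarkers in each of one or more different organs and labels indicating the names of the existing substances. The one or more different organs can be collected from each non-human animal individually administered one of a plurality of existing substances whose actions in humans are known. The first training data group can be stored as database TR1 in the auxiliary storage unit 104 of the training device 10 shown in FIG. 5.
Each item of data indicating the behavior of a biomarker in each organ can be linked with information on the name of the administered existing substance, information on the name of the collected organ, information on the name of the biomarker, and so on. The information on a name may be the name itself, a label such as an abbreviation, or a label value corresponding to each name.
An example of the first training data group is as shown in 1.(1) above and FIG. 3(A).
As shown in 1.(1) above and FIG. 3(B), the second training data group is generated by linking the label indicating the name of each of the plurality of predetermined existing substances administered to the non-human animals when generating the first training data group with the label indicating the indication reported for each of those substances. The indications of an existing substance can be obtained from external databases such as FAERS described in 1.(4) above, the all drug labels of DAILYMED, Medical Subject Headings, Drugs@FDA, and the International Classification of Diseases, for example by searching for each existing substance with a word indicating its name and retrieving the labels of the corresponding indication names. One existing substance may have one indication or two or more. When one existing substance has two or more indications, those indications together constitute the second training data group. The labels indicating the indications reported for each of the plurality of predetermined existing substances can be obtained by applying text extraction, natural language processing, digitization, image analysis, or the like to the data stored in a database. For example, when the labels indicating the names of the indications corresponding to the existing substances administered to the non-human animals are registered in an external database embedded in sentences, the registered sentences may be subjected to syntactic analysis, word segmentation, semantic analysis, and the like by natural language processing before the text corresponding to the action is extracted.
As described in 1.(1) above and FIG. 3(C), the third training data are a group of data in which the labels indicating the indications shown in FIG. 3(B), reported for each of the plurality of predetermined existing substances administered when acquiring the first training data group, are linked with information on the adverse events reported for each of these indications. The indications reported for each of the plurality of predetermined existing substances can be obtained from external databases such as FAERS, the all drug labels of DAILYMED, Medical Subject Headings, Drugs@FDA, and the International Classification of Diseases, for example by searching for each existing substance with a word for its name and retrieving the labels of the corresponding indication names. The labels indicating the adverse events reported for each of these indications can be obtained from external databases such as FAERS or clinicaltrials.gov by searching with the label indicating the indication name. When the labels indicating the names of indications or adverse events are registered embedded in sentences, the registered sentences may be subjected to syntactic analysis, word segmentation, semantic analysis, and the like by natural language processing before the text corresponding to the action is extracted.
The frequency of occurrence of adverse events can be calculated by the method described in 1.(4) above.
As described in 1.(1) above and FIG. 3(D), the fourth training data group is generated by substituting, into the portion of the first training data group that holds the labels indicating drug names (the first column of FIG. 3(A)), the frequencies of occurrence of the adverse events reported for the indications corresponding to the label indicating the name of the existing substance administered to obtain the first training data (the frequencies of occurrence of adverse events in the second and subsequent columns of FIG. 3(C)).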
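The substitution that produces the fourth training data group can be sketched as a simple join. This is a minimal illustration and not the applicant's implementation: the function name `build_fourth_training_group`, the container layouts, and the toy values are all assumptions made for the example.

```python
def build_fourth_training_group(first, second, third):
    """Replace each drug-name label with the adverse-event frequencies
    of the indications reported for that drug.

    first:  list of (drug_name, biomarker_vector) rows   (first group)
    second: dict drug_name -> list of indication names   (second group)
    third:  dict indication -> adverse-event frequencies (third group)
    All three layouts are hypothetical, chosen only for this sketch.
    """
    fourth = []
    for drug_name, biomarkers in first:
        for indication in second.get(drug_name, []):
            # The drug label column is replaced by the AE frequency vector.
            fourth.append((third[indication], biomarkers))
    return fourth


# Toy data (invented values, for illustration only).
first = [("drug_A", [1.2, -0.4]), ("drug_B", [0.3, 2.1])]
second = {"drug_A": ["indication_1"], "drug_B": ["indication_2"]}
third = {"indication_1": [60.0, 40.0], "indication_2": [10.0, 90.0]}
fourth = build_fourth_training_group(first, second, third)
```

A drug with several reported indications yields one row per indication, which matches the statement that multiple indications for one substance all belong to the second training data group.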
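The artificial intelligence model is not limited as long as it can solve the problem addressed by the present invention. In this embodiment, it is preferable to use an artificial intelligence model capable of performing Link Prediction. Examples of such models include the One-Class SVM (one-class support vector machine).
$k(g_A d_1, g_B d_2) = \langle g_A, g_B \rangle \langle d_1, d_2 \rangle$
Here, $\langle \cdot, \cdot \rangle$ denotes an operator that scales each vector so that its $\ell_2$ norm becomes 1 and then takes the inner product of the two scaled vectors.
FIG. 4(A) shows the hardware configuration of the training system 50. The training system 50 includes a measurement unit 30, such as a next-generation sequencer, for acquiring biomarker measurement data, and the training device 10. The training device 10 and the measurement unit 30 may be communicably connected via a wireless or wired network, or the data acquired by the measurement unit 30 may be transferred via a storage medium such as a CD-R.
The artificial intelligence model can be trained using, for example, the training device 10 (hereinafter also referred to as device 10).
The flow of the training processing of the artificial intelligence model by the training program TP is described with reference to FIG. 6.
The processing unit 101 accepts a processing start command entered by the operator through the input unit 111 and, in step S1, acquires the first, second, and third training data groups from the first training data group database TR1, the second training data group database TR2, and the third training data group database TR3, respectively, which are stored in the auxiliary storage unit 104.
The transition between steps may be triggered by a command entered by the operator, or the processing unit 101 may proceed automatically, using the completion of the preceding step as the trigger.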
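3-1. Generation of test data
(1) Generation of the first test data group
The first test data group is a group of data indicating the behavior of biomarkers in each of one or more different organs, and can be acquired from organs corresponding to the one or more different organs from which the first training data were acquired. The group of data indicating the behavior of biomarkers in each organ can be acquired by the method described in 1.(4) above, in the same way as the data group indicating the behavior of biomarkers used as the first training data.
As described in 1.(2) above, the second test data are a group of data in which labels of a plurality of known indications are linked with information on the adverse events reported for each of the plurality of known indications. The labels of the plurality of known indications and the labels indicating the adverse events reported for each of these indications can be obtained from external databases such as FAERS or clinicaltrials.gov by searching with the label indicating the indication name. When the labels indicating the names of indications or adverse events are registered embedded in sentences, the registered sentences may be subjected to syntactic analysis, word segmentation, semantic analysis, and the like by natural language processing before the text corresponding to the action is extracted.
The frequency of occurrence of adverse events can be calculated by the method described in 1.(4) above.
FIG. 4(A) shows the hardware configuration of the prediction system 51. The prediction system 51 includes a measurement unit 30, such as a next-generation sequencer, for acquiring biomarker measurement data, and the prediction device 20. The prediction device 20 and the measurement unit 30 may be connected via a wireless or wired network, or the data acquired by the measurement unit 30 may be transferred via a storage medium such as a CD-R.
The indication can be predicted using, for example, the prediction device 20 (hereinafter sometimes simply referred to as device 20).
The flow of the indication prediction processing by the prediction program PP is described with reference to FIG. 8.
The processing unit 201 accepts a processing start command entered by the operator through the input unit 211 and, in step S51, acquires the first test data group and the second test data group stored in the auxiliary storage unit 204.
That is, the indications labeled "1" are the predicted indications of the test substance.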
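FIG. 4(B) shows the configuration of the prediction system 400.
In the prediction system 400, the measurement unit 30, the training device 10, the prediction device 20, and the server device 40, which transmits the data group indicating the behavior of biomarkers, are communicably connected. The training device 10 and the prediction device 20 acquire the data acquired by the measurement unit 30 via the server device 40.
For the server device 40 (hereinafter sometimes simply referred to as device 40), the explanations above are incorporated here for terms shared with those described in 1. and 2-1. above.
The operation of the prediction system is described with reference to FIG. 10.
Here, the sequence of operations from the acquisition of biomarker measurement values by the measurement unit 30 to the output of the prediction result is described.
The method for constructing the prediction system includes a step of preparing the training device 10 and the prediction device 20. The construction method may further include a step of preparing measurement values of biomarkers in one or more organs of a non-human animal administered an existing substance, or measurement values of biomarkers in one or more organs of a non-human animal administered a test substance.
4-1. Training program
The training program TP is a computer program that causes a computer to function as the training device 10 by having the computer execute processing including steps S1 to S4 of FIG. 6, described above for the training of the artificial intelligence model.
The prediction program PP is a computer program that causes a computer to function as the prediction device 20 by having the computer execute processing including steps S51 to S53, described above for the prediction of the action of the test substance.
The present invention also relates to a storage medium storing the above computer program. The computer program is stored in a storage medium such as a hard disk, a semiconductor memory element such as a flash memory, or an optical disk. The computer program may also be stored in a storage medium that can be reached over a network, such as a cloud server. The computer program may be a program product in download format or one stored in a storage medium.
6. Modifications
Section 2 above showed an embodiment in which the training device 10 and the prediction device 20 are different computers. However, a single computer may both train the artificial intelligence model and perform the prediction.
In the present specification, identical reference numerals attached to hardware indicate the same parts or the same functions.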
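I-1. Preparation of drug-administered mice and gene expression analysis
1. Drug administration
(1) Alendronate
Alendronate sodium salt trihydrate (Wako) dissolved in PBS (Nacalai Tesque) was injected subcutaneously into 11-week-old male C57BL/6N mice at a dose of 1.0 mg/kg every 3 or 4 days for 8 days. The drug was freshly prepared for each dose. Each organ was collected in the afternoon of the 8th day after drug administration.
10-week-old male C57BL/6N mice were fasted for 12 hours, during which they had free access to water. Immediately after the fasting period, acetaminophen (Wako) dissolved in physiological saline (Otsuka Pharmaceutical) was administered intraperitoneally to the mice as a single dose of 300 mg/kg. After administration, the mice had free access to their usual feed. Administration was completed by noon, and organs were collected 2 hours after administration.
Aripiprazole (Sigma-Aldrich) dissolved in 0.5% (w/v) carboxymethyl cellulose 400 solution (Wako) was administered intraperitoneally to 11-week-old male C57BL/6N mice as a single dose of 0.3 mg/kg. The drug was administered in the afternoon, and organs were collected 2 hours later.
Asenapine maleate (Chemscene) dissolved in physiological saline was administered subcutaneously to 11-week-old male C57BL/6N mice as a single dose of 0.3 mg/kg. The drug was administered in the afternoon, and organs were collected 2 hours later.
Cisplatin (Bristol-Myers Squibb) was administered intraperitoneally to 11-week-old male C57BL/6N mice as a single dose of 20 mg/kg. Organs were collected in the afternoon of the third day after drug administration.
Clozapine (Sigma-Aldrich) was administered subcutaneously to 11-week-old male C57BL/6N mice as a single dose of 0.3 mg/kg. Clozapine was first dissolved in acetic acid, then diluted with physiological saline and adjusted to pH 6 with 1 M NaOH. Organs were collected in the afternoon, 2 hours after drug administration.
9-week-old male C57BL/6N mice were given RO water containing 5% sucrose (Nacalai Tesque) and 2 mg/mL doxycycline hydrochloride n-hydrate (Wako) for 2 weeks. The drug-containing RO water was replaced with fresh water every week. Organs were collected in the afternoon of the 13th day after drug administration. The negative control group was given RO water supplemented with 5% sucrose (Nacalai Tesque).
Empagliflozin (Toronto Research Chemicals) dissolved in 0.5% carboxymethyl cellulose was administered by oral gavage to 10-week-old male C57BL/6N mice at a dose of 10 mg/kg daily for 2 weeks. The drug was freshly prepared for each dose. Organs were collected in the afternoon, 2 weeks after the start of drug administration.
Lenalidomide (Wako) dissolved in a solution containing 0.5% carboxymethyl cellulose and 0.25% Tween-80 (Nacalai Tesque) was administered by oral gavage to 8-week-old male C57BL/6N mice at 50 mg/kg daily for 69 days. The drug was freshly prepared for each dose. Organs were collected in the afternoon of the 69th day after the start of drug administration. The negative control group received a solution containing 0.5% carboxymethyl cellulose and 0.25% Tween-80.
Lurasidone hydrochloride (Medchemexpress) dissolved in 0.5% carboxymethyl cellulose solution was administered by oral gavage to 11-week-old male C57BL/6N mice as a single dose of 0.3 mg/kg. Organs were collected in the afternoon, 2 hours after drug administration.
Olanzapine (Tokyo Chemical Industry) dissolved in 0.5% carboxymethyl cellulose solution was administered by oral gavage as a single dose of 0.3 mg/kg. Organs were collected in the afternoon, 2 hours after drug administration.
Repatha (trademark) (Astellas Pharma Inc.) dissolved in physiological saline was administered subcutaneously to 11-week-old male C57BL/6N mice at a dose of 10 mg/kg every 10 days for 4 weeks. Organs were collected in the afternoon, 4 weeks after drug administration.
Risedronate sodium salt (Cayman Chemical Company) dissolved in PBS was administered by oral gavage to 11-week-old male C57BL/6N mice at a dose of 10 mg/kg every other day for 8 days. The drug was freshly prepared for each dose. Organs were collected in the afternoon of the 8th day after the start of administration.
Sofosbuvir (LKT) was administered intraperitoneally to 7-week-old male C57BL/6N mice at a dose of 20 mg/kg daily for 10 days. Sofosbuvir was first diluted with DMSO (Nacalai Tesque) and then diluted 100-fold with PBS before administration (final concentration: 1.0% DMSO/PBS). Organs were collected in the afternoon of the 10th day after the start of administration.
Human parathyroid hormone fragment 1-34 (teriparatide) (Sigma-Aldrich) dissolved in physiological saline was administered subcutaneously to 10-week-old male C57BL/6N mice at a dose of 40 μg/kg daily. Organs were collected in the afternoon, 4 weeks after the start of drug administration. Negative controls received physiological saline.
Organs were collected in the afternoon from 11-week-old male C57BL/6N mice to which no drug had been administered.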
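(1) Organs
Experiments using mice, organ extraction, and transcriptome analysis were performed according to the method described in Patent Document 1. The 24 organs are the adrenal gland, aorta, bone marrow cells (BM), brain, colon, eyes, heart, ileum, jejunum, left kidney, liver, lung, pancreas, parotid gland, pituitary gland, skeletal muscle, skin, skull, spleen, stomach, left testis, thymus, thyroid, and gonadal white adipose tissue (WAT).
Transcriptome analysis was performed using the QuantSeq 3' mRNA-Seq Library Prep Kit for Illumina (FWD) (cat# 015.384, LEXOGEN) and an Illumina NextSeq 500 (75 bp single-read, ca. 400 million reads/run, NextSeq 500/550 High Output Kit v2.5, cat# 20024906).
An artificial intelligence model using Link Prediction (LP) with a One-class SVM was built, and the indications of drugs were predicted.
(1) First training data
As features of each drug, genes whose change in expression in each organ showed p < 0.0001 were selected. The combinations of log2 fold values and organ names for all genes selected from all organs (the 24-organ framework) or from individual organs (the individual-organ framework) were paired with the label of the name of the drug administered when the gene expression data were acquired, and used as the first training data.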
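The feature selection described above can be sketched as a filter on per-gene test results. This is a minimal illustration under assumptions: the record schema (`organ`, `gene`, `log2fold`, `p_value`), the function name `select_features`, and the toy values are hypothetical, and the statistical test that produces the p-values is not specified here.

```python
P_THRESHOLD = 0.0001  # threshold stated in the text: p < 0.0001


def select_features(results):
    """Keep log2 fold values of genes whose expression change is significant.

    results: list of dicts with keys organ, gene, log2fold, p_value
    (a hypothetical schema). Returns {(organ, gene): log2fold}.
    """
    return {
        (row["organ"], row["gene"]): row["log2fold"]
        for row in results
        if row["p_value"] < P_THRESHOLD
    }


# Toy differential-expression results (invented values).
results = [
    {"organ": "liver", "gene": "gene_1", "log2fold": 2.5, "p_value": 0.00001},
    {"organ": "liver", "gene": "gene_2", "log2fold": 0.1, "p_value": 0.2},
    {"organ": "heart", "gene": "gene_1", "log2fold": -1.5, "p_value": 0.00005},
]
features = select_features(results)
```

Keying each feature by the (organ, gene) pair keeps the organ name attached to the log2 fold value, as the text requires, and restricting `results` to one organ gives the individual-organ framework.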
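The label of the name of each drug administered to the mice in 1. above and the label of the name of each drug's indication were paired and used as the second training data. The indication names corresponding to the drug names followed the FDA Adverse Event Reporting System (FAERS: https://open.fda.gov/data/faers/).
Adverse event report data from 2014Q2 to 2018Q1 were downloaded from FAERS (https://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm). Words denoting adverse events corresponding to the indication names of each drug administered to the mice in 1. above were extracted from the report data. Each extracted word was treated as one reported adverse event, and the frequency of occurrence (%) of each adverse event was calculated using the formula (number of cases in which that adverse event was reported for the indication name of a drug) / (total number of adverse events reported for that indication name).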
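The frequency formula above is a per-indication normalization of adverse-event counts. The sketch below assumes the extracted words are already available as (indication, adverse event) pairs; the function name `adverse_event_frequencies` and the placeholder labels are hypothetical, and parsing of the FAERS files themselves is out of scope.

```python
from collections import Counter


def adverse_event_frequencies(reports):
    """Frequency (%) of each adverse event within each indication.

    reports: list of (indication, adverse_event) pairs, one pair per
    extracted word, i.e. per reported adverse event.
    """
    per_pair = Counter(reports)
    per_indication = Counter(indication for indication, _ in reports)
    return {
        (indication, event): 100.0 * count / per_indication[indication]
        for (indication, event), count in per_pair.items()
    }


# Toy reports with placeholder labels (invented for illustration).
reports = [("indication_1", "AE_i")] * 3 + [("indication_1", "AE_ii")]
freq = adverse_event_frequencies(reports)
```

With three reports of `AE_i` and one of `AE_ii` for the same indication, the frequencies are 75% and 25%, so the values for one indication always sum to 100.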
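When the drug names are, for example, A and B, $g_A$ and $g_B$ denote the transcriptome patterns of the 24 organs when drugs A and B are administered (first training data group). When the indication of drug A is represented by "1", the indication of drug B by "2", and the elements of the adverse events (AE) reported for indication 1 by i, ii, ..., N, the indication vectors are $d_1 = (d_{1i}, d_{1ii}, \ldots, d_{1N})$ and $d_2 = (d_{2i}, d_{2ii}, \ldots, d_{2N})$ (third training data group). The second training data group pairs the label indicating the name of drug A with the label indicating the name of indication 1, and the label indicating the name of drug B with the label indicating the name of indication 2, so the pairs can be written as $g_A d_1$ and $g_B d_2$, respectively (second training data group). Here, an indication was regarded as positive (indicated) when FAERS contained more than 10 records of drug A taken by patients with indication 1.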
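The positivity rule (more than 10 FAERS records for a drug and indication pair) can be sketched as a count with a strict threshold. The function name `positive_links`, the constant `MIN_RECORDS`, and the record layout are assumptions made for this example.

```python
from collections import Counter

MIN_RECORDS = 10  # a pair is positive only when its count EXCEEDS this value


def positive_links(records):
    """Return the (drug, indication) pairs treated as positive links.

    records: list of (drug, indication) tuples, one per FAERS record
    (a hypothetical, already-parsed layout).
    """
    counts = Counter(records)
    return {pair for pair, n in counts.items() if n > MIN_RECORDS}


# Toy records: 11 for one pair, exactly 10 for another.
records = [("drug_A", "indication_1")] * 11 + [("drug_A", "indication_2")] * 10
links = positive_links(records)
```

Note the strict comparison: a pair with exactly 10 records is not counted as positive.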
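The data input to the One-class SVM were formed by associating the first training data group with the third training data group through the kernel function below, and were input to the One-class SVM as the fourth training data group.
$k(g_A d_1, g_B d_2) = \langle g_A, g_B \rangle \langle d_1, d_2 \rangle$
Here, $\langle \cdot, \cdot \rangle$ denotes an operator that scales each vector so that its $\ell_2$ norm becomes 1 and then takes the inner product of the two scaled vectors.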
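The kernel above is a product of two cosine similarities, one between transcriptome patterns and one between adverse-event vectors. The following is a minimal pure-Python sketch; the function names are hypothetical, and returning 0 for an all-zero vector is an assumption of this example, since the text does not say how such a vector would be handled.

```python
import math


def normalized_inner_product(u, v):
    """Scale u and v to unit l2 norm, then take their inner product."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def pair_kernel(g_a, d_1, g_b, d_2):
    """k(g_A d_1, g_B d_2) = <g_A, g_B> <d_1, d_2>."""
    return normalized_inner_product(g_a, g_b) * normalized_inner_product(d_1, d_2)


# Identical pairs give 1; orthogonal adverse-event vectors give 0.
k_same = pair_kernel([3.0, 4.0], [1.0, 0.0], [3.0, 4.0], [1.0, 0.0])
k_orth = pair_kernel([3.0, 4.0], [1.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

One possible way to use such a kernel, not confirmed by the source, is to evaluate it for every pair of training rows and pass the resulting Gram matrix to a one-class SVM implementation that accepts a precomputed kernel.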
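The transcriptome pattern of the 24 organs obtained when the target drug was administered (first test data) was input into the trained One-class SVM together with [labels indicating the names of all indications] registered in FAERS and [the combinations of adverse event names and frequencies of occurrence corresponding to each indication (gd)], and the trained One-class SVM judged, for every indication individually, whether the target drug is effective. Specifically, in the LP problem, the trained One-class SVM judged whether or not a link exists between the target drug and each individual indication. The SVM returns the label "1" if the target drug is effective for an indication and the label "-1" if it is not.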
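The per-indication decision described above amounts to a loop over every known indication. In this sketch the trained One-class SVM is replaced by a stand-in callable, `link_label`, which returns 1 or -1; that callable, the function name `predict_indications`, and the toy data are all hypothetical and only illustrate the control flow.

```python
def predict_indications(link_label, g_target, indication_vectors):
    """Collect the indications for which the model reports a link.

    link_label: callable (g, d) -> 1 or -1, standing in for the trained
                One-class SVM (hypothetical interface)
    g_target:   transcriptome pattern of the target drug (first test data)
    indication_vectors: dict indication -> adverse-event frequency vector
                        (second test data)
    """
    return [
        name
        for name, d in indication_vectors.items()
        if link_label(g_target, d) == 1
    ]


# Stand-in model for the demo: reports a link when the first AE frequency > 50.
def demo_model(g, d):
    return 1 if d[0] > 50.0 else -1


indications = {"indication_1": [60.0, 40.0], "indication_2": [10.0, 90.0]}
predicted = predict_indications(demo_model, [1.2, -0.4], indications)
```

The returned list corresponds to the indications labeled "1", that is, the predicted indications of the target drug.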
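In Example 1, predictions were made on the assumption that the indication of one of the drugs administered in 1. above was unknown. In other words, a One-class SVM was first trained using, as training data, the data on the 14 drugs that remained after excluding one of the drugs administered in 1. above. The excluded drug was then treated as the target drug, and the transcriptome pattern obtained when the target drug was administered was input, as the first test data, into the trained One-class SVM together with the second test data to predict the indications. The results are shown in FIG. 11. In FIG. 11, TN denotes true negatives, TP true positives, FN false negatives, and FP false positives. True negatives are the number of items that are "not an indication" and were predicted to be "not an indication", and true positives are the number of items that are "an indication" and were predicted to be "an indication". False negatives are the number of items that are "an indication" but were predicted to be "not an indication", and false positives are the number of items that are "not an indication" but were predicted to be "an indication". The accuracy score indicates how accurate the predictions are. The recall score indicates the coverage of the items predicted to be "an indication". The precision score indicates the reliability of a prediction of "an indication".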
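The three scores can be computed from the four counts. The sketch below uses the standard definitions of accuracy, recall, and precision; that the patent's scores follow exactly these definitions is an assumption, and the function name `scores` and the counts are invented for illustration.

```python
def scores(tp, tn, fp, fn):
    """Return (accuracy, recall, precision) from confusion-matrix counts.

    Standard definitions are assumed:
      accuracy  = (TP + TN) / (TP + TN + FP + FN)
      recall    = TP / (TP + FN)
      precision = TP / (TP + FP)
    A ratio with a zero denominator is reported as 0.0 (a choice of this sketch).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Invented counts: many false positives lower precision but not recall.
accuracy, recall, precision = scores(tp=8, tn=80, fp=10, fn=2)
```

Under these definitions, a large number of false positives lowers precision while leaving recall unchanged, because recall depends only on true positives and false negatives.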
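We evaluated whether the present invention is useful for so-called drug repositioning, which seeks new indications for known substances. Artificial intelligence was trained using the data from all 15 drugs described in 1. above, and the indications of the individual drugs were predicted. The results are shown in FIG. 12. The symbols in the figure are the same as those in FIG. 11.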
20 Prediction device
40 Server device
101 Processing unit
201 Processing unit
401 Processing unit
400 Prediction system
105 Communication unit
405 Communication unit
Claims (15)
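- A method for training an artificial intelligence model,
the training method comprising training the artificial intelligence model by associating a first training data group, a second training data group, and a third training data group with one another and inputting them into the artificial intelligence model, wherein
the first training data group is a group of data in which a group of data indicating the behavior of a biomarker in each of one or more different organs, collected from each non-human animal individually administered one of a plurality of predetermined existing substances whose indications in humans are known, is linked with a label indicating the name of each administered predetermined existing substance,
the second training data group is a group of data in which a label indicating the name of each of the plurality of predetermined existing substances is linked with a label indicating the indication reported for each of the plurality of predetermined existing substances,
the third training data group is a group of data in which the label indicating the indication reported for each of the plurality of predetermined existing substances is linked with information on the adverse events reported for each of these indications, and
the artificial intelligence model is for predicting an indication of a test substance in humans.
- The training method according to claim 1, wherein, in the training, the first training data group and the third training data group are linked by the second training data group to generate a fourth training data group, and the fourth training data group is input to the artificial intelligence.
- The training method according to claim 1 or 2, wherein the information on the adverse events includes a label indicating each adverse event and the presence or absence, or the frequency of occurrence, of that adverse event in the indication.
- The training method according to any one of claims 1 to 3, wherein the biomarker is a transcriptome.
- The training method according to any one of claims 1 to 4, wherein the artificial intelligence model is a One-Class SVM.
- A training device for an artificial intelligence model,
the training device comprising a processing unit, wherein
the processing unit
trains the artificial intelligence model by associating a first training data group, a second training data group, and a third training data group with one another and inputting them into the artificial intelligence model,
the first training data group is a group of data in which a group of data indicating the behavior of a biomarker in each of one or more different organs, collected from each non-human animal individually administered one of a plurality of predetermined existing substances whose indications in humans are known, is linked with a label indicating the name of each administered predetermined existing substance,
the second training data group is a group of data in which a label indicating the name of each of the plurality of predetermined existing substances is linked with a label indicating the indication reported for each of the plurality of predetermined existing substances,
the third training data group is a group of data in which the label indicating the indication reported for each of the plurality of predetermined existing substances is linked with information on the adverse events reported for each of these indications, and
the artificial intelligence model is for predicting an indication of a test substance in humans.
- A training program for an artificial intelligence model that, when executed by a computer, causes the computer to execute a step of training the artificial intelligence model by associating a first training data group, a second training data group, and a third training data group with one another and inputting them into the artificial intelligence model, wherein
the first training data group is a group of data in which a group of data indicating the behavior of a biomarker in each of one or more different organs, collected from each non-human animal individually administered one of a plurality of predetermined existing substances whose indications in humans are known, is linked with a label indicating the name of each administered predetermined existing substance,
the second training data group is a group of data in which a label indicating the name of each of the plurality of predetermined existing substances is linked with a label indicating the indication reported for each of the plurality of predetermined existing substances,
the third training data group is a group of data in which the label indicating the indication reported for each of the plurality of predetermined existing substances is linked with information on the adverse events reported for each of these indications, and
the artificial intelligence model is for predicting an indication of a test substance in humans.
- A method for predicting an indication of a test substance in humans, comprising:
a step of acquiring a first test data group, the first test data group being a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance; and
a step of inputting the first test data group and a second test data group into an artificial intelligence model trained by the method according to any one of claims 1 to 5 and predicting, with the trained artificial intelligence model, an indication of the test substance in humans based on the input first and second test data groups, the second test data group being a group of data in which labels of a plurality of known indications are linked with information on the adverse events reported for each of the plurality of known indications.
- The prediction method according to claim 7, wherein the test substance does not include an existing substance or an equivalent of an existing substance.
- The prediction method according to claim 7, wherein the test substance is one selected from existing substances and equivalents of existing substances.
- A prediction device that predicts an indication of a test substance in humans,
the prediction device comprising a processing unit, wherein the processing unit
inputs a first test data group and a second test data group into an artificial intelligence model trained by the method according to any one of claims 1 to 5 and predicts, with the trained artificial intelligence model, an indication of the test substance in humans based on the input first and second test data groups,
the first test data group is a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance, the one or more organs corresponding to the organs collected when the first training data group was generated, and
the second test data group is a group of data in which labels of a plurality of known indications are linked with the information on the adverse events reported for each of the plurality of known indications, the information having been acquired when the third training data group was generated.
- A computer program for predicting an indication of a test substance in humans that, when executed by a computer, causes the computer to execute
a step of inputting a first test data group and a second test data group into an artificial intelligence model trained by the method according to any one of claims 1 to 5 and predicting, with the trained artificial intelligence model, an indication of the test substance in humans based on the input first and second test data groups, wherein
the first test data group is a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance, the one or more organs corresponding to the organs collected when the first training data group was generated, and
the second test data group is a group of data in which labels of a plurality of known indications are linked with the information on the adverse events reported for each of the plurality of known indications, the information having been acquired when the third training data group was generated.
- A prediction system for predicting an indication of a test substance in humans,
the system comprising:
a server device that transmits a first test data group, the first test data group being a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance; and
a prediction device, connected to the server device via a network, for predicting the action of the test substance in humans,
wherein
the server device comprises a communication unit for transmitting the first test data group,
the prediction device comprises a processing unit and a communication unit,
the processing unit
acquires, via the communication unit of the prediction device, the first test data group transmitted via the communication unit of the server device, and
inputs the acquired first test data group and a second test data group into an artificial intelligence model trained by the method according to any one of claims 1 to 5 and predicts, with the trained artificial intelligence model, an indication of the test substance in humans based on the input first and second test data groups,
the first test data group is a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance, the one or more organs corresponding to the organs collected when the first training data group was generated, and
the second test data group is a group of data in which labels of a plurality of known indications are linked with the information on the adverse events reported for each of the plurality of known indications, the information having been acquired when the third training data group was generated.
- A method of using, for training an artificial intelligence model for predicting an indication of a test substance in humans:
a first training data group, which is a group of data in which a group of data indicating the behavior of a biomarker in each of one or more different organs is linked with a label indicating the name of the existing substance administered when that group of data was acquired, the one or more different organs being collected from each non-human animal individually administered one of a plurality of predetermined existing substances whose indications in humans are known;
a second training data group, which is a group of data in which a label indicating the name of each of the plurality of predetermined existing substances is linked with a label indicating the indication reported for each of the plurality of predetermined existing substances; and
a third training data group, which is a group of data in which the label indicating each indication is linked with information on the adverse events reported for that indication.
- A method of using a first test data group and a second test data group as test data for predicting an indication of a test substance in humans, wherein
the first test data group is a group of data indicating the behavior of a biomarker in one or more organs collected from a non-human animal administered the test substance, the one or more organs corresponding to the organs collected when the first training data group was generated, and
the second test data group is a group of data in which labels of a plurality of known indications are linked with information on the adverse events reported for each of the plurality of known indications.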
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL292185A IL292185A (en) | 2019-10-17 | 2020-10-16 | An artificial intelligence model for predicting labels for test substances in humans |
CN202080072814.XA CN114556481A (zh) | 2019-10-17 | 2020-10-16 | 用于预测测试物质在人类中的适应症的人工智能模型 |
EP20877483.6A EP4047607A4 (en) | 2019-10-17 | 2020-10-16 | ARTIFICIAL INTELLIGENCE MODEL FOR PREDICTING INDICATIONS FOR TEST SUBSTANCES IN HUMANS |
JP2021552483A JPWO2021075574A1 (ja) | 2019-10-17 | 2020-10-16 | |
CA3158327A CA3158327A1 (en) | 2019-10-17 | 2020-10-16 | Artificial intelligence model for predicting indications for test substances in humans |
US17/769,516 US20240153649A1 (en) | 2019-10-17 | 2020-10-16 | Artificial Intelligence Model for Predicting Indications for Test Substances in Humans |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2019190332 | 2019-10-17 | ||
JP2019-190332 | 2019-10-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021075574A1 true WO2021075574A1 (ja) | 2021-04-22 |
Family
ID=75538249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2020/039179 WO2021075574A1 (ja) | 2019-10-17 | 2020-10-16 | 被験物質のヒトにおける適応疾患を予測するための人工知能モデル |
Country Status (7)
Country | Link |
---|---|
US (1) | US20240153649A1 (ja) |
EP (1) | EP4047607A4 (ja) |
JP (1) | JPWO2021075574A1 (ja) |
CN (1) | CN114556481A (ja) |
CA (1) | CA3158327A1 (ja) |
IL (1) | IL292185A (ja) |
WO (1) | WO2021075574A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11676684B2 (en) | 2018-07-27 | 2023-06-13 | Karydo Therapeutix, Inc. | Artificial intelligence model for predicting actions of test substance in humans |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0259850B2 (ja) | 1986-07-31 | 1990-12-13 | Sumitomo Metal Mining Co | |
JP2015507470A (ja) * | 2011-11-11 | 2015-03-12 | Cold Spring Harbor Laboratory, an Education Corporation of the State of New York | Drug screening methods and uses thereof |
US20150371009A1 (en) * | 2014-06-19 | 2015-12-24 | Jake Yue Chen | Drug identification models and methods of using the same to identify compounds to treat disease |
WO2016208776A1 (ja) | 2015-06-25 | 2016-12-29 | Advanced Telecommunications Research Institute International | Prediction device and prediction program based on a multi-organ linkage system |
JP2019502988A (ja) * | 2015-12-02 | 2019-01-31 | Preferred Networks, Inc. | Generative machine learning system for drug design |
JP6559850B1 (ja) * | 2018-07-27 | 2019-08-14 | Karydo TherapeutiX, Inc. | Artificial intelligence model for predicting actions of test substance in humans |
2020
- 2020-10-16: CA, application CA3158327A, publication CA3158327A1, status: active (Pending)
- 2020-10-16: CN, application CN202080072814.XA, publication CN114556481A, status: not active (Withdrawn)
- 2020-10-16: IL, application IL292185A, publication IL292185A, status: unknown
- 2020-10-16: EP, application EP20877483.6A, publication EP4047607A4, status: active (Pending)
- 2020-10-16: WO, application PCT/JP2020/039179, publication WO2021075574A1, status: active (Application Filing)
- 2020-10-16: US, application US17/769,516, publication US20240153649A1, status: active (Pending)
- 2020-10-16: JP, application JP2021552483A, publication JPWO2021075574A1, status: active (Pending)
Non-Patent Citations (1)
Title |
---|
See also references of EP4047607A4 |
Also Published As
Publication number | Publication date |
---|---|
IL292185A (en) | 2022-06-01 |
CN114556481A (zh) | 2022-05-27 |
EP4047607A1 (en) | 2022-08-24 |
US20240153649A1 (en) | 2024-05-09 |
EP4047607A4 (en) | 2022-12-07 |
JPWO2021075574A1 (ja) | 2021-04-22 |
CA3158327A1 (en) | 2021-04-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11676684B2 (en) | Artificial intelligence model for predicting actions of test substance in humans | |
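JP6432962B2 (ja) | Method for screening candidate substances for an active ingredient for preventing or treating at least one disease selected from the group consisting of decreased renal function, chronic kidney disease, and renal failure | |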
Semler et al. | A mutation in the 5′-UTR of IFITM5 creates an in-frame start codon and causes autosomal-dominant osteogenesis imperfecta type V with hyperplastic callus | |
Rinaldi et al. | Mutation in CPT1C associated with pure autosomal dominant spastic paraplegia | |
Alexandrov et al. | Large-scale phenome analysis defines a behavioral signature for Huntington's disease genotype in mice | |
Kara et al. | A 6.4 Mb duplication of the α-synuclein locus causing frontotemporal dementia and Parkinsonism: phenotype-genotype correlations | |
US20220076832A1 (en) | Prediction device based on inter-organ cross talk system | |
Anheim et al. | Exonic deletions of FXN and early-onset Friedreich ataxia | |
Cade et al. | Associations of variants In the hexokinase 1 and interleukin 18 receptor regions with oxyhemoglobin saturation during sleep | |
WO2021145434A1 (ja) | Method, device, and program for predicting indications of a target drug or its equivalent substance | |
Louie et al. | Molecular and cellular pathogenesis of Ellis-van Creveld syndrome: lessons from targeted and natural mutations in animal models | |
Xie et al. | Deep phenotyping and lifetime trajectories reveal limited effects of longevity regulators on the aging process in C57BL/6J mice | |
WO2021075574A1 (ja) | Artificial intelligence model for predicting indications for test substances in humans | |
Ramírez Rozzi et al. | Diversity among African pygmies | |
Schachtschneider et al. | Altered hippocampal epigenetic regulation underlying reduced cognitive development in response to early life environmental insults | |
WO2021145798A2 (en) | Methods of biological age evaluation and systems using such methods | |
Yabumoto et al. | Novel variants in KAT6B spectrum of disorders expand our knowledge of clinical manifestations and molecular mechanisms | |
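WO2021157739A1 (ja) | Method for correcting count datasets in single-cell RNA-Seq analysis, method for analyzing single-cell RNA-Seq, method for analyzing cell-type composition ratios, and devices and computer programs for executing these methods | |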
Behren et al. | Genomic Selection for Dairy Cattle Behaviour Considering Novel Traits in a Changing Technical Production Environment | |
Illera et al. | Addressing Combative Behaviour in Spanish Bulls by Measuring Hormonal Indicators | |
De Lillo et al. | Phenome-wide association study of TTR and RBP4 genes in 361,194 individuals reveals novel insights in the genetics of hereditary and senile systemic amyloidoses | |
JP2020129360A (ja) | Method for analyzing mRNA precursors, information processing device, and computer program | |
Lindau et al. | OP0255 TLR9-independent and immune complex-independent interferon-alpha production by neutrophils upon netosis in response to circulating chromatin | |
De Lara et al. | OP0254 Who should control the classical cardiovascular risk factors in the rheumatoid arthritis? study on the consistency between primary care and rheumatology | |
Michalski | Aspects for implementation of data mining in gerontology and geriatrics |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20877483; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2021552483; Country of ref document: JP; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 3158327; Country of ref document: CA |
WWE | WIPO information: entry into national phase | Ref document number: 17769516; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2020877483; Country of ref document: EP; Effective date: 20220517 |