WO2022220236A1 - Information processing method, information processing device, and program - Google Patents
- Publication number
- WO2022220236A1 (PCT/JP2022/017576)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- disease
- snp
- alzheimer
- information
- genotype
- Prior art date
- 230000010365 information processing Effects 0.000 title claims abstract description 46
- 238000003672 processing method Methods 0.000 title claims abstract description 26
- 208000024827 Alzheimer disease Diseases 0.000 claims abstract description 244
- 108090000623 proteins and genes Proteins 0.000 claims abstract description 109
- 230000035772 mutation Effects 0.000 claims abstract description 57
- 238000012549 training Methods 0.000 claims abstract description 50
- 238000010801 machine learning Methods 0.000 claims abstract description 13
- 238000000034 method Methods 0.000 claims description 79
- 238000001514 detection method Methods 0.000 claims description 26
- 238000012545 processing Methods 0.000 claims description 22
- 230000002068 genetic effect Effects 0.000 claims description 20
- 238000007637 random forest analysis Methods 0.000 claims description 6
- 238000011161 development Methods 0.000 abstract description 6
- 239000000523 sample Substances 0.000 description 71
- 210000004027 cell Anatomy 0.000 description 64
- 108010064539 amyloid beta-protein (1-42) Proteins 0.000 description 50
- 239000013615 primer Substances 0.000 description 49
- 108020004414 DNA Proteins 0.000 description 43
- 210000002569 neuron Anatomy 0.000 description 38
- 210000004556 brain Anatomy 0.000 description 37
- 108010026424 tau Proteins Proteins 0.000 description 31
- 102000013498 tau Proteins Human genes 0.000 description 31
- 238000003752 polymerase chain reaction Methods 0.000 description 25
- 238000004458 analytical method Methods 0.000 description 24
- 108700028369 Alleles Proteins 0.000 description 23
- 102100029470 Apolipoprotein E Human genes 0.000 description 22
- 101710095339 Apolipoprotein E Proteins 0.000 description 22
- 238000006243 chemical reaction Methods 0.000 description 22
- 210000001175 cerebrospinal fluid Anatomy 0.000 description 16
- 230000007170 pathology Effects 0.000 description 16
- 230000000875 corresponding effect Effects 0.000 description 15
- 102000004169 proteins and genes Human genes 0.000 description 14
- 230000000295 complement effect Effects 0.000 description 13
- 230000014509 gene expression Effects 0.000 description 13
- 102100023073 Calcium-activated potassium channel subunit alpha-1 Human genes 0.000 description 12
- 210000003710 cerebral cortex Anatomy 0.000 description 12
- 238000009396 hybridization Methods 0.000 description 12
- 238000012360 testing method Methods 0.000 description 12
- 239000013598 vector Substances 0.000 description 12
- 108020004459 Small interfering RNA Proteins 0.000 description 11
- 230000001575 pathological effect Effects 0.000 description 11
- 101001049859 Homo sapiens Calcium-activated potassium channel subunit alpha-1 Proteins 0.000 description 10
- 210000000349 chromosome Anatomy 0.000 description 10
- 230000001537 neural effect Effects 0.000 description 10
- 102100028918 Catenin alpha-3 Human genes 0.000 description 9
- 101000916179 Homo sapiens Catenin alpha-3 Proteins 0.000 description 9
- 230000002490 cerebral effect Effects 0.000 description 9
- 210000003618 cortical neuron Anatomy 0.000 description 9
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 description 9
- 238000003205 genotyping method Methods 0.000 description 9
- 239000000047 product Substances 0.000 description 9
- 230000001225 therapeutic effect Effects 0.000 description 9
- 102100023005 Anoctamin-3 Human genes 0.000 description 8
- 101000757285 Homo sapiens Anoctamin-3 Proteins 0.000 description 8
- 108010064397 amyloid beta-protein (1-40) Proteins 0.000 description 8
- 239000012228 culture supernatant Substances 0.000 description 8
- 201000010099 disease Diseases 0.000 description 8
- 238000003197 gene knockdown Methods 0.000 description 8
- 238000003199 nucleic acid amplification method Methods 0.000 description 8
- 210000003819 peripheral blood mononuclear cell Anatomy 0.000 description 8
- 230000008569 process Effects 0.000 description 8
- 238000003860 storage Methods 0.000 description 8
- 101150104383 ALOX5AP gene Proteins 0.000 description 7
- 101100236114 Mus musculus Lrrfip1 gene Proteins 0.000 description 7
- 230000003321 amplification Effects 0.000 description 7
- 230000006933 amyloid-beta aggregation Effects 0.000 description 7
- 238000010790 dilution Methods 0.000 description 7
- 239000012895 dilution Substances 0.000 description 7
- 238000002866 fluorescence resonance energy transfer Methods 0.000 description 7
- 239000002609 medium Substances 0.000 description 7
- 238000009825 accumulation Methods 0.000 description 6
- 230000007792 alzheimer disease pathology Effects 0.000 description 6
- 230000008859 change Effects 0.000 description 6
- 208000010877 cognitive disease Diseases 0.000 description 6
- 238000010586 diagram Methods 0.000 description 6
- 241000894007 species Species 0.000 description 6
- 101710137189 Amyloid-beta A4 protein Proteins 0.000 description 5
- 102100022704 Amyloid-beta precursor protein Human genes 0.000 description 5
- 101710151993 Amyloid-beta precursor protein Proteins 0.000 description 5
- 230000004544 DNA amplification Effects 0.000 description 5
- 238000000018 DNA microarray Methods 0.000 description 5
- 206010012289 Dementia Diseases 0.000 description 5
- 230000002159 abnormal effect Effects 0.000 description 5
- DZHSAHHDTRWUTF-SIQRNXPUSA-N amyloid-beta polypeptide 42 Chemical compound C([C@@H](C(=O)N[C@@H](C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(O)=O)C(=O)N[C@H](C(=O)NCC(=O)N[C@@H](CO)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](CCCCN)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@H](C(=O)N[C@@H]([C@@H](C)CC)C(=O)NCC(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](C(C)C)C(=O)NCC(=O)NCC(=O)N[C@@H](C(C)C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H]([C@@H](C)CC)C(=O)N[C@@H](C)C(O)=O)[C@@H](C)CC)C(C)C)NC(=O)[C@H](CC=1C=CC=CC=1)NC(=O)[C@@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CC=1N=CNC=1)NC(=O)[C@H](CC=1N=CNC=1)NC(=O)[C@@H](NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)CNC(=O)[C@H](CO)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC=1N=CNC=1)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CC=1C=CC=CC=1)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](C)NC(=O)[C@@H](N)CC(O)=O)C(C)C)C(C)C)C1=CC=CC=C1 DZHSAHHDTRWUTF-SIQRNXPUSA-N 0.000 description 5
- 238000013461 design Methods 0.000 description 5
- 230000018109 developmental process Effects 0.000 description 5
- 230000004069 differentiation Effects 0.000 description 5
- 230000000694 effects Effects 0.000 description 5
- 238000004519 manufacturing process Methods 0.000 description 5
- 238000003068 pathway analysis Methods 0.000 description 5
- 230000003234 polygenic effect Effects 0.000 description 5
- 239000002243 precursor Substances 0.000 description 5
- 238000011282 treatment Methods 0.000 description 5
- 208000028698 Cognitive impairment Diseases 0.000 description 4
- 108010014303 DNA-directed DNA polymerase Proteins 0.000 description 4
- 102000016928 DNA-directed DNA polymerase Human genes 0.000 description 4
- 101000801282 Homo sapiens Protein O-mannosyl-transferase TMTC1 Proteins 0.000 description 4
- 101710096140 Neurogenin-2 Proteins 0.000 description 4
- 102100038554 Neurogenin-2 Human genes 0.000 description 4
- 102100033739 Protein O-mannosyl-transferase TMTC1 Human genes 0.000 description 4
- 108010006785 Taq Polymerase Proteins 0.000 description 4
- 238000013459 approach Methods 0.000 description 4
- 210000004369 blood Anatomy 0.000 description 4
- 239000008280 blood Substances 0.000 description 4
- 108010041758 cleavase Proteins 0.000 description 4
- 238000010367 cloning Methods 0.000 description 4
- 239000003814 drug Substances 0.000 description 4
- UYTPUPDQBNUYGX-UHFFFAOYSA-N guanine Chemical compound O=C1NC(N)=NC2=C1N=CN2 UYTPUPDQBNUYGX-UHFFFAOYSA-N 0.000 description 4
- 238000004949 mass spectrometry Methods 0.000 description 4
- 238000002610 neuroimaging Methods 0.000 description 4
- 239000002773 nucleotide Substances 0.000 description 4
- 125000003729 nucleotide group Chemical group 0.000 description 4
- 238000011160 research Methods 0.000 description 4
- 238000012163 sequencing technique Methods 0.000 description 4
- 239000000126 substance Substances 0.000 description 4
- 238000012795 verification Methods 0.000 description 4
- 230000007082 Aβ accumulation Effects 0.000 description 3
- OYPRJOBELJOOCE-UHFFFAOYSA-N Calcium Chemical compound [Ca] OYPRJOBELJOOCE-UHFFFAOYSA-N 0.000 description 3
- 206010064571 Gene mutation Diseases 0.000 description 3
- 102100034343 Integrase Human genes 0.000 description 3
- 101710203526 Integrase Proteins 0.000 description 3
- -1 L-MYC Proteins 0.000 description 3
- 238000007397 LAMP assay Methods 0.000 description 3
- 230000009471 action Effects 0.000 description 3
- 238000000137 annealing Methods 0.000 description 3
- 238000003556 assay Methods 0.000 description 3
- 239000000090 biomarker Substances 0.000 description 3
- 239000011575 calcium Substances 0.000 description 3
- 229910052791 calcium Inorganic materials 0.000 description 3
- 239000002299 complementary DNA Substances 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 239000003550 marker Substances 0.000 description 3
- 239000011159 matrix material Substances 0.000 description 3
- 230000007246 mechanism Effects 0.000 description 3
- 230000008506 pathogenesis Effects 0.000 description 3
- 230000037361 pathway Effects 0.000 description 3
- 230000002265 prevention Effects 0.000 description 3
- 238000000513 principal component analysis Methods 0.000 description 3
- 238000000746 purification Methods 0.000 description 3
- 238000007894 restriction fragment length polymorphism technique Methods 0.000 description 3
- 238000013517 stratification Methods 0.000 description 3
- 108091032973 (ribonucleotides)n+m Proteins 0.000 description 2
- ZQAQXZBSGZUUNL-BJUDXGSMSA-N 2-[4-(methylamino)phenyl]-1,3-benzothiazol-6-ol Chemical compound C1=CC(N[11CH3])=CC=C1C1=NC2=CC=C(O)C=C2S1 ZQAQXZBSGZUUNL-BJUDXGSMSA-N 0.000 description 2
- 108010037497 3'-nucleotidase Proteins 0.000 description 2
- FWBHETKCLVMNFS-UHFFFAOYSA-N 4',6-Diamino-2-phenylindol Chemical compound C1=CC(C(=N)N)=CC=C1C1=CC2=CC=C(C(N)=N)C=C2N1 FWBHETKCLVMNFS-UHFFFAOYSA-N 0.000 description 2
- 108010090849 Amyloid beta-Peptides Proteins 0.000 description 2
- 102000013455 Amyloid beta-Peptides Human genes 0.000 description 2
- 102100021257 Beta-secretase 1 Human genes 0.000 description 2
- 102100040750 CUB and sushi domain-containing protein 1 Human genes 0.000 description 2
- 101710189782 Calcium-activated potassium channel subunit alpha-1 Proteins 0.000 description 2
- LZZYPRNAOMGNLH-UHFFFAOYSA-M Cetrimonium bromide Chemical compound [Br-].CCCCCCCCCCCCCCCC[N+](C)(C)C LZZYPRNAOMGNLH-UHFFFAOYSA-M 0.000 description 2
- HEDRZPFGACZZDS-UHFFFAOYSA-N Chloroform Chemical compound ClC(Cl)Cl HEDRZPFGACZZDS-UHFFFAOYSA-N 0.000 description 2
- 108020004635 Complementary DNA Proteins 0.000 description 2
- OQEBIHBLFRADNM-UHFFFAOYSA-N D-iminoxylitol Natural products OCC1NCC(O)C1O OQEBIHBLFRADNM-UHFFFAOYSA-N 0.000 description 2
- 102100025269 DENN domain-containing protein 2B Human genes 0.000 description 2
- 230000006820 DNA synthesis Effects 0.000 description 2
- 102100028561 Disabled homolog 1 Human genes 0.000 description 2
- 102100028606 E3 ubiquitin-protein ligase ZNRF2 Human genes 0.000 description 2
- 101000941893 Felis catus Leucine-rich repeat and calponin homology domain-containing protein 1 Proteins 0.000 description 2
- 101000892017 Homo sapiens CUB and sushi domain-containing protein 1 Proteins 0.000 description 2
- 101000722264 Homo sapiens DENN domain-containing protein 2B Proteins 0.000 description 2
- 101000915416 Homo sapiens Disabled homolog 1 Proteins 0.000 description 2
- 101000915569 Homo sapiens E3 ubiquitin-protein ligase ZNRF2 Proteins 0.000 description 2
- 101000919167 Homo sapiens Inactive carboxypeptidase-like protein X2 Proteins 0.000 description 2
- 101000979001 Homo sapiens Methionine aminopeptidase 2 Proteins 0.000 description 2
- 101000969087 Homo sapiens Microtubule-associated protein 2 Proteins 0.000 description 2
- 101000650697 Homo sapiens Roundabout homolog 2 Proteins 0.000 description 2
- 101000931371 Homo sapiens Zinc finger protein ZFPM2 Proteins 0.000 description 2
- 102100029326 Inactive carboxypeptidase-like protein X2 Human genes 0.000 description 2
- 102100023174 Methionine aminopeptidase 2 Human genes 0.000 description 2
- 108091034117 Oligonucleotide Proteins 0.000 description 2
- 108020005187 Oligonucleotide Probes Proteins 0.000 description 2
- 238000012879 PET imaging Methods 0.000 description 2
- 238000003559 RNA-seq method Methods 0.000 description 2
- 102100027739 Roundabout homolog 2 Human genes 0.000 description 2
- 108091007416 X-inactive specific transcript Proteins 0.000 description 2
- 102100020996 Zinc finger protein ZFPM2 Human genes 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000012098 association analyses Methods 0.000 description 2
- 210000001130 astrocyte Anatomy 0.000 description 2
- 238000011888 autopsy Methods 0.000 description 2
- 230000028956 calcium-mediated signaling Effects 0.000 description 2
- 230000001413 cellular effect Effects 0.000 description 2
- 238000003759 clinical diagnosis Methods 0.000 description 2
- 238000013527 convolutional neural network Methods 0.000 description 2
- 230000002596 correlated effect Effects 0.000 description 2
- 230000001186 cumulative effect Effects 0.000 description 2
- 230000003247 decreasing effect Effects 0.000 description 2
- 238000003745 diagnosis Methods 0.000 description 2
- 229940079593 drug Drugs 0.000 description 2
- 239000007850 fluorescent dye Substances 0.000 description 2
- 239000012634 fragment Substances 0.000 description 2
- 238000012760 immunocytochemical staining Methods 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 238000000338 in vitro Methods 0.000 description 2
- 238000002844 melting Methods 0.000 description 2
- 230000008018 melting Effects 0.000 description 2
- 210000000274 microglia Anatomy 0.000 description 2
- 208000027061 mild cognitive impairment Diseases 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 230000004031 neuronal differentiation Effects 0.000 description 2
- 230000008587 neuronal excitability Effects 0.000 description 2
- 108020004707 nucleic acids Proteins 0.000 description 2
- 102000039446 nucleic acids Human genes 0.000 description 2
- 150000007523 nucleic acids Chemical class 0.000 description 2
- 210000004248 oligodendroglia Anatomy 0.000 description 2
- 239000002751 oligonucleotide probe Substances 0.000 description 2
- 239000013610 patient sample Substances 0.000 description 2
- 239000012071 phase Substances 0.000 description 2
- 238000002600 positron emission tomography Methods 0.000 description 2
- 238000003908 quality control method Methods 0.000 description 2
- 230000009467 reduction Effects 0.000 description 2
- 230000002829 reductive effect Effects 0.000 description 2
- 108091008146 restriction endonucleases Proteins 0.000 description 2
- 238000012216 screening Methods 0.000 description 2
- 239000006228 supernatant Substances 0.000 description 2
- 238000003786 synthesis reaction Methods 0.000 description 2
- 101150024821 tetO gene Proteins 0.000 description 2
- 238000010361 transduction Methods 0.000 description 2
- 230000026683 transduction Effects 0.000 description 2
- 230000009466 transformation Effects 0.000 description 2
- 238000000844 transformation Methods 0.000 description 2
- 238000007482 whole exome sequencing Methods 0.000 description 2
- QNQZPJLBGRQFDD-ZMSORURPSA-N (2r,3r,4r,5r)-2-[(1s,2s,3r,4s,6r)-4,6-diamino-3-[(2s,3r,4r,5s,6r)-3-amino-4,5-dihydroxy-6-[(1r)-1-hydroxyethyl]oxan-2-yl]oxy-2-hydroxycyclohexyl]oxy-5-methyl-4-(methylamino)oxane-3,5-diol;sulfuric acid Chemical compound OS(O)(=O)=O.O1C[C@@](O)(C)[C@H](NC)[C@@H](O)[C@H]1O[C@@H]1[C@@H](O)[C@H](O[C@@H]2[C@@H]([C@@H](O)[C@H](O)[C@@H]([C@@H](C)O)O2)N)[C@@H](N)C[C@H]1N QNQZPJLBGRQFDD-ZMSORURPSA-N 0.000 description 1
- RQOWDDLKGBMJFX-QHCPKHFHSA-N (2s)-2-[3,5-bis[4-(trifluoromethyl)phenyl]phenyl]-4-methylpentanoic acid Chemical compound C=1C([C@@H](C(O)=O)CC(C)C)=CC(C=2C=CC(=CC=2)C(F)(F)F)=CC=1C1=CC=C(C(F)(F)F)C=C1 RQOWDDLKGBMJFX-QHCPKHFHSA-N 0.000 description 1
- FWMNVWWHGCHHJJ-SKKKGAJSSA-N 4-amino-1-[(2r)-6-amino-2-[[(2r)-2-[[(2r)-2-[[(2r)-2-amino-3-phenylpropanoyl]amino]-3-phenylpropanoyl]amino]-4-methylpentanoyl]amino]hexanoyl]piperidine-4-carboxylic acid Chemical compound C([C@H](C(=O)N[C@H](CC(C)C)C(=O)N[C@H](CCCCN)C(=O)N1CCC(N)(CC1)C(O)=O)NC(=O)[C@H](N)CC=1C=CC=CC=1)C1=CC=CC=C1 FWMNVWWHGCHHJJ-SKKKGAJSSA-N 0.000 description 1
- 101150037123 APOE gene Proteins 0.000 description 1
- 229930024421 Adenine Natural products 0.000 description 1
- GFFGJBXGBJISGV-UHFFFAOYSA-N Adenine Chemical compound NC1=NC=NC2=C1N=CN2 GFFGJBXGBJISGV-UHFFFAOYSA-N 0.000 description 1
- 206010001497 Agitation Diseases 0.000 description 1
- 102000002260 Alkaline Phosphatase Human genes 0.000 description 1
- 108020004774 Alkaline Phosphatase Proteins 0.000 description 1
- 208000022099 Alzheimer disease 2 Diseases 0.000 description 1
- 208000002150 Arrhythmogenic Right Ventricular Dysplasia Diseases 0.000 description 1
- 201000006058 Arrhythmogenic right ventricular cardiomyopathy Diseases 0.000 description 1
- 230000007324 Aβ metabolism Effects 0.000 description 1
- 238000000035 BCA protein assay Methods 0.000 description 1
- 101710150192 Beta-secretase 1 Proteins 0.000 description 1
- 102100039195 Cullin-1 Human genes 0.000 description 1
- 102000053602 DNA Human genes 0.000 description 1
- 238000007400 DNA extraction Methods 0.000 description 1
- 239000003155 DNA primer Substances 0.000 description 1
- 239000003298 DNA probe Substances 0.000 description 1
- 102100032883 DNA-binding protein SATB2 Human genes 0.000 description 1
- 241000238557 Decapoda Species 0.000 description 1
- 208000014094 Dystonic disease Diseases 0.000 description 1
- 108060002716 Exonuclease Proteins 0.000 description 1
- 208000034826 Genetic Predisposition to Disease Diseases 0.000 description 1
- 206010071602 Genetic polymorphism Diseases 0.000 description 1
- 101710088172 HTH-type transcriptional regulator RipA Proteins 0.000 description 1
- 101000894895 Homo sapiens Beta-secretase 1 Proteins 0.000 description 1
- 101000749829 Homo sapiens Connector enhancer of kinase suppressor of ras 3 Proteins 0.000 description 1
- 101000746063 Homo sapiens Cullin-1 Proteins 0.000 description 1
- 101000655236 Homo sapiens DNA-binding protein SATB2 Proteins 0.000 description 1
- 101001139134 Homo sapiens Krueppel-like factor 4 Proteins 0.000 description 1
- 101001057193 Homo sapiens Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1 Proteins 0.000 description 1
- 101100516507 Homo sapiens NEUROG2 gene Proteins 0.000 description 1
- 101000603698 Homo sapiens Neurogenin-2 Proteins 0.000 description 1
- 101000986779 Homo sapiens Orexigenic neuropeptide QRFP Proteins 0.000 description 1
- 101000617536 Homo sapiens Presenilin-1 Proteins 0.000 description 1
- 101000984042 Homo sapiens Protein lin-28 homolog A Proteins 0.000 description 1
- 101001060451 Homo sapiens Pyroglutamylated RF-amide peptide receptor Proteins 0.000 description 1
- 101000687905 Homo sapiens Transcription factor SOX-2 Proteins 0.000 description 1
- 101000844510 Homo sapiens Transient receptor potential cation channel subfamily M member 1 Proteins 0.000 description 1
- 101000800807 Homo sapiens Tumor necrosis factor alpha-induced protein 8 Proteins 0.000 description 1
- 206010061218 Inflammation Diseases 0.000 description 1
- 102100020677 Krueppel-like factor 4 Human genes 0.000 description 1
- 102100027240 Membrane-associated guanylate kinase, WW and PDZ domain-containing protein 1 Human genes 0.000 description 1
- 102100028142 Orexigenic neuropeptide QRFP Human genes 0.000 description 1
- 240000007594 Oryza sativa Species 0.000 description 1
- 235000007164 Oryza sativa Nutrition 0.000 description 1
- 238000012408 PCR amplification Methods 0.000 description 1
- 102100035423 POU domain, class 5, transcription factor 1 Human genes 0.000 description 1
- 101710126211 POU domain, class 5, transcription factor 1 Proteins 0.000 description 1
- 229930040373 Paraformaldehyde Natural products 0.000 description 1
- ISWSIDIOOBJBQZ-UHFFFAOYSA-N Phenol Chemical compound OC1=CC=CC=C1 ISWSIDIOOBJBQZ-UHFFFAOYSA-N 0.000 description 1
- ZLMJMSJWJFRBEC-UHFFFAOYSA-N Potassium Chemical compound [K] ZLMJMSJWJFRBEC-UHFFFAOYSA-N 0.000 description 1
- 102000004257 Potassium Channel Human genes 0.000 description 1
- 102100025460 Protein lin-28 homolog A Human genes 0.000 description 1
- 102100027888 Pyroglutamylated RF-amide peptide receptor Human genes 0.000 description 1
- 239000012083 RIPA buffer Substances 0.000 description 1
- 108020004682 Single-Stranded DNA Proteins 0.000 description 1
- 238000002105 Southern blotting Methods 0.000 description 1
- 102000003617 TRPM1 Human genes 0.000 description 1
- AYFVYJQAPQTCCC-UHFFFAOYSA-N Threonine Natural products CC(O)C(N)C(O)=O AYFVYJQAPQTCCC-UHFFFAOYSA-N 0.000 description 1
- 239000004473 Threonine Substances 0.000 description 1
- 102100024270 Transcription factor SOX-2 Human genes 0.000 description 1
- 229920004890 Triton X-100 Polymers 0.000 description 1
- 239000013504 Triton X-100 Substances 0.000 description 1
- 238000010162 Tukey test Methods 0.000 description 1
- 102100033649 Tumor necrosis factor alpha-induced protein 8 Human genes 0.000 description 1
- 241000700605 Viruses Species 0.000 description 1
- 238000001793 Wilcoxon signed-rank test Methods 0.000 description 1
- JLCPHMBAVCMARE-UHFFFAOYSA-N [3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[3-[[3-[[3-[[3-[[3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-[[5-(2-amino-6-oxo-1H-purin-9-yl)-3-hydroxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxyoxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(5-methyl-2,4-dioxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(6-aminopurin-9-yl)oxolan-2-yl]methoxy-hydroxyphosphoryl]oxy-5-(4-amino-2-oxopyrimidin-1-yl)oxolan-2-yl]methyl [5-(6-aminopurin-9-yl)-2-(hydroxymethyl)oxolan-3-yl] hydrogen phosphate Polymers 
Cc1cn(C2CC(OP(O)(=O)OCC3OC(CC3OP(O)(=O)OCC3OC(CC3O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c3nc(N)[nH]c4=O)C(COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3COP(O)(=O)OC3CC(OC3CO)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3ccc(N)nc3=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cc(C)c(=O)[nH]c3=O)n3cc(C)c(=O)[nH]c3=O)n3ccc(N)nc3=O)n3cc(C)c(=O)[nH]c3=O)n3cnc4c3nc(N)[nH]c4=O)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)n3cnc4c(N)ncnc34)O2)c(=O)[nH]c1=O JLCPHMBAVCMARE-UHFFFAOYSA-N 0.000 description 1
- 229960000643 adenine Drugs 0.000 description 1
- 230000002776 aggregation Effects 0.000 description 1
- 238000004220 aggregation Methods 0.000 description 1
- 238000000540 analysis of variance Methods 0.000 description 1
- 238000013103 analytical ultracentrifugation Methods 0.000 description 1
- 210000004507 artificial chromosome Anatomy 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000003339 best practice Methods 0.000 description 1
- 230000003115 biocidal effect Effects 0.000 description 1
- 230000033228 biological regulation Effects 0.000 description 1
- 230000000903 blocking effect Effects 0.000 description 1
- 210000004899 c-terminal region Anatomy 0.000 description 1
- 238000010804 cDNA synthesis Methods 0.000 description 1
- 230000008777 canonical pathway Effects 0.000 description 1
- 230000017455 cell-cell adhesion Effects 0.000 description 1
- 230000008614 cellular interaction Effects 0.000 description 1
- 239000007795 chemical reaction product Substances 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 230000000052 comparative effect Effects 0.000 description 1
- 239000003636 conditioned culture medium Substances 0.000 description 1
- 239000000470 constituent Substances 0.000 description 1
- 238000007796 conventional method Methods 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 230000001054 cortical effect Effects 0.000 description 1
- 239000013601 cosmid vector Substances 0.000 description 1
- 229960000265 cromoglicic acid Drugs 0.000 description 1
- IMZMKUWMOSJXDT-UHFFFAOYSA-N cromoglycic acid Chemical compound O1C(C(O)=O)=CC(=O)C2=C1C=CC=C2OCC(O)COC1=CC=CC2=C1C(=O)C=C(C(O)=O)O2 IMZMKUWMOSJXDT-UHFFFAOYSA-N 0.000 description 1
- 238000002790 cross-validation Methods 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000013523 data management Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- UQLDLKMNUJERMK-UHFFFAOYSA-L di(octadecanoyloxy)lead Chemical compound [Pb+2].CCCCCCCCCCCCCCCCCC([O-])=O.CCCCCCCCCCCCCCCCCC([O-])=O UQLDLKMNUJERMK-UHFFFAOYSA-L 0.000 description 1
- 238000009792 diffusion process Methods 0.000 description 1
- 235000021186 dishes Nutrition 0.000 description 1
- BFMYDTVEBKDAKJ-UHFFFAOYSA-L disodium;(2',7'-dibromo-3',6'-dioxido-3-oxospiro[2-benzofuran-1,9'-xanthene]-4'-yl)mercury;hydrate Chemical compound O.[Na+].[Na+].O1C(=O)C2=CC=CC=C2C21C1=CC(Br)=C([O-])C([Hg])=C1OC1=C2C=C(Br)C([O-])=C1 BFMYDTVEBKDAKJ-UHFFFAOYSA-L 0.000 description 1
- 238000002224 dissection Methods 0.000 description 1
- 238000009826 distribution Methods 0.000 description 1
- 239000000975 dye Substances 0.000 description 1
- 208000025688 early-onset autosomal dominant Alzheimer disease Diseases 0.000 description 1
- 239000003792 electrolyte Substances 0.000 description 1
- 238000001962 electrophoresis Methods 0.000 description 1
- 210000002472 endoplasmic reticulum Anatomy 0.000 description 1
- 210000002889 endothelial cell Anatomy 0.000 description 1
- 230000006353 environmental stress Effects 0.000 description 1
- 230000002255 enzymatic effect Effects 0.000 description 1
- 210000002304 esc Anatomy 0.000 description 1
- 238000012869 ethanol precipitation Methods 0.000 description 1
- ZMMJGEGLRURXTF-UHFFFAOYSA-N ethidium bromide Chemical compound [Br-].C12=CC(N)=CC=C2C2=CC=C(N)C=C2[N+](CC)=C1C1=CC=CC=C1 ZMMJGEGLRURXTF-UHFFFAOYSA-N 0.000 description 1
- 229960005542 ethidium bromide Drugs 0.000 description 1
- 102000013165 exonuclease Human genes 0.000 description 1
- 238000001914 filtration Methods 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 238000012921 fluorescence analysis Methods 0.000 description 1
- 239000012737 fresh medium Substances 0.000 description 1
- 238000001502 gel electrophoresis Methods 0.000 description 1
- 210000001654 germ layer Anatomy 0.000 description 1
- 239000001963 growth medium Substances 0.000 description 1
- 210000004209 hair Anatomy 0.000 description 1
- 102000045835 human NEUROG2 Human genes 0.000 description 1
- 230000006698 induction Effects 0.000 description 1
- 230000001939 inductive effect Effects 0.000 description 1
- 230000004968 inflammatory condition Effects 0.000 description 1
- 230000004054 inflammatory process Effects 0.000 description 1
- 208000011487 inherited dystonia Diseases 0.000 description 1
- 230000000977 initiatory effect Effects 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 238000011901 isothermal amplification Methods 0.000 description 1
- 238000012417 linear regression Methods 0.000 description 1
- 238000007477 logistic regression Methods 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 210000002751 lymph Anatomy 0.000 description 1
- 239000006166 lysate Substances 0.000 description 1
- 238000002595 magnetic resonance imaging Methods 0.000 description 1
- 238000001819 mass spectrum Methods 0.000 description 1
- 230000013011 mating Effects 0.000 description 1
- 238000001840 matrix-assisted laser desorption--ionisation time-of-flight mass spectrometry Methods 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 238000010197 meta-analysis Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000010369 molecular cloning Methods 0.000 description 1
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- the present invention relates to an information processing method, an information processing apparatus, and a program.
- This application claims priority to U.S. Provisional Patent Application No. 63/174,500, filed in the United States on April 13, 2021, the contents of which are hereby incorporated by reference.
- AD: Alzheimer's disease
- GWAS: genome-wide association studies
- AD is thought to develop through the combined action of multiple genes (polygenes). In particular, for sporadic AD with no family history, which accounts for 95% of AD patients, there is no effective approach for characterizing the genetic factors in the pathology and finding the cause.
- the present invention has been made in view of the above circumstances, and provides an information processing method, information processing apparatus, and program capable of predicting the risk of developing AD in a subject.
- the inventors created cerebral cortical neurons using an iPS cohort consisting of iPS cells established from sporadic AD patients.
- Using the amyloid β (Aβ) 42/40 ratio as a phenotype, a GWAS (cell GWAS) was performed, and loci associated with the Aβ42/40 ratio were searched.
- The inventors have found that the risk of developing AD in a subject can be predicted using the identified loci associated with the Aβ42/40 ratio as a polygene data set, and have completed the present invention.
- CDiP: cellular dissection of polygenicity
- (1) An information processing method comprising: step 1 of detecting a first SNP that is a mutation in an Alzheimer's disease-associated gene in a genomic DNA sample derived from a subject; and step 2 of determining, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which a second SNP, which is a mutation in the Alzheimer's disease-associated gene detected in a genomic DNA sample derived from a patient who developed Alzheimer's disease, is labeled with information on the onset of Alzheimer's disease.
- (2) The information processing method according to (1), wherein the machine learning model is a random forest comprising a plurality of classifiers, and each classifier is trained using a specific training data set selected from among the plurality of training data sets based on the attribute information of the patient who developed Alzheimer's disease and the principal component information in that patient's genetic information.
- (3) The information processing method according to (1) or (2), wherein the plurality of training data sets include a data set in which an imputed genotype of the second SNP, estimated from the second SNP using genotype imputation, is labeled with information on the onset of Alzheimer's disease.
- (4) The information processing method according to any one of (1) to (3), wherein the mutation is one or more mutations listed in Tables 1-1 to 1-77.
- An information processing apparatus comprising: a detection unit that detects a first SNP that is a mutation in an Alzheimer's disease-associated gene in a genomic DNA sample derived from a subject; and a determination unit that determines, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which a second SNP, which is a mutation in the Alzheimer's disease-associated gene detected in a genomic DNA sample derived from a patient who developed Alzheimer's disease, is labeled with information on the onset of Alzheimer's disease.
- A program for executing: step 1 of detecting a first SNP that is a mutation in an Alzheimer's disease-associated gene in a genomic DNA sample derived from a subject; and step 2 of determining, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which a second SNP, which is a mutation in the Alzheimer's disease-associated gene detected in a genomic DNA sample derived from a patient who developed Alzheimer's disease, is labeled with information on the onset of Alzheimer's disease.
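- The two-step flow described above (detect SNPs, then classify them with a model trained on labeled patient SNPs) can be illustrated with a small sketch. This is not the prediction model of the embodiment: it uses a bagged ensemble of one-split decision stumps as a simplified stand-in for a random forest, it assumes genotypes are coded as minor-allele counts (0, 1, 2), and the toy data and all function names are hypothetical.

```python
import random
from collections import Counter


def train_stump(X, y, feature_ids):
    """Pick the (feature, threshold, label) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in (0, 1):  # genotype codes are 0, 1, 2, so split at >0 or >1
            for pos_label in (0, 1):
                pred = [pos_label if row[f] > t else 1 - pos_label for row in X]
                errors = sum(p != label for p, label in zip(pred, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, pos_label)
    return best[1:]


def train_forest(X, y, n_trees=25, n_features=2, seed=0):
    """Train stumps on bootstrap samples, each restricted to a random SNP subset."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of patients
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, genotype):
    """Majority vote over all stumps; 1 = predicted to develop AD, 0 = not."""
    votes = Counter(pos if genotype[f] > t else 1 - pos for f, t, pos in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: rows are individuals, columns are three SNPs.
X_train = [[2, 1, 1], [1, 2, 1], [1, 1, 2], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = labeled as having developed AD
forest = train_forest(X_train, y_train)
```

- A real random forest grows full decision trees rather than stumps; the sketch keeps only the bootstrap sampling, random feature subsets, and voting that the ensemble idea relies on.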
- the risk of developing AD in a subject can be predicted. Prediction of onset risk can contribute to improvement of AD prevention or therapeutic effects. Furthermore, new therapeutic targets for AD can be provided.
- FIG. 4 is a flow chart showing the flow of a series of runtime processes performed by the processing unit 120 according to the first embodiment.
- A diagram showing an example of the prediction model MDL according to the first embodiment.
- 6 is a flow chart showing a series of training processes performed by the processing unit 120 according to the first embodiment.
- 4 is a graph showing the amount of Aβ40 corresponding to the apolipoprotein E (APOE) ε4 genotype in Example 1.
- "N.S." indicates no significant difference (P > 0.05).
- 4 is a graph showing the amount of Aβ42 corresponding to the APOE ε4 genotype in Example 1.
- "N.S." indicates no significant difference (P > 0.05).
- 2 is a graph showing Aβ42/40 ratios corresponding to APOE ε4 genotypes in Example 1.
- FIG. 4 is a graph showing the protein concentration of iPS cell-derived neurons corresponding to the APOE ε4 genotype in Example 1.
- FIG. 2 is a Manhattan plot of a genome-wide association study of cell polygenic analysis (CDiP) to identify loci associated with the Aβ42/40 ratio without considering the APOE ε4 genotype in Example 1.
- The x-axis shows chromosomes and the y-axis shows −log10(p-value) of all SNPs tested. The upper line indicates the Bonferroni-corrected significance threshold (p < 5 × 10⁻⁸).
- 2 is a Manhattan plot of CDiP genome-wide association studies to identify loci associated with the Aβ42/40 ratio considering the APOE ε4 genotype in Example 1.
- The x-axis shows chromosomes and the y-axis shows −log10(p-value) of all SNPs tested.
- The upper line indicates the Bonferroni-corrected significance threshold (p < 5 × 10⁻⁸).
- 2 is a graph showing the results of pathway analysis of 24 CDiP-identified genes using Aβ42/40 ratios in Example 1.
- The horizontal axis indicates the p-value.
- 4 is a plot showing phosphorylated tau (p231-tau)/total tau ratios corresponding to APOE ε4 genotypes in Example 1.
- "N.S." indicates no significant difference (P > 0.05).
- 2 is a plot showing p231-tau/total tau ratios in culture supernatants of iPS cell-derived neurons, corresponding to gender in Example 1.
- The x-axis shows chromosomes and the y-axis shows −log10(p-value) of all SNPs tested.
- The upper line indicates the Bonferroni-corrected significance threshold (p < 5 × 10⁻⁸).
- The x-axis shows chromosomes and the y-axis shows −log10(p-value) of all SNPs tested.
- The upper line indicates the Bonferroni-corrected significance threshold (p < 5 × 10⁻⁸).
- 1 is a graph showing changes in the Aβ42/40 ratio due to knockdown of the identified genes in Example 1.
- The x-axis indicates the level of change in the Aβ42/40 ratio compared to non-siRNA-treated controls. Values are shown as mean ± standard deviation. * is p < 0.05, ** is p < 0.01, *** is p < 0.005, **** is p < 0.001.
- 1 is a graph showing changes in the amount of Aβ40 due to knockdown of the identified genes in Example 1.
- The x-axis indicates the level of change in Aβ40 abundance compared to non-siRNA-treated controls. Values are shown as mean ± standard deviation. * is p < 0.05, ** is p < 0.01, *** is p < 0.005, **** is p < 0.001.
- FIG. 1 is a graph showing changes in the amount of Aβ42 due to knockdown of the identified genes in Example 1.
- The x-axis indicates the level of change in Aβ42 abundance compared to non-siRNA-treated controls. Values are shown as mean ± standard deviation. * is p < 0.05, ** is p < 0.01, *** is p < 0.005, **** is p < 0.001.
- 2 is a graph showing changes in protein concentration due to knockdown of the identified genes in Example 1.
- The x-axis indicates the level of change in protein concentration compared to non-siRNA-treated controls. Values are shown as mean ± standard deviation. * is p < 0.05, ** is p < 0.01, *** is p < 0.005, **** is p < 0.001.
- FIG. 1 is a graph comparing, between Alzheimer's disease brains and non-dementia control brains, the expression levels in neurons of genes for which siRNA changed the Aβ42/40 ratio in Example 1.
- FIG. 1 is a graph comparing, between Alzheimer's disease brains and non-dementia control brains, the expression levels in neurons of genes for which siRNA reduced the amount of Aβ42 in Example 1.
- FIG. 1 shows the identified genes and potential therapeutic targets.
- 2 is a boxplot showing the relationship between Aβ-positive and Aβ-negative patients and age of onset in Example 2. The number (n) of Aβ-positive patients was 15, and that of Aβ-negative patients was 4.
- 2 is a plot showing the relationship between Aβ-positive and Aβ-negative patients in Example 2 and the amount of Aβ40 in culture supernatants of cerebral cortical neurons induced from human iPS cells. The number (n) of Aβ-positive patients was 15, and that of Aβ-negative patients was 4.
- 2 is a plot showing the relationship between Aβ-positive and Aβ-negative patients in Example 2 and the amount of Aβ42 in culture supernatants of cerebral cortical neurons induced from human iPS cells.
- FIG. 10 is a graph showing prediction results of Aβ deposition.
- 1 is a graph showing the results of predicting the amount of Aβ(1-42) in cerebrospinal fluid (CSF).
- 1 is a graph showing the results of predicting the amount of total tau (t-tau) in CSF in Example 2, using covariates (age, sex, and APOE-ε4 allele genotype) alone (left graph) or covariates together with the CDiP-identified genotype set (right graph).
- 1 is a graph showing the results of predicting the amount of phosphorylated tau (p-tau) in CSF in Example 2, using covariates (age, sex, and APOE-ε4 allele genotype) alone (left graph) or covariates together with the CDiP-identified genotype set (right graph).
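- The Manhattan plots described above place each SNP at −log10(p-value) and draw a line at the significance threshold of 5 × 10⁻⁸. The following minimal sketch shows that transformation and the thresholding step. The rs numbers and p-values are invented for illustration and are not results from the examples.

```python
import math

# Threshold drawn as the upper line in the plots. The value 5e-8 is commonly
# explained as 0.05 divided by roughly one million independent tests.
GENOME_WIDE_ALPHA = 5e-8


def neg_log10(p_value):
    """Height of a SNP on the y-axis of a Manhattan plot."""
    return -math.log10(p_value)


def significant_snps(p_values, alpha=GENOME_WIDE_ALPHA):
    """Return rs numbers whose p-value is below alpha, most significant first."""
    hits = [(p, rsid) for rsid, p in p_values.items() if p < alpha]
    return [rsid for p, rsid in sorted(hits)]


# Hypothetical association results (rs number -> p-value).
example = {"rs0000001": 3.2e-9, "rs0000002": 4.0e-5, "rs0000003": 1.0e-8}
hits = significant_snps(example)
```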
- <Alzheimer's disease (AD)-related gene mutation>
- The inventors performed a GWAS (cell GWAS) using cerebral cortical neurons induced from iPS cells established from sporadic AD patients, with the Aβ42/40 ratio, one of the pathological indicators of AD, as the phenotype. As shown in the examples described later, they found that, among mutations in AD-related genes, the mutations listed in Tables 1-1 to 1-77 are related to the Aβ42/40 ratio.
- Two reference analyses were compared: one that analyzes only mutations in AD-related genes containing one or more mutations listed in Tables 1-1 to 1-77 above, without age, sex, or the APOE4 genotype, which is said to be involved in the accumulation of Aβ in the brain; and one that analyzes only age, sex, and the APOE4 genotype, without those mutations. Compared with either of these, analyzing the mutations together with age, sex, and the APOE4 genotype yields a higher AUC score, which is one indicator of prediction accuracy.
- The prediction of AD onset risk using the information processing apparatus and information processing method of the present embodiment can be performed with an accuracy of AUC of about 0.7 (more specifically, about 0.73 or more and 0.76 or less).
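- For readers unfamiliar with the AUC score mentioned above, the sketch below computes it in its rank-based form: the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The labels and scores are toy values, not data from the examples.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and controls."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy example: two cases (label 1) and two controls (label 0).
toy_auc = auc_score([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

- An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported range of about 0.73 to 0.76.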
- the risk of AD is determined by analyzing a SNP set containing one or more mutations listed in Tables 1-1 to 1-77 above. By doing so, it is possible to provide a risk determination method with high accuracy or high predictability. That is, the information processing apparatus and information processing method of the present embodiment can be said to be an AD onset risk prediction apparatus and prediction method. Further, according to the information processing apparatus and information processing method of the present embodiment, it is possible to predict the risk of developing AD in subjects, including subjects suspected of having sporadic AD with no family history.
- the information processing apparatus and information processing method of the present embodiment can also contribute to AD stratification. This can also contribute to precision medicine.
- The term "risk of Alzheimer's disease (AD)" refers to the possibility of contracting Alzheimer's disease, that is, how susceptible or resistant a subject is to contracting AD.
- "Risk determination" includes, for example, dividing the current or future probability of AD into several levels and outputting them numerically. Determining risk for AD includes assessing genetic factors or genetic susceptibility to the disease, such as a predisposition to AD.
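- Dividing a predicted probability into several levels, as described above, can be done with a simple lookup. The cut points and level names below are hypothetical; the source does not state how many levels are used or where their boundaries lie.

```python
import bisect

# Hypothetical cut points and labels, chosen only for illustration.
CUTS = [0.25, 0.5, 0.75]
LEVELS = ["low", "moderate", "elevated", "high"]


def risk_level(probability):
    """Map a predicted AD probability in [0, 1] to a discrete risk level."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return LEVELS[bisect.bisect_right(CUTS, probability)]
```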
- One or more of the mutations listed in Tables 1-1 to 1-77 above can be used. These mutations are SNPs whose association with AD has not been recognized in the past.
- Because AD is thought to develop through the combined action of polygenes, the risk of AD can be determined with higher accuracy by analyzing the mutations listed in Tables 1-1 to 1-77 above in combination rather than individually.
- The inventors also found that, among AD-related gene mutations, the mutations listed in Tables 2-1 to 2-9 above are related to the phosphorylated tau/total tau ratio. Therefore, the SNP set can further include one or more mutations listed in Tables 2-1 to 2-9 above. Because AD is believed to develop through the combined action of polygenes, it is more preferable to use a SNP set that, in addition to the mutations related to the Aβ42/40 ratio, further includes the mutations listed in Table 6 below (those among the mutations in Tables 2-1 to 2-9 above that are particularly highly relevant to AD from the viewpoint of the phosphorylated tau/total tau ratio).
- Each table lists the rs number, the number of the chromosome on which each SNP is located (indicated by X or Y in the case of sex chromosomes), and the position of each SNP on the chromosome.
- Information such as the base sequences and diseases related to each SNP can be obtained, for example, by searching the NCBI SNP Database by rs number. This information can be referenced in the Database and is incorporated herein by reference.
- The position of each SNP on the chromosome is given according to genome assembly version GRCh37.
- Each SNP can be identified by referring to the base sequence identified by its rs number. When an rs number described in this specification has been merged with another rs number and a new rs number has been assigned, the rs number applicable herein includes the merged rs number and the other rs number merged with it.
- When an rs number described in this specification is a number assigned by merging multiple rs numbers, the rs number applicable herein includes the other original rs numbers.
- The base sequence indicated by each rs number is given as a specific base sequence in a database such as the NCBI SNP Database, but, owing to differences in race and other factors, the base sequence in portions other than the SNP itself may vary.
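- The table layout (rs number, chromosome, GRCh37 position) and the treatment of merged rs numbers described above can be modeled as follows. This is a sketch under stated assumptions: the record type, the merge map, and every identifier in it are hypothetical placeholders, not entries from the tables or from the NCBI SNP Database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    rsid: str
    chromosome: str  # "1" to "22", or "X" / "Y" for sex chromosomes
    position: int    # coordinate on genome assembly GRCh37


# Hypothetical merge history: retired rs number -> rs number it was merged into.
MERGED_INTO = {"rs0000010": "rs0000002", "rs0000011": "rs0000002"}


def canonical_rsid(rsid, merged_into=MERGED_INTO):
    """Follow the merge history until a current rs number is reached."""
    seen = set()
    while rsid in merged_into and rsid not in seen:
        seen.add(rsid)
        rsid = merged_into[rsid]
    return rsid


def equivalent_rsids(rsid, merged_into=MERGED_INTO):
    """All rs numbers (current and retired) that refer to the same SNP."""
    current = canonical_rsid(rsid, merged_into)
    retired = {old for old in merged_into if canonical_rsid(old, merged_into) == current}
    return {current} | retired


example_snp = Snp(rsid="rs0000002", chromosome="X", position=123456)
```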
- the race and gender of the subject are not limited.
- FIG. 1 is a diagram showing an example of the configuration of an information processing apparatus 100 according to the first embodiment.
- The information processing apparatus 100 includes, for example, a detection unit 110, a processing unit 120, and a storage unit.
- the detection unit 110 detects an SNP that is a mutation in an Alzheimer's disease (AD)-related gene (hereinafter referred to as a first SNP) in a subject-derived genomic DNA sample (step 1).
- The subject-derived genomic DNA sample can be cells or tissue collected from the subject's body and is not particularly limited as long as it contains nucleated cells. Examples include blood, cerebrospinal fluid, lymph, and hair. Among these, blood is preferable because of its low invasiveness; blood-derived nucleated cells include, for example, peripheral blood mononuclear cells.
- Genomic DNA isolated from these samples by a conventional method may be used directly, or the isolated genomic DNA may be amplified and the amplified genomic DNA used.
- The method for extracting genomic DNA is not particularly limited, and a known method can be used. Examples include the phenol/chloroform method and the cetyltrimethylammonium bromide (CTAB) method.
- A commercially available kit may be used for DNA extraction. Examples of such kits include the Wizard Genomic DNA Purification Kit (manufactured by Promega).
- The detection unit 110 is composed of a device used for ordinary genetic polymorphism analysis. Examples of such devices include DNA microarrays; conventional sequencers and next generation sequencers (NGS); and nucleic acid amplification devices such as polymerase chain reaction (PCR) devices.
- SNPs can be detected with the devices exemplified above using known SNP detection methods. Examples include direct sequencing, the PCR method, the restriction fragment length polymorphism (RFLP) method, the hybridization method, the TaqMan (registered trademark) PCR method (hereinafter the "registered trademark" notation is omitted), and methods using mass spectrometry.
- The direct sequencing method is performed by cloning the region containing the SNP into a vector, or amplifying it by PCR, and determining the base sequence of the region.
- Cloning can be performed, for example, by screening a cDNA library using an appropriate probe, or by amplifying the region by PCR using appropriate primers and ligating it into an appropriate vector. The clone can also be subcloned into another vector. Cloning is not limited to these methods.
- Examples of vectors that can be used include commercially available plasmid vectors such as pBluescript SK (+) (manufactured by Stratagene), pGEM-T (manufactured by Promega), pAmp (manufactured by Gibco-BRL), p-Direct (manufactured by Clontech), and pCR2.1-TOPO (manufactured by Invitrogen), as well as virus vectors, artificial chromosome vectors, and cosmid vectors.
- a known method can be used for determining the base sequence. Examples include, but are not limited to, manual sequencing using a radioactive marker nucleotide and automatic sequencing using a dye terminator. Based on the base sequence thus obtained, it is determined whether or not the sample has the SNP.
- The PCR method is performed using oligonucleotide primers that hybridize only to sequences having the SNP (hereinafter sometimes referred to as "SNP detection primers"). Since a plurality of SNPs exist, a single primer capable of detecting all SNPs may be used as the SNP detection primer, or two or more types of primers, each capable of detecting one SNP, may be used in combination.
- The primers are used to amplify the DNA in the sample. If the SNP detection primers generate a PCR product, the sample contains the SNP; if no PCR product is generated, the sample does not contain the SNP.
- In the RFLP method, the region containing the SNP in the sample is first amplified by PCR. This PCR product is then cut with a restriction enzyme appropriate for the region containing the SNP.
- The restriction enzyme-digested PCR products are separated by gel electrophoresis and visualized by ethidium bromide staining. The presence of the SNP in the sample can be detected by comparing the fragment lengths with a molecular weight marker and, as a control, with the PCR product not treated with the restriction enzyme.
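- The restriction digest comparison described above can be simulated on sequence strings. The sketch below assumes the enzyme EcoRI (recognition site GAATTC, cutting after the first G) and a made-up amplicon in which the variant allele destroys the site; the source does not name a specific enzyme or sequence. Only the given strand is searched, which is sufficient here because the EcoRI site is palindromic.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting at every occurrence of `site`.

    cut_offset is the number of bases of the site that stay on the upstream fragment.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


ECORI_SITE, ECORI_OFFSET = "GAATTC", 1

# Hypothetical 26-base amplicons differing at a single base inside the site.
reference_allele = "ATGCATGCAT" + "GAATTC" + "GGCCTTAAGG"
variant_allele = "ATGCATGCAT" + "GAGTTC" + "GGCCTTAAGG"

reference_fragments = digest(reference_allele, ECORI_SITE, ECORI_OFFSET)
variant_fragments = digest(variant_allele, ECORI_SITE, ECORI_OFFSET)
```

- On a gel, the reference allele would appear as two shorter bands and the variant allele as one uncut band, which is the length comparison the text describes.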
- The hybridization method determines the presence or absence of SNPs in a sample based on the property of DNA in the sample to hybridize with complementary DNA molecules (e.g., oligonucleotide probes).
- Various known techniques for hybridization and detection, such as colony hybridization, plaque hybridization, and Southern blotting, can be used for this hybridization method.
- For hybridization conditions, see "DNA Cloning 1: Core Techniques, A Practical Approach, 2nd ed." (Oxford University (1995)), especially Section 2.10.
- hybridization can also be detected using a DNA chip.
- a SNP-specific oligonucleotide probe is designed and attached to a solid phase support. Then, the DNA in the sample is brought into contact with the DNA chip to detect hybridization.
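- The principle of an allele-specific probe can be shown with string operations. This sketch assumes that a probe hybridizes only when it is perfectly complementary to a stretch of the target strand; real hybridization tolerates some mismatches depending on stringency. The sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizes(probe, target):
    """True if the probe is perfectly complementary to some stretch of the target."""
    return reverse_complement(probe) in target


# Hypothetical target strands differing at one base (A versus T).
target_ref = "ACGTTGCAAGGCTA"
target_alt = "ACGTTGCTAGGCTA"

# Probe designed against the window around the variant base of the alt allele.
probe_alt = reverse_complement("TTGCTAGG")
```

- On a DNA chip, many such probes are immobilized at known positions, so the pattern of positions that give a signal indicates which alleles are present.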
- the TaqMan PCR method uses SNP-specific TaqMan probes and Taq polymerase to simultaneously detect SNPs and amplify regions containing SNPs.
- a TaqMan probe is an oligonucleotide of about 20 bases labeled with a fluorescent substance at the 5' end and a quencher at the 3' end, and is designed to hybridize to the SNP site of interest.
- Taq polymerase has 5' to 3' nuclease activity.
- When the extension reaction from the forward primer side reaches the TaqMan probe hybridized to the template, the 5' to 3' nuclease activity of Taq polymerase cleaves off the fluorescent substance bound to the 5' end of the TaqMan probe. As a result, the liberated fluorescent substance is no longer affected by the quencher and emits fluorescence. Measuring the fluorescence intensity enables SNP detection.
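- One way fluorescence intensity could be turned into a genotype is sketched below. It assumes a two-probe design in which each allele has its own probe and dye, which the text above does not specify, and the threshold value is arbitrary; both are assumptions made only for illustration.

```python
def call_genotype(signal_allele1, signal_allele2, threshold=1.0):
    """Call a genotype from the end-point fluorescence of two allele-specific probes."""
    allele1_present = signal_allele1 >= threshold
    allele2_present = signal_allele2 >= threshold
    if allele1_present and allele2_present:
        return "heterozygous"
    if allele1_present:
        return "homozygous allele 1"
    if allele2_present:
        return "homozygous allele 2"
    return "no call"
```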
- A SNP typing method applying the MALDI-TOF/MS method may be combined with a primer extension method.
- This method enables high-throughput analysis and consists of the steps of 1) PCR, 2) purification of PCR products, 3) primer extension reaction, 4) purification of extension products, 5) mass spectrometry, and 6) genotyping.
- The PCR primers are designed so as not to overlap with the base at the SNP site. The PCR product is then purified either enzymatically, using exonuclease and shrimp alkaline phosphatase, or by ethanol precipitation.
- A primer extension reaction is then performed using a genotyping primer designed so that its 3' end is immediately adjacent to the SNP site.
- The PCR products are denatured at high temperature, and excess genotyping primer is added and allowed to anneal.
- When ddNTPs and DNA polymerase are added to the reaction system and thermal cycling is performed, an oligomer one base longer than the genotyping primer is produced. Because of the genotyping primer design described above, the one-base-longer oligomer generated in this extension reaction differs according to the allele.
- The purified extension product is subjected to mass spectrometry, and the genotype is determined from the mass spectrum.
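- The final genotyping step can be sketched as matching the observed mass increase of the extended primer to the mass of each possible dideoxynucleotide. The mass values below are approximate average masses for unmodified ddNMP residues and are an assumption of this sketch, as are the tolerance and the example primer mass; an actual assay would use the values for its own reagents.

```python
# Approximate mass (Da) added by incorporating one unmodified dideoxynucleotide.
DDNMP_MASS = {"A": 297.2, "C": 273.2, "G": 313.2, "T": 288.2}


def call_base(primer_mass, observed_mass, tolerance=2.0):
    """Return the base whose ddNMP mass best explains the shift, or None."""
    shift = observed_mass - primer_mass
    base, error = min(
        ((b, abs(shift - m)) for b, m in DDNMP_MASS.items()),
        key=lambda item: item[1],
    )
    return base if error <= tolerance else None


def call_genotype(primer_mass, peak_masses):
    """Combine the bases called from all extension-product peaks, e.g. 'C/T'."""
    bases = {call_base(primer_mass, m) for m in peak_masses} - {None}
    return "/".join(sorted(bases))
```

- A homozygous sample gives one extension peak and a heterozygous sample gives two, so the number of distinct bases called reflects zygosity.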
- Another SNP typing method that allows high throughput is a method that applies single-molecule fluorescence analysis.
- MF20/10S (manufactured by Olympus) is a system that employs this method.
- In this method, complementary and non-complementary primers are used, and the translational diffusion time of the fluorescently labeled primers amplified by PCR is measured and analyzed at the single-molecule level in an ultra-small region of about 1 femtoliter (one quadrillionth of a liter).
- the DNA chip method is also one of the types of typing that allows high throughput.
- a DNA chip has many types of DNA probes arrayed and immobilized on a substrate, and a labeled DNA sample is hybridized on the chip to detect fluorescent signals from the probes.
- An example of a SNP typing method that uses a gene amplification method other than the PCR method is the Snipper method.
- This method is a SNP typing method that applies the rolling circle amplification (RCA) method, a DNA amplification method in which DNA polymerase synthesizes complementary strand DNA while moving along a circular single-stranded DNA template.
- The probe is an oligo DNA 80 to 90 bases long that carries, at both ends, sequences 10 to 20 bases long complementary to the regions near the 5' side and the 3' side of the target SNP site. It is designed to anneal to the target DNA and become circular.
- the 3' end of the probe is designed to have a sequence complementary to the target SNP site. If the 3' end of the probe is perfectly complementary to the target SNP site, the probe will be circularized, but if the 3' end of the probe is mismatched, the probe will not be circularized.
- the probe also has a backbone sequence of 40 to 50 nucleotides in length and contains sequences complementary to two types of RCA amplification primers.
- the UCAN method is a method that applies the ICAN method, an isothermal gene amplification method developed by Takara Bio.
- the UCAN method uses DNA-RNA-DNA chimeric oligonucleotides (DRD) as primer precursors.
- This DRD primer precursor is designed so that the DNA at its 3' end is modified to prevent replication of the template DNA by DNA polymerase, and so that the RNA portion binds to the SNP site.
- The coexisting RNase H cleaves the RNA portion of the paired DRD primer only when the DRD primer and the template are perfectly matched.
- As a result, the modified DNA is removed from the 3' end of the primer and a new 3' end is formed, so the extension reaction by DNA polymerase proceeds and the template DNA is amplified.
- In contrast, when the DRD primer and the template are not perfectly matched, RNase H does not cleave the DRD primer, and DNA amplification does not occur.
- the amplification reaction after the perfectly matched DRD primer precursor is cleaved by RNase H proceeds by the ICAN reaction mechanism.
- The LAMP method is an isothermal gene amplification method developed by Eiken Chemical. Six regions of the target gene are defined (F3c, F2c, and F1c from the 3' end side; B3, B2, and B1 from the 5' end side), and four types of primers (FIP primer, F3 primer, BIP primer, and B3 primer) are used for the six regions.
- For typing purposes, it is sufficient that only the target SNP site (1 base) lies between F1 and B1, and the FIP primer and BIP primer are designed so that the SNP base is at their 5' ends.
- A DNA synthesis reaction starts from the dumbbell structure, which is the starting structure of the LAMP method, and the amplification reaction proceeds continuously. When a SNP is present, the DNA synthesis reaction from the dumbbell structure does not occur, and the amplification reaction does not proceed.
- the Invader method uses two types of non-fluorescently labeled probes (allele probe, invader probe), one type of fluorescently labeled probe (FRET probe), and the endonuclease Cleavase, without using a nucleic acid amplification method.
- Allele probes have a sequence complementary to the template DNA on the 3'-end side from the SNP site, and a sequence unrelated to the template DNA called a flap on the 5'-end side of the probe.
- the invader probe has a complementary sequence on the 5' end side from the SNP site of the template DNA, and the portion corresponding to the SNP site has any base.
- the FRET probe has a sequence complementary to the flap sequence on the 3' end side.
- the 5' end of the FRET probe is labeled with a fluorescent dye and a quencher, and because the FRET probe is designed to form a double strand intramolecularly, it is normally quenched.
- the 3' end (arbitrary base portion) of the invader probe penetrates into the SNP site when the allele probe forms a double strand with the template DNA.
- Cleavase recognizes the structure invaded by the base and cleaves the flap portion of the allele probe.
- this released flap then binds to the complementary sequence of the FRET probe, and the 3' end of the flap penetrates the intramolecular double-stranded portion of the FRET probe.
- Cleavase recognizes the structure in which the base of the flap penetrates into the FRET probe, and cleaves the fluorescent dye of the FRET probe, as in the case of the allele probe and the invader probe. As the fluorochrome moves away from the quencher, fluorescence is generated. If the allele probe does not match the template DNA, the specific structure recognized by Cleavase is not formed, and the flap is not cleaved.
- when primers are used for SNP detection, the primers are designed according to the region to be amplified and the typing method. For example, it is preferable that the primers can amplify the entire region, and their sequences can be designed based on the sequences near the ends of the region.
- Techniques for designing primers are well known in the art, and primers that can be used in the method of the present embodiment satisfy conditions that allow specific annealing, such as a length and base composition (melting temperature, Tm) that allow specific annealing.
- the length of the region to be amplified is not limited as long as it does not interfere with typing, and may be increased or decreased as appropriate depending on the detection method.
- the positional relationship between the primer and the SNP site can be freely designed according to the detection method. As long as the region containing the SNP to be detected (for example, a continuous stretch of 50 to 500 bases) can be amplified, the primers can be designed taking into account the characteristics of the typing method.
- the length that exhibits the function as a primer is preferably 10 to 100 bases, more preferably 15 to 50 bases, and even more preferably 15 to 30 bases.
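- As a rough illustration of screening candidate primers by length and base composition, the following sketch classifies a primer by the length ranges stated above and estimates its melting temperature. The Wallace rule used here (2 °C per A/T and 4 °C per G/C) is an assumption chosen for simplicity; the source does not specify how Tm is calculated, and the function names are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm in degrees Celsius with the Wallace rule: 2*(A+T) + 4*(G+C).

    Assumption: this simple rule is only a rough guide for short oligonucleotides.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def length_preference(primer: str) -> str:
    """Classify primer length by the ranges given in the description."""
    n = len(primer)
    if 15 <= n <= 30:
        return "even more preferred"
    if 15 <= n <= 50:
        return "more preferred"
    if 10 <= n <= 100:
        return "preferred"
    return "outside preferred range"


candidate = "ATGCATGCATGCATGC"  # hypothetical 16-base primer
print(length_preference(candidate), wallace_tm(candidate))
```

In practice a nearest-neighbor Tm model and specificity checks against the genome would be used instead of this simplified rule.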
- When using a probe for SNP detection, the probe is designed so that it recognizes the SNP site.
- the SNP site may be recognized anywhere in the probe according to the typing method, and may be recognized at the end of the probe depending on the typing method.
- When the SNP detection polynucleotide is used as a probe, the length of the base sequence complementary to the genomic DNA is usually 15 to 200 bases, preferably 15 to 100 bases, and more preferably 15 to 50 bases, although it may be longer or shorter depending on the typing method.
- the processing unit 120 determines whether or not the subject develops AD based on the SNP detected by the detection unit 110 (that is, the "first SNP"), using a machine learning model learned based on a plurality of training data sets in which SNPs, which are mutations in AD disease-associated genes detected in genomic DNA samples derived from patients with AD, are labeled with information on the onset of AD (step 2).
- the SNP included in the training data set will be referred to as a "second SNP”.
- the processing unit 120 includes, for example, an acquisition unit 121, a feature amount conversion unit 122, a determination unit 123, an output control unit 124, and a learning unit 125.
- the constituent elements of the processing unit 120 are implemented by a processor such as a CPU (Central Processing Unit) or GPU (Graphics Processing Unit) executing a program stored in the storage unit 130, for example.
- Some or all of the components of the processing unit 120 are implemented by hardware (circuitry) such as LSI (Large Scale Integration), ASIC (Application Specific Integrated Circuit), or FPGA (Field-Programmable Gate Array). It may be realized by cooperation of software and hardware.
- the storage unit 130 is implemented by a storage device such as a HDD (Hard Disc Drive), flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), ROM (Read Only Memory), RAM (Random Access Memory), or the like.
- the storage unit 130 stores model information 131 in addition to various programs such as firmware and application programs. The model information 131 will be described later.
- FIG. 2A is a flowchart showing the flow of a series of runtime processes by the processing unit 120 according to the first embodiment. The processing of this flowchart may be performed repeatedly at a predetermined cycle, for example.
- the acquisition unit 121 acquires the detection data of the first SNP, which is the mutation of the Alzheimer's disease-related gene, in the subject-derived genomic DNA sample from the detection unit 110 (step S100).
- the acquired first SNP detection data can also be said to be genotype data of AD-related genes possessed by the subject (hereinafter sometimes referred to as the "first SNP set" or "subject's genotype data").
- the feature quantity conversion unit 122 converts the first SNP data acquired by the acquisition unit 121 into a feature quantity that can be input to the model (step S101).
- the feature value here is, for example, a parameter indicating whether the subject's genotype data is homozygous (AA), homozygous (BB), or heterozygous (AB) for each SNP.
- the genotype is indicated by nucleotides such as "GG” indicating that both SNPs on homologous chromosomes are G (guanine), or "AG” indicating that one is G (guanine) and the other is A (adenine).
- the genotype data of the subject is converted into parameters that can be input to the model by matching it against the second SNP, that is, the genotype data of AD-related genes possessed by AD patients (hereinafter sometimes referred to as the "second SNP set" or "AD patient genotype data").
- the conversion of the subject's genotype data into feature quantities can be performed, for example, by assigning a value to the subject's genotype data for each SNP included in the second SNP set. For example, for each SNP, a value (for example, 0, 1, or 2) is associated with the genotype. Thereby, the subject's genotype data can be converted into a feature amount.
- the value associated with each SNP is set to 0, 1, or 2, but the value associated with the SNP is not limited to 0, 1, or 2.
- the value associated with each genotype can be determined for each SNP.
- for example, for one SNP, a value of 2 may be associated when the subject's genotype data is homozygous (AA), a value of 1 when it is heterozygous (AB), and a value of 0 when it is homozygous (BB).
- for another SNP, a value of 2 may be associated when the subject's genotype data is homozygous (BB), a value of 1 when it is heterozygous (AB), and a value of 0 when it is homozygous (AA).
- in this way, the subject's genotype data can be converted into feature values. The values used for the correspondence in this conversion can be determined arbitrarily. For example, for each SNP, a genotype with high relevance to AD can be associated with a value of 2, and a genotype with low relevance to AD with a value of 1 or 0.
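- The conversion described above can be sketched as follows. This is a minimal illustration, assuming that each SNP has one designated allele whose copy number (0, 1, or 2) is used as the value; the SNP identifiers, the allele table, and the function names are hypothetical and do not appear in the source.

```python
from typing import Dict, List

# Hypothetical table: for each SNP, the allele whose copies are counted.
# Choosing a different allele per SNP reproduces the per-SNP mapping
# (AA -> 2 for one SNP, BB -> 2 for another).
COUNTED_ALLELE: Dict[str, str] = {"snp_1": "A", "snp_2": "G"}


def encode_genotype(snp_id: str, genotype: str) -> int:
    """Return 2, 1, or 0: the number of copies of the SNP's counted allele."""
    if len(genotype) != 2:
        raise ValueError("genotype must be two bases, e.g. 'AG'")
    return sum(1 for base in genotype if base == COUNTED_ALLELE[snp_id])


def to_features(genotypes: Dict[str, str], snp_order: List[str]) -> List[int]:
    """Convert one subject's genotype data into a fixed-order feature vector."""
    return [encode_genotype(snp, genotypes[snp]) for snp in snp_order]


subject = {"snp_1": "AG", "snp_2": "GG"}  # heterozygous, homozygous
features = to_features(subject, ["snp_1", "snp_2"])
print(features)
```

Keeping the SNP order fixed ensures that the same vector position always refers to the same SNP at training time and at runtime.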
- the determination unit 123 inputs the feature amount converted from the first SNP data by the feature amount conversion unit 122 into the prediction model MDL indicated by the model information 131 (step S102).
- the model information 131 is information (program or data structure) that defines a prediction model MDL for determining the risk of AD from the subject's genotype data.
- the prediction model MDL may be implemented by various models, such as logistic regression models; neural networks such as multilayer perceptrons, convolutional neural networks (CNN), and recurrent neural networks (RNN); support vector machines using arbitrary kernel functions such as Gaussian kernels; random forests modeled as regression trees; multiple regression analysis; models using hidden Markov models; and other statistical and probabilistic models. It is also possible to employ a model that combines various models to perform a comprehensive determination.
- the predictive model MDL may be a random forest containing multiple classifiers. As an example, the prediction model MDL will be described below as a random forest.
- FIG. 2B is a diagram showing an example of the prediction model MDL according to the first embodiment.
- the prediction model MDL includes, for example, N classifiers WL-1 to WL-N.
- Each classifier WL is trained in advance to output, as a likelihood or probability, a score indicating how likely the subject is to develop AD when the feature amount converted from the data of the first SNP is input.
- The classifiers WL are arranged in parallel with one another.
- a method of generating one learning model by combining a plurality of weak learners in this way is called ensemble learning.
- the prediction model MDL normalizes the score of each classifier WL, which is a weak learner, and outputs the normalized score. Score normalization is shown in Equation (1).
- the predictive model MDL may normalize the scores by dividing the sum of the scores of all classifiers WL by N, the total number of classifiers WL.
- although the prediction model MDL is a combination of N classifiers WL as shown in FIG. 2B, it is not limited to this.
- the prediction model MDL may be one classifier WL.
- the determination unit 123 determines whether or not the score (normalized score) output by the prediction model MDL is greater than or equal to the threshold (step S103).
- the determination unit 123 determines that the subject has a high probability of developing AD when the score is equal to or greater than the threshold (step S104), and determines that the subject has a low probability of developing AD when the score is less than the threshold (step S105).
- the output control unit 124 outputs the result of determination by the determination unit 123 (for example, information indicating the probability of developing Alzheimer's disease) (step S106). For example, the output control unit 124 may transmit the determination result to an external terminal device (not shown) via a communication interface. Moreover, if the information processing apparatus 100 includes a display (not shown), the output control unit 124 may display the determination result on the display.
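- Steps S102 to S105 can be sketched as follows, assuming that normalization is the simple average described above (sum of the N classifier scores divided by N). The threshold value of 0.5 and the function names are assumptions for illustration only; the source does not state a concrete threshold.

```python
from typing import List


def normalized_score(scores: List[float]) -> float:
    """Average the N weak-learner scores: sum of the scores divided by N."""
    if not scores:
        raise ValueError("at least one classifier score is required")
    return sum(scores) / len(scores)


def judge(scores: List[float], threshold: float = 0.5) -> str:
    """Return 'high' when the normalized score is >= threshold, else 'low'.

    The default threshold is a placeholder, not a value from the source.
    """
    return "high" if normalized_score(scores) >= threshold else "low"


print(judge([0.9, 0.8, 0.7]))  # scores from three hypothetical classifiers
print(judge([0.1, 0.2, 0.3]))
```

A score exactly equal to the threshold is treated as "high", matching the "equal to or greater than" condition in step S104.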
- Training is a state in which the prediction model MDL used at runtime is learned.
- FIG. 2C is a flowchart showing a series of training processes performed by the processing unit 120 according to the first embodiment.
- the learning unit 125 generates a training data set for learning the prediction model MDL (step S200).
- the training data set is a data set in which genotype data of AD-related genes possessed by healthy subjects or AD patients are labeled with information on the AD onset of the subject (e.g., positive or negative onset of AD). If the second SNP set includes some unknown SNPs, imputed genotypes estimated using genotype imputation can be used.
- For example, genotype data of AD-related genes possessed by healthy subjects or AD patients are obtained.
- the genotype data of AD-related genes possessed by healthy subjects are labeled with information (for example, a score of 0.0) indicating that they have not developed AD, and the genotype data of AD-related genes possessed by AD patients are labeled with information (for example, a score of 1.0) indicating that they have developed AD.
- the feature quantity conversion unit 122 converts the genotype data of the AD-related genes included in the training data set into feature quantities (step S201).
- the learning unit 125 divides the plurality of feature amounts converted from the genotype data of the AD-related genes in the training data set by the feature amount conversion unit 122 into training feature amounts (training samples) and verification feature amounts (test samples), and inputs the training feature amounts to the i-th classifier WL-i among the N classifiers WL included as weak learners in the prediction model MDL (step S202).
- the learning unit 125 may use principal component analysis when dividing the plurality of feature values converted from the genotype data of the AD-related genes in the training data set, that is, the feature values of the population, into feature values for training (training samples) and feature values for verification (test samples). For example, the learning unit 125 may select the feature values for training (training samples) from the feature values of the population based on the attribute information and genetic information of the healthy subjects or AD patients who provide the AD-related genes, and on the principal components.
- the attribute information of healthy subjects or AD patients may include information such as age and gender, for example.
- the genetic information of healthy subjects or AD patients may include, for example, information on whether they have the APOE ε4 genotype or not, and other information.
- a training feature quantity (training sample) selected based on the principal components is an example of a “specific training data set”.
- the learning unit 125 acquires the output result, that is, the score s_i, from the i-th classifier WL-i to which the training feature amount is input (step S203).
- the learning unit 125 calculates an error (also referred to as a loss) between the score s_i obtained from the i-th classifier WL-i and the score with which the training feature amount is labeled (step S204).
- the learning unit 125 determines the parameters of the i-th classifier WL-i so that the error becomes small (step S205).
- the learning unit 125 determines whether learning for the i-th classifier WL-i has been repeated a predetermined number of times E (step S206). If not, the learning unit 125 repeats learning of the i-th classifier WL-i by inputting to it the same training feature amount as was used in the previous pass. At this time, the learning unit 125 stores the parameters updated by learning in the storage unit 130, and inputs the training feature amount to the i-th classifier WL-i after its parameters have been initialized. As a result, E classifiers WL-i with different parameters are generated by the time learning for the i-th classifier WL-i reaches the predetermined number of times E.
- the learning unit 125 inputs the feature amount for verification to each of the E i-th classifiers WL-i (step S207).
- the learning unit 125 selects the classifier WL-i with the highest prediction accuracy among the E i-th classifiers WL-i (step S208). For example, the learning unit 125 selects, among the E i-th classifiers WL-i, the classifier WL-i having the smallest error between the score s_i obtained when the feature amount for verification is input and the score labeled in the training data.
- the learning unit 125 determines whether all of the N classifiers WL included as weak learners in the prediction model MDL have been learned (step S209), and if the learning of the N classifiers WL is not yet completed, the process returns to S202, and the (i+1)th classifier WL-(i+1) is learned based on the training feature amount.
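- The loop over steps S202 to S209 can be sketched as follows. This is only an illustration under stated assumptions: the weak learner is a hypothetical one-feature threshold rule whose "initialization" is a random choice of feature, and the error is the mean squared error; the source does not specify the form of the classifiers WL or of the loss.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[int], float]  # (0/1/2 feature vector, label 0.0 or 1.0)
Classifier = Callable[[Sequence[int]], float]


def fit_stump(samples: List[Sample], rng: random.Random) -> Classifier:
    """Fit a one-feature threshold rule (stand-in weak learner, an assumption).

    The feature is picked at random, which plays the role of re-initialization.
    """
    feature = rng.randrange(len(samples[0][0]))
    best_cut, best_err = 1, float("inf")
    for cut in (1, 2):  # predict 1.0 when the feature value is >= cut
        err = sum((float(x[feature] >= cut) - y) ** 2 for x, y in samples)
        if err < best_err:
            best_cut, best_err = cut, err
    return lambda x, f=feature, c=best_cut: float(x[f] >= c)


def mean_error(clf: Classifier, samples: List[Sample]) -> float:
    """Mean squared error between classifier scores and labeled scores."""
    return sum((clf(x) - y) ** 2 for x, y in samples) / len(samples)


def train_ensemble(train: List[Sample], valid: List[Sample],
                   n_classifiers: int, n_repeats: int,
                   seed: int = 0) -> List[Classifier]:
    """For each of N slots, train E candidates and keep the best on validation."""
    rng = random.Random(seed)
    ensemble: List[Classifier] = []
    for _ in range(n_classifiers):  # S209: loop over WL-1 .. WL-N
        # S202-S206: E candidates trained on the same training samples
        candidates = [fit_stump(train, rng) for _ in range(n_repeats)]
        # S207-S208: keep the candidate with the smallest validation error
        ensemble.append(min(candidates, key=lambda c: mean_error(c, valid)))
    return ensemble


train = [([0], 0.0), ([2], 1.0), ([0], 0.0), ([2], 1.0)]
valid = [([0], 0.0), ([2], 1.0)]
model = train_ensemble(train, valid, n_classifiers=3, n_repeats=4)
```

Selecting on held-out verification samples rather than on the training samples is what keeps the chosen candidate from simply being the one that best memorized the training data.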
- as described above, the information processing apparatus 100 inputs the feature amount converted from the genotype data of the AD-related genes possessed by the subject into the prediction model MDL, which has been learned based on a training data set in which the feature amount converted from the genotype data of AD-related genes is labeled with information indicating whether the onset of AD is positive or negative. Because the apparatus predicts whether the subject will develop AD based on the output result of the prediction model MDL, it is possible to accurately predict whether or not the subject will develop AD in the future.
- because a prediction model MDL including a plurality of classifiers WL realized by machine learning models is used, the model can be expected to calculate, as a score, a weighting indicating that a specific SNP in the genotype data of AD-related genes is positively correlated with the risk of AD. As a result, it is possible to predict AD risk at an early stage in subjects who have not developed AD, such as infants and young people.
- in the embodiment described above, the training data set was described as genotype data of AD-related genes possessed by healthy subjects or AD patients labeled with a score indicating whether or not AD develops, but the training data set is not limited to this.
- the training data set may be genotype data of AD-related genes possessed by healthy subjects or AD patients labeled with the age of onset of AD in addition to the scores described above.
- in this case, the learning unit 125 trains the prediction model MDL to output a vector whose elements include, for example, a probability P1 of developing AD, a probability P2 of not developing AD, and an element t corresponding to the age of onset.
- the determination unit 123 predicts the age at which the subject develops AD based on the t element of the vector output by the prediction model MDL.
- the label is not limited to the score indicating the presence or absence of onset of AD or the age of onset, but may include the attributes of the subject who provides the genotype data of the AD-related gene. Attributes may include, for example, various information such as gender, weight, height, lifestyle habits, presence or absence of illness, and family medical history. In addition, genetic information of known AD-related genotypes such as the APOE ε4 genotype may be included.
- the information processing apparatus and information processing method of this embodiment can also be called an AD diagnosis support apparatus and diagnosis support method.
- the present invention also provides a processor configured to execute the instructions described in the above information processing method, specifically: detecting a first SNP that is a mutation in an Alzheimer's disease-associated gene in a genomic DNA sample from a subject; and determining whether the subject develops Alzheimer's disease from the first SNP using a machine learning model trained based on a plurality of training data sets in which a second SNP, which is a mutation in the Alzheimer's disease-associated gene detected in a genomic DNA sample derived from a patient who developed Alzheimer's disease, is labeled with information on the onset of Alzheimer's disease.
- Human cDNAs of reprogramming factors were introduced into human PBMC using episomal vectors (SOX2, KLF4, OCT4, L-MYC, LIN28, dominant-negative p53).
- PBMCs were harvested and replated on dishes coated with laminin 511-E8 fragment (iMatrix 511, Nippi). The next day, the medium was changed to StemFit AK03. After that, the medium was changed every two days. Twenty days after transduction, iPS cell colonies were picked. iPS cells established from PBMCs were expanded for neural differentiation.
- NANOG (1:100 dilution; Abcam, ab80892), TRA1-60 (1:400 dilution; CST#4746, Danvers, Mass.
- MAP2 (1:100 dilution; Abcam, ab80892) Abcam ab5392
- SATB2 (1:400 dilution; Abcam EPNCIR130A ab92446), Alexa488-conjugated antibody (1:400 dilution; Thermo fisher A11029), Alexa488-conjugated antibody (1:400 dilution; Thermo Fisher A11039), Alexa594-conjugated antibody (1:400 dilution; Thermo Fisher A21207).
- Pathway analysis of identified genes was performed using commercially available Ingenuity Pathway Analysis (IPA, QIAGEN, https://www.qiagenbioinformatics.com/) software, and the top networks were analyzed.
- Aβ amyloid β
- 6E10 Aβ 3-Plex kit
- this assay uses the 6E10 antibody to capture Aβ peptides and different C-terminal-specific anti-Aβ antibodies carrying SULFO-TAG labels for detection by electrochemiluminescence using a Sector Imager 2400 (Meso Scale Discovery).
- a specific anti-Aβ antibody was used.
- After genotyping using GenomeStudio (Illumina) and quality control (Hardy-Weinberg equilibrium: p > 1.0 × 10^-6; minor allele frequency ≥ 0.01; linkage disequilibrium-based variant pruning r2 ≥ 0.8, window size: 100 kb, step size: 5), genotypes were imputed with minimac4 using the 1,000 Genomes Project Phase 3 as a reference panel. After imputation, 7,349,481 SNPs passed the quality threshold (r2 ≥ 0.3, minor allele frequency ≥ 0.01).
- the linear association between SNPs and the specific Aβ42/40 accumulation rate of iPS cell-derived neurons was analyzed with plink1.9, and the age of onset, sex, and genotype of the APOE-ε4 allele were included as covariates in the linear regression model.
- in the association analysis, p < 5 × 10^-5 was set as the suggestive level and p < 5 × 10^-8 as the significance level. No statistical methods were used to predetermine the sample size, but the sample size is similar to that reported in previous publications.
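- The two thresholds can be applied to association p-values as in the following sketch. It assumes that both thresholds are strict upper bounds on the p-value; the function and constant names are hypothetical.

```python
SUGGESTIVE_LEVEL = 5e-5   # suggestive level stated in the description
SIGNIFICANCE_LEVEL = 5e-8  # genome-wide significance level stated in the description


def association_level(p_value: float) -> str:
    """Classify a p-value as 'significant', 'suggestive', or 'none'."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    if p_value < SIGNIFICANCE_LEVEL:
        return "significant"
    if p_value < SUGGESTIVE_LEVEL:
        return "suggestive"
    return "none"


for p in (1e-9, 1e-6, 0.01):  # hypothetical p-values
    print(p, association_level(p))
```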
- the genotypes of samples from the Alzheimer's Disease Neuroimaging Initiative (ADNI) 1/GO/2 dataset were collected (Illumina; Omni 2.5M BeadChip). Quality control and imputation were performed on the genotype data under the same conditions. The imputed genotypes of 10,121,962 SNPs were filtered to the 496 SNPs obtained from the genome-wide analysis. Genotypes of SNPs listed in the polygenic cell analysis (CDiP) list but absent from the ADNI dataset were imputed with the mean genotype of AD patients. Next, the phenotype of the ADNI samples was predicted from the genotype; that is, each sample was predicted to be either positive or negative for AD status.
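- The mean-genotype imputation described above can be sketched as follows, assuming genotypes are encoded as allele counts (0, 1, or 2). The SNP identifiers, sample values, and function names are hypothetical.

```python
from typing import Dict, List


def mean_genotype(patients: List[Dict[str, float]], snp: str) -> float:
    """Mean encoded genotype of one SNP over the reference AD patients."""
    values = [p[snp] for p in patients if snp in p]
    if not values:
        raise ValueError(f"no reference genotype available for {snp}")
    return sum(values) / len(values)


def fill_missing(sample: Dict[str, float], snp_list: List[str],
                 patients: List[Dict[str, float]]) -> List[float]:
    """Build a feature vector, substituting the patient mean for absent SNPs."""
    return [sample[snp] if snp in sample else mean_genotype(patients, snp)
            for snp in snp_list]


ad_patients = [{"snp_a": 2, "snp_b": 1}, {"snp_a": 0, "snp_b": 2}]
adni_sample = {"snp_a": 1}  # snp_b is not present in this dataset
print(fill_missing(adni_sample, ["snp_a", "snp_b"], ad_patients))
```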
- Samples were independently classified as positive/negative according to four criteria based on results reported in the ADNI database.
- SUVR standardized uptake value ratio
- the genotypic vectors of the ADNI samples were mapped to the principal component space derived from the genotypic matrix of in-hospital AD patients. A 10-fold cross-validation was performed.
- the ADNI sample was divided into a training sample and a test sample.
- a random forest classifier (100 estimators) was trained on the training samples, and the target variable (positive/negative for AD-like status) was predicted from the top three PCs of the genotype matrix and the covariates (age, gender, genotype for APOE-ε4). Prediction performance was assessed by the area under the curve (AUC) of the receiver operating characteristic (ROC) curve obtained from the test sample predictions. Prediction performance was compared to when the target variable was predicted only from covariates. The significance of AUC improvement was tested with the Wilcoxon signed-rank test (significance threshold: p < 0.05). The target variable corresponds to the "information on the onset of Alzheimer's disease" described above.
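- The evaluation scheme can be sketched as follows: samples are split into 10 folds, and each fold's test predictions are scored by the ROC AUC. This is a plain-Python illustration under assumptions: the AUC is computed by pairwise comparison of positive and negative scores, the fold assignment by shuffling is hypothetical, and the classifier itself is omitted.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both positive and negative samples are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples: int, k: int = 10,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]


splits = k_fold_indices(25, k=10)
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # hypothetical predictions
print(len(splits), auc)
```

Comparing the per-fold AUCs of the full model against those of a covariates-only model gives the paired values that a Wilcoxon signed-rank test would operate on.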
- [Target gene knockdown] Six-well plates were seeded on day 5 with an initial density of 3,000,000 cells per well. Twenty-four hours after seeding (day 6), the medium was replaced with neurobasal medium containing 1 μM Accell SMARTpool siRNA (Horizon Discovery). To maximize the effect of the Accell siRNA, iPS cell-derived neurons were cultured for 72 hours, from day 6 to day 9. Seventy-two hours after the addition of siRNA (day 9), the culture medium was replaced with fresh neurobasal medium containing 1 μM Accell SMARTpool siRNA or 1 μM JNJ-40418677 (manufactured by Sigma-Aldrich), and the cells were harvested on day 11 and analyzed for the Aβ phenotype.
- NBDC National Bioscience Database Center
- Example 1 In this study, a genome-wide analysis was performed using Aβ released from neurons in the cerebral cortex derived from iPS cells of the AD cohort as a pathological signature. CDiP was then performed to reveal complex pathological mechanisms in a neuron-specific manner.
- Aβ was selected as a pathological feature of neurons in the cerebral cortex because Aβ is the triggering event in the initiation of the long-term pathological cascade of AD and causes dementia.
- Aβ40 and Aβ42 were quantified as protective and toxic Aβ, respectively, and the Aβ42/40 ratio was quantified in the culture supernatant of SAD cortical neurons.
- the APP and PSEN1 genes, which play a central role in the Aβ production pathway, are known to affect neurodevelopment and neuronal differentiation tendency from human iPS cells. Therefore, when assessing Aβ among iPS cells from different patients, it is important to maintain uniform purity of neuronal differentiation and normalize variations in the number of neurons per well.
- the direct differentiation method used in this study yields uniform and highly pure cortical neurons, but neuronal density varied between patients due to the stress of direct conversion from day 0 to day 5, and this variability affected the amount of Aβ.
- Total protein concentration extracted from neurons across wells was used to normalize variations in neuron number per well. This is because changes in protein concentration linearly reflect the number of neurons per well of different independent neurons or patients.
- CDiP indicated that the Aβ42/40 ratio in single cell-type cultures of iPS cell-derived neurons was primarily influenced by APOE ε4 as well as other complex gene sets. Therefore, we performed CDiP with adjustment for APOE genotype (Fig. 5B) and identified 24 SNPs and associated loci (either "p-value < 5 × 10^-8" or "a genetic locus containing more than 10 SNPs with p-value < 5 × 10^-5") that are associated with altered Aβ42/40 ratios (Fig. 5C and Tables 3-1 to 3-77). In Tables 3-1 to 3-77, "chr" means chromosome, "BETA" means partial regression coefficient, and "SE" means standard error. These terms are used with the same meaning in the following tables.
- Five loci and related genes were known to be associated with Aβ production, including CUL1, QRFP, CTNNA3, DAB1, and DCC. In addition, eight loci and associated genes, including MAGI1, TMTC1, TRPM1, KCNMA1, DAB1, CPXM2, ROBO2, and ANO3, have been reported as AD-related loci, or clinical biomarkers, in clinical GWAS. Twelve loci and related genes were novel as Aβ- or AD-related genes (Tables 5-1 to 5-2). In Tables 5-1 and 5-2, "EOAD" means early-onset Alzheimer's disease, "LOAD" means late-onset Alzheimer's disease, "CNV" means copy number variation, and "OR" means the odds ratio.
- "yes" in the item "Brain" means high expression in the brain, "low" means low expression in the brain, and "nd" means there is no data in the GTEx portal (https://gtexportal.org/home/).
- the item “Brain cell-type” describes the top three cell types that showed high gene expression on the Brain RNA-Seq portal (https://www.brainrnaseq.org/). It is also used in the following tables with the same meaning.
- because p231-tau, which is tau phosphorylated at the 231st threonine from the N-terminus, is a highly sensitive marker for diagnosing or tracking the progression of AD, the p231-tau/total tau ratio (p231-tau ratio) was quantified in order to apply it to CDiP.
- APOE ε4 genotype, gender, and age at onset of AD did not correlate with the p231-tau ratio (FIGS. 6A, 6B, and 6C).
- CDiP was performed using the p231-tau ratio as a trait with or without adjustment for APOE genotype (FIGS.
- the protein encoded by CTNNA3 plays a role in cell-cell adhesion, and mutations in CTNNA3 cause familial arrhythmogenic right ventricular dysplasia caused by mishandling of electrolytes such as potassium and calcium.
- the proteins encoded by KCNMA1 are composed of voltage- and calcium-sensitive potassium channels (KCa1.1) that regulate smooth muscle tone and neuronal excitability.
- Interestingly, KCa1.1 is a known target of cromolyn, which has been tested in a phase III trial in AD.
- ANO3-encoded proteins have been reported to function in endoplasmic reticulum-dependent calcium signaling, and ANO3 mutations cause familial dystonia type 24 through abnormal neuronal excitability.
- CDiP uncovered a set of genotypes that in part contributed to the polygenic architecture behind the disease pathogenesis mechanisms of AD.
- Example 2 AD onset prediction by polygene data set obtained from cell GWAS
- AV45-PET brain Aβ deposition
- CSF cerebrospinal fluid
- t-tau total tau
- p-tau phosphorylated tau
- AUC area under the curve
- Example 3 (Discovery of rare variants by cell GWAS) To confirm further applicability of the system to real clinical data, we investigated whether the identified gene sets formed SAD. The associations of genes identified as rare variants in the present study were investigated. They are known to be minor, albeit infrequent, factors in the development of AD.
- J-ADNI Japanese Alzheimer's Disease Neuroimaging Initiative
- p is the p-value from the summation test
- se is the approximate standard error associated with the genotype effect
- cmafTotal is the gene's cumulative minor allele frequency
- cmafUsed is the cumulative minor allele frequency of the SNPs used for analysis
- nsnpsTotal means the number of SNPs in the gene
- nsnpsUsed means the number of SNPs used in the analysis
- nmiss means the number of missing SNPs.
- nmiss is the number of individuals who did not contribute to the analysis due to trials in which results for that SNP were not reported.
- For genes with multiple SNPs they are summed across genes. It is also used in the following tables with the same meaning.
- AD pathology may not only play pivotal roles in the pathogenesis of AD, but may also represent potential biomarkers and therapeutic target candidates.
- other types of neuronal phenotypes in AD pathology, such as synaptic loss, neuronal cell death, drug response, and vulnerability to environmental stress, can also be applied to CDiP to identify their genetic background.
- new combinations of variable cell types, such as glial cells and cell type-specific pathologies reveal new genetic architectures of molecular pathologies hidden in clinical GWAS.
- AD is the sum of multiple cell-type pathologies.
- Single-nucleus transcriptomes from autopsied AD brains provided information on gene expression in various cell types.
- CDiP can interrogate isolated AD pathologies with cell-type specificity and can also model baseline conditions without confounding factors that can add noise to genome-wide studies.
- A limitation of CDiP is that it is based on 2D monolayer cultures consisting of a single cell type. To understand the cellular interactions between different cell types, the present study combined CDiP with single-nucleus transcriptomes from autopsied brains of AD patients to explore the polygenicity of AD (Figs. 8B and 8C).
- It has been shown that Aβ pathology is primarily based on neuronal polygenicity, whereas tau pathology may be composed exclusively of multiple non-neuronal cell types.
- CDiP predicted AD real-world data, stratified rare variant-associated AD, and identified CTNNA3, ANO3, and KCNMA1 as potential therapeutic targets.
- CDiP serves as a screening tool to associate pathological phenotypes with hidden genotypes.
- It is also important to accumulate evidence using different modalities, such as mouse models and patient specimens, to account for the actual AD pathology, which is composed of different cell types and matures over decades.
- CDiP provides clues to understanding complex pathologies, which consist of the sum of polygenicity and traits in disease target cells, paving the way for precision medicine.
- The risk of developing AD in a subject can be predicted.
- DESCRIPTION OF SYMBOLS: 100... Information processing apparatus, 110... Detection unit, 120... Processing unit, 121... Acquisition unit, 122... Feature conversion unit, 123... Determination unit, 124... Output control unit, 125... Learning unit, 130... Storage unit, 131... Model information.
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Analytical Chemistry (AREA)
- General Health & Medical Sciences (AREA)
- Organic Chemistry (AREA)
- Biophysics (AREA)
- Genetics & Genomics (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Molecular Biology (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Epidemiology (AREA)
- Evolutionary Computation (AREA)
- Public Health (AREA)
- Software Systems (AREA)
- Pathology (AREA)
- Immunology (AREA)
- Microbiology (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Micro-Organisms Or Cultivation Processes Thereof (AREA)
Abstract
Description
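This application claims priority based on U.S. Provisional Application No. 63/174,500, filed in the United States on April 13, 2021, the contents of which are incorporated herein by reference.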
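(1) An information processing method comprising:
step 1 of detecting, in a genomic DNA sample derived from a subject, a first SNP that is a mutation in an Alzheimer's disease-related gene; and
step 2 of determining, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which information on the onset of Alzheimer's disease is labeled to a second SNP, the second SNP being a mutation in the Alzheimer's disease-related gene detected in a genomic DNA sample derived from a patient who has developed Alzheimer's disease.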
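(2) The information processing method according to (1), wherein the machine learning model is a random forest including a plurality of classifiers, and
each classifier is trained using a specific training data set selected from the plurality of training data sets based on attribute information of the patients who developed Alzheimer's disease and principal component information in their genetic information.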
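(3) The information processing method according to (1) or (2), wherein the plurality of training data sets include a data set in which information on the onset of Alzheimer's disease is labeled to an imputed genotype of the second SNP, the imputed genotype being estimated from the second SNP by genotype imputation.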
(4) The information processing method according to any one of (1) to (3), wherein the mutation is one or more of the mutations listed in Tables 1-1 to 1-77.
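a determination unit that determines, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which information on the onset of Alzheimer's disease is labeled to a second SNP, the second SNP being a mutation in the Alzheimer's disease-related gene detected in a genomic DNA sample derived from a patient who has developed Alzheimer's disease;
an information processing apparatus comprising the above.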
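step 1 of detecting, in a genomic DNA sample derived from a subject, a first SNP that is a mutation in an Alzheimer's disease-related gene; and
step 2 of determining, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which information on the onset of Alzheimer's disease is labeled to a second SNP, the second SNP being a mutation in the Alzheimer's disease-related gene detected in a genomic DNA sample derived from a patient who has developed Alzheimer's disease;
a program for causing the above steps to be executed.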
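Using cerebral cortical neurons induced from iPS cells established from patients with sporadic AD, the inventors performed a GWAS (cell GWAS) with the Aβ42/40 ratio, one of the pathological indicators of AD, as the phenotype. As shown in the Examples described later, among mutations in AD-related genes, they found the mutations listed in Tables 1-1 to 1-77 above to be associated with the Aβ42/40 ratio. Also as shown in the Examples, two analyses were compared: one that analyzed mutations in AD-related genes, including one or more of the mutations listed in Tables 1-1 to 1-77, together with age, sex, and the APOE4 genotype, which is considered to be involved in the accumulation of Aβ in the brain; and one that analyzed only age, sex, and the APOE4 genotype without the mutations in AD-related genes. The former gave a higher AUC score, which is one indicator of prediction accuracy. Specifically, using SNP information from sporadic AD patients, whether Aβ accumulation in the brain would occur could be predicted with an accuracy of AUC = 0.76 ± 0.050, and whether an abnormal test value for Aβ in the cerebrospinal fluid would occur could be predicted with an accuracy of AUC = 0.73 ± 0.059. Aβ accumulation in the brain and abnormal cerebrospinal fluid Aβ test values largely agree with the clinical diagnosis of AD. Therefore, the risk of developing AD can be predicted using the information processing apparatus and information processing method of the present embodiment with an AUC of about 0.7 (more specifically, about 0.73 to 0.76). For sporadic AD, as opposed to familial (hereditary) AD, prediction with an AUC in this range had not been possible with conventional methods. In contrast, the information processing apparatus and information processing method of the present embodiment analyze a SNP set including one or more of the mutations listed in Tables 1-1 to 1-77 to determine AD risk, and can thereby provide a risk determination method with high accuracy or high predictive ability. In other words, the information processing apparatus and information processing method of the present embodiment can be regarded as a prediction apparatus and a prediction method for the risk of developing AD. Furthermore, they can predict the risk of developing AD in subjects, including subjects with no family history who are suspected of having sporadic AD.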
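[Overall configuration]
Fig. 1 is a diagram showing an example of the configuration of the information processing apparatus 100 of the first embodiment. As shown in Fig. 1, the information processing apparatus 100 includes, for example, a detection unit 110, a processing unit 120, and a storage unit 130.
The detection unit 110 detects, in a genomic DNA sample derived from a subject, a SNP that is a mutation in an Alzheimer's disease (AD)-related gene (hereinafter referred to as the first SNP) (step 1).
The processing unit 120 determines, from the SNP detected by the detection unit 110 (that is, the "first SNP"), whether the subject will develop AD, using a machine learning model trained on a plurality of training data sets in which information on the onset of AD is labeled to SNPs that are mutations in AD-related genes detected in genomic DNA samples derived from patients who have developed AD (step 2). Hereinafter, a SNP included in a training data set is referred to as a "second SNP".
The storage unit 130 is implemented by a storage device such as an HDD (Hard Disc Drive), flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), ROM (Read Only Memory), or RAM (Random Access Memory). In addition to various programs such as firmware and application programs, the storage unit 130 stores model information 131. The model information 131 is described later.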
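The flow of a series of runtime processes performed by the processing unit 120 according to the first embodiment is described below with reference to a flowchart. Runtime is the state in which the already-trained prediction model MDL is used. Fig. 2A is a flowchart showing the flow of a series of runtime processes performed by the processing unit 120 according to the first embodiment. The processing of this flowchart may be repeated, for example, at a predetermined cycle.
The flow of a series of training processes performed by the processing unit 120 according to the first embodiment is described below with reference to a flowchart. Training is the state in which the prediction model MDL used at runtime is trained. Fig. 2C is a flowchart showing the flow of a series of training processes performed by the processing unit 120 according to the first embodiment.
A modification of the first embodiment is described below. In the first embodiment described above, the training data set was described as data in which a score indicating whether AD develops or does not develop is labeled to genotype data of AD-related genes of a healthy person or an AD patient, but the training data set is not limited to this. For example, the training data set may be data in which, in addition to the score described above, the age at onset of AD is also labeled to the genotype data of AD-related genes of a healthy person or an AD patient. Using such training data sets, the learning unit 125 trains the prediction model MDL so that, when genotype data of AD-related genes are input, it outputs a three-dimensional vector (= [P1, P2, t]) whose elements are the probability P1 of developing AD, the probability P2 of not developing AD, and the age t at onset of AD. The determination unit 123 predicts the age at which the subject will develop AD based on the element t of the vector output by the prediction model MDL.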
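How a determination step might read the three-element output [P1, P2, t] can be sketched as follows. This is a minimal illustration under stated assumptions: the names `OnsetPrediction` and `interpret_output` are hypothetical, and the decision rule (onset is predicted when P1 exceeds P2, and the onset age is reported only in that case) is an assumption, since the description only states that the onset age is predicted from the element t.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class OnsetPrediction:
    develops_ad: bool           # assumed rule: True when P1 > P2
    p_onset: float              # P1, probability of developing AD
    p_no_onset: float           # P2, probability of not developing AD
    onset_age: Optional[float]  # t, reported only when onset is predicted


def interpret_output(vector: Sequence[float]) -> OnsetPrediction:
    """Interpret a model output vector [P1, P2, t] (hypothetical helper)."""
    if len(vector) != 3:
        raise ValueError("expected a vector of the form [P1, P2, t]")
    p1, p2, t = vector
    develops = p1 > p2
    return OnsetPrediction(develops, p1, p2, t if develops else None)
```

For example, `interpret_output([0.7, 0.3, 72.0])` yields a prediction of onset with a predicted onset age of 72.0.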
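In one embodiment, the present invention provides a processor configured to execute the instructions described in the above information processing method, specifically:
detecting, in a genomic DNA sample derived from a subject, a first SNP that is a mutation in an Alzheimer's disease-related gene; and
determining, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which information on the onset of Alzheimer's disease is labeled to a second SNP, the second SNP being a mutation in the Alzheimer's disease-related gene detected in a genomic DNA sample derived from a patient who has developed Alzheimer's disease.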
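[Establishment of an iPS cohort from peripheral blood cells of sporadic AD patients]
This study was approved by the ethics committee of the Center for iPS Cell Research and Application, Kyoto University (approval numbers: CiRA19-05 and CiRA20-14). To establish iPS cells from human peripheral blood mononuclear cells (PBMCs), PBMCs from Alzheimer's disease (AD) patients were collected in accordance with research projects approved by the ethics committee of the Kyoto University Graduate School of Medicine (approval numbers: R0091, G259, and G0722). Written informed consent was obtained from all participants in this study. Human cDNAs of the reprogramming factors were introduced into human PBMCs using episomal vectors (SOX2, KLF4, OCT4, L-MYC, LIN28, and dominant-negative p53). Several days after transduction, the PBMCs were collected and reseeded onto dishes coated with laminin 511-E8 fragment (iMatrix 511, Nippi). The next day, the medium was replaced with StemFit AK03. Thereafter, the medium was changed every two days. Twenty days after transduction, iPS cell colonies were picked. The iPS cells established from PBMCs were expanded for neuronal differentiation.
Using direct conversion technology, a robust and rapid differentiation induction method was established. Human neurogenin 2 (NGN2) cDNA under a tetracycline-inducible promoter (tetO) was introduced into iPS cells using the piggyBac transposon system and Lipofectamine LTX (Thermo Fisher Scientific). A vector containing tetO::NGN2 was used. After antibiotic selection with G418 disulfate (Nacalai Tesque), colonies were picked, and subclones that could efficiently differentiate into neurons upon transient induction of NGN2 expression were selected at a purity of more than 96% MAP2/DAPI.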
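Karyotyping was performed by our laboratory or by LSI Medience. Single nucleotide polymorphisms were genotyped by PCR amplification of genomic DNA followed by direct sequencing (3100 Genetic Analyzer; Thermo Fisher). The APOE gene was amplified by PCR (forward primer TCCAAGGAGCTGCAGGCGGCGCA (SEQ ID NO: 1); reverse primer ACAGAATTCGCCCCGGCCTGGTACACTG (SEQ ID NO: 2)). The PCR products were digested with HhaI at 37°C for 2 hours and then electrophoresed to analyze band sizes.
Cells were fixed with 4 v/v% paraformaldehyde (pH 7.4) at room temperature (RT, about 25°C) and permeabilized with PBST containing 0.2 v/v% Triton X-100. To suppress nonspecific binding, the cells were blocked with Blocking One Histo (Nacalai Tesque) at RT for 60 minutes. The cells were incubated with primary antibodies overnight at 4°C and then labeled with fluorescently tagged secondary antibodies. DAPI (Thermo Fisher) was used to label nuclei.
Cell images were acquired with an IN Cell Analyzer 6000 high-content confocal microscope (GE Healthcare). The following antibodies were used for immunocytochemical staining: NANOG (1:100 dilution; Abcam, ab80892), TRA1-60 (1:400 dilution; CST #4746, Danvers, MA), MAP2 (1:4,000 dilution; Abcam, ab5392), SATB2 (1:400 dilution; Abcam, EPNCIR130A, ab92446), Alexa 488-conjugated antibody (1:400 dilution; Thermo Fisher, A11029), Alexa 488-conjugated antibody (1:400 dilution; Thermo Fisher, A11039), and Alexa 594-conjugated antibody (1:400 dilution; Thermo Fisher, A21207).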
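On day 10, the RIPA-soluble fraction of total protein was extracted from differentiated neurons cultured in 96-well plates by adding 30 μL of RIPA buffer, and the supernatant was collected after centrifugation at 12,000 g for 30 minutes. The protein concentration of the supernatant was measured with the Pierce BCA Protein Assay Kit (Thermo Fisher) according to the kit manual.
Pathway analysis of the 230 identified genes (p < 5×10⁻⁵) was performed using the commercial Ingenuity Pathway Analysis software (IPA, QIAGEN, https://www.qiagenbioinformatics.com/), and the top-ranked networks were analyzed.
On day 8, all medium was replaced with 100 μL of fresh medium. Conditioned medium was collected on day 10 for further analysis. Aβ species in the medium were measured for extracellular human Aβ using the human (6E10) Aβ 3-Plex kit (Meso Scale Discovery). For Aβ species, this assay uses the 6E10 antibody to capture Aβ peptides and different SULFO-TAG-labeled C-terminus-specific anti-Aβ antibodies for detection by electrochemiluminescence on a Sector Imager 2400 (Meso Scale Discovery). The quantified Aβ values (N = 2 wells per clone) were adjusted using the total protein concentration of the neurons, so that conditions were compared with minimal noise from variation in cell number.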
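The protein adjustment described above can be sketched as a per-well division followed by averaging over the wells of a clone. The field names (`ab40`, `ab42`, `protein`), the units, and the choice to average per-well ratios are assumptions made for illustration only; the description does not specify how the two wells were combined.

```python
from statistics import mean
from typing import Dict, Sequence


def clone_abeta_phenotype(wells: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Protein-adjusted Abeta values for one clone, averaged over its wells.

    Each well holds raw 'ab40' and 'ab42' readings and the total 'protein'
    concentration of the same well (hypothetical field names and units).
    """
    ab40 = [w["ab40"] / w["protein"] for w in wells]
    ab42 = [w["ab42"] / w["protein"] for w in wells]
    # The protein term cancels in the ratio, so the Abeta42/40 ratio is
    # unaffected by the adjustment; only the absolute amounts change.
    ratio = [a42 / a40 for a42, a40 in zip(ab42, ab40)]
    return {"ab40": mean(ab40), "ab42": mean(ab42), "ab42_40_ratio": mean(ratio)}
```

Because the adjustment cancels in the ratio, the Aβ42/40 ratio is robust to differences in cell number, whereas the absolute Aβ40 and Aβ42 amounts are not.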
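Tau species in RIPA lysates extracted from iPS cell-derived neurons were measured with the Phospho(Thr231)/Total Tau Kit (Meso Scale Discovery) according to the kit instructions. The quantified tau values (N = 2 wells per clone) were adjusted using the total protein concentration of the neurons, so that conditions were compared with minimal noise from variation in cell number.
Genotypes of all 102 AD patient samples were determined with the Infinium OmniExpressExome-8 v1.4 BeadChip according to the kit manual (Illumina). To separate algorithmic issues from data format issues, all genotype data were standardized to the forward-strand GRCh37.p13 orientation produced by variant calling from WGS data. After genotype calling with GenomeStudio (Illumina) and quality control (Hardy-Weinberg equilibrium: p > 1.0×10⁻⁶; minor allele frequency ≥ 0.01; linkage disequilibrium-based variant pruning: r² < 0.8, window size 100 kb, step size 5), genotypes were imputed with minimac4 using the 1000 Genomes Project Phase 3 as the reference panel. A total of 7,349,481 SNPs passed the post-imputation quality thresholds (r² ≥ 0.3, minor allele frequency ≥ 0.01). The linear association between SNPs and the Aβ42/40 ratio of iPS cell-derived neurons was analyzed with PLINK 1.9, with age at onset, sex, and APOE-ε4 allele genotype included as covariates in the linear regression model. For the association analysis, p < 5×10⁻⁵ was set as the suggestive level and p < 5×10⁻⁸ as the significance level. No statistical method was used to predetermine the sample size, but the sample size is similar to those reported in previous publications.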
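The post-imputation thresholds and the two p-value levels stated above can be expressed as simple predicates. The sketch below uses only the numerical thresholds given in the text; the function names are hypothetical, and the actual filtering in the study was done with minimac4 and PLINK 1.9 rather than with code like this.

```python
MIN_IMPUTATION_R2 = 0.3   # post-imputation quality threshold (r2 >= 0.3)
MIN_MAF = 0.01            # minor allele frequency threshold (>= 0.01)
SUGGESTIVE_P = 5e-5       # suggestive level
SIGNIFICANT_P = 5e-8      # genome-wide significance level


def passes_post_imputation_qc(r2: float, allele_freq: float) -> bool:
    """Keep a SNP only if imputation quality and MAF meet both thresholds."""
    maf = min(allele_freq, 1.0 - allele_freq)  # fold to the minor allele
    return r2 >= MIN_IMPUTATION_R2 and maf >= MIN_MAF


def association_tier(p_value: float) -> str:
    """Classify an association p-value against the two stated levels."""
    if p_value < SIGNIFICANT_P:
        return "significant"
    if p_value < SUGGESTIVE_P:
        return "suggestive"
    return "not significant"
```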
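The results for the Aβ42/40 ratio in cerebral cortical neurons were processed by LD-based clumping (r² > 0.2, window size = 1 Mb) using PLINK 1.9. Among the independent SNPs, 496 passed the suggestive threshold level for genome-wide analysis (p < 5×10⁻⁵) and were used as variables of the prediction model. The selected SNP genotype matrix of the 102 AD patient samples, originally composed of 0, 1, or 2, was normalized and analyzed by principal component analysis (PCA).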
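The normalization and PCA step can be illustrated on a toy genotype matrix. The sketch below assumes rows are samples and columns are SNPs coded 0, 1, or 2, standardizes each column, and finds the first principal component by power iteration on the SNP correlation matrix. The description does not state which normalization or PCA implementation was used, so all names and choices here are assumptions; power iteration also assumes the starting vector is not orthogonal to the leading component and that at least one SNP is polymorphic.

```python
import math
from typing import List, Sequence, Tuple


def standardize_columns(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Center and scale each SNP column; monomorphic columns are dropped."""
    n = len(matrix)
    z_cols = []
    for col in zip(*matrix):
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / (n - 1)
        if var == 0:
            continue  # a constant column carries no information
        sd = math.sqrt(var)
        z_cols.append([(x - mu) / sd for x in col])
    return [list(row) for row in zip(*z_cols)]


def first_principal_component(
    z: Sequence[Sequence[float]], iterations: int = 500
) -> Tuple[float, List[float], List[float]]:
    """Return (eigenvalue, loading vector, per-sample scores) of the top PC."""
    n, m = len(z), len(z[0])
    corr = [[sum(z[k][i] * z[k][j] for k in range(n)) / (n - 1)
             for j in range(m)] for i in range(m)]
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(m))
                     for i in range(m))
    scores = [sum(row[j] * v[j] for j in range(m)) for row in z]
    return eigenvalue, v, scores
```

The per-sample scores on the leading components are the kind of "principal component information" that could be passed on as features, although how the components were used downstream is not detailed here.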
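On day 5, cells were seeded in 6-well plates at an initial density of 3,000,000 cells per well. Twenty-four hours after seeding (day 6), the medium was replaced with neural basal medium containing 1 μM Accell SMARTpool siRNA (Horizon Discovery). To maximize the effect of the Accell siRNA, the iPS cell-derived neurons were cultured for 72 hours, from day 6 to day 9. Seventy-two hours after siRNA addition (day 9), the culture medium was replaced with neural basal medium containing fresh 1 μM Accell SMARTpool siRNA or 1 μM JNJ-40418677 (Sigma-Aldrich); samples were collected on day 11 and the Aβ phenotype was analyzed.
Whole-exome sequencing was performed on 407 blood-derived genomic DNA samples obtained from 255 AD patients and 152 cognitively normal controls participating in the Japanese ADNI project. Exonic sequences were enriched by hybridization using the SureSelect Human All Exon kit (V6; Agilent) and sequenced on a HiSeq 4000 (Illumina) using paired-end read chemistry. Short reads from the target regions were mapped to the human reference genome (hg38) using BWA-MEM version 0.7.15-r1140 with default settings. Subsequent analyses (read processing, variant calling, and variant filtering) followed the GATK4 Best Practices recommendations, after which variants were annotated with snpEff version 4.3t. Among all variants identified by whole-exome sequencing, we focused on nonsynonymous, nonsense, splice-site, and insertion or deletion variants. These were further narrowed to variants with MAF < 0.05 in the following public databases: ExAC release 0.3 (http://exac.broadinstitute.org/), gnomAD release 2.1.1 for exomes and r3.0 for genomes (https://gnomad.broadinstitute.org/), HGVD version 2.3 (http://www.hgvd.genome.med.kyoto-u.ac.jp/), and ToMMo version 8.3KJPN (https://jmorp.megabank.tohoku.ac.jp). Using the J-ADNI (n = 407) and ADNI (n = 479) exome data, gene-based association analysis of the variants was performed with the burden test in the R package seqMeta version 1.6.7.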
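The data used in this study were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). ADNI was launched in 2003 as a public-private partnership led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer's disease (AD). The SNP array data are available from the National Bioscience Database Center (NBDC) (https://humandbs.biosciencedbc.jp/en/, research ID: hum0314.v1).
All code for data management and analysis is archived online on GitHub (https://github.com/HaruhisaInoue/iSNPs4ADNIpred). All other code is available on the inventors' site.
Except for the prediction of clinical data in the ADNI data set and the analysis of rare variants associated with AD onset, statistical analysis was performed as follows. All data are shown as mean ± S.D. To confirm reproducibility, two or three experimental replicates were performed. The data distribution was assumed to be normal, but this was not formally tested. Means of three or more groups were compared by one-way analysis of variance (ANOVA) followed by Tukey's multiple comparison test or uncorrected Fisher's LSD as a post hoc test, using GraphPad Prism 7.0 software (GraphPad). p values below 0.05 were considered significant.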
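In this study, genome-wide analysis was performed using Aβ released from cerebral cortical neurons induced from iPS cells of the AD cohort as the pathological trait. CDiP was then performed to reveal complex pathological mechanisms in a neuron-specific manner.
First, to analyze the AD pathology of neurons, iPS cells showing a normal karyotype were established from patients in a sporadic AD (SAD) cohort (N = 102). The established iPS cells showed the in vitro capacity to generate all three germ layers and X-inactive specific transcript (XIST) expression similar to that of human ESCs.
To understand the polygenicity of Aβ, genome-wide analysis was performed using the Aβ42/40 ratio of cerebral cortical neurons as the pathological trait. The statistical analysis was adjusted for APOE status, and a false discovery rate for multiple testing was applied. The overall results showed no large deviation from those expected by chance (λ = 0.9659), meaning there was no evidence of bias or inflation of the test statistics due to population stratification. To estimate the effect of the APOE genotype, CDiP was first performed without adjusting for APOE genotype (Fig. 5A). As a result, the p-value of rs429358 (T/C, the APOE ε4 locus) was 0.794, which was not statistically significant. Although APOE ε4 confers a high risk of clinical AD, CDiP showed that the Aβ42/40 ratio in single-cell-type cultures of iPS cell-derived neurons is influenced mainly by other complex gene sets, not by APOE ε4 alone.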
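The genomic inflation factor λ quoted above is conventionally computed as the median of the observed association chi-square statistics divided by the median of the 1-degree-of-freedom chi-square distribution (about 0.455). The sketch below assumes two-sided p-values from 1-df tests and converts them back to chi-square statistics; the function name is hypothetical, and this is not necessarily how the reported value of 0.9659 was computed.

```python
from statistics import NormalDist, median
from typing import Sequence

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-square distribution, 1 df


def genomic_inflation(p_values: Sequence[float]) -> float:
    """Genomic inflation factor (lambda) from two-sided 1-df p-values."""
    nd = NormalDist()
    chi2 = []
    for p in p_values:
        if not 0.0 < p <= 1.0:
            raise ValueError("p-values must lie in (0, 1]")
        z = nd.inv_cdf(p / 2.0)  # lower-tail quantile; its square is chi-square
        chi2.append(z * z)
    return median(chi2) / CHI2_1DF_MEDIAN
```

A value near 1, such as the reported 0.9659, indicates that the bulk of the test statistics follows the null distribution, whereas values well above 1 would suggest stratification or other systematic bias.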
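Accordingly, CDiP was performed with adjustment for APOE genotype (Fig. 5B), and genotypes of 24 SNPs and their associated loci (those with "p-value < 5×10⁻⁸" or "loci containing more than 10 SNPs with p-value < 5×10⁻⁵") associated with an altered Aβ42/40 ratio were identified (Fig. 5C and Tables 3-1 to 3-77). In Tables 3-1 to 3-77, "chr" means chromosome, "BETA" means the partial regression coefficient, and "SE" means the standard error. The same meanings apply in the subsequent tables.
To demonstrate a direct interaction between the Aβ phenotype and the 24 genes identified by CDiP, Aβ species were quantified when the identified genes were knocked down (Figs. 7A, 7B, 7C, and 7D). Suppressing the expression of amyloid precursor protein (APP) or β-site APP cleaving enzyme 1 (BACE1), key components of Aβ production, reduced the amount of Aβ as expected (Figs. 7B and 7C). Knockdown of 8 of the 24 genes identified by CDiP substantially changed the Aβ42/40 ratio (Fig. 7A). In particular, we focused on CTNNA3, ANO3, and CSMD1, the top three target genes with the largest reduction in the Aβ42/40 ratio. As for the amount of Aβ, knockdown of 23 of the 24 genes identified by CDiP changed the amount of Aβ42 or Aβ40 (Figs. 7B and 7C). Because changes in neuronal density necessarily affect the amount of Aβ42, protein concentrations after siRNA treatment were quantified before selecting the genes to focus on (Fig. 7D). As a result, knockdown of QRFPR, INFLR1, ZNRF2, ROBO2, DCC, and APP was confirmed to reduce total protein concentration, as previously reported. Therefore, ZNRF2, INFLR1, DCC, and APP were excluded from the candidate genes affecting the amount of Aβ42. We then focused on ZFPM2, TMTC1, and KCNMA1, the top three target genes with the largest reduction in the amount of Aβ42.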
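The protein encoded by KCNMA1 forms a voltage- and calcium-sensitive potassium channel (KCa1.1) that regulates smooth muscle tone and neuronal excitability. KCa1.1 is known as a target of cromolyn, which, interestingly, is tested in a phase III trial for AD.
The protein encoded by ANO3 has been reported to function in endoplasmic reticulum-dependent calcium signaling, and ANO3 mutations cause familial dystonia type 24 through abnormal neuronal excitability.
These results suggest that the identified therapeutic targets may be involved in calcium handling and excitability, which are important pathways of Aβ regulation.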
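(Prediction of AD onset using the polygene data set obtained from cell GWAS)
Next, we evaluated the similarity between the in vitro data set and real-world data consisting of PET imaging of Aβ deposition in the brains of the patients who provided PBMCs for iPS cell establishment in this study. Correlations between the quantified Aβ phenotype in cerebral cortical neurons and brain Aβ deposition measured by Pittsburgh compound B (PiB)-PET imaging were analyzed. However, neither age at onset nor the Aβ phenotype correlated with brain Aβ deposition (Figs. 8A, 8B, 8C, and 8D).
These results confirmed that a simple quantified disease phenotype without genetic information cannot reflect real-world data.
Therefore, we examined whether these genotype sets could be used to predict real-world big data from an independent AD cohort.
From the above results, the genotype sets identified by CDiP could be used to predict actual clinical data of AD. Using SNP information from sporadic AD patients, whether Aβ accumulation in the brain would occur could be predicted with an accuracy of AUC = 0.76 ± 0.050 (Fig. 9A), and whether an abnormal cerebrospinal fluid Aβ test value would occur could be predicted with an accuracy of AUC = 0.73 ± 0.059 (Fig. 9B). Because Aβ accumulation in the brain and abnormal cerebrospinal fluid Aβ test values largely agree with the clinical diagnosis of AD, prediction using the information processing method of the present embodiment can, by extrapolation, be regarded as having an accuracy of roughly AUC 0.73 to 0.76.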
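For reference, the AUC used as the accuracy measure above equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The following minimal sketch computes it by direct pairwise comparison; the function name is hypothetical, and the labels and scores in the usage note are made-up illustrative values, not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

For instance, with labels `[1, 1, 0, 0]` and scores `[0.9, 0.4, 0.6, 0.1]`, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. An AUC of 0.5 corresponds to chance-level ranking.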
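(Discovery of rare variants by cell GWAS)
To further confirm the applicability of the system to real clinical data, we investigated whether the identified gene sets contributed to the formation of SAD. We examined the associations of genes identified as rare variants in this study. Such variants are known to be infrequent, minor factors in the development of AD.
In this study, risk SNPs, the genes in which the SNPs are located, and the molecular pathways affecting Aβ production in cerebral cortical neurons were identified. Indeed, 5 of the 24 genes identified by CDiP, namely TMTC1, CTNNA3, KCNMA1, CPXM2, and ANO3, were consistent with reported results of clinical genome-wide studies based on clinical data involving disease onset or brain Aβ deposition (see Tables 5-1 to 5-2 above). This advantage may be attributable to the use of a homogeneous population of cerebral cortical neurons, the major cell type serving as the source of Aβ production.
In the future, an integrated and comprehensive understanding of the genetic background obtained through these cell type-specific analytical approaches is expected to lead to a better understanding of the complex etiology of AD.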
Claims (8)
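- An information processing method comprising:
step 1 of detecting, in a genomic DNA sample derived from a subject, a first SNP that is a mutation in an Alzheimer's disease-related gene; and
step 2 of determining, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which information on the onset of Alzheimer's disease is labeled to a second SNP, the second SNP being a mutation in the Alzheimer's disease-related gene detected in a genomic DNA sample derived from a patient who has developed Alzheimer's disease.
- The information processing method according to claim 1, wherein the machine learning model is a random forest including a plurality of classifiers, and
each classifier is trained using a specific training data set selected from the plurality of training data sets based on attribute information of the patients who developed Alzheimer's disease and principal component information in their genetic information.
- The information processing method according to claim 1 or 2, wherein the plurality of training data sets include a data set in which information on the onset of Alzheimer's disease is labeled to an imputed genotype of the second SNP, the imputed genotype being estimated from the second SNP by genotype imputation.
- An information processing apparatus comprising:
a detection unit that detects, in a genomic DNA sample derived from a subject, a first SNP that is a mutation in an Alzheimer's disease-related gene; and
a determination unit that determines, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which information on the onset of Alzheimer's disease is labeled to a second SNP, the second SNP being a mutation in the Alzheimer's disease-related gene detected in a genomic DNA sample derived from a patient who has developed Alzheimer's disease.
- A program for causing a computer to execute:
step 1 of detecting, in a genomic DNA sample derived from a subject, a first SNP that is a mutation in an Alzheimer's disease-related gene; and
step 2 of determining, from the first SNP, whether the subject will develop Alzheimer's disease, using a machine learning model trained on a plurality of training data sets in which information on the onset of Alzheimer's disease is labeled to a second SNP, the second SNP being a mutation in the Alzheimer's disease-related gene detected in a genomic DNA sample derived from a patient who has developed Alzheimer's disease.
- A method for predicting the risk of developing Alzheimer's disease, using the information processing method according to claim 1.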
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3214838A CA3214838A1 (en) | 2021-04-13 | 2022-04-12 | Information processing method, information processing device, and program |
CN202280028249.6A CN117136234A (zh) | 2021-04-13 | 2022-04-12 | 信息处理方法、信息处理装置以及程序 |
EP22788161.2A EP4324922A1 (en) | 2021-04-13 | 2022-04-12 | Information processing method, information processing device, and program |
JP2023514652A JPWO2022220236A1 (ja) | 2021-04-13 | 2022-04-12 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163174500P | 2021-04-13 | 2021-04-13 | |
US63/174,500 | 2021-04-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022220236A1 true WO2022220236A1 (ja) | 2022-10-20 |
Family
ID=83640074
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/017576 WO2022220236A1 (ja) | 2021-04-13 | 2022-04-12 | 情報処理方法、情報処理装置、及びプログラム |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP4324922A1 (ja) |
JP (1) | JPWO2022220236A1 (ja) |
CN (1) | CN117136234A (ja) |
CA (1) | CA3214838A1 (ja) |
WO (1) | WO2022220236A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019178167A1 (en) * | 2018-03-13 | 2019-09-19 | I2Dx, Inc. | Electronic delivery of information in personalized medicine |
WO2019199105A1 (ko) * | 2018-04-13 | 2019-10-17 | 사회복지법인 삼성생명공익재단 | 알츠하이머성 치매가 발병될 가능성 평가방법 |
WO2020067386A1 (ja) * | 2018-09-26 | 2020-04-02 | 味の素株式会社 | 軽度認知障害の評価方法、算出方法、評価装置、算出装置、評価プログラム、算出プログラム、記録媒体、評価システムおよび端末装置 |
2022
- 2022-04-12 CA CA3214838A patent/CA3214838A1/en active Pending
- 2022-04-12 JP JP2023514652A patent/JPWO2022220236A1/ja active Pending
- 2022-04-12 EP EP22788161.2A patent/EP4324922A1/en active Pending
- 2022-04-12 WO PCT/JP2022/017576 patent/WO2022220236A1/ja active Application Filing
- 2022-04-12 CN CN202280028249.6A patent/CN117136234A/zh active Pending
Non-Patent Citations (10)
Title |
---|
"Current Protocols in Molecular Biology", 1987, JOHN WILEY&SONS |
"DNA Cloning 1", 1995, OXFORD UNIVERSITY, article "Core Techniques, A Practical Approach" |
"Molecular Cloning, A Laboratory Manual", 2001, COLD SPRING HARBOR PRESS, pages: 6 - 7 |
HIGUCHI, YO; IKEUCHI, TAKESHI: "The role of genetics in dementia prevention", GERIATRICS, vol. 2, no. 4, 1 January 2020 (2020-01-01), pages 441 - 446, XP009540578, ISSN: 2435-1881 * |
IKEUCHI, TAKESHI: "4 Perspective on precision medicine in Alzheimer's disease", PRECISION MEDICINE, vol. 1, no. 3, 1 December 2018 (2018-12-01), pages 21 (255) - 25 (259), XP009540475, ISSN: 2434-3625 * |
INOUE, HARUHISA; KONDO, TAKAYUKI: "Improving QOL in a super-aging society by developing Alzheimer's disease risk prediction technology 2018", RESEARCH PAPERS OF THE SUZUKEN MEMORIAL FOUNDATION, vol. 37, 1 January 2020 (2020-01-01), pages 61 - 64, XP009540573, ISSN: 2185-2561 * |
KONDO TAKAYUKI, HARA NORIKAZU, KOYAMA SATOSHI, YADA YUICHIRO, TSUKITA KAYOKO, NAGAHASHI AYAKO, IKEUCHI TAKESHI, ISHII KENJI, ASADA: "Dissection of the polygenic architecture of neuronal Aβ production using a large sample of individual iPSC lines derived from Alzheimer’s disease patients", NATURE AGING, vol. 2, no. 2, 1 February 2022 (2022-02-01), pages 125 - 139, XP055977106, DOI: 10.1038/s43587-021-00158-9 * |
LEE, SEONG-WHAN ; LI, STAN Z: "SAT 2015 18th International Conference, Austin, TX, USA, September 24-27, 2015", vol. 8213 Chap.10, 3 November 2013, SPRINGER , Berlin, Heidelberg , ISBN: 3540745491, article ARAÚJO GILDERLANIO S.; SOUZA MANUELA R.; OLIVEIRA JOÃO RICARDO; COSTA IVAN G. : "Random Forest and Gene Networks for Association of SNPs to Alzheimer’s Dis", pages: 104 - 115, XP047043142, 032548, DOI: 10.1007/978-3-319-02624-4_10 * |
SIMS R ET AL.: "The multiplex model of the genetics of Alzheimer's disease", NATURE NEUROSCIENCE, vol. 23, 2020, pages 311 - 322, XP037055483, DOI: 10.1038/s41593-020-0599-5 |
THANH-TUNG NGUYEN;JOSHUA ZHEXUE HUANG;QINGYAO WU;THUY THI NGUYEN;MARK JUNJIE LI: "Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests", BMC GENOMICS, BIOMED CENTRAL LTD, LONDON, UK, vol. 16, no. Suppl 2, 21 January 2015 (2015-01-21), London, UK , pages S5, XP021209050, ISSN: 1471-2164, DOI: 10.1186/1471-2164-16-S2-S5 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022220236A1 (ja) | 2022-10-20 |
EP4324922A1 (en) | 2024-02-21 |
CA3214838A1 (en) | 2022-10-20 |
CN117136234A (zh) | 2023-11-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Erwood et al. | Saturation variant interpretation using CRISPR prime editing | |
US11788142B2 (en) | Compositions and methods for discovery of causative mutations in genetic disorders | |
Glahn et al. | Arguments for the sake of endophenotypes: examining common misconceptions about the use of endophenotypes in psychiatric genetics | |
Hoischen et al. | Prioritization of neurodevelopmental disease genes by discovery of new mutations | |
Zilberman et al. | Genome-wide analysis of DNA methylation patterns | |
Solomon | The etiology of VACTERL association: Current knowledge and hypotheses | |
Holm et al. | A rare variant in MYH6 is associated with high risk of sick sinus syndrome | |
Townsley et al. | Massively parallel techniques for cataloguing the regulome of the human brain | |
Maia et al. | Intellectual disability genomics: current state, pitfalls and future challenges | |
Shinozaki et al. | New developments in the genetics of bipolar disorder | |
Ohnmacht et al. | Missing heritability in Parkinson’s disease: the emerging role of non-coding genetic variation | |
García-Pérez et al. | Epigenomic profiling of primate lymphoblastoid cell lines reveals the evolutionary patterns of epigenetic activities in gene regulatory architectures | |
Paris et al. | Sex bias and maternal contribution to gene expression divergence in Drosophila blastoderm embryos | |
WO2020061072A1 (en) | Method of characterizing a neurodegenerative pathology | |
Gokoolparsadh et al. | Searching for convergent pathways in autism spectrum disorders: insights from human brain transcriptome studies | |
Spielmann et al. | Computational and experimental methods for classifying variants of unknown clinical significance | |
Shen et al. | Hybrid mice reveal parent-of-origin and cis-and trans-regulatory effects in the retina | |
Pagni et al. | Non‐coding regulatory elements: Potential roles in disease and the case of epilepsy | |
Han et al. | Transposable element profiles reveal cell line identity and loss of heterozygosity in Drosophila cell culture | |
Mo et al. | Detection of lncRNA-mRNA interaction modules by integrating eQTL with weighted gene co-expression network analysis | |
Šerý et al. | Perspectives in genetic prediction of Alzheimer’s disease | |
WO2015166912A1 (ja) | 遺伝性疾患の検出方法 | |
Han et al. | Integrating brain methylome with GWAS for psychiatric risk gene discovery | |
WO2022220236A1 (ja) | 情報処理方法、情報処理装置、及びプログラム | |
Wernerfelt et al. | Arginine vasopressin 1a receptor (AVPR1a) RS3 repeat polymorphism associated with entrepreneurship |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22788161 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023514652 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 3214838 Country of ref document: CA |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18554514 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022788161 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2022788161 Country of ref document: EP Effective date: 20231113 |