WO2022250513A1 - Method for diagnosing cancer and predicting cancer type using the terminal sequence motif frequency and size of cell-free nucleic acid fragments - Google Patents
- Publication number
- WO2022250513A1 (PCT/KR2022/007651)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- nucleic acid
- cancer
- acid fragment
- size
- predicting
Classifications
- G16H50/20: ICT for computer-aided diagnosis, e.g. based on medical expert systems
- G06N3/04: Neural networks; architecture, e.g. interconnection topology
- G06N3/042: Knowledge-based neural networks; logical representations of neural networks
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
- G16B20/00: ICT for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B30/10: Sequence alignment; homology search
- G16B40/00: ICT for biostatistics; bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20: Supervised data analysis
- G16H10/40: ICT for patient-related data from laboratory analysis, e.g. patient specimen analysis
- G16H30/40: ICT for processing medical images, e.g. editing
- G16H50/50: ICT for simulation or modelling of medical disorders
- G16H50/70: ICT for mining of medical data, e.g. analysing previous cases of other patients
- G01N2800/7028: Detection or diagnosis of diseases; (hyper)proliferation; cancer
Definitions
- The present invention relates to a method for diagnosing cancer and predicting cancer types using the frequency and size of terminal sequence motifs of cell-free nucleic acid fragments and, more specifically, to a method that extracts nucleic acids from a biological sample, obtains sequence information, and uses the aligned reads to derive the terminal sequence motif frequency and size of the nucleic acid fragments.
- In clinical practice, cancer is usually diagnosed by taking a medical history, performing a physical examination and clinical evaluation, and then confirming the diagnosis with a tissue biopsy. Clinical tests can detect cancer only once there are at least 1 billion cancer cells and the tumor is at least 1 cm in diameter. By that stage, the cancer cells have already acquired the ability to metastasize, and at least half of such cancers have already metastasized.
- Because tissue biopsy is invasive, it causes patients considerable inconvenience, and it often cannot be performed while a cancer patient is being treated.
- In cancer screening, tumor markers are used to monitor substances produced directly or indirectly by cancer. However, more than half of tumor-marker screening results are normal even when cancer is present, and results are often positive even when no cancer is present, which limits their accuracy.
- Recently, liquid biopsy, which uses a patient's body fluid, has come into wide use as a test for cancer diagnosis and follow-up.
- Liquid biopsy is a non-invasive diagnostic technique that is attracting attention as an alternative to conventional invasive diagnostic and examination methods.
- An artificial neural network is a computational model, implemented in software or hardware, that imitates the computational capability of a biological system by using a large number of artificial neurons linked by connection lines.
- Artificial neural networks use artificial neurons that simplify the functions of biological neurons.
- The network reproduces human cognitive functions or learning processes by interconnecting these neurons through connection lines, each of which has a connection strength.
- The connection strength is a specific value assigned to a connection line and is also called a connection weight.
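As a minimal sketch of the model just described, the Python function below computes the output of a single artificial neuron: each input is multiplied by the connection weight of its connection line, the products are summed, and an activation function is applied. The logistic (sigmoid) activation and the optional bias term are illustrative assumptions; the text does not specify either.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Output of one artificial neuron.

    Each input x_i arrives over a connection line with connection weight w_i.
    The neuron sums the weighted inputs (plus an assumed bias) and applies
    an assumed logistic activation, giving a value between 0 and 1.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))


# Net input is 1.0*0.8 + 0.5*(-0.4) = 0.6, so the output is about 0.646.
print(neuron_output([1.0, 0.5], [0.8, -0.4]))
```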
- Learning of artificial neural networks can be divided into supervised learning and unsupervised learning.
- Supervised learning feeds input data together with the corresponding target output data into the neural network and updates the connection strengths of the connection lines so that the network produces the target output for each input.
- Representative supervised learning algorithms include the delta rule and backpropagation.
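The delta rule can be sketched for the simplest case, a single linear neuron trained on input/target pairs: after each sample, every connection weight moves in proportion to the output error times its input. The learning rate, number of epochs, and toy data below are assumptions chosen for illustration; backpropagation, which extends the same error-driven update to multi-layer networks, is not shown.

```python
def train_delta_rule(samples, n_inputs, lr=0.1, epochs=500):
    """Supervised learning of one linear neuron with the delta rule.

    samples: list of (inputs, target) pairs. After each sample, every
    connection weight w_i is updated by lr * (target - output) * x_i.
    """
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sum(w * x for w, x in zip(weights, inputs)) + bias
            error = target - output
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error  # bias acts as a weight on a constant input of 1
    return weights, bias


# Toy supervised data: inputs paired with targets from y = 2*x1 - x2 + 0.5
samples = [((0, 0), 0.5), ((1, 0), 2.5), ((0, 1), -0.5), ((1, 1), 1.5)]
weights, bias = train_delta_rule(samples, n_inputs=2)
print(weights, bias)
```

Because the toy targets are noise-free and exactly linear, the learned weights approach (2, -1) and the bias approaches 0.5.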
- Unsupervised learning is a method in which an artificial neural network learns connection strength by itself using only input data without a target value.
- Unsupervised learning is a method of updating connection weights by correlation between input patterns.
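Learning by correlation between input patterns is classically exemplified by the Hebbian rule, in which the weight between two units grows when they are active together. The sketch below is an assumed illustration of that idea (the function name and the ±1 pattern encoding are hypothetical), not the method of the present invention.

```python
import numpy as np

def hebbian_weights(patterns, lr=1.0):
    """Build a weight matrix from input correlations only (no target values).

    patterns: sequence of +1/-1 vectors. Each pattern adds lr * x_i * x_j to
    the weight between units i and j, so co-active units become linked.
    """
    P = np.asarray(patterns, dtype=float)
    n = P.shape[1]
    W = np.zeros((n, n))
    for x in P:
        W += lr * np.outer(x, x)   # correlation-driven weight update
    np.fill_diagonal(W, 0.0)       # no self-connections
    return W
```

With the single stored pattern `x = [1, 1, -1, -1]`, `np.sign(W @ x)` returns the same pattern, showing that the correlation-derived weights can recall it without any target output.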
- The present inventors made diligent efforts to solve the above problems and to develop a highly sensitive and accurate AI-based method for cancer diagnosis and cancer-type prediction. As a result, they confirmed that when vectorized data generated from the terminal sequence motifs of cell-free nucleic acid fragments and the length information of those fragments were analyzed with a trained artificial intelligence model, the presence of cancer and the cancer type could be predicted with high sensitivity and accuracy, and thereby completed the present invention.
- An object of the present invention is to provide a method for diagnosing cancer and predicting cancer types using cell-free nucleic acid fragment terminal sequence motif frequency and size.
- Another object of the present invention is to provide an apparatus for diagnosing cancer and predicting cancer types using cell-free nucleic acid fragment terminal sequence motif frequency and size.
- Another object of the present invention is to provide a computer readable storage medium containing instructions configured to be executed by a processor for diagnosing cancer and predicting cancer types by the above method.
- The present invention provides a method comprising: (a) obtaining sequence information by extracting nucleic acids from a biological sample; (b) aligning the obtained sequence information (reads) with a standard chromosome sequence database (reference genome database); (c) deriving the terminal sequence motif frequency and the size of the nucleic acid fragments using the aligned sequence reads; (d) generating vectorized data using the derived terminal sequence motif frequency and size of the nucleic acid fragments; (e) determining the presence or absence of cancer by inputting the generated vectorized data into a trained artificial intelligence model and comparing the resulting output value with a cut-off value; and (f) estimating the type of cancer by comparing the output result values.
- The present invention also provides a method comprising: (a) obtaining sequence information by extracting nucleic acids from a biological sample; (b) aligning the obtained sequence information (reads) with a standard chromosome sequence database (reference genome database); (c) deriving the terminal sequence motif frequency and the size of the nucleic acid fragments using the aligned sequence reads; (d) generating vectorized data using the derived terminal sequence motif frequency and size of the nucleic acid fragments; (e) determining the presence or absence of cancer by inputting the generated vectorized data into a trained artificial intelligence model and comparing the resulting output value with a cut-off value; and (f) predicting the type of cancer by comparing the output result values.
- The present invention also provides an apparatus comprising: a decoding unit that extracts nucleic acids from a biological sample and decodes their sequence information; an alignment unit that aligns the decoded sequences with a standard chromosome sequence database; a nucleic acid fragment analysis unit that derives the terminal sequence motif frequency and the size of the nucleic acid fragments from the aligned sequences; a data generation unit that generates vectorized data from the derived terminal sequence motif frequency and size of the nucleic acid fragments; a cancer diagnosis unit that inputs the generated vectorized data into a trained artificial intelligence model for analysis and compares the output with a reference value to determine whether cancer is present; and a cancer type prediction unit that analyzes the output result values to predict the type of cancer.
- The present invention also provides a computer-readable storage medium comprising instructions configured to be executed by a processor for diagnosing cancer and predicting cancer types through the steps of: (a) obtaining sequence information by extracting nucleic acids from a biological sample; (b) aligning the obtained sequence information (reads) with a standard chromosome sequence database (reference genome database); (c) deriving the terminal sequence motif frequency and the size of the nucleic acid fragments using the aligned sequence reads; (d) generating vectorized data using the derived terminal sequence motif frequency and size of the nucleic acid fragments; (e) determining the presence or absence of cancer by inputting the generated vectorized data into a trained artificial intelligence model and comparing the resulting output value with a cut-off value; and (f) predicting the type of cancer by comparing the output result values.
- FIG. 1 is an overall flowchart of the method for diagnosing cancer and predicting cancer types using the frequency of cell-free nucleic acid fragment terminal sequence motifs and fragment size according to the present invention.
- FIG. 2 shows an example of a process for selecting motifs whose frequency differs between healthy people and cancer patients, or among cancer types, in one embodiment of the present invention.
- FIG. 3 is a graph showing the size distribution of the nucleic acid fragments selected in one embodiment of the present invention.
- The left panel of FIG. 4 shows an example of a FEMS table prepared from a single nucleic acid fragment in one embodiment of the present invention, and the right panel shows an example prepared from all nucleic acid fragments.
- The left panel of FIG. 5 shows an example of a FEMS table created by additionally performing an edge summary in one embodiment of the present invention, and the right panel shows its visualization.
- FIG. 6 shows example visualizations of FEMS tables created from the data of healthy people, liver cancer patients, and esophageal cancer patients used in one embodiment of the present invention.
- (A) shows the performance of the CNN model constructed in one embodiment of the present invention in terms of accuracy and micro-averaged AUC, and (B) shows the confusion matrix.
- FIG. 9 is a schematic diagram showing the configuration of a CNN model built in an embodiment of the present invention.
- Terms such as first, second, A, and B may be used to describe various elements, but the elements are not limited by these terms, which are used only to distinguish one element from another. For example, without departing from the scope of the technology described below, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element.
- The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.
- Components described below may be combined into a single component, or a single component may be divided into two or more components according to more finely subdivided functions.
- Each component described below may additionally perform some or all of the functions of other components in addition to its own main function, and some of the main functions of each component may, of course, be performed exclusively by another component.
- Unless a specific order is clearly stated in context, the processes constituting the method may occur in an order different from the specified order. That is, the processes may occur in the specified order, substantially simultaneously, or in the reverse order.
- In the present invention, sequencing data obtained from a sample are aligned with the reference genome; the terminal sequence motif frequency and size of the nucleic acid fragments are derived from the aligned sequence information; and vectorized data generated from the derived motif frequency and fragment size are input to a trained deep learning model to calculate a DPI value. Cancer is diagnosed by comparing the DPI value with a reference value, and among the DPI values calculated for each cancer type, the cancer type with the highest DPI value is determined to be the cancer type of the sample (FIG. 1).
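The final decision step can be sketched as follows. This is a minimal illustration assuming that the model yields one cancer-versus-normal DPI score and one DPI score per candidate cancer type, and that a score at or above the cut-off counts as positive; the function name, the dictionary layout, and the tie handling are hypothetical.

```python
from typing import Dict, Optional, Tuple

def diagnose(cancer_dpi: float,
             type_dpis: Dict[str, float],
             cutoff: float) -> Tuple[bool, Optional[str]]:
    """Hypothetical decision rule built on DPI scores.

    cancer_dpi: score for 'cancer present' from the trained model (assumed input).
    type_dpis: DPI value per candidate cancer type (assumed input).
    cutoff: reference value; scores >= cutoff are treated as cancer (assumption).
    """
    if cancer_dpi < cutoff:
        return False, None  # below the cut-off: no cancer call, no type reported
    # Report the cancer type with the highest DPI value
    # (ties resolve to the first such key encountered by max()).
    predicted_type = max(type_dpis, key=type_dpis.get)
    return True, predicted_type
```

For example, `diagnose(0.8, {"liver": 0.7, "esophageal": 0.2}, 0.5)` returns `(True, "liver")`, while a cancer score of 0.3 against the same cut-off returns `(False, None)`.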
- The present invention relates to a method for providing information for diagnosing cancer and predicting cancer types, including the step of predicting the cancer type by comparing the output result values.
- In the present invention, any nucleic acid fragment extracted from a biological sample may be used without limitation; it is preferably a fragment of cell-free nucleic acid or intracellular nucleic acid, but is not limited thereto.
- The nucleic acid fragments may be obtained by any method known to those skilled in the art, preferably by direct sequencing, next-generation sequencing, sequencing after non-specific whole genome amplification, or probe-based sequencing, but the method is not limited thereto.
- The cancer may be a solid cancer or a hematological cancer, and may preferably be selected from the group consisting of non-Hodgkin lymphoma, acute myeloid leukemia, acute lymphoid leukemia, multiple myeloma, head and neck cancer, lung cancer, glioblastoma, colon/rectal cancer, pancreatic cancer, breast cancer, ovarian cancer, melanoma, prostate cancer, liver cancer, thyroid cancer, gastric cancer, gallbladder cancer, bile duct cancer, bladder cancer, small intestine cancer, cervical cancer, cancer of unknown primary site, kidney cancer, esophageal cancer, and mesothelioma. More preferably, the cancer may be liver cancer or esophageal cancer, but is not limited thereto.
- In step (a), the sequence information may be obtained by subjecting isolated cell-free DNA to whole genome sequencing at a depth of 1 million to 100 million reads.
- The biological sample refers to any material, biological fluid, tissue, or cell obtained from or derived from an individual, for example, whole blood, leukocytes, peripheral blood mononuclear cells, leukocyte buffy coat, blood (including plasma and serum), sputum, tears, mucus, nasal washes, nasal aspirates, breath, urine, semen, saliva, peritoneal washings, pelvic fluids, cystic fluid, meningeal fluid, amniotic fluid, glandular fluid, pancreatic fluid, lymph fluid, pleural fluid, nipple aspirate, bronchial aspirate, synovial fluid, joint aspirate, organ secretions, cells, cell extract, hair, oral cells, placental cells, cerebrospinal fluid, and mixtures thereof, but is not limited thereto.
- Any sequencing method known in the art may be used with a next-generation sequencer. Nucleic acids isolated by a selection method are typically sequenced by next-generation sequencing (NGS).
- Next-generation sequencing includes any sequencing method that determines the nucleotide sequence of either individual nucleic acid molecules or clonally expanded proxies for individual nucleic acid molecules in a highly parallel manner (e.g., 10⁵ or more molecules are sequenced simultaneously).
- The relative abundance of a nucleic acid species in a library can be estimated by counting the relative number of occurrences of its cognate sequence in the data generated by a sequencing experiment. Next-generation sequencing methods are known in the art and are described, for example, in Metzker, M. (2010) Nature Reviews Genetics 11:31-46, which is incorporated herein by reference.
- Next-generation sequencing may determine the nucleotide sequence of individual nucleic acid molecules (e.g., the HeliScope Gene Sequencing system from Helicos BioSciences and the PacBio RS system from Pacific Biosciences).
- Alternatively, sequencing such as massively parallel short-read sequencing (e.g., the Solexa sequencer method from Illumina Inc., San Diego, Calif.), which yields more bases of sequence per sequencing unit than other sequencing methods that yield fewer but longer reads, determines the nucleotide sequence of clonally expanded proxies for individual nucleic acid molecules (e.g., the Solexa sequencer from Illumina Inc., San Diego, Calif.; 454 Life Sciences (Branford, Connecticut); and Ion Torrent).
- Other methods or machines for next-generation sequencing include, but are not limited to, those provided by 454 Life Sciences (Branford, CT), Applied Biosystems (Foster City, CA; SOLiD sequencers), and Helicos BioSciences Corporation (Cambridge, MA), as well as emulsion and microfluidic sequencing technology nanodroplets (e.g., GnuBio droplets).
- Platforms for next-generation sequencing include, for example, the Genome Sequencer FLX system from Roche/454, the Illumina/Solexa Genome Analyzer (GA), the Support Oligonucleotide Ligation Detection (SOLiD) system from Life/APG, the G.007 system from Polonator, the HeliScope Gene Sequencing system from Helicos BioSciences, and the PacBio RS system from Pacific Biosciences.
- NGS techniques may include, for example, one or more of the steps of template preparation, sequencing and imaging, and data analysis.
- Template preparation may include steps such as randomly breaking nucleic acids (e.g., genomic DNA or cDNA) into small fragments and creating sequencing templates (e.g., fragment templates or mate-pair templates).
- Spatially separated templates can be attached to or immobilized on a solid surface or support, allowing a large number of sequencing reactions to be performed simultaneously.
- Types of templates that can be used for the NGS reaction include, for example, clonally amplified templates derived from single DNA molecules and single-molecule DNA templates.
- Methods for preparing a clone-amplified template include, for example, emulsion PCR (emPCR) and solid phase amplification.
- EmPCR can be used to prepare templates for NGS.
- A library of nucleic acid fragments is created, and adapters containing universal priming sites are ligated to the ends of the fragments.
- The fragments are then denatured into single strands and captured by beads, with each bead capturing a single nucleic acid molecule.
- A large number of templates can then be attached, immobilized in a polyacrylamide gel on a standard microscope slide (e.g., Polonator), placed on an amino-coated glass surface (e.g., Life/APG; Polonator), or deposited into individual PicoTiterPlate (PTP) wells (e.g., Roche/454), where the NGS reaction can be performed.
- Solid phase amplification can also be used to generate templates for NGS.
- In solid phase amplification, forward and reverse primers are covalently attached to a solid support.
- The surface density of the amplified fragments is defined by the ratio of primers to templates on the support.
- Solid phase amplification can create millions of spatially separated template clusters (e.g., Illumina/Solexa). The ends of the template clusters can be hybridized to universal primers for NGS reactions.
- Template amplification methods such as PCR can couple the NGS platform to the target or can enrich specific regions of the genome (e.g., exons).
- Representative template enrichment methods include, for example, microdroplet PCR techniques (Tewhey R. et al., Nature Biotech. 2009, 27:1025-1031), custom-designed oligonucleotide microarrays (e.g., Roche/NimbleGen oligonucleotide microarrays), and solution-based hybridization methods (e.g., molecular inversion probes (MIPs)) (Porreca G. J. et al., Nature Methods, 2007, 4:931-936; Krishnakumar S. et al., Proc. Natl. Acad. Sci.
- Single-molecule templates are another type of template that can be used for NGS reactions.
- Spatially separated single molecular templates can be immobilized on solid supports by a variety of methods.
- In one approach, individual primer molecules are covalently attached to a solid support.
- An adapter is added to the template, and the template is then hybridized to the immobilized primer.
- In another approach, a single-molecule template is covalently attached to a solid support by priming and extending a single-stranded single-molecule template from an immobilized primer.
- A universal primer is then hybridized to the template.
- In yet another approach, a single polymerase molecule is attached to a solid support to which a primed template is bound.
- Sequencing and imaging methods for NGS include, but are not limited to, cyclic reversible termination (CRT), sequencing by ligation (SBL), single-molecule addition (pyrosequencing), and real-time sequencing.
- CRT uses reversible terminators in a cyclic method that comprises, at minimum, the steps of nucleotide incorporation, fluorescence imaging, and cleavage.
- DNA polymerase incorporates into the primer a single fluorescently modified nucleotide complementary to the template base.
- DNA synthesis is terminated after the addition of a single nucleotide, and unincorporated nucleotides are washed away. Imaging is performed to determine the identity of the incorporated labeled nucleotide. Then, in a cleavage step, the terminating/inhibiting group and the fluorescent dye are removed.
- Exemplary NGS platforms using the CRT method include, but are not limited to, the Illumina/Solexa Genome Analyzer (GA), which uses a clonally amplified template method combined with a four-color CRT method detected by total internal reflection fluorescence (TIRF), and Helicos BioSciences/HeliScope, which uses a single-molecule template method combined with a one-color CRT method detected by TIRF.
- SBL uses DNA ligase and either a 1-base-encoded probe or a 2-base-encoded probe for sequencing.
- Fluorescently labeled probes hybridize to complementary sequences adjacent to the primed template.
- DNA ligase is used to ligate the dye-labeled probe to the primer. After the non-ligated probes are washed away, fluorescence imaging is performed to determine the identity of the ligated probe.
- The fluorescent dye can be removed by using a cleavable probe that regenerates a 5'-PO4 group for subsequent ligation cycles.
- Alternatively, new primers can be hybridized to the template after the old primers have been removed.
- Exemplary SBL platforms include, but are not limited to, Life/APG/SOLiD (support oligonucleotide ligation detection), which uses a two-base-encoded probe.
- Pyrosequencing is based on detecting the activity of DNA polymerase with another chemiluminescent enzyme. Typically, the method sequences a single strand of DNA by synthesizing the complementary strand one base pair at a time and detecting which base is actually added at each step.
- The template DNA is immobilized, and solutions of A, C, G, and T nucleotides are added sequentially and removed from the reaction. Light is produced only when the nucleotide solution complements the next unpaired base of the template. The order of the solutions that produce chemiluminescent signals determines the sequence of the template.
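To make the flow-by-flow readout concrete, here is an idealized, noise-free simulation in which each nucleotide dispensation produces a signal proportional to the number of bases incorporated. The flow order `TACG` and the function names are assumptions for illustration; real signals are noisy and must be corrected before base calling.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Idealized pyrosequencing: one (nucleotide, signal) pair per dispensed flow.

    synthesized: bases added to the growing strand, 5'->3'.
    flow_order: cyclic order in which nucleotide solutions are added (assumed).
    The signal equals the number of bases incorporated during that flow,
    i.e. the length of the homopolymer run; 0 means no light was produced.
    """
    if set(synthesized) - set(flow_order):
        raise ValueError("sequence contains bases absent from the flow order")
    signals, pos, flow = [], 0, 0
    while pos < len(synthesized):
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(synthesized) and synthesized[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals

def decode(signals):
    """Rebuild the sequence from the flows that produced light."""
    return "".join(base * run for base, run in signals)
```

For the strand `TTAGGC`, the flows T, A, C, G, T, A, C give signals 2, 1, 0, 2, 0, 0, 1, and decoding them recovers `TTAGGC`. Homopolymer runs appear as larger signals rather than as separate flows.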
- Representative pyrosequencing platforms include, but are not limited to, the Roche/454 using DNA templates prepared by emPCR with 1 to 2 million beads deposited in PTP wells.
- Real-time sequencing involves imaging the continuous incorporation of dye-labeled nucleotides during DNA synthesis.
- Exemplary real-time sequencing platforms include, but are not limited to, those that use individual zero-mode waveguide (ZMW) detectors to obtain sequence information as phospho-linked nucleotides are incorporated into the growing primer strand.
- Other sequencing methods for NGS include, but are not limited to, nanopore sequencing, sequencing by hybridization, nano-transistor array based sequencing, polony sequencing, scanning tunneling microscopy (STM) based sequencing, and nanowire-molecular sensor based sequencing.
- Nanopore sequencing involves the electrophoresis of nucleic acid molecules in solution through nano-scale pores that provide a highly confined space in which single-nucleic acid polymers can be analyzed. Representative methods of nanopore sequencing are described, eg, in Branton D. et al., Nat Biotechnol. 2008; 26(10): 1146-53.
- Sequencing by hybridization is a non-enzymatic method using DNA microarrays.
- A single pool of DNA is fluorescently labeled and hybridized to an array containing known sequences.
- Hybridization signals from a given spot on the array can identify the DNA sequence. The binding of one DNA strand to its complementary strand in a DNA duplex is sensitive even to single-base mismatches when the hybridized region is short or when a specific mismatch detection protein is present.
- Representative methods of sequencing by hybridization are described, for example, in Hanna G.J. et al., J. Clin. Microbiol. 2000; 38(7): 2715-21; and Edwards J.R. et al., Mut. Res. 2005; 573(1-2): 3-12.
- Polony sequencing is based on polony amplification followed by sequencing via multiple single-base extensions (FISSEQ).
- Polony amplification is a method of amplifying DNA in situ on a polyacrylamide film. Representative polony sequencing methods are described, for example, in US Patent Application Publication No. 2007/0087362.
- Nano-transistor array based devices such as Carbon NanoTube Field Effect Transistors (CNTFETs) can also be used for NGS.
- DNA molecules are stretched and driven across nanotubes by micro-fabricated electrodes. DNA molecules come into contact with the carbon nanotube surface sequentially, and a difference in the current flow from each base is created due to charge transfer between the DNA molecule and the nanotube. DNA is sequenced by recording these differences.
- An exemplary nano-transistor array based sequencing method is described, for example, in US Patent Publication No. 2006/0246497.
- Scanning tunneling microscopy (STM) can also be used for NGS.
- STM uses a piezoelectrically controlled probe to raster-scan a specimen and form an image of its surface.
- STM can be used to image the physical properties of single DNA molecules, for example, by integrating an actuator-driven flexible gap with a scanning tunneling microscope to produce coherent electron tunneling imaging and spectroscopy. Representative sequencing methods using STM are described, for example, in US Patent Application Publication No. 2007/0194225.
- Molecular-analysis devices composed of nanowire-molecular sensors can also be used for NGS. Such devices can detect interactions between nitrogenous materials disposed on the nanowires and nucleic acid molecules such as DNA.
- a molecular guide is positioned to guide molecules near the molecular sensor to allow interaction and subsequent detection. Representative sequencing methods using nanowire-molecular sensors are described, for example, in US Patent Application Publication No. 2006/0275779.
- Double-ended sequencing methods can be used for NGS.
- Double-ended sequencing uses blocked and unblocked primers to sequence both the sense and antisense strands of DNA. Typically, these methods include annealing an unblocked primer to the first strand of the nucleic acid; annealing a second, blocked primer to the second strand of the nucleic acid; extending the nucleic acid along the first strand with a polymerase; terminating the first sequencing primer; deblocking the second primer; and extending the nucleic acid along the second strand.
- Representative double-ended sequencing methods are described, for example, in US Pat. No. 7,244,567.
- NGS reads are aligned to known reference sequences or assembled de novo. For example, identifying genetic alterations such as single-nucleotide polymorphisms and structural variants in a sample (eg, a tumor sample) can be performed by aligning NGS reads against a reference sequence (eg, a wild-type sequence).
- De novo assembly is described, for example, in Warren R. et al., Bioinformatics, 2007, 23:500-501; Butler J. et al., Genome Res., 2008, 18:810-820; and Zerbino D.R. and Birney E., Genome Res., 2008, 18:821-829.
- Sequence alignment or assembly can be performed using read data from one or more NGS platforms, for example by mixing Roche/454 and Illumina/Solexa read data.
- The alignment step may be performed using, for example, the BWA algorithm and the hg19 reference sequence, but is not limited thereto.
- The sequence alignment in step (b) includes computational methods or approaches, such as computer algorithms, used to determine where in the genome a read sequence (e.g., a short-read sequence from next-generation sequencing) most likely originated by evaluating the similarity between the read sequence and the reference sequence.
- a variety of algorithms can be applied to sequence alignment problems. Some algorithms are relatively slow, but allow relatively high specificity. These include, for example, dynamic programming-based algorithms. Dynamic programming is a way to solve complex problems by breaking them down into simpler steps. Other approaches are relatively more efficient, but are typically less thorough. This includes, for example, heuristic algorithms and probabilistic methods designed for bulk database searches.
- Candidate screening reduces the search space for sequence alignment from the whole genome to a shorter list of possible alignment positions.
- Sequence alignment involves aligning sequences with sequences provided in the candidate screening step. This can be done using a global alignment (eg Needleman-Wunsch alignment) or a local alignment (eg Smith-Waterman alignment).
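As an example of the dynamic-programming approach mentioned above, the following is a minimal Smith-Waterman local alignment with a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions; production aligners such as BWA use affine gap penalties and heavily optimized implementations.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment by dynamic programming (Smith-Waterman, linear gap).

    Returns (best_score, aligned_a, aligned_b), with '-' marking gaps.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

For instance, aligning the read `ACGT` against `TTACGTAA` returns a score of 8 with the read matched exactly to the reference substring `ACGT`. A global (Needleman-Wunsch) variant differs mainly in not clamping scores at zero and in tracing back from the last cell.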
- Most alignment algorithms can be classified into one of three types according to their indexing method: algorithms based on hash tables (e.g., BLAST, ELAND, SOAP), on suffix trees (e.g., Bowtie, BWA), or on merge sort (e.g., Slider).
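Hash-table-based candidate screening can be sketched by indexing every k-mer of the reference and letting each exact k-mer hit in a read vote for an implied start position. This is a toy illustration under assumed names and parameters, not the indexing scheme of BLAST, ELAND, or SOAP; the top-voted positions would then be passed to a full global or local alignment step.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    """Hash table mapping each k-mer to every reference position where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def candidate_positions(read, index, k):
    """Candidate read start positions, most-supported first.

    Each exact k-mer match at read offset o and reference position r votes
    for start position r - o; positions before the reference start are dropped.
    """
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            start = ref_pos - offset
            if start >= 0:
                votes[start] += 1
    return sorted(votes, key=lambda p: (-votes[p], p))
```

With the reference `GATTACAGATTTACA` and k = 3, the read `TTACA` receives three votes at both positions 2 and 10, so both are returned as candidates for exact alignment.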
- Short read sequences are typically used for alignment. Examples of sequence alignment algorithms/programs for short read sequences include, but are not limited to, BFAST (Homer N. et al., PLoS One. 2009; 4(11): e7767), BLASTN (available on the World Wide Web at blast.ncbi.nlm.nih.gov), BLAT (Kent W.J. Genome Res.
- The sequence alignment algorithm can be selected based on a number of factors including, for example, the sequencing technique, read length, number of reads, available computing resources, and sensitivity/scoring requirements. Different sequence alignment algorithms achieve different levels of speed, alignment sensitivity, and alignment specificity. Alignment specificity refers to the percentage of target sequence residues aligned in the submitted alignment that are aligned correctly compared with the predicted (reference) alignment. Alignment sensitivity refers to the percentage of target sequence residues aligned in the predicted (reference) alignment that are also aligned correctly in the submitted alignment.
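The two accuracy measures can be computed as follows, assuming each alignment is represented as a mapping from target residue index to aligned position (a hypothetical representation chosen for illustration). Sensitivity divides the number of correctly aligned residues by the number aligned in the reference alignment; specificity divides it by the number aligned in the submitted alignment.

```python
def alignment_accuracy(submitted, reference):
    """Return (sensitivity, specificity) of a submitted alignment.

    submitted, reference: dicts mapping target residue index -> aligned position.
    A residue counts as correct when both alignments place it at the same position.
    """
    correct = sum(1 for r, pos in submitted.items() if reference.get(r) == pos)
    sensitivity = correct / len(reference) if reference else 0.0
    specificity = correct / len(submitted) if submitted else 0.0
    return sensitivity, specificity
```

If the reference aligns residues 0 to 3 and the submission aligns residues 0 to 2 with one of them misplaced, two residues are correct: sensitivity is 2/4 = 0.5 and specificity is 2/3.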
- Alignment algorithms such as ELAND or SOAP can be used to align short reads (e.g., from Illumina/Solexa sequencers) to a reference genome when speed is the primary consideration.
- Alignment algorithms such as BLAST or Mega-BLAST can be used for similarity searches with short reads (e.g., from Roche FLX) when specificity is the most important factor, although these methods are relatively slow.
- Alignment algorithms such as MAQ or Novoalign take quality scores into account and can therefore be used for single-end or paired-end data when accuracy is essential (e.g., in high-throughput SNP searches).
- Alignment algorithms such as Bowtie or BWA use the Burrows-Wheeler Transform (BWT) and therefore require a relatively small memory footprint. Alignment algorithms such as BFAST, PerM, SHRiMP, SOCS, or ZOOM map color-space reads and can therefore be used with ABI's SOLiD platform. In some applications, the results of two or more alignment algorithms may be combined.
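The memory advantage of BWT-based aligners comes from searching a compact transform of the reference instead of a full index. The sketch below builds the BWT naively from sorted rotations and counts pattern occurrences with FM-index backward search. It is an illustrative assumption (quadratic construction and linear-time occurrence counting); production tools use far more efficient constructions and precomputed occurrence tables.

```python
def bwt(text):
    """Burrows-Wheeler Transform via sorted rotations.

    '$' is appended as a terminator; it must not occur in text and sorts
    before letters in ASCII, so it acts as the smallest symbol.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)

def count_occurrences(bwt_str, pattern):
    """FM-index backward search: number of occurrences of pattern in the text."""
    # C[c]: number of symbols in the text that are smaller than c
    C = {}
    for i, c in enumerate(sorted(bwt_str)):
        C.setdefault(c, i)

    def occ(c, i):  # occurrences of c in bwt_str[:i] (naive, O(i))
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)
    for c in reversed(pattern):  # extend the match one symbol to the left
        if c not in C:
            return 0
        lo = C[c] + occ(c, lo)
        hi = C[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana")` gives `"annb$aa"`, and `count_occurrences("annb$aa", "ana")` returns 2 without ever reconstructing the original text.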
- The length of the sequence information (reads) in step (b) may be 5 to 5000 bp, and the number of reads used may be 5,000 to 5 million, but is not limited thereto.
- The nucleic acid fragment terminal sequence motif in step (c) may be a pattern of 2 to 30 nucleotides at both ends of the nucleic acid fragment.
- Reverse strand 3′-ATGACTGAAACCTTA-5′ (SEQ ID NO: 2)
- TACA read sequentially from the 5' end of the forward strand and ATTC read sequentially from the 5' end of the reverse strand become the terminal sequence motif values of this nucleic acid fragment.
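Extracting the two end motifs can be sketched as below, assuming the forward-strand sequence of the whole fragment is available 5'→3' (for paired-end data, the same motifs can roughly be taken from the first bases of each read of a pair). The function names and the default motif length of 4, which matches the 4-base motifs in the example, are illustrative choices.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse strand written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def end_motifs(forward_5to3, k=4):
    """Return (forward_motif, reverse_motif): the first k bases read from the
    5' end of the forward strand and from the 5' end of the reverse strand.

    forward_5to3: forward-strand sequence of the whole fragment (assumed input).
    """
    if len(forward_5to3) < k:
        raise ValueError("fragment is shorter than the motif length")
    return forward_5to3[:k], reverse_complement(forward_5to3)[:k]
```

For a hypothetical fragment `AAAACCCGGT`, `end_motifs` returns `("AAAA", "ACCG")`, the first four bases of the forward strand and of its reverse complement.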
- The frequency of the nucleic acid fragment terminal sequence motif in step (c) may be the number of times each motif is detected across all nucleic acid fragments.
- Motif frequency is the number of observations of each motif in all nucleic acid fragments produced by sequencing; dividing this number by the total number of nucleic acid fragments produced gives the relative frequency of each motif.
- For example, when the AAAA terminal sequence motif is observed 125,071 times, its relative frequency, calculated by dividing this count by the total number of nucleic acid fragments, is 0.00099.
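A minimal sketch of the relative-frequency calculation follows. It assumes that both end motifs of every fragment are counted and that, as the text states, the denominator is the total number of fragments; the input format (one `(forward_motif, reverse_motif)` tuple per fragment) and the function name are hypothetical.

```python
from collections import Counter
from itertools import product

def relative_motif_frequencies(fragment_motifs, k=4):
    """Relative frequency of every possible k-base end motif.

    fragment_motifs: list of (forward_motif, reverse_motif) tuples, one per fragment.
    Counts are divided by the number of fragments (not motifs), so the values
    sum to 2 when every motif consists only of A, C, G, and T.
    """
    if not fragment_motifs:
        raise ValueError("no fragments supplied")
    counts = Counter()
    for forward_motif, reverse_motif in fragment_motifs:
        counts[forward_motif] += 1
        counts[reverse_motif] += 1
    n_fragments = len(fragment_motifs)
    all_motifs = ("".join(p) for p in product("ACGT", repeat=k))  # 4**k motifs
    return {m: counts[m] / n_fragments for m in all_motifs}
```

Every one of the 256 possible 4-base motifs receives an entry, including motifs that were never observed (relative frequency 0), which keeps the feature layout fixed across samples.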
- The size of the nucleic acid fragment in step (c) may be the number of bases from the 5' end to the 3' end of the nucleic acid fragment.
- For example, the size of the nucleic acid fragment represented by SEQ ID NOs: 1 and 2 is 15.
- The size of the nucleic acid fragment may be 1 to 10,000, preferably 10 to 1,000, more preferably 50 to 500, and most preferably 90 to 250, but is not limited thereto.
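Fragment size can be estimated from the aligned positions of a read pair, much as the SAM TLEN field reports the span from the leftmost to the rightmost mapped base. The sketch below assumes 0-based, end-exclusive coordinates and uses the most preferred 90 to 250 range as the default window; the helper names are hypothetical.

```python
from collections import Counter

def fragment_size(fwd_read_start, rev_read_end):
    """Number of bases spanned by a read pair (hypothetical helper).

    fwd_read_start: 0-based leftmost aligned position of the forward-strand read.
    rev_read_end: 0-based, exclusive end position of the reverse-strand read.
    """
    size = rev_read_end - fwd_read_start
    if size <= 0:
        raise ValueError("read pair does not span a valid fragment")
    return size

def size_distribution(sizes, lo=90, hi=250):
    """Count fragments at each size from lo to hi inclusive; other sizes are dropped."""
    counts = Counter(s for s in sizes if lo <= s <= hi)
    return [counts[s] for s in range(lo, hi + 1)]
```

A pair whose forward read starts at position 100 and whose reverse read ends at 115 spans 15 bases, the same size as in the SEQ ID NOs: 1 and 2 example.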
- The vectorized data in step (d) may be arranged so that the type of nucleic acid fragment terminal sequence motif is on the X axis and the size of the nucleic acid fragment is on the Y axis.
- Reverse strand 3′-ATGACTGATCA...AACCTTA-5′ (SEQ ID NO: 4)
- This nucleic acid fragment can be expressed as a two-dimensional vector as shown in the left panel of FIG. 4, and when this process is extended and accumulated to the entire nucleic acid fragment, a two-dimensional vector as shown in the right panel of FIG. 4 is generated.
- The vectorized data may further include the sum of frequencies for each nucleic acid fragment terminal motif and the sum of frequencies for each nucleic acid fragment size.
- Specifically, column sums are appended four times to the bottom of the two-dimensional vector of FIG. 4, and an edge summary is additionally performed in which row sums are appended four times to its rightmost side, generating the two-dimensional vector shown in the left panel of FIG. 5. In this way, end motif frequency information independent of fragment size and fragment size information independent of end motif are both added.
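The FEMS construction with edge summary can be sketched in NumPy as follows. Assumptions: 4-base end motifs (256 columns), one count per fragment end, sizes 90 to 250 as rows, and a zero-filled corner block where the appended rows and columns meet; the text does not specify how that corner is filled, and the function name is hypothetical.

```python
import numpy as np
from itertools import product

# Hypothetical layout: the 256 four-base end motifs are the columns (X axis).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
MOTIF_INDEX = {m: i for i, m in enumerate(KMERS)}

def fems_table(fragment_ends, size_min=90, size_max=250, repeats=4):
    """Build a FEMS table with edge summary.

    fragment_ends: iterable of (end_motif, fragment_size) pairs, one per fragment end.
    Rows are fragment sizes (Y axis), columns are end motifs (X axis).
    """
    n_sizes = size_max - size_min + 1
    core = np.zeros((n_sizes, len(KMERS)))
    for motif, size in fragment_ends:
        # Skip sizes outside the window and motifs containing non-ACGT bases
        if size_min <= size <= size_max and motif in MOTIF_INDEX:
            core[size - size_min, MOTIF_INDEX[motif]] += 1
    col_sums = core.sum(axis=0)  # frequency of each motif over all sizes
    row_sums = core.sum(axis=1)  # frequency of each size over all motifs
    bottom = np.tile(col_sums, (repeats, 1))          # repeats x 256
    right = np.tile(row_sums[:, None], (1, repeats))  # n_sizes x repeats
    corner = np.zeros((repeats, repeats))             # assumption: left empty
    return np.block([[core, right], [bottom, corner]])
```

With the default window, the result has shape (161 + 4) × (256 + 4). Each fragment would contribute two entries, one per end motif, both at the fragment's size.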
- The two-dimensional vector is defined as a Fragment End Motif frequency and size (FEMS) table.
- The vectorized data in the present invention may preferably be converted into an image, but is not limited thereto.
- An image is basically composed of pixels.
- A black-and-white image corresponds to a one-dimensional (single-channel) 2D vector, a color (RGB) image to a three-dimensional 2D vector, and a CMYK image to a four-dimensional 2D vector.
- The vectorized data of the present invention is not limited to images; for example, n black-and-white images may be stacked into an n-dimensional 2D vector (multi-dimensional vector) and used as input data for an artificial intelligence model.
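Converting a FEMS table into an image, or stacking several tables as channels, can be sketched as below. Per-table min-max scaling to 8-bit intensities is an assumed normalization chosen for illustration; the text does not state how counts are mapped to pixel values, and the function names are hypothetical.

```python
import numpy as np

def to_grayscale(table):
    """Min-max scale a 2D table to uint8 pixel intensities (0-255)."""
    t = np.asarray(table, dtype=float)
    lo, hi = t.min(), t.max()
    if hi == lo:
        return np.zeros(t.shape, dtype=np.uint8)  # constant table: all black
    return np.round(255 * (t - lo) / (hi - lo)).astype(np.uint8)

def stack_channels(tables):
    """Stack n equally shaped 2D tables into an (H, W, n) multi-channel array."""
    return np.stack([to_grayscale(t) for t in tables], axis=-1)
```

A single scaled table is a black-and-white image; stacking three tables yields an array with the same layout as an RGB image, and stacking n tables gives the n-channel input described above.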
- prior to performing step (c), the method may further comprise separately selecting, among the aligned nucleic acid fragments, those that satisfy a mapping quality score criterion.
- the mapping quality score may vary depending on a desired criterion, but may be preferably 15 to 70 points, more preferably 50 to 70 points, and most preferably 60 points.
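As a sketch, the mapping-quality screen reduces to a threshold filter. The `AlignedFragment` record and `filter_by_mapq` function are hypothetical, and treating "satisfying" the score as greater than or equal to the threshold (60 here, the most preferred value) is an assumption. With pysam, for example, each `AlignedSegment` exposes a `mapping_quality` attribute that could be tested the same way.

```python
from dataclasses import dataclass


@dataclass
class AlignedFragment:
    """Hypothetical record for one aligned fragment."""
    name: str
    mapping_quality: int


def filter_by_mapq(fragments, min_mapq=60):
    """Keep fragments whose mapping quality meets the threshold (>= is assumed)."""
    return [f for f in fragments if f.mapping_quality >= min_mapq]
```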
- the artificial intelligence model in step (e) may be any model that can learn to distinguish images by cancer type, and is preferably a deep learning model.
- the artificial intelligence model may be any artificial neural network algorithm capable of analyzing vectorized data, and is preferably selected from the group consisting of a convolutional neural network (CNN), a deep neural network (DNN), and a recurrent neural network (RNN), but is not limited thereto.
- the recurrent neural network may be selected from the group consisting of a long short-term memory (LSTM) neural network, a gated recurrent unit (GRU) neural network, a vanilla recurrent neural network, and an attentive recurrent neural network.
- the loss function for binary classification may be represented by Equation 1, and the loss function for multi-class classification by Equation 2.
- the binary classification means that an artificial intelligence model learns to determine the presence or absence of cancer
- multi-class classification means that an artificial intelligence model learns to discriminate two or more types of cancer
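Equations 1 and 2 themselves are not reproduced in this text. Assuming they are the standard binary cross-entropy (for a sigmoid output) and categorical cross-entropy (for softmax outputs), which is the usual pairing but is not confirmed by the source, they can be computed as in the following NumPy sketch.

```python
import numpy as np

EPS = 1e-12  # guards against log(0)


def binary_cross_entropy(y_true, p_cancer):
    """Mean BCE for sigmoid outputs; y_true is 1 for cancer, 0 for no cancer."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_cancer, dtype=float), EPS, 1 - EPS)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def categorical_cross_entropy(y_onehot, probs):
    """Mean cross-entropy for softmax outputs; rows are samples, columns are classes."""
    y = np.asarray(y_onehot, dtype=float)
    p = np.clip(np.asarray(probs, dtype=float), EPS, 1.0)
    return float(-np.mean(np.sum(y * np.log(p), axis=1)))
```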
- when the artificial intelligence model is a CNN, learning may be performed through the following steps:
- the training data is used when learning the CNN model
- the validation data is used for hyper-parameter tuning verification
- the test data is used for performance evaluation after producing the optimal model.
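A sketch of a three-way split of sample indices. The 60/20/20 fractions and fixed seed are hypothetical, since the per-set sample counts are not reproduced here; stratifying the split by group (Normal, HCC, EC) would be a reasonable refinement but is not shown.

```python
import numpy as np


def split_indices(n_samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle sample indices and cut them into training, validation, and test sets."""
    if len(fractions) != 3 or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("expected three fractions summing to 1")
    order = np.random.default_rng(seed).permutation(n_samples)
    n_train = int(round(fractions[0] * n_samples))
    n_val = int(round(fractions[1] * n_samples))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]
```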
- the hyper-parameter tuning process optimizes the values of the various parameters that make up the CNN model (the number of convolution layers, the number of dense layers, the number of convolution filters, etc.), and may use Bayesian optimization and grid search techniques.
- the learning process optimizes the internal parameters (weights) of the CNN model using the predetermined hyper-parameters; when the validation loss starts to increase relative to the training loss, the model is judged to be overfitting, and training may be stopped before that point.
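The stopping rule is described qualitatively. One common way to operationalize it, shown here as an assumption rather than the authors' exact criterion, is patience-based early stopping on the validation curve: keep the epoch with the lowest validation loss and stop once it has not improved for a hypothetical `patience` number of epochs. Tracking only the validation loss is a simplification of comparing it against the training loss.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Epoch index to keep: the lowest validation loss seen before `patience`
    consecutive epochs pass without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # validation loss has not improved for `patience` epochs
    return best_epoch
```

For the loss curve `[1.0, 0.8, 0.6, 0.65, 0.7, 0.75, 0.5]` with `patience=3`, training stops at epoch 5 and the weights from epoch 2 are kept; the later dip to 0.5 is never reached.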
- the result value that the artificial intelligence model in step (e) outputs from the input vectorized data may be any specific score or real number without limitation, and is preferably a Deep Probability Index (DPI) value, but is not limited thereto.
- the Deep Probability Index means a value expressed as a probability, obtained by applying a sigmoid function (binary classification) or a softmax function (multi-class classification) in the last layer of the artificial intelligence model so that its output is scaled to the range 0 to 1.
- with the sigmoid function, the model is trained so that the DPI value approaches 1 for cancer. For example, if breast cancer samples and normal samples are input, the model is trained so that the DPI value of the breast cancer samples is close to 1.
- with the softmax function, as many DPI values as there are classes are obtained.
- these DPI values sum to 1, and the model is trained so that the DPI value of the actual cancer type approaches 1. For example, with three classes (breast cancer, liver cancer, and normal), when a breast cancer sample is input, the model is trained so that the DPI value of the breast cancer class is close to 1.
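The two output transforms that produce DPI values can be sketched in NumPy as follows; the function names are hypothetical. Subtracting the maximum logit before exponentiating does not change the softmax result but avoids overflow.

```python
import numpy as np


def sigmoid_dpi(logit):
    """Binary DPI: squashes one model output into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logit, dtype=float)))


def softmax_dpi(logits):
    """Multi-class DPI: one value per class, summing to 1 along the last axis."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```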
- the output result value of step (e) may be derived for each type of cancer.
- the artificial intelligence model is trained so that its output is close to 1 when cancer is present and close to 0 when cancer is absent; an output above 0.5 was judged as cancer and an output of 0.5 or less as no cancer, and performance (training, validation, and test accuracy) was measured on that basis.
- the reference value of 0.5 can be changed as needed. For example, to reduce false positives, the reference value can be set higher than 0.5, applying a stricter criterion for judging that cancer is present; conversely, a somewhat weaker criterion, with a reference value below 0.5, can be used for judging that cancer is present.
- the reference value can be determined by applying the trained artificial intelligence model to unseen data (data whose labels were not used for training) and examining the resulting DPI values.
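The cutoff logic can be sketched as follows. This follows claim 12, where a DPI of 0.5 or above is called cancer; the detailed description treats 0.5 or less as no cancer, so the two differ only at exactly 0.5. `pick_cutoff` is a hypothetical helper showing one way to choose a stricter cutoff from held-out DPI values: it returns the lowest observed DPI whose false positive rate stays within a target (0.05 by default, an assumed value). Because the false positive rate can only fall as the cutoff rises, the first cutoff that meets the target keeps the highest sensitivity.

```python
import numpy as np


def call_cancer(dpi, cutoff=0.5):
    """Cancer call per claim 12: a DPI of 0.5 or above (>=) counts as cancer."""
    return np.asarray(dpi, dtype=float) >= cutoff


def sensitivity_fpr(dpi, labels, cutoff):
    """Sensitivity and false positive rate of the cancer call at a given cutoff."""
    pred = call_cancer(dpi, cutoff)
    y = np.asarray(labels, dtype=bool)
    sens = (pred & y).sum() / max(int(y.sum()), 1)
    fpr = (pred & ~y).sum() / max(int((~y).sum()), 1)
    return float(sens), float(fpr)


def pick_cutoff(dpi, labels, max_fpr=0.05):
    """Lowest observed DPI usable as a cutoff whose FPR on held-out data is <= max_fpr."""
    for c in np.unique(np.asarray(dpi, dtype=float)):  # ascending candidate cutoffs
        sens, fpr = sensitivity_fpr(dpi, labels, c)
        if fpr <= max_fpr:
            return float(c), sens
    return None  # no observed value meets the target
```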
- step (f), predicting the cancer type by comparing the output result values, may be performed by a method comprising determining the cancer type with the highest output result value as the cancer of the sample.
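Step (f) amounts to an argmax over the per-class DPI values. In this sketch the class order (Normal, HCC, EC) is assumed to match the model's outputs, `predict_cancer_type` is a hypothetical name, and ties resolve to the first class because `numpy.argmax` returns the first maximum.

```python
import numpy as np

CLASSES = ("Normal", "HCC", "EC")  # assumed to match the order of the model's outputs


def predict_cancer_type(dpi, classes=CLASSES):
    """Return the class whose DPI value is highest."""
    dpi = np.asarray(dpi, dtype=float)
    if dpi.shape != (len(classes),):
        raise ValueError("need exactly one DPI value per class")
    return classes[int(np.argmax(dpi))]
```

With the DPI values 0.6, 0.3, and 0.1 used as an example later in this document, `predict_cancer_type([0.6, 0.3, 0.1])` returns `"Normal"`.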
- the present invention includes a decoding unit for decoding sequence information by extracting nucleic acids from a biological sample
- an alignment unit that aligns the decoded sequences to a standard chromosomal sequence database
- nucleic acid fragment analyzer for deriving the frequency of terminal sequence motifs and the size of the nucleic acid fragments based on the aligned sequences
- a data generation unit for generating vectorized data using the terminal sequence motif frequency of the derived nucleic acid fragment and the size of the nucleic acid fragment;
- a cancer diagnosis unit that analyzes the generated vectorized data by inputting it to the learned artificial intelligence model and compares it with a reference value to determine whether or not there is cancer
- An apparatus for diagnosing and predicting cancer including a cancer type prediction unit for predicting a cancer type by analyzing an output result value.
- the decoding unit may include a nucleic acid injection unit for injecting the extracted nucleic acid into an independent device, and a sequence information analyzer, preferably an NGS analysis device, for analyzing the sequence information of the injected nucleic acid, but is not limited thereto.
- the decoding unit may also receive and decode sequence information data generated in an independent device.
- the present invention also relates to a computer-readable storage medium comprising instructions configured to be executed by a processor for diagnosing cancer and predicting cancer types,
- the instructions being configured to predict the presence of cancer and the cancer type through a series of steps ending with (f) predicting the cancer type by comparing the output result values.
- a method according to the present disclosure may be implemented using a computer.
- a computer includes one or more processors coupled to a chip set.
- a memory, a storage device, a keyboard, a graphics adapter, a pointing device, and a network adapter are connected to the chipset.
- the functions of the chipset are provided by a memory controller hub and an I/O controller hub.
- the memory may be coupled directly to the processor instead of to the chipset.
- a storage device is any device capable of holding data, including a hard drive, compact disk read-only memory (CD-ROM), DVD, or other memory device. The memory holds data and instructions used by the processor.
- the pointing device may be a mouse, track ball or other type of pointing device, and is used in combination with a keyboard to transmit input data to a computer system.
- the graphics adapter presents images and other information on a display.
- the network adapter connects the computer system to a local area network or a long-distance communication network.
- the computer used herein is not limited to the above configuration: it may omit some components or include additional components, may be part of a storage area network (SAN), and may be configured to execute the modules of a program that carries out the method according to the present invention.
- a module herein may mean a functional and structural combination of hardware for implementing the technical idea according to the present application and software for driving the hardware.
- it is apparent to those skilled in the art that a module may mean a logical unit of predetermined code together with the hardware resources for executing that code, and does not necessarily mean physically connected code or a single type of hardware.
- the present invention relates to a method for diagnosing cancer and predicting cancer types, including the step of predicting the cancer type by comparing the output result values.
- the nucleic acid fragment terminal motif was set to 4 bases (each A, T, G, or C); among the 256 (4 × 4 × 4 × 4) possible motifs, some show no difference in relative frequency among the Normal, HCC, and EC groups. Including such motifs in the FEMS table only adds noise that increases the model's computational load without providing meaningful information for classification. Therefore, to exclude these uninformative motifs, only motifs with significant differences in relative frequency among the three groups were selected.
- specifically, with the terminal motif set to 4 bases (A, T, G, C), the motifs among the 256 (4 × 4 × 4 × 4) possible types that showed a statistically significant difference in relative frequency among the healthy (Normal), liver cancer (HCC), and esophageal cancer (EC) patient groups (Kruskal-Wallis test, FDR-adjusted p < 0.05) were selected (FIG. 2).
- in addition, to prevent overfitting, motifs whose average frequency in the healthy group was higher than the random baseline (1/256 ≈ 0.004) were additionally selected.
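The two-stage motif filter can be sketched as below, assuming per-sample relative-frequency matrices (samples × motifs) for the three groups. It uses `scipy.stats.kruskal` for the Kruskal-Wallis test and a hand-written Benjamini-Hochberg adjustment; the text says only "FDR-adjust", so Benjamini-Hochberg is an assumption, as is reading "additionally selected" as a further filter applied on top of significance. Function names are hypothetical.

```python
import numpy as np
from scipy.stats import kruskal


def bh_adjust(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity by rank
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def select_motifs(normal, hcc, ec, alpha=0.05, baseline=1 / 256):
    """Indices of motif columns that differ across groups and exceed the baseline in Normal.

    Each argument is an (n_samples, n_motifs) matrix of relative end-motif frequencies.
    """
    pvals = []
    for j in range(normal.shape[1]):
        try:
            p = kruskal(normal[:, j], hcc[:, j], ec[:, j]).pvalue
        except ValueError:  # raised when every value in the column is identical
            p = 1.0
        pvals.append(1.0 if np.isnan(p) else p)
    significant = bh_adjust(pvals) < alpha
    above_baseline = normal.mean(axis=0) > baseline
    return np.flatnonzero(significant & above_baseline)
```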
- for nucleic acid fragment size screening, most quality-checked nucleic acid fragments had sizes in the range of 90 to 250, as shown in FIG.; including sizes outside this range would fill most of the table with zeros and add only meaningless noise, so this size range was selected.
- a two-dimensional vector was created with the motif type on the X axis and the fragment size on the Y axis so that the end motif frequency and size information of the nucleic acid fragments selected in Example 2 could be expressed simultaneously. More specifically, as shown in the left panel of FIG. 4, the end motif types at both ends of a single nucleic acid fragment and its size were expressed as frequencies, and this was extended and accumulated over all nucleic acid fragments to create the 2D vector shown in the right panel of FIG. 4.
- a 2D vector as shown in FIG. 5 was finally generated by performing an Edge Summary step of adding a row sum value to the rightmost side of the 2D vector 4 times.
- This two-dimensional vector was defined as a Fragment End Motif frequency and size (FEMS) table, and an example of visualizing it is as described in FIG. 5.
- the entire sample was divided into training, validation, and test data sets, and the training data set was used for model learning, the validation data set for hyper-parameter tuning, and the test data set for final model performance evaluation.
- the number of samples for each set is as follows.
- ReLU (Rectified Linear Unit)
- one convolution layer with five 10 × 10 filters (patches) was used.
- max pooling with a 2 × 2 patch was used.
- one fully connected layer with 512 hidden nodes was used.
- the final DPI values were calculated with the softmax function.
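The layer stack listed above (one convolution layer with five 10 × 10 filters, ReLU, 2 × 2 max pooling, one 512-node fully connected layer, softmax output) can be illustrated with a NumPy forward pass. This is a sketch with random, untrained weights, not the trained model: where ReLU is applied, the unpadded ("valid") convolution, the pooling stride of 2, and the input size are all assumptions, and in practice a deep learning framework would be used for training.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def init_params(height, width, n_classes=3, n_filters=5, k=10, hidden=512, seed=0):
    """Random weights for: conv (5 filters, 10x10) -> ReLU -> 2x2 max pool -> dense(512) -> softmax."""
    rng = np.random.default_rng(seed)
    ph, pw = (height - k + 1) // 2, (width - k + 1) // 2  # size after valid conv and 2x2 pooling
    return {
        "conv_w": rng.normal(0.0, 0.1, (n_filters, k, k)),
        "conv_b": np.zeros(n_filters),
        "fc_w": rng.normal(0.0, 0.01, (n_filters * ph * pw, hidden)),
        "fc_b": np.zeros(hidden),
        "out_w": rng.normal(0.0, 0.01, (hidden, n_classes)),
        "out_b": np.zeros(n_classes),
    }


def forward(x, params):
    """Forward pass on one single-channel table x of shape (height, width); returns DPI values."""
    k = params["conv_w"].shape[1]
    windows = sliding_window_view(x, (k, k))  # (H-k+1, W-k+1, k, k)
    conv = np.einsum("ijab,fab->fij", windows, params["conv_w"])  # valid cross-correlation
    conv = np.maximum(conv + params["conv_b"][:, None, None], 0.0)  # ReLU
    f, ch, cw = conv.shape
    ph, pw = ch // 2, cw // 2
    pooled = conv[:, : 2 * ph, : 2 * pw].reshape(f, ph, 2, pw, 2).max(axis=(2, 4))  # 2x2 max pool
    hidden = np.maximum(pooled.reshape(-1) @ params["fc_w"] + params["fc_b"], 0.0)  # dense + ReLU
    logits = hidden @ params["out_w"] + params["out_b"]
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax -> one DPI value per class
```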
- the hyper-parameter tuning process is a process of optimizing the values of various parameters (number of convolution layers, number of dense layers, number of convolution filters, etc.) that make up the CNN model.
- Bayesian optimization and grid search techniques were used in the hyper-parameter tuning process, and when the validation loss started to increase relative to the training loss, the model was judged to be overfitting and training was stopped.
- the model with the best validation data set performance was determined to be the best model, and the final performance evaluation was performed with the test data set.
- the probabilities that a sample comes from a healthy person, a liver cancer patient, or an esophageal cancer patient were calculated by the softmax function in the last layer of the CNN model, and these probability values were defined as the Deep Probability Index (DPI).
- any given sample is assigned to the group with the highest of its three DPI values. For example, when the DPI values for healthy person, liver cancer patient, and esophageal cancer patient calculated for a sample are 0.6, 0.3, and 0.1, respectively, the sample is determined to be from a healthy person.
- the X axis of FIG. 8 represents the group (True label) information of the actual sample
- the Y axis represents, from left to right, the DPI values for healthy (Normal), liver cancer (HCC), and esophageal cancer (EC) calculated by the CNN model.
- the DPI distributions confirmed that, in each of the Train, Validation, and Test data sets, healthy samples had the highest probability of being healthy, liver cancer patient samples had the highest probability of being liver cancer, and esophageal cancer patient samples had the highest probability of being esophageal cancer.
- because the method for diagnosing cancer and predicting cancer types using the frequency and size of cell-free nucleic acid fragment terminal sequence motifs generates vectorized data and analyzes it with an AI algorithm, it is useful in that it shows high sensitivity and accuracy even at low read coverage.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Theoretical Computer Science (AREA)
- Public Health (AREA)
- Data Mining & Analysis (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Epidemiology (AREA)
- Databases & Information Systems (AREA)
- Primary Health Care (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Molecular Biology (AREA)
- Pathology (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Chemical & Material Sciences (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Analytical Chemistry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioethics (AREA)
- Radiology & Medical Imaging (AREA)
- Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Claims (16)
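- A method of providing information for cancer diagnosis and cancer type prediction, comprising: (a) extracting nucleic acids from a biological sample and obtaining sequence information; (b) aligning the obtained sequence information (reads) to a reference genome database; (c) deriving the terminal sequence motif frequency of nucleic acid fragments (fragments) and the size of the nucleic acid fragments using the aligned sequence information (reads); (d) generating vectorized data using the derived terminal sequence motif frequency and size of the nucleic acid fragments; (e) inputting the generated vectorized data into a trained artificial intelligence model and determining the presence or absence of cancer by comparing the resulting output value with a reference value (cut-off value); and (f) predicting the cancer type by comparing the output result values.
- A method for diagnosing cancer and predicting cancer type, comprising: (a) extracting nucleic acids from a biological sample and obtaining sequence information; (b) aligning the obtained sequence information (reads) to a reference genome database; (c) deriving the terminal sequence motif frequency of nucleic acid fragments (fragments) and the size of the nucleic acid fragments using the aligned sequence information (reads); (d) generating vectorized data using the derived terminal sequence motif frequency and size of the nucleic acid fragments; (e) inputting the generated vectorized data into a trained artificial intelligence model and determining the presence or absence of cancer by comparing the resulting output value with a reference value (cut-off value); and (f) predicting the cancer type by comparing the output result values.
- The method according to claim 1 or 2, wherein step (a) is performed by a method comprising: (a-i) obtaining nucleic acids from blood, semen, vaginal cells, hair, saliva, urine, oral cells, amniotic fluid containing placental cells or fetal cells, tissue cells, or a mixture thereof; (a-ii) removing proteins, lipids, and other residues from the collected nucleic acids using a salting-out method, a column chromatography method, or a beads method to obtain purified nucleic acids; (a-iii) preparing a single-end sequencing or pair-end sequencing library from the purified nucleic acids, or from nucleic acids randomly fragmented by enzymatic digestion, pulverization, or a hydroshear method; (a-iv) reacting the prepared library in a next-generation sequencer; and (a-v) obtaining sequence information (reads) of the nucleic acids from the next-generation sequencer.
- The method according to claim 1, wherein the terminal sequence motif in step (c) is a pattern of 2 to 30 bases at both ends of the nucleic acid fragment.
- The method according to claim 1 or 2, wherein the terminal sequence motif frequency in step (c) is the number of each motif detected across all nucleic acid fragments.
- The method according to claim 1 or 2, wherein the size of the nucleic acid fragment in step (c) is the number of bases from the 5′ end to the 3′ end of the nucleic acid fragment.
- The method according to claim 1 or 2, wherein the vectorized data in step (d) has the type of nucleic acid fragment terminal sequence motif on the X axis and the size of the nucleic acid fragment on the Y axis.
- The method according to claim 7, wherein the vectorized data further comprises the sum of frequencies for each nucleic acid fragment terminal motif and the sum of frequencies for each nucleic acid fragment size.
- The method according to claim 1 or 2, wherein the artificial intelligence model in step (e) is trained to distinguish vectorized data of healthy individuals from vectorized data of individuals with cancer.
- The method according to claim 9, wherein the artificial intelligence model is selected from the group consisting of a convolutional neural network (CNN), a deep neural network (DNN), and a recurrent neural network (RNN).
- The method according to claim 1 or 2, wherein the result value output by the artificial intelligence model of step (e) from analyzing the input vectorized data is a DPI (Deep Probability Index) value.
- The method according to claim 1 or 2, wherein the reference value in step (e) is 0.5, and cancer is determined to be present when the value is 0.5 or higher.
- The method according to claim 1 or 2, wherein predicting the cancer type by comparing the output result values in step (f) is performed by a method comprising determining the cancer type showing the highest output result value as the cancer of the sample.
- An apparatus for diagnosing cancer and predicting cancer type, comprising: a decoding unit that extracts nucleic acids from a biological sample and decodes sequence information; an alignment unit that aligns the decoded sequences to a reference chromosome sequence database; a nucleic acid fragment analysis unit that derives the terminal sequence motif frequency and the size of nucleic acid fragments based on the aligned sequences; a data generation unit that generates vectorized data using the derived terminal sequence motif frequency and size of the nucleic acid fragments; a cancer diagnosis unit that inputs the generated vectorized data into a trained artificial intelligence model for analysis and determines the presence or absence of cancer by comparison with a reference value; and a cancer type prediction unit that predicts the cancer type by analyzing the output result values.
- A computer-readable storage medium comprising instructions configured to be executed by a processor that diagnoses cancer and predicts cancer type, the instructions predicting the presence or absence of cancer and the cancer type through the steps of: (a) extracting nucleic acids from a biological sample and obtaining sequence information; (b) aligning the obtained sequence information (reads) to a reference genome database; (c) deriving the terminal sequence motif frequency of nucleic acid fragments (fragments) and the size of the nucleic acid fragments using the aligned sequence information (reads); (d) generating vectorized data using the derived terminal sequence motif frequency and size of the nucleic acid fragments; (e) inputting the generated vectorized data into a trained artificial intelligence model and determining the presence or absence of cancer by comparing the resulting output value with a reference value (cut-off value); and (f) predicting the cancer type by comparing the output result values.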
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2022283089A AU2022283089A1 (en) | 2021-05-28 | 2022-05-30 | Method for diagnosing cancer and predicting cancer type by using terminal sequence motif frequency and size of cell-free nucleic acid fragment |
EP22811704.0A EP4350708A1 (en) | 2021-05-28 | 2022-05-30 | Method for diagnosing cancer and predicting cancer type by using terminal sequence motif frequency and size of cell-free nucleic acid fragment |
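CN202280038191.3A CN117897776A (zh) | 2021-05-28 | 2022-05-30 | Method for diagnosing cancer and predicting cancer type using terminal sequence motif frequency and size of cell-free nucleic acid fragments |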
CA3220412A CA3220412A1 (en) | 2021-05-28 | 2022-05-30 | Method for diagnosing cancer and predicting cancer type by using terminal sequence motif frequency and size of cell-free nucleic acid fragment |
US18/171,360 US20230260655A1 (en) | 2021-05-28 | 2023-02-19 | Method for diagnosing cancer and predicting cancer type by using terminal sequence motif frequency and size of cell-free nucleic acid fragment |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020210068891A KR20220160806A (ko) | 2021-05-28 | 2021-05-28 | Method for diagnosing cancer and predicting cancer type using terminal sequence motif frequency and size of cell-free nucleic acid fragments |
KR10-2021-0068891 | 2021-05-28 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/171,360 Continuation US20230260655A1 (en) | 2021-05-28 | 2023-02-19 | Method for diagnosing cancer and predicting cancer type by using terminal sequence motif frequency and size of cell-free nucleic acid fragment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022250513A1 (ko) | 2022-12-01 |
Family
ID=84229107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2022/007651 WO2022250513A1 (ko) | 2021-05-28 | 2022-05-30 | Method for diagnosing cancer and predicting cancer type using terminal sequence motif frequency and size of cell-free nucleic acid fragments |
Country Status (7)
Country | Link |
---|---|
US (1) | US20230260655A1 (ko) |
EP (1) | EP4350708A1 (ko) |
KR (1) | KR20220160806A (ko) |
CN (1) | CN117897776A (ko) |
AU (1) | AU2022283089A1 (ko) |
CA (1) | CA3220412A1 (ko) |
WO (1) | WO2022250513A1 (ko) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116083578A (zh) * | 2022-12-15 | 2023-05-09 | 华中科技大学同济医学院附属同济医院 | System and method for predicting neoadjuvant chemotherapy efficacy or high-risk recurrence classification in cervical cancer |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060246497A1 (en) | 2005-04-27 | 2006-11-02 | Jung-Tang Huang | Ultra-rapid DNA sequencing method with nano-transistors array based devices |
US20060275779A1 (en) | 2005-06-03 | 2006-12-07 | Zhiyong Li | Method and apparatus for molecular analysis using nanowires |
US20070087362A1 (en) | 2004-02-27 | 2007-04-19 | President And Fellows Of Harvard College | Polony fluorescent in situ sequencing beads |
US7244567B2 (en) | 2003-01-29 | 2007-07-17 | 454 Life Sciences Corporation | Double ended sequencing |
US20070194225A1 (en) | 2005-10-07 | 2007-08-23 | Zorn Miguel D | Coherent electron junction scanning probe interference microscope, nanomanipulator and spectrometer with assembler and DNA sequencing applications |
KR20180124550A (ko) | 2017-05-12 | 2018-11-21 | 한국전자통신연구원 | System and method for recommending user schedules through association pattern learning |
KR20190001741A (ko) | 2017-06-28 | 2019-01-07 | 삼성전자주식회사 | Antenna device and electronic device including antenna |
KR20190003676A (ko) | 2016-05-02 | 2019-01-09 | 코닝 인코포레이티드 | Laminated glass structure having optical clarity and method for manufacturing same |
KR20190036494A (ko) * | 2017-09-27 | 2019-04-04 | 이화여자대학교 산학협력단 | Method for predicting cancer type based on DNA copy number variation |
US20190189242A1 (en) * | 2017-12-18 | 2019-06-20 | Personal Genome Diagnostics Inc. | Machine learning system and method for somatic mutation discovery |
WO2020125709A1 (en) | 2018-12-19 | 2020-06-25 | The Chinese University Of Hong Kong | Cell-free dna end characteristics |
KR20200101106A (ko) * | 2019-02-19 | 2020-08-27 | 주식회사 녹십자지놈 | Method for predicting liver cancer treatment prognosis based on cell-free DNA in blood |
KR20200108938A (ko) * | 2019-03-04 | 2020-09-22 | 주식회사 엑소퍼트 | Method and system for providing cancer diagnosis information using artificial intelligence-based liquid biopsy by exosomes |
US10975431B2 (en) | 2018-05-18 | 2021-04-13 | The Johns Hopkins University | Cell-free DNA for assessing and/or treating cancer |
2021
- 2021-05-28 KR KR1020210068891A patent/KR20220160806A/ko unknown
2022
- 2022-05-30 AU AU2022283089A patent/AU2022283089A1/en active Pending
- 2022-05-30 EP EP22811704.0A patent/EP4350708A1/en active Pending
- 2022-05-30 WO PCT/KR2022/007651 patent/WO2022250513A1/ko active Application Filing
- 2022-05-30 CN CN202280038191.3A patent/CN117897776A/zh active Pending
- 2022-05-30 CA CA3220412A patent/CA3220412A1/en active Pending
2023
- 2023-02-19 US US18/171,360 patent/US20230260655A1/en active Pending
Non-Patent Citations (32)
Title |
---|
BRANTON D ET AL., NAT. BIOTECHNOL., vol. 26, no. 10, 2008, pages 1146 - 53 |
CLEMENT N. L. ET AL., BIOINFORMATICS, vol. 26, no. 10, 2010, pages 1284 - 90 |
EDWARDS J. R. ET AL., MUT. RES., vol. 573, no. 1-2, 2005, pages 3 - 12 |
FAHLGREN N ET AL., RNA, vol. 15, 2009, pages 992 - 1002 |
GNIRKE A ET AL., NAT. BIOTECHNOL., vol. 27, no. 2, 2009, pages 182 - 9 |
HANNA G. J. ET AL., J. CLIN. MICROBIOL., vol. 38, no. 7, 2000, pages 2715 - 21 |
HINTON GEOFFREY ET AL., IEEE SIGNAL PROCESSING MAGAZINE, vol. 29, no. 6, 2012, pages 82 - 97 |
HOMER N ET AL., PLOS ONE, vol. 4, no. 11, 2009, pages e7767 |
JIANG PEIYONG, SUN KUN, PENG WENLEI, CHENG SUK HANG, NI MENG, YEUNG PHILIP C., HEUNG MACY M.S., XIE TINGTING, SHANG HUIMIN, ZHOU Z: "Plasma DNA End-Motif Profiling as a Fragmentomic Marker in Cancer, Pregnancy, and Transplantation", CANCER DISCOVERY, AMERICAN ASSOCIATION FOR CANCER RESEARCH, US, vol. 10, no. 5, 1 May 2020 (2020-05-01), US , pages 664 - 673, XP093007557, ISSN: 2159-8274, DOI: 10.1158/2159-8290.CD-19-0622 * |
KENT W. J., GENOME RES., vol. 12, no. 4, 2002, pages 656 - 64 |
KRISHNAKUMAR S ET AL., PROC. NATL. ACAD. SCI. USA, vol. 105, 2008, pages 9296 - 9310 |
LANGMEAD B ET AL., GENOME BIOL, vol. 10, no. 3, 2009, pages R25 |
LASKEN R. S., CURR. OPIN. MICROBIOL., vol. 10, no. 5, 2007, pages 510 - 6 |
LI H ET AL., GENOME RES, vol. 18, no. 11, 2008, pages 1851 - 8 |
LUNTER G., GOODSON M., GENOME RES, 2010 |
METZKER, M, NATURE BIOTECHNOLOGY REVIEWS, vol. 11, 2010, pages 31 - 46 |
MULLER T ET AL., BIOINFORMATICS, vol. 17, 2001, pages S182 - 9 |
NING Z ET AL., GENOME RES, vol. 11, no. 10, 2001, pages 1725 - 9 |
ONDOV B.D. ET AL., BIOINFORMATICS, vol. 24, no. 23, 2008, pages 2776 - 2396 |
PEIYONG JIANG ET AL., CANCER DISCOVERY, vol. 10, 2020, pages 664 - 673 |
PORRECA GJ ET AL., NATURE METHODS, vol. 4, 2007, pages 931 - 936 |
RUMBLE S.M. ET AL., PLOS COMPUT. BIOL., vol. 5, no. 5, 2009, pages e1000386 |
SHI H ET AL., J. COMPUT. BIOL., vol. 17, no. 4, 2010, pages 603 - 15 |
SMITH A.D. ET AL., BIOINFORMATICS, vol. 25, no. 15, 2009, pages 1966 - 2521 |
TEWHEY R ET AL., NATURE BIOTECH, vol. 27, 2009, pages 1025 - 1031 |
TRAPNELL C., SALZBERG S.L., NATURE BIOTECH., vol. 27, 2009, pages 455 - 457 |
TURNER EH ET AL., NATURE METHODS, vol. 6, 2009, pages 315 - 316 |
WARREN R ET AL., BIOINFORMATICS, vol. 23, 2007, pages 500 - 501 |
WEESE D ET AL., GENOME RESEARCH, vol. 19, 2009, pages 1646 - 1654 |
WU T.D., WATANABE C.K., BIOINFORMATICS, vol. 21, no. 9, 2005, pages 1859 - 75 |
ZERBINO D.R., BIRNEY E., GENOME RES., vol. 18, 2008, pages 821 - 829 |
ZHOU, XIONGHUI ET AL., BIORXIV, 2020.07.16.201350 |
Also Published As
Publication number | Publication date |
---|---|
AU2022283089A9 (en) | 2024-01-04 |
EP4350708A1 (en) | 2024-04-10 |
US20230260655A1 (en) | 2023-08-17 |
CA3220412A1 (en) | 2022-12-01 |
KR20220160806A (ko) | 2022-12-06 |
CN117897776A (zh) | 2024-04-16 |
AU2022283089A1 (en) | 2023-12-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
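WO2021107676A1 (ko) | | Artificial intelligence-based method for detecting chromosomal abnormalities |
WO2022114631A1 (ko) | | Artificial intelligence-based method for cancer diagnosis and cancer type prediction |
WO2022250513A1 (ko) | | Method for diagnosing cancer and predicting cancer type using terminal sequence motif frequency and size of cell-free nucleic acid fragments |
WO2022097844A1 (ko) | | Method for predicting survival prognosis of pancreatic cancer patients using gene copy number variation information |
JP2024028758A (ja) | | Method for detecting chromosomal abnormalities using distance information between nucleic acid fragments |
WO2022250514A1 (ko) | | Method for cancer diagnosis and cancer type prediction based on cell-free nucleic acids and image analysis technology |
WO2024117792A1 (ko) | | Method for diagnosing cancer and predicting cancer type using terminal sequence motif frequency and size of cell-free nucleic acid fragments |
KR102452413B1 (ko) | | Method for detecting chromosomal abnormalities using distance information between nucleic acid fragments |
WO2023075402A1 (ko) | | Method for cancer diagnosis and cancer type prediction using methylated cell-free nucleic acids |
WO2022108407A1 (ko) | | Method for cancer diagnosis and prognosis prediction using nucleic acid length ratio |
WO2023080586A1 (ko) | | Method for cancer diagnosis using position-specific sequence frequency and size of cell-free nucleic acid fragments |
WO2022250512A1 (ko) | | Artificial intelligence-based method for early cancer diagnosis using cell-free DNA distribution in tissue-specific regulatory regions |
WO2022203437A1 (ko) | | Artificial intelligence-based method for detecting tumor-derived mutations in cell-free DNA and method for early cancer diagnosis using same |
WO2024096538A1 (ko) | | DNA methylation markers for diagnosing liver cancer and uses thereof |
KR20220062839A (ko) | | Artificial intelligence-based method for determining fetal fraction in maternal samples |
Huang | | Computational Discovery and Annotations of Cell-Type Specific Long-Range Gene Regulation |
KR101023163B1 (ko) | | Computer-implemented biological sequence identifier system and method |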
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22811704 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2301007635 Country of ref document: TH |
|
WWE | Wipo information: entry into national phase |
Ref document number: 3220412 Country of ref document: CA |
|
WWE | Wipo information: entry into national phase |
Ref document number: P6003073/2023 Country of ref document: AE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202280038191.3 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023573426 Country of ref document: JP Ref document number: 805960 Country of ref document: NZ Ref document number: 2022283089 Country of ref document: AU Ref document number: AU2022283089 Country of ref document: AU |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112023024444 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 2022283089 Country of ref document: AU Date of ref document: 20220530 Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022811704 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2022811704 Country of ref document: EP Effective date: 20240102 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 523451696 Country of ref document: SA |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01E Ref document number: 112023024444 Country of ref document: BR Free format text: SUBMIT THE DECLARATION CONTAINING ALL HOLDERS AND OTHER DATA OF PRIORITY KR 10-2021-0068891 OF 28/05/2021, IN ACCORDANCE WITH THE SOLE PARAGRAPH OF ART. 15 OF PORTARIA/INPI/NO 39/2021; THE DECLARATION SUBMITTED DOES NOT CONTAIN THE COMPLETE INFORMATION. THE REQUIREMENT MUST BE ANSWERED WITHIN 60 (SIXTY) DAYS OF ITS PUBLICATION AND MUST BE MADE BY MEANS OF PETITION GRU SERVICE CODE 207. |
|
ENP | Entry into the national phase |
Ref document number: 112023024444 Country of ref document: BR Kind code of ref document: A2 Effective date: 20231123 |