KR101832834B1 - Method and system for detecting variants based on multiple dot plot analysis - Google Patents
- Publication number: KR101832834B1
- Application number: KR1020170128472A
- Authority: KR (South Korea)
- Prior art keywords: sequence, genome, gap, assembly, information
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G: PHYSICS
- G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20: Sequence assembly
- G16B30/10: Sequence alignment; Homology search
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B45/00: ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
- G06F19/20
- G06F19/24
- G06F19/26
Description
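Figure 2 shows an example in which a gap in the reference genome was closed (gap_367) and an example in which a gap was extended (gap_368), using the de novo assembled Asian genome AK1 assembly and BACs.
Figure 3 shows an example in which a gap in the reference genome was closed (gap_367) and an example in which a gap was extended (gap_368) by constructing a multiple dot plot from the de novo assembled Asian genome AK1 assembly.
Figure 4 shows, for each chromosome, the number of euchromatic gaps closed (red) relative to the number of gaps present (gray), using the de novo assembled Asian genome AK1 assembly.
Figure 5 is a graph showing the number of gaps closed by the AK1 assembly alone, by local assembly, and by long reads alone. It also shows the number of gaps extended by the AK1 assembly alone, the number of gaps extended by long reads, and the number of open gaps.
Figure 6 shows the overall distribution of deletions, insertions, translocations, and complex variants identified by directly comparing the existing reference genome GRCh37 with the AK1 assembly at base resolution. The outer pie chart indicates the novel portion of each SV type. Of the 18,210 SVs identified, 65% (11,927) are novel SVs that have not been reported previously.
Figure 7 shows the distribution of insertion and deletion variants found when comparing AK1 with GRCh37, relative to previously reported SVs.
Figure 8 shows the allele frequencies of 45 Asian-specific insertion variants.
Figure 9 shows that the Asian-specific insertion in ANO2 occurs within an East Asian (EAS) linkage disequilibrium (LD) block.
Figure 10 shows a genome-wide map of heterozygous regions and the expression of haplotype A and haplotype B on a log scale.
Figure 11 shows the HLA genes in the haplotype-phased MHC class II region.
Figure 12 shows the haplotypes of CYP2D6 and CYP2D7 identified by de novo assembly-based phasing.
Figures 13 to 16 show examples of gaps in the reference sequence that were closed by multiple dot plot analysis of the AK1 assembly, a de novo assembled test sequence, against the reference genome GRCh38.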
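As a rough illustration of the dot plot idea behind Figures 3 and 13 to 16, the following minimal Python sketch marks every position where a k-mer of a test sequence exactly matches a k-mer of a reference sequence. It is a generic exact-match k-mer dot plot, not the multiple dot plot procedure claimed in this patent. The function name `kmer_dotplot`, the toy sequences, and the choice of k = 4 are assumptions made only for this example.

```python
from collections import defaultdict


def kmer_dotplot(reference, test, k=4):
    """Return sorted (ref_pos, test_pos) pairs where k-mers match exactly.

    Positions are 0-based start offsets. Reference k-mers that contain
    'N' (gap bases) are skipped, so a gap never produces a dot.
    """
    # Index every gap-free k-mer of the reference by its start position.
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        if "N" not in kmer:
            index[kmer].append(i)

    # Look up each k-mer of the test sequence in the reference index.
    dots = []
    for j in range(len(test) - k + 1):
        for i in index.get(test[j:j + k], []):
            dots.append((i, j))
    return sorted(dots)


# Hypothetical toy data: a reference with a 4-base gap (NNNN) and a
# test sequence that carries real bases across the same interval.
reference = "ACGTTGCANNNNGGATCCTA"
test = "ACGTTGCATTTTGGATCCTA"
dots = kmer_dotplot(reference, test, k=4)
for ref_pos, test_pos in dots:
    print(ref_pos, test_pos)
```

In this toy case the dots form two diagonal runs, one on each side of the gap, and the test sequence is continuous between them. A pattern of this kind is what one would look for when judging whether an assembled sequence spans a reference gap.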
Item | AK1 | GRCh38 |
Assembly method | WGS and BAC | BAC and fosmid |
Sequencing and physical mapping | PacBio and BioNano | Sanger, FISH, OM and fingerprint contigs |
De novo assembly algorithm | FALCON | Multiple methods |
Phasing method | De novo | NA |
Scaffold/contig N50 (Mb) | 44.85/17.92 | 67.69/56.41 |
Scaffold/contig L50 | 21/50 | 16/19 |
Number of scaffolds/contigs | 2,832/4,206 | 735/1,385 |
Total gap length (Mb) |  | 159.97 |
Number of gaps | 264* | 999** |
Total bases/non-N bases in assembly (bp) | 2,904,207,288/2,866,687,809 | 3,209,286,105/3,049,316,098 |
Phased block N50 (Mb) | 11.55 | NA |
Number of haplotigs | 18,964 | NA |
Haplotig N50 (kb) | 875 | NA |
Total haplotig length (bp) | 4,804,460,182 | NA |
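The scaffold/contig N50 and L50 rows in the table above use the standard assembly statistics. As a minimal sketch, assuming the usual definitions (N50 is the length of the shortest sequence among the longest sequences that together cover at least half of the total assembly length, and L50 is how many sequences that takes), they can be computed as follows. The function name `n50_l50` and the toy lengths are hypothetical and are not taken from the AK1 or GRCh38 assemblies.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly with a total length of 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_contigs)
print(n50, l50)  # prints "70 2": 80 + 70 reaches half of 300
```

A larger N50 together with a smaller L50 indicates a more contiguous assembly, which is how the two columns of the table can be compared.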
Chr | SNPs Het | SNPs Hom | INDELs Het | INDELs Hom | SVs Het | SVs Hom |
chr1 | 153,853 | 124,818 | 7,358 | 6,514 | 1,863 | 323 |
chr2 | 159,672 | 133,893 | 7,721 | 6,941 | 1,880 | 323 |
chr3 | 128,221 | 114,943 | 5,972 | 5,728 | 1,495 | 265 |
chr4 | 138,860 | 120,254 | 6,695 | 5,856 | 1,658 | 234 |
chr5 | 120,983 | 91,714 | 5,303 | 4,712 | 1,301 | 216 |
chr6 | 129,980 | 101,781 | 6,106 | 5,162 | 1,470 | 295 |
chr7 | 114,395 | 87,045 | 5,255 | 4,610 | 1,782 | 294 |
chr8 | 111,101 | 77,958 | 4,555 | 3,586 | 1,209 | 224 |
chr9 | 89,046 | 64,945 | 3,842 | 3,009 | 1,088 | 200 |
chr10 | 94,171 | 75,201 | 4,370 | 3,806 | 1,252 | 209 |
chr11 | 92,486 | 78,056 | 4,191 | 4,007 | 1,167 | 202 |
chr12 | 94,828 | 75,902 | 4,689 | 4,139 | 1,225 | 232 |
chr13 | 70,731 | 64,056 | 3,272 | 3,342 | 989 | 173 |
chr14 | 68,444 | 47,451 | 3,147 | 2,471 | 671 | 102 |
chr15 | 54,878 | 46,343 | 2,550 | 2,531 | 664 | 122 |
chr16 | 62,360 | 44,862 | 2,714 | 2,088 | 844 | 115 |
chr17 | 49,833 | 39,326 | 2,573 | 2,330 | 933 | 172 |
chr18 | 54,672 | 46,805 | 2,604 | 2,413 | 756 | 133 |
chr19 | 49,278 | 29,131 | 2,596 | 1,759 | 923 | 137 |
chr20 | 42,647 | 31,038 | 1,981 | 1,633 | 651 | 121 |
chr21 | 28,924 | 20,578 | 1,361 | 1,448 | 482 | 81 |
chr22 | 28,340 | 18,773 | 1,268 | 1,106 | 593 | 65 |
Total | 1,937,703 | 1,534,873 | 90,123 | 79,191 | 24,896 | 4,238 |
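The Total row can be checked against the per-chromosome rows. The short Python sketch below does this for the two SNP columns only, with the counts transcribed from the table above. The variable names are hypothetical, and the heterozygous-to-homozygous ratio is added purely as an illustration rather than as a statistic reported in the source.

```python
# (chromosome, heterozygous SNPs, homozygous SNPs), transcribed from the table.
SNP_COUNTS = [
    ("chr1", 153_853, 124_818), ("chr2", 159_672, 133_893),
    ("chr3", 128_221, 114_943), ("chr4", 138_860, 120_254),
    ("chr5", 120_983, 91_714), ("chr6", 129_980, 101_781),
    ("chr7", 114_395, 87_045), ("chr8", 111_101, 77_958),
    ("chr9", 89_046, 64_945), ("chr10", 94_171, 75_201),
    ("chr11", 92_486, 78_056), ("chr12", 94_828, 75_902),
    ("chr13", 70_731, 64_056), ("chr14", 68_444, 47_451),
    ("chr15", 54_878, 46_343), ("chr16", 62_360, 44_862),
    ("chr17", 49_833, 39_326), ("chr18", 54_672, 46_805),
    ("chr19", 49_278, 29_131), ("chr20", 42_647, 31_038),
    ("chr21", 28_924, 20_578), ("chr22", 28_340, 18_773),
]

# Totals as printed in the Total row of the table.
REPORTED_TOTALS = (1_937_703, 1_534_873)

het_total = sum(het for _, het, _ in SNP_COUNTS)
hom_total = sum(hom for _, _, hom in SNP_COUNTS)
totals_match = (het_total, hom_total) == REPORTED_TOTALS

# Illustrative summary only: heterozygous calls per homozygous call.
het_hom_ratio = het_total / hom_total
print(het_total, hom_total, totals_match, round(het_hom_ratio, 3))
```

For the SNP columns, the per-chromosome counts sum exactly to the reported totals.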
Chr | Start | End | Type | Phase | SV_ID (Haplotig position) | Identity | BAC Consensus |
chr1 | 113,958,943 | 113,958,943 | INS | A | A_01550005_005:96570-96835 | Perfect match | tig00002142 |
chr1 | 246,989,196 | 246,989,257 | DEL | A | A_00850001_001:536877-536877 | Perfect match | tig00000198 |
chr2 | 98,582,146 | 98,582,146 | INS | A | A_00730005_006:525155-525459 | Perfect match | tig00000392 |
chr2 | 119,652,804 | 119,652,804 | INS | B | B_01630002_002:76084-76195 | Perfect match | tig00000221 |
chr4 | 1,357,963 | 1,358,091 | DEL | A | A_00790001_001:2199723-2199723 | Perfect match | tig00000581 |
chr4 | 1,391,891 | 1,392,161 | DEL | A | A_00790001_001:2164750-2164750 | Perfect match | tig00000581 |
chr4 | 1,421,173 | 1,421,173 | INS | A | A_00790001_001:2134820-2135686 | Perfect match | tig00000581 |
chr5 | 180,473,219 | 180,473,219 | INS | A | A_00870001_013:252762-253831 | Reasonable match | tig00000671 |
chr6 | 31,274,795 | 31,274,929 | DEL | B | B_00400065_003:8064-8064 | Perfect match | tig00000117 |
chr6 | 31,288,028 | 31,288,262 | DEL | B | B_00400066_001:19587-19587 | Perfect match | tig00000559 |
chr6 | 31,296,782 | 31,296,782 | INS | A | A_00400001_001:1222942-1223286 | Perfect match | tig00000559 |
chr6 | 31,297,394 | 31,297,394 | INS | A | A_00400001_001:1221538-1222326 | Perfect match | tig00000559 |
chr7 | 91,214,216 | 91,220,724 | DEL | A | A_00370008_002:1198113-1198113 | Perfect match | tig00000431 |
chr7 | 142,098,198 | 142,276,193 | COMPLEX | A | A_01470003_003:516785-707087 | Perfect match (Half covered) | tig00000418 |
chr8 | 40,748,795 | 40,748,795 | INS | A | A_01790003_041:80596-80651 | Perfect match | tig00000737 |
chr8 | 58,127,808 | 58,127,808 | INS | B | B_01790055_010:23705-29088 | Perfect match | tig00000614 |
chr8 | 58,129,671 | 58,129,671 | INS | B | B_01790055_010:30927-36248 | Perfect match | tig00000614 |
chr8 | 58,132,170 | 58,132,170 | INS | B | B_01790055_010:38759-41799 | Perfect match | tig00000614, tig00000617 |
chr8 | 58,133,432 | 58,133,447 | COMPLEX | B | B_01790055_010:43074-44511 | Perfect match | tig00000614 |
chr8 | 58,134,140 | 58,134,162 | COMPLEX | B | B_01790055_010:45224-49532 | Perfect match | tig00000614 |
chr8 | 144,744,161 | 144,744,161 | INS | B | B_00010011_001:63909-64545 | Perfect match | tig00000658 |
chr8 | 144,744,421 | 144,744,421 | INS | B | B_00010011_001:63071-63649 | Perfect match | tig00000658 |
chr8 | 144,749,197 | 144,749,247 | DEL | B | B_00010011_001:58316-58316 | Perfect match | tig00000658 |
chr9 | 72,092,330 | 72,121,287 | DEL | B | B_01450125_001:229057-229057 | Perfect match | tig00002138, tig00002139, tig00002141 |
chr9 | 73,322,438 | 73,340,431 | DEL | A | A_01450002_003:886988-886988 | Perfect match | tig00000079 |
chr10 | 124,440,913 | 124,440,913 | INS | A | A_01800001_005:119801-119978 | Perfect match | tig00000028 |
chr11 | 980,298 | 980,298 | INS | A | A_00970002_002:587698-588127 | Perfect match | tig00000703 |
chr11 | 1,017,240 | 1,017,240 | INS | B | B_00970005_001:6289-7965 | Perfect match | tig00000550 |
chr11 | 93,154,136 | 93,160,197 | DEL | B | B_01000047_001:358749-358749 | Perfect match | tig00000407 |
chr17 | 225,442 | 225,496 | DEL | B | B_00490012_006:37798-37798 | Perfect match | tig00000486 |
chr19 | 2,131,645 | 2,131,645 | INS | B | B_00670013_001:27868-27922 | Perfect match | tig00000482 |
chr19 | 2,200,975 | 2,201,029 | DEL | A | A_00670001_003:176148-176148 | Perfect match | tig00000482 |
chr19 | 8,349,945 | 8,364,794 | DEL | A | A_00670001_002:1228743-1228743 | Perfect match | tig00000303 |
chr19 | 9,322,407 | 9,322,407 | INS | A | A_00080001_001:459699-459768 | Perfect match | tig00000355 |
chr19 | 40,373,874 | 40,389,581 | DEL | B | B_01720013_001:112919-112919 | Perfect match | tig00000650, tig00000668 |
chr20 | 1,592,433 | 1,592,433 | INS | A | A_01370001_002:1107999-1108163 | Perfect match | tig00000000 |
chr20 | 1,778,684 | 1,778,684 | INS | B | B_01370008_001:49495-49609 | Perfect match | tig00000532 |
chr20 | 1,857,765 | 1,857,765 | INS | B | B_01370008_001:128670-137880 | Perfect match (Half covered) | tig00000611 |
chr20 | 4,399,805 | 4,399,860 | DEL | A | A_01370001_002:3918691-3918691 | Perfect match | tig00000499 |
chr20 | 7,458,207 | 7,458,445 | DEL | A | A_01370001_001:804521-804521 | Perfect match | tig00000430 |
chr20 | 8,586,996 | 8,587,157 | DEL | A | A_01370001_001:1933527-1933527 | Perfect match | tig00000391 |
chr20 | 16,133,474 | 16,133,474 | INS | A | A_01370002_001:267072-267124 | Reasonable match | tig00000146 |
chr20 | 16,169,497 | 16,169,497 | INS | A | A_01370002_001:303056-303200 | Perfect match | tig00000146 |
chr20 | 18,794,815 | 18,795,135 | DEL | B | B_01370014_001:474483-474483 | Perfect match | tig00000133 |
chr20 | 23,526,216 | 23,526,353 | DEL | A | A_01370003_001:340134-340134 | Perfect match | tig00000010 |
chr20 | 23,527,766 | 23,527,766 | INS | A | A_01370003_001:341528-341696 | Perfect match | tig00000010 |
chr20 | 25,540,289 | 25,540,472 | COMPLEX | A | A_01370003_001:2357667-2357775 | Perfect match | tig00000398 |
chr20 | 33,219,540 | 33,219,594 | DEL | A | A_01640001_001:6178696-6178696 | Perfect match | tig00000656 |
chr20 | 35,188,116 | 35,188,116 | INS | A | A_01640001_001:4244038-4244229 | Perfect match | tig00000232 |
chr20 | 35,190,278 | 35,190,443 | DEL | A | A_01640001_001:4241881-4241881 | Perfect match | tig00000232 |
chr20 | 51,845,284 | 51,845,341 | DEL | B | B_01640034_001:51135-51135 | Perfect match | tig00000332 |
chr20 | 54,202,888 | 54,202,888 | INS | B | B_01640046_001:984223-984298 | Perfect match | tig00000087 |
chr20 | 54,203,188 | 54,203,188 | INS | B | B_01640046_001:983800-983922 | Perfect match | tig00000087 |
chr20 | 54,406,672 | 54,406,672 | INS | A | A_01640002_001:4128535-4128676 | Perfect match | tig00000668 |
chr20 | 55,992,472 | 55,992,472 | INS | A | A_01640002_003:671879-671945 | Perfect match | tig00000337 |
chr20 | 58,874,630 | 58,874,630 | INS | A | A_01640002_004:339618-339674 | Perfect match | tig00000312 |
chr20 | 59,367,195 | 59,367,195 | INS | A | A_01640002_004:833088-833159 | Perfect match | tig00000290 |
chr20 | 59,478,144 | 59,478,144 | INS | A | A_01640002_004:944093-944180 | Perfect match | tig00000290 |
chr20 | 59,556,318 | 59,556,318 | INS | B | B_01640053_001:153874-153958 | Perfect match | tig00000003 |
chr20 | 59,604,103 | 59,604,103 | INS | B | B_01640053_001:106124-106174 | Perfect match | tig00000003 |
chr20 | 59,621,200 | 59,621,275 | DEL | B | B_01640053_001:89008-89008 | Perfect match | tig00000003 |
chr20 | 59,865,494 | 59,865,494 | INS | A | A_01640002_004:1331313-1331530 | Perfect match | tig00000606 |
chr20 | 59,974,166 | 59,974,487 | DEL | A | A_01640002_004:1444321-1444321 | Perfect match | tig00000895 |
chr20 | 60,948,560 | 60,948,700 | DEL | A | A_01640002_004:2427006-2427006 | Perfect match | tig00000390 |
chr20 | 60,992,933 | 60,992,933 | INS | A | A_01640002_004:2472158-2472210 | Perfect match | tig00000390 |
chr20 | 62,807,430 | 62,807,430 | INS | B | B_01640059_001:34217-34366 | Perfect match | tig00000145 |
chr20 | 62,902,209 | 62,902,425 | DEL | A | A_01640003_001:5674-5674 | Perfect match | tig00000145 |
chr22 | 24,195,933 | 24,198,505 | DEL | B | B_01040007_001:81879-81879 | Perfect match | tig00000305 |
HLA Gene | MHC Haplotig A | MHC Haplotig B |
HLA-A | A*32:01:01 | A*24:02:01:01 |
HLA-B | B*51:01:01:01 | B*58:01:01 |
HLA-C | C*03:02:02:01 | C*14:02:01 |
HLA-DRB1 | DRB1*03:01:01:01 | DRB1*15:01:01:01 |
HLA-DRB3 | DRB3*02:02:01:01 | DRB3*02:02:01:01 |
HLA-DQA1 | DQA1*05:01:01:01 | DQA1*01:02:01:01 |
HLA-DQB1 | DQB1*06:02:01 | DQB1*02:01:01 |
HLA-DPA1 | DPA1*01:03:01:01 | DPA1*02:02:02 |
HLA-DPB1 | DPB1*02:01:02 | DPB1*05:01:01 |
Chr | Start | End | dbSNP144 | Gene | Ref | Alt | Num. variants in this gene |
chr9 | 136297737 | 136297737 | novel | ADAMTS13 | C | G | 3 |
chr9 | 136301982 | 136301982 | rs2301612 | ADAMTS13 | C | G | 3 |
chr9 | 136305530 | 136305530 | novel | ADAMTS13 | C | G | 3 |
chr16 | 3293888 | 3293888 | rs1231122 | MEFV | C | T | 4 |
chr16 | 3299468 | 3299468 | rs11466024 | MEFV | C | T | 4 |
chr16 | 3299586 | 3299586 | rs11466023 | MEFV | G | A | 4 |
chr16 | 3304626 | 3304626 | rs3743930 | MEFV | C | G | 4 |
Chr | Function | Haplotig: Position | Clinvar (20150629) | Polyphen2 HDIV score | Polyphen2 HDIV pred | Polyphen2 HVAR score | Polyphen2 HVAR pred |
chr9 | nonsynonymous SNV | B_00130001_003:132374 | NA | 0.999 | D | 0.917 | D |
chr9 | nonsynonymous SNV | B_00130001_003:136616 | Pathogenic | 0 | B | 0 | B |
chr9 | nonsynonymous SNV | B_00130001_003:140165 | NA | 1 | D | 0.998 | D |
chr16 | nonsynonymous SNV | A_01690003_003:604996 | Pathogenic | . | . | . | . |
chr16 | nonsynonymous SNV | A_01690003_003:610574 | NA | 0.259 | B | 0.045 | B |
chr16 | nonsynonymous SNV | A_01690003_003:610448 | NA | 0.959 | D | 0.503 | P |
chr16 | nonsynonymous SNV | A_01690003_003:615726 | Pathogenic | 0.995 | D | 0.851 | P |
Chr | SIFT score | SIFT pred | Trans/Cis | GWAS Catalog | Haplotype | Expressed RNA-Seq Allele Count A | Expressed RNA-Seq Allele Count B |
chr9 | 0.21 | T | CIS | NA | B | NA | NA |
chr9 | 1 | T | CIS | NA | B | 2 | 4 |
chr9 | 0.12 | T | CIS | NA | B | NA | NA |
chr16 | 0.26 | T | CIS | NA | A | NA | NA |
chr16 | 0.23 | T | CIS | NA | A | NA | NA |
chr16 | 0.05 | D | CIS | NA | A | 0 | 2 |
chr16 | 0.01 | D | CIS | NA | A | NA | NA |
Chr | 1000g2015aug All | 1000g2015aug EAS | 1000g2015aug SAS | 1000g2015aug EUR | 1000g2015aug AFR | 1000g2015aug AMR |
chr9 | 0.00439297 | 0.0188 | 0.002 | 0.001 | NA | NA |
chr9 | 0.271565 | 0.1835 | 0.4325 | 0.4254 | 0.0408 | 0.389 |
chr9 | 0.0323482 | 0.0198 | 0.0276 | 0.0825 | 0.0053 | 0.036 |
chr16 | 0.353634 | 0.3958 | 0.2352 | 0.4423 | 0.3306 | 0.3746 |
chr16 | 0.0171725 | 0.0546 | 0.0204 | 0.004 | 0.0015 | 0.0072 |
chr16 | 0.0201677 | 0.0675 | 0.0215 | 0.004 | 0.0023 | 0.0072 |
chr16 | 0.126398 | 0.2887 | 0.3047 | 0.0089 | 0.0204 | 0.0115 |
Chr | Clinvar (details) | AA Change |
chr9 | NA | ADAMTS13:ENST00000536611.1:exon3:c.C32G:p.T11R,ADAMTS13:ENST00000371916.1:exon7:c.C715G:p.Q239E,ADAMTS13:ENST00000355699.2:exon9:c.C1016G:p.T339R,ADAMTS13:ENST00000356589.2:exon9:c.C923G:p.T308R,ADAMTS13:ENST00000371929.3:exon9:c.C1016G:p.T339R |
chr9 | CLINSIG=pathogenic;CLNDBN=Upshaw-Schulman_syndrome;CLNREVSTAT=no_assertion_criteria_provided;CLNACC=RCV000006169.3;CLNDSDB=MedGen:OMIM:Orphanet:SNOMED_CT;CLNDSDBID=C1268935:274150:ORPHA54057:373420004 | ADAMTS13:ENST00000536611.1:exon6:c.C358G:p.Q120E,ADAMTS13:ENST00000355699.2:exon12:c.C1342G:p.Q448E,ADAMTS13:ENST00000356589.2:exon12:c.C1249G:p.Q417E,ADAMTS13:ENST00000371929.3:exon12:c.C1342G:p.Q448E |
chr9 | NA | ADAMTS13:ENST00000536611.1:exon10:c.C868G:p.P290A,ADAMTS13:ENST00000355699.2:exon16:c.C1852G:p.P618A,ADAMTS13:ENST00000356589.2:exon16:c.C1759G:p.P587A,ADAMTS13:ENST00000371929.3:exon16:c.C1852G:p.P618A |
chr16 | CLINSIG=non-pathogenic,non-pathogenic;CLNDBN=not_provided,Familial_Mediterranean_fever;CLNREVSTAT=criteria_provided\x2c_single_submitter,criteria_provided\x2c_multiple_submitters\x2c_no_conflicts;CLNACC=RCV000126738.1,RCV000030177.2;CLNDSDB=MedGen,GeneReviews:MedGen:OMIM:Orphanet:SNOMED_CT;CLNDSDBID=CN221809,NBK1227:C0031069:249100:ORPHA342:12579009 | MEFV:ENST00000541159.1:exon8:c.G1306A:p.G436R |
chr16 | NA | MEFV:ENST00000536379.1:exon2:c.G590A:p.R197Q,MEFV:ENST00000541159.1:exon2:c.G590A:p.R197Q,MEFV:ENST00000219596.1:exon3:c.G1223A:p.R408Q,MEFV:ENST00000339854.4:exon3:c.G683A:p.R228Q |
chr16 | NA | MEFV:ENST00000536379.1:exon2:c.C472T:p.P158S,MEFV:ENST00000541159.1:exon2:c.C472T:p.P158S,MEFV:ENST00000219596.1:exon3:c.C1105T:p.P369S,MEFV:ENST00000339854.4:exon3:c.C565T:p.P189S |
chr16 | CLINSIG=pathogenic;CLNDBN=Familial_mediterranean_fever\x2c_autosomal_dominant;CLNREVSTAT=no_assertion_criteria_provided;CLNACC=RCV000002664.1;CLNDSDB=MedGen:OMIM:Orphanet;CLNDSDBID=C1851347:134610:ORPHA342 | MEFV:ENST00000219596.1:exon2:c.G442C:p.E148Q |
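The Trans/Cis and Haplotype columns above can be related with a short sketch. It assumes, as an illustration only, that a gene is labeled CIS when all of its listed variants fall on the same haplotig (A or B) and TRANS otherwise; the function name `classify_phase` is hypothetical, and the gene, position, and haplotype values are copied from the table rows.

```python
from collections import defaultdict

# (gene, position, haplotype) copied from the table rows above.
variants = [
    ("ADAMTS13", 136297737, "B"),
    ("ADAMTS13", 136301982, "B"),
    ("ADAMTS13", 136305530, "B"),
    ("MEFV", 3293888, "A"),
    ("MEFV", 3299468, "A"),
    ("MEFV", 3299586, "A"),
    ("MEFV", 3304626, "A"),
]


def classify_phase(variants):
    """Label a gene CIS if all of its variants share one haplotype, else TRANS.

    This rule is an assumption made for illustration; the source table only
    reports the resulting labels.
    """
    haplotypes_by_gene = defaultdict(set)
    for gene, _position, haplotype in variants:
        haplotypes_by_gene[gene].add(haplotype)
    return {
        gene: "CIS" if len(haplotypes) == 1 else "TRANS"
        for gene, haplotypes in haplotypes_by_gene.items()
    }


phase = classify_phase(variants)
print(phase)
```

Under this assumption both genes come out as CIS, which agrees with the Trans/Cis column: the three ADAMTS13 variants sit on haplotig B and the four MEFV variants on haplotig A.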
Claims (8)
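- A computer-implemented genome analysis method comprising steps performed by the following means, in which reference assembly sequence information for a previously known target genome is corrected using a de novo genome-assembled test sequence:
(a) a genome assembly unit assembling the genome from the test sequence by de novo assembly to reconstruct the entire sequence;
(b) a dot plot generation unit generating a self-similarity dot plot of the reference sequence at the position where the sequence information is to be corrected;
(c) a dot plot analysis unit identifying a sequence gap on the self-similarity dot plot of the reference sequence generated in step (b);
(d) a sequence selection unit selecting, from the entire de novo genome-assembled test sequence, a partial sequence at the position corresponding to the reference sequence region where the sequence information is to be corrected;
(e) a dot plot generation unit generating a self-similarity dot plot of the de novo genome-assembled test sequence selected in step (d);
(f) a multiple dot plot analysis unit scaling the self-similarity dot plot of the reference sequence generated in step (b) and the self-similarity dot plot of the de novo genome-assembled test sequence generated in step (e) to the same size ratio and aligning them on a single screen, thereby identifying the sequence gap that appears in the reference sequence at the position corresponding to the de novo genome-assembled test sequence; and
(g) a sequence information correction unit closing the sequence gap identified in the reference sequence in step (f) with the de novo assembled test sequence at the corresponding position, or extending the gap.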
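- The genome analysis method of claim 1, optionally further comprising, in one or more of steps (a) to (g), a step in which an additional sequence information correction unit corrects the reference sequence information using local realignment, reassembly, or spanning long reads.
- The genome analysis method of claim 1, wherein the target genome is a genome sequence, or a portion thereof, derived from a prokaryote, eukaryote, bacterium, virus, animal, plant, or human.
- The genome analysis method of claim 1, wherein the correction of the reference sequence information is gap closing or gap extension.
- The genome analysis method of claim 1, wherein the de novo assembly of the test sequence combines one or more of PacBio SMRT long reads, BioNano Genomics next-generation maps, Illumina HiSeq short reads, 10X Genomics GemCode linked reads, and BAC clone sequencing.
- The method of claim 1, further comprising a step in which a genetic variation identification unit identifies genetic variations, including single nucleotide polymorphisms (SNPs), insertions/deletions (indels), or structural variants (SVs), in the genome sequence of the test subject.
- The method of claim 6, comprising a step in which a haplotype-specific genetic variation identification unit compares a sequence, in which a de novo assembly of haplotigs has been constructed from the de novo assembled test sequence so as to represent the haplotypes on a chromosome, with the genome sequence information of the test subject, thereby identifying haplotype-specific genetic variations including single nucleotide polymorphisms, indels, or structural variants.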
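- A system for correcting the reference assembly of a previously known target genome using a de novo assembly of a test sequence, the system comprising:
(a) a genome assembly unit that assembles the genome from the test sequence by de novo genome assembly to reconstruct the entire sequence;
(b) a dot plot generation unit that generates a self-similarity dot plot of the reference sequence at the position where the sequence information is to be corrected;
(c) a dot plot analysis unit that identifies a sequence gap on the self-similarity dot plot of the reference sequence;
(d) a sequence selection unit that selects, from the entire de novo genome-assembled test sequence, a partial sequence at the position corresponding to the reference sequence region where the sequence information is to be corrected;
(e) a dot plot generation unit that generates a self-similarity dot plot of the de novo genome-assembled test sequence;
(f) a multiple dot plot analysis unit that scales the self-similarity dot plot of the reference sequence and the self-similarity dot plot of the de novo genome-assembled test sequence to the same size ratio and aligns them on a single screen, thereby identifying the sequence gap that appears in the reference sequence at the position corresponding to the de novo genome-assembled test sequence; and
(g) a sequence information correction unit that closes the sequence gap appearing in the reference sequence with the de novo assembled test sequence at the corresponding position, or extends the gap.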
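The dot plot, gap identification, and gap closing steps recited in the claims can be sketched in a few lines of Python. This is a minimal illustration under assumptions that the claims do not state: exact k-mer matching for the dot plot, runs of `N` as the representation of a reference gap, and exact-match flanking anchors to locate the fill sequence. The names `self_dot_plot`, `find_gaps`, and `close_gap`, and the toy sequences, are hypothetical.

```python
def self_dot_plot(seq, k=4):
    """Return a square boolean matrix: plot[i][j] is True when the k-mers
    starting at i and j are identical. K-mers containing 'N' never match,
    so a gap appears as a blank band, including on the main diagonal."""
    n = max(len(seq) - k + 1, 0)
    positions = {}
    for i in range(n):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            positions.setdefault(kmer, []).append(i)
    plot = [[False] * n for _ in range(n)]
    for hits in positions.values():
        for i in hits:
            for j in hits:
                plot[i][j] = True
    return plot


def find_gaps(seq):
    """Return half-open (start, end) intervals covering each run of 'N'."""
    gaps, start = [], None
    for i, base in enumerate(seq):
        if base == "N" and start is None:
            start = i
        elif base != "N" and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(seq)))
    return gaps


def close_gap(reference, test, gap, flank=6):
    """Fill one reference gap with the test sequence found between the
    gap's left and right flanks; return the reference unchanged if the
    flanks cannot be located in the test sequence."""
    start, end = gap
    left = reference[max(0, start - flank):start]
    right = reference[end:end + flank]
    if not left or not right:
        return reference
    i = test.find(left)
    if i < 0:
        return reference
    j = test.find(right, i + len(left))
    if j < 0:
        return reference
    return reference[:start] + test[i + len(left):j] + reference[end:]


# Toy sequences (made up for illustration): the reference has a 6-base gap.
reference = "ACGTTGCAAG" + "NNNNNN" + "CTTAGGCATC"
test = "ACGTTGCAAG" + "TCTCTCTC" + "CTTAGGCATC"

ref_plot = self_dot_plot(reference)
test_plot = self_dot_plot(test)
gaps = find_gaps(reference)
closed = close_gap(reference, test, gaps[0])
print(gaps, closed)
```

In this toy case the reference plot has a break on its diagonal across the gap while the test plot's diagonal is continuous, which is the kind of contrast a side-by-side comparison of the two plots is meant to expose. Rendering the two matrices at a matched scale is left out; a real implementation would also need to tolerate mismatches and repeats in the flanks.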
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/880,934 US20180260521A1 (en) | 2017-03-09 | 2018-01-26 | Method and apparatus for multiple dot plot analysis |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20170030200 | 2017-03-09 | | |
KR1020170030200 | 2017-03-09 | | |
Publications (1)
Publication Number | Publication Date |
---|---|
KR101832834B1 (ko) | 2018-04-13 |
Family
ID=61974165
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020170128472A Active KR101832834B1 (ko) | 2017-03-09 | 2017-10-01 | Method and system for variant detection based on multiple dot plot analysis |
Country Status (2)
Country | Link |
---|---|
US (1) | US20180260521A1 (ko) |
KR (1) | KR101832834B1 (ko) |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
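KR102659357B1 (ko) | 2019-02-07 | 2024-04-19 | 삼성전자주식회사 | Electronic device for providing avatar animation and method thereof |
CN110853708B (zh) * | 2019-11-13 | 2022-03-08 | 上海仁东医学检验所有限公司 | Method for designing nucleic acid capture probes for HLA typing |
CN111445949A (zh) * | 2020-03-27 | 2020-07-24 | 武汉古奥基因科技有限公司 | Method for annotating the genome of plateau polyploid fish using nanopore sequencing data |
CN111724858B (zh) * | 2020-05-14 | 2024-06-07 | 东北林业大学 | Method for repairing gaps by running genome sequence alignment with software |
CN112164424B (zh) * | 2020-08-03 | 2024-04-09 | 南京派森诺基因科技有限公司 | Population evolution analysis method that does not rely on a reference genome |
CN112435712B (zh) * | 2020-11-20 | 2024-07-30 | 元码基因科技(苏州)有限公司 | Method and system for analyzing gene sequencing data |
CN112669902B (zh) * | 2021-03-16 | 2021-06-04 | 北京贝瑞和康生物技术有限公司 | Method, computing device, and storage medium for detecting genomic structural variation |
CN113035269B (zh) * | 2021-04-16 | 2022-11-01 | 北京计算科学研究中心 | Method for constructing, optimizing, and visualizing genome metabolic models based on high-throughput sequencing technology |
CN113257358A (zh) * | 2021-05-27 | 2021-08-13 | 山东建筑大学 | Method and device for one-sided genome fragment filling based on contigs |
EP4533461A1 (en) * | 2022-05-26 | 2025-04-09 | Myome, Inc. | Systems and methods for identification of structural variants based on an autoencoder |
CN119724338A (zh) * | 2024-12-04 | 2025-03-28 | 中国水产科学研究院渔业工程研究所 | Method for identifying structural variation in fish genomes based on a graph pangenome, and application thereof |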
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140249764A1 (en) | 2011-06-06 | 2014-09-04 | Koninklijke Philips N.V. | Method for Assembly of Nucleic Acid Sequence Data |
- 2017-10-01: KR application KR1020170128472A, patent KR101832834B1 (ko), status Active
- 2018-01-26: US application US15/880,934, publication US20180260521A1 (en), status Abandoned
Non-Patent Citations (2)
Title |
---|
Journal of Computational and Graphical Statistics 2.2 (1993): 153-174. |
Nature genetics 40.1 (2008): pp.96-101. |
Also Published As
Publication number | Publication date |
---|---|
US20180260521A1 (en) | 2018-09-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101832834B1 (ko) | Method and system for variant detection based on multiple dot plot analysis | |
Logsdon et al. | The structure, function and evolution of a complete human chromosome 8 | |
Seo et al. | De novo assembly and phasing of a Korean human genome | |
Ling et al. | Genome sequence of the progenitor of wheat A subgenome Triticum urartu | |
US12173370B2 (en) | Whole-genome haplotype reconstruction | |
Chen et al. | Whole-genome sequence analysis unveils different origins of European and Asiatic mouflon and domestication-related genes in sheep | |
Naidoo et al. | Human genetics and genomics a decade after the release of the draft sequence of the human genome | |
Mungall et al. | The DNA sequence and analysis of human chromosome 6 | |
Willemsen et al. | Intra-species differences in population size shape life history and genome evolution | |
Swart et al. | The Oxytricha trifallax macronuclear genome: a complex eukaryotic genome with 16,000 tiny chromosomes | |
ES2229781T3 (es) | Methods, programs, and apparatus for identifying genomic regions harboring a gene associated with a detectable trait |
Pértille et al. | High-throughput and cost-effective chicken genotyping using next-generation sequencing | |
Mossman et al. | Mitochondrial-nuclear interactions mediate sex-specific transcriptional profiles in Drosophila | |
Kockum et al. | Overview of genotyping technologies and methods | |
Van Bers et al. | The design and cross‐population application of a genome‐wide SNP chip for the great tit Parus major | |
El‐Sayed et al. | The sequence and analysis of Trypanosoma brucei chromosome II | |
Fernandes et al. | Genome-wide detection of CNVs and their association with performance traits in broilers | |
Macas et al. | Assembly of the 81.6 Mb centromere of pea chromosome 6 elucidates the structure and evolution of metapolycentric chromosomes | |
Porubsky et al. | A familial, telomere-to-telomere reference for human de novo mutation and recombination from a four-generation pedigree | |
Bredemeyer et al. | Single-haplotype comparative genomics provides insights into lineage-specific structural variation during cat evolution | |
Vara et al. | PRDM9 diversity at fine geographical scale reveals contrasting evolutionary patterns and functional constraints in natural populations of house mice | |
Reinhardt et al. | Impacts of sex ratio meiotic drive on genome structure and function in a stalk-eyed fly | |
Li et al. | Large-scale chromosomal changes lead to genome-level expression alterations, environmental adaptation, and speciation in the gayal (Bos frontalis) | |
Bradley et al. | A major zebrafish polymorphism resource for genetic mapping | |
Porubsky et al. | Human de novo mutation rates from a four-generation pedigree reference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PA0109 | Patent application | Patent event code: PA01091R01D; Comment text: Patent Application; Patent event date: 20171001 |
 | PA0201 | Request for examination | |
 | PA0302 | Request for accelerated examination | Patent event date: 20171016; Patent event code: PA03022R01D; Comment text: Request for Accelerated Examination. Patent event date: 20171001; Patent event code: PA03021R01I; Comment text: Patent Application |
 | PE0902 | Notice of grounds for rejection | Comment text: Notification of reason for refusal; Patent event date: 20171229; Patent event code: PE09021S01D |
 | PE0701 | Decision of registration | Patent event code: PE07011S01D; Comment text: Decision to Grant Registration; Patent event date: 20180207 |
 | GRNT | Written decision to grant | |
 | PR0701 | Registration of establishment | Comment text: Registration of Establishment; Patent event date: 20180221; Patent event code: PR07011E01D |
 | PR1002 | Payment of registration fee | Payment date: 20180221; Start annual number: 1; End annual number: 3 |
 | PG1601 | Publication of registration | |
 | PR1001 | Payment of annual fee | Payment date: 20201211; Start annual number: 4; End annual number: 4 |
 | PR1001 | Payment of annual fee | Payment date: 20211119; Start annual number: 5; End annual number: 5 |
 | PR1001 | Payment of annual fee | Payment date: 20231127; Start annual number: 7; End annual number: 7 |