US20210210161A1 - Methods and systems for generating a virtual progeny genome - Google Patents
Methods and systems for generating a virtual progeny genome
- Publication number
- US20210210161A1 (application US17/160,191)
- Authority
- US
- United States
- Prior art keywords
- gamete
- sequence
- donors
- genome
- potential
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
- 230000001079 digestive effect Effects 0.000 description 1
- 210000002249 digestive system Anatomy 0.000 description 1
- 210000001840 diploid cell Anatomy 0.000 description 1
- 208000022602 disease susceptibility Diseases 0.000 description 1
- 230000004590 drinking behavior Effects 0.000 description 1
- 229940079593 drug Drugs 0.000 description 1
- 239000003814 drug Substances 0.000 description 1
- 208000010118 dystonia Diseases 0.000 description 1
- 210000000883 ear external Anatomy 0.000 description 1
- 210000003027 ear inner Anatomy 0.000 description 1
- 208000016570 early-onset generalized limb-onset dystonia Diseases 0.000 description 1
- 230000020595 eating behavior Effects 0.000 description 1
- 208000002169 ectodermal dysplasia Diseases 0.000 description 1
- 230000013020 embryo development Effects 0.000 description 1
- 230000008011 embryonic death Effects 0.000 description 1
- 230000002996 emotional effect Effects 0.000 description 1
- 230000006397 emotional response Effects 0.000 description 1
- 229940088598 enzyme Drugs 0.000 description 1
- 210000003979 eosinophil Anatomy 0.000 description 1
- 210000002745 epiphysis Anatomy 0.000 description 1
- 210000003238 esophagus Anatomy 0.000 description 1
- 230000029142 excretion Effects 0.000 description 1
- 201000006902 exudative vitreoretinopathy Diseases 0.000 description 1
- 230000004384 eye physiology Effects 0.000 description 1
- 230000000193 eyeblink Effects 0.000 description 1
- 210000004709 eyebrow Anatomy 0.000 description 1
- 210000000720 eyelash Anatomy 0.000 description 1
- 208000014337 facial nerve disease Diseases 0.000 description 1
- 108010091897 factor V Leiden Proteins 0.000 description 1
- 201000007219 factor XI deficiency Diseases 0.000 description 1
- 201000001386 familial hypercholesterolemia Diseases 0.000 description 1
- 208000025697 familial rhabdoid tumor Diseases 0.000 description 1
- 208000010706 fatty liver disease Diseases 0.000 description 1
- 231100000562 fetal loss Toxicity 0.000 description 1
- 239000000835 fiber Substances 0.000 description 1
- 210000002082 fibula Anatomy 0.000 description 1
- 210000003811 finger Anatomy 0.000 description 1
- 239000012530 fluid Substances 0.000 description 1
- 239000012634 fragment Substances 0.000 description 1
- 231100000221 frame shift mutation induction Toxicity 0.000 description 1
- 230000037433 frameshift Effects 0.000 description 1
- 208000014346 fumarase deficiency Diseases 0.000 description 1
- 210000000232 gallbladder Anatomy 0.000 description 1
- 230000006543 gametophyte development Effects 0.000 description 1
- 238000012252 genetic analysis Methods 0.000 description 1
- 210000004602 germ cell Anatomy 0.000 description 1
- MASNOZXLGMXCHN-ZLPAWPGGSA-N glucagon Chemical compound C([C@@H](C(=O)N[C@H](C(=O)N[C@@H](CCC(N)=O)C(=O)N[C@@H](CC=1C2=CC=CC=C2NC=1)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CCSC)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H]([C@@H](C)O)C(O)=O)C(C)C)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](C)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CCCNC(N)=N)NC(=O)[C@H](CO)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CC(C)C)NC(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)[C@H](CCCCN)NC(=O)[C@H](CO)NC(=O)[C@H](CC=1C=CC(O)=CC=1)NC(=O)[C@H](CC(O)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](NC(=O)[C@H](CC=1C=CC=CC=1)NC(=O)[C@@H](NC(=O)CNC(=O)[C@H](CCC(N)=O)NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC=1NC=NC=1)[C@@H](C)O)[C@@H](C)O)C1=CC=CC=C1 MASNOZXLGMXCHN-ZLPAWPGGSA-N 0.000 description 1
- 229960004666 glucagon Drugs 0.000 description 1
- 230000014101 glucose homeostasis Effects 0.000 description 1
- 208000008605 glucosephosphate dehydrogenase deficiency Diseases 0.000 description 1
- 229940096919 glycogen Drugs 0.000 description 1
- 201000004502 glycogen storage disease II Diseases 0.000 description 1
- 201000004543 glycogen storage disease III Diseases 0.000 description 1
- 208000005516 glycogen storage disease Ib Diseases 0.000 description 1
- 201000004534 glycogen storage disease V Diseases 0.000 description 1
- 208000011460 glycogen storage disease due to glucose-6-phosphatase deficiency type IA Diseases 0.000 description 1
- 230000021061 grooming behavior Effects 0.000 description 1
- 230000012010 growth Effects 0.000 description 1
- 230000037308 hair color Effects 0.000 description 1
- 210000003780 hair follicle Anatomy 0.000 description 1
- 230000003779 hair growth Effects 0.000 description 1
- 210000004247 hand Anatomy 0.000 description 1
- 210000000259 harderian gland Anatomy 0.000 description 1
- 210000003128 head Anatomy 0.000 description 1
- 230000036541 health Effects 0.000 description 1
- 230000005802 health problem Effects 0.000 description 1
- 208000019622 heart disease Diseases 0.000 description 1
- 208000002085 hemarthrosis Diseases 0.000 description 1
- 201000000391 hemochromatosis type 1 Diseases 0.000 description 1
- 230000013632 homeostatic process Effects 0.000 description 1
- 208000013144 homocystinuria due to methylene tetrahydrofolate reductase deficiency Diseases 0.000 description 1
- 210000002758 humerus Anatomy 0.000 description 1
- 230000028996 humoral immune response Effects 0.000 description 1
- 201000008980 hyperinsulinism Diseases 0.000 description 1
- 230000009610 hypersensitivity Effects 0.000 description 1
- 201000010072 hypochondroplasia Diseases 0.000 description 1
- 208000011111 hypophosphatemic rickets Diseases 0.000 description 1
- 230000001096 hypoplastic effect Effects 0.000 description 1
- 210000003016 hypothalamus Anatomy 0.000 description 1
- 238000003384 imaging method Methods 0.000 description 1
- 210000002865 immune cell Anatomy 0.000 description 1
- 230000036737 immune function Effects 0.000 description 1
- 210000000987 immune system Anatomy 0.000 description 1
- 230000000899 immune system response Effects 0.000 description 1
- 230000006058 immune tolerance Effects 0.000 description 1
- 230000007813 immunodeficiency Effects 0.000 description 1
- 238000010348 incorporation Methods 0.000 description 1
- 208000015181 infectious disease Diseases 0.000 description 1
- 208000021267 infertility disease Diseases 0.000 description 1
- 230000002757 inflammatory effect Effects 0.000 description 1
- 230000028709 inflammatory response Effects 0.000 description 1
- 230000015788 innate immune response Effects 0.000 description 1
- 230000030214 innervation Effects 0.000 description 1
- 229940125396 insulin Drugs 0.000 description 1
- 210000000936 intestine Anatomy 0.000 description 1
- 230000019948 ion homeostasis Effects 0.000 description 1
- 210000001847 jaw Anatomy 0.000 description 1
- 210000004561 lacrimal apparatus Anatomy 0.000 description 1
- 208000006443 lactic acidosis Diseases 0.000 description 1
- 210000000867 larynx Anatomy 0.000 description 1
- 210000004936 left thumb Anatomy 0.000 description 1
- 210000000265 leukocyte Anatomy 0.000 description 1
- 210000003041 ligament Anatomy 0.000 description 1
- 230000004322 lipid homeostasis Effects 0.000 description 1
- 150000002632 lipids Chemical class 0.000 description 1
- 239000004973 liquid crystal related substance Substances 0.000 description 1
- 230000031142 liver development Effects 0.000 description 1
- 230000033001 locomotion Effects 0.000 description 1
- 230000006742 locomotor activity Effects 0.000 description 1
- 208000026695 long chain 3-hydroxyacyl-CoA dehydrogenase deficiency Diseases 0.000 description 1
- 210000004072 lung Anatomy 0.000 description 1
- 230000007040 lung development Effects 0.000 description 1
- 210000002751 lymph Anatomy 0.000 description 1
- 210000002540 macrophage Anatomy 0.000 description 1
- 210000005075 mammary gland Anatomy 0.000 description 1
- 208000024393 maple syrup urine disease Diseases 0.000 description 1
- 208000012406 maple syrup urine disease type 1B Diseases 0.000 description 1
- 230000029082 maternal behavior Effects 0.000 description 1
- 230000008774 maternal effect Effects 0.000 description 1
- 208000005548 medium chain acyl-CoA dehydrogenase deficiency Diseases 0.000 description 1
- 208000002839 megalencephalic leukoencephalopathy with subcortical cysts Diseases 0.000 description 1
- 230000006996 mental state Effects 0.000 description 1
- 230000027939 micturition Effects 0.000 description 1
- 230000002438 mitochondrial effect Effects 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004973 motor coordination Effects 0.000 description 1
- 208000005340 mucopolysaccharidosis III Diseases 0.000 description 1
- 208000011045 mucopolysaccharidosis type 3 Diseases 0.000 description 1
- 208000025919 mucopolysaccharidosis type 7 Diseases 0.000 description 1
- 208000012226 mucopolysaccharidosis type IIIA Diseases 0.000 description 1
- 201000006417 multiple sclerosis Diseases 0.000 description 1
- 230000037191 muscle physiology Effects 0.000 description 1
- 230000009756 muscle regeneration Effects 0.000 description 1
- 208000011042 muscle-eye-brain disease Diseases 0.000 description 1
- 201000006938 muscular dystrophy Diseases 0.000 description 1
- 230000023105 myelination Effects 0.000 description 1
- 210000004165 myocardium Anatomy 0.000 description 1
- 239000002105 nanoparticle Substances 0.000 description 1
- 210000000822 natural killer cell Anatomy 0.000 description 1
- 208000009928 nephrosis Diseases 0.000 description 1
- 231100001027 nephrosis Toxicity 0.000 description 1
- 230000000955 neuroendocrine Effects 0.000 description 1
- 210000004498 neuroglial cell Anatomy 0.000 description 1
- 230000000926 neurological effect Effects 0.000 description 1
- 201000007657 neuronal ceroid lipofuscinosis 5 Diseases 0.000 description 1
- 230000007827 neuronopathy Effects 0.000 description 1
- 230000007823 neuropathy Effects 0.000 description 1
- 210000000440 neutrophil Anatomy 0.000 description 1
- 235000015097 nutrients Nutrition 0.000 description 1
- 230000034004 oogenesis Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000012634 optical imaging Methods 0.000 description 1
- 230000005305 organ development Effects 0.000 description 1
- 208000007656 osteochondritis dissecans Diseases 0.000 description 1
- 230000036284 oxygen consumption Effects 0.000 description 1
- 210000003254 palate Anatomy 0.000 description 1
- 210000000496 pancreas Anatomy 0.000 description 1
- 208000027838 paramyotonia congenita of Von Eulenburg Diseases 0.000 description 1
- 210000002990 parathyroid gland Anatomy 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
- 230000032964 paternal behavior Effects 0.000 description 1
- 230000008775 paternal effect Effects 0.000 description 1
- 210000004197 pelvis Anatomy 0.000 description 1
- 230000018052 penile erection Effects 0.000 description 1
- 210000001428 peripheral nervous system Anatomy 0.000 description 1
- 210000003800 pharynx Anatomy 0.000 description 1
- 230000036211 photosensitivity Effects 0.000 description 1
- 208000024335 physical disease Diseases 0.000 description 1
- 230000035790 physiological processes and functions Effects 0.000 description 1
- 230000005371 pilomotor reflex Effects 0.000 description 1
- 230000001817 pituitary effect Effects 0.000 description 1
- 210000003635 pituitary gland Anatomy 0.000 description 1
- 210000002381 plasma Anatomy 0.000 description 1
- 239000002797 plasminogen activator inhibitor Substances 0.000 description 1
- 208000030761 polycystic kidney disease Diseases 0.000 description 1
- 208000001061 polyostotic fibrous dysplasia Diseases 0.000 description 1
- 208000015768 polyposis Diseases 0.000 description 1
- 230000009596 postnatal growth Effects 0.000 description 1
- 230000002028 premature Effects 0.000 description 1
- 229940039716 prothrombin Drugs 0.000 description 1
- 208000020016 psychiatric disease Diseases 0.000 description 1
- 230000004088 pulmonary circulation Effects 0.000 description 1
- 230000001179 pupillary effect Effects 0.000 description 1
- 201000010108 pycnodysostosis Diseases 0.000 description 1
- 208000022563 qualitative or quantitative defects of alpha-sarcoglycan Diseases 0.000 description 1
- 208000022561 qualitative or quantitative defects of beta-sarcoglycan Diseases 0.000 description 1
- 230000008707 rearrangement Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 210000005227 renal system Anatomy 0.000 description 1
- 238000009877 rendering Methods 0.000 description 1
- 210000004994 reproductive system Anatomy 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 201000004193 respiratory failure Diseases 0.000 description 1
- 230000029058 respiratory gaseous exchange Effects 0.000 description 1
- 210000001533 respiratory mucosa Anatomy 0.000 description 1
- 210000003019 respiratory muscle Anatomy 0.000 description 1
- 210000002345 respiratory system Anatomy 0.000 description 1
- 230000008458 response to injury Effects 0.000 description 1
- 108091008146 restriction endonucleases Proteins 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
- 201000007714 retinoschisis Diseases 0.000 description 1
- 208000007442 rickets Diseases 0.000 description 1
- 210000004935 right thumb Anatomy 0.000 description 1
- 102220193204 rs1057516128 Human genes 0.000 description 1
- 210000003296 saliva Anatomy 0.000 description 1
- 210000003079 salivary gland Anatomy 0.000 description 1
- 208000010532 sarcoglycanopathy Diseases 0.000 description 1
- 201000000980 schizophrenia Diseases 0.000 description 1
- 206010039722 scoliosis Diseases 0.000 description 1
- 210000001732 sebaceous gland Anatomy 0.000 description 1
- 230000035945 sensitivity Effects 0.000 description 1
- 210000002966 serum Anatomy 0.000 description 1
- 210000003765 sex chromosome Anatomy 0.000 description 1
- 230000001568 sexual effect Effects 0.000 description 1
- 208000001392 short chain acyl-CoA dehydrogenase deficiency Diseases 0.000 description 1
- 210000002832 shoulder Anatomy 0.000 description 1
- SQVRNKJHWKZAKO-OQPLDHBCSA-N sialic acid Chemical compound CC(=O)N[C@@H]1[C@@H](O)C[C@@](O)(C(O)=O)OC1[C@H](O)[C@H](O)CO SQVRNKJHWKZAKO-OQPLDHBCSA-N 0.000 description 1
- 230000022379 skeletal muscle tissue development Effects 0.000 description 1
- 230000036548 skin texture Effects 0.000 description 1
- 210000003625 skull Anatomy 0.000 description 1
- 231100001051 skull abnormality Toxicity 0.000 description 1
- 230000007958 sleep Effects 0.000 description 1
- 230000000392 somatic effect Effects 0.000 description 1
- 230000019100 sperm motility Effects 0.000 description 1
- 230000021595 spermatogenesis Effects 0.000 description 1
- 231100000240 steatosis hepatitis Toxicity 0.000 description 1
- 210000001562 sternum Anatomy 0.000 description 1
- 210000002784 stomach Anatomy 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 208000023516 stroke disease Diseases 0.000 description 1
- 208000011117 substance-related disease Diseases 0.000 description 1
- 230000001629 suppression Effects 0.000 description 1
- 238000010897 surface acoustic wave method Methods 0.000 description 1
- 210000000106 sweat gland Anatomy 0.000 description 1
- 210000000457 tarsus Anatomy 0.000 description 1
- 230000036327 taste response Effects 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
- 210000002435 tendon Anatomy 0.000 description 1
- 201000003896 thanatophoric dysplasia Diseases 0.000 description 1
- 239000010409 thin film Substances 0.000 description 1
- 210000003813 thumb Anatomy 0.000 description 1
- 210000001685 thyroid gland Anatomy 0.000 description 1
- 210000002303 tibia Anatomy 0.000 description 1
- 210000003437 trachea Anatomy 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 238000013519 translation Methods 0.000 description 1
- 201000007905 transthyretin amyloidosis Diseases 0.000 description 1
- 208000001072 type 2 diabetes mellitus Diseases 0.000 description 1
- OUYCCCASQSFEME-UHFFFAOYSA-N tyrosine Natural products OC(=O)C(N)CC1=CC=C(O)C=C1 OUYCCCASQSFEME-UHFFFAOYSA-N 0.000 description 1
- 201000011296 tyrosinemia Diseases 0.000 description 1
- 201000007972 tyrosinemia type I Diseases 0.000 description 1
- 210000000623 ulna Anatomy 0.000 description 1
- 210000002438 upper gastrointestinal tract Anatomy 0.000 description 1
- 210000000689 upper leg Anatomy 0.000 description 1
- 230000002485 urinary effect Effects 0.000 description 1
- 210000002229 urogenital system Anatomy 0.000 description 1
- 210000005166 vasculature Anatomy 0.000 description 1
- 230000002227 vasoactive effect Effects 0.000 description 1
- 201000008618 vesicoureteral reflux Diseases 0.000 description 1
- 208000031355 vesicoureteral reflux 1 Diseases 0.000 description 1
- 230000031836 visual learning Effects 0.000 description 1
- 201000007790 vitelliform macular dystrophy Diseases 0.000 description 1
- 238000012070 whole genome sequencing analysis Methods 0.000 description 1
- 239000002676 xenobiotic agent Substances 0.000 description 1
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B10/00—ICT specially adapted for evolutionary bioinformatics, e.g. phylogenetic tree construction or analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/10—Ploidy or copy number detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/40—Population genetics; Linkage disequilibrium
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Definitions
- Mendel described the rules of probability that govern the inheritance of simple genetic traits. He demonstrated that individuals who show no evidence of a disease can still be silent “carriers” of a disease mutation. The disease only appears in a child who inherits two defective copies of a particular gene from two parents who are carriers. Diseases and other traits inherited in this manner are referred to as Mendelian. Thousands of Mendelian diseases of metabolism and other physiological functions have been described in the medical literature.
- Although each individual Mendelian disease mutation is relatively rare, such mutations are cumulatively so common that nearly every human being is a carrier for at least one. Carrier testing can identify two potential parents who carry the same mutation and thus have a 25% likelihood of transmitting the corresponding disease to their child.
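The 25% figure follows from enumerating the four equally likely allele pairings of two carrier parents. A minimal sketch, assuming a single autosomal recessive locus with a normal allele "A" and a disease allele "a" (the function name and allele labels are illustrative, not taken from the patent):

```python
from fractions import Fraction
from itertools import product

def offspring_genotype_probs(parent1, parent2):
    """Enumerate the four equally likely allele pairings from two diploid parents."""
    probs = {}
    for a, b in product(parent1, parent2):
        # Sort so that "Aa" and "aA" are counted as the same genotype.
        genotype = "".join(sorted((a, b)))
        probs[genotype] = probs.get(genotype, Fraction(0)) + Fraction(1, 4)
    return probs

# Both parents are carriers: one normal allele "A" and one disease allele "a".
probs = offspring_genotype_probs(("A", "a"), ("A", "a"))
print(probs["aa"])  # probability that the child inherits two disease alleles
```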
- SNP single nucleotide polymorphism
- a pre-conception method for predicting the likelihood that a hypothetical child of any two persons, of opposite or same sex, who may or may not be fertile, will express any trait or disease that is subject to genetic influences that have been previously characterized, completely or partially.
- the methods allow for the simulation of the generation of multiple VirtualGametes, each of which contains a single allele from each genotype in combinatorial frequencies estimated for naturally produced sperm and eggs.
- VirtualGamete production applies Mendel's law of independent assortment, using a random number generator to choose one copy of each genetic locus independently of the others for incorporation into a haploid genome profile. Accuracy and resolution are augmented with high-complexity personal genome profiles, which can provide phasing information and support further genotype imputation.
- a uniquely derived VirtualGamete from one personal genome is combined with a uniquely derived VirtualGamete from the second personal genome to produce a discrete Virtual Progeny (“VP”) genome sampling containing two definitive copies of each locus. This computational process is repeated a sufficient number of times to obtain a reproducible Virtual Progeny genome.
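The gamete and progeny steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: each genome profile is a hypothetical dictionary mapping a locus name to its two alleles, every locus is treated as unlinked (pure independent assortment, no phasing or linkage information), and all function and variable names are invented for the example.

```python
import random

def make_virtual_gamete(genome_profile, rng):
    """Pick one of the two alleles at every locus, independently of other loci."""
    return {locus: rng.choice(alleles) for locus, alleles in genome_profile.items()}

def make_vp_sampling(profile_a, profile_b, rng):
    """Combine one gamete from each progenitor into a diploid genotype per locus."""
    gamete_a = make_virtual_gamete(profile_a, rng)
    gamete_b = make_virtual_gamete(profile_b, rng)
    return {locus: (gamete_a[locus], gamete_b[locus]) for locus in profile_a}

def make_virtual_progeny(profile_a, profile_b, n_samplings=1000, seed=0):
    """Repeat the sampling many times; the list as a whole stands for the VP genome."""
    rng = random.Random(seed)
    return [make_vp_sampling(profile_a, profile_b, rng) for _ in range(n_samplings)]

# Hypothetical two-locus profiles for the two progenitors.
jane = {"locus01": ("A", "a"), "locus02": ("G", "G")}
john = {"locus01": ("A", "a"), "locus02": ("G", "T")}
progeny = make_virtual_progeny(jane, john)
```

The number of samplings is a free parameter here; the text only requires that it be large enough for the result to be reproducible.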
- VP Virtual Progeny
- a Virtual Progeny genome is a sufficiently large sampling of discrete genomes that are cumulatively representative of the likelihoods of alternative genotype combinations in a hypothetical child conceived from the two progenitors.
- Each discrete sampling that makes up a Virtual Progeny genome is evaluated independently for the likelihood of expressing any trait for which a genetic correlation has been previously determined.
- the traits associated with each Virtual Progeny sampling are normalized and combined to obtain a Virtual Progeny phenome distribution.
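One way to read "normalized and combined" is as a tally of the trait predicted for each sampling, divided by the number of samplings. The sketch below assumes exactly that; the `recessive_trait` rule, the locus name, and the four-sampling input are hypothetical placeholders for whatever genotype-to-trait correlation is actually applied.

```python
from collections import Counter

def phenome_distribution(samplings, trait_of):
    """Tally the trait predicted for each sampling and normalize counts to frequencies."""
    counts = Counter(trait_of(s) for s in samplings)
    total = sum(counts.values())
    return {trait: n / total for trait, n in counts.items()}

def recessive_trait(sampling):
    """Hypothetical rule: the trait appears only with two copies of allele 'a'."""
    return "affected" if sampling["locus01"] == ("a", "a") else "unaffected"

# Four illustrative samplings, one per possible pairing of carrier gametes.
samplings = [
    {"locus01": ("A", "A")},
    {"locus01": ("A", "a")},
    {"locus01": ("a", "A")},
    {"locus01": ("a", "a")},
]
dist = phenome_distribution(samplings, recessive_trait)
```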
- Virtual Progeny may be evaluated for several thousand rare Mendelian diseases as well as hundreds of more common complex diseases such as diabetes, arteriosclerosis, autism, and schizophrenia.
- Disease likelihood values for a Virtual Progeny sampling can be calculated indirectly by protein modeling, computational analysis, and geographical origin of chromosomal regions, in addition to direct empirical associations.
- sperm donor and egg donor agencies can use Virtual Progeny to screen out specific client/donor “pairings” that could give rise to offspring with increased disease risk.
- Sperm donors who represent a heightened genetic risk in specific combination with a particular client will be removed from that client's pool of potential donors. It is likely that these same donors will show no evidence of risk in combination with most other clients. Since the generation of Virtual Progeny does not depend on interpreting genome profiles of the progenitors, it does not involve carrier screening, and is not burdened with problems of false stigmatization.
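The pairing-specific screen can be illustrated with a small filter. This is a sketch only, assuming a single recessive locus, a simulated risk estimate, and an arbitrary 5% cutoff; the donor names, the threshold, and both function names are hypothetical.

```python
import random

def affected_fraction(client, donor, locus, n=2000, seed=0):
    """Estimate the fraction of simulated offspring that are a/a at one locus."""
    rng = random.Random(seed)
    hits = sum(
        rng.choice(client[locus]) == "a" and rng.choice(donor[locus]) == "a"
        for _ in range(n)
    )
    return hits / n

def screen_donors(client, donors, locus, max_risk=0.05):
    """Keep only donors whose simulated pairing with this client stays under max_risk."""
    return [
        name for name, profile in donors.items()
        if affected_fraction(client, profile, locus) <= max_risk
    ]

client = {"locus01": ("A", "a")}
donors = {
    "donor_1": {"locus01": ("A", "A")},
    "donor_2": {"locus01": ("A", "a")},  # carrier of the same allele as the client
    "donor_3": {"locus01": ("A", "A")},
}
eligible = screen_donors(client, donors, "locus01")
```

Only the pairing is evaluated: `donor_2` is excluded for this client but would remain eligible for a client who does not carry the same allele.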
- Committed couples can use their virtual child's profile to prevent serious conditions through IVF technology or to prepare for their child's unique gifts and needs.
- FIG. 1A is a schematic illustration of an exemplary protocol for creating Virtual Progeny where DNA samples from two individuals (Jane Doe and John Smith) are processed to generate genome scans.
- FIG. 1B is a schematic illustration showing genome scans expanded into genome profiles with additional imputed genotypes and phasing information for genetic loci 01 through N.
- FIG. 1C is a schematic illustration showing the genome profile for John Smith, taking into account the profile for loci 01 through N.
- FIG. 1D is a schematic illustration showing a random VirtualGamete from each of Jane Doe and John Smith, the VirtualGametes being combined to form a Virtual Progeny genome sampling.
- FIG. 1E is a schematic illustration showing the haplopaths in rows 1 and 7 for John Smith.
- FIG. 2 is a representation of Virtual Progeny results for SRS and four partners in the gene region responsible for blue eye color.
- FIG. 3 is a schematic illustration of Virtual Progeny genome profiles containing simulated genotype information at approximately 10,000 SNP loci across 82 million base pairs of chromosome 15. The bottom row shows the location of every SNP locus in this region (each index mark associated with a differential trait).
- FIG. 4 is a summary of Virtual Progeny determinations from four matings of a client (“SRS”) with four potential partners, in which each SNP locus has two alleles, designated A and D.
- Six classes of genotype results are possible for each SNP or CNP in a Virtual Progeny genome. The number of SNPs or CNPs in each class is shown for each mating.
- FIGS. 5A-5F are a schematic representation showing various diseases or traits associated with various SNPs or CNPs together with the genotype results of simulated mating of one woman subject to four potential partners.
- FIGS. 6A and 6B are a graphical representation of 100 haplopaths (H1-H100) across 9 SNP loci numbered 64-72, generated from 100 Monte Carlo simulations, where homologue index numbers are (0,1).
- an element means one element or more than one element.
- The term “haploid cell” refers to a cell with a haploid number (n) of chromosomes.
- Gametes are specialized haploid cells (e.g., spermatozoa and oocytes) produced through the process of meiosis and involved in sexual reproduction.
- The term “gametotype” refers to single copies, with one allele each, of one or more loci in the haploid genome of a single gamete.
- an “autosome” is any chromosome exclusive of the X and Y sex chromosomes.
- A “diploid cell” has a homologous pair of each of its autosomal chromosomes, and has two copies (2n) of each autosomal genetic locus.
- The term “chromosome” refers to a molecule of DNA with a sequence of base pairs that corresponds closely to a defined chromosome reference sequence of the organism in question.
- The term “gene” refers to a DNA sequence in a chromosome that codes for a product (either RNA or its translation product, a polypeptide) or otherwise plays a role in the expression of said product.
- a gene contains a DNA sequence with biological function.
- the biological function may be contained within the structure of the RNA product or a coding region for a polypeptide.
- the coding region includes a plurality of coding segments (“exons”) and intervening non-coding sequences (“introns”) between individual coding segments and non-coding regions preceding and following the first and last coding regions respectively.
- The term “locus” refers to any segment of DNA sequence defined by chromosomal coordinates in a reference genome known to the art, irrespective of biological function.
- a DNA locus can contain multiple genes or no genes; it can be a single base pair or millions of base pairs.
- a “polymorphic locus” is a genomic locus at which two or more alleles have been identified.
- an “allele” is one of two or more existing genetic variants of a specific polymorphic genomic locus.
- SNP single nucleotide polymorphism
- CNV copy number variant
- CNP copy number polymorphism
- The term “genotype” refers to the diploid combination of alleles at a given genetic locus, or set of related loci, in a given cell or organism.
- a homozygous subject carries two copies of the same allele and a heterozygous subject carries two distinct alleles.
- For a locus with two alleles, A and a, three genotypes can be formed: A/A, A/a, and a/a.
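The count generalizes: a locus with n alleles admits n(n+1)/2 unordered diploid genotypes. A short sketch that enumerates them (the function name is illustrative):

```python
from itertools import combinations_with_replacement

def possible_genotypes(alleles):
    """List every unordered pair of alleles, including homozygous pairs."""
    return ["/".join(pair) for pair in combinations_with_replacement(alleles, 2)]

two_allele = possible_genotypes(["A", "a"])
three_allele = possible_genotypes(["A", "B", "C"])
print(two_allele)         # the three genotypes named in the text
print(len(three_allele))  # n(n+1)/2 with n = 3
```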
- The term “genotyping” refers to any experimental, computational, or observational protocol for distinguishing an individual's genotype at one or more well-defined loci.
- A “haplotype” is a unique set of alleles at separate loci that are normally grouped closely together on the same DNA molecule, and are observed to be inherited as a group.
- a haplotype can be defined by a set of specific alleles at each defined polymorphic locus within a haploblock.
- Haploblock refers to a genomic region that maintains genetic integrity over multiple generations and is recognized by linkage disequilibrium within a population. Haploblocks are defined empirically for a given population of individuals.
- linkage disequilibrium is the non-random association of alleles at two or more loci within a particular population. Linkage disequilibrium is measured as a departure from the null hypothesis of linkage equilibrium, where each allele at one locus associates randomly with each allele at a second locus in a population of individual genomes.
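- As a minimal sketch of the departure from linkage equilibrium described above, the following Python computes the standard coefficient D = P(AB) − P(A)·P(B) from a list of two-locus haplotypes. The function name and the sample data are illustrative assumptions and are not taken from the disclosure.

```python
def linkage_disequilibrium(haplotypes):
    """Return D = P(AB) - P(A) * P(B) for two-locus haplotypes.

    `haplotypes` is a list of (allele_at_locus_1, allele_at_locus_2) tuples.
    The alleles of the first tuple are used as the reference pair (A, B).
    """
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0]
    p_a = sum(1 for a, _ in haplotypes if a == ref_a) / n
    p_b = sum(1 for _, b in haplotypes if b == ref_b) / n
    p_ab = sum(1 for a, b in haplotypes if (a, b) == (ref_a, ref_b)) / n
    return p_ab - p_a * p_b


# Hypothetical populations: fully associated alleles vs. random association.
associated = [("A", "B")] * 4 + [("a", "b")] * 4
equilibrium = [("A", "B"), ("A", "b"), ("a", "B"), ("a", "b")]

d_associated = linkage_disequilibrium(associated)
d_equilibrium = linkage_disequilibrium(equilibrium)
```

- A value of D near zero corresponds to the null hypothesis of linkage equilibrium; larger absolute values indicate stronger non-random association.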
- a “genome” is the total genetic information carried by an individual organism or cell, represented by the complete DNA sequences of its chromosomes.
- a “genome profile” is a representative subset of the total information contained within a genome.
- a genome profile contains genotypes at a particular set of polymorphic loci.
- PGP: personal genome profile
- a genetic “trait” is a distinguishing attribute of an individual, whose expression is fully or partially influenced by an individual's genetic constitution.
- phenotype is a class of alternative traits which may be discrete or continuous.
- a “haplopath” is a haploid path laid out along a defined region of a diploid genome by a single iteration of a Monte Carlo simulation or a single chain generated through a Markov process.
- a haplopath can be formed by starting at one end of a personal chromosome or genome and walking from locus to locus, choosing a single allele at each step based on available linkage disequilibrium information, inter-locus allele association coefficients, and formal rules of genetics that describe the natural process of gamete production in a sexually reproducing organism.
- a “haplopath” is generated through the application of formal rules of genetics that describe the reduction of the diploid genome into haploid genomes through the natural process of meiosis.
- a “Virtual Gamete” is a single haplopath that extends across an entire genome.
- a “Virtual Progeny genome sampling” is the discrete genetic product of two Virtual Gametes.
- a “Virtual Progeny genome” is a collection of discrete Virtual Progeny genome samplings, each generated by combining two uniquely-derived random Virtual Gametes.
- a Virtual Progeny genome is represented as a probability mass function over a sample space of all discrete genome states.
- a Virtual Progeny is an informed simulation of a child or children that might result as a consequence of sexual reproduction between two individuals.
- a “Virtual Progeny phenome” is a multi-dimensional likelihood function representing the likelihood and/or likely degree of expression of a set of one or more traits from a complete Virtual Progeny genome.
- a Virtual Progeny phenome is represented as a probability mass function over a sample space of discrete or continuous phenotypic states.
- a Virtual Progeny phenome is an informed simulation of a child or children that might result as a consequence of sexual reproduction between two individuals.
- partner includes a marriage partner, sexual or reproductive partner, domestic partner, opposite-sex partner, and same-sex partner.
- the methods and compositions disclosed herein relate to assessing the genotypes of individuals and the phenotypes associated with particular genotypes of potential progeny from such individuals.
- genome profiles from two individuals are used to determine the probabilities that potential progeny from such individuals will express certain traits, such as an increased risk of disease. Such methods are referred to herein as “Virtual Progeny assessment.”
- the methods comprise obtaining a genomic DNA sample from a diploid subject.
- the presence or absence of one or more nucleotide variants is identified at one or more loci of at least one pair of chromosomes of the genomic DNA.
- a haploid gamete genome from a sampling of the potential gamete pool is generated from the diploid genome profile.
- a sampling of a potentially very large number of gametes is performed rather than identifying each gamete from a population (i.e., “pool”) of gametes.
- These identified nucleotide variants are compared to a plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the genomic DNA.
- the methods also entail the construction of a diploid genome profile for the subject.
- the genome profile comprises the identified haplotypes and a linkage probability determined by the frequencies of the haplotypes in the plurality of predetermined genomic sequences.
- a haploid gamete genome for each potential gamete can be generated from the diploid genome profile by generating a combination of the identified haplotypes using the linkage probability for each combination of the identified haplotypes.
- aspects of the methods and systems disclosed herein involve generating a library of potential haploid gamete genomes from an individual diploid subject.
- the methods comprise providing a database having a plurality of predetermined genomic sequences.
- a first proportion of the genomic sequences comprises a first predetermined haplotype adjacent to a second predetermined haplotype.
- a second proportion of the genomic sequences comprises the first predetermined haplotype adjacent to a third predetermined haplotype.
- genomic DNA sample from the subject is obtained and the presence or absence of one or more nucleotide variants at one or more loci of at least one pair of chromosomes of the genomic DNA is identified.
- the identified nucleotide variants are compared to the database to allow for identification of a plurality of sample haplotypes present in the at least one pair of chromosomes.
- the plurality of sample haplotypes comprises the first haplotype adjacent to a wobble haplotype.
- the wobble haplotype is either the second haplotype or the third haplotype.
- the methods further entail a diploid genome profile that is constructed for the at least one pair of chromosomes of the subject from the identified sample haplotypes.
- the genome profile comprises the first haplotype, the wobble haplotype, and a linkage probability determined by the proportion of the predetermined genomic sequences.
- the genomic sequences comprise the first haplotype adjacent to the wobble haplotype.
- a haploid gamete genome is generated for each potential gamete from the diploid genome profile by linearly combining the first haplotype and the wobble haplotype using the linkage probability.
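- The use of a linkage probability to choose the wobble haplotype can be sketched as follows. This is a minimal illustration, assuming the reference database is reduced to a list of (haplotype, adjacent haplotype) pairs; the haplotype labels H1 to H4, the panel contents, and both function names are hypothetical.

```python
import random


def linkage_probabilities(reference_pairs, first):
    """Estimate P(adjacent haplotype | first haplotype) from a reference panel.

    reference_pairs holds (haplotype, adjacent_haplotype) tuples, one per
    predetermined genomic sequence.
    """
    counts = {}
    for left, right in reference_pairs:
        if left == first:
            counts[right] = counts.get(right, 0) + 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"{first!r} not found in the reference panel")
    return {hap: n / total for hap, n in counts.items()}


def sample_segment(first, probs, rng):
    """Draw the wobble haplotype that follows `first` in one potential gamete."""
    candidates = list(probs)
    weights = [probs[c] for c in candidates]
    wobble = rng.choices(candidates, weights=weights, k=1)[0]
    return (first, wobble)


# Hypothetical panel: H1 is followed by H2 in 3 sequences and by H3 in 1.
panel = [("H1", "H2")] * 3 + [("H1", "H3")] + [("H4", "H2")] * 2
probs = linkage_probabilities(panel, "H1")
segments = [sample_segment("H1", probs, random.Random(i)) for i in range(20)]
```

- Here the proportion of reference sequences in which the first haplotype sits next to each candidate serves directly as the sampling weight, which is one straightforward reading of the linkage probability described above.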
- the diploid profiles generated from the methods disclosed herein comprise additional identified haplotypes and additional linkage probabilities determined by the proportion of the predetermined genomic sequences that comprise the first predetermined haplotype adjacent to each additional haplotype.
- a method of selecting a potential sperm or oocyte donor is further disclosed.
- the donor genomic DNA samples from potential sperm donors or potential oocyte donors are obtained as well as a recipient genomic DNA sample from a potential recipient.
- the DNA is analyzed to identify the presence or absence of one or more nucleotide variants at one or more loci of at least one pair of chromosomes of the donor genomic DNA samples and the recipient genomic DNA sample.
- the identified nucleotide variants are compared to a plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the donor genomic DNA samples and the recipient genomic DNA sample.
- a donor diploid genome profile is constructed for each potential donor.
- Each donor diploid genome profile comprises the identified haplotypes in each donor genomic DNA sample and a linkage probability determined by the frequencies of the identified haplotypes in the plurality of predetermined genomic sequences.
- a recipient diploid genome profile for the potential recipient is constructed.
- the recipient genome profile comprises the identified haplotypes (in this case, the haplotypes in the recipient genomic DNA sample) and a linkage probability determined by the frequencies of the identified haplotypes in the plurality of predetermined genomic sequences.
- Libraries are generated for donors and recipients, each library comprising potential haploid gamete genomes from a diploid genome profile.
- Each potential haploid gamete genome is generated by combining the haplotypes identified in the respective genomic DNA samples using the linkage probability for each combination of the identified haplotypes.
- Each haploid gamete genome from each donor library is independently combined with a second haploid gamete genome from the recipient library to form a library of diploid progeny genomes for each potential donor.
- the diploid progeny genomes are compared to a database of disease-associated genomes to assess the risk of disease of each potential progeny, wherein a sperm donor or an oocyte donor is eliminated from consideration if an increased risk of disease of the potential progeny is determined.
- FIGS. 1A-1F depict an exemplary method of the methods disclosed herein.
- the exemplary method shows how to create a Virtual Progeny.
- DNA samples from two individuals are obtained.
- the samples are then processed by performing genome scans to identify genotypes at genetic markers, such as SNPs or CNPs, present in each individual's genomic DNA.
- Identified genotypes can be used to expand a personal genome profile by imputation of genotypes at non-processed genetic markers and to determine haplotypes present within each individual's genomic DNA.
- the likelihoods of association of sequential haplotypes are retrieved from lookup tables generated by the international HapMap project.
- the methods comprise obtaining a genomic DNA sample from the subject. These methods of assessing potential Virtual Progeny involve performing genome scans on individuals. In certain embodiments, the individuals share a common ancestry. Genome scans can be performed using any of a number of known procedures. For example, a biological sample from an individual can first be obtained. Such biological samples include, but are not limited to, a bodily fluid (such as urine, saliva, plasma, or serum) or a tissue sample (such as a buccal tissue sample or buccal cell). The biological sample can then be used to perform a genome scan using known methods. For example, DNA arrays can be used to analyze at least a portion of the genomic sequence of the individual. Exemplary DNA arrays include GeneChip Arrays, GenFlex Tag arrays, and Genome-Wide Human SNP Array 6.0 (available from Affymetrix, Santa Clara, Calif.).
- whole or partial genome sequence information is used to perform the genome scans.
- sequences can be determined using standard sequencing methods including chain-termination (Sanger dideoxynucleotide), dye-terminator sequencing, and SOLiD™ sequencing (Applied Biosystems).
- Whole genome sequences can be cut by restriction enzymes or sheared (mechanically) into shorter fragments for sequencing.
- DNA sequences can also be amplified using known methods such as PCR and vector-based cloning methods (e.g., Escherichia coli ).
- At least a portion of an individual's genetic material (e.g., DNA, RNA, mRNA, cDNA, other nucleotide bases, or derivatives thereof) can be scanned.
- a scanning step can involve scanning at least about 1,000 bases, at least about 5,000 bases, at least about 10,000 bases, at least about 20,000 bases, at least about 50,000 bases, at least about 100,000 bases, at least about 200,000 bases, at least about 500,000 bases, at least about 1,000,000 bases, at least about 2,000,000 bases, at least about 5,000,000 bases, at least about 10,000,000 bases, at least about 20,000,000 bases, at least about 50,000,000 bases, at least about 100,000,000 bases, at least about 200,000,000 bases, at least about 500,000,000 bases, at least about 1,000,000,000 bases, at least about 2,000,000,000 bases, or at least 3,000,000,000 bases of an individual's genetic material.
- nucleotide bases are scanned from a first set of individuals (e.g., at least about 10 individuals, at least about 20 individuals, at least about 30 individuals, at least about 40 individuals, at least about 50 individuals, at least about 100 individuals, at least about 250 individuals, at least about 500 individuals, or more), and genetic variations between individuals are identified. Genetic variation data generated from each individual can be compared with genetic variation data generated from other individuals in a first set of individuals to discover genetic variations among the first group of individuals.
- the variations identified in the first set of individuals can be used in subsequent studies in which such variations are analyzed to determine if they are associated with a phenotype-of-interest.
- These variations can include, e.g., SNPs or CNPs, common SNPs or CNPs, informative SNPs or CNPs, rare SNPs or CNPs, deletions, insertions, or frameshift mutations.
- Such genetic variations can be detected in, e.g., genomic DNA, RNA, mRNA, or derivatives thereof.
- genetic variations scanned and/or identified are informative SNPs or CNPs.
- a limited number of informative SNPs or CNPs (e.g., about 300 to about 500,000) can be scanned or read.
- in some instances, scanning whole genomes is contemplated; in other instances, only specific chromosomes, loci, common SNPs or CNPs, or informative SNPs or CNPs are scanned and/or used.
- Specific chromosomes, loci, common SNPs or CNPs, or informative SNPs or CNPs can be selected based on prior knowledge that such regions are related to a particular phenotype of interest.
- the scanning step is supplemented and/or substituted by obtaining data on genetic variations from databases.
- databases can provide, for example, a list of identified genetic variations (e.g., SNPs or CNPs or haplotypes) or genotyping data on particular individuals or populations.
- Examples of publicly available databases useful in the methods described herein include, but are not limited to, UCSC's Genome Browser, NCBI's dbSNP, MIT's human SNP database, University of Geneva's human Chromosome 21 SNP database, and the University of Tokyo's SNP database.
- Other databases known in the art can be used in conjunction with the methods described herein.
- owing to the haplotype structure of the human genome, analysis of a relatively small number of SNP loci can be used to profile an entire human genome. For example, a haplotype containing dozens or hundreds of SNP alleles can be “tagged” with just a few well-chosen “Tag SNPs or CNPs”. A nearly complete whole genome profile of an individual can thus be obtained, e.g., by using a DNA microarray that distinguishes genotypes at around 500,000 Tag SNPs or CNPs.
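- The tagging idea can be illustrated with a small lookup, sketched here under the assumption that each tag allele identifies exactly one haplotype within its haploblock. The identifiers `tag_snp_1`, `snp_a`, `snp_b`, and `snp_c`, the allele values, and the function name are placeholders invented for this example, not real markers.

```python
# Hypothetical table: allele at one tag SNP -> alleles at the other SNPs of
# the same haploblock. A real table would be derived from population data.
TAG_TO_HAPLOTYPE = {
    "tag_snp_1": {
        "A": {"snp_a": "C", "snp_b": "T", "snp_c": "G"},
        "G": {"snp_a": "T", "snp_b": "T", "snp_c": "A"},
    },
}


def impute_from_tag(tag_id, tag_allele, table=TAG_TO_HAPLOTYPE):
    """Return the alleles of the tagged haplotype, or None if the tag is unknown."""
    haplotype = table.get(tag_id, {}).get(tag_allele)
    if haplotype is None:
        return None
    # Include the genotyped tag itself alongside the imputed alleles.
    return {tag_id: tag_allele, **haplotype}


imputed = impute_from_tag("tag_snp_1", "G")
unknown = impute_from_tag("tag_snp_1", "C")
```

- One genotyped allele thus expands into several imputed alleles, which is why a panel of tag markers can approximate a much denser genome profile.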
- haploblock information obtained from the International HapMap project (hapmap.org) for a population of European ancestry.
- the presence or absence of one or more nucleotide variants is identified at one or more loci of at least one pair of chromosomes of the genomic DNA, and these identified nucleotide variants are compared to a plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the genomic DNA.
- the European frequency of each haplotype is indicated in the first column. The six shown add % of total variation.
- the block of elements with bolded perimeter represents a modified stochastic matrix (“A x ”) with the observed joint frequencies of occurrence of the corresponding row haplotype of X together with row haplotype k of haplogroup X+1.
- where an “L” term appears in the following equations, it denotes an ellipsis (“. . .”).
- the first “a” term in the following equation is
- the methods further comprise comparing the plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the genomic DNA.
- the assessment of Virtual Progeny includes imputing from publicly available information to further characterize the genotypes of individuals. For example, through an automated process, the composition of haplogroup 69 described above can be reduced to haplotypes 1 and 2 for the “AM” genome, and haplotypes 3 and 4 for “LMS” (Table 2). With the assignment of haplotypes, additional association information can be obtained from the public database for a personal genome, as shown in Tables 3 and 4.
- a useful, simplified formulation of the extended genotype (g) carried by an individual person (W) at a previously defined haplogroup (X) takes the form of the following 3 × 3 matrix containing a 2 × 2 stochastic matrix partition derived from the haplogroup lookup table, where each element contains the empirically determined joint frequency of haplotype $i^{X}$ with haplotype $j^{X+1}$:
- $$g_X(W) = \begin{bmatrix} X & j_1^{X+1} & j_2^{X+1} \\ i_1^{X} & a_{i_1^{X},\, j_1^{X+1}} & a_{i_1^{X},\, j_2^{X+1}} \\ i_2^{X} & a_{i_2^{X},\, j_1^{X+1}} & a_{i_2^{X},\, j_2^{X+1}} \end{bmatrix}, \quad i^{X} \in (1, \ldots, v_X), \quad j^{X+1} \in (1, \ldots, v_{X+1})$$
- initial association matrices can be parsed with diploid genome consistency rules, leading to a transformation of:
- Associations can be determined according to several criteria including, e.g., population data and inter-locus distance on a chromosome. When no association information is available for sequential loci, independent assortment can be assumed:
- the complete formulation of a whole genome profile can be an indexed set of genotype matrices:
- a haploid gamete genome for each potential gamete is generated from the diploid genome profile by generating a combination of the identified haplotypes using the linkage probability for each combination of the identified haplotypes.
- the Virtual Gametes are then generated using a computer simulation of gamete production from a diploid parental genome profile.
- the simulated haplopath can be computed or generated by subjecting the parental genome to formal rules of genetics that operate naturally during meiosis and gamete production.
- Formal rules of genetics are known in the art, and are mathematical formulas or algorithms that serve as abstract representations of the biological processes of sexual reproduction. Rules are based on allele segregation, independent assortment, linkage between genetic loci, recombination suppression, Hardy-Weinberg equilibria, and other probabilistic genetic processes. Formal rules can be used to estimate the likelihood of transmission of particular alleles and combinations of alleles at multiple loci, from an individual to a gamete.
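- Two of the formal rules named above, Hardy-Weinberg equilibrium and Mendelian allele segregation, can be written as short functions. This is a minimal sketch with hypothetical function names; it covers a single biallelic locus and ignores linkage and recombination suppression.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for allele frequency p of allele A."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def transmission_probability(genotype, allele):
    """Segregation: probability that a gamete carries `allele`.

    `genotype` is a pair of alleles, e.g. ("A", "a").
    """
    return list(genotype).count(allele) / 2


frequencies = hardy_weinberg(0.3)
p_from_het = transmission_probability(("A", "a"), "A")
p_from_hom = transmission_probability(("A", "A"), "A")
```

- The first function supplies population-level genotype expectations; the second gives the per-gamete likelihood of transmitting a particular allele from an individual.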
- the geographic origin of a parental genome, or subregion thereof, can provide population-specific allele and genotype frequency information that can be incorporated into computational models that predict genotypic and phenotypic probabilities of Virtual Progeny.
- haplopath includes a single allele (variant or haplotype) from each meta-locus, defined by the index number of the chromosomal homologue.
- a haplopath can be initiated with a random number generator (e.g., a Monte Carlo method) that chooses a random allele (i) at a random initializing locus in the set of N such loci.
- Each prior and subsequent allele along the haplopath can be generated according to normalized likelihoods derived from locus-specific association matrices.
- $H_i = \{h_1, h_2, \ldots, h_N\}, \quad h \in (1, 2)$
- FIG. 6 shows an example of 100 iterations of a Monte Carlo run on the genomic region shown above, where homologue index numbers have been converted from (1,2) to (0,1).
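- The haplopath walk described above can be sketched in Python as follows, using homologue indices (0, 1). This is an illustration under stated assumptions: each junction between sequential loci is represented by a 2 × 2 matrix of association weights, the matrices shown are invented, and `generate_haplopath` is a hypothetical name.

```python
import random


def generate_haplopath(assoc, rng):
    """Walk a haplopath of homologue indices (0 or 1) across N loci.

    assoc[x][h][k] is the assumed association weight linking the allele on
    homologue h at locus x to the allele on homologue k at locus x + 1, so
    N loci require N - 1 matrices.
    """
    n_loci = len(assoc) + 1
    path = [None] * n_loci
    start = rng.randrange(n_loci)        # random initializing locus
    path[start] = rng.randrange(2)       # random allele at that locus
    for x in range(start, n_loci - 1):   # subsequent alleles: use matrix rows
        row = assoc[x][path[x]]
        path[x + 1] = rng.choices([0, 1], weights=row, k=1)[0]
    for x in range(start - 1, -1, -1):   # prior alleles: use matrix columns
        col = [assoc[x][0][path[x + 1]], assoc[x][1][path[x + 1]]]
        path[x] = rng.choices([0, 1], weights=col, k=1)[0]
    return path


# Two fully linked junctions followed by one with independent assortment.
assoc = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
rng = random.Random(0)
paths = [generate_haplopath(assoc, rng) for _ in range(100)]
```

- `random.choices` normalizes the weights internally, so the matrix entries only need to be non-negative with a positive sum in each row and column that is sampled.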
- the Virtual Progeny genome is a collection of permutations of diploid genomes that can each be formed by combining a random Virtual Gamete from one parent (the paternal line, pat) with a random Virtual Gamete from a second parent (the maternal line, mat).
- Each permutation of a Virtual Progeny genome comprises a discrete set of defined integer genotypes.
- a single permutation (i) of a Virtual Progeny genome can take the following form:
- a Virtual Progeny genome can comprise a set of individual permutations of Virtual Progeny genomes:
- $G^{VP} = \{G_1^{VP}, G_2^{VP}, \ldots\}$
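- Combining random Virtual Gametes into a set of Virtual Progeny genome permutations can be sketched as below. The representation is an assumption made for illustration: a gamete is a tuple with one allele per locus, and a diploid genome is a tuple of sorted allele pairs. The gamete pools and the function name are hypothetical.

```python
import random
from collections import Counter


def virtual_progeny_genome(pat_gametes, mat_gametes, n_samplings, rng):
    """Pair random paternal and maternal gametes; return a PMF over genomes."""
    counts = Counter()
    for _ in range(n_samplings):
        pat = rng.choice(pat_gametes)
        mat = rng.choice(mat_gametes)
        # Sort each allele pair so that A/a and a/A count as one genotype.
        genome = tuple(tuple(sorted(pair)) for pair in zip(pat, mat))
        counts[genome] += 1
    return {genome: c / n_samplings for genome, c in counts.items()}


pat_gametes = [("A", "B"), ("a", "b")]   # hypothetical paternal pool
mat_gametes = [("a", "b")]               # hypothetical maternal pool
pmf = virtual_progeny_genome(pat_gametes, mat_gametes, 1000, random.Random(1))
```

- The returned dictionary plays the role of a probability mass function over the discrete genome states that were sampled.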
- a phenome can comprise a measure of phenotypes and traits expressed by a diploid organism over the time period of its life. Numerous discrete or continuous phenotypes associated with discrete genotypes are listed on databases maintained at the National Institutes of Health division of Bioinformatic Information and other public databases. Additional sources of discrete or continuous phenotypes associated with discrete genotypes are located in PubMed, the UCSC browser, NCBI (fancy output genomic browser), Online Mendelian Inheritance in Man (“OMIM”), SNPedia, GeneTests, Entrez Gene, HuGENavigator, HuGENavigator/Genopedia/Search, HuGENavigator/Phenopedia/Search, NextBio Database, and Genetic Association Database.
- databases for SNP and variant datasets include SNP Cluster Report, Genome-Wide Association Studies (National Human Genome Research Institute), Autism Chromosome Rearrangement Database (hosted by The Centre for Applied Genomics), and the Database of Genomic Variants (hosted by The Centre for Applied Genomics).
- the Virtual Progeny phenome (ph VP ) can comprise a single probability density function defined by the summation of the weighted set of phenomes that are individually associated with each permutation of a Virtual Progeny genome.
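- One way to write the weighted summation described above is shown below. The weight $w_i$ (the relative frequency of permutation $i$ among $n$ permutations) and the notation $ph(\cdot)$ for the phenome associated with a single permutation are introduced here for illustration and are not symbols from the disclosure.

$$ph^{VP} = \sum_{i=1}^{n} w_i \, ph\!\left(G_i^{VP}\right), \qquad \sum_{i=1}^{n} w_i = 1$$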
- prior population genetic data can be used to predict the population or populations of origin for a trait-affecting locus, which can be incorporated into models that can be used to predict trait likelihoods of virtual or actual progeny.
- Phenotypes associated with all genotypes in the Virtual Progeny sample space can be integrated to produce an overall assessment of phenotypic likelihoods (in terms of penetrance and expressivity) for each individual trait, alone or in combination.
- the method entails a genotyping panel that lists the disease impacted by each locus, the official name of the locus, the alleles at each locus that are probed, the frequency of these alleles in the population being analyzed, standard reference names for the probes that detect each allele, an abbreviated name for each allele (for the purposes of this illustration), and the impact of the allele on disease risk or disease expression within each genotypic context.
- Table 1 shows information for a two-locus genotyping panel.
- Genotype data captured from each individual who is a component of the analysis is assembled (Table 8).
- a Monte Carlo algorithm is applied to each individual genotype profile to generate a pool of “Virtual Gametes” containing single alleles from each locus.
- a virtual gamete from each designated genetic parent is chosen randomly and combined to produce one permutation of a potential child's genotype profile.
- the process of virtual gamete choice and combination to produce a diploid genome is iterated a sufficient number of times so that the sum of permutations provides a stable estimate of a child's genome likelihood distribution, as illustrated in columns 1 and 2 of Table 9.
- the risk of disease associated with each discrete Virtual Progeny genotype is determined from the previously established risk table (from Table 7 into column 3 of Table 9). Disease risks associated with each genotype are weighted according to their appearance in the set of permutations to assign a normalized disease risk to the particular pairing under analysis (Table 10).
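- The steps above (gamete sampling, random pairing, iteration, and risk weighting) can be sketched end to end for a two-locus panel. The parental profiles and the risk rule below are invented for illustration and do not reproduce the tables referenced above; loci are treated as unlinked, and all names are hypothetical.

```python
import random
from collections import Counter


def make_gamete(profile, rng):
    """Pick one allele per locus from a diploid genotype profile."""
    return tuple(rng.choice(alleles) for alleles in profile)


def pairing_risk(parent_1, parent_2, risk_fn, n_iter, rng):
    """Estimate the normalized disease risk of one pairing by Monte Carlo."""
    counts = Counter()
    for _ in range(n_iter):
        g1 = make_gamete(parent_1, rng)
        g2 = make_gamete(parent_2, rng)
        child = tuple("".join(sorted(pair)) for pair in zip(g1, g2))
        counts[child] += 1
    # Weight each genotype's risk by how often it appeared.
    return sum(risk_fn(child) * c for child, c in counts.items()) / n_iter


def recessive_risk(child):
    """Hypothetical rule: disease if homozygous recessive at either locus."""
    return 1.0 if child[0] == "aa" or child[1] == "bb" else 0.0


parent_1 = (("A", "a"), ("B", "B"))   # carrier at locus 1 only
parent_2 = (("A", "a"), ("B", "b"))   # carrier at both loci
risk = pairing_risk(parent_1, parent_2, recessive_risk, 20000, random.Random(7))
```

- With these inputs the estimate should settle near 1/4, the Mendelian expectation for two carriers at the first locus; the number of iterations controls how stable the estimate is.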
- the final result obtained above is used in conjunction with a chosen risk tolerance cutoff to determine whether the donor should be retained or removed from the client pool. If risk tolerance had been set previously at 1/50,000, the result obtained in this example would lead to no action being taken. Donor B would remain in client #1's pool. However, the other donors would be eliminated as potential parents of offspring. For those donors, an increased likelihood or risk of disease or of a particular trait would have been determined, and they would therefore be removed from the pool of potential donors.
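- Applying a risk tolerance cutoff to a set of per-donor results can be sketched as follows. The risk values are hypothetical placeholders chosen so that only Donor B passes a 1/50,000 cutoff, and treating a risk equal to the cutoff as acceptable is an assumption of this sketch.

```python
from fractions import Fraction


def retain_donors(risk_by_donor, risk_tolerance):
    """Return donors whose estimated risk does not exceed the tolerance."""
    return sorted(
        donor for donor, risk in risk_by_donor.items() if risk <= risk_tolerance
    )


# Hypothetical normalized disease risks for one client's donor pool.
risk_by_donor = {
    "Donor A": Fraction(1, 400),
    "Donor B": Fraction(1, 80_000),
    "Donor C": Fraction(1, 4),
}
retained = retain_donors(risk_by_donor, Fraction(1, 50_000))
```

- Exact fractions are used so that a cutoff such as 1/50,000 is compared without floating-point rounding.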
- the potential parent is a sperm donor. In other instances, the potential parent is an oocyte donor.
- the Virtual Progeny genome, which comprises a collection of likely genome permutations or a distribution of genome states generated through a Markov process, can be interrogated for associated trait expression.
- Each discrete genome in the Virtual Progeny collection or distribution can be evaluated independently for expected trait expression, based on published genotype-phenotype associations.
- the weighted summation of traits expressed by individual genome permutations can yield a Virtual Progeny phenome probability distribution.
- the disclosed methods and systems can be used to provide visual representations of differential virtual progeny results for virtual mating of a client with potential donors (FIG. 2). Further, the virtual matings are compared with one another.
- the methods disclosed herein are used to obtain samples, which are processed by performing genome scans to identify SNPs or CNPs present in each individual's genomic DNA.
- the identified genetic markers are used to expand personal genome data and to determine haplotypes present within each individual's genomic DNA.
- the likelihoods of association of sequential haplotypes are retrieved from lookup tables generated by the international HapMap project. Using the identified haplotypes together with association data, reiteration of a Monte Carlo simulation is performed to generate a haplopath, resulting in a Virtual Gamete population for each individual.
- Virtual Gametes from each individual are combined to produce Virtual Progeny genome samplings, each of which is evaluated for corresponding trait likelihood values. Finally, an integrated Virtual Progeny phenome likelihood distribution is determined to assess the probability that potential progeny express certain traits, such as increased risk of disease.
- This methodology was used to identify probable phenotypes of Virtual Progeny from the simulated matings of four donors with client SRS ( FIG. 2 ).
- Virtual Progeny are generated to analyze the potential phenotypes that would result from the matings of SRS with four different donors.
- for VP-SRSxTJ (i.e., a mating of SRS to TJ), the progeny phenotypes will all be blue eyes.
- for VP-SRSxAFl (i.e., a mating of SRS to AFl), the phenotypes will all be brown eyes.
- the Virtual Progeny likelihood phenotype is 50% blue and 50% brown.
- FIG. 3 shows an example of a visual representation of Virtual Progeny genome profiles for the matings of the SRS to the four donors, all of which are shown across 82 million base pairs of Chromosome 15. The bottom row shows the loci associated with various traits.
- FIG. 4 shows an analysis of Virtual Progeny (“VP”) genotypes.
- Each SNP locus has two alleles (A and D), which can produce three distinct genotypes (AA, AD, DD).
- “A” is used to signify the original “ancestral” allele and “D” is the allele “derived” by mutation.
- A and D each refer to empirically determined nucleotide bases A, C, G, or T. Utilizing reference information from databases known in the art and disclosed herein, “A” is used as a reference allele.
- a SNP genotype is completely described by the number of ancestral alleles it contains (0, 1, 2). For instance, an integer is used to indicate determination of a single definitive genotype. If a progeny is described using a “0” for a particular allele, then the progeny definitively has zero ancestral alleles at the particular locus. If “1” describes the progeny, the progeny definitively has one ancestral allele at that locus. If “2” describes the progeny, then the progeny definitively has both ancestral alleles. In other words, integers occur when both mates are homozygous for a particular allele. Non-integers (0.5, 1.5) indicate restriction to the two genotypes that are ±0.5 removed.
- non-integers indicate that one mate was heterozygous for a particular allele.
- an out-of-bounds integer (5) is used when data from the two virtual parents is non-informative (i.e., both parents are heterozygous at that locus).
- the numbers of loci falling into the categories 0, 0.5, 1, 1.5, 2, and 5 are shown.
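- The integer, non-integer, and out-of-bounds codes described above can be derived from two parental genotypes as sketched below. Parental genotypes are assumed to be two-character strings over “A” (ancestral) and “D” (derived); the function name and the example loci are hypothetical.

```python
from collections import Counter


def vp_code(parent_1, parent_2):
    """Encode the Virtual Progeny genotype at one SNP locus."""
    n1 = parent_1.count("A")   # ancestral alleles carried by parent 1
    n2 = parent_2.count("A")   # ancestral alleles carried by parent 2
    if n1 == 1 and n2 == 1:
        return 5               # both heterozygous: non-informative
    # Otherwise the mean number of ancestral alleles in the progeny:
    # an integer if both parents are homozygous, x.5 if one is heterozygous.
    return (n1 + n2) / 2


# Hypothetical parental genotypes at six loci.
loci = [("AA", "AA"), ("AA", "DD"), ("DD", "DD"),
        ("AA", "AD"), ("DA", "DD"), ("AD", "AD")]
codes = [vp_code(p1, p2) for p1, p2 in loci]
tally = Counter(codes)   # number of loci falling into each category
```

- A code of 1.5 therefore restricts the progeny to one or two ancestral alleles, and 0.5 to zero or one, matching the ±0.5 interpretation given above.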
- each partner provides a different profile for SNPs or CNPs found in the reference databases.
- FIG. 5 provides a more detailed view of actual SNPs or CNPs that are known and can be analyzed using the methods and systems disclosed herein.
- Various SNPs or CNPs and the results of simulated matings of one woman subject with four potential partners are again shown as an integer (0, 1, or 2), a non-integer (0.5 or 1.5), or non-informative (5) (FIG. 5).
- the profile of each VP of each mating is provided for the various traits and diseases.
- the data comparison identified virtual progeny genotypes at over 1,300 SNP loci that influence over 100 disease traits shown in the table.
- Nonlimiting examples of traits that may be assessed using the methods described herein include or relate to ability to roll the tongue, ability to taste PTC, acute inflammation, adaptive immunity, addiction(s), adipose tissue, adrenal gland, age, aggression, amino acid level, amyloidosis, anogenital distance, antigen presenting cells, auditory system, autonomic nervous system, avoidance learning, axial defects or lack thereof, B cell deficiency, B cells, B lymphocytes (e.g., antigen presentation), basophils, bladder size/shape, blinking, blood chemistry, blood circulation, blood glucose level, blood physiology, blood pressure, body mass index, body weight, bone density, bone marrow formation/structure, bone strength, bone/skeletal physiology, breast size/shape, bursae, cancellous bone, cardiac arrest, cardiac muscle contractility, cardiac output, cardiac stroke volume, cardiomyopathy, cardiovascular system/disease, carpal bone, catalepsy, cell abnormalities, cell death, cell differentiation, cell morphology, cell number, cell-
- Nonlimiting traits include cognitive ability (Ruano et al., Am. J. Hum. Genet. 86:113 (2010)); Familial Osteochondritis Dissecans (Stattin et al., Am. J. Hum. Genet. 86:126 (2010)); hearing impairment (Schraders et al., Am. J. Hum. Genet. 86:138 (2010)); mental retardation associated with autism, epilepsy, or macrocephaly (Giannandrea et al., Am. J. Hum. Genet. 86:185 (2010)); muscular dystrophies (Bolduc et al., Am. J. Hum. Genet. 86:213 (2010)); Diamond-Blackfan anemia (Doherty et al., Am.
- hypophosphatemic rickets (Lorenz-Depiereux et al., Am. J. Hum. Genet. 86:267 (2010); Levy-Litan et al., Am. J. Hum. Genet. 86:273 (2010)); rhabdoid tumor predisposition syndrome (Schneppenheim et al., Am. J. Hum. Genet. 86:279 (2010)); and multiple sclerosis (Jakkula et al., Am. J. Hum. Genet. 86:285 (2010)).
- Yet other nonlimiting traits include 21-Hydroxylase Deficiency, ABCC8-Related Hyperinsulinism, ARSACS, Achondroplasia, Achromatopsia, Adenosine Monophosphate Deaminase 1, Agenesis of Corpus Callosum with Neuronopathy, Alkaptonuria, Alpha-1-Antitrypsin Deficiency, Alpha-Mannosidosis, Alpha-Sarcoglycanopathy, Alpha-Thalassemia, Alzheimers, Angiotensin II Receptor, Type I, Apolipoprotein E Genotyping, Argininosuccinicaciduria, Aspartylglycosaminuria, Ataxia with Vitamin E Deficiency, Ataxia Telangiectasia, Autoimmune Polyendocrinopathy Syndrome Type 1, BRCA1 Hereditary Breast/Ovarian Cancer, BRCA2 Hereditary Breast/Ovarian Cancer, Bardet-Biedl Syndrome, Best Vitelliform Macular Dystrophy, Beta-Sarco
- the methods of assessing the probability that progeny will express certain traits can be implemented in systems, programs, and/or services, which can be authorized by, referred by, and/or performed by, e.g., agencies, public or private companies, genetic counseling centers, dating or match-making services, sperm banks, egg providers, reproductive service providers, fertility clinics, or specialty laboratories.
- the methods described herein are integrated into a testing service that can provide information to a couple on the probability that the couple's offspring will express one or more traits described herein, such as risk of a disease.
- referrals to genetic counselors and/or other relevant medical professionals can be provided in order to provide for follow up testing and consultation.
- a Virtual Progeny assessment begins with a customer order, and the customer can pay a service provider a fee in exchange for the assessment.
- a customer can be two potential parents, e.g., partners.
- a customer can be a physician, a genetic counselor, a medical center, an insurance company, a website, a dating service, a matchmaking service, a pharmaceutical company, or a laboratory testing service provider, who places an order on behalf of two potential parents.
- a customer can be two prospective parents who seek to learn whether their offspring will be at risk for developing disease.
- DNA collection kits can be sent to the prospective parents, who can deposit a biological sample described herein into the collection kits.
- the collection kits can then be returned to the company for sending to a specialty lab or can be returned directly to the specialty lab for performing the assessment.
- a specialty lab, whether internal to the company, contracted to work with the company, or external to the company, can isolate the potential parents' DNA from the provided samples for genome scanning from which Virtual Progeny can be generated, as described herein.
- the results can be provided to the potential parents.
- the results can inform the potential parents of the chances that their future offspring will express one or more traits, such as traits described herein.
- the potential parents can also receive, for example, direct phone consultation with a genetic counselor employed by the company, or contact information for genetic counselors and/or other medical professionals who can provide the potential parents with follow up testing and consultation.
- a Virtual Progeny assessment can be offered to a customer in connection with a matchmaking service, for example, through a single company or a co-marketing or partnership relationship.
- a user of a matchmaking service can order an assessment of Virtual Progeny described herein to determine the probability that an offspring resulting from the potential match between the user and a candidate partner will express one or more traits described herein. The user can then use this information to aid in evaluating the candidate partner for a potential match.
- the matchmaking service can be an on-line service, such as Shaadi.com, eHarmony.com and Match.com.
- assessment of Virtual Progeny begins with a customer order, where the customer pays a fee in exchange for the assessment.
- a customer can be a user of a matchmaking service who is interested in evaluating another user for a suitable match.
- Such a customer can use an assessment of Virtual Progeny described herein to learn whether the potential offspring of a match between such customer and a candidate partner will express one or more traits, such as risk of disease.
- a customer can pay for both the customer's and the candidate partner's initial genomic scans with the candidate partner's consent. In other instances, the customer and the candidate partner can also pay separately for the initial genomic scans.
- DNA collection kits can be sent to the customer and the candidate partner, and the customer and the candidate partner can each deposit a biological sample into the collection kit.
- the collection kits can then be returned to the company for sending to a specialty lab or can be returned directly to the specialty lab for processing according to the methods described herein.
- a specialty lab, whether internal to the company, contracted to work with the company, or external to the company, can perform genomic scans on the customer's and candidate partner's DNA from the provided samples and perform an assessment of Virtual Progeny using the methods described herein.
- the results of the assessment can then be provided to the customer and/or the candidate partner, and the customer and/or the candidate partner can use the results of the assessment in determining whether the other party is a suitable match.
- a female client seeking to have a child can have a Virtual Progeny assessment performed with one or more sperm donors to aid in selecting a donor.
- potential sperm donors are first recruited by a sperm bank. Donors who complete the screening process and are considered qualified by the sperm bank then provide a biological sample (such as a buccal swab) that can be processed to obtain whole DNA sequence, SNP genotypes, CNV genotypes, or any other digital genetic information.
- a female client also provides a biological sample, such as a buccal swab, which is used to generate a genome profile for the female client.
- the female client genome is then recombined computationally with each donor genome to generate a series of independent Virtual Progeny genomes, as described herein, representing each potential donor-client combination.
- Each Virtual Progeny genome can then be assessed for the probability of exhibiting one or more traits, such as increased risk of disease.
- incompatible donor-client combinations are subtracted from the total donor pool to obtain a client-specific filtered donor pool, which can be used, e.g., as a starting point for further selection by the client.
- a client can be given information on the probability of the incidence of one or more traits from donor-client combinations, such as traits preselected by the client, for further sperm donor selection by the client.
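- As a minimal sketch of the donor-pool subtraction described above, the following Python snippet drops every donor whose Virtual Progeny exceed a risk threshold for any trait. The function name `filter_donor_pool`, the donor identifiers, the probabilities, and the 0.05 cutoff are all hypothetical; the source does not specify how an "incompatible" combination is defined.

```python
def filter_donor_pool(risk_by_donor, max_risk):
    """Return donors whose estimated trait probabilities all stay at or below max_risk.

    risk_by_donor maps a donor ID to {trait name: estimated probability in Virtual Progeny}.
    Treating "incompatible" as "any trait above max_risk" is an assumption of this sketch.
    """
    compatible = []
    for donor, risks in risk_by_donor.items():
        if all(p <= max_risk for p in risks.values()):
            compatible.append(donor)
    return sorted(compatible)


# Hypothetical Virtual Progeny results for one client paired with three donors.
risk_by_donor = {
    "donor_A": {"trait_1": 0.00, "trait_2": 0.01},
    "donor_B": {"trait_1": 0.25, "trait_2": 0.00},  # e.g., client and donor both carry trait_1
    "donor_C": {"trait_1": 0.00, "trait_2": 0.03},
}
filtered_pool = filter_donor_pool(risk_by_donor, max_risk=0.05)
print(filtered_pool)
```

  Because the filter is computed per client, a donor removed here could remain in the pool of a different client whose genome does not share the same risk alleles.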
- a male client seeking to have a child can have a Virtual Progeny assessment performed with one or more egg donors to aid in selecting a donor.
- Egg donors can provide a biological sample (such as a buccal swab) to generate a genome profile for the egg donor, as described herein.
- the male client also provides a biological sample, such as a buccal swab, which is used to generate a genome profile for the male client.
- the male client genome is then recombined computationally with each egg donor genome to generate a series of independent Virtual Progeny genomes, as described herein, representing each potential donor-client combination.
- Each Virtual Progeny genome can then be assessed for the probability of exhibiting one or more traits, such as increased risk of disease.
- incompatible donor-client combinations are subtracted from the total donor pool to obtain a client-specific filtered donor pool, which can be used, e.g., as a starting point for further selection by the client.
- a client can be given information on the probability of the incidence of one or more traits from donor-client combinations, such as traits preselected by the client, for further egg donor selection by the client.
- a heterosexual couple seeking to use a sperm or egg donor to have a child can use Virtual Progeny assessments to screen potential donors.
- the couple may seek a sperm donor, and the female partner will be the genetic parent of offspring with the sperm donor.
- the couple may seek an egg donor, and the male partner will be the genetic parent of offspring with the egg donor.
- two rounds of Virtual Progeny assessments can be performed.
- a first round of Virtual Progeny assessment is performed using biological samples from the heterosexual couple.
- a second round of Virtual Progeny assessment is performed between the genetic parent and one or more potential donors. The results of the first round of Virtual Progeny assessment can then be compared with the results of the second round, and a donor can be chosen whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the heterosexual couple.
- a female homosexual couple seeking to use a sperm donor to have a child can use Virtual Progeny assessments to screen potential sperm donors. Only one of the female partners will be the genetic parent of offspring with the sperm donor.
- a first round of Virtual Progeny assessment is performed using biological samples from the homosexual couple.
- a second round of Virtual Progeny assessment is performed between the genetic female parent and one or more potential sperm donors. The results of the first round of Virtual Progeny assessment can then be compared with the results of the second round, and a sperm donor can be chosen whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the homosexual couple.
- a Virtual Progeny assessment is also performed with the second female partner and one or more potential sperm donors, and a donor is selected whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the homosexual couple.
- a male homosexual couple seeking to use an egg donor to have a child can use Virtual Progeny assessments to screen potential egg donors. Only one of the male partners will be the genetic parent of offspring with the egg donor.
- a first round of Virtual Progeny assessment is performed using biological samples from the homosexual couple.
- a second round of Virtual Progeny assessment is performed between the genetic male parent and one or more potential egg donors. The results of the first round of Virtual Progeny assessment can then be compared with the results of the second round, and an egg donor can be chosen whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the homosexual couple.
- a Virtual Progeny assessment is also performed with the second male partner and one or more potential egg donors, and a donor is selected whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the homosexual couple.
- a first genomic DNA sample is obtained from a first potential parent and a second genomic DNA sample is obtained from a second potential parent.
- the presence or absence of one or more nucleotide variants is identified at one or more loci of at least one pair of chromosomes of the first and the second genomic DNA samples, and the identified nucleotide variants for the first and second genomic DNA samples are compared to a plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the first and second genomic DNA samples.
- a first diploid genome profile for the first potential parent is constructed.
- the first genome profile comprises the identified haplotypes in the first genomic DNA sample and a linkage probability determined by the frequencies of the identified haplotypes in the plurality of predetermined genomic sequences.
- a second diploid genome profile for the second potential parent is constructed.
- the second genome profile comprises the identified haplotypes in the second genomic DNA sample and a linkage probability determined by the frequencies of the identified haplotypes in the plurality of predetermined genomic sequences.
- a first library is constructed that comprises potential haploid gamete genomes from the first diploid genome profile by generating a combination of the haplotypes identified in the first genomic DNA sample using the linkage probability for each combination of the identified haplotypes.
- a second library is constructed that comprises potential haploid gamete genomes from the second diploid genome profile by generating a combination of the haplotypes identified in the second genomic DNA sample using the linkage probability for each combination of the identified haplotypes.
- the method also entails combining a first haploid gamete genome from the first library with a second haploid gamete genome from the second library to form a diploid progeny genome.
- the diploid progeny genome is compared to a database of genomes relating to disease-associated or genetically influenced traits, thereby assessing the risk of disease or the likelihood of expressing a genetically influenced trait of the potential progeny.
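- The haplotype-identification step above can be illustrated with a small Python sketch. It enumerates the pairs of reference haplotypes that are consistent with an unphased genotype and weights each pair by the reference frequencies. The function name `consistent_haplotype_pairs`, the two-locus panel, and its frequencies are hypothetical, and the Hardy-Weinberg weighting is an assumption of this sketch rather than a rule stated in the source.

```python
from itertools import combinations_with_replacement


def consistent_haplotype_pairs(genotype, panel):
    """Return {(hap1, hap2): probability} for haplotype pairs that explain the genotype.

    genotype: one unordered allele pair per locus, e.g. [("A", "G"), ("C", "T")].
    panel:    {haplotype string: population frequency} (a hypothetical reference panel).
    """
    target = [tuple(sorted(pair)) for pair in genotype]
    weights = {}
    for h1, h2 in combinations_with_replacement(sorted(panel), 2):
        observed = [tuple(sorted(pair)) for pair in zip(h1, h2)]
        if observed == target:
            # Assumed Hardy-Weinberg weight: 2*f1*f2 for distinct haplotypes, f1*f1 otherwise.
            weights[(h1, h2)] = panel[h1] * panel[h2] * (1 if h1 == h2 else 2)
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()} if total else {}


# Hypothetical two-locus reference panel and a doubly heterozygous individual.
panel = {"AC": 0.5, "GT": 0.3, "AT": 0.1, "GC": 0.1}
genotype = [("A", "G"), ("C", "T")]
pair_probabilities = consistent_haplotype_pairs(genotype, panel)
print(pair_probabilities)
```

  In this example two phasings explain the genotype, and the pair built from the more common haplotypes receives most of the probability mass. That ranking is the kind of frequency-derived weight a genome profile could store alongside the identified haplotypes.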
- the methods and systems described herein can be used in combination with one or more processors, having either single or multiple cores.
- the processor can be operatively connected to a memory.
- the memory can be solid state, flash, or nanoparticle based.
- the processor and/or memory can be operatively connected to a network via a network adapter.
- the network can be digital, analog, or a combination of the two.
- the processor can be operatively connected to the memory to execute computer program instructions to perform one or more steps described herein. Any computer language known to those skilled in the art can be used.
- Input/output circuitry can be included to provide the capability to input data to, or output data from, the processor and/or memory.
- input/output circuitry can include input devices, such as keyboards, mice, touch pads, trackballs, scanners, and the like; output devices, such as video adapters, monitors, printers, and the like; and input/output devices, such as modems and the like.
- the memory can store program instructions that are executed by, and data that are used and processed by, CPUs to perform various functions.
- the memory can include electronic memory devices, such as random-access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), and flash memory, and electro-mechanical memory, such as magnetic disk drives, tape drives, and optical disk drives, which can use an integrated drive electronics (IDE) interface, or a variation or enhancement thereof, such as enhanced IDE (EIDE) or ultra direct memory access (UDMA), or a small computer system interface (SCSI) based interface, or a variation or enhancement thereof, such as fast-SCSI, wide-SCSI, fast and wide-SCSI, etc., or a fiber channel-arbitrated loop (FC-AL) interface.
- the systems described herein can also include an operating system that runs on the processor, including UNIX®, OS/2®, and Windows®, each of which can be configured to run many tasks at the same time, i.e., a multitasking operating system.
- the methods are utilized with a wireless communication and/or computation device, such as a mobile phone, personal digital assistant, personal computer, and the like.
- the computing system can be operable to wirelessly transmit data to wireless or wired communication devices using a data network, such as the Internet, or a local area network (LAN), wide-area network (WAN), cellular network, or other wireless networks known to those skilled in the art.
- a graphical user interface can be included to allow human interaction with the computing system.
- the graphical user interface can comprise a screen, such as an organic light emitting diode screen, liquid crystal display screen, thin film transistor display, and the like.
- the graphical user interface can generate a wide range of colors, or a black and white screen can be used.
- the graphical user interface can be touch sensitive, and it can use any technology known to skilled artisans including, but not limited to, resistive, surface acoustic wave, capacitive, infrared, strain gauge, optical imaging, dispersive signal technology, acoustic pulse recognition, frustrated total internal reflection, and diffused laser imaging.
- the generation of a Virtual Progeny genome is a four-step process.
- One of ordinary skill in the art will understand that other steps may be added, combined, or deleted as desired.
- DNA microarrays are used to generate information relating to loci of interest. This information is utilized to produce genome scans that include genotype information from the plurality of loci of interest, which are defined by single base polymorphisms (“SNPs or CNPs”), DNA sequence reads, copy number, or other forms of personal genetic information.
- Jane Doe and John Smith provided samples, which have such information provided for loci 01 through N ( FIG. 1A ).
- the derived genome profile preferably incorporates phasing information in the form of stochastic matrices between haplotypes.
- haplotype structure is shown in the figure below: loci 01-04 are inherited as an indivisible block, which is a haplotype. Stochastic matrices between loci 05 and 06 are shown in block boxes for John and Jane ( FIG. 1B ).
- the UCSC genome browser is used to display phasing over large maternally-inherited chromosomal segments that comprise 100 million base pairs or more ( FIG. 1C ).
- a Monte Carlo simulation or Markov process as described above is used to generate haplopaths through a genome, where haplotypes are transmitted intact, and stochastic matrices are used to move from one haplotype or locus to the next one.
- John Smith's genome is converted into a series of haplopaths by means of a Monte Carlo simulation ( FIG. 1D ).
- Each individual genome profile is used to generate a pool of VirtualGametes ( FIG. 1E ). Exemplary haplopaths in rows 1 and 7 are illustrated.
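- The haplopath generation described above can be sketched as a simple Markov walk. In this illustrative Python snippet, each haplotype block is transmitted intact and a 2x2 stochastic matrix governs the move from one block to the next. The names `simulate_haplopath` and `haplopath_to_gamete`, the block contents, and the matrix values are hypothetical; only the (0,1) homologue indexing follows the figures.

```python
import random


def simulate_haplopath(stochastic_matrices, rng):
    """Walk along a chromosome, choosing homologue 0 or 1 at each haplotype block.

    stochastic_matrices[i][a][b] is the assumed probability of moving from
    homologue a at block i to homologue b at block i + 1 (each row sums to 1).
    """
    state = rng.choice((0, 1))  # either homologue is equally likely at the first block
    path = [state]
    for matrix in stochastic_matrices:
        state = rng.choices((0, 1), weights=matrix[state])[0]
        path.append(state)
    return path


def haplopath_to_gamete(path, blocks):
    """Read out the haplotype carried on the chosen homologue at each block."""
    return [blocks[i][homologue] for i, homologue in enumerate(path)]


# Hypothetical profile with three haplotype blocks, each transmitted intact.
blocks = [("ACGT", "ACTT"), ("G", "A"), ("TT", "TC")]
matrices = [
    [[0.9, 0.1], [0.1, 0.9]],  # blocks 0 -> 1: tightly linked
    [[0.5, 0.5], [0.5, 0.5]],  # blocks 1 -> 2: effectively unlinked
]
rng = random.Random(42)
gamete_pool = [haplopath_to_gamete(simulate_haplopath(matrices, rng), blocks) for _ in range(5)]
for gamete in gamete_pool:
    print(gamete)
```

  Identity matrices would model complete linkage, with every haplopath staying on one homologue, while matrices with all entries at 0.5 reduce to independent assortment between blocks.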
- Step 4: Virtual Progeny Permutations from Random VirtualGametes from Each Individual
- the number of iterations may be between about 10 and about 100. More preferably, the number of iterations may be between about 100 and about 1000. Most preferably, the number of iterations may be between about 1000 and about 100,000. In another aspect, the number of iterations may be about 50 or greater. More preferably, the number of iterations may be about 150 or greater. Most preferably, the number of iterations may be about 3000 or greater.
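- To give a rough sense of why the iteration count matters, the following Python sketch computes the binomial standard error of an estimated trait probability at several iteration counts. It assumes each iteration is an independent draw, which is a modeling assumption of this sketch; the function name `standard_error` and the example probability of 0.25 are illustrative only.

```python
import math


def standard_error(p, n_iterations):
    """Binomial standard error of a proportion p estimated from n independent iterations."""
    return math.sqrt(p * (1.0 - p) / n_iterations)


# Illustrative true probability of 0.25 (e.g., a recessive trait with two carrier parents).
for n in (10, 100, 1000, 3000, 100000):
    print(n, round(standard_error(0.25, n), 4))
```

  Under this assumption the uncertainty shrinks in proportion to the inverse square root of the number of iterations, so each tenfold increase in iterations reduces it by a factor of about 3.2.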
Description
- This application is a continuation of U.S. patent application Ser. No. 14/293,541, filed on Jun. 2, 2014, which is a continuation of U.S. patent application Ser. No. 12/908,636 (now U.S. Pat. No. 8,805,620), filed Oct. 20, 2010, which claims priority to U.S. Provisional Patent Application No. 61/253,108, filed Oct. 20, 2009, each of which is hereby incorporated by reference in its entirety for all purposes.
- In 1866, Mendel described the rules of probability that govern the inheritance of simple genetic traits. He demonstrated that individuals who show no evidence of a disease can still be silent “carriers” of a disease mutation. The disease only appears in a child who inherits two defective copies of a particular gene from two parents who are carriers. Diseases and other traits inherited in this manner are referred to as Mendelian. Thousands of Mendelian diseases of metabolism and other physiological functions have been described in the medical literature.
- Although each individual Mendelian disease mutation is relatively rare, such mutations are cumulatively so common that nearly every human being is a carrier for at least one. Carrier testing can identify two potential parents who have the same mutation and thus have a 25% likelihood of transmitting the corresponding disease to their child.
- The vast majority of heritable attributes that distinguish one person from another in morphology, physiology, disease resistance and susceptibility, and mental function result from complex interactions among multiple genes and non-gene loci. Therefore, in most cases, the likelihood that two healthy parents will have a child with genetically-influenced health problems cannot be determined by simply comparing the carrier status of each individual since carrier status has no meaning in the context of a complex genetic trait.
- Over the last decade, a pair of conceptual and technological breakthroughs has revolutionized the practice and potential of human genetic analysis. The conceptual breakthrough emerged from the discovery that most genetic variations in the global human population are confined to a limited number of chromosomal positions where one of two letters in the DNA alphabet can occur. These variant genomic positions are called single nucleotide polymorphism (“SNP”) loci.
- The SNP conceptual breakthrough alone would not have been enough to transform human genetics if the process of determining DNA genotypes remained as tedious as it had been just a few years earlier. But high complexity DNA microarrays allowed the development of a technology for screening increasingly large numbers of SNPs or CNPs at ever-lower cost. A current state-of-the-art DNA microarray can assay over two million genotypes in an individual human genome.
- Advances in DNA microarray design have enabled the detection of many types of characterized genetic variants. Whether simple or complex, most types of genetic variants can be defined in molecular terms as unique SNPs or CNPs for the purpose of analysis.
- The ultimate description of a genome is its two DNA sequences, each 3 billion base pairs long. The cost of obtaining a complete personal genome sequence is dropping rapidly. Scientists predict that it will become affordable to average consumers within a few years.
- High complexity DNA microarray technologies, high throughput whole genome sequencing, and accompanying information technologies have revolutionized the field of human genetics, with extraordinary advances in understanding the genetic basis for complex traits and an enormous depth of public-access genetic datasets that increase in size daily.
- Among other recent advances, scientists can now use cost-effective tools to analyze a patient's genome and predict susceptibility to thousands of medical conditions, including mental illness, neurological diseases, cancer, stroke and heart disease.
- While progress has been made in developing computational tools that use information from an individual's genome to predict the likelihood of disease for that person, these tools cannot be applied to the pre-conception prediction of disease in a person's child. Thus, there is a need for methods of assessing the inheritance of such complex attributes prior to, or in place of, conception.
- A pre-conception method is provided herein for predicting the likelihood that a hypothetical child of any two persons, of opposite or same sex, who may or may not be fertile, will express any trait or disease that is subject to genetic influences that have been previously characterized, completely or partially.
- Based on an application of formal rules of genetics to a pair of personal genome profiles or whole genome sequences, the methods allow for the simulation of the generation of multiple VirtualGametes, each of which contains a single allele from each genotype in combinatorial frequencies estimated for naturally produced sperm and eggs.
- The simplest form of VirtualGamete production uses Mendel's Law of independent assortment with a random number generator to choose one copy of each genetic locus independently of others for incorporation into a haploid genome profile. Accuracy and resolution are augmented with high complexity personal genome profiles that can be used to provide phasing information and further genotype imputation.
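- A minimal Python sketch of this simplest form of VirtualGamete production follows. It assumes a genome profile stored as a mapping from locus name to the two alleles carried by the individual; the function name `make_virtual_gamete`, the locus names, and the alleles are hypothetical.

```python
import random


def make_virtual_gamete(genome_profile, rng):
    """Choose one of the two alleles at every locus, independently of all other loci."""
    return {locus: rng.choice(alleles) for locus, alleles in genome_profile.items()}


# Hypothetical diploid genome profile: locus -> (allele on homologue 0, allele on homologue 1).
jane = {"locus01": ("A", "G"), "locus02": ("C", "C"), "locus03": ("T", "G")}

rng = random.Random(0)  # seeded only so that the sketch is reproducible
gamete = make_virtual_gamete(jane, rng)
print(gamete)
```

  Because every locus is drawn independently, this version ignores linkage between neighboring loci, which is the limitation that phasing information is meant to address.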
- A uniquely derived VirtualGamete from one personal genome is combined with a uniquely derived VirtualGamete from the second personal genome to produce a discrete Virtual Progeny (“VP”) genome sampling containing two definitive copies of each locus. This computational process is repeated a sufficient number of times to obtain a reproducible Virtual Progeny genome.
- A Virtual Progeny genome is a sufficiently large sampling of discrete genomes that are cumulatively representative of the likelihoods of alternative genotype combinations in a hypothetical child conceived from the two progenitors.
- Each discrete sampling that makes up a Virtual Progeny genome is evaluated independently for the likelihood of expressing any trait for which a genetic correlation has been previously determined. The traits associated with each Virtual Progeny sampling are normalized and combined to obtain a Virtual Progeny phenome distribution.
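- The normalization step can be illustrated with a short Python sketch that classifies each discrete sampling and converts the counts into a distribution. The classifier `classify_recessive`, its three labels, the locus name, and the four hand-written samplings are hypothetical, and a single fully recessive locus is assumed purely for illustration.

```python
from collections import Counter


def classify_recessive(genotype, risk_allele="d"):
    """Label a two-allele genotype under an assumed fully recessive model."""
    copies = genotype.count(risk_allele)
    return ("unaffected", "carrier", "affected")[copies]


def phenome_distribution(samplings, locus, classify=classify_recessive):
    """Evaluate each sampling independently, then normalize the label counts to fractions."""
    counts = Counter(classify(sampling[locus]) for sampling in samplings)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Four hypothetical Virtual Progeny samplings from two carrier (D/d) progenitors.
samplings = [
    {"locus01": ("D", "D")},
    {"locus01": ("D", "d")},
    {"locus01": ("d", "D")},
    {"locus01": ("d", "d")},
]
distribution = phenome_distribution(samplings, "locus01")
print(distribution)
```

  In practice the samplings would come from many simulated gamete pairings rather than a hand-written list, and complex traits would need a scoring rule that spans multiple loci.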
- Virtual Progeny may be evaluated for several thousand rare Mendelian diseases as well as hundreds of more common complex diseases such as diabetes, arteriosclerosis, autism, and schizophrenia. Disease likelihood values for a Virtual Progeny sampling can be calculated indirectly by protein modeling, computational analysis, and geographical origin of chromosomal regions, in addition to direct empirical associations.
- Evaluation of Virtual Progeny will be informed by additional factors not used on actual individuals, including the likelihood and length of particular chromosome regions that could be inherited in an identical, homozygous form from two heterozygous parents, which correlates with an increased frequency of fetal loss and children born with “birth defects.”
- Individuals and couples using a third party to reproduce can compare Virtual Progeny produced with each potential genetic partner in a database.
- Sperm donor and egg donor agencies can use Virtual Progeny to screen out specific client/donor “pairings” that could give rise to offspring with increased disease risk. Sperm donors who represent a heightened genetic risk in specific combination with a particular client will be removed from that client's pool of potential donors. It is likely that these same donors will show no evidence of risk in combination with most other clients. Since the generation of Virtual Progeny does not depend on interpreting genome profiles of the progenitors, it does not involve carrier screening, and is not burdened with problems of false stigmatization.
- Same-sex and infertile couples will be able to simulate the genomic profile of their “own” purely hypothetical child and match this profile to the ones created virtually with selected donors.
- Individuals searching for a reproductive partner through a matchmaking agency can use Virtual Progeny comparisons to distinguish strong genetic matches based on desired and undesired offspring traits.
- Committed couples can use their virtual child's profile to prevent serious conditions through IVF technology or to prepare for their child's unique gifts and needs.
- The foregoing and other objects of the methods disclosed herein, the various features thereof, as well as the methods and compositions themselves, may be more fully understood from the following description, when read together with the accompanying drawings, in which:
- FIG. 1A is a schematic illustration of an exemplary protocol for creating Virtual Progeny where DNA samples from two individuals (Jane Doe and John Smith) are processed to generate genome scans.
- FIG. 1B is a schematic illustration showing genome scans expanded into genome profiles with additional imputed genotypes and phasing information for genetic loci 01 through N.
- FIG. 1C is a schematic illustration showing the genome profile for John Smith, taking into account the profile for loci 01 through N.
- FIG. 1D is a schematic illustration showing a random VirtualGamete from each of Jane Doe and John Smith, the VirtualGametes being combined to form a Virtual Progeny genome sampling.
- FIG. 1E is a schematic illustration showing the haplopaths in rows 1 and 7.
- FIG. 2 is a representation of Virtual Progeny results for SRS and four partners in the gene region responsible for blue eye color.
- FIG. 3 is a schematic illustration of Virtual Progeny genome profiles containing simulated genotype information at approximately 10,000 SNP loci across 82 million base pairs of chromosome 15. The bottom row shows the location of every SNP locus in this region (each index mark associated with a differential trait).
- FIG. 4 is a summary of Virtual Progeny determinations from four matings of a client (“SRS”) to four potential partners in which each SNP locus has two alleles, designated A and D. Six classes of genotype results are possible for each SNP or CNP in a Virtual Progeny genome. The number of SNPs or CNPs in each class is shown for each mating.
- FIGS. 5A-5F are schematic representations showing various diseases or traits associated with various SNPs or CNPs together with the genotype results of simulated mating of one woman subject to four potential partners.
- FIGS. 6A and 6B are graphical representations of 100 haplopaths (H1-H100) across 9 SNP loci numbered 64-72, generated from 100 Monte Carlo simulations, where homologue index numbers are (0,1).
- All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below.
- For convenience, certain terms employed in the specification, examples, and appended claims are collected here. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The initial definition provided for a group or term herein applies to that group or term throughout the present specification individually or as part of another group, unless otherwise indicated.
- The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.
- The term “or” is used herein to mean, and is used interchangeably with, the term “and/or,” unless context clearly indicates otherwise.
- The term “about” is used herein to mean a value − or +20% of a given numerical value. Thus, “about 60%” means a value of between 60−(20% of 60) and 60+(20% of 60) (i.e., between 48 and 72).
- As used herein, “haploid cell” refers to a cell with a haploid number (n) of chromosomes.
- “Gametes”, as used herein, are specialized haploid cells (e.g., spermatozoa and oocytes) produced through the process of meiosis and involved in sexual reproduction.
- As used herein, “gametotype” refers to single copies with one allele of each of one or more loci in the haploid genome of a single gamete.
- As used herein, an “autosome” is any chromosome exclusive of the X and Y sex chromosomes.
- As used herein, “diploid cell” has a homologous pair of each of its autosomal chromosomes, and has two copies (2n) of each autosomal genetic locus.
- The term “chromosome”, as used herein, refers to a molecule of DNA with a sequence of basepairs that corresponds closely to a defined chromosome reference sequence of the organism in question.
- The term “gene”, as used herein, refers to a DNA sequence in a chromosome that codes for a product (either RNA or its translation product, a polypeptide) or otherwise plays a role in the expression of said product. A gene contains a DNA sequence with biological function. The biological function may be contained within the structure of the RNA product or a coding region for a polypeptide. The coding region includes a plurality of coding segments (“exons”) and intervening non-coding sequences (“introns”) between individual coding segments and non-coding regions preceding and following the first and last coding regions respectively.
- As used herein, “locus” refers to any segment of DNA sequence defined by chromosomal coordinates in a reference genome known to the art, irrespective of biological function. A DNA locus can contain multiple genes or no genes; it can be a single base pair or millions of base pairs.
- As used herein, a “polymorphic locus” is a genomic locus at which two or more alleles have been identified.
- As used herein, an “allele” is one of two or more existing genetic variants of a specific polymorphic genomic locus.
- As used herein, a “single nucleotide polymorphism” or “SNP” is a particular base position in the genome where alternative bases are known to distinguish one individual from another. Most categories of more complex genetic variants can be reduced for analytical purposes to one or a few defining SNPs or CNPs.
- As used herein, a “copy number variant” or “CNV” is a deletion or duplication of a large block of genetic material that exists in a population at a frequency less than 1%.
- As used herein, a “copy number polymorphism” or “CNP” is a deletion or duplication of a large block of genetic material that exists in a population at a frequency of greater than 1%. Since a CNV in one population can be a CNP in a second population, the two terms can be used interchangeably.
- As used herein, “genotype” refers to the diploid combination of alleles at a given genetic locus, or set of related loci, in a given cell or organism. A homozygous subject carries two copies of the same allele and a heterozygous subject carries two distinct alleles. In the simplest case of a locus with two alleles “A” and “a”, three genotypes can be formed: A/A, A/a, and a/a.
- As used herein, “genotyping” refers to any experimental, computational, or observational protocol for distinguishing an individual's genotype at one or more well-defined loci.
- As used herein, a “haplotype” is a unique set of alleles at separate loci that are normally grouped closely together on the same DNA molecule, and are observed to be inherited as a group. A haplotype can be defined by a set of specific alleles at each defined polymorphic locus within a haploblock.
- As used herein, a “haploblock” refers to a genomic region that maintains genetic integrity over multiple generations and is recognized by linkage disequilibrium within a population. Haploblocks are defined empirically for a given population of individuals.
- As used herein, “linkage disequilibrium” is the non-random association of alleles at two or more loci within a particular population. Linkage disequilibrium is measured as a departure from the null hypothesis of linkage equilibrium, where each allele at one locus associates randomly with each allele at a second locus in a population of individual genomes.
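As a minimal sketch of this definition, the departure from linkage equilibrium at two biallelic loci is commonly summarized by the coefficient D = p_AB − p_A·p_B, which is zero when alleles associate randomly. The function name, the input format, and the toy data below are illustrative assumptions and are not part of the disclosed method.

```python
def linkage_disequilibrium(haplotypes):
    """Return D = p_AB - p_A * p_B for two loci.

    `haplotypes` holds one (allele_at_locus_1, allele_at_locus_2) tuple per
    observed chromosome. The alleles on the first chromosome are used as the
    reference alleles A and B (an arbitrary, illustrative choice).
    """
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0]
    p_a = sum(1 for h in haplotypes if h[0] == ref_a) / n
    p_b = sum(1 for h in haplotypes if h[1] == ref_b) / n
    p_ab = sum(1 for h in haplotypes if h == (ref_a, ref_b)) / n
    return p_ab - p_a * p_b


# Hypothetical sample of 10 chromosomes (toy data, not from the specification).
sample = [("A", "B")] * 4 + [("a", "b")] * 4 + [("A", "b")] + [("a", "B")]
d_value = linkage_disequilibrium(sample)  # 0.4 - 0.5 * 0.5, roughly 0.15
```

A positive value indicates that the reference alleles co-occur more often than expected by chance; a population in which every allele combination is equally frequent gives D = 0.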
- As used herein, a “genome” is the total genetic information carried by an individual organism or cell, represented by the complete DNA sequences of its chromosomes.
- As used herein, a “genome profile” is a representative subset of the total information contained within a genome. A genome profile contains genotypes at a particular set of polymorphic loci.
- As used herein, a “personal genome profile”, abbreviated PGP, is the genome profile of a particular individual person.
- As used herein, a genetic “trait” is a distinguishing attribute of an individual, whose expression is fully or partially influenced by an individual's genetic constitution.
- As used herein, a “phenotype” is a class of alternative traits which may be discrete or continuous.
- As used herein, a “haplopath” is a haploid path laid out along a defined region of a diploid genome by a single iteration of a Monte Carlo simulation or a single chain generated through a Markov process. A haplopath can be formed by starting at one end of a personal chromosome or genome and walking from locus to locus, choosing a single allele at each step based on available linkage disequilibrium information, inter-locus allele association coefficients, and formal rules of genetics that describe the natural process of gamete production in a sexually reproducing organism. A “haplopath” is generated through the application of formal rules of genetics that describe the reduction of the diploid genome into haploid genomes through the natural process of meiosis.
- As used herein, a “Virtual Gamete” is a single haplopath that extends across an entire genome.
- As used herein, a “Virtual Progeny genome sampling” is the discrete genetic product of two Virtual Gametes.
- As used herein, a “Virtual Progeny genome” is a collection of discrete Virtual Progeny genome samplings, each generated by combining two uniquely-derived random Virtual Gametes. In some instances, a Virtual Progeny genome is represented as a probability mass function over a sample space of all discrete genome states. In some instances, a Virtual Progeny is an informed simulation of a child or children that might result as a consequence of sexual reproduction between two individuals.
- As used herein, a “Virtual Progeny phenome” is a multi-dimensional likelihood function representing the likelihood and/or likely degree of expression of a set of one or more traits from a complete Virtual Progeny genome. In some instances, a Virtual Progeny phenome is represented as a probability mass function over a sample space of discrete or continuous phenotypic states. In some instances, a Virtual Progeny phenome is an informed simulation of a child or children that might result as a consequence of sexual reproduction between two individuals.
- As used herein, “partner” includes a marriage partner, sexual or reproductive partner, domestic partner, opposite-sex partner, and same-sex partner.
- The methods and compositions disclosed herein relate to assessing the genotypes of individuals and the phenotypes associated with particular genotypes of potential progeny from such individuals. Generally, genome profiles from two individuals are used to determine the probabilities that potential progeny from such individuals will express certain traits, such as an increased risk of disease. Such methods are referred to herein as “Virtual Progeny assessment.”
- Disclosed herein are methods of generating a library of potential haploid gamete genomes from an individual diploid subject. The methods comprise obtaining a genomic DNA sample from a diploid subject. In addition, the presence or absence of one or more nucleotide variants (e.g., SNPs or CNPs) is identified at one or more loci of at least one pair of chromosomes of the genomic DNA. In particular embodiments, a haploid gamete genome from a sampling of the potential gamete pool is generated from the diploid genome profile. In such instances, a sampling of a potentially very large number of gametes is performed rather than identifying each gamete from a population (i.e., “pool”) of gametes. These identified nucleotide variants are compared to a plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the genomic DNA.
- In certain embodiments, the methods also entail the construction of a diploid genome profile for the subject. In certain embodiments, the genome profile comprises the identified haplotypes and a linkage probability determined by the frequencies of the haplotypes in the plurality of predetermined genomic sequences. A haploid gamete genome for each potential gamete can be generated from the diploid genome profile by generating a combination of the identified haplotypes using the linkage probability for each combination of the identified haplotypes.
- Aspects of the methods and systems disclosed herein involve generating a library of potential haploid gamete genomes from an individual diploid subject. The methods comprise providing a database having a plurality of predetermined genomic sequences. In certain embodiments, a first proportion of the genomic sequences comprises a first predetermined haplotype adjacent to a second predetermined haplotype. Also, a second proportion of the genomic sequences comprises the first predetermined haplotype adjacent to a third predetermined haplotype. In particular embodiments, a genomic DNA sample from the subject is obtained and the presence or absence of one or more nucleotide variants at one or more loci of at least one pair of chromosomes of the genomic DNA is identified. The identified nucleotide variants are compared to the database to allow for identification of a plurality of sample haplotypes present in the at least one pair of chromosomes. In certain embodiments, the plurality of sample haplotypes comprises the first haplotype adjacent to a wobble haplotype. In some embodiments, the wobble haplotype is either the second haplotype or the third haplotype.
- The methods further entail constructing a diploid genome profile for the at least one pair of chromosomes of the subject from the identified sample haplotypes. The genome profile comprises the first haplotype, the wobble haplotype, and a linkage probability determined by the proportion of the predetermined genomic sequences that comprise the first haplotype adjacent to the wobble haplotype. A haploid gamete genome is generated for each potential gamete from the diploid genome profile by linearly combining the first haplotype and the wobble haplotype using the linkage probability. The diploid profiles generated from the methods disclosed herein, in some embodiments, comprise additional identified haplotypes and additional linkage probabilities determined by the proportion of the predetermined genomic sequences that comprise the first predetermined haplotype adjacent to each additional haplotype.
- A method of selecting a potential sperm or oocyte donor is further disclosed. Specifically, donor genomic DNA samples from potential sperm donors or potential oocyte donors are obtained, as well as a recipient genomic DNA sample from a potential recipient. The DNA is analyzed to identify the presence or absence of one or more nucleotide variants at one or more loci of at least one pair of chromosomes of the donor genomic DNA samples and the recipient genomic DNA sample. The identified nucleotide variants are compared to a plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the donor genomic DNA samples and the recipient genomic DNA sample.
- Using this information, a donor diploid genome profile is constructed for each potential donor. Each donor diploid genome profile comprises the identified haplotypes in each donor genomic DNA sample and a linkage probability determined by the frequencies of the identified haplotypes in the plurality of predetermined genomic sequences. In addition, a recipient diploid genome profile for the potential recipient is constructed. As for the donor, the recipient genome profile comprises the identified haplotypes (in this case, the haplotypes in the recipient genomic DNA sample) and a linkage probability determined by the frequencies of the identified haplotypes in the plurality of predetermined genomic sequences. Libraries are generated for donors and recipients, each library comprising potential haploid gamete genomes from a diploid genome profile. Each potential haploid gamete genome is generated by combining the haplotypes identified in the respective genomic DNA samples using the linkage probability for each combination of the identified haplotypes. Each haploid gamete genome from each donor library is independently combined with a second haploid gamete genome from the recipient library to form a library of diploid progeny genomes for each potential donor. The diploid progeny genomes are compared to a database of disease-associated genomes to assess the risk of disease of each potential progeny, wherein a sperm donor or an oocyte donor is eliminated from consideration if an increased risk of disease of the potential progeny is determined.
-
FIGS. 1A-1F depict an exemplary method of the methods disclosed herein. The exemplary method shows how to create a Virtual Progeny. As illustrated in FIGS. 1A-1F, DNA samples from two individuals are obtained. The samples are then processed by performing genome scans to identify genotypes at genetic markers, such as SNPs or CNPs, present in each individual's genomic DNA. Identified genotypes can be used to expand a personal genome profile by imputation of genotypes at non-processed genetic markers and to determine haplotypes present within each individual's genomic DNA. The likelihoods of association of sequential haplotypes are retrieved from lookup tables generated by the International HapMap project.
- Using the identified haplotypes together with association data, a Monte Carlo simulation is performed to generate haplopaths that extend across each genome, resulting in a Virtual Gamete population for each individual. A Virtual Gamete from each individual is combined to produce a Virtual Progeny genome sampling, each of which is evaluated for corresponding trait likelihood values. The entire process is repeated a large number of times, each time starting with a new Monte Carlo simulation of Virtual Gametes. An integrated Virtual Progeny phenome likelihood distribution is determined to assess the probability that potential progeny express certain traits, such as increased risk of disease.
- As disclosed herein, the methods comprise obtaining a genomic DNA sample from the subject. These methods of assessing potential Virtual Progeny involve performing genome scans on individuals. In certain embodiments, the individuals share a common ancestry. Genome scans can be performed using any of a number of known procedures. For example, a biological sample from an individual can first be obtained. Such biological samples include, but are not limited to, a bodily fluid (such as urine, saliva, plasma, or serum) or a tissue sample (such as a buccal tissue sample or buccal cell). The biological sample can then be used to perform a genome scan using known methods. For example, DNA arrays can be used to analyze at least a portion of the genomic sequence of the individual. Exemplary DNA arrays include GeneChip Arrays, GenFlex Tag arrays, and Genome-Wide Human SNP Array 6.0 (available from Affymetrix, Santa Clara, Calif.).
- In certain embodiments, whole or partial genome sequence information is used to perform the genome scans. Such sequences can be determined using standard sequencing methods including chain-termination (Sanger dideoxynucleotide), dye-terminator sequencing, and SOLiD™ sequencing (Applied Biosystems). Whole genome sequences can be cut by restriction enzymes or sheared (mechanically) into shorter fragments for sequencing. DNA sequences can also be amplified using known methods such as PCR and vector-based cloning methods (e.g., Escherichia coli).
- In some embodiments, at least a portion of an individual's genetic material (e.g., DNA, RNA, mRNA, cDNA, other nucleotide bases or derivative thereof) is scanned or sequenced using, e.g., conventional DNA sequencers or chip-based technologies, to identify the presence or absence of one or more SNPs or copy number polymorphisms (“CNPs”) and their corresponding alleles.
- A scanning step can involve scanning at least about 1,000 bases, at least about 5,000 bases, at least about 10,000 bases, at least about 20,000 bases, at least about 50,000 bases, at least about 100,000 bases, at least about 200,000 bases, at least about 500,000 bases, at least about 1,000,000 bases, at least about 2,000,000 bases, at least about 5,000,000 bases, at least about 10,000,000 bases, at least about 20,000,000 bases, at least about 50,000,000 bases, at least about 100,000,000 bases, at least about 200,000,000 bases, at least about 500,000,000 bases, at least about 1,000,000,000 bases, at least about 2,000,000,000 bases, or at least 3,000,000,000 bases of an individual's genetic material.
- In certain instances, nucleotide bases are scanned from a first set of individuals (e.g., at least about 10 individuals, at least about 20 individuals, at least about 30 individuals, at least about 40 individuals, at least about 50 individuals, at least about 100 individuals, at least about 250 individuals, at least about 500 individuals, or more), and genetic variations between individuals are identified. Genetic variation data generated from each individual can be compared with genetic variation data generated from other individuals in a first set of individuals to discover genetic variations among the first group of individuals.
- The variations identified in the first set of individuals can be used in subsequent studies in which such variations are analyzed to determine if they are associated with a phenotype-of-interest. These variations can include, e.g., SNPs or CNPs, common SNPs or CNPs, informative SNPs or CNPs, rare SNPs or CNPs, deletions, insertions, or frameshift mutations. Such genetic variations can be detected in, e.g., genomic DNA, RNA, mRNA, or derivatives thereof. In some instances, genetic variations scanned and/or identified are informative SNPs or CNPs.
- In certain instances, instead of scanning and reading all of the bases from each genome or all common SNPs or CNPs, a limited number of informative SNPs or CNPs, e.g., about 300 to about 500,000, can be scanned or read. Thus, while in some instances scanning whole genomes is contemplated, in other instances, only specific chromosomes, loci, common SNPs or CNPs, or informative SNPs or CNPs are scanned and/or used. Specific chromosomes, loci, common SNPs or CNPs, or informative SNPs or CNPs can be selected based on prior knowledge that such regions are related to a particular phenotype of interest.
- In some instances, the scanning step is supplemented and/or substituted by obtaining data on genetic variations from databases. These genomic sequences have been predetermined. Such databases can provide, for example, a list of identified genetic variations (e.g., SNPs or CNPs or haplotypes) or genotyping data on particular individuals or populations. Examples of publicly available databases useful in the methods described herein include, but are not limited to, UCSC's Genome Browser, NCBI's dbSNP, MIT's human SNP database, University of Geneva's human Chromosome 21 SNP database, and the University of Tokyo's SNP database. Other databases known in the art can be used in conjunction with the methods described herein.
- Because of the haplotype structure of the human genome, analysis of a relatively small number of SNP loci can be used to profile an entire human genome. For example, a haplotype, containing dozens or hundreds of SNP alleles, can be “tagged” with just a few well-chosen “Tag SNPs or CNPs”. A nearly complete whole genome profile of an individual can thus be obtained, e.g., by using a DNA microarray that distinguishes genotypes at around 500,000 Tag SNPs or CNPs.
- The methods disclosed herein are described below and are illustrated by haploblock information obtained from the International HapMap project (hapmap.org) for a population of European ancestry. In the exemplary method, the presence or absence of one or more nucleotide variants (e.g., SNPs or CNPs) is identified at one or more loci of at least one pair of chromosomes of the genomic DNA, and these identified nucleotide variants are compared to a plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the genomic DNA. In addition, a haplogroup and its haplotype members can be represented in tabular form for a 14,699 base region of human chromosome 15 from position 25,942,585 to 25,957,284 in the hg18 build of the human reference genome, which is given an index number of “X=69” for this particular download. This haplogroup contains λ=18 loci with unique IDs displayed in row 0, and v=6 haplotype variants represented in rows 1-6. The European frequency of each haplotype is indicated in the first column (column 0). The six shown add up to 96% of total variation. The block of elements with bolded perimeter (columns 19-22 of rows 1-6) represents a modified stochastic matrix (“A^X”) with the observed joint frequencies of occurrence of the corresponding row haplotype of X together with row haplotype k of haplogroup X+1.
- For convenience in describing the genetic algorithms presented in this embodiment, the same tabular format shown in Table 1 below and the same formal genetic notation described below are used to denote all loci and meta-loci with one or more loci, whether variant or nonvariant. Note that the tables below show haplotypes and association matrices.

TABLE 1 (Table 1 discloses SEQ ID Nos: 1-6, respectively, in order of appearance.)
X = 69; chr 15 from 25,942,585 to 25,957,284 = 14,699 bases; var: 6; 96% of the population

Row  Col 0   Cols 1-18                            Col 19  Col 20  Col 21  Col 22
0    69                                           1       2       3       4
1    0.425   C A C G C G T G C G G T G C G G G C  0.425   0.000   0.000   0.000
2    0.267   C A A G C G T G C G A T G C G G G C  0.267   0.000   0.000   0.000
3    0.117   C A C G C G T G C G G T A C G G G C  0.117   0.000   0.000   0.000
4    0.067   A C C A T A T G T G G C G T A T A T  0.000   0.067   0.000   0.000
5    0.058   C C C G C G T G C G G T G T G G G C  0.000   0.000   0.058   0.000
6    0.025   A C C G C G C A C T G T G T G T G C  0.000   0.000   0.000   0.025

G#1: AM   CC CC GG AG GG GG GG CC        (1, 2)
G#2: LMS  AC CC AG CT GG GG GG GT AG CT  (3, 4)

Notation:
$L_X$ := indexed meta-locus; $X$ := meta-locus index number → $[\iota_{00}]$
$\lambda$ := number of loci; locus $j$ data → $[\iota^{X}_{*j}]$, $j \in (1, \ldots, \lambda)$
$v$ := number of variants or haplotypes; variant $i$ → $[\iota^{X}_{i*}]$, $i \in (1, \ldots, v)$
$f(i)$ := population frequency of a variant $i$ → $[\iota^{X}_{i0}]$
$f(i^{X} \mid k^{X+1})$ → $[\iota^{X}_{i,\lambda+k}] \equiv [a^{X}_{i,k}]$

- As shown above, genome scans performed for two individuals (“AM” and “LMS”) identify the presence of particular bases at a subset of these loci (rows G#1 and G#2).
- The methods further comprise comparing the plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the genomic DNA. In certain embodiments, the assessment of Virtual Progeny includes imputing from publicly available information to further characterize the genotypes of individuals. For example, through an automated process, the composition of haplogroup 69 described above can be reduced to the haplotypes shown in Table 2.
TABLE 2 (Table 2 discloses SEQ ID Nos: 1-4, respectively, in order of appearance.)
X = 69; 18 loc; chr 15 from 25,942,585 to 25,957,284 = 14,699 bases

Row       Col 0   Cols 1-18                            Col 19  Col 20  Col 21  Col 22
0                                                      1       2       3       4
G#1: AM           CC CC GG AG GG GG GG CC
1         0.425   C A C G C G T G C G G T G C G G G C  0.425   0.000   0.000   0.000
2         0.267   C A A G C G T G C G A T G C G G G C  0.267   0.000   0.000   0.000
G#2: LMS          AC CC AG CT GG GG AG GT AG CT
3         0.117   C A C G C G T G C G G T A C G G G C  0.117   0.000   0.000   0.000
4         0.067   A C C A T A T G T G G C G T A T A T  0.000   0.067   0.000   0.000

- A useful, simplified formulation of the extended genotype (g) carried by an individual person (W) at a previously defined haplogroup (X) takes the form of a 3×3 matrix containing a 2×2 stochastic matrix partition derived from the haplogroup lookup table, where each element contains the empirically determined joint frequency of haplotypes $\iota_i^{X}$ with $\kappa_j^{X+1}$.
-
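The 3×3 layout described above can be sketched as follows. This is a hedged reconstruction inferred from the layout of Tables 3 and 4, not a reproduction of the original figure. The symbols $i_1, i_2$ (the individual's two haplotypes at haplogroup $X$), $k_1, k_2$ (the two haplotypes at haplogroup $X+1$), and $\alpha$ (their joint frequencies) are labels assumed here for illustration.

```latex
g^{X}(W) =
\begin{pmatrix}
X   & k_1 & k_2 \\
i_1 & \alpha^{X}_{i_1,k_1} & \alpha^{X}_{i_1,k_2} \\
i_2 & \alpha^{X}_{i_2,k_1} & \alpha^{X}_{i_2,k_2}
\end{pmatrix}
```

Under this reading, the first row and first column carry index labels only, and the lower-right 2×2 block is the stochastic matrix partition taken from the haplogroup lookup table.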
- From the data shown above for “LMS” and “AM”, the following prenormalization genotype matrices can be obtained (Tables 3 and 4).
-
TABLE 3
             69   1      1
g69(AM)  =   1    0.425  0.425
             2    0.267  0.267

TABLE 4
             69   1      2
g69(LMS) =   3    0.117  0.000
             4    0.000  0.067

- These data represent relationships of adjacent loci in a population, not an individual diploid genome. Thus, in certain embodiments, initial association matrices can be parsed with diploid genome consistency rules, leading to a transformation of:
- Associations can be determined according to several criteria including, e.g., population data and inter-locus distance on a chromosome. When no association information is available for sequential loci, independent assortment can be assumed:
-
$[\alpha^{X}_{i,k}]_{i=1,2;\,k=1,2} = 0.5$
- The complete formulation of a whole genome profile can be an indexed set of genotype matrices:
-
$G(W) = \{g^{X}(W) : X = 1, \ldots, N\}$
- An example of a portion of two whole genome profiles across a distance of 110,000 bases of chromosome 11 for personal genome scans for “AM” and “LMS” that was expanded directly (through a haplogroup lookup table into phased genetic information at over 400 individual SNP loci) is given below (Table 5).
TABLE 5
Each row is one genotype matrix: haplogroup index X, the haplotype pair (k1, k2) at X+1, and, for each of the individual's two haplotypes (i1, i2), its joint frequencies with k1 and k2.

AM
Col  Position     X   k1  k2   i1  f(i1,k1)  f(i1,k2)   i2  f(i2,k1)  f(i2,k2)
1    25,861,367   64  1   3    1   0.625     0.125      1   0.625     0.125
2    25,879,159   65  1   1    1   0.617     0.617      3   0.125     0.125
3    25,909,368   66  1   2    1   0.492     0.283      1   0.492     0.283
4    25,938,449   67  1   3    1   0.242     0.033      2   0.300     0.008
5    25,940,622   68  1   2    1   0.350     0.083      3   0.075     0.008
6    25,942,586   69  1   1    1   0.425     0.425      2   0.267     0.267
7    25,959,712   70  1   1    1   0.825     0.825      1   0.825     0.825
8    25,967,383   71  1   1    1   0.817     0.817      1   0.817     0.817
9    25,971,652   72  1   1    1   0.683     0.683      1   0.683     0.683

LMS
Col  Position     X   k1  k2   i1  f(i1,k1)  f(i1,k2)   i2  f(i2,k1)  f(i2,k2)
1    25,861,367   64  1   3    1   0.625     0.125      1   0.625     0.125
2    25,879,159   65  1   1    1   0.617     0.617      3   0.125     0.125
3    25,909,368   66  1   1    1   0.492     0.492      1   0.492     0.492
4    25,938,449   67  1   1    1   0.242     0.242      1   0.242     0.242
5    25,940,622   68  3   4    1   0.117     0.025      1   0.117     0.025
6    25,942,586   69  1   2    3   0.117     0.000      4   0.000     0.067
7    25,959,712   70  1   2    1   0.825     0.000      2   0.000     0.075
8    25,967,383   71  1   2    1   0.817     0.008      2   0.000     0.075
9    25,971,652   72  3   3    1   0.017     0.017      2   0.033     0.033

- In particular embodiments, a haploid gamete genome for each potential gamete is generated from the diploid genome profile by generating a combination of the identified haplotypes using the linkage probability for each combination of the identified haplotypes. In some embodiments, the Virtual Gametes are then generated using a computer simulation of gamete production from a diploid parental genome profile. The simulated haplopath can be computed or generated by subjecting the parental genome to formal rules of genetics that operate naturally during meiosis and gamete production.
- Formal rules of genetics are known in the art, and are mathematical formulas or algorithms that serve as abstract representations of the biological processes of sexual reproduction. Rules are based on allele segregation, independent assortment, linkage between genetic loci, recombination suppression, Hardy-Weinberg equilibria, and other probabilistic genetic processes. Formal rules can be used to estimate the likelihood of transmission of particular alleles and combinations of alleles at multiple loci, from an individual to a gamete.
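One of the formal rules named above, allele segregation, can be illustrated with a short sketch: each parent transmits either of its two alleles at a locus with equal probability, so offspring genotype probabilities follow from enumerating the four allele pairings. The function name and the single-locus example are assumptions made for illustration only; linkage, recombination suppression, and the other rules are not modeled here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotype_probs(parent1, parent2):
    """Single-locus Mendelian segregation.

    Each parent is a 2-tuple of alleles; each allele is transmitted with
    probability 1/2. Returns {unordered genotype: probability}.
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        counts[tuple(sorted((allele1, allele2)))] += 1
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


# Two heterozygous parents at a locus with alleles "A" and "a".
probs = offspring_genotype_probs(("A", "a"), ("A", "a"))
# probs maps ("A", "A") to 1/4, ("A", "a") to 1/2, and ("a", "a") to 1/4.
```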
- The geographic origin of a parental genome, or subregion thereof, can provide population-specific allele and genotype frequency information that can be incorporated into computational models that predict genotypic and phenotypic probabilities of Virtual Progeny.
- A Virtual Gamete includes one permutation of a haploid genome path, or haplopath (H), from one end of a computed genome profile (X=1) to the other (X=N). Each haplopath includes a single allele (variant or haplotype) from each meta-locus, defined by the index number of the chromosomal homologue.
- A haplopath can be initiated with a random number generator (e.g., a Monte Carlo method) that chooses a random allele (i) at a random initializing locus in the set of N such loci. Each prior and subsequent allele along the haplopath can be generated according to normalized likelihoods derived from locus-specific association matrices.
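The walk described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the specification: the function name, the nested-list data layout, and the (0,1) homologue indexing are hypothetical, and the example matrices are the 2×2 joint-frequency blocks listed for LMS at haplogroups 69 and 70 in Table 5.

```python
import random


def walk_haplopath(assoc, rng=None):
    """Sample one haplopath as a list of homologue indices (0 or 1).

    assoc[x][i][k] is the joint frequency of homologue i at meta-locus x
    with homologue k at meta-locus x + 1, so the path has len(assoc) + 1 loci.
    """
    rng = rng or random.Random()
    n_loci = len(assoc) + 1
    start = rng.randrange(n_loci)        # random initializing locus
    path = [None] * n_loci
    path[start] = rng.randrange(2)       # random homologue at that locus

    def pick(w0, w1):
        # Choose homologue 0 or 1 with probability proportional to its weight.
        total = w0 + w1
        if total == 0:                   # no association data: independent assortment
            return rng.randrange(2)
        return 0 if rng.random() * total < w0 else 1

    for x in range(start, n_loci - 1):   # subsequent loci, toward X = N
        row = assoc[x][path[x]]
        path[x + 1] = pick(row[0], row[1])
    for x in range(start, 0, -1):        # prior loci, back toward X = 1
        m = assoc[x - 1]
        path[x - 1] = pick(m[0][path[x]], m[1][path[x]])
    return path


# Joint-frequency blocks for LMS at haplogroups 69 and 70 (from Table 5).
lms_assoc = [
    [[0.117, 0.000], [0.000, 0.067]],
    [[0.825, 0.000], [0.000, 0.075]],
]
example_path = walk_haplopath(lms_assoc, random.Random(0))
```

Because both example blocks are diagonal, the two homologues never switch in this region, so every sampled path over haplogroups 69 to 71 is either all 0 or all 1.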
-
- For each Monte Carlo haplopath [Hi↑] that is generated computationally, a reciprocal haplopath [Hi↓] can be created with alleles present on the opposite homologue at every locus. This converse haplopath represents the sister/brother gamete of a simulated haplopath:
-
$H_{i\uparrow} = \{h_1, h_2, \ldots, h_N\},\; h_X \in (1,2)$
$H_{i\downarrow} = \{h_1, h_2, \ldots, h_N\},\; h_X \in (1,2);\; h_{X,i\downarrow} \notin (h_{X,i\uparrow})$
- The inclusion of converse haplopaths in a Virtual Gamete pool can increase consistency of the simulated data with Mendel's first law of equal segregation at every locus.
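A converse haplopath is straightforward to derive once homologues are indexed (0,1): each index is flipped. The sketch below, with hypothetical function names and toy data, shows that adding converses makes both homologues equally frequent at every locus of the pool.

```python
def converse_haplopath(path):
    """Return the reciprocal haplopath: the opposite homologue at every locus."""
    return [1 - h for h in path]


def pool_with_converses(paths):
    """Add each haplopath's converse so both homologues are equally represented."""
    pool = []
    for path in paths:
        pool.append(list(path))
        pool.append(converse_haplopath(path))
    return pool


# Hypothetical haplopaths across four loci (toy data).
simulated = [[0, 0, 1, 1], [1, 1, 1, 0], [0, 1, 0, 0]]
pool = pool_with_converses(simulated)

# At every locus, homologue 0 and homologue 1 now appear equally often.
balanced = all(sum(p[x] for p in pool) * 2 == len(pool) for x in range(4))
```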
FIG. 6 shows an example of 100 iterations of a Monte Carlo run on the genomic region shown above, where homologue index numbers have been converted from (1,2) to (0,1). - In further embodiments, the Virtual Progeny genome is a collection of permutations of diploid genomes that can each be formed by combining a random Virtual Gamete from one parent (the paternal line, pat) with a random Virtual Gamete from a second parent (the maternal line, mat). Each permutation of a Virtual Progeny genome comprises a discrete set of defined integer genotypes. In one embodiment, a single permutation (i) of a Virtual Progeny genome can take the following form:
-
- In exemplary methods, a Virtual Progeny genome can comprise a set of individual permutations of Virtual Progeny genomes:
-
$G^{VP} = \{G_1^{VP}, G_2^{VP}, \ldots\}$
- A phenome can comprise a measure of phenotypes and traits expressed by a diploid organism over the time period of its life. Numerous discrete or continuous phenotypes associated with discrete genotypes are listed on databases maintained at the National Institutes of Health division of Bioinformatic Information and other public databases. Additional sources of discrete or continuous phenotypes associated with discrete genotypes are located in PubMed, the UCSC browser, NCBI (fancy output genomic browser), Online Mendelian Inheritance in Man (“OMIM”), SNPedia, GeneTests, Entrez Gene, HuGENavigator, HuGENavigator/Genopedia/Search, HuGENavigator/Phenopedia/Search, NextBio Database, and Genetic Association Database. In addition, databases for SNP and variant datasets include SNP Cluster Report, Genome-Wide Association Studies (National Human Genome Research Institute), Autism Chromosome Rearrangement Database (hosted by The Centre for Applied Genomics), and the Database of Genomic Variants (hosted by The Centre for Applied Genomics).
- The Virtual Progeny phenome (phVP) can comprise a single probability density function defined by the summation of the weighted set of phenomes that are individually associated with each permutation of a Virtual Progeny genome.
- In some methods, prior population genetic data can be used to predict the population or populations of origin for a trait-affecting locus, which can be incorporated into models that can be used to predict trait likelihoods of virtual or actual progeny. Phenotypes associated with all genotypes in the Virtual Progeny sample space can be integrated to produce an overall assessment of phenotypic likelihoods (in terms of penetrance and expressivity) for each individual trait, alone or in combination.
- The following is an example showing the application of the methods described herein. The actual methods can cover many more genes with a larger number of loci.
- The method entails a genotyping panel described by the disease impacted by each locus, the official name of the locus, the alleles at each locus that are probed, the frequency of these alleles in the population being analyzed, standard reference names for probes that detect each allele, an abbreviated name for each allele (for the purposes of this illustration), and the impact of the allele on disease risk or disease expression within each genotypic context. Such a description is shown in Table 6, which shows information for a two-locus genotyping panel.
-
TABLE 6

| Disease | Genome locus | Allele | Freq. | Probe | Abb. | Effect of allele in genotype |
| --- | --- | --- | --- | --- | --- | --- |
| Cystic fibrosis | CFTR: c.1408A>G | CFTR: p.469V | 0.50 | rs213950G | G | GG: no effect |
| | | CFTR: p.469M | 0.50 | rs213950A | A | AA: 0.002 risk; GA: 0.05 risk |
| | CFTR: c.1521_1523del | CFTR: p.508F | 0.98 | rs332TTT | T | TT, Td: no direct effect, but 3 risk of other CF mutations remains |
| | | CFTR: p.508del | 0.02 | rs332del | d | dd: cystic fibrosis, severe |

- The findings on genotype-phenotype associations and population frequencies of alleles, including alleles not determined in this analysis, will be used to create a risk table that covers all possible genotype combinations of alleles in the genotyping panel (Table 7).
-
TABLE 7

| Disease | 2-locus genotype | Risk of disease in virtual child |
| --- | --- | --- |
| Cystic fibrosis | GG, TT | 1/25,000 |
| | GA, TT | 1/192,000 |
| | AA, TT | 1/3,600,000 |
| | GG, dT | 1/75 |
| | GA, dT | 1/1,500 |
| | AA, dT | 1/30,000 |
| | GG, dd | 1/1 |
| | GA, dd | 1/1 |
| | AA, dd | 1/1 |

- Genotype data captured from each individual who is a component of the analysis is assembled (Table 8).
-
TABLE 8

| Individual | 2-locus genotype |
| --- | --- |
| Donor B | GA, TT |
| Donor F | AA, dT |
| Client #1 | GA, TT |
| Client #2 | AA, TT |

- Next, a Monte Carlo algorithm is applied to each individual genotype profile to generate a pool of “Virtual Gametes” containing single alleles from each locus. A virtual gamete from each designated genetic parent is chosen randomly and combined to produce one permutation of a potential child's genotype profile. The process of virtual gamete choice and combination to produce a diploid genome is iterated a sufficient number of times so that the sum of permutations provides a stable estimate of a child's genome likelihood distribution, as illustrated in Table 9.
TABLE 9

Pairing: Client #1 X Donor B

| Monte Carlo-generated Virtual Progeny genotypes | Genotype likelihood distribution | CF risk from table |
| --- | --- | --- |
| GG, TT | 0.25 | 1/25,000 |
| GA, TT | 0.50 | 1/192,000 |
| AA, TT | 0.25 | 1/3,600,000 |

Normalized disease risk for composite virtual progeny: cystic fibrosis, 1/79,000

- The risk of disease associated with each discrete Virtual Progeny genotype is determined from the previously established risk table (from Table 7 into column 3 of Table 9). Disease risks associated with each genotype are weighted according to their appearance in the set of permutations to assign a normalized disease risk to the particular pairing under analysis (Table 10).
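The Monte Carlo step and the weighting just described can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the disclosure: the function names are hypothetical, the two loci are sampled independently (a simplifying assumption), and the risk values are copied from Table 7.

```python
import random
from collections import Counter

# Risk of disease per 2-locus genotype, copied from Table 7.
RISK = {
    ("GG", "TT"): 1 / 25_000, ("GA", "TT"): 1 / 192_000, ("AA", "TT"): 1 / 3_600_000,
    ("GG", "dT"): 1 / 75, ("GA", "dT"): 1 / 1_500, ("AA", "dT"): 1 / 30_000,
    ("GG", "dd"): 1.0, ("GA", "dd"): 1.0, ("AA", "dd"): 1.0,
}
# Ordering used only to spell genotypes the way Table 7 does ("GA", "dT").
ALLELE_ORDER = {"G": 0, "A": 1, "d": 0, "T": 1}


def virtual_gamete(genotype, rng):
    """Pick one allele per locus at random (loci sampled independently: an assumption)."""
    return tuple(rng.choice(locus) for locus in genotype)


def combine(gamete_1, gamete_2):
    """Join two gametes into one diploid 2-locus genotype."""
    return tuple(
        "".join(sorted(a + b, key=ALLELE_ORDER.get)) for a, b in zip(gamete_1, gamete_2)
    )


def simulate(parent_1, parent_2, iterations, seed=0):
    """Estimate the likelihood distribution of progeny genotypes by sampling."""
    rng = random.Random(seed)
    counts = Counter(
        combine(virtual_gamete(parent_1, rng), virtual_gamete(parent_2, rng))
        for _ in range(iterations)
    )
    return {genotype: n / iterations for genotype, n in counts.items()}


def normalized_risk(distribution):
    """Weight each genotype's Table 7 risk by its estimated likelihood."""
    return sum(p * RISK[genotype] for genotype, p in distribution.items())


client_1 = ("GA", "TT")  # Table 8
donor_b = ("GA", "TT")   # Table 8
dist = simulate(client_1, donor_b, iterations=20_000)
print(dist)
print(f"normalized risk is roughly 1/{1 / normalized_risk(dist):,.0f}")
```

Because the estimate is sampled, the printed distribution only approximates the 0.25 / 0.50 / 0.25 split of Table 9, and the normalized risk lands near the 1/79,000 reported there rather than on it exactly.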
TABLE 10

| Genotype, Locus #1 (rs213950) | Genotype, Locus #2 (delF508) | RISK | No testing | Donor A | Donor B | Donor C | Donor D | Donor E | Donor F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| —/— | —/— | 2,500 | | 24% | 48% | 24% | 1% | 2% | 1% |
| Client 1: G/A, F/F | X | | | G/G, F/F | G/A, F/F | A/A, F/F | G/G, del/F | G/A, del/F | A/A, del/F |
| G/G | F/F | 25,000 | | 0.500 | 0.250 | 0.000 | 0.250 | 0.125 | 0.000 |
| G/A | F/F | 191,755 | | 0.500 | 0.500 | 0.500 | 0.250 | 0.250 | 0.250 |
| A/A | F/F | 3,634,742 | | 0.000 | 0.250 | 0.500 | 0.000 | 0.125 | 0.250 |
| G/G | del/F | 75 | | 0.000 | 0.000 | 0.000 | 0.250 | 0.125 | 0.000 |
| G/A | del/F | 1,500 | | 0.000 | 0.000 | 0.000 | 0.250 | 0.250 | 0.250 |
| A/A | del/F | 30,000 | | 0.000 | 0.000 | 0.000 | 0.000 | 0.125 | 0.250 |
| n/n | del/del | 1 | | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | | | 11,638 | 44,233 | 78,888 | 364,291 | 285 | 542 | 5,670 |
| Client 2: A/A, F/F | X | | | G/G, F/F | G/A, F/F | A/A, F/F | G/G, del/F | G/A, del/F | A/A, del/F |
| G/G | F/F | 25,000 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| G/A | F/F | 191,755 | | 1.00 | 0.50 | 0.00 | 0.50 | 0.25 | 0.00 |
| A/A | F/F | 3,634,742 | | 0.00 | 0.50 | 1.00 | 0.00 | 0.25 | 0.50 |
| G/G | del/F | 75 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| G/A | del/F | 1,500 | | 0.00 | 0.00 | 0.00 | 0.50 | 0.25 | 0.00 |
| A/A | del/F | 30,000 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.50 |
| n/n | del/del | 1 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | | | 103,198 | 191,755 | 364,291 | 3,634,742 | 2,977 | 5,670 | 59,509 |
| Client 3: G/G, F/F | X | | | G/G, F/F | G/A, F/F | A/A, F/F | G/G, del/F | G/A, del/F | A/A, del/F |
| G/G | F/F | 25,000 | | 1.00 | 0.50 | 0.00 | 0.50 | 0.25 | 0.00 |
| G/A | F/F | 191,755 | | 0.00 | 0.50 | 1.00 | 0.00 | 0.25 | 0.50 |
| A/A | F/F | 3,634,742 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| G/G | del/F | 75 | | 0.00 | 0.00 | 0.00 | 0.50 | 0.25 | 0.00 |
| G/A | del/F | 1,500 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.25 | 0.50 |
| A/A | del/F | 30,000 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| n/n | del/del | 1 | | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | | | 6,167 | 25,000 | 44,233 | 191,755 | 150 | 285 | 2,977 |

- The final result obtained above is used in conjunction with a chosen risk tolerance cutoff to determine whether the donor should be retained or removed from the client pool. If risk tolerance had been set previously at 1/50,000, the result obtained in this example would lead to no action being taken: Donor B would remain in client #1's pool. However, the other donors would be eliminated as potential parents of offspring. In the case of disease or particular traits, an increased likelihood or risk of such disease would have been determined, thereby eliminating certain donors from the pool of potential donors. In certain instances, the potential parent is a sperm donor. In other instances, the potential parent is an oocyte donor.
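The cutoff comparison described above can be sketched as follows. The function name and the choice to treat a risk exactly equal to the cutoff as acceptable are assumptions made for illustration; the two risk values are the Client 1 results for Donors B and F in Table 10.

```python
from fractions import Fraction


def filter_donors(pairing_risks, cutoff):
    """Split donors into those at or below the risk cutoff and those above it."""
    retained = {d: r for d, r in pairing_risks.items() if r <= cutoff}
    removed = {d: r for d, r in pairing_risks.items() if r > cutoff}
    return retained, removed


# Normalized progeny risks for Client 1, taken from Table 10.
client_1_risks = {
    "Donor B": Fraction(1, 78_888),
    "Donor F": Fraction(1, 5_670),
}

retained, removed = filter_donors(client_1_risks, cutoff=Fraction(1, 50_000))
print("retained:", sorted(retained))
print("removed:", sorted(removed))
```

Using `Fraction` keeps the "1 in N" values exact, so the comparison against the cutoff does not depend on floating-point rounding.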
TABLE 11

Categories of donors (in relation to 2-SBP genotype)

| Locus #1 (rs213950) | Locus #2 (delF508) | Clients | | A | B | C | D | E | F |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | | | | 24% | 48% | 24% | 1% | 2% | 1% |
| —/— | —/— | 2,500 | | G/G, F/F | G/A, F/F | A/A, F/F | G/G, del/F | G/A, del/F | A/A, del/F |
| G/A | F/F | 11,638 | X | 44,233 | 78,888 | 364,291 | 285 | 542 | 5,670 |
| A/A | F/F | 103,198 | X | 191,755 | 364,291 | 3,634,742 | 2,977 | 5,670 | 59,509 |
| G/G | F/F | 6,167 | X | 25,000 | 44,233 | 191,755 | 150 | 285 | 2,977 |

- The analysis presented in this illustration can be readily scaled to any number of diseases and loci. The most time-consuming components are genome scans and assessing Virtual Progeny, which only need be performed once. The actual application of the rest of the methodology is entirely automated, and a test run with a panel of 1,000 loci (2,000 probes) was performed in less than one minute on a laptop computer.
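For a panel this small, the pairing risks in Tables 10 and 11 can be cross-checked without sampling by enumerating the Mendelian outcomes at each locus. The sketch below swaps the Monte Carlo step for exact enumeration and treats the two loci as independent, which is how the likelihood values in Table 10 appear to be built. The names are illustrative. Reading the "No testing" column as a donor-frequency-weighted average is an inference from the numbers, not a statement in the text.

```python
from collections import defaultdict
from itertools import product

# "1 in N" risks per progeny genotype, copied from the RISK column of Table 10.
RISK_ONE_IN = {
    ("G/G", "F/F"): 25_000, ("G/A", "F/F"): 191_755, ("A/A", "F/F"): 3_634_742,
    ("G/G", "del/F"): 75, ("G/A", "del/F"): 1_500, ("A/A", "del/F"): 30_000,
}
# Donor categories and their population shares, from Tables 10 and 11.
DONORS = {
    "A": ("G/G", "F/F"), "B": ("G/A", "F/F"), "C": ("A/A", "F/F"),
    "D": ("G/G", "del/F"), "E": ("G/A", "del/F"), "F": ("A/A", "del/F"),
}
DONOR_FREQ = {"A": 0.24, "B": 0.48, "C": 0.24, "D": 0.01, "E": 0.02, "F": 0.01}
ALLELE_ORDER = {"G": 0, "A": 1, "del": 0, "F": 1}


def locus_distribution(parent_1, parent_2):
    """Exact progeny genotype probabilities at one locus (four equally likely crosses)."""
    dist = defaultdict(float)
    for a, b in product(parent_1.split("/"), parent_2.split("/")):
        dist["/".join(sorted((a, b), key=ALLELE_ORDER.get))] += 0.25
    return dist


def pairing_risk(client, donor):
    """Likelihood-weighted disease risk for one client-donor pairing."""
    locus_1 = locus_distribution(client[0], donor[0])
    locus_2 = locus_distribution(client[1], donor[1])
    risk = 0.0
    for (g1, p1), (g2, p2) in product(locus_1.items(), locus_2.items()):
        one_in = 1 if g2 == "del/del" else RISK_ONE_IN[(g1, g2)]
        risk += p1 * p2 / one_in
    return risk


def untested_risk(client):
    """Average pairing risk over donor categories, weighted by category frequency."""
    return sum(DONOR_FREQ[k] * pairing_risk(client, d) for k, d in DONORS.items())


client_1 = ("G/A", "F/F")
for name, donor in DONORS.items():
    print(name, f"1/{1 / pairing_risk(client_1, donor):,.0f}")
print("no testing", f"1/{1 / untested_risk(client_1):,.0f}")
```

Exact enumeration stops being practical as the number of loci grows, which is where the sampling approach described in the text takes over.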
- In a particular embodiment, the Virtual Progeny genome, which actually comprises a collection of likely genome permutations, or a distribution of genome states generated through a Markov process, can be interrogated for associated trait expression. Each discrete genome in the Virtual Progeny collection or distribution can be evaluated independently for expected trait expression, based on published genotype-phenotype associations. The weighted summation of traits expressed by individual genome permutations can yield a Virtual Progeny phenome probability distribution.
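As a rough illustration of what generating genome states through a Markov process could look like, the sketch below walks along a chromosome and, at each step, either stays on the current homologue or switches to the other one. Everything specific here is an assumption: the function names, the made-up phased alleles, and the switch probabilities, which stand in for whatever association likelihoods the method actually uses.

```python
import random


def sample_haplopath(switch_probs, rng):
    """Return one homologue index (0 or 1) per locus.

    switch_probs[i] is the assumed chance of switching homologue between
    locus i and locus i + 1; each step depends only on the previous index.
    """
    path = [rng.randrange(2)]
    for p in switch_probs:
        previous = path[-1]
        path.append(1 - previous if rng.random() < p else previous)
    return path


def virtual_gamete(homologues, path):
    """Read the allele at each locus from the homologue chosen by the path."""
    return [homologues[h][i] for i, h in enumerate(path)]


rng = random.Random(0)
# Hypothetical phased alleles at four loci on the two homologues of one parent.
homologues = (["G", "T", "C", "A"], ["A", "T", "T", "G"])
# Hypothetical switch probabilities between adjacent loci.
switch_probs = [0.01, 0.30, 0.01]

gametes = [
    virtual_gamete(homologues, sample_haplopath(switch_probs, rng))
    for _ in range(100)
]
print(gametes[:3])
```

Pairing one such gamete from each parent gives a single Virtual Progeny permutation, and repeating the draw builds up the distribution of genome states.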
- The disclosed methods and systems can be used to provide visual representations of differential virtual progeny results for virtual mating of a client with potential donors (FIG. 2). Further, each of the virtual matings is compared. The methods disclosed herein are used to obtain samples, which are processed by performing genome scans to identify SNPs or CNPs present in each individual's genomic DNA. The identified genetic markers are used to expand personal genome data and to determine haplotypes present within each individual's genomic DNA. The likelihoods of association of sequential haplotypes are retrieved from lookup tables generated by the international HapMap project. Using the identified haplotypes together with association data, a Monte Carlo simulation is iterated to generate haplopaths, resulting in a Virtual Gamete population for each individual. Virtual Gametes from each individual are combined to produce Virtual Progeny genome samplings, each of which is evaluated for corresponding trait likelihood values. Finally, an integrated Virtual Progeny phenome likelihood distribution is determined to assess the probability that potential progeny express certain traits, such as increased risk of disease.
- This methodology was used to identify probable phenotypes of Virtual Progeny from the simulated matings of four donors with client SRS (FIG. 2). Virtual Progeny are generated and analyzed for the potential phenotypes that would result from the matings of SRS with four different donors. For VP-SRSxTJ (i.e., a mating of SRS to TJ), the progeny phenotypes will all be blue eyes, while for VP-SRSxAFl (i.e., a mating of SRS to AFl), the phenotypes will all be brown eyes. For VP-SRSxLMS (i.e., a mating of SRS to LMS) and VP-SRSxNEA (i.e., a mating of SRS to NEA), the Virtual Progeny likelihood phenotype is 50% blue and 50% brown.
- Such divergence increases radically when more loci are added to the analysis. For example, when 100,000 loci are analyzed, the number of potential genotypes increases to $2^{100,000}$. The disclosed methods and systems allow for the analysis of large numbers of loci without the incredible amount of time or uncertainty associated with prior methodologies. FIG. 3 shows an example of a visual representation of Virtual Progeny genome profiles for the matings of SRS to the four donors, all of which are shown across 82 million base pairs of Chromosome 15. The bottom row shows the loci associated with various traits.
- As described above, the probability that Virtual Progeny described herein will have a particular trait can be assessed by comparing discrete Virtual Progeny genome samplings to publicly available databases.
FIG. 4 shows an analysis of Virtual Progeny (“VP”) genotypes. Each SNP locus has two alleles (A and D), which can produce three distinct genotypes (AA, AD, DD). “A” is used to signify the original “ancestral” allele and “D” is the allele “derived” by mutation. At any one locus, “A” and “D” each refer to empirically determined nucleotide bases A, C, G, or T. Utilizing reference information from databases known in the art and disclosed herein, “A” is used as a reference allele. Using this analytical technique, a SNP genotype is completely described by the number of ancestral alleles it contains (0, 1, 2). For instance, an integer is used to indicate determination of a single definitive genotype. If a progeny is described using a “0” for a particular allele, then the progeny definitively has zero ancestral alleles at the particular locus. If “1” describes the progeny, the progeny definitively has one ancestral allele at that locus. If “2” describes the progeny, then the progeny definitively has two ancestral alleles. In other words, integers occur when both mates are homozygous for a particular allele. Non-integers (0.5, 1.5) indicate restriction to the two genotypes that are ±0.5 removed. In other words, non-integers indicate that one mate was heterozygous for a particular allele. Finally, an out-of-bounds integer (5) is used when data from the two virtual parents is non-informative (i.e., both parents are heterozygous at that locus). In the example provided in FIG. 4, the numbers of loci falling into the categories 0, 0.5, 1, 1.5, 2, and 5 are shown. As is apparent, each partner provides a different profile for SNPs or CNPs found in the reference databases.
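The coding scheme just described can be written out as a small function. This is a sketch under the stated rules only; the function name and the representation of each parent by its count of ancestral alleles (0, 1, or 2) are choices made here for illustration.

```python
from collections import Counter


def vp_locus_code(parent_1_count, parent_2_count):
    """Code one locus of a Virtual Progeny from the parents' ancestral-allele counts.

    Returns 0, 1, or 2 when both parents are homozygous, 0.5 or 1.5 when exactly
    one parent is heterozygous, and 5 when both parents are heterozygous.
    """
    for count in (parent_1_count, parent_2_count):
        if count not in (0, 1, 2):
            raise ValueError("ancestral-allele count must be 0, 1, or 2")
    het_1 = parent_1_count == 1
    het_2 = parent_2_count == 1
    if het_1 and het_2:
        return 5  # non-informative
    if het_1 or het_2:
        homozygous = parent_2_count if het_1 else parent_1_count
        # Progeny carries homozygous/2 or homozygous/2 + 1 ancestral alleles.
        return homozygous / 2 + 0.5
    return (parent_1_count + parent_2_count) // 2


# Tally of codes across a few hypothetical loci, in the spirit of FIG. 4.
parent_pairs = [(2, 2), (0, 2), (1, 0), (2, 1), (1, 1), (0, 0)]
print(Counter(vp_locus_code(a, b) for a, b in parent_pairs))
```

A code of 0.5, for example, means the progeny has either 0 or 1 ancestral alleles at that locus, matching the "±0.5 removed" reading in the text.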
FIG. 5 provides a more detailed view of actual SNPs or CNPs that are known and can be analyzed using the methods and systems disclosed herein. Various SNPs or CNPs and the results of simulated mating of one woman subject to four potential partners are again shown as either an integer (0, 1, or 2), a non-integer (0.5 or 1.5), or as non-informative (5) (FIG. 5). As in FIG. 4, the profile of each VP of each mating is provided for the various traits and diseases. The data comparison identified virtual progeny genotypes at over 1,300 SNP loci that influence over 100 disease traits shown in the table.
- Nonlimiting examples of traits that may be assessed using the methods described herein include or relate to ability to roll the tongue, ability to taste PTC, acute inflammation, adaptive immunity, addiction(s), adipose tissue, adrenal gland, age, aggression, amino acid level, amyloidosis, anogenital distance, antigen presenting cells, auditory system, autonomic nervous system, avoidance learning, axial defects or lack thereof, B cell deficiency, B cells, B lymphocytes (e.g., antigen presentation), basophils, bladder size/shape, blinking, blood chemistry, blood circulation, blood glucose level, blood physiology, blood pressure, body mass index, body weight, bone density, bone marrow formation/structure, bone strength, bone/skeletal physiology, breast size/shape, bursae, cancellous bone, cardiac arrest, cardiac muscle contractility, cardiac output, cardiac stroke volume, cardiomyopathy, cardiovascular system/disease, carpal bone, catalepsy, cell abnormalities, cell death, cell differentiation, cell morphology, cell number, cell-mediated immunity, central nervous system, central nervous system physiology, chemotactic factors, chondrodystrophy, chromosomal instability, chronic inflammation, circadian rhythm, circulatory system, cleft chin, clonal anergy, clonal deletion, T and B cell deficiencies, conditioned emotional response, congenital skeletal deformities, contextual
conditioning, cortical bone thickness, craniofacial bones, craniofacial defects, crypts of Lieberkuhn, cued conditioning, cytokines, delayed bone ossification, dendritic cells (e.g., antigen presentation), Di George syndrome, digestive function, digestive system, digit dysmorphology, dimples, discrimination learning, drinking behavior, drug abuse, drug response, ear size/shape including ear lobe attachment, eating behavior, ejaculation function, embryogenesis, embryonic death, embryonic growth/weight/body size, emotional affect, enzyme/coenzyme level, eosinophils, epilepsy, epiphysis, esophagus, excretion physiology, extremities, eye blink conditioning, eye color/shape, eye physiology, eyebrows shape, eyelash length, face shape, facial cleft, femur, fertility/fecundity, fibula, finger length/shape, fluid regulation, fontanels, foregut, fragile skeleton, freckles, gall bladder, gametogenesis, gastrointestinal hemorrhage, germ cells (e.g., morphology, depletion), gland dysmorphology, gland function, glucagon level, glucose homeostasis, glucose tolerance, glycogen catabolism, granulocytes, granulocytes (e.g., bactericidal activity, chemotaxis), grip strength, grooming behavior, hair color, hair follicle structure/orientation, hair growth, hair on mid joints, hair texture, handedness, harderian glands, head, hearing function, heart, heart rate, heartbeat (e.g., rate, irregularity), height, hemarthrosis, hemolymphoid system, hepatic system, hitchhiker's thumb, homeostasis, humerus, humoral immune response, hypoplastic axial skeleton, hypothalamus, immune cell, immune system (e.g., hypersensitivity), immune system response/function, immune tolerance, immunodeficiency, inability to urinate, increased sensitivity to gamma-irradiation, inflammatory mediators, inflammatory response, innate immunity, inner ear, innervation, insulin level, insulin resistance, intestinal bleeding, intestine, ion homeostasis, jaw, kidney hemorrhage, kidney stones, kidney/renal system, 
kyphoscoliosis, kyphosis, lacrimal glands, larynx, learning/memory, leukocyte, ligaments, limb dysmorphology, limb grasping, lipid chemistry, lipid homeostasis, lips size/shape, liver (e.g., development/function), liver/hepatic system, locomotor activity, lordosis, lung, lung development, lymph organ development, macrophages (e.g., antigen presentation), mammary glands, maternal/paternal behavior, mating patterns, meiosis, mental acuity, mental stability, mental state, metabolism of xenobiotics, metaphysis, middle ear, middle ear bone, morbidity and mortality, motor coordination/balance, motor learning, mouth, movement, muscle, muscle contractility, muscle degeneration, muscle development, muscle physiology, muscle regeneration, muscle spasms, muscle twitching, musculature, myelination, myogenesis, nervous system, neurocranium, neuroendocrine glands, neutrophils, NK cells, nociception, nose, nutrients/absorption, object recognition memory, ocular reflex, odor preference, olfactory system, oogenesis, operant or “target response”, orbit, osteogenesis, osteogenesis/developmental, osteomyelitis, osteoporosis, outer ear, oxygen consumption, palate, pancreas, paralysis, parathyroid glands, pelvis girdle, penile erection function, perinatal death, peripheral nervous system, phalanxes, pharynx, photosensitivity, piloerection, pinna reflex, pituitary gland, PNS glia, postnatal death, postnatal growth/weight/body size, posture, premature death, preneoplasia, propensity to cross the right arm over the left or vice versa, propensity to cross the right thumb over the left thumb when clasping hands or vice versa, pulmonary circulation, pupillary reflex, radius, reflexes, reproductive condition, reproductive system, resistance to fatty liver development, resistance to hyperlipidemia, respiration (e.g., rate, shallowness), respiratory distress or failure, respiratory mucosa, respiratory muscle, respiratory system, response to infection, response to injury, response to new
environment (transfer arousal), ribs, salivary glands, scoliosis, sebaceous glands, secondary bone resorption, seizures, self tolerance, senility, sensory capabilities, sensory system physiology/response, sex, sex glands, shoulder, skin, skin color, skin texture/condition, skull, skull abnormalities, sleep pattern, social intelligence, somatic nervous system, spatial learning, sperm count, sperm motility, spermatogenesis, startle reflex, sternum defect, stomach, suture closure, sweat glands, T cell deficiency, T cells (e.g., count), tarsus, taste response, teeth, temperature regulation, temporal memory, tendons, thyroid glands, tibia, touch/nociception, trachea, tremors, trunk curl, tumor incidence, tumorigenesis, ulna, urinary system, urination pattern, urine chemistry, urogenital condition, urogenital system, vasculature, vasoactive mediators, vertebrae, vesicoureteral reflux, vibrissae, vibrissae reflex, viscerocranium, visual system, weakness, widows peak or lack thereof, etc.
- Other nonlimiting traits include cognitive ability (Ruano et al., Am. J. Hum. Genet. 86:113 (2010)); Familial Osteochondritis Dissecans (Stattin et al., Am. J. Hum. Genet. 86:126 (2010)); hearing impairment (Schraders et al., Am. J. Hum. Genet. 86:138 (2010)); mental retardation associated with autism, epilepsy, or macrocephaly (Giannandrea et al., Am. J. Hum. Genet. 86:185 (2010)); muscular dystrophies (Bolduc et al., Am. J. Hum. Genet. 86:213 (2010)); Diamond-Blackfan anemia (Doherty et al., Am. J. Hum. Genet. 86:222 (2010)); osteoporotic fractures (Kung et al., Am. J. Hum. Genet. 86:229 (2010)); familial exudative vitreoretinopathy (Poulter et al., Am. J. Hum. Genet. 86:248 (2010)); skeletal dysplasia, eye, and cardiac abnormalities (Iqbal et al., Am. J. Hum. Genet. 86:254 (2010)); Warsaw breakage syndrome (van der Lilij et al., Am. J. Hum. Genet. 86:262 (2010)); arterial calcification of infancy (Lorenz-Depiereux et al., Am. J. Hum. Genet. 86:267 (2010)); hypophosphatemic rickets (Lorenz-Depiereux et al., Am. J. Hum. Genet. 86:267 (2010); Levy-Litan et al., Am. J. Hum. Genet. 86:273 (2010)); rhabdoid tumor predisposition syndrome (Schneppenheim et al., Am. J. Hum. Genet. 86:279 (2010)); and multiple sclerosis (Jakkula et al., Am. J. Hum. Genet. 86:285 (2010)).
- Yet other nonlimiting traits include 21-Hydroxylase Deficiency, ABCC8-Related Hyperinsulinism, ARSACS, Achondroplasia, Achromatopsia, Adenosine Monophosphate Deaminase 1, Agenesis of Corpus Callosum with Neuronopathy, Alkaptonuria, Alpha-1-Antitrypsin Deficiency, Alpha-Mannosidosis, Alpha-Sarcoglycanopathy, Alpha-Thalassemia, Alzheimers, Angiotensin II Receptor, Type I, Apolipoprotein E Genotyping, Argininosuccinicaciduria, Aspartylglycosaminuria, Ataxia with Vitamin E Deficiency, Ataxia-Telangiectasia, Autoimmune Polyendocrinopathy Syndrome Type 1, BRCA1 Hereditary Breast/Ovarian Cancer, BRCA2 Hereditary Breast/Ovarian Cancer, Bardet-Biedl Syndrome, Best Vitelliform Macular Dystrophy, Beta-Sarcoglycanopathy, Beta-Thalassemia, Biotinidase Deficiency, Blau Syndrome, Bloom Syndrome, CFTR-Related Disorders, CLN3-Related Neuronal Ceroid-Lipofuscinosis, CLN5-Related Neuronal Ceroid-Lipofuscinosis, CLN8-Related Neuronal Ceroid-Lipofuscinosis, Canavan Disease, Carnitine Palmitoyltransferase IA Deficiency, Carnitine Palmitoyltransferase II Deficiency, Cartilage-Hair Hypoplasia, Cerebral Cavernous Malformation, Choroideremia, Cohen Syndrome, Congenital Cataracts, Facial Dysmorphism, and Neuropathy, Congenital Disorder of Glycosylation 1a, Congenital Disorder of Glycosylation 1b, Congenital Finnish Nephrosis, Crohn Disease, Cystinosis, DFNA 9 (COCH), Diabetes and Hearing Loss, Early-Onset Primary Dystonia (DYT1), Epidermolysis Bullosa Junctional, Herlitz-Pearson Type, FANCC-Related Fanconi Anemia, FGFR1-Related Craniosynostosis, FGFR2-Related Craniosynostosis, FGFR3-Related Craniosynostosis, Factor V Leiden Thrombophilia, Factor V R2 Mutation Thrombophilia, Factor XI Deficiency, Factor XIII Deficiency, Familial Adenomatous Polyposis, Familial Dysautonomia, Familial Hypercholesterolemia Type B, Familial Mediterranean Fever, Free Sialic Acid Storage Disorders, Frontotemporal Dementia with Parkinsonism-17, Fumarase deficiency, GJB2-Related DFNA 3 Nonsyndromic Hearing Loss and
Deafness, GJB2-Related DFNB 1 Nonsyndromic Hearing Loss and Deafness, GNE-Related Myopathies, Galactosemia, Gaucher Disease, Glucose-6-Phosphate Dehydrogenase Deficiency, Glutaricacidemia Type 1, Glycogen Storage Disease Type 1a, Glycogen Storage Disease Type 1b, Glycogen Storage Disease Type II, Glycogen Storage Disease Type III, Glycogen Storage Disease Type V, Gracile Syndrome, HFE Associated Hereditary Hemochromatosis, Halder AIMS, Hemoglobin S Beta-Thalassemia, Hereditary Fructose Intolerance, Hereditary Pancreatitis, Hereditary Thymine-Uraciluria, Hexosaminidase A Deficiency, Hidrotic Ectodermal Dysplasia 2, Homocystinuria Caused by Cystathionine Beta-Synthase Deficiency, Hyperkalemic Periodic Paralysis Type 1, Hyperornithinemia-Hyperammonemia-Homocitrullinuria Syndrome, Hyperoxaluria, Primary, Type 1, Hyperoxaluria, Primary, Type 2, Hypochondroplasia, Hypokalemic Periodic Paralysis Type 1, Hypokalemic Periodic Paralysis Type 2, Hypophosphatasia, Infantile Myopathy and Lactic Acidosis (Fatal and Non-Fatal Forms), Isovaleric Acidemias, Krabbe Disease, LGMD2I, Leber Hereditary Optic Neuropathy, Leigh Syndrome, French-Canadian Type, Long Chain 3-Hydroxyacyl-CoA Dehydrogenase Deficiency, MELAS, MERRF, MTHFR Deficiency, MTHFR Thermolabile Variant, MTRNR1-Related Hearing Loss and Deafness, MTTS1-Related Hearing Loss and Deafness, MYH-Associated Polyposis, Maple Syrup Urine Disease Type IA, Maple Syrup Urine Disease Type 1B, McCune-Albright Syndrome, Medium Chain Acyl-Coenzyme A Dehydrogenase Deficiency, Megalencephalic Leukoencephalopathy with Subcortical Cysts, Metachromatic Leukodystrophy, Mitochondrial Cardiomyopathy, Mitochondrial DNA-Associated Leigh Syndrome and NARP, Mucolipidosis IV, Mucopolysaccharidosis Type I, Mucopolysaccharidosis Type IIIA, Mucopolysaccharidosis Type VII, Multiple Endocrine Neoplasia Type 2, Muscle-Eye-Brain Disease, Nemaline Myopathy, Neurological phenotype, Niemann-Pick Disease Due to Sphingomyelinase Deficiency, Niemann-Pick Disease 
Type C1, Nijmegen Breakage Syndrome, PPT1-Related Neuronal Ceroid-Lipofuscinosis, PROP1-related pituitary hormone deficiency, Pallister-Hall Syndrome, Paramyotonia Congenita, Pendred Syndrome, Peroxisomal Bifunctional Enzyme Deficiency, Pervasive Developmental Disorders, Phenylalanine Hydroxylase Deficiency, Plasminogen Activator Inhibitor I, Polycystic Kidney Disease, Autosomal Recessive, Prothrombin G20210A Thrombophilia, Pseudovitamin D Deficiency Rickets, Pycnodysostosis, Retinitis Pigmentosa, Autosomal Recessive, Bothnia Type, Rett Syndrome, Rhizomelic Chondrodysplasia Punctata Type 1, Short Chain Acyl-CoA Dehydrogenase Deficiency, Shwachman-Diamond Syndrome, Sjogren-Larsson Syndrome, Smith-Lemli-Opitz Syndrome, Spastic Paraplegia 13, Sulfate Transporter-Related Osteochondrodysplasia, TFR2-Related Hereditary Hemochromatosis, TPP1-Related Neuronal Ceroid-Lipofuscinosis, Thanatophoric Dysplasia, Transthyretin Amyloidosis, Trifunctional Protein Deficiency, Tyrosine Hydroxylase-Deficient DRD, Tyrosinemia Type I, Wilson Disease, X-Linked Juvenile Retinoschisis, and Zellweger Syndrome Spectrum.
- The methods of assessing the probability that progeny will express certain traits, as described herein, can be implemented into systems, programs, and/or services, which can be authorized by, referred by, and/or performed by, e.g., agencies, public or private companies, genetic counseling centers, dating or match-making services, sperm banks, egg providers, reproductive service providers, fertility clinics, or specialty laboratories.
- In one example, the methods described herein are integrated into a testing service that can provide information to a couple on the probability that the couple's offspring will express one or more traits described herein, such as risk of a disease. In addition to the results of the Virtual Progeny assessment, referrals to genetic counselors and/or other relevant medical professionals can be provided in order to provide for follow up testing and consultation.
- In certain embodiments, a Virtual Progeny assessment begins with a customer order, and the customer can pay a service provider a fee in exchange for the assessment. A customer can be two potential parents, e.g., partners. Alternatively, a customer can be a physician, a genetic counselor, a medical center, an insurance company, a website, a dating service, a matchmaking service, a pharmaceutical company, or a laboratory testing service provider, who places an order on behalf of two potential parents. For example, a customer can be two prospective parents who seek to learn whether their offspring will be at risk for developing disease. After a customer places an order, DNA collection kits can be sent to the prospective parents, who can deposit a biological sample described herein into the collection kits. The collection kits can then be returned to the company for sending to a specialty lab or can be returned directly to the specialty lab for performing the assessment. A specialty lab, either internal within the company, contracted to work with the company, or external from the company, can isolate the potential parents' DNA from the provided samples for genome scanning from which Virtual Progeny can be generated, as described herein. After analysis of the Virtual Progeny, the results can be provided to the potential parents. The results can inform the potential parents of the chances that their future offspring will express one or more traits, such as traits described herein. In certain instances, the potential parents can also receive, for example, direct phone consultation with a genetic counselor employed by the company, or contact information for genetic counselors and/or other medical professionals who can provide the potential parents with follow up testing and consultation.
- In other instances, the methods described herein can be used to allow for the evaluation of potential partners in connection with a matchmaking service. In one example, a Virtual Progeny assessment can be offered to a customer in connection with a matchmaking service, for example, through a single company or a co-marketing or partnership relationship. A user of a matchmaking service can order an assessment of Virtual Progeny described herein to determine the probability that an offspring resulting from the potential match between the user and a candidate partner will express one or more traits described herein. The user can then use this information to aid in evaluating the candidate partner for a potential match. The matchmaking service can be an on-line service, such as Shaadi.com, eHarmony.com and Match.com.
- In a particular application, assessment of Virtual Progeny begins with a customer order, where the customer pays a fee in exchange for the assessment. For example, a customer can be a user of a matchmaking service who is interested in evaluating another user for a suitable match. Such a customer can use an assessment of Virtual Progeny described herein to learn whether the potential offspring of a match between such customer and a candidate partner will express one or more traits, such as risk of disease. After selecting a candidate partner to evaluate, a customer can pay for both the customer's and the candidate partner's initial genomic scans with the candidate partner's consent. In other instances, the customer and the candidate partner can also pay separately for the initial genomic scans. After a customer places an order, DNA collection kits can be sent to the customer and the candidate partner, and the customer and the candidate partner can each deposit a biological sample into the collection kit. The collection kits can then be returned to the company for sending to a specialty lab or can be returned directly to the specialty lab for processing according to the methods described herein. A specialty lab, either internal within the company, contracted to work with the company, or external from the company, can perform genomic scans on the customer's and candidate partner's DNA from the provided sample and perform an assessment of Virtual Progeny using the methods described herein. The results of the assessment can then be provided to the customer and/or the candidate partner, and the customer and/or the candidate partner can use the results of the assessment in determining whether the other party is a suitable match.
- In other applications, a female client seeking to have a child can have a Virtual Progeny assessment performed with one or more sperm donors to aid in selecting a donor. In one exemplary method, potential sperm donors are first recruited by a sperm bank. Donors who complete the screening process and are considered qualified by the sperm bank then provide a biological sample (such as a buccal swab) that can be processed to obtain whole DNA sequence, SNP genotypes, CNV genotypes or any other digital genetic information.
- A female client also provides a biological sample, such as a buccal swab, which is used to generate a genome profile for the female client. The female client genome is then recombined computationally with each donor genome to generate a series of independent Virtual Progeny genomes, as described herein, representing each potential donor-client combination. Each Virtual Progeny genome can then be assessed for the probability of exhibiting one or more traits, such as increased risk of disease. In certain instances, incompatible donor-client combinations are subtracted from the total donor pool to obtain a client-specific filtered donor pool, which can be used, e.g., as a starting point for further selection by the client. In other instances, a client can be given information on the probability of the incidence of one or more traits from donor-client combinations, such as traits preselected by the client, for further sperm donor selection by the client.
- In other applications, a male client seeking to have a child can have a Virtual Progeny assessment performed with one or more egg donors to aid in selecting a donor. Egg donors can provide a biological sample (such as a buccal swab) to generate a genome profile for the egg donor, as described herein. The male client also provides a biological sample, such as a buccal swab, which is used to generate a genome profile for the male client. The male client genome is then recombined computationally with each egg donor genome to generate a series of independent Virtual Progeny genomes, as described herein, representing each potential donor client combination. Each Virtual Progeny genome can then be assessed for the probability of exhibiting one or more traits, such as increased risk of disease. In certain instances, incompatible donor-client combinations are subtracted from the total donor pool to obtain a client-specific filtered donor pool, which can be used, e.g., as a starting point for further selection by the client. In other instances, a client can be given information on the probability of the incidence of one or more traits from donor-client combinations, such as traits preselected by the client, for further egg donor selection by the client.
- In yet other applications, a heterosexual couple seeking to use a sperm or egg donor to have a child can use Virtual Progeny assessments to screen potential donors. For example, the couple may seek a sperm donor, and the female partner will be the genetic parent of offspring with the sperm donor. Alternatively, the couple may seek an egg donor, and the male partner will be the genetic parent of offspring with the egg donor. In such instances, two rounds of Virtual Progeny assessments can be performed. A first round of Virtual Progeny assessment is performed using biological samples from the heterosexual couple. A second round of Virtual Progeny assessment is performed between the genetic parent and one or more potential donors. The results of the first round of Virtual Progeny assessment can then be compared with the results of the second round, and a donor can be chosen whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the heterosexual couple.
- In still other applications, a female homosexual couple seeking to use a sperm donor to have a child can use Virtual Progeny assessments to screen potential sperm donors. Only one of the female partners will be the genetic parent of offspring with the sperm donor. A first round of Virtual Progeny assessment is performed using biological samples from the homosexual couple. A second round of Virtual Progeny assessment is performed between the genetic female parent and one or more potential sperm donors. The results of the first round of Virtual Progeny assessment can then be compared with the results of the second round, and a sperm donor can be chosen whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the homosexual couple. In some situations, a Virtual Progeny assessment is also performed with the second female partner and one or more potential sperm donors, and a donor is selected whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the homosexual couple.
- In yet other applications, a male homosexual couple seeking to use an egg donor to have a child can use Virtual Progeny assessments to screen potential egg donors. Only one of the male partners will be the genetic parent of offspring with the egg donor. A first round of Virtual Progeny assessment is performed using biological samples from the homosexual couple. A second round of Virtual Progeny assessment is performed between the genetic male parent and one or more potential egg donors. The results of the first round of Virtual Progeny assessment can then be compared with the results of the second round, and an egg donor can be chosen whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the homosexual couple. In some situations, a Virtual Progeny assessment is also performed with the second male partner and one or more potential egg donors, and a donor is selected whose Virtual Progeny exhibits an acceptable amount of matching in one or more traits with the Virtual Progeny from the homosexual couple.
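- The two-round comparison described in the preceding applications can be sketched as follows. The disclosure does not specify how "an acceptable amount of matching" is measured, so this example assumes one simple possibility: the mean absolute difference between trait probabilities of the couple's Virtual Progeny and those of each donor-parent Virtual Progeny, with a tolerance chosen by the user. The names `trait_mismatch`, `rank_matching_donors`, and all trait labels and numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def trait_mismatch(reference: Dict[str, float], candidate: Dict[str, float]) -> float:
    """Mean absolute difference in trait probabilities over the reference traits."""
    if not reference:
        raise ValueError("reference profile must contain at least one trait")
    total = sum(abs(p - candidate.get(trait, 0.0)) for trait, p in reference.items())
    return total / len(reference)


def rank_matching_donors(
    couple_profile: Dict[str, float],
    donor_profiles: Dict[str, Dict[str, float]],
    tolerance: float,
) -> List[Tuple[str, float]]:
    """Keep donors whose mismatch is within `tolerance`, best match first."""
    scored = [(donor, trait_mismatch(couple_profile, profile))
              for donor, profile in donor_profiles.items()]
    acceptable = [(donor, score) for donor, score in scored if score <= tolerance]
    return sorted(acceptable, key=lambda item: item[1])


# Round 1: trait probabilities for the couple's own Virtual Progeny (illustrative).
couple = {"brown_eyes": 0.75, "tall": 0.5}
# Round 2: trait probabilities for genetic parent x each potential donor (illustrative).
donors = {
    "d1": {"brown_eyes": 0.75, "tall": 0.25},
    "d2": {"brown_eyes": 0.25, "tall": 0.0},
    "d3": {"brown_eyes": 0.75, "tall": 0.5},
}
ranked = rank_matching_donors(couple, donors, tolerance=0.2)
print(ranked)
```

- Other distance measures, or per-trait weights reflecting the couple's priorities, would fit the same structure.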
- In further applications of the methods disclosed herein, the risk of disease in a potential progeny can be assessed, as well as the likelihood of expressing a genetically influenced trait. As with the other methods disclosed, a first genomic DNA sample is obtained from a first potential parent and a second genomic DNA sample is obtained from a second potential parent. The presence or absence of one or more nucleotide variants is identified at one or more loci of at least one pair of chromosomes of the first and the second genomic DNA samples, and these identified nucleotide variants for the first and second genomic DNA samples are compared to a plurality of predetermined genomic sequences of haplotypes having predetermined frequencies at predetermined loci to identify haplotypes present in the first and second genomic DNA samples. A first diploid genome profile for the first potential parent is constructed. The first genome profile comprises the identified haplotypes in the first genomic DNA sample and a linkage probability determined by the frequencies of the identified haplotypes in the plurality of predetermined genomic sequences. A second diploid genome profile for the second potential parent is constructed. The second genome profile comprises the identified haplotypes in the second genomic DNA sample and a linkage probability determined by the frequencies of the identified haplotypes in the plurality of predetermined genomic sequences.
A first library is constructed that comprises potential haploid gamete genomes from the first diploid genome profile by generating a combination of the haplotypes identified in the first genomic DNA sample using the linkage probability for each combination of the identified haplotypes, while a second library is constructed that comprises potential haploid gamete genomes from the second diploid genome profile by generating a combination of the haplotypes identified in the second genomic DNA sample using the linkage probability for each combination of the identified haplotypes. The method also entails combining a first haploid gamete genome from the first library with a second haploid gamete genome from the second library to form a diploid progeny genome. The diploid progeny genome is compared to a database of genomes relating to disease-associated or genetically influenced traits, thereby assessing the risk of disease or the likelihood of expressing a genetically influenced trait of the potential progeny.
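- The library construction and combination steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the claimed implementation: the diploid profile is modeled as an ordered list of haplotype blocks, the "linkage probability" is interpreted as the probability that two adjacent blocks are inherited from the same homolog, and the disease database is reduced to a dictionary of recessive-risk haplotypes. The names `DiploidProfile`, `make_gamete`, `build_library`, `combine`, and `flagged_blocks`, and all haplotype labels, are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DiploidProfile:
    # blocks[i] holds the two haplotype labels carried at block i (one per homolog).
    blocks: List[Tuple[str, str]]
    # linkage[i] is the assumed probability that blocks i and i+1 are inherited
    # from the same homolog (no strand switch between them).
    linkage: List[float]


def make_gamete(profile: DiploidProfile, rng: random.Random) -> List[str]:
    """Draw one potential haploid gamete genome from a diploid profile."""
    strand = rng.randrange(2)  # start on either homolog with equal probability
    gamete = [profile.blocks[0][strand]]
    for i in range(1, len(profile.blocks)):
        if rng.random() >= profile.linkage[i - 1]:
            strand = 1 - strand  # simulated recombination between blocks
        gamete.append(profile.blocks[i][strand])
    return gamete


def build_library(profile: DiploidProfile, size: int, rng: random.Random) -> List[List[str]]:
    return [make_gamete(profile, rng) for _ in range(size)]


def combine(gamete_1: List[str], gamete_2: List[str]) -> List[Tuple[str, str]]:
    """Pair two haploid gametes block by block to form a diploid progeny genome."""
    return list(zip(gamete_1, gamete_2))


def flagged_blocks(progeny: List[Tuple[str, str]], risk_haplotypes: Dict[int, str]) -> List[int]:
    """Blocks where both copies carry a haplotype listed as recessive-risk."""
    return [i for i, hap in risk_haplotypes.items() if progeny[i] == (hap, hap)]


rng = random.Random(0)
parent_1 = DiploidProfile(blocks=[("A1", "A2"), ("B1", "B2"), ("C1", "C2")],
                          linkage=[1.0, 1.0])
parent_2 = DiploidProfile(blocks=[("A2", "A3"), ("B2", "B3"), ("C2", "C3")],
                          linkage=[0.9, 0.5])
library_1 = build_library(parent_1, 100, rng)
library_2 = build_library(parent_2, 100, rng)
progeny = combine(rng.choice(library_1), rng.choice(library_2))
print(progeny, flagged_blocks(progeny, {1: "B2"}))
```

- With a linkage value of 1.0 between every pair of blocks, as for `parent_1` here, each gamete is simply a copy of one of the two parental homologs; lower values allow mixed gametes.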
- The methods and systems described herein can be used in combination with one or more processors, having either single or multiple cores. The processor can be operatively connected to a memory. For instance, the memory can be solid state, flash, or nanoparticle based. The processor and/or memory can be operatively connected to a network via a network adapter. The network can be digital, analog, or a combination of the two. The processor can be operatively connected to the memory to execute computer program instructions to perform one or more steps described herein. Any computer language known to those skilled in the art can be used.
- Input/output circuitry can be included to provide the capability to input data to, or output data from, the processor and/or memory. For example, input/output circuitry can include input devices, such as keyboards, mice, touch pads, trackballs, scanners, and the like; output devices, such as video adapters, monitors, printers, and the like; and input/output devices, such as modems and the like.
- The memory can store program instructions that are executed by, and data that are used and processed by, CPUs to perform various functions. The memory can include electronic memory devices, such as random-access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), and flash memory, and electro-mechanical memory, such as magnetic disk drives, tape drives, and optical disk drives, which can use an integrated drive electronics (IDE) interface, or a variation or enhancement thereof, such as enhanced IDE (EIDE) or ultra direct memory access (UDMA), or a small computer system interface (SCSI) based interface, or a variation or enhancement thereof, such as fast-SCSI, wide-SCSI, fast and wide-SCSI, etc., or a fiber channel-arbitrated loop (FC-AL) interface.
- The systems described herein can also include an operating system that runs on the processor, including UNIX®, OS/2®, and Windows®, each of which can be configured to run many tasks at the same time, i.e., as a multitasking operating system. In one aspect, the methods are utilized with a wireless communication and/or computation device, such as a mobile phone, personal digital assistant, personal computer, and the like. Moreover, the computing system can be operable to wirelessly transmit data to wireless or wired communication devices using a data network, such as the Internet, or a local area network (LAN), wide-area network (WAN), cellular network, or other wireless networks known to those skilled in the art.
- In one embodiment, a graphical user interface can be included to allow human interaction with the computing system. The graphical user interface can comprise a screen, such as an organic light emitting diode screen, liquid crystal display screen, thin film transistor display, and the like. The graphical user interface can generate a wide range of colors, or a black and white screen can be used.
- In certain instances, the graphical user interface can be touch sensitive, and it can use any technology known to skilled artisans including, but not limited to, resistive, surface acoustic wave, capacitive, infrared, strain gauge, optical imaging, dispersive signal technology, acoustic pulse recognition, frustrated total internal reflection, and diffused laser imaging.
- The methods and compositions disclosed herein are further illustrated by the following examples. The examples are provided for illustrative purposes only. They are not to be construed as limiting the scope or content of the invention in any way.
- In this particular example, the generation of a Virtual Progeny genome is a four-step process. One of ordinary skill in the art will understand that other steps may be added, combined, or deleted as desired.
- Processing is accomplished with the use of DNA microarrays, DNA sequencing protocols, or other DNA reading technologies. In the present example, a DNA microarray is used to generate information relating to loci of interest. This information is utilized to produce genome scans that include genotype information from the plurality of loci of interest, which are defined by single base polymorphisms (“SNPs or CNPs”), DNA sequence reads, copy number, or other forms of personal genetic information. In the present example, Jane Doe and John Smith provided samples, which have such information provided for loci 01 through N (FIG. 1A).
- Existing population datasets, genome scans of family members, and a variety of computational tools and algorithms, known to those skilled in the art, may be used in combination with each person's genome scan to distinguish haplotypes, impute genotypes at additional loci, and establish long-range genetic phasing. The derived genome profile preferably incorporates phasing information in the form of stochastic matrices between haplotypes. An example of haplotype structure is shown in the figure below: loci 01-04 are inherited as an indivisible block, which is a haplotype. Stochastic matrices between loci (FIG. 1B).
- With genome scans performed on two or more related persons, phasing information is extended. In an example of genome analysis, the UCSC Genome Browser is used to display phasing over large maternally-inherited chromosomal segments that comprise 100 million base pairs or more (FIG. 1C). A Monte Carlo simulation or Markov process as described above is used to generate haplopaths through a genome, where haplotypes are transmitted intact, and stochastic matrices are used to move from one haplotype or locus to the next one. In the example, John Smith's genome is converted into a series of haplopaths by means of a Monte Carlo simulation (FIG. 1D).
- Each individual genome profile is used to generate a pool of Virtual Gametes (FIG. 1E). Exemplary haplopaths in rows
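- The haplopath generation described above, in which haplotypes are transmitted intact and stochastic matrices govern the move from one haplotype to the next, can be sketched as a simple Markov chain. The data layout is an assumption made for illustration: `haplotypes[i]` lists the candidate haplotypes at block i, and `transitions[i]` is a row-stochastic matrix giving the probability of moving from each haplotype at block i to each haplotype at block i+1. The function name `sample_haplopath`, the uniform choice of starting haplotype, and all labels and matrix values are hypothetical.

```python
import random
from typing import List, Sequence


def sample_haplopath(
    haplotypes: Sequence[Sequence[str]],
    transitions: Sequence[Sequence[Sequence[float]]],
    rng: random.Random,
) -> List[str]:
    """Walk through the blocks once, choosing each next haplotype from the
    stochastic-matrix row of the current haplotype."""
    # Assumption: the starting haplotype is chosen uniformly at random.
    state = rng.randrange(len(haplotypes[0]))
    path = [haplotypes[0][state]]
    for i, matrix in enumerate(transitions):
        row = matrix[state]  # transition probabilities out of the current haplotype
        state = rng.choices(range(len(row)), weights=row, k=1)[0]
        path.append(haplotypes[i + 1][state])
    return path


# Three blocks with two candidate haplotypes each (illustrative labels).
haplotypes = [["h1a", "h1b"], ["h2a", "h2b"], ["h3a", "h3b"]]
# Deterministic matrices keep the example easy to follow: block 1 -> 2 never
# switches phase, block 2 -> 3 always switches phase.
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
path = sample_haplopath(haplotypes, transitions, random.Random(1))
print(path)
```

- Calling `sample_haplopath` repeatedly yields a pool of haplopaths; with non-degenerate matrix rows, the pool reflects the phasing uncertainty encoded in the genome profile.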
Step 4—Virtual Progeny Permutations from Random Virtual Gametes from Each Individual
- A single Virtual Gamete from each person is chosen randomly, and the two are combined to produce one permutation of a Virtual Progeny genome. The process of Virtual Gamete choice and reproductive combination to produce a diploid genome is iterated a sufficient number of times such that the normalized sum of Virtual Progeny permutations provides a stable estimate of the Virtual Progeny genome probability distribution. For instance, the number of iterations may be between about 10 and about 100. More preferably, the number of iterations may be between about 100 and about 1000. Most preferably, the number of iterations may be between about 1000 and about 100,000. In another aspect, the number of iterations may be about 50 or greater. More preferably, the number of iterations may be about 150 or greater. Most preferably, the number of iterations may be about 3000 or greater.
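- The iteration in Step 4 amounts to a Monte Carlo estimate of a genotype probability distribution. The sketch below restricts each Virtual Gamete to a single locus for brevity, which is a simplifying assumption; the function name `estimate_genotype_distribution`, the allele labels, and the pool contents are hypothetical. The iteration count of 3000 is taken from the "about 3000 or greater" range mentioned above.

```python
import random
from collections import Counter
from typing import Dict, Sequence


def estimate_genotype_distribution(
    gametes_1: Sequence[str],
    gametes_2: Sequence[str],
    iterations: int,
    rng: random.Random,
) -> Dict[str, float]:
    """Repeatedly pair one random gamete from each pool and return the
    normalized counts of the resulting unordered genotypes."""
    counts: Counter = Counter()
    for _ in range(iterations):
        allele_1 = rng.choice(gametes_1)
        allele_2 = rng.choice(gametes_2)
        # Sort so that "A" + "a" and "a" + "A" count as the same genotype.
        genotype = "/".join(sorted((allele_1, allele_2)))
        counts[genotype] += 1
    return {genotype: n / iterations for genotype, n in counts.items()}


rng = random.Random(42)
# Each pool stands in for the Virtual Gametes of one heterozygous carrier.
pool_1 = ["A", "a"]
pool_2 = ["A", "a"]
distribution = estimate_genotype_distribution(pool_1, pool_2, 3000, rng)
print(distribution)
```

- For two heterozygous carriers the estimates should settle near the Mendelian expectation of 1/4, 1/2, and 1/4 as the number of iterations grows; comparing estimates at increasing iteration counts is one simple way to judge whether the distribution has stabilized.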
- It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/160,191 US20210210161A1 (en) | 2009-10-20 | 2021-01-27 | Methods and systems for generating a virtual progeny genome |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US25310809P | 2009-10-20 | 2009-10-20 | |
US12/908,636 US8805620B2 (en) | 2009-10-20 | 2010-10-20 | Method and system for selecting a donor or reproductive partner for a potential parent |
US14/293,541 US10916332B2 (en) | 2009-10-20 | 2014-06-02 | Methods and systems for generating a virtual progeny genome |
US17/160,191 US20210210161A1 (en) | 2009-10-20 | 2021-01-27 | Methods and systems for generating a virtual progeny genome |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/293,541 Continuation US10916332B2 (en) | 2009-10-20 | 2014-06-02 | Methods and systems for generating a virtual progeny genome |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210210161A1 (en) | 2021-07-08 |
Family
ID=43900666
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/908,636 Active - Reinstated 2032-09-20 US8805620B2 (en) | 2009-10-20 | 2010-10-20 | Method and system for selecting a donor or reproductive partner for a potential parent |
US13/591,373 Active US8620594B2 (en) | 2009-10-20 | 2012-08-22 | Method and system for generating a virtual progeny genome |
US14/293,541 Active 2033-03-09 US10916332B2 (en) | 2009-10-20 | 2014-06-02 | Methods and systems for generating a virtual progeny genome |
US17/160,191 Pending US20210210161A1 (en) | 2009-10-20 | 2021-01-27 | Methods and systems for generating a virtual progeny genome |
Family Applications Before (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/908,636 Active - Reinstated 2032-09-20 US8805620B2 (en) | 2009-10-20 | 2010-10-20 | Method and system for selecting a donor or reproductive partner for a potential parent |
US13/591,373 Active US8620594B2 (en) | 2009-10-20 | 2012-08-22 | Method and system for generating a virtual progeny genome |
US14/293,541 Active 2033-03-09 US10916332B2 (en) | 2009-10-20 | 2014-06-02 | Methods and systems for generating a virtual progeny genome |
Country Status (3)
Country | Link |
---|---|
US (4) | US8805620B2 (en) |
EP (1) | EP2491170A4 (en) |
WO (1) | WO2011050076A1 (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8543339B2 (en) * | 2008-12-05 | 2013-09-24 | 23Andme, Inc. | Gamete donor selection based on genetic calculations |
US10131947B2 (en) | 2011-01-25 | 2018-11-20 | Ariosa Diagnostics, Inc. | Noninvasive detection of fetal aneuploidy in egg donor pregnancies |
EP2929070A4 (en) * | 2012-12-05 | 2016-06-01 | Genepeeks Inc | System and method for the computational prediction of expression of single-gene phenotypes |
FI20136079A (en) * | 2013-11-04 | 2015-05-05 | Medisapiens Oy | Genetic health assessment procedure and system |
US10114851B2 (en) | 2014-01-24 | 2018-10-30 | Sachet Ashok Shukla | Systems and methods for verifiable, private, and secure omic analysis |
JP6340438B2 (en) * | 2014-02-13 | 2018-06-06 | イルミナ インコーポレイテッド | Integrated consumer genome service |
US10658068B2 (en) * | 2014-06-17 | 2020-05-19 | Ancestry.Com Dna, Llc | Evolutionary models of multiple sequence alignments to predict offspring fitness prior to conception |
US12045742B1 (en) * | 2015-04-18 | 2024-07-23 | Mikhail Ulinich | Sharing probabilities for user traits |
WO2016172464A1 (en) | 2015-04-22 | 2016-10-27 | Genepeeks, Inc. | Device, system and method for assessing risk of variant-specific gene dysfunction |
US10395759B2 (en) | 2015-05-18 | 2019-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for copy number variant detection |
CN115273970A (en) | 2016-02-12 | 2022-11-01 | 瑞泽恩制药公司 | Method and system for detecting abnormal karyotype |
US10252145B2 (en) | 2016-05-02 | 2019-04-09 | Bao Tran | Smart device |
US10381105B1 (en) | 2017-01-24 | 2019-08-13 | Bao | Personalized beauty system |
US10052026B1 (en) | 2017-03-06 | 2018-08-21 | Bao Tran | Smart mirror |
US11699069B2 (en) | 2017-07-13 | 2023-07-11 | Helix, Inc. | Predictive assignments that relate to genetic information and leverage machine learning models |
US9922285B1 (en) | 2017-07-13 | 2018-03-20 | HumanCode, Inc. | Predictive assignments that relate to genetic information and leverage machine learning models |
US20210032692A1 (en) * | 2017-08-17 | 2021-02-04 | Tai Diagnostics, Inc. | Methods of determining donor cell-free dna without donor genotype |
US10930380B2 (en) | 2017-11-10 | 2021-02-23 | Reliant Immune Diagnostics, Inc. | Communication loop and record loop system for parallel/serial dual microfluidic chip |
US11200986B2 (en) * | 2017-11-10 | 2021-12-14 | Reliant Immune Diagnostics, Inc. | Database and machine learning in response to parallel serial dual microfluidic chip |
US10930381B2 (en) | 2017-11-10 | 2021-02-23 | Reliant Immune Diagnostics, Inc. | Microfluidic testing system for mobile veterinary applications |
US11124821B2 (en) | 2017-11-10 | 2021-09-21 | Reliant Immune Diagnostics, Inc. | Microfluidic testing system with cell capture/analysis regions for processing in a parallel and serial manner |
US11527324B2 (en) | 2017-11-10 | 2022-12-13 | Reliant Immune Diagnostics, Inc. | Artificial intelligence response system based on testing with parallel/serial dual microfluidic chip |
US11041185B2 (en) | 2017-11-10 | 2021-06-22 | Reliant Immune Diagnostics, Inc. | Modular parallel/serial dual microfluidic chip |
WO2020102565A2 (en) | 2018-11-14 | 2020-05-22 | Flagship Pioneering Innovations V, Inc. | Systems and methods for nondestructive testing of gametes |
CN111192633A (en) * | 2020-01-07 | 2020-05-22 | 深圳市早知道科技有限公司 | Method and terminal equipment for predicting thalassemia phenotype |
KR102405758B1 (en) | 2021-11-19 | 2022-06-08 | 주식회사 클리노믹스 | System and method for determing gemetic population composition using hybrid specific reference genetic data generation for population, breed, disease groups, and species and analysis for determinig genetic components |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030195707A1 (en) * | 2000-05-25 | 2003-10-16 | Schork Nicholas J | Methods of dna marker-based genetic analysis using estimated haplotype frequencies and uses thereof |
US20050278125A1 (en) * | 2004-06-10 | 2005-12-15 | Evan Harwood | V-life matching and mating system |
WO2007050511A2 (en) * | 2005-10-24 | 2007-05-03 | Bioarray Solutions, Ltd. | Selection of genotyped transfusion donors by cross-matching to genotyped recipients |
US20090099789A1 (en) * | 2007-09-26 | 2009-04-16 | Stephan Dietrich A | Methods and Systems for Genomic Analysis Using Ancestral Data |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020077775A1 (en) * | 2000-05-25 | 2002-06-20 | Schork Nicholas J. | Methods of DNA marker-based genetic analysis using estimated haplotype frequencies and uses thereof |
WO2004092333A2 (en) * | 2003-04-09 | 2004-10-28 | Omicia Inc. | Methods of selection, reporting and analysis of genetic markers using broad based genetic profiling applications |
US20050112684A1 (en) * | 2003-11-21 | 2005-05-26 | Eric Holzle | Class I and Class II MHC Profiling for Social and Sexual Matching of Human Partners |
AU2008206924B2 (en) * | 2007-01-17 | 2013-02-07 | Syngenta Participations Ag | Process for selecting individuals and designing a breeding program |
EP1962212A1 (en) * | 2007-01-17 | 2008-08-27 | Syngeta Participations AG | Process for selecting individuals and designing a breeding program |
WO2009073628A2 (en) * | 2007-11-30 | 2009-06-11 | Celera Corporation | Genetic polymorphisms associated with psoriasis, methods of detection and uses thereof |
CA2718887A1 (en) | 2008-03-19 | 2009-09-24 | Existence Genetics Llc | Genetic analysis |
TWI460602B (en) | 2008-05-16 | 2014-11-11 | Counsyl Inc | Device for universal preconception screening |
US8543339B2 (en) | 2008-12-05 | 2013-09-24 | 23Andme, Inc. | Gamete donor selection based on genetic calculations |
WO2011038155A2 (en) | 2009-09-23 | 2011-03-31 | Existence Genetics Llc | Genetic analysis |
2010
- 2010-10-20 WO PCT/US2010/053396 patent/WO2011050076A1/en active Application Filing
- 2010-10-20 US US12/908,636 patent/US8805620B2/en active Active - Reinstated
- 2010-10-20 EP EP10825601.7A patent/EP2491170A4/en not_active Ceased
2012
- 2012-08-22 US US13/591,373 patent/US8620594B2/en active Active
2014
- 2014-06-02 US US14/293,541 patent/US10916332B2/en active Active
2021
- 2021-01-27 US US17/160,191 patent/US20210210161A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US10916332B2 (en) | 2021-02-09 |
US8620594B2 (en) | 2013-12-31 |
US20150088429A1 (en) | 2015-03-26 |
EP2491170A4 (en) | 2014-10-22 |
WO2011050076A1 (en) | 2011-04-28 |
US20110124515A1 (en) | 2011-05-26 |
US20120322671A1 (en) | 2012-12-20 |
EP2491170A1 (en) | 2012-08-29 |
US8805620B2 (en) | 2014-08-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
| AS | Assignment | Owner name: ANCESTRY.COM DNA, LLC, UTAH Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GENEPEEKS (ABC), LLC;REEL/FRAME:055979/0519 Effective date: 20180904 Owner name: GENEPEEKS (ABC), LLC, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:GENEPEEKS, INC.;REEL/FRAME:055976/0891 Effective date: 20180420 Owner name: GENEPEEKS, INC., NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SILVER, LEE M.;REEL/FRAME:055976/0361 Effective date: 20141222 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| AS | Assignment | Owner name: WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT, DELAWARE Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:ANCESTRY.COM DNA, LLC;ANCESTRY.COM OPERATIONS INC.;REEL/FRAME:058536/0278 Effective date: 20211217 Owner name: CREDIT SUISSE AG, CAYMAN ISLANDS BRANCH, AS COLLATERAL AGENT, NEW YORK Free format text: PATENT SECURITY AGREEMENT;ASSIGNORS:ANCESTRY.COM DNA, LLC;ANCESTRY.COM OPERATIONS INC.;REEL/FRAME:058536/0257 Effective date: 20211217 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |