WO2008140467A2 - Genetic markers and methods for improving dairy productivity and fitness traits - Google Patents
- Publication number
- WO2008140467A2 (application PCT/US2007/021187)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- animal
- snps
- genotype
- snp
- loci
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/172—Haplotypes
Definitions
- the invention relates to the enhancement of desirable characteristics in dairy cattle.
- Embodiments relate to genes, gene expression, and genetic markers used in methods for improving dairy cattle fitness and/or productivity. More specifically, it relates to the use of genetic markers in methods for improving dairy cattle, including improvements with respect to Milk production (MP), Somatic Cell Score (SCS), Daughter Pregnancy Rate (DPR), Productive Life (PL), and Net Merit (NM).
- MP Milk production
- SCS Somatic Cell Score
- DPR Daughter Pregnancy Rate
- PL Productive Life
- NM Net Merit
- milk productivity e.g. milk, fat, protein yield, fat %, protein % and persistency of lactation
- health e.g. Somatic Cell Count, mastitis incidence
- fertility e.g. pregnancy rate, display of estrus, calving interval and non-return rates in bulls
- calving ease e.g. direct and maternal calving ease
- longevity e.g. productive life
- functional conformation e.g. udder support, proper foot and leg shape, proper rump angle, etc.
- efficiency traits are often unfavorably correlated with fitness traits.
- Genomics offers the potential for greater improvement in productivity and fitness traits through the discovery of genes, or genetic markers linked to genes, that account for genetic variation and can be used for more direct and accurate selection. Close to 1000 markers associated with productivity and fitness traits have been reported (see www.bovineqtl.tamu.edu/ for a searchable database of reported QTL). However, the resolution of QTL location is still quite low, which makes it difficult to use these QTL in marker-assisted selection (MAS) on an industrial scale.
- MAS marker-assisted selection
- Linkage disequilibrium can be defined as the observance of alleles at two distinct loci occurring together in gametes more frequently than expected given the known allele frequencies and recombination fraction between the two loci (Source: NHBLI/NCBI Glossary). The knowledge that allele 1 at marker A occurs in gametes disproportionately with allele 1 at marker B confers the equivalent practical and economic value of marker B upon marker A, such that the two markers can be considered equivalent.
- the large number of resulting linked markers can be used in several methods of marker selection or marker-assisted selection, including whole-genome selection (WGS) (Meuwissen et al., Genetics 2001), to improve the genetic merit of the population for these traits and create value in the dairy industry.
- WGS whole-genome selection
- Various embodiments of the invention provide methods for evaluating an animal's genotype at 10 or more positions in the animal's genome.
- the animal's genotype is evaluated at positions within a segment of DNA (an allele), that contains at least one SNP selected from the SNPs described in the Tables and Sequence Listing of the present application.
- Other embodiments of the invention provide methods for allocating animals for use according to their predicted marker breeding value for productivity and/or fitness.
- Various aspects of this embodiment of the invention provide methods that comprise: a) analyzing the animal's genomic sequence at one or more polymorphisms (where the alleles analyzed each comprise at least one SNP) to determine the animal's genotype at each of those polymorphisms; b) analyzing the genotype determined for each polymorphism to determine which allele of the SNP is present; and c) allocating the animal for use based on its genotype at one or more of the polymorphisms analyzed.
- Various aspects of this embodiment of the invention provide methods for allocating animals for use based on a favorable association between the animal's genotype, at one or more polymorphisms disclosed in the present application, and a desired phenotype. Alternatively, the methods provide for not allocating an animal for a certain use because it has one or more SNP alleles that are either associated with undesirable phenotypes or are not associated with desirable phenotypes.
- Other embodiments of the invention provide methods for allocating animals for use according to their predicted marker breeding value for Milk production, Somatic Cell Score, Daughter Pregnancy Rate, Productive Life, and/or Net Merit.
- Various aspects of this embodiment of the invention provide methods that comprise: a) analyzing the animal's genomic sequence at one or more polymorphisms (where the alleles analyzed each comprise at least one SNP) to determine the animal's genotype at each of those polymorphisms; b) analyzing the genotype determined for each polymorphism to determine which allele of the SNP is present; and c) allocating the animal for use based on its genotype at one or more of the polymorphisms analyzed.
- Various aspects of this embodiment of the invention provide methods for allocating animals for use based on a favorable association between the animal's genotype, at one or more polymorphisms disclosed in the present application, and a desired phenotype.
- the methods provide for not allocating an animal for a certain use because it has one or more SNP alleles that are either associated with undesirable phenotypes or are not associated with desirable phenotypes.
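The genotype-then-allocate steps above can be sketched in code. This is a minimal illustration, not the method of the application: it assumes a simple additive model in which each copy of a named allele contributes a fixed effect, and the SNP names, effect sizes, threshold, and function names are all hypothetical.

```python
from typing import Dict, Tuple

# SNP id -> (allele of interest, effect per copy). Values are hypothetical.
Effects = Dict[str, Tuple[str, float]]


def predicted_marker_value(genotype: Dict[str, str], effects: Effects) -> float:
    """Sum per-copy allele effects over the SNPs genotyped for one animal.

    genotype maps a SNP id to a two-letter call such as "GT".
    SNPs missing from the genotype are skipped (assumed not genotyped).
    """
    total = 0.0
    for snp, (allele, effect) in effects.items():
        call = genotype.get(snp)
        if call is None:
            continue
        total += call.count(allele) * effect
    return total


def allocate(genotype: Dict[str, str], effects: Effects, threshold: float) -> str:
    """Allocate to breeding when the summed value reaches the threshold."""
    value = predicted_marker_value(genotype, effects)
    return "breeding" if value >= threshold else "non-breeding"


# Illustrative data only.
example_effects: Effects = {"snp_a": ("G", 1.5), "snp_b": ("T", -0.5)}
example_animal = {"snp_a": "GG", "snp_b": "CT"}
print(allocate(example_animal, example_effects, threshold=2.0))
```

An unfavorable allele is represented here simply as a negative effect, which mirrors the alternative of not allocating an animal that carries alleles associated with undesirable phenotypes.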
- Other embodiments of the invention provide methods for selecting animals for use in breeding to produce progeny.
- Various aspects of these methods comprise: A) determining the genotype of at least one potential parent animal at one or more locus/loci, where at least one of the loci analyzed contains an allele of a SNP selected from the group of SNPs described in Tables 3, 4, 5 and 6 and the Sequence Listing; B) analyzing the determined genotype at one or more positions for at least one animal to determine which of the SNP alleles is present; C) correlating the analyzed allele(s) with one or more phenotypes; and D) allocating at least one animal for use to produce progeny.
- Other embodiments of the invention provide methods for selecting animals for use in breeding to produce progeny.
- Various aspects of these methods comprise: A) determining the genotype of at least one potential parent animal at one or more locus/loci, where at least one of the loci analyzed contains an allele of a SNP selected from the group of SNPs described in Table 4 and the Sequence Listing and/or other SNPs within one or more genes described in Table 4; B) analyzing the determined genotype at one or more positions for at least one animal to determine which of the SNP alleles is present; C) correlating the analyzed allele(s) with one or more phenotypes; and D) allocating at least one animal for use to produce progeny.
- inventions provide methods for producing offspring animals (progeny animals). Aspects of this embodiment of the invention provide methods that comprise: breeding an animal that has been selected for breeding by methods described herein to produce offspring.
- the offspring may be produced by purely natural methods or through the use of any appropriate technical means, including but not limited to: artificial insemination, embryo transfer (ET), multiple ovulation embryo transfer (MOET), in vitro fertilization (IVF), or any combination thereof.
- databases or groups of databases each database comprising lists of the nucleic acid sequences, which lists include a plurality of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing.
- Preferred aspects of this embodiment of the invention provide for databases comprising the sequences for 50 or more SNPs.
- Other aspects of these embodiments comprise methods for using a computer algorithm or algorithms that use one or more database(s), each database comprising a plurality of the SNPs described in Table 3, 5 and 6 and the Sequence Listing to identify phenotypic traits associated with the inheritance of one or more alleles of the SNPs, and/or using such a database to aid in animal allocation.
- databases or groups of databases each database comprising lists of the nucleic acid sequences, which lists include a plurality of the SNPs described in Table 4 and the Sequence Listing and/or other SNPs located within the genes listed in Table 4.
- Preferred aspects of this embodiment of the invention provide for databases comprising the sequences for 50 or more SNPs.
- Other aspects of these embodiments comprise methods for using a computer algorithm or algorithms that use one or more database(s), each database comprising a plurality of the SNPs described in Table 4 and the Sequence Listing to identify phenotypic traits associated with the inheritance of one or more alleles of the SNPs, and/or using such a database to aid in animal allocation.
- genes identified in this application are likely responsible (either through quantitative or qualitative variation of the protein) for a significant proportion of the observed genetic variation for the trait.
- other embodiments of the invention may include modulating the presence of the protein products of genes described in Table 4 in the animal through various established methods, which would therefore likely modulate the phenotype of the animal in a predictable fashion.
- Additional embodiments of the invention provide methods for identifying other genetic markers that are in allelic association with one or more of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing.
- allelic association preferably means: nonrandom deviation of $f(A_1B_1)$ from the product of $f(A_1)$ and $f(B_1)$, which is specifically defined by $r^2 > 0.2$, where $r^2$ is measured from a reasonably large animal sample (e.g., ≥100) and defined as
  $$r^2 = \frac{\left[f(A_1B_1) - f(A_1)\,f(B_1)\right]^2}{f(A_1)\,\bigl(1 - f(A_1)\bigr)\,f(B_1)\,\bigl(1 - f(B_1)\bigr)}$$
- $A_1$ represents an allele at one locus
- $B_1$ represents an allele at another locus
- $f(A_1B_1)$ denotes frequency of gametes having both $A_1$ and $B_1$
- $f(A_1)$ is the frequency of $A_1$
- $f(B_1)$ is the frequency of $B_1$ in a population.
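As a worked illustration of the allelic association measure, the sketch below computes r² from three frequencies and applies the 0.2 cutoff named in the definition. It assumes the standard squared-correlation form of r², that is, the squared deviation of the gamete frequency from the product of the allele frequencies, divided by f(A)(1 - f(A)) f(B)(1 - f(B)). The function names and the example frequencies are hypothetical.

```python
def ld_r_squared(f_ab: float, f_a: float, f_b: float) -> float:
    """r^2 between two alleles.

    f_ab: frequency of gametes carrying both alleles
    f_a, f_b: population frequencies of each allele (must be strictly
              between 0 and 1, otherwise r^2 is undefined)
    """
    if not (0.0 < f_a < 1.0 and 0.0 < f_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    deviation = f_ab - f_a * f_b
    return deviation ** 2 / (f_a * (1.0 - f_a) * f_b * (1.0 - f_b))


def in_allelic_association(f_ab: float, f_a: float, f_b: float,
                           threshold: float = 0.2) -> bool:
    """Apply the r^2 > 0.2 criterion from the definition above."""
    return ld_r_squared(f_ab, f_a, f_b) > threshold


# Illustrative frequencies only.
print(ld_r_squared(0.40, 0.5, 0.5))
print(in_allelic_association(0.40, 0.5, 0.5))
```

When the two alleles are independent, the gamete frequency equals the product of the allele frequencies and r² is zero.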
- allocating animals for use and “allocation for use” preferably mean deciding how an animal will be used within a herd or that it will be removed from the herd to achieve desired herd management goals.
- an animal might be allocated for use as a breeding animal or allocated for sale as a non-breeding animal (e.g. allocated to animals intended to be sold for meat).
- animals may be allocated for use in sub-groups within the breeding programs that have very specific goals (e.g. productivity or fitness). Accordingly, even within the group of animals allocated for breeding purposes, there may be more specific allocation for use to achieve more specific and/or specialized breeding goals.
- anchor SNP and “anchor Marker” preferably refer to a SNP located in a region determined to be in genetic association/linkage with one or more traits.
- anchor markers are identified as those SNP/Markers that have a SEQ ID NO: identical to region number of the region containing them.
- SEQ ID NO: 1 provides the nucleic acid sequence for the anchor marker for region number 1.
- anchor markers are identified as those SNP/Markers that have a SEQ ID NO listed in column 1.
- SEQ ID NO: 173931 provides the nucleic acid sequence for the anchor marker for the first region, corresponding nearby genes are listed in column 2, and SEQ ID numbers of markers within the gene are listed in column 3.
- animal or “animals” preferably refer to dairy cattle.
- fit preferably refers to traits that include, but are not limited to: pregnancy rate (PR), daughter pregnancy rate (DPR), productive life (PL), somatic cell count (SCC) and somatic cell score (SCS).
- PR pregnancy rate
- DPR daughter pregnancy rate
- PL productive life
- SCC somatic cell count
- SCS somatic cell score
- PR and DPR refer to the percentage of non-pregnant animals that become pregnant during each 21-day period.
- PL is calculated as months in milk in each lactation, summed across all lactations until removal of the cow from the herd (by culling or death).
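The two trait definitions above reduce to simple arithmetic, sketched here under the definitions exactly as stated. The function names and sample numbers are hypothetical, and any adjustments that a national evaluation system might apply are not modeled.

```python
from typing import Sequence


def pregnancy_rate(non_pregnant_at_start: int, became_pregnant: int) -> float:
    """Percentage of non-pregnant animals that become pregnant in one 21-day period."""
    if non_pregnant_at_start <= 0:
        raise ValueError("need at least one eligible animal")
    return 100.0 * became_pregnant / non_pregnant_at_start


def productive_life(months_in_milk: Sequence[float]) -> float:
    """Months in milk summed across all lactations until removal from the herd."""
    return float(sum(months_in_milk))


# Illustrative values only.
print(pregnancy_rate(40, 10))
print(productive_life([10.0, 9.5, 11.0]))
```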
- growth refers to the measurement of various parameters associated with an increase in an animal's size and/or weight.
- linkage disequilibrium preferably means allelic association wherein A
- MAS marker-assisted selection
- MBV marker breeding value
- PMBV predicted marker breeding value
- natural breeding preferably refers to mating animals without human intervention in the fertilization process. That is, without the use of mechanical or technical methods such as artificial insemination or embryo transfer. The term does not refer to selection of the parent animals.
- neighbored markers or “neighboring SNPs” preferably refer to SNPs in close proximity to an anchor SNP, most preferably these terms refer to markers/SNPs located within about 70 kilobases of the anchor SNP.
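A short sketch of how neighboring SNPs could be selected around an anchor SNP, using the roughly 70 kilobase window given in the definition. The marker names and base-pair positions are hypothetical, and the sketch assumes each marker has a known chromosome and position.

```python
from typing import Dict, List, Tuple

# Marker name -> (chromosome, position in base pairs). Hypothetical layout.
MarkerMap = Dict[str, Tuple[str, int]]


def neighboring_snps(anchor: str, markers: MarkerMap,
                     window_bp: int = 70_000) -> List[str]:
    """Return markers on the anchor's chromosome within window_bp of it.

    The anchor itself is excluded; results are sorted by position.
    """
    anchor_chrom, anchor_pos = markers[anchor]
    hits = [
        (pos, name)
        for name, (chrom, pos) in markers.items()
        if name != anchor
        and chrom == anchor_chrom
        and abs(pos - anchor_pos) <= window_bp
    ]
    return [name for _, name in sorted(hits)]


# Illustrative positions only.
example_markers: MarkerMap = {
    "anchor": ("1", 1_000_000),
    "near": ("1", 1_050_000),
    "far": ("1", 1_200_000),
    "other_chromosome": ("2", 1_000_010),
}
print(neighboring_snps("anchor", example_markers))
```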
- net merit preferably refers to a composite index that includes several commonly measured traits weighted according to relative economic value in a typical production setting and expressed as lifetime economic worth per cow relative to an industry base.
- Examples of net merit indexes include, but are not limited to, $NM or TPI in the USA and LPI in Canada. Formulae for calculating these indices are well known in the art (e.g. $NM can be found on the USDA/AIPL website: www.aipl.arsusda.gov/reference.htm).
- milk production preferably refers to phenotypic traits related to the productivity of a dairy animal including milk fluid volume, fat percent, protein percent, fat yield, and protein yield.
- the term “predicted value” preferably refers to an estimate of an animal's breeding value or transmitting ability based on its genotype and pedigree.
- productivity and “production” preferably refer to yield traits that include, but are not limited to: total milk yield, milk fat percentage, milk fat yield, milk protein percentage, milk protein yield, total lifetime production, milking speed and lactation persistency.
- quantitative trait is used to denote a trait that is controlled by multiple (two or more, and often many) genes each of which contributes small to moderate effect on the trait. The observations on quantitative traits often follow a normal distribution.
- QTL quantitative trait locus
- reproductive material includes, but is not limited to semen, spermatozoa, ova, and zygote(s).
- single nucleotide polymorphism refers to a location in an animal's genome that is polymorphic within the population. That is, within the population some individual animals have one type of base at that position, while others have a different base. For example, a SNP might refer to a location in the genome where some animals have a "G" in their DNA sequence, while others have a "T".
- hybridization under stringent conditions and “stringent hybridization conditions” preferably mean conditions under which a "probe" will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 5-fold over background).
- Stringent conditions are target-sequence-dependent and will differ depending on the structure of the polynucleotide. By controlling the stringency of the hybridization and/or washing conditions, target sequences that are 100% complementary to the probe can be identified (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing).
- stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Stringency may also be adjusted with the addition of destabilizing agents such as formamide.
- Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.5X to 1X SSC at 55 to 60°C.
- Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.1X SSC at 60 to 65°C.
- the duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours.
- for DNA-DNA hybrids, the thermal melting point (T m ) can be approximated as T m = 81.5°C + 16.6 (log M) + 0.41 (% GC) - 0.61 (% form) - 500/L, where:
- M the molarity of monovalent cations
- % GC the percentage of guanine and cytosine nucleotides in the DNA
- % form is the percentage of formamide in the hybridization solution
- L the length of the hybrid in base pairs.
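The melting-point approximation defined by the terms above can be sketched in a few lines. This is a minimal illustration, assuming the widely cited form T m = 81.5 + 16.6 log10(M) + 0.41 (% GC) - 0.61 (% form) - 500/L for DNA-DNA hybrids and a base-10 logarithm; the function and parameter names are hypothetical.

```python
import math


def melting_temperature(molarity, gc_percent, formamide_percent, length_bp):
    """Approximate Tm in degrees C for a DNA-DNA hybrid.

    molarity: molarity of monovalent cations (M)
    gc_percent: percentage of G and C nucleotides (% GC)
    formamide_percent: percentage of formamide in the solution (% form)
    length_bp: length of the hybrid in base pairs (L)
    """
    if molarity <= 0 or length_bp <= 0:
        raise ValueError("molarity and length_bp must be positive")
    return (81.5
            + 16.6 * math.log10(molarity)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)


# Illustrative inputs only: 1 M cations, 50% GC, 50% formamide, 100 bp hybrid.
tm = melting_temperature(1.0, 50.0, 50.0, 100)
```

With these illustrative inputs the salt term vanishes (log10 of 1 is 0), so the result is driven by the GC, formamide, and length terms.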
- the T m is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe.
- T m is reduced by about 1°C for each 1% of mismatching; thus, T m , hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with 90% identity are sought, the T m can be decreased 10°C.
- stringent conditions are selected to be about 5°C lower than the T m for the specific sequence and its complement at a defined ionic strength and pH.
- highly stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4°C lower than the thermal melting point (T m ); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10°C lower than the thermal melting point (T m ); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20°C lower than the thermal melting point (T m ).
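The mismatch adjustment and the stringency bands described above can be illustrated as follows. This is a sketch only: the function names are hypothetical, the bands are treated as closed ranges of degrees below T m, and offsets that fall outside the listed ranges are simply reported as unclassified.

```python
def adjusted_tm(tm_perfect_match, percent_identity):
    """Lower Tm by about 1 degree C for each 1% of mismatching."""
    mismatch_percent = 100.0 - percent_identity
    return tm_perfect_match - mismatch_percent


def stringency_class(tm, wash_temp):
    """Classify a hybridization/wash temperature by its offset below Tm."""
    offset = tm - wash_temp
    if 1 <= offset <= 4:
        return "high"
    if 6 <= offset <= 10:
        return "moderate"
    if 11 <= offset <= 20:
        return "low"
    return "unclassified"  # outside the ranges listed in the text


# Seeking sequences with 90% identity lowers a 66.5 C Tm by 10 C.
target_tm = adjusted_tm(66.5, 90.0)
```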
- if the desired degree of mismatching results in a T m of less than 45°C (aqueous solution) or 32°C (formamide solution), the SSC concentration can be increased so that a higher temperature can be used.
- An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology— Hybridization with Nucleic Acid Probes, Part I, Chapter 2 (Elsevier, N. Y.); and Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing and Wiley-Interscience, New York). See also Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).
- MBV marker breeding value
- PMBV predicted marker breeding value
- whole-genome analysis preferably refers to the process of QTL mapping of the entire genome at high marker density (i.e. at least about one marker per cM) and detection of markers that are in population-wide linkage disequilibrium with QTL.
- WGS whole-genome selection
- MAS marker-assisted selection
- Various embodiments of the present invention provide methods for evaluating an animal's (especially a dairy animal's) genotype.
- the animal's genotype is evaluated at 10 or more positions (i.e. with respect to 10 or more genetic markers).
- aspects of these embodiments of the invention provide methods that comprise determining the animal's genomic sequence at 10 or more locations (loci) that contain single nucleotide polymorphisms (SNPs).
- Embodiments of the invention provide methods for evaluating an animal's genotype by determining which of two or more alleles for the SNP are present for each of 10 or more SNPs selected from the group consisting of the SNPs described in Tables 5 and 6 of the instant application.
- embodiments of the invention provide methods for evaluating an animal's genotype by determining which of two or more alleles for the SNP are present for each of 10 or more SNPs selected from the group consisting of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing of the instant application and/or SNPs located on the same chromosome and within approximately 70 kilobases of one or more of the SNP anchor markers described in Tables 3 and 4 and the Sequence Listing. That is, within approximately 70,000 base pairs of the anchor SNP marker, in either the 5' or 3' direction from an anchor SNP.
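The "same chromosome and within approximately 70 kilobases of an anchor" criterion can be expressed as a simple coordinate filter. The sketch below assumes each SNP is described by a chromosome label and a base-pair position; the `Snp` class, the function name, and the coordinates are hypothetical, and the 70 kb boundary is treated as inclusive.

```python
from dataclasses import dataclass

WINDOW_BP = 70_000  # approximately 70 kilobases, in either direction


@dataclass(frozen=True)
class Snp:
    name: str
    chromosome: str
    position: int  # base-pair coordinate on the chromosome


def snps_near_anchors(candidates, anchors, window_bp=WINDOW_BP):
    """Return candidates on the same chromosome as, and within window_bp of, any anchor."""
    selected = []
    for snp in candidates:
        for anchor in anchors:
            same_chromosome = snp.chromosome == anchor.chromosome
            if same_chromosome and abs(snp.position - anchor.position) <= window_bp:
                selected.append(snp)
                break  # one qualifying anchor is enough
    return selected


# Hypothetical coordinates for illustration only.
anchors = [Snp("anchor1", "6", 1_000_000)]
candidates = [
    Snp("a", "6", 1_050_000),  # 50 kb downstream: kept
    Snp("b", "6", 1_080_000),  # 80 kb downstream: dropped
    Snp("c", "7", 1_000_500),  # different chromosome: dropped
    Snp("d", "6", 930_000),    # 70 kb upstream: kept
]
nearby = [s.name for s in snps_near_anchors(candidates, anchors)]
```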
- Various embodiments of the present invention provide methods for evaluating an animal's (especially a dairy animal's) genotype.
- the animal's genotype is evaluated at 10 or more positions (i.e. with respect to 10 or more genetic markers).
- aspects of these embodiments of the invention provide methods that comprise determining the animal's genomic sequence at 10 or more locations (loci) that contain single nucleotide polymorphisms (SNPs).
- the invention provides methods for evaluating an animal's genotype by determining which of two or more alleles for the SNP are present for each of 10 or more SNPs selected from the group consisting of the SNPs described in Table 4 and the Sequence Listing of the instant application and/or SNPs located on the same chromosome and within a gene, wherein the gene has a portion of its sequence within approximately 70 kilobases of one or more of the SNP anchor markers described in Table 4 and the Sequence Listing. That is, within a gene residing within approximately 70,000 base pairs of the anchor SNP marker, in either the 5' or 3' direction from the anchor SNP.
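The gene-based criterion differs from the plain distance window: a SNP qualifies if it lies anywhere inside a gene that has some portion within about 70 kb of the anchor, even if the SNP itself is farther away. A minimal sketch under that reading, with hypothetical names and coordinates:

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    start: int  # first base of the gene
    end: int    # last base of the gene


def gene_overlaps_window(gene, anchor_chrom, anchor_pos, window_bp=70_000):
    """True if any portion of the gene lies within window_bp of the anchor."""
    if gene.chromosome != anchor_chrom:
        return False
    return gene.start <= anchor_pos + window_bp and gene.end >= anchor_pos - window_bp


def snps_in_qualifying_genes(snps, genes, anchor_chrom, anchor_pos, window_bp=70_000):
    """snps: iterable of (name, chromosome, position) tuples."""
    qualifying = [g for g in genes
                  if gene_overlaps_window(g, anchor_chrom, anchor_pos, window_bp)]
    return [name for name, chrom, pos in snps
            if any(g.chromosome == chrom and g.start <= pos <= g.end
                   for g in qualifying)]


# Hypothetical data: anchor on chromosome 6 at 1,000,000 bp.
genes = [Gene("G1", "6", 1_060_000, 1_200_000),   # starts inside the window
         Gene("G2", "6", 1_300_000, 1_350_000)]   # entirely outside the window
snps = [("s1", "6", 1_180_000),  # inside G1, 180 kb from the anchor: kept
        ("s2", "6", 1_320_000),  # inside G2: dropped
        ("s3", "6", 1_250_000)]  # in no gene: dropped
selected = snps_in_qualifying_genes(snps, genes, "6", 1_000_000)
```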
- the animal's genotype is evaluated to determine which allele is present for 25 or more SNPs selected from the group consisting of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers described in Tables 3, 4, 5, and 6 and the Sequence Listing. More preferably, the animal's genotype is determined for positions corresponding with 50, 100, 200, 500, or 1000 or more of the SNPs selected from this group.
- the animal's genotype is evaluated to determine which allele is present for 25 or more SNPs selected from the group consisting of the SNPs described in Table 4 and the Sequence Listing and SNPs on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kilobases of one or more of the SNP anchor markers described in Table 4 and the Sequence Listing. More preferably, the animal's genotype is determined for positions corresponding with 50, 100, 200, 500, or 1000 or more of the SNPs selected from this group.
- the animal's genotype is evaluated to determine which allele is present for 25 or more SNPs selected from the group of SNPs described in Tables 5 and 6. More preferably, the animal's genotype is determined for positions corresponding with 50, 100, 200, 500, or 1000 or more of the SNPs described in Tables 5 and 6.
- the animal's genotype is analyzed with respect to at least 10, 25, 50, 100, 200, 500, or more SNPs that have been shown to be associated with productivity and/or fitness (see Table 5 for a list of the SNPs associated with these traits).
- embodiments of the invention provide a method for genotyping 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more SNPs that have been determined to be significantly associated with productivity, as described in Tables 5 and 6.
- the animal's genotype is analyzed with respect to at least 10, 25, 50, 100, 200, 500, or more SNPs that have been shown to be associated with one or more traits selected from the group consisting of Milk Production, Somatic Cell Score, Daughter Pregnancy Rate, Productive Life, and Net Merit (see Tables 3 and 4 for a list of the SEQ ID NOs of the SNPs associated with these traits, including associated anchor SNPs).
- SNPs are preferably selected from the group consisting of the SNPs described in Tables 3 and 4 and the Sequence Listing and the respective neighboring SNPs.
- aspects of the present invention also provide for both whole-genome analysis and whole-genome selection (WGS) (that is, marker-assisted selection (MAS) on a genome-wide basis).
- WGS whole-genome selection
- MAS marker-assisted selection
- Various aspects of this embodiment of the invention provide for either whole-genome analysis or WGS wherein the markers analyzed for an animal span the animal's entire genome at moderate to high density. That is, the animal's genome is analyzed with markers that on average occur, at least, approximately every 1 to 5 centimorgans in the genome.
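Whether a marker panel reaches the stated density can be checked from its linkage-map positions. The sketch below assumes positions are given in centimorgans per chromosome and treats "approximately every 1 to 5 centimorgans" as a mean gap of at most 5 cM between adjacent markers; the function names, the threshold parameter, and the example positions are hypothetical.

```python
def mean_marker_gap_cm(positions_by_chromosome):
    """Mean distance (cM) between adjacent markers, pooled over chromosomes."""
    gaps = []
    for positions in positions_by_chromosome.values():
        ordered = sorted(positions)
        gaps.extend(b - a for a, b in zip(ordered, ordered[1:]))
    if not gaps:
        raise ValueError("need at least two markers on one chromosome")
    return sum(gaps) / len(gaps)


def meets_density(positions_by_chromosome, max_mean_gap_cm=5.0):
    """True if markers occur, on average, at least every max_mean_gap_cm."""
    return mean_marker_gap_cm(positions_by_chromosome) <= max_mean_gap_cm


# Hypothetical map positions in cM.
panel = {"1": [0.0, 2.0, 4.0, 9.0], "2": [1.0, 4.0]}
dense_enough = meets_density(panel)
```

A mean gap hides large local holes, so a practical check would probably also report the largest single gap.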
- the invention provides that of the markers used to carry out the whole-genome analysis or WGS, 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more are selected from the markers described in Tables 5 and 6.
- the markers may be associated with fitness or associated with productivity, or may be associated with both fitness and productivity.
- aspects of the present invention also provide for both whole-genome analysis and whole-genome selection (WGS) (i.e. marker-assisted selection (MAS) on a genome-wide basis).
- WGS whole-genome selection
- MAS marker-assisted selection
- Various aspects of this embodiment of the invention provide for either whole-genome analysis or WGS wherein the markers analyzed for an animal span the animal's entire genome at moderate to high density. That is, the animal's genome is analyzed with markers that on average occur, at least, approximately every 1 to 5 centimorgans in the genome.
- the invention provides that of the markers used to carry out the whole-genome analysis or WGS, 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more are selected from the group consisting of the markers described in Tables 3 and 4 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers in Table 3.
- the markers may be associated with fitness or associated with productivity, or may be associated with both fitness and productivity.
- the invention provides that of the markers used to carry out the whole-genome analysis or WGS, 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more are selected from the group consisting of the markers described in Tables 3 and 4 and the Sequence Listing and SNPs on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kilobases of one or more of the SNP anchor markers in Table 4.
- the markers may be associated with fitness or associated with productivity, or may be associated with both fitness and productivity.
- the genomic sequence at the SNP locus may be determined by any means compatible with the present invention. Suitable means are well known to those skilled in the art and include, but are not limited to, direct sequencing, sequencing by synthesis, primer extension, Matrix Assisted Laser Desorption/Ionization-Time Of Flight (MALDI-TOF) mass spectrometry, polymerase chain reaction-restriction fragment length polymorphism, microarray/multiplex array systems (e.g. those available from Affymetrix, Santa Clara, California), and allele-specific hybridization.
- Other embodiments of the invention provide methods for allocating animals for subsequent use (e.g. to be used as sires or dams or to be sold for meat or dairy purposes) according to their predicted value for productivity or fitness.
- Various aspects of this embodiment of the invention comprise determining at least one animal's genotype for at least one SNP selected from the group of SNPs consisting of the SNPs described in Table 3 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers (methods for determining animals' genotypes for one or more SNPs are described supra).
- the animal's allocation for use may be determined based on its genotype at one or more, 5 or more, 10 or more, 25 or more, 50 or more, 100 or more, 300 or more, or 500 or more, or 1000 or more of the SNPs in this group.
- the instant invention also provides embodiments where analysis of the genotypes of the SNPs described in Table 3 and the Sequence Listing is the only analysis done. Other embodiments provide methods where analysis of the SNPs disclosed herein is combined with any other desired type of genomic or phenotypic analysis (e.g. analysis of any genetic markers beyond those disclosed in the instant invention).
- the SNPs analyzed may be selected from those SNPs associated only with milk production (MP), only with somatic cell score (SCS), only with daughter pregnancy rate (DPR), only with productive life (PL), or only with net merit (NM).
- the analysis may be done for SNPs selected from any desired combination of MP, SCS, DPR, PL, and NM.
- SNPs associated with various traits include those listed in Table 3 and others that are located on the same chromosome and within about 70 kb of an anchor marker.
- Other embodiments of the invention provide methods for allocating animals for subsequent use (e.g. to be used as sires or dams or to be sold for meat or dairy purposes) according to their predicted value for productivity or fitness.
- Various aspects of this embodiment of the invention comprise determining at least one animal's genotype for at least one SNP selected from the group of SNPs consisting of the SNPs described in Table 4 and the Sequence Listing and SNPs on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kilobases of one or more of the SNP anchor markers (methods for determining animals' genotypes for one or more SNPs are described supra).
- the animal's allocation for use may be determined based on its genotype at one or more, 5 or more, 10 or more, 25 or more, 50 or more, 100 or more, 300 or more, or 500 or more, or 1000 or more of the SNPs in this group.
- the instant invention also provides embodiments where analysis of the genotypes of the SNPs described in Table 4 and the Sequence Listing is the only analysis done.
- Other embodiments provide methods where analysis of the SNPs disclosed herein is combined with any other desired type of genomic or phenotypic analysis (e.g. analysis of any genetic markers beyond those disclosed in the instant invention).
- the SNPs analyzed may be selected from those SNPs associated only with milk production (MP), only with somatic cell score (SCS), only with daughter pregnancy rate (DPR), only with productive life (PL), or only with net merit (NM).
- the analysis may be done for SNPs selected from any desired combination of MP, SCS, DPR, PL, and NM.
- SNPs associated with various traits include those SNPs listed in Table 4 and others that are located on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kb of an anchor marker.
- the animal's genetic sequence for the selected SNP(s) is evaluated to determine which allele of the SNP is present for at least one of the selected SNPs.
- the animal's allelic complement for all of the determined SNPs is evaluated.
- the animal is allocated for use based on its genotype for one or more of the SNP positions evaluated.
- the allocation is made taking into account the animal's genotype at each of the SNPs evaluated, but its allocation may be based on any subset or subsets of the SNPs evaluated.
- the allocation may be made based on any suitable criteria. For any SNP, a determination may be made as to whether one of the allele(s) is associated/correlated with desirable characteristics or associated with undesirable characteristics. This determination will often depend on breeding or herd management goals. Determination of which alleles are associated with desirable phenotypic characteristics can be made by any suitable means. Methods for determining these associations are well known in the art; moreover, aspects of the use of these methods are generally described in the EXAMPLES, below.
- Phenotypic traits that may be associated with the SNPs of the current invention include, but are not limited to: fitness traits and productivity traits (including, for example, MP, SCS, DPR, PL, and NM).
- allocation for use of the animal may entail either positive selection for the animals having the desired genotype(s) (e.g. the animals with the desired genotypes are selected for productivity traits), negative selection of animals having undesirable genotypes (e.g. animals with undesirable genotypes are culled from the herd), or any combination of these methods.
- animals identified as having SNP alleles associated with desirable phenotypes are allocated for use consistent with that phenotype (e.g. allocated for breeding based on phenotypes positively associated with fitness).
- animals that do not have SNP genotypes that are positively correlated with the desired phenotype (or possess SNP alleles that are negatively correlated with that phenotype) are not allocated for the same use as those with a positive correlation for the trait.
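One simple way to picture genotype-based allocation is to count, for each animal, how many copies of the favorable allele it carries across the evaluated SNPs and to allocate on a threshold. This is an illustrative sketch only, not the method of the EXAMPLES: the SNP names, favorable alleles, threshold, and allocation labels are all hypothetical, and real schemes would typically weight SNPs by estimated effect.

```python
def favorable_allele_count(genotypes, favorable):
    """Count favorable allele copies.

    genotypes: {snp_name: two-letter genotype such as "GT"}
    favorable: {snp_name: favorable allele such as "G"}
    SNPs missing from genotypes are skipped.
    """
    return sum(genotypes[snp].count(allele)
               for snp, allele in favorable.items()
               if snp in genotypes)


def allocate(animals, favorable, min_count):
    """Positive selection at or above min_count, other use below it."""
    return {name: ("breeding"
                   if favorable_allele_count(genotypes, favorable) >= min_count
                   else "other use")
            for name, genotypes in animals.items()}


# Hypothetical favorable alleles and herd genotypes.
favorable = {"snp1": "G", "snp2": "A"}
herd = {
    "cow_a": {"snp1": "GG", "snp2": "AT"},  # 3 favorable copies
    "cow_b": {"snp1": "TT", "snp2": "TT"},  # 0 favorable copies
}
allocation = allocate(herd, favorable, min_count=2)
```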
- Other embodiments of the invention provide methods for selecting potential parent animals (i.e., allocation for breeding) to improve fitness and/or productivity in potential offspring.
- Various aspects of this embodiment of the invention comprise determining at least one animal's genotype for at least one SNP selected from the group of SNPs consisting of the SNPs described in Tables 3, 5, and 6 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers.
- determination of whether and how an animal will be used as a potential parent animal may be based on its genotype at one or more, 10 or more, 25 or more, 50 or more, 100 or more, 300 or more, or 500 or more of the SNPs from that group.
- various aspects of these embodiments of the invention provide methods where the only analysis done is to genotype the animal for one or more of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing.
- Other aspects of these embodiments provide methods where analysis of one or more SNPs disclosed herein is combined with any other desired genomic or phenotypic analysis (e.g. analysis of any genetic markers beyond those disclosed in the instant invention).
- the SNP(s) analyzed may all be selected from those associated only with MP, only with SCS, only with DPR, only with PL, or only with NM. Conversely, the analysis may be done for SNPs selected from any desired combination of these or other traits.
- Other embodiments of the invention provide methods for selecting potential parent animals (i.e., allocation for breeding) to improve fitness and/or productivity in potential offspring.
- Various aspects of this embodiment of the invention comprise determining at least one animal's genotype for at least one SNP selected from the group of SNPs consisting of the SNPs described in Table 4 and the Sequence Listing and SNPs on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kilobases of one or more of the SNP anchor markers.
- determination of whether and how an animal will be used as a potential parent animal may be based on its genotype at one or more, 10 or more, 25 or more, 50 or more, 100 or more, 300 or more, or 500 or more of the SNPs from that group.
- various aspects of these embodiments of the invention provide methods where the only analysis done is to genotype the animal for one or more of the SNPs described in Table 4 and the Sequence Listing.
- Other aspects of these embodiments provide methods where analysis of one or more SNPs disclosed herein is combined with any other desired genomic or phenotypic analysis (e.g. analysis of any genetic markers beyond those disclosed in the instant invention).
- the SNP(s) analyzed may all be selected from those associated only with MP, only with SCS, only with DPR, only with PL, or only with NM. Conversely, the analysis may be done for SNPs selected from any desired combination of these or other traits.
- this information is evaluated to determine which allele of the SNP is present for at least one of the selected SNPs.
- the animal's allelic complement for all of the sequenced SNPs is evaluated.
- the animal's allelic complement is analyzed and evaluated to estimate the probability that the animal's progeny will express one or more phenotypic traits or to predict the animal's progeny's genetic merit or phenotypic value of traits of interest.
- the animal is allocated for breeding use based on its genotype for one or more of the SNP positions evaluated and the probability that it will pass the desired genotype(s)/allele(s) to its progeny.
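The probability that a parent passes a given allele to its progeny follows from its genotype at that SNP. The sketch below assumes a diploid animal, simple Mendelian segregation at a single biallelic locus, and genotypes written as two-letter strings; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def transmission_probability(genotype, allele):
    """Probability that a parent with this genotype transmits the allele."""
    return Fraction(genotype.count(allele), 2)


def offspring_genotype_probabilities(sire, dam):
    """Genotype probabilities for progeny of two genotyped parents."""
    # Each parent contributes one of its two alleles with equal probability.
    counts = Counter("".join(sorted(a + b)) for a, b in product(sire, dam))
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


# A heterozygous sire transmits "G" half the time.
p_g = transmission_probability("GT", "G")
# Two heterozygous parents give the familiar 1:2:1 ratio.
progeny = offspring_genotype_probabilities("GT", "GT")
```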
- the breeding allocation is made taking into account the animal's genotype at each of the SNPs evaluated.
- its breeding allocation may be based on any subset or subsets of the SNPs evaluated.
- the breeding allocation may be made based on any suitable criteria. For example, breeding allocation may be made so as to increase the probability of enhancing a single certain desirable characteristic in a population, in preference to other characteristics, (e.g. increased fitness, or even specifically lowering somatic cell score (SCS) as part of fitness); alternatively, the selection may be made so as to generally maximize overall production based on a combination of traits.
- the allocations chosen are dependent on the breeding goals.
- Sub-categories falling within fitness include, inter alia: daughter pregnancy rate (DPR), productive life (PL), and somatic cell score.
- Sub-categories falling within productivity include, inter alia: milk fat percentage, milk fat yield, total milk yield, milk protein percentage, and total milk protein.
- the animals used to produce the progeny are those that have been allocated for breeding according to any of the embodiments of the current invention. Those using the animals to produce progeny may perform the necessary analysis or, alternatively, those producing the progeny may obtain animals that have been analyzed by another.
- the progeny may be produced by any appropriate means, including, but not limited to using: (i) natural breeding, (ii) artificial insemination, (iii) in vitro fertilization (IVF) or (iv) collecting semen/spermatozoa and/or at least one ovum from the animal and contacting it, respectively with ova/ovum or semen/spermatozoa from a second animal to produce a conceptus by any means.
- the progeny are produced by a process comprising natural breeding.
- the progeny are produced through a process comprising the use of standard artificial insemination (AI), in vitro fertilization, multiple ovulation embryo transfer (MOET), or any combination thereof.
- AI artificial insemination
- MOET multiple ovulation embryo transfer
- Other embodiments of the invention provide for methods that comprise allocating an animal for breeding purposes and collecting/isolating genetic material from that animal, wherein genetic material includes, but is not limited to: semen, spermatozoa, ovum, zygotes, blood, tissue, serum, DNA, and RNA.
- the various embodiments of the instant invention provide for databases comprising all or a portion of the sequences corresponding to at least 10 SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing.
- the databases comprise sequences for 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more, or substantially all of the SNPs described in Tables 3, 5, and 6 and the Sequence Listing.
- embodiments of the invention provide methods wherein one or more of the SNP sequence databases described herein are accessed by one or more computer-executable programs. Such methods include, but are not limited to, use of the databases by programs to analyze for an association between the SNP and a phenotypic trait, or other user-defined trait (e.g. traits measured using one or more metrics such as gene expression levels, protein expression levels, or chemical profiles), and programs used to allocate animals for breeding or market.
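A SNP sequence database accessed by a program could take many forms. As one hedged illustration, the sketch below stores a few records in an in-memory SQLite database using Python's standard `sqlite3` module and retrieves the SNPs recorded for a trait. The table layout, column names, SNP names, and sequences are hypothetical and are not taken from the Tables or the Sequence Listing.

```python
import sqlite3


def build_snp_db(records):
    """Create an in-memory database of (name, chromosome, position, sequence, trait)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snp ("
        "name TEXT PRIMARY KEY, chromosome TEXT, position INTEGER, "
        "flank_sequence TEXT, trait TEXT)"
    )
    conn.executemany("INSERT INTO snp VALUES (?, ?, ?, ?, ?)", records)
    conn.commit()
    return conn


def snps_for_trait(conn, trait):
    """Names of SNPs recorded for a trait code such as 'MP' or 'SCS'."""
    rows = conn.execute(
        "SELECT name FROM snp WHERE trait = ? ORDER BY name", (trait,)
    )
    return [row[0] for row in rows]


# Hypothetical records; "[G/T]" marks the polymorphic base.
records = [
    ("snp_001", "6", 1_000_000, "ACGT[G/T]ACGT", "MP"),
    ("snp_002", "6", 1_050_000, "TTGA[A/C]GGCA", "SCS"),
    ("snp_003", "14", 2_500_000, "CCAT[C/T]GGTA", "MP"),
]
conn = build_snp_db(records)
mp_snps = snps_for_trait(conn, "MP")
```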
- Other embodiments of the invention provide methods comprising collecting genetic material from an animal that has been allocated for breeding, wherein the animal has been allocated for breeding by any of the methods disclosed as part of the instant invention.
- kits or other diagnostic devices for determining which allele of a SNP is present in a sample; wherein the SNP(s) are selected from the group of SNPs described in Tables 5 and 6.
- the kit or device provides reagents/instruments to facilitate a determination as to whether nucleic acid corresponding to the SNP is present. Such a kit or device may further facilitate a determination as to which allele of the SNP is present.
- the kit or device comprises at least one nucleic acid oligonucleotide suitable for DNA amplification (e.g. through polymerase chain reaction).
- the kit or device comprises a purified nucleic acid fragment capable of specifically hybridizing, under stringent conditions, with at least one allele of at least one SNP described in Tables 5 and 6.
- kits or other diagnostic devices for determining which allele of one or more SNP(s) is/are present in a sample; wherein the SNP(s) are selected from the group of SNPs consisting of the SNPs described in Table 3 and the sequence listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers.
- the kit or device provides reagents/instruments to facilitate a determination as to whether nucleic acid corresponding to the SNP is present. Such a kit or device may further facilitate a determination as to which allele of the SNP is present.
- the kit or device comprises at least one nucleic acid oligonucleotide suitable for DNA amplification (e.g. through polymerase chain reaction).
- the kit or device comprises a purified nucleic acid fragment capable of specifically hybridizing, under stringent conditions, with at least one allele of at least one of the SNPs described above.
- the kit or device comprises at least one nucleic acid array (e.g. DNA micro-arrays) capable of determining which allele of one or more of the SNPs are present in a sample; where the SNPs are selected from the group of SNPs consisting of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers.
- Preferred aspects of this embodiment of the invention provide DNA micro-arrays capable of simultaneously determining which allele is present in a sample for 10 or more SNPs.
- the DNA micro-array is capable of determining which SNP allele is present in a sample for 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more SNPs.
- Methods for making such arrays are known to those skilled in the art and such arrays are commercially available (e.g. from Affymetrix, Santa Clara, California).
- Other embodiments of the instant invention provide methods of identifying genetic markers associated with MP, SCS, DPR, PL, and/or NM (and/or any other fitness or productivity trait) that are in allelic association with one or more of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing.
- the method comprises: (a) identifying a marker, Bi, that is suspected of being in allelic association with at least one marker, Ai, wherein A
- Even more preferred aspects of this embodiment of the invention provide methods for identifying genetic markers in linkage disequilibrium with one or more SNPs selected from the group of SNPs described in Tables 3, 4, 5, and 6.
- the value of r² is greater than 0.5, greater than 0.7, or greater than 0.9.
- the method comprises finding polymorphisms within about 70 kb of one of the anchor SNPs listed in Table 3 or finding a polymorphism in a gene, wherein the gene has a portion of its sequence within about 70 kb of one of the anchor SNPs listed in Table 4.
- Additional embodiments of the invention provide for genetic markers for fitness and/or productivity that are in allelic association with one or more of the SNPs described in Tables 3, 4, 5, and 6.
- Markers provided as part of this embodiment of the invention may be identified by any suitable means known to those of ordinary skill in the art.
- a marker falls within this embodiment of the invention if it is determined to be in allelic association with one or more of the SNPs described in Tables 3, 4, 5, and 6 as defined by Equation 1, supra, where r² is greater than 0.2, greater than 0.5, greater than 0.7, or greater than 0.9.
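Equation 1 is not reproduced in this part of the text, so the sketch below assumes the conventional definition of the squared allelic correlation between two biallelic markers, r² = D² / (p_A (1 - p_A) p_B (1 - p_B)) with D = p_AB - p_A p_B, computed from phased haplotypes. The function name and the example haplotypes are hypothetical.

```python
def r_squared(haplotypes, allele_a, allele_b):
    """Squared allelic correlation between two markers.

    haplotypes: list of (allele at marker A, allele at marker B) pairs,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(1 for a, _ in haplotypes if a == allele_a) / n
    p_b = sum(1 for _, b in haplotypes if b == allele_b) / n
    p_ab = sum(1 for a, b in haplotypes if a == allele_a and b == allele_b) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    return d * d / denominator


# Complete association: "G" at marker A always travels with "C" at marker B.
linked = [("G", "C")] * 5 + [("T", "A")] * 5
# No association: every allele combination is equally frequent.
unlinked = [("G", "C"), ("G", "A"), ("T", "C"), ("T", "A")]

r2_linked = r_squared(linked, "G", "C")
r2_unlinked = r_squared(unlinked, "G", "C")
```

A marker would then be compared against the chosen threshold (for example 0.2, 0.5, 0.7, or 0.9).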
- the markers are in linkage disequilibrium with one or more of the SNPs described in Table 3.
- Additional embodiments of the invention provide for genetic markers for fitness and/or productivity that are in allelic association with one or more of the SNPs described in Tables 5 and 6.
- Markers provided as part of this embodiment of the invention may be identified by any suitable means known to those of ordinary skill in the art.
- a marker falls within this embodiment of the invention if it is determined to be in allelic association with one or more of the SNPs described in Tables 5 and 6 as defined by Equation 1, supra, where r² is greater than 0.2, greater than 0.5, greater than 0.7, or greater than 0.9.
- the markers are in linkage disequilibrium with one or more of the SNPs described in Tables 5 and 6.
- Genetic markers that are in allelic association with any of the SNPs described in the Tables may be identified by any suitable means known to those skilled in the art. For example, a genomic library may be screened using a probe specific for any of the sequences of the SNPs described in the Tables. In this way clones comprising at least a portion of that sequence can be identified and then up to 300 kilobases of 3' and/or 5' flanking chromosomal sequence can be determined. Preferably up to about 70 kilobases of 3' and/or 5' flanking chromosomal sequences are evaluated. By this means, genetic markers in allelic association with the SNPs described in the Tables will be identified.
- the chromosomal location of a SNP associated with a particular phenotypic variation can be determined by means well known to those skilled in the art. Once the chromosomal location is determined, genes suspected to be involved with determination of the phenotype can be analyzed. Such genes may be identified by sequencing adjacent portions of the chromosome or by comparison with analogous sections of the human genetic map (or known genetic maps for other species).
- Other embodiments of the invention provide methods for identifying causal mutations that underlie one or more quantitative trait loci (QTL).
- QTL quantitative trait loci
- Various aspects of this embodiment of the invention provide for the identification of SNPs that are in allelic association with one or more of the SNPs described in Table 3. Once these SNPs are identified, it is within the ability of skilled artisans to identify mutations located proximal to such SNP(s). Further, one skilled in the art can identify genes located proximate to the identified SNP(s) and evaluate these genes to select those likely to contain the causal mutation. Once identified, these genes and the surrounding sequence can be analyzed for the presence of mutations, in order to identify the causal mutation.
- Other embodiments of the invention provide methods for identifying causal mutations that underlie one or more quantitative trait loci (QTL).
- Various aspects of this embodiment of the invention provide for the identification of QTL that are in allelic association with one or more of the SNPs described in Tables 5 and 6. Once these SNPs are identified, it is within the ability of skilled artisans to identify mutations located proximal to such SNP(s). Further, one skilled in the art can identify genes located proximate to the identified SNP(s) and evaluate these genes to select those likely to contain the causal mutation. Once identified, these genes and the surrounding sequence can be analyzed for the presence of mutations, in order to identify the causal mutation.
- Still other embodiments of the invention provide methods to modulate the expression and/or concentration of a gene product from the genes described in Table 4 with the intent of manipulating the performance and/or product quality of an animal or animal product. This can be performed through any of a number of technologies designed to positively or negatively influence gene expression, including but not limited to transgenesis, RNA interference, and anti-sense protocols.
- the new linkage mapping tools build on the basic mapping principles programmed in CRIMAP to improve efficiency through partitioning of large pedigrees, automation of chromosomal assignment and two-point linkage analysis, and merging of sub-maps into complete chromosomes.
- the resulting whole-genome discovery map included 6,966 markers and a map length of 3,290 cM for an average map density of 2.18 markers/cM. The average gap between markers was 0.47 cM and the largest gap was 7.8 cM. This map provided the basis for whole-genome analysis and fine-mapping of QTL contributing to variation in productivity and fitness in dairy cattle.
- Systems for discovery and mapping populations can take many forms.
- the most effective strategies for determining population-wide marker/QTL associations include a large and genetically diverse sample of individuals with phenotypic measurements of interest collected in a design that allows accounting for non-genetic effects and includes information regarding the pedigree of the individuals measured.
- an outbred population following the grand-daughter design (Weller et al., 1990) was used to discover and map QTL: the population, from the Holstein breed, had 529 sires each with an average of 6.1 genotyped sons, and each son had an average of 4216 daughters with milk data.
- DNA samples were collected from approximately 3,200 Holstein bulls and about 350 bulls from other dairy breeds; representing multiple sire and grandsire families.
- Dairy traits under evaluation include fitness and productivity traits such as milk yield (“MILK”) (pounds), fat yield (“FAT”) (pounds), fat percentage (“FATPCT”) (percent), productive life (“PL”) (months), somatic cell score (“SCS”) (Log), daughter pregnancy rate (“DPR”) (percent), protein yield (“PROT”) (pounds), protein percentage (“PROTPCT”) (percent), and net merit (“NM”) (dollar).
- MILK milk yield
- FAT fat yield
- FATPCT fat percentage
- PL productive life
- SCS somatic cell score
- DPR daughter pregnancy rate
- PROT protein yield
- PROTPCT protein percentage
- net merits of these traits defined as PTA (predicted transmitting ability) were estimated using phenotypes of all relatives.
- Protein yield and fat yield together account for >50% of NM, and the value of milk yield, fat content, and protein content is accounted for via protein yield and fat yield.
- $y_i = \mu + \beta_1 (SPTA)_i + PTAd_i$ [2]
- $y_i$ ($y_{ij}$) is the PTA of the $i$th bull (PTA of the $j$th son of the $i$th sire); $S_i$ is the effect of the $i$th sire; $(SPTA)_i$ is the sire's PTA of the $i$th bull of the whole sample; $\mu$ is the population mean; $PTAd_i$ ($PTAd_{ij}$) is the residual bull PTA.
- Equation [1] is referred to as the sire model, in which sires were fitted as fixed factors.
- a considerably large number of sires only have a very small number of progeny tested sons (e.g., some have one son), and it is clearly undesirable to fit sires as fixed factors in these cases.
- the USA Holstein herds have been making steady and rapid genetic progress in traditional dairy traits in the last several decades, implying that the sire's effect can be partially accounted for by fitting the birth year of a bull.
- sires were replaced with son's birth year in Eq. [1].
- Eq. [2] is referred to as the SPTA model, in which sire's PTA are fitted as a covariate. Residual PTA ($PTAd_i$ or $PTAd_{ij}$) were estimated using linear regression.
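As a minimal sketch of the SPTA model, assuming the regression in Eq. [2] is an ordinary least-squares fit of each bull's PTA on its sire's PTA, the residual PTA could be computed as shown below. The function name `residual_pta` and the example values are hypothetical and are not taken from the study data.

```python
def residual_pta(bull_pta, sire_pta):
    """Fit y_i = mu + b1 * SPTA_i + PTAd_i by least squares.

    Returns (mu, b1, residuals), where residuals are the PTAd_i values.
    """
    n = len(bull_pta)
    mean_x = sum(sire_pta) / n
    mean_y = sum(bull_pta) / n
    # Closed-form simple linear regression
    sxx = sum((x - mean_x) ** 2 for x in sire_pta)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sire_pta, bull_pta))
    b1 = sxy / sxx
    mu = mean_y - b1 * mean_x
    residuals = [y - (mu + b1 * x) for x, y in zip(sire_pta, bull_pta)]
    return mu, b1, residuals


# Illustrative (made-up) PTA values for four bulls and their sires
bull_values = [120.0, 340.0, 510.0, 760.0]
sire_values = [100.0, 300.0, 500.0, 700.0]
mu, b1, ptad = residual_pta(bull_values, sire_values)
print(mu, b1, ptad)
```

The residuals returned here play the role of the pre-adjusted phenotypes used in the later association analysis.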
- linkage disequilibrium (LD) mapping was performed in the aforementioned discovery population using statistical analyses based on probabilities of individual ordered genotypes estimated conditional on observed marker genotypes.
- the first step was to estimate sire's ordered genotype probabilities at all linked markers conditional on grandsire's and offspring marker genotype data.
- the exact calculation quickly becomes computationally infeasible as the size and complexity of the pedigree and number of linked markers increase. For example, there are in total $2^k$ ordered genotypes for all linked loci when a sire has $k$ linked heterozygous loci.
- a stepwise procedure developed based on a likelihood ratio test was used for estimating probabilities of sire's ordered genotypes at all linked markers.
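The growth in the number of ordered genotypes can be illustrated with a short enumeration. This is only a sketch of the counting argument, not the stepwise likelihood-ratio procedure used in the study; the function name and the example alleles are hypothetical.

```python
from itertools import product


def ordered_genotypes(het_loci):
    """Enumerate all ordered genotypes (haplotype pairs) for heterozygous loci.

    het_loci is a list of (allele_a, allele_b) tuples, one per linked
    heterozygous locus. Each result is (first_haplotype, second_haplotype).
    """
    for choice in product((0, 1), repeat=len(het_loci)):
        first = tuple(locus[c] for locus, c in zip(het_loci, choice))
        second = tuple(locus[1 - c] for locus, c in zip(het_loci, choice))
        yield first, second


loci = [("A", "G"), ("C", "T"), ("A", "T")]
phases = list(ordered_genotypes(loci))
print(len(phases))  # 8, i.e. 2**3
```

With 30 linked heterozygous loci the same enumeration would already exceed a billion ordered genotypes, which is why an approximate procedure is needed.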
- the probabilities of ordered genotypes at loci of interest were estimated conditional on flanking informative markers as follows:
- $P(H_{sa}H_{db} \mid M)$ is the probability of the sire having a pair of haplotypes (or ordered genotype) $H_{sa}H_{db}$ at all linked loci conditional on the observed genotype data $M$
- $P(H_{sik}H_{dik} \mid H_{sa}H_{db}, M)$ is the probability of a son having ordered genotype $H_{sik}H_{dik}$ at loci of interest conditional on the sire's ordered genotype $H_{sa}H_{db}$ at all linked loci and the observed genotype data $M$.
- haplotypes of neighboring (and/or non-neighboring) markers across each chromosome were defined by setting the maximum length of a chromosomal interval and minimum and maximum number of markers to be included.
- haplotype evaluation: the association between pre-adjusted trait phenotypes and haplotypes (or pairs of haplotypes, alternatively termed ordered genotypes) was evaluated via a regression approach with the following models:
- $PTAd_k$ is the preadjusted PTA of the $k$th bull as defined in Eq. [1] under the sire model and can be replaced with $PTAd_i$ as defined in Eq. [2] under the SPTA model, and $e_k$ is the residual;
- $P(H_{si,k})$ and $P(H_{di,k})$ are the probabilities of the paternal and maternal haplotypes of individual $k$ being haplotype $i$;
- the mapping of SNPs to bovine genomic sequence assembly was done by comparing SNP sequences with the assembled bovine genomic sequences obtained from ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Btaurus/fasta/Btau20060815-freeze.
- the SNPs' sequences were constructed by concatenating the left flank sequence, one of the SNP alleles (1st character of ALLELEs), and right flank sequence.
- the sequences were blasted (linux megablast 2.2.15 was used for sequence comparison on a computer farm) against the bovine sequence assembly.
- the matches were further filtered to remove matches that have match length < 0.90 * seq_length. Furthermore, those SNPs that match to ambiguous locations and unknown chromosomes are ignored in this study.
- P/G is calculated as the physical distance over the genetic distance for each chromosome (chromosome-wide P/G), and the average of physical distance over genetic distance across all chromosomes yields the genome-wide P/G. Based on the data presented in Table 2, a genetic distance of 1 cM is assumed to be equivalent to about 702660 bp in physical map distance.
- Public SNPs that are in close proximity to the SNPs that we have identified to be significantly associated with traits, including Daughter Pregnancy Rate, Milk Yield and Composition, Net Merit, Productive Life, and Somatic Cell Score, were identified by their physical map location. If an Anchor Marker is located at position z, those public SNPs that are located between z-70266 and z+70266 are deemed to be in close proximity to the Anchor Marker. These markers, which are closely associated with the anchor markers, are called "neighboring markers" or "neighboring SNPs".
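The sequence construction and the match-length filter described above can be sketched as follows. The helper names, the `"C/T"` allele string format, and the example flanks are assumptions made for illustration; the actual megablast output parsing is not shown.

```python
def build_snp_sequence(left_flank, alleles, right_flank):
    """Concatenate the left flank, the first allele character, and the right flank."""
    return left_flank + alleles[0] + right_flank


def keep_match(match_length, seq_length, min_fraction=0.90):
    """Keep a match only if it covers at least min_fraction of the query sequence."""
    return match_length >= min_fraction * seq_length


# Hypothetical SNP record with alleles written as "C/T"
query = build_snp_sequence("ACGTTGCA", "C/T", "GGATCCAA")
print(query)                       # ACGTTGCACGGATCCAA
print(keep_match(16, len(query)))  # True: 16 >= 0.9 * 17
print(keep_match(15, len(query)))  # False: 15 < 0.9 * 17
```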
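The proximity rule can be expressed as a simple window test. Under the stated conversion of about 702660 bp per cM, the 70266 bp half-window corresponds to 0.1 cM. The sketch below assumes that all positions are on the same chromosome and that the window bounds are inclusive; the SNP identifiers are hypothetical.

```python
WINDOW_BP = 70266  # half-window from the text; 0.1 cM at 702660 bp per cM


def neighboring_snps(anchor_pos, public_snps, window=WINDOW_BP):
    """Return IDs of public SNPs located between anchor_pos - window and anchor_pos + window.

    public_snps is an iterable of (snp_id, position) pairs on the anchor's chromosome.
    """
    low, high = anchor_pos - window, anchor_pos + window
    return [snp_id for snp_id, pos in public_snps if low <= pos <= high]


candidates = [
    ("snp_a", 950_000),
    ("snp_b", 1_070_266),
    ("snp_c", 1_070_267),
    ("snp_d", 929_733),
]
print(neighboring_snps(1_000_000, candidates))  # ['snp_a', 'snp_b']
```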
- Bovine genes were retrieved from the file "Bos_taurus.Btau_3.1.43.pep.known.fa.gz" from the Ensembl FTP site at "ftp.ensembl.org/pub/current_bos_taurus/data/fasta/pep/". In total, 18654 entries were included in the file, with protein sizes ranging from 8 to 23992.
- the Ensembl genes are annotated by the Ensembl automatic analysis pipeline using either a GeneWise model from a species-specific or vertebrate protein, a set of aligned species-specific cDNAs followed by GenomeWise for ORF prediction or from GENSCAN exons supported by protein, cDNA and EST evidence. GeneWise models are further combined with available aligned cDNAs to annotate UTRs
- the genome mapping information can be obtained from the header line ">ENSP00000328693 pep:novel chromosome:NCBI35:1:904515:910768:1 gene:ENSG00000158815:transcript:ENST00000328693" as from 904,515 bp to 910,768 bp on chromosome 1 of the bovine genome (build NCBI35, the same as Btau_3.1).
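A header line of this form can be split into its mapping fields with a few string operations. This is an illustrative sketch only: the function name is hypothetical, and it assumes the chromosome token follows the order build, chromosome, start, end, strand.

```python
def parse_header(header):
    """Parse an Ensembl-style FASTA header into a dictionary of fields."""
    tokens = header.lstrip(">").split()
    record = {"protein": tokens[0]}
    for token in tokens[1:]:
        parts = token.split(":")
        if parts[0] == "chromosome":
            record["build"] = parts[1]
            record["chromosome"] = parts[2]
            record["start"] = int(parts[3])
            record["end"] = int(parts[4])
            record["strand"] = int(parts[5])
        else:
            # Remaining tokens are key:value pairs, possibly chained together
            record.update(zip(parts[0::2], parts[1::2]))
    return record


line = (">ENSP00000328693 pep:novel chromosome:NCBI35:1:904515:910768:1 "
        "gene:ENSG00000158815:transcript:ENST00000328693")
info = parse_header(line)
print(info["chromosome"], info["start"], info["end"], info["transcript"])
```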
- the stable transcript identifier is used as gene name in this application.
- anchor markers the SNPs that we have identified to be significantly associated with traits, including Daughter Pregnancy Rate, Milk Yield and Composition, Net Merit, Productive Life, and Somatic Cell Score
- these genes, which are closely associated with the anchor markers are called “neighboring genes”. Markers within these genes, such as for example SNPs, which are closely associated with the anchor markers, are called “neighboring markers” or “neighboring SNPs”.
- one or more of the markers with significant association to that trait can be used in selection of breeding animals.
- use of animals possessing a marker allele (or a haplotype of multiple marker alleles) in population-wide LD with a favorable QTL allele will increase the breeding value of animals used in breeding, increase the frequency of that QTL allele in the population over time and thereby increase the average genetic merit of the population for that trait. This increased genetic merit can be disseminated to commercial populations for full realization of value.
- a progeny-testing scheme could greatly improve its rate of genetic progress or graduation success rate via the use of markers for screening juvenile bulls.
- a progeny testing program would use pedigree information and performance of relatives to select juvenile bulls as candidates for entry into the program with an accuracy of approx 0.5.
- with marker information, young bulls could be screened and selected with much higher accuracy.
- DNA samples from potential bull mothers and their male offspring could be screened with a genome-wide set of markers in linkage disequilibrium with QTL, and the bull-mother candidates with the best marker profile could be contracted for matings to specific bulls.
- MBV marker breeding values
- a centralized or dispersed genetic nucleus (GN) population of cattle could be maintained to produce juvenile bulls for use in progeny testing or direct sale on the basis of MBVs.
- a GN herd of 1000 cows could be expected to produce roughly 3000 offspring per year, assuming the top 10-15% of females were used as ET donors in a multiple-ovulation and embryo-transfer (MOET) scheme.
- MOET multiple-ovulation and embryo-transfer
- markers could change the effectiveness of MOET schemes and in vitro embryo production.
- MOET nucleus schemes have proven to be promising from the standpoint of extra genetic gain, but the costs of operating a nucleus herd together with the limited information on juvenile animals has limited widespread adoption.
- the first step in using a SNP for estimation of breeding value and selection in the GN is collection of DNA from all offspring that will be candidates for selection as breeders in the GN or as breeders in other commercial populations (in the present example, the 3,000 offspring produced in the GN each year).
- One method is to capture shortly after birth a small bit of ear tissue, hair sample, or blood from each calf into a labeled (bar-coded) tube. The DNA extracted from this tissue can be used to assay an essentially unlimited number of SNP markers and the results can be included in selection decisions before the animal reaches breeding age.
- $BV_{A1A1}$, $BV_{A1A2}$ and $BV_{A2A2}$ are the (marker) breeding values for animals with marker genotypes A1A1, A1A2 and A2A2, respectively.
- the total trait breeding value for an animal is the sum of breeding values for each marker (or haplotype) considered and the residual polygenic breeding value:
- $EBV_{ij}$ is the Estimated Trait Breeding Value for the $i$th animal
- n is the total number of markers (haplotypes) under consideration
- $\hat{U}_i$ is the polygenic breeding value for the $i$th animal after fitting the marker genotype(s).
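The summation described above, in which the trait breeding value is the sum of the marker breeding values plus the residual polygenic breeding value, can be sketched as follows. The marker names, the per-genotype breeding values, and the polygenic value are hypothetical numbers chosen only to illustrate the arithmetic.

```python
def estimated_breeding_value(genotypes, marker_bvs, polygenic_bv):
    """Sum the marker breeding values over all markers and add the polygenic value.

    genotypes maps marker name -> genotype label (e.g. "A1A2");
    marker_bvs maps marker name -> {genotype label: breeding value}.
    """
    marker_total = sum(marker_bvs[marker][gt] for marker, gt in genotypes.items())
    return marker_total + polygenic_bv


# Hypothetical breeding values for two markers
marker_bvs = {
    "marker_1": {"A1A1": 40.0, "A1A2": 0.0, "A2A2": -40.0},
    "marker_2": {"A1A1": 10.0, "A1A2": 5.0, "A2A2": 0.0},
}
animal = {"marker_1": "A1A1", "marker_2": "A1A2"}
print(estimated_breeding_value(animal, marker_bvs, polygenic_bv=12.5))  # 57.5
```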
- a nucleic acid sequence contains a SNP of the present invention if it comprises at least 20 consecutive nucleotides that include and/or are adjacent to a polymorphism described in Tables 3, 4, 5, and 6 and the Sequence Listing.
- a SNP of the present invention may be identified by a shorter stretch of consecutive nucleotides which include or are adjacent to a polymorphism which is described in Tables 3, 4, 5, and 6 and the Sequence Listing in instances where the shorter sequence of consecutive nucleotides is unique in the bovine genome.
- a SNP site is usually characterized by the consensus sequence in which the polymorphic site is contained, the position of the polymorphic site, and the various alleles at the polymorphic site.
- Consensus sequence means DNA sequence constructed as the consensus at each nucleotide position of a cluster of aligned sequences.
- Consensus sequence can be based on either strand of DNA at the locus, and states the nucleotide base of either one of each SNP allele in the locus and the nucleotide bases of all Indels in the locus, or both SNP alleles using degenerate code (IUPAC code: M for A or C; R for A or G; W for A or T; S for C or G; Y for C or T; K for G or T; V for A or C or G; H for A or C or T; D for A or G or T; B for C or G or T; N for A or C or G or T). Additional codes that we use include I for "-" or A; O for "-" or C; E for "-" or G; L for "-" or T; where "-" means a deletion.
- IUPAC code M for A or C; R for A or G; W for A or T; S for C or G; Y for C or T; K for G or T; V for A or C or G
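The degenerate codes listed above lend themselves to a lookup table. The sketch below locates the polymorphic sites in a consensus sequence and reports the alleles each code stands for; the function name is hypothetical, and the example sequence is the ss38333809 sequence quoted in this document.

```python
# Standard IUPAC ambiguity codes as listed in the text
IUPAC_CODES = {
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}
# Additional indel codes described in the text; "-" denotes a deletion
INDEL_CODES = {"I": "-A", "O": "-C", "E": "-G", "L": "-T"}


def polymorphic_sites(consensus):
    """Return (1-based position, alleles) for every degenerate symbol in the sequence."""
    codes = {**IUPAC_CODES, **INDEL_CODES}
    sites = []
    for position, symbol in enumerate(consensus.upper(), start=1):
        if symbol in codes:
            sites.append((position, codes[symbol]))
    return sites


example = "tcttacacatcaggagatagytccgaggtggatttctacaa"
print(polymorphic_sites(example))  # [(21, 'CT')]
```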
- Such SNPs have a nucleic acid sequence having at least 90% sequence identity, more preferably at least 95% or even more preferably for some alleles at least 98% and in many cases at least 99% sequence identity, to the sequence of the same number of nucleotides in either strand of a segment of animal DNA which includes or is adjacent to the polymorphism.
- the nucleotide sequence of one strand of such a segment of animal DNA may be found in a sequence in the group consisting of SEQ ID NO:1 through SEQ ID NO:262,149. It is understood by the very nature of polymorphisms that for at least some alleles there will be no identity at the polymorphic site itself. Thus, sequence identity can be determined for sequence that is exclusive of the polymorphism sequence.
- the polymorphisms in each locus are described in the tables and the sequence listing. Shown below are examples of public bovine SNPs that match each other:
- ss38333809 tcttacacatcaggagatagytccgaggtggatttctacaa
- ss38333809 is SEQ ID NO:262146
- ss38334335 is SEQ ID NO:262149
- Quantifying production traits can be accomplished by measuring milk of a cow and milk composition at each milking, or in certain time intervals only.
- USDA yield evaluation: the milk production data are collected by Dairy Herd Improvement Associations (DHIA) using ICAR-approved methods.
- Genetic evaluation includes all cows with the known sire and the first calving in 1960 and later and pedigree from birth year 1950 on. Lactations shorter than 305 days are extended to 305 days. All records are preadjusted for effects of age at calving, month of calving, times milked per day, previous days open, and heterogeneous variance. Genetic evaluation is conducted using the single-trait BLUP repeatability model.
- the model includes fixed effects of management group (herd x year x season plus register status), parity x age, and inbreeding, and random effects of permanent environment and herd by sire interaction.
- PTAs are estimated and published four times a year (February, May, August, and November). PTAs are calculated relative to a five year stepwise base i.e., as a difference from the average of all cows born in the current year, minus five (5) years. Bull PTAs are published estimating daughter performance for bulls having at least 10 daughters with valid lactation records.
- CE calving ease
- SB stillbirths
- DPR daughter pregnancy rate
- CE is scored by the owner on a scale of 1 to 5, 1 meaning no problems encountered or unobserved birth and 5 meaning extreme difficulty.
- the CE PTAs for sires are expressed as percent difficult births in primiparous daughter heifers (%DBH), where difficult births are those scored as requiring considerable force or being extremely difficult (4 or 5 on a five point scale).
- SB is scored by the owner on a scale of 1 to 3, 1 meaning the calf was born alive and was alive 48 h postpartum, 2 meaning the calf was born dead, and 3 indicating the calf was born alive but died within 48 h postpartum. SB scores of 2 and 3 are combined into a single category for evaluation.
- the SB PTAs for sires are expressed as percent stillbirths in daughter heifers (%SBH), where stillborn calves are those scored as dead at birth or born alive but died within 48 h of birth (2 or 3 on a three point scale).
- Pregnancy rate is a function of the number of days open, which is the number of days between calving and a successful breeding.
- DPR is defined as the percentage of nonpregnant cows (daughters) that become pregnant during each 21-day period.
- a DPR PTA of "1" implies that daughters from this bull are 1% more likely to become pregnant during that estrus cycle than a bull with a DPR PTA of zero.
- Productive life is defined as the length of time a cow remains in a milking herd before removal by voluntary or involuntary culling (due to health or fertility problems), or death. PL is usually measured as the number of days, months, or days in milk (DM) from the first calving to the day the cow exits the herd (due to death, culling, or selling to non-dairy purposes). Because some cows are still alive at the time of data collection, their records are projected (VanRaden, P.M. and E. J. H. Klaaskate. 1993) or treated as censored (Ducrocq, 1987).
- the USDA genetic evaluation for PL includes all cows with first calving in 1960 and later (born in 1950 and later for the pedigree). Cows born at least 3 years prior to evaluation, with a valid sire ID and first lactation records are considered. PL is considered to be completed at 7 years of age. Records are extended for cows that have not had the opportunity to reach 7 years of age because they are still alive, were sold for dairy purposes, or the herd discontinued testing. Cows sold for dairy purposes or in herds that discontinued testing receive extended records if they had opportunity to reach 3 years of age; otherwise their records are discarded.
- the method of genetic evaluation is a single trait BLUP animal model. The statistical model includes effects of management group (based on herd of first lactation and birth date) and sire by herd interaction. Sires' PTAs for PL are calculated relative to a five year stepwise base i.e., as a difference from the average PL of all cows born in the current year, minus five (5) years.
- SCS somatic cell score
- Ciobanu DC, Bastiaansen, JWM, Lonergan, SM, Thomsen, H, Dekkers, JCM, Plastow, GS, and Rothschild, MF, (2004) J. Anim. Sci. 82:2829-39.
- Patent Literature (Swine)
- Table 3 provides a list of regions in the first column, SEQ ID numbers of anchor markers in the second column, SEQ ID numbers for markers within the region in the third column, and trait association abbreviations in the fourth column. Columns five through eight, nine through twelve, and thirteen through sixteen contain similarly arranged information. The abbreviations used for the trait associations are as follows:
- M Milk production: MILK, FAT, FATPCT, PROT, PROTPCT
- D Daughter Pregnancy Rate: DPR
- Table 4 provides a list of anchor marker SEQ ID numbers in the first column, gene names in the second column, SEQ ID numbers of SNPs within the gene in the third column, and trait association abbreviations in the fourth column. Columns five through eight, nine through twelve, and thirteen through sixteen contain similarly arranged information. The abbreviations used for the trait associations are as follows:
- M Milk production: MILK, FAT, FATPCT, PROT, PROTPCT
- D Daughter Pregnancy Rate: DPR
- ENSBTAT00000033728 is represented as Z033728; where "Z" in gene name stands for "ENSBTAT00000".
- Table 5 provides a list of phenotypic traits and the assigned identification numbers of SNPs found to be associated with each trait. The left column provides a counter to allow easier reading of the table. The "Trait" column lists the following traits: "FITNESS", rows 1-2397; and "PRODUCTIVITY", rows 2398-5117.
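The abbreviation is a plain prefix substitution, so the full transcript identifier can be recovered mechanically. The helper names below are hypothetical.

```python
PREFIX = "ENSBTAT00000"  # the prefix that "Z" stands for in the gene names


def expand_gene_name(short_name):
    """Expand an abbreviated gene name such as 'Z033728' to the full transcript ID."""
    if not short_name.startswith("Z"):
        raise ValueError("expected an abbreviated name starting with 'Z'")
    return PREFIX + short_name[1:]


def abbreviate_gene_name(full_name):
    """Abbreviate a full transcript ID by replacing the prefix with 'Z'."""
    if not full_name.startswith(PREFIX):
        raise ValueError("expected an identifier starting with " + PREFIX)
    return "Z" + full_name[len(PREFIX):]


print(expand_gene_name("Z033728"))                 # ENSBTAT00000033728
print(abbreviate_gene_name("ENSBTAT00000033728"))  # Z033728
```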
- Table 6 provides the SEQ ID NO of the sequence associated with each of the SNPs listed in Table 5.
- the "SNP POSITION” column provides the position (nucleotide number) of the SNP within the associated sequence (SEQ ID NO) and the "SNP ALLELE 1" and “SNP ALLELE 2" columns provide the identity of the two nucleotides that occur most frequently at the SNP POSITION within the population analyzed.
- Table 7 provides a list of the SNP ID NOs and SEQ ID NOs, listed in Tables 5 and 6, sorted numerically according to SNP ID NO.
Abstract
The present invention provides methods for improving desirable animal traits including improved fitness and productivity in dairy animals. Also provided are methods for determining a dairy animal's genotype with respect to multiple markers associated with milk production, somatic cell score, daughter pregnancy rate, productive life, and/or net merit. The invention also provides methods for selecting or allocating animals for predetermined uses such as progeny testing or nucleus herd breeding, for picking potential parent animals for breeding, and for producing improved progeny animals. Also provided are methods for identifying genes and/or genetic markers for traits including fitness and productivity traits; including identifying markers that are in allelic association with the SNPs disclosed herein and for identifying the genotypes of a causative mutation underlying a quantitative trait locus (QTL). The invention further provides methods for genotyping animals for multiple SNPs associated with the traits described above.
Description
GENETIC MARKERS AND METHODS FOR IMPROVING DAIRY PRODUCTIVITY AND FITNESS TRAITS
PRIORITY CLAIM
[0001] This application claims the benefit of U.S. Provisional Application Serial No. 60/848,541, filed September 29, 2006. This application also claims the benefit of U.S. Provisional Application Serial No. 60/919,099, filed March 20, 2007. This application also claims the benefit of U.S. Provisional Application Serial No. 60/931,680, filed May 24, 2007. Each of the above-referenced applications is hereby incorporated by reference in its entirety.
INCORPORATION OF SEQUENCE LISTING
[0002] A sequence listing is contained in the file named "Dairy_ComboApp_st25_FINAL.txt" which is 211,087,360 bytes (201 megabytes) (measured in MS-Windows XP) and was created on September 27, 2007 and is located in computer readable form on a compact disc (in accordance with 37 C.F.R. § 1.52(e) and 37 C.F.R. § 1.821) enclosed herewith and incorporated herein by reference.
FIELD OF THE INVENTION
[0003] The invention relates to the enhancement of desirable characteristics in dairy cattle. Embodiments relate to genes, gene expression, and genetic markers used in methods for improving dairy cattle fitness and/or productivity. More specifically, it relates to the use of genetic markers in methods for improving dairy cattle, including improvements with respect to Milk production (MP), Somatic Cell Score (SCS), Daughter Pregnancy Rate (DPR), Productive Life (PL), and Net Merit (NM).
BACKGROUND OF THE INVENTION
[0004] The future viability and competitiveness of the dairy industry depends on continual improvement in productivity and fitness traits. Examples of key traits include milk productivity (e.g. milk, fat, protein yield, fat %, protein % and persistency of lactation), health (e.g. Somatic Cell Count, mastitis incidence), fertility (e.g. pregnancy rate, display of estrus, calving interval and non-return rates in bulls), calving ease (e.g. direct and maternal calving ease), longevity (e.g. productive life), and functional conformation (e.g. udder support, proper foot and leg shape, proper rump angle, etc.). Unfortunately efficiency traits are often unfavorably correlated with fitness traits. Although fitness traits all have some degree of underlying genetic variation in commercial cattle populations, the accuracy of selecting breeding animals with superior genetic merit for many of them is low due to low heritability or the inability to measure the trait cost effectively on the candidate animal. In addition, many productivity and fitness traits can only be measured on females. Thus, the accuracy of conventional selection for these traits is moderate to low and ability to make genetic change through selection is limited, particularly for fitness traits.
[0005] Genomics offers the potential for greater improvement in productivity and fitness traits through the discovery of genes, or genetic markers linked to genes, that account for genetic variation and can be used for more direct and accurate selection. Close to 1000 markers with associations with productivity and fitness traits have been reported (see www.bovineqtl.tamu.edu/ for a searchable database of reported QTL); however, the resolution of QTL location is still quite low, which makes it difficult to utilize these QTL in marker-assisted selection (MAS) on an industrial scale. Only a few QTL have been fully characterized with a strong putative or well-confirmed causal mutation: DGAT1 on chromosome 14 (Grisart et al., 2002; Winter et al., 2002; Kuhn et al., 2004), GHR on chromosome 20 (Blott et al., 2003), and ABCG2 (Cohen-Zinder et al., 2005) or SPP1 on chromosome 6 (Schnabel et al., 2005). However, these discoveries are rare and only explain a small portion of the genetic variance for productivity traits, and no genes controlling fitness traits have been fully characterized. A more successful strategy employs the use of whole-genome high-density scans of the entire bovine genome in which QTL are mapped with sufficient resolution to explain the majority of genetic variation around the traits of interest.
[0006] Cattle herds used for milk production around the world originate predominantly from the Holstein or Holstein-Friesian breeds which are known for high levels of production. However, the high production levels in Holsteins have also been linked to greater calving difficulty and reduced levels of fertility. It is unclear whether these unfavorable correlations are due to pleiotropic gene effects or simply due to linked genes. If the latter is true, with marker knowledge, it may be possible to select for favorable recombinants that contain the favorable alleles from several linked genes that are normally at frequencies too low to allow much progress with traditional selection. Since Holstein germplasm has been sold and transported globally for several decades, the Holstein breed has effectively become one large global population held to relatively moderate inbreeding rates. Also, the outbred nature of such a large population selected for several generations has allowed linkage disequilibrium to break down except within relatively short distances (i.e. less than a few centimorgans) (Hayes et al., 2006). Given this pattern of linkage disequilibrium, dense marker coverage is required to refine QTL locations with sufficient precision to find markers that are in tight linkage disequilibrium with them. Therefore, markers that are in tight linkage disequilibrium with the QTL are essential for effective population-wide MAS or whole-genome selection (WGS).
[0007] The presence of linkage disequilibrium between very closely linked markers provides one skilled in the art an opportunity to substitute one marker for the other in the context of a marker-assisted breeding/management/allocation program. Linkage disequilibrium can be defined as the observance of alleles at two distinctive loci occurring in gametes more frequently than expected given the known allele frequencies and recombination fraction between the two loci (Source: NHBLI/NCBI Glossary). The knowledge that allele 1 at marker A occurs in gametes disproportionately with allele 1 at marker B confers the equivalent practical and economic value of marker B upon marker A, such that they can be considered equivalent.
[0008] It is well documented that linkage disequilibrium between two markers increases as physical distance decreases. As such, there is value in identifying markers that are located physically close to markers that have proven value when used in marker-assisted breeding/management/allocation programs. Because the nearby marker will likely display linkage disequilibrium with the marker of proven value, the nearby marker can be used as a proxy as it will show a similar association with the phenotypic trait. Thus the proven economic value of any given marker can be replicated by identifying and substituting very closely linked (nearby) markers.
[0009] Furthermore, it is well documented that DNA sequences within (intronic, 5' UTR, 3' UTR) and outside of (promoter sequence motifs, enhancer sequence motifs) the formal boundaries of a gene can significantly influence the expression of the gene. For example, McDowell and Dean (1999) showed that an enhancer element more than 2 kb from the translational start codon directly influenced transcription of the gene. More recently, Markstein et al. (2004) described the effects of an enhancer element residing 15 kb from the translational start site. Given these observations, it is clear that sequence variation at locations outside of the formal boundaries of a gene (typically from the translational start codon to the translational termination codon) can alter the expression patterns of a gene.
[0010] Most traits are quantitative in nature and hence are governed by a large number of QTL of small to moderately sized effects. Therefore, to characterize enough QTL to explain a majority of genetic variation for these traits, the whole genome needs to be scanned with a set of markers mapped to the genome at high resolution (i.e. greater than 1 marker/cM); otherwise known as whole-genome analysis. Furthermore, a sufficient number of marked QTL must be used in MAS in order to accurately predict the breeding value of an animal without phenotyping records on relatives or the animal itself. The application of such a high-density whole-genome marker map to discover and finely-map QTL explaining variation in productivity and fitness traits is described herein. The large number of resulting linked markers can be used in several methods of marker selection or marker-assisted selection, including whole-genome selection (WGS) (Meuwissen et al., Genetics 2001) to improve the genetic merit of the population for these traits and create value in the dairy industry.
SUMMARY OF THE INVENTION
[0011] This section provides a non-exhaustive summary of the present invention.
[0012] Various embodiments of the invention provide methods for evaluating an animal's genotype at 10 or more positions in the animal's genome. In various aspects of these embodiments the animal's genotype is evaluated at positions within a segment of DNA (an allele), that contains at least one SNP selected from the SNPs described in the Tables and Sequence Listing of the present application.
[0013] Other embodiments of the invention provide methods for allocating animals for use according to their predicted marker breeding value for productivity and/or fitness. Various aspects of this embodiment of the invention provide methods that comprise: a) analyzing the animal's genomic sequence at one or more polymorphisms (where the alleles analyzed each comprise at least one SNP) to determine the animal's genotype at each of those polymorphisms; b) analyzing the genotype determined for each polymorphism to determine which allele of the SNP is present; c) allocating the animal for use based on its genotype at one or more of the polymorphisms analyzed. Various aspects of this embodiment of the invention provide methods for allocating animals for use based on a favorable association between the animal's genotype, at one or more polymorphisms disclosed in the present application, and a desired phenotype. Alternatively, the methods provide for not allocating an animal for a certain use because it has one or more SNP alleles that are either associated with undesirable phenotypes or are not associated with desirable phenotypes.
[0014] Other embodiments of the invention provide methods for allocating animals for use according to their predicted marker breeding value for Milk Production, Somatic Cell Score, Daughter Pregnancy Rate, Productive Life, and/or Net Merit. Various aspects of this embodiment of the invention provide methods that comprise: a) analyzing the animal's genomic sequence at one or more polymorphisms (where the alleles analyzed each comprise at least one SNP) to determine the animal's genotype at each of those polymorphisms; b) analyzing the genotype determined for each polymorphism to determine which allele of the SNP is present; c) allocating the animal for use based on its genotype at one or more of the polymorphisms analyzed. Various aspects of this embodiment of the invention provide methods for allocating animals for use based on a favorable association between the animal's genotype, at one or more polymorphisms disclosed in the present application, and a desired phenotype. Alternatively, the methods provide for not allocating an animal for a certain use because it has one or more SNP alleles that are either associated with undesirable phenotypes or are not associated with desirable phenotypes.
[0015] Other embodiments of the invention provide methods for selecting animals for use in breeding to produce progeny. Various aspects of these methods comprise: A) determining the genotype of at least one potential parent animal at one or more locus/loci, where at least one of the loci analyzed contains an allele of a SNP selected from the group of SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing. B) Analyzing the determined genotype at one or more positions for at least one animal to determine which of the SNP alleles is present. C) Correlating the analyzed allele(s) with one or more phenotypes. D) Allocating at least one animal for use to produce progeny.
[0016] Other embodiments of the invention provide methods for selecting animals for use in breeding to produce progeny. Various aspects of these methods comprise: A) determining the genotype of at least one potential parent animal at one or more locus/loci, where at least one of the loci analyzed contains an allele of a SNP selected from the group of SNPs described in Table 4 and the Sequence Listing and/or other SNPs within one or more genes described in Table 4. B) Analyzing the determined genotype at one or more positions for at least one animal to determine which of the SNP alleles is present. C) Correlating the analyzed allele(s) with one or more phenotypes. D) Allocating at least one animal for use to produce progeny.
[0017] Other embodiments of the invention provide methods for producing offspring animals (progeny animals). Aspects of this embodiment of the invention provide methods that comprise: breeding an animal that has been selected for breeding by methods described herein to produce offspring. The offspring may be produced by purely natural methods or through the use of any appropriate technical means, including but not limited to: artificial insemination, embryo transfer (ET), multiple ovulation embryo transfer (MOET), in vitro fertilization (IVF), or any combination thereof.
[0018] Other embodiments of the invention provide for databases or groups of databases, each database comprising lists of the nucleic acid sequences, which lists include a plurality of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing. Preferred aspects of this embodiment of the invention provide for databases comprising the sequences for 50 or more SNPs. Other aspects of these embodiments comprise methods for using a computer algorithm or algorithms that use one or more database(s), each database comprising a plurality of the SNPs described in Tables 3, 5, and 6 and the Sequence Listing to identify phenotypic traits associated with the inheritance of one or more alleles of the SNPs, and/or using such a database to aid in animal allocation.
[0019] Other embodiments of the invention provide for databases or groups of databases, each database comprising lists of the nucleic acid sequences, which lists include a plurality of the SNPs described in Table 4 and the Sequence Listing and/or other SNPs located within the genes listed in Table 4. Preferred aspects of this embodiment of the invention provide for databases comprising the sequences for 50 or more SNPs. Other aspects of these embodiments comprise methods for using a computer algorithm or algorithms that use one or more database(s), each database comprising a plurality of the SNPs described in Table 4 and the Sequence Listing to identify phenotypic traits associated with the inheritance of one or more alleles of the SNPs, and/or using such a database to aid in animal allocation.
[0020] The expression of genes identified in this application is likely responsible (either through quantitative or qualitative variation of the protein) for a significant proportion of the observed genetic variation for the trait. As such, other embodiments of the invention may include modulating the presence of the protein products of genes described in Table 4 in the animal through various established methods, which would therefore likely modulate the phenotype of the animal in a predictable fashion.
[0021] Additional embodiments of the invention provide methods for identifying other genetic markers that are in allelic association with one or more of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing.
DEFINITIONS
[0022] The following definitions are provided to aid those skilled in the art to more readily understand and appreciate the full scope of the present invention. Nevertheless, as indicated in the definitions provided below, the definitions provided are not intended to be exclusive, unless so indicated. Rather, they are preferred definitions, provided to focus the skilled artisan on various illustrative embodiments of the invention.
[0023] As used herein the term "allelic association" preferably means: nonrandom deviation of $f(A_iB_j)$ from the product of $f(A_i)$ and $f(B_j)$, which is specifically defined by $r^2 > 0.2$, where $r^2$ is measured from a reasonably large animal sample (e.g., ≥100) and defined as
$$r^2 = \frac{\left[f(A_iB_j) - f(A_i)\,f(B_j)\right]^2}{f(A_i)\,\bigl(1 - f(A_i)\bigr)\,f(B_j)\,\bigl(1 - f(B_j)\bigr)} \qquad \text{[Equation 1]}$$
where $A_i$ represents an allele at one locus, $B_j$ represents an allele at another locus; $f(A_iB_j)$ denotes the frequency of gametes having both $A_i$ and $B_j$, $f(A_i)$ is the frequency of $A_i$, and $f(B_j)$ is the frequency of $B_j$ in a population.
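As an illustration of Equation 1, the following minimal Python sketch computes r² from the three frequencies and applies the r² > 0.2 and sample-size (≥100) criteria stated above. The function and parameter names are hypothetical and the frequencies in the example are invented for demonstration; they are not taken from the Tables or Sequence Listing.

```python
def r_squared(f_ab: float, f_a: float, f_b: float) -> float:
    """Squared allelic correlation per Equation 1.

    f_ab: frequency of gametes carrying both allele A_i and allele B_j
    f_a:  frequency of allele A_i
    f_b:  frequency of allele B_j
    """
    denom = f_a * (1.0 - f_a) * f_b * (1.0 - f_b)
    if denom <= 0.0:
        # r^2 is undefined when either locus is monomorphic in the sample
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    deviation = f_ab - f_a * f_b  # nonrandom deviation from independence
    return deviation * deviation / denom


def in_allelic_association(f_ab, f_a, f_b, n_animals,
                           threshold=0.2, min_sample=100):
    """Apply the definition: r^2 > 0.2 measured on a sample of >= 100 animals."""
    return n_animals >= min_sample and r_squared(f_ab, f_a, f_b) > threshold


# Illustrative (made-up) frequencies: deviation = 0.45 - 0.25 = 0.20
example_r2 = r_squared(0.45, 0.5, 0.5)
```

With these example frequencies the deviation is 0.20 and the denominator is 0.0625, so r² is approximately 0.64, which exceeds the 0.2 threshold.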
[0024] As used herein the terms "allocating animals for use" and "allocation for use" preferably mean deciding how an animal will be used within a herd or that it will be removed from the herd to achieve desired herd management goals. For example, an animal might be allocated for use as a breeding animal or allocated for sale as a non- breeding animal (e.g. allocated to animals intended to be sold for meat). In certain aspects of the invention, animals may be allocated for use in sub-groups within the breeding programs that have very specific goals (e.g. productivity or fitness). Accordingly, even within the group of animals allocated for breeding purposes, there may be more specific allocation for use to achieve more specific and/or specialized breeding goals.
[0025] As used herein the terms "anchor SNP" and "anchor Marker" preferably refer to a SNP located in a region determined to be in genetic association/linkage with one or more traits. In Table 3, anchor markers are identified as those SNP/Markers that have a SEQ ID NO: identical to the region number of the region containing them. For example, SEQ ID NO: 1 provides the nucleic acid sequence for the anchor marker for region number 1. In Table 4, anchor markers are identified as those SNP/Markers that have a SEQ ID NO listed in column 1. For example, SEQ ID NO: 173931 provides the nucleic acid sequence for the anchor marker for the first region, corresponding nearby genes are listed in column 2, and SEQ ID numbers of markers within the gene are listed in column 3.
[0026] As used herein the terms "animal" or "animals" preferably refer to dairy cattle.
[0027] As used herein "fitness" preferably refers to traits that include, but are not limited to: pregnancy rate (PR), daughter pregnancy rate (DPR), productive life (PL), somatic cell count (SCC) and somatic cell score (SCS).
[0028] As used herein, PR and DPR refer to the percentage of non-pregnant animals that become pregnant during each 21-day period.
[0029] As used herein, PL is calculated as months in milk in each lactation, summed across all lactations until removal of the cow from the herd (by culling or death).
[0030] As used herein, somatic cell score can be calculated using the following relationship: SCS = log2(SCC/100,000)+3, where SCC is somatic cells per milliliter of milk.
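The relationship above can be sketched in Python as follows. The function name is hypothetical, and the input is assumed to be a positive somatic cell count in cells per milliliter.

```python
import math


def somatic_cell_score(scc_per_ml: float) -> float:
    """SCS = log2(SCC / 100,000) + 3, with SCC in somatic cells per mL of milk."""
    if scc_per_ml <= 0:
        raise ValueError("SCC must be positive")
    return math.log2(scc_per_ml / 100_000) + 3


# Each doubling of SCC raises the score by one unit.
scores = [somatic_cell_score(scc) for scc in (12_500, 100_000, 200_000)]
```

Under this relationship a count of 100,000 cells/mL corresponds to a score of 3, and halving or doubling the count moves the score down or up by one.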
[0031] As used herein the term "growth" refers to the measurement of various parameters associated with an increase in an animal's size and/or weight.
[0032] As used herein the term "linkage disequilibrium" preferably means allelic association wherein $A_i$ and $B_j$ (as used in the above definition of allelic association) are present on the same chromosome.
[0033] As used herein the term "marker-assisted selection (MAS)" preferably refers to the selection of animals on the basis of marker information in possible combination with pedigree and phenotypic data.
[0034] As used herein the terms "marker breeding value (MBV)" and "predicted marker breeding value (PMBV)" refer to an estimate of an animal's genetic transmitting ability with respect to specific traits that is based on its genotype.
[0035] As used herein the term "natural breeding" preferably refers to mating animals without human intervention in the fertilization process. That is, without the use of mechanical or technical methods such as artificial insemination or embryo transfer. The term does not refer to selection of the parent animals.
[0036] As used herein the terms "neighboring markers" or "neighboring SNPs" preferably refer to SNPs in close proximity to an anchor SNP, most preferably these terms refer to markers/SNPs located within about 70 kilobases of the anchor SNP.
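A minimal sketch of how neighboring SNPs might be identified for a given anchor SNP is shown below. It assumes that each marker has a known chromosome and base-pair position, and it treats "about 70 kilobases" as a hard cutoff of 70,000 bp in either direction; the `Snp` type, the function name, and the example positions are all hypothetical.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    name: str
    chromosome: str
    position_bp: int


def neighboring_snps(anchor, candidates, window_bp=70_000):
    """Return candidates on the anchor's chromosome within window_bp of it,
    in either the 5' or 3' direction, excluding the anchor itself."""
    return [
        snp for snp in candidates
        if snp.name != anchor.name
        and snp.chromosome == anchor.chromosome
        and abs(snp.position_bp - anchor.position_bp) <= window_bp
    ]


# Made-up positions for illustration only.
anchor = Snp("anchor", "5", 1_000_000)
candidates = [
    Snp("a", "5", 1_050_000),  # 50 kb downstream: neighbor
    Snp("b", "5", 1_080_000),  # 80 kb downstream: too far
    Snp("c", "6", 1_010_000),  # different chromosome
    Snp("d", "5", 930_000),    # 70 kb upstream: neighbor (boundary)
]
neighbors = [snp.name for snp in neighboring_snps(anchor, candidates)]
```

Because the definition says "about" 70 kilobases, the exact cutoff is exposed as the `window_bp` parameter rather than fixed in the function.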
[0037] As used herein the term "net merit" preferably refers to a composite index that includes several commonly measured traits weighted according to relative economic value in a typical production setting and expressed as lifetime economic worth per cow relative to an industry base. Examples of net merit indexes include, but are not limited to, $NM or TPI in the USA and LPI in Canada. Formulae for calculating these indices are well known in the art (e.g. $NM can be found on the USDA/AIPL website: www.aipl.arsusda.gov/reference.htm).
[0038] As used herein, the term "milk production" preferably refers to phenotypic traits related to the productivity of a dairy animal including milk fluid volume, fat percent, protein percent, fat yield, and protein yield.
[0039] As used herein the term "predicted value" preferably refers to an estimate of an animal's breeding value or transmitting ability based on its genotype and pedigree.
[0040] As used herein "productivity" and "production" preferably refer to yield traits that include, but are not limited to: total milk yield, milk fat percentage, milk fat yield, milk protein percentage, milk protein yield, total lifetime production, milking speed and lactation persistency.
[0041] As used herein the term "quantitative trait" is used to denote a trait that is controlled by multiple (two or more, and often many) genes each of which contributes small to moderate effect on the trait. The observations on quantitative traits often follow a normal distribution.
[0042] As used herein the term "quantitative trait locus (QTL)" is used to describe a locus that contains polymorphism that has an effect on a quantitative trait.
[0043] As used herein the term "reproductive material" includes, but is not limited to semen, spermatozoa, ova, and zygote(s).
[0044] As used herein the term "single nucleotide polymorphism" or "SNP" refers to a location in an animal's genome that is polymorphic within the population. That is, within the population some individual animals have one type of base at that position, while others have a different base. For example, a SNP might refer to a location in the genome where some animals have a "G" in their DNA sequence, while others have a "T".
[0045] As used herein the terms "hybridization under stringent conditions" and "stringent hybridization conditions" preferably mean conditions under which a "probe" will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 5-fold over background). Stringent conditions are target-sequence-dependent and will differ depending on the structure of the polynucleotide. By controlling the stringency of the hybridization and/or washing conditions, target sequences that are 100% complementary to the probe can be identified (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing).
[0046] Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30 °C for short probes (e.g., 10 to 50 nucleotides) and at least about 60 °C for long probes (e.g., greater than 50 nucleotides). Stringency may also be adjusted with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at 37 °C, and a wash in 1X to 2X SSC (20X SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55 °C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37 °C, and a wash in 0.5X to 1X SSC at 55 to 60 °C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37 °C, and a wash in 0.1X SSC at 60 to 65 °C. The duration of hybridization is generally less than about 24 hours, usually about 4 to about 12 hours.
[0047] Specificity is typically the function of post-hybridization washes, the critical factors being the ionic strength and temperature of the final wash solution. For DNA-DNA hybrids, the thermal melting point (Tm) can be approximated from the equation of Meinkoth and Wahl (1984) Anal. Biochem. 138:267-284: Tm = 81.5 °C + 16.6 (log M) + 0.41 (% GC) - 0.61 (% form) - 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanine and cytosine nucleotides in the DNA, % form is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. Tm is reduced by about 1 °C for each 1% of mismatching; thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to sequences of the desired identity. For example, if sequences with 90% identity are sought, the Tm can be decreased 10 °C. Generally, stringent conditions are selected to be about 5 °C lower than the Tm for the specific sequence and its complement at a defined ionic strength and pH. However, highly stringent conditions can utilize a hybridization and/or wash at 1, 2, 3, or 4 °C lower than the thermal melting point (Tm); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10 °C lower than the thermal melting point (Tm); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or 20 °C lower than the thermal melting point (Tm). Using the equation, hybridization and wash compositions, and desired Tm, those of ordinary skill will understand that variations in the stringency of hybridization and/or wash solutions are inherently described. If the desired degree of mismatching results in a Tm of less than 45 °C (aqueous solution) or 32 °C (formamide solution), it is preferred to increase the SSC concentration so that a higher temperature can be used. An extensive guide to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2 (Elsevier, N.Y.); and Ausubel et al., eds. (1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing and Wiley-Interscience, New York). See also Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).
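The melting-point approximation and the mismatch adjustment described above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, "log M" is taken to be the base-10 logarithm, and the example values (1 M monovalent cation, 50% GC, 50% formamide, a 500 bp hybrid) are invented.

```python
import math


def melting_temperature_c(monovalent_molarity: float, gc_percent: float,
                          formamide_percent: float,
                          hybrid_length_bp: int) -> float:
    """Tm = 81.5 + 16.6*log10(M) + 0.41*(%GC) - 0.61*(%form) - 500/L, in deg C."""
    return (81.5
            + 16.6 * math.log10(monovalent_molarity)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / hybrid_length_bp)


def tm_for_identity(tm_c: float, identity_percent: float) -> float:
    """Lower Tm by about 1 deg C for each 1% of mismatching."""
    return tm_c - (100.0 - identity_percent)


# Illustrative values only.
tm_perfect = melting_temperature_c(1.0, 50.0, 50.0, 500)
tm_90 = tm_for_identity(tm_perfect, 90.0)
stringent_wash_c = tm_perfect - 5.0  # "about 5 deg C lower than the Tm"
```

For these inputs the approximation gives a Tm of about 70.5 °C for a perfect match and about 60.5 °C when sequences of 90% identity are sought.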
[0048] As used herein the terms "marker breeding value (MBV)" and "predicted marker breeding value (PMBV)" respectively refer to an estimate of an animal's genetic transmitting ability with respect to either productivity traits or fitness traits that is based on its genotype.
[0049] As used herein the term "whole-genome analysis" preferably refers to the process of QTL mapping of the entire genome at high marker density (i.e. at least about one marker per cM) and detection of markers that are in population-wide linkage disequilibrium with QTL.
[0050] As used herein the term "whole-genome selection (WGS)" preferably refers to the process of marker-assisted selection (MAS) on a genome-wide basis in which markers are used that span the entire genome at moderate to high density (e.g. at least about one marker per 1-5 cM), or that occur at moderate to high density in QTL regions, or that directly neighbor or flank QTL that explain a significant portion of the genetic variation controlling one or more traits.
ILLUSTRATIVE EMBODIMENTS OF THE INVENTION
[0051] Various embodiments of the present invention provide methods for evaluating an animal's (especially a dairy animal's) genotype. In preferred embodiments of the invention, the animal's genotype is evaluated at 10 or more positions (i.e. with respect to 10 or more genetic markers). Aspects of these embodiments of the invention provide methods that comprise determining the animal's genomic sequence at 10 or more locations (loci) that contain single nucleotide polymorphisms (SNPs). Embodiments of the invention provide methods for evaluating an animal's genotype by determining which of two or more alleles for the SNP are present for each of 10 or more SNPs selected from the group consisting of the SNPs described in Tables 5 and 6 of the instant application. Furthermore, embodiments of the invention provide methods for evaluating an animal's genotype by determining which of two or more alleles for the SNP are present for each of 10 or more SNPs selected from the group consisting of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing of the instant application and/or SNPs located on the same chromosome and within approximately 70 kilobases of one or more of the SNP anchor markers described in Tables 3 and 4 and the Sequence Listing. That is, within approximately 70,000 base pairs of the anchor SNP marker, in either the 5' or 3' direction from an anchor SNP.
[0052] Various embodiments of the present invention provide methods for evaluating an animal's (especially a dairy animal's) genotype. In preferred embodiments of the invention, the animal's genotype is evaluated at 10 or more positions (i.e. with respect to 10 or more genetic markers). Aspects of these embodiments of the invention provide methods that comprise determining the animal's genomic sequence at 10 or more locations (loci) that contain single nucleotide polymorphisms (SNPs). Specifically, the invention provides methods for evaluating an animal's genotype by determining which of two or more alleles for the SNP are present for each of 10 or more SNPs selected from the group consisting of the SNPs described in Table 4 and the Sequence Listing of the instant application and/or SNPs located on the same chromosome and within a gene, wherein the gene has a portion of its sequence within approximately 70 kilobases of one or more of the SNP anchor markers described in Table 4 and the Sequence Listing. That is, within a gene residing within approximately 70,000 base pairs of the anchor SNP marker, in either the 5' or 3' direction from the anchor SNP.
[0053] In preferred aspects of these embodiments the animal's genotype is evaluated to determine which allele is present for 25 or more SNPs selected from the group consisting of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers described in Tables 3, 4, 5, and 6 and the Sequence Listing. More preferably, the animal's genotype is determined for positions corresponding with 50, 100, 200, 500, or 1000 or more of the SNPs selected from this group.
[0054] In preferred aspects of these embodiments the animal's genotype is evaluated to determine which allele is present for 25 or more SNPs selected from the group consisting of the SNPs described in Table 4 and the Sequence Listing and SNPs on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kilobases of one or more of the SNP anchor markers described in Table 4 and the Sequence Listing. More preferably, the animal's genotype is determined for positions corresponding with 50, 100, 200, 500, or 1000 or more of the SNPs selected from this group.
[0055] In preferred aspects of these embodiments the animal's genotype is evaluated to determine which allele is present for 25 or more SNPs selected from the group of SNPs described in Tables 5 and 6. More preferably, the animal's genotype is determined for positions corresponding with 50, 100, 200, 500, or 1000 or more of the SNPs described in Tables 5 and 6.
[0056] In other aspects of this embodiment, the animal's genotype is analyzed with respect to at least 10, 25, 50, 100, 200, 500, or more SNPs that have been shown to be associated with productivity and/or fitness (see Table 5 for a list of the SNPs associated with these traits). For example, embodiments of the invention provide a method for genotyping 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more SNPs that have been determined to be significantly associated with productivity, as described in Tables 5 and 6.
[0057] In other aspects of this embodiment, the animal's genotype is analyzed with respect to at least 10, 25, 50, 100, 200, 500, or more SNPs that have been shown to be associated with one or more traits selected from the group consisting of Milk Production, Somatic Cell Score, Daughter Pregnancy Rate, Productive Life, and Net Merit (see Tables 3 and 4 for a list of the SEQ ID NOs of the SNPs associated with these traits, including associated anchor SNPs). For example, embodiments of the invention provide a method for genotyping 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more SNPs that have been determined to be significantly associated with one or more of these traits. These SNPs are preferably selected from the group consisting of the SNPs described in Tables 3 and 4 and the Sequence Listing and the respective neighboring SNPs.
[0058] Aspects of the present invention also provide for both whole-genome analysis and whole-genome selection (WGS) (that is, marker-assisted selection (MAS) on a genome-wide basis). Various aspects of this embodiment of the invention provide for either whole-genome analysis or WGS wherein the markers analyzed for an animal span the animal's entire genome at moderate to high density. That is, the animal's genome is analyzed with markers that on average occur, at least, approximately every 1 to 5 centimorgans in the genome. Moreover, the invention provides that of the markers used to carry out the whole-genome analysis or WGS, 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more are selected from the markers described in Tables 5 and 6. In preferred aspects of this embodiment the markers may be associated with fitness or associated with productivity, or may be associated with both fitness and productivity.
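One way to check whether a marker panel meets the density and composition criteria described above is sketched below. It is a simplified illustration under stated assumptions: average spacing is taken as total genetic map length divided by the number of markers, the 5 cM upper bound and the minimum of 10 disclosed markers are exposed as parameters, and all names, map lengths, and marker identifiers are hypothetical.

```python
def average_marker_spacing_cm(marker_counts, chromosome_lengths_cm):
    """Average spacing = total map length (cM) / total number of markers."""
    total_markers = sum(marker_counts.values())
    if total_markers == 0:
        raise ValueError("panel contains no markers")
    return sum(chromosome_lengths_cm.values()) / total_markers


def panel_qualifies(marker_counts, chromosome_lengths_cm,
                    panel_ids, disclosed_ids,
                    max_spacing_cm=5.0, min_disclosed=10):
    """True if markers occur on average at least every max_spacing_cm and
    at least min_disclosed of them come from the disclosed marker set."""
    spacing = average_marker_spacing_cm(marker_counts, chromosome_lengths_cm)
    n_disclosed = len(set(panel_ids) & set(disclosed_ids))
    return spacing <= max_spacing_cm and n_disclosed >= min_disclosed


# Toy two-chromosome genome (made-up lengths and marker names).
lengths = {"1": 100.0, "2": 60.0}
counts = {"1": 25, "2": 15}
panel = [f"snp{i}" for i in range(40)]
disclosed = {f"snp{i}" for i in range(12)}
ok = panel_qualifies(counts, lengths, panel, disclosed)
```

In this toy case 40 markers over 160 cM give an average spacing of 4 cM, and 12 of the panel markers belong to the disclosed set, so the panel qualifies. A real check might also examine the largest gap between adjacent markers, which this sketch does not do.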
[0059] Aspects of the present invention also provide for both whole-genome analysis and whole-genome selection (WGS) (i.e. marker-assisted selection (MAS) on a genome-wide basis). Various aspects of this embodiment of the invention provide for either whole-genome analysis or WGS wherein the markers analyzed for an animal span the animal's entire genome at moderate to high density. That is, the animal's genome is analyzed with markers that on average occur, at least, approximately every 1 to 5 centimorgans in the genome. Moreover, the invention provides that of the markers used to carry out the whole-genome analysis or WGS, 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more are selected from the group consisting of the markers described in Tables 3 and 4 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers in Table 3. In preferred aspects of this embodiment the markers may be associated with fitness or associated with productivity, or may be associated with both fitness and productivity. Furthermore, the invention provides that of the markers used to carry out the whole-genome analysis or WGS, 10 or more, 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more are selected from the group consisting of the markers described in Tables 3 and 4 and the Sequence Listing and SNPs on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kilobases of one or more of the SNP anchor markers in Table 4. In preferred aspects of this embodiment the markers may be associated with fitness or associated with productivity, or may be associated with both fitness and productivity.
[0060] In any embodiment of the invention the genomic sequence at the SNP locus may be determined by any means compatible with the present invention. Suitable means are well known to those skilled in the art and include, but are not limited to direct sequencing, sequencing by synthesis, primer extension, Matrix Assisted Laser Desorption /Ionization-Time Of Flight (MALDI-TOF) mass spectrometry, polymerase chain reaction-restriction fragment length polymorphism, microarray/multiplex array systems (e.g. those available from Affymetrix, Santa Clara, California), and allele-specific hybridization.
[0061] Other embodiments of the invention provide methods for allocating animals for subsequent use (e.g. to be used as sires or dams or to be sold for meat or dairy purposes) according to their predicted value for productivity or fitness. Various aspects of this embodiment of the invention comprise determining at least one animal's genotype for at least one SNP selected from the group of SNPs consisting of the SNPs described in Table 3 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers (methods for determining animals' genotypes for one or more SNPs are described supra). Thus, the animal's allocation for use may be determined based on its genotype at one or more, 5 or more, 10 or more, 25 or more, 50 or more, 100 or more, 300 or more, 500 or more, or 1000 or more of the SNPs in this group. The instant invention also provides embodiments where analysis of the genotypes of the SNPs described in Table 3 and the Sequence Listing is the only analysis done. Other embodiments provide methods where analysis of the SNPs disclosed herein is combined with any other desired type of genomic or phenotypic analysis (e.g. analysis of any genetic markers beyond those disclosed in the instant invention). Moreover, the SNPs analyzed may be selected from those SNPs associated only with milk production (MP), only with somatic cell score (SCS), only with daughter pregnancy rate (DPR), only with productive life (PL), or only with net merit (NM). Alternatively, the analysis may be done for SNPs selected from any desired combination of MP, SCS, DPR, PL, and NM. SNPs associated with various traits include those listed in Table 3 and others that are located on the same chromosome and within about 70 kb of an anchor marker.
[0062] Other embodiments of the invention provide methods for allocating animals for subsequent use (e.g. to be used as sires or dams or to be sold for meat or dairy purposes) according to their predicted value for productivity or fitness. Various aspects of this embodiment of the invention comprise determining at least one animal's genotype for at least one SNP selected from the group of SNPs consisting of the SNPs described in Table 4 and the Sequence Listing and SNPs on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kilobases of one or more of the SNP anchor markers (methods for determining animals' genotypes for one or more SNPs are described supra). Thus, the animal's allocation for use may be determined based on its genotype at one or more, 5 or more, 10 or more, 25 or more, 50 or more, 100 or more, 300 or more, 500 or more, or 1000 or more of the SNPs in this group. The instant invention also provides embodiments where analysis of the genotypes of the SNPs described in Table 4 and the Sequence Listing is the only analysis done. Other embodiments provide methods where analysis of the SNPs disclosed herein is combined with any other desired type of genomic or phenotypic analysis (e.g. analysis of any genetic markers beyond those disclosed in the instant invention). Moreover, the SNPs analyzed may be selected from those SNPs associated only with milk production (MP), only with somatic cell score (SCS), only with daughter pregnancy rate (DPR), only with productive life (PL), or only with net merit (NM). Alternatively, the analysis may be done for SNPs selected from any desired combination of MP, SCS, DPR, PL, and NM. SNPs associated with various traits include those SNPs listed in Table 4 and others that are located on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kb of an anchor marker.
[0063] According to various aspects of these embodiments of the invention, once the animal's genetic sequence for the selected SNP(s) has been determined, this information is evaluated to determine which allele of the SNP is present for at least one of the selected SNPs. Preferably the animal's allelic complement for all of the determined SNPs is evaluated. Finally, the animal is allocated for use based on its genotype for one or more of the SNP positions evaluated. Preferably, the allocation is made taking into account the animal's genotype at each of the SNPs evaluated, but its allocation may be based on any subset or subsets of the SNPs evaluated.
[0064] The allocation may be made based on any suitable criteria. For any SNP, a determination may be made as to whether one of the allele(s) is associated/correlated with desirable characteristics or associated with undesirable characteristics. This determination will often depend on breeding or herd management goals. Determination of which alleles are associated with desirable phenotypic characteristics can be made by any suitable means. Methods for determining these associations are well known in the art; moreover, aspects of the use of these methods are generally described in the EXAMPLES, below.
[0065] Phenotypic traits that may be associated with the SNPs of the current invention include, but are not limited to, fitness traits and productivity traits (including, for example, MP, SCS, DPR, PL, and NM).
[0066] According to various aspects of this embodiment of the invention, allocation for use of the animal may entail either positive selection for the animals having the desired genotype(s) (e.g. the animals with the desired genotypes are selected for productivity traits), negative selection of animals having undesirable genotypes (e.g. animals with undesirable genotypes are culled from the herd), or any combination of these methods. According to preferred aspects of this embodiment of the invention, animals identified as having SNP alleles associated with desirable phenotypes are allocated for use consistent with that phenotype (e.g. allocated for breeding based on phenotypes positively associated with fitness). Alternatively, animals that do not have SNP genotypes that are positively correlated with the desired phenotype (or possess SNP alleles that are negatively correlated with that phenotype) are not allocated for the same use as those with a positive correlation for the trait.
[0067] Other embodiments of the invention provide methods for selecting potential parent animals (i.e., allocation for breeding) to improve fitness and/or productivity in potential offspring. Various aspects of this embodiment of the invention comprise determining at least one animal's genotype for at least one SNP selected from the group of SNPs consisting of the SNPs described in Tables 3, 5 and 6 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers. Furthermore, determination of whether and how an animal will be used as a potential parent animal may be based on its genotype at one or more, 10 or more, 25 or more, 50 or more, 100 or more, 300 or more, or 500 or more of the SNPs from that group. Moreover, as with other types of allocation for use, various aspects of these embodiments of the invention provide methods where the only analysis done is to genotype the animal for one or more of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing. Other aspects of these embodiments provide methods where analysis of one or more SNPs disclosed herein is combined with any other desired genomic or phenotypic analysis (e.g. analysis of any genetic markers beyond those disclosed in the instant invention). Moreover, the SNP(s) analyzed may all be selected from those associated only with MP, only with SCS, only with DPR, only with PL, or only with NM. Conversely, the analysis may be done for SNPs selected from any desired combination of these or other traits.
[0068] Other embodiments of the invention provide methods for selecting potential parent animals (i.e., allocation for breeding) to improve fitness and/or productivity in potential offspring. Various aspects of this embodiment of the invention comprise determining at least one animal's genotype for at least one SNP selected from the group of SNPs consisting of the SNPs described in Table 4 and the Sequence Listing and SNPs on the same chromosome and within a gene, wherein the gene has a portion of its sequence within about 70 kilobases of one or more of the SNP anchor markers. Furthermore, determination of whether and how an animal will be used as a potential parent animal may be based on its genotype at one or more, 10 or more, 25 or more, 50 or more, 100 or more, 300 or more, or 500 or more of the SNPs from that group. Moreover, as with other types of allocation for use, various aspects of these embodiments of the invention provide methods where the only analysis done is to genotype the animal for one or more of the SNPs described in Table 4 and the Sequence Listing. Other aspects of these embodiments provide methods where analysis of one or more SNPs disclosed herein is combined with any other desired genomic or phenotypic analysis (e.g. analysis of any genetic markers beyond those disclosed in the instant invention). Moreover, the SNP(s) analyzed may all be selected from those associated only with MP, only with SCS, only with DPR, only with PL, or only with NM. Conversely, the analysis may be done for SNPs selected from any desired combination of these or other traits.
[0069] According to various aspects of these embodiments of the invention, once the animal's genetic sequence at the site of the selected SNP(s) has been determined, this information is evaluated to determine which allele of the SNP is present for at least one of the selected SNPs. Preferably the animal's allelic complement for all of the sequenced SNPs is evaluated. Additionally, the animal's allelic complement is analyzed and evaluated to estimate the probability that the animal's progeny will express one or more phenotypic traits or to predict the animal's progeny's genetic merit or phenotypic value of traits of interest. Finally, the animal is allocated for breeding use based on its genotype for one or more of the SNP positions evaluated and the probability that it will pass the desired genotype(s)/allele(s) to its progeny. Preferably, the breeding allocation is made taking into account the animal's genotype at each of the SNPs evaluated. However, its breeding allocation may be based on any subset or subsets of the SNPs evaluated.
[0070] The breeding allocation may be made based on any suitable criteria. For example, breeding allocation may be made so as to increase the probability of enhancing a single certain desirable characteristic in a population, in preference to other characteristics (e.g. increased fitness, or even specifically lowering somatic cell score (SCS) as part of fitness); alternatively, the selection may be made so as to generally maximize overall production based on a combination of traits. The allocations chosen are dependent on the breeding goals. Sub-categories falling within fitness include, inter alia: daughter pregnancy rate (DPR), productive life (PL), and somatic cell score. Sub-categories falling within productivity include, inter alia: milk fat percentage, milk fat yield, total milk yield, milk protein percentage, and total milk protein.
[0071] Other embodiments of the instant invention provide methods for producing progeny animals. According to various aspects of this embodiment of the invention, the animals used to produce the progeny are those that have been allocated for breeding according to any of the embodiments of the current invention. Those using the animals to produce progeny may perform the necessary analysis or, alternatively, those producing the progeny may obtain animals that have been analyzed by another. The progeny may be produced by any appropriate means, including, but not limited to using: (i) natural breeding, (ii) artificial insemination, (iii) in vitro fertilization (IVF) or (iv) collecting semen/spermatozoa and/or at least one ovum from the animal and contacting it, respectively with ova/ovum or semen/spermatozoa from a second animal to produce a conceptus by any means.
[0072] According to preferred aspects of this embodiment of the invention the progeny are produced by a process comprising natural breeding. In other aspects of this embodiment the progeny are produced through a process comprising the use of standard artificial insemination (AI), in vitro fertilization, multiple ovulation embryo transfer (MOET), or any combination thereof.
[0073] Other embodiments of the invention provide for methods that comprise allocating an animal for breeding purposes and collecting/isolating genetic material from that animal, wherein genetic material includes but is not limited to semen, spermatozoa, ovum, zygotes, blood, tissue, serum, DNA, and RNA.
[0074] It is understood that the most efficient and effective use of the methods and information provided by the instant invention employs computer programs and/or electronically accessible databases that comprise all or a portion of the sequences disclosed in the instant application. Accordingly, the various embodiments of the instant invention provide for databases comprising all or a portion of the sequences corresponding to at least 10 SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing. In preferred aspects of these embodiments the databases comprise sequences for 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more, or substantially all of the SNPs described in Tables 3, 5 and 6 and the Sequence Listing.
[0075] It is further understood that efficient analysis and use of the methods and information provided by the instant invention will employ the use of automated genotyping, particularly when large numbers (e.g. 100s) of markers are evaluated. Any suitable method known in the art may be used to perform such genotyping, including, but not limited to the use of micro-arrays.
[0076] Other embodiments of the invention provide methods wherein one or more of the SNP sequence databases described herein are accessed by one or more computer-executable programs. Such methods include, but are not limited to, use of the databases by programs to analyze for an association between the SNP and a phenotypic trait, or other user-defined trait (e.g. traits measured using one or more metrics such as gene expression levels, protein expression levels, or chemical profiles), and programs used to allocate animals for breeding or market.
[0077] Other embodiments of the invention provide methods comprising collecting genetic material from an animal that has been allocated for breeding, wherein the animal has been allocated for breeding by any of the methods disclosed as part of the instant invention.
[0078] Other embodiments of the invention provide for diagnostic kits or other diagnostic devices for determining which allele of a SNP is present in a sample; wherein the SNP(s) are selected from the group of SNPs described in Tables 5 and 6. In various aspects of this embodiment of the invention, the kit or device provides reagents/instruments to facilitate a determination as to whether nucleic acid corresponding to the SNP is present. Such a kit or device may further facilitate a determination as to which allele of the SNP is present. In certain aspects of this embodiment of the invention the kit or device comprises at least one nucleic acid oligonucleotide suitable for DNA amplification (e.g. through polymerase chain reaction). In other aspects of the invention the kit or device comprises a purified nucleic acid fragment capable of specifically hybridizing, under stringent conditions, with at least one allele of at least one SNP described in Tables 5 and 6.
[0079] Other embodiments of the invention provide for diagnostic kits or other diagnostic devices for determining which allele of one or more SNP(s) is/are present in a sample; wherein the SNP(s) are selected from the group of SNPs consisting of the SNPs described in Table 3 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers. In various aspects of this embodiment of the invention, the kit or device provides reagents/instruments to facilitate a determination as to whether nucleic acid corresponding to the SNP is present. Such a kit or device may further facilitate a determination as to which allele of the SNP is present. In certain aspects of this embodiment of the invention the kit or device comprises at least one nucleic acid oligonucleotide suitable for DNA amplification (e.g. through polymerase chain reaction). In other aspects of the invention the kit or device comprises a purified nucleic acid fragment capable of specifically hybridizing, under stringent conditions, with at least one allele of at least one of the SNPs described above.
[0080] In particularly preferred aspects of this embodiment of the invention the kit or device comprises at least one nucleic acid array (e.g. DNA micro-arrays) capable of determining which allele of one or more of the SNPs are present in a sample; where the SNPs are selected from the group of SNPs consisting of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing and SNPs on the same chromosome and within about 70 kilobases of one or more of the SNP anchor markers. Preferred aspects of this embodiment of the invention provide DNA micro-arrays capable of simultaneously determining which allele is present in a sample for 10 or more SNPs. Preferably, the DNA micro-array is capable of determining which SNP allele is present in a sample for 25 or more, 50 or more, 100 or more, 200 or more, 500 or more, or 1000 or more SNPs. Methods for making such arrays are known to those skilled in the art and such arrays are commercially available (e.g. from Affymetrix, Santa Clara, California).
[0081] Other embodiments of the instant invention provide methods of identifying genetic markers associated with MP, SCS, DPR, PL, and/or NM (and/or any other fitness or productivity trait) that are in allelic association with one or more of the SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing. According to various aspects of this embodiment of the invention the method comprises: (a) identifying a marker, $B_1$, that is suspected of being in allelic association with at least one marker, $A_1$, wherein $A_1$ is selected from the group of SNPs described in Tables 3, 4, 5, and 6 and the Sequence Listing; (b) determining whether $A_1$ and $B_1$ are in allelic association; wherein allelic association is determined to exist if $r^2 > 0.2$ for Equation 1 for a population sample of at least 100 animals and wherein Equation 1 is:
$$r^2 = \frac{[f(A_1B_1) - f(A_1)f(B_1)]^2}{f(A_1)(1 - f(A_1))\,f(B_1)(1 - f(B_1))} \qquad \text{[Equation 1]}$$
and wherein for Equation 1 $A_1$ represents an allele at one locus (e.g., a SNP described in Table 3); $B_1$ represents a genetic marker at another locus; $f(A_1B_1)$ denotes the frequency of having both $A_1$ and $B_1$; $f(A_1)$ is the frequency of $A_1$ in the population; and $f(B_1)$ is the frequency of $B_1$ in the population. In preferred aspects of this embodiment of the invention $B_1$ is a SNP. Even more preferred aspects of this embodiment of the invention provide methods for identifying genetic markers in linkage disequilibrium with one or more SNPs selected from the group of SNPs described in Tables 3, 4, 5, and 6. In other aspects of these embodiments of the invention the value of $r^2$ is greater than 0.5, greater than 0.7, or greater than 0.9.
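The allelic association test of Equation 1 can be illustrated with a short Python sketch. It uses the standard definition of r², in which the squared disequilibrium term [f(A1B1) - f(A1)f(B1)]² is divided by f(A1)(1 - f(A1))f(B1)(1 - f(B1)). The function names, the default arguments, and the example frequencies are hypothetical and chosen only for illustration.

```python
def allelic_association_r2(f_ab, f_a, f_b):
    """Return r^2 for allele A1 and marker B1 from population frequencies.

    f_ab: frequency of having both A1 and B1
    f_a:  frequency of A1
    f_b:  frequency of B1
    """
    d = f_ab - f_a * f_b  # disequilibrium term
    denom = f_a * (1.0 - f_a) * f_b * (1.0 - f_b)
    if denom == 0.0:
        raise ValueError("r^2 is undefined when either marker is monomorphic")
    return d * d / denom


def in_allelic_association(f_ab, f_a, f_b, n_animals,
                           threshold=0.2, min_animals=100):
    """Apply the r^2 > 0.2 criterion for a sample of at least 100 animals."""
    if n_animals < min_animals:
        return False
    return allelic_association_r2(f_ab, f_a, f_b) > threshold


# Hypothetical frequencies: both markers at 0.5, joint frequency 0.4.
r2 = allelic_association_r2(0.4, 0.5, 0.5)
print(round(r2, 3), in_allelic_association(0.4, 0.5, 0.5, n_animals=150))
```

The stricter thresholds mentioned in the text (0.5, 0.7, 0.9) can be applied by passing a different `threshold` value.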
[0082] In other aspects of this embodiment of the invention the method comprises finding polymorphisms within about 70 kb of one of the anchor SNPs listed in Table 3 or finding a polymorphism in a gene, wherein the gene has a portion of its sequence within about 70 kb of one of the anchor SNPs listed in Table 4.
[0083] Additional embodiments of the invention provide for genetic markers for fitness and/or productivity that are in allelic association with one or more of the SNPs described in Tables 3, 4, 5, and 6. Markers provided as part of this embodiment of the invention may be identified by any suitable means known to those of ordinary skill in the art. A marker falls within this embodiment of the invention if it is determined to be in allelic association with one or more of the SNPs described in Tables 3, 4, 5, and 6 as defined by Equation 1, supra, where $r^2$ is greater than 0.2, greater than 0.5, greater than 0.7, or greater than 0.9. In preferred aspects of these embodiments of the invention the markers are in linkage disequilibrium with one or more of the SNPs described in Table 3.
[0084] Additional embodiments of the invention provide for genetic markers for fitness and/or productivity that are in allelic association with one or more of the SNPs described in Tables 5 and 6. Markers provided as part of this embodiment of the invention may be identified by any suitable means known to those of ordinary skill in the art. A marker falls within this embodiment of the invention if it is determined to be in allelic association with one or more of the SNPs described in Tables 5 and 6 as defined by Equation 1, supra, where $r^2$ is greater than 0.2, greater than 0.5, greater than 0.7, or greater than 0.9. In preferred aspects of these embodiments of the invention the markers are in linkage disequilibrium with one or more of the SNPs described in Tables 5 and 6.
[0085] Genetic markers that are in allelic association with any of the SNPs described in the Tables may be identified by any suitable means known to those skilled in the art. For example, a genomic library may be screened using a probe specific for any of the sequences of the SNPs described in the Tables. In this way clones comprising at least a portion of that sequence can be identified and then up to 300 kilobases of 3' and/or 5' flanking chromosomal sequence can be determined. Preferably up to about 70 kilobases of 3' and/or 5' flanking chromosomal sequences are evaluated. By this means, genetic markers in allelic association with the SNPs described in the Tables will be identified.
[0086] Other embodiments of the present invention provide methods for identifying genes that may be associated with phenotypic variation. According to various aspects of these embodiments, the chromosomal location of a SNP associated with a particular phenotypic variation can be determined by means well known to those skilled in the art. Once the chromosomal location is determined, genes suspected to be involved with determination of the phenotype can be analyzed. Such genes may be identified by sequencing adjacent portions of the chromosome or by comparison with the analogous section of the human genetic map (or known genetic maps for other species). An early example of the existence of clusters of conserved genes is reviewed in Womack (1987), where genes mapping to the same chromosome in one species were observed to map to the same chromosome in other, closely related, species. As mapping resolution improved, reports of the conservation of gene structure and order within conserved chromosomal regions were published (for example, Grosz et al., 1992). More recently, large-scale radiation hybrid mapping and BAC sequencing have yielded chromosome-scale comparative mapping predictions between human and bovine genomes (Everts-van der Wind et al., 2005), between human and porcine genomes (Yasue et al., 2006) and among vertebrate genomes (Demars et al., 2006).
[0087] Other embodiments of the invention provide methods for identifying causal mutations that underlie one or more quantitative trait loci (QTL). Various aspects of this embodiment of the invention provide for the identification of SNPs that are in allelic association with one or more of the SNPs described in Table 3. Once these SNPs are identified, it is within the ability of skilled artisans to identify mutations located proximal to such SNP(s). Further, one skilled in the art can identify genes located proximate to the identified SNP(s) and evaluate these genes to select those likely to contain the causal mutation. Once identified, these genes and the surrounding sequence can be analyzed for the presence of mutations, in order to identify the causal mutation.
[0088] Other embodiments of the invention provide methods for identifying causal mutations that underlie one or more quantitative trait loci (QTL). Various aspects of this embodiment of the invention provide for the identification of QTL that are in allelic association with one or more of the SNPs described in Tables 5 and 6. Once these SNPs are identified, it is within the ability of skilled artisans to identify mutations located proximal to such SNP(s). Further, one skilled in the art can identify genes located proximate to the identified SNP(s) and evaluate these genes to select those likely to contain the causal mutation. Once identified, these genes and the surrounding sequence can be analyzed for the presence of mutations, in order to identify the causal mutation.
[0089] Still other embodiments of the invention provide methods to modulate the expression and/or concentration of a gene product from the genes described in Table 4 with the intent of manipulating the performance and/or product quality of an animal or animal product. This can be performed through any of a number of technologies designed to positively or negatively influence gene expression, including but not limited to transgenesis, RNA interference, and anti-sense protocols.
EXAMPLES
[0090] The following examples are included to demonstrate general embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventors to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the invention.
[0091] All of the compositions and methods disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure. While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied without departing from the concept and scope of the invention.
Example 1: Determining Associations between Genetic Markers and Phenotypic Traits
[0092] Simultaneous discovery and fine-mapping on a genome-wide basis of genes underlying quantitative traits (Quantitative Trait Loci: QTL) requires genetic markers densely covering the entire genome. As described in this example, a whole-genome, dense-coverage marker map was constructed from microsatellite and single nucleotide polymorphism (SNP) markers with previous estimates of location in the bovine genome, and from SNP markers with putative locations in the bovine genome based on homology with human sequence and the human/cow comparative map. A new linkage-mapping software package was developed, as an extension of the CRIMAP software (Green et al., Washington University School of Medicine, St. Louis, 1990), to allow more efficient mapping of densely-spaced markers genome-wide in a pedigreed livestock population (Liu and Grosz Abstract C014; Grapes et al. Abstract W244; 2006 Proceedings of the XIV Plant and Animal Genome Conference, www.intl-pag.org). The new linkage mapping tools build on the basic mapping principles programmed in CRIMAP to improve efficiency through partitioning of large pedigrees, automation of chromosomal assignment and two-point linkage analysis, and merging of sub-maps into complete chromosomes. The resulting whole-genome discovery map (WGDM) included 6,966 markers and a map length of 3,290 cM for an average map density of 2.18 markers/cM. The average gap between markers was 0.47 cM and the largest gap was 7.8 cM. This map provided the basis for whole-genome analysis and fine-mapping of QTL contributing to variation in productivity and fitness in dairy cattle.
Discovery and Mapping Populations
[0093] Systems for discovery and mapping populations can take many forms. The most effective strategies for determining population-wide marker/QTL associations include a large and genetically diverse sample of individuals with phenotypic measurements of interest collected in a design that allows accounting for non-genetic effects and includes information regarding the pedigree of the individuals measured. In the present example, an outbred population following the grand-daughter design (Weller et al., 1990) was used to discover and map QTL: the population, from the Holstein breed, had 529 sires each with an average of 6.1 genotyped sons, and each son had an average of 4216 daughters with milk data. DNA samples were collected from approximately 3,200 Holstein bulls and about 350 bulls from other dairy breeds, representing multiple sire and grandsire families.
Phenotypic Analyses
[0094] Dairy traits under evaluation include fitness and productivity traits such as milk yield ("MILK") (pounds), fat yield ("FAT") (pounds), fat percentage ("FATPCT") (percent), productive life ("PL") (months), somatic cell score ("SCS") (Log), daughter pregnancy rate ("DPR") (percent), protein yield ("PROT") (pounds), protein percentage ("PROTPCT") (percent), and net merit ("NM") (dollars). These traits are sex-limited, as no individual phenotypes can be measured on male animals. Instead, genetic merits of these traits, defined as PTA (predicted transmitting ability), were estimated using phenotypes of all relatives. Most dairy bulls were progeny tested with a reasonably large number of daughters (e.g., >50), and their PTA estimation is generally more or considerably more accurate than individual cow phenotype data. The genetic evaluation for traditional dairy traits of the US Holstein population is performed quarterly by USDA. Detailed descriptions of traits, genetic evaluation procedures, and genetic parameters used in the evaluation can be found at the USDA AIPL web site (www.aipl.arsusda.gov). It is worth noting that the dairy traits evaluated in this example are not independent: FAT and PROT are composite traits of MILK and FATPCT, and MILK and PROTPCT, respectively. NM is an index trait calculated based on protein yield, fat yield, productive life, somatic cell score, daughter pregnancy rate, calving difficulty, and several type traits. Protein yield and fat yield together account for >50% of NM, and the value of milk yield, fat content, and protein content is accounted for via protein yield and fat yield.
[0095] PTA data of all bulls with progeny testing data were downloaded from the USDA evaluation published at the AIPL site in November 2005. The PTA data were analyzed using the following two models:
$$y_{ij} = s_i + PTAd_{ij} \qquad [1]$$
$$y_i = \mu + \beta_1 (SPTA)_i + PTAd_i \qquad [2]$$
where $y_i$ ($y_{ij}$) is the PTA of the $i$th bull (PTA of the $j$th son of the $i$th sire); $s_i$ is the effect of the $i$th sire; $(SPTA)_i$ is the sire's PTA of the $i$th bull of the whole sample; $\mu$ is the population mean; $PTAd_i$ ($PTAd_{ij}$) is the residual bull PTA.
[0096] Equation [1] is referred to as the sire model, in which sires were fitted as fixed factors. Among all USA Holstein progeny-tested bulls, a considerable number of sires have only a very small number of progeny-tested sons (e.g., some have one son), and it is clearly undesirable to fit sires as fixed factors in these cases. It is well known that USA Holstein herds have been making steady and rapid genetic progress in traditional dairy traits in the last several decades, implying that the sire's effect can be partially accounted for by fitting the birth year of a bull. For sires with <10 progeny-tested sons, sires were replaced with the son's birth year in Eq. [1]. Eq. [2] is referred to as the SPTA model, in which the sire's PTA is fitted as a covariate. Residual PTA ($PTAd_i$ or $PTAd_{ij}$) were estimated using linear regression.
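As an illustration of the SPTA model, the following Python sketch fits a bull's PTA on its sire's PTA by ordinary least squares and returns the residual PTA. It is a minimal sketch under stated assumptions: the function name and the four-bull data set are hypothetical, and the actual analysis may have used different software and additional adjustments.

```python
from statistics import fmean


def residual_pta_spta(bull_pta, sire_pta):
    """Fit y_i = mu + beta1 * SPTA_i + PTAd_i by ordinary least squares.

    Returns (mu, beta1, residuals); the residuals are the PTAd_i values.
    """
    if len(bull_pta) != len(sire_pta) or len(bull_pta) < 2:
        raise ValueError("need two equal-length sequences with at least 2 bulls")
    mean_y = fmean(bull_pta)
    mean_x = fmean(sire_pta)
    sxx = sum((x - mean_x) ** 2 for x in sire_pta)
    if sxx == 0.0:
        raise ValueError("sire PTA values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sire_pta, bull_pta))
    beta1 = sxy / sxx                 # slope on the sire's PTA
    mu = mean_y - beta1 * mean_x      # intercept (population mean term)
    residuals = [y - (mu + beta1 * x) for x, y in zip(sire_pta, bull_pta)]
    return mu, beta1, residuals


# Hypothetical PTA values for four bulls and their sires.
sire_pta = [100.0, 200.0, 300.0, 400.0]
bull_pta = [101.0, 149.0, 199.0, 251.0]
mu, beta1, ptad = residual_pta_spta(bull_pta, sire_pta)
print(mu, beta1, ptad)
```

The residuals returned here play the role of the pre-adjusted phenotypes used in the later association models.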
SNP-trait Association Analyses and Identification of Anchor Markers
[0097] In the present example, linkage disequilibrium (LD) mapping was performed in the aforementioned discovery population using statistical analyses based on probabilities of individual ordered genotypes estimated conditional on observed marker genotypes. The first step was to estimate the sire's ordered genotype probabilities at all linked markers conditional on the grandsire's and offspring marker genotype data. The exact calculation quickly becomes computationally infeasible as the size and complexity of the pedigree and the number of linked markers increase. For example, there are, in total, $2^k$ ordered genotypes for all linked loci when a sire has $k$ linked heterozygous loci. A stepwise procedure developed based on a likelihood ratio test was used for estimating probabilities of the sire's ordered genotypes at all linked markers.
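The growth in the number of ordered genotypes can be seen by enumerating the phase assignments directly. The Python sketch below lists every ordered genotype for a sire with a given set of heterozygous loci. The function name and allele labels are hypothetical, and the brute-force enumeration is shown only to illustrate the count, not the stepwise procedure used in this example.

```python
from itertools import product


def ordered_genotypes(het_loci):
    """Enumerate all ordered genotypes (haplotype pairs) for heterozygous loci.

    het_loci: list of (allele_x, allele_y) tuples, one per heterozygous locus.
    Returns a list of (first_haplotype, second_haplotype) tuples.
    """
    pairs = []
    # Each locus can place either allele on the first haplotype: 2 choices per locus.
    for flips in product((False, True), repeat=len(het_loci)):
        first = tuple(b if flip else a for (a, b), flip in zip(het_loci, flips))
        second = tuple(a if flip else b for (a, b), flip in zip(het_loci, flips))
        pairs.append((first, second))
    return pairs


# Three hypothetical heterozygous loci give 2**3 = 8 ordered genotypes.
loci = [("A", "G"), ("C", "T"), ("A", "T")]
print(len(ordered_genotypes(loci)))
# With 20 heterozygous loci the count is already 2**20 = 1,048,576.
print(2 ** 20)
```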
[0098] The probabilities of ordered genotypes at loci of interest were estimated conditional on flanking informative markers as follows:
$$P(H_{sik}H_{djk} \mid M) = \sum_a \sum_b P(H_{sa}H_{db} \mid M)\, P(H_{sik}H_{djk} \mid H_{sa}H_{db}, M) \qquad [3]$$
where $P(H_{sa}H_{db} \mid M)$ is the probability of the sire having a pair of haplotypes (or ordered genotype) $H_{sa}H_{db}$ at all linked loci conditional on the observed genotype data $M$, and $P(H_{sik}H_{djk} \mid H_{sa}H_{db}, M)$ is the probability of a son having ordered genotype $H_{sik}H_{djk}$ at loci of interest conditional on the sire's ordered genotype $H_{sa}H_{db}$ at all linked loci and the observed genotype data $M$.
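Equation [3] is a weighted sum over the sire's possible ordered genotypes. The following Python sketch shows that marginalization for a single son and a single ordered genotype of interest. The function name, the dictionary layout, and the probabilities are hypothetical and serve only to show the arithmetic.

```python
def son_ordered_genotype_prob(sire_pair_probs, son_given_sire_pair):
    """Marginalize over the sire's ordered genotypes as in Eq. [3].

    sire_pair_probs:     {(a, b): P(H_sa H_db | M)}
    son_given_sire_pair: {(a, b): P(son's ordered genotype | H_sa H_db, M)}
    Sire pairs missing from the second mapping contribute zero.
    """
    total = 0.0
    for pair, p_sire in sire_pair_probs.items():
        total += p_sire * son_given_sire_pair.get(pair, 0.0)
    return total


# Hypothetical case: two candidate phase assignments for the sire.
sire_pair_probs = {("h1", "h2"): 0.75, ("h2", "h1"): 0.25}
son_given_sire_pair = {("h1", "h2"): 0.5, ("h2", "h1"): 0.1}
# 0.75 * 0.5 + 0.25 * 0.1 = 0.4
print(son_ordered_genotype_prob(sire_pair_probs, son_given_sire_pair))
```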
[0099] To determine associations between haplotype probabilities and trait phenotypes, haplotypes of neighboring (and/or non-neighboring) markers across each chromosome were defined by setting the maximum length of a chromosomal interval and the minimum and maximum number of markers to be included. Clearly, one needs to set similar parameters to form or define groups of marker loci for haplotype evaluation. The association between pre-adjusted trait phenotypes and haplotypes (or pairs of haplotypes, alternatively termed ordered genotypes) was evaluated via a regression approach with the following models:
$$PTAd_k = \sum_i \beta_{si} P(H_{sik}) + e_k \qquad [4]$$
$$PTAd_k = \sum_i \beta_{di} P(H_{dik}) + e_k \qquad [5]$$
$$PTAd_k = \sum_i \beta_i [P(H_{sik}) + P(H_{dik})] + e_k \qquad [6]$$
$$PTAd_k = \sum_{i,j} \beta_{ij} [P(H_{sik}H_{djk}) + P(H_{sjk}H_{dik})] + e_k \qquad [7]$$
where $PTAd_k$ is the preadjusted PTA of the $k$th bull as defined in Eq. [1] under the sire model and can be replaced with $PTAd_i$ as defined in Eq. [2] under the SPTA model, and $e_k$ is the residual; $P(H_{sik})$ and $P(H_{dik})$ are the probabilities of the paternal and maternal haplotype of individual $k$ being haplotype $i$; $P(H_{sik}H_{djk})$ is the probability of individual $k$ having paternal haplotype $i$ and maternal haplotype $j$, which can be estimated using Eq. [3]; all $\beta$ are corresponding regression coefficients. Equations [4], [5], [6], and [7] are designed to model paternal haplotype, maternal haplotype, additive haplotype, and genotype effects, respectively.
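The paternal haplotype model of Eq. [4] is a linear regression of the pre-adjusted PTA on haplotype probabilities, without an intercept. The Python sketch below estimates the regression coefficients by solving the normal equations. It is a minimal sketch under stated assumptions: the helper names and the four-bull data set are hypothetical, and a production analysis would typically use an established statistics library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: haplotype columns are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_haplotype_effects(hap_probs, ptad):
    """Least-squares estimates of beta_i in PTAd_k = sum_i beta_i * P(H_ik) + e_k.

    hap_probs[k][i] is the probability that bull k carries haplotype i.
    """
    n_hap = len(hap_probs[0])
    xtx = [[sum(row[i] * row[j] for row in hap_probs) for j in range(n_hap)]
           for i in range(n_hap)]
    xty = [sum(row[i] * y for row, y in zip(hap_probs, ptad))
           for i in range(n_hap)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: two haplotypes, four bulls, noise-free effects 10 and -4.
hap_probs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
ptad = [10.0, -4.0, 3.0, 10.0]
print(fit_haplotype_effects(hap_probs, ptad))
```

The same function applies to Eq. [5] with maternal haplotype probabilities, and to Eq. [6] when each column holds the sum of the paternal and maternal probabilities.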
[0100] Least-squares methods were used to estimate the effect of a haplotype or haplotype pair on a phenotypic trait, and the regular F-test was used to test the significance of the effect. Permutation tests were performed based on phenotype permutation (20,000 permutations) within each paternal half-sib family to estimate the Type I error rate (p value).
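A within-family permutation test of the kind described above can be sketched in Python as follows. Phenotypes are shuffled only among bulls that share a sire, and the test statistic is recomputed for each shuffle. The function names, the generic `statistic` callable (standing in for the F-statistic), the fixed seed, and the (hits + 1)/(n + 1) p-value convention are assumptions made for illustration.

```python
import random


def permute_within_families(phenotypes, family_ids, rng):
    """Return a copy of phenotypes shuffled only within each family."""
    by_family = {}
    for idx, fam in enumerate(family_ids):
        by_family.setdefault(fam, []).append(idx)
    permuted = list(phenotypes)
    for indices in by_family.values():
        values = [phenotypes[i] for i in indices]
        rng.shuffle(values)
        for i, v in zip(indices, values):
            permuted[i] = v
    return permuted


def permutation_p_value(statistic, predictor, phenotypes, family_ids,
                        n_perm=20000, seed=0):
    """Empirical p value: share of permutations with statistic >= observed."""
    rng = random.Random(seed)
    observed = statistic(predictor, phenotypes)
    hits = 0
    for _ in range(n_perm):
        shuffled = permute_within_families(phenotypes, family_ids, rng)
        if statistic(predictor, shuffled) >= observed:
            hits += 1
    # Counting the observed data as one permutation avoids a p value of zero.
    return (hits + 1) / (n_perm + 1)
```

Shuffling within paternal half-sib families keeps the family structure intact, so differences between sire families do not inflate the permutation distribution.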
Linkage Disequilibrium:
[0101] In the Holstein bulls that were sampled in our study, a high level of LD was found between tightly linked SNPs (Table 2). As linkage distance between two SNPs increased, the level of LD decreased. Furthermore, it is reasonable to believe that the extent and range of LD in other dairy cattle populations is similar to that in the Holstein population.
Table 1. LD as a function of linkage distance
Linkage Distance (cM) Average r2
<0.1 0.465
0.10 0.203
0.29 0.164
0.50 0.127
0.70 0.109
0.88 0.085
1.07 0.081
[0102] These results suggest that any two loci in very tight linkage have a high probability of being in a high level of LD. Therefore, an anonymous SNP that is in very tight linkage with a SNP found to be statistically associated with a dairy trait can, with high probability, also be found to be statistically associated with the same dairy trait, and that probability is generally an increasing function of LD. As a result, discovering SNPs in the neighboring regions is an efficient way to exploit similar SNP-trait associations. The threshold for the minimum level of LD for this approach depends on various factors, including the nature of the application, effect size, and sample size. Although not limited by this theory, a minimum of r2 = 0.1 is believed to be a conservative estimate of useful LD for breeding purposes. More preferably, r2 = 0.2 is expected to be a useful LD (Zondervan and Cardon, 2004).
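For illustration, the r2 measure of LD between two biallelic SNPs can be computed from haplotype and allele frequencies as sketched below. The formula is the standard definition of r2; the function names and the category labels are hypothetical, and the 0.1 and 0.2 cut-offs are simply the values discussed in the preceding paragraph.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


def ld_category(r2, minimum=0.1, preferred=0.2):
    """Label an r^2 value against the thresholds discussed in the text."""
    if r2 >= preferred:
        return "preferred"
    if r2 >= minimum:
        return "useful"
    return "below threshold"
```

Applied to the averages in Table 1, `ld_category(0.465)` falls in the preferred range, `ld_category(0.127)` is useful, and `ld_category(0.085)` falls below the conservative minimum.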
Mapping of SNPs to Bovine Sequence Assembly
[0103] SNPs were mapped to the bovine genomic sequence assembly by comparing SNP sequences with the assembled bovine genomic sequences obtained from ftp://ftp.hgsc.bcm.tmc.edu/pub/data/Btaurus/fasta/Btau20060815-freeze. Each SNP sequence was constructed by concatenating the left flank sequence, one of the SNP alleles (the 1st character of ALLELEs), and the right flank sequence. The sequences were compared against the bovine sequence assembly with BLAST (Linux megablast 2.2.15, run on a computer farm). The matches were then filtered to remove those with match length < 0.90 * seq_length. Furthermore, SNPs that matched ambiguous locations or unknown chromosomes were ignored in this study.
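The query construction and filtering steps above might be sketched as follows. This is only an illustration: the record layout of a hit, the "A/G" style of the ALLELEs field, the `"Un"` label for unknown chromosomes, and all function names are assumptions, and the megablast run itself is not reproduced.

```python
UNKNOWN_CHROMOSOME = "Un"  # assumed label for unplaced assembly sequence


def build_snp_query(left_flank, alleles, right_flank):
    """Concatenate left flank, the 1st character of ALLELEs, and right flank."""
    return left_flank + alleles[0] + right_flank


def passes_length_filter(match_length, seq_length, min_fraction=0.90):
    """Keep a match only if match length >= 0.90 * seq_length."""
    return match_length >= min_fraction * seq_length


def unique_placements(hits, seq_lengths):
    """Return {snp_id: (chromosome, position)} for unambiguously placed SNPs.

    hits: iterable of (snp_id, chromosome, position, match_length)
    seq_lengths: {snp_id: length of the query sequence}
    """
    passing = {}
    for snp_id, chrom, pos, match_length in hits:
        if passes_length_filter(match_length, seq_lengths[snp_id]):
            passing.setdefault(snp_id, []).append((chrom, pos))
    placed = {}
    for snp_id, locations in passing.items():
        if len(locations) != 1:
            continue  # ambiguous location
        chrom, pos = locations[0]
        if chrom == UNKNOWN_CHROMOSOME:
            continue  # unknown chromosome
        placed[snp_id] = (chrom, pos)
    return placed
```

For a 41-base query, a match of 37 bases passes the filter (37 >= 36.9) while a match of 36 bases does not.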
Bovine Physical Map Distance vs. Genetic Map Distance
[0104] To calculate the average physical map distance corresponding to 1 cM of genetic distance, we used those SNP markers on the bovine genetic map that could be placed on the bovine sequence assembly. For each chromosome, the distance between the SNPs placed at the two extreme ends was taken as the physical map distance on the physical map and as the genetic map distance on the genetic map (unmapped assembly sequences were ignored for the purpose of this study).
Table 2. Bovine Physical Map Distance vs. Genetic Map Distance
[0105] In Table 2, "P/G" is calculated as the physical distance over the genetic distance for each chromosome (Chromosome-wide P/G), and an average of physical distance over genetic distance across all chromosomes yields the Genome-wide P/G. Based on the data presented in Table 2, a genetic distance of 1 cM is assumed to be equivalent to about 702660 bp in physical map distance.
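The P/G calculation can be illustrated with a short sketch. The chromosome spans in the usage note are made-up numbers, not values from Table 2, and the sketch assumes that the genome-wide figure is the unweighted mean of the chromosome-wide ratios, which is one possible reading of the text.

```python
BP_PER_CM = 702660  # genome-wide P/G stated in the text


def chromosome_pg(physical_bp, genetic_cm):
    """Chromosome-wide P/G: physical span (bp) over genetic span (cM)."""
    return physical_bp / genetic_cm


def genome_wide_pg(spans):
    """Unweighted mean of chromosome-wide P/G values (an assumed reading).

    spans: {chromosome: (physical_bp, genetic_cm)}
    """
    ratios = [chromosome_pg(bp, cm) for bp, cm in spans.values()]
    return sum(ratios) / len(ratios)


def cm_to_bp(genetic_cm, bp_per_cm=BP_PER_CM):
    """Convert a genetic distance to an approximate physical distance."""
    return genetic_cm * bp_per_cm
```

With two hypothetical chromosomes spanning (100,000,000 bp, 125 cM) and (60,000,000 bp, 100 cM), the chromosome-wide ratios are 800,000 and 600,000 bp/cM and their mean is 700,000 bp/cM. Under the stated 702660 bp/cM, 0.1 cM corresponds to about 70266 bp.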
Markers in close Proximity to Anchor Markers
[0106] Public SNPs in close proximity to the SNPs that we have identified as significantly associated with traits, including Daughter Pregnancy Rate, Milk Yield and Composition, Net Merit, Productive Life, and Somatic Cell Score, were identified by their physical map location. If an Anchor Marker is located at position z, those public SNPs located between z-70266 and z+70266 are deemed to be in close proximity to the Anchor Marker. These markers, which are closely associated with the anchor markers, are called "neighboring markers" or "neighboring SNPs".
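A minimal sketch of the window test for neighboring SNPs follows. The data layout and function name are hypothetical, the window bounds are treated as inclusive (the text does not say), and the same-chromosome requirement is taken from the claims.

```python
WINDOW_BP = 70266


def neighboring_snps(anchor, public_snps, window=WINDOW_BP):
    """Return sorted IDs of public SNPs within +/- window bp of an anchor marker.

    anchor: (chromosome, position z)
    public_snps: {snp_id: (chromosome, position)}
    """
    chrom, z = anchor
    return sorted(
        snp_id
        for snp_id, (snp_chrom, pos) in public_snps.items()
        if snp_chrom == chrom and z - window <= pos <= z + window
    )
```

For an anchor at position 1,000,000, the window runs from 929,734 to 1,070,266 on the same chromosome.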
Genes in close Proximity to Anchor Markers
[0107] Bovine genes were retrieved from the file "Bos_taurus.Btau_3.1.43.pep.known.fa.gz" on the Ensembl FTP site at "ftp.ensembl.org/pub/current_bos_taurus/data/fasta/pep/". In total, 18654 entries were included in the file, with protein sizes ranging from 8 to 23992. The Ensembl genes are annotated by the Ensembl automatic analysis pipeline using either a GeneWise model from a species-specific or vertebrate protein, a set of aligned species-specific cDNAs followed by GenomeWise for ORF prediction, or GENSCAN exons supported by protein, cDNA and EST evidence. GeneWise models are further combined with available aligned cDNAs to annotate UTRs (www.ensembl.org/Homo_sapiens/helpview?se=1;ref=;kw=geneview). The positions of these genes on the bovine genome were retrieved from the FASTA sequence header lines. For example, the header line ">ENSP00000328693 pep:novel chromosome:NCBI35:1:904515:910768:1 gene:ENSG00000158815:transcript:ENST00000328693" gives the genome mapping as 904,515 bp to 910,768 bp on chromosome 1 of the bovine genome (build NCBI35, the same as Btau_3.1). The stable transcript identifier is used as the gene name in this application.
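The header parsing described above might look like the following sketch. The regular expression is an assumption based solely on the example header line quoted in the text; real Ensembl headers may carry additional fields, and the function name is hypothetical.

```python
import re

# Pattern modeled on the single example header quoted in the text.
HEADER_RE = re.compile(
    r"^>(?P<protein>\S+).*?"
    r"chromosome:(?P<build>[^:\s]+):(?P<chrom>[^:\s]+):"
    r"(?P<start>\d+):(?P<end>\d+):(?P<strand>-?\d+).*?"
    r"transcript:(?P<transcript>[^:\s]+)"
)


def parse_header(line):
    """Extract build, chromosome, start, end, strand and transcript ID."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError("unrecognized header line")
    fields = match.groupdict()
    for key in ("start", "end", "strand"):
        fields[key] = int(fields[key])
    return fields
```

Applied to the example header, this yields chromosome "1", start 904515, end 910768, and transcript "ENST00000328693", the identifier used as the gene name.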
[0108] Genes in close proximity to "anchor markers" (the SNPs that we have identified as significantly associated with traits, including Daughter Pregnancy Rate, Milk Yield and Composition, Net Merit, Productive Life, and Somatic Cell Score) were identified by their physical map location. If an anchor marker is located at position z, those bovine genes with any part located between z-70266 and z+70266 are deemed to be in close proximity to the anchor marker. These genes, which are closely associated with the anchor markers, are called "neighboring genes". Markers within these genes, such as for example SNPs, which are closely associated with the anchor markers, are called "neighboring markers" or "neighboring SNPs".
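Because a gene qualifies when any part of it falls inside the window, the test is an interval overlap rather than a point test. A minimal sketch, with hypothetical names and data layout and with inclusive bounds assumed:

```python
def gene_overlaps_window(gene_start, gene_end, z, window=70266):
    """True if any part of [gene_start, gene_end] lies within z +/- window."""
    return gene_start <= z + window and gene_end >= z - window


def neighboring_genes(anchor, genes, window=70266):
    """Return sorted names of genes overlapping the anchor marker's window.

    anchor: (chromosome, position z)
    genes: {gene_name: (chromosome, start, end)}
    """
    chrom, z = anchor
    return sorted(
        name
        for name, (gene_chrom, start, end) in genes.items()
        if gene_chrom == chrom and gene_overlaps_window(start, end, z, window)
    )
```

A gene that starts outside the window but ends inside it still counts as a neighboring gene under this reading.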
Example 2: Use of single nucleotide polymorphisms to improve offspring traits
[0109] To improve the average genetic merit of a population for a chosen trait, one or more of the markers with significant association to that trait can be used in selection of breeding animals. In the case of each discovered locus, use of animals possessing a marker allele (or a haplotype of multiple marker alleles) in population-wide LD with a favorable QTL allele will increase the breeding value of animals used in breeding, increase the frequency of that QTL allele in the population over time and thereby increase the average genetic merit of the population for that trait. This increased genetic merit can be disseminated to commercial populations for full realization of value.
[0110] For example, a progeny-testing scheme could greatly improve its rate of genetic progress or graduation success rate via the use of markers for screening juvenile bulls. Typically, a progeny testing program would use pedigree information and performance of relatives to select juvenile bulls as candidates for entry into the program with an accuracy of approximately 0.5. However, by adding marker information, young bulls could be screened and selected with much higher accuracy. In this example, DNA samples from potential bull mothers and their male offspring could be screened with a genome-wide set of markers in linkage disequilibrium with QTL, and the bull-mother candidates with the best marker profile could be contracted for matings to specific bulls. If superovulation and embryo transfer (ET) is employed, a set of 5-10 offspring could be produced per bull mother per flush procedure. Then the marker set could again be used to select the best male offspring as a candidate for the progeny test program. If genome-wide markers are used, it was estimated that accuracies of marker selection could reach as high as 0.85 (Meuwissen et al., 2001). This additional accuracy could be used to greatly improve the genetic merit of candidates entering the progeny test program, thereby increasing the probability of successfully graduating marketable progeny-tested bulls. This information could also be used to reduce program costs by decreasing the number of juvenile bull candidates tested while maintaining the same number of successful graduates. In the extreme, very accurate marker breeding values (MBV) could be used to directly market semen from juvenile sires without the need for progeny testing at all. Because juveniles could now be marketed starting at puberty instead of at 4.5 to 5 years, the generation interval could be reduced by more than half and rates of gain could increase by as much as 68.3% (Schrooten et al., 2005). With the elimination of the need for progeny testing, the cost of genetic improvement for the artificial insemination industry would be vastly reduced (Schaeffer, 2006).
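The dependence of genetic progress on selection accuracy and generation interval can be illustrated with the standard breeder's equation, in which annual gain is selection intensity times accuracy times additive genetic standard deviation, divided by generation interval. This equation is not stated in the present application and is used here only as an assumed textbook model; the function name and the input values in the usage note are hypothetical, and the sketch does not attempt to reproduce the figures cited above.

```python
def annual_genetic_gain(intensity, accuracy, sigma_a, generation_interval):
    """Breeder's equation: expected genetic gain per year.

    intensity: standardized selection intensity
    accuracy: correlation between estimated and true breeding value
    sigma_a: additive genetic standard deviation of the trait
    generation_interval: average age of parents at birth of offspring (years)
    """
    if generation_interval <= 0:
        raise ValueError("generation interval must be positive")
    return intensity * accuracy * sigma_a / generation_interval
```

With all else held equal, raising accuracy from 0.5 to 0.85 multiplies the expected annual gain by 1.7, and halving the generation interval doubles it.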
[0111] In an alternate example, a centralized or dispersed genetic nucleus (GN) population of cattle could be maintained to produce juvenile bulls for use in progeny testing or direct sale on the basis of MBVs. A GN herd of 1000 cows could be expected to produce roughly 3000 offspring per year, assuming the top 10-15% of females were used as ET donors in a multiple-ovulation and embryo-transfer (MOET) scheme. However, markers could change the effectiveness of MOET schemes and in vitro embryo production. Previously, MOET nucleus schemes have proven to be promising from the standpoint of extra genetic gain, but the costs of operating a nucleus herd together with the limited information on juvenile animals have limited widespread adoption. However, with marker information, juveniles can be selected much more accurately than before, resulting in greatly reduced generation intervals and boosted rates of genetic response. This is especially true in MOET nucleus herd schemes because, previously, breeding values of full-sibs would be identical, but with marker information the best full-sib can be identified early in life. The marker information would also help limit inbreeding because less selection pressure would be placed on pedigree information and more on individual marker information. An early study (Meuwissen and van Arendonk, 1992) found advantages of up to 26% additional genetic gain when markers were employed in nucleus herd scenarios, whereas the benefit in regular progeny testing was much less.
[0112] Together with MAS, female selection could also become an important source of genetic improvement, particularly if markers explain substantial amounts of genetic variation. Further efficiencies could be gained by marker testing of embryos prior to implantation (Bredbacka, 2001). This would allow considerable selection to occur on embryos, such that embryos with inferior marker profiles could be discarded prior to implantation and before recipient costs are incurred. This would again increase the cost effectiveness of nucleus herds because embryo pre-selection would allow equal progress to be made with a smaller nucleus herd. Alternatively, this presents further opportunities for preselection prior to bulls entering progeny test, with rates of genetic response predicted to be up to 31% faster than conventional progeny testing (Schrooten et al., 2005).
[0113] The first step in using a SNP for estimation of breeding value and selection in the GN is collection of DNA from all offspring that will be candidates for selection as breeders in the GN or as breeders in other commercial populations (in the present example, the 3,000 offspring produced in the GN each year). One method is to capture shortly after birth a small bit of ear tissue, hair sample, or blood from each calf into a labeled (bar-coded) tube. The DNA extracted from this tissue can be used to assay an essentially unlimited number of SNP markers and the results can be included in selection decisions before the animal reaches breeding age.
[0114] One method for incorporating into selection decisions the markers (or marker haplotypes) determined to be in population-wide LD with valuable QTL alleles (see Example 1) is based on classical quantitative genetics and selection index theory (Falconer and Mackay, 1996; Dekkers and Chakraborty, 2001). To estimate the effect of the marker in the population targeted for selection, a random sample of animals with phenotypic measurements for the trait of interest can be analyzed with a mixed animal model with the marker fitted as a fixed effect or as a covariate (regression of phenotype on number of allele copies). Results from either method of fitting marker effects can be used to derive the allele substitution effects, and in turn the breeding value of the marker:
$\alpha_1 = q[a + d(q - p)]$ [Equation 3]
$\alpha_2 = -p[a + d(q - p)]$ [Equation 4]
$\alpha = a + d(q - p)$ [Equation 5]
$g_{A_1A_1} = 2(\alpha_1)$ [Equation 6]
$g_{A_1A_2} = (\alpha_1) + (\alpha_2)$ [Equation 7]
$g_{A_2A_2} = 2(\alpha_2)$ [Equation 8]
where $\alpha_1$ and $\alpha_2$ are the average effects of alleles 1 and 2, respectively; $\alpha$ is the average effect of allele substitution; p and q are the frequencies in the population of alleles 1 and 2, respectively; a and d are additive and dominance effects, respectively; $g_{A_1A_1}$, $g_{A_1A_2}$ and $g_{A_2A_2}$ are the (marker) breeding values for animals with marker genotypes A1A1, A1A2 and A2A2, respectively. The total trait breeding value for an animal is the sum of breeding values for each marker (or haplotype) considered and the residual polygenic breeding value:
$EBV_i = \sum_{j=1}^{n} g_j + U_i$ [Equation 9]
where $EBV_i$ is the Estimated Trait Breeding Value for the ith animal, $\sum g_j$ is the marker breeding value summed from j = 1 to n, where n is the total number of markers (haplotypes) under consideration, and $U_i$ is the polygenic breeding value for the ith animal after fitting the marker genotype(s).
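Equations 3 through 9 can be worked through numerically with a short sketch. The function names and the genotype labels are hypothetical, and the parameter values in the usage note are invented for illustration.

```python
def allele_effects(a, d, p):
    """Average allele effects (Equations 3 to 5).

    a, d: additive and dominance effects; p: frequency of allele 1.
    Returns (alpha_1, alpha_2, alpha).
    """
    q = 1 - p
    alpha = a + d * (q - p)       # Equation 5
    return q * alpha, -p * alpha, alpha  # Equations 3 and 4


def marker_breeding_values(a, d, p):
    """Marker breeding values per genotype (Equations 6 to 8)."""
    alpha_1, alpha_2, _ = allele_effects(a, d, p)
    return {
        "A1A1": 2 * alpha_1,
        "A1A2": alpha_1 + alpha_2,
        "A2A2": 2 * alpha_2,
    }


def estimated_breeding_value(genotypes, markers, polygenic_bv):
    """Equation 9: sum of marker breeding values plus the polygenic value.

    genotypes: genotype label per marker, e.g. ["A1A1", "A1A2"]
    markers: (a, d, p) per marker, in the same order
    """
    total = polygenic_bv
    for genotype, (a, d, p) in zip(genotypes, markers):
        total += marker_breeding_values(a, d, p)[genotype]
    return total
```

For a = 4, d = 2 and p = 0.25, the substitution effect is 5, the allele effects are 3.75 and -1.25, and the genotype breeding values are 7.5, 2.5 and -2.5. Weighted by the genotype frequencies p², 2pq and q², these average to zero, as expected for breeding values expressed as deviations from the population mean.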
[0115] These methods can readily be extended to estimate breeding values for selection candidates for multiple traits, the breeding value for each trait including information from multiple markers (haplotypes), all within the context of selection index theory and specific breeding objectives that set the relative importance of each trait. Other methods also exist for optimizing marker information in estimation of breeding values for multiple traits, including random models that account for recombination between markers and QTL (e.g., Fernando and Grossman, 1989), and the potential inclusion of all discovered marker information in whole-genome selection (Meuwissen et al., Genetics 2001). Through any of these methods, the markers reported herein that have been determined to be in population-wide LD with valuable QTL alleles may be used to provide greater accuracy of selection, greater rate of genetic improvement, and greater value accumulation in the dairy industry.
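As a simple illustration of combining several traits under a breeding objective, a linear index can weight each trait's estimated breeding value by its relative importance. The sketch below is an assumption-laden example: the function names, the weights and the trait values are invented, and no particular index used in the industry is implied.

```python
def selection_index(trait_ebvs, weights):
    """Weighted sum of trait EBVs; weights express relative trait importance."""
    return sum(weights[trait] * trait_ebvs[trait] for trait in weights)


def rank_candidates(candidates, weights):
    """Return candidate names ordered from highest to lowest index value.

    candidates: {name: {trait: EBV}}
    """
    return sorted(
        candidates,
        key=lambda name: selection_index(candidates[name], weights),
        reverse=True,
    )
```

A negative weight can be given to a trait such as SCS where lower values are favorable, so that candidates with lower breeding values for that trait rank higher.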
Example 3: Identification of SNPs
[0116] A nucleic acid sequence contains a SNP of the present invention if it comprises at least 20 consecutive nucleotides that include and/or are adjacent to a polymorphism described in Tables 3, 4, 5, and 6 and the Sequence Listing. Alternatively, a SNP of the present invention may be identified by a shorter stretch of consecutive nucleotides which include or are adjacent to a polymorphism which is described in Tables 3, 4, 5, and 6 and the Sequence Listing in instances where the shorter sequence of consecutive nucleotides is unique in the bovine genome. A SNP site is usually characterized by the consensus sequence in which the polymorphic site is contained, the position of the polymorphic site, and the various alleles at the polymorphic site. "Consensus sequence" means DNA sequence constructed as the consensus at each nucleotide position of a cluster of aligned sequences.
[0117] Consensus sequence can be based on either strand of DNA at the locus, and states the nucleotide base of either one of each SNP allele in the locus and the nucleotide bases of all Indels in the locus, or both SNP alleles using degenerate code (IUPAC code: M for A or C; R for A or G; W for A or T; S for C or G; Y for C or T; K for G or T; V for A or C or G; H for A or C or T; D for A or G or T; B for C or G or T; N for A or C or G or T). Additional codes that we use include I for "-" or A; O for "-" or C; E for "-" or G; L for "-" or T; where "-" means a deletion. Thus, although a consensus sequence may not be a copy of an actual DNA sequence, a consensus sequence is useful for precisely designing primers and probes for actual polymorphisms in the locus.
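The degenerate code above can be represented as a lookup table, as in the following sketch. The IUPAC entries and the four indel codes are taken from the paragraph above; the function names are hypothetical, and enumerating all expansions is only practical for sequences with few degenerate positions.

```python
from itertools import product

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}
# Additional codes defined in the text; "-" means a deletion.
INDEL_CODES = {"I": "-A", "O": "-C", "E": "-G", "L": "-T"}
CODE_TABLE = {**IUPAC, **INDEL_CODES}


def alleles_for(code):
    """Return the set of alleles represented by a single code letter."""
    return set(CODE_TABLE[code.upper()])


def expand_consensus(sequence):
    """List every concrete sequence a consensus sequence can represent."""
    choices = [sorted(alleles_for(code)) for code in sequence]
    return [
        "".join(base for base in combination if base != "-")
        for combination in product(*choices)
    ]
```

For example, `expand_consensus("agyt")` gives the two allelic sequences "AGCT" and "AGTT", and a consensus containing the code I yields one sequence with the A deleted and one with it present.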
[0118] Such SNPs have a nucleic acid sequence having at least 90% sequence identity, more preferably at least 95% or even more preferably for some alleles at least 98% and in many cases at least 99% sequence identity, to the sequence of the same number of nucleotides in either strand of a segment of animal DNA which includes or is adjacent to the polymorphism. The nucleotide sequence of one strand of such a segment of animal DNA may be found in a sequence in the group consisting of SEQ ID NO:1 through SEQ ID NO:262,149. It is understood by the very nature of polymorphisms that for at least some alleles there will be no identity at the polymorphic site itself. Thus, sequence identity can be determined for sequence that is exclusive of the polymorphism sequence. The polymorphisms in each locus are described in the tables and the sequence listing.
[0119] Shown below are examples of public bovine SNPs that match each other:
SNP ss38333809 was determined to be the same as ss38333810 because 41 bases (with the polymorphic site at the middle) from each sequence match one another perfectly (match length=41, identity=100%).
ss38333809 : tcttacacatcaggagatagytccgaggtggatttctacaa
             |||||||||||||||||||||||||||||||||||||||||
ss38333810 : tcttacacatcaggagatagytccgaggtggatttctacaa
ss38333809 is SEQ ID NO:262146
ss38333810 is SEQ ID NO:262147
[0120] SNP ss38333809 was determined to be the same as ss38334335 because 41 bases (with the polymorphic site at the middle) from each sequence match one another at all bases except for one base (match length=41, identity=97%).
ss38333809 : tcttacacatcaggagatagytccgaggtggatttctacaa
             |||||||||||||||||| ||||||||||||||||||||||
ss38334335 : tcttacacatcaggagatggytccgaggtggatttctacaa
ss38333809 is SEQ ID NO:262146
ss38334335 is SEQ ID NO:262149
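The match length and identity figures quoted in these examples can be checked with a short sketch. The function name is hypothetical, and the comparison is a plain position-by-position check of two equal-length sequences, not the megablast alignment itself.

```python
def compare_aligned(seq_a, seq_b):
    """Return (match length, identity fraction, mismatch positions).

    Both sequences must have the same length; positions are 0-based.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    mismatches = [
        index
        for index, (base_a, base_b) in enumerate(zip(seq_a.lower(), seq_b.lower()))
        if base_a != base_b
    ]
    identity = (len(seq_a) - len(mismatches)) / len(seq_a)
    return len(seq_a), identity, mismatches


SS38333809 = "tcttacacatcaggagatagytccgaggtggatttctacaa"
SS38333810 = "tcttacacatcaggagatagytccgaggtggatttctacaa"
SS38334335 = "tcttacacatcaggagatggytccgaggtggatttctacaa"
```

Comparing ss38333809 with ss38334335 gives 40 matching bases out of 41, i.e. an identity of about 97.6%, which the text reports as 97%; the single mismatch lies two bases before the polymorphic site.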
Example 4: Quantification of and genetic evaluation for production traits
[0121] Production traits can be quantified by measuring a cow's milk yield and milk composition at each milking, or only at certain time intervals. In the USDA yield evaluation, the milk production data are collected by Dairy Herd Improvement Associations (DHIA) using ICAR-approved methods. Genetic evaluation includes all cows with a known sire and a first calving in 1960 or later, and pedigree from birth year 1950 on. Lactations shorter than 305 days are extended to 305 days. All records are preadjusted for effects of age at calving, month of calving, times milked per day, previous days open, and heterogeneous variance. Genetic evaluation is conducted using the single-trait BLUP repeatability model. The model includes fixed effects of management group (herd x year x season plus register status), parity x age, and inbreeding, and random effects of permanent environment and herd by sire interaction. PTAs are estimated and published four times a year (February, May, August, and November). PTAs are calculated relative to a five-year stepwise base, i.e., as a difference from the average of all cows born in the current year minus five (5) years. Bull PTAs are published estimating daughter performance for bulls having at least 10 daughters with valid lactation records.
Example 5: Quantification of reproductive traits in daughters (cows) and sires' PTAs
[0122] Reproductive capability is quantified and genetically evaluated using traits such as calving ease (CE), occurrence of stillbirths (SB) and daughter pregnancy rate (DPR). Calving ease measures the ability of a particular cow (daughter) to calve easily. CE is scored by the owner on a scale of 1 to 5, 1 meaning no problems encountered or unobserved birth and 5 meaning extreme difficulty. The CE PTAs for sires are expressed as percent difficult births in primiparous daughter heifers (%DBH), where difficult births are those scored as requiring considerable force or being extremely difficult (4 or 5 on a five-point scale). SB is scored by the owner on a scale of 1 to 3, 1 meaning the calf was born alive and was alive 48 h postpartum, 2 meaning the calf was born dead, and 3 indicating the calf was born alive but died within 48 h postpartum. SB scores of 2 and 3 are combined into a single category for evaluation. The SB PTAs for sires are expressed as percent stillbirths in daughter heifers (%SBH), where stillborn calves are those scored as dead at birth or born alive but died within 48 h of birth (2 or 3 on a three-point scale). Pregnancy rate is a function of the number of days open, which is the number of days between calving and a successful breeding. DPR is defined as the percentage of nonpregnant cows (daughters) that become pregnant during each 21-day period. A DPR PTA of "1" implies that daughters of this bull are 1% more likely to become pregnant during that estrus cycle than daughters of a bull with a DPR PTA of zero.
Example 6: Quantification of and genetic evaluation for productive life (PL)
[0123] Productive life (PL) is defined as the length of time a cow remains in a milking herd before removal by voluntary or involuntary culling (due to health or fertility problems), or death. PL is usually measured as the number of days, months, or days in milk (DM) from the first calving to the day the cow exits the herd (due to death, culling, or sale for non-dairy purposes). Because some cows are still alive at the time of data collection, their records are projected (VanRaden and Klaaskate, 1993) or treated as censored (Ducrocq, 1987). The USDA genetic evaluation for PL includes all cows with first calving in 1960 and later (born in 1950 and later for the pedigree). Cows born at least 3 years prior to evaluation, with a valid sire ID and first lactation records, are considered. PL is considered to be completed at 7 years of age. Records are extended for cows that have not had the opportunity to reach 7 years of age because they are still alive, were sold for dairy purposes, or the herd discontinued testing. Cows sold for dairy purposes or in herds that discontinued testing receive extended records if they had the opportunity to reach 3 years of age; otherwise their records are discarded. The method of genetic evaluation is a single-trait BLUP animal model. The statistical model includes effects of management group (based on herd of first lactation and birth date) and sire by herd interaction. Sires' PTAs for PL are calculated relative to a five-year stepwise base, i.e., as a difference from the average PL of all cows born in the current year minus five (5) years.
Example 7: Quantification of somatic cell score in daughters (cows) and sires' PTAs
[0124] Somatic cell score (SCS) is quantified by calculating log2(SCC/100,000) + 3, where SCC is the number of somatic cells per milliliter of milk from a cow (daughter). The SCS PTAs for sires are expressed as a deviation from a SCS PTA of zero.
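The SCS formula can be expressed directly in code. The function name is hypothetical; the formula is the one stated in the paragraph above.

```python
import math


def somatic_cell_score(scc_per_ml):
    """SCS = log2(SCC / 100,000) + 3, with SCC in cells per milliliter."""
    if scc_per_ml <= 0:
        raise ValueError("SCC must be positive")
    return math.log2(scc_per_ml / 100_000) + 3
```

On this scale a count of 100,000 cells/mL scores 3, and each doubling of the cell count raises the score by one unit, so 200,000 cells/mL scores 4 and 50,000 cells/mL scores 2.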
REFERENCES
[0125] The references cited in this application, both above and below, are specifically incorporated herein by reference.
Non-Patent Literature
Abdel-Azim, G and Freeman, AE, (2002) J. Dairy Sci. 85:1869-1880.
Blott, S., Kim, J.J., Moisio, S., et al. (2003). Genetics 163:253-266.
Ciobanu, DC, Bastiaansen, JWM, Lonergan, SM, Thomsen, H, Dekkers, JCM, Plastow, GS, and Rothschild, MF, (2004) J. Anim. Sci. 82:2829-39.
Cohen-Zinder, M. et al. (2005) Genome Res. 15:936-44.
Davis, GP and DeNise, SK, (1998) J. Anim. Sci. 76:2331-39.
Dekkers, JCM, and Chakraborty, R, (2001) J. Anim. Sci. 79:2975-90.
Demars J, Riquet J, Feve K, Gautier M, Morisson M, Demeure O, Renard C, Chardon P, Milan D. (2006), BMC Genomics, 24:7-13.
Du and Hoeschele, (2000) Genetics 156:2051-62.
Ducrocq, V. 1987. An analysis of length of productive life in dairy cattle. Ph.D. Diss., Cornell Univ., Ithaca, NY; Univ. Microfilms Int., Ann Arbor, MI.
Everts-van der Wind A, Larkin DM, Green CA, Elliott JS, Olmstead CA, Chiu R, Schein JE, Marra MA, Womack JE, Lewin HA. (2005) Proc Natl Acad Sci U S A, 20; 102(51):18526-31.
Falconer, DS, and Mackay, TFC, (1996) Introduction to Quantitative Genetics. Harlow, UK: Longman.
Fernando, R, and Grossman, M, (1989) Marker assisted selection using best linear unbiased prediction. Genetics Selection Evolution 21:467-77.
Franco, MM, Antunes, RC, Silva, HD, and Goulart, LR (2005) J. Appl. Genet. 46(2):195-200.
Grisart, B. et al. (2002) Genome Res. 12:222-231.
Grosz, MD, Womack, JE, and Skow, LC (1992) Genomics, 14(4):863-868.
Hayes, B, and Goddard, ME, (2001) Genet. Sel. Evol. 33:209-229.
Hayes, B, Chamberlain, A.J., Goddard, M.E. (2006) Proc. 8th WCGALP 22:(16).
Kaminski, S, Ahman, A, Ruse, A, Wojcik, E, and Malewski, T (2005) J. Appl. Genet. 46(1):45-58.
Kuhn, C. et al. (2004). Genetics 167:1873-81.
Kwok PY, Methods for genotyping single nucleotide polymorphisms, (2001), Annu. Rev. Genomics Hum. Genet., 2:235-258.
Markstein M, Zinzen R, Markstein P, Yee KP, Erives A, Stathopoulos A, and Levine M. 2004. A regulatory code for neurogenic gene expression in the Drosophila embryo. Development 131, 2387-2394.
McDowell JC and Dean A. 1999. Structural and functional cross-talk between a distant enhancer and the epsilon-globin gene promoter shows interdependence of the two elements in chromatin. Mol Cell Biol. 19(11):7600-9.
Meuwissen, THE, and Van Arendonk, JAM, (1992) J. Dairy Sci. 75:1651-1659.
Meuwissen, THE, Hayes, BJ, and Goddard, ME, (2001) Genetics. 157:1819-29.
Rothschild and Plastow, (1999), AgBioTechNet 10:1-8.
Schaeffer, LR (2006) J. Anim. Breed. Genet. 123:218-223.
Schnabel, R. et al. (2005) PNAS 102:6896-6901.
Schrooten, C, Bovenhuis, H, van Arendonk, JAM, and Bijma, P (2005) J. Dairy Sci. 88:1569-1581.
Sharma, BS, Jansen, GB, Karrow, NA, Kelton, D, and Jiang, Z, (2006) J. Dairy Sci. 89:3653-3663.
Short, TH, et al. (1997) J. Anim. Sci. 75:3138-3142.
Spelman, RJ and Bovenhuis, H, (1998) Animal Genetics, 29:77-84.
Spelman, RJ and Garrick, DJ, (1998) J. Dairy Sci. 81:2942-2950.
Stearns, TM, Beever, JE, Southey, BR, Ellis, M, McKeith, FK and Rodriguez-Zas, SL, (2005) J. Anim. Sci. 83:1481-93.
Syvanen AC, Accessing genetic variation: genotyping single nucleotide polymorphisms, (2001) Nat. Rev. Genet. 2:930-942.
VanRaden, P.M. and E.J.H. Klaaskate. 1993. J. Dairy Sci. 76:2758-2764.
Verrier, E, (2001) Genet. Sel. Evol. 33:17-38.
Villanueva, B, Pong-Wong, R, Fernandez, J, and Toro, MA (2005) J. Anim. Sci. 83:1747-52.
Weller JI, Kashi Y, Soller M. (1990) J. Dairy Sci. 73:2525-37.
Williams, JL, (2005), Rev. Sci. Tech. Off. Int. Epiz. 24(1):379-391.
Windig, JJ, and Meuwissen, THE, (2004) J. Anim. Breed. Genet. 121:26-39.
Winter, A. et al. (2002). PNAS, 99:9300-9305.
Womack, J, (1987), Dev. Genet. 8(4):281-293.
Yasue H, Kiuchi S, Hiraiwa H, Ozawa A, Hayashi T, (2006), Cytogenet. Genome Res., 112(1-2):121-125.
Youngerman, SM, Saxton, AM, Oliver, SP, and Pighetti, GM, (2004) J. Dairy Sci. 87:2442-2448.
Zondervan, K and Cardon, L (2004) Nat Rev Genet. 2004 Feb; 5(2):89-100.
Patent Literature (Swine)
Patent Literature (Dairy)
[0126] Table 3: provides a list of regions in the first column, SEQ ID numbers of anchor markers in the second column, SEQ ID numbers for markers within the region in the third column, and trait association abbreviations in the fourth column. Columns five through eight, nine through twelve, and thirteen through sixteen contain similarly arranged information. The abbreviations used for the trait associations are as follows:
M = Milk production: MILK, FAT, FATPCT, PROT, PROTPCT
S = Somatic Cell Score: SCS
D = Daughter Pregnancy Rate: DPR
P = Productive Life: PL
N = Net Merit: NM
[0127] Table 4: provides a list of anchor marker SEQ ID numbers in the first column, gene names in the second column, SEQ ID numbers of SNPs within the gene in the third column, and trait association abbreviations in the fourth column. Columns five through eight, nine through twelve, and thirteen through sixteen contain similarly arranged information. The abbreviations used for the trait associations are as follows:
M = Milk production: MILK, FAT, FATPCT, PROT, PROTPCT
S = Somatic Cell Score: SCS
D = Daughter Pregnancy Rate: DPR
P = Productive Life: PL
N = Net Merit: NM
[0128] In order to reduce redundancy, the prefix of each gene name has been replaced with a single letter. For example, ENSBTAT00000033728 is represented as Z033728; where "Z" in gene name stands for "ENSBTAT00000".
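The gene-name abbreviation can be applied and reversed mechanically, as in this sketch. The function names are hypothetical; the prefix and the letter "Z" are those given in the paragraph above.

```python
PREFIX = "ENSBTAT00000"  # prefix replaced by the single letter "Z"


def abbreviate(transcript_id):
    """ENSBTAT00000033728 -> Z033728"""
    if not transcript_id.startswith(PREFIX):
        raise ValueError("unexpected transcript identifier")
    return "Z" + transcript_id[len(PREFIX):]


def expand(gene_name):
    """Z033728 -> ENSBTAT00000033728"""
    if not gene_name.startswith("Z"):
        raise ValueError("unexpected abbreviated gene name")
    return PREFIX + gene_name[1:]
```

Expanding a name from Table 4 in this way recovers the full stable transcript identifier.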
[0129] Table 5: provides a list of phenotypic traits and the assigned identification numbers of SNPs found to be associated with each trait. The left column provides a counter to allow easier reading of the table. The "Trait" column lists the following traits: "FITNESS", rows 1-2397; and "PRODUCTIVITY", rows 2398-5117.
[0130] Table 6: provides the SEQ ID NO of the sequence associated with each of the SNPs listed in Table 5. The "SNP POSITION" column provides the position (nucleotide number) of the SNP within the associated sequence (SEQ ID NO) and the "SNP ALLELE 1" and "SNP ALLELE 2" columns provide the identity of the two nucleotides that occur most frequently at the SNP POSITION within the population analyzed.
[0131] Table 7: provides a list of the SNP ID NOs and SEQ ID NOs, listed in Tables 5 and 6, sorted numerically according to SNP ID NO.
Claims
1. A method for allocating an animal for use according to the animal's predicted marker breeding value for one or more traits selected from the group consisting of milk production, somatic cell score, daughter pregnancy rate, productive life, and/or net merit, the method comprising: a. determining the animal's genotype at one or more locus/loci; wherein at least one locus contains a single nucleotide polymorphism (SNP), having at least two allelic variants; and wherein at least one SNP is: i) selected from the SNPs described in Table 3 and the Sequence Listing; and/or ii) SNPs located on the same chromosome and within 70 kilobases of one or more of the SNP anchor markers described in Table 3 and the Sequence Listing; b. analyzing the determined genotype of at least one evaluated animal at one or more SNPs selected from: i) the SNPs described in Table 3 and the Sequence Listing; and/or ii) SNPs located on the same chromosome and within 70 kilobases of one or more of the SNP anchor markers described in Table 3 and the Sequence Listing; c. correlating the analyzed genotype with one or more phenotypes using Table 3 and the Sequence Listing; and d. allocating the animal for use according to its determined genotype.
2. The method of claim 1 wherein the animal's genotype is evaluated at 10 or more loci that contain SNPs selected from: a. the SNPs described in Table 3 and the Sequence Listing; b. and/or SNPs located on the same chromosome and within 70 kilobases of one or more of the SNP anchor markers described in Table 3 and the Sequence Listing.
3. The method of claim 2 wherein the animal's genotype is evaluated at 50 or more loci.
4. The method of claim 2 wherein the animal's genotype is evaluated at 100 or more loci.
5. The method of claim 2 wherein the animal's genotype is evaluated at 200 or more loci.
6. The method of claim 2 wherein the animal's genotype is evaluated at 500 or more loci.
7. The method of any of claims 1 to 6 that comprises whole-genome analysis.
8. A method for selecting a potential parent animal for improvement of one or more traits selected from the group consisting of milk production, somatic cell score, daughter pregnancy rate, productive life, and net merit: a. determining at least one potential parent animal's genotype at one or more genomic locus/loci; wherein at least one locus contains a single nucleotide polymorphism (SNP) that has at least two allelic variants, and wherein at least one SNP is selected from: i) the SNPs described in Table 3 and the Sequence Listing; ii) and/or SNPs located on the same chromosome and within 70 kilobases of one or more of the SNP anchor markers described in Table 3 and the Sequence Listing; b. analyzing the determined genotype of at least one evaluated animal for one or more SNPs selected from: i) the SNPs described in Table 3; ii) and/or SNPs located on the same chromosome and within 70 kilobases of one or more of the SNP anchor markers described in Table 3 and the Sequence Listing; c. correlating the identified allelic variants with a milk production phenotype, somatic cell score phenotype, daughter pregnancy rate phenotype, productive life phenotype, and/or net merit phenotype, using the information provided in Table 3 and the Sequence Listing; and d. allocating at least one animal for breeding use based on its genotype.
9. The method of claim 8 wherein the potential parent animal's genotype is evaluated at 10 or more loci that contain SNPs selected from: a. the SNPs described in Table 3; and/or b. SNPs located on the same chromosome and within 70 kilobases of one or more of the SNP anchor markers described in Table 3 and the Sequence Listing.
10. The method of claim 8 wherein the potential parent animal's genotype is evaluated at 50 or more loci.
11. The method of claim 8 wherein the potential parent animal's genotype is evaluated at 100 or more loci.
12. The method of claim 8 wherein the potential parent animal's genotype is evaluated at 200 or more loci.
13. The method of claim 8 wherein the potential parent animal's genotype is evaluated at 500 or more loci.
14. The method of any of claims 8 to 13 that comprises whole-genome analysis.
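The correlate-then-allocate steps of claims 8 to 14 can be sketched as a simple ranking on a marker-based score. The claims do not state how a predicted marker breeding value is computed, so the additive model used here (allele copies multiplied by a per-allele effect and summed over loci) is an assumption, and all locus names, effect sizes, and animal names are hypothetical rather than values from Table 3.

```python
def marker_breeding_value(genotype, allele_effects):
    """Sum additive effects over genotyped loci.

    genotype: dict mapping locus name -> copies (0, 1, or 2) of the scored allele
    allele_effects: dict mapping locus name -> assumed effect per allele copy
    Loci that have no effect estimate are ignored.
    """
    return sum(copies * allele_effects[locus]
               for locus, copies in genotype.items()
               if locus in allele_effects)


def allocate_for_breeding(animals, allele_effects, top_n):
    """Return the names of the top_n animals ranked by marker breeding value."""
    ranked = sorted(
        animals,
        key=lambda name: marker_breeding_value(animals[name], allele_effects),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical effects and genotypes for illustration only.
effects = {"snp_1": 0.5, "snp_2": -0.25, "snp_3": 1.0}
animals = {
    "bull_A": {"snp_1": 2, "snp_2": 0, "snp_3": 1},  # 1.0 + 0.0 + 1.0 = 2.0
    "bull_B": {"snp_1": 1, "snp_2": 2, "snp_3": 0},  # 0.5 - 0.5 + 0.0 = 0.0
    "bull_C": {"snp_1": 0, "snp_2": 1, "snp_3": 2},  # 0.0 - 0.25 + 2.0 = 1.75
}
chosen = allocate_for_breeding(animals, effects, top_n=2)
print(chosen)
```

Ranking and taking the top animals is only one possible allocation rule; a fixed cutoff on the score would fit the claim language equally well.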
15. A method of producing progeny animals comprising: a) identifying at least one potential parent animal that has been allocated for breeding in accordance with the method of any of claims 1 to 7; b) producing progeny from the allocated animal through a process selected from the group consisting of: i) natural breeding; ii) artificial insemination; iii) in vitro fertilization; and
iv) collecting semen/spermatozoa or at least one ovum from the animal and contacting it, respectively, with ovum/ova or semen/spermatozoa from a second animal to produce a conceptus by any means.
16. The method of claim 15 comprising producing progeny through natural breeding.
17. The method of claim 15 comprising producing offspring through artificial insemination, embryo transfer, and/or in vitro fertilization.
18. A nucleic acid array for determining which allele of at least 100 SNPs associated with one or more traits selected from the group consisting of milk production, somatic cell score, daughter pregnancy rate, productive life, and net merit are present in a sample; wherein the array comprises 100 or more nucleic acid sequences capable of hybridizing, under stringent conditions, with 100 or more SNPs selected from: a. the SNPs described in Table 3 and the Sequence Listing; and/or b. SNP(s) located on the same chromosome and within 70 kilobases of one or more of the SNP anchor markers described in Table 3 and the Sequence Listing.
19. A method of identifying a quantitative trait locus associated with milk production, somatic cell score, daughter pregnancy rate, productive life, and/or net merit by identifying a genetic marker located on the same chromosome and within 70 kilobases of at least one SNP selected from the group of SNP anchor markers described in Table 3 and the Sequence Listing, the method comprising: a) selecting a SNP from the group of anchor SNPs described in Table 3 and the Sequence Listing, wherein said SNP is associated with at least one trait as described in Table 3; and b) identifying a genetic polymorphism within 70 kilobases of this SNP.
20. The method of claim 19 wherein the genetic polymorphism is a SNP.
21. The method of claim 19 wherein the genetic polymorphism is a causal mutation underlying a quantitative trait locus.
22. A method comprising evaluating an animal's genotype at 100 or more genomic locus/loci; wherein at least 100 loci each comprise a single nucleotide polymorphism (SNP) selected from: a. the SNPs described in Table 3 and the Sequence Listing; and/or b. SNPs located on the same chromosome and within 70 kilobases of one or more of the SNP anchor markers described in Table 3 and the Sequence Listing.
23. The method of claim 22 wherein said SNP is associated with at least one trait as described in Table 3.
24. The method of claim 22 wherein the animal's genotype is evaluated at 200 or more loci.
25. The method of claim 22 wherein the animal's genotype is evaluated at 500 or more loci.
26. The method of claim 22 that comprises whole-genome analysis.
27. A method for allocating an animal for use according to the animal's predicted marker breeding value for one or more traits selected from the group consisting of milk production, somatic cell score, daughter pregnancy rate, productive life, and/or net merit, the method comprising: a. determining the animal's genotype at one or more locus/loci; wherein at least one locus contains a single nucleotide polymorphism (SNP), having at least two allelic variants; and wherein at least one SNP is selected from: i) the SNPs described in Table 4 and the Sequence Listing; and/or ii) a SNP located in a gene described in Table 4; b. analyzing the determined genotype of at least one evaluated animal at one or more SNPs selected from: i) the SNPs described in Table 4 and the Sequence Listing; and/or ii) a SNP located in a gene described in Table 4; c. correlating the analyzed genotype with one or more phenotypes using Table 4 and the Sequence Listing; and d. allocating the animal for use according to its determined genotype.
28. The method of claim 27 wherein the animal's genotype is evaluated at 10 or more loci that contain SNPs selected from: a. the SNPs described in Table 4 and the Sequence Listing; and/or b. a SNP located in a gene described in Table 4.
29. The method of claim 28 wherein the animal's genotype is evaluated at 50 or more loci.
30. The method of claim 28 wherein the animal's genotype is evaluated at 100 or more loci.
31. The method of claim 28 wherein the animal's genotype is evaluated at 200 or more loci.
32. The method of claim 28 wherein the animal's genotype is evaluated at 500 or more loci.
33. The method of any of claims 27 to 32 that comprises whole-genome analysis.
34. A method for selecting a potential parent animal for improvement of one or more traits selected from the group consisting of milk production, somatic cell score, daughter pregnancy rate, productive life, and net merit: a. determining at least one potential parent animal's genotype at one or more genomic locus/loci; wherein at least one locus contains a single nucleotide polymorphism (SNP) that has at least two allelic variants, and wherein at least one SNP is selected from: i) the SNPs described in Table 4 and the Sequence Listing; and/or ii) a SNP located in a gene described in Table 4; b. analyzing the determined genotype of at least one evaluated animal for one or more SNPs selected from: i) the SNPs described in Table 4; and/or ii) a SNP located in a gene described in Table 4; c. correlating the identified allelic variants with a milk production phenotype, somatic cell score phenotype, daughter pregnancy rate phenotype, productive life phenotype, and/or net merit phenotype, using the information provided in Table 4 and the Sequence Listing; and d. allocating at least one animal for breeding use based on its genotype.
35. The method of claim 34 wherein the potential parent animal's genotype is evaluated at 10 or more loci that contain SNPs selected from: a. the SNPs described in Table 4; and/or b. a SNP located in a gene described in Table 4.
36. The method of claim 34 wherein the potential parent animal's genotype is evaluated at 50 or more loci.
37. The method of claim 34 wherein the potential parent animal's genotype is evaluated at 100 or more loci.
38. The method of claim 34 wherein the potential parent animal's genotype is evaluated at 200 or more loci.
39. The method of claim 34 wherein the potential parent animal's genotype is evaluated at 500 or more loci.
40. The method of any of claims 34 to 39 that comprises whole-genome analysis.
41. A method of producing progeny animals comprising: a) identifying at least one potential parent animal that has been allocated for breeding in accordance with the method of any of claims 27 to 34; b) producing progeny from the allocated animal through a process selected from the group consisting of: i) natural breeding; ii) artificial insemination; iii) in vitro fertilization; and iv) collecting semen/spermatozoa or at least one ovum from the animal and contacting it, respectively, with ovum/ova or semen/spermatozoa from a second animal to produce a conceptus by any means.
42. The method of claim 41 comprising producing progeny through natural breeding.
43. The method of claim 41 comprising producing offspring through artificial insemination, embryo transfer, and/or in vitro fertilization.
44. A nucleic acid array for determining which allele of at least 100 SNPs associated with one or more traits selected from the group consisting of milk production, somatic cell score, daughter pregnancy rate, productive life, and net merit are present in a sample; wherein the array comprises 100 or more nucleic acid sequences capable of hybridizing, under stringent conditions, with 100 or more SNPs selected from: a. the SNPs described in Table 4 and the Sequence Listing; and/or b. a SNP located in a gene described in Table 4.
45. A method of identifying a quantitative trait locus associated with milk production, somatic cell score, daughter pregnancy rate, productive life, and/or net merit by identifying a genetic marker located on the same chromosome and within 70 kilobases of at least one SNP selected from the group of SNP anchor markers described in Table 4 and the Sequence Listing, the method comprising: a) selecting a SNP from the group of anchor SNPs described in Table 4 and the Sequence Listing, wherein said SNP is associated with at least one trait as described in Table 4; b) identifying a gene within 70 kilobases of this SNP; and c) associating said gene with at least one trait described in Table 4.
46. The method of claim 45 further comprising identifying a genetic polymorphism within said gene.
47. The method of claim 46 wherein the genetic polymorphism is a causal mutation underlying a quantitative trait locus.
48. A method comprising evaluating an animal's genotype at 100 or more genomic locus/loci; wherein at least 100 loci each comprise a single nucleotide polymorphism (SNP) selected from: a. the SNPs described in Table 4 and the Sequence Listing; and/or b. a SNP located in a gene described in Table 4.
49. The method of claim 48 wherein said SNP is associated with at least one trait as described in Table 4.
50. The method of claim 48 wherein the animal's genotype is evaluated at 200 or more loci.
51. The method of claim 48 wherein the animal's genotype is evaluated at 500 or more loci.
52. The method of claim 48 that comprises whole-genome analysis.
53. A method for allocating an animal for use according to the animal's predicted marker breeding value for productivity and/or fitness, the method comprising: a. determining the animal's genotype at one or more locus/loci; wherein at least one locus contains a single nucleotide polymorphism (SNP), having at least two allelic variants; and wherein at least one SNP is selected from the SNPs described in Tables 5 and 6; b. analyzing the determined genotype of at least one evaluated animal at one or more SNPs selected from the SNPs described in Tables 5 and 6 to determine which allelic variant is present; c. allocating the animal for use according to its determined genotype.
54. The method of claim 53 wherein the animal's genotype is evaluated at 10 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
55. The method of claim 53 wherein the animal's genotype is evaluated at 50 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
56. The method of claim 53 wherein the animal's genotype is evaluated at 100 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
57. The method of claim 53 wherein the animal's genotype is evaluated at 200 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
58. The method of any of claims 53 to 57 wherein SNPs evaluated are associated with fitness.
59. The method of any of claims 53 to 57 wherein SNPs evaluated are associated with productivity.
60. The method of any of claims 53 to 57 that comprises whole-genome analysis.
61. A method for selecting a potential parent animal for breeding to improve fitness and/or productivity in potential offspring: a. determining at least one potential parent animal's genotype at one or more genomic locus/loci; wherein at least one locus contains a single nucleotide polymorphism (SNP) that has at least two allelic variants, and wherein at least one SNP is selected from the SNPs described in Tables 5 and 6; b. analyzing the determined genotype of at least one evaluated animal for one or more SNPs selected from the SNPs described in Tables 5 and 6 to determine which allele is present; c. correlating the identified allele with a fitness and/or productivity phenotype; d. allocating at least one animal for breeding use based on its genotype.
62. The method of claim 61 wherein the potential parent animal's genotype is evaluated at 10 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
63. The method of claim 61 wherein the potential parent animal's genotype is evaluated at 50 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
64. The method of claim 61 wherein the potential parent animal's genotype is evaluated at 100 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
65. The method of claim 61 wherein the potential parent animal's genotype is evaluated at 200 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
66. The method of any of claims 61 to 65 wherein the potential parent animal is selected to improve fitness in the potential offspring.
67. The method of any of claims 61 to 65 wherein the potential parent animal is selected to improve productivity in the potential offspring.
68. The method of any of claims 61 to 65 that comprises whole-genome analysis.
69. A method of producing progeny animals comprising: a) identifying at least one potential parent animal that has been allocated for breeding in accordance with the method of claim 1; b) producing progeny from the allocated animal through a process comprising: i) natural breeding; ii) artificial insemination; iii) in vitro fertilization; and/or iv) collecting semen/spermatozoa or at least one ovum from the animal and contacting it, respectively, with ovum/ova or semen/spermatozoa from a second animal to produce a conceptus by any means.
70. The method of claim 69 comprising producing progeny through natural breeding.
71. The method of claim 69 comprising producing offspring through artificial insemination, embryo transfer, and/or in vitro fertilization.
72. The method of claim 69 wherein the potential parent animal's genotype is evaluated at 10 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
73. The method of claim 69 wherein the potential parent animal's genotype is evaluated at 50 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
74. The method of claim 69 wherein the potential parent animal's genotype is evaluated at 100 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
75. The method of claim 69 wherein the potential parent animal's genotype is evaluated at 200 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
76. The method of any of claims 69 to 75 wherein the potential parent animal is selected to improve fitness in the offspring.
77. The method of any of claims 69 to 75 wherein the potential parent animal is selected to improve productivity in the offspring.
78. The method of any of claims 69 to 75 that comprises whole-genome analysis.
79. A nucleic acid array for determining which allele of at least 10 of the SNPs described by Tables 5 and 6 are present in a sample; wherein the array comprises 10 or more nucleic acid sequences capable of hybridizing, under stringent conditions, with 10 or more SNPs selected from the group consisting of the SNPs described by Tables 5 and 6.
80. The array of claim 79 wherein the array is capable of determining which allele is present for each of 25 or more SNPs selected from the group consisting of the SNPs described by Tables 5 and 6.
81. The array of claim 79 wherein the array is capable of determining which allele is present for each of 50 or more SNPs selected from the group consisting of the SNPs described by Tables 5 and 6.
82. The array of claim 79 wherein the array is capable of determining which allele is present for each of 100 or more SNPs selected from the group consisting of the SNPs described by Tables 5 and 6.
83. The array of any of claims 79 to 82 for determining which alleles are present for SNPs associated with fitness.
84. The array of any of claims 79 to 82 for determining which alleles are present for SNPs associated with productivity.
85. A method of identifying a quantitative trait locus associated with production or fitness by identifying a genetic marker in allelic association with at least one SNP selected from the group of SNPs described in Tables 5 and 6, the method comprising: a) identifying a genetic marker Bi suspected of being in allelic association with a marker Ai selected from the group of SNPs described in Tables 5 and 6; b) determining whether Ai and Bi are in allelic association; wherein allelic association exists if r² > 0.2 for Equation 1 for a population sample of at least 100 animals and wherein Equation 1 is:
$$r^2 = \frac{\left[f(A_iB_i) - f(A_i)\,f(B_i)\right]^2}{f(A_i)\,\bigl(1 - f(A_i)\bigr)\,f(B_i)\,\bigl(1 - f(B_i)\bigr)}$$
and wherein Ai represents an allele of a SNP described in Tables 5 and 6; Bi represents a genetic marker at another locus; f(AiBi) denotes the frequency of having both Ai and Bi; f(Ai) is the frequency of Ai in the population; and f(Bi) is the frequency of Bi in the population.
86. The method of claim 85 wherein the genetic marker Bi is a SNP.
87. The method of claim 85 wherein the genetic marker identified is in linkage disequilibrium with at least one SNP selected from the group of SNPs described in Tables 5 and 6.
88. The method of claim 85 wherein Bi is a causal mutation underlying a quantitative trait locus.
89. The method of claim 85 wherein r² > 0.5.
90. The method of claim 85 wherein r² > 0.9.
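The allelic-association test of claims 85, 89, and 90 can be sketched numerically. The sketch assumes Equation 1 is the conventional squared allelic correlation, r² = [f(AiBi) - f(Ai)f(Bi)]² / [f(Ai)(1 - f(Ai)) f(Bi)(1 - f(Bi))], built from the three frequencies the claim defines. The function names and the example frequencies are hypothetical.

```python
def r_squared(f_ab: float, f_a: float, f_b: float) -> float:
    """Squared allelic correlation between allele Ai and marker allele Bi.

    f_ab: frequency of carrying both Ai and Bi
    f_a:  frequency of Ai in the population
    f_b:  frequency of Bi in the population
    """
    denom = f_a * (1.0 - f_a) * f_b * (1.0 - f_b)
    if denom <= 0.0:
        raise ValueError("both loci must be polymorphic (0 < frequency < 1)")
    d = f_ab - f_a * f_b  # deviation from independence of the two alleles
    return d * d / denom


def in_allelic_association(f_ab, f_a, f_b, n_animals,
                           threshold=0.2, min_animals=100):
    """Apply the r² threshold, requiring a sample of at least min_animals."""
    return n_animals >= min_animals and r_squared(f_ab, f_a, f_b) > threshold


# Hypothetical frequencies for illustration only.
r2 = r_squared(f_ab=0.4, f_a=0.5, f_b=0.5)
print(round(r2, 2))                                               # 0.36
print(in_allelic_association(0.4, 0.5, 0.5, n_animals=120))       # True at the 0.2 threshold
print(in_allelic_association(0.4, 0.5, 0.5, 120, threshold=0.5))  # False at the 0.5 threshold
```

Passing `threshold=0.5` or `threshold=0.9` mirrors the stricter conditions of claims 89 and 90.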
91. The method of any of claims 85 to 90 for identifying a genetic marker in allelic association with a SNP associated with fitness.
92. The method of any of claims 85 to 90 for identifying a genetic marker in allelic association with a SNP associated with productivity.
93. The method of claim 91 wherein the genetic marker is in linkage disequilibrium with a SNP associated with fitness.
94. The method of claim 92 wherein the genetic marker is in linkage disequilibrium with a SNP associated with productivity.
95. The method of claim 91 wherein the identified genetic marker is a causative mutation.
96. The method of claim 92 wherein the identified genetic marker is a causative mutation.
97. A method comprising evaluating an animal's genotype at 10 or more genomic locus/loci; wherein at least 10 loci each comprise a single nucleotide polymorphism (SNP) selected from the SNPs described in Tables 5 and 6.
98. The method of claim 97 wherein the animal's genotype is evaluated at 50 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
99. The method of claim 98 wherein the animal's genotype is evaluated at 100 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
100. The method of claim 97 wherein the animal's genotype is evaluated at 200 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
101. The method of claim 97 wherein the animal's genotype is evaluated at 500 or more loci that contain SNPs selected from the SNPs described in Tables 5 and 6.
102. The method of any of claims 97 to 101 wherein the SNPs are associated with fitness.
103. The method of any of claims 97 to 101 wherein the SNPs are associated with productivity.
104. The method of any of claims 97 to 101 that comprises whole-genome analysis.
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US84854106P | 2006-09-29 | 2006-09-29 | |
US60/848,541 | 2006-09-29 | ||
US91909907P | 2007-03-20 | 2007-03-20 | |
US60/919,099 | 2007-03-20 | ||
US93168007P | 2007-05-24 | 2007-05-24 | |
US60/931,680 | 2007-05-24 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2008140467A2 (en) | 2008-11-20 |
WO2008140467A3 (en) | 2016-06-09 |
Family
ID=40002777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2007/021187 WO2008140467A2 (en) | 2006-09-29 | 2007-09-28 | Genetic markers and methods for improving dairy productivity and fitness traits |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2008140467A2 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2010020252A1 (en) * | 2008-08-19 | 2010-02-25 | Viking Genetics Fmba | Methods for determining a breeding value based on a plurality of genetic markers |
EP2178363A2 (en) * | 2007-07-16 | 2010-04-28 | Pfizer, Inc. | Methods of improving a genomic marker index of dairy animals and products |
CN103576829A (en) * | 2012-08-01 | 2014-02-12 | 复旦大学 | Hybrid genetic algorithm based dynamic cloud-computing virtual machine scheduling method |
EP3204498A4 (en) * | 2014-10-08 | 2018-03-07 | Dow AgroSciences LLC | Gho/sec24b2 and sec24b1 nucleic acid molecules to control coleopteran and hemipteran pests |
CN108324405A (en) * | 2018-02-05 | 2018-07-27 | 杭州博古科技有限公司 | One boar cultivates artificial insemination pregnancy rate evaluation system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
AU2002224229B2 (en) * | 2000-10-31 | 2007-11-29 | Wouter Herman Robert Coppieters | Marker assisted selection of bovine for improved milk production using diacylglycerol acyltransferase gene DGAT1 |
WO2005030789A1 (en) * | 2003-09-23 | 2005-04-07 | The Arizona Board Of Regents On Behalf Of The University Of Arizona | Adrenergic receptor snp for improved milking characteristics |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2178363A2 (en) * | 2007-07-16 | 2010-04-28 | Pfizer, Inc. | Methods of improving a genomic marker index of dairy animals and products |
EP2178363A4 (en) * | 2007-07-16 | 2010-07-21 | Pfizer | Methods of improving a genomic marker index of dairy animals and products |
WO2010020252A1 (en) * | 2008-08-19 | 2010-02-25 | Viking Genetics Fmba | Methods for determining a breeding value based on a plurality of genetic markers |
CN103576829A (en) * | 2012-08-01 | 2014-02-12 | 复旦大学 | Hybrid genetic algorithm based dynamic cloud-computing virtual machine scheduling method |
EP3204498A4 (en) * | 2014-10-08 | 2018-03-07 | Dow AgroSciences LLC | Gho/sec24b2 and sec24b1 nucleic acid molecules to control coleopteran and hemipteran pests |
CN108324405A (en) * | 2018-02-05 | 2018-07-27 | 杭州博古科技有限公司 | One boar cultivates artificial insemination pregnancy rate evaluation system |
Also Published As
Publication number | Publication date |
---|---|
WO2008140467A3 (en) | 2016-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20110123983A1 (en) | Methods of Using Genetic Markers and Related Epistatic Interactions | |
US20100324356A1 (en) | Methods for improving genetic profiles of dairy animals and products | |
US20110262909A1 (en) | Genetic Markers for Horned and Polled Cattle and Related Methods | |
US20200375156A1 (en) | Genetic markers and uses therefor | |
JP2020074781A (en) | Method of breeding cows for improved milk yield | |
WO2008140467A2 (en) | Genetic markers and methods for improving dairy productivity and fitness traits | |
US20110054246A1 (en) | Whole genome scan to discover quantitative trai loci (qtl) affecting growth, body composition, and reproduction in maternal pig lines | |
EP2178363A2 (en) | Methods of improving a genomic marker index of dairy animals and products | |
WO2009055805A2 (en) | Genetic markers and methods for improving swine genetics | |
WO2008024227A2 (en) | Genetic markers and methods for improving swine genetics | |
WO2008089108A2 (en) | Sire early selection for male fertility using single nucleotide polymorphisms (snps) of the dazl gene | |
RU2754039C2 (en) | Method for predicting resistance | |
JP7465485B2 (en) | DNA marker for use in determining risk of developing mastitis and method for determining risk of mastitis using the same | |
Dash et al. | Exploring haplotype block structure, runs of homozygosity, and effective population size among dairy cattle breeds of India | |
CN111154893A (en) | SNP (single nucleotide polymorphism) marker related to pig growth speed and application thereof | |
AU2013204384A1 (en) | Genetic markers for horned and polled cattle and related methods |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 07874069; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase in: | Ref country code: DE |
| | 122 | Ep: pct app. not ent. europ. phase | Ref document number: 07874069; Country of ref document: EP; Kind code of ref document: A2 |