WO2006101623A2

WO2006101623A2 - Cstf1 and c20orf43 markers for meat quality and growth rate in animals

Info

Publication number: WO2006101623A2
Application number: PCT/US2006/005214
Authority: WO
Inventors: Max F. Rothschild; Antonio Marcos Costa Do Amaral Ramos
Original assignee: Iowa State University Research Foundation, Inc.
Priority date: 2005-03-23
Filing date: 2006-02-15
Publication date: 2006-09-28
Also published as: WO2006101623A3

Abstract

Disclosed herein is fine mapping of a quantitative trait locus on Chromosome 17 which is associated with meat traits, growth and fatness. The quantitative trait locus correlates with several major effect genes which have phenotypic correlations with animal growth and meat quality which may be used for marker assisted breeding. Specific polymorphic alleles of the CSTF1 and C20orf43 genes are disclosed for tests to screen animals to determine those more likely to produce desired traits.

Description

TITLE: CSTF1 And C20orf43 Markers For Meat Quality And Growth Rate In Animals

GRANT REFERENCE CLAUSE

This invention was funded in part by USDA/CSREES Grant No. 2004-31100-06019, USDA/CSREES Grant No. 2003-31100-06019, and USDA/CSREES Grant No. 2002-31100-06019. The government may have certain rights in this invention.

FIELD OF THE INVENTION

This invention relates generally to the detection of genetic differences among animals. More particularly, the invention relates to genetic variation that is indicative of heritable phenotypes associated with higher meat quality and growth and fat deposition. Methods and compositions for use of specific genetic markers and chromosomal regions associated with the variation in genotyping of animals and selection are also disclosed.

BACKGROUND OF THE INVENTION

Researchers have found that quantitative trait phenotypes are continuously distributed in natural populations, due to segregation of alleles at multiple genes in different regions. These quantitative trait loci (QTL), combined with differences in environmental sensitivity of QTL alleles, affect the phenotypes. Determining the genetic and environmental bases of variation for quantitative traits is important for human health, agriculture, and the study of evolution. However, complete genetic dissection of quantitative traits is currently feasible only in genetically tractable and well characterized model systems (Mackay, Nat. Rev. Genet. 2:11-20 (2001); Wright et al., Genome Biol. 2:2007.1-2007.8 (2001)). For example, the number of genes involved in quantitative genetic variation, the number and effects of individual alleles at these genes, and the gene action are generally unknown. To date, genes and causal variants have been detected for very few quantitative traits. Examples include double-muscling in cattle (Grobet et al., Mamm. Genome 9:210-213 (1998)), alteration in fruit size (Frary et al., Science 289:85-88 (2000)), growth and performance traits in pigs (Kim et al., Mamm. Genome 11:131-135 (2000)), excess glycogen content in pig skeletal muscle (Ciobanu et al., Evidence for New Alleles in the Protein Kinase Adenosine Monophosphate-Activated γ3-Subunit Gene Associated With Low Glycogen Content in Pig Skeletal Muscle and Improved Meat Quality, Genetics 159:1151-1162 (2001); Milan et al., Science 288:1248-1251 (2000)), and increased ovulation and litter size in sheep (Wilson et al., Biol. Reprod. 64:1225-1235 (2001)). The effects of the mutations in the majority of these examples are so large that the phenotypes segregate almost as Mendelian traits.

To understand and exploit the genetics of complex quantitative traits, experimental populations derived from two lines differing widely for traits of interest have been successfully used in model species (Belknap et al., Behav. Genet. 23:213-222 (1993); Talbot et al., Nat. Genet. 21:305-308 (1999)), plants (Paterson et al., Nature 335:721-726 (1988)), and livestock (Andersson et al., Science 263:1771-1774 (1994)) to detect quantitative trait loci (QTL). These studies have succeeded in mapping QTL for which alleles differ in frequency between the parental populations, for example, between commercial agricultural cultivars and wild-type populations (Paterson et al., Nature 335:721-726 (1988); Andersson et al., Science 263:1771-1774 (1994)). In addition to understanding the architecture of quantitative traits, crosses involving agricultural species are also motivated by the potential to exploit variation within elite populations; commercial plant and animal populations are usually not based upon the same crosses that are used in the QTL detection studies, but the power of linkage studies in line crosses is generally greater than that of studies within populations. In commercial pig breeding populations, for example, elite populations comprise closed outbred populations that have been subjected to selection over a number of generations to improve their commercial performance, whereas wild boar (Andersson et al., Science 263:1771-1774 (1994)) and Chinese Meishan (Walling et al., Anim. Genet. 29:415-424 (1998); De Koning et al., Genetics 152:1679-1690 (1999); De Koning et al., Proc. Natl. Acad. Sci. USA 97:7947-7950 (2000); Bidanel et al., Genet. Sel. Evol. 33:289-309 (2001)) populations have often been employed in QTL studies. The implicit hypothesis in many QTL studies using divergent lines is that knowledge of between-population genetic variation can be extrapolated to genetic variation in other populations or species.

Segregation at QTL in commercial populations can be utilized by breeders through gene- or marker-assisted selection programs (e.g., Dekkers and Hospital, Nat. Rev. Genet. 3:22-32 (2002)). Selection for meat and fat production in pigs, for example, has taken place for centuries, but intense selection using modern statistical methods has been practiced for only about the past 50 years (Clutter, A. C., and E. W. Brascamp, 1998 Genetics of performance traits, pp. 427-462 in The Genetics of the Pig, edited by M. F. Rothschild and A. Ruvinsky. CAB International, Wallingford, UK).

Until recently, it has been impracticable to identify the genes that are responsible for variation in continuous traits, or to directly observe the effects of their different alleles. But now, the abundance of genetic markers has made it possible to identify quantitative trait loci (QTL), that is, the regions of a chromosome or individual sequence variants that are responsible for trait variation (Barton et al., Nat. Rev. Genet. 3:11-21 (2002)). To the extent that genes are conserved among species and animals, it is expected that the different alleles will also correlate with variability in certain gene(s) as well as in economic or meat-producing animal species such as cattle, sheep, chicken, etc. There are instances of conserved polymorphisms among species. For example, Nonneman et al. recently discovered a polymorphism in exon 2 of the porcine TBG gene that results in the amino acid change of the consensus histidine to an asparagine. This SNP resides in the ligand-binding domain of the mature polypeptide, and the Meishan allele is the conserved allele found in human, bovine, sheep and rodent TBG. Mutations in this region of human TBG result in decreased heat stability and affinity for ligand. Functional studies indicate altered binding characteristics of the TBG isoforms. Nonneman et al., Plant & Animal Genomes XII Conference, "Functional Validation of A Polymorphism for Testis Size on the Porcine X Chromosome", January 10-14, 2004, Town & Country Convention Center, San Diego, CA. Additionally, Winter et al. found that increased milk fat content in different breeds is strongly associated with a lysine at position 232 of the protein encoded by bovine DGAT. An alignment of DGAT1 amino acid sequences of different plant and animal species indicates a conserved lysine residue at position 232 of the bovine sequence. Winter et al., Proc. Natl. Acad. Sci. USA 99(14):9300-9305 (2002).

Furthermore, a conserved mutation in the MATP gene has been identified, which causes the cream coat color in the horse. This conserved mutation was also described in mice and humans, but not in medaka. Mariat et al., Genet. Sel. Evol. Jan-Feb;35(1):119-33 (2003). There have also been instances of conservation of a gene across species. Many genes involved in fundamental biological processes have been conserved as species have evolved, i.e., many genes are similar in different species. The MC1R gene has been indicated to be a well-conserved gene having no other fundamental function besides pigmentation. In several species, mutations in the MC1R gene have been shown to cause the dominant expression of black pigment. Klungland et al., Pigmentary Switches in Domestic Animal Species, Annals of the New York Academy of Sciences 994:331-338 (2003). A specific protein-DNA interaction was found to be blocked by a single base pair change in the binding site of glucocorticoid receptor protein (GCR). Moreover, it is reported that all three putative domains (the steroid binding, immunoreactive, and DNA binding) have been conserved between two divergent species, pig and rat. Marks et al., J. Steroid Biochem. Jun;24(6):1097-103 (1986).

An example of a conserved gene order is demonstrated by Seroude et al. (Mammalian Genomics, Jun; 10(6) 565-8 (1999)), wherein a radiation hybrid map was constructed of the Chromosome 15q2.3-q2.6 region containing the RN gene, which has large effects on glycogen content in muscle and meat quality. Ten microsatellites and eight genes were mapped. They found that the relative order of genes AE3 and INHA was inverted on the porcine physical map in comparison with the mouse linkage map, but the order of other genes already mapped in the mouse was identical to pigs. Moreover, they found no clear difference between the gene order in pig Chromosome 15 and human Chromosome 2q. Based on the evolutionary link and comparative genomics of animals, it can be determined whether the variation in a gene is or is likely to be associated with a functional trait between closely linked species.

Indeed, the best approach to genetically improve economic traits is to find relevant chromosomal regions and then genetic markers directly in the population under selection. Phenotypic measurements can be performed continuously on some animals from the nucleus population of breeding organizations. These phenotypic data are collected in order to enable the detection of relevant genetic markers, and to validate markers identified using experimental populations or to test candidate genes. Not all genes have an easily identifiable common functional variant that can be exploited in association studies, and in many cases researchers have identified only changes in individual nucleotides (i.e., single nucleotide polymorphisms (SNPs)) that have no known functional significance. Nevertheless, SNPs are potentially useful in narrowing a linkage region within a chromosome. In addition, a SNP may show a statistically significant association with a quantitative trait if it is located within or near the gene affecting that trait, by virtue of linkage disequilibrium.
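By way of illustration only, the notion of linkage disequilibrium between a marker SNP and a nearby trait locus can be sketched numerically. The following Python sketch computes two common summary measures, the coefficient D and the squared correlation r², from haplotype counts at two biallelic loci. The function name, the argument names and the example counts are hypothetical and are not taken from the data of this disclosure.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci.

    The four arguments are counts of the haplotypes AB, Ab, aB and ab,
    where A/a are the alleles at the first locus and B/b at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")

    p_AB = n_AB / total            # frequency of the AB haplotype
    p_A = (n_AB + n_Ab) / total    # frequency of allele A
    p_B = (n_AB + n_aB) / total    # frequency of allele B

    # D measures the departure of the observed haplotype frequency
    # from the frequency expected if the loci were independent.
    d = p_AB - p_A * p_B

    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("a locus is monomorphic; r^2 is undefined")
    return d, d * d / denom


# Hypothetical counts: the marker allele always travels with the trait allele.
d, r2 = linkage_disequilibrium(50, 0, 0, 50)
print(d, r2)
```

An r² near 1 indicates that the marker genotype is nearly fully informative about the linked locus, whereas an r² near 0 indicates that the marker carries little information about it.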

Significant markers or genes can then be included directly in the selection process. An advantage of molecular information is that it can be obtained when the breeding animal is still very young, which means that animals can be preselected based on DNA markers before the growing performance test is completed. This is a great advantage for the overall testing and selection system.

Polymorphisms hold promise for use as genetic markers in determining which genes contribute to multigenic or quantitative traits; suitable markers and suitable methods for exploiting those markers are beginning to be brought to bear on the genes related to growth and meat quality. It can be seen from the foregoing that a need exists for identification of genetic variation associated with, or in linkage disequilibrium with, genomic regions, which may be used to improve economically beneficial characteristics in animals by identifying and selecting animals with the improved characteristics at the genetic level.

Another object of the invention is to identify a genetic locus in which the variation present has a quantitative effect on a phenotypic trait of interest to breeders.

Another object of the invention is to provide a specific assay for determining the presence of such genetic variation.

A further object of the invention is to provide a method of evaluating animals that increases accuracy of selection and breeding methods for desired traits. Yet another object of the invention is to provide a PCR amplification test to greatly expedite the determination of presence of the marker(s) of such quantitative trait variation.

Additional objects and advantages of the invention will be set forth in part in the description that follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objects and advantages of the invention will be attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.

BRIEF SUMMARY OF THE INVENTION

The methods of the present invention comprise the use of nucleic acid markers genetically linked to loci associated with economically important traits. The markers are used in genetic mapping of genetic material of animals to be used in and/or which have been developed in a breeding program, allowing for marker-assisted selection to identify or to move traits into elite germplasm. The invention relates to the discovery of genetic variation in genomic regions associated with or in linkage disequilibrium or otherwise genetically linked therewith that may be used to predict phenotypic traits in animals. According to an embodiment of the invention, specific regions of chromosome 17 have been fine mapped and shown to be quantitative trait loci for various traits. Namely, the region of chromosome 17 at 90.4-92.9 cM, more specifically 92.4 to 92.6 cM, has been identified as a quantitative trait locus for meat quality, fatness and growth traits. Other regions of chromosome 17 have also been shown to be polymorphic and useful as markers, as shown in WO2005001032, which is hereby incorporated herein in its entirety by reference. The instant application identifies different regions within the chromosome, namely at about 90.4-92.9 cM, to be useful as markers. More specific regions within this area have been identified for growth, meat quality and fatness. Further, several genes located in this region have been shown to be polymorphic and thus useful as genetic markers for these QTL. These include cleavage stimulation factor, 3' pre-RNA, subunit 1, 50 kDa (hereinafter "CSTF1") and Chromosome 20 open reading frame 43 (hereinafter "C20orf43"; the name derives from human chromosome 20). To the extent that these genes are conserved among species and animals, it is expected that the different alleles disclosed herein will also correlate with variability in these gene(s) in other economic or meat-producing animals such as cattle, sheep, chicken, etc.

An embodiment of the invention is a method of identifying an allele that is associated with meat quality, fatness and growth traits comprising obtaining a tissue or body fluid sample from an animal; amplifying DNA present in said sample comprising a region 90.4-92.9 cM, more specifically, 92.4 to 92.6 cM, of pig chromosome 17 linked to a nucleotide sequence which encodes CSTF1 and/or C20orf43; and detecting the presence of a polymorphic variant of said nucleotide sequences wherein said variant is associated with phenotypic variation in meat quality. Another embodiment of the invention is a method of determining a genetic marker which may be used to identify and select animals based upon their meat quality or growth traits comprising obtaining a sample of tissue or body fluid from said animals, said sample comprising DNA; amplifying DNA present in said sample in the region of chromosome 17, said region comprising a nucleotide sequence which encodes upon expression CSTF1 and/or C20orf43 present in said sample from a first animal; determining the presence of a polymorphic allele present in said sample by comparison of said sample with a reference sample or sequence; correlating variability for growth, fatness and/or meat quality in said animals with said polymorphic allele; so that said allele may be used as a genetic marker for the same in a given group, population, or species.

Yet another embodiment of the invention is a method of identifying an animal for its propensity for growth, fatness and/or meat quality traits, said method comprising obtaining a nucleic acid sample from said animal, and determining the presence of an allele characterized by a polymorphism in a CSTF1 and/or C20orf43 sequence present in said sample, or a polymorphism in linkage disequilibrium therewith, said genotype being one which is or has been shown to be significantly associated with a trait indicative of growth or meat quality.

Additional embodiments are set forth in the Detailed Description of the Invention and in the Examples.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows the fine mapping of the QTL at chromosome 17 according to the invention.

Figure 2 shows the expected RFLP pattern for the Taa I digestion of CSTF1.

Figure 3 shows the expected RFLP pattern for the Mwo I digestion of C20orf43.

Figure 4 shows the sequence amplified by the CSTF1 RFLP test; the polymorphism is shown in bold.

Figure 5 shows the sequence amplified by the C20orf43 RFLP test; the 22 base insertion/deletion is shown in bold.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Genetic markers closely linked to important genes may be used to indirectly select for favorable alleles more efficiently than direct phenotypic selection (Lande and Thompson 1990). Therefore, it is of particular importance, both to the animal breeder and to farmers who grow and sell animals as a cash crop, to identify, through genetic mapping, the quantitative trait loci (QTL) for various economically valuable traits such as growth, meat quality and fatness. Knowing the QTLs associated with these traits, animal breeders will be better able to breed animals which possess the desired genotypic and phenotypic characteristics. To achieve the objectives and in accordance with the purpose of the invention, as embodied and broadly described herein, the present invention provides the discovery of alternate chromosomal regions and genotypes which provide a method for genetically typing animals and screening animals to determine those more likely to possess favorable growth, less fat deposition and favorable meat quality traits, or to select against animals which have alleles indicating less favorable growth, greater fatness and/or poorer meat quality traits. As described herein, the effect on a trait such as meat quality may be demonstrated through the use of any of a number of particular identifiers, such as pH, ham Minolta, or drip loss, but the invention is not so limited. As used herein, the use of any particular indicia of the phenotypic traits of growth (e.g., ADG, lifetime daily gain, weight at slaughter, etc.), fatness (e.g., 10th rib fat, average back fat, lean meat percentage, lumbar fat, etc.) or meat quality (e.g., loin pH, drip loss, ham Minolta) shall be interpreted to include all indicia for which variability is associated with the disclosed allele with respect to meat quality or growth or fatness.

As used herein a "favorable growth, fatness, or meat quality trait" means a significant improvement (increase or decrease) in one of any measurable indicia of growth, fatness, or meat quality above the mean of a given animal, group, line or population which has the alternate allele form, so that this information can be used in breeding to achieve a uniform group, line or population which is optimized for these traits. This may include an increase in some traits or a decrease in others depending on the desired characteristics. For a review of economic traits and some examples of art accepted measurements, the following may be consulted: Sosnicki, A. A., E.R. Wilson, E.B. Sheiss, A. deVries, 1998 "Is there a cost effective way to produce high quality pork?", Reciprocal Meat Conference Proceedings, Vol. 51.

Methods for assaying for these traits generally comprise the steps of 1) obtaining a biological sample from an animal; and 2) analyzing the genomic DNA or protein obtained in 1) to determine which allele(s) is/are present. Haplotype data, which allow a series of linked polymorphisms to be combined in a selection or identification protocol to maximize the benefits of each of these markers, may also be used and are contemplated by this invention. In another embodiment, the invention comprises a method for identifying genetic markers for growth, fatness and meat quality. Once a major effect gene has been identified, it is expected that other variation present in the same gene, allele or in sequences in useful linkage disequilibrium therewith may be used to identify similar effects on these traits without undue experimentation. The identification of other such genetic variation, once a major effect gene has been discovered, represents no more than routine screening and optimization of parameters well known to those of skill in the art and is intended to be within the scope of this invention.

The following terms are used to describe the sequence relationships between two or more nucleic acids or polynucleotides: (a) "reference sequence", (b) "comparison window", (c) "sequence identity", (d) "percentage of sequence identity", and (e) "substantial identity".

(a) As used herein, "reference sequence" is a defined sequence used as a basis for sequence comparison; in this case, the Reference sequences. A reference sequence may be a subset or the entirety of a specified sequence; for example, as a segment of a full-length cDNA or gene sequence, or the complete cDNA or gene sequence.

(b) As used herein, "comparison window" includes reference to a contiguous and specified segment of a polynucleotide sequence, wherein the polynucleotide sequence may be compared to a reference sequence and wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. Generally, the comparison window is at least 20 contiguous nucleotides in length, and optionally can be 30, 40, 50, 100, or longer. Those of skill in the art understand that to avoid a high similarity to a reference sequence due to inclusion of gaps in the polynucleotide sequence, a gap penalty is typically introduced and is subtracted from the number of matches.

Methods of alignment of sequences for comparison are well known in the art. Optimal alignment of sequences for comparison may be conducted by the local homology algorithm of Smith and Waterman, Adv. Appl. Math. 2:482 (1981); by the homology alignment algorithm of Needleman and Wunsch, J. Mol. Biol. 48:443 (1970); by the search for similarity method of Pearson and Lipman, Proc. Natl. Acad. Sci. 85:2444 (1988); by computerized implementations of these algorithms, including, but not limited to: CLUSTAL in the PC/Gene program by Intelligenetics, Mountain View, California; GAP, BESTFIT, BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Dr., Madison, Wisconsin, USA; the CLUSTAL program is well described by Higgins and Sharp, Gene 73:237-244 (1988); Higgins and Sharp, CABIOS 5:151-153 (1989); Corpet, et al., Nucleic Acids Research 16:10881-90 (1988); Huang, et al., Computer Applications in the Biosciences 8:155-65 (1992), and Pearson, et al., Methods in Molecular Biology 24:307-331 (1994). The BLAST family of programs which can be used for database similarity searches includes: BLASTN for nucleotide query sequences against nucleotide database sequences; BLASTX for nucleotide query sequences against protein database sequences; BLASTP for protein query sequences against protein database sequences; TBLASTN for protein query sequences against nucleotide database sequences; and TBLASTX for nucleotide query sequences against nucleotide database sequences. See, Current Protocols in Molecular Biology, Chapter 19, Ausubel, et al., Eds., Greene Publishing and Wiley-Interscience, New York (1995).

Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using the BLAST 2.0 suite of programs using default parameters. Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997). Software for performing BLAST analyses is publicly available, e.g., through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).

This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=-4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff (1989) Proc. Natl. Acad. Sci. USA 89:10915).
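The extension step described above may be easier to follow with a small example. The Python sketch below is a simplified illustration, not the BLAST implementation: it extends an ungapped word hit in one direction only (to the right), scores matches and mismatches with M=5 and N=-4 as stated above, and stops under the three halting conditions listed. The function name, the argument names and the value chosen for X are assumptions made for the illustration.

```python
def extend_right(query, subject, q_start, s_start, word_len,
                 match=5, mismatch=-4, x_drop=20):
    """Extend an ungapped word hit to the right.

    Returns (best_score, best_length), where best_length counts the
    aligned positions, starting at the seed, that give the best score.
    """
    def pair(i):
        # Score one aligned pair of residues at offset i from the seed start.
        return match if query[q_start + i] == subject[s_start + i] else mismatch

    # Score the seed word itself.
    score = sum(pair(i) for i in range(word_len))
    best_score, best_len = score, word_len

    i = word_len
    # Halting condition 3: the end of either sequence is reached.
    while q_start + i < len(query) and s_start + i < len(subject):
        score += pair(i)
        if score > best_score:
            best_score, best_len = score, i + 1
        # Halting condition 1: score fell by X from its maximum.
        # Halting condition 2: cumulative score dropped to zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
        i += 1
    return best_score, best_len


# Hypothetical sequences: eight matching bases followed by mismatches.
print(extend_right("ACGTACGTAAAA", "ACGTACGTCCCC", 0, 0, word_len=4, x_drop=10))
```

In this hypothetical example the extension keeps the first eight matching positions and gives up once the running score has fallen by at least X below the best score seen. A full implementation would also extend to the left and would handle both strands.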

In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Natl. Acad. Sci. USA 90:5873-5787 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.

BLAST searches assume that proteins can be modeled as random sequences. However, many real proteins comprise regions of nonrandom sequences which may be homopolymeric tracts, short-period repeats, or regions enriched in one or more amino acids. Such low-complexity regions may be aligned between unrelated proteins even though other regions of the protein are entirely dissimilar. A number of low-complexity filter programs can be employed to reduce such low-complexity alignments. For example, the SEG (Wooten and Federhen, Comput. Chem., 17:149-163 (1993)) and XNU (Claverie and States, Comput. Chem., 17:191-201 (1993)) low-complexity filters can be employed alone or in combination.

(c) As used herein, "sequence identity" or "identity" in the context of two nucleic acid or polypeptide sequences includes reference to the residues in the two sequences which are the same when aligned for maximum correspondence over a specified comparison window. When percentage of sequence identity is used in reference to proteins it is recognized that residue positions which are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Sequences which differ by such conservative substitutions are said to have "sequence similarity" or "similarity". Means for making this adjustment are well known to those of skill in the art. Typically this involves scoring a conservative substitution as a partial rather than a full mismatch, thereby increasing the percentage sequence identity. Thus, for example, where an identical amino acid is given a score of 1 and a non-conservative substitution is given a score of zero, a conservative substitution is given a score between zero and 1. The scoring of conservative substitutions is calculated, e.g., according to the algorithm of Meyers and Miller, Computer Applic. Biol. Sci., 4:11-17 (1988), e.g., as implemented in the program PC/GENE (Intelligenetics, Mountain View, California, USA).
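The partial-mismatch scoring described in (c) can be sketched as follows. This Python example is an illustration only and is not the algorithm of Meyers and Miller: the grouping of amino acids into conservative classes and the score of 0.5 assigned to a conservative substitution are assumptions chosen for the example, and the two sequences are assumed to be already aligned, of equal length, and free of gaps.

```python
# Illustrative (assumed) classes of chemically similar amino acids.
CONSERVATIVE_GROUPS = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FYW"),   # aromatic
    set("KR"),    # basic
    set("DE"),    # acidic
    set("ST"),    # hydroxyl-containing
    set("NQ"),    # amide-containing
]


def pair_score(a, b, conservative_score=0.5):
    """Score 1 for identity, a partial score for a conservative
    substitution, and 0 for a non-conservative substitution."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in CONSERVATIVE_GROUPS):
        return conservative_score
    return 0.0


def adjusted_identity(seq1, seq2, conservative_score=0.5):
    """Percent identity adjusted upwards for conservative substitutions."""
    if not seq1 or len(seq1) != len(seq2):
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(pair_score(a, b, conservative_score) for a, b in zip(seq1, seq2))
    return 100.0 * total / len(seq1)


# Hypothetical peptides: K->R is scored as conservative, V->A is not.
print(adjusted_identity("MKLV", "MRLA"))
```

For these hypothetical four-residue peptides the unadjusted identity is 50%, since two of four positions are identical, and the single conservative substitution raises the adjusted value.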

(d) As used herein, "percentage of sequence identity" means the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity.
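The calculation in (d) can be illustrated with a short Python sketch. It assumes the two sequences have already been optimally aligned by one of the programs mentioned above, that they are supplied as equal-length strings, and that gaps are written as "-"; the function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of sequence identity over a comparison window.

    Counts positions where both aligned sequences carry the same
    residue, divides by the total number of positions in the window
    (gapped positions included), and multiplies by 100.
    """
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical 10-position window with one gap and one mismatch.
print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))
```

Here eight of the ten positions in the window carry the identical base, so the gap and the mismatch each count against the percentage.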

(e)(I) The term "substantial identity" of polynucleotide sequences means that a polynucleotide comprises a sequence that has at least 70% sequence identity, preferably at least 80%, more preferably at least 90% and most preferably at least 95%, compared to a reference sequence using one of the alignment programs described using standard parameters. One of skill will recognize that these values can be appropriately adjusted to determine corresponding identity of proteins encoded by two nucleotide sequences by taking into account codon degeneracy, amino acid similarity, reading frame positioning and the like. Substantial identity of amino acid sequences for these purposes normally means sequence identity of at least 60%, or preferably at least 70%, 80%, 90%, and most preferably at least 95%. These programs and algorithms can ascertain the analogy of a particular polymorphism in a target gene to those disclosed herein. It is expected that this polymorphism will exist in other animals and use of the same in other animals than disclosed herein involved no more than routine optimization of parameters using the teachings herein. It is also possible to establish linkage between specific alleles of alternative DNA markers and alleles of DNA markers known to be associated with a particular gene (e.g., the genes discussed herein), which have previously been shown to be associated with a particular trait. Thus, in the present situation, taking one or both of the genes, it would be possible, at least in the short term, to select for animals likely to produce desired traits, or alternatively against animals likely to produce less desirable traits indirectly, by selecting for certain alleles of an associated marker through the selection of specific alleles of alternative chromosome markers. 
As used herein the term "genetic marker" shall include not only the nucleotide polymorphisms disclosed but also any means of assaying for the protein changes associated with the polymorphism, be they linked genetic markers in the same chromosomal region, microsatellites, or other means of assaying for the causative protein changes indicated by the marker, and the use of the same to influence traits of an animal.

As used herein, often the designation of a particular polymorphism is made by the name of a particular restriction enzyme. This is not intended to imply that the only way that the site can be identified is by the use of that restriction enzyme. There are numerous databases and resources available to those of skill in the art to identify other restriction enzymes which can be used to identify a particular polymorphism, for example http://darwin.bio.geneseo.edu which can give restriction enzymes upon analysis of a sequence and the polymorphism to be identified. In fact as disclosed in the teachings herein there are numerous ways of identifying a particular polymorphism or allele with alternate methods which may not even include a restriction enzyme, but which assay for the same genetic or proteomic alternative form.

The invention is intended to include the disclosed sequences as well as all conservatively modified variants thereof. The terms CSTF1 and/or C20orf43 as used herein shall be interpreted to include conservatively modified variants. The term "conservatively modified variants" applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refer to those nucleic acids which encode identical or conservatively modified variants of the amino acid sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode any given protein. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to any of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are "silent variations" and represent one species of conservatively modified variation. Every nucleic acid sequence herein that encodes a polypeptide also, by reference to the genetic code, describes every possible silent variation of the nucleic acid. One of ordinary skill will recognize that each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine; and UGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid which encodes a polypeptide of the present invention is implicit in each described polypeptide sequence and is within the scope of the present invention.

As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alters, adds or deletes a single amino acid or a small percentage of amino acids in the encoded sequence is a "conservatively modified variant" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Thus, any number of amino acid residues selected from the group of integers consisting of from 1 to 15 can be so altered. Thus, for example, 1, 2, 3, 4, 5, 7, or 10 alterations can be made. Conservatively modified variants typically provide similar biological activity as the unmodified polypeptide sequence from which they are derived. For example, substrate specificity, enzyme activity, or ligand/receptor binding is generally at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% of the native protein for its native substrate. Conservative substitution tables providing functionally similar amino acids are well known in the art. Conservative substitutions of encoded amino acids include, for example, amino acids that belong within the following groups: (1) non-polar amino acids (Gly, Ala, Val, Leu, and Ile); (2) polar neutral amino acids (Cys, Met, Ser, Thr, Asn, and Gln); (3) polar acidic amino acids (Asp and Glu); (4) polar basic amino acids (Lys, Arg and His); and (5) aromatic amino acids (Phe, Trp, Tyr, and His).
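
For illustration, the five substitution groups listed above can be encoded as a simple lookup. This is a minimal sketch: the dictionary and function names are hypothetical, and membership in a shared group is used here as the only test of a conservative substitution, which is a simplification of the functional criteria described above.

```python
# Substitution groups exactly as listed in the text (three-letter codes).
# Note that His appears in both the polar basic and the aromatic group.
CONSERVATIVE_GROUPS = {
    "non-polar": {"Gly", "Ala", "Val", "Leu", "Ile"},
    "polar neutral": {"Cys", "Met", "Ser", "Thr", "Asn", "Gln"},
    "polar acidic": {"Asp", "Glu"},
    "polar basic": {"Lys", "Arg", "His"},
    "aromatic": {"Phe", "Trp", "Tyr", "His"},
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True when both residues fall within at least one listed group."""
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS.values()
    )


print(is_conservative("Leu", "Ile"))  # both non-polar
print(is_conservative("Asp", "Lys"))  # acidic versus basic
print(is_conservative("His", "Tyr"))  # both listed as aromatic
```

Residues that are not listed in any group, such as Pro, are never reported as conservative substitutions by this sketch.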

Those of ordinary skill in the art will recognize that some substitution will not alter the activity of the polypeptide to an extent that the character or nature of the polypeptide is substantially altered. A "conservative substitution" is one in which an amino acid is substituted for another amino acid that has similar properties, such that one skilled in the art of peptide chemistry would expect the secondary structure and hydropathic nature of the polypeptide to be substantially unchanged. Modifications may be made in the structure of the polynucleotides and polypeptides of the present invention and still obtain a functional molecule that encodes a variant or derivative polypeptide with desirable characteristics, e.g., with meat quality/growth-like characteristics. When it is desired to alter the amino acid sequence of a polypeptide to create an equivalent, or a variant or portion of a polypeptide of the invention, one skilled in the art will typically change one or more of the codons of the encoding DNA sequence according to Table 1 (See infra). For example, certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of activity. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid sequence substitutions can be made in a protein sequence, and, of course, its underlying DNA coding sequence, and nevertheless obtain a protein with like properties. It is thus contemplated that various changes may be made in the peptide sequences of the disclosed compositions, or corresponding DNA sequences, which encode said peptides without appreciable loss of their biological utility or activity. A degenerate codon means that a different three letter codon is used to specify the same amino acid. 
For example, it is well known in the art that the following RNA codons (and therefore, the corresponding DNA codons, with a T substituted for a U) can be used interchangeably to code for each specific amino acid:

TABLE 1

Amino Acid: Codons
Phenylalanine (Phe or F): UUU, UUC
Leucine (Leu or L): UUA, UUG, CUU, CUC, CUA, CUG
Isoleucine (Ile or I): AUU, AUC, AUA
Methionine (Met or M): AUG
Valine (Val or V): GUU, GUC, GUA, GUG
Serine (Ser or S): UCU, UCC, UCA, UCG, AGU, AGC
Proline (Pro or P): CCU, CCC, CCA, CCG
Threonine (Thr or T): ACU, ACC, ACA, ACG
Alanine (Ala or A): GCU, GCG, GCA, GCC
Tryptophan (Trp or W): UGG
Tyrosine (Tyr or Y): UAU, UAC
Histidine (His or H): CAU, CAC
Glutamine (Gln or Q): CAA, CAG
Asparagine (Asn or N): AAU, AAC
Lysine (Lys or K): AAA, AAG
Aspartic Acid (Asp or D): GAU, GAC
Glutamic Acid (Glu or E): GAA, GAG
Cysteine (Cys or C): UGU, UGC
Arginine (Arg or R): CGU, CGC, CGA, CGG, AGA, AGG
Glycine (Gly or G): GGU, GGC, GGA, GGG
Termination codon: UAA, UAG, UGA
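
The notion of a "silent variation" can be checked mechanically against the standard genetic code. The following sketch builds the standard codon table from its conventional one-letter layout (bases ordered U, C, A, G), lists the codons synonymous with a given codon, and tests whether two coding sequences encode the same polypeptide. The function names and example sequences are hypothetical; only the standard genetic code itself is assumed.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = termination).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def _as_rna(sequence: str) -> str:
    """Normalize a DNA or RNA coding sequence to upper-case RNA."""
    return sequence.upper().replace("T", "U")


def translate(sequence: str) -> str:
    """Translate a coding sequence (length divisible by 3) to one-letter residues."""
    rna = _as_rna(sequence)
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


def synonymous_codons(codon: str) -> list:
    """All codons that encode the same amino acid as the given codon."""
    amino_acid = CODON_TABLE[_as_rna(codon)]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def is_silent_variation(seq_a: str, seq_b: str) -> bool:
    """True when the sequences differ but encode the identical polypeptide."""
    return _as_rna(seq_a) != _as_rna(seq_b) and translate(seq_a) == translate(seq_b)


print(synonymous_codons("GCA"))                       # the four alanine codons
print(synonymous_codons("AUG"))                       # methionine has one codon
print(is_silent_variation("AUGGCAUGG", "AUGGCCUGG"))  # GCA to GCC, both alanine
```

The alanine, methionine and tryptophan cases correspond to the examples given in the discussion of silent variations above.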

An embodiment of the invention relates to genetic markers for economically valuable traits in animals. The markers represent polymorphic variation or alleles that are associated significantly with growth and/or meat quality and thus provide a method of screening animals to determine those more likely to produce desired traits. As used herein the term "marker" shall include a polymorphic variant capable of detection which may be linked to a quantitative trait locus and thus useful for assaying for the particular trait in the QTL. Thus, the invention relates to genetic markers and methods of identifying those markers in an animal of a particular breed, strain, population, or group, whereby the animal is more likely to yield desired meat or growth or fatness traits.

Genetic Association with Meat Quality, Fatness and Growth Traits on Chromosome 17

Genetic analysis described herein led to the discovery of genetic association with meat quality, fatness and growth traits on chromosome 17. The association identifies chromosome 17 as the location of one or more chromosomal regions/DNA segments or genes associated with favorable meat quality, fatness, and growth traits in animals and of considerable effect size. In particular, chromosome 17 is identified as containing at least one DNA segment or gene associated with favorable meat quality, fatness and growth traits. More particularly, a region of chromosome 17 at map position 90.4 - 92.9 on the BY map, and more specifically at positions 92.4 and 92.6, has been identified which comprises the CSTF1 and C20orf43 genes. The finding of association of the genetic markers/polymorphisms disclosed herein with meat quality, fatness and growth traits indicates that there are one or more chromosomal regions/DNA segments or genes for meat quality and growth traits on chromosome 17 that either directly cause or confer a significant improvement in any measurable indicia of growth, fatness or meat quality above the mean of a given population.

The discovery of one or more growth, fatness, or meat quality-associated genes on chromosome 17, as evidenced by significant association with growth, fatness, or meat quality on chromosome 17, thus provides the basis for genetic analysis methods described herein which include: methods of identifying an allele that is associated with meat quality, fatness, and growth traits; methods of determining a genetic marker which may be used to identify and select animals based upon their meat quality or growth traits; and methods of identifying an animal for its propensity for growth, fatness or meat quality traits.

Genetic Markers Associated With Growth, Fatness or Meat Quality Traits

Genetic markers associated with growth, fatness or meat quality traits are provided herein. The markers are located on porcine chromosome 17. In particular embodiments, the genetic markers found in CSTF1 and C20orf43 were mapped underneath the SSC 17 QTL peaks for traits disclosed herein. The markers can be identified through linkage disequilibrium or association assessment methods described herein or known to those of skill in the art, and provide scores or results indicative of linkage disequilibrium with a chromosomal region/DNA segment or gene or of association with growth, fatness or meat quality when tested by such assessment methods. The genetic markers are associated with growth or meat quality as individual markers and/or in combinations, such as haplotypes, that are associated with growth or meat quality.

Genetic Markers on Porcine Chromosome 17

A genetic marker is a DNA segment with an identifiable location in a chromosome. Genetic markers may be used in a variety of genetic studies such as, for example, locating the chromosomal position or locus of a DNA sequence of interest, and determining if a subject is predisposed to or has a particular trait. Because DNA sequences that are relatively close together on a chromosome tend to be inherited together, tracking of a genetic marker through generations in a population and comparing its inheritance to the inheritance of another DNA sequence of interest can provide information useful in determining the relative position of the DNA sequence of interest on a chromosome. Genetic markers particularly useful in such genetic studies are polymorphic. Such markers also may have an adequate level of heterozygosity to allow a reasonable probability that a randomly selected animal will be heterozygous.

The occurrence of variant forms of a particular DNA sequence, e.g., a gene, is referred to as polymorphism. A region of a DNA segment in which variation occurs may be referred to as a polymorphic region or site. A polymorphic region can be a single nucleotide (single nucleotide polymorphism or SNP), the identity of which differs, e.g., in different alleles, or can be two or more nucleotides in length. For example, variant forms of a DNA sequence may differ by an insertion or deletion of one or more nucleotides, as is the case with c20orf43, insertion of a sequence that was duplicated, inversion of a sequence or conversion of a single nucleotide to a different nucleotide. Each animal can carry two different forms of the specific sequence or two identical forms of the sequence. Differences between polymorphic forms of a specific DNA sequence may be detected in a variety of ways. For example, if the polymorphism is such that it creates or deletes a restriction enzyme site, such differences may be traced by using restriction enzymes that recognize specific DNA sequences. Restriction enzymes cut (digest) DNA at sites in their specific recognized sequence, resulting in a collection of fragments of the DNA. When a change exists in a DNA sequence that alters a sequence recognized by a restriction enzyme to one not recognized the fragments of DNA produced by restriction enzyme digestion of the region will be of different sizes. The various possible fragment sizes from a given region therefore depend on the precise sequence of DNA in the region. Variation in the fragments produced is termed "restriction fragment length polymorphism" (RFLP). The different sized-fragments reflecting variant DNA sequences can be visualized by separating the digested DNA according to its size on an agarose gel and visualizing the individual fragments by annealing to a labeled, e.g., radioactively or otherwise labeled, DNA "probe". 
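
The way a single base change alters a restriction fragment pattern can be shown with a small in-silico digest. This is a sketch only: the two 40-base alleles are hypothetical, and EcoRI (recognition sequence GAATTC, cutting after the first G) is used purely as a familiar example and is not one of the enzymes used for the markers disclosed herein. The function searches one strand of a linear fragment, which suffices for a palindromic site.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list:
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases from the start of the recognition
    sequence to the cut position on the searched strand.
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by one base inside the recognition sequence.
allele_with_site = "A" * 20 + "GAATTC" + "C" * 14
allele_without_site = "A" * 20 + "GAGTTC" + "C" * 14

print(digest(allele_with_site, "GAATTC", 1))     # two fragments
print(digest(allele_without_site, "GAATTC", 1))  # one uncut fragment
```

The allele that retains the site yields two fragments and the allele that has lost it yields a single full-length fragment, which is the size difference an RFLP assay resolves on a gel.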
PCR-RFLP, broadly speaking, is a technique that involves obtaining the DNA to be studied, amplifying the DNA, digesting the DNA with restriction endonucleases, separating the resulting fragments, and detecting the fragments of various genes. The use of PCR-RFLPs is the preferred method of detecting the polymorphisms disclosed herein. However, since the use of RFLP analysis depends ultimately on polymorphisms and DNA restriction sites along the nucleic acid molecule, other methods of detecting the polymorphism can also be used and are contemplated in this invention. Such methods include ones that analyze the polymorphic gene product and detect polymorphisms by detecting the resulting differences in the gene product.

SNP markers may also be used in fine mapping and association analysis, as well as linkage analysis (see, e.g., Kruglyak (1997) Nature Genetics 17:21-24). Although an SNP may have limited information content, combinations of SNPs (which individually occur about every 100-300 bases) may yield informative haplotypes. SNP databases are available. Assay systems for determining SNPs include synthetic nucleotide arrays to which labeled, amplified DNA is hybridized (see, e.g., Lipshutz et al. (1999) Nature Genet. 21:2-24); single base primer extension methods (Pastinen et al. (1997) Genome Res. 7:606-614), mass spectroscopy on tagged beads, and solution assays in which allele-specific oligonucleotides are cleaved or joined at the position of the SNP allele, resulting in activation of a fluorescent reporter system (see, e.g., Landegren et al. (1998) Genome Res. 8:769-776).

Chromosome 17

Pig chromosome 17 is well conserved (homologous to human chromosome 20 and mouse chromosome 2). It is intended that these homologous chromosomes be included within the term Chromosome 17.

Genetic Association

When two loci are extremely close together, recombination between them is very rare, and the rate at which the two neighboring loci recombine can be so slow as to be unobservable except over many generations. The resulting allelic association is generally referred to as linkage disequilibrium. Linkage disequilibrium can be defined as specific alleles at two or more loci that are observed together on a chromosome more often than expected from their frequencies in the population. As a consequence of linkage disequilibrium, the frequency of all other alleles present in a haplotype carrying a trait-causing allele will also be increased (just as the trait-causing allele is increased in an affected, or trait-positive, population) compared to the frequency in a trait-negative or random control population. Therefore, association between the trait and any allele in linkage disequilibrium with the trait-causing allele will suffice to suggest the presence of a trait-related DNA segment in that particular region of a chromosome. On this basis, association studies are used in the methods, as disclosed herein, of locating and identifying an allele that is associated with meat quality and growth traits in animals.
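
The definition of linkage disequilibrium given above (alleles observed together more often than expected from their frequencies) is commonly quantified by the coefficient D and the squared correlation r². Neither measure is named in this disclosure; they are standard population-genetics quantities introduced here as an assumption, and the haplotype counts below are hypothetical.

```python
def linkage_disequilibrium(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, r_squared) for two biallelic loci from phased haplotype counts.

    D = p(AB) - p(A) * p(B): the excess of the AB haplotype over the frequency
    expected if alleles at the two loci were independent.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denominator


# Alleles A and B always travel together: complete disequilibrium.
print(linkage_disequilibrium(50, 0, 0, 50))
# All four haplotypes equally common: no disequilibrium.
print(linkage_disequilibrium(25, 25, 25, 25))
```

A marker allele with r² near 1 relative to a trait-causing allele carries nearly the same information as the causal allele itself, which is why such a marker suffices for association testing.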

A marker locus must be tightly linked to the trait locus in order for linkage disequilibrium to exist between the loci. In particular, loci must be very close in order to have appreciable linkage disequilibrium that may be useful for association studies. Association studies rely on the retention of adjacent DNA variants over many generations in historic ancestries, and, thus, trait-associated regions are theoretically small in outbred random mating populations. The power of genetic association analysis to detect genetic contributions to traits can be much greater than that of linkage studies. Linkage analysis can be limited by a lack of power to exclude regions or to detect loci with modest effects. Association tests can be capable of detecting loci with smaller effects (Risch and Merikangas (1996) Science 273:1516-1517), which may not be detectable by linkage analysis.

The aim of association studies when used to discover genetic variation in genes associated with phenotypic traits is to identify particular genetic variants that correlate with the phenotype at the population level. Association at the population level may be used in the process of identifying a gene or DNA segment because it provides an indication that a particular marker is either a functional variant underlying the trait (i.e., a polymorphism that is directly involved in causing a particular trait) or is extremely close to the trait gene on a chromosome. When a marker analyzed for association with a phenotypic trait is a functional variant, association is the result of the direct effect of the genotype on the phenotypic outcome. When a marker being analyzed for association is an anonymous marker, the occurrence of association is the result of linkage disequilibrium between the marker and a functional variant.

There are a number of methods typically used in assessing genetic association as an indication of linkage disequilibrium, including case-control study of unrelated animals and methods using family-based controls. Although the case-control design is relatively simple, it is the most prone to identifying DNA variants that prove to be spuriously associated (i.e., association without linkage) with the trait. Spurious association can be due to the structure of the population studied rather than to linkage disequilibrium. Linkage analysis of such spuriously associated allelic variants, however, would not detect evidence of significant linkage because there would be no familial segregation of the variants. Therefore, putative association between a marker allele and a meat quality, fatness and growth trait identified in a case-control study should be tested for evidence of linkage between the marker and the disease before a conclusion of probable linkage disequilibrium is made. Association tests that avoid some of the problems of the standard case-control study utilize family-based controls in which parental alleles or haplotypes not transmitted to affected offspring are used as controls. In contrast to genetic linkage, which is a property of loci, genetic association is a property of alleles. Association analysis involves a determination of a correlation between a single, specific allele and a trait across a population, not only within individual groups. Thus, a particular allele found through an association study to be in linkage disequilibrium with a meat quality or growth or fatness associated-allele can form the basis of a method of determining a predisposition to or the occurrence of the trait in any animal. Such methods would not involve a determination of phase of an allele and thus would not be limited in terms of the animals that may be screened in the method.

Methods for Identifying Genetic Markers Associated with Meat Quality, Growth or Fatness Traits

Also provided herein are methods of determining a genetic marker which may be used to identify and select animals based upon their meat quality or growth traits. The methods include a step of testing a polymorphic marker on chromosome 17 for association with meat quality or growth traits. The testing may involve genotyping DNA from animals with respect to the polymorphic marker and analyzing the genotyping data for association with meat quality or growth traits using methods described herein and/or known to those of skill in the art. A marker so identified may then be used as a genetic marker for the same traits in a given group, population or species.

Candidate Gene Approach

The candidate gene approach typically takes into account knowledge of biological processes of a disease as a basis for selecting genes that encode proteins that could be envisioned to be involved in the biological processes. For example, reasonable candidate genes for blood pressure disorders could be those encoding proteins and enzymes involved in the renin-angiotensin system. Candidate genes can be evaluated genetically as possible disease genes by linkage and/or association studies of markers in the candidate gene region.

Methods of Identifying a Candidate Meat Quality, Fatness and/or Growth Gene

The methods of identifying a candidate meat quality, fatness and/or growth gene include a step of selecting a gene on chromosome 17 that is or encodes a product that has one or more properties relating to one or more phenomena in meat quality, fatness or growth. Additional genes that have been mapped to chromosome 17 are also known. Thus, genes on chromosome 17 may be evaluated as possible candidate genes on the basis of, for example, knowledge of the functions of the genes or products thereof and/or their occurrence or alteration in meat quality and growth.

Properties Relating to Phenomena in Meat Quality, Fatness and Growth

In the methods of identifying a candidate meat quality and growth gene provided herein, a gene on chromosome 17, and, in particular embodiments, on particular regions of chromosome 17 as described herein, is selected that is or encodes a product that has properties relating to one or more phenomena in meat quality and growth. The properties may be any aspect or feature of the gene or gene product, including but not limited to its physical composition (e.g., nucleic acids, amino acids, peptides and proteins), functional attributes (e.g., enzymatic capabilities, such as an enzyme catalyst, inhibitory functions, such as enzyme inhibition, antigenic properties, and binding capabilities, such as a receptor or ligand), cellular location(s), expression pattern (e.g., expression in the cells and tissues associated therewith) and/or interactions with other compositions.

The properties of the gene or gene product that are selected for in the methods of identifying a candidate meat quality, fatness and growth gene are those that relate to one or more phenomena in meat quality and growth. Such phenomena, which have been widely described and are known to those of skill in the art, are numerous and include morphological, structural, biological and biochemical occurrences. As described herein, the effect on meat quality may be demonstrated through the use of a particular identifier, such as pH or drip loss.

Candidate Genes of the Present Invention

Generally, in a candidate gene approach to the identification of a trait gene using association analysis of polymorphic markers, one or a few markers around or within candidate trait genes, particularly those with hypothesized functional importance, are genotyped in a few hundred case and control animals. The specific characteristics of the associated allele with respect to a candidate gene function usually gives further insight into the relationship between the associated allele and the trait (causal or in linkage disequilibrium). If the evidence indicates that the associated allele within the candidate gene is most probably not the trait-causing allele but is in linkage disequilibrium with the real trait-causing allele, then the trait-causing allele can be found by sequencing the vicinity of the associated marker, and performing further association studies with the polymorphisms that are revealed in an iterative manner.

The Inventors of this invention have applied in part the candidate gene approach to meat quality, fatness and growth traits of the pig. The number of genes known to date that control meat quality and growth rates in pigs is small, but their individual effects are, in most cases, large. Often, this is due to the observation of the large effects that a polymorphism or mutation has on an animal's function. From such genes and others which seemed to be good candidates, the Inventors selected their candidate genes as disclosed herein. The candidate gene analysis clearly provides a short-cut approach to the identification of genes and gene polymorphisms related to a particular phenotypic trait when the candidate gene plays a plausible role in a biological or physiological pathway underlying the trait. The basis of mutational effects on a trait in humans or mice suggests a role for the same gene in corresponding traits in livestock.

According to the invention, the CSTF1 and C20orf43 genes have both been identified as major effect genes, and variability in these genes has been shown to be associated with the phenotypic traits of meat quality, fatness and/or growth in animals, particularly pigs. Thus, screening methods may be developed for variation within or linked to these genes that is predictive of phenotypic variation.

Oligonucleotides were used in the PCR amplification of genomic DNA for sequences prior to design of specific oligonucleotides for single-nucleotide polymorphism (SNP) detection and genotyping. PCR conditions are exemplified in the Examples section. The detection of the polymorphism(s) was carried out by restriction fragment length polymorphism detection. Genotyping for CSTF1 and/or C20orf43 was based on the presence or absence of a restriction site at the polymorphic sites in PCR-amplified DNA fragments (PCR-RFLP). The genotypes were identified according to the resolved products on an electrophoretic gel. A single nucleotide polymorphism was detected in the pre-RNA subunit of the CSTF1 gene, a portion of which is depicted in SEQ ID NO: 1, Figure 4. The polymorphism is a T/C polymorphism and an RFLP test has been designed to detect the base present at this polymorphic site. Digestion of an amplified CSTF1 fragment with Taa I resulted in an RFLP depicted in Figure 4. Homozygous allele 1 genotype generated 251 and 118 base pair (bp) restriction fragments, while homozygous allele 2 genotype generated 165, 118, and 86 bp restriction fragments. Heterozygous 12 genotype showed all four fragments. A T at this position was significantly correlated with growth traits (lifetime daily gain), while a C at this position (cut, allele 1) correlated with lean traits and redder meat.

A 22 bp insertion/deletion was detected in the chromosome 20 open reading frame 43 (C20orf43) gene, depicted in SEQ ID NO: 2. Digestion with Mwo I resulted in an RFLP depicted in Figure 3. Homozygous allele 1 genotype generated 236, 165, and 69 base pair (bp) restriction fragments, while homozygous allele 2 genotype generated 143 and 69 bp restriction fragments. Heterozygous 12 genotype showed all four fragments. Importantly, digestion is not essential to determining the presence or absence of the insertion/deletion polymorphism. Amplification without digestion would lead to a band at 470 bp (genotype 11, insertion/insertion) or at 448 bp (deletion/deletion), which may be used to detect the particular allele present.
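
The fragment patterns recited above lend themselves to a simple lookup for calling genotypes from resolved band sizes. The following is a minimal sketch that uses the band sizes exactly as stated in the text; the function name, the dictionary layout and the 5 bp sizing tolerance are assumptions made for illustration and are not part of the disclosed assay.

```python
# Band sizes (bp) for the two homozygous genotypes, as stated in the text.
HOMOZYGOUS_PATTERNS = {
    "CSTF1": {"11": {251, 118}, "22": {165, 118, 86}},       # Taa I digest
    "C20orf43": {"11": {236, 165, 69}, "22": {143, 69}},     # Mwo I digest
}


def call_genotype(marker: str, observed_bp, tolerance_bp: int = 5):
    """Return "11", "12", "22", or None when the bands fit no expected pattern."""
    patterns = dict(HOMOZYGOUS_PATTERNS[marker])
    # A heterozygote shows the union of the two homozygous patterns.
    patterns["12"] = patterns["11"] | patterns["22"]
    known_sizes = patterns["12"]

    snapped = set()
    for band in observed_bp:
        # Snap each observed band to the nearest expected size, since gel
        # sizing is approximate (the tolerance is an assumed value).
        nearest = min(known_sizes, key=lambda size: abs(size - band))
        if abs(nearest - band) > tolerance_bp:
            return None
        snapped.add(nearest)

    for genotype, bands in patterns.items():
        if snapped == bands:
            return genotype
    return None


print(call_genotype("CSTF1", [250, 119]))
print(call_genotype("CSTF1", [251, 165, 118, 86]))
print(call_genotype("C20orf43", [143, 69]))
```

Returning None for an unrecognized pattern, rather than forcing the closest genotype, keeps partial digests or failed amplifications from being scored as valid calls.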

Any method of identifying the presence or absence of these polymorphisms may be used, including for example single-strand conformation polymorphism (SSCP) analysis, base excision sequence scanning (BESS), RFLP analysis, heteroduplex analysis, denaturing gradient gel electrophoresis, temperature gradient electrophoresis, allelic PCR, ligase chain reaction, direct sequencing, mini sequencing, nucleic acid hybridization, micro-array-type detection of a major effect gene or allele, or other linked sequences of the same. Also within the scope of the invention is assaying for protein conformational or sequence changes which occur in the presence of this polymorphism. The polymorphism may or may not be the causative mutation but will be indicative of the presence of this change, and one may assay for the genetic or protein bases for the phenotypic difference. Based upon detection of these markers, allele frequencies may be calculated for a given population to determine differences in allele frequencies between groups of animals, i.e., the use of quantitative genotyping. This will provide for the ability to select specific populations for associated traits.
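
Calculating allele frequencies from genotype counts is a matter of gene counting: each animal contributes two alleles, so a homozygote contributes two copies and a heterozygote one copy of a given allele. The sketch below illustrates this for a two-allele marker; the function name and the genotype counts for the two groups are hypothetical.

```python
def allele_frequencies(n_11: int, n_12: int, n_22: int):
    """Return (frequency of allele 1, frequency of allele 2) by gene counting."""
    total_alleles = 2 * (n_11 + n_12 + n_22)
    if total_alleles == 0:
        raise ValueError("no genotyped animals supplied")
    freq_1 = (2 * n_11 + n_12) / total_alleles
    return freq_1, 1.0 - freq_1


# Hypothetical genotype counts (11, 12, 22) in two groups of animals.
group_a = allele_frequencies(30, 50, 20)
group_b = allele_frequencies(10, 40, 50)

print(group_a)
print(group_b)
# Difference in the frequency of allele 1 between the two groups.
print(group_a[0] - group_b[0])
```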

In general, the polymorphisms used as genetic markers of the present invention find use in any method known in the art to demonstrate a statistically significant correlation between a genotype and a phenotype.

The invention therefore, comprises in one embodiment, a method of identifying an allele that is associated with meat quality traits. The invention also comprises methods of determining a genetic region or marker which may be used to identify and select animals based upon their meat quality, fatness or growth traits. Yet another embodiment provides a method of identifying an animal for its propensity for growth, fatness or meat quality traits.

Also provided herein are methods of detecting an association between a genotype and a phenotype, which may comprise the steps of a) genotyping at least one candidate gene-related marker in a trait positive population according to a genotyping method of the invention; b) genotyping the candidate gene-related marker in a control population according to a genotyping method of the invention; and c) determining whether a statistically significant association exists between said genotype and said phenotype. In addition, the methods of detecting an association between a genotype and a phenotype of the invention encompass methods with any further limitation described in this disclosure, or those following, specified alone or in any combination. Preferably, the candidate gene-related marker is present in one or more of SEQ ID NOs: 1 or 2. Each of said genotyping of steps a) and b) is performed separately on biological samples derived from each pig in said population or a subsample thereof. Preferably, the phenotype is a trait involving the growth, fatness and meat quality characteristics of an animal.

The invention described herein contemplates alternative approaches that can be employed to perform association studies: genome-wide association studies, candidate region association studies and candidate gene association studies. In a preferred embodiment, the markers of the present invention are used to perform candidate gene association studies. Further, the markers of the present invention may be incorporated in any map of genetic markers of the pig genome in order to perform genome-wide association studies. Methods to generate a high-density map of markers are well known to those of skill in the art. The markers of the present invention may further be incorporated in any map of a specific candidate region of the genome (a specific chromosome or a specific chromosomal segment, for example).

Association studies are extremely valuable as they permit the analysis of sporadic or multifactor traits. Moreover, association studies represent a powerful method for fine-scale mapping enabling much finer mapping of trait causing alleles than linkage studies. Once a chromosome segment of interest has been identified, the presence of a candidate gene such as a candidate gene of the present invention, in the region of interest can provide a shortcut to the identification of the trait causing allele. Polymorphisms used as genetic markers of the present invention can be used to demonstrate that a candidate gene is associated with a trait. Such uses are specifically contemplated in the present invention and claims.

Association Analysis

The general strategy to perform association studies using markers derived from a region carrying a candidate gene is to scan two groups of animals (case-control populations) in order to measure and statistically compare the allele frequencies of the markers of the present invention in both groups.
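The comparison of allele frequencies between the two groups can be sketched as follows. This is a minimal illustration only: the genotype calls are hypothetical, a single biallelic marker is assumed, and the function names are not taken from this disclosure.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count alleles in a list of diploid genotypes such as ["AA", "AG", "GG"]."""
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # each diploid genotype contributes two alleles
    return counts


def allele_frequencies(genotypes):
    """Return the relative frequency of each allele in the group."""
    counts = allele_counts(genotypes)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


# Hypothetical genotype calls for one marker in the two groups
trait_positive = ["AA", "AA", "AG", "AG", "GG"]
control = ["AG", "GG", "GG", "GG", "AG"]

print("trait positive:", allele_frequencies(trait_positive))
print("control:       ", allele_frequencies(control))
```

The per-group allele counts produced this way are the inputs to whatever statistical comparison is chosen.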

If a statistically significant association with a trait is identified for one or more of the analyzed markers, one can assume that either the associated allele is directly responsible for causing the trait (the associated allele is the trait causing allele), or, more likely, the associated allele is in linkage disequilibrium with the trait causing allele. The specific characteristics of the associated allele with respect to the candidate gene function usually give further insight into the relationship between the associated allele and the trait (causal or in linkage disequilibrium). If the evidence indicates that the associated allele within the candidate gene is most probably not the trait causing allele but is in linkage disequilibrium with the real trait causing allele, then the trait causing allele can be found by sequencing the vicinity of the associated marker.

Association studies are usually run in two successive steps. In a first phase, the frequencies of a reduced number of markers from the candidate gene are determined in the trait positive and trait negative populations. In a second phase of the analysis, the position of the genetic loci responsible for the given trait is further refined using a higher density of markers from the relevant region. However, if the candidate gene under study is relatively small in length, a single phase may be sufficient to establish significant associations.

Testing for Association

The statistical significance of a correlation between a phenotype and a genotype, in this case an allele at a marker or a haplotype made up of such alleles, may be determined by any statistical test known in the art, with any accepted threshold of statistical significance being required. The application of particular methods and thresholds of significance is well within the skill of the ordinary practitioner of the art. Testing for association is performed in one way by determining the frequency of a marker allele in case and control populations and comparing these frequencies with a statistical test to determine if there is a statistically significant difference in frequency which would indicate a correlation between the trait and the marker allele under study. Similarly, a haplotype analysis is performed by estimating the frequencies of all possible haplotypes for a given set of markers in case and control populations, and comparing these frequencies with a statistical test to determine if there is a statistically significant correlation between the haplotype and the phenotype (trait) under study. Any statistical tool useful to test for a statistically significant association between a genotype and a phenotype may be used, and many exist. Preferably the statistical test employed is a chi-square test with one degree of freedom. A P-value is calculated (the P-value is the probability that a statistic as large as or larger than the observed one would occur by chance). Other methods involve linear models and analysis of variance techniques.
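A chi-square test with one degree of freedom on a 2x2 table of allele counts can be sketched as follows. The counts are hypothetical, the Pearson statistic is computed without a continuity correction (an assumption; the text does not specify one), and the P-value uses the identity that, for one degree of freedom, the upper-tail probability equals erfc(sqrt(chi2/2)).

```python
import math


def allele_chi_square(case_a, case_b, control_a, control_b):
    """Pearson chi-square (1 df, no continuity correction) for allele counts.

    Rows are the case and control groups; columns are the two alleles.
    Returns (chi-square statistic, P-value).
    """
    n = case_a + case_b + control_a + control_b
    row_case = case_a + case_b
    row_control = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    if 0 in (row_case, row_control, col_a, col_b):
        raise ValueError("every row and column needs at least one observation")
    chi2 = (n * (case_a * control_b - case_b * control_a) ** 2
            / (row_case * row_control * col_a * col_b))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts: 60/40 in the trait positive group, 40/60 in controls
chi2, p = allele_chi_square(60, 40, 40, 60)
print(f"chi-square = {chi2:.3f}, P = {p:.4f}")
```

With small expected counts an exact test would usually be preferred; the significance threshold itself remains a choice for the practitioner.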

The following is a general overview of techniques which can be used to assay for the polymorphisms of the invention. In the present invention, a sample of genetic material is obtained from an animal.

Samples can be obtained from blood, tissue, semen, etc. Generally, peripheral blood cells are used as the source, and the genetic material is DNA. A sufficient number of cells is obtained to provide a sufficient amount of DNA for analysis. This amount will be known or readily determinable by those skilled in the art. The DNA is isolated from the blood cells by techniques known to those skilled in the art.

Isolation and Amplification of Nucleic Acid

Samples of genomic DNA are isolated from any convenient source including saliva, buccal cells, hair roots, blood, cord blood, amniotic fluid, interstitial fluid, peritoneal fluid, chorionic villus, and any other suitable cell or tissue sample with intact interphase nuclei or metaphase cells. The cells can be obtained from solid tissue as from a fresh or preserved organ or from a tissue sample or biopsy. The sample can contain compounds which are not naturally intermixed with the biological material such as preservatives, anticoagulants, buffers, fixatives, nutrients, antibiotics, or the like.

Methods for isolation of genomic DNA from these various sources are described in, for example, Kirby, DNA Fingerprinting, An Introduction, W.H. Freeman & Co., New York (1992). Genomic DNA can also be isolated from cultured primary or secondary cell cultures or from transformed cell lines derived from any of the aforementioned tissue samples.

Samples of animal RNA can also be used. RNA can be isolated from tissues expressing the major effect gene of the invention as described in Sambrook et al., supra. RNA can be total cellular RNA, mRNA, poly A+ RNA, or any combination thereof. For best results, the RNA is purified, but can also be unpurified cytoplasmic RNA. RNA can be reverse transcribed to form DNA which is then used as the amplification template, such that the PCR indirectly amplifies a specific population of RNA transcripts. See, e.g., Sambrook, supra, Kawasaki et al., Chapter 8 in PCR Technology, (1992) supra, and Berg et al., Hum. Genet. 85:655-658 (1990).

PCR Amplification

The most common means for amplification is polymerase chain reaction (PCR), as described in U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, each of which is hereby incorporated by reference. If PCR is used to amplify the target regions in blood cells, heparinized whole blood should be drawn in a sealed vacuum tube kept separated from other samples and handled with clean gloves. For best results, blood should be processed immediately after collection; if this is impossible, it should be kept in a sealed container at 4°C until use. Cells in other physiological fluids may also be assayed. When using any of these fluids, the cells in the fluid should be separated from the fluid component by centrifugation.

Tissues should be roughly minced using a sterile, disposable scalpel and a sterile needle (or two scalpels) in a 5 mm Petri dish. Procedures for removing paraffin from tissue sections are described in a variety of specialized handbooks well known to those skilled in the art.

To amplify a target nucleic acid sequence in a sample by PCR, the sequence must be accessible to the components of the amplification system. One method of isolating target DNA is crude extraction, which is useful for relatively large samples. Briefly, mononuclear cells from samples of blood, amniocytes from amniotic fluid, cultured chorionic villus cells, or the like are isolated by layering on sterile Ficoll-Hypaque gradient by standard procedures. Interphase cells are collected and washed three times in sterile phosphate buffered saline before DNA extraction. If testing DNA from peripheral blood lymphocytes, an osmotic shock (treatment of the pellet for 10 sec with distilled water) is suggested, followed by two additional washings if residual red blood cells are visible following the initial washes. This will prevent the inhibitory effect of the heme group carried by hemoglobin on the PCR reaction. If PCR testing is not performed immediately after sample collection, aliquots of 10⁶ cells can be pelleted in sterile Eppendorf tubes and the dry pellet frozen at -20°C until use. The cells are resuspended (10⁶ nucleated cells per 100 μl) in a buffer of 50 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl₂, 0.5% Tween 20, 0.5% NP40 supplemented with 100 μg/ml of proteinase K. After incubating at 56°C for 2 hr, the cells are heated to 95°C for 10 min to inactivate the proteinase K and immediately moved to wet ice (snap-cool). If gross aggregates are present, another cycle of digestion in the same buffer should be undertaken. Ten μl of this extract is used for amplification.

When extracting DNA from tissues, e.g., chorionic villus cells or confluent cultured cells, the amount of the above mentioned buffer with proteinase K may vary according to the size of the tissue sample. The extract is incubated for 4-10 hrs at 50°-60°C and then at 95°C for 10 minutes to inactivate the proteinase. During longer incubations, fresh proteinase K should be added after about 4 hr at the original concentration. When the sample contains a small number of cells, extraction may be accomplished by methods as described in Higuchi, "Simple and Rapid Preparation of Samples for PCR", in PCR Technology, Erlich, H. A. (ed.), Stockton Press, New York, which is incorporated herein by reference. PCR can be employed to amplify target regions in very small numbers of cells (1000-5000) derived from individual colonies from bone marrow and peripheral blood cultures. The cells in the sample are suspended in 20 μl of PCR lysis buffer (10 mM Tris-HCl (pH 8.3), 50 mM KCl, 2.5 mM MgCl₂, 0.1 mg/ml gelatin, 0.45% NP40, 0.45% Tween 20) and frozen until use. When PCR is to be performed, 0.6 μl of proteinase K (2 mg/ml) is added to the cells in the PCR lysis buffer. The sample is then heated to about 60°C and incubated for 1 hr. Digestion is stopped through inactivation of the proteinase K by heating the samples to 95°C for 10 min and then cooling on ice.

A relatively easy procedure for extracting DNA for PCR is a salting out procedure adapted from the method described by Miller et al., Nucleic Acids Res. 16:1215 (1988), which is incorporated herein by reference. Mononuclear cells are separated on a Ficoll-Hypaque gradient. The cells are resuspended in 3 ml of lysis buffer (10 mM Tris-HCl, 400 mM NaCl, 2 mM Na₂ EDTA, pH 8.2). Fifty μl of a 20 mg/ml solution of proteinase K and 150 μl of a 20% SDS solution are added to the cells and then incubated at 37°C overnight. Rocking the tubes during incubation will improve the digestion of the sample. If the proteinase K digestion is incomplete after overnight incubation (fragments are still visible), an additional 50 μl of the 20 mg/ml proteinase K solution is mixed in the solution and incubated for another night at 37°C on a gently rocking or rotating platform. Following adequate digestion, one ml of a 6 M NaCl solution is added to the sample and vigorously mixed. The resulting solution is centrifuged for 15 minutes at 3000 rpm. The pellet contains the precipitated cellular proteins, while the supernatant contains the DNA. The supernatant is removed to a 15 ml tube that contains 4 ml of isopropanol. The contents of the tube are mixed gently until the water and the alcohol phases have mixed and a white DNA precipitate has formed. The DNA precipitate is removed and dipped in a solution of 70% ethanol and gently mixed. The DNA precipitate is removed from the ethanol and air-dried. The precipitate is placed in distilled water and dissolved.

Kits for the extraction of high-molecular weight DNA for PCR include a Genomic Isolation Kit A.S.A.P. (Boehringer Mannheim, Indianapolis, Ind.), Genomic DNA Isolation System (GIBCO BRL, Gaithersburg, Md.), Elu-Quik DNA Purification Kit (Schleicher & Schuell, Keene, N.H.), DNA Extraction Kit (Stratagene, La Jolla, Calif.), TurboGen Isolation Kit (Invitrogen, San Diego, Calif.), and the like. Use of these kits according to the manufacturer's instructions is generally acceptable for purification of DNA prior to practicing the methods of the present invention.

The concentration and purity of the extracted DNA can be determined by spectrophotometric analysis of the absorbance of a diluted aliquot at 260 nm and 280 nm. After extraction of the DNA, PCR amplification may proceed. The first step of each cycle of the PCR involves the separation of the nucleic acid duplex formed by the primer extension. Once the strands are separated, the next step in PCR involves hybridizing the separated strands with primers that flank the target sequence. The primers are then extended to form complementary copies of the target strands. For successful PCR amplification, the primers are designed so that the position at which each primer hybridizes along a duplex sequence is such that an extension product synthesized from one primer, when separated from the template (complement), serves as a template for the extension of the other primer. The cycle of denaturation, hybridization, and extension is repeated as many times as necessary to obtain the desired amount of amplified nucleic acid.

In a particularly useful embodiment of PCR amplification, strand separation is achieved by heating the reaction to a sufficiently high temperature for a sufficient time to cause the denaturation of the duplex but not to cause an irreversible denaturation of the polymerase (see U.S. Pat. No. 4,965,188, incorporated herein by reference). Typical heat denaturation involves temperatures ranging from about 80°C to 105°C for times ranging from seconds to minutes. Strand separation, however, can be accomplished by any suitable denaturing method including physical, chemical, or enzymatic means. Strand separation may be induced by a helicase, for example, or an enzyme capable of exhibiting helicase activity. For example, the enzyme RecA has helicase activity in the presence of ATP. The reaction conditions suitable for strand separation by helicases are known in the art (see Kuhn Hoffman-Berling, 1978, CSH-Quantitative Biology, 43:63-67; and Radding, 1982, Ann. Rev. Genetics 16:405-436, each of which is incorporated herein by reference).

Template-dependent extension of primers in PCR is catalyzed by a polymerizing agent in the presence of adequate amounts of four deoxyribonucleotide triphosphates (typically dATP, dGTP, dCTP, and dTTP) in a reaction medium comprised of the appropriate salts, metal cations, and pH buffering systems. Suitable polymerizing agents are enzymes known to catalyze template-dependent DNA synthesis. In some cases, the target regions may encode at least a portion of a protein expressed by the cell. In this instance, mRNA may be used for amplification of the target region. Alternatively, PCR can be used to generate a cDNA library from RNA for further amplification; in that case the initial template for primer extension is RNA. Polymerizing agents suitable for synthesizing a complementary, copy-DNA (cDNA) sequence from the RNA template are reverse transcriptase (RT), such as avian myeloblastosis virus RT, Moloney murine leukemia virus RT, or Thermus thermophilus (Tth) DNA polymerase, a thermostable DNA polymerase with reverse transcriptase activity marketed by Perkin Elmer Cetus, Inc. Typically, the genomic RNA template is heat degraded during the first denaturation step after the initial reverse transcription step, leaving only DNA template. Suitable polymerases for use with a DNA template include, for example, E. coli DNA polymerase I or its Klenow fragment, T4 DNA polymerase, Tth polymerase, and Taq polymerase, a heat-stable DNA polymerase isolated from Thermus aquaticus and commercially available from Perkin Elmer Cetus, Inc. The latter enzyme is widely used in the amplification and sequencing of nucleic acids. The reaction conditions for using Taq polymerase are known in the art and are described in Gelfand, 1989, PCR Technology, supra.
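Because each extension product serves as a template in the next round, repeating the cycle of denaturation, hybridization, and extension amplifies the target roughly geometrically. The sketch below uses the idealized model N = N0 x (1 + E)^n, where the per-cycle efficiency E is an assumed parameter (1.0 meaning perfect doubling). Real reactions fall short of this and plateau, so the numbers are an upper-bound illustration only.

```python
import math


def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_needed(target_copies, start_copies=1, efficiency=1.0):
    """Smallest whole number of cycles reaching target_copies in the ideal model."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the interval (0, 1]")
    ratio = target_copies / start_copies
    return max(0, math.ceil(math.log(ratio) / math.log(1.0 + efficiency)))


print(copies_after(30))                      # ideal doubling for 30 cycles
print(cycles_needed(1e6, efficiency=0.9))    # assumed 90% per-cycle efficiency
```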

Allele Specific PCR

Allele-specific PCR differentiates between target regions differing in the presence or absence of a variation or polymorphism. PCR amplification primers are chosen which bind only to certain alleles of the target sequence. This method is described by Gibbs, Nucleic Acids Res. 17:2437-2448 (1989).
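One common way to make a primer bind preferentially to a single allele is to place the variant base at the primer's 3' terminus, so that a mismatched template is extended poorly. That design choice is an assumption here and is not prescribed by the text above. The sketch builds one forward primer per allele from a hypothetical template sequence.

```python
def allele_specific_primers(sequence, snp_index, alleles, length=20):
    """Return one forward primer per allele, each ending (3' end) on the variant base.

    sequence  -- template strand, 5' to 3'
    snp_index -- zero-based position of the polymorphic base in sequence
    alleles   -- iterable of possible bases at that position, e.g. "CT"
    """
    first = snp_index - length + 1
    if first < 0:
        raise ValueError("not enough upstream sequence for the requested length")
    upstream = sequence[first:snp_index].upper()
    return {allele: upstream + allele for allele in alleles}


# Hypothetical template with a C/T polymorphism at zero-based index 24
template = "ATGGCTAGCTAGGCTTACGATCGACGGATCCATGCA"
for allele, primer in allele_specific_primers(template, 24, "CT").items():
    print(allele, primer)
```

Each allele-specific primer would be paired with a common reverse primer, and amplification from only one of the two reactions indicates which allele is present.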

Allele Specific Oligonucleotide Screening Methods

Further diagnostic screening methods employ the allele-specific oligonucleotide (ASO) screening methods, as described by Saiki et al., Nature 324:163-166 (1986). Oligonucleotides with one or more base pair mismatches are generated for any particular allele. ASO screening methods detect mismatches between variant target genomic or PCR amplified DNA and non-mutant oligonucleotides, showing decreased binding of the oligonucleotide relative to a mutant oligonucleotide. Oligonucleotide probes can be designed that under low stringency will bind to both polymorphic forms of the allele, but which at high stringency, bind to the allele to which they correspond. Alternatively, stringency conditions can be devised in which an essentially binary response is obtained, i.e., an ASO corresponding to a variant form of the target gene will hybridize to that allele, and not to the wild type allele.

Ligase Mediated Allele Detection Method

Target regions of a test subject's DNA can be compared with target regions in unaffected and affected family members by ligase-mediated allele detection. See Landegren et al., Science 241:1077-1080 (1988). Ligase may also be used to detect point mutations in the ligation amplification reaction described in Wu et al., Genomics 4:560-569 (1989). The ligation amplification reaction (LAR) utilizes amplification of specific DNA sequence using sequential rounds of template dependent ligation as described in Wu, supra, and Barany, Proc. Nat. Acad. Sci. 88:189-193 (1990).

Denaturing Gradient Gel Electrophoresis

Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. DNA molecules melt in segments, termed melting domains, under conditions of increased temperature or denaturation. Each melting domain melts cooperatively at a distinct, base-specific melting temperature (Tm). Melting domains are at least 20 base pairs in length, and may be up to several hundred base pairs in length.

Differentiation between alleles based on sequence specific melting domain differences can be assessed using polyacrylamide gel electrophoresis, as described in Chapter 7 of Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, W.H. Freeman and Co., New York (1992), the contents of which are hereby incorporated by reference. Generally, a target region to be analyzed by denaturing gradient gel electrophoresis is amplified using PCR primers flanking the target region. The amplified PCR product is applied to a polyacrylamide gel with a linear denaturing gradient as described in Myers et al., Meth. Enzymol. 155:501-527 (1986), and Myers et al., in Genomic Analysis, A Practical Approach, K. Davies Ed. IRL Press Limited, Oxford, pp. 95-139 (1988), the contents of which are hereby incorporated by reference. The electrophoresis system is maintained at a temperature slightly below the Tm of the melting domains of the target sequences.

In an alternative method of denaturing gradient gel electrophoresis, the target sequences may be initially attached to a stretch of GC nucleotides, termed a GC clamp, as described in Chapter 7 of Erlich, supra. Preferably, at least 80% of the nucleotides in the GC clamp are either guanine or cytosine. Preferably, the GC clamp is at least 30 bases long. This method is particularly suited to target sequences with high Tm's.

Generally, the target region is amplified by the polymerase chain reaction as described above. One of the oligonucleotide PCR primers carries at its 5' end, the GC clamp region, at least 30 bases of the GC rich sequence, which is incorporated into the 5' end of the target region during amplification. The resulting amplified target region is run on an electrophoresis gel under denaturing gradient conditions as described above. DNA fragments differing by a single base change will migrate through the gel to different positions, which may be visualized by ethidium bromide staining.
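The GC clamp preferences stated above (at least 30 bases, at least 80% guanine or cytosine, carried at the 5' end of one primer) can be checked with a short sketch. The clamp and primer sequences below are hypothetical placeholders, not sequences from this disclosure.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_valid_gc_clamp(clamp, min_length=30, min_gc=0.80):
    """Check the preferred clamp criteria: length and GC fraction."""
    return len(clamp) >= min_length and gc_fraction(clamp) >= min_gc


def add_gc_clamp(primer, clamp):
    """Return the primer with the clamp attached at its 5' end."""
    if not is_valid_gc_clamp(clamp):
        raise ValueError("clamp is shorter than 30 bases or below 80% GC")
    return clamp.upper() + primer.upper()


# Hypothetical 30-base clamp and 20-base target-specific primer
clamp = "CGCCCGCCGCGCCCCGCGCCCGGCCCGCCG"
primer = "ATGCATGCATGCATGCATGC"
print(len(clamp), gc_fraction(clamp))
print(add_gc_clamp(primer, clamp))
```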

Temperature Gradient Gel Electrophoresis

Temperature gradient gel electrophoresis (TGGE) is based on the same underlying principles as denaturing gradient gel electrophoresis, except the denaturing gradient is produced by differences in temperature instead of differences in the concentration of a chemical denaturant. Standard TGGE utilizes an electrophoresis apparatus with a temperature gradient running along the electrophoresis path. As samples migrate through a gel with a uniform concentration of a chemical denaturant, they encounter increasing temperatures. An alternative method of TGGE, temporal temperature gradient gel electrophoresis (TTGE or tTGGE) uses a steadily increasing temperature of the entire electrophoresis gel to achieve the same result. As the samples migrate through the gel the temperature of the entire gel increases, leading the samples to encounter increasing temperature as they migrate through the gel. Preparation of samples, including PCR amplification with incorporation of a GC clamp, and visualization of products are the same as for denaturing gradient gel electrophoresis.

Single-Strand Conformation Polymorphism Analysis

Target sequences or alleles at a particular locus can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single stranded PCR products, as described in Orita et al., Proc. Nat. Acad. Sci. 85:2766-2770 (1989). Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products. Single-stranded nucleic acids may refold or form secondary structures which are partially dependent on the base sequence. Thus, electrophoretic mobility of single-stranded amplification products can detect base-sequence difference between alleles or target sequences.

Chemical or Enzymatic Cleavage of Mismatches

Differences between target sequences can also be detected by differential chemical cleavage of mismatched base pairs, as described in Grompe et al., Am. J. Hum. Genet. 48:212-222 (1991). In another method, differences between target sequences can be detected by enzymatic cleavage of mismatched base pairs, as described in Nelson et al., Nature Genetics 4:11-18 (1993). Briefly, genetic material from an animal and an affected family member may be used to generate mismatch-free heterohybrid DNA duplexes. As used herein, "heterohybrid" means a DNA duplex strand comprising one strand of DNA from one animal, and a second DNA strand from another animal, usually an animal differing in the phenotype for the trait of interest. Positive selection for heterohybrids free of mismatches allows determination of small insertions, deletions or other polymorphisms that may be associated with polymorphisms.

Non-gel Systems

Other possible techniques include non-gel systems such as TaqMan™ (Perkin Elmer). In this system, oligonucleotide PCR primers are designed that flank the mutation in question and allow PCR amplification of the region. A third oligonucleotide probe is then designed to hybridize to the region containing the base subject to change between different alleles of the gene. This probe is labeled with fluorescent dyes at both the 5' and 3' ends. These dyes are chosen such that, while in close proximity to each other, the fluorescence of one of them is quenched by the other and cannot be detected. Extension by Taq DNA polymerase from the PCR primer positioned 5' on the template relative to the probe leads to the cleavage of the dye attached to the 5' end of the annealed probe through the 5' nuclease activity of the Taq DNA polymerase. This removes the quenching effect, allowing detection of the fluorescence from the dye at the 3' end of the probe. The discrimination between different DNA sequences arises through the fact that if the hybridization of the probe to the template molecule is not complete, i.e., there is a mismatch of some form, the cleavage of the dye does not take place. Thus only if the nucleotide sequence of the oligonucleotide probe is completely complementary to the template molecule to which it is bound will quenching be removed. A reaction mix can contain two different probe sequences, each designed against different alleles that might be present, thus allowing the detection of both alleles in one reaction. Yet another technique includes an Invader Assay, which includes isothermic amplification that relies on a catalytic release of fluorescence. See Third Wave Technology at www.twt.com.
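When two allele-specific probes are run in one reaction, a genotype can be read from which probe signals rise above background. The sketch below is a simplified illustration: the signal values and the single fixed threshold are hypothetical, and instrument software typically clusters samples rather than applying one cutoff.

```python
def call_genotype(signal_allele1, signal_allele2, threshold):
    """Call a genotype from the end-point fluorescence of two allele-specific probes.

    A probe is treated as 'detected' when its signal meets the threshold,
    which here is an assumed, user-supplied background cutoff.
    """
    allele1_detected = signal_allele1 >= threshold
    allele2_detected = signal_allele2 >= threshold
    if allele1_detected and allele2_detected:
        return "heterozygous"
    if allele1_detected:
        return "homozygous allele 1"
    if allele2_detected:
        return "homozygous allele 2"
    return "no call"


# Hypothetical (allele 1 signal, allele 2 signal) readings for four samples
samples = {"pig_01": (5.2, 0.3), "pig_02": (4.8, 5.1),
           "pig_03": (0.2, 4.9), "pig_04": (0.1, 0.2)}
for name, (s1, s2) in samples.items():
    print(name, call_genotype(s1, s2, threshold=1.0))
```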

Non-PCR Based DNA Diagnostics

The identification of a DNA sequence linked to an allele sequence can be made without an amplification step, based on polymorphisms including restriction fragment length polymorphisms in an animal and a family member. Hybridization probes are generally oligonucleotides which bind through complementary base pairing to all or part of a target nucleic acid. Probes may bind target sequences lacking complete complementarity with the probe sequence, depending on the stringency of the hybridization conditions. The probes are preferably labeled directly or indirectly, such that by assaying for the presence or absence of the probe, one can detect the presence or absence of the target sequence. Direct labeling methods include radioisotope labeling, such as with 32P or 35S. Indirect labeling methods include fluorescent tags, biotin complexes which may be bound to avidin or streptavidin, or peptide or protein tags. Visual detection methods include photoluminescents, Texas red, rhodamine and its derivatives, red leuco dye and 3,3',5,5'-tetramethylbenzidine (TMB), fluorescein and its derivatives, dansyl, umbelliferone and the like, or with horse radish peroxidase, alkaline phosphatase and the like.

Hybridization probes include any nucleotide sequence capable of hybridizing to a porcine chromosome where one of the major effect genes resides, and thus defining a genetic marker linked to one of the major effect genes, including a restriction fragment length polymorphism, a hypervariable region, repetitive element, or a variable number tandem repeat. Hybridization probes can be any gene or a suitable analog. Further suitable hybridization probes include exon fragments or portions of cDNAs or genes known to map to the relevant region of the chromosome.

Preferred tandem repeat hybridization probes for use according to the present invention are those that recognize a small number of fragments at a specific locus at high stringency hybridization conditions, or that recognize a larger number of fragments at that locus when the stringency conditions are lowered. One or more additional restriction enzymes and/or probes and/or primers can be used. Additional enzymes, constructed probes, and primers can be determined by routine experimentation by those of ordinary skill in the art and are intended to be within the scope of the invention.

Although the methods described herein may be described in terms of the use of a single restriction enzyme and a single set of primers, the methods are not so limited. One or more additional restriction enzymes and/or probes and/or primers can be used, if desired. Indeed, in some situations it may be preferable to use combinations of markers giving specific haplotypes. Additional enzymes, constructed probes and primers can be determined through routine experimentation, combined with the teachings provided and incorporated herein.

According to one embodiment of the invention, polymorphisms in major effect genes have been identified which have an association with growth and meat quality. The presence or absence of the markers, in one embodiment, may be assayed by PCR RFLP analysis using, if needed, restriction endonucleases, and amplification primers which may be designed using analogous human, pig or other sequences due to the high homology in the region surrounding the polymorphisms, or may be designed using known sequences (for example, human) as exemplified in GenBank, or even designed from sequences obtained from linkage data from closely surrounding genes based upon the teachings and references herein. The sequences surrounding the polymorphism will facilitate the development of alternate PCR tests in which a primer of about 4-30 contiguous bases taken from the sequence immediately adjacent to the polymorphism is used in connection with a polymerase chain reaction to greatly amplify the region before treatment with the desired restriction enzyme. The primers need not be the exact complement; substantially equivalent sequences are acceptable. The design of primers for amplification by PCR is known to those of skill in the art and is discussed in detail in Ausubel (ed.), Short Protocols in Molecular Biology, Fourth Edition, John Wiley and Sons 1999. The following is a brief description of primer design.

PRIMER DESIGN STRATEGY

Increased use of polymerase chain reaction (PCR) methods has stimulated the development of many programs to aid in the design or selection of oligonucleotides used as primers for PCR. Four examples of such programs that are freely available via the Internet are: PRIMER by Mark Daly and Steve Lincoln of the Whitehead Institute (UNIX, VMS, DOS, and Macintosh), Oligonucleotide Selection Program (OSP) by Phil Green and LaDeana Hiller of Washington University in St. Louis (UNIX, VMS, DOS, and Macintosh), PGEN by Yoshi (DOS only), and Amplify by Bill Engels of the University of Wisconsin (Macintosh only). Generally these programs help in the design of PCR primers by searching for bits of known repeated-sequence elements and then optimizing the Tm by analyzing the length and GC content of a putative primer. Commercial software is also available and primer selection procedures are rapidly being included in most general sequence analysis packages.

Sequencing and PCR Primers

Designing oligonucleotides for use as either sequencing or PCR primers requires selection of an appropriate sequence that specifically recognizes the target, and then testing the sequence to eliminate the possibility that the oligonucleotide will have a stable secondary structure. Inverted repeats in the sequence can be identified using a repeat-identification or RNA-folding program such as those described above (see prediction of Nucleic Acid Structure). If a possible stem structure is observed, the sequence of the primer can be shifted a few nucleotides in either direction to minimize the predicted secondary structure. The sequence of the oligonucleotide should also be compared with the sequences of both strands of the appropriate vector and insert DNA. Obviously, a sequencing primer should only have a single match to the target DNA. It is also advisable to exclude primers that have only a single mismatch with an undesired target DNA sequence. For PCR primers used to amplify genomic DNA, the primer sequence should be compared to the sequences in the GenBank database to determine if any significant matches occur. If the oligonucleotide sequence is present in any known DNA sequence or, more importantly, in any known repetitive elements, the primer sequence should be changed.
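The basic primer checks described above (GC content, an approximate Tm, and a screen for inverted repeats that could form a stem) can be sketched as follows. The Tm uses the simple Wallace rule, 2°C per A or T plus 4°C per G or C, which is a rough estimate for short oligonucleotides and is not claimed to be the method used by the programs named above. The stem and loop lengths are assumed defaults. The primer sequence is CS06F from Example 1, used here only as input data.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough Tm in degrees C by the Wallace rule: 2 x (A+T) + 4 x (G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def has_inverted_repeat(seq, stem=4, min_loop=3):
    """True if a stem-length segment can pair with a downstream segment,
    leaving at least min_loop unpaired bases between them."""
    seq = seq.upper()
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        partner = reverse_complement(seq[i:i + stem])
        if seq.find(partner, i + stem + min_loop) != -1:
            return True
    return False


primer = "ACGTCCAGACTATGTCCCCA"  # CS06F, spaces removed
print(gc_content(primer), wallace_tm(primer), has_inverted_repeat(primer))
```

A primer that fails the inverted-repeat screen can be shifted a few nucleotides and rechecked, as the text suggests. Comparison against GenBank for unintended matches is a separate step not covered by this sketch.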

The methods and materials of the invention may also be used more generally to evaluate animal DNA, genetically type individual animals, and detect genetic differences in animals. In particular, a sample of animal genomic DNA may be evaluated by reference to one or more controls to determine if a polymorphism in one of the sequences is present. Preferably, RFLP analysis is performed with respect to the animal's sequences, and the results are compared with a control. The control is the result of a RFLP analysis of one or both of the sequences of a different animal where the polymorphism of the animal gene is known. Similarly, the genotype of an animal may be determined by obtaining a sample of its genomic DNA, conducting RFLP analysis of the gene in the DNA, and comparing the results with a control. Again, the control is the result of RFLP analysis of one of the sequences of a different animal. The results genetically type the animal by specifying the polymorphism(s) in its gene. Finally, genetic differences among animals can be detected by obtaining samples of the genomic DNA from at least two animals, identifying the presence or absence of a polymorphism in one of the nucleotide sequences, and comparing the results.
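Comparing an animal's RFLP result with controls of known genotype amounts to matching fragment-size patterns. The sketch below simulates a digest of a linear amplicon and matches the observed fragments against control patterns. The sequences are hypothetical, and EcoRI (G^AATTC) is used purely as a familiar illustrative enzyme; it is not the enzyme of the examples in this disclosure.

```python
def digest(sequence, site, cut_offset):
    """Fragment lengths from cutting a linear sequence at every occurrence of site.

    cut_offset is the number of bases into the site at which the enzyme cuts.
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    edges = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


def genotype_from_fragments(observed, controls):
    """Return the control genotype whose fragment pattern matches, else None."""
    pattern = sorted(set(observed))
    for genotype, control_fragments in controls.items():
        if sorted(set(control_fragments)) == pattern:
            return genotype
    return None


# Hypothetical 30 bp amplicons: allele 1 carries the site, allele 2 does not
allele1 = "A" * 10 + "GAATTC" + "T" * 14
allele2 = "A" * 10 + "GAGTTC" + "T" * 14
cut = digest(allele1, "GAATTC", 1)
uncut = digest(allele2, "GAATTC", 1)
controls = {"11": cut, "12": cut + uncut, "22": uncut}

print(cut, uncut)
print(genotype_from_fragments([30, 19, 11], controls))
```

In practice fragment sizes read from a gel are approximate, so a real comparison would allow a size tolerance rather than exact equality.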

These assays are useful for identifying the genetic markers relating to growth and meat quality, as discussed above, for identifying other polymorphisms in the same genes or alleles that may be correlated with other characteristics, and for the general scientific analysis of animal genotypes and phenotypes.

One of skill in the art, once a polymorphism has been identified and a correlation to a particular trait established, will understand that there are many ways to genotype animals for this polymorphism. The design of such alternative tests merely represents optimization of parameters known to those of skill in the art and is intended to be within the scope of this invention as fully described herein.

In accordance with the present invention there may be employed conventional molecular biology, microbiology, and recombinant DNA techniques within the skill of the art. Such techniques are explained fully in the literature. See, e.g., Maniatis, Fritsch & Sambrook, Molecular Cloning: A Laboratory Manual (1982); DNA Cloning: A Practical Approach, Volumes I and II (D. N. Glover ed. 1985); Oligonucleotide Synthesis (M. J. Gait ed. 1984); Nucleic Acid Hybridization (B. D. Hames & S. J. Higgins eds. (1985)); Transcription and Translation (B. D. Hames & S. J. Higgins eds. (1984)); Animal Cell Culture (R. I. Freshney, ed. (1986)); Immobilized Cells And Enzymes (IRL Press, (1986)); B. Perbal, A Practical Guide To Molecular Cloning, (1984).

The following examples serve to better illustrate the invention described herein and are not intended to limit the invention in any way. Those skilled in the art will recognize that there are several different parameters which may be altered using routine experimentation and which are intended to be within the scope of this invention.

EXAMPLE 1

CSTF1 PCR-RFLP Test: Taa I polymorphism

Primers

CS06F: 5' ACG TCC AGA CTA TGT CCC CA 3'
CS06R: 5' CTG TGC GGT CTC GTT CAT C 3'

PCR conditions:

Mix 1:

10X Promega Buffer           1.0 μL
25 mM MgCl₂                  1.0 μL
dNTPs mix (2 mM)             0.5 μL
25 pmol/μL CS06F             0.1 μL
25 pmol/μL CS06R             0.1 μL
dd sterile H₂O               7.23 μL
Taq Polymerase (5 U/μL)      0.07 μL
genomic DNA (12.5 ng/μL)     1.0 μL
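Because per-reaction volumes such as 0.07 μL of Taq polymerase cannot be pipetted individually, Mix 1 is normally prepared as a batch. The following Python sketch scales the CSTF1 Mix 1 volumes listed above to a chosen number of reactions. The function name and the 10% pipetting overage are assumptions made for illustration and are not specified in the protocol.

```python
# Per-reaction volumes (in microliters) of Mix 1 for the CSTF1 assay, as listed above.
# Genomic DNA (1.0 uL) is added separately to each tube and is therefore excluded.
MIX1_CSTF1_UL = {
    "10X Promega Buffer": 1.0,
    "25 mM MgCl2": 1.0,
    "dNTPs mix (2 mM)": 0.5,
    "25 pmol/uL CS06F": 0.1,
    "25 pmol/uL CS06R": 0.1,
    "dd sterile H2O": 7.23,
    "Taq Polymerase (5 U/uL)": 0.07,
}


def master_mix(per_reaction: dict, n_reactions: int, overage: float = 0.10) -> dict:
    """Scale per-reaction volumes to a batch, rounded to 0.01 uL.

    `overage` is an assumed pipetting allowance (10% by default). It is not
    part of the protocol and should be set according to local practice.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in per_reaction.items()}


if __name__ == "__main__":
    batch = master_mix(MIX1_CSTF1_UL, n_reactions=96)
    for name, volume in batch.items():
        print(f"{name:<26}{volume:>9.2f} uL")
    print(f"{'Total':<26}{sum(batch.values()):>9.2f} uL")
```

The listed components other than genomic DNA sum to 10 μL per reaction, which is the volume of Mix 1 combined with DNA in each tube.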

Combine 10 μL of Mix 1 and DNA in a reaction tube. Overlay with mineral oil. Run the following PCR program: 94°C for 3 min; 35 cycles of 94°C for 30 sec, 59°C for 30 sec, and 72°C for 30 sec; followed by a final extension at 72°C for 3 min.

Check 3 μL of the PCR reaction on a standard 2% agarose gel to confirm amplification success and a clean negative control. Product size is approximately 370 base pairs. Digestion is performed using the following procedure:

Taa I Digestion Reaction (10 μL reaction)

PCR product                  6.0 μL
Buffer Y⁺/Tango              1.0 μL
Taa I enzyme (10 U/μL)       0.3 μL
dd sterile H₂O               2.6 μL

Make a cocktail with the buffer, enzyme and water. Add 4 μL to each reaction tube containing the DNA. Incubate at 65°C for at least 4 hours, although the best option is to perform the digestion overnight.

Mix 6 μL of loading dye with 10 μL of the digested PCR product and load 10 μL on a 3% agarose gel.

The expected Taa I pattern is shown in Figure 2.
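The claims recite the Taa I fragment sizes for each allele: 165, 118 and 86 base pairs for a thymine at position 108 of SEQ ID NO: 1, and 251 and 118 base pairs for a cytosine. A minimal Python sketch of scoring a gel lane from these sizes is given below. The function name, the 10 bp sizing tolerance and the use of the shared 118 bp band as a digestion check are assumptions made for illustration only.

```python
# Fragment sizes (bp) recited in the claims for the Taa I digest of the CSTF1 product.
T_ALLELE_BANDS = (165, 118, 86)  # thymine at position 108: one additional cut (165 + 86 = 251)
C_ALLELE_BANDS = (251, 118)      # cytosine at position 108


def call_taa1_genotype(bands_bp, tolerance_bp: int = 10) -> str:
    """Return "T/T", "C/C", "C/T" or "no call" from observed band sizes in bp.

    `tolerance_bp` is an assumed sizing tolerance for reading an agarose gel.
    """
    def seen(size: int) -> bool:
        return any(abs(band - size) <= tolerance_bp for band in bands_bp)

    if not seen(118):
        # Assumption: the 118 bp band is common to both patterns, so its absence
        # is treated here as a failed or incomplete digestion.
        return "no call"
    has_t = seen(165) and seen(86)
    has_c = seen(251)
    if has_t and has_c:
        return "C/T"
    if has_t:
        return "T/T"
    if has_c:
        return "C/C"
    return "no call"


print(call_taa1_genotype([165, 118, 86]))
print(call_taa1_genotype([251, 165, 118, 86]))
```

A heterozygous animal is expected to show the bands of both alleles in the same lane, which is why the two patterns are tested independently before a call is made.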

C20orf43 PCR-RFLP Test: Mwo I polymorphism

Primers

OR15F: 5' CTG GGG CTT TAT GTC ACC AC 3'
OR15R: 5' ACC ACA GAG CAT TCC AAA CA 3'

PCR conditions:

Mix 1:

10X Promega Buffer           1.0 μL
25 mM MgCl₂                  0.4 μL
dNTPs mix (2 mM)             0.5 μL
25 pmol/μL OR15F             0.1 μL
25 pmol/μL OR15R             0.1 μL
dd sterile H₂O               7.83 μL
Taq Polymerase (5 U/μL)      0.07 μL
genomic DNA (12.5 ng/μL)     1.0 μL

Combine 10 μL of Mix 1 and DNA in a reaction tube. Overlay with mineral oil. Run the following PCR program: 94°C for 3 min; 35 cycles of 94°C for 30 sec, 54°C for 30 sec, and 72°C for 30 sec; followed by a final extension at 72°C for 3 min. Check 3 μL of the PCR reaction on a standard 2% agarose gel to confirm amplification success and a clean negative control. Product size is approximately 470 base pairs. Digestion is performed using the following procedure:

Mwo I Digestion Reaction (10 μL reaction)

PCR product                  4.0 μL
Buffer NEB1                  1.0 μL
Mwo I enzyme (5 U/μL)        0.2 μL
dd sterile H₂O               4.8 μL

Make a cocktail with the buffer, enzyme and water. Add 4 μL to each reaction tube containing the DNA. Incubate at 60°C for at least 4 hours.

Mix 6 μL of loading dye with 10 μL of the digested PCR product and load 10 μL on a 3.5% agarose gel.

The expected Mwo I pattern is shown in Figure 3.

Several quantitative trait loci (QTL) for growth and meat quality traits have been discovered on pig chromosome 17 (SSC17). An effort was made to fine map this QTL region, and several genes were mapped to the relevant SSC17 QTL region. Fifteen genes were previously disclosed in PCT/US04/16418, a copy of which is attached herewith. We now update the map with two more genes that were recently mapped: CSTF1 (cleavage stimulation factor, 3' pre-RNA, subunit 1, 50kDa) and C20orf43 (chromosome 20 open reading frame 43). The updated SSC17 map is shown in Figure 1.

In the BY population, CSTF1 had a significant effect on 3 fat traits and on 5 growth traits (including average daily gain on test). CSTF1 genotype 12 is associated with higher values of average, lumbar and tenth rib backfat. Pigs carrying genotype 11 (homozygous for the cut) presented higher values for carcass weight and loin eye area, and lower values for fiber type II ratio (lower values for this trait indicate more muscle). Two additional significant effects of this marker, on average daily gain on test and birth weight, were also detected, with heterozygous pigs presenting higher values for both traits. The results obtained in the BY population therefore suggest that genotype 11 could be regarded as the leaner genotype because it presents not only less backfat but also more muscle area. It would also be the heavier genotype, as indicated by the higher carcass weights. However, heterozygous animals grow faster, which could offset these advantages. Nevertheless, despite growing faster, these animals deposit more fat, which may not be desirable from an industry point of view. This marker may therefore be used in the selection of pigs in two different ways. Where leaner pigs are preferred, selection could favor genotype 11. If growth rate is of greater interest, selection could favor genotype 12.

Table 1 - Significant associations of CSTF1 genotypes with fat and growth related traits; FTYPIIR - Fiber type II ratio

                    CSTF1
Trait               11                  12
Av. backfat         3.25 ± 0.05 a       3.37 ± 0.08 b
Carcass wt.         87.3 ± 0.17 a       86.8 ± 0.32 b
Loin eye area       36.31 ± 0.54 k      34.17 ± 0.77 l
Lumbar bfat.        3.54 ± 0.06 a       3.68 ± 0.10 b
10th rib bfat.      3.08 ± 0.06 c       3.23 ± 0.10 d
ADG Test            0.685 ± 0.006 g     0.705 ± 0.009 h
Birth wt.           1.522 ± 0.04 g      1.631 ± 0.05 h
FTYPIIR             0.981 ± 0.05 k      1.350 ± 0.10 l

Significance levels used: a, b - 0.3; c, d - 0.1; e, f - 0.05; g, h - 0.01; i, j - 0.005; k, l - 0.001; m, n - 0.0005; o, p - 0.0001
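The two selection strategies discussed for CSTF1 in the BY population can be illustrated with a short Python sketch that encodes the genotype means of Table 1 and reports, for a chosen breeding goal, which genotype has the more favorable mean for each trait. The grouping of traits into goals and the direction treated as favorable are assumptions of this sketch, drawn from the discussion of Table 1, and the sketch ignores the standard errors and significance levels.

```python
# Genotype means from Table 1 (BY population).
# Each entry: (mean for genotype 11, mean for genotype 12, goal, favorable direction).
# The goal labels and favorable directions are assumptions made for this illustration.
TABLE1 = {
    "Av. backfat":    (3.25, 3.37, "lean", "lower"),
    "Lumbar bfat.":   (3.54, 3.68, "lean", "lower"),
    "10th rib bfat.": (3.08, 3.23, "lean", "lower"),
    "Loin eye area":  (36.31, 34.17, "lean", "higher"),
    "FTYPIIR":        (0.981, 1.350, "lean", "lower"),
    "Carcass wt.":    (87.3, 86.8, "weight", "higher"),
    "ADG Test":       (0.685, 0.705, "growth", "higher"),
    "Birth wt.":      (1.522, 1.631, "growth", "higher"),
}


def favorable_genotypes(goal: str) -> dict:
    """Map each trait belonging to `goal` to the genotype with the more favorable mean."""
    result = {}
    for trait, (mean_11, mean_12, trait_goal, direction) in TABLE1.items():
        if trait_goal != goal:
            continue
        if direction == "higher":
            result[trait] = "11" if mean_11 > mean_12 else "12"
        else:
            result[trait] = "11" if mean_11 < mean_12 else "12"
    return result


for goal in ("lean", "weight", "growth"):
    print(goal, favorable_genotypes(goal))
```

Under these assumptions the sketch reproduces the reading given in the text: genotype 11 for leanness and carcass weight, and genotype 12 for growth rate.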

For CSTF1, the results detected in the BY and PIC datasets seem to be in complete agreement. As in the BY data, CSTF1 genotype 11 in the PIC datasets is associated with higher carcass weight, less backfat and, consequently, higher lean meat percentage. CSTF1 genotype 22 is the preferred genotype for all growth traits analyzed because animals carrying this genotype are heavier at the end of the test period, present higher values for life time daily gain and daily gain while on test and, consequently, spend fewer days in the test period. As in the BY population, selection can again favor either genotype, depending on the traits being improved. Furthermore, CSTF1 also has a significant association with the color trait ham Minolta, where animals with genotype 22 present higher values, which indicate paler meat. Nevertheless, it is likely that the advantages regarding the growth traits will offset these minor disadvantages, and selection favoring genotype 22, the less frequent genotype, could be envisioned. Table 2 contains the results of the association analysis with CSTF1 in the PIC lines.

Table 2 - Significant associations of CSTF1 detected in the PIC lines

            CSTF1
Trait       11                   12                  22                    P-value
dirtywt     246.9 ± 1.58 c a     243.5 ± 1.55 d      243.3 ± 2.82 b        0.18
hammina     8.03 ± 0.16 e        8.16 ± 0.16 e       8.89 ± 0.32 f         0.04
lmprct      45.63 ± 0.15 a       45.69 ± 0.16 a      45.25 ± 0.28 b        0.34
aloc_f      13.99 ± 0.26 a       13.61 ± 0.25 b      14.24 ± 0.43 a        0.22
endwt       110.9 ± 0.54 e       111.0 ± 0.53 e      113.0 ± 0.96 f        0.10
days        159.6 ± 0.78 e       159.3 ± 0.75 e      156.2 ± 1.39 f        0.07
ldg         669.7 ± 3.42 c       668.6 ± 3.35 e      682.6 ± 6.06 d f      0.09
tdg         857.6 ± 6.19 c       850.4 ± 6.03 e      878.9 ± 10.4 d f      0.04

Significance levels used: a, b - 0.3; c, d - 0.1; e, f - 0.05; g, h - 0.01; i, j - 0.005; k, l - 0.001; m, n - 0.0005; o, p - 0.0001

The variability observed for C20orf43 in the BY population was fairly low (only 24 heterozygous animals were detected in the whole population). It was still possible to map the marker with this low number of heterozygous pigs, but this is likely the main reason why no significant (P < 0.1) associations were found in the BY data. Analysis of the PIC data indicates that pigs carrying C20orf43 genotype 11 presented higher values for Hennessy probe and Aloka backfat thickness, while having lower values for lean meat percentage. Accordingly, C20orf43 genotype 11 was found to be associated with higher weights at the end of the test period, mainly due to the higher life time daily gain and daily gain while on test observed for this genotype. As a consequence, these animals spend fewer days in the test period. Furthermore, they also seem to have paler meat, as indicated by the higher values for ham Minolta. An effect of this marker on pH was also detected, with animals carrying genotype 22 presenting the lowest pH values. These results are shown in Table 3.

Table 3 - Significant associations of C20orf43 detected in the PIC lines

            C20orf43
Trait       11                   12                  22                    P-value
loinph      5.72 ± 0.02 a        5.72 ± 0.01 e       5.69 ± 0.01 b f       0.04
hammina     8.43 ± 0.28 a        8.09 ± 0.16 b       8.12 ± 0.16           0.53
hprofat     15.93 ± 0.37 c       15.24 ± 0.24 d a    15.69 ± 0.25 b        0.11
lmprct      45.41 ± 0.23 a       45.73 ± 0.15 b      45.64 ± 0.15          0.42
aloc_f      14.31 ± 0.38 a       13.80 ± 0.25 b      13.90 ± 0.26          0.44
endwt       112.3 ± 0.85 c       111.8 ± 0.50 c      110.5 ± 0.52 d        0.08
days        157.1 ± 1.22 e       158.2 ± 0.70 c      159.9 ± 0.73 f d      0.06
ldg         680.1 ± 5.36 e       674.2 ± 3.18 a      667.6 ± 3.31 f b      0.08
tdg         870.4 ± 9.30 a c     859.6 ± 5.65 b      852.9 ± 5.83 d        0.24

Significance levels used: a, b - 0.3; c, d - 0.1; e, f - 0.05; g, h - 0.01; i, j - 0.005; k, l - 0.001; m, n - 0.0005; o, p - 0.0001
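Where both markers are genotyped in the same animals, the genotype means of Tables 2 and 3 can be combined into a simple ranking of joint genotypes. The Python sketch below does this for life time daily gain (ldg). It rests on a strong assumption that is not tested in this disclosure, namely that the two marker effects are additive and independent; the tables report each marker separately and give no joint estimates, so the scores are illustrative only.

```python
from itertools import product

# Life time daily gain ("ldg") genotype means from Table 2 (CSTF1) and Table 3 (C20orf43).
LDG_MEANS = {
    "CSTF1": {"11": 669.7, "12": 668.6, "22": 682.6},
    "C20orf43": {"11": 680.1, "12": 674.2, "22": 667.6},
}


def deviations(means: dict) -> dict:
    """Deviation of each genotype mean from the unweighted average of the three means."""
    centre = sum(means.values()) / len(means)
    return {genotype: mean - centre for genotype, mean in means.items()}


def rank_joint_genotypes(marker_means: dict) -> list:
    """Rank (CSTF1, C20orf43) genotype pairs by summed deviations, best first.

    ASSUMPTION: marker effects are additive and independent. This is a
    simplification for illustration, not a result reported in the tables.
    """
    dev = {marker: deviations(means) for marker, means in marker_means.items()}
    scores = {
        (g_cstf1, g_c20): dev["CSTF1"][g_cstf1] + dev["C20orf43"][g_c20]
        for g_cstf1, g_c20 in product(dev["CSTF1"], dev["C20orf43"])
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for (g_cstf1, g_c20), score in rank_joint_genotypes(LDG_MEANS):
    print(f"CSTF1 {g_cstf1} / C20orf43 {g_c20}: {score:+.1f}")
```

The same function could be applied to any other trait reported for both markers by substituting the corresponding means; for traits where lower values are preferred, such as days on test, the ranking would be read from the bottom.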

The results determined in these commercial lines suggest strong associations of both markers with growth related traits (weight at the end of the test period, days until the end of the test period, daily gain from birth to the end of the test period and daily gain while on test) and other meat quality traits. These are all valuable traits for the pork industry. In a most preferred embodiment, both CSTF1 and C20orf43 may be used simultaneously. If this strategy is adopted, it is very likely that selection will be possible not only for growth and fatness traits but also for meat quality traits. The use of CSTF1 and C20orf43 as genetic markers for growth and meat quality is recommended.

The CSTF1 consensus sequence is shown in Figure 4. The position of a single nucleotide polymorphism is indicated in bold.

The C20orf43 consensus sequence is shown in Figure 5. The position of a 22 base pair insertion/deletion is indicated in bold.

EXAMPLE 2

The human CSTF1 gene is sequenced and the information is accessible in GenBank. The GenBank report and sequence information are reprinted below. This information can be used to locate additional primers useful for the invention, using the same method used and described herein.


LOCUS NM_001324 1801 bp mRNA linear PRI 02-MAR-2005

DEFINITION Homo sapiens cleavage stimulation factor, 3' pre-RNA, subunit 1, 50kDa (CSTF1), mRNA.

ACCESSION NM_001324

VERSION NM_001324.1 GI:4557490

KEYWORDS

SOURCE Homo sapiens (human)

ORGANISM Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;

Hominidae; Homo.

REFERENCE 1 (bases 1 to 1801)

AUTHORS Takagaki,Y. and Manley,J.L.

TITLE A polyadenylation factor subunit is the human homologue of the Drosophila suppressor of forked protein

JOURNAL Nature 372 (6505), 471-474 (1994)

PUBMED 7984242

REFERENCE 2 (bases 1 to 1801)

AUTHORS Takagaki,Y. and Manley,J.L.

TITLE A human polyadenylation factor is a G protein beta-subunit homologue

JOURNAL J. Biol. Chem. 267 (33), 23471-23474 (1992)

PUBMED 1358884

REFERENCE 3 (bases 1 to 1801)

AUTHORS Takagaki,Y., MacDonald,C.C., Shenk,T. and Manley,J.L.

TITLE The human 64-kDa polyadenylylation factor contains a ribonucleoprotein-type RNA binding domain and unusual auxiliary motifs

JOURNAL Proc. Natl. Acad. Sci. U.S.A. 89 (4), 1403-1407 (1992)

PUBMED 1741396

COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from L02547.1.

Summary: This gene encodes one of three subunits which combine to form cleavage stimulation factor (CSTF). CSTF is involved in the polyadenylation and 3' end cleavage of pre-mRNAs. Similar to mammalian G protein beta subunits, this protein contains transducin-like repeats.

COMPLETENESS: complete on the 3' end.

FEATURES Location/Qualifiers

source 1..1801

/organism="Homo sapiens"

/mol_type="mRNA"

/db_xref="taxon:9606"

/chromosome="20"

/map="20q13.31"

gene 1..1801

/gene="CSTF1"

/db_xref="GeneID:1477"

/db_xref="MIM:600369"

CDS 182..1477

/gene="CSTF1"

/note="go_component: nucleus [goid 0005634] [evidence TAS] [pmid 1358884]; go_function: RNA binding [goid 0003723] [evidence TAS] [pmid 1741396]; go_process: mRNA cleavage [goid 0006379] [evidence TAS] [pmid 1358884]; go_process: RNA processing [goid 0006396] [evidence TAS] [pmid 1741396]; go_process: mRNA polyadenylylation [goid 0006378] [evidence TAS] [pmid 1358884]"

/codon_start=1

/product="cleavage stimulation factor subunit 1"

/protein_id="NP_001315.1"

/db_xref="GI:4557491"

/db_xref="CCDS:CCDS13452.1"

/db_xref="GeneID:1477"

/db_xref="MIM:600369"

/translation="MYRTKVGLKDRQQLYKLIISQLLYDGYISIANGLINEIKPQSVC

APSEQLLHLIKLGMENDDTAVQYAIGRSDTVAPGTGIDLEFDADVQTMSPEASEYETC

YVTSHKGPCRVATYSRDGQLIATGSADASIKILDTERMLAKSAMPIEVMMNETAQQNM

ENHPVIRTLYDHVDEVTCLAFHPTEQILASGSRDYTLKLFDYSKPSAKRAFKYIQEAE

MLRSISFHPSGDFILVGTQHPTLRLYDINTFQCFVSCNPQDQHTDAICSVNYNSSANM

YVTGSKDGCIKLWDGVSNRCITTFEKAHDGAEVCSAIFSKNSKYILSSGKDSVAKLWE

ISTGRTLVRYTGAGLSGRQVHRTQAVFNHTEDYVLLPDERTISLCCWDSRTAERRNLL

SLGHNNIVRCIVHSPTNPGFMTCSDDFRARFWYRRSTTD"

polyA_signal 1779..1784

/gene="CSTF1"

polyA_site 1801

/gene="CSTF1"

ORIGIN

1 agaggagtgg gaccgatcga tagcgcagcg gtcgcttggc gccctttcag cgtgcgcagt

61 gaacgtgcgc tcggagcggt agattgggca ggattcgcgc ctccattttt ccaggagaga

121 gcgggatacc aagagaaccg gaccagctgc tggcagggaa actgtcttcc ttttctccaa

181 gatgtacaga accaaagtgg gcttgaagga ccgccagcag ctctacaagc tgatcattag

241 ccagctgcta tatgacggct acatcagcat cgccaatggc ctcatcaatg aaatcaagcc

301 tcagtctgtg tgtgcaccct cggagcagct cctgcatctc atcaaactcg gaatggaaaa

361 cgatgacacc gcagttcagt atgcaattgg tcgttcagat actgttgccc ctggcacagg

421 gattgacctg gaatttgatg cagatgttca gactatgtcc ccagaggctt ctgagtacga

481 aacatgctat gtcacatcac ataaaggacc atgccgtgta gctacctata gtagagatgg

541 acagttaata gctactgggt ctgctgatgc ttcgataaag atacttgaca cagagaggat

601 gttggccaaa agtgccatgc caatagaggt catgatgaat gagaccgcac aacaaaatat

661 ggaaaaccac ccagtgattc gaactcttta tgaccatgtg gatgaagtca cgtgccttgc

721 tttccaccca acagaacaga tcctggcttc tggttcaagg gattatactc ttaaattatt

781 tgattattcc aaaccatcag caaaaagagc cttcaaatac attcaggaag ctgaaatgtt

841 acgttccatc tcttttcatc cttctggaga ctttatactt gtcggaactc agcatcctac

901 tcttcgcctt tatgatatca acacctttca atgttttgtc tcttgcaatc ctcaagatca

961 acacaccgat gctatatgtt ccgttaatta caattctagt gccaatatgt acgtaactgg

1021 aagcaaggac ggctgcatca aattatggga tggtgtttca aatcgatgca tcacaacttt

1081 tgagaaagca catgacggtg ctgaagtttg ttctgccatt ttttccaaaa attctaaata

1141 cattctctca agtggaaaag actctgtagc taaactttgg gaaatatcaa cgggacgaac

1201 actggtcaga tacacgggcg cgggtttaag tggacgccag gtgcaqcgga cacaggctgt

1261 gtttaaccac accgaggact atgtgttgct gcccgacgag aggacgatca gtctttgctg

1321 ctgggactcg aggacagccg agcggagaaa cctgctgtcg ttggggcaca acaatattgt

1381 acgctgcata gtgcactccc ccaccaaccc cgggttcatg acgtgcagcg atgacttcag

1441 agcgcggttt tggtaccgga gatcgaccac tgactgagcc accctctccg tagggttctt

1501 tctcgaggac tctaccctcc tcccccacgt cctgtctcag ctgcagtcgt aagtccgtgc

1561 accatccttg acgttttgct gccacctctg tccacattct tcttggattt gtataaaaga

1621 atcttttttt accttgatgt agaatcatgg tggaaaaagt tggaaacaca gatctgtgca

1681 gttctacatt cactgattat tacagtgtga ttttcatcgg ttttgtaagt acaggacttg

1741 ccgtttcttt tgatctcttg attgaaggag gatagggcat taaagtgctt ttgacatgag

1801 g
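As an illustration of how the reprinted record can be used when looking for additional primer sites, the following Python sketch parses ORIGIN lines into a plain sequence and translates from the start of the annotated CDS (182..1477), so that the reading frame can be checked against the /translation qualifier. Only the first four ORIGIN lines are embedded, and the function names are assumptions made for this example; the codon table is the standard genetic code.

```python
import re

# First four ORIGIN lines of NM_001324 as reprinted above (positions 1-240).
ORIGIN_EXCERPT = """
        1 agaggagtgg gaccgatcga tagcgcagcg gtcgcttggc gccctttcag cgtgcgcagt
       61 gaacgtgcgc tcggagcggt agattgggca ggattcgcgc ctccattttt ccaggagaga
      121 gcgggatacc aagagaaccg gaccagctgc tggcagggaa actgtcttcc ttttctccaa
      181 gatgtacaga accaaagtgg gcttgaagga ccgccagcag ctctacaagc tgatcattag
"""

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_origin(text: str) -> str:
    """Strip position numbers and whitespace from ORIGIN lines; return uppercase DNA."""
    return re.sub(r"[^acgtACGT]", "", text).upper()


def translate(dna: str) -> str:
    """Translate complete codons with the standard genetic code ('*' marks a stop)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


sequence = parse_origin(ORIGIN_EXCERPT)
cds_start = 182                      # 1-based start of the CDS feature (182..1477)
coding = sequence[cds_start - 1:]    # the excerpt covers only the first codons
print(len(sequence))
print(translate(coding))
```

The translated residues can be compared with the beginning of the /translation qualifier in the record; agreement indicates that the coordinates and reading frame have been read correctly before primer positions are chosen.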

EXAMPLE 3

The human chromosome 20 open reading frame 43 (C20orf43) gene is sequenced and the information is accessible in GenBank. The GenBank report and sequence information are reprinted below. This information can be used to locate additional primers useful for the invention, using the same method used and described herein.

LOCUS NM_016407 1639 bp mRNA linear PRI 02-MAR-2005

DEFINITION Homo sapiens chromosome 20 open reading frame 43 (C20orf43), mRNA.

ACCESSION NM_016407

VERSION NM_016407.1 GI:7705482

KEYWORDS

SOURCE Homo sapiens (human)

ORGANISM Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae; Homo.

REFERENCE 1 (bases 1 to 1639)

AUTHORS Strausberg,R.L., Feingold,E.A., Grouse,L.H., Derge,J.G., Klausner,R.D., Collins,F.S., Wagner,L., Shenmen,C.M., Schuler,G.D., Altschul,S.F., Zeeberg,B., Buetow,K.H., Schaefer,C.F., Bhat,N.K., Hopkins,R.F., Jordan,H., Moore,T., Max,S.I., Wang,J., Hsieh,F., Diatchenko,L., Marusina,K., Farmer,A.A., Rubin,G.M., Hong,L., Stapleton,M., Soares,M.B., Bonaldo,M.F., Casavant,T.L., Scheetz,T.E., Brownstein,M.J., Usdin,T.B., Toshiyuki,S., Carninci,P., Prange,C., Raha,S.S., Loquellano,N.A., Peters,G.J., Abramson,R.D., Mullahy,S.J., Bosak,S.A., McEwan,P.J., McKernan,K.J., Malek,J.A., Gunaratne,P.H., Richards,S., Worley,K.C., Hale,S., Garcia,A.M., Gay,L.J., Hulyk,S.W., Villalon,D.K., Muzny,D.M., Sodergren,E.J., Lu,X., Gibbs,R.A., Fahey,J., Helton,E., Ketteman,M., Madan,A., Rodrigues,S., Sanchez,A., Whiting,M., Madan,A., Young,A.C., Shevchenko,Y., Bouffard,G.G., Blakesley,R.W., Touchman,J.W., Green,E.D., Dickson,M.C., Rodriguez,A.C., Grimwood,J., Schmutz,J., Myers,R.M., Butterfield,Y.S., Krzywinski,M.I., Skalska,U., Smailus,D.E., Schnerch,A., Schein,J.E., Jones,S.J. and Marra,M.A.

CONSRTM Mammalian Gene Collection Program Team

TITLE Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences

JOURNAL Proc. Natl. Acad. Sci. U.S.A. 99 (26), 16899-16903 (2002)

PUBMED 12477932

REFERENCE 2 (bases 1 to 1639)

AUTHORS Zhang,Q.H., Ye,M., Wu,X.Y., Ren,S.X., Zhao,M., Zhao,C.J., Fu,G., Shen,Y., Fan,H.Y., Lu,G., Zhong,M., Xu,X.R., Han,Z.G., Zhang,J.W., Tao,J., Huang,Q.H., Zhou,J., Hu,G.X., Gu,J., Chen,S.J. and Chen,Z.

TITLE Cloning and functional analysis of cDNAs with open reading frames for 300 previously undefined genes expressed in CD34+ hematopoietic stem/progenitor cells

JOURNAL Genome Res. 10 (10), 1546-1560 (2000)

PUBMED 11042152

COMMENT PREDICTED REFSEQ: The mRNA record is supported by experimental evidence; however, the coding sequence is predicted. The reference sequence was derived from AF161513.1.

FEATURES Location/Qualifiers

source 1..1639

/organism="Homo sapiens"

/mol_type="mRNA"

/db_xref="taxon:9606"

/chromosome="20"

/map="20q13.31"

gene 1..1639

/gene="C20orf43"

/note="synonyms: CDA05, HSPC164"

/db_xref="GeneID:51507"

CDS 71..991

/gene="C20orf43"

/codon_start=1

/product="chromosome 20 open reading frame 43"

/protein_id="NP_057491.1"

/db_xref="GI:7705483"

/db_xref="CCDS:CCDS13453.1"

/db_xref="GeneID:51507"

/translation="MGCDGGTIPKRHELVKGPKKVEKVDKDAELVAQWNYCTLSQEIL

RRPIVACELGRLYNKDAVIEFLLDKSAEKALGKAASHIKSIKNVTELKLSDNPAWEGD

KGNTKGDKHDDLQRARFICPVVGLEMNGRHRFCFLRCCGCVFSERALKEIKAEVCHTC

GAAFQEDDVIVLNGTKEDVDVLKTRMEERRLRAKLEKKPKKPKAAESASKPDVSEEAP

GPSKVKTGKPEEASLDSREKKTNLAPKSTAMNESSSGKAGKPPCGATKRSIADSEESE

AYKSLFTTHSSAKRSKEESAHWVTHTSYCF"

ORIGIN

1 gggatttcgc gggaaatccc ggaagtgaca gctttggggg tttgctgctg gctctgactc

61 ccgtcctgcg atgggttgcg acgggggaac aatccccaag aggcatgaac tggtgaaggg

121 gccgaagaag gttgagaagg tcgacaaaga tgctgaatta gtggcccaat ggaactattg

181 tactctaagt caggaaatat taagacgacc aatagttgcc tgtgaacttg gcagacttta

241 taacaaagat gccgtcattg aatttctctt ggacaaatct gcagaaaagg ctcttgggaa

301 ggcagcatct cacattaaaa gcattaagaa tgtgacagag ctgaagcttt ctgataatcc

361 tgcctgggaa ggggataaag gaaacactaa aggtgacaag cacgatgacc tccagcgggc

421 gcgtttcatc tgccccgttg tgggcctgga gatgaacggc cgacacaggt tctgcttcct

481 tcggtgctgc ggctgtgtgt tttctgagcg agccttgaaa gagataaaag cggaagtttg

541 ccacacgtgt ggggctgcct tccaggagga tgatgtcatc gtgctcaatg gcaccaagga

601 ggatgtggac gtgctgaaga caaggatgga ggagagaagg ctgagagcga agctggaaaa

661 gaaaccaaag aaacccaagg cagcagagtc tgcttcaaaa ccagatgtca gtgaagaagc

721 cccagggcca tcaaaagtta agacagggaa gcctgaagaa gccagccttg attctagaga

781 gaagaaaacc aacttggctc ccaaaagcac agcaatgaat gagagctctt ctggaaaagc

841 tgggaagcct ccgtgtggag ccacaaagag gtccatcgct gacagtgaag aatcggaggc

901 ctacaagtcc ctctttacca ctcacagctc cgccaagcgc tccaaggagg agtctgccca

961 ctgggtcacc cacacgtcct actgcttctg aagcccgcac tgccaccgct cctgccccag

1021 aaggttgttt agtttccacg taggcaggtc gctttgtgcc tctgagtgcg ctgctgtgtg

1081 ttctctctat agttctgtgt cataaagctg tcctgggcca gccttcaagc tgggtgttgg

1141 ccactcttga tgtgaggcgt gtcggttcca ggggggacat gggaggggct gcacagtggc

1201 ccgaggtcat gcttgcttcc acctgcaggt gcatttggtc ctttccatgg ccaggaagcc

1261 ctgtgggctg cactttttat gcttgcagta acaagagact ccagagtcct caccggtgca

1321 gagttggcac atattaatta actaaaattc taatgatctt gctaccagca ataaatcaag

1381 taggccaagt gaaactgggc tttaaaaagg atggatttca aatacactgt gcccactaga

1441 agcttcgaag ggcctcgtcc ctctgctaca gccctgggag gagccaggat ccttgttggt

1501 ctagctaaat actgttaggg gagtgtgccc catctcatca tttcgaagat agcagagtca

1561 tagttgggca cccagtgatt gggttcaaaa ataaagctgg tctgcctctc caaaaaaaaa

1621 aaaaaaaaaa aaaaaaaaa

//

As can be seen from the foregoing, the invention accomplishes at least all of its objectives. All references cited herein are hereby incorporated in their entirety herein by reference.

Claims

What is claimed is:

1. A method of selecting a first pig by marker assisted selection of a quantitative trait locus associated with meat quality, fat and/or growth traits, said method comprising: determining the presence of a locus in the first pig, where the locus is located on chromosome 17 in a region of approximately 90.4 cM to approximately 92.9 cM and is genetically linked to a polymorphic marker; and selecting said first animal comprising the marker and thereby selecting the quantitative trait locus associated with growth traits.

2. The method of claim 1 wherein said marker is a polymorphic restriction site selected from the group consisting of Taa I.

3. A method of identifying an allele that is associated with growth traits comprising: obtaining a tissue or body fluid sample from an animal; amplifying DNA present in said sample comprising a region of chromosome 17 at a region of approximately 90.4 cM to approximately 92.9 cM; and detecting the presence of a polymorphic marker in said chromosomal region, wherein said marker is associated with phenotypic variation in growth traits.

4. A method of determining a genetic marker which may be used to identify and select animals based upon their growth traits comprising: obtaining a sample of tissue or body fluid from said animals, said sample comprising DNA; amplifying DNA present in said sample in a region of chromosome 17 of approximately 90.4cM to approximately 92.9cM, present in said sample from a first animal; determining the presence of a polymorphic allele present in said sample by comparison of said sample with a reference sample or sequence; correlating variability for growth, fatness or meat quality in said animals with said polymorphic allele; so that said allele may be used as a genetic marker for the same in a given group, population, or species.

5. A method of determining a genetic marker which may be used to identify and select animals based upon their meat quality or growth traits comprising: determining a polymorphic allele in useful linkage disequilibrium with the marker disclosed in claim 4.

6. A method of determining a genetic marker which may be used to identify and select animals based upon their meat quality, fatness or growth traits comprising: obtaining a sample of tissue or body fluid from said animals, said sample comprising DNA; amplifying DNA present in said sample in a region of chromosome 17 of approximately 90.4 cM to approximately 92.9 cM, present in said sample from a first animal; determining the presence of a polymorphic allele present in said sample by comparison of said sample with a reference sample or sequence; correlating variability for growth, fatness or meat quality in said animals with said polymorphic allele; so that said allele may be used as a genetic marker for the same in a given group, population, or species.

7. A method of determining a genetic marker which may be used to identify and select animals based upon their meat quality or growth traits comprising: determining a polymorphic allele in useful linkage disequilibrium with the marker disclosed in claim 6.

8. The method of claim 6 wherein said step of determining is selected from the group consisting of: restriction fragment length polymorphism (RFLP) analysis, minisequencing, MALDI-TOF, SINE, heteroduplex analysis, single strand conformational polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE) and temperature gradient gel electrophoresis (TGGE).

9. The method of claim 6 wherein said animal is a pig.

10. The method of claim 6 wherein said amplification includes the steps of: selecting a forward and a reverse primer capable of amplifying said region of chromosome 17.

11. A method for identifying a pig with an increased likelihood of having a phenotype which includes lean traits, wherein a pig with a cytosine at position 108 of SEQ ID NO: 1 shown in Figure 4 is indicative of said pig being more likely to have improved leanness than a pig with a thymine at position 108 of SEQ ID NO: 1, said method comprising: detecting the nucleotide present at position 108 of SEQ ID NO: 1, and relating the nucleotide to the phenotype.

12. The method of claim 11 wherein the nucleotide is detected at position 108 of a PCR amplified sequence using a forward primer and a reverse primer.

13. The method of claim 12 wherein the step of detecting the nucleotide is a method employing allele specific primers.

14. The method of claim 13 wherein said forward primer has an oligonucleotide sequence 5' acgtccagactatgtcccca 3' (SEQ ID NO:3) and said reverse primer has an oligonucleotide sequence 5' ctgtgcggtctcgttcatc 3' (SEQ ID NO:4).

15. The method of claim 11 wherein the step of detecting the nucleotide is selected from the group consisting of restriction fragment length polymorphism (RFLP) analysis, heteroduplex analysis, single strand conformational polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE), and temperature gradient gel electrophoresis (TGGE).

16. The method of claim 11 further comprising the step of amplifying SEQ ID NO: 1 or a region thereof containing said nucleotide.

17. The method of claim 16 further comprising the step of digesting the amplified region with the restriction endonuclease Taa I.

18. The method of claim 17, wherein restriction fragments of 165, 118, and 86 base pairs indicate the presence of a thymine nucleotide at position 108 of SEQ ID NO: 1.

19. The method of claim 17, wherein restriction fragments of 251 and 118 base pairs indicate the presence of a cytosine nucleotide at position 108 of SEQ ID NO: 1.

20. A method for identifying a pig with an increased likelihood of having a phenotype which includes improved growth traits, wherein a pig with a thymine at position 108 of SEQ ID NO: 1 shown in Figure 4 is indicative of said pig being more likely to have improved leanness than a pig with a thymine at position 108 of SEQ ID NO: 1, said method comprising: detecting the nucleotide present at position 108 of SEQ ID NO: 1, and relating the nucleotide to the phenotype.

21. The method of claim 20 wherein the nucleotide is detected at position 108 of a PCR amplified sequence using a forward primer and a reverse primer.

22. The method of claim 20 wherein the step of detecting the nucleotide is a method employing allele specific primers.

23. The method of claim 21 wherein said forward primer has an oligonucleotide sequence 5' acgtccagactatgtcccca 3' (SEQ ID NO:3) and said reverse primer has an oligonucleotide sequence 5' ctgtgcggtctcgttcatc 3' (SEQ ID NO:4).

24. The method of claim 20 wherein the step of detecting the nucleotide is selected from the group consisting of restriction fragment length polymorphism (RFLP) analysis, heteroduplex analysis, single strand conformational polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE), and temperature gradient gel electrophoresis (TGGE).

25. The method of claim 24 further comprising the step of amplifying SEQ ID NO: 1 or a region thereof containing said nucleotide.

26. The method of claim 25 further comprising the step of digesting the amplified region with the restriction endonuclease Taa I.

27. The method of claim 26, wherein restriction fragments of 165, 118, and 86 base pairs indicate the presence of a thymine nucleotide at position 108 of SEQ ID NO: 1.

28. The method of claim 26, wherein restriction fragments of 251 and 118 base pairs indicate the presence of a cytosine nucleotide at position 108 of SEQ ID NO: 1.

29. A method for identifying a pig with an increased likelihood of having a phenotype which includes improved growth traits, wherein a pig with a 22 bp insertion at position 275 of SEQ ID NO: 2 is indicative of said pig being more likely to have the phenotype than a pig with a 22 bp deletion at position 275 of SEQ ID NO: 2, said method comprising: detecting the nucleotide present at position 275 of SEQ ID NO: 2; and relating the nucleotide to the phenotype.

30. The method of claim 29 wherein the nucleotide is detected at position 275 of a PCR sequence using a forward primer and a reverse primer.

31. The method of claim 29 wherein the step of detecting the nucleotide is a method employing allele specific primers.

32. The method of claim 29 wherein said forward primer has an oligonucleotide sequence 5' ctggggctttatgtcaccac 3' (SEQ ID NO:5) and said reverse primer has an oligonucleotide sequence 5' accacagagcattccaaaca 3' (SEQ ID NO:6).

33. The method of claim 29 wherein the step of detecting the nucleotide is selected from the group consisting of restriction fragment length polymorphism (RFLP) analysis, heteroduplex analysis, single strand conformational polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE), and temperature gradient gel electrophoresis (TGGE).

34. The method of claim 29 further comprising the step of amplifying SEQ ID NO: 2 or a region thereof containing said nucleotide.

35. A method for identifying a pig with an increased likelihood of having a phenotype which includes improved meat quality, wherein a pig with a 22 bp deletion at position 275 of SEQ ID NO: 2 is indicative of said pig being more likely to have the phenotype than a pig with a 22 bp deletion at position 275 of SEQ ID NO: 2, said method comprising: detecting the nucleotides present at position 275 of SEQ ID NO: 2; and relating the nucleotide to the phenotype.

36. The method of claim 35 wherein the nucleotide is detected at position 275 of a PCR sequence using a forward primer and a reverse primer.

37. The method of claim 35 wherein the step of detecting the nucleotide is a method employing allele specific primers.

38. The method of claim 35 wherein said forward primer has an oligonucleotide sequence 5' ctggggctttatgtcaccac 3' (SEQ ID NO:5) and said reverse primer has an oligonucleotide sequence 5' accacagagcattccaaaca 3' (SEQ ID NO:6).

39. The method of claim 35 wherein the step of detecting the nucleotide is selected from the group consisting of restriction fragment length polymorphism (RFLP) analysis, heteroduplex analysis, single strand conformational polymorphism (SSCP) analysis, denaturing gradient gel electrophoresis (DGGE), and temperature gradient gel electrophoresis (TGGE).

40. The method of claim 39 further comprising the step of amplifying SEQ ID NO: 2 or a region thereof containing said nucleotide.

41. A method of genetically identifying a marker correlated with desired meat quality, fat and/or growth traits comprising: obtaining a sample of genetic material from said animal; assaying for the presence of a polymorphic allele in a CSTF1 or C20orf43 gene; and correlating whether a statistically significant association exists between said polymorphic allele and desired muscle growth and/or favorable meat quality, fat and/or growth traits in an animal of a particular breed, strain, population, or group whereby said animal can be characterized for said marker.

42. A method of genetically identifying a marker correlated with desired meat quality, fat and/or growth traits comprising: obtaining a sample of genetic material from said animal; assaying for the presence of a polymorphic allele in a region of chromosome 17 approximately 90.4 cM to approximately 92.9 cM; and correlating whether a statistically significant association exists between said polymorphic allele and desired muscle growth and/or favorable meat quality, fat and/or growth traits in an animal of a particular breed, strain, population, or group whereby said animal can be characterized for said marker.
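
The Taa I fragment patterns recited in claims 27 and 28 can be read as a simple genotype call at position 108 of SEQ ID NO: 1. The Python sketch below is one possible illustration. Only the fragment sizes (165, 118, and 86 bp for thymine; 251 and 118 bp for cytosine) come from the claims. The function name `call_genotype`, the size tolerance of 5 bp, and the treatment of a heterozygote as showing the fragments of both alleles are assumptions made for this example.

```python
# Fragment sizes (bp) are those recited in claims 27 and 28.
# The tolerance and the heterozygote rule are illustrative assumptions.
SHARED_FRAGMENT = 118          # present in both digestion patterns
T_SPECIFIC = (165, 86)         # thymine at position 108 of SEQ ID NO: 1
C_SPECIFIC = (251,)            # cytosine at position 108 of SEQ ID NO: 1


def call_genotype(fragments, tolerance=5):
    """Return 'TT', 'CC', 'CT', or None from observed fragment sizes in bp."""

    def observed(expected):
        # A band counts as observed if any measured size is within tolerance.
        return any(abs(size - expected) <= tolerance for size in fragments)

    if not observed(SHARED_FRAGMENT):
        return None  # the common 118 bp band is missing; no call is made

    has_t = all(observed(size) for size in T_SPECIFIC)
    has_c = all(observed(size) for size in C_SPECIFIC)

    if has_t and has_c:
        return "CT"  # assumed: heterozygotes show both patterns
    if has_t:
        return "TT"
    if has_c:
        return "CC"
    return None


if __name__ == "__main__":
    for lane in ([165, 118, 86], [251, 118], [251, 165, 118, 86]):
        print(lane, "->", call_genotype(lane))
```

The sketch returns None rather than a guess whenever a pattern is incomplete, for example when the 165 bp band is seen without the 86 bp band.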