WO2008058013A2 - Compositions and methods for predicting body size in dogs - Google Patents

Compositions and methods for predicting body size in dogs

Info

Publication number
WO2008058013A2
WO2008058013A2 PCT/US2007/083496 US2007083496W WO2008058013A2 WO 2008058013 A2 WO2008058013 A2 WO 2008058013A2 US 2007083496 W US2007083496 W US 2007083496W WO 2008058013 A2 WO2008058013 A2 WO 2008058013A2
Authority
WO
WIPO (PCT)
Prior art keywords
dog
body size
markers
dogs
canfam1
Prior art date
Application number
PCT/US2007/083496
Other languages
French (fr)
Other versions
WO2008058013A3 (en)
Inventor
Elaine A. Ostrander
Nathaniel B. Sutter
Pascale Quignon
Robert K. Wayne
Melissa Gray
Carlos D. Bustamante
Badri Padhukasahasram
Keyan Zhao
Magnus Nordborg
Ebenezer Satyaraj
Paul G. Jones
Dennis F. Lawler
Original Assignee
The Government Of The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services
University Of California At Los Angeles
Cornell University
Mars, Inc.
Nestle Research Center (Nrc-Stl)
University Of Southern California
Priority date
Filing date
Publication date
Application filed by The Government Of The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services, University Of California At Los Angeles, Cornell University, Mars, Inc., Nestle Research Center (Nrc-Stl), University Of Southern California
Publication of WO2008058013A2
Publication of WO2008058013A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers
    • C12Q2600/172 Haplotypes

Definitions

  • This disclosure relates to the field of genetic diagnostics. More particularly, this disclosure relates to compositions and methods for predicting body size in dogs.
  • The domestic dog shows remarkable variation in size, proportion, behavior and coat color, far exceeding that of any other quadruped (Wayne, Evolution, 40:243-261, 1986). Size variation is especially extreme, ranging from the one-kilogram tea-cup poodle to the 100-kilogram English mastiff and surpassing that of all living and extinct species in the dog family, Canidae (Wayne, J. Morphol., 187:301-319, 1986; Wayne et al., J. Hered., 80:447-454, 1989).
  • The insulin-like growth factor-1 (IGF1) gene is a strong genetic determinant of body size in the mouse; the knock-out is just 60% of normal mass (Baker et al., Cell, 75:73-82, 1993; Liu et al., Cell, 75:59-72, 1993) and haploinsufficient mice have lower mass, lower bone mineral density and shorter femurs (He et al., Bone, 38:826-835, 2006). Similarly, a human with a homozygous partial deletion of the gene was born extremely small and grew very slowly throughout childhood (Woods et al., Acta. Paediatr.
  • IGF1 mediates many of the growth-promoting properties of growth hormone (Cohen, Hormone Research, 65:3-8, 2006). Growth hormone activates transcription of IGF1 in the liver (Mathews et al., Proc. Natl.
  • IGF1 binds the type 1 IGF receptor, a tyrosine kinase signal transducer, and induces cell growth, maintenance of cell survival (Kooijman, Cytokine Growth Factor Rev., 17:305-323, 2006), and induction of cellular differentiation (Cohen, Hormone Research, 65:3-8, 2006).
  • Studies seeking a correlation between blood serum IGF1 protein levels and body size have produced inconsistent results.
  • Eigenmann et al. found that standard poodles had a six-fold higher IGF1 protein concentration in plasma than toy poodles (Eigenmann et al., Acta. Endocrinol., 106:448-453, 1984) and Tryfonidou et al. observed a three-fold difference between miniature poodles and Great Danes (Tryfonidou et al., J. Anim. Sci., 81:1568-1580, 2003). However, Favier et al. found no difference in serum IGF1 protein levels between beagles and Great Danes (Favier et al., J. Endocrinol., 170:479-484, 2001).
  • IGF1 maps to chromosome 15 in the dog (Mellersh et al., Mamm. Genome, 11:120-30, 2000). To identify factors contributing to size variation in dogs, a sequence-based marker discovery was initiated across the interval between 34 and 49 Mb of chromosome 15 in the Portuguese water dog. Two quantitative trait loci (QTL, FH2017 at 37.9 Mb and FH2295 at 43.5 Mb) within this region were strongly associated with body size across 463 Portuguese water dogs with 100 radiographic skeletal measurements for size (Chase et al., Proc. Natl. Acad. Sci. USA, 99:9930-9935, 2002; Chase et al., Genome Res., 15:1820-1824, 2005).
  • QTL: quantitative trait loci
  • This disclosure concerns markers that define chromosomal haplotypes that identify an IGF1-associated quantitative trait locus (QTL) associated with adult dog body size.
  • Adult dogs are at least about one year old.
  • An aspect of this disclosure provides markers on chromosome 15 within the interval between canine family version #1 genomic sequence (CanFam1) reference positions 34,000,000 and 49,000,000 flanking the IGF1 locus.
  • The CanFam1 genomic sequence was completed in 2004 by the Broad Institute of MIT/Harvard and Agencourt Bioscience.
  • The CanFam1 sequence is available through UCSC Genome Bioinformatics (for a description of the browser see Kent et al., Genome Res. 12:996-1006, 2002).
  • Alleles and haplotypes are disclosed that are predictive of a directional contribution to body size by an IGF1-associated QTL. Also disclosed are methods for predicting adult body size in dogs, for example, by determining a directional contribution to body size by the QTL, using the disclosed markers. Kits for performing such methods are also disclosed.
  • FIG. 1A is a graph showing localization of a QTL associated with adult body size in Portuguese water dogs to a position on chromosome 15 corresponding to the IGF1 locus.
  • FIGS. 1B and 1C are graphs showing correlation of body size and serum IGF with different haplotypes segregating in a population of Portuguese water dogs.
  • FIGS. 2A-2D are a series of graphs showing localization of a QTL associated with adult body size across dog breeds.
  • FIG. 3 is a graph showing localization on chromosome 15 of a QTL associated with adult body size using a Fisher's exact test.
  • FIG. 4 is a table showing the identification of chromosome haplotypes (20 markers) associated with small and large body size in a variety of small and giant dog breeds.
  • FIGS. 5A and 5B are graphs showing association of specified haplotypes with body size in several large breeds.
  • FIG. 6 is a table showing the identification of chromosome haplotypes (6 markers) associated with small and large body size in a variety of dog breeds.
  • FIG. 7 is an ancestral recombination graph.
  • SEQ ID NOS: 1-20 represent the polymorphic nucleotide sequences of alternative alleles of 20 single nucleotide polymorphisms (SNPs).
  • SEQ ID NO: 21 represents the polynucleotide sequence of a SINE-Cf insertion element at CanFam1 reference position 44,228,010.
  • SEQ ID NO: 22 represents the Boxer reference polynucleotide sequence containing the CA simple repeat length polymorphism at position 44,283,669. The illustrated sequence is primed from the amplification primer FH5934F.
  • The CA repeat is shown here as a (TG) tract; the reverse-complementary (CA) repeat is present on the opposite strand of DNA.
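  • The relationship between a (TG) tract on one strand and a (CA) tract on the other can be illustrated with a minimal reverse-complement sketch. The five-unit tract below is a hypothetical stand-in and is not the length of the tract in SEQ ID NO: 22; the function name is likewise an assumption made for illustration.

```python
# Map each DNA base to its Watson-Crick partner.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Hypothetical five-unit (TG) tract; the real tract length is not assumed here.
tg_tract = "TG" * 5
ca_tract = reverse_complement(tg_tract)
```

  • Reading the opposite strand in its own 5' to 3' direction is what turns a run of TG units into a run of CA units of the same length.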
  • SEQ ID NOS: 23-250 represent additional polymorphic nucleotide sequences of alternative alleles.
  • SEQ ID NOS: 251 and 252 are PCR primers.
  • This disclosure provides representative markers, and alleles thereof, that correspond to and identify a locus on dog chromosome 15 that is associated with adult body size.
  • The markers and alleles described herein were located by general mapping of a quantitative trait locus (QTL) that constitutes a major effect locus for size in the dog. Successively finer mapping of chromosome 15, which contained the QTL, showed that the region spans and encompasses the IGF1 locus. The magnitude of the contribution of the IGF1-associated QTL differs between breeds, and the ultimate size phenotype is produced by the expression of multiple genes and their interactions with each other (gene x gene interactions) and with environmental factors (gene x environment interactions).
  • the markers and alleles disclosed herein can be used to define haplotypes that provide information regarding expected adult body size in dogs.
  • markers are associated with body size across breeds of dog.
  • the disclosed markers provide the means for identifying the genotype of a subject dog and thereby providing a means of assessing the dog's adult size prior to maturity.
  • A single IGF1 allele is carried by almost all dogs from the sampled small breeds, which strongly implies that the same causal variant is responsible for the phenotype of diminished body size. Furthermore, the dominance of a single unique haplotype in a panel of phylogenetically divergent small dog breeds and its near absence in giant dogs indicates that the mutation predates the common origin of these small breeds and likely evolved early in the history of domestic dogs or conceivably in gray wolves.
  • The markers, and alleles thereof, provide a simple, inexpensive and reliable means of identifying the haplotype associated with the IGF1 locus on chromosome 15. By identifying the chromosome haplotype in this region, it is possible to predict whether the IGF1-associated QTL contributes to small or large size of the dog.
  • One aspect of this disclosure concerns markers (and alleles thereof) localized to an interval of dog chromosome 15 associated with an IGF1-associated QTL that provides a directional contribution to adult body size in dogs.
  • the marker or markers
  • The marker includes polymorphic nucleotide sequences situated in an interval between CanFam1 reference position 44,199,850 and CanFam1 reference position 44,284,186.
  • The marker is localized to a position between CanFam1 reference position 44,212,792 and CanFam1 reference position 44,278,140.
  • Exemplary markers include polymorphic nucleotide sequences at CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669. Kits including probes that detect the markers described herein are also a feature of this disclosure.
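  • The nested chromosome 15 intervals quoted above can be compared against the exemplary marker positions with a short sketch. The interval labels ("broad", "intermediate", "narrow") and the treatment of the endpoints as inclusive are assumptions made for illustration; only the numeric positions come from the text.

```python
# Chromosome 15 intervals quoted in the text (reference positions).
# The labels are hypothetical names chosen for this sketch.
INTERVALS = {
    "broad": (34_000_000, 49_000_000),
    "intermediate": (44_199_850, 44_284_186),
    "narrow": (44_212_792, 44_278_140),
}

# Exemplary marker positions quoted in the text.
EXEMPLARY_MARKERS = [44_226_324, 44_228_010, 44_228_468, 44_283_669]


def intervals_containing(position: int) -> list[str]:
    """List the labels of every interval that contains the position.

    Endpoints are treated as inclusive, which is an assumption.
    """
    return [
        name
        for name, (start, end) in INTERVALS.items()
        if start <= position <= end
    ]


for marker in EXEMPLARY_MARKERS:
    print(marker, intervals_containing(marker))
```

  • Under these assumptions, the marker at 44,283,669 falls inside the two wider intervals but outside the narrowest one, whose upper bound is 44,278,140.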
  • Another aspect of this disclosure concerns a method for predicting adult body size in a dog.
  • The method can include genotyping a sample obtained from a subject dog for one or more markers on chromosome 15 within the interval between CanFam1 reference positions 34,000,000 and 49,000,000, e.g., that spans the IGF1 locus.
  • the markers are chosen to individually or collectively identify a haplotype associated with body size in a plurality of inbred dog breeds.
  • the haplotype is correlated with adult body size providing a prediction of the adult body size of the subject dog.
  • The selected markers localize to an interval on chromosome 15 between CanFam1 reference positions 44,000,000 and 45,000,000.
  • The markers are localized to an interval on chromosome 15 between CanFam1 reference positions 44,199,850 and 44,284,186. In certain embodiments, the markers are localized to an interval on chromosome 15 between CanFam1 reference position 44,212,792 and CanFam1 reference position 44,278,140. Exemplary markers include those described in Table 1. In specific examples, the markers include polymorphic markers at one or more of CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669. The methods can include identifying a haplotype on chromosome 15.
  • The haplotype is correlated with adult body size by comparing the haplotype to an index of average body size by breed. In some embodiments, haplotypes correlating with different body size segregate in an inbred population of dogs of which the subject dog is a member. In an embodiment, the identified haplotype is correlated with reduced serum IGF1 expression or reduced IGF1 function.
  • This disclosure also provides methods for determining the directional contribution of a QTL associated with adult body size in the dog.
  • Such methods can include genotyping a sample obtained from a subject dog for one or more markers, which markers individually or collectively identify a haplotype on chromosome 15 within the interval from CanFam1 reference position 34,000,000 to CanFam1 reference position 49,000,000 that is correlated with a directional contribution to body size by the QTL, thereby determining the directional contribution to body size by the QTL.
  • Such methods can be used to predict adult body size by correlating the haplotype with an adult body size in the subject dog.
  • the chromosome 15 haplotype can be correlated with the directional contribution to body size by the QTL by comparing the haplotype to an index of average body size by breed.
  • at least two haplotypes correlating with different body size are segregating in an inbred population of dogs of which the subject dog is a member.
  • dogs identified by the disclosed methods as having a desired body size can be crossed to produce progeny dogs with a desired adult body size.
  • Allele: An alternate form of a gene or locus.
  • A locus can have many different alleles, which can differ from each other by a single base substitution, deletion or addition, or by the substitution, deletion or addition of several or even many nucleotides.
  • Exemplary alleles include the polymorphisms shown in Tables 1 and 3. As illustrated by the sequences in these tables, at several places within the identified region of the CanFam1 genomic map, there is variation in the DNA sequence among dogs.
  • SEQ ID NO: 1 in Table 1 shows the two common alleles at CanFam1 reference position 44,212,792: one allele contains the sequence TAATGATGCT C ACACTTGGAA (SEQ ID NO: 1) and the other allele found at that same reference position is TAATGATGCT T ACACTTGGAA (SEQ ID NO: 1). These alleles differ at the single nucleotide set apart by spaces (C versus T).
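  • The single-base difference between the two alleles quoted above can be located programmatically. The following is a minimal sketch; the function and variable names are assumptions, and the sequences are the two quoted alleles with the separating spaces removed.

```python
def differing_positions(allele_a: str, allele_b: str) -> list[int]:
    """Return the 0-based indices at which two equal-length sequences differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("sequences must have the same length")
    return [
        index
        for index, (base_a, base_b) in enumerate(zip(allele_a, allele_b))
        if base_a != base_b
    ]


# The two alleles quoted in the text, written without the spacing.
allele_c = "TAATGATGCTCACACTTGGAA"
allele_t = "TAATGATGCTTACACTTGGAA"

snp_sites = differing_positions(allele_c, allele_t)
```

  • The index returned is relative to the 21-nucleotide sequence shown, not to the chromosome; mapping it to a genomic coordinate would require knowing where the sequence starts, which is not assumed here.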
  • Amplifying a nucleic acid molecule: To increase the number of copies of a nucleic acid sequence, such as a region of chromosome 15 from CanFam1, a gene or a fragment of a gene. In some instances the amplified region can be referred to as an amplicon and it can contain one or more genetic markers. Table 2 describes several amplicons that were made from CanFam1.
  • Array: An arrangement of molecules, such as biological macromolecules (such as polypeptides or nucleic acids) or biological samples (such as tissue sections), in addressable locations on or in a substrate.
  • A "microarray" is an array that is miniaturized so as to require or be aided by microscopic examination for evaluation or analysis. Arrays are sometimes called DNA chips or biochips.
  • the array of molecules makes it possible to carry out a very large number of analyses on a sample at one time.
  • one or more molecules such as an oligonucleotide probe
  • the number of addressable locations on the array can vary, for example from a few (such as three) to at least six, at least 20, at least 25, or more.
  • an array includes nucleic acid molecules, such as oligonucleotide sequences that are at least 15 nucleotides in length, such as about 15-40 nucleotides in length, such as at least 18 nucleotides in length, at least 21 nucleotides in length, or even at least 25 nucleotides in length.
  • The molecule includes oligonucleotides attached to the array via their 5'- or 3'-end.
  • each arrayed sample is addressable, in that its location can be reliably and consistently determined within the at least two dimensions of the array.
  • the feature application location on an array can assume different shapes.
  • the array can be regular (such as arranged in uniform rows and columns) or irregular.
  • the location of each sample is assigned to the sample at the time when it is applied to the array, and a key may be provided in order to correlate each location with the appropriate target or feature position.
  • ordered arrays are arranged in a symmetrical grid pattern, but samples could be arranged in other patterns (such as in radially distributed lines, spiral lines, or ordered clusters).
  • Addressable arrays usually are computer readable, in that a computer can be programmed to correlate a particular address on the array with information about the sample at that position (such as hybridization or binding data, including for instance signal intensity).
  • the individual features in the array are arranged regularly, for instance in a Cartesian grid pattern, which can be correlated to address information by a computer.
  • Exemplary arrays that are useful for detecting the haplotype of a dog include one or more nucleic acid sequences that target the polymorphisms shown in Table 1 or 3.
  • an array is made by fixing labeled probes that hybridize to SEQ ID NOS: 1-20 to a solid support.
  • cDNA (complementary DNA): A piece of DNA corresponding in sequence to a messenger RNA extracted from a cell. cDNA can be produced by reverse transcription of cellular RNA. Typically, a cDNA lacks internal, non-coding segments (introns) and regulatory sequences which determine transcription.
  • Correlation: A correlation between a phenotypic trait and the presence or absence of a genetic marker (or haplotype or genotype) can be observed by measuring the phenotypic trait and comparing it to data showing the presence or absence of one or more genetic markers. Some correlations are stronger than others, meaning that in some instances all dogs having large adult body size will display a particular genetic marker (i.e., 100% correlation). In other examples the correlation will not be as strong, meaning that dogs having large adult body size (either compared within an inbred line of dogs, or within a mixed breed population) will only display a particular genetic marker 90%, 85%, 70%, 60%, 55%, or 50% of the time.
  • A haplotype, which contains information relating to the presence or absence of multiple markers, can also be correlated to adult dog body size. Examples of correlations between genetic markers and haplotypes are shown in the figures. Correlations can also be described using various statistical analyses, such as Spearman's method described in Example 7 or the Mann-Whitney U test described in Example 4.
  • Directional Contribution: A contribution to a trait by a gene or QTL that can be measured directionally along a linear scale. For example, adult body size in dogs can be measured along a linear scale measured in weight. A gene or QTL makes a directional contribution towards size if its expression contributes to larger (or conversely, smaller) size.
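  • As a rough illustration of how a rank correlation between marker data and body size might be computed, the sketch below implements the textbook Spearman coefficient (the Pearson correlation of ranks, with tied values given their average rank). It is not the procedure of Example 7, which is not reproduced here, and the allele counts and weights are invented purely for illustration.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: copies of a hypothetical "small" allele (0, 1 or 2) per dog
# and that dog's adult weight in kilograms.
allele_copies = [2, 2, 1, 1, 0, 0]
weights_kg = [3.0, 4.5, 12.0, 20.0, 35.0, 50.0]
rho = spearman_rho(allele_copies, weights_kg)
```

  • A coefficient near -1 or +1 indicates a strongly monotonic relationship in either direction; assessing significance would need an additional test that this sketch does not attempt.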
  • DNA: deoxyribonucleic acid
  • RNA: ribonucleic acid
  • the repeating units in DNA polymers are four different nucleotides, each of which includes one of the four bases, adenine, guanine, cytosine and thymine bound to a deoxyribose sugar to which a phosphate group is attached.
  • Triplets of nucleotides, referred to as codons, in DNA molecules code for amino acids in a polypeptide.
  • codon is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the DNA sequence is transcribed.
  • Deletion: The removal of one or more nucleotides from a nucleic acid sequence (or one or more amino acids from a protein sequence), the regions on either side of the removed sequence being joined together.
  • Genotype: The set of alleles present in a subject at one or more loci under investigation. At any one autosomal locus a genotype will be either homozygous (with two identical alleles) or heterozygous (with two different alleles).
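  • The homozygous/heterozygous distinction in the genotype definition above can be expressed as a small sketch. The function names and the dictionary layout (locus mapped to a pair of alleles) are assumptions made for illustration; the example locus and its C and T alleles are the ones quoted earlier for reference position 44212792.

```python
def zygosity(allele_1: str, allele_2: str) -> str:
    """Classify a pair of alleles at one autosomal locus."""
    return "homozygous" if allele_1 == allele_2 else "heterozygous"


def genotype_calls(genotype: dict) -> dict:
    """Classify every locus in a genotype given as {locus: (allele, allele)}."""
    return {locus: zygosity(*alleles) for locus, alleles in genotype.items()}


# Hypothetical dog carrying one C and one T allele at the quoted position.
example_genotype = {44212792: ("C", "T")}
calls = genotype_calls(example_genotype)
```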
  • Haplotype: The set of alleles present at linked loci (or nucleotide changes within a gene) that are found together on a single chromosome homolog. Haplotypes can be recognized by detecting more than one polymorphism. For example, several haplotypes are shown in FIGS. 4 and 6.
  • Hybridization: To form base pairs between complementary regions of two strands of DNA, RNA, or between DNA and RNA, thereby forming a duplex molecule or hybridization complex.
  • Insertion: The addition of one or more nucleotides to a nucleic acid sequence, or the addition of one or more amino acids to a protein sequence.
  • Deletion refers to the subtraction of one or more nucleotides from a nucleic acid sequence, or the subtraction of one or more amino acids from a protein sequence.
  • Substitution refers to the replacement of one nucleotide with a different nucleotide in a nucleic acid sequence, or the replacement of one amino acid with another amino acid in a protein sequence.
  • Isolated: An "isolated" biological component, such as a nucleic acid molecule (or a protein or organelle) has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, such as other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles.
  • Nucleic acid molecules and proteins that have been "isolated" include nucleic acid molecules and proteins purified by standard purification methods. The term also embraces nucleic acid molecules and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acid molecules and proteins.
  • Label: An agent capable of detection, for example by ELISA, spectrophotometry, flow cytometry, or microscopy.
  • A label can be attached to a nucleic acid molecule, thereby permitting detection of the nucleic acid molecule.
  • Labels include, but are not limited to, radioactive isotopes, enzyme substrates, co-factors, ligands, chemiluminescent agents, fluorophores, haptens, enzymes, and combinations thereof. Methods for labeling and guidance in the choice of labels appropriate for various purposes are discussed for example in Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989) and Ausubel et al. (In Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1998).
  • Linkage: The association of two or more loci (and/or traits) at positions on the same chromosome, such that recombination between the two loci is reduced to a proportion significantly less than 50%.
  • the term linkage can also be used in reference to the association between one or more loci and a trait if an allele (or alleles) and the trait, or absence thereof, are observed together in significantly greater than 50% of occurrences.
  • a linkage group is a set of loci, in which all members are linked either directly or indirectly to all other members of the set.
  • Linkage Disequilibrium: Co-occurrence of two genetic loci (e.g., markers) at a frequency greater than expected for independent loci based on the allele frequencies.
  • Linkage disequilibrium (LD) typically occurs when two loci are located close together on the same chromosome.
  • alleles of two genetic loci such as a marker locus and a causal locus
  • the allele observed at one locus is predictive of the allele found at the other locus (for example, a causal locus contributing to a phenotypic trait).
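  • The idea of co-occurrence "greater than expected for independent loci" can be made concrete with two standard population-genetics measures, D and r². These measures are not named in the text; they are brought in here as a common way to quantify linkage disequilibrium, and the haplotype counts used below are invented.

```python
def ld_statistics(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, r_squared) from counts of the four two-locus haplotypes.

    D is the observed AB haplotype frequency minus the frequency expected
    if alleles A and B were independent. r_squared rescales D squared by
    the allele-frequency variances; it is NaN if either locus is monomorphic.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denom if denom > 0 else float("nan")
    return d, r_squared


# Invented counts: alleles A and B always travel together.
d_linked, r2_linked = ld_statistics(50, 0, 0, 50)
# Invented counts: all four haplotypes equally common (no association).
d_free, r2_free = ld_statistics(25, 25, 25, 25)
```

  • An r² near 1 means the allele at one locus is highly predictive of the allele at the other, which is the property that lets a marker locus stand in for a causal locus.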
  • Marker or Genetic Marker: A nucleic acid at a known location on a chromosome, which is associated with a specified gene or trait. Typically markers are highly polymorphic and the variant forms (or alleles) can be identified by a simple and reproducible assay. The term marker can also be used to refer to the alleles and/or the polymorphisms shown in Tables 1 and 3.
  • Microsatellite or Simple Sequence Repeat: A very short unit sequence of nucleotides that is repeated multiple times in tandem. Microsatellite sequences are present throughout the genome and are highly polymorphic in terms of length (number) of the repeated nucleotides. A polymorphism at a microsatellite locus is also referred to as a Simple Sequence Length Polymorphism (SSLP). An exemplary simple repeat sequence is shown in SEQ ID NO: 22.
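  • Because a microsatellite allele is characterized by the number of tandem copies of its repeat unit, one simple way to score it in a sequence read is to find the longest uninterrupted run of the unit. The sketch below does this with a regular expression; the function name and the test sequence are hypothetical, and real genotyping typically infers length from amplicon size rather than from raw string matching.

```python
import re


def longest_repeat(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of unit found in seq."""
    pattern = re.compile(f"(?:{re.escape(unit.upper())})+")
    runs = [
        len(match.group(0)) // len(unit)
        for match in pattern.finditer(seq.upper())
    ]
    return max(runs, default=0)


# Hypothetical read containing a four-unit (TG) tract.
example_read = "AAATGTGTGTGCC"
tg_units = longest_repeat(example_read, "TG")
```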
  • Multifactorial: A trait controlled by at least two factors, which can be genetic or environmental (for example, body weight).
  • Polygenic traits, which are phenotypes that result from interactions among the products of two or more genes with alternative alleles, represent a subset of multifactorial traits.
  • Nucleic acid molecules: A deoxyribonucleotide or ribonucleotide polymer including, without limitation, cDNA, mRNA, genomic DNA, and synthetic (such as chemically synthesized) DNA.
  • The nucleic acid molecule can be double-stranded or single-stranded. Where single-stranded, the nucleic acid molecule can be the sense strand or the antisense strand. In addition, the nucleic acid molecule can be circular or linear.
  • the disclosure includes isolated nucleic acid molecules that include specified loci associated with adult body size in dogs. Such molecules can include at least 10, at least 15, at least 20, at least 21, at least 25, at least 30, at least 35, at least 40, at least 45 or at least 50 consecutive nucleotides of these sequences or more.
  • Nucleotide: Includes, but is not limited to, a monomer that includes a base linked to a sugar, such as a pyrimidine, purine or synthetic analogs thereof, or a base linked to an amino acid, as in a peptide nucleic acid (PNA).
  • a nucleotide is one monomer in a polynucleotide.
  • a nucleotide sequence refers to the sequence of bases in a polynucleotide.
  • Oligonucleotide: A plurality of nucleotides joined by native phosphodiester bonds, between about 6 and about 300 nucleotides in length.
  • An oligonucleotide analog refers to moieties that function similarly to oligonucleotides but have non-naturally occurring portions.
  • oligonucleotide analogs can contain non-naturally occurring portions, such as altered sugar moieties or inter-sugar linkages, such as a phosphorothioate oligodeoxynucleotide.
  • Particular oligonucleotides and oligonucleotide analogs can include linear sequences up to about 200 nucleotides in length, for example a sequence (such as DNA or RNA) that is at least 6 bases, for example at least 8, at least 10, at least 15, at least 20, at least 21, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 100 or even at least 200 bases long, or from about 6 to about 50 bases, for example about 10-25 bases, such as 12, 15, 20, 21, or 25 bases.
  • these oligonucleotides are engineered to bind to the markers described herein.
  • Oligonucleotide probe: A short sequence of nucleotides, such as at least 8, at least 10, at least 15, at least 20, at least 21, at least 25, or at least 30 nucleotides in length, used to detect the presence of a complementary sequence by molecular hybridization.
  • Oligonucleotide probes include a label that permits detection of oligonucleotide probe:target sequence hybridization complexes.
  • Exemplary probes that are useful for detecting alleles that can be used to predict the body size of dogs include, for example, sequences that are complementary to any one of the sequences shown in Tables 1 and 3. Such probes can be used in various combinations in arrays so that the haplotype of a dog can be identified. For example, probes that target the sequences identified in Table 1 can be used in an array to identify the haplotypes shown in FIG. 4.
  • Parent or Parental: An animal that is used in the initial cross of a multi-generational breeding program.
  • Phenotype: The physical manifestation of a subject's genotype.
  • The phenotype for a particular trait, such as adult dog body size, can be determined predominantly by the genotype at a single locus, at two or more loci, or as the result of interactions between the genotype and the environment.
  • Polymorphism: As a result of mutations, a gene sequence can differ among individuals. The differing sequences are referred to as alleles. The alleles that are present at a given locus (a specific point within a nucleic acid sequence) are referred to as the individual's genotype. Some loci vary considerably among individuals. If a locus has two or more alleles whose frequencies each exceed 1% in a population, the locus is said to be polymorphic. The polymorphic site is termed a polymorphism. The term polymorphism also encompasses variations that produce gene products with altered function, that is, variants in the gene sequence that lead to gene products that are not functionally equivalent. This term also encompasses variations that produce no gene product, an inactive gene product, or increased or decreased activity gene product or even no biological effect.
  • Polymorphisms can be referred to, for instance, by the nucleotide position at which the variation exists, by the change in amino acid sequence caused by the nucleotide variation, or by a change in some other characteristic of the nucleic acid molecule or protein that is linked to the variation. Polymorphisms can be causative (actually involved in or influencing the condition or trait to which the polymorphism is linked) or associative (linked to but not having any direct involvement in or influence on the condition or trait to which the polymorphism is linked).
  • Polymorphisms useful as genetic markers include the exemplary SNP, SINE and microsatellite (SSLP) polymorphisms disclosed herein, as well as numerous alternatives, including for example, minisatellites, restriction fragment length polymorphisms (RFLPs), restriction fragment length variants (RFLVs), single strand conformation polymorphisms (SSCPs), amplification length polymorphisms (AFLPs), and the like.
  • Primers Short nucleic acid molecules, for instance DNA oligonucleotides 10-100 nucleotides in length, such as about 15, 20, 21, 25, 30 or 50 nucleotides or more in length. Primers can be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand. Primer pairs can be used for amplification of a nucleic acid sequence, for example the CanFam1 regions described in Tables 1-3, such as by PCR or other nucleic acid amplification methods known in the art. In some instances primers can include a labeling moiety and upon hybridization and amplification the resulting amplified DNA contains the label. When primers are used in this way they can also be referred to as probes.
  • PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, MA).
  • purified does not require absolute purity; rather, it is intended as a relative term.
  • a purified nucleic acid preparation is one in which the specified nucleic acid is more pure than the nucleic acid is in its natural environment within a cell.
  • a preparation of a nucleic acid is purified such that the specified nucleic acid represents at least 50% of the total nucleic acid content of the preparation.
  • Quantitative Trait Loci The location of one or more genes that are involved in the inheritance of quantitative traits. Quantitative traits are traits that are measured on a continuous scale, such as body size, which can take any number of different values. Though not necessarily genes themselves, quantitative trait loci (QTLs) are stretches of DNA that are closely linked to the genes that underlie the trait in question. QTLs can be molecularly identified (for example, with PCR) to help map regions of the genome that contain genes involved in specifying a quantitative trait. Markers (specific nucleic acid sequences) found within QTLs can be statistically correlated to various phenotypic traits, such as dog size, and such markers can then be used to predict the adult size of a dog.
  • Sample A biological specimen, such as those containing genomic DNA, RNA (including mRNA), protein, or combinations thereof. Examples include, but are not limited to, peripheral blood, urine, saliva, tissue biopsy, surgical specimen, amniocentesis samples, and autopsy material.
  • SINE Short interspersed element a few hundred base pairs in size, which belongs to a family of retrotransposons dispersed throughout the genome of many mammals, including domestic dogs. Presence or absence of a SINE at a specific site in the genome constitutes a detectable polymorphism. In dogs the most prevalent SINE family is designated SINEC. Dog SINEC elements are composed of two major subfamilies, SINEC1 and SINEC2, as well as a number of subfamilies. SINE insertion sites can be detected by amplifying the putative SINE insertion site using sequences flanking the SINE as primers. After amplification, the presence of the element is detected, for example, by Southern hybridization.
  • SNP Single nucleotide polymorphism
  • Subject Living multi-cellular vertebrate organisms, a category that includes human and non-human mammals (such as veterinary subjects).
  • a subject is a dog, such as a pure breed dog, for example a dachshund, collie, terrier, or one of the breeds described in the figures.
  • Target sequence A sequence of nucleotides located in a particular region in a genome (such as a human genome or the genome of any mammal) that corresponds to one or more specific phenotypic attributes, such as one or more nucleotide substitutions, deletions, insertions, amplifications, or combinations thereof.
  • the target can be for instance a coding sequence; it can also be the non-coding strand that corresponds to a coding sequence.
  • target sequences include those sequences associated with adult dog body size, such as SNP markers listed in Table 1 (polynucleotide sequences of alternative alleles of these markers are provided in SEQ ID NOs: 1-20) or alternative markers, such as those at CanFam1 reference positions 44,228,010 and 44,283,669.
  • amplicons larger regions within the dog chromosome 15 were sequenced to identify the markers shown in Table 3.
  • markers can then be used to identify correlations with dog body size.
  • the 20 single nucleotide polymorphisms shown in bold in Table 3 and also shown in Table 1 were used to develop haplotypes that can be used to predict adult dog body size.
  • these markers can be used to predict adult dog body size among various breeds and also within breeds.
  • IGF1 insulin-like growth factor-1
  • the present disclosure provides exemplary markers and alleles thereof that span this region of the dog genome that identify discrete chromosomal haplotypes that correlate with adult body size in dogs. These markers have been validated in a wide variety of dog breeds of different sizes and are useful for predicting adult body size across multiple dog breeds.
  • Numerous polymorphic loci are provided in the example section herein.
  • markers within the chromosome 15 interval from CanFam1 reference position 44,000,000 to CanFam1 reference position 45,000,000 are favorably used to predict haplotypes associated with size in dogs.
  • the markers reside within an interval between CanFam1 reference position 44,190,850 and CanFam1 reference position 44,284,186.
  • the markers are localized within an even smaller interval from CanFam1 reference position 44,212,792 to CanFam1 reference position 44,278,140.
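  • The three nested chromosome 15 intervals above can be encoded directly, which makes it easy to check where a candidate marker falls. The sketch below is illustrative only: the interval labels, the function names, and the choice to treat the endpoints as inclusive are assumptions, while the coordinates are the reference positions quoted in the text.

```python
# Nested chromosome 15 intervals quoted in the text (reference positions).
# Labels are hypothetical; endpoints are treated as inclusive by assumption.
INTERVALS = [
    ("narrow", 44_212_792, 44_278_140),
    ("intermediate", 44_190_850, 44_284_186),
    ("broad", 44_000_000, 45_000_000),
]


def narrowest_interval(position):
    """Return the label of the smallest listed interval containing position."""
    for label, start, end in INTERVALS:  # ordered from smallest to largest
        if start <= position <= end:
            return label
    return None


def span_bp(label):
    """Return the distance in base pairs between the endpoints of an interval."""
    for name, start, end in INTERVALS:
        if name == label:
            return end - start
    raise KeyError(label)


print(narrowest_interval(44_228_468))  # a marker position inside the smallest interval
print(span_bp("narrow"))               # 65348 bp between the two endpoints
```

  • A position such as 44,283,669 falls outside the smallest interval but inside the intermediate one, so the function reports the intermediate label for it.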
  • the marker is selected from among specific markers, such as the SNP markers of Table 1 (alternative alleles of these marker loci are depicted in SEQ ID NOs: 1-20) or from a set of markers (e.g. containing putative causal mutations), such as markers at CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669.
  • markers have been validated for their high predictive value for identifying a chromosomal haplotype associated with body size in adult dogs. The ability of these markers to identify an IGF1 genotype correlated with small or large body size exceeds 99%, regardless of the breed. Thus, these markers and alleles constitute a substantial advancement with respect to prior art markers associated with size in dogs.
  • Table 1 provides a set of 20 SNP marker loci indicated by reference to the Canfaml assembly, along with the nucleotide sequence of the alternative alleles at the marker locus. These markers consistently and reliably identify chromosomal haplotypes that segregate with small and large body size in dogs.
  • markers are localized within an interval from CanFam1 reference positions 44,212,792 to 44,278,140, a distance of approximately 65 kb.
  • any one or more of these markers can be used in the methods provided herein, such as any combination of SEQ ID NOS: 1-20.
  • Table 1 Polymorphic marker set for predicting adult body size in dogs
  • the markers are selected from a reduced set of only six marker loci, at CanFam1 reference positions: 44,212,792; 44,226,324; 44,226,684; 44,228,468; 44,237,388; and 44,260,949 (SEQ ID NOs: 1, 3, 4, 5, 9, and 16), which identify chromosomal haplotypes correlated with either small or large size in dogs.
  • markers evaluated are not important, rather the predictive value resides in the association of these markers with a chromosomal conformation associated with an IGF1 locus that constitutes a major effect locus for body size in dogs. Indeed, a single marker within this region, such as SNP5 (44,228,468) represented by SEQ ID NO: 5, can be used alone to predict adult dog body size. Similarly, alternative markers closely linked to the described marker(s) can also be used to infer size.
  • polymorphic markers exist within this region, including insertion/deletion polymorphisms (such as a SINEC element insertion at CanFam1 reference position 44,228,010), and microsatellite polymorphisms (for example, a CA simple sequence repeat polymorphism at CanFam1 reference position 44,283,669).
  • markers are particularly useful for determining the directional contribution of an IGF1-associated QTL in the absence of an identified causal variation at this locus.
  • causal mutations within the IGF1 locus could also be used to predict size in dogs.
  • markers with sequence identity or sequence similarity to those disclosed herein are also suitable in the context of the methods disclosed herein.
  • Sequence identity can be measured in terms of percentage identity; the higher the percentage, the more identical the sequences are.
  • Homologs or orthologs of nucleic acid or amino acid sequences possess a relatively high degree of sequence identity/similarity when aligned using standard methods. This homology is more significant when the orthologous proteins or cDNAs are derived from species which are more closely related (such as between closely related breeds of dog), compared to species more distantly related (such as a dog and wolf or jackal).
  • The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403-10, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, National Library of Medicine, Building 38A, Room 8N805, Bethesda, MD 20894) and on the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. Additional information can be found at the NCBI web site. Any of these programs can be employed using the default or other parameters specified by the practitioner.
  • BLASTN is used to compare nucleic acid sequences
  • BLASTP is used to compare amino acid sequences. If the two compared sequences share sequence identity, then the designated output file will present those regions of identity as aligned sequences. If the two compared sequences do not share sequence identity, then the designated output file will not present aligned sequences.
  • the number of matches is determined by counting the number of positions where an identical nucleotide or amino acid residue is presented in both sequences.
  • the percent sequence identity is determined by dividing the number of matches either by the length of the sequence set forth in the identified sequence, or by an articulated length (such as 100 consecutive nucleotides or amino acid residues from a sequence set forth in an identified sequence), followed by multiplying the resulting value by 100.
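  • The percent identity calculation described above (count the matching positions, divide by a reference length, multiply by 100) can be sketched in a few lines of Python. This is a simplified illustration, not the BLAST implementation: it assumes the two sequences have already been aligned to equal length, and the function and parameter names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, reference_length=None):
    """Percent identity between two pre-aligned sequences of equal length.

    Matches are positions where both sequences carry the same residue
    (gap characters '-' never count as matches). The match count is divided
    by reference_length if given (an "articulated length"), otherwise by the
    ungapped length of aligned_a, and then multiplied by 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    if reference_length is None:
        reference_length = len(aligned_a.replace("-", ""))
    return 100.0 * matches / reference_length


# Hypothetical 10-nucleotide sequences differing at one position.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 9 matches out of 10
```

  • Passing reference_length corresponds to the "articulated length" option in the text, for example dividing by 100 when the comparison is restricted to 100 consecutive residues.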
  • Probes and primers directed to the polymorphisms correlated to adult dog body size need not share 100% sequence identity.
  • Such homologous nucleic acid sequences can, for example, possess at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity determined by this method.
  • Methods of predicting the adult body size of a dog can include obtaining a sample from a subject dog that includes genomic DNA using any method known in the art.
  • the dog is being used as part of a breeding program and the genotype of the dog will be matched according to characteristics desired by the breeder. For example, a dog may be chosen to mate with another dog that carries the same alleles, thereby increasing the probability that the offspring will display a homogeneous adult size.
  • the dog is a puppy, and the adult size of the puppy will be predicted.
  • the genomic DNA sample taken from the subject dog will include an interval on chromosome 15 between CanFam1 reference positions 44,000,000 and 45,000,000.
  • This interval is shown herein to contain many markers that can be used to predict adult dog body size.
  • the interval contains the sequences set forth in SEQ ID NOS: 1-22, and 66-226.
  • the genomic DNA sample can then be assayed to identify the alleles (polymorphisms) within the interval.
  • the alleles present can then be used to predict adult dog body size.
  • Breeding plans can include inbreeding, which refers to the breeding of two related (related as used herein refers to when the parentage of both the female dog and the male dog when traced back over several generations contain at least one common ancestor) dogs.
  • Inbreeding is common practice when purebred dogs are desired and the practice of inbreeding allows for the maintenance of desirable phenotypic traits.
  • Within a population of purebred dogs, the adult body size is typically more consistent than, for instance, within a population comprising a mixture of dog breeds.
  • An index of adult dog body size within a pure bred population of dogs can be created and then the dogs can be genotyped using the markers provided herein. The haplotypes can then be correlated with the index of average body size.
  • the polymorphism includes one or more of the sequences shown in SEQ ID NOS: 5-8, 11, 12, 15, 16, or 20 and the dog, or the offspring from the dog, is predicted to have small body size. In other examples, the polymorphism includes one or more of the sequences shown in SEQ ID NOS: 1-4, 8, 10, 13, 14, 17, or 20 and the dog, or offspring from the dog is predicted to have large body size. In yet other examples, the dog is less than 6 months, 1 year old, or 2 years old.
  • dog breeding plans vary depending upon the breeders' desired outcome.
  • a breeding program can be designed to diversify the alleles for small body size and large body size in the offspring.
  • the breeding pair the male and female parental dogs
  • the breeding pair can be chosen such that they do not display the same alleles.
  • the breeding pair can be selected such that the parental dogs display the same markers.
  • the average size of the litter is similar to that of the breeding pair.
  • the genomic DNA can be assayed to determine which markers are present using any method known in the art. For example, single-strand conformation polymorphism (SSCP) analysis, base excision sequence scanning (BESS), restriction fragment length polymorphism (RFLP) analysis, heteroduplex analysis, denaturing gradient gel electrophoresis (DGGE), temperature gradient electrophoresis, allelic polymerase chain reaction (PCR), ligase chain reaction, direct sequencing, minisequencing, nucleic acid hybridization, or micro-array-type detection can be used to identify the polymorphisms present in the sample.
  • the methods described herein include genotyping a sample of genetic material obtained from a subject dog for one or more markers on chromosome 15 to determine the allele present at the marker locus.
  • the markers are chosen from the markers provided in Table 3.
  • one or more markers located in the interval from CanFam1 reference position 44,000,000 to CanFam1 reference position 45,000,000 are genotyped to determine the allele for the marker(s).
  • the markers localize within an interval between CanFam1 reference position 44,190,850 and CanFam1 reference position 44,284,186.
  • the markers are localized within an even smaller interval from CanFam1 reference position 44,212,792 to CanFam1 reference position 44,278,140.
  • the marker is selected from among specific markers, such as the SNP markers of Table 1 (alternative alleles of these markers are represented by SEQ ID NOS: 1-20) or from a set of markers (e.g. containing putative causal mutations), such as markers at CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669.
  • the markers shown in SEQ ID NOS: 1, 4, 5, and 21 can be detected.
  • the marker shown in SEQ ID NOS: 5 or 21 can be detected.
  • the genotype of the one or more markers identifies a haplotype on chromosome 15 associated with a QTL of major effect in the vicinity of the insulin-like growth factor-1 (IGF1) locus that contributes to small or large size in dogs.
  • identification of a haplotype correlated with small body size (for example, haplotypes A, B, C shown in FIG. 4) indicates that the subject possesses an allele of the IGF1-associated QTL that contributes to small body size.
  • a dog with this haplotype will tend toward a smaller body size than a dog of the same breed and gender with a haplotype correlated with large body size (for example, haplotypes D-L shown in FIG. 4).
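  • The haplotype-to-size association stated above (haplotypes A, B, C with small body size; haplotypes D through L with large body size, per FIG. 4) can be expressed as a simple lookup. The sketch below is only an illustration of that mapping: the function name and the "unknown" fallback are assumptions, and the result is a relative tendency within a breed and gender, not an absolute size.

```python
# Haplotype labels as referenced to FIG. 4 in the text.
SMALL_HAPLOTYPES = set("ABC")        # haplotypes A, B, C
LARGE_HAPLOTYPES = set("DEFGHIJKL")  # haplotypes D through L


def predicted_size_class(haplotype):
    """Map a single haplotype label to the size class it is correlated with."""
    label = haplotype.strip().upper()
    if label in SMALL_HAPLOTYPES:
        return "small"
    if label in LARGE_HAPLOTYPES:
        return "large"
    return "unknown"  # label not among the haplotypes named in the text


for hap in ("B", "D", "L"):
    print(hap, predicted_size_class(hap))
```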
  • appropriate specimens, or samples, for use with the current disclosure in determining body size of a dog include any conventional clinical sample, for instance blood or blood-fractions (such as serum). Techniques for acquisition of such samples are well known in the art (for example see Schluger et al. J. Exp. Med. 176:1327-33, 1992, for the collection of serum samples). Serum or other blood fractions can be prepared in the conventional manner. For example, about 200 µL of serum can be used for the extraction of DNA for use in amplification reactions.
  • the sample can be used directly, concentrated (for example by centrifugation or filtration), purified, or combinations thereof.
  • nucleic acids in the sample are subjected to an amplification reaction.
  • rapid DNA preparation can be performed using a commercially available kit (such as the InstaGene Matrix, BioRad, Hercules, CA; the NucliSens isolation kit, Organon Teknika, Netherlands).
  • the DNA preparation method yields a nucleotide preparation that is accessible to, and amenable to, nucleic acid amplification.
  • the marker genotype is determined by any convenient method for ascertaining pertinent information regarding the nucleic acid sequence of the locus in the sample. In many cases, this can include obtaining information regarding the nucleotide sequence of the target sequence corresponding to the marker locus in the sample.
  • the nucleic acids obtained from the sample can be genotyped to identify the particular allele present for a marker locus.
  • a sample of sufficient quantity to permit direct detection of marker alleles from the sample can be obtained from the subject.
  • a smaller sample is obtained from the subject and the nucleic acids are amplified prior to detection.
  • the nucleic acid sample is purified (or partially purified) prior to detection of the marker alleles. Any target nucleic acid that is informative for a chromosome haplotype in the interval between 34 and 49 Mb can be detected.
  • the target nucleic acid corresponds to a marker locus localized to an interval between CanFam1 reference position 44,190,850 and CanFam1 reference position 44,284,186, flanking the IGF1 locus.
  • the target nucleic acid corresponds to a SNP marker selected from Table 1, or a SINEC insertion polymorphism at position 44,228,010, or a CA simple repeat length polymorphism at position 44,283,669.
  • Such mutations or polymorphisms (or both) can be detected to identify the chromosomal haplotype in the subject. Any method of detecting a nucleic acid molecule can be used, such as hybridization and/or sequencing assays.
  • Hybridization is the binding of complementary strands of DNA, DNA/RNA, or RNA. Hybridization can occur when primers or probes bind to target sequences such as target sequences within dog genomic DNA. Probes and primers that are useful generally include nucleic acid sequences that hybridize (for example under high stringency conditions) with at least 10, 12, 14, 16, 18, or 20 of the sequences provided in SEQ ID NOS: 1-250. Physical methods of detecting hybridization or binding of complementary strands of nucleic acid molecules, include but are not limited to, such methods as DNase I or chemical footprinting, gel shift and affinity cleavage assays, Southern and Northern blotting, dot blotting and light absorption detection procedures.
  • The binding between a nucleic acid primer or probe and its target nucleic acid is frequently characterized by the temperature (Tm) at which 50% of the nucleic acid probe is melted from its target.
  • a higher Tm means a stronger or more stable complex relative to a complex with a lower Tm.
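  • The text does not give a formula for the melting temperature. As a rough illustration only, the sketch below uses the Wallace rule, a widely used approximation for short oligonucleotides (roughly 14 to 20 nucleotides) that adds 2 °C per A or T and 4 °C per G or C. This rule is substituted here for illustration and is not the method of the disclosure; the function name and example sequences are hypothetical.

```python
def wallace_tm(oligo):
    """Approximate melting temperature in degrees C using the Wallace rule.

    Tm = 2 * (number of A and T) + 4 * (number of G and C).
    Only a coarse estimate for short oligonucleotides; it ignores salt,
    oligonucleotide concentration, and nearest-neighbor effects.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, and T")
    return 2 * at + 4 * gc


# Hypothetical 14-mers: a GC-rich oligonucleotide gives the higher estimate.
print(wallace_tm("GCGCGCGCGCGCGC"))
print(wallace_tm("ATATATATATATAT"))
```

  • Consistent with the statement above, the GC-rich sequence yields the higher estimate, reflecting a more stable duplex.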
  • complementary nucleic acids form a stable duplex or triplex when the strands bind, (hybridize), to each other by forming Watson-Crick, Hoogsteen or reverse Hoogsteen base pairs. Stable binding occurs when an oligonucleotide molecule remains detectably bound to a target nucleic acid sequence under the required conditions.
  • Complementarity is the degree to which bases in one nucleic acid strand base pair with the bases in a second nucleic acid strand. Complementarity is conveniently described by percentage, that is, the proportion of nucleotides that form base pairs between two strands or within a specific region or domain of two strands. For example, if 10 nucleotides of a 15-nucleotide oligonucleotide form base pairs with a targeted region of a DNA molecule, that oligonucleotide is said to have 66.67% complementarity to the region of DNA targeted.
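  • The percentage in the example above (10 of 15 nucleotides paired, or 66.67%) can be reproduced with a short calculation. The sketch below assumes Watson-Crick pairing only, an oligonucleotide and target region of equal length, and both sequences written 5' to 3'; the function name and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(oligo, target):
    """Percent of oligo bases that form Watson-Crick pairs with the target.

    Both sequences are written 5'->3', so base i of the oligo is compared
    with base (n - 1 - i) of the target (antiparallel pairing).
    """
    if len(oligo) != len(target):
        raise ValueError("oligo and target region must have the same length")
    paired = sum(
        1
        for o, t in zip(oligo.upper(), reversed(target.upper()))
        if COMPLEMENT.get(o) == t
    )
    return 100.0 * paired / len(oligo)


# Hypothetical 15-nucleotide target and an oligo with 5 mismatched bases.
target = "ACGTACGTACGTACG"
oligo = "AAGCTGTACGTACGT"
print(round(percent_complementarity(oligo, target), 2))  # 10 of 15 bases pair
```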
  • sufficient complementarity means that a sufficient number of base pairs exist between an oligonucleotide molecule and a target nucleic acid sequence (such as one of the markers provided in Tables 1 or 3) to achieve detectable binding.
  • the percentage complementarity that fulfills this goal can range from as little as about 50% complementarity to full (100%) complementarity.
  • sufficient complementarity is at least about 50%, for example at least about 75% complementarity, at least about 90% complementarity, at least about 95% complementarity, at least about 98% complementarity, or even about 100% complementarity.
  • Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method and the composition and length of the hybridizing nucleic acid sequences. Generally, the temperature of hybridization and the ionic strength (such as the Na+ concentration) of the hybridization buffer will determine the stringency of hybridization. Calculations regarding hybridization conditions for attaining particular degrees of stringency are discussed in Sambrook et al., (1989) Molecular Cloning: a laboratory manual, second edition, Cold Spring Harbor Laboratory, Plainview, NY (chapters 9 and 11). The following is an exemplary set of hybridization conditions and is not limiting:
  • Labels include, but are not limited to, an enzyme, chemiluminescent compound, fluorescent compound (such as FITC, Cy3, and Cy5), metal complex, hapten, colorimetric agent, a dye, or combinations thereof. Radiolabels include, but are not limited to, ¹²⁵I and ³⁵S. For example, radioactive and fluorescent labeling methods, as well as other methods known in the art, are suitable for use with the present disclosure.
  • primers used to amplify the subject's nucleic acids are labeled (such as with biotin, a radiolabel, or a fluorophore).
  • amplified target nucleic acid samples are end-labeled to form labeled amplified material.
  • amplified nucleic acid molecules can be labeled by including labeled nucleotides in the amplification reactions.
  • Nucleic acid molecules corresponding to one or more marker loci can also be detected by hybridization procedures using a labeled nucleic acid probe, such as a probe that detects only one alternative allele at a marker locus.
  • the target nucleic acid or amplified target nucleic acid
  • the solid support (such as a membrane made of nylon or nitrocellulose) is contacted with a labeled nucleic acid probe, which hybridizes to its complementary target under suitable hybridization conditions to form a hybridization complex.
  • Hybridization conditions for a given combination of array and target material can be optimized routinely in an empirical manner close to the Tm of the expected duplexes, thereby maximizing the discriminating power of the method.
  • the hybridization conditions can be selected to permit discrimination between matched and mismatched oligonucleotides.
  • Hybridization conditions can be chosen to correspond to those known to be suitable in standard procedures for hybridization to filters (and optionally for hybridization to arrays).
  • temperature is controlled to substantially eliminate formation of duplexes between sequences other than an exactly complementary allele of the selected marker.
  • a variety of known hybridization solvents can be employed, the choice being dependent on considerations known to one of skill in the art (see U.S. Patent 5,981,185).
  • the presence of the hybridization complex can be analyzed, for example by detecting the complexes.
  • detection includes detecting one or more labels present on the oligonucleotides, the target (e.g., amplified) sequences, or both.
  • Detection can include treating the hybridized complex with a buffer and/or a conjugating solution to effect conjugation or coupling of the hybridized complex with the detection label, and treating the conjugated, hybridized complex with a detection reagent.
  • the conjugating solution includes streptavidin alkaline phosphatase, avidin alkaline phosphatase, or horseradish peroxidase.
  • the conjugated, hybridized complex can be treated with a detection reagent.
  • the detection reagent includes enzyme-labeled fluorescence reagents or colorimetric reagents.
  • the detection reagent is enzyme-labeled fluorescence reagent (ELF) from Molecular Probes, Inc. (Eugene, OR).
  • the hybridized complex can then be placed on a detection device, such as an ultraviolet (UV) transilluminator (manufactured by UVP, Inc. of Upland, CA).
  • the signal is developed and the increased signal intensity can be recorded with a recording device, such as a charge coupled device (CCD) camera (manufactured by Photometrics, Inc. of Arlington, AZ).
  • these steps are not performed when radiolabels are used.
  • the method further includes quantification, for instance by determining the amount of hybridization.
  • Allele-specific PCR differentiates between target regions differing in the presence or absence of a variation or polymorphism.
  • PCR amplification primers are chosen based upon their complementarity to the target sequence, such as the sequences provided in Table 3, within the dog genomic DNA. The primers bind only to certain alleles of the target sequence. This method is described by Gibbs, Nucleic Acid Res. 17:12427 2448, 1989.
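  • One common way to make a primer bind preferentially to one allele is to place the variant base at the primer's 3' end, so that a mismatch there impairs extension. The text does not specify this design, so the sketch below is an assumption made for illustration; the function name, the flanking sequence, and the alleles are all hypothetical and are not taken from Table 3.

```python
def allele_specific_primers(upstream_flank, alleles, primer_length=20):
    """Build one forward primer per allele, each ending (3') on the variant base.

    upstream_flank: sequence immediately 5' of the variant position (5'->3').
    alleles: iterable of single-base alleles at the variant position.
    """
    if primer_length < 2:
        raise ValueError("primer_length must be at least 2")
    if len(upstream_flank) < primer_length - 1:
        raise ValueError("flanking sequence is too short for this primer length")
    shared = upstream_flank[-(primer_length - 1):].upper()
    return {allele.upper(): shared + allele.upper() for allele in alleles}


# Hypothetical flanking sequence and alleles, for illustration only.
primers = allele_specific_primers("TTGACCGTAGCTAGGCTAACGTCA", ("A", "G"))
for allele, primer in primers.items():
    print(allele, primer)
```

  • The two primers share every base except the last one, so each is fully matched to only one allele of the target.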
  • Further screening methods employ allele-specific oligonucleotide (ASO) screening (e.g. see Saiki et al., Nature 324:163-166, 1986). Oligonucleotides with one or more base pair mismatches are generated for any particular allele. ASO screening methods detect mismatches between one allele in the target genomic or PCR amplified DNA and the other allele, showing decreased binding of the oligonucleotide relative to the second allele (i.e. the other allele) oligonucleotide.
  • Oligonucleotide probes can be designed that under low stringency will bind to both polymorphic forms of the allele, but which at high stringency, bind to the allele to which they correspond.
  • stringency conditions can be devised in which an essentially binary response is obtained, i.e., an ASO corresponding to a variant form of the target gene will hybridize to that allele, and not to the wildtype allele.
  • Ligase can also be used to detect point mutations, such as the SNPs in Table 3 in a ligation amplification reaction (e.g. as described in Wu et al., Genomics 4:560-569, 1989).
  • the ligation amplification reaction utilizes amplification of specific DNA sequence using sequential rounds of template dependent ligation (e.g. as described in Wu, supra, and Barany, Proc. Nat. Acad. Sci. 88:189-193, 1990).
  • Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution.
  • DNA molecules melt in segments, termed melting domains, under conditions of increased temperature or denaturation. Each melting domain melts cooperatively at a distinct, base-specific melting temperature (Tm). Melting domains are at least 20 base pairs in length, and can be up to several hundred base pairs in length.
  • a target region to be analyzed by denaturing gradient gel electrophoresis is amplified using PCR primers flanking the target region.
  • the amplified PCR product is applied to a polyacrylamide gel with a linear denaturing gradient as described in Myers et al., Meth. Enzymol. 155:501-527, 1986, and Myers et al., in Genomic Analysis, A Practical Approach, K. Davies Ed. IRL Press Limited, Oxford, pp. 95-139, 1988.
  • the electrophoresis system is maintained at a temperature slightly below the Tm of the melting domains of the target sequences.
  • the target sequences can be initially attached to a stretch of GC nucleotides, termed a GC clamp, as described in Chapter 7 of Erlich, supra.
  • at least 80% of the nucleotides in the GC clamp are either guanine or cytosine.
  • the GC clamp is at least 30 bases long. This method is particularly suited to target sequences with high Tm values.
  • the target region is amplified by the polymerase chain reaction as described above.
  • One of the oligonucleotide PCR primers carries at its 5' end, the GC clamp region, at least 30 bases of the GC rich sequence, which is incorporated into the 5' end of the target region during amplification.
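  • The GC clamp criteria given above (at least 30 bases, at least 80% guanine or cytosine, attached at the 5' end of one primer) can be checked with a small helper. This is a minimal sketch under those stated thresholds; the function names and the example sequences are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in seq that are guanine or cytosine."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def is_valid_gc_clamp(clamp, min_length=30, min_gc=0.80):
    """Check the length and GC-content criteria for a GC clamp."""
    return len(clamp) >= min_length and gc_fraction(clamp) >= min_gc


def add_gc_clamp(primer, clamp):
    """Return the primer with the clamp attached at its 5' end."""
    if not is_valid_gc_clamp(clamp):
        raise ValueError("clamp must be >= 30 bases and >= 80% G/C")
    return clamp.upper() + primer.upper()


# Hypothetical clamp and primer, for illustration only.
clamp = "GC" * 15  # 30 bases, all G or C
print(is_valid_gc_clamp(clamp))
print(add_gc_clamp("acgtacgtacgtacgtacgt", clamp))
```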
  • the resulting amplified target region is run on an electrophoresis gel under denaturing gradient conditions as described above. DNA fragments differing by a single base change will migrate through the gel to different positions, which can be visualized by ethidium bromide staining.
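The GC clamp criteria above (at least 30 bases, at least 80% guanine or cytosine, attached at the 5' end of one primer) can be checked with a short sketch. The function names and default thresholds below are illustrative assumptions taken from the numbers stated in the text, not part of the disclosed method.

```python
def is_valid_gc_clamp(clamp: str, min_length: int = 30,
                      min_gc_fraction: float = 0.80) -> bool:
    """Return True if the clamp meets the length and GC-content thresholds."""
    clamp = clamp.upper()
    if len(clamp) < min_length:
        return False
    gc_count = sum(base in "GC" for base in clamp)
    return gc_count / len(clamp) >= min_gc_fraction


def add_gc_clamp(primer: str, clamp: str) -> str:
    """Attach the clamp to the 5' end of a primer (sequences written 5'->3')."""
    if not is_valid_gc_clamp(clamp):
        raise ValueError("clamp does not meet the length/GC thresholds")
    return clamp.upper() + primer.upper()
```

Because the clamp is placed at the 5' end of the primer, it is carried into the 5' end of the amplified target region during PCR, as described above.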
  • Temperature gradient gel electrophoresis is based on the same underlying principles as denaturing gradient gel electrophoresis, except the denaturing gradient is produced by differences in temperature instead of differences in the concentration of a chemical denaturant.
  • Standard TGGE utilizes an electrophoresis apparatus with a temperature gradient running along the electrophoresis path. As samples migrate through a gel with a uniform concentration of a chemical denaturant, they encounter increasing temperatures.
  • An alternative method of TGGE, temporal temperature gradient gel electrophoresis, uses a steadily increasing temperature of the entire electrophoresis gel to achieve the same result.
  • Target sequences or alleles such as those provided in SEQ ID NOS: 1-250 can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single-stranded PCR products, for example as described in Orita et al., Proc. Nat. Acad. Sci. 85:2766-2770, 1989.
  • Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single stranded amplification products.
  • Single-stranded nucleic acids can refold or form secondary structures which are partially dependent on the base sequence.
  • electrophoretic mobility of single-stranded amplification products can detect base-sequence differences between alleles or target sequences.
  • Differences between target sequences can also be detected by differential chemical cleavage of mismatched base pairs, for example as described in Grompe et al., Am. J. Hum. Genet. 48:212-222, 1991.
  • differences between target sequences can be detected by enzymatic cleavage of mismatched base pairs, as described in Nelson et al., Nature Genetics 4:11-18, 1993. Briefly, genetic material from an animal and an affected family member can be used to generate mismatch free heterohybrid DNA duplexes.
  • heterohybrid means a DNA duplex strand comprising one strand of DNA from one animal, and a second DNA strand from another animal, usually an animal differing in the phenotype for the trait of interest. Positive selection for heterohybrids free of mismatches allows determination of small insertions, deletions or other polymorphisms such as those shown in Tables 1 and 3.
  • oligonucleotide PCR primers are designed that flank the mutation in question and allow PCR amplification of the region.
  • a third oligonucleotide probe is then designed to hybridize to the region containing the base subject to change between different alleles of the gene. This probe is labeled with fluorescent dyes at both the 5' and 3' ends. These dyes are chosen such that while in this proximity to each other the fluorescence of one of them is quenched by the other and cannot be detected.
  • Extension by Taq DNA polymerase from the PCR primer positioned 5' on the template relative to the probe leads to the cleavage of the dye attached to the 5' end of the annealed probe through the 5' nuclease activity of the Taq DNA polymerase. This removes the quenching effect allowing detection of the fluorescence from the dye at the 3' end of the probe.
  • the discrimination between different DNA sequences arises through the fact that if the hybridization of the probe to the template molecule is not complete, i.e. there is a mismatch of some form, the cleavage of the dye does not take place.
  • a reaction mix can contain two different probe sequences each designed against different alleles that might be present thus allowing the detection of both alleles in one reaction.
  • Hybridization probes are generally oligonucleotides which bind through complementary base pairing to all or part of a target nucleic acid. Probes typically bind target sequences lacking complete complementarity with the probe sequence depending on the stringency of the hybridization conditions.
  • the probes can be labeled directly or indirectly, such that by assaying for the presence or absence of the probe, one can detect the presence or absence of the target sequence. Direct labeling methods include radioisotope labeling, such as with ³²P or ³⁵S.
  • Indirect labeling methods include fluorescent tags, biotin complexes which can be bound to avidin or streptavidin, or peptide or protein tags.
  • Visual detection methods include photoluminescents, Texas red, rhodamine and its derivatives, red leuco dye and 3,3',5,5'-tetramethylbenzidine (TMB), fluorescein and its derivatives, dansyl, umbelliferone and the like, or horseradish peroxidase, alkaline phosphatase and the like.
  • Hybridization probes include any nucleotide sequence capable of hybridizing to dog chromosome 15 where a polymorphism is present that correlates with adult dog body size, and thus defining a genetic marker, including a restriction fragment length polymorphism, a hypervariable region, repetitive element, or a variable number tandem repeat.
  • Hybridization probes can be any gene or a suitable analog. Further suitable hybridization probes include exon fragments or portions of cDNAs or genes known to map to the relevant region of the chromosome.
  • tandem repeat hybridization probes for use in the methods disclosed are those that recognize a small number of fragments at a specific locus at high stringency hybridization conditions, or that recognize a larger number of fragments at that locus when the stringency conditions are lowered.
  • Designing oligonucleotides for use as either sequencing or PCR primers requires selection of an appropriate sequence that specifically recognizes the target, and then testing the sequence to eliminate the possibility that the oligonucleotide will have a stable secondary structure. Inverted repeats in the sequence can be identified using a repeat-identification or RNA-folding program. If a possible stem structure is observed, the sequence of the primer can be shifted a few nucleotides in either direction to minimize the predicted secondary structure.
  • the sequence of the oligonucleotide should also be compared with the sequences of both strands of the appropriate vector and insert DNA.
  • a sequencing primer should only have a single match to the target DNA. It is also advisable to exclude primers that have only a single mismatch with an undesired target DNA sequence.
  • the primer sequence should be compared to the sequences in the GenBank database to determine if any significant matches occur. If the oligonucleotide sequence is present in any known DNA sequence or, more importantly, in any known repetitive elements, the primer sequence should be changed.
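The primer checks described above (screening for inverted repeats that could form a stem, and confirming a single match against both strands of the target) can be sketched as follows. This is a minimal illustration with exact string matching only; the function names, the stem length of 4 and the minimum loop of 3 are assumptions chosen for the example, and a real design workflow would use a folding program and a database search as the text recommends.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(primer: str, stem: int = 4, min_loop: int = 3):
    """Return (i, j) pairs where primer[i:i+stem] could base-pair with
    primer[j:j+stem], leaving at least min_loop bases between them."""
    primer = primer.upper()
    hits = []
    for i in range(len(primer) - 2 * stem - min_loop + 1):
        partner = reverse_complement(primer[i:i + stem])
        for j in range(i + stem + min_loop, len(primer) - stem + 1):
            if primer[j:j + stem] == partner:
                hits.append((i, j))
    return hits


def count_sites(primer: str, template: str) -> int:
    """Count exact primer matches on both strands of a template."""
    primer, template = primer.upper(), template.upper()

    def count(needle: str, haystack: str) -> int:
        n, start = 0, haystack.find(needle)
        while start != -1:
            n += 1
            start = haystack.find(needle, start + 1)
        return n

    total = count(primer, template)
    rc = reverse_complement(primer)
    if rc != primer:  # avoid double-counting palindromic primers
        total += count(rc, template)
    return total
```

A primer with no inverted repeats and exactly one site on the target would pass both checks; a primer with additional sites would be a candidate for redesign.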
  • the nucleic acid samples obtained from the subject are amplified prior to detection.
  • Target nucleic acids, including sequences from one or more markers enumerated in Table 1, Table 3, and/or at CanFam1 reference positions 44,228,010 and 44,283,669, can be amplified from the sample prior to detection to obtain amplification products.
  • DNA sequences are amplified, although in some instances RNA sequences can be amplified or converted into cDNA, using RT-PCR for example.
  • Any nucleic acid amplification method can be used.
  • An example of in vitro amplification is the polymerase chain reaction (PCR), in which a biological sample obtained from a subject is contacted with a pair of oligonucleotide primers, under conditions that allow for hybridization of the primers to a nucleic acid molecule in the sample.
  • the primers are extended under suitable conditions, dissociated from the template, and then re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid molecule.
  • in vitro amplification techniques include quantitative real-time PCR, strand displacement amplification (see USPN 5,744,311); transcription-free isothermal amplification (see USPN 6,033,881); repair chain reaction amplification (see WO 90/01069); ligase chain reaction amplification (see EP-A-320 308); gap filling ligase chain reaction amplification (see USPN 5,427,930); coupled ligase detection and PCR (see USPN 6,027,889); and NASBA™ RNA transcription-free amplification (see USPN 6,025,134).
  • the target sequences to be amplified from the subject include one or more SNP markers shown in Table 1 and/or alternative markers such as those at CanFam1 reference positions 44,228,010 and 44,283,669.
  • target sequences containing one or more of SEQ ID NOs: 1-20, or a subset thereof, such as SEQ ID NOs: 1, 3, 4, 5, 9, and 16 are amplified.
  • a single marker with exceptionally high predictive value is amplified, such as a marker at CanFam1 reference positions 44,228,010 or 44,228,468.
  • a pair of primers can be utilized in the amplification reaction.
  • One or both of the primers can be labeled, for example with a detectable radiolabel, fluorophore, or biotin molecule.
  • the pair of primers includes an upstream primer (which binds 5' to the downstream primer) and a downstream primer (which binds 3' to the upstream primer).
  • the pair of primers used in the amplification reaction are selective primers which permit amplification of a size related marker locus.
  • Primers can be selected to amplify a nucleic acid molecule listed in Table 1, Table 3, or positions 44,228,010 and 44,283,669. Exemplary primers are shown in Table 2.
  • primers can be designed by those of skill in the art simply by determining the sequence of the desired target region, for example, using well known computer assisted algorithms that select primers within desired parameters suitable for annealing and amplification.
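The relationship between the upstream and downstream primers and the amplified region can be illustrated with a small in-silico sketch. It assumes exact primer matches and a single-stranded template written 5'->3'; the function name and the toy sequences in the usage note are hypothetical and are not the primers of Table 2.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predict_amplicon(template: str, upstream_primer: str,
                     downstream_primer: str):
    """Return the predicted amplicon on the given strand, or None.

    The upstream primer is expected to match the template directly; the
    downstream primer anneals to the opposite strand, so its reverse
    complement is searched for 3' of the upstream primer site.
    """
    template = template.upper()
    upstream = upstream_primer.upper()
    start = template.find(upstream)
    if start == -1:
        return None
    downstream_site = reverse_complement(downstream_primer)
    site = template.find(downstream_site, start + len(upstream))
    if site == -1:
        return None
    return template[start:site + len(downstream_site)]
```

For example, with the toy template `TTTTACGTACGGGGGCATGCCTTTT`, upstream primer `ACGTAC` and downstream primer `GGCATG`, the sketch returns the 17-base region bounded by the two primer sites.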
  • an additional pair of primers can be included in the amplification reaction as an internal control.
  • these primers can be used to amplify a "housekeeping" nucleic acid molecule, and serve to provide confirmation of appropriate amplification.
  • a target nucleic acid molecule including primer hybridization sites can be constructed and included in the amplification reaction.
  • the methods can be performed using an array that includes a plurality of markers.
  • arrays can include nucleic acid molecules.
  • the array includes nucleic acid oligonucleotide probes that can hybridize to one or more alleles of a marker within the chromosome 15 interval between 34 and 40 Mb, such as any of the marker loci disclosed herein and/or their equivalents.
  • the arrays are designed to detect alleles at some or all of the twenty SNP markers of Table 1, e.g. alternative alleles represented by SEQ ID NOs: 1, 3, 4, 5, 9, and 16.
  • additional markers not selected from Table 1 are assessed using the array, for example, markers at CanFam1 reference position 44,228,010 and/or at position 44,283,669.
  • Certain of such arrays can include additional molecules that are related to size in dogs, as well as other sequences, such as one or more probes that recognize one or more housekeeping genes.
  • Arrays can be used to detect the presence of amplified sequences corresponding to marker loci related to size in dogs using specific oligonucleotide probes.
  • a set of oligonucleotide probes is attached to the surface of a solid support for use in detection of marker alleles that define haplotypes that are predictive of adult body size in dogs, such as amplified nucleic acid sequences obtained from the subject.
  • an oligonucleotide probe can be included to detect the presence of this amplified nucleic acid molecule.
  • the oligonucleotide probes bound to the array can specifically bind sequences amplified in the amplification reaction (such as under high stringency conditions).
  • the methods and apparatus in accordance with the present disclosure take advantage of the fact that under appropriate conditions oligonucleotides form base-paired duplexes with nucleic acid molecules that have a complementary base sequence.
  • the stability of the duplex is dependent on a number of factors, including the length of the oligonucleotides, the base composition, and the composition of the solution in which hybridization is effected.
  • the effects of base composition on duplex stability can be reduced by carrying out the hybridization in particular solutions, for example in the presence of high concentrations of tertiary or quaternary amines.
  • the thermal stability of the duplex is also dependent on the degree of sequence similarity between the sequences.
  • each oligonucleotide sequence employed in the array can be selected to optimize binding to a specific allele of a marker locus associated with size in dogs.
  • An optimum length for use with a particular marker nucleic acid sequence under specific screening conditions can be determined empirically.
  • the length for each individual element of the set of oligonucleotide sequences included in the array can be optimized for screening.
  • oligonucleotide probes are from about 20 to about 35 nucleotides in length or about 25 to about 40 nucleotides in length.
  • the oligonucleotide probe sequences forming the array can be directly linked to the support, for example via the 5'- or 3'-end of the probe.
  • the oligonucleotides are bound to the solid support by the 5' end.
  • one of skill in the art can determine whether the use of the 3' end or the 5' end of the oligonucleotide is suitable for bonding to the solid support.
  • the internal complementarity of an oligonucleotide probe in the region of the 3' end and the 5' end determines binding to the support.
  • the oligonucleotide probes can be attached to the support by sequences such as oligonucleotides or other molecules that serve as spacers or linkers to the solid support.
  • the array is a microarray formed from glass (silicon dioxide).
  • Suitable silicon dioxide types for the solid support include, but are not limited to: aluminosilicate, borosilicate, silica, soda lime, zinc titania and fused silica (for example see Schena, Microarray Analysis. John Wiley & Sons, Inc, Hoboken, New Jersey, 2003).
  • the attachment of nucleic acids to the surface of the glass can be achieved by methods known in the art, for example by surface treatments that form from an organic polymer.
  • Particular examples include, but are not limited to: polypropylene, polyethylene, polybutylene, polyisobutylene, polybutadiene, polyisoprene, polyvinylpyrrolidine, polytetrafluoroethylene, polyvinylidene difluoride, polyfluoroethylene-propylene, polyethylenevinyl alcohol, polymethylpentene, polychlorotrifluoroethylene, polysulfones, hydroxylated biaxially oriented polypropylene, aminated biaxially oriented polypropylene, thiolated biaxially oriented polypropylene, ethylene acrylic acid, ethylene methacrylic acid, and blends of copolymers thereof (see U.S. Patent No. 5,985,567), organosilane compounds that provide chemically active amine or aldehyde groups, epoxy or polylysine treatment of the microarray.
  • a solid support surface is polypropylene.
  • suitable characteristics of the material that can be used to form the solid support surface include: being amenable to surface activation such that upon activation, the surface of the support is capable of covalently attaching a biomolecule such as an oligonucleotide thereto; amenability to "in situ" synthesis of biomolecules; being chemically inert such that at the areas on the support not occupied by the oligonucleotides are not amenable to non-specific binding, or when non-specific binding occurs, such materials can be readily removed from the surface without removing the oligonucleotides.
  • the surface treatment is amine-containing silane derivatives. Attachment of nucleic acids to an amine surface occurs via interactions between negatively charged phosphate groups on the DNA backbone and positively charged amino groups (Schena, Microarray Analysis. John Wiley & Sons, Inc, Hoboken, New Jersey, 2003).
  • reactive aldehyde groups are used as surface treatment. Attachment to the aldehyde surface is achieved by the addition of a 5'-amine group or amino linker to the DNA of interest. Binding occurs when the nonbonding electron pair on the amine linker acts as a nucleophile that attacks the electropositive carbon atom of the aldehyde group (Id.).
  • a wide variety of array formats can be employed in accordance with the present disclosure.
  • One example includes a linear array of oligonucleotide bands, generally referred to in the art as a dipstick.
  • Another suitable format includes a two-dimensional pattern of discrete cells (such as 4096 squares in a 64 by 64 array).
  • other array formats including, but not limited to slot (rectangular) and circular arrays are equally suitable for use (see U.S. Patent No. 5,981,185).
  • the array is formed on a polymer medium, which is a thread, membrane or film.
  • An example of an organic polymer medium is a polypropylene sheet having a thickness on the order of about 1 mil.
  • the array is a solid phase, Allele-Specific Oligonucleotides (ASO) based nucleic acid array.
  • a "format" includes any format to which the solid support can be affixed, such as microtiter plates, test tubes, inorganic sheets, dipsticks, and the like.
  • the solid support is a polypropylene thread
  • one or more polypropylene threads can be affixed to a plastic dipstick-type device
  • polypropylene membranes can be affixed to glass slides.
  • the particular format is, in and of itself, unimportant.
  • the solid support can be affixed thereto without affecting the functional behavior of the solid support or any biopolymer absorbed thereon, and that the format (such as the dipstick or slide) is stable to any materials into which the device is introduced (such as clinical samples and hybridization solutions).
  • the arrays of the present disclosure can be prepared by a variety of approaches.
  • oligonucleotide or protein sequences are synthesized separately and then attached to a solid support (see U.S. Patent No. 6,013,789).
  • sequences are synthesized directly onto the support to provide the desired array (see U.S. Patent No. 5,554,501).
  • Suitable methods for covalently coupling oligonucleotides and proteins to a solid support and for directly synthesizing the oligonucleotides or proteins onto the support are known to those working in the field; a summary of suitable methods can be found in Matson et al., Anal. Biochem. 217:306-10, 1994.
  • the oligonucleotides are synthesized onto the support using conventional chemical techniques for preparing oligonucleotides on solid supports (such as see PCT applications WO 85/01051 and WO 89/10977, or U.S. Patent No. 5,554,501).
  • a suitable array can be produced using automated means to synthesize oligonucleotides in the cells of the array by laying down the precursors for the four bases in a predetermined pattern.
  • a multiple-channel automated chemical delivery system is employed to create oligonucleotide probe populations in parallel rows (corresponding in number to the number of channels in the delivery system) across the substrate.
  • the substrate can then be rotated by 90° to permit synthesis to proceed within a second (2°) set of rows that are now perpendicular to the first set. This process creates a multiple-channel array whose intersection generates a plurality of discrete cells.
  • the oligonucleotide probes on the array include one or more labels that permit detection of oligonucleotide probe:target sequence hybridization complexes.
  • Kits
  • kits that can be used to predict body size of a subject dog (such as an immature dog (puppy) that is less than about one year old) or predict the size of the dog's offspring.
  • Such kits allow one to determine the allele of one or more markers on chromosome 15 that identify a haplotype associated with a QTL that contributes to body size.
  • Non-exclusive examples of such markers are provided in Table 1 (and in Table 3).
  • kits can include a binding molecule, such as an oligonucleotide probe that selectively hybridizes to an allele of a marker associated with size in dogs.
  • the kit includes the isolated oligonucleotide probes shown in Table 1 or a subset thereof (such as either or both alternative forms of SEQ ID NOs: 1, 3, 4, 5, 9 and 16).
  • the kits can include one or more isolated primers or primer pairs for amplifying a target nucleic acid including the marker.
  • the location of exemplary primers is provided in Table 2 and one of skill in the art can determine appropriate primer sequences at the locations provided using the CanFam1 sequence.
  • a probe or primer including the full length of any one of the markers can be used as can fragments, such as fragments of at least 12 contiguous nucleotides, or 13 contiguous nucleotides, or 14 contiguous nucleotides, or 15 contiguous nucleotides, or more of any of these sequences.
  • the kit can further include one or more of a buffer solution, a conjugating solution for developing the signal of interest, or a detection reagent for detecting the signal of interest, each in separate packaging, such as a container.
  • the kit includes a plurality of size-associated marker target nucleic acid sequences for hybridization with a detection array.
  • the target nucleic acid sequences can include oligonucleotides such as DNA, RNA, and peptide-nucleic acid, or can include PCR fragments.
  • This Example describes the identification of amplicons that can be used to identify markers that are useful in methods of predicting the adult body size of a dog.
  • SNPs and insertion/deletion polymorphisms were discovered (see Example 2) by sequencing PCR amplicons from dog genomic DNA. Sequencing reactions were bi-directional from exonuclease/shrimp alkaline phosphatase cleaned PCR amplicons by standard methods. SNP genotyping utilized the SNPlex platform (Applied Biosystems) following the manufacturer's protocol. A total of 338 amplicons spanning this interval were sequenced partly in four large and four small Portuguese water dogs and partly in nine dogs from small (<9 kg) and giant (>30 kg) breeds.
  • This Example describes the markers that are useful for predicting the body size of Portuguese water dogs.
  • Y = Xa + Zu + e
  • Y is the vector of the skeletal size trait
  • a is a vector of fixed effects, the SNP effect being tested
  • u is a vector of random effects reflecting the polygenic background
  • X and Z are known incidence matrices relating the observations to fixed and random effects, respectively.
  • the essential idea is that relatedness is incorporated into the model.
  • the variance in the model can be expressed as:
  • K is the consanguinity matrix estimated from the known pedigree, which reflects the genetic background correlations between individuals.
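The variance expression itself did not survive in this text. For orientation only, a form commonly used for mixed models of this kind is sketched below, using the Z and K defined above. The variance components sigma_g^2 (polygenic) and sigma_e^2 (residual), the identity matrix I and the factor of 2 that converts consanguinity coefficients into additive relationships are assumptions of this sketch and may differ from the expression in the original filing.

```latex
\operatorname{Var}(Y) = Z\,\bigl(2K\,\sigma_g^{2}\bigr)\,Z^{\top} + I\,\sigma_e^{2}
```

Under this assumed form, related dogs share covariance through K, so the test of the fixed SNP effect accounts for relatedness among the sampled dogs.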
  • FIG. 1: A mixed model test for association between size and genotype as three categories (A1A1, A1A2, A2A2) was calculated using all pairwise coefficients of consanguinity for 376 dogs with skeletal size measurements. As shown in FIG. 1A, each filled circle plots a single SNP's position on canine chromosome 15 and negative log P-value for the association statistic.
  • FIG. 1B illustrates mean skeletal size in Portuguese water dog population samples carrying different IGF1 haplotypes.
  • Haplotypes were inferred for 20 markers (see bolded markers in Table 3) spanning the IGF1 gene (cfa15:44,212,792-44,278,140, CanFam1).
  • Data are plotted as a cumulative distribution for each genotype: B/B, B/I, and I/I.
  • homozygotes for the 'B' allele have a lower level of IGF1 protein in blood serum than either heterozygotes or homozygotes for haplotype 'I' (FIG. 1C; P-value < 9.34 × 10⁻⁴, ANOVA). These results support simple Mendelian inheritance with the 'I' allele acting in a partially dominant fashion with the 'B' allele.
  • This Example shows a correlation between circulating IGF1 concentrations and dog body size.
  • This Example describes the identification of the genomic region containing the genes responsible for dog size using 122 SNPs in 43 different breeds of dogs.
  • FIG. 2 summarizes association mapping statistics, haplotype variation, marker heterozygosity, and population differentiation among small and giant dogs across these regions.
  • FIG. 2 provides evidence for IGF1 as a determinant of body size in dogs and signatures of recent selection on the locus across breeds. The dashed line indicates Bonferroni correction for multiple tests.
  • FIG. 2A shows the Mann-Whitney U p-values for SNPs on chromosome 15 and 5 control chromosomes. The haplotype sharing among 952 dogs from 22 breeds.
  • FIG. 2B & 2C show the heterozygosity ratio (small vs. large dogs) and genetic differentiation (Fst) for a sliding 10 SNP window across IGF1. Dashed lines delimit the 95% CIs based on non-parametric bootstrap resampling.
  • FIG. 2D shows a graph depicting that small breeds (<9 kg) have a reduction in observed heterozygosity compared with giant breeds (>30 kg). The dashed lines are LOWESS best fit to the data. The IGF1 gene interval is indicated.
  • FIG. 3 illustrates Fisher's exact test p-values for tests of association between individual SNPs and body size (small vs. big) for 122 SNPs on chromosome 15 and 92 SNPs on five control chromosomes. P-values as small as 10⁻¹⁵ were also obtained at the 92 genomic control markers surveyed on five other chromosomes (FIG. 3). Since the Mann-Whitney U nominal P-values at genome control markers are not inflated, it is likely that population structure within dog breeds is essentially the entire source of such inflation. This illustrates the need for caution in interpretation of nominal P-values obtained from structured populations, especially in the absence of an empirical genome distribution of P-values.
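For readers unfamiliar with the per-SNP test, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of allele counts (for example, allele 1 versus allele 2 in small versus big dogs). The function name and the table layout are assumptions for illustration; the text does not state which software or which sidedness was used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be size classes (small, big) and columns the two alleles.
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, total)
```

As the passage above cautions, a small nominal p-value from such a test can reflect population structure rather than a true association, which is why control markers on other chromosomes were examined.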
  • In FIG. 2B, the longest unrecombined region containing this SNP was plotted for each dog chromosome in the sample of 23 breeds with at least 5 dogs per breed (476 dogs in total). Haplotypes carrying the A allele are shown on the left and haplotypes carrying the G allele at this position are shown on the right. Several remarkable features are noted: First, the A allele is vastly more common in small dog breeds than the G allele, with 96% of the small dog chromosomes carrying the A allele and 92% of the large dog chromosomes carrying the G allele.
  • the extent of haplotype sharing that differentiates small from large dogs is just a 1,784 bp interval that can extend on the 3' end by no more than 360 bp (to chr15:44226324) and on the 5' end by no more than 6.630 kb (to chr15:44235098) for a maximum length of 8.7 kb.
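The interval arithmetic in the preceding statement can be checked directly from the stated numbers. The variable names are illustrative; the values are those given in the text.

```python
# Values as stated in the text (base pairs and chromosome 15 coordinates).
CORE_INTERVAL_BP = 1_784
MAX_EXTENSION_3PRIME_BP = 360
MAX_EXTENSION_5PRIME_BP = 6_630
BOUND_3PRIME = 44_226_324
BOUND_5PRIME = 44_235_098

# Maximum length computed two ways: from the parts and from the coordinates.
max_length_from_parts = (CORE_INTERVAL_BP + MAX_EXTENSION_3PRIME_BP
                         + MAX_EXTENSION_5PRIME_BP)
max_length_from_coordinates = BOUND_5PRIME - BOUND_3PRIME
```

Both calculations give 8,774 bp, in line with the stated maximum length of about 8.7 kb.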
  • Example 5 Inference of size in dogs based on a 20 SNP haplotype
  • This Example shows that dog size can be predicted using one or more SNPs of the 20 SNPs shown in Table 1.
  • haplotypes were inferred independently for samples from each small and giant breed.
  • Haplotypes for the 20 markers (alternative alleles of these markers are represented by SEQ ID NOs: 1-20, shown in Table 1) spanning the small breed sweep interval over IGF1 were inferred independently in each breed.
  • fractional chromosome counts were summed for all haplotypes with at least 5% probability according to PHASE. Chromosome sums for each breed were rounded to integer values; several breeds have odd numbers of chromosomes due to round off error.
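The counting step can be sketched as below, assuming that the phasing output is available as one dictionary of haplotype probabilities per chromosome. That input layout and the function name are assumptions of this sketch; the actual PHASE output format is not reproduced here.

```python
def breed_chromosome_counts(haplotype_probs_per_chromosome, min_prob=0.05):
    """Sum fractional haplotype counts for one breed and round to integers.

    Each element of the input is a dict mapping a haplotype label to the
    probability assigned to it for one chromosome (assumed layout).
    Assignments below min_prob are ignored, mirroring the 5% cutoff.
    """
    totals = {}
    for probs in haplotype_probs_per_chromosome:
        for haplotype, p in probs.items():
            if p >= min_prob:
                totals[haplotype] = totals.get(haplotype, 0.0) + p
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return {haplotype: round(total) for haplotype, total in totals.items()}
```

Because each haplotype sum is rounded separately, the rounded counts for a breed need not add up to an even number of chromosomes, which is the round-off effect noted above.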
  • Small and giant dog breeds carry different IGF1 haplotypes. Across a 70 kb segment that spans the exons and introns of IGF1 there is a 20-SNP shared haplotype (markers of Table 1) in small dog breeds (FIG. 4). Only inferred haplotypes carried by at least three dog chromosomes total (i.e. >0.5% frequency overall) are shown in FIG. 4.
  • the haplotypes (left) are rows labeled A-L and marker alleles are colored yellow for ancestral state (matching the nucleotide observed in >10 Canis aureus samples) and blue for derived state.
  • Haplotype 'B' found in small Portuguese water dogs is the most common haplotype in every one of the 14 small (<9 kg) breeds analyzed. This haplotype was observed in only three of the nine giant (>30 kg) breeds and only 5% of the chromosomes carrying haplotype 'B' are from giant dogs. Most giant dogs carry one or both of two distinct haplotypes: 'F' and 'I'. The former was not observed in any small breed. The two giant dog haplotypes are highly divergent from the small haplotype, 'B', differing from it at 11 and 16 positions, respectively, and from each other at 11 of 20 positions. Haplotype 'C', carried exclusively by small dogs, points to an 8 kb interval likely to contain the causal mutation. This haplotype shows evidence of recombination between SNPs 5 and 6, with SNPs 6-20 of haplotype 'C' showing similarity to the large dog haplotypes.
  • Example 6 Inference of size based on a six tag SNP haplotype
  • This Example describes the detection of 6 SNPs and the use of the haplotype generated to predict adult dog body size.
  • The distribution of IGF1 haplotypes across dog populations was further characterized to assess the correlation between haplotype frequency and body size within breeds.
  • Six tagging SNPs were selected that together discriminate all major IGF1 haplotypes: CanFam1 reference positions: 44,212,792; 44,226,324; 44,226,684; 44,228,468; 44,237,388; and 44,260,949 (alternative alleles are provided in SEQ ID NOs: 1, 3, 4, 5, 9, and 16). These were evaluated in 3,241 dogs from 143 breeds. 90% of these samples do not overlap with the dogs analyzed for FIGS. 2 and 4.
  • haplotypes are shown vertically with genome position in the CanFam1 assembly to the left. These haplotypes match the more highly resolved haplotypes from FIG. 3 as indicated by letters along the top. Most haplotypes from FIG. 3 are resolved but a few are not, e.g. the second haplotype here is consistent with both FIG. 3 haplotypes 'A' and 'B'. Marker ancestral states are inferred from jackal genotyping and are colored white. Derived alleles are indicated with bold letters.
  • Example 7 Identification of single SNP useful for predicting adult dog size
  • This Example describes the characterization of a SNP that can be used to predict adult dog body size.
  • A SNP (alternative alleles are provided in SEQ ID NO:5) in the center of the 84.3 Kb interval, at position 44,228,468, stands out as showing a particularly strong association with average breed weight.
  • The SNP represented by SEQ ID NO:5 constitutes a single best predictor of body size in dogs.
  • FIG. 5 demonstrates that breed size is negatively correlated with the presence of the "A" derived allele at SNP '5' (chr15:44,228,468; fifth marker from the left in the haplotypes shown in FIG.
  • FIG. 5A shows binomial regression of allele frequency on square-root of mean breed weight. The dashed lines indicate 95% confidence interval on predicted equation line as estimated from non-parametric bootstrap resampling.
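A binomial regression of this kind can be sketched with a two-parameter logistic model fitted by Newton-Raphson. The logit link, the function names and the input layout (derived-allele counts and chromosome totals per breed) are assumptions of this sketch; the text does not specify the link function or the fitting software, and the bootstrap confidence interval is not reproduced.

```python
from math import exp, sqrt


def fit_binomial_regression(mean_weights_kg, derived_allele_counts,
                            n_chromosomes, iterations=50):
    """Fit logit(p) = b0 + b1 * sqrt(weight) to per-breed allele counts.

    Returns (b0, b1), estimated by Newton-Raphson on the binomial
    log-likelihood. Inputs are parallel lists with one entry per breed.
    """
    xs = [sqrt(w) for w in mean_weights_kg]
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in zip(xs, derived_allele_counts, n_chromosomes):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            residual = k - n * p          # observed minus expected count
            weight = n * p * (1.0 - p)    # binomial variance term
            g0 += residual
            g1 += residual * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


def predicted_frequency(b0, b1, weight_kg):
    """Predicted derived-allele frequency for a breed of the given weight."""
    return 1.0 / (1.0 + exp(-(b0 + b1 * sqrt(weight_kg))))
```

A negative fitted slope b1 corresponds to the relationship described above, in which the derived allele becomes less frequent as mean breed weight increases.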
  • Example 8 Estimation of an ancestral recombination graph
  • An ancestral recombination graph was reconstructed for a 1.2 Mb interval (chr15:43.7-44.9 Mb) that includes the IGF1 core region from 1052 sequences of all small and giant dog breeds and is rooted with data from the golden jackal (Canis aureus) using the software SHRUB (Song et al., Bioinformatics, 21:Suppl 1, i413-422, 2005; www.cs.ucdavis.edu/~yssong/lu.html). Given a set of sequences and the ancestral sequence, SHRUB uses efficient branch and bound methods to compute the minimum number of recombination events necessary to explain the data and generates ARGs consistent with the data.
  • the ARG is illustrated in FIG. 7: Balls identified with "**" denote the 12 haplotypes, white balls denote coalescent events, and balls identified with "*" indicate recombination vertices. The numbers below recombination vertices denote breakpoints. Numbers along the edges in the graph indicate mutations. Recombination branches are labeled "l" or "r" to denote material to the left or right of recombination breakpoints.
  • This Example describes the detection of the insertion of SINEC_Cf sequences in chromosome 15 of the dog and the correlation of such sequences with dog size.
  • Exons of IGF1 were sequenced in a panel of eight dogs (one large and one small Portuguese water dog defined using 43 radiographic metrics, and one each of rottweiler, miniature poodle, border terrier, Italian greyhound, pomeranian, Saint Bernard, and Vietnamese mastiff), and only one variation in coding sequence was found, a synonymous SNP in exon 3 (Thr->Thr; chr15:44,226,324, CanFam1). Extensive resequencing within introns and flanking genomic sequence was also undertaken. Although several additional SNPs unique to small breeds were identified, all were in strong linkage disequilibrium and therefore a single variation or combination of causative variants could not be definitively identified by this approach.
  • Insertion of a SINEC_Cf within an IGF1 intron was genotyped in 23 dogs from 13 breeds using bi-directional sequencing by standard methods.
  • Primers SQ6015F and SQ6015R (Table 2) were used to amplify the genomic target sequence by PCR.
  • The SINEC_Cf is inserted at chr15:44,228,010 (CanFam1) and has a characteristic 12 bp duplication of the insertion site flanking it. SINE insertion is perfectly correlated with haplotypes B and C, as illustrated in Table 4.
  • All dogs without the SINEC_Cf insertion have the following SQ6015 primed sequence, which matches the Boxer reference genome sequence. All dogs with the SINEC_Cf insertion have the following SQ6015 primed sequence, which matches the Boxer reference genome sequence to position 44,228,010 (shown in normal type) and thereafter (shown in bold type) matches the SINEC_Cf consensus sequence with greater than 97% identity, as shown in SEQ ID NO:21.
  • The (CA)n microsatellite alleles are significantly associated with body size in both the Portuguese water dog samples (P-value ⁇ 1.4 x 10^-6, ANOVA) and the 23 small and giant breeds (P-value ⁇ 2.2 x 10^-14, Chi square) as illustrated in Table 5.
  • The target genomic sequence is amplified using primers FH5934F (CACCTGAGGGGCAAACTATT; SEQ ID NO: 251) and FH5934R (CCAGTTGAGGGATTTGAATGA; SEQ ID NO: 252).
  • The size of the amplicon containing the microsatellite locus was determined to base pair resolution via electrophoresis according to standard methods.
  • The size of the microsatellite repeat, that is, the number of CA repeat units, is deduced from the overall size of the amplicon.
  • Alleles are named as the length of the PCR amplicon in base pairs. Table entries are counts of chromosomes from dogs within all 14 small breeds, all nine giant breeds, and the Portuguese water dog breed.
  • This example describes methods that can be used to predict the adult body size of a puppy.
  • a DNA sample from a puppy of unknown parentage is prepared using a genomic DNA isolation kit.
  • Kits for the extraction of high-molecular weight DNA for PCR include a Genomic Isolation Kit A.S.A.P. (Boehringer Mannheim, Indianapolis, Ind.), Genomic DNA Isolation System (GIBCO BRL, Gaithersburg, Md.), Elu-Quik DNA Purification Kit (Schleicher & Schuell, Keene, N.H.), DNA Extraction Kit (Stratagene, La Jolla, Calif.), TurboGen Isolation Kit (Invitrogen, San Diego, Calif.), and the like. Use of these kits according to the manufacturer's instructions is generally acceptable for purification of DNA prior to detecting the desired SNPs.
  • the concentration and purity of the extracted DNA can be determined by spectrophotometric analysis of the absorbance of a diluted aliquot at 260 nm and 280 nm. After extraction of the DNA, PCR amplification can proceed.
  • PCR primers are selected to amplify the region containing the sequence of interest.
  • PCR primers are selected to amplify a region that includes chromosome 15 CanFam1 reference position 44,228,468.
  • the allele at SNP5 (SEQ ID NO: 5)
  • The puppy will be predicted to have a larger body size if the allele at chromosome 15 CanFam1 reference position 44,228,468 is not "A" (FIG. 4), and conversely, if the presence of an "A" allele is detected, the puppy will be predicted to have a smaller body size.
  • This example describes how determining the haplotype of parental male and female dogs can be used in a breeding program.
  • Genomic DNA from a male shih tzu dog (or other small breed) and a female shih tzu dog is isolated as described in Example 10 above.
  • PCR primers are selected to amplify the region of DNA encompassing CanFam1 reference positions 44,212,792; 44,226,324; 44,226,684; 44,228,468; 44,237,388; and 44,260,949.
  • The alleles present at each of these positions are detected by DNA sequencing using standard methods. Suppose the male dog is found to have haplotype B (see FIG. 4) and the female dog is found to have haplotype I (see FIG. 4). A breeder desiring to create puppies having haplotypes more commonly associated with small dogs would then choose different dogs as mating pairs.
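The prediction rule from the puppy example and the mate-selection step from the breeding example can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed assay: the function names, the encoding of a genotype as a pair of single-letter alleles, and the haplotype labels passed in are all hypothetical choices made for the example, and recombination within the interval is ignored.

```python
from itertools import product

SNP5_POSITION = 44_228_468  # chromosome 15, CanFam1 reference position from the text


def predict_size_from_snp5(genotype):
    """Return 'smaller' if an "A" allele is detected at SNP5, otherwise 'larger'.

    `genotype` is a pair of single-letter alleles such as ("A", "G");
    this encoding is an assumption of the sketch.
    """
    alleles = {allele.upper() for allele in genotype}
    return "smaller" if "A" in alleles else "larger"


def offspring_haplotype_pairs(sire, dam):
    """List the distinct haplotype pairs a cross can produce.

    Each parent is given as a pair of haplotype labels; a puppy receives
    one haplotype from each parent (recombination is ignored here).
    """
    return sorted({tuple(sorted(pair)) for pair in product(sire, dam)})
```

A breeder could compare the pairs returned for each candidate cross against the haplotypes associated with the desired size before choosing mating pairs.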

Abstract

Markers and alleles thereof useful for estimating the directional contribution to adult body size of a QTL associated with the IGF1 locus are provided. Also provided are methods for identifying haplotypes associated with a desired adult body size useful for predicting adult body size of a subject dog.

Description

COMPOSITIONS AND METHODS FOR PREDICTING BODY SIZE IN DOGS
CROSS REFERENCE TO RELATED APPLICATION
[001] This application claims priority from U.S. patent application number 60/856,411, filed November 2, 2006, the contents of which are herein incorporated by reference in their entirety.
FIELD
[002] This disclosure relates to the field of genetic diagnostics. More particularly, this disclosure relates to compositions and methods for predicting body size in dogs.
BACKGROUND
[003] The domestic dog shows remarkable variation in size, proportion, behavior and coat color far exceeding that of any other quadruped (Wayne, Evolution, 40:243-261, 1986). Size variation is especially extreme ranging from the one kilogram tea-cup poodle to the 100 kilogram English mastiff and surpassing that of all living and extinct species in the dog family, Canidae (Wayne, J. Morphol., 187:301-319, 1986; Wayne et al, J. Hered., 80:447-454, 1989).
[004] The insulin-like growth factor-1 (IGF1) gene is a strong genetic determinant of body size in the mouse; the knock-out is just 60% of normal mass (Baker et al., Cell, 75:73-82, 1993; Liu et al., Cell, 75:59-72, 1993) and haploinsufficient mice have lower mass, lower bone mineral density and shorter femurs (He et al., Bone, 38:826-835, 2006). Similarly, a human with a homozygous partial deletion of the gene was born extremely small and grew very slowly throughout childhood (Woods et al., Acta. Paediatr. Suppl., 423:39-45, 1997; Woods et al., N. Engl. J. Med., 335:1363-1367, 1996). A second subject, whose body mass and length at 19 months of age were six standard deviations below the mean, was homozygous for a polyadenylation signal mutation that resulted in low levels of IGF1 mRNA and circulating protein (Bonapace et al., Journal of Medical Genetics, 40:913-917, 2003).
[005] IGF1 mediates many of the growth-promoting properties of growth hormone (Cohen, Hormone Research, 65:3-8, 2006). Growth hormone activates transcription of IGF1 in the liver (Mathews et al., Proc. Natl. Acad. Sci. USA, 83:9343-9347, 1986), the major site of IGF1 gene expression, but many different tissues express the two alternatively spliced IGF1 mRNA variants (Ohtsuki et al., Zoolog. Sci., 22:1011-1021, 2005). IGF1 binds the type 1 IGF receptor, a tyrosine kinase signal transducer, and induces cell growth, maintenance of cell survival (Kooijman, Cytokine Growth Factor Rev., 17:305-323, 2006), and induction of cellular differentiation (Cohen, Hormone Research, 65:3-8, 2006). In domestic dogs, studies seeking a correlation between blood serum IGF1 protein levels and body size have produced inconsistent results. Eigenmann et al. found that standard poodles had a six-fold higher IGF1 protein concentration in plasma than toy poodles (Eigenmann et al., Acta. Endocrinol., 106:448-453, 1984) and Tryfonidou et al. observed a three-fold difference between miniature poodles and great Danes (Tryfonidou et al., J. Anim. Sci., 81:1568-1580, 2003). However, Favier et al. found no difference in serum IGF1 protein levels between beagles and great Danes (Favier et al., J. Endocrinol., 170:479-484, 2001).
[006] IGF1 maps to chromosome 15 in the dog (Mellersh et al., Mamm. Genome, 11:120-30, 2000). To identify factors contributing to size variation in dogs, a sequence based marker discovery was initiated across the interval between 34 and 49 Mb of chromosome 15 in the Portuguese water dog. Two quantitative trait loci (QTL; FH2017 at 37.9 Mb and FH2295 at 43.5 Mb) within this region were strongly associated with body size across 463 Portuguese water dogs with 100 radiographic skeletal measurements for size (Chase et al., Proc. Natl. Acad. Sci. USA, 99:9930-9935, 2002; Chase et al., Genome Res., 15:1820-1824, 2005). However, although extensive linkage disequilibrium facilitated the initial scan, further resolution by fine scale mapping (Lark et al., Trends Genet., 22:537-544, 2006) was limited in this breed, and the markers were uncorrelated with size in other breeds.
SUMMARY
[007] This disclosure concerns markers that define chromosomal haplotypes that identify an IGF1 associated quantitative trait locus (QTL) associated with adult dog body size. Adult dogs are at least about one year old. An aspect of this disclosure provides markers on chromosome 15 within the interval between canine family version #1 genomic sequence (CanFam1) reference positions 34,000,000 and 49,000,000 flanking the IGF1 locus. The CanFam1 genomic sequence was completed in 2004 by the Broad Institute of MIT/Harvard and Agencourt Bioscience. The CanFam1 sequence is available through UCSC Genome Bioinformatics (for a description of the browser see, Kent et al., Genome Res. 12:996-1006, 2002). Alleles and haplotypes are disclosed that are predictive of a directional contribution to body size by an IGF1 associated QTL. Also disclosed are methods for predicting adult body size in dogs, for example, by determining a directional contribution to body size by the QTL, using the disclosed markers. Kits for performing such methods are also disclosed.
[008] The foregoing and other objects and features of the disclosure will become more apparent from the following detailed description, which proceeds with reference to the accompanying figures.
BRIEF DESCRIPTION OF THE DRAWINGS
[009] FIG. 1A is a graph showing localization of a QTL associated with adult body size in Portuguese water dogs to a position on chromosome 15 corresponding to the IGF1 locus. FIGS. 1B and 1C are graphs showing correlation of body size and serum IGF with different haplotypes segregating in a population of Portuguese water dogs.
[010] FIGS. 2A-2D are a series of graphs showing localization of a QTL associated with adult body size across dog breeds.
[011] FIG. 3 is a graph showing localization on chromosome 15 of a QTL associated with adult body size using a Fisher's exact test.
[012] FIG. 4 is a table showing the identification of chromosome haplotypes (20 markers) associated with small and large body size in a variety of small and giant dog breeds.
[013] FIGS. 5A and 5B are graphs showing association of specified haplotypes with body size in several large breeds.
[014] FIG. 6 is a table showing the identification of chromosome haplotypes (6 markers) associated with small and large body size in a variety of dog breeds.
[015] FIG. 7 is an ancestral recombination graph.
DESCRIPTION OF THE SEQUENCE LISTING
[016] SEQ ID NOs: 1-20 represent the polymorphic nucleotide sequence of alternative alleles of 20 single nucleotide polymorphisms (SNP).
[017] SEQ ID NO: 21 represents the polynucleotide sequence of a SINEC_Cf insertion element at CanFam1 reference position 44,228,010.
[018] SEQ ID NO: 22 represents the Boxer reference polynucleotide sequence containing the CA simple repeat length polymorphism at position 44,283,669. The illustrated sequence is primed from the amplification primer FH5934F. The CA repeat is shown here as a (TG) tract; the reverse complementary (CA) repeat is present on the opposite strand of DNA.
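The relationship between a (TG) tract on one strand and the (CA) repeat on the opposite strand can be checked with a short reverse-complement sketch. The function name and the toy six-base tract are illustrative assumptions; the tract is not taken from SEQ ID NO: 22.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


# A toy (TG)3 tract reads as (CA)3 on the opposite strand.
print(reverse_complement("TGTGTG"))
```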
[019] SEQ ID NOS: 23-250 represent additional polymorphic nucleotide sequences of alternative alleles.
[020] SEQ ID NOS: 251 and 252 are PCR primers.
DETAILED DESCRIPTION
[021] This disclosure provides representative markers, and alleles thereof, that correspond to and identify a locus on dog chromosome 15 that is associated with adult body size. The markers and alleles described herein were located by general mapping of a quantitative trait locus (QTL) that constitutes a major effect locus for size in the dog. Successively finer mapping of chromosome 15, which contained the QTL, showed that the region spans and encompasses the IGF1 locus. The magnitude of the contribution of the IGF1 associated QTL differs between breeds, and the ultimate size phenotype is produced by the expression of multiple genes and their interactions with each other (gene x gene interactions) and with environmental factors (gene x environment interactions). The markers and alleles disclosed herein can be used to define haplotypes that provide information regarding expected adult body size in dogs.
[022] In a preliminary study (Chase et al., Proc. Natl. Acad. Sci. USA, 99:9930-9935, 2002) a major effect locus for size in dogs was identified in Portuguese water dogs. Two markers located on chromosome 15 in the vicinity of the IGF1 locus were identified that define a chromosome haplotype correlating with adult body size in this breed. In this mid-size breed, haplotypes associated with small and large body size segregate in the population, and are a major determinant of adult body size. However, due to intervening recombination, these markers were not widely applicable across dog breeds, and cannot be used to predict body size in dogs of many other breeds.
[023] Therefore, genetic association studies were undertaken to identify markers, and alleles thereof, that are associated with body size across breeds of dog. The disclosed markers provide the means for identifying the genotype of a subject dog and thereby providing a means of assessing the dog's adult size prior to maturity.
[024] A single IGF1 allele is carried by almost all dogs from the sampled small breeds, which strongly implies that the same causal variant is responsible for the phenotype of diminished body size. Furthermore, the dominance of a single unique haplotype in a panel of phylogenetically divergent small dog breeds and its near absence in giant dogs indicates that the mutation predates the common origin of these small breeds and likely evolved early in the history of domestic dogs or conceivably in gray wolves.
[025] The markers, and alleles thereof, provide a simple, inexpensive and reliable means of identifying the haplotype associated with the IGF1 locus on chromosome 15. By identifying the chromosome haplotype in this region, it is possible to predict whether the IGF1 associated QTL contributes to small or large size of the dog.
[026] Thus, one aspect of this disclosure concerns markers (and alleles thereof) localized to an interval of dog chromosome 15 associated with an IGF1 associated QTL that provides a directional contribution to adult body size in dogs. Typically, the marker (or markers) includes polymorphic nucleotide sequences situated in an interval between CanFam1 reference position 44,199,850 and CanFam1 reference position 44,284,186. In certain embodiments, the marker is localized to a position between CanFam1 reference position 44,212,792 and CanFam1 reference position 44,278,140. Exemplary markers include polymorphic nucleotide sequences at CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669. Kits including probes that detect the markers described herein are also a feature of this disclosure.
[027] Another aspect of this disclosure concerns a method for predicting adult body size in a dog. The method can include genotyping a sample obtained from a subject dog for one or more markers on chromosome 15 within the interval between CanFam1 reference positions 34,000,000 and 49,000,000, e.g., that spans the IGF1 locus. The markers are chosen to individually or collectively identify a haplotype associated with body size in a plurality of inbred dog breeds. The haplotype is correlated with adult body size, providing a prediction of the adult body size of the subject dog. Typically, the selected markers localize to an interval on chromosome 15 between CanFam1 reference positions 44,000,000 and 45,000,000. More commonly, the markers are localized to an interval on chromosome 15 between CanFam1 reference positions 44,199,850 and 44,284,186. In certain embodiments, the markers are localized to an interval on chromosome 15 between CanFam1 reference position 44,212,792 and CanFam1 reference position 44,278,140. Exemplary markers include those described in Table 1. In specific examples, the markers include polymorphic markers at one or more of CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669. The methods can include identifying a haplotype on chromosome 15.
[028] In certain embodiments, the haplotype is correlated with adult body size by comparing the haplotype to an index of average body size by breed. In some embodiments, haplotypes correlating with different body size segregate in an inbred population of dogs of which the subject dog is a member. In an embodiment, the identified haplotype is correlated with reduced serum IGF1 expression or reduced IGF1 function.
[029] This disclosure also provides methods for determining the directional contribution of a QTL associated with adult body size in the dog. Such methods can include genotyping a sample obtained from a subject dog for one or more markers, which markers individually or collectively identify a haplotype on chromosome 15 within the interval from CanFam1 reference position 34,000,000 to CanFam1 reference position 49,000,000 that is correlated with a directional contribution to body size by the QTL, thereby determining the directional contribution to body size by the QTL. Such methods can be used to predict adult body size by correlating the haplotype with an adult body size in the subject dog. For example, the chromosome 15 haplotype can be correlated with the directional contribution to body size by the QTL by comparing the haplotype to an index of average body size by breed. In some instances at least two haplotypes correlating with different body size are segregating in an inbred population of dogs of which the subject dog is a member.
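The nested chromosome 15 intervals named in paragraph [027] can be expressed as a small lookup that reports which intervals contain a given marker position. This is only a sketch: the interval labels are invented for readability, and treating the endpoints as inclusive is an assumption, since the text says only that markers lie "between" the reference positions.

```python
# (label, start, end) on chromosome 15, CanFam1 coordinates from paragraph [027].
# The labels are hypothetical names used only in this example.
INTERVALS = [
    ("broad interval", 34_000_000, 49_000_000),
    ("typical interval", 44_000_000, 45_000_000),
    ("more common interval", 44_199_850, 44_284_186),
    ("certain embodiments", 44_212_792, 44_278_140),
]


def intervals_containing(position):
    """Return the labels of every interval that contains the position."""
    return [label for label, start, end in INTERVALS if start <= position <= end]


# The exemplary marker at 44,228,468 falls inside all four intervals.
print(intervals_containing(44_228_468))
```

Under these assumptions the exemplary marker at 44,283,669 falls inside the first three intervals but outside the narrowest one.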
[030] Optionally, dogs identified by the disclosed methods as having a desired body size can be crossed to produce progeny dogs with a desired adult body size.
[031] Unless otherwise explained, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Definitions of common terms in molecular biology can be found in Benjamin Lewin, Genes V, published by Oxford University Press, 1994 (ISBN 0-19-854287-9); Kendrew et al. (eds.), The Encyclopedia of Molecular Biology, published by Blackwell Science Ltd., 1994 (ISBN 0-632-02182-9); and Robert A. Meyers (ed.), Molecular Biology and Biotechnology: a Comprehensive Desk Reference, published by VCH Publishers, Inc., 1995 (ISBN 1-56081-569-8).
[032] The singular terms "a," "an," and "the" include plural referents unless context clearly indicates otherwise. Similarly, the word "or" is intended to include "and" unless the context clearly indicates otherwise. It is further to be understood that all base sizes or amino acid sizes, and all molecular weight or molecular mass values, given for nucleic acids or polypeptides are approximate, and are provided for description. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of this disclosure, suitable methods and materials are described below. The term "comprises" means "includes." The abbreviation "e.g." is derived from the Latin exempli gratia, and is used herein to indicate a non-limiting example. Thus, the abbreviation "e.g." is synonymous with the term "for example."
[033] In order to facilitate review of the various embodiments of this disclosure, the following explanations of specific terms are provided:
[034] Allele: An alternate form of a gene or locus. A locus can have many different alleles, which can differ from each other by a single base substitution, deletion or addition, or by the substitution, deletion or addition of several or even many nucleotides. Exemplary alleles include the polymorphisms shown in Tables 1 and 3. As illustrated by the sequences in these tables, at several places within the identified region of the CanFam1 genomic map, there is variation in the DNA sequence among dogs. For example, SEQ ID NO: 1 in Table 1 shows the two common alleles at CanFam1 reference position 44,212,792: one allele contains the sequence TAATGATGCT C ACACTTGGAA (SEQ ID NO: 1) and the other allele that is found at that same reference position is TAATGATGCT T ACACTTGGAA (SEQ ID NO: 1). These alleles differ at the nucleotide indicated in bold.
[035] Amplifying a nucleic acid molecule: To increase the number of copies of a nucleic acid sequence, such as a region of chromosome 15 from CanFam1, a gene or a fragment of a gene. In some instances the amplified region can be referred to as an amplicon and it can contain one or more genetic markers. Table 2 describes several amplicons that were made from CanFam1.
[036] Array: An arrangement of molecules, such as biological macromolecules (such as polypeptides or nucleic acids) or biological samples (such as tissue sections), in addressable locations on or in a substrate. A "microarray" is an array that is miniaturized so as to require or be aided by microscopic examination for evaluation or analysis. Arrays are sometimes called DNA chips or biochips.
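The two SEQ ID NO: 1 alleles quoted above can be compared position by position to locate the polymorphic nucleotide. The sketch below is illustrative only; the variable and function names are assumptions, and the sequences are the two 21-base alleles from the text with the spacing around the variable base removed.

```python
# The two common alleles given for SEQ ID NO: 1 (spacing removed).
ALLELE_C = "TAATGATGCTCACACTTGGAA"
ALLELE_T = "TAATGATGCTTACACTTGGAA"


def differing_positions(first, second):
    """Return (index, base_in_first, base_in_second) for each mismatch.

    Indices are 0-based; the sequences must be the same length.
    """
    if len(first) != len(second):
        raise ValueError("alleles must have equal length for a positional comparison")
    return [(i, a, b) for i, (a, b) in enumerate(zip(first, second)) if a != b]


print(differing_positions(ALLELE_C, ALLELE_T))
```

The single mismatch lies at the center of the 21-base sequence, with ten matching bases on either side.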
[037] The array of molecules ("features") makes it possible to carry out a very large number of analyses on a sample at one time. In certain example arrays, one or more molecules (such as an oligonucleotide probe) will occur on the array a plurality of times (such as twice), for instance to provide internal controls. The number of addressable locations on the array can vary, for example from a few (such as three) to at least six, at least 20, at least 25, or more. In particular examples, an array includes nucleic acid molecules, such as oligonucleotide sequences that are at least 15 nucleotides in length, such as about 15-40 nucleotides in length, such as at least 18 nucleotides in length, at least 21 nucleotides in length, or even at least 25 nucleotides in length. In one example, the molecule includes oligonucleotides attached to the array via their 5'- or 3'-end.
[038] Within an array, each arrayed sample is addressable, in that its location can be reliably and consistently determined within the at least two dimensions of the array. The feature application location on an array can assume different shapes. For example, the array can be regular (such as arranged in uniform rows and columns) or irregular. Thus, in ordered arrays the location of each sample is assigned to the sample at the time when it is applied to the array, and a key may be provided in order to correlate each location with the appropriate target or feature position. Often, ordered arrays are arranged in a symmetrical grid pattern, but samples could be arranged in other patterns (such as in radially distributed lines, spiral lines, or ordered clusters). Addressable arrays usually are computer readable, in that a computer can be programmed to correlate a particular address on the array with information about the sample at that position (such as hybridization or binding data, including for instance signal intensity). In some examples of computer readable formats, the individual features in the array are arranged regularly, for instance in a Cartesian grid pattern, which can be correlated to address information by a computer.
[039] Exemplary arrays that are useful for detecting the haplotype of a dog include one or more nucleic acid sequences that target the polymorphisms shown in Table 1 or 3. In additional examples an array is made by fixing labeled probes that hybridize to SEQ ID NOS: 1-20 to a solid support.
[040] cDNA (complementary DNA): A piece of DNA corresponding in sequence to a messenger RNA extracted from a cell. cDNA can be produced by reverse transcription of cellular RNA. Typically, a cDNA lacks internal, non-coding segments (introns) and regulatory sequences which determine transcription.
[041] Concordance: The presence of two or more loci or traits (or combination thereof) derived from the same parental chromosome. The opposite of concordance is discordance, that is, the inheritance of only one (of two or more) parental alleles and/or traits associated with a parental chromosome.
[042] Correlation: A correlation between a phenotypic trait and the presence or absence of a genetic marker (or haplotype or genotype) can be observed by measuring the phenotypic trait and comparing it to data showing the presence or absence of one or more genetic markers. Some correlations are stronger than others, meaning that in some instances all dogs having large adult body size will display a particular genetic marker (i.e., 100% correlation). In other examples the correlation will not be as strong, meaning that dogs having large adult body size (either compared within an inbred line of dogs, or within a mixed breed population) will only display a particular genetic marker 90%, 85%, 70%, 60%, 55%, or 50% of the time. In some instances, a haplotype which contains information relating to the presence or absence of multiple markers can also be correlated to adult dog body size. Examples of correlations between genetic markers and haplotypes are shown in the figures. Correlations can also be described using various statistical analyses, such as the Spearman's method described in Example 7 or the Mann-Whitney U test described in Example 4.
[043] Directional Contribution: A contribution to a trait by a gene or QTL that can be measured directionally along a linear scale. For example, adult body size in dogs can be measured along a linear scale measured in weight. A gene or QTL makes a directional contribution towards size if its expression contributes to larger (or conversely, smaller) size.
[044] DNA (deoxyribonucleic acid): A long chain polymer which includes the genetic material of most living organisms (some viruses have genes including ribonucleic acid, RNA). The repeating units in DNA polymers are four different nucleotides, each of which includes one of the four bases, adenine, guanine, cytosine and thymine, bound to a deoxyribose sugar to which a phosphate group is attached. Triplets of nucleotides, referred to as codons, in DNA molecules code for amino acids in a polypeptide. The term codon is also used for the corresponding (and complementary) sequences of three nucleotides in the mRNA into which the DNA sequence is transcribed.
[045] Deletion: The removal of one or more nucleotides from a nucleic acid sequence (or one or more amino acids from a protein sequence), the regions on either side of the removed sequence being joined together.
[046] Genotype: The set of alleles present in a subject at one or more loci under investigation. At any one autosomal locus a genotype will be either homozygous (with two identical alleles) or heterozygous (with two different alleles).
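The homozygous/heterozygous distinction in this definition can be written as a one-line check applied across loci. This is a hypothetical sketch: the dictionary layout, the locus keys, and the example alleles are assumptions made for illustration and are not data from the disclosure.

```python
def classify_genotypes(genotypes):
    """Label each autosomal locus as homozygous or heterozygous.

    `genotypes` maps a locus name to a pair of alleles, e.g. {"SNP5": ("A", "G")}.
    """
    return {
        locus: "homozygous" if first == second else "heterozygous"
        for locus, (first, second) in genotypes.items()
    }


# Hypothetical genotype calls at two of the reference positions named in the text.
example = {"chr15:44228468": ("A", "A"), "chr15:44226324": ("C", "T")}
print(classify_genotypes(example))
```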
[047] Haplotype: The set of alleles present at linked loci (or nucleotide changes within a gene) that are found together on a single chromosome homolog. Haplotypes can be recognized by detecting more than one polymorphism. For example several haplotypes are shown in FIGS. 4 and 6.
[048] Hybridization: To form base pairs between complementary regions of two strands of DNA, RNA, or between DNA and RNA, thereby forming a duplex molecule or hybridization complex.
[049] Insertion: The addition of one or more nucleotides to a nucleic acid sequence, or the addition of one or more amino acids to a protein sequence. In contrast, the term deletion refers to the subtraction of one or more nucleotides from a nucleic acid sequence, or the subtraction of one or more amino acids from a protein sequence. The term substitution refers to the replacement of one nucleotide for a different nucleotide in a nucleic acid sequence, or the replacement of one amino acid for another amino acid in a protein sequence.
[050] Isolated: An "isolated" biological component, such as a nucleic acid molecule (or a protein or organelle) has been substantially separated or purified away from other biological components in the cell of the organism in which the component naturally occurs, such as other chromosomal and extra-chromosomal DNA and RNA, proteins and organelles. Nucleic acid molecules and proteins that have been "isolated" include nucleic acid molecules and proteins purified by standard purification methods. The term also embraces nucleic acid molecules and proteins prepared by recombinant expression in a host cell as well as chemically synthesized nucleic acid molecules and proteins.
[051] Label: An agent capable of detection, for example by ELISA, spectrophotometry, flow cytometry, or microscopy. For example, a label can be attached to a nucleic acid molecule, thereby permitting detection of the nucleic acid molecule. Examples of labels include, but are not limited to, radioactive isotopes, enzyme substrates, co-factors, ligands, chemiluminescent agents, fluorophores, haptens, enzymes, and combinations thereof. Methods for labeling and guidance in the choice of labels appropriate for various purposes are discussed for example in Sambrook et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, New York, 1989) and Ausubel et al. (In Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1998).
[052] Linkage: The association of two or more loci (and/or traits) at positions on the same chromosome, such that recombination between the two loci is reduced to a proportion significantly less than 50%. The term linkage can also be used in reference to the association between one or more loci and a trait if an allele (or alleles) and the trait, or absence thereof, are observed together in significantly greater than 50% of occurrences. A linkage group is a set of loci, in which all members are linked either directly or indirectly to all other members of the set.
[053] Linkage Disequilibrium: Co-occurrence of two genetic loci (e.g., markers) at a frequency greater than expected for independent loci based on the allele frequencies. Linkage disequilibrium (LD) typically occurs when two loci are located close together on the same chromosome. When alleles of two genetic loci (such as a marker locus and a causal locus) are in strong LD, the allele observed at one locus (such as a marker locus) is predictive of the allele found at the other locus (for example, a causal locus contributing to a phenotypic trait).
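Linkage disequilibrium between two biallelic loci is commonly summarized by the coefficient D (observed haplotype frequency minus the frequency expected under independence) and by r squared. The sketch below computes both from haplotype counts; the function name, the "AB"/"Ab"/"aB"/"ab" encoding, and the counts are assumptions made for illustration, and the disclosure does not state which LD statistic was used.

```python
def linkage_disequilibrium(haplotype_counts):
    """Return (D, r_squared) for two biallelic loci.

    `haplotype_counts` maps two-locus haplotypes "AB", "Ab", "aB", "ab"
    to observed chromosome counts (missing keys count as zero).
    """
    total = sum(haplotype_counts.values())
    if total == 0:
        raise ValueError("no haplotypes observed")
    count = lambda key: haplotype_counts.get(key, 0)
    p_ab = count("AB") / total                    # frequency of the A-B haplotype
    p_a = (count("AB") + count("Ab")) / total     # frequency of allele A at locus 1
    p_b = (count("AB") + count("aB")) / total     # frequency of allele B at locus 2
    d = p_ab - p_a * p_b
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denominator if denominator > 0 else float("nan")
    return d, r_squared


# Hypothetical counts: alleles at the two loci always travel together.
print(linkage_disequilibrium({"AB": 50, "ab": 50}))
```

An r squared of 1 corresponds to the case described above in which the allele at the marker locus fully predicts the allele at the other locus, while a value near 0 indicates independent loci.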
[054] Marker or Genetic Marker: A nucleic acid at a known location on a chromosome, which is associated with a specified gene or trait. Typically, markers are highly polymorphic and the variant forms (or alleles) can be identified by a simple and reproducible assay. The term marker can also be used to refer to the alleles and/or the polymorphisms shown in Tables 1 and 3.
[055] Microsatellite or Simple Sequence Repeat: A very short unit sequence of nucleotides that is repeated multiple times in tandem. Microsatellite sequences are present throughout the genome and are highly polymorphic in terms of length (number) of the repeated nucleotides. A polymorphism at a microsatellite locus is also referred to as a Simple Sequence Length Polymorphism (SSLP). An exemplary simple repeat sequence is shown in SEQ ID NO: 22.
[056] Multifactorial: A trait controlled by at least two factors, which can be genetic or environmental (for example, body weight). Polygenic traits, which are phenotypes that result from interactions among the products of two or more genes with alternative alleles, represent a subset of multifactorial traits.
[057] Nucleic acid molecules: A deoxyribonucleotide or ribonucleotide polymer including, without limitation, cDNA, mRNA, genomic DNA, and synthetic (such as chemically synthesized) DNA. The nucleic acid molecule can be double-stranded or single-stranded. Where single-stranded, the nucleic acid molecule can be the sense strand or the antisense strand. In addition, the nucleic acid molecule can be circular or linear.
[058] The disclosure includes isolated nucleic acid molecules that include specified loci associated with adult body size in dogs. Such molecules can include at least 10, at least 15, at least 20, at least 21, at least 25, at least 30, at least 35, at least 40, at least 45 or at least 50 consecutive nucleotides of these sequences or more.
[059] Nucleotide: Includes, but is not limited to, a monomer that includes a base linked to a sugar, such as a pyrimidine, purine or synthetic analogs thereof, or a base linked to an amino acid, as in a peptide nucleic acid (PNA). A nucleotide is one monomer in a polynucleotide. A nucleotide sequence refers to the sequence of bases in a polynucleotide.
[060] Oligonucleotide: A plurality of nucleotides joined by native phosphodiester bonds, between about 6 and about 300 nucleotides in length. An oligonucleotide analog refers to moieties that function similarly to oligonucleotides but have non-naturally occurring portions. For example, oligonucleotide analogs can contain non-naturally occurring portions, such as altered sugar moieties or inter-sugar linkages, such as a phosphorothioate oligodeoxynucleotide.
[061] Particular oligonucleotides and oligonucleotide analogs can include linear sequences up to about 200 nucleotides in length, for example a sequence (such as DNA or RNA) that is at least 6 bases, for example at least 8, at least 10, at least 15, at least 20, at least 21, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 100 or even at least 200 bases long, or from about 6 to about 50 bases, for example about 10-25 bases, such as 12, 15, 20, 21, or 25 bases. In some examples these oligonucleotides are engineered to bind to the markers described herein.
[062] Oligonucleotide probe: A short sequence of nucleotides, such as at least 8, at least 10, at least 15, at least 20, at least 21, at least 25, or at least 30 nucleotides in length, used to detect the presence of a complementary sequence by molecular hybridization. In particular examples, oligonucleotide probes include a label that permits detection of oligonucleotide probe:target sequence hybridization complexes. Exemplary probes that are useful for detecting alleles that can be used to predict the body size of dogs include, for example, sequences that are complementary to any one of the sequences shown in Tables 1 and 3. Such probes can be used in various combinations in arrays so that the haplotype of a dog can be identified. For example, probes that target the sequences identified in Table 1 can be used in an array to identify the haplotypes shown in FIG. 4.
[063] Parent or Parental: An animal that is used in the initial cross of a multi-generational breeding program.
[064] Phenotype: The physical manifestation of a subject's genotype. The phenotype for a particular trait, such as adult dog body size, can be determined predominantly by the genotype at a single locus, at two or more loci, or as the result of interactions between the genotype and the environment.
[065] Polymorphism: As a result of mutations, a gene sequence can differ among individuals. The differing sequences are referred to as alleles. The alleles that are present at a given locus (a specific point within a nucleic acid sequence) are referred to as the individual's genotype. Some loci vary considerably among individuals. If a locus has two or more alleles whose frequencies each exceed 1% in a population, the locus is said to be polymorphic. The polymorphic site is termed a polymorphism. The term polymorphism also encompasses variations that produce gene products with altered function, that is, variants in the gene sequence that lead to gene products that are not functionally equivalent. This term also encompasses variations that produce no gene product, an inactive gene product, or increased or decreased activity gene product or even no biological effect.
[066] Polymorphisms can be referred to, for instance, by the nucleotide position at which the variation exists, by the change in amino acid sequence caused by the nucleotide variation, or by a change in some other characteristic of the nucleic acid molecule or protein that is linked to the variation. Polymorphisms can be causative (actually involved in or influencing the condition or trait to which the polymorphism is linked) or associative (linked to but not having any direct involvement in or influence on the condition or trait to which the polymorphism is linked). Polymorphisms useful as genetic markers include the exemplary SNP, SINE and microsatellite (SSLP) polymorphisms disclosed herein, as well as numerous alternatives, including for example, minisatellites, restriction fragment length polymorphisms (RFLPs), restriction fragment length variants (RFLVs), single strand conformation polymorphisms (SSCPs), amplification length polymorphisms (AFLPs), and the like.
[067] Primers: Short nucleic acid molecules, for instance DNA oligonucleotides 10-100 nucleotides in length, such as about 15, 20, 21, 25, 30 or 50 nucleotides or more in length. Primers can be annealed to a complementary target DNA strand by nucleic acid hybridization to form a hybrid between the primer and the target DNA strand. Primer pairs can be used for amplification of a nucleic acid sequence, for example the CanFam1 regions described in Tables 1-3, such as by PCR or other nucleic acid amplification methods known in the art. In some instances primers can include a labeling moiety and upon hybridization and amplification the resulting amplified DNA contains the label. When primers are used in this way they can also be referred to as probes.
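The frequency criterion in the Polymorphism definition (two or more alleles whose frequencies each exceed 1% in a population) can be sketched as follows. This is an illustrative example only; the function names and the sample allele counts are hypothetical and are not data from this disclosure.

```python
from collections import Counter


def allele_frequencies(alleles):
    """Map each observed allele to its frequency in the sample."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_polymorphic(alleles, threshold=0.01):
    """True if at least two alleles each exceed the frequency threshold."""
    freqs = allele_frequencies(alleles)
    common = [a for a, f in freqs.items() if f > threshold]
    return len(common) >= 2


# Hypothetical samples of 100 and 1000 chromosomes.
print(is_polymorphic(["A"] * 95 + ["G"] * 5))    # G allele at 5%
print(is_polymorphic(["A"] * 999 + ["G"] * 1))   # G allele at 0.1%
```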
[068] Methods for preparing and using nucleic acid primers are described, for example, in Sambrook et al. (In Molecular Cloning: A Laboratory Manual, CSHL, New York, 1989), Ausubel et al. (ed.) (In Current Protocols in Molecular Biology, John Wiley & Sons, New York, 1998), and Innis et al. (PCR Protocols, A Guide to Methods and Applications, Academic Press, Inc., San Diego, CA, 1990). PCR primer pairs can be derived from a known sequence, for example, by using computer programs intended for that purpose such as Primer (Version 0.5, © 1991, Whitehead Institute for Biomedical Research, Cambridge, MA).
[069] Purified: The term "purified" does not require absolute purity; rather, it is intended as a relative term. Thus, for example, a purified nucleic acid preparation is one in which the specified nucleic acid is more pure than the nucleic acid is in its natural environment within a cell. For example, a preparation of a nucleic acid is purified such that the specified nucleic acid represents at least 50% of the total nucleic acid content of the preparation.
[070] Quantitative Trait Loci (QTL): The location of one or more genes that are involved in the inheritance of quantitative traits. Quantitative traits are traits that are measured on a continuous scale, such as body size, which can take any number of different values. Though not necessarily genes themselves, quantitative trait loci (QTLs) are stretches of DNA that are closely linked to the genes that underlie the trait in question. QTLs can be molecularly identified (for example, with PCR) to help map regions of the genome that contain genes involved in specifying a quantitative trait. Markers (specific nucleic acid sequences) found within QTLs can be statistically correlated to various phenotypic traits, such as dog size, and such markers can then be used to predict the adult size of a dog.
[071] Sample: A biological specimen, such as those containing genomic DNA, RNA (including mRNA), protein, or combinations thereof. Examples include, but are not limited to, peripheral blood, urine, saliva, tissue biopsy, surgical specimen, amniocentesis samples, and autopsy material.
[072] SINE: A short interspersed element, a few hundred base pairs in size, that belongs to a family of retrotransposons dispersed throughout the genomes of many mammals, including domestic dogs. Presence or absence of a SINE at a specific site in the genome constitutes a detectable polymorphism. In dogs the most prevalent SINE family is designated SINEC. Dog SINEC elements are composed of two major subfamilies, SINEC1 and SINEC2, as well as a number of other subfamilies. SINE insertion sites can be detected by amplifying the putative SINE insertion site using sequences flanking the SINE as primers. After amplification, the presence of the element is detected, for example, by Southern hybridization.
[073] Single nucleotide polymorphism (SNP): A single base (nucleotide) difference in a DNA sequence among individuals in a population.
[074] Subject: Living multi-cellular vertebrate organisms, a category that includes human and non-human mammals (such as veterinary subjects). In specific examples a subject is a dog, such as a pure breed dog, for example a dachshund, collie, terrier, or one of the breeds described in the figures.
[075] Target sequence: A sequence of nucleotides located in a particular region in a genome (such as a human genome or the genome of any mammal) that corresponds to one or more specific phenotypic attributes, such as one or more nucleotide substitutions, deletions, insertions, amplifications, or combinations thereof. The target can be for instance a coding sequence; it can also be the non-coding strand that corresponds to a coding sequence. Examples of target sequences include those sequences associated with adult dog body size, such as SNP markers listed in Table 1 (polynucleotide sequences of alternative alleles of these markers are provided in SEQ ID NOs: 1-20) or alternative markers, such as those at CanFam1 reference positions 44,228,010 and 44,283,669.
Genetic markers associated with body size in dogs
[076] As described in more detail below, amplicons (larger regions within the dog chromosome 15) were sequenced to identify the markers shown in Table 3. One of ordinary skill in the art will recognize that these markers can then be used to identify correlations with dog body size. For example, the 20 single nucleotide polymorphisms (shown in bold in Table 3 and also shown in Table 1) were used to develop haplotypes that can be used to predict adult dog body size. Moreover, these markers can be used to predict adult dog body size among various breeds and also within breeds.
[077] More specifically, selective sweep and association analyses identified a region of dog chromosome 15 near insulin-like growth factor-1 (IGF1) with low variation, high divergence, and haplotype sharing in small dogs. These results also suggest that IGF1 is associated with a major causative mutation for small size in dogs. The present disclosure provides exemplary markers and alleles thereof that span this region of the dog genome and that identify discrete chromosomal haplotypes that correlate with adult body size in dogs. These markers have been validated in a wide variety of dog breeds of different sizes and are useful for predicting adult body size across multiple dog breeds.
[078] Numerous polymorphic loci are provided in the example section herein. One of ordinary skill in the art will recognize the equivalency of these and additional markers localized within the same interval (34-49 Mb) on dog chromosome 15. As will be appreciated by those of skill in the art, the closer the marker is to the causal mutation underlying the directional contribution to size of the IGF1 associated quantitative trait loci (QTL), the higher the predictive value of the marker genotype. Although any of the provided polymorphic markers possesses predictive value with respect to size in dogs, the results disclosed herein demonstrate that the value of particular markers increases when the marker is positioned within progressively smaller intervals. It is presumed that the increase in predictive value is based on proximity to the putative causal mutation in IGF1 which determines the contribution of this locus to adult body size in dogs. Thus, markers within the chromosome 15 interval from CanFam1 reference position 44,000,000 to CanFam1 reference position 45,000,000 are favorably used to predict haplotypes associated with size in dogs. In some embodiments, the markers reside within an interval between CanFam1 reference position 44,190,850 and CanFam1 reference position 44,284,186. In particular embodiments, the markers are localized within an even smaller interval from CanFam1 reference position 44,212,792 to CanFam1 reference position 44,278,140. In another embodiment, the marker is selected from among specific markers, such as the SNP markers of Table 1 (alternative alleles of these marker loci are depicted in SEQ ID NOs: 1-20) or from a set of markers (e.g. containing putative causal mutations), such as markers at CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669.
[079] To facilitate understanding of this disclosure, specific exemplary subsets of markers are described and discussed. These exemplary markers have been validated for their high predictive value for identifying a chromosomal haplotype associated with body size in adult dogs. The ability of these markers to identify an IGF1 genotype correlated with small or large body size exceeds 99%, regardless of the breed. Thus, these markers and alleles constitute a substantial advancement with respect to prior art markers associated with size in dogs.
[080] Table 1 provides a set of 20 SNP marker loci indicated by reference to the CanFam1 assembly, along with the nucleotide sequence of the alternative alleles at the marker locus. These markers consistently and reliably identify chromosomal haplotypes that segregate with small and large body size in dogs. These markers are localized within an interval from CanFam1 reference positions 44,212,792 to 44,278,140, a distance of approximately 65 kb. One of ordinary skill in the art will appreciate that any one or more of these markers can be used in the methods provided herein, such as any combination of SEQ ID NOS: 1-20.
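The nested chromosome 15 intervals described above can be expressed as a small lookup that reports the narrowest interval containing a given marker position. This is a sketch under stated assumptions: the interval labels and function name are hypothetical, and the 34-49 Mb interval is assumed to mean positions 34,000,000 through 49,000,000. The coordinates themselves are the CanFam1 reference positions recited in the text.

```python
# (label, start, end) on dog chromosome 15, CanFam1 coordinates.
# Labels are illustrative names, not terms used in the disclosure.
INTERVALS = [
    ("34-49 Mb region", 34_000_000, 49_000_000),
    ("44.0-45.0 Mb interval", 44_000_000, 45_000_000),
    ("IGF1-flanking interval", 44_190_850, 44_284_186),
    ("Table 1 SNP interval", 44_212_792, 44_278_140),
]


def narrowest_interval(position):
    """Return the label of the smallest interval containing position, or None."""
    hits = [(end - start, label)
            for label, start, end in INTERVALS
            if start <= position <= end]
    return min(hits)[1] if hits else None


print(narrowest_interval(44_228_468))  # a Table 1 SNP position
print(narrowest_interval(44_283_669))  # the CA simple sequence repeat position
print(44_278_140 - 44_212_792)         # span of the smallest interval, in bp
```

The last line gives 65,348 bp, which matches the span of approximately 65 kb stated for the Table 1 marker set.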
Table 1: Polymorphic marker set for predicting adult body size in dogs
Figure imgf000021_0001
[081] In specific embodiments, the markers are selected from a reduced set of only six marker loci, at CanFam1 reference positions: 44,212,792; 44,226,324; 44,226,684; 44,228,468; 44,237,388; and 44,260,949 (SEQ ID NOs: 1, 3, 4, 5, 9, and 16), which identify chromosomal haplotypes correlated with either small or large size in dogs. Thus, it is readily apparent that all or some of these exemplary markers can favorably be used to assess body size. The precise number of markers evaluated is not important; rather, the predictive value resides in the association of these markers with a chromosomal conformation associated with an IGF1 locus that constitutes a major effect locus for body size in dogs. Indeed, a single marker within this region, such as SNP5 (44,228,468) represented by SEQ ID NO: 5, can be used alone to predict adult dog body size. Similarly, alternative markers closely linked to the described marker(s) can also be used to infer size. For example, numerous additional polymorphic markers exist within this region, including insertion/deletion polymorphisms (such as a SINEC element insertion at CanFam1 reference position 44,228,010), and microsatellite polymorphisms (for example, a CA simple sequence repeat polymorphism at CanFam1 reference position 44,283,669).
[082] These markers are particularly useful for determining the directional contribution of an IGF1 associated QTL in the absence of an identified causal variation at this locus. Of course, causal mutations within the IGF1 locus could also be used to predict size in dogs.
[083] In addition, markers with sequence identity or sequence similarity to those disclosed herein are also suitable in the context of the methods disclosed herein. Sequence identity can be measured in terms of percentage identity; the higher the percentage, the more identical the sequences are. Homologs or orthologs of nucleic acid or amino acid sequences possess a relatively high degree of sequence identity/similarity when aligned using standard methods. This homology is more significant when the orthologous proteins or cDNAs are derived from species which are more closely related (such as between closely related breeds of dog), compared to species more distantly related (such as a dog and wolf or jackal).
[084] Methods of alignment of sequences for comparison are well known in the art. Various programs and alignment algorithms are described in: Smith & Waterman, Adv. Appl. Math. 2:482, 1981; Needleman & Wunsch, J. Mol. Biol. 48:443, 1970; Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988; Higgins & Sharp, Gene, 73:237-44, 1988; Higgins & Sharp, CABIOS 5:151-3, 1989; Corpet et al., Nuc. Acids Res. 16:10881-90, 1988; Huang et al., Computer Appls. in the Biosciences 8, 155-65, 1992; and Pearson et al., Meth. Mol. Bio. 24:307-31, 1994. Altschul et al., J. Mol. Biol. 215:403-10, 1990, presents a detailed consideration of sequence alignment methods and homology calculations.
[085] The NCBI Basic Local Alignment Search Tool (BLAST) (Altschul et al., J. Mol. Biol. 215:403-10, 1990) is available from several sources, including the National Center for Biotechnology Information (NCBI, National Library of Medicine, Building 38A, Room 8N805, Bethesda, MD 20894) and on the Internet, for use in connection with the sequence analysis programs blastp, blastn, blastx, tblastn and tblastx. Additional information can be found at the NCBI web site. Any of these programs can be employed using the default or other parameters specified by the practitioner.
[086] BLASTN is used to compare nucleic acid sequences, while BLASTP is used to compare amino acid sequences. If the two compared sequences share sequence identity, then the designated output file will present those regions of identity as aligned sequences. If the two compared sequences do not share sequence identity, then the designated output file will not present aligned sequences.
[087] Once aligned, the number of matches is determined by counting the number of positions where an identical nucleotide or amino acid residue is presented in both sequences. The percent sequence identity is determined by dividing the number of matches either by the length of the sequence set forth in the identified sequence, or by an articulated length (such as 100 consecutive nucleotides or amino acid residues from a sequence set forth in an identified sequence), followed by multiplying the resulting value by 100. Probes and primers directed to the polymorphisms correlated to adult dog body size need not share 100% sequence identity. Such homologous nucleic acid sequences can, for example, possess at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, at least 98%, or at least 99% sequence identity determined by this method.
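The percent identity calculation described in the preceding paragraph can be sketched as follows, assuming the two sequences have already been aligned (with "-" marking gaps) by a program such as BLASTN. The function name, the gap convention, and the example sequences are illustrative assumptions, not part of the disclosure.

```python
def percent_identity(aligned_ref, aligned_query, articulated_length=None):
    """Percent identity of two pre-aligned, equal-length sequences.

    Matches are positions carrying the same residue in both sequences.
    The divisor is the ungapped length of the identified (reference)
    sequence, or an articulated length if one is supplied.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    ref = aligned_ref.upper()
    query = aligned_query.upper()
    matches = sum(1 for r, q in zip(ref, query) if r == q and r != "-")
    if articulated_length is None:
        length = len(ref.replace("-", ""))
    else:
        length = articulated_length
    return 100.0 * matches / length


# Hypothetical 10-nucleotide sequences differing at one position.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
# The same alignment scored against an articulated length of 20.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC", articulated_length=20))
```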
Methods for predicting adult body size in dogs
[088] Methods of predicting the adult body size of a dog can include obtaining a sample from a subject dog that includes genomic DNA using any method known in the art. In some instances the dog is being used as part of a breeding program and the genotype of the dog will be matched according to characteristics desired by the breeder. For example, a dog may be chosen to mate with another dog that carries the same alleles, thereby increasing the probability that the offspring will display a homogeneous adult size. In other instances the dog is a puppy, and the adult size of the puppy will be predicted. The genomic DNA sample taken from the subject dog will include an interval on chromosome 15 between CanFam1 reference positions 44,000,000 and 45,000,000. This interval is shown herein to contain many markers that can be used to predict adult dog body size. For example, the interval contains the sequences set forth in SEQ ID NOS: 1-22, and 66-226. The genomic DNA sample can then be assayed to identify the alleles (polymorphisms) within the interval. The alleles present can then be used to predict adult dog body size.
[089] Breeding plans can include inbreeding, which refers to the breeding of two related dogs (as used herein, dogs are related when the parentage of both the female dog and the male dog, traced back over several generations, contains at least one common ancestor). Inbreeding is common practice when purebred dogs are desired, and the practice of inbreeding allows for the maintenance of desirable phenotypic traits. Within a population of purebred dogs the adult body size is typically more consistent than, for instance, within a population comprising a mixture of dog breeds. An index of adult dog body size within a purebred population of dogs can be created and then the dogs can be genotyped using the markers provided herein. The haplotypes can then be correlated with the index of average body size.
[090] In some examples the polymorphism includes one or more of the sequences shown in SEQ ID NOS: 5-8, 11, 12, 15, 16, or 20 and the dog, or the offspring from the dog, is predicted to have small body size. In other examples, the polymorphism includes one or more of the sequences shown in SEQ ID NOS: 1-4, 8, 10, 13, 14, 17, or 20 and the dog, or offspring from the dog is predicted to have large body size. In yet other examples, the dog is less than 6 months, 1 year old, or 2 years old.
[091] One of ordinary skill in the art will appreciate that dog breeding plans vary depending upon the breeders' desired outcome. For example, a breeding program can be designed to diversify the alleles for small body size and large body size in the offspring. When diversity is desired, the breeding pair (the male and female parental dogs) will be chosen such that they do not display the same alleles. Conversely, when it is desired to have litters where the size of the adult offspring are homogeneous the breeding pair can be selected such that the parental dogs display the same markers. Generally, when homogeneity is desired the average size of the litter is similar to that of the breeding pair.
[092] The genomic DNA can be assayed to determine which markers are present using any method known in the art. For example, single-strand conformation polymorphism (SSCP) analysis, base excision sequence scanning (BESS), restriction fragment length polymorphism (RFLP) analysis, heteroduplex analysis, denaturing gradient gel electrophoresis (DGGE), temperature gradient electrophoresis, allelic polymerase chain reaction (PCR), ligase chain reaction, direct sequencing, minisequencing, nucleic acid hybridization, or micro-array-type detection can be used to identify the polymorphisms present in the sample.
[093] The methods described herein include genotyping a sample of genetic material obtained from a subject dog for one or more markers on chromosome 15 to determine the allele present at the marker locus. In one example, the markers are chosen from the markers provided in Table 3. In another example, one or more markers located in the interval from CanFam1 reference position 44,000,000 to CanFam1 reference position 45,000,000 are genotyped to determine the allele for the marker(s). In certain cases, the markers localize within an interval between CanFam1 reference position 44,190,850 and CanFam1 reference position 44,284,186. Optionally, the markers are localized within an even smaller interval from CanFam1 reference position 44,212,792 to CanFam1 reference position 44,278,140. In some instances, the marker is selected from among specific markers, such as the SNP markers of Table 1 (alternative alleles of these markers are represented by SEQ ID NOS: 1-20) or from a set of markers (e.g. containing putative causal mutations), such as markers at CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669. In one example, the markers shown in SEQ ID NOS: 1, 4, 5, and 21 can be detected. In yet other examples, the marker shown in SEQ ID NOS: 5 or 21 can be detected.
[094] The genotype of the one or more markers identifies a haplotype on chromosome 15 associated with a QTL of major effect in the vicinity of the insulin-like growth factor-1 (IGF1) locus that contributes to small or large size in dogs. For example, in populations of inbred dogs in which alternative alleles of the QTL are segregating (such as Portuguese water dogs), identification of a haplotype correlated with small body size (for example, haplotypes A, B, C shown in FIG. 4) in a subject dog indicates that the subject possesses an allele of the IGF1 associated QTL that contributes to small body size. Thus, a dog with this haplotype will tend toward a smaller body size than a dog of the same breed and gender with a haplotype correlated with large body size (for example, haplotypes D-L shown in FIG. 4).
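The haplotype grouping recited above (haplotypes A, B, C correlated with small body size; haplotypes D-L correlated with large body size, per FIG. 4) can be represented as a simple lookup. This is only an illustrative sketch: the function names are hypothetical, and counting small-associated haplotypes across a dog's two copies of chromosome 15 is an assumption made here for illustration, not a scoring scheme stated in the disclosure.

```python
# Haplotype labels as grouped in the text with reference to FIG. 4.
SMALL_HAPLOTYPES = frozenset("ABC")
LARGE_HAPLOTYPES = frozenset("DEFGHIJKL")


def haplotype_class(label):
    """Return 'small' or 'large' for a FIG. 4 haplotype label."""
    label = label.upper()
    if label in SMALL_HAPLOTYPES:
        return "small"
    if label in LARGE_HAPLOTYPES:
        return "large"
    raise ValueError(f"unknown haplotype label: {label!r}")


def small_haplotype_count(haplotype_pair):
    """Count small-associated haplotypes among a dog's two haplotypes."""
    return sum(1 for h in haplotype_pair if haplotype_class(h) == "small")


print(haplotype_class("B"))
print(small_haplotype_count(("A", "D")))
```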
Specimens
[095] As previously mentioned, appropriate specimens, or samples, for use with the current disclosure in determining body size of a dog include any conventional clinical sample, for instance blood or blood-fractions (such as serum). Techniques for acquisition of such samples are well known in the art (for example see Schluger et al. J. Exp. Med. 176:1327-33, 1992, for the collection of serum samples). Serum or other blood fractions can be prepared in the conventional manner. For example, about 200 μL of serum can be used for the extraction of DNA for use in amplification reactions.
[096] Once a sample has been obtained, the sample can be used directly, concentrated (for example by centrifugation or filtration), purified, or combinations thereof. In some examples nucleic acids in the sample are subjected to an amplification reaction. For example, rapid DNA preparation can be performed using a commercially available kit (such as the InstaGene Matrix, BioRad, Hercules, CA; the NucliSens isolation kit, Organon Teknika, Netherlands). In one example, the DNA preparation method yields a nucleotide preparation that is accessible to, and amenable to, nucleic acid amplification.
[097] The marker genotype is determined by any convenient method for ascertaining pertinent information regarding the nucleic acid sequence of the locus in the sample. In many cases, this can include obtaining information regarding the nucleotide sequence of the target sequence corresponding to the marker locus in the sample.
Detection of alleles
[098] The nucleic acids obtained from the sample can be genotyped to identify the particular allele present for a marker locus. A sample of sufficient quantity to permit direct detection of marker alleles from the sample can be obtained from the subject. Alternatively, a smaller sample is obtained from the subject and the nucleic acids are amplified prior to detection. Optionally, the nucleic acid sample is purified (or partially purified) prior to detection of the marker alleles. Any target nucleic acid that is informative for a chromosome haplotype in the interval between 34 and 49 Mb can be detected. More commonly, the target nucleic acid corresponds to a marker locus localized to an interval between CanFam1 reference position 44,190,850 and CanFam1 reference position 44,284,186, flanking the IGF1 locus. In certain examples, the target nucleic acid corresponds to a SNP marker selected from Table 1, or a SINEC insertion polymorphism at position 44,228,010, or a CA simple repeat length polymorphism at position 44,283,669. Such mutations or polymorphisms (or both) can be detected to identify the chromosomal haplotype in the subject. Any method of detecting a nucleic acid molecule can be used, such as hybridization and/or sequencing assays.
Hybridization
[099] Hybridization is the binding of complementary strands of DNA, DNA/RNA, or RNA. Hybridization can occur when primers or probes bind to target sequences such as target sequences within dog genomic DNA. Probes and primers that are useful generally include nucleic acid sequences that hybridize (for example under high stringency conditions) with at least 10, 12, 14, 16, 18, or 20 of the sequences provided in SEQ ID NOS: 1-250. Physical methods of detecting hybridization or binding of complementary strands of nucleic acid molecules, include but are not limited to, such methods as DNase I or chemical footprinting, gel shift and affinity cleavage assays, Southern and Northern blotting, dot blotting and light absorption detection procedures. The binding between a nucleic acid primer or probe and its target nucleic acid is frequently characterized by the temperature (Tm) at which 50% of the nucleic acid probe is melted from its target. A higher (Tm) means a stronger or more stable complex relative to a complex with a lower (Tm).
[0100] More generally, complementary nucleic acids form a stable duplex or triplex when the strands bind (hybridize) to each other by forming Watson-Crick, Hoogsteen or reverse Hoogsteen base pairs. Stable binding occurs when an oligonucleotide molecule remains detectably bound to a target nucleic acid sequence under the required conditions.
[0101] Complementarity is the degree to which bases in one nucleic acid strand base pair with the bases in a second nucleic acid strand. Complementarity is conveniently described by percentage, that is, the proportion of nucleotides that form base pairs between two strands or within a specific region or domain of two strands. For example, if 10 nucleotides of a 15-nucleotide oligonucleotide form base pairs with a targeted region of a DNA molecule, that oligonucleotide is said to have 66.67% complementarity to the region of DNA targeted.
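The percentage described above can be computed directly for Watson-Crick pairing. The sketch below assumes both strands are written 5' to 3' and are of equal length, so the target is reversed before pairing; the function name and example sequences are hypothetical. The example reproduces the case in the text of 10 paired nucleotides out of 15.

```python
# Watson-Crick partners for DNA bases.
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(oligo, target):
    """Percent of oligo bases that pair with an equal-length target region.

    Both sequences are given 5'->3'; the strands pair antiparallel, so
    the target is read in reverse when matching positions.
    """
    if len(oligo) != len(target):
        raise ValueError("oligo and target region must have equal length")
    pairs = sum(1 for o, t in zip(oligo.upper(), reversed(target.upper()))
                if WATSON_CRICK.get(o) == t)
    return 100.0 * pairs / len(oligo)


# Hypothetical 15-mer in which 10 of 15 positions can base pair.
value = percent_complementarity("A" * 15, "T" * 10 + "A" * 5)
print(round(value, 2))
```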
[0102] In the present disclosure, "sufficient complementarity" means that a sufficient number of base pairs exist between an oligonucleotide molecule and a target nucleic acid sequence (such as one of the markers provided in Tables 1 or 3) to achieve detectable binding. When expressed or measured by percentage of base pairs formed, the percentage complementarity that fulfills this goal can range from as little as about 50% complementarity to full (100%) complementarity. In general, sufficient complementarity is at least about 50%, for example at least about 75% complementarity, at least about 90% complementarity, at least about 95% complementarity, at least about 98% complementarity, or even about 100% complementarity.
[0103] A thorough treatment of the qualitative and quantitative considerations involved in establishing binding conditions that allow one skilled in the art to design appropriate oligonucleotides for use under the desired conditions is provided by Beltz et al. Methods Enzymol 100:266-285, 1983, and by Sambrook et al. (ed.), Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.
[0104] Hybridization conditions resulting in particular degrees of stringency will vary depending upon the nature of the hybridization method and the composition and length of the hybridizing nucleic acid sequences. Generally, the temperature of hybridization and the ionic strength (such as the Na+ concentration) of the hybridization buffer will determine the stringency of hybridization. Calculations regarding hybridization conditions for attaining particular degrees of stringency are discussed in Sambrook et al., (1989) Molecular Cloning: a laboratory manual, second edition, Cold Spring Harbor Laboratory, Plainview, NY (chapters 9 and 11). The following is an exemplary set of hybridization conditions and is not limiting:
Very High Stringency (detects sequences that share at least 90% complementarity)
Hybridization: 5x SSC at 65°C for 16 hours
Wash twice: 2x SSC at room temperature (RT) for 15 minutes each
Wash twice: 0.5x SSC at 65°C for 20 minutes each
High Stringency (detects sequences that share at least 80% complementarity)
Hybridization: 5x-6x SSC at 65°C-70°C for 16-20 hours
Wash twice: 2x SSC at RT for 5-20 minutes each
Wash twice: 1x SSC at 55°C-70°C for 30 minutes each
Low Stringency (detects sequences that share at least 50% complementarity)
Hybridization: 6x SSC at RT to 55°C for 16-20 hours
Wash at least twice: 2x-3x SSC at RT to 55°C for 20-30 minutes each.
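For illustration, the exemplary conditions listed above can be held in a small lookup structure. The class, field, and function names below are hypothetical, and the sketch simply restates the listed thresholds; it is not a method of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Stringency:
    name: str
    min_complementarity: int  # percent, as stated in the exemplary list
    hybridization: str
    washes: Tuple[str, ...]


# Ordered from most to least stringent, copied from the exemplary conditions.
CONDITIONS = (
    Stringency("very high", 90, "5x SSC at 65°C for 16 hours",
               ("2x SSC at RT for 15 minutes, twice",
                "0.5x SSC at 65°C for 20 minutes, twice")),
    Stringency("high", 80, "5x-6x SSC at 65°C-70°C for 16-20 hours",
               ("2x SSC at RT for 5-20 minutes, twice",
                "1x SSC at 55°C-70°C for 30 minutes, twice")),
    Stringency("low", 50, "6x SSC at RT to 55°C for 16-20 hours",
               ("2x-3x SSC at RT to 55°C for 20-30 minutes, at least twice",)),
)


def strictest_condition_detecting(percent: float) -> Optional[Stringency]:
    """Most stringent listed condition that still detects the given complementarity."""
    for condition in CONDITIONS:
        if percent >= condition.min_complementarity:
            return condition
    return None  # below the lowest listed threshold
```

For example, a duplex with 85% complementarity falls below the "very high" threshold, so the lookup returns the "high" entry.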
[0105] Methods for labeling nucleic acid molecules so they can be detected are well known. Examples of such labels include non-radiolabels and radiolabels. Non-radiolabels include, but are not limited to, an enzyme, chemiluminescent compound, fluorescent compound (such as FITC, Cy3, and Cy5), metal complex, hapten, colorimetric agent, a dye, or combinations thereof. Radiolabels include, but are not limited to, 125I and 35S. For example, radioactive and fluorescent labeling methods, as well as other methods known in the art, are suitable for use with the present disclosure. In one example, primers used to amplify the subject's nucleic acids are labeled (such as with biotin, a radiolabel, or a fluorophore). In another example, amplified target nucleic acid samples are end-labeled to form labeled amplified material. For example, amplified nucleic acid molecules can be labeled by including labeled nucleotides in the amplification reactions.
[0106] Nucleic acid molecules corresponding to one or more marker loci can also be detected by hybridization procedures using a labeled nucleic acid probe, such as a probe that detects only one alternative allele at a marker locus. Most commonly, the target nucleic acid (or amplified target nucleic acid) is separated based on size or charge and transferred to a solid support. The solid support (such as a membrane made of nylon or nitrocellulose) is contacted with a labeled nucleic acid probe, which hybridizes to its complementary target under suitable hybridization conditions to form a hybridization complex.
[0107] Hybridization conditions for a given combination of array and target material can be optimized routinely in an empirical manner close to the Tm of the expected duplexes, thereby maximizing the discriminating power of the method. For example, the hybridization conditions can be selected to permit discrimination between matched and mismatched oligonucleotides. Hybridization conditions can be chosen to correspond to those known to be suitable in standard procedures for hybridization to filters (and optionally for hybridization to arrays). In particular, temperature is controlled to substantially eliminate formation of duplexes between sequences other than an exactly complementary allele of the selected marker. A variety of known hybridization solvents can be employed, the choice being dependent on considerations known to one of skill in the art (see U.S. Patent 5,981,185).
[0108] Once the target nucleic acid molecules have been hybridized with the labeled probes, the presence of the hybridization complex can be analyzed, for example by detecting the complexes.
[0109] Methods for detecting hybridized nucleic acid complexes are well known in the art. In one example, detection includes detecting one or more labels present on the oligonucleotides, the target (e.g., amplified) sequences, or both. Detection can include treating the hybridized complex with a buffer and/or a conjugating solution to effect conjugation or coupling of the hybridized complex with the detection label, and treating the conjugated, hybridized complex with a detection reagent. Specific, non-limiting examples of conjugating solutions include streptavidin alkaline phosphatase, avidin alkaline phosphatase, or horseradish peroxidase. The conjugated, hybridized complex can be treated with a detection reagent. In one example, the detection reagent includes enzyme-labeled fluorescence reagents or colorimetric reagents. In one specific non-limiting example, the detection reagent is enzyme-labeled fluorescence reagent (ELF) from Molecular Probes, Inc. (Eugene, OR). The hybridized complex can then be placed on a detection device, such as an ultraviolet (UV) transilluminator (manufactured by UVP, Inc. of Upland, CA). The signal is developed and the increased signal intensity can be recorded with a recording device, such as a charge coupled device (CCD) camera (manufactured by Photometrics, Inc. of Tucson, AZ). In particular examples, these steps are not performed when radiolabels are used.
[0110] In particular examples, the method further includes quantification, for instance by determining the amount of hybridization.
Allele Specific PCR
[0111] Allele-specific PCR differentiates between target regions differing in the presence or absence of a variation or polymorphism. PCR amplification primers are chosen based upon their complementarity to the target sequence, such as the sequences provided in Table 3, within the dog genomic DNA. The primers bind only to certain alleles of the target sequence. This method is described by Gibbs, Nucleic Acid Res. 17:12427-2448, 1989.
Allele Specific Oligonucleotide Screening Methods
[0112] Further screening methods employ the allele-specific oligonucleotide (ASO) screening methods (e.g. see Saiki et al., Nature 324:163-166, 1986). Oligonucleotides with one or more base pair mismatches are generated for any particular allele. ASO screening methods detect mismatches between one allele in the target genomic or PCR amplified DNA and the other allele, showing decreased binding of the oligonucleotide relative to the second allele (i.e. the other allele) oligonucleotide. Oligonucleotide probes can be designed that under low stringency will bind to both polymorphic forms of the allele, but which at high stringency, bind to the allele to which they correspond. Alternatively, stringency conditions can be devised in which an essentially binary response is obtained, i.e., an ASO corresponding to a variant form of the target gene will hybridize to that allele, and not to the wildtype allele.
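The stringency-dependent behavior of an ASO probe described above can be sketched with a deliberately simple model in which stringency is represented as the number of mismatches a duplex tolerates. That model, the sequences, and the function names are all illustrative assumptions; real hybridization behavior depends on temperature, salt, and sequence context.

```python
def mismatches(probe: str, allele: str) -> int:
    """Count positions at which the probe differs from the allele sequence.

    For simplicity both are written on the same strand, i.e. the probe is
    given as the allele sequence it is designed to detect.
    """
    if len(probe) != len(allele):
        raise ValueError("probe and allele must be the same length")
    return sum(1 for p, a in zip(probe.upper(), allele.upper()) if p != a)


def binds(probe: str, allele: str, max_mismatches: int) -> bool:
    """Toy stringency model: binding occurs if mismatches are tolerated."""
    return mismatches(probe, allele) <= max_mismatches


# Hypothetical 20-mer alleles differing at a single central position.
ALLELE_1 = "ACGTTGCAAGCTAGGCTTAC"
ALLELE_2 = "ACGTTGCAAGTTAGGCTTAC"
PROBE_1 = ALLELE_1  # probe corresponding to allele 1

low_stringency = (binds(PROBE_1, ALLELE_1, 1), binds(PROBE_1, ALLELE_2, 1))
high_stringency = (binds(PROBE_1, ALLELE_1, 0), binds(PROBE_1, ALLELE_2, 0))
```

Under this toy model the probe binds both alleles when one mismatch is tolerated and only its corresponding allele when none is tolerated, which mirrors the essentially binary response described in the text.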
Ligase Mediated Allele Detection Method
[0113] Ligase can also be used to detect point mutations, such as the SNPs in Table 3, in a ligation amplification reaction (e.g. as described in Wu et al., Genomics 4:560-569, 1989). The ligation amplification reaction (LAR) utilizes amplification of a specific DNA sequence using sequential rounds of template-dependent ligation (e.g. as described in Wu, supra, and Barany, Proc. Nat. Acad. Sci. 88:189-193, 1990).
Denaturing Gradient Gel Electrophoresis
[0114] Amplification products generated using the polymerase chain reaction can be analyzed by the use of denaturing gradient gel electrophoresis. Different alleles can be identified based on the different sequence-dependent melting properties and electrophoretic migration of DNA in solution. DNA molecules melt in segments, termed melting domains, under conditions of increased temperature or denaturation. Each melting domain melts cooperatively at a distinct, base-specific melting temperature (Tm). Melting domains are at least 20 base pairs in length, and can be up to several hundred base pairs in length.
[0115] Differentiation between alleles based on sequence specific melting domain differences can be assessed using polyacrylamide gel electrophoresis, as described in Chapter 7 of Erlich, ed., PCR Technology, Principles and Applications for DNA Amplification, W. H. Freeman and Co., New York (1992).
[0116] Generally, a target region to be analyzed by denaturing gradient gel electrophoresis is amplified using PCR primers flanking the target region. The amplified PCR product is applied to a polyacrylamide gel with a linear denaturing gradient as described in Myers et al., Meth. Enzymol. 155:501-527, 1986, and Myers et al., in Genomic Analysis, A Practical Approach, K. Davies Ed. IRL Press Limited, Oxford, pp. 95-139, 1988. The electrophoresis system is maintained at a temperature slightly below the Tm of the melting domains of the target sequences.
[0117] In an alternative method of denaturing gradient gel electrophoresis, the target sequences can be initially attached to a stretch of GC nucleotides, termed a GC clamp, as described in Chapter 7 of Erlich, supra. In one example, at least 80% of the nucleotides in the GC clamp are either guanine or cytosine. In another example, the GC clamp is at least 30 bases long. This method is particularly suited to target sequences with high Tm's.
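A candidate clamp sequence can be checked against the two example criteria above with a short sketch. Applying both criteria together, and the function and parameter names, are assumptions made for illustration; the text presents the 80% composition and the 30-base length as separate examples.

```python
def is_gc_clamp(seq: str, min_length: int = 30, min_gc_percent: int = 80) -> bool:
    """Return True if seq is long enough and sufficiently G/C rich.

    Integer arithmetic is used for the percentage test so that a sequence
    sitting exactly on the threshold is accepted without rounding issues.
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return False
    gc_count = seq.count("G") + seq.count("C")
    return gc_count * 100 >= min_gc_percent * len(seq)


# Hypothetical clamp candidates.
all_gc = "GC" * 15                # 30 bases, 100% G/C
borderline = "GC" * 12 + "AT" * 3  # 30 bases, exactly 80% G/C
too_short = "GC" * 14              # 28 bases
```

In a primer design workflow, a sequence passing this check would be placed at the 5' end of one PCR primer.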
[0118] Generally, the target region is amplified by the polymerase chain reaction as described above. One of the oligonucleotide PCR primers carries at its 5' end, the GC clamp region, at least 30 bases of the GC rich sequence, which is incorporated into the 5' end of the target region during amplification. The resulting amplified target region is run on an electrophoresis gel under denaturing gradient conditions as described above. DNA fragments differing by a single base change will migrate through the gel to different positions, which can be visualized by ethidium bromide staining.
Temperature Gradient Gel Electrophoresis
[0119] Temperature gradient gel electrophoresis (TGGE) is based on the same underlying principles as denaturing gradient gel electrophoresis, except the denaturing gradient is produced by differences in temperature instead of differences in the concentration of a chemical denaturant. Standard TGGE utilizes an electrophoresis apparatus with a temperature gradient running along the electrophoresis path. As samples migrate through a gel with a uniform concentration of a chemical denaturant, they encounter increasing temperatures. An alternative method of TGGE, temporal temperature gradient gel electrophoresis (TTGE or tTGGE) uses a steadily increasing temperature of the entire electrophoresis gel to achieve the same result. As the samples migrate through the gel the temperature of the entire gel increases, leading the samples to encounter increasing temperature as they migrate through the gel. Preparation of samples, including PCR amplification with incorporation of a GC clamp, and visualization of products are the same as for denaturing gradient gel electrophoresis.
Single-Strand Conformation Polymorphism Analysis
[0120] Target sequences or alleles such as those provided in SEQ ID NOS: 1-250 can be differentiated using single-strand conformation polymorphism analysis, which identifies base differences by alteration in electrophoretic migration of single-stranded PCR products, for example as described in Orita et al., Proc. Nat. Acad. Sci. 85:2766-2770, 1989. Amplified PCR products can be generated as described above, and heated or otherwise denatured, to form single-stranded amplification products. Single-stranded nucleic acids can refold or form secondary structures which are partially dependent on the base sequence. Thus, electrophoretic mobility of single-stranded amplification products can detect base-sequence differences between alleles or target sequences.
Chemical or Enzymatic Cleavage of Mismatches
[0121] Differences between target sequences can also be detected by differential chemical cleavage of mismatched base pairs, for example as described in Grompe et al., Am. J. Hum. Genet. 48:212-222, 1991. In another method, differences between target sequences can be detected by enzymatic cleavage of mismatched base pairs, as described in Nelson et al., Nature Genetics 4:11-18, 1993. Briefly, genetic material from an animal and an affected family member can be used to generate mismatch free heterohybrid DNA duplexes. As used herein, "heterohybrid" means a DNA duplex strand comprising one strand of DNA from one animal, and a second DNA strand from another animal, usually an animal differing in the phenotype for the trait of interest. Positive selection for heterohybrids free of mismatches allows determination of small insertions, deletions or other polymorphisms such as those shown in Tables 1 and 3.
Non-gel Systems
[0122] Other possible techniques include non-gel systems such as TaqMan™ (Perkin Elmer). In this system oligonucleotide PCR primers are designed that flank the mutation in question and allow PCR amplification of the region. A third oligonucleotide probe is then designed to hybridize to the region containing the base subject to change between different alleles of the gene. This probe is labeled with fluorescent dyes at both the 5' and 3' ends. These dyes are chosen such that while in this proximity to each other the fluorescence of one of them is quenched by the other and cannot be detected. Extension by Taq DNA polymerase from the PCR primer positioned 5' on the template relative to the probe leads to the cleavage of the dye attached to the 5' end of the annealed probe through the 5' nuclease activity of the Taq DNA polymerase. This removes the quenching effect, allowing detection of the fluorescence from the dye at the 3' end of the probe. The discrimination between different DNA sequences arises through the fact that if the hybridization of the probe to the template molecule is not complete, i.e. there is a mismatch of some form, the cleavage of the dye does not take place. Thus only if the nucleotide sequence of the oligonucleotide probe is completely complementary to the template molecule to which it is bound will quenching be removed. A reaction mix can contain two different probe sequences, each designed against different alleles that might be present, thus allowing the detection of both alleles in one reaction.
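Where two allele-specific probes are run in one reaction as described above, the readout can be reduced to a genotype call. The following is a minimal sketch only: the single fixed threshold, the signal units, and the function name are hypothetical and are not specified by the disclosure or by any instrument vendor.

```python
def call_genotype(signal_allele_1: float, signal_allele_2: float,
                  threshold: float = 1.0) -> str:
    """Classify a sample from the fluorescence of two allele-specific probes.

    A probe is scored as detected when its signal reaches the (assumed)
    threshold, which stands in for whatever cutoff an assay would calibrate.
    """
    detected_1 = signal_allele_1 >= threshold
    detected_2 = signal_allele_2 >= threshold
    if detected_1 and detected_2:
        return "heterozygous"
    if detected_1:
        return "homozygous allele 1"
    if detected_2:
        return "homozygous allele 2"
    return "no call"
```

In practice a cutoff of this kind would be set empirically from control samples of known genotype.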
Non-PCR Based Allele detection
[0123] The identification of a DNA sequence linked to dog size can be made without an amplification step, based on polymorphisms including restriction fragment length polymorphisms in an animal and a family member. Hybridization probes are generally oligonucleotides which bind through complementary base pairing to all or part of a target nucleic acid. Probes typically bind target sequences lacking complete complementarity with the probe sequence depending on the stringency of the hybridization conditions. The probes can be labeled directly or indirectly, such that by assaying for the presence or absence of the probe, one can detect the presence or absence of the target sequence. Direct labeling methods include radioisotope labeling, such as with 32P or 35S. Indirect labeling methods include fluorescent tags, biotin complexes which can be bound to avidin or streptavidin, or peptide or protein tags. Visual detection methods include photoluminescents, Texas red, rhodamine and its derivatives, red leuco dye and 3,3',5,5'-tetramethylbenzidine (TMB), fluorescein and its derivatives, dansyl, umbelliferone and the like, or with horseradish peroxidase, alkaline phosphatase and the like.
[0124] Hybridization probes include any nucleotide sequence capable of hybridizing to dog chromosome 15 where a polymorphism is present that correlates with adult dog body size, and thus defining a genetic marker, including a restriction fragment length polymorphism, a hypervariable region, repetitive element, or a variable number tandem repeat. Hybridization probes can be any gene or a suitable analog. Further suitable hybridization probes include exon fragments or portions of cDNAs or genes known to map to the relevant region of the chromosome.
[0125] Exemplary tandem repeat hybridization probes for use in the methods disclosed are those that recognize a small number of fragments at a specific locus at high stringency hybridization conditions, or that recognize a larger number of fragments at that locus when the stringency conditions are lowered.
Primer Design Strategy
[0126] Increased use of polymerase chain reaction (PCR) methods has stimulated the development of many programs to aid in the design or selection of oligonucleotides used as primers for PCR. Four examples of such programs that are freely available via the Internet are: PRIMER by Mark Daly and Steve Lincoln of the Whitehead Institute (UNIX, VMS, DOS, and Macintosh), Oligonucleotide Selection Program (OSP) by Phil Green and LaDeana Hiller of Washington University in St. Louis (UNIX, VMS, DOS, and Macintosh), PGEN by Yoshi (DOS only), and Amplify by Bill Engels of the University of Wisconsin (Macintosh only). Generally these programs help in the design of PCR primers by searching for bits of known repeated-sequence elements and then optimizing the Tm by analyzing the length and GC content of a putative primer. Commercial software is also available and primer selection procedures are rapidly being included in most general sequence analysis packages.
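The dependence of Tm on primer length and GC content can be illustrated with the widely used Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) in °C, which is generally applied only to short oligonucleotides. This sketch is not the algorithm of any of the programs named above; the function names are illustrative.

```python
def gc_content(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 100.0 * gc / len(primer)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in °C: 2 per A/T, 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


# Hypothetical 12-mer with six A/T and six G/C bases.
EXAMPLE_PRIMER = "ATGCATGCATGC"
```

For the hypothetical 12-mer, the estimate is 2 × 6 + 4 × 6 = 36 °C at 50% GC; lengthening the primer or raising its GC content raises the estimate.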
[0127] Designing oligonucleotides for use as either sequencing or PCR primers to detect a target requires selection of an appropriate sequence that specifically recognizes the target, and then testing the sequence to eliminate the possibility that the oligonucleotide will have a stable secondary structure. Inverted repeats in the sequence can be identified using a repeat-identification or RNA-folding program. If a possible stem structure is observed, the sequence of the primer can be shifted a few nucleotides in either direction to minimize the predicted secondary structure. When the amplified sequence is intended for subsequent cloning, the sequence of the oligonucleotide should also be compared with the sequences of both strands of the appropriate vector and insert DNA. A sequencing primer should have only a single match to the target DNA. It is also advisable to exclude primers that have only a single mismatch with an undesired target DNA sequence. For PCR primers used to amplify genomic DNA, the primer sequence should be compared to the sequences in the GenBank database to determine if any significant matches occur. If the oligonucleotide sequence is present in any known DNA sequence or, more importantly, in any known repetitive elements, the primer sequence should be changed.
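The inverted-repeat screen described above can be approximated by a naive search for a short segment whose reverse complement occurs further along the same primer. This is a simplified sketch rather than a folding program; the default stem and loop lengths and the function names are illustrative assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int = 4, min_loop: int = 3):
    """Return (left_start, right_start) index pairs that could form a stem.

    A hit means seq[left_start:left_start+stem] can pair with
    seq[right_start:right_start+stem], leaving at least min_loop
    unpaired bases between the two arms.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        partner = reverse_complement(seq[i:i + stem])
        j = seq.find(partner, i + stem + min_loop)
        while j != -1:
            hits.append((i, j))
            j = seq.find(partner, j + 1)
    return hits


# Hypothetical primer with a 4-base stem (GGGC ... GCCC) around an AAAA loop.
HAIRPIN_PRONE = "GGGCAAAAGCCC"
```

A primer that returns one or more hits is a candidate for shifting a few nucleotides in either direction, as described above.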
Amplification of nucleic acid molecules
[0128] Optionally, the nucleic acid samples obtained from the subject are amplified prior to detection. Target nucleic acids, including sequences from one or more markers enumerated in Table 1, Table 3, and/or at Canfaml reference positions 44,228,010 and 44,283,669, can be amplified from the sample prior to detection to obtain amplification products. Typically, DNA sequences are amplified, although in some instances RNA sequences can be amplified or converted into cDNA, using RT-PCR for example.
[0129] Any nucleic acid amplification method can be used. An example of in vitro amplification is the polymerase chain reaction (PCR), in which a biological sample obtained from a subject is contacted with a pair of oligonucleotide primers, under conditions that allow for hybridization of the primers to a nucleic acid molecule in the sample. The primers are extended under suitable conditions, dissociated from the template, and then re-annealed, extended, and dissociated to amplify the number of copies of the nucleic acid molecule. Other examples of in vitro amplification techniques include quantitative real-time PCR, strand displacement amplification (see USPN 5,744,311); transcription-free isothermal amplification (see USPN 6,033,881); repair chain reaction amplification (see WO 90/01069); ligase chain reaction amplification (see EP-A-320 308); gap filling ligase chain reaction amplification (see USPN 5,427,930); coupled ligase detection and PCR (see USPN 6,027,889); and NASBA™ RNA transcription-free amplification (see USPN 6,025,134).
[0130] In specific examples, the target sequences to be amplified from the subject include one or more SNP markers shown in Table 1 and/or alternative markers such as those at Canfaml reference positions 44,228,010 and 44,283,669. In certain embodiments, target sequences containing one or more of SEQ ID NOs: 1-20, or a subset thereof, such as SEQ ID NOs: 1, 3, 4, 5, 9, and 16 are amplified. In an embodiment, a single marker with exceptionally high predictive value is amplified, such as a marker at Canfaml reference positions 44,228,010 or 44,228,468.
[0131] A pair of primers can be utilized in the amplification reaction. One or both of the primers can be labeled, for example with a detectable radiolabel, fluorophore, or biotin molecule. The pair of primers includes an upstream primer (which binds 5' to the downstream primer) and a downstream primer (which binds 3' to the upstream primer). The pair of primers used in the amplification reaction are selective primers which permit amplification of a size related marker locus. Primers can be selected to amplify a nucleic acid molecule listed in Table 1, Table 3, or positions 44,228,010 and 44,283,669. Exemplary primers are shown in Table 2. Numerous alternative primers can be designed by those of skill in the art simply by determining the sequence of the desired target region, for example, using well known computer assisted algorithms that select primers within desired parameters suitable for annealing and amplification.
[0132] If desired, an additional pair of primers can be included in the amplification reaction as an internal control. For example, these primers can be used to amplify a "housekeeping" nucleic acid molecule, and serve to provide confirmation of appropriate amplification. In another example, a target nucleic acid molecule including primer hybridization sites can be constructed and included in the amplification reaction. One of skill in the art will readily be able to identify primer pairs to serve as internal control primers.
Arrays for detecting nucleic acid
[0133] In particular examples involving genotyping of multiple marker loci, the methods can be performed using an array that includes a plurality of markers. Such arrays can include nucleic acid molecules. In one example, the array includes nucleic acid oligonucleotide probes that can hybridize to one or more alleles of a marker within the chromosome 15 interval between 34 and 40 Mb, such as any of the marker loci disclosed herein and/or their equivalents. In certain examples the arrays are designed to detect alleles at some or all of the twenty SNP markers of Table 1, e.g. alternative alleles represented by SEQ ID NOs: 1, 3, 4, 5, 9, and 16. Optionally, additional markers not selected from Table 1 are assessed using the array, for example, markers at Canfaml reference position 44,228,010 and/or at position 44,283,669. Certain of such arrays (as well as the methods described herein) can include additional molecules that are related to size in dogs, as well as other sequences, such as one or more probes that recognize one or more housekeeping genes.
[0134] Arrays can be used to detect the presence of amplified sequences corresponding to marker loci related to size in dogs using specific oligonucleotide probes. In one example, a set of oligonucleotide probes is attached to the surface of a solid support for use in detection of marker alleles that define haplotypes that are predictive of adult body size in dogs, such as amplified nucleic acid sequences obtained from the subject. Additionally, if an internal control nucleic acid sequence was amplified in the amplification reaction (see above), an oligonucleotide probe can be included to detect the presence of this amplified nucleic acid molecule. The oligonucleotide probes bound to the array can specifically bind sequences amplified in the amplification reaction (such as under high stringency conditions).
[0135] The methods and apparatus in accordance with the present disclosure take advantage of the fact that under appropriate conditions oligonucleotides form base-paired duplexes with nucleic acid molecules that have a complementary base sequence. The stability of the duplex is dependent on a number of factors, including the length of the oligonucleotides, the base composition, and the composition of the solution in which hybridization is effected. The effects of base composition on duplex stability can be reduced by carrying out the hybridization in particular solutions, for example in the presence of high concentrations of tertiary or quaternary amines.
[0136] The thermal stability of the duplex is also dependent on the degree of sequence similarity between the sequences. By carrying out the hybridization at temperatures close to the anticipated Tm's of the type of duplexes expected to be formed between the target sequences and the oligonucleotides bound to the array, the rate of formation of mis-matched duplexes can be substantially reduced.
[0137] The length of each oligonucleotide sequence employed in the array can be selected to optimize binding to a specific allele of a marker locus associated with size in dogs. An optimum length for use with a particular marker nucleic acid sequence under specific screening conditions can be determined empirically. Thus, the length for each individual element of the set of oligonucleotide sequences included in the array can be optimized for screening. In one example, oligonucleotide probes are from about 20 to about 35 nucleotides in length or about 25 to about 40 nucleotides in length.
[0138] The oligonucleotide probe sequences forming the array can be directly linked to the support, for example via the 5'- or 3'-end of the probe. In one example, the oligonucleotides are bound to the solid support by the 5' end. However, one of skill in the art can determine whether the use of the 3' end or the 5' end of the oligonucleotide is suitable for bonding to the solid support. In general, the internal complementarity of an oligonucleotide probe in the region of the 3' end and the 5' end determines binding to the support. Alternatively, the oligonucleotide probes can be attached to the support by sequences such as oligonucleotides or other molecules that serve as spacers or linkers to the solid support.
[0139] In particular examples, the array is a microarray formed from glass (silicon dioxide). Suitable silicon dioxide types for the solid support include, but are not limited to: aluminosilicate, borosilicate, silica, soda lime, zinc titania and fused silica (for example see Schena, Microarray Analysis. John Wiley & Sons, Inc, Hoboken, New Jersey, 2003). The attachment of nucleic acids to the surface of the glass can be achieved by methods known in the art, for example by surface treatments that form from an organic polymer. Particular examples include, but are not limited to: polypropylene, polyethylene, polybutylene, polyisobutylene, polybutadiene, polyisoprene, polyvinylpyrrolidine, polytetrafluoroethylene, polyvinylidene difluoride, polyfluoroethylene-propylene, polyethylenevinyl alcohol, polymethylpentene, polychlorotrifluoroethylene, polysulfones, hydroxylated biaxially oriented polypropylene, aminated biaxially oriented polypropylene, thiolated biaxially oriented polypropylene, ethylene acrylic acid, ethylene methacrylic acid, and blends of copolymers thereof (see U.S. Patent No. 5,985,567), organosilane compounds that provide chemically active amine or aldehyde groups, epoxy or polylysine treatment of the microarray. Another example of a solid support surface is polypropylene.
[0140] In general, suitable characteristics of the material that can be used to form the solid support surface include: being amenable to surface activation such that upon activation, the surface of the support is capable of covalently attaching a biomolecule such as an oligonucleotide thereto; amenability to "in situ" synthesis of biomolecules; being chemically inert such that at the areas on the support not occupied by the oligonucleotides are not amenable to non-specific binding, or when non-specific binding occurs, such materials can be readily removed from the surface without removing the oligonucleotides.
[0141] In one example, the surface treatment is amine-containing silane derivatives. Attachment of nucleic acids to an amine surface occurs via interactions between negatively charged phosphate groups on the DNA backbone and positively charged amino groups (Schena, Microarray Analysis. John Wiley & Sons, Inc, Hoboken, New Jersey, 2003). In another example, reactive aldehyde groups are used as surface treatment. Attachment to the aldehyde surface is achieved by the addition of a 5'-amine group or amino linker to the DNA of interest. Binding occurs when the nonbonding electron pair on the amine linker acts as a nucleophile that attacks the electropositive carbon atom of the aldehyde group (Id.).
[0142] A wide variety of array formats can be employed in accordance with the present disclosure. One example includes a linear array of oligonucleotide bands, generally referred to in the art as a dipstick. Another suitable format includes a two-dimensional pattern of discrete cells (such as 4096 squares in a 64 by 64 array). As is appreciated by those skilled in the art, other array formats including, but not limited to slot (rectangular) and circular arrays are equally suitable for use (see U.S. Patent No. 5,981,185). In one example, the array is formed on a polymer medium, which is a thread, membrane or film. An example of an organic polymer medium is a polypropylene sheet having a thickness on the order of about 1 mil. (0.001 inch) to about 20 mil., although the thickness of the film is not critical and can be varied over a fairly broad range. Biaxially oriented polypropylene (BOPP) films are also suitable in this regard; in addition to their durability, BOPP films exhibit a low background fluorescence. In a particular example, the array is a solid phase, Allele-Specific Oligonucleotides (ASO) based nucleic acid array.
[0143] The array formats of the present disclosure can be included in a variety of different types of formats. A "format" includes any format to which the solid support can be affixed, such as microtiter plates, test tubes, inorganic sheets, dipsticks, and the like. For example, when the solid support is a polypropylene thread, one or more polypropylene threads can be affixed to a plastic dipstick-type device; polypropylene membranes can be affixed to glass slides. The particular format is, in and of itself, unimportant. All that is necessary is that the solid support can be affixed thereto without affecting the functional behavior of the solid support or any biopolymer absorbed thereon, and that the format (such as the dipstick or slide) is stable to any materials into which the device is introduced (such as clinical samples and hybridization solutions).
[0144] The arrays of the present disclosure can be prepared by a variety of approaches. In one example, oligonucleotide or protein sequences are synthesized separately and then attached to a solid support (see U.S. Patent No. 6,013,789). In another example, sequences are synthesized directly onto the support to provide the desired array (see U.S. Patent No. 5,554,501). Suitable methods for covalently coupling oligonucleotides and proteins to a solid support and for directly synthesizing the oligonucleotides or proteins onto the support are known to those working in the field; a summary of suitable methods can be found in Matson et al., Anal. Biochem. 217:306-10, 1994. In one example, the oligonucleotides are synthesized onto the support using conventional chemical techniques for preparing oligonucleotides on solid supports (see, for example, PCT applications WO 85/01051 and WO 89/10977, or U.S. Patent No. 5,554,501).
[0145] A suitable array can be produced using automated means to synthesize oligonucleotides in the cells of the array by laying down the precursors for the four bases in a predetermined pattern. Briefly, a multiple-channel automated chemical delivery system is employed to create oligonucleotide probe populations in parallel rows (corresponding in number to the number of channels in the delivery system) across the substrate. Following completion of oligonucleotide synthesis in a first direction, the substrate can then be rotated by 90° to permit synthesis to proceed within a second (2°) set of rows that are now perpendicular to the first set. This process creates a multiple-channel array whose intersection generates a plurality of discrete cells.
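By way of non-limiting illustration, the two-pass layout described above can be sketched as follows. The function name `cell_grid` and the choice of 64 channels (which reproduces the 4096-cell, 64 by 64 example mentioned earlier) are assumptions made for illustration only.

```python
from itertools import product

def cell_grid(n_channels):
    """Return the (first_pass_row, second_pass_row) cells formed when
    n_channels parallel rows are synthesized, the substrate is rotated
    by 90 degrees, and n_channels perpendicular rows are synthesized."""
    first_pass = range(n_channels)   # rows laid down in the first direction
    second_pass = range(n_channels)  # rows laid down after the 90 degree rotation
    # Each intersection of a first-pass row with a second-pass row is one cell.
    return list(product(first_pass, second_pass))

cells = cell_grid(64)  # hypothetical 64-channel delivery system
print(len(cells))
```

Each cell is identified by the pair of rows that intersect there, so the probe content of a cell is determined by what was delivered in each of the two passes.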
[0146] In particular examples, the oligonucleotide probes on the array include one or more labels that permit detection of oligonucleotide probe:target sequence hybridization complexes.
Kits
[0147] The present disclosure provides for kits that can be used to predict body size of a subject dog (such as an immature dog (puppy) that is less than about one year old) or predict the size of the dog's offspring. Such kits allow one to determine the allele of one or more markers on chromosome 15 that identify a haplotype associated with a QTL that contributes to body size. Non-exclusive examples of such markers are provided in Table 1 (and in Table 3).
[0148] The disclosed kits can include a binding molecule, such as an oligonucleotide probe that selectively hybridizes to an allele of a marker associated with size in dogs. In one example, the kit includes the isolated oligonucleotide probes shown in Table 1 or a subset thereof (such as either or both alternative forms of SEQ ID NOs: 1, 3, 4, 5, 9 and 16). Alternatively or additionally, the kits can include one or more isolated primers or primer pairs for amplifying a target nucleic acid including the marker. The location of exemplary primers is provided in Table 2 and one of skill in the art can determine appropriate primer sequences at the locations provided using the CanFam1 sequence. For example, a probe or primer including the full length of any one of the markers (for example, a probe corresponding or complementary to the sequences of Table 1 or Table 3 or a primer corresponding to a sequence of Table 2) can be used as can fragments, such as fragments of at least 12 contiguous nucleotides, or 13 contiguous nucleotides, or 14 contiguous nucleotides, or 15 contiguous nucleotides, or more of any of these sequences.
[0149] The kit can further include one or more of a buffer solution, a conjugating solution for developing the signal of interest, or a detection reagent for detecting the signal of interest, each in separate packaging, such as a container. In another example, the kit includes a plurality of size-associated marker target nucleic acid sequences for hybridization with a detection array. The target nucleic acid sequences can include oligonucleotides such as DNA, RNA, and peptide-nucleic acid, or can include PCR fragments.
EXAMPLES
Example 1: Identification of polymorphic markers associated with size in dogs
[0150] This Example describes the identification of amplicons that can be used to identify markers that are useful in methods of predicting the adult body size of a dog.
[0151] SNPs and insertion/deletion polymorphisms were discovered (see, Example 2) by sequencing PCR amplicons from dog genomic DNA. Sequencing reactions were bi-directional from exonuclease/shrimp alkaline phosphatase cleaned PCR amplicons by standard methods. SNP genotyping utilized the SNPlex platform (Applied Biosystems) following the manufacturer's protocol. A total of 338 amplicons spanning this interval were sequenced partly in four large and four small Portuguese water dogs and partly in nine dogs from small (<9 kg) and giant (>30 kg) breeds.
[0152] The following amplicons, listed in Table 2, were sequenced for marker discovery. Start and end positions are given for the CanFam1 assembly.
Table 2: Amplicons sequenced for marker discovery
[Table 2 appears only as page images in the source: Figure imgf000045_0001 through Figure imgf000052_0001.]
[0153] In 170 kb of sequence, 248 SNPs, simple repeat length polymorphisms, and insertion/deletion polymorphisms were validated, as shown in Table 3.
[Table 3 appears only as page images in the source: Figure imgf000052_0002 through Figure imgf000057_0001.]
For insertion/deletion variations, alleles are given as e.g. 'NNN/***' where asterisks indicate the number of bases deleted (or not inserted) at the position. SNPs shown in Table 1 are shown in bold type.
Example 2: Mapping Genetic Factor for size in Portuguese water dogs
[0154] This Example describes the markers that are useful for predicting the body size of Portuguese water dogs.
[0155] One hundred twenty two of the SNP markers dispersed throughout the 34-49 Mb interval of chromosome 15 (CanFam1) were genotyped in 463 Portuguese water dogs, all from a known pedigree and having quantitative body size measurements based on 43 metrics from radiographs covering most of the skeleton (Chase et al., Proc. Natl. Acad. Sci. USA, 99:9930-9935, 2002). A mixed model was applied for fine mapping within the Portuguese water dog population since the shared ancestry within the breed could lead to spurious associations. To reduce the effect of this cryptic relatedness between dogs, the mixed model analysis of Yu et al. (Nat. Genet., 38:203-208, 2006) was applied using:
Y = Xα + Zu + e
[0156] where Y is the vector of the skeletal size trait; α is a vector of fixed effects, here the SNP effect being tested; u is a vector of random effects reflecting the polygenic background; e is the vector of residual errors; and X and Z are known incidence matrices relating the observations to fixed and random effects, respectively. The essential idea is that relatedness is incorporated into the model. The variance in the model can be expressed as:
Figure imgf000058_0001
[0157] where K is the consanguinity matrix estimated from the known pedigree, which reflects the genetic background correlations between individuals.
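The variance expression itself survives in this text only as an image placeholder. As a hedged aid to the reader, the mixed model of Yu et al. is commonly written with the variance components shown below. The symbols V_g (genetic variance), V_R (residual variance) and R (a residual structure matrix, often the identity) are assumptions drawn from that general formulation and are not a transcription of the figure.

```latex
\operatorname{Var}(u) = 2K\,V_g, \qquad
\operatorname{Var}(e) = R\,V_R, \qquad
\operatorname{Var}(Y) = Z\,(2K\,V_g)\,Z^{\top} + R\,V_R
```

In this form, the off-diagonal entries of K allow related dogs to share part of their polygenic effect, which is how the model absorbs cryptic relatedness rather than attributing it to the tested SNP.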
[0158] Marker association with skeletal size was measured, and a single peak was identified within 300 kb of the insulin-like growth factor-1 gene (IGF1) as shown in FIG. 1. Skeletal size is associated with a genetic interval containing IGF1 in Portuguese water dogs. A mixed model test for association between size and genotype as three categories (A1A1, A1A2, A2A2) was calculated using all pairwise coefficients of consanguinity for 376 dogs with skeletal size measurements. As shown in FIG. 1A, each filled circle plots a single SNP's position on canine chromosome 15 and negative log P-value for the association statistic. FIG. 1B illustrates mean skeletal size in Portuguese water dog population samples carrying different IGF1 haplotypes. Haplotypes were inferred for 20 markers (see bolded markers in Table 3) spanning the IGF1 gene (cfa15:44,212,792-44,278,140, CanFam1). Out of the 720 chromosomes with successful inference, 96% carry one of just two haplotypes, "B" and "I", identical to haplotypes inferred for small and giant dogs (see FIG. 4). Data are plotted as a cumulative distribution for each genotype: B/B, B/I, and I/I.
[0159] Haplotype analysis of 20 SNPs surrounding IGF1 in the full panel of 463 Portuguese water dogs further supports a role for the locus in determining body size. It was observed that 889 of the 926 (96%) Portuguese water dog (PWD) chromosomes carry one of just two haplotypes, termed 'B' and 'I'. Portuguese water dogs homozygous for haplotype 'B' have the smallest median skeletal size (FIG. 1B) while dogs homozygous for 'I' are largest (P-value < 3.27 x 10^-7, ANOVA). IGF1 haplotype content explains 15% of the variance in skeletal size within the Portuguese water dog sample. Further, homozygotes for the 'B' allele have a lower level of IGF1 protein in blood serum than either heterozygotes or homozygotes for haplotype 'I' (FIG. 1C; P-value < 9.34 x 10^-4, ANOVA). These results support simple Mendelian inheritance with the 'I' allele acting in a partially dominant fashion with the 'B' allele.
Example 3: Causal relationship between IGF1 and body size in Portuguese Water Dogs
[0160] This Example shows a correlation between circulating IGF1 concentrations and dog body size.
[0161] To determine the causal relationship between IGF1 and body size in dogs, the serum levels of IGF1 protein (ng/ml) were measured using a standard ELISA in 31 Portuguese water dogs carrying haplotypes 'B' and 'I'. As shown in FIG. 1C, dogs carrying the I haplotype (heterozygous or homozygous) have higher serum IGF1 levels than dogs carrying only the B haplotype.
Example 4: Identification of haplotypes associated with small and large body size in dogs
[0162] This Example describes the identification of the genomic region containing the genes responsible for dog size using 122 SNPs in 43 different breeds of dogs.
[0163] To support a more general role of IGF1 in influencing size differences among dog breeds, genetic variation in the same 122 SNPs described in Example 2 was surveyed across 526 dogs representing 23 small (<9 kg) breeds and 20 giant (>23 kg) breeds, as well as the golden jackal (Canis aureus, a wild relative that diverged from dogs 1-2 million years ago and was used to obtain ancestral marker states). As a genomic control, variation in 92 SNPs was also surveyed on canine chromosomes 1, 2, 3, 34, and 37. The SNPs were assayed using SNPlex (Applied Biosystems, Foster City, California) according to the manufacturer's instructions.
[0164] When testing for association across structured populations such as dog breeds, there is a large inflation of nominal p-values in, e.g., Fisher's exact test, which is caused by the relatedness between samples within populations. Because dogs from different breeds are only very distantly related, a reasonable strategy is to only remove cryptic relatedness within breeds by collapsing the information obtained from dogs within the same breed into an allele frequency distribution. For each breed, the relative frequency of the minor allele at a marker was calculated and then a Mann-Whitney U test was conducted comparing the frequency in small dog breeds with the frequency in giant dog breeds. The test rejects the null hypothesis of no association if there is a large difference in the median allele frequency across small breeds as compared to the median frequency in large breeds.
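By way of non-limiting illustration, the breed-level comparison described above can be sketched as follows: each breed is collapsed to a single minor allele frequency, and the two sets of frequencies are compared with a Mann-Whitney U test. The function names, the normal approximation for the p-value (with average ranks for ties and no tie or continuity correction), and the example frequencies are assumptions for illustration and are not taken from the study.

```python
import math

def minor_allele_frequency(alleles, minor):
    """Relative frequency of the minor allele among one breed's chromosomes."""
    return sum(1 for a in alleles if a == minor) / len(alleles)

def mann_whitney_u(small, giant):
    """Return (U, two-sided p) using a normal approximation."""
    n1, n2 = len(small), len(giant)
    pooled = sorted([(v, 0) for v in small] + [(v, 1) for v in giant])
    rank_sum_small = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        n_small_tied = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_small += avg_rank * n_small_tied
        i = j
    u1 = rank_sum_small - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u, p

# Hypothetical per-breed frequencies (one value per breed).
small_breeds = [0.95, 0.90, 1.00, 0.85]
giant_breeds = [0.05, 0.10, 0.00, 0.20]
u_stat, p_value = mann_whitney_u(small_breeds, giant_breeds)
print(u_stat, p_value)
```

Because each breed contributes one value, the effective sample size is the number of breeds rather than the number of dogs, which is what makes the test conservative.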
[0165] FIG. 2 summarizes association mapping statistics, haplotype variation, marker heterozygosity, and population differentiation among small and giant dogs across these regions. FIG. 2 provides evidence for IGF1 as a determinant of body size in dogs and signatures of recent selection on the locus across breeds. The dashed line indicates Bonferroni correction for multiple tests. FIG. 2A shows the Mann-Whitney U p-values for SNPs on chromosome 15 and 5 control chromosomes. FIG. 2 also shows the haplotype sharing among 952 dogs from 22 breeds: the longest region containing SNP 5 (chr15:44,228,468; a polymorphism shared by nearly all small breeds but nearly absent in large dogs) is displayed in both 5' and 3' directions until a recombination breakpoint is inferred. FIG. 2B & 2C show the heterozygosity ratio (small vs. large dogs) and genetic differentiation (Fst) for a sliding 10 SNP window across IGF1. Dashed lines delimit the 95% CIs based on non-parametric bootstrap resampling. FIG. 2D shows a graph depicting that small breeds (<9 kg) have a reduction in observed heterozygosity compared with giant breeds (>30 kg). The dashed lines are LOWESS best fit to the data. The IGF1 gene interval is indicated.
[0166] The association between a given SNP and average breed size was tested using a very conservative approach that compares the distribution of marker frequencies among small and giant breeds using a Mann-Whitney U test for breeds with at least ten chromosomes represented in the sample (14 small and 9 giant breeds). This test is very conservative in the sense that one is reducing the sample size of the test to the surveyed number of breeds (n = 23). After correction for multiple tests, 25 SNPs were identified in an interval of 84.3 kb (chr15:44,199,850-44,284,186) centered on IGF1, rejecting the null hypothesis of no association between body size and marker frequency at the 5% level. As shown in FIG. 2A, none of the SNPs outside this interval of chromosome 15, or on any of the other chromosomes, approached the criteria for rejecting the null hypothesis after correcting for multiple tests (i.e., p-values lie below the dashed line on FIG. 2A).
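By way of non-limiting illustration, a Bonferroni threshold of the kind drawn as the dashed line can be computed as sketched below. The number of tests used here (122 chromosome 15 SNPs plus 92 control SNPs) is an assumption; the text does not state which count was used for the correction.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests

n_tests = 122 + 92  # assumed: all chromosome 15 SNPs plus all control SNPs
threshold = bonferroni_threshold(0.05, n_tests)
# On a plot of -log10(p), SNPs above this height pass the correction.
line_height = -math.log10(threshold)
print(threshold, line_height)
```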
[0167] The same calculation was performed using Fisher's exact test with two categories (small and giant body size), and P-values smaller than 10^-100 were obtained within IGF1. FIG. 3 illustrates Fisher's exact test p-values for tests of association between individual SNPs and body size (small vs. big) for 122 SNPs on chromosome 15 and 92 SNPs on five control chromosomes. P-values as small as 10^-15 were also obtained at the 92 genomic control markers surveyed on five other chromosomes (FIG. 3). Since the Mann-Whitney U nominal P-values at genome control markers are not inflated, it is likely that population structure within dog breeds is essentially the entire source of such inflation. This illustrates the need for caution in interpretation of nominal P-values obtained from structured populations, especially in the absence of an empirical genome distribution of P-values.
[0168] In FIG. 2B, the longest unrecombined region containing this SNP was plotted for each dog chromosome in the sample of 23 breeds with at least 5 dogs per breed (476 dogs in total). Haplotypes carrying the A allele at this position are shown on the left, and haplotypes carrying the G allele are shown on the right. Several remarkable features are noted. First, the A allele is vastly more common in small dog breeds than the G allele, with 96% of the small dog chromosomes carrying the A allele and 92% of the large dog chromosomes carrying the G allele. Second, haplotypes, regardless of whether they occur in big or small dogs, are on average much longer on the A background (median length = 51.4 kb) than on the G background (median = 9.9 kb; p < 1e-16, Mann-Whitney U test). This is consistent with the observation that the A allele is the derived state, suggesting a more recent origin for all haplotypes carrying this mutation. The extent of haplotype sharing that differentiates small from large dogs is just a 1,784 bp interval that can extend on the 3' end by no more than 360 bp (to chr15:44,226,324) and on the 5' end by no more than 6.630 kb (to chr15:44,235,098) for a maximum length of 8.7 kb.
[0169] Interestingly, there is a dramatic reduction in marker heterozygosity and increased genetic differentiation among small and large dogs (as measured by FST) in precisely this interval, signatures consistent with recent selection on the locus. Specifically, a sliding window of average heterozygosity in small dogs as compared to large dogs (Figure 2C) shows a more than 4-fold reduction around the non-recombined interval shared by the vast majority of small dogs. Outside of this interval, the statistic fluctuates near 1. Likewise, if one treats "small dogs" as a population and compares to "giant dogs" as a population, there is a highly significant spike in sliding window FST (a common population genetic measure of differentiation) centered on and constrained to IGF1. Lastly, a detailed analysis of nucleotide variation in a 300 kb interval centered on the haplotype sharing positions demonstrates a dramatic reduction in the average observed heterozygosity among small breeds (but not giant breeds) as inferred from a LOWESS regression (FIG. 2E). Taken together, these results show that a narrowly defined genomic region holds the mutation(s) responsible for small size in a panel of disparate small breeds. Further, it demonstrates the utility of selective sweep mapping for identifying regions under selection in vertebrate genomes.
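By way of non-limiting illustration, a sliding-window heterozygosity ratio of the kind described above can be sketched as follows. This sketch substitutes expected heterozygosity, 2p(1-p), computed from allele frequencies for the observed heterozygosity used in the text; that substitution, the function names, and the example frequencies are assumptions for illustration.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic marker with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def heterozygosity_ratio(small_freqs, large_freqs, window=10):
    """Ratio of mean heterozygosity (small / large) in each sliding window.

    small_freqs and large_freqs hold one allele frequency per SNP, in
    chromosomal order, for the small-dog and large-dog populations."""
    ratios = []
    for start in range(len(small_freqs) - window + 1):
        stop = start + window
        h_small = sum(expected_heterozygosity(p) for p in small_freqs[start:stop]) / window
        h_large = sum(expected_heterozygosity(p) for p in large_freqs[start:stop]) / window
        ratios.append(h_small / h_large if h_large > 0 else float("nan"))
    return ratios

# Hypothetical frequencies: the middle SNPs are nearly fixed in small dogs.
small = [0.5, 0.4, 0.02, 0.01, 0.03, 0.5]
large = [0.5, 0.5, 0.50, 0.40, 0.50, 0.5]
print(heterozygosity_ratio(small, large, window=3))
```

A ratio near 1 indicates similar diversity in the two groups, while a ratio well below 1 marks a window where small dogs have lost diversity, as would be expected around a swept haplotype.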
Example 5: Inference of size in dogs based on a 20 SNP haplotype
[0170] This Example shows that dog size can be predicted using one or more SNPs of the 20 SNPs shown in Table 1.
[0171] To further assess the association of SNP haplotypes and body size, haplotypes were inferred independently for samples from each small and giant breed. Haplotypes for the 20 markers (alternative alleles of these markers are represented by SEQ ID NOs: 1-20, shown in Table 1) spanning the small breed sweep interval over IGFl were inferred independently in each breed. For each individual, fractional chromosome counts were summed for all haplotypes with at least 5% probability according to PHASE. Chromosome sums for each breed were rounded to integer values; several breeds have odd numbers of chromosomes due to round off error.
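By way of non-limiting illustration, the counting step described above can be sketched as follows. The input layout (for each dog, a list of candidate haplotype pairs with a probability for each) is an assumed representation of phasing output and is not the actual file format of PHASE; the function name and example values are likewise hypothetical.

```python
from collections import defaultdict

def breed_haplotype_counts(phase_calls, min_prob=0.05):
    """Sum fractional chromosome counts per haplotype for one breed.

    phase_calls: one entry per dog, each a list of
    ((haplotype_1, haplotype_2), probability) candidates."""
    totals = defaultdict(float)
    for candidates in phase_calls:
        for (hap1, hap2), prob in candidates:
            if prob >= min_prob:      # ignore low-probability reconstructions
                totals[hap1] += prob  # each pair contributes to both chromosomes
                totals[hap2] += prob
    # Round to integers; totals across haplotypes can come out odd.
    return {hap: round(total) for hap, total in totals.items()}

dogs = [
    [(("B", "B"), 0.97), (("B", "C"), 0.03)],
    [(("B", "I"), 0.60), (("I", "I"), 0.40)],
]
print(breed_haplotype_counts(dogs))
```

Rounding each haplotype total separately is what allows a breed to end up with an odd number of chromosomes, as noted in the text.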
[0172] A striking association was observed between body size and haplotype, centered in the genomic region containing IGF1. Small and giant dog breeds carry different IGF1 haplotypes. Across a 70 kb segment that spans the exons and introns of IGF1 there is a 20-SNP shared haplotype (markers of Table 1) in small dog breeds (FIG. 4). Only inferred haplotypes carried by at least three dog chromosomes total (i.e. >0.5% frequency overall) are shown in FIG. 4. The haplotypes (left) are rows labeled A-L and marker alleles are colored yellow for ancestral state (matching the nucleotide observed in >10 Canis aureus samples) and blue for derived state. SNP positions within IGF1 are shown at the top left with a mapping to IGF1 introns (horizontal line) and exons (vertical bars). Breed size is given below the name as the average weight in kg of adult males. Small breeds less than 9 kg and giant breeds greater than 31 kg are grouped for totals shown at the far right.
[0173] Haplotype 'B' found in small Portuguese water dogs is the most common haplotype in every one of the 14 small (<9 kg) breeds analyzed. This haplotype was observed in only three of the nine giant (>30 kg) breeds and only 5% of the chromosomes carrying haplotype 'B' are from giant dogs. Most giant dogs carry one or both of two distinct haplotypes: 'F' and 'I'. The former was not observed in any small breed. The two giant dog haplotypes are highly divergent from the small haplotype, 'B', differing from it at 11 and 16 positions, respectively, and from each other at 11 of 20 positions. Haplotype 'C', carried exclusively by small dogs, points to an 8 kb interval likely to contain the causal mutation. This haplotype shows evidence of recombination between SNPs 5 and 6, with SNPs 6-20 of haplotype C showing similarity to the large dog haplotypes.
Example 6: Inference of size based on a six tag SNP haplotype
[0174] This Example describes the detection of 6 SNPs and the use of the haplotype generated to predict adult dog body size.
[0175] The distribution of IGF1 haplotypes across dog populations was further characterized to assess the correlation between haplotype frequency and body size within breeds. Six tagging SNPs were selected that together discriminate all major IGF1 haplotypes: CanFam1 reference positions: 44,212,792; 44,226,324; 44,226,684; 44,228,468; 44,237,388; and 44,260,949 (alternative alleles are provided in SEQ ID NOs: 1, 3, 4, 5, 9, and 16). These were evaluated in 3,241 dogs from 143 breeds. 90% of these samples do not overlap with the dogs analyzed for FIGS. 2 and 4.
[0176] The six tag SNPs that identify 14 haplotypes spanning the IGF1 gene are shown in the table of FIG. 6. Haplotypes are shown vertically with genome position in the CanFam1 assembly to the left. These haplotypes match the more highly resolved haplotypes from FIG. 3 as indicated by letters along the top. Most haplotypes from FIG. 3 are resolved but a few are not, e.g. the second haplotype here is consistent with both FIG. 3 haplotypes 'A' and 'B'. Marker ancestral states are inferred from jackal genotyping and are colored white. Derived alleles are indicated with bold letters.
Example 7: Identification of single SNP useful for predicting adult dog size
[0177] This Example describes the characterization of a SNP that can be used to predict adult dog body size.
[0178] Across a myriad of different association test statistics employed (including the Mann-Whitney U test), a SNP (alternative alleles are provided in SEQ ID NO:5) in the center of the 84.3 kb interval at position 44,228,468 stands out as showing a particularly strong association with average breed weight. As such, the SNP represented by SEQ ID NO:5 constitutes a single best predictor of body size in dogs. FIG. 5 demonstrates that breed size is negatively correlated with the presence of the "A" derived allele at SNP '5' (chr15:44,228,468; fifth marker from the left in the haplotypes shown in FIG. 3 which uniquely separates big vs. small dog haplotypes). FIG. 5A shows binomial regression of allele frequency on square-root of mean breed weight. The dashed lines indicate the 95% confidence interval on the predicted equation line as estimated from non-parametric bootstrap resampling. FIG. 5B shows ordinary least squares regression of square-root of mean breed weight on frequency of the "A" allele across 143 breeds. Between 5 and 109 dogs were genotyped (median = 22) per breed.
[0179] Body size and the frequency of the "A" allele at "SNP5" (chr15:44,228,468; the 5th SNP from the left in the FIG. 3 haplotypes) are strongly negatively correlated across 143 breeds surveyed (n = 3,231 total dogs). This is evidenced by Spearman's rank correlation (ρ = -0.773; p-value < 2.2 x 10^-16) as well as a logistic regression of allele frequency on body size (LRT = 2,882.3, χ2 df = 1, p < 2 x 10^-16). Since the objective was to find genetic variants that explain body size, body size was regressed on allele frequency (β = -28.097; Intercept = 36.44; p < 2 x 10^-16 for both; R2 = 0.5199). This latter analysis indicates that SNP5 (or a mutation in LD with SNP5) accounts for more than half of the variation in average breed size among domestic dogs, with population substitution of the "G" allele by the "A" allele reducing body size by 28 kg. There are three clear outlier breeds identified by standard analysis of the residuals: great Dane, rottweiler, and mastiff. Removal of these breeds raises the proportion of variation in average breed size explained by SNP5 in IGF1 to 58.4%.
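By way of non-limiting illustration, the fitted line reported above can be evaluated as sketched below. The sketch reads the intercept and slope in kilograms, following the text's interpretation that full substitution of "G" by "A" lowers body size by about 28 kg; whether the fit was made on weight or on the square root of weight (as the description of FIG. 5B suggests) is not resolved here, and the function name is hypothetical.

```python
INTERCEPT = 36.44   # reported intercept
SLOPE = -28.097     # reported coefficient on the "A" allele frequency

def predicted_breed_size(freq_a):
    """Predicted average breed size for a given "A" allele frequency (0 to 1)."""
    if not 0.0 <= freq_a <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return INTERCEPT + SLOPE * freq_a

for freq in (0.0, 0.5, 1.0):
    print(freq, round(predicted_breed_size(freq), 2))
```

With R2 of about 0.52, such a prediction describes breed averages only, and individual breeds (for example the three outliers named above) can deviate substantially from the line.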
Example 8: Estimation of an ancestral recombination graph
[0180] An ancestral recombination graph (ARG) was reconstructed for a 1.2 Mb interval (chr15:43.7-44.9 Mb) that includes the IGF1 core region from 1052 sequences of all small and giant dog breeds and is rooted with data from the golden jackal (Canis aureus) using the software SHRUB (Song et al., Bioinformatics, 21:Suppl 1, i413-422, 2005; www.cs.ucdavis.edu/~yssong/lu.html). Given a set of sequences and the ancestral sequence, SHRUB uses efficient branch and bound methods to compute the minimum number of recombination events necessary to explain the data and generates ARGs consistent with the data.
[0181] The ARG is illustrated in FIG. 7: Balls identified with "**" denote the 12 haplotypes, white balls denote coalescent events, and balls identified with "*" indicate recombination vertices. The numbers below recombination vertices denote breakpoints. Numbers along the edges in the graph indicate mutations. Recombination branches are labeled "l" or "r" to denote material to the left or right of recombination breakpoints.
Example 9: Candidate causal mutations in IGF1 correlating with small size in dogs
[0182] This Example describes the detection of the insertion of SINEC_Cf sequences in chromosome 15 of the dog and the correlation of such sequences with dog size.
[0183] The exons of IGF1 were sequenced in a panel of eight dogs (one large and one small Portuguese water dog defined using 43 radiographic metrics, and one each of rottweiler, miniature poodle, border terrier, Italian greyhound, pomeranian, Saint Bernard, and Tibetan mastiff), and only one variation in coding sequence was found, a synonymous SNP in exon 3 (Thr->Thr; chr15:44,226,324, CanFam1). Extensive resequencing within introns and flanking genomic sequence was also undertaken. Although several additional SNPs unique to small breeds were identified, all were in strong linkage disequilibrium and therefore a single variation or combination of causative variants could not be definitively identified by this approach.
[0184] There are three putatively functional mutations in IGF1 that may account for reduced IGF1 serum levels. These include: (1) an anti-sense oriented retrotransposon SINEC_Cf element (chr15:44,228,010) that appears solely on the {A,B,C} haplotypes 458 bp from, and in complete LD with, SNP "5" (FIG. 2), (2) the synonymous SNP at chr15:44,226,324, and (3) a microsatellite (CA)n mutation in the promoter of IGF1, alleles of which have been associated in humans with body size and height differences (Sweeney et al., Cancer Epidemiol. Biomarkers Prev., 14:1802, 2005; Rietveld et al., Clin. Endocrinol., 61:195, 2004) and the age related decline in circulating IGF1 (Rietveld et al., Eur. J. Endocrinol., 148:171-175, 2003).
[0185] Insertion of a SINEC_Cf within an IGF1 intron was genotyped in 23 dogs from 13 breeds using bi-directional sequencing by standard methods. In brief, primers SQ6015F and SQ6015R (Table 2) were used to amplify the genomic target sequence by PCR. The SINEC_Cf is inserted at chr15:44,228,010 (CanFam1) and has a characteristic 12 bp duplication of the insertion site flanking it. SINE insertion is perfectly correlated with haplotypes B and C as illustrated in Table 4.
Figure imgf000067_0001
WW* = sequencing provided no evidence for the presence of the SINE element.
However, size separation by agarose gel showed the presence of a faint second amplicon in each of these sample lanes, of a size consistent with SINE insertion. It is likely that the short PCR extension time favored the smaller of the two amplicons, and genotypes for these two samples are therefore uncertain.
[0186] All dogs without the SINEC_Cf insertion have the following SQ6015 primed sequence, which matches the Boxer reference genome sequence. All dogs with the SINEC_Cf insertion have the following SQ6015 primed sequence, which matches the Boxer reference genome sequence to position 44,228,010 (shown in normal type) and thereafter (shown in bold type) matches the SINEC_Cf consensus sequence with greater than 97% identity, as shown in SEQ ID NO:21.
[0187] The (CA)n microsatellite alleles are significantly associated with body size in both the Portuguese water dog samples (P-value < 1.4 x 10^-6, ANOVA) and the 23 small and giant breeds (P-value < 2.2 x 10^-14, Chi square) as illustrated in Table 5. The target genomic sequence was amplified using primers FH5934F (CACCTGAGGGGCAAACTATT; SEQ ID NO: 251) and FH5934R (CCAGTTGAGGGATTTGAATGA; SEQ ID NO: 252). The size of the amplicon containing the microsatellite locus was determined to base pair resolution via electrophoresis according to standard methods. The size of the microsatellite repeat, that is, the number of CA repeat units, is deduced from the overall size of the amplicon.
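By way of non-limiting illustration, deducing the number of CA repeat units from amplicon length can be sketched as follows. The length of non-repeat sequence in the amplicon (primers plus flanking sequence) is not given in the text, so `flank_bp` is a hypothetical parameter, as are the function name and the example values.

```python
def ca_repeat_units(amplicon_bp, flank_bp):
    """Number of CA dinucleotide units implied by an amplicon length.

    amplicon_bp: measured amplicon length in base pairs.
    flank_bp: assumed length of all non-repeat sequence in the amplicon."""
    repeat_bp = amplicon_bp - flank_bp
    if repeat_bp < 0 or repeat_bp % 2 != 0:
        raise ValueError("amplicon length is inconsistent with whole CA units")
    return repeat_bp // 2  # each CA unit contributes two base pairs

# Hypothetical example: a 200 bp amplicon with 160 bp of non-repeat sequence.
print(ca_repeat_units(200, 160))
```

Because each unit adds two base pairs, alleles of a pure dinucleotide repeat are expected to differ in amplicon length by multiples of two.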
[0188] Alleles are named as the length of the PCR amplicon in base pairs. Table entries are counts of chromosomes from dogs within all 14 small breeds, all nine giant breeds, and the Portuguese water dog breed.
Table 5: Genotypes for an IGF1 promoter (CA)n microsatellite
Figure imgf000068_0001
[0189] One or more of these putative causal mutations can explain functional differences between the alleles carried by small and giant dog breeds.
EXAMPLE 10: Predicting adult dog body size
[0190] This example describes methods that can be used to predict the adult body size of a puppy.
[0191] A DNA sample from a puppy of unknown parentage is prepared using a genomic DNA isolation kit. Kits for the extraction of high-molecular weight DNA for PCR include a Genomic Isolation Kit A.S.A.P. (Boehringer Mannheim, Indianapolis, Ind.), Genomic DNA Isolation System (GIBCO BRL, Gaithersburg, Md.), Elu-Quik DNA Purification Kit (Schleicher & Schuell, Keene, N.H.), DNA Extraction Kit (Stratagene, La Jolla, Calif.), TurboGen Isolation Kit (Invitrogen, San Diego, Calif.), and the like. Use of these kits according to the manufacturer's instructions is generally acceptable for purification of DNA prior to detecting the desired SNPs.
[0192] The concentration and purity of the extracted DNA can be determined by spectrophotometric analysis of the absorbance of a diluted aliquot at 260 nm and 280 nm. After extraction of the DNA, PCR amplification can proceed.
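By way of non-limiting illustration, the spectrophotometric calculation can be sketched as follows. The sketch uses the conventional conversion of one A260 unit to about 50 micrograms per ml of double-stranded DNA at a 1 cm path length, and treats an A260/A280 ratio of about 1.8 as indicating reasonably pure DNA; these conventions, the function names, and the example readings are assumptions not stated in the text.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional factor for double-stranded DNA

def dsdna_concentration(a260, dilution_factor):
    """Estimated DNA concentration (micrograms per ml) of the undiluted sample."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor

def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 suggest little protein contamination."""
    return a260 / a280

# Hypothetical readings from an aliquot diluted 1:50.
a260, a280 = 0.20, 0.11
print(dsdna_concentration(a260, 50), round(purity_ratio(a260, a280), 2))
```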
[0193] PCR primers are selected to amplify the region containing the sequence of interest. For example, PCR primers are selected to amplify a region that includes chromosome 15 CanFam1 reference position 44,228,468. The allele at SNP5 (SEQ ID NO: 5) can be detected. The puppy will be predicted to have a larger body size if the allele at chromosome 15 CanFam1 reference position 44,228,468 is not "A" (FIG. 4), and conversely, if the presence of an "A" allele is detected the puppy will be predicted to have smaller body size.
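By way of non-limiting illustration, the prediction rule described above can be sketched as follows. The two-letter genotype encoding and the function name are assumptions; following the wording of the text, any detected "A" allele, including in a heterozygote, leads to a prediction of smaller body size.

```python
def predict_size_from_snp5(genotype):
    """Predict relative adult body size from the two alleles at SNP5.

    genotype: two-letter string such as "AA", "AG" or "GG"."""
    alleles = genotype.upper()
    if len(alleles) != 2 or any(a not in "ACGT" for a in alleles):
        raise ValueError("genotype must be two nucleotide letters")
    # Presence of the derived "A" allele predicts smaller body size.
    return "smaller" if "A" in alleles else "larger"

for g in ("AA", "AG", "GG"):
    print(g, predict_size_from_snp5(g))
```

A rule of this kind gives only a directional prediction; the haplotype-based approaches of Examples 5 and 6 use additional markers.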
[0194] One of ordinary skill in the art will appreciate that the identification of the allele as SEQ ID NO: 5 can also be used as part of a breeding program.
EXAMPLE 11: Using markers in breeding programs
[0195] This example describes how determining the haplotype of parental male and female dogs can be used in a breeding program.
[0196] Genomic DNA from a male shih tzu dog (or other small breed) and a female shih tzu dog is isolated as described in Example 10 above. PCR primers are selected to amplify the region of DNA encompassing CanFam1 reference positions 44,212,792; 44,226,324; 44,226,684; 44,228,468; 44,237,388; and 44,260,949. The alleles present at each of these positions are detected by DNA sequencing using standard methods. Suppose the male dog is found to have haplotype B (see FIG. 4) and the female dog is found to have haplotype I (see FIG. 4). In that case, a breeder desiring to create puppies having haplotypes more commonly associated with small dogs would choose different dogs as mating pairs.
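By way of non-limiting illustration, the expected haplotype combinations in the offspring of a candidate mating pair can be sketched under simple Mendelian segregation as follows. The parental haplotype pairs shown are hypothetical, and the function name is an assumption.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def offspring_genotypes(sire, dam):
    """Expected offspring haplotype pairs and their probabilities.

    sire, dam: the two chromosome 15 haplotypes carried by each parent,
    for example ("B", "B") or ("B", "I")."""
    # Each parent transmits one of its two haplotypes with equal probability.
    counts = Counter("/".join(sorted(pair)) for pair in product(sire, dam))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

# Hypothetical pairing: a B/B sire with a B/I dam.
print(offspring_genotypes(("B", "B"), ("B", "I")))
```

A breeder selecting for haplotypes associated with small size could compare candidate pairings by the expected fraction of B/B offspring.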
[0197] In view of the many possible embodiments to which the principles of the disclosure can be applied, it should be recognized that the illustrated embodiments are only examples of the disclosed methods and compositions and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.

Claims

We claim:
1. A method for predicting adult body size in a dog, the method comprising: genotyping a sample obtained from a subject dog for one or more markers on chromosome 15 within the interval between CanFam1 reference positions 44,000,000 and 45,000,000 which markers individually or collectively identify a haplotype associated with body size in a plurality of inbred dog breeds; correlating the haplotype with adult body size, thereby predicting the adult body size of the subject dog.
2. The method of claim 1, wherein the one or more markers are localized to an interval on chromosome 15 between CanFam1 reference positions 44,199,850 and 44,284,186.
3. The method of claim 2, wherein the one or more markers are localized to an interval on chromosome 15 between CanFam1 reference position 44,212,792 and CanFam1 reference position 44,278,140.
4. The method of claim 2, wherein the one or more markers comprise a polymorphic marker at one or more of CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669.
5. The method of claim 2, wherein the one or more markers comprise at least one polymorphic marker enumerated in Table 1.
6. The method of claim 1, wherein correlating haplotype with adult body size comprises comparing the haplotype to an index of average body size by breed.
7. The method of claim 1, wherein at least two haplotypes correlating with different body size are segregated in an inbred population of dogs of which the subject dog is a member.
8. The method of claim 1, wherein the one or more markers identify a haplotype on chromosome 15 comprising the IGF1 locus.
9. The method of claim 1, wherein the one or more markers comprise SEQ ID NO: 5 or SEQ ID NO: 21.
10. A method for determining the directional contribution of a Quantitative Trait Loci associated with adult body size in the dog, the method comprising: genotyping a sample obtained from a subject dog for one or more markers, which markers individually or collectively identify a haplotype on chromosome 15 within the interval from CanFam1 reference position 34,000,000 to CanFam1 reference position 49,000,000; wherein the haplotype is correlated with a directional contribution to body size by a gene comprising the quantitative trait loci, thereby determining the directional contribution to body size by the quantitative trait loci.
11. The method of claim 10, further comprising predicting adult body size by correlating the haplotype with an adult body size in the subject dog.
12. The method of claim 10, comprising correlating the chromosome 15 haplotype with the directional contribution to body size by comparing the haplotype to an index of average body size by breed.
13. The method of claim 10, wherein at least two haplotypes correlating with different body size are segregated in an inbred population of dogs of which the subject dog is a member.
14. The method of claim 9, wherein the one or more markers comprise a polymorphic marker selected from the markers enumerated in Table 1.
15. The method of claim 1, further comprising crossing the subject dog to produce progeny dogs with a desired adult body size.
16. An isolated nucleic acid sequence comprising a marker or plurality of markers localized to an interval of dog chromosome 15 associated with an IGF1 associated QTL, which QTL provides a directional contribution to adult body size in dogs, wherein the marker or plurality of markers comprise polymorphic nucleotide sequences at one or more positions in an interval between CanFam1 reference position 44,199,850 and CanFam1 reference position 44,284,186.
17. The marker or plurality of markers of claim 16, which comprise polymorphic nucleotide sequences at one or more positions in an interval between CanFam1 reference position 44,212,792 and CanFam1 reference position 44,278,140.
18. The marker or plurality of markers of claim 17, which comprise one or more polymorphic nucleotide sequences at CanFam1 reference positions 44,226,324; 44,228,010; 44,228,468; and 44,283,669.
19. A kit comprising a plurality of markers of claim 16.
20. A method of identifying a polymorphism within chromosome 15 of a dog, comprising:
obtaining a sample comprising nucleic acids from a dog, wherein the sample comprises an interval on chromosome 15 between CanFam1 reference positions 44,000,000 and 45,000,000, and wherein the interval comprises a sequence selected from the sequences set forth in SEQ ID NOS: 1-22 and 66-226; and
assaying the sample to identify at least one polymorphism within the sequences set forth in SEQ ID NOS: 1-22 and 66-226.
21. The method of claim 20, further comprising correlating the polymorphism present in the sample to adult dog body size.
22. The method of claim 21, wherein the polymorphism identified is selected from sequence SEQ ID NOS: 5-8, 11, 12, 15, 16, or 20 and is correlated with small adult dog body size.
23. The method of claim 21, wherein the polymorphism identified is selected from sequence SEQ ID NOS: 1-4, 8, 10, 13, 14, 17, or 20 and is correlated with large adult dog body size.
24. The method of claim 20, wherein the dog is less than one year old.
25. The method of claim 20, further comprising including the polymorphism in a dog breeding plan, wherein the dog breeding plan comprises crossing two parental dogs and offspring.
26. The method of claim 25, wherein the dog breeding plan produces offspring that display a homogenous size at two years of age.
27. The method of claim 26, wherein the offspring are at least as small as the parental dogs.
28. The method of claim 26, wherein the offspring are at least as large as the parental dogs.
29. The method of claim 20, wherein the assaying comprises single-strand conformation polymorphism (SSCP) analysis, base excision sequence scanning (BESS), restriction fragment length polymorphism (RFLP) analysis, heteroduplex analysis, denaturing gradient gel electrophoresis (DGGE), temperature gradient electrophoresis, allelic polymerase chain reaction (PCR), ligase chain reaction, direct sequencing, mini sequencing, nucleic acid hybridization, or micro-array-type detection of the polymorphism.
30. The method of claim 20, wherein the polymorphism is associated with a single nucleotide polymorphism (SNP).
31. The method of claim 30, further comprising PCR amplifying an amount of the interval comprising the SNP.
32. The method of claim 31, wherein the PCR amplification includes selecting a forward and a reverse primer capable of amplifying an amount of the interval.
33. The method of claim 31, wherein the primer comprises at least 5 contiguous nucleic acids of the interval on chromosome 15 between CanFam1 reference positions 44,000,000 and 45,000,000.
34. An array comprising at least one first nucleic acid molecule that hybridizes to a second nucleic acid molecule comprising a marker selected from SEQ ID NOS: 1-250, wherein the at least one first nucleic acid molecule is linked to a solid support.
35. The array according to claim 34, further comprising at least 10 contiguous nucleic acids of any one of SEQ ID NOS: 1-250, wherein the at least 10 contiguous nucleic acids are hybridized to the at least one first nucleic acid sequence.
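The chromosome 15 coordinate intervals recited in claims 1, 2, 3, and 10 are nested, and a marker position can be checked against them with a few lines of Python. The sketch below is purely a coordinate check, not a reading of claim scope. It assumes that interval endpoints are inclusive (the claims say only "between" or "from ... to"), and the names `CLAIM_INTERVALS` and `intervals_containing` are hypothetical.

```python
# Intervals on chromosome 15 as recited in the claims above
# (start, end) in reference-assembly coordinates.
CLAIM_INTERVALS = {
    "claim 10": (34_000_000, 49_000_000),
    "claim 1": (44_000_000, 45_000_000),
    "claim 2": (44_199_850, 44_284_186),
    "claim 3": (44_212_792, 44_278_140),
}


def intervals_containing(position):
    """List the recited intervals that contain the given position.

    Assumption: endpoints are treated as inclusive.
    """
    return [name for name, (start, end) in CLAIM_INTERVALS.items()
            if start <= position <= end]


# The four marker positions listed in claim 4.
for pos in (44_226_324, 44_228_010, 44_228_468, 44_283_669):
    print(pos, intervals_containing(pos))
```

Keeping the intervals in a single table makes it easy to re-run the check if coordinates are lifted over to a different genome assembly.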
PCT/US2007/083496 2006-11-02 2007-11-02 Compositions and methods for predicting body size in dogs WO2008058013A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US85641106P 2006-11-02 2006-11-02
US60/856,411 2006-11-02

Publications (2)

Publication Number Publication Date
WO2008058013A2 2008-05-15
WO2008058013A3 2008-12-24

Family

ID=39327183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/083496 WO2008058013A2 (en) 2006-11-02 2007-11-02 Compositions and methods for predicting body size in dogs

Country Status (1)

Country Link
WO (1) WO2008058013A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020155564A1 (en) * 1997-12-29 2002-10-24 The Regents Of The University Of California Cloning of a high growth gene

Non-Patent Citations (9)

* Cited by examiner, † Cited by third party
Title
AFFYMETRIX: "GeneChip Canine Genome 2.0 Array" TECHNICAL DATA SHEET, 2005, XP002480398 *
CHASE KEVIN ET AL: "Genetic basis for systems of skeletal quantitative traits: Principal component analysis of the canid skeleton" PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, vol. 99, no. 15, 23 July 2002 (2002-07-23), pages 9930-9935, XP002480394 ISSN: 0027-8424 cited in the application *
CHASE KEVIN ET AL: "Interaction between the X chromosome and an autosome regulates size sexual dimorphism in Portuguese Water Dogs" GENOME RESEARCH, vol. 15, no. 12, December 2005 (2005-12), pages 1820-1824, XP002480393 ISSN: 1088-9051 cited in the application *
EIGENMANN J E ET AL: "BODY SIZE PARALLELS INSULIN-LIKE GROWTH FACTOR I LEVELS BUT NOT GROWTH HORMONE SECRETORY CAPACITY" ACTA ENDOCRINOLOGICA, vol. 106, no. 4, 1984, pages 448-453, XP009100198 ISSN: 0001-5598 *
LARK KARL G ET AL: "Genetic architecture of the dog: sexual size dimorphism and functional morphology." TRENDS IN GENETICS : TIG OCT 2006, vol. 22, no. 10, October 2006 (2006-10), pages 537-544, XP002480399 ISSN: 0168-9525 cited in the application *
LINDBLAD-TOH KERSTIN ET AL: "Genome sequence, comparative analysis and haplotype structure of the domestic dog" NATURE (LONDON), vol. 438, no. 7069, December 2005 (2005-12), pages 803-819, XP002429578 ISSN: 0028-0836 *
RINCON G ET AL: "Characterization of variation in the canine suppressor of cytokine signaling-2 (SOCS2) gene" GENETICS AND MOLECULAR RESEARCH, vol. 6, no. 1, 2007, pages 144-151, XP002480397 ISSN: 1676-5680 *
SUTTER NATHAN B ET AL.: "An Ancient Haplotype Pinpoints IGF1's Role In Determining Dog Body Size."[Online] 11 October 2006 (2006-10-11), XP002480392 Retrieved from the Internet: URL:http://www.ashg.org/cgi-bin/ashg06s/ashg06?pgmnr=116&sort=ptimes&sbutton=Detail&absno=30755&sid=414063> [retrieved on 2008-05-13] *
SUTTER NATHAN B ET AL: "A single IGF1 allele is a major determinant of small size in dogs." SCIENCE (NEW YORK, N.Y.) 6 APR 2007, vol. 316, no. 5821, 6 April 2007 (2007-04-06), pages 112-115, XP002480396 ISSN: 1095-9203 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20220081560A (en) * 2020-12-09 2022-06-16 대한민국(농촌진흥청장) Development of genetic markers for early prediction of body length of Jindo dogs
KR102470954B1 (en) 2020-12-09 2022-11-29 대한민국 Development of genetic markers for early prediction of body length of Jindo dogs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07863842

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07863842

Country of ref document: EP

Kind code of ref document: A2