New Zealand Patent Specification for Patent Number 524237

PATENTS ACT 1953
COMPLETE SPECIFICATION
After Provisional No: 524237
Dated: 14 February 2003

ANIMAL TESTING PROCEDURE

WE AGRESEARCH LIMITED, a New Zealand company duly incorporated pursuant to the Crown Research Institutes Act 1992 and having its Registered Office at 5th Floor Tower Block, Ruakura Research Centre, East Street, Hamilton, New Zealand; hereby declare the invention for which we pray that a patent may be granted to us, and the method by which it is to be performed, to be particularly described in and by the following statement:

James & Wells Ref: 121679/28

TECHNICAL FIELD
This invention relates to an animal testing procedure. Specifically, this invention relates to a method of determining the genetic value of an animal during animal breeding.
BACKGROUND ART
Species that are under some form of artificial genetic selection provide the majority of the world's food and natural fibres, companion animals and racing animals.
While there are many methods and schemes to selectively breed and improve a species, these almost invariably require the objective measurement of specific traits related to the characteristic to be improved and accurate recordal of pedigrees or family relationships in order to distinguish genetic influences from environment.
In many species the need to record pedigree places serious restrictions on the management of breeding populations. For example, to be certain of pedigree in an extensively farmed species such as the sheep, ewes must be single sire mated and at lambing the birth must be observed and the lamb uniquely tagged.
These requirements of pedigree recording constrain management options at mating and lambing, require highly skilled stock managers to gather accurate records, and limit the size of breeding flocks, and the conditions under which they are run, to manageable levels.
Further, single sire matings risk low pregnancy rates if the ram is not active or is infertile, while such extensive recording often causes stress to ewe flocks during lambing, which may increase mismothering and lamb loss.
Only under extreme conditions, such as single pen lambing, can parentage be determined completely error-free.
Similar difficulties are experienced in virtually any breeding programme where animals or plants are in an extensive farming situation for at least part of their lifecycle.
One obvious alternative to such traditional recording of pedigree is to use DNA marker profiles to identify parents. It is well known that DNA can be used to determine paternity in humans.
Use of DNA markers to generate pedigrees on a large scale for animal breeding would enable management constraints of visual recording and confinement of animals to be relaxed.
DNA testing has been widely used in the breeding of high value animals such as horses to monitor the accuracy of pedigree records. Such parentage matching is typically very reliable and effective when there is a mother and progeny with two or more potential sires.
However, while DNA profiling has been available for nearly 20 years there are few examples of the widespread use of DNA generated pedigrees in breeding strategies.
One of the primary reasons for this is the cost and practicality of accurate DNA matching in large populations. If recording or management is relaxed at mating and parturition in an extensive breeding situation, then potentially there can be a very large number of parents for an offspring. For example, in sheep where ten sires may be mated with a thousand ewes, a lamb could have one of 10,000 possible combinations of parents.
An additional problem when trying to match a progeny to a unique pair of parents arises when allowing for genotyping errors. A common way to overcome this is to require more than one exclusion among the marker tests. Although this works reasonably well in situations where many markers are typed, it is somewhat arbitrary, and is less useful for lower numbers of markers being scored.
The exclusion method tries to exclude all possibilities but the correct one, by excluding relationships with incompatible genotypes. A common way to cope with genotyping errors is to allow a discrepancy at one marker. This means that there must be at least two discrepant markers to exclude possible parents (as it is unlikely that two of the genotypes in the comparison will be in error). When a small set of markers is used, it is difficult to exclude all incorrect relationships, and requiring two of that small number of markers to show an inconsistency exacerbates the problem.
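The exclusion rule described above can be illustrated with a minimal Python sketch. The data layout (one pair of alleles per marker, `None` for a missing genotype) and the function names are assumptions made for illustration only and are not taken from any particular software.

```python
def mismatches(offspring, sire, dam):
    """Count markers at which the offspring is incompatible with the pair.

    Each genotype is a list with one (allele, allele) tuple per marker.
    None marks a missing genotype, which is never counted as a mismatch.
    """
    count = 0
    for o, s, d in zip(offspring, sire, dam):
        if o is None or s is None or d is None:
            continue
        a, b = o
        # Compatible if one allele can come from the sire and the other from the dam.
        if not ((a in s and b in d) or (b in s and a in d)):
            count += 1
    return count


def excluded(offspring, sire, dam, allowed_discrepancies=1):
    """Exclude the pair only when mismatches exceed the allowed number."""
    return mismatches(offspring, sire, dam) > allowed_discrepancies


# Hypothetical three-marker genotypes (alleles coded as integers).
lamb = [(1, 2), (3, 3), (5, 6)]
ram = [(1, 4), (3, 7), (5, 5)]
ewe_a = [(2, 2), (3, 8), (6, 9)]  # compatible at every marker
ewe_b = [(2, 2), (8, 8), (9, 9)]  # incompatible at two markers
ewe_c = [(2, 2), (3, 8), (9, 9)]  # incompatible at one marker only

print(excluded(lamb, ram, ewe_a), excluded(lamb, ram, ewe_b), excluded(lamb, ram, ewe_c))
```

With one discrepancy allowed, `ewe_c` is not excluded even though she is incompatible at one of three markers, which shows why the rule loses power when few markers are scored.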
As progeny produced in extensive breeding systems potentially have a very large number of parents, genetic evaluation of such animals is thus complicated by the uncertainty of the parentage. A number of statistical methods have been developed to attempt to assign pedigree to such offspring.
Such methods typically only aim to identify the sire. As multiple sires are often found to be genetically compatible with each offspring, genetic markers may be used to identify the most likely male with a certain level of likelihood.
These methods are useful in tracing pedigrees through natural populations where there is no record or information on matings. This yields information on population parameters, such as the distribution of mating success (e.g. Devlin et al., 1988, Theoretical and Applied Genetics 76:369-380; Dickinson and McCulloch, 1989, Animal Behaviour 38:719-721; Neff et al., 2001, Theoretical Population Biology 59:315-331), but the methods have not been widely used in extensive breeding situations due to the large amount of genetic analysis required.
Whilst other methods have been proposed for use in breeding situations, they often seek to account for multiple possibilities of sire pedigrees during the genetic evaluation process (Foulley et al., 1987, Genetics Selection Evolution 19:83-102; Henderson, 1988, J Animal Science 66:1614-1621; Perez-Enciso and Fernando, 1992, Theoretical and Applied Genetics 84:173-179; Cardoso and Tempelman, 2003, Genetics Selection Evolution 35:469-487). The use of genetic markers has not been exemplified in such techniques, but rather is mentioned only as a possibility in assigning probabilities to possible parentages.
All references, including any patents or patent applications cited in this specification are hereby incorporated by reference. No admission is made that any reference constitutes prior art. The discussion of the references states what their authors assert, and the applicants reserve the right to challenge the accuracy and pertinency of the cited documents. It will be clearly understood that, although a number of prior art publications are referred to herein, this reference does not constitute an admission that any of these documents form part of the common general knowledge in the art, in New Zealand or in any other country.
It is acknowledged that the term 'comprise' may, under varying jurisdictions, be attributed with either an exclusive or an inclusive meaning. For the purpose of this specification, and unless otherwise noted, the term 'comprise' shall have an inclusive meaning - i.e. that it will be taken to mean an inclusion of not only the listed components it directly references, but also other non-specified components or elements. This rationale will also be used when the term 'comprised' or 'comprising' is used in relation to one or more steps in a method or process.
It is an object of the present invention to address the foregoing problems or at least to provide the public with a useful choice.
Further aspects and advantages of the present invention will become apparent from the ensuing description which is given by way of example only.
DISCLOSURE OF INVENTION
According to one aspect of the present invention there is provided a method of determining breeding values of individual animals, including the steps of
(i) identifying possible parents of progeny, and
(ii) genotyping of possible parents and progeny, and
(iii) calculating parentage likelihoods and converting them to statistical probabilities, and
(iv) estimating the breeding values of the individual animals,
the method characterised by the step of using a selected number of genetic markers in the genotyping.
In preferred embodiments of the present invention this method is used to determine the parentage of progeny produced in large scale animal breeding.
In preferred embodiments of the present invention the procedure is used to determine the parentage in sheep.
However, this should not be seen as a limitation on the present invention in any way for this procedure may find use in any breeding programme where animals or plants are in an extensive farming situation for at least part of their lifecycle, such as other livestock (cattle, camels, camelids, goats), farmed fish (salmonids), poultry, and sexually reproducing plants.
The term "breeding value" should be taken to mean a measure of the genetic merit of an individual. "If an individual is mated to a number of individuals taken at random from the population, then its breeding value is twice the mean deviation of the progeny from the population mean" (D. S. Falconer, 1981, Introduction to Quantitative Genetics, 2nd ed., Longman, London). The "mean deviation" could refer to a single trait, or to a mathematical combination of traits. Detailed methods for estimating breeding values can be found in Mrode (1996, Linear Models for the Prediction of Animal Breeding Values, CAB International, Wallingford).
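As a small worked illustration of the Falconer definition quoted above, the following Python sketch computes a breeding value from progeny records. The progeny values and the population mean of 40 are hypothetical numbers chosen for the example. It is not the estimation method of the invention, which relies on the linear-model techniques cited from Mrode.

```python
from statistics import mean


def breeding_value_from_progeny(progeny_values, population_mean):
    """Twice the mean deviation of the progeny from the population mean."""
    return 2.0 * (mean(progeny_values) - population_mean)


# Hypothetical trait records for the progeny of one sire mated at random.
progeny = [41.0, 43.0, 42.0, 42.0]
bv = breeding_value_from_progeny(progeny, population_mean=40.0)
print(bv)  # 4.0
```

The factor of two reflects that each progeny receives only half of its genes from the parent being evaluated.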
In preferred embodiments of the present invention, breeding values may be determined for both parents and/or progeny. One or more breeding values may also be calculated for each individual, depending on the trait(s) of interest.
Identification of possible parents preferably involves the accurate recording of the mating mob (groups of sires and mothers which have the opportunity to mate) and the parturition mob (groups of mothers in defined areas and the animals which are potentially their offspring). Such recordal is standard practice in conventional pedigree recording.
Possible parents may also be identified by some form of objective measurement of specific traits or through other farm management practices.
Identifying possible parents allows animals obviously not the parents to be eliminated before beginning any statistical analysis, thus reducing the number of calculations required.
In addition, allowance can be made for unknown parents, such as when there are incomplete mob records. These will have higher relative likelihoods for cases where none of the other possibilities provides a good match.
The term "genotyping" should be taken to mean the analysis of the distribution of a number of polymorphic genetic markers. Preferably the markers are analysed simultaneously using standard molecular biology techniques known in the art, such as multiplex or parallel analysis systems.
Such markers will herein be referred to as DNA markers, though this should not be seen as limiting.
The term "selected markers" should be taken to mean a set of markers which together enable a cost-effective solution to parentage testing and genetic evaluation using the methods proposed here. Using current technology this would normally constitute four to ten microsatellite markers chosen to be highly polymorphic and able to be multiplexed (i.e. analysed together), but the proposed technology will make use of any marker information no matter how good or poor that information is.
Alternatively, a larger set (20-30) of markers with a lower probability of pedigree exclusion (e.g. SNP markers) could be used. However, this is not limiting. The key feature of the selected markers is that fewer markers are used than one would reasonably expect from simulation, or know from experience, to provide a perfect solution to the parentage problem. Ideally between 50 and 90% of progeny would be matched to a single parent with high probability while the remainder would have multiple possible parents.
In the present invention the use of a selected number of DNA markers to generate DNA profiles does not enable the identification of the parents of all progeny.
Instead, the partial parentage information provided by selected DNA marker information can be used to calculate the statistical probability for each possible parentage.
According to another aspect of the present invention there are provided software programmes, and computer systems adapted to run such software, to calculate the statistical probability for each possible parentage.
The identification of possible parents creates a database of likely parents for each offspring. Having a finite number of potential parents then allows the statistical probabilities of each being the true parents to be calculated.
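A minimal Python sketch of this conversion from likelihoods to probabilities follows. It uses the rule given in Example 1 of this specification, namely that probabilities are assigned in proportion to likelihood among those parentages that are more likely than a randomly chosen set of parents. The candidate names and likelihood ratios are hypothetical.

```python
def parentage_probabilities(likelihood_ratios):
    """Map {candidate pair: likelihood ratio} to {candidate pair: probability}.

    Candidates no more likely than a random pair (ratio <= 1) are dropped.
    The remainder receive probabilities proportional to their ratios.
    """
    kept = {pair: lr for pair, lr in likelihood_ratios.items() if lr > 1.0}
    total = sum(kept.values())
    return {pair: lr / total for pair, lr in kept.items()} if kept else {}


# Hypothetical likelihood ratios for one lamb within its recorded mobs.
ratios = {
    ("ram1", "ewe7"): 60.0,
    ("ram1", "ewe9"): 20.0,
    ("ram3", "ewe7"): 0.5,
}
probs = parentage_probabilities(ratios)
print(probs)  # {('ram1', 'ewe7'): 0.75, ('ram1', 'ewe9'): 0.25}
```

An empty result corresponds to a progeny for which no recorded candidate is a better match than a random pair, which is where an allowance for unknown parents would apply.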
The set of possibilities is more likely to contain relatives of the true parents than unrelated animals. In some cases incorrectly allowing these as possible parents may still be better than not assigning parents to a progeny.
The parentage probabilities are then preferably used to calculate breeding values for each progeny, taking into account available biological information such as mating and lambing dates, pregnancy scanning data, specific trait records and so forth to modify the likelihood for each possible parentage.
Incorporating biological factors into the calculations improves the accuracy of the procedure. For example, a ewe scanned pregnant with one lamb is unlikely to give birth to five lambs.
Once the final modified set of pedigree relationships is obtained, breeding values can be calculated by a number of techniques. These involve either taking into account all possible pedigree combinations, weighted by their probability, in one analysis, or sampling possible pedigree combinations based on their probability, running multiple breeding value analyses and averaging the results.
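The second technique, sampling pedigrees and averaging, can be sketched in Python as follows. This is an illustration only: `evaluate` stands in for whatever genetic evaluation is actually run on a fully specified pedigree, and the toy evaluation, animal names and probabilities are hypothetical.

```python
import random


def sample_pedigree(parentage_probs, rng):
    """Draw one parent pair per progeny according to its parentage probabilities."""
    pedigree = {}
    for progeny, options in parentage_probs.items():
        pairs = list(options)
        weights = [options[pair] for pair in pairs]
        pedigree[progeny] = rng.choices(pairs, weights=weights, k=1)[0]
    return pedigree


def averaged_breeding_values(parentage_probs, evaluate, n_samples=100, seed=0):
    """Average the values returned by `evaluate` over sampled pedigrees."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_samples):
        ebvs = evaluate(sample_pedigree(parentage_probs, rng))
        for animal, value in ebvs.items():
            totals[animal] = totals.get(animal, 0.0) + value
    return {animal: total / n_samples for animal, total in totals.items()}


# Hypothetical inputs: the second lamb has two candidate sires.
probs = {
    "lamb1": {("ram1", "ewe1"): 1.0},
    "lamb2": {("ram1", "ewe2"): 0.7, ("ram2", "ewe2"): 0.3},
}


def toy_evaluate(pedigree):
    # Stand-in for a real evaluation: scores 1.0 for progeny of ram1, else 0.0.
    return {lamb: 1.0 if sire == "ram1" else 0.0
            for lamb, (sire, _dam) in pedigree.items()}


ebvs = averaged_breeding_values(probs, toy_evaluate)
print(ebvs)
```

Each sampled pedigree is analysed as if it were the true pedigree, so standard evaluation software can be reused unchanged. Only the sampling and averaging steps are specific to partial pedigrees.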
DNA testing has been widely used in the breeding of high value animals such as horses to monitor the accuracy of pedigree records. Typically, DNA parentage matching is very reliable and effective when matching a mother and progeny with two possible sires. However, in a less controlled animal breeding situation there may potentially be a very large number of possible parents for an offspring and as such the costs of genotyping for pedigree records are prohibitive.
The present invention utilises a set of DNA markers that do not identify the pedigree of all offspring. In comparison, the number of DNA markers used will only offer partial solutions to the pedigree of the offspring. While initially counter-intuitive, this offers a cost effective option to determining pedigrees in large-scale animal breeding using current DNA marker technology.
The optimal number of markers for a given situation will vary with the number of animals involved and the accuracy with which breeding values are required. We have used simulation to determine the optimal number of markers for situations commonly encountered in sheep farming. For example, for a particular situation ("single sire/mixed lambing", lambing mob of 250 lambs of 100 dams) the proportion of progeny uniquely assigned (assuming no allowance for genotype errors) was up to 60%, 80% and 90% for 4, 5 and 6 markers respectively.
Unexpectedly, the breeding values data provided by the partial parentage solution of the present invention provides similar genetic progress to that achieved using traditional pedigree recording, but with greatly relaxed management constraints and half or less than half the number of DNA markers required for a near complete pedigree solution. As such, the process makes genetic progress using DNA pedigrees feasible and cost effective even in large-scale farmed species such as sheep.
Automation and refinement of the genotyping process is anticipated to further reduce the costs of genotype analysis and broaden the applicability of this technology.
Further, the present invention also unexpectedly provides additional benefits that arise from the technology, including robustness to error rates in genotyping data, in addition to options for incorporating marked Quantitative Trait Loci (QTL) in the breeding DNA profile and breeding analysis.
The marker information can be used in conjunction with knowledge of the characteristics of the QTL and of its status in the parents to undertake marker or gene assisted selection of progeny with only partially determined parentage, allowing the selection for favourable gene variants.
By combining information on the possible parents (such as mob records) and the limited amount of genotypic information obtained by the partial pedigree solution with biological information collected during standard farm management, such as mating and lambing dates, pregnancy scanning data, specific trait records and so forth, the estimated breeding value for each offspring can be calculated.
This ability to use DNA to generate pedigrees on a large scale for extensive breeding situations enables management constraints of visual recording and confinement of animals to be greatly relaxed. This allows more efficient management of animals at mating and lambing, thus lowering costs and stress on animals which could result in increased survival of young; the size of flocks to be increased more easily (less capital and staff training required); and breeding flocks to be run under similar conditions to production flocks, thereby increasing the relevance of selection.
It is expected that the present invention will provide a better solution, or at least a cost and labour effective alternative to traditional pedigree recording or to applications of DNA technology which aim to identify the exact parents of offspring in a breeding situation.
It is anticipated that there are a number of points in this procedure which can be modified or extended depending on the breeding situation in question. Different data collected in standard genetic evaluation and pedigree recordal, and different ways of analysing these variables, could be used.
It is anticipated such modifications to the current procedure would be obvious to a skilled addressee and as such the present invention should not be seen as limiting.
DETAILED DESCRIPTION OF THE INVENTION
As defined above, the present invention is directed to determining a breeding value of progeny in large scale breeding situations.
The invention is based upon the inventors' investigations into the ability to use a selected number of DNA markers in conjunction with currently relaxed management restraints to achieve similar results to that achieved using traditional pedigree recordal or full DNA profiling.
Non-limiting examples of the invention will now be provided.
Example 1
This example details a simulation model that has been developed to demonstrate how partial pedigree information can be used for the calculation of breeding values. The simulation used parameters that are typical for a prolific sheep breeding operation, and DNA markers typical of those that could be used cost-effectively for this situation. Breeding values were calculated using an inverse additive relationship matrix incorporating the parental uncertainty in the genetic evaluation.
Simulation Methods
1 Pedigree simulation
A pedigree comprising three generations was simulated. This allowed some genetic similarities between the parents, as would normally be the case in practice.
• Generation I (grandparents)
  o Assumed to be unrelated animals.
  o Paternal and maternal lines were produced.
    ■ Paternal line included 5 grand-sires and 20 grand-mothers.
    ■ Maternal line included 10 grand-sires and 120 grand-mothers.
• Generation II (parents)
  o Parents were generated from the separate paternal/maternal grandparent lines.
  o The number of progeny produced from each mating was based on a distribution of 0.15, 0.45, 0.35, 0.04 and 0.01 for 1-5 lambs respectively.
  o Animals in each group were randomly assigned a sex.
    ■ Consequently 20-30 potential sires and in excess of 100 potential mothers were generated.
• Generation III (progeny)
  o 10 animals were randomly selected for use as sires from the set of potential sires from generation II.
  o 100 animals were randomly selected for use as mothers from the set of potential mothers from generation II. These animals were randomly assigned a mate from the set of sires.
  o The number of progeny per mother was determined as in generation II.
  o Progeny were randomly removed (to mimic deaths) according to birth rank. The proportions removed were 0.15, 0.15, 0.30, 0.5 and 0.5 for 1-5 lambs respectively.
  o Birth rank and rearing rank data were stored for all progeny.
  o All progeny were randomly assigned a sex.
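The litter size and survival steps listed above can be sketched in Python as follows. The two probability lists are taken from the text. Treating each lamb as removed independently with the probability for its birth rank is an assumption made for this sketch, since the specification does not state how the removal was drawn.

```python
import random

LITTER_SIZE_PROBS = [0.15, 0.45, 0.35, 0.04, 0.01]  # litters of 1 to 5 lambs
REMOVAL_PROBS = [0.15, 0.15, 0.30, 0.5, 0.5]        # by birth rank 1 to 5


def simulate_litter(rng):
    """Return (birth_rank, rearing_rank) for one mother.

    A rearing rank of 0 means that every lamb in the litter was removed.
    """
    birth_rank = rng.choices([1, 2, 3, 4, 5], weights=LITTER_SIZE_PROBS, k=1)[0]
    p_removed = REMOVAL_PROBS[birth_rank - 1]
    # Assumption: each lamb is removed independently of its litter mates.
    survivors = sum(1 for _ in range(birth_rank) if rng.random() >= p_removed)
    return birth_rank, survivors


rng = random.Random(1)
litters = [simulate_litter(rng) for _ in range(1000)]
mean_litter_size = sum(birth for birth, _ in litters) / len(litters)
print(round(mean_litter_size, 2))
```

The expected litter size under the stated distribution is 2.31 lambs, so the simulated mean should fall close to that value.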
Generation I and II animals not directly related to progeny were removed from analysis.

2 Genotypes
Genotypes were applied to all animals according to pedigree.
o The marker data were simulated for six markers, based on a set of markers currently being used by the applicants. The allele frequency data for these markers were calculated from a set of unrelated animals.
o Genotypes were assigned to generation I animals by randomly selecting alleles based on the allele frequency data.
o Generation II and III animals' alleles were randomly selected from each parental set of alleles.
o In order to conservatively model the proportion of results from a commercial cost effective laboratory, 10% of genotypes were randomly removed from mothers and progeny. No genotypes were removed from the sires.
o All genotypes were assigned without error.

3 Parentage assignment
Progeny were assigned to parents using partial pedigree software designed by the applicants, as described below. Marshall et al. (1998, Molecular Ecology 7:639-655) presented similar formulas, but for paternity assignment rather than assignment of both parents.
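The genotype simulation of section 2 above can be sketched in Python as follows. The allele names and frequencies are hypothetical, since the applicants' marker data are not given, and removing genotypes independently per marker is an assumption about how the 10% was applied.

```python
import random


def founder_genotype(allele_freqs, rng):
    """Draw two alleles independently from {allele: frequency}."""
    alleles = list(allele_freqs)
    weights = [allele_freqs[a] for a in alleles]
    return tuple(rng.choices(alleles, weights=weights, k=2))


def offspring_genotype(sire, dam, rng):
    """One randomly chosen allele from each parent."""
    return (rng.choice(sire), rng.choice(dam))


def drop_missing(genotypes, rate, rng):
    """Replace each marker genotype with None with probability `rate`."""
    return [None if rng.random() < rate else g for g in genotypes]


rng = random.Random(2)
freqs = {"A1": 0.5, "A2": 0.3, "A3": 0.2}  # hypothetical single-marker frequencies
sire = founder_genotype(freqs, rng)
dam = founder_genotype(freqs, rng)
lamb = offspring_genotype(sire, dam, rng)

# Sires keep all genotypes. Mothers and progeny lose about 10% of them.
dam_typed = drop_missing([dam], 0.10, rng)
lamb_typed = drop_missing([lamb], 0.10, rng)
print(sire, dam, lamb)
```

For six markers the same functions would be applied once per marker, each with its own frequency table.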
In this example all relevant animals belong to a single mating or lambing group.
o The likelihood of the data given a putative parentage is calculated relative to the likelihood of the data given no relationship.
o The likelihood L is the product over all markers of L(H_1)/L(H_2), with

\[ L(H_1) = (1-e)^3 \, T(g_p \mid g_o) \, P(g_o) + e(1-e)^2 \left[ T(g_f \mid g_o) P(g_o) + T(g_m \mid g_o) P(g_o) + P(g_p) \right] + e^2(1-e) \left[ P(g_o) + P(g_m) + P(g_f) \right] + e^3 \]

\[ L(H_2) = (1-e)^3 \, P(g_p) \, P(g_o) + e(1-e)^2 \left[ P(g_f) P(g_o) + P(g_m) P(g_o) + P(g_m) P(g_f) \right] + e^2(1-e) \left[ P(g_o) + P(g_f) + P(g_m) \right] + e^3 \]

where e is the assumed rate of genotyping errors, g_o is the offspring genotype, g_p the pair of parental genotypes, g_f and g_m the genotypes of the father and mother, and the other quantities are found from the following tables according to the genotypes (A_i represents the i-th allele and p_i its frequency).
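The two expressions above can be evaluated with a short Python sketch. It follows one reading of the printed formulas; in particular, taking the last product in the second bracket of L(H2) as P(g_m)P(g_f) is an assumption made on the grounds of symmetry with the P(g_p) term in L(H1). The function and argument names are hypothetical.

```python
def marker_likelihood_ratio(e, t_pair, t_sire, t_dam, p_off, p_pair, p_sire, p_dam):
    """L(H1)/L(H2) for a single marker.

    t_pair, t_sire, t_dam: T(g_p|g_o), T(g_f|g_o), T(g_m|g_o)
    p_off, p_pair, p_sire, p_dam: P(g_o), P(g_p), P(g_f), P(g_m)
    """
    l_h1 = ((1 - e) ** 3 * t_pair * p_off
            + e * (1 - e) ** 2 * (t_sire * p_off + t_dam * p_off + p_pair)
            + e ** 2 * (1 - e) * (p_off + p_dam + p_sire)
            + e ** 3)
    l_h2 = ((1 - e) ** 3 * p_pair * p_off
            + e * (1 - e) ** 2 * (p_sire * p_off + p_dam * p_off + p_dam * p_sire)
            + e ** 2 * (1 - e) * (p_off + p_sire + p_dam)
            + e ** 3)
    return l_h1 / l_h2


def parentage_likelihood_ratio(per_marker_terms, e=0.01):
    """Product over markers; each item holds the seven T and P values for one marker."""
    ratio = 1.0
    for terms in per_marker_terms:
        ratio *= marker_likelihood_ratio(e, *terms)
    return ratio


# Offspring and both candidate parents homozygous A_iA_i, with p_i = 0.5:
# T(g_p|g_o) = p_i^2, T(g_f|g_o) = T(g_m|g_o) = p_i, P(g_o) = P(g_f) = P(g_m) = p_i^2, P(g_p) = p_i^4.
terms = (0.25, 0.5, 0.5, 0.25, 0.0625, 0.25, 0.25)
no_error = marker_likelihood_ratio(0.0, *terms)
with_error = marker_likelihood_ratio(0.01, *terms)
print(no_error, with_error)
```

With e = 0 the ratio reduces to T(g_p|g_o)/P(g_p). Allowing for error pulls the ratio towards 1, so a single incompatible marker reduces the support for a parentage without eliminating it.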
Offspring        Parents'
genotype (g_o)   genotypes (g_p)     T(g_p | g_o)   P(g_p)
A_iA_i           A_iA_i x A_iA_i     p_i^2          p_i^4
                 A_iA_i x A_iA_j     2 p_i p_j      4 p_i^3 p_j
                 A_iA_j x A_iA_j     p_j^2          4 p_i^2 p_j^2
                 A_iA_j x A_iA_k     2 p_j p_k      8 p_i^2 p_j p_k
A_iA_j           A_iA_i x A_iA_j     p_i^2          4 p_i^3 p_j
                 A_iA_i x A_jA_j     p_i p_j        2 p_i^2 p_j^2
                 A_iA_i x A_jA_k     p_i p_k        4 p_i^2 p_j p_k
                 A_iA_j x A_iA_j     p_i p_j        4 p_i^2 p_j^2
                 A_iA_j x A_iA_k     p_i p_k        8 p_i^2 p_j p_k
                 A_iA_k x A_jA_k     p_k^2          8 p_i p_j p_k^2
                 A_iA_k x A_jA_l     p_k p_l        8 p_i p_j p_k p_l

Offspring        Parent genotype     T(g_f | g_o) or   P(g_f) or
genotype (g_o)   (g_f or g_m)        T(g_m | g_o)      P(g_m)
A_iA_i           A_iA_i              p_i               p_i^2
                 A_iA_j              p_j               2 p_i p_j
A_iA_j           A_iA_i              p_i / 2           p_i^2
                 A_iA_j              (p_i + p_j) / 2   2 p_i p_j
                 A_iA_k              p_k / 2           2 p_i p_k

o A probability is assigned to each putative parentage in proportion to its likelihood, from the set of parentages which are more likely than a randomly chosen set of parents.
o The rate of genotype errors was assumed to be 1% when performing the likelihood calculations.

4 Trait assignment
Generation I animals were assigned genetic values for a trait with a known mean (40), residual standard deviation (8.4) and heritability (0.3). Genetic values were determined for generation II and III animals using known pedigree relationships and quantitative genetics theory. Phenotypes for generation III animals were produced from their genetic values, a component that depended on their birth and rearing rank (see the following table), and a randomly sampled environmental component. Phenotypes were not modified according to sex.
Birth and rearing rank components

              Rearing rank
Birth rank    1       2      3      4      5
1              0      0      0      0      0
2             -1.4   -4      0      0      0
3             -1.4   -5     -7      0      0
4             -1.4   -5     -7    -10      0
5             -1.4   -5     -7    -10    -12

5 Breeding value (BV) estimation
Partial pedigree results were used in the estimation of animal breeding values.
o BVs were estimated using ASREML (software that is able to produce estimated BVs by using a supplied inverse relationship matrix or by using a supplied pedigree) with known (generation III) phenotypic information. The model included fixed effects of sex and birth/rearing rank combination.
o It was assumed that the litter size of each mother was known. In practice this may rely on pregnancy scanning information.
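The trait assignment of section 4 and the rank components table above can be sketched in Python as follows. The mean, residual standard deviation, heritability and rank components are taken from the text. Deriving the additive variance from h2 = Va / (Va + Ve), and generating offspring genetic values as the mid-parent value plus a Mendelian sampling term with variance Va / 2, are standard quantitative genetics assumptions adopted for this sketch and are not stated in the specification.

```python
import math
import random

TRAIT_MEAN = 40.0
RESIDUAL_SD = 8.4
HERITABILITY = 0.3
# Additive SD implied by h2 = Va / (Va + Ve), taking Ve as the residual variance.
ADDITIVE_SD = math.sqrt(HERITABILITY / (1 - HERITABILITY)) * RESIDUAL_SD

# RANK_EFFECT[birth_rank][rearing_rank], from the table above.
RANK_EFFECT = {
    1: {1: 0.0},
    2: {1: -1.4, 2: -4.0},
    3: {1: -1.4, 2: -5.0, 3: -7.0},
    4: {1: -1.4, 2: -5.0, 3: -7.0, 4: -10.0},
    5: {1: -1.4, 2: -5.0, 3: -7.0, 4: -10.0, 5: -12.0},
}


def founder_genetic_value(rng):
    """Genetic value of an unrelated generation I animal, as a deviation."""
    return rng.gauss(0.0, ADDITIVE_SD)


def offspring_genetic_value(sire_gv, dam_gv, rng):
    """Mid-parent value plus Mendelian sampling (non-inbred parents assumed)."""
    return 0.5 * (sire_gv + dam_gv) + rng.gauss(0.0, ADDITIVE_SD / math.sqrt(2))


def phenotype(genetic_value, birth_rank, rearing_rank, rng):
    """Mean + genetic value + rank component + environmental deviation."""
    return (TRAIT_MEAN + genetic_value
            + RANK_EFFECT[birth_rank][rearing_rank]
            + rng.gauss(0.0, RESIDUAL_SD))


rng = random.Random(3)
sire_gv, dam_gv = founder_genetic_value(rng), founder_genetic_value(rng)
lamb_gv = offspring_genetic_value(sire_gv, dam_gv, rng)
record = phenotype(lamb_gv, birth_rank=3, rearing_rank=2, rng=rng)
print(round(ADDITIVE_SD, 2), round(record, 1))
```

Only rank combinations that can occur (rearing rank no greater than birth rank) are stored, so the zero entries of the printed table for impossible combinations are omitted.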
Six different methods were used to estimate BVs:
1. True pedigrees (TRUE): The true pedigrees and birth and rearing ranks (known from the simulation) were used. For real data this method is not applicable, but it is presented for comparison.
2. Partial pedigrees - average relationship matrix method (ARM): Pedigrees were assumed unknown and the relationship matrix was formed, using the partial pedigree probabilities, and inverted. Birth ranks and rearing ranks were derived as the weighted (by the probabilities) means of the assigned mothers' litter sizes and rearing group sizes. Where a mother had a birth rank greater than 3 (5% of mothers had >3 progeny generated), the birth rank for this mother was set to 3. This reflects typical on-farm practice. Assigned rearing ranks were rounded to integer values, with a maximum of three, for the purposes of the genetic evaluation.
3. Partial pedigrees - pedigree sampling method (PS): Pedigrees were assumed unknown. Sets of possible parentages were sampled according to their probabilities, each sample was analysed using standard genetic evaluation software as if it contained true parentage information, and the estimated breeding values from these analyses were then averaged. One hundred samples were used in this example.
Each sampled parentage allows calculation of a birth rank (from collection and sampling of dead offspring if possible, or from litter size from pregnancy scan data), and a rearing rank. These data are of the types that are available with full parentage recording, and therefore can be handled by standard genetic evaluation software. Only birth rank and rearing rank calculations were included in the present example.
4. Partial pedigrees - upper quartile pedigree sampling method (UQPS): Pedigrees were assumed unknown, and a set of pedigrees was sampled as in the PS method. For each sample of parentages, a weighting factor was calculated which increases as estimated family sizes come closer to expected family sizes (using available prior information such as litter size from pregnancy scan data). To see if more highly weighted pedigrees might be better to use than those with lower weights, the PS method was carried out, but using only the top 25% (upper quartile) of pedigrees on weighting. The weighting scheme used in this simulation was as follows:
a. Set the weight to 1.
b. For each dam with progeny, multiply the weight by 0.75^(n_s - 1), where n_s is the number of different sires assigned for the progeny of that dam.
c. For each dam, multiply the weight by 4w, where w is found in the following table according to the number of progeny detected by scanning the dam, and the number of progeny assigned (reared). These values approximately represent the (relative) probabilities of rearing the given number of progeny given the scanned number of progeny (and allowing for errors in the scanning process). The values are multiplied by 4 to ensure that calculations are within the range of computer storage. For combinations not shown, a value of w = 0.01 was used.
                 Lambs assigned (reared)
Pregnancy scan   0       1      2       3      4
1                0.1     0.9    0.05
2                0.005   0.13   0.865   0.05
3+               0.008   0.1    0.39    0.51   0.05

5. Best pedigree (BP): Pedigrees were assumed unknown and the set of parents with the highest probability were used as the parents in the genetic evaluation. Birth ranks and rearing ranks were taken as the mother's litter size, and as the number of live progeny assigned to that mother respectively. In other words, the best pedigree method uses the parents with the highest probability as if they are the true parents, i.e. only one set of parents is assigned for each progeny. Partial pedigree allows a number of different parent possibilities.
6. Best fitted pedigree (FP): Pedigrees were assumed unknown and the set of parents with the highest probability were set as an initial assignment of parentage. The probabilities were then altered in an iterative process by down weighting values when a progeny showed an unlikely dam assignment on the basis of family size. If a progeny was in a litter of size greater than one, and was the only one assigned to a particular sire, it was given a 'check value' of 0, otherwise 1. A dam's progeny were then sorted by check value and descending probability. Progeny that ranked below the dam's scanned litter size were downweighted - by a factor of 0.75 if the progeny had a check value of 0, and by 0.5 if it had a check value of 1. The new set of probabilities was then used to reassign parentage and the process iterated 15 times (which was found to be sufficient to ensure further reassignment was minimal). The FP method, therefore, used a single pedigree for breeding value analysis, but one which gave family sizes that coincided more closely with prior expectations than the BP method.
o The true genetic values as produced in the trait assignment were also available for comparison.
o Correlations were found between breeding values estimated by each of the methods and between these and the true genetic values. Results were partitioned into generation II males and females and generation III animals.
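The UQPS weighting scheme (steps a to c of the fourth method above) can be sketched in Python as follows. The assignment of the tabulated w values to columns 0 to 4 lambs reared is one reading of the printed table and should be treated as an assumption, as should the tuple layout used to describe each dam.

```python
# w indexed by (pregnancy scan result, lambs assigned as reared). A scan of 3+ is stored as 3.
W_TABLE = {
    (1, 0): 0.1, (1, 1): 0.9, (1, 2): 0.05,
    (2, 0): 0.005, (2, 1): 0.13, (2, 2): 0.865, (2, 3): 0.05,
    (3, 0): 0.008, (3, 1): 0.1, (3, 2): 0.39, (3, 3): 0.51, (3, 4): 0.05,
}
DEFAULT_W = 0.01  # used for combinations not shown in the table


def pedigree_weight(dams):
    """Weight for one sampled pedigree.

    dams: iterable of (scanned, assigned, n_sires) tuples, one per dam.
    """
    weight = 1.0                                   # step a
    for scanned, assigned, n_sires in dams:
        if assigned > 0:                           # step b: dams with progeny only
            weight *= 0.75 ** (n_sires - 1)
        w = W_TABLE.get((min(scanned, 3), assigned), DEFAULT_W)
        weight *= 4 * w                            # step c
    return weight


def upper_quartile(pedigrees, weights):
    """Keep the top 25% of sampled pedigrees by weight (at least one)."""
    ranked = sorted(zip(weights, range(len(pedigrees))), reverse=True)
    keep = max(1, len(pedigrees) // 4)
    return [pedigrees[i] for _, i in ranked[:keep]]


# Two dams: one scanned with twins and rearing two lambs, one scanned and rearing a single.
one_sire = pedigree_weight([(2, 2, 1), (1, 1, 1)])
two_sires = pedigree_weight([(2, 2, 2), (1, 1, 1)])
print(one_sire, two_sires)
```

A sampled pedigree that splits a twin litter between two sires is penalised by the factor 0.75, so it is less likely to reach the upper quartile than one that keeps the litter with a single sire.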
These results were then summarised over multiple replications (50 in the cases presented for this example) of the simulation.

Results from simulation
Simulation results are presented as the means (over replicates of the simulation) of the correlations between breeding values calculated using the true pedigree and by the other two methods (partial pedigrees or best pedigrees). Results are presented for three different groups of animals: sires, mothers and progeny. Each of these has substantially different amounts of information (number and type of close relatives with trait information) for estimating breeding values.
                  True birth/rearing rank           Estimated birth/rearing rank
Method  Group     Mean correlation  Standard error  Mean correlation  Standard error
PS      Sires     0.944             0.008           0.919             0.010
PS      Mothers   0.730             0.008           0.691             0.008
PS      Progeny   0.963             0.002           0.931             0.003
ARM     Sires     0.949             0.008           0.912             0.010
ARM     Mothers   0.730             0.008           0.678             0.009
ARM     Progeny   0.965             0.002           0.921             0.003
UQPS    Sires     0.942             0.009           0.917             0.011
UQPS    Mothers   0.718             0.008           0.681             0.008
UQPS    Progeny   0.962             0.002           0.930             0.003
BP      Sires     0.906             0.014           0.866             0.016
BP      Mothers   0.685             0.010           0.634             0.009
BP      Progeny   0.936             0.002           0.890             0.003
FP      Sires     0.880             0.014           0.842             0.017
FP      Mothers   0.697             0.010           0.647             0.009
FP      Progeny   0.927             0.002           0.882             0.003

The table shows that the ARM and the PS methods are similar if birth and rearing ranks are known. Because ARM uses rounded average birth and rearing ranks, when these are estimated, it performs worse than the PS method (particularly for progeny). Further results are presented for PS in preference to ARM. The UQPS method gives similar results to the PS method. This indicates that the weighting scheme employed has not been successful in improving the results per breeding value analysis. While this does not preclude the existence of a method that uses prior family size information for improving the process, further results for the UQPS method are not shown here. The FP method performed slightly worse than the BP method (even though it obtained a greater proportion of correct parentage assignments; not shown). Therefore further results are shown for BP in preference to FP, as an example of a 'single pedigree' method. Further results are shown only for the situation where birth and rearing rank is estimated.
The accuracy of estimated breeding values is the correlation between them and the true breeding values (which are known within the simulations). Accuracies are shown for the three groups of animals and for known and partial pedigrees below:

Accuracy
Pedigree  Group     Accuracy  Standard error
TRUE      Sires     0.733     0.025
TRUE      Mothers   0.312     0.015
TRUE      Progeny   0.589     0.012
PS        Sires     0.678     0.033
PS        Mothers   0.231     0.015
PS        Progeny   0.567     0.013
BP        Sires     0.655     0.032
BP        Mothers   0.202     0.013
BP        Progeny   0.536     0.011

Genetic progress is proportional to the square of the accuracy of the estimated breeding value. The ratio of the squared accuracies (pedigree unknown to pedigree known) gives the proportion of genetic progress that can be made with unknown pedigrees compared to that with known pedigrees.
Relative genetic gain
Group     PS    BP
Sires     86%   80%
Mothers   55%   42%
Progeny   93%   83%

These results show that the "partial pedigree" method we have described for estimating breeding values allows much of the genetic progress that could have been made had the true parents been known. Relative genetic gain results for the ARM and PS methods are similar if birth / rearing rank is known (both 95% for progeny), but PS performs better than ARM when birth / rearing rank is estimated (e.g. for progeny the values are 88% for ARM compared to 93% for PS). This confirms results shown above (correlation with true pedigree breeding value).
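The relative genetic gain figures follow directly from the tabulated accuracies. The short Python sketch below recomputes the squared-accuracy ratio for the PS and BP pedigrees against the true pedigree; the dictionary layout and the function name `relative_gain` are illustrative assumptions, not part of the specification.

```python
# Accuracies of estimated breeding values, copied from the accuracy table
# (estimated birth/rearing rank). The data structure is illustrative only.
ACCURACY = {
    "TRUE": {"Sires": 0.733, "Mothers": 0.312, "Progeny": 0.589},
    "PS":   {"Sires": 0.678, "Mothers": 0.231, "Progeny": 0.567},
    "BP":   {"Sires": 0.655, "Mothers": 0.202, "Progeny": 0.536},
}


def relative_gain(method, group):
    """Ratio of squared accuracies: `method` pedigree versus true pedigree."""
    return (ACCURACY[method][group] / ACCURACY["TRUE"][group]) ** 2


for group in ("Sires", "Mothers", "Progeny"):
    ps = relative_gain("PS", group)
    bp = relative_gain("BP", group)
    # Format as whole percentages, as in the relative genetic gain table.
    print(f"{group:8s} PS {ps:.0%}  BP {bp:.0%}")
```

Rounded to whole percentages, these ratios agree with the tabulated values for all three groups.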
Many factors may affect the amount of genetic progress achieved using DNA markers. These include the number of DNA markers used, missing animals and genotypes from the dataset, and the heritability of the trait under test. In order to assess the impact of these factors on genetic progress, a number of simulation runs were performed. Results are presented as the degree of genetic progress achieved in progeny using DNA pedigrees compared to the true pedigree.
Increasing the number of DNA markers from 6 to 8 gives a 3% lift in the expected genetic gain achieved relative to true pedigrees. Similarly, by decreasing the number of missing genotypes in the mothers and progeny from 10% missing to 5%, a relative genetic gain of 95% is achieved with the 6 DNA markers.

Number of markers   Relative genetic progress in progeny
4 markers           88%
6 markers           93%
8 markers           96%

The parentage analysis used assumes all animals are present and available for the analysis. Practically speaking, in a farm situation some animals may be unrecorded, and unavailable for inclusion in the analysis. Whilst the effect of 5% to 10% of mothers missing does not greatly impact on the overall result, if 1 of the 10 sires used is absent from the analysis, the relative genetic gain achieved using DNA pedigrees, compared to true pedigrees, drops from 93% to 91%. This reflects the importance of the sires, where 1 animal is the parent of a large number of progeny.

                    Relative genetic progress in progeny
all sires present   93%
1 sire missing      91%
% dams missing      93%
% dams missing      94%

To assess the effects of heritability (h²) on the estimation of DNA breeding values, genetic values were generated and estimated using a range of h² values. The mean value and residual SD of the trait remained at 40 and 8.4 respectively, with the same adjustments for birth/rearing rank. As the heritability of the trait increases, the difference in genetic progress between the true pedigree as known from simulation and the DNA pedigree lessens. This is expected, as breeding values depend less on relatives as the heritability increases.
Heritability   Relative genetic progress in progeny
0.1            91%
0.3            93%
0.5            97%

While the process has been modeled and used for one trait, the same process can be applied to any trait which can be measured objectively in an organism and has a hereditary component. Genetic progress is made by selectively breeding from animals identified as having a higher or more desirable breeding value. In this way the next generation will be on average improved for the trait in question. The more accurate the estimated breeding value, the faster the genetic progress, all other things being equal.

The progress is especially high for the groups of animals which are usually most intensively selected, i.e. sires and progeny. The results also show that a simplified method ("best pedigrees") could be useful in this context, but at the cost of lower genetic gain than with the partial pedigree method.
The results presented compare the gain against a "gold standard", i.e. perfect pedigree recording. Pedigree recording errors in practical farming systems mean that this "gold standard" is seldom, if ever, achieved. A number of reports have investigated the level of pedigree errors, and some of these are listed below.
There have also been a number of reports investigating the reduction in genetic gain due to errors in pedigree recording. For example, Israel and Weller (2000, J Dairy Science 83:181-187) predicted a loss of 4% gain with 10% incorrect sire identification when modeling selection in dairy cattle, while Banos et al. (2001, J Dairy Science 84:2523-2529) estimated a reduction of 11 to 15% in a similar situation, but incorporating international comparisons.
While the loss in genetic gains due to pedigree errors will depend on the precise nature of genetic selection, it would appear that the gains with the proposed marker-based methods relative to what can practically (due to pedigree errors) be achieved would be up to 10% higher than those tabulated above.
The genetic progress that could be made if there was no pedigree recording at all has not been simulated. This is expected to be 0% for sires and mothers, and <80% for progeny (would be 80% if fixed effects did not need to be estimated in the process).

Type                               Estimated pedigree errors  Reference
Sheep, New Zealand                 1-15%                      Crawford et al. (1993) Proc NZ Society of Animal Production 53: 363-366.
Sheep, New Zealand                 %                          Welch & Kilgour (1971) Proc NZ Society of Animal Production 31: 41.
Sheep, Australia                   12%                        Alexander et al. (1983) Aust J Experimental Agriculture Animal Husbandry 23: 361-368.
Sheep, USA                         14%                        Wang & Foot (1990) Theriogenology 34: 1079-1085.
UK dairy cattle - sire assignment  %                          Visscher et al. (2002) J Dairy Science 85: 2368-2375.
Dairy cattle - sire assignment.    11%                        Banos et al. (2001) J Dairy Science 84: 2523-2529.
  Average of surveyed reports.
Alternative methods

There are several places where the proposed procedures could be modified or extended. In the simulations we have rounded birth ranks and rearing ranks for use in the genetic evaluation. A similar method could also be used for other effects used in genetic evaluation and depending on pedigree records, for example date of birth (from pregnancy scan if otherwise unavailable) and age of mother. There are other ways that any of these could be used in the genetic evaluation. Rather than rounding, the estimated values could be used directly by fitting polynomials or splines to these values.
The best pedigree method described here has made no use of measures of confidence (e.g. ratio of probability of the best to the second best pair of parents) in the parentage chosen. It could be possible to improve this simple method by excluding cases where there is another possible parentage with a similar probability.
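A minimal sketch of such a confidence filter is given below in Python. The function name `best_pedigree`, the input layout and the threshold `min_ratio` are hypothetical choices made for illustration; the specification does not prescribe a particular threshold.

```python
def best_pedigree(candidates, min_ratio=2.0):
    """Pick the most probable parent pair, or None if it is not clearly best.

    candidates: dict mapping (sire, dam) -> parentage probability.
    min_ratio:  hypothetical threshold on the ratio of the best to the
                second-best probability (an assumption, not from the source).
    """
    ranked = sorted(candidates.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    best_pair, best_p = ranked[0]
    if len(ranked) == 1:
        return best_pair
    second_p = ranked[1][1]
    # Exclude the animal when another parentage has a similar probability.
    if second_p > 0 and best_p / second_p < min_ratio:
        return None
    return best_pair
```

Animals for which the function returns None would be left without an assigned parentage in the single-pedigree evaluation, rather than being given a parentage that is only marginally more likely than an alternative.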
The example shows a method where prior information on family size distribution is used. This method did not appear to improve the procedure. There may be alternative weighting schemes, or alternative use of the weights (e.g. a weighted mean of breeding values), which would improve the procedure. In some situations there may be prior information on sire family sizes, or on conception date ranges for particular sires. The method could be extended to utilize such information.
When survival is a trait of interest, and if dead offspring have not been sampled, or it cannot be guaranteed that all have been DNA sampled, records for dead animals with estimated parentage need to be created. This can be achieved within a sampled set of parentages, by creating offspring for mothers with fewer lambs assigned to them than their litter size (from pregnancy scan data). Sires for these offspring can be assigned according to other DNA sampled offspring assigned to that mother from that mating and sampled set, or if unavailable equally (or proportionally according to mating success if known) across all sires from that mating group.
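The record-creation step can be sketched as follows in Python. This is an illustration under stated assumptions only: the function name, the input dictionaries and the representation of sire assignment as a set of weights are all hypothetical, and the option of weighting by known mating success is omitted.

```python
def create_dead_offspring(litter_size, assigned_sires, mating_group):
    """Create placeholder records for unsampled (dead) offspring.

    litter_size:    dict dam -> litter size from pregnancy scan data.
    assigned_sires: dict dam -> list of sires of the DNA-sampled offspring
                    assigned to that dam in the current sampled parentage set.
    mating_group:   dict dam -> list of all sires in the dam's mating group.
    Returns a list of (dam, {sire: weight}) tuples, one per missing offspring.
    All names and structures here are illustrative assumptions.
    """
    records = []
    for dam, size in litter_size.items():
        sires_seen = assigned_sires.get(dam, [])
        n_missing = max(size - len(sires_seen), 0)
        for _ in range(n_missing):
            if sires_seen:
                # Follow the sires of the dam's other sampled offspring.
                weights = {s: sires_seen.count(s) / len(sires_seen)
                           for s in set(sires_seen)}
            else:
                # No sampled offspring: spread equally over the mating group.
                group = mating_group[dam]
                weights = {s: 1 / len(group) for s in group}
            records.append((dam, weights))
    return records
```

For a dam scanned with twins but with one lamb assigned, one extra record is created carrying the sire of the assigned lamb; a dam with no assigned lambs receives records with equal weight on every sire in her mating group.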
In some situations it may not be possible to determine and/or DNA sample the complete set of possible parents, or it may not be done with certainty. The set of possible parents for which likelihoods are calculated would then include pairs where one or both parents were unknown. In the former case, the calculations of Marshall et al. (1998, Molecular Ecology 7: 639-655) are used; in the latter case the relative likelihood to be used is one. A more extreme case of this situation is where one sex of parents is not DNA sampled (commonly the mothers) and it is the relationship to only the other sex of parents that is to be considered.

Summary

The inventors have developed a number of methods for estimating breeding values which share the first three steps of the parentage procedure, but differ in the manner in which the breeding values are calculated.
The sampling approaches detailed in methods 3 and 4 above arose due to computing limitations whereby flocks larger than 3000 could not currently be handled by the inverse matrix approach outlined in method 2. By taking sample sets, standard genetic evaluation software can be used, which can handle data sets of the size that are currently handled by such software. Although computing time may be increased over method 2 by the need to evaluate many sets of sampled pedigrees, this computing requirement can be dispersed across many computers. Therefore, larger evaluations can be calculated in the same amount of time as the previous procedure.
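The general shape of a sampling approach can be sketched as below in Python. This is not the inventors' implementation: drawing each parentage in proportion to its probability is an assumption, and `evaluate` is a hypothetical stand-in for a run of standard genetic evaluation software that returns a breeding value per animal for one fully specified pedigree.

```python
import random
from statistics import mean


def sample_pedigree(parent_probs, rng):
    """Draw one (sire, dam) pair per offspring from its parentage probabilities.

    parent_probs: dict offspring -> dict (sire, dam) -> probability.
    """
    pedigree = {}
    for offspring, probs in parent_probs.items():
        pairs = list(probs)
        weights = [probs[pair] for pair in pairs]
        pedigree[offspring] = rng.choices(pairs, weights=weights, k=1)[0]
    return pedigree


def averaged_breeding_values(parent_probs, evaluate, n_samples=50, seed=0):
    """Average breeding values over several sampled pedigrees.

    evaluate: hypothetical callable taking a sampled pedigree and returning
              a dict animal -> estimated breeding value.
    """
    rng = random.Random(seed)
    runs = [evaluate(sample_pedigree(parent_probs, rng))
            for _ in range(n_samples)]
    return {animal: mean(run[animal] for run in runs) for animal in runs[0]}
```

Because each sampled pedigree is evaluated independently of the others, the calls to `evaluate` could be run on separate computers and only the resulting breeding values combined.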
Another advantage of the sampling methods 3 and 4 is the ability to continue to use the system for the following year's cycle for the animals and for the next generation, without the need to move to the larger computing capacity required by method 2. The advantage lies in the use of standard software and in the number of iterations needed to reach the accuracy required commercially.
Method 2 has the advantage that it is more useful and versatile for smaller herds of animals. With advances in computing it is anticipated that the inverse matrix approach could be used for flocks larger than 3000 and it is anticipated that alternative systems would likely make use of an inverse matrix approach.
Example 2

This example is as for example 1, except that one of the parentage markers is, or is linked to, a quantitative trait locus (QTL) and the association between marker and QTL alleles in the parent generation is known. Two situations are investigated by extensions to the simulations described in example 1. The first is genotype assisted selection (GAS) where the marker confers knowledge of the QTL genotype, and the number of favourable QTL alleles is used as a covariate in the genetic analysis. Final genetic value is found by adding the estimated value of the covariate times the number of favourable QTL alleles to the estimated polygenic (non-QTL) breeding value. The second situation is marker assisted selection (MAS). In this case the sire's QTL genotype and association with the linked marker genotype is assumed to be known. Marker inheritance from sire to offspring is used to estimate the number of favourable QTL alleles (EQTL; between zero and one) passed from the sire to the offspring. EQTL is then used as a covariate in the genetic analysis. The final genetic value is found similarly to the method used for GAS.
Parameters of the simulation were as described above (for example 1), with the addition of a QTL with an additive effect of one residual standard deviation (8.4), at a recombination fraction of 0.01 from the linked marker (relevant for MAS only).
The relative (to knowing the true pedigree) genetic progress when selecting progeny was 99% for GAS, and 85% for MAS. GAS does not rely on parentage for determining the QTL genotype, and therefore progress in this component of the genetic value is the same regardless of pedigree assignment. MAS does rely on parentage (only the sire for this simulation) to provide information on the progeny QTL genotype, and therefore is more affected by the ability to assign parentage.
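The way the covariate enters the final genetic value in both situations can be sketched as follows in Python. The function names are hypothetical, and the EQTL calculation assumes a sire that is heterozygous at both the QTL and the linked marker with known phase, which is one simple case consistent with the description rather than the general calculation.

```python
def expected_sire_qtl(inherited_marker, favourable_marker, r=0.01):
    """Expected number (0 to 1) of favourable QTL alleles passed by the sire.

    Assumes the sire is heterozygous at the QTL and the linked marker, with
    `favourable_marker` on the same chromosome as the favourable QTL allele.
    r is the recombination fraction between marker and QTL.
    """
    if inherited_marker is None:
        # Marker inheritance unknown: either sire allele is equally likely.
        return 0.5
    return 1 - r if inherited_marker == favourable_marker else r


def final_genetic_value(polygenic_ebv, covariate_estimate, n_favourable):
    """Polygenic breeding value plus covariate estimate times allele count."""
    return polygenic_ebv + covariate_estimate * n_favourable
```

For GAS, `n_favourable` would be the observed number of favourable QTL alleles; for MAS, it would be the EQTL value, which depends on the sire assigned to the offspring and is therefore sensitive to errors in parentage assignment.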
Example 3

The procedure may be applied to any species provided polymorphic genetic markers are available, and the set of possible parents can be determined and DNA-sampled. The following table gives references, for a variety of animal species, for sets of markers that could possibly be used in this setting. In most cases the reference is to sets of markers that have been used to create linkage maps for that species.
Species        Reference
Sheep          Crawford et al. (1995) Genetics 140: 703-724.
Cattle         Barendse et al. (1997) Mammalian Genome 8: 21-28.
Pig            Archibald et al. (1995) Mammalian Genome 6: 157-175.
Goat           Vaiman et al. (1996) Genetics 144: 279-305.
Deer           Slate et al. (2002) Genetics 160: 1587-97.
Horse          Guerin et al. (1999) Animal Genetics 30: 341-54.
Chicken        Levin et al. (1994) Journal of Heredity 85: 79-85.
Turkey         Burt et al. (2003) Animal Genetics 34: 399-409.
Mouse          Dietrich et al. (1994) Nature Genetics 7: 220-245.
Rat            Yamada et al. (1994) Mammalian Genome 5: 63-83.
Cat            Menotti-Raymond et al. (1999) Genomics 57: 9-23.
Dog            Werner et al. (1999) Mammalian Genome 10: 814-823.
Baboon         Rogers et al. (2000) Genomics 67: 237-247.
Salmon         Naish and Park (2002) Animal Genetics 33: 316-318.
Rainbow Trout  Sakamoto et al. (2000) Genetics 155: 1331-1345.
Catfish        Waldbieser et al. (2001) Genetics 158: 727-734.
Simulation of a fish breeding scheme.
The breeding population parameters were chosen to be within the range suggested by Bentsen and Olesen (2002, Aquaculture 204:349-359), being 50 parent families with 50 progeny per family. Each grandparent cohort consisted of 30 families, having 20 progeny each. Each family had distinct parents, and for parentage assignment, it was assumed that the mating pairs were known. A panel of six markers was used, with frequencies taken from those estimated for the Stuart population of Chinook salmon from the study of Beacham et al. (2003, Fishery Bulletin 101:243-259). The first six loci scored in this study were used. The trait was simulated to be unaffected by any fixed effects, although sex was fitted as a fixed effect in the genetic evaluation. The sampling system outlined in example 2 was used for genetic evaluation. For this example, 50 samples were used for averaging to obtain breeding values, and the process was replicated 10 times.
For this simulation, the relative genetic progress in the progeny was 99%. The high discriminating power of the marker set used has contributed to this value being so high.
Aspects of the present invention have been described by way of example only and it should be appreciated that modifications and additions may be made thereto without departing from the scope thereof as defined in the appended claims.