EP1265476A2 - Mqm mapping using haplotyped putative qtl-alleles: a simple approach for mapping qtl's in plant breeding populations - Google Patents

Mqm mapping using haplotyped putative qtl-alleles: a simple approach for mapping qtl's in plant breeding populations

Info

Publication number
EP1265476A2
EP1265476A2 EP00989407A EP00989407A EP1265476A2 EP 1265476 A2 EP1265476 A2 EP 1265476A2 EP 00989407 A EP00989407 A EP 00989407A EP 00989407 A EP00989407 A EP 00989407A EP 1265476 A2 EP1265476 A2 EP 1265476A2
Authority
EP
European Patent Office
Prior art keywords
qtl
plant
haplo
mqm
haplotype
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00989407A
Other languages
German (de)
French (fr)
Inventor
Ritsert C. Jansen
William D. Beavis
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Pioneer Hi Bred International Inc
Original Assignee
Pioneer Hi Bred International Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pioneer Hi Bred International Inc

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82: Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241: Phenotypically and genetically modified plants via recombinant DNA technology
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10: Ploidy or copy number detection
    • A: HUMAN NECESSITIES
    • A01: AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H: NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H1/00: Processes for modifying genotypes; Plants characterised by associated natural traits
    • A01H1/04: Processes of selection involving genotypic or phenotypic markers; Methods of using phenotypic markers for selection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/40: Population genetics; Linkage disequilibrium

Definitions

  • The present invention relates to the field of plant breeding.
  • Mapping of QTLs using statistical models to correlate haplotypes with phenotypic traits is known in the art.
  • This experimental protocol involves deriving 100 to 300 segregating progeny from a single cross of two divergent inbred lines. The segregating progeny are genotyped for multiple marker loci and evaluated for one to several quantitative traits in several environments. QTLs are then identified as significant statistical associations between genotypic values and phenotypic variability among the segregating progeny.
  • The strength of this experimental protocol comes from the use of the inbred cross, because the resulting F1 parents all have the same linkage phase. Thus, after selfing of the F1 plants, all segregating progeny (F2) are informative and linkage disequilibrium is maximized.
  • the present invention provides a novel and powerful method for mapping phenotypic traits in multiple related plant families. Central to this method is the clustering of the original parents into groups on the basis of their haplotype for multiple genetic markers. The effect of a putative genetic locus is then modeled per haplotype group, instead of per family of progeny, using statistical models which correlate the haplotype with numerical values assigned to the phenotypic trait.
  • Embodiments of the invention provide for methods of mapping phenotypic traits to a corresponding chromosomal location.
  • the invention provides for mapping phenotypic traits of agronomic importance, including such properties as yield, grain composition, and insect, disease and drought resistance.
  • Progeny of multiple related crosses are assigned numerical values corresponding to a phenotypic trait, and their genotype at clusters of genetic marker loci, i.e., their haplotype, is ascertained. Using statistical methods, correspondence between the haplotype and the phenotypic value is determined. In preferred embodiments, QTL are mapped.
  • While the methods of the invention can be practiced in essentially any plant population, in preferred embodiments the methods are used to map phenotypic traits in corn, soybean, sunflower, sorghum, wheat, rice and canola.
  • The related parents used to generate progeny are inbred lines. In a preferred embodiment the lines are between 0 and 85% related. In one embodiment, the progeny are derived from a topcross and/or a backcross.
  • The statistical method utilized accounts for identical-by-descent (IBD) data derived by correlating pedigrees and haplotypes for the genetic markers under evaluation.
  • The model is selected from among a HAPLO-IM+ model, a HAPLO-MQM model and a HAPLO-MQM+ model.
  • the invention provides for the use of molecular genetic markers to define genetic haplotypes.
  • The markers are restriction fragment length polymorphisms (RFLP), isozyme markers, allele specific hybridization (ASH), amplified variable sequences of the plant genome, self-sustained sequence replication, simple sequence repeats (SSR), single nucleotide polymorphisms (SNP), or arbitrary fragment length polymorphisms (AFLP).
  • RFLP: restriction fragment length polymorphism
  • ASH: allele specific hybridization
  • SSR: simple sequence repeat
  • SNP: single nucleotide polymorphism
  • AFLP: arbitrary fragment length polymorphism
  • the haplotypes are determined by high throughput screening methods.
  • one or more steps of the method is performed with computer assistance.
  • Another aspect of the invention provides for the selection of phenotypic traits in a plant breeding population, as well as plants selected using the methods of the invention.
  • the selection is performed
  • the invention further provides for the cloning of a nucleic acid sequence or fragment in linkage disequilibrium with a phenotypic trait, and for transducing the cloned nucleic acid sequence or fragment into a plant.
  • the nucleic acid is operably linked to a promoter in an expression cassette.
  • FIG. 1 A line drawing which graphically depicts relationships between multiple related crosses used in plant breeding. D indicates donor parents; E indicates elite parents; P indicates progeny; a shaded box indicates a bi-parental cross made; a solid arrow indicates a parent-offspring relation; a dashed arrow indicates a half-sib relation; and a dotted arrow indicates an indirect relation.
  • FIG. 2 Diagram of the interval mapping (IM) and multiple-QTL mapping (MQM) methods. 1 indicates QTL-allele substitution-effect per family; 2 indicates QTL-haplotype effects explain within-family variance; and 3 indicates QTL-haplotype effects explain within- and between-family variances.
  • Figure 3 a and b A schematic depiction of haplotypes in a "window" of four markers for parents with identical and different haplotypes, respectively. Numbers indicate type of marker allele.
  • Figure 4. A line graph showing a comparison of the IM, HAPLO-IM+, HAPLO-MQM and HAPLO-MQM+ models for a single chromosome (simulation 2.1).
  • the present invention provides a novel and powerful method for joint analysis of multiple related families.
  • the key to increasing the power of the analysis as compared to the prior art, is the clustering of the original parents into groups on the basis of their haplotype.
  • the effect of a QTL on the phenotype is then modeled per haplotype group instead of per family. This permits an examination of the effects of haplotype-alleles across families.
  • Simulations of realistic plant breeding schemes demonstrate a significant increase in power of QTL detection compared to existing methods.
  • the present invention offers new opportunities for the mapping and exploitation of QTLs in commercial breeding activities.
  • The term "phenotypic trait" refers to an observable trait of an organism.
  • The phenotypic trait can be observable to the naked eye, or by any other means of evaluation known in the art, e.g., microscopy, biochemical analysis, etc.
  • In some cases, a phenotype is directly controlled by a single gene, i.e., it is a "single gene trait."
  • In other cases, a phenotype is the result of several "quantitative trait loci" acting together.
  • Such a phenotype can generally be described in quantitative terms, e.g., height, weight, oil content, days to germination, etc., and therefore can be assigned a "phenotypic value" which corresponds to a quantitative value for the phenotypic trait.
  • The term "genotype" refers to the genetic constitution, as contrasted with the observable trait, i.e., the phenotype.
  • A genotype is an individual's genetic make-up for all the genes in its genome (chromosome complement).
  • A "haplotype" is an individual's genotype at multiple, generally linked, loci.
  • A haplotype can be an individual's genotype for multiple loci or genetic markers on a single chromosome.
  • The term "chromosomal haplotype" is, alternatively, used.
  • An individual's genotype for multiple loci (or markers) within a defined region of a chromosome is, optionally, referred to as a "regional haplotype."
  • Genetic markers are loci, or DNA sequences, which both vary (are polymorphic) between individuals in a population and can be detected by one or more analytic methods, e.g., RFLP, AFLP, isozyme, SNP, SSR, and the like.
  • MAS Marker Assisted Selection
  • Correspondence, in the context of the invention, refers to a genetic marker locus or loci in linkage disequilibrium with a phenotypic trait.
  • A genetic marker is said to be in "linkage disequilibrium" with a gene or genes that control part or all of the variance for a trait (or locus) when the marker and the gene that controls the variance of the trait (or locus) do not segregate independently, i.e., they are inherited together more often than is expected by chance.
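The definition above can be made concrete with the classical disequilibrium coefficient D = P(AB) - P(A)P(B), which is zero when two loci segregate independently. The sketch below is only an illustration of that standard quantity; the function name and the toy haplotype lists are assumptions, not part of the described method.

```python
from collections import Counter

def linkage_disequilibrium(haplotypes, allele_a="1", allele_b="1"):
    """Estimate D = P(AB) - P(A) * P(B) from (allele at locus A, allele at locus B) pairs."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    counts = Counter(haplotypes)
    p_ab = counts[(allele_a, allele_b)] / n
    p_a = sum(c for (a, _), c in counts.items() if a == allele_a) / n
    p_b = sum(c for (_, b), c in counts.items() if b == allele_b) / n
    return p_ab - p_a * p_b

# Alleles always inherited together: D takes its maximum value for equal frequencies.
linked = [("1", "1")] * 50 + [("2", "2")] * 50
# All four two-locus haplotypes equally frequent: the loci are independent.
independent = [("1", "1"), ("1", "2"), ("2", "1"), ("2", "2")] * 25

d_linked = linkage_disequilibrium(linked)
d_independent = linkage_disequilibrium(independent)
```

With the toy data, `d_linked` is at the upper bound for two alleles at frequency 0.5 and `d_independent` is zero.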
  • The F1 is fully informative, linkage disequilibrium is maximized, the linkage phase is known, there are only two QTL alleles, and, except for backcross progeny, the frequency of each QTL allele is 0.5.
  • The selection of lines for breeding is based on maximizing genetic variability of traits useful for agronomic performance.
  • The crosses are not necessarily informative at all marker loci and QTLs; linkage disequilibrium exists among the (F2) progeny within families, but not necessarily across the breeding population.
  • The linkage phase is not consistent across the breeding population; multiple QTL alleles can exist, and the frequency of each will vary between 0 and 1.
  • a plant breeding population usually consists of multiple related instead of unrelated families.
  • higher mapping resolution and power can be obtained by methods properly including these relationships into the model.
  • Current technologies are making such detailed information about relationships readily available. It is not unrealistic to assume that plant-breeding populations will be fingerprinted on a regular basis at 200-500 marker loci, and with chip technology soon at 1000 or more loci. Marker information can monitor identity-by-descent (IBD) from parent to offspring throughout the populations.
  • Figure 1 shows a typical design of multiple related crosses. There are clear and direct half-sib relationships between families if the same line is used as parent in two or more different crosses.
  • IBD identical-by-descent
  • The present invention exploits this kind of IIS and IBD information.
  • We previously developed the MQM approach for analyzing experimental populations (Jansen (1996) supra; and Beavis, PCT Application WO 99/32661, published January 7, 1999, QTL MAPPING IN PLANT BREEDING POPULATIONS, herein incorporated in their entirety).
  • the present invention for mapping phenotypic traits in plant breeding populations significantly extends the previous methodologies and is described in detail herein.
  • One important feature of the methods of the present invention is the clustering of the original parents into groups on the basis of their haplotype for multiple genetic markers.
  • the effect of a putative genetic locus is then modeled per parental haplotype group, instead of per family of progeny as previously performed.
  • other existing QTL analysis methods e.g., frequentist or Bayesian analysis procedures, with fixed, random, or mixed QTL effects, etc., can also be utilized.
  • Figure 1 illustrates a typical design of multiple related crosses.
  • Figure 2 provides an overview of the methods of analysis.
  • A single F2:3 test-cross population: An F2:3 population is, in short, a standard F2 population in which the phenotypic scores of F2 individuals are obtained not by evaluation of the F2 plants themselves but rather by evaluation of F3 offspring of the F2 plants. Each F2 individual is crossed with the same homozygous tester to generate a number of F3 offspring per F2 plant, and those F3 plants are evaluated for traits of interest. Like a standard F2, each F2 plant has genotype and phenotype scores. But in an F2:3 the trait value of an F2 plant is computed as the average trait value of its F3 testcross offspring.
  • The F2 population is a mixture of the three QTL genotypes: Q1Q1, Q2Q2 and Q1Q2.
  • Each F2 plant is crossed to a tester QQ to generate an F2:3 topcross.
  • Table 1 shows two characteristics important for QTL modeling in this testcross design, namely (a) only heterozygous F2 plants generate segregating F3 offspring and (b) the expected trait value of a heterozygous F2 plant is halfway between the expected trait values of the homozygous F2 plants.
  • The latter implies that QTL models will not contain parameters for dominance.
  • The former implies that we can expect heterogeneous residual variance.
  • IM was developed for analyzing a single family obtained from one bi-parental cross (Lander and Botstein (1989) Genetics 121:185). In the case of multiple families, one can analyze each family separately by IM and sum up the QTL likelihood over the families. This straightforward approach does not model QTL activity across families (in statistical terms, QTLs are nested within families).
  • The allele effects ($a_1$ and $a_2$ in Table 1) need to be indexed by family number: $a_{1f}$ and $a_{2f}$.
  • This F2 plant can obtain 0, 1 or 2 copies of the QTL allele of the first parent, $Q_1$. In the first case it has received two copies of allele $a_{2f}$ from the second parent and the model reads
  • Let $N_{fam}$ be the number of families. QTLs are nested within families and in total there are $2N_{fam}$ regression parameters and $N_{fam}$ residual variance parameters.
  • Let $N_{QTL}$ be the number of QTLs.
  • We use $a_{1f}$ and $a_{2f}$ to denote the allele effects of a single QTL in family f.
  • The QTL model for a given F2 plant in family f reads
  • Haplotypes are now numbered from 1 to $N_{Haplo}(i)$. Let $h(1f)$ and $h(2f)$ indicate the haplotype of the allele from the first and second parent, respectively. Then $a_{h(1f)}$ and $a_{h(2f)}$ indicate the allele type after clustering into haplotype alleles. This will depend on the size of the window, with larger windows having more different haplotypes.
  • The models now include a parameter for any between-family differences not yet accounted for.
  • $N_{QTL} \times N_{fam}$ effects $a_{1f}(i)$ and $a_{2f}(i)$, but in HAPLO-IM+ and HAPLO-MQM+ they are 'mapped' to the smaller set of $N_{Haplo}(i)$ parameters for the effects of the haplotype clusters (Fig. 2).
  • The model includes $N_{Haplo}(i) - 1$ free parameters for the effect of the i-th QTL, and this number can be significantly lower than the $N_{fam}$ parameters in the "full parameterization" with QTLs nested within families as in IM and MQM. The number of QTL parameters can be reduced without loss of information.
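One way to picture the mapping from family-specific allele effects to haplotype-cluster effects is as a design row that counts, for each F2 plant, how many copies of each haplotype allele it carries. The sketch below is a hypothetical encoding chosen for illustration: the function names, 0-based cluster indices and effect values are assumptions, and no identifiability constraint (which would leave one fewer free parameter) is applied.

```python
def haplotype_design_row(h1, h2, copies_of_first, n_haplotypes):
    """Counts of each haplotype allele carried by one F2 plant.

    h1, h2: haplotype-cluster indices (0-based) of the first and second parent.
    copies_of_first: number of QTL alleles (0, 1 or 2) inherited from the first parent.
    """
    if copies_of_first not in (0, 1, 2):
        raise ValueError("copies_of_first must be 0, 1 or 2")
    row = [0] * n_haplotypes
    row[h1] += copies_of_first
    row[h2] += 2 - copies_of_first
    return row

def expected_genotypic_value(row, effects):
    """Sum of haplotype-allele effects weighted by the number of copies carried."""
    return sum(count * effect for count, effect in zip(row, effects))

# Two families share haplotype cluster 0 through a common parent, so the same
# effect parameter is reused across families instead of one parameter per family.
row_family_1 = haplotype_design_row(h1=0, h2=1, copies_of_first=1, n_haplotypes=3)
row_family_2 = haplotype_design_row(h1=0, h2=2, copies_of_first=2, n_haplotypes=3)
# Both parents in cluster 2: the QTL does not segregate and every plant carries two copies.
row_fixed = haplotype_design_row(h1=2, h2=2, copies_of_first=1, n_haplotypes=3)
```

Under this encoding a non-segregating family always contributes two copies of the same haplotype allele, whatever the inheritance pattern of the individual plant.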
  • The '+' in the name HAPLO-MQM+ indicates that parameters for family effects have been included. In the following, models without parameters for family effects are considered, and the method is indicated without '+' by 'just' HAPLO-MQM.
  • Analyzing multiple related F2:3 families via HAPLO-MQM:
  • the IBD concept is based on between-family genotypic information.
  • the next step is to additionally explore the potential of between-family phenotypic information.
  • The F1 of one family can be homozygous Q1Q1 for a given QTL; others can be homozygous Q2Q2.
  • Clearly non-segregation of (major) genes will show up only as different mean trait values of the families. In some cases there can be no other sources generating additional between-family variation.
  • The effects of QTL alleles can then be estimated from populations in which they are segregating, and twice their effect contributes to the mean of any family in which they are not segregating.
  • The multiple-QTL model for the phenotype y of a given F2 plant in family f is equivalent to the MQM model:
  • HAPLO-MQM exploits the within-family and between-family phenotypic information. HAPLO-MQM exploits the full power of QTL detection by combining the sources of information. Gain of power will be highest if between-family differences originate mainly from multiple additively acting and "detectable" QTLs only (see discussion).
  • the models include a parameter for the mean trait value of a population and a parameter for the effect of substitution of one allele by another allele within the population genetic background.
  • The IM analysis requires 60 parameters for the family means plus 60 parameters for QTL-allele substitutions, i.e., 120 parameters altogether.
  • The HAPLO-MQM method of the present invention is based on a different paradigm. The effects of haplotyped QTL-alleles, and not the effects of allele substitution within families, are evaluated across families.
  • The present invention provides models which can cope with QTLs segregating in only a subset of the families and which exploit within-family variation, but in addition also consider between-family variation.
  • The number of parents is 120, or less if some parents are used more than once.
  • IM and HAPLO-MQM require the same maximum number of parameters.
  • The allele effects of segregating and non-segregating QTLs contribute to the differences between families, but there can also be other genetic and non-genetic sources of variation (e.g., epistatic interactions).
  • The HAPLO-MQM+ model includes parameters to account and test for these differences. Note that these additional parameters do not play the role of "mean value of families" as in IM. In fact, they quantify the deviance between the observed trait values and the predicted trait values under the HAPLO-MQM model. This deviance is a measure for how well the sum of estimated allele effects can explain the between-family differences.
  • IM and HAPLO-MQM in no-QTL regions: In the present IBD approach, the parents are clustered into fewer than 120 groups on the basis of their haplotype in a window of, e.g., four markers around the QTL position under study. The same approach is carried out for each marker cofactor (each time leading to different groupings). The number of parameters per QTL (or marker cofactor) is equal to the number of different haplotypes.
  • The models of HAPLO-MQM required ~15 parameters in the 1st set of simulations, ~15 in the 2nd set and ~37 in the 3rd set. In each set of simulations IM takes 60 allele-substitution parameters per QTL.
  • The computed QTL likelihood is expected to be on average ~15 and ~37 under HAPLO-MQM, and ~60 under IM.
  • Table 3 and Figure 4 show that this "background" likelihood often takes values in the predicted ranges.
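A background level close to the number of parameters is what one would see if the QTL likelihood behaves like a likelihood-ratio statistic that is approximately chi-square distributed in regions without a QTL, because a chi-square variable with k degrees of freedom has mean k. That reading is an assumption of this sketch, not a statement from the text; the function name and replicate count are likewise illustrative.

```python
import random

def mean_null_statistic(n_params, n_reps=2000, seed=0):
    """Monte Carlo mean of a chi-square statistic with n_params degrees of freedom,
    built as a sum of squared standard normal deviates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        total += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_params))
    return total / n_reps

# Parameter counts per QTL quoted in the text: about 15 and 37 for HAPLO-MQM, 60 for IM.
background = {k: mean_null_statistic(k) for k in (15, 37, 60)}
```

Each simulated mean lands near its parameter count, which illustrates why a model with fewer parameters per QTL has a lower no-QTL baseline and hence a lower significance threshold.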
  • An important consequence is that the threshold for genome-wide significance in HAPLO-MQM is much lower than that in multi-family IM. This increases the power of QTL detection using the HAPLO-MQM methods of the invention.
  • The HAPLO-MQM approach requires ~15 parameters per QTL in the 1st and 2nd sets, and ~37 in the 3rd set, but the IM approach requires 60 parameters. Therefore IM and MQM are still highly over-parameterized. On the other hand, the additional parameters do not really cause problems: the model has the flexibility to fit the data well; IM can be more over-fitted.
  • Figure 4 allows us to compare the single-QTL approach of IM to that of HAPLO-IM+, which is a single-QTL HAPLO-MQM+ model with family effects included to eliminate effects of the other QTLs.
  • The QTL-likelihood peaks for IM and HAPLO-IM+ are approximately of the same height. However, the "background" likelihood is lower for the latter, which indicates that HAPLO-IM+ is the more powerful approach.
  • The power of QTL detection is determined by the ratio between the variance induced by the given QTL and the unexplained residual variance (Jansen 1994, 1996, both supra). In simulations provided for illustration: $\sigma^2_{QTL} / (\sigma^2_{QTL} + \sigma^2_a + \sigma^2_e)$. With IM, the genetic background QTL-effects are part of the unexplained variance.
  • The HAPLO-MQM+ model included 10-30 marker cofactors simultaneously.
  • $h^2_{QTL} = \sigma^2_{QTL} / (\sigma^2_{QTL} + \sigma^2_a + \sigma^2_e)$.
  • $\sigma^2_{QTL} / (\sigma^2_{QTL} + \sigma^2_a + \sigma^2_e) \approx 0.05$.
  • HAPLO-MQM+ successfully removed the 70%; $\sigma^2_{QTL} / (\sigma^2_{QTL} + \sigma^2_e) \approx 0.16$.
  • Simulation 2.3 with smaller proportion of QTL-induced variation: $\sigma^2_{QTL} / (\sigma^2_{QTL} + \sigma^2_e) \approx 0.09$.
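The effect of absorbing background variance into the model can be illustrated numerically. The variance shares below (0.05 for the QTL, 0.70 for the genetic background, 0.25 for error) are hypothetical values chosen only because they are roughly consistent with the ratios quoted above; the text does not state the individual components, and the function name is an assumption.

```python
def qtl_variance_ratio(var_qtl, var_error, var_background=0.0):
    """Share of variance due to one QTL: var_qtl / (var_qtl + var_background + var_error)."""
    return var_qtl / (var_qtl + var_background + var_error)

# Hypothetical variance components (assumed, see the paragraph above).
with_background = qtl_variance_ratio(0.05, 0.25, var_background=0.70)
background_removed = qtl_variance_ratio(0.05, 0.25)
```

Once cofactors account for the background component, the same QTL explains roughly three times as large a share of the remaining variance, which is the mechanism behind the gain in power.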
  • The HAPLO-MQM+ models use altogether ~200-600 parameters for QTLs, still leaving ~2400-2800 degrees of freedom for estimating residual error variance.
  • QTL-likelihood peaks are clearly higher for HAPLO-MQM+ than for IM in simulations 2.1 and 2.2 (Table 3b and Figure 4). Modeling of multiple QTLs can be a second step to increase the power of QTL detection.
  • HAPLO-MQM+ versus HAPLO-MQM
  • The present invention provides a third possibility for further increasing the QTL power: the use of between-family information in addition to within-family information. Families are usually not segregating for all QTLs involved. But that does not imply that the effect of those non-segregating QTLs cannot be detected: they generate differences between the mean values of the families. Therefore, the QTL-allele effects in our models should not only explain the within-family variation but should also capture the between-family variation. This new multiple-QTL model can meet these additional constraints. Of course, this approach works efficiently when most of the (unexplained) differences between families are indeed induced by additively acting QTLs. This can be tested by comparing the HAPLO-MQM and HAPLO-MQM+ models.
  • Markers in a window around the QTL position under study are used to group the parents into haplotype categories.
  • A relatively large window of four markers (Figure 3) was used. To some extent this was an arbitrary choice. With more markers, the windows would have become very large, given the sparse marker map of the simulations (200 markers at 2000 cM).
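The grouping step can be sketched as follows: parents whose marker alleles are identical within the window fall into the same haplotype category. The function name, the way the window is centered on the position, and the toy genotypes are assumptions made for illustration only.

```python
from collections import defaultdict

def cluster_parents_by_window(parent_genotypes, position, window=4):
    """Group inbred parents whose marker alleles are identical within a window.

    parent_genotypes: dict mapping parent name -> sequence of marker alleles.
    position: index of the marker nearest the putative QTL.
    window: number of consecutive markers used for haplotyping.
    """
    n_markers = len(next(iter(parent_genotypes.values())))
    # Center the window on the position, clipped to the ends of the chromosome.
    start = max(0, min(position - window // 2, n_markers - window))
    groups = defaultdict(list)
    for name, alleles in parent_genotypes.items():
        groups[tuple(alleles[start:start + window])].append(name)
    return dict(groups)

parents = {
    "P1": "1122121",
    "P2": "1122221",
    "P3": "2122121",
    "P4": "1212112",
}
groups_window_4 = cluster_parents_by_window(parents, position=3, window=4)
groups_all_markers = cluster_parents_by_window(parents, position=3, window=7)
```

With a four-marker window, P1 and P3 share a haplotype and three groups result; using all seven markers separates every parent, which corresponds to the one-to-one relation between haplotype and parent.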
  • all markers are used simultaneously for haplotyping, and a one to one relation between haplotype and parent is established.
  • the HAPLO-MQM model then includes as many haplotype effects as there are different parents and correctly models all half-sib relationships.
  • There are good reasons for using few(er) markers in haplotyping. Haplotyping on the basis of fewer markers tends to result in fewer haplotype classes, so that fewer QTL parameters are required. This increases the power of QTL detection and allows us to fit more complex models, e.g., with interactions.
  • A larger number of markers can be used in haplotyping, in particular if the marker map is dense. In this case it is assumed that two parents with identical haplotype in the window under study have the same QTL allele within this region. The probability that this is indeed true increases when the haplotyping is based on more markers, because more markers decrease the chance of erroneous grouping. It will be appreciated that there is an optimal balance, and it is likely that the optimum can change, e.g., when different marker densities are used, or when different types of marker are used. In breeding populations, bi-allelic markers (e.g., AFLPs) are expected to be less informative in fingerprinting than multi-allelic markers (e.g., microsatellites).
  • Less informative marker types are available at a higher map density, which indirectly achieves a high multilocus information content.
  • IBD identity-by-descent
  • IIS identical-in-state
  • the methods of the present invention focus on QTLs with fixed effects across populations. In reality, the effects of QTL alleles are modified by genetic background. With the current HAPLO-MQM models, the "average" allele effect across the population is estimated. In order to keep the number of parameters within reasonable bounds, one can extend the use of the HAPLO-MQM model as follows. Use a priori criteria, such as genetic distance, to classify families into sub-populations and include QTL x sub-population instead of QTL x family interactions as fixed or random effects in the models.
  • Genotype x environment interaction is a very important issue in breeding. It will be appreciated that any methodological concept developed for QTL x E interaction in IM or MQM models can also be applied to HAPLO-MQM. Furthermore, the likelihood of the HAPLO-MQM and that of the HAPLO-MQM+ can be compared to assess the amount of such interaction.
  • Computation:
  • the HAPLO-MQM models contain many parameters. Up to 30 QTLs with
  • The "QTL-allele breeding values" (Best Linear Unbiased Prediction) can then be predicted.
  • In marker-assisted selection, BLUPs are calculated for the breeding values of individuals. With HAPLO-MQM, selection occurs at the level of the QTL-allele predictions rather than at the level of the individual's predictions.
  • HAPLO-MQM can, in most cases, detect the QTLs and even dissect the linked ones. Higher genetic complexity (smaller heritability, more QTLs, tightly linked QTLs) is compensated by increasing the number of families and/or the progeny sizes per family. In any circumstance, the application of more powerful analysis tools, such as HAPLO-MQM, will improve the chance of successful dissection of the effects of linked and unlinked QTLs.
  • The genome for the simulations consisted of 10 linkage groups of 200 centiMorgan (cM) each, e.g., as in maize.
  • The genotype and phenotype data were generated in a number of steps. First, a base population of more or less related inbred lines was simulated, all lines belonging to one and the same heterotic group. Then, pairs of parents for multiple crosses were selected from the base population. Next, F2 offspring of the multiple crosses were generated and each F2 plant was testcrossed to a tester from another heterotic group to generate F2:3 offspring. Finally, phenotypic values were assigned to the segregating F2:3 and F3 progeny. Certain aspects of the simulation steps are described below in more detail.
  • The following (ad-hoc) protocol was used for generating a base population of inbred lines with different (re)combinations of "ancestral" linkage blocks.
  • The genome consisted of 10 linkage groups, each containing 101 bi-allelic marker loci with 2 cM of recombination between adjacent pairs per meiosis.
  • The ad-hoc procedure for generating linkage blocks is as follows: a set of 400 homozygous recombinant lines from the cross between hypothetical parents with genotypes 1111 (and so on) and 2222 (and so on) was simulated.
  • The genotype of a homozygous line consisted of linkage blocks of 1's and of 2's; the genotype will be an expression like 111211222 (and so on).
  • 400 doubled haploid lines were simulated using a recombination frequency of 0.02 and 0.2 between adjacent markers, respectively. Linkage disequilibrium between adjacent loci is 0.20 and 0, respectively.
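A minimal way to generate such lines is to walk along the linkage group and switch the ancestral label with the stated recombination frequency at each interval. This sketch treats each line as the product of a single meiosis with no interference, which is an assumption; the function name, seed and defaults are illustrative and the procedure used for the reported simulations may differ.

```python
import random

def simulate_line(n_markers=101, recomb=0.02, rng=None):
    """Simulate one homozygous line as a string of ancestral labels '1' and '2'."""
    rng = rng or random.Random()
    current = rng.choice("12")
    line = [current]
    for _ in range(n_markers - 1):
        # A recombination between adjacent markers switches the ancestral origin.
        if rng.random() < recomb:
            current = "2" if current == "1" else "1"
        line.append(current)
    return "".join(line)

rng = random.Random(1)
base_population = [simulate_line(rng=rng) for _ in range(400)]
no_recombination = simulate_line(recomb=0.0, rng=random.Random(0))
```

Lower recombination frequencies give longer runs of identical labels, i.e., larger linkage blocks and stronger disequilibrium between adjacent loci.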
  • The next step of the procedure was to move from linkage blocks to bi-allelic marker genotypes (1st set of simulations) or multi-allelic marker genotypes (five alleles per marker in the 2nd and 3rd sets of simulations). This was accomplished by assigning types of marker allele to the linkage blocks.
  • a (new) type of marker allele using preset marker allele frequencies was independently sampled, linkage block by linkage block, from a multinomial distribution.
  • The original genotype 111221 contains three linkage blocks, 111, 22 and 1. Each linkage block is randomly assigned one of the types of marker allele. Thus with 5 marker alleles, the 1's in the first linkage block can be replaced by 5's.
  • After sampling types of marker allele for the other blocks the original genotype 111221 can be converted into the new configuration 555133.
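The block-wise relabeling can be sketched as follows: split the genotype into runs of identical labels and draw one marker allele per run from the preset allele frequencies. The function names, the equal allele frequencies and the seed are assumptions for illustration.

```python
import random
from itertools import groupby

def split_blocks(genotype):
    """Split a genotype string into runs of identical labels (linkage blocks)."""
    return ["".join(run) for _, run in groupby(genotype)]

def relabel_blocks(genotype, alleles, weights, rng):
    """Draw one marker allele per linkage block, independently, using the given frequencies."""
    blocks = split_blocks(genotype)
    sampled = rng.choices(alleles, weights=weights, k=len(blocks))
    return "".join(allele * len(block) for allele, block in zip(sampled, blocks))

blocks = split_blocks("111221")
relabeled = relabel_blocks(
    "111221", ["1", "2", "3", "4", "5"], [0.2] * 5, random.Random(0)
)
```

Because each block is sampled independently, two adjacent blocks can receive the same allele, in which case they become indistinguishable at the marker level.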
  • the 400 lines in the base population can be crossed amongst each other in various combinations.
  • pairs of parents which approximately showed a preset level of relatedness, say ~45%, are selected.
  • only pairs of parents having a preset level of relatedness were used for crossings and all the other possible pairs with other levels of relatedness were ignored.
  • different levels of relatedness within pairs of parents were used (10%, 40% and 45%).
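Selecting pairs of parents by relatedness can be sketched as follows. The sketch assumes that relatedness is measured as the fraction of loci at which two inbred lines carry the same allele (consistent with the statement that 45% related parents are polymorphic at about 55% of loci); the tolerance `tol` and both function names are hypothetical.

```python
from itertools import combinations

def relatedness(line_a, line_b):
    """Fraction of loci at which two inbred lines carry the same allele."""
    return sum(x == y for x, y in zip(line_a, line_b)) / len(line_a)

def select_pairs(lines, target, tol=0.02):
    """Return index pairs whose relatedness is within tol of the target."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(lines), 2)
            if abs(relatedness(a, b) - target) <= tol]

base = [[1, 1, 1, 1], [1, 1, 2, 2], [2, 2, 2, 2]]
pairs = select_pairs(base, target=0.5, tol=0.01)
```

All pairs outside the tolerance band are simply ignored, as described above.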
  • the heterozygous F 1 offspring of the crosses were selfed in order to generate segregating F 2 families.
  • Each F 2 individual was crossed with one and the same homozygous tester to generate a number of F 3 offspring per F 2 plant and those F 3 plants were "evaluated" for traits of interest.
  • QTLs were placed at marker positions, which made it easy to derive genotypic values.
  • the trait value was calculated as the sum of genotype and random Gaussian noise.
  • $h^2_{QTL} = \sigma^2_{QTL} / (\sigma^2_{QTL} + \sigma^2_a + \sigma^2_e)$ was used.
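The trait model and the heritability ratio can be illustrated with a short sketch. It reads the three terms in the denominator as the QTL variance, a second genetic variance component and the residual variance; that reading, and the function names, are assumptions rather than definitions given in the text.

```python
import random

def qtl_heritability(var_qtl, var_a, var_e):
    """Share of total variance attributable to the QTL."""
    return var_qtl / (var_qtl + var_a + var_e)

def trait_value(genotypic_value, var_e, rng):
    """Trait value = genotypic value + Gaussian noise with variance var_e."""
    return genotypic_value + rng.gauss(0.0, var_e ** 0.5)

rng = random.Random(11)
h2 = qtl_heritability(var_qtl=1.0, var_a=1.0, var_e=2.0)
y = trait_value(genotypic_value=3.0, var_e=2.0, rng=rng)
```

Raising the noise variance lowers the heritability attributed to the QTL while leaving the genotypic values unchanged.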
  • the set of 1010 markers was reduced: in each of the three simulations two hundred loci were randomly sampled from the genome and only these marker data were available for analysis.
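The marker reduction step amounts to drawing 200 loci without replacement from the 10 x 101 simulated positions. A minimal sketch, with a hypothetical function name and an arbitrary seed:

```python
import random

def sample_marker_subset(n_groups=10, per_group=101, n_keep=200, seed=0):
    """Randomly keep n_keep of the simulated loci, as (group, position) pairs."""
    rng = random.Random(seed)
    loci = [(g, m) for g in range(n_groups) for m in range(per_group)]
    return sorted(rng.sample(loci, n_keep))   # sampling without replacement

kept = sample_marker_subset()
```

With 2 cM between adjacent simulated loci, keeping 200 of 1010 loci leaves markers on the order of 10 cM apart on average, although the spacing within any one linkage group is uneven.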
  • the average recombination frequency between adjacent markers is 0.1 and 0.5, and the average linkage disequilibrium is approximately 0.20 and zero, respectively.
  • the two parents of each of the 60 crosses were about 45% related, i.e., it is expected that 55% of the loci are polymorphic in the progeny.
  • Linkage disequilibrium between adjacent pairs of markers in the breeding population was investigated at values of approximately zero (i.e., loci are statistically independent) and 0.20.
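The text does not state which disequilibrium measure was used. As one possible illustration, the classical coefficient D (joint frequency minus the product of the marginal frequencies) between two loci can be computed from a set of inbred lines as follows; the function name is hypothetical.

```python
def linkage_disequilibrium(lines, locus_a, locus_b, allele=1):
    """D = P(allele at both loci) - P(allele at locus a) * P(allele at locus b)."""
    n = len(lines)
    p_a = sum(l[locus_a] == allele for l in lines) / n
    p_b = sum(l[locus_b] == allele for l in lines) / n
    p_ab = sum(l[locus_a] == allele and l[locus_b] == allele for l in lines) / n
    return p_ab - p_a * p_b

complete = linkage_disequilibrium([[1, 1], [2, 2]], 0, 1)
independent = linkage_disequilibrium([[1, 1], [1, 2], [2, 1], [2, 2]], 0, 1)
```

For two alleles at equal frequency D cannot exceed 0.25, so under this measure a value near 0.20 would indicate strong association and a value near zero statistical independence.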
  • Family sizes of either 10 or 50 F 2 progeny were investigated in a topcross combination with a single unrelated tester. For any given simulation all families were of equal size, and populations of 600 or 3000 progeny were evaluated. All F 2 progeny were genotyped and all F 2:3 progeny were evaluated for a quantitatively expressed trait.
  • this first set of simulations has many similarities with a population derived from a single cross. The distinctions are that the number of sampled progeny is larger, different sets of QTLs can be segregating in each F 2:3 family, and linkage phases between QTLs and marker loci are not consistent across the population. Under these conditions, the impact of population size, number of segregating QTLs, and linkage disequilibrium among breeding lines upon analysis methods was investigated.
  • 2 nd set of simulations: For the inbred parents in the base populations one of five alleles can occur at each locus (markers and QTLs). The allelic genotypes indicate ancestral alleles that are identical by descent (IBD) among the breeding lines.
  • Such information is obtained by genotyping all the important lines involved in the pedigrees of the breeding populations.
  • the frequencies of each allele in the population were 0.55, 0.24, 0.12, 0.06, and 0.03 respectively. Pairs of parents were selected in such a way that the two parents of a cross were about 10% related, i.e., approximately 90% of the loci are polymorphic between any pair of parents. Linkage disequilibrium between adjacent pairs of markers was approximately 0.20.
  • Family sizes consisted of 50 F 2:3 progeny that were top-crossed with a single unrelated tester. Each progeny was genotyped and evaluated for a quantitatively expressed trait. Expression of quantitative traits was due to five or ten QTLs.
  • the QTLs segregated independently and they were located in the middle of the linkage groups.
  • pairs of QTLs were located 50 centiMorgan (cM) from each other on the same chromosome.
  • the QTLs were functionally bi-allelic, i.e., one of the five alleles that could occur at a QTL was chosen to have a positive (+) effect, while all remaining alleles were simulated to have an equal negative effect when combined with the tester allele.
  • marker loci are multi-allelic and the allelic state of the marker loci is independent of the functional state of the QTLs.
  • the multi-allelic state of marker loci is similar to the polymorphism index that has been observed in simple sequence repeat markers in maize (Senior et al. 1996).
  • the family size remained at 50 progeny.
  • different sets of QTLs are segregating in each family, and linkage phases between QTLs and marker loci are not consistent across the population.
  • the third set of simulations was very much like the second set except that the parents of crosses were about 40% related and that linkage disequilibrium between adjacent pairs of markers was zero (worst case scenario). Thus, for the third set of simulations the primary changes in available information were due to changes in population structure.
  • RESULTS OF THE SIMULATION STUDY
  • HAPLO-MQM results are compared to those of HAPLO-MQM +, to see what the effect is of exclusion versus inclusion of parameters for family effects.
  • the QTL likelihood peak is higher for HAPLO-MQM than for HAPLO-MQM +. This is caused by the fact that HAPLO-MQM exploits the between-family information in the trait means of the families.
  • the figures of the "likelihood elsewhere" appear to be biased upwards under HAPLO-MQM in simulation 1.2. Under the null-hypothesis of "no QTL segregating," the expected QTL likelihood in no-QTL regions should be equal to the number of QTL parameters (~15).
  • Results of simulation 1.2 support the initial expectation that bi-allelic and widely spread markers are not very suitable for an analysis of the HAPLO-MQM type: the families are clustered into ~15 haplotype groups and a number of families are probably wrongly classified. With HAPLO-MQM, QTLs and fitted cofactors were found near the simulated positions.
  • the "effective" population size for a QTL which is here defined as the expected number of families segregating for the QTL times the progeny size, is now approximated.
  • One of the five QTL alleles was assigned a positive (+) effect, the other QTL alleles each had an equal negative effect.
  • the + allele had a frequency of 0.55 or 0.12.
  • any given QTL was segregating for the + allele in only a subset of the entire population and the "effective" population sizes were relatively high, namely ~1500 (30 families) and ~300 (6 families) for the + allele frequencies of 0.55 and 0.12, respectively.
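The "effective" population size defined above is simply the expected number of segregating families multiplied by the progeny size. A small sketch using the family counts quoted in the text and the family size of 50 progeny (the function name is hypothetical):

```python
def effective_population_size(n_segregating_families, progeny_per_family):
    """Expected number of families segregating for the QTL times family size."""
    return n_segregating_families * progeny_per_family

common_allele = effective_population_size(30, 50)   # + allele frequency 0.55
rare_allele = effective_population_size(6, 50)      # + allele frequency 0.12
```

The sketch takes the number of segregating families as given; how that number follows from the allele frequency and the relatedness of the parents is not derived here.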
  • 3 rd set of simulations: The third set of simulations differs in two aspects from the 2 nd set.
  • the breeding population consisted of fully inbred lines that were about 45% instead of 10% related.
  • linkage disequilibrium between adjacent pairs of markers in the breeding population was zero instead of 0.20. Therefore the "effective" population size is now much smaller, namely ~600 (12 families) and ~120 (2 families) for the two cases with positive QTL-allele frequencies of 0.55 and 0.12, respectively.
  • the QTL likelihood peaks in the 3 rd set are much lower than in the 2 nd set (Tables 3b and 3c).
  • the parents of the families are clustered on the basis of their haplotype in a window of 4 markers around the putative QTL.
  • the same approach was used for cofactor markers.
  • the 120 parents were clustered into ~15 groups.
  • in the 3 rd set the number of clusters increased to ~37, partly because several parents were used more than once. This number of parameters per QTL or marker cofactor is still much lower than the 60 parameters per QTL in the IM approach.
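Clustering parents by their haplotype in a window of markers around a putative QTL can be sketched as below. The sketch assumes that parents are grouped when their alleles are identical at every marker in the window; the actual clustering rule may be more permissive. The function name and the convention that `qtl_pos` is the index of the first marker to the right of the putative QTL are hypothetical.

```python
from collections import defaultdict

def cluster_parents_by_haplotype(parents, qtl_pos, window=4):
    """Group inbred parents that share the same alleles in the marker window.

    parents: dict mapping parent name -> list of marker alleles.
    Returns a dict mapping window haplotype -> list of parent names.
    """
    start = max(0, qtl_pos - window // 2)        # two markers on each side
    groups = defaultdict(list)
    for name, haplotype in parents.items():
        groups[tuple(haplotype[start:start + window])].append(name)
    return dict(groups)

parents = {"A": [1, 1, 2, 2, 3, 3], "B": [5, 1, 2, 2, 3, 4], "C": [1, 2, 2, 2, 3, 3]}
clusters = cluster_parents_by_haplotype(parents, qtl_pos=3)
n_qtl_parameters = len(clusters)   # one putative QTL allele per cluster
```

Each cluster corresponds to one putative QTL allele, so the number of clusters, rather than the number of families, sets the number of parameters fitted per QTL or cofactor.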
  • phenotypic traits relies on the ability to detect genetic differences between individuals. These genetic differences, or "genetic markers" are then correlated with phenotypic variations using the statistical methods of the present invention.
  • a single gene encoding a protein responsible for a phenotypic trait is detectable directly by a mutation which results in the variation in phenotype. More frequently, it is the case that multiple genetic loci each contribute to the observed phenotype.
  • a quantifiable phenotype e.g., height, weight, grain yield, oil content, etc.
  • the genes underlying a phenotype are designated quantitative trait loci, or QTL.
  • Detection and mapping of QTL typically utilizes the detection and correlation of genetic markers with the phenotypic trait under investigation.
  • regions of DNA which are non-coding, or which encode proteins or portions of proteins which lack critical function tend to accumulate mutations, and therefore, are variable between members of the same species. Such regions provide the basis for numerous molecular genetic markers. Markers identify alterations in the genome which can be insertions, deletions, point mutations, recombination events, or the presence and sequence of transposable elements.
  • nucleic acid refers to single-stranded or double-stranded deoxyribonucleotides or ribonucleotides and polymers thereof.
  • nucleic acid sequence refers to single-stranded or double-stranded deoxyribonucleotides or ribonucleotides and polymers thereof.
  • the term optionally includes known analogs of naturally occurring nucleotides.
  • a particular nucleic acid sequence of this invention encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, in addition to the sequence explicitly indicated.
  • the term "gene" is used interchangeably for a specific genomic sequence, a cDNA and a mRNA encoded by the genomic sequence.
  • Two single-stranded nucleic acids "hybridize" when they form a double-stranded duplex.
  • the region of double-strandedness can include the full-length of one or both of the single-stranded nucleic acids, or all of one single stranded nucleic acid and a subsequence of the other single-stranded nucleic acid, or the region of double-strandedness can include a subsequence of each nucleic acid.
  • some techniques for detecting genetic markers utilize hybridization of a probe nucleic acid to nucleic acids corresponding to the genetic marker. Markers which are restriction fragment length polymorphisms (RFLP), are detected by hybridizing a probe which is typically a sub-fragment (or a synthetic oligonucleotide corresponding to a sub-fragment) of the nucleic acid to be detected to restriction digested genomic DNA.
  • the restriction enzyme is selected to provide restriction fragments of at least two alternative (or polymorphic) lengths in different individuals.
  • an appropriate matrix e.g., agarose
  • a membrane e.g., nitrocellulose, nylon
  • the labeled probe is hybridized under conditions which result in equilibrium binding of the probe to the target followed by removal of excess probe by washing.
  • the hybridized probe is then detected, most typically by autoradiography or another similar detection technique (e.g., fluorography, liquid scintillation counter, etc.). Examples of specific hybridization protocols are widely available in the art, see, e.g., Berger, Sambrook, Ausubel, all supra.
  • Amplified variable sequences refer to amplified sequences of the plant genome which exhibit high nucleic acid residue variability between members of the same species. All organisms have variable genomic sequences and each organism (with the exception of a clone) has a different set of variable sequences. Once identified, the presence of a specific variable sequence can be used to predict phenotypic traits.
  • DNA from the plant serves as a template for amplification with primers that flank a variable sequence of DNA. The variable sequence is amplified and then sequenced.
  • RNA polymerase mediated techniques e.g., NASBA
  • PCR polymerase chain reaction
  • LCR ligase chain reaction
  • Oligonucleotides for use as primers, e.g., in amplification reactions and for use as nucleic acid sequence probes are typically synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981) Tetrahedron Lett. 22: 1859, or can simply be ordered commercially.
  • self-sustained sequence replication can be used to identify genetic markers.
  • Self-sustained sequence replication refers to a method of nucleic acid amplification using target nucleic acid sequences which are replicated exponentially in vitro under substantially isothermal conditions by using three enzymatic activities involved in retroviral replication: (1) reverse transcriptase, (2) RNase H, and (3) a DNA-dependent RNA polymerase (Guatelli et al. (1990) Proc Natl Acad Sci USA 87:1874).
  • this reaction accumulates cDNA and RNA copies of the original target.
  • Arbitrary fragment length polymorphisms (AFLP) can also be used as genetic markers (Vos et al. (1995) Nucl Acids Res 23:4407).
  • the phrase "arbitrary fragment length polymorphism" refers to selected restriction fragments which are amplified before or after cleavage by a restriction endonuclease. The amplification step allows easier detection of specific restriction fragments.
  • AFLP allows the detection of large numbers of polymorphic markers and has been used for genetic mapping of plants (Becker et al. (1995) Mol Gen Genet 249:65; and Meksem et al. (1995) Mol Gen Genet 249:74).
  • Allele-specific hybridization can be used to identify the genetic markers of the invention.
  • ASH technology is based on the stable annealing of a short, single-stranded, oligonucleotide probe to a completely complementary single-strand target nucleic acid. Detection is via an isotopic or non-isotopic label attached to the probe.
  • two or more different ASH probes are designed to have identical DNA sequences except at the polymorphic nucleotides. Each probe will have exact homology with one allele sequence so that the range of probes can distinguish all the known alternative allele sequences. Each probe is hybridized to the target DNA. With appropriate probe design and hybridization conditions, a single-base mismatch between the probe and target DNA will prevent hybridization. In this manner, only one of the alternative probes will hybridize to a target sample that is homozygous or homogenous for an allele. Samples that are heterozygous or heterogeneous for two alleles will hybridize to both of two alternative probes.
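The logic of calling alleles from ASH probes can be sketched in idealized form. The sketch assumes perfect discrimination, i.e. a probe anneals only where the target contains its exact reverse complement; the sequences, allele labels and function names are made up for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def hybridizes(probe, target_strand):
    """Idealized ASH: the probe anneals only to a perfectly complementary site."""
    site = probe.translate(COMPLEMENT)[::-1]    # reverse complement of the probe
    return site in target_strand

def call_genotype(probes, sample_strands):
    """Return the sorted alleles whose probe hybridizes to the sample.

    probes: dict mapping allele label -> probe sequence.
    sample_strands: target strands present in the sample (one or two alleles).
    """
    return sorted(allele for allele, probe in probes.items()
                  if any(hybridizes(probe, s) for s in sample_strands))

probes = {"1": "TGGAT", "2": "TGAAT"}            # differ at the polymorphic base only
homozygous = call_genotype(probes, ["GGATCCAGT"])
heterozygous = call_genotype(probes, ["GGATCCAGT", "GGATTCAGT"])
```

A homozygous sample lights up one probe, while a heterozygous or heterogeneous sample lights up both, which matches the behaviour described above.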
  • ASH markers are used as dominant markers where the presence or absence of only one allele is determined from hybridization or lack of hybridization by only one probe. The alternative allele can be inferred from the lack of hybridization.
  • ASH probe and target molecules are optionally RNA or DNA; the target molecules are any length of nucleotides beyond the sequence that is complementary to the probe; the probe is designed to hybridize with either strand of a DNA target; the probe ranges in size to conform to variously stringent hybridization conditions, etc.
  • PCR allows the target sequence for ASH to be amplified from low concentrations of nucleic acid in relatively small volumes. Otherwise, the target sequence from genomic DNA is digested with a restriction endonuclease and size separated by gel electrophoresis. Hybridizations typically occur with the target sequence bound to the surface of a membrane or, as described in U.S. Patent 5,468,613, the ASH probe sequence can be bound to a membrane.
  • ASH data are obtained by amplifying nucleic acid fragments (amplicons) from genomic DNA using PCR, transferring the amplicon target DNA to a membrane in a dot-blot format, hybridizing a labeled oligonucleotide probe to the amplicon target, and observing the hybridization dots by autoradiography.
  • Single nucleotide polymorphisms (SNP) are markers that consist of a shared sequence differentiated on the basis of a single nucleotide. Typically, this distinction is detected by differential migration patterns of an amplicon comprising the SNP on e.g., an acrylamide gel.
  • alternative modes of detection such as hybridization, e.g., ASH, or RFLP analysis are not excluded.
  • Simple sequence repeats (SSR) take advantage of high levels of di-, tri-, or tetra-nucleotide tandem repeats within a genome. Dinucleotide repeats have been reported to occur in the human genome as many as 50,000 times with n varying from 10 to 60 or more (Jacob et al. (1991) Cell 67:213). Dinucleotide repeats have also been found in higher plants (Condit and Hubbell (1991) Genome 34:66).
  • SSR data is generated by hybridizing primers to conserved regions of the plant genome which flank the SSR sequence. PCR is then used to amplify the dinucleotide repeats between the primers. The amplified sequences are then electrophoresed to determine the size and therefore the number of di-, tri-, and tetra-nucleotide repeats.
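Converting an amplicon size into a repeat number is simple arithmetic once the length of the non-repeat portion of the amplicon is known. A minimal sketch, in which `flank_len` (primers plus conserved flanking sequence) and the function name are hypothetical:

```python
def ssr_repeat_count(amplicon_len, flank_len, motif_len):
    """Number of tandem repeats implied by the amplicon size.

    amplicon_len: fragment size in bp read from the gel.
    flank_len: bp contributed by primers and flanking sequence.
    motif_len: 2, 3 or 4 for di-, tri- or tetra-nucleotide repeats.
    """
    repeat_len = amplicon_len - flank_len
    if repeat_len < 0 or repeat_len % motif_len != 0:
        raise ValueError("size is not consistent with a whole number of repeats")
    return repeat_len // motif_len

n_repeats = ssr_repeat_count(amplicon_len=140, flank_len=100, motif_len=2)
```

In practice gel sizes carry measurement error, so a real scoring procedure would round to the nearest allele size instead of rejecting non-integer repeat numbers.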
  • isozyme markers are employed as genetic markers. Isozymes are multiple forms of enzymes which differ from one another in their amino acid, and therefore their nucleic acid sequences. Some isozymes are multimeric enzymes containing slightly different subunits.
  • isozymes are either multimeric or monomeric but have been cleaved from the proenzyme at different sites in the amino acid sequence. Isozymes can be characterized and analysed at the protein level, or alternatively, isozymes which differ at the nucleic acid level can be determined. In such cases any of the nucleic acid based methods described herein can be used to analyze isozyme markers.
  • a primary motivation for development of molecular markers in crop species is the potential for increased efficiency in plant breeding through marker assisted selection (MAS).
  • Genetic marker alleles are used to identify plants that contain a desired genotype at one marker locus, several loci, or a haplotype, and that are expected to transfer the desired genotype, along with a desired phenotype, to their progeny.
  • the presence and/or absence of a particular genetic marker allele in the genome of a plant exhibiting a preferred phenotypic trait is made by any method listed above, e.g., RFLP, AFLP, SSR, etc. If the nucleic acids from the plant are positive for a desired genetic marker, the plant can be selfed to create a true breeding line with the same genotype, or it can be crossed with a plant with the same marker or with other desired characteristics to create a sexually crossed hybrid generation.
  • Clones of nucleic acids linked to QTL have a variety of uses, including as genetic markers for identification of additional QTLs in subsequent applications of marker assisted selection (MAS). Markers which are adjacent to an open reading frame (ORF) associated with a phenotypic trait can hybridize to a DNA clone, thereby identifying a clone on which an ORF is located.
  • a fragment containing the open reading frame is identified by successive rounds of screening and isolation of clones which together comprise a contiguous sequence of DNA, a "contig." Protocols sufficient to guide one of skill through the isolation of clones associated with linked markers are found in, e.g., Berger, Sambrook and Ausubel, all supra.
  • vectors are available in the art for the isolation and replication of the nucleic acids of the invention.
  • plasmids, cosmids and phage vectors are well known in the art, and are sufficient for many applications.
  • a number of vectors capable of accommodating large nucleic acids are available in the art; these include yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), plant artificial chromosomes (PLACs) and the like.
  • the present invention also relates to host cells and organisms which are transformed with nucleic acids corresponding to QTL and other genes identified according to the invention. Additionally, the invention provides for the production of polypeptides corresponding to QTL by recombinant techniques.
  • Host cells are genetically engineered (i.e., transduced, transfected or transformed) with the vectors of this invention (i.e., vectors which comprise QTLs or other nucleic acids identified according to the methods of the invention and as described above) which are, for example, a cloning vector or an expression vector.
  • Such vectors are, for example, in the form of a plasmid, an agrobacterium, a virus, a naked polynucleotide, or a conjugated polynucleotide.
  • the vectors are introduced into plant tissues, cultured plant cells or plant protoplasts by standard methods including electroporation (Fromm et al. (1985) Proc. Natl. Acad. Sci. USA 82:5824), infection by viral vectors such as cauliflower mosaic virus (CaMV) (Hohn et al. (1982) Molecular Biology of Plant Tumors (Academic Press, New York), pp. 549-560; Howell U.S. Patent No.
  • the engineered host cells can be cultured in conventional nutrient media modified as appropriate for such activities as, for example, activating promoters or selecting transformants. These cells can optionally be cultured into transgenic plants. Plant regeneration from cultured protoplasts is described in Evans et al. (1983) "Protoplast Isolation and Culture," Handbook of Plant Cell Cultures 1, 124-176 (MacMillan Publishing Co., New York); Davey (1983) "Recent Developments in the Culture and Regeneration of Plant Protoplasts," Protoplasts, pp. 12-29, (Birkhauser, Basel); Dale (1983) "Protoplast
  • the present invention also relates to the production of transgenic organisms, which can be bacteria, yeast, fungi, or plants, transduced with the nucleic acids, e.g., cloned QTL of the invention.
  • a thorough discussion of techniques relevant to bacteria, unicellular eukaryotes and cell culture can be found in references enumerated above and are briefly outlined as follows.
  • Several well-known methods of introducing target nucleic acids into bacterial cells are available, any of which can be used in the present invention. These include: fusion of the recipient cells with bacterial protoplasts containing the DNA, electroporation, projectile bombardment, and infection with viral vectors (discussed further, below), etc.
  • Bacterial cells can be used to amplify the number of plasmids containing DNA constructs of this invention.
  • the bacteria are grown to log phase and the plasmids within the bacteria can be isolated by a variety of methods known in the art (see, for instance, Sambrook).
  • kits are commercially available for the purification of plasmids from bacteria. For their proper use, follow the manufacturer's instructions (see, for example, EasyPrep™, FlexiPrep™, both from Pharmacia Biotech; StrataClean™, from Stratagene; and, QIAprep™ from Qiagen).
  • the isolated and purified plasmids are then further manipulated to produce other plasmids, used to transfect plant cells or incorporated into Agrobacterium tumefaciens related vectors to infect plants.
  • Typical vectors contain transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulation of the expression of the particular target nucleic acid.
  • the vectors optionally comprise generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, or prokaryotes, or both, (e.g., shuttle vectors) and selection markers for both prokaryotic and eukaryotic systems.
  • Vectors are suitable for replication and integration in prokaryotes, eukaryotes, or preferably both.
  • Embodiments of the present invention pertain to the production of transgenic plants comprising the cloned nucleic acids of the invention.
  • Techniques for transforming plant cells with nucleic acids are generally available and can be adapted to the invention by the use of nucleic acids encoding QTL or other genes encoding phenotypic traits of the invention.
  • useful general references for plant cell cloning, culture and regeneration include Jones (ed) (1995) Plant Gene Transfer and
  • nucleic acid constructs of the invention e.g., plasmids, cosmids, artificial chromosomes, DNA and RNA polynucleotides, are introduced into plant cells, either in culture or in the organs of a plant by a variety of conventional techniques.
  • sequence is expressed, the sequence is optionally combined with transcriptional and translational initiation regulatory sequences which direct the transcription or translation of the sequence from the exogenous DNA in the intended tissues of the transformed plant.
  • the DNA constructs of the invention, for example plasmids, cosmids, phage, naked or variously conjugated-DNA polynucleotides (e.g., polylysine-conjugated DNA, peptide-conjugated DNA, liposome-conjugated DNA, etc.), or artificial chromosomes, can be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant cells using ballistic methods, such as DNA particle bombardment.
  • Microinjection techniques for injecting e.g., cells, embryos, and protoplasts are known in the art and well described in the scientific and patent literature. For example, a number of methods are described in Jones (ed) (1995) Plant Gene Transfer and Expression Protocols - Methods in Molecular Biology, Volume 49, Humana Press, Totowa, NJ, as well as in the other references noted herein and available in the literature.
  • Agrobacterium mediated transformation is employed to generate transgenic plants.
  • Agrobacterium-mediated transformation techniques including disarming and use of binary vectors, are also well described in the scientific literature. See, for example Horsch, et al. (1984) Science 233:496; and Fraley et al. (1984) Proc. Nat'l. Acad. Sci. USA 80:4803 and recently reviewed in Hansen and Chilton (1998) Current Topics in Microbiology 240:22 and Das (1998) Subcellular Biochemistry 29: Plant Microbe Interactions pp343-363.
  • Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype and thus the desired phenotype.
  • Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences.
  • Plant regeneration from cultured protoplasts is described in Evans et al. (1983) Protoplasts Isolation and Culture, Handbook of Plant Cell Culture pp. 124-176, Macmillan Publishing Company, New York; and Binding (1985) Regeneration of Plants, Plant Protoplasts pp. 21-73, CRC Press, Boca Raton. Regeneration can also be obtained from plant callus, explants, somatic embryos (Dandekar et al. (1989) J. Tissue Cult. Meth. 12:145;
  • Preferred plants for the transformation and expression of QTL and other nucleic acids identified and cloned according to the present invention include agronomically and horticulturally important species.
  • Such species include, but are not restricted to members of the families: Gramineae (including corn, rye, triticale, barley, millet, rice, wheat, oats, etc.); Leguminosae (including pea, beans, lentil, peanut, yam bean, cowpeas, velvet beans, soybean, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, and sweetpea); Compositae (the largest family of vascular plants, including at least 1,000 genera, including important commercial crops such as sunflower) and Rosaceae (including raspberry, apricot, almond, peach, rose, etc.), as well as nut plants (including, walnut, pecan, hazelnut, etc.), and forest trees (including Pinus, Quercus, Pseudotsuga
  • plants from the genera Agrostis, Allium, Antirrhinum, Apium, Arachis, Asparagus, Atropa, Avena (e.g., oats), Bambusa, Brassica, Bromus, Browallia, Camellia, Cannabis, Capsicum, Cicer, Chenopodium, Cichorium, Citrus, Coffea, Coix, Cucumis, Cucurbita, Cynodon, Dactylis, Datura, Daucus, Digitalis, Dioscorea, Elaeis, Eleusine, Festuca, Fragaria, Geranium, Glycine, Helianthus, Hemerocallis, Hevea, Hordeum (e.g., barley), Hyoscyamus, Ipomoea, Lactuca, Lens, Lilium, Linum, Lolium, Lotus, Lycopersicon, Majorana, Malus, Mangif
  • plants in the family Gramineae are particularly preferred target plants for transformation with cloned sequences corresponding to QTL or other nucleic acids by the methods of the invention.
  • Common crop plants which are targets of the present invention include corn, rice, triticale, rye, cotton, soybean, sorghum, wheat, oats, barley, millet, sunflower, canola, peas, beans, lentils, peanuts, yam beans, cowpeas, velvet beans, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, sweetpea and nut plants (e.g., walnut, pecan, etc).
  • a plant promoter fragment is optionally employed which directs expression of a nucleic acid in any or all tissues of a regenerated plant.
  • constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'- promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill.
  • the plant promoter can direct expression of the polynucleotide of the invention in a specific tissue (tissue-specific promoters) or can be otherwise under more precise environmental control (inducible promoters).
  • tissue-specific promoters under developmental control include promoters that initiate transcription only in certain tissues, such as fruit, seeds, or flowers.
  • promoters which direct transcription in plant cells can be suitable.
  • the promoter can be either constitutive or inducible.
  • promoters of bacterial origin which operate in plants include the octopine synthase promoter, the nopaline synthase promoter and other promoters derived from native Ti plasmids. See, Herrera-Estrella et al. (1983), Nature, 303:209.
  • Viral promoters include the 35S and 19S RNA promoters of cauliflower mosaic virus. See, Odell et al. (1985) Nature, 313:810.
  • Other plant promoters include the ribulose-1,5-bisphosphate carboxylase small subunit promoter and the phaseolin promoter.
  • the promoter sequence from the E8 gene and other genes can also be used.
  • the isolation and sequence of the E8 promoter is described in detail in Deikman and Fischer (1988) EMBO J. 7:3315.
  • Many other promoters are in current use and can be coupled to an exogenous DNA sequence to direct expression of the nucleic acid.
  • polyadenylation region at the 3'-end of the coding region is typically included.
  • the polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from, e.g., T-DNA.
  • the vector comprising the sequences (e.g., promoters or coding regions) from genes encoding expression products and transgenes of the invention will typically include a nucleic acid subsequence, a marker gene which confers a selectable, or alternatively, a screenable, phenotype on plant cells.
  • the marker can encode biocide tolerance, particularly antibiotic tolerance, such as tolerance to kanamycin, G418, bleomycin, hygromycin, or herbicide tolerance, such as tolerance to chlorsulfuron, or phosphinothricin (the active ingredient in the herbicides bialaphos or Basta). See, e.g., Padgette et al.
  • exogenous DNA sequence is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
  • the determination of genetic marker alleles is performed by high throughput screening.
  • High throughput screening involves providing a library of genetic markers, e.g., RFLPs, AFLPs, isozymes, specific alleles and variable sequences, including SSR. Such libraries are then screened against plant genomes to generate a "fingerprint" for each plant under consideration. In some cases a partial fingerprint comprising a sub-portion of the markers is generated in an area of interest.
  • High throughput screening can be performed in many different formats.
  • Hybridization can take place in a 96-, 324-, or a 1524-well format or in a matrix on a silicon chip or other format.
  • a dot blot apparatus is used to deposit samples of fragmented and denatured genomic or amplified DNA on a nylon or nitrocellulose membrane. After cross-linking the nucleic acid to the membrane, either through exposure to ultra-violet light or by heat, the membrane is incubated with a labeled hybridization probe.
  • the labels are incorporated into the nucleic acid probes by any of a number of means well-known in the art.
  • the membranes are washed to remove non-hybridized probes and the association of the label with the target nucleic acid sequence is determined.
  • high throughput screening systems themselves are commercially available (see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries, Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision Systems, Inc., Natick, MA, etc.). These systems typically automate entire procedures including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate or membrane in detector(s) appropriate for the assay.
  • These configurable systems provide high throughput and rapid start up as well as a high degree of flexibility and customization. The manufacturers of such systems provide detailed protocols for the use of their products in high throughput applications.
  • solid phase arrays are adapted for the rapid and specific detection of multiple polymorphic nucleotides.
  • a nucleic acid probe is linked to a solid support and a target nucleic acid is hybridized to the probe. Either the probe, or the target, or both, can be labeled, typically with a fluorophore. If the target is labeled, hybridization is evaluated by detecting bound fluorescence. If the probe is labeled, hybridization is typically detected by quenching of the label by the bound nucleic acid. If both the probe and the target are labeled, detection of hybridization is typically performed by monitoring a color shift resulting from proximity of the two bound labels.
  • an array of probes is synthesized on a solid support.
  • arrays which are known, e.g., as "DNA chips" or as very large scale immobilized polymer arrays (VLSIPS™ arrays) can include millions of defined probe regions on a substrate having an area of about 1 cm² to several cm².
  • capillary electrophoresis is used to analyze polymorphism. This technique works best when the polymorphism is based on size, for example, AFLP and SSR. This technique is described in detail in U.S. Patent Nos. 5,534,123 and 5,728,282. Briefly, capillary electrophoresis tubes are filled with the separation matrix.
  • the separation matrix contains hydroxyethyl cellulose, urea and optionally formamide.
  • the AFLP or SSR samples are loaded onto the capillary tube and electrophoresed. Because of the small amount of sample and separation matrix required by capillary electrophoresis, the run times are very short. The molecular sizes and, therefore, the number of nucleotides present in the nucleic acid sample are determined by techniques described herein. In a high throughput format, many capillary tubes are placed in a capillary electrophoresis apparatus. The samples are loaded onto the tubes and electrophoresis of the samples is run simultaneously. See, Mathies and Huang, (1992) Nature 359:167.
  • an integrated system such as a computer, software corresponding to the statistical models of the invention, and data sets corresponding to genetic markers and phenotypic values, facilitates mapping of phenotypic traits, including QTLs.
  • integrated system in the context of this invention refers to a system in which data entering a computer corresponds to physical objects or processes external to the computer, e.g., nucleic acid sequence hybridization, and a process that, within a computer, causes a physical transformation of the input signals to different output signals.
  • the input data e.g., hybridization on a specific region of an array is transformed to output data, e.g., the identification of the sequence hybridized.
  • the process within the computer is a set of instructions, or "program," by which positive hybridization signals are recognized by the integrated system and attributed to individual samples as a genotype.
  • Additional programs correlate the genotype, and more particularly in the methods of the invention, the haplotype, of individual samples with phenotypic values, e.g., using the HAPLO-IM+, HAPLO-MQM, and/or HAPLO-MQM+ models of the invention.
  • the programs JoinMap® and MapQTL® are particularly suited to this type of analysis and can be extended to include the HAPLO-IM+, HAPLO-MQM, and/or HAPLO-MQM+ models.
  • Active X applications e.g., Olectra Chart and True WevChart
  • Other useful software tools in the context of the integrated systems of the invention include statistical packages such as SAS, Genstat, and S-Plus.
  • Additional programming languages such as Fortran and the like are also suitably employed in the integrated systems of the invention.
  • phenotypic values assigned to a population of progeny descending from related or unrelated crosses are recorded in a computer readable medium, thereby establishing a database corresponding phenotypic values with unique identifiers for each member of the population of progeny.
  • Data regarding genotype for one or more haplotypes corresponding to a plurality of genetic markers, e.g., RFLP, AFLP, ASH, isozyme markers, SSR, SNP or other markers as described herein, are similarly recorded in a
  • marker data is obtained using an integrated system that automates one or more aspects of the assay (or assays) used to determine the haplotype.
  • input data corresponding to genotypes for independent genetic markers or for haplotypes are relayed from a device, e.g., an array, a scanner, a CCD, or other detection device directly to files in a computer readable medium accessible to the central processing unit.
  • a set of instructions (embodied in one or more programs) encoding the statistical models of the invention is then executed by the computational device to identify correlations between phenotypic values and haplotypes.
  • the integrated system also includes a user input device, such as a keyboard, a mouse, a touchscreen, or the like, for, e.g., selecting files, retrieving data, etc., and an output device (e.g., a monitor, a printer, etc.) for viewing or recovering the product of the statistical analysis.
  • the invention provides an integrated system comprising a computer or computer readable medium comprising a database with at least one data set that corresponds to genotypes for genetic markers.
  • the system also includes a user interface allowing a user to selectively view one or more databases.
  • standard text manipulation software such as word processing software (e.g., Microsoft Word™ or Corel
  • WordPerfect™ and database or spreadsheet software (e.g., spreadsheet software such as
  • Paradox™ can be used in conjunction with a user interface (e.g., a GUI in a standard operating system such as a Windows, Macintosh or Linux system) to manipulate strings of characters.
  • the invention also provides integrated systems for sample manipulation incorporating robotic devices as previously described.
  • An input device for entering data to the digital computer to control high throughput liquid transfer by the robotic liquid control armature and, optionally, to control transfer by the armature to the solid support is commonly a feature of the integrated system.
  • Integrated systems for genetic marker analysis of the present invention typically include a digital computer with one or more of high-throughput liquid control software, image analysis software, data interpretation software, a robotic liquid control armature for transferring solutions from a source to a destination operably linked to the digital computer, an input device (e.g., a computer keyboard) for entering data to the digital computer to control high throughput liquid transfer by the robotic liquid control armature and, optionally, an image scanner for digitizing label signals from labeled probes hybridized, e.g., to expression products on a solid support operably linked to the digital computer.
  • the image scanner interfaces with the image analysis software to provide a measurement of, e.g., differentiating nucleic acid probe label intensity upon hybridization to an arrayed sample nucleic acid population, where the probe label intensity measurement is interpreted by the data interpretation software to show whether, and to what degree, the labeled probe hybridizes to a label.
  • the data so derived is then correlated with phenotypic values using the statistical models of the present invention, to determine the correspondence between phenotype and genotype(s) for genetic markers, thereby, assigning chromosomal locations.
  • Optical images e.g., hybridization patterns viewed (and, optionally, recorded) by a camera or other recording device (e.g., a photodiode and data storage device) are optionally further processed in any of the embodiments herein, e.g., by digitizing the image and/or storing and analyzing the image on a computer.
  • a variety of commercially available peripheral equipment and software is available for digitizing, storing and analyzing a digitized video or digitized optical image, e.g., using PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™ or WINDOWS95™ based machines), MACINTOSH™, LINUX, or UNIX based (e.g., SUN™ work station) computers.
  • IM (Family): $y = a_{1f}\,x_1 + a_{2f}\,x_2 + e$
  • MQM (Family): $y = \sum_i \{ a_{1f}(i)\,x_1(i) + a_{2f}(i)\,x_2(i) \} + e$
  • HAPLO-MQM (Haplotype): $y = \sum_i \{ a_{h(1f)}(i)\,x_1(i) + a_{h(2f)}(i)\,x_2(i) \} + e$
  • HAPLO-MQM+ (Haplotype): [equation not recoverable from the source text]
  • y: trait value of a given F2 plant in family f
  • a LD linkage disequilibrium
  • f(+) frequency of active QTL allele
  • $h^2_{QTL} = \sigma^2_{QTL} / (\sigma^2_{QTL} + \sigma^2_a + \sigma^2_e)$, where $\sigma^2_{QTL}$ is the variation induced by the QTL if it segregates in a single F2:3 population, $\sigma^2_a$ refers to genetic variation when also all other QTLs segregate and $\sigma^2_e$ is the residual variation;
  • b We used (approximate) thresholds for QTL detection; For IM $\chi^2(60; 0.001) \approx 100$.
  • HAPLO-MQM and HAPLO-MQM+ $\chi^2(15; 0.001) \approx 38$ in the 1st and 2nd set and $\chi^2(37; 0.001) \approx 69$ in the 3rd set; c Lowest and highest QTL peak in the regions of the simulated QTLs.
  • d IM average QTL likelihood on chromosomes where no QTLs were simulated; No results (-) when each chromosome contains a QTL.
  • e HAPLO-MQM and HAPLO-MQM+ average QTL likelihood on chromosomes 1-10, regions of 30 cM on each side of simulated QTLs excluded;
  • progeny e.g. 300, 3000
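
The approximate detection thresholds quoted in footnote b (about 100, 38, and 69 for 60, 15, and 37 degrees of freedom at the 0.001 level) can be checked with a short calculation. The sketch below is only an illustration: it uses the Wilson-Hilferty normal approximation to the chi-square quantile, which is an assumption made here to stay within the Python standard library and is not stated to be the method used in the source. The function name chi2_quantile_wh is hypothetical.

```python
from statistics import NormalDist


def chi2_quantile_wh(df: int, upper_tail: float) -> float:
    """Approximate the chi-square value exceeded with probability `upper_tail`.

    Uses the Wilson-Hilferty cube-root normal approximation (an assumption of
    this sketch, not the source's stated method).
    """
    z = NormalDist().inv_cdf(1.0 - upper_tail)  # standard normal quantile
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


# Degrees of freedom taken from footnote b; 0.001 is the quoted significance level.
for df in (60, 15, 37):
    print(df, round(chi2_quantile_wh(df, 0.001), 1))
```

Under this approximation the three values land close to the quoted thresholds of 100, 38, and 69.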

Abstract

Methods for mapping a phenotypic trait to a corresponding chromosomal location are provided. Statistical methods which correlate pedigrees with multiple genetic markers, the haplotype, to determine identical-by-descent (IBD) data are employed to map phenotypic traits. The statistical models provided are a HAPLO-IM+ model, a HAPLO-MQM model, and a HAPLO-MQM+ model. These statistical methods are applied to map traits determined alternatively by single genes or by quantitative trait loci. Methods of marker assisted selection (MAS) using a variety of genetic markers are provided. Plants selected by MAS using the methods are provided. Additionally, methods for cloning nucleic acids corresponding to phenotypic traits that are in linkage disequilibrium with genetic markers, and for transducing them into plant cells, are provided. Transgenic plants transduced with the cloned nucleic acids corresponding to phenotypic traits, e.g. QTL, are provided.

Description

MQM MAPPING USING HAPLOTYPED PUTATIVE QTL-ALLELES: A SIMPLE APPROACH FOR MAPPING QTL'S IN PLANT BREEDING POPULATIONS
CROSS REFERENCE TO RELATED APPLICATIONS
This application claims priority to and benefit of US Provisional Applications 60/173,985, filed Dec. 30, 1999; 60/175,117, filed Jan. 6, 2000; and 60/180,330, filed Feb. 4, 2000, the disclosures of which are incorporated herein in their entirety for all purposes.
FIELD OF THE INVENTION
The present invention relates to the field of plant breeding. In particular, it relates to the mapping of QTLs using statistical models to correlate haplotypes with phenotypic traits.
BACKGROUND OF THE INVENTION
Over the last 60 to 70 years the contribution of plant breeding to agricultural productivity has been spectacular (Smith (1998) 53rd Annual corn and sorghum research conference, American Seed Trade Association, Washington, D.C.; Duvick (1992) Maydica 37: 69). This has happened in large part because plant breeders have been adept at assimilating and integrating information from extensive evaluations of segregating progeny derived from multiple crosses of elite, inbred lines. Conducting such breeding programs requires extensive resources. A commercial maize breeder, for example, may evaluate 1,000 to 10,000 F3 topcrossed progeny derived from 100 to 200 crosses in replicated field trials across wide geographic regions. Despite such significant investments, there is evidence that the gains of the past will be difficult to sustain with current resources (Smith (1998), supra).
One of the motivations for developing molecular marker technologies has been the possibility to increase breeding efficiency through marker assisted selection (MAS). The development of ubiquitous polymorphic genetic markers that span the genome has made it possible for quantitative and molecular geneticists to investigate what Edwards et al. (1987) Genetics 116:113, referred to as the numbers, magnitudes and distributions of quantitative trait loci (QTLs). To date, molecular markers have been used to assist introgression breeding of single genes from transgenic and unadapted germplasm. However, the identification of thousands of QTL from hundreds of experiments has had little impact on the improvement of quantitative traits.
Since 1987 virtually all published reports on QTL mapping in crop species have been based on the use of the bi-parental cross (Lynch and Walsh (1997) Genetics and Analysis of Quantitative Traits Sinauer Associates, Sunderland). Typically, this experimental protocol involves deriving 100 to 300 segregating progeny from a single cross of two divergent inbred lines. The segregating progeny are genotyped for multiple marker loci and evaluated for one to several quantitative traits in several environments. QTLs are then identified as significant statistical associations between genotypic values and phenotypic variability among the segregating progeny. The strength of this experimental protocol comes from the utilization of the inbred cross because the resulting F1 parents all have the same linkage phase. Thus, after selfing of the F1 plants, all segregating progeny (F2) are informative and linkage disequilibrium is maximized. However, the strategy has some severe limitations.
Theoretical considerations (Soller et al. (1978) Biometrics 34:47; Jansen (1994) Genetics 138:871; Zeng (1994) Genetics 136:1457), Monte Carlo simulations (Van Ooijen (1994) Theor Appl Genet 84:517; Beavis (1994) supra; Beavis (1998) QTL Analyses: Power, Precision and Accuracy, in Molecular Analysis of Complex Traits, AH Paterson (ed) pp 145-161, CRC Press), and recent experimental results (Openshaw and Frascaroli (1997) 52nd Annual corn and sorghum research conference, pp 44-53. American Seed Trade Association, Washington D.C.) have clearly shown that QTL studies in plant species have been inadequate for estimating numbers, magnitudes and distribution of QTLs for most quantitative traits. These studies show there is little power to identify markers linked to QTLs or to accurately estimate their genetic effects, unless a large number of progeny are evaluated. Furthermore, inferences about identified QTL and their estimated genetic effects are limited to the sample of progeny evaluated in the experiment. Therefore, additional evaluation in samples of progeny from other crosses is needed before inferences can be extended. From a breeding perspective, this is a severe limitation. In essence, the experimental protocol separates QTL identification from the breeding process. Beavis (1998), supra, suggested that the lackluster application of molecular techniques to crop improvement has been due in part to failure to integrate QTL identification methods into existing breeding methods.
Here we present a novel approach that exploits data correlating genetic haplotypes with phenotypic traits in interrelated plant populations. This new approach addresses the shortcomings of prior methods and provides for generalizable mapping data using smaller breeding populations, thus resulting in improved efficiency of QTL mapping as well as other features which will become apparent upon review of this disclosure.
SUMMARY OF THE INVENTION
The present invention provides a novel and powerful method for mapping phenotypic traits in multiple related plant families. Central to this method is the clustering of the original parents into groups on the basis of their haplotype for multiple genetic markers. The effect of a putative genetic locus is then modeled per haplotype group, instead of per family of progeny, using statistical models which correlate the haplotype with numerical values assigned to the phenotypic trait.
Embodiments of the invention provide for methods of mapping phenotypic traits to a corresponding chromosomal location. The invention provides for mapping phenotypic traits of agronomic importance, including such properties as yield, grain composition, and insect, disease and drought resistance. Progeny of multiple related crosses are assigned numerical values corresponding to a phenotypic trait, and their genotype at clusters of genetic marker loci, i.e., their haplotype, is ascertained. Using statistical methods, correspondence between the haplotype and the phenotypic value is determined. In preferred embodiments, QTL are mapped.
Although the methods of the invention can be practiced in essentially any plant population, in preferred embodiments, the methods of the invention are used to map phenotypic traits in corn, soybean, sunflower, sorghum, wheat, rice and canola.
In some embodiments, the related parents used to generate progeny are inbred lines. In a preferred embodiment the lines are between 0 and 85% related. In one embodiment, the progeny are derived from a topcross and/or a backcross.
In preferred embodiments, the statistical method utilized accounts for identical-by-descent (IBD) data derived by correlating pedigrees and haplotypes for the genetic markers under evaluation. In an especially preferred embodiment, the model is selected from among a HAPLO-IM+ model, a HAPLO-MQM model and a HAPLO-MQM+ model.
The invention provides for the use of molecular genetic markers to define genetic haplotypes. Such markers are restriction fragment length polymorphisms (RFLP), isozyme markers, allele specific hybridization (ASH), amplified variable sequences of the plant genome, self-sustained sequence replication, simple sequence repeat (SSR), single nucleotide polymorphism (SNP), or arbitrary fragment length polymorphisms (AFLP). In some embodiments, the haplotypes are determined by high throughput screening methods. In an embodiment, one or more steps of the method are performed with computer assistance.
Another aspect of the invention provides for the selection of phenotypic traits in a plant breeding population, as well as plants selected using the methods of the invention. In a preferred embodiment, the selection is performed by marker assisted selection (MAS). The invention further provides for the cloning of a nucleic acid sequence or fragment in linkage disequilibrium with a phenotypic trait, and for transducing the cloned nucleic acid sequence or fragment into a plant. In a preferred embodiment, the nucleic acid is operably linked to a promoter in an expression cassette. The production and breeding of transgenic plants incorporating these nucleic acid sequences or fragments is also provided by the invention, as are the transgenic plants so produced.
BRIEF DESCRIPTION OF THE FIGURES
Figure 1. A line drawing which graphically depicts relationships between multiple related crosses used in plant breeding. D indicates donor parents; E indicates elite parents; P indicates progeny; a shaded box indicates a bi-parental cross made; a solid arrow indicates parent-offspring relation; a dashed arrow indicates a half-sib relation; and a dotted arrow indicates an indirect relation.
Figure 2. Diagram of the interval mapping (IM) and multiple-QTL mapping (MQM) methods. 1 indicates QTL-allele substitution-effect per family; 2 indicates QTL-haplotype effects explain within-family variance; and 3 indicates QTL-haplotype effects explain within- and between-family variances.
Figure 3 a and b. A schematic depiction of haplotypes in a "window" of four markers for parents with identical and different haplotypes, respectively. Numbers indicate type of marker allele.
Figure 4. A line graph showing a comparison of the IM, HAPLO-IM+, HAPLO-MQM and HAPLO-MQM+ models for a single chromosome (simulation 2.1).
DETAILED DESCRIPTION
In the practice of breeding for agronomically important crops such as maize and soybean, the breeder annually generates many crosses. Typically, a few elite inbred lines or varieties are crossed with a wide range of new inbred lines or varieties to generate a large number of segregating crosses. For obvious reasons, the number of progeny per cross is small (often around 10, seldom more than 50), but the total number of progeny tested is relatively large. Current techniques for QTL mapping (like interval mapping) do not exploit the potential power of plant breeding experiments. Moreover QTL mapping and breeding are still separate processes without much coherence, which is a serious drawback.
The present invention provides a novel and powerful method for joint analysis of multiple related families. The key to increasing the power of the analysis as compared to the prior art, is the clustering of the original parents into groups on the basis of their haplotype. The effect of a QTL on the phenotype is then modeled per haplotype group instead of per family. This permits an examination of the effects of haplotype-alleles across families. Simulations of realistic plant breeding schemes demonstrate a significant increase in power of QTL detection compared to existing methods. The present invention offers new opportunities for the mapping and exploitation of QTLs in commercial breeding activities.
DEFINITIONS
Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al. (1994) Dictionary of Microbiology and Molecular Biology, 2nd ed.; Walker (ed) (1988) The Cambridge Dictionary of Science and Technology; Rieger et al. (eds) (1991) The Glossary of Genetics, 5th ed. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, preferred methods and materials are described. For the purpose of the present invention, the following terms are defined below.
The term "phenotypic trait" or phenotype, refers to an observable trait of an organism. The phenotypic trait can be observable to the naked eye, or by any other means of evaluation known in the art, e.g., microscopy, biochemical analysis, etc. In some cases, a phenotype is directly controlled by a single gene, a "single gene trait." In other cases, a phenotype is the result of several "quantitative trait loci" acting together. Such a phenotype can generally be described in quantitative terms, e.g., height, weight, oil content, days to germination, etc, and therefore can be assigned a "phenotypic value" which corresponds to a quantitative value for the phenotypic trait.
The term "genotype" refers to the genetic constitution, as contrasted with the observable trait, i.e., the phenotype. Thus, a genotype is an individual's genetic make-up for all the genes in its genome (chromosome complement). A "haplotype" is an individual's genotype at multiple, generally linked, loci. For example, a haplotype can be an individual's genotype for multiple loci or genetic markers on a single chromosome. In this case, the term "chromosomal haplotype" is, alternatively, used. Similarly, an individual's genotype for multiple loci (or markers) within a defined region of a chromosome is, optionally, referred to as a "regional haplotype."
"Genetic markers" are loci, or DNA sequences, which both vary (are polymorphic) between individuals in a population, and can be detected by one or more analytic methods, e.g., RFLP, AFLP, isozyme, SNP, SSR, and the like. "Marker Assisted Selection" or "MAS" refers to the practice of selecting for desired phenotypes among members of a breeding population using genetic markers.
The term "correspondence" in the context of the invention refers to a genetic marker locus or loci in linkage disequilibrium with a phenotypic trait. A genetic marker is said to be in "linkage disequilibrium" with a gene or genes that control part or all of the variance for a trait (or locus), when the marker and the gene that controls the variance of the trait (or locus) do not segregate independently, i.e., they are inherited together more often than is expected by chance.
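As an illustration of the definition above, linkage disequilibrium between a marker and a gene is commonly summarized by the coefficient D = p(AB) - p(A)p(B), which is zero when the two loci segregate independently. The following is a minimal sketch under that assumption; the coefficient D, the allele labels "A", "a", "B", "b", and the gamete counts are hypothetical and are not taken from the source.

```python
from collections import Counter


def linkage_disequilibrium(gametes):
    """Return D = p(AB) - p(A) * p(B) for (marker_allele, gene_allele) pairs.

    'A' and 'B' are hypothetical labels for one marker allele and one gene allele.
    """
    n = len(gametes)
    counts = Counter(gametes)
    p_ab = counts[("A", "B")] / n
    p_a = sum(c for (marker, _), c in counts.items() if marker == "A") / n
    p_b = sum(c for (_, gene), c in counts.items() if gene == "B") / n
    return p_ab - p_a * p_b


# Hypothetical gamete samples: one with co-inherited alleles, one independent.
linked = [("A", "B")] * 45 + [("a", "b")] * 45 + [("A", "b")] * 5 + [("a", "B")] * 5
independent = ([("A", "B")] * 25 + [("a", "b")] * 25
               + [("A", "b")] * 25 + [("a", "B")] * 25)

print(linkage_disequilibrium(linked))
print(linkage_disequilibrium(independent))
```

In the first sample the alleles A and B are inherited together more often than expected by chance, so D is positive; in the second sample D is zero.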
INTRODUCTION
While much of the ensuing discussion relates to the mapping of QTLs, it will be appreciated that the methods of the invention are equally applicable to the mapping of other genetic loci, e.g., those underlying single gene traits. Accordingly, even where QTLs are referred to exclusively for the sake of clarity and simplicity, genes underlying single gene traits are to be understood to be assessable by essentially similar methods.
Because applied breeding programs evaluate large numbers of progeny derived from multiple crosses, they can provide the necessary phenotypic data for routinely mapping QTLs for a wide range of agronomic traits. Thus, by integrating QTL analyses into existing breeding strategies, the power, precision and accuracy associated with large numbers of progeny can be attained. Furthermore, inferences about QTLs can be drawn across the breeding program rather than being limited to the sample of progeny from a single cross. Integrating QTL identification into existing breeding programs makes the information much more valuable for marker-assisted selection (MAS), because the QTLs apply to agronomically challenging situations in the field. This is more efficient than the current strategy of linking the sequential discrete processes, which comprise the production of progeny from carefully chosen contrasting inbred lines, the identification of QTLs, the assembly of QTLs, and testing and evaluation of these QTLs in numerous backgrounds through modified backcrossing strategies.
Numerous statistical methods have been developed for QTL mapping in experimental populations (see, e.g., Jansen (1996) Trends Plant Sci 1:89), and there has been some effort to adapt the methods of analysis developed for bi-parental experimental populations to (diallel) breeding populations (Rebai and Goffinet (1993) Theor Appl Genet 86:1014). However, the principles that underlie analysis methods for the bi-parental inbred cross are not adequate for application to breeding populations, because the genetic structures of cross and population are different. The selection of inbred lines used in the experimental paradigm for the bi-parental cross is often based on maximizing phenotypic and molecular marker differences between the inbred lines. As a consequence, the F1 is fully informative, linkage disequilibrium is maximized, the linkage phase is known, there are only two QTL alleles, and, except for backcross progeny, the frequency of each QTL allele is 0.5. In contrast, the selection of lines for breeding is based on maximizing genetic variability of traits useful for agronomic performance. As a consequence, the crosses are not necessarily informative at all marker loci and QTLs, and linkage disequilibrium exists among the (F2) progeny within families, but not necessarily across the breeding population. The linkage phase is not consistent across the breeding population, multiple QTL alleles can exist and the frequency of each will vary between 0 and 1.
Recently, Xu and co-workers have proposed fixed and random effect approaches for combining multiple line crosses in plant breeding populations (Xu (1998) Genetics 148:517; Xie et al. (1998) Genetics 149:1139). Their fixed-model strategy nests QTL effects within families; each family has its own set of parameters for QTL effects and it is assumed that residual errors are identically distributed. The random effect model also treats QTL effects as nested within families, but uses a variance component parameter for modeling QTL variance over families. This strategy of treating progeny from multiple crosses as independent families is straightforward and provides a robust tool for analyzing multiple plant breeding families. The present invention takes the relationships between the families into account, thereby providing significant enhancements in identifying QTL.
One simple approach is to apply the existing methods developed for single line crosses and to use this software for analyzing multiple populations one-by-one. The QTL likelihood curves are then summed up in order to generate an overall QTL likelihood. This approach is very straightforward, but does not model relationships between families. As for the method, one has a choice of interval mapping (Lander and Botstein (1989) Genetics 121:185), regression mapping (Haley and Knott (1992) Heredity 69:315) or MQM mapping (Jansen (1994) Genetics 138:871). See, e.g., Spelman et al. (1996) Genetics 144:1799, for an illustration with multiple dairy cattle families.
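The one-by-one approach described above can be sketched as follows, assuming each family has already been analyzed separately and its QTL likelihood profile evaluated on the same grid of map positions. The function name and the numeric profiles are hypothetical placeholders, not results from the source.

```python
def sum_likelihood_curves(curves):
    """Sum per-family QTL likelihood profiles position by position."""
    lengths = {len(curve) for curve in curves}
    if len(lengths) != 1:
        raise ValueError("all curves must be evaluated on the same map positions")
    return [sum(values) for values in zip(*curves)]


# Hypothetical likelihood profiles for three families at five map positions.
family_curves = [
    [0.2, 0.9, 1.8, 1.1, 0.3],
    [0.1, 0.4, 1.5, 1.6, 0.5],
    [0.3, 0.7, 2.1, 0.9, 0.2],
]

overall = sum_likelihood_curves(family_curves)
peak_index = max(range(len(overall)), key=overall.__getitem__)
print(overall, peak_index)
```

Because each family contributes its own curve independently, nothing in this calculation reflects shared parents or shared ancestral segments, which is the limitation noted above.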
In reality, a plant breeding population usually consists of multiple related instead of unrelated families. Clearly, higher mapping resolution and power can be obtained by methods properly including these relationships into the model. Current technologies are making such detailed information about relationships readily available. It is not unrealistic to assume that plant-breeding populations will be fingerprinted on a regular basis at 200-500 marker loci, and with chip technology soon at 1000 or more loci. Marker information can monitor identity-by-descent (IBD) from parent to offspring throughout the populations. Figure 1 shows a typical design of multiple related crosses. There are clear and direct half-sib relationships between families if the same line is used as parent in two or more different crosses. In addition, other more ancestral relationships between families can exist as parents of the families can share identical-by-descent (IBD) regions. In an ongoing breeding program genomic blocks are moved around and recombined in new configurations. Even if full information is not available about breeding history, these IBD regions can still be detected with reasonable certainty from fingerprint information by using simple probabilistic rules. If two parents have an identical marker haplotype in a certain region, for example, fingerprinted by three or four markers, then it is more likely that they are identical-by-descent (IBD) than that they are just identical-in-state (IIS). It is, therefore, likely that the parents share the same ancestral genomic block and, if we assume the presence of a putative QTL within this block, that they have the same QTL genotype. In particular, with multi-allelic markers one has powerful tools for identification of IBD regions (prior to QTL analysis) and there will be only small unknown regions for each pair of parents.
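The window-based reasoning above can be illustrated with a small sketch. It simply flags windows of consecutive markers in which two inbred parents carry identical alleles and treats those windows as putative IBD regions. This exact-match rule is a simplifying assumption: the probabilistic rules referred to in the text are not specified here, and the function name and allele codes are hypothetical.

```python
def shared_windows(hap_a, hap_b, window=4):
    """Return start indices of marker windows where two parents carry identical alleles."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same ordered markers")
    return [
        start
        for start in range(len(hap_a) - window + 1)
        if hap_a[start:start + window] == hap_b[start:start + window]
    ]


# Hypothetical multi-allelic marker scores along one chromosome for two inbred parents.
parent_1 = [1, 2, 2, 3, 1, 4, 2, 1]
parent_2 = [1, 2, 2, 3, 2, 4, 2, 2]

print(shared_windows(parent_1, parent_2, window=4))
print(shared_windows(parent_1, parent_2, window=3))
```

A shorter window flags more regions but gives weaker evidence that a match reflects descent rather than mere identity in state, which is why multi-allelic markers are helpful.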
The present invention exploits this kind of IIS and IBD information. We previously developed the MQM approach for analyzing experimental populations (Jansen (1996) supra; and Beavis, PCT Application WO 99/32661, published January 7, 1999, QTL MAPPING IN PLANT BREEDING POPULATIONS, herein incorporated in their entirety). The present invention for mapping phenotypic traits in plant breeding populations significantly extends the previous methodologies and is described in detail herein.
MODELING QTL ACTIVITY IN MULTIPLE RELATED FAMILIES
In this section, the use of standard approaches (IM, MQM) is first addressed; next, the concepts of HAPLO-IM+ and HAPLO-MQM+ are introduced; and finally, the HAPLO-MQM approach is described. While these statistical methods constitute preferred embodiments of the present invention, other statistical approaches can, similarly, be employed in conjunction with IBD information as an alternative to, or in combination with, the methods detailed herein.
One important feature of the methods of the present invention is the clustering of the original parents into groups on the basis of their haplotype for multiple genetic markers. The effect of a putative genetic locus is then modeled per parental haplotype group, instead of per family of progeny as previously performed. Indeed, as will be apparent to one of skill in the art, other existing QTL analysis methods, e.g., frequentist or Bayesian analysis procedures, with fixed, random, or mixed QTL effects, etc., can also be utilized. Figure 1 illustrates a typical design of multiple related crosses. Figure 2 provides an overview of the methods of analysis.
Before proceeding to the modeling of multiple related families, the specifics of a single F2:3 testcross population, used as an illustrative example in the current discussion, are described.
A single F2:3 testcross population: An F2:3 population is, in short, a standard F2 population in which the phenotypic scores of F2 individuals are obtained not by evaluation of the F2 plants themselves but rather by evaluation of F3 offspring of the F2 plants. Each F2 individual is crossed with the same homozygous tester to generate a number of F3 offspring per F2 plant, and those F3 plants are evaluated for traits of interest. As in a standard F2, each F2 plant has genotype and phenotype scores, but in an F2:3 the trait value of an F2 plant is computed as the average trait value of its F3 testcross offspring.
For the purposes of this discussion, it is assumed that the two parents have different QTL genotypes, Q1Q1 and Q2Q2, respectively. The F2 population is a mixture of the three QTL genotypes: Q1Q1, Q2Q2 and Q1Q2. Each F2 plant is crossed to a tester QQ to generate an F2:3 topcross. Table 1 shows two characteristics important for QTL modeling in this testcross design, namely (a) only heterozygous F2 plants generate segregating F3 offspring and (b) the expected trait value of a heterozygous F2 plant is halfway between the expected trait values of the homozygous F2 plants. The latter implies that QTL models will not contain parameters for dominance. The former implies that we can expect heterogeneous residual variance. However, it is assumed that the environmental variation is much larger than the variation induced by segregation, so that all F2 individuals will have (approximately) identically distributed residual error. In the next sections a number of different QTL models are described. Table 2 summarizes all the models.
Analyzing multiple related F2:3 families via IM and MQM:
IM was developed for analyzing a single family obtained from one bi-parental cross (Lander and Botstein (1989) Genetics 121:185). In the case of multiple families, one can analyze each family separately by IM and sum up the QTL likelihood over the families. This straightforward approach does not model QTL activity across families (in statistical terms, QTLs are nested within families). The allele effects (a_1 and a_2 in Table 1) need to be indexed by family number: a_{1f} and a_{2f}.
The model for a given F2 plant in a certain family is now described. This F2 plant can obtain 0, 1 or 2 copies of the QTL allele of the first parent, Q1. In the first case it has received two copies of allele a_{2f} from the second parent and the model reads
y = 2a_{2f} + e,
where e is the residual error. In the second case it has received one allele from each parent (alleles a_{1f} and a_{2f}) and the model reads
y = a_{1f} + a_{2f} + e.
In the last case the plant has received two copies of allele a_{1f} from the first parent and the model reads
y = 2a_{1f} + e.
By using indicator variables, the three models above are formulated in one expression. Let x_1 and x_2 = 2 - x_1 be the indicators for the number of copies of the Q1 and Q2 allele transmitted to the F2 plant. The probability distribution for x_1 can be easily calculated on the basis of (flanking) marker data. The interval mapping (mixture) model for the phenotype y reads
y = a_{1f} x_1 + a_{2f} x_2 + e. (IM model)
Let Nfam be the number of families. QTLs are nested within families and in total there are 2Nfam regression parameters and Nfam residual variance parameters.
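The way the indicator variables fold the three genotype cases into one expression can be checked with a few lines of code. This is a minimal sketch, not the software referred to in the text; the function names and the numeric allele effects are hypothetical, and the genotype probabilities would in practice come from flanking marker data.

```python
def im_expected_phenotype(x1, a1f, a2f):
    """Expected phenotype under y = a1f*x1 + a2f*x2 + e, with x2 = 2 - x1."""
    if x1 not in (0, 1, 2):
        raise ValueError("x1 must be 0, 1 or 2 copies of the first parent's allele")
    x2 = 2 - x1
    return a1f * x1 + a2f * x2

def im_mixture_mean(prob_x1, a1f, a2f):
    """Mean phenotype when x1 is only known as a probability distribution.

    prob_x1 maps each copy number (0, 1, 2) to its probability.
    """
    return sum(p * im_expected_phenotype(x1, a1f, a2f) for x1, p in prob_x1.items())

def im_parameter_count(n_fam):
    """Parameters when QTLs are nested within families (single QTL)."""
    return {"regression": 2 * n_fam, "residual_variance": n_fam}

# Hypothetical allele effects for one family.
for x1 in (0, 1, 2):
    print(x1, im_expected_phenotype(x1, a1f=1.5, a2f=0.5))
print(im_parameter_count(60))
```

With x1 = 1 the expected value lies exactly halfway between the two homozygous classes, which is why the models carry no dominance parameter.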
So far, it has been assumed that the trait was encoded by a single QTL only. That assumption is now altered in that the trait value is obtained as the sum of all the QTL allele effects. Let NQTL be the number of QTLs. In the IM model we used a_{1f} and a_{2f} to denote the allele effects of a single QTL in family f. An extra index to indicate the QTL number is now included, and a_{1f}(i) and a_{2f}(i) are written for the allele effects of the i-th QTL in the f-th family (f = 1..Nfam); analogously, x_1(i) and x_2(i) = 2 - x_1(i) are written for the indicators. The QTL model for a given F2 plant in family f reads
y = ∑_i { a_{1f}(i) x_1(i) + a_{2f}(i) x_2(i) } + e, (MQM model)
where summation is over QTLs. There are 2NQTLNfam regression parameters and in addition Nfam parameters for residual error. Unfortunately, the model is over-parameterized, i.e. one cannot estimate all regression parameters individually. In fact, one usually reparameterizes in terms of intercept and allele-substitution effects, of which there are in total (NQTL+1)Nfam summed over all families. In addition there are Nfam parameters for residual variance.
If two parents are related, then they can share larger genomic blocks, sometimes even whole genomes, and the markers in these regions are not informative, simply because they do not segregate. Unfortunately, if there are large uninformative genome regions, MQM mapping with several QTLs and/or cofactors becomes cumbersome. Of course, there can be a segregating QTL located close to a non-segregating cofactor marker. The effect of the QTL can be eliminated in this case by replacing the observed marker scores by missing values in the given family. In this way, a segregating cofactor marker is mimicked. Information about inheritance then comes from segregating markers in the same region. However, there is only information about IBD if such segregating markers are located near the cofactor marker. Thus for a cofactor marker not segregating in a given family, one can adapt the MQM software as follows: fit the cofactor in the given family if and only if, for that family, a segregating marker is located less than 20 cM away from the cofactor marker; in that case, replace the genotype scores of the cofactor marker by missing values.
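The per-family cofactor rule described above can be written as a small decision function. This is a sketch under stated assumptions rather than the adapted MQM software itself: the function name and the three return labels are hypothetical, and all positions are assumed to lie on the same linkage group.

```python
def cofactor_action(cofactor_pos_cm, cofactor_segregates,
                    segregating_marker_positions_cm, max_distance_cm=20.0):
    """Decide how to treat one cofactor marker in one family.

    Returns 'fit' if the cofactor segregates in the family,
    'fit_as_missing' if it does not segregate but a segregating marker lies
    less than max_distance_cm away (its scores are then set to missing),
    and 'skip' otherwise.
    """
    if cofactor_segregates:
        return "fit"
    has_nearby_segregating_marker = any(
        abs(pos - cofactor_pos_cm) < max_distance_cm
        for pos in segregating_marker_positions_cm
    )
    return "fit_as_missing" if has_nearby_segregating_marker else "skip"

# Hypothetical family: cofactor at 50 cM does not segregate.
print(cofactor_action(50.0, False, [65.0, 120.0]))
print(cofactor_action(50.0, False, [75.0, 120.0]))
```

The 20 cM default comes from the text; the strict inequality mirrors "less than 20 cM".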
Analyzing multiple related F2:3 families via HAPLO-IM+ and HAPLO-MQM+:
The following approach for QTL analysis is applied to data from multiple related families (see Figure 3). To fit a QTL at a certain map position, a window around this map position is defined, the number of different marker haplotypes in this window is counted, and the parents are grouped according to their haplotype in the window. In the QTL model, the genetic (QTL) effect is defined per haplotype. Alleles have been defined by a_{1f} and a_{2f}, indicating the allele of the first and second parent of the f-th family. Let NHaplo(i) be the number of different haplotypes at the i-th QTL. The haplotypes are now numbered from 1 to NHaplo(i). Let h(1f) and h(2f) indicate the haplotype of the allele from the first and second parent, respectively. Then, a_{h(1f)} and a_{h(2f)} indicate the allele type after clustering into haplotype alleles. This will depend on the size of the window, with larger windows having more different haplotypes. The models now include a parameter for any between-family differences not yet accounted for:
y = a_{h(1f)} x_1 + a_{h(2f)} x_2 + u_f + e, (HAPLO-IM+ model)
and
y = ∑_i { a_{h(1f)}(i) x_1(i) + a_{h(2f)}(i) x_2(i) } + u_f + e, (HAPLO-MQM+ model)
where u_f is the parameter for family effect and a_{h(1f)} and a_{h(2f)} are the effects of the haplotypes h(1f) and h(2f) inherited from parent 1 and 2 of the f-th family, respectively. Note that the result is not a simple "QTLs nested within families" analysis; the QTL effects are still defined for haplotypes and not per family. Moreover, this approach can handle cases in which two parents share large(r) genomic blocks, and when a (putative) QTL is therefore not segregating in their offspring. Due to the modeling of QTLs across families, it is now possible to estimate all parameters. There are altogether NQTLNfam effects a_{1f}(i) and a_{2f}(i), but in HAPLO-IM+ and HAPLO-MQM+ they are 'mapped' to the smaller set of NHaplo(i) parameters for the effects of the haplotype clusters (Fig. 2). The model includes NHaplo(i)-1 free parameters for the effect of the i-th QTL and this number can be significantly lower than the Nfam parameters in the "full parameterization" with QTLs nested within families as in IM and MQM. The number of QTL parameters can be reduced without loss of information. The '+' in the name HAPLO-MQM+ indicates that parameters for family effects have been included. In the following, models without parameters for family effects are considered, and the method is indicated without '+' by 'just' HAPLO-MQM.
Analyzing multiple related F2:3 families via HAPLO-MQM:
The IBD concept is based on between-family genotypic information. The next step is to additionally explore the potential of between-family phenotypic information. The F1 of one family can be homozygous Q1Q1 for a given QTL, others can be homozygous Q2Q2. Clearly non-segregation of (major) genes will show up only as different mean trait values of the families. In some cases there can be no other sources generating additional between-family variation. The effects of QTL alleles can then be estimated from populations in which they are segregating, and twice their effect contributes to the mean of any family in which they are not segregating. The multiple-QTL model for the phenotype y of a given F2 plant in family f is equivalent to the MQM model:
y = ∑_i { a_{h(1f)}(i) x_1(i) + a_{h(2f)}(i) x_2(i) } + e, (HAPLO-MQM model)
where summation is over QTLs. The HAPLO-MQM approach fully exploits the within- families and between-families phenotypic information. HAPLO-MQM exploits the full power of QTL detection by combining the sources of information. Gain of power will be highest if between-family differences originate mainly from multiple additively acting and "detectable" QTLs only (see discussion).
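The grouping of parents by window haplotype, and the way a shared haplotype effect enters the family mean, can be sketched as follows. This is an illustration under assumptions, not the patented implementation: parents are taken to be inbred lines coded as one allele per marker, the parent names, genotypes and effect values are hypothetical, and only a single QTL is modeled.

```python
def cluster_parents_by_haplotype(parent_genotypes, window):
    """Group parents by their marker haplotype inside a window.

    parent_genotypes: dict mapping parent name -> sequence of marker alleles.
    window: (start, stop) marker indices, stop exclusive.
    Returns (assignment, n_haplotypes); haplotypes are numbered from 1 in
    order of first appearance.
    """
    start, stop = window
    index_of = {}
    assignment = {}
    for name, alleles in parent_genotypes.items():
        haplotype = tuple(alleles[start:stop])
        if haplotype not in index_of:
            index_of[haplotype] = len(index_of) + 1
        assignment[name] = index_of[haplotype]
    return assignment, len(index_of)

def haplo_expected_phenotype(x1, hap_effects, h1, h2, family_effect=0.0):
    """Single-QTL mean a_h(1f)*x1 + a_h(2f)*x2 + u_f; u_f = 0 drops the '+'."""
    x2 = 2 - x1
    return hap_effects[h1] * x1 + hap_effects[h2] * x2 + family_effect

# Hypothetical parents fingerprinted at five markers; window of four markers.
parents = {"P1": "11221", "P2": "11212", "P3": "21221", "P4": "11221"}
groups, n_haplo = cluster_parents_by_haplotype(parents, window=(1, 5))
print(groups, n_haplo)

# Cross P1 x P3: both parents carry haplotype 1, so the QTL does not
# segregate and every F2 plant has the same expected value, 2 * a_1.
effects = {1: 0.4, 2: -0.1}
print([haplo_expected_phenotype(x1, effects, groups["P1"], groups["P3"]) for x1 in (0, 1, 2)])
```

In the non-segregating cross the prediction is constant at twice the haplotype effect, which is the between-family information that HAPLO-MQM uses in addition to within-family segregation.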
Strategy for QTL detection:
Jansen (1994), Genetics 138:871, described a general two-step MQM procedure for finding markers closely linked to QTLs and for using these markers as cofactors in QTL analysis. First, a set of markers covering the entire genome is selected, these markers are regressed simultaneously, and a statistical elimination procedure is performed to find markers in plausible QTL regions. Such markers are selected via a backward elimination approach on the basis of a 2% significance threshold per marker test. Second, an approach for precision mapping of QTLs within marker intervals is applied. The presence of a QTL for a particular genomic marker interval is tested at a genome-wide 5% significance level, while simultaneously fitting the selected markers from the first step in the model of analysis. Hence, the markers selected in the first step function as cofactors in the model used in the second step. Markers inside a small window around the position under study are not used as cofactors. Genome-wide significance thresholds for MQM mapping can be obtained by simulation ('parametric bootstrapping') as in Jansen (1994), supra. This is a computer-intensive task. In the present discussion an ad-hoc approach is utilized for illustrative purposes: (approximate) thresholds for QTL detection at α = 0.001 per test were used, namely χ2(60; 0.001) ≈ 102 for IM and χ2(15; 0.001) ≈ 38 for HAPLO-MQM and HAPLO-MQM+.
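The second step relies on dropping cofactors that sit too close to the position being tested. A minimal sketch of that filter is given below; the tuple representation of map positions and the default window width are assumptions made for illustration, since the text only speaks of "a small window".

```python
def active_cofactors(test_position, cofactors, window_cm=10.0):
    """Return the cofactors to keep in the model when testing one position.

    test_position and each cofactor are (linkage_group, position_cM) tuples.
    Cofactors on the same linkage group within window_cm of the tested
    position are left out; window_cm is a hypothetical default.
    """
    group, pos = test_position
    return [
        (g, p) for (g, p) in cofactors
        if g != group or abs(p - pos) > window_cm
    ]

# Hypothetical cofactors selected in the first (backward elimination) step.
selected = [(1, 40.0), (1, 95.0), (3, 42.0)]
print(active_cofactors((1, 45.0), selected))
```

Cofactors on other linkage groups are always kept, so they continue to absorb the variation of unlinked QTLs while the linked region is scanned.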
Modeling gene action:
In quantitative genetics, it is the convention to model phenotypic variation within segregating progenies and to ignore between-family information. Thus the models include a parameter for the mean trait value of a population and a parameter for the effect of substitution of one allele by another allele within the population genetic background. In the case of 60 families the IM analysis requires 60 parameters for the family means plus 60 parameters for QTL-allele substitutions, i.e., 120 parameters altogether. The HAPLO-MQM method of the present invention is based on a different paradigm: the effects of haplotyped QTL-alleles are evaluated across families, rather than the effects of allele substitution within families. The present invention provides models which can cope with QTLs segregating in only a subset of the families and which exploit within-family variation, but in addition also consider between-family variation. In 60 related families the number of parents is 120, or less if some parents are used more than once. Thus, IM and HAPLO-MQM require the same maximum number of parameters. The allele effects of segregating and non-segregating QTLs contribute to the differences between families, but there can also be other genetic and non-genetic sources of variation (e.g., epistatic interactions). The HAPLO-MQM+ model includes parameters to account and test for these differences. Note that these additional parameters do not play the role of "mean value of families" as in IM. In fact, they quantify the deviance between the observed trait values and the predicted trait values under the HAPLO-MQM model. This deviance is a measure for how well the sum of estimated allele effects can explain the between-family differences.
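The parameter bookkeeping in this comparison is simple enough to verify directly. The sketch below counts parameters for a single QTL only; the function names and the parent labels are hypothetical, and the haplotype count is given as an upper bound before any clustering of parents.

```python
def im_parameters(n_families):
    """Family means plus allele-substitution effects, one of each per family."""
    return 2 * n_families

def haplo_mqm_max_parameters(crosses):
    """Upper bound for HAPLO-MQM: one allele effect per distinct parent.

    crosses is a list of (parent_1, parent_2) pairs; a parent used in
    several crosses is counted once.
    """
    return len({parent for cross in crosses for parent in cross})

# 60 crosses between 120 distinct parents, then a half-sib design in which
# one hypothetical line "A0" is the first parent of every cross.
distinct = [(f"A{i}", f"B{i}") for i in range(60)]
half_sib = [("A0", f"B{i}") for i in range(60)]
print(im_parameters(60), haplo_mqm_max_parameters(distinct), haplo_mqm_max_parameters(half_sib))
```

Clustering parents by window haplotype can only lower the second number further, since several parents may then share one effect.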
IM and HAPLO-MQM in no-QTL regions:
In the present IBD approach, the parents are clustered into fewer than 120 groups on the basis of their haplotype in a window of, e.g., four markers around the QTL position under study. The same approach is carried out for each marker cofactor (each time leading to different groupings). The number of parameters per QTL (or marker cofactor) is equal to the number of different haplotypes. The models of HAPLO-MQM required ~15 parameters in the 1st set of simulations, ~15 in the 2nd set and ~37 in the 3rd set. In each set of simulations IM takes 60 allele-substitution parameters per QTL. This shows that the number of parameters per QTL can be much lower in HAPLO-MQM than in IM and this has important implications for testing. Under the null-hypothesis the likelihood-ratio test statistic at a fixed map position follows approximately a chi-squared (or F-like) distribution (Jansen (1994), supra). For this test there are ~15 degrees of freedom (d.f.) under HAPLO-MQM in the 1st and 2nd set of simulations, ~37 d.f. under HAPLO-MQM in the 3rd set, but ~60 d.f. under IM. Therefore, in regions where there are actually no QTLs present, the computed QTL likelihood is expected to be on average ~15 and ~37 under HAPLO-MQM, and ~60 under IM. Table 3 and Figure 4 show that this "background" likelihood often takes values in the predicted ranges. An important consequence is that the threshold for genome-wide significance in HAPLO-MQM is much lower than that in multi-family IM. This increases the power of QTL detection using the HAPLO-MQM methods of the invention.
IM versus HAPLO-IM+:
QTL action with only two modes (on or off) is examined for the purpose of illustration. Therefore, two parameters per QTL would have been sufficient. However, the HAPLO-MQM approach requires ~15 parameters per QTL in the 1st and 2nd set, and ~37 in the 3rd set, but the IM approach requires 60 parameters. Therefore IM and MQM are still highly over-parameterized. On the other hand, the additional parameters do not really cause problems: the models have the flexibility to fit the data well; IM can be more over-fitted. Figure 4 allows us to compare the single-QTL approach of IM to that of HAPLO-IM+, which is a single-QTL HAPLO-MQM+ model with family effects included to eliminate effects of the other QTLs. The QTL-likelihood peaks for IM and HAPLO-IM+ are approximately of the same height. However, the "background" likelihood is lower for the latter, which indicates that HAPLO-IM+ is the more powerful approach.
IM versus HAPLO-MQM+:
In general, the power of QTL detection is determined by the ratio between the variance induced by the given QTL and the unexplained residual variance (Jansen 1994, 1996, both supra). In the simulations provided for illustration, h^2_QTL = σ^2_QTL / (σ^2_QTL + σ^2_a + σ^2_e). With IM, the genetic background QTL-effects are part of the unexplained variance.
In these simulations the HAPLO-MQM+ model included 10-30 marker cofactors simultaneously. The aim of MQM is to eliminate the (additive) QTL variation from the error term, so that the genetic background term σ^2_a drops out of h^2_QTL = σ^2_QTL / (σ^2_QTL + σ^2_a + σ^2_e). Depending on the proportion of this QTL component, from small to large increases of power can be realized (Jansen 1994, 1996). For example, in simulation 2.1 each QTL explains 5% of the total variation, i.e., σ^2_QTL / (σ^2_QTL + σ^2_a + σ^2_e) = 0.05. In the ideal case, HAPLO-MQM+ successfully removes the 70% (the genetic background variation), so that σ^2_QTL / (σ^2_QTL + σ^2_e) ≈ 0.16. In simulation 2.3, with a smaller proportion of QTL-induced variation, σ^2_QTL / (σ^2_QTL + σ^2_e) ≈ 0.09. In the simulations, the HAPLO-MQM+ models use altogether ~200-600 parameters for QTLs, still leaving ~2400-2800 degrees of freedom for estimating residual error variance. QTL-likelihood peaks are clearly higher for HAPLO-MQM+ than for IM in simulations 2.1 and 2.2 (Table 3b and Figure 4). Modeling of multiple QTLs can be a second step to increase the power of QTL detection.
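The effect of cofactors on the variance ratio that drives detection power can be made concrete with a short calculation. The variance components below are hypothetical and are not taken from the simulations; the parameter removed_fraction is an assumption introduced here to represent how much of the background genetic variance the cofactors absorb.

```python
def qtl_variance_ratio(var_qtl, var_background, var_error, removed_fraction=0.0):
    """Ratio var_qtl / (var_qtl + remaining background + error).

    removed_fraction = 0 corresponds to single-QTL IM (background QTLs stay
    in the unexplained variance); removed_fraction = 1 is the ideal case in
    which cofactors absorb all background genetic variance.
    """
    if not 0.0 <= removed_fraction <= 1.0:
        raise ValueError("removed_fraction must lie between 0 and 1")
    remaining_background = (1.0 - removed_fraction) * var_background
    return var_qtl / (var_qtl + remaining_background + var_error)

# Hypothetical components: QTL 1, background genetic 6, residual error 3.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, round(qtl_variance_ratio(1.0, 6.0, 3.0, fraction), 3))
```

The ratio grows most when the background component is large relative to the residual error, which matches the remark that gains range from small to large depending on that proportion.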
HAPLO-MQM+ versus HAPLO-MQM:
The present invention provides a third possibility for further increasing the QTL power: the use of between-family information in addition to within-family information. Families are usually not segregating for all QTLs involved. But that does not imply that the effect of those non-segregating QTLs cannot be detected: they generate differences between the mean values of the families. Therefore, the QTL-allele effects in our models should not only explain the within-family variation but they should also capture the between-family variation. This new multiple-QTL model can meet these additional constraints. Of course, this approach works efficiently when most of the (unexplained) differences between families are indeed induced by additively acting QTLs. This can be tested by comparing the HAPLO-MQM and HAPLO-MQM+ models. Firstly, fit the HAPLO-MQM+ model and calculate the likelihood, then drop the 60 family-effect parameters, calculate the likelihood of the HAPLO-MQM model, and calculate the difference between the two likelihoods. Table 3 and Figure 4 show that the difference for QTL-likelihood between the two HAPLO-MQM approaches can be substantial, demonstrating that between-family information can be used to increase QTL-likelihood peaks.
Choice of the window size in HAPLO-MQM:
Markers in a window around the QTL position under study are used to group the parents into haplotype categories. In the simulation study a relatively large window of four markers (Figure 3) was used. To some extent this was an arbitrary choice. With more markers, the windows would have become very large, given the sparse marker map of the simulations (200 markers at 2000 cM). In the extreme case, all markers are used simultaneously for haplotyping, and a one to one relation between haplotype and parent is established. The HAPLO-MQM model then includes as many haplotype effects as there are different parents and correctly models all half-sib relationships. In general, there are good reasons for using few(er) markers in haplotyping. Haplotyping on the basis of fewer markers tends to result in fewer haplotype classes, so that fewer QTL parameters are required. This increases the power of QTL detection and allows us to fit more complex models, e.g., with interactions.
Alternatively, a larger number of markers can be used in haplotyping, in particular if the marker map is dense. In this case it is assumed that two parents with identical haplotype in the window under study have the same QTL allele within this region. The probability that this is indeed true increases when the haplotyping is based on more markers, because using more markers decreases the chance of erroneous grouping. It will be appreciated that there is an optimal balance, and it is likely that the optimum can change, e.g., when different marker densities are used, or when different types of marker are used. In breeding populations, bi-allelic markers (e.g., AFLPs) are expected to be less informative in fingerprinting than multi-allelic markers (e.g., microsatellites). Preferably, less informative marker types are available at a higher map density to achieve indirectly a high multilocus information content. In some cases, e.g., in the case of low density of informative markers, it is hard to distinguish marker alleles displaying identity-by-descent (IBD) from those that are merely identical-in-state (IIS).
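The trade-off between window size and number of haplotype classes can be seen on a toy example. The sketch below counts distinct haplotypes for nested windows of increasing width; the parental genotypes are hypothetical strings with one allele per marker, and the window is simply taken from the left end for brevity.

```python
def count_haplotype_classes(parent_genotypes, n_markers_in_window):
    """Number of distinct haplotypes over the first n markers of each parent."""
    return len({tuple(alleles[:n_markers_in_window]) for alleles in parent_genotypes})

# Hypothetical inbred parents typed at four bi-allelic markers.
toy_parents = ["1111", "1112", "1121", "1211", "2111", "1111"]
classes_by_window = {
    width: count_haplotype_classes(toy_parents, width) for width in (1, 2, 3, 4)
}
print(classes_by_window)
```

Because a wider nested window can only split existing classes, the count never decreases as markers are added; the choice of window is therefore a balance between fewer parameters and fewer erroneous groupings.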
Epistasis and environment:
The methods of the present invention focus on QTLs with fixed effects across populations. In reality, the effects of QTL alleles are modified by genetic background. With the current HAPLO-MQM models, the "average" allele effect across the population is estimated. In order to keep the number of parameters within reasonable bounds, one can extend the use of the HAPLO-MQM model as follows. Use a priori criteria, such as genetic distance, to classify families into sub-populations and include QTL x sub-population instead of QTL x family interactions as fixed or random effects in the models.
Genotype x environment interaction is a very important issue in breeding. It will be appreciated that any methodological concept developed for QTL x E interaction in IM or MQM models can also be applied to HAPLO-MQM. Furthermore, the likelihood of the HAPLO-MQM model and that of the HAPLO-MQM+ model can be compared to assess the amount of such interaction.
Computation:
The HAPLO-MQM models contain many parameters. Up to 30 QTLs, with ~15 or ~37 parameters per QTL, have been fitted in 3000 progeny. Nonetheless, enough degrees of freedom remain for a further increase of the number of QTLs in the model. A whole genome-scan, including the backward elimination and permutation procedure, would become rather time consuming on an ordinary PC (Pentium II 400 MHz). Computational and time limitations are set mainly by the matrix routines for solving large sets of linear equations of size equal to the number of parameters. Fortunately, the design matrix contains many elements known to be zero. Thus with efficient programming of the matrix calculations one can gain much speed. An alternative would be to consider the use of variance component models, taking the haplotype effects per QTL as random in HAPLO-MQM. The "QTL-allele breeding values" (Best Linear Unbiased Predictions, BLUPs) can then be predicted. In marker-assisted selection BLUPs are calculated for the breeding values of individuals. With HAPLO-MQM, selection occurs at the level of the QTL-allele predictions rather than at the level of the individual's predictions.
Minimum conditions for number and size of breeding populations:
Settings with relatively high overall heritabilities, and small to relatively large progeny sizes per family have been discussed. Table 3 shows that HAPLO-MQM can, in most cases, detect the QTLs and even dissect the linked ones. Higher genetic complexity (smaller heritability, more QTLs, tightly linked QTLs) is compensated by increasing the number of families and/or the progeny sizes per family. In any circumstance, the application of more powerful analysis tools, such as HAPLO-MQM, will improve the chance of successful dissection of the effects of linked and unlinked QTLs.
SIMULATIONS
The feasibility of identifying QTLs in progeny derived from breeding crosses has been investigated using simulations. In addition to the obvious cost effectiveness of using simulated data, it is possible to identify factors that can have a major impact on the power, precision and accuracy of QTL mapping. Thus, by varying the factors affecting information it is possible to guide the development of the statistical methods. Discussions with plant breeders and molecular biologists make it clear that the genetic structure of plant breeding progeny is distinct from that of progeny derived from a bi-parental cross (Table 4). Furthermore, unlike the information provided by progeny from a single cross, the information provided by progeny from multiple breeding crosses is affected by many factors. The information provided by progeny from plant breeding populations can be affected by at least 20 factors (Table 5). These factors can be categorized as descriptors of population structure, genome structure, marker information, and quantitative trait expression. If a minimum of two values for each of these factors is considered, a very large set of settings (at least 2^20, i.e., more than a million combinations) would have to be investigated. Therefore, the present simulations were begun with realistic and less informative genomic and population structures, and with simplistic assumptions regarding genetic components of quantitative trait variability. The goal is to simulate data that would be expected from early generation progeny tests in maize breeding. These are often referred to as first and second year topcross tests. The breeding population then consists of segregating progeny derived from multiple crosses of somewhat related inbred lines. Using the methods of the present invention it is possible to join all data over families together and present one overall analysis for all these data. The results from three sets of simulations mimicking this type of breeding population are provided in the following discussion.
Simulation protocol: The genome for the simulations consisted of 10 linkage groups of 200 centiMorgan (cM) each, e.g., as in maize. The genotype and phenotype data were generated in a number of steps. First, a base population of more or less related inbred lines was simulated, all lines belonging to one and the same heterotic group. Then, pairs of parents for multiple crosses were selected from the base population. Next, F2 offspring of the multiple crosses were generated and each F2 plant was testcrossed to a tester from another heterotic group to generate F2:3 offspring. Finally, phenotypic values were assigned to the segregating F2:3 and F3 progeny. Certain aspects of the simulation steps are described below in more detail.
The following (ad-hoc) protocol was used for generating a base population of inbred lines with different (re)combinations of "ancestral" linkage blocks. The genome consisted of 10 linkage groups, each containing 101 bi-allelic marker loci with 2 cM of recombination between adjacent pairs per meiosis. The ad-hoc procedure for generating linkage blocks is as follows: a set of 400 homozygous recombinant lines from the cross between hypothetical parents with genotypes 1111 (and so on) and 2222 (and so on) was simulated. Thus, the genotype of a homozygous line consisted of linkage blocks of 1's and of 2's; the genotype will be an expression like 111211222 (and so on). 400 doubled haploid lines were simulated using a recombination frequency of 0.02 and 0.2 between adjacent markers, respectively. Linkage disequilibrium between adjacent loci is 0.20 and 0, respectively.
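The generation of lines made of blocks of 1's and 2's can be sketched for one linkage group as follows. This is an illustration, not the simulation program used for the results: it assumes a doubled haploid derived from a single meiosis, so that the ancestral label switches between adjacent markers with probability equal to the recombination frequency, independently per interval. The seed and function name are arbitrary.

```python
import random

def simulate_line(n_markers, recomb_freq, rng):
    """One homozygous line on one linkage group, coded as a string of '1'/'2'.

    The label of the first marker is drawn at random; between adjacent
    markers it switches with probability recomb_freq (no interference).
    """
    label = rng.choice("12")
    line = [label]
    for _ in range(n_markers - 1):
        if rng.random() < recomb_freq:
            label = "2" if label == "1" else "1"
        line.append(label)
    return "".join(line)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
base_population = [simulate_line(101, 0.02, rng) for _ in range(400)]
print(base_population[0])
```

Repeating the call for each of the 10 linkage groups, and using 0.2 instead of 0.02, would give the low-disequilibrium variant described in the text.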
The next step of the procedure was to move from linkage blocks to bi-allelic marker genotypes (1st set of simulations) or multi-allelic marker genotypes (five alleles per marker in the 2nd and 3rd set of simulations). This was accomplished by assigning types of marker allele to the linkage blocks. A (new) type of marker allele using preset marker allele frequencies was independently sampled, linkage block by linkage block, from a multinomial distribution. For example, the original genotype 111221 contains three linkage blocks, 111, 22 and 1. Each linkage block gets randomly assigned one of the types of a marker allele. Thus with 5 marker alleles, the 1's in the first linkage block can be replaced by 5's. After sampling types of marker allele for the other blocks the original genotype 111221 can be converted into the new configuration 555133.
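The block-by-block relabeling can be sketched in a few lines. The helper names are hypothetical, and the allele frequencies used in the example are the five-allele frequencies quoted for the 2nd set of simulations, applied here only for illustration.

```python
import random
from itertools import groupby

def split_into_blocks(genotype):
    """Split a genotype string into maximal runs of identical labels."""
    return ["".join(run) for _, run in groupby(genotype)]

def assign_marker_alleles(genotype, allele_freqs, rng):
    """Replace every linkage block by one marker allele sampled per block.

    Alleles are named '1', '2', ... and drawn independently for each block
    with the given frequencies (a multinomial draw of size one).
    """
    alleles = [str(i + 1) for i in range(len(allele_freqs))]
    relabeled = []
    for block in split_into_blocks(genotype):
        allele = rng.choices(alleles, weights=allele_freqs, k=1)[0]
        relabeled.append(allele * len(block))
    return "".join(relabeled)

print(split_into_blocks("111221"))
rng = random.Random(7)  # arbitrary seed
print(assign_marker_alleles("111221", [0.55, 0.24, 0.12, 0.06, 0.03], rng))
```

Two adjacent blocks can by chance receive the same allele, which is one way in which lines become identical-in-state without being identical-by-descent.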
The 400 lines in the base population can be crossed amongst each other in various combinations. One can select parents at random; alternatively, pairs of parents that show approximately a preset level of relatedness, say ~45%, are selected. In the present example, only pairs of parents having a preset level of relatedness were used for crossings and all the other possible pairs with other levels of relatedness were ignored. In the three sets of simulations different levels of relatedness within pairs of parents were used (10%, 40% and 45%). The heterozygous F1 offspring of the crosses were selfed in order to generate segregating F2 families. Each F2 individual was crossed with one and the same homozygous tester to generate a number of F3 offspring per F2 plant and those F3 plants were "evaluated" for traits of interest. QTLs were placed at marker positions, which made it easy to derive genotypic values. The trait value was calculated as the sum of genotype and random Gaussian noise. The trait value of an F2 plant was computed as the average trait value of its F3 testcross offspring.
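Selecting parent pairs at a preset level of relatedness can be sketched as below. The sketch assumes that relatedness is measured as the fraction of marker loci at which two inbred lines carry the same allele, which is how the percentages are interpreted elsewhere in the text; the tolerance parameter and the toy lines are hypothetical.

```python
from itertools import combinations

def relatedness(line_a, line_b):
    """Fraction of marker loci at which two inbred lines carry the same allele."""
    if len(line_a) != len(line_b):
        raise ValueError("lines must be typed at the same markers")
    return sum(a == b for a, b in zip(line_a, line_b)) / len(line_a)

def select_pairs(lines, target, tolerance=0.05):
    """Index pairs (i, j), i < j, whose relatedness is within tolerance of target."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(lines), 2)
        if abs(relatedness(a, b) - target) <= tolerance
    ]

# Hypothetical lines typed at four markers.
toy_lines = ["1111", "1122", "2222"]
print(select_pairs(toy_lines, target=0.5, tolerance=0.0))
```

All pairs outside the tolerance band are ignored, mirroring the statement that pairs with other levels of relatedness were not used for crossing.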
In these simulations, the sizes of the QTL effects were standardized to stay close to what is commonly experienced, namely heritability in a single F2:3 population. In a set of multiple related families a QTL contributes to within-family and between-family variation. In different families different numbers of QTLs can be segregating. The ambiguity is whether to use the maximum h^2_QTL = σ^2_QTL / (σ^2_QTL + σ^2_e) or the minimum h^2_QTL = σ^2_QTL / (σ^2_QTL + σ^2_a + σ^2_e), where σ^2_QTL is the variation induced by the QTL in the testcross population, σ^2_e is the residual variation and σ^2_a refers to genetic variation when all other loci segregate. In the present simulations, h^2_QTL = σ^2_QTL / (σ^2_QTL + σ^2_a + σ^2_e) was used.
As a last step in the simulation procedure, the set of 1010 markers was reduced: in each of the three simulations two hundred loci were randomly sampled from the genome and only these marker data were available for analysis. In the thinned set of markers, the average recombination frequency between adjacent markers is 0.1 and 0.5, and the average linkage disequilibrium is approximately 0.20 and zero, respectively.
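The change in recombination frequency after thinning can be approximated with the standard no-interference formula for n consecutive intervals, r_n = (1 - (1 - 2r)^n) / 2. The sketch below is a rough check only: it assumes evenly spaced retained markers (about five original intervals apart when 200 of 1010 loci are kept), whereas the simulations sampled loci at random.

```python
def compound_recombination(r_adjacent, n_intervals):
    """Recombination frequency across n equal intervals, assuming no interference."""
    return 0.5 * (1.0 - (1.0 - 2.0 * r_adjacent) ** n_intervals)

# About five original intervals between retained markers on average.
for r in (0.02, 0.2):
    print(r, round(compound_recombination(r, 5), 3))
```

Under these assumptions the two settings give roughly 0.09 and 0.46, of the same order as the averages of 0.1 and 0.5 quoted for the thinned marker set.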
1st set of simulations: In this set of simulations, the merits of the HAPLO-MQM model were evaluated under conditions which are far from optimal; specifically, the loci (markers and QTLs) for the first set of simulations were bi-allelic and linkage disequilibrium was set at zero in simulations 1.2 and 1.3.
The two parents of each of the 60 crosses were about 45% related, i.e., it is expected that 55% of the loci are polymorphic in the progeny. Linkage disequilibrium between adjacent pairs of markers in the breeding population was investigated at values of approximately zero (i.e., loci are statistically independent) and 0.20. Family sizes of either 10 or 50 F2 progeny were investigated in a topcross combination with a single unrelated tester. For any given simulation all families were of equal size, and populations of 600 or 3000 progeny were evaluated. All F2 progeny were genotyped and all F2:3 progeny were evaluated for a quantitatively expressed trait. Across the whole population, the expression of quantitative traits was due to five or ten unlinked QTLs, located in the middle of the linkage groups. Because of relatedness of the parents, fewer (e.g., 3 instead of 5) QTLs will be segregating in each cross. Each QTL had an additive and fixed effect across all families with h^2_QTL = 0.05 or 0.15 (see Table 3).
Note that this first set of simulations has many similarities with a population derived from a single cross. The distinctions are that the number of sampled progeny is larger, different sets of QTLs can be segregating in each F2:3 family, and linkage phases between QTLs and marker loci are not consistent across the population. Under these conditions, the impact of population size, number of segregating QTLs, and linkage disequilibrium among breeding lines upon analysis methods was investigated.
2nd set of simulations: For the inbred parents in the base populations one of five alleles can occur at each locus (markers and QTLs). The allelic genotypes indicate ancestral alleles that are identical by descent (IBD) among the breeding lines. Typically such information is obtained by genotyping all the important lines involved in the pedigrees of the breeding populations. The frequencies of each allele in the population were 0.55, 0.24, 0.12, 0.06, and 0.03 respectively. Pairs of parents were selected in such a way that the two parents of a cross were about 10% related, i.e., approximately 90% of the loci are polymorphic between any pair of parents. Linkage disequilibrium between adjacent pairs of markers was approximately 0.20. Family sizes consisted of 50 F2:3 progeny that were top-crossed with a single unrelated tester. Each progeny was genotyped and evaluated for a quantitatively expressed trait. Expression of quantitative traits was due to five or ten QTLs. In four of the simulations, the QTLs segregated independently and they were located in the middle of the linkage groups. In one of the simulations, pairs of QTLs were located 50 centiMorgan (cM) from each other on the same chromosome. The QTLs were functionally bi-allelic, i.e., one of the five alleles that could occur at a QTL was chosen to have a positive (+) effect, while all remaining alleles were simulated to have an equal negative effect when combined with the tester allele.
For some simulations the first allele was chosen as the positive allele (frequency f(+)=0.55), while for other simulations the third allele was chosen as the positive allele (f(+)=0.12). Thus, in the present simulations, the point of having 5 alleles was to easily vary the frequency of the positive allele. As is discussed later, the analysis method doesn't have this information and tries to deal with the multi-allelic data. Again $h^2_{QTL}$ = 0.05 or 0.15 (see Table 3).
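The relation between the five allelic states and the functionally bi-allelic QTL can be sketched as follows. The allele frequencies are those stated above; the function names, the 1-to-5 allele labels and the seed are assumptions introduced only for this illustration.

```python
import random

# Allele frequencies stated for the second set of simulations.
ALLELE_FREQS = [0.55, 0.24, 0.12, 0.06, 0.03]


def sample_alleles(n, rng):
    """Draw n ancestral alleles (labelled 1..5) with the stated frequencies."""
    return rng.choices(range(1, 6), weights=ALLELE_FREQS, k=n)


def functional_state(allele, positive_allele):
    """Collapse five allelic states to '+' or '-' (functionally bi-allelic)."""
    return "+" if allele == positive_allele else "-"


rng = random.Random(1)  # hypothetical seed
alleles = sample_alleles(400, rng)
# Positive allele = first allele, i.e. f(+) = 0.55.
states = [functional_state(a, positive_allele=1) for a in alleles]
print(states.count("+") / len(states))
```

Choosing `positive_allele=3` instead corresponds to the f(+)=0.12 case; the marker data themselves still carry all five allelic states.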
Note that for the second set of simulations, the main changes are: marker loci are multi-allelic and the allelic state of the marker loci is independent of the functional state of the QTLs. The multi-allelic state of marker loci is similar to the polymorphism index that has been observed in simple sequence repeat markers in maize (Senior et al. 1996). Because of the results from the first set of simulations, the family size remained at 50 progeny. As with the first set of simulations, different sets of QTLs are segregating in each family, and linkage phases between QTLs and marker loci are not consistent across the population.
3rd set of simulations: The third set of simulations was very much like the second set except that the parents of crosses were about 40% related and that linkage disequilibrium between adjacent pairs of markers was zero (worst case scenario). Thus, for the third set of simulations the primary changes in available information were due to changes in population structure.
RESULTS OF THE SIMULATION STUDY
Three sets of simulations have been generated and analyzed via IM and the new HAPLO-MQM model of the invention. The results are summarized in Table 3. Figure 4 shows several illustrative QTL likelihood plots.
1st set of simulations: Two of the most relevant aspects of these simulations are summarized. First, 200 bi-allelic markers were randomly chosen on the genome of 2000 cM. This resembles the marker density of the present day. But having bi-allelic markers at a higher density is expected to improve the application of the HAPLO-MQM model, where the goal is to reliably identify shared genomic blocks (IBD blocks). Second, a breeding population of 600 or 3000 individuals was generated, 60 families of 10 or 50 offspring each. The male and female parents of each family were chosen to be ~45% related, so one or more QTLs are potentially not segregating in their offspring. Therefore the "effective" population sizes are probably ~300 or ~1500, respectively. In the usual one-family experimental protocol a population size of 300 is already considered relatively big, and a size of 1500 is considered huge.
With 50 progeny per family, most QTLs are detected by both IM and HAPLO-MQM (Table 3a). As expected, the case with moderate linkage disequilibrium (LD=0.20) is more favorable than that with zero linkage disequilibrium (simulations 1.1 and 1.2 in Table 3a). The latter case is the worst case scenario for HAPLO-MQM, because the marker haplotype does not provide information about the QTL allele associated with the markers. Surprisingly, HAPLO-MQM and HAPLO-MQM+ still worked sufficiently well; in simulation 1.3, even better than IM. This is because, although there are at most 120 different parents for 60 families, in reality ten lines were used up to ten times as parents of different crosses and, in total, only 65 different parents were involved. The 65 genotypes at a window of four markers were clustered into $2^4 = 16$ different haplotypes. Since several lines had been used frequently as parent, haplotype and phenotype still tended to be correlated.
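The grouping of parents by their haplotype in a window of four markers can be sketched as below. This is a minimal sketch that assumes exact matching of alleles within the window; the clustering rule actually used by HAPLO-MQM may differ, and the parent names and marker scores are hypothetical.

```python
from collections import defaultdict


def cluster_by_haplotype(parents, start, window=4):
    """Group inbred parents sharing identical alleles in a marker window."""
    groups = defaultdict(list)
    for name, alleles in parents.items():
        groups[tuple(alleles[start:start + window])].append(name)
    return dict(groups)


# Hypothetical inbred lines scored at six bi-allelic markers (0/1).
parents = {
    "P1": [0, 1, 1, 0, 1, 0],
    "P2": [0, 1, 1, 0, 0, 1],
    "P3": [1, 1, 0, 0, 1, 0],
    "P4": [0, 1, 1, 0, 1, 1],
}

groups = cluster_by_haplotype(parents, start=0)
max_haplotypes = 2 ** 4  # at most 16 classes for four bi-allelic markers
print(groups, max_haplotypes)
```

Each group then shares one putative QTL-allele parameter, so the number of parameters is bounded by the number of observed haplotype classes rather than by the number of parents.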
In simulations 1.1 and 1.2 we used a family size of 50 progeny. Although with 10 progeny per family it is much harder to detect the QTLs, HAPLO-MQM performs slightly better than IM. Thus, while it remains difficult to detect each of 10 QTLs with $h^2_{QTL}$ = 0.05 per QTL, it would have been more difficult in the case of a single F2-testcross population of size 300. Unraveling the effects of 5 QTLs, with $h^2_{QTL}$ = 0.15 for each QTL, is feasible, but this "major genes" situation is of less relevance to the breeder. The results of HAPLO-MQM are compared to those of HAPLO-MQM+, to see what the effect is of exclusion versus inclusion of parameters for family effects. The QTL likelihood peak is higher for HAPLO-MQM than for HAPLO-MQM+. This is caused by the fact that HAPLO-MQM exploits the between-family information in the trait means of the families. On the other hand, the figures of the "likelihood elsewhere" seem to be upwards biased under HAPLO-MQM in simulation 1.2. Under the null-hypothesis of "no QTL segregating," the expected QTL likelihood in no-QTL regions should be equal to the number of QTL parameters (~15). Results of simulation 1.2 support the initial expectation that bi-allelic and widely spread markers are not very suitable for an analysis of the HAPLO-MQM type: the families are clustered into ~15 haplotype groups and a number of families are probably wrongly classified. With HAPLO-MQM, QTLs and fitted cofactors were found near the simulated positions.
2nd set of simulations: A low-density map of 200 markers was simulated, as in the 1st set of simulations. The main difference between the 1st and the 2nd set is that bi-allelic markers were changed to multi-allelic markers. Moreover, in the 2nd set linkage disequilibrium is equal to 0.20 in all simulations.
It was a priori expected that multi-allelic markers would be more suitable for tracing IBD blocks. A base population of 400 inbred lines was generated. Pairs of lines were chosen as parents of multiple families in such a way that for each pair the male and female parent were ~10% related. In this way 60 related families of 50 progeny each, 3000 individuals altogether, were produced. Thus, each cross per se was made between two mostly unrelated lines. Most parents were used only once or twice. Surprisingly, a parent of one given cross can be related to a parent of another cross at a much higher level than 10%. In fact, high values of relatedness were indeed observed. In total there could have been $5^4 = 625$ different haplotypes. In reality the genotypes were clustered into ~15 haplotype groups (see also below). The "effective" population size for a QTL, which is here defined as the expected number of families segregating for the QTL times the progeny size, is now approximated. One of the five QTL alleles was assigned a positive (+) effect, the other QTL alleles each had an equal negative effect. The + allele had a frequency of 0.55 or 0.12. Thus any given QTL was segregating for the + allele in only a subset of the entire population and the "effective" population sizes were relatively high, namely ~1500 (30 families) and ~300 (6 families) for the + allele frequencies of 0.55 and 0.12, respectively. One would therefore expect that at least major QTLs could be detected with standard IM and hopefully also with HAPLO-MQM. The results as presented in Table 3b are now discussed. Relatively high $h^2$ with 5 or 10 QTLs have been simulated, and in many cases these QTLs can be detected via IM and HAPLO-MQM. As expected, QTL peaks are higher for a frequency of 0.55 of the positive QTL allele than for a frequency of 0.12 (compare simulations 2.1 and 2.2, or 2.3 and 2.4 in Table 3b). In general, HAPLO-MQM produces higher peaks than IM and HAPLO-MQM+.
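The arithmetic behind the haplotype count and the "effective" population size defined above can be written out as a short sketch. The function names are hypothetical; the family counts (30 and 6) and the progeny size (50) are the approximate figures quoted in the text, not values derived here.

```python
def possible_haplotypes(n_alleles, window):
    """Upper bound on distinct haplotypes in a window of markers."""
    return n_alleles ** window


def effective_population_size(n_segregating_families, progeny_per_family):
    """Families segregating for the QTL times the progeny size."""
    return n_segregating_families * progeny_per_family


print(possible_haplotypes(5, 4))  # five alleles, four-marker window
for freq, families in ((0.55, 30), (0.12, 6)):
    print(freq, effective_population_size(families, 50))
```

The comparison of 625 possible haplotypes with the roughly 15 groups actually observed indicates how strongly the parents share genomic blocks.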
In the case of 10 QTLs linked in pairs at a distance of 50 cM, all three methods correctly detect the joint effect of pairs of linked QTLs (simulation 2.5). With IM the 10 QTLs were mapped as 5 ghost QTLs in the middle of each pair of linked QTLs. Only the HAPLO-MQM and HAPLO-MQM+ methods have the power to dissect the effects of the pair-wise linked QTLs. The marker cofactors were selected by backward elimination and the QTL likelihoods in Table 3b were calculated conditionally on selected markers in the regions of the linked and unlinked QTLs.
3rd set of simulations: The third set of simulations differs in two aspects from the 2nd set. First, the breeding population consisted of fully inbred lines that were about 45% instead of 10% related. Second, linkage disequilibrium between adjacent pairs of markers in the breeding population was zero instead of 0.20. Therefore the "effective" population size is now much smaller, namely ~600 (12 families) and ~120 (2 families) for the two cases with positive QTL-allele frequencies of 0.55 and 0.12, respectively. Accordingly, the QTL likelihood peaks in the 3rd set are much lower than in the 2nd set (Tables 3b and 3c). In the HAPLO-MQM approach the parents of the families are clustered on the basis of their haplotype in a window of 4 markers around the putative QTL. The same approach was used for cofactor markers. In the 2nd set the 120 parents were clustered into ~15 groups. In the 3rd set the number of clusters increased to ~37, partly because several parents were used more than once. This number of parameters per QTL or marker cofactor is still much lower than the 60 parameters per QTL in the IM approach.
MARKER ASSISTED SELECTION
The mapping of phenotypic traits relies on the ability to detect genetic differences between individuals. These genetic differences, or "genetic markers," are then correlated with phenotypic variations using the statistical methods of the present invention. In an ideal case, a single gene encoding a protein responsible for a phenotypic trait is detectable directly by a mutation which results in the variation in phenotype. More frequently, it is the case that multiple genetic loci each contribute to the observed phenotype. In the case of a quantifiable phenotype, e.g., height, weight, grain yield, oil content, etc., the genes underlying a phenotype are designated quantitative trait loci, or QTL. Detection and mapping of QTL typically utilizes the detection and correlation of genetic markers with the phenotypic trait under investigation. Although the specific DNA sequences which encode proteins are generally well-conserved across a species, regions of DNA which are non-coding, or which encode proteins or portions of proteins which lack critical function, tend to accumulate mutations, and therefore, are variable between members of the same species. Such regions provide the basis for numerous molecular genetic markers. Markers identify alterations in the genome which can be insertions, deletions, point mutations, recombination events, or the presence and sequence of transposable elements. Extensive review of the mechanisms underlying mutational events can be found in, e.g., Friedberg et al. (1995) DNA Repair and Mutagenesis, American Society for Microbiology, Washington, D.C. Many genetic markers have been characterized in plant species of interest, and are known to those of skill in the art.
Genetic markers can be detected by numerous methods, well-established in the art (e.g., restriction fragment length polymorphisms, isozyme markers, allele specific hybridization (ASH), amplified variable sequences of the plant genome, self-sustained sequence replication, simple sequence repeat (SSR), single nucleotide polymorphism (SNP), or arbitrary fragment length polymorphisms (AFLP)). The majority of genetic markers rely on one or more properties of nucleic acids for their detection. For the purposes of this discussion, the terms "nucleic acid," "polynucleotide," "polynucleotide sequence" and "nucleic acid sequence" refer to single-stranded or double-stranded deoxyribonucleotides or ribonucleotides and polymers thereof. As used herein, the term optionally includes known analogs of naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence of this invention encompasses conservatively modified variants thereof (e.g., degenerate codon substitutions) and complementary sequences, in addition to the sequence explicitly indicated. The term "gene" is used interchangeably for a specific genomic sequence, a cDNA and an mRNA encoded by the genomic sequence. Two single-stranded nucleic acids "hybridize" when they form a double-stranded duplex. The region of double-strandedness can include the full-length of one or both of the single-stranded nucleic acids, or all of one single stranded nucleic acid and a subsequence of the other single-stranded nucleic acid, or the region of double-strandedness can include a subsequence of each nucleic acid. An overview of the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes. Part I, Chapter 2 "Overview of Principles of Hybridization and the Strategy of Nucleic Acid Probe Assays," Elsevier, New York.
General texts which discuss considerations relevant to nucleic acid hybridization, the selection of probes, and buffer and incubation conditions, and the like, as well as numerous other topics of interest in the context of the present invention (e.g., cloning of nucleic acids which correlate with QTL, sequencing of cloned QTL, the use of promoters, vectors, etc.) can be found in Berger and Kimmel (1987) Guide to Molecular Cloning Techniques, Methods in Enzymology vol. 152, Academic Press, Inc., San Diego ("Berger"); Sambrook et al. (1989) Molecular Cloning-A Laboratory Manual, 2nd ed. Vols. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor ("Sambrook"); and Ausubel et al. (eds) (supplemented through 1999) Current Protocols in Molecular Biology, John Wiley and Sons, Inc. ("Ausubel").
For example, some techniques for detecting genetic markers utilize hybridization of a probe nucleic acid to nucleic acids corresponding to the genetic marker. Markers which are restriction fragment length polymorphisms (RFLP) are detected by hybridizing a probe, which is typically a sub-fragment (or a synthetic oligonucleotide corresponding to a sub-fragment) of the nucleic acid to be detected, to restriction digested genomic DNA. The restriction enzyme is selected to provide restriction fragments of at least two alternative (or polymorphic) lengths in different individuals. After separation by length in an appropriate matrix (e.g., agarose) and transfer to a membrane (e.g., nitrocellulose, nylon), the labeled probe is hybridized under conditions which result in equilibrium binding of the probe to the target followed by removal of excess probe by washing. The hybridized probe is then detected, most typically by autoradiography or another similar detection technique (e.g., fluorography, liquid scintillation counter, etc.). Examples of specific hybridization protocols are widely available in the art, see, e.g., Berger, Sambrook, Ausubel, all supra.
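How a restriction site polymorphism yields fragments of alternative lengths can be illustrated in silico. This is a simplified sketch, not a laboratory protocol: the two allele sequences are invented, the digest is treated as complete on a linear molecule, and the EcoRI recognition site (GAATTC, cut after the first G) is used only as a familiar example.

```python
def digest(sequence, site, cut_offset):
    """Fragment lengths after cutting a linear sequence at every site."""
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset          # position of the cut within the site
        fragments.append(cut - start)
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(len(sequence) - start)
    return fragments


# Hypothetical alleles: allele_b has lost the second site by a point mutation.
allele_a = "A" * 10 + "GAATTC" + "C" * 8 + "GAATTC" + "T" * 5
allele_b = "A" * 10 + "GAATTC" + "C" * 8 + "GAGTTC" + "T" * 5

frags_a = digest(allele_a, "GAATTC", cut_offset=1)
frags_b = digest(allele_b, "GAATTC", cut_offset=1)
print(frags_a, frags_b)
```

A probe overlapping the region between the two sites would therefore light up a shorter band for the first allele and a longer band for the second, which is the length polymorphism scored as the marker.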
Amplified variable sequences refer to amplified sequences of the plant genome which exhibit high nucleic acid residue variability between members of the same species. All organisms have variable genomic sequences and each organism (with the exception of a clone) has a different set of variable sequences. Once identified, the presence of a specific variable sequence can be used to predict phenotypic traits. Preferably, DNA from the plant serves as a template for amplification with primers that flank a variable sequence of DNA. The variable sequence is amplified and then sequenced.
In vitro amplification techniques are well known in the art. Examples of techniques sufficient to direct persons of skill through such in vitro methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA), are found in Berger, Sambrook and Ausubel (all supra) as well as Mullis et al. (1987) U.S. Patent No. 4,683,202; PCR Protocols, A Guide to Methods and Applications (Innis et al., eds.) Academic Press Inc., San Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomeli et al. (1989) J. Clin. Chem 35, 1826; Landegren et al. (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace (1989) Gene 4, 560; Barringer et al. (1990) Gene 89, 117; and Sooknanan and Malek (1995) Biotechnology 13: 563-564. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684, and the references therein, in which PCR amplicons of up to 40 kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, Ausubel, Sambrook and Berger, all supra.
Oligonucleotides for use as primers, e.g., in amplification reactions and for use as nucleic acid sequence probes are typically synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981) Tetrahedron Lett. 22: 1859, or can simply be ordered commercially.
Alternatively, self-sustained sequence replication can be used to identify genetic markers. Self-sustained sequence replication refers to a method of nucleic acid amplification using target nucleic acid sequences which are replicated exponentially in vitro under substantially isothermal conditions by using three enzymatic activities involved in retroviral replication: (1) reverse transcriptase, (2) RNase H, and (3) a DNA-dependent RNA polymerase (Guatelli et al. (1990) Proc Natl Acad Sci USA 87:1874). By mimicking the retroviral strategy of RNA replication by means of cDNA intermediates, this reaction accumulates cDNA and RNA copies of the original target.
Arbitrary fragment length polymorphisms (AFLP) can also be used as genetic markers (Vos et al. (1995) Nucl Acids Res 23:4407). The phrase "arbitrary fragment length polymorphism" refers to selected restriction fragments which are amplified before or after cleavage by a restriction endonuclease. The amplification step allows easier detection of specific restriction fragments. AFLP allows the detection of large numbers of polymorphic markers and has been used for genetic mapping of plants (Becker et al. (1995) Mol Gen Genet 249:65; and Meksem et al. (1995) Mol Gen Genet 249:74).
Allele-specific hybridization (ASH) can be used to identify the genetic markers of the invention. ASH technology is based on the stable annealing of a short, single-stranded, oligonucleotide probe to a completely complementary single-strand target nucleic acid. Detection is via an isotopic or non-isotopic label attached to the probe.
For each polymorphism, two or more different ASH probes are designed to have identical DNA sequences except at the polymorphic nucleotides. Each probe will have exact homology with one allele sequence so that the range of probes can distinguish all the known alternative allele sequences. Each probe is hybridized to the target DNA. With appropriate probe design and hybridization conditions, a single-base mismatch between the probe and target DNA will prevent hybridization. In this manner, only one of the alternative probes will hybridize to a target sample that is homozygous or homogenous for an allele. Samples that are heterozygous or heterogeneous for two alleles will hybridize to both of two alternative probes.
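The scoring logic for a pair of alternative ASH probes can be sketched as follows. This is an assumed illustration only: the function name, the normalized signal values and the detection threshold are hypothetical and would in practice depend on the assay.

```python
def call_ash_genotype(signal_a, signal_b, threshold=0.5):
    """Call a genotype from the hybridization signals of two allele probes.

    threshold is a hypothetical cut-off separating hybridization from
    background.
    """
    hyb_a = signal_a >= threshold
    hyb_b = signal_b >= threshold
    if hyb_a and hyb_b:
        return "A/B"   # heterozygous or heterogeneous sample: both probes bind
    if hyb_a:
        return "A/A"   # only the allele-A probe binds
    if hyb_b:
        return "B/B"   # only the allele-B probe binds
    return "no call"   # neither probe binds: failed assay or unknown allele


print(call_ash_genotype(0.9, 0.1))
print(call_ash_genotype(0.8, 0.7))
```

Because a single-base mismatch is taken to prevent hybridization, a homozygous sample is expected to give signal with exactly one of the two probes.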
ASH markers are used as dominant markers where the presence or absence of only one allele is determined from hybridization or lack of hybridization by only one probe. The alternative allele can be inferred from the lack of hybridization. ASH probe and target molecules are optionally RNA or DNA; the target molecules are any length of nucleotides beyond the sequence that is complementary to the probe; the probe is designed to hybridize with either strand of a DNA target; the probe ranges in size to conform to variously stringent hybridization conditions, etc.
PCR allows the target sequence for ASH to be amplified from low concentrations of nucleic acid in relatively small volumes. Otherwise, the target sequence from genomic DNA is digested with a restriction endonuclease and size separated by gel electrophoresis. Hybridizations typically occur with the target sequence bound to the surface of a membrane or, as described in U.S. Patent 5,468,613, the ASH probe sequence can be bound to a membrane. In one embodiment, ASH data are obtained by amplifying nucleic acid fragments (amplicons) from genomic DNA using PCR, transferring the amplicon target DNA to a membrane in a dot-blot format, hybridizing a labeled oligonucleotide probe to the amplicon target, and observing the hybridization dots by autoradiography.
Single nucleotide polymorphisms (SNP) are markers that consist of a shared sequence differentiated on the basis of a single nucleotide. Typically, this distinction is detected by differential migration patterns of an amplicon comprising the SNP on, e.g., an acrylamide gel. However, alternative modes of detection, such as hybridization, e.g., ASH, or RFLP analysis are not excluded.
In yet another basis for providing a genetic linkage map, simple sequence repeats (SSR) take advantage of high levels of di-, tri-, or tetra-nucleotide tandem repeats within a genome. Dinucleotide repeats have been reported to occur in the human genome as many as 50,000 times with n varying from 10 to 60 or more (Jacob et al. (1991) Cell 67:213). Dinucleotide repeats have also been found in higher plants (Condit and Hubbell (1991) Genome 34:66).
Briefly, SSR data is generated by hybridizing primers to conserved regions of the plant genome which flank the SSR sequence. PCR is then used to amplify the dinucleotide repeats between the primers. The amplified sequences are then electrophoresed to determine the size and therefore the number of di-, tri-, and tetra-nucleotide repeats.
Alternatively, isozyme markers are employed as genetic markers. Isozymes are multiple forms of enzymes which differ from one another in their amino acid, and therefore their nucleic acid sequences. Some isozymes are multimeric enzymes containing slightly different subunits. Other isozymes are either multimeric or monomeric but have been cleaved from the proenzyme at different sites in the amino acid sequence. Isozymes can be characterized and analysed at the protein level, or alternatively, isozymes which differ at the nucleic acid level can be determined. In such cases any of the nucleic acid based methods described herein can be used to analyze isozyme markers.
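Returning to the SSR sizing step, the conversion from an amplicon size to a repeat number can be sketched as below. The function name and the fragment sizes are hypothetical, and the sketch assumes that the combined length of the primers and flanking sequence is known and constant between alleles.

```python
def ssr_repeat_count(amplicon_length, flank_length, motif_length):
    """Number of tandem repeats implied by an amplicon size (in bp).

    flank_length is the combined length of both primers plus any
    non-repeat sequence between primers and repeat (assumed known).
    """
    repeat_region = amplicon_length - flank_length
    if repeat_region < 0 or repeat_region % motif_length:
        raise ValueError("size is not consistent with a whole number of repeats")
    return repeat_region // motif_length


# Hypothetical alleles of a dinucleotide SSR with 100 bp of flanking sequence.
print(ssr_repeat_count(140, 100, 2))
print(ssr_repeat_count(146, 100, 2))
```

Two alleles whose amplicons differ by 6 bp at a dinucleotide locus thus differ by three repeat units, which is the polymorphism scored on the gel.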
Selection of Plants using Marker Assisted Selection
A primary motivation for development of molecular markers in crop species is the potential for increased efficiency in plant breeding through marker assisted selection (MAS). Genetic marker alleles are used to identify plants that contain a desired genotype at multiple loci, and that are expected to transfer the desired genotype, along with a desired phenotype to their progeny. Genetic marker alleles can be used to identify plants that contain a desired genotype at one marker locus, several loci, or a haplotype, and that would be expected to transfer the desired genotype, along with a desired phenotype to their progeny.
The determination of the presence and/or absence of a particular genetic marker allele in the genome of a plant exhibiting a preferred phenotypic trait is made by any method listed above, e.g., RFLP, AFLP, SSR, etc. If the nucleic acids from the plant are positive for a desired genetic marker, the plant can be selfed to create a true breeding line with the same genotype, or it can be crossed with a plant with the same marker or with other desired characteristics to create a sexually crossed hybrid generation.
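The selection step can be sketched as a simple filter over marker genotypes. This is a minimal sketch under stated assumptions: the plant names, marker names (M1, M2), allele labels and the function itself are hypothetical, and diploid genotypes are represented as pairs of alleles.

```python
def carries_desired_alleles(genotype, desired, require_homozygous=False):
    """True if the plant carries the desired allele at every target locus."""
    needed = 2 if require_homozygous else 1
    return all(
        genotype.get(locus, ()).count(allele) >= needed
        for locus, allele in desired.items()
    )


# Hypothetical marker genotypes at two loci.
plants = {
    "plant_1": {"M1": ("A", "A"), "M2": ("B", "b")},
    "plant_2": {"M1": ("A", "a"), "M2": ("b", "b")},
    "plant_3": {"M1": ("A", "A"), "M2": ("B", "B")},
}
desired = {"M1": "A", "M2": "B"}

selected = [n for n, g in plants.items() if carries_desired_alleles(g, desired)]
fixed = [
    n for n, g in plants.items()
    if carries_desired_alleles(g, desired, require_homozygous=True)
]
print(selected, fixed)
```

Plants in `selected` are candidates for selfing or crossing, whereas plants in `fixed` already breed true for the desired alleles at both marker loci.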
CLONING QTL AND OTHER GENES WHICH CORRELATE TO PHENOTYPES
"Positional gene cloning" uses the proximity of a genetic marker to physically define a cloned chromosomal fragment that is linked to a QTL identified using the statistical methods of the invention. Clones of nucleic acids linked to QTL have a variety of uses, including as genetic markers for identification of additional QTLs in subsequent applications of marker assisted selection (MAS). Markers which are adjacent to an open reading frame (ORF) associated with a phenotypic trait can hybridize to a DNA clone, thereby identifying a clone on which an ORF is located. If the marker is more distant, a fragment containing the open reading frame is identified by successive rounds of screening and isolation of clones which together comprise a contiguous sequence of DNA, a "contig." Protocols sufficient to guide one of skill through the isolation of clones associated with linked markers are found in, e.g., Berger, Sambrook and Ausubel, all supra.
It will be appreciated that numerous vectors are available in the art for the isolation and replication of the nucleic acids of the invention. For example, plasmids, cosmids and phage vectors are well known in the art, and are sufficient for many applications. In certain applications, it is advantageous to make or clone large nucleic acids to identify nucleic acids more distantly linked to a given marker, or to isolate nucleic acids linked to or responsible for QTLs as identified herein. In such cases, a number of vectors capable of accommodating large nucleic acids are available in the art, including yeast artificial chromosomes (YACs), bacterial artificial chromosomes (BACs), plant artificial chromosomes (PLACs) and the like. For a general introduction to YACs, BACs, PACs and MACs as artificial chromosomes, see, e.g., Monaco and Larin (1994) Trends Biotechnol 12:280. In addition, methods for the in vitro amplification of large nucleic acids linked to genetic markers are widely available (e.g., Cheng et al. (1994) Nature 369:684, and references therein).
GENERATION OF TRANSGENIC PLANTS
The present invention also relates to host cells and organisms which are transformed with nucleic acids corresponding to QTL and other genes identified according to the invention. Additionally, the invention provides for the production of polypeptides corresponding to QTL by recombinant techniques. Host cells are genetically engineered (i.e., transduced, transfected or transformed) with the vectors of this invention (i.e., vectors which comprise QTLs or other nucleic acids identified according to the methods of the invention and as described above) which are, for example, a cloning vector or an expression vector. Such vectors are, for example, in the form of a plasmid, an agrobacterium, a virus, a naked polynucleotide, or a conjugated polynucleotide. The vectors are introduced into plant tissues, cultured plant cells or plant protoplasts by standard methods including electroporation (Fromm et al. (1985) Proc. Natl. Acad. Sci. USA 82:5824), infection by viral vectors such as cauliflower mosaic virus (CaMV) (Hohn et al. (1982) Molecular Biology of Plant Tumors, Academic Press, New York, pp. 549-560; Howell U.S. Patent No. 4,407,956), high velocity ballistic penetration by small particles with the nucleic acid either within the matrix of small beads or particles, or on the surface (Klein et al. (1987) Nature 321:10), use of pollen as vector (WO 85/01856), or use of Agrobacterium tumefaciens or A. rhizogenes carrying a T-DNA plasmid in which DNA fragments are cloned. The T-DNA plasmid is transmitted to plant cells upon infection by Agrobacterium tumefaciens, and a portion is stably integrated into the plant genome (Horsch et al. (1984) Science 233:496; Fraley et al. (1983) Proc. Natl. Acad. Sci. USA 80:4803).
The engineered host cells can be cultured in conventional nutrient media modified as appropriate for such activities as, for example, activating promoters or selecting transformants. These cells can optionally be cultured into transgenic plants. Plant regeneration from cultured protoplasts is described in Evans et al. (1983) "Protoplast Isolation and Culture," Handbook of Plant Cell Cultures 1, 124-176 (MacMillan Publishing Co., New York); Davey (1983) "Recent Developments in the Culture and Regeneration of Plant Protoplasts," Protoplasts, pp. 12-29, (Birkhauser, Basel); Dale (1983) "Protoplast Culture and Plant Regeneration of Cereals and Other Recalcitrant Crops," Protoplasts pp. 31-41, (Birkhauser, Basel); Binding (1985) "Regeneration of Plants," Plant Protoplasts, pp. 21-73, (CRC Press, Boca Raton).
The present invention also relates to the production of transgenic organisms, which can be bacteria, yeast, fungi, or plants, transduced with the nucleic acids, e.g., cloned QTL of the invention. A thorough discussion of techniques relevant to bacteria, unicellular eukaryotes and cell culture can be found in references enumerated above and are briefly outlined as follows. Several well-known methods of introducing target nucleic acids into bacterial cells are available, any of which can be used in the present invention. These include: fusion of the recipient cells with bacterial protoplasts containing the DNA, electroporation, projectile bombardment, and infection with viral vectors (discussed further, below), etc. Bacterial cells can be used to amplify the number of plasmids containing DNA constructs of this invention. The bacteria are grown to log phase and the plasmids within the bacteria can be isolated by a variety of methods known in the art (see, for instance, Sambrook). In addition, a plethora of kits are commercially available for the purification of plasmids from bacteria. For their proper use, follow the manufacturer's instructions (see, for example, EasyPrep™, FlexiPrep™, both from Pharmacia Biotech; StrataClean™, from Stratagene; and, QIAprep™ from Qiagen). The isolated and purified plasmids are then further manipulated to produce other plasmids, used to transfect plant cells or incorporated into Agrobacterium tumefaciens related vectors to infect plants. Typical vectors contain transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulation of the expression of the particular target nucleic acid. 
The vectors optionally comprise generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, or prokaryotes, or both (e.g., shuttle vectors), and selection markers for both prokaryotic and eukaryotic systems. Vectors are suitable for replication and integration in prokaryotes, eukaryotes, or preferably both. See, Giliman & Smith (1979) Gene 8:81; Roberts et al. (1987) Nature 328:731; Schneider et al. (1995) Protein Expr. Purif. 6435:10; Ausubel, Sambrook, Berger (all supra). A catalogue of Bacteria and Bacteriophages useful for cloning is provided, e.g., by the ATCC, e.g., The ATCC Catalogue of Bacteria and Bacteriophage (1992) Gherna et al. (eds) published by the ATCC. Additional basic procedures for sequencing, cloning and other aspects of molecular biology and underlying theoretical considerations are also found in Watson et al. (1992) Recombinant DNA, Second Edition, Scientific American Books, NY.
Transforming Nucleic Acids into Plants
Embodiments of the present invention pertain to the production of transgenic plants comprising the cloned nucleic acids of the invention. Techniques for transforming plant cells with nucleic acids are generally available and can be adapted to the invention by the use of nucleic acids encoding QTL or other genes encoding phenotypic traits of the invention. In addition to Berger, Ausubel and Sambrook, useful general references for plant cell cloning, culture and regeneration include Jones (ed) (1995) Plant Gene Transfer and Expression Protocols, Methods in Molecular Biology, Volume 49, Humana Press, Totowa, NJ; Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems, John Wiley & Sons, Inc., New York, NY (Payne); and Gamborg and Phillips (eds) (1995) Plant Cell, Tissue and Organ Culture: Fundamental Methods, Springer Lab Manual, Springer-Verlag (Berlin Heidelberg New York) (Gamborg). A variety of cell culture media are described in Atlas and Parks (eds) The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, FL (Atlas). Additional information for plant cell culture is found in available commercial literature such as the Life Science Research Cell Culture Catalogue (1998) from Sigma-Aldrich, Inc (St Louis, MO) (Sigma-LSRCCC) and, e.g., the Plant Culture Catalogue and supplement (1997) also from Sigma-Aldrich, Inc (St Louis, MO) (Sigma-PCCS). Additional details regarding plant cell culture are found in Croy (ed.) (1993) Plant Molecular Biology, Bios Scientific Publishers, Oxford, U.K.
The nucleic acid constructs of the invention, e.g., plasmids, cosmids, artificial chromosomes, DNA and RNA polynucleotides, are introduced into plant cells, either in culture or in the organs of a plant, by a variety of conventional techniques. Where the sequence is expressed, the sequence is optionally combined with transcriptional and translational initiation regulatory sequences which direct the transcription or translation of the sequence from the exogenous DNA in the intended tissues of the transformed plant.
The DNA constructs of the invention, for example plasmids, cosmids, phage, naked or variously conjugated DNA polynucleotides (e.g., polylysine-conjugated DNA, peptide-conjugated DNA, liposome-conjugated DNA, etc.), or artificial chromosomes, can be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant cells using ballistic methods, such as DNA particle bombardment.
Microinjection techniques for injecting, e.g., cells, embryos, and protoplasts are known in the art and well described in the scientific and patent literature. For example, a number of methods are described in Jones (ed) (1995) Plant Gene Transfer and Expression Protocols, Methods in Molecular Biology, Volume 49, Humana Press, Totowa, NJ, as well as in the other references noted herein and available in the literature.
For example, the introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski, et al., EMBO J. 3:2717 (1984). Electroporation techniques are described in Fromm, et al., Proc. Nat'l. Acad. Sci. USA 82:5824 (1985). Ballistic transformation techniques are described in Klein, et al., Nature 327:70-73 (1987). Additional details are found in Jones (1995) supra.
Alternatively, and in some cases preferably, Agrobacterium-mediated transformation is employed to generate transgenic plants. Agrobacterium-mediated transformation techniques, including disarming and use of binary vectors, are also well described in the scientific literature. See, for example, Horsch, et al. (1984) Science 233:496; and Fraley et al. (1984) Proc. Nat'l. Acad. Sci. USA 80:4803. Recent reviews include Hansen and Chilton (1998) Current Topics in Microbiology 240:22 and Das (1998) Subcellular Biochemistry 29: Plant Microbe Interactions pp. 343-363.
Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype and thus the desired phenotype. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans et al. (1983) Protoplasts Isolation and Culture, Handbook of Plant Cell Culture pp. 124-176, Macmillan Publishing Company, New York; and Binding (1985) Regeneration of Plants, Plant Protoplasts pp. 21-73, CRC Press, Boca Raton. Regeneration can also be obtained from plant callus, explants, somatic embryos (Dandekar et al. (1989) J. Tissue Cult. Meth. 12:145; McGranahan, et al. (1990) Plant Cell Rep. 8:512), organs, or parts thereof. Such regeneration techniques are described generally in Klee et al. (1987) Ann. Rev. of Plant Phys. 38:467-486. Additional details are found in Payne (1992) and Jones (1995), both supra. These methods are adapted to the invention to produce transgenic plants bearing QTLs and other genes isolated according to the methods of the invention.
Preferred plants for the transformation and expression of QTL and other nucleic acids identified and cloned according to the present invention include agronomically and horticulturally important species. Such species include, but are not restricted to, members of the families: Gramineae (including corn, rye, triticale, barley, millet, rice, wheat, oats, etc.); Leguminosae (including pea, beans, lentil, peanut, yam bean, cowpeas, velvet beans, soybean, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, and sweetpea); Compositae (the largest family of vascular plants, including at least 1,000 genera, including important commercial crops such as sunflower); and Rosaceae (including raspberry, apricot, almond, peach, rose, etc.), as well as nut plants (including walnut, pecan, hazelnut, etc.) and forest trees (including Pinus, Quercus, Pseudotsuga, Sequoia, Populus, etc.).
Additionally, preferred targets for modification with the nucleic acids of the invention, as well as those specified above, include plants from the genera: Agrostis, Allium, Antirrhinum, Apium, Arachis, Asparagus, Atropa, Avena (e.g., oats), Bambusa, Brassica, Bromus, Browallia, Camellia, Cannabis, Capsicum, Cicer, Chenopodium, Cichorium, Citrus, Coffea, Coix, Cucumis, Cucurbita, Cynodon, Dactylis, Datura, Daucus, Digitalis, Dioscorea, Elaeis, Eleusine, Festuca, Fragaria, Geranium, Glycine, Helianthus, Hemerocallis, Hevea, Hordeum (e.g., barley), Hyoscyamus, Ipomoea, Lactuca, Lens, Lilium, Linum, Lolium, Lotus, Lycopersicon, Majorana, Malus, Mangifera, Manihot, Medicago, Nemesia, Nicotiana, Onobrychis, Oryza (e.g., rice), Panicum, Pelargonium, Pennisetum (e.g., millet), Petunia, Pisum, Phaseolus, Phleum, Poa, Prunus, Ranunculus, Raphanus, Ribes, Ricinus, Rubus, Saccharum, Salpiglossis, Secale (e.g., rye), Senecio, Setaria, Sinapis, Solanum, Sorghum, Stenotaphrum, Theobroma, Trifolium, Trigonella, Triticum (e.g., wheat), Vicia, Vigna, Vitis, Zea (e.g., corn), and the Olyreae, the Pharoideae and many others. As noted, plants in the family Gramineae are particularly preferred target plants for transformation with cloned sequences corresponding to QTL or other nucleic acids by the methods of the invention.
Common crop plants which are targets of the present invention include corn, rice, triticale, rye, cotton, soybean, sorghum, wheat, oats, barley, millet, sunflower, canola, peas, beans, lentils, peanuts, yam beans, cowpeas, velvet beans, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, sweetpea and nut plants (e.g., walnut, pecan, etc).
In construction of recombinant expression cassettes of the invention, which include, for example, helper plasmids comprising virulence functions, and plasmids or viruses comprising exogenous DNA sequences such as structural genes, a plant promoter fragment is optionally employed which directs expression of a nucleic acid in any or all tissues of a regenerated plant. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'- promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill. Alternatively, the plant promoter can direct expression of the polynucleotide of the invention in a specific tissue (tissue-specific promoters) or can be otherwise under more precise environmental control (inducible promoters). Examples of tissue-specific promoters under developmental control include promoters that initiate transcription only in certain tissues, such as fruit, seeds, or flowers.
Any of a number of promoters which direct transcription in plant cells can be suitable. The promoter can be either constitutive or inducible. In addition to the promoters noted above, promoters of bacterial origin which operate in plants include the octopine synthase promoter, the nopaline synthase promoter and other promoters derived from native Ti plasmids. See, Herrera-Estrella et al. (1983), Nature, 303:209. Viral promoters include the 35S and 19S RNA promoters of cauliflower mosaic virus. See, Odell et al. (1985) Nature, 313:810. Other plant promoters include the ribulose-1,5-bisphosphate carboxylase small subunit promoter and the phaseolin promoter. The promoter sequence from the E8 gene and other genes can also be used. The isolation and sequence of the E8 promoter is described in detail in Deikman and Fischer (1988) EMBO J. 7:3315. Many other promoters are in current use and can be coupled to an exogenous DNA sequence to direct expression of the nucleic acid.
If expression of a polypeptide, including those encoded by QTL or other nucleic acids correlating with phenotypic traits of the present invention, is desired, a polyadenylation region at the 3'-end of the coding region is typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from, e.g., T-DNA.
The vector comprising the sequences (e.g., promoters or coding regions) from genes encoding expression products and transgenes of the invention will typically include a nucleic acid subsequence, a marker gene which confers a selectable, or alternatively, a screenable, phenotype on plant cells. For example, the marker can encode biocide tolerance, particularly antibiotic tolerance, such as tolerance to kanamycin, G418, bleomycin, or hygromycin, or herbicide tolerance, such as tolerance to chlorsulfuron or phosphinothricin (the active ingredient in the herbicides bialaphos or Basta). See, e.g., Padgette et al. (1996) In: Herbicide-Resistant Crops (Duke, ed.), pp 53-84, CRC Lewis Publishers, Boca Raton ("Padgette, 1996"). For example, crop selectivity to specific herbicides can be conferred by engineering genes into crops which encode appropriate herbicide metabolizing enzymes from other organisms, such as microbes. See, Vasil (1996) In: Herbicide-Resistant Crops (Duke, ed.), pp 85-91, CRC Lewis Publishers, Boca Raton ("Vasil, 1996").
One of skill will recognize that after the exogenous DNA sequence is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
High Throughput Screening
In one aspect of the invention, the determination of genetic marker alleles is performed by high throughput screening. High throughput screening involves providing a library of genetic markers, e.g., RFLPs, AFLPs, isozymes, specific alleles and variable sequences, including SSR. Such libraries are then screened against plant genomes to generate a "fingerprint" for each plant under consideration. In some cases a partial fingerprint comprising a sub-portion of the markers is generated in an area of interest. Once the genetic marker alleles of a plant have been identified, the correspondence between one or several of the marker alleles and a desired phenotypic trait is determined through statistical associations based on the methods of this invention.
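By way of illustration only, a fingerprint can be pictured as a mapping from marker names to observed alleles, and a partial fingerprint as the subset of that mapping lying in an area of interest. The following Python sketch assumes a hypothetical marker map; the marker names, chromosome numbers, centimorgan positions and the function name are invented for the example and do not come from the disclosure.

```python
# Hypothetical marker library: marker name -> (chromosome, position in cM).
# All names and positions are illustrative assumptions.
MARKER_MAP = {
    "m1": (1, 5.0),
    "m2": (1, 20.0),
    "m3": (1, 42.0),
    "m4": (2, 10.0),
}


def partial_fingerprint(fingerprint, chromosome, start_cm, end_cm):
    """Keep only the markers of a fingerprint that lie in the region of interest."""
    return {
        marker: allele
        for marker, allele in fingerprint.items()
        if MARKER_MAP[marker][0] == chromosome
        and start_cm <= MARKER_MAP[marker][1] <= end_cm
    }


# Full fingerprint of one plant: marker name -> observed allele.
plant_a = {"m1": "A", "m2": "B", "m3": "A", "m4": "B"}

# Partial fingerprint restricted to the first 25 cM of chromosome 1.
region = partial_fingerprint(plant_a, chromosome=1, start_cm=0.0, end_cm=25.0)
print(region)
```

A dictionary keyed by marker name is used here only because it makes the restriction to a sub-portion of the markers a one-line filter; other data structures would serve equally well.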
High throughput screening can be performed in many different formats. Hybridization can take place in a 96-, 324-, or a 1524-well format or in a matrix on a silicon chip or other format. In one commonly used format, a dot blot apparatus is used to deposit samples of fragmented and denatured genomic or amplified DNA on a nylon or nitrocellulose membrane. After cross-linking the nucleic acid to the membrane, either through exposure to ultra-violet light or by heat, the membrane is incubated with a labeled hybridization probe. The labels are incorporated into the nucleic acid probes by any of a number of means well-known in the art. The membranes are washed to remove non-hybridized probes and the association of the label with the target nucleic acid sequence is determined.
A number of well-known robotic systems have been developed for high throughput screening, particularly in a 96 well format. These systems include automated workstations like the automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, Hopkinton, MA; ORCA™, Beckman Coulter, Fullerton, CA). Any of the above devices are suitable for use with the present invention. The nature and implementation of modifications to these devices (if any) so that they can operate as discussed herein will be apparent to persons skilled in the relevant art.
In addition, high throughput screening systems themselves are commercially available (see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries, Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision Systems, Inc., Natick, MA, etc.). These systems typically automate entire procedures including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate or membrane in detector(s) appropriate for the assay. These configurable systems provide high throughput and rapid start up as well as a high degree of flexibility and customization. The manufacturers of such systems provide detailed protocols for the use of their products in high throughput applications.
In one variation of the invention, solid phase arrays are adapted for the rapid and specific detection of multiple polymorphic nucleotides. Typically, a nucleic acid probe is linked to a solid support and a target nucleic acid is hybridized to the probe. Either the probe, or the target, or both, can be labeled, typically with a fluorophore. If the target is labeled, hybridization is evaluated by detecting bound fluorescence. If the probe is labeled, hybridization is typically detected by quenching of the label by the bound nucleic acid. If both the probe and the target are labeled, detection of hybridization is typically performed by monitoring a color shift resulting from proximity of the two bound labels. In one embodiment, an array of probes is synthesized on a solid support.
Using chip masking technologies and photoprotective chemistry, it is possible to generate ordered arrays of nucleic acid probes. These arrays, which are known, e.g., as "DNA chips" or as very large scale immobilized polymer arrays (VLSIPS™ arrays), can include millions of defined probe regions on a substrate having an area of about 1 cm² to several cm².
In another embodiment, capillary electrophoresis is used to analyze polymorphism. This technique works best when the polymorphism is based on size, for example, AFLP and SSR. This technique is described in detail in U.S. Patent Nos. 5,534,123 and 5,728,282. Briefly, capillary electrophoresis tubes are filled with the separation matrix. The separation matrix contains hydroxyethyl cellulose, urea and optionally formamide. The AFLP or SSR samples are loaded onto the capillary tube and electrophoresed. Because of the small amount of sample and separation matrix required by capillary electrophoresis, the run times are very short. The molecular sizes, and therefore the number of nucleotides present in the nucleic acid sample, are determined by techniques described herein. In a high throughput format, many capillary tubes are placed in a capillary electrophoresis apparatus. The samples are loaded onto the tubes and electrophoresis of the samples is run simultaneously. See, Mathies and Huang, (1992) Nature 359:167.
Integrated Systems/Computer Assisted Methods
Because of the great number of possible combinations present in one array, in one aspect of the invention, an integrated system such as a computer, software corresponding to the statistical models of the invention, and data sets corresponding to genetic markers and phenotypic values, facilitates mapping of phenotypic traits, including QTLs. The phrase "integrated system" in the context of this invention refers to a system in which data entering a computer corresponds to physical objects or processes external to the computer, e.g., nucleic acid sequence hybridization, and a process that, within a computer, causes a physical transformation of the input signals to different output signals. In other words, the input data, e.g., hybridization on a specific region of an array, is transformed to output data, e.g., the identification of the sequence hybridized. The process within the computer is a set of instructions, or "program," by which positive hybridization signals are recognized by the integrated system and attributed to individual samples as a genotype. Additional programs correlate the genotype, and more particularly in the methods of the invention, the haplotype, of individual samples with phenotypic values, e.g., using the HAPLO-IM+, HAPLO-MQM, and/or HAPLO-MQM+ models of the invention. For example, the programs JoinMap® and MapQTL® are particularly suited to this type of analysis and can be extended to include the HAPLO-IM+, HAPLO-MQM, and/or HAPLO-MQM+ models of the invention. In addition, there are numerous, e.g., C/C++ programs for computing, Delphi and/or Java programs for GUI interfaces, and Active X applications (e.g., Olectra Chart and True WebChart) for charting tools. Other useful software tools in the context of the integrated systems of the invention include statistical packages such as SAS, Genstat, and S-Plus. Furthermore, additional programming languages such as Fortran and the like are also suitably employed in the integrated systems of the invention.
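The step in which positive hybridization signals are recognized and attributed to a sample as a genotype can be pictured with a short sketch. The Python fragment below is a simplified illustration only: it assumes a co-dominant marker with two allele-specific probes, signal intensities already normalized to the range 0 to 1, and a hypothetical calling threshold of 0.5. None of these choices is specified in the disclosure.

```python
def call_genotype(signal_allele_1, signal_allele_2, threshold=0.5):
    """Turn two normalized probe intensities into a genotype call.

    Returns "11" or "22" for the two homozygotes, "12" for the
    heterozygote, and None when neither probe gives a positive signal.
    The threshold is an illustrative assumption.
    """
    has_1 = signal_allele_1 >= threshold
    has_2 = signal_allele_2 >= threshold
    if has_1 and has_2:
        return "12"
    if has_1:
        return "11"
    if has_2:
        return "22"
    return None  # missing call: no positive hybridization signal


# Hypothetical intensities for three samples at one marker.
samples = {"s1": (0.9, 0.1), "s2": (0.8, 0.7), "s3": (0.2, 0.1)}
calls = {name: call_genotype(*signals) for name, signals in samples.items()}
print(calls)
```

In practice the threshold would be calibrated per assay and per detector; a fixed value is used here only to keep the example short.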
For example, phenotypic values assigned to a population of progeny descending from related or unrelated crosses are recorded in a computer readable medium, thereby establishing a database that associates phenotypic values with unique identifiers for each member of the population of progeny. Data regarding genotype for one or more haplotypes corresponding to a plurality of genetic markers, e.g., RFLP, AFLP, ASH, isozyme markers, SSR, SNP or other markers as described herein, are similarly recorded in a computer accessible database. Optionally, marker data is obtained using an integrated system that automates one or more aspects of the assay (or assays) used to determine the haplotype.
In such a system, input data corresponding to genotypes for independent genetic markers or for haplotypes are relayed from a device, e.g., an array, a scanner, a CCD, or other detection device directly to files in a computer readable medium accessible to the central processing unit. A set of instructions (embodied in one or more programs) encoding the statistical models of the invention is then executed by the computational device to identify correlations between phenotypic values and haplotypes. Typically, the integrated system also includes a user input device, such as a keyboard, a mouse, a touchscreen, or the like, for, e.g., selecting files, retrieving data, etc., and an output device (e.g., a monitor, a printer, etc.) for viewing or recovering the product of the statistical analysis.
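As a rough picture of the data flow just described, the sketch below joins progeny identifiers, haplotype calls and phenotypic values and summarizes the phenotype per haplotype. It is a descriptive summary only and does not implement the statistical models of the invention; the record layout, identifiers, haplotype labels and values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_phenotype_by_haplotype(records):
    """Group phenotypic values by haplotype and return the mean per group.

    Each record is (plant identifier, haplotype label, phenotypic value).
    """
    groups = defaultdict(list)
    for _plant_id, haplotype, phenotype in records:
        groups[haplotype].append(phenotype)
    return {haplotype: mean(values) for haplotype, values in groups.items()}


# Hypothetical records as they might be read from the marker and
# phenotype databases after joining on the unique plant identifier.
records = [
    ("plant-001", "A-B-A", 10.0),
    ("plant-002", "A-B-A", 12.0),
    ("plant-003", "B-B-A", 7.0),
    ("plant-004", "B-B-A", 9.0),
    ("plant-005", "A-A-B", 8.5),
]

summary = mean_phenotype_by_haplotype(records)
print(summary)
```

A difference between haplotype means in such a summary is only suggestive; deciding whether it reflects a QTL is the role of the regression models and test statistics summarized in Tables 2 and 3.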
Thus, in one aspect, the invention provides an integrated system comprising a computer or computer readable medium comprising a database with at least one data set that corresponds to genotypes for genetic markers. The system also includes a user interface allowing a user to selectively view one or more databases. In addition, standard text manipulation software such as word processing software (e.g., Microsoft Word™ or Corel WordPerfect™) and database or spreadsheet software (e.g., spreadsheet software such as Microsoft Excel™, Corel Quattro Pro™, or database programs such as Microsoft Access™ or Paradox™) can be used in conjunction with a user interface (e.g., a GUI in a standard operating system such as a Windows, Macintosh or Linux system) to manipulate strings of characters.
The invention also provides integrated systems for sample manipulation incorporating robotic devices as previously described. A robotic liquid control armature for transferring solutions (e.g., plant cell extracts) from a source to a destination, e.g., from a microtiter plate to an array substrate, is optionally operably linked to the digital computer (or to an additional computer in the integrated system). An input device for entering data to the digital computer to control high throughput liquid transfer by the robotic liquid control armature and, optionally, to control transfer by the armature to the solid support is commonly a feature of the integrated system. Integrated systems for genetic marker analysis of the present invention typically include a digital computer with one or more of high-throughput liquid control software, image analysis software, data interpretation software, a robotic liquid control armature for transferring solutions from a source to a destination operably linked to the digital computer, an input device (e.g., a computer keyboard) for entering data to the digital computer to control high throughput liquid transfer by the robotic liquid control armature and, optionally, an image scanner for digitizing label signals from labeled probes hybridized, e.g., to expression products on a solid support operably linked to the digital computer. The image scanner interfaces with the image analysis software to provide a measurement of, e.g., differentiating nucleic acid probe label intensity upon hybridization to an arrayed sample nucleic acid population, where the probe label intensity measurement is interpreted by the data interpretation software to show whether, and to what degree, the labeled probe hybridizes to a label. 
The data so derived is then correlated with phenotypic values using the statistical models of the present invention to determine the correspondence between phenotype and genotype(s) for genetic markers, thereby assigning chromosomal locations.
Optical images, e.g., hybridization patterns viewed (and, optionally, recorded) by a camera or other recording device (e.g., a photodiode and data storage device) are optionally further processed in any of the embodiments herein, e.g., by digitizing the image and/or storing and analyzing the image on a computer. A variety of commercially available peripheral equipment and software is available for digitizing, storing and analyzing a digitized video or digitized optical image, e.g., using PC (Intel x86 or Pentium chip-compatible DOS™, OS2™, WINDOWS™, WINDOWS NT™ or WINDOWS95™ based machines), MACINTOSH™, LINUX, or UNIX based (e.g., SUN™ work station) computers.
While the foregoing invention has been described in some detail for purposes of clarity and understanding, it will be clear to one skilled in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the invention. For example, alternative genetic markers can readily be applied in the methods of the invention. Additionally, both single gene and quantitative trait loci are suitable for mapping according to the methods of the invention. All publications, patents, patent applications, or other documents cited in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent, patent application, or other document were individually indicated to be incorporated by reference for all purposes.
Table 1. QTL in an F2:3 testcross. Parents Q1Q1 and Q2Q2 are crossed to generate F2 offspring, and each F2 plant is crossed to a tester QQ to generate an F2:3 line.
Table 2. Summary of models for multiple F2:3 populations
Approach | QTL-effects defined per | Regression model for a given F2 plant in a given family f
IM | Family | $y = a_{1f}\,x_1 + a_{2f}\,x_2 + e$
MQM | Family | $y = \sum_i \{ a_{1f}(i)\,x_1(i) + a_{2f}(i)\,x_2(i) \} + e$
HAPLO-MQM | Haplotype | $y = \sum_i \{ a_{h(1f)}(i)\,x_1(i) + a_{h(2f)}(i)\,x_2(i) \} + e$
HAPLO-MQM+ | Haplotype | $y = \sum_i \{ a_{h(1f)}(i)\,x_1(i) + a_{h(2f)}(i)\,x_2(i) \} + u_f + e$
$y$: trait value of a given F2 plant in family f
$a_{1f}$: effect of QTL allele obtained from parent 1 of family f
$a_{2f}$: effect of QTL allele obtained from parent 2 of family f
$a_{h(1f)}$: effect of QTL haplotype obtained from parent 1 of family f
$a_{h(2f)}$: effect of QTL haplotype obtained from parent 2 of family f
$x_1$: indicator for the number of copies of QTL allele obtained from parent 1
$x_2$: indicator for the number of copies of QTL allele obtained from parent 2
$u_f$: family effect (genetic background)
$e$: residual error
$i$: QTL number indicator; $i = 1, \ldots, N_{QTL}$
$N_{fam}$: number of families
$N_{QTL}$: number of QTLs
$N_{haplo}$: number of different haplotypes in window around QTL
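To make the notation of Table 2 concrete, the following Python sketch evaluates the systematic part of the HAPLO-MQM+ model (everything except the residual error e) for a single F2 plant. It only evaluates the linear predictor for given effects and does not estimate them. The haplotype labels, effect sizes, family effect and function name are hypothetical values chosen for the example, and the sketch assumes that the copy numbers x1 and x2 of an F2 plant sum to two at each QTL.

```python
def haplo_mqm_plus_predict(qtls, haplotype_effects, family_effect=0.0):
    """Evaluate sum_i { a_h(1f)(i) * x1(i) + a_h(2f)(i) * x2(i) } + u_f.

    qtls: one dict per QTL i with the haplotypes carried by parent 1 ("h1")
          and parent 2 ("h2") of the family, and the copy numbers "x1", "x2".
    haplotype_effects: one dict per QTL i mapping haplotype label -> effect.
    family_effect: u_f; leaving it at 0.0 gives the HAPLO-MQM predictor.
    """
    total = family_effect
    for i, qtl in enumerate(qtls):
        effects = haplotype_effects[i]
        total += effects[qtl["h1"]] * qtl["x1"]
        total += effects[qtl["h2"]] * qtl["x2"]
    return total


# Hypothetical haplotype effects at two QTLs (shared across families).
effects = [
    {"hA": 1.5, "hB": -0.5},
    {"hA": 0.25, "hC": 0.75},
]

# One F2 plant: homozygous for the parent-1 haplotype at QTL 1,
# heterozygous at QTL 2.
plant = [
    {"h1": "hA", "h2": "hB", "x1": 2, "x2": 0},
    {"h1": "hA", "h2": "hC", "x1": 1, "x2": 1},
]

with_family = haplo_mqm_plus_predict(plant, effects, family_effect=0.5)
without_family = haplo_mqm_plus_predict(plant, effects)
print(with_family, without_family)
```

Because the effects are indexed by haplotype rather than by family, two families whose parents carry the same haplotype at a QTL share one effect parameter, which is the distinction Table 2 draws between the MQM and HAPLO-MQM rows.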
Table 3. Summarized results from the simulations. See main text for detailed description. (a) 1st set of simulations with bi-allelic markers and 60 families with 10 or 50 progeny/family derived from 45% related parents with zero or 0.20 linkage disequilibrium; (b) 2nd set of simulations with multi-allelic markers and 60 families with 50 progeny/family derived from 10% related parents with 0.20 linkage disequilibrium; (c) 3rd set of simulations with multi-allelic markers and 60 families with 50 progeny/family derived from 40% related parents with zero linkage disequilibrium.
a LD = linkage disequilibrium; f(+) = frequency of active QTL allele; $h^2_{QTL} = \sigma^2_{QTL} / (\sigma^2_{QTL} + \sigma^2_a + \sigma^2_e)$, where $\sigma^2_{QTL}$ is the variation induced by the QTL if it segregates in a single F2:3 population, $\sigma^2_a$ refers to the genetic variation when all other QTLs also segregate, and $\sigma^2_e$ is the residual variation.
b We used (approximate) thresholds for QTL detection. For IM, $\chi^2(60; 0.001) \approx 100$. For HAPLO-MQM and HAPLO-MQM+, $\chi^2(15; 0.001) \approx 38$ in the 1st and 2nd set and $\chi^2(37; 0.001) \approx 69$ in the 3rd set.
c Lowest and highest QTL peak in the regions of the simulated QTLs. We used twice the likelihood-ratio as a test statistic for QTL likelihood.
d IM: average QTL likelihood on chromosomes where no QTLs were simulated. No results (-) when each chromosome contains a QTL.
e HAPLO-MQM and HAPLO-MQM+: average QTL likelihood on chromosomes 1-10, regions of 30 cM on each side of simulated QTLs excluded.
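The approximate detection thresholds quoted in footnote b can be checked with a short calculation. The Python sketch below uses the Wilson-Hilferty approximation to the chi-square quantile, a choice made here only so the example needs nothing beyond the standard library; it is not stated to be the method used for the simulations. The helper for the heritability ratio of footnote a is likewise illustrative, and the variance components passed to it are hypothetical.

```python
from statistics import NormalDist


def chi2_upper_quantile(df, alpha):
    """Approximate chi-square critical value (Wilson-Hilferty approximation).

    Returns the value exceeded with probability alpha by a chi-square
    variable with df degrees of freedom.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3


def h2_qtl(var_qtl, var_a, var_e):
    """Share of variation due to one QTL: var_qtl / (var_qtl + var_a + var_e)."""
    return var_qtl / (var_qtl + var_a + var_e)


# Degrees of freedom and significance level as quoted in footnote b.
for df in (60, 15, 37):
    print(df, round(chi2_upper_quantile(df, 0.001), 1))

# Hypothetical variance components giving a QTL heritability of 0.05.
print(h2_qtl(0.05, 0.45, 0.50))
```

The approximation reproduces the rounded thresholds of about 100, 38 and 69 to within one unit, which is adequate for a check against thresholds the footnote itself describes as approximate.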
Simulationᵃ | Short description | Results | IM | HAPLO-MQM | HAPLO-MQM+
(a) 1st set of simulations
1.1 | 10 unlinked QTLs; LD=0.20; h²QTL=0.05; 50 progeny/family | # QTLs foundᵇ | 10 | 10 | 10
1.1 | | Likelihood QTL regionsᶜ | 108-205 | 132-243 | 88-163
1.1 | | Likelihood elsewhereᵈᵉ | - | 14 | 5.3
1.2 | 10 unlinked QTLs; LD=0; h²QTL=0.05; 50 progeny/family | # QTLs found | 10 | 9 | 8
1.2 | | Likelihood QTL regions | 108-202 | 35-228 | 27-183
1.2 | | Likelihood elsewhere | - | 33 | 16
1.3 | 10 unlinked QTLs; LD=0; h²QTL=0.05; 10 progeny/family | # QTLs found | 0 | 3 | 0
1.3 | | Likelihood QTL regions | 35-85 | 24-47 | 20-36
1.3 | | Likelihood elsewhere | - | 17 | 13
(b) 2nd set of simulations
2.1 | 5 unlinked QTLs; LD=0.20; h²QTL=0.15; f(+)=0.55 | # QTLs foundᵇ | 5 | 5 | 5
2.1 | | Likelihood QTL regionsᶜ | 314-435 | 839-1222 | 590-937
2.1 | | Likelihood elsewhereᵈᵉ | 53 | 23 | 11
2.2 | 5 unlinked QTLs; LD=0.20; h²QTL=0.15; f(+)=0.12 | # QTLs found | 5 | 5 | 5
2.2 | | Likelihood QTL regions | 202-270 | 630-950 | 374-498
2.2 | | Likelihood elsewhere | 61 | 19 | 11
2.3 | 10 unlinked QTLs; LD=0.20; h²QTL=0.05; f(+)=0.12 | # QTLs found | 7 | 10 | 10
2.3 | | Likelihood QTL regions | 83-146 | 94-177 | 61-131
2.3 | | Likelihood elsewhere | - | 12 | 10
2.4 | 10 unlinked QTLs; LD=0.20; h²QTL=0.05; f(+)=0.55 and 0.12 | # QTLs found | 9 | 10 | 10
2.4 | | Likelihood QTL regions | 89-227 | 94-289 | 67-250
2.4 | | Likelihood elsewhere | - | 13 | 11
2.5 | 10 QTLs linked in pairs; LD=0.20; h²QTL=0.05; f(+)=0.55 and 0.12 | # QTLs found | 5 | 10 | 10
2.5 | | Likelihood QTL regions | 102-175 | 53-252 | 50-167
2.5 | | Likelihood elsewhere | 72 | 12 | 10
(c) 3rd set of simulations
3.1 | 5 unlinked QTLs; LD=0; h²QTL=0.15; f(+)=0.55 | # QTLs foundᵇ | 4 | 5 | 4
3.1 | | Likelihood QTL regionsᶜ | 85-422 | 201-969 | 46-644
3.1 | | Likelihood elsewhereᵈᵉ | 55 | 56 | 26
3.2 | 5 unlinked QTLs; LD=0; h²QTL=0.15; f(+)=0.12 | # QTLs found | 3 | 4 | 3
3.2 | | Likelihood QTL regions | 48-543 | 57-776 | 31-508
3.2 | | Likelihood elsewhere | 50 | 42 | 25
Table 4. A comparison of informative features that are important for mapping QTL in two general population structures utilized by plant breeders.
Features | Bi-parental inbred cross | Multi-parent breeding crosses
Basis for selecting lines in the cross | Divergent phenotypes and informative markers | Agronomic performance and pedigree relationships
Number of lines involved in crossing | 2 | 50-100
Number of progeny | 100-300 | 1,000-10,000
Information | Fully informative at all polymorphic markers and QTLs | Partially informative markers and QTLs across families
Disequilibrium | Maximized | Maximized within families, but variable across families
Linkage phase | Known and consistent | May be known within families; inconsistent across families
Frequency of QTL alleles | $f(Q) = f(q) = 0.5$ | $f(Q_i) \neq f(Q_j) \neq 0.5$
Table 5. Factors affecting information for QTL identification in progeny from plant breeding populations.
Genome
- number of markers (e.g. 200, 400)
- type of markers (co-dominant, dominant, mixture)
- number of alleles (e.g. 2, 5)
- frequency of missing and incorrect marker genotyping
- genome size (20M, 10 linkage groups)
- disequilibrium among breeding lines (e.g. 0, 0.15)
Population
- types of progeny (F2:3, F4:5, per se, top-cross)
- number of progeny (e.g. 300, 3000)
- number of families
- number of progeny per family (fixed, variable)
- number of testers
- relationships among families
- relationships among testers
Expression of Quantitative Trait
- number of QTL (e.g. 5, 10)
- relationship between marker loci and QTLs
- number of functional QTL alleles
- genetic effects (fixed, random)
- family effects (fixed, random)
- precision of QT data (progeny, parents)
- tester effects
- linkage among QTLs (linked or unlinked)

Claims

WHAT IS CLAIMED IS:
1. A method of mapping a phenotypic trait to a corresponding chromosomal location or region, the method comprising:
i) providing a population of progeny, the progeny descending from a plurality of families resulting from related or unrelated crosses;
ii) assigning phenotypic values to at least one phenotypic trait segregating in the population of progeny;
iii) determining a genotype for at least one haplotype in the population of progeny, which at least one haplotype comprises a plurality of genetic markers; and
iv) applying a statistical model which evaluates correspondence between the haplotype and the assigned phenotypic value, thereby identifying a chromosomal location corresponding to the phenotypic trait.
2. The method of claim 1, comprising mapping at least one quantitative trait locus in a plant, which at least one quantitative trait locus contributes to the phenotypic trait.
3. The method of claim 2, wherein the plant is a species selected from the group consisting of: Agrostis, Allium, Antirrhinum, Apium, Arabidopsis, Arachis, Asparagus, Atropa, Avena, Bambusa, Brassica, Bromus, Browallia, Camellia, Cannabis, Capsicum, Cicer, Chenopodium, Cichorium, Citrus, Coffea, Coix, Cucumis, Cucurbita, Cynodon, Dactylis, Datura, Daucus, Digitalis, Dioscorea, Elaeis, Eleusine, Festuca, Fragaria, Geranium, Glycine, Helianthus, Hemerocallis, Hevea, Hordeum, Hyoscyamus, Ipomoea, Lactuca, Lens, Lilium, Linum, Lolium, Lotus, Lycopersicon, Majorana, Malus, Mangifera, Manihot, Medicago, Nemesia, Nicotiana, Onobrychis, Oryza, Panicum, Pelargonium, Pennisetum, Petunia, Pisum, Phaseolus, Phleum, Poa, Prunus, Ranunculus, Raphanus, Ribes, Ricinus, Rubus, Saccharum, Salpiglossis, Secale, Senecio, Setaria, Sinapis, Solanum, Sorghum, Stenotaphrum, Theobroma, Trifolium, Trigonella, Triticum, Vicia, Vigna, Vitis, Zea, the Olyreae, and the Pharoideae.
4. The method of claim 3, wherein the plant is Zea mays, Glycine max, Sunflower, Sorghum, Wheat, Rice or Canola.
5. The method of claim 1, wherein the related crosses are biparental crosses, which biparental crosses comprise the mating of related parents.
6. The method of claim 5, wherein the related parents comprise inbred lines.
7. The method of claim 5, wherein the parents of the related crosses have a pedigree relationship between 0 and 85%.
8. The method of claim 1, comprising generating the population of progeny by a backcross or a topcross.
9. The method of claim 1, wherein the segregating phenotypic trait is selected from the group consisting of: yield, grain moisture, grain oil, root lodging, stalk lodging, plant height, ear height, disease resistance, insect resistance, grain yield, silage yield, grain composition, starch composition, oil composition, protein composition, maturity, time to flower, heat units to flower, days to flower, resistance to density, resistance to moisture stress, kernel number, kernel size, ear size, ear number, pod number, or number of seeds per pod.
10. The method of claim 1, comprising correlating pedigrees and genotypes at a plurality of genetic markers comprising a haplotype, wherein the haplotype is identified in a region under study, thereby determining identical by descent (IBD) data, and correlating the IBD data with a QTL.
11. The method of claim 1, comprising determining the genotype for at least one chromosomal haplotype.
12. The method of claim 1, comprising determining the genotype for at least one regional haplotype.
13. The method of claim 1, comprising applying a statistical model comprising a HAPLO-IM+ model, a HAPLO-MQM model, and a HAPLO-MQM+ model.
14. The method of claim 1, comprising applying a statistical model comprising a Bayesian analysis method, a frequentist analysis method, a fixed effects model, a random effects model, or a mixed effects model.
15. The method of claim 1, wherein the genetic markers comprising the haplotype are restriction fragment length polymorphisms (RFLP), isozyme markers, allele specific hybridization (ASH), amplified variable sequences of the plant genome, self-sustained sequence replication, simple sequence repeat (SSR), single nucleotide polymorphism (SNP), or arbitrary fragment length polymorphisms (AFLP).
16. The method of claim 15, wherein the genetic markers comprising the haplotype are typed by high throughput screening.
17. The method of claim 1, wherein:
(a) the phenotypic values are assigned to a plurality of phenotypic traits in a digital system by parameterizing the plurality of phenotypic traits to provide a set of uni- or multi-dimensional datapoints corresponding to the phenotypic values;
(b) the phenotypic values, or a set of uni- or multi-dimensional datapoints corresponding to the phenotypic values, are stored in a digital system or in a computer readable medium;
(c) the genotype of the at least one haplotype is determined in a digital system by determining a set of uni- or multi-dimensional datapoints corresponding to the haplotype; or
(d) the statistical model is applied in a digital system by executing an instruction set which solves one or more regression equations selected from a HAPLO-IM+ model, a HAPLO-MQM model, and a HAPLO-MQM+ model.
18. The method of claim 1, further comprising selecting for a desired phenotypic trait in at least one progeny of a plant breeding population.
19. The method of claim 18, comprising selecting for the desired phenotypic trait by marker assisted selection.
20. A plant selected by the method of claim 18.
21. The method of claim 1, further comprising cloning a nucleic acid fragment in linkage disequilibrium with a phenotypic trait; and transducing the nucleic acid fragment into a plant.
22. The method of claim 21, comprising transducing the nucleic acid fragment into a plant in an expression cassette comprising a promoter operably linked to the nucleic acid fragment.
23. The method of claim 21, wherein the plant is sexually crossed with a second plant.
24. The transgenic plant made by the method of claim 21.
25. The transgenic plant of claim 24, which is a member of the species Zea mays, Glycine max, Sunflower, Sorghum, Wheat, Rice or Canola.
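
For illustration, step iv) of claim 1 (evaluating correspondence between a haplotype and the assigned phenotypic values across several families) can be sketched as an ordinary least-squares comparison of two nested fixed-effects models. This is a deliberately simplified stand-in and not the HAPLO-IM+, HAPLO-MQM, or HAPLO-MQM+ models named in claims 13 and 17. It assumes that each progeny carries a single haplotype label at the region under study (for example, inbred progeny), that family enters as a fixed effect, and that residuals are independent. The function names `dummy_code`, `residual_ss`, and `haplotype_f_statistic` are hypothetical.

```python
import numpy as np


def dummy_code(labels):
    """Indicator (0/1) columns for categorical labels, dropping the first level."""
    levels = sorted(set(labels))
    rows = [[1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels]
    return np.array(rows, dtype=float).reshape(len(labels), len(levels) - 1)


def residual_ss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def haplotype_f_statistic(phenotype, family, haplotype):
    """F statistic for haplotype effects after adjusting for family effects.

    Reduced model: phenotype ~ intercept + family
    Full model:    phenotype ~ intercept + family + haplotype
    A large value suggests the haplotyped region is associated with the trait.
    """
    y = np.asarray(phenotype, dtype=float)
    n = len(y)
    X0 = np.hstack([np.ones((n, 1)), dummy_code(family)])
    X1 = np.hstack([X0, dummy_code(haplotype)])
    rank0 = np.linalg.matrix_rank(X0)
    rank1 = np.linalg.matrix_rank(X1)
    df_hap = rank1 - rank0          # extra parameters explained by haplotypes
    df_res = n - rank1              # residual degrees of freedom of full model
    rss0, rss1 = residual_ss(X0, y), residual_ss(X1, y)
    return ((rss0 - rss1) / df_hap) / (rss1 / df_res)
```

In a genome scan, a statistic of this kind would be computed for each candidate region in turn and the regions with the largest values examined further. Models with cofactors or random effects, as contemplated in claim 14, would need a more elaborate fit than this sketch provides.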

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US17398599P 1999-12-30 1999-12-30
US173985P 1999-12-30
US17511700P 2000-01-06 2000-01-06
US175117P 2000-01-06
US18033000P 2000-02-04 2000-02-04
US180330P 2000-02-04
PCT/US2000/034971 WO2001049104A2 (en) 1999-12-30 2000-12-21 Mqm mapping using haplotyped putative qtl-alleles: a simple approach for mapping qtl's in plant breeding populations

Publications (1)

Publication Number Publication Date
EP1265476A2 (en) 2002-12-18

Family

ID=27390364

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00989407A Withdrawn EP1265476A2 (en) 1999-12-30 2000-12-21 Mqm mapping using haplotyped putative qtl-alleles: a simple approach for mapping qtl's in plant breeding populations

Country Status (3)

Country Link
EP (1) EP1265476A2 (en)
AU (1) AU2591501A (en)
WO (1) WO2001049104A2 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6399855B1 (en) * 1997-12-22 2002-06-04 Pioneer Hi-Bred International, Inc. QTL mapping in plant breeding populations

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0149104A2 *

Also Published As

Publication number Publication date
AU2591501A (en) 2001-07-16
WO2001049104A2 (en) 2001-07-12
WO2001049104A3 (en) 2002-08-15

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020719

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17Q First examination report despatched

Effective date: 20040527

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20041208