CA2358509A1 - Molecular profiling for heterosis selection

Info

Publication number: CA2358509A1
Application number: CA002358509A
Authority: CA
Inventors: Ben Bowen; Mei Guo; Oscar Smith
Original assignee: Individual
Current assignee: Pioneer Hi Bred International Inc
Priority date: 1999-01-21
Filing date: 2000-01-19
Publication date: 2000-07-27
Also published as: HUP0200319A3; WO2000042838A2; HUP0200319A2; MXPA01007325A; AU2621300A; WO2000042838A3; EP1143787A2

Abstract

Methods of correlating molecular profile information and heterosis are provided. Selection for dominant, additive, or under/overdominant markers provides for improved heterosis. Selection for the number of expression products in an expression profile provides for improved heterosis. Methods of identifying and cloning nucleic acids linked to heterotic traits are provided.
Methods of identifying parentage by consideration of expression profiles are provided.

Description

WO 00/42838 PCT/US00/01422
MOLECULAR PROFILING FOR HETEROSIS SELECTION
FIELD OF THE INVENTION
The invention relates to new methods of improving crop selection and selecting for heterosis using molecular and computer modeling techniques.
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a non-provisional filing of and claims priority to "MOLECULAR PROFILING FOR HETEROSIS" by Ben Bowen et al., USSN 60/116,617 filed January 21, 1999 and "MOLECULAR PROFILING FOR HETEROSIS" by Ben Bowen et al., USSN 60/166,368 filed November 17, 1999.
BACKGROUND OF THE INVENTION
Hybrid offspring often outperform their parents by a variety of different measures, including yield, adaptability to environmental changes, disease resistance, pest resistance, and the like. The improved properties for the hybrid as compared to the parents are collectively referred to as "hybrid vigor," or "heterosis." Hybridization between parents of dissimilar genetic stock has been used in animal husbandry and especially for improving major plant crops, such as corn, sugarbeet and sunflower.
Indeed, for some crops, such as corn (Zea mays), most of the crop which is grown is hybrid offspring. Because crossing these hybrid offspring results in a loss of vigor and lack of uniformity, the production of seed of these crops for planting is complex, utilizing inbred strains that are crossed to produce hybrid seed with uniform characteristics.
For example, the development of a maize hybrid typically involves three steps:
(1) the selection of plants from various germplasm pools for initial breeding crosses; (2) the selfing of the selected plants from the breeding crosses for several generations to produce a series of inbred lines, which, although different from each other, breed true and are highly uniform; and (3) crossing the selected inbred lines with different inbred lines to produce hybrid progeny (sometimes referred to as "F1" hybrids). During the inbreeding process in maize, the vigor of the lines decreases. Vigor is restored when two different inbred lines are crossed to produce hybrid progeny. A consequence of the homozygosity and homogeneity of the inbred lines is that hybrids produced by crossing a defined pair of inbreds are uniform and predictable. Once the inbreds that give a superior hybrid have been identified, the hybrid seed can be reproduced for as long as the homogeneity of the inbred parents is maintained.
Despite many years of research and the considerable commercial importance of generating hybrids with desirable traits, the molecular basis for heterosis is still essentially unknown. In a few cases, the loss of vigor due to inbreeding can be traced directly to a combination of undesirable genes (e.g., lethal or sublethal recessives).
However, the simple genetic combination of such genes is not at all sufficient to explain the phenomenon of heterosis. Even when crosses are optimized to eliminate such problematic genes, the resulting offspring still show a decrease in vigor when inbred. Furthermore, many phenotypic traits, such as yield, are the result of several interacting genes and it is unclear why combining parents with different genetic backgrounds results in an increase in yield.
Indeed, it is not even clear whether heterosis is the result of one or a few general genetic mechanisms, or whether it is the result of many simultaneously interacting processes.
Because of the lack of understanding of the molecular basis for heterosis, crop development has relied upon empirical observations of heterosis for hybrids which result from crossing selected inbred crop strains (or resulting from second order crosses, e.g., in which two inbreds are crossed to produce a hybrid which is then crossed with an inbred or hybrid strain to produce a subsequent 3-4 way heterotic hybrid). This laborious process has been conducted on a large scale, resulting in increases in desirable measures of heterosis, such as yield, of several percent per year.
Empirical methods based on quantitative genetics theory have resulted in a tripling of hybrid corn yield over the last 70 years. This has been essential for food security and a major contribution to the U.S. and world economy. By 2020, the World Bank and other groups predict that it will be necessary to double maize production and increase rice and wheat production by 50% to support projected population growth. Such an increase cannot be accomplished by increasing acreage in production (there is not enough additional acreage available). It is doubtful that simple empirical approaches will be sufficient to increase yield fast enough to meet projected demand.
Molecular methods have been used to a limited extent to supplement crop breeding programs to select desirable inbreds and hybrids. In general, these procedures have been used to identify genetic markers corresponding to desirable or undesirable loci (e.g., "quantitative trait loci" or QTLs) in plants under analysis. Genetic markers represent (mark the location of) specific loci in the genome of a species or closely related species, and sampling of different genotypes at these marker loci reveals genetic variation. The genetic variation at marker loci can then be described and applied to genetic studies, commercial breeding, diagnostics, cladistic analysis of variance, or genotyping of samples. Because molecular methods are amenable to high throughput analysis and because they do not require yield testing, they can be used to speed the process of crop development.
However, although these techniques are of considerable use, and can and do enhance the efficiency of crop breeding programs, they are not currently used, or useful, as a predictor for the more general phenomenon of heterosis.
Accordingly, there is a need in the art to determine how molecular, or other high-throughput methods, or models, can be applied to predict heterosis in individual organisms and in populations. The present invention provides a number of fundamental discoveries which make it possible to correlate molecular methods and the phenomenon of heterosis, as well as a variety of additional aspects which will be apparent upon complete review.
SUMMARY OF THE INVENTION
It is discovered that the number of gene products expressed at optimum levels in an organism such as a plant correlates with the degree of heterosis the organism displays.
Thus, by profiling the expression of RNA or protein in a tissue of a plant, it is possible to predict the level of heterosis the plant will display if tested for a heterotic trait such as yield.
Use of this correlation permits initial selection of organisms, such as commercial crops, without actual field testing. Because of the high throughput nature of molecular methods which can be used to profile expression, this initial selection dramatically speeds the process of increasing desirable traits (and decreasing undesirable traits), resulting in an increase in the rate, e.g., of crop improvement.
It is additionally discovered that there is a correlation between the number of dominant and additive expression products and the heterosis an organism such as a plant displays. As above, determination of the number (and/or ratio) of dominant and/or additive expression products permits selection of plants for heterosis without field testing. In all cases, profiling methods are used to determine the number, and/or relative ratio of any or all of additive, dominant, or under- or over-dominant expression products, thereby providing methods of selecting plants for increased heterosis based upon observed expression profiles.
In addition, modeling methods for predicting which crosses from a panel of potential crosses are most likely to result in increases in the number of expressed genes, or the number or ratio of additive or dominant genes, or which minimize the ratio of under- or over-dominant genes, are provided. New selection methods for obtaining desirable plants, and plants obtained by these methods are provided.
It is additionally discovered that gene silencing plays a role in heterosis.
Thus, by monitoring silencing of genes, it is possible to identify which genes are responsible for heterosis. Thus, in one aspect, a heterologous nucleic acid that results in expression of expression products from silenced genes (e.g., dominant or additive products) is introduced into a target plant. Examples of appropriate heterologous nucleic acids include one or more of: a transcription factor which activates a promoter from a silenced gene, a nucleic acid encoded by the silenced gene under the control of a heterologous promoter, and a nucleic acid homologous to the silenced gene with at least one region of difference with the silenced gene, which homologous nucleic acid can recombine with the silenced gene to produce a modified gene. Any of these nucleic acids can be cloned under the control of heterologous promoters and placed into target plants to increase heterosis of the target plants.
In desirable implementations of the methods herein, integrated systems comprising computer databases having expression profile information can be used to select which parental crosses are most likely to result in an increase in the number of expression products (or an optimization of expression products of a selected class, i.e., dominant, under-dominant, over-dominant, additive, or the like) in offspring. Thus, consideration of expression profile information provides not only a basis for selecting hybrids from crosses, but, using the methods herein, also identifies desirable crosses to be made.
Production and automated consideration of expression profile databases also provides a mechanism for identifying the genetic source of particular expression products, thereby indicating the likely parentage of given hybrids.
The invention additionally provides methods of cloning and transducing target plants or animals with dominant, additive, under-dominant and over-dominant genes identified by comparative examination of expression profiles.

BRIEF DESCRIPTION OF THE FIGURES
Figure 1 is a scatter plot showing the correlation between the degree of heterosis and relationship.
Figure 2 is a set of bar graphs showing classification of gene expression patterns in hybrid vs. inbred parents.
Figure 3 is a line graph showing the correlation between the pattern of gene [...]
Figure 4 is a set of bar graphs showing dominant, additive and over-/under-dominant RNA expression.
Figure 5 is a scatter graph showing the correlation between parental effects on gene expression and heterosis.
Figure 6a-c is a set of schematic illustrations showing polymorphic dominant products and sequence (SEQ ID NOS: 1-23, respectively).
DEFINITIONS
An "expression profile" is the result of detecting a representative sample of expression products from a cell, tissue or whole organism, or a representation (picture, graph, data table, database, etc.) thereof. For example, many RNA expression products of a cell or tissue can simultaneously be detected on a nucleic acid array, or by the technique of differential display or modification thereof such as Curagen's "GeneCalling™" technology.
Similarly, protein expression products can be tested by various protein detection methods, such as hybridization to peptide or antibody arrays, or by screening phage display libraries.
A "portion" or "subportion" of an expression profile, or a "partial profile" is a subset of the data provided by a complete profile, such as the information provided by a subset of the total number of detected expression products.

An "expression product" is any product transcribed in a cell from a DNA (e.g., from a gene) or translated from an RNA (e.g., a protein). Example expression products include RNAs and proteins.

A "representative sample" of expression products, e.g., from a particular cell, tissue, or whole organism is a sufficiently large number of expression products that a statistical comparison of the actual number and/or type of expression products between different cells, tissues, or whole organisms can be made. Ideally, at least about 50%, and typically 60%, 70%, 80%, 90%, 95% or 100% of the total expression products which are detectable by a given technique constitute the "representative sample." The representative sample will typically include a large number of expression products, as cells, tissues and organisms typically produce a fairly large number of expression products. For example, a typical representative sample of expression products includes between about 100 and 20,000 or more expression products, e.g., about 100-500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 4,500, 5,000, 5,500, 6,000, 6,500, 7,000, 7,500, 8,000, 8,500, 9,000, 9,500, 10,000, 10,500, 11,000, 11,500, 12,000, 12,500, 13,000, 13,500, 14,000, 14,500, 15,000, 15,500, 16,000, 16,500, 17,000, 17,500, 18,000, 18,500, 19,000, 19,500, 20,000, or 30,000 expression products, or the like.
The term "correlation," unless indicated otherwise, is used herein to indicate that a "statistical association" exists between, e.g., an expression product and the degree of heterosis.
"Dominant" expression for an expression product refers to the situation where expression of the product in a progeny differs from one parent, and not the other for the expression product. "Additive" expression for an expression product refers to the situation where expression of the product in a progeny falls within the range of the two parents (and may or may not differ from both parents). "Over-dominant" or "under-dominant" expression for an expression product refers to the situation where expression of an expression product in a progeny differs from both parents and falls outside of the range of the two parents, either over the higher parent value, or under the lower parent value, respectively (Figure 2). Further, the term "differ" when referring to values is dependent on the technologies being utilized.
For example, when using Curagen's "GeneCalling™" technology, any difference in value of less than approximately 1.5- to 2.0-fold from a given parent is considered not to differ.
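The definitions above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function names, the default 2.0-fold threshold, the requirement that expression values be positive, the order in which the overlapping "dominant" and "additive" definitions are tested, and the "no difference" label are all choices made for this example rather than part of the specification.

```python
def differs(a: float, b: float, fold: float = 2.0) -> bool:
    """Return True if two positive expression values differ by at least `fold`."""
    if a <= 0 or b <= 0:
        raise ValueError("expression values must be positive")
    return max(a, b) / min(a, b) >= fold


def classify(parent1: float, parent2: float, progeny: float, fold: float = 2.0) -> str:
    """Label one expression product by comparing progeny to both parents.

    Assumed precedence (hypothetical): over-/under-dominant first,
    then dominant, then additive.
    """
    d1 = differs(progeny, parent1, fold)
    d2 = differs(progeny, parent2, fold)
    low, high = min(parent1, parent2), max(parent1, parent2)
    if d1 and d2:
        if progeny > high:
            return "over-dominant"
        if progeny < low:
            return "under-dominant"
        return "additive"  # differs from both parents but lies between them
    if d1 != d2:
        return "dominant"  # differs from exactly one parent
    return "no difference"  # within the threshold of both parents


if __name__ == "__main__":
    # Hypothetical expression values for one product in two parents and a hybrid.
    print(classify(10.0, 40.0, 40.0))
```

Because "differ" depends on the detection technology, the `fold` parameter is left adjustable rather than fixed.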
A "biological sample" is a portion of material isolated from a biological source such as a plant, isolated plant tissue, or plant cell, or a portion of material made from such a source, such as a cell extract or the like.
A "promoter" is an array of nucleic acid control sequences which direct transcription of a nucleic acid. As used herein, a promoter includes necessary nucleic acid sequences near the start site of transcription, such as, in the case of a polymerase II type promoter, a TATA element. A promoter also optionally includes distal enhancer or repressor elements which can be located as much as several thousand base pairs from the start site of transcription. A "constitutive" promoter is a promoter which is active in a selected organism under most environmental and developmental conditions. An "inducible" promoter is a promoter which is under environmental or developmental regulation in a selected organism.
The phrase "hybrid plants" refers to plants which result from a cross between genetically different individuals.
The phrase "sexually crossed" or "sexual reproduction" in the context of seed crop plants refers to the fusion of gametes to produce, e.g., seed by pollination. A "sexual cross" is pollination of one plant by another. "Selfing" is the production of, e.g., seed by self-pollination, i.e., where the pollen and the ovule are from the same plant.
The phrase "tester parent" refers to a parent that is genetically different from a set of lines to which it is crossed. The cross is for purposes of evaluating differences among the lines in topcross combination. Using a tester parent in a sexual cross allows one of skill to determine the genetic differences between the tested lines on the phenotypic trait with expression of quantitative trait loci in a hybrid combination.
The phrases "topcross combination" and "hybrid combination" refer to the processes of crossing a single tester parent to multiple lines. The purpose of producing such crosses is to evaluate the ability of the lines to produce desirable phenotypes in hybrid progeny derived from the line by the tester cross.
The phrase "transgenic plant" refers to a plant into which exogenous polynucleotides have been introduced by any process other than sexual cross or selfing.
Examples of processes by which this can be accomplished are described below, and include Agrobacterium-mediated transformation, biolistic methods, electroporation, in planta techniques, and the like. Such a plant containing the exogenous polynucleotides is referred to here as an R1 generation transgenic plant. Transgenic plants may also arise from sexual cross or by selfing of transgenic plants into which exogenous polynucleotides have been introduced.

DETAILED DESCRIPTION
OVERVIEW OF SELECTION FOR HETEROSIS
Crop improvement relies extensively on the phenomenon of heterosis. Inbreds and/or hybrids are crossed to produce heterotic hybrids with desirable traits such as high yield, disease resistance, resistance to heat, cold, salinity, insects, fungi, herbicides, pesticides, etc. Secondary desirable traits such as a particular size or shape of ears, solids content, sugar content, oil content, water content, etc., can also be affected by heterosis. The present invention establishes several correlations between the expression of gene products and heterosis, e.g., with respect to yield. These include a statistical association between the number of gene products and the degree of heterosis displayed; a statistical association between the number of gene products with a dominant expression pattern and the degree of heterosis displayed; and a statistical association between the number of gene products with an additive expression pattern and the degree of heterosis displayed. In addition, it is discovered that genes are silenced during inbreeding in plants.
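A statistical association of the kind listed above could be examined, for example, with a Pearson correlation coefficient computed over a panel of hybrids. The sketch below is an assumption about one reasonable way to do this; the specification does not name a particular statistic here, and the function name and the numbers in the usage section are hypothetical placeholders, not data from the invention.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical panel: number of dominant expression products per hybrid
    # and a heterosis measure (e.g., yield relative to parents) per hybrid.
    dominant_counts = [120, 150, 90, 200, 170]
    heterosis_scores = [1.10, 1.25, 1.02, 1.40, 1.30]
    print(round(pearson_r(dominant_counts, heterosis_scores), 3))
```

The same function can be applied to total product counts or additive product counts in place of `dominant_counts`.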
These correlations provide new methods of selecting heterotic hybrids, without the necessity of field testing every hybrid to monitor heterotic traits. In the methods, expression of a first representative sample of first expression products (e.g., RNAs or proteins) is profiled from a first progeny plant (e.g., a hybrid resulting from crossing two or more parental lines). The expression products produced in the first progeny plant are quantified and/or monitored for the type of expression product (additive, dominant, under-dominant, over-dominant, etc.). As noted above, the number of first expression products produced in the first progeny plant is statistically associated with a measure of heterosis in the first progeny plant, as is the number of dominant, additive, under-dominant or over-dominant, or silenced expression products. The plant is then selected (e.g., against similar measures for a second progeny plant, or a population of progeny plants, or against the parental stock) for further testing based upon the number or type of expression products detected.
Thus, the plant can be selected for one or more characteristics, including: a selected number of expression products, a selected number of dominant expression products, a selected ratio of dominant expression products to total expression products, a desired number of over- or under-dominant expression products, a selected ratio of over- or under-dominant expression products to total expression products, a selected number of additive expression products, and a selected ratio of additive expression products to total expression products.
Typically, the first progeny plant is selected to maximize the number of dominant expression products and/or to maximize the number of additive expression products, and/or to minimize the number of over- or under-dominant expression products. Crosses can also be selected to minimize silencing in the progeny plant.
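The selection characteristics above (counts and ratios per expression class) can be tabulated as in the following sketch. The class labels, the function names, and in particular the ranking score (dominant plus additive minus over-/under-dominant) are assumptions made for illustration; the text describes what is maximized or minimized but does not prescribe a single combined score.

```python
from collections import Counter
from typing import Dict, List


def summarize(labels: List[str]) -> Dict[str, float]:
    """Count expression classes for one progeny plant and compute ratios to total."""
    counts = Counter(labels)
    total = len(labels)
    over_under = counts["over-dominant"] + counts["under-dominant"]

    def ratio(n: int) -> float:
        return n / total if total else 0.0

    return {
        "total": total,
        "dominant": counts["dominant"],
        "additive": counts["additive"],
        "over_under": over_under,
        "dominant_ratio": ratio(counts["dominant"]),
        "additive_ratio": ratio(counts["additive"]),
        "over_under_ratio": ratio(over_under),
    }


def rank_progeny(profiles: Dict[str, List[str]]) -> List[str]:
    """Order progeny names by a hypothetical score, best first."""
    def score(name: str) -> float:
        s = summarize(profiles[name])
        return s["dominant"] + s["additive"] - s["over_under"]

    return sorted(profiles, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical per-product class labels for two hybrids.
    profiles = {
        "H1": ["dominant", "additive", "over-dominant"],
        "H2": ["dominant", "dominant", "additive"],
    }
    print(rank_progeny(profiles))
```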
The parental plants used to produce the first progeny can also be profiled.
Resulting parental expression profiles serve any of a variety of purposes. The parental expression profiles can be compared to the first progeny profile to aid in determining whether the progeny show an increase in the number of expression products as compared to parental stocks (thereby indicating that the progeny is likely to be heterotic). In addition, comparison between the parental expression profiles and the progeny profile is used to determine whether the individual expression products represented in the profile are dominant, additive, under-dominant, over-dominant, or the like. The parental expression profiles can also be placed into a database to aid in determining which crosses are most likely to produce heterotic hybrids. Potentially desirable crosses among members of the database are selected by identifying plants likely to produce progeny plants with a selected number of expression products which are dominant, over-dominant, under-dominant or additive. For example, parents are selected to produce the first progeny plant by selecting for complementary expression of dominant or additive expression products between the parents, or by selecting against expression of over-dominant or under-dominant expression products in the parents.
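One way to picture a database query for complementary parents is sketched below. It assumes, purely for illustration, that each parental profile is reduced to the set of expression products detected in that line, and that a candidate cross is scored by the size of the combined set and by the number of products present in only one parent. Both the data layout and the scoring are hypothetical simplifications.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def rank_crosses(database: Dict[str, Set[str]]) -> List[Tuple[str, str, int, int]]:
    """Rank all pairwise crosses among lines in a profile database.

    Returns (parent_a, parent_b, combined_count, complementary_count),
    ordered by combined count, then complementary count, both descending.
    """
    scored = []
    for a, b in combinations(sorted(database), 2):
        combined = database[a] | database[b]        # products present in either parent
        complementary = database[a] ^ database[b]   # products present in exactly one parent
        scored.append((a, b, len(combined), len(complementary)))
    scored.sort(key=lambda row: (-row[2], -row[3], row[0], row[1]))
    return scored


if __name__ == "__main__":
    # Hypothetical inbred lines and detected product identifiers.
    database = {
        "A": {"p1", "p2", "p3"},
        "B": {"p3", "p4"},
        "C": {"p1", "p2"},
    }
    for row in rank_crosses(database):
        print(row)
```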
An additional statistical association relates to the relationship between parental and progeny plants. It is discovered that plants which exhibit an expression profile that is more similar to the maternal plant than to the paternal plant may be more heterotic.
Accordingly, comparison of the maternal, paternal and progeny expression profiles can be used to monitor this relationship. In addition, multiple crosses to a single female type can be made (or the results predicted by comparison in a database) and the progeny screened (or predicted) for similarity to the female type.
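The maternal/paternal comparison can be sketched as a distance between profiles. The metric used here (mean absolute log2 ratio over shared products) and the function names are assumptions chosen for the example; the text does not specify how similarity is measured.

```python
from math import log2
from typing import Dict


def profile_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Mean absolute log2 ratio over products measured in both profiles."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("profiles share no expression products")
    return sum(abs(log2(a[k] / b[k])) for k in shared) / len(shared)


def closer_parent(maternal: Dict[str, float],
                  paternal: Dict[str, float],
                  progeny: Dict[str, float]) -> str:
    """Report which parental profile the progeny profile resembles more."""
    d_mat = profile_distance(progeny, maternal)
    d_pat = profile_distance(progeny, paternal)
    if d_mat < d_pat:
        return "maternal"
    if d_pat < d_mat:
        return "paternal"
    return "equal"


if __name__ == "__main__":
    # Hypothetical positive expression values for two products.
    maternal = {"g1": 8.0, "g2": 4.0}
    paternal = {"g1": 2.0, "g2": 16.0}
    progeny = {"g1": 8.0, "g2": 8.0}
    print(closer_parent(maternal, paternal, progeny))
```

Applied across several crosses to a single female type, the same comparison would flag progeny whose profiles stay closest to that female.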
As noted above, silencing was determined to play a significant role in the loss of heterosis due to inbreeding. Accordingly, by comparing parental and progeny plants it is possible to determine which genes are silenced. These genes can be rescued, e.g., by cloning the silenced genes and placing them under the control of heterologous promoters, or other strategies noted herein, and transducing the genes back into target plants (e.g., the parental lines, the hybrids, or any other plant). In addition, by compiling database information for which genes are silenced in inbreds, it is possible to decrease silencing in hybrids by selecting crosses where parents have complementary patterns. It is also possible to use these methods to increase the performance (e.g., grain yield, standability, etc.) of the inbred lines themselves.
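The silencing comparison can be illustrated with simple set operations. The sketch assumes, hypothetically, that a gene counts as silenced in an inbred when its product is detected in the hybrid but not in that inbred, and that "complementary patterns" means two inbreds share few silenced genes. These are simplifications for the example only.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def silenced_genes(inbred_expressed: Set[str], hybrid_expressed: Set[str]) -> Set[str]:
    """Genes detected in the hybrid but not in the inbred parent."""
    return set(hybrid_expressed) - set(inbred_expressed)


def rank_by_complementary_silencing(
    silenced: Dict[str, Set[str]]
) -> List[Tuple[int, str, str]]:
    """Rank inbred pairs by how few silenced genes they share (fewest first)."""
    pairs = []
    for a, b in combinations(sorted(silenced), 2):
        shared = silenced[a] & silenced[b]  # silenced in both, so not rescued by the cross
        pairs.append((len(shared), a, b))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical silenced-gene sets compiled for three inbred lines.
    silenced = {"I1": {"g1", "g2"}, "I2": {"g3"}, "I3": {"g2", "g3"}}
    print(rank_by_complementary_silencing(silenced))
```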
The first progeny plant selected by any of the methods herein, or a subsequent progeny plant, or a transgenic plant as described above can be subjected to any of the field tests appropriate for monitoring one or more desired traits. Thus, the first progeny plant, or a subsequent progeny plant thereof, can be tested for a desired phenotypic trait. The phenotypic trait can be compared between the first progeny plant, or a subsequent progeny plant, and a selected hybrid or inbred plant. The expression profile of the selected hybrid or inbred plant can be compared to an expression profile of the first progeny plant, or the subsequent progeny plant. Nucleic acids differentially expressed between the selected hybrid or inbred plant and the first progeny plant, or the subsequent progeny plant are identified as targets for cloning. Similarly, genes that are expressed in high yielding hybrids that are not expressed in low yielding hybrids can be determined by comparisons of the expression profiles for the high and low yielding hybrids. Nucleic acids from (or corresponding to) the differentially expressed genes are cloned for introduction into target nucleic acids. After identifying which expression products from the representative sample show an additive, dominant, underdominant, or overdominant expression pattern for at least a portion of the representative sample, an expression product, or a nucleic acid corresponding to the expression product, can be cloned. The cloned nucleic acid can then be transduced into target plants to test whether the nucleic acid encodes a useful trait, or to improve traits in the target plant.
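The comparison of high and low yielding hybrids can be sketched as follows. The example assumes each hybrid profile is reduced to the set of genes detected, and it uses a deliberately strict, hypothetical criterion: a gene is a cloning candidate if it is detected in every high yielding hybrid and in no low yielding hybrid.

```python
from typing import List, Set


def cloning_candidates(high: List[Set[str]], low: List[Set[str]]) -> Set[str]:
    """Genes expressed in all high-yield profiles and in no low-yield profile."""
    if not high:
        return set()
    in_all_high = set.intersection(*high)  # detected in every high yielding hybrid
    in_any_low = set().union(*low)         # detected in at least one low yielding hybrid
    return in_all_high - in_any_low


if __name__ == "__main__":
    # Hypothetical detected-gene sets.
    high_yield = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}]
    low_yield = [{"g2", "g5"}, {"g5"}]
    print(sorted(cloning_candidates(high_yield, low_yield)))
```

A less strict rule (for example, detected in most high yielding hybrids) may be more realistic for noisy data; that choice is left open here.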
Further details on expression profiling, cloning of nucleic acids, selection of hybrids, integrated systems, screening methods and the like are set forth below.
EXPRESSION PROFILING
As set forth below, a variety of tissues can be profiled, with immature tissues being preferentially profiled. Immature tissues are preferred because profiling them increases the rate at which crops can be screened, as a plant does not have to be grown to maturity.
However, essentially any tissue, or whole plant, can be profiled. A variety of profiling methods are available, including hybridization of expressed or amplified nucleic acids to a nucleic acid array, hybridization of expressed polypeptides to a protein array, hybridization of peptides or nucleic acids to an antibody array, subtractive hybridization, differential display and others.
CROPS TO BE PROFILED
The parental or progeny plants can be inbreds or hybrids. Most commonly, the progeny plant is a hybrid, produced by crossing two different inbred lines, or crossing an inbred line and a hybrid line, or crossing two hybrid lines (which are the result of crossing inbred or hybrid lines), or crossing of more than two lines (e.g., to generate polyploid or recombinant plants) in a single cross. Once a desirable heterotic hybrid is identified, it can be treated as such hybrids typically are in breeding schemes, e.g., it can be produced in quantity as seed; it can be top crossed to inbred lines to produce a 3-way hybrid plant; it can be selfed to produce more inbred lines, or the like.
Most, if not all, plants and animals show hybrid vigor. Much of the discussion herein relates to commercially valuable crops, as these are an important target of the methods of the invention. However, the methods are general and can be applied to non-commercial crop plants, fungi, and to the production of animals, including poultry, cattle, sheep, pigs, and the like.
Important commercial crops include both monocots and dicots. Monocots include plants in the grass family (Gramineae), such as plants in the sub families Festucoideae and Poacoideae, which together include several hundred genera including plants in the genera Agrostis, Phleum, Dactylis, Sorghum, Setaria, Zea (e.g., corn), Oryza (e.g., rice), Triticum (e.g., wheat), Secale (e.g., rye), Avena (e.g., oats), Hordeum (e.g., barley), Saccharum, Poa, Festuca, Stenotaphrum, Cynodon, Coix, the Olyreae, Phareae and many others.
Plants in the family Gramineae are particularly preferred target plants for the methods of the invention.
Additional preferred targets include other commercially important crops, e.g., from the families Compositae (the largest family of vascular plants, including at least 1,000 genera, including important commercial crops such as sunflower), and Leguminosae or "pea family," which includes several hundred genera, including many commercially valuable crops such as pea, beans, lentil, peanut, yam bean, cowpeas, velvet beans, soybean, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, and sweetpea. Common crops applicable to the methods of the invention include Zea mays, rice, soybean, sorghum, wheat, oats, barley, millet, sunflower, and canola.
TISSUES TO BE PROFILED
As noted above, one advantage of the present invention is that the methods can be performed without the necessity of field testing progeny (field testing can, of course, be used as a part of, or an adjunct to the other methods herein). An extension of this advantage is that immature tissues can be profiled from a test plant, which speeds the testing process.
Thus, although expression profiles can be performed from any tissue or whole organism, in one preferred embodiment, the representative samples are from immature tissues or immature plants. For example, an immature ear of the plant, or a whole seedling plant (or any tissue thereof), can be profiled. It will be appreciated that when comparisons are performed, they are typically performed between expression profiles obtained from the same tissue and developmental stage (and environmental conditions) for the plants which are compared.
RNA PROFILING
In one preferred embodiment, the expression products which are detected in the methods of the invention are RNAs, e.g., mRNAs expressed from genes within a cell of the plant or tissue profiled.
A number of techniques are available for detecting RNAs. For example, northern blot hybridization is widely used for RNA detection, and is generally taught in a variety of standard texts on molecular biology, including: Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152 Academic Press, Inc., San Diego, CA (Berger); Sambrook et al., Molecular Cloning - A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, ("Sambrook") and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (supplemented through 1998) ("Ausubel").
Furthermore, one of skill will appreciate that essentially any RNA can be converted into a double stranded DNA using a reverse transcriptase enzyme and a polymerase. See, Ausubel, Sambrook and Berger, id. Thus, detection of mRNAs can be performed by converting, e.g., mRNAs into DNAs, which are subsequently detected in, e.g., a standard "Southern blot" format.

Furthermore, DNAs can be amplified to aid in the detection of rare molecules by any of a number of well known techniques, including: the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques (e.g., NASBA). Examples of these techniques are found in Berger, Sambrook, and Ausubel, id., as well as in Mullis et al., (1987) U.S. Patent No. 4,683,202; PCR Protocols A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc. San Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3, 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86, 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem 35, 1826; Landegren et al., (1988) Science 241, 1077-1080; Van Brunt (1990) Biotechnology 8, 291-294; Wu and Wallace, (1989) Gene 4, 560; Barringer et al. (1990) Gene 89, 117, and Sooknanan and Malek (1995) Biotechnology 13: 563-564.
Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039. Improved methods of amplifying large nucleic acids by PCR are summarized in Cheng et al. (1994) Nature 369: 684-685 and the references therein, in which PCR amplicons of up to 40kb are generated. One of skill will appreciate that essentially any RNA can be converted into a double stranded DNA suitable for restriction digestion, PCR expansion and sequencing using reverse transcriptase and a polymerase. See, Ausubel, Sambrook and Berger, all supra.
These general methods can be used for expression profiling. For example, arrays of probes can be spotted onto a surface and expression products (or in vitro amplified nucleic acids corresponding to expression products) can be labeled and hybridized with the array. For convenience, it may be helpful to use several arrays simultaneously. It is expected that one of skill is familiar with nucleic acid hybridization. General methods of hybridization are found in Berger, Sambrook and Ausubel, supra, and further in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, e.g., part I chapter 2 "Overview of principles of hybridization and the strategy of nucleic acid probe assays," Elsevier, New York.
In one useful variation of these methods, solid phase arrays are adapted for the rapid and specific detection of multiple polymorphic nucleotides. Typically, a nucleic acid probe is chemically linked to a solid support and a target nucleic acid (e.g., an RNA or corresponding amplified DNA) is hybridized to the probe. Either the probe, or the target, or both, can be labeled, typically with a fluorophore. Where the target is labeled, hybridization is detected by detecting bound fluorescence. Where the probe is labeled, hybridization is typically detected by quenching of the label by the bound nucleic acid. Where both the probe and the target are labeled, detection of hybridization is typically performed by monitoring a signal shift such as a change in color, fluorescent quenching, or the like, resulting from proximity of the two bound labels.
In one embodiment of this concept, an array of probes is synthesized on a solid support. Using chip masking technologies and photoprotective chemistry, it is possible to generate ordered arrays of nucleic acid probes with large numbers of probes. These arrays, which are known, e.g., as "DNA chips," or as very large scale immobilized polymer arrays ("VLSIPS™" arrays) can include millions of defined probe regions on a substrate having an area of about 1 cm² to several cm². In addition to photomasking technologies, arrays of chemicals, nucleic acids, proteins or the like can also be printed on a solid substrate using printing technologies.
The construction and use of solid phase nucleic acid arrays to detect target nucleic acids is well described in the literature. See, Fodor, et al. Science 251:767 (1991); Sheldon, et al. Clin. Chem. 39(4):718 (1993); Kozal, et al. Nature Medicine 2(7):753 (1996) and Hubbell, U.S. Pat. No. 5,571,639. In brief, a combinatorial strategy allows for the synthesis of arrays containing a large number of probes using a minimal number of synthetic steps. For instance, it is possible to synthesize and attach all possible DNA 8-mer oligonucleotides (4^8, or 65,536 possible combinations) using only 32 chemical synthetic steps. In general, these procedures provide a method of producing 4^n different oligonucleotide probes on an array using only 4n synthetic steps.
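The arithmetic behind the combinatorial strategy can be checked with a short sketch. The function names below are illustrative only and do not come from the cited references; the sketch assumes one coupling step per base per probe position, which is how the count of 4n steps arises.

```python
from itertools import product

BASES = "ACGT"  # the four DNA nucleotides


def probe_count(n: int) -> int:
    """Number of distinct n-mer probes over four bases (4^n)."""
    return len(BASES) ** n


def synthesis_steps(n: int) -> int:
    """Coupling steps if each position needs one step per base (4n)."""
    return len(BASES) * n


def all_probes(n: int) -> list[str]:
    """Enumerate every n-mer; only practical for small n."""
    return ["".join(p) for p in product(BASES, repeat=n)]


if __name__ == "__main__":
    print(probe_count(8), synthesis_steps(8))
```

For n = 8 this reproduces the 65,536 probes and 32 steps stated above.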
Light-directed combinatorial synthesis of oligonucleotide arrays on a glass surface is performed with automated phosphoramidite chemistry and chip masking techniques similar to photo resist technologies in the computer chip industry. Typically, a glass surface is derivatized with a silane reagent containing a functional group, e.g., a hydroxyl (for nucleic acid arrays) or amine group (for peptide or peptide nucleic acid arrays) blocked by a photolabile protecting group. Photolysis through a photolithographic mask is used selectively to expose functional groups which are then ready to react with incoming 5'-photoprotected nucleoside phosphoramidites. The phosphoramidites react only with those sites which are illuminated (and thus exposed by removal of the photolabile blocking group).
Thus, the phosphoramidites only add to those areas selectively exposed from the preceding step. These steps are repeated until the desired array of sequences has been synthesized on the solid surface. Combinatorial synthesis of different oligonucleotide analogues at different locations on the array is determined by the pattern of illumination during synthesis and the order of addition of coupling reagents. Monitoring of hybridization of target nucleic acids to the array is typically performed with fluorescence microscopes or laser scanning microscopes.
In addition to being able to design, build and use probe arrays using available techniques, one of skill is also able to order custom-made arrays and array-reading devices from manufacturers specializing in array manufacture. For example, Affymetrix Corp. in Santa Clara, CA manufactures nucleic acid arrays.
It will be appreciated that probe design is influenced by the intended application. For example, where several allele-specific probe-target interactions are to be detected in a single assay, e.g., on a single nucleic acid chip, it is desirable to have similar melting temperatures for all of the probes. Accordingly, the lengths of the probes are adjusted so that the melting temperatures for all of the probes on the array are closely similar (it will be appreciated that different lengths for different probes may be needed to achieve a particular Tm where different probes have different GC contents). Although melting temperature is a primary consideration in probe design, other factors are also optionally used to further adjust probe construction, such as elimination of self-complementarity in the probe (which can inhibit hybridization of a target nucleotide). Techniques for designing and using sets of probes for screening many nucleic acids, such as expression products, simultaneously, and for monitoring expression on nucleic acid arrays are described in EP 0799 897 A1.
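As an illustration of why probes with different GC contents need different lengths to reach the same Tm, the following sketch uses the Wallace rule of thumb for short oligonucleotides (Tm ≈ 2 °C per A/T plus 4 °C per G/C). The rule is an assumption introduced here for illustration and is not prescribed by the text; the function names and example sequences are hypothetical.

```python
def wallace_tm(seq: str) -> int:
    """Approximate Tm in degrees C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def shortest_probe(template: str, target_tm: int, min_len: int = 8) -> str:
    """Return the shortest prefix of `template` whose estimated Tm reaches target_tm."""
    for length in range(min_len, len(template) + 1):
        candidate = template[:length]
        if wallace_tm(candidate) >= target_tm:
            return candidate
    raise ValueError("template too short to reach the target Tm")


if __name__ == "__main__":
    gc_rich = shortest_probe("GC" * 10, target_tm=48)
    at_rich = shortest_probe("AT" * 14, target_tm=48)
    print(len(gc_rich), len(at_rich))
```

Under this approximation the AT-rich probe must be about twice as long as the GC-rich probe to reach the same estimated Tm. A practical design would use a more accurate thermodynamic model and would also screen for self-complementarity.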
One way to compare expression products between two cell populations is to identify mRNA species which are differentially expressed between the cell populations (i.e., present at different abundances between the cell populations). In addition to the array techniques noted above, another preferred method is to use subtractive hybridization (Lee et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88:2825) or differential display employing arbitrary primer polymerase chain reaction (PCR) (Liang and Pardee (1992) Science 257:967). Each of these methods has been used by various investigators to identify differentially expressed mRNA species. See, Salesiotis et al. (1995) Cancer Lett. 91:47; Jiang et al. (1995) Oncogene 10:1855; Blok et al. (1995) Prostate 26:213; Shinoura et al. (1995) Cancer Lett. 89:215; Murphy et al. (1993) Cell Growth Differ. 4:715; Austruy et al. (1993) Cancer Res. 53:2888; Zhang et al. (1993) Mol. Carcinog. 8:123; and Liang et al. (1992) Cancer Res. 52:6966. The methods have also been used to identify mRNA species which are induced or repressed, e.g., by drugs or certain nutrients (Fisicaro et al. (1995) Mol. Immunol. 32:565; Chapman et al. (1995) Mol. Cell. Endocrinol. 108:108; Douglass et al. (1995) J. Neurosci. 15:2471; Aiello et al. (1994) Proc. Natl. Acad. Sci. (U.S.A.) 91:6231; Ace et al. (1994) Endocrinology 134:1305).
For the technique of differential display, Liang and Pardee (1992), supra provide theoretical calculations for the selection of 5' and 3' arbitrary primers. Correlation of observed results to the theory is also provided. In practice, 5' primers of less than about 9 nucleotides may not provide adequate specificity (slightly shorter primers of about 8 to 10 nucleotides have been used in PCR methods for analysis of DNA polymorphisms.
See also, Williams et al. (1991) Nucleic Acids Research 18: 6531). The primer(s) optionally comprise 5'-terminal sequences which serve to anchor other PCR primers (distal primers) and/or which comprise a restriction site or half site or other ligatable end. Where a restriction site or amplification template for a second primer is incorporated, the primers are optionally longer than those described above by the length of the restriction site, or amplification template site.
Standard restriction enzyme sites include 4 base sites, 5 base sites, 6 base sites, 7 base sites, and 8 base sites. An amplification template site for a second primer can be of essentially any length, for example, the site can be about 15-25 nucleotides in length.
The amplified products are optionally labeled and are typically resolved by electrophoresis on a polyacrylamide gel; the location(s) where label is present are excised and the labeled product species is/are recovered from the gel portion, typically by elution. The resultant recovered product species can be subcloned into a replicable vector with or without attachment of linkers, amplified further, and/or detected, or even sequenced directly.
Sequencing methods are described in Berger, Sambrook and Ausubel, supra.
Direct sequencing of PCR generated amplicons by selectively incorporating boronated nuclease resistant nucleotides into the amplicons during PCR and digestion of the amplicons with a nuclease to produce sized template fragments has also been proposed (Porter et al. (1997) Nucleic Acids Research 25(8):1611).
It is expected that one of skill can use, e.g., differential display for expression profiling. In addition, companies such as CuraGen Corp. (New Haven CT) provide robust expression profiling based upon modified differential display techniques. See, e.g., WO 97/15690 by Rothberg et al. Accordingly, one of skill can have expression profiling performed by companies which specialize in such techniques.
PROTEIN PROFILING
In addition to profiling RNAs (or corresponding cDNAs) as described above, it is also possible to profile proteins. In particular, various strategies are available for detecting many proteins simultaneously. As applied to the present invention, detected proteins, corresponding to expression products, can be derived from one of at least two sources. First, the proteins which are detected can be directly isolated from a cell or tissue to be profiled, providing direct detection (and, optionally, quantification) of proteins present in a cell. Second, mRNAs can be reverse transcribed into cDNA sequences, cloned and expressed. This increases the ability to detect rare RNAs, and makes it possible to immediately associate a detected protein with its coding sequence. For purposes of the present invention, it is not necessary even to express nucleic acids in the proper reading frame, as it is typically the presence or absence of an expression product that is, initially, at issue. Even an out of frame peptide is an indicator for the presence of a corresponding RNA.
A variety of hybridization techniques, including western blotting, ELISA assays, and the like are available for detection of specific proteins. See, Ausubel, Sambrook and Berger, supra. See also, Antibodies: A Laboratory Manual, (1988) E. Harlow and D. Lane, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY. Non-hybridization based techniques such as two-dimensional electrophoresis can also be used to simultaneously and specifically detect large numbers of proteins.
One typical technology for detecting specific proteins involves making antibodies to the proteins. By specifically detecting binding of an antibody and a given protein, the presence of the protein can be detected. In addition to available antibodies, one of skill can easily make antibodies using existing techniques, or modify those antibodies which are commercially or publicly available. In addition to the art referenced above, general methods of producing polyclonal and monoclonal antibodies are known to those of skill in the art. See, e.g., Paul (ed) (1998) Fundamental Immunology, Fourth Edition Raven Press, Ltd., New York; Coligan (1991) Current Protocols in Immunology Wiley/Greene, NY; Harlow and Lane (1989) Antibodies: A Laboratory Manual Cold Spring Harbor Press, NY; Stites et al. (eds.) Basic and Clinical Immunology (4th ed.) Lange Medical Publications, Los Altos, CA, and references cited therein; Goding (1986) Monoclonal Antibodies: Principles and Practice (2d ed.) Academic Press, New York, NY; and Kohler and Milstein (1975) Nature 256:495-497. Other suitable techniques for antibody preparation include selection of libraries of recombinant antibodies in phage or similar vectors. See, Huse et al. (1989) Science 246:1275-1281; and Ward et al. (1989) Nature 341:544-546. Specific monoclonal and polyclonal antibodies and antisera will usually bind with a KD of at least about 0.1 µM, preferably at least about 0.01 µM or better, and most typically and preferably, 0.001 µM or better.
As used herein, an "antibody" refers to a protein consisting of one or more polypeptides substantially or partially encoded by immunoglobulin genes or fragments of immunoglobulin genes. The recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as myriad immunoglobulin variable region genes. Light chains are classified as either kappa or lambda.
Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively. A typical immunoglobulin (antibody) structural unit is known to comprise a tetramer.
Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one "light" (about 25 kD) and one "heavy" chain (about 50-70 kD). The N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition. The terms variable light chain (VL) and variable heavy chain (VH) refer to these light and heavy chains respectively. Antibodies exist as intact immunoglobulins or as a number of well characterized fragments produced by digestion with various peptidases.
Thus, for example, pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)'2, a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond. The F(ab)'2 may be reduced under mild conditions to break the disulfide linkage in the hinge region thereby converting the (Fab')2 dimer into an Fab' monomer. The Fab' monomer is essentially an Fab with part of the hinge region (see, Fundamental Immunology, W.E. Paul, ed., Raven Press, N.Y. (1993), for a more detailed description of other antibody fragments).
While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill will appreciate that such Fab' fragments may be synthesized de novo either chemically or by utilizing recombinant DNA methodology. Thus, the term antibody, as used herein also includes antibody fragments either produced by the modification of whole antibodies or synthesized de novo using recombinant DNA methodologies.
Antibodies include single chain antibodies, including single chain Fv (sFv) antibodies in which a variable heavy and a variable light chain are joined together (directly or through a peptide linker) to form a continuous polypeptide.
For purposes of the present invention, antibodies or antibody fragments can be arrayed, e.g., by coupling to an amine moiety fixed to a solid phase array, in a manner similar to that described above for construction of nucleic acid arrays. As above for nucleic acid probes, the antibodies can be labeled, or proteins corresponding to expression products can be labeled. In this manner, it is possible to couple hundreds, or even thousands, of different antibodies to an array.
In one embodiment, a bacteriophage antibody display library is screened with a polypeptide encoded by a cell, or obtained by expression of mRNAs, differential display, subtractive hybridization or the like. Combinatorial libraries of antibodies have been generated in bacteriophage lambda expression systems which are screened as bacteriophage plaques or as colonies of lysogens (Huse et al. (1989) Science 246:1275; Caton and Koprowski (1990) Proc. Natl. Acad. Sci. (U.S.A.) 87:6450; Mullinax et al. (1990) Proc. Natl. Acad. Sci. (U.S.A.) 87:8095; Persson et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88:2432).
Various embodiments of bacteriophage antibody display libraries and lambda phage expression libraries have been described (Kang et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88:4363; Clackson et al. (1991) Nature 352:624; McCafferty et al. (1990) Nature 348:552; Burton et al. (1991) Proc. Natl. Acad. Sci. (U.S.A.) 88:10134; Hoogenboom et al. (1991) Nucleic Acids Res. 19:4133; Chang et al. (1991) J. Immunol. 147:3610; Breitling et al. (1991) Gene 104:147; Marks et al. (1991) J. Mol. Biol. 222:581; Barbas et al. (1992) Proc. Natl. Acad. Sci. (U.S.A.) 89:4457; Hawkins and Winter (1992) J. Immunol. 22:867; Marks et al. (1992) Biotechnology 10:779; Marks et al. (1992) J. Biol. Chem. 267:16007; Lowman et al. (1991) Biochemistry 30:10832; Lerner et al. (1992) Science 258:1313).
The patterns of hybridization which are detected provide an indication of the presence or absence of protein sequences. As long as the library or array against which a population of proteins is to be screened can be correlated from one experiment to the next (e.g., by noting the x-y coordinates of the library or array member), no sequence information is required to compare expression profiles from one representative sample to another. In particular, the mere presence or absence (or degree) of label provides the ability to determine differences. One advantage of using libraries of antibodies for protein detection is that the individual libraries can be uncharacterized. As long as library members have a set spatial relationship, e.g., gridded on a plate, duplicate plates can be made and label patterns to the set spatial relationship determined.
More generally, peptide and nucleic acid hybridization to arrays or libraries (or even simple two dimensional gels) can be treated in a manner analogous to a bar code label.
Any diverse library or array can be used to screen for the presence or absence of complementary molecules, whether RNA, DNA, protein, or a combination thereof.
By measuring corresponding signal information between different sources of test material (e.g., different hybrid or inbred plants, or different tissues, or the like), it is possible to determine differences in expression products for the different source materials. This process is facilitated by various high throughput integrated systems set forth below.
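The bar-code analogy can be sketched as follows. This is a minimal illustration, assuming each profile is a mapping from an array position (x, y) to a measured label intensity and that a single fixed threshold separates presence from absence; the names `barcode` and `compare_profiles` and the threshold value are hypothetical.

```python
Profile = dict[tuple[int, int], float]  # (x, y) array position -> label intensity


def barcode(profile: Profile, threshold: float) -> frozenset[tuple[int, int]]:
    """Positions whose signal meets the threshold, i.e. the 'bars' that are present."""
    return frozenset(pos for pos, signal in profile.items() if signal >= threshold)


def compare_profiles(a: Profile, b: Profile, threshold: float = 1.0) -> dict[str, frozenset]:
    """Compare two samples by position only; no sequence information is used."""
    bars_a, bars_b = barcode(a, threshold), barcode(b, threshold)
    return {
        "only_a": bars_a - bars_b,
        "only_b": bars_b - bars_a,
        "shared": bars_a & bars_b,
    }


if __name__ == "__main__":
    inbred = {(0, 0): 5.2, (0, 1): 0.1, (1, 0): 3.3}
    hybrid = {(0, 0): 4.8, (0, 1): 2.7, (1, 0): 0.2}
    print(compare_profiles(inbred, hybrid))
```

Because the comparison is keyed on position alone, it works equally for uncharacterized antibody libraries, nucleic acid arrays, or gel spot coordinates, provided the spatial layout is reproducible between experiments.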
In addition to array based approaches, mass spectrometry is in use for identification of large sets of proteins in samples, and is suitable for identification of many proteins in a sequential or parallel fashion. For example, Hutchens et al. U.S. Pat. 5,719,060, describe methods and apparatus for desorption and ionization of analytes for subsequent analysis by mass spectroscopy and/or biosensors. Sample presenting means with probe elements with "Surfaces Enhanced for Laser Desorption/Ionization" (SELDI) described in the '060 patent is particularly useful in the context of the present invention; however, other approaches described in the '060 patent are also generally applicable to the present invention.
Two and three dimensional gel based approaches can also be used for the specific and simultaneous identification and quantification of large numbers of proteins from biological samples. Multi-dimensional gel technology is well-known and described e.g., in Ausubel, supra, Volume 2, Chapter 10. Image analysis of multi-dimensional protein separation gels provides an indication of the proteins that are expressed e.g., in a cell or tissue type. It is worth noting that identification of particular proteins is not necessary; instead, positional and pattern information e.g., of protein staining or fluorescing patterns is sufficient to identify sets of protein expression products.
In addition to identifying expression products, such as proteins or RNA, it is also possible to screen for large numbers of metabolites in cell or tissue samples. The presence, absence or level of a metabolite can be treated as a character for comparison purposes in the same way that nucleic acids or proteins are discussed herein.
Metabolites can be monitored by any currently available method, including chromatography, uni- or multi-dimensional gel separations, hybridization to complementary molecules, or the like.
The invention provides methods of identifying plant crosses with an increase in probability for heterosis in progeny plants. For example, in a preferred method, the expression profiles for a plurality of plants are compared, and the expression profiles are considered by pair-wise comparison. Desirable crosses produce progeny with a selected or optimal number of expression products, or progeny with a selected number or type of expression products that display a dominant, additive, over-dominant or under-dominant expression pattern. Desirably, these comparisons are performed in an integrated system which includes a computer.
The generation and use of databases of expression profile information for performing a variety of comparisons is a feature of the invention. Because of the large number of comparisons between expression profiles (which, as noted above, comprise e.g., detection information from about 1,000 to about 20,000 or more expression products), the most practical way of performing the comparisons is by entering the information into one or more databases and using a computer to make the comparisons.
A variety of comparative methods can be performed in an integrated system, e.g., to determine the heterosis (or likely heterosis) of a cross. For example, one simple measure that can be compared across different actual or potential crosses to determine the desirability of a particular cross is to determine the sum of the expressed gene products that differ from a progeny plant in each of a first and second parental plant and the number of expressed gene products that differ between the first and second parental plant. The larger this sum, typically, the more desirable the cross.
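One way to compute this simple measure is sketched below. It assumes, for illustration only, that each expression profile is reduced to the set of expression products detected as present, and that "differ" means present in one plant but not the other (a symmetric difference). The function name and gene labels are hypothetical.

```python
def cross_score(progeny: set[str], parent1: set[str], parent2: set[str]) -> int:
    """Sum of products differing between progeny and each parent,
    plus products differing between the two parents."""
    progeny_vs_p1 = len(progeny ^ parent1)  # present in exactly one of the two
    progeny_vs_p2 = len(progeny ^ parent2)
    p1_vs_p2 = len(parent1 ^ parent2)
    return progeny_vs_p1 + progeny_vs_p2 + p1_vs_p2


if __name__ == "__main__":
    parent1 = {"g1", "g2"}
    parent2 = {"g2", "g3"}
    hybrid = {"g1", "g2", "g3", "g4"}
    print(cross_score(hybrid, parent1, parent2))
```

A quantitative variant could count products whose abundance differs by more than a chosen fold change rather than using presence or absence.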
In the integrated systems herein, it is also possible to predict the likely outcomes of crosses between parental plants. In these methods, matrices of possible expression profile combinations for plants are generated. For example, the expression profiles are compiled in a database in a computer and a matrix of possible pair-wise expression profile combinations for the plants is generated and queried using an integrated system comprising a computer with software for generating and comparing matrices. Subsets of potential crosses from all of the possible pair-wise comparisons which exhibit a maximal number of expression profile differences represent one preferred cross. Useful software aids in determining how many genes are expressed, or whether expressed genes are additive, dominant, over-dominant or under-dominant.
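A pair-wise matrix of this kind might be generated and queried as in the following sketch. It assumes presence/absence profiles stored as sets keyed by plant name; `difference_matrix` and `top_crosses` are hypothetical helper names, and ties are broken alphabetically purely so that the output is deterministic.

```python
from itertools import combinations


def difference_matrix(profiles: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Number of expression products differing for every unordered pair of plants."""
    return {
        (a, b): len(profiles[a] ^ profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


def top_crosses(profiles: dict[str, set[str]], k: int = 1) -> list[tuple[str, str]]:
    """The k candidate crosses with the most profile differences."""
    matrix = difference_matrix(profiles)
    return sorted(matrix, key=lambda pair: (-matrix[pair], pair))[:k]


if __name__ == "__main__":
    database = {
        "P1": {"g1", "g2"},
        "P2": {"g2", "g3"},
        "P3": {"g4", "g5", "g6"},
    }
    print(difference_matrix(database))
    print(top_crosses(database, k=2))
```

For n plants the matrix holds n(n-1)/2 entries, which is why the comparisons are most practically done by computer once the database holds more than a handful of profiles.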
Which plants to select as possible crosses is up to the discretion of the user. It is possible simply to test all possible first order crosses in a database.
However, it is not possible to test all possible subsequent crosses, as the set size for such a procedure is theoretically infinite. That is, after generating a progeny matrix of expression products for all possible pair-wise parental crosses, the progeny matrix can be used to generate a possible theoretical set of crosses between the hypothesized progeny represented by the progeny matrix and/or the original database of parental expression profiles. A resulting expression profile matrix can be generated for hypothesized subsequent progeny, which can again be compared to any of the preceding expression profile information. In theory, this process can be repeated ad infinitum.
More practically, certain rules can be implemented to reduce the total amount of calculations to be performed. For example, matrix information can be limited to possible pair-wise crosses for plants from different heterotic groups, or from the same heterotic group.
In addition, the fidelity of predicted expression profile information increasingly varies as subsequent cross information is considered, and of course, the number of possible crosses increases. Accordingly, typically only one or a few rounds of potential crosses are considered at one time. In any case, selection of a subset of potential crosses from all of the possible pair-wise comparisons which exhibit a maximal number of expression profile differences is desirable.

A variety of rules for performing the basic comparisons can be used. In one desirable implementation, crosses are identified in which the sum of: (i) expression products produced in a first plant from a first heterotic group (A_i) which are not expressed in a second plant from the first heterotic group (A) to which the first plant is crossed (A_j), and which are not expressed in a selected third plant from a second heterotic group (B), plus (ii) the expression products produced in A_j which are not produced in A_i and which are not produced in B, is optimized. This optimization results in crosses which achieve elevated numbers of expression products expressed in heterotic hybrid progeny, and also in an optimization of the number of dominant products expressed.
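The sum described in this rule can be sketched as below. The sketch assumes presence/absence profiles held as sets and treats "optimized" as "maximized", which is one possible reading; the variable names a_i, a_j and b stand for the two plants of the first heterotic group and the tester from the second group, and the function names are hypothetical.

```python
from itertools import combinations


def unique_product_sum(a_i: set[str], a_j: set[str], b: set[str]) -> int:
    """(i) products in a_i absent from a_j and b, plus (ii) products in a_j absent from a_i and b."""
    return len(a_i - a_j - b) + len(a_j - a_i - b)


def best_within_group_cross(group_a: dict[str, set[str]], b: set[str]) -> tuple[tuple[str, str], int]:
    """Pair from group A with the largest sum, with ties broken alphabetically."""
    scores = {
        (x, y): unique_product_sum(group_a[x], group_a[y], b)
        for x, y in combinations(sorted(group_a), 2)
    }
    best = min(scores, key=lambda pair: (-scores[pair], pair))
    return best, scores[best]


if __name__ == "__main__":
    group_a = {"A1": {"g1", "g2", "g3"}, "A2": {"g2", "g4"}, "A3": {"g2", "g3"}}
    tester_b = {"g3"}
    print(best_within_group_cross(group_a, tester_b))
```

Excluding products already present in B keeps the score focused on products that the within-group cross would contribute beyond what the tester from the second heterotic group already supplies.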
In another optimization protocol, optimization is achieved by determining all possible pair-wise combinations from the first heterotic group and identifying the cross which results in the largest sum of expression products, or by determining all possible pair-wise combinations from the first heterotic group and identifying crosses which result in a hybrid progeny (A_i x A_j) with a maximal number of differences as compared to B, or by determining all possible pair-wise combinations from the first heterotic group and identifying crosses which result in the hybrid progeny (A_i x A_j) having a greater number of differences with B than the number of differences between B and A_i or B and A_j. As above, this optimization results in crosses which achieve elevated numbers of expression products expressed in heterotic hybrid progeny, and also in an optimization of the number of dominant products expressed.
Such implementations can also be used to improve selection methods per se.
For example, in one method, self or back-crossed progeny derived from the A_i x A_j hybrid are selected which either retain a set of expression products defined by the sum of expression products expressed in A_i (but not A_j or B) and A_j (but not A_i or B), or which show a larger number of expression products expressed in a topcross with B than does either A_i or A_j when topcrossed with B.
One approach for comparing profiles is a nested analysis in which expression profiles are successively grouped together, and the many gene expression differences seen in individual pair-wise comparisons can be ranked hierarchically in a filtering process. This method is useful for identifying genes expressed in one set of genotypes vs. another, e.g., hybrids vs. inbreds or bulked segregants from the two ends of a quantitative phenotypic distribution.
In any case, the methods of the invention can include inputting an expression profile for progeny or parental plants into a database of expression profiles.
This can be performed manually, but is more typically performed in an automated system.
Computer databases of expression profile information can be quite large, with from a few up to several thousand profiles in the database. Typically, the database will have expression product profiles of a representative sample of expression products for hybrid progeny plants resulting from at least 10 separate inbred plant crosses, or at least 10 inbred plant expression product profiles.
The phrase "computer system" or "integrated system" in the context of this invention refers to a system in which data entering a computer corresponds to physical objects or processes external to the computer, e.g., nucleic acid hybridization or protein binding data and a process that, within a computer, causes a physical transformation of the input signals to different output signals. In other words, the input data, e.g., hybridization of expression products on a specific array, is transformed to output data, e.g., the identification or counting of the sequence hybridized, comparison to similar arrays with different test materials, counting and categorization of expression products or the like. The process within the computer is a program by which positive (or negative) hybridization signals are recognized by the computer system and attributed to a region of an array, or other expression profile format (e.g., simple counting of array signals). The program then determines which region of the array the hybridized expression products are located on and, optionally, the specific corresponding sequences which the probe is based on (as noted above, no sequence information is required for making or assessing expression profiles).
The invention provides integrated systems for plant or plant cell manipulation and hybridization analysis. Typical systems include a digital computer with high-throughput liquid control software, image analysis software, and data interpretation software. A robotic liquid control armature for transferring solutions (e.g., plant cell extracts) from a source to a destination, is typically operably linked to the digital computer. An input device for entering data to the digital computer to control high throughput liquid transfer by the robotic liquid control armature and, optionally, to control transfer by the pinning armature to the solid support is commonly a feature of the integrated system, as is an image scanner for digitizing label signals from labeled probe hybridized to the DNA on the solid support operably linked to the digital computer. The image scanner interfaces with the image analysis software to provide a measurement of probe label intensity, where the probe label intensity measurement is interpreted by the data interpretation software to show whether, and to what degree, the labeled probe hybridizes to a label.
A number of well known robotic systems have also been developed for solution phase chemistries. These systems include automated workstations like the automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, Hopkinton, Mass.; Orca, Hewlett-Packard, Palo Alto, Calif.) which mimic the manual synthetic operations performed by a scientist. Any of the above devices is suitable for use with the present invention. The nature and implementation of modifications to these devices (if any) so that they can operate as discussed herein with reference to the integrated system will be apparent to persons skilled in the relevant art.
High throughput screening systems are commercially available (see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries, Mentor, OH; Beckman Instruments, Inc. Fullerton, CA; Precision Systems, Inc., Natick, MA, etc.). These systems typically automate entire procedures including all sample and reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate in detector(s) appropriate for the assay. These configurable systems provide high throughput and rapid start up as well as a high degree of flexibility and customization. For example, the currently available commercial software package, BioWorks™ 1.4, provided by Beckman Instruments, Inc. to control and operate their Biomek™ 2000 robotics liquid handler supports a scripting capability based on the publicly available Tool Command Language (TCL). Beckman has incorporated a TCL interpreter into the Biomek™ 2000 and has included TCL extensions (Bioscript™) to allow direct motor control and other instrument functionality. A 16-bit (to run under Microsoft Windows 3.1™ and Microsoft Windows 95™) application to generate the TCL Bioscript code can be created, e.g., in Microsoft Visual Basic 4.0™.
The manufacturers of such systems provide detailed protocols for the various high-throughput systems. Thus, for example, Zymark Corp. provides technical bulletins describing screening systems for detecting the modulation of gene transcription, ligand binding, and the like. More recently, microfluidic approaches to reagent manipulation have been developed, e.g., by Caliper Technologies (Palo Alto, CA).
Optical images viewed (and, optionally, recorded) by a camera or other recording device (e.g., a photodiode and data storage device) are optionally further processed in any of the embodiments herein, e.g., by digitizing the image and/or storing and analyzing the image on a computer. A variety of commercially available peripheral equipment and software is available for digitizing, storing and analyzing a digitized video or digitized optical image, e.g., using PC (Intel x86 or Pentium chip-compatible DOS™, OS/2™, WINDOWS™, WINDOWS NT™ or WINDOWS 95™ based machines), MACINTOSH™, or UNIX based (e.g., SUN™ work station) computers.
One conventional system carries light from the specimen field to a cooled charge-coupled device (CCD) camera, in common use in the art. A CCD camera includes an array of picture elements (pixels). The light from the specimen is imaged on the CCD.
Particular pixels corresponding to regions of the specimen (e.g., individual hybridization sites on an array of biological polymers) are sampled to obtain light intensity readings for each position. Multiple pixels are processed in parallel to increase speed. The apparatus and methods of the invention are easily used for viewing any sample, e.g., by fluorescent or dark field microscopic techniques.
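By way of illustration only, the sampling of pixels at individual hybridization sites can be sketched as follows. This is a minimal sketch, not the image analysis software referred to herein: the function name `spot_intensities`, the rectangular spot regions, the use of a simple mean, and the pixel values are all assumptions introduced for the example.

```python
from statistics import mean


def spot_intensities(image, spots):
    """Return the mean pixel intensity for each named spot.

    image: list of rows, each row a list of pixel intensities (hypothetical data).
    spots: dict mapping a spot name to (row0, col0, row1, col1), a half-open
           rectangle of pixels assumed to cover one hybridization site.
    """
    readings = {}
    for name, (r0, c0, r1, c1) in spots.items():
        # Collect every pixel that falls inside the rectangle for this site.
        pixels = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        readings[name] = mean(pixels)
    return readings


# Hypothetical 4x4 digitized image with three 2x2 hybridization sites.
image = [
    [10, 12, 200, 210],
    [11, 13, 205, 215],
    [9, 10, 50, 52],
    [8, 11, 48, 50],
]
spots = {
    "site_A": (0, 0, 2, 2),
    "site_B": (0, 2, 2, 4),
    "site_C": (2, 2, 4, 4),
}
readings = spot_intensities(image, spots)
```

The resulting dictionary holds one intensity reading per position, which is the form of measurement that downstream interpretation software would consume.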
Integrated systems for hybridization analysis of the present invention typically include a digital computer with high-throughput liquid control software, image analysis software, data interpretation software, a robotic liquid control armature for transferring solutions from a source to a destination operably linked to the digital computer, an input device (e.g., a computer keyboard) for entering data to the digital computer to control high throughput liquid transfer by the robotic liquid control armature and, optionally, an image scanner for digitizing label signals from labeled probe hybridized to expression products, e.g., on a solid support operably linked to the digital computer. The image scanner interfaces with the image analysis software to provide a measurement of probe label intensity.
Typically, the probe label intensity measurement is interpreted by the data interpretation software to show whether the labeled probe hybridizes to the DNA on the solid support.

Software to support sample processing can be divided into 4 functional categories: 1) liquid transfer control software, 2) image analysis software, 3) data management software, and 4) data interpretation software.
Conveniently, applications can share information through data files which the applications can read and create. For flexibility and ease of use, files can be formatted as simple text files and/or in Microsoft Excel™ or other worksheet format. This allows viewing and editing of the files through the use of commercially available software such as Microsoft Excel™. Those of skill in the art will recognize that this approach is only one possible set of systems that could be used in the support and facilitation of the process of the present invention. Other systems can easily be designed to fit the particular needs of the user in the practice of the invention. By way of example, and not limitation, a Microsoft Windows™ user interface can be developed for most applications using Microsoft Visual Basic 4.0™. Most applications can be developed for a 32-bit environment to run under Microsoft Windows 95™ or 98™. 16-bit applications such as image analysis software developed by Optimas Corporation, Optimas 5.0, can also be useful components of the integrated system.
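As one possible illustration of applications sharing information through simple text files, the following sketch writes probe intensity readings to a tab-delimited file and reads them back. The column names (`sample_id`, `site`, `intensity`), the helper names, and the example values are hypothetical and are not taken from any software named herein; an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical column layout shared by the cooperating applications.
FIELDS = ["sample_id", "site", "intensity"]


def write_readings(handle, rows):
    """Write readings as tab-delimited text that a worksheet program can open."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)


def read_readings(handle):
    """Read the tab-delimited text back, converting intensity to a number."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "sample_id": row["sample_id"],
            "site": row["site"],
            "intensity": float(row["intensity"]),
        }
        for row in reader
    ]


# The image analysis step would write the file; the interpretation step reads it.
buffer = io.StringIO(newline="")
write_readings(buffer, [
    {"sample_id": "hybrid_1", "site": "A1", "intensity": 207.5},
    {"sample_id": "hybrid_1", "site": "A2", "intensity": 11.5},
])
buffer.seek(0)
loaded = read_readings(buffer)
```

Because the exchange format is plain text, the same file can also be opened and edited in a worksheet program.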
CLONING OF EXPRESSION PRODUCTS
Any nucleic acid encoding an expression product identified as being of interest by the expression profiling techniques noted herein, including dominant, additive and over- or under-dominant expression products, can be cloned. It is expected that many such nucleic acids, particularly dominant and additive nucleic acids, will be encoded by loci responsible for desirable quantitative traits ("QTL"; see, Edwards, et al., (1987) Genetics 115:113). QTL
include genes that control, to some degree, numerically quantifiable phenotypic traits such as disease resistance, crop yield, resistance to environmental extremes, etc. In addition to the methods herein, other experimental paradigms can be used to identify, analyze and select for QTL. One paradigm involves crossing two inbred lines and genotyping multiple marker loci and evaluating one to several quantitative phenotypic traits among the progeny of the cross.
QTL are then identified and ultimately selected for based on significant statistical associations between the genotypic values determined by genetic marker technology and the phenotypic variability among the segregating progeny.
As applied to the present invention, particular nucleic acids identified as encoding dominant, additive, or under- or over-dominant expression products, or as encoding silenced expression products, are potential products of QTLs or other genes or loci of interest. Accordingly, it is desirable to clone nucleic acids which are genetically linked to DNAs encoding these expression products for transduction into cells (e.g., coding sequences for expression products, or genetically linked coding or non-coding sequences), especially to make transgenic plants. The cloned sequences are also useful as molecular tags for selected plant strains, e.g., to identify parentage, and are further useful for encoding expression products, including nucleic acids and polypeptides. Often, expression products which are differentially expressed between heterotic and non-heterotic plants are encoded by QTL and are responsible for the phenotypic effects of the QTL.
A DNA linked to a locus encoding an expression product is introduced into plant cells, either in culture or in organs of a plant, e.g., leaves, stems, fruit, seed, etc. The expression of natural or synthetic nucleic acids encoded by nucleic acids linked to expression product coding nucleic acids can be achieved by operably linking a cloned nucleic acid of interest, such as an expression product or a genetically linked nucleic acid, to a promoter, incorporating the construct into an expression vector and introducing the vector into a suitable host cell. Alternatively, an endogenous promoter linked to the nucleic acids can be used.
Cloning of Expression Product Sequences into Bacterial Hosts
There are several well-known methods of introducing expression product nucleic acids into bacterial cells, any of which may be used in the present invention. These include: fusion of the recipient cells with bacterial protoplasts containing the DNA, electroporation, projectile bombardment, and infection with viral vectors, etc. Bacterial cells are often used to amplify the number of plasmids containing DNA constructs of this invention. The bacteria are grown to log phase and the plasmids within the bacteria can be isolated by a variety of methods known in the art (see, for instance, Sambrook). In addition, a plethora of kits are commercially available for the purification of plasmids from bacteria. For their proper use, follow the manufacturer's instructions (see, for example, EasyPrep™ and FlexiPrep™, both from Pharmacia Biotech; StrataClean™, from Stratagene; and QIAexpress Expression System™ from Qiagen). The isolated and purified plasmids are then further manipulated to produce other plasmids, used to transfect plant cells or incorporated into Agrobacterium tumefaciens related vectors to infect plants. Typical vectors contain transcription and translation terminators, transcription and translation initiation sequences, and promoters useful for regulation of the expression of the particular nucleic acid. The vectors optionally comprise generic expression cassettes containing at least one independent terminator sequence, sequences permitting replication of the cassette in eukaryotes, or prokaryotes, or both (e.g., shuttle vectors), and selection markers for both prokaryotic and eukaryotic systems. Vectors are suitable for replication and integration in prokaryotes, eukaryotes, or preferably both. See, Giliman & Smith, Gene 8:81 (1979); Roberts, et al., Nature, 328:731 (1987); Schneider, B., et al., Protein Expr. Purif. 6435:10 (1995); Berger, Sambrook, Ausubel (all supra). A catalogue of Bacteria and Bacteriophages useful for cloning is provided, e.g., by the ATCC, e.g., The ATCC Catalogue of Bacteria and Bacteriophage (1992) Gherna et al. (eds) published by the ATCC. Additional basic procedures for sequencing, cloning and other aspects of molecular biology and underlying theoretical considerations are also found in Watson et al. (1992) Recombinant DNA Second Edition Scientific American Books, NY.
Transfecting and Manipulating Plant Cells
Methods of transducing plant cells with nucleic acids are generally available. In addition to Berger, Ausubel and Sambrook, useful general references for plant cell cloning, culture and regeneration include Payne et al. (1992) Plant Cell and Tissue Culture in Liquid Systems John Wiley & Sons, Inc. New York, NY (Payne); and Gamborg and Phillips (eds) (1995) Plant Cell, Tissue and Organ Culture; Fundamental Methods Springer Lab Manual, Springer-Verlag (Berlin Heidelberg New York) (Gamborg). A variety of cell culture media are described in Atlas and Parks (eds) The Handbook of Microbiological Media (1993) CRC Press, Boca Raton, FL (Atlas). Additional information for plant cell culture is found in available commercial literature such as the Life Science Research Cell Culture Catalogue (1998) from Sigma-Aldrich, Inc (St Louis, MO) (Sigma-LSRCCC) and, e.g., the Plant Culture Catalogue and supplement (1997) also from Sigma-Aldrich, Inc (St Louis, MO) (Sigma-PCCS).
The nucleic acid constructs of the invention are introduced into plant cells, either in culture or in the organs of a plant by a variety of conventional techniques. For example, the DNA construct can be introduced directly into the genomic DNA of the plant cell using techniques such as electroporation and microinjection of plant cell protoplasts, or the DNA constructs can be introduced directly to plant cells using ballistic methods, such as DNA particle bombardment. Alternatively, the DNA constructs are combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector. The virulence functions of the Agrobacterium tumefaciens host direct the insertion of the construct and adjacent marker into the plant cell DNA when the cell is infected by the bacteria.
Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of DNA constructs using polyethylene glycol precipitation is described in Paszkowski, et al., EMBO J. 3:2717 (1984). Electroporation techniques are described in Fromm, et al., Proc. Nat'l. Acad. Sci. USA 82:5824 (1985). Ballistic transformation techniques are described in Klein, et al., Nature 327:70-73 (1987).
Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are also well described in the scientific literature. See, for example, Horsch, et al., Science 233:496-498 (1984), and Fraley, et al., Proc. Nat'l. Acad. Sci. USA 80:4803 (1983). Agrobacterium-mediated transformation is a preferred method of transformation of dicots.
To use isolated sequences corresponding to or linked to expression products in the above techniques, recombinant DNA vectors suitable for transformation of plant cells are prepared. A DNA sequence coding for the desired mRNA, polypeptide, or non-expressed sequence is transduced into the plant. Where the sequence is expressed, the sequence is optionally combined with transcriptional and translational initiation regulatory sequences which will direct the transcription of the sequence from the gene in the intended tissues of the transformed plant.
Promoters, in nucleic acids linked to loci identified by detecting expression products, are identified, e.g., by analyzing the 5' sequences upstream of a coding sequence in linkage disequilibrium with the loci. Optionally, such promoters will be associated with a QTL. Sequences characteristic of promoter sequences can be used to identify the promoter. Sequences controlling eukaryotic gene expression have been extensively studied. For instance, promoter sequence elements include the TATA box consensus sequence (TATAAT), which is usually 20 to 30 base pairs upstream of a transcription start site. In most instances the TATA box aids in accurate transcription initiation. In plants, further upstream from the TATA box, at positions -80 to -100, there is typically a promoter element with a series of adenines surrounding the trinucleotide G (or T) N G. See, e.g., J. Messing, et al., in Genetic Engineering in Plants, pp. 221-227 (Kosage, Meredith and Hollaender, eds. (1983)). A number of methods are known to those of skill in the art for identifying and characterizing promoter regions in plant genomic DNA. See, e.g., Jordano, et al., Plant Cell, 1:855-866 (1989); Bustos, et al., Plant Cell 1:839-854 (1989); Green, et al., EMBO J. 7:4035-4044 (1988); Meier, et al., Plant Cell 3:309-316 (1991); and Zhang, et al., Plant Physiology 110:1069-1079 (1996).
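As a simple illustration of using a characteristic sequence element to flag a candidate promoter, the following sketch scans the sequence upstream of a transcription start site for the TATAAT consensus and reports occurrences that begin 20 to 30 base pairs upstream. It is a sketch under stated assumptions only: the function name, the exact-match search, the choice to measure distance from the first base of the motif, and the example sequences are hypothetical, and practical promoter identification relies on the experimental methods cited above.

```python
def find_tata_candidates(upstream, motif="TATAAT", window=(20, 30)):
    """Return positions (relative to the transcription start site) of motif hits.

    upstream: 5'->3' sequence ending immediately before the transcription
              start site, so its last base is position -1.
    window:   (min, max) distance in base pairs upstream at which the motif
              is expected to begin (assumption: measured from the motif start).
    """
    seq = upstream.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        position = start - len(seq)  # negative coordinate relative to the TSS
        if window[0] <= -position <= window[1]:
            hits.append(position)
        start = seq.find(motif, start + 1)
    return hits


# Hypothetical upstream sequences: the motif begins 28 bp and 50 bp upstream.
near_promoter = "GC" * 10 + "TATAAT" + "GC" * 11
far_motif = "TATAAT" + "GC" * 22

near_hits = find_tata_candidates(near_promoter)
far_hits = find_tata_candidates(far_motif)
```

Only the first sequence yields a hit, because the motif in the second lies outside the 20 to 30 base pair window.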
In construction of recombinant expression cassettes of the invention, a plant promoter fragment is optionally employed which directs expression of a nucleic acid in any or all tissues of a regenerated plant. Examples of constitutive promoters include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, and other transcription initiation regions from various plant genes known to those of skill. Alternatively, the plant promoter may direct expression of the polynucleotide of the invention in a specific tissue (tissue-specific promoters) or may be otherwise under more precise environmental control (inducible promoters).
Examples of tissue-specific promoters under developmental control include promoters that initiate transcription only in certain tissues, such as fruit, seeds, or flowers.
Any of a number of promoters which direct transcription in plant cells can be suitable. The promoter can be either constitutive or inducible. In addition to the promoters noted above, promoters of bacterial origin which operate in plants include the octopine synthase promoter, the nopaline synthase promoter and other promoters derived from native Ti plasmids. See, Herrera-Estrella et al. (1983), Nature, 303:209-213. Viral promoters include the 35S and 19S RNA promoters of cauliflower mosaic virus. See, Odell et al. (1985) Nature, 313:810-812. Other plant promoters include the ribulose-1,5-bisphosphate carboxylase small subunit promoter and the phaseolin promoter. The promoter sequence from the E8 gene and other genes may also be used. The isolation and sequence of the E8 promoter is described in detail in Deikman and Fischer, (1988) 7:3315-3327.

If polypeptide expression is desired, a polyadenylation region at the 3'-end of the coding region is typically included. The polyadenylation region can be derived from the natural gene, from a variety of other plant genes, or from T-DNA.
The vector comprising the sequences (e.g., promoters or coding regions) from genes encoding expression products of the invention will typically comprise a marker gene which confers a selectable phenotype on plant cells. For example, the marker may encode biocide tolerance, particularly antibiotic tolerance, such as tolerance to kanamycin, G418, bleomycin, or hygromycin, or herbicide tolerance, such as tolerance to chlorsulfuron or phosphinothricin (the active ingredient in the herbicides bialaphos and Basta). For example, crop selectivity to specific herbicides can be conferred by engineering genes into crops which encode appropriate herbicide metabolizing enzymes from other organisms, such as microbes. See, Padgette et al. (1996) "New weed control opportunities: Development of soybeans with a Roundup Ready™ gene" In: Herbicide-Resistant Crops (Duke, ed.), pp 53-84, CRC Lewis Publishers, Boca Raton ("Padgette, 1996"); and Vasil (1996) "Phosphinothricin-resistant crops" In: Herbicide-Resistant Crops (Duke, ed.), pp 85-91, CRC Lewis Publishers, Boca Raton ("Vasil, 1996").
Transgenic plants have been engineered to express a variety of herbicide tolerance/metabolizing genes, from a variety of organisms. For example, acetohydroxy acid synthase, which has been found to make plants which express this enzyme resistant to multiple types of herbicides, has been cloned into a variety of plants (see, e.g., Hattori, J., et al. (1995) Mol. Gen. Genet. 246(4):419). Other genes that confer tolerance to herbicides include: a gene encoding a chimeric protein of rat cytochrome P4501A1 and yeast NADPH-cytochrome P450 oxidoreductase (Shiota, et al. (1994) Plant Physiol. 106(1):17), genes for glutathione reductase and superoxide dismutase (Aono, et al. (1995) Plant Cell Physiol. 36(8):1687), and genes for various phosphotransferases (Datta, et al. (1992) Plant Mol. Biol. 20(4):619). Similarly, crop selectivity can be conferred by altering the gene coding for an herbicide target site so that the altered protein is no longer inhibited by the herbicide (Padgette, 1996). Several such crops have been engineered with specific microbial enzymes to confer selectivity to specific herbicides (Vasil, 1996).

Further, nucleic acids which can be cloned and introduced into plants to modify or complement expression of a gene, including a silenced gene, a dominant gene, an additive gene or the like, can be any of a variety of constructs, depending on the particular application. Thus, a nucleic acid encoding a cDNA expressed from an identified gene can be expressed in a plant under the control of a heterologous promoter. Similarly, a nucleic acid encoding a transcription factor that regulates a target identified by the methods herein, or that encodes any other moiety affecting transcription, can be cloned and transduced into a plant. Methods of identifying such factors are replete throughout the literature. For a basic introduction to genetic regulation, see, Lewin (1995) Genes V Oxford University Press Inc., NY (Lewin), and the references cited therein.
Regeneration of Transgenic Plants
Transformed plant cells which are derived by any of the above transformation techniques can be cultured to regenerate a whole plant which possesses the transformed genotype and thus the desired phenotype. Such regeneration techniques rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide and/or herbicide marker which has been introduced together with the desired nucleotide sequences. Plant regeneration from cultured protoplasts is described in Evans, et al., Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176, Macmillan Publishing Company, New York, (1983); and Binding, Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton, (1985). Regeneration can also be obtained from plant callus, explants, somatic embryos (Dandekar, et al., J. Tissue Cult. Meth. 12:145 (1989); McGranahan, et al., Plant Cell Rep. 8:512 (1990)), organs, or parts thereof. Such regeneration techniques are described generally in Klee, et al., Ann. Rev. of Plant Phys. 38:467-486 (1987).
One of skill will recognize that after the expression cassette is stably incorporated in transgenic plants and confirmed to be operable, it can be introduced into other plants by sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
GENE SILENCING AND HETEROSIS
It is discovered that gene silencing and epigenetic effects play a role in inbreeding depression. As demonstrated herein, the number of genes in hybrids with a dominant pattern of gene expression is correlated with hybrid yield, a component of which is found to be relief from inbreeding depression. Another way of considering genes in this class is to classify them as genes that are expressed at lower levels in one inbred parent than the other. When one copy of a gene that is expressed at low levels in one inbred is combined with a copy from another inbred, a frequent outcome in the hybrid is an equivalent level of expression to that seen with two copies of the gene in one or other of the parental inbreds (most often the more highly expressing parent).
The number of genes in the dominant class was considered as a function of the number of hybrids that share those genes, and the frequency distribution indicated that the overlap between sets of genes contributing to dominant patterns of gene expression in hybrids is essentially random. This suggests that, during the process of inbreeding, expression of a subset of genes may always be altered (and usually reduced), and that different random subsets of genes are silenced in different inbreds.
These results agree well with the classical complementation concepts of metabolic balance and physiological bottlenecks (Hageman et al. 1967 "A biochemical approach to corn breeding" Advan. Agron. 19:45; Schrader, L.E. 1985 "Selection for metabolic balance in maize" pp79-89 in Exploitation of physiological and genetic variability to enhance crop productivity. Harper J.E. (ed). Waverly Press, Baltimore; and Mangelsdorf, A.J. 1952 "Gene interaction in Heterosis" pp321-329 in Heterosis, Gowen, J. (ed) Iowa State College Press, Ames) to explain heterosis. This hypothesis proposes that maize inbred lines have unbalanced metabolic systems with some enzymes at optimum level and some at rate limiting levels, or bottlenecks. Hybrids from inbred lines that have different rate limiting systems can overcome the bottlenecks by complementation. Depending on the gene product, a favorable allele can become an unfavorable allele in a different developmental stage, and vice-versa. Complementation, therefore, results not only from quantitative aspects, i.e., variation in the level of expression, but also from qualitative aspects, e.g., variation in function due to sequence polymorphisms.
Closely related crosses are less heterotic because, firstly, there are fewer band differences, either in level of expression or in sequence polymorphism, therefore fewer heterozygous loci providing potential opportunities for complementation.
Secondly, loci from closely related crosses are more susceptible to gene silencing. In more distantly related crosses, the inbred parents have a higher number of differential bands, and the resulting hybrid tends to express both alleles, providing better complementation of unfavorable parental alleles. Such complementation allows for better responses to differing environments or during different developmental stages.
Without being bound to a particular theory, epigenetics provides a simple and elegant explanation for these effects. In Drosophila and other organisms, allelic (and non-allelic) effects have been described where expression in heterozygotes is normal, but in homozygotes trans-inactivation (or silencing) of both alleles occurs. These effects are mediated by cis-acting regulatory sequences that need to be present at more than one copy (e.g., on different chromosome homologs) to mediate the cooperative assembly of multimeric protein complexes responsible for gene silencing (e.g., Polycomb proteins in Drosophila or SIR proteins in yeast). In maize, sequences responsible for these effects most likely occur in intergenic regions outside of the chromatin loops flanked by MARs that contain genes. About 80% of the sequences in these regions are derived from retroelements that may be transcriptionally silenced through natural selection. However, the intergenic regions are also where maize exhibits most DNA sequence polymorphism. Thus, homozygosity of certain intergenic regions in inbreds could lead to adjacent gene silencing, whereas in hybrids fewer intergenic regions will be homozygous for sites that can assemble silencing complexes, so more genes will be derepressed. As new inbreds are created from hybrid crosses, recombination randomizes the intergenic regions across the genome, thereby resulting in a new subset of genes that are silenced when those regions that can assemble silencing complexes are made homozygous. This model explains why inbreds express fewer genes than hybrids (which accounts for their lower yield) and why the number of genes that exhibit a dominant pattern of gene expression in hybrids increases as the percent relationship between inbreds decreases. It also can easily accommodate potential explanations for the existence of heterotic pools, and the higher level of heterosis seen in maize as compared to other cereals (e.g., rice), which have a very different genome organization and level of sequence polymorphism. Finally, it is also possible that in maize, where natural inbreeding occurs infrequently because of its floral characteristics, natural selection may not have acted to eliminate gene silencing at the same rate as in self-fertilizing species.

MOLECULAR SECURITY; IDENTIFICATION OF PARENTAL SOURCES BY COMPARISON OF EXPRESSION PROFILES
One general concern in the agricultural industry is that proprietary plant stocks or other sources of germ plasm can sometimes be inadvertently, or even deliberately, misappropriated. Because the germ plasm may be recombined with other sources of germ plasm before producing a product such as a hybrid seed, it is not always possible to tell that the product is improperly derived from proprietary parental plants, clones, or the like.
The present invention provides methods of identifying unique expression products and/or unique profiles (or partial profiles). This ability to identify unique expression products provides one way of ascertaining parentage, which, in turn, provides the ability to determine whether a hybrid comprises proprietary material.
In the methods, a source or the sources of a test plant such as a hybrid can be identified. A representative sample of expression products from the test plant is profiled and the resulting test expression profile is compared to a database of known expression profiles for plants from known inbred or hybrid strains (methods of making such databases are described above). For example, the expression profiles for a selected tissue can be entered into a database for any or every proprietary plant (or clone, or any other source of germ plasm) that a corporation owns.
By profiling a number of plants, it is possible to detect unique expression products and/or expression patterns within the expression profile of specific plants. It is also possible to generate likely expression profiles for hybrid products of members of the database. Any of these expression profiles can be compared to an actual expression profile for a test plant suspected of being derived from one or more proprietary plants. For example, a matrix of pair-wise comparisons for potential progeny from the expression profiles in the database can be compared to the test expression profile. Either the entire expression profile or a sub-portion of the expression profile (i.e., a plurality of characters corresponding to expression products found in the overall profile) comprising at least one unique expression marker can be evaluated.
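The comparison described above can be sketched, purely by way of example, with each expression profile reduced to the set of expression products (bands) detected in a plant. Several simplifications are assumptions of this sketch and not part of the disclosed methods: presence/absence scoring instead of expression levels, a predicted hybrid profile taken as the union of the two parental profiles, Jaccard similarity as the comparison measure, and the strain and band names, which are hypothetical.

```python
from itertools import combinations


def unique_markers(database):
    """Bands found in exactly one database entry (candidate unique markers)."""
    unique = {}
    for name, bands in database.items():
        others = set().union(*(b for n, b in database.items() if n != name))
        unique[name] = bands - others
    return unique


def jaccard(a, b):
    """Shared bands divided by all bands seen in either profile."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_candidate_crosses(database, test_profile):
    """Score every pair-wise cross in the database against the test profile."""
    scores = []
    for p1, p2 in combinations(sorted(database), 2):
        predicted = database[p1] | database[p2]  # assumed hybrid profile
        scores.append((jaccard(predicted, test_profile), p1, p2))
    return sorted(scores, reverse=True)


# Hypothetical database of proprietary inbred profiles and a test hybrid.
database = {
    "inbred_1": {"b1", "b2", "b3"},
    "inbred_2": {"b2", "b4"},
    "inbred_3": {"b2", "b5", "b6"},
}
test_profile = {"b1", "b2", "b3", "b4"}

markers = unique_markers(database)
ranking = rank_candidate_crosses(database, test_profile)
best_score, best_p1, best_p2 = ranking[0]
```

In this toy case the test profile matches the predicted inbred_1 x inbred_2 cross and carries unique markers of both parents; real profiles would call for a statistical treatment of expression levels and measurement noise.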
EXAMPLES
The following examples are offered by way of illustration, and are not intended to be limiting. One of skill will immediately recognize a variety of alternate procedures, compositions, reagents and the like which can be substituted for those exemplified below.
EXAMPLE 1: DIFFERENCES IN RNA EXPRESSION PROFILES CORRELATE
WITH HETEROSIS
Heterosis is a term used to describe the increased vigor of hybrid progeny in comparison to their parents. Although heterosis has been widely used in plant breeding for many decades, the molecular mechanisms underlying the phenomenon were previously unknown. In this example, heterosis was studied as a phenotype using CuraGen (CuraGen Corp., New Haven CT) RNA profiling technology to examine differences in RNA expression between hybrids and their inbred parents. Using this approach, it was possible to sort cDNA fragments into different categories, depending on their relative levels of expression in a given hybrid and its two parents. Data indicated a difference in the number of genes in each category (dominant, under-dominant, over-dominant, additive) between heterotic and non-heterotic hybrids. The results also suggested the ability of this approach to explain the molecular basis of heterosis and the application of the information obtained to plant breeding methods.
The degree of heterosis varies tremendously among hybrids from different parental combinations. In current breeding practice, selection for parent combinations which give a high degree of heterosis depends on top-cross yield tests. In this disclosure, new methods of monitoring heterosis by identifying genes and gene expression patterns associated with heterosis expression are provided. Specific gene expression patterns associated with heterosis are identified prior to yield testing. This allows screening of larger numbers of top-crosses without having to yield test all combinations. Similarly, non-optimally expressed genes in existing commercial hybrids can be identified and improved by transgenic manipulation or gene-expression profile assisted selection.
"PAR" names herein are arbitrary predesignations of commercial and proprietary strain names. Because the invention is applicable to any crop strain, the particular strains used are not critical, or even relevant, to the claimed invention.
Accordingly, actual crop strain names are not provided.
The PAR, series of hybrids used for RNA profiling are listed in Table 1.

In Table 1, these hybrids range from a highly heterotic commercial hybrid (PAR, = PAR,/PAR~) to sibling crosses (e.g. PAR,/PAR,7) which have much less heterosis.
Each hybrid is derived from the same female parent (PAR,) and a male parent with a different percentage pedigree relationship. The correlation between heterosis and pedigree relationship is given in Table 1. Figure 1 graphically represents the correlation between degree of heterosis and % relationship: % relationship is designated on the X axis; F1-MP heterosis in bu/LCR is given on the Y axis. Data was obtained from 4 locations in JH97.
Table 1

Hybrid     Inbred parents   % Relationship   F1-MP Heterosis (bu/LCR)
PAR/PAR    PAR, PAR          0.8             74.65
PAR/PAR    PAR, PAR          1.1             67.15
PAR/PAR    PAR, PAR          1.5             69.75
PAR/PAR    PAR, PAR          2.4             80.20
PAR/PAR    PAR, PAR          3.1             53.55
PAR/PAR    PAR, PAR          4.5             71.65
PAR/PAR    PAR, PAR         20.7             50.40
PAR/PAR    PAR, PAR         45.9             12.75
PAR/PAR    PAR, PAR         47.3             34.80
PAR/PAR    PAR, PAR         48.1             55.20
PAR/PAR    PAR, PAR         48.8             43.95
PAR/PAR    PAR, PAR         60.5             50.05
PAR/PAR    PAR, PAR         62.4             37.50
PAR/PAR    PAR, PAR         63.9             43.10
PAR/PAR    PAR, PAR         71.0             48.50
PAR/PAR    PAR, PAR         85.5             25.10

In seedlings and immature ears, 90-95% of RNAs in each F1 hybrid were expressed at the same levels as in both parental inbreds. Genetically distantly related inbreds, e.g., the parents of commercial hybrids, had less than 6% of the mRNAs differentially expressed. The number of differentially expressed RNA bands between two inbred parents was positively correlated with the corresponding hybrid yield, demonstrating that gene expression differences and/or DNA sequence polymorphism between inbred parents are important for heterosis.
The level of RNA expression in the hybrid can differ from one inbred parent or the other (dominant), or both (additive or over-/under-dominant). Figure 2 depicts the classification of gene expression patterns in F1 hybrids relative to the inbred parents. RNA levels are provided on the vertical axis. Bands in each class exhibited the following expression patterns: (A) Over-/under-dominant class: the level of expression in the F1 hybrid is at least two-fold higher or lower than in both parents, which have either equal or different levels of expression. In the (B) additive and (C) dominant classes, the mRNA levels of the inbred parents are different. Additive class: the F1 expression level falls within the range of the two parents. Dominant class: the level of expression in the F1 hybrid is equal to one parent but different from the other. The majority of RNA expression level differences in both tissues of all hybrids analyzed were in the additive and dominant classes. Two-thirds of the differences observed exhibited additive expression, and the rest of the differences demonstrated a dominant expression pattern.
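The class definitions above can be illustrated with a short sketch. It is not the procedure used to generate the reported data: the function name classify_band, the "unchanged" label, and the use of a single two-fold cutoff to decide whether any two levels are "equal" or "different" are assumptions made for illustration only.

```python
def classify_band(parent_a, parent_b, f1, fold=2.0):
    """Assign one band to an expression class from three RNA levels (all > 0).

    Assumption: two levels count as "different" when they are at least
    `fold` apart, and as "equal" otherwise.
    """
    def differs(x, y):
        return max(x, y) / min(x, y) >= fold

    a_vs_b = differs(parent_a, parent_b)
    a_vs_f1 = differs(parent_a, f1)
    b_vs_f1 = differs(parent_b, f1)
    low, high = sorted((parent_a, parent_b))

    # Over-/under-dominant: F1 is outside the parental range and differs from both.
    if a_vs_f1 and b_vs_f1 and f1 > high:
        return "over-dominant"
    if a_vs_f1 and b_vs_f1 and f1 < low:
        return "under-dominant"
    # Dominant: parents differ and F1 equals exactly one of them.
    if a_vs_b and a_vs_f1 != b_vs_f1:
        return "dominant"
    # Additive: parents differ and F1 falls within their range.
    if a_vs_b and low <= f1 <= high:
        return "additive"
    return "unchanged"


print(classify_band(10, 40, 20))  # additive
print(classify_band(10, 40, 11))  # dominant
print(classify_band(10, 12, 30))  # over-dominant
```

Checking the over-/under-dominant case first matters in this sketch, because an F1 level far outside the parental range also differs from both parents.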
Furthermore, the number of dominant and additive RNA fragments correlated with the degree of heterosis. Initial studies of both seedlings and immature ears demonstrated correlations between the number of RNA fragments in the over-/under-dominant class and the % relationship between the inbred parents. Five commercial hybrids (PAR19-PAR23) selected for high yield all had high numbers of dominant and additive bands and a lower number of over-/under-dominant bands.
A new metric that measures the genetic distance between the two parents and the frequency of non-additively expressed RNAs in the over-/under-dominant class was developed. This was defined as the ratio between the sum of the RNA fragment numbers that differ from the hybrid in each of the two parents and the number of RNAs that differ between the two parents, i.e., [(A-F1)+(B-F1)]/(A-B). High yielding commercial hybrids between distantly related parents and with fewer over-/under-dominant RNAs give a lower ratio, close to 1.0 (Tables 2, 3 & 4). Figure 3 illustrates the correlation of gene expression patterns with hybrid yield. Hybrid yield in bu/LCR is given on the X axis, while % of bands in each expression class is given on the Y axis (% of bands different: dotted line; % of additive bands: dashed line; and % of dominant bands: solid line).
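As a worked illustration of the ratio defined above, the following sketch computes [(A-F1)+(B-F1)]/(A-B) from three band counts. The function name heterosis_ratio is hypothetical; the example counts (706, 604 and 1404) are taken from one hybrid set reported in Table 4.

```python
def heterosis_ratio(a_minus_f1, b_minus_f1, a_minus_b):
    """Ratio of parent-vs-hybrid differences to parent-vs-parent differences.

    a_minus_f1: number of bands differing between parent A and the F1
    b_minus_f1: number of bands differing between parent B and the F1
    a_minus_b:  number of bands differing between the two parents
    """
    if a_minus_b <= 0:
        raise ValueError("the parents must differ in at least one band")
    return (a_minus_f1 + b_minus_f1) / a_minus_b


print(round(heterosis_ratio(706, 604, 1404), 2))  # 0.93
print(round(heterosis_ratio(745, 748, 505), 2))   # 2.96
```

The first call reproduces the ratio reported for a highly heterotic set, and the second reproduces the higher ratio reported for a set with lower heterosis.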

[Table 2: contents illegible in source]

The number of RNA fragments in the additive class was higher in all heterotic hybrids, which include five commercial hybrids (PAR1/PAR2 [= PAR19], PAR20, PAR21, PAR22, PAR23, PAR24/PAR25). The same trend was also found in seedling tissues of selected hybrids analyzed. There was also a strong correlation between the number of dominant RNA bands and % of yield heterosis (Figure 3).
Table 3. Gene expression patterns of hybrids in relation to heterosis. Total no. of bands assayed is approximately 14,000 for all genotypes; 1% is about 140 bands. % of bands different (A-B): % of bands differentially expressed when comparing the inbred parents of corresponding hybrids. % of bands additive or dominant: % of bands where F1 had an additive or dominant expression pattern, respectively. Expression data based on 3 sample replicates.

Hybrid       % of pedigree   Hybrid yield   % of bands         % of bands   % of bands
             relationship    (bu/acre)      different (A-B)    additive     dominant
PAR1/PAR2     0.8            125.7          5.6                3.7          1.9
PAR1/PAR3     1.1            123.4          5.6                3.6          2.0
PAR1/PAR16   71.0             99.0          3.6                2.4          1.2
PAR1/PAR9    46.0             68.6          0.8                0.5          0.3
PAR1/PAR17   85.5             67.1          0.5                0.3          0.2

One way to interpret these data is to assume that for every gene, there is an optimal level of expression. Different inbreds may have subsets of genes that are expressed either below or above the optimum, thus contributing to their poor vigor. In hybrids, many genes expressed in parent A, but not in parent B, may be expressed at the same level as in parent A and vice versa. Thus, hybrids will have more genes expressed at an optimum level than either parent A or parent B, and the genes expressed at optimum level in A and B will complement those expressed at sub- or supra-optimal level in the other inbred.
This arrangement is represented graphically in Figure 4 (Panel A illustrates the dominant class; panels B and C illustrate the additive and over-/under-dominant classes, respectively. H: "optimum" level of mRNA expression). Thus, high heterosis is associated with an increase in the dominant and additive classes and a decrease in the over-/under-dominant class. In crosses between related inbreds, the additive class may disappear as more of these genes are likely to be iso-allelic.
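The association between hybrid yield and the percentage of bands in each class can be checked numerically from the five hybrids listed in Table 3. The sketch below is illustrative only: the function name pearson and the choice of a Pearson correlation coefficient are assumptions, since the text does not state which statistic was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Values transcribed from Table 3 (five hybrids, same order in each list).
yield_bu = [125.7, 123.4, 99.0, 68.6, 67.1]
pct_additive = [3.7, 3.6, 2.4, 0.5, 0.3]
pct_dominant = [1.9, 2.0, 1.2, 0.3, 0.2]

print(pearson(yield_bu, pct_additive))
print(pearson(yield_bu, pct_dominant))
```

With only five hybrids, such a coefficient indicates the direction and rough strength of the trend rather than a statistically robust estimate.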
Table 4.
Comparison of RNA profiles      No. of        No. of     Ratio of               Heterosis
(inbreds denoted by A or B,     differences   dominant   (A-F1)+(B-F1)/(A-B)    (% of F1)
hybrids by F1)                                bands
PAR -PAR (A-B)                  1309          1227       0.95                   59.4
PAR -PAR (A-B)                  1404          1297       0.93                   54.4
PAR -PAR xPAR (A-F1)             706
PAR -PAR xPAR (B-F1)             604
PAR -PAR (A-B)                   952           870       1.12                   48.4
PAR -PAR xPAR (A-F1)             601
PAR -PAR xPAR (B-F1)             464
PAR -PAR (A-B)                   505           252       2.96                   37.4
PAR -PAR xPAR (A-F1)             745
PAR -PAR xPAR (B-F1)             748
PAR -PAR (A-B)                   359           276       2.47                   18.6
PAR -PAR xPAR (A-F1)             510
PAR -PAR xPAR (B-F1)             376
PAR -PAR (A-B)                  1487          1327       0.99                   > 50
PAR -PAR (A-F1)                  762
PAR -PAR (B-F1)                  704
PAR -PAR (A-B)                  1565          1425       0.81                   > 50
PAR -PAR (A-F1)                  625
PAR -PAR (B-F1)                  640
PAR -PAR (A-B)                  1403          1258       0.93                   > 50
PAR -PAR (A-F1)                  724

The poor correlation between the number of genes in the over-/under-dominant class and the degree of heterosis is surprising. The data suggest that when breeders select for highly heterotic hybrids, they may also be selecting against genes that fall into this class. The logical extension to this argument would be that if derivatives, e.g., of PAR19, are selected or engineered that have fewer or no genes that fall into this class, they will have a higher yield than PAR19 itself.
EXAMPLE 2: PREDICTING HETEROSIS FROM ANALYSIS OF SHARED
ADDITIVE BANDS; IDENTIFICATION OF GENES INVOLVED IN HETEROSIS
Immature ear mRNA was profiled from 10 hybrids and their respective inbred parents. The genotypes profiled included a number of commercial hybrids and a set from the "PAR27 series," in which PAR27 was used as a common female with a series of males that differed in percent relationship. Differentially expressed bands among hybrids and inbred parents were categorized according to whether they were additive, non-additive [= over-/under-dominant] or dominant. Analysis of this set of data from profiles of all 10 hybrids showed the following.
First, there was an inverse correlation between heterosis and the number of non-additively expressed sequences. Second, the number of RNA fragments in the additive class was higher in all heterotic hybrids analyzed, which include five commercial hybrids (PAR19, PAR20, PAR21, PAR22, PAR23) and PAR24/PAR25 (PAR26 cross). Third, the data also indicated a strong correlation between the number of dominant RNA bands and the degree of heterosis.
Table 5: # of Additive Bands
# of Additive Bands   # of Hybrids sharing
26                    5 or more
94                    4 or more
262                   3 or more
612                   2 or more
1635                  1 or more

Identifying and cloning genes in common to the additive and dominant classes amongst a series of highly heterotic hybrids that share little relationship to each other by pedigree is of value. In comparing the additive class of all 10 hybrids, the additive bands occurring in one or more of the 10 hybrids were considered. The results are shown in Table 5.
The maximum number of hybrids an additive band occurred in was seven out of 10. Analysis of the frequency of bands occurring in each of the groups mentioned above revealed, in all groups, a constant pattern in the bands shared between heterotic hybrids (see Table 6). Heterosis prediction based on these data gave the following rank: 10>8>2>7>6>1>3>9>4>5. The corresponding hybrids are: PAR23> PAR22> PAR27/PAR25> PAR21> PAR2a> PAR19> PAR2,/PAR43> PAR24/PAR25> PAR2~lPAR3,> PAR~.~'AR~,.
Compared with the actual yield heterosis data in Table 6, the ranking is very close to the yield.
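The grouping and ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis actually performed: the function names shared_band_groups and rank_hybrids, the hybrid labels and the band identifiers are all hypothetical, and the ranking simply orders hybrids by how many of their additive bands fall in a chosen shared group.

```python
from collections import Counter


def shared_band_groups(additive_bands, thresholds=(5, 4, 3, 2, 1)):
    """Map each threshold k to the set of bands that are additive in >= k hybrids."""
    counts = Counter(band for bands in additive_bands.values() for band in set(bands))
    return {k: {band for band, n in counts.items() if n >= k} for k in thresholds}


def rank_hybrids(additive_bands, group):
    """Count each hybrid's bands in `group` and rank hybrids by that frequency."""
    freq = {hybrid: len(set(bands) & group) for hybrid, bands in additive_bands.items()}
    order = sorted(freq, key=freq.get, reverse=True)
    return order, freq


# Hypothetical toy input: hybrid label -> set of additive band identifiers.
toy = {
    "H1": {"b1", "b2", "b3"},
    "H2": {"b1", "b2"},
    "H3": {"b1"},
    "H4": {"b4"},
}
groups = shared_band_groups(toy, thresholds=(3, 2, 1))
order, freq = rank_hybrids(toy, groups[2])
print(freq)   # {'H1': 2, 'H2': 2, 'H3': 1, 'H4': 0}
```

Applied to real profiles, the group sizes would correspond to the counts in Table 5 and the per-hybrid frequencies to the columns of Table 6.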
Table 6: Heterosis/Corresponding Hybrid Information
Hybrid   Corresponding hybrid     Heterosis           Frequency    Frequency    Frequency     Frequency
                                  (% of F1)           (26 bands)   (94 bands)   (262 bands)   (612 bands)
1        PARz,/PAR,,, (PAR ",)    59.4                15           52           103           167
2        PAR"/PAR=,               54.4                20           57           137           212
3        PAR"/PAR"                48.5                13           36            63            94
4        PAR=r/PAR,7              25.1                 6            8            12            15
5        PAR=.,/PAR"              12.8                 1            2             4             6
6        PARE,                    >50                 15           48           107           214
7        PAR"                     >50                 17           52           124           252
8        PAR"                     >50                 21           62           142           253
10       PARE,                    Commercial hybrid                67           139           253

The expression patterns of the two SS x SS hybrids, PAR46/PAR48 and PAR46/PAR4~, were also informative. The pedigree relationships of the two hybrids are similar (23% and 27%, respectively); however, the heterosis levels differ significantly: 2.3% for the former and 36.6% for the latter. The difference in the number of additive bands between the two is striking (1 and 60).
EXAMPLE 3: ANALYSIS OF DOMINANT GENE EXPRESSION CLASS
An additional observation from further data analysis was that the number of dominant bands contributed by the male parent differs from the number contributed by the female parent, i.e., the bands differ in whether the expression level in the F1 is the same as that of the male or the female parent. The number of dominant bands contributed by the male parent is consistently higher across all hybrids analyzed, regardless of the degree of heterosis. However, yield heterosis correlates better with the number of dominant bands contributed by female parents than with the number contributed by male parents, especially in the PAR27 series.
When the dominant bands were grouped according to whether they are up- or down-regulated in the hybrid, that is, whether the hybrid is the same as the higher or the lower parent, the F1 tended to have an expression level closer to that of the parent with higher expression. This is especially true of the RNA bands that are similar between an F1 hybrid and its male parent.
The consistent association of higher numbers of dominant and additive RNA bands with heterotic hybrids, regardless of genetic background or developmental stage, and the tendency toward up-regulated expression of dominant bands in hybrids, suggested that genes in hybrids are in a more active phase than in inbreds. Secondly, more genes are in such an active condition in heterotic hybrids than in poor hybrids. Thus, genes are mostly silenced or inactivated when their regulatory elements are in a homozygous condition, e.g., in inbreds, but re-activated when in a heterozygous condition. Hybrids derived from two inbreds that complement each other optimally, giving rise to a heterozygous condition for most of these regulatory elements, had a maximal number of genes "re-activated" and were therefore heterotic. Crosses of closely related inbreds, or of inbred lines that did not have such "optimal complementation," had fewer genes re-activated and produced hybrids with low heterosis.
Table 7. Difference in the numbers of dominant bands contributed by male vs. female parent
Genotypes      % REL   Heterosis   Total no. of      Total no. of   No. of domnt   No. of domnt   Ratio of
                       (% of F1)   bands different   domnt bands    bands by       bands by       male to
                                   (A-B)                            male parent    female parent  female
PAR_; PAR_,    0.08    59.4        1309              605            346            259            1.34:1
PART; PAR_s    0.11    54.4        1404              646            419            227            1.85:1
PARi,-PAR"     71      48.4         952              462            299            163            1.83:1
PARl,-PAR"     86      37.4         505              241            133            108            1.23:1
PAR"-PAR"      45      18.6         359              180            134             46            2.91:1
PAR,,-PAR",    0.04    com         1487              739            415            324            1.28:1
PARro PAR"     0.04    com         1565              751            417            333            1.25:1
PAR,_-PAR"     0.01    com         1403              650            370            280            1.32:1
PARE,-PAR=f    0.21    PARE        1484              681            444            237            1.87:1
PAR,; PAR,f    0.03    com         1297              568            364            204            1.78:1

Another correlation (statistical association) from this data set is that there is a difference in the number of dominant bands contributed by the male vs. female parent, i.e., whether the expression level in the F1 is the same as that of the male or the female parent.
Figure 5 illustrates parental effects on gene expression of heterotic and non-heterotic hybrids.
The total number of dominant bands was calculated for each hybrid as 100%. (F1 = male or female: dominant bands where the F1 hybrid has the same level of expression as the male or female parent, respectively.) The number of dominant bands contributed by the male parent is consistently higher across all hybrids analyzed, regardless of whether they are heterotic or non-heterotic (Table 7).
Also, yield heterosis correlates better with the number of dominant bands contributed by female parents than with the number contributed by male parents, especially in the PAR27 series (Table 7). For example, the least heterotic hybrid, PAR1/PAR17, a sib cross, had 96% male dominant bands and 4% female dominant bands, whereas the hybrid exhibiting the highest degree of heterosis, PAR1/PAR2, had 60% and 40% male and female dominant bands, respectively. When these dominant bands were grouped according to whether they are up- or down-regulated in the F1, that is, whether the F1 is the same as the higher or the lower parent, the F1 tended to have an expression level closer to that of the parent with higher expression.
This is especially true of the RNA bands dominant by the male parents (Table 8).
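The male-to-female ratios and the up:down ratios reported in Tables 7 and 8 follow from simple arithmetic on the band counts. The sketch below shows that arithmetic; the function names parental_share and up_down_ratio are hypothetical, and the example counts (419 male and 227 female dominant bands; 304 up-regulated and 115 down-regulated male dominant bands) are taken from one hybrid in those tables.

```python
def parental_share(male_dominant, female_dominant):
    """Return the male:female ratio and each parent's percentage of dominant bands."""
    total = male_dominant + female_dominant
    if female_dominant == 0:
        raise ValueError("ratio is undefined without female-dominant bands")
    return {
        "ratio_male_to_female": male_dominant / female_dominant,
        "male_pct": 100.0 * male_dominant / total,
        "female_pct": 100.0 * female_dominant / total,
    }


def up_down_ratio(up_regulated, down_regulated):
    """Ratio of dominant bands where F1 equals the higher vs. the lower parent."""
    return up_regulated / down_regulated


share = parental_share(419, 227)
print(round(share["ratio_male_to_female"], 2))  # 1.85
print(round(up_down_ratio(304, 115), 1))        # 2.6
```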
Table 8. Number of RNA bands where F1 is up- or down-regulated
                Male Parent                                            Female Parent
Hybrids         Total       Up Regulated   Down Regulated   Ratio      Total         Up Regulated   Down Regulated   Ratio
                (F1=male)   (F1=higher     (F1=lower        (up:down)  (F1=female)   (F1=higher     (F1=lower        (up:down)
                            parent)        parent)                                   parent)        parent)
PAR:.,/f'ARa    344         258             86              3:1        259           157            102              1.5:1
PAR_.~PARa      419         304            115              2.6:1      227           154             73              2.1:1
PARr,/PAR"      299         206             93              2.2:1      163            96             67              1.4:1
PAR_.~l'AR"     133          97             36              2.7:1      108            51             57              0.9:1
PAR_~PAR"       134         106             28              3.8:1       46            18             28              0.6:1
PARp,/PAR~,     415         274            141              1.9:1      324           152            172              0.9:1
PAR,/PAR"       417         285            132              2.6:1      333           189            144              1.3:1
PARr./PAR"      370         251            119              2.1:1      280           195             85              2.3:1
PAR'/PAR.~      444         270            174              1.6:1      237           165             72              2.3:1
PAR,,~/PAR"     363         196            167              1.2:1      204           144             60              2.4:1

EXAMPLE 4: GENES SPECIFICALLY EXPRESSED IN HIGH YIELDING
COMMERCIAL HYBRIDS
Most of the analyses so far with the RNA profile data described in Example 1 are based on the expression patterns of F1 hybrids relative to their inbred parents, such as additive vs. non-additive classifications and the differences of these categories between heterotic and non-heterotic hybrids. While the results so far were informative, another way of analyzing this data set is to compare the levels of RNA expression of poor hybrids with those of heterotic hybrids without any involvement of their parents. In comparing all 10 hybrids, which include 3 breeding crosses and 7 commercial hybrids, a list of bands that have a similar expression level among heterotic hybrids but differ from the non-heterotic hybrids (breeding crosses) was determined.
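The hybrid-to-hybrid comparison described above can be sketched as a simple filter. The sketch rests on assumptions that are not stated in the text: each band's expression in a hybrid is summarized as a signed fold difference relative to a reference heterotic hybrid, a value of 0 means no detectable difference, and a two-fold cutoff separates "similar" from "different". The function name hybrid_specific_bands and all band and hybrid labels are hypothetical.

```python
def hybrid_specific_bands(fold_vs_reference, heterotic, poor, fold=2.0):
    """Select bands similar across heterotic hybrids but different in all poor hybrids.

    fold_vs_reference: band -> {hybrid label -> signed fold difference vs. reference}
    """
    selected = []
    for band, folds in fold_vs_reference.items():
        similar_in_heterotic = all(abs(folds[h]) < fold for h in heterotic)
        different_in_poor = all(abs(folds[p]) >= fold for p in poor)
        if similar_in_heterotic and different_in_poor:
            selected.append(band)
    return selected


# Hypothetical toy data: two heterotic hybrids (H1, H2) and two poor hybrids (P1, P2).
toy_folds = {
    "band_x": {"H1": 0, "H2": 0, "P1": 3.4, "P2": -2.7},   # selected
    "band_y": {"H1": 2.6, "H2": 0, "P1": 4.0, "P2": 5.0},  # varies among heterotic hybrids
    "band_z": {"H1": 0, "H2": 0, "P1": 0, "P2": 2.5},      # unchanged in one poor hybrid
}
print(hybrid_specific_bands(toy_folds, ["H1", "H2"], ["P1", "P2"]))  # ['band_x']
```

Requiring every poor hybrid to differ is a strict choice; a looser rule (for example, a majority of poor hybrids) would return more candidate bands.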
EXAMPLE 5: EXPRESSION PROFILING USING DIFFERENT TISSUES FROM
HYBRIDS AND PARENTS
RNA profiling data from hybrid sets (hybrids and their respective parents) were obtained in maize. Five other sets utilized kernel tissue at 13 days after pollination ("DAP"). A total of 14 hybrid sets for the immature ear (V19), five for the kernel (R2) and three for the seedling tissue (V3) were profiled.
For immature ear tissue, the 14 hybrid sets analyzed included seven from the PAR27 series, which covers a spectrum of heterosis levels ranging from commercial hybrids to low heterotic hybrids of sibling crosses; four commercial hybrids from diversified genetic backgrounds other than the PAR27 series; and three crosses between inbreds of the same heterotic group, typical of those that would be useful for breeding new inbreds.
The five hybrid sets in which kernel tissue was analyzed and the three hybrid sets in which seedling tissue was analyzed were from the PAR27 series.
Profiling data of all these hybrids from all three tissues analyzed gave similar expression patterns. However, the immature ear tissue was more informative than seedling tissue and less complicated than the kernel tissue, which is compounded with other effects due to pollen sources, such as xenia, maternal effects, etc. Profile analysis of all the samples consistently showed similar correlations between profile information and heterosis to those described in Examples 1, 2 and 3.

Table 9. N-fold differences in the expression of each hybrid relative to PAR19
Band ID        PAR19 vs. each of nine hybrids (columns in source order)
d010-163.5       0       0       0       0       0      -2.66   -2.19   -6.49   -4.31
dIlvO-123.2      0       0      -2.66   -2.06    0       0       3.34    7.09    5.14
dOvO-172.4       7.8     0       0       0       0       8.91   18.48   18.44   29.45
dOvO-104.4       0       0       0       0       0       0       2.36    2.55    2.45
gOmO-339.8       0       0       0       0       0       0       4.97    2.82    2.09
gln0-389.3       0       0       0       2.69   -4.03    0      -2.73   -2.57   -2.98
hOcO-173.1       0       0       0       0       0       0       2.43    2.83    2.16
hOcO-285.4       2.07    2.64    0       2.04    0       0      -2.4?   -2.1    -2.88
M1~0-131.4       0       0       0       0       0       3.85    4.83    4.81    5.16
i0e0-45.6        0       0       0       0       0       0      -2.49   -2.26   -2.53
i0a0-237.8       0       0       0       0       0       0       2.59    3.3     2.81
i0a0-242.8       0       0       0       0       0       0      -2.84   -4.76   -2.87
i0a0-252.1       0       0       0      -2.74    2.72    0      69.21   29.62   69.23
i0c0-95.6        0       0      -2.5     0       0       0      -3.02  -22      -2.76
iUCO-203.4       0       0       0       0       0       0      -2.26   -2.87   -2.04
i0c0-312.1       0       0       0       0      -8.74    0       6.11    7.11    7.9
i0m(1-271.5      0       0       0       0       0       0      -2.92    2.39    2.63
IOnO-140.4       0       0       0       0      -2.03    0       2.55    2.1     2.45
IOnO-210.5      -2.13    0      -2.21    0       0      12.09   13.43    8.82   16.27
mla0-89.3        0       0       0       0       0       0       3.41    3.74    2.4
mlaU-239.6      -2.54    0       0       0       0       0      -2.8    -4.05   -4.94
mla0-241.6       0       0       0       0       0       0      -4      -3.14   -2.64
mla0-425.1       0       0      -2.34    0       0       0       4.59    6.64    6.1
rOkO-190.7       0       0       0       0       2.45    0      -2.85   -4.4    -9.01
w9c0-128.3       0       0       4.54    0       0       0       2.53    3.63    3.14
wOcO-230.2       0       0       6.83    0       0       0      -2.72   -3.14   -3.13
wOcO-267.2     -15.18    0       3.09    0      -5.12    2.6     6.62    6.35   10.73
wOcO-381.3       0       0       0       0       0       0      17.45   39.08   44.17
wuno-xsl.z       0       0       0       0       0       0       2.64    2.56    2.5
wOhO-406.6       0       0       0       0       0       0       7.34    4.21    7
w0i0-J54.4       0       0      -2.04    0       0       0       3.73    3.65    3.8
w0i0-265.3       0       0       0       0       0       0      -2.51   -2.73   -2.28
y0i0-118.1       0       0       0       0      -2.15    0       4.19    6.57    6.?5
y0i0-254.4       0       0       0       0       0       4.21    2.6     6.95    5.32
Dominant bands from all 14 hybrids, grouped by the number of hybrids sharing them, are presented in Figure 2. The number of dominant bands shared by one or more hybrids is normalized to 100%. The data show that the dominant bands shared by two or more hybrids range from 60-80%; bands shared by three or more hybrids are about 40-50%, and so on. Although the total number of dominant bands was important to make a heterotic hybrid, the dominant bands shared by a higher number of hybrids may not necessarily contribute to the heterosis expression.
Since seedlings show the same trend as immature ears, albeit with different genes involved, it is possible to select at the seedling stage individual hybrid combinations that express the highest number of genes with a dominant expression pattern and that have the fewest genes in the over-/under-dominant class. Thus, much larger numbers of F2 top-crosses can be screened using this procedure as a first cut than could be screened by multi-location yield tests alone.
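The seedling-stage first cut described above amounts to ranking candidate combinations by their band counts. The following sketch is one hypothetical way to do so: the function name first_cut, the candidate labels, and the rule of sorting first by the number of dominant bands and then by the number of over-/under-dominant bands are assumptions, since the text does not specify how the two criteria are combined.

```python
def first_cut(profiles, keep):
    """Return the `keep` candidates with the most dominant bands,
    breaking ties in favor of fewer over-/under-dominant bands."""
    ranked = sorted(
        profiles,
        key=lambda name: (-profiles[name]["dominant"], profiles[name]["over_under"]),
    )
    return ranked[:keep]


# Hypothetical seedling profiles: candidate label -> band counts per class.
candidates = {
    "cross_a": {"dominant": 650, "over_under": 40},
    "cross_b": {"dominant": 650, "over_under": 12},
    "cross_c": {"dominant": 180, "over_under": 90},
}
print(first_cut(candidates, keep=2))  # ['cross_b', 'cross_a']
```

Candidates passing this cut would then go forward to multi-location yield tests.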
In addition to the analyses above, the profile data can be analyzed in another way. In this approach, the levels of RNA expression of poor hybrids and heterotic hybrids are compared without any involvement of their parents. This approach examines whether the absolute level of expression of a subset of genes is important for heterosis, in addition to the additive vs. non-additive expression patterns already found. In the dominantly expressed bands, the F1 hybrids tend to have the same expression levels as the higher parent, i.e., showing overall an up-regulation of gene expression (Table 10). In comparing all hybrids with PAR19, 34 bands that have a similar expression level among heterotic hybrids but differ from the non-heterotic hybrids were identified (Table 9; the last three columns are non-heterotic hybrids). For these 34 bands, the 3 poor hybrids show either higher or lower expression than PAR19, whereas all other hybrids, which are heterotic, show no or little difference in expression relative to PAR19.

Table 10. Predominance of up-regulated bands in the hybrids vs. their parents
             Dominant Bands
Genotype     Total   Up-   Dn-
PARz,-       656     472   184
PAR3z-       629     446   183
PARZ,~       670     435   235
PAR~~-       621     431   190
PAR"-        558     342   216
PAR2~-       588     319   269
PAR2,-       549     311   238
PARZ,-       459     304   155
PARZ~-       269     175    94
PAR,,-       191     137    54

EXAMPLE 6: CORRELATIONS TO MALE VERSUS FEMALE PARENTS
As indicated previously, a preponderance of male dominant bands was observed when immature ear mRNA was profiled from hybrids and their respective inbred parents (Figure 5). Selected male dominant bands were screened for allelic sequence polymorphism between inbred parents such that male and female alleles were identified.
Several bands exhibited an allelic polymorphism between the two parental alleles, and these were further tested for mono- or bi-allelic expression in the F1 hybrids. PCR primers were designed based on the sequence information and used to amplify cDNAs derived from mRNAs of F1 hybrids. More than 20 cDNA clones derived from F1 mRNA derived from a single locus were randomly picked and sequenced. All cDNAs expressed in the F1 were identical to the allele expressed in the male parent and none were identical to that expressed in the female parent. These results are illustrated in Figure 6a-c, which shows an allelic expression test of a male dominant band (wOhO) cloned from CuraGen. ((A) schematic representation of polymorphic amplification products; (B) sequences of 9 random cDNAs from 50% PAR1 + 50% PAR2 mRNA used as a control for allelic discrimination in PCR cloning; (C) sequences of 10 random cDNAs from PAR1/PAR2 mRNA are all the same as the PAR2 allele.) This result is consistent with expression of only the male-derived allele and silencing of the female-derived allele. To ensure that preferential amplification did not explain the differential amplification results, equal amounts of mRNA from each parent genotype were mixed and amplified by PCR. Of nine cDNA clones sequenced from the control reaction, five were from the male parental allele, and four were from the female parental allele, demonstrating that no discrimination between the alleles occurred during amplification.
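The weight of the clone-counting evidence above can be gauged with a simple binomial calculation. This calculation is not part of the reported experiments; it assumes that clones are sampled independently and that, if both alleles were equally expressed and equally amplified, each clone would carry the male allele with probability 0.5. The function name binomial_pmf is hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k male-allele clones among n independent clones."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Control reaction: 5 of 9 clones carried the male allele.
print(binomial_pmf(5, 9))    # 0.24609375

# F1 test: 20 of 20 clones carrying the male allele under equal expression.
print(binomial_pmf(20, 20))  # 9.5367431640625e-07
```

Under these assumptions, a 5:4 split is an unremarkable outcome for an unbiased mixture, whereas 20 clones all carrying one allele would be expected less than once in a million trials.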
Accordingly, the disclosures and descriptions herein are intended to be illustrative, but not limiting, of the scope of the invention which is set forth in the following claims. One of skill will recognize many modifications which fall within the scope of the following claims. For example, all of the methods and compositions herein may be used in different combinations to achieve results selected by one of skill. All publications and patent applications cited herein are incorporated by reference in their entirety for all purposes, as if each were specifically indicated to be incorporated by reference.

Claims

WHAT IS CLAIMED IS:

1. A method of screening for heterosis in plants, comprising:
(i) profiling expression of a first representative sample of first expression products from a first progeny plant to quantify the expression products produced in the first progeny plant, wherein the number of first expression products produced in the first progeny plant is correlated with a measure of heterosis in the first progeny plant; or,
(ii) profiling expression of a second representative sample of second expression products from the first progeny plant to quantify or identify the dominant expression products in the second representative sample, wherein the number of dominant expression products is correlated with a measure of heterosis in the progeny plant.

2. The method of claim 1, further comprising:
selecting the progeny plant profiled in (i) or (ii), based upon the number of first expression products in the first representative sample, or based upon the number of second expression products in the second representative sample that exhibit a dominant expression pattern.

3. A plant selected by the method of claim 2.

4. The method of claim 1, wherein the first and second expression products are independently selected from: mRNAs and proteins.

5. The method of claim 1, wherein the first or second representative sample corresponds to between about 1,000 and about 20,000 gene products.

6. The method of claim 1, wherein expression of at least about 50% of the first or second expression products produced in a selected tissue are detected.

7. The method of claim 1, wherein expression is profiled in step (i) or step (ii) using one or more technique selected from: hybridization of expressed or amplified nucleic acids to a nucleic acid array, hybridization to a protein array, hybridization to an antibody array, subtractive hybridization, and differential display.

8. The method of claim 1, further comprising:
selecting the first progeny plant for one or more characteristics selected from: a selected number of dominant expression products, a selected ratio of dominant expression products to total expression products, expression of a dominant expression product exhibiting an allelic sequence polymorphism, a desired number of over- or under-dominant expression products, a selected ratio of over- or under-dominant expression products to total expression products, a selected number of additive expression products, and a selected ratio of additive expression products to total expression products.

9. The method of claim 1, further comprising:
identifying which expression products from the first or second representative sample show a dominant, additive, under-dominant, or over-dominant expression pattern for at least a portion of the representative sample.

10. The method of claim 1, further comprising:
selecting the first progeny plant to maximize the number of dominant expression products or to maximize the number of additive expression products, or to express a dominant expression product exhibiting an allelic sequence polymorphism, or to minimize the number of over- or under-dominant expression products.

11. The method of claim 1, further comprising:
cloning at least one nucleic acid encoding an expression product selected from: an additive gene product, a dominant gene product, which dominant gene product optionally has an allelic sequence polymorphism, an over-dominant gene product, an under-dominant gene product, and the product of a transgene derived from the first or second parental plant or the first progeny plant.

12. The method of claim 11, further comprising transducing the at least one nucleic acid into a target plant, resulting in an increase in the number of additive or dominant gene products expressed in the target plant.

13. The method of claim 1, further comprising:
crossing a first parent plant with a second parent plant to produce the first progeny plant.

14. The method of claim 13, further comprising: profiling parental expression products from either the first or second parent plant.

15. The method of claim 13, further comprising:
profiling expression of parental representative samples of gene products from the first and second parent plant; and, comparing the resulting parental expression profiles of the first and second parent plants with an expression profile of the first progeny plant.

16. The method of claim 13, wherein the first parent plant is a female plant and the second plant is a male plant, the method further comprising:
crossing at least a third male plant to the first female plant to produce at least a second progeny plant;
comparing an expression profile of the first progeny plant and an expression profile of the second progeny plant to an expression profile of the first female plant;
and, selecting the first or second progeny plant based upon similarity to the expression profile of the first female plant.

17. The method of claim 13, further comprising: identifying genes which are silenced in the first parent plant, the second parent plant, or the first progeny plant.

18. The method of claim 13, further comprising: cloning a nucleic acid encoded by a gene silenced in the first parent plant, the second parent plant or in the first progeny plant.

19. The method of claim 13, further comprising: introducing a heterologous nucleic acid into the first parent plant, the second parent plant, the first progeny plant, or a subsequent progeny plant derived from one or more of: the first parent plant, the second parent plant, or the first progeny plant, which heterologous nucleic acid results in increased expression of an expression product from a silenced gene.

20. The method of claim 13, further comprising: determining a ratio between the sum of expressed gene products that differ from the progeny plant in each of the first and second parent plants and the number of expressed gene products that differ between the first and the second parent plant.

21. The method of claim 13, further comprising: crossing 1 or more additional plants with the first or second parent plant to produce at least one additional progeny plant.

22. The method of claim 13, further comprising:
crossing 1 or more additional plants with the first or second parent plant to produce one or more additional progeny plant;
profiling expression of a representative sample of gene products from the one or more additional progeny plant; and, comparing the resulting expression profile of the one or more additional progeny plant with an expression profile of the first progeny plant.

23. The method of claim 22, further comprising: selecting a heterotic progeny plant from a group of progeny plants comprising the first progeny plant and the one or more additional progeny plant.

24. The method of claim 23, wherein the heterotic progeny plant is selected based upon one or more selectable property selected from: an elevated number of expressed RNAs relative to one or more parental plant; an elevated number of expressed RNAs relative to other progeny plants in the group of progeny plants; an elevated number of RNAs showing a dominant expression pattern relative to one or more parental plant; an elevated number of gene products showing a dominant expression pattern relative to other progeny plants in the group of progeny plants; an RNA showing an allelic sequence polymorphism relative to one or more parental plant, an RNA showing an allelic sequence polymorphism relative to other progeny plants in the group of progeny plants, a decreased number of gene products showing an over or underdominant gene expression pattern as compared to one or more parental plant;
and, a decreased number of gene products showing an over or underdominant expression pattern as compared to other progeny plants in the group of progeny plants.

25. The method of claim 13, wherein the first parent plant, second parent plant, and first progeny plant are independently selected from: an inbred plant, and a hybrid plant.

26. The method of claim 13, wherein the first parent plant is a first inbred plant, the second parent plant is a second inbred plant and the progeny plant is a hybrid plant.

27. The method of claim 13, wherein the first or second parent plant is an inbred or hybrid plant, and the progeny plant is a hybrid plant, the method further comprising crossing a plurality of first additional plants of the same strain as the first parent plant with a plurality of second additional plants of the same strain as the second parent plant, to produce a plurality of progeny hybrid plants.

28. The method of claim 27, further comprising topcrossing at least one of the plurality of progeny hybrid plants with a plurality of inbred plants to provide a plurality of topcross plants.

29. The method of claim 28, further comprising topcrossing the topcross plants to an inbred plant to produce a topcross progeny plant, and, optionally, profiling expression of a representative sample of RNA from the topcross plant or from the topcross progeny plant.

30. The method of claim 28, further comprising:
selfing a test plant selected from: the first parent plant, the second parent plant, the first progeny plant, one of the plurality of progeny hybrid plants, one of the plurality of topcross plants, and one of the plurality of topcross progeny plants; or crossing one or more test plants selected from: the first parent plant, the second parent plant, the first progeny plant, one of the plurality of progeny hybrid plants, one of the plurality of topcross plants, and one of the plurality of topcross progeny plants.

31. The method of claim 30, further comprising: profiling expression of the test plant.

32. The method of claim 30, further comprising: profiling expression of an immature tissue from the test plant.

33. The method of claim 28, the method further comprising:
profiling expression of a representative number of expression products from one or more of the plurality of hybrid progeny plants, or progeny thereof; and additionally performing at least one of:
(i) determining the number of expression products in the representative sample from the plurality of hybrid progeny plants, or progeny thereof, wherein the number of expression products in the plurality of hybrid progeny plants, or progeny thereof, is correlated with a measure of heterosis in the plurality of hybrid progeny plants, or progeny thereof;
(ii) determining the number of expression products in the representative sample of expression products from the plurality of hybrid progeny plants, or progeny thereof, wherein the number of expression products exhibiting a dominant expression pattern in the plurality of hybrid progeny plants, or progeny thereof is correlated with a measure of heterosis in the plurality of hybrid progeny plants, or progeny thereof; and,
(iii) selecting the plurality of hybrid progeny plants, or progeny thereof for plants which display a selected number of expression products, or a selected number of dominant expression products, thereby selecting for an increase in a measure of heterosis.

34. The method of claim 13, wherein the first and second parent plants are monocots.

35. The method of claim 13, wherein the first and second parent plants are selected from the families Gramineae, Compositae, and Leguminosae.

36. The method of claim 13, wherein the first and second parent plants are selected from: Zea mays, rice, soybean, sorghum, wheat, oats, barley, millet, sunflower, and canola.

37. The method of claim 13, further comprising selecting the first and second parent plant to produce the first progeny plant with a selected number of expression products which are dominant, over-dominant, under-dominant or additive.

38. The method of claim 37, wherein the parents are selected to produce the first progeny plant by selecting for complementary expression of dominant or additive expression products between the parents.

39. The method of claim 1, further comprising:
(iii) comparing a set of first expression products in the first progeny plant to a set of second plant expression products from a second plant; or, (iv) comparing a set of expression products exhibiting a dominant expression pattern in the first progeny plant to a set of expression products exhibiting a dominant expression pattern in a second plant.

40. The method of claim 39, wherein step (iii) or step (iv) is performed using a computer.

41. The method of claim 39, wherein step (iv) or step (v) is performed using a computer, wherein the second number of expressed gene products or the second number of gene products exhibiting a dominant expression pattern is present in a database in the computer.

42. The method of claim 1, wherein the steps of profiling expression are performed in an integrated system comprising a microprocessor with software for determining one or more of: how many genes are expressed; whether expressed genes are dominant; whether expressed genes are additive; whether expressed genes are over-dominant; and, whether expressed genes are under-dominant.
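
The determinations listed in claim 42 could be sketched in software roughly as follows. This is only an illustration under stated assumptions, not the applicants' implementation: the claims do not define numeric criteria, so the conventional genetic definitions are assumed here (hybrid near the mid-parent value is additive, near either parent is dominant, above the high parent is over-dominant, below the low parent is under-dominant). The names `classify_pattern` and `summarize`, the tolerance `tol`, and the detection threshold `detect` are all hypothetical.

```python
from collections import Counter

def classify_pattern(p1, p2, hybrid, tol=0.1):
    """Classify a hybrid expression level relative to its two parents.

    Assumed (hypothetical) criteria; `tol` is a relative tolerance
    scaled by the higher parental level.
    """
    low, high = min(p1, p2), max(p1, p2)
    mid = (p1 + p2) / 2.0
    margin = tol * max(high, 1e-9)
    if hybrid > high + margin:
        return "over-dominant"
    if hybrid < low - margin:
        return "under-dominant"
    if abs(hybrid - mid) <= margin:
        return "additive"
    if abs(hybrid - high) <= margin or abs(hybrid - low) <= margin:
        return "dominant"
    return "intermediate"

def summarize(profiles, detect=0.0):
    """Count expressed genes in the hybrid and tally expression patterns.

    `profiles` maps a gene identifier to (parent1, parent2, hybrid) levels.
    A gene counts as expressed when its hybrid level exceeds `detect`.
    """
    expressed = 0
    patterns = Counter()
    for p1, p2, hybrid in profiles.values():
        if hybrid > detect:
            expressed += 1
        patterns[classify_pattern(p1, p2, hybrid)] += 1
    return expressed, patterns

# Made-up example values, for illustration only.
example = {
    "g1": (10.0, 2.0, 10.0),
    "g2": (10.0, 2.0, 6.0),
    "g3": (10.0, 2.0, 15.0),
    "g4": (10.0, 2.0, 0.0),
}
n_expressed, pattern_counts = summarize(example)
```

In practice the thresholds would depend on the profiling technology and its noise characteristics, which the claims leave open.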

43. The method of claim 1, further comprising inputting a resulting expression profile for the first progeny plant into a database of expression profiles.

44. The method of claim 43, wherein the database is in an integrated system comprising a computer.

45. A database produced by the method of claim 43.

46. The database of claim 45, wherein the database is present in a computer.

47. The computer database of claim 46, wherein the database comprises expression product profiles of a representative sample of expression products for hybrid progeny plants resulting from at least 10 separate inbred plant crosses.

48. The method of claim 43, further comprising selecting an expression profile from the database, which profile provides a unique subset of expression products.

49. The method of claim 48, further comprising:
cloning a nucleic acid which expresses at least one expression product in the unique subset of expression products; or, cloning a nucleic acid which expresses at least one expression product in the unique subset of expression products and transducing the nucleic acid into a heterologous plant; or, crossing a first selected plant which expresses the unique subset of expression products with a second selected plant which does not express the unique subset of expression products.

50. The method of claim 1, further comprising: selfing the first progeny plant.

51. The method of claim 1, further comprising: selfing the first progeny plant and detecting silencing of dominant expression products in subsequent progeny plants which are derived from selfing the first progeny plant.

52. The method of claim 51, further comprising: cloning a silenced nucleic acid encoding a dominant expression product.

53. The method of claim 51, further comprising: introducing a heterologous nucleic acid that results in expression of dominant expression products from silenced genes.

54. The method of claim 53, wherein the heterologous nucleic acid encodes one or more of: a transcription factor which activates a promoter from a silenced gene; a nucleic acid encoded by the silenced gene under the control of a heterologous promoter; and, a nucleic acid homologous to the silenced gene with at least one [region of difference] with the silenced gene, which homologous nucleic acid can recombine with the silenced gene to produce a modified gene.

55. The method of claim 1, further comprising:
testing the first progeny plant or a subsequent progeny plant thereof for a desired trait.

56. The method of claim 1, further comprising:
testing the first progeny plant, or a subsequent progeny plant thereof, for a desired phenotypic trait;
comparing the phenotypic trait between the first progeny plant, or the subsequent progeny plant, to a selected hybrid plant;
comparing an expression profile of the selected hybrid plant to an expression profile of the first progeny plant, or the subsequent progeny plant; and, cloning at least one nucleic acid which is differentially expressed between the selected hybrid plant and the first progeny plant, or the subsequent progeny plant.

57. The method of claim 56, further comprising transducing the at least one nucleic acid into a selected plant to produce a transgenic plant.

58. The method of claim 1, wherein the first and second representative samples are from an immature tissue of the first progeny plant.

59. The method of claim 58, wherein the immature tissue is an immature ear of the plant, or a seedling plant.

60. A method of identifying plant crosses with an increase in probability for heterosis in progeny plants, comprising:
(i) comparing expression profiles for a plurality of plants; and (ii) determining, by pair-wise comparisons of the expression profiles, which crosses will produce at least one of the following:
(a) progeny with a selected or optimal number of expression products; or, (b) progeny with a selected number or type of expression products that display a dominant, additive, overdominant or underdominant expression pattern.

61. The method of claim 60, further comprising making identified plant crosses to produce progeny plants.

62. The method of claim 60, further comprising making identified plant crosses to produce progeny plants, which progeny plants are tested for one or more desired trait.

63. The method of claim 60, wherein crosses are identified which maximize the number of expression products in potential progeny, or which maximize the number of dominant expression products in potential progeny, or which maximize the number of additive expression products in potential progeny, or which minimize the number of over-dominant expression products in potential progeny, or which minimize the number of under-dominant expression products in potential progeny.

64. The method of claim 60, wherein:
the plants are inbred plants, hybrid plants, or transgenic plants; and, the plants are selected from: plants in the families Gramineae, Compositae, and Leguminosae; or, the plants are selected from: Zea mays, rice, soybean, sorghum, wheat, oats, barley, millet, sunflower, and canola.

65. The method of claim 60, wherein the expression profiles are compiled in a database.

66. The method of claim 60, wherein a matrix of possible pair-wise expression profile combinations for the plants is generated.

67. The method of claim 60, wherein the expression profiles are compiled in a database in a computer and a matrix of possible pair-wise expression profile combinations for the plants is considered using an integrated system comprising a computer.

68. The method of claim 60, further comprising: selecting a subset of potential crosses from all of the possible pair-wise comparisons which exhibit a maximal number of expression profile differences.
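
The pair-wise matrix of claims 66 to 68 could be sketched as below. This is a minimal illustration that assumes each expression profile can be reduced to a set of expressed product identifiers, and that the "number of expression profile differences" is the count of products expressed in exactly one of the two plants. The claims fix neither choice. The function names and the example data are hypothetical.

```python
from itertools import combinations

def pairwise_differences(profiles):
    """Return {(plant_a, plant_b): number of differing expression products}.

    `profiles` maps a plant name to the set of product identifiers it expresses.
    The difference count is the size of the symmetric difference of the two sets.
    """
    return {
        (a, b): len(profiles[a] ^ profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }

def top_crosses(profiles, n=1):
    """Return the n candidate crosses with the most profile differences."""
    diffs = pairwise_differences(profiles)
    # Sort by descending difference count, then by name for a stable order.
    return sorted(diffs, key=lambda pair: (-diffs[pair], pair))[:n]

# Made-up profiles, for illustration only.
example_profiles = {
    "A": {"g1", "g2", "g3"},
    "B": {"g3", "g4"},
    "C": {"g1", "g2"},
}
matrix = pairwise_differences(example_profiles)
best = top_crosses(example_profiles, n=1)
```

With quantitative profiles, a distance measure over expression levels could replace the set difference. That would be a further assumption beyond the claim language.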

69. The method of claim 60, further comprising: selecting a subset of potential crosses from all of the possible pair-wise comparisons which exhibit a maximal number of expression profile differences, wherein at least a plurality of the possible pair-wise comparisons are for plants from different heterotic groups.

70. The method of claim 60, wherein the pair-wise comparisons are considered to identify crosses from the same heterotic group.

71. The method of claim 60, further comprising:
(iii) identifying crosses where:
the sum of:
(a) expression products produced in a first plant from a first heterotic group (A j) which are not expressed in a second plant from the first heterotic group (A) to which the first plant is crossed (A k), and which are not expressed in a selected third plant from a second heterotic group (B); plus (b) the expression products produced in A k which are not produced in A j and which are not produced in B;
is optimized.

72. The method of claim 71, further comprising making a cross identified in (iii).

73. The method of claim 71, wherein optimization is made by:
determining all possible pair-wise combinations from the first heterotic group and identifying the cross which results in the largest sum of expression products; or, determining all possible pair-wise combinations from the first heterotic group and identifying crosses which result in a hybrid progeny (A i x A j) with a maximal number of differences as compared to B; or, determining all possible pair-wise combinations from the first heterotic group and identifying crosses which result in the hybrid progeny (A i x A j) having a greater number of differences with B than the number of differences between B and A i or B and A j.
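
The sum recited in claim 71, and the first optimization alternative of claim 73 (largest sum over all pair-wise combinations within the first heterotic group), can be illustrated with set arithmetic. This is a sketch that assumes profiles are sets of expressed product identifiers and that "optimized" means maximized. The names `complementation_score` and `best_within_group_cross` and the example data are hypothetical.

```python
from itertools import combinations

def complementation_score(a_j, a_k, b):
    """Sum of claim 71: (a) products in A j but not in A k or B,
    plus (b) products in A k but not in A j or B."""
    return len(a_j - a_k - b) + len(a_k - a_j - b)

def best_within_group_cross(group_a, tester_b):
    """Score every pair within heterotic group A against tester B.

    Returns (best_pair, best_score, all_scores). Ties go to the first
    pair in sorted name order.
    """
    scores = {
        (j, k): complementation_score(group_a[j], group_a[k], tester_b)
        for j, k in combinations(sorted(group_a), 2)
    }
    best_pair = max(scores, key=scores.get)
    return best_pair, scores[best_pair], scores

# Made-up profiles, for illustration only.
group_a = {
    "A1": {"g1", "g2", "g5"},
    "A2": {"g2", "g3"},
    "A3": {"g1", "g2", "g7"},
}
tester_b = {"g5", "g6"}
pair, score, all_scores = best_within_group_cross(group_a, tester_b)
```

The other two alternatives of claim 73 compare a predicted hybrid profile with B. Modeling that would require a further assumption about how parental profiles combine in the hybrid, so it is not sketched here.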

74. The method of claim 73, further comprising selecting self- or back-crossed progeny derived from the A i x A j hybrid that:
retain a set of expression products defined by the sum of expression products expressed in A i (but not A j or B) and A j (but not A i or B); or which show a larger number of expression products expressed in a topcross with B than does either A i or A j when topcrossed with B.

75. A method of identifying a source of a test plant, comprising:
profiling expression of a representative sample of expression products from the test plant; and, comparing the resulting test expression profile to a database of known expression profiles for plants from known inbred or hybrid strains.

76. The method of claim 75, wherein the expression profile is for a selected tissue and the database of expression profiles comprises expression profiles for the same tissue from the known inbred or hybrid strains.

77. The method of claim 75, wherein the database of expression profiles is used to provide a matrix of pair-wise comparisons for potential progeny from the expression profiles in the database, which matrix of pair-wise comparisons is compared to the test expression profile.

78. The method of claim 75, wherein the source identified is a sub-portion of the total expression profile, which subportion corresponds to a unique marker for a specific parental strain.
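
The comparison step of claim 75 could be sketched as a nearest-profile lookup. The claims do not name a similarity measure, so the Jaccard index over sets of expressed product identifiers is assumed here purely for illustration. The function names and the example database are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: shared items over all items."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def identify_source(test_profile, database):
    """Return (strain_name, similarity) for the closest known profile.

    `database` maps a known inbred or hybrid strain name to its set of
    expressed product identifiers.
    """
    best = max(database, key=lambda name: jaccard(test_profile, database[name]))
    return best, jaccard(test_profile, database[best])

# Made-up database and test profile, for illustration only.
known = {
    "inbred_X": {"g1", "g2", "g3"},
    "inbred_Y": {"g3", "g4"},
    "hybrid_XY": {"g1", "g2", "g3", "g4"},
}
source, similarity = identify_source({"g1", "g2", "g4"}, known)
```

As claim 76 indicates, such a comparison is only meaningful when the test profile and the database profiles come from the same tissue.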