CA2473555A1 - Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals - Google Patents

Publication number
CA2473555A1
CA2473555A1 CA002473555A CA2473555A CA2473555A1 CA 2473555 A1 CA2473555 A1 CA 2473555A1 CA 002473555 A CA002473555 A CA 002473555A CA 2473555 A CA2473555 A CA 2473555A CA 2473555 A1 CA2473555 A1 CA 2473555A1
Authority
CA
Canada
Prior art keywords
seq
polynucleotide
polypeptide
plant
domesticated
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002473555A
Other languages
French (fr)
Inventor
Walter Messier
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Evolutionary Genomics LLC
Original Assignee
Individual
Priority claimed from US10/079,042 (US7252966B2)
Application filed by Individual
Publication of CA2473555A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/6895 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8261 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809 Methods for determination or identification of nucleic acids involving differential detection
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/13 Plant traits
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
    • Y02A40/146 Genetically Modified [GMO] plants, e.g. transgenic plants

Abstract

The present invention provides methods for identifying polynucleotide and polypeptide sequences which may be associated with commercially or aesthetically relevant traits in domesticated plants or animals. The methods employ comparison of homologous genes from the domesticated organism and its ancestor to identify evolutionarily significant changes and evolutionarily neutral changes. Sequences thus identified may be useful in enhancing commercially or aesthetically desirable traits in domesticated organisms or their wild ancestors.

Description

JUMBO APPLICATIONS/PATENTS
THIS SECTION OF THE APPLICATION/PATENT CONTAINS MORE THAN ONE VOLUME.

NOTE: For additional volumes, please contact the Canadian Patent Office.
METHODS TO IDENTIFY EVOLUTIONARILY SIGNIFICANT
CHANGES IN POLYNUCLEOTIDE AND POLYPEPTIDE
SEQUENCES IN DOMESTICATED PLANTS AND ANIMALS
TECHNICAL FIELD
This invention relates to using molecular and evolutionary techniques to identify polynucleotide and polypeptide sequences corresponding to commercially or aesthetically relevant traits in domesticated plants and animals.
BACKGROUND ART
Humans have bred plants and animals for thousands of years, selecting for certain commercially valuable and/or aesthetic traits. Domesticated plants differ from their wild ancestors in such traits as yield, short day length flowering, protein and/or oil content, ease of harvest, taste, disease resistance and drought resistance. Domesticated animals differ from their wild ancestors in such traits as fat and/or protein content, milk production, docility, fecundity and time to maturity. At the present time, most genes underlying the above differences are not known, nor, as importantly, are the specific changes that have evolved in these genes to provide these capabilities. Understanding the basis of these differences between domesticated plants and animals and their wild ancestors will provide useful information for maintaining and enhancing those traits. In the case of crop plants, identification of the specific genes that control desired traits will allow direct and rapid improvement in a manner not previously possible.
Although comparison of homologous genes or proteins between domesticated species and their wild ancestors may provide useful information with respect to conserved molecular sequences and functional features, this approach is of limited use in identifying genes whose sequences have changed due to human imposed selective pressures. With the advent of sophisticated algorithms and analytical methods, much more information can be teased out of DNA sequence changes with regard to which genes have been positively selected.
The most powerful of these methods, "KA/Ks," involves pairwise comparisons between aligned protein-coding nucleotide sequences of the ratios of nonsynonymous nucleotide substitutions per nonsynonymous site (KA) to synonymous substitutions per synonymous site (Ks) (where nonsynonymous means substitutions that change the encoded amino acid and synonymous means substitutions that do not change the encoded amino acid).
"KA/Ks-type methods" include this and similar methods.

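As a rough illustration of the synonymous/nonsynonymous distinction defined above, the following Python sketch classifies codon-level differences between two aligned, in-frame coding sequences. It is a simplification and not the KA/Ks calculation itself: it counts differing codons rather than estimating substitutions per site, and it applies no correction for multiple substitutions at one position. The function name `count_codon_changes` and the example sequences are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both inputs must be aligned, in-frame coding sequences of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a) - len(seq_a) % 3, 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: not classified in this sketch
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical example: TTT->TTC keeps Phe; GGA->GCA changes Gly to Ala.
print(count_codon_changes("ATGTTTGGA", "ATGTTCGCA"))  # prints (1, 1)
```

A full KA/Ks-type analysis would additionally normalize these counts by the numbers of nonsynonymous and synonymous sites, which this sketch does not attempt.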
These methods have been used to demonstrate the occurrence of Darwinian (i.e., natural) molecular-level positive selection, resulting in amino acid differences in homologous proteins. Several groups have used such methods to document that a particular protein has evolved more rapidly than the neutral substitution rate, thus supporting the existence of Darwinian molecular-level positive selection. For example, McDonald and Kreitman (1991) Nature 351:652-654, propose a statistical test of the neutral protein evolution hypothesis based on comparison of the number of amino acid replacement substitutions to synonymous substitutions in the coding region of a locus. When they apply this test to the Adh locus of three Drosophila species, they conclude that it shows instead that the locus has undergone adaptive fixation of selectively advantageous mutations and that selective fixation of adaptive mutations may be a viable alternative to the clocklike accumulation of neutral mutations as an explanation for most protein evolution. Jenkins et al. (1995) Proc. R. Soc. Lond. B 261:203-207 use the McDonald & Kreitman test to investigate whether adaptive evolution is occurring in sequences controlling transcription (non-coding sequences). Nakashima et al. (1995) Proc. Natl. Acad. Sci. USA 92:5606-5609, use the method of Miyata and Yasunaga to perform pairwise comparisons of the nucleotide sequences of ten PLA2 isozyme genes from two snake species; this method involves comparing the number of nucleotide substitutions per site for the noncoding regions including introns (KN) and the KA and Ks. They conclude that the protein coding regions have been evolving at much higher rates than the noncoding regions including introns. The highly accelerated substitution rate is responsible for Darwinian molecular-level evolution of PLA2 isozyme genes to produce new physiological activities that must have provided strong selective advantage for catching prey or for defense against predators. Endo et al. (1996) Mol. Biol. Evol. 13(5):685-690 use the method of Nei and Gojobori, wherein dN is the number of nonsynonymous substitutions and ds is the number of synonymous substitutions, for the purpose of documenting natural selection on genes. Metz and Palumbi (1996) Mol. Biol. Evol. 13(2):397-406 use the McDonald & Kreitman (supra) test as well as a method attributed to Nei and Gojobori, Nei and Jin, and Kumar, Tamura, and Nei; examining the average proportions of Pn, the replacement substitutions per replacement site, and Ps, the silent substitutions per silent site, to look for evidence of positive selection on binding genes in sea urchins to investigate whether they have rapidly evolved as a prelude to species formation. Goodwin et al. (1996) Mol. Biol. Evol. 13(2):346-358 use similar methods to examine the evolution of a particular marine gene family and conclude that the methods provide important fundamental insights into how selection drives genetic divergence in an experimentally manipulatable system.

Edwards et al. (1995) use degenerate primers to pull out MHC loci from various species of birds and an alligator species, which are then analyzed by the Nei and Gojobori methods (dN:ds ratios) to extend MHC studies to nonmammalian vertebrates. Whitfield et al. (1993) Nature 364:713-715 use KA/Ks analysis to look for directional selection in the regions flanking a conserved region in the SRY gene (that determines male sex). They suggest that the rapid evolution of SRY could be a significant cause of reproductive isolation, leading to new species. Wettstein et al. (1996) Mol. Biol. Evol. 13(1):56-66 apply the MEGA program of Kumar, Tamura and Nei and phylogenetic analysis to investigate the diversification of MHC class I genes in squirrels and related rodents. Parham and Ohta (1996) Science 272:67-74 state that a population biology approach, including tests for selection as well as for gene conversion and neutral drift, is required to analyze the generation and maintenance of human MHC class I polymorphism. Hughes (1997) Mol. Biol. Evol. 14(1):1-5 compared over one hundred orthologous immunoglobulin C2 domains between human and rodent, using the method of Nei and Gojobori (dN:ds ratios) to test the hypothesis that proteins expressed in cells of the vertebrate immune system evolve unusually rapidly. Swanson and Vacquier (1998) Science 281:710-712 use dN:ds ratios to demonstrate concerted evolution between the lysin and the egg receptor for lysin and discuss the role of such concerted evolution in forming new species (speciation). Messier and Stewart (1997) Nature 385:151-154, used KA/Ks to demonstrate positive selection in primate lysozymes.
The genetic changes associated with domestication have been most extensively investigated in maize (the preferred agricultural term for corn) (Dorweiler (1993) Science 262:232-235). For maize (Zea mays ssp. mays), a small number of single-gene changes apparently accounts for all the differences between our present domesticated maize plant and its wild ancestor, teosinte (Zea mays ssp. parviglumis) (Dorweiler, 1993). QTL (quantitative trait locus) analysis has demonstrated (Doebley (1990) PNAS USA 87:9888-9892) that no more than fifteen genes control traits of interest in maize and explain the profound difference in morphology between maize and teosinte (Wang (1999) Nature 398:236-239).
Importantly, a similarly small number of genes may control traits of interest in other grass-derived crop plants, including rice, wheat, millet and sorghum (Paterson (1995) Science 269:1714-1718). In fact, for most of these relevant genes in maize, the homologous gene may control similar traits in other grass-derived crop plants (Paterson, 1995).
Thus, identification of these genes in one grass-derived crop plant would facilitate identification of homologous genes in all of the others.
As can be seen from the papers cited above, analytical methods of molecular evolution to identify rapidly evolving genes (KA/Ks-type methods) can be applied to achieve many different purposes, most commonly to confirm the existence of Darwinian molecular-level positive selection, but also to assess the frequency of Darwinian molecular-level positive selection, to elucidate mechanisms by which new species are formed, or to establish single or multiple origin for specific gene polymorphisms. What is clear from the papers cited above and others in the literature is that none of the authors applied KA/Ks-type methods to identify evolutionary changes in domesticated plants and animals brought about by artificial selective pressures. While Turcich et al. (1996) Sexual Plant Reproduction 9:65-74, describes the use of Ks analysis on plant genes, it is believed that no one has used KA/Ks-type analysis as a systematic tool for identifying in domesticated plants and animals those genes that contain evolutionarily significant sequence changes that can be exploited in the development, maintenance or enhancement of desirable commercial or aesthetic traits.
The identification in domesticated species of genes that have evolved to confer unique, enhanced or altered functions compared to homologous ancestral genes could be used to develop agents to modulate these functions. The identification of the underlying domesticated species genes and the specific nucleotide changes that have evolved, and the further characterization of the physical and biochemical changes in the proteins encoded by these evolved genes, could provide valuable information on the mechanisms underlying the desired trait. This valuable information could be applied to developing agents that further enhance the function of the target proteins. Alternatively, further engineering of the responsible genes could modify or augment the desired trait. Additionally, the identified genes may be found to play a role in controlling traits of interest in other domesticated plants. A
similar process can identify genes for traits of interest in domestic animals.
All references cited herein are hereby incorporated by reference in their entirety.
DISCLOSURE OF THE INVENTION
The subject invention concerns methods of identifying polynucleotides that control commercially valuable traits in domesticated plants or animals. These polynucleotides that, in accordance with the methods of the subject invention, are found to control commercially valuable traits can be used to further enhance those traits. Polynucleotides identified to control commercially valuable traits such as drought-, disease-, or stress-resistance or yield, protein content, short day length flowering, oil content, ease of harvest, taste, and the like can be used to develop compositions and methods to further enhance the commercial value of domesticated plants. While it is desired to identify polynucleotides that control valuable traits, it is challenging to identify such polynucleotides among the tens of thousands of genes in plant and animal genomes. The invention comprises narrowing the search for such polynucleotides by comparing the corresponding polynucleotide sequences of domesticated and ancestor organisms to select those sequences containing nucleotide changes that are evolutionarily significant, which is typically indicated by a Ka/Ks ratio of 1.0 or greater. For example, the subset of ancestor-modern plant polynucleotide pairs with Ka/Ks ratios of 1.0 should contain polynucleotides affected by neutral evolution, that is those for which the trait has not been under pressure, imposed by man or nature, to either be conserved or to change.
Such polynucleotides can then be tested for those encoding traits such as drought-, disease-, or stress-resistance, because these functions have been dramatically supplemented by domestication, alleviating natural selection pressures on these polynucleotides. The subset of ancestor-modern plant polynucleotide pairs with KA/Ks ratios greater than 1.0 should contain polynucleotides affected by selection. Such polynucleotides can then be tested for those encoding traits such as yield, protein content, short day length flowering, oil content, ease of harvest, taste, and the like, because these traits have been under intense, unidirectional, unremitting selective pressure by humans in the course of domestication of plants such as food crops.
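The partitioning described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the KA and Ks values are taken as already computed by some KA/Ks-type method, the gene identifiers and values in the example are invented, and the `tolerance` used to decide that a ratio is "about 1.0" is a hypothetical parameter, not a value given in this paragraph.

```python
def partition_by_ka_ks(pairs, tolerance=0.05):
    """Split gene pairs into 'selected' (ratio > 1.0) and 'neutral' (ratio ~ 1.0).

    `pairs` maps a gene identifier to a (ka, ks) tuple. Pairs with ks == 0
    have an undefined ratio and are reported separately.
    """
    selected, neutral, undefined = [], [], []
    for gene, (ka, ks) in pairs.items():
        if ks == 0:
            undefined.append(gene)
            continue
        ratio = ka / ks
        if abs(ratio - 1.0) <= tolerance:
            neutral.append(gene)    # candidate for relaxed-constraint traits
        elif ratio > 1.0:
            selected.append(gene)   # candidate for human-selected traits
        # ratios well below 1.0 fall in neither subset
    return selected, neutral, undefined


# Invented values for illustration only.
example = {
    "geneA": (0.030, 0.010),  # ratio 3.0
    "geneB": (0.020, 0.020),  # ratio 1.0
    "geneC": (0.002, 0.040),  # ratio 0.05
    "geneD": (0.010, 0.0),    # Ks of zero: ratio undefined
}
print(partition_by_ka_ks(example))
```

Reporting zero-Ks pairs separately, rather than treating them as infinitely large ratios, is a design choice of this sketch.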
Thus, in one embodiment, the present invention provides methods for identifying polynucleotide and polypeptide sequences having evolutionarily significant changes, which are associated with commercial or aesthetic traits in domesticated organisms including plants and animals. The invention uses comparative genomics to identify specific gene changes which may be associated with, and thus responsible for, structural, biochemical or physiological conditions, such as commercially or aesthetically relevant traits, and using the information obtained from these polynucleotide or polypeptide sequences to develop domesticated organisms with enhanced traits of interest.
In one preferred embodiment, a polynucleotide or polypeptide of a domesticated plant or animal has undergone artificial selection that resulted in an evolutionarily significant change present in the domesticated species that is not present in the wild ancestor. One example of this embodiment is that the polynucleotide or polypeptide may be associated with enhanced crop yield as compared to the ancestor. Other examples include short day length flowering (i.e., flowering only if the daily period of light is shorter than some critical length), protein content, oil content, ease of harvest, and taste. The present invention can thus be useful in gaining insight into the genes and/or molecular mechanisms that underlie functions or traits in domesticated organisms. This information can be useful in designing the polynucleotide so as to further enhance the function or trait. For example, a polynucleotide determined to be responsible for improved crop yield could be subjected to random or directed mutagenesis, followed by testing of the mutant genes to identify those which further enhance the trait.
Accordingly, in one aspect, methods are provided for identifying a polynucleotide sequence encoding a polypeptide of a domesticated organism (e.g., a plant or animal), wherein the polypeptide may be associated with a commercially or aesthetically relevant trait that is unique, enhanced or altered in the domesticated organism as compared to the ancestor of the domesticated organism, comprising the steps of: a) comparing protein-coding nucleotide sequences of said domesticated organism to protein-coding nucleotide sequences of said wild ancestor; and b) selecting a polynucleotide sequence in the domesticated organism that contains a nucleotide change as compared to a corresponding sequence in the wild ancestor, wherein said change is evolutionarily significant.
In another aspect of the invention, methods are provided for identifying an evolutionarily significant change in a protein-coding nucleotide sequence of a domesticated organism (e.g., a plant or animal), comprising the steps of: a) comparing protein-coding nucleotide sequences of the domesticated organism to corresponding sequences of a wild ancestor of the domesticated organism; and b) selecting a polynucleotide sequence in said domesticated organism that contains a nucleotide change as compared to the corresponding sequence of the wild ancestor, wherein the change is evolutionarily significant.
In some embodiments, the nucleotide change identified by any of the methods described herein is a non-synonymous substitution. In some embodiments, the evolutionary significance of the nucleotide change is determined according to the non-synonymous substitution rate (KA) of the nucleotide sequence. In some embodiments, the evolutionarily significant changes are assessed by determining the KA/Ks ratio between the domesticated organism polynucleotide and the corresponding ancestral polynucleotide. In some of these embodiments, preferably the ratio is at least about 0.75, or more preferably 1.0. With increasing preference, the ratio is at least about 1.0, 1.25, 1.50, 2.00, or greater.
In another aspect, the invention provides a method of identifying an agent which may modulate the relevant trait in the domesticated organism, said method comprising contacting at least one candidate agent with a cell, model system or transgenic plant or animal that expresses the polynucleotide sequence having the evolutionarily significant change, or a composition comprising the evolutionarily significant polypeptide wherein the agent is identified by its ability to modulate function or synthesis of the polypeptide.
Also provided is a method for large scale sequence comparison between protein-coding nucleotide sequences of a domesticated organism and protein-coding sequences from a wild ancestor, said method comprising: a) aligning the domesticated organism sequences with corresponding sequences from the wild ancestor according to sequence homology; and b) identifying any nucleotide changes within the domesticated organism's sequences as compared to the homologous sequences from the wild ancestor organism.
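Step b) of this comparison can be sketched in Python as shown below. The sketch assumes step a) has already been performed by an external alignment tool, so that the two inputs are rows of one pairwise alignment of equal length; the function name `find_nucleotide_changes`, the `-` gap character and the example sequences are assumptions made for illustration.

```python
def find_nucleotide_changes(domesticated, ancestor, gap="-"):
    """List (position, ancestor_base, domesticated_base) for each mismatch.

    Inputs are two rows of a pairwise alignment (equal length). Positions are
    1-based alignment columns; columns with a gap in either row are skipped.
    """
    if len(domesticated) != len(ancestor):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    for pos, (d, a) in enumerate(zip(domesticated.upper(), ancestor.upper()), start=1):
        if gap in (d, a):
            continue  # insertion/deletion column: not a substitution
        if d != a:
            changes.append((pos, a, d))
    return changes


# Hypothetical aligned fragments: one substitution (column 4) and one gap (column 6).
print(find_nucleotide_changes("ATGCC-TA", "ATGACGTA"))  # prints [(4, 'A', 'C')]
```

For a large-scale comparison, the same function would be applied to each aligned homologous pair in turn.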
In another aspect, the subject invention provides a method for correlating an evolutionarily significant nucleotide change to a commercially or aesthetically relevant trait that is unique, enhanced or altered in a domesticated organism, comprising: a) identifying a nucleotide sequence having an evolutionarily significant change according to the methods described herein; and b) analyzing the functional effect of the presence or absence of the identified sequence in the domesticated organism or in a model system.
The domesticated plants used in the subject methods can be maize, rice, tomatoes, potatoes or any domesticated plant for which the wild ancestor is extant and known. For example, the ancestor of maize is teosinte (Zea mays parviglumis); ancestors of wheat are Triticum monococcum, T. speltoides and Aegilops tauschii; and an ancestor of rice is O. rufipogon. The relevant trait can be any commercially or aesthetically relevant trait such as yield, short day length flowering, protein content, oil content, drought resistance, taste, ease of harvest or disease resistance. In a preferred embodiment, the domesticated plant is rice, and the relevant trait is yield.
In another embodiment of the invention, methods for the identification of polynucleotides associated with stress-resistance in an ancestor organism are provided. In this embodiment, a polynucleotide in the domesticated organism has undergone neutral evolution relative to a polynucleotide in the ancestor which is or is suspected of being associated with stress-resistance, whereby mutations have accumulated in the domesticated organism's polynucleotide. The stress-resistance trait in the ancestor may be unique, enhanced or altered relative to the domesticated organism.
The method for identifying the polynucleotide sequence comprises a) comparing polypeptide-coding nucleotide sequences of the domesticated organism to polypeptide coding nucleotide sequences of the wild ancestor; and b) selecting a polynucleotide sequence in the ancestor organism that contains at least one nucleotide change as compared to a corresponding sequence in the domesticated organism, wherein the change is evolutionarily neutral. The stress-resistance trait may be drought resistance, disease resistance, pest resistance, high salt level resistance or other stress-resistance traits of commercial interest.
Also provided is a method for identifying an evolutionarily neutral change in a polypeptide-coding polynucleotide sequence of a wild ancestor of a domesticated organism comprising: a) comparing polypeptide-coding polynucleotide sequences of said wild ancestor to corresponding sequences of said domesticated organism; and b) selecting a polynucleotide sequence in the domesticated organism that contains a nucleotide change as compared to the corresponding sequence of the wild ancestor, wherein the change is evolutionarily neutral and the polynucleotide is associated with a stress-resistance trait in the wild ancestor.
Neutral evolution is typically indicated by a KA/Ks ratio of between about 0.75 and 1.25, more preferably between about 0.9 and 1.1, and most preferably about 1.0. The KA/Ks comparison may be calculated as ancestor to domestic organism, or domestic to ancestor organism.
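The nested ranges above can be expressed as a small Python check. This is only a sketch: the bounds are treated as inclusive, which is an assumption because the text says "about", and the function name `neutral_band` and its labels are hypothetical.

```python
# Nested ranges from the text, narrowest first (bounds treated as inclusive,
# which is an assumption; the text says "about").
NEUTRAL_BANDS = [("more preferred", 0.9, 1.1), ("typical", 0.75, 1.25)]


def neutral_band(ka, ks):
    """Return the narrowest band containing KA/Ks, or None if not neutral."""
    if ks <= 0:
        return None  # ratio undefined without synonymous substitutions
    ratio = ka / ks
    for label, low, high in NEUTRAL_BANDS:
        if low <= ratio <= high:
            return label
    return None


print(neutral_band(0.02, 0.02))   # ratio 1.0
print(neutral_band(0.016, 0.02))  # ratio about 0.8
print(neutral_band(0.05, 0.02))   # ratio 2.5, outside both ranges
```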
In another aspect, the invention provides for a method of identifying an agent that may modulate a stress-resistance trait in an organism (ancestor or domesticated organism), wherein at least one candidate agent is contacted with the ancestor, domesticated organism or with a cell or transgenic organism that expresses the polynucleotide sequence associated with stress-resistance, wherein the agent is identified by its ability to modulate the function of the polypeptide encoded by the polynucleotide.
Also provided is a method for large scale sequence comparison between polypeptide-coding nucleotide sequences of a wild ancestor and those of a domesticated organism, wherein the ancestor polypeptide confers or is suspected of conferring a stress-related trait that is unique, enhanced or altered in the wild ancestor as compared to the domesticated organism, comprising: a) aligning the ancestor and domesticated sequences according to sequence homology, and b) identifying any nucleotide changes in the domesticated organism sequence as compared to the ancestor homologous sequence, wherein said changes are evolutionarily neutral.
In another aspect, the subject invention provides a method for correlating an evolutionarily neutral nucleotide change to a commercially or aesthetically relevant trait that is unique, enhanced or altered in a domesticated organism, comprising: a) identifying a nucleotide sequence having an evolutionarily neutral change according to the methods described herein; and b) analyzing the functional effect of the presence or absence of the identified sequence in the domesticated organism or in a model system.

BRIEF DESCRIPTION OF THE FIGURES
Figure 1 shows a nucleotide alignment of O. sativa cv. Nipponbare and O. rufipogon (NSGC5953) for EG307. This alignment includes untranslated regions (UTR) on the 5' end and notes the start and stop codons for this gene.
Figure 2 shows a protein alignment of O. sativa cv. Nipponbare and O. rufipogon (NSGC5953) for EG307. This alignment includes the complete coding (CDS) region.
Figure 3 shows a nucleotide sequence of EG307 in Zea mays mays and Zea mays parviglumis (teosinte, strain Benz967) for the coding region of the gene. Start and stop codons are identified.
Figure 4 shows a protein alignment of Zea mays mays and Zea mays parviglumis EG307. This alignment includes the full-length deduced protein sequence.
Figure 5 shows markers CD01387 and RZ672 mapped to five different genetic rice maps, indicating that the range of these markers is consistent among the five maps. EG307 is upstream of CD01387 (about 200kb) and a QTL for 1000 Grain Weight is associated with marker RZ672.
Figure 6 shows the nucleotide alignment of O. sativa (strain Nipponbare) and O. rufipogon (strain 5498) for EG117, and indicates there are three nonsynonymous changes.
Figure 7 shows the protein alignment of O. sativa (strain Nipponbare) and O. rufipogon (strain 5498) for EG117. This alignment includes the partial CDS region with the stop codon. The three amino acid differences between O. sativa and O. rufipogon are shown in bold.
Figure 8 shows the protein alignment of O. sativa (strain Nipponbare) and Arabidopsis PTR2-B (histidine transporting protein, NP_178313).
DETAILED DESCRIPTION OF THE INVENTION
In one embodiment, the present invention utilizes comparative genomics to identify positively selected genes and specific gene changes which are associated with, and thus may contribute to or be responsible for, commercially or aesthetically relevant traits in domesticated organisms (e.g., plants and animals).
In another embodiment, the invention identifies evolutionarily neutral genes and gene changes that are associated with stress-resistance in ancestors of domesticated organisms.
The practice of the present invention employs, unless otherwise indicated, conventional techniques of molecular biology, genetics and molecular evolution, which are within the skill of the art. Such techniques are explained fully in the literature, such as:

"Molecular Cloning: A Laboratory Manual", second edition (Sambrook et al., 1989);
"Oligonucleotide Synthesis" (M.J. Gait, ed., 1984); "Current Protocols in Molecular Biology"
(F.M. Ausubel et al., eds., 1987); "PCR: The Polymerase Chain Reaction", (Mullis et al., eds., 1994); "Molecular Evolution", (Li, 1997).
I. Definitions
As used herein, a "polynucleotide" refers to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides, or analogs thereof. This term refers to the primary structure of the molecule, and thus includes double- and single-stranded DNA, as well as double- and single-stranded RNA. It also includes modified polynucleotides such as methylated and/or capped polynucleotides, polynucleotides containing modified bases, backbone modifications, and the like. The terms "polynucleotide" and "nucleotide sequence" are used interchangeably.
As used herein, a "gene" refers to a polynucleotide or portion of a polynucleotide comprising a sequence that encodes a protein. It is well understood in the art that a gene also comprises non-coding sequences, such as 5' and 3' flanking sequences (such as promoters, enhancers, repressors, and other regulatory sequences) as well as introns.
The terms "polypeptide," "peptide," and "protein" are used interchangeably herein to refer to polymers of amino acids of any length. These terms also include proteins that are post-translationally modified through reactions that include glycosylation, acetylation and phosphorylation.
The term "domesticated organism" refers to an individual living organism or population of same, a species, subspecies, variety, cultivar or strain, that has been subjected to artificial selection pressure and developed a commercially or aesthetically relevant trait. In some preferred embodiments, the domesticated organism is a plant selected from the group consisting of maize, wheat, rice, sorghum, tomato or potato, or any other domesticated plant of commercial interest, where an ancestor is known. A "plant" is any plant at any stage of development, particularly a seed plant.
In other preferred embodiments, the domesticated organism is an animal selected from the group consisting of cattle, horses, pigs, cats and dogs. A domesticated organism and its ancestor may be related as different species, subspecies, varieties, cultivars or strains or any combination thereof.
The term "wild ancestor" or "ancestor" means a forerunner or predecessor organism, species, subspecies, variety, cultivar or strain from which a domesticated organism, species, subspecies, variety, cultivar or strain has evolved. A domesticated organism can have one or more than one ancestor. Typically, domesticated plants can have one or a plurality of ancestors, while domesticated animals usually have only a single ancestor.
The term "commercially or aesthetically relevant trait" is used herein to refer to traits that exist in domesticated organisms such as plants or animals whose analysis could provide information (e.g., physical or biochemical data) relevant to the development of improved organisms or of agents that can modulate the polypeptide responsible for the trait, or the respective polynucleotide. The commercially or aesthetically relevant trait can be unique, enhanced or altered relative to the ancestor. By "altered," it is meant that the relevant trait differs qualitatively or quantitatively from traits observed in the ancestor.
The term "KA/Ks-type methods" means methods that evaluate differences, frequently (but not always) shown as a ratio, between the number of nonsynonymous substitutions and synonymous substitutions in homologous genes (including the more rigorous methods that determine non-synonymous and synonymous sites). These methods are designated using several systems of nomenclature, including but not limited to KA/Ks, dN/ds, DN/Ds.
The terms "evolutionarily significant change" and "adaptive evolutionary change" refer to one or more nucleotide or peptide sequence change(s) between two organisms, species, subspecies, varieties, cultivars and/or strains that may be attributed to either relaxation of selective pressure or positive selective pressure. One method for determining the presence of an evolutionarily significant change is to apply a KA/Ks-type analytical method, such as to measure a KA/Ks ratio. Typically, a KA/Ks ratio of 1.0 or greater is considered to be an evolutionarily significant change.
Strictly speaking, KA/Ks ratios of exactly 1.0 are indicative of relaxation of selective pressure (neutral evolution), and KA/Ks ratios greater than 1.0 are indicative of positive selection. However, it is commonly accepted that the ESTs in GenBank and other public databases often suffer from some degree of sequencing error, and even a few incorrect nucleotides can influence KA/Ks ratios. For this reason, polynucleotides with KA/Ks ratios as low as 0.75 can be carefully resequenced and re-evaluated for relaxation of selective pressure (neutral evolutionarily significant change), positive selection pressure (positive evolutionarily significant change), or negative selective pressure (evolutionarily conservative change).
The term "positive evolutionarily significant change" means an evolutionarily significant change in a particular organism, species, subspecies, variety, cultivar or strain that results in an adaptive change that is positive as compared to other related organisms. An example of a positive evolutionarily significant change is a change that has resulted in enhanced yield in crop plants. As stated above, positive selection is indicated by a KA/Ks ratio greater than 1.0. With increasing preference, the KA/Ks value is greater than 1.25, 1.5 and 2.0.
The term "neutral evolutionarily significant change" refers to a polynucleotide or polypeptide change that appears in a domesticated organism relative to its ancestral organism, and which has developed under neutral conditions. A neutral evolutionary change is evidenced by a KA/Ks value of between about 0.75 and 1.25, preferably between about 0.9 and 1.1, and most preferably equal to about 1.0. Also, in the case of neutral evolution, there is no "directionality" to be inferred. The gene is free to accumulate changes without constraint, so both the ancestral and domesticated versions are changing with respect to one another.
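The numerical cut-offs given in these definitions can be illustrated with a short sketch. The Python function below is a hypothetical helper and not part of the disclosed method. Because the stated ranges for positive and neutral changes overlap between 1.0 and 1.25, it adopts one possible reading, assumed here for illustration only, in which any ratio in the 0.75 to 1.25 band is flagged for resequencing and re-evaluation rather than assigned outright.

```python
def classify_ka_ks(ratio: float) -> str:
    """Label a KA/Ks ratio using one reading of the approximate cut-offs.

    Assumption: the 0.75-1.25 band is treated as "re-evaluate" because the
    text allows such ratios to reflect neutral evolution or sequencing error.
    """
    if ratio < 0:
        raise ValueError("a KA/Ks ratio cannot be negative")
    if ratio > 1.25:
        return "positive"        # candidate positive evolutionarily significant change
    if ratio >= 0.75:
        return "re-evaluate"     # candidate neutral change; resequence and retest
    return "conservative"        # consistent with negative (purifying) selection


for value in (0.3, 0.8, 1.0, 1.6):
    print(value, classify_ka_ks(value))
```

A label from such a helper would only prioritize candidates; the statistical evaluation described elsewhere in this specification would still be needed.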
The term "resistant" means that an organism exhibits an ability to avoid, or diminish the extent of, a disease condition and/or development of the disease, preferably when compared to non-resistant organisms.
The term "susceptibility" means that an organism fails to avoid, or diminish the extent of, a disease condition and/or development of the disease condition, preferably when compared to an organism that is known to be resistant.
It is understood that resistance and susceptibility vary from individual to individual, and that, for purposes of this invention, these terms also apply to a group of individuals within a species, and comparisons of resistance and susceptibility generally refer overall to inter-specific differences, although comparisons within species may be used.
Taxonomic classification of wild relatives is fairly changeable. Thus, a species difference based on a taxonomic classification may change to an intra-specific difference if taxonomic classifications are changed.
The term "stress-resistance" refers to the ability to withstand drought, disease, pests (including, but not limited to, insects, animal herbivores, and microbes), high salt levels, and other adverse stimuli, internal or external, that tend to disturb the plant's homeostasis, and may lead to disorder, disease, or death if uncorrected.
The terms "homologous," "homologue" and "ortholog" are known and well understood in the art and refer to related sequences that share a common ancestor, as determined based on degree of sequence identity. These terms describe the relationship between a gene found in one species, subspecies, variety, cultivar or strain and the corresponding or equivalent gene in another species, subspecies, variety, cultivar or strain.
For purposes of this invention homologous sequences are compared. "Homologous sequences" or "homologues" or "orthologs" are thought, believed, or known to be functionally related. A functional relationship may be indicated in any one of a number of ways, including, but not limited to, (a) degree of sequence identity; (b) same or similar biological function.
Preferably, both (a) and (b) are indicated. The degree of sequence identity may vary, but is preferably at least 50% (when using standard sequence alignment programs known in the art), more preferably at least 60%, more preferably at least about 75%, more preferably at least about 85%.
Homology can be determined using software programs readily available in the art, such as those discussed in Current Protocols in Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30, section 7.7.18, Table 7.7.1. Preferred alignment programs are MacVector (Oxford Molecular Ltd, Oxford, U.K.) and ALIGN Plus (Scientific and Educational Software, Pennsylvania). Another preferred alignment program is Sequencher (Gene Codes, Ann Arbor, Michigan), using default parameters.
The term "nucleotide change" refers to nucleotide substitution, deletion, and/or insertion, as is well understood in the art.
"Housekeeping genes" is a term well understood in the art and means those genes associated with general cell function, including but not limited to growth, division, stasis, metabolism, and/or death. "Housekeeping" genes generally perform functions found in more than one cell type. In contrast, cell-specific genes generally perform functions in a particular cell type and/or class.
The term "agent", as used herein, means a biological or chemical compound such as a simple or complex organic or inorganic molecule, a peptide, a protein or an oligonucleotide that modulates the function of a polynucleotide or polypeptide. A vast array of compounds can be synthesized, for example oligomers, such as oligopeptides and oligonucleotides, and synthetic organic and inorganic compounds based on various core structures, and these are also included in the term "agent". In addition, various natural sources can provide compounds for screening, such as plant or animal extracts, and the like. Compounds can be tested singly or in combination with one another.
The term "to modulate function" of a polynucleotide or a polypeptide means that the function of the polynucleotide or polypeptide is altered when compared to not adding an agent. Modulation may occur on any level that affects function. A
polynucleotide or polypeptide function may be direct or indirect, and measured directly or indirectly.
A "function of a polynucleotide" includes, but is not limited to, replication;
translation; expression pattern(s). A polynucleotide function also includes functions associated with a polypeptide encoded within the polynucleotide. For example, an agent which acts on a polynucleotide and affects protein expression, conformation, folding (or other physical characteristics), binding to other moieties (such as ligands), activity (or other functional characteristics), regulation and/or other aspects of protein structure or function is considered to have modulated polynucleotide function.
A "function of a polypeptide" includes, but is not limited to, conformation, folding (or other physical characteristics), binding to other moieties (such as ligands), activity (or other functional characteristics), and/or other aspects of protein structure or functions. For example, an agent that acts on a polypeptide and affects its conformation, folding (or other physical characteristics), binding to other moieties (such as ligands), activity (or other functional characteristics), and/or other aspects of protein structure or functions is considered to have modulated polypeptide function. The ways that an effective agent can act to modulate the function of a polypeptide include, but are not limited to, 1) changing the conformation, folding or other physical characteristics; 2) changing the binding strength to its natural ligand or changing the specificity of binding to ligands; and 3) altering the activity of the polypeptide.
The term "target site" means a location in a polypeptide which can be a single amino acid and/or a part of a structural and/or functional motif, e.g., a binding site, a dimerization domain, or a catalytic active site. Target sites may be useful for direct or indirect interaction with an agent, such as a therapeutic agent.
The term "molecular difference" includes any structural and/or functional difference.
Methods to detect such differences, as well as examples of such differences, are described herein.
A "functional effect" is a term well known in the art, and means any effect which is exhibited on any level of activity, whether direct or indirect.
The term "ease of harvest" refers to plant characteristics or features that facilitate manual or automated collection of structures or portions (e.g., fruit, leaves, roots) for consumption or other commercial processing.
The term "yield" refers to the amount of plant or animal tissue or material that is available for use by humans for food, therapeutic, veterinary or other markets.
The term "enhanced economic productivity" refers to the ability to modulate a commercially or aesthetically relevant trait so as to improve desired features. Increased yield and enhanced stress resistance are two examples of enhanced economic productivity.

II. General Procedures Known in the Art
For the purposes of this invention, the source of the polynucleotide from the domesticated plant or animal or its ancestor can be any suitable source, e.g., genomic sequences or cDNA sequences. Preferably, cDNA sequences are compared. Protein-coding sequences can be obtained from available private, public and/or commercial databases such as those described herein. These databases serve as repositories of the molecular sequence data generated by ongoing research efforts. Alternatively, protein-coding sequences may be obtained from, for example, sequencing of cDNA reverse transcribed from mRNA expressed in cells, or after PCR amplification, according to methods well known in the art.
Alternatively, genomic sequences may be used for sequence comparison. Genomic sequences can be obtained from available public, private and/or commercial databases or from sequencing of genomic DNA libraries or from genomic DNA, after PCR.
In some embodiments, the cDNA is prepared from mRNA obtained from a tissue at a determined developmental stage, or a tissue obtained after the organism has been subjected to certain environmental conditions. cDNA libraries used for the sequence comparison of the present invention can be constructed using conventional cDNA library construction techniques that are explained fully in the literature of the art. Total mRNAs are used as templates to reverse-transcribe cDNAs. Transcribed cDNAs are subcloned into appropriate vectors to establish a cDNA library. The established cDNA library can be maximized for full-length cDNA contents, although less than full-length cDNAs may be used.
Furthermore, the sequence frequency can be normalized according to, for example, Bonaldo et al. (1996) Genome Research 6:791-806. cDNA clones randomly selected from the constructed cDNA library can be sequenced using standard automated sequencing techniques. Preferably, full-length cDNA clones are used for sequencing. Either the entire or a large portion of cDNA clones from a cDNA library may be sequenced, although it is also possible to practice some embodiments of the invention by sequencing as little as a single cDNA, or several cDNA clones.
In one preferred embodiment of the present invention, cDNA clones to be sequenced can be pre-selected according to their expression specificity. In order to select cDNAs corresponding to active genes that are specifically expressed, the cDNAs can be subject to subtraction hybridization using mRNAs obtained from other organs, tissues or cells of the same animal. Under certain hybridization conditions with appropriate stringency and concentration, those cDNAs that hybridize with non-tissue specific mRNAs and thus likely represent "housekeeping" genes will be excluded from the cDNA pool.
Accordingly, remaining cDNAs to be sequenced are more likely to be associated with tissue-specific functions. For the purpose of subtraction hybridization, non-tissue-specific mRNAs can be obtained from one organ, or preferably from a combination of different organs and cells. The amount of non-tissue-specific mRNAs is maximized to saturate the tissue-specific cDNAs.
Alternatively, information from online databases can be used to select or give priority to cDNAs that are more likely to be associated with specific functions. For example, the ancestral cDNA candidates for sequencing can be selected by PCR using primers designed from candidate domesticated organism cDNA sequences. Candidate domesticated organism cDNA sequences are, for example, those that are only found in a specific tissue, such as skeletal muscle, or that correspond to genes likely to be important in the specific function.
Such tissue-specific cDNA sequences may be obtained by searching online sequence databases in which information with respect to the expression profile and/or biological activity for cDNA sequences may be specified.
Sequences of ancestral homologue(s) to a known domesticated organism's gene may be obtained using methods standard in the art, such as PCR methods (using, for example, GeneAmp PCR System 9700 thermocyclers (Applied Biosystems, Inc.)). For example, ancestral cDNA candidates for sequencing can be selected by PCR using primers designed from candidate domesticated organism cDNA sequences. For PCR, primers may be made from the domesticated organism's sequences using standard methods in the art, including publicly available primer design programs such as PRIMER3 (Whitehead Institute). The ancestral sequence amplified may then be sequenced using standard methods and equipment in the art, such as automated sequencers (Applied Biosystems, Inc.). Likewise, ancestor gene mimics can be used to obtain corresponding genes in domesticated organisms.
III. Identification of Positively Selected Polynucleotides in Domesticated Organisms
In a preferred embodiment, the methods described herein can be applied to identify the genes that control traits of interest in agriculturally important domesticated plants. Humans have bred domesticated plants for several thousand years without knowledge of the genes that control these traits. Knowledge of the specific genetic mechanisms involved would allow much more rapid and direct intervention at the molecular level to create plants with desirable or enhanced traits.
Humans, through artificial selection, have provided intense selection pressures on crop plants. This pressure is reflected in evolutionarily significant changes between homologous genes of domesticated organisms and their wild ancestors. It has been found that only a few genes, e.g., 10-15 per species, control traits of commercial interest in domesticated crop plants. These few genes have been exceedingly difficult to identify through standard methods of plant molecular biology. The KA/Ks and related analyses described herein can identify the genes controlling traits of interest.
For any crop plant of interest, cDNA libraries can be constructed from the domesticated species or subspecies and its wild ancestor. As is described in USSN 09/240,915, filed January 29, 1999, the cDNA libraries of each are "BLASTed" against each other to identify homologous polynucleotides. Alternatively, the skilled artisan can access commercially and/or publicly available genomic or cDNA databases rather than constructing cDNA libraries.
Next, a KA/Ks or related analysis is conducted to identify selected genes that have rapidly evolved under selective pressure. These genes are then evaluated using standard molecular and transgenic plant methods to determine if they play a role in the traits of commercial or aesthetic interest. The genes of interest are then manipulated by, e.g., random or site-directed mutagenesis, to develop new, improved varieties, subspecies, strains or cultivars.
The general method of the invention is as follows. Briefly, nucleotide sequences are obtained from a domesticated organism and a wild ancestor. The domesticated organism's and ancestor's nucleotide sequences are compared to one another to identify sequences that are homologous. The homologous sequences are analyzed to identify those that have nucleic acid sequence differences between the domesticated organism and ancestor. Then molecular evolution analysis is conducted to evaluate quantitatively and qualitatively the evolutionary significance of the differences. For genes that have been positively selected, outgroup analysis can be done to identify those genes that have been positively selected in the domesticated organism (or in the ancestor). Next, the sequence is characterized in terms of molecular/genetic identity and biological function. Finally, the information can be used to identify agents that can modulate the biological function of the polypeptide encoded by the gene.
The general methods of the invention entail comparing protein-coding nucleotide sequences of ancestral and domesticated organisms. Bioinformatics is applied to the comparison and sequences are selected that contain a nucleotide change or changes that is/are evolutionarily significant change(s). The invention enables the identification of genes that have evolved to confer some evolutionary advantage and the identification of the specific evolved changes. In a preferred embodiment, the domesticated organism is Oryza sativa and the wild ancestor is Oryza rufipogon. In the case of the present invention, protein-coding nucleotide sequences were obtained from O. rufipogon clones by standard sequencing techniques.
Protein-coding sequences of a domesticated organism and its ancestor are compared to identify homologous sequences. Any appropriate mechanism for completing this comparison is contemplated by this invention. Alignment may be performed manually or by software (examples of suitable alignment programs are known in the art). Preferably, protein-coding sequences from an ancestor are compared to the domesticated species sequences via database searches, e.g., BLAST searches. The high scoring "hits," i.e., sequences that show a significant similarity after BLAST analysis, will be retrieved and analyzed.
Sequences showing a significant similarity can be those having at least about 60%, at least about 75%, at least about 80%, at least about 85%, or at least about 90% sequence identity.
Preferably, sequences showing greater than about 80% identity are further analyzed. The homologous sequences identified via database searching can be aligned in their entirety using sequence alignment methods and programs that are known and available in the art, such as the commonly used simple alignment program CLUSTAL V by Higgins et al. (1992) CABIOS 8:189-191.
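The identity thresholds described above can be illustrated with a minimal Python sketch. It assumes that each hit has already been aligned to the query by an external program, with "-" marking gaps; the function names and the dictionary layout are hypothetical and chosen only for illustration. Identity is computed here over columns in which neither sequence has a gap, which is one of several conventions in use.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def select_hits(hits: dict, min_identity: float = 80.0) -> list:
    """Return (name, identity) for hits at or above the identity threshold.

    `hits` maps a hit name to an (aligned_query, aligned_hit) pair.
    """
    kept = []
    for name, (query, hit) in hits.items():
        identity = percent_identity(query, hit)
        if identity >= min_identity:
            kept.append((name, identity))
    return kept


example = {
    "est1": ("ATGCATGCAT", "ATGCATGCAT"),
    "est2": ("ATGCATGCAT", "ATGGTTGCTT"),
}
print(select_hits(example))
```

The default of 80% mirrors the preferred threshold stated above; the lower thresholds mentioned (about 60% or 75%) could be passed through `min_identity`.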
The present invention provides a method for identifying a polynucleotide sequence encoding a polypeptide of a domesticated organism, wherein said polypeptide is or is suspected of being associated with improved yield in said domesticated organism as compared to a wild ancestor of said domesticated organism, comprising the steps of a) comparing polypeptide-coding nucleotide sequences of said domesticated organism to polypeptide-coding nucleotide sequences of said wild ancestor; and b) selecting a polynucleotide sequence in the domesticated organism that contains a nucleotide change as compared to a corresponding sequence in the wild ancestor, wherein said change is evolutionarily significant, whereby the domesticated organism's polynucleotide sequence is identified. In a preferred embodiment, the polypeptide that is associated with improved yield is an EG307 polypeptide.
In the present case, for example, nucleotide sequences obtained from O. rufipogon were used as query sequences in a search of O. sativa ESTs in GenBank to identify homologous sequences. It should be noted that a complete protein-coding nucleotide sequence is not required. Indeed, partial cDNA sequences may be compared. Once sequences of interest are identified by the methods described below, further cloning and/or bioinformatics methods can be used to obtain the entire coding sequence for the gene or protein of interest.
Alternatively, the sequencing and homology comparison of protein-coding sequences between the domesticated organism and its ancestor may be performed simultaneously by using the newly developed sequencing chip technology. See, for example, Rava et al. U.S. Patent 5,545,531.
The aligned protein-coding sequences of domesticated organism and ancestor are analyzed to identify nucleotide sequence differences at particular sites.
Again, any suitable method for achieving this analysis is contemplated by this invention. If there are no nucleotide sequence differences, the ancestor protein coding sequence is not usually further analyzed. The detected sequence changes are generally, and preferably, initially checked for accuracy. Preferably, the initial checking comprises performing one or more of the following steps, any and all of which are known in the art: (a) finding the points where there are changes between the ancestral and domesticated organism sequences; (b) checking the sequence fluorogram (chromatogram) to determine if the bases that appear unique to the ancestor or domesticated organism correspond to strong, clear signals specific for the called base; (c) checking the domesticated organism hits to see if there is more than one domesticated organism sequence that corresponds to a sequence change. Multiple domesticated organism sequence entries for the same gene that have the same nucleotide at a position where there is a different nucleotide in an ancestor sequence provide independent support that the domesticated sequence is accurate, and that the change is significant. Such changes are examined using database information and the genetic code to determine whether these nucleotide sequence changes result in a change in the amino acid sequence of the encoded protein. As the definition of "nucleotide change" makes clear, the present invention encompasses at least one nucleotide change, either a substitution, a deletion or an insertion, in a protein-coding polynucleotide sequence of a domesticated organism as compared to a corresponding sequence from the ancestor. Preferably, the change is a nucleotide substitution. More preferably, more than one substitution is present in the identified sequence and is subjected to molecular evolution analysis.
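The step of using the genetic code to decide whether a nucleotide change alters the encoded amino acid can be sketched as follows. This Python example is illustrative only: it assumes two in-frame, gap-free coding sequences of equal reading frame and the standard genetic code, and the function name `codon_changes` is hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_changes(ancestor: str, domesticated: str) -> list:
    """List (codon_index, ancestor_codon, domesticated_codon, kind) per changed codon.

    Assumes both sequences start in frame and contain no gaps; codons with
    ambiguous bases are skipped.
    """
    changes = []
    usable = min(len(ancestor), len(domesticated))
    for i in range(0, usable - 2, 3):
        a = ancestor[i:i + 3].upper()
        d = domesticated[i:i + 3].upper()
        if a == d or a not in CODE or d not in CODE:
            continue
        kind = "synonymous" if CODE[a] == CODE[d] else "nonsynonymous"
        changes.append((i // 3, a, d, kind))
    return changes


for change in codon_changes("ATGCTTGAA", "ATGCTCGAT"):
    print(change)
```

In this toy input the second codon changes from CTT to CTC (both leucine) and the third from GAA (glutamate) to GAT (aspartate), so one change of each kind is reported.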
Any of several different molecular evolution analyses or KA/Ks-type methods can be employed to evaluate quantitatively and qualitatively the evolutionary significance of the identified nucleotide changes between domesticated species gene sequences and those of corresponding ancestors. Kreitman and Akashi (1995) Annu. Rev. Ecol. Syst. 26:403-422; Li, Molecular Evolution, Sinauer Associates, Sunderland, MA, 1997. For example, positive selection on proteins (i.e., molecular-level adaptive evolution) can be detected in protein-coding genes by pairwise comparisons of the ratios of nonsynonymous nucleotide substitutions per nonsynonymous site (KA) to synonymous substitutions per synonymous site (Ks) (Li et al., 1985; Li, 1993). Any comparison of KA and Ks may be used, although it is particularly convenient and most effective to compare these two variables as a ratio. Sequences are identified by exhibiting a statistically significant difference between KA and Ks using standard statistical methods.
In the case of the present invention, homologous sequences from O. rufipogon and O. sativa were identified. Comparison of the sequences of one O. rufipogon clone, PBI0307H9, SEQ ID NO:31, and O. sativa in GenBank revealed a high KA/Ks ratio. Further cloning and PCR of several different strains of O. sativa were completed in order to obtain the entire gene, named EG307, so that the entire gene sequence could be subjected to KA/Ks analysis. These procedures are detailed in Example 10. The complete sequences of EG307 in O. rufipogon, SEQ ID NO:28, and O. sativa cv. Nipponbare l, SEQ ID NO:25, are shown in Figure 1. The corresponding protein sequences, SEQ ID NO:30, and SEQ ID NO:27, are shown in Figure 2.
A summary of the KA/Ks ratios is shown in Table 1 of Example 11. Some strains were more similar to O. rufipogon due to cross-breeding between O. rufipogon and the domestic strain. High KA/Ks ratios for some strains indicate an evolutionarily significant change.
Preferably, the KA/Ks analysis computer program by Li et al. is used to carry out the present invention, although other analysis programs that can detect positively selected genes between species can also be used. Li et al. (1985) Mol. Biol. Evol. 2:150-174; see also Li (1993) J. Mol. Evol. 36:96-99; Messier and Stewart (1997) Nature 385:151-154; Nei (1987) Molecular Evolutionary Genetics (New York, Columbia University Press). The KA/Ks method, which comprises a comparison of the rate of non-synonymous substitutions per non-synonymous site with the rate of synonymous substitutions per synonymous site between homologous protein-coding regions of genes in terms of a ratio, is used to identify sequence substitutions that may be driven by adaptive selection or by neutral evolution. A synonymous ("silent") substitution is one that, owing to the degeneracy of the genetic code, makes no change to the amino acid sequence encoded; a non-synonymous substitution results in an amino acid replacement. The extent of each type of change can be estimated as KA and Ks, respectively, the numbers of non-synonymous substitutions per non-synonymous site and synonymous substitutions per synonymous site. Calculations of KA/Ks may be performed manually or by using software. An example of a suitable program is MEGA (Molecular Genetics Institute, Pennsylvania State University).
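To make the quantities KA and Ks concrete, the following Python sketch estimates them for two in-frame, gap-free coding sequences. It does not reproduce the program of Li et al.; as a simpler stand-in it uses the unweighted pathway counting method of Nei and Gojobori with a Jukes-Cantor correction. All function names are hypothetical, mutations to stop codons are counted as nonsynonymous when counting sites, and pathways passing through a stop codon are discarded; these are assumptions of the sketch.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (each position contributes 0 to 1)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                s += 1 / 3
    return s


def count_diffs(c1, c2):
    """(synonymous, nonsynonymous) differences averaged over mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in permutations(positions):
        cur, sd, nd = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":          # discard pathways through a stop codon
                break
            if CODE[nxt] == CODE[cur]:
                sd += 1
            else:
                nd += 1
            cur = nxt
        else:
            paths.append((sd, nd))
    if not paths:                          # assumption: treat all as nonsynonymous
        return 0.0, float(len(positions))
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (KA, Ks); codons with stops or ambiguous bases are skipped."""
    syn_total = sd_total = nd_total = 0.0
    n_codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if CODE.get(c1, "*") == "*" or CODE.get(c2, "*") == "*":
            continue
        n_codons += 1
        syn_total += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = count_diffs(c1, c2)
        sd_total += sd
        nd_total += nd
    nonsyn_total = 3 * n_codons - syn_total
    if syn_total == 0 or nonsyn_total == 0:
        raise ValueError("no synonymous or no nonsynonymous sites to compare")
    return jukes_cantor(nd_total / nonsyn_total), jukes_cantor(sd_total / syn_total)


ka, ks = ka_ks("ATGCTTGAACTTCTTCTT", "ATGCTCGATCTTCTTCTT")
print(ka, ks, ka / ks if ks > 0 else float("nan"))
```

The ratio is undefined when Ks is zero, which is common for very short or very similar sequences, so the caller is left to handle that case. Dedicated programs apply more refined corrections than this sketch.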

For the purpose of estimating KA and Ks, either complete or partial protein-coding sequences are used to calculate total numbers of synonymous and non-synonymous substitutions, as well as non-synonymous and synonymous sites. The length of the polynucleotide sequence analyzed can be any appropriate length. Preferably, the entire coding sequence is compared, in order to determine any and all significant changes. Publicly available computer programs, such as Li93 (Li (1993) J. Mol. Evol. 36:96-99) or INA, can be used to calculate the KA and Ks values for all pairwise comparisons. This analysis can be further adapted to examine sequences in a "sliding window" fashion such that small numbers of important changes are not masked by the whole sequence. "Sliding window" refers to examination of consecutive, overlapping subsections of the gene (the subsections can be of any length).
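The "sliding window" idea can be sketched in a few lines of Python. The example below is a hypothetical illustration: it cuts two in-frame, equal-length coding sequences into consecutive, overlapping codon windows and applies a caller-supplied scoring function to each pair of windows. A simple mismatch count stands in for the KA/Ks calculation, which would be supplied by a separate routine or program; the window and step sizes are arbitrary.

```python
def sliding_windows(cds, window_codons, step_codons):
    """Yield (start_codon, subsequence) for consecutive, overlapping codon windows.

    Trailing codons that do not fill a complete window are not covered.
    """
    n_codons = len(cds) // 3
    if n_codons <= window_codons:
        yield 0, cds[:n_codons * 3]
        return
    for start in range(0, n_codons - window_codons + 1, step_codons):
        yield start, cds[start * 3:(start + window_codons) * 3]


def count_mismatches(a, b):
    """Placeholder score: number of differing nucleotides in a window pair."""
    return sum(x != y for x, y in zip(a, b))


def scan(seq_a, seq_b, score, window_codons, step_codons):
    """Apply `score` to each pair of corresponding windows."""
    windows_a = sliding_windows(seq_a, window_codons, step_codons)
    windows_b = sliding_windows(seq_b, window_codons, step_codons)
    return [(start, score(a, b)) for (start, a), (_, b) in zip(windows_a, windows_b)]


ancestor = "ATG" * 6
domesticated = "ATG" * 5 + "ATA"
print(scan(ancestor, domesticated, count_mismatches, window_codons=2, step_codons=2))
```

Passing the scoring function as an argument keeps the windowing logic independent of whichever KA/Ks-type method is chosen.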
Sliding window KA/Ks analysis of, for example, identified gene EG307 showed that there are a number of nonsynonymous changes on the 5'-end of EG307 in many of the O. sativa strains when compared to O. rufipogon. The 3'-end of the gene had a low ratio in all of the strains. These procedures and results are detailed in Example 11 and Tables 2-7.
The comparison of non-synonymous and synonymous substitution rates is represented by the KA/Ks ratio. KA/Ks has been shown to be a reflection of the degree to which adaptive evolution has been at work in the sequence under study. Full length or partial segments of a coding sequence can be used for the KA/Ks analysis. The higher the KA/Ks ratio, the more likely that a sequence has undergone adaptive evolution and the non-synonymous substitutions are evolutionarily significant. See, for example, Messier and Stewart (1997).
Preferably, the KA/Ks ratio is at least about 0.75, more preferably at least about 1.0, more preferably at least about 1.25, more preferably at least about 1.50, or more preferably at least about 2.00. Preferably, statistical analysis is performed on all elevated KA/Ks ratios, including, but not limited to, standard methods such as Student's t-test and likelihood ratio tests described by Yang (1998) Mol. Biol. Evol. 37:441-456.
For a pairwise comparison of homologous sequences, KA/Ks ratios significantly greater than unity strongly suggest that positive selection has fixed greater numbers of amino acid replacements than can be expected as a result of chance alone. This contrasts with the commonly observed pattern in which the ratio is less than one. Nei (1987); Hughes and Nei (1988) Nature 335:167-170; Messier and Stewart (1994) Current Biol. 4:911-913; Kreitman and Akashi (1995) Ann. Rev. Ecol. Syst. 26:403-422; Messier and Stewart (1997). Ratios less than one generally signify the role of negative, or purifying, selection: there is strong pressure on the primary structure of functional, effective proteins to remain unchanged. Ratios of about 1 indicate evolution under neutral conditions.
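The interpretation of the ratio given above can be expressed as a short Python sketch. The function name interpret_ka_ks and the tolerance used to decide what counts as "about 1" are assumptions introduced for illustration only; the specification does not define such a band, and any elevated ratio would still need the statistical analysis discussed above before a conclusion is drawn.

```python
def interpret_ka_ks(ka: float, ks: float, tolerance: float = 0.1):
    """Return (ratio, label) for a pairwise KA/Ks comparison.

    tolerance is a hypothetical band around 1.0 treated as 'about 1';
    it is not a value taken from the specification.
    """
    if ka < 0:
        raise ValueError("KA cannot be negative")
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")

    ratio = ka / ks
    if ratio > 1.0 + tolerance:
        label = "positive selection suggested"
    elif ratio < 1.0 - tolerance:
        label = "negative (purifying) selection suggested"
    else:
        label = "consistent with neutral evolution"
    return ratio, label
```

The label is only a first-pass reading of the ratio; it does not replace a significance test.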
All methods for calculating KA/Ks ratios are based on a pairwise comparison of the number of nonsynonymous substitutions per nonsynonymous site to the number of synonymous substitutions per synonymous site for the protein-coding regions of homologous genes from the ancestral and domesticated organisms. Each method implements different corrections for estimating "multiple hits" (i.e., more than one nucleotide substitution at the same site). Each method also uses different models for how DNA sequences change over evolutionary time. Thus, preferably, a combination of results from different algorithms is used to increase the level of sensitivity for detection of positively-selected genes and confidence in the result.
Preferably, KA/Ks ratios should be calculated for orthologous gene pairs, as opposed to paralogous gene pairs (i.e., genes that result from speciation, as opposed to genes that result from gene duplication). See Messier and Stewart (1997). This distinction may be made by performing additional comparisons with other ancestors, which allows for phylogenetic tree-building. Orthologous genes when used in tree-building will yield the known "species tree", i.e., will produce a tree that recovers the known biological tree. In contrast, paralogous genes will yield trees which will violate the known biological tree.
It is understood that the methods described herein could lead to the identification of ancestral or domesticated organism polynucleotide sequences that are functionally related to the protein-coding sequences. Such sequences may include, but are not limited to, non-coding sequences or coding sequences that do not encode proteins. These related sequences can be, for example, physically adjacent to the protein-coding sequences in the genome, such as introns or 5'- and 3'- flanking sequences (including control elements such as promoters and enhancers). These related sequences may be obtained via searching available public, private and/or commercial genome databases or, alternatively, by screening and sequencing the organism's genomic library with a protein-coding sequence as probe. Methods and techniques for obtaining non-coding sequences using related coding sequence are well known to one skilled in the art.
The evolutionarily significant nucleotide changes, which are detected by molecular evolution analysis such as the KA/Ks analysis, can be further assessed for their unique occurrence in the domesticated organism or the extent to which these changes are unique in the domesticated organism. For example, the identified changes in the domesticated gene can be tested for presence/absence in other sequences of related species, subspecies or other organisms having a common ancestor with the domesticated organism. This comparison ("outgroup analysis") permits the determination of whether the positively selected gene is positively selected for in the domesticated organism at issue (as opposed to the ancestor).
For example, the identified changes in the EG307 gene were identified to various degrees in a number of O. sativa strains. See Tables 2-7. Additionally, a counterpart to EG307 was identified in maize, Zea mays mays, its wild ancestor, teosinte, Zea mays parviglumis, and also wild relatives of maize, Z. diploperennis and Z. luxurians. See Example 13 and Table 9. While EG307 in rice and maize was somewhat different at the nucleotide level, the protein sequences were more similar. Observing that rice and corn were independently domesticated from their wild ancestors, a consistent pattern emerges: the majority of the amino acid replacements in the modern crop (whether maize or rice), as compared to the ancestral plant (teosinte or ancestral rice) result in increased charge/polarity, increased solubility, and decreased hydrophobicity. This pattern is most unlikely to have occurred by chance in these two independent domestication events. This suggests that these replacements were a similar response to human imposed domestication. This is powerful evidence that EG307 has been selected as a result of human domestication of these two cereals.
The sequences with at least one evolutionarily significant change between a domesticated organism and its ancestor can be used as primers for PCR analysis of other ancestor protein-coding sequences, and resulting polynucleotides are sequenced to see whether the same change is present in other ancestors. These comparisons allow further discrimination as to whether the adaptive evolutionary changes are unique to the domesticated lineage as compared to other ancestors or whether the adaptive change is unique to the ancestor as compared to the domesticated species and other ancestors. A
nucleotide change that is detected in the domesticated organism but not other ancestors more likely represents an adaptive evolutionary change in the domesticated organism. Alternatively, a nucleotide change that is detected in an ancestor that is not detected in the domesticated organism or other ancestors likely represents an ancestor adaptive evolutionary change.
Other ancestors used for comparison can be selected based on their phylogenetic relationships with the domesticated organism. Statistical significance of such comparisons may be determined using established available programs, e.g., t-test as used by Messier and Stewart (1997) Nature 385:151-154. Those genes showing statistically high KA/Ks ratios are very likely to have undergone adaptive evolution.

Sequences with significant changes can be used as probes in genomes from different domesticated populations to see whether the sequence changes are shared by more than one domesticated population. Gene sequences from different domesticated populations can be obtained from databases or, alternatively, from direct sequencing of PCR-amplified DNA
from a number of unrelated, diverse domesticated populations. The presence of the identified changes in different domesticated populations would further indicate the evolutionary significance of the changes.
Sequences with significant changes between species can be further characterized in terms of their molecular/genetic identities and biological functions, using methods and techniques known to those of ordinary skill in the art. For example, the sequences can be located genetically and physically within the organism's genome using publicly available bio-informatics programs. The newly identified significant changes within the nucleotide sequence may suggest a potential role of the gene in the organism's evolution and a potential association with unique, enhanced or altered functional capabilities.
Using the techniques of the present invention, a heretofore unknown evolutionarily significant gene in rice, termed EG307, has been discovered as detailed in EXAMPLE 10.
KA/Ks analysis, performed as described in EXAMPLE 11 between O. rufipogon and certain O. sativa strains indicated an evolutionarily significant change as shown in Table 1. The gene has been positively selected. Using several different rice maps, as described in EXAMPLE 12, it was found that EG307 was within about 10 cM of marker RZ672, a marker associated with a QTL for 1000 grain weight residing on chromosome 3. (1000-grain weight is the weight (mass) of three different samples of 1000 randomly chosen fully filled grains of rice.) This is a sensitive measure of yield, which takes into account the individual variation in weight that occurs among rice grains. Thus, there is only about a 10% chance that the RZ672 marker will be separated from EG307 by crossing over in a single generation, strongly suggesting that EG307 plays an important role in controlling increased yield.
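The step from "about 10 cM" to "about a 10% chance" of separation by crossing over can be checked numerically. The sketch below is an illustration only: it uses the Haldane mapping function, a standard genetics relation that the specification does not itself invoke, and the function name recombination_fraction_haldane is hypothetical. For short distances the recombination fraction is close to the map distance divided by 100, which is the approximation the text relies on.

```python
import math


def recombination_fraction_haldane(map_distance_cm: float) -> float:
    """Convert a genetic map distance in centimorgans to an expected
    recombination fraction using the Haldane mapping function:

        r = 0.5 * (1 - exp(-2 * d)), with d in Morgans.

    The Haldane function assumes no crossover interference; it is used
    here as an illustrative assumption, not as the method of the source.
    """
    if map_distance_cm < 0:
        raise ValueError("map distance cannot be negative")
    d_morgans = map_distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))
```

Under this assumption a distance of 10 cM corresponds to a recombination fraction of roughly 0.09 per meiosis, in line with the figure of about 10% stated above.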
Also using the techniques of the present invention, a heretofore unknown evolutionarily significant gene in rice, termed EG1117, has been discovered as detailed in EXAMPLE 14. KA/Ks analysis, performed as described in EXAMPLE 14 between O. rufipogon and certain O. sativa strains indicated an evolutionarily significant change as shown in Table 10. The gene has been positively selected. Using several different rice maps, as described in EXAMPLES 13 and 14, it was found that EG1117 lies on the same BAC as marker RZ672, a marker associated with a QTL for 1000 grain weight residing on chromosome 3. EG1117 lies about 2-3 cM from EG307.

From the combination of the evolutionarily significant KA/Ks value and mapping data, one of skill in the art can reasonably conclude that EG307 and EG1117 are yield-related genes. EG307's and EG1117's yield-increasing function could be easily confirmed by making and growing a mutant or transgenic plant. Alternative methods include association analysis and pedigree analysis. Using the EG307 and EG1117 sequences derived from rice, EG307 and EG1117 genes from maize and its wild ancestor were obtained as detailed in EXAMPLE 13.
The putative gene with the identified sequences may be further characterized by, for example, homologue searching. Shared homology of the putative gene with a known gene may indicate a similar biological role or function. Another exemplary method of characterizing a putative gene sequence is on the basis of known sequence motifs. Certain sequence patterns are known to code for regions of proteins having specific biological characteristics such as signal sequences, DNA binding domains, or transmembrane domains.
The identified sequences with significant changes can also be further evaluated by looking at where the gene is expressed in terms of tissue- or cell type-specificity. For example, the identified coding sequences can be used as probes to perform in situ mRNA hybridization that will reveal the expression patterns of the sequences. Genes that are expressed in certain tissues may be better candidates as being associated with important functions associated with that tissue, for example developing endosperm tissue. The timing of the gene expression during each stage of development of a species member can also be determined.
As another exemplary method of sequence characterization, the functional roles of the identified nucleotide sequences with significant changes can be assessed by conducting functional assays for different alleles of an identified gene in the transfected domesticated organism, e.g., in the transgenic plant or animal. Current examples of plant functional assays include the use of microarrays, see Seki, et al., Monitoring the Expression Pattern of 1300 Arabidopsis Genes Under Drought and Cold Stresses Using a Full-Length cDNA Microarray. Plant Cell 13:61-72 (2001), and metabolite profiling, see Roessner, et al., Metabolic Profiling Allows Comprehensive Phenotyping of Genetically or Environmentally Modified Plant Systems. Plant Cell 13:11-29 (2001).
As another exemplary method of sequence characterization, the use of computer programs may allow modeling and visualizing the three-dimensional structure of the homologous proteins from domesticated organism and ancestor. Specific, exact knowledge of which amino acids have been replaced in the ancestor protein(s) allows detection of structural changes that may be associated with functional differences. Thus, use of modeling techniques is closely associated with identification of functional roles discussed in the previous paragraph. The use of individual or combinations of these techniques constitutes part of the present invention.
A domesticated organism's gene identified by the subject method can be used to identify homologous genes in other species that share a common ancestor. For example, maize, rice, wheat, millet, sorghum and other cereals share a common ancestor, and genes identified in rice can lead directly to homologous genes in these other grasses. Likewise, tomatoes and potatoes share a common ancestor, and genes identified in tomatoes by the subject method are expected to have homologues in potatoes, and vice versa.
The present invention also provides a method of detecting a yield-increasing gene in a plant cell comprising: a) contacting the EG307 gene or a portion thereof greater than 12 nucleotides, preferably greater than 30 nucleotides in length, with a preparation of genomic DNA from the plant cell under hybridization conditions providing detection of nucleic acid molecule sequences having about 50% or greater sequence identity to a nucleic acid molecule selected from the group consisting of SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:59, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:84 and SEQ ID NO:85; and b) detecting hybridization, whereby a yield-increasing gene may be identified.
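The "about 50% or greater sequence identity" criterion refers to what the hybridization conditions are able to detect experimentally. Once a candidate has been sequenced, its identity to a reference sequence can be checked computationally. The following Python sketch shows one simple way to do so for two sequences that have already been aligned; the function name percent_identity and the convention of counting a gap opposite a base as a mismatch are assumptions, since several definitions of percent identity are in use and the specification does not prescribe one.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which both sequences carry a gap are ignored; a gap
    opposite a base counts as a mismatch (an illustrative convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 50.0) -> bool:
    """True if the aligned pair reaches the given percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

The default threshold of 50.0 mirrors the figure recited in the method above; the alignment itself must be produced beforehand by a separate tool.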
The present invention also provides a method of isolating a yield-related gene from a recombinant plant cell library, comprising a) providing a preparation of plant cell DNA or a recombinant plant cell library; b) contacting the preparation or plant cell library with a detectably-labelled EG307 conserved oligonucleotide under hybridization conditions providing detection of genes having 50% or greater sequence identity; and c) isolating a yield-related gene by its association with the detectable label.

The present invention also provides a method of isolating a yield-related gene from plant cell DNA comprising a) providing a sample of plant cell DNA; b) providing a pair of oligonucleotides having sequence homology to a conserved region of an EG307 gene; c) combining the pair of oligonucleotides with the plant cell DNA sample under conditions suitable for polymerase chain reaction-mediated DNA amplification; and d) isolating the amplified yield-related gene or fragment thereof.
The sequences identified by the methods described herein can be used to identify agents that are useful in modulating domesticated organism-unique, enhanced or altered functional capabilities and/or correcting defects in these capabilities using these sequences. These methods employ, for example, screening techniques known in the art, such as in vitro systems, cell-based expression systems and transgenic animals and plants. The approach provided by the present invention not only identifies rapidly evolved genes, but indicates modulations that can be made to the protein that may not be too toxic because they exist in another species.
The present invention also provides a method of producing an EG307 polypeptide comprising: a) providing a cell transfected with a polynucleotide encoding an EG307 polypeptide positioned for expression in the cell; b) culturing the transfected cell under conditions for expressing the polynucleotide; and c) isolating the EG307 polypeptide.
The present invention also provides a method of detecting a yield-increasing gene in a plant cell comprising: a) contacting the EG307 or EG1117 gene or a portion thereof greater than 12 nucleotides, preferably greater than 30 nucleotides in length, with a preparation of genomic DNA from the plant cell under hybridization conditions providing detection of nucleic acid molecule sequences having about 50% or greater sequence identity to a nucleic acid molecule selected from the group consisting of SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:97, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:101, SEQ ID NO:102, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:141, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:161, SEQ ID NO:162, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, and SEQ ID NO:168; and b) detecting hybridization, whereby a yield-increasing gene may be identified.
The present invention also provides a method of isolating a yield-related gene from a recombinant plant cell library, comprising a) providing a preparation of plant cell DNA or a recombinant plant cell library; b) contacting the preparation or plant cell library with a detectably-labelled EG307 or EG1117 conserved oligonucleotide under hybridization conditions providing detection of genes having 50% or greater sequence identity; and c) isolating a yield-related gene by its association with the detectable label.
The present invention also provides a method of isolating a yield-related gene from plant cell DNA comprising a) providing a sample of plant cell DNA; b) providing a pair of oligonucleotides having sequence homology to a conserved region of an EG307 or EG1117 gene; c) combining the pair of oligonucleotides with the plant cell DNA sample under conditions suitable for polymerase chain reaction-mediated DNA amplification; and d) isolating the amplified yield-related gene or fragment thereof.
The sequences identified by the methods described herein can be used to identify agents that are useful in modulating domesticated organism-unique, enhanced or altered functional capabilities and/or correcting defects in these capabilities using these sequences.
These methods employ, for example, screening techniques known in the art, such as in vitro systems, cell-based expression systems and transgenic animals and plants. The approach provided by the present invention not only identifies rapidly evolved genes, but indicates modulations that can be made to the protein that may not be too toxic because they exist in another species.
The present invention also provides a method of producing an EG307 or EG1117 polypeptide comprising: a) providing a cell transfected with a polynucleotide encoding an EG307 or EG1117 polypeptide positioned for expression in the cell; b) culturing the transfected cell under conditions for expressing the polynucleotide; and c) isolating the EG307 or EG1117 polypeptide.
A. EG307 Polypeptides
One embodiment of the present invention is an isolated plant EG307 polypeptide. As used herein, an EG307 polypeptide, in one embodiment, is a polypeptide that is related to (i.e., bears structural similarity to) the O. sativa polypeptide of about 447 amino acids and having the sequence depicted in Figure 2 (SEQ ID NO:6). The original identification of such a polypeptide is detailed in the Examples. A preferred EG307 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions to at least one of the following genes: (a) a gene encoding an O. sativa EG307 polypeptide (i.e., an O. sativa gene); (b) a gene encoding an O. rufipogon EG307 polypeptide (i.e., an O. rufipogon gene); (c) a gene encoding a Zea mays mays EG307 polypeptide (i.e., a Z. mays mays gene); (d) a gene encoding a Zea mays parviglumis EG307 polypeptide (i.e., a Z. mays parviglumis gene); (e) a gene encoding a Zea diploperennis EG307 polypeptide (i.e., a Z. diploperennis gene); and (f) a gene encoding a Zea luxurians EG307 polypeptide (i.e., a Z. luxurians gene). It is to be noted that the term "a" or "an" entity refers to one or more of that entity; for example, a gene refers to one or more genes or at least one gene. As such, the terms "a" (or "an"), "one or more" and "at least one" can be used interchangeably herein. It is also to be noted that the terms "comprising," "including," and "having" can be used interchangeably.
As used herein, stringent hybridization conditions refer to standard hybridization conditions under which polynucleotides, including oligonucleotides, are used to identify molecules having similar nucleic acid sequences. Such standard conditions are disclosed, for example, in Sambrook et al., MOLECULAR CLONING: A LABORATORY MANUAL, Cold Spring Harbor Labs Press, 1989. Examples of such conditions are provided in the Examples section of the present application.
As used herein, an O. sativa EG307 gene includes all nucleic acid sequences related to a natural O. sativa EG307 gene such as regulatory regions that control production of the O. sativa EG307 polypeptide encoded by that gene (such as, but not limited to, transcription, translation or post-translation control regions) as well as the coding region itself. In one embodiment, an O. sativa EG307 gene includes the nucleic acid sequence SEQ ID NO:4. Nucleic acid sequence SEQ ID NO:4 represents the deduced sequence of a cDNA (complementary DNA) polynucleotide, the production of which is disclosed in the Examples. It should be noted that since nucleic acid sequencing technology is not entirely error-free, SEQ ID NO:4 (as well as other sequences presented herein), at best, represents an apparent nucleic acid sequence of the polynucleotide encoding an O. sativa EG307 polypeptide of the present invention.
In another embodiment, an O. sativa EG307 gene can be an allelic variant that includes a similar but not identical sequence to SEQ ID NO:4. An allelic variant of an O. sativa EG307 gene including SEQ ID NO:1 is a locus (or loci) in the genome whose activity is concerned with the same biochemical or developmental processes, and/or a gene that occurs at essentially the same locus as the gene including SEQ ID NO:4, but which, due to natural variations caused by, for example, mutation or recombination, has a similar but not identical sequence. Because genomes can undergo rearrangement, the physical arrangement of alleles is not always the same. Allelic variants typically encode polypeptides having similar activity to that of the polypeptide encoded by the gene to which they are being compared. Allelic variants can also comprise alterations in the 5' or 3' untranslated regions of the gene (e.g., in regulatory control regions). Allelic variants are well known to those skilled in the art and would be expected to be found within a given rice cultivar or strain since the genome is diploid and/or among a population comprising two or more rice cultivars or strains.
For example, it is believed that the O. sativa polynucleotide having nucleic acid sequences represented by SEQ ID NO:18, to be described in more detail below, represents allelic variants of the Kasalath strain of O. sativa.
Similarly, a Zea mays mays EG307 gene includes all nucleic acid sequences related to a natural Z. mays mays EG307 gene such as regulatory regions that control production of the Z. mays mays EG307 polypeptide encoded by that gene as well as the coding region itself. In one embodiment, a Zea mays mays EG307 gene includes the nucleic acid sequence SEQ ID NO:66. Nucleic acid sequence SEQ ID NO:66 represents the deduced sequence of a cDNA polynucleotide, the production of which is disclosed in the Examples. In another embodiment, a Zea mays mays EG307 gene can be an allelic variant that includes a similar but not identical sequence to SEQ ID NO:66.
According to the present invention, an isolated, or biologically pure, polypeptide, is a polypeptide that has been removed from its natural milieu. As such, "isolated" and "biologically pure" do not necessarily reflect the extent to which the polypeptide has been purified. An isolated EG307 polypeptide of the present invention can be obtained from its natural source, can be produced using recombinant DNA technology or can be produced by chemical synthesis. An EG307 polypeptide of the present invention may be identified by its ability to perform the function of natural EG307 in a functional assay. By "natural EG307 polypeptide," it is meant the full length EG307 polypeptide of O. sativa, O. rufipogon, Z. mays mays, and/or Z. mays parviglumis. The phrase "capable of performing the function of a natural EG307 in a functional assay" means that the polypeptide has at least about 10% of the activity of the natural polypeptide in the functional assay. In other preferred embodiments, the EG307 polypeptide has at least about 20% of the activity of the natural polypeptide in the functional assay. In other preferred embodiments, the EG307 polypeptide has at least about 30% of the activity of the natural polypeptide in the functional assay. In other preferred embodiments, the EG307 polypeptide has at least about 40% of the activity of the natural polypeptide in the functional assay. In other preferred embodiments, the EG307 polypeptide has at least about 50% of the activity of the natural polypeptide in the functional assay. In other preferred embodiments, the polypeptide has at least about 60% of the activity of the natural polypeptide in the functional assay. In more preferred embodiments, the polypeptide has at least about 70% of the activity of the natural polypeptide in the functional assay. In more preferred embodiments, the polypeptide has at least about 80% of the activity of the natural polypeptide in the functional assay. In more preferred embodiments, the polypeptide has at least about 90% of the activity of the natural polypeptide in the functional assay.
Examples of functional assays include antibody-binding assays, or yield-increasing assays, as detailed elsewhere in this specification.
As used herein, an isolated plant EG307 polypeptide can be a full-length polypeptide or any homologue of such a polypeptide. Examples of EG307 homologues include polypeptides in which amino acids have been deleted (e.g., a truncated version of the polypeptide, such as a peptide), inserted, inverted, substituted and/or derivatized (e.g., by glycosylation, phosphorylation, acetylation, myristylation, prenylation, palmitoylation, amidation and/or addition of glycerophosphatidyl inositol) such that the homologue has natural EG307 activity.
In one embodiment, when the homologue is administered to an animal as an immunogen, using techniques known to those skilled in the art, the animal will produce a humoral and/or cellular immune response against at least one epitope of a natural EG307 polypeptide. EG307 homologues can also be selected by their ability to perform the function of EG307 in a functional assay.
Plant EG307 polypeptide homologues can be the result of natural allelic variation or natural mutation. EG307 polypeptide homologues of the present invention can also be produced using techniques known in the art including, but not limited to, direct modifications to the polypeptide or modifications to the gene encoding the polypeptide using, for example, classic or recombinant DNA techniques to effect random or targeted mutagenesis.
In accordance with the present invention, a mimetope refers to any compound that is able to mimic the ability of an isolated plant EG307 polypeptide of the present invention to perform the function of an EG307 polypeptide of the present invention in a functional assay.
Examples of mimetopes include, but are not limited to, anti-idiotypic antibodies or fragments thereof, that include at least one binding site that mimics one or more epitopes of an isolated polypeptide of the present invention; non-polypeptideaceous immunogenic portions of an isolated polypeptide (e.g., carbohydrate structures); and synthetic or natural organic molecules, including nucleic acids, that have a structure similar to at least one epitope of an isolated polypeptide of the present invention. Such mimetopes can be designed using computer-generated structures of polypeptides of the present invention.
Mimetopes can also be obtained by generating random samples of molecules, such as oligonucleotides, peptides or other organic molecules, and screening such samples by affinity chromatography techniques using the corresponding binding partner.
The minimal size of an EG307 polypeptide homologue of the present invention is a size sufficient to be encoded by a polynucleotide capable of forming a stable hybrid with the complementary sequence of a polynucleotide encoding the corresponding natural polypeptide.
As such, the size of the polynucleotide encoding such a polypeptide homologue is dependent on nucleic acid composition and percent homology between the polynucleotide and complementary sequence as well as upon hybridization conditions per se (e.g., temperature, salt concentration, and formamide concentration). It should also be noted that the extent of homology required to form a stable hybrid can vary depending on whether the homologous sequences are interspersed throughout the polynucleotides or are clustered (i.e., localized) in distinct regions on the polynucleotides. The minimal size of such polynucleotides is typically at least about 12 to about 15 nucleotides in length if the polynucleotides are GC-rich and at least about 15 to about 17 bases in length if they are AT-rich. Preferably, the polynucleotide is at least 12 bases in length.
As such, the minimal size of a polynucleotide used to encode an EG307 polypeptide homologue of the present invention is from about 12 to about 18 nucleotides in length. There is no limit, other than a practical limit, on the maximal size of such a polynucleotide in that the polynucleotide can include a portion of a gene, an entire gene, or multiple genes, or portions thereof. Similarly, the minimal size of an EG307 polypeptide homologue of the present invention is from about 4 to about 6 amino acids in length, with preferred sizes depending on whether a full-length, fusion, multivalent, or functional portions of such polypeptides are desired. Preferably, the polypeptide is at least 30 amino acids in length.
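The dependence of minimal probe length on base composition can be illustrated with a rough calculation. The Python sketch below computes the GC fraction of an oligonucleotide and a melting-temperature estimate using the Wallace rule (2 degrees C per A or T, 4 degrees C per G or C), a common rule of thumb for short oligonucleotides. The rule and the function names gc_fraction and wallace_tm are assumptions introduced for illustration; the specification does not state how the length ranges above were derived, and actual hybrid stability also depends on salt, formamide and temperature as noted.

```python
def gc_fraction(oligo: str) -> float:
    """Fraction of G and C bases in an unambiguous DNA oligonucleotide."""
    seq = oligo.upper()
    if not seq or any(base not in "ACGT" for base in seq):
        raise ValueError("oligo must be a non-empty string of A, C, G, T")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature in degrees C by the Wallace rule:
    Tm = 2 * (A + T) + 4 * (G + C).

    Intended only for short oligonucleotides; an illustrative rule of
    thumb, not a method recited in the specification.
    """
    seq = oligo.upper()
    if not seq or any(base not in "ACGT" for base in seq):
        raise ValueError("oligo must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Under this rule a 12-base oligonucleotide composed entirely of G and C has a higher estimated melting temperature than a 15-base oligonucleotide composed entirely of A and T, which is consistent with the shorter minimal length given above for GC-rich polynucleotides.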
Any plant EG307 polypeptide is a suitable polypeptide of the present invention.
Suitable plants from which to isolate EG307 polypeptides (including isolation of the natural polypeptide or production of the polypeptide by recombinant or synthetic techniques) include maize, wheat, barley, rye, millet, chickpea, lentil, flax, olive, fig, almond, pistachio, walnut, beet, parsnip, citrus fruits (including, but not limited to, orange, lemon, lime, grapefruit, tangerine, minneola, and tangelo), sweet potato, bean, pea, chicory, lettuce, cabbage, cauliflower, broccoli, turnip, radish, spinach, asparagus, onion, garlic, pepper, celery, squash, pumpkin, hemp, zucchini, apple, pear, quince, melon, plum, cherry, peach, nectarine, apricot, strawberry, grape, raspberry, blackberry, pineapple, avocado, papaya, mango, banana, soybean, tomato, sorghum, sugarcane, sugarbeet, sunflower, rapeseed, clover, tobacco, carrot, cotton, alfalfa, rice, potato, eggplant, cucumber, Arabidopsis, and woody plants such as coniferous and deciduous trees, with rice and maize being preferred. Preferred rice plants from which to isolate EG307 polypeptides include the Nipponbare 1 and 2, Lemont, IR64, Teqing, Azucena, and Kasalath 1, 2, 3, and 4 strains of O. sativa.
A preferred plant EG307 polypeptide of the present invention is a compound that when expressed or modulated in a plant, is capable of increasing the yield of the plant.
One embodiment of the present invention is a fusion polypeptide that includes an EG307 polypeptide-containing domain attached to a fusion segment. Inclusion of a fusion segment as part of an EG307 polypeptide of the present invention can enhance the polypeptide's stability during production, storage and/or use. Depending on the segment's characteristics, a fusion segment can also act as an immunopotentiator to enhance the immune response mounted by an animal immunized with an EG307 polypeptide containing such a fusion segment. Furthermore, a fusion segment can function as a tool to simplify purification of an EG307 polypeptide, such as to enable purification of the resultant fusion polypeptide using affinity chromatography. A suitable fusion segment can be a domain of any size that has the desired function (e.g., imparts increased stability, imparts increased immunogenicity to a polypeptide, and/or simplifies purification of a polypeptide). It is within the scope of the present invention to use one or more fusion segments. Fusion segments can be joined to amino and/or carboxyl termini of the EG307-containing domain of the polypeptide. Linkages between fusion segments and EG307-containing domains of fusion polypeptides can be susceptible to cleavage in order to enable straightforward recovery of the EG307-containing domains of such polypeptides. Fusion polypeptides are preferably produced by culturing a recombinant cell transformed with a fusion polynucleotide that encodes a polypeptide including the fusion segment attached to the carboxyl and/or amino terminal end of an EG307-containing domain.
Preferred fusion segments for use in the present invention include a glutathione binding domain; a metal binding domain, such as a poly-histidine segment capable of binding to a divalent metal ion; an immunoglobulin binding domain, such as Polypeptide A, Polypeptide G, T cell, B cell, Fc receptor or complement polypeptide antibody-binding domains; a sugar binding domain such as a maltose binding domain from a maltose binding polypeptide; and/or a "tag" domain (e.g., at least a portion of β-galactosidase, a strep tag peptide, or other domains that can be purified using compounds that bind to the domain, such as monoclonal antibodies). More preferred fusion segments include metal binding domains, such as a poly-histidine segment; a maltose binding domain; and a strep tag peptide.
Preferred plant EG307 polypeptides of the present invention are rice EG307 polypeptides and maize EG307 polypeptides. More preferred EG307 polypeptides are O. sativa, O. rufipogon, Z. mays mays, Z. mays parviglumis, Z. diploperennis and Z. luxurians EG307 polypeptides. O. sativa strains include Nipponbare, Azucena, Kasalath 1, 2, 3, and 4, Teqing, Lemont, and IR64. Z. mays parviglumis strains include Benz, BK4, IA19, and Wilkes. Z. mays mays strains include BS7, HuoBai, Makki, Min13, Pira, Sari, Smena, and W22.
One preferred O. sativa EG307 polypeptide of the present invention is a polypeptide encoded by an O. sativa polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, and/or SEQ ID NO:18. Such an EG307 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide having nucleic acid sequence SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, and/or SEQ ID NO:18.
Inspection of EG307 genomic nucleic acid sequences indicates that the genes comprise several regions, including a first exon region, a first intron region, a second exon region, a second intron region, and a third exon region.
Polynucleotides SEQ ID NO:4 and SEQ ID NO:91 represent the 5' and 3' ends of the EG307 gene in O. sativa (cv. Nipponbare). SEQ ID NO:4 and SEQ ID NO:91 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 6.
Translation of SEQ ID NO:4 and SEQ ID NO:91 suggests that the O. sativa EG307 polynucleotide includes an open reading frame. The reading frame encodes an O. sativa EG307 polypeptide of about 447 amino acids, the deduced amino acid sequence of which is represented herein as SEQ ID NO:6, assuming an open reading frame having an initiation (start) codon spanning from about nucleotide 37 through about nucleotide 39 of SEQ ID NO:4 and a termination (stop) codon spanning from about nucleotide 2278 through about nucleotide 2280 of SEQ ID NO:4, with the first exon spanning nucleotides 1-126 of SEQ ID NO:4, the first intron spanning nucleotides 9-822 of SEQ ID NO:91, the second exon spanning nucleotides 823-1141 of SEQ ID NO:91, the second intron spanning nucleotides of SEQ ID NO:91, and the third exon spanning nucleotides 1223-2157 of SEQ ID NO:91.
The open reading frame from nucleotide 37 through about nucleotide 2280 of SEQ ID NO:4 is represented herein as SEQ ID NO:5.
Similarly, translation of O. sativa (strain Azucena) polynucleotide SEQ ID NO:1 suggests an open reading frame from about nucleotide 3 to about nucleotide 2410 of SEQ ID NO:1, with the first exon spanning nucleotides 1-92 of SEQ ID NO:1, the first intron spanning nucleotides 93-1075 of SEQ ID NO:1, the second exon spanning nucleotides 1076-1394 of SEQ ID NO:1, the second intron spanning nucleotides 1395-1475 of SEQ ID NO:1, and the third exon spanning nucleotides 1476-2441 of SEQ ID NO:1. The open reading frame is represented herein as SEQ ID NO:2, and encodes a polypeptide represented herein as SEQ ID NO:3.
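The exon and intron coordinates given for strain Azucena can be checked for internal consistency. The following Python sketch is illustrative only: the data structure and the function names `segments_are_contiguous` and `coding_length` are hypothetical, and the coordinates are simply those stated above for SEQ ID NO:1. It verifies that the segments tile the sequence without gaps or overlaps and sums the exon bases that fall inside the stated open reading frame.

```python
from typing import List, Tuple

# (kind, first nucleotide, last nucleotide), 1-based and inclusive,
# as stated in the text for SEQ ID NO:1 (strain Azucena).
Segment = Tuple[str, int, int]

AZUCENA_SEGMENTS: List[Segment] = [
    ("exon", 1, 92),
    ("intron", 93, 1075),
    ("exon", 1076, 1394),
    ("intron", 1395, 1475),
    ("exon", 1476, 2441),
]


def segments_are_contiguous(segments: List[Segment]) -> bool:
    """True if each segment starts one nucleotide after the previous one ends."""
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(segments, segments[1:]))


def coding_length(segments: List[Segment], orf_start: int, orf_end: int) -> int:
    """Count exon nucleotides lying within the open reading frame bounds."""
    total = 0
    for kind, start, end in segments:
        if kind != "exon":
            continue
        lo, hi = max(start, orf_start), min(end, orf_end)
        if lo <= hi:
            total += hi - lo + 1
    return total


if __name__ == "__main__":
    length = coding_length(AZUCENA_SEGMENTS, orf_start=3, orf_end=2410)
    print("contiguous:", segments_are_contiguous(AZUCENA_SEGMENTS))
    print("coding nucleotides:", length, "codons:", length // 3)
```

Under these coordinates the exonic portion of the open reading frame is 1344 nucleotides, or 448 codons. If the stated range includes the stop codon, as the Nipponbare description suggests, this corresponds to 447 amino acid residues.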
Similarly, translation of O. sativa (strain Teqing) polynucleotide SEQ ID NO:7 suggests an open reading frame from about nucleotide 21 to about nucleotide 2421, with the first exon spanning nucleotides 1-110 of SEQ ID NO:7, the first intron spanning nucleotides 111-1089 of SEQ ID NO:7, the second exon spanning nucleotides 1090-1405 of SEQ ID NO:7, the second intron spanning nucleotides 1406-1486 of SEQ ID NO:7, and the third exon spanning nucleotides 1487-2461 of SEQ ID NO:7. The open reading frame is represented herein as SEQ ID NO:8, and encodes a polypeptide represented herein as SEQ ID NO:9.
Similarly, polynucleotides SEQ ID NO:10 and SEQ ID NO:11 represent the 5' and 3' ends of the EG307 gene in O. sativa (strain Lemont). SEQ ID NO:10 and SEQ ID NO:11 are joined by an unknown number of nucleotides. In the genomic sequence, there may be insertions/deletions in the non-coding portions of the gene, thus the actual number of nucleotides is unknown, but is believed to be about 10. Translation of O. sativa (strain Lemont) polynucleotides SEQ ID NO:10 and SEQ ID NO:11 suggests an open reading frame from about nucleotide 166 of SEQ ID NO:10 to about nucleotide 1547 of SEQ ID NO:11, with the first exon spanning nucleotides 1-255 of SEQ ID NO:10, the first intron spanning nucleotides 255-451 of SEQ ID NO:10 and nucleotides 1-212 of SEQ ID NO:11, the second exon spanning nucleotides 213-531 of SEQ ID NO:11, the second intron spanning nucleotides 532-612 of SEQ ID NO:11, and the third exon spanning nucleotides 613-1616 of SEQ ID NO:11. The open reading frame is represented herein as SEQ ID NO:12, and encodes a polypeptide represented herein as SEQ ID NO:13.
Similarly, translation of O. sativa (strain IR64) polynucleotide SEQ ID NO:14 suggests an open reading frame from about nucleotide 1 to about nucleotide 2400, with the first exon spanning nucleotides 1-90 of SEQ ID NO:14, the first intron spanning nucleotides 91-1068 of SEQ ID NO:14, the second exon spanning nucleotides 1069-1384 of SEQ ID NO:14, the second intron spanning nucleotides 1385-1465 of SEQ ID NO:14, and the third exon spanning nucleotides 1466-2459 of SEQ ID NO:11. The open reading frame is represented herein as SEQ ID NO:14, and encodes a polypeptide represented herein as SEQ ID NO:15.
Similarly, translation of O. sativa (strain Kasalath) polynucleotide SEQ ID NO:17 suggests an open reading frame from about nucleotide 2 to about nucleotide 2402, with the first exon spanning nucleotides 1-91 of SEQ ID NO:17, the first intron spanning nucleotides 92-1070 of SEQ ID NO:17, the second exon spanning nucleotides 1071-1386 of SEQ ID NO:17, the second intron spanning nucleotides 1387-1467 of SEQ ID NO:17, and the third exon spanning nucleotides 1468-2432 of SEQ ID NO:17.
The open reading frame is represented as SEQ ID NO:18, and encodes a polypeptide represented herein as SEQ ID NO:19. In SEQ ID NO:18, "N" at position 889 is "G", and "N" at position 971 is "A" for strain Kasalath 1, making amino acid residue 297 in SEQ ID NO:19 a valine, and amino acid residue 324 a glutamine. In SEQ ID NO:18, "N" at position 889 is "G", and "N" at position 971 is "T" for strain Kasalath 2, making amino acid residue 297 in SEQ ID NO:19 a valine, and amino acid residue 324 a leucine. In SEQ ID NO:18, "N" at position 889 is "C", and "N" at position 971 is "A" for strain Kasalath 3, making amino acid residue 297 in SEQ ID NO:19 a leucine, and amino acid residue 324 a glutamine. In SEQ ID NO:18, "N" at position 889 is "C", and "N" at position 971 is "T" for strain Kasalath 4, making amino acid residue 297 in SEQ ID NO:19 a leucine, and amino acid residue 324 a leucine.
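The link between the variable nucleotide positions and the affected amino acid residues in the Kasalath strains can be reproduced with simple codon arithmetic. The following Python sketch is illustrative; the names `codon_location`, `KASALATH_BASES` and `kasalath_residues` are hypothetical. The two lookup tables merely restate the base-to-residue assignments given in the text and are not derived from the full codons, whose remaining bases are not reproduced here.

```python
from typing import Dict, Tuple

CODON_LENGTH = 3


def codon_location(nt_position: int) -> Tuple[int, int]:
    """Map a 1-based open reading frame position to
    (1-based codon number, 1-based position within that codon)."""
    if nt_position < 1:
        raise ValueError("nt_position must be 1 or greater")
    zero_based = nt_position - 1
    return zero_based // CODON_LENGTH + 1, zero_based % CODON_LENGTH + 1


# Assignments restated from the text for SEQ ID NO:18 / SEQ ID NO:19.
RESIDUE_297_BY_BASE_889: Dict[str, str] = {"G": "valine", "C": "leucine"}
RESIDUE_324_BY_BASE_971: Dict[str, str] = {"A": "glutamine", "T": "leucine"}

# Bases at positions 889 and 971 for each Kasalath strain, as stated in the text.
KASALATH_BASES: Dict[str, Tuple[str, str]] = {
    "Kasalath 1": ("G", "A"),
    "Kasalath 2": ("G", "T"),
    "Kasalath 3": ("C", "A"),
    "Kasalath 4": ("C", "T"),
}


def kasalath_residues(strain: str) -> Tuple[str, str]:
    """Return the residues at positions 297 and 324 for the given strain."""
    base_889, base_971 = KASALATH_BASES[strain]
    return RESIDUE_297_BY_BASE_889[base_889], RESIDUE_324_BY_BASE_971[base_971]


if __name__ == "__main__":
    print(codon_location(889), codon_location(971))
    for name in KASALATH_BASES:
        print(name, kasalath_residues(name))
```

Position 889 is the first base of codon 297 and position 971 is the second base of codon 324, which agrees with the residue numbers stated for SEQ ID NO:19.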
A preferred O. sativa EG307 polypeptide of the present invention is a polypeptide encoded by a polynucleotide that hybridizes under stringent hybridization conditions with polynucleotides represented by SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, and/or SEQ ID NO:18.
Preferred O. rufipogon EG307 polypeptides of the present invention are polypeptides encoded by an O. rufipogon polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, and/or SEQ ID NO:31. Such an EG307 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide having nucleic acid sequence SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, and/or SEQ ID NO:31.
Polynucleotides SEQ ID NO:27 and SEQ ID NO:28 represent the 5' and 3' ends of the EG307 gene in O. rufipogon (strain 5953). SEQ ID NO:27 and SEQ ID NO:28 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 23.
Translation of SEQ ID NO:27 and SEQ ID NO:28 suggests that the O. rufipogon polynucleotide includes an open reading frame. The reading frame encodes an O. rufipogon EG307 polypeptide of about 446 amino acids, the deduced amino acid sequence of which is represented herein as SEQ ID NO:30, assuming an open reading frame having an initiation (start) codon spanning from about nucleotide 18 through about nucleotide 20 of SEQ ID NO:27 and a termination (stop) codon spanning from about nucleotide 1330 through about nucleotide 1332 of SEQ ID NO:28, with the first exon spanning nucleotides 1-107 of SEQ ID NO:27, no first intron, the second exon spanning nucleotides 1-316 of SEQ ID NO:28, the second intron spanning nucleotides 317-397 of SEQ ID NO:28, and the third exon spanning nucleotides 398-1332 of SEQ ID NO:28. The open reading frame from nucleotide 18 of SEQ ID NO:27 through about nucleotide 1332 of SEQ ID NO:28 is represented herein as SEQ ID NO:29.
Similarly, translation of O. rufipogon (strain 5948) polynucleotide SEQ ID NO:20 suggests an open reading frame from about 15 nucleotides 5' of nucleotide 1 to about nucleotide 2385, with the first exon not represented, the first intron spanning nucleotides 1-1053 of SEQ ID NO:20, the second exon spanning nucleotides 1054-1369 of SEQ ID NO:20, the second intron spanning nucleotides 1370-1450 of SEQ ID NO:20, and the third exon spanning nucleotides 1451-2447 of SEQ ID NO:20. The open reading frame is represented herein as SEQ ID NO:21, and encodes a polypeptide represented herein as SEQ ID NO:22.
Similarly, polynucleotides SEQ ID NO:23 and SEQ ID NO:24 represent the 5' and 3' ends of the EG307 gene in O. rufipogon (strain 5949). SEQ ID NO:23 and SEQ ID NO:24 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 13. Translation of SEQ ID NO:23 and SEQ ID NO:24 suggests an open reading frame from about nucleotide 57 of SEQ ID NO:23 to about nucleotide 1562 of SEQ ID NO:24, with the first exon spanning nucleotides 1-146 of SEQ ID NO:23, the first intron spanning nucleotides 1-230 of SEQ ID NO:24, the second exon spanning nucleotides 231-546 of SEQ ID NO:24, the second intron spanning nucleotides 547-627 of SEQ ID NO:24, and the third exon spanning nucleotides 628-1615 of SEQ ID NO:24. The open reading frame is represented as SEQ ID NO:25, and encodes a polypeptide represented herein as SEQ ID NO:26.
Similarly, translation of O. rufipogon (strain IRCG 105491) polynucleotide SEQ ID NO:90 suggests an open reading frame from about nucleotide 1 to about nucleotide 1341. The open reading frame is represented herein as SEQ ID NO:31, and encodes a polypeptide represented herein as SEQ ID NO:32.
A preferred O. rufipogon EG307 polypeptide of the present invention is a polypeptide encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide represented by SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, and/or SEQ ID NO:31.
One preferred Zea mays parviglumis EG307 polypeptide of the present invention is a polypeptide encoded by a Zea mays parviglumis polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:59, and/or SEQ ID NO:78. Such an EG307 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide having nucleic acid sequence SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:59, and/or SEQ ID NO:78.
Translation of SEQ ID NO:66 suggests that the Zea mays parviglumis EG307 polynucleotide (strain Benz) includes an open reading frame. The reading frame encodes a Zea mays parviglumis EG307 polypeptide of about 448 amino acids, the deduced amino acid sequence of which is represented herein as SEQ ID NO:68, assuming an open reading frame having an initiation (start) codon spanning from about nucleotide 1 through about nucleotide 3 of SEQ ID NO:66 and a termination (stop) codon spanning from about nucleotide through about nucleotide 2571 of SEQ ID NO:66, with the first exon spanning nucleotides 1-81 of SEQ ID NO:66, the first intron spanning nucleotides 82-1204 of SEQ ID NO:66, the second exon spanning nucleotides 1205-1517 of SEQ ID NO:66, the second intron spanning nucleotides 1518-1618 of SEQ ID NO:66, and the third exon spanning nucleotides of SEQ ID NO:66. The open reading frame from nucleotide 3 through about nucleotide 2571 of SEQ ID NO:66 is represented herein as SEQ ID NO:67.

Similarly, polynucleotides SEQ ID NO:69 and SEQ ID NO:70 represent the 5' and 3' ends of the EG307 gene in Z. mays parviglumis (strain BK4). SEQ ID NO:69 and SEQ ID NO:70 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 10. Translation of Z. mays parviglumis (strain BK4) polynucleotides SEQ ID NO:69 and SEQ ID NO:70 suggests an open reading frame from about nucleotide 10 of SEQ ID NO:69 to about nucleotide 1728 of SEQ ID NO:70, with the first exon spanning nucleotides 1-90 of SEQ ID NO:69, the first intron spanning nucleotides 91-586 of SEQ ID NO:69 and nucleotides 1-361 of SEQ ID NO:70, the second exon spanning nucleotides 362-674 of SEQ ID NO:70, the second intron spanning nucleotides 675-775 of SEQ ID NO:70, and the third exon spanning nucleotides 776-1775 of SEQ ID NO:11. The open reading frame is represented as SEQ ID NO:71, and encodes a polypeptide represented herein as SEQ ID NO:72.
Similarly, polynucleotides SEQ ID NO:73 and SEQ ID NO:74 represent the 5' and 3' ends of the EG307 gene in Z. mays parviglumis (strain IA19). SEQ ID NO:73 and SEQ ID NO:74 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 12. Translation of Z. mays parviglumis (strain IA19) polynucleotides SEQ ID NO:73 and SEQ ID NO:74 suggests an open reading frame from about nucleotide 69 of SEQ ID NO:73 to about nucleotide 1280 of SEQ ID NO:74, with the first exon spanning nucleotides 1-149 of SEQ ID NO:73, the first intron spanning nucleotides 150-305 of SEQ ID NO:73, the second exon spanning nucleotides 1-226 of SEQ ID NO:74, the second intron spanning nucleotides 227-327 of SEQ ID NO:74, and the third exon spanning nucleotides 328-1309 of SEQ ID NO:74. The open reading frame is represented herein as SEQ ID NO:75, and encodes a polypeptide represented herein as SEQ ID NO:76.
Similarly, polynucleotides SEQ ID NO:77 and SEQ ID NO:59 represent the 5' and 3' ends of the EG307 gene in Z. mays parviglumis (strain Wilkes). SEQ ID NO:77 and SEQ ID NO:59 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 14. Translation of Z. mays parviglumis (strain Wilkes) polynucleotides SEQ ID NO:77 and SEQ ID NO:59 suggests an open reading frame from about nucleotide 36 of SEQ ID NO:77 to about nucleotide 1598 of SEQ ID NO:59, with the first exon spanning nucleotides 1-86 of SEQ ID NO:77, the first intron spanning nucleotides 1-231 of SEQ ID NO:59, the second exon spanning nucleotides 232-544 of SEQ ID NO:59, the second intron spanning nucleotides 545-645 of SEQ ID NO:59, and the third exon spanning nucleotides 656-1640 of SEQ ID NO:59. The open reading frame is represented herein as SEQ ID NO:78, and encodes a polypeptide represented herein as SEQ ID NO:79.
A preferred EG307 polypeptide of the present invention is a polypeptide encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide represented by SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, and/or SEQ ID NO:64.
One preferred Zea mays mays EG307 polypeptide of the present invention is a polypeptide encoded by a Zea mays mays polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, and/or SEQ ID NO:64. Such an EG307 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide having nucleic acid sequence SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, and/or SEQ ID NO:64.
Polynucleotides SEQ ID NO:33 and SEQ ID NO:34 represent the 5' and 3' ends of the EG307 gene in Z. mays mays (strain BS7). SEQ ID NO:33 and SEQ ID NO:34 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 21.
Translation of SEQ ID NO:33 and SEQ ID NO:34 suggests that the Zea mays mays polynucleotide includes an open reading frame. The reading frame encodes a Zea mays mays EG307 polypeptide of about 448 amino acids, the deduced amino acid sequence of which is represented herein as SEQ ID NO:36, assuming an open reading frame having an initiation (start) codon spanning from about nucleotide 3 through about nucleotide 5 of SEQ ID NO:33 and a termination (stop) codon spanning from about nucleotide 1396 through about nucleotide 1398 of SEQ ID NO:34, with the first exon spanning nucleotides 1-83 of SEQ ID NO:33, the first intron spanning nucleotides 84-180 of SEQ ID NO:33 and nucleotides 1-31 of SEQ ID NO:34, the second exon spanning nucleotides 32-344 of SEQ ID NO:34, the second intron spanning nucleotides 345-445 of SEQ ID NO:34, and the third exon spanning nucleotides 446-1447 of SEQ ID NO:34. The open reading frame from nucleotide 3 of SEQ ID NO:33 through about nucleotide 1398 of SEQ ID NO:34 is represented herein as SEQ ID NO:35.
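Where a gene is described by two sequence fragments, the coding length can still be tallied from the exon coordinates, because the unsequenced gap between the fragments falls within the first intron. The following Python sketch is illustrative only; the names `BS7_CODING_SPANS`, `total_coding_length` and `residues_from_coding_length` are hypothetical, and the coordinates are those stated above for strain BS7, with the first exon trimmed to the start codon and the third exon trimmed to the stop codon.

```python
from typing import List, Tuple

# (sequence identifier, first coding nucleotide, last coding nucleotide),
# 1-based and inclusive, taken from the BS7 description in the text.
CodingSpan = Tuple[str, int, int]

BS7_CODING_SPANS: List[CodingSpan] = [
    ("SEQ ID NO:33", 3, 83),      # first exon, from the start codon onward
    ("SEQ ID NO:34", 32, 344),    # second exon
    ("SEQ ID NO:34", 446, 1398),  # third exon, through the stop codon
]


def total_coding_length(spans: List[CodingSpan]) -> int:
    """Sum the lengths of all coding spans, whichever fragment they lie on."""
    return sum(end - start + 1 for _, start, end in spans)


def residues_from_coding_length(length: int, includes_stop: bool = True) -> int:
    """Convert a coding length in nucleotides to a residue count."""
    if length % 3 != 0:
        raise ValueError("coding length is not a whole number of codons")
    codons = length // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    length = total_coding_length(BS7_CODING_SPANS)
    print("coding nucleotides:", length)
    print("residues:", residues_from_coding_length(length))
```

With these coordinates the coding length is 1347 nucleotides, or 449 codons including the stop codon, which agrees with the stated polypeptide length of about 448 amino acids.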
Similarly, translation of Z. mays mays (strain HuoBai) polynucleotide SEQ ID NO:37 suggests an open reading frame from about nucleotide 28 to about nucleotide 2599, with the first exon spanning nucleotides 1-108 of SEQ ID NO:37, the first intron spanning nucleotides 109-1232 of SEQ ID NO:37, the second exon spanning nucleotides 1233-1545 of SEQ ID NO:37, the second intron spanning nucleotides 1546-1646 of SEQ ID NO:37, and the third exon spanning nucleotides 1647-2646 of SEQ ID NO:37. The open reading frame is represented herein as SEQ ID NO:38, and encodes a polypeptide represented herein as SEQ ID NO:39.
Similarly, polynucleotides SEQ ID NO:40 and SEQ ID NO:41 represent the 5' and 3' ends of the EG307 gene in Z. mays mays (strain Makki). SEQ ID NO:40 and SEQ ID NO:41 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 20. Translation of Z. mays mays (strain Makki) polynucleotides SEQ ID NO:40 and SEQ ID NO:41 suggests an open reading frame from about nucleotide 61 of SEQ ID NO:40 to about nucleotide 2263 of SEQ ID NO:41, with the first exon spanning nucleotides 1-141 of SEQ ID NO:40, the first intron spanning nucleotides 142-262 of SEQ ID NO:40 and nucleotides 1-896 of SEQ ID NO:41, the second exon spanning nucleotides 897-1209 of SEQ ID NO:41, the second intron spanning nucleotides 1210-1310 of SEQ ID NO:41, and the third exon spanning nucleotides 1311-2311 of SEQ ID NO:41. The open reading frame is represented as SEQ ID NO:42, and encodes a polypeptide represented herein as SEQ ID NO:43.
Similarly, polynucleotides SEQ ID NO:44, SEQ ID NO:45 and SEQ ID NO:46 represent the three parts of the EG307 gene in Z. mays mays (strain Min13), from the 5' end to the 3' end. SEQ ID NO:44, SEQ ID NO:45 and SEQ ID NO:46 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be 19 between SEQ ID NO:44 and SEQ ID NO:45, and 17 between SEQ ID NO:45 and SEQ ID NO:46. Translation of Z. mays mays (strain Min13) polynucleotides SEQ ID NO:44, SEQ ID NO:45 and SEQ ID NO:46 suggests an open reading frame from about nucleotide 45 of SEQ ID NO:44 to about nucleotide 1741 of SEQ ID NO:46, with the first exon spanning nucleotides 1-125 of SEQ ID NO:44, the first intron spanning nucleotides 1-198 of SEQ ID NO:45 and nucleotides 1-374 of SEQ ID NO:46, the second exon spanning nucleotides 375-687 of SEQ ID NO:46, the second intron spanning nucleotides 688-788 of SEQ ID NO:46, and the third exon spanning nucleotides 789-1787 of SEQ ID NO:46. The open reading frame is represented herein as SEQ ID NO:47, and encodes a polypeptide represented herein as SEQ ID NO:48.
Similarly, polynucleotides SEQ ID NO:49 and SEQ ID NO:50 represent the 5' and 3' ends of the EG307 gene in Z. mays mays (strain Pira). SEQ ID NO:49 and SEQ ID NO:50 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene. Translation of Z. mays mays (strain Pira) polynucleotides SEQ ID NO:49 and SEQ ID NO:50 suggests an open reading frame from about nucleotide 31 of SEQ ID NO:49 to about nucleotide 1722 of SEQ ID NO:50, with the first exon spanning nucleotides 1-111 of SEQ ID NO:49, the first intron spanning nucleotides 112-495 of SEQ ID NO:49 and nucleotides 1-355 of SEQ ID NO:50, the second exon spanning nucleotides 356-668 of SEQ ID NO:50, the second intron spanning nucleotides 669-769 of SEQ ID NO:50, and the third exon spanning nucleotides 770-1768 of SEQ ID NO:50. The open reading frame is represented herein as SEQ ID NO:51, and encodes a polypeptide represented herein as SEQ ID NO:52.
Similarly, polynucleotides SEQ ID NO:53 and SEQ ID NO:54 represent the 5' and 3' ends of the EG307 gene in Z. mays mays (strain Sari). SEQ ID NO:53 and SEQ ID NO:54 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 22. Translation of Z. mays mays (strain Sari) polynucleotides SEQ ID NO:53 and SEQ ID NO:54 suggests an open reading frame from about nucleotide 19 of SEQ ID NO:53 to about nucleotide 1756 of SEQ ID NO:54, with the first exon spanning nucleotides 1-99 of SEQ ID NO:53, the first intron spanning nucleotides 100-212 of SEQ ID NO:53 and nucleotides 1-389 of SEQ ID NO:54, the second exon spanning nucleotides 390-702 of SEQ ID NO:54, the second intron spanning nucleotides 703-803 of SEQ ID NO:54, and the third exon spanning nucleotides 804-1803 of SEQ ID NO:54. The open reading frame is represented herein as SEQ ID NO:55, and encodes a polypeptide represented herein as SEQ ID NO:56.
Similarly, polynucleotides SEQ ID NO:57 and SEQ ID NO:58 represent the 5' and 3' ends of the EG307 gene in Z. mays mays (strain Smena). SEQ ID NO:57 and SEQ ID NO:58 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be 14. Translation of Z. mays mays (strain Smena) polynucleotides SEQ ID NO:57 and SEQ ID NO:58 suggests an open reading frame from about nucleotide 68 of SEQ ID NO:57 to about nucleotide 2199 of SEQ ID NO:58, with the first exon spanning nucleotides 1-148 of SEQ ID NO:57, the first intron spanning nucleotides 149-305 of SEQ ID NO:57 and nucleotides 1-834 of SEQ ID NO:58, the second exon spanning nucleotides 835-1147 of SEQ ID NO:58, the second intron spanning nucleotides 1148-1248 of SEQ ID NO:58, and the third exon spanning nucleotides 1249-2208 of SEQ ID NO:58. Additionally, sequence SEQ ID NO:59 contains a deletion starting after nucleotide 738 of SEQ ID NO:59. The open reading frame is represented herein as SEQ ID NO:60, and encodes a polypeptide represented herein as SEQ ID NO:61.
Similarly, polynucleotides SEQ ID NO:62 and SEQ ID NO:63 represent the 5' and 3' ends of the EG307 gene in Z. mays mays (strain W22). SEQ ID NO:62 and SEQ ID NO:63 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 22. Translation of Z. mays mays (strain W22) polynucleotides SEQ ID NO:62 and SEQ ID NO:63 suggests an open reading frame from about nucleotide 1 of SEQ ID NO:62 to about nucleotide 1367 of SEQ ID NO:63, with the first exon spanning nucleotides 1-81 of SEQ ID NO:62, the first intron spanning nucleotides 82-893 of SEQ ID NO:62, the second exon spanning nucleotides 1-313 of SEQ ID NO:63, the second intron spanning nucleotides 314-414 of SEQ ID NO:63, and the third exon spanning nucleotides 415-1411 of SEQ ID NO:63. The open reading frame is represented herein as SEQ ID NO:64, and encodes a polypeptide represented herein as SEQ ID NO:65.
A preferred Z. mays mays EG307 polypeptide of the present invention is a polypeptide encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide represented by SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, and/or SEQ ID NO:64.
A preferred O. rufipogon EG307 polypeptide of the present invention is a polypeptide encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide represented by SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, and/or SEQ ID NO:31.
One preferred Zea diploperennis EG307 polypeptide of the present invention is a polypeptide encoded by a Zea mays parviglumis polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:80, SEQ ID NO:81, and/or SEQ ID NO:82. Such an EG307 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide having nucleic acid sequence SEQ ID NO:80, SEQ ID NO:81, and/or SEQ ID NO:82.
Polynucleotides SEQ ID NO:80 and SEQ ID NO:81 represent the 5' and 3' ends of the EG307 gene in Z. diploperennis. SEQ ID NO:80 and SEQ ID NO:81 are joined by a number of nucleotides, the exact number of which is unknown due to potential insertions/deletions in the non-coding portions of the gene, but is believed to be about 24. One preferred Zea diploperennis EG307 polypeptide of the present invention is a polypeptide encoded by a Zea diploperennis polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:80 and SEQ ID NO:81. Such an EG307 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide having nucleic acid sequence SEQ ID NO:80 and SEQ ID NO:81.
Translation of SEQ ID NO:80 and SEQ ID NO:81 suggests that the Zea diploperennis EG307 polynucleotide includes an open reading frame. The reading frame encodes a Zea diploperennis EG307 polypeptide of about 448 amino acids, the deduced amino acid sequence of which is represented herein as SEQ ID NO:83, assuming an open reading frame having an initiation (start) codon spanning from about nucleotide 21 through about nucleotide 23 of SEQ ID NO:80 and a termination (stop) codon spanning from about nucleotide 1656 through about nucleotide 1658 of SEQ ID NO:81, with the first exon spanning nucleotides 1-101 of SEQ ID NO:80, the first intron spanning nucleotides 102-225 of SEQ ID NO:80 and nucleotides 1-291 of SEQ ID NO:81, the second exon spanning nucleotides 292-313 of SEQ ID NO:81, the second intron spanning nucleotides 314-705 of SEQ ID NO:81, and the third exon spanning nucleotides 706-1672 of SEQ ID NO:81. The open reading frame from nucleotide 21 of SEQ ID NO:80 through about nucleotide 1658 of SEQ ID NO:81 is represented herein as SEQ ID NO:82.
A preferred Z. diploperennis EG307 polypeptide of the present invention is a polypeptide encoded by a polynucleotide that hybridizes under stringent hybridization conditions with polynucleotides represented by SEQ ID NO:80, SEQ ID NO:81, and/or SEQ ID NO:82.
One preferred Zea luxurians EG307 polypeptide of the present invention is a polypeptide encoded by a Zea luxurians polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:84 and/or SEQ ID NO:85. Such an EG307 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions with a polynucleotide having nucleic acid sequence SEQ ID NO:84 and/or SEQ ID NO:85.
Translation of SEQ ID NO:84 suggests that the Zea luxurians EG307 polynucleotide includes an open reading frame. The reading frame encodes a Zea luxurians polypeptide of about 448 amino acids, the deduced amino acid sequence of which is represented herein as SEQ ID NO:86, assuming an open reading frame having an initiation (start) codon spanning from about nucleotide 5 through about nucleotide 7 of SEQ ID NO:84 and a termination (stop) codon spanning from about nucleotide 2365 through about nucleotide 2367 of SEQ ID NO:84, with the first exon spanning nucleotides 1-85 of SEQ ID NO:84, the first intron spanning nucleotides 86-998 of SEQ ID NO:84, the second exon spanning nucleotides 999-1311 of SEQ ID NO:84, the second intron spanning nucleotides of SEQ ID NO:84, and the third exon spanning nucleotides 1415-2423 of SEQ ID NO:84.
The open reading frame from nucleotide 5 through about nucleotide 2367 of SEQ ID NO:84 is represented herein as SEQ ID NO:85.
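By way of illustration only, the relationship between the exon coordinates recited above for SEQ ID NO:84 and the stated polypeptide length of about 448 amino acids can be checked with a short calculation. The following Python sketch is not part of the disclosed methods; the function name `coding_length` and its parameters are assumptions introduced for this example, and coordinates are treated as 1-based and inclusive.

```python
# Illustrative sketch only; names and layout are assumptions, not part of
# the disclosure. Coordinates are 1-based and inclusive.

def coding_length(exons, start_first, stop_last):
    """Count exon nucleotides lying between the first base of the start
    codon and the last base of the stop codon (both inclusive)."""
    total = 0
    for exon_start, exon_end in exons:
        lo = max(exon_start, start_first)
        hi = min(exon_end, stop_last)
        if lo <= hi:
            total += hi - lo + 1
    return total

# Exon, start codon and stop codon positions as recited for SEQ ID NO:84.
LUXURIANS_EXONS = [(1, 85), (999, 1311), (1415, 2423)]
coding_nt = coding_length(LUXURIANS_EXONS, start_first=5, stop_last=2367)
codons = coding_nt // 3
residues = codons - 1  # the stop codon encodes no amino acid

print(coding_nt, codons, residues)
```

With the coordinates recited above, the calculation gives 1347 coding nucleotides, that is, 449 codons including the stop codon, which is consistent with a polypeptide of about 448 amino acids.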
A preferred Z. luxurians EG307 polypeptide of the present invention is a polypeptide encoded by a polynucleotide that hybridizes under stringent hybridization conditions with polynucleotides represented by SEQ ID NO:84 and/or SEQ ID NO:85.
Comparison of the various O. sativa, O. rufipogon, Z. mays mays, Z. mays parviglumis, Z. diploperennis, and Z. luxurians EG307 nucleic acid sequences and amino acid sequences indicates that these species of plants possess similar EG307 genes and polypeptides. The nucleotide sequences of the coding region of EG307 from the various strains of O. sativa and O. rufipogon have 99.0% sequence identity when compared to each other, which makes clear that they are homologous. All rice sequences, both ancestral and modern, share the same stop codon (TAG), and, for the 5' UTR sequence that we have collected to date, the 5' UTR sequences have 98.4% sequence identity. The protein sequences of the various strains of O. sativa and O. rufipogon have 98.2% sequence identity, again demonstrating that these are homologous sequences. The protein sequence of EG307 from rice is about 94% identical to the protein sequence of EG307 from maize, again demonstrating their homology. The protein sequences of maize EG307 and teosinte EG307 have 99.8% sequence identity.
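For illustration, a percent identity figure of the kind quoted above can be computed from a pairwise alignment as sketched below. The specification does not state which alignment program or identity definition was used, so this sketch assumes one common convention: identical positions divided by aligned positions in which neither sequence has a gap. The function name `percent_identity` and the short test strings are hypothetical.

```python
# Illustrative sketch only. The identity definition used here (matches over
# ungapped aligned columns) is an assumption; the specification does not
# say how its identity values were calculated.

def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return percent identity for two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # columns containing a gap are not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Hypothetical toy alignment: three of four ungapped columns match.
print(percent_identity("ACGT", "ACGA"))
```

Other conventions, such as counting gap columns in the denominator, give different values for the same alignment, so the convention should be stated whenever identity thresholds are applied.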
Finding this degree of identity between O. sativa, O. rufipogon, Z. mays mays, Z. mays parviglumis, Z. diploperennis, and Z. luxurians EG307 nucleic acid sequences and amino acid sequences supports the ability to obtain any plant EG307 polypeptide and polynucleotide given the polypeptide and nucleic acid sequences disclosed herein.
These plant EG307 polypeptides, and the polynucleotides that encode them, represent novel compounds with utility in increasing yield in a plant.
Preferred plant EG307 polypeptides of the present invention include polypeptides comprising amino acid sequences that are at least about 30%, preferably at least about 50%, more preferably at least about 75%, more preferably at least about 80%, more preferably at least about 85%, more preferably at least about 90%, more preferably at least about 95%, and more preferably at least about 98% identical to one or more of the amino acid sequences disclosed herein for O. sativa, O. rufipogon, Z. mays mays, Z. mays parviglumis, Z. diploperennis, and Z. luxurians EG307 polypeptides of the present invention. More preferred plant EG307 polypeptides of the present invention include: polypeptides encoded by at least a portion of SEQ ID NO:1 and/or SEQ ID NO:2 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:3; polypeptides encoded by at least a portion of SEQ ID NO:4, SEQ ID NO:81 and/or SEQ ID NO:5 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:6; polypeptides encoded by at least a portion of SEQ ID NO:7 and/or SEQ ID NO:8 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:9; polypeptides encoded by at least a portion of SEQ ID NO:10, SEQ ID NO:11, and/or SEQ ID NO:12 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:13; polypeptides encoded by at least a portion of SEQ ID NO:14 and/or SEQ ID NO:15 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:16; polypeptides encoded by at least a portion of SEQ ID NO:17 and/or SEQ ID NO:18 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:19; polypeptides encoded by at least a portion of SEQ ID NO:20 and/or SEQ ID NO:21 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:22; polypeptides encoded by at least a portion of SEQ ID NO:23, SEQ ID NO:24, and/or SEQ ID NO:25 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:26; polypeptides encoded by at least a portion of SEQ ID NO:27, SEQ ID NO:28 and/or SEQ ID NO:29 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:30; polypeptides encoded by at least a portion of SEQ ID NO:90 and/or SEQ ID NO:31 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:32; polypeptides encoded by at least a portion of SEQ ID NO:33, SEQ ID NO:34 and/or SEQ ID NO:35 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:36; polypeptides encoded by at least a portion of SEQ ID NO:37 and/or SEQ ID NO:38 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:39; polypeptides encoded by at least a portion of SEQ ID NO:40, SEQ ID NO:41, and/or SEQ ID NO:42 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:43; polypeptides encoded by at least a portion of SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, and/or SEQ ID NO:47 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:48; polypeptides encoded by at least a portion of SEQ ID NO:49, SEQ ID NO:50, and/or SEQ ID NO:51 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:52; polypeptides encoded by at least a portion of SEQ ID NO:53, SEQ ID NO:54, and/or SEQ ID NO:55 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:56; polypeptides encoded by at least a portion of SEQ ID NO:57, SEQ ID NO:58, and/or SEQ ID NO:60 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:61; polypeptides encoded by at least a portion of SEQ ID NO:62, SEQ ID NO:63, and/or SEQ ID NO:64 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:65; polypeptides encoded by at least a portion of SEQ ID NO:66 and/or SEQ ID NO:67 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:68; polypeptides encoded by at least a portion of SEQ ID NO:69, SEQ ID NO:70, and/or SEQ ID NO:71 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:72; polypeptides encoded by at least a portion of SEQ ID NO:73, SEQ ID NO:74, and/or SEQ ID NO:75 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:76; polypeptides encoded by at least a portion of SEQ ID NO:77, SEQ ID NO:59, and/or SEQ ID NO:78 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:79; polypeptides encoded by at least a portion of SEQ ID NO:80, SEQ ID NO:81, and/or SEQ ID NO:82 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:83; and polypeptides encoded by at least a portion of SEQ ID NO:84 and/or SEQ ID NO:85 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:86.
As used herein, "at least a portion" of a polynucleotide or polypeptide means a portion having the minimal size characteristics of such sequences, as described above, or any larger fragment of the full length molecule, up to and including the full length molecule. For example, a portion of a polynucleotide may be 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides, and so on, going up to the full length polynucleotide. Similarly, a portion of a polypeptide may be 4 amino acids, 5 amino acids, 6 amino acids, 7 amino acids, and so on, going up to the full length polypeptide. The length of the portion to be used will depend on the particular application. As discussed above, a portion of a polynucleotide useful as a hybridization probe may be as short as 12 nucleotides. A portion of a polypeptide useful as an epitope may be as short as 4 amino acids. A portion of a polypeptide that performs the function of the full-length polypeptide would generally be longer than 4 amino acids.
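As a purely illustrative aid, the size limits stated above for "at least a portion" (12 nucleotides for a polynucleotide, 4 amino acids for a polypeptide, up to the full length molecule) can be expressed as a simple check. The sketch below assumes that a portion is a contiguous fragment of the full length sequence; the function name `is_portion` and the toy sequence are hypothetical and do not correspond to any SEQ ID NO disclosed herein.

```python
# Illustrative sketch only. Treating a "portion" as a contiguous substring
# is an assumption made for this example; the sequences are invented.

MIN_PORTION_LENGTH = {"polynucleotide": 12, "polypeptide": 4}

def is_portion(fragment, full_length, kind):
    """True if fragment meets the minimal size for its kind and occurs
    contiguously within the full length sequence."""
    minimum = MIN_PORTION_LENGTH[kind]
    return len(fragment) >= minimum and fragment in full_length

TOY_POLYNUCLEOTIDE = "ATGGCGTACCTGAAGGTC"  # hypothetical 18-nucleotide sequence

print(is_portion("GCGTACCTGAAG", TOY_POLYNUCLEOTIDE, "polynucleotide"))  # 12 nt
print(is_portion("GCGTACCTGAA", TOY_POLYNUCLEOTIDE, "polynucleotide"))   # 11 nt
```

The full length sequence itself satisfies the check, in keeping with the statement that a portion extends "up to and including the full length molecule."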
Particularly preferred plant EG307 polypeptides of the present invention are polypeptides that include SEQ ID NO:3, SEQ ID NO:6, SEQ ID NO:9, SEQ ID NO:13, SEQ ID NO:16, SEQ ID NO:19, SEQ ID NO:22, SEQ ID NO:26, SEQ ID NO:30, SEQ ID NO:32, SEQ ID NO:36, SEQ ID NO:39, SEQ ID NO:43, SEQ ID NO:48, SEQ ID NO:52, SEQ ID NO:56, SEQ ID NO:61, SEQ ID NO:65, SEQ ID NO:68, SEQ ID NO:72, SEQ ID NO:76, SEQ ID NO:79, SEQ ID NO:83 and/or SEQ ID NO:86 (including, but not limited to, the encoded polypeptides, full-length polypeptides, processed polypeptides, fusion polypeptides and multivalent polypeptides thereof) as well as polypeptides that are truncated homologues of polypeptides that include at least portions of the aforementioned SEQ ID NOs. Examples of methods to produce such polypeptides are disclosed herein, including in the Examples section.
B. EG1117 Polypeptides
One embodiment of the present invention is an isolated plant EG1117 polypeptide. As used herein, an EG307 polypeptide, in one embodiment, is a polypeptide that is related to (i.e., bears structural similarity to) the O. rufipogon polypeptide of about 552 amino acids and having the sequence depicted in Figure 7 (SEQ ID NO:95). The original identification of such a polypeptide is detailed in the Examples. A preferred EG1117 polypeptide is encoded by a polynucleotide that hybridizes under stringent hybridization conditions to at least one of the following genes: (a) a gene encoding an O. sativa EG1117 polypeptide (i.e., an O. sativa gene); (b) a gene encoding an O. rufipogon EG1117 polypeptide (i.e., an O. rufipogon gene); (c) a gene encoding a Zea mays mays EG1117 polypeptide; and (d) a gene encoding a Zea mays parviglumis EG1117 polypeptide (i.e., a Z. mays parviglumis gene).
As used herein, an O. sativa EG1117 gene includes all nucleic acid sequences related to a natural O. sativa EG307 gene such as regulatory regions that control production of the O. sativa EG1117 polypeptide encoded by that gene (such as, but not limited to, transcription, translation or post-translation control regions) as well as the coding region itself. In one embodiment, an O. sativa EG1117 gene includes the nucleic acid sequence SEQ ID NO:4. Nucleic acid sequence SEQ ID NO:4 represents the deduced sequence of a cDNA (complementary DNA) polynucleotide, the production of which is disclosed in the Examples. It should be noted that, since nucleic acid sequencing technology is not entirely error-free, SEQ ID NO:4 (as well as other sequences presented herein), at best, represents an apparent nucleic acid sequence of the polynucleotide encoding an O. sativa EG307 polypeptide of the present invention.
In another embodiment, an O. sativa EG1117 gene can be an allelic variant that includes a similar but not identical sequence to SEQ ID NO:92 and/or SEQ ID NO:93.
An EG1117 polypeptide of the present invention may be identified by its ability to perform the function of natural EG1117 in a functional assay. By "natural polypeptide," it is meant the full length EG1117 polypeptide of O. sativa, O. rufipogon, Z. mays mays, and/or Z. mays parviglumis. The phrase "capable of performing the function of a natural EG1117 in a functional assay" means that the polypeptide has at least about 10% of the activity of the natural polypeptide in the functional assay. In other preferred embodiments, the EG1117 polypeptide has at least about 20%, at least about 30%, at least about 40%, at least about 50%, or at least about 60% of the activity of the natural polypeptide in the functional assay. In more preferred embodiments, the polypeptide has at least about 70%, at least about 80%, or at least about 90% of the activity of the natural polypeptide in the functional assay. Examples of functional assays include antibody-binding assays, or yield-increasing assays, as detailed elsewhere in this specification.
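By way of example only, the activity thresholds recited above (at least about 10% through at least about 90% of the activity of the natural polypeptide) can be applied to assay readings as sketched below. The specification does not prescribe any particular assay readout or calculation; the function name `highest_threshold_met`, the treatment of "about" as an exact cutoff, and the numerical readings are assumptions made for illustration.

```python
# Illustrative sketch only. Assay readings are hypothetical, and "about" is
# simplified to an exact cutoff for the purposes of this example.

ACTIVITY_THRESHOLDS = (10, 20, 30, 40, 50, 60, 70, 80, 90)  # percent

def highest_threshold_met(test_activity, natural_activity):
    """Return the highest recited threshold (in percent) reached by the test
    polypeptide relative to the natural polypeptide, or None if below 10%."""
    if natural_activity <= 0:
        raise ValueError("natural activity must be positive")
    percent = 100.0 * test_activity / natural_activity
    met = [t for t in ACTIVITY_THRESHOLDS if percent >= t]
    return met[-1] if met else None

# Hypothetical readings in arbitrary units of the same functional assay.
print(highest_threshold_met(45, 100))   # reaches the 40% tier
print(highest_threshold_met(5, 100))    # below the 10% tier
```

Under this simplification, a polypeptide returning None would not be regarded as capable of performing the function of the natural polypeptide in that assay.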
As used herein, an isolated plant EG1117 polypeptide can be a full-length polypeptide or any homologue of such a polypeptide. In one embodiment, when the homologue is administered to an animal as an immunogen, using techniques known to those skilled in the art, the animal will produce a humoral and/or cellular immune response against at least one epitope of a natural EG1117 polypeptide. EG1117 homologues can also be selected by their ability to perform the function of EG1117 in a functional assay.

Plant EG1117 polypeptide homologues can be the result of natural allelic variation or natural mutation. EG1117 polypeptide homologues of the present invention can also be produced using techniques known in the art including, but not limited to, direct modifications to the polypeptide or modifications to the gene encoding the polypeptide using, for example, classic or recombinant DNA techniques to effect random or targeted mutagenesis.
In accordance with the present invention, a mimetope refers to any compound that is able to mimic the ability of an isolated plant EG307 polypeptide of the present invention to perform the function of an EG307 polypeptide of the present invention in a functional assay.
Examples of mimetopes include, but are not limited to, anti-idiotypic antibodies or fragments thereof, that include at least one binding site that mimics one or more epitopes of an isolated polypeptide of the present invention; non-polypeptideaceous immunogenic portions of an isolated polypeptide (e.g., carbohydrate structures); and synthetic or natural organic molecules, including nucleic acids, that have a structure similar to at least one epitope of an isolated polypeptide of the present invention. Such mimetopes can be designed using computer-generated structures of polypeptides of the present invention.
Mimetopes can also be obtained by generating random samples of molecules, such as oligonucleotides, peptides or other organic molecules, and screening such samples by affinity chromatography techniques using the corresponding binding partner.
The minimal size of an EG307 polypeptide homologue of the present invention is a size sufficient to be encoded by a polynucleotide capable of forming a stable hybrid with the complementary sequence of a polynucleotide encoding the corresponding natural polypeptide.
Minimal size characteristics are disclosed herein.
Any plant EG1117 polypeptide is a suitable polypeptide of the present invention.
Suitable plants from which to isolate EG1117 polypeptides (including isolation of the natural polypeptide or production of the polypeptide by recombinant or synthetic techniques) include those described in the section entitled "EG307 Polypeptides."
A preferred plant EG1117 polypeptide of the present invention is a compound that when expressed or modulated in a plant, is capable of increasing the yield of the plant.
One embodiment of the present invention is a fusion polypeptide that includes an EG1117 polypeptide-containing domain attached to a fusion segment.
Preferred plant EG1117 polypeptides of the present invention are rice EG1117 polypeptides and maize EG1117 polypeptides. More preferred EG1117 polypeptides are O. sativa, O. rufipogon, Z. mays mays, and Zea mays parviglumis EG1117 polypeptides. O. sativa strains include Nipponbare, Azucena, Kasalath 1, 2, 3, and 4, Teqing, Lemont, and IR64. Z. mays parviglumis strains include Benz, BK4, IA19, and Wilkes. Z. mays mays strains include BS7, HuoBai, Makki, Min13, Pira, Sari, Smena, and W22.
One preferred O. rufipogon EG1117 polypeptide of the present invention is a polypeptide encoded by an O. rufipogon polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:97, and/or SEQ ID NO:98.
One preferred O. sativa EG1117 polypeptide of the present invention is a polypeptide encoded by an O. sativa polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:100, SEQ ID NO:101, SEQ ID NO:102, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:116, and/or SEQ ID NO:117.
One preferred Z. mays mays EG1117 polypeptide of the present invention is a polypeptide encoded by a Z. mays mays polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:141, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:154, and/or SEQ ID NO:155.
One preferred Z. mays parviglumis EG1117 polypeptide of the present invention is a polypeptide encoded by a Z. mays parviglumis polynucleotide that hybridizes under stringent hybridization conditions with complements of polynucleotides represented by SEQ ID NO:157, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:161, SEQ ID NO:162, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, and/or SEQ ID NO:168.
Inspection of EG1117 genomic nucleic acid sequences for rice indicates that the genes comprise several regions, including a first exon region, a first intron region, a second exon region, a second intron region, a third exon region, a third intron region, and a fourth exon region. The locations of these regions in each of the EG1117 rice and rice ancestor genomic nucleic acid sequences are summarized in the Table below:

Organism                                     SEQ ID NO:  exon    intron   exon      intron    exon       intron      exon
O. rufipogon strain (5'-ward end)            92          1-64    65-349   350-567   568-702   703-1259   -           -
O. rufipogon strain (3'-ward end)            93          1-868   -        -         -         -          -           -
O. rufipogon strain (5'-ward end)            96          1-35    36-320   321-538   539-673   674-1230   -           -
O. rufipogon strain (3'-ward end)            97          -       1-357    358-1225  -         -          -           -
O. sativa strain Azucena                     100         1-64    65-349   350-567   568-702   703-1259   1260-1731   1732-2599
O. sativa strain                             103         1-64    65-349   350-567   568-702   703-1259   1260-1733   1734-2601
O. sativa strain Kasalath                    106         1-64    65-349   350-567   568-702   703-1259   1260-1733   1734-2601
O. sativa strain Lemont                      109         1-64    65-349   350-567   568-702   703-1259   1260-1731   1732-2599
O. sativa strain Nipponbare (5'-ward end)    112         1-64    65-349   350-567   568-702   703-1259   -           -
O. sativa strain Nipponbare (3'-ward end)    113         1-864   -        -         -         -          -           -
O. sativa strain Teqing                      116         1-64    65-349   350-567   568-702   703-1259   1260-1733   1734-2601

Translation of the genomic sequences suggests that the O. rufipogon and O. sativa EG1117 polynucleotides include open reading frames. The deduced protein sequence of O. sativa strain Nipponbare was used to perform a BLAST search. A very strong protein BLAST hit to Arabidopsis PTR2-B (histidine transporting protein, NP-178313) suggests that only about 30 codons of coding sequence (CDS) are missing from the rice sequence (Figure 8).
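As an illustrative consistency check on region coordinates of the kind tabulated above, the following sketch verifies that successive exon and intron regions abut one another and reports the exon lengths. It uses the coordinates given for O. sativa strain Azucena (SEQ ID NO:100); the function names `is_contiguous` and `exon_lengths` are assumptions introduced for this example and are not part of the disclosure.

```python
# Illustrative sketch only; function names are assumptions. Coordinates are
# 1-based and inclusive, as in the Table for SEQ ID NO:100 (strain Azucena).

AZUCENA_REGIONS = [
    ("exon", 1, 64),
    ("intron", 65, 349),
    ("exon", 350, 567),
    ("intron", 568, 702),
    ("exon", 703, 1259),
    ("intron", 1260, 1731),
    ("exon", 1732, 2599),
]

def is_contiguous(regions):
    """True if each region starts one nucleotide after the previous one ends."""
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(regions, regions[1:]))

def exon_lengths(regions):
    """Lengths, in nucleotides, of the exon regions in order."""
    return [end - start + 1 for kind, start, end in regions if kind == "exon"]

print(is_contiguous(AZUCENA_REGIONS))
print(exon_lengths(AZUCENA_REGIONS))
```

A check of this kind is useful mainly for catching transcription errors in coordinate tables before the coordinates are used to extract coding sequence.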

Finally, the deduced coding sequence and protein sequences are represented as follows:
Organism                       SEQ ID NO: for partial CDS   SEQ ID NO: for partial protein
O. rufipogon strain            94                           95
O. rufipogon strain            98                           99
O. sativa strain Azucena       101                          102
O. sativa strain IR64          104                          105
O. sativa strain Kasalath      107                          108
O. sativa strain Lemont        110                          111
O. sativa strain Nipponbare    114                          115
O. sativa strain Teqing        117                          118

The partial sequence of EG1117 has also been determined in maize and teosinte. This information is summarized in the Table below:
Organism                              SEQ ID NO:  exon    intron    exon      CDS
Zea mays mays strain                  119         1-531   -         -         1-531
Zea mays mays strain Enano            122         1-365   366-536   -         1-365
Zea mays mays strain Enano            123         -       1-393     394-550   394-550
Zea mays mays strain Enano            124         1-533   -         -         1-533
Zea mays mays strain Huobai           127         1-375   376-525   -         1-375
Zea mays mays strain Huobai           128         -       1-143     144-334   144-334
Zea mays mays strain Huobai           129         1-529   -         -         1-529
Zea mays mays strain Makki            132         1-513   -         -         1-513
Zea mays mays strain Min13            135         1-374   375-545   -         1-374
Zea mays mays strain Min13            136         -       1-390     391-570   391-570
Zea mays mays strain Min13            137         1-525   -         -         1-525
Zea mays mays strain Pira             140         1-371   372-526   -         1-371
Zea mays mays strain Pira             141         1-525   -         -         1-525
Zea mays mays strain Sari             144         1-364   365-499   -         1-364
Zea mays mays strain Sari             145         -       1-422     423-607   423-607
Zea mays mays strain Sari             146         1-520   -         -         1-520
Zea mays mays strain Smena            149         1-371   372-543   -         1-371
Zea mays mays strain Smena            150         -       1-262     263-443   263-443
Zea mays mays strain Smena            151         1-523   -         -         1-523
Zea mays mays strain                  154         1-488   -         -         1-488
Zea mays parviglumis strain Benz      157         1-516   -         -         1-516
Zea mays parviglumis strain           160         1-372   373-385   -         1-372
Zea mays parviglumis strain           161         -       1-433     434-613   434-613
Zea mays parviglumis strain           162         1-462   -         -         1-462
Zea mays parviglumis strain Wilkes    165         1-355   356-556   -         1-355
Zea mays parviglumis strain Wilkes    166         -       1-395     396-552   396-552
Zea mays parviglumis strain Wilkes    167         1-511   -         -         1-511

Translation of the genomic sequences suggests that the Z. mays mays and Z. mays parviglumis EG1117 polynucleotides include open reading frames.
A summary of the open reading frame information appears in the following Table:
Organism                              SEQ ID NO: for partial CDS   SEQ ID NO: for partial protein
Zea mays mays strain BS7              120                          121
Zea mays mays strain Enano            125                          126
Zea mays mays strain Huobai           130                          131
Zea mays mays strain Makki            133                          134
Zea mays mays strain Min13            138                          139
Zea mays mays strain Pira             142                          143
Zea mays mays strain Sari             147                          148
Zea mays mays strain Smena            152                          153
Zea mays mays strain W22              155                          156
Zea mays parviglumis strain Benz      158                          159
Zea mays parviglumis strain BK4       163                          164
Zea mays parviglumis strain Wilkes    168                          169

Preferred plant EG1117 polypeptides of the present invention include polypeptides comprising amino acid sequences that are at least about 30%, preferably at least about 50%, more preferably at least about 75%, more preferably at least about 80%, more preferably at least about 85%, more preferably at least about 90%, more preferably at least about 95%, and more preferably at least about 98% identical to one or more of the amino acid sequences disclosed herein for O. sativa, O. rufipogon, Z. mays mays, and Z. mays parviglumis EG1117 polypeptides of the present invention. More preferred plant EG1117 polypeptides of the present invention include: polypeptides encoded by at least a portion of SEQ ID NO:92, SEQ ID NO:93, and/or SEQ ID NO:94 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:95; polypeptides encoded by at least a portion of SEQ ID
NO:96, SEQ ID NO:97 and/or SEQ ID NO:98 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:99; polypeptides encoded by at least a portion of SEQ ID NO:100 and/or SEQ ID NO:101 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:102; polypeptides encoded by at least a portion of SEQ ID NO:103 and/or SEQ ID NO:104 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:105; polypeptides encoded by at least a portion of SEQ ID NO:106 and/or SEQ ID NO:107 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:108; polypeptides encoded by at least a portion of SEQ ID NO:109 and/or SEQ ID NO:110 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:111; polypeptides encoded by at least a portion of SEQ ID NO:112, SEQ ID NO:113 and/or SEQ ID NO:114 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:115; polypeptides encoded by at least a portion of SEQ ID NO:116 and/or SEQ ID NO:117 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:118; polypeptides encoded by at least a portion of SEQ ID NO:119 and/or SEQ ID NO:120 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:121; polypeptides encoded by at least a portion of SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124 and/or SEQ ID NO:125 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:126; polypeptides encoded by at least a portion of SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129 and/or SEQ ID NO:130 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:131; polypeptides encoded by at least a portion of SEQ ID NO:132 and/or SEQ ID NO:133 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:134; polypeptides encoded by at least a portion of SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, and/or SEQ ID NO:138 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:139; polypeptides encoded by at least a portion of SEQ ID NO:140, SEQ ID NO:141, and/or SEQ ID NO:142 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:143; polypeptides encoded by at least a portion of SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146 and/or SEQ ID NO:147 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:148; polypeptides encoded by at least a portion of SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, and/or SEQ ID NO:152 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:153; polypeptides encoded by at least a portion of SEQ ID NO:154 and/or SEQ ID NO:155 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:156; polypeptides encoded by at least a portion of SEQ ID NO:157 and/or SEQ ID NO:158 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:159; polypeptides encoded by at least a portion of SEQ ID NO:160, SEQ ID NO:161, SEQ ID NO:162, and/or SEQ ID NO:163 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:164; and polypeptides encoded by at least a portion of SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, and/or SEQ ID NO:168 and, as such, have amino acid sequences that include at least a portion of SEQ ID NO:169.
As used herein, "at least a portion" of a polynucleotide or polypeptide means a portion having the minimal size characteristics of such sequences, as described above, or any larger fragment of the full length molecule, up to and including the full length molecule. For example, a portion of a polynucleotide may be 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides, and so on, going up to the full length polynucleotide. Similarly, a portion of a polypeptide may be 4 amino acids, 5 amino acids, 6 amino acids, 7 amino acids, and so on, going up to the full length polypeptide. The length of the portion to be used will depend on the particular application. As discussed above, a portion of a polynucleotide useful as a hybridization probe may be as short as 12 nucleotides. A portion of a polypeptide useful as an epitope may be as short as 4 amino acids. A portion of a polypeptide that performs the function of the full-length polypeptide would generally be longer than 4 amino acids.
Particularly preferred plant EG1117 polypeptides of the present invention are polypeptides that include SEQ ID NO:95, SEQ ID NO:99, SEQ ID NO:102, SEQ ID NO:105, SEQ ID NO:108, SEQ ID NO:111, SEQ ID NO:115, SEQ ID NO:118, SEQ ID NO:121, SEQ ID NO:126, SEQ ID NO:131, SEQ ID NO:134, SEQ ID NO:139, SEQ ID NO:143, SEQ ID NO:148, SEQ ID NO:153, SEQ ID NO:156, SEQ ID NO:159, SEQ ID NO:164, SEQ ID NO:169, and/or SEQ ID NO:170 (including, but not limited to, the encoded polypeptides, full-length polypeptides, processed polypeptides, fusion polypeptides and multivalent polypeptides thereof) as well as polypeptides that are truncated homologues of polypeptides that include at least portions of the aforementioned SEQ ID NOs. Examples of methods to produce such polypeptides are disclosed herein, including in the Examples section.
C. EG307 Polynucleotides
One embodiment of the present invention is an isolated plant polynucleotide that hybridizes under stringent hybridization conditions with at least one of the following genes: an O. sativa EG307 gene, an O. rufipogon EG307 gene, a Z. mays mays EG307 gene, a Z. mays parviglumis EG307 gene, a Z. diploperennis EG307 gene, and a Z. luxurians EG307 gene. The identifying characteristics of such genes are heretofore described. A polynucleotide of the present invention can include an isolated natural plant EG307 gene or a homologue thereof, the latter of which is described in more detail below. A polynucleotide of the present invention can include one or more regulatory regions, full-length or partial coding regions, or combinations thereof. The minimal size of a polynucleotide of the present invention is the minimal size that can form a stable hybrid with one of the aforementioned genes under stringent hybridization conditions. Suitable and preferred plants are disclosed above.
In accordance with the present invention, an isolated polynucleotide is a polynucleotide that has been removed from its natural milieu (i.e., that has been subject to human manipulation). As such, "isolated" does not reflect the extent to which the polynucleotide has been purified. An isolated polynucleotide can include DNA, RNA, or derivatives of either DNA or RNA.
An isolated plant EG307 polynucleotide of the present invention can be obtained from its natural source either as an entire (i.e., complete) gene or a portion thereof capable of forming a stable hybrid with that gene. An isolated plant EG307 polynucleotide can also be produced using recombinant DNA technology (e.g., polymerase chain reaction (PCR) amplification, cloning) or chemical synthesis. Isolated plant EG307 polynucleotides include natural polynucleotides and homologues thereof, including, but not limited to, natural allelic variants and modified polynucleotides in which nucleotides have been inserted, deleted, substituted, and/or inverted in such a manner that such modifications do not substantially interfere with the polynucleotide's ability to encode an EG307 polypeptide of the present invention or to form stable hybrids under stringent conditions with natural gene isolates.
A plant EG307 polynucleotide homologue can be produced using a number of methods known to those skilled in the art (see, for example, Sambrook et al., ibid.). For example, polynucleotides can be modified using a variety of techniques including, but not limited to, classic mutagenesis techniques and recombinant DNA techniques, such as site-directed mutagenesis, chemical treatment of a polynucleotide to induce mutations, restriction enzyme cleavage of a nucleic acid fragment, ligation of nucleic acid fragments, polymerase chain reaction (PCR) amplification and/or mutagenesis of selected regions of a nucleic acid sequence, synthesis of oligonucleotide mixtures and ligation of mixture groups to "build" a mixture of polynucleotides and combinations thereof. Polynucleotide homologues can be selected from a mixture of modified nucleic acids by screening for the function of the polypeptide encoded by the nucleic acid (e.g., ability to elicit an immune response against at least one epitope of an EG307 polypeptide, ability to increase yield in a transgenic plant containing an EG307 gene) and/or by hybridization with an O. sativa EG307 gene, with an O. rufipogon EG307 gene, with a Z. mays mays EG307 gene, with a Z. mays parviglumis EG307 gene, a Z. diploperennis EG307 gene and/or a Z. luxurians EG307 gene.
An isolated polynucleotide of the present invention can include a nucleic acid sequence that encodes at least one plant EG307 polypeptide of the present invention, examples of such polypeptides being disclosed herein. Although the phrase "polynucleotide"
primarily refers to the physical polynucleotide and the phrase "nucleic acid sequence"
primarily refers to the sequence of nucleotides on the polynucleotide, the two phrases can be used interchangeably, especially with respect to a polynucleotide, or a nucleic acid sequence, being capable of encoding an EG307 polypeptide. As heretofore disclosed, plant polypeptides of the present invention include, but are not limited to, polypeptides having full-length plant EG307 coding regions, polypeptides having partial plant EG307 coding regions, fusion polypeptides, multivalent protective polypeptides and combinations thereof.
At least certain polynucleotides of the present invention encode polypeptides that selectively bind to immune serum derived from an animal that has been immunized with an EG307 polypeptide from which the polynucleotide was isolated.
A preferred polynucleotide of the present invention, when expressed in a suitable plant, is capable of increasing the yield of the plant. As will be disclosed in more detail below, such a polynucleotide can be, or encode, an antisense RNA, a molecule capable of triple helix formation, a ribozyme, or other nucleic acid-based compound.
One embodiment of the present invention is a plant EG307 polynucleotide that hybridizes under stringent hybridization conditions to an EG307 polynucleotide of the present invention, or to a homologue of such an EG307 polynucleotide, or to the complement of such a polynucleotide. A polynucleotide complement of any nucleic acid sequence of the present invention refers to the nucleic acid sequence of the polynucleotide that is complementary to (i.e., can form a complete double helix with) the strand for which the sequence is cited. It is to be noted that a double-stranded nucleic acid molecule of the present invention for which a nucleic acid sequence has been determined for one strand, that is represented by a SEQ ID
NO, also comprises a complementary strand having a sequence that is a complement of that SEQ ID NO. As such, polynucleotides of the present invention, which can be either double-stranded or single-stranded, include those polynucleotides that form stable hybrids under stringent hybridization conditions with either a given SEQ ID NO denoted herein and/or with the complement of that SEQ ID NO, which may or may not be denoted herein.
Methods to deduce a complementary sequence are known to those skilled in the art.
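As an illustration of deducing a complementary sequence from a cited strand, the following minimal sketch applies Watson-Crick base pairing and reverses the result so that the complement also reads 5' to 3'. The function name `reverse_complement` and the handling of `N` as an unknown base are assumptions made for this example and are not taken from the specification.

```python
# Illustrative sketch only: names and the treatment of "N" are assumptions.
# Watson-Crick pairing for DNA, upper and lower case, with N left as N.
_DNA_PAIRS = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the strand complementary to `seq`, written 5'->3'."""
    # Complement each base, then reverse so the result reads 5'->3'.
    return seq.translate(_DNA_PAIRS)[::-1]


if __name__ == "__main__":
    cited_strand = "ATGC"  # stand-in for a strand represented by a SEQ ID NO
    print(reverse_complement(cited_strand))
```

Applying the function twice returns the original strand, which is a convenient sanity check when both strands of a double-stranded molecule are being tracked.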
Preferred is an EG307 polynucleotide that includes a nucleic acid sequence having at least about 65 percent, preferably at least about 70 percent, more preferably at least about 75 percent, more preferably at least about 80 percent, more preferably at least about 85 percent, more preferably at least about 90 percent and even more preferably at least about 95 percent homology with the corresponding region(s) of the nucleic acid sequence encoding at least a portion of an EG307 polypeptide. Particularly preferred is an EG307 polynucleotide capable of encoding at least a portion of an EG307 polypeptide that naturally is present in plants.
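The percentage tiers recited above can be illustrated with a short sketch. It assumes that the two sequences have already been aligned (gaps written as `-`) and uses percent identity over the aligned columns as a stand-in for "percent homology"; that choice, the function names, and the treatment of the tiers as hard cutoffs (the text says "at least about") are all assumptions of the example.

```python
# Illustrative sketch only: percent identity is used as a proxy for
# "percent homology", and the tiers are treated as hard cutoffs.
TIERS = (65, 70, 75, 80, 85, 90, 95)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # a column that is a gap in both sequences carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def highest_tier(pid: float):
    """Return the highest recited tier met by `pid`, or None if below 65."""
    met = [t for t in TIERS if pid >= t]
    return max(met) if met else None


if __name__ == "__main__":
    pid = percent_identity("ACGTACGTAC", "ACGTACGTAA")
    print(pid, highest_tier(pid))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch covers only the arithmetic applied once an alignment is in hand.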
Particularly preferred EG307 polynucleotides of the present invention hybridize under stringent hybridization conditions with at least one of the following polynucleotides: SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:90, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:59, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:84, and/or SEQ ID NO:85, or to a homologue or complement of such polynucleotide.
A preferred polynucleotide of the present invention includes at least a portion of nucleic acid sequence SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:59, and/or SEQ ID NO:78 that is capable of hybridizing (i.e., that hybridizes under stringent hybridization conditions) to an O. sativa EG307 gene, to an O. rufipogon EG307 gene, to a Z. mays mays EG307 gene, to a Z. mays parviglumis EG307 gene, to a Z. diploperennis EG307 gene and/or to a Z. luxurians EG307 gene of the present invention, as well as a polynucleotide that is an allelic variant of any of those polynucleotides. Such preferred polynucleotides can include nucleotides in addition to those included in the SEQ ID NOs, such as, but not limited to, a full-length gene, a full-length coding region, a polynucleotide encoding a fusion polypeptide, and/or a polynucleotide encoding a multivalent protective compound.
The present invention also includes polynucleotides encoding a polypeptide including at least a portion of SEQ ID NO:3, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:6, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:9, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:13, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:16, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:19, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:22, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:26, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:30, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:36, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:39, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:43, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:48, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:52, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:56, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:61, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:65, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:68, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:72, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:76, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:79, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:83, and/or polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:86, including polynucleotides that have been modified to accommodate codon usage properties of the cells in which such polynucleotides are to be expressed.
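The idea of modifying a polynucleotide to accommodate the codon usage of an expression host can be sketched as a back-translation that picks one codon per amino acid from a host preference table. The table below is a hypothetical placeholder: each codon does encode the listed residue under the standard genetic code, but which codon a real host prefers is not stated in the specification and would have to come from measured codon-usage data. The names `HYPOTHETICAL_PREFERRED_CODONS` and `back_translate` are likewise assumptions of this example.

```python
# Illustrative sketch only. The preference table is a placeholder, not
# measured codon-usage data for any organism; it covers a few residues.
HYPOTHETICAL_PREFERRED_CODONS = {
    "M": "ATG",  # methionine (single codon in the standard code)
    "W": "TGG",  # tryptophan (single codon in the standard code)
    "K": "AAG",  # lysine
    "A": "GCC",  # alanine
    "L": "CTG",  # leucine
    "*": "TGA",  # stop
}


def back_translate(peptide: str, table: dict) -> str:
    """Build a coding sequence for `peptide` using one codon per residue."""
    codons = []
    for residue in peptide.upper():
        if residue not in table:
            raise KeyError(f"no codon listed for residue {residue!r}")
        codons.append(table[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MKAL*", HYPOTHETICAL_PREFERRED_CODONS))
```

Because several codons encode most amino acids, the encoded polypeptide is unchanged while the nucleotide sequence is adapted to the host.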
Knowing the nucleic acid sequences of certain plant EG307 polynucleotides of the present invention allows one skilled in the art to, for example, (a) make copies of those polynucleotides, (b) obtain polynucleotides including at least a portion of such polynucleotides (e.g., polynucleotides including full-length genes, full-length coding regions, regulatory control sequences, truncated coding regions), and (c) obtain EG307 polynucleotides for other plants, particularly since, as described in detail in the Examples section, knowledge of O. sativa EG307 polynucleotides of the present invention enabled the isolation of O. rufipogon, Zea mays mays, Zea mays parviglumis, Z. diploperennis, and Z. luxurians EG307 polynucleotides of the present invention. Such polynucleotides can be obtained in a variety of ways including screening appropriate expression libraries with antibodies of the present invention; traditional cloning techniques using oligonucleotide probes of the present invention to screen appropriate libraries or DNA; and PCR amplification of appropriate libraries or DNA using oligonucleotide primers of the present invention.
Preferred libraries to screen or from which to amplify polynucleotides include libraries such as genomic DNA libraries, BAC libraries, YAC libraries, cDNA libraries prepared from isolated plant tissues, including, but not limited to, stems, reproductive structures/tissues, leaves, roots, and tillers; and libraries constructed from pooled cDNAs from any or all of the tissues listed above. In the case of rice, BAC libraries, available from Clemson University, are preferred. Similarly, preferred DNA sources to screen or from which to amplify polynucleotides include plant genomic DNA. Techniques to clone and amplify genes are disclosed, for example, in Sambrook et al., ibid. and in Galun & Breiman, TRANSGENIC
PLANTS, Imperial College Press, 1997.
The present invention also includes polynucleotides that are oligonucleotides capable of hybridizing, under stringent hybridization conditions, with complementary regions of other, preferably longer, polynucleotides of the present invention such as those comprising plant EG307 genes or other plant EG307 polynucleotides. Oligonucleotides of the present invention can be RNA, DNA, or derivatives of either. The minimal size of such oligonucleotides is the size required to form a stable hybrid between a given oligonucleotide and the complementary sequence on another polynucleotide of the present invention.
Minimal size characteristics are disclosed herein. The size of the oligonucleotide must also be sufficient for the use of the oligonucleotide in accordance with the present invention.
Oligonucleotides of the present invention can be used in a variety of applications including, but not limited to, as probes to identify additional polynucleotides, as primers to amplify or extend polynucleotides, as targets for expression analysis, as candidates for targeted mutagenesis and/or recovery, or in agricultural applications to alter EG307 polypeptide production or activity. Such agricultural applications include the use of such oligonucleotides in, for example, antisense-, triplex formation-, ribozyme- and/or RNA drug-based technologies. The present invention, therefore, includes such oligonucleotides and methods to enhance economic productivity in a plant by use of one or more of such technologies.
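Whether a short oligonucleotide is long enough to form a stable hybrid is often judged by an estimated melting temperature. The sketch below uses the Wallace rule of thumb (2 degrees C per A/T and 4 degrees C per G/C), which is a rough approximation for short oligonucleotides and is not a method taken from the specification; the function name `wallace_tm` is an assumption of the example.

```python
# Illustrative sketch only: the Wallace rule is a coarse estimate for short
# oligonucleotides and is used here as an assumption, not as the
# specification's definition of a stable hybrid.
def wallace_tm(oligo: str) -> int:
    """Estimate melting temperature in degrees C as 2*(A+T) + 4*(G+C)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T") + s.count("U")  # U allows RNA oligonucleotides
    gc = s.count("G") + s.count("C")
    if not s or at + gc != len(s):
        raise ValueError("oligo must be non-empty and contain only A, C, G, T or U")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(wallace_tm("ATGCATGCATGCATGC"))
```

Longer or GC-richer oligonucleotides give higher estimates, consistent with the statement that minimal size depends on the stability required of the hybrid.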
D. EG1117 Polynucleotides
One embodiment of the present invention is an isolated plant polynucleotide that hybridizes under stringent hybridization conditions with at least one of the following genes: an O. sativa EG1117 gene, an O. rufipogon EG1117 gene, a Z. mays mays EG1117 gene, and a Z. mays parviglumis EG1117 gene. The identifying characteristics of such genes are heretofore described. A polynucleotide of the present invention can include an isolated natural plant EG1117 gene or a homologue thereof. A polynucleotide of the present invention can include one or more regulatory regions, full-length or partial coding regions, or combinations thereof. The minimal size of a polynucleotide of the present invention is the minimal size that can form a stable hybrid with one of the aforementioned genes under stringent hybridization conditions. Suitable and preferred plants are disclosed above.
Characteristics of isolated EG1117 genes and homologues thereof are described above in the section entitled "EG307 polynucleotides."
One embodiment of the present invention is a plant EG1117 polynucleotide that hybridizes under stringent hybridization conditions to an EG1117 polynucleotide of the present invention, or to a homologue of such an EG1117 polynucleotide, or to the complement of such a polynucleotide. Preferred is an EG1117 polynucleotide that includes a nucleic acid sequence having at least about 65 percent, preferably at least about 70 percent, more preferably at least about 75 percent, more preferably at least about 80 percent, more preferably at least about 85 percent, more preferably at least about 90 percent and even more preferably at least about 95 percent homology with the corresponding region(s) of the nucleic acid sequence encoding at least a portion of an EG1117 polypeptide.
Particularly preferred is an EG1117 polynucleotide capable of encoding at least a portion of an EG1117 polypeptide that naturally is present in plants.
Particularly preferred EG1117 polynucleotides of the present invention hybridize under stringent hybridization conditions with at least one of the following polynucleotides: , or to a homologue or complement of such polynucleotide.
A preferred polynucleotide of the present invention includes at least a portion of nucleic acid sequence SEQ ID NO:92, that is capable of hybridizing (i.e., that hybridizes under stringent hybridization conditions) to an O. sativa EG1117 gene, to an O. rufipogon EG1117 gene, to a Z. mays mays EG1117 gene, and/or to a Z. mays parviglumis EG1117 gene of the present invention, as well as a polynucleotide that is an allelic variant of any of those polynucleotides. Such preferred polynucleotides can include nucleotides in addition to those included in the SEQ ID NOs, such as, but not limited to, a full-length gene, a full-length coding region, a polynucleotide encoding a fusion polypeptide, and/or a polynucleotide encoding a multivalent protective compound.
A preferred polynucleotide of the present invention includes at least a portion of nucleic acid sequence SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:97, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:101, SEQ ID NO:102, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:141, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:161, SEQ ID NO:162, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, and/or SEQ ID NO:168, that is capable of hybridizing (i.e., that hybridizes under stringent hybridization conditions) to an O. sativa EG1117 gene, to an O. rufipogon EG1117 gene, to a Z. mays mays EG1117 gene, and/or to a Z. mays parviglumis EG1117 gene of the present invention, as well as a polynucleotide that is an allelic variant of any of those polynucleotides. Such preferred polynucleotides can include nucleotides in addition to those included in the SEQ ID NOs, such as, but not limited to, a full-length gene, a full-length coding region, a polynucleotide encoding a fusion polypeptide, and/or a polynucleotide encoding a multivalent protective compound.
The present invention also includes polynucleotides encoding a polypeptide including at least a portion of SEQ ID NO:95, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:99, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:102, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:105, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:108, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:111, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:115, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:118, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:121, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:126, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:131, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:134, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:139, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:143, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:148, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:153, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:156, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:159, polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:164, and/or polynucleotides encoding a polypeptide having at least a portion of SEQ ID NO:169, including polynucleotides that have been modified to accommodate codon usage properties of the cells in which such polynucleotides are to be expressed.
Knowing the nucleic acid sequences of certain plant EG1117 polynucleotides of the present invention allows one skilled in the art to, for example, (a) make copies of those polynucleotides, (b) obtain polynucleotides including at least a portion of such polynucleotides (e.g., polynucleotides including full-length genes, full-length coding regions, regulatory control sequences, truncated coding regions), and (c) obtain EG1117 polynucleotides for other plants, particularly since, as described in detail in the Examples section, knowledge of O. rufipogon EG1117 polynucleotides of the present invention enabled the isolation of O. sativa, Zea mays mays, and Zea mays parviglumis EG1117 polynucleotides of the present invention. Such polynucleotides can be obtained in a variety of ways including screening appropriate expression libraries with antibodies of the present invention; traditional cloning techniques using oligonucleotide probes of the present invention to screen appropriate libraries or DNA; and PCR amplification of appropriate libraries or DNA using oligonucleotide primers of the present invention. Preferred libraries are described above in the section entitled "EG307 polynucleotides."
The present invention also includes polynucleotides that are oligonucleotides capable of hybridizing, under stringent hybridization conditions, with complementary regions of other, preferably longer, polynucleotides of the present invention such as those comprising plant EG1117 genes or other plant EG1117 polynucleotides. Oligonucleotides of the present invention can be RNA, DNA, or derivatives of either. The minimal size of such oligonucleotides is the size required to form a stable hybrid between a given oligonucleotide and the complementary sequence on another polynucleotide of the present invention.
Minimal size characteristics are disclosed herein. The size of the oligonucleotide must also be sufficient for the use of the oligonucleotide in accordance with the present invention. Such applications are described above in the section entitled "EG307 polynucleotides."
E. Recombinant molecules
The present invention also includes a recombinant vector, which includes at least one plant EG307 or EG1117 polynucleotide of the present invention, inserted into any vector capable of delivering the polynucleotide into a host cell. Such a vector contains heterologous nucleic acid sequences, that is, nucleic acid sequences that are not naturally found adjacent to polynucleotides of the present invention and that are derived from a species other than the species from which the polynucleotide(s) are derived. As used herein, a derived polynucleotide is one that is identical or similar in sequence to a polynucleotide or portion of a polynucleotide, but can contain modifications, such as modified bases, backbone modifications, nucleotide changes, and the like. The vector can be either RNA or DNA, either prokaryotic or eukaryotic, and typically is a virus or a plasmid.
Recombinant vectors can be used in the cloning, sequencing, and/or otherwise manipulating of plant EG307 or EG1117 polynucleotides of the present invention. One type of recombinant vector, referred to herein as a recombinant molecule and described in more detail below, can be used in the expression of polynucleotides of the present invention. Preferred recombinant vectors are capable of replicating in the transformed cell.
Suitable and preferred polynucleotides to include in recombinant vectors of the present invention are as disclosed herein for suitable and preferred plant EG307 or EG1117 polynucleotides per se. Particularly preferred polynucleotides to include in recombinant vectors, and particularly in recombinant molecules, of the present invention include SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:59, and/or SEQ ID NO:78. Alternative preferred polynucleotides to include in recombinant vectors, and particularly in recombinant molecules, of the present invention include SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:97, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:101, SEQ ID NO:102, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:141, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:161, SEQ ID NO:162, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, and/or SEQ ID NO:168.
Isolated plant EG307 or EG1117 polypeptides of the present invention can be produced in a variety of ways, including production and recovery of natural polypeptides, production and recovery of recombinant polypeptides, and chemical synthesis of the polypeptides. In one embodiment, an isolated polypeptide of the present invention is produced by culturing a cell capable of expressing the polypeptide under conditions effective to produce the polypeptide, and recovering the polypeptide. A preferred cell to culture is a recombinant cell that is capable of expressing the polypeptide, the recombinant cell being produced by transforming a host cell with one or more polynucleotides of the present invention.
Transformation of a polynucleotide into a cell can be accomplished by any method by which a polynucleotide can be inserted into the cell. Transformation techniques include, but are not limited to, transfection, electroporation, microinjection, lipofection, adsorption, and protoplast fusion. A recombinant cell may remain unicellular or may grow into a tissue, organ or a multicellular organism. Transformed polynucleotides of the present invention can remain extrachromosomal or can integrate into one or more sites within a chromosome of the transformed (i.e., recombinant) cell in such a manner that their ability to be expressed is retained. Suitable and preferred polynucleotides with which to transform a cell are as disclosed herein for suitable and preferred plant EG307 or EG1117 polynucleotides per se.
Particularly preferred polynucleotides to include in recombinant cells of the present invention include SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30, SEQ ID NO:31, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:59, SEQ ID NO:78, SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:84, and/or SEQ ID NO:85. Alternative preferred polynucleotides to include in recombinant cells of the present invention include SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:97, SEQ ID NO:98, SEQ ID NO:100, SEQ ID NO:101, SEQ ID NO:102, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:141, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:154, SEQ ID NO:155, SEQ ID NO:157, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:161, SEQ ID NO:162, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, and/or SEQ ID NO:168.
Suitable host cells to transform include any cell that can be transformed with a polynucleotide of the present invention. Host cells can be either untransformed cells or cells that are already transformed with at least one polynucleotide. Host cells of the present invention either can be endogenously (i.e., naturally) capable of producing plant EG307 or EG1117 polypeptides of the present invention or can be capable of producing such polypeptides after being transformed with at least one polynucleotide of the present invention.
Host cells of the present invention can be any cell capable of producing at least one polypeptide of the present invention, and include bacterial, fungal (including yeast and rice blast, Magnaporthe grisea), parasite (including nematodes, especially of the genera Xiphinema, Helicotylenchus, and Tylenchorhynchus), insect, other animal and plant cells.
Suitable host viruses to transform include any virus that can be transformed with a polynucleotide of the present invention, including, but not limited to, rice stripe virus, and echinochloa hoja blanca virus.
In a preferred embodiment, non-pathogenic symbiotic bacteria, which are able to live and replicate within plant tissues, so-called endophytes, or non-pathogenic symbiotic bacteria, which are capable of colonizing the phyllosphere or the rhizosphere, so-called epiphytes, are used. Such bacteria include bacteria of the genera Agrobacterium, Alcaligenes, Azospirillum, Azotobacter, Bacillus, Clavibacter, Enterobacter, Erwinia, Flavobacter, Klebsiella, Pseudomonas, Rhizobium, Serratia, Streptomyces and Xanthomonas. Symbiotic fungi, such as Trichoderma and Gliocladium, are also possible hosts for expression of the inventive nucleotide sequences for the same purpose.
A recombinant cell is preferably produced by transforming a host cell with one or more recombinant molecules, each comprising one or more polynucleotides of the present invention operatively linked to an expression vector containing one or more transcription control sequences. The phrase "operatively linked" refers to insertion of a polynucleotide into an expression vector in a manner such that the molecule is able to be expressed in the correct reading frame when transformed into a host cell. As used herein, an expression vector is a DNA or RNA vector that is capable of transforming a host cell and of effecting expression of a specified polynucleotide. Preferably, the expression vector is also capable of replicating within the host cell. Expression vectors can be either prokaryotic or eukaryotic, and are typically viruses or plasmids. Expression vectors of the present invention include any vectors that function (i.e., direct gene expression) in recombinant cells of the present invention, including in bacterial, fungal, parasite, insect, other animal, and plant cells. Preferred expression vectors of the present invention can direct gene expression in bacterial, yeast, fungal, insect and mammalian cells and more preferably in the cell types heretofore disclosed.
Recombinant molecules of the present invention may also (a) contain secretory signals (i.e., signal segment nucleic acid sequences) to enable an expressed EG307 or EG1117 polypeptide of the present invention to be secreted from the cell that produces the polypeptide and/or (b) contain fusion sequences which lead to the expression of polynucleotides of the present invention as fusion polypeptides. Examples of suitable signal segments and fusion segments encoded by fusion segment nucleic acids are disclosed herein.
Eukaryotic recombinant molecules may include intervening and/or untranslated sequences surrounding and/or within the nucleic acid sequences of polynucleotides of the present invention. Suitable signal segments include natural signal segments or any heterologous signal segment capable of directing the secretion of a polypeptide of the present invention.
Preferred signal and fusion sequences employed to enhance organ and organelle specific expression include, but are not limited to, arcelin-5, see Goossens, A. et al. The arcelin-5 gene of Phaseolus vulgaris directs high seed-specific expression in transgenic Phaseolus acutifolius and Arabidopsis plants. Plant Physiology (1999) 120:1095-1104, phaseolin, see Sengupta-Gopalan, C. et al. Developmentally regulated expression of the bean beta-phaseolin gene in tobacco seeds. PNAS (1985) 82:3320-3324, hydroxyproline-rich glycoprotein, serpin, see Yan, X. et al. Gene fusions of signal sequences with a modified beta-glucuronidase gene results in retention of the beta-glucuronidase protein in the secretory pathway/plasma membrane. Plant Physiology (1997) 115:915-924, N-acetyl glucosaminyl transferase I, see Essl, D. et al. The N-terminal 77 amino acids from tobacco N-acetylglucosaminyltransferase I are sufficient to retain reporter protein in the Golgi apparatus of Nicotiana benthamiana cells. FEBS Letters (1999) 453(1-2):169-73, albumin, see Vandekerckhove, J. et al. Enkephalins produced in transgenic plants using modified 2S seed storage proteins. BioTechnology 7:929-932 (1989) and PR1, see Pen, J. et al. Efficient production of active industrial enzymes in plants. Industrial Crops and Prod. (1993) 1:241-250.
Polynucleotides of the present invention can be operatively linked to expression vectors containing regulatory sequences such as transcription control sequences, translation control sequences, origins of replication, and other regulatory sequences that are compatible with the recombinant cell and that control the expression of polynucleotides of the present invention. In particular, recombinant molecules of the present invention include transcription control sequences. Transcription control sequences are sequences which control the initiation, elongation, and termination of transcription. Included are those transcription control sequences which are sufficient to render promoter-dependent gene expression cell-type specific, tissue-specific, or inducible by external signals or agents; such elements may be located in the 5' or 3' regions of the native gene. Particularly important transcription control sequences are those which control transcription initiation, such as promoter, enhancer, operator and repressor sequences. Suitable transcription control sequences include any transcription control sequence that can function in at least one of the recombinant cells of the present invention. A variety of such transcription control sequences are known to those skilled in the art. Preferred transcription control sequences include those which function in bacterial, yeast, fungal, insect and mammalian cells, such as, but not limited to, tac, lac, trp, trc, oxy-pro, omp/lpp, rrnB, bacteriophage lambda (λ) (such as λpL and λpR and fusions that include such promoters), bacteriophage T7, T7lac, bacteriophage T3, bacteriophage SP6, bacteriophage SPO1, metallothionein, α-mating factor, Pichia alcohol oxidase, alphavirus subgenomic promoters (such as Sindbis virus subgenomic promoters), antibiotic resistance gene, baculovirus, Heliothis zea insect virus, vaccinia virus, herpesvirus, poxvirus, adenovirus, cytomegalovirus (such as intermediate early promoters), simian virus 40, retrovirus, actin, retroviral long terminal repeat, Rous sarcoma virus, heat shock, phosphate and nitrate transcription control sequences as well as other sequences capable of controlling gene expression in prokaryotic or eukaryotic cells.
Particularly preferred transcription control sequences are plant transcription control sequences. The choice of transcription control sequence will vary depending on the temporal and spatial requirements for expression, and also depending on the target species. Thus, expression of the nucleotide sequences of this invention in any plant organ (leaves, roots, seedlings, immature or mature reproductive structures, etc.) or at any stage of plant development is preferred. Although many transcription control sequences from dicotyledons have been shown to be operational in monocotyledons and vice versa, ideally dicotyledonous transcription control sequences are selected for expression in dicotyledons, and monocotyledonous promoters for expression in monocotyledons. However, there is no restriction to the provenance of selected transcription control sequences; it is sufficient that they are operational in driving the expression of the nucleotide sequences in the desired cell.
Preferred transcription control sequences that are expressed constitutively include but are not limited to promoters from genes encoding actin or ubiquitin and the CaMV 35S and 19S promoters. The nucleotide sequences of this invention can also be expressed under the regulation of promoters that are chemically regulated. This enables the EG307 or EG1117 polypeptide to be synthesized only when the crop plants are treated with the inducing chemicals. Preferred technology for chemical induction of gene expression is detailed in the published application EP 0 332 104 (to Ciba-Geigy) and U.S. Pat. No. 5,614,395. A preferred promoter for chemical induction is the tobacco PR-1a promoter.
A preferred category of promoters is that which is induced by the physiological state of the plant (i.e., wound inducible, water-stress inducible, salt-stress inducible, disease inducible, and the like). Numerous promoters have been described which are expressed at wound sites and also at the sites of phytopathogen infection. Ideally, such a promoter should only be active locally at the sites of infection, and in this way the EG307 or EG1117 polypeptides only accumulate in cells in which the accumulation is desired.
Preferred promoters of this kind include those described by Stanford et al. Mol. Gen. Genet. 215: 200-208 (1989), Xu et al. Plant Molec. Biol. 22: 573-588 (1993), Logemann et al. Plant Cell 1: 151-158 (1989), Rohrmeier & Lehle, Plant Molec. Biol. 22: 783-792 (1993), Firek et al. Plant Molec. Biol. 22: 129-142 (1993), and Warner et al. Plant J. 3: 191-201 (1993).
Preferred tissue-specific expression patterns include but are not limited to green tissue specific, root specific, stem specific, and flower specific. Promoters suitable for expression in green tissue include many which regulate genes involved in photosynthesis and many of these have been cloned from both monocotyledons and dicotyledons. A preferred promoter is the maize PEPC promoter from the phosphoenol carboxylase gene (Hudspeth & Grula, Plant Molec. Biol. 12: 579-589 (1989)). A preferred promoter for root specific expression is that described by de Framond (FEBS 290: 103-106 (1991); EP 0 452 269 to Ciba-Geigy). A preferred stem specific promoter is that described in U.S. Pat. No. 5,625,136 (to Ciba-Geigy) and which drives expression of the maize trpA gene.
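By way of illustration only, the promoter preferences recited above can be collected into a simple lookup keyed by the desired expression pattern. The following Python sketch is not part of the disclosed methods; the dictionary keys, the function name `candidate_promoters`, and the short promoter labels are hypothetical conveniences, and only the promoters actually named in the preceding paragraphs are listed.

```python
# Illustrative lookup only: the keys and labels are assumptions made for
# this sketch; the promoters themselves are those named in the text above.
PROMOTERS_BY_PATTERN = {
    "constitutive": ["actin", "ubiquitin", "CaMV 35S", "CaMV 19S"],
    "chemically inducible": ["tobacco PR-1a"],
    "green tissue": ["maize PEPC"],
    "root": ["de Framond root-specific promoter"],
    "stem": ["maize trpA promoter"],
}


def candidate_promoters(pattern: str) -> list:
    """Return the promoters named in the text for an expression pattern."""
    key = pattern.strip().lower()
    if key not in PROMOTERS_BY_PATTERN:
        raise KeyError(f"no promoter listed for pattern: {pattern!r}")
    # Return a copy so callers cannot modify the table by accident.
    return list(PROMOTERS_BY_PATTERN[key])


if __name__ == "__main__":
    for pattern in PROMOTERS_BY_PATTERN:
        print(pattern, "->", ", ".join(candidate_promoters(pattern)))
```

Such a table merely records the stated preferences; as noted above, any transcription control sequence that is operational in the desired cell may be used.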
A recombinant molecule of the present invention is a molecule that can include at least one of any polynucleotide heretofore described operatively linked to at least one of any transcription control sequence capable of effectively regulating expression of the polynucleotide(s) in the cell to be transformed, examples of which are disclosed herein.
A recombinant cell of the present invention includes any cell transformed with at least one of any polynucleotide of the present invention. Suitable and preferred polynucleotides as well as suitable and preferred recombinant molecules with which to transfer cells are disclosed herein.
Recombinant cells of the present invention can also be co-transformed with one or more recombinant molecules including plant EG307 or EG1117 polynucleotides encoding one or more polypeptides of the present invention and one or more other polypeptides useful when expressed in plants.
It may be appreciated by one skilled in the art that use of recombinant DNA technologies can improve expression of transformed polynucleotides by manipulating, for example, the number of copies of the polynucleotides within a host cell, the efficiency with which those polynucleotides are transcribed, the efficiency with which the resultant transcripts are translated, and the efficiency of post-translational modifications. Recombinant techniques useful for increasing the expression of polynucleotides of the present invention include, but are not limited to, operatively linking polynucleotides to high-copy number plasmids, integration of the polynucleotides into one or more host cell chromosomes, addition of vector stability sequences to plasmids, substitutions or modifications of transcription control signals (e.g., promoters, operators, enhancers), substitutions or modifications of translational control signals (e.g., ribosome binding sites, Shine-Dalgarno sequences), modification of polynucleotides of the present invention to correspond to the codon usage of the host cell, deletion of sequences that destabilize transcripts, and use of control signals that temporally separate recombinant cell growth from recombinant enzyme production during fermentation. The activity of an expressed recombinant polypeptide of the present invention may be improved by fragmenting, modifying, or derivatizing polynucleotides encoding such a polypeptide.
Recombinant cells of the present invention can be used to produce one or more polypeptides of the present invention by culturing such cells under conditions effective to produce such a polypeptide, and recovering the polypeptide. Effective conditions to produce a polypeptide include, but are not limited to, appropriate media, bioreactor, temperature, pH and oxygen conditions that permit polypeptide production. An appropriate, or effective, medium refers to any medium in which a cell of the present invention, when cultured, is capable of producing an EG307 or EG1117 polypeptide of the present invention.
Such a medium is typically an aqueous medium comprising assimilable carbon, nitrogen and phosphate sources, as well as appropriate salts, minerals, metals and other nutrients, such as vitamins. The medium may comprise complex nutrients or may be a defined minimal medium. Cells of the present invention can be cultured in conventional fermentation bioreactors, which include, but are not limited to, batch, fed-batch, cell recycle, and continuous fermentors. Culturing can also be conducted in shake flasks, test tubes, microtiter dishes, and petri plates. Culturing is carried out at a temperature, pH and oxygen content appropriate for the recombinant cell. Such culturing conditions are well within the expertise of one of ordinary skill in the art.
Depending on the vector and host system used for production, resultant polypeptides of the present invention may either remain within the recombinant cell; be secreted into the fermentation medium; be secreted into a space between two cellular membranes, such as the periplasmic space in E. coli; or be retained on the outer surface of a cell or viral membrane.
The phrase "recovering the polypeptide" refers simply to collecting the whole fermentation medium containing the polypeptide and need not imply additional steps of separation or purification. Polypeptides of the present invention can be purified using a variety of standard polypeptide purification techniques, such as, but not limited to, affinity chromatography, ion exchange chromatography, filtration, electrophoresis, hydrophobic interaction chromatography, gel filtration chromatography, reverse phase chromatography, concanavalin A chromatography, chromatofocusing and differential solubilization.
Polypeptides of the present invention are preferably retrieved in "substantially pure" form. As used herein, "substantially pure" refers to a purity that allows for the effective use of the polypeptide as a diagnostic or test compound, and means, with increasing preference, at least 50%, 60%, 70%, 80%, 90%, 95%, or 98% homogeneous.
F. Transfected plant cells and transgenic plants
With regard to EG307 and EG1117, particularly preferred recombinant cells are plant cells. By "plant cell" is meant any self-propagating cell bounded by a semi-permeable membrane and containing a plastid. Such a cell also requires a cell wall if further propagation is desired. Plant cell, as used herein, includes, without limitation, algae, cyanobacteria, seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen, and microspores.
In a particularly preferred embodiment, at least one (or both) of the EG307 or EG1117 polypeptides, or an allele or mutant form thereof, of the invention is expressed in a higher organism, e.g., a plant. In this case, transgenic plants expressing effective amounts of the polypeptides exhibit improved economic productivity. A nucleotide sequence of the present invention is inserted into an expression cassette, which is then preferably stably integrated in the genome of said plant. In another preferred embodiment, the nucleotide sequence is included in a non-pathogenic self-replicating virus. Plants transformed in accordance with the present invention may be monocots or dicots and include, but are not limited to, maize, wheat, barley, rye, millet, chickpea, lentil, flax, olive, fig, almond, pistachio, walnut, beet, parsnip, citrus fruits, including, but not limited to, orange, lemon, lime, grapefruit, tangerine, minneola, and tangelo, sweet potato, bean, pea, chicory, lettuce, cabbage, cauliflower, broccoli, turnip, radish, spinach, asparagus, onion, garlic, pepper, celery, squash, pumpkin, hemp, zucchini, apple, pear, quince, melon, plum, cherry, peach, nectarine, apricot, strawberry, grape, raspberry, blackberry, pineapple, avocado, papaya, mango, banana, soybean, tomato, sorghum, sugarcane, sugarbeet, sunflower, rapeseed, clover, tobacco, carrot, cotton, alfalfa, rice, potato, eggplant, cucumber, Arabidopsis, and woody plants such as coniferous and deciduous trees.
Once a desired nucleotide sequence has been transformed into a particular plant species, it may be propagated in that species or moved into other varieties of the same species, particularly including commercial varieties, using traditional breeding techniques.
Accordingly, the present invention provides a method for producing a transfected plant cell or transgenic plant comprising the steps of a) transfecting a plant cell to contain a heterologous DNA segment encoding a protein and derived from an EG307 and/or EG1117 polynucleotide not native to said cell (the polynucleotide indeed could be native but the expression pattern could be developmentally altered, still leading to the preferred effect); wherein said polynucleotide is operably linked to a promoter that can be used effectively for expression of transgenic proteins; b) optionally growing and maintaining said cell under conditions whereby a transgenic plant is regenerated therefrom; c) optionally growing said transgenic plant under conditions whereby said DNA is expressed, whereby the total amount of EG307 and/or EG1117 polypeptide in said plant is altered. In a preferred embodiment, the method further comprises the step of obtaining and growing additional generations of descendants of said transgenic plant which comprise said heterologous DNA segment wherein said heterologous DNA segment is expressed. As used herein, "heterologous DNA", or in some cases, "transgene" refers to foreign genes or polynucleotides, or additional, or modified versions of native or endogenous genes or polynucleotides (perhaps driven by different promoters) in order to alter the traits of a plant in a specific manner.
The invention also provides plant cells which comprise heterologous DNA encoding an EG307 and/or EG1117 polypeptide. In a preferred embodiment, the transgenic plant cell is a propagation material of a transgenic plant. The present invention also provides a transfected host cell comprising a host cell transfected with a construct comprising a promoter, enhancer or intron polynucleotide from an evolutionarily significant EG307 and/or EG1117 polynucleotide, and a polynucleotide encoding a reporter protein.
The present invention also provides a method of providing improved economic productivity in a plant comprising: a) producing a transfected plant cell having a transgene encoding an EG307 and/or EG1117 polypeptide whereby EG307 and/or EG1117 expression in said plant cell is altered; and b) growing a transgenic plant from the transfected plant cell wherein the EG307 and/or EG1117 transgene is expressed in the transgenic plant. The expression of the transgene includes an increase in EG307 and/or EG1117 expression. In some embodiments, the expression of the transgene produces an RNA that may interfere with a native EG307 and/or EG1117 gene such that the expression of the native gene is either eliminated or reduced, resulting in a useful outcome.
The invention also provides a transgenic plant containing heterologous DNA which encodes an EG307 and/or EG1117 polypeptide that is expressed in plant tissue, including expression in a vector introduced into the plant.
The present invention also provides an isolated polynucleotide which includes a transcription control element operably linked to a polynucleotide that encodes the EG307 and/or EG1117 gene in plant tissue. In a preferred embodiment, the transcription control element is the promoter native to an EG307 and/or EG1117 gene.
The present invention also provides a method of making a transfected cell comprising a) identifying an evolutionarily significant EG307 and/or EG1117 polynucleotide in a domesticated plant; b) using said EG307 and/or EG1117 polynucleotide to identify a non-polypeptide coding sequence that may be a transcription or translation regulatory element, enhancer, intron or other 5' or 3' flanking sequence; c) assembling a construct comprising said non-polypeptide coding sequence and a polynucleotide encoding a reporter protein; and d) transfecting said construct into a host cell. The present invention also provides a transfected cell produced according to this method. In one embodiment, the host cell is a plant cell, and the method further comprises the step of growing and maintaining the cell under conditions suitable for regenerating a transgenic plant. Also provided is a transgenic plant produced by the method.
A nucleotide sequence of this invention is preferably expressed in transgenic plants, thus causing the biosynthesis of the corresponding EG307 and/or EG1117 polypeptide in the transgenic plants. In this way, transgenic plants with characteristics related to improved economic productivity are generated. For their expression in transgenic plants, the nucleotide sequences of the invention may require modification and optimization. Although preferred gene sequences may be adequately expressed in both monocotyledonous and dicotyledonous plant species, sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons as these preferences have been shown to differ (Murray et al. Nucl. Acids Res. 17: 477-498 (1989)). All changes required to be made within the nucleotide sequences such as those described above are made using well known techniques of site directed mutagenesis, PCR, and synthetic gene construction using the methods described in the published patent applications EP 0 385 962 (to Monsanto), EP 0 359 472 (to Lubrizol), and WO 93/07278 (to Ciba-Geigy).
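The kind of sequence modification contemplated above can be illustrated, purely as a sketch, by synonymous recoding of a coding sequence together with a GC content check. In the Python example below, the preferred-codon table, the short test sequence, and the function names `gc_content`, `translate` and `recode` are all hypothetical and chosen for illustration; an actual table would have to be derived from codon usage data for the target monocotyledon or dicotyledon, and only the subset of the standard genetic code needed for the example is included.

```python
# Subset of the standard genetic code covering only the codons used in
# this example ("*" marks a stop codon).
CODON_TO_AA = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AGA": "R", "CGC": "R",
    "TTA": "L", "CTG": "L",
    "AAA": "K", "AAG": "K",
    "TAA": "*", "TGA": "*",
}

# HYPOTHETICAL preferred codons (GC-rich choices picked for illustration);
# real preferences must come from codon usage data for the host species.
PREFERRED_CODON = {
    "M": "ATG", "A": "GCC", "R": "CGC", "L": "CTG", "K": "AAG", "*": "TGA",
}


def split_codons(cds: str) -> list:
    """Split an in-frame coding sequence into triplets."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def translate(cds: str) -> str:
    """Translate a coding sequence using the codon subset above."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(cds))


def recode(cds: str) -> str:
    """Replace each codon with the preferred synonymous codon."""
    return "".join(PREFERRED_CODON[CODON_TO_AA[c]] for c in split_codons(cds))


if __name__ == "__main__":
    original = "ATGGCTAGATTAAAATAA"  # made-up sequence, not EG307 or EG1117
    optimized = recode(original)
    # The encoded polypeptide must be unchanged by synonymous recoding.
    assert translate(original) == translate(optimized)
    print(optimized, round(gc_content(original), 2), round(gc_content(optimized), 2))
```

The recoded sequence encodes the same polypeptide while its GC content rises, which is the property relied upon when adapting a sequence to the codon and GC preferences of the intended host.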
For efficient initiation of translation, sequences adjacent to the initiating methionine may require modification. For example, they can be modified by the inclusion of sequences known to be effective in plants. Joshi has suggested an appropriate consensus for plants (NAR 15: 6643-6653 (1987)) and Clontech suggests a further consensus translation initiator (1993/1994 catalog, page 210). These consensuses are suitable for use with the nucleotide sequences of this invention. The sequences are incorporated into constructions comprising the nucleotide sequences, up to and including the ATG (while leaving the second amino acid unmodified), or alternatively up to and including the GTC subsequent to the ATG (with the possibility of modifying the second amino acid of the transgene).
Expression of the nucleotide sequences in transgenic plants is driven by transcription control elements shown to be functional in plants. Transformation of plants with a polynucleotide under the control of these regulatory elements provides for controlled expression in the transformed plant. Such transcription control elements have been described above. In addition to the selection of a suitable initiator of transcription, constructions for expression of EG307 and/or EG1117 polypeptide in plants require an appropriate transcription terminator to be attached downstream of the heterologous nucleotide sequence. Several such terminators are available and known in the art (e.g., tml from CaMV, E9 from rbcS). Any available terminator known to function in plants can be used in the context of this invention.
Numerous other sequences can be incorporated into expression cassettes described in this invention. These include sequences which have been shown to enhance expression such as intron sequences (e.g., from Adh1 and bronze1) and viral leader sequences (e.g., from TMV, MCMV and AMV).
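The arrangement of elements in such an expression cassette can be sketched as follows. This Python example is illustrative only: the `Part` type, the role names, and the function `assemble_cassette` are hypothetical, and the 5'-to-3' ordering of the optional leader and intron between the promoter and the coding sequence is an assumption of the sketch (the text above specifies only that the terminator is attached downstream of the heterologous nucleotide sequence).

```python
from typing import List, NamedTuple


class Part(NamedTuple):
    name: str
    role: str  # must be one of ROLE_ORDER


# Assumed 5'-to-3' order for this sketch. Leader and intron are optional
# enhancing elements; their placement here is an illustrative assumption.
ROLE_ORDER = ["promoter", "leader", "intron", "coding", "terminator"]
REQUIRED_ROLES = ("promoter", "coding", "terminator")


def assemble_cassette(parts: List[Part]) -> List[str]:
    """Validate the parts and return their names in 5'-to-3' order."""
    roles = [part.role for part in parts]
    unknown = sorted(set(roles) - set(ROLE_ORDER))
    if unknown:
        raise ValueError(f"unknown part role(s): {unknown}")
    for role in REQUIRED_ROLES:
        if roles.count(role) != 1:
            raise ValueError(f"cassette needs exactly one {role}")
    ordered = sorted(parts, key=lambda part: ROLE_ORDER.index(part.role))
    return [part.name for part in ordered]


if __name__ == "__main__":
    # Element names are taken from the examples in the text; which ones to
    # combine is an arbitrary choice made for this illustration.
    parts = [
        Part("E9 terminator (rbcS)", "terminator"),
        Part("EG307 coding sequence", "coding"),
        Part("Adh1 intron", "intron"),
        Part("CaMV 35S promoter", "promoter"),
        Part("TMV leader", "leader"),
    ]
    print(" > ".join(assemble_cassette(parts)))
```

A cassette lacking a promoter, coding sequence, or terminator is rejected, reflecting the requirement above that a suitable initiator of transcription and a downstream terminator both be present.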
The present invention also provides a method of increasing yield in a plant comprising a) producing a transgenic plant cell having a transgene encoding an EG307 and/or EG1117 polypeptide and the transgene is under the control of regulatory sequences suitable for controlled expression of the gene(s); and b) growing a transgenic plant from the transgenic plant cell wherein the EG307 and/or EG1117 transgene is expressed in the transgenic plant.
The present invention also provides a method of increasing yield in a plant comprising a) producing a transfected plant cell having a transgene containing the EG307 and/or EG1117 gene under the control of a promoter providing constitutive expression of the EG307 and/or EG1117 gene; and b) growing a transgenic plant from the transgenic plant cell wherein the EG307 and/or EG1117 transgene is expressed constitutively in the transgenic plant.
The present invention also provides a method of providing controllable yield in a transgenic plant comprising: a) producing a transfected plant cell having a transgene containing the EG307 and/or EG1117 gene under the control of a promoter providing controllable expression of the EG307 and/or EG1117 gene; and b) growing a transgenic plant from the transgenic plant cell wherein the EG307 and/or EG1117 transgene is controllably expressed in the transgenic plant. In one embodiment, the EG307 and/or EG1117 gene is expressed using a tissue-specific or cell type-specific promoter, or by a promoter that is activated by the introduction of an external signal or agent, such as a chemical signal or agent.
It may be preferable to target expression of the nucleotide sequences of the present invention to different cellular localizations in the plant. In some cases, localization in the cytosol may be desirable, whereas in other cases, localization in some subcellular organelle may be preferred. Subcellular localization of heterologous DNA encoded polypeptides is undertaken using techniques well known in the art. Typically, the DNA encoding the target peptide from a known organelle-targeted gene product is manipulated and fused upstream of the nucleotide sequence. Many such target sequences are known for the chloroplast and their functioning in heterologous constructions has been shown. The expression of the nucleotide sequences of the present invention is also targeted to the endoplasmic reticulum or to the vacuoles of the host cells. Techniques to achieve this are well-known in the art.

Vectors suitable for plant transformation are described elsewhere in this specification.
For Agrobacterium-mediated transformation, binary vectors or vectors carrying at least one T-DNA border sequence are suitable, whereas for direct gene transfer any vector is suitable and linear DNA containing only the construction of interest may be preferred. In the case of direct gene transfer, transformation with a single DNA species or co-transformation can be used (Schocher et al. Biotechnology 4: 1093-1096 (1986)). For both direct gene transfer and Agrobacterium-mediated transfer, transformation is usually (but not necessarily) undertaken with a selectable marker which may provide resistance to an antibiotic (kanamycin, hygromycin or methotrexate) or a herbicide (basta). The choice of selectable marker is not, however, critical to the invention.
In another preferred embodiment, a nucleotide sequence of the present invention is directly transformed into the plastid genome. A major advantage of plastid transformation is that plastids are capable of expressing multiple open reading frames under control of a single promoter. Plastid transformation technology is extensively described in U.S. Pat. Nos. 5,451,513, 5,545,817, and 5,545,818, in PCT application no. WO 95/16783, and in McBride et al. (1994) Proc. Natl. Acad. Sci. USA 91, 7301-7305. The basic technique for chloroplast transformation involves introducing regions of cloned plastid DNA flanking a selectable marker together with the gene of interest into a suitable target tissue, e.g., using biolistics or protoplast transformation (e.g., calcium chloride or PEG mediated transformation). The 1 to 1.5 kb flanking regions, termed targeting sequences, facilitate homologous recombination with the plastid genome and thus allow the replacement or modification of specific regions of the plastome. Initially, point mutations in the chloroplast 16S rRNA and rps12 genes conferring resistance to spectinomycin and/or streptomycin are utilized as selectable markers for transformation (Svab, Z., Hajdukiewicz, P., and Maliga, P. (1990) Proc. Natl. Acad. Sci. USA 87, 8526-8530; Staub, J. M., and Maliga, P. (1992) Plant Cell 4, 39-45). This resulted in stable homoplasmic transformants at a frequency of approximately one per 100 bombardments of target leaves. The presence of cloning sites between these markers allowed creation of a plastid targeting vector for introduction of foreign genes (Staub, J. M., and Maliga, P. (1993) EMBO J. 12, 601-606). Substantial increases in transformation frequency are obtained by replacement of the recessive rRNA or r-polypeptide antibiotic resistance genes with a dominant selectable marker, the bacterial aadA gene encoding the spectinomycin-detoxifying enzyme aminoglycoside-3'-adenyltransferase (Svab, Z., and Maliga, P. (1993) Proc. Natl. Acad. Sci. USA 90, 913-917). Previously, this marker had been used successfully for high-frequency transformation of the plastid genome of the green alga Chlamydomonas reinhardtii (Goldschmidt-Clermont, M. (1991) Nucl. Acids Res. 19: 4083-4089). Other selectable markers useful for plastid transformation are known in the art and encompassed within the scope of the invention. Typically, approximately 15-20 cell division cycles following transformation are required to reach a homoplastidic state.
Plastid expression, in which genes are inserted by homologous recombination into all of the several thousand copies of the circular plastid genome present in each plant cell, takes advantage of the enormous copy number advantage over nuclear-expressed genes to permit expression levels that can readily exceed 10% of the total soluble plant polypeptide. In a preferred embodiment, a nucleotide sequence of the present invention is inserted into a plastid targeting vector and transformed into the plastid genome of a desired plant host. Plants homoplastic for plastid genomes containing a nucleotide sequence of the present invention are obtained, and are preferentially capable of high expression of the nucleotide sequence.
The present invention also provides a method of identifying a plant yield-related gene comprising: a) providing a plant tissue sample; b) introducing into the plant tissue sample a candidate plant yield-related gene; c) expressing the candidate plant yield-related gene within the plant tissue sample; and d) determining whether the plant tissue sample exhibits change in yield response, whereby a change in response identifies a plant yield-related gene. The present invention also provides plant yield-related genes isolated according to the method.
Yield response, as used herein, is measured by techniques well known to those skilled in the art. In the cereals, yield response is determined, for example, by one or more of the following metrics: grain weight, grain length, grain weight/1000 grains, size of panicle, number of panicles, and number of grains/panicle.
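As a purely illustrative sketch of how some of the metrics listed above might be computed and compared, the Python example below derives grain weight per 1000 grains and grains per panicle from raw counts, and expresses the difference between a test sample and a control as a relative change. The function names and all numerical values are hypothetical and are not measurements from the present disclosure.

```python
def thousand_grain_weight(total_grain_weight_g: float, grain_count: int) -> float:
    """Grain weight per 1000 grains, in grams."""
    if grain_count <= 0:
        raise ValueError("grain_count must be positive")
    return total_grain_weight_g * 1000 / grain_count


def grains_per_panicle(grain_count: int, panicle_count: int) -> float:
    """Mean number of grains per panicle."""
    if panicle_count <= 0:
        raise ValueError("panicle_count must be positive")
    return grain_count / panicle_count


def relative_change(test_value: float, control_value: float) -> float:
    """Change of the test value relative to the control (0.1 means +10%)."""
    return (test_value - control_value) / control_value


if __name__ == "__main__":
    # HYPOTHETICAL example data for a control and a test plant sample.
    control = {"weight_g": 12.5, "grains": 500, "panicles": 5}
    test = {"weight_g": 16.5, "grains": 600, "panicles": 5}

    tgw_control = thousand_grain_weight(control["weight_g"], control["grains"])
    tgw_test = thousand_grain_weight(test["weight_g"], test["grains"])
    gpp_control = grains_per_panicle(control["grains"], control["panicles"])
    gpp_test = grains_per_panicle(test["grains"], test["panicles"])

    print("1000-grain weight change:", relative_change(tgw_test, tgw_control))
    print("grains/panicle change:", relative_change(gpp_test, gpp_control))
```

Whether a given relative change amounts to a change in yield response would, in practice, be judged against replicate measurements and an appropriate statistical test, which this sketch does not attempt.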
G. EG307 or EG1117 Antibodies
The present invention also includes isolated antibodies capable of selectively binding to an EG307 or EG1117 polypeptide of the present invention or to a mimetope thereof. Such antibodies are also referred to herein as anti-EG307 or anti-EG1117 antibodies. Particularly preferred antibodies of this embodiment include anti-O. sativa EG307 antibodies, anti-O. rufipogon EG307 antibodies, anti-Z. mays EG307 antibodies, anti-O. sativa EG1117 antibodies, anti-O. rufipogon EG1117 antibodies, and anti-Z. mays EG1117 antibodies.
Isolated antibodies are antibodies that have been removed from their natural milieu. The term "isolated" does not refer to the state of purity of such antibodies. As such, isolated antibodies can include anti-sera containing such antibodies, or antibodies that have been purified to varying degrees.

As used herein, the term "selectively binds to" refers to the ability of antibodies of the present invention to preferentially bind to specified polypeptides and mimetopes thereof of the present invention. Binding can be measured using a variety of methods known to those skilled in the art including immunoblot assays, immunoprecipitation assays, radioimmunoassays, enzyme immunoassays (e.g., ELISA), immunofluorescent antibody assays and immunoelectron microscopy; see, for example, Sambrook et al., ibid., and Harlow & Lane, 1990, ibid.
Antibodies of the present invention can be either polyclonal or monoclonal antibodies.
Antibodies of the present invention include functional equivalents such as antibody fragments and genetically-engineered antibodies, including single chain antibodies, that are capable of selectively binding to at least one of the epitopes of the polypeptide or mimetope used to obtain the antibodies. Antibodies of the present invention also include chimeric antibodies that can bind to more than one epitope. Preferred antibodies are raised in response to polypeptides, or mimetopes thereof, that are encoded, at least in part, by a polynucleotide of the present invention.
A preferred method to produce antibodies of the present invention includes (a) administering to an animal an effective amount of a polypeptide or mimetope thereof of the present invention to produce the antibodies and (b) recovering the antibodies.
In another method, antibodies of the present invention are produced recombinantly using techniques as heretofore disclosed to produce EG307 or EG1117 polypeptides of the present invention.
Antibodies of the present invention have a variety of potential uses that are within the scope of the present invention. For example, such antibodies can be used (a) as reagents in assays to detect expression of EG307 or EG1117 by plants and/or (b) as tools to screen expression libraries and/or to recover desired polypeptides of the present invention from a mixture of polypeptides and other contaminants. Furthermore, antibodies of the present invention can be used to target cytotoxic agents to plants in order to directly kill such plants.
Targeting can be accomplished by conjugating (i.e., stably joining) such antibodies to the cytotoxic agents using techniques known to those skilled in the art. Suitable cytotoxic agents are known to those skilled in the art and include, but are not limited to: double-chain polypeptides (i.e., toxins having A and B chains), such as diphtheria toxin, ricin toxin, Pseudomonas exotoxin, modeccin toxin, abrin toxin, and Shiga toxin; single-chain toxins, such as pokeweed antiviral polypeptide, α-amanitin, and ribosome inhibiting polypeptides; and chemical toxins, such as melphalan, methotrexate, nitrogen mustard, doxorubicin and daunomycin. Preferred double-chain toxins are modified to include the toxic domain and translocation domain of the toxin but lack the toxin's intrinsic cell binding domain.
H. Formulation of Growth-Enhancing Compositions
The invention also includes compositions comprising one or both of the EG307 and EG1117 polypeptides of the present invention. In order to effectively control growth, such compositions preferably contain sufficient amounts of polypeptide.
Such amounts vary depending on the target crop, and on the environmental conditions, such as humidity, temperature or type of soil. In a preferred embodiment, compositions comprising the EG307 and/or EG1117 polypeptide comprise host cells expressing the polypeptides without additional purification. In another preferred embodiment, the cells expressing the EG307 and/or EG1117 polypeptides are lyophilized prior to their use as a growth-enhancing agent. In another embodiment, the EG307 or EG1117 polypeptides are engineered to be secreted from the host cells. In cases where purification of the polypeptides from the host cells in which they are expressed is desired, various degrees of purification of the EG307 or EG1117 polypeptides are reached.
The present invention further embraces the preparation of compositions comprising at least one EG307 or EG1117 polypeptide of the present invention, which is homogeneously mixed with one or more compounds or groups of compounds described herein. The present invention also relates to methods of treating plants, which comprise application of the EG307 or EG1117 polypeptides, or compositions containing the EG307 or EG1117 polypeptides, to plants. The EG307 or EG1117 polypeptides can be applied, in the form of compositions, to the crop area or plant to be treated, simultaneously or in succession, with further compounds.
These compounds can be both fertilizers or micronutrient donors or other preparations that influence plant growth. They can also be selective herbicides, insecticides, fungicides, bactericides, nematicides, molluscicides or mixtures of several of these preparations, if desired together with further carriers, surfactants or application-promoting adjuvants customarily employed in the art of formulation. Suitable carriers and adjuvants can be solid or liquid and correspond to the substances ordinarily employed in formulation technology, e.g. natural or regenerated mineral substances, solvents, dispersants, wetting agents, tackifiers, binders or fertilizers.
A preferred method of applying EG307 or EG1117 polypeptides of the present invention is by spraying the soil, water, or foliage of plants. The number of applications and the rate of application depend on the type of plant and the desired increase in yield. The EG307 or EG1117 polypeptides can also penetrate the plant through the roots via the soil (systemic action) by impregnating the locus of the plant with a liquid composition, or by applying the compounds in solid form to the soil, e.g. in granular form (soil application). The EG307 or EG1117 polypeptides may also be applied to seeds (coating) by impregnating the seeds either with a liquid formulation containing EG307 or EG1117 polypeptides, or coating them with a solid formulation. In special cases, further types of application are also possible, for example, selective treatment of the plant stems or buds.
The EG307 or EG1117 polypeptides are used in unmodified form or, preferably, together with the adjuvants conventionally employed in the art of formulation, and are therefore formulated in known manner to emulsifiable concentrates, coatable pastes, directly sprayable or dilutable solutions, dilute emulsions, wettable powders, soluble powders, dusts, granulates, and also encapsulations, for example, in polymer substances. Like the nature of the compositions, the methods of application, such as spraying, atomizing, dusting, scattering or pouring, are chosen in accordance with the intended objectives and the prevailing circumstances.
The formulations, compositions or preparations containing the EG307 or EG1117 polypeptides and, where appropriate, a solid or liquid adjuvant, are prepared in a known manner, for example by homogeneously mixing and/or grinding the EG307 or EG1117 polypeptides with extenders, for example solvents, solid carriers and, where appropriate, surface-active compounds (surfactants).
Suitable solvents include aromatic hydrocarbons, preferably the fractions having 8 to 12 carbon atoms, for example, xylene mixtures or substituted naphthalenes; phthalates such as dibutyl phthalate or dioctyl phthalate; aliphatic hydrocarbons such as cyclohexane or paraffins; alcohols and glycols and their ethers and esters, such as ethanol, ethylene glycol monomethyl or monoethyl ether; ketones such as cyclohexanone; strongly polar solvents such as N-methyl-2-pyrrolidone, dimethyl sulfoxide or dimethyl formamide; as well as epoxidized vegetable oils such as epoxidized coconut oil or soybean oil; or water.
The solid carriers used e.g. for dusts and dispersible powders, are normally natural mineral fillers such as calcite, talcum, kaolin, montmorillonite or attapulgite. In order to improve the physical properties it is also possible to add highly dispersed silicic acid or highly dispersed absorbent polymers. Suitable granulated adsorptive carriers are porous types, for example pumice, broken brick, sepiolite or bentonite; and suitable nonsorbent carriers are materials such as calcite or sand. In addition, a great number of pregranulated materials of inorganic or organic nature can be used, e.g. especially dolomite or pulverized plant residues.

Suitable surface-active compounds are nonionic, cationic and/or anionic surfactants having good emulsifying, dispersing and wetting properties. The term "surfactants" will also be understood as comprising mixtures of surfactants. Suitable anionic surfactants can be both water-soluble soaps and water-soluble synthetic surface-active compounds.
Suitable soaps are the alkali metal salts, alkaline earth metal salts or unsubstituted or substituted ammonium salts of higher fatty acids (chains of 10 to 22 carbon atoms), for example the sodium or potassium salts of oleic or stearic acid, or of natural fatty acid mixtures which can be obtained for example from coconut oil or tallow oil. The fatty acid methyltaurin salts may also be used.
More frequently, however, so-called synthetic surfactants are used, especially fatty sulfonates, fatty sulfates, sulfonated benzimidazole derivatives or alkylarylsulfonates.
The fatty sulfonates or sulfates are usually in the form of alkali metal salts, alkaline earth metal salts or unsubstituted or substituted ammonium salts and have an 8 to 22 carbon alkyl radical which also includes the alkyl moiety of acyl radicals, for example, the sodium or calcium salt of lignosulfonic acid, of dodecylsulfate or of a mixture of fatty alcohol sulfates obtained from natural fatty acids. These compounds also comprise the salts of sulfuric acid esters and sulfonic acids of fatty alcohol/ethylene oxide adducts. The sulfonated benzimidazole derivatives preferably contain 2 sulfonic acid groups and one fatty acid radical containing 8 to 22 carbon atoms. Examples of alkylarylsulfonates are the sodium, calcium or triethanolamine salts of dodecylbenzenesulfonic acid, dibutylnaphthalenesulfonic acid, or of a naphthalenesulfonic acid/formaldehyde condensation product. Also suitable are corresponding phosphates, e.g. salts of the phosphoric acid ester of an adduct of p-nonylphenol with 4 to 14 moles of ethylene oxide.
Non-ionic surfactants are preferably polyglycol ether derivatives of aliphatic or cycloaliphatic alcohols, or saturated or unsaturated fatty acids and alkylphenols, said derivatives containing 3 to 30 glycol ether groups and 8 to 20 carbon atoms in the (aliphatic) hydrocarbon moiety and 6 to 18 carbon atoms in the alkyl moiety of the alkylphenols.
Further suitable non-ionic surfactants are the water-soluble adducts of polyethylene oxide with polypropylene glycol, ethylenediamine propylene glycol and alkylpolypropylene glycol containing 1 to 10 carbon atoms in the alkyl chain, which adducts contain 20 to 250 ethylene glycol ether groups and 10 to 100 propylene glycol ether groups.
These compounds usually contain 1 to 5 ethylene glycol units per propylene glycol unit.
Representative examples of non-ionic surfactants are nonylphenolpolyethoxyethanols, castor oil polyglycol ethers, polypropylene/polyethylene oxide adducts, tributylphenoxypolyethoxyethanol, polyethylene glycol and octylphenoxyethoxyethanol.
Fatty acid esters of polyoxyethylene sorbitan and polyoxyethylene sorbitan trioleate are also suitable non-ionic surfactants.
Cationic surfactants are preferably quaternary ammonium salts which have, as N-substituent, at least one C8-C22 alkyl radical and, as further substituents, lower unsubstituted or halogenated alkyl, benzyl or lower hydroxyalkyl radicals. The salts are preferably in the form of halides, methylsulfates or ethylsulfates, e.g. stearyltrimethylammonium chloride or benzyldi(2-chloroethyl)ethylammonium bromide. The surfactants customarily employed in the art of formulation are described, for example, in "McCutcheon's Detergents and Emulsifiers Annual," MC Publishing Corp., Ringwood, N.J., 1979, and Sisley and Wood, "Encyclopedia of Surface Active Agents," Chemical Publishing Co., Inc., New York, 1980.
IV. Identification of Genes Evolved Under Neutral Conditions
As described in detail herein, KA/Ks analysis allows the identification of positively selected protein-coding genes; however, this type of analysis can also be used to identify another set of evolutionarily significant genes, those genes evolving under neutral conditions.
A KA/Ks ratio > 1 signifies the role of positive selection, while conversely, a KA/Ks ratio < 1 suggests that a protein-coding gene has been negatively selected (i.e., has been conserved). As noted elsewhere herein, most genes (in fact, the vast majority) are conserved.
Only rare genes exhibit a KA/Ks ratio > 1, since very few genes are positively selected. As described herein, genes that were positively selected during domestication of the cereals (as well as other crops) have significant commercial value; however, another set of genes contained in the genomes of domesticated plants has been neither positively selected (to produce a desired, enhanced trait in the domesticated descendant) nor negatively selected (conserved).
This subset of plant genes, as noted above, also has a significant commercial value, and this set of genes can be identified by using KA/Ks analysis, to be described here.
These genes comprise those that render the plant resistant to drought, disease, pests (including, but not limited to, insects, animal herbivores, and microbes), high salt levels, and other stresses. Attacks by pests, and damage by drought or high salt levels, etc., are responsible for annual losses of billions of dollars to farmers, seed companies, and the large agricultural companies. The identification of genes that render wild plants resistant to these stresses is thus of great value, both socially (to a hungry world), and economically.
The method to detect these genes is as follows. After plants were first domesticated (and subsequently, as the descendants are further domesticated), they were "pampered", in the sense, for example, that humans supply water in sufficient quantities to meet the plant's needs. Thus the plant is not required to deal with drought stress "on its own". Similarly, humans remove insect pests (either physically, or through the use of pesticides), and segregate domesticated plants away from animal herbivores, such that the domesticated plant is not constantly confronted with the need to deal with these pests. In fact, it has been well documented that domesticated cereals, for example, are usually much more vulnerable to drought, high salt levels, pests, and other stresses than are their wild relatives/ancestors. This is because organisms generally do not maintain abilities that are not required to survive. As humans take over these roles, domesticated plants can save the high metabolic costs ("metabolic extravagance") of maintaining genes that code for stress-related traits.
This loss of resistance must of course stem from genetic differences (i.e., changes) between the ancestor and its pampered domesticated descendent. These genetic changes that result in loss of function can occur through three different mechanisms. The genes that code for these traits may actually be lost from the genome of the descendent crop.
Gene loss has been documented and is a well-known phenomenon. Similarly, the genes that code for "unneeded" traits in a descendent crop may still persist in the genome, but are no longer expressed, as a result of promoter changes, for example. Alternatively, the genes coding for these unneeded traits may still be part of the genome, and may still be expressed, but the genes may have accumulated nucleotide substitutions that render the protein product either nonfunctional or less fully functional than the ancestral homolog. These genes are thus evolving neutrally.
Neutral amino acid replacements accumulate in the protein product of a gene that is free of selective pressures (either positive or negative). For a domesticated plant that has been freed of the need to maintain a functional protein product for the gene of interest, a condition of molecular neutrality exists. This includes genes that code for traits like pest, disease, drought, salt, etc., resistance. Such fully unconstrained, neutrally evolving genes are perfect candidates for detection by KA/Ks analysis, as a neutrally evolving gene will ideally exhibit a KA/Ks ratio = 1, when the homologs from the ancestral and descendant plants are compared.
Thus the method invented and described here involves high-throughput sequencing of a cDNA library for an ancestral plant, BLASTING the resulting ESTs against a database of ESTs from the modern descendent, and performing KA/Ks analysis for homologous pairs.
The details of this process are explained elsewhere in this patent, for the case of a positively selected gene. The genes with a KA/Ks ratio = 1 will be the set of genes that control important stress-resistance traits, and these genes can be effectively and swiftly identified by use of this ratio. This commercially valuable set of genes includes those coding for desirable traits such as resistance to pests, disease, drought, high salt levels, etc. To best identify these genes, the EST sequencing from both the modern domesticated and the ancestral species should be performed very carefully, with a high standard of accuracy. While one can make use of cereal EST databases available in GenBank, one may also resequence ESTs from cDNA libraries prepared specifically for this purpose. The accuracy of sequencing is important, because this will give rise to a very narrow distribution of gene pair comparisons between ancestral and modern homologs that have a KA/Ks ratio equal to one. This will reduce the number of false positives to a minimum, thus expediting the process.
When the accuracy of the sequencing process is not stringently controlled, or is unknown, it is possible that sequencing errors will obscure a KA/Ks ratio of 1.0, and for this reason, KA/Ks values of between about 0.75 and 1.25 are checked carefully for evidence of neutral evolution.
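The ratio thresholds described in this section can be sketched as a simple classifier. This is a minimal illustration only, not the method as claimed: the function name `classify_ka_ks`, the label strings, and the choice to let the 0.75 to 1.25 window take precedence over the > 1 and < 1 rules are assumptions made for the example, and the KA and Ks values are assumed to have been computed already for each ancestral/domesticated homolog pair.

```python
def classify_ka_ks(ka, ks, low=0.75, high=1.25):
    """Classify a homologous gene pair by its KA/Ks ratio.

    low/high bound the window around 1.0 that is checked carefully
    for neutral evolution (treated as inclusive here, an assumption).
    """
    if ks <= 0:
        # The ratio is undefined without synonymous substitutions.
        return "undefined"
    ratio = ka / ks
    if low <= ratio <= high:
        return "candidate neutral"
    if ratio > high:
        return "positively selected"
    return "conserved"


# Hypothetical KA and Ks values, for illustration only.
pairs = {
    "pair_a": (0.020, 0.021),
    "pair_b": (0.002, 0.040),
    "pair_c": (0.060, 0.030),
}
labels = {name: classify_ka_ks(ka, ks) for name, (ka, ks) in pairs.items()}
```

Pairs labeled as candidate neutral would still need the careful checking described above, since sequencing errors can move a ratio into or out of the window.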
Polynucleotides that have evolved under neutral conditions can then be mapped onto one of the known quantitative trait loci, or QTL, whereby the specific stress-resistance trait controlled by that polynucleotide may be rapidly and conclusively identified.
V. Screening Methods for Identification of Agents
The present invention also provides screening methods using the polynucleotides and polypeptides identified and characterized using the above-described methods.
These screening methods are useful for identifying agents which may modulate the function(s) of the polynucleotides or polypeptides in a manner that would be useful for enhancing or diminishing a characteristic in a domesticated or ancestor organism.
Generally, the methods entail contacting at least one agent to be tested with a domesticated organism, ancestor organism, or transgenic organism or cell that has been transfected with a polynucleotide sequence identified by the methods described above, or a preparation of the polypeptide encoded by such polynucleotide sequence, wherein an agent is identified by its ability to modulate function of either the polynucleotide sequence or the polypeptide.
For example, an agent can be a compound that is applied or contacted with a domesticated plant or animal to induce expression of the identified gene at a desired time. Specifically in regard to plants, an agent could be used, for example, to induce flowering at an appropriate time.
As used herein, the term "agent" means a biological or chemical compound such as a simple or complex organic or inorganic molecule, a peptide, a protein or an oligonucleotide.
A vast array of compounds can be synthesized, for example oligomers, such as oligopeptides and oligonucleotides, and synthetic organic and inorganic compounds based on various core structures, and these are also included in the term "agent". In addition, various natural sources can provide compounds for screening, such as plant or animal extracts, and the like.
Compounds can be tested singly or in combination with one another.
To "modulate function" of a polynucleotide or a polypeptide means that the function of the polynucleotide or polypeptide is altered when compared to not adding an agent.
Modulation may occur on any level that affects function. A polynucleotide or polypeptide function may be direct or indirect, and measured directly or indirectly. A
"function" of a polynucleotide includes, but is not limited to, replication, translation, and expression pattern(s). A polynucleotide function also includes functions associated with a polypeptide encoded within the polynucleotide. For example, an agent which acts on a polynucleotide and affects protein expression, conformation, folding (or other physical characteristics), binding to other moieties (such as ligands), activity (or other functional characteristics), regulation and/or other aspects of protein structure or function is considered to have modulated polynucleotide function. The ways that an effective agent can act to modulate the expression of a polynucleotide include, but are not limited to 1) modifying binding of a transcription factor to a transcription factor responsive element in the polynucleotide; 2) modifying the interaction between two transcription factors necessary for expression of the polynucleotide;
3) altering the ability of a transcription factor necessary for expression of the polynucleotide to enter the nucleus; 4) inhibiting the activation of a transcription factor involved in transcription of the polynucleotide; 5) modifying a cell-surface receptor which normally interacts with a ligand and whose binding of the ligand results in expression of the polynucleotide; 6) inhibiting the inactivation of a component of the signal transduction cascade that leads to expression of the polynucleotide; and 7) enhancing the activation of a transcription factor involved in transcription of the polynucleotide.
A "function" of a polypeptide includes, but is not limited to, conformation, folding (or other physical characteristics), binding to other moieties (such as ligands), activity (or other functional characteristics), and/or other aspects of protein structure or functions. For example, an agent that acts on a polypeptide and affects its conformation, folding (or other physical characteristics), binding to other moieties (such as ligands), activity (or other functional characteristics), and/or other aspects of protein structure or functions is considered to have modulated polypeptide function. The ways that an effective agent can act to modulate the function of a polypeptide include, but are not limited to 1) changing the conformation, folding or other physical characteristics; 2) changing the binding strength to its natural ligand or changing the specificity of binding to ligands; and 3) altering the activity of the polypeptide.
Generally, the choice of agents to be screened is governed by several parameters, such as the particular polynucleotide or polypeptide target, its perceived function, its three-dimensional structure (if known or surmised), and other aspects of rational compound design.
Techniques of combinatorial chemistry can also be used to generate numerous permutations of candidates. Those of skill in the art can devise and/or obtain suitable agents for testing.
The in vivo screening assays described herein may have several advantages over conventional drug screening assays: 1) if an agent must enter a cell to achieve a desired therapeutic effect, an in vivo assay can give an indication as to whether the agent can enter a cell; 2) an in vivo screening assay can identify agents that, in the state in which they are added to the assay system, are ineffective to elicit at least one characteristic which is associated with modulation of polynucleotide or polypeptide function, but that are modified by cellular components once inside a cell in such a way that they become effective agents; 3) most importantly, an in vivo assay system allows identification of agents affecting any component of a pathway that ultimately results in characteristics that are associated with polynucleotide or polypeptide function.
In general, screening can be performed by adding an agent to a sample of appropriate cells which have been transfected with a polynucleotide identified using the methods of the present invention, and monitoring the effect, i.e., modulation of a function of the polynucleotide or the polypeptide encoded within the polynucleotide. The experiment preferably includes a control sample which does not receive the candidate agent. The treated and untreated cells are then compared by any suitable phenotypic criteria, including but not limited to microscopic analysis, viability testing, ability to replicate, histological examination, the level of a particular RNA or polypeptide associated with the cells, the level of enzymatic activity expressed by the cells or cell lysates, the interactions of the cells when exposed to infectious agents, and the ability of the cells to interact with other cells or compounds.
Differences between treated and untreated cells indicate effects attributable to the candidate agent. Optimally, the agent has a greater effect on experimental cells than on control cells.
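The comparison of treated and untreated cells described above can be sketched numerically for one criterion, such as the level of a particular RNA or polypeptide. This is a minimal sketch under stated assumptions: the function name `fold_change`, the use of replicate means, and the example readings are all hypothetical and are not specified in the text.

```python
from statistics import mean


def fold_change(treated, control):
    """Return mean(treated) / mean(control) for replicate measurements."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero; fold change is undefined")
    return mean(treated) / control_mean


# Hypothetical replicate readings (arbitrary units), for illustration only.
treated_readings = [200.0, 220.0, 180.0]
control_readings = [100.0, 90.0, 110.0]
change = fold_change(treated_readings, control_readings)
```

A value near 1 would suggest no effect attributable to the candidate agent for that criterion; in practice a statistical test across replicates would be needed before drawing any conclusion.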
Appropriate host cells include, but are not limited to, eukaryotic cells, preferably plant or animal cells. The choice of cell will at least partially depend on the nature of the assay contemplated.
To test for agents that upregulate the expression of a polynucleotide, a suitable host cell transfected with a polynucleotide of interest, such that the polynucleotide is expressed (as used herein, expression includes transcription and/or translation), is contacted with an agent to be tested. An agent would be tested for its ability to result in increased expression of mRNA and/or polypeptide. Methods of making vectors and transfection are well known in the art.
"Transfection" encompasses any method of introducing the exogenous sequence, including, for example, lipofection, transduction, infection or electroporation. The exogenous polynucleotide may be maintained as a non-integrated vector (such as a plasmid) or may be integrated into the host genome.
To identify agents that specifically activate transcription, transcription regulatory regions could be linked to a reporter gene and the construct added to an appropriate host cell.
As used herein, the term "reporter gene" means a gene that encodes a gene product that can be identified (i.e., a reporter protein). Reporter genes include, but are not limited to, alkaline phosphatase, chloramphenicol acetyltransferase, β-galactosidase, luciferase and green fluorescent protein (GFP). Identification methods for the products of reporter genes include, but are not limited to, enzymatic assays and fluorimetric assays. Reporter genes and assays to detect their products are well known in the art and are described, for example, in Ausubel et al. (1987) and periodic updates. Reporter genes, reporter gene assays, and reagent kits are also readily available from commercial sources. Examples of appropriate cells include, but are not limited to, plant, fungal, yeast, mammalian, and other eukaryotic cells. A practitioner of ordinary skill will be well acquainted with techniques for transfecting eukaryotic cells, including the preparation of a suitable vector, such as a viral vector; conveying the vector into the cell, such as by electroporation; and selecting cells that have been transformed, such as by using a reporter or drug sensitivity element. The effect of an agent on transcription from the regulatory region in these constructs would be assessed through the activity of the reporter gene product.
Besides the increase in expression under conditions in which it is normally repressed mentioned above, expression could be decreased when it would normally be expressed. An agent could accomplish this through a decrease in transcription rate and the reporter gene system described above would be a means to assay for this. The host cells to assess such agents would need to be permissive for expression.
Cells transcribing mRNA (from the polynucleotide of interest) could be used to identify agents that specifically modulate the half life of mRNA and/or the translation of mRNA. Such cells would also be used to assess the effect of an agent on the processing and/or post-translational modification of the polypeptide. An agent could modulate the amount of polypeptide in a cell by modifying the turn-over (i.e., increase or decrease the half life) of the polypeptide. The specificity of the agent with regard to the mRNA
and polypeptide would be determined by examining the products in the absence of the agent and by examining the products of unrelated mRNAs and polypeptides. Methods to examine mRNA half life, protein processing, and protein turn-over are well known to those skilled in the art.
In vivo screening methods could also be useful in the identification of agents that modulate polypeptide function through the interaction with the polypeptide directly. Such agents could block normal polypeptide-ligand interactions, if any, or could enhance or stabilize such interactions. Such agents could also alter a conformation of the polypeptide.
The effect of the agent could be determined using immunoprecipitation reactions.
Appropriate antibodies would be used to precipitate the polypeptide and any protein tightly associated with it. By comparing the polypeptides immunoprecipitated from treated cells and from untreated cells, an agent could be identified that would augment or inhibit polypeptide-ligand interactions, if any. Polypeptide-ligand interactions could also be assessed using cross-linking reagents that convert a close, but noncovalent interaction between polypeptides into a covalent interaction. Techniques to examine protein-protein interactions are well known to those skilled in the art. Techniques to assess protein conformation are also well known to those skilled in the art.
It is also understood that screening methods can involve in vitro methods, such as cell-free transcription or translation systems. In those systems, transcription or translation is allowed to occur, and an agent is tested for its ability to modulate function.
For an assay that determines whether an agent modulates the translation of mRNA or a polynucleotide, an in vitro transcription/translation system may be used. These systems are available commercially and provide an in vitro means to produce mRNA corresponding to a polynucleotide sequence of interest. After mRNA is made, it can be translated in vitro and the translation products compared. Comparison of translation products between an in vitro expression system that does not contain any agent (negative control) and an in vitro expression system that does contain an agent indicates whether the agent is affecting translation.
Comparison of translation products between control and test polynucleotides indicates whether the agent, if acting on this level, is selectively affecting translation (as opposed to affecting translation in a general, non-selective or non-specific fashion). The modulation of polypeptide function can be accomplished in many ways including, but not limited to, the in vivo and in vitro assays listed above as well as in in vitro assays using protein preparations.
Polypeptides can be extracted and/or purified from natural or recombinant sources to create protein preparations.
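The two comparisons described above for the in vitro translation assay (with versus without agent, and test versus control polynucleotide) can be sketched as follows. This is an illustrative sketch only: the function names `relative_yield` and `is_selective`, the 20% tolerance, and the example yields are assumptions introduced here and do not come from the text.

```python
def relative_yield(with_agent, without_agent):
    """Translation product yield with agent, relative to the no-agent control."""
    if without_agent <= 0:
        raise ValueError("no-agent yield must be positive")
    return with_agent / without_agent


def is_selective(test_ratio, control_ratio, tolerance=0.2):
    """True when the test polynucleotide's yield changes by more than
    `tolerance` while the control polynucleotide's yield stays within it.

    The tolerance is an arbitrary illustrative cutoff.
    """
    test_changed = abs(test_ratio - 1.0) > tolerance
    control_changed = abs(control_ratio - 1.0) > tolerance
    return test_changed and not control_changed


# Hypothetical yields (arbitrary units), for illustration only.
test_ratio = relative_yield(with_agent=40.0, without_agent=100.0)
control_ratio = relative_yield(with_agent=95.0, without_agent=100.0)
selective = is_selective(test_ratio, control_ratio)
```

An agent that reduced both ratios to a similar degree would be flagged as non-selective under this sketch, matching the distinction the text draws between selective and general effects on translation.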

An agent can be added to a sample of a protein preparation and the effect monitored; that is, whether and how the agent acts on a polypeptide and affects its conformation, folding (or other physical characteristics), binding to other moieties (such as ligands), activity (or other functional characteristics), and/or other aspects of protein structure or functions. An agent that has such an effect is considered to have modulated polypeptide function.
In an example of an assay for an agent that binds to a polypeptide encoded by a polynucleotide identified by the methods described herein, a polypeptide is first recombinantly expressed in a prokaryotic or eukaryotic expression system as a native or as a fusion protein in which a polypeptide (encoded by a polynucleotide identified as described above) is conjugated with a well-characterized epitope or protein. Recombinant polypeptide is then purified by, for instance, immunoprecipitation using appropriate antibodies or anti-epitope antibodies or by binding to immobilized ligand of the conjugate. An affinity column made of polypeptide or fusion protein is then used to screen a mixture of compounds which have been appropriately labeled. Suitable labels include, but are not limited to, fluorochromes, radioisotopes, enzymes and chemiluminescent compounds. The unbound and bound compounds can be separated by washes using various conditions (e.g. high salt, detergent) that are routinely employed by those skilled in the art. Non-specific binding to the affinity column can be minimized by pre-clearing the compound mixture using an affinity column containing merely the conjugate or the epitope. Similar methods can be used for screening for an agent(s) that competes for binding to polypeptides. In addition to affinity chromatography, there are other techniques such as measuring the change of melting temperature or the fluorescence anisotropy of a protein which will change upon binding another molecule. For example, a BIAcore assay using a sensor chip (supplied by Pharmacia Biosensor, Stitt et al. (1995) Cell 80: 661-670) that is covalently coupled to polypeptide may be performed to determine the binding activity of different agents.
It is also understood that the in vitro screening methods of this invention include structural, or rational, drug design, in which the amino acid sequence, three-dimensional atomic structure or other property (or properties) of a polypeptide provides a basis for designing an agent which is expected to bind to a polypeptide. Generally, the design and/or choice of agents in this context is governed by several parameters, such as side-by-side comparison of the structures of a domesticated organism's and homologous ancestral polypeptides, the perceived function of the polypeptide target, its three-dimensional structure (if known or surmised), and other aspects of rational drug design. Techniques of combinatorial chemistry can also be used to generate numerous permutations of candidate agents.
Also contemplated in screening methods of the invention are transgenic animal and plant systems, which are known in the art.
The screening methods described above represent primary screens, designed to detect any agent that may exhibit activity that modulates the function of a polynucleotide or polypeptide. The skilled artisan will recognize that secondary tests will likely be necessary in order to evaluate an agent further. For example, a secondary screen may comprise testing the agent(s) in an assay using mice and other animal models (such as rat), which are known in the art, or in the domesticated or ancestral plant or animal itself. In addition, a cytotoxicity assay would be performed as a further corroboration that an agent which tested positive in a primary screen would be suitable for use in living organisms. Any assay for cytotoxicity would be suitable for this purpose, including, for example, the MTT assay (Promega).
The screening methods detailed earlier in this specification may be applied specifically to EG307 or EG1117. Accordingly, the invention provides a method of identifying an agent that modulates the function of the non-polypeptide coding regions of an EG307 or EG1117 polynucleotide, comprising contacting a host cell that has been transfected with a construct comprising the non-polypeptide coding region operably linked to a reporter gene coding region, with at least one candidate agent, wherein the agent is identified by its ability to modulate the transcription or translation of said reporter polynucleotide. The present invention also provides agents identified by the method.
The present invention also provides a method of identifying an agent that modulates the function of the non-polypeptide coding regions of an evolutionarily significant EG307 or EG1117 polynucleotide, comprising contacting a plant or transgenic plant containing an EG307 or EG1117 polynucleotide with at least one candidate agent, wherein the agent is identified by its ability to modulate the transcription or translation of said reporter polynucleotide. The present invention also provides agents identified by the method.
The present invention also provides a method of identifying an agent which may modulate yield, said method comprising contacting at least one candidate agent with a plant or cell comprising an EG307 or EG1117 gene, wherein the agent is identified by its ability to modulate yield. In one embodiment, the plant or cell is transfected with a polynucleotide encoding an EG307 or EG1117 gene. The present invention also provides agents identified by the method. In one embodiment, the identified agent modulates yield by modulating a function of the polynucleotide encoding the polypeptide. In another embodiment, the identified agent modulates yield by modulating a function of the polypeptide.
The invention also includes agents identified by the screening methods described herein.
The following examples are provided to further assist those of ordinary skill in the art. Such examples are intended to be illustrative and therefore should not be regarded as limiting the invention. A number of exemplary modifications and variations are described in this application and others will become apparent to those of skill in this art. Such variations are considered to fall within the scope of the invention as described and claimed herein.
EXAMPLES
EXAMPLE 1: cDNA Library Construction
A domesticated plant or animal cDNA library is constructed using an appropriate tissue from the plant or animal. A person of ordinary skill in the art would know the appropriate tissue or tissues to analyze according to the trait of interest. Alternatively, the whole organism may be used. For example, 1 day old plant seedlings are known to express most of the plant's genes.
Total RNA is extracted from the tissue (RNeasy kit, Qiagen; RNAse-free Rapid Total RNA kit, 5 Prime--3 Prime, Inc., or any similar and suitable product) and the integrity and purity of the RNA are determined according to conventional molecular cloning methods.
Poly A+ RNA is isolated (Mini-Oligo(dT) Cellulose Spin Columns, 5 Prime--3 Prime, Inc., or any similar and suitable product) and used as template for the reverse-transcription of cDNA with oligo (dT) as a primer. The synthesized cDNA is treated and modified for cloning using commercially available kits. Recombinants are then packaged and propagated in a host cell line. Portions of the packaging mixes are amplified and the remainder retained prior to amplification. The library can be normalized and the number of independent recombinants in the library is determined.
EXAMPLE 2: Sequence Comparison
Randomly selected ancestor cDNA clones from the cDNA library are sequenced using an automated sequencer, such as an ABI 377 or MegaBACE 1000 or any similar and suitable product. Commonly used primers on the cloning vector such as the M13 Universal and Reverse primers are used to carry out the sequencing. For inserts that are not completely sequenced by end sequencing, dye-labeled terminators or custom primers can be used to fill in remaining gaps.
The detected sequence differences are initially checked for accuracy, for example by finding the points where there are differences between the domesticated and ancestor sequences; checking the sequence fluorogram (chromatogram) to determine if the bases that appear unique to the domesticated organism correspond to strong, clear signals specific for the called base; checking the domesticated organism's hits to see if there is more than one sequence that corresponds to a sequence change; and other methods known in the art, as needed. Multiple domesticated organism sequence entries for the same gene that have the same nucleotide at a position where there is a different ancestor nucleotide provide independent support that the domesticated sequence is accurate, and that the domesticated/ancestor difference is real. Such changes are examined using public or commercial database information and the genetic code to determine whether these DNA sequence changes result in a change in the amino acid sequence of the encoded protein. The sequences can also be examined by direct sequencing of the encoded protein.
EXAMPLE 3: Molecular Evolution Analysis
The domesticated plant or animal and wild ancestor sequences under comparison are subjected to KA/Ks analysis. In this analysis, publicly or commercially available computer programs, such as Li93 and INA, are used to determine the number of non-synonymous changes per site (KA) divided by the number of synonymous changes per site (Ks) for each sequence under study as described above. Full-length coding regions or partial segments of a coding region can be used. The higher the KA/Ks ratio, the more likely that a sequence has undergone adaptive evolution. Statistical significance of KA/Ks values is determined using established statistical methods and available programs such as the t-test.
To further lend support to the significance of a high KA/Ks ratio, the domesticated sequence under study can be compared to other evolutionarily proximate species. These comparisons allow further discrimination as to whether the adaptive evolutionary changes are unique to the domesticated plant or animal lineage compared to other closely related species.
The sequences can also be examined by direct sequencing of the gene of interest from representatives of several diverse domesticated populations to assess to what degree the sequence is conserved in the domesticated plant or animal.

EXAMPLE 4: cDNA Library Construction
A teosinte cDNA library is constructed using whole teosinte 1 day old seedlings, or other appropriate plant tissues. Total RNA is extracted from the seedling tissue and the integrity and purity of the RNA are determined according to conventional molecular cloning methods. Poly A+ RNA is selected and used as template for the reverse-transcription of cDNA with oligo (dT) as a primer. The synthesized cDNA is treated and modified for cloning using commercially available kits. Recombinants are then packaged and propagated in a host cell line. Portions of the packaging mixes are amplified and the remainder retained prior to amplification. Recombinant DNA is used to transfect E. coli host cells, using established methods. The library can be normalized and the number of independent recombinants in the library is determined.
EXAMPLE 5: Sequence Comparison
Randomly selected teosinte seedling cDNA clones from the cDNA library are sequenced using an automated sequencer, such as the ABI 377. Commonly used primers on the cloning vector such as the M13 Universal and Reverse primers are used to carry out the sequencing. For inserts that are not completely sequenced by end sequencing, dye-labeled terminators are used to fill in remaining gaps.
The resulting teosinte sequences are compared to domesticated maize sequences via database searches. Genome databases are publicly or commercially available for a number of species, including maize. One example of a maize database can be found at the MaizeDB website at the University of Missouri. MaizeDB is a public Internet gateway to current knowledge about the maize genome and its expression. Other appropriate maize EST (expressed sequence tag) databases are privately owned and maintained. The high scoring "hits," i.e., sequences that show a significant (e.g., >80%) similarity after homology analysis, are retrieved and analyzed. The two homologous sequences are then aligned using the alignment program CLUSTAL V developed by Higgins et al. Any sequence divergence, including nucleotide substitution, insertion and deletion, can be detected and recorded by the alignment.
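The similarity cutoff mentioned above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `percent_identity`, the use of simple percent identity over ungapped alignment columns as the similarity measure, and the `-` gap character are all hypothetical choices, not the scoring used by any particular database search or by CLUSTAL V.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and
    therefore have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # insertions/deletions are skipped, not counted as mismatches
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def is_high_scoring_hit(aligned_a, aligned_b, threshold=80.0):
    """Apply the illustrative >80% cutoff (strictly greater than the threshold)."""
    return percent_identity(aligned_a, aligned_b) > threshold
```

With this choice, gapped columns are recorded separately as insertions or deletions rather than lowering the identity score; other conventions are equally possible.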
The detected sequence differences are initially checked for accuracy by finding the points where there are differences between the teosinte and maize sequences; checking the sequence fluorogram (chromatogram) to determine if the bases that appear unique to maize correspond to strong, clear signals specific for the called base; checking the maize hits to see if there is more than one maize sequence that corresponds to a sequence change; and other methods known in the art as needed. Multiple maize sequence entries for the same gene that have the same nucleotide at a position where there is a different teosinte nucleotide provide independent support that the maize sequence is accurate, and that the teosinte/maize difference is real. Such changes are examined using public/commercial database information and the genetic code to determine whether these DNA sequence changes result in a change in the amino acid sequence of the encoded protein. The sequences can also be examined by direct sequencing of the encoded protein.
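The step of using the genetic code to decide whether a nucleotide difference changes the encoded amino acid can be sketched as follows. This is a minimal Python illustration, assuming two in-frame, gap-free coding sequences of equal length and the standard genetic code; the function name `classify_codon_changes` and the example sequences are hypothetical and are not taken from the specification.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(ancestor_cds, domesticated_cds):
    """List codons that differ and whether the encoded amino acid changes.

    Returns tuples of (codon number, ancestor codon, domesticated codon, kind),
    where kind is "synonymous" or "nonsynonymous". Codons containing
    ambiguous bases (e.g. N) are not handled and raise KeyError.
    """
    if len(ancestor_cds) != len(domesticated_cds) or len(ancestor_cds) % 3:
        raise ValueError("expected two in-frame, gap-free CDS of equal length")
    changes = []
    for i in range(0, len(ancestor_cds), 3):
        anc = ancestor_cds[i:i + 3].upper()
        dom = domesticated_cds[i:i + 3].upper()
        if anc == dom:
            continue
        same_aa = CODON_TABLE[anc] == CODON_TABLE[dom]
        kind = "synonymous" if same_aa else "nonsynonymous"
        changes.append((i // 3 + 1, anc, dom, kind))
    return changes
```

For example, comparing the made-up sequences `ATGCTTGAA` and `ATGCTCGAT` reports codon 2 (CTT to CTC, both leucine) as synonymous and codon 3 (GAA to GAT, glutamate to aspartate) as nonsynonymous. Counting such changes is only a first check; it is not the per-site KA and Ks estimate produced by the programs named in the following example.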
EXAMPLE 6: Molecular Evolution Analysis
The teosinte and maize sequences under comparison are subjected to KA/Ks analysis. In this analysis, publicly or commercially available computer programs, such as Li93 and INA, are used to determine the number of non-synonymous changes per site (KA) divided by the number of synonymous changes per site (Ks) for each sequence under study as described above. This ratio, KA/Ks, has been shown to be a reflection of the degree to which adaptive evolution, i.e., positive selection, has been at work in the sequence under study. Typically, full-length coding regions have been used in these comparative analyses. However, partial segments of a coding region can also be used effectively. The higher the KA/Ks ratio, the more likely that a sequence has undergone adaptive evolution. Statistical significance of KA/Ks values is determined using established statistical methods and available programs such as the t-test. Those genes showing statistically high KA/Ks ratios between teosinte and maize genes are very likely to have undergone adaptive evolution.
To further lend support to the significance of a high KA/Ks ratio, the sequence under study can be compared in other ancestral maize species. These comparisons allow further discrimination as to whether the adaptive evolutionary changes are unique to the domesticated maize lineage compared to other ancestors. The sequences can also be examined by direct sequencing of the gene of interest from representatives of several diverse maize populations to assess to what degree the sequence is conserved in the maize species.
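How the ratio is read once KA and Ks have been estimated can be sketched in a few lines of Python. This is an illustration only: the function names `ka_ks_ratio` and `describe` are hypothetical, the KA and Ks inputs are assumed to come from an external program, and no significance test is performed here. The ratio is mathematically undefined when Ks is zero, so the helper returns `None` in that case rather than a number.

```python
def ka_ks_ratio(ka, ks):
    """Return KA/Ks, or None when Ks is zero and the ratio is undefined."""
    if ka < 0 or ks < 0:
        raise ValueError("KA and Ks must be non-negative")
    if ks == 0:
        return None
    return ka / ks


def describe(ka, ks):
    """Give a plain-language reading of a KA/Ks comparison (no statistics)."""
    ratio = ka_ks_ratio(ka, ks)
    if ratio is None:
        return "ratio undefined (Ks = 0)"
    if ratio > 1:
        return "ratio > 1: candidate for positive selection, pending a significance test"
    return "ratio <= 1: no indication of positive selection from the ratio alone"
```

A ratio above one is treated here only as a flag for follow-up, since the text requires statistical support before a gene is considered likely to have undergone adaptive evolution.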
EXAMPLE 7: Application of KA/Ks Method to Maize and Teosinte Homologous Sequences obtained from a Database
Comparison of domesticated maize and teosinte sequences available on GenBank (accessible through the Entrez Nucleotides database at the National Center for Biotechnology Information web site) revealed at least four homologous genes: waxy, A1*, A1 and globulin, for which sequence was available from both maize and teosinte. All available sequences for these genes for both maize and teosinte were compared. The KA/Ks ratios were determined using Li93 and/or INA:

Gene        Avg. No. Syn.    Avg. No. Non-Syn.    KA/Ks
            Substitutions    Substitutions
Waxy        4                1                    0.068
A1*         10               3                    0.011
A1          3                2                    0.44-0.89
Globulin    10               7                    0.42

Although it was anticipated that the polymorphism (multiple allelic copies) and/or the polyploidy (more than 2 sets of chromosomes per cell) observed in maize might make a KA/Ks analysis complex or difficult, it was found that this was not the case.
While the above KA/Ks values indicate that these genes are not positively selected, this example illustrates that the KA/Ks method can be applied to maize and teosinte sequences obtained from a database.
EXAMPLE 8: Study of Protein Function using a Transgenic Plant
The functional roles of a positively selected maize gene obtained according to the methods of Examples 4-7 can be assessed by conducting assessments of each allele of the gene in a transgenic maize plant. A transgenic plant can be created using an adaptation of the method described in Peng et al. (1999) Nature 400:256-261. Physiological, morphological and/or biochemical examination of the transgenic plant or protein extracts thereof will permit association of each allele with a particular phenotype.
EXAMPLE 9: Mapping of Positively Selected Genes to QTLs
QTL (quantitative trait locus) analysis has defined chromosomal regions that contain the genes that control several phenotypic traits of interest in maize, including plant height and oil content. By physically mapping each positively-selected gene identified by this method onto one of the known QTLs, the specific trait controlled by each positively-selected gene can be rapidly and conclusively identified.
EXAMPLE 10: Discovery of New Gene EG307
A normalized cDNA library was constructed from pooled tissues (including leaves, panicles, and stems) of Oryza rufipogon, the species known to be ancestral to modern rice. A clone designated PBI0307H9 was first sequenced as part of a high-throughput sequencing project on a MegaBACE 1000 sequencer (AP Biotech) (SEQ ID NO:89). The sequence of this clone was used as a query sequence in a BLAST search of the GenBank database. Four anonymous rice ESTs (accession nos. AU093345, C29145, ISAJ0161, AU056792) were retrieved as hits. Further sequencing revealed that PBI307H9 was a partial cDNA clone.
PBI307H9 had a high KA/Ks ratio when compared to the domesticated rice (Oryza sativa) ESTs in GenBank. cDNA amplification and sequencing were accomplished as follows: Total RNA was isolated from O. rufipogon (strain NSGC5953) and O. sativa cv. Nipponbare (Qiagen RNeasy Plant Mini Kit: cat #74903). First strand cDNA was synthesized using a dT primer (AP Biotech Ready-to-Go T-Primed First-Strand Kit: cat #27-9263-01) and then used for PCR analysis (Qiagen HotStarTaq Master Mix Kit: cat #203445).
For ease in nomenclature, the gene contained in clone PBI0307H9 is named EG307, both here and throughout. Initially, before final sequence confirmation, the Ka/Ks ratio derived from comparing modern rice (O. sativa) and ancestral rice (O. rufipogon) EG307 was 1.7.
Once these partial sequences were confirmed in both O. rufipogon and O. sativa, 5' RACE (Clontech SMART RACE cDNA Amplification Kit: cat # K1811-1) was performed with a gene specific primer to obtain the 5' end of this gene. The complete gene, termed EG307, has a coding region 1344 bp long. Final confirmation of the complete (1344 bp) coding region in O. sativa and O. rufipogon allowed pairwise comparisons of a number of strains of O. rufipogon and O. sativa. Many of these comparisons yield KA/Ks ratios greater than one, some with statistical significance. This is compelling evidence for the role of positive selection on the EG307 gene. As the selection pressure imposed upon ancestral rice was human imposed, this is compelling evidence that EG307 is a gene that was selected for during human domestication of rice. No homologs to EG307 were identified by BLAST search to the non-redundant section of GenBank, and, as noted above, only four rice genes were identified by BLAST in the EST section of GenBank (AU093345, AU056792, C29145, and ISA0161). All four ESTs were essentially uncharacterized.
EXAMPLE 11: Further Ka/Ks analysis of EG307
In order to ascertain the extent of genetic diversity present in O. sativa for the EG307 gene, genomic DNA was isolated from several different strains of O. sativa (acquired from the National Small Grains Collection, U.S.D.A., Aberdeen, Idaho), using Qiagen's protocol (DNeasy Plant Mini Kit: cat #69103). EG307 was then sequenced in genomic DNA from six different O. sativa strains: Nipponbare, Lemont, IR64, Teqing, Azucena, and Kasalath. The KA/Ks ratios for each of these strains varied when compared to O. rufipogon.
Table 1 shows results for the entire 1344 bases of coding region.

Table 1. Full CDS Ka/Ks ratios for O. rufipogon (strain IRGC105491) vs. all O. sativa strains examined.

              Ka        Ks        Ka/Ks    Size (bp)  Position in CDS  t
Azucena       0.00668   0.00922   0.724    1341       1-1341           0.398
Lemont        0.00668   0.00922   0.724    1341       1-1341           0.398
Nipponbare    0.00668   0.00922   0.724    1341       1-1341           0.398
Kasalath-1    0.00204   0.00483   0.422    1341       1-1341           0.552
Kasalath-2    0.00293   0.00482   0.608    1341       1-1341           0.369
Kasalath-3    0.00115   0.00483   0.238    1341       1-1341           0.740
Kasalath-4    0.00204   0.00482   0.423    1341       1-1341           0.551
IR64          0.00204   0.00700   0.291    1341       1-1341           0.902
Teqing        0.000     0.000     DIV/0    1341       1-1341           DIV/0

There were differences in the untranslated (UTR) regions between O. rufipogon and all these O. sativa strains. The wide range of KA/Ks ratios was expected due to the differing degrees of cross breeding among the O. sativa strains. Some were more similar to O. rufipogon than others due to cross breeding between O. rufipogon and the domesticated strains. Sliding window analysis was performed for all pairwise comparisons between the protein coding region of O. rufipogon EG307 and the protein coding region of each of the O. sativa strains we sequenced. This allowed identification of the specific areas of the protein that have been selected during domestication. Such pinpointing will allow a targeted approach to characterization of the changes that are important between the ancestral protein and the protein of the domesticated descendent crop plant. This may permit development of agents that target these vital domains of the protein, with the goal of increasing yield.
The length of the "window" was in most cases 150 bp, with a 50 bp overlap with adjacent windows. (Thus, as an example, if reading from the 5' end of a CDS, the first window was 150 bp in length, as was the adjacent second window to its 3' side. The second window, also 150 bp in length, overlapped the first window by 50 bp at the 5' end of the second window, and the third window, also 150 bp, overlapped the second window by 50 bp at the 5' end of the third window. Thus, the second window overlapped both its adjacent neighbors, each by 50 bp.) In addition, a second window analysis was completed in which the CDS was divided approximately into halves. This allows a greater sample size of nucleotides, so that an accurate statistical sampling can be undertaken.
It should also be noted that Ka/Ks, although conventionally expressed as a ratio, is really a way of asking "Does the Ka value exceed the Ks value by a statistically significant amount?" Thus, when Ks = 0, as often happens in ancestral rice-to-modern rice comparisons (because there are only some 7,000-8,000 years of domestication), a ratio cannot be computed, since the denominator of the fraction would equal zero. However, such comparisons may still detect the action of positive selection, if the (Ka-Ks) difference is statistically significant. Thus for several comparisons shown in the following tables, positive selection can be detected, as long as the comparison is statistically significant. Like those comparisons for which the Ka/Ks ratio is significant, these are shown in bold.
It should also be noted that as a result of the stochastic nature of the nucleotide substitution process, not all comparisons to modern rice strains are expected to reveal evidence of positive selection, particularly since some cross breeding between O. rufipogon and modern O. sativa is known to have occurred.
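The window layout described above can be sketched in Python. This is a minimal illustration, not the procedure actually used: the function name `codon_windows`, the nominal 100 bp step (a 150 bp window minus a 50 bp overlap), the rounding of each window start down to a codon boundary, and the folding of the 3' remainder into the last window are all assumptions. Under these assumptions, a 1086 bp CDS yields the window positions listed in Table 7.

```python
def codon_windows(cds_length, size=150, step=100):
    """Return 1-based, inclusive (start, end) windows along a CDS.

    Assumptions (not stated in the text): each nominal start k*step is
    rounded down to the nearest codon boundary so that windows stay in
    frame, and whatever remains at the 3' end is added to the last window.
    """
    windows = []
    k = 0
    while True:
        start0 = (k * step) // 3 * 3  # 0-based start, floored to a multiple of 3
        if start0 + size > cds_length:
            break  # a full-size window no longer fits
        windows.append((start0 + 1, start0 + size))
        k += 1
    if windows:
        last_start, _ = windows[-1]
        windows[-1] = (last_start, cds_length)  # extend last window to the CDS end
    return windows


if __name__ == "__main__":
    for start, end in codon_windows(1086):
        print(f"{start}-{end} ({end - start + 1} bp)")
```

Because starts are snapped to codon boundaries, the actual overlap between neighbouring windows varies slightly around 50 bp (48 or 51 bp in this sketch).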
Table 2. Sliding Window Ka/Ks Ratios for O. rufipogon (strain NSGC 5948) vs. O. sativa, strain "Nipponbare". Note that all statistically significant comparisons are shown in bold.

                    Ka        Ks        Ka/Ks    Size (bp)  Position in CDS  t
Window #1           0.000     0.0178    0.000    165        91-255           0.965
Window #2           0.00790   0.000     DIV/0    150        256-405          0.999
Window #3           0.000     0.000     DIV/0    150        355-504          DIV/0
Window #4           0.000     0.000     DIV/0    150        454-603          DIV/0
Window #5           0.0203    0.000     DIV/0    150        556-705          1.40
Window #6           0.0106    0.000     DIV/0    150        655-804          0.994
Window #7           0.0083    0.000     DIV/0    150        754-903          0.999
Window #8           0.0183    0.000     DIV/0    150        856-1005         1.40
Window #9           0.000     0.000     DIV/0    150        955-1104         DIV/0
Window #10          0.00990   0.02231   0.444    150        1054-1203        0.493
Window #11          0.00847   0.03236   0.262    186        1156-1341        0.942
1st large Window    0.00791   0.000     DIV/0    543        256-798          1.72
2nd large Window    0.00788   0.0108    0.728    543        799-1341         0.326
80% CDS             0.00789   0.00540   1.46     1086       256-1341         0.495
Nearly full CDS     0.00684   0.00701   0.976    1251       91-1341          0.0343

It is important to note here that there is statistical support for positive selection displayed in the comparison between O. rufipogon and Nipponbare, when the first large window is used. This is good evidence that positive selection has occurred (as a result of human domestication) between the ancestral O. rufipogon and the domesticated O. sativa (strain Nipponbare) EG307 homologs. As noted above, as a result of the stochastic nature of the nucleotide substitution process, not all comparisons to modern rice strains are expected to reveal evidence of positive selection. In addition, as noted above, cross breeding has occurred between O. rufipogon and some domesticated strains, further obscuring the signal of selection. What this analysis makes clear, however, is that positive selection has occurred on the EG307 gene.
Table 3. Sliding Window Ka/Ks Ratios for O. rufipogon, strain NSGC 5948, vs. O. sativa (strain "Lemont"). Note that all statistically significant comparisons are shown in bold.

                    Ka        Ks        Ka/Ks    Size (bp)  Position in CDS  t
Window #1           0.000     0.0178    0.000    165        91-255           0.965
Window #2           0.00790   0.000     DIV/0    150        256-405          0.999
Window #3           0.000     0.000     DIV/0    150        355-504          DIV/0
Window #4           0.000     0.000     DIV/0    150        454-603          DIV/0
Window #5           0.0203    0.000     DIV/0    150        556-705          1.40
Window #6           0.0106    0.000     DIV/0    150        655-804          0.994
Window #7           0.0083    0.000     DIV/0    150        754-903          0.999
Window #8           0.0183    0.000     DIV/0    150        856-1005         1.40
Window #9           0.000     0.000     DIV/0    150        955-1104         DIV/0
Window #10          0.00990   0.02231   0.444    150        1054-1203        0.493
Window #11          0.00847   0.03236   0.262    186        1156-1341        0.942
1st large Window    0.00791   0.000     DIV/0    543        256-798          1.72
2nd large Window    0.00788   0.0108    0.728    543        799-1341         0.326
80% CDS             0.00789   0.00540   1.46     1086       256-1341         0.495
Nearly full CDS     0.00684   0.00701   0.976    1251       91-1341          0.0343

It is important to note here that there is statistical support for positive selection displayed in the comparison between O. rufipogon and Lemont, when the first large window is used. This is good evidence that positive selection has occurred (as a result of human domestication) between the ancestral O. rufipogon and the domesticated O. sativa (strain Lemont) EG307 homologs. As noted above, as a result of the stochastic nature of the nucleotide substitution process, not all comparisons to modern rice strains are expected to reveal evidence of positive selection. In addition, as noted above, cross breeding has occurred between O. rufipogon and some domesticated strains, further obscuring the signal of selection. What this analysis makes clear, however, is that positive selection has occurred on the EG307 gene.
Table 4. Sliding Window Ka/Ks Ratios for O. rufipogon, strain NSGC 5948, vs. O. sativa (strain "IR64"). Note that all statistically significant comparisons are shown in bold.

                    Ka        Ks        Ka/Ks    Size (bp)  Position in CDS  t
Window #1           0.000     0.000     DIV/0    165        91-255           DIV/0
Window #2           0.000     0.000     DIV/0    150        256-405          DIV/0
Window #3           0.000     0.000     DIV/0    150        355-504          DIV/0
Window #4           0.000     0.000     DIV/0    150        454-603          DIV/0
Window #5           0.000     0.000     DIV/0    150        556-705          DIV/0
Window #6           0.000     0.000     DIV/0    150        655-804          DIV/0
Window #7           0.000     0.000     DIV/0    150        754-903          DIV/0
Window #8           0.000     0.000     DIV/0    150        856-1005         DIV/0
Window #9           0.000     0.000     DIV/0    150        955-1104         DIV/0
Window #10          0.000     0.000     DIV/0    150        1054-1203        DIV/0
Window #11          0.000     0.000     DIV/0    186        1156-1341        DIV/0
1st large Window    0.000     0.000     DIV/0    543        256-798          DIV/0
2nd large Window    0.000     0.000     DIV/0    543        799-1341         DIV/0
80% CDS             0.000     0.000     DIV/0    1086       256-1341         DIV/0
Nearly full CDS     0.000     0.000     DIV/0    1251       91-1341          DIV/0

Note that the protein coding region sequences of EG307 from O. rufipogon and from the O. sativa strain IR64 are identical; thus, the Ka and Ks values are both equal to zero and the Ka/Ks ratio cannot be computed (DIV/0). IR64 is a low yielding modern strain (personal communication, Shannon Pinson, Research Geneticist, USDA-ARS Rice Research Unit, Beaumont, TX), suspected of massive amounts of interbreeding with wild O. rufipogon.

Table 5. Sliding Window Ka/Ks Ratios for O. rufipogon, strain NSGC 5948, vs. O. sativa (strain "Teqing"). Note that all statistically significant comparisons are shown in bold.

                    Ka        Ks        Ka/Ks    Size (bp)  Position in CDS  t
Window #1           0.00985   0.000     DIV/0    165        91-255           0.995
Window #2           0.000     0.000     DIV/0    150        256-405          DIV/0
Window #3           0.000     0.000     DIV/0    150        355-504          DIV/0
Window #4           0.000     0.000     DIV/0    150        454-603          DIV/0
Window #5           0.000     0.000     DIV/0    150        556-705          DIV/0
Window #6           0.000     0.0343    0.000    150        655-804          0.987
Window #7           0.00826   0.000     DIV/0    150        754-903          0.999
Window #8           0.00806   0.000     DIV/0    150        856-1005         0.999
Window #9           0.000     0.000     DIV/0    150        955-1104         DIV/0
Window #10          0.000     0.000     DIV/0    150        1054-1203        DIV/0
Window #11          0.000     0.0155    0.000    186        1156-1341        0.980
1st large Window    0.000     0.0113    0.000    543        256-798          0.996
2nd large Window    0.00218   0.00536   0.407    543        799-1341         0.547
80% CDS             0.0011    0.00854   0.129    1086       256-1341         1.14
Nearly full CDS     0.00218   0.0076?   0.284    1251       91-1341          0.909

Note that no comparisons between the EG307 sequences from O. rufipogon and O. sativa strain Teqing exhibit Ka/Ks ratios greater than one. However, as noted above, as a result of the stochastic nature of the nucleotide substitution process, not all comparisons to modern rice strains are expected to reveal evidence of positive selection. In addition, as noted above, cross breeding has occurred between O. rufipogon and some domesticated strains, further obscuring the signal of selection.

Table 6. Sliding Window Ka/Ks Ratios for O. rufipogon, strain NSGC 5948, vs. O. sativa (strain "Azucena"). Note that all statistically significant comparisons are shown in bold.

                    Ka        Ks        Ka/Ks    Size (bp)  Position in CDS  t
Window #1           0.000     0.0178    0.000    165        91-255           0.965
Window #2           0.00790   0.000     DIV/0    150        256-405          0.999
Window #3           0.000     0.000     DIV/0    150        355-504          DIV/0
Window #4           0.000     0.000     DIV/0    150        454-603          DIV/0
Window #5           0.0203    0.000     DIV/0    150        556-705          1.40
Window #6           0.0106    0.000     DIV/0    150        655-804          0.994
Window #7           0.0083    0.000     DIV/0    150        754-903          0.999
Window #8           0.0183    0.000     DIV/0    150        856-1005         1.40
Window #9           0.000     0.000     DIV/0    150        955-1104         DIV/0
Window #10          0.00990   0.02231   0.444    150        1054-1203        0.493
Window #11          0.00847   0.03236   0.262    186        1156-1341        0.942
1st large Window    0.00791   0.000     DIV/0    543        256-798          1.72
2nd large Window    0.00788   0.0108    0.728    543        799-1341         0.326
80% CDS             0.00789   0.00540   1.46     1086       256-1341         0.495
Nearly full CDS     0.00684   0.00701   0.976    1251       91-1341          0.0343

It is important to note here that there is statistical support for positive selection displayed in the comparison between O. rufipogon and Azucena, when the first large window is used. This is again good evidence that positive selection has occurred (as a result of human domestication) between the ancestral O. rufipogon and the domesticated O. sativa (strain Azucena) EG307 homologs. As noted above, as a result of the stochastic nature of the nucleotide substitution process, not all comparisons to modern rice strains are expected to reveal evidence of positive selection. In addition, as noted above, cross breeding has occurred between O. rufipogon and some domesticated strains, further obscuring the signal of selection. What this analysis once again makes clear, however, is that positive selection has occurred on the EG307 gene.

Table 7. Sliding Window Ka/Ks Ratios for O. rufipogon, strain NSGC 5948, vs. O. sativa (strain "Kasalath 4"). Note that all statistically significant comparisons are shown in bold.

                      Ka        Ks        Ka/Ks    Size (bp)  Position in CDS  t
Window #1             0.000     0.000     DIV/0    150        1-150            DIV/0
Window #2             0.000     0.000     DIV/0    150        100-249          DIV/0
Window #3             0.000     0.000     DIV/0    150        199-348          DIV/0
Window #4             0.000     0.000     DIV/0    150        301-450          DIV/0
Window #5             0.000     0.000     DIV/0    150        400-549          DIV/0
Window #6             0.00826   0.000     DIV/0    150        499-648          0.999
Window #7             0.0163    0.000     DIV/0    150        601-750          1.41
Window #8             0.00790   0.000     DIV/0    150        700-849          0.999
Window #9             0.000     0.000     DIV/0    150        799-948          DIV/0
Window #10            0.000     0.0155    0.000    186        901-1086         0.980
1st Half Window       0.000     0.000     DIV/0    543        1-543            DIV/0
2nd Half Window       0.00437   0.00534   0.818    543        544-1086         0.157
Full CDS: Kasalath    0.000     0.00268   0.000    1086       1-1086           0.996
Full CDS: Kasalath    0.00110   0.00268   0.410    1086       1-1086           0.544
Full CDS: Kasalath    0.00110   0.00268   0.410    1086       1-1086           0.544
Full CDS: Kasalath    0.00220   0.00268   0.821    1086       1-1086           0.154

Note that sliding windows are shown only for Kasalath 4. There are 4 allelic differences (designated as Kasalath 1, 2, 3, and 4) in this sequence, and as they differ only by single nucleotides, we have chosen to show only one, for purposes of clarity. The Ka/Ks ratio for each of the full CDS sequences is shown, however. Note that no comparisons between the EG307 sequences from O. rufipogon and O. sativa strain Kasalath exhibit Ka/Ks ratios greater than one. However, as noted above, as a result of the stochastic nature of the nucleotide substitution process, not all comparisons to modern rice strains are expected to reveal evidence of positive selection. In addition, as noted above, cross breeding has occurred between O. rufipogon and some domesticated strains, further obscuring the signal of selection.
Upon completion of sequencing of EG307 in the NSGC 5953 strain of O. rufipogon, the completed sequence was used to design amplification primers. These primers were then used in the Polymerase Chain Reaction (PCR) to amplify the EG307 gene from several other O. rufipogon strains, including NSGC 5948, NSGC 5949, and IRGC105491. The amplified EG307 gene was then sequenced for each of these strains.

EXAMPLE 12: Mapping EG307
EG307 was then physically mapped in rice. Clemson University has developed a Rice Nipponbare bacterial artificial chromosome (BAC) Library; see Budiman, M.A. 1999, "Construction and characterization of deep coverage BAC libraries for two model crops: Tomato and rice, and initiation of a chromosome walk to jointless-2 in tomato". Ph.D. thesis, Texas A & M University, College Station, TX. Library clones are available from Clemson in the form of hybridization filters.
Two different rice BAC libraries used in screening were purchased from the Clemson University Genomics Institute (CUGI). The OSJNBa library was constructed at CUGI from genomic DNA of the japonica rice strain (Nipponbare variety), and has an average insert size of 130 kb, covering 11 genome equivalents. This is one of the most widely used libraries for the International Rice Genome Sequencing Project. It was constructed in the HindIII site of pBeloBAC11 and contains 36,864 clones. The OSJNBb library was also constructed at CUGI from genomic DNA of the japonica rice strain (Nipponbare variety), and has an average insert size of 120 kb, covering 15 genome equivalents. This is another of the most widely used libraries for the International Rice Genome Sequencing Project. It was constructed in the EcoRI site of pIndigoBac536 and contains 55,296 clones.
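The clone counts, average insert sizes and genome-equivalent figures quoted for the two libraries are related by simple arithmetic: total cloned DNA is the number of clones times the average insert size, and dividing by the stated coverage gives the genome size implied by those figures. The following Python sketch, with a hypothetical function name, works through that calculation as a consistency check. It is not a calculation reported by CUGI.

```python
def implied_genome_size_mb(n_clones: int, mean_insert_kb: float,
                           genome_equivalents: float) -> float:
    """Genome size (Mb) implied by a library's clone count, insert size and coverage."""
    total_cloned_kb = n_clones * mean_insert_kb
    return total_cloned_kb / genome_equivalents / 1000.0

# Figures as stated in the text for the two Nipponbare BAC libraries
osjnba = implied_genome_size_mb(36_864, 130, 11)
osjnbb = implied_genome_size_mb(55_296, 120, 15)

print(f"OSJNBa implies a genome of about {osjnba:.0f} Mb")
print(f"OSJNBb implies a genome of about {osjnbb:.0f} Mb")
```

Both libraries imply a genome in the range of roughly 430 to 450 Mb, so the two sets of stated figures are mutually consistent.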
The DIG protocol (BMB-Roche PCR DIG Probe Synthesis Kit cat #1636090) successfully labeled a unique EG307 494bp PCR product (primers: 5'-GAGTTCACAGGACAGCAGCA-3' (SEQ ID NO:87) and 5'-CAATTCTCTGAGATGCCTTGG-3' (SEQ ID NO:88)) to screen against rice BAC filters.
The blots were detected easily using chemiluminescence as per the DIG protocol (BMB-Roche DIG Luminescent Detection Kit: cat #1636090). Two different O. sativa libraries, OSJNBa and OSJNBb, were screened for a total of 5 different filters, three covering the OSJNBb library, and two covering the OSJNBa library. Table 8 shows the individual BACs identified by all three screens:

Table 8. Individual BACs identified in all screens of BAC library with EG307 494bp PCR product.

BAC         Contig        O. sativa chromosome
b0008J24    contig 80     chromosome
b0022E21    contig 80     chromosome
b0025P07    not mapped    --
b0029I04    not mapped    --
b0047E13    contig 80     chromosome
b0023J20    contig 80     chromosome
b0033B08    contig 80     chromosome
b0050N19    contig 80     chromosome
b0054B15    contig 80     chromosome
b0071C04    contig 80     chromosome
b0053G15    contig 80     chromosome
a0078K13    contig 80     chromosome
a0087K16    contig 80     chromosome
a0076M22    contig 80     chromosome
a0095O02    contig 80     chromosome

The reference data that allow physical mapping of a gene to a particular contig or chromosome are known to those skilled in the art, and are available on a web page made known to purchasers of filter sets or libraries from CUGI. There were also several faint, not significant hybridizations to contig 113, which was also on chromosome 3.
Rice contig 80 was on chromosome 3 and contained 66 BACs and 7 markers.
Judging by the overlap of all these BACs within contig 80, EG307 was approximately 200 kb upstream of marker CD01387 on the short arm of chromosome 3.
Data formerly in RiceGenes, a publicly accessible genome database developed and curated by the USDA-ARS, is now integrated in Gramene. Gramene was recently funded by the USDA IFAFS programme to create a curated, open-source, Web-accessible data resource for comparative genome analysis in the grasses. It provides a collection of rice genetic maps from Cornell University, the Japanese Rice Genome Research Program (JRGP), and the Korea Rice Genome Research Program (KRGRP), as well as comparisons with maps from other grasses (maize, oat, and wheat). The CD01387 marker was mapped to several different rice maps using the RiceGenes website.

There were also several QTLs mapped to this region, but many of them had rather wide ranges that covered almost the entire chromosome. One well-documented QTL for 1000-grain weight was mapped to this region of chromosome 3 and was associated with marker RZ672 (S.R. McCouch, et al. Genetics 150:899-909 Oct 98). On one map (R3), CD01387 mapped to 30.4 cM and RZ672 mapped to 39 cM, and both of these markers mapped to four other rice maps (Rice-CU-3, 3RC94, 3RC00, and 3RW99) in similar ranges (Figure 5). Thus, EG307 was within ~10 cM of this QTL marker. The R3 map also had a BAC, OSJNBa0091P11, mapped to 21.45 cM - 21.95 cM. EG307 was negative for this BAC and any others in the same contig upon screening the rice BAC libraries. The grain weight QTL region of rice had also been involved in some synteny studies between rice and maize that indicated synteny between rice chromosome 3S and maize chromosomes 1S and 9L (W.A. Wilson, et al. Genetics 153(1): 453-473 Sep 99).
EXAMPLE 13: Identification of EG307 in maize and teosinte
Searching the maize genome in GenBank by BLAST (using rice EG307 sequences) identified two maize ESTs, accession numbers BE511288 and BG320985, which appeared to be homologous. Primers were designed that allowed successful amplification of the maize (Zea mays) and teosinte (Zea mays parviglumis) EG307 homologs (SEQ ID NO:33 and SEQ ID NO:34, having a suggested open reading frame represented by SEQ ID NO:35, and SEQ ID NO:66, having a suggested open reading frame represented by SEQ ID NO:67). (Protein sequences for maize and teosinte were deduced and are represented by SEQ ID NO:36 and SEQ ID NO:68.) Table 9 shows Ka/Ks estimates for a comparison between maize and teosinte.
Table 9. Ka/Ks Ratios for teosinte (Zea mays parviglumis) vs. modern maize (Zea mays).

Maize (BS7)            Ka        Ks       Ka/Ks   Size (bp)   Position (bp) in CDS   t
Teosinte (Benz 967)    0.00970   0.0210   0.462   1347        1-1347                 1.16

Although these Ka/Ks values do not show ratios that are greater than one, there is still evidence for positive selection. All amino acid replacements between ancestral rice and its modern domesticated descendant were characterized, and the same analysis was performed for teosinte and its descendant, modern maize. In both (independent) cases of domestication, a consistent pattern is observed: nearly all amino acid replacements in the modern crop (whether maize or rice), as compared to the ancestral plant (teosinte or ancestral rice), result in increased charge/polarity, increased solubility, and decreased hydrophobicity. This pattern is most unlikely to have occurred by chance in these two independent domestication events. This suggests that these replacements were a similar response to human imposed domestication. This is powerful evidence that EG307 has been selected as a result of human domestication of these two cereals.
Upon completion of sequencing of EG307 in one strain of teosinte, the completed sequence was used to design amplification primers. These primers were then used in the Polymerase Chain Reaction (PCR) to amplify the EG307 gene from several other teosinte strains, as well as several strains of modern maize. The amplified EG307 gene was then sequenced for each of these strains.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be apparent to those of ordinary skill in the art that certain changes and modifications can be practiced. Therefore, the description and examples should not be construed as limiting the scope of the invention, which is delineated by the appended claims.
EXAMPLE 14: Discovery of New Gene EG1117 and Ka/Ks analysis
Clone IWF1117H5 (hereafter termed EG1117) was first sequenced during EG's high-throughput sequencing project, conducted on MegaBACE 1000 sequencers (AP Biotech). This clone was sequenced from a normalized cDNA library (Incyte Genomics) constructed from material from ancestral rice, Oryza rufipogon. GenBank BLAST results hit three anonymous rice ESTs (AU055884, AU055885, BI808367), two anonymous corn ESTs (AI783000, AW000223), and two anonymous wheat ESTs (BE444456, BE443845).
Further sequencing revealed that IWF1117H5 was a partial cDNA clone. Its Ka/Ks ratio involved division by zero (i.e., Ks was zero) when compared to the domesticated rice, Oryza sativa, ESTs in GenBank.
Genomic DNA was isolated from several different cultivars of O. sativa following Qiagen's protocol (DNeasy Plant Mini Kit: cat #69103). Total RNA was isolated from O. rufipogon and O. sativa cv. Nipponbare (Qiagen RNeasy Plant Mini Kit: cat #74903). First strand cDNA was synthesized using a dT primer (AP Biotech Ready-To-Go T-Primed First-Strand Kit: cat #27-9263-01) and then used for PCR analysis (Qiagen HotStarTaq Master Mix Kit: cat #203445). These protocols were also performed on Zea mays (maize), Zea mays parviglumis (teosinte), and on Triticum aestivum (modern wheat).

Once these partial sequences were confirmed in both O. rufipogon and O. sativa, inverse PCR was performed with gene specific primers to attempt to obtain the 5' end of this gene. To date, 1659 bp of CDS in O. rufipogon and O. sativa (Figures 6 and 7) have been identified. This partial sequence includes the stop codon.
EG1117 was then partially sequenced in genomic DNA from six different O. sativa strains: Nipponbare, Lemont, IR64, Teqing, Azucena, and Kasalath. The Ka/Ks ratios for each of these strains varied when compared to O. rufipogon strain 5948. The Ka/Ks ratios for 1656 bases of coding region are as follows:
Table 10. Ka/Ks Ratios for O. rufipogon, strain NSGC 5948, vs. various strains of O. sativa.

Strain                      Ka/Ks   t
Nipponbare: O. rufipogon    1.5     0.37
Lemont: O. rufipogon        1.5     0.37
Azucena: O. rufipogon       1.5     0.37
IR64: O. rufipogon          0.0     1.0
Teqing: O. rufipogon        0.0     1.0
Kasalath: O. rufipogon      0.0     1.4

The wide range of Ka/Ks ratios is expected due to the amount of cross breeding among the O. sativa strains. Some resemble O. rufipogon because of cross breeding between O. rufipogon and the domesticated strains.
The deduced protein sequence of O. sativa strain Nipponbare was used to perform a BLAST search. A very strong protein BLAST hit to Arabidopsis thaliana PTR2-B (histidine transporting protein, NP_178313) (SEQ ID NO:170) suggests that only about 30 codons of CDS are missing from the rice sequence (Figure 8).
Homology search results suggest that the EG1117 gene codes for a protein that is very similar to a family of peptide transport proteins that is found in a wide range of species including fungi, plants, insects and mammals. (See Koh, et al. (2002) Arabidopsis. Plant Physiol. 128:21-29; Hauser, et al. (2001) Mol. Membr. Biol. 18:105-12; Hauser, et al. (2000) J. Biol. Chem. 275:3037-42; Lubkowitz, et al. (1997) Microbiology 143:387-96; Steiner, et al. (1995) Mol. Microbiol. 16:825-34). EG1117 codes for a protein of 577 amino acids that appears to have 12 putative transmembrane domain regions. Ka/Ks analysis of EG1117 suggests that at least a portion of the EG1117 gene was strongly selected during the domestication of rice.

It is clear that this particular protein is unique even though it shows an apparent structural homology to a large number of well characterized peptide transport proteins (Steiner, et al. (1995)). The sequence appears to encode the predicted twelve transmembrane domains characteristic of this family of proteins. The EG1117 protein clearly has homology with not only peptide transport proteins, but also the oligopeptide transport proteins and nitrate transport proteins (Lubkowitz, et al. (1997); Lin, et al. (2000) Plant Physiol. 122:379-88; West, et al. (1998) Plant J. 15:221-229). There is no homology to other types of transport proteins.
Peptide transport proteins are integral membrane proteins that typically contain twelve transmembrane domains in the case of the di/tripeptide transporters and may contain between twelve and fourteen transmembrane domains in the case of oligopeptide transporters. The peptide transport protein family (PTR family) has been extensively studied in yeast and plants. Typically, these proteins aid in the transport of di/tripeptides or oligopeptides across a cell membrane in a proton-dependent fashion. These carriers couple peptide movement across the membrane to movement of protons down an inwardly directed electrochemical proton gradient, allowing the transport of peptides to occur against a substrate gradient (Nakazono, et al. (1996) Curr. Genet. 29:412-16; Matsukura, et al. (2000) Plant Physiol. 124:85-93; Toyofuku, et al. (2000) Plant Cell Physiol. 41:940-47; Hirose, et al. (1997) Plant Cell Physiol. 38:1389-1396; Horie, et al. (2001) Plant J. 27:129-38). Peptide transporters typically carry out sequence independent transport of all possible di- and tripeptides. All show stereoselectivity, with peptides containing L-enantiomers of amino acids possessing a higher affinity for binding than peptides containing D-enantiomers. Currently, it is not possible to relate the structure of the various transporters to their substrate specificity or to their affinity.
Many different peptide transport proteins have been identified in a variety of species. Global alignments of these proteins allowed researchers to identify motifs in the primary amino acid sequence that are typically found in all members of this family. In members of the peptide transporter family, a "FYING" motif, named for the conserved F-Y-x-x-I-N-x-G-S-L residues in the fifth transmembrane domain (TMD5), and either a W-Q-I-P-Q-Y motif or an E-x-C-E-R-F-x-Y-Y-G motif in transmembrane domain 10 (TMD10) have been identified (Becket, et al. (2001) in PEPTIDES: THE WAVE OF THE FUTURE, M. Lebl and R.A. Houghten, eds. American Peptide Society, 957-58). Interestingly, site directed mutagenesis of the FYING motif in S. cerevisiae results in attenuated growth on dipeptides, decreased sensitivity to toxic dipeptides and an elimination of radiolabeled dileucine uptake.
These data suggested that the FYING motif plays a crucial role in substrate recognition and/or translocation.
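The consensus motifs named above can be written as simple patterns in which "x" stands for any residue. The Python sketch below is offered only as an illustration of how a protein sequence might be scanned for them. The function name and the short test sequence are hypothetical and are not EG1117 sequence data; exact-match patterns of this kind will also miss conservative variants of a motif.

```python
import re

# Consensus motifs from the text, with "x" written as the regex wildcard "."
MOTIFS = {
    "FYING (TMD5)": re.compile(r"FY..IN.GSL"),
    "WQIPQY (TMD10)": re.compile(r"WQIPQY"),
    "ExCERFxYYG (TMD10)": re.compile(r"E.CERF.YYG"),
}

def find_motifs(protein: str) -> dict:
    """Return 1-based start positions of each motif in a one-letter protein string."""
    return {
        name: [m.start() + 1 for m in pattern.finditer(protein)]
        for name, pattern in MOTIFS.items()
    }

# Hypothetical toy sequence, used only to exercise the patterns
toy = "MKT" + "FYAAINLGSL" + "AAA" + "WQIPQY" + "LL"
hits = find_motifs(toy)
print(hits)
```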
In the case of plants, there is evidence in the literature that peptide transporters are not only important in the nutritional uptake of peptides and nitrate, but also that these transporters affect the responses to auxins, pathogenic toxins and other developmental processes. In the case of an Arabidopsis peptide transporter, AtPTR2, it was demonstrated that root growth was affected by toxic ethionine-containing peptides thought to be transported by this particular transport protein (Steiner, et al. (1994) Plant Cell 6:1289-99). In later studies, it was shown that either the over expression or inhibition of expression of the AtPTR2-B protein by recombinant expression of sense or antisense constructs of the AtPTR2-B gene resulted in delayed flowering and arrested seed development in transgenic Arabidopsis plants (Song, et al. (1997) Plant Physiol. 114:927-935). This suggests that peptide transporters can have a very profound effect on both the growth and development of plants.
Further analysis of the putative EG1117 peptide transporter demonstrates that a FYING motif is indeed present in TMD5 of EG1117 compared to other representative plant PTR-2 type proteins. In addition, EG1117 has a WQVPQY motif in TMD10 identical to the other representative plant PTR-2 type proteins. The multiple sequence alignments created by the DIALIGN local alignment program (Morgenstern (1999) Bioinformatics 15:211-218) demonstrate that there is nearly 95% alignment of the diverse PTR-2 type plant protein sequences with the rice EG1117 protein, with about 70% homology at the amino acid level. In the O. sativa and O. rufipogon proteins, there are only three non-synonymous amino acid replacements. These replacements are structurally significant replacements that may dramatically alter the function or specificity of the putative peptide transport protein. In one case, we have a change from a glutamine (polar uncharged) to a histidine (basic) amino acid.
At the other two positions, we see a change from the acidic aspartic acid to an uncharged glycine and a change from an acidic glutamic acid to an uncharged glycine. In general, all three changes shift towards a more basic charge profile.
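The direction of the charge shift for the three replacements can be tallied with a deliberately crude side-chain charge model. The Python sketch below is an illustration under stated assumptions: aspartate and glutamate are scored -1, lysine and arginine +1, and histidine is scored +1 to follow its classification as a basic residue in the text, although at neutral pH histidine is only partially protonated. The dictionary and function names are hypothetical.

```python
# Crude side-chain charges (assumption: histidine is counted as fully basic)
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1, "H": +1}

def charge_shift(from_residue: str, to_residue: str) -> int:
    """Change in nominal side-chain charge caused by one replacement."""
    return SIDE_CHAIN_CHARGE.get(to_residue, 0) - SIDE_CHAIN_CHARGE.get(from_residue, 0)

# The three replacements described in the text, as (from, to) one-letter codes
replacements = [("Q", "H"), ("D", "G"), ("E", "G")]

shifts = [charge_shift(a, b) for a, b in replacements]
net_shift = sum(shifts)

for (a, b), s in zip(replacements, shifts):
    print(f"{a} -> {b}: {s:+d}")
print(f"net shift: {net_shift:+d}")
```

Under this simple model each replacement moves the nominal charge in the positive direction, either by gaining a basic residue or by losing an acidic one, which is consistent with the shift towards a more basic profile noted above.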
EXAMPLE 15: Mapping EG1117
EG1117 was then physically mapped in rice. The DIG protocol (BMB-Roche PCR DIG Probe Synthesis Kit cat #1636090) successfully labeled a unique EG1117 657bp PCR product (primers: 5'-TCCTGCATCCCTCTCAACTT-3' and 5'-GCATTGGATTCGATGAATGT-3') to screen against rice BAC filters from Clemson University. The blots were definitively detected using chemiluminescence as per the DIG protocol (BMB-Roche DIG Luminescent Detection Kit: cat #1636090). Two different O. sativa libraries (OSJNBa and OSJNBb) were screened for a total of 2 different filters. Below are the BACs identified by both screens:
Table 11. Individual BACs identified in all screens of BAC library with EG1117 PCR product.

BAC         Contig    O. sativa chromosome
b0094D04    contig    chromosome 3
b0067O19    contig    chromosome 3
b0073E24    contig    chromosome 3
b0053L18    contig    chromosome 3
b0095H17    contig    chromosome 3
a0004L21    contig    chromosome 3
a0031E20    contig    chromosome 3
a0035M21    contig    chromosome 3
a0024M01    contig    chromosome 3

Rice contig 58 is on chromosome 3 and contains 181 BACs and 15 markers. EG1117 maps to the same BACs as markers CD01387, 0236, 0875, 82778 and 82015. These all map to 35.8 cM on map 3RJ98. This marker is mapped to several different rice maps, as accessed through the RiceGenes or Gramene website. There are also several QTLs mapped to this region. One well-documented QTL for 1000-grain weight is in this region of chromosome 3 and is associated with marker RZ672 (McCouch, S.R. et al. Genetics 150:899-909). On one map CD01387 maps to 30.4 cM and RZ672 maps to 39 cM, and both of these markers map to four other rice maps in similar ranges. This region of rice has also been involved in some studies between rice and maize that indicate synteny between rice chromosome 3S and maize chromosomes 1S and 9L (Wilson, W.A. et al. Genetics 153(1): 453-473).
EXAMPLE 16: Relationship of EG307 and EG1117
EG1117 and the previously described gene, EG307, map to the same Clemson BAC contig, 58. EG1117 lies towards the end of the p-arm about 3 cM upstream of EG307.
EG1117 maps to the same BACs as many of the markers on contig 58, and EG307 maps to the same contig, but has no markers directly mapped to its positive BACs.
A separate analysis was undertaken using data from a published YAC map for rice from the Rice Genome Project (RGP), a joint project of the National Institute of Agrobiological Sciences (NIAS) and the Institute of the Society for Techno-innovation of Agriculture, Forestry and Fisheries (STAFF) and a part of the Japanese Ministry of Agriculture, Forestry and Fisheries (MAFF) Genome Research Program.
The RGP database puts these two genes (EG1117 and EG307) 2 cM apart on chromosome 3. This YAC map has been accepted for publication in Plant Cell (Wu, J., et al., 2002 Plant Cell, prepublication copy). Upon a BLAST search (see above), EG1117 hit AU055884 and AU055885. Both of these GenBank EST entries come from a clone that maps to YACs Y2533 and Y5488. These YACs are anchored with S10968, which maps to Chromosome 3 at 33.5 cM.
The unexpected proximity of these two genes suggests a possible functional link.
EG307 and EG1117 may work together to increase yield. We speculate that EG307 may be a transcription factor for EG1117, thus creating a plant operon. All indications are that both EG307 and EG1117 are logical candidates for genes that would have an impact on agriculturally important traits based upon: 1) the Ka/Ks analysis on rice domesticated and ancestral species, 2) linkage to a grain weight QTL, and 3) an evolutionary pattern of amino acid replacements between ancestral and domesticated species. EG1117 also shows evidence for a strong positive selection during domestication based upon the Ka/Ks analysis in rice.
EG1117 codes for a protein homologous to a family of peptide transporters.
Other members of this family have been shown in plants to influence growth, flowering and seed development. EG1117 is also linked to the QTL for grain weight. It is highly unlikely that this is a coincidence. These are ideal genes to use in the aims of this proposal to validate the genes as agriculturally relevant.
EXAMPLE 17: Validation of Yield candidates: Association Analysis & Pedigree Analysis
The role of EG307 and EG1117 in controlling yield in the cereals can be validated by creation of transgenic plants, as described elsewhere in this patent; additional validation support comes from association analysis and pedigree analysis.
Association analysis involves sequencing each candidate gene in a large number of well-characterized rice strains to learn if the genes are associated with known traits. EG307 was sequenced in 13 well-characterized modern rice strains and it was determined that the derived, positively-selected allele is present in each of the 9 highest yielding strains, while the ancestral allele is present in the 4 lowest yielding strains. The pattern observed by examination of Table 12 is quite striking. This adds to the evidence that EG307 does influence yield, i.e., that it may be a so-called "yield" gene.
Table 12: Positively Selected EG307 Allele Partitions to High Yield Rice Strains

Strain Name       Accession Number   1000-Grain Weight   Derived Allele   Ancestral Allele
AC27              PI 378579          45.97               X
Kokoku Mochi      PI 389321          40.55               X
Razza 77          PI 279988          38.64               X
Vary Voto         PI 400774          37.17               X
Azucena           PI 400077          32.08               X
Dalila            PI 388430          24.28               X
TOTO              PI 274213          23.97               X
Sathri Sufaid     PI 385876          23.95               X
Zenith            CIor 7787          23.93               X
Ngoat             389239             9.57                                 X
BR52-8-1          408373             6.89                                 X
Jira Shahi        392245             9.05                                 X
IR1545-339-2_2    408625             3.37                                 X

Pedigree analysis takes advantage of two important sets of data. In addition to the available grain weight data, the derivation of many rice strains (i.e., in pedigrees) is well known. This allows a validation scheme in which yield-related candidate genes are plotted onto known rice strain pedigrees. For each strain, the known 1000-grain weight and the type of allele (i.e., the "derived", adapted, modern allele) of EG307 and EG1117 are noted. The pattern of transmission of the adapted allele can be inferred from these data.
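The partition described for Table 12 can be checked directly from the 1000-grain weights. The Python sketch below is illustrative only. The allele assigned to each strain follows the statement in the text that the derived allele is present in the 9 highest yielding strains and the ancestral allele in the 4 lowest. The count of possible arrangements assumes, purely for illustration, that the 4 ancestral alleles could have fallen on any 4 of the 13 strains with equal probability, which ignores relatedness among strains and is not a statistic reported in this work.

```python
from math import comb

# 1000-grain weights from Table 12, grouped by the allele stated in the text
derived = [45.97, 40.55, 38.64, 37.17, 32.08, 24.28, 23.97, 23.95, 23.93]
ancestral = [9.57, 6.89, 9.05, 3.37]

def partitions_cleanly(high_group, low_group) -> bool:
    """True when every weight in high_group exceeds every weight in low_group."""
    return min(high_group) > max(low_group)

clean = partitions_cleanly(derived, ancestral)

# Number of ways to place 4 ancestral alleles among 13 strains
n_arrangements = comb(len(derived) + len(ancestral), len(ancestral))
chance_of_clean_split = 1 / n_arrangements

print(f"clean partition: {clean}")
print(f"arrangements: {n_arrangements}, chance of this split: {chance_of_clean_split:.4f}")
```

Only one of the 715 possible arrangements places all four ancestral alleles on the four lightest strains, which gives a rough sense of why the pattern is described as striking.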
Example 18. Identification of EG1117 in maize and teosinte.
Using methods well known to those skilled in the art and described in Example 13, EG1117 was amplified from a number of maize strains (Zea mays mays) (SEQ ID NOs 119, 122, 123, 124, 127, 128, 129, 132, 135, 136, 137, 140, 141, 144, 145, 146, 149, 150, 151, 154) and a number of teosinte strains (Zea mays parviglumis) (SEQ ID NOs 157, 160, 161, 162, 165, 166, 167).

Example 19. Determination of the function of gene candidate EG307.
To elucidate the function of the EG307 protein, the rice proteins it interacts with will be determined. This "guilt-by-association" approach is useful in situations where one wants to identify potential pathways or functions associated with the unknown protein (Editorial (2001) Nature 410). Two methods of determining interacting proteins include a global screening approach, such as the yeast two-hybrid approach, as well as a more direct approach using a recombinantly expressed form of the unknown protein to isolate interacting proteins based upon the affinity of their interaction. A brief outline of the experimental methods and design are presented for both methods.
A. Yeast two-hybrid (YTH) screen. The YTH screening method for interacting proteins relies upon the creation of recombinant fusions of the protein of interest with one half of a transcription activation factor protein binding domain (the bait) and the use of a cDNA library of potential protein coding regions fused with the other half of the transcription activation factor activation domain (target protein). If the bait interacts with a target protein, the two halves of the transcription factor (binding domain and activation domain) are brought together and one gets initiation of transcription of a reporter gene. There are two basic types of YTH systems typically used, a GAL4 based system for standard YTH (Fields, et al. (1989) Nature 340:245-246) and a LexA based "Interaction-Trap" (IT) method (Golemis, et al. (1997) in CURRENT PROTOCOLS IN MOLECULAR BIOLOGY F.M. Ausubel, et al., eds., John Wiley & Sons, NY; Golemis, et al. (1997) in CELLS: A LABORATORY MANUAL D.L. Spector, R. Goldman, and L. Leinwand, eds., Cold Spring Harbor Laboratory Press).
Two rice YTH cDNA libraries (L cv Mil-Yang) are available commercially from Eugentech, Inc. (Yusong Taejon, Korea). These libraries are created in the Stratagene HybriZAP (GAL4 based system) from cDNA created using mRNA isolated from rice developing spikes that are either <2 cm or >2 cm in length. These libraries should encode proteins important in both early and late embryonic development. Importantly, we know from our RT-PCR analysis of EG307 expression that the EG307 protein should be present in these tissues. Therefore, these libraries will likely express proteins that interact with the EG307 protein.
Experimental Details. Standard reagents, yeast strains, vectors and DNA
isolation/sequencing specific for the HybriZAP YTH system will be obtained from Stratagene. The coding region of EG307 will be cloned using an RT-PCR
amplification of O.
sativa shoot mRNA. The PCR amplified insert will be cloned into a linearized pBD-GAL4 Cam phagemid vector and transformants carrying inserts will be selected on chloramphenicol plates to create the "bait" plasmid. The cloning junctions and coding region of EG307 will be sequenced using standard sequencing techniques at EG to ensure usage of the proper reading frame and that no mutations have been introduced during amplification of EG307.
Both commercial libraries from Eugentech are reported by the company to be single round amplified from the primary library and are supplied at 2 x 10^8 pfu. Library I (<2 cm spike) has an initial complexity of 1 x 10^6 pfu and an amplified library titer of 3.6 x 10^8 pfu/ml. Insert sizes in library I range from 0.5 to 3.0 kb. Library II (>2 cm spike) has an initial complexity of 5 x 10^5 pfu and an amplified library titer of 4 x 10^6 pfu/ml. Insert sizes in library II range from 0.5 to 1.6 kb. Since these premade libraries are commercially available at a very reasonable cost, it is prudent to do an initial YTH hunt using this particular system.
Both the bait plasmid and target plasmid library will be co-transfected into yeast strain YRG-2 using Stratagene's YRG-2 Yeast competent cell library kit. Yeast carrying both plasmids will be selected for by the complementation of the YRG-2 auxotrophic mutations. In this case the bait and target plasmids should complement both the tryptophan and leucine auxotrophy of the YRG-2 strain. Colonies that grow up from this co-transfection will be used to create a yeast library that will next be screened for interacting target proteins by further selection on plates lacking histidine and containing X-gal. The YRG-2 strain carries two additional reporter plasmids. One carries a GAL4 binding sequence upstream of a HIS gene and is used to complement the YRG-2 histidine auxotrophic mutation. The other plasmid carries a GAL4 binding sequence upstream of a LacZ reporter gene. This enables blue/white screening for reporter gene expression when the yeast are plated on X-gal containing plates.
During the interaction screening phase, only yeast containing an interacting bait-target combination will complement the histidine auxotrophy and also cause the expression of LacZ and conversion of the X-gal substrate to a blue product. This double screen for interactions helps to limit the number of false positive colonies identified. In addition, to some degree the intensity of the blue substrate production is some indication of the strength of the interaction between the bait and target proteins.
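The two-stage selection described above can be summarized as a small decision model. The Python sketch below is a simplified illustration and not a description of the Stratagene system's internals: it assumes that growth without tryptophan and leucine requires both plasmids, and that both reporters (histidine prototrophy and blue color on X-gal) respond only when bait and target interact. The function name and the outcome labels are hypothetical, and false positives such as a self-activating bait are not modeled.

```python
def classify_colony(has_bait: bool, has_target: bool, interacts: bool) -> str:
    """Classify a yeast cell under the simplified double-selection model."""
    # Stage 1: both plasmids are needed to complement the Trp and Leu auxotrophies
    if not (has_bait and has_target):
        return "no growth on -Trp/-Leu"
    # Stage 2: both reporters require a bait-target interaction
    his_reporter_on = interacts
    lacz_reporter_on = interacts
    if his_reporter_on and lacz_reporter_on:
        return "candidate interaction (grows on -His, blue on X-gal)"
    return "both plasmids present, no interaction detected"

for state in [(True, False, False), (True, True, False), (True, True, True)]:
    print(state, "->", classify_colony(*state))
```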
Interacting colonies will be picked, the DNA will be isolated, and the target plasmid inserts from several hundred colonies will be sequenced. Sequences will be translated and searched against protein sequences, both full length coding regions and potential open reading frames from ESTs. When multiple identical sequences are identified as targets, it is likely that the protein has been preferentially selected and represents an interacting protein. If a sequence is only represented once or a few times, it is either a non-specific interacting protein, or a transcript represented a limited number of times in the cDNA library.
Multiple classes of interacting proteins should be identified this way. Ideally, proteins of known function will be highly represented and a logical function or pathway easily identified. If the interacting protein(s) are unknown, but homologous to known proteins, it may still be possible to design experiments to confirm the relevance of the interaction based on known information in the public domain.
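The distinction drawn above between targets recovered many times and targets recovered once or a few times amounts to a simple tally of the sequenced inserts. The Python sketch below illustrates one way such a tally might be made. The function name, the insert identifiers and the cutoff of three recoveries are all hypothetical choices for illustration, since the text does not specify a threshold.

```python
from collections import Counter

def tally_inserts(insert_ids, min_repeats=3):
    """Split sequenced inserts into recurrent and rarely recovered groups.

    insert_ids: one identifier per sequenced colony (e.g. best protein match).
    min_repeats: hypothetical cutoff for calling a target recurrent.
    """
    counts = Counter(insert_ids)
    recurrent = {k: v for k, v in counts.items() if v >= min_repeats}
    rare = {k: v for k, v in counts.items() if v < min_repeats}
    return recurrent, rare

# Hypothetical identifiers standing in for sequenced target inserts
colonies = ["target_A"] * 5 + ["target_B"] * 3 + ["target_C", "target_D", "target_D"]
recurrent, rare = tally_inserts(colonies)

print("recurrent:", recurrent)
print("rare:", rare)
```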
Suspected protein-protein interactions will be validated by additional in vitro and in vivo studies. Simpler assays to confirm interactions will be performed during Phase I if time permits. Assays such as affinity pull-downs and far-westerns will be performed as additional reagents such as antibodies and recombinant proteins are made. Generation of recombinant proteins for both the bait protein (EG307) as well as putative interacting proteins will be done as necessary as epitope tagged fusion proteins (GST, myc, V5, biotin tags).
Additional evidence for relevant in vivo interactions, such as fluorescence energy transfer between two appropriately constructed green fluorescent protein fusions, may be necessary to definitively prove an in vivo interaction and also measure factors that might influence that interaction.
However, such experiments are clearly beyond the scope of Phase I.
It is possible that the YTH hunt will identify that the bait protein itself binds to the GAL4 transcription activation sequence and causes activation of the reporter systems. If this occurs, two bait constructs expressing the two halves of the EG307 protein independently will be constructed. These constructs would be tested for direct activation of GAL4 reporters in YRG-2. If negative for direct activation, the cDNA target library would be rescreened for bait-target interactions.
If no interacting proteins are identified from the commercial pre-made YTH
libraries, it is possible that these libraries are of poor quality or that the GAL4 YTH
system is not sensitive enough to identify the actual interacting proteins. In either case, having alternative libraries constructed in the interaction-trap system (LexA) can be considered.
These libraries would then serve as a basis for further characterization of any other unknown candidate proteins.
Example 20. Direct isolation and identification of interacting proteins using physical methods.
To directly isolate interacting proteins from plant tissues, affinity isolation of proteins present in various solubilized plant tissues will be performed. Isolated interacting proteins will be subjected to limited proteolysis and the resulting peptide fragments will be analyzed by mass spectral analysis to identify whether peptide fragment patterns indicative of known or predicted proteins are produced.
The EG307 protein will be cloned into a bacterial expression system to generate a GST-EG307 fusion protein using Pharmacia's pGEX-5X-1 (Amersham Pharmacia).
Bacterial lysates of the cultures where expression was induced by IPTG will be used to purify the fusion protein on glutathione sepharose beads. The soluble protein will be eluted from the beads by competition for binding to the solid phase by passage of free glutathione over the column. The recombinant GST-EG307 can then be used as an affinity ligand when recoupled to fresh glutathione sepharose beads. Alternatively, free EG307 protein can also be obtained by removal of the GST domain by treatment of the fusion protein with Factor Xa. There are no Factor Xa protease sites in the predicted EG307 protein.
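Whether a protein contains an internal Factor Xa site can be checked by scanning its sequence for the protease's preferred recognition sequence, Ile-(Glu or Asp)-Gly-Arg, with cleavage occurring after the arginine. The Python sketch below illustrates such a scan. The function name and the toy sequences are hypothetical and are not EG307 sequence data, and the pattern covers only the canonical site, whereas Factor Xa can occasionally cleave at other sites.

```python
import re

# Canonical Factor Xa recognition sequence: Ile-(Glu/Asp)-Gly-Arg
FACTOR_XA_SITE = re.compile(r"I[ED]GR")

def factor_xa_sites(protein: str) -> list:
    """Return 1-based start positions of canonical Factor Xa sites in a protein."""
    return [m.start() + 1 for m in FACTOR_XA_SITE.finditer(protein)]

# Hypothetical toy sequences, used only to exercise the scan
with_site = "MAIEGRSKLT"
without_site = "MASKLTGGAR"

print(factor_xa_sites(with_site))
print(factor_xa_sites(without_site))
```

An empty result for the protein of interest means that Factor Xa treatment would be expected to cut only at the engineered site between GST and the fused protein.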
Plant cell lysates will be prepared from a large amount of O. sativa seedlings (200-300 grams) by standard differential tissue disruption and clarification techniques. To isolate cytosolic proteins, the cells will be disrupted by mechanical shearing using a polytron tissue homogenizer and sonicator while keeping the tissue lysate cold in the presence of a protease inhibitor cocktail. The soluble cytosolic lysates will be clarified by differential centrifugation at low speed to remove debris, followed by high speed centrifugation to remove insoluble aggregated protein. To isolate proteins in the insoluble fraction, various detergents such as NP-40, Brij 35, and deoxycholate will be used to solubilize membrane-bound proteins from the insoluble membrane fractions. Insoluble material will be removed by centrifugation.
The plant cell lysates will be individually passed over the glutathione-sepharose:GST-EG307 beads. The beads will be washed with buffers of various ionic strengths to remove weakly bound protein. Bound proteins will then be eluted with low pH, high salt or denaturing detergents. The purified proteins will be run on an SDS-PAGE gel and the bands will be stained with a mass spectroscopy-compatible silver staining reagent or Coomassie blue. Bands of interest will be cut out of the gels and frozen for later analysis. These proteins will be sent to a mass spectroscopy facility for limited proteolysis followed by mass spectroscopy to determine the proteolytic peptide signature and identity of the interacting protein.
This technique should allow for the identification of the interacting proteins as long as the interaction is specific and of high enough affinity to ensure tight binding between the EG307 protein and the potential interacting protein. These data would then allow for the identification of the interacting protein if that protein is homologous to other proteins. It is clearly possible that no proteins will be identified by this method because of a lack of affinity for the EG307 protein. Alternatively, it is possible that no interacting proteins are present in the lysates generated by the methods outlined above.
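The idea of a "proteolytic peptide signature" can be illustrated with an in silico digest. The sketch below assumes trypsin (cleavage after Lys or Arg, except before Pro) and standard monoisotopic residue masses. The text does not name the protease, so trypsin is an assumption, and all function names are hypothetical. The sample input is the first 16 residues of SEQ ID NO:3 from the sequence listing.

```python
import re

# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide

def tryptic_peptides(protein, min_len=6):
    """Split after K or R unless the next residue is P; drop short peptides."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [p for p in pieces if len(p) >= min_len]

def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

def fingerprint(protein, min_len=6):
    """Sorted list of predicted peptide masses, i.e. the expected signature."""
    return sorted(round(peptide_mass(p), 4) for p in tryptic_peptides(protein, min_len))

example = "MSRCFPYPPPGYVRNP"  # first 16 residues of SEQ ID NO:3
for pep in tryptic_peptides(example):
    print(pep, round(peptide_mass(pep), 4))
```

In practice the observed masses from an excised band would be compared against such predicted lists for every candidate protein in a database, which is why identification depends on the interacting protein being known or homologous to a known protein.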
If too many proteins appear to bind to the glutathione-sepharose:GST-EG307 beads, it is possible that those proteins are binding non-specifically to the sepharose, the glutathione, the glutathione S-transferase (GST) domain, or an artificial epitope generated by creating the N-terminal GST fusion with EG307. To eliminate some of these non-specific interactions, lysates will be pre-cleared with sepharose beads alone, with glutathione-sepharose, and with an irrelevant GST-fusion protein coupled to glutathione beads. If the non-specific bands remain following the pre-clearance steps, more stringent washing and binding conditions, such as higher or lower salt, increased or decreased pH, or the addition of non-ionic detergents such as Tween-20, will be employed to restrict the proteins that bind to this bait protein.
Example 21. Determine the function of gene candidate EG1117 coding for a novel protein with a putative peptide transport function.
Since the EG1117 gene likely encodes a peptide transport protein, based upon the in silico homology data suggesting it is a member of the PTR-2 family of proteins, experiments that directly assess this particular function will be carried out.
Two complementary, but independent, approaches will be used. First, a method of examining heterologous peptide transport protein functions in yeast will be used to examine the ability of both the domesticated and ancestral forms of EG1117 to transport peptides across the cell membrane and complement auxotrophic amino acid requirements in PTR-2 deletion mutants of yeast. Second, the growth characteristics of both the domesticated and ancestral species of rice seedlings in the presence of toxic ethionine-containing peptides will be used to correlate the potential peptide transport protein mutations with a measurable phenotypic difference between the domesticated and ancestral species of rice.
Complementation analysis of heterologous peptide transport functions in auxotrophic yeast.
For these studies, we will take advantage of the previously described methods used to identify novel Arabidopsis peptide transport proteins using specifically designed auxotrophic mutant yeast strains that also carried a mutation in their ability to transport di-/tripeptides (Steiner, et al., 1994). This heterologous system demonstrated that plant peptide transport proteins could be cloned into yeast cells and that the function of the recombinantly expressed protein could be measured by the complementation of the auxotrophic amino acid requirements of the yeast strain. This is a simple but powerful assay that will quickly yield information about the function of both forms of EG1117.

The parental yeast strain BY4742 [MATα, his3-, leu2-, lys2-, ura3-] and the ORF deletion mutant in which the PTR-2 gene is deleted, designated BY4742-ptr2, from the "complete yeast deletion array collection", will be used. Both strains are available from ATCC. The domesticated and ancestral forms of EG1117 will be cloned into the pYES2.1-TOPO-TA vector (Invitrogen, Inc.), which allows for the recombinant expression of the putative PTR-2 proteins in the BY4742-ptr2 auxotrophic strain of yeast.
Selection of transfectants will be performed on plates lacking uracil. Plasmid DNA will be reisolated from the transfectants and analyzed with EG1117-specific primers to confirm that the strain carries the appropriate plasmid. Protein expression is controlled by the presence of galactose in the media. Transfectants will be grown in media containing galactose, and EG1117 protein expression will be monitored by western blot analysis of the C-terminal V5 epitope tag added by the vector. The following shows peptides used in complementation and root growth assays:
Normal peptides      Toxic peptides
Met-Leu              Eth-Leu
Met-Leu-Gly          Eth-Leu-Gly
Met-Leu-(Gly)2       Eth-Leu-(Gly)2
Met-Leu-(Gly)3       Eth-Leu-(Gly)3

Eth = ethionine, a toxic derivative of methionine. All selection for auxotrophic phenotype will be done on plates lacking leucine.
To test whether EG1117 codes for a peptide transport protein, the peptides listed above will be synthesized and purified commercially. Since each peptide carries a leucine, the leucine auxotrophic strain BY4742-ptr2 should be able to grow if it is transfected with a functional peptide transport protein, grown in the presence of galactose, and thereby able to transport the peptide into the cell.
A second assay that will be performed is an inhibition assay. In this case, the BY4742-ptr2 EG1117 transfectants, as well as the BY4742 parental and BY4742-ptr2 deletion mutants as controls, will be plated as a lawn on YPG (yeast extract, peptone, galactose) plates, and the toxic ethionine-peptide derivatives will be spotted onto membrane discs and placed on the yeast lawns (Steiner, et al., 1994). Zones of clearing around the disc would then indicate that the yeast expressed a functional transport protein that allowed the yeast to transport the toxic peptide into the cell, killing the cell.
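The expected readouts of the two yeast assays follow from a simple rule: a strain responds to a peptide only if it can import it. The sketch below encodes that reasoning. The `Strain` class and function names are hypothetical, and the assumption that the plasmid-borne transporter is functional is exactly the hypothesis under test, not a result.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strain:
    name: str
    native_ptr2: bool          # endogenous yeast PTR-2 transporter present
    plasmid_transporter: bool  # assumed outcome: plasmid encodes a functional transporter

def imports_peptides(strain, galactose=True):
    """Plasmid expression is galactose-inducible; the native gene is not."""
    return strain.native_ptr2 or (strain.plasmid_transporter and galactose)

def is_toxic(peptide):
    """Ethionine-containing peptides are the toxic series."""
    return peptide.startswith("Eth-")

def grows_without_leucine(strain, peptide, galactose=True):
    """A leucine auxotroph grows only if it imports a non-toxic Leu-containing peptide."""
    carries_leu = "Leu" in peptide.split("-")
    return imports_peptides(strain, galactose) and carries_leu and not is_toxic(peptide)

def halo_expected(strain, peptide, galactose=True):
    """A zone of clearing is expected only if a toxic peptide is imported."""
    return imports_peptides(strain, galactose) and is_toxic(peptide)

parental = Strain("BY4742", native_ptr2=True, plasmid_transporter=False)
deletion = Strain("BY4742-ptr2", native_ptr2=False, plasmid_transporter=False)
transformant = Strain("BY4742-ptr2 + EG1117", native_ptr2=False, plasmid_transporter=True)

for s in (parental, deletion, transformant):
    print(s.name, grows_without_leucine(s, "Met-Leu"), halo_expected(s, "Eth-Leu"))
```

Under these assumptions the deletion mutant is the negative control in both assays, and a transformant that matches the parental pattern only when galactose is present would support a transport function for the cloned gene.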
If both the domesticated and ancestral forms of the EG1117 protein complement the amino acid auxotrophic mutations, suggesting that both are functional transport proteins, the following experiments will be performed. To assess the pH or cation dependence, the pH or ionic strength of the plating media will be varied, and the ability of the transfectants carrying the EG1117 proteins to grow in the presence of complementary peptides or die in the presence of the toxic peptides will be determined. Likewise, to assess potential differences in affinity for peptide, the dose-response effects of a specific peptide, or a toxic variant of that peptide, on growth of the yeast transformants will be assessed.
It is very likely that we will find that EG1117 indeed codes for a peptide transport protein. It is also likely that the two forms of the protein will display a measurable difference in this function; perhaps a change in specificity/selectivity, pH optimum or affinity will be evident. Although non-synonymous changes that shift the amino acid sequence from acidic to more basic characteristics are present, it is possible that these alterations are in an unimportant region of the protein. Alternatively, it is possible that these changes do not alter the 3-dimensional structure of the protein sufficiently to alter its function.
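The distinction between synonymous and non-synonymous changes can be made concrete by comparing two aligned coding sequences codon by codon. The following is a minimal sketch using the standard genetic code. The function name is hypothetical, and the input is a 48-base fragment (bases 97-144) read from the Azucena and Teqing coding sequences in the sequence listing (SEQ ID NO:2 and SEQ ID NO:8), used only as convenient input without assuming these sequences are EG1117.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def codon_differences(cds_a, cds_b):
    """List codon positions (1-based) where two aligned, in-frame CDSs differ."""
    a, b = cds_a.upper(), cds_b.upper()
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("sequences must be aligned, equal-length, and in frame")
    out = []
    for i in range(0, len(a), 3):
        ca, cb = a[i:i + 3], b[i:i + 3]
        if ca != cb:
            aa_a, aa_b = CODON_TABLE[ca], CODON_TABLE[cb]
            kind = "synonymous" if aa_a == aa_b else "nonsynonymous"
            out.append((i // 3 + 1, ca, cb, aa_a, aa_b, kind))
    return out

# Codons are written space-separated for readability and joined before use.
azucena = "".join("aaa gaa agg gaa aag gct gaa aag aag aaa gag aaa agg agt gac agg".split())
teqing = "".join("aaa gaa agg gaa aag gcc gaa aag aag aaa gag aaa aag agt gac agg".split())

for diff in codon_differences(azucena, teqing):
    print(diff)
```

In this fragment one difference leaves the encoded alanine unchanged and the other replaces arginine with lysine. Counting such changes across a full alignment is only the raw input to a formal Ka/Ks estimate, which additionally normalizes by the number of synonymous and non-synonymous sites.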
It is unlikely that these proteins do not transport peptides; however, it is possible that the EG1117 protein is a transport protein for some other substrate. In this case, in the absence of any evidence that EG1117 transports peptides, it may be possible to use the same system, or at least the transfected yeast, to sort out some of these other functions. For example, testing for monosaccharide or polysaccharide transport ability would be possible in the appropriate auxotrophic strains. Alternatively, other yeast deletion mutants for targeted transport functions could be mated with the existing transfectants. In this case, growth of the mated yeast on the selection plates would be indicative of complementation of that particular deletion mutation. Using this strategy, it would be possible to scan a large number of the different yeast deletion mutants that are publicly available (Brachmann, et al. (1998) Yeast 14:115-132).
Example 22. Differential sensitivity of ancestral and domesticated rice seedlings to ethionine-containing toxic peptides.
These studies represent an attempt to directly demonstrate in vivo that the EG1117 protein functions differently in the domesticated and ancestral strains of rice. The ability to transport toxic peptides results in death of the cells that take up the toxic peptides. A lack of function might be expressed as resistance to the effects of the toxic peptide on the continued growth of the seedling roots. A lack of a phenotypic difference would suggest either that EG1117 has not been expressed, or that other transport proteins compensate for the altered function of the selected EG1117 proteins.
A modification of the method used for Arabidopsis will be used in these studies (Steiner, et al., 1994). Rice seeds from O. sativa and O. rufipogon will be sprouted, and the seedlings will then be allowed to continue to grow on rice media in a dark, moist container. The seedlings will be exposed to discs impregnated with ethionine-containing toxic peptides or, as a control, non-toxic peptides. Initial experiments will focus on determining whether dramatic differences in sensitivity to the toxic peptides exist. If no dramatic differences are observed, a larger experiment using a dose range of toxic peptides will be performed to determine whether a difference in sensitivity to the toxic peptide dose exists. Such a difference would be suggestive of a difference in functional activity of the peptide transporters present in the two strains of rice.
The results from this set of studies will depend upon whether there are other peptide transport mechanisms that compensate for the hypothesized differences in the encoded protein. As long as EG1117 is the primary peptide transport protein used by rice or has a unique function that is critical to the growth of the rice plant, any differences in EG1117's function between the domesticated and ancestral forms should be manifested by differences in susceptibility to the toxic peptides. Similar experiments in Arabidopsis successfully demonstrated that a single PTR-2 protein was part of a single peptide transport system (Steiner, et al., 1994). Therefore, mutants of the single peptide transporter yielded dramatic results on growth inhibition. In particular, these studies revealed that a deletion of the PTR-2 protein, or a lack of PTR-2 protein expression due to a developmental block in PTR-2 expression in the early embryo, resulted in resistance of the plant to the presence of toxic peptides in the surrounding media. It is likely that rice seedling growth will be similar and will readily reveal any differences in function of EG1117 through alterations in rice seedling growth.

SEQUENCE LISTING
<110> Evolutionary Genomics LLC
<120> Methods to Identify Evolutionarily Significant Changes in Polynucleotide and Polypeptide Sequences in Domesticated Plants
<130> GEN0200.1.6/PCT
<150> US 60/349,088
<151> 2002-01-16
<150> US 60/349,661
<151> 2002-01-17
<150> US 60/368,541
<151> 2002-03-29
<150> US 10/079,042
<151> 2002-02-19
<160> 145
<170> PatentIn version 3.2
<210> 1
<211> 2441
<212> DNA
<213> Oryza sativa cv. Azucena
<400> 1
ccatgtcgaggtgcttcccctacccgccgccggggtacgtgcgaaacccagtggtggccg 60 tggccgcggccgaagcgcaggcgaccactaaggtttgttgaaccatcggatttacacacg 120 cacgtgccggatcatttgctcttgcctgttggttttgatcggatctgttggttgtgcgtg 180 tgtgatttggggatcgcacgtgcggggaagctaacctttgcatggataacttgagatttg 240 tgaggccgcgcttcgaccagatcggtcgccaatcttttagtggctgaccgtggaaagagg 300 atattactgaccttcggtttgctaattttggttgtgccgttgaatctgaaataaccagaa 360 tagtcatggggaaaaaagtctgatctggaaggttcgaattacatttctatatattgttgt 420 gctcccagacgatggttgcaagaaatcactcatgctggataaaattgtggatgtaagagt 480 ctgcagtcgttaaaatctggaaacagcacattttgccgtagtaaatttgaatccatgttg 540 ctgtctcgttattggtgtgttacgagtaacctgtgtgttgttatctccgcttggactaga 600 ttccaagtaatccagtgccttcatgacctgcaaattctatgcctatgaagtaacatgaac 660 agtttgtatgtatgtattctgttgatgcatacttgcattatttgtgagatgtacatgttg 720 tggtaaaattttgcattcaccatatagaaatagtaactgactatccttgtttagttcgaa 780 aactactgca ggtttagtta ttctctgttg ccaagagtgc ttgttatgat tgtaagggtt 840 acagttctgt gactaaccat gtaacaaata tattaaggat tatcaaatta ttctatgtga 900 agtgtccgtgccctaattgtgttatcttctgtaactgatagcacaacatttgtttcctgc960 tgtgtgcttgtgtaaattggtacttcatcattactatatatttcaaagaaaattctgcat1020 tgcattcccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctcca1080 gaaagaaagggaaaaggctgaaaagaagaaagagaaaaggagtgacaggaaagctcttcc1140 acatggtgagatatccaagcattcaaagcgaacccaccacaagaagagaaaacatgaaga1200 catcaataatgctgatcagaagtcccggaaggtttcctccatggaacctggtgagcaatt1260 ggagaagagtggactctcagaagagcatggagctccttgctttactcagacagagcatgg1320 ctctccagagagttcacaggacagcagcaagagaagaaaggttgtgttacccagtcctag1380 ccaagctaagaatggtgaggccctttcttgcatttgtcttcttttagctggtgatgttga1440 attggtttgacttatcctgaattatcatcttgcaggtaacatccttcgaataaagataag1500 aagagatcaagattcttcagcttccctttcggagaaatctaatgttgtacaaacaccagt1560 tcatcaaatgggatcagtttcatctctgccaagtaagaaaaactcaatgcaaccacacaa1620 caccgaaatgatggtgagaacagcatcaacccagcagcaaagcatcaaaggtgattttca1680 agcagtaccgaaacaaggtatgccaaccccagcaaaagtcatgccaagagtcgatgttcc1740 tccatctatgagggcatcaaaggaaaggattggccttcgtcctgcagagatgttggccaa1800 
tgttggtccttcaccctccaaggcaaaacagattgtcaatcctgcagctgctaaggttac1860 acaaagagttgatcctccacctgccaaggcatctcagagaattgatcctctgttgccatc1920 caaggttcatatagatgctactcgatcttttacgaaggtctcccagacagagatcaagcc1980 ggaagtacagcccccaattctgaaggtgcctgtggctatgcctaccatcaatcgtcagca2040 gattgacacctcgcagcccaaagaagagccttgctcctctggcaggaatgctgaagctgc2100 ttcagtatcagtagagaagcagtccaagtcagatcgcaaaaagagccgcaaggctgagaa2160 gaaagagaagaagttcaaagatttatttgttacctgggatcctccgtctatggaaatgga2220 tgatatggatctcggggaccaggattggctgcttgatagtacgaggaaacctgatgctgg2280 cattggcaactgcagagaaattgttgatccacttacttctcaatcagcagagcagttctc2340 attgcagcctagggcgattcatttaccagaccttcatgtctatcagttgccatatgtggt2400 tccattctaggtttgtgtagtgagatggagtaggtgagaag 2441 <210> 2 <211> 1344 <212> DNA
<213> Oryza sativa cv. Azucena <220>

<221> CDS
<222> (1)..(1344) <400> 2 atgtcg aggtgcttc ccctacccg ccgccgggg tacgtgcga aaccca 48 MetSer ArgCysPhe ProTyrPro ProProGly TyrValArg AsnPro gtggtg gccgtggcc gcggccgaa gcgcaggcg accactaag ctccag 96 ValVal AlaValAla AlaAlaGlu AlaGlnAla ThrThrLys LeuGln aaagaa agggaaaag getgaaaag aagaaagag aaaaggagt gacagg 144 LysGlu ArgGluLys AlaGluLys LysLysGlu LysArgSer AspArg aaaget cttccacat ggtgagata tccaagcat tcaaagcga acccac 192 LysAla LeuProHis GlyGluIle SerLysHis SerLysArg ThrHis cacaag aagagaaaa catgaagac atcaataat getgatcag aagtcc 240 HisLys LysArgLys HisGluAsp IleAsnAsn AlaAspGln LysSer cggaag gtttcctcc atggaacct ggtgagcaa ttggagaag agtgga 288 ArgLys ValSerSer MetGluPro GlyGluG1n LeuGluLys SerGly ctctca gaagagcat ggagetcct tgctttact cagacagag catggc 336 LeuSer GluGluHis GlyAlaPro CysPheThr GlnThrGlu HisGly tctcca gagagttca caggacagc agcaagaga agaaaggtt gtgtta 384 SerPro GluSerSer GlnAspSer SerLysArg ArgLysVal ValLeu cccagt cctagccaa getaagaat ggtaacatc cttcgaata aagata 432 ProSer ProSerGln AlaLysAsn G1yAsnIle LeuArgIle LysIle agaaga gatcaagat tcttcaget tccctttcg gagaaatct aatgtt 480 ArgArg AspGlnAsp SerSerAla SerLeuSer G1uLysSer AsnVal gtacaa acaccagtt catcaaatg ggatcagtt tcatctctg ccaagt 528 ValGln ThrProVal HisGlnMet GlySerVal SerSerLeu ProSer aagaaa aactcaatg caaccacac aacaccgaa atgatggtg agaaca 576 LysLys AsnSerMet GlnProHis AsnThrGlu MetMetVal ArgThr gcatca acccagcag caaagcatc aaaggtgat tttcaagca gtaccg 624 AlaSer ThrGlnGln GlnSerIle LysGlyAsp PheGlnAla ValPro aaacaa ggtatgcca accccagca aaagtcatg ccaagagtc gatgtt 672 LysGln GlyMetPro ThrProAla LysValMet ProArgVal AspVal cctcca tctatgagg gcatcaaag gaaaggatt ggccttcgt cctgca 720 Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Ile Gly Leu Arg Pro Ala gagatgttggcc aatgttggt ccttcaccc tccaaggca aaacag att 768 GluMetLeuAla AsnValGly ProSerPro SerLysAla LysGln Ile gtcaatcctgca getgetaag gttacacaa agagttgat cctcca cct 816 ValAsnProAla AlaAlaLys ValThrGln ArgValAsp ProPro Pro gccaaggcatct cagagaatt gatcctctg ttgccatcc aaggtt cat 864 
AlaLysAlaSer GlnArgIle AspProLeu LeuProSer LysVal His atagatgetact cgatctttt acgaaggtc tcccagaca gagatc aag 912 IleAspAlaThr ArgSerPhe ThrLysVal SerGlnThr GluIle Lys ccggaagtacag cccccaatt ctgaaggtg cctgtgget atgcct acc 960 ProGluValGln ProProIle LeuLysVal ProValAla MetPro Thr atcaatcgtcag cagattgac acctcgcag cccaaagaa gagcct tgc 1008 IleAsnArgGln GlnIleAsp ThrSerGln ProLysGlu GluPro Cys tcctctggcagg aatgetgaa getgettca gtatcagta gagaag cag 1056 SerSerGlyArg AsnAlaGlu AlaAlaSer ValSerVal GluLys Gln tccaagtcagat cgcaaaaag agccgcaag getgagaag aaagag aag 1104 SerLysSexAsp ArgLysLys SerArgLys AlaGluLys LysGlu Lys aagttcaaagat ttatttgtt acctgggat cctccgtct atggaa atg 1152 LysPheLysAsp LeuPheVa1 ThrTrpAsp ProProSer MetGlu Met gatgatatggat ctcggggac caggattgg ctgcttgat agtacg agg 1200 AspAspMetAsp LeuGlyAsp GlnAspTrp LeuLeuAsp SerThr Arg aaacctgatget ggcattggc aactgcaga gaaattgtt gatcca ctt 1248 LysProAspAla GlyIleGly AsnCysArg GluIleVal AspPro Leu acttctcaatca gcagagcag ttctcattg cagcctagg gcgatt cat 1296 ThrSerGlnSer AlaGluGln PheSerLeu GlnProArg AlaTle His ttaccagacctt catgtctat cagttgcca tatgtggtt ccattc tag 1344 LeuProAspLeu HisValTyr GlnLeuPro TyrValVal ProPhe <210> 3 <211> 447 <212> PRT
<213> Oryza sativa cv. Azucena <400> 3 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro l 5 10 15 Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His His Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp Gln Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Glu His Gly 100 105 1l0 Ser Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Va1 Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg Ile Lys Ile Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Val Val Gln Thr Pro Val His Gln Met Gly Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr Glu Met Met Val Arg Thr Ala Ser Thr Gln Gln Gln Ser Ile Lys Gly Asp Phe Gln Ala Val Pro Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Ile Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Va1 Thr Gln Arg Val Asp Pro Pro Pro Ala Lys A1a Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp Ala Thr Arg Ser Phe Thr Lys Val Ser Gln Thr Glu Ile Lys Pro Glu Val Gln Pro Pro Ile Leu Lys Val Pro Val Ala Met Pro Thr Ile Asn Arg Gln Gln Tle Asp Thr Ser Gln Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln 340 345 35_0 Ser Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala Glu Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Asp Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 4 <211> 126 <212> DNA
<213> Oryza sativa cv. Nipponbare <400> 4 gggggtgagc ttaggccgga cgccggggca tcagccatgt cgaggtgctt cccctacccg 60 ccgccggggt acgtgcgaaa cccagtggtg gccgtggccg cggccgaagc gcaggcgacc 120 actaag 12~
<210> 5 <211> 1344 <212> DNA
<213> Oryza sativa cv. Nipponbare <220>
<221> CDS
<222> (1)..(1344) <400> 5 atgtcgaggtgc ttcccc tacccgccg ccggggtac gtgcgaaac cca 48 MetSerArgCys PhePro TyrProPro ProGlyTyr ValArgAsn Pro gtggtggccgtg gccgcg gccgaagcg caggcgacc actaagctc cag 96 ValValA1aVa1 AlaAla AlaGluAla GlnA1aThr ThrLysLeu Gln aaagaaagggaa aagget gaaaagaag aaagagaaa aggagtgac agg 144 LysGluArgGlu LysAla GluLysLys LysGluLys ArgSerAsp Arg aaagetcttcca catggt gagatatcc aagcattca aagcgaacc cac 192 LysAlaLeuPro HisGly GluI1eSer LysHisSer LysArgThr His cacaagaagaga aaacat gaagacatc aataatget gatcagaag tcc 240 HisLysLysArg LysHis GluAspTle AsnAsnAla AspGlnLys Ser cggaaggtttcc tccatg gaacctggt gagcaattg gagaagagt gga 288 ArgLysValSer SerMet GluProGly GluG1nLeu GluLysSer Gly ctctcagaagag catgga getccttgc tttactcag acagagcat ggc 336 LeuSerGluGlu HisG1y AlaProCys PheThrG1n ThrGluHis Gly totccagagagt tcacag gacagcagc aagagaaga aaggttgtg tta 384 SerProGluSer SerGln AspSerSer LysArgArg LysValVal Leu cccagtcctagc caaget aagaatggt aacatcctt cgaataaag ata 432 ProSerProSer GlnAla LysAsnGly AsnIleLeu ArgIleLys I1e agaagagatcaa gattct tcagettcc ctttcggag aaatctaat gtt 480 ArgArgAspGln AspSer SerAlaSer LeuSerGlu LysSerAsn Val gtacaaacacca gttcat caaatggga tcagtttca tctctgcca agt 528 Va1GlnThrPro ValHis GlnMetGly SerValSer SerLeuPro Ser aagaaa aactcaatg caaccacac aacacc gaaatgatg gtgagaaca 576 LysLys AsnSerMet GlnProHis AsnThr GluMetMet ValArgThr gcatca acccagcag caaagcatc aaaggt gattttcaa gcagtaccg 624 AlaSer ThrGlnGln GlnSerIle LysGly AspPheGln AlaValPro aaacaa ggtatgcca accccagca aaagtc atgccaaga gtcgatgtt 672 LysGln GlyMetPro ThrProAla LysVal MetProArg ValAspVal cctcca tctatgagg gcatcaaag gaaagg attggcctt cgtcctgca 720 ProPro SerMetArg AlaSerLys GluArg IleGlyLeu ArgProAla gagatg ttggccaat gttggtcct tcaccc tccaaggca aaacagatt 768 GluMet LeuAlaAsn ValGlyPro SerPro SerLysAla LysGlnIle gtcaat cctgcaget getaaggtt acacaa agagttgat cctccacct 816 ValAsn ProAlaAla AlaLysVal ThrGln ArgValAsp ProProPro gccaag gcatctcag agaattgat cctctg ttgccatcc aaggttcat 864 AlaLys 
AlaSerGln ArgIleAsp ProLeu LeuProSer LysValHis atagat getactcga tcttttacg aaggtc tcccagaca gagatcaag 912 IleAsp AlaThrArg SerPheThr LysVal SerGlnThr G1uIleLys ccggaa gtacagccc ccaattctg aaggtg cctgtgget atgcctacc 960 ProGlu ValGlnPro ProIleLeu LysVal ProValAla M.etProThr atcaat cgtcagcag attgacacc tcgcag cccaaagaa gagccttgc 1008 21eAsn ArgGlnGln IleAspThr SerGln ProLysGlu GluProCys tcctct ggcaggaat getgaaget gettca gtatcagta gagaagcag 1056 SerSer GlyArgAsn AlaGluAla AlaSer ValSerVal GluLysGln tccaag tcagatcgc aaaaagagc cgcaag getgagaag aaagagaag 1104 SerLys SerAspArg LysLysSer ArgLys AlaGluLys LysGluLys aagttc aaagattta tttgttacc tgggat cctccgtct atggaaatg 1152 LysPhe LysAspLeu PheValThr TrpAsp ProProSer MetGluMet gatgat atggatctc ggggaccag gattgg ctgcttgat agtacgagg 1200 AspAsp MetAspLeu GlyAspGln AspTrp LeuLeuAsp SerThrArg aaacct gatgetggc attggcaac tgcaga gaaattgtt gatccactt 1248 LysPro AspAlaGly TleGlyAsn CysArg GluIleVal AspProLeu acttct caatcagca gagcagttc tcattg cagcctagg gcgattcat 1296 Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His tta cca gac ctt cat gtc tat cag ttg cca tat gtg gtt cca ttc tag 1344 Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 6 <211> 447 <212> PRT
<213> Oryza sativa cv. Nipponbare <400> 6 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro 1 5 , 10 15 Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His His Lys Lys Arg Lys His Glu Asp Tle Asn Asn Ala Asp Gln Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser G1u Glu His Gly Ala Pro Cys Phe Thr Gln Thr Glu His Gly Ser Pro Glu Ser Ser G1n Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg Ile Lys Ile Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu 5er Glu Lys Ser Asn Val Val Gln Thr Pro Val His Gln Met Gly Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr Glu Met Met Val Arg Thr A1a Ser Thr Gln Gln G1n Ser Ile Lys Gly Asp Phe G1n A1a Val Pro Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Ile Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Val Thr Gln Arg Val Asp Pro Pro Pro Ala Lys Ala Ser G1n Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp Ala Thr Arg Ser Phe Thr Lys Val Ser Gln Thr Glu Ile Lys Pro Glu Val Gln Pro Pro Ile Leu Lys Val Pro Va1 Ala Met Pro Thr Ile Asn Arg G1n Gln Ile Asp Thr Ser Gln Pro Lys G1u Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala G1u Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Asp Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His lU

Leu Pro Asp Leu His Val Tyr G1n Leu Pro Tyr Val Val Pro Phe 435 ' 440 445 <210> 7 <211> 2461 <212> DNA
<213> Oryza sativa cv. Teqing <400> 7
gcggacgcgggacatcagccatgtcgaggtgcttcccctacccgccgccggggtacgtgc60 gaaacccagtggtggccgtggccgcggccgaagcgcaggcgaccactaaggtttgttgaa120 ccatcggatttacacacgcacgtgccggatcatttgctcttgcctgttggttttgatcgg180 atctgttggttgtgcgtgtgtgatttggggatcgcacgtgcggggaagctaacctttgca240 tggataacttgagatttgtgaggccgcgcttcgaccagatcggtcgccaatcttttagtg300 gctgaccgtggaaagaggatattactgaccttcggtttgctaattttggttgtgccgttg360 aatctgaaataaccagaatagtcatggggaaaaaagtctgatctggaaggttcgaattac420 atttctatatattgttgtgctcccagacgatggttgcaagaaattactcatgctggataa480 aattgtggatgtaagagtctgcagttgttaaaatctggaaacagcacattttgccgtagt540 aaatttgaatccatgttgctgtctcgttattggtgtgttacgagtaacctgtgtgttgtt600 atctccgcttggactagattccaagtaatccagtgccttcatgacctgcaaattctatgc660 ctatgaagtaacatgaacagtttgtatgtattctgttgatgcatacttgcattatttgtg720 agatgtacatgttgtggtaaaattttgcattcaccatatagaaatagtaactgactatcc780 ttgtttagttcgaaaactactgcaggtttagttattctctgttgccaagagtgcttgtta840 tgattgtaagggttacagttctgtgactaaccatgtaacaaatatattaaggattatcaa900 attattctatgtgaagtgtccgtgccctaattgtgttatcttctgtaactgatagcacaa960 catttgtttcctgctgtgtgcttgtgtaaattggtacttcatcattactatatatttcaa1020 agaaaattctgcattgcattcccgtcgtccgttctaaatcagaactgacgattgctctgg1080 tggctgaagctccagaaagaaagggaaaaggccgaaaagaagaaagagaaaaagagtgac1140 aggaaagctcttccacatggtgagatatccaagcattcaaagcgaacccacaagaagaga1200 aaacatgaagacatcaataatgctgatcagaagtcccggaaggtttcctccatggaacct1260 ggtgagcaattggagaagagtggactctcagaagagcatggagctccttgctttactcag1320 acagtgcatggctctccagagagttcacaggacagcagcaagagaagaaaggttgtgtta1380 cccagtcctagccaagctaagaatggtgaggccctttcttgcatttgtcttcttttagct1440 ggtgatgttgaattggtttgacttatcctgaattatcatcttgcaggtaacatccttcga1500 ataaagataagaagagatcaagattcttcagcttccctttcggagaaatctaatgttgta 1560 caaacaccagttcatcaaatgggatcagtttcatctctgccaagtaagaaaaactcaatg 1620 caaccacacaacaccgaaatgatggtgagaacagcatcaacccagcagcaaagcatcaaa 1680 ggtgattttcaagcagtactgaaacaaggtatgccaaccccagcaaaagtcatgccaaga 1740 gtcgatgttcctccatctatgagggcatcaaaggaaagggttggccttcgtcctgcagag 1800 atgttggccaatgttggtccttcaccatccaaggcaaaacagattgtcaatcctgcagct 
1860 gctaaggttacacaaagagttgatcctccacctgccaaggcatctcagagaattgatcct 1920 ctgttgccatccaaggttcatatagatgctactcgatcttttacgaaggtctcccagaca 1980 gagatcaagccggaagtacagcccccaattccgaaggtgcctgtggctatgcctaccatc 2040 aatcgtcagcagattgacacctcgcagcccaaagaagagccttgctcctctggcaggaat 2100 gctgaagctgcttcagtatcagtagagaagcagtccaagtcagatcgcaaaaagagccgc 2160 aaggctgagaagaaagagaagaagttcaaagatttatttgttacctgggatcctccgtct 2220 atggaaatggatgatatggatcttggggaccaggattggctgcttggtagtacgaggaaa 2280 cctgatgctggcattggcaactgcagagaaattgttgatccacttacttctcaatcagca 2340 gagcagttctcattgcagcctagggcgattcatttaccagaccttcatgtctatcagttg 2400 ccatatgtggttccattctaggtttgtgtagtgagatggagtaggtgagaagtagagaga 2460 t 2461 <210> 8 <211> 1341 <212> DNA
<213> Oryza sativa cv. Teqing <220>
<221> CDS
<222> (1)..(1341) <400> 8 atgtcgaggtgc ttcccctac ccgccg ccggggtacgtg cgaaac cca 48 MetSerArgCys PheProTyr ProPro ProGlyTyrVa1 ArgAsn Pro gtggtggccgtg gccgcggcc gaagcg caggcgaccact aagctc cag 96 ValValAlaVal AlaAlaAla GluAla GlnAlaThrThr LysLeu Gln~

aaagaaagggaa aaggccgaa aagaag aaagagaaaaag agtgac agg 144 LysGluArgGlu LysA1aGlu LysLys LysGluLysLys SerAsp Arg aaagetcttcca catggtgag atatcc aagcattcaaag cgaacc cac 192 LysAlaLeuPro HisGlyGlu IleSer LysHisSerLys ArgThr His aagaag agaaaacat gaagacatc aataatget gatcagaag tcccgg 240 LysLys ArgLysHis GluAspIle AsnAsnAla AspGlnLys SerArg aaggtt tcctccatg gaacctggt gagcaattg gagaagagt ggactc 288 LysVal SerSerMet GluProGly GluGlnLeu GluLysSer GlyLeu tcagaa gagcatgga getccttgc tttactcag acagtgcat ggctct 336 SerGlu GluHisGly AlaProCys PheThrGln ThrValHis GlySer ccagag agttcacag gacagcagc aagagaaga aaggttgtg ttaccc 384 ProGlu SerSerGln AspSerSer LysArgArg LysValVal LeuPro agtcct agccaaget aagaatggt aacatcctt cgaataaag ataaga 432 SerPro SerGlnAla LysAsnGly AsnIleLeu ArgIleLys IleArg agagat caagattct tcagettcc ctttcggag aaatctaat gttgta 480 ArgAsp GlnAspSer SerAlaSer LeuSerGlu LysSerAsn ValVal caaaca ccagttcat caaatggga tcagtttca tctctgcca agtaag 528 GlnThr ProValHis GlnMetGly SerValSer SerLeuPro SerLys aaaaac tcaatgcaa ccacacaac accgaaatg atggtgaga acagca 576 LysAsn SerMetGln ProHisAsn ThrGluMet MetValArg ThrA1a tcaacc cagcagcaa agcatcaaa ggtgatttt caagcagta ctgaaa 624 SerThr GlnGlnGln SerIleLys GlyAspPhe GlnAlaVal LeuLys caaggt atgccaacc ccagcaaaa gtcatgcca agagtcgat gttcct 672 GlnGly MetProThr ProAlaLys ValMetPro ArgVa1Asp ValPro ccatct atgagggca tcaaaggaa agggttggc cttcgtcct gcagag 720 ProSex MetArgAla SerLysGlu ArgValGly LeuArgPro AlaGlu atgttg gccaatgtt ggtccttca ccatccaag gcaaaacag attgtc 768 MetLeu AlaAsnVal GlyProSer ProSerLys AlaLysGln IleVal aatcct gcagetget aaggttaca caaagagtt gatcctcca cctgcc 816 AsnPro AlaAlaAla LysValThr GlnArgVal AspProPro ProAla aaggca tctcagaga attgatcct ctgttgcca tccaaggtt catata 864 LysAla SerGlnArg IleAspPro LeuLeuPro SerLysVal HisIle gatget actcgatct tttacgaag gtctcccag acagagatc aagccg 912 AspAla ThrArgSer PheThrLys ValSerGln ThrGluI1e LysPro gaagtacag ccccca attccgaag gtgcctgtg getatgcct accatc 960 GluValGln ProPro IleProLys ValProVa1 
AlaMetPro ThrIle aatcgtcag cagatt gacacctcg cagcccaaa gaagagcct tgctcc 1008 AsnArgGln GlnIle AspThrSer GlnProLys GluG1uPro CysSer tctggcagg aatget gaagetget tcagtatca gtagagaag cagtcc 1056 SerGlyArg AsnAla GluAlaAla SerValSer ValGluLys GlnSer aagtcagat cgcaaa aagagccgc aaggetgag aagaaagag aagaag 1104 LysSerAsp ArgLys LysSerArg LysAlaGlu LysLysGlu LysLys ttcaaagat ttattt gttacctgg gatcctccg tctatggaa atggat 1152 PheLysAsp LeuPhe ValThrTrp AspProPro SerMetGlu MetAsp gatatggat cttggg gaccaggat tggctgctt ggtagtacg aggaaa 1200 AspMetAsp LeuGly AspGlnAsp TrpLeuLeu GlySerThr ArgLys cctgatget ggcatt ggcaactgc agagaaatt gttgatcca cttact 1248 ProAspAla GlyIle GlyAsnCys ArgGluIle ValAspPro LeuThr tctcaatca gcagag cagttctca ttgcagcct agggcgatt cattta. 1296 SerGlnSer AlaGlu GlnPheSer LeuGlnPro ArgAlaIle HisLeu ccagacctt catgtc tatcagttg ccatatgtg gttccattc tag 1341 ProAspLeu HisVal TyrGlnLeu ProTyrVal ValProPhe <210> 9 <211> 446 <212> PRT

<213> Oryza cv.Teqing sativa <400> 9 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro 1 ~ 5 10 15 Val Val A1a Val Ala Ala Ala G1u Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Lys Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp G1n Lys Ser Arg 14 ' Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Val His Gly Ser Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg I1e Lys Ile Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Val Val l45 l50 155 l60 Gln Thr Pro Val His Gln Met G1y Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr G1u Met Met Val Arg Thr Ala Ser Thr Gln Gln Gln Ser Ile Lys Gly Asp Phe Gln Ala Val Leu Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Val Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Val Thr Gln Arg Val Asp Pro Pro Pro Ala Lys Ala Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Va1 His Ile Asp Ala Thr Arg Ser Phe Thr Lys Val Ser Gln Thr Glu Ile Lys Pro Glu Val Gln Pro Pro Ile Pro Lys Va1 Pro Val Ala Met Pro Thr Ile Asn Arg Gln Gln Ile Asp Thr Ser Gln Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser 340 345 _ 350 Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala Glu Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Va1 Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Gly Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 10 <211> 451 <212> DNA
<213> Oryza sativa cv. Lemont <400> 10 cgccacgcgaaaccaaatcccgccgcgcgggatccttttccgccggattccacccgcgaa60 tcggggttccccttacgattcgcgggcggattagcgcgaggcgcgcctccccctacctct120 gtgtgatccgggggtgaggttaggccggacgccggggcatcagccatgtcgaggtgcttc180 ccctacccgccgccggggtacgtgcgaaacccagtggtggccgtggccgcggccgaagcg240 caggcgaccactaaggtttgttgaaccatcggatttacacacgcacgtgccggatcattt300 gctcttgcctgttggttttgatcggatctgttggttgtgcgtgtgtgatttggggatcgc360 acgtgcggggaagctaacctttgcatggataacttgagatttgtgaggccgcgcttcgac420 cagatcggtcgccaatcttttagtggctgac 451 <210>

11 <211> 1616 <212> DNA <213> Oryza sativa cv. Lemont <400> 11

acaaatatattaaggattatcaaattattctatgtgaagtgtccgtgccctaattgtgtt60 atcttctgtaactgatagcacaacatttgtttcctgctgtgtgcttgtgtaaattggtac120 ttcatcattactatatatttcaaagaaaattctgcattgcattcccgtcgtccgttctaa180 atcagaactgacgattgctctggtggctgaagctccagaaagaaagggaaaaggctgaaa240 agaagaaagagaaaaggagtgacaggaaagctcttccacatggtgagatatccaagcatt300 caaagcgaacccaccacaagaagagaaaacatgaagacatcaataatgctgatcagaagt360 cccggaaggtttcctccatggaacctggtgagcaattggagaagagtggactctcagaag420 agcatggagctccttgctttactcagacagagcatggctctccagagagttcacaggaca480 gcagcaagagaagaaaggttgtgttacccagtcctagccaagctaagaatggtgaggccc540 tttcttgcatttgtcttcttttagctggtgatgttgaattggtttgacttatcctgaatt600 atcatcttgcaggtaacatccttcgaataaagataagaagagatcaagattcttcagctt660 ccctttcggagaaatctaatgttgtacaaacaccagttcatcaaatgggatcagtttcat720 ctctgccaagtaagaaaaactcaatgcaaccacacaacaccgaaatgatggtgagaacag780 catcaacccagcagcaaagcatcaaaggtgattttcaagcagtaccgaaacaaggtatgc840 caaccccagcaaaagtcatgccaagagtcgatgttcctccatctatgagggcatcaaagg900 aaaggattggccttcgtcctgcagagatgttggccaatgttggtccttcaccctccaagg960 caaaacagattgtcaatcctgcagctgctaaggttacacaaagagttgatcctccacctg1020 ccaaggcatctcagagaattgatcctctgttgccatccaaggttcatatagatgctactc1080 gatcttttacgaaggtctcccagacagagatcaagccggaagtacagcccccaattctga1140 aggtgcctgtggctatgcctaccatcaatcgtcagcagattgacacctcgcagcccaaag1200 aagagccttgctcctctggcaggaatgctgaagctgcttcagtatcagtagagaagcagt1260 ccaagtcagatcgcaaaaagagccgcaaggctgagaagaaagagaagaagttcaaagatt1320 tatttgttacctgggatcctccgtctatggaaatggatgatatggatctcggggaccagg1380 attggctgcttgatagtacgaggaaacctgatgctggcattggcaactgcagagaaattg1440 ttgatccacttacttctcaatcagcagagcagttctcattgcagcctagggcgattcatt1500 taccagaccttcatgtctatcagttgccatatgtggttccattctaggtttgtgtagtga1560 gatggagtag gtgagaagta gagagatgtt gggagagagc tgtgtgggtc tgggag 1616 <210> 12 <211> 1344 <212> DNA

<213> Oryza sativa cv. Lemont <220>
<221> CDS
<222> (1)..(1344) <400> 12 atgtcg aggtgcttc ccctacccg ccgccgggg tacgtgcga aaccca 48 MetSer ArgCysPhe ProTyrPro ProProGly TyrValArg AsnPro gtggtg gccgtggcc gcggccgaa gcgcaggcg accactaag ctccag 96 ValVal AlaValAla AlaAlaGlu AlaGlnAla ThrThrLys LeuGln aaagaa agggaaaag getgaaaag aagaaagag aaaaggagt gacagg 144 LysGlu ArgGluLys AlaGluLys LysLysGlu LysArgSer AspArg aaaget cttccacat ggtgagata tccaagcat tcaaagcga acccac 192 LysAla LeuProHis GlyGluIle SerLysHis SerLysArg ThrHis cacaag aagagaaaa catgaagac atcaataat getgatcag aagtcc 240 HisLys LysArgLys HisGluAsp IleAsnAsn AlaAspGln LysSer cggaag gtttcctcc atggaacct ggtgagcaa ttggagaag agtgga 288 ArgLys ValSerSer Met,GluPro GlyGluGln LeuGluLys SerGly ctctca gaagagcat ggagetcct tgctttact cagacagag catggc 336 LeuSer G1uGluHis GlyAlaPro CysPheThr GlnThrGlu HisGly tctcca gagagttca caggacagc agcaagaga agaaaggtt gtgtta 384 SerPro GluSerSer GlnAspSer SerLysArg ArgLysVal ValLeu cccagt cctagccaa getaagaat ggtaacatc cttcgaata aagata 432 ProSer ProSerGln AlaLysAsn GlyAsnIle LeuArgI1e LysTle aga~agagatcaagat tcttcaget tccctttcg gagaaatct aatgtt 480 ArgArg AspGlnAsp SerSerAla SerLeuSer GluLysSer AsnVal gtacaa acaccagtt catcaaatg ggatcagtt tca~tctctg ccaagt 528 ValGln ThrProVal HisGlnMet GlySerVal SerSerLeu ProSer aagaaa aactcaatg caaccacac aacaccgaa atgatggtg agaaca 576 LysLys AsnSerMet GlnProHis AsnThrG1u MetMetVal ArgThr gcatca acccagcag caaagcatc aaaggtgat tttcaagca gtaccg 624 AlaSer ThrGlnGln GlnSerIle LysGlyAsp PheG1nAla ValPro aaa caa ggt atg cca acc cca gca aaa gtc atg cca aga gtc gat gtt 672 LysGln G1yMetPro ThrProAla LysValMet ProArgVal AspVal cctcca tctatgagg gcatcaaag gaaaggatt ggccttcgt cctgca 720 ProPro SerMetArg AlaSerLys GluArgIle GlyLeuArg ProAla gagatg ttggccaat gttggtcct tcaccctcc aaggcaaaa cagatt 768 GluMet LeuAlaAsn ValGlyPro SerProSer LysAlaLys GlnIle gtcaat cctgcaget getaaggtt acacaaaga gttgatcct ccacct 816 ValAsn ProAlaAla AlaLysVal ThrGlnArg ValAspPro ProPro gccaag gcatctcag agaattgat cctctgttg ccatccaag gttcat 864 
AlaLys AlaSerGln ArgIleAsp ProLeuLeu ProSerLys ValHis atagat getactcga tcttttacg aaggtctcc cagacagag atcaag 912 IleAsp AlaThrArg SerPheThr LysValSer GlnThrGlu IleLys ccggaa gtacagccc ccaattctg aaggtgcct gtggetatg cctacc ~ 960 ProGlu ValGlnPro ProIleLeu LysValPro ValAlaMet ProThr atcaat cgtcagcag attgacaoc tcgcagccc aaagaagag ccttgc 1008 IleAsn ArgG1nGln IleAspThr SerGlnPro LysGluGlu ProCys tcctct ggcaggaat getgaaget gettcagta tcagtagag aagcag 1056 SerSer GlyArgAsn AlaGluAla AlaSerVal SerValGlu LysGln tccaag tcagatcgc aaaaagagc cgcaagget gagaagaaa gagaag 1104 SerLys SerAspArg LysLysSer ArgLysAla GluLysLys GluLys aagttc aaagattta tttgttacc tgggatcct ccgtctatg gaaatg 1152 LysPhe LysAspLeu PheValThr TrpAspPro ProSerMet GluMet gatgat atggatctc ggggaccag gattggctg ottgatagt acgagg 1200 AspAsp MetAspLeu GlyAspGln AspTrpLeu LeuAspSer ThrArg aaacct gatgetggc attggcaac tgcagagaa attgttgat ccactt 1248 LysPro AspAlaGly IleGlyAsn CysArgGlu TleValAsp ProLeu acttct caatcagca gagcagttc tcattgcag cctagggcg attcat 1296 ThrSer GlnSerAla GluGlnPhe SerLeuGln ProArgAla IleHis ttacca gaccttcat gtctatcag ttgccatat gtggttcca ttctag 1344 LeuPro AspLeuHis ValTyrGln LeuProTyr ValValPro Phe <210> 13 ' <211> 447 <212> PRT
<213> Oryza sativa cv. Lemont <400> 13 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His His Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp Gln Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Glu His Gly Ser Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg Ile Lys Ile Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Val Val Gln Thr Pro Val His Gln Met Gly Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr Glu Met Met Val Arg Thr Ala Ser Thr Gln Gln Gln Ser Ile Lys Gly Asp Phe Gln Ala Val Pro Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Ile Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Val Thr Gln Arg Val Asp Pro Pro Pro Ala Lys Ala Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp Ala Thr Arg Ser Phe Thr Lys Val Ser Gln Thr Glu Ile Lys Pro Glu Val Gln Pro Pro Ile Leu Lys Val Pro Val Ala Met Pro Thr Ile Asn Arg Gln Gln Ile Asp Thr Ser Gln Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala Glu Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Asp Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 14 <211> 2459 <212> DNA

<213> Oryza sativa strain IR64 <400> 14 atgtcgaggt gcttcccctacccgccgccggggtacgtgcgaaacccagtggtggccgtg60 gccgcggccg aagcgcaggcgaccactaaggtttgttgaaccatcggatttacacacgca120 cgtgccggat catttgctcttgcctgttggttttgatcggatctgttggttgtgcgtgtg180 tgatttgggg atcgcacgtgcggggaagctaacctttgcatggataacttgagatttgtg240 aggccgcgct tcgaccagatcggtcgccaatcttttagtggctgaccgtggaaagaggat300 attactgacc ttcggtttgctaattttggttgtgccgttgaatctgaaataaccagaata360 gtcatgggga aaaagtctgatctggaaggttcgaattacatttctatatattgttgtgct420 cccagacgat ggttgcaagaaattactcatgctggataaaattgtggatgtaagagtctg480 cagttgttaa aatctggaaacagcacattttgccgtagtaaatttgaatccatgttgctg540 tctcgttatt ggtgtgttacgagtaacctgtgtgttgttatctccgcttggactagattc600 caagtaatcc agtgccttcatgacctgcaaattctatgcctatgaagtaacatgaacagt660 ttgtatgtat tctgttgatgcatacttgcattatttgtgagatgtacatgttgtggtaaa720 attttgcatt caccatatagaaatagtaattgactatccttgtttagttcgaaaactact780 gcaggtttagttattctctgttgccaagagtgcttgttatgattgtaagggttacagttc840 tgtgactaaccatgtaacaaatatattaaggattatcaaattattctatgtgaagtgtcc900 gtgccctaattgtgttatcttctgtaactgatagcacaacatttgtttcctgctgtgtgc960 ttgtgtaaattggtacttcatcattactatatatttcaaagaaaattctgcattgcattc1020 ccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctccagaaagaa1080 agggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaaagctcttccacatggt1140 gagatatccaagcattcaaagcgaacccacaagaagagaaaacatgaagacatcaataat1200 gctgatcagaagtcccggaaggtttcctccatggaacctggtgagcaattggagaagagt1260 ggactctcagaagagcatggagctccttgctttactcagacagtgcatggctctccagag1320 agttcacaggacagcagcaagagaagaaaggttgtgttacccagtcctagccaagctaag1380 aatggtgaggccctttcttgcatttgtcttcttttagctggtgatgttgaattggtttga1440 cttatcctgaattatcatcttgcaggtaacatccttcgaataaagataagaagagatcaa1500 gattcttcag cttccctttc ggagaaatct aatgttgtac aaacaccagt tcatcaaatg 1560 ggatcagttt catctctgcc aagtaagaaa aactcaatgc aaccacacaa caccgaaatg 1620 atggtgagaa cagcatcaac ccagcagcaa agcatcaaag gtgattttca agcagtactg 1680 aaacaaggta tgccaacccc agcaaaagtc atgccaagag tcgatgttcc tccatctatg 1740 
agggcatcaaaggaaagggttggccttcgtcctgcagagatgttggccaatgttggtcct1800 tcaccctccaaggcaaaacagattgtcaatcctgcagctgctaaggttacacaaagagtt1860 gatcctccacctgccaaggcatctcagagaattgatcctctgttgccatccaaggttcat1920 atagatgctactcgatcttttacgaagctctcccagacagagatcaagccggaagtacag1980 cccccaattccgaaggtgcctgtggctatgcctaccatcaatcgtcagcagattgacacc2040 tcgcagcccaaagaagagccttgctcctctggcaggaatgctgaagctgcttcagtatca2100 gtagagaagcagtccaagtcagatcgcaaaaagagccgcaaggctgagaagaaagagaag2160 aagttcaaagatttatttgttacctgggatcctccgtctatggaaatggatgatatggat2220 cttggggaccaggattggctgcttggtagtacgaggaaacctgatgctggcattggcaac2280 tgcagagaaattgttgatccacttacttctcaatcagcggagcagttctcattgcagcct2340 agggcgattc,atttaccagaccttcatgtctatcagttgccatatgtggttccattctag2400 gtttgtgtagtgagatggagtaggtgagaagtagagagatgttgggagagagctgtgtg 2459 <210>

15 <211> 1341 <212> DNA <213> Oryza sativa strain IR64 <220>
<221> CDS
<222> (1)..(1341)

<400> 15 atg tcg agg tgc ttc ccc tac ccg cog ccg ggg tac gtg cga aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtg gtg gcc gtg gcc gcg gcc gaa gcg cag gcg acc act aag ctc cag 96 Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln aaa gaa agg gaa aag gcc gaa aag aag aaa gag aaa agg agt gac agg 144 Lys G1u Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg aaa get ctt cca cat ggt gag ata tcc aag cat tca aag cga acc cac 192 Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His aag aag aga aaa cat gaa gac atc aat aat get gat cag aag tcc cgg 240 Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp Gln Lys Ser Arg aag gtt tcc tcc atg gaa cct ggt gag caa ttg gag aag agt gga ctc 288 Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu tca gaa gag cat gga get cct tgc ttt act cag aca gtg cat ggc tct 336 Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Val His Gly Ser ccagag agttcacag gacagcagc aagagaaga aaggtt gtgttaccc 384 ProGlu SerSerGlriAspSerSer LysArgArg LysVal ValLeuPro agtcct agccaaget aagaatggt aacatcctt cgaata aag.ataaga 432 SerPro SerGlnAla LysAsnGly AsnIleLeu ArgIle LysIleArg agagat caagattct tcagettcc ctttcggag aaatct aatgttgta 480 ArgAsp GlnAspSer SerAlaSer LeuSerGlu LysSer AsnValVal caaaca ccagttcat caaatggga tcagtttca tctctg ccaagtaag 528 GlnThr ProValHis GlnMetGly SerValSer SerLeu ProSerLys aaaaac tcaatgcaa ccacacaac accgaaatg atggtg agaacagca 576 LysAsn SerMetGln ProHisAsn ThrGluMet MetVal ArgThrAla tcaacc cagcagcaa agcatcaaa ggtgatttt caagca gtactgaaa 624 SerThr GlnGlnGln SerIleLys GlyAspPhe GlnAla ValLeuLys caaggt atgccaacc ccagcaaaa gtcatgcca agagtc gatgttcct 672 GlnGly MetProThr ProAlaLys ValMetPro ArgVal AspValPro ccatct atgagggca tcaaaggaa agggttggc cttcgt cctgcagag 720 ProSer MetArgAla SerLysGlu ArgValGly LeuArg ProAlaGlu atgttg gccaatgtt ggtccttca ccctccaag gcaaaa cagattgtc 768 MetLeu AlaAsnVal GlyProSer ProSerLys AlaLys GlnIleVal aatcct gcagetget aaggttaca caaagagtt gatcct ccacctgcc 816 
AsnPro AlaAlaAla LysValThr GlnArgVal AspPro ProProAla aaggca tctcagaga attgatcct ctgttgcca tccaag gttcatata 864 LysAla SerGlnArg IleAspPro LeuLeuPro SerLys ValHisIle gatget actcgatct tttacgaag ctctcccag acagag atcaagccg 912 AspAla ThrArgSer PheThrLys LeuSerGln ThrGlu IleLysPro gaagta cagccccca attccgaag gtgcctgtg getatg cctaccatc 960 GluVal GlnProPro IleProLys ValProVal AlaMet ProThrIle aatcgt cagcagatt gacacctcg cagcccaaa gaagag ccttgctcc 1008 AsnArg GlnGlnIle AspThrSer GlnProLys GluGlu ProCysSer tctggc aggaatget gaagetget tcagtatca gtagag aagcagtcc 1056 SerGly ArgAsnAla GluAlaAla SerValSer ValGlu LysGlnSer aagtcagat ogcaaa aagagccgc aaggetgag aagaaagag aagaag 1104 LysSerAsp ArgLys LysSerArg LysAlaGlu LysLysGlu LysLys ttcaaagat ttattt gttacctgg gatcctccg tctatggaa atggat 1152 PheLysAsp LeuPhe ValThrTrp AspProPro SerMetGlu MetAsp gatatggat cttggg gaccaggat tggctgctt ggtagtacg aggaaa 1200 AspMetAsp LeuGly AspGlnAsp TrpLeuLeu GlySerThr ArgLys cctgatget ggcatt ggcaactgc agagaaatt gttgatcca .cttact 1248 ProAspAla GlyIle GlyAsnCys ArgGluI1e ValAspPro LeuThr tctcaatca gcggag cagttctca ttgcagcct agggcgatt cattta 1296 SerGlnSer AlaGlu GlnPheSer LeuGlnPro ArgA1aIle HisLeu ccagacctt catgtc tatcagttg ccatatgtg gttccattc tag 1341 ProAspLeu HisVal TyrGlnLeu ProTyrVal ValProPhe <210> 16 <211> 446 <212> PRT
<213> Oryza sativa strain IR64 <400> 16 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Val Ala Val Ala Ala Ala Glu Ala G1n Ala Thr Thr Lys Leu Gln 20 25 30 ' Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp G1n Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Val His Gly Ser Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg Ile Lys Ile Arg l30 135 140 Arg Asp Gln Asp Ser Ser A1a Ser Leu Ser Glu Lys Ser Asn Val Val Gln Thr Pro Val His Gln Met Gly Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr Glu Met Met Val Arg Thr Ala Ser Thr Gln Gln Gln Ser Ile Lys Gly Asp Phe G1n Ala Va1 Leu Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Va1 Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Val Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln I1e Va1 Asn Pro Ala Ala A1a Lys Val Thr Gln Arg Val Asp Pro Pro Pro Ala Lys Ala Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp Ala Thr Arg Ser Phe Thr Lys Leu Ser Gln Thr Glu Ile Lys Pro Glu Val Gln Pro Pro Ile Pro Lys Val Pro Val Ala Met Pro Thr Ile Asn Arg Gln Gln Ile Asp Thr Ser G1n Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser ~6 Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala Glu Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Gly Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg A1a Ile His Leu Pro Asp Leu His Val Tyr G1n Leu Pro Tyr Val Val Pro Phe <210> 17 <211> 2432 <212> DNA
<213> Oryza sativa cv. Kasalath <220>
<221> misc_feature <222> (1950)..(1950) <223> N = G or C
<220>
<221> misc_feature <222> (2032)..(2032) <223> N = G or C
<400> 17

catgtcgaggtgcttcccctacccgccgccggggtacgtgcgaaacccagtggtggccgt60 ggccgcggccgaagcgcaggcgaccactaaggtttgttgaaccatcggatttacacacgc120 acgtgccggatcatttgctcttgcctgttggttttgatcggatctgttggttgtgcgtgt180 gtgatttggggatcgcacgtgcggggaagctaacctttgcatggataacttgagatttgt240 gaggccgcgcttcgaccagatcggtcgccaatcttttagtggctgaccgtggaaagagga300 tattactgaccttcggtttgctaattttggttgtgccgttgaatctgaaataaccagaat360 agtcatggggaaaaaagtctgatctggaaggttcgaattacatttctatatattgttgtg420 ctcccagacgatggttgcaagaaattactcatgctggataaaattgtggatgtaagagtc480 tgcagttgttaaaatctggaaacagcacattttgccgtagtaaatttgaatccatgttgc540 tgtctcgttattggtgtgttacgagtaacctgtgtgttgttatctccgcttggactagat600 tccaagtaatccagtgccttcatgacctgcaaattctatgcctatgaagtaacatgaaca 660 gtttgtatgtattctgttgatgcatacttgcattatttgtgagatgtacatgttgtggta 720 aaattttgcattcaccatatagaaatagtaactgactatccttgtttagttcgaaaacta 780 ctgcaggtttagttattctctgttgccaagagtgcttgttatgattgtaagggttacagt 840 tctgtgactaaccatgtaacaaatatattaaggattatcaaattattctatgtgaagtgt 900 ccgtgccctaattgtgttatcttctgtaactgatagcacaacatttgtttcctgctgtgt 960 gcttgtgtaaattggtacttcatcattactatatatttcaaagaaaattctgcattgcat 1020 tcccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctccagaaag 1080 aaagggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaaagctcttccacatg 1140 gtgagatatccaagcattcaaagcgaacccacaagaagagaaaacatgaagacatcaata1200 atgctgatcagaagtcccggaaggtttcctccatggaacctggtgagcaattggagaaga1260 gtggactctcagaagagcatggagctccttgctttactcagacagtgcatggctctccag1320 agagttcacaggacagcagcaagagaagaaaggttgtgttacccagtcctagccaagcta1380 agaatggtgaggccctttcttgcatttgtcttcttttagctggtgatgttgaattggttt1440 gacttatcctgaattatcatcttgcaggtaacatccttcgaataaagataagaagagatc1500 aagattcttcagcttccctttcggagaaatctaatgttgtacaaacaccagttcatcaaa1560 tgggatcagtttcatctctgccaagtaagaaaaactcaatgcaaccacacaacaccgaaa1620 tgatggtgagaacagcatcaacccagcagcaaagcatcaaaggtgattttcaagcagtac1680 tgaaacaaggtatgccaaccccagcaaaagtcatgccaagagtcgatgttcctccatcta1740 tgagggcatcaaaggaaagggttggccttcgtcctgcagagatgttggccaatgttggtc1800 
cttcaccctccaaggcaaaacagattgtcaatcctgcagctgctaaggttacacaaagag1860 ttgatcctccacctgccaaggcatctcagagaattgatcctctgttgccatccaaggttc1920 atatagatgctactcgatcttttacgaagntctcccagacagagatcaagccggaagtac1980 agcccccaattccgaaggtgcctgtggctatgcctaccatcaatcgtcagcngattgaca 2040 cctcgcagcccaaagaagagccttgctcctctggcaggaatgctgaagctgcttcagtat 2100 cagtagagaagcagtccaagtcagatcgcaaaaagagccgcaaggctgagaagaaagaga 2160 agaagttcaaagatttatttgttacctgggatcctccgtctatggaaatggatgatatgg 2220 atcttggggaccaggattggctgcttggtagtacgaggaaacctgatgctggcattggca 2280 actgcagagaaattgttgatccacttacttctcaatcagcagagcagttctcattgcagc 2340 ctagggcgattcatttaccagaccttcatgtctatcagttgccatatgtggttccattct 2400 aggtttgtgt agtgagatgg agtaggtgag as 2432 <210> 18 <211> 1341 <212> DNA
<213> Oryza sativa cv. Kasalath <220>
<221> CDS
<222> (1)..(1341) <220>
<221> misc_feature <222> (889)..(889) <223> n = G, C
<220>
<221> misc_feature <222> (971)..(971) <223> n = A, T
<400> 18 atg tcg agg tgc ttc ccc tac ccg ccg ccg ggg tac gtg cga aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtg gtg gcc gtg gcc gcg gcc gaa gcg cag gcg acc act aag ctc cag 96 Val Val Ala Val Ala Ala Ala Glu Ala Gln A1a Thr Thr Lys Leu Gln aaa gaa agg gaa aag gcc gaa aag aag aaa gag aaa agg agt gac agg 144 Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg aaa get ctt cca cat ggt gag ata tcc aag cat tca aag cga acc cac 192 Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His aag aag aga aaa cat gaa gac atc aat aat get gat cag aag tcc cgg 240 Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp Gln Lys Ser Arg aag gtt tcc tcc atg gaa cct ggt gag caa ttg gag aag agt gga ctc 288 Lys Val Ser Ser Met G1u Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu tca gaa gag cat gga get cct tgc ttt act cag aca gtg cat ggc tct 336 Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Val His Gly Ser cca gag agt tca cag gac agc agc aag aga aga aag gtt gtg tta ccc 384 Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro agt cct agc caa get aag aat ggt aac atc ctt cga ata aag ata aga 432 Ser Pro Ser Gln Ala Lys Asn G1y Asn Ile Leu Arg Ile Lys Ile Arg aga gat caa gat tct tca get tcc ctt tcg gag aaa tct aat gtt gta 480 2y Arg Asp G1n Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Va1 Val caaacacca gttcat caaatggga tcagtttca tctctgcca agtaag 528 GlnThrPro ValHis GlnMetGly SerValSer SerLeuPro SerLys aaaaactca atgcaa ccacacaac accgaaatg atggtgaga acagca 576 LysAsnSer MetGln ProHisAsn ThrGluMet MetValArg ThrAla tcaacccag cagcaa agoatcaaa ggtgatttt caagcagta ctgaaa 624 SerThrGln GlnGln SerIleLys GlyAspPhe GlnAlaVal LeuLys caaggtatg ccaacc ccagcaaaa gtcatgcca agagtcgat gttcct 672 GlnGlyMet ProThr ProAlaLys Va1MetPro ArgValAsp ValPro ccatctatg agggca tcaaaggaa agggttggc cttcgtcct gcagag 720 ProSerMet ArgAla SerLysGlu ArgValGly LeuArgPro AlaGlu atgttggcc aatgtt ggtccttca ccctccaag gcaaaacag attgtc 768 MetLeuAla AsnVal GlyProSer ProSerLys AlaLysGln 
IleVal aatcctgca getget aaggttaca caaagagtt gatcctcca cctgcc 816 AsnProAla AlaAla LysValThr GlnArgVal AspProPro ProAla aaggcatct cagaga attgatcct ctgttgcca tccaaggtt catata 864 LysAlaSer GlnArg IleAspPro LeuLeuPro SerLysVal HisI1e gatgetact cgatct tttacgaag ntctcccag acagagatc aagccg 912 AspAlaThr ArgSer PheThrLys XaaSerGln ThrGluIle LysPro gaagtacag ccccca attccgaag gtgcctgtg getatgcct accatc 960 GluValGln ProPro IleProLys Va1ProVal AlaMetPro ThrIle aatcgtcag cngatt gacacctcg cagcccaaa gaagagcct tgctcc 1008 AsnArgGln XaaI1e AspThrSer GlnProLys GluGluPro CysSer tctggcagg aatget gaagetget tcagtatca gtagagaag cagtcc 1056 SerG1yArg AsnAla GluAlaAla SerValSer ValGluLys GlnSer aagtcagat cgcaaa aagagccgc aaggetgag aagaaagag aagaag 1104 LysSerAsp ArgLys LysSerArg LysAlaGlu LysLysGlu LysLys ttcaaagat ttattt gttacctgg gatcctccg tctatggaa atggat 1152 PheLysAsp LeuPhe ValThrTrp AspProPro SerMetGlu MetAsp gatatggat cttggg gaccaggat tggctgctt ggtagtacg aggaaa 1200 AspMetAsp LeuGly AspGlnAsp TrpLeuLeu GlySerThr ArgLys cct gat get ggc att ggc aac tgc aga gaa att gtt gat cca ctt act 1248 1 Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr tct caa tca gca gag cag ttc tca ttg cag cct agg gcg att cat tta 1296 Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu cca gac ctt cat gtc tat cag ttg cca tat gtg gtt cca ttc tag 1341 Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 19 <211> 446 <212> PRT
<213> Oryza sativa cv. Kasalath <220>
<221> misc_feature <222> (297)..(297) <223> The 'Xaa' at location 297 stands for Ile, Val, Leu, or Phe.
<220>
<221> misc_feature <222> (324)..(324) <223> The 'Xaa' at location 324 stands for Gln, Arg, Pro, or Leu.
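The two 'Xaa' annotations above follow from the ambiguous bases recorded for the Kasalath coding sequence (n at nucleotide positions 889 and 971 of SEQ ID NO: 18, in the codons written "ntc" and "cng"). The following is a minimal sketch, assuming the standard genetic code, that enumerates the residues an ambiguous codon can encode and maps a nucleotide position to its codon number. The names `possible_residues` and `codon_number` are hypothetical helpers introduced here for illustration; they are not part of the listing.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter residue names as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def possible_residues(codon):
    """Return the sorted residues a codon can encode, treating 'n' as any base."""
    choices = [BASES if base == "n" else base for base in codon.lower()]
    return sorted(
        {THREE_LETTER[CODON_TABLE["".join(c)]] for c in product(*choices)}
    )


def codon_number(nucleotide_position):
    """Map a 1-based nucleotide position in a CDS to its 1-based codon number."""
    return (nucleotide_position - 1) // 3 + 1


if __name__ == "__main__":
    print(codon_number(889), possible_residues("ntc"))
    print(codon_number(971), possible_residues("cng"))
```

Under these assumptions, the codon containing position 889 is codon 297 and the codon containing position 971 is codon 324, and the residue sets returned for "ntc" and "cng" correspond to the alternatives named in the two <223> entries.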
<400> 19 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His Lys Lys Arg Lys His Glu Asp Ile Asn Asn A1a Asp Gln Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Va1 His Gly Ser Pro Glu Ser Ser G1n Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg Ile Lys Ile Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Val Val l45 150 155 160 Gln Thr Pro Val His Gln Met Gly Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr Glu Met Met Val Arg Thr Ala Ser Thr Gln Gln Gln Ser Ile Lys Gly Asp Phe Gln Ala Val Leu Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Val Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Val Thr Gln Arg Val Asp Pro Pro Pro Ala Lys Ala Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp Ala Thr Arg Ser Phe Thr Lys Xaa Ser Gln Thr Glu I1e Lys Pro Glu Val Gln Pro Pro Ile Pro Lys Val Pro Val Ala Met Pro Thr Ile Asn Arg Gln Xaa Ile Asp Thr Ser Gln Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala Glu Lys Lys Glu Lys Lys ~G

Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Gly Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Tle Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210>

20 <211> 2447 <212> DNA <213> Oryza rufipogon strain 5948 <400> 20

ccctacccgccgccggggtacgtgcgaaacccagtggtggccgtggccgcggccgaagcg60 caggcgaccactaaggtttgttgaaccatcggatttacacacgcacgtgccggatcattt120 gctcttgcctgttggttttgatcggatctgttggttgtgcgtgtgtgatttggggatcgc180 acgtgcggggaagctaacctttgcatggataacttgagatttgtgaggccgcgcttcgac240 cagatcggtcgccaatcttttagtggctgaccgtggaaagaggatattactgaccttcgg300 tttgctaattttggttgtgccgttgaatctgaaataaccagaatagtcatggggaaaaag360 tctgatctggaaggttcgaattacatttctatatattgttgtgctcccagacgatggttg420 caagaaattactcatgctggataaaattgtggatgtaagagtctgcagttgttaaaatct480 ggaaacagcacattttgccgtagtaaatttgaatccatgttgctgtctcgttattggtgt540 gttacgagtaacctgtgtgttgttatctccgcttggactagattccaagtaatccagtgc600 cttcatgacctgcaaattctatgcctatgaagtaacatgaacagtttgtatgtattctgt660 tgatgcatacttgcattatttgtgagatgtacatgttgtggtaaaattttgcattcacca720 tatagaaatagtaattgactatccttgtttagttcgaaaacttctgcaggtttagttatt780 ctctgttgccaagagtgcttgttatgattgtaagggttacagttctgtgactaaccatgt840 aacaaatatattaaggattatcaaattattctatgtgaagtgtccgtgccctaattgtgt900 tatcttctgtaactgatagcacaacatttgtttcctgctgtgtgcttgtgtaaattggta960 cttcatcattactatatatttcaaagaaaattctgcattgcattcccgtcgtccgttcta1020 aatcagaact gacgattgct ctggtggctg aagctccaga aagaaaggga aaaggccgaa 1080 aagaagaaagagaaaaggagtgacaggaaagctcttccacatggtgagatatccaagcat 1140 tcaaagcgaacccacaagaa.gagaaaacatgaagacatcaataatgctgatcagaagtcc 1200 cggaaggtttcctccatggaacctggtgagcaattggagaagagtggactctcagaagag 1260 catggagctccttgctttactcagacagtgcatggctctccagagagttcacaggacagc 1320 agcaagagaagaaaggttgtgttacccagtcctagccaagctaagaatggtgaggccctt 1380 tcttgcatttgtcttcttttagctggtgatgttgaattggtttgacttatcctgaattat 1440 catcttgcaggtaacatccttcgaataaagataagaagagatcaagattcttcagcttcc 1500 ctttcggaga aatctaatgt tgtacaaaca ccagttcatc aaatgggatc agtttcatct 1560 ctgccaagta agaaaaactc aatgcaacca cacaacaccg aaatgatggt gagaacagca 1620 tcaacccagcagcaaagcatcaaaggtgattttcaagcagtactgaaacaaggtatgcca1680 accccagcaaaagtcatgccaagagtcgatgttcctccatctatgagggcatcaaaggaa1740 agggttggccttcgtcctgcagagatgttggccaatgttggtccttcaccctccaaggca1800 
aaacagattgtcaatcctgcagctgctaaggttacacaaagagttgatcctccacctgcc1860 aaggcatctcagagaattgatcctctgttgccatccaaggttcatatagatgctactcga1920 tcttttacgaagctctcccagacagagatcaagccggaagtacagcccccaattccgaag1980 gtgcctgtggctatgcctaccatcaatcgtcagcagattgacacctcgcagcccaaagaa2040 gagccttgctcctctggcaggaatgctgaagctgcttcagtatcagtagagaagcagtcc2100 aagtcagatcgcaaaaagagccgcaaggctgagaagaaagagaagaagttcaaagattta2160 tttgttacctgggatcctccgtctatggaaatggatgatatggatcttggggaccaggat2220 tggctgcttggtagtacgaggaaacctgatgctggcattggcaactgcagagaaattgtt2280 gatccacttacttctcaatcagcggagcagttctcattgcagcctagggcgattcattta2340 ccagaccttcatgtctatcagttgccatatgtggttccattctaggtttgtgtagtgaga2400 tggagtaggt gagaagtaga gagatgttgg gagagagctg tgtgggt 2447 <210> 21 <211> 1341 <212> DNA , <213> Oryza rufipogon strain 5948 <220>
<221> CDS
<222> (1)..(1341) <220>

<221> misc_feature <222> (1)..(15) <223> = G, T
n A, C, <400> 1 nnnnnn nnnnnnnnn ccctacccg ccgccgggg tacgtgcga aaccca 48 XaaXaa XaaXaaXaa ProTyrPro ProProGly TyrValArg AsnPro gtggtg gccgtggcc gcggccgaa gcgcaggcg accactaag ctccag 96 ValVal AlaValAla AlaAlaGlu AlaGlnAla ThrThrLys LeuG1n aaagaa agggaaaag gccgaaaag aagaaagag aaaaggagt gacagg 144 LysGlu ArgGluLys AlaGluLys LysLysG1u LysArgSer AspArg aaaget cttccacat ggtgagata tccaagcat tcaaagcga acccac 192 LysAla LeuProHis GlyGluI1e SerLysHis SerLysArg ThrHis aagaag agaaaacat gaagacatc aataatget gatcagaag tcccgg 240 LysLys ArgLysHis GluAspI1e AsnAsnAla AspGlnLys SerArg aaggtt tcctccatg gaacctggt gagcaattg gagaagagt ggactc 288 LysVal SerSerMet GluProGly GluG1nLeu GluLysSer G1yLeu tcagaa gagcatgga getccttgc tttactcag acagtgcat ggctct 336 SerGlu G1uHisG1y AlaProCys PheThrGln ThrValHis GlySer ccagag agttcacag gacagcagc aagagaaga aaggttgtg ttaccc 384 ProGlu SerSerGln AspSerSer LysArgArg LysValVal LeuPro agtcct agccaaget aagaatggt aacatcctt cgaataaag ataaga 432 SerPro SerGlnAla LysAsnGly AsnIleLeu ArgIleLys 21eArg agagat caagattct tcagettcc ctttcggag aaatctaat gttgta 480 ArgAsp GlnAspSer SerAlaSer LeuSerGlu LysSerAsn ValVal caaaca ccagttcat caaatggga tcagtttca tctctgcca agtaag 528 GlnThr ProValHis GlnMetG1y SerVa1Ser SerLeuPro SerLys aaaaac tcaatgcaa ccacacaac accgaaatg atggtgaga acagca 576 LysAsn SerMetGln ProHisAsn ThrGluMet MetValArg ThrAla tcaacc cagcagcaa agcatcaaa ggtgatttt caagcagta ctgaaa 624 SerThr GlnGlnGln SerIleLys GlyAspPhe GlnAlaVal LeuLys 195 200 205 ' caaggt atgccaacc ccagcaaaa gtcatgcca agagtcgat gttcct 672 GlnGly MetProThr ProAlaLys ValMetPro ArgValAsp ValPro ccatctatgagg gcatca aaggaaagg gttggcctt cgtcctgca gag 720 ProSerMetArg AlaSer LysGluArg ValGlyLeu ArgProAla Glu atgttggccaat gttggt ccttcaccc tccaaggca aaacagatt gtc 768 MetLeuAlaAsn ValGly ProSerPro SerLysAla LysGlnIle Val aatcctgcaget getaag gttacacaa agagttgat cctccacct gcc 816 AsnProAlaAla AlaLys ValThrGln ArgValAsp ProProPro Ala aaggcatctcag agaatt gatcctctg ttgccatcc aaggttcat ata 864 
LysAlaSerGln ArgIle AspProLeu LeuProSer LysValHis Ile gatgetactcga tctttt acgaagctc tcccagaca gagatcaag ccg 912 AspAlaThrArg SerPhe ThrLysLeu SerGlnThr GluTleLys Pro gaagtacagccc ccaatt ccgaaggtg cctgtgget atgcct.acc atc 960 GluValGlnPro ProIle ProLysVal ProValAla MetProThr Ile aatcgtcagcag attgac acctcgcag cccaaagaa gagccttgc tcc 1008 AsnArgGlnGln IleAsp ThrSerG1n ProLysGlu GluProCys Ser tctggcaggaat getgaa getgettca gtatcagta gagaagcag tcc 1056 SerGlyArgAsn AlaGlu AlaAlaSer ValSerVal GluLysGln Ser aagtcagatcgc aaaaag agccgcaag getgagaag aaagagaag aag 1104 LysSerAspArg LysLys SerArgLys AlaGluLys LysGluLys Lys ttcaaagattta tttgtt acctgggat cctccgtct atggaaatg gat 1152 PheLysAspLeu PheVal ThrTrpAsp ProProSer MetGluMet Asp gatatggatctt ggggac caggattgg ctgcttggt agtacgagg aaa 1200 AspMetAspLeu GlyAsp GlnAspTrp LeuLeuGly SerThrArg Lys cctgatgetggc attggc aactgcaga gaaattgtt gatccactt act 1248 ProAspAlaG1y IleGly AsnCysArg GluTleVal AspProLeu Thr.

tctcaatcagcg gagcag ttctcattg cagcctagg gcgattcat tta 1296 SerGlnSerAla GluGln PheSerLeu GlnProArg AlaIleHis Leu ccagaccttcat gtctat cagttgcca tatgtggtt ccattctag 1341 ProAspLeuHis ValTyr GlnLeuPro TyrValVal ProPhe <210>

22 <211> 446 <212> PRT <213> Oryza rufipogon strain 5948 <220>
<221> misc_feature <222> (1)..(1) <223> The 'Xaa' at location 1 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (2)..(2) <223> The 'Xaa' at location 2 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (3)..(3) <223> The 'Xaa' at location 3 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (4)..(4) <223> The 'Xaa' at location 4 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (5)..(5) <223> The 'Xaa' at location 5 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<400> 22 Xaa Xaa Xaa Xaa Xaa Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Val Ala Va1 Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp Gln Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Val His Gly Ser Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln A1a Lys Asn Gly Asn I1e Leu Arg I1e Lys Ile.Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Val Val Gln Thr Pro Val His Gln Met G1y Ser Val Ser Ser Leu Pro Ser Lys 165 l70 175 Lys Asn Ser Met Gln Pro His Asn Thr G1u Met Met Val Arg Thr Ala Ser Thr Gln G1n Gln Ser Ile Lys Gly Asp Phe Gln Ala Val Leu Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Va1 Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Val Thr Gln Arg Va1 Asp Pro Pro Pro Ala Lys Ala Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp A1a Thr Arg Ser Phe Thr Lys Leu Ser Gln Thr Glu I1e Lys Pro Glu Val Gln Pro Pro Ile Pro Lys Val Pro Val Ala Met Pro Thr Ile Asn Arg Gln Gln Ile Asp Thr Ser Gln Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn A1a Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser 340 345 350 ' Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala G1u Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Gly Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 23 <211> 146 <212> DNA
<213> Oryza rufipogon strain 5949 <400> 23 cccctacctc tgtgtgatcc gggggtgagc ttaggccgga cgccggggca tcagccatgt 60 cgaggtgctt cccctacccg ccgccggggt acgtgcgaaa cccagtggtg gccgtggccg 120 cggccgaagc gcaggcgacc actaag 146 <210> 24 <211> 1615 <212> DNA
<213> Oryza rufipogon strain 5949 <400> 24
tctgtgactaaccatgtaacaaatatattaaggattatcaaattattctatgtgaagtgt60 ccgtgccctaattgtgttatcttctgtaactgatagcacaacatttgtttcctgctgtgt120 gcttgtgtaaattggtacttcatcattactatatatttcaaagaaaattctgcattgcat180 tcccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctccagaaag240 aaagggaaaaggccgaaaagaagaaagagaaaaagagtgacaggaaagctcttccacatg300 gtgagatatccaagcattcaaagcgaacccacaagaagagaaaacatgaagacatcaata360 atgctgatca gaagtcccgg aaggtttcct ccatggaacc tggtgagcaa ttggagaaga 420 gtggactctc agaagagcat ggagctcctt gctttactca gacagtgcat ggctctccag 480 agagttcacaggacagcagcaagagaagaaaggttgtgttacccagtcctagccaagcta540 agaatggtgaggccctttcttgcatttgtcttctcttagctggtgatgttgaattggttt600 gacttatcotgaattatcatcttgcaggtaacatccttcgaataaagataagaagagatc660 aagattcttcagcttccctttcggagaaatctaatgttgtacaaacaccagttcatcaaa720 tgggatcagtttcatctctgccaagtaagaaaaactcaatgcaaccacacaacaccgaaa780 tgatggtgagaacagcatcaacccagcagcaaagcatcaaaggtgattttcaagcagtac840 tgaaacaaggtatgccaaccccagcaaaagtcatgccaagagtcgatgttcctccatcta900 tgagggcatcaaaggaaagggttggccttcgtcctgcagagatgttggccaatgttggtc960 cttcaccatccaaggcaaaacagattgtcaatcctgcagctgctaaggttacacaaagag1020 ttgatcctccacctgccaaggcatctcagagaattgatcctctgttgccatccaaggttc1080 atatagatgctactcgatcttttacgaaggtctcccagacagagatcaagccggaagtac1140 agcccccaattccgaaggtgcctgtggctatgcctaccatcaatcgtcagcagattgaca1200 cctcgcagcccaaagaagagccttgctcctctggcaggaatgctgaagctgcttcagtat1260 cagtagagaagcagtccaagtcagatcgcaaaaagagccgcaaggctgagaagaaagaga1320 agaagttcaaagatttatttgttacctgggatcctccgtctatggaaatggatgatatgg1380 atcttggggaccaggattggctgcttggtagtacgaggaaacctgatgctggcattggca1440 actgcagagaaattgttgatccacttacttctcaatcagcagagcagttctcattgcagc1500 ctagggcgattcatttaccagaccttcatgtctatcagttgccatatgtggttccattct1560 aggtttgtgtagtgagatggagtaggtgagaagtagagagatgttgggagagagc 1615 <210> 25 <211> 1341 <212> DNA
<213> Oryza rufipogon strain 5949 <220>
<221> CDS
<222> (1)..(1341) <400> 25 atg tcg agg tgc ttc ccc tac ccg ccg ccg ggg tac gtg cga aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtg gtg gcc gtg gcc gcg gcc~gaa gcg cag gcg acc act aag ctc cag 96 Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln aaagaa agggaaaag gccgaaaag aagaaagag aaaaagagt gacagg 144 LysGlu ArgGluLys AlaGluLys LysLysGlu LysLysSer AspArg aaaget cttccacat ggtgagata tccaagcat tcaaagcga acccac 192 LysAla LeuProHis GlyGluIle SerLysHis SerLysArg ThrHis aagaag agaaaacat gaagacatc aataatget gatcagaag tcccgg 240 LysLys ArgLysHis GluAspIle AsnAsnAla AspGlnLys SerArg aaggtt tcctccatg gaacctggt gagcaattg gagaagagt ggactc 288 LysVal SerSerMet GluProGly GluGlnLeu GluLysSer GlyLeu tcagaa gagcatgga getccttgc tttactcag acagtgcat ggctct 336 SerGlu GluHisG1y AlaProCys PheThrGln ThrValHis GlySer ccagag agttcacag gacagcagc aagagaaga aaggttgtg ttaccc 384 ProGlu SerSerGln AspSerSer LysArgArg LysValVal LeuPro agtcct agccaaget aagaatggt aacatcctt cgaataaag ataaga 432 SerPro SerGlnAla LysAsnGly AsnIleLeu ArgIleLys IleArg agagat caagattct tcagettcc ctttcggag aaatctaat gttgta 480 ArgAsp GlnAspSer SerAlaSer LeuSerGlu LysSerAsn ValVal 145 150 155 l60 caaaca ccagttcat caaatggga tcagtttca tctctgcca agtaag 528 GlnThr ProValHis GlnMetGly SerValSer SerLeuPro SerLys aaaaac tcaatgcaa ccacacaac accgaaatg atggtgaga acagca ,576 LysAsn SerMetGln ProHisAsn ThrGluMet MetValArg ThrAla tcaacc cagcagcaa agcatcaaa ggtgatttt caagcagta ctgaaa 624 SerThr GlnGlnGln SerIleLys GlyAspPhe GlnAlaVal LeuLys caaggt atgccaacc ccagcaaaa gtcatgcca agagtcgat gttcct 672 GlnGly MetProThr ProAlaLys ValMetPro ArgValAsp ValPro ccatct atgagggca tcaaaggaa agggttggc cttcgtcct gcagag 720 ProSer MetArgAla SerLysGlu ArgValGly LeuArgPro AlaGlu atgttg gccaatgtt ggtccttca ccatccaag gcaaaacag attgtc 768 MetLeu A1aAsnVal GlyProSer ProSerLys AlaLysGln IleVal aatcct gcagetget aaggttaca caaagagtt gatcctcca cctgcc 816 AsnPro AlaAlaA1a LysValThr Gln Val AspProPro ProAla Arg aaggca 
tctcagaga attgatcct ctgttgcca tccaaggtt catata 864 LysAla SerGlnArg IleAspPro LeuLeuPro SerLysVal HisIle gatget actcgatct tttacgaag gtctcccag acagagatc aagccg 912 AspAla ThrArgSer PheThrLys ValSerGln ThrGluIle LysPro gaagta cagccccca attccgaag gtgcctgtg getatgcct accatc 960 GluVal GlnProPro IleProLys ValProVal AlaMetPro ThrIle aatcgt cagcagatt gacacctcg cagcccaaa gaagagcct tgctcc 1008 AsnArg GlnGln21e AspThrSer GlnProLys GluGluPro CysSer tctggcagg aatget gaagetget tcagtatca gtagagaag cagtcc 1056 SerGlyArg AsnAla GluAlaAla SerValSer ValGluLys GlnSer aagtcagat cgcaaa aagagccgc aaggetgag aagaaagag aagaag 1104 LysSerAsp ArgLys LysSerArg LysAlaGlu LysLysGlu LysLys ttcaaagat ttattt gttacctgg gatcctccg tctatggaa atggat 1152 PheLysAsp LeuPhe ValThrTrp AspProPro SerMetGlu MetAsp gatatggat cttggg gaccaggat tggctgctt ggtagtacg aggaaa 1200 AspMetAsp LeuGly AspGlnAsp TrpLeuLeu G1ySerThr ArgLys cctgatget ggcatt ggcaactgc agagaaatt gttgatcca cttact 1248 ProAspA1a GlyIle GlyAsnCys ArgGluIle ValAspPro LeuThr tctcaatca gcagag cagttctca ttgcagcct agggcgatt cattta 1296 SerGlnSer AlaGlu G1nPheSer LeuG1nPro ArgAlaIle HisLeu ccagacctt catgtc tatcagttg ccatatgtg gttccattc tag 1341 ProAspLeu HisVal TyrGlnLeu ProTyrVal ValProPhe <210> 26 <211> 446 <212> PRT
<213> Oryza rufipogon strain 5949 <400> 26 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Lys Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp Gln Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Val His Gly Ser Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg Ile Lys Ile Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Val Val Gln Thr Pro Val His Gln Met Gly Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr Glu Met Met Val Arg Thr Ala Ser Thr Gln Gln Gln Ser Ile Lys Gly Asp Phe Gln Ala Val Leu Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Val Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Val Thr Gln Arg Val Asp Pro Pro Pro Ala Lys Ala Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp Ala Thr Arg Ser Phe Thr Lys Val Ser Gln Thr Glu Ile Lys Pro Glu Val Gln Pro Pro Ile Pro Lys Val Pro Val Ala Met Pro Thr Ile Asn Arg Gln Gln Ile Asp Thr Ser Gln Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala Glu Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Gly Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 27 <211> 107 <212> DNA
<213> Oryza rufipogon strain 5953 <400> 27 acgccggggc atcagccatg tcgaggtgct tcccctaccc gccgccgggg tacgtgcgaa 60 acccagtggt ggccgtggcc gcggccgaag cgcaggcgac cactaag 107 <210> 28 <211> 1332 <212> DNA
<213> Oryza rufipogon strain 5953 <400> 28 ctccagaaag aaagggaaaa ggccgaaaag aagaaagaga aaaagagtga caggaaagct 60 cttccacatggtgagatatccaagcattcaaagcgaacccacaagaagagaaaacatgaa 120 gacatcaataatgctgatcagaagtcccggaaggtttcctccatggaacctggtgagcaa 180 ttggagaagagtggactctcagaagagcatggagctccttgctttactcagacagtgcat 240 ggctctccagagagttcacaggacagcagcaagagaagaaaggttgtgttacccagtcct 300 agccaagctaagaatggtgaggccctttcttgcatttttcttcttttagctggtgatgtt 360 gaattggtttgacttatcctgaattatcatcttgcaggtaacatccttcgaataaagata 420 agaagagatcaagattcttcagcttccctttcggagaaatctaatgttgtacaaacacca480 gttcatcaaatgggatcagtttcatctctgccaagtaagaaaaactcaatgcaaccacac540 aacaccgaaatgatggtgagaacagcatcaacccagcagcaaagcatcaaaggtgatttt600 caagcagtactgaaacaaggtatgccaaccccagcaaaagtcatgccaagagtcgatgtt660 cctccatctatgagggcatcaaaggaaagggttggccttcgtcctgcagagatgttggcc720 aatgttggtccttcaccctccaaggcaaaacagattgtcaatcctgcagctgctaaggtt780 acacaaagagttgatcctccacctgccaaggcatctcagagaattgatcctctgttgcca840 tccaaggttcatatagatgctactcgatcttttacgaagctctcccagacagagatcaag900 ccggaagtacagcccccaattccgaaggtgcctgtggctatgcctaccatcaatcgtcag960 cagattgacacctcgcagcccaaagaagagccttgctcctctggcaggaatgctgaagct1020 gcttcagtatcagtagagaagcagtccaagtcagatcgcaaaaagagccgcaaggctgag1080 aagaaagagaagaagttcaaagatttatttgttacctgggatcctccgtctatggaaatg1140 gatgatatggatcttggggaccaggattggctgcttggtagtacgaggaaacctgatgct1200 ggcattggcaactgcagagaaattgttgatccacttacttctcaatcagcggagcagttc1260 tcattgcagcc,tagggcgattcatttaccagaccttcatgtctatcagttgccatatgtg1320 gttccattct ag 1332 <210> 29 <211> 1341 <212> DNA
<213> Oryza rufipogon strain 5953 <220>
<221> CDS
<222> (1)..(1341) <400> 29 atg tcg agg tgc ttc ccc tac ccg ccg ccg ggg tac gtg cga aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtggtggcc gtggccgcg gccgaagcg caggcg accactaag ctccag 96 ValValAla ValAlaAla AlaGluAla GlnAla ThrThrLys LeuGln aaagaaagg gaaaaggcc gaaaagaag aaagag aaaaagagt gacagg 144 LysGluArg GluLysAla GluLysLys LysGlu LysLysSer AspArg aaagetctt ccacatggt gagatatcc aagcat tcaaagcga acccac 192 LysAlaLeu ProHisGly GluIleSer LysHis SerLysArg ThrHis aagaagaga aaacatgaa gacatcaat aatget gatcagaag tcccgg 240 LysLysArg LysHisG1u AspIleAsn AsnAla AspG1nLys SerArg aaggtttcc tccatggaa cctggtgag caattg gagaagagt ggactc 288 LysValSer SerMetGlu ProGlyGlu GlnLeu GluLysSer GlyLeu tcagaagag catggaget ccttgcttt actcag acagtgcat ggctct 336 SerGluGlu HisGlyAla ProCysPhe ThrGln ThrValHis GlySer ccagagagt tcacaggac agcagcaag agaaga aaggttgtg ttaccc 384 ProGluSer SerGln.Asp SerSerLys ArgArg LysValVal LeuPro agtcctagc caagetaag aatggtaac atcctt cgaataaag ataaga 432 SerProSer GlnAlaLys AsnGlyAsn IleLeu ArgI1eLys TleArg agagatcaa gattcttca gettccctt tcggag aaatctaat gttgta 480 ArgAspGln AspSerSer AlaSerLeu SerGlu LysSerAsn ValVal caaacacca gttcatcaa atgggatca gtttca tctctgcca agtaag 528 GlnThrPro ValHisG1n MetGlySer ValSer SerLeuPro SerLys aaaaactca atgcaacca cacaacacc gaaatg atggtgaga acagca 576 LysAsnSer MetGlnPro HisAsnThr GluMet MetValArg ThrAla tcaacccag cagcaaagc atcaaaggt gatttt caagcagta ctgaaa 624 SerThrGln GlnGlnSer IleLysGly AspPhe GlnAlaVal LeuLys 195 200 . 
205 caaggtatg ccaacccca gcaaaagtc atgcca agagtcgat gttcct 672 GlnGlyMet ProThrPro AlaLysVal MetPro ArgValAsp ValPro ccatctatg agggcatca aaggaaagg gttggc cttcgtcct gcagag 720 ProSerMet ArgAlaSer LysGluArg ValGly LeuArgPro AlaGlu atgttggcc aatgttggt ccttcaccc tccaag gcaaaacag attgtc 768 MetLeuAla AsnValGly ProSerPro SerLys AlaLysGln IleVal aatcctgca getgetaag gttacacaa agagtt gatcctcca cctgcc 816 46 , AsnProAla AlaAlaLys Va1ThrGln ArgValAsp ProPro ProAla aaggcatct cagagaatt gatcctctg ttgccatcc aaggtt catata 864 LysAlaSer G1nArgIle AspProLeu LeuProSer LysVal HisIle gatgetact cgatctttt acgaagctc tcccagaca gagatc aagccg 912 AspAlaThr ArgSerPhe ThrLysLeu SerGlnThr GluIle LysPro gaagtacag cccccaatt ccgaaggtg cctgtgget atgcct accatc 960 GluValG1n ProProIle ProLysVal ProValAla MetPro ThrIle aatcgtcag cagattgac acctcgcag cccaaagaa gagcct tgctcc 1008 AsnArgGln GlnIleAsp ThrSerGln ProLysGlu GluPro CysSer tctggcagg aatgetgaa getgettca gtatcagta gagaag cagtcc 1056 SerGlyArg AsnAlaGlu AlaAlaSer ValSerVal GluLys G1nSer aagtcagat cgcaaaaag agccgcaag getgagaag aaagag aagaag 1104 LysSerAsp ArgLysLys SerArgLys AlaGluLys LysGlu LysLys ttcaaagat ttatttgtt acctgggat cctccgtct atggaa atggat 1152 PheLysAsp LeuPheVa1 ThrTrpAsp ProProSer MetGlu MetAsp gatatggat cttggggac caggattgg ctgcttggt agtacg aggaaa 1200 AspMetAsp LeuGlyAsp GlnAspTrp LeuLeuGly SerThr ArgLys cctgatget ggcattggc aactgcaga gaaattgtt gatcca cttact 1248 ProAspAla G1yTleGly AsnCysArg GluIleVal AspPro LeuThr 405 410 4l5 tctcaatca gcggagcag ttctcattg cagcctagg gcgatt cattta 1296 SerGlnSer AlaGluGln PheSerLeu GlnProArg AlaIle HisLeu ccagacctt catgtctat cagttgcca tatgtggtt ccattc tag 1341 ProAspLeu HisVa1Tyr GlnLeuPro TyrValVal ProPhe <210> 30 <211> 446 <212> PRT

<213> Oryza rufipogon strain 5953 <400> 30 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Lys Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp Gln Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Val His Gly Ser Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg Ile Lys Ile Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Val Val Gln Thr Pro Val His Gln Met Gly Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr Glu Met Met Val Arg Thr Ala Ser Thr Gln Gln Gln Ser Ile Lys Gly Asp Phe Gln Ala Val Leu Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Val Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Val Thr Gln Arg Val Asp Pro Pro Pro Ala Lys Ala Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp Ala Thr Arg Ser Phe Thr Lys Leu Ser Gln Thr Glu Ile Lys Pro Glu Val Gln Pro Pro Ile Pro Lys Val Pro Val Ala Met Pro Thr Ile Asn Arg Gln Gln Ile Asp Thr Ser Gln Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala Glu Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Gly Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 31 <211> 1341 <212> DNA
<213> Oryza rufipogon strain IRCG105491 <220>
<221> CDS
<222> (1)..(1341) <400> 31 atg tcg agg tgc ttc ccc tac ccg ccg ccg ggg tac gtg cga aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtggtggcc gtggccgcg gccgaagcg caggcg accactaag ctccag 96 ValValAla ValAlaAla AlaGluAla GlnAla ThrThrLys LeuGln aaagaaagg gaaaaggcc gaaaagaag aaagag aaaaagagt gacagg 144 LysGluArg GluLysAla GluLysLys LysGlu LysLysSer AspArg aaagetctt ccacatggt gagatatcc aagcat tcaaagcga acccac 192 LysAlaLeu ProHisGly GluIleSer LysHis SerLysArg ThrHis aagaagaga aaacatgaa gacatcaat aatget gatcagaag tcccgg 240 LysLysArg LysHisGlu AspIleAsn AsnAla AspGlnLys SerArg aaggtttcc tccatggaa cctggtgag caattg gagaagagt ggactc 288 LysValSer SerMetGlu ProGlyGlu GlnLeu GluLysSer GlyLeu tcagaagag catggaget ccttgcttt actcag acagtgcat ggctct 336 SerGluGlu HisGlyAla ProCysPhe ThrGln ThrValHis GlySer ccagagagt tcacaggac agcagcaag agaaga aaggttgtg ttaccc 384 ProGluSer SerGlnAsp SerSerLys ArgArg LysVa1Val LeuPro agtcctagc caagetaag aatggtaac atcctt cgaataaag ataaga 432 SerProSer GlnAlaLys AsnGlyAsn IleLeu ArgIleLys IleArg agagatcaa gattcttca gettccctt tcggag aaatctaat gttgta 480 ArgAspGln AspSerSer AlaSerLeu SerGlu LysSerAsn ValVal caaacacca gttcatcaa atgggatca gtttca tctctgcca agtaag 528 GlnThrPro ValHisGln MetGlySer ValSer SerLeuPro SerLys aaaaactca atgcaacca cacaacacc gaaatg atggtgaga acagca 576 LysAsnSer MetGlnPro HisAsnThr GluMet MetValArg ThrAla tcaacccag cagcaaagc atcaaaggt gatttt caagcagta ctgaaa 624 SerThrGln GlnGlnSer IleLysG1y AspPhe GlnAlaVal LeuLys caaggtatg ccaacccca gcaaaagtc atgcca agagtcgat gttcct 672 GlnGlyMet ProThrPro AlaLysVa1 MetPro ArgValAsp ValPro ccatctatg agggcatca aaggaaagg gttggc cttcgtcct gcagag 720 ProSerMet ArgAlaSer LysGluArg ValGly LeuArgPro AlaGlu atgttggcc aatgttggt ccttcacca tccaag gcaaaacag attgtc 768 MetLeuAla AsnValGly ProSerPro SerLys AlaLysGln IleVal aat cctgca getget aaggttaca caaagagtt gatcctcca cctgcc 816 Asn ProA1a AlaAla LysValThr GlnArgVal AspProPro ProAla , aag gcatct cagaga attgatcct ctgttgcca 
tccaaggtt catata 864 Lys AlaSer GlnArg IleAspPro LeuLeuPro SerLysVal HisIle gat getact cgatct tttacgaag gtctcccag acagagatc aagccg 912 Asp AlaThr ArgSer PheThrLys ValSerGln ThrGluIle LysPro gaa gtacag ccccca attccgaag gtgcctgtg getatgcct accatc 960 Glu ValGln ProPro IleProLys ValProVal AlaMetPro ThrIle aat cgtcag cagatt gacacctcg cagcccaaa gaagagcct tgctcc 1008 Asn ArgGln GlnIle AspThrSer GlnProLys G1uGluPro CysSer tct ggcagg aatget gaagetget tcagtatca gtagagaag cagtcc 1056 Ser GlyArg AsnAla GluAlaAla SerValSer Va1GluLys G1nSer aag tcagat cgcaaa aagagccgc aaggetgag aagaaagag aagaag 1104 Lys SerAsp ArgLys LysSerArg LysAlaGlu LysLysGlu LysLys ttc aaagat ttattt gttacctgg gatcctccg tctatggaa atggat 1152 Phe LysAsp LeuPhe ValThrTrp AspProPro SerMetGlu MetAsp gat atggat cttggg gaccaggat tggctgctt ggtagtacg aggaaa 1200 Asp MetAsp LeuGly AspG1nAsp TrpLeuLeu GlySerThr ArgLys cct gatget ggcatt ggcaactgc agagaaatt gttgatcca cttact 1248 Pro AspAla GlyIle GlyAsnCys ArgGluIle ValAspPro LeuThr tct caatca gcagag cagttctca ttgcagcct agggcgatt cattta 1296 ' GlnSer AlaGlu G1nPheSer LeuGlnPro ArgAlaIle HisLeu Ser cca gacctt catgtc tatcagttg ccatatgtg gttccattc tag 1341 Pro AspLeu HisVal TyrGlnLeu ProTyrVal ValProPhe <21 0> 32 <21 1> 446 <21 2> PRT

<213> Oryza rufipogon strain IRCG105491

<400> 32 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Val Ala Val Ala Ala Ala Glu Ala Gln Ala Thr Thr Lys Leu Gln Lys Glu Arg Glu Lys Ala Glu Lys Lys Lys Glu Lys Lys Ser Asp Arg Lys Ala Leu Pro His Gly Glu Ile Ser Lys His Ser Lys Arg Thr His Lys Lys Arg Lys His Glu Asp Ile Asn Asn Ala Asp Gln Lys Ser Arg Lys Val Ser Ser Met Glu Pro Gly Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Cys Phe Thr Gln Thr Val His Gly Ser Pro Glu Ser Ser Gln Asp Ser Ser Lys Arg Arg Lys Val Val Leu Pro Ser Pro Ser Gln Ala Lys Asn Gly Asn Ile Leu Arg Ile Lys Ile Arg Arg Asp Gln Asp Ser Ser Ala Ser Leu Ser Glu Lys Ser Asn Val Val Gln Thr Pro Val His Gln Met Gly Ser Val Ser Ser Leu Pro Ser Lys Lys Asn Ser Met Gln Pro His Asn Thr Glu Met Met Val Arg Thr Ala Ser Thr Gln Gln Gln Ser Ile Lys Gly Asp Phe Gln Ala Val Leu Lys Gln Gly Met Pro Thr Pro Ala Lys Val Met Pro Arg Val Asp Val Pro Pro Ser Met Arg Ala Ser Lys Glu Arg Val Gly Leu Arg Pro Ala Glu Met Leu Ala Asn Val Gly Pro Ser Pro Ser Lys Ala Lys Gln Ile Val Asn Pro Ala Ala Ala Lys Val Thr Gln Arg Val Asp Pro Pro Pro Ala Lys Ala Ser Gln Arg Ile Asp Pro Leu Leu Pro Ser Lys Val His Ile Asp Ala Thr Arg Ser Phe Thr Lys Val Ser Gln Thr Glu Ile Lys Pro Glu Val Gln Pro Pro Ile Pro Lys Val Pro Val Ala Met Pro Thr Ile Asn Arg Gln Gln Ile Asp Thr Ser Gln Pro Lys Glu Glu Pro Cys Ser Ser Gly Arg Asn Ala Glu Ala Ala Ser Val Ser Val Glu Lys Gln Ser Lys Ser Asp Arg Lys Lys Ser Arg Lys Ala Glu Lys Lys Glu Lys Lys Phe Lys Asp Leu Phe Val Thr Trp Asp Pro Pro Ser Met Glu Met Asp Asp Met Asp Leu Gly Asp Gln Asp Trp Leu Leu Gly Ser Thr Arg Lys Pro Asp Ala Gly Ile Gly Asn Cys Arg Glu Ile Val Asp Pro Leu Thr Ser Gln Ser Ala Glu Gln Phe Ser Leu Gln Pro Arg Ala Ile His Leu Pro Asp Leu His Val Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 33 <211> 180 <212> DNA
<213> Zea mays mays strain BS7 <400> 33 gcatgtcgag gtgcttcccc tacccgccac cggggtacgt gcggaaccca gtggccgtgg 60 ccgagccgga gtcgaccgct aaggtttgtt gaaccttcgg atttacacac gcacgtgcca 120 gatcgtttgt tcaatctgta ggttttgcgc ggatctgtgt gtttgcgcgt gcgtgatgtg 180 <210> 34 <211> 1447 <212> DNA
<213> Zea mays mays strain BS7 <400> 34

tcagaactgacgattgctctggtggctgaagctcctgaaagaaaaggaaaaggccgaaaa60 gaagaaagagaaaaggagtgacaggaaagctcccaagcagtgtgagacgtccaaacattc120 aaagcacagccataagaagagaaagcttgaagatgtcatcaaagctgagcagggtcccaa180 aagagtacccaaagaatcagttgagcagttggagaagagtggactctcagaagagcatgg240 agctccttcttttgtacatacgatacgtgactctcctgagagctcacaggacagcggcaa300 gagacgaaaggttgtcctgtccagtcctagccaacctaagaatggtgagactattctctt360 gtttttgctattctgattgattttttattatagaagaaatcaatcgcttgttcaggattt420 tattcatcccaacttgattttacaggaaacattcttcgcttcaagattaaaagtagtcaa480 gayccccaatcagctgttctggagaaaccaagggttcttgagcaaccattggtccaacaa540 atgggatcaggttcatcccygtcgggcaagcaaaattcaatccatcataagatgaatgtg600 agatctacctctggtcagcggagggtcgatggtgactcccaagcagtacaaaaatgtttg660 attacagaatccccggcaaagaccatgcagagacttgtcccccagcctgcagctaaggtc720 acacatcctgttgatccccagtcagctgttaaggtgccag~ttggaagatcgggcctacct780 ctgaagtcttcgggaagtgtggacccttcgcctgctagagttatgagaagatttgatcct840 ccacctgttaagatgatgtcacagagagttcaccatccagcttccatggtgtcgcagaaa900 gttgatcctccgtttccgaaggtattacataaggaaaccggatctgttgttcgcctacca960 gaagctacccggcctac,tgttcttcaaaaacccaaggacttgcctgctatcaagcagcag1020 gatatcaggacctcttcctcaaaagaagagccctgcttctctggtaggaatgcagaagca1080 gttcaagtgcaagatactaagctctcccggtcagacatgaagaaaatccgcaaagctgag1140 aaaaaagataagaagttcagagatctgtttgttacctggaatccggtattgatagagaat1200 gaaggttcagatcttggtgatgaagactggctgttcagcagtaaaaggaactccgatgct1260 atcatggttcaaagcagagctactgatagttcagtgccgatccatccaatggtgcagcag1320 aagccttctttacaacccagggcaacatttttgccggaccttaatatgtaccagctgcca1380 tatgtcgtaccattttaaacatctggcgaggtagatgagaattagatgagatgttgggag1440 agagctg 1447 <210> 35 <211> 1347 <212> DNA
<213> Zea mays mays strain BS7 <220>
<221> CDS
<222> (1)..(1347) <220>
<221> misc_feature <222> (1). (1347) <223> The Xaa at position 170 stands for Pro or Leu <400> 35 atg tcg agg tgc ttc ccc tac ccg cca ccg ggg tac gtg cgg aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtg gcc gtg gcc gag ccg gag tcg acc get aag ctc ctg aaa gaa aag 96 Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys gaa aag gcc gaa aag aag aaa gag aaa agg agt gac agg aaa get ccc 144 Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys A1a Pro aagcagtgtgag acgtcc aaacattca aagcacagc cataagaag aga 192 LysGlnCysGlu ThrSer LysHisSer LysHisSer HisLysLys Arg aagcttgaagat gtcatc aaagetgag cagggtccc aaaagagta ccc 240 LysLeuGluAsp ValIle LysAlaGlu GlnGlyPro LysArgVal Pro aaagaatcagtt gagcag ttggagaag agtggactc tcagaagag cat 288 LysGluSerVal GluGln LeuGluLys SerGlyLeu SerGluGlu His ggagetccttct ttt,gta catacgata cgtgactct cctgagagc tca 336 GlyAlaProSer PheVa1 HisThrIle ArgAspSer ProGluSer Ser caggacagcggc aagaga cgaaaggtt gtcctgtcc agtcctagc caa 384 GlnAspSerGly LysArg ArgLysVa1 ValLeuSer SerProSer Gln cctaagaatgga aacatt cttcgcttc aagattaaa agtagtcaa gay 432 ProLysAsnGly AsnIle LeuArgPhe LysTleLys SerSerGln Asp ccccaatcaget gttctg gagaaacca agggttctt gagcaacca ttg 480 ProGlnSerAla ValLeu GluLysPro ArgValLeu GluGlnPro Leu gtccaacaaatg ggatca ggttcatcc cygtcgggc aagcaaaat tca 528 ValGlnGlnMet GlySer GlySerSer XaaSerGly LysGlnAsn Ser atccatcataag atgaatgtg agatct acctctggtcag cggagg gtc 576 IleHisHisLys MetAsnVal ArgSer ThrSerGlyGln ArgArg Val gatggtgactcc caagcagta caaaaa tgtttgattaca gaatcc ccg 624 AspGlyAspSer GlnAlaVal GlnLys CysLeuIleThr GluSer Pro gcaaagaccatg cagagactt gtcccc cagcctgcaget aaggtc aca 672 AlaLysThrMet GlnArgLeu ValPro GlnProAlaAla LysVal Thr catcctgttgat ccccagtca getgtt aaggtgccagtt ggaaga tcg 720 HisProValAsp ProGlnSer AlaVal LysValProVal GlyArg Ser ggcctacctctg aagtcttcg ggaagt gtggacccttcg cctget aga 768 GlyLeuProLeu LysSerSer GlySer ValAspProSer ProAla Arg gttatgagaaga tttgatcct ccacct gttaagatgatg tcacag 
aga 816 ValMetArgArg PheAspPro ProPro ValLysMetMet SerGln Arg gttcaccatcca gettccatg gtgtcg cagaaagttgat cctccg ttt 864 ValHisHisPro A1aSerMet ValSer GlnLysValAsp ProPro Phe ccgaaggtatta cataaggaa accgga tctgttgttcgc ctacca gaa 912 ProLysValLeu HisLysGlu ThrGly SerValValArg LeuPro Glu getacccggcct actgtt cttcaaaaa cccaaggac ttgcctget atc 960 AlaThrArgPro ThrVal LeuGlnLys ProLysAsp LeuProAla Ile aagcag-caggat atcagg acctcttcc tcaaaagaa gagccctgc ttc 1008 LysGlnGlnAsp IleArg ThrSerSer SerLysGlu GluProCys Phe tctggtaggaat gcagaa gcagttcaa gtgcaagat actaagctc tcc 1056 SerGlyArgAsn AlaGlu AlaValGln ValGlnAsp ThrLysLeu Ser cggtcagacatg aagaaa atccgcaaa getgagaaa aaagataag aag 1104 ArgSerAspMet LysLys IleArgLys AlaG1uLys LysAspLys Lys ttcagagatctg tttgtt acctggaat ccggtattg atagagaat gaa 1152 PheArgAspLeu PheVal ThrTrpAsn ProValLeu 21eGluAsn Glu ggttcagatctt ggtgat gaagactgg ctgttcagc agtaaaagg aac 1200 GlySerAspLeu GlyAsp GluAspTrp LeuPheSer SerLysArg Asn tccgatgetatc atggtt caaagcaga getactgat agttcagtg ccg 1248 SerAspAlaIle MetVal GlnSerArg AlaThrAsp SerSerVal Pro atc cat cca atg gtg cag cag aag cct tct tta caa ccc agg gca aca 1296 Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr ttt ttg ccg gac ctt aat atg tac cag ctg cca tat gtc gta cca ttt 1344 Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe taa 1347 <210> 36 <211> 448 <212> PRT
<213> Zea mays mays strain BS7 <220>
<221> misc_feature <222> (170)..(170) <223> The 'Xaa' at location 170 stands for Pro, or Leu.
<400> 36 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Xaa Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asp Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 37 <211> 2646 <212> DNA
<213> Zea mays mays strain HuoBai <400> 37

gcggggtagagcgcggtcgacgtcggcatgtcgaggtgcttcccctacccgccaccgggg60 tacgtgcggaacccagtggccgtggccgagccggagtcgaccgctaaggtttgttgaacc120 ttcggatttacacacgcacgtgccagatcgtttgttcaatctgtaggttttgcgcggatc180 tgtggtttgcgcgtgcgtgatgtgggtattgcccgtgccttgaaagctaaccgagctgag240 gaagtgtatggatcttgtgtagctgcacgaggtcctccaaatcgattgtaaaatttaagt300 tgtatggccggtaggccaagattgggttattccggttttcgaaaactggtagcatggtta360 tcggggacattgaaagaatggtagaacatcaaattcgattcaaaactgtgctagatttgc420 atatttagtcgccctaaaattacgtggacgtgggtgatccgaattggttgttgtatgatg480 gttggaagtgactggccaaatttttttgtttctcaaagttttctttgaaaaactgtttgt540 cgagcgtcaattcgtatttacctgaatttactaattcttaatacagtatgtcgttatttt600 gggctaagcttgtgtaagaagggtcgtttgacattttgtactgtattgatgctgttttgt660 gtttctttgttcggagcagcattcaatgctccttttgttgtttgagagaatctgatattt720 gccatcgtaccgaaagtccgaaaccaactattcaaattgggatttcatttcttttttttt780 ctactgtttttagagttctctttttcgctgctgtgctcttgtgggtcagtacgtgcattt840 ctctttttttctttttttttctgatgttactcttctgttgaccaaaggagttcagaatta900 ttttggccctgtatatcaatagcaaccaacaccatttattgagcccatttttagttttct960 tgttctgtagagtatgcattgttgcaggtcttaactgttgtcagggaagtaacgtgttca1020 acatgattgtaaacgaatacaattctgttgctaactgtgtaatgatgagaaggataattg1080 aataatctttgtgaagtattactgtctgaactgtacgcaaatgctacatttattctttgt1140 gttcgtgtaaatatcattatacataaaaatgctgcattgcattcccgtcgtccgttctaa1200 atcagaactg acgattgctc tggtggctga agct~cctgaa agaaaaggaa aaggccgaaa 1260 agaagaaaga gaaaaggagt gacaggaaag ctcccaagca gtgtgagacg tccaaacatt 1320 caaagcacag ccataagaag agaaagcttg aagatgtcat caaagctgag cagggtccca 1380 aaagagtacc caaagaatca gttgagcagt tggagaagag tggactctca gaagagcatg 1440 gagctccttc ttttgtacat acgatacgtg actctcctga gagctcacag gacagcggca 1500 agagacgaaa ggttgtcctg tccagtccta gccaacctaa gaatggtgag actattctct 1560 tgtttttgct attctgattg attttttatt atagaagaaa tcaatatctt gttcaggatt 1620 ttattcatcc caacttgatt ttacaggaaa cattcttcgc ttcaagatta aaagtagtca 1680 agatccccaa tcagctgttc tggagaaacc aagggttctt gagcaaccat tggtccaaca 1740 aatgggatca ggttcatccc tgtcgggcaa gcaaaattca atccatcata agatgaatgt 1800 
gagatctacc tctggtcagc ggagggtcaa tggtgactcc caagcagtac aaaaatgttt 1860 gattacagaa tccccggcaa agaccatgca gagacttgtc ccccagcctg cagctaaggt 1920 cacacatcct gttgatcccc agtcagctgt taaggtgcca gttggaagat cgggcctacc 1980 tctgaagtct tcgggaagtg tggacccttc gcctgctaga gttatgagaa gatttgatcc 2040 tccacctgtt aagatgatgt cacagagagt tcaccatcca gcttccatgg tgtcgcagaa 2100 agttgatcct ccgtttccga aggtattaca taaggaaacc ggatctgttg ttcgcctacc 2160 agaagctacc cggcctactg ttcttcaaaa acccaaggac ttgcctgcta tcaagcagca 2220 ggatatcagg acctcttcct caaaagaaga gccctgcttc tctggtagga atgcagaagc 2280 agttcaagtg caagatacta agctctcccg gtcagacatg aagaaaatcc gcaaagctga 2340 gaaaaaagat aagaagttca gagatctgtt tgttacctgg aatccggtat tgatagagaa 2400 tgaaggttca gatcttggtg atgaagactg gctgttcagc agtaaaagga actccgatgc 2460 tatcatggtt caaagcagag ctactgatag ttcagtgccg atccatccaa tggtgcagca 2520 gaagccttct ttacaaccca gggcaacatt tttgccggac cttaatatgt accagctgcc 2580 atatgtcgta ccattttaaa catctggcga ggtagatgag aattagatga gatgttggga 2640 gagagc . 2646 <210> 38 <211> 1347 <212> DNA
<213> Zea mays mays strain HuoBai <220>
<221> CDS
<222> (1)..(1347) <400> 38

atgtcg aggtgcttc ccctacccg ccaccgggg tacgtgcgg aaccca 48 MetSer ArgCysPhe ProTyrPro ProProGly TyrValArg AsnPro gtggcc gtggccgag ccggagtcg accgetaag ctcctgaaa gaaaag 96 ValAla ValAlaGlu ProGluSer ThrAlaLys LeuLeuLys GluLys gaaaag gccgaaaag aagaaagag aaaaggagt gacaggaaa getccc 144 GluLys AlaGluLys LysLysGlu LysArg5er AspArgLys AlaPro aagcag tgtgagacg tccaaacat tcaaagcac agccataag aagaga 192 LysGln CysGluThr SerLysHis SerLysHis SerHisLys LysArg aagctt gaagatgtc atcaaaget gagcagggt cccaaaaga gtaccc 240 LysLeu G1uAspVal IleLysAla GluGlnGly ProLysArg ValPro aaagaa tcagttgag cagttggag aagagtgga ctctcagaa gagcat 288 LysGlu SerValGlu GlnLeuGlu LysSerGly LeuSerGlu GluHis ggaget ccttctttt gtacatacg atacgtgac tctcctgag agctca 336 GlyAla ProSerPhe Va1HisThr IleArgAsp SerProGlu SerSer caggac agcggcaag agacgaaag gttgtcctg tccagtcct agccaa 384 G1nAsp SerG1yLys ArgArgLys ValValLeu SerSerPro SerGln cctaag aatggaaac attcttcgc ttcaagatt aaaagtagt caagat 432 ProLys AsnGlyAsn IleLeuArg PheLysIle LysSerSer GlnAsp ccccaa tcagetgtt ctggagaaa ccaagggtt cttgagcaa ccattg 480 ProGln SerAlaVal Leu.GluLys ProArgVal LeuGluGln ProLeu gtccaa caaatggga tcaggttca tccctgtcg ggcaagcaa aattca 528 ValGln GlnMetGly SerG1ySer SerLeuSer GlyLysGln AsnSer atccat cataagatg aatgtgaga tctacctct ggtcagcgg agggtc 576 IleHis HisLysMet AsnValArg SerThrSer GlyGlnArg ArgVal aatggt gactcccaa gcagtacaa aaatgtttg attacagaa tccccg 624 AsnG1y AspSerG1n AlaValGln LysCysLeu IleThrGlu SerPro gcaaag accatgcag agacttgtc ccccagcct gcagetaag gtcaca 672 AlaLys ThrMetGln ArgLeuVal ProGlnPro AlaAlaLys ValThr catcct gttgatccc cagtcaget gttaaggtg ccagttgga agatcg 720 HisPro ValAspPro GlnSerAla ValLysVal ProValGly ArgSer ggcctacct ctgaag tcttcggga agtgtggac ccttcgcct getaga 768 GlyLeuPro LeuLys SerSerGly SerValAsp ProSerPro AlaArg gttatgaga agattt gatcctcca cctgttaag atgatgtca cagaga 816 ValMetArg ArgPhe AspProPro ProValLys MetMetSer GlnArg gttcaccat ccaget tccatggtg tcgcagaaa gttgatcct ccgttt 864 ValHisHis ProAla SerMetVal SerGlnLys 
ValAspPro ProPhe 275 ~ 280 285 ccgaaggta ttacat aaggaaacc ggatctgtt gttcgccta ccagaa 912 ProLysVal LeuHis LysGluThr GlySerVal ValArgLeu ProGlu getacccgg cctact gttcttcaa aaacccaag gacttgcct getatc 960 AlaThrArg ProThr ValLeuGln LysProLys AspLeuPro AlaIle aagcagcag gatatc aggacctct tcctcaaaa gaagagccc tgcttc 1008 LysGlnGln AspIle ArgThrSer SerSerLys GluGluPro CysPhe tctggtagg aatgca gaagcagtt caagtgcaa gatactaag ctctcc 1056 SerGlyArg AsnAla G1uAlaVal GlnValGln AspThrLys LeuSer cggtcagac atgaag aaaatccgc aaagetgag aaaaaagat aagaag 1104 ArgSerAsp MetLys LysI1eArg Lys~AlaGlu LysLysAsp LysLys ttcagagat ctgttt gttacctgg aatccggta ttgatagag aatgaa 1152 PheArgAsp LeuPhe ValThrTrp AsnProVal LeuIleGlu AsnGlu ggttcagat cttggt gatgaagac tggctgttc agcagtaaa aggaac 1200 GlySerAsp LeuGly AspGluAsp TrpLeuPhe SerSerLys ArgAsn tccgatget atcatg gttcaaagc agagetact gatagttca gtgccg 1248 SerAspAla IleMet ValGlnSer ArgAlaThr AspSerSer ValPro atccatcca atggtg cagcagaag ccttcttta caacccagg gcaaca 1296 IleHisPro MetVal GlnGlnLys ProSexLeu GlnProArg AlaThr tttttgccg gacctt aatatgtac cagctgcca tatgtcgta ccattt 1344 PheLeuPro AspLeu AsnMetTyr GlnLeuPro TyrValVa1 ProPhe taa 1347 <210> 39 <211> 448 <212> PRT

<213> Zea mays mays strain Huo Bai <400> 39 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 40 <211> 262 <212> DNA
<213> Zea mays mays strain Makki <400> 40 gaacgaattt gaatcctttg tgatctctac ggcggggtag agcgcggtcg accgtcggcc 60 atgtcgaggt gcttccccta cccgccaccg gggtacgtgc ggaacccagt ggccgtggcc 120 gagccggagt cgaccgctaa ggtttgttga accttcggat ttacacacgc acgtgccaga 180 tcgtttgttc aatctgtagg ttttgcgcgg atctgtggtt tgcgcgtgcg tgatgtgggt 240 attgcccgtg ccttgaaagc to 262 <210> 41 <211> 2311 <212> DNA
<213> Zea mays mays strain Makki <400> 41
tttcgaaaactggtagcatggttatcggggacattgaaagaatggtagaacatcaaattc60 gattcaaaactgtgctagatttgcatatttagtcgccctaaaattacgtggacgtgggtg120 atccgaattggttgttgtatgatggttggaagtgactggccaaatttttttgtttctcaa180 agttttctttgacaaactgtttgtcgagcgtcaattcgtatttacctgaatttactaatt240 cttaatacagtatgtcgttattttgggctaagcttgtgtaagaagggtcgtttgacattt300 tgtactgtattgatgctgttttgtgtttctttgttcggagcagcattcaatgctcctttt360 gttgtttgagagaatctgatatttgccatcgtaccgaaagtccgaaaccaactattcaaa420 ttgggatttcatttctttttttttctactgtttttagagttctctttttcgctgctgtgc480 tcttgtgggtcagtacgtgcatttctctttttttctttttttttctgatgttactcttct540 gttgaccaaaggagttcagaattattttggacctgtatatcaatagcaaccaacaccatt600 tattgagcccatttttagttttcttgttctgtagagtatgcattgttgcaggtcttaactX60 gttgtcagggaagtaacgtgttcaacatgattgtaaacgaatacaattctgttgctaact720 gtgtaatgatgagaaggataattgaataatctttgtgaagtattactgtctgaactgtac780 gcaaatgctacattcattctttgtgttcgtgtaaatatcattatacataaaaatgctgca840 ttgcattcccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctcc900 tgaaagaaaaggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaaagctccca960 agcagtgtgagacgtccaaacattcaaagcacagccataagaagagaaagcttgaagatg1020 tcatcaaagctgagcagggtcccaaaagagtacccaaagaatcagttgagcagttggaga1080 agagtggactctcagaagagcatggagctccttcttttgtacatacgatacgtgactctc1140 ctgagagctcacaggacagcggcaagagacgaaaggttgtcctgtccagtcctagccaac1200 ctaagaatggtgagactattctcttgtttttgctattctgattgattttttattatagaa1260 gaaatcaatcgcttgttcaggattttattcatcccaacttgattttacaggaaacattct1320 tcgcttcaagattaaaagtagtcaagatccccaatcagctgttctggagaaaccaagggt1380 tcttgagcaaccattggtccaacaaatgggatcaggttcatccctgtcgggcaagcaaaa1440 ttcaatccatcataagatgaatgtgagatctacctctggtcagcggagggtcaatggtga1500 ctcccaagcagtacaaaaatgtttgattacagaatccccggcaaagaccatgcagagact1560 tgtcccccagcctgcagctaaggtcacacatcctgttgatccccagtcagctgttaaggt1620 gccagttggaagatcgggcctacctctgaagtcttcrggaagtgtggacccttcgcctgc1680 tagagttatgagaagatttgatcctccacctgttaagatgatgtcacagagagttcacca1740 tccagcttccatggtgtcgcagaaagttgatcctccgtttccgaaggtattacataagga1800 aaccggatctgttgttcgcctaccagaagctacccggcctactgttcttcaaaaacccaa1860 
ggacttgcctgctatcaagcagcaggatatcaggacctcttcctcaaaagaagagccctg1920 cttctctggtaggaatgcagaagcagttcaagtgcaagatactaagctctcccggtcaga1980 catgaagaaaatccgcaaagctgagaaaaaagataagaagttcagagatctgtttgttac2040 ctggaatccggtattgatagagaatgaaggttcagatcttggtgatgaagactggctgtt2100 cagcagtaaaaggaactccgatgctatcatggttcaaagcagagctactgatagttcagt2160 gccgatccatccaatggtgcagcagaagccttctttacaacccagggcaacatttttgcc2220 ggaccttaatatgtaccagctgccatatgtcgtaccattttaaacatctggcgaggtaga2280 tgagaattagatgagatgttgggagagagct 2311 <210> 42 <211> 1347 <212> DNA
<213> Zea mays mays strain Makki <220>
<221> CDS
<222> (1)..(1347) <220>
<221> misc_feature <222> (1). (1347) <223> The Xaa at position 247 stands for Ser <400> 42 atg tcg agg tgc ttc ccc tac ccg cca ccg ggg tac gtg cgg aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtg gcc gtg gcc gag ccg gag tcg acc get aag ctc ctg aaa gaa aag 96 Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys G1u Lys gaa aag gcc gaa aag aag aaa gag aaa agg agt gac agg aaa get ccc 144 Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro aag cag tgt gag acg tcc aaa cat tca aag cac agc cat aag aag aga 192 Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg aagcttgaa gatgtcatc aaagetgag cagggt cccaaaaga gtaccc 240 LysLeuGlu AspVal.Ile LysAlaGlu GlnGly ProLysArg ValPro aaagaatca gttgagcag ttggagaag agtgga ctctcagaa gagcat 288 LysGluSer ValGluGln LeuG1uLys SerGly LeuSerGlu GluHis ggagetcct tcttttgta catacgata cgtgac tctcctgag.agctca 336 GlyAlaPro SerPheVal HisThrI1e ArgAsp SerProGlu SerSer caggacagc ggcaagaga cgaaaggtt gtcctg tccagtcct agccaa 384 GlnAspSer GlyLysArg ArgLysVal Va1Leu SerSerPro SerG1n 115 120 , 125 cctaagaat ggaaacatt cttcgcttc aagatt aaaagtagt caagat 432 ProLysAsn G1yAsnIle LeuArgPhe LysIle LysSerSer GlnAsp ccccaatca getgttctg gagaaacca agggtt cttgagcaa ccattg 480 ProGlnSer AlaValLeu GluLysPro ArgVal LeuGluGln ProLeu gtccaacaa atgggatca ggttcatcc ctgtcg ggcaagcaa aattca 528 ValG1nGln MetGlySer GlySerSer LeuSer GlyLysGln Asn5er atccatcat aagatgaat gtgagatct acctct ggtcagcgg agggtc 576 IleHisHis LysMetAsn ValArgSer ThrSer GlyGlnArg ArgVal aatggtgac tcccaagca gtacaaaaa tgtttg attacagaa tccccg 624 AsnGlyAsp SerG1nAla ValGlnLys CysLeu IleThrGlu SerPro gcaaagacc atgcagaga cttgtcccc cagcctgca getaaggtc aca 672 AlaLysThr MetGlnArg LeuValPro GlnProAla AlaLysVal Thr catcctgtt gatccccag tcagetgtt aaggtgcca gttggaaga tcg 720 HisProVal AspProGln SerAlaVal LysValPro ValGlyArg Ser ggcctacct ctgaagtct t ggaagt gtggaccct tcgcctget aga 768 o GlyLeuPro LeuLysSer XaaGlySer ValAspPro SerProAla Arg gttatgaga agatttgat 
cctccacct gttaagatg atgtcacag aga 816 ValMetArg ArgPheAsp ProProPro ValLysMet MetSerGln Arg gttcaccat ccagettcc atggtgtcg cagaaagtt gatcctccg ttt 864 ValHisHis ProAlaSer MetValSer GlnLysVal AspProPro Phe ccg aag gta tta cat aag gaa acc gga tct gtt gtt cgc cta cca gaa 912 Pro Lys Val Leu His Lys Glu Thr G1y Ser Val Val Arg Leu Pro Glu get acc cgg cct act gtt ctt caa aaa ccc aag gac ttg cct get atc 960 Ala Thr Arg Pro Thr Val Leu G1n Lys Pro Lys Asp Leu Pro Ala Ile 305 ~ 310 315 320 aagcagcag gatatc aggacctct tcctcaaaa gaagagccc tgcttc 1008 LysGlnGln AspIle ArgThrSer SerSerLys GluGluPro CysPhe tctggtagg aatgca gaagcagtt caagtgcaa gatactaag ctctcc 1056 SerGlyArg AsnAla GluAlaVal GlnValGln AspThrLys LeuSer cggtcagac atgaag aaaatccgc aaagetgag aaaaaagat aagaag 1104 ArgSerAsp MetLys LysIleArg LysAlaGlu LysLysAsp LysLys ttcagagat ctgttt gttacctgg aatccggta ttgatagag aatgaa 1152 PheArgAsp LeuPhe ValThrTrp AsnProVal LeuIleGlu AsnGlu ggttcagat cttggt gatgaagac tggctgttc agcagtaaa aggaac 1200 GlySerAsp LeuGly AspGluAsp TrpLeuPhe SerSerLys ArgAsn tccgatget atcatg gttcaaagc agagetact gatagttca gtgccg 1248 SerAspAla IleMet ValGlnSer ArgA1aThr AspSerSer ValPro atccatcca atggtg cagcagaag ccttcttta caacccagg gcaaca 1296 IleHisPro MetVal GlnGlnLys ProSerLeu GlnProArg AlaThr tttttgccg gacctt aatatgtac cagctgcca tatgtcgta ccattt 1344 PheLeuPro AspLeu AsnMetTyr GlnLeuPro TyrValVal ProPhe taa 1347 <210> 43 <211> 448 <212> PRT

<213> Zea mays mays strain Makki <220>
<221> misc_feature <222> (247)..(247) <223> The 'Xaa' at location 247 stands for Ser.
<400> 43 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Xaa Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 44 <211> 125 <212> DNA
<213> Zea mays mays strain Minl3 <400> 44 ctttgtgatc tctcggcggg gtagagcgcg gtcgaccgtc ggccatgtcg aggtgcttcc 60 cctacccgcc accggggtac gtgcggaacc cagtggccgt ggccgagccg gagtcgaccg 120 ctaag 125 <210> 45

<211> 198 <212> DNA
<213> Zea mays mays strain Minl3 <400> 45 cttaatacag tatgtcgtta ttttgggcta agcttgtgta agaagggtcg tttgacattt 60 tgtactgtat tgatgctgtt ttgtgtttct ttgttcggag cagcattcaa tgctcctttt 120 gttgtttgag agaatctgat atttgccatc gtaccgaaag tccgaaacca actattcaaa 180 ttgggatttc atttcttt 198 <210> 46 <211> 1787 <212> DNA
<213> Zea mays mays strain Minl3 <400> 46 ttctgatgttactcttctgttgaccaaaggagttcagaattattttggccctgtatatca60 atagcaaccaacaccatttattgagcccatttttagttttcttgttctgtagagtatgca120 ttgttgcaggtcttaactgttgtcagggaagtaacgtgttcaacatgattgtaaacgaat180 acaattctgttgctaactgtgtaatgatgagaaggataattgaataatctttgtgaagta240 ttactgtctgaactgtacgcaaatgctacattcattctttgtgttcgtgtaaatatcatt300 atacataaaaatgctgcattgcattcccgtcgtccgttctaaatcagaactgacgattgc360 tctggtggctgaagctcctgaaagaaaaggaaaaggccgaaaagaagaaagagaaaagga420 gtgacaggaaagctcccaagcagtgtgagacgtccaaacattcaaagcacagccataaga480 agagaaagcttgaagatgtcatcaaagctgagcagggtcccaaaagagtacccaaagaat540 cagttgagcagttggagaagagtggactctcagaagagcatggagctccttcttttgtac600 atacgatacgtgactctcctgagagctcacaggacagcggcaagagacgaaaggttgtcc660 , tgtccagtcctagccaacctaagaatggtgagactattctcttgtttttgctattctgat720 tgattttttattatagaagaaatcaatcgcttgttcaggattttattcatcccaacttga780 ttttacaggaaacattcttcgcttcaagattaaaagtagtcaagatccccaatcagctgt840 tctggagaaaccaagggttcttgagcaaccattggtccaacaaatgggatcaggttcatc900 cctgtcgggcaagcaaaattcaatccatcataagatgaatgtgagatctacctctggtca960 gcggagggtcaatggtgactcccaagcagtacaaaaatgtttgattacagaatccccggc1020 aaagaccatgcagagacttgtcccccagcctgcagctaaggtcacacatcctgttgatcc1080 ccagtcagctgttaaggtgccagttggaagatcgggcctacctctgaagtcttcgggaag1140 tgtggacccttcgcctgctagagttatgagaagatttgatcctccacctgttaagatgat1200 gtcacagagagttcaccatccagcttccatggtgtcgcagaaagttgatcctccgtttcc1260 gaaggtattacataaggaaaccggatctgttgttcgcctaccagaagctacccggcctac1320 tgttcttcaaaaacccaaggacttgcctgctatcaagcagcaggatatcaggacctcttc1380 ctcaaaagaagagccctgcttctctggtaggaatgcagaagcagttcaagtgcaggatac1440 taagctctcccggtcagayatgaagaaaatccgcaaagctgagaaaaaagataagaagtt1500 cagagatctgtttgttacctggaatccggtattgatagagaatgaaggttcagatcttgg1560 tgatgaagactggctgttcagcagtaaaaggaactccgatgctatcatggttcaaagcag1620 agctactgatagttcagtgccgatccatccaatggtgcagcagaagccttctttacaacc1680 cagggcaacatttttgccggaccttaatatgtaccagctgccatatgtcgtaccatttta1740 aacatctggcgaggtagatgagaattagatgagatgttgggagagag 1787 <210> 47 <211> 1347 <212> DNA

<213> Zea mays mays strain Minl3 <220>

<221> CDS

<222> (1)..(1347) <400> 47 atg tcg tgcttc ccctacccg ccaccgggg tacgtgcgg aaccca 48 agg Met Ser CysPhe ProTyrPro ProProGly TyrValArg AsnPro Arg gtg gcc gccgag ccggagtcg accgetaag ctcctgaaa gaaaag 96 gtg Val Ala AlaGlu ProGluSer ThrAlaLys LeuLeuLys GluLys Val 20 " 25 30 gaa aag gaaaag aagaaagag aaaaggagt gacaggaaa getccc 144 gcc Glu Lys GluLys LysLysG1u LysArgSer AspArgLys AlaPro Ala aag cag gagacg tccaaacat tcaaagcac agccataag aagaga 192 tgt Lys Gln GluThr SerLysHis SerLysHis SerHisLys LysArg Cys aag ctt gatgtc atcaaaget gagcagggt cccaaaaga gtaccc 240 gaa Lys Leu AspVal IleLysAla GluGlnGly ProLysArg ValPro Glu aaa gaa gttgag cagttggag aagagtgga ctctcagaa gagcat 288 tca Lys Glu ValGlu GlnLeuGlu LysSerGly LeuSerGlu GluHis Ser gga get tctttt gtacatacg atacgtgac tctcctgag agctca 336 cct Gly Ala SerPhe ValHisThr IleArgAsp SerProG1u SerSer Pro cag gac agc ggc aag aga cga aag gtt gtc ctg tcc agt cct agc caa 384 G1n Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln cctaag aatggaaac attcttcgc ttcaagatt aaaagtagt caagat 432 ProLys AsnGlyAsn IleLeuArg PheLysIle LysSerSer G1nAsp ccccaa tcagetgtt ctggagaaa ccaagggtt cttgagcaa ccattg 480 ProGln SerAlaVal LeuGluLys ProArgVal LeuGluGln ProLeu gtccaa caaatggga tcaggttca tccctgtcg ggcaagcaa aattca 528 ValGln GlnMetGly SerGlySer SerLeuSer GlyLysGln AsnSer atccat cataagatg aatgtgaga tctacctct ggtcagcgg agggtc 576 IleHis HisLysMet AsnValArg SerThrSer GlyGlnArg ArgVal aatggt gactcccaa gcagtacaa aaatgtttg attacagaa tccccg 624 AsnGly AspSerG1n AlaValGln LysCysLeu IleThrGlu SerPro gcaaag accatgcag agacttgtc ccccagcct gcagetaag gtcaca 672 AlaLys ThrMetGln ArgLeuVal ProGlnPro AlaAlaLys ValThr catcct gttgatccc cagtcaget gttaaggtg ccagttgga agatcg 720 HisPro ValAspPro GlnSerAla ValLysVal ProValGly ArgSer ggccta cctctgaag tcttcggga agtgtggac ccttcgcct getaga 768 GlyLeu ProLeuLys SerSerGly SerValAsp ProSerPro AlaArg gttatg agaagattt gatcctcca cctgttaag atgatgtca cagaga 816 Va1Met ArgArgPhe AspProPro ProValLys MetMetSer GlnArg gttcac 
catccaget tccatggtg tcgcagaaa gttgatcct ccgttt 864 ValHis HisProA1a SerMetVal SerGlnLys ValAspPro ProPhe ccgaag gtattacat aaggaaacc ggatctgtt gttcgccta ccagaa 912 ProLys ValLeuHis LysGluThr G1ySerVal ValArgLeu ProGlu getacc cggcctact gttcttcaa aaacccaag gacttgcct getatc 960 AlaThr ArgProThr ValLeuGln LysProLys AspLeuPro AlaIle aagcag caggatatc aggacctct tcctcaaaa gaagagccc tgcttc 1008 LysGln GlnAspIle ArgThrSer SerSerLys GluGluPro CysPhe tctggt aggaatgca gaagcagtt caagtgcag gatactaag ctctcc 1056 SerGly ArgAsnAla GluAlaVal GlnValGln AspThrLys LeuSer cggtca gayatgaag aaaatccgc aaagetgag aaaaaagat aagaag 1104 ArgSer AspMetLys LysIleArg LysAlaGlu LysLysAsp LysLys ttcagagatctg tttgtt acctggaat ccggtattg atagagaat gaa 1152 PheArgAspLeu PheVal ThrTrpAsn ProValLeu IleGluAsn Glu ggttcagatctt ggtgat gaagactgg ctgttcagc agtaaaagg aac 1200 GlySerAspLeu GlyAsp GluAspTrp LeuPheSer SerLysArg Asn tccgatgetatc atggtt caaagcaga getactgat agttcagtg ccg 1248 SerAspAlaIle MetVal GlnSerArg AlaThrAsp SerSerVal Pro atccatccaatg gtgcag cagaagcct tctttacaa cccagggca aca 1296 IleHisProMet ValGln GlnLysPro SerLeuGln ProArgAla Thr tttttgccggac cttaat atgtaccag ctgccatat gtcgtacca ttt 1344 PheLeuProAsp LeuAsn MetTyrGln LeuProTyr ValValPro Phe taa 1347 <210> 48 <211> 448 <212> PRT
<213> Zea mays mays strain Minl3 <400> 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 49 <211> 495 <212> DNA
<213> Zea mays mays strain Pira <400> 49 ctcggcgggt agagcgcggt cgacgtcggc atgtcgaggt gcttccccta cccgccaccg 60 gggtacgtgcggaacccagtggccgtggccgagccggagtcgaccgctaaggttgttgaa120 ccttcggatttacacacgcacgtgccagatcgttgttcaatctgtaggttttgcgcggat180 ctgtggtttgcgcgtgcgtgatgtgggtattgsccgtgccttgaaagctaaccgagctga240 ggaagtgtatggatcttgtgtagctgcacgaggtcctccaaatcgattgtaaaatttaag300 ttgtatggscggtaggscaagattgggttagtccggttttcgaaaactggtagcatggtt360 atcggggacattgaaagaatggtagaacatcaaattcgattcaaaactgtgctagatttg420 catatttagtcgccctaaaattacgtggacgtgggtgatccgaattggttattgtatgat480 ggttggaatatgagc 495 <210> 50 <211> 1768 <212> DNA
<213> Zea mays mays strain Pira <400> 50 ctgttgacca atggagttca gaattatttt ggccctgtat atcaatagca accaacacca 60 tttattgagc ccatttttag ttttcttgtt ctgtagagta tgcattgttg caggtcttaa 120
ctgttgtcag ggaagtaacg tgttcaacat gattgtaaac gaatacattc tgttgctaac 180 tgtgtaatga tgagaaggat aattgaataa tctttgtgaa gtattactgt ctgaactgta 240 cgcaatgctacattcattctttgtgttcgtgtaaatatcattatacataaaaatgctgct300 tgcattcccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctcct360 gaaagaaaaggaaaaagccgaaaagaagaaagagaaaaggagtgacaggaaagctcccaa420 gcagtgtgagacgtccaaacattcaaagcacagccataagaagagaaagcttgaagatgt480 catcaaagctgagcagggtcccaaaagagtacccaaagaatcagttgagcagttggagaa540 gagtggactctcagaagagcatggagctccttcttttgtacatacgatacgtgactctcc600 tgagagctcacaggacagcggcaagagacgaaaggttgtcctgtccagtcctagccaacc660 taagaatggtgagactattctcttgtttttgctattctgattgatttattattatagaag720 aaatcaatcacttgttcaggattttattcatcccaacttgattttacaggaaacattctt780 cgcttcaagattaaaagtagtcaagatccccaatcagctgttctggagaaaccaagggtt840 cttgagcaaccattggtccaacaaatgggatcaggttcatccctgtctggcaagcaaaat900 tcaatccatcataagatgaatgtgagatctacctctggtcagcggagggtcaatggtgac960 tcccaagcagtacaaaaatgtttgattacagaatccccggcaaagaccatgcagagactt1020 gtcccccagcctgcagctaaggtcacacatcctgttgatccccagtcagctgttaaggtg1080 ccagttggaagatcgggcctacctctgaagtcttcgggaagtgtggacccttcgcctgct1140 agagttatgagaagatttgatcctccacctgttaagatgatgtcacagagagttcaccat1200 ccagcttccatggtgtcgcagaaagttgatcctccgtttccgaaggtattacataaggaa1260 accggatctgttgttcgcctaccagaagctacccggcctactgttcttcaaaaacccaag1320 gacttgcctgctatcaagcagcaggagatcaggacctcttyctcaaaagaagagccctgc1380 ttctctggtaggaatgcagaagcagttcaagtgcaggatactaagctctcccggtcagac1440 atgaagaaaatccgcaaagctgagaaaaaagataagaagttcagagatctgtttgttacc1500 tggaatccggtattgatagagaatgaaggttcagatcttggtgatgaagactggctgttc1560 agcagtaaaaggaactccgatgctatcatggttcaaagcagagctactgatagttcagtg1620 ccgatccatccaatggtgcagcagaagccttctttacaacccagggcaacatttttgccg1680 gaccttaatatgtaccagctgccatatgtcgtaccattttaaacatctggcgaggtagat1740 gagaattagatgagatgttgggagagag 1768 <210> 51 <211> 1347 <212> DNA
<213> Zea mays mays strain Pira <220>

<221> CDS
<222> (1)..(1347) <220>
<221> misc feature <222> (1).. (1347) <223> The 'Xaa' at location 329 stands for Ser, or Phe <400> 51 atg tcg agg tgc ttc ccc tac ccg cca ccg ggg tac gtg cgg aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtg gcc gtg gcc gag ccg gag tcg acc get aag ctc ctg aaa gaa aag 96 Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys gaa aaa gcc gaa aag aag aaa gag aaa agg agt gac agg aaa get ccc 144 Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro aagcag tgtgagacg tccaaa cattcaaag cacagccat aagaagaga 192 LysGln CysGluThr SerLys HisSerLys HisSerHis LysLysArg aagctt gaagatgtc atcaaa getgagcag ggtcccaaa agagtaccc 240 LysLeu GluAspVal TleLys AlaGluGln GlyProLys ArgValPro aaagaa tcagttgag cagttg gagaagagt ggactctca gaagagcat 288 LysGlu SerValGlu GlnLeu GluLysSer G1yLeu.SerGluGluHis ggaget ccttctttt gtacat acgatacgt gactctcct gagagctca 336 GlyAla ProSerPhe ValHis ThrIleArg AspSerPro GluSerSer caggac agcggcaag agacga aaggttgtc ctgtccagt cctagccaa 384 GlnAsp SerGlyLys ArgArg LysValVa1 LeuSerSer ProSerGln cctaag aatggaaac attctt cgcttcaag attaaaagt agtcaagat 432 ProLys AsnGlyAsn IleLeu ArgPheLys IleLysSer SerGlnAsp ccccaa tcagetgtt ctggag aaaccaagg gttcttgag caaccattg 480 ProGln SerAlaVal LeuGlu LysProArg ValLeuGlu GlnProLeu gtccaa caaatggga tcaggt tcat.ccctg tctggcaag caaaattca 528 ValG1n GlnMetGly SerGly SerSerLeu SerGlyLys GlnAsnSer atc cat cat aag atg aat gtg aga tct acc tct ggt cag cgg agg gtc 576 Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val aat ggt gac tcc caa gca gta caa aaa tgt ttg att aca gaa tcc ccg 624 Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu I1e Thr Glu Ser Pro gcaaagacc atgcagaga cttgtc ccccagcctgca getaag gtcaca 672 AlaLysThr MetGlnArg LeuVal ProGlnProAla AlaLys ValThr catcctgtt gatccccag tcaget gttaaggtgcca gttgga agatcg 720 HisProVal AspProGln SerAla ValLysValPro ValGly ArgSer ggcctacct ctgaagtct tcggga agtgtggaccct tcgcct getaga 768 GlyLeuPro LeuLysSer SerGly SerValAspPro SerPro A1aArg 
gttatgaga agatttgat cctcca cctgttaagatg atgtca cagaga 816 ValMetArg ArgPheAsp ProPro ProValLysMet MetSer G1nArg gttcaccat ccagettcc atggtg tcgcagaaagtt gatcct ccgttt 864 ValHisHis ProAlaSer MetVal SerGlnLysVal AspPro ProPhe ccgaaggta ttacataag gaaacc ggatctgttgtt cgccta ccagaa 912 ProLysVal LeuHisLys GluThr GlySerValVal ArgLeu ProGlu getaccegg cctactgtt ettcaa aaacceaag gacttgect getatc 960 A1aThrArg ProThrVal LeuGln LysProLys AspLeuPro AlaIle aagcagcag gagatcagg acctct tyctcaaaa gaagagccc tgcttc 1008 LysGlnGln GluIleArg ThrSer XaaSerLys GluGluPro CysPhe tctggtagg aatgca~gaa gcagtt caagtgcag gatactaag ctctcc 1056 SerGlyArg AsnAlaGlu AlaVal GlnValGln AspThrLys LeuSer cggtcagac atgaagaaa ateege aaagetgag aaaaaagat aagaag 1104 ArgSerAsp MetLysLys IleArg LysAlaGlu LysLysAsp LysLys ttcagagat ctgtttgtt acctgg aatccggta ttgatagag aatgaa 1152 PheArgAsp LeuPheVal ThrTrp AsnProVal LeuIleG1u AsnGlu ggttcagat cttggtgat gaagac tggctgttc agcagtaaa aggaac 1200 GlySerAsp LeuGlyAsp GluAsp TrpLeuPhe Ser.SerLys ArgAsn tecgatget atcatggtt caaagc agagetact gatagttca gtgecg 1248 SerAspA1a IleMetVal GlnSer ArgAlaThr AspSerSer ValPro atccatcca atggtgcag cagaag ccttcttta caacccagg gcaaca 1296 IleHisPro MetValGln GlnLys ProSerLeu GlnProArg A1aThr ttt ttg ccg gac ctt aat atg tac cag ctg cca tat gtc gta cca ttt 1344 Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe taa <2l0> 52 <211> 448 <212> PRT
<213> Zea mays mays strain Pira <220>
<221> misc_feature <222> (329)..(329) <223> The 'Xaa' at location 329 stands for Ser, or Phe.
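The 'Xaa' annotations in this listing occur where a codon of the corresponding CDS contains a nucleotide ambiguity code, for example tyc at codon 329 of SEQ ID NO: 51 (y denotes c or t) and awg at codon 234 of SEQ ID NO: 55 (w denotes a or t). The following is a minimal sketch, assuming the standard genetic code and the IUPAC ambiguity codes, of how such a codon can be expanded into the residues it may encode. The function name possible_residues and the lookup tables are illustrative assumptions and do not appear in the listing.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes (lowercase DNA alphabet).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}

# Standard genetic code, laid out in the conventional t/c/a/g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter residue names, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def possible_residues(codon):
    """Return the set of residues an (optionally ambiguous) codon can encode."""
    choices = [IUPAC[base] for base in codon.lower()]
    return {
        THREE_LETTER[CODON_TABLE["".join(combo)]]
        for combo in product(*choices)
    }


if __name__ == "__main__":
    for codon in ("tyc", "awg", "gay"):
        print(codon, sorted(possible_residues(codon)))
```

An ambiguous codon whose expansions all encode the same residue, such as gay (gac or gat, both Asp), yields a single-element set and needs no Xaa; a codon such as tyc yields two residues, which is why the translated sequence carries Xaa at that position.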
<400> 52 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser
Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Glu Ile Arg Thr Ser Xaa Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 53 <211> 212 <212> DNA
<213> Zea mays mays strain Sari <400> 53 gcgcggtcga ccgtcggcat gtcgaggtgc ttcccctacc cgccaccggg gtacgtgcgg 60 aacccagtgg ccgtggccga gccggagtcg accgctaagg tttgttgaac cttcggattt 120 acacacgcac gtgccagatc gtttgttcaa tctgtaggtt ttgcgcggat ctgtggtttg 180 cgcgtgcgtg atgtgggtat tgcccgtgcc tt 212 <210> 54 <211> 1803 <212> DNA
<213> Zea mays mays strain Sari <400> 54
ttttttcctttttttttctgatgttactcttctgttgaccaaaggagttcagaattattt60 tggccctgtatatcaatagcaaccaacaccatttattgagcccatttttagttttcttgt120 tctgtagagtatgcattgttgcaggtcttaactgttgtcagggaagtaacgtgttcaaca180 tgattgtaaacgaatacaattctgttgctaactgtgtaatgatgagaaggataattgaat240 aatctttgtgaagtattactgtctgaactgtacgcaaatgctacattcattctttgtgtt300 cgtgtaaatatcattatacataaaaatgctgcattgcattcccgtcgtccgttctaaatc360 agaactgacgattgctctggtggctgaagctcctgaaagaaaaggaaaaggccgaaaaga420 agaaagagaaaaggagtgacaggaaagctcccaagcagtgtgagacgtccaaacattcaa480 agcacagccataagaagagaaagcttgaagatgtcatcaaagctgagcagggtcccaaaa540 gagtacccaaagaatcagttgagcagttggagaagagtggactctcagaagagcatggag600 ctccttcttttgtacatacgatacgtgactctcctgagagctcacaggacagcggcaaga660 gacgaaaggttgtcctgtccagtcctagccaacctaagaatggtgagactattctcttgt720 ttttgctattctgattgattttttattatagaagaaatcaatcgcttgttcaggatttta780 ttcatcccaacttgattttacaggaaacattcttcgcttcaagattaaaagtagtcaaga840 tccccaatcagctgttctggagaaaccaagggttcttgagcaaccattggtccaacaaat900 gggatcaggttcatccctgtcgggcaagcaaaattcaatccatcataagatgaatgtgag960 atctacctct ggtcagcgga gggtcaatgg tgactcccaa gcagtacaaa aatgtttgat 1020 tacagaatccccggcaaagaccatgcagagacttgtcccccagcctgcagctaaggtcac1080 acatcctgttgatccccagtcagctgttawggtgccagttggaagatcgggcctacctct1140 gaagtcttcgggaagtgtggacccttcgcctgctagagttatgagaagatttgatcctcc1200 acctgttaagatgatgtcacagagagttcaccatccagcttccatggtgtcgcagaaagt1260 tgatcctccgtttccgaaggtattacataaggaaaccggatctgttgttcgcctaccaga1320 agctacccggcctactgttcttcaaaaacccaaggacttgcctgctatcaagcagcagga1380 tatcaggacctcttcctcaaaagaagagccctgcttctctggtaggaatgcagaagcagt1440 tcaagtgcargatactaagctctcccggtcagayatgaagaaaatccgcaaagctgagaa1500 aaaagataagaagttcagagatctgtttgttacctggaatccggtattgatagagaatga1560 aggttcagatcttggtgatgaagactggctgttcagcagtaaaaggaactccgatgctat1620 catggttcaaagcagagctactgatagttcagtgccgatccatccaatggtgcagcagaa1680 gccttctttacaacccagggcaacatttttgccggaccttaatatgtaccagctgccata1740 tgtcgtaccattttaaacatctggcgaggtagatgagaattagatgagatgttgggagag1800 agc 1803 <210> 55 <211> 1347 <212> DNA
<213> Zea mays mays strain Sari <220>
<221> CDS
<222> (1)..(1347) <220>
<221> misc_feature <222> (1)..(1347) <223> The Xaa at position 234 stands for Lys or Met <400> 55
atg agg tgc ccc tacccgcca ccggggtac gtgcggaac cca 48 tcg ttc Met Arg Cys Pro TyrProPro ProGlyTyr ValArgAsn Pro Ser Phe gtg gtg gcc ccg gagtcgacc getaagctc ctgaaagaa aag 96 gcc gag Val Val Ala Pro GluSerThr AlaLysLeu LeuLysGlu Lys Ala Glu 20 , 25 30 gaa gcc gaa aag aaagagaaa aggagtgac aggaaaget ccc 144 aag aag Glu Ala Glu Lys LysGluLys ArgSerAsp ArgLysAla Pro Lys Lys aag tgt gag tcc aaacattca aagcacagc cataagaag aga 192 cag acg Lys Cys Glu Ser LysHisSer LysHisSer HisLysLys Arg Gln Thr aagctt gaagatgtc atcaaaget gagcagggt cccaaaaga gtaccc 240 LysLeu GluAspVal IleLysAla GluGlnGly ProLysArg ValPro aaagaa tcagttgag cagttggag aagagtgga ctctcagaa gagcat 288 LysGlu SerValGlu GlnLeuGlu LysSerGly LeuSerGlu GluHis ggaget ccttctttt gtacatacg atacgtgac tctcctgag agctca 336 GlyAla ProSerPhe ValHisThr IleArgAsp SerProGlu SerSer caggac agcggcaag agacgaaag gttgtcctg tccagtcct agccaa 384 GlnAsp SerGlyLys ArgArgLys ValValLeu SerSerPro SerGln cctaag aatggaaac attcttcgc ttcaagatt aaaagtagt caagat 432 ProLys AsnGlyAsn IleLeuArg PheLysI1e LysSerSer GlnAsp ccccaa tcagetgtt ctggagaaa ccaagggtt cttgagcaa ccattg 480 ProGln SerA1aVal LeuGluLys ProArgVal LeuGluG1n ProLeu gtccaa caaatggga tcaggttca tccctgtcg ggcaagcaa aattca 528 ValGln GlnMetGly SerG1ySer SerLeuSer GlyLysGln AsnSer atccat cataagatg aatgtgaga tctacctct ggtcagcgg agggtc 576 IleHis HisLysMet AsnValArg SerThrSer GlyGlnArg ArgVal aatggt gactcccaa gcagtacaa aaatgtttg attacagaa tccccg 624 AsnGly AspSerGln AlaValGln LysCysLeu IleThrGlu SerPro gcaaag accatgcag agacttgtc ccccagcct gcagetaag gtcaca 672 AlaLys ThrMetGln ArgLeuVal ProGlnPro AlaAlaLys ValThr catcct gttgatccc cagtcaget gttawggtg ccagttgga agatcg 720 HisPro ValAspPro GlnSerAla Va1XaaVal ProValGly ArgSer ggccta cctctgaag tcttcggga agtgtggac ccttogcct getaga 768 GlyLeu ProLeuLys SerSerGly SerValAsp ProSerPro AlaArg gttatg agaagattt gatcctcca cctgttaag atgatgtca cagaga 816 ValMet ArgArgPhe AspProPro ProValLys MetMetSer GlnArg gttcac catccaget tccatggtg tcgcagaaa gttgatcct ccgttt 
864 ValHis HisProAla SerMetVal SerGlnLys ValAspPro ProPhe ccgaag gtattacat aaggaaacc ggatctgtt gttcgccta ccagaa 912 ProLys ValLeuHis LysGluThr Gly Va1 ValArgLeu ProGlu Ser getacccgg cctactgtt cttcaaaaa cccaaggac ttgcct getatc 960 AlaThrArg ProThrVal LeuGlnLys ProLysAsp LeuPro AlaIle aagcagcag gatatcagg acctcttcc tcaaaagaa gagccc tgcttc 1008 LysGlnGln AspIleArg ThrSerSer SerLysGlu GluPro CysPhe tctggtagg aatgcagaa gcagttcaa gtgcargat actaag ctctcc 1056 SerGlyArg AsnAlaGlu AlaValGln ValGlnAsp ThrLys LeuSer cggtcagay atgaagaaa atccgcaaa getgagaaa aaagat aagaag 1104 ArgSerAsp MetLysLys IleArgLys AlaGluLys LysAsp LysLys ttcagagat ctgtttgtt acctggaat ccggtattg atagag aatgaa 1152 PheArgAsp LeuPheVal ThrTrpAsn ProValLeu IleGlu AsnGlu ggttcagat cttggtgat gaagactgg ctgttcagc agtaaa aggaac 1200 GlySerAsp LeuGlyAsp GluAspTrp LeuPheSer SerLys ArgAsn tccgatget atcatggtt caaagcaga getactgat agttca gtgccg 1248 SerAspAla IleMetVal GlnSerArg AlaThrAsp SerSer ValPro atccatcca atggtgcag cagaagcct tctttacaa cccagg gcaaca 1296 IleHisPro MetValGln GlnLysPro SerLeuG1n ProArg AlaThr tttttgccg gaccttaat atgtaccag ctgccatat gtcgta .ccattt 1344 PheLeuPro AspLeuAsn MetTyrGln LeuProTyr ValVal ProPhe taa <210> 56 <211> 448 <212> PRT
<213> Zea mays mays strain Sari <220>
<221> misc_feature <222> (234)..(234) <223> The 'Xaa' at location 234 stands for Lys, or Met.
<400> 56 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu G1n Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser G1u Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln 115 120 l25 Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro G1n Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln Gln Met G1y Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro A1a Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Xaa Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser G1n Arg Val His His Pro Ala~Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu A1a Thr Arg Pro Thr Va1 Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro 405 410 4l5 Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210>

57 <211> 305 <212> DNA
<213> Zea mays mays strain Smena <400> 57

gattgatttcgagcgattcgattccttgtgatctctcggcggggtagagcgcggtcgacc 60 gtcggccatgtcgaggtgcttcccctacccgccaccggggtacgtgcggaacccagtggc 120 cgtggccgagccggagtcgaccgctaaggtttgttgaaccttcggatttacacacgcacg 180 tgccagatcgtttgttcaatatgtaggttttgcgcggatctgtggtttgcgcgtgcgtga 240 tgtgggtattgcccgtgcctaagctaaccgagctgaggaagtgtatggatcttgtgtagc 300 tgcac 305 <210> 58 <211> 2208 <212> DNA
<213> Zea mays mays strain Smena <400> 58

tttagtcgccctaaaaatacgtggacgtgggtgatccgaattggttgttgtatgatggtt 60 ggaatatgagccatctagtgcttccgtgactggccaaatttttttgtttctcaaagtttt 120 ctttgaaaaactgtttgtcgagcgtcaattcgtatttacctgaatttactaattcttaat 180

acagtatgtcgttattttgggctaagcttgtgtaagaagggtcgtttgacattttgtact240 gtattaatgctgttttgtgtttctttgttcggagcagcattcaatgctccttttgttgtt300 tgagagaatctgatatttgccatcgtaccgaaagtccgaaaccaactattcaaattggga360 tttcatttctttttttttctactgtttttagagttctctttttcgctgctgtgctcttgt420 gggtcagtacgtgcatttctctctttttttctttttttttctgatgttactcttctgttg480 accaaaggagttcagaattattttggccctgtatatcaatttgcaaccaacaccatttat540 tgagcccatttttagttttcttgttctgtagagttatgcattgtttcaggtcttaactgt600 tgtcagggaagtaacgtgttcaacatgattgtaaacgaatacaattctgttgctaactgt660 gtaatgatgagaaggataattgaatagtctttgtgaagtattactgtctgaactgtacgc720 aaatgctacattcattctgtgttcatgtaaatatcattatacataaaaatgctgcattgc780 attcccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctcctgaa840 agaaaaggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaaagatcccaagca900 gtgtgagacgtccaaacactcaaagcacagccataagaagagaaagcttgaagatgtcat960 caaagctgagcagggtcccaaaagagtacccaaagaatcagttgagcagttggagaagag1020 tggactctcagaagagcatggagctccttcttttgtacatacgatacgggactctcctga1080 gagctcacaggacagcggcaagagacgaaaggttgtcctgtccagtcctagccaacctaa1140 gaatggtgagactattctcttgtttttgctattctgattgatttattattatagaagaaa1200 tcaatcacttgttcaggattttattcatcccaacttgattttacaggaaacattcttcgc1260 ttcaagattaaaagtagtcaagatccccaatcagctgttctggagaaaccaagggttctt1320 gagcaaccattggtccaacaaatgggatcaggttcatccctgtcgggcaagcaaaattca1380 atccatcataagatgaatgtgagatctacctctggtcagcggagggtcaatggtgactcc1440 caagcagtac aaaaatgttt gattacagaa tccccggcaa agaccatgca gagacttgtc 1500 ccccagcctg cagctaaggt cacacatcct gttgatcccc agtcagctgt taaggtgcca 1560 gttggaagat cgggcctacc tctgaagtct tcaggaagtg tggacccttc gcctgctaga 1620 8~

gttatgagaa gatttgatcctccacctgttaagatgatgtcacagagagttcaccatcca1680 gcttccatgg tgtcgcagaaagttgatcctccgtttccgaaggtattacataaggaaacc1740 ggatctgttg ttcgcctaccagaagctacccggcctactgttcttcaaaaacccaaggac1800 ttgccttcta tcaagcagcaggagatcaggacctcttcctcaaaagaagagccctgcttc1860 tctggtagga atgcagaagctgttcaagtgcaggatactaagctctcccggtcagatatg1920 aagaaaatcc gcaaagctgagaaaaaagataagaagttcagagatctgtttgttacctgg1980 aatccggtat tgatagagaatgaaggttcagatcttggtgatgaagactggctgttcagc2040 agtaaaagga actccgatgctatcatggttcaaagcagagctactgatagttcagtgccg2100 atccatccaa tggtgcagcagaagccttctttacaacccagggcaacatttttgccggac2160 cttaatatgt accagctgcc atatgtcgta ccattttaaa catctggc 2208 <210> 59 <211> 1640 <212> DNA
<213> Zea mays parviglumis strain Wilkes <400> 59

tcagggaagtaacgtgttcaacatgattgtaaacgaataccattctgttgctaactgtgt 60 aatgatgagaaggataattgaataatctttgtgaagtattactgtctgaactgtacgcct 120 aatgctacattcattctttgtgttcgtgtaaatatcattatacataaatgctgcattgca 180 ttcccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctcctgaaa 240 gaaaaggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaaagctcccaagcag 300 tgtgagacgtccaaacattcaaagcacagccataagaagagaaagcttgaagatgtcatc 360 aaagctgagcagggtcccaaaagagtacccaaagaatcagttgagcagttggagaagagt 420 ggactctcagaagagcatggagctccttcttttgtacatacgatacgtgactctcctgag 480 agctcacaggacagcggcaagagacgaaaggttgtcctgtccagtcctagccaacctaag540 aatggtgagactattctcttgtttttgctattctgattgattttttattatagaagaaat600 caatcgcttgttcaggattttattcatcccaacttgattttacaggaaacattcttcgct660 tcaagattaaaagtagtcaagatccccaatcagctgttctggagaaaccaagggttcttg720 agcaaccattggtccaacaaatgggatcaggttcatccctgtcgggcaagcaaaattcaa780 tccatcataagatgaatgtgagatctacctctggtcagcggagggtcaatggtgactccc840 aagcagtacaaaaatgtttgattacagaatccccggcaaagaccatgcagagacttgtcc900 cccagcctgcagctaaggtcacacatcctgttgatccccagtcagctgttaaggtgccag960 ttggaagatcgggcctacctctgaagtcttcgggaagtgtggacccttcgcctgctagag1020 ttatgagaagatttgatcctccacctgttaagatgatgtcacagagagttcaccatccag1080 cttccatggtgtcgcagaaagttgatcctccgtttccgaaggtattacataaggaaaccg1140 gatctgttgttcgcctaccagaagctacccggcctactgttcttcaaaaacccaaggact1200 tgcctgctatcaagcagcaggatatcaggacctcttcctcaaaagaagagccctgcttct1260 ctggtaggaatgcagaagcagttcaagtgcaagatactaagctctcccggtcagacatga1320 agaaaatccgcaaagctgagaaaaaagataagaagttcagagatctgtttgttacctgga1380 atccggtattgatagagaatgaaggttcagatcttggtgatgaagactggctgttcagca1440 gtaaaaggaactccgatgctatcatggttcaaagcagagctactgatagttcagtgccga1500 tccatccaatggtgcagcagaagccttctttacaacccagggcaacatttttgccggacc1560 ttaatatgtaccagctgccatatgtcgtaccattttaaacatctggcgaggtagatgaga1620 attagatgagatgttgggag 1640 <210> 60 <211> 1347 <212> DNA
<213> Zea mays mays strain Smena <220>
<221> CDS
<222> (1)..(1347) <400> 60 atgtcgagg tgcttcccc tacccgcca ccggggtac gtgcggaac cca 48 MetSerArg CysPhePro TyrProPro ProGlyTyr ValArgAsn Pro gtggccgtg gccgagccg gagtcgacc getaagctc ctgaaagaa aag 96 ValAlaVal AlaGluPro GluSerThr AlaLysLeu LeuLysGlu Lys gaaaaggcc gaaaagaag aaagagaaa aggagtgac aggaaagat ccc 144 GluLysA1a GluLysLys LysGluLys ArgSerAsp ArgLysAsp Pro aagcagtgt gagacgtcc aaacactca aagcacagc cataagaag aga 192 LysGlnCys GluThrSer LysHisSer LysHisSer HisLysLys Arg aagcttgaa gatgtcatc aaagetgag cagggtccc aaaagagta ccc 240 LysLeuGlu AspValIle LysAlaGlu GlnGlyPro LysArgVal Pro.

aaagaatca gttgagcag ttggagaag agtggactc tcagaagag cat 288 LysGluSer ValGluGln LeuGluLys SerGlyLeu SerGluGlu His ggagetcct tcttttgta catacgata cgggactct cctgagagc tca 336 GlyAlaPro SerPheVal HisThrIle ArgAspSer ProGluSer Ser caggac agcggcaag agacgaaag gttgtcctg tccagtcct agccaa 384 GlnAsp SerGlyLys ArgArgLys ValValLeu SerSerPro SerGln cctaag aatggaaac attcttcgc ttcaagatt aaaagtagt caagat 432 ProLys AsnGlyAsn IleLeuArg PheLysIle LysSerSer GlnAsp 130 l35 140 ccccaa tcagetgtt ctggagaaa ccaagggtt cttgagcaa ccattg 480 ProGln SerAlaVal LeuGluLys ProArgVal LeuGluGln ProLeu gtccaa caaatggga tcaggttca tccctgtcg ggcaagcaa aattca 528 ValGln GlnMetGly SerGlySer SerLeuSer GlyLysGln AsnSer atccat oataagatg aatgtgaga tctacctct ggtcagcgg agggtc 576 IleHis HisLysMet AsnValArg SerThrSer GlyGlnArg ArgVal aatggt gactcccaa gcagtacaa aaatgtttg attacagaa tccccg 624 AsnGly AspSerGln AlaValGln LysCysLeu IleThrGlu SerPro gcaaag accatgcag agacttgtc ccccagcct gcagetaag gtcaca 672 AlaLys ThrMetG1n ArgLeuVal ProGlnPro AlaAlaLys ValThr catcct gttgatccc cagtcaget gttaaggtg ccagttgga agatcg 720 HisPro ValAspPro GlnSerAla ValLysVal ProValGly ArgSer ggccta cctctgaag tcttcagga agtgtggac ccttcgcct getaga 768 GlyLeu ProLeuLys SerSerGly SerValAsp ProSerPro AlaArg gttatg agaagattt gatcctcca cctgttaag atgatgtca cagaga 816 ValMet ArgArgPhe AspProPro ProValLys MetMetSer GlnArg gttcac catccaget tccatggtg tcgcagaaa gttgatcct ccgttt 864 ValHis HisProA1a SerMetVal SerGlnLys ValAspPro ProPhe ccgaag gtattacat aaggaaacc ggatctgtt gttcgccta ccagaa 912 ProLys ValLeuHis LysGluThr GlySerVal ValArgLeu ProGlu getacc cggcctact gttcttcaa aaacccaag gacttgcct tctatc 960 AlaThr ArgProThr ValLeuGln LysProLys AspLeuPro SerIle aagcag caggagatc aggacctct tcctcaaaa gaagagccc tgcttc 1008 LysGln GlnGluIle ArgThrSer SerSerLys GluGluPro CysPhe tctggt aggaatgca gaagetgtt caagtgcag gatactaag ctctcc 1056 SerGly AsnAla GluAlaVal GlnValGln AspThrLys LeuSer Arg cgg tcagatatg aagaaaatc cgcaaaget gagaaaaaa gataag aag 1104 Arg SerAspMet 
LysLysTle ArgLysAla GluLysLys AspLys Lys ttc agagatctg tttgttacc tggaatccg gtattgata gagaat gaa 1152 Phe ArgAspLeu PheValThr TrpAsnPro ValLeuIle GluAsn Glu ggt tcagatctt ggtgatgaa gactggctg ttcagcagt aaaagg aac 1200 Gly SerAspLeu GlyAspGlu AspTrpLeu PheSerSex LysArg Asn tcc gatgetatc atggttcaa agcagaget actgatagt tcagtg ccg 1248 Ser AspAlaIle MetValGln SerArg,AlaThrAspSer SerVal Pro atc catccaatg gtgcagcag aagccttct ttacaaccc agggca aca 1296 Ile HisProMet ValGlnGln LysProSer LeuGlnPro ArgAla Thr ttt ttgccggac cttaatatg taccagctg ccatatgtc gtacca ttt 1344 Phe LeuProAsp LeuAsnMet TyrGlnLeu ProTyrVal ValPro Phe taa <2l0>

61 <211> 448 <212> PRT
<213> Zea mays mays strain Smena <400> 61 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Asp Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln 115 120 125 Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu 145 150 155 160 Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ser Ile Lys Gln Gln Glu Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210>

62 <211> 893 <212> DNA
<213> Zea mays mays strain <400> 62

atgtcgaggtgcttcccctacccgccaccggggtacgtgcggaacccagtggccgtggcc 60 gagccggagtcgaccgctaaggtttgttgaaccttcggatttacacacgcacgtgccaga 120 tcgtttgttcaatctgtaggttttgcgcggatctgtggtttgcgcgtgcgtgatgtggcc 180 ctgtgccttgaaagctaaccgagctgaggaagtgtatggatcttgtgtagctgcacgagg 240 tcctccaaatcgattgtaaaatttaagttgtatggccggtaggccaagattgggttagtc 300 cggttttcgaaaactggtagcatggttatcggggacattgaaagaatggtagaacatcaa 360 attcgattcaaaactgtgctagatttgcatatttagtcgccctaaaattacgtggacgtg 420 ggtgatccgaattggttgttgtatgatggttggaagtgactggccaaatttttttgtttc 480 tcaaagttttctttgaaaaactgtttgtcgagcgtcaattcgtatttacctgaatttact 540 aattcttaatacagtatttcgttattttcggctaagcttgtgtaagaagggtcgtttgac 600 attttgtactgtattaatgctgttttgtgtttctttgttcggagcagcattcaatgctcc 660 ttttgttgtttgagagaatctgatatttgccatcgtaccgaaagtccgaaaccaactatt 720 caaattgggatttcatttcttttttctactgtttttagagttctctttttcgctgctgtg 780 ctcttgtgggtcagtacgtgcatttctctttttttttctgatgttactcttctgttgacc 840 aaaggagttcagaattattttggccctgtatatcaatagcaaccaacaccatt 893 <210> 63 <211> 1411 <212> DNA
<213> Zea mays mays strain W22 <400> 63

ctcctgaaagaaaaggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaaagct60 cccaagcagtgtgagacgtccaaacattcaaagcacagccataagaagagaaagcttgaa120 gatgtcatcaaagctgagcagggtcccaaaagagtacccaaagaatcagttgagcagttg180 gagaagagtggactctcagaagagcatggagctccttcttttgtacatacgatacgtgac240 tctcctgagagctcacaggacagcggcaagagacgaaaggttgtcctgtccagtcctagc300 caacctaagaatggtgagactattctcttgtttttgctattctgattgattttttattat360 agaagaaatcaatcgcttgttcaggattttattcatcccaacttgattttacaggaaaca420 ttcttcgcttcaagattaaaagtagtcaagacccccaatcagctgttctggagaaaccaa480 gggttcttgagcaaccattggtccaacaaatgggatcaggttcatccccgtcgggcaagc540 aaaattcaatccatcataagatgaatgtgagatctacctctggtcagcggagggtcgatg600 gtgactcccaagcagtacaaaaatgtttgattacagaatccccggcaaagaccatgcaga660 gacttgtcccccagcctgcagctaaggtcacacatcctgttgatccccagtcagctgtta720 aggtgccagttggaagatcgggcctacctctgaagtcttcgggaagtgtggacccttcgc780 ctgctagagttatgagaagatttgatcctccacctgttaagatgatgtcacagagagttc840 accatccagcttccatggtgtcgcagaaagttgatcctccgtttccgaaggtattacata900 aggaaaccggatctgttgttcgcctaccagaagctacccggcctactgttcttcaaaaac960 ccaaggacttgcctgctatcaagcagcaggatatcaggacctcttcctcaaaagaagagc1020 cctgcttctctggtaggaatgcagaagcagttcaagtgcaagatactaagctctcccggt1080 cagacatgaagaaaatccgcaaagctgagaaaaaagataagaagttcagagatctgtttg1140 ttacctggaatccggtattgatagagaatgaaggttcagatcttggtgatgaagactggc1200 tgttcagcagtaaaaggaactccgatgctatcatggttcaaagcagagctactgatagtt1260 cagtgccgatccatccaatggtgcagcagaagccttctttacaacccagggcaacatttt1320 tgccggaccttaatatgtaccagctgccatatgtcgtaccattttaaacatctggcgagg1380 tagatagaattagatagatgttgggagagag 1411 <210> 64 <211> 1347 <212> DNA
<213> Zea mays mays strain W22 <220>
<221> CDS
<222> (1)..(1347) <400> 64 atgtcgagg tgcttc ccctacccg ccaccgggg tacgtgcgg aaccca 48 MetSerArg CysPhe ProTyrPro ProProGly TyrValArg AsnPro gtggccgtg gccgag ccggagtcg accgetaag ctcctgaaa gaaaag 96 ValAlaVal AlaGlu ProGluSer ThrAlaLys LeuLeuLys GluLys gaaaaggcc gaaaag aagaaagag aaaaggagt gacaggaaa getccc 144 GluLysAla GluLys LysLysGlu LysArgSer AspArgLys AlaPro aagcagtgt gagacg tccaaacat tcaaagcac agccataag aagaga 192 LysGlnCys G1uThr SerLysHis SerLysHis SerHisLys LysArg aagcttgaa gatgtc atcaaaget gagcagggt cccaaaaga gtaccc 240 LysLeuGlu AspVal IleLysAla GluGlnGly ProLysArg ValPro aaagaatca gttgag cagttggag aagagtgga ctctcagaa gagcat .288 LysGluSer ValGlu GlnLeuGlu LysSerGly LeuSerGlu GluHis ggagetcct tctttt gtacatacg atacgtgac tctcctgag agctca 336 GlyAlaPro SerPhe ValHisThr IleArgAsp SerProGlu SerSer caggacagc ggcaag agacgaaag gttgtcctg tccagtcct agccaa 384 GlnAspSer GlyLys ArgArgLys ValValLeu SerSerPro SerGln cctaagaat ggaaac attcttcgc ttcaagatt aaaagtagt caagac 432 ProLysAsn GlyAsn IleLeuArg PheLysI1e LysSerSer GlnAsp ccccaatca getgtt ctggagaaa ccaagggtt cttgagcaa ccattg 480 ProGlnSer AlaVal LeuGluLys ProArgVal LeuGluGln ProLeu gtccaacaa atggga tcaggttca tccccgtcg ggcaagcaa aattca 528 ValGlnGln MetGly SerGlySer SerProSer GlyLysGln AsnSer atccatcat aagatg aatgtgaga tctacctct ggtcagcgg agggtc 576 IleHisHis LysMet AsnValArg SerThrSer GlyGlnArg ArgVal gatggtgac tcccaa gcagtacaa aaatgtttg attacagaa tccccg 624 AspGlyAsp SerGln AlaValGln LysCysLeu I1eThrGlu SerPro gcaaagacc atgcag agacttgtc ccccagcct gcagetaag gtcaca 672 AlaLysThr MetGln ArgLeuVal ProGlnPro AlaAlaLys ValTar catcctgtt gatccccag tcagetgtt aaggtg ccagttgga agatcg 720 HisProVal AspProGln SerAlaVal LysVa1 ProValGly ArgSer ggcctacct ctgaagtct tcgggaagt gtggac ccttcgcct getaga 768 GlyLeuPro LeuLysSer SerGlySer ValAsp ProSerPro AlaArg gttatgaga agatttgat cctccacct gttaag atgatgtca cagaga 816 ValMetArg ArgPheAsp ProProPro ValLys MetMetSer GlnArg gttcaccat ccagettcc atggtgtcg cagaaa gttgatcct ccgttt 864 ValHisHis 
ProAlaSer MetValSer GlnLys ValAspPro ProPhe ccgaaggta ttacataag gaaaccgga tctgtt gttcgccta ccagaa 912 ProLysVal LeuHisLys GluThrGly SerVal ValArgLeu ProGlu getacccgg cctactgtt cttcaaaaa cccaag gacttgcct getatc 960 AlaThrArg ProThrVal LeuGlnLys ProLys AspLeuPro AlaIle aagcagcag gatatcagg acctcttcc tcaaaa gaagagccc tgcttc 1008 LysGlnGln AspI1eArg ThrSerSer SerLys GluG1uPro CysPhe tctggtagg aatgcagaa gcagttcaa gtgcaa gatactaag ctctcc 1056 SerGlyArg AsnAlaGlu AlaValGln ValGln AspThrLys LeuSer cggtcagac atgaagaaa atccgcaaa getgag aaaaaagat aagaag 1104 ArgSerAsp MetLysLys IleArgLys AlaGlu LysLysAsp LysLys ttcagagat ctgtttgtt acctggaat ccggta ttgatagag aatgaa 1152 PheArgAsp LeuPheVal ThrTrpAsn ProVah LeuIleGlu AsnG1u ggttcagat cttggtgat gaagactgg ctgttc agcagtaaa aggaac 1200 GlySerAsp LeuGlyAsp GluAspTrp LeuPhe SerSerLys ArgAsn tccgatget atcatggtt caaagcaga getact gatagttca gtgccg 1248 SerAspAla IleMetVal GlnSerArg AlaThr AspSerSer ValPro atccatcca atggtgcag cagaagcct tcttta caacccagg gcaaca 1296 IleHisPro MetValGln GlnLysPro SerLeu GlnProArg AlaThr tttttgccg gaccttaat atgtaccag ctgcca tatgtcgta ccattt 1344 PheLeuPro AspLeuAsn MetTyrGln LeuPro TyrValVal ProPhe taa 1347 <210> 65 <211> 448 <212> PRT
<213> Zea mays mays strain W22 <400> 65 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln 115 120 125 Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Pro Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asp Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 66 <211> 2644 <212> DNA

<213> Zea mays parviglumis strain Benz <400> 66

atgtcgaggtgcttcccctacccgccaccggggtacgtgcggaacccagtggccgtggcc60 gagccggagtcgaccgctaaggtttgttgaaccttcggatttacacacgcacgtgccaga120 tcgtttgttcaatctgtaggttttgcgcggatctgtggtttgcgcgtgcgtgatgtgggt180 attgcccgtgccttgaaagctaaccgagctgaggaagtgtatggatcttgtgtagctgca240 cgaggtcctccaaatcgattgtaaaatttaagttgtatggccggtaggccaagattgggt300 tattccggttttcgaaaactggtagcatggttatcggggacattgaaagaatggtagaac360 atcaaattcgattcaaaactgtgctagatttgcatatttagtcgccctaaaattacgtgg420 acgtgggtgatccgaattggttgttgtatgatggttggaagtgactggccaaattttttt480 gtttctcaaagttttctttgacaaactgtttgtcgagcgtcaattcgtatttacctgaat540 ttactaattcttaatacagtatgtcgttattttgggctaagcttgtgtaagaagggtcgt600 ttgacattttgtactgtattgatgctgttttgtgtttctttgttcggagcagcattcaat660 gctccttttgttgtttgagagaatctgatatttgccatcgtaccgaaagtccgaaaccaa720 ctattcaaattgggatttcatttcttttttttctactgtttttagagttctctttttcgc780 tgctgtgctc'ttgtgggtcagtacgtgcatttctctttttttctttttttttctgatgtt840 actcttctgttgaccaaaggagttcagaattattttggccctgtatatcaatagcaacca900 acaccatttattgagcccatttttagttttcttgttctgtagagtatgcattgttgcagg960 tcttaactgttgtcagggaagtaacgtgttcaacatgattgtaaacgaatacaattctgt1020 tgctaactgtgtaatgatgagaaggataattgaataatctttgtgaagtattactgtctg1080 aactgtacgcaaatgctacattcattctttgtgttcgtgtaaatatcattatacataaaa1140 atgctgcattgcattcccgtcgtccgttctaaatcagaactgacgattgctctggtggct1200 gaagctcctgaaagaaaaggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaa1260 agctcccaagcagtgtgagacgtc~aaacattcaaagcacagccataagaagagaaagct1320 tgaagatgtcatcaaagctgagcagggtcccaaaagagtacccaaagaatcagttgagca1380 gttggagaagagtggactctcagaagagcatggagctccttcttttgtacatacgatacg1440 tgactctcctgagagctcacaggacagcggcaagagacgaaaggttgtcctgtccagtcc1500 tagccaacctaagaatggtgagactattctcttgtttttgctattctgattgatttttta1560 ttatagaagaaatcaatcgcttgttcaggattttattcatcccaacttgattttacagga1620 aacattcttcgcttcaagattaaaagtagtcaagatccccaatcagctgttctggagaaa1680 ccaagggttcttgagcaaccattggtccaacaaatgggatcaggttcatccctgtcgggc1740 lUU

aagcaaaattcaatccatcataagatgaatgtgagatctacctctggtcagcggagggtc1800 aatggtgactcccaagcagtacaaaaatgtttgattacagaatccccggcaaagaccatg1860 cagagacttgtcccccagcctgcagctaaggtcacacatcctgttgatccccagtcagct1920 gttaaggtgccagttggaagatcgggcctacctctgaagtcttcgggaagtgtggaccct1980 tcgcctgctagagttatgagaagatttgatcctccacctgttaagatgatgtcacagaga2040 gttcaccatccagcttccatggtgtcgcagaaagttgatcctccgtttccgaaggtatta2100 cataaggaaaccggatctgttgttcgcctaccagaagctacccggcctactgttcttcaa2160 aaacccaaggacttgcctgctatcaagcagcaggatatcaggacctcttcctcaaaagaa2220 gagccctgcttctctggtaggaatgcagaagcagttcaagtgcaagatactaagctctcc2280 cggtcagacatgaagaaaatccgcaaagctgagaaaaaagataagaagttcagagatctg2340 tttgttacctggaatccggtattgatagagaatgaaggttcagatcttggtgatgaagac2400 tggctgttcagcagtaaaaggaactccgatgctatcatggttcaaagcagagctactgat2460 agttcagtgccgatccatccaatggtgcagcagaagccttctttacaacccagggcaaca2520 tttttgccggaccttaatatgtaccagctgccatatgtcgtaccattttaaacatctggc2580 gaggtagatgagaattagatgagatgttgggagagagctgtgtgaacagtaggccgggta2640 gctt 2644 <210> 67 <211> 1347 <212> DNA
<213> Zea mays parviglumis strain Benz <220>
<221> CDS
<222> (1)..(1347) <400> 67 atgtcgaggtgc ttcccc tacccgcca ccggggtac gtgcggaac cca 48 MetSerArgCys PhePro TyrProPro ProGlyTyr Va1ArgAsn Pro 1 5 10 l5 gtggccgtggcc gagccg gagtcgacc getaagctc ctgaaagaa aag 96 ValAlaVa1Ala GluPro GluSerThr AlaLysLeu LeuLysGlu Lys gaaaaggccgaa aagaag aaagagaaa aggagtgac aggaaaget ccc 144 GluLysAlaGlu LysLys LysGluLys ArgSerAsp ArgLysA1a Pro aagcagtgtgag acgtcc aaacattca aagcacagc cataagaag aga l92 LysGlnCysGlu ThrSer LysHisSer LysHisSer HisLysLys Arg aag ctt gaa gat gtc atc aaa get gag cag ggt ccc aaa aga gta ccc 240 Lys Leu Glu Asp Val I1e Lys Ala Glu Gln Gly Pro Lys Arg Val Pro aaagaatca gttgag cagttggag aagagtggactc tcagaagag cat 288 LysGluSer ValGlu GlnLeuGlu LysSerGlyLeu SerGluGlu His ggagetcct tctttt gtacatacg atacgtgactct cctgagagc tca 336 GlyAlaPro SerPhe ValHisThr TleArgAspSer ProGluSer Ser caggacagc ggcaag agacgaaag gttgtcctgtcc agtcctagc caa 384 GlnAspSer GlyLys ArgArgLys Va1ValLeuSer SerProSer Gln cctaagaat ggaaac attcttcgc ttcaagattaaa agtagtcaa gat 432 ProLysAsn GlyAsn IleLeuArg PheLysIleLys SerSerGln Asp ccccaatca getgtt ctggagaaa ccaagggttctt gagcaacca ttg 480 ProGlnSer AlaVal LeuGluLys ProArgValLeu GluGlnPro Leu gtccaacaa atggga tcaggttca tccctgtcgggc aagcaaaat tca 528 ValGlnGln MetGly SerGlySer SerLeuSerGly LysGlnAsn Ser atccatcat aagatg aatgtgaga tctacctctggt cagcggagg gtc 576 IleHisHis LysMet AsnValArg SerThrSerGly GlnArgArg Val aatggtgac tcccaa gcagtacaa aaatgtttgatt acagaatcc ccg 624 ~

AsnGlyAsp SerGln AlaValGln LysCysLeuIle ThrGluSer Pro gcaaagacc atgcag agacttgtc ccccagcctgca getaaggtc aca 672 AlaLysThr MetGln ArgLeuVal ProGlnProAla AlaLysVal Thr catcctgtt gatccc cagtcaget gttaaggtgcca gttggaaga tcg 720 HisProVal AspPro GlnSexAla ValLysValPro ValGlyArg Ser ggcctacct ctgaag tcttcggga agtgtggaccct tcgcctget aga 768 GlyLeuPro LeuLys SerSerGly SerValAspPro SerProAla Arg gttatgaga agattt gatcctcca cctgttaagatg atgtcacag aga 816 ValMetArg ArgPhe AspProPro ProValLysMet MetSerG1n Arg gttcaccat ccaget tccatggtg tcgcagaaagtt gatcctccg ttt 864 ValHisHis ProA1a SerMetVa1 SerGlnLysVal AspProPro .Phe ccgaaggta ttacat aaggaaacc ggatctgttgtt cgcctacca gaa 912 ProLysVal LeuHis LysG1uThr GlySerValVal ArgLeuPro Glu getacccgg cctact gttcttcaa aaacccaaggac ttgcctget atc 960 AlaThrArg ProThr ValLeuGln LysProLysAsp LeuProAla Ile aagcagcag gatatc aggacctct tcctcaaaa gaagagccc tgcttc 1008 LysGlnGln AspIle ArgThrSer SerSerLys Gl.uG1uPro CysPhe tctggtagg aatgca gaagcagtt caagtgcaa gatactaag ctctcc 1056 SerGlyArg AsnAla GluAlaVal GlnValGln AspThrLys LeuSer 340 345 . 350 cggtcagac atgaag aaaatccgc aaagetgag aaaaaagat aagaag 1104 ArgSerAsp MetLys LysIleArg LysAlaGlu LysLysAsp LysLys ttcagagat ctgttt gttacctgg aat~ccggta ttgatagag aatgaa 1152 PheArgAsp LeuPhe ValThrTrp AsnProVal LeuIleGlu AsnGlu ggttcagat cttggt gatgaagac tggctgttc agcagtaaa aggaac 1200 GlySerAsp LeuGly AspGluAsp TrpLeuPhe SerSerLys ArgAsn tccgatget atcatg gttcaaagc agagetact gatagttca gtgccg 1248 SerAspAla ~IleMet Va1GlnSer ArgAlaThr AspSerSer ValPro atccatcca atggtg cagcagaag ccttcttta caacccagg gcaaca 1296 IleHisPro MetVa1 GlnG1nLys ProSerLeu G1nProArg AlaThr tttttgccg gacctt aatatgtac cagctgcca tatgtcgta ccattt 1344 PheLeuPro AspLeu AsnMetTyr GlnLeuPro TyrValVal ProPhe taa 1347 <210> 68 <211> 448 <212> PRT

<213> Zea mays parviglumis strain Benz <400> 68 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu 145 150 155 160 Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 69 <211> 586 <212> DNA
<213> Zea mays parviglumis strain BK4 <400> 69

acgtcggccatgtcgaggtgcttcccctacccgccaccggggtacgtgcggaacccagtg 60 gccgtggccgagccggagtcgaccgctaaggtttgttgaaccttcggatttacacacgca 120 cgtgccagatcgtttgttcaatctgtaggttttgcgcggatctgtggtttgcgcgtgcgt 180 gatgtggcccgtgccttgaaagctaaccgagctgaggaagtgtatggatcttgtgtagct 240 gcacgaggtcctccaaatcgattgtaaaatttaagttgtatggccggtaggccaagattg 300 ggttagtccggttttcgaaaactggtagcatggttatcggggacattgaaagaatggtag 360 aacatcaaattcgattcaaaactgtgctagatttgcatatttagtcgccctaaaattacg 420 tggacgtgggtgatccgaattggttgttgtatgatggttggaagtgactggccaaatttt 480 ttgtttctcaaagttttctttgaaaaactgtttgtcgagcgtcaattcgtatttacctga 540 atttactaattcttaatacagtatttcgttattttcggctaagctt 586 <210> 70 <211> 1775 <212> DNA
<213> Zea mays parviglumis strain BK4 <400> 70

tcttctgttgaccaaaggagttcagaattattttggccctgtatatcaatagcaaccaac 60 accatttattgatcccatttttagttttcttgttctgtagagtatgcattgttgcaggtc 120 ttaactgttgtcagggaagtaacgtgttcaacatgattgtaaacgaatacaattctgttg 180 ctaactgtgtaatgatgagaaggataattgaataatctttgtgaagtattactgtctgaa 240 ctgtacgcaaatgctacattcattctttgtgttcgtgtaaatatcattatacataaaaat 300 gctgcattgcattcccgtcgtccgttctaatcagaactgacgattgctctggtggctgaa 360 gctcctgaaagaaaaggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaaagc 420 tcccaagcagtgtgagacgtccaaacattcaaagcacagccataagaagagaaagcttga 480 agatgtcatcaaagctgagcagggtcccaaaagagtacccaaagaatcagttgagcagtt 540 ggagaagagtggactctcagaagagcatggagctccttcttttgtacatacgatacgtga 600 ctctcctgagagctcacaggacagcggcaagagacgaaaggttgtcctgtccagtcctag 660 ccaacctaagaatggtgagactattctcttgtttttgctattctgattgattttttatta 720 tagaagaaatcaatcgcttgttcaggattttattcatcccaacttgattttacaggaaac 780 attcttcgcttcaagattaaaagtagtcaagacccccaatcagctgttctggagaaacca840 agggttcttgagcaaccattggtccaacaaatgggatcaggttcatccccgtcgggcaag900 caaaattcaatccatcataagatgaatgtgagatctacctctggtcagcggagggtcgat960 ggtgactcccaagcagtacaaaaatgtttgattacagaatccccggcaaagaccatgcag1020 agacttgtcccccagcctgcagctaaggtcacacatcctgttgatccccagtcagctgtt1080 aaggtgccagttggaagatcgggcctacctctgaagtcttcgggaagtgtggacccttcg1140 cctgctagagttatgagaagatttgatcctccacctgttaagatgatgtcacagagagtt1200 caccatccagcttccatggtgtcgcagaaagttgatcctccgtttccgaaggtattacat1260 aaggaaaccggatctgttgttcgcctaccagaagctacccggcctactgttcttcaaaaa1320 cccaaggacttgcctgctatcaagcagcaggatatcagga.cctcttcctcaaaagaagag1380 ccctgcttctctggtaggaatgcagaagcagttcaagtgcaagatactaagctctcccgg1440 tcagacatgaagaaaatccgcaaagctgagaaaaaagataagaagttcagagatctgttt1500 gttacctggaatccggtattgatagagaatgaaggttcagatcttggtgatgaagactgg1560 ctgttcagcagtaaaaggaactccgatgctatcatggttcaaagcagagctactgatagt1620 tcagtgccga tccatccaat ggtgcagcag aagccttctt tacaacccag ggcaacattt 1680 .
ttgccggacc ttaatatgta ccagctgcca tatgtcgtac cattttaaac atctggcgag 1740 gtagatgaga attagatgag atgttgggag agagc 1775 <210> 71 <211> 1347 <212> DNA
<213> Zea mays parviglumis strain BK4 <220>
<221> CDS
<222> (1)..(1347) <400> 71

atgtcg aggtgcttc ccctacccg ccaccgggg tacgtgcgg aaccca 48 MetSer ArgCysPhe ProTyrPro ProProGly TyrValArg AsnPro gtggcc gtggccgag ccggagtcg accgetaag ctcctgaaa gaaaag 96 ValAla ValAlaGlu ProGluSer ThrA1aLys LeuLeuLys GluLys gaaaag gccgaaaag aagaaagag aaaaggagt gacaggaaa getccc 144 GluLys AlaGluLys LysLysGlu LysArgSer AspArgLys AlaPro aagcag tgtgagacg tccaaacat tcaaagcac agccataag aagaga 192 LysGln CysGluThr SerLysHis SerLysHis SerHisLys LysArg aagctt gaagatgtc atcaaaget gagcagggt cccaaaaga gtaccc 240 LysLeu GluAspVal IleLysAla GluGlnGly ProLysArg ValPro aaagaa tcagttgag cagttggag aagagtgga ctctcagaa gagcat 288 LysGlu SerValGlu GlnLeuGlu LysSerGly LeuSerGlu GluHis ggaget ccttctttt gtacatacg atacgtgac tctcctgag agctca 336 G1yAla ProSerPhe ValHisThr IleArgAsp SerProGlu SerSer caggac agcggcaag agacgaaag gttgtcctg tccagtcct agccaa 384 GlnAsp SerGlyLys ArgArgLys ValValLeu SerSerPro SerGln cctaag aatggaaac attcttcgc ttcaagatt aaaagtagt caagac 432 ProLys AsnGlyAsn IleLeuArg PheLysIle LysSerSer GlnAsp ccccaa tcagetgtt ctggagaaa ccaagggtt cttgagcaa ccattg 480 ProGln SerAlaVal LeuGluLys ProArgVal LeuG1uGln ProLeu gtccaa caaatggga tcaggttca tccccgtcg ggcaagcaa aattca 528 ValGln GlnMetGly SerGlySer SerProSer GlyLysGln AsnSer atccatcat aagatg aatgtgaga tctacctct ggtcagcgg agggtc 576 IleHisHis LysMet AsnValArg SerThrSer GlyGlnArg ArgVal gatggtgac tcccaa gcagtacaa aaatgtttg attacagaa tccccg 624 AspGlyAsp SerGln AlaValGln LysCysLeu IleThrGlu SerPro gcaaagacc atgcag agacttgtc ccccagcct gcagetaag gtcaca 672 AlaLysThr MetGln ArgLeuVal ProGlnPro AlaAlaLys ValThr catcctgtt gatccc cagtcaget gttaaggtg ccagttgga agatcg 720 HisProVal AspPro GlnSerAla ValLysVal ProValGly ArgSer ggcctacct ctgaag tcttcggga agtgtggac ccttcgcct getaga 768 GlyLeuPro LeuLys SerSerGly 5erValAsp ProSerPro AlaArg gttatgaga agattt gatcctcca cctgttaag atgatgtca cagaga 816 ValMetArg ArgPhe AspProPro ProValLys MetMetSer GlnArg gttcaccat ccaget tccatggtg tcgcagaaa gttgatcct ccgttt 864 ValHisHis ProAla SerMetVal SerGlnLys 
ValAspPro ProPhe ccgaaggta ttacat aaggaaacc ggatctgtt gttcgccta ccagaa 912 ProLysVal LeuHis LysGluThr GlySerVal ValArgLeu ProGlu getacccgg cctact gttcttcaa aaacccaag gacttgcct getatc 960 A1aThrArg ProThr ValLeuGln LysProLys AspLeuPro AlaIle aagcagcag gatatc aggacctct tcctcaaaa gaagagccc tgcttc 1008 LysGlnGln AspI1e ArgThrSer SerSerLys GluGluPro CysPhe tctggtagg aatgca gaagcagtt caagtgcaa gatactaag ctctcc 1056 SerGlyArg AsnAla GluAlaVal GlnValGln AspThrLys LeuSer cggtcagac atgaag aaaatccgc aaagetgag aaaaaagat aagaag 1104 ArgSerAsp MetLys LysI1eArg LysAlaGlu LysLysAsp LysLys ttcagagat ctgttt gttacctgg aatccggta ttgatagag aatgaa 1152 PheArgAsp LeuPhe ValThrTrp AsnProVal LeuIleGlu AsnGlu ggttcagat cttggt gatgaagac tggctgttc agcagtaaa aggaac 1200 GlySerAsp LeuGly AspGluAsp TrpLeuPhe SerSerLys ArgAsn tccgatget atcatg gttcaaagc agagetact gatagttca gtgccg 1248 SerAspAla IleMet ValGlnSer ArgAlaThr AspSerSer ValPro 1~g atc cat cca atg gtg cag cag aag cct tct tta caa ccc agg gca aca 1296 Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr ttt ttg ccg gac ctt aat atg tac cag ctg cca tat gtc gta cca ttt 1344 Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe 435 440. 445 taa 1347 <210> 72 <211> 448 <212> PRT
<213> Zea mays parviglumis strain BK4 <400> 72 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Pro Ser Gly Lys Gln Asn Ser

Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asp Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val G1y Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg 5er Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro m Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg A1a Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 73 <211> 305 <212> DNA
<213> Zea mays parviglumis strain IA19 <400> 73

gattgatttcgagcgattcgactccttgtgatctctacggcggggtagagcgcggtcgac60 cgtcggccatgtcgaggtgcttcccctacccgccaccggggtacgtgcggaacccagtgg120 ccgtggccgagccggagtcgaccgctaaggtttgttgaaccttcggatttacacacgcac180 gtgccagatcgtttgttcaa'tctgtaggttttgcgcggatctgtggtttgcgcgtgcgtg240 atgtgggtattgcccgtgccttgaaagctaaccgagctgaggaagtgtatggatcttgtg300 tagct 305 <210> 74 <211> 1309 <212> DNA
<213> Zea mays parviglumis strain IA19 <400> 74 tcaaagcaca gccataagaa gagaaagctt gaagatgtca tcaaagctga gcaggttccc 60 aaaagagtac ccaaagaatc agttgagcag ttggagaaga gtggactctc agaagagcat 120 ggagctcctt cttttgtaca tacgatacgt gactctcctg agagctcaca ggacagcggc 180 aagagacgaaaggttgtcctgtccagtcctagccaacctaagaatggtgagactattctc240 ttgtttttgctattctgattgattttttattatagaagaaatcaatcgcttgttcaggat300 tttattcatcccaacttgattttacaggaaacattcttcgcttcaagattaaaagtagtc360 aagatccccaatcagctgttctggagaaaccaagggttcttgagcaaccattggtccaac420 aaatgggatcaggttcatccctgtcgggcaagcaaaattcaatccatcataagatgaatg480 tgagatctacctctggtcagcggagggtcaatggtgactcccaagcagtacaaaaatgtt540 tgattacagaatccccggcaaagaccatgcagagacttgtcccccagcctgcagctaagg600 tcacacatcctgttgatccccagtcagctgttaaggtgccagttggaagatcgggcctac660 ctctgaagtcttcgggaagtgtggacccttcgcctgctagagttatgagaagatttgatc720 ctccacctgttaagatgatgtcacagagagttcaccatccagcttccatggtgtcgcaga780 aagttgatcctccgtttccgaaggtattacataaggaaaccggatctgttgttcgcctac840 cagaagctacccggcctactgttcttcaaaaacccaaggacttgcctgctatcaagcagc900 aggakatcaggacctcttcctcaaaagaagagccctgcttctctggtaggaatgcagaag960 cagttcaagtgcaggatactaagctctcccggtcagacatgaagaaaatccgcaaagctg1020 agaaaaaagataagaagttcagagatctgtttgttacctggaatccggtattgatagaga1080 atgaaggttcagatcttggtgatgaagactggctgttcagcagtaaaaggaactccgatg1140 ctatcatggttcaaagcagagctactgatagttcagtgccgatccatccaatggtgcagc1200 agaagccttctttacaacccagggcaacatttttgccggaccttaatatgtaccagctgc1260 catatgtcgtaccattttaaacatctgtcgaggtagatgagaattagat 1309 <210> 75 <211> 1347 <212> DNA
<213> Zea mays parviglumis strain IA19 <220>
<221> CDS
<222> (1)..(1332) <220>
<221> misc_feature <222> (82)..(168) <223> n = A, C, T, or G
<400> 75 atg tcg agg tgc ttc ccc tac ccg cca ccg ggg tac gtg cgg aac cca 48 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro gtg gcc gtg gcc gag ccg gag tcg acc get aag nnn nnn nnn nnn nnn 96 Val A1a Val Ala Glu Pro Glu Ser Thr Ala Lys Xaa Xaa Xaa Xaa Xaa nnnnnn nnnnnnnnn nnnnnn nnnnnnnnn nnnnnnnnn nnnnnnnnn 144 XaaXaa XaaXaaXaa XaaXaa XaaXaaXaa XaaXaaXaa XaaXaaXaa nnnnnn nnnnnnnnn nnnnnn nnntcaaag cacagccat aagaagaga 192 XaaXaa XaaXaaXaa XaaXaa XaaSerLys HisSerHis LysLysArg aagctt gaagatgtc atcaaa getgagcag gttcccaaa agagtaccc 240 LysLeu GluAspVal IleLys AlaGluGln ValProLys ArgValPro 65 70 ~ 75 80 aaagaa tcagttgag cagttg gagaagagt ggactctca gaagagcat 288 LysGlu SerValGlu GlnLeu GluLysSer GlyLeu.SerGluGluHis ggaget ccttctttt gtacat acgatacgt gactctcct gagagctca 336 GlyAla ProSerPhe ValHis ThrIleArg AspSerPro GluSerSer caggac agcggc aagagacga aaggttgtc ctgtccagt cctagccaa 384 GlnAsp SerGly LysArgArg LysValVal LeuSerSer SerGln Pro cctaag aatgga aacattctt cgcttcaag attaaaagt agtcaagat 432 ProLys AsnGly AsnIleLeu ArgPheLys IleLysSer SerGlnAsp ccccaa tcaget gttctggag aaaccaagg gttcttgag caaccattg 480 ProGln SerAla ValLeuGlu LysProArg ValLeuGlu GlnProLeu gtccaa caaatg ggatcaggt tcatccctg tcgggcaag caaaattca 528 ValGln G1nMet GlySerGly SerSerLeu SerGlyLys GlnAsnSer atccat cataag atgaatgtg agatctacc tctggtcag cggagggtc 576 TleHis HisLys MetAsnVal ArgSerThr SerGlyGln ArgArgVal aatggt gactcc caagcagta caaaaatgt ttgattaca gaatccccg 624 AsnGly AspSer GlnAlaVa1 GlnLysCys LeuIleThr GluSerPro gcaaag accatg cagagactt gtcccccag cctgcaget aaggtcaca 672 AlaLys ThrMet GlnArgLeu ValProGln ProAlaAla LysValThr 2l0 215 220 catcct gttgat ccccagtca getgttaag gtgccagtt ggaagatcg 720 HisPro ValAsp ProGlnSer AlaValLys ValProVal GlyArgSer ggccta cctctg aagtcttcg ggaagtgtg gacccttcg cctgetaga 768 GlyLeu ProLeu LysSerSer GlySerVal AspProSer ProAlaArg gttatg agaaga tttgatcct ccacctgtt aagatgatg tcacagaga 816 ValMet ArgArg PheAspPro ProProVal LysMetMet SerGlnArg gttcac catcca 
gcttccatg gtgtcgcag aaagttgat cctccgttt 864

ValHis HisPro AlaSerMet ValSerGln LysValAsp ProProPhe ccgaag gtatta cataaggaa accggatct gttgttcgc ctaccagaa 912 ProLys ValLeu HisLysGlu ThrGlySer ValValArg LeuProG1u getacc cggcct actgttctt caaaaaccc aaggacttg cctgetatc 960 AlaThr ArgPro ThrValLeu GlnLysPro LysAspLeu ProAlaIle aagcag caggak atcaggacc tcttcctca aaagaagag ccctgcttc 1008 LysGln GlnXaa TleArgThr SerSerSer LysGluG1u ProCysPhe tctggt aggaat gcagaagca gttcaagtg caggatact aagctctcc 1056 SerGly AlaGluAla ValGlnVal Gln Thr LysLeuSer Arg Asp Asn cggtca atg aagaaaatc cgcaaaget gag aaa gataagaag 1104 gac aaa ArgSerAspMet LysLysIle ArgLysAla GluLys LysAspLysLys ttcagagatctg tttgttacc tggaatccg gtattg atagagaatgaa 1152 PheArgAspLeu PheValThr TrpAsnPro ValLeu I1eGluAsnGlu ggttcagatctt ggtgatgaa gactggctg ttcagc agtaaaaggaac 1200 GlySerAspLeu GlyAspGlu AspTrpLeu PheSer SerLysArgAsn tccgatgetatc atggttcaa agcagaget actgat agttcagtgccg 1248 SerAspAlaIle MetValGln SerArgAla ThrAsp SerSerValPro atccatccaatg gtgcagcag aagccttct ttacaa cccagggcaaca 1296 IleHisProMet ValGlnGln LysProSer LeuGln ProArgAlaThr tttttgccggac cttaatatg taccagctg ccatat gtcgtaccat tttaa 1347 PheLeuProAsp LeuAsnMet TyrGlnLeu ProTyr <210> 76 <211> 444 <212> PRT
<213> Zea mays parviglumis strain IA19 <220>
<221> misc_feature <222> (28)..(28) <223> The 'Xaa' at location 28 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (29)..(29) <223> The 'Xaa' at location 29 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (30)..(30) <223> The 'Xaa' at location 30 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (31)..(31) <223> The 'Xaa' at location 31 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (32)..(32) <223> The 'Xaa' at location 32 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (33)..(33) <223> The 'Xaa' at location 33 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (34)..(34) <223> The 'Xaa' at location 34 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (35)..(35) <223> The 'Xaa' at location 35 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (36)..(36) <223> The 'Xaa' at location 36 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (37)..(37) <223> The 'Xaa' at location 37 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (38)..(38) <223> The 'Xaa' at location 38 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (39)..(39) <223> The 'Xaa' at location 39 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (40)..(40) <223> The 'Xaa' at location 40 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (41)..(41) <223> The 'Xaa' at location 41 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (42)..(42) <223> The 'Xaa' at location 42 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (43)..(43) <223> The 'Xaa' at location 43 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (44)..(44) <223> The 'Xaa' at location 44 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (45)..(45) <223> The 'Xaa' at location 45 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (46)..(46) <223> The 'Xaa' at location 46 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (47)..(47) <223> The 'Xaa' at location 47 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (48)..(48) <223> The 'Xaa' at location 48 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (49)..(49) <223> The 'Xaa' at location 49 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature <222> (50)..(50) <223> The 'Xaa' at location 50 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (51)..(51) <223> The 'Xaa' at location 51 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (52)..(52) <223> The 'Xaa' at location 52 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (53)..(53) <223> The 'Xaa' at location 53 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (54)..(54) <223> The 'Xaa' at location 54 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (55)..(55) <223> The 'Xaa' at location 55 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (56)..(56) <223> The 'Xaa' at location 56 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (324)..(324) <223> The 'Xaa' at location 324 stands for Glu, or Asp.
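The 'Xaa' annotations in this listing follow from ambiguous bases in the coding sequence: a codon such as `gak` (k = g or t) can encode Glu or Asp, and a fully undetermined `nnn` codon can encode any residue. The following is a minimal sketch of that expansion, assuming the standard genetic code and the IUPAC nucleotide ambiguity codes; the function name `possible_residues` is hypothetical and is not part of the listing.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes, lowercase as in the listing.
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac",
    "b": "cgt", "d": "agt", "h": "act", "v": "acg",
    "n": "acgt",
}

# Standard genetic code, built in the conventional t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def possible_residues(codon):
    """Return the sorted one-letter residues an ambiguous codon may encode."""
    codon = codon.lower()
    if len(codon) != 3:
        raise ValueError("a codon must have exactly three bases")
    # Expand every ambiguous base into its concrete alternatives.
    choices = [IUPAC[base] for base in codon]
    return sorted({CODON_TABLE["".join(bases)] for bases in product(*choices)})


print(possible_residues("gak"))  # ['D', 'E'], i.e. Asp or Glu
print(possible_residues("atk"))  # ['I', 'M'], i.e. Ile or Met
```

Note that expanding `nnn` this way also yields the stop symbol `*`, whereas the feature annotations above list only the twenty amino acids.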
<400> 76 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Val Pro Lys Arg Val Pro Lys Glu Ser Val Glu G1n Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu G1u Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Xaa Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Val Gln Va1 Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu G1y Ser Asp Leu Gly Asp G1u Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg A1a Thr Asp Ser Ser Val Pro Tle His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr <210> 77 <211> 86 <212> DNA
<213> Zea mays parviglumis strain Wilkes <400> 77 ctctcggcgg ggtagagcgc ggtcgaccgt cggccatgtc gaggtgcttc ccctacccgc 60 caccggggta cgtgcggaac ccagtg 86 <210> 78 <211> 1347 <212> DNA
<213> Zea mays parviglumis strain Wilkes <220>

<221> CDS
<222> (1)..(1347) <220>
<221> misc_feature <222> (52)..(81) <223> N = A, C, G, or T
<400> 78
atg tcg agg tgc ttc ccc tac ccg cca ccg ggg tac gtg cgg aac cca 48
Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro
gtg nnn nnn nnn nnn nnn nnn nnn nnn nnn nnn ctc ctg aaa gaa aag 96
Val Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Leu Leu Lys Glu Lys
gaa aag gcc gaa aag aag aaa gag aaa agg agt gac agg aaa gct ccc 144
Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro
aag cag tgt gag acg tcc aaa cat tca aag cac agc cat aag aag aga 192
Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg
aag ctt gaa gat gtc atc aaa gct gag cag ggt ccc aaa aga gta ccc 240
Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro
aaa gaa tca gtt gag cag ttg gag aag agt gga ctc tca gaa gag cat 288
Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His
gga gct cct tct ttt gta cat acg ata cgt gac tct cct gag agc tca 336
Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser
cag gac agc ggc aag aga cga aag gtt gtc ctg tcc agt cct agc caa 384
Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln
cct aag aat gga aac att ctt cgc ttc aag att aaa agt agt caa gat 432
Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp
ccc caa tca gct gtt ctg gag aaa cca agg gtt ctt gag caa cca ttg 480
Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu
gtc caa caa atg gga tca ggt tca tcc ctg tcg ggc aag caa aat tca 528
Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser
atc cat cat aag atg aat gtg aga tct acc tct ggt cag cgg agg gtc 576
Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val
aat ggt gac tcc caa gca gta caa aaa tgt ttg att aca gaa tcc ccg 624
Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro
gca aag acc atg cag aga ctt gtc ccc cag cct gca gct aag gtc aca 672
Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr
cat cct gtt gat ccc cag tca gct gtt aag gtg cca gtt gga aga tcg 720
His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser
ggc cta cct ctg aag tct tcg gga agt gtg gac cct tcg cct gct aga 768
Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg
gtt atg aga aga ttt gat cct cca cct gtt aag atg atg tca cag aga 816
Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg
gtt cac cat cca gct tcc atg gtg tcg cag aaa gtt gat cct ccg ttt 864
ValHis HisPro AlaSerMet ValSerGln LysValAsp ProProPhe ccgaag gtatta cataaggaa accggatct gttgttcgc ctaccagaa 912 ProLys ValLeu HisLysGlu ThrGlySer ValValArg LeuProGlu getacc cggcct actgttctt caaaaaccc aaggacttg cctgetatc 960 AlaThr ArgPro ThrValLeu GlnLysPro LysAspLeu ProAlaI1e aagcag caggat atcaggacc tcttcctca aaagaagag ccctgcttc 1008 LysGln GlnAsp I1eArgThr SerSerSer LysGluGlu ProCysPhe tctggt aggaat gcagaagca gttcaagtg caagatact aagctctcc 1056 SerGly ArgAsn AlaGluAla ValGlnVal GlnAspThr LysLeuSer cggtca gacatg aagaaaatc cgcaaaget gagaaaaaa gataagaag 1104 ArgSer AspMet LysLysIle ArgLysAla GluLysLys AspLysLys ttcaga gatctg tttgttacc tggaatccg gtattgata gagaatgaa 1152 PheArg AspLeu PheVa1Thr TrpAsnPro ValLeuIle GluAsnGlu ggttca gatctt ggtgatgaa gactggctg ttcagcagt aaaaggaac 1200 GlySer AspLeu G1yAspGlu AspTrpLeu PheSerSer LysArgAsn tccgat getatc atggttcaa agcagaget actgatagt tcagtgccg 1248 SerAsp AlaIle MetValGln SerArgAla ThrAspSer SerValPro atccat ccaatg gtgcagcag aagccttct ttacaaccc agggcaaca 1296 IleHis ProMet ValGlnGln LysProSer LeuGlnPro ArgAlaThr ttt ttg ccg gac ctt aat atg tac cag ctg cca tat gtc gta cca ttt 1344 Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Va1 Pro Phe taa 1347 <210> 79 <211> 448 <212> PRT
<213> Zea mays parviglumis strain Wilkes <220>
<221> misc_feature <222> (18)..(18) <223> The 'Xaa' at location 18 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (19)..(19) <223> The 'Xaa' at location 19 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (20)..(20) <223> The 'Xaa' at location 20 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (21)..(21) <223> The 'Xaa' at location 21 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (22)..(22) <223> The 'Xaa' at location 22 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (23)..(23) <223> The 'Xaa' at location 23 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (24)..(24) <223> The 'Xaa' at location 24 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.

<220>
<221> misc_feature <222> (25)..(25) <223> The 'Xaa' at location 25 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (26)..(26) <223> The 'Xaa' at location 26 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<220>
<221> misc_feature <222> (27)..(27) <223> The 'Xaa' at location 27 stands for Lys, Asn, Arg, Ser, Thr, Ile, Met, Glu, Asp, Gly, Ala, Val, Gln, His, Pro, Leu, Tyr, Trp, Cys, or Phe.
<400> 79 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Xaa Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu G1u Asp Val Ile Lys Ala G1u Gln Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp 130 ' 135 140 Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu GJ

Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser 165 l70 175 Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Ser Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Arg Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Va1 Ser Gln Lys Val Asp Pro Pro Phe Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro~Ala Ile Lys Gln Gln Asp Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser G1y Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys I1e Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu G1y Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 80 <211> 225 <212> DNA
<213> Zea diploperennis <400> 80 agcgcggtcg accgtcggcc atgtcgaggt gcttccccta cccgccaccg gggtacgtgc 60 ggaacccagt ggccgtggcc gagccggagt cgaccgctaa ggtttgttga accttcggat 120 ttacacacgc acgtgccaga tcgtttgttc aatctgtagg ttttgcgcgg atctgtggtt 180 tgcgcgtgcg tgatgtgggt attgcccgtg ccttgaaagc taacc 225 <210> 81 <211> 1672 <212> DNA
<213> Zea diploperennis <400> 81

agcccatttttagttttattgttctgtagagtatgcattgttgcaggtcttaactgttgt 60 cagggaagtaacgtgttcaacatgattgtaaacgaatacaattctgttgctaactgtgta 120 atgatgagaaggataattgaataatctttgtgaagtattaetgtctgaactgtacgcaaa 180 tgctacattcattctttgtgttcgtgtaaatatcattatacataaaaatgctgcattgca 240 ttcccgtcgtccgttctaaatcagaactgacgattgctctggtggctgaagctcctgaaa 300 gaaaaggaaaaggccgaaaagaagaaagagaaaaggagtgacaggaaagctcccaagcag 360 tgtgagacgtccaaacactcaaagcacagccataagaagagaaagcttgaagatgtcatc 420 aaagctgagcagggtcccaaaagagtacccaaagaatcagttgagcagttggagaagagt 480 ggactctcagaagagcatggagctccttcttttgtacatacgatacgtgactctcctgag 540 agctcacagg acagcggcaa gagacgaaag gttgtcctgt ccagtcctag ccaacctaag 600 aatggtgaga ctattctctt gtttttgcta ttctgattga ttttttatta tagaagaaat 660 caatcacttg ttcaggattt tattcatccc aacttgattt tacaggaaac attcttcgct 720 tcaagattaa aagtagtcaa gatccccaat cagctgttct ggagaaacca agggttcttg 780 agcaaccattggtccaacaaatgggatcaggttcatccctgtcgggcaagcaaaattcaa840 tccatcataagatgaatgtgagatctacctctggtcagcggagggtcaatggtgactcgc900 aagcagtacaaaaatgtttgattacagaatccccggcaaagaccatgcagagacttgtcc960 cccagcctgcagctaaggtcacacatcctgttgatccccagtcagctgttaaggtgccag1020 ttggaaggtcgggcctacctctcaagttttcgggaagtatggacccttcgcctgctagag1080 ttatgggaagatttgatcctccacctgttaagatgatgtcacagagagttcaccatccag1140 cttccatggtgtcgcagaaagttgatcctccgttaccgaaggtattacataaggaaaccg1200 gatctgttgttcgcctaccagaagctacccggcctactgttcttcaaaaacccaaggact1260 tgcctgctatcaagcagcagcagatcaggacctcttcctcaaaagaagagccctgcttct1320 ctggtaggaatgcagaagcagttcaagtgcatgatactaagctctcccggtcagatatga1380 agaaaatccgcaaagctgagaaaaaagataagaagttcagagatctgtttgttacctgga1440 atccggtattgatagagaatgaaggttcagatcttggtgatgaagactggctgttcagca1500 gtaaaaggaactccgatgctatcatggttcaaagcagagctactgatagttcagtgccga1560 tccatccaatkgtgcagcagaaaccttctttacaacccagggcaacatttttgccggacc1620 ttaatatgtaccagctgccatatgtcgtaccattttaaacatctgtcgaggt 1672 <210>

82 <211> 1347 <212> DNA
<213> Zea diploperennis <220>
<221> CDS
<222> (1)..(1347) <220>
<221> misc_feature <222> (1)..(1347) <223> The Xaa at position 420 stands for Met or Ile
<400> 82
atg tcg agg tgc ttc ccc tac ccg cca ccg ggg tac gtg cgg aac cca 48
Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro
gtg gcc gtg gcc gag ccg gag tcg acc gct aag ctc ctg aaa gaa aag 96
Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys
gaa aag gcc gaa aag aag aaa gag aaa agg agt gac agg aaa gct ccc 144
Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro
35 40 45
aag cag tgt gag acg tcc aaa cac tca aag cac agc cat aag aag aga 192
Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg
aag ctt gaa gat gtc atc aaa gct gag cag ggt ccc aaa aga gta ccc 240
Lys Leu Glu Asp Val Ile Lys Ala Glu Gln Gly Pro Lys Arg Val Pro
aaa gaa tca gtt gag cag ttg gag aag agt gga ctc tca gaa gag cat 288
Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His
gga gct cct tct ttt gta cat acg ata cgt gac tct cct gag agc tca 336
Gly Ala Pro Ser Phe Val His Thr Ile Arg Asp Ser Pro Glu Ser Ser
cag gac agc ggc aag aga cga aag gtt gtc ctg tcc agt cct agc caa 384
Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln
115 120 125
cct aag aat gga aac att ctt cgc ttc aag att aaa agt agt caa gat 432
Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Ser Gln Asp
ccc caa tca gct gtt ctg gag aaa cca agg gtt ctt gag caa cca ttg 480
Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu
gtc caa caa atg gga tca ggt tca tcc ctg tcg ggc aag caa aat tca 528
Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser
atc cat cat aag atg aat gtg aga tct acc tct ggt cag cgg agg gtc 576
Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val
aat ggt gac tcg caa gca gta caa aaa tgt ttg att aca gaa tcc ccg 624
Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro
gca aag acc atg cag aga ctt gtc ccc cag cct gca gct aag gtc aca 672
Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr
cat cct gtt gat ccc cag tca gct gtt aag gtg cca gtt gga agg tcg 720
His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser
ggc cta cct ctc aag ttt tcg gga agt atg gac cct tcg cct gct aga 768
Gly Leu Pro Leu Lys Phe Ser Gly Ser Met Asp Pro Ser Pro Ala Arg
gtt atg gga aga ttt gat cct cca cct gtt aag atg atg tca cag aga 816
Val Met Gly Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser
GlnArg gttcaccat ccagettcc atggtgtcg cagaaagtt gatcct ccgtta 864 ValHisHis ProAlaSer MetVa1Ser GlnLysVal AspPro ProLeu ccgaaggta ttacataag gaaaccgga tctgttgtt cgccta ccagaa 912 ProLysVal LeuHisLys GluThrGly SerValVa1 ArgLeu ProGlu get ac~c cgg cct act gtt ctt caa aaa ccc aag gac ttg cct get atc 960 Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile aag cag cag cag atc agg acc tct tcc tca aaa gaa gag ccc tgc ttc 1008 Lys Gln Gln Gln Tle Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe tct ggt agg aat gca gaa gca gtt caa gtg cat gat act aag ctc tcc 1056 Ser Gly Arg Asn Ala Glu Ala Val Gln Val His Asp Thr Lys Leu Ser cgg tca gat atg aag aaa atc cgc aaa get gag aaa aaa gat aag aag 1104 Arg Ser Asp Met Lys Lys Ile Arg Lys Ala G1u Lys Lys Asp Lys Lys ttc aga gat ctg ttt gtt acc tgg aat ccg gta ttg ata gag aat gaa 1152 Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu ggttcagat cttggtgat gaagactgg ctgttcagc agtaaaagg aac 1200 GlySerAsp LeuGlyAsp GluAspTrp LeuPheSer SerLysArg Asn tccgatget atcatggtt caaagcaga getactgat agttcagtg ccg 1248 SerAspAla IleMetVal G1nSerArg AlaThrAsp SerSerVal Pro atccatcca atkgtgcag cagaaacct tctttacaa cccagggca aca 1296 IleHisPro XaaValGln GlnLysPro SerLeuGln ProArgAla Thr tttttgccg gaccttaat atgtaccag ctgccatat gtcgtacca ttt 1344 PheLeuPro AspLeuAsn MeCTyrGln LeuProTyr ValValPro Phe taa 1347 <210> 83 <211> 448 <212> PRT
<213> Zea diploperennis <220>
<221> misc_feature <222> (420)..(420) <223> The 'Xaa' at location 420 stands for Met, or Ile.
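The <211> length of each translated (PRT) entry can be cross-checked against the CDS range of the corresponding DNA entry. A minimal sketch follows, assuming 1-based inclusive CDS coordinates and the standard triplet code; the function name `expected_protein_length` and the `includes_stop` flag are hypothetical and are not part of the listing.

```python
def expected_protein_length(cds_start, cds_end, includes_stop):
    """Number of residues encoded by a CDS given as 1-based inclusive coordinates."""
    span = cds_end - cds_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("CDS span must be a positive multiple of three")
    codons = span // 3
    # A terminal stop codon (e.g. taa) occupies three bases but adds no residue.
    return codons - 1 if includes_stop else codons


# CDS (1)..(1347) ending in the taa stop codon gives 448 residues,
# matching the <211> 448 entries in this listing.
print(expected_protein_length(1, 1347, includes_stop=True))

# CDS (1)..(1332), which appears to end before the stop codon, gives 444
# residues, matching the <211> 444 entry for the strain IA19 polypeptide.
print(expected_protein_length(1, 1332, includes_stop=False))
```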
<400> 83 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro Val Ala Val Ala Glu Pro Glu Ser Thr Ala Lys Leu Leu Lys Glu Lys Glu Lys Ala Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro

Lys Gln Cys Glu Thr Ser Lys His Ser Lys His Ser His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Glu Gln G1y Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Thr Tle Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys I1e Lys Ser Ser Gln Asp Pro Gln Ser Ala Val Leu Glu Lys Pro Arg Val Leu Glu Gln Pro Leu Val Gln~Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys G1n Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Asp Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro G1n Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Phe Ser Gly Ser Met Asp Pro Ser Pro Ala Arg Val Met Gly Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Leu 1Gy Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu G1n Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln G1n Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu Ala Va1 Gln Val His Asp Thr Lys Leu Ser Arg Ser Asp Met Lys Lys Ile Arg Lys A1a Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn Ser Asp Ala Ile Met Val Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Xaa Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Met Tyr Gln Leu Pro Tyr Val Val Pro Phe <210>

84 <211> 2423 <212> DNA
<213> Zea luxurians <400> 84

ggccatgtcgaggtgcttcccctacccgccaccggggtacgtgcggaacccagtggccgt60 ggccgagccggagtcgaccgctaaggtttgttgaaccttcggatttacacacgcacgtgc120 cagatcgtttggtcaatctgttggttttgcgcggatctgtggtttgcgcgtgcgtgatgt180 gggtattgcccgtgccttgaaagctaaccgagatgaggaagtgtatggatcttgtttagc240 tgcacgaggtcctccaaatcgattgaaaaatttaagttggatggccggtaggccaagatt300 gggttagtccggtttttgataactggtaccatggttatcggggacattgaacagaacggt360 agaacatcaaattcgattcaaaactgtgctagatttgcacatttagtcgccctaagatta420 cgtggacgtg ggtggtccga attggttgtt gttgtatgat ggttggaata tgagccattt 480 agtgcttccg tgactggcca aatatttttg tttctcaaat ttttctttga aaaactgttt 540 gtcgagcgtc aattcttaat acagtatgtc gttattttgg gctaagcttg tgaaacaagg 600 gtcgtttgacatttgtactgtattaacctgatgttactcttctggttgaccaaaggagtt660 ttagaattattttggtcctgtaaatcaatagcaactaacaccatctattgtgcccatttt720 tagttttgtatagttttgtatgcagtgttgcaggtcttaactgttgtcaggaaagtaacg780 tgttcacatgattgtaaacgaatacaattctgttgctaactgtgtaatgatgagaacgat840 aattgaataatctttgtgaagtattactgtctgaactgtacacaaatgctacattcattc900 tttgtgttcgtgtaaatgtcattatacataaaaaatgctgcattgcattcccgtcgtccg960 ttctaaatcagaactgacgattgctctggtggctgaagctcccgaaagaaaaggaaaagg1020 ccgaaaagaagaaagagaaacggagtgacaggaaagctcccaagcagtgtgagacgtcca1080 aacattcaaa gcacatccat aagaagagaa agcttgaaga tgtcatcaaa gctgggcagg 1140 gtcccaaaag agtacccaaa gaatcagttg agcagttgga gaagagtgga ctctcagaag 1200 agcatggagc tccttctttt gtacataaga tacgcgactc tcctgagagc tcacaggaca 1260 gcggcaagagacgaaaggttgtcctgtccagtcctagccaacctaagaatggtgagacta1320 ttctcttgtttttgctattctgattgattttttattatagaagaaatcaatcacttgttc1380 cggattttattcatcccaacttgacattttacaggaaacattcttcgcttcaagattaaa1440 agtaatcaagatccccaatcagctgttctggagaaaccaagggttcttgaccaaccattg1500 gtccaacaaatgggatcaggttcatccctgtcgggcaagcaaaattcaatccatcataag1560 atgaatgtgagatctacctctggtcagcggagggtcaatggtgaatcccaagcagtacaa1620 aaatgtttgattacagaatccccggcaaagaccatgcagagacttgtcccccagcctgca1680 gctaaggtcacacatcctgttgatccccagtcagctgttaaggtgccagttggaagatcg1740 ggcctacctctgaagttttcgggaagtgtggacccttcgcctgctagagttatgggaaga1800 
tttgatcctccacctgttaagatgatgtcacagagagttcaccatccagcttccatggtg1860 tcgcagaaagttgatcctccgttaccgaaggtattacataaggaaaccggatctgttgtt1920 cgcctaccagaagctacccggcctactgttcttcaaaaacccaaggacttgcctgctatc1980 aagcagcaggagatcaggacctcttcctcaaaagaagagccctgcttctctggtaggaat2040 gcagaagcagttcaagtgcaggatactaagctctcccggtcagatgtgaagaaaatccgc2100 aaagctgagaaaaaagataagaagttcagagatctgtttgttacctggaatccggtgttg2160 atagagaatgaaggttcagatcttggtgatgaagactggctgttcagcagtaaaaggaac2220 tccgatgcta tcatggctca aagcagagct actgatagtt cagtgccgat ccatccaatg 2280 gtgcagcaga agccttcttt gcaacccagg gcaacgtttt tgccggacct taatatctac 2340 cagctgccat atgtcgtacc attttaaaca tctgtcgagg tagatgagaa ttagatgaga 2400 tgttgggaga gagctgtgtg aac 2423 <210> 85 <2l1> 1347 <212> DNA
<213> Zea luxurians <220>

<221>
CDS

<222> (1)..(1347) <400>

atgtcgagg tgcttc ccctacccg ccaccgggg tacgtgcgg aaccca 48 MetSerArg CysPhe ProTyrPro ProProGly TyrValArg AsnPro gtggccgtg gccgag ccggagtcg accgetaag ctcccgaaa gaaaag 96 ValAlaVal AlaGlu ProGluSer ThrAlaLys LeuProLys GluLys gaaaaggcc gaaaag aagaaagag aaacggagt gacaggaaa getccc 144 GluLysAla GluLys LysLysGlu LysArgSer AspArgLys AlaPro 35 . 40 45 aagcagtgt gagacg tccaaacat tcaaagcac atccataag aagaga 192 LysGlnCys GluThr SerLysHis SerLysHis IleHisLys LysArg aagcttgaa gatgtc atcaaaget gggcagggt cccaaaaga gtaccc 290 LysLeuGlu AspVal IleLysAla GlyGlnGly ProLysArg ValPro aaagaatca gttgag cagttggag aagagtgga ctctcagaa gagcat 288 LysGluSer ValGlu GlnLeuGlu LysSerGly LeuSerGlu GluHis ggagetcct tctttt gtacataag atacgcgac tctcctgag agctca 336 GlyA1aPro SerPhe ValHisLys IleArgAsp SerProGlu SerSer caggacagc ggcaag agacgaaag gttgtcctg tccagtcct agccaa 384 GlnAspSer GlyLys ArgArgLys ValVa1Leu SerSerPro SerGln cctaagaat ggaaac attcttcgc ttcaagatt aaaagtaat caagat 432 ProLysAsn GlyAsn IleLeuArg PheLysIle LysSerAsn GlnAsp ccccaatca getgtt ctggagaaa ccaagggtt cttgaccaa ccattg 480 ProGlnSer AlaVal LeuG1uLys ProArgVal LeuAspGln ProLeu gtccaa caaatggga tcaggttca tccctg tcgggcaag caaaattca 528 ValG1n GlnMetGly SerGlySer SerLeu SerGlyLys G1nAsnSer atccat cataagatg aatgtgaga tctacc tctggtcag cggagggtc 576 IleHis HisLysMet AsnVa1Arg SerThr SerGlyG1n ArgArgVal aatggt gaatcccaa gcagtacaa aaatgt ttgattaca gaatccccg 624 AsnG1y GluSerGln AlaVa1Gln LysCys LeuIleThr GluSerPro gcaaag accatgcag agacttgtc ccccag cctgcaget aaggtcaca 672 AlaLys ThrMetGln ArgLeuVal ProGln ProAlaAla LysValThr 210 215 . 
220 cat cct gtt gat ccc cag tca get gtt aag gtg cca gtt gga aga tcg 720 His Pro Val Asp Pro Gln Ser Ala Val Lys Va1 Pro Val Gly Arg Ser ggc cta cct ctg aag ttt tcg gga agt gtg gac cct tcg cct get aga 768 Gly Leu Pro Leu Lys Phe Ser Gly Ser Val Asp Pro Ser Pro Ala Arg 245 250 ~ 255 gtt atg gga aga ttt gat cct cca cct gtt aag atg atg tca cag aga 816 Val Met Gly Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg gtt cac cat cca get tcc atg gtg tcg cag aaa gtt gat cct ccg tta 864 Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Leu ccg aag gta tta cat aag gaa acc gga tct gtt gtt cgc cta cca gaa 912 Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu get acc cgg cct act gtt ctt caa aaa ccc aag gac ttg cct get atc 960 Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile aag cag cag gag atc agg acc tct tcc tca aaa gaa gag ccc tgc ttc 1008 Lys Gln Gln Glu Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe tct ggt agg aat gca gaa gca gtt caa gtg cag gat act aag ctc tcc 1056 Ser Gly Arg Asn Ala Glu Ala Val Gln Val Gln Asp Thr Lys Leu Ser cgg tca gat gtg aag aaa atc cgc aaa get gag aaa aaa gat aag aag 1104 Arg Ser Asp Val Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys ttc aga~gat ctg ttt gtt acc tgg aat ccg gtg ttg ata gag aat gaa 1152 Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn G1u ggt tca gat ctt ggt gat gaa gac tgg ctg ttc agc agt aaa agg aac 1200 Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn tcc gat get atc atg get caa agc aga get act gat agt tca gtg ccg 1248 Ser Asp Ala Ile Met Ala Gln Ser Arg Ala Thr Asp Ser Ser Val Pro atc cat cca atg gtg cag cag aag cct tct ttg caa ccc agg gca acg 1296 Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr ttt ttg ccg gac ctt aat atc tac cag ctg cca tat gtc gta cca ttt 1344 Phe Leu Pro Asp Leu Asn Ile Tyr Gln Leu Pro Tyr Val Val Pro Phe taa 1347 <210> 86 <211> 448 <212> PRT
<213> Zea luxurians <400> 86 Met Ser Arg Cys Phe Pro Tyr Pro Pro Pro Gly Tyr Val Arg Asn Pro l 5 10 15 Val Ala Val A1a Glu Pro Glu Ser Thr A1a Lys Leu Pro Lys Glu Lys Glu Lys A1a Glu Lys Lys Lys Glu Lys Arg Ser Asp Arg Lys Ala Pro Lys Gln Cys Glu Thr Ser Lys His Ser Lys His I1e His Lys Lys Arg Lys Leu Glu Asp Val Ile Lys Ala Gly G1n Gly Pro Lys Arg Val Pro Lys Glu Ser Val Glu Gln Leu Glu Lys Ser Gly Leu Ser Glu Glu His Gly Ala Pro Ser Phe Val His Lys Ile Arg Asp Ser Pro Glu Ser Ser Gln Asp Ser Gly Lys Arg Arg Lys Val Val Leu Ser Ser Pro Ser Gln Pro Lys Asn Gly Asn Ile Leu Arg Phe Lys Ile Lys Ser Asn G1n Asp Pro G1n Ser Ala Va1 Leu Glu Lys Pro Arg Val Leu Asp Gln Pro Leu Val Gln Gln Met Gly Ser Gly Ser Ser Leu Ser Gly Lys Gln Asn Ser Ile His His Lys Met Asn Val Arg Ser Thr Ser Gly Gln Arg Arg Val Asn Gly Glu Ser Gln Ala Val Gln Lys Cys Leu Ile Thr Glu Ser Pro Ala Lys Thr Met Gln Arg Leu Val Pro Gln Pro Ala Ala Lys Val Thr His Pro Val Asp Pro Gln Ser Ala Val Lys Val Pro Val Gly Arg Ser Gly Leu Pro Leu Lys Phe Ser Gly Ser Val Asp Pro Ser Pro Ala Arg Val Met Gly Arg Phe Asp Pro Pro Pro Val Lys Met Met Ser Gln Arg Val His His Pro Ala Ser Met Val Ser Gln Lys Val Asp Pro Pro Leu Pro Lys Val Leu His Lys Glu Thr Gly Ser Val Val Arg Leu Pro Glu Ala Thr Arg Pro Thr Val Leu Gln Lys Pro Lys Asp Leu Pro Ala Ile Lys Gln Gln Glu Ile Arg Thr Ser Ser Ser Lys Glu Glu Pro Cys Phe Ser Gly Arg Asn Ala Glu A1a Val Gln Val Gln Asp Thr Lys Leu Ser Arg Ser Asp Val Lys Lys Ile Arg Lys Ala Glu Lys Lys Asp Lys Lys Phe Arg Asp Leu Phe Val Thr Trp Asn Pro Val Leu Ile Glu Asn Glu Gly Ser Asp Leu Gly Asp Glu Asp Trp Leu Phe Ser Ser Lys Arg Asn JJ

Ser Asp Ala Ile Met Ala Gln Ser Arg Ala Thr Asp Ser Ser Val Pro Ile His Pro Met Val Gln Gln Lys Pro Ser Leu Gln Pro Arg Ala Thr Phe Leu Pro Asp Leu Asn Ile Tyr Gln Leu Pro Tyr Val Val Pro Phe <210> 87 <211> 21 <212> DNA
<213> Artificial Sequence primer <220>
<223> primer <220>
<221> misc_feature <222> (1)..(21) <223> primer <400> 87 caattctctg agatgccttg g 21 <210> 88 <211> 21 <212> DNA
<213> Artificial Sequence primer <220>
<223> primer <220>
<221> misc_feature <222> (1)..(21) <223> primer <400> 88 caattctctg agatgccttg g 21 <210> 89 <211> 1402 <212> DNA
<213> Oryza rufipogon <220>
<221> misc_feature <222> (1)..(1402) <223> n= a, c, t, or g <400> 89 gatgagctca cgcggggcgg cgcggctcga gctcgagccg cctatgaggg catcaaagga 60 aagggttggc cttcgtcctg cagagatgtt ggccaatgtt ggtccttcac cctccaaggc .120 aaaacagatt gtcaatcctg cagctgctaa ggttacacaa agagttgatc ctccacctgc 180 caaggcatctcagagaattgatcctctgttgccatccaaggttcatatagatgctactca240 atcttttacgaaggtctcccagacagagatcaagccggaagtacagcccccaattccgaa300 ggtgcctgtggctatgcctaccatcaatcgtcagcagattgacacctcgcagcccaaaga360 agagccttgctcctctggcaggaatgctgaagctgcttcagtatcagtagagaagcagtc420 caagtcagatcgcaaaaagagccgcaaggctgagaagaaagagaagaagttcaaagattt480 atttgttacctgggatcctccgtctatggaaatggatgatatggatcttggggaccagga540 ttggctgcttggtagtacgaggaaacctgatgctggcattggcaactgcagagaaattgt600 tgatccactttacttctcaatcagcagagcagttctcattgcagcctanggcgattcatt660 tacccagaccttcatgtctatcagttgccatatgtggttccattctaggtttgtgtagtg720 agatggagtagtgagaagtaagagatgttgggaagagagctgtgtgggtctgggagatta780 tggttccctg gcacagtttc ccagctttgt tcccagcgtt cttgtttcac ggttgctact 840 gtccaacttc ctgtgtnggt tttttggcgc cgctattgng gcttggactc cccattgatn 900 cctcacacaa ggaaattcga gtagttcaag cgctatttga ttaccggcga accacccaaa 960 gggggggggccggtaccccacgacctttggttccccctcaactagaaggggtnatattgt1020 cgcgccgggggtaacaatgngcacanaaccagtcacggtgnngaaagnttttatccggtc1080 cccaaaatatntcccncccancaaatntnaatacccgggggcactacagttnttataaac1140 cngtggggcnctacaanngtggacgatctcacaaattataatcatatttgtagtatntgc1200 cgangttcgcaaccgtcanacacnatcagttgtcgacgcnacgattatttttcnacagcc1260 gngctacacaancgaccgccgaaangnatgtataggatgangtacatacnatacctgact1320 caanacgtaccanacatcagcatcntgcgcgnntgatgantactcaggaagnagcgtccc1380 tacntccgat tgaaatngtg ac 1402 <210> 90 <211> 1341 <212> DNA
<213> Oryza rufipogon strain IRCG105491 <400> 90 atgtcgaggt gcttccccta cccgccgccg gggtacgtgc gaaacccagt ggtggccgtg 60 gccgcggccg aagcgcaggc gaccactaag ctccagaaag aaagggaaaa ggccgaaaag 120 aagaaagaga aaaagagtga caggaaagct cttccacatg gtgagatatc caagcattca 180 aagcgaaccc acaagaagag aaaacatgaa gacatcaata atgctgatca gaagtcccgg 240 aaggtttcct ccatggaacc tggtgagcaa ttggagaaga gtggactctc agaagagcat 300 ggagctccttgctttactcagacagtgcatggctctccagagagttcacaggacagcagc360 aagagaagaaaggttgtgttacccagtcctagccaagctaagaatggtaacatccttcga420 ataaagataagaagagatcaagattcttcagcttccctttcggagaaatctaatgttgta480 caaacaccagttcatcaaatgggatcagtttcatctctgccaagtaagaaaaactcaatg540 caaccacacaacaccgaaatgatggtgagaacagcatcaacccagcagcaaagcatcaaa600 ggtgattttcaagcagtactgaaacaaggtatgccaaccccagcaaaagtcatgccaaga660 gtcgatgttcctccatctatgagggcatcaaaggaaagggttggccttcgtcctgcagag720 atgttggccaatgttggtccttcaccatccaaggcaaaacagattgtcaatcctgcagct780 gctaaggttacacaaagagttgatcctccacctgccaaggcatctcagagaattgatcct840 ctgttgccatccaaggttcatatagatgctactcgatcttttacgaaggtctcccagaca900 gagatcaagccggaagtacagcccccaattccgaaggtgcctgtggctatgcctaccatc960 aatcgtcagcagattgacacctcgcagcccaaagaagagccttgctcctctggcaggaat1020 gctgaagctgcttcagtatcagtagagaagcagtccaagtcagatcgcaaaaagagccgc1080 aaggctgagaagaaagagaagaagttcaaagatttatttgttacctgggatcctccgtct1140 atggaaatggatgatatggatcttggggaccaggattggctgcttggtagtacgaggaaa1200 cctgatgctggcattggcaactgcagagaaattgttgatccacttacttctcaatcagca1260 gagcagttctcattgcagcctagggcgattcatttaccagaccttcatgtctatcagttg1320 ccatatgtggttccattctag 1341 <210> 91 <211> 2157 <212> DNA
<213> Oryza sativa cv. Nipponbare <400> 91
tcgaccagatcggtcgccaatcttttagtggctgaccgtggaaagaggatattactgact 60 tcggtttgctaattttggttgtgccgttgaatctgaaataaccagaatagtcatggggaa 120 aaaagtctgatctggaaggttcgaattacatttctatatattgttgtgctcccagacgat 1'80 ggttgcaagaaatcactcatgctggataaaattgtggatgtaagagtctgcagtcgttaa 240 aatctggaaacagcacattttgccgtagtaaatttgaatccatgttgctgtctcgttatt 300 ggtgtgttacgagtaacctgtgtgttgttatctccgcttggactagattccaagtaatcc 360 agtgccttcatgacctgcaaattctatgcctatgaagtaacatgaacagtttgtatgtat 420 13 ti gtattctgtt gatgcatact tgcattattt gtgagatgta catgttgtgg taaaattttg 480 cattcaccat atagaaatag taactgacta tccttgttta gttcgaaaac tactgcaggt 540 ttagttattc tctgttgcca agagtgcttg ttatgattgt aagggttaca gttctgtgac 600 taaccatgta acaaatatat taaggattat caaattattc tatgtgaagt gtccgtgccc 660 taattgtgtt atcttctgta actgatagca caacatttgt ttcctgctgt gtgcttgtgt 720 aaattggtac ttcatcatta ctatatattt caaagaaaat tctgcattgc attcccgtcg 780 tccgttctaa atcagaactg acgattgctc tggtggctga agctccagaa agaaagggaa 840 aaggctgaaa agaagaaaga gaaaaggagt gacaggaaag ctcttccaca tggtgagata 900 tccaagcatt caaagcgaac ccaccacaag aagagaaaac atgaagacat caataatgct 960 gatcagaagt cccggaaggt ttcctccatg gaacctggtg agcaattgga gaagagtgga 1020 ctctcagaag agcatggagc tccttgcttt actcagacag agcatggctc tccagagagt 1080 tcacaggaca gcagcaagag aagaaaggtt gtgttaccca gtcctagcca agctaagaat 1140 ggtgaggccc tttcttgcat ttgtcttctt ttagctggtg atgttgaatt ggtttgactt 1200 atcctgaatt atcatcttgc aggtaacatc cttcgaataa agataagaag agatcaagat 1260 tcttcagctt ccctttcgga gaaatctaat gttgtacaaa caccagttca tcaaatggga 1320 tcagtttcat ctctgccaag taagaaaaac tcaatgcaac cacacaacac cgaaatgatg 1380 gtgagaacag catcaaccca gcagcaaagc atcaaaggtg attttcaagc agtaccgaaa 1440 caaggtatgc caaccccagc aaaagtcatg ccaagagtcg atgttcctcc atctatgagg 1500 gcatcaaagg aaaggattgg ccttcgtcct gcagagatgt tggccaatgt tggtccttca 1560 ccctccaagg caaaacagat tgtcaatcct gcagctgcta aggttacaca aagagttgat 1620 cctccacctg ccaaggcatc tcagagaatt gatcctctgt tgccatccaa ggttcatata 1680 gatgctactc gatcttttac gaaggtctcc cagacagaga tcaagccgga 
agtacagccc 1740 ccaattctga aggtgcctgt ggctatgcct accatcaatc gtcagcagat tgacacctcg 1800 cagcccaaag aagagccttg ctcctctggc aggaatgctg aagctgcttc agtatcagta 1860 gagaagcagtccaagtcagatcgcaaaaagagccgcaaggctgagaagaaagagaagaag1920 ttcaaagatttatttgttacctgggatcctccgtctatggaaatggatgatatggatctc1980 ggggaccaggattggctgcttgatagtacgaggaaacctgatgctggcattggcaactgc2040 agagaaattgttgatccacttacttctcaatcagcagagcagttctcattgcagcctagg2100 gcgattcatttaccagaccttcatgtctatcagttgccatatgtggttccattctag 2157 ~

<210> 92 <211> 1259 <212> DNA
<213> Oryza rufipogon strain 5948 <400> 92
atcaaagggcgccctgcattaaagcatgccactggaaattggcgtgcatgttttttcatc 60 ctaggtaatttgttaaagatgcatagcatatcccataaaactttggcacagtacttgaag 120 gaattccttgtttcttcgcactagtgagtaatgcttctatatggttttgaatcgtacaat 180 cctgtatttaccttttgccaaaatattttggtgacatacaacacaaaagaaatgctggtc 240 gaagtaccagatagcatactttacagatcaattgaaaaatgctgtgcacatttttatctg 300 ttctgcaatgagtagctttgaagtttcagaaatgctagtttggtgacaggggatgaatgc 360 tgtgagagactggcctattatggtattgcaaagaacctagttacttatctgaaaacaaat 420 cttcatcaaggcaaccttgaagctgcaagaaatgttacaacttggcaggggacatgctac 480 ctaacacccctcattggtgccctcctagcagattcttactggggaaagtactggactatt 540 gctgctttctcagcaatttattttattgtaagtacaagcctattgctatagaagatatta 600 gatattacctacttcggtgcacttgcaccatgtgctgaactgatcttttcaaaataattt 660 catatctgaaacatggataatttctgaacttttttactgaagggtctggttgctttgacg 720 ctgtcagcatcagttccagctctgcagccgcctaaatgttcaggatctatttgtccagaa 780 gcaagcttactccagtatggtgtatttttctctggcctctatatgatagccctcgggact 840 ggaggcatcaaaccttgtgtatcatcctttggagctgatcaatttgatgacagtgatcca 900 gcagacagagtaaagaagggctccttcttcaattggttttacttctgtataaatatcggt 960 gcatttgtatcaggcaccgttatagtttggatacaagataactcaggttgggggatagga 1020 tttgccattcctactatatttatggcattagcgattgcaagtttctttgttgcctcaaat 1080 atgtacagatttcagaaacctggtggaagccctcttacaagagtgtgtcaggttgttgtt 1140 gcagcattccgtaagtggcacactgaagtgccacatgatacatctcttttatatgaggtt 1200 gatggccagacttcagcgattgagggaagccggaagctggagcacacaagtgaacttga 1259 <210> 93 <211> 868 <212> DNA
<213> Oryza rufipogon strain 5948 <400> 93 attctttgac aaggctgcca tcatctcatc tgatgatgcc aagagtgact cctttacaaa 60 tccgtggagg ctatgcactg tcacccaggt ggaagaactg aaaattctaa tcagaatgtt 120 tcccatttgg gccactacta ttatattcaa cgcggtgtat gctcagaact cttctatgtt 180 catagagcagggaatggttcttgacaagcgagttggatctttcattgtccctcctgcatc 240 cctctcaacttttgatgtcatcagtgtcatcatctggattccgttttatgaccgtgtgct 300 tgtgccaatagctagaaagttcactggaagggagaagggtttctctgagttacagcggat 360 tggaatcggattagccctctccatccttgcaatgctatctgcagctcttgttgagttgag 420 gcgtttagagatcgccagatctgaaggtcttattcatgaggatgttgctgttccgatgag 480 cattctttggcaaataccgcagtatttcttggttggcgctgctgaggtctttgctgccat 540 aggtcaggttgagttcttctacaatgaagcccctgatgccatgaggagtttgtgtagtgc 600 atttgcgcttgtaacagtctcactggggagctatttaagctcaatcatattaaccttggt 660 gtcatattttacaactcaaggaggggatcctggatggatcccagataacctgaatgaagg 720 ccacctagatcggttcttttcattgattgctgggatcaactttgtgaatttactggtttt 780 cactggttgtgcaatgagatacagatacaagaaagcatgatgactgtactcatggtaagg 840 tcagtttgtgtaagtaataacagatttt 868 <210> 94 <211> 1659 <212> DNA
<213> Oryza rufipogon strain 5948 <220>
<221> CDS
<222> (1)..(1659) <400> 94 atcaaagggcgc cctgca ttaaagcat gccactgga aattggcgt gca 48 IleLysGlyArg ProAla LeuLysHis AlaThrGly AsnTrpArg Ala tgttttttcatc ctaggg gatgaatgc tgtgagaga ctggcctat tat 96 CysPhePheIle LeuGly AspGluCys CysGluArg LeuAlaTyr Tyr ggtattgcaaag aaccta gttacttat ctgaaaaca aatcttcat caa 144 GlyTleAlaLys AsnLeu ValThrTyr LeuLysThr AsnLeuHis G1n ggcaaccttgaa getgca agaaatgtt acaacttgg caggggaca tgc 192 GlyAsnLeuGlu AlaAla ArgAsnVal ThrThrTrp GlnGlyThr Cys tacctaacaccc ctcatt ggtgccctc ctagcagat tcttactgg gga 240 TyrLeuThrPro LeuIle GlyAlaLeu LeuAlaAsp SerTyrTrp Gly aagtactggact attget getttctca gcaatttat tttattggt ctg 288 LysTyrTrpThr IleAla AlaPheSer AlaIleTyr PheTleGly Leu gtt get ttg acg ctg tca gca tca gtt cca get ctg cag ccg cct aaa 336 ValAla LeuThr LeuSerAla SerValPro AlaLeuGln ProProLys tgttca ggatct atttgtcca gaagcaagc ttactccag tatggtgta 384 CysSer GlySer IleCysPro GluAlaSer LeuLeuGln TyrGlyVal tttttc tctggc ctctatatg atagccctc gggactgga ggcatcaaa 432 PhePhe SerGly LeuTyrMet TleAlaLeu G1yThrGly GlyIleLys ccttgt gtatca tcctttgga getgatcaa tttgatgac agtgatcca 480 ProCys ValSer SexPheGly AlaAspGln PheAspAsp SerAspPro 145 l50 155 160 gcagac agagta aagaagggc tccttcttc aattggttt tacttctgt 528 AlaAsp ArgVal LysLysG1y SerPhePhe AsnTrpPhe TyrPheCys ataaat atcggt gcatttgta tcaggcacc gttatagtt tggatacaa 576 IleAsn IleGly AlaPheVal SerGlyThr ValIleVal TrpIleGln gataac tcaggt tgggggata ggatttgcc attcctact atatttatg 624 AspAsn SerGly TrpG1yIle G1yPheAla IleProThr IlePheMet gcatta gcgatt gcaagtttc tttgttgcc tcaaatatg tacagattt 672 AlaLeu AlaIle AlaSerPhe PheValAla SerAsnMet TyrArgPhe cagaaa cctggt ggaagccct cttacaaga gtgtgtcag gttgttgtt 720 GlnLys ProGly GlySerPro LeuThrArg ValCysGln ValValVal gcagca ttccgt aagtggcac actgaagtg ccacatgat acatctctt 768 AlaAla PheArg LysTrpHis ThrGluVal ProHisAsp ThrSerLeu ttatat gaggtt gatggccag acttcagcg attgaggga agccggaag 816 LeuTyr GluVal AspGlyGln ThrSerAla IleGluGly SerArgLys ctggag cacaca agtgaactt gaattcttt gacaagget 
gccatcatc 864 LeuGlu HisThr SerGluLeu GluPhePhe AspLysAla AlaIleIle 275 280 2g5 tcatct gatgat gccaagagt gactccttt acaaatccg tggaggcta 912 SerSer AspAsp AlaLysSer AspSerPhe ThrAsnPro TrpArgLeu tgcact gtcacc caggtggaa gaactgaaa attctaatc agaatgttt 960 CysThr ValThr G1nValGlu GluLeuLys IleLeuIle ArgMetPhe cccatt tgggcc actactatt atattcaac gcggtgtat getcagaac 1008 ProIle TrpAla ThrThrIle IlePheAsn AlaValTyr AlaGlnAsn tcttct atgttc atagagcag ggaatggtt cttgacaag cgagttgga 1056 SerSer MetPhe IleGluGln GlyMetVal LeuAspLys ArgValGly 14~

tct ttc att gtc cct cct gca tcc ctc tca act ttt gat gtc atc agt 1104 Ser Phe Ile Val Pro Pro Ala Ser Leu Ser Thr Phe Asp Val Ile Ser gtc atc atc tgg att ccg ttt tat gac cgt gtg ctt gtg cca ata get 1152 Val Ile Ile Trp Ile Pro Phe Tyr Asp Arg Val Leu Val Pro Ile Ala agaaag ttcactgga agggagaag ggtttctct gagtta cagcggatt 1200 ArgLys PheThrGly ArgGluLys GlyPheSer GluLeu GlnArgIle ggaatc ggattagcc ctctccatc cttgcaatg ctatct gcagetctt 1248 GlyIle GlyLeuAla LeuSerIle LeuAlaMet LeuSer AlaAlaLeu gttgag ttgaggcgt ttagagatc gccagatct gaaggt cttattcat 1296 ValGlu LeuArgArg LeuGluIle AlaArgSer GluGly LeuIleHis gaggat gttgetgtt ccgatgagc attctttgg caaata ccgcagtat 1344 GluAsp ValAlaVal ProMetSer IleLeuTrp GlnIle ProG1nTyr q ttcttg gttggcget getgaggtc tttgetgcc ataggt caggttgag 1392 PheLeu ValG1yAla AlaGluVa1 PheAlaAla IleGly GlnValGlu ttcttc tacaatgaa gcccctgat gccatgagg agtttg tgtagtgca 1440 PhePhe TyrAsnGlu AlaProAsp AlaMetArg SerLeu CysSerAla ttt gcg ctt gta aca gtc tca ctg ggg agc tat tta agc tca atc ata 1488 Phe Ala Leu Val Thr Val Ser Leu Gly Ser Tyr Leu Ser Ser Ile Ile tta acc ttg gtg tca tat ttt aca act caa gga ggg gat cct gga tgg 1536 Leu Thr Leu Val Ser Tyr Phe Thr Thr Gln Gly Gly Asp Pro Gly Trp atc cca gat aac ctg aat gaa ggc cac cta gat cgg ttc ttt tca ttg 1584 Ile Pro Asp Asn Leu Asn Glu Gly His Leu Asp Arg Phe Phe Ser Leu att get ggg atc aac ttt gtg aat tta ctg gtt ttc act ggt tgt gca 1632 Ile Ala Gly Ile Asn Phe Val Asn Leu Leu Val Phe Thr Gly Cys Ala atg aga tac aga tac aag aaa gca tga 1659 Met Arg Tyr Arg Tyr Lys Lys Ala <210> 95 <211> 552 <212> P12T
<213> Oryza rufipogon strain 5948 <400> 95 Ile Lys Gly Arg Pro Ala Leu Lys His A1a Thr Gly Asn Trp Arg Ala Cys Phe Phe I1e Leu Gly Asp Glu Cys Cys Glu Arg Leu Ala Tyr Tyr Gly Ile Ala Lys Asn Leu Val Thr Tyr Leu Lys Thr Asn Leu His Gln Gly Asn Leu Glu Ala A1a Arg Asn Val Thr Thr Trp Gln Gly Thr Cys Tyr Leu Thr Pro Leu Ile Gly Ala Leu Leu Ala Asp Ser Tyr Trp Gly Lys Tyr Trp Thr Ile Ala Ala Phe Ser Ala Ile Tyr Phe Ile Gly Leu Val Ala Leu Thr Leu Ser Ala Ser Val Pro Ala Leu Gln Pro Pro Lys Cys Ser Gly Ser Ile Cys Pro Glu Ala Ser Leu Leu Gln Tyr G1y Val Phe Phe Ser Gly Leu Tyr Met Ile Ala Leu Gly Thr Gly Gly I1e Lys Pro Cys Va1 Ser Ser Phe Gly Ala Asp Gln Phe Asp Asp Ser Asp Pro Ala Asp Arg Val Lys Lys Gly Ser Phe Phe Asn Trp Phe Tyr Phe Cys Ile Asn Ile Gly Ala Phe Val Ser Gly Thr Val Ile Val Trp Ile G1n Asp Asn Ser Gly Trp Gly Ile Gly Phe Ala Ile Pro Thr Ile Phe Met Ala Leu Ala Ile Ala Ser Phe Phe Val Ala Ser Asn Met Tyr Arg Phe 210 215 ~ 220 Gln Lys Pro Gly Gly Ser Pro Leu Thr Arg Val Cys Gln Val Val Val Ala Ala Phe Arg Lys Trp His Thr Glu Val Pro His Asp Thr Ser Leu Leu Tyr Glu Val Asp Gly Gln Thr Ser Ala Tle Glu Gly Ser Arg Lys Leu Glu His Thr Ser Glu Leu Glu Phe Phe Asp Lys Ala Ala Ile Ile Ser Ser Asp Asp Ala Lys Ser Asp Ser Phe Thr Asn Pro Trp Arg Leu Cys Thr Val Thr Gln Val Glu Glu Leu Lys Ile Leu Ile Arg Met Phe Pro Ile Trp Ala Thr Thr Ile Ile Phe Asn Ala Val Tyr A1a Gln Asn Ser Ser Met Phe I1e Glu Gln Gly Met Val Leu Asp Lys Arg Val Gly 340 ' 345 350 Ser Phe Ile Val Pro Pro Ala Ser Leu Ser Thr Phe Asp Val Tle Ser Val Ile Ile Trp Ile Pro Phe Tyr Asp Arg Va1 Leu Val Pro Tle Ala Arg Lys Phe Thr Gly Arg Glu Lys Gly Phe Ser Glu Leu Gln Arg Ile Gly Ile Gly Leu Ala Leu Ser Ile Leu Ala Met Leu Ser Ala Ala Leu Val Glu Leu Arg Arg Leu Glu Ile Ala Arg Ser Glu G1y Leu Ile His Glu Asp Val Ala Val Pro Met Ser Ile Leu Trp Gln Ile Pro Gln Tyr Phe Leu Val Gly Ala Ala Glu Val Phe Ala Ala Ile Gly Gln Val Glu Phe Phe Tyr Asn Glu Ala Pro Asp Ala Met Arg Ser Leu Cys Ser Ala Phe Ala 
Leu Val Thr Val Ser Leu Gly Ser Tyr Leu Ser Ser Ile Ile Leu Thr Leu Val Ser Tyr Phe Thr Thr Gln Gly Gly Asp Pro Gly Trp Ile Pro Asp Asn Leu Asn Glu Gly His Leu Asp Arg Phe Phe Ser Leu Ile Ala Gly Ile Asn Phe Val Asn Leu Leu Val Phe Thr Gly Cys Ala Met Arg Tyr Arg Tyr Lys Lys Ala <210> 96 <211> 1230 <212> DNA
<213> Oryza rufipogon strain 5949 <400> 96 cactggaaat tggcgtgcat gttttttcat cctaggtaat ttgttaaaga tgcatagcat 60 atcccataaa actttggcac agtacttgaa ggaattcctt gtttcttcgc actagtgagt 120 aatgcttcta tatggttttg aatcgtacaa tcctgtattt accttttgcc aaaatatttt 180 ggtgacatacaacacaaaagaaatgctggtcgaagtaccagatagcatactttacagatc 240 aattgaaaaatgctgtgcacatttttatctgttctgcaatgagtagctttgaagtttcag 300 aaatgctagtttggtgacaggggatgaatgctgtgagagactggcctattatggtattgc 360 aaagaacctagttacttatctgaaaacaaatcttcatcaaggcaaccttgaagctgcaag 420 aaatgttacaacttggcaggggacatgctacctaacacccctcattggtgccctcctagc 480 agattcttactggggaaagtactggactattgctgctttctcagcaatttattttattgt 540 aagtacaagcctattgctatagaagatattagatattacctacttcggtgcacttgcacc 600 atgtgctgaactgatcttttcaaaataatttcatatctgaaacatggataatttctgaac 660 tttttaactgaagggtctggttgctttgacgctgtcagcatcagttccagctctgcagcc 720 gcctaaatgttcaggatctatttgtccagaagcaagcttactccagtatggtgtattttt 780 ctctggcctctatatgatagccctcgggactggaggcatcaaaccttgtgtatcatcctt 840 tggagctgatcaatttgatgacagtgatccagcagacagagtaaagaagggctccttctt 900 caattggttttacttctgtataaatatcggtgcatttgtatcaggcaccgttatagtttg 960 gatacaagat aactcaggtt gggggatagg atttgccatt cctactatat ttatggcatt 1020 agcgattgca agtttctttg ttgcctcaaa tatgtacaga tttcagaaac ctggtggaag 1080 ccctcttaca agagtgtgtc aggttgttgt tgcagcattc cgtaagtggc acactgaagt 1140 gccacatgat acatctcttt tatatgaggt tgatggccag acttcagcga ttgagggaag 1200 ccggaagctg gagcacacaa gtgaacttga 1230 <210>

<211> 1225 <212> DNA
<213> Oryza rufipogon strain <400> 97
ataaaactgatacactaccttcttgtactgttccattttgggattggtggaaattaaata60 ctaaatgcaacaaaaagaatatggataaggccatacagcagaacgctagtagtatattag120 tagtttgtccatggcatgcaattcttataagtctacttataattactattactggtgcct180 ataattaatatgggaccattagaggtatatttgtataatgactgaaaatatcagggtagc240 acaagcaatatatgtcagtaggtggcttgctttacagacacatttcttttactttttttt300 agacaatataatatattgtgttttcttgtctgactgaaattactttttgttatacagatt360 ctttgacaaggctgccatcatctcatctgatgatgccaagagtgactcctttacaaatcc420 gtggaggctatgcactgtcacccaggtggaagaactgaaaattctaatcagaatgtttcc480 catttgggccactactattatattcaacgcggtgtatgctcagaactcttctatgttcat540 agagcagggaatggttcttgacaagcgagttggatctttcattgtccctcctgcatccct600 ctcaacttttgatgtcatcagtgtcatcatctggattccgttttatgaccgtgtgcttgt660 gccaatagctagaaagttcactggaagggagaagggtttctctgagttacagcggattgg720 aatcggattagccctctccatccttgcaatgctatctgcagctcttgttgagttgaggcg780 tttagagatcgccagatctgaaggtcttattcatgaggatgttgctgttccgatgagcat840 tctttggcaaataccgcagtatttcttggttggcgctgctgaggtctttgctgccatagg900 tcaggttgagttcttctacaatgaagcccctgatgccatgaggagtttgtgtagtgcatt960 tgcgcttgtaacagtctcactggggagctatttaagctcaatcatattaaccttggtgtc1020 atattttacaactcaaggaggggatcctggatggatcccagataacctgaatgaaggcca1080 cctagatcggttcttttcattgattgctgggatcaactttgtgaatttactggttttcac1140 tggttgtgcaatgagatacagatacaagaaagcatgatgactgtactcatggtaaggtca1200 gtttgtgtaagtaataacagatttt 1225 <210> 98 <211> 1632 <212> DNA
<213> Oryza rufipogon strain 5949 <220>
<221> CDS
<222> (1)..(1632) <220>
<221> misc_feature <222> (1)..(2) <223> N = A, T, C, or G <400> 98
nncactgga aattggcgt gcatgtttt ttcatccta ggggat gaatgc 48 XaaThrGly AsnTrpArg AlaCysPhe PheIleLeu GlyAsp G1uCys tgtgagaga ctggcctat tatggtatt gcaaagaac ctagtt acttat 96 CysGluArg LeuAlaTyr TyrGlyIle AlaLysAsn LeuVal ThrTyr ctgaaaaca aatcttcat caaggcaac cttgaaget gcaaga aatgtt 144 LeuLysThr AsnLeuHis GlnG1yAsn LeuGluAla AlaArg AsnVal acaacttgg caggggaca tgctaccta acacccctc attggt gccctc 192 ThrThrTrp GlnGlyThr CysTyrLeu ThrProLeu IleGly A1aLeu ctagcagat tcttactgg ggaaagtac tggactatt getget ttctca 240 LeuAlaAsp SerTyrTrp GlyLysTyr TrpThrIle AlaAla PheSer gcaatttat tttattggt ctggttget ttgacgctg tcagca tcagtt 288 AlaIleTyr PheIleGly LeuValAla LeuThrLeu SerAla SerVal ccagetctg cagccgcct aaatgttca ggatctatt tgtcca gaagca 336 ProA1aLeu GlnProPro LysCysSer GlySerIle CysPro GluAla agcttactc cagtatggt gtatttttc tctggcctc tatatg atagcc 384 SerLeuLeu GlnTyrGly Va1PhePhe SerGlyLeu TyrMet IleAla ctcgggact ggaggcatc aaaccttgt gtatcatcc tttgga getgat 432 LeuGlyThr GlyGlyIle LysProCys ValSerSer PheGly AlaAsp caatttgat gacagtgat ccagcagac agagtaaag aagggc tccttc 480 G1nPheAsp AspSerAsp ProAlaAsp ArgValLys LysGly SerPhe ttcaattgg ttttacttc tgtataaat atcggtgca tttgta tcaggc 528 PheAsnTrp PheTyrPhe CysIleAsn IleGlyA1a PheVa1 SerGly accgttata gtttggata caagataac tcaggttgg gggata ggattt 576 ThrValIle ValTrpIle GlnAspAsn SerGlyTrp GlyIle GlyPhe gccattcct actatattt atggcatta gcgattgca agtttc tttgtt 624 AlaIlePro ThrIlePhe MetAlaLeu AlaIleAla SerPhe PheVal gcctcaaat atgtacaga tttcagaaa cctggt ggaagccct cttaca 672 AlaSerAsn MetTyrArg PheGlnLys ProGly GlySerPro LeuThr 2l0 215 220 agagtgtgt caggttgtt gttgcagca ttccgt aagtggcac actgaa 720 ArgValCys GlnValVal ValAlaAla PheArg LysTrpHis ThrGlu gtgccacat gatacatct cttttatat gaggtt gatggccag acttca 768 ValProHis AspThrSer LeuLeuTyr GluVal AspGlyGln ThrSer gcgattgag ggaagccgg aagctggag cacaca agtgaactt gaattc 816 AlaIleG1u GlySerArg LysLeuGlu HisThr SerGluLeu GluPhe tttgacaag getgccatc atctcatct gatgat gccaagagt gactcc 864 PheAspLys AlaAlaIle 
TleSerSer AspAsp AlaLysSer AspSer tttacaaat ccgtggagg ctatgcact gtcacc caggtggaa gaactg 912 PheThrAsn ProTrpArg LeuCysThr ValThr GlnValG1u GluLeu aaaattcta atcagaatg tttcccatt tgggcc actactatt atattc 960 LysIleLeu IleArgMet PheProIle TrpAla ThrThrTle IlePhe aacgcggtg tatgetcag aactcttct atgttc atagagcag ggaatg 1008 AsnAlaVal TyrA1aGln AsnSerSer MetPhe IleG1uGln GlyMet gttcttgac aagcgagtt ggatctttc attgtc cctcotgca tccctc 1056 ValLeuAsp LysArgVa1 GlySerPhe I1eVal ProProAla SerLeu tcaactttt gatgtcatc agtgtc atcatctgg attccgttt tatgac 1104 SerThrPhe AspValIle SerVal IleIleTrp IleProPhe TyrAsp cgtgtgctt gtgccaata getaga aagttcact ggaagggag aagggt 1152 ArgValLeu ValProIle AlaArg LysPheThr GlyArgG1u LysGly ttctctgag ttacagcgg attgga atcggatta gccctctcc atcctt 1200 PheSerGlu LeuGlnArg IleGly IleGlyLeu AlaLeuSer IleLeu gcaatgcta tctgcaget cttgtt gagttgagg cgtttagag atcgcc 1248 AlaMetLeu SerAlaAla LeuVal GluLeuArg ArgLeuGlu IleAla agatctgaa ggtcttatt catgag gatgttget gttccgatg agcatt 1296 ArgSerGlu GlyLeuIle HisGlu AspValAla ValProMet SerIle ctttggcaa ataccgcag tatttc ttggttggc getgetgag gtcttt 1344 LeuTrpG1n IleProGln TyrPhe LeuValG1y AlaAlaGlu ValPhe getgccataggt caggttgag ttcttc tacaatgaa gcccctgat gcc 1392 AlaAlaIleGly GlnValG1u PhePhe TyrAsnGlu AlaProAsp Ala atgaggagtttg tgtagtgca tttgcg cttgtaaca gtctcactg ggg 1440 MetArgSerLeu CysSerAla PheAla LeuValThr ValSerLeu Gly 465 470 475 4.80 agctatttaagc tcaatcata ttaacc ttggtgtca tattttaca act 1488 SerTyrLeuSer SerI1eIle LeuThr LeuValSer TyrPheThr Thr caaggaggggat cctggatgg atccca gataacctg aatgaaggc cac 1536 GlnGlyGlyAsp ProGlyTrp IlePro AspAsnLeu AsnGluGly His ctagatcggttc ttttcattg attget gggatcaac tttgtgaat tta 1584 LeuAspArgPhe PheSerLeu IleAla GlyIleAsn PheValAsn Leu ctggttttcact ggttgtgca atgaga tacagatac aagaaagca tga 1632 LeuValPheThr GlyCysAla MetArg TyrArgTyr LysLysAla <210> 99 <211> 543 <212> PRT
<213> Oryza rufipogon strain 5949 <220>
<221> misc_feature <222> (1)..(1) <223> The 'Xaa' at location 1 stands for Asn, Ser, Thr, Ile, Asp, Gly, Ala, Val, His, Arg, Pro, Leu, Tyr, Cys, or Phe.
<400> 99 Xaa Thr Gly Asn Trp Arg Ala Cys Phe Phe Ile Leu Gly Asp Glu Cys Cys Glu Arg Leu Ala Tyr Tyr Gly Ile Ala Lys Asn Leu Val Thr Tyr Leu Lys Thr Asn Leu His Gln Gly Asn Leu Glu Ala Ala Arg Asn Val Thr Thr Trp Gln Gly Thr Cys Tyr Leu Thr Pro Leu Ile Gly Ala Leu Leu Ala Asp Ser Tyr Trp Gly Lys Tyr Trp Thr Ile Ala Ala Phe Ser Ala Ile Tyr Phe Ile Gly Leu Val Ala Leu Thr Leu Ser Ala Ser Val Pro Ala Leu Gln Pro Pro Lys Cys Ser Gly Ser Ile Cys Pro Glu Ala 100 105 l10 Ser Leu Leu Gln Tyr Gly Val Phe Phe Ser Gly Leu Tyr Met I1e Ala Leu Gly Thr Gly Gly I1e Lys Pro Cys Val Ser Ser Phe Gly Ala Asp Gln Phe Asp Asp Ser Asp Pro Ala Asp Arg Val Lys Lys G1y Ser Phe Phe Asn Trp Phe Tyr Phe Cys Ile Asn Ile Gly Ala Phe Val Ser Gly Thr Val Ile Val Trp Ile Gln Asp Asn Ser Gly Trp Gly Ile Gly Phe Ala I1e Pro Thr Ile Phe Met Ala Leu Ala Ile Ala Ser Phe Phe Val Ala Ser Asn Met Tyr Arg Phe Gln Lys Pro Gly Gly Ser Pro Leu Thr Arg Val Cys Gln Val Val Val A1a Ala Phe Arg Lys Trp His Thr Glu Val Pro His Asp Thr Ser Leu Leu Tyr Glu Val Asp G1y Gln Thr Ser Ala Tle Glu Gly Ser Arg Lys Leu Glu His Thr Ser Glu Leu Glu Phe Phe Asp Lys Ala Ala Ile I1e Ser Ser Asp Asp Ala Lys Ser Asp Ser Phe Thr Asn Pro Trp Arg Leu Cys Thr Val Thr Gln Va1 Glu Glu Leu Lys Ile Leu Ile Arg Met Phe Pro Ile Trp Ala Thr Thr Ile Ile Phe Asn Ala Val Tyr Ala Gln Asn Ser Ser Met Phe Ile Glu Gln Gly Met Val Leu Asp Lys Arg Val G1y Ser Phe Ile Val Pro Pro Ala Ser Leu Ser Thr Phe Asp Val Ile Ser Val Ile Ile Trp Ile Pro Phe Tyr Asp Arg Val Leu Val Pro Tle Ala Arg Lys Phe Thr Gly Arg Glu Lys Gly Phe Ser Glu Leu Gln Arg Ile G1y Ile Gly Leu Ala Leu Ser Ile Leu Ala Met Leu Ser Ala Ala Leu Val Glu Leu Arg Arg Leu Glu Ile Ala Arg Ser Glu Gly Leu I1e His Glu Asp Val Ala Val Pro Met Ser Ile Leu Trp Gln Ile Pro Gln Tyr Phe Leu Val Gly Ala Ala Glu Val Phe Ala Ala Ile Gly Gln Val Glu Phe Phe Tyr Asn Glu A1a Pro Asp Ala Met Arg Ser Leu Cys Ser Ala Phe Ala Leu Val Thr Val Ser Leu Gly Ser Tyr Leu Ser Ser Ile Ile Leu Thr Leu Val Ser Tyr Phe 
Thr Thr Gln Gly Gly Asp Pro Gly Trp I1e Pro Asp Asn Leu Asn Glu Gly His Leu Asp Arg Phe Phe Ser Leu I1e Ala Gly Ile Asn Phe Val Asn Leu Leu Val Phe Thr Gly Cys Ala Met Arg Tyr Arg Tyr Lys Lys Ala <210> 100 <211> 2599 <212> DNA
<213> Oryza sativa strain Azucena <400> 100 atcaaagggc gccctgcatt aaagcatgcc actggaaatt ggcgtgcatg ttttttcatc 60 ctaggtaatttgttaaagatgcatagcatatcccataaaactttggcacagtacttgaag 120 gaattccttgtttcttcgcactagtgagtaatgcttctatatggttttgaatcgtacaat 180 cctgtatttaccttttgccaaaatattttggtgatatacaacacaaaagaaatgctggtc 240 gaagtaccagatagcatactttacagatcaattgaaaaatgctgtgcacatttttatctg 300 ttctgcaatgagtagctttgaagtttcagaaatgctagtttggtgacaggggatgaatgc 360 tgtgagagactggcctattatggtattgcaaagaacctagttacttatctgaaaacaaat 420 cttcatcaaggcaaccttgaagctgcaagaaatgttacaacttggcaggggacatgctac 480 ctaacacccctcattggtgccctcctagcagattcttactggggaaagtactggactatt 540 gctgctttctcagcaatttattttattgtaagtacaagcctattgctatagaagatatta 600 gatattacctacttcggtgcacttgcaccatgtgctgaactgatcttttcaaaataattt660 catatctgaaacatggataatttctgaacttttttactgaagggtctggttgctttgacg720 ctgtcagcatcagttccagctctgcagccgcctaaatgttcaggatctatttgtccagaa780 gcaagcttactccagtatggtgtatttttctctggcctctatatgatagccctcgggact840 ggaggcatcaaaccttgtgtatcatcctttggagctgatcaatttgatgacagtgatcca900 gcagacagagtaaagaagggctccttcttcaattggttttacttctgtataaatatcggt960 gcatttgtatcaggcaccgttatagtttggatacaagataactcaggttgggggatagga1020 tttgccattcctactatatttatggcattagcgattgcaagtttctttgttgcctcaaat1080 atgtacagatttcagaaacctggtggaagccctcttacaagagtgtgtcaggttgttgtt1140 gcagcattccgtaagtggcacactgaagtgccacatgatacatctcttttatatgaggtt1200 gatggccagacttcagcgattgagggaagccggaagctggagcacacaagtgaacttgag1260 taattcctggatttttgcaatgcatcattgtctcacttttattcattctgttacaaagaa 1320 aaaaggaggaaagtctggatggggacaacaccagccatttgcagttggatgtatacataa 1380 aactgatacactaccttcttgtactgttccattttgggattggtggaaattaaatactaa 1440 atgcaacaaaaagaatatggataaggccatacagcagaacgctagtagtatattagtagt 1500 ttgtccatggcatgcaattcttataagtctacttataattactattactggtgcctataa 1560 ttaatatgggaccattagaggtatatttgtataatgactgaaaatatcagggtagcacaa 1620 gcaatatatgtcagtaggtggcttgctttacagacacatttcttttacttttttagacaa 1680 tataatatattgtgttttcttgtctgactgaaattactttttgttatacagattctttga 1740 caaggctgccatcatctcatctgatgatgccaagagtgactcctttacaaatccgtggag 
1800 gctatgcactgtcacccaggtggaagaactgaaaattctaatcagaatgtttcccatttg 1860 ggccactactattatattcaacgcggtgtatgctcacaactcttctatgttcatagagca1920 gggaatggttcttgacaagcgagttggatctttcattgtccctcctgcatccctctcaac1980 ttttgatgtcatcagtgtcatcatctggattccgttttatggccgtgtgcttgtgccaat2040 agctagaaagttcactggaagggagaagggtttctctgagttacagcggattggaatcgg2100 attagccctctccatccttgcaatgctatctgcagctcttgttgagttgaggcgtttagg2160 gatcgccagatctgaaggtcttattcatgaggatgttgctgttccgatgagcattctttg2220 gcaaataccgcagtatttcttggttggcgctgctgaggtctttgctgccataggtcaggt2280 tgagttcttctacaatgaagcccctgatgccatgaggagtttgtgtagtgcatttgcgct2340 tgtaacagtctcactggggagctatttaagctcaatcatattaaccttggtgtcatattt2400 tacaactcaaggaggggatcctggatggatcccagataacctgaatgaaggccacctaga2460 tcggttcttttcattgattgctgggatcaactttgtgaatttactggttttcactggttg2520 tgcaatgagatacagatacaagaaagcatgatgactgtactcatggtaaggtcagtttgt2580 gttagtaataacagatttt 2599 <210> 101 <211> 1659 <212> DNA
<213> Oryza sativa strain Azucena <220>
<221> CDS
<222> (1)..(1659) <400> 101 atcaaagggcgc cctgcatta aagcatgcc actgga aattggcgt gca 48 IleLysGlyArg ProAlaLeu LysHisAla ThrGly AsnTrpArg Ala tgttttttcatc ctaggggat gaatgctgt gagaga ctggcctat tat 96 CysPhePheIle LeuGlyAsp GluCysCys GluArg LeuA1aTyr Tyr ggtattgcaaag aacctagtt acttatctg aaaaca aatcttcat caa 144 GlyIleAlaLys AsnLeuVal ThrTyrLeu LysThr AsnLeuHis Gln ggcaaccttgaa getgcaaga aatgttaca acttgg caggggaca tgc 192 GlyAsnLeuGlu AlaAlaArg AsnValThr ThrTrp GlnGlyThr Cys tacctaacaccc ctcattggt gccctccta gcagat tcttactgg gga 240 TyrLeuThrPro LeuIleGly AlaLeuLeu AlaAsp SerTyrTrp Gly 65 ~0 ~5 80 aagtactggact attgetget ttctcagca atttat tttattggt ctg 288 LysTyrTrpThr .IleAlaAla PheSerAla IleTyr PheIleGly Leu ~~t gttgetttg acgctgtca gcatcagtt ccagetctg cagccgcct aaa 336 ValAlaLeu ThrLeuSer AlaSerVal ProAlaLeu GlnProPro Lys tgttcagga tctatttgt ccagaagca agcttactc cagtatggt gta 384 CysSerGly SerIleCys ProGluAla SerLeuLeu GlnTyrGly Val tttttctct ggcctctat atgatagcc ctcgggact ggaggcatc aaa 432 PhePheSer GlyLeuTyr MetI1eAla LeuGlyThr GlyGlyIle Lys ccttgtgta tcatccttt ggagetgat caatttgat gacagtgat cca 480 ProCysVal SerSerPhe GlyAlaAsp GlnPheAsp AspSerAsp Pro gcagacaga gtaaagaag ggctccttc ttcaattgg ttttacttc tgt 528 AlaAspArg ValLysLys GlySerPhe PheAsnTrp PheTyrPhe Cys ataaatatc ggtgcattt gtatcaggc accgttata gtttggata caa 576 TleAsnIle GlyAlaPhe Va1SerGly ThrValIle ValTrpIle Gln gataactca ggttggggg ataggattt gccattcct actatattt atg 624 AspAsnSer GlyTrpGly TleGlyPhe AlaIlePro ThrIlePhe Met gcattagcg attgcaagt ttctttgtt gcctcaaat atgtacaga ttt 672 AlaLeuAla I1eAlaSer PhePheVal AlaSexAsn MetTyrArg Phe cagaaacct ggtggaagc cctcttaca agagtgtgt caggttgtt gtt 720 GlnLysPro GlyGlySer ProLeuThr ArgValCys GlnValVal Val gcagcattc cgtaagtgg cacactgaa gtgccacat gatacatct ctt 768 AlaAlaPhe ArgLysTrp HisThrGlu ValProHis AspThrSer Leu ttatatgag gttgatggc cagacttca gcgattgag ggaagccgg aag 816 LeuTyrGlu ValAspGly GlnThrSer AlaIleGlu G1ySerArg Lys ctggagcac acaagt.gaa cttgaattc tttgacaag getgccatc 
atc 864 LeuGluHis ThrSerGlu LeuGluPhe PheAspLys AlaAlaIle Tle tcatctgat gatgccaag agtgactcc tttacaaat ccgtggagg cta 912 SerSerAsp AspAlaLys SerAspSer PheThrAsn ProTrpArg Leu tgcactgtc acccaggtg gaagaactg aaaattcta atcagaatg ttt 960 CysThrVal ThrGlnVal GluGluLeu LysIleLeu IleArgMet Phe cccatttgg gccactact attatattc aacgcggtg tatgetcac aac 1008 ProIleTrp AlaThrThr I1eT1ePhe AsnAlaVal TyrAlaHis Asn tcttct atgttcata gagcaggga atggttctt gacaag cgagttgga 1056 SerSer MetPheIle GluGlnGly MetValLeu AspLys ArgValGly tctttc attgtccct cctgcatcc ctctcaact tttgat gtcatcagt 1104 SerPhe IleValPro ProAlaSer LeuSerThr PheAsp ValIleSer gtcatc atctggatt ccgttttat ggccgtgtg cttgtg ccaataget 1152 ValIle IleTrpIle ProPheTyr GlyArgVal LeuVal ProIleAla agaaag ttcactgga agggagaag ggtttctct gagtta cagcggatt 1200 ArgLys PheThrGly ArgGluLys Gly-PheSer GluLeu GlnArgIle ggaatc ggattagcc ctctccatc cttgcaatg ctatct gcagetctt 1248 GlyIle GlyLeuAla LeuSerIle LeuAlaMet LeuSer AlaAlaLeu gttgag ttgaggcgt ttagggatc gccagatct gaaggt cttattcat 1296 ValGlu LeuArgArg LeuGlyTle AlaArgSer GluGly LeuIleHis gaggat gttgetgtt ccgatgagc attctttgg caaata ccgcagtat 1344 GluAsp ValAlaVal ProMetSer IleLeuTrp GlnI1e ProGlnTyr ttcttg gttggcget getgaggtc tttgetgcc ataggt caggttgag 1392 PheLeu Va1GlyAla AlaGluVal PheAlaAla IleGly GlnValGlu ttcttc tacaatgaa gcccctgat gccatgagg agtttg tgtagtgca 1440 PhePhe TyrAsnGlu AlaProAsp AlaMetArg SerLeu CysSerAla tttgcg cttgtaaca gtctcactg gggagctat ttaagc tcaatcata 1488 PheAla LeuValThr ValSerLeu GlySerTyr LeuSer SerIleTle ttaacc ttggtgtca tattttaca actcaagga ggggat cctggatgg 1536 LeuThr LeuValSer TyrPheThr ThrGlnG1y GlyAsp ProGlyTrp atccca gataacctg aatgaaggc cacctagat cggtte ttttcattg 1584 IlePro AspAsnLeu AsnGluGly HisLeuAsp ArgPhe PheSerLeu attget gggatcaac tttgtgaat ttaetggtt tteact ggttgtgea 1632 IleAla GlyIleAsn PheValAsn LeuLeuVal PheThr GlyCysA1a atgaga tacagatac aagaaagca tga 1659 MetArg TyrArgTyr LysLysAla <210> 02 <211>

<212> PRT
<213> Oryza sativa strain Azucena <400> 102
Ile Lys Gly Arg Pro Ala Leu Lys His Ala Thr Gly Asn Trp Arg Ala
Cys Phe Phe Ile Leu Gly Asp Glu Cys Cys Glu Arg Leu Ala Tyr Tyr
Gly Ile Ala Lys Asn Leu Val Thr Tyr Leu Lys Thr Asn Leu His Gln
Gly Asn Leu Glu Ala Ala Arg Asn Val Thr Thr Trp Gln Gly Thr Cys
Tyr Leu Thr Pro Leu Ile Gly Ala Leu Leu Ala Asp Ser Tyr Trp Gly
Lys Tyr Trp Thr Ile Ala Ala Phe Ser Ala Ile Tyr Phe Ile Gly Leu
Val Ala Leu Thr Leu Ser Ala Ser Val Pro Ala Leu Gln Pro Pro Lys
Cys Ser Gly Ser Ile Cys Pro Glu Ala Ser Leu Leu Gln Tyr Gly Val
Phe Phe Ser Gly Leu Tyr Met Ile Ala Leu Gly Thr Gly Gly Ile Lys
Pro Cys Val Ser Ser Phe Gly Ala Asp Gln Phe Asp Asp Ser Asp Pro
Ala Asp Arg Val Lys Lys Gly Ser Phe Phe Asn Trp Phe Tyr Phe Cys
Ile Asn Ile Gly Ala Phe Val Ser Gly Thr Val Ile Val Trp Ile Gln
Asp Asn Ser Gly Trp Gly Ile Gly Phe Ala Ile Pro Thr Ile Phe Met
Ala Leu Ala Ile Ala Ser Phe Phe Val Ala Ser Asn Met Tyr Arg Phe
Gln Lys Pro Gly Gly Ser Pro Leu Thr Arg Val Cys Gln Val Val Val
Ala Ala Phe Arg Lys Trp His Thr Glu Val Pro His Asp Thr Ser Leu
Leu Tyr Glu Val Asp Gly Gln Thr Ser Ala Ile Glu Gly Ser Arg Lys
Leu Glu His Thr Ser Glu Leu Glu Phe Phe Asp Lys Ala Ala Ile Ile
Ser Ser Asp Asp Ala Lys Ser Asp Ser Phe Thr Asn Pro Trp Arg Leu
Cys Thr Val Thr Gln Val Glu Glu Leu Lys Ile Leu Ile Arg Met Phe
Pro Ile Trp Ala Thr Thr Ile Ile Phe Asn Ala Val Tyr Ala His Asn
Ser Ser Met Phe Ile Glu Gln Gly Met Val Leu Asp Lys Arg Val Gly
Ser Phe Ile Val Pro Pro Ala Ser Leu Ser Thr Phe Asp Val Ile Ser
Val Ile Ile Trp Ile Pro Phe Tyr Gly Arg Val Leu Val Pro Ile Ala
Arg Lys Phe Thr Gly Arg Glu Lys Gly Phe Ser Glu Leu Gln Arg Ile
Gly Ile Gly Leu Ala Leu Ser Ile Leu Ala Met Leu Ser Ala Ala Leu
Val Glu Leu Arg Arg Leu Gly Ile Ala Arg Ser Glu Gly Leu Ile His
Glu Asp Val Ala Val Pro Met Ser Ile Leu Trp Gln Ile Pro Gln Tyr
Phe Leu Val Gly Ala Ala Glu Val Phe Ala Ala Ile Gly Gln Val Glu
Phe Phe Tyr Asn Glu Ala Pro Asp Ala Met Arg Ser Leu Cys Ser Ala
Phe Ala Leu Val Thr Val Ser Leu Gly Ser Tyr Leu Ser Ser Ile Ile
Leu Thr Leu Val Ser Tyr Phe Thr Thr Gln Gly Gly Asp Pro Gly Trp
Ile Pro Asp Asn Leu Asn Glu Gly His Leu Asp Arg Phe Phe Ser Leu
Ile Ala Gly Ile Asn Phe Val Asn Leu Leu Val Phe Thr Gly Cys Ala
Met Arg Tyr Arg Tyr Lys Lys Ala
<210> 103 <211> 2601 <212> DNA
<213> Oryza sativa strain IR64 <400> 103 atcaaagggc gccctgcatt aaagcatgcc actggaaatt ggcgtgcatg ttttttcatc 60 ctaggtaatt tgttaaagat gcatagcata tcccataaaa ctttggcaca gtacttgaag 120 gaattccttg tttcttcgca ctagtgagta atgcttctat atggttttga atcgtacaat 180 cctgtattta ccttttgcca aaatattttg gtgacataca acacaaaaga aatgctggtc 240 gaagtaccagatagcatactttacagatcaattgaaaaatgctgtgcacatttttatctg300 ttctgcaatgagtagctttgaagtttcagaaatgctagtttggtgacaggggatgaatgc360 tgtgagagactggcctattatggtattgcaaagaacctagttacttatctgaaaacaaat420 cttcatcaaggcaaccttgaagctgcaagaaatgttacaacttggcaggggacatgctac480 ctaacacccctcattggtgccctcctagcagattcttactggggaaagtactggactatt540 gctgctttctcagcaatttattttattgtaagtacaagcctattgctatagaagatatta600 gatattacctacttcggtgcacttgcaccatgtgctgaactgatcttttcaaaataattt660 catatctgaaacatggataatttctgaacttttttactgaagggtctggt.tgctttgacg720 ctgtcagcatcagttccagctctgcagccgcctaaatgttcaggatctatttgtccagaa780 gcaagcttactccagtatggtgtatttttctctggcctctatatgatagccctcgggact840 ggaggcatcaaaccttgtgtatcatcctttggagctgatcaatttgatgacagtgatcca900 gcagacagagtaaagaagggctccttcttcaattggttttacttc~gtataaatatcggt960 gcatttgtatcaggcaccgttatagtttggatacaagataactcaggttgggggatagga1020 tttgccattcctactatatttatggcattagcgattgcaagtttctttgttgcctcaaat1080 atgtacagatttcagaaacctggtggaagccctcttacaagagtgtgtcaggttgttgtt1140 gcagcattccgtaagtggcacactgaagtgccacatgatacatctcttttatatgaggtt1200 gatggccagacttcagcgattgagggaagccggaagctggagcacacaagtgaacttgag1260 taattcctggatttttgcaatgcatcattgtctcacttttattcattctgttacaaagaa1320 aaaagggggaaagtctggatggggacaacaccagccatttgcagttggatgtatacataa1380 aactgatacactaccttcttgtactgttccattttgggattggtggaaattaaatactaa1440 atgcaacaaaaagaatatggataaggccatacagcagaacgctagtagtatattagtagt1500 ttgtccatggcatgcaattcttataagtctacttataattactattactggtgcctataa1560 ttaatatgggaccattagaggtatatttgtataatgactgaaaatatcagggtagcacaa1620 gcaatatatgtcagtaggtggcttgctttacagacacatttcttttacttttttttagac1680 aatataatatattgtgttttcttgtctgactgaaattactttttgttatacagattcttt1740 gacaaggctgccatcatctcatctgatgatgccaagagtgactcctttacaaatccgtgg1800 
aggctatgcactgtcacccaggtggaagaactgaaaattctaatcagaatgtttcccatt1860 tgggccactactattatattcaacgcggtgtatgctcagaactcttctatgttcatagag1920 cagggaatggttcttgacaagcgagttggatctttcattgtccctcctgcatccctctca1980 acttttgatgtcatcagtgtcatcatctggattccgttttatgaccgtgtgcttgtgcca2040 atagctagaaagttcactggaagggagaagggtttctctgagttacagcggattggaatc2100 ggattagccctctccatccttgcaatgctatctgcagctcttgttgagttgaggcgttta2160 gagatcgccagatctgaaggtcttattcatgaggatgttgctgttccgatgagcattctt2220 tggcaaataccgcagtatttcttggttggcgctgctgaggtctttgctgccataggtcag2280 gttgagttcttctacaatgaagcccctgatgccatgaggagtttgtgtagtgcatttgcg2340 cttgtaacagtctcactggggagctatttaagctcaatcatattaaccttggtgtcatat2400 tttacaactcaaggaggggatcctggatggatcccagataacctgaatgaaggccaccta2460 gatcggttcttttcattgattgctgggatcaactttgtgaatttactggttttcactggt2520 tgtgcaatgagatacagatacaagaaagcatgatgactgtactcatggtaaggtcagttt2580 gtgtaagtaataacagatttt 2601 <210> 104 <211> 1659 <212> DNA
<213> Oryza sativa strain IR64

<220>

<221> CDS

<222> (1)..(1659) <400> 104 atc ggg cgccctgca ttaaagcat gccactgga aattgg cgtgca 48 aaa Ile Gly ArgProAla LeuLysHis AlaThrGly AsnTrp ArgAla Lys tgt ttc atcctaggg gatgaatgc tgtgagaga ctggcc tattat 96 ttt Cys Phe I1eLeuGly AspGluCys CysGluArg LeuAla TyrTyr Phe ggt gca aagaaccta gttacttat ctgaaaaca aatctt catcaa 144 att Gly Ala LysAsnLeu ValThrTyr LeuLysThr AsnLeu HisGln Ile ggc ctt gaagetgca agaaatgtt acaacttgg cagggg acatgc 192 aac Gly Leu GluAlaAla ArgAsnVal ThrThrTrp GlnGly ThrCys Asn tac cta aca ccc ctc att ggt gcc ctc cta gca gat tct tac tgg gga 240 Tyr Leu Thr Pro Leu Ile Gly Ala Leu Leu Ala Asp Ser Tyr Trp Gly aag tac tgg act att get get ttc tca gca att tat ttt att ggt ctg 288 Lys Tyr Trp Thr Ile Ala Ala Phe Ser Ala Ile Tyr Phe Ile Gly Leu gtt get ttg acg ctg tca gca tca gtt cca get ctg cag ccg cct aaa 336 Val Ala Leu Thr Leu Ser Ala Ser Va1 Pro Ala Leu Gln Pro Pro Lys tgt tca gga tct att tgt cca gaa gca agc tta ctc cag tat ggt gta 384 Cys Ser Gly Ser Ile Cys Pro Glu Ala Ser Leu Leu Gln Tyr Gly Val ttt ttc tct ggc ctc tat atg ata gcc ctc ggg act gga ggc atc aaa 432 Phe Phe Ser Gly Leu Tyr Met Ile Ala Leu Gly Thr G1y Gly Ile Lys cct tgt gta tca tcc ttt gga get gat caa ttt gat gac agt gat cca 480 Pro Cys Val Ser Ser Phe Gly Ala Asp Gln Phe Asp Asp Ser Asp Pro gca gac aga gta aag aag ggc tcc ttc ttc aat tgg ttt tac ttc tgt 528 Ala Asp Arg Val Lys Lys Gly Ser Phe Phe Asn Trp Phe Tyr Phe Cys ata aat atc ggt gca ttt gta tca ggc acc gtt ata gtt tgg ata caa 576 Ile Asn Ile Gly A1a Phe Val Ser Gly Thr Val Ile Val Trp Ile Gln gat aac tca ggt tgg ggg ata gga ttt gcc att cct act ata ttt atg 624 Asp Asn Ser Gly Trp Gly Ile Gly Phe Ala Ile Pro Thr Tle Phe Met gca tta gcg att gca agt ttc ttt gtt gcc tca aat atg tac aga ttt 672 A1a Leu Ala Ile Ala Ser Phe Phe Val Ala Ser Asn Met Tyr Arg Phe cagaaacct ggtggaagc cctcttaca agagtgtgt caggttgtt gtt 720 GlnLysPro GlyGlySer ProLeuThr ArgVa1Cys GlnValVal Val gcagcattc cgtaagtgg cacactgaa gtgccacat gatacatct ctt 768 AlaAlaPhe ArgLysTrp 
HisThrGlu ValProHis AspThrSer Leu ttatatgag gttgatggc cagacttca gcgattgag ggaagccgg aag 816 LeuTyrGlu ValAspGly G1nThrSer AlaIleGlu GlySerArg Lys ctggagcac acaagtgaa cttgaattc tttgacaag getgccatc atc 864 LeuGluHis ThrSerGlu LeuGluPhe PheAspLys AlaAlaIle Ile tcatctgat gatgccaag agtgactcc tttacaaat ccgtggagg cta 912 SerSerAsp AspAlaLys SerAspSer PheThrAsn ProTrpArg Leu tgcactgtc acccaggtg gaagaactg aaaattcta atcagaatg ttt 960 CysThrVal ThrGlnVal GluGluLeu LysIleLeu IleArgMet Phe cccatttgg gccactact attatattc aacgcggtg tatgetcag aac 1008 ProIleTrp AlaThrThr IleIlePhe AsnAlaVal TyrAlaGln Asn tcttctatg ttcatagag cagggaatg gttcttgac aagcgagtt gga 1056 SerSerMet PheIleGlu GlnGlyMet ValLeuAsp LysArgVal Gly tctttcatt gtccctcct gcatccctc tcaactttt gatgtcatc agt 1104 SerPheIle ValProPro AlaSerLeu SerThrPhe AspValIle Ser gtcatcatc tggattccg ttttatgac cgtgtgctt gtgccaata get 1152 ValIleIle TrpIlePro PheTyrAsp ArgValLeu ValProI1e Ala agaaagttc actggaagg gagaagggt ttctctgag ttacagcgg att 1200 ArgLysPhe ThrGlyArg GluLysGly PheSerGlu LeuGlnArg Ile ggaatcgga ttagccctc tccatcctt gcaatgcta tctgcaget ctt 1248 GlyIleGly LeuAlaLeu SerIleLeu AlaMetLeu SerAlaAla Leu gttgagttg aggcgttta gagatcgcc agatctgaa ggtcttatt cat 1296 ValGluLeu ArgArgLeu GluIleAla ArgSerGlu GlyLeuIle His gaggatgtt getgttccg atgagcatt ctttggcaa ataccgcag tat 1344 GluAspVal AlaValPro MetSerIle LeuTrpGln IleProGln Tyr ttcttggtt ggcgetget gaggtcttt getgccata ggtcaggtt gag 1392 PheLeuVal GlyAlaAla GluValPhe AlaAlaIle GlyGlnVal Glu ttc ttc tac aat gaa gcc cct gat gcc atg agg agt ttg tgt agt gca 1440 Phe Phe Tyr Asn Glu Ala Pro Asp Ala Met Arg Ser Leu Cys Ser Ala ttt gcg ctt gta aca gtc tca ctg ggg agc tat tta agc tca atc ata 1488 Phe Ala Leu Val Thr Val Ser Leu Gly Ser Tyr Leu Ser Ser Ile Ile ttaaccttg gtgtcatat tttacaact caaggaggg gatcctgga tgg 1536 LeuThrLeu ValSerTyr PheThrThr GlnGlyGly AspProGly Trp atcccagat aacctgaat gaaggccac ctagatcgg ttcttttca ttg 1584 IleProAsp AsnLeuAsn GluGlyHis LeuAspArg PhePheSer Leu 
attgetggg atcaacttt gtgaattta ctggttttc actggttgt gca 1632 IleAlaGly IleAsnPhe ValAsnLeu LeuValPhe ThrGlyCys Ala atgagatac agatacaag aaagcatga 1659 MetArgTyr ArgTyrLys LysA1a <210> 105 <211> 552 <212> PRT
<213> Oryza sativa strain IR64 <400> 105
Ile Lys Gly Arg Pro Ala Leu Lys His Ala Thr Gly Asn Trp Arg Ala
Cys Phe Phe Ile Leu Gly Asp Glu Cys Cys Glu Arg Leu Ala Tyr Tyr
Gly Ile Ala Lys Asn Leu Val Thr Tyr Leu Lys Thr Asn Leu His Gln
Gly Asn Leu Glu Ala Ala Arg Asn Val Thr Thr Trp Gln Gly Thr Cys
Tyr Leu Thr Pro Leu Ile Gly Ala Leu Leu Ala Asp Ser Tyr Trp Gly
Lys Tyr Trp Thr Ile Ala Ala Phe Ser Ala Ile Tyr Phe Ile Gly Leu
Val Ala Leu Thr Leu Ser Ala Ser Val Pro Ala Leu Gln Pro Pro Lys
Cys Ser Gly Ser Ile Cys Pro Glu Ala Ser Leu Leu Gln Tyr Gly Val
Phe Phe Ser Gly Leu Tyr Met Ile Ala Leu Gly Thr Gly Gly Ile Lys
Pro Cys Val Ser Ser Phe Gly Ala Asp Gln Phe Asp Asp Ser Asp Pro
Ala Asp Arg Val Lys Lys Gly Ser Phe Phe Asn Trp Phe Tyr Phe Cys
Ile Asn Ile Gly Ala Phe Val Ser Gly Thr Val Ile Val Trp Ile Gln
Asp Asn Ser Gly Trp Gly Ile Gly Phe Ala Ile Pro Thr Ile Phe Met
Ala Leu Ala Ile Ala Ser Phe Phe Val Ala Ser Asn Met Tyr Arg Phe
Gln Lys Pro Gly Gly Ser Pro Leu Thr Arg Val Cys Gln Val Val Val
Ala Ala Phe Arg Lys Trp His Thr Glu Val Pro His Asp Thr Ser Leu
Leu Tyr Glu Val Asp Gly Gln Thr Ser Ala Ile Glu Gly Ser Arg Lys
Leu Glu His Thr Ser Glu Leu Glu Phe Phe Asp Lys Ala Ala Ile Ile
Ser Ser Asp Asp Ala Lys Ser Asp Ser Phe Thr Asn Pro Trp Arg Leu
Cys Thr Val Thr Gln Val Glu Glu Leu Lys Ile Leu Ile Arg Met Phe
Pro Ile Trp Ala Thr Thr Ile Ile Phe Asn Ala Val Tyr Ala Gln Asn
Ser Ser Met Phe Ile Glu Gln Gly Met Val Leu Asp Lys Arg Val Gly
Ser Phe Ile Val Pro Pro Ala Ser Leu Ser Thr Phe Asp Val Ile Ser
Val Ile Ile Trp Ile Pro Phe Tyr Asp Arg Val Leu Val Pro Ile Ala
Arg Lys Phe Thr Gly Arg Glu Lys Gly Phe Ser Glu Leu Gln Arg Ile
Gly Ile Gly Leu Ala Leu Ser Ile Leu Ala Met Leu Ser Ala Ala Leu
Val Glu Leu Arg Arg Leu Glu Ile Ala Arg Ser Glu Gly Leu Ile His
Glu Asp Val Ala Val Pro Met Ser Ile Leu Trp Gln Ile Pro Gln Tyr
Phe Leu Val Gly Ala Ala Glu Val Phe Ala Ala Ile Gly Gln Val Glu
Phe Phe Tyr Asn Glu Ala Pro Asp Ala Met Arg Ser Leu Cys Ser Ala
Phe Ala Leu Val Thr Val Ser Leu Gly Ser Tyr Leu Ser Ser Ile Ile
Leu Thr Leu Val Ser Tyr Phe Thr Thr Gln Gly Gly Asp Pro Gly Trp
Ile Pro Asp Asn Leu Asn Glu Gly His Leu Asp Arg Phe Phe Ser Leu
Ile Ala Gly Ile Asn Phe Val Asn Leu Leu Val Phe Thr Gly Cys Ala
Met Arg Tyr Arg Tyr Lys Lys Ala
<210> 106 <211> 2601 <212> DNA
<213> Oryza sativa strain Kasalath <400> 106 atcaaagggc gccctgcatt aaagcatgcc actggaaatt ggcgtgcatg ttttttcatc 60 ctaggtaatt tgttaaagat gcatagcata tcccataaaa ctttggcaca gtacttgaag 120 gaattccttg tttcttcgca ctagtgagta atgcttctat atggttttga atcgtacaat 180 cctgtatttaccttttgccaaaatattttggtgacatacaacacaaaagaaatgctggtc 240 gaagtaccagatagcatactttacagatcaattgaaaaatgctgtgcacatttttatctg 300 ttctgcaatgagtagctttgaagtttcagaaatgctagtttggtgacaggggatgaatgc 360 tgtgagagactggcctattatggtattgcaaagaacctagttacttatctgaaaacaaat 420 cttcatcaaggcaaccttgaagctgcaagaaatgttacaacttggcaggggacatgctac 480 ctaacacccctcattggtgccctcctagcagattcttactggggaaagtactggactatt 540 gctgctttctcagcaatttattttattgtaagtacaagcctattgctatagaagatatta 600 gatattacctacttcggtgcacttgcaccatgtgctgaactgatcttttcaaaataattt 660 catatctgaaacatggataatttctgaacttttttactgaagggtctggttgctttgacg 720 ctgtcagcatcagttccagctctgcagccgcctaaatgttcaggatctatttgtccagaa 780 gcaagcttactccagtatggtgtatttttctctggcctctatatgatagccctcgggact 840 ggaggcatcaaaccttgtgtatcatcctttggagctgatcaatttgatgacagtgatcca 900 gcagacagagtaaagaagggctccttcttcaattggttttacttctgtataaatatcggt 960 gcatttgtatcaggcactgttatagtttggatacaagataactcaggttgggggatagga 1020 tttgccattcctactatatttatggcattagcgattgcaagtttctttgttgcctcaaat 1080 atgtacagatttcagaaacctggtggaagccctcttacaagagtgtgtcaggttgttgtt 1140 gcagcattccgtaagtggcacactgaagtgccacatgatacatctcttttatatgaggtt 1200 gatggccagacttcagcgattgagggaagccggaagctggagcacacaagtgaacttgag1260 taattcctggatttttgcaatgcatcattgtctcacttttattcattctgttacaaagaa1320 aaaagggggaaagtctggatggggacaacaccagccatttgcagttggatgtatacataa1380 aactgatacactaccttcttgtactgttccattttgggattggtggaaattaaatactaa1440 atgcaacaaaaagaatatggataaggccatacagcagaacgctagtagtatattagtagt1500 ttgtccatggcatgcaattcttataagtctacttataattactattactggtgcctataa1560 ttaatatgggaccattagaggtatatttgtataatgactgaaaatatcagggtagcacaa1620 gcaatatatgtcagtaggtggcttgctttacagacacatttcttttacttttttttagac1680 aatataatatattgtgttttcttgtctgactgaaattactttttgttatacagattcttt1740 
gacaaggctgccatcatctcatctgatgatgccaagagtgactcctttacaaatccgtgg1800 aggctatgcactgtcacccaggtggaagaactgaaaattctaatcagaatgtttcccatt1860 tgggccacta ctattatatt caacgcggtg tatgctcaga actcttctat gttcatagag 1920 cagggaatgg ttcttgacaa gcgagttgga tctttcattg tccctcctgc atccctctca 1980 acttttgatgtcatcagtgtcatcatctggattccgtttaatgaccgtgtgcttgtgcca 2040 atagctagaaagttcactggaagggagaagggtttctctgagttacagcggattggaatc 2100 ggattagccctctccatccttgcaatgctatctgcagctcttgttgagttgaggcgttta 2160 gagatcgccagatctgaaggtcttattcatgaggatgttgctgttccgatgagcattctt 2220 tggcaaataccgcagtatttcttggttggcgctgctgaggtctttgctgccataggtcag 2280 gttgagttcttctacaatgaagcccctgatgccatgaggagtttgtgtagtgcatttgcg 2340 cttgtaacagtctcactggggagctatttaagctcaatcatattaaccttggtgtcatat 2400 tttacaactcaaggaggggatcctggatggatcccagataacctgaatgaaggccaccta 2460 gatcggttcttttcattgattgctgggatcaactttgtgaatttactggttttcactggt 2520 tgtgcaatgagatacagatacaagaaagcatgatgactgtactcatggtaaggtcagttt 2580 gtgtaagtaataacagatttt 2601 <210>

<211> 1659 <212> DNA
<213> Oryza sativa strain Kasalath <220>
<221> CDS
<222> (1)..(1659) <400> 107

atcaaagggcgc cctgcatta aagcatgcc actgga aattggcgt gca 48 IleLysGlyArg ProAlaLeu LysHisAla ThrGly AsnTrpArg Ala l 5 10 15 tgttttttcatc ctaggggat gaatgctgt gagaga ctggcctat tat 96 CysPhePheIle LeuGlyAsp GluCysCys GluArg LeuAlaTyr Tyr ggtattgcaaag aacctagtt acttatctg aaaaca aatcttcat caa 144 GlyIleAlaLys AsnLeuVal ThrTyrLeu LysThr AsnLeuHis Gln ggcaaccttgaa getgcaaga aatgttaca acttgg caggggaca tgc 192 GlyAsnLeuGlu AlaAlaArg AsnValThr ThrTrp GlnGlyThr Cys tacctaacaccc ctcattggt gccctccta gcagat tcttactgg gga 240 TyrLeuThrPro LeuIleGly A1aLeuLeu AlaAsp SerTyrTrp Gly ' aagtactggact attgetget ttctcagca atttat tttattggt ctg 288 LysTyrTrpThr IleAlaA1a PheSerAla IleTyr PheIleGly Leu gttgetttg acgctgtca gcatcagtt ccagetctg cagccgcct aaa 336 ValA1aLeu ThrLeuSer AlaSerVal ProAlaLeu GlnProPro Lys tgttcagga tctatttgt ccagaagca agcttactc cagtatggt gta 384 CysSerGly SerTleCys~ProGluAla SerLeuLeu GlnTyrGly Val tttttctct ggcctctat atgatagcc ctcgggact ggaggcatc aaa 432 PhePheSer GlyLeuTyr MetIleAla LeuGlyThr GlyGlyIle Lys ccttgtgta tcatccttt ggagetgat caatttgat gacagtgat cca 480 ProCysVal SerSerPhe GlyAlaAsp GlnPheAsp AspSerAsp Pro gcagacaga gtaaagaag ggctccttc ttcaattgg ttttacttc tgt 528 AlaAspArg ValLysLys G1ySerPhe PheAsnTrp PheTyrPhe Cys ataaatatc ggtgcattt gtatca ggcactgtt atagtttgg atacaa 576 IleAsnIle GlyAlaPhe ValSer GlyThrVal IleValTrp IleGln gataactca ggttggggg atagga tttgccatt cctactata tttatg 624 AspAsnSer GlyTrpGly IleGly PheAlaIle ProThrIle PheMet gcattagcg attgcaagt ttcttt gttgcctca aatatgtac agattt 672 AlaLeuAla I1eAlaSer PhePhe ValAlaSer AsnMetTyr ArgPhe cagaaacct ggtggaagc cctctt acaagagtg tgtcaggtt gttgtt 720 GlnLysPro GlyGlySer ProLeu ThrArgVal CysGlnVal ValVal gcagcattc cgtaagtgg cacact gaagtgcca catgataca tctctt 768 AlaA1aPhe ArgLysTrp HisThr GluValPro HisAspThr SerLeu ttatatgag gttgatggc cagact tcagcgatt gagggaagc cggaag 816 LeuTyrGlu Va1AspGly GlnThr SerAlaIle GluGlySer ArgLys ctggagcac acaagtgaa cttgaa ttctttgac aaggetgcc atcatc 864 LeuGluHis ThrSerGlu LeuGlu 
PhePheAsp LysAlaAla TleIle tcatctgat gatgccaag agtgac tcctttaca aatccgtgg aggcta 912 SerSerAsp AspAlaLys SerAsp SerPheThr AsnProTrp ArgLeu tgcactgtc acccaggtg gaagaa ctgaaaatt ctaatcaga atgttt 960 CysThrVal ThrGlnVal GluGlu LeuLysIle LeuIleArg MetPhe cccatttgg gccactact attata ttcaacgcg gtgtatget cagaac 1008 ProIleTrp AlaThrThr IleIle PheAsnAla ValTyrAla GlnAsn tct tct atg ttc ata gag cag gga atg gtt ctt gac aag cga gtt gga 1056 Ser Ser Met Phe Ile Glu Gln Gly Met Val Leu Asp Lys Arg Val Gly tctttcatt gtccctcct gcatccctc tcaactttt gatgtcatc agt 1104 SerPheIle ValProPro AlaSerLeu SerThrPhe AspValIle Ser gtcatcatc tggattccg tttaatgac cgtgtgctt gtgccaata get 1152 ValTleIle TrpIlePro PheAsnAsp ArgValLeu ValProIle Ala agaaagttc actggaagg gagaagggt ttctctgag ttacagcgg att 1200 ArgLysPhe ThrGlyArg GluLysGly PheSexGlu LeuGlnArg Ile ggaatcgga ttagccctc tccatcctt gca.atgcta tctgcaget ctt 1248 GlyIleGly LeuAlaLeu SerIleLeu A1aMetLeu SerAlaAla Leu gttgagttg aggcgttta gagatcgcc agatctgaa ggtcttatt cat 1296 ValGluLeu ArgArgLeu GluIleAla ArgSerGlu GlyLeuIle His gaggatgtt getgttccg atgagcatt ctttggcaa ataccgcag tat 1344 GluAspVal AlaValPro MetSerIle LeuTrpGln IleProGln Tyr ttcttggtt ggcgetget gaggtcttt getgccata ggtcaggtt gag 1392 PheLeuVal GlyAlaAla GluValPhe AlaAlaIle GlyGlnVal Glu ttcttctac aatgaagcc cctgatgcc atgaggagt ttgtgtagt gca 1440 PhePheTyr AsnGluAla ProAspAla MetArgSer LeuCysSer A1a tttgcgctt gtaacagtc tcactgggg agctattta agctcaatc ata 1488 PheAlaLeu ValThrVal SerLeuG1y SerTyrLeu SerSerIle Ile ttaaccttggtg tcatat tttacaact caaggaggggat cctgga tgg 1536 LeuThrLeuVal SerTyr PheThrThr GlnGlyG1yAsp ProGly Trp atcccagataac ctgaat gaaggccac ctagatcggttc ttttca ttg 1584 I1eProAspAsn LeuAsn G1uGlyHis LeuAspArgPhe PheSer Leu attgetgggatc aacttt gtgaattta ctggttttcact ggttgt gca 1632 IleAlaGlyIle AsnPhe ValAsnLeu LeuValPheThr GlyCys Ala atgagatacaga tacaag aaagcatga 1659 MetArgTyrArg TyrLys LysAla <210> 108 <211> 552 <212> PRT
<213> Oryza sativa strain Kasalath <400> 108
Ile Lys Gly Arg Pro Ala Leu Lys His Ala Thr Gly Asn Trp Arg Ala
Cys Phe Phe Ile Leu Gly Asp Glu Cys Cys Glu Arg Leu Ala Tyr Tyr
Gly Ile Ala Lys Asn Leu Val Thr Tyr Leu Lys Thr Asn Leu His Gln
Gly Asn Leu Glu Ala Ala Arg Asn Val Thr Thr Trp Gln Gly Thr Cys
Tyr Leu Thr Pro Leu Ile Gly Ala Leu Leu Ala Asp Ser Tyr Trp Gly
Lys Tyr Trp Thr Ile Ala Ala Phe Ser Ala Ile Tyr Phe Ile Gly Leu
Val Ala Leu Thr Leu Ser Ala Ser Val Pro Ala Leu Gln Pro Pro Lys
Cys Ser Gly Ser Ile Cys Pro Glu Ala Ser Leu Leu Gln Tyr Gly Val
Phe Phe Ser Gly Leu Tyr Met Ile Ala Leu Gly Thr Gly Gly Ile Lys
Pro Cys Val Ser Ser Phe Gly Ala Asp Gln Phe Asp Asp Ser Asp Pro
Ala Asp Arg Val Lys Lys Gly Ser Phe Phe Asn Trp Phe Tyr Phe Cys
Ile Asn Ile Gly Ala Phe Val Ser Gly Thr Val Ile Val Trp Ile Gln
Asp Asn Ser Gly Trp Gly Ile Gly Phe Ala Ile Pro Thr Ile Phe Met
Ala Leu Ala Ile Ala Ser Phe Phe Val Ala Ser Asn Met Tyr Arg Phe
Gln Lys Pro Gly Gly Ser Pro Leu Thr Arg Val Cys Gln Val Val Val
Ala Ala Phe Arg Lys Trp His Thr Glu Val Pro His Asp Thr Ser Leu
Leu Tyr Glu Val Asp Gly Gln Thr Ser Ala Ile Glu Gly Ser Arg Lys
Leu Glu His Thr Ser Glu Leu Glu Phe Phe Asp Lys Ala Ala Ile Ile
Ser Ser Asp Asp Ala Lys Ser Asp Ser Phe Thr Asn Pro Trp Arg Leu
Cys Thr Val Thr Gln Val Glu Glu Leu Lys Ile Leu Ile Arg Met Phe
Pro Ile Trp Ala Thr Thr Ile Ile Phe Asn Ala Val Tyr Ala Gln Asn
Ser Ser Met Phe Ile Glu Gln Gly Met Val Leu Asp Lys Arg Val Gly
Ser Phe Ile Val Pro Pro Ala Ser Leu Ser Thr Phe Asp Val Ile Ser
Val Ile Ile Trp Ile Pro Phe Asn Asp Arg Val Leu Val Pro Ile Ala
Arg Lys Phe Thr Gly Arg Glu Lys Gly Phe Ser Glu Leu Gln Arg Ile
Gly Ile Gly Leu Ala Leu Ser Ile Leu Ala Met Leu Ser Ala Ala Leu
Val Glu Leu Arg Arg Leu Glu Ile Ala Arg Ser Glu Gly Leu Ile His
Glu Asp Val Ala Val Pro Met Ser Ile Leu Trp Gln Ile Pro Gln Tyr
Phe Leu Val Gly Ala Ala Glu Val Phe Ala Ala Ile Gly Gln Val Glu
Phe Phe Tyr Asn Glu Ala Pro Asp Ala Met Arg Ser Leu Cys Ser Ala
Phe Ala Leu Val Thr Val Ser Leu Gly Ser Tyr Leu Ser Ser Ile Ile
Leu Thr Leu Val Ser Tyr Phe Thr Thr Gln Gly Gly Asp Pro Gly Trp
Ile Pro Asp Asn Leu Asn Glu Gly His Leu Asp Arg Phe Phe Ser Leu
Ile Ala Gly Ile Asn Phe Val Asn Leu Leu Val Phe Thr Gly Cys Ala
Met Arg Tyr Arg Tyr Lys Lys Ala
<210> 109 <211> 2599 <212> DNA
<213> Oryza sativa strain Lemont <400> 109
atcaaagggcgccctgcattaaagcatgccactggaaattggcgtgcatgttttttcatc60 ctaggtaatttgttaaagatgcatagcatatcccataaaactttggcacagtacttgaag120 gaattccttgtttcttcgcactagtgagtaatgcttctatatggttttgaatcgtacaat180 cctgtatttaccttttgccaaaatattttggtgatatacaacacaaaagaaatgctggtc240 gaagtaccagatagcatactttacagatcaattgaaaaatgctgtgcacatttttatctg300 ttctgcaatgagtagctttgaagtttcagaaatgctagtttggtgacaggggatgaatgc360 tgtgagagactggcctattatggtattgcaaagaacctagttacttatctgaaaacaaat420 cttcatcaaggcaaccttgaagctgcaagaaatgttacaacttggcaggggacatgctac480 ctaacacccctcattggtgccctcctagcagattcttactggggaaagtactggactatt540 gctgctttctcagcaatttattttattgtaagtacaagcctattgctatagaagatatta600 gatattacctacttcggtgcacttgcaccatgtgctgaactgatcttttcaaaataattt660 catatctgaaacatggataatttctgaacttttttactgaagggtctggttgctttgacg720 ctgtcagcatcagttccagctctgcagccgcctaaatgttcaggatctatttgtccagaa780 gcaagcttactccagtatggtgtatttttctctggcctctatatgatagccctcgggact840 ggaggcatcaaaccttgtgtatcatcctttggagctgatcaatttgatgacagtgatcca900 gcagacagagtaaagaagggctccttcttcaattggttttacttctgtataaatatcggt960 gcatttgtatcaggcaccgttatagtttggatacaagataactcaggttgggggatagga1020 tttgccattcctactatatttatggcattagcgattgcaagtttctttgttgcctcaaat 1080 atgtacagatttcagaaacctggtggaagccctcttacaagagtgtgtcaggttgttgtt 1140 gcagcattccgtaagtggcacactgaagtgccacatgatacatctcttttatatgaggtt 1200 gatggccagacttcagcgattgagggaagccggaagctggagcacacaagtgaacttgag 1260 taattcctggatttttgcaatgcatcattgtctcacttttattcattctgttacaaagaa 1320 aaaaggaggaaagtctggatggggacaacaccagccatttgcagttggatgtatacataa 1380 aactgatacactaccttcttgtactgttccattttgggattggtggaaattaaatactaa 1440 atgcaacaaaaagaatatggataaggccatacagcagaacgctagtagtatattagtagt 1500 ttgtccatggcatgcaattcttataagtctacttataattactattactggtgcctataa 1560 ttaatatgggaccattagaggtatatttgtataatgactgaaaatatcagggtagcacaa 1620 gcaatatatgtcagtaggtggcttgctttacagacacatttcttttacttttttagacaa 1680 tataatatattgtgttttcttgtctgactgaaattactttttgttatacagattctttga 1740 caaggctgccatcatctcatctgatgatgccaagagtgactcctttacaaatccgtggag 1800 
gctatgcactgtcacccaggtggaagaactgaaaattctaatcagaatgtttcccatttg 1860 ggccactactattatattcaacgcggtgtatgctcacaactcttctatgttcatagagca1920 gggaatggttcttgacaagcgagttggatctttcattgtccctcctgcatccctctcaac1980 ttttgatgtcatcagtgtcatcatctggattccgttttatggccgtgtgcttgtgccaat2040 agctagaaagttcactggaagggagaagggtttctctgagttacagcggattggaatcgg2100 attagccctctccatccttgcaatgctatctgcagctcttgttgagttgaggcgtttagg2160 gatcgccagatctgaaggtcttattcatgaggatgttgctgttccgatgagcattctttg2220 gcaaataccgcagtatttcttggttggcgctgctgaggtctttgctgccataggtcaggt2280 tgagttcttctacaatgaagcccctgatgccatgaggagtttgtgtagtgcatttgcgct2340 tgtaacagtctcactggggagctatttaagctcaatcatattaaccttggtgtcatattt2400 tacaactcaaggaggggatcctggatggatcccagataacctgaatgaaggccacctaga2460 tcggttcttttcattgattgctgggatcaactttgtgaatttactggttttcactggttg.2520 tgcaatgaga tacagataca agaaagcatg atgactgtac tcatggtaag gtcagtttgt 2580 gttagtaata acagatttt 2599 <210> 110 <211> 1659 <212> DNA
<213> Oryza sativa strain Lemont <220>
<221> CDS
<222> (1)..(1659) <400> 110 atcaaaggg cgccctgca ttaaagcat gccactgga aattggcgt gca 48 IleLysGly ArgProAla LeuLysHis AlaThrGly AsnTrpArg Ala tgttttttc atcctaggg gatgaatgc tgtgagaga ctggcctat tat 96 CysPhePhe IleLeuGly AspGluCys CysGluArg LeuAlaTyr Tyr ggtattgca aagaaccta gttacttat ctgaaaaca aatcttcat caa 144 GlyIleAla LysAsnLeu ValThrTyr LeuLysThr AsnLeuHis Gln ggcaacctt gaagetgca agaaatgtt acaacttgg caggggaca tgc 192 GlyAsnLeu GluAlaAla ArgAsnVal ThrThrTrp GlnGlyThr Cys tacctaaca cccctcatt ggtgccctc ctagcagat tcttactgg gga 240 TyrLeuThr ProLeuIle GlyAlaLeu LeuAlaAsp SerTyrTrp Gly aagtactgg actattget getttctca gcaatttat tttattggt ctg 288 LysTyrTrp ThrIleAla AlaPheSer AlaI1eTyr PheIleGly Leu gttgetttg acgctgtca gcatcagtt ccagetctg cagccgcct aaa 336 ValAlaLeu ThrLeuSer AlaSerVal ProAlaLeu GlnProPro Lys tgttcagga tctatttgt ccagaagca agcttactc cagtatggt gta 384 CysSerGly SerIleCys ProGluAla SerLeuLeu GlnTyrGly Val tttttctct ggcctctat atgatagcc ctcgggact ggaggcatc aaa 432 PhePheSer GlyLeuTyr MetIleAla LeuGlyThr GlyGlyIle Lys ccttgtgta tcatccttt ggagetgat caatttgat gacagtgat cca 480 ProCysVal SerSerPhe GlyAlaAsp GlnPheAsp AspSerAsp Pro gcagacaga gtaaagaag ggctccttc ttcaattgg ttttacttc tgt 528 AlaAspArg ValLysLys GlySerPhe PheAsnTrp PheTyrPhe Cys ataaatatc ggtgcattt gtatcaggc accgttata gtttggata caa 576 IleAsnIle GlyAlaPhe ValSerGly ThrVa1Ile ValTrpIle Gln gataactca ggttggggg ataggattt gccattcct actatattt atg 624 AspAsnSer GlyTrpGly IleGlyPhe AlaIlePro ThrIlePhe Met gcattagcg attgcaagt ttctttgtt gcctcaaat atgtacaga ttt 672 AlaLeuAla IleAlaSer PhePheVal AlaSerAsn MetTyrArg Phe cagaaacct ggtggaagc cctcttaca agagtgtgt caggttgtt gtt 720 GlnLysPro GlyGlySer ProLeuThr ArgVa1Cys G1nValVal Val gcagcattc cgtaagtgg cacactgaa gtgccacat gatacatct ctt 768 AlaAlaPhe ArgLysTrp HisThrGlu ValProHis AspThrSer Leu ttatatgag gttgatggc cagacttca gcgattgag ggaagccgg aag 816 LeuTyrGlu ValAspGly GlnThrSer AlaIleGlu GlySerArg Lys ctggagcac acaagtgaa cttgaattc tttgacaag getgccatc atc 864 LeuGluHis 
ThrSerGlu LeuGluPhe PheAspLys AlaAlaIle Ile tcatctgat gatgccaag agtgactcc tttacaaat ccgtggagg cta 912 SerSerAsp AspAlaLys SerAspSer PheThrAsn ProTrpArg Leu tgcactgtc acccaggtg gaagaactg aaaattcta atcagaatg ttt 960 CysThrVal ThrGlnVal GluGluLeu LysIleLeu IleArgMet Phe cccatttgg gccactact attatattc aacgcggtg tatgetcac aac 1008 ProIleTrp AlaThrThr IleIlePhe AsnAlaVal TyrAlaHis Asn tcttctatg ttcatagag cagggaatg gttcttgac aagcgagtt gga 1056 SerSerMet PheIleGlu GlnGlyMet ValLeuAsp LysArgVal Gly tctttcatt gtccctcct gcatccctc tcaactttt gatgtcatc agt 1104 SerPheIle ValProPro AlaSerLeu SerThrPhe AspValIle Ser gtcatcatc tggattccg ttttatggc cgtgtgctt gtgccaata get 1152 V~1IleIle TrpI1ePro PheTyrGly ArgValLeu ValProT1e Ala agaaagttc actggaagg gagaagggt ttctctgag ttacagcgg att 1200 ArgLysPhe ThrGlyArg GluLysGly PheSerGlu LeuGlnArg Ile ggaatcgga ttagccctc tccatcctt gcaatgcta tctgcaget ctt 1248 GlyIleGly LeuAlaLeu SerIleLeu AlaMetLeu SerAlaAla Leu gttgagttg aggcgttta gggatcgcc agatctgaa ggtcttatt cat 1296 ValGluLeu ArgArgLeu GlyIleAla ArgSerGlu GlyLeuIle His gaggatgtt getgttccg atgagcatt ctttggcaa ataccgcag tat 1344 GluAspVal A1aValPro MetSerIle LeuTrpGln IleProGln Tyr ttcttggtt ggcgetget gaggtcttt getgccata ggtcaggtt gag 1392 PheLeuVal GlyAlaAla GluValPhe AlaAlaTle GlyGlnVal Glu ttcttctac aatgaagcc cctgatgcc atgaggagt ttgtgtagt gca 1440 Phe Phe Tyr Asn Glu Ala Pro Asp Ala Met Arg Ser Leu Cys Ser Ala tttgcgcttgta acagtc tcactgggg agctattta agctcaatc ata 188 PheAlaLeuVal ThrVal SerLeuGly SerTyrLeu SerSerIle Ile ttaaccttggtg tcatat tttacaact caaggaggg gatcctgga tgg 1536 LeuThrLeuVal SerTyr PheThrThr GlnGlyGly AspProGly Trp atcccagataac ctgaat gaaggccac ctagatcgg ttcttttca ttg 1584 IleProAspAsn LeuAsn GluGlyHis LeuAspArg PhePheSer Leu attgetgggatc aacttt gtgaattta ctggttttc actggttgt gca 1632 IleAlaGlyIle AsnPhe ValAsnLeu LeuValPhe ThrGlyCys Ala atgagatacaga tacaag aaagcatga 1659 MetArgTyrArg TyrLys LysAla <210> 111 <211> 552 <212> PRT
<213> Oryza sativa strain Lemont <400> 111
Ile Lys Gly Arg Pro Ala Leu Lys His Ala Thr Gly Asn Trp Arg Ala
Cys Phe Phe Ile Leu Gly Asp Glu Cys Cys Glu Arg Leu Ala Tyr Tyr
Gly Ile Ala Lys Asn Leu Val Thr Tyr Leu Lys Thr Asn Leu His Gln
Gly Asn Leu Glu Ala Ala Arg Asn Val Thr Thr Trp Gln Gly Thr Cys
Tyr Leu Thr Pro Leu Ile Gly Ala Leu Leu Ala Asp Ser Tyr Trp Gly
Lys Tyr Trp Thr Ile Ala Ala Phe Ser Ala Ile Tyr Phe Ile Gly Leu
Val Ala Leu Thr Leu Ser Ala Ser Val Pro Ala Leu Gln Pro Pro Lys
Cys Ser Gly Ser Ile Cys Pro Glu Ala Ser Leu Leu Gln Tyr Gly Val
Phe Phe Ser Gly Leu Tyr Met Ile Ala Leu Gly Thr Gly Gly Ile Lys
Pro Cys Val Ser Ser Phe Gly Ala Asp Gln Phe Asp Asp Ser Asp Pro
Ala Asp Arg Val Lys Lys Gly Ser Phe Phe Asn Trp Phe Tyr Phe Cys
Ile Asn Ile Gly Ala Phe Val Ser Gly Thr Val Ile Val Trp Ile Gln
Asp Asn Ser Gly Trp Gly Ile Gly Phe Ala Ile Pro Thr Ile Phe Met
Ala Leu Ala Ile Ala Ser Phe Phe Val Ala Ser Asn Met Tyr Arg Phe
Gln Lys Pro Gly Gly Ser Pro Leu Thr Arg Val Cys Gln Val Val Val
Ala Ala Phe Arg Lys Trp His Thr Glu Val Pro His Asp Thr Ser Leu
Leu Tyr Glu Val Asp Gly Gln Thr Ser Ala Ile Glu Gly Ser Arg Lys
Leu Glu His Thr Ser Glu Leu Glu Phe Phe Asp Lys Ala Ala Ile Ile
Ser Ser Asp Asp Ala Lys Ser Asp Ser Phe Thr Asn Pro Trp Arg Leu
Cys Thr Val Thr Gln Val Glu Glu Leu Lys Ile Leu Ile Arg Met Phe
Pro Ile Trp Ala Thr Thr Ile Ile Phe Asn Ala Val Tyr Ala His Asn
Ser Ser Met Phe Ile Glu Gln Gly Met Val Leu Asp Lys Arg Val Gly
Ser Phe Ile Val Pro Pro Ala Ser Leu Ser Thr Phe Asp Val Ile Ser

JUMBO APPLICATIONS/PATENTS
THIS SECTION OF THE APPLICATION/PATENT CONTAINS MORE THAN ONE VOLUME.

NOTE: For additional volumes, please contact the Canadian Patent Office.

Claims (78)

What is claimed is:
1. A method for identifying a polynucleotide sequence encoding a polypeptide of a domesticated organism, wherein said polypeptide is or is suspected of being associated with increased yield in said domesticated organism as compared to a wild ancestor of said domesticated organism, comprising the steps of:
a) comparing polypeptide-coding nucleotide sequences of said domesticated organism to polypeptide-coding nucleotide sequences of said wild ancestor; and b) selecting a polynucleotide sequence in the domesticated organism that contains a nucleotide change as compared to a corresponding sequence in the wild ancestor, wherein said change is evolutionarily significant, whereby the domesticated organism's polynucleotide sequence is identified.
2. The method of claim 1 wherein said polypeptide that is associated with improved yield is an EG307 or EG1117 polypeptide.
3. A method for producing a transfected plant cell or transgenic plant comprising the steps of:
a) transfecting a plant cell to contain a heterologous DNA segment encoding a polypeptide and derived from an EG307 or EG1117 polynucleotide not native to said cell;
wherein said polynucleotide is operably linked to a promoter that can be used effectively for expression of transgenic proteins;
b) optionally growing and maintaining said cell under conditions whereby a transgenic plant is regenerated therefrom;
c) optionally growing said transgenic plant under conditions whereby said DNA is expressed, whereby the total amount of EG307 or EG1117 polypeptide in said plant is increased.
4. The method of claim 3, further comprising the step of obtaining and growing additional generations of descendants of said transgenic plant which comprise said heterologous DNA segment wherein said heterologous DNA segment is expressed.
5. Plant cells, comprising heterologous DNA encoding an EG307 or EG1117 polypeptide.
6. A propagation material of a transgenic plant comprising the transgenic plant cell according to claim 5.
7. A transgenic plant containing heterologous DNA which encodes an EG307 or EG1117 polypeptide that is expressed in plant tissue.
8. An isolated polynucleotide which includes a promoter operably linked to a polynucleotide that encodes the EG307 or EG1117 gene in plant tissue.
9. The isolated polynucleotide of Claim 8, wherein said polynucleotide is a recombinant polynucleotide.
10. The method of claim 8, wherein the promoter is the promoter native to an EG307 or EG1117 gene.
11. A method of making a transfected cell comprising:
a) identifying an evolutionarily significant EG307 or EG1117 polynucleotide in an ancestor of a domesticated plant or a corresponding polynucleotide in a domesticated plant;
b) using said EG307 or EG1117 polynucleotide to identify a non-polypeptide coding sequence that may be a transcription or translation regulatory element, enhancer, intron or other 5' or 3' flanking sequence;
c) assembling a construct comprising said non-polypeptide coding sequence operably linked to a polynucleotide encoding a reporter protein; and d) transfecting said construct into a host cell.
12. A transfected cell produced according to the method of claim 11.
13. A method of making a transgenic plant comprising the method of claim 11, wherein said host cell is a plant cell, further comprising the step of growing and maintaining said cell under conditions whereby said transgenic plant is regenerated therefrom.
14. A transgenic plant produced by the method of claim 12.
15. A method of identifying an agent that modulates the function of the non-polypeptide coding regions of an evolutionarily significant EG307 or EG1117 polynucleotide, comprising contacting the transfected host cell of claim 11 with at least one candidate agent, wherein the agent is identified by its ability to modulate the transcription or translation of said reporter polynucleotide.
16. An agent identified by the method of claim 15.
17. A method of identifying an agent that modulates the function of the non-polypeptide coding regions of an evolutionarily significant EG307 or EG1117 polynucleotide, comprising contacting the transgenic plant of claim 13 with at least one candidate agent, wherein the agent is identified by its ability to modulate the transcription or translation of said reporter polynucleotide.
18. An agent identified by the method of claim 17.
19. A transfected host cell comprising a host cell transfected with a construct comprising a promoter, enhancer or intron polynucleotide from an evolutionarily significant EG307 or EG1117 polynucleotide or any combination thereof, operably linked to a polynucleotide encoding a reporter protein.
20. A method of identifying an agent which may modulate yield, said method comprising contacting at least one candidate agent with a plant or cell comprising an EG307 or EG1117 gene, wherein the agent is identified by its ability to modulate yield.
21. The method of Claim 20, wherein the plant or cell is transfected with a polynucleotide encoding an EG307 or EG1117 gene.
22. An agent identified according to the method of claim 20.
23. The method of claim 20, wherein said identified agent modulates yield by modulating a function of the polynucleotide encoding the polypeptide.
24. The method of claim 20, wherein said identified agent modulates yield by modulating a function of the polypeptide.
25. A method of providing increased yield in a plant comprising:
a) producing a transfected plant cell having heterologous DNA encoding an EG307 or EG1117 polypeptide whereby EG307 or EG1117 is expressed in said plant cell;
and b) growing a transgenic plant from the transfected plant cell wherein the EG307 or EG1117 transgene is expressed in the transgenic plant.
26. The method of claim 25, wherein the transgene is under the control of regulatory sequences suitable for controlled expression of the transgene.
27. A method of producing an EG307 or EG1117 polypeptide comprising:
a) providing a cell transfected with a polynucleotide encoding an EG307 or EG1117 polypeptide positioned for expression in the cell;
b) culturing the transfected cell under conditions for expressing the polynucleotide;
and c) isolating the EG307 or EG1117 polypeptide.
28. The method of claim 25 wherein said heterologous DNA encoding the EG307 or EG1117 gene is under the control of a promoter providing constitutive expression of the EG307 or EG1117 gene.
29. The method of claim 25 wherein said heterologous DNA encoding the EG307 or EG1117 gene is under the control of a promoter providing controllable expression of the EG307 or EG1117 gene.
30. The method of claim 29, wherein the EG307 or EG1117 gene is expressed using a tissue-specific or cell type-specific promoter, or using a promoter that is activated by the introduction of an external signal or agent, such as a chemical signal or agent.
31. A method of isolating a yield-related gene or fragment thereof from a plant cell, comprising:
a) providing a sample of plant cell polynucleotides;
b) providing a pair of oligonucleotides having sequence homology to a conserved region of an EG307 or EG1117 gene;
c) combining the pair of oligonucleotides with the plant cell polynucleotides sample under conditions suitable for polymerase chain reaction-mediated polynucleotide amplification; and d) isolating the amplified yield-related polynucleotide or fragment thereof.
32. A plant yield-related polynucleotide isolated according to the method of claim 31.
33. A method of isolating a yield-related polynucleotide comprising:
a) providing a preparation of polynucleotides selected from the group consisting of genomic plant cell DNA and recombinant plant cell library polynucleotides;
b) contacting the preparation with an EG307 or EG1117 oligonucleotide under hybridization conditions providing detection of polynucleotides having 50% or greater sequence identity; and c) isolating a yield-related polynucleotide by its association with the EG307 or EG1117 oligonucleotide.
34. The method of Claim 33, wherein the EG307 or EG1117 oligonucleotide is detectably-labelled and the yield-related gene is isolated by its association with the detectable label.
35. The method of Claim 33, wherein the EG307 or EG1117 oligonucleotide is at least 12 nucleotides in length.
36. The method of Claim 33, wherein the EG307 or EG1117 oligonucleotide is at least 30 nucleotides in length.
37. A method of identifying a plant yield-related gene comprising:
a) providing a plant tissue sample;
b) introducing into the plant tissue sample a candidate plant yield-related gene;
c) expressing the candidate plant yield-related gene within the plant tissue sample; and d) determining whether the plant tissue sample exhibits change in yield response, whereby a change in response identifies a plant yield-related gene.
38. A plant yield-related gene isolated according to the method of Claim 37.
39. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:97, and SEQ ID NO:98; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
40. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:100, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:116, and SEQ ID NO:117; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
41. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:141, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:154, and SEQ ID NO:155; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
42. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:157, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:161, SEQ ID NO:162, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, and SEQ ID NO:168; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
43. An isolated polypeptide selected from the group consisting of:
a) a polypeptide encoded by a polynucleotide selected from the group consisting of SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:96, SEQ ID NO:97, and SEQ ID NO:98; and b) a polypeptide encoded by a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
44. An isolated polypeptide selected from the group consisting of:
a) a polypeptide encoded by a polynucleotide selected from the group consisting of SEQ ID NO:100, SEQ ID NO:101, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:109, SEQ ID NO:110, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:116, and SEQ ID NO:117; and b) a polypeptide encoded by a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
45. An isolated polypeptide selected from the group consisting of:
a) a polypeptide encoded by a polynucleotide selected from the group consisting of SEQ ID NO:119, SEQ ID NO:120, SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125, SEQ ID NO:127, SEQ ID NO:128, SEQ ID NO:129, SEQ ID NO:130, SEQ ID NO:131, SEQ ID NO:133, SEQ ID NO:135, SEQ ID NO:136, SEQ ID NO:137, SEQ ID NO:138, SEQ ID NO:140, SEQ ID NO:141, SEQ ID NO:142, SEQ ID NO:144, SEQ ID NO:145, SEQ ID NO:146, SEQ ID NO:147, SEQ ID NO:149, SEQ ID NO:150, SEQ ID NO:151, SEQ ID NO:152, SEQ ID NO:154, and SEQ ID NO:155; and b) a polypeptide encoded by a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
46. An isolated polypeptide selected from the group consisting of:
a) a polypeptide encoded by a polynucleotide selected from the group consisting of SEQ ID NO:157, SEQ ID NO:158, SEQ ID NO:160, SEQ ID NO:161, SEQ ID NO:162, SEQ ID NO:163, SEQ ID NO:165, SEQ ID NO:166, SEQ ID NO:167, and SEQ ID NO:168; and b) a polypeptide encoded by a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
47. A method for identifying a polynucleotide sequence encoding a polypeptide of a wild ancestor or domesticated organism, wherein said polypeptide is or is suspected of being associated with an enhanced economic productivity in said wild ancestor or domesticated organism, comprising the steps of:
a) comparing polypeptide-coding nucleotide sequences of said wild ancestor and said domesticated organism; and b) selecting a polynucleotide sequence in either the domesticated organism or the wild ancestor that contains a nucleotide change as compared to the corresponding sequence in the wild ancestor or domesticated organism, respectively, wherein said change is evolutionarily neutral or positively evolutionarily significant, whereby the polynucleotide which encodes a polypeptide associated with enhanced economic productivity is identified.
48. A method for identifying a polynucleotide sequence encoding a polypeptide of a wild ancestor of a domesticated organism, wherein said polypeptide is or is suspected of being associated with a stress-resistance trait that is unique, enhanced or altered in the wild ancestor of the domesticated organism as compared to the domesticated organism, comprising the steps of:
a) comparing polypeptide-coding nucleotide sequences of said domesticated organism to polypeptide-coding nucleotide sequences of said wild ancestor; and b) selecting a polynucleotide sequence in the wild ancestor that contains a nucleotide change as compared to a corresponding sequence in the domesticated organism, wherein said change is evolutionarily neutral, whereby the wild ancestor's polynucleotide sequence is identified.
49. The method of claim 48 wherein said domesticated organism is a plant selected from the group consisting of maize, rice, tomato, potato and other domesticated plants whose ancestor is known.
50. The method of claim 49 wherein said domesticated plant is maize and said wild ancestor is teosinte.
51. The method of claim 48 wherein the protein-coding nucleotide sequences of said domesticated organism correspond to cDNA.
52. The method of claim 48, wherein the nucleotide change is a non-synonymous substitution.
53. The method of claim 48, wherein the domesticated organism is a plant and the stress-resistance trait is selected from the group consisting of drought resistance, disease resistance, pest resistance, high salt level resistance and other stress-resistance traits of commercial interest.
54. A method of identifying an agent which may modulate a stress-resistance trait in a wild ancestor of a domesticated organism, said method comprising contacting at least one candidate agent with the wild ancestor, the domesticated organism, or with a cell or transgenic organism that expresses the polynucleotide sequence identified in claim 48, wherein the agent is identified by its ability to modulate the function of the polynucleotide or of the polypeptide encoded by the identified polynucleotide sequence.
55. A method for modulating stress-resistance in a wild ancestor of a domesticated organism by administering the agent of claim 54.
56. A method of identifying an agent to a polypeptide sequence encoded by a polynucleotide sequence in a domesticated organism that corresponds to the wild ancestor stress-resistance polynucleotide sequence of claim 48, comprising contacting at least one candidate agent with the domesticated organism, the ancestor organism, or with a cell or transgenic organism that expresses the polynucleotide sequence, whereby the agent is identified by its ability to modulate function of the polypeptide sequence.
57. A method for modulating stress-resistance in a domesticated organism by administering the agent of claim 56.
58. A method for identifying an evolutionarily neutral change in a polypeptide-coding polynucleotide sequence of a wild ancestor of a domesticated organism comprising the steps of:
a) comparing polypeptide-coding polynucleotide sequences of said wild ancestor to corresponding sequences of said domesticated organism; and b) selecting a polynucleotide sequence in said domesticated organism that contains a nucleotide change as compared to the corresponding sequence of the wild ancestor, wherein the change is evolutionarily neutral and the polynucleotide is associated with a stress-resistance trait in the wild ancestor of the domesticated organism, whereby the evolutionarily neutral change in the polynucleotide is identified.
59. The method of claim 58 wherein said domesticated organism is a plant selected from the group consisting of maize, rice, tomato, potato and other domesticated plants for which the ancestor is known.
60. The method of claim 58 wherein said domesticated plant is maize and said wild ancestor is teosinte.
61. The method of claim 58 wherein the protein-coding nucleotide sequences of said domesticated organism correspond to cDNA.
62. The method of claim 58, wherein the nucleotide change is a non-synonymous substitution.
63. The method of claim 58, where the domesticated organism is a plant and the relevant trait is selected from the group consisting of drought resistance, disease resistance, pest resistance, high salt level resistance, and other stress resistance traits of commercial interest.
64. A method for large scale sequence comparison between polypeptide-coding nucleotide sequences of a wild ancestor of a domesticated organism and polypeptide-coding sequences from said domesticated organism, wherein the wild ancestor organism polypeptide confers or is suspected of conferring a stress-resistance trait that is unique, enhanced or altered in the wild ancestor as compared to the domesticated organism, comprising:
a) aligning the wild ancestor sequences with corresponding sequences from the domesticated organism according to sequence homology; and b) identifying any nucleotide changes within the domesticated organism sequences or wild ancestor sequences as compared to the homologous sequences from the wild ancestor or domesticated organism, respectively.
65. The method of claim 64, wherein the domesticated organism is a plant selected from the group consisting of maize, rice, tomato and other domesticated plants whose ancestors are known.
66. The method of claim 65 wherein said domesticated plant is maize and said wild ancestor is teosinte.
67. The method of claim 64 wherein the protein-coding nucleotide sequences of said domesticated species correspond to cDNA.
68. A method for correlating an evolutionarily neutral nucleotide change to a stress-resistance trait that is unique, enhanced or altered in a wild ancestor of a domesticated organism comprising:
a) identifying a nucleotide sequence according to claim 48; and b) analyzing the functional effect of the presence or absence of the identified sequence in the domesticated organism or ancestor organism.
69. A method for making a transfected plant cell or a transgenic plant comprising the steps of:
a) transforming a plant cell to contain a polynucleotide encoding the stress-resistance polypeptide of a wild ancestor of claim 48, wherein said polynucleotide is operably linked to a promoter that can be used effectively for expression of transgenic proteins;
b) optionally growing and maintaining said cell under conditions whereby a transgenic plant is regenerated therefrom.
70. The transfected cell generated by the method of claim 69.
71. The transgenic plant generated by the method of claim 69.
72. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, and SEQ ID NO:18; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
73. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, and SEQ ID NO:90; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
74. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:40, SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:44, SEQ ID NO:45, SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:49, SEQ ID NO:50, SEQ ID NO:51, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:60, SEQ ID NO:62, SEQ ID NO:63, and SEQ ID NO:64; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
75. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:69, SEQ ID NO:70, SEQ ID NO:71, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75, SEQ ID NO:77, SEQ ID NO:59, and SEQ ID NO:78; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
76. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:80, SEQ ID NO:81, and SEQ ID NO:82; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
77. An isolated polynucleotide selected from the group consisting of:
a) a polynucleotide selected from the group consisting of SEQ ID NO:84 and SEQ ID NO:85; and b) a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
78. An isolated polypeptide selected from the group consisting of:
a) a polypeptide encoded by a polynucleotide selected from the group consisting of SEQ ID NO:1, SEQ ID NO:91, SEQ ID NO:2, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:7, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:17, and SEQ ID NO:18; and b) a polypeptide encoded by a polynucleotide having at least 85% homology to a polynucleotide of a), and which confers substantially the same yield as the polynucleotide of a).
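
By way of illustration only, the comparing and selecting steps recited in claims 1, 47, 48 and 58, and the non-synonymous substitutions of claims 52 and 62, can be sketched in Python. The sketch is not taken from the specification. It assumes the two polypeptide-coding sequences are already aligned, in frame, free of gaps and of equal length, and it uses the standard genetic code. The names `CODON_TABLE` and `classify_codon_changes` are hypothetical. The function only lists and labels codon differences. Deciding whether a change is evolutionarily significant would further require normalising such counts by the numbers of synonymous and non-synonymous sites (for example as a Ka/Ks ratio), which this sketch does not attempt.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(domesticated, ancestor):
    """List codon differences between two pre-aligned coding sequences.

    Assumption (hypothetical helper): both inputs are in-frame, ungapped
    DNA strings of equal length. Returns tuples of
    (codon position, ancestor codon, domesticated codon, kind).
    """
    if len(domesticated) != len(ancestor) or len(domesticated) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame and of equal length")

    changes = []
    for i in range(0, len(domesticated), 3):
        d_codon = domesticated[i:i + 3].upper()
        a_codon = ancestor[i:i + 3].upper()
        if d_codon == a_codon:
            continue
        # A change is synonymous when both codons encode the same residue.
        same_residue = CODON_TABLE[d_codon] == CODON_TABLE[a_codon]
        kind = "synonymous" if same_residue else "non-synonymous"
        changes.append((i // 3 + 1, a_codon, d_codon, kind))
    return changes
```

For example, calling `classify_codon_changes("ATGGCCAGA", "ATGGCTAAA")` reports a synonymous change at codon 2 (GCT to GCC, both alanine) and a non-synonymous change at codon 3 (AAA to AGA, lysine to arginine).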
CA002473555A 2002-01-16 2003-01-16 Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals Abandoned CA2473555A1 (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US34908802P 2002-01-16 2002-01-16
US60/349,088 2002-01-16
US34966102P 2002-01-17 2002-01-17
US60/349,661 2002-01-17
US10/079,042 2002-02-19
US10/079,042 US7252966B2 (en) 1999-01-29 2002-02-19 EG307 polynucleotides and uses thereof
US36854102P 2002-03-29 2002-03-29
US60/368,541 2002-03-29
PCT/US2003/001460 WO2003062382A2 (en) 2002-01-16 2003-01-16 Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals

Publications (1)

Publication Number Publication Date
CA2473555A1 (en) 2003-07-31

Family

ID=27617766

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002473555A Abandoned CA2473555A1 (en) 2002-01-16 2003-01-16 Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals

Country Status (9)

Country Link
EP (1) EP1501942A4 (en)
JP (1) JP2005518199A (en)
KR (1) KR20040081139A (en)
CN (1) CN1630731A (en)
AU (2) AU2003217221B2 (en)
BR (1) BR0306968A (en)
CA (1) CA2473555A1 (en)
IL (1) IL162897A0 (en)
WO (1) WO2003062382A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102934548A (en) * 2011-08-15 2013-02-20 东北农业大学 Method for auxiliary identification of drought resistant maize

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2006230352A1 (en) * 2005-03-29 2006-10-05 Evolutionary Genomics Llc EG1117 and EG307 polynucleotides and uses thereof
CA2620897A1 (en) * 2005-09-02 2007-03-08 Evolutionary Genomics, Inc. Eg8798 and eg9703 polynucleotides and uses thereof
CN102888398B (en) * 2011-07-22 2014-03-05 中国农业科学院生物技术研究所 Flanking sequence of exogenous insertion fragment of transgenic rice variety Bar68-1 and application thereof

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6245969B1 (en) * 1997-06-24 2001-06-12 Joanne Chory Receptor kinase, Bin1
US6228586B1 (en) * 1998-01-30 2001-05-08 Genoplex, Inc. Methods to identify polynucleotide and polypeptide sequences which may be associated with physiological and medical conditions
US6274319B1 (en) * 1999-01-29 2001-08-14 Walter Messier Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals
DE60036645T2 (en) * 1999-08-05 2008-07-17 Evolutionary Genomics LLC, Lafayette METHODS FOR IDENTIFYING EVOLUTIONARY SIGNIFICANT CHANGES IN POLYNUCLEOTIDE AND POLYPEPTIDE SEQUENCES IN DOMESTICED PLANTS AND ANIMALS

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102934548A (en) * 2011-08-15 2013-02-20 东北农业大学 Method for auxiliary identification of drought resistant maize
CN102934548B (en) * 2011-08-15 2014-07-02 东北农业大学 Method for auxiliary identification of drought resistant maize

Also Published As

Publication number Publication date
AU2003217221B2 (en) 2008-11-27
WO2003062382A2 (en) 2003-07-31
JP2005518199A (en) 2005-06-23
EP1501942A4 (en) 2006-06-07
BR0306968A (en) 2006-04-11
CN1630731A (en) 2005-06-22
IL162897A0 (en) 2005-11-20
KR20040081139A (en) 2004-09-20
AU2009200805A1 (en) 2009-03-26
EP1501942A2 (en) 2005-02-02
WO2003062382A3 (en) 2004-12-09

Similar Documents

Publication Publication Date Title
CA2301500A1 (en) Genes encoding enzymes for lignin biosynthesis and uses thereof
Antonius-Klemola Molecular markers in Rubus (Rosaceae) research and breeding
AU2009200805A1 (en) Methods to identify evolutionary significant changes in polynucleotides and polypeptide sequences in domesticated plants and animals
US7252966B2 (en) EG307 polynucleotides and uses thereof
US20110083229A1 (en) EG1117 And EG307 Polynucleotides And Uses Thereof
US7439018B2 (en) EG1117 Polynucleotides and uses thereof
AU2003217221A1 (en) Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals
EP1947201A2 (en) Methods to identify evolutionarily significant changes in polynucleotide and polypeptide sequences in domesticated plants and animals
US20090133164A1 (en) EG1117 polynucleotides and uses thereof
US20050234654A1 (en) Detection of evolutionary bottlenecking by dna sequencing as a method to discover genes of value
KR20020008163A (en) Methods
US20080047032A1 (en) Eg307 nucleic acids and uses thereof
JP2009297039A (en) Method for identifying evolutionarily significant change in polynucleotide and polypeptide sequence in domesticated plant and animal
NL1021811C2 (en) Plant variety identification agent comprises plant virus, for example, attenuated cucumber mosaic virus containing RNA specific base sequence
US20080256659A1 (en) Eg8798 and Eg9703 Polynucleotides and Uses Thereof
EP1307479A2 (en) Herbicide target genes and methods
US20110173723A1 (en) EG82013 and EG81345 Nucleic Acids and Uses Thereof
CA2531087A1 (en) Methods to identify polynucleotide and polypeptide sequences which may be associated with physiological and medical conditions
KR20150101596A (en) Primer set for classifing balloon flower, Classification method for balloon flower using the same, and Classification kit for balloon flower using the same
CN101501192A (en) EG8798 and EG9703 polynucleotides and uses thereof

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued