WO2024186806A2 - Plant pathogen resistance genes - Google Patents

Plant pathogen resistance genes

Publication number
WO2024186806A2
Authority
WO
WIPO (PCT)
Prior art keywords
plant
rot
maize
gene
seq
Application number
PCT/US2024/018501
Other languages
English (en)
Other versions
WO2024186806A3 (fr)
Inventor
Shawn Thatcher
Original Assignee
Pioneer Hi-Bred International, Inc.
Application filed by Pioneer Hi-Bred International, Inc.

Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • A: HUMAN NECESSITIES
    • A01: AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H: NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H1/00: Processes for modifying genotypes; Plants characterised by associated natural traits
    • A01H1/04: Processes of selection involving genotypic or phenotypic markers; Methods of using phenotypic markers for selection
    • A: HUMAN NECESSITIES
    • A01: AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H: NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H1/00: Processes for modifying genotypes; Plants characterised by associated natural traits
    • A01H1/12: Processes for modifying agronomic input traits, e.g. crop yield
    • A01H1/122: Processes for modifying agronomic input traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance
    • A01H1/1245: Processes for modifying agronomic input traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance for biotic stress resistance, e.g. pathogen, pest or disease resistance
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82: Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241: Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8261: Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield
    • C12N15/8271: Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance
    • C12N15/8279: Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance for biotic stress resistance, pathogen resistance, disease resistance

Definitions

  • The sequence listing is submitted electronically as an XML-formatted sequence listing file named 9667-US-PSP ST26, created on March 2, 2023, and having a size of 9,668,699 bytes, which is filed concurrently with the specification.
  • The sequence listing comprised in this XML-formatted document is part of the specification and is herein incorporated by reference in its entirety.
  • The disclosure relates to disease resistance genes, plant breeding, and methods of identifying and selecting disease resistance genes.
  • Plant pathogens cause significant crop loss worldwide, and new resistance genes deployed to combat diseases can be overcome quickly. Plant disease resistance gene complements are the result of millions of years of coevolution with pathogens. Resistance genes encode proteins that form a multi-layer defense mechanism. This mechanism can detect pathogen-associated molecular patterns (PAMPs) or damage-associated molecular patterns (DAMPs) through extracellular pattern recognition receptors (PRRs), as well as small, secreted pathogen effectors through intracellular nucleotide-binding leucine-rich repeat receptors (NLRs) (Zipfel 2014 Trends Immunol, 35:345-51; Monteiro and Nishimura 2018 Annu Rev Phytopathol, 56:243-267; Jones and Dangl, 2006 Nature, 444:323-9).
  • PAMPs: pathogen-associated molecular patterns
  • DAMPs: damage-associated molecular patterns
  • NLRs: intracellular nucleotide-binding leucine-rich repeat receptors
  • PRRs are primarily composed of trans-membrane domain-containing proteins, in which extracellular domains interact with PAMPs or DAMPs. This interaction can cause a conformational change that initiates a signaling cascade through the action of an intracellular kinase domain (Tang et al. 2017 Plant Cell, 29:618-637). Effectors are excreted by plant pathogens for a variety of purposes, including suppression of plant defense responses that are triggered by PRRs (Irieda et al.
  • NLRs have been found to underlie dominant resistance phenotypes in many crop species, including rice, soybean, wheat and maize (Liu et al. 2020 Plant Biotechnol J, 18:1376-1383; Wang et al. 2021 Nat Commun, 12:6263; Saintenac et al. 2013 Science, 341:783-786; Deng et al. 2022 Mol Plant, 15:904-912; Thatcher et al. 2022 Mol Plant Pathol DOI: 10.1111/mpp.13267).
  • Maize pathogens cause significant crop loss annually, and thus there is significant interest in identifying new sources of resistance genes (Mueller 2016 Plant Health Progress, 17:12). Maize is thought to have been domesticated during a single event roughly 9,000 years ago, implying that a significant portion of the resistance gene diversity in maize’s wild ancestors may have been lost in modern-day varieties through the initial domestication event and subsequent breeding (Yang et al. 2019 Proc Natl Acad Sci USA, 116:5643-5652; Matsuoka et al. 2002 Proc Natl Acad Sci USA, 99:6080-4).
  • compositions and methods are based on the discovery disclosed herein of a large number of new maize genes, including genes that have the structural features and expression patterns making them suitable for use as disease resistance genes.
  • These disease resistance genes (“R genes”) can provide increased resistance to a disease.
  • the compositions and methods disclosed herein are thus useful in selecting disease resistant plants, breeding for disease resistant plants, creating transgenic disease resistant plants, and/or using genome editing to introduce or improve disease resistance in plants.
  • plants and methods for making plants having the disclosed markers and/or genes associated with disease resistance that is enhanced as compared to control plants.
  • the compositions and methods are useful in selecting disease resistant plants, introgressing disease resistance into plants, creating transgenic disease resistant plants, and/or creating disease resistant genome edited plants.
  • the methods for identifying and/or selecting comprise detecting or selecting one or more plant materials having a genomic region comprising a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
  • the identified or selected plant may possess plant disease resistance that is newly conferred or enhanced relative to a control plant that does not have a genomic region comprising a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
  • methods are provided to identify and/or select plant materials with a QTL containing an R gene or a marker allele associated with an R gene that can confer increased resistance to plant disease.
  • such methods can include obtaining a nucleic acid sample from a plant, seed, tissue or germplasm thereof; and screening the sample for the presence of a QTL containing the R gene or a marker allele associated with the R gene, wherein the R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
  • the method can include screening the sample for the presence of a marker allele linked to the R gene, e.g., by 10 cM, 9 cM, 8 cM, 7 cM, 6 cM, 5 cM, 4 cM, 3 cM, 2 cM, 1 cM, 0.9 cM, 0.8 cM, 0.7 cM, 0.6 cM, 0.5 cM, 0.4 cM, 0.3 cM, 0.2 cM, 0.1 cM, or less on a single meiosis-based genetic map, and associated.
  • the method can further include detecting one or more R genes or one or more marker alleles linked to R genes, where the one or more R genes (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478, thereby identifying the plant material as comprising a QTL or marker allele associated with increased resistance to plant disease. Additionally, the method can include selecting the plant material identified as comprising one or more R genes or one or more marker alleles linked to R genes.
  • the foregoing method of identifying and/or selecting plant materials with a QTL or marker allele associated with increased resistance to plant disease can include obtaining a nucleic acid sample from each of one or more plants, seeds, tissues or germplasm in a population; screening each sample for the presence of one or more R genes, a QTL comprising one or more R genes, or a marker allele associated with the R gene, wherein each R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; and selecting one or more of the plants, seeds, tissues or germplasm having the R gene associated with increased resistance to plant disease.
  • the foregoing methods of identifying and/or selecting plant materials with increased resistance to plant disease can include obtaining a nucleic acid sample from one or more plants, seeds, tissues or germplasm, each sample being representative of a plurality (e.g., a population) of plants, seeds, tissues or germplasm; screening each sample for the presence of one or more of the foregoing R genes, a QTL comprising one or more of the foregoing R genes, or a marker allele associated with the foregoing R genes, wherein each R gene (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478; and selecting one or more pluralities of plants, seeds, tissues or germplasm, wherein the representative sample for the selected plurality has the one or more of the foregoing R genes, a QTL comprising one or more of the foregoing R genes, or
  • the foregoing methods of identifying and/or selecting plants can further include crossing at least one of the selected plants comprising an R gene to a second plant that does not have the R gene, thereby producing a progeny plant whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
  • the second plant is one of a plant line (a “recurrent parent line”) and the method further includes crossing the progeny plant with another plant of the recurrent parent line to produce a second-generation progeny whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
  • the second-generation progeny can be crossed with the recurrent parent line to produce a third-generation progeny whose genome comprises one or more R genes that (i) comprise a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478.
  • This process can be repeated three, four, five, six, seven, or more times, such that each subsequent generation progeny is crossed with the recurrent parent line, thereby introgressing the R gene into the recurrent parent line.
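The repeated backcrossing described above can be illustrated with a short calculation. The sketch below uses the standard textbook expectation that each backcross halves the remaining donor genome on average. It ignores linkage drag around the selected R gene and any marker-assisted background selection, so it is an approximation and not a figure taken from the disclosure. The function name is hypothetical.

```python
def recurrent_parent_fraction(backcrosses: int) -> float:
    """Expected genome fraction from the recurrent parent.

    backcrosses = 0 is the initial F1 cross (donor x recurrent parent);
    each later backcross to the recurrent parent halves the expected
    donor share. Assumes unlinked loci and no background selection.
    """
    if backcrosses < 0:
        raise ValueError("backcrosses must be non-negative")
    return 1.0 - 0.5 ** (backcrosses + 1)


if __name__ == "__main__":
    for n in range(7):
        share = recurrent_parent_fraction(n)
        print(f"after {n} backcross(es): {share:.4f} recurrent parent genome")
```

Under these assumptions the expected recurrent parent share rises from one half in the F1 toward one as backcrosses accumulate, which is why several rounds are typically used before the introgressed line is considered to resemble the recurrent parent.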
  • a plant having the one or more R genes is crossed with a second plant to produce progeny plants.
  • the progeny plants are screened for a QTL or marker allele associated with the R gene in accordance with the methods disclosed herein.
  • screening includes obtaining a nucleic acid sample from each of the progeny plants and screening the sample for the presence of nucleic acid comprising one or more R genes that
  • methods include expressing in a plant material a heterologous nucleic acid capable of increasing plant disease resistance.
  • the method can include introducing into the plant material a nucleic acid sequence, e.g., by transgenic modification or genome editing approaches.
  • the plant material is susceptible to plant disease prior to introducing the heterologous nucleic acid.
  • the genome of a plant is altered by transgenic modification or gene editing to include one or more R genes that (i) have at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encode an amino acid sequence having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
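As an illustration of the sequence identity thresholds listed above, the following sketch computes percent identity between two sequences that have already been aligned to equal length. The denominator convention (all aligned columns except those where both sequences have a gap) is an assumption. This section does not say how identity is calculated, and alignment tools differ on this point. The function names and the example sequences are hypothetical.

```python
# Identity levels enumerated in the text, in percent.
THRESHOLDS = (50, 75, 80, 85, 90, 95, 96, 97, 98, 99, 100)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def highest_threshold_met(identity: float):
    """Largest listed threshold that the identity value reaches, or None."""
    met = [t for t in THRESHOLDS if identity >= t]
    return max(met) if met else None


if __name__ == "__main__":
    # Made-up toy sequences, not taken from the sequence listing.
    pid = percent_identity("ATGGCCTTAA", "ATGGCTTTAA")
    print(pid, highest_threshold_met(pid))
```

In practice the alignment itself would come from a dedicated alignment program; the sketch only covers the counting step.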
  • the transgenic or genome edited plant materials provide increased resistance to a plant disease relative to otherwise isogenic plant materials lacking the R gene introduced by transgenic or gene edited modification.
  • the construct comprises a nucleic acid that is heterologous to the plant material, and the heterologous nucleic acid comprises an R gene sequence that (i) comprises at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encodes a polypeptide comprising at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
  • the construct is a recombinant construct in which the foregoing R gene sequence is operably linked to at least one heterologous regulatory sequence.
  • a method of introducing the foregoing construct comprising an R gene sequence into plant material wherein, for example, the construct comprises the R gene sequence operably linked to its native promoter and the construct is introduced into a heterologous genomic locus that did not comprise the R gene prior to the construct being introduced.
  • the foregoing construct is a recombinant expression construct that comprises the R gene operably linked to a heterologous promoter; and the method comprises introducing the recombinant expression construct into the plant material.
  • plant materials such as a plant (e.g. a maize plant), plant cell, plant tissue, seed, or germplasm thereof, comprising the isolated construct disclosed herein.
  • the methods embodied by the present disclosure relate to a method for transforming a host cell, which can be a plant cell.
  • the method comprises transforming the host or plant cell with the isolated construct disclosed herein.
  • the method can further include producing a plant by transforming a plant cell with a construct of the present disclosure and regenerating a plant from the transformed plant cell, thereby producing a plant having the R gene disclosed herein.
  • the regenerated plant has improved plant disease resistance, as compared to an isogenic plant lacking the R gene.
  • compositions and methods relate to plant material modified to include an R gene, the plant material having increased resistance to a plant disease, wherein prior to modification the plant material lacked the R gene.
  • the plant material is modified, e.g., by mutagenesis, transgene insertion, or gene editing, to include a nucleotide sequence (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
  • a method of generating a variant of an R gene by gene shuffling one or more nucleotide sequences comprising an R gene (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478.
  • Variants are then transiently or stably expressed in plant material and tested for whether they provide increased resistance to plant disease.
  • one or more variants can be incorporated into construct(s), and the construct(s) can be introduced into a regenerable plant cell; and a plant comprising the variant(s) construct can be regenerated from the plant cell.
  • Plants containing the variant(s) can be evaluated for their tolerance/susceptibility to gray leaf spot.
  • Plants having a variant that provides increased resistance to plant disease, relative to an isogenic plant that lacks the variant construct, can be selected.
  • the plant can be maize, or the plant can be Arabidopsis, soybean, sunflower, sorghum, canola, wheat, alfalfa, cotton, rice, barley, millet, sugar cane, or switchgrass.
  • the method comprises obtaining a plurality of plants each of which exhibits differing levels of plant disease resistance; screening nucleic acid samples from each plant for the presence of allelic variations in the R gene sequence (i) having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% nucleic acid sequence identity to one of SEQ ID NOs: 1-1739 or (ii) encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478; evaluating the variations for genetic linkage to altered tolerance/susceptibility to the plant disease; and identifying one or
  • the R gene can be associated with increased disease resistance to a plant disease.
  • a plant disease includes bacterial leaf blight and stalk rot; bacterial leaf spot; bacterial stripe; chocolate spot; Goss's bacterial wilt and blight; holcus spot; purple leaf sheath; seed rot-seedling blight; bacterial wilt; corn stunt; anthracnose leaf blight; gray leaf spot; aspergillus ear and kernel rot; banded leaf and sheath spot; black bundle disease; black kernel rot; borde blanco; brown spot; black spot; stalk rot; cephalosporium kernel rot; charcoal rot; corticium ear rot; curvularia leaf spot; didymella leaf spot; diplodia ear rot and stalk rot; seed rot; corn seedling blight; diplodia leaf spot or leaf streak; downy mildews; brown strip
  • Fig. 1 is a bar chart showing the distribution of different domain architectures involving canonical NLR domains in maize; the abundance of the indicated domain architectures is shown for the sum of all 26 maize NAM lines.
  • CC: coiled-coil
  • NB: NB-ARC
  • LRR: leucine-rich repeat
  • TIR: Toll/interleukin 1 receptor
  • RPW8: RPW8-like coiled-coil
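The kind of tally summarized in Fig. 1 can be sketched as a simple count of domain architectures summed over lines. The input layout, the line labels, and the toy annotations below are all hypothetical. The sketch only shows one plausible way to sum architecture counts across genomes and is not the pipeline used in the disclosure.

```python
from collections import Counter


def architecture(domains):
    """Join an ordered list of domain labels into one architecture string."""
    return "-".join(domains)


def tally_architectures(lines):
    """Sum architecture counts over every line.

    `lines` maps a line name to a list of proteins, where each protein is
    an ordered list of domain abbreviations (CC, NB, LRR, TIR, RPW8).
    """
    total = Counter()
    for proteins in lines.values():
        total.update(architecture(domains) for domains in proteins)
    return total


if __name__ == "__main__":
    # Made-up toy annotations for two hypothetical lines.
    toy = {
        "line_1": [["CC", "NB", "LRR"], ["NB", "LRR"]],
        "line_2": [["CC", "NB", "LRR"], ["RPW8", "NB", "LRR"]],
    }
    for arch, count in tally_architectures(toy).most_common():
        print(arch, count)
```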
  • Fig. 2 is a bar chart showing the distribution of integrated domains; atypical integrated domains identified via HMMer searches of NLR proteins are shown for all 26 NAM genomes.
  • Fig. 3 is a graph showing the average Shannon entropy across different NLR features of a composite constructed by averaging the entropy of all 158 NLR clusters.
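For readers unfamiliar with the measure, Shannon entropy of an alignment column can be computed as in the sketch below. The logarithm base (2), the treatment of gaps (ignored), and the plain averaging over columns are assumptions. This section does not specify how the entropy values behind Fig. 3 were derived, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str, gap: str = "-") -> float:
    """Shannon entropy (bits) of the residues in one alignment column."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mean_entropy(alignment) -> float:
    """Average per-column entropy over equal-length aligned sequences."""
    if not alignment:
        raise ValueError("alignment is empty")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    entropies = [column_entropy("".join(col)) for col in zip(*alignment)]
    return sum(entropies) / len(entropies)


if __name__ == "__main__":
    # Toy alignment: first column invariant, second column split evenly.
    print(mean_entropy(["AA", "AG"]))
```

A fully conserved column scores zero, and the score grows as residues at that position become more evenly distributed, which is why entropy is commonly used to highlight variable regions of a protein family.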
  • a gene or allele is “associated with” a trait when it is part of or linked to a DNA sequence or allele that affects the expression of the trait.
  • the presence of the allele is an indicator of how the trait will be expressed.
  • As used herein, “disease resistant”, “increased plant disease resistance”, “increased resistance to plant disease”, “plant disease resistance” and the like refer to a plant showing increased resistance to a disease compared to a control plant, e.g., a control plant can be one that lacks the QTL or R gene that provides disease resistance but is otherwise isogenic to the disease resistant plant. Disease resistance may manifest in fewer and/or smaller lesions, increased plant health, increased yield, increased root mass, increased plant vigor, less or no discoloration, increased growth, reduced necrotic area, or reduced wilting.
  • an R gene or variant disclosed herein may show resistance to one or more diseases
  • a plant having disease resistance may have 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% increased resistance to a disease compared to a control plant.
  • a plant may have 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, or 100% increased plant health in the presence of a disease compared to a control plant.
  • chromosomal interval designates a contiguous linear span of genomic DNA that resides in planta on a single chromosome.
  • the genetic elements or genes located on a single chromosomal interval are physically linked.
  • the size of a chromosomal interval is not particularly limited.
  • the genetic elements located within a single chromosomal interval are genetically linked, typically with a genetic recombination distance of, for example, less than or equal to 20 cM, or alternatively, less than or equal to 10 cM. That is, two genetic elements within a single chromosomal interval undergo recombination at a frequency of less than or equal to 20% or less than or equal to 10%.
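The passage above treats a map distance in cM as numerically equal to a recombination percentage, which is a good approximation for short intervals. As a hedged illustration, the sketch below converts map distance to expected recombination frequency with the Haldane and Kosambi mapping functions. These are standard genetics formulas, but the disclosure does not state which mapping function, if any, it relies on.

```python
import math


def haldane_rf(distance_cm: float) -> float:
    """Recombination frequency under Haldane's function (no interference)."""
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


def kosambi_rf(distance_cm: float) -> float:
    """Recombination frequency under Kosambi's function (with interference)."""
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    morgans = distance_cm / 100.0
    return 0.5 * math.tanh(2.0 * morgans)


if __name__ == "__main__":
    for cm in (1, 10, 20, 50):
        print(f"{cm:>3} cM  Haldane {haldane_rf(cm):.4f}  Kosambi {kosambi_rf(cm):.4f}")
```

Both functions return values slightly below the simple "1 cM equals 1%" rule, and the gap widens with distance; neither can exceed 50%, the value expected for unlinked loci.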
  • crossed refers to a sexual cross and involves the fusion of two haploid gametes via pollination to produce diploid progeny (e.g., cells, seeds or plants).
  • the term encompasses both the pollination of one plant by another and selfing (or self-pollination, e.g., when the pollen and ovule are from the same plant).
  • An “elite line” is any line that has resulted from breeding and selection for superior agronomic performance.
  • a “favorable allele” is the allele at a particular locus (a marker, a QTL, a gene etc.) that confers, or contributes to, an agronomically desirable phenotype, e.g., disease resistance, and that allows the identification of plants with that agronomically desirable phenotype.
  • a favorable allele of a marker is a marker allele that segregates with the favorable phenotype.
  • Genetic markers are nucleic acids that are polymorphic in a population and whose alleles can be detected and distinguished by one or more analytic methods, e.g., RFLP, AFLP, isozyme, SNP, SSR, and the like.
  • the term also refers to nucleic acid sequences complementary to the genomic sequences, such as nucleic acids used as probes. Markers corresponding to genetic polymorphisms between members of a population can be detected by methods well-established in the art.
  • These methods include, e.g., PCR-based sequence specific amplification methods, detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), detection of amplified variable sequences of the plant genome, detection of self-sustained sequence replication, detection of simple sequence repeats (SSRs), detection of single nucleotide polymorphisms (SNPs), or detection of amplified fragment length polymorphisms (AFLPs).
  • ESTs expressed sequence tags
  • SSR markers derived from EST sequences and randomly amplified polymorphic DNA (RAPD).
  • germplasm refers to genetic material of or from an individual (e.g., a plant), a group of individuals (e.g., a plant line, variety or family), or a clone derived from a line, variety, species, or culture, or more generally, all individuals within a species or for several species (e.g., maize germplasm collection or Andean germplasm collection).
  • the germplasm can be part of an organism, cell, or can be separate from the organism or cell.
  • germplasm provides genetic material with a specific molecular makeup that provides a physical foundation for some or all of the hereditary qualities of an organism or cell culture.
  • germplasm includes cells, seed or tissues from which new plants may be grown, or plant parts, such as leaves, stems, pollen, or cells, that can be cultured into a whole plant.
  • a “haplotype” is the genotype of an individual at a plurality of genetic loci, i.e. a combination of alleles. Typically, the genetic loci described by a haplotype are physically and genetically linked, i.e., on the same chromosome segment.
  • heterogeneity is used to indicate that individuals within the group differ in genotype at one or more specific loci.
  • heterosis can be defined by performance which exceeds the average of the parents (or high parent) when crossed to other dissimilar or unrelated groups.
  • a “heterotic group” comprises a set of genotypes that perform well when crossed with genotypes from a different heterotic group (Hallauer et al. (1998) Corn breeding, p. 463-564. In G.F. Sprague and J.W. Dudley (ed.) Corn and corn improvement). Inbred lines are classified into heterotic groups, and are further subdivided into families within a heterotic group, based on several criteria such as pedigree, molecular marker-based associations, and performance in hybrid combinations (Smith et al. (1990) Theor. Appl. Gen. 80:833-840).
  • Iowa Stiff Stalk Synthetic also referred to herein as “stiff stalk”
  • Lancaster or “Lancaster Sure Crop” (sometimes referred to as NSS, or non-Stiff Stalk).
  • BSSS Stiff Stalk Synthetic population
  • NSS Non-Stiff Stalk.
  • This group includes several major heterotic groups such as Lancaster Surecrop, Iodent, and Leaming Corn.
  • homogeneity indicates that members of a group have the same genotype at one or more specific loci.
  • hybrid refers to the progeny obtained between the crossing of at least two genetically dissimilar parents.
  • inbred refers to a line that has been bred for genetic homogeneity.
  • introgression refers to the transmission of a desired allele of a genetic locus from one genetic background to another.
  • introgression of a desired R gene allele at a specified locus can be transmitted to at least one progeny via a sexual cross between two parents of the same species, where at least one of the parents has the desired allele in its genome.
  • transmission of an allele can occur by recombination between two donor genomes, e.g., in a fused protoplast, where at least one of the donor protoplasts has the desired allele in its genome.
  • the desired allele can be, e.g., detected by a marker that is associated with a phenotype, at a QTL, a transgene, or the like.
  • Offspring comprising the desired allele may be repeatedly backcrossed to a line having a desired genetic background and selected for the desired allele, to result in the allele becoming fixed in a selected genetic background.
  • a “line” or “strain” is a group of individuals of identical parentage that are generally inbred to some degree and that are generally homozygous and homogeneous at most loci (isogenic or near isogenic).
  • a “subline” refers to an inbred subset of descendants that are genetically distinct from other similarly inbred subsets descended from the same progenitor.
  • the term “linked” or “linkage” is used to describe the degree with which one marker locus is associated with another marker locus or some other locus.
  • the linkage relationship between a molecular marker and a locus affecting a phenotype is given as a “probability” or “adjusted probability”.
  • Linkage can be expressed as a desired limit or range.
  • any marker is linked (genetically and physically) to any other marker when the markers are separated by less than 50, 40, 30, 25, 20, or 15 map units (or cM) of a single meiosis map (a genetic map based on a population that has undergone one round of meiosis, such as e.g.
  • bracketed range of linkage for example, between 10 and 20 cM, between 10 and 30 cM, or between 10 and 40 cM.
  • the phrase “closely linked”, in the present application, means that recombination between two linked loci occurs with a frequency of equal to or less than about 10% (i.e., are separated on a genetic map by not more than 10 cM). Put another way, the closely linked loci co-segregate at least 90% of the time.
  • Marker loci are especially useful with respect to the subject matter of the current disclosure when they demonstrate a significant probability of co-segregation (linkage) with a desired trait (e.g., increased resistance to plant disease).
  • “closely linked” loci such as a marker locus and a second locus can display an inter-locus recombination frequency of 10% or less, preferably about 9% or less, still more preferably about 8% or less, yet more preferably about 7% or less, still more preferably about 6% or less, yet more preferably about 5% or less, still more preferably about 4% or less, yet more preferably about 3% or less, and still more preferably about 2% or less.
  • the relevant loci display a recombination frequency of about 1% or less, e.g., about 0.75% or less, more preferably about 0.5% or less, or yet more preferably about 0.25% or less.
  • Two loci that are localized to the same chromosome, and at such a distance that recombination between the two loci occurs at a frequency of less than 10% (e.g., about 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.75%, 0.5%, 0.25%, or less) are also said to be “in proximity to” each other.
  • any marker is closely linked (genetically and physically) to any other marker that is in close proximity, e.g., at or less than 10 cM distant.
  • Two closely linked markers on the same chromosome can be positioned 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.75, 0.5 or 0.25 cM or less from each other.
  • two different markers can have the same genetic map coordinates. In that case, the two markers are in such close proximity to each other that recombination occurs between them with such low frequency that it is undetectable.
  • linkage disequilibrium refers to a non-random segregation of genetic loci or traits (or both). In either case, linkage disequilibrium implies that the relevant loci are within sufficient physical proximity along a length of a chromosome so that they segregate together with greater than random (i.e., non-random) frequency. Markers that show linkage disequilibrium are considered linked. Linked loci co-segregate more than 50% of the time, e.g., from about 51% to about 100% of the time.
  • linkage can be between two markers, or alternatively between a marker and a locus affecting a phenotype.
  • a marker locus can be “associated with” (linked to) a trait. The degree of linkage of a marker locus and a locus affecting a phenotypic trait is measured, e.g., as a statistical probability of co-segregation of that molecular marker with the phenotype (e.g., an F statistic or LOD score).
  • Linkage disequilibrium is most commonly assessed using the measure r², which is calculated using the formula described by Hill, W.G. and Robertson, A., Theor. Appl. Genet. 38:226-231 (1968).
  • when r² = 1, complete LD exists between the two marker loci, meaning that the markers have not been separated by recombination and have the same allele frequency.
  • the r² value will be dependent on the population used. Values for r² above 1/3 indicate sufficiently strong LD to be useful for mapping (Ardlie et al. 2002 Nature Reviews Genetics 3:299-309).
  • alleles are in linkage disequilibrium when r² values between pairwise marker loci are greater than or equal to 0.33, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, or 1.0.
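The r² (r2) measure described above can be illustrated with a short calculation. The sketch below assumes two biallelic loci and uses the standard form r² = D² / (pA(1 − pA) pB(1 − pB)), with D = pAB − pA·pB; the function and parameter names are illustrative assumptions and do not come from the disclosure.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1
          and allele B at locus 2
    p_a, p_b: population frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both loci must be polymorphic (0 < p < 1)")
    return d * d / denom


def useful_for_mapping(r2: float, threshold: float = 1.0 / 3.0) -> bool:
    """Apply the 'above 1/3' rule of thumb quoted in the text."""
    return r2 > threshold
```

With equal allele frequencies and no recombinant haplotypes (for example `ld_r_squared(0.3, 0.3, 0.3)`), the value is 1 up to floating-point rounding, matching the complete-LD case; when `p_ab` equals `p_a * p_b` the value is 0.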
  • linkage equilibrium describes a situation where two markers independently segregate, i.e., sort among progeny randomly. Markers that show linkage equilibrium are considered unlinked (whether or not they lie on the same chromosome).
  • a “locus” is a position on a chromosome, e.g. where a nucleotide, gene, sequence, or marker is located.
  • The “logarithm of odds (LOD) value” or “LOD score” (Risch, 1992 Science 255(5046):803-804) is used in genetic interval mapping to describe the degree of linkage between two marker loci.
  • a LOD score of three between two markers indicates that linkage is 1000 times more likely than no linkage, while a LOD score of two indicates that linkage is 100 times more likely than no linkage.
  • LOD scores greater than or equal to two may be used to detect linkage.
  • LOD scores can also be used to show the strength of association between marker loci and quantitative traits in “quantitative trait loci” mapping. In this case, the LOD score’s size is dependent on the closeness of the marker locus to the locus affecting the quantitative trait, as well as the size of the quantitative trait effect.
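As a rough illustration of how a LOD score relates to odds of linkage, the sketch below computes a two-point LOD from counts of recombinant and non-recombinant gametes. It assumes phase-known, fully informative meioses and evaluates the likelihood at the maximum-likelihood recombination fraction (capped at 0.5); this is a simplified textbook model rather than the interval-mapping procedure itself, and all names are illustrative assumptions.

```python
import math


def lod_score(recombinants: int, total: int) -> float:
    """Two-point LOD: log10 of L(theta_hat) / L(theta = 0.5)."""
    if total <= 0 or not 0 <= recombinants <= total:
        raise ValueError("need 0 <= recombinants <= total and total > 0")
    theta = min(recombinants / total, 0.5)  # ML estimate, capped at 0.5
    non_recombinants = total - recombinants

    def k_log10_p(k: int, p: float) -> float:
        # Treat 0 * log10(0) as 0 so theta = 0 is handled cleanly.
        return 0.0 if k == 0 else k * math.log10(p)

    log_l_linked = (k_log10_p(recombinants, theta)
                    + k_log10_p(non_recombinants, 1.0 - theta))
    log_l_unlinked = total * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def odds_for_linkage(lod: float) -> float:
    """Convert a LOD score to an odds ratio (LOD 3 -> 1000:1)."""
    return 10.0 ** lod
```

For example, ten informative meioses with no recombinants give a LOD of 10·log10(2), just above the conventional threshold of three.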
  • plant material includes whole plants, plant cells, plant protoplast, plant cell or tissue culture from which plants can be regenerated, plant calli, plant clumps and plant cells that are intact in plants, or parts of plants, such as seeds, flowers, cotyledons, leaves, stems, buds, roots, root tips and the like.
  • a “modified plant” means any plant that has a genetic change due to human intervention.
  • a modified plant may have genetic changes introduced through plant transformation, genome editing, mutagenesis, or conventional plant breeding.
  • a “marker” is a means of finding a position on a genetic or physical map, or else linkages among markers and trait loci (loci affecting traits).
  • the position that the marker detects may be known via detection of polymorphic alleles and their genetic mapping, or else by hybridization, sequence match or amplification of a sequence that has been physically mapped.
  • a marker can be a DNA marker (detects DNA polymorphisms), a protein (detects variation at an encoded polypeptide), or a simply inherited phenotype (such as the ‘waxy’ phenotype).
  • a DNA marker can be developed from genomic nucleotide sequence or from expressed nucleotide sequences (e.g., from a spliced RNA or a cDNA). Depending on the DNA marker technology, the marker may consist of primers complementary to sequence flanking the locus and/or probes that hybridize to polymorphic alleles at the locus.
  • a DNA marker, or a genetic marker may also be used to describe the gene, DNA sequence or nucleotide on the chromosome itself (rather than the components used to detect the gene or DNA sequence) and is often used when that DNA marker is associated with a particular trait in human genetics (e.g. a marker for breast cancer).
  • the term marker locus is the locus (gene, sequence or nucleotide) that the marker detects.
  • Markers can be defined by the type of polymorphism that they detect and also the marker technology used to detect the polymorphism. Marker types include but are not limited to, e.g., detection of restriction fragment length polymorphisms (RFLP), detection of isozyme markers, randomly amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLPs), detection of simple sequence repeats (SSRs), detection of amplified variable sequences of the plant genome, detection of self-sustained sequence replication, or detection of single nucleotide polymorphisms (SNPs). SNPs can be detected e.g. via DNA sequencing, PCR-based sequence specific amplification methods, detection of polynucleotide polymorphisms by allele specific hybridization (ASH), dynamic allele-specific hybridization (DASH), molecular beacons, microarray hybridization, oligonucleotide ligase assays, Flap endonucleases, 5’ endonucleases, primer extension, single strand conformation polymorphism (SSCP) or temperature gradient gel electrophoresis (TGGE).
  • DNA sequencing, such as pyrosequencing technology, has the advantage of being able to detect a series of linked SNP alleles that constitute a haplotype. Haplotypes tend to be more informative (detect a higher level of polymorphism) than SNPs.
  • Marker assisted selection (or MAS) is a process by which individual plants are selected based on marker genotypes.
  • Marker assisted counter-selection is a process by which marker genotypes are used to identify plants that will not be selected, allowing them to be removed from a breeding program or planting.
  • a “marker haplotype” refers to a combination of alleles at a marker locus.
  • molecular marker may be used to refer to a genetic marker, as defined above, or an encoded product thereof (e.g., a protein) used as a point of reference when identifying a linked locus.
  • a molecular marker can be derived from genomic nucleotide sequences or from expressed nucleotide sequences (e.g., from a spliced RNA, a cDNA, etc.), or from an encoded polypeptide.
  • the term also refers to nucleic acid sequences complementary to or flanking the marker sequences, such as nucleic acids used as probes or primer pairs capable of amplifying the marker sequence.
  • a “molecular marker probe” is a nucleic acid sequence or molecule that can be used to identify the presence of a marker locus, e.g., a nucleic acid probe that is complementary to a marker locus sequence.
  • a marker probe refers to a probe of any type that is able to distinguish (i.e., genotype) the particular allele that is present at a marker locus.
  • Nucleic acids are “complementary” when they specifically hybridize in solution. Some of the markers described herein are also referred to as hybridization markers when located on an indel region, such as the non-collinear region described herein.
  • the insertion region is, by definition, a polymorphism vis a vis a plant without the insertion.
  • the marker need only indicate whether the indel region is present or absent. Any suitable marker detection technology may be used to identify such a hybridization marker, e.g. SNP technology is used in the examples provided herein.
  • phenotype can refer to the observable expression of a gene or series of genes.
  • the phenotype can be observable to the naked eye, or by any other means of evaluation, e.g., weighing, counting, measuring (length, width, angles, etc.), microscopy, biochemical analysis, or an electromechanical assay.
  • a phenotype is directly controlled by a single gene or genetic locus, i.e., a “single gene trait” or a “simply inherited trait”.
  • single gene traits can segregate in a population to give a “qualitative” or “discrete” distribution, i.e.
  • a phenotype falls into discrete classes.
  • a phenotype is the result of several genes and can be considered a “multigenic trait” or a “complex trait”.
  • Multigenic traits segregate in a population to give a “quantitative” or “continuous” distribution, i.e. the phenotype cannot be separated into discrete classes. Both single gene and multigenic traits can be affected by the environment in which they are being expressed, but multigenic traits tend to have a larger environmental component.
  • a “polymorphism” is a variation in the DNA between two or more individuals within a population.
  • a polymorphism preferably has a frequency of at least 1% in a population.
  • a useful polymorphism can include a single nucleotide polymorphism (SNP), a simple sequence repeat (SSR), or an insertion/deletion polymorphism, also referred to herein as an “indel”
  • QTL quantitative trait locus
  • a “reference sequence” or a “consensus sequence” is a defined sequence used as a basis for sequence comparison.
  • the reference sequence for a marker is obtained by sequencing a number of lines at the locus, aligning the nucleotide sequences in a sequence alignment program (e.g. Sequencher), and then obtaining the most common nucleotide sequence of the alignment. Polymorphisms found among the individual sequences are annotated within the consensus sequence.
  • a reference sequence is not usually an exact copy of any individual DNA sequence, but represents an amalgam of available sequences and is useful for designing primers and probes to polymorphisms within the sequence.
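The consensus step described above (take the most common nucleotide at each aligned position and annotate polymorphic positions) can be sketched as follows. The sketch assumes the sequences have already been aligned to equal length by an external program; the function name and the return format are illustrative assumptions.

```python
from collections import Counter


def consensus(aligned: list[str]) -> tuple[str, list[int]]:
    """Return (consensus sequence, 0-based polymorphic positions).

    `aligned` holds pre-aligned sequences of equal length; gap
    characters, if any, are counted like any other symbol.
    """
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    bases = []
    polymorphic = []
    for position, column in enumerate(zip(*aligned)):
        counts = Counter(column)
        # most_common(1) picks the most frequent symbol; ties go to the
        # symbol seen first in the column.
        bases.append(counts.most_common(1)[0][0])
        if len(counts) > 1:
            polymorphic.append(position)
    return "".join(bases), polymorphic
```

Because each position is decided independently, the result can differ from every input sequence, which is consistent with the statement that a reference sequence is usually not an exact copy of any individual sequence.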
  • An “unfavorable allele” of a marker is a marker allele that segregates with the unfavorable plant phenotype, therefore providing the benefit of identifying plants that can be removed from a breeding program or planting.
  • yield refers to the productivity per unit area of a particular plant product of commercial value. Yield is affected by both genetic and environmental factors. “Agronomics,” “agronomic traits,” and “agronomic performance” refer to the traits (and underlying genetic elements) of a given plant variety that contribute to yield over the course of a growing season. Individual agronomic traits include emergence vigor, vegetative vigor, stress tolerance, disease resistance or tolerance, herbicide resistance, branching, flowering, seed set, seed size, seed density, standability, threshability and the like. Yield is, therefore, the final culmination of all agronomic traits.
  • [0062] NLR-genes.
  • The NBS-LRR (“NLR”) group of R genes is the largest class of R genes discovered to date. In Arabidopsis thaliana, over 150 are predicted to be present in the genome (Meyers et al. 2003 Plant Cell, 15:809-834; Monosi et al. 2004 Theoretical and Applied Genetics, 109:1434-1447), while in rice, approximately 500 NLR genes have been predicted (Monosi 2004, supra).
  • the NBS-LRR class of R genes is comprised of two subclasses. Class 1 NLR genes contain a TIR (Toll/Interleukin-1-like) domain at their N terminus; to date these have only been found in dicots (Meyers 2003, supra; Monosi 2004, supra).
  • The second class of NBS-LRR genes contains either a coiled-coil domain or an (nt) domain at the N terminus (Bai et al. 2002 Genome Research, 12:1871-1884; Monosi 2004 supra; Pan et al. 2000 Journal of Molecular Evolution, 50:203-213). Class 2 NBS-LRR genes have been found in both dicot and monocot species (Bai 2002, supra; Meyers 2003, supra; Monosi 2004, supra; Pan 2000, supra).
  • the NBS domain of the gene appears to have a role in signaling in plant defense mechanisms (van der Biezen et al. 1998, Current Biology: CB, 8:R226-R227).
  • the LRR region appears to be the region that interacts with the pathogen AVR products (Michelmore et al. 1998 Genome Res, 8:1113-1130; Meyers 2003 supra).
  • This LRR region, in comparison with the NB-ARC (NBS) domain, is under a much greater selection pressure to diversify (Michelmore 1998, supra; Meyers 2003, supra; Palomino et al. 2002, Genome Research, 12:1305-1315).
  • LRR domains are found in other contexts as well; these 20-29-residue motifs are present in tandem arrays in a number of proteins with diverse functions, such as hormone-receptor interactions, enzyme inhibition, cell adhesion and cellular trafficking.
  • NLRs typically comprise a nucleotide-binding domain, a series of leucine-rich repeats, and an N-terminal region which may include a coiled-coil (CC), Toll/Interleukin-1 (TIR) or resistance to powdery mildew 8 (RPW8) domain (Shao et al. 2016 Plant Physiol, 170:2095-109).
  • NLRs may arise via rare recombination events which result in domains with high similarity to effector targets being integrated into NLR genes, which then detect the presence of effectors through direct interaction (Grund et al. 2019 Plant Physiol 179:1227-1235).
  • NLRs can detect the presence of pathogen effectors through (i) direct interaction of an effector with canonical NLR domains, (ii) direct interaction of an effector with an integrated domain that mimics the effector’s host target or (iii) interaction with a host gene targeted by an effector (guardee) to detect alteration of its normal state by the pathogen (van der Hoorn and Kamoun, 2008 Plant Cell 20:2009-17; Cesari et al.
  • Helper NLRs typically contain all canonical NLR domains but do not function in direct detection of pathogen effectors; instead, they act to transmit the signal of a “sensor” NLR (Wu et al. 2017 Proc Natl Acad Sci USA 114:8113-8118). Helper NLRs have been found in a variety of plant species, and can be specific to a single sensor NLR, or interact with a wide variety of sensors (Saile et al. 2020 PLoS Biol 18:e3000783).
  • NLR complements of a variety of plant species have been identified, including a nearly comprehensive set of Arabidopsis and rice NLR complements (Van de Weyer et al. 2019 Cell 178:1260-1272.e14; Shang et al. 2022 Cell Res, 32:878-896).
  • NLR PAV largely takes place through the expansion and contraction of large physically compact clusters of NLRs in a few locations within the genome (Meyers et al. 2003 Plant Cell, 15:809-34; Jacob et al. 2013 Front Immunol, 4:297). These regions are thought to represent evolutionary hotspots, where different NLRs may rapidly recombine to generate new sequence diversity (van Wersch and Li 2019 Trends Plant Sci, 24:688-699).
  • NLRs are subject to differential evolutionary pressures (Meyers et al. 1998 Plant Cell, 10: 1833-46).
  • NB-ARC domains, which are functional ATPases that control the activation states of NLRs, typically show high conservation, while LRRs and coiled-coil domains have much higher amino acid diversity (Qi et al. 2012 Plant Physiol, 158:1819-32).
  • a common measure of linkage is the frequency with which traits cosegregate. This can be expressed as a percentage of cosegregation (recombination frequency) or in centiMorgans (cM).
  • the cM is a unit of measure of genetic recombination frequency.
  • One cM is equal to a 1% chance that a trait at one genetic locus will be separated from a trait at another locus due to crossing over in a single generation (meaning the traits segregate together 99% of the time). Because chromosomal distance is approximately proportional to the frequency of crossing over events between traits, there is an approximate physical distance that correlates with recombination frequency.
  • Marker loci are themselves traits and can be assessed according to standard linkage analysis by tracking the marker loci during segregation. Thus, one cM is equal to a 1% chance that a marker locus will be separated from another locus, due to crossing over in a single generation.
  • The closer a marker is to a gene (e.g., R gene disclosed herein which (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739, (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478) controlling a trait of interest, the more effective and advantageous that marker is as an indicator for the desired trait.
  • Closely linked loci display an inter-locus cross-over frequency of about 10% or less, preferably about 9% or less, still more preferably about 8% or less, yet more preferably about 7% or less, still more preferably about 6% or less, yet more preferably about 5% or less, still more preferably about 4% or less, yet more preferably about 3% or less, and still more preferably about 2% or less.
  • the relevant loci e.g., a marker locus and a target locus
  • the loci are about 10 cM, 9 cM, 8 cM, 7 cM, 6 cM, 5 cM, 4 cM, 3 cM, 2 cM, 1 cM, 0.75 cM, 0.5 cM or 0.25 cM or less apart.
  • two loci that are localized to the same chromosome, and at such a distance that recombination between the two loci occurs at a frequency of less than 10% (e.g., about 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, 0.75%, 0.5%, 0.25%, or less) are said to be “proximal to” each other.
  • the marker locus is not necessarily responsible for the expression of the disease resistance phenotype.
  • it is not a requirement that the marker polynucleotide sequence be part of a gene that is responsible for the disease resistant phenotype (for example, is part of the gene open reading frame).
  • the association between a specific marker allele and the disease resistance trait is due to the original “coupling” linkage phase between the marker allele and the allele in the ancestral line from which the allele originated. Eventually, with repeated recombination, crossing over events between the marker and genetic locus can change this orientation.
  • the favorable marker allele may change depending on the linkage phase that exists within the parent having resistance to the disease that is used to create segregating populations. This does not change the fact that the marker can be used to monitor segregation of the phenotype. It only changes which marker allele is considered favorable in a given segregating population.
  • Marker assisted selection. Molecular markers can be used in a variety of plant breeding applications (e.g. see Staub et al. 1996 Hortscience 31:729-741; Tanksley 1983 Plant Molecular Biology Reporter 1:3-8).
  • One of the main areas of interest is to increase the efficiency of backcrossing and introgressing genes using marker-assisted selection (MAS).
  • a molecular marker that demonstrates linkage with a locus affecting a desired phenotypic trait provides a useful tool for the selection of the trait in a plant population. This is particularly true where the phenotype is hard to assay.
  • Because DNA marker assays are less laborious and take up less physical space than field phenotyping, much larger populations can be assayed, increasing the chances of finding a recombinant with the target segment from the donor line moved to the recipient line.
  • a marker is located within the gene itself, so that recombination cannot occur between the marker and the gene.
  • the methods disclosed herein produce a marker in a disease resistance gene, wherein the gene was identified by inferring genomic location from clustering of conserved domains or a clustering analysis.
  • When a gene is introgressed by MAS, it is not only the gene that is introduced but also the flanking regions (Gepts 2002 Crop Sci 42:1780-1790). This is referred to as “linkage drag.” In the case where the donor plant is highly unrelated to the recipient plant, these flanking regions carry additional genes that may code for agronomically undesirable traits. Linkage drag may also result in reduced yield or other negative agronomic characteristics even after multiple cycles of backcrossing into the elite line.
  • the size of the flanking region can be decreased by additional backcrossing, although this is not always successful, as breeders do not have control over the size of the region or the recombination breakpoints (Young et al. 1998 Genetics 120:579-585). In classical breeding it is usually only by chance that recombinations are selected that contribute to a reduction in the size of the donor segment (Tanksley et al. 1989 Biotechnology 7:257-264). Even after 20 backcrosses of this type, one may expect to find a sizeable piece of the donor chromosome still linked to the gene being selected.
  • with markers, however, it is possible to select those rare individuals that have experienced recombination near the gene of interest.
  • with 150 backcross plants, there is a 95% chance that at least one plant will have experienced a crossover within 1 cM of the gene, based on a single meiosis map distance. Markers will allow unequivocal identification of those individuals.
  • With one additional backcross of 300 plants there would be a 95% chance of a crossover within 1 cM single meiosis map distance of the other side of the gene, generating a segment around the target gene of less than 2 cM based on a single meiosis map distance. This can be accomplished in two generations with markers, while it would have required on average 100 generations without markers (See Tanksley et al., supra).
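The population sizes quoted above can be checked with a simple binomial calculation. The sketch below assumes that each backcross plant is an independent meiosis and that, for small intervals, the recombination fraction equals the map distance in cM divided by 100. It also assumes (this is an interpretation, not something the text states) that the 150-plant figure counts a crossover on either side of the gene, i.e. a 2 cM window, while the 300-plant figure requires a crossover on one specific side, i.e. a 1 cM window. Function names are illustrative.

```python
import math


def prob_at_least_one_recombinant(n_plants: int, window_cm: float) -> float:
    """P(at least one of n plants has a crossover inside the window)."""
    r = window_cm / 100.0  # recombination fraction, small-interval approximation
    return 1.0 - (1.0 - r) ** n_plants


def min_plants(window_cm: float, confidence: float = 0.95) -> int:
    """Smallest population size reaching the requested confidence."""
    r = window_cm / 100.0
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - r))
```

Under these assumptions, `prob_at_least_one_recombinant(150, 2.0)` and `prob_at_least_one_recombinant(300, 1.0)` both come out slightly above 0.95, in line with the figures in the text.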
  • flanking markers surrounding the gene can be utilized to select for recombinations in different population sizes. For example, in smaller population sizes, recombinations may be expected further away from the gene, so more distal flanking markers would be required to detect the recombination.
  • the key components to the implementation of MAS are: (i) defining the population within which the marker-trait association will be determined, which can be a segregating population, or a random or structured population; (ii) monitoring the segregation or association of polymorphic markers relative to the trait, and determining linkage or association using statistical methods; (iii) defining a set of desirable markers based on the results of the statistical analysis, and (iv) the use and/or extrapolation of this information to the current set of breeding germplasm to enable marker-based selection decisions to be made.
  • the markers described in this disclosure, as well as other marker types such as SSRs and FLPs, can be used in marker assisted selection protocols.
  • SSRs can be defined as relatively short runs of tandemly repeated DNA with lengths of 6 bp or less (Tautz 1989 Nucleic Acid Research 17: 6463-6471; Wang et al. 1994 Theoretical and Applied Genetics, 88: 1-6). Polymorphisms arise due to variation in the number of repeat units, probably caused by slippage during DNA replication (Levinson and Gutman 1987 Mol Biol Evol 4: 203-221). The variation in repeat length may be detected by designing PCR primers to the conserved non-repetitive flanking regions (Weber and May 1989 Am J Hum Genet. 44:388-396).
  • SSRs are highly suited to mapping and MAS as they are multi-allelic, codominant, reproducible and amenable to high throughput automation (Rafalski et al. 1996 Generating and using DNA markers in plants. In: Non-mammalian genomic analysis: a practical guide. Academic press, pp 75-135).
  • SSR markers can be generated, and SSR profiles can be obtained by gel electrophoresis of the amplification products. Scoring of marker genotype is based on the size of the amplified fragment.
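To make the SSR definition concrete (tandem repeats of a motif of 6 bp or less, flanked by non-repetitive sequence), the sketch below scans a DNA string for such runs using a regular expression with a backreference. The minimum repeat count of four and the function name are illustrative assumptions; real SSR discovery tools apply motif-specific thresholds and handle ambiguous bases.

```python
import re


def find_ssrs(sequence: str, max_motif: int = 6, min_repeats: int = 4):
    """Return a list of (start, motif, repeat_count) for tandem repeats.

    The lazy quantifier makes the regex prefer the shortest motif that
    satisfies the repeat requirement at each start position.
    """
    pattern = re.compile(
        rf"([ACGT]{{1,{max_motif}}}?)\1{{{min_repeats - 1},}}"
    )
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits
```

Two lines carrying different repeat counts at the same locus would yield amplicons of different sizes when amplified from primers in the conserved flanks, which is the basis for the size-based scoring described above.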
  • FLP markers can also be generated. Most commonly, amplification primers are used to generate fragment length polymorphisms. Such FLP markers are in many ways similar to SSR markers, except that the region amplified by the primers is not typically a highly repetitive region.
  • the amplified region, or amplicon, will have sufficient variability among germplasm, often due to insertions or deletions, such that the fragments generated by the amplification primers can be distinguished among polymorphic individuals, and such indels are known to occur frequently in maize (Bhattramakki et al. 2002 Plant Mol Biol 48, 539-547; Rafalski 2002b, supra).
  • SNP markers detect single base pair nucleotide substitutions. Of all the molecular marker types, SNPs are the most abundant, thus having the potential to provide the highest genetic map resolution (Bhattramakki et al. 2002 Plant Molecular Biology 48:539-547). SNPs can be assayed at an even higher level of throughput than SSRs, in a so-called 'ultra-high-throughput' fashion, as SNPs do not require large amounts of DNA and automation of the assay may be straightforward. SNPs also have the promise of being relatively low-cost systems. These three factors together make SNPs highly attractive for use in MAS.
  • a number of SNPs together within a sequence, or across linked sequences, can be used to describe a haplotype for any particular genotype (Ching et al. 2002, BMC Genet. 3: 19; Gupta et al. 2001, Rafalski 2002b, Plant Science 162:329-333).
  • Haplotypes can be more informative than single SNPs and can be more descriptive of any particular genotype.
  • a single SNP may be allele ‘T’ for a specific line or variety with disease resistance, but the allele ‘T’ might also occur in the breeding population being utilized for recurrent parents.
  • a haplotype, e.g. a combination of alleles at linked SNP markers, may be more informative.
  • Once a unique haplotype has been assigned to a donor chromosomal region, that haplotype can be used in that population or any subset thereof to determine whether an individual has a particular gene. Using automated high throughput marker detection platforms makes this process highly efficient and effective.
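The point that a haplotype can discriminate where a single SNP cannot is easy to show with a small example. In the sketch below, the marker names (`snp_1`, `snp_2`, `snp_3`), the alleles, and the plant identifiers are all hypothetical, and genotypes are assumed to be homozygous calls stored as one allele per marker.

```python
def carries_haplotype(calls: dict[str, str], donor: dict[str, str]) -> bool:
    """True if the plant matches the donor allele at every haplotype marker."""
    return all(calls.get(marker) == allele for marker, allele in donor.items())


def select_plants(population: dict[str, dict[str, str]],
                  donor: dict[str, str]) -> list[str]:
    """Return the IDs of plants carrying the full donor haplotype."""
    return [plant for plant, calls in population.items()
            if carries_haplotype(calls, donor)]


# Hypothetical data: both plants carry 'T' at snp_1, so that SNP alone
# cannot separate them, but the three-marker haplotype can.
DONOR_HAPLOTYPE = {"snp_1": "T", "snp_2": "G", "snp_3": "A"}
POPULATION = {
    "plant_01": {"snp_1": "T", "snp_2": "G", "snp_3": "A"},
    "plant_02": {"snp_1": "T", "snp_2": "C", "snp_3": "A"},
}
```

Here `select_plants(POPULATION, DONOR_HAPLOTYPE)` keeps only `plant_01`, even though both plants share the donor allele at `snp_1`.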
  • SNP single nucleotide polymorphic
  • the primers are used to amplify DNA segments from individuals (preferably inbred) that represent the diversity in the population of interest.
  • the PCR products are sequenced directly in one or both directions.
  • the resulting sequences are aligned and polymorphisms are identified.
  • the polymorphisms are not limited to single nucleotide polymorphisms (SNPs), but also include indels, CAPS, SSRs, and VNTRs (variable number of tandem repeats).
  • markers within the described map region can be hybridized to BACs or other genomic libraries, or electronically aligned with genome sequences, to find new sequences in the same approximate location as the described markers.
  • ESTs expressed sequence tags
  • RAPD randomly amplified polymorphic DNA
  • Isozyme profiles and linked morphological characteristics can, in some cases, also be indirectly used as markers. Even though they do not directly detect DNA differences, they are often influenced by specific genetic differences. However, markers that detect DNA variation are far more numerous and polymorphic than isozyme or morphological markers (Tanksley 1983 Plant Molecular Biology Reporter 1 : 3-8).
  • Sequence alignments or contigs may also be used to find sequences upstream or downstream of the specific markers listed herein. These new sequences, close to the markers described herein, are then used to discover and develop functionally equivalent markers. For example, different physical and/or genetic maps are aligned to locate equivalent markers not described within this disclosure but that are within similar regions. These maps may be within the species, or even across other species that have been genetically or physically aligned.
  • MAS uses polymorphic markers that have been identified as having a significant likelihood of co-segregation with a trait conferred by the R gene disclosed herein (e.g., a sequence that (i) comprises a coding sequence selected from the group consisting of SEQ ID NOs: 1-1739, (ii) encodes an amino acid sequence selected from the group consisting of SEQ ID NOs: 1740-3478).
  • Such markers are presumed to map near a gene or genes that give the plant its disease resistant phenotype, and are considered indicators for the desired trait, or markers.
  • plants are tested for the presence of a desired allele in the marker, and plants containing a desired genotype at one or more loci are expected to transfer the desired genotype, along with a desired phenotype, to their progeny.
  • plants with one or more disease resistance R-genes disclosed herein may be selected for by detecting one or more marker alleles, and in addition, progeny plants derived from those plants can also be selected.
  • a plant containing a desired genotype in a given chromosomal region i.e. a genotype associated with disease resistance
  • the progeny of such a cross would then be evaluated genotypically using one or more markers and the progeny plants with the same genotype in a given chromosomal region would then be selected as having disease resistance.
  • SNPs could be used alone or in combination (i.e. a SNP haplotype) to select for a favorable R gene allele associated with disease resistance.
  • a SNP haplotype can include a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, or 13 of the markers for the R gene disclosed herein.
  • a SNP haplotype can also include a combination of 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, or 39 of such markers for one or more R gene disclosed herein.
  • polymorphic sites at marker loci in and around a chromosome marker identified by the methods disclosed herein wherein one or more polymorphic sites is in linkage disequilibrium (LD) with an allele at one or more of the polymorphic sites in the haplotype and thus could be used in a marker assisted selection program to introgress a gene allele or genomic fragment of interest.
  • Two particular alleles at different polymorphic sites are said to be in LD if the presence of the allele at one of the sites tends to predict the presence of the allele at the other site on the same chromosome (Stevens 1999 Mol. Diag. 4:309-17).
  • the marker loci can be located within 5 cM, 2 cM, or 1 cM (on a single meiosis based genetic map) of the disease resistance trait QTL comprising an R-gene disclosed herein.
  • Allelic frequency can differ from one germplasm pool to another. Germplasm pools vary due to maturity differences, heterotic groupings, geographical distribution, etc. As a result, SNPs and other polymorphisms may not be informative in some germplasm pools.
  • a “recombinant protein” is used herein to refer to a protein that is no longer in its natural environment, for example in vitro or in a recombinant bacterial or plant host cell; a protein that is expressed from a polynucleotide that has been edited from its native version; or a protein that is expressed from a polynucleotide in a different genomic position relative to the native sequence.
  • R-gene encoded polypeptides including polypeptides having an amino acid sequence that has at least about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater sequence identity to one of SEQ ID NOs: 1740-3478.
  • the sequence identity is against the full-length sequence of a polypeptide.
  • the term “about” when used herein in context with percent sequence identity means +/- 1.0 percentage point, relative to the recited percentage.
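As a worked illustration of the two statements above, the sketch below computes percent identity over a full-length pairwise alignment and then applies the ±1.0 percentage point reading of "about". It assumes the two sequences have already been aligned (gaps written as "-"), and it assumes that every alignment column containing at least one residue counts toward the denominator; the disclosure does not specify the alignment tool or the denominator, and reading "at least about X%" as "at least X minus 1.0" is likewise an interpretation.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over all alignment columns holding at least one residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # column with no residue in either sequence
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


def meets_about_threshold(identity: float, recited: float,
                          tolerance: float = 1.0) -> bool:
    """True if identity is 'at least about' the recited percentage."""
    return identity >= recited - tolerance
```

Under this reading, a sequence with 89.2% identity would satisfy "at least about 90%", whereas one with 88.9% identity would not.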
  • substantially free of cellular material refers to a polypeptide including preparations of protein having less than about 30%, 20%, 10% or 5% (by dry weight) of non-target protein (also referred to herein as a “contaminating protein”).
  • “Fragments” or “biologically active portions” include polypeptide or polynucleotide fragments comprising sequences sufficiently identical to an R gene or R gene encoded polypeptide disclosed herein, respectively, and that exhibit disease resistance when expressed in a plant.
  • “Variants” as used herein refers to proteins or polypeptides having an amino acid sequence that is at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identical to the parental amino acid sequence, e.g., one of SEQ ID NOs: 1740-3478.
  • amino acid sequence variants of a polypeptide may be prepared by mutations in the DNA. This may also be accomplished by one of several forms of mutagenesis, such as for example site-specific double strand break technology, and/or in directed evolution. In some aspects, the changes encoded in the amino acid sequence will not substantially affect the function of the protein. Such variants will possess the desired activity. However, it is understood that the ability of an R gene-encoded polypeptide to confer disease resistance may, in some cases, be improved by the use of such techniques upon the compositions of this disclosure.
  • nucleic acid molecule refers to DNA molecules (e.g., recombinant DNA, cDNA, genomic DNA, plastid DNA, mitochondrial DNA) and RNA molecules (e.g., mRNA) and analogs of the DNA or RNA generated using nucleotide analogs.
  • the nucleic acid molecule can be single-stranded. In some examples, the nucleic acid molecule can be double-stranded.
  • nucleic acid molecule e.g., RNA or DNA
  • isolated nucleic acid molecule e.g., RNA or DNA
  • recombinant nucleic acid molecule e.g., RNA or DNA
  • nucleic acid sequence e.g., RNA or DNA
  • an “isolated” or “recombinant” nucleic acid is free of sequences (preferably protein encoding sequences) that naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived.
  • “isolated” or “recombinant” when used to refer to nucleic acid molecules excludes isolated chromosomes.
  • the recombinant nucleic acid molecules can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb or 0.1 kb of nucleic acid sequences that naturally flank the R gene nucleic acid molecule in genomic DNA of the cell from which the R gene is derived.
  • an isolated nucleic acid molecule comprising an R gene has one or more changes in the nucleic acid sequence compared to the native or genomic nucleic acid sequence.
  • the change in the native or genomic nucleic acid sequence includes but is not limited to: changes in the nucleic acid sequence due to the degeneracy of the genetic code; changes in the nucleic acid sequence due to the amino acid substitution, insertion, deletion and/or addition compared to the native or genomic sequence; removal of one or more intron; deletion of one or more upstream or downstream regulatory regions; and deletion of the 5’ and/or 3’ untranslated region associated with the genomic nucleic acid sequence.
  • the nucleic acid molecule encoding one of SEQ ID NOs: 1740-3478 is a non-genomic sequence.
  • polynucleotides comprising an R gene disclosed herein are contemplated. Such polynucleotides are useful for production of encoded polypeptides in host cells when operably linked to a suitable promoter, transcription termination and/or polyadenylation sequences. Such polynucleotides are also useful as probes for isolating homologous or substantially homologous polynucleotides that are R genes or related to R genes disclosed herein.
  • nucleic acid molecules encoding one of SEQ ID NOs: 1740-3478, and variants, fragments and complements thereof.
  • “Complement” is used herein to refer to a nucleic acid sequence that is sufficiently complementary to a given nucleic acid sequence such that it can hybridize to the given nucleic acid sequence to thereby form a stable duplex.
  • a reverse complement is a complement formed by exchanging each A with T, T with A, C with G, and G with C in a sequence and then reversing the 5’ to 3’ order of the exchanged sequence, such that the reverse complement of 5’-ACCTGAG-3’ is 5’-CTCAGGT-3’.
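The reverse-complement operation defined above can be expressed in a few lines. The function name is illustrative; the example sequence is the one given in the text.

```python
# Translation table exchanging A<->T and C<->G (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Exchange each base for its complement, then reverse the 5' to 3' order."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applied to 5’-ACCTGAG-3’, the function returns CTCAGGT, matching the example in the definition. Characters other than A, C, G and T are left unchanged, so ambiguity codes would need separate handling.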
  • “Polynucleotide sequence variants” is used herein to refer to a nucleic acid sequence that except for the degeneracy of the genetic code encodes the same polypeptide.
  • the nucleic acid molecule comprising an R gene is a non-genomic nucleic acid sequence.
  • a “non-genomic nucleic acid sequence” or “non-genomic nucleic acid molecule” or “non-genomic polynucleotide” refers to a nucleic acid molecule that has one or more changes in the nucleic acid sequence compared to a native or genomic nucleic acid sequence.
  • the change to a native or genomic nucleic acid molecule includes but is not limited to: changes in the nucleic acid sequence due to the degeneracy of the genetic code; optimization of the nucleic acid sequence for expression in plants; changes in the nucleic acid sequence to introduce at least one amino acid substitution, insertion, deletion and/or addition compared to the native or genomic sequence; removal of one or more intron associated with the genomic nucleic acid sequence; insertion of one or more heterologous introns; deletion of one or more upstream or downstream regulatory regions associated with the genomic nucleic acid sequence; insertion of one or more heterologous upstream or downstream regulatory regions; deletion of the 5’ and/or 3’ untranslated region associated with the genomic nucleic acid sequence; insertion of a heterologous 5’ and/or 3’ untranslated region; and modification of a polyadenylation site.
  • the non-genomic nucleic acid molecule is a synthetic nucleic acid sequence.
  • the nucleic acid molecule comprising an R gene disclosed herein is a non-genomic polynucleotide having a nucleotide sequence having at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identity, to a nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478, wherein the R gene can confer disease resistance activity when expressed in a plant.
  • the nucleic acid molecule encodes a polypeptide variant comprising one or more amino acid substitutions relative to the amino acid sequence of one of SEQ ID NOs: 1740-3478.
  • Nucleic acid molecules that are fragments of these R gene nucleic acid sequences are also encompassed by the disclosure.
  • “Fragment” as used herein refers to a portion of the nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478.
  • a fragment of a nucleic acid sequence may encode a biologically active portion of one of SEQ ID NOs: 1740-3478 or it may be a fragment that can be used as a hybridization probe or PCR primer using methods disclosed below.
  • Nucleic acid molecules that are fragments of a nucleic acid sequence encoding a polypeptide comprise at least about 150, 180, 210, 240, 270, 300, 330, 360, 400, 450, or 500 contiguous nucleotides or up to the number of nucleotides present in a full-length nucleic acid sequence encoding one of SEQ ID NOs: 1740-3478 (e.g., one of SEQ ID NOs: 1-1739, respectively), depending upon the intended use.
  • Contiguous nucleotides is used herein to refer to nucleotide residues that are immediately adjacent to one another.
  • Fragments of the nucleic acid sequences will encode protein fragments that retain the biological activity of the R gene-encoded polypeptide and, hence, retain disease resistance.
  • “Retains disease resistance” is used herein to refer to a polypeptide having at least about 10%, at least about 30%, at least about 50%, at least about 70%, 80%, 90%, 95% or higher of the disease resistance of the full-length R gene disclosed herein.
  • a polynucleotide disclosed herein encodes a polypeptide comprising an amino acid sequence having at least about 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or greater identity across the entire length of the amino acid sequence of one of SEQ ID NOs: 1740-3478.
  • such a polynucleotide comprises genomic sequence, including introns, regulatory elements, and untranslated regions.
  • the disclosure also provides nucleic acid molecules encoding variants of the R gene-encoded polypeptide disclosed herein.
  • “Variants” of R gene include sequences that encode the R polypeptides disclosed herein (such as one of SEQ ID NOs: 1740-3478) or a fragment or variant thereof, but that differ conservatively because of the degeneracy of the genetic code as well as those that are sufficiently identical as discussed above.
  • Naturally occurring allelic variants can be identified with the use of well-known molecular biology techniques, such as polymerase chain reaction (PCR) and hybridization techniques as outlined below.
  • Variant nucleic acid sequences also include synthetically derived nucleic acid sequences that have been generated, for example, by using site-directed mutagenesis but which still encode the R-gene polypeptides disclosed herein.
  • variant nucleic acid molecules can be created by introducing one or more nucleotide substitutions, additions and/or deletions into the corresponding nucleic acid sequence disclosed herein, such that one or more amino acid substitutions, additions or deletions are introduced into the encoded protein. Mutations can be introduced by standard techniques, such as site-directed mutagenesis and PCR-mediated mutagenesis. Such variant nucleic acid sequences are also encompassed by the present disclosure.
  • variant nucleic acid sequences can be made by introducing mutations randomly along all or part of the coding sequence, such as by saturation mutagenesis, and the resultant mutants can be screened for ability to confer activity to identify mutants that retain activity.
  • the encoded protein can be expressed recombinantly, and the activity of the protein can be determined using standard assay techniques.
  • polynucleotides of the disclosure and fragments thereof are optionally used as substrates for a variety of recombination and recursive recombination reactions, in addition to standard cloning methods as set forth in, e.g., Ausubel, Berger and Sambrook, i.e., to produce additional polypeptide homologues and fragments thereof with desired properties. A variety of such reactions are known.
  • Methods for producing a variant of any nucleic acid listed herein comprising recursively recombining such polynucleotide with a second (or more) polynucleotide, thus forming a library of variant polynucleotides, are also examples of the disclosure, as are the libraries produced, the cells comprising the libraries and any recombinant polynucleotide produced by such methods. Additionally, such methods optionally comprise selecting a variant polynucleotide from such libraries based on activity, wherein such recursive recombination is done in vitro or in vivo.
  • a variety of diversity generating protocols including nucleic acid recursive recombination protocols are available.
  • the procedures can be used separately, and/or in combination to produce one or more variants of a nucleic acid or set of nucleic acids, as well as variants of encoded proteins.
  • Individually and collectively, these procedures provide robust, widely applicable ways of generating diversified nucleic acids and sets of nucleic acids (including, e.g., nucleic acid libraries) useful, e.g., for the engineering or rapid evolution of nucleic acids, proteins, pathways, cells and/or organisms with new and/or improved characteristics.
  • the result of any of the diversity generating procedures described herein can be the generation of one or more nucleic acids, which can be selected or screened for nucleic acids with or which confer desirable properties or that encode proteins with or which confer desirable properties.
  • any nucleic acids that are produced can be selected for a desired activity or property, e.g. such activity at a desired pH, etc. This can include identifying any activity that can be detected, for example, in an automated or automatable format, by any of the assays in the art.
  • a variety of related (or even unrelated) properties can be evaluated, in serial or in parallel, at the discretion of the practitioner.
  • nucleotide sequences disclosed herein can also be used to isolate corresponding sequences from a different source. In this manner, methods such as PCR, hybridization, and the like can be used to identify such sequences based on their sequence homology to the sequences identified by the methods disclosed herein. Sequences that are selected based on their sequence identity to the entire sequences set forth herein or to fragments thereof are encompassed by the disclosure. Such sequences include sequences that are orthologs of the sequences.
  • the term “orthologs” refers to genes derived from a common ancestral gene and which are found in different species as a result of speciation. Genes found in different species are considered orthologs when their nucleotide sequences and/or their encoded protein sequences share substantial identity as defined elsewhere herein.
  • oligonucleotide primers can be designed for use in PCR reactions to amplify corresponding DNA sequences from cDNA or genomic DNA extracted from any organism of interest.
  • Methods for designing PCR primers and PCR cloning are disclosed in Sambrook et al. 1989 Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York), hereinafter “Sambrook”. See also, Innis et al., eds. 1990 PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. 1999 PCR Methods Manual (Academic Press, New York).
  • Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially-mismatched primers, and the like.
  • hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments or other oligonucleotides and may be labeled with a detectable group such as 32P or any other detectable marker, such as other radioisotopes, a fluorescent compound, an enzyme or an enzyme cofactor.
  • Probes for hybridization can be made by labeling synthetic oligonucleotides based on the known polypeptide-encoding nucleic acid sequences disclosed herein.
  • the probe typically comprises a region of nucleic acid sequence that hybridizes under stringent conditions to at least about 12, at least about 25, at least about 50, 75, 100, 125, 150, 175 or 200 consecutive nucleotides of nucleic acid sequences encoding polypeptides or a fragment or variant thereof.
  • the term “nucleotide constructs” is not intended to limit the disclosure to constructs comprising DNA.
  • Polynucleotide constructs, particularly polynucleotides and oligonucleotides composed of ribonucleotides and combinations of ribonucleotides and deoxyribonucleotides, may also be employed in the methods disclosed herein.
  • the isolated polynucleotide constructs, nucleic acids, and nucleotide sequences disclosed herein additionally encompass all complementary forms (e.g., the reverse complement) of each sequence disclosed for such a construct.
  • polynucleotide constructs and nucleotide sequences disclosed herein can encompass any such constructs, molecules, and sequences suitable for use in a method for transforming plant material disclosed herein. Such constructs can include naturally occurring molecules and/or synthetic analogues.
  • the disclosed nucleotide constructs, nucleic acids, and nucleotide sequences also encompass all forms of nucleotide constructs including, but not limited to, single-stranded forms, double-stranded forms, hairpins, stem-and-loop structures and the like.
  • Transformed organisms disclosed herein include plant cells, bacteria, yeast, baculovirus, protozoa, nematodes and algae.
  • the transformed organism comprises a disclosed sequence (e.g., as part of a construct, expression cassette, or vector comprising the nucleotide sequence disclosed herein) that is associated with increased disease resistance.
  • the disclosed sequences can be used in constructs for expression in the organism of interest.
  • Constructs can include 5’ and 3’ regulatory sequences operably linked to an R gene sequence, variant or fragment disclosed herein.
  • operably linked refers to a functional linkage between a promoter and/or a regulatory sequence and a second sequence, wherein the promoter and/or regulatory sequence initiates, mediates, and/or affects transcription of the DNA sequence corresponding to the second sequence.
  • operably linked means that the nucleic acid sequences being linked are contiguous and, where necessary to join two protein coding regions, in the same reading frame.
  • the construct may additionally contain at least one additional gene to be cotransformed into the organism. Alternatively, the additional gene(s) can be provided on multiple DNA constructs.
  • Such a DNA construct is provided with a plurality of restriction sites for insertion of the polypeptide gene sequence of the disclosure to be under the transcriptional regulation of the regulatory regions.
  • the DNA construct may additionally contain selectable marker genes.
  • the DNA construct will generally include in the 5' to 3' direction of transcription: a transcriptional and translational initiation region (e.g., a promoter), a DNA sequence of the embodiments, and a transcriptional and translational termination region (e.g., termination region) functional in the organism serving as a host.
  • the transcriptional initiation region e.g., the promoter
  • the transcriptional initiation region may be native, analogous, foreign or heterologous to the host organism and/or to the sequence of the embodiments.
  • the promoter or regulatory sequence may be the natural sequence or alternatively a synthetic sequence.
  • the term “foreign” as used herein indicates that the promoter is not found in the native organism into which the promoter is introduced.
  • the term “heterologous” in reference to a sequence means a sequence that originates from a foreign species or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
  • a chimeric gene comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence. Where the promoter is a native or natural sequence, the expression of the operably linked sequence is altered from the wild-type expression, which results in an alteration in phenotype.
  • the DNA construct comprises a polynucleotide encoding one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof. In some embodiments the DNA construct comprises a polynucleotide encoding a fusion protein that includes one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof.
  • a DNA construct may also include a transcriptional enhancer sequence.
  • An “enhancer” refers to a DNA sequence which can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter.
  • Various enhancers include, for example, introns with gene expression enhancing properties in plants (US Patent Application Publication Number 2009/0144863), the ubiquitin intron (i.e., the maize ubiquitin intron 1 (see, for example, NCBI sequence S94464)), the omega enhancer or the omega prime enhancer (Gallie et al. 1989 Molecular Biology of RNA ed.
  • the termination region may be native with the transcriptional initiation region, may be native with the operably linked DNA sequence of interest, may be native with the plant host or may be derived from another source (i.e., foreign or heterologous to the promoter, the sequence of interest, the plant host or any combination thereof).
  • Convenient termination regions are available from the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also, Guerineau et al. 1991 Mol. Gen. Genet. 262:141-144; Proudfoot 1991 Cell 64:671-674; Sanfacon et al. 1991 Genes Dev. 5:141-149; Mogen et al. 1990 Plant Cell 2:1261-1272; Munroe et al. 1990 Gene 91:151-158; Ballas et al. 1989 Nucleic Acids Res. 17:7891-7903 and Joshi et al. 1987 Nucleic Acid Res. 15:9627-9639.
  • a nucleic acid may be optimized for increased expression in the host organism.
  • the synthetic nucleic acids can be synthesized using plant-preferred codons for improved expression. See, for example, Campbell and Gowri 1990 Plant Physiol. 92:1-11 for a discussion of host-preferred codon usage.
  • although nucleic acid sequences of the embodiments may be expressed in both monocotyledonous and dicotyledonous plant species, the sequences can be modified to account for the specific codon preferences and GC content preferences of monocotyledons or dicotyledons, as these preferences have been shown to differ (Murray et al. 1989 Nucleic Acids Res. 17:477-498).
  • the plant-preferred codon for a particular amino acid may be derived from known gene sequences from plants.
  • Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, exon-intron splice site signals, transposon-like repeats, and other well-characterized sequences that may be deleterious to gene expression.
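Deriving a plant-preferred codon for each amino acid from known gene sequences can be sketched as a simple frequency count. The example below is a minimal illustration under stated assumptions: the input is a list of in-frame coding sequences supplied by the user, the standard genetic code applies, and "preferred" is taken to mean the most frequently observed codon. The function names are hypothetical and nothing here reproduces a method from the disclosure.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Standard genetic code, built from the conventional TCAG-ordered string.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def preferred_codons(coding_seqs: Iterable[str]) -> Dict[str, str]:
    """Return the most frequent codon per amino acid in the given coding sequences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in coding_seqs:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            amino = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
            if amino is not None and amino != "*":
                counts[amino][codon] += 1
    return {amino: c.most_common(1)[0][0] for amino, c in counts.items()}


def back_translate(protein: str, preferred: Dict[str, str]) -> str:
    """Encode a protein using the preferred codon for each residue."""
    return "".join(preferred[residue] for residue in protein)
```

`back_translate` raises `KeyError` for any residue that never appears in the reference genes, which makes gaps in the reference set visible rather than silently guessed.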
  • the GC content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell.
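The reference GC level described above can be estimated by averaging the GC fraction of known genes expressed in the host. The sketch below is illustrative only; the function names are assumptions, and averaging per-gene fractions (rather than pooling all bases) is one of several reasonable choices.

```python
from typing import Iterable


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / called


def mean_gc(reference_genes: Iterable[str]) -> float:
    """Average per-gene GC fraction across a set of host reference genes."""
    values = [gc_content(seq) for seq in reference_genes]
    if not values:
        raise ValueError("no reference genes supplied")
    return sum(values) / len(values)
```

A candidate sequence whose `gc_content` differs markedly from `mean_gc` of the host reference set would be a candidate for adjustment through synonymous codon changes.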
  • host cell refers to a cell which contains a vector and supports the replication and/or expression of the expression vector. Host cells may be prokaryotic cells such as E. coli or eukaryotic cells such as yeast, insect, amphibian or mammalian cells or monocotyledonous or dicotyledonous plant cells.
  • An example of a monocotyledonous host cell is a maize host cell.
  • the sequence is modified to avoid predicted hairpin secondary mRNA structures.
  • the various DNA fragments may be manipulated so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
  • adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites or the like.
  • in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions may be involved.
  • a number of promoters can be used in the practice of the embodiments.
  • the promoters can be selected based on the desired outcome.
  • the nucleic acids can be combined with constitutive, tissue-preferred, inducible
  • the methods of the embodiments involve introducing a polypeptide or polynucleotide into a plant.
  • “Introducing” as used herein means presenting to the plant the polynucleotide or polypeptide in such a manner that the sequence gains access to the interior of a cell of the plant.
  • the methods of the embodiments do not depend on a particular method for introducing a polynucleotide or polypeptide into a plant, only that the polynucleotide(s) or polypeptide(s) gains access to the interior of at least one cell of the plant.
  • Methods for introducing polynucleotide(s) or polypeptide(s) into plants include, but are not limited to, stable transformation methods, transient transformation methods, and virus-mediated methods.
  • “Stable transformation” as used herein means that the nucleotide construct introduced into a plant integrates into the genome of the plant and is capable of being inherited by the progeny thereof. “Transient transformation” as used herein means that a polynucleotide is introduced into the plant and does not integrate into the genome of the plant or a polypeptide is introduced into a plant. “Plant” as used herein refers to whole plants, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, propagules, embryos and progeny of the same. Plant cells can be differentiated or undifferentiated (e.g. callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells and pollen).
  • Transformation protocols as well as protocols for introducing nucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Suitable methods of introducing nucleotide sequences into plant cells and subsequent insertion into the plant genome include microinjection (Crossway et al. 1986 Biotechniques 4:320-334), electroporation (Riggs et al. 1986 Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (US Patent Numbers 5,563,055 and 5,981,840), direct gene transfer (Paszkowski et al. 1984 EMBO J.
  • R genes e.g., encoding one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof
  • the identified polynucleotides can be introduced into a desired location in the genome of a plant through the use of double-stranded break technologies such as TALENs, meganucleases, zinc finger nucleases, CRISPR-Cas, and the like.
  • the R gene can be introduced into a desired location in a genome using a CRISPR-Cas system, for the purpose of site-specific insertion.
  • the desired location in a plant genome can be any desired target site for insertion, such as a genomic region amenable for breeding or may be a target site located in a genomic window with an existing trait of interest.
  • Existing traits of interest could be either an endogenous trait or a previously introduced trait.
  • an R gene can be altered through gene editing in its native site to encode an R polypeptide having the amino acid sequence set forth in one of SEQ ID NOs: 1740-3478 or a fragment or variant thereof.
  • an R gene can be introduced by genome editing at a different genomic location.
  • nucleotide construct comprising an R gene sequence encoding a polypeptide having at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% amino acid sequence identity to one of SEQ ID NOs: 1740-3478 (or a fragment or variant thereof) can be inserted at a genomic locus other than the R gene’s native genomic locus.
  • genome editing technologies may be used to alter or modify the polynucleotide sequence to make it a favorable R gene allele.
  • Site specific modifications can be introduced into the desired R gene allele using any method for introducing site specific modification, including, but not limited to, through the use of gene repair oligonucleotides (e.g. US Publication 2013/0019349), or through the use of double-stranded break technologies such as TALENs, meganucleases, zinc finger nucleases, CRISPR-Cas, and the like.
  • Such technologies can be used to modify the previously introduced polynucleotide through the insertion, deletion or substitution of nucleotides within the introduced polynucleotide.
  • double-stranded break technologies can be used to add additional nucleotide sequences to the introduced polynucleotide. Additional sequences that may be added include additional expression elements, such as enhancer and promoter sequences.
  • genome editing technologies may be used to position additional disease resistant proteins in close proximity to the R gene sequence within the genome of a plant, in order to generate molecular stacks of disease resistant proteins.
  • an “altered target site,” “altered target sequence,” “modified target site,” and “modified target sequence” are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to a non-altered target sequence.
  • Such “alterations” include, for example: (i) replacement of at least one nucleotide, (ii) a deletion of at least one nucleotide, (iii) an insertion of at least one nucleotide, or (iv) any combination of (i) - (iii).
  • NB-ARC domain clustering was carried out by pairwise alignment using MUSCLE, followed by construction of a maximum likelihood phylogenetic tree with 50 bootstraps in MEGA software (version 10.0.5) (Kumar et al. 2018 Mol Biol Evol 35:1547-1549).
  • RNA-seq library construction used in Examples below. Leaves from plants grown under the conditions listed above were sampled at flowering stage (50 to 70 days, depending on the line), and their total RNA was isolated from ground frozen tissue with RNeasy (Qiagen Inc., Valencia, CA), according to the manufacturer’s protocol. Total RNA was then analyzed for quality and quantity with the Agilent Bioanalyzer RNA Nano kit (Agilent Technologies, Santa Clara, CA) and normalized to 1 µg input per sample. Sequencing libraries were prepared according to Illumina Inc. (San Diego, CA) TruSeq mRNA-Seq protocols.
  • RNAs were isolated via attachment to oligo(dT) beads, fragmented and reverse transcribed into cDNA by random hexamer primers with SuperScript II reverse transcriptase (Life Technologies, Carlsbad, CA). The resulting cDNAs were end repaired, 3 prime A-tailed and ligated with Illumina indexed TruSeq adapters. Ligated cDNA fragments were PCR amplified with Illumina TruSeq primers, purified with AMPure XP Beads (Beckman Coulter Genomics, Danvers, MA) and checked for quality and quantity with the Agilent TapeStation 4200 system with D1000 ScreenTape. Libraries were combined into one sequencing pool, which was normalized to 2 nM.
  • the pool was denatured according to Illumina sequencing protocols, hybridized and clustered on two flow cell lanes of a NovaSP flow cell using the NovaSeq 6000. Single-end fifty base sequences and eight base dual-index sequences were generated on the NovaSeq 6000 according to Illumina protocols. Data was trimmed for quality with a minimum threshold of Q13 and the resulting sequences were split by index identifier. Sequencing data is available at the Sequence Read Archive (SRA) database, accession GSE206952.
  • RNA-seq expression data was obtained from a short-read repository (SRA, https://www.ncbi.nlm.nih.gov/sra, accessions ERX3793507-ERX3793986).
  • RNA-seq reads obtained from SRA, as well as those generated through our own library construction and sequencing, were then quantified by running Salmon (version 1.1.0) against the transcriptome of each NAM founder line, with GC bias correction (Patro et al., 2017).
  • Transcript expression per library was then converted to gene expression per tissue using the DESeq2 package in R (version 1.31.6) (Love et al., 2014).
  • Intra-cluster expression variability was assessed by calculating the average pairwise Manhattan distance of log-transformed expression values for each cluster, using the spatial distance module of SciPy (version 1.5.4).
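  • The intra-cluster variability measure described above can be sketched in plain Python as follows. This is a minimal illustration, not the code used in the Examples: the function name, the input layout and the log2(x + 1) transform are assumptions, since the text states only that expression values were log-transformed.

```python
import math
from itertools import combinations


def mean_pairwise_manhattan(expr_by_gene, pseudocount=1.0):
    """Average pairwise Manhattan distance between log-transformed profiles.

    expr_by_gene maps a gene ID to its expression values, one per tissue,
    with every gene listing the tissues in the same order.
    """
    # log2(x + pseudocount) is an assumed transform; the source says only
    # that the values were log-transformed.
    logged = [
        [math.log2(v + pseudocount) for v in values]
        for values in expr_by_gene.values()
    ]
    pairs = list(combinations(logged, 2))
    if not pairs:
        raise ValueError("need at least two genes to compare")
    # Manhattan distance = sum of absolute differences across tissues.
    total = sum(sum(abs(a - b) for a, b in zip(p, q)) for p, q in pairs)
    return total / len(pairs)


# Hypothetical three-gene cluster measured in three tissues.
cluster = {
    "geneA": [0.0, 3.0, 7.0],
    "geneB": [1.0, 3.0, 7.0],
    "geneC": [3.0, 1.0, 7.0],
}
print(mean_pairwise_manhattan(cluster))
```

With SciPy available, `scipy.spatial.distance.pdist(matrix, metric="cityblock").mean()` computes the same quantity on a genes-by-tissues matrix of log-transformed values.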
  • Example 1 Identification of NLRs.
  • NAM: maize nested association mapping
  • The majority of NLRs (57 %) had canonical structures, with a coiled-coil region, followed by an NB-ARC domain, terminating in a series of LRRs. Some alternative structures were abundant, including proteins containing only a coiled-coil and an NB-ARC domain (14.6 %), proteins containing only an NB-ARC domain and LRRs (11.4 %) and proteins with an NB-ARC and no other canonical NLR domains (6.1 %). Interestingly, several genes were identified that may be the result of a two-NLR fusion.
  • Twenty-five of the NAM founder lines contained an NLR on chromosome 6 which had a coiled-coil-NB-ARC-LRR-NB-ARC-LRR structure, with a C-terminal integrated no apical meristem associated (NAM-associated) domain (Cheng et al., 2012).
  • Example 2 Integrated domains in NLRs of NAM founder lines.
  • HMMer was used to search for atypical integrated domains (IDs) within the NAM NLR repertoire, i.e., the NAM NLRome. After identifying all domains via HMMer, custom Python scripts were employed to filter out all hits that overlapped canonical NLR domains. The resulting set of potential atypical domains was then filtered loosely (e-value threshold of 0.11) and strictly (e-value threshold of 0.01, with at least 40% of the domain covered). The loosely filtered set contained a number of domains of unknown function and canonical domains with very poor coverage. Although the majority of these hits are likely false positives, some may represent true IDs that have undergone significant divergence after their neofunctionalization. After filtering for high-confidence domain calls and collapsing redundant domains, a total of 19 strictly filtered unique integrated domains were found across all NAM NLRs (Fig. 2).
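  • The two-tier filtering described above might look like the following sketch. It is not the custom scripts referred to in the text: the `DomainHit` record, the canonical domain labels, the example hits and the choice to treat the e-value thresholds as inclusive are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainHit:
    gene: str
    domain: str
    start: int       # hit start on the protein, inclusive
    end: int         # hit end on the protein, inclusive
    evalue: float
    hmm_from: int    # first matched position of the domain model
    hmm_to: int      # last matched position of the domain model
    hmm_length: int  # full length of the domain model


# Placeholder labels for canonical NLR domains (assumed names).
CANONICAL = {"NB-ARC", "LRR", "CC"}


def overlaps(a, b):
    """True when two hits on the same gene share at least one residue."""
    return a.gene == b.gene and a.start <= b.end and b.start <= a.end


def model_coverage(hit):
    """Fraction of the domain model covered by the hit."""
    return (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_length


def filter_hits(hits, max_evalue, min_coverage=0.0):
    """Keep non-canonical hits that avoid canonical domains and pass thresholds."""
    canonical = [h for h in hits if h.domain in CANONICAL]
    kept = []
    for h in hits:
        if h.domain in CANONICAL:
            continue
        if any(overlaps(h, c) for c in canonical):
            continue
        if h.evalue <= max_evalue and model_coverage(h) >= min_coverage:
            kept.append(h)
    return kept


# Hypothetical hits: one canonical domain and three candidate integrated domains.
hits = [
    DomainHit("g1", "NB-ARC", 150, 400, 1e-50, 1, 250, 250),
    DomainHit("g1", "Pkinase", 600, 850, 1e-20, 10, 250, 260),
    DomainHit("g1", "DUF_x", 380, 420, 0.05, 1, 30, 100),
    DomainHit("g2", "zf_x", 10, 40, 0.05, 1, 20, 100),
]
loose = filter_hits(hits, max_evalue=0.11)
strict = filter_hits(hits, max_evalue=0.01, min_coverage=0.40)
print([h.domain for h in loose], [h.domain for h in strict])
```

In this toy input the hit that overlaps the NB-ARC domain is discarded in both tiers, while the short, weak zinc-finger hit survives only the loose filter.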
  • The most frequent integrated domain was a kinase, which appeared in two to three NLRs in each NAM founder line.
  • PAH: amphipathic helix
  • While the unfiltered set included many low-quality domain hits found in only a single gene, the more strictly filtered set included only two domains that appeared uniquely in a single gene in only one NAM founder line (zf-RVT and UvsW).
  • The Mo18W genome contains nucleotides that potentially encode an NB-ARC domain and which cluster very closely with the B73 gene NB-ARC domain (98.3% sequence identity), but no actual gene was found to be produced at this locus. Similar genomic/genic NB-ARC clusters were found throughout the genome, including Chr1 (Mo18W Zm000034a005849), Chr2 (Mo18W Zm00034a016521), Chr4 (Mo18W Zm000034a031848), Chr6 (B73 Zm00001e031193), Chr7 (Mo18W Zm00034a051957) and Chr10 (B73 Zm00001e039226).
  • NLRs were found to be distributed as singletons and small groups throughout the genome, but many existed in a few large clusters of variable size in which many NLRs were concentrated in a small genomic space. For the purpose of this analysis, physically clustered genes were those considered to reside within 1 Mb of another NLR.
  • This cluster also contained a large number of genomic NB-ARCs without definitive gene models, with the most extreme example being M37W, which had 17 NLR genes and 18 genomic regions with potential to encode NB-ARCs, but no gene model derived from RNA-seq data. Unsurprisingly, this cluster also had a high degree of PAV and allelic diversity. Sequence-based clustering revealed that this cluster is actually composed of two groups which are distinct at the sequence level but in very close physical proximity.
  • Despite the distance and potential intervening gene, these NLRs appeared to be highly co-regulated, averaging an R² of 0.97 across different tissue types.
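  • One way to apply the 1 Mb rule is a single pass over position-sorted genes on each chromosome, as sketched below. The tuple layout, the function name and the choice to measure the gap from the furthest gene end seen so far to the next gene start are assumptions; the text specifies only the 1 Mb distance.

```python
from collections import defaultdict


def physical_clusters(genes, max_gap=1_000_000):
    """Group genes so that each member lies within max_gap of another member.

    genes is an iterable of (gene_id, chromosome, start, end) tuples.
    Returns a list of (chromosome, [gene_ids]) tuples; singletons are
    returned as clusters of size one.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start, end in genes:
        by_chrom[chrom].append((start, end, gene_id))

    clusters = []
    for chrom in sorted(by_chrom):
        current, reach = [], None
        for start, end, gene_id in sorted(by_chrom[chrom]):
            # Close the running cluster when the next gene starts too far away.
            if current and start - reach > max_gap:
                clusters.append((chrom, current))
                current, reach = [], None
            current.append(gene_id)
            reach = end if reach is None else max(reach, end)
        clusters.append((chrom, current))
    return clusters


# Hypothetical coordinates for five NLRs.
genes = [
    ("nlr1", "chr10", 1_000_000, 1_005_000),
    ("nlr2", "chr10", 1_600_000, 1_604_000),
    ("nlr3", "chr10", 2_500_000, 2_503_000),
    ("nlr4", "chr10", 9_000_000, 9_004_000),
    ("nlr5", "chr2", 500_000, 503_000),
]
for chrom, members in physical_clusters(genes):
    print(chrom, members)
```

Because membership is chained, two genes more than 1 Mb apart can still share a cluster when intermediate NLRs bridge the gap, which matches the "within 1 Mb of another NLR" wording.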
  • Clustering of the protein sequences of all NAM NLRs to determine their relationships was done using the orthAgogue software application (Ekseth et al. 2014 Bioinformatics 30(5): 734-736). A total of 158 clusters were identified, of which 20 were classified as “core” NLR clusters, with all NAM founder lines containing at least one member. A total of 15 clusters were present in all but one NAM founder line and 11 were missing from only two NAM founder lines. On average, clusters contained at least one member in 16 out of the 26 NAM founder lines, indicating that PAV was the norm for most NLRs across the lines.
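  • Given cluster assignments, the presence/absence summary above (core clusters, clusters missing from one or two lines, mean number of lines per cluster) can be tallied as in this sketch. The input layout, function names and toy data are assumptions; only the line names are taken from the text.

```python
def lines_per_cluster(cluster_members):
    """Map each cluster ID to the set of lines with at least one member.

    cluster_members maps a cluster ID to a list of (line, gene_id) pairs.
    """
    return {
        cluster: {line for line, _gene in members}
        for cluster, members in cluster_members.items()
    }


def summarize_presence(cluster_members, all_lines):
    """Count, per cluster, how many lines lack a member, and list core clusters."""
    all_lines = set(all_lines)
    presence = lines_per_cluster(cluster_members)
    missing = {c: len(all_lines - lines) for c, lines in presence.items()}
    core = sorted(c for c, n in missing.items() if n == 0)
    mean_lines = sum(len(all_lines) - n for n in missing.values()) / len(missing)
    return {"core": core, "missing": missing, "mean_lines": mean_lines}


# Toy example with four lines instead of the 26 NAM founder lines.
lines = ["B73", "Mo18W", "M37W", "Oh7B"]
clusters = {
    "c1": [("B73", "g1"), ("Mo18W", "g2"), ("M37W", "g3"), ("Oh7B", "g4"), ("B73", "g5")],
    "c2": [("B73", "g6"), ("Mo18W", "g7"), ("M37W", "g8")],
    "c3": [("Oh7B", "g9")],
}
summary = summarize_presence(clusters, lines)
print(summary["core"], summary["missing"])
```

A cluster counts as "core" here only when every line contributes at least one member, mirroring the definition given in the text.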
  • NLRs are known to be a very diverse group of genes, with high presence-absence variation, high Ka/Ks ratios and frequent intergenic crossovers in other species.
  • orthAgogue clusters were examined for outliers on different chromosomes or at significantly different positions relative to other members. Although the vast majority of NLRs (98.7 %) resided in groups that contained similar positions on the same chromosome, several outliers were also identified. The most extreme outliers were found in Oh7B, which contained 11 NLRs on chromosome 9 that clustered with chromosome 10 NLRs from all other NAM founder lines.
  • Besides transposition, an alternative or additional explanation may be that the rapidly evolving nature of NLRs caused two separate clusters to undergo convergent evolution.
  • Subsequent expression analysis revealed relatively low Manhattan distances for pairwise comparisons within these clusters, providing further evidence for their relatedness (see “NLR gene expression”).
  • The RNA-seq data was originally intended for transcriptome annotation and most tissues only contained two biological replicates, reducing the statistical power of differential expression testing. Therefore, the data was used only to assess broad expression differences across tissues, and we have noted all cases where the two biological replicates are substantially divergent (> 2-fold difference) and a third biological replicate would be required to get a more accurate expression estimate.
  • The public data was also supplemented with additional RNA-seq libraries that contained four biological replicates from each NAM founder line constructed from R1 leaves, a developmental stage at which plants often encounter pathogen challenge in the field.
  • NLRs were found to be expressed at a significant level across all tissues surveyed (average fragments per kilobase of exon per million mapped fragments or FPKM of 6.75), with the highest average expression found in vegetative tissue. Endosperm had the lowest median NLR expression, followed by embryo, anther, ear inflorescence and tassel. All vegetative tissues had similar levels of average NLR expression, with shoot having the lowest average NLR expression (4.52 FPKM) and leaf base having the highest (7.85 FPKM).
  • NLRs which lacked LRR domains were expressed at a slightly lower level than those containing the canonical coiled-coil, NB-ARC and LRR domains (average FPKM of 4.26 compared to 6.94).
  • Two RPW8 NLRs were found in the NAM founder lines, and both possessed above-average expression levels (average 18.98 FPKM).
  • The rare tissue-specific expression patterns may have bearing on resistance gene selection for diseases which are known to invade specific tissues.
  • The Chr10->Chr2 translocation, which resulted in a sequence-based cluster containing a mixture of genes from different chromosomes, also possessed a Manhattan distance similar to that of clusters containing non-mixed genes (23.2, compared to an average of 25.4).
  • Example 7 Diversity within clusters at the whole gene and domain-level.
  • Entropy variation across the different regions of the NLR proteins within each cluster was assessed. After Shannon entropy was calculated at each position within each cluster, these values were binned into the following protein regions: coiled-coil, NB-ARC domain, spacer (region between NB-ARC and start of LRRs), LRRs, LRR spacers (regions in between LRRs), C-terminal and integrated domains. Coiled-coil regions, which have been proposed to play a role in inter- and intra-protein interaction, tended to have higher entropy than the whole protein (0.31).
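  • The per-position entropy calculation and region binning described above can be sketched as follows. The log base (base 2), the handling of gaps (ignored), the half-open region coordinates and the toy alignment are assumptions; the text does not specify them, so absolute values from this sketch need not match the reported figures.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-."):
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return 0.0
    total = len(residues)
    return sum(
        (n / total) * math.log2(total / n) for n in Counter(residues).values()
    )


def alignment_entropy(aligned_seqs):
    """Entropy at every column of a set of equal-length aligned sequences."""
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [column_entropy(col) for col in zip(*aligned_seqs)]


def mean_entropy_by_region(entropies, regions):
    """Average entropy per region; regions maps name -> (start, end), end exclusive."""
    return {
        name: sum(entropies[start:end]) / (end - start)
        for name, (start, end) in regions.items()
    }


# Toy four-sequence alignment standing in for one sequence-based cluster.
alignment = ["MKLA", "MKVA", "MRIA", "MR-A"]
entropies = alignment_entropy(alignment)
by_region = mean_entropy_by_region(
    entropies, {"region_1": (0, 2), "region_2": (2, 4)}
)
print(entropies, by_region)
```

Fully conserved columns score zero, and the score rises as more residues share a position at similar frequencies, which is why variable regions such as LRRs yield higher averages than conserved ones such as NB-ARC domains.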
  • NB-ARC domains tend to have higher conservation than average within NLRs, and this was broadly consistent across the clusters from the NAM founder lines (average Shannon entropy of 0.10).
  • Spacer sequences between the NB-ARC domain and LRR region also had low entropy on average (0.18).
  • LRRs have been noted to have higher than average diversity, and we also found that they had high average Shannon entropy within clusters (0.38).
  • The spacer regions between different LRRs on average had a similar level of entropy (0.38), but on a per-cluster basis, diversity of LRRs was often uncorrelated with diversity of LRR spacer regions.
  • The Sec66 domain, which has been proposed to be involved in protein translocation, had extremely low entropy (0.04) within its 24-member cluster, despite this cluster having very high entropy at the whole protein level (0.69).
  • The majority of clusters with IDs tended to have high entropy either within the ID, or at the whole protein level, which may be reflective of their proposed role in direct effector binding.
  • A “composite” NLR was constructed by averaging the entropy patterns of the most common domains. Average Shannon entropy of each position within each domain/protein region was calculated for all clusters (Fig. 3). For regions of variable size (spacers and C-terminal), the positions of entropy values were placed into 100 bins, with each bin representing 1% of the domain’s total size in a given cluster, before averaging. Only the four common LRR HMM models were included in the resulting composite NLR (LRR_1, LRR_4, LRR_6 and LRR_8). These LRR domains showed the second highest level of entropy, with only the C-terminal domains having higher average values. The resulting composite NLR shows the clear variability of NLR entropy throughout the different canonical domains (Fig. 3).
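  • The rescaling of variable-length regions into 100 bins before averaging across clusters might be implemented along these lines. The function names and the rule for regions shorter than the number of bins (a position is reused across adjacent bins) are assumptions not stated in the text.

```python
def bin_profile(values, n_bins=100):
    """Rescale a per-position profile to n_bins values by averaging each bin.

    Bin i covers the positions falling in the i-th 1/n_bins slice of the
    region. When the region is shorter than n_bins, neighbouring bins
    reuse the same position (an assumed convention).
    """
    n = len(values)
    if n == 0:
        raise ValueError("profile must contain at least one position")
    binned = []
    for i in range(n_bins):
        start = i * n // n_bins
        end = max(start + 1, (i + 1) * n // n_bins)
        chunk = values[start:end]
        binned.append(sum(chunk) / len(chunk))
    return binned


def composite_profile(profiles, n_bins=100):
    """Average equally binned profiles from several clusters, bin by bin."""
    binned = [bin_profile(p, n_bins) for p in profiles]
    return [sum(col) / len(col) for col in zip(*binned)]


# Two hypothetical spacer entropy profiles of different lengths, using
# four bins instead of 100 to keep the example readable.
spacer_a = [0.0, 0.2, 0.4, 0.6]
spacer_b = [0.2, 0.6]
print(composite_profile([spacer_a, spacer_b], n_bins=4))
```

Binning by relative position lets regions of different lengths in different clusters be compared on a common 0 to 100% axis before the per-bin averages are taken.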
  • NLRs were found to have very high levels of PAV and allelic diversity and were distributed unevenly across maize genomes, with a single cluster on chromosome 10 representing a significant portion of the total complement of almost all lines.
  • The physical clustering seen across the maize genome correlates well with sequence-based clustering, enabling physical placement of NLRs based on sequence alone.
  • The ability to infer physical location from sequence is beneficial for techniques such as resistance gene enrichment sequencing (RenSeq) (Jupe et al. 2013 Plant J, 76: 530-44).
  • NLR expression across a wide array of tissue types indicated that genes in most sequence-based clusters shared tissue-specific expression patterns. The majority of NLRs were expressed ubiquitously, although some clear root-preferential clusters existed. A small number of outliers within sequence-based clusters exhibited different expression patterns compared to the rest of the cluster, including a chromosome 10 NLR which had leaf base-specific expression in most lines, but much broader expression in other lines. Such outliers may be indicative of neofunctionalization, although additional studies are needed to assess this possibility.
  • Although PAH domains have not been reported as effector targets, they are known to be integrated into the NLRs of other species and may be targeted by pathogens due to their role in protein-protein interaction of transcription factors (Kroj et al. 2016, supra; Bowen et al. 2010 J Mol Biol, 395: 937-49).
  • A novel ID structure was found in a gene that contained both an N-terminal REC104 domain and a mid-protein kinase domain.

Abstract

The invention relates to plants, cells, tissues and germplasm thereof comprising R genes for increased resistance to plant diseases. The invention also relates to breeding methods and to methods of identifying and selecting plants having the disclosed R genes. The invention further relates to methods for making novel R gene variants and fragments for disease resistance. The R genes of the invention are useful in producing disease-resistant plants through breeding, transgenic modification or genome editing.
PCT/US2024/018501 2023-03-06 2024-03-05 Plant pathogen resistance genes WO2024186806A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202363488568P 2023-03-06 2023-03-06
US63/488,568 2023-03-06

Publications (2)

Publication Number Publication Date
WO2024186806A2 true WO2024186806A2 (fr) 2024-09-12
WO2024186806A3 WO2024186806A3 (fr) 2024-10-24

Family

ID=92675664

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2024/018501 WO2024186806A2 (fr) 2023-03-06 2024-03-05 Plant pathogen resistance genes

Country Status (1)

Country Link
WO (1) WO2024186806A2 (fr)
