WO2023151004A1 - Methods and compositions for increasing protein and oil content and/or modifying oil profile in plant - Google Patents



Publication number
WO2023151004A1
Authority
WO
WIPO (PCT)
Prior art keywords
plant
nucleic acid
acid sequence
hap
seq
Application number
PCT/CN2022/075977
Other languages
French (fr)
Inventor
Qingshan Chen
Zhaoming QI
Dawei XIN
Jian LV
Xiaoping Tan
Original Assignee
Northeast Agriculture University
Syngenta Group Co, Ltd.
Syngenta Crop Protection Ag
Application filed by Northeast Agriculture University, Syngenta Group Co, Ltd., and Syngenta Crop Protection Ag
Priority to PCT/CN2022/075977 (WO2023151004A1)
Priority to PCT/US2023/062421 (WO2023154887A1)
Publication of WO2023151004A1



Classifications

    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H6/00 Angiosperms, i.e. flowering plants, characterised by their botanic taxonomy
    • A01H6/54 Leguminosae or Fabaceae, e.g. soybean, alfalfa or peanut
    • A01H6/542 Glycine max [soybean]
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H1/00 Processes for modifying genotypes; Plants characterised by associated natural traits
    • A01H1/12 Processes for modifying agronomic input traits, e.g. crop yield
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H5/00 Angiosperms, i.e. flowering plants, characterised by their plant parts; Angiosperms characterised otherwise than by their botanic taxonomy
    • A01H5/10 Seeds
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8261 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses

Definitions

  • This disclosure relates to the field of plant biotechnology.
  • More specifically, it relates to methods and compositions for increasing plant protein/oil content and modifying oil profile.
  • Soybean is a valuable field crop. Soybean oil extracted from the seed is used in a number of retail products such as cooking oil, baked goods, and margarines. Soybean grain is also a food source for both animals and humans, and soybean meal is a component of many foods and animal feeds. Typically, during processing of whole soybeans, the fibrous hull is removed and the oil is extracted; the remaining soybean meal is approximately 50% carbohydrate and 50% protein. For human consumption, soybean meal is made into soybean flour, which is processed into protein concentrates used for meat extenders or specialty pet foods. Edible protein ingredients from soybean offer a healthier and less expensive replacement for animal protein in meat and dairy-type products.
  • An elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% or at least 95% identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and wherein said polypeptide confers increased protein content, increased oil content, and/or a modified oil profile on the elite Glycine max plant.
  • In another aspect, provided herein is a plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% or at least 95% identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein content, increased oil content, and/or a modified oil profile relative to a control plant, and wherein the plant did not comprise said nucleic acid sequence before the genome editing.
  • A plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having (a) an amino acid sequence with at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (b) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content, increased oil content, and/or a modified oil profile as compared to a control plant.
  • A method of producing a soybean plant having increased protein content, increased oil content, and/or a modified oil profile, comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% or at least 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, or a nucleic acid sequence encoding any one of SEQ ID NO: 22 or 24-59, wherein said nucleic acid sequence confers onto said donor soybean plant increased protein content, increased oil content, and/or a modified oil profile; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by isolating a nucleic acid from said progeny plant and detecting within said nucleic acid a molecular marker associated with said nucleic acid sequence, thereby producing a soybean plant having increased protein content, increased oil content,
  • A method of conferring increased protein content, increased oil content, and/or a modified oil profile to a plant, comprising: a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence with at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (ii) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content, increases oil content, and/or modifies the oil profile compared to a control plant not expressing said nucleic acid sequence.
  • A polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein expression of the polypeptide in a plant confers increased protein content, increased oil content, and/or a modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59 and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in the plant confers increased protein content, increased oil content, and/or a modified oil profile on said plant; (c) a polypeptide having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity with, and having the same function as, the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein the polypeptide when expressed
  • A nucleic acid molecule comprising (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95%, or 100% sequence identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein content, increases oil content, and/or modifies the oil profile in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 and a sequence encoding SEQ ID NO: 22 or 24-59; or (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a polynucleotide of SEQ ID
  • Primer pairs for amplifying the nucleic acid molecule disclosed above are also provided herein.
  • The present application includes the following figures.
  • The figures are intended to illustrate certain embodiments and/or features of the compositions and methods, and to supplement any description(s) of the compositions and methods.
  • The figures do not limit the scope of the compositions and methods, unless the written description expressly indicates that such is the case.
  • FIG. 1 shows an alignment of the Glyma.06G303700 CDS sequences in Suinong 14 (SN14), ZYD00006 (ZYD), and Williams 82 (W82).
  • FIG. 2 shows a phylogenetic tree of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIG. 3 shows a comparison of the amino acid sequences of Glyma.03G036300 and AT1G05230.
  • FIG. 4 shows a comparison of the amino acid sequences of Glyma.03G036300 in Suinong 14 (SN14), ZYD00006 (ZYD), and Williams 82 (W82).
  • FIG. 5 shows a comparison of the amino acid sequences of Glyma.07G192400 in Suinong 14 (SN14), ZYD00006 (ZYD), and Williams 82 (W82).
  • FIGS. 6A-6B show the predicted tertiary protein structures of Glyma.START (Glyma.06G303700) derived from soybean strains Suinong 14 (SN14) and ZYD00006 (ZYD), respectively, according to certain aspects of this disclosure.
  • FIG. 7 shows the tissue-specific expression of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIG. 8 shows the cellular localization of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIG. 9 shows qRT-PCR results used to identify transgenic Arabidopsis expressing Glyma.START (Glyma.06G303700) under the control of a 35S promoter according to certain aspects of this disclosure.
  • FIGS. 10A-10B show the results of analyzing seed fatty acid content/profile and protein content, respectively, in an Arabidopsis mutant and in transgenic Arabidopsis expressing Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIG. 11 shows qRT-PCR results used to identify transgenic soybean expressing Glyma.START (Glyma.06G303700) under the control of a 35S promoter according to certain aspects of this disclosure.
  • FIGS. 12A-12C show seed protein content, fatty acid content, and fatty acid profile, respectively, in transgenic soybean expressing Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIGS. 13A-13B show the protein content distribution and oil content distribution, respectively, of the excellent haplotype in block 1 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIGS. 14A-14B show the protein content distribution and oil content distribution, respectively, of the excellent haplotype in block 2 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIGS. 15A-15B show the protein content distribution and oil content distribution, respectively, of the excellent haplotype in block 3 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIG. 16 shows the map of the Fu28 entry vector according to certain aspects of this disclosure.
  • FIG. 17 shows the map of the pr35S expression vector according to certain aspects of this disclosure.
  • The polypeptides result in a modified oil profile when expressed in a plant or part thereof, as compared to a control plant that does not express the polypeptides.
  • “Oil content” and “fatty acid content” are used interchangeably herein.
  • “Fatty acid profile” and “oil profile” are used interchangeably herein.
  • The polypeptides include SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, and 24-59, and variants thereof.
  • Various means of introducing the nucleic acid sequences into the soybean plant are also disclosed, including transgenic means, gene editing, and breeding.
  • “Phenotype,” “phenotypic trait,” or “trait” refers to a distinguishable characteristic(s) of a genetically controlled trait.
  • The plants provided herein are a non-naturally occurring variety of soybean having the desired trait.
  • The non-naturally occurring variety of soybean is an elite soybean variety.
  • A “non-naturally occurring variety of soybean” is any variety of soybean that does not occur in nature.
  • A “non-naturally occurring variety of soybean” may be produced by any method known in the art, including, but not limited to, transforming a soybean plant or germplasm, transfecting a soybean plant or germplasm, and crossing a naturally occurring variety of soybean with a non-naturally occurring variety of soybean.
  • A “non-naturally occurring variety of soybean” may comprise one or more heterologous nucleotide sequences.
  • A “non-naturally occurring variety of soybean” may comprise one or more non-naturally occurring copies of a naturally occurring nucleotide sequence (i.e., extraneous copies of a gene that naturally occurs in soybean).
  • A “non-naturally occurring variety of soybean” may comprise a non-natural combination of two or more naturally occurring nucleotide sequences (i.e., two or more naturally occurring genes that do not naturally occur in the same soybean, for instance genes not found in Glycine max lines).
  • Methods and compositions are provided that modulate the level of oil, protein, and/or fatty acids in a plant, plant part, or seed.
  • Various methods and compositions are provided that produce an increase in protein content in the plant, plant part, or seed.
  • An increase in protein content includes any statistically significant increase in the protein content of the plant, plant part, or seed when compared to an appropriate control plant or plant part, and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, or higher.
  • An increase in protein content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%.
  • NIR: near-infrared (FOSS) analysis.
  • An increase in oil content includes any statistically significant increase in the oil content of the plant, plant part, or seed when compared to an appropriate control plant or plant part, and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, or higher.
  • An increase in oil content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%.
  • Various methods of assaying oil content are known. For example, mature seeds can be harvested, and grain oil content can be determined by NIR or wet chemistry analysis (see Examples).
  • Various methods and compositions are provided that produce a modified oil profile in the plant, plant part, or seed.
  • A modified oil profile includes a change in the ratio of fatty acid constituents in the oil produced by the plant, plant part, or seed relative to a control plant, without a change (e.g., without an increase or a decrease) in the oil content or oil level of the plant, plant part, or seed.
  • The modified oil profile comprises a modified fatty acid profile, wherein the modified fatty acid profile includes an increase in linoleic acid, palmitic acid, oleic acid, and/or eicosenoic acid in the oil relative to other fatty acid constituents of the oil.
  • The modified oil profile includes an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, or higher of linoleic acid, palmitic acid, oleic acid, and/or eicosenoic acid in the oil without a corresponding increase in oil content.
  • A modified fatty acid profile results in a modified oil profile.
  • A modified oil profile comprises a modified fatty acid profile.
  • An increase in fatty acid content includes any statistically significant increase in the fatty acid content of the plant, plant part, or seed when compared to an appropriate control plant or plant part, and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, or higher.
  • An increase in fatty acid content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%.
  • Various methods of assaying fatty acid content are known. For example, mature seeds can be harvested, and fatty acid content can be determined by gas chromatography (see Examples).
  • The methods and compositions provide for an increase in linoleic acid, palmitic acid, oleic acid, and/or eicosenoic acid (or any combination thereof) when compared to an appropriate control plant.
  • Such increases include, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, or higher.
  • An increase in linoleic acid, palmitic acid, oleic acid, and/or eicosenoic acid includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30% or higher.
  • A “subject plant or plant cell” is one in which a genetic alteration, such as transformation, has been effected with respect to a polynucleotide of interest, or is a plant or plant cell that is descended from a plant or cell so altered and that comprises the alteration.
  • A “control” or “control plant” or “control plant cell” provides a reference point for measuring changes in phenotype of the subject plant or plant cell.
  • A control plant or plant cell may comprise, for example: (a) a wild-type plant or cell, i.e., of the same genotype as the starting material for the genetic alteration which resulted in the subject plant or cell; (b) a plant or plant cell of the same genotype as the starting material but which has been transformed with a null construct (i.e., with a construct which has no known effect on the trait of interest, such as a construct comprising a marker gene); (c) a plant or plant cell which is a non-transformed segregant among progeny of a subject plant or plant cell; (d) a plant or plant cell genetically identical to the subject plant or plant cell but which is not exposed to conditions or stimuli that would induce expression of the gene of interest; or (e) the subject plant or plant cell itself, under conditions in which the gene of interest is not expressed.
  • Compositions and methods for conferring increased protein content, increased oil content, and/or a modified oil profile are provided.
  • Polypeptides, polynucleotides, and fragments and variants thereof that confer increased protein content, increased oil content, and/or a modified oil profile are provided.
  • The polypeptide is SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant of any one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • The polynucleotide is any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NO: 22 or 24-59, or a fragment or variant of any one thereof.
  • The term “gene” refers to a hereditary unit including a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a particular characteristic or trait in an organism.
  • The genome of the soybean cultivar Williams 82 is available at www.ncbi.nlm.nih.gov/assembly/GCF_000004515.6/.
  • Williams 82 was derived by backcrossing a Phytophthora root rot resistance locus from the donor parent Kingwa into the recurrent parent Williams. See Schmutz et al., Nature 2010 Jan 14; 463(7278): 178-83. doi: 10.1038/nature08670.
  • Glyma.06G303700 (SEQ ID NOs: 1-5) is expressed in all tissues and organs, with the highest expression level in seeds.
  • Glyma.START (Glyma.06G303700) comprises several conserved domains (amino acid positions refer to SEQ ID NOs: 3 and 5): a START_ArGLABRA2_like domain (aa 241-465); a START domain (aa 246-466); a homeobox domain (aa 57-110); a homeodomain (aa 55-113); a COG5576 superfamily domain (aa 13-129); and a MreC superfamily domain (aa 120-193).
  • The START_ArGLABRA2_like domain is the C-terminal lipid-binding START domain of the Arabidopsis homeobox protein GLABRA 2.
  • The START_ArGLABRA2_like subfamily includes the Arabidopsis homeobox protein GLABRA 2 and other proteins related to steroid production.
  • The homeobox domain encodes a 61-amino acid sequence that can bind specific DNA sequences and control gene expression at the transcriptional level.
  • The COG5576 superfamily domain is a homeodomain-containing transcriptional regulation domain. The MreC superfamily domain is usually involved in the formation and maintenance of cell shape and can position cell wall synthetic complexes.
  • The genomic sequence of Glyma.06G303700 is 8466 bp in length, and the CDS is 2190 bp in length.
  • The exon region of Glyma.06G303700 (SEQ ID NO: 3) in soybean variety SN14 is identical to the corresponding gene in soybean variety Williams82 (W82).
  • Wild soybean (G. soja) variety ZYD00006 (ZYD) comprises four mutations in Glyma.06G303700 relative to Williams82 (FIG.
  • C1162T, i.e., a change from C to T at position 1162
  • A1370G, i.e., a change from A to G at position 1370
  • C2063G, i.e., a change from C to G at position 2063
  • C2098G, i.e., a change from G to A at position 2098
  • The last three mutations do not change the encoded amino acids, but the first mutation, C1162T, results in an alanine-to-valine substitution at position 388 (A388V).
  • The phylogenetic tree of Glyma.06G303700 was constructed with MEGA5 software using homologous sequences from soybean, Arabidopsis, rice, corn, and other plants. See FIG. 2.
  • Glyma.06G303700 shows high homology with Glyma.15G220200, Glyma.12G100100, and AT1G05230.
  • Glyma.12G100100 contains the same conserved domains as Glyma.06G303700.
  • AT1G05230 contains START_ArGLABRA2_like and homeobox domains, which are also present in Glyma.06G303700.
  • AT1G05230 and Glyma.START (Glyma.06G303700) share 78.9% amino acid sequence identity. See FIG. 3.
  • Glyma.03G040200 (SEQ ID NOs: 10-12) has an OPT domain (aa 4-73 of SEQ ID NO: 12), which is related to transmembrane transport. Glyma.03G040200 is expressed at low levels in seeds.
  • The genomic sequence of Glyma.03G040200 (SEQ ID NO: 10) is 463 bp in length, and the CDS (SEQ ID NO: 11) is 237 bp in length.
  • Soybean variety Williams82: SEQ ID NO: 12.
  • Glyma.03G036300 (SEQ ID NOs: 6-9) is a PIF1 helicase involved in a number of cellular processes, including DNA repair, DNA strand breaking, recombination, nucleotide binding, ATP binding, telomere maintenance, and cellular response to DNA damage stimulus.
  • The protein possesses helicase activity and hydrolase activity.
  • Glyma.03G036300 comprises a PIF1 domain (aa 2-211 of SEQ ID NO: 8), an SF1_C_RecD domain (aa 258-303 of SEQ ID NO: 8), and a RecD domain (aa 250-294).
  • The PIF1 domain is a conserved domain shared by the PIF1-like helicase family.
  • The SF1_C_RecD domain is found in the C-terminal helicase domain of RecD family helicases.
  • The RecD domain is found in ATP-dependent exoDNases and related enzymes and acts as a 3'-5' helicase.
  • The RecBCD enzyme can unwind or separate DNA strands and also forms single-stranded gaps in DNA.
  • The genomic sequence of Glyma.03G036300 in W82 (SEQ ID NO: 6) is 988 bp, and the full-length CDS (SEQ ID NO: 7) is 987 bp.
  • Glyma.03G036300 in ZYD is the same as that in W82.
  • Translation of Glyma.03G036300 terminates at the 294th amino acid in SN14, whereas it is translated normally in ZYD00006 (FIG. 4).
  • Glyma.07G192400 (SEQ ID NOs: 16-19) is highly expressed in seeds and is involved in transmembrane transport. No conserved domain information is known for Glyma.07G192400.
  • The genomic sequence of Glyma.07G192400 (SEQ ID NO: 16) is 4263 bp in length, and the CDS (SEQ ID NO: 18) is 417 bp in length. See FIG. 5. Only one base mutation, a G-to-A change, occurred in ZYD00006.
  • Translating the CDS into an amino acid sequence showed that this base mutation changes the encoded amino acid from V (valine) to I (isoleucine) at position 46 (V46I).
  • Glyma.06G297500 (SEQ ID NOs: 13-15).
  • The full-length genomic sequence of Glyma.06G297500 (SEQ ID NO: 13) is 463 bp.
  • The full-length CDS (SEQ ID NO: 14) is 237 bp.
  • The CDS and amino acid sequences are identical in all three soybean varieties: SN14, ZYD00006, and Williams82.
  • When nucleic acid sequences are aligned with each other, the nucleic acids that “correspond to” certain enumerated positions in the present invention are those that align with these positions in a reference sequence but are not necessarily at these exact numerical positions relative to a particular nucleic acid sequence of the invention.
  • Optimal alignment of sequences for comparison can be conducted by computerized implementations of known algorithms or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST) and the ClustalW/ClustalW2/Clustal Omega programs available on the Internet (e.g., on the EMBL-EBI website).
  • Variants and fragments of the above-described polynucleotides and polypeptides increase protein content, increase oil content, and/or modify the oil profile when expressed in a plant, plant part, or seed.
  • Fragments of the proteins that increase protein content, increase oil content, and/or modify the oil profile when expressed in a plant, plant part, or seed include those that are shorter than the full-length sequences, either due to the use of an alternative downstream start site or due to processing that produces a shorter protein having the activity.
  • A fragment of a protein that increases protein content, increases oil content, and/or modifies the oil profile when expressed in a plant can be a polypeptide that is, for example, 10, 25, 50, 100, 150, 200, 250, or more amino acids in length, from any one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • Such biologically active portions can be prepared by recombinant techniques and evaluated for activity of being able to confer increased protein content, increased oil content, and/or modified oil profile.
  • A fragment comprises at least 8 contiguous amino acids of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • Variants disclosed herein are polypeptides having an amino acid sequence that has at least 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% identity to the amino acid sequence of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • Such variants will increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part or seed.
  • A variant polynucleotide comprises a deletion and/or addition of one or more nucleotides at one or more internal sites within the native polynucleotide and/or a substitution of one or more nucleotides at one or more sites in the native polynucleotide.
  • Equivalent programs may also be used.
  • An "equivalent program" is any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide residue matches and an identical percent sequence identity when compared with the corresponding alignment generated by needle from EMBOSS version 6.3.1.
  • BLAST nucleotide searches can be performed with the BLASTN program (nucleotide query searched against nucleotide sequences) to obtain nucleotide sequences homologous to nucleic acid molecules of the invention, or with the BLASTX program (translated nucleotide query searched against protein sequences) to obtain protein sequences homologous to nucleic acid molecules of the invention.
  • BLAST protein searches can be performed with the BLASTP program (protein query searched against protein sequences) to obtain amino acid sequences homologous to protein molecules of the invention, or with the TBLASTN program (protein query searched against translated nucleotide sequences) to obtain nucleotide sequences homologous to protein molecules of the invention.
  • Gapped BLAST (in BLAST 2.0) can be used to obtain gapped alignments for comparison purposes.
  • PSI-BLAST can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra.
  • The default parameters of the respective programs (e.g., BLASTX and BLASTN) can be used.
  • Alignment may also be performed manually by inspection.
  • Two sequences are "optimally aligned" when they are aligned for similarity scoring using a defined amino acid substitution matrix (e.g., BLOSUM62), a gap existence penalty, and a gap extension penalty so as to arrive at the highest score possible for that pair of sequences.
  • Amino acid substitution matrices and their use in quantifying the similarity between two sequences are well known in the art and are described, e.g., in Dayhoff et al. (1978) "A model of evolutionary change in proteins." In "Atlas of Protein Sequence and Structure," Vol. 5, Suppl. 3 (ed. M.O. Dayhoff), pp. 345-352. Natl. Biomed. Res. Found., Washington, D.C.
  • The BLOSUM62 matrix is often used as the default scoring substitution matrix in sequence alignment protocols.
  • The gap existence penalty is imposed for the introduction of a single amino acid gap in one of the aligned sequences, and the gap extension penalty is imposed for each additional empty amino acid position inserted into an already opened gap.
  • The alignment is defined by the amino acid positions of each sequence at which the alignment begins and ends, and optionally by the insertion of a gap or multiple gaps in one or both sequences, so as to arrive at the highest possible score.
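The scoring scheme described above can be illustrated with a small Gotoh-style global alignment in Python. That scheme consists of a substitution matrix, a gap existence penalty for opening a gap, a gap extension penalty for each additional gap position, and selection of the highest-scoring alignment.

This is a sketch under stated assumptions. It is not BLAST, Clustal, or EMBOSS needle, although a global alignment is closer in spirit to needle than to BLAST's local alignments. The assumptions are:

- The +2/-1 identity scores stand in for BLOSUM62.
- The penalties of -3 and -1 are arbitrary illustrative values.
- Gap costs follow the wording above, so the existence penalty covers the first empty position. To our understanding, NCBI BLAST describes its gap cost as existence plus extension times gap length, so parameter values are not directly interchangeable between the two conventions.
- Percent identity is computed as identical columns divided by alignment length including gaps. This is only one of several conventions in use.

```python
from typing import Callable, List, Tuple

NEG_INF = float("-inf")


def toy_score(x: str, y: str) -> float:
    """Illustrative substitution score (+2 identical, -1 otherwise).
    A protein comparison would normally look pairs up in BLOSUM62 instead."""
    return 2.0 if x == y else -1.0


def global_align_affine(a: str, b: str,
                        score: Callable[[str, str], float] = toy_score,
                        gap_open: float = -3.0,
                        gap_extend: float = -1.0) -> Tuple[float, str, str]:
    """Gotoh-style global alignment with affine gap scores.

    A gap of length k scores gap_open + (k - 1) * gap_extend: the existence
    penalty covers the first empty position, the extension penalty each
    additional one. Layer 0 ends in a residue pair, layer 1 in a[i] against
    a gap, layer 2 in a gap against b[j].
    """
    n, m = len(a), len(b)
    S = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(3)]
    P = [[[0] * (m + 1) for _ in range(n + 1)] for _ in range(3)]  # traceback layer
    S[0][0][0] = 0.0
    for i in range(1, n + 1):
        S[1][i][0] = gap_open + (i - 1) * gap_extend
        P[1][i][0] = 0 if i == 1 else 1
    for j in range(1, m + 1):
        S[2][0][j] = gap_open + (j - 1) * gap_extend
        P[2][0][j] = 0 if j == 1 else 2

    def argmax(vals: List[float]) -> int:
        return max(range(len(vals)), key=lambda k: vals[k])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = [S[k][i - 1][j - 1] for k in range(3)]
            k = argmax(c)
            S[0][i][j], P[0][i][j] = c[k] + score(a[i - 1], b[j - 1]), k
            c = [S[0][i - 1][j] + gap_open, S[1][i - 1][j] + gap_extend,
                 S[2][i - 1][j] + gap_open]
            k = argmax(c)
            S[1][i][j], P[1][i][j] = c[k], k
            c = [S[0][i][j - 1] + gap_open, S[1][i][j - 1] + gap_open,
                 S[2][i][j - 1] + gap_extend]
            k = argmax(c)
            S[2][i][j], P[2][i][j] = c[k], k

    # Trace back from the best-scoring layer at the bottom-right cell.
    layer = argmax([S[k][n][m] for k in range(3)])
    total = S[layer][n][m]
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        prev = P[layer][i][j]
        if layer == 0:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif layer == 1:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
        layer = prev
    return total, "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(x: str, y: str) -> float:
    """Identical non-gap columns divided by alignment length (gaps included)."""
    if len(x) != len(y) or not x:
        raise ValueError("expects two aligned, non-empty strings of equal length")
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


if __name__ == "__main__":
    total, top, bottom = global_align_affine("ACDEFG", "ACDFG")
    print(total, top, bottom)  # 7.0 ACDEFG ACD-FG
    print(round(percent_identity(top, bottom), 2))  # 83.33
```

For "ACDEFG" against "ACDFG", the single best alignment places one gap opposite E. That gives a score of 5 x 2 - 3 = 7, with 5 identical columns out of 6 (about 83.33% identity).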
  • Fragments and variants of the polypeptides disclosed herein each comprise one or more conserved domains of the canonical polypeptide.
  • The variant or fragment can comprise a polypeptide having at least 40%, at least 50%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains in the canonical polypeptide sequence.
  • A variant or fragment of Glyma.06G303700 may comprise one or more of the following conserved domains: the START_ArGLABRA2_like domain (aa 241-465 of SEQ ID NOs: 3 and 5); the START domain (aa 246-466 of SEQ ID NOs: 3 and 5); the homeobox domain (aa 57-110 of SEQ ID NOs: 3 and 5); the homeodomain (aa 55-113 of SEQ ID NOs: 3 and 5); the COG5576 superfamily domain (aa 13-129 of SEQ ID NOs: 3 and 5); and/or the MreC superfamily domain.
  • A variant or fragment of Glyma.06G303700 (SEQ ID NOs: 3 and 5) can retain activity as a transcription factor.
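The domain boundaries listed for Glyma.06G303700 can be read as 1-based, inclusive residue ranges. That reading is an assumption, but it is the usual convention for protein annotations. The Python sketch below converts the ranges to slices, computes domain lengths, and shows that several of the listed domains overlap. The 500-residue sequence is a hypothetical placeholder, not SEQ ID NO: 3 or 5. The MreC superfamily domain is omitted because no coordinates are given for it.

```python
from typing import Dict, Tuple

# Ranges copied from the text above (aa positions of SEQ ID NOs: 3 and 5),
# interpreted here as 1-based, inclusive coordinates (an assumption).
DOMAINS: Dict[str, Tuple[int, int]] = {
    "START_ArGLABRA2_like": (241, 465),
    "START": (246, 466),
    "homeobox": (57, 110),
    "homeodomain": (55, 113),
    "COG5576 superfamily": (13, 129),
}


def domain_length(start: int, end: int) -> int:
    """Residue count of an inclusive range."""
    return end - start + 1


def extract_domain(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) via a Python slice."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside a {len(seq)}-residue sequence")
    return seq[start - 1:end]


def overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(min(a[1], b[1]) - max(a[0], b[0]) + 1, 0)


if __name__ == "__main__":
    placeholder = "ACDEFGHIKLMNPQRSTVWY" * 25  # hypothetical 500-residue sequence
    for name, (s, e) in DOMAINS.items():
        print(f"{name}: {domain_length(s, e)} aa, slice [{s - 1}:{e}]")
    print(overlap(DOMAINS["START_ArGLABRA2_like"], DOMAINS["START"]))  # 220
    print(overlap(DOMAINS["homeobox"], DOMAINS["homeodomain"]))        # 54
    print(len(extract_domain(placeholder, *DOMAINS["START"])))         # 221
```

The 54-residue overlap equals the full homeobox range. In other words, the listed homeobox span lies entirely within the listed homeodomain span.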
  • A variant or fragment of Glyma.03G040200 can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.03G040200 (SEQ ID NO: 12).
  • A variant or fragment of Glyma.03G040200 (SEQ ID NO: 12) can retain activity in transmembrane transport.
  • A variant or fragment of Glyma.03G036300 can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.03G036300 (SEQ ID NO: 8).
  • A variant or fragment of Glyma.03G036300 (SEQ ID NO: 8) can retain activity as a PIF1 helicase.
  • A variant or fragment of Glyma.06G297500 can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.06G297500 (SEQ ID NO: 15).
  • Fragments and variants of the polypeptides disclosed herein will retain the activity of conferring increased protein content, increased oil content, and/or modified oil profile to a plant expressing the polypeptide.
  • An increase in protein content and/or oil content can comprise any statistically significant increase, including, for example, an increase of about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 85%, 90%, 95% or greater relative to a control. Methods of determining protein content or oil content are further described below.
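The text expresses increases "relative to a control." The sketch below assumes this means a relative change of group means. A difference in percentage points is a different quantity when content is itself reported as a percentage of seed weight. Under that assumption, the arithmetic can be sketched in Python as follows. The measurements are invented for illustration. Establishing statistical significance would need a separate test (e.g., a t-test), which is not shown here.

```python
from statistics import mean
from typing import Sequence


def relative_increase_pct(test: Sequence[float], control: Sequence[float]) -> float:
    """Percent change of the test-group mean relative to the control-group mean."""
    c = mean(control)
    if c == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * (mean(test) - c) / c


def point_difference(test: Sequence[float], control: Sequence[float]) -> float:
    """Absolute difference of means, in the same units as the measurements."""
    return mean(test) - mean(control)


if __name__ == "__main__":
    # Hypothetical seed protein content, % of dry weight (invented numbers).
    control = [39.5, 40.0, 40.5]
    test = [41.5, 42.0, 42.5]
    print(relative_increase_pct(test, control))  # 5.0 (% relative increase)
    print(point_difference(test, control))       # 2.0 (percentage points)
```

Under this reading, a line at 42.0% protein compared with a 40.0% control shows a 5% increase, not a 2% increase.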
  • The polypeptides disclosed herein may comprise a heterologous amino acid sequence attached thereto.
  • A polypeptide may have a polypeptide tag or additional protein domain attached thereto.
  • The heterologous amino acid sequence can be attached to the N terminus, the C terminus, or internally within the polypeptide.
  • The polypeptide may have one or more polypeptide tags and/or additional protein domains attached thereto at one or more positions of the polypeptide.
  • The nucleic acid sequence encoding the polypeptides disclosed herein may comprise a heterologous nucleic acid sequence attached thereto.
  • The heterologous nucleic acid sequence may encode a polypeptide tag or additional protein domain that will be attached to the encoded polypeptide.
  • The heterologous nucleic acid sequence may comprise a regulatory element such as an intron, an enhancer, a promoter, a terminator, etc.
  • The heterologous nucleic acid sequence can be positioned at the 5' end, the 3' end, or in-frame within the coding sequence of the polypeptide.
  • The nucleic acid sequence encoding the polypeptides disclosed herein may have one or more heterologous nucleic acid sequences attached thereto at one or more positions of the nucleic acid sequence.
  • "Heterologous," in reference to a polypeptide or polynucleotide sequence, refers to a sequence that originates, for example, from a cell or an organism with another genetic background of the same species or from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. As such, heterologous sequences are in a configuration not found in nature.
  • A "native" polynucleotide or polypeptide comprises a naturally occurring nucleotide sequence or amino acid sequence, respectively.
  • "Heterologous," when used in reference to a gene or nucleic acid, refers to a gene encoding a factor that is not in its natural environment (i.e., that has been altered by the hand of man).
  • A heterologous gene may include a gene from one species introduced into another species.
  • A heterologous gene may also include a gene native to an organism that has been altered in some way (e.g., mutated, added in multiple copies, linked to a non-native promoter or enhancer polynucleotide, etc.).
  • Heterologous genes may further comprise plant gene polynucleotides that comprise cDNA forms of a plant gene; the cDNAs may be expressed in either a sense orientation (to produce mRNA) or an antisense orientation (to produce an antisense RNA transcript that is complementary to the mRNA transcript).
  • Heterologous genes are distinguished from endogenous plant genes in that the heterologous gene polynucleotides are joined to polynucleotides comprising regulatory elements, such as promoters, that are not naturally associated with the gene for the protein encoded by the heterologous gene or with the plant gene polynucleotide in the chromosome, or are associated with portions of the chromosome not found in nature (e.g., genes expressed in loci where the gene is not normally expressed).
  • A "heterologous" polynucleotide is a polynucleotide not naturally associated with a host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring polynucleotide.
  • Polynucleotides encoding the polypeptides provided herein can be provided in expression cassettes for expression in an organism of interest.
  • The cassette will include 5' and 3' regulatory sequences, operably linked to a polynucleotide encoding a polypeptide provided herein, that allow for expression of the polynucleotide.
  • The cassette may additionally contain at least one additional gene or genetic element to be co-transformed into the organism. Where additional genes or elements are included, the components are operably linked. Alternatively, the additional gene(s) or element(s) can be provided on multiple expression cassettes.
  • Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory elements or regions.
  • The expression cassette may additionally contain a selectable marker gene.
  • The expression cassette will include, in the 5'-3' direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., a termination region) functional in the organism of interest, e.g., a plant or a bacterium.
  • The promoters of the invention are capable of directing or driving transcription and expression of a coding sequence in a host cell.
  • The regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions)
  • A chimeric gene or a chimeric nucleic acid molecule comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
  • Transcriptional terminators are available for use in expression cassettes. These are responsible for the termination of transcription beyond the transgene and for correct mRNA polyadenylation.
  • The termination region may be native with the transcriptional initiation region, may be native with the operably linked DNA sequence of interest, may be native with the plant host, or may be derived from another source (i.e., foreign or heterologous to the promoter, the DNA sequence of interest, the plant host, or any combination thereof).
  • Appropriate transcriptional terminators are those that are known to function in plants and include the CaMV 35S terminator, the tml terminator, the nopaline synthase terminator, and the pea rbcS E9 terminator.
  • Termination regions used in the expression cassettes can be obtained from, e.g., the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262: 141-144; Proudfoot (1991) Cell 64: 671-674; Sanfacon et al. (1991) Genes Dev. 5: 141-149; Mogen et al. (1990) Plant Cell 2: 1261-1272; Munroe et al.
  • Additional regulatory signals include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, termination signals, and the like. See, for example, U.S. Pat. Nos. 5,039,523 and 4,853,331; EPO 0480762A2; Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), hereinafter "Sambrook 11"; Davis et al., eds. (1980).
  • The various DNA fragments may be manipulated so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
  • Adapters or linkers may be employed to join the DNA fragments, or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like.
  • In vitro mutagenesis, primer repair, restriction, annealing, and resubstitutions (e.g., transitions and transversions) may be involved.
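The expression cassette places a promoter, a coding sequence, and a terminator in 5'-3' order, and coding sequences must stay in the proper reading frame. Both requirements can be expressed as simple checks. The Python below is an illustrative sketch only:

- Element names and sequences are hypothetical placeholders.
- It ignores optional components such as selectable marker genes, introns, and leader sequences.
- Passing these checks says nothing about whether a cassette is functional in a plant.

```python
from dataclasses import dataclass
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}
EXPECTED_ORDER = ["promoter", "cds", "terminator"]  # 5' -> 3'


@dataclass
class Element:
    kind: str  # "promoter", "cds", or "terminator" (labels chosen for this sketch)
    name: str
    seq: str


def check_cds(seq: str) -> List[str]:
    """Basic reading-frame checks on a coding sequence."""
    s = seq.upper()
    problems: List[str] = []
    if len(s) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not s.startswith("ATG"):
        problems.append("does not start with ATG")
    codons = [s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with an in-frame stop codon")
    if any(c in STOP_CODONS for c in codons[:-1]):
        problems.append("contains an internal in-frame stop codon")
    return problems


def check_cassette(elements: List[Element]) -> List[str]:
    """Check 5'->3' element order and the frame of each coding sequence."""
    problems: List[str] = []
    kinds = [e.kind for e in elements]
    if kinds != EXPECTED_ORDER:
        problems.append(f"expected 5'->3' order {EXPECTED_ORDER}, got {kinds}")
    for e in elements:
        if e.kind == "cds":
            problems.extend(f"{e.name}: {p}" for p in check_cds(e.seq))
    return problems


def assemble(elements: List[Element]) -> str:
    """Concatenate element sequences 5'->3' (no linkers or restriction sites)."""
    return "".join(e.seq.upper() for e in elements)


if __name__ == "__main__":
    # Hypothetical placeholder sequences, far shorter than real elements.
    cassette = [
        Element("promoter", "pExample", "GGCTATAAATACC"),
        Element("cds", "geneX", "ATGGCTAAAGGTTAA"),
        Element("terminator", "tExample", "AATAAAGC"),
    ]
    print(check_cassette(cassette))        # []
    print(assemble(cassette))
    print(check_cassette(cassette[::-1]))  # reports the reversed order
```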
  • A number of promoters can be used in the practice of the invention.
  • The promoters can be selected based on the desired outcome.
  • The nucleic acids can be combined with constitutive, inducible, tissue-preferred, or other promoters for expression in the organism of interest.
  • Constitutive promoters can also be used.
  • Constitutive promoters include the CaMV 35S promoter (Odell et al. (1985) Nature 313: 810-812); rice actin (McElroy et al. (1990) Plant Cell 2: 163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12: 619-632 and Christensen et al. (1992) Plant Mol. Biol. 18: 675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81: 581-588); MAS (Velten et al.
  • Inducible promoters include those that drive expression of pathogenesis-related proteins (PR proteins) , which are induced following infection by a pathogen.
  • Promoters that are expressed locally at or near the site of pathogen infection may also be used (Marineau et al. (1987) Plant Mol. Biol.
  • Wound-inducible promoters may be used in the constructions of the invention.
  • Such wound-inducible promoters include the pin II promoter (Ryan (1990) Ann. Rev. Phytopath. 28: 425-449; Duan et al. (1996) Nature Biotechnology 14: 494-498); wun1 and wun2 (U.S. Patent No. 5,428,148); win1 and win2 (Stanford et al. (1989) Mol. Gen. Genet. 215: 200-208); systemin (McGurl et al. (1992) Science 255: 1570-1573); WIP1 (Rohmeier et al. (1993) Plant Mol. Biol.
  • Tissue-preferred promoters for use in the invention include those set forth in Yamamoto et al. (1997) Plant J. 12(2): 255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7): 792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3): 337-343; Russell et al. (1997) Transgenic Res. 6(2): 157-168; Rinehart et al. (1996) Plant Physiol. 112(3): 1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2): 525-535; Canevascini et al. (1996) Plant Physiol.
  • Leaf-preferred promoters include those set forth in Yamamoto et al. (1997) Plant J. 12(2): 255-265; Kwon et al. (1994) Plant Physiol. 105: 357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35(5): 773-778; Gotor et al. (1993) Plant J. 3: 509-18; Orozco et al. (1993) Plant Mol. Biol. 23(6): 1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20): 9586-9590.
  • Root-preferred promoters are known and include those in Hire et al. (1992) Plant Mol. Biol. 20(2): 207-218 (soybean root-specific glutamine synthetase gene); Keller and Baumgartner (1991) Plant Cell 3(10): 1051-1061 (root-specific control element); Sanger et al. (1990) Plant Mol. Biol. 14(3): 433-443 (mannopine synthase (MAS) gene of Agrobacterium tumefaciens); and Miao et al. (1991) Plant Cell 3(1): 11-22 (cytosolic glutamine synthetase (GS)); Bogusz et al.
  • Seed-preferred promoters include both "seed-specific" promoters (those promoters active during seed development, such as promoters of seed storage proteins) and "seed-germinating" promoters (those promoters active during seed germination). See Thompson et al. (1989) BioEssays 10: 108. Seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message); cZ19B1 (maize 19 kDa zein); and milps (myo-inositol-1-phosphate synthase) (see WO 00/11177 and U.S. Patent No. 6,225,529).
  • Gamma-zein is an endosperm-specific promoter.
  • Globulin 1 (Glb-1) is a representative embryo-specific promoter.
  • Seed-specific promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like.
  • Seed-specific promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa zein, gamma-zein, waxy, shrunken 1, shrunken 2, Globulin 1, etc. See also WO 00/12733, where seed-preferred promoters from end1 and end2 genes are disclosed.
  • In some embodiments, the polynucleotides or variants thereof provided herein are not expressed using a root-specific promoter. In further embodiments, the polynucleotides or variants thereof provided herein are not expressed with the RCc3 root-specific promoter (see US20130139280).
  • Promoters that function in bacteria are well known in the art.
  • Such promoters include any of the known crystal protein gene promoters, including the promoters of any of the proteins of the invention, and promoters specific for B. thuringiensis sigma factors.
  • Mutagenized or recombinant crystal protein-encoding gene promoters may be recombinantly engineered and used to promote expression of the novel gene segments disclosed herein.
  • Leader sequences derived from viruses are also known to enhance expression, and these are particularly effective in dicotyledonous cells.
  • The expression cassette may comprise one or more of such leader sequences.
  • Leader sequences from tobacco mosaic virus (TMV, the "Ω-sequence") have been shown to be effective in enhancing expression (e.g., Gallie et al. Nucl. Acids Res. 15: 8693-8711 (1987); Skuzeski et al. Plant Molec. Biol. 15: 65-79 (1990)).
  • Leader sequences known in the art include, but are not limited to, picornavirus leaders, for example, the EMCV leader (encephalomyocarditis 5' noncoding region) (Elroy-Stein, O., Fuerst, T.R., and Moss, B.
  • Potyvirus leaders, for example, the tobacco etch virus (TEV) leader (Allison et al. (1986) Virology 154: 9-20); the maize dwarf mosaic virus (MDMV) leader; the human immunoglobulin heavy-chain binding protein (BiP) leader (Macejak, D.G., and Sarnow, P., Nature 353: 90-94 (1991)); the untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling, S.A., and Gehrke, L., Nature 325: 622-625 (1987)); and the tobacco mosaic virus (TMV) leader (Gallie, D.
  • The expression cassette can also comprise a selectable marker gene for the selection of transformed cells.
  • Selectable marker genes are utilized for the selection of transformed cells or tissues.
  • Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin, 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), spectinomycin, or acetolactate synthase (ALS).
  • Selection markers used routinely in transformation include the nptII gene, which confers resistance to kanamycin and related antibiotics (Messing & Vierra Gene 19: 259-268 (1982); Bevan et al., Nature 304: 184-187 (1983)), the pat and bar genes, which confer resistance to the herbicide glufosinate (also called phosphinothricin; see White et al., Nucl. Acids Res. 18: 1062 (1990), Spencer et al. Theor. Appl. Genet. 79: 625-631 (1990) and U.S. Patent Nos.
  • The promoter used herein to drive the expression of the above-referenced polynucleotide comprises SEQ ID NO: 23 (FIG. 17).
  • The promoter used herein to drive the expression of the polynucleotides provided herein comprises a native promoter or an active variant or fragment thereof.
  • The term "native promoter," used interchangeably with the term "endogenous promoter," refers to a promoter that is found in plants in nature.
  • An active variant or fragment of a native promoter refers to a promoter sequence that has one or more nucleotide substitutions, deletions, or insertions and that can drive expression of an operably-linked polynucleotide sequence under conditions similar to those under which the native promoter is active.
  • Such active variants or fragments may be created by site-directed mutagenesis, induced mutation, or may occur as allelic variants (polymorphisms) .
  • A construct can comprise a native promoter or an active variant or fragment thereof operably linked to a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant (e.g., having at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity) of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59; when introduced into a plant, the construct confers increased protein content, increased oil content, and/or modified oil profile.
  • The native promoter can be heterologous to the polynucleotide.
  • The polynucleotide encodes a polypeptide having an amino acid sequence with at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • The polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to a polynucleotide encoding any one of SEQ ID NOs: 22 or 24-59.
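Counting mutations "as compared to" a reference sequence is straightforward when the two sequences are the same length and no insertions or deletions are involved. Indels require an alignment first. The Python sketch below assumes ungapped, equal-length DNA and uses short hypothetical sequences rather than the listed SEQ ID NOs. It counts substitutions and labels each one as either a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. These are the two substitution classes mentioned above in connection with in vitro mutagenesis.

```python
from typing import List, Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitutions(reference: str, variant: str) -> List[Tuple[int, str, str, str]]:
    """List (1-based position, reference base, variant base, kind) for each
    mismatch between two equal-length, ungapped DNA sequences."""
    if len(reference) != len(variant):
        raise ValueError("sequences differ in length; align them first to handle indels")
    out = []
    for pos, (r, v) in enumerate(zip(reference.upper(), variant.upper()), start=1):
        if r == v:
            continue
        if r not in "ACGT" or v not in "ACGT":
            kind = "ambiguous"  # e.g. N or other IUPAC codes
        elif {r, v} <= PURINES or {r, v} <= PYRIMIDINES:
            kind = "transition"
        else:
            kind = "transversion"
        out.append((pos, r, v, kind))
    return out


if __name__ == "__main__":
    # Hypothetical 12-nt sequences; not taken from the sequence listing.
    for row in classify_substitutions("ATGGCTAAAGGT", "ATGACTAATGGC"):
        print(row)
    # (4, 'G', 'A', 'transition')
    # (9, 'A', 'T', 'transversion')
    # (12, 'T', 'C', 'transition')
```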
  • The plant is a dicot plant. In some embodiments, the plant is a monocot plant. In some embodiments, the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane. In some embodiments, the plant is a soybean plant. In some embodiments, the plant is an elite soybean plant.
  • A nucleic acid sequence can be operably linked to a native promoter or an active variant or fragment thereof, where the nucleic acid sequence encodes a polypeptide having an amino acid sequence with at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • The nucleic acid sequence encodes a polypeptide having an amino acid sequence set forth in SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • The polynucleotide as described in Section I of this disclosure is a heterologous nucleic acid sequence in the genome of the plant.
  • "Heterologous," in the context of a chromosomal segment, refers to one or more DNA sequences (e.g., genetic loci) in a configuration in which they are not found in nature, for example as a result of a recombination event between homologous chromosomes during meiosis, as a result of introduction of a transgenic sequence, or as a result of modification through gene editing.
  • Although soybean plants are used to exemplify the compositions and methods throughout the application, a polynucleotide as provided herein may be introduced into any plant species, including, but not limited to, monocots and dicots.
  • Plants of interest include, but are not limited to, corn (maize), sorghum, wheat, sunflower, tomato, crucifers, peppers, potato, cotton, rice, soybean, sugarbeet, sugarcane, tobacco, barley, oilseed rape, Brassica sp., alfalfa, rye, millet, safflower, peanuts, sweet potato, cassava, coffee, coconut, pineapple, citrus trees, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oats, vegetables, ornamentals, and conifers.
  • Glycine, the genus that includes soybean (soya bean), belongs to the bean family Fabaceae.
  • The Glycine plants can be Glycine arenaria, Glycine argyrea, Glycine cyrtoloba, Glycine canescens, Glycine clandestina, Glycine curvata, Glycine falcata, Glycine latifolia, Glycine microphylla, Glycine pescadrensis, Glycine stenophita, Glycine syndetica, Glycine soja Sieb. et Zucc., Glycine max (L.) Merrill, Glycine tabacina, or Glycine tomentella.
  • The plants provided herein are elite plants or derived from an elite line.
  • An "elite line" is an agronomically superior line that has resulted from many cycles of breeding and selection for superior agronomic performance. Numerous elite lines are available and known to those of skill in the art of soybean breeding. An "elite population" is an assortment of elite individuals or lines that can be used to represent the state of the art in terms of agronomically superior genotypes of a given crop species, such as soybean. Similarly, an "elite germplasm" or elite strain of germplasm is an agronomically superior germplasm, typically derived from, and/or capable of giving rise to, a plant with superior agronomic performance, such as an existing or newly developed elite line of soybean.
  • An "elite" plant is any plant from an elite line, such that an elite plant is a representative plant from an elite variety.
  • The soybean plant comprising a polynucleotide encoding any one of the polypeptides disclosed herein is an elite soybean plant.
  • Non-limiting examples of elite soybean varieties that are commercially available to farmers or soybean breeders include: AG00802, A0868, AG0902, A1923, AG2403, A2824, A3704, A4324, A5404, AG5903, AG6202, AG0934, AG1435, AG2031, AG2035, AG2433, AG2733, AG2933, AG3334, AG3832, AG4135, AG4632, AG4934, AG5831, AG6534, and AG7231 (Asgrow Seeds, Des Moines, Iowa, USA); BPR0144RR, BPR 4077NRR, and BPR 4390NRR (Bio Plant Research, Camp Point, Ill., USA); DKB 17-51 and DKB37-51 (DeKalb Genetics, DeKalb, Ill., USA); DP 4546 RR and DP 7870 RR (Delta & Pine Land Company, Lubbock, Tex., USA); JG 03R501, JG 32R
  • The plants provided herein can comprise one or more additional polynucleotides that encode an additional polypeptide that can confer a phenotype of increased protein content, increased oil content, and/or modified oil profile on a plant.
  • The additional polynucleotide encodes a polypeptide having the sequence of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • The additional polynucleotide can be introduced using approaches similar to those disclosed above, e.g., by transgenic means, by breeding, or by genome editing.
  • The plants, plant parts, or seeds having the heterologous polynucleotide or polypeptide disclosed herein, or active variants and fragments thereof, can have a modified level of expression of the polynucleotide or polypeptide (i.e., an increase or a decrease in expression level).
  • The plants, plant parts, or seeds having the heterologous polynucleotide or polypeptide disclosed herein, or active variants and fragments thereof, can have a modified level of activity of the polypeptide (i.e., an increase or a decrease in activity level).
  • Methods to generate such modified levels of expression or activity are disclosed elsewhere herein and include, but are not limited to, breeding, gene editing, and transgenic techniques.
  • Plants produced as described above can be propagated to produce progeny plants, and the progeny plants that have stably incorporated into their genomes a polynucleotide conferring increased protein content, increased oil content, and/or modified oil profile can be selected and further propagated if desired.
  • "Progeny" refers to the descendant(s) of a particular cross. Typically, progeny result from breeding of two individuals, although some species (particularly some plants and hermaphroditic animals) can be selfed (i.e., the same plant acts as the donor of both male and female gametes).
  • The descendant(s) can be, for example, of the F1, the F2, or any subsequent generation.
  • A plant cell, seed, plant part, or harvest product can be obtained from the plant produced as above, and the plant cell, seed, or plant part can be screened using methods disclosed above for evidence of stable incorporation of the polynucleotide.
  • "Stable incorporation" refers to the integration of a nucleic acid sequence into the genome of a plant such that the nucleic acid sequence is capable of being inherited by the progeny thereof.
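The Python sketch below gives a rough illustration of why progeny are screened for stable incorporation. It computes expected genotype frequencies when a plant carrying one copy of an insert at a single locus (hemizygous) is self-pollinated, then selfed again without selection. It assumes a single, stably integrated insertion that segregates as one Mendelian locus. Real transformants can carry multiple or unlinked insertions, so observed ratios may differ. The function names are illustrative only.

```python
from fractions import Fraction
from typing import Dict

Dist = Dict[int, Fraction]  # transgene copies at one locus (0, 1, 2) -> frequency


def gametes(copies: int) -> Dist:
    """Gamete distribution: a plant with `copies` of its 2 alleles carrying the
    insert transmits it with probability copies/2 (Mendelian segregation assumed)."""
    p = Fraction(copies, 2)
    return {1: p, 0: 1 - p}


def self_pollinate(copies: int) -> Dist:
    """Genotype distribution among selfed progeny of one plant."""
    g = gametes(copies)
    out: Dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    for a, pa in g.items():
        for b, pb in g.items():
            out[a + b] += pa * pb
    return out


def next_selfed_generation(dist: Dist) -> Dist:
    """Self every plant in a population (no selection) and pool the progeny."""
    out: Dist = {0: Fraction(0), 1: Fraction(0), 2: Fraction(0)}
    for copies, freq in dist.items():
        for k, q in self_pollinate(copies).items():
            out[k] += freq * q
    return out


if __name__ == "__main__":
    gen1 = self_pollinate(1)  # selfed progeny of a hemizygous plant
    gen2 = next_selfed_generation(gen1)
    for label, d in (("gen1", gen1), ("gen2", gen2)):
        carriers = d[1] + d[2]
        print(label, {k: str(v) for k, v in d.items()}, "carriers:", carriers)
```

Under these assumptions, one quarter of the first selfed generation lacks the insert entirely. Selfing without selection then halves the hemizygous fraction each generation, from 1/2 to 1/4.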
  • "Plant part" indicates a part of a plant, including single cells and cell tissues such as plant cells that are intact in plants, cell clumps, and tissue cultures from which plants can be regenerated.
  • Plant parts include, but are not limited to, single cells and tissues from pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, and seeds; as well as pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, calli, and the like.
  • Plant products can be harvested from the plant disclosed above and processed to produce processed products, such as flour, soy meal, oil, starch, and the like. These processed products are also within the scope of this invention provided that they comprise a polynucleotide or polypeptide or variant thereof disclosed herein.
  • Processed products include, but are not limited to, protein concentrate, protein isolate, soybean hulls, meal, flour, oil, and the whole soybean itself.
  • A nucleic acid sequence may be introduced into a plant cell in various ways, for example, by transformation, by genome modification techniques (such as genome editing), or by breeding.
  • The plant can be produced by transforming the nucleic acid sequence encoding a polypeptide disclosed above into a recipient plant.
  • The method can comprise editing the genome of the recipient plant so that the resulting plant comprises a polynucleotide encoding a polypeptide disclosed above.
  • The method can comprise increasing the expression level and/or activity of the above-mentioned proteins in a recipient plant, for example, by enhancing promoter activity or by replacing the endogenous promoter with a stronger promoter.
  • The method can comprise crossing a donor plant comprising a polynucleotide as described above with a recipient plant and selecting for incorporation of the polynucleotide into the recipient plant genome.
  • The method can comprise transforming a polynucleotide disclosed herein, or an active variant or fragment thereof, into a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content, increased oil content, and/or modified oil profile.
  • Expression cassettes comprising polynucleotides encoding the polypeptides as described above can be used to transform plants of interest.
  • The term "transgenic" and grammatical variations thereof refer to a plant, including any part derived from the plant, such as a cell, tissue, or organ, in which a heterologous nucleic acid is integrated into the genome.
  • The heterologous nucleic acid is a recombinant construct, vector, or expression cassette comprising one or more nucleic acids.
  • A transgenic plant is produced by a genetic engineering method, such as Agrobacterium-mediated transformation. Through gene technology, the heterologous nucleic acid is stably integrated into the chromosomes, so that the next generation can also be transgenic.
  • “transgenic” and grammatical variations thereof also encompass biological treatments, which include plant hybridization and/or natural recombination.
  • Transformation results in a transformed plant, including whole plants, as well as plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, propagules, embryos, and progeny of the same.
  • Plant cells can be differentiated or undifferentiated (e.g., callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells, pollen) . Transformation may result in stable or transient incorporation of the nucleic acid into the cell.
  • Stable transformation is intended to mean that the nucleotide construct introduced into a host cell integrates into the genome of the host cell and is capable of being inherited by the progeny thereof.
  • Transient transformation is intended to mean that a polynucleotide is introduced into the host cell and does not integrate into the genome of the host cell.
  • Methods for transformation typically involve introducing a nucleotide construct into a plant.
  • the transformation method is an Agrobacterium-mediated transformation.
  • the transformation method is a biolistic-mediated transformation. Transformation may also be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound-mediated, PEG-mediated, calcium phosphate co-precipitation, polycation DMSO technique, DEAE-dextran procedure, Agrobacterium- and virus-mediated (e.g., Caulimoviruses, Geminiviruses, RNA plant viruses), liposome-mediated methods and the like.
  • Transformation protocols as well as protocols for introducing polypeptides or polynucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation.
  • Methods for transformation are known in the art and include those set forth in US Patent Nos: 8,575,425; 7,692,068; 8,802,934; and 7,541,517; each of which is herein incorporated by reference. See, also, Rakoczy-Trojanowska, M. (2002) Cell Mol Biol Lett. 7: 849-858; Jones et al. (2005) Plant Methods, Vol. 1, Article 5; Rivera et al. (2012) Physics of Life Reviews 9: 308-345; Bartlett et al.
  • plastid transformation can be accomplished by transactivation of a silent plastid-borne transgene by tissue-preferred expression of a nuclear-encoded and plastid-directed RNA polymerase.
  • Such a system has been reported in McBride et al. (1994) Proc. Natl. Acad. Sci. USA 91(15): 7301-7305.
  • the cells that have been transformed may be grown into plants in accordance with conventional methods. See, for example, McCormick et al. (1986) Plant Cell Reports 5: 81-84. These plants may then be grown and pollinated with either the same transformed strain or different strains, and the resulting hybrid having constitutive expression of the desired phenotypic characteristic can be identified. Two or more generations may be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited, and then seeds are harvested to ensure that expression of the desired phenotypic characteristic has been achieved. In this manner, the present invention provides transformed seed (also referred to as “transgenic seed”) having a nucleotide construct of the invention, for example, an expression cassette of the invention, stably incorporated into their genome.
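Whether a construct is stably integrated and inherited is often checked by scoring how it segregates in progeny. The sketch below assumes (this is not stated in the text) a single-copy insertion at one locus in a hemizygous primary transformant that is self-pollinated, so progeny are expected to segregate roughly 3 carriers : 1 non-carrier. The cut-off 3.841 is the standard chi-square critical value for 1 degree of freedom at α = 0.05; the function names are hypothetical.

```python
# Hypothetical helpers: chi-square test of progeny segregation against 3:1,
# assuming one hemizygous transgene locus in a self-pollinated primary transformant.

CHI2_CRITICAL_1DF_ALPHA_05 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_3_to_1(carriers: int, non_carriers: int) -> float:
    """Chi-square statistic comparing observed counts with a 3:1 expectation."""
    total = carriers + non_carriers
    if total == 0:
        raise ValueError("no progeny scored")
    expected = (0.75 * total, 0.25 * total)
    observed = (carriers, non_carriers)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def consistent_with_single_locus(carriers: int, non_carriers: int) -> bool:
    """True if the counts do not deviate significantly from 3:1 at alpha = 0.05."""
    return chi_square_3_to_1(carriers, non_carriers) < CHI2_CRITICAL_1DF_ALPHA_05


if __name__ == "__main__":
    print(chi_square_3_to_1(72, 28))             # about 0.48
    print(consistent_with_single_locus(72, 28))  # True
    print(consistent_with_single_locus(50, 50))  # False
```

A non-significant result is consistent with, but does not prove, a single insertion locus; copy number is normally confirmed by molecular methods.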
  • the method comprises crossing a donor plant comprising a polynucleotide encoding a polypeptide disclosed herein with a recipient plant, and the polypeptide is able to confer increased protein content, increased oil content, and/or modified oil profile in the recipient plant.
  • “crossing” and “breeding” refer to the fusion of gametes to produce progeny (e.g., by fertilization, such as to produce seed by pollination in plants).
  • a “cross,” “breeding,” or “cross-fertilization” is fertilization of one individual by another (e.g., cross-pollination in plants).
  • the plant disclosed herein may be a whole plant, or may be a plant cell, seed, or tissue, or a plant part such as leaf, stem, pollen, or cell that can be cultivated into a whole plant.
  • a progeny plant created by the crossing or breeding process is repeatedly crossed back to one of its parents through a process referred to herein as “backcrossing.”
  • the “donor” parent refers to the parental plant with the desired gene or locus to be introgressed.
  • the “recipient” parent (used one or more times) or “recurrent” parent (used two or more times) refers to the parental plant into which the gene or locus is being introgressed. For example, see Ragot, M. et al. Marker-assisted Backcrossing: A Practical Example, in Techniques et Utilisations des Marqueurs Moléculaires Les Colloques, Vol. 72, pp.
  • BC1 refers to the second use of the recurrent parent
  • BC2 refers to the third use of the recurrent parent
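Under the counting convention above (the initial donor × recipient cross is the first use of the recurrent parent, BC1 the second, BC2 the third), each use of the recurrent parent halves the donor contribution that remains, so the expected recurrent-parent genome fraction after BCn is 1 − (1/2)^(n+1). This is a minimal sketch assuming no selection and ignoring linkage drag around the introgressed locus; the function name is hypothetical.

```python
def expected_recurrent_parent_fraction(bc_generation: int) -> float:
    """Expected recurrent-parent genome fraction after BC<bc_generation>.

    bc_generation = 0 denotes the F1 of the initial donor x recurrent cross
    (first use of the recurrent parent); BC1 is the second use, BC2 the third.
    Assumes no selection and ignores linkage drag around the introgressed locus.
    """
    if bc_generation < 0:
        raise ValueError("bc_generation must be non-negative")
    uses_of_recurrent_parent = bc_generation + 1
    # Each use of the recurrent parent halves the remaining donor contribution.
    return 1 - 0.5 ** uses_of_recurrent_parent


for n in range(4):
    print(f"BC{n}: {expected_recurrent_parent_fraction(n):.4f}")
# prints 0.5000, 0.7500, 0.8750 and 0.9375 for BC0 (the F1) through BC3
```

These are population averages; marker-assisted backcrossing is commonly used to pick individuals that exceed them.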
  • the donor soybean plant is a Glycine max plant. In some embodiments, the donor soybean plant is a Glycine soja plant. In some embodiments, the recipient soybean plant is an elite Glycine max plant or an elite Glycine soja plant. In some embodiments, the donor plant is from soy variety Suinong 14 (SN14). In some embodiments, the donor plant is soy variety Glycine soja ZYD0006.
  • the polynucleotide sequences provided herein can be targeted to specific sites within the genome of a recipient plant cell.
  • Such methods include, but are not limited to, meganucleases designed against the plant genomic sequence of interest; CRISPR-Cas9, TALENs, and other technologies for precise editing of genomes (Feng, et al. Cell Research 23: 1229-1232, 2013; WO 2013/026740); Cre-lox site-specific recombination; FLP-FRT recombination (Li et al. (2009) Plant Physiol 151: 1087-1095); Bxb1-mediated integration (Yau et al.
  • gene editing is used to mutagenize the genome of a plant to produce plants having one or more of the polypeptides that are able to confer increased protein content, increased oil content, and/or modified oil profile.
  • plants transformed with and expressing gene-editing machinery as described above which, when crossed with a target plant, result in gene editing in the target plant.
  • gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems.
  • Gene editing may involve genomic integration or episomal presence of the gene editing components or systems.
  • Gene editing generally refers to the use of a site-directed nuclease (including but not limited to CRISPR/Cas, zinc fingers, meganucleases, and the like) to cut a nucleotide sequence at a desired location. This may be done to cause an insertion/deletion (“indel”) mutation (i.e., “SDN1”), a base edit (i.e., “SDN2”), or an allele insertion or replacement (i.e., “SDN3”).
  • SDN2 or SDN3 gene editing may comprise the provision of one or more recombination templates (e.g., in a vector) comprising a gene sequence of interest that can be used for homology directed repair (HDR) within the plant (i.e., to be introduced into the plant genome) .
  • the gene or allele of interest is one that is able to confer to the plant an improved trait, e.g., increased protein content, increased oil content, and/or modified oil profile.
  • the recombination template can be introduced into the plant to be edited either through transformation or through breeding with a donor plant comprising the recombination template. Breaks in the plant genome may be introduced within, upstream, and/or downstream of a target sequence.
  • a double strand DNA break is made within or near the target sequence locus.
  • breaks are made upstream and downstream of the target sequence locus, which may lead to its excision from the genome.
  • one or more single-strand DNA breaks are made within, upstream, and/or downstream of the target sequence (e.g., using a nickase Cas9 variant). Any of these DNA breaks, as well as those introduced via other methods known to one of skill in the art, may induce HDR.
  • the target sequence is replaced by the sequence of the provided recombination template comprising a polynucleotide of interest; e.g., any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, or a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 22 or 24-59, may be provided on or as a template.
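For SDN3-type edits, a recombination template typically carries the replacement sequence flanked by homology arms that match the genomic sequence on either side of the target. The sketch below assembles such a template in silico. The function name, the 0-based end-exclusive coordinates, and the default 500-bp arm length are illustrative assumptions, not parameters from this disclosure.

```python
def build_hdr_template(genome: str, target_start: int, target_end: int,
                       replacement: str, arm_length: int = 500) -> str:
    """Return left homology arm + replacement + right homology arm.

    genome[target_start:target_end] is the target sequence to be replaced
    (0-based, end-exclusive coordinates). The arms are copied from the
    sequence immediately flanking the target.
    """
    if not 0 <= target_start <= target_end <= len(genome):
        raise ValueError("target coordinates out of range")
    if target_start < arm_length or len(genome) - target_end < arm_length:
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[target_start - arm_length:target_start]
    right_arm = genome[target_end:target_end + arm_length]
    return left_arm + replacement + right_arm


# Toy example: replace "CCCC" with "ATG" using 4-nt arms.
print(build_hdr_template("AAAACCCCGGGGTTTT", 4, 8, "ATG", arm_length=4))
# AAAAATGGGGG
```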
  • the polynucleotide of interest is operably linked to a promoter, and expression of the polynucleotide of interest under the control of the promoter confers increased protein content, increased oil content, and/or modified oil profile on the plant.
  • the promoter is a native promoter or an active variant or fragment thereof as described above.
  • mutations in the genes of interest described herein may be generated without the use of a recombination template via targeted introduction of DNA double strand breaks. Such breaks may be repaired through the process of non-homologous end joining (NHEJ), which can result in the generation of small insertions or deletions (indels) at the repair site. Such indels may lead to frameshift mutations causing premature stop codons or other types of loss-of-function mutations in the targeted genes.
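The link between an NHEJ-induced indel and a premature stop codon can be made concrete with a toy coding sequence. This sketch assumes the standard genetic code's stop codons and uses an invented 18-nt sequence, not any soybean gene from this disclosure; the function names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def causes_frameshift(indel_length: int) -> bool:
    """An indel whose length is not a multiple of 3 shifts the downstream reading frame."""
    return indel_length % 3 != 0


def apply_deletion(cds: str, start: int, length: int) -> str:
    """Delete `length` bases starting at 0-based position `start`."""
    return cds[:start] + cds[start + length:]


def first_stop_codon(cds: str) -> Optional[int]:
    """0-based codon index of the first in-frame stop codon, or None if absent."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


wild_type = "ATGGCATTAAGCTGGTAA"          # toy coding sequence, stop at codon 5
mutant = apply_deletion(wild_type, 3, 1)  # 1-bp deletion shifts the frame
print(causes_frameshift(1), first_stop_codon(wild_type), first_stop_codon(mutant))
# True 5 2
```

In the mutant, the shifted frame reads ATG CAT TAA, so translation would stop at codon 2 instead of codon 5.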
  • gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems in the target plant.
  • Gene editing may also involve genomic integration or episomal presence of the gene editing components or systems in the target plant.
  • the nucleic acid modification or mutation is effected by a (modified) zinc-finger nuclease (ZFN) system.
  • the ZFN system uses artificial restriction enzymes generated by fusing a zinc finger DNA-binding domain to a DNA-cleavage domain that can be engineered to target desired DNA sequences. Exemplary methods of genome editing using ZFNs can be found for example in U.S. Patent Nos. 6,534,261; 6,607,882; 6,746,838; 6,794,136; 6,824,978; 6,866,997; 6,933,113; and 6,979,539.
  • the nucleic acid modification is effected by a (modified) meganuclease, i.e., an endodeoxyribonuclease characterized by a large recognition site (a double-stranded DNA sequence of 12 to 40 base pairs).
  • Exemplary methods for using meganucleases can be found in US Patent Nos: 8,163,514; 8,133,697; 8,021,867; 8,119,361; 8,119,381; 8,124,369; and 8,129,134, which are specifically incorporated by reference.
  • the nucleic acid modification is effected by a (modified) CRISPR/Cas complex or system.
  • the CRISPR/Cas system or complex is a class 2 CRISPR/Cas system.
  • said CRISPR/Cas system or complex is a type II, type V, or type VI CRISPR/Cas system or complex.
  • the CRISPR/Cas system does not require the generation of customized proteins to target specific sequences; rather, a single Cas protein can be programmed by a guide RNA (gRNA) to recognize a specific nucleic acid target. In other words, the Cas enzyme can be recruited to a specific nucleic acid target locus of interest (which may comprise or consist of RNA and/or DNA) using said short guide RNA.
  • “CRISPR/Cas” or “CRISPR system,” as used herein, refers collectively to transcripts and other elements involved in the expression of, or directing the activity of, CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene and one or more of a tracr (trans-activating CRISPR) sequence (e.g.
  • “RNA(s)” as that term is herein used (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and, where applicable, transactivating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)) or other sequences and transcripts from a CRISPR locus.
  • a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system).
  • “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex.
  • a target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
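How a target sequence is chosen depends on the Cas protein. As an illustration only, the sketch below assumes SpCas9 (one of the Cas9 proteins named in this disclosure), whose guide is commonly designed against a 20-nt protospacer lying immediately 5' of an NGG protospacer-adjacent motif (PAM). The PAM requirement is not described in this text, and the helper names are hypothetical. The code scans both strands of a DNA sequence and lists candidate protospacers.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_protospacers(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """Return (strand, start, protospacer, PAM) for each NGG PAM with room for a spacer.

    `start` is a 0-based coordinate on the strand being scanned ("+" = input,
    "-" = its reverse complement).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Zero-width lookahead so overlapping PAMs (e.g. in "AGGG") are all found.
        for m in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = m.start()
            if pam_start >= spacer_len:
                protospacer = s[pam_start - spacer_len:pam_start]
                hits.append((strand, pam_start - spacer_len, protospacer, m.group(1)))
    return hits


print(find_spcas9_protospacers("ACGT" * 5 + "AGG"))
# [('+', 0, 'ACGTACGTACGTACGTACGT', 'AGG')]
```

A practical design workflow would also screen candidates for off-target matches elsewhere in the genome; that step is omitted here.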
  • the gRNA is a chimeric guide RNA or single guide RNA (sgRNA).
  • the gRNA comprises a guide sequence and a tracr mate sequence (or direct repeat).
  • the gRNA comprises a guide sequence, a tracr mate sequence (or direct repeat), and a tracr sequence.
  • the CRISPR/Cas system or complex as described herein does not comprise and/or does not rely on the presence of a tracr sequence (e.g., if the Cas protein is Cas12a).
  • the Cas protein as referred to herein, such as but not limited to Cas9, Cas12a (formerly referred to as Cpf1), Cas12b (formerly referred to as C2c1), Cas13a (formerly referred to as C2c2), C2c3, or Cas13b protein, may originate from any suitable source, and hence may include different orthologues originating from a variety of (prokaryotic) organisms, as is well documented in the art.
  • the Cas protein is (modified) Cas9, preferably (modified) Staphylococcus aureus Cas9 (SaCas9) or (modified) Streptococcus pyogenes Cas9 (SpCas9) .
  • the Cas protein is Cas12a, optionally from Acidaminococcus sp., such as Acidaminococcus sp. BV3L6 Cpf1 (AsCas12a), or from Lachnospiraceae bacterium, such as Lachnospiraceae bacterium MA2020 or Lachnospiraceae bacterium ND2006 (LbCas12a). See U.S. Pat. No. 10,669,540, incorporated herein by reference in its entirety.
  • the Cas12a protein may be from Moraxella bovoculi AAX08_00205 [Mb2Cas12a] or Moraxella bovoculi AAX11_00205 [Mb3Cas12a] . See WO 2017/189308, incorporated herein by reference in its entirety.
  • the Cas protein is (modified) C2c2, preferably Leptotrichia wadei C2c2 (LwC2c2) or Listeria newyorkensis FSL M6-0635 C2c2 (LbFSLC2c2).
  • the (modified) Cas protein is C2c1.
  • the (modified) Cas protein is C2c3.
  • the (modified) Cas protein is Cas13b.
  • Other Cas enzymes are available to a person skilled in the art.
  • the gene-editing machinery (e.g., the DNA modifying enzyme) introduced into the plants can be controlled by any promoter that can drive recombinant gene expression in plants.
  • the promoter is a constitutive promoter.
  • the promoter is a tissue-specific promoter, e.g., a pollen-specific promoter, a sperm cell-specific promoter, a zygote-specific promoter, or a promoter that is highly expressed in sperm, eggs and zygotes (e.g., prOsActin1).
  • Suitable promoters are disclosed in U.S. Pat. No. 10,519,456, the entire content of which is herein incorporated by reference.
  • a method of editing plant genomic DNA comprises using a first soybean plant expressing a DNA modification enzyme and at least one optional guide nucleic acid as described above to pollinate a target plant comprising genomic DNA to be edited.
  • the various polynucleotides and variants thereof provided herein can be stacked with one or more polynucleotides encoding a desirable trait such as a polynucleotide that confers, for example, insect, disease or herbicide resistance or other desirable agronomic traits of interest including, but not limited to, traits associated with high oil content; increased digestibility; balanced amino acid content; and high energy content.
  • Such traits may refer to properties of both seed and non-seed plant tissues, or to food or feed prepared from plants or seeds having such traits.
  • gene or trait “stacking” is combining desired genes or traits into one transgenic plant line.
  • plant breeders stack transgenic traits by making crosses between parents that each have a desired trait and then identifying offspring that have both of these desired traits (so-called “breeding stacks”).
  • Another way to stack genes is by transferring two or more genes into the cell nucleus of a plant at the same time during transformation.
  • Another way to stack genes is by re-transforming a transgenic plant with another gene of interest.
  • gene stacking can be used to combine two different insect resistance traits, an insect resistance trait and a disease resistance trait, or a herbicide resistance trait (such as, for example, Bt11).
  • the use of a selectable marker in addition to a gene of interest would also be considered gene stacking.
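When a breeding stack is fixed by selfing, the expected fraction of progeny carrying both traits can be worked out by enumerating gametes. This sketch assumes each original parent is homozygous for a single, unlinked transgene locus, so the F1 carries one copy of each; linked loci would give different frequencies. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def f2_stack_frequencies():
    """Selfing an F1 hemizygous at two unlinked loci (A and B).

    Returns (fraction carrying both transgenes, fraction homozygous for both).
    Alleles: "T" = transgene present, "-" = null.
    """
    # Four gamete types, equally frequent under independent assortment.
    gametes = list(product("T-", repeat=2))  # (locus A allele, locus B allele)
    both = homozygous_both = total = 0
    for egg, pollen in product(gametes, repeat=2):
        total += 1
        locus_a = (egg[0], pollen[0])
        locus_b = (egg[1], pollen[1])
        if "T" in locus_a and "T" in locus_b:
            both += 1
        if locus_a == ("T", "T") and locus_b == ("T", "T"):
            homozygous_both += 1
    return Fraction(both, total), Fraction(homozygous_both, total)


carrying, fixed = f2_stack_frequencies()
print(carrying, fixed)  # 9/16 1/16
```

Under these assumptions only about 1 in 16 F2 plants is homozygous for both transgenes, which is why zygosity or marker screening is used to find them.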
  • a nucleic acid molecule or vector of the disclosure can include an additional coding sequence for one or more polypeptides or double stranded RNA molecules (dsRNA) of interest for agronomic traits that primarily are of benefit to a seed company, grower or grain processor.
  • a polypeptide of interest can be any polypeptide encoded by a nucleotide sequence of interest.
  • Non-limiting examples of polypeptides of interest that are suitable for production in plants include those resulting in agronomically important traits such as herbicide resistance (also sometimes referred to as “herbicide tolerance”), virus resistance, bacterial pathogen resistance, insect resistance, nematode resistance, or fungal resistance. See, e.g., U.S. Patent Nos.
  • the polypeptide also can be one that increases plant vigor or yield (including traits that allow a plant to grow at different temperatures, soil conditions and levels of sunlight and precipitation), or one that allows identification of a plant exhibiting a trait of interest (e.g., a selectable marker, seed coat color, relative maturity group, etc.).
  • Polynucleotides conferring resistance/tolerance to an herbicide that inhibits the growing point or meristem can also be suitable in some embodiments.
  • Exemplary polynucleotides in this category code for mutant ALS and AHAS enzymes as described, e.g., in U.S. Patent Nos. 5,767,366 and 5,928,937.
  • U.S. Patent Nos. 4,761,373 and 5,013,659 are directed to plants resistant to various imidazolinone or sulfonamide herbicides.
  • U.S. Patent No. 4,975,374 relates to plant cells and plants containing a nucleic acid encoding a mutant glutamine synthetase (GS) resistant to inhibition by herbicides that are known to inhibit GS, e.g., phosphinothricin and methionine sulfoximine.
  • U.S. Patent No. 5,162,602 discloses plants resistant to inhibition by cyclohexanedione and aryloxyphenoxypropanoic acid herbicides. The resistance is conferred by an altered acetyl coenzyme A carboxylase (ACCase).
  • Polypeptides encoded by nucleotide sequences conferring resistance to glyphosate are also suitable for the disclosure. See, e.g., U.S. Patent No. 4,940,835 and U.S. Patent No. 4,769,061.
  • U.S. Patent No. 5,554,798 discloses transgenic glyphosate resistant maize plants, which resistance is conferred by an altered 5-enolpyruvyl-3-phosphoshikimate (EPSP) synthase gene.
  • Polynucleotides coding for resistance to phosphono compounds such as glufosinate ammonium or phosphinothricin, and pyridinoxy or phenoxy propionic acids and cyclohexones are also suitable. See, European Patent Application No. 0 242 246. See also, U.S. Patent Nos. 5,879,903, 5,276,268, and 5,561,236.
  • suitable polynucleotides include those coding for resistance to herbicides that inhibit photosynthesis, such as a triazine and a benzonitrile (nitrilase). See, U.S. Patent No. 4,810,648.
  • Additional suitable polynucleotides coding for herbicide resistance include those coding for resistance to 2,2-dichloropropionic acid, sethoxydim, haloxyfop, imidazolinone herbicides, sulfonylurea herbicides, triazolopyrimidine herbicides, s-triazine herbicides and bromoxynil.
  • polynucleotides conferring resistance to a protox enzyme, or that provide enhanced resistance to plant diseases; enhanced tolerance of adverse environmental conditions (abiotic stresses) including but not limited to drought, excessive cold, excessive heat, or excessive soil salinity or extreme acidity or alkalinity; and alterations in plant architecture or development, including changes in developmental timing. See, e.g., U.S. Patent Publication No. 2001/0016956 and U.S. Patent No. 6,084,155.
  • Additional suitable polynucleotides include those coding for insecticidal polypeptides. These polypeptides may be produced in amounts sufficient to control, for example, insect pests (i.e., insect-controlling amounts). It is recognized that the amount of an insecticidal polypeptide that must be produced in a plant to control insects or other pests may vary depending upon the cultivar, type of pest, environmental factors and the like. Polynucleotides useful for additional insect or pest resistance include, for example, those that encode toxins identified in Bacillus organisms.
  • Bt insecticidal proteins include the Cry proteins such as Cry1Aa, Cry1Ab, Cry1Ac, Cry1B, Cry1C, Cry1D, Cry1Ea, Cry1Fa, Cry3A, Cry9A, Cry9B, Cry9C, and the like, as well as vegetative insecticidal proteins such as Vip1, Vip2, Vip3, and the like.
  • an additional polypeptide is an insecticidal polypeptide derived from a non-Bt source, including without limitation, an alpha-amylase, a peroxidase, a cholesterol oxidase, a patatin, a protease, a protease inhibitor, a urease, an alpha-amylase inhibitor, a pore-forming protein, a chitinase, a lectin, an engineered antibody or antibody fragment, a Bacillus cereus insecticidal protein, a Xenorhabdus spp. (such as X. nematophila or X. bovienii) insecticidal protein, a Photorhabdus spp. (such as P. luminescens or P. asymbiotica) insecticidal protein, a Brevibacillus spp. (such as B. laterosporus) insecticidal protein, a Lysinibacillus spp. (such as L. sphaericus) insecticidal protein, a Chromobacterium spp. (such as C. subtsugae or C. foundedae) insecticidal protein, a Yersinia spp. (such as Y. entomophaga) insecticidal protein, a Paenibacillus spp. (such as P. propylaea) insecticidal protein, a Clostridium spp. (such as C. bifermentans) insecticidal protein, a Pseudomonas spp. (such as P. fluorescens) insecticidal protein, and a lignin.
  • Polypeptides that are suitable for production in plants further include those that improve or otherwise facilitate the conversion of harvested plants or plant parts into a commercially useful product, including, for example, increased or altered carbohydrate content or distribution, improved fermentation properties, increased oil content, increased protein content, modified oil profile, improved digestibility, and increased nutraceutical content, e.g., increased phytosterol content, increased tocopherol content, increased stanol content or increased vitamin content.
  • Polypeptides of interest also include, for example, those resulting in or contributing to a reduced content of an unwanted component in a harvested crop, e.g., phytic acid, or sugar degrading enzymes. By “resulting in” or “contributing to” is intended that the polypeptide of interest can directly or indirectly contribute to the existence of a trait of interest (e.g., increasing cellulose degradation by the use of a heterologous cellulase enzyme).
  • the polypeptide contributes to improved digestibility for food or feed.
  • Xylanases are hemicellulolytic enzymes that improve the breakdown of plant cell walls, which leads to better utilization of the plant nutrients by an animal. This leads to improved growth rate and feed conversion. Also, the viscosity of the feeds containing xylan can be reduced. Heterologous production of xylanases in plant cells also can facilitate lignocellulosic conversion to fermentable sugars in industrial processing.
  • a polypeptide useful for the disclosure can be a polysaccharide degrading enzyme. Plants of this disclosure producing such an enzyme may be useful for generating, for example, fermentation feedstocks for bioprocessing.
  • enzymes useful for a fermentation process include alpha amylases, proteases, pullulanases, isoamylases, cellulases, hemicellulases, xylanases, cyclodextrin glycotransferases, lipases, phytases, laccases, oxidases, esterases, cutinases, granular starch hydrolyzing enzyme and other glucoamylases.
  • Polysaccharide-degrading enzymes include: starch-degrading enzymes such as α-amylases (EC 3.2.1.1), glucuronidases (EC 3.2.1.131); exo-1,4-α-D-glucanases such as amyloglucosidases and glucoamylase (EC 3.2.1.3), β-amylases (EC 3.2.1.2), α-glucosidases (EC 3.2.1.20), and other exo-amylases; starch debranching enzymes, such as a) isoamylase (EC 3.2.1.68), pullulanase (EC 3.2.1.41), and the like; b) cellulases such as exo-1,4-β-cellobiohydrolase (EC 3.2.1.91), exo-1,3-β-D-glucanase (EC 3.2.1.39), β-glucosidase (
  • proteases such as fungal and bacterial proteases.
  • Fungal proteases include, but are not limited to, those obtained from Aspergillus, Trichoderma, Mucor and Rhizopus, such as A. niger, A. awamori, A. oryzae and M. miehei.
  • the polypeptides of this disclosure can be cellobiohydrolase (CBH) enzymes (EC 3.2.1.91).
  • the cellobiohydrolase enzyme can be CBH1 or CBH2.
  • hemicellulases such as mannanases and arabinofuranosidases (EC 3.2.1.55); ligninases; lipases (e.g., EC 3.1.1.3), glucose oxidases, pectinases, xylanases, transglucosidases, alpha-1,6-glucosidases (e.g., EC 3.2.1.20); esterases such as ferulic acid esterase (EC 3.1.1.73) and acetyl xylan esterases (EC 3.1.1.72); and cutinases (e.g., EC 3.1.1.74).
  • Double stranded RNA molecules useful with the disclosure include but are not limited to those that suppress target insect genes.
  • “gene suppression,” when the two words are taken together, is intended to refer to any of the well-known methods for reducing the levels of protein produced as a result of gene transcription to mRNA and subsequent translation of the mRNA. Gene suppression is also intended to mean the reduction of protein expression from a gene or a coding sequence, including posttranscriptional gene suppression and transcriptional suppression.
  • Posttranscriptional gene suppression is mediated by the homology between all or part of an mRNA transcribed from a gene or coding sequence targeted for suppression and the corresponding double-stranded RNA used for suppression, and refers to the substantial and measurable reduction of the amount of mRNA available in the cell for binding by ribosomes.
  • the transcribed RNA can be in the sense orientation to effect what is called co-suppression, in the anti-sense orientation to effect what is called anti-sense suppression, or in both orientations producing a dsRNA to effect what is called RNA interference (RNAi).
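Transcription "in both orientations" is often achieved with an inverted-repeat (hairpin) construct: a sense copy of a target fragment, a spacer, then the antisense copy, so that the transcript folds back into a double-stranded stem. The sketch below assembles such a construct at the DNA level. The fragment and spacer sequences are invented, real arms are much longer than the 4-nt toy arm used here, and in practice the spacer is frequently an intron.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def hairpin_construct(target_fragment: str, spacer: str) -> str:
    """Sense arm + spacer + antisense arm (reverse complement of the sense arm)."""
    sense = target_fragment.upper()
    return sense + spacer.upper() + reverse_complement(sense)


construct = hairpin_construct("ATGC", "TTTT")
print(construct)  # ATGCTTTTGCAT
```

Because the last arm is the reverse complement of the first, the two arms of the transcript can base-pair to form the dsRNA stem, with the spacer forming the loop.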
  • Transcriptional suppression is mediated by the presence in the cell of a dsRNA, a gene suppression agent, exhibiting substantial sequence identity to a promoter DNA sequence or the complement thereof to effect what is referred to as promoter trans suppression.
  • Gene suppression may be effective against a native plant gene associated with a trait, e.g., to provide plants with reduced levels of a protein encoded by the native gene or with enhanced or reduced levels of an affected metabolite.
  • Gene suppression can also be effective against target genes in plant pests that may ingest or contact plant material containing gene suppression agents, specifically designed to inhibit or suppress the expression of one or more homologous or complementary sequences in the cells of the pest.
  • genes targeted for suppression can encode an essential protein, the predicted function of which is selected from the group consisting of muscle formation, juvenile hormone formation, juvenile hormone regulation, ion regulation and transport, digestive enzyme synthesis, maintenance of cell membrane potential, amino acid biosynthesis, amino acid degradation, sperm formation, pheromone synthesis, pheromone sensing, antennae formation, wing formation, leg formation, development and differentiation, egg formation, larval maturation, digestive enzyme formation, hemolymph synthesis, hemolymph maintenance, neurotransmission, cell division, energy metabolism, respiration, and apoptosis.
  • the polynucleotides provided herein are stacked with other polynucleotides that increase protein content, amino acid content, or oil content and/or modify oil profile, including, for example, the polynucleotides set forth in METHODS AND COMPOSITIONS FOR INCREASING PROTEIN AND/OR OIL CONTENT AND MODIFYING OIL PROFILE IN A PLANT, International Application No. ______, filed ______, 2022 (Attorney Docket No. 086879-1262815; Syngenta Ref. No. 82423-WO-REG-ORG-P-1), filed concurrently herewith and herein incorporated by reference in its entirety.
  • selectable marker means a nucleotide sequence that when expressed imparts a distinct phenotype to the plant, plant part and/or plant cell expressing the marker and thus allows such transformed plants, plant parts and/or plant cells to be distinguished from those that do not have the marker.
  • Such a nucleotide sequence may encode either a selectable or a screenable marker, depending on whether the marker confers a trait that can be selected for by chemical means, such as by using a selective agent (e.g., an antibiotic, herbicide, or the like), or on whether the marker is simply a trait that one can identify through observation or testing, such as by screening (e.g., the R-locus trait).
  • Selectable markers can also include the markers associated with oil and/or protein content and fatty acid profile (e.g., as described in Whiting, R.M., et al., BMC Plant Biol. 2020 Oct 23; 20(1): 485).
  • the genetic characteristic of the plant as represented by its genetic marker profile can be used to select plants of desired traits.
  • the term “marker-based selection” refers to the use of genetic markers to detect one or more nucleic acids from the plant, where the nucleic acid is associated with a desired trait to identify plants that carry genes for desirable (or undesirable) traits.
  • Markers include but are not limited to Restriction Fragment Length Polymorphisms (RFLPs), Randomly Amplified Polymorphic DNAs (RAPDs), Arbitrarily Primed Polymerase Chain Reaction (AP-PCR), DNA Amplification Fingerprinting (DAF), Sequence Characterized Amplified Regions (SCARs), Amplified Fragment Length Polymorphisms (AFLPs), Simple Sequence Repeats (SSRs), which are also referred to as microsatellites, and Single Nucleotide Polymorphisms (SNPs).
  • “associated with” refers to a recognizable and/or detectable relationship between two entities.
  • the phrase “associated with increased protein content” refers to a trait, locus, gene, allele, marker, phenotype, etc., or the expression product thereof, the presence or absence of which can influence or indicate an extent and/or degree to which a plant or its progeny exhibits increased protein content as compared to a control plant.
  • a marker is “associated with” a trait when it is linked to it and when the presence of the marker is an indicator of whether and/or to what extent the desired trait or trait form will occur in a plant/germplasm comprising the marker.
  • a marker is “associated with” an allele when it is linked to it and when the presence (or absence) of the marker is an indicator of whether the allele is present (or absent) in a plant, germplasm, or population comprising the marker.
  • “a marker associated with increased protein content” refers to a marker whose presence or absence can be used to predict whether and/or to what extent a plant will display increased protein content as compared to a control plant.
  • allele(s) refer to any of one or more alternative forms of a gene, all of which relate to at least one trait or characteristic. In a diploid cell, the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes.
  • “genotype” and variants thereof refer to the genetic composition of an organism, including, for example, whether a diploid organism is heterozygous (i.e., has two different alleles for a given gene or QTL) or homozygous (i.e., has the same allele for a given gene or QTL) for one or more genes or loci (e.g., a SNP, a haplotype, a gene mutation, an insertion, or a deletion).
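A minimal sketch of how an association between a marker and increased protein content might be quantified: plants are split by marker presence and the two groups are compared with Welch's t statistic. The records, field names (`marker_present`, `protein`), and values are hypothetical; a real analysis would also need a significance test, correction for population structure, and replication across environments.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(carriers, non_carriers):
    """Welch's t statistic for mean(carriers) - mean(non_carriers).

    statistics.variance uses the n - 1 (sample) denominator, so each
    group needs at least two observations.
    """
    se = sqrt(variance(carriers) / len(carriers)
              + variance(non_carriers) / len(non_carriers))
    return (mean(carriers) - mean(non_carriers)) / se


def marker_trait_t(plants, marker_field, trait_field):
    """Split plants by presence of a marker and compare a trait between the groups."""
    with_marker = [p[trait_field] for p in plants if p[marker_field]]
    without_marker = [p[trait_field] for p in plants if not p[marker_field]]
    return welch_t(with_marker, without_marker)


# Hypothetical toy records: seed protein content (% of dry weight).
plants = [
    {"marker_present": True, "protein": 42.0},
    {"marker_present": True, "protein": 43.5},
    {"marker_present": True, "protein": 41.8},
    {"marker_present": False, "protein": 39.9},
    {"marker_present": False, "protein": 40.6},
    {"marker_present": False, "protein": 39.2},
]

t = marker_trait_t(plants, "marker_present", "protein")
print(f"Welch t = {t:.2f}")  # positive: marker carriers have the higher mean
```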
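The heterozygous/homozygous distinction can be expressed as a small helper. This sketch assumes diploid SNP calls written as two nucleotides (for example `A/G`); the notation and the function name `zygosity` are illustrative assumptions, and insertions, deletions, and haplotype-level genotypes would need a richer representation.

```python
def zygosity(call):
    """Classify a diploid SNP call such as "A/G" (or "AG") as homozygous or heterozygous."""
    alleles = call.replace("/", "").upper()
    if len(alleles) != 2 or any(a not in "ACGT" for a in alleles):
        raise ValueError(f"not a biallelic diploid nucleotide call: {call!r}")
    # Same allele on both homologous chromosomes -> homozygous.
    return "homozygous" if alleles[0] == alleles[1] else "heterozygous"


for call in ["A/A", "C/T", "gg"]:
    print(call, zygosity(call))
```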
  • the markers used to identify the plants comprising the polynucleotides disclosed herein are SNPs.
  • SNP genotyping methods include hybridization, primer extension, oligonucleotide ligation, nuclease cleavage, minisequencing, and coded spheres. Such methods are well known and are disclosed in, e.g., Gut, I. G., Hum. Mutat. 17: 475-492 (2001); Shi, Clin. Chem.
  • Masscode™ (Qiagen, Germantown, MD), detection platforms from Hologic (Madison, WI) and Applied Biosystems (Foster City, CA), and Beadarrays™ (Illumina, San Diego, CA) are examples of commercially available systems for SNP detection.
  • an assay such as a two-step allelic discrimination assay (or similar), a KASP™ assay (generally a one-step allelic discrimination assay, as defined below, or similar), or both can be employed to identify the SNPs that are associated with increased protein content, increased oil content, and/or a modified oil profile as disclosed herein (e.g., the favorable alleles depicted in Tables 2-5 below).
  • in such an assay, a forward primer, a reverse primer, and two assay probes (or hybridization oligos) that recognize the two different alleles at the SNP site are employed.
  • the forward and reverse primers are employed to amplify genetic loci that comprise SNPs that are associated with increased protein content, increased oil content, and/or modified oil profile (for example, any of the favorable alleles as shown in Tables 2-5 below) .
  • the particular nucleotides that are present at the SNP positions are then assayed using the probes.
  • the assay probes and the reaction conditions are designed such that an assay probe will only hybridize to the reverse complement of a 100% (perfectly) matched sequence, thereby permitting identification of which allele(s) are present based upon detection of hybridization.
  • the probes are differentially labeled with, for example, fluorophores to permit distinguishing between the two assay probes in a single reaction.
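The two-probe readout can be turned into a genotype call by comparing the signals from the two differentially labeled probes. The sketch below is a simplified, fixed-threshold version of that idea: signals are assumed to be background-corrected and normalized, and the cut-offs (`min_total`, `band`) are hypothetical values, whereas production genotyping software typically clusters all samples on a plate before calling.

```python
from math import atan2, degrees, hypot


def call_genotype(sig_x, sig_y, min_total=0.2, band=15.0):
    """Call a biallelic SNP genotype from two probe signals.

    sig_x: signal of the probe matching allele X; sig_y: allele Y.
    Angle ~0 deg -> only the X probe bound (X/X); ~90 deg -> only Y (Y/Y);
    ~45 deg -> both probes bound (X/Y). Thresholds are illustrative.
    """
    sig_x, sig_y = max(sig_x, 0.0), max(sig_y, 0.0)  # clamp negative corrected values
    if hypot(sig_x, sig_y) < min_total:
        return "no call"  # too little total signal, e.g. failed amplification
    angle = degrees(atan2(sig_y, sig_x))  # lies in 0..90 for non-negative signals
    if angle < band:
        return "X/X"
    if angle > 90.0 - band:
        return "Y/Y"
    if abs(angle - 45.0) <= band:
        return "X/Y"
    return "no call"  # between clusters; flag for manual review


for x, y in [(1.0, 0.05), (0.05, 1.0), (0.8, 0.8), (0.05, 0.05), (1.0, 0.4)]:
    print((x, y), call_genotype(x, y))
```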
  • Exemplary methods of amplifying include employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm as a template in the PCR or LCR.
  • a number of SNP alleles together within a sequence, or across linked sequences, can be used to describe a haplotype for any particular genotype. Ching et al., BMC Genet. 3: 19 (2002) (14 pages); Gupta et al., Curr. Sci. 80: 524-535 (2001); Rafalski, Plant Sci. 162: 329-333 (2002).
  • haplotypes can be more informative than single SNPs and can be more descriptive of any particular genotype. For example, a single SNP may be allele “T” for a specific disease resistant line or variety, but the allele “T” might also occur in the soybean breeding population being utilized for recurrent parents.
  • a combination of alleles at linked SNPs may be more informative.
  • once a unique haplotype has been assigned to a donor chromosomal region, that haplotype can be used in that population or any subset thereof to determine whether an individual has a particular gene.
  • the use of automated high throughput marker detection platforms known to those of ordinary skill in the art makes this process highly efficient and effective.
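The point that haplotypes can be more informative than single SNPs can be illustrated with a toy example. The SNP names and alleles below are hypothetical: the donor and the recurrent parent share allele “T” at one SNP, so that SNP alone cannot tell them apart, but the alleles across three linked SNPs can.

```python
def haplotype(calls, snp_order):
    """Concatenate the alleles at linked SNPs, in physical order, into a haplotype string."""
    return "".join(calls[snp] for snp in snp_order)


# Hypothetical linked SNPs (physical order) and alleles on one chromosome of each line.
linked_snps = ["snp_a", "snp_b", "snp_c"]
donor = {"snp_a": "T", "snp_b": "G", "snp_c": "A"}
recurrent_parent = {"snp_a": "T", "snp_b": "C", "snp_c": "A"}

# The single SNP cannot distinguish the two lines...
print(donor["snp_a"] == recurrent_parent["snp_a"])  # True
# ...but the haplotype across the linked SNPs can.
print(haplotype(donor, linked_snps), haplotype(recurrent_parent, linked_snps))  # TGA TCA
```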
  • haplotype can refer to the set of alleles an individual inherited from one parent. A diploid individual thus has two haplotypes.
  • haplotype can be used in a more limited sense to refer to physically linked and/or unlinked genetic markers (e.g., sequence polymorphisms) associated with a phenotypic trait.
  • haplotype block (sometimes also referred to in the literature simply as a haplotype) refers to a group of two or more genetic markers that are physically linked on a single chromosome (or a portion thereof) . Typically, each block has a few common haplotypes, and a subset of the genetic markers (i.e., a “haplotype tag” ) can be chosen that uniquely identifies each of these haplotypes.
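One way to pick a haplotype tag is to search for the smallest subset of SNP positions whose alleles differ among all common haplotypes in a block. The exhaustive search below is a sketch that assumes a small block and phased haplotype strings of equal length; the haplotype names and sequences are hypothetical, not those of Tables 3-5, and dedicated tag-SNP tools use linkage-disequilibrium-based heuristics for larger blocks.

```python
from itertools import combinations


def haplotype_tags(haplotypes):
    """Return the smallest tuple of 0-based SNP positions whose alleles
    give every haplotype in the block a unique combination."""
    length = len(next(iter(haplotypes.values())))
    for k in range(1, length + 1):
        for positions in combinations(range(length), k):
            patterns = {tuple(seq[i] for i in positions) for seq in haplotypes.values()}
            if len(patterns) == len(haplotypes):  # all haplotypes distinguishable
                return positions
    raise ValueError("some haplotypes are identical and cannot be tagged")


# Hypothetical block: three common haplotypes over four SNPs.
block = {"hap_A": "ACGT", "hap_B": "ACTT", "hap_C": "GCTA"}
print(haplotype_tags(block))  # (0, 2)
```

Because `combinations` yields subsets in increasing size and lexicographic order, the first subset returned is a smallest tag set, though other tag sets of the same size may exist.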
  • Exemplary markers that are associated with and can be used to identify plants having increased protein content and/or increased oil content are shown in Tables 2-5.
  • Block 1 contains SNP #1-#5, of which SNP #4 is located in the CDS coding region.
  • Block 2 contains the 12 SNPs #7-#18, of which SNPs #7 and #8 are located in the CDS coding region.
  • Block 3 contains SNP #19 and #20, both of which are outside the CDS coding region.
  • the SNP genotyping reveals seven different haplotypes that are associated with increased protein content and/or increased oil content. Tables 3-5 show the genotype of each haplotype.
  • as shown in FIGS. 13-15, haplotypes Hap_2, Hap_3, and Hap_6 were found to be associated with increased protein content, and haplotypes Hap_1, Hap_2, Hap_5, and Hap_7 were found to be associated with increased oil content. Hap_2 was associated with both increased oil content and increased protein content.
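The haplotype-trait associations reported above can be encoded as two sets, which makes the overlap explicit. Only the haplotype labels come from the text; the helper `associated_traits` is an illustrative assumption.

```python
# Associations as reported in the text; labels only, no allele data.
PROTEIN_HAPLOTYPES = {"Hap_2", "Hap_3", "Hap_6"}
OIL_HAPLOTYPES = {"Hap_1", "Hap_2", "Hap_5", "Hap_7"}


def associated_traits(hap):
    """List the traits a haplotype label was reported to be associated with."""
    traits = []
    if hap in PROTEIN_HAPLOTYPES:
        traits.append("increased protein content")
    if hap in OIL_HAPLOTYPES:
        traits.append("increased oil content")
    return traits


print(sorted(PROTEIN_HAPLOTYPES & OIL_HAPLOTYPES))  # ['Hap_2']
print(associated_traits("Hap_2"))
```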
  • SNP markers can be used in a marker-assisted breeding program to move traits, such as native traits, traits conferred by transgenes, or traits conferred by genome editing, into a desired plant background.
  • native trait refers to a trait already existing in germplasm, including wild relatives of crop species, or that can be produced by recombination of existing traits.
  • progeny plants from a cross between a donor soybean plant comprising in its genome a nucleic acid sequence encoding SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant of any one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and a recipient soybean plant not comprising said nucleic acid sequence can be screened to detect the presence of the markers associated with increased protein content, increased oil content, and/or modified oil profile. Plants comprising said markers can be selected and verified for increased protein content, increased oil content, and/or modified oil profile as compared to control plants.
  • the donor plant comprises a nucleic acid sequence encoding SEQ ID NO: 3 and the markers are those listed in Table 2.
  • the markers that can be used to select plants having increased protein content are the alleles associated with one or more of haplotypes Hap_2, Hap_3, or Hap_6. In some embodiments, the markers that can be used to select plants having increased oil content are the alleles associated with one or more of haplotypes Hap_1, Hap_2, Hap_5, or Hap_7.
  • the favorable alleles of the SNPs are those present in one or more of the aforementioned haplotypes.
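Screening progeny for a favorable haplotype can be sketched as a genotype filter. The SNP names and favorable alleles below are hypothetical placeholders, not the alleles listed in Tables 2-5. Unphased diploid calls only show that each favorable allele is present somewhere; confirming that the alleles lie on the same chromosome in heterozygous plants requires phasing or a homozygous call, so the sketch offers both a permissive and a strict test.

```python
# Hypothetical favorable haplotype over three linked SNPs (placeholder alleles).
FAVORABLE = {"snp_1": "A", "snp_2": "G", "snp_3": "T"}


def carries_all_favorable(genotype, favorable=FAVORABLE):
    """True if every favorable allele appears in the plant's unphased calls ("A/G")."""
    return all(allele in genotype[snp].split("/") for snp, allele in favorable.items())


def homozygous_favorable(genotype, favorable=FAVORABLE):
    """True if the plant is homozygous for the favorable allele at every SNP."""
    return all(genotype[snp] == f"{allele}/{allele}" for snp, allele in favorable.items())


# Hypothetical F2 progeny genotypes.
progeny = {
    "F2-001": {"snp_1": "A/A", "snp_2": "G/G", "snp_3": "T/T"},
    "F2-002": {"snp_1": "A/C", "snp_2": "G/A", "snp_3": "T/C"},
    "F2-003": {"snp_1": "C/C", "snp_2": "A/A", "snp_3": "C/C"},
}

candidates = [pid for pid, g in progeny.items() if carries_all_favorable(g)]
fixed = [pid for pid, g in progeny.items() if homozygous_favorable(g)]
print(candidates)  # ['F2-001', 'F2-002']
print(fixed)       # ['F2-001']
```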
  • kits and primers that can be used to introduce a polynucleotide sequence as described in this disclosure into a recipient plant or to detect a polynucleotide sequence as described in this disclosure in a plant.
  • kits and primers that can be used to identify plants that have increased protein content, increased oil content, and/or modified oil profile.
  • the primers can include Glyma.06G303700-F, ATAACTAGTATGTTCCAGCCGAACC (SEQ ID NO: 40); and Glyma.06G303700-R, ATAGGATCCAGCAGGTTCACCAGA (SEQ ID NO: 41).
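The cloning primers above can be checked for basic properties. The sketch below computes GC content and scans for two restriction enzyme recognition sites, SpeI (ACTAGT) and BamHI (GGATCC). The presence of these sites near the 5' ends suggests, although the text does not state it, that the primers were designed for directional cloning; the helper names are illustrative.

```python
# Recognition sequences of the two enzymes checked here.
SITES = {"SpeI": "ACTAGT", "BamHI": "GGATCC"}

FWD = "ATAACTAGTATGTTCCAGCCGAACC"  # Glyma.06G303700-F (SEQ ID NO: 40)
REV = "ATAGGATCCAGCAGGTTCACCAGA"   # Glyma.06G303700-R (SEQ ID NO: 41)


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sites(seq, sites=SITES):
    """Map each enzyme to the 0-based start positions of its recognition site in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


for name, primer in [("F", FWD), ("R", REV)]:
    print(name, len(primer), f"GC={gc_content(primer):.0%}", find_sites(primer))
```

This reports a SpeI site at position 3 of the forward primer and a BamHI site at position 3 of the reverse primer (0-based), each preceded by a three-base ATA extension of the kind commonly added so the enzyme can cut near the end of a PCR product; GC contents are 44% and 50%. The ATG immediately after the SpeI site is consistent with the forward primer beginning at a start codon, although the disclosure does not say so.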
  • kits and primers that can be used to detect the expression level of the polypeptide disclosed herein in plants.
  • the primers can include Glyma.06G303700-q-F: AGTTGCACCGATTCAACAGGC (SEQ ID NO: 63); and Glyma.06G303700-q-R: CCATGCGATGTGGTTCCATCT (SEQ ID NO: 64).
  • the primers can include Glyma.06G303700-q-F: AGTTGCACCGATTCAACAGGC (SEQ ID NO: 65); and Glyma.06G303700-q-R: CCATGCGATGTGGTTCCATCT (SEQ ID NO: 66).
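Expression levels measured with qPCR primers such as those above are often reported as fold changes using the 2^-ΔΔCt (Livak) method. The disclosure does not specify a quantification method or a reference gene, so the sketch below assumes both, along with roughly 100% amplification efficiency for the target and reference primer pairs; the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression of a target gene (sample vs. control) by the 2^-ddCt method.

    Assumes target and reference primer pairs both amplify with ~100% efficiency.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # normalize to the reference gene
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: target amplicon vs. an unspecified reference gene.
print(fold_change_ddct(24.0, 18.0, 26.0, 18.0))  # 4.0
```

A lower normalized Ct in the sample means earlier amplification, so a ΔΔCt of -2 corresponds to about four times more target transcript than in the control.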
  • the kit may also comprise one or more probes having a sequence corresponding to or complementary to a sequence having 80% to 100% sequence identity with a specific region of the transgenic event or gene editing event.
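A rough check of whether a probe meets an identity threshold such as the 80% to 100% range above is an ungapped sliding comparison against the target region. This sketch compares one strand only and ignores gaps; the sequences are hypothetical, and real probe design would use a proper aligner and check both strands.

```python
def best_window_identity(probe, target):
    """Slide the probe along the target without gaps; return (best % identity, 0-based offset)."""
    probe, target = probe.upper(), target.upper()
    if not probe or len(probe) > len(target):
        raise ValueError("probe must be non-empty and no longer than the target")
    best_identity, best_offset = -1.0, -1
    for offset in range(len(target) - len(probe) + 1):
        window = target[offset:offset + len(probe)]
        matches = sum(p == t for p, t in zip(probe, window))
        identity = 100.0 * matches / len(probe)
        if identity > best_identity:  # keep the first offset with the highest identity
            best_identity, best_offset = identity, offset
    return best_identity, best_offset


identity, offset = best_window_identity("ACGTACGTAC", "TTACGTACGAACGG")
print(identity, offset, identity >= 80.0)  # 90.0 2 True
```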
  • the kit may comprise any reagent and material required to perform the assay or detection method.
  • Embodiment 1 An elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein content, increased oil content, and/or a modified oil profile on the elite Glycine max plant.
  • Embodiment 2 The elite Glycine max plant of embodiment 1, wherein the donor Glycine plant is from Glycine soja or Glycine max.
  • Embodiment 3 The elite Glycine max plant of embodiment 2, wherein the Glycine soja is the ZYD00006 variety.
  • Embodiment 4 The elite Glycine max plant of embodiment 1 or 2, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • Embodiment 5 The elite Glycine max plant of any one of embodiments 1-3, wherein the nucleic acid sequence has at least 90%, 95% or 100% sequence identity to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21 and polynucleotides encoding SEQ ID NO: 22 and 24-59.
  • Embodiment 6 The elite Glycine max plant of embodiment 1, wherein the polypeptide encoded by the nucleic acid sequence has at least 90%, or at least 95% identity to SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 22, wherein the polypeptide comprises one or more of the following: (i) a START domain, wherein the START domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 246-466 of SEQ ID NO: 20, or (ii) a homeodomain, wherein the homeodomain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 55-113 of SEQ ID NO: 20.
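Counting substitutions within a numbered domain (for example residues 246-466 for the START domain or 55-113 for the homeodomain of SEQ ID NO: 20) reduces to comparing two aligned sequences over a 1-based, inclusive range. The sketch assumes a gap-free alignment of equal length; the short sequences are hypothetical stand-ins, since the actual SEQ ID NO: 20 sequence is not reproduced here.

```python
def substitutions_in_range(query, reference, start, end):
    """Count residue differences between two aligned sequences over positions
    start..end (1-based, inclusive), e.g. residues 246-466 or 55-113."""
    if len(query) != len(reference):
        raise ValueError("expected a gap-free alignment of equal length")
    if not 1 <= start <= end <= len(reference):
        raise ValueError("range outside the sequence")
    # Convert 1-based inclusive positions to a Python slice [start - 1, end).
    return sum(q != r for q, r in zip(query[start - 1:end], reference[start - 1:end]))


# Hypothetical 10-residue stand-ins for aligned domain sequences.
reference = "MKTAYIAKQR"
variant = "MKTAHIAKER"
print(substitutions_in_range(variant, reference, 3, 8))   # 1 (Y5H)
print(substitutions_in_range(variant, reference, 1, 10))  # 2 (Y5H, Q9E)
```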
  • Embodiment 7 The elite Glycine max plant of any one of embodiments 1-6, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16 or 17, wherein the genome editing confers increased protein content, increased oil content, and/or a modified oil profile.
  • Embodiment 8 The elite Glycine max plant of any one of embodiments 1-6, wherein the nucleic acid sequence is introduced by genome editing of a Glycine max genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region that introduces at least one allele change corresponding to any allele described in any of Tables 21-23, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, wherein said one or more alleles confer in the plant increased protein and/or oil content, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein and/or oil content.
  • Embodiment 9 The elite Glycine max plant of embodiment 7 or 8, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
  • Embodiment 10 The elite Glycine max plant of any one of embodiments 1-6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of (a) a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 and a polynucleotide encoding any one of SEQ ID NO: 22, or 24-59, or (b) a nucleic acid sequence encoding at least one polypeptide set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein and/or oil content on the elite Glycine max plant.
  • Embodiment 11 The elite Glycine max plant of any of embodiments 1-10, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  • Embodiment 12 The elite Glycine max plant of any of embodiments 1-10, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  • Embodiment 13 The elite Glycine max plant of any one of embodiments 1-6, wherein at least one parental line of said elite Glycine max plant was selected or identified through molecular marker selection, wherein said parental line is selected or identified based on the presence of a molecular marker located within or closely linked with said nucleic acid sequence corresponding to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, or any portion thereof, wherein said molecular marker is associated with increased protein and/or oil content and/or modified oil profile.
  • Embodiment 14 The elite Glycine max plant of embodiment 13, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP), or a microsatellite.
  • Embodiment 15 The elite Glycine max plant of embodiment 13 or 14, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
  • Embodiment 16 The elite Glycine max plant of any one of embodiments 1-15, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, herbicide tolerance.
  • Embodiment 17 A plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 and a polynucleotide encoding any one of SEQ ID NO: 22, or 24-59, wherein said polypeptide confers increased protein and/or oil content and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
  • Embodiment 18 The plant of embodiment 17, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of a nucleic acid sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding any one of SEQ ID NO: 22, or 24-59.
  • Embodiment 19 The plant of embodiment 17 or 18, wherein the nucleic acid sequence is introduced by genome editing of a genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region that introduces at least one allele change corresponding to any allele described in Table 2, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, wherein said one or more alleles confer in the plant increased protein and/or oil content, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein and/or oil content.
  • Embodiment 20 The plant of embodiment 17, wherein the nucleic acid sequence is modified in said plant genome by duplication, inversion, promoter modification, terminator modification and/or splicing modification via genome editing of a nucleic acid sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 and a nucleic acid sequence encoding any one of SEQ ID NO: 22, or 24-59.
  • Embodiment 21 The plant of any one of embodiments 17-20, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
  • Embodiment 22 The plant of any one of embodiments 17-21, wherein the plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  • Embodiment 23 The plant of any one of embodiments 17-21, wherein the plant has in its genome at least one genetic marker that is an allele associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  • Embodiment 24 The plant of any one of embodiments 17-23, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
  • Embodiment 25 The plant of any one of embodiments 17-24, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, herbicide tolerance.
  • Embodiment 26 The plant of any one of embodiments 17-25, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.
  • Embodiment 27 The plant of embodiment 26, wherein the promoter is a native promoter or active variant or fragment thereof.
  • Embodiment 28 A plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having (a) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (b) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content and/or increased oil content and/or a modified oil profile as compared to a control plant.
  • Embodiment 29 The plant of embodiment 28, wherein (a) said nucleic acid sequence comprises at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or to a polynucleotide that encodes any one of SEQ ID NO: 22 or 24-59, or (b) said nucleic acid sequence is any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or encodes any one of SEQ ID NO: 22 or 24-59.
  • Embodiment 30 The plant of embodiment 28 or 29, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.
  • Embodiment 31 The plant of embodiment 28 or 29, wherein the nucleic acid sequence is introduced by genome editing.
  • Embodiment 32 The plant of any one of embodiments 28-31, wherein the promoter is an endogenous promoter.
  • Embodiment 33 The plant of any one of embodiments 28-31, wherein the promoter is a constitutive promoter, an inducible promoter, or a tissue-specific promoter.
  • Embodiment 34 The plant of any one of embodiments 28-30, wherein said genomic region of the plant comprises at least one allele corresponding to one or more alleles as described in any of Tables 2-5, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, and wherein said one or more alleles confer in the plant increased protein and/or oil content.
  • Embodiment 35 The plant of any one of embodiments 28-34, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  • Embodiment 36 The plant of any one of embodiments 28-34, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  • Embodiment 37 The plant of any one of embodiments 28-36, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
  • Embodiment 38 The plant of any one of embodiments 28-37, wherein the plant is a dicot plant.
  • Embodiment 39 The plant of embodiment 38, wherein the dicot plant is a soybean plant or an elite soybean plant.
  • Embodiment 40 The plant of any one of embodiments 28-37, wherein the plant is a monocot plant.
  • Embodiment 41 The plant of embodiment 40, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
  • Embodiment 42 The plant of any one of embodiments 28-41, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  • Embodiment 43 A progeny plant from the elite Glycine max plant of any one of embodiments 1-16 or the plant of any one of embodiments 17-42, wherein said progeny plant has stably incorporated into its genome the nucleic acid sequence.
  • Embodiment 44 A plant cell, seed, or plant part derived from the elite Glycine max plant of any one of embodiments 1-16 or the plant of any one of embodiments 17-42, wherein said plant cell, seed or plant part has stably incorporated into its genome the nucleic acid sequence.
  • Embodiment 45 A harvest product derived from the elite Glycine max plant of any one of embodiments 1-16 or the plant of any one of embodiments 17-42.
  • Embodiment 46 A processed product derived from the harvest product of embodiment 45, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.
  • Embodiment 47 A method of producing a soybean plant having increased protein and/or oil content and/or modified oil profile, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding any one of SEQ ID NO: 22, or 24-59, wherein said nucleic acid sequence confers onto said donor soybean plant an increased protein and/or oil content and/or modified oil profile; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by isolating a nucleic acid from said progeny plant and detecting within said nucleic acid a molecular marker associated with said nucleic acid sequence, thereby producing a soybean plant having increased protein content and/or increased oil content and/or
  • Embodiment 48 The method of embodiment 47, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
  • Embodiment 49 The method of embodiment 47 or 48, wherein the molecular markers are markers as set forth in Tables 2-5.
  • Embodiment 50 The method of any one of embodiments 47-49, wherein either the recipient or the donor soybean plant is an elite Glycine max plant.
  • Embodiment 51 A method of producing a Glycine max plant with increased protein and/or oil content, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with, or closely linked with, a nucleic acid sequence comprising any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, or a portion of any thereof, wherein said portion confers to a plant increased protein content and/or increased oil content; c) selecting a plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said marker associated with increased protein and/or increased oil content.
  • Embodiment 52 The method of embodiment 51, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
  • Embodiment 53 The method of embodiment 51 or 52, wherein the molecular marker is one or more SNPs set forth in Table 2.
  • Embodiment 54 The method of any one of embodiments 51-53, wherein the molecular marker comprises alleles associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, and/or Hap_7.
  • Embodiment 55 The method of embodiment 51, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.
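Detection by amplification presupposes that the primers bind the marker locus in the expected orientation. The sketch below performs a minimal in silico PCR with exact primer matches on one strand of a template; the template is synthetic (built around the Glyma.06G303700 qPCR primers listed earlier) because the genomic sequence is not reproduced in this section, and a real check would allow mismatches and scan both strands.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template, fwd, rev):
    """Predicted amplicon from exact primer matches on the template's top strand, or None.

    The forward primer matches the top strand as written; the reverse primer
    appears on the top strand as its reverse complement, downstream of the forward site.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


FWD_Q = "AGTTGCACCGATTCAACAGGC"  # Glyma.06G303700-q-F
REV_Q = "CCATGCGATGTGGTTCCATCT"  # Glyma.06G303700-q-R

# Synthetic template for illustration only (not the real genomic sequence).
template = "TTTT" + FWD_Q + "ACGTACGTAC" + reverse_complement(REV_Q) + "GGGG"
amplicon = in_silico_amplicon(template, FWD_Q, REV_Q)
print(len(amplicon))  # 52
```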
  • Embodiment 56 The method of embodiment 51, wherein the nucleic acid is selected from DNA or RNA.
  • Embodiment 57 A plant produced by the method of any one of embodiments 47-56.
  • Embodiment 58 A method of conferring increased protein content and/or increased oil content and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (ii) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content and/or increases oil content compared to a control plant not expressing said nucleic acid sequence.
  • Embodiment 59 The method of embodiment 58, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.
  • Embodiment 60 The method of embodiment 58, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content and/or increased oil content.
  • Embodiment 61 The method of embodiment 58, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.
  • Embodiment 62 The method of embodiment 58, wherein the method comprises Cas12a mediated gene replacement.
  • Embodiment 63 The method of embodiment 62, wherein the method comprises at least one gRNA.
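Cas12a-mediated replacement starts with choosing gRNA target sites. The disclosure does not give design rules, so this sketch assumes the 5'-TTTV protospacer-adjacent motif commonly reported for Cas12a nucleases (V = A, C, or G), located immediately 5' of a 23-nt protospacer; it scans one strand only and performs no off-target scoring. The sequence is hypothetical.

```python
import re


def cas12a_sites(seq, spacer_len=23):
    """Candidate Cas12a targets on one strand: a 5'-TTTV PAM followed by spacer_len nt.

    Returns (0-based PAM start, PAM, protospacer) tuples. Two TTTV matches cannot
    overlap (the V is never a T), so a plain finditer scan finds them all.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"TTT[ACG]", seq):
        spacer = seq[m.end():m.end() + spacer_len]
        if len(spacer) == spacer_len:  # skip PAMs too close to the sequence end
            hits.append((m.start(), m.group(), spacer))
    return hits


# Hypothetical sequence for illustration.
demo = "GG" + "TTTA" + "ACGT" * 6 + "CC"
print(cas12a_sites(demo))
```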
  • Embodiment 64 The method of any one of embodiments 58-63, wherein the promoter is an exogenous promoter.
  • Embodiment 65 The method of any one of embodiments 58-63, wherein the promoter is an endogenous promoter.
  • Embodiment 66 The method of embodiment 64, wherein the exogenous promoter comprises SEQ ID NO: 23 or an active variant or fragment thereof.
  • Embodiment 67 The method of embodiment 59, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
  • Embodiment 68 The method of any one of embodiments 58-67, wherein the plant is a dicot plant.
  • Embodiment 69 The method of embodiment 68, wherein the dicot plant is a soybean plant.
  • Embodiment 70 The method of any one of embodiments 58-67, wherein the plant is a monocot plant.
  • Embodiment 71 The method of embodiment 70, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
  • Embodiment 72 A plant produced by the method of any one of embodiments 58-71.
  • Embodiment 73 A polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein expression of the polypeptide in a plant confers increased protein, oil content and/or modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in the plant confers increased protein and/or oil content on said plant; (c) a polypeptide having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity with and having the same function as the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein the polypeptide when expressed in a plant confers increased polypeptide
  • Embodiment 74 A nucleic acid molecule comprising (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21 and a sequence encoding SEQ ID NO: 22 or 24-59; or (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21 or a polynucleotide of SEQ ID NO: 22 or 24-59.
  • Embodiment 75 An expression cassette comprising the nucleic acid molecule of embodiment 74 or a nucleic acid sequence encoding the polypeptide of embodiment 73.
  • Embodiment 76 The expression cassette of embodiment 75, wherein the nucleic acid molecule is operably linked to a promoter that is capable of directing expression in a plant cell.
  • Embodiment 77 The expression cassette of embodiment 76, wherein the promoter is an endogenous promoter.
  • Embodiment 78 The expression cassette of embodiment 76, wherein the promoter is an exogenous promoter.
  • Embodiment 79 A vector comprising the nucleic acid molecule of embodiment 74 or the expression cassette of any one of embodiments 75-78.
  • Embodiment 80 A transgenic cell comprising the nucleic acid molecule of embodiment 74 or the expression cassette of any one of embodiments 75-78.
  • Embodiment 81 Use of the polypeptide of embodiment 73 or the nucleic acid molecule of embodiment 74, or the expression cassette of any one of embodiments 75 to 78 in conferring increased protein content and/or increased oil content and/or modified oil profile in a plant.
  • Embodiment 82 Use of the expression cassette of any one of embodiments 75-78 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content and/or oil content of a plant is increased upon expression in the plant.
  • Embodiment 83 A method for increasing protein content and/or oil content in a plant, comprising increasing the expression level and/or activity of the polypeptide of embodiment 73 in the plant.
  • Embodiment 84 A method for producing a plant variety with increased protein content and/or oil content, comprising increasing the expression level and/or activity of the polypeptide of embodiment 73 in a recipient plant.
  • Embodiment 85 The method of embodiment 83 or 84, wherein increasing the expression level and/or activity of the polypeptide in the plant is achieved by transgenic means or by breeding.
  • Embodiment 86 A method for producing a transgenic plant with increased protein content and/or oil content, comprising the following step: introducing the nucleic acid molecule of embodiment 67 or the expression cassette of any one of embodiments 75-78 into a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content and/or oil content compared with the recipient plant.
  • Embodiment 87 The method of embodiment 86, wherein introducing the nucleic acid molecule into the recipient plant is performed by introducing the expression cassette of any one of embodiments 75-78 into the recipient plant.
  • Embodiment 88 A primer pair for amplifying the nucleic acid molecule of embodiment 74.
  • Embodiment 89 The primer pair of embodiment 88, wherein the primer pair is a primer pair 1 composed of two single-stranded DNA molecules comprising the sequences of SEQ ID NO: 63 and SEQ ID NO: 64.
  • Embodiment 90 A kit comprising the primer pair of embodiment 88 or 89.
  • The roots, stems, leaves, flowers, pods and seeds of the parent SN14 were collected as template materials for tissue-specific expression analysis.
  • The materials were placed in RNase-free Eppendorf (EP) tubes, immediately frozen in liquid nitrogen, and stored at -80°C.
  • the soybean template material was SN14, and the soybean transformation material was DN50.
  • The Arabidopsis transformation material was Col-0, and the Arabidopsis mutant material was SALK_127828.47.00.x (ordered from the ABRC website).
  • The Escherichia coli strain used in this application was DH5α, and the Agrobacterium tumefaciens strain was EHA105.
  • The target gene fragment in the entry vector Fu28 was transferred into the plant expression vector pr35S using the Gateway vector system.
  • The entry vector Fu28 (FIG. 16) and the expression vector pr35S (FIG. 17) were provided by Professor Fu Yongfu of the Institute of Crop Science, Chinese Academy of Agricultural Sciences.
  • The genome sequences, CDS sequences and peptide sequences of the candidate genes were obtained from the Phytozome website.
  • the parental strains Suinong 14 (SN14) and ZYD00006 were fully sequenced but the sequencing information has not been published.
  • Williams 82 is a soybean cultivar used to produce the reference genome sequence. The relevant sequences in Williams 82, SN14, and ZYD00006 were analyzed and compared by DNAMAN software.
  • the medium for genetic transformation of soybean cotyledon node is shown in Table 10.
  • B5 salt Gamborg Basal Salt Mixture
  • MES 2-(4-Morpholino)ethanesulfonic acid
  • 6-BA 6-benzylaminopurine
  • GA3 Gibberellic acid
  • AS Acetosyringone
  • L-Cys L-Cysteine
  • DTT DL-Dithiothreitol
  • ZT zeatin
  • Pro Proline
  • Asp Aspartic acid
  • Glu glutamic acid
  • IAA 3-Indoleacetic acid.
  • Arabidopsis thaliana was selected as the target species, and the Glyma.START CDS sequence was used as the query sequence.
  • The homologous gene in Arabidopsis thaliana was obtained.
  • The conserved functional domain of the Arabidopsis homologous gene was predicted, and the domain was similar to that of the target gene Glyma.START.
  • The planting soil was a 3:1 mixture of flower nutrient soil and vermiculite.
  • the soil was put into small flower pots and slowly soaked in water.
  • Arabidopsis thaliana seeds were sown evenly in moist soil.
  • The opening of each pot was sealed with plastic wrap, and the pots were placed in a refrigerator at 4°C for 48-72 h of cold treatment (stratification).
  • The pots were then placed in an incubator (22°C, 16 h/8 h light/dark, 70 μmol·m⁻²·s⁻¹) for 1 week until the Arabidopsis seedlings emerged.
  • Arabidopsis plants having two large leaves and two small leaves were selected for transplanting in pots; 1-2 plants per pot. Water or flower fertilizer was added when the soil in the pots became dry.
  • Total DNA was extracted by the CTAB (hexadecyltrimethylammonium bromide) method (Porebski, S. et al., Plant Molecular Biology Reporter, 1997, 15(1): 8-15).
  • CTAB hexadecyltrimethylammonium bromide
  • The prepared CTAB extraction buffer was stored at 4°C.
  • Rosette leaves of Arabidopsis thaliana were collected and placed in an EP tube containing 2 mm steel beads. The leaves were quick-frozen in liquid nitrogen, and the frozen leaves were then fully ground in a tissue grinder.
  • 700 μL of CTAB extraction buffer was added to the EP tube containing the sample and mixed thoroughly on a vortex mixer. The tube was then placed in a 65°C water bath for 1 h and inverted to mix every 10 minutes.
  • The EP tube was taken out of the water bath and allowed to cool, and 650 μL of chloroform was added. The tube was inverted 30 times to mix thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 μL of the supernatant was transferred to a new EP tube, 650 μL of chloroform was added, and the mixture was shaken thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 μL of the supernatant was then transferred to a new EP tube containing 700 μL of pre-cooled isopropanol, inverted 30 times to mix thoroughly, and centrifuged at 12000 rpm for 15 minutes at room temperature.
  • The supernatant was discarded, and the pellet was washed once with 95% ethanol and then once with 75% ethanol, with centrifugation at 7500 rpm for 5 min at room temperature.
  • The DNA pellet was dried and dissolved in 50 μL of sterile water. The DNA concentration was measured spectrophotometrically (OD260), and the DNA was stored at -20°C.
  • DNA from Arabidopsis wild-type Col-0 and mutant plants was extracted and used as a template for PCR amplification.
  • The amplified products were subjected to 1.5% agarose gel electrophoresis to determine whether the mutant was homozygous.
  • the primers used are shown in Table 11.
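The homozygosity check above depends on reading band patterns from the PCR with the Table 11 primers, which are not reproduced here. The sketch below assumes the common three-primer design for SALK T-DNA lines (a gene-specific primer pair plus a T-DNA border primer); the function name and band labels are hypothetical and only illustrate how the two band types map to a genotype call.

```python
def call_tdna_genotype(genomic_band: bool, insertion_band: bool) -> str:
    """Classify a plant from a three-primer T-DNA genotyping PCR.

    Assumed design (not specified in the source): the gene-specific primer
    pair amplifies the intact allele ("genomic_band"), while the T-DNA border
    primer combined with one gene-specific primer amplifies the insertion
    allele ("insertion_band").
    """
    if genomic_band and insertion_band:
        return "heterozygous"
    if insertion_band:
        return "homozygous mutant"
    if genomic_band:
        return "wild type"
    return "no product; repeat the PCR"


if __name__ == "__main__":
    # Col-0 should give only the genomic band; a homozygous line only the insertion band.
    print(call_tdna_genotype(True, False))
    print(call_tdna_genotype(False, True))
```

Only plants scored as "homozygous mutant" would be carried forward as mutant material under this assumed design.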
  • Arabidopsis cultivation and plant transformation preparation.
  • The Arabidopsis control Col-0 and the homozygous mutant materials were planted as described above. After bolting, the primary stems were removed to increase the number of secondary bolts. The plants were ready for transformation when the stems had grown to a uniform height and only the uppermost flowers had not yet opened.
  • Agrobacterium preparation Agrobacterium tumefaciens containing the expression vector, stored at -80°C, was inoculated into 10 mL of LB liquid medium containing spectinomycin and cultured overnight at 28°C with shaking at 160 rpm. 100 μL of this starter culture was then transferred to 100 mL of fresh YEP liquid medium containing spectinomycin and cultured further at 28°C with shaking at 200 rpm. When the culture reached an OD600 of 0.8, the cells were harvested and the bacterial pellet was resuspended in 100 mL of resuspension solution containing 5% sucrose and 0.01% Silwet L-77. The suspension was kept at room temperature for 1-3 h before use.
  • Arabidopsis thaliana plants that had bolted to a suitable height and carried many inflorescences were used for transformation. Open flowers and siliques that had already formed were removed, and the unopened inflorescences were dipped in the Agrobacterium suspension for 30 s. The infected plants were then wrapped in plastic wrap and kept in a dark box for 24 hours before being taken out. A second round of transformation was performed on the same plants one week later to improve transformation efficiency. Mature seeds were then harvested from the plants.
  • Mature T0 seeds of the transformed Arabidopsis thaliana were harvested and planted as described above.
  • Basta solution (diluted 1:1000) was sprayed on the plants 2-3 times, once every other day, and the growth of the Arabidopsis plants was observed.
  • Non-transgenic Arabidopsis plants became chlorotic and gradually died, while transgenic Arabidopsis plants grew normally.
  • When the transgenic Arabidopsis thaliana plants had grown 4 leaves, the plants identified as transgenic were transplanted into new small pots and grown further before molecular identification.
  • Rosette leaves of transgenic Arabidopsis thaliana (T1, T2 and T3) were placed in an EP tube and ground with a small pestle. A Bar test strip was inserted into the ground leaf extract in the direction indicated on the strip, and the number of bands on the strip was observed: two bands indicate that the plant is transgenic, and one band indicates a non-transgenic plant.
  • Leaf DNA of transgenic Arabidopsis thaliana (T1, T2 and T3) was extracted.
  • The transgenic plants were identified by PCR with Glyma.06G303700 gene primers and Bar primers (Table 12).
  • The PCR products were detected by 1.5% agarose gel electrophoresis.
  • RNA was extracted from Arabidopsis rosette leaves and reverse-transcribed into cDNA.
  • The expression level of Glyma.06G303700 in transgenic Arabidopsis was determined using the primers shown in Table 13, with AtACTIN2 as the internal reference gene.
  • For determination of seed nitrogen content, 0.1 mol/L HCl was prepared and standardized against 0.1 mol/L Na2CO3.
  • A 1% H3BO3 solution was prepared and adjusted to pH 4-5.
  • Seven milliliters of 0.1% methyl red and 10 mL of 0.1% bromocresol green indicator were added per 1 L of H3BO3 solution, giving a wine-red solution.
  • the seeds were placed in an oven at 60°C for 12-14 hours.
  • A 0.1 g sample (weighed to the nearest 0.001 g) was transferred into a 50 mL digestion tube through a paper trough. Each sample was tested 3 times.
  • 5 mL of concentrated sulfuric acid and a small amount of catalyst (potassium sulfate and copper sulfate, 5:1) were added to each sample, which was digested in an oven at 400°C for 90 minutes, then taken out and allowed to cool.
  • A FOSS Kjeltec 2300 automatic analyzer was used to determine the total nitrogen content.
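The Kjeltec analyzer reports total nitrogen directly; the arithmetic behind it, for ammonia captured in the boric-acid receiver and titrated with the standardized 0.1 mol/L HCl, can be sketched as below. The blank volume, the example titrant volumes, and the 6.25 nitrogen-to-protein factor are illustrative assumptions, not values from this document.

```python
from statistics import mean

NITROGEN_G_PER_MOL = 14.007


def kjeldahl_nitrogen_percent(titrant_ml: float, blank_ml: float,
                              hcl_mol_per_l: float, sample_g: float) -> float:
    """Percent total nitrogen from titrating the boric-acid trap with HCl.

    Each mole of HCl consumed corresponds to one mole of NH3, i.e. one mole of N.
    """
    net_ml = titrant_ml - blank_ml
    moles_n = net_ml / 1000.0 * hcl_mol_per_l
    return moles_n * NITROGEN_G_PER_MOL / sample_g * 100.0


def crude_protein_percent(nitrogen_percent: float, factor: float = 6.25) -> float:
    """Convert nitrogen to crude protein; 6.25 is a generic default (assumption)."""
    return nitrogen_percent * factor


if __name__ == "__main__":
    # Three hypothetical titrations of 0.1 g samples (triplicate, as in the text).
    replicates = [kjeldahl_nitrogen_percent(v, 0.10, 0.1, 0.100) for v in (4.75, 4.80, 4.85)]
    n_pct = mean(replicates)
    print(f"N = {n_pct:.3f}%  protein ~ {crude_protein_percent(n_pct):.1f}%")
```

Averaging the three replicate nitrogen values before converting mirrors the triplicate testing described above; a crop-specific conversion factor can be passed in place of the default.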
  • The fatty acid content of the seeds was determined by gas chromatography as follows. The seeds were dried in an oven at 105°C for 20-30 minutes and then at 65°C for 12-14 hours. Five replicate tests were performed for each sample. In each test, about 5 mg of the seed sample was mixed with 1 mL of 2.5% concentrated sulfuric acid in methanol and 5 μL of 50 mg/mL BHT (2,6-di-tert-butyl-4-methylphenol), and 50 μL of 10 mg/L heptadecanoic acid or acetic acid was added as the internal standard. The storage tube was immediately sealed and placed in an 85°C water bath for 1.5 h, inverted every 10 minutes to mix the sample and reagents thoroughly, and then allowed to cool to room temperature.
  • The column used in the Agilent 6890 gas chromatograph was 30 m × 320 μm × 0.25 μm.
  • Gas flow rates: nitrogen (carrier gas) 60 mL/min, hydrogen 60 mL/min, air 450 mL/min.
  • Injection volume 1 μL; split injection mode with a split ratio of 10:1; injection port temperature 170°C.
  • Oven temperature program: hold at 180°C for 1 min, increase to 250°C at 25°C/min, and hold for 7 min.
  • The absolute content of each fatty acid was calculated as $C_i = \dfrac{A_i \times m_s}{A_s \times m}$, where $A_i$ is the peak area of the ith fatty acid component, $A_s$ is the peak area of the internal standard, $m_s$ is the mass of the internal standard, and $m$ is the dry weight of the sample.
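Applied to a whole chromatogram, the internal-standard calculation can be sketched as follows. It assumes a relative response factor of 1 for every fatty acid methyl ester (the text gives no correction factors), takes the internal-standard mass in micrograms and the sample dry weight in milligrams, and uses hypothetical peak areas; the relative-composition step is an added convenience, not part of the original formula.

```python
def absolute_contents(peak_areas: dict[str, float], is_area: float,
                      is_mass_ug: float, sample_dry_mg: float) -> dict[str, float]:
    """C_i = (A_i / A_s) * m_s / m, in micrograms per milligram dry weight."""
    if is_area <= 0 or sample_dry_mg <= 0:
        raise ValueError("internal-standard area and sample mass must be positive")
    return {fa: area / is_area * is_mass_ug / sample_dry_mg
            for fa, area in peak_areas.items()}


def relative_composition(contents: dict[str, float]) -> dict[str, float]:
    """Each fatty acid as a percentage of the summed fatty acids."""
    total = sum(contents.values())
    return {fa: 100.0 * c / total for fa, c in contents.items()}


if __name__ == "__main__":
    areas = {"palmitic (16:0)": 200.0, "oleic (18:1)": 400.0}  # hypothetical peak areas
    contents = absolute_contents(areas, is_area=100.0, is_mass_ug=10.0, sample_dry_mg=5.0)
    print(contents)  # {'palmitic (16:0)': 4.0, 'oleic (18:1)': 8.0}
    print(relative_composition(contents))
```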
  • Soybean cotyledon nodes were transformed and cultivated using the following protocol:
  • Co-culture Divide the seed into two halves along the hypocotyl with a razor blade, and lightly scratch 2-3 points at the cotyledon node with the razor blade to make wounds. Put the explants into the prepared Agrobacterium resuspension and incubate at 28°C and 160 rpm for 30 min to allow Agrobacterium infection, then remove the infected explants from the resuspension with tweezers. Place them on SCCM covered with filter paper and incubate in the dark at 25°C for 3-5 days.
  • Elongation of cluster buds Cut the large clump buds and insert them into the SEM, and place them in a sterile tissue culture room for about 14 days.
  • Clump buds that have not produced elongated shoots are taken out of the SEM, lightly scratched at the base to create a new wound, and inserted into fresh SEM for subculture.
  • the culture cycle is about 14 days and the process is repeated.
  • Rooting of positive elongated buds The positive buds were cut from the clumping buds, dipped in IBA hormone for 30 s, inserted into the RM, and cultured in a sterile tissue culture room until they took root.
  • the positive seedlings were taken out from the culture medium, and the roots were cleaned with clean water to remove the residual culture medium.
  • the positive seedlings were transplanted into the soil and cultured in the plant greenhouse.
  • An InfraTec™ 1241 Grain Analyzer (FOSS Analytics) was used to determine the protein and oil content of soybean seeds. Each sample was measured 3-5 times, and the average value was used for phenotypic data analysis.
  • the content of fatty acids in seeds was determined by gas chromatography and calculated as described in section vii of this Example 2 above.
  • the protein and oil content of the above 680 materials were determined by the FOSS grain analysis method for haplotype analysis.
  • The Glyma.06G303700 sequence (including the promoter) and its length were obtained from the Phytozome website, and genome resequencing data (10× coverage) of a resource population of 680 soybean accessions were used as the population verification data for this experiment. SNP information for the Glyma.06G303700 sequence (including the promoter) was extracted, sorted into the required format, and submitted to Haploview software to define gene blocks and obtain the haplotypes within each block.
  • One-way analysis of variance (ANOVA) in SPSS software was used to test for significant phenotypic differences among the excellent haplotypes.
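For readers reproducing the haplotype comparison outside SPSS, the one-way ANOVA F statistic can be computed directly as below. This is a minimal sketch on made-up phenotype values; it omits the p-value (which would come from the F distribution, for example via `scipy.stats.f_oneway`) and the post hoc multiple comparisons that SPSS performs.

```python
from statistics import fmean


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: deviations of each value from its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if df_within <= 0:
        raise ValueError("need more observations than groups")
    if ss_within == 0:
        return float("inf"), df_between, df_within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical protein contents (%) for accessions carrying two haplotypes.
    hap_x = [41.2, 42.0, 41.5, 40.9]
    hap_y = [43.8, 44.1, 43.5, 44.6]
    print(one_way_anova_f([hap_x, hap_y]))
```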
  • The grain protein and oil content of the plant population was determined, and the samples were ranked by protein content. After ranking, 20 samples each from the high-protein and low-protein extremes were selected, and high-quality DNA was extracted to prepare high- and low-phenotype pools for BSA sequencing. Candidate regions were selected using the SNP-index association algorithm. With SN14 as the reference parent, 3 candidate segments were associated at the 95% confidence level, and genes within them that cause stop-loss or stop-gain mutations, or that contain non-synonymous mutations or alternative splicing sites, were selected as candidate genes; a total of 5 genes were screened. The bulked segregant analysis ("BSA") mixed-pool sequencing results are from the master's thesis of Li Wei, Northeast Agricultural University (2016).
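The SNP-index step in this BSA workflow can be illustrated as follows. In the usual formulation, the SNP-index at a site is the fraction of reads in a pool carrying the allele that differs from the reference parent (here SN14), and the ΔSNP-index is the high-pool value minus the low-pool value. The fixed threshold below is a hypothetical stand-in for the 95% confidence interval, which in practice is usually derived by simulation or permutation, and the read counts are made up.

```python
def snp_index(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads carrying the non-reference (non-SN14) allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("site has no read coverage")
    return alt_reads / depth


def delta_snp_index(high_pool: dict[int, tuple[int, int]],
                    low_pool: dict[int, tuple[int, int]]) -> dict[int, float]:
    """ΔSNP-index (high minus low) at positions covered in both pools.

    Each pool maps position -> (ref_reads, alt_reads).
    """
    shared = sorted(set(high_pool) & set(low_pool))
    return {pos: snp_index(*high_pool[pos]) - snp_index(*low_pool[pos]) for pos in shared}


def candidate_sites(deltas: dict[int, float], threshold: float) -> list[int]:
    """Positions whose |ΔSNP-index| reaches the (assumed) significance threshold."""
    return [pos for pos, d in sorted(deltas.items()) if abs(d) >= threshold]


if __name__ == "__main__":
    high = {1_000: (2, 18), 2_000: (10, 10)}  # hypothetical read counts
    low = {1_000: (18, 2), 2_000: (9, 11)}
    deltas = delta_snp_index(high, low)
    print(deltas)
    print(candidate_sites(deltas, threshold=0.5))  # [1000]
```

Real pipelines usually smooth ΔSNP-index over sliding windows before calling candidate regions; that step is left out here for brevity.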
  • Glyma.03G040200 is reported to have the highest expression in seeds, with slight expression in stems and no expression in other tissues, but the relative expression levels in seeds and stems do not exceed 1.
  • Glyma.03G036300 is not expressed in any organ.
  • Glyma.06G297500 has extremely high expression levels in various tissues: root hairs and roots have the highest expression, followed by apical meristems, with expression decreasing in the order root nodules, stems, pods, leaves, and flowers, and the lowest expression in seeds.
  • Glyma.07G192400 is expressed in all tissues. The apical meristem has the highest expression, followed by pods and seeds, and roots have the lowest expression.
  • Glyma.06G303700 is not expressed in root nodules and has the highest expression in apical meristems, followed by pods and seeds, with expression then decreasing in the order flowers, roots, stems, leaves, and root hairs.
  • The 3000 bp sequence upstream of the Glyma.06G303700 genomic sequence was taken as the promoter sequence of the gene and submitted to the PlantCARE website.
  • The promoter elements were obtained, screened and integrated, and the gene promoter elements were visualized using TBtools software. The results show that the promoter region of Glyma.06G303700 includes at least the following: (i) a 60K protein binding site, (ii) a cis-acting element involved in defense and stress responses, (iii) a common cis-acting element in promoter and enhancer regions, (iv) a core promoter element, (v) an element required for maximal elicitor-mediated activation, (vi) a conserved DNA module array (CMA3), and (vii) light-responsive elements.
  • The promoter of Glyma.06G303700 contains a large number of TATA boxes (the core promoter element near the transcription start site), which may play a role in regulating gene expression.
  • the photoresponsive element of the promoter contains MYB binding sites involved in photoresponse, and some conserved DNA modules involved in photoresponse.
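PlantCARE matches promoter sequences against its curated motif database. As a much simpler illustration of how a core element such as the TATA box can be located in the 3000 bp upstream region, the sketch below scans one strand for the consensus TATAWAW (W = A or T). The consensus choice and the toy sequence are assumptions for illustration, and the result will not reproduce PlantCARE's annotation.

```python
import re

# TATA-box consensus TATAWAW, where W is A or T (assumed consensus).
# The zero-width lookahead lets overlapping matches be reported.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_boxes(promoter: str) -> list[int]:
    """0-based start positions of TATA-box matches on the given strand."""
    return [m.start() for m in TATA_PATTERN.finditer(promoter.upper())]


if __name__ == "__main__":
    toy_promoter = "GCTATAAAAGCTTATATATG"  # hypothetical sequence
    print(find_tata_boxes(toy_promoter))  # [2, 12]
```

Scanning the reverse complement as well would be needed to catch elements on the opposite strand.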
  • the amino acid sequences of the genes in the parent SN14 and ZYD00006 were submitted to the SOPMA website for protein secondary structure prediction.
  • The protein secondary structure prediction for this gene in parent SN14 shows 36.63% α-helix, 14.13% extended strand, 5.21% β-turn and 44.03% random coil; in parent ZYD00006, the predicted secondary structure contains 36.35% α-helix, 14.13% extended strand, 4.12% β-turn and 45.40% random coil.
  • A change of only one amino acid is predicted to reduce the proportions of α-helix and β-turn in ZYD00006, resulting in more random coil.
  • the amino acid sequences of the genes in the parent SN14 and ZYD00006 were submitted to the SWISS-MODEL website for protein tertiary structure prediction (FIG. 6) , and models with a QMEAN Z score higher than -4.0 and covering amino acid mutation sites were screened.
  • The predicted tertiary structure of the protein differs between SN14 and ZYD00006.
  • RNA was extracted from the organs (roots, stems, leaves, flowers, pods and seeds) of SN14 and reverse-transcribed into cDNA for qRT-PCR analysis. Glyma.06G303700 was expressed in all tissues and organs examined: expression was very low in roots, relatively high in stems, leaves, and flowers, higher in pods, and highest in seeds, where the relative expression exceeded 5-fold.
  • RNA was extracted, cDNA was synthesized (Table 16), and qRT-PCR was performed using the following specific primers.
  • The reference gene was GmActin4 (GenBank accession AF049106, Table 14).
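The text does not state how relative expression was calculated. A common choice for qRT-PCR with a single reference gene such as GmActin4 is the 2^-ΔΔCt method, sketched below under the assumption of roughly 100% amplification efficiency for both genes; the Ct values and the choice of calibrator tissue are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_calibrator: float, ct_reference_calibrator: float) -> float:
    """Fold change by the 2^-ΔΔCt method.

    ΔCt = Ct(target) - Ct(reference), computed for the sample and the calibrator;
    ΔΔCt = ΔCt(sample) - ΔCt(calibrator).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


if __name__ == "__main__":
    # Hypothetical: seed sample vs. a root calibrator, GmActin4 as the reference gene.
    fold = relative_expression(ct_target=24.0, ct_reference=18.0,
                               ct_target_calibrator=27.0, ct_reference_calibrator=18.0)
    print(fold)  # 8.0
```

If primer efficiencies differ markedly from 100%, an efficiency-corrected model would be more appropriate than this simple form.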
  • Tobacco planting soil was prepared by mixing flower nutrient soil with vermiculite at a ratio of 3:1. After germination, the seedlings were transferred to new small flowerpots, one plant per pot, cultivated in an incubator (22°C, 16 h/8 h light/dark, 70 μmol·m⁻²·s⁻¹), and watered once every 2 days to ensure adequate water.
  • The Agrobacterium pellet was then resuspended in 1 mL of resuspension buffer, and 2 μL of acetosyringone (dissolved in DMSO) was added to a final concentration of 0.04 g/L.
  • The bacterial suspension was then transferred to a large EP tube, adjusted to an OD600 of about 0.2 with resuspension buffer, and left at room temperature for 1-3 h. Healthy tobacco plants grown for 3 weeks were selected.
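Bringing a dense suspension to the target OD600 of about 0.2 is a simple dilution. The sketch below assumes OD600 is proportional to cell density over this range, so that C1V1 = C2V2 applies, and uses hypothetical starting values; in practice the reading itself should fall within the spectrophotometer's linear range.

```python
def buffer_to_add_ml(current_od: float, current_volume_ml: float, target_od: float) -> float:
    """Volume of resuspension buffer to add so the suspension reaches target_od."""
    if target_od <= 0 or current_od < target_od:
        raise ValueError("target OD must be positive and not above the current OD")
    final_volume_ml = current_volume_ml * current_od / target_od
    return final_volume_ml - current_volume_ml


if __name__ == "__main__":
    # e.g. 1 mL of resuspended cells reading OD600 = 0.8, diluted to OD600 = 0.2
    print(round(buffer_to_add_ml(0.8, 1.0, 0.2), 3))  # 3.0
```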
  • FIG. 8 shows the results of subcellular localization of Glyma.06G303700 in an exemplary assay. The green fluorescence of pr35S-Glyma.06G303700-GFP appears in both the cell membrane and the nucleus, indicating that the protein localizes to the membrane and the nucleus.
  • Total RNA was extracted from soybean SN14 leaves by the TRIzol method. On 2% agarose gel electrophoresis, three bands (28S, 18S and 5S) were observed, indicating that the RNA integrity was good.
  • The cDNA obtained by reverse transcription was used for cloning of the Glyma.06G303700 gene.
  • Glyma.06G303700 clone The CDS sequence of Glyma.06G303700, with a total length of 2190 bp, was obtained from the Phytozome database. This sequence was used as a template to design primers at both ends of the CDS (with the stop codon removed). The primers were designed to carry the restriction sites (SpeI and BamHI) that flank the ccdB gene in the entry vector. First, SN14 leaf cDNA was used as a template to clone the CDS of Glyma.06G303700 with CDS primers, and this product was then used as a template for PCR with the restriction-site primers to obtain Glyma.06G303700 with SpeI and BamHI restriction sites on both ends.
  • the gene products with restriction sites were recovered through the gel recovery kit for subsequent experiments.
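The restriction-site primer design described above can be sketched as follows. The SpeI (A^CTAGT) and BamHI (G^GATCC) recognition sequences are standard; which site goes on which primer, the two protective bases added 5' of each site, and the annealing length are assumptions for illustration, not the primers actually used (those are not reproduced in this text). Because both sites are palindromic, checking one strand of the insert for internal sites is sufficient.

```python
SPEI_SITE = "ACTAGT"
BAMHI_SITE = "GGATCC"
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def design_cloning_primers(cds: str, anneal_len: int = 20, pad: str = "CG") -> tuple[str, str]:
    """Return hypothetical (forward, reverse) primers adding SpeI / BamHI sites to a CDS.

    The stop codon is dropped so the insert can be fused in frame with a
    C-terminal tag (assumption; the source only says it was removed).
    """
    cds = cds.upper()
    if cds[-3:] in STOP_CODONS:
        cds = cds[:-3]
    # An internal site would cut the insert during restriction digestion.
    for name, site in (("SpeI", SPEI_SITE), ("BamHI", BAMHI_SITE)):
        if site in cds:
            raise ValueError(f"insert contains an internal {name} site")
    forward = pad + SPEI_SITE + cds[:anneal_len]
    reverse = pad + BAMHI_SITE + reverse_complement(cds[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    toy_cds = "ATGAAACCCGGGTTTTGA"  # hypothetical 18 nt CDS ending in TGA
    print(design_cloning_primers(toy_cds, anneal_len=6))
    # ('CGACTAGTATGAAA', 'CGGGATCCAAACCC')
```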
  • The full-length CDS of Glyma.06G303700 (with the stop codon TGA removed) was cloned using cDNA from soybean Suinong 14 leaves as a template.
  • the CDS sequence was amplified using the following primers.
  • The Glyma.06G303700 fragments were gel-purified and cloned into the entry vector Fu28 by restriction digestion and ligation.
  • The Fu28 vector fragment with the ccdB gene excised was about 3200 bp.
  • The Glyma.06G303700 fragment was about 2200 bp.
  • the ligation products were transformed into Escherichia coli.
  • Bacterial clones containing the Glyma.06G303700 cDNA sequence were identified by PCR and verified by sequencing using the primers described below.
  • Using Gateway technology, the Glyma.06G303700 gene fragment and the associated tags in the Fu28 entry vector were transferred into the expression vector pr35S (spectinomycin resistant) by an LR recombination reaction; the reaction system and conditions are shown in Table 18. EHA105 Agrobacterium competent cells were then transformed with pr35S-Glyma.06G303700, the transformed cells were grown on YEP plates containing rifampicin and spectinomycin, and single colonies were selected. PCR amplification of a 2190 bp DNA fragment confirmed that the expression vector (pr35S-Glyma.06G303700) had been transferred into A. tumefaciens EHA105.
  • The resulting plasmid pr35S-Glyma.06G303700 was transformed into Escherichia coli, and positive clones were identified by PCR and sequencing. The pr35S-Glyma.06G303700 plasmid was then extracted from a positive single-colony culture and transformed into Agrobacterium tumefaciens EHA105, and positive clones were identified by PCR.
  • The Arabidopsis gene AT1G05230, homologous to Glyma.06G303700, was selected and its conserved domains were identified.
  • The Arabidopsis homologous gene AT1G05230 contains three conserved domains: START_ArGLABRA2_like, Homeobox, and MrC superfamily. Like Glyma.06G303700, AT1G05230 has the START_ArGLABRA2_like domain, as shown in Table 19.
  • SALK_127828.47.00.x (SEQ ID NO: 60) was used as the Arabidopsis mutant for Glyma.06G303700.
  • SALK_127828.47.00.x is an Arabidopsis mutant in the Col-0 background carrying a 186 bp insertion in the coding region, generated by T-DNA insertion mutagenesis.
  • T0 seeds from the transformed Arabidopsis plants were planted. After two leaves had grown, Basta reagent (a reagent comprising Basta herbicide) was sprayed once every other day. After three sprays, most of the Arabidopsis plants had turned yellow and stopped growing, and only a few continued to grow. These were preliminarily identified as transgenic Arabidopsis complementation and overexpression plants.
  • Transgenic Arabidopsis T1, T2, T3 generation plants were planted.
  • Leaf extract was prepared as described above.
  • A Bar test strip was inserted into the extract in the direction specified in the manufacturer's instructions. The results displayed on the Bar test strip indicated that the Arabidopsis plants from which the leaf extract was obtained were genetically modified.
  • Transgenic Arabidopsis plants were planted in T1, T2, and T3 generations.
  • DNA was extracted from rosette leaves of Arabidopsis thaliana, and PCR was performed for the target gene Glyma.06G303700 and for the Bar gene.
  • The results showed bands at 516 bp (Bar gene) and 2190 bp (Glyma.06G303700 gene), indicating that transgenic Arabidopsis plants were obtained.
  • Transgenic Arabidopsis overexpression plants (pr35S:Glyma.06G303700), mutant complementation plants (pr35S:Glyma.06G303700/SALK_127828.47.00.x), Col-0, and the mutant SALK_127828.47.00.x were planted under the same conditions. Total RNA was extracted from the rosette leaves and reverse-transcribed into cDNA, and expression of the target gene Glyma.06G303700 was measured by qRT-PCR. The results showed that Glyma.06G303700 was expressed in both the mutant complementation plants and the overexpression plants, with higher expression in the overexpression plants than in the complementation plants.
  • Transgenic Arabidopsis thaliana (overexpression plants pr35S:Glyma.06G303700 and mutant complementation plants pr35S:Glyma.06G303700/SALK_127828.47.00.x), Col-0, and the mutant SALK_127828.47.00.x were planted under the same conditions.
  • Mature pods of T3 transgenic Arabidopsis thaliana plants were collected, and the seeds were obtained and dried.
  • The fatty acid composition of Arabidopsis thaliana seeds was determined by gas chromatography, and the total nitrogen content of the seeds was determined by the Kjeldahl method.
  • Mutant plants When the phenotypes of the mutant and wild-type materials were determined, the fatty acid content of the mutant plants was lower than that of the wild type, with oleic acid, linoleic acid and eicosenoic acid significantly lower than in the wild type. The total nitrogen content of the mutant plants was also significantly lower than that of wild-type plants.
  • When the phenotypes of T3 transgenic seeds were determined, the fatty acid content of seeds from the mutant complementation plants was significantly increased but still lower than that of the control plants, with a significant increase in linoleic acid; in the overexpression plants, the content of these components was higher than in the wild type, with palmitic acid extremely significantly increased and oleic acid and eicosenoic acid significantly increased.
  • T1 genetically modified soybeans were planted, and their leaves were crushed and tested with the Bar test strip as described above. Two horizontal lines appeared on the Bar test strip, indicating that the tested plants were genetically modified soybean plants.
  • DNA was extracted from the leaves of T1 transgenic soybean, and PCR was performed with Glyma.06G303700 primers and Bar primers. After 1.5% agarose gel electrophoresis, bands of 516 bp (Bar) and 2190 bp (Glyma.06G303700) were observed, indicating that the tested plants were transgenic soybean plants.
  • The transgenic soybean (overexpression plant 35S:Glyma.06G303700) and the control plant DN50 were planted under the same conditions. Total RNA was extracted from young leaves and reverse-transcribed into cDNA, and the expression level of Glyma.06G303700 was measured by qRT-PCR. The expression level of Glyma.06G303700 was higher in the overexpression plants than in the control plants, indicating that Glyma.06G303700 was successfully transformed into soybean (FIG. 11).
  • The transgenic soybean (overexpression plant 35S:Glyma.06G303700) and the control plant DN50 were planted under the same conditions, and their mature seeds were harvested. Some of the seeds were dried for phenotyping: the grain protein and oil contents were determined, and the fatty acid composition of the seeds was determined by gas chromatography. The protein, oil, and fatty acid contents of the overexpression plants were significantly higher than those of the control plants, indicating that Glyma.06G303700 promoted the quality traits (protein and oil content) (FIG. 12).
  • Haplotype analysis was performed on a resource population of 680 soybean accessions from Northeast China.
  • The protein and oil content of the Northeast China soybean resource population were phenotyped, and the analysis showed that the population varied in both protein and oil content.
  • In 2019, the highest protein content in this northeast resource population was 52.94%, the lowest 37.09%, and the average 42.69%; the highest oil content was 23.04%, the lowest 14.45%, and the average 20.74%.
  • This distribution is consistent with the continuous variation expected of phenotypic traits, so the population can be used for haplotype analysis of candidate genes.
  • the research team conducted a whole-genome resequencing analysis of the soybean resource population in the northeast region. This experiment used the data to perform gene haplotype analysis.
  • Block 1 contains 5 SNPs, 1 of which is located in the CDS; Block 2 contains 12 SNPs, 2 of which are located in the CDS; Block 3 contains two SNPs, neither of which is in the CDS.
  • Haplotypes present in more than 5.0% of the population (more than 34 of the Northeast resource accessions) are called excellent haplotypes. Block 1 has 3 excellent haplotypes, Hap_1, Hap_2 and Hap_3, which account for 71.07%, 12.60% and 7.31% of all haplotypes, respectively. The multiple comparison function of SPSS software was used to analyze the protein and oil phenotypes of the resource materials in each excellent haplotype group.
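The 5.0% frequency cut-off for excellent haplotypes can be expressed as a simple filter. The haplotype names and counts below are hypothetical (they are not the data behind Tables 21-23), and the comparison is done in integer arithmetic so that a haplotype at exactly 5.0% (34 of 680 accessions) is excluded, matching the "more than 34" wording.

```python
def excellent_haplotypes(counts: dict[str, int], min_percent: int = 5) -> list[tuple[str, float]]:
    """Haplotypes whose frequency exceeds min_percent, most frequent first."""
    total = sum(counts.values())
    # n / total > min_percent / 100, rearranged to avoid floating-point edge cases.
    kept = [(name, n) for name, n in counts.items() if n * 100 > min_percent * total]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [(name, round(100.0 * n / total, 2)) for name, n in kept]


if __name__ == "__main__":
    counts = {"Hap_A": 483, "Hap_B": 86, "Hap_C": 50, "Hap_D": 34, "Hap_E": 27}  # 680 total
    for name, pct in excellent_haplotypes(counts):
        print(name, pct)
```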
  • Hap_1 and Hap_2 showed extremely significant difference in protein content (P ⁇ 0.01) and significant difference in oil content (P ⁇ 0.05) ; Hap_1 and Hap_3 showed extremely significant differences in protein content (P ⁇ 0.01) and oil content (P ⁇ 0.01) ; Hap_2 and Hap_3 showed no significant difference in protein content, but showed extremely significant difference in oil content (P ⁇ 0.01) .
  • Hap_1 showed a low-protein phenotype.
  • Hap_2 and Hap_3 showed a high-protein phenotype.
  • In terms of oil content, Hap_1 and Hap_2 showed a high-oil phenotype.
  • Hap_3 showed a low-oil phenotype.
  • The C/T base variation in the exon region occurred at position 1890 bp of the gene.
  • The SNP variation of Hap_2 differed from the reference genome. The protein content of Hap_2 differed extremely significantly from that of Hap_1 (P < 0.01); its oil content differed significantly from that of Hap_1 (P < 0.05) and extremely significantly from that of Hap_3 (P < 0.01). See FIG. 13 and Table 21.
  • There were two excellent haplotypes in Block 2, Hap_4 and Hap_5, which account for 58.01% and 20.68% of the haplotypes, respectively.
  • Multiple comparison analysis of the protein and oil phenotypes of the resource materials in each haplotype group, using SPSS software, showed no significant difference in protein content between Hap_4 and Hap_5 but a significant difference in oil content (P < 0.05).
  • Hap_4 showed a low oil phenotype, while Hap_5 showed a high oil phenotype.
  • Hap_5 carried SNP variations that differ from the reference genome, and its oil phenotype differed significantly from that of Hap_4 (P < 0.05). See FIG. 14 and Table 22.
  • There are two excellent haplotypes in Block 3, Hap_6 and Hap_7, which account for 64.23% and 35.93% of the haplotypes, respectively.
  • Hap_6 and Hap_7 differed significantly in protein content (P < 0.05) and extremely significantly in oil content (P < 0.01).
  • The protein content of Hap_6 was higher than that of Hap_7; Hap_6 showed a high protein phenotype, while Hap_7 showed a low protein phenotype.
  • The oil content of Hap_6 was lower than that of Hap_7; Hap_6 showed a low oil phenotype, while Hap_7 showed a high oil phenotype. See FIG. 15 and Table 23.
  • Table 25: Haplotypes associated with increased protein content and oil content
  • Glyma.03G040200 has an OPT domain, is expressed at low levels in seeds, and shows no difference in amino acid sequence between the parents; Glyma.03G036300 has a domain with a function related to DNA repair, and its expression is absent in various tissues.
  • Glyma.07G192400 has no recognizable domains and is highly expressed in seeds.
  • Glyma.06G303700 has structural domains with a function related to lipid transfer and is highly expressed in seeds.
  • Glyma.06G297500 has no recognizable domains and is expressed at low levels in seeds.
  • Glyma.06G303700 has the START_ArGLABRA2_like domain, which has a function related to lipid transfer. Tissue-specific expression results indicate that the gene is expressed at high levels in seeds, which may be related to soybean quality and to the regulation of synthesis and metabolism of grain storage substances.
  • Glyma.06G303700 is expressed in all tissues and organs, with the highest expression level in seeds.
  • The expression pattern of Glyma.06G303700 during soybean seed development is partly similar to those of published soybean seed protein- and oil-related genes (GmWRI1a, GmWRI1b, GmLEC1a, GmLEC1b, GmFUSa, GmABI3, GmABI5, GmDREBL), showing a low-high-low trend.
  • Block 1 has three excellent haplotypes: Hap_1 is a low protein and high oil phenotype, Hap_2 is a high protein and high oil phenotype, and Hap_3 is a high protein and low oil phenotype.
  • Block 2 has two excellent haplotypes: Hap_4 is a low oil phenotype and Hap_5 is a high oil phenotype.
  • Block 3 has two excellent haplotypes: Hap_6 is a high protein and low oil phenotype and Hap_7 is a low protein and high oil phenotype.
  • a single component may be replaced by multiple components, and multiple components may be replaced by a single component, to provide an element or structure or to perform a given function or functions. Except where such substitution would not be operative to practice certain embodiments of the disclosure, such substitution is considered within the scope of the disclosure.
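The frequency rule used above to define excellent haplotypes (carried by more than 5.0% of the 680 accessions, i.e., by more than 34 accessions) can be expressed as a short filter. The following is a minimal Python sketch, assuming per-haplotype carrier counts are already available; the function name `excellent_haplotypes` and the counts in the usage example are hypothetical and do not reproduce the study data, and the SPSS multiple-comparison step is not reproduced.

```python
from typing import Dict, List, Tuple


def excellent_haplotypes(
    counts: Dict[str, int],
    population_size: int = 680,
    min_percent: int = 5,
) -> List[Tuple[str, float]]:
    """Return haplotypes carried by more than `min_percent` % of the population.

    Hypothetical helper: `counts` maps a haplotype label to the number of
    accessions carrying it. Results are sorted by decreasing frequency and
    reported as (label, frequency in percent rounded to 2 decimals).
    """
    selected = []
    for label, n_carriers in counts.items():
        # Integer comparison avoids floating-point error at the cut-off;
        # for 680 accessions and 5%, this keeps haplotypes with n > 34.
        if n_carriers * 100 > population_size * min_percent:
            selected.append((label, round(100 * n_carriers / population_size, 2)))
    return sorted(selected, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts for one haplotype block (not data from this study).
    block_counts = {"Hap_A": 480, "Hap_B": 86, "Hap_C": 50, "Hap_D": 34, "Hap_E": 30}
    print(excellent_haplotypes(block_counts))
    # [('Hap_A', 70.59), ('Hap_B', 12.65), ('Hap_C', 7.35)]
```

In this hypothetical example, Hap_D is excluded because it is carried by exactly 34 accessions, which is not more than 5.0% of the population.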

Abstract

Compositions and methods for increasing the protein content, increasing oil content, and/or modifying the oil profile of a soybean plant are provided. Compositions include isolated and recombinant polynucleotides encoding polypeptides, expression cassettes, host cells, plants, and plant parts stably incorporating these polynucleotides. Methods and kits are provided for producing these plants via transgenic means, breeding, or genome editing approaches, and for identifying plants having increased protein content, increased oil content, and/or a modified oil profile.

Description

METHODS AND COMPOSITIONS FOR INCREASING PROTEIN AND OIL CONTENT AND/OR MODIFYING OIL PROFILE IN A PLANT
FIELD
This disclosure relates to the field of plant biotechnology. In particular, it relates to methods and compositions for increasing plant protein and/or oil content and modifying oil profile.
BACKGROUND
Soybean is a valuable field crop. Soybean oil extracted from the seed is used in a number of retail products such as cooking oil, baked goods, margarines, and the like. Soybean grain is also used as a food source for both animals and humans. Soybean meal is a component of many foods and animal feeds. Typically, during processing of whole soybeans, the fibrous hull is removed and the oil is extracted, and the remaining soybean meal is a combination of approximately 50% carbohydrates and 50% protein. For human consumption, soybean meal is made into soybean flour that is processed into protein concentrates used for meat extenders or specialty pet foods. Production of edible protein ingredients from soybean offers a healthier and less expensive replacement for animal protein in meats as well as dairy-type products.
BRIEF SUMMARY
In one aspect, provided herein is an elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein, increased oil content, and/or modified oil profile on the elite Glycine max plant.
In another aspect, provided herein is a plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein, increased oil content, and/or modified oil profile relative to a control plant, and wherein the plant did not comprise said nucleic acid sequence before the genome editing.
In another aspect, provided herein is a plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having (a) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (b) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content and/or increased oil content and/or modified oil profile as compared to a control plant.
In yet another aspect, provided herein is a method of producing a soybean plant having increased protein, increased oil content, and/or modified oil profile, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, or a nucleic acid sequence encoding any one of SEQ ID NO: 22 or 24-59, wherein said nucleic acid sequence confers onto said donor soybean plant increased protein, increased oil content, and/or modified oil profile; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by isolating a nucleic acid from said progeny plant and detecting within said nucleic acid a molecular marker associated with said nucleic acid sequence, thereby producing a soybean plant having increased protein content, increased oil content, and/or modified oil profile.
In yet another aspect, provided herein is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (ii) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content, increases oil content, and/or modifies oil profile compared to a control plant not expressing said nucleic acid sequence.
In yet another aspect, provided herein is a polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein expression of the polypeptide in a plant confers increased protein, oil content, and/or modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in the plant confers increased protein, increased oil content, and/or modified oil profile on said plant; (c) a polypeptide having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity with, and having the same function as, the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein the polypeptide when expressed in a plant confers increased protein content, increased oil content, and/or modified oil profile on the plant; or (d) a fusion protein comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59 or the polypeptide as defined in any one of (a) to (c).
In yet another aspect, provided herein is a nucleic acid molecule comprising (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95%, or 100% sequence identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein content, increases oil content, and/or modifies oil profile in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 and a sequence encoding SEQ ID NO: 22 or 24-59; or (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a polynucleotide of SEQ ID NO: 22 or 24-59.
In yet another aspect, provided herein are primer pairs for amplifying the nucleic acid molecule as disclosed above.
BRIEF DESCRIPTION OF THE DRAWINGS
The present application includes the following figures. The figures are intended to illustrate certain embodiments and/or features of the compositions and methods, and to supplement any description (s) of the compositions and methods. The figures do not limit the scope of the compositions and methods, unless the written description expressly indicates that such is the case.
FIG. 1 shows the alignment diagram of the Glyma.06G303700 CDS sequence in Suinong 14 (SN14), ZYD00006 (ZYD), and Williams 82 (W82).
FIG. 2 shows a phylogenetic tree of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
FIG. 3 shows the comparison of amino acid sequences of Glyma.06G303700 and AT1G05230.
FIG. 4 shows the comparison of amino acid sequences of Glyma.03G036300 in Suinong 14 (SN14), ZYD00006 (ZYD), and Williams 82 (W82).
FIG. 5 shows the comparison of amino acid sequences of Glyma.07G192400 in Suinong 14 (SN14), ZYD00006 (ZYD), and Williams 82 (W82).
FIGS. 6A-B show the predicted tertiary protein structures of Glyma.START (Glyma.06G303700) derived from soy strains Suinong 14 (SN14) and ZYD00006 (ZYD), respectively, according to certain aspects of this disclosure.
FIG. 7 shows tissue-specific expression of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
FIG. 8 shows the subcellular localization of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
FIG. 9 shows results of using qRT-PCR to identify transgenic Arabidopsis expressing Glyma.START (Glyma.06G303700) under the control of a 35S promoter according to certain aspects of this disclosure.
FIGS. 10A-B show results of analyzing seed fatty acid content/profile and protein content, respectively, in an Arabidopsis mutant and in transgenic Arabidopsis expressing Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
FIG. 11 shows results of using qRT-PCR to identify transgenic soybean expressing Glyma.START (Glyma.06G303700) under the control of a 35S promoter according to certain aspects of this disclosure.
FIGS. 12A-C show seed protein content, fatty acid content, and fatty acid profile, respectively, in transgenic soybean expressing Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
FIGS. 13A-13B show the protein content distribution and oil content distribution, respectively, of the phenotypes of excellent haplotypes in Block 1 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
FIGS. 14A-14B show the protein content distribution and oil content distribution, respectively, of the phenotypes of excellent haplotypes in Block 2 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
FIGS. 15A-15B show the protein content distribution and oil content distribution, respectively, of the phenotypes of excellent haplotypes in Block 3 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
FIG. 16 shows the map of the Fu28 entry vector according to certain aspects of this disclosure.
FIG. 17 shows the map of the pr35S expression vector according to certain aspects of this disclosure.
DETAILED DESCRIPTION
All technical and scientific terms used herein, unless otherwise defined below, are intended to have the same meaning as commonly understood by one of ordinary skill in the art. References to techniques employed herein are intended to refer to the techniques as commonly understood in the art, including variations on those techniques and/or substitutions of equivalent techniques that would be apparent to one of skill in the art.
Provided herein are plants expressing polypeptides that increase protein content, increase oil content, and/or modify oil profile when expressed in a plant or part thereof. In some instances, the polypeptides result in a modified oil profile when expressed in a plant or part thereof as compared to a control plant that does not express the polypeptides. The terms “oil content” and “fatty acid content” are used interchangeably herein. The terms “fatty acid profile” and “oil profile” are used interchangeably herein. The polypeptides include SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, and variants thereof. Various means of introducing nucleic acid sequences into the soybean plant are also disclosed, including transgenic means, gene editing, and breeding. Markers for identifying the presence of these nucleic acid sequences in the plant are also disclosed. As used herein, the terms “phenotype,” “phenotypic trait,” or “trait” refer to a distinguishable characteristic(s) of a genetically controlled trait.
In some embodiments, the plants provided herein are a non-naturally occurring variety of soybean having the desired trait. In specific embodiments, the non-naturally occurring variety of soybean is an elite soybean variety. A “non-naturally occurring variety of soybean” is any variety of soybean that does not occur in nature. A “non-naturally occurring variety of soybean” may be produced by any method known in the art, including, but not limited to, transforming a soybean plant or germplasm, transfecting a soybean plant or germplasm, and crossing a naturally occurring variety of soybean with a non-naturally occurring variety of soybean. In some embodiments, a “non-naturally occurring variety of soybean” may comprise one or more heterologous nucleotide sequences. In some embodiments, a “non-naturally occurring variety of soybean” may comprise one or more non-naturally occurring copies of a naturally occurring nucleotide sequence (i.e., extraneous copies of a gene that naturally occurs in soybean). In some embodiments, a “non-naturally occurring variety of soybean” may comprise a non-natural combination of two or more naturally occurring nucleotide sequences (i.e., two or more naturally occurring genes that do not naturally occur in the same soybean, for instance genes not found in Glycine max lines).
Methods and compositions are provided that modulate the level of oil, protein, and/or fatty acids in a plant, a plant part, or a seed. In specific embodiments, various methods and compositions are provided that produce an increase in protein content in the plant, plant part, or seed. An increase in protein content includes any statistically significant increase in the protein content in the plant, plant part, or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in protein content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying protein content levels are known. For example, mature seeds can be harvested, and grain protein content can be determined by FOSS near-infrared (NIR) analysis (see examples), by assaying nitrogen content with an automatic Kjeldahl apparatus, or with an elemental analyzer.
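Kjeldahl analysis reports nitrogen, which is converted to crude protein with a nitrogen-to-protein conversion factor. The Python sketch below assumes the conventional general factor of 6.25 (a soybean-specific factor of 5.71 is also used) and reports both a relative increase and a percentage-point difference, because the percentages listed above could be read either way. The function names and the nitrogen values are hypothetical.

```python
def protein_from_nitrogen(nitrogen_percent: float, factor: float = 6.25) -> float:
    """Convert Kjeldahl nitrogen (% of sample) to crude protein (% of sample).

    6.25 is the conventional general factor; pass factor=5.71 to use a
    soybean-specific conversion factor instead.
    """
    if nitrogen_percent < 0:
        raise ValueError("nitrogen content cannot be negative")
    return nitrogen_percent * factor


def relative_increase(subject: float, control: float) -> float:
    """Increase of the subject over the control, as a percent of the control."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (subject - control) / control


def point_difference(subject: float, control: float) -> float:
    """Difference in percentage points between two content values."""
    return subject - control


if __name__ == "__main__":
    # Hypothetical nitrogen values for a control and a subject seed sample.
    control_protein = protein_from_nitrogen(6.4)  # 40.0 % protein
    subject_protein = protein_from_nitrogen(6.8)  # 42.5 % protein
    print(f"relative increase: {relative_increase(subject_protein, control_protein):.2f}%")
    print(f"difference: {point_difference(subject_protein, control_protein):.2f} percentage points")
```

With these hypothetical values, the same change reads as a 6.25% relative increase or a 2.50 percentage-point increase, so the two conventions can differ substantially.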
In other embodiments, various methods and compositions are provided that produce an increase in oil content (e.g., an increase in fatty acid content) in the plant, plant part, or seed. An increase in oil content includes any statistically significant increase in the oil content in the plant, plant part, or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in oil content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying oil content levels are known. For example, mature seeds can be harvested, and grain oil content can be determined by NIR or wet chemistry analysis (see examples).
In other embodiments, various methods and compositions are provided that produce a modified oil profile in the plant, plant part, or seed. A modified oil profile includes a change in the ratio of fatty acid constituents in the oil produced by the plant, plant part, or seed, relative to a control plant, without a change (e.g., without an increase or a decrease) in the oil content or oil level of the plant, plant part, or seed. In embodiments, the modified oil profile comprises a modified fatty acid profile, wherein the modified fatty acid profile includes an increase in linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid in the oil relative to other fatty acid constituents of the oil. In example embodiments, the modified oil profile includes an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher of linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid in the oil without a corresponding increase in oil content. In embodiments, a modified fatty acid profile results in a modified oil profile. In embodiments, a modified oil profile comprises a modified fatty acid profile.
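Because a modified oil profile concerns the relative proportions of fatty acids rather than total oil, a gas chromatography result is commonly summarized by area normalization, in which each fatty acid peak is expressed as a percentage of the total fatty acid peak area. The Python sketch below assumes simple area-percent normalization without detector response factors; the function names and the peak areas are hypothetical.

```python
from typing import Dict


def fatty_acid_profile(peak_areas: Dict[str, float]) -> Dict[str, float]:
    """Express each fatty acid peak as a percent of the total peak area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def profile_shift(subject: Dict[str, float], control: Dict[str, float]) -> Dict[str, float]:
    """Percentage-point change in each fatty acid's share (subject minus control)."""
    subj = fatty_acid_profile(subject)
    ctrl = fatty_acid_profile(control)
    names = sorted(set(subj) | set(ctrl))
    return {name: subj.get(name, 0.0) - ctrl.get(name, 0.0) for name in names}


if __name__ == "__main__":
    # Hypothetical GC peak areas (arbitrary units) for two seed samples.
    control = {"palmitic": 11.0, "stearic": 4.0, "oleic": 22.0, "linoleic": 54.0, "linolenic": 9.0}
    subject = {"palmitic": 11.0, "stearic": 4.0, "oleic": 25.0, "linoleic": 52.0, "linolenic": 8.0}
    for name, delta in profile_shift(subject, control).items():
        print(f"{name}: {delta:+.1f} percentage points")
```

Area normalization by itself says nothing about total oil content, which has to be measured separately (for example by NIR) to show that the profile changed without a change in oil content.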
In other embodiments, various methods and compositions are provided that produce an increase in fatty acid content in the plant, plant part, or seed. An increase in fatty acid content includes any statistically significant increase in the fatty acid content in the plant, plant part, or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in fatty acid content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying fatty acid content levels are known. For example, mature seeds can be harvested, and grain fatty acid content can be determined by gas chromatography (see examples). In specific embodiments, the methods and compositions provide for an increase in linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid (or any combination thereof) when compared to an appropriate control plant. Such increases include, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher.
In other embodiments, an increase in linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid (or any combination thereof) includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30% or higher of linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid.
A "subject plant or plant cell" is one in which a genetic alteration, such as transformation, has been effected as to a polynucleotide of interest, or is a plant or plant cell that is descended from a plant or cell so altered and that comprises the alteration. A "control" or "control plant" or "control plant cell" provides a reference point for measuring changes in phenotype of the subject plant or plant cell. A control plant or plant cell may comprise, for example: (a) a wild-type plant or cell, i.e., of the same genotype as the starting material for the genetic alteration that resulted in the subject plant or cell; (b) a plant or plant cell of the same genotype as the starting material but which has been transformed with a null construct (i.e., with a construct that has no known effect on the trait of interest, such as a construct comprising a marker gene); (c) a plant or plant cell that is a non-transformed segregant among progeny of a subject plant or plant cell; (d) a plant or plant cell genetically identical to the subject plant or plant cell but which is not exposed to conditions or stimuli that would induce expression of the gene of interest; or (e) the subject plant or plant cell itself, under conditions in which the gene of interest is not expressed.
I. Polynucleotides and polypeptides that confer increased protein and/or oil content and/or modify oil profile
Compositions and methods for conferring increased protein content, increased oil content, and/or modified oil profile are provided. Polypeptides, polynucleotides, and fragments and variants thereof that confer increased protein content, increased oil content, and/or modified oil profile are provided. In some embodiments, the polypeptide is SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant of any one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. In some embodiments, the polynucleotide is any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NO: 22 or 24-59, or a fragment or variant of any one thereof. As used herein, the term “gene” refers to a hereditary unit including a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a particular characteristic or trait in an organism. In various embodiments, the genome of the soybean cultivar Williams 82 (www.ncbi.nlm.nih.gov/assembly/GCF_000004515.6/?&utm_source=gene) is used as the reference soybean genome. Williams 82 was derived from backcrossing a Phytophthora root rot resistance locus from the donor parent Kingwa into the recurrent parent Williams. See Schmutz et al., Nature 2010 Jan 14;463(7278):178-83. doi: 10.1038/nature08670.
Glyma.06G303700
Glyma.06G303700 (SEQ ID NO: 1-5) is expressed in all tissues and organs, with the highest expression level in seeds. Glyma.START (Glyma.06G303700) comprises several conserved domains: a START_ArGLABRA2_like domain (aa 241-465 of SEQ ID NO: 3 & 5); a START domain (aa 246-466 of SEQ ID NO: 3 & 5); a homeobox domain (aa 57-110 of SEQ ID NO: 3 & 5); a homeodomain (aa 55-113 of SEQ ID NO: 3 & 5); a COG5576 superfamily domain (aa 13-129 of SEQ ID NO: 3 & 5); and a MreC superfamily domain (aa 120-193 of SEQ ID NO: 3 & 5). The START_ArGLABRA2_like domain is the C-terminal lipid-binding START domain of the Arabidopsis homeobox protein GLABRA 2. The START_ArGLABRA2_like subfamily includes the Arabidopsis homeobox protein GLABRA 2 and other proteins related to steroid production. The homeobox domain encodes a 61-amino acid sequence that can bind specific DNA sequences and control gene expression at the transcriptional level. The COG5576 superfamily domain is a homeodomain-containing transcriptional regulation domain. The MreC superfamily domain is usually involved in the formation and maintenance of cell shape and can position cell wall synthetic complexes.
The genomic sequence of Glyma.06G303700 is 8466 bp in length, and the CDS sequence is 2190 bp in length. The exon region of Glyma.06G303700 (SEQ ID NO: 3) in soy variety SN14 is identical to the corresponding gene in soy variety Williams82 (W82). Wild soybean (G. soja) variety ZYD00006 (ZYD) carries four mutations in Glyma.06G303700 relative to Williams82 (FIG. 1): C1162T (i.e., change from C to T at the 1162 bp position), A1370G (i.e., change from A to G at the 1370 bp position), C2063G (i.e., change from C to G at the 2063 bp position), and C2098G (i.e., change from G to A at the 2098 bp position). The last three base mutations do not change the encoded amino acids, but the first, C1162T, results in an alanine-to-valine substitution at position 388, i.e., A388V.
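Mutation names such as C1162T and A388V come from mapping a 1-based CDS position onto its codon; for example, CDS position 1162 falls in codon (1162 - 1) // 3 + 1 = 388. The Python sketch below assumes a CDS that starts at the first base of the start codon and uses the standard genetic code, and it reports the change in the same reference-position-alternative style (with * for a stop codon). The function names and the toy sequence in the usage example are hypothetical.

```python
from typing import Dict, Tuple

# Standard genetic code, enumerated in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def codon_position(cds_pos: int) -> Tuple[int, int]:
    """Map a 1-based CDS position to (codon number, position within codon), both 1-based."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def describe_snv(cds: str, cds_pos: int, ref: str, alt: str) -> str:
    """Return the amino acid change caused by a single-nucleotide variant, e.g. 'A2V'."""
    cds = cds.upper()
    codon_no, offset = codon_position(cds_pos)
    start = 3 * (codon_no - 1)
    if start + 3 > len(cds):
        raise ValueError("position lies outside a complete codon")
    if cds[cds_pos - 1] != ref.upper():
        raise ValueError(f"reference base at {cds_pos} is {cds[cds_pos - 1]}, not {ref}")
    ref_codon = cds[start:start + 3]
    # Replace only the variant base inside the affected codon.
    alt_codon = ref_codon[:offset - 1] + alt.upper() + ref_codon[offset:]
    return f"{CODON_TABLE[ref_codon]}{codon_no}{CODON_TABLE[alt_codon]}"


if __name__ == "__main__":
    toy_cds = "ATGGCTTGGTAA"  # hypothetical CDS: Met-Ala-Trp-stop
    print(describe_snv(toy_cds, 5, "C", "T"))  # A2V (missense)
    print(describe_snv(toy_cds, 8, "G", "A"))  # W3* (nonsense)
    print(describe_snv(toy_cds, 6, "T", "C"))  # A2A (synonymous)
```

A synonymous change is reported with the same amino acid on both sides, which matches how the last three ZYD00006 mutations are described above as not changing the encoded amino acids.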
The phylogenetic tree of Glyma.06G303700 was constructed with MEGA5 software using homologous sequences from soybean, Arabidopsis, rice, corn, and other plants. See FIG. 2. Glyma.06G303700 shows high homology with Glyma.15G220200, Glyma.12G100100, and AT1G05230. Glyma.12G100100 contains the same conserved domains as Glyma.06G303700. AT1G05230 contains START_ArGLABRA2_like and homeobox domains, which are also present in Glyma.06G303700. AT1G05230 and Glyma.START (Glyma.06G303700) share 78.9% amino acid sequence identity. See FIG. 3.
Glyma.03G040200
Glyma.03G040200 (SEQ ID NO: 10-12) has an OPT domain (aa 4-73 of SEQ ID NO: 12), which is related to transmembrane transport. Glyma.03G040200 is expressed at low levels in seeds. The genomic sequence of Glyma.03G040200 (SEQ ID NO: 10) is 463 bp in length, and the CDS sequence (SEQ ID NO: 11) is 237 bp in length. Compared with soy variety Williams82 (SEQ ID NO: 12), both soy variety SN14 and ZYD00006 carry five coding region mutations, each of which changes the encoded amino acid sequence: A2V, R13S, T24I, W48* (tryptophan to a stop codon), and G70S.
Glyma.03G036300
Glyma.03G036300 (SEQ ID NO: 6-9) is a PIF1 helicase involved in a number of cellular processes, including DNA repair, DNA strand breaking, recombination, nucleotide binding, ATP binding, telomere maintenance, and cellular response to DNA damage stimuli. The protein possesses helicase activity and hydrolase activity. Glyma.03G036300 comprises a PIF1 domain (aa 2-211 of SEQ ID NO: 8), an SF1_C_RecD domain (aa 258-303 of SEQ ID NO: 8), and a RecD domain (aa 250-294). The PIF1 domain is a conserved domain shared by the PIF1-like helicase family. The SF1_C_RecD domain is found in the C-terminal helicase domain of RecD family helicases. The RecD domain is found in ATP-dependent exoDNAses and related enzymes and acts as a 3'-5' helicase. The RecBCD enzyme can unwind or separate DNA strands and also forms single-stranded gaps in DNA.
The full-length genomic sequence of Glyma.03G036300 in W82 (SEQ ID NO: 6) is 988 bp, and the full-length CDS (SEQ ID NO: 7) is 987 bp. Glyma.03G036300 in ZYD is the same as that in W82. Translation of Glyma.03G036300 terminates at the 294th amino acid in SN14, whereas it proceeds normally in ZYD00006 (FIG. 4).
Glyma.07G192400
Glyma.07G192400 (SEQ ID NO: 16-19) is highly expressed in seeds and is involved in transmembrane transport. No conserved domain information is known for Glyma.07G192400. The genomic sequence of Glyma.07G192400 (SEQ ID NO: 16) is 4263 bp in length, and the CDS sequence (SEQ ID NO: 17) is 417 bp in length. See FIG. 5. Only one base mutation, a G-to-A change, occurred in ZYD00006. Translating the CDS into its amino acid sequence showed that this mutation changes the encoded amino acid at position 46 from V (valine) to I (isoleucine).
Glyma.06G297500
No conserved domain information is currently known for Glyma.06G297500 (SEQ ID NO: 13-15). The full-length genomic sequence of Glyma.06G297500 (SEQ ID NO: 13) is 463 bp, and the full-length CDS sequence (SEQ ID NO: 14) is 237 bp. The CDS and amino acid sequences are identical in all three soy varieties: SN14, ZYD00006, and Williams82.
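The CDS lengths given above can be converted into encoded protein lengths, which offers a quick consistency check on the residue positions cited for domains and substitutions. The Python sketch below assumes each reported CDS length includes the stop codon; the helper names are hypothetical.

```python
from typing import Dict, Iterable

# CDS lengths (bp) as reported in this section.
REPORTED_CDS_BP: Dict[str, int] = {
    "Glyma.06G303700": 2190,
    "Glyma.03G040200": 237,
    "Glyma.03G036300": 987,
    "Glyma.07G192400": 417,
    "Glyma.06G297500": 237,
}


def encoded_length(cds_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a CDS of the given length."""
    if cds_bp <= 0 or cds_bp % 3:
        raise ValueError(f"{cds_bp} bp is not a whole number of codons")
    codons = cds_bp // 3
    return codons - 1 if includes_stop else codons


def positions_fit(cds_bp: int, positions: Iterable[int]) -> bool:
    """True if every 1-based residue position lies within the encoded protein."""
    length = encoded_length(cds_bp)
    return all(1 <= p <= length for p in positions)


if __name__ == "__main__":
    for gene, bp in REPORTED_CDS_BP.items():
        print(f"{gene}: {bp} bp -> {encoded_length(bp)} aa")
    # Residue positions cited in the text (A388V and the START domain end; A2V ... G70S).
    print(positions_fit(REPORTED_CDS_BP["Glyma.06G303700"], [388, 466]))
    print(positions_fit(REPORTED_CDS_BP["Glyma.03G040200"], [2, 13, 24, 48, 70]))
```

Under the stop-codon assumption, the reported CDS lengths correspond to 729, 78, 328, 138, and 78 amino acids, respectively, and the cited residue positions fall within those lengths.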
The functional domains of the genes are further described in Table 1 below.
Table 1. Functional annotation of genes
[Table 1 image: PCTCN2022075977-appb-000001]
The term “corresponding to” in the context of nucleic acid sequences means that when certain nucleic acid sequences are aligned with each other, the nucleotides that “correspond to” certain enumerated positions in the present invention are those that align with these positions in a reference sequence, but that are not necessarily in these exact numerical positions relative to a particular nucleic acid sequence of the invention. Optimal alignment of sequences for comparison can be conducted by computerized implementations of known algorithms or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST) and the ClustalW/ClustalW2/Clustal Omega programs available on the Internet (e.g., the website of the EMBL-EBI). Other suitable programs include, but are not limited to, GAP, BestFit, Plot Similarity, and FASTA, which are part of the Accelrys GCG Package available from Accelrys, Inc. of San Diego, Calif., United States of America. See also Smith & Waterman, 1981; Needleman & Wunsch, 1970; Pearson & Lipman, 1988; Ausubel et al., 1988; and Sambrook & Russell, 2001.
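The notion of a "corresponding" position can be made concrete with a short sketch. Assuming a pairwise alignment has already been produced by any of the programs named above and is given as two equal-length rows with "-" marking gaps, the hypothetical helper below walks the columns and returns the query position aligned to a given reference position, or `None` when that reference residue faces a gap. The alignment rows are made-up examples.

```python
from typing import Optional


def corresponding_position(aligned_ref: str, aligned_query: str, ref_pos: int) -> Optional[int]:
    """Return the 1-based query position aligned to 1-based reference position ref_pos.

    Both arguments are rows of the same pairwise alignment ("-" = gap).
    Returns None if the reference residue is aligned to a gap in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("alignment rows must have equal length")
    ref_count = query_count = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char != "-":
            ref_count += 1
        if query_char != "-":
            query_count += 1
        if ref_char != "-" and ref_count == ref_pos:
            return query_count if query_char != "-" else None
    raise ValueError("ref_pos lies beyond the end of the reference")


# Hypothetical alignment: the query has an extra A after reference position 3
# and lacks the residue at reference position 6.
ref_row = "ATG-CGTA"
query_row = "ATGACG-A"
print(corresponding_position(ref_row, query_row, 4))  # 5
print(corresponding_position(ref_row, query_row, 6))  # None
```

As the example shows, reference position 4 corresponds to query position 5 even though the residues are identical, which is exactly why positions are defined by alignment rather than by numbering.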
In some embodiments, variants and fragments of the above-described polynucleotides and polypeptides increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part, or seed.
Fragments of the proteins that increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part, or seed include those that are shorter than the full-length sequences, either due to the use of an alternate downstream start site or due to processing that produces a shorter protein having the activity. A fragment of a protein that increases protein content, increases oil content, and/or modifies oil profile when expressed in a plant can be a polypeptide of, for example, 10, 25, 50, 100, 150, 200, 250 or more amino acids of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. Such biologically active portions can be prepared by recombinant techniques and evaluated for their ability to confer increased protein content, increased oil content, and/or modified oil profile. As used herein, a fragment comprises at least 8 contiguous amino acids of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
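The definition of a fragment as at least 8 contiguous amino acids of a reference sequence can be expressed directly in code. The minimal sketch below assumes exact, ungapped matching; the reference string is a hypothetical placeholder (the 20 standard amino acid letters), not one of the SEQ ID NOs, and both function names are illustrative.

```python
from collections.abc import Iterator


def is_fragment(candidate: str, reference: str, min_length: int = 8) -> bool:
    """True if candidate is a contiguous stretch of reference of at least min_length residues."""
    return len(candidate) >= min_length and candidate in reference


def contiguous_fragments(reference: str, length: int) -> Iterator[str]:
    """Yield every contiguous fragment of the given length, scanning left to right."""
    for start in range(len(reference) - length + 1):
        yield reference[start:start + length]


reference = "ACDEFGHIKLMNPQRSTVWY"  # hypothetical placeholder sequence
print(is_fragment("DEFGHIKL", reference))   # True: 8 contiguous residues
print(is_fragment("DEFGHIK", reference))    # False: only 7 residues
print(is_fragment("DEFGHIKM", reference))   # False: not contiguous in reference
print(len(list(contiguous_fragments(reference, 8))))  # 13 possible 8-mers
```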
Variants disclosed herein are polypeptides having an amino acid sequence that has at least 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% identity to the amino acid sequence of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. Such variants will increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part, or seed. In some embodiments, a variant polynucleotide comprises a deletion and/or addition of one or more nucleotides at one or more internal sites within the native polynucleotide and/or a substitution of one or more nucleotides at one or more sites in the native polynucleotide.
Unless otherwise stated, identity and similarity will be calculated by the Needleman-Wunsch global alignment and scoring algorithms (Needleman and Wunsch (1970) J. Mol. Biol. 48 (3): 443-453) as implemented by the "needle" program, distributed as part of the EMBOSS software package (Rice, P., Longden, I., and Bleasby, A., EMBOSS: The European Molecular Biology Open Software Suite, 2000, Trends in Genetics 16 (6): 276-277; version 6.3.1, available from EMBnet at embnet.org/resource/emboss and emboss.sourceforge.net, among other sources) using default gap penalties and scoring matrices (EBLOSUM62 for protein and EDNAFULL for DNA). Equivalent programs may also be used. By "equivalent program" is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by needle from EMBOSS version 6.3.1.
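For readers unfamiliar with global alignment, the following is a simplified Python sketch of the Needleman-Wunsch dynamic-programming idea. It is not a reimplementation of EMBOSS needle: it assumes a simple match/mismatch score and a linear gap penalty instead of EBLOSUM62/EDNAFULL with affine gap penalties, so its alignments and percentages will not in general match needle output. Percent identity is computed here as identical columns divided by alignment length including gap columns; other tools may use a different denominator.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[str, str, int]:
    """Globally align a and b; return (aligned_a, aligned_b, optimal score)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            aligned_a.append(a[i - 1]); aligned_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1]); aligned_b.append("-"); i -= 1
        else:
            aligned_a.append("-"); aligned_b.append(b[j - 1]); j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns divided by total alignment length, times 100."""
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * identical / len(aligned_a)


row_a, row_b, best = needleman_wunsch("ACGTACGT", "ACGACGT")
print(row_a, row_b, best)             # ACGTACGT ACG-ACGT 5
print(percent_identity(row_a, row_b))  # 87.5
```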
Additional mathematical algorithms are known in the art and can be utilized for the comparison of two sequences. See, for example, the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87: 2264, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877. Such an algorithm is incorporated into the BLAST programs of Altschul et al. (1990) J. Mol. Biol. 215: 403. BLAST nucleotide searches can be performed with the BLASTN program (nucleotide query searched against nucleotide sequences) to obtain nucleotide sequences homologous to nucleic acid molecules of the invention, or with the BLASTX program (translated nucleotide query searched against protein sequences) to obtain protein sequences homologous to nucleic acid molecules of the invention. BLAST protein searches can be performed with the BLASTP program (protein query searched against protein sequences) to obtain amino acid sequences homologous to protein molecules of the invention, or with the TBLASTN program (protein query searched against translated nucleotide sequences) to obtain nucleotide sequences homologous to protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25: 3389. Alternatively, PSI-Blast can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, and PSI-Blast programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) can be used. Alignment may also be performed manually by inspection.
Two sequences are "optimally aligned" when they are aligned for similarity scoring using a defined amino acid substitution matrix (e.g., BLOSUM62), gap existence penalty, and gap extension penalty so as to arrive at the highest score possible for that pair of sequences. Amino acid substitution matrices and their use in quantifying the similarity between two sequences are well known in the art and are described, e.g., in Dayhoff et al. (1978) "A model of evolutionary change in proteins," in "Atlas of Protein Sequence and Structure," Vol. 5, Suppl. 3 (ed. M.O. Dayhoff), pp. 345-352, Natl. Biomed. Res. Found., Washington, D.C., and in Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89: 10915-10919. The BLOSUM62 matrix is often used as the default substitution matrix in sequence alignment protocols. The gap existence penalty is imposed for the introduction of a single amino acid gap in one of the aligned sequences, and the gap extension penalty is imposed for each additional empty amino acid position inserted into an already opened gap. The alignment is defined by the amino acid positions of each sequence at which the alignment begins and ends, and optionally by the insertion of a gap or multiple gaps in one or both sequences, so as to arrive at the highest possible score. While optimal alignment and scoring can be accomplished manually, the process is facilitated by the use of a computer-implemented alignment algorithm, e.g., gapped BLAST 2.0, described in Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402 and made available to the public at the National Center for Biotechnology Information website (www.ncbi.nlm.nih.gov). Optimal alignments, including multiple alignments, can be prepared using, e.g., PSI-BLAST, available through www.ncbi.nlm.nih.gov and described by Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402.
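The scoring scheme described above, with a substitution score for each aligned pair, a gap existence penalty for the first position of a gap, and a gap extension penalty for each additional position in that gap, can be sketched as a function that scores an already-built alignment. The substitution function here is a toy stand-in (+5 for identity, -1 otherwise), not BLOSUM62, and the penalty values in the example are arbitrary assumptions. Note that some tools, such as BLAST, instead charge the extension penalty for every gap position, including the first.

```python
from collections.abc import Callable


def alignment_score(row_a: str, row_b: str,
                    substitution: Callable[[str, str], int],
                    gap_existence: int, gap_extension: int) -> int:
    """Score a pairwise alignment ("-" = gap) using the convention described above:
    a gap of length k costs gap_existence + (k - 1) * gap_extension."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    score = 0
    in_gap_a = in_gap_b = False  # are we currently inside a gap in row_a / row_b?
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain gaps in both rows")
        if x == "-":
            score -= gap_extension if in_gap_a else gap_existence
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            score -= gap_extension if in_gap_b else gap_existence
            in_gap_a, in_gap_b = False, True
        else:
            score += substitution(x, y)
            in_gap_a = in_gap_b = False
    return score


def toy_substitution(x: str, y: str) -> int:
    """Toy stand-in for a real substitution matrix such as BLOSUM62."""
    return 5 if x == y else -1


# One gap of length 3: 3 matches (+15) minus (11 + 1 + 1) = 2.
print(alignment_score("ACDEFG", "A---FG", toy_substitution, 11, 1))  # 2
# Two separate gaps of length 1: 4 matches (+20) minus 2 * 11 = -2.
print(alignment_score("ACDEFG", "A-D-FG", toy_substitution, 11, 1))  # -2
```

The second example shows why affine penalties favor a single longer gap over several short ones.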
In some embodiments, fragments and variants of the polypeptides disclosed herein each comprise one or more conserved domains of the canonical polypeptide. In some embodiments, the variant or fragment can comprise a polypeptide comprising at least 40%, at least 50%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains in the canonical polypeptide sequence.
In one example, a variant or fragment of Glyma.06G303700 (SEQ ID NOs: 3 and 5) may comprise one or more of the following conserved domains: the START_ArGLABRA2_like domain (aa 241-465 of SEQ ID NOs: 3 and 5); the START domain (aa 246-466 of SEQ ID NOs: 3 and 5); the homeobox domain (aa 57-110 of SEQ ID NOs: 3 and 5); the homeodomain (aa 55-113 of SEQ ID NOs: 3 and 5); the COG5576 superfamily domain (aa 13-129 of SEQ ID NOs: 3 and 5); and/or the MreC superfamily domain. A variant or fragment of Glyma.06G303700 (SEQ ID NOs: 3 and 5) can retain activity as a transcription factor.
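Domain-level identity, as used in the preceding paragraphs, can be illustrated with a minimal sketch that slices a domain out of a protein by its 1-based, inclusive amino acid coordinates and compares it between a canonical sequence and a variant. The domain coordinates are those listed above for Glyma.06G303700 (the MreC superfamily domain is omitted because no coordinates are given). Two simplifying assumptions apply: the comparison is ungapped, so it only suits substitution-only variants (variants with insertions or deletions would need to be aligned first), and the sequences in the example are hypothetical placeholders rather than SEQ ID NO: 3 or 5.

```python
# Domain boundaries for Glyma.06G303700, 1-based and inclusive, as listed above.
DOMAINS_06G303700 = {
    "COG5576 superfamily": (13, 129),
    "homeodomain": (55, 113),
    "homeobox": (57, 110),
    "START_ArGLABRA2_like": (241, 465),
    "START": (246, 466),
}


def domain_segment(protein: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) of protein."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("domain coordinates fall outside the protein")
    return protein[start - 1:end]


def domain_identity(canonical: str, variant: str, start: int, end: int) -> float:
    """Ungapped percent identity of variant vs. canonical within one domain."""
    ref = domain_segment(canonical, start, end)
    alt = domain_segment(variant, start, end)
    matches = sum(r == a for r, a in zip(ref, alt))
    return 100.0 * matches / len(ref)


# Hypothetical 470-residue placeholder and a variant with residues 57-83 substituted.
canonical = "A" * 470
variant = canonical[:56] + "G" * 27 + canonical[83:]

for name, (start, end) in DOMAINS_06G303700.items():
    print(f"{name}: {domain_identity(canonical, variant, start, end):.1f}%")
# The homeobox domain (54 residues, 27 substituted) reports 50.0%;
# the START domains, which lie outside the substituted region, report 100.0%.
```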
In another example, a variant or fragment of Glyma.03G040200 (SEQ ID NO: 12) can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.03G040200 (SEQ ID NO: 12). A variant or fragment of Glyma.03G040200 (SEQ ID NO: 12) can retain activity in transmembrane transport.
In another example, a variant or fragment of Glyma.03G036300 (SEQ ID NO: 8) can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.03G036300 (SEQ ID NO: 8). A variant or fragment of Glyma.03G036300 (SEQ ID NO: 8) can retain activity as a PIF1 helicase.
In another example, a variant or fragment of Glyma.06G297500 (SEQ ID NO: 15) can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.06G297500 (SEQ ID NO: 15).
As indicated, fragments and variants of the polypeptides disclosed herein will retain the activity of conferring increased protein content, increased oil content, and/or modified oil profile to a plant expressing the polypeptide. Such an increase in protein content and/or oil content can comprise any statistically significant increase, including, for example, an increase of about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 85%, 90%, 95% or greater relative to a control. Methods of determining protein content or oil content are further described below.
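The percentage increase relative to a control is the difference between the treated and control means divided by the control mean, times 100. The sketch below computes it from replicate measurements and, because the paragraph refers to statistically significant increases, also computes Welch's t statistic as one common example of a two-sample comparison. The disclosure does not specify a statistical test; Welch's test is an assumption here, and the measurement values are invented for illustration. A p-value would additionally require the t distribution (for example, `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def percent_increase(treated: list[float], control: list[float]) -> float:
    """Percent change of the treated mean relative to the control mean."""
    control_mean = mean(control)
    return 100.0 * (mean(treated) - control_mean) / control_mean


def welch_t(treated: list[float], control: list[float]) -> float:
    """Welch's t statistic (unequal variances) for treated minus control."""
    standard_error = math.sqrt(variance(treated) / len(treated)
                               + variance(control) / len(control))
    return (mean(treated) - mean(control)) / standard_error


# Hypothetical seed protein content (% dry weight) for three replicates each.
treated_plants = [44.0, 46.0, 45.0]
control_plants = [40.0, 41.0, 39.0]
print(percent_increase(treated_plants, control_plants))     # 12.5
print(round(welch_t(treated_plants, control_plants), 3))     # 6.124
```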
In some embodiments, the polypeptides disclosed herein may comprise a heterologous amino acid sequence attached thereto. For example, a polypeptide may have a polypeptide tag or additional protein domain attached thereto. The heterologous amino acid sequence can be attached to the N terminus, the C terminus, or internally within the polypeptide. In some instances, the polypeptide may have one or more polypeptide tags and/or additional protein domains attached thereto at one or more positions of the polypeptide.
In some embodiments, the nucleic acid sequence encoding the polypeptides disclosed herein may comprise a heterologous nucleic acid sequence attached thereto. For example, the heterologous nucleic acid sequence may encode a polypeptide tag or additional protein domain that will be attached to the encoded polypeptide. As another example, the heterologous nucleic acid sequence may encode a regulatory element such as an intron, an enhancer, a promoter, a terminator, etc. The heterologous nucleic acid sequence can be positioned at the 5' end, the 3' end, or in-frame within the coding sequence of the polypeptide. In some instances, the nucleic acid sequence encoding the polypeptides disclosed herein may have one or more heterologous nucleic acid sequences attached thereto at one or more positions of the nucleic acid sequence.
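As an illustration of attaching a heterologous sequence to the 3' end of a coding sequence, the following minimal sketch builds an in-frame C-terminal fusion: it removes the stop codon of the CDS, appends a tag-coding sequence that is a whole number of codons, and closes the fusion with a stop codon. The function name is hypothetical, the choice of TAA as the closing stop codon is an arbitrary assumption, and the 6xHis tag (CAT/CAC codons encode histidine) and the short CDS are placeholder examples rather than constructs from this disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def c_terminal_fusion(cds: str, tag_cds: str) -> str:
    """Fuse tag_cds in frame to the 3' end of cds, replacing the CDS stop codon.

    cds must start with ATG and end with a stop codon; tag_cds must be a whole
    number of codons with no in-frame stop. A TAA stop is appended after the tag.
    """
    cds, tag_cds = cds.upper(), tag_cds.upper()
    if len(cds) % 3 or len(tag_cds) % 3:
        raise ValueError("both sequences must be a whole number of codons")
    if not cds.startswith("ATG") or cds[-3:] not in STOP_CODONS:
        raise ValueError("cds must start with ATG and end with a stop codon")
    tag_codons = [tag_cds[i:i + 3] for i in range(0, len(tag_cds), 3)]
    if any(codon in STOP_CODONS for codon in tag_codons):
        raise ValueError("tag must not contain an in-frame stop codon")
    return cds[:-3] + tag_cds + "TAA"


HIS6 = "CATCACCATCACCATCAC"  # encodes six histidines
fusion = c_terminal_fusion("ATGGCTTAA", HIS6)
print(fusion)  # ATGGCTCATCACCATCACCATCACTAA
```

An N-terminal fusion would follow the same logic in reverse: keep the tag's ATG, drop the CDS start codon if desired, and check that the joined sequence stays in frame.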
As used herein, "heterologous" in reference to a polypeptide or polynucleotide sequence is a sequence that originates, for example, from a cell or an organism with another genetic background of the same species or from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. As such, heterologous sequences are in a configuration not found in nature. As used herein, a “native” polynucleotide or polypeptide comprises a naturally occurring nucleotide sequence or amino acid sequence, respectively. Accordingly, “heterologous,” when used in reference to a gene or nucleic acid, refers to a gene encoding a factor that is not in its natural environment (i.e., that has been altered by the hand of man). For example, a heterologous gene may include a gene from one species introduced into another species. A heterologous gene may also include a gene native to an organism that has been altered in some way (e.g., mutated, added in multiple copies, linked to a non-native promoter or enhancer polynucleotide, etc.). Heterologous genes further may comprise plant gene polynucleotides that comprise cDNA forms of a plant gene; the cDNAs may be expressed in either a sense orientation (to produce mRNA) or an antisense orientation (to produce an antisense RNA transcript that is complementary to the mRNA transcript). In one aspect of the invention, heterologous genes are distinguished from endogenous plant genes in that the heterologous gene polynucleotides are joined to polynucleotides comprising regulatory elements, such as promoters, that are not found naturally associated with the gene for the protein encoded by the heterologous gene or with the plant gene polynucleotide in the chromosome, or are associated with portions of the chromosome not found in nature (e.g., genes expressed in loci where the gene is not normally expressed).
Further, in embodiments, a “heterologous” polynucleotide is a polynucleotide not naturally associated with a host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring polynucleotide.
II. Expression cassettes and promoters
Polynucleotides encoding the polypeptides provided herein can be provided in expression cassettes for expression in an organism of interest. The cassette will include 5' and 3' regulatory sequences operably linked to a polynucleotide encoding a polypeptide provided herein that allows for expression of the polynucleotide. The cassette may additionally contain at least one additional gene or genetic element to be co-transformed into the organism. Where additional genes or elements are included, the components are operably linked. Alternatively, the additional gene (s) or element (s) can be provided on multiple expression cassettes. Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory elements or regions. The expression cassette may additionally contain a selectable marker gene.
The expression cassette will include, in the 5'-3' direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., a termination region) functional in the organism of interest, e.g., a plant or a bacterium. The promoters of the invention are capable of directing or driving transcription and expression of a coding sequence in a host cell. The regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions) may be endogenous or heterologous to the host cell or to each other. As used herein, a chimeric gene or a chimeric nucleic acid molecule comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
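The 5'-3' arrangement of an expression cassette and the definition of a chimeric gene can be modeled with a small data structure. In this sketch, which is a bookkeeping illustration and not a design tool, each part records the gene it was taken from, and "heterologous" is simplified to mean that the promoter comes from a different gene than the coding sequence. The class names, gene labels, and sequences are hypothetical placeholders and are not functional elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One cassette element; source_gene records where the element was taken from."""
    name: str
    source_gene: str
    sequence: str


@dataclass(frozen=True)
class ExpressionCassette:
    """Minimal model of a cassette: promoter, coding sequence, terminator (5'->3')."""
    promoter: Part
    coding_sequence: Part
    terminator: Part

    def assemble(self) -> str:
        """Concatenate the parts in the 5'->3' direction of transcription."""
        return self.promoter.sequence + self.coding_sequence.sequence + self.terminator.sequence

    def is_chimeric(self) -> bool:
        """Simplification: chimeric if the promoter and coding sequence come from different genes."""
        return self.promoter.source_gene != self.coding_sequence.source_gene


cassette = ExpressionCassette(
    promoter=Part("promoter", "geneX", "TTGACA"),
    coding_sequence=Part("cds", "geneY", "ATGGCTTAA"),
    terminator=Part("terminator", "geneZ", "AATAAA"),
)
print(cassette.assemble())     # TTGACAATGGCTTAAAATAAA
print(cassette.is_chimeric())  # True
```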
A variety of transcriptional terminators are available for use in expression cassettes. These are responsible for the termination of transcription beyond the transgene and for correct mRNA polyadenylation. The termination region may be native with the transcriptional initiation region, may be native with the operably linked DNA sequence of interest, may be native with the plant host, or may be derived from another source (i.e., foreign or heterologous to the promoter, the DNA sequence of interest, the plant host, or any combination thereof). Appropriate transcriptional terminators are those that are known to function in plants and include the CaMV 35S terminator, the tml terminator, the nopaline synthase terminator, and the pea rbcS E9 terminator. These can be used in both monocotyledons and dicotyledons. In addition, a gene's native transcription terminator may be used. Termination regions used in the expression cassettes can be obtained from, e.g., the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262: 141-144; Proudfoot (1991) Cell 64: 671-674; Sanfacon et al. (1991) Genes Dev. 5: 141-149; Mogen et al. (1990) Plant Cell 2: 1261-1272; Munroe et al. (1990) Gene 91: 151-158; Ballas et al. (1989) Nucleic Acids Res. 17: 7891-7903; and Joshi et al. (1987) Nucleic Acids Res. 15: 9627-9639.
Additional regulatory signals include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, termination signals, and the like. See, for example, U.S. Pat. Nos. 5,039,523 and 4,853,331; EPO 0480762A2; Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. ) , hereinafter “Sambrook 11” ; Davis et al, eds. (1980) .
In preparing the expression cassette, the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA  fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
a. Promoters
A number of promoters can be used in the practice of the invention. The promoters can be selected based on the desired outcome. The nucleic acids can be combined with constitutive, inducible, tissue-preferred, or other promoters for expression in the organism of interest. See, for example, promoters set forth in WO 99/43838 and in US Patent Nos. 8,575,425; 7,790,846; 8,147,856; 8,586,832; 7,772,369; 7,534,939; 6,072,050; 5,659,026; 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142; and 6,177,611; herein incorporated by reference.
For expression in plants, constitutive promoters can also be used. Non-limiting examples of constitutive promoters include the CaMV 35S promoter (Odell et al. (1985) Nature 313: 810-812); rice actin (McElroy et al. (1990) Plant Cell 2: 163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12: 619-632 and Christensen et al. (1992) Plant Mol. Biol. 18: 675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81: 581-588); and MAS (Velten et al. (1984) EMBO J. 3: 2723-2730). Inducible promoters include those that drive expression of pathogenesis-related proteins (PR proteins), which are induced following infection by a pathogen. See, for example, Redolfi et al. (1983) Neth. J. Plant Pathol. 89: 245-254; Uknes et al. (1992) Plant Cell 4: 645-656; Van Loon (1985) Plant Mol. Virol. 4: 111-116; and WO 99/43819, herein incorporated by reference. Promoters that are expressed locally at or near the site of pathogen infection may also be used (Marineau et al. (1987) Plant Mol. Biol. 9: 335-342; Matton et al. (1989) Molecular Plant-Microbe Interactions 2: 325-331; Somsisch et al. (1986) Proc. Natl. Acad. Sci. USA 83: 2427-2430; Somsisch et al. (1988) Mol. Gen. Genet. 2: 93-98; Yang (1996) Proc. Natl. Acad. Sci. USA 93: 14972-14977; Chen et al. (1996) Plant J. 10: 955-966; Zhang et al. (1994) Proc. Natl. Acad. Sci. USA 91: 2507-2511; Warner et al. (1993) Plant J. 3: 191-201; Siebertz et al. (1989) Plant Cell 1: 961-968; Cordero et al. (1992) Physiol. Mol. Plant Path. 41: 189-200; U.S. Patent No. 5,750,386 (nematode-inducible); and the references cited therein).
Wound-inducible promoters may be used in the constructions of the invention. Such wound-inducible promoters include the pin II promoter (Ryan (1990) Ann. Rev. Phytopath. 28: 425-449; Duan et al. (1996) Nature Biotechnology 14: 494-498); wun1 and wun2 (U.S. Patent No. 5,428,148); win1 and win2 (Stanford et al. (1989) Mol. Gen. Genet. 215: 200-208); systemin (McGurl et al. (1992) Science 225: 1570-1573); WIP1 (Rohmeier et al. (1993) Plant Mol. Biol. 22: 783-792; Eckelkamp et al. (1993) FEBS Letters 323: 73-76); the MPI gene (Cordero et al. (1994) Plant J. 6 (2): 141-150); and the like, herein incorporated by reference.
Tissue-preferred promoters for use in the invention include those set forth in Yamamoto et al. (1997) Plant J. 12 (2): 255-265; Kawamata et al. (1997) Plant Cell Physiol. 38 (7): 792-803; Hansen et al. (1997) Mol. Gen. Genet. 254 (3): 337-343; Russell et al. (1997) Transgenic Res. 6 (2): 157-168; Rinehart et al. (1996) Plant Physiol. 112 (3): 1331-1341; Van Camp et al. (1996) Plant Physiol. 112 (2): 525-535; Canevascini et al. (1996) Plant Physiol. 112 (2): 513-524; Yamamoto et al. (1994) Plant Cell Physiol. 35 (5): 773-778; Lam (1994) Results Probl. Cell Differ. 20: 181-196; Orozco et al. (1993) Plant Mol. Biol. 23 (6): 1129-1138; Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90 (20): 9586-9590; and Guevara-Garcia et al. (1993) Plant J. 4 (3): 495-505.
Leaf-preferred promoters include those set forth in Yamamoto et al. (1997) Plant J. 12 (2) : 255-265; Kwon et al. (1994) Plant Physiol. 105: 357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35 (5) : 773-778; Gotor et al. (1993) Plant J. 3: 509-18; Orozco et al. (1993) Plant Mol. Biol. 23 (6) : 1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90 (20) : 9586-9590.
Root-preferred promoters are known and include those in Hire et al. (1992) Plant Mol. Biol. 20 (2): 207-218 (soybean root-specific glutamine synthetase gene); Keller and Baumgartner (1991) Plant Cell 3 (10): 1051-1061 (root-specific control element); Sanger et al. (1990) Plant Mol. Biol. 14 (3): 433-443 (mannopine synthase (MAS) gene of Agrobacterium tumefaciens); Miao et al. (1991) Plant Cell 3 (1): 11-22 (cytosolic glutamine synthetase (GS)); Bogusz et al. (1990) Plant Cell 2 (7): 633-641; Leach and Aoyagi (1991) Plant Science (Limerick) 79 (1): 69-76 (rolC and rolD); Teeri et al. (1989) EMBO J. 8 (2): 343-350; Kuster et al. (1995) Plant Mol. Biol. 29 (4): 759-772 (the VfENOD-GRP3 gene promoter); and Capana et al. (1994) Plant Mol. Biol. 25 (4): 681-691 (rolB promoter). See also U.S. Patent Nos. 5,837,876; 5,750,386; 5,633,363; 5,459,252; 5,401,836; 5,110,732; and 5,023,179.
"Seed-preferred" promoters include both "seed-specific" promoters (those promoters active during seed development, such as promoters of seed storage proteins) as well as "seed-germinating" promoters (those promoters active during seed germination). See Thompson et al. (1989) BioEssays 10: 108. Seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message); cZ19B1 (maize 19 kDa zein); and milps (myo-inositol-1-phosphate synthase) (see WO 00/11177 and U.S. Patent No. 6,225,529). Gamma-zein is an endosperm-specific promoter. Globulin 1 (Glb-1) is a representative embryo-specific promoter. For dicots, seed-specific promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like. For monocots, seed-specific promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa zein, gamma-zein, waxy, shrunken 1, shrunken 2, Globulin 1, etc. See also WO 00/12733, where seed-preferred promoters from end1 and end2 genes are disclosed.
In specific embodiments, the polynucleotides or variants thereof provided herein are not expressed using a root-specific promoter. In further embodiments, the polynucleotides or variants thereof provided herein are not expressed with the RCc3 root-specific promoter (see US20130139280).
For expression in a bacterial host, promoters that function in bacteria are well-known in the art. Such promoters include any of the known crystal protein gene promoters, including the promoters of any of the proteins of the invention, and promoters specific for B. thuringiensis sigma factors. Alternatively, mutagenized or recombinant crystal protein-encoding gene promoters may be recombinantly engineered and used to promote expression of the novel gene segments disclosed herein.
A number of non-translated leader sequences derived from viruses are also known to enhance expression, and these are particularly effective in dicotyledonous cells. The expression cassette may comprise one or more of such leader sequences. Specifically, leader sequences from tobacco mosaic virus (TMV, the “Ω-sequence”), maize chlorotic mottle virus (MCMV), and alfalfa mosaic virus (AMV) have been shown to be effective in enhancing expression (e.g., Gallie et al. Nucl. Acids Res. 15: 8693-8711 (1987); Skuzeski et al. Plant Molec. Biol. 15: 65-79 (1990)). Other leader sequences known in the art include, but are not limited to: picornavirus leaders, for example, the EMCV leader (encephalomyocarditis 5' noncoding region) (Elroy-Stein, O., Fuerst, T.R., and Moss, B. PNAS USA 86: 6126-6130 (1989)); potyvirus leaders, for example, the tobacco etch virus (TEV) leader (Allison et al. (1986) Virology 154: 9-20) and the maize dwarf mosaic virus (MDMV) leader; the human immunoglobulin heavy-chain binding protein (BiP) leader (Macejak, D.G., and Sarnow, P., Nature 353: 90-94 (1991)); the untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling, S.A., and Gehrke, L., Nature 325: 622-625 (1987)); the tobacco mosaic virus (TMV) leader (Gallie, D. R. et al., Molecular Biology of RNA, 237-256 (1989)); and the maize chlorotic mottle virus (MCMV) leader (Lommel, S.A. et al., Virology 81: 382-385 (1991)). See also Della-Cioppa et al., Plant Physiology 84: 965-968 (1987).
The expression cassette can also comprise a selectable marker gene for the selection of transformed cells or tissues. Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin, 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), spectinomycin, or acetolactate synthase (ALS). Selection markers used routinely in transformation include the nptII gene, which confers resistance to kanamycin and related antibiotics (Messing & Vierra Gene 19: 259-268 (1982); Bevan et al., Nature 304: 184-187 (1983)); the pat and bar genes, which confer resistance to the herbicide glufosinate (also called phosphinothricin; see White et al., Nucl. Acids Res 18: 1062 (1990), Spencer et al. Theor. Appl. Genet 79: 625-631 (1990), and U.S. Patent Nos. 5,561,236 and 5,276,268); the hph gene, which confers resistance to the antibiotic hygromycin (Blochinger & Diggelmann, Mol. Cell Biol. 4: 2929-2931); the dhfr gene, which confers resistance to methotrexate (Bourouis et al., EMBO J. 2 (7): 1099-1104 (1983)); the EPSPS gene, which confers resistance to glyphosate (U.S. Patent Nos. 4,940,935 and 5,188,642); the glyphosate N-acetyltransferase (GAT) gene, which also confers resistance to glyphosate (Castle et al. (2004) Science 304: 1151-1154; U.S. Patent App. Pub. Nos. 20070004912, 20050246798, and 20050060767); and the mannose-6-phosphate isomerase gene, which provides the ability to metabolize mannose (U.S. Patent Nos. 5,767,378 and 5,994,629).
In some embodiments, the promoter used herein to drive the expression of the above referenced polynucleotide comprises SEQ ID NO: 23 (FIG. 17) .
b. Native promoters
In some embodiments, the promoter used herein to drive the expression of the polynucleotides provided herein comprises a native promoter or an active variant or fragment thereof. For purposes of this disclosure, the term “native promoter,” used interchangeably with the term “endogenous promoter,” refers to a promoter that is found in plants in nature. An active variant or fragment of a native promoter refers to a promoter sequence that has one or more nucleotide substitutions, deletions, or insertions and that can drive expression of an operably linked polynucleotide sequence under conditions similar to those under which the native promoter is active. Such active variants or fragments may be created by site-directed mutagenesis or induced mutation, or may occur as allelic variants (polymorphisms). In some embodiments, disclosed herein is a construct comprising a native promoter or an active variant or fragment thereof operably linked to a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant (e.g., having at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity) of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein, when introduced into a plant, the construct confers increased protein content, increased oil content, and/or modified oil profile. In some embodiments, the native promoter is a heterologous promoter to the polynucleotide.
Also provided herein is a plant, a plant cell, or a plant part (e.g., a plant seed) comprising a construct comprising a native promoter operably linked to a polynucleotide sequence as provided herein. In some embodiments, the polynucleotide encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to a polynucleotide encoding any one of SEQ ID NOs: 22 or 24-59. In some embodiments, the plant is a dicot plant. In some embodiments, the plant is a monocot plant. In some embodiments, the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane. In some embodiments, the plant is a soybean plant. In some embodiments, the plant is an elite soybean plant.
Also provided herein is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising introducing into the genome of the plant a nucleic acid sequence operably linked to a native promoter or an active variant or fragment thereof, where the nucleic acid sequence encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. In some embodiments, the nucleic acid sequence encodes a polypeptide having the amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
III. Plants, plant cells and plant parts
In the plants provided herein, the polynucleotide as described in Section I of this disclosure is a heterologous nucleic acid sequence in the genome of the plant. As used herein, the term “heterologous” in the context of a chromosomal segment refers to one or more DNA sequences (e.g., genetic loci) in a configuration in which they are not found in nature, for example as a result of a recombination event between homologous chromosomes during meiosis, or for example as a result of introduction of a transgenic sequence, or for example as a result of modification through gene editing.
Although soybean plants are used to exemplify the compositions and methods throughout the application, a polynucleotide as provided herein may be introduced into any plant species, including, but not limited to, monocots and dicots. Examples of plants of interest include, but are not limited to, corn (maize), sorghum, wheat, sunflower, tomato, crucifers, peppers, potato, cotton, rice, soybean, sugarbeet, sugarcane, tobacco, barley, oilseed rape, Brassica sp., alfalfa, rye, millet, safflower, peanuts, sweet potato, cassava, coffee, coconut, pineapple, citrus trees, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oats, vegetables, ornamentals, and conifers.
Glycine is a genus in the bean family Fabaceae that includes soybean (soya bean). The Glycine plants can be Glycine arenaria, Glycine argyrea, Glycine cyrtoloba, Glycine canescens, Glycine clandestina, Glycine curvata, Glycine falcata, Glycine latifolia, Glycine microphylla, Glycine pescadrensis, Glycine stenophita, Glycine syndetica, Glycine soja Sieb. et Zucc., Glycine max (L.) Merrill., Glycine tabacina, or Glycine tomentella.
In some embodiments, the plants provided herein are elite plants or derived from an elite line.
As used herein, an “elite line” is an agronomically superior line that has resulted from many cycles of breeding and selection for superior agronomic performance. Numerous elite lines are available and known to those of skill in the art of soybean breeding. An “elite population” is an assortment of elite individuals or lines that can be used to represent the state of the art in terms of agronomically superior genotypes of a given crop species, such as soybean. Similarly, an “elite germplasm” or elite strain of germplasm is an agronomically superior germplasm, typically derived from, and/or capable of giving rise to, a plant with superior agronomic performance, such as an existing or newly developed elite line of soybean.
An “elite” plant is any plant from an elite line, such that an elite plant is a representative plant from an elite variety. In some embodiments, the soybean plant comprising a polynucleotide encoding any one of the polypeptides disclosed herein is an elite soybean plant. Non-limiting examples of elite soybean varieties that are commercially available to farmers or soybean breeders include: AG00802, A0868, AG0902, A1923, AG2403, A2824, A3704, A4324, A5404, AG5903, AG6202, AG0934, AG1435, AG2031, AG2035, AG2433, AG2733, AG2933, AG3334, AG3832, AG4135, AG4632, AG4934, AG5831, AG6534, and AG7231 (Asgrow Seeds, Des Moines, Iowa, USA); BPR0144RR, BPR 4077NRR, and BPR 4390NRR (Bio Plant Research, Camp Point, Ill., USA); DKB 17-51 and DKB37-51 (DeKalb Genetics, DeKalb, Ill., USA); DP 4546 RR and DP 7870 RR (Delta & Pine Land Company, Lubbock, Tex., USA); JG 03R501, JG 32R606C ADD, and JG 55R503C (JGL Inc., Greencastle, Ind., USA); NKS 13-K2 (NK Division of Syngenta Seeds, Golden Valley, Minnesota, USA); 90M01, 91M30, 92M33, 93M11, 94M30, 95M30, 97B52, P008T22R2, P16T17R2, P22T69R, P25T51R, P34T07R2, P35T58R, P39T67R, P47T36R, P46T21R, and P56T03R2 (Pioneer Hi-Bred International, Johnston, Iowa, USA); SG4771NRR and SG5161NRR/STS (Soygenetics, LLC, Lafayette, Ind., USA); S00-K5, S11-L2, S28-Y2, S43-B1, S53-A1, S76-L9, S78-G6, S0009-M2, S007-Y4, S04-D3, S14-A6, S20-T6, S21-M7, S26-P3, S28-N6, S30-V6, S35-C3, S36-Y6, S39-C4, S47-K5, S48-D9, S52-Y2, S58-Z4, S67-R6, and S73-S8 (Syngenta Seeds, Henderson, Ky., USA); Richer (Northstar Seed Ltd., Alberta, CA); 14RD62 (Stine Seed Co., Ia., USA); or Armor 4744 (Armor Seed, LLC, Ar., USA).
In some embodiments, the plants provided herein can comprise one or more additional polynucleotides that encode an additional polypeptide that can confer a phenotype of increased protein content, increased oil content, and/or modified oil profile on a plant. In some embodiments, the additional polynucleotide encodes a polypeptide having the sequence of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. The additional polynucleotide can be introduced using approaches similar to those disclosed above, e.g., by transgenic means, by breeding, or by genome editing.
In specific embodiments, the plants, plant parts, or seeds having the heterologous polynucleotide or polypeptide disclosed herein, or active variants and fragments thereof, can have a modified level of expression of the polynucleotide or polypeptide (i.e., an increase or a decrease in expression level). In other embodiments, the plants, plant parts, or seeds having the heterologous polynucleotide or polypeptide disclosed herein, or active variants and fragments thereof, can have a modified level of activity of the polypeptide (i.e., an increase or a decrease in activity level). Methods to generate such modified levels of expression or activity are disclosed elsewhere herein and include, but are not limited to, breeding, gene editing, and transgenic techniques.
Plants produced as described above can be propagated to produce progeny plants, and progeny plants that have stably incorporated into their genomes a polynucleotide conferring increased protein content, increased oil content, and/or modified oil profile can be selected and, if desired, further propagated. The term “progeny” refers to the descendant(s) of a particular cross. Typically, progeny result from breeding of two individuals, although some species (particularly some plants and hermaphroditic animals) can be selfed (i.e., the same plant acts as the donor of both male and female gametes). The descendant(s) can be, for example, of the F1, the F2, or any subsequent generation.
In some embodiments, a plant cell, seed, plant part, or harvest product can be obtained from the plant produced as above, and the plant cell, seed, or plant part can be screened using methods disclosed above for evidence of stable incorporation of the polynucleotide. The term “stable incorporation” refers to the integration of a nucleic acid sequence into the genome of a plant such that the nucleic acid sequence is capable of being inherited by the progeny thereof. As used herein, the term “plant part” indicates a part of a plant, including single cells and cell tissues such as plant cells that are intact in plants, cell clumps, and tissue cultures from which plants can be regenerated. Examples of plant parts include, but are not limited to, single cells and tissues from pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, and seeds; as well as pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, calli, and the like.
In some embodiments, plant products can be harvested from the plant disclosed above and processed to produce processed products, such as flour, soy meal, oil, starch, and the like. These processed products are also within the scope of this invention provided that they comprise a polynucleotide or polypeptide or variant thereof disclosed herein. Other soybean plant products include, but are not limited to, protein concentrate, protein isolate, soybean hulls, meal, flour, oil, and the whole soybean itself.
IV. Methods for producing a plant variety that has increased protein and/or oil content and/or modified oil profile
Provided herein are methods of producing a plant that has increased protein content, increased oil content, and/or modified oil profile by introducing a nucleic acid sequence encoding a polypeptide as provided herein. A nucleic acid sequence may be introduced into a plant cell in various ways, for example, by transformation, by genome modification techniques (such as genome editing), or by breeding. In one aspect, the plant can be produced by transforming the nucleic acid sequence encoding a polypeptide disclosed above into a recipient plant. In another aspect, the method can comprise editing the genome of the recipient plant so that the resulting plant comprises a polynucleotide encoding a polypeptide disclosed above. In yet another aspect, the method can comprise increasing the expression level and/or activity of the above-mentioned proteins in a recipient plant, for example, by enhancing promoter activity or replacing the endogenous promoter with a stronger promoter. In a further aspect, the method can comprise breeding a donor plant comprising a polynucleotide as described above with a recipient plant and selecting for incorporation of the polynucleotide into the recipient plant genome.
1. Transgenic means
In some embodiments, the method comprises transforming a polynucleotide disclosed herein or an active variant or fragment thereof into a recipient plant to obtain a transgenic plant, and said transgenic plant has increased protein content, increased oil content, and/or modified oil profile. Expression cassettes comprising polynucleotides encoding the polypeptides as described above can be used to transform plants of interest.
As used herein, the term “transgenic” and grammatical variations thereof refer to a plant, including any part derived from the plant, such as a cell, tissue or organ, in which a heterologous nucleic acid is integrated into the genome. In specific embodiments, the heterologous nucleic acid is a recombinant construct, vector or expression cassette comprising one or more nucleic acids. In other embodiments, a transgenic plant is produced by a genetic engineering method, such as Agrobacterium transformation. Through gene technology, the heterologous nucleic acid is stably integrated into chromosomes, so that the next generation can also be transgenic. As used herein, “transgenic” and grammatical variations thereof also encompass biological treatments, which include plant hybridization and/or natural recombination.
Transformation results in a transformed plant, including whole plants, as well as plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, propagules, embryos, and progeny of the same. Plant cells can be differentiated or undifferentiated (e.g., callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells, pollen). Transformation may result in stable or transient incorporation of the nucleic acid into the cell. "Stable transformation" is intended to mean that the nucleotide construct introduced into a host cell integrates into the genome of the host cell and is capable of being inherited by the progeny thereof. "Transient transformation" is intended to mean that a polynucleotide is introduced into the host cell and does not integrate into the genome of the host cell.
Methods for transformation typically involve introducing a nucleotide construct into a plant. In some embodiments, the transformation method is an Agrobacterium-mediated transformation. In some embodiments, the transformation method is a biolistic-mediated transformation. Transformation may also be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound-mediated transformation, PEG-mediated transformation, calcium phosphate co-precipitation, the polycation DMSO technique, the DEAE dextran procedure, Agrobacterium- and viral-mediated transformation (e.g., Caulimoviruses, Geminiviruses, RNA plant viruses), liposome-mediated transformation, and the like.
Transformation protocols, as well as protocols for introducing polypeptides or polynucleotide sequences into plants, may vary depending on the type of plant or plant cell (i.e., monocot or dicot) targeted for transformation. Methods for transformation are known in the art and include those set forth in US Patent Nos: 8,575,425; 7,692,068; 8,802,934; and 7,541,517; each of which is herein incorporated by reference. See, also, Rakoczy-Trojanowska, M. (2002) Cell Mol Biol Lett. 7: 849-858; Jones et al. (2005) Plant Methods, Vol. 1, Article 5; Rivera et al. (2012) Physics of Life Reviews 9: 308-345; Bartlett et al. (2008) Plant Methods 4: 1-12; Bates, G.W. (1999) Methods in Molecular Biology 111: 359-366; Binns and Thomashow (1988) Annual Reviews in Microbiology 42: 575-606; Christou, P. (1992) The Plant Journal 2: 275-281; Christou, P. (1995) Euphytica 85: 13-27; Tzfira et al. (2004) TRENDS in Genetics 20: 375-383; Yao et al. (2006) Journal of Experimental Botany 57: 3737-3746; Zupan and Zambryski (1995) Plant Physiology 107: 1041-1047.
Methods for transformation of chloroplasts are known in the art. See, for example, Svab et al. (1990) Proc. Natl. Acad. Sci. USA 87(21): 8526-8530; Svab and Maliga (1993) Proc. Natl. Acad. Sci. USA 90(3): 913-917; Staub and Maliga (1993) EMBO J. 12(2): 601-606. The method relies on particle gun delivery of DNA containing a selectable marker and targeting of the DNA to the plastid genome through homologous recombination. Additionally, plastid transformation can be accomplished by transactivation of a silent plastid-borne transgene by tissue-preferred expression of a nuclear-encoded and plastid-directed RNA polymerase. Such a system has been reported in McBride et al. (1994) Proc. Natl. Acad. Sci. USA 91(15): 7301-7305.
The cells that have been transformed may be grown into plants by conventional methods. See, for example, McCormick et al. (1986) Plant Cell Reports 5: 81-84. These plants may then be grown and pollinated with either the same transformed strain or different strains, and the resulting hybrid having constitutive expression of the desired phenotypic characteristic can be identified. Two or more generations may be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited, and seeds may then be harvested to ensure that expression of the desired phenotypic characteristic has been achieved. In this manner, the present invention provides transformed seed (also referred to as "transgenic seed") having a nucleotide construct of the invention, for example, an expression cassette of the invention, stably incorporated into its genome.
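Whether a transgene is stably inherited is often checked in a segregating generation, for example the selfed progeny of a plant hemizygous for a single insertion, where Mendelian inheritance predicts roughly three carriers for every non-carrier. The sketch below is one hypothetical way to run that check, a chi-square goodness-of-fit test against a 3:1 ratio. It is not a procedure stated in this disclosure; it assumes a single, dominantly detectable insertion locus, the function names are illustrative, and the counts are made up. The cutoff 3.841 is the standard chi-square critical value for 1 degree of freedom at p = 0.05.

```python
# Standard chi-square critical value, 1 degree of freedom, p = 0.05.
CHI2_CRITICAL_1DF_P05 = 3.841


def chi_square_3_to_1(carriers: int, non_carriers: int) -> float:
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = carriers + non_carriers
    if total <= 0:
        raise ValueError("at least one plant must be scored")
    observed = (carriers, non_carriers)
    expected = (0.75 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def consistent_with_single_locus(carriers: int, non_carriers: int) -> bool:
    """True if the 3:1 hypothesis is not rejected at p = 0.05."""
    return chi_square_3_to_1(carriers, non_carriers) < CHI2_CRITICAL_1DF_P05


# Hypothetical progeny scores from one selfed hemizygous plant.
print(round(chi_square_3_to_1(76, 24), 4))   # 0.0533
print(consistent_with_single_locus(76, 24))  # True
print(consistent_with_single_locus(60, 40))  # False
```

A line whose progeny fit the 3:1 expectation would be a candidate for further propagation; a poor fit may point to multiple insertions or silencing, which the sketch does not distinguish.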
2. Crossing
In some embodiments, the method comprises crossing a donor plant comprising a polynucleotide encoding a polypeptide disclosed herein with a recipient plant, wherein the polypeptide is able to confer increased protein content, increased oil content, and/or modified oil profile on the recipient plant. As used herein, the terms “crossing” and “breeding” refer to the fusion of gametes to produce progeny (e.g., by fertilization, such as to produce seed by pollination in plants). In some embodiments, a “cross,” “breeding,” or “cross-fertilization” is fertilization of one individual by another (e.g., cross-pollination in plants). The plant disclosed herein may be a whole plant, or may be a plant cell, seed, or tissue, or a plant part such as a leaf, stem, pollen, or cell that can be cultivated into a whole plant.
In some embodiments, a progeny plant created by the crossing or breeding process is repeatedly crossed back to one of its parents through a process referred to herein as “backcrossing.” In a backcrossing scheme, the “donor” parent refers to the parental plant with the desired gene or locus to be introgressed. The “recipient” parent (used one or more times) or “recurrent” parent (used two or more times) refers to the parental plant into which the gene or locus is being introgressed. For example, see Ragot, M. et al., Marker-assisted Backcrossing: A Practical Example, in Techniques et Utilisations des Marqueurs Moleculaires Les Colloques, Vol. 72, pp. 45-56 (1995); and Openshaw et al., Marker-assisted Selection in Backcross Breeding, in Proceedings of the Symposium “Analysis of Molecular Marker Data,” pp. 41-43 (1994). The initial cross gives rise to the F1 generation. The term “BC1” refers to the second use of the recurrent parent, “BC2” refers to the third use of the recurrent parent, and so on.
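Under the naming convention above (the initial cross producing the F1 is the first use of the recurrent parent, BC1 the second, and so on), each additional backcross is expected to halve the remaining donor contribution, so the expected recurrent-parent share after BCn is 1 - (1/2)^(n+1). The sketch below tabulates that textbook expectation; it assumes no selection and ignores linkage drag around the introgressed locus, and the function name is illustrative. Marker-assisted backcrossing, as in the references cited above, is generally used to recover the recurrent genome faster than this expectation.

```python
from fractions import Fraction


def expected_recurrent_parent_fraction(n: int) -> Fraction:
    """Expected recurrent-parent genome fraction after backcross generation n.

    n = 0 denotes the F1 (first use of the recurrent parent), n = 1 denotes
    BC1 (second use), and so on. Assumes no selection and ignores linkage
    drag around the introgressed locus.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1 - Fraction(1, 2) ** (n + 1)


for n in range(5):
    label = "F1" if n == 0 else f"BC{n}"
    print(label, float(expected_recurrent_parent_fraction(n)))
```

Exact fractions are used so that the halving pattern (1/2, 3/4, 7/8, ...) is preserved without floating-point rounding.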
In some embodiments, the donor soybean plant is a Glycine max plant. In some embodiments, the donor soybean plant is a Glycine soja plant. In some embodiments, the recipient soybean plant is an elite Glycine max plant or an elite Glycine soja plant. In some embodiments, the donor plant is from soy variety Suinong 14 (SN14). In some embodiments, the donor plant is Glycine soja ZYD0006.
3. Gene editing
In some embodiments, the polynucleotide sequences provided herein can be targeted to specific sites within the genome of a recipient plant cell. Such methods include, but are not limited to, meganucleases designed against the plant genomic sequence of interest; CRISPR-Cas9, TALENs, and other technologies for precise editing of genomes (Feng et al., Cell Research 23: 1229-1232, 2013; WO 2013/026740); Cre-lox site-specific recombination; FLP-FRT recombination (Li et al. (2009) Plant Physiol 151: 1087-1095); Bxb1-mediated integration (Yau et al. Plant J (2011) 701: 147-166); zinc-finger mediated integration (Wright et al. (2005) Plant J 44: 693-705; Cai et al. (2009) Plant Mol Biol 69: 699-709); homologous recombination (Lieberman-Lazarovich and Levy (2011) Methods Mol Biol: 51-65); prime editing and transposases (Anzalone, A. et al., Nat Biotechnol. 2020 Jul; 38(7): 824-844); translocation; and inversion.
Various embodiments of the methods described herein use gene editing. In some embodiments, gene editing is used to mutagenize the genome of a plant to produce plants having one or more of the polypeptides that are able to confer increased protein content, increased oil content, and/or modified oil profile.
In some embodiments, provided herein are plants transformed with and expressing gene-editing machinery as described above, which, when crossed with a target plant, result in gene editing in the target plant.
In general, gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems. Gene editing may involve genomic integration or episomal presence of the gene editing components or systems.
Gene editing generally refers to the use of a site-directed nuclease (SDN) (including but not limited to CRISPR/Cas, zinc fingers, meganucleases, and the like) to cut a nucleotide sequence at a desired location. This may be done to cause an insertion/deletion (“indel”) mutation (i.e., “SDN1”), a base edit (i.e., “SDN2”), or an allele insertion or replacement (i.e., “SDN3”). SDN2 or SDN3 gene editing may comprise the provision of one or more recombination templates (e.g., in a vector) comprising a gene sequence of interest that can be used for homology-directed repair (HDR) within the plant (i.e., to be introduced into the plant genome). In some embodiments, the gene or allele of interest is one that is able to confer on the plant an improved trait, e.g., increased protein content, increased oil content, and/or modified oil profile. The recombination template can be introduced into the plant to be edited either through transformation or through breeding with a donor plant comprising the recombination template. Breaks in the plant genome may be introduced within, upstream, and/or downstream of a target sequence. In some embodiments, a double-strand DNA break is made within or near the target sequence locus. In some embodiments, breaks are made upstream and downstream of the target sequence locus, which may lead to its excision from the genome. In some embodiments, one or more single-strand DNA breaks (nicks) are made within, upstream, and/or downstream of the target sequence (e.g., using a nickase Cas9 variant). Any of these DNA breaks, as well as those introduced via other methods known to one of skill in the art, may induce HDR.
Through HDR, the target sequence is replaced by the sequence of the provided recombination template comprising a polynucleotide of interest; for example, any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, or a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 22 or 24-59, may be provided on or as a template. By designing the system such that one or more single-strand or double-strand breaks are introduced within, upstream, and/or downstream of the corresponding region in the genome of a plant not comprising the gene sequence of interest, this region can be replaced with the template. In some embodiments, the polynucleotide of interest is operably linked to a promoter, and expression of the polynucleotide of interest under the control of the promoter confers increased protein content, increased oil content, and/or modified oil profile on the plant. In some embodiments, the promoter is a native promoter or an active variant or fragment thereof as described above.
In some embodiments, mutations in the genes of interest described herein may be generated without the use of a recombination template via targeted introduction of DNA double-strand breaks. Such breaks may be repaired through the process of non-homologous end joining (NHEJ), which can result in the generation of small insertions or deletions (indels) at the repair site. Such indels may lead to frameshift mutations causing premature stop codons or other types of loss-of-function mutations in the targeted genes.
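The template design described above can be made concrete with a toy example: a donor for HDR typically carries the replacement sequence flanked by homology arms that match the genomic sequence on either side of the region being replaced. The sketch below is a hypothetical illustration only; the function names, the 0-based coordinates, the default arm length, and the toy sequences are assumptions rather than parameters from this disclosure, and practical arm lengths are usually much longer and chosen empirically.

```python
def build_hdr_template(genome: str, start: int, end: int,
                       replacement: str, arm_len: int = 50) -> str:
    """Assemble left arm + replacement + right arm for a donor template.

    genome[start:end] (0-based, end-exclusive) is the region to be replaced.
    The arms are the arm_len nucleotides immediately flanking that region.
    """
    if not 0 <= start <= end <= len(genome):
        raise ValueError("invalid target region")
    if start < arm_len or end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genome[start - arm_len:start]
    right_arm = genome[end:end + arm_len]
    return left_arm + replacement + right_arm


def expected_edited_locus(genome: str, start: int, end: int, replacement: str) -> str:
    """Sequence expected if HDR copies the template into the genome."""
    return genome[:start] + replacement + genome[end:]


# Toy example: replace the 4-nt block "CCCC" with "GATC" using 4-nt arms.
toy_genome = "AAAACCCCGGGGTTTT"
print(build_hdr_template(toy_genome, 4, 8, "GATC", arm_len=4))  # AAAAGATCGGGG
print(expected_edited_locus(toy_genome, 4, 8, "GATC"))          # AAAAGATCGGGGTTTT
```

Keeping the template builder separate from the expected-locus function makes it easy to compare a sequenced edit against the intended outcome.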
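Whether an NHEJ-derived indel within a coding sequence shifts the reading frame depends only on whether its length is a multiple of three. The sketch below applies a 1-nt deletion to a hypothetical open reading frame and scans the new frame for the first stop codon. It is a simplified illustration under stated assumptions (standard stop codons, a spliced coding sequence, illustrative function names); it ignores splice sites and nonsense-mediated decay, which also affect whether an edit causes loss of function.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3.

    Positive values denote insertions and negative values deletions.
    """
    return indel_length % 3 != 0


def apply_deletion(cds: str, pos: int, length: int) -> str:
    """Delete `length` nucleotides starting at 0-based position `pos`."""
    return cds[:pos] + cds[pos + length:]


def first_stop_codon(cds: str) -> Optional[int]:
    """0-based index of the first in-frame stop codon, or None if absent."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Hypothetical 7-codon ORF: ATG GCT GAA TTA GCC AAA TAA
WILD_TYPE_CDS = "ATGGCTGAATTAGCCAAATAA"
mutant = apply_deletion(WILD_TYPE_CDS, 3, 1)  # 1-nt deletion after the start codon
print(is_frameshift(-1))                      # True
print(first_stop_codon(WILD_TYPE_CDS))        # 6 (the native stop)
print(first_stop_codon(mutant))               # 3 (premature stop, TAG)
```

In this toy case the single-nucleotide deletion moves the first in-frame stop from codon 6 to codon 3, the kind of premature stop described above.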
In some embodiments, gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems in the target plant. Gene editing may also involve genomic integration or episomal presence of the gene editing components or systems in the target plant.
In certain embodiments, the nucleic acid modification or mutation is effected by a (modified) zinc-finger nuclease (ZFN) system. The ZFN system uses artificial restriction enzymes generated by fusing a zinc-finger DNA-binding domain, which can be engineered to target desired DNA sequences, to a DNA-cleavage domain. Exemplary methods of genome editing using ZFNs can be found, for example, in U.S. Patent Nos. 6,534,261; 6,607,882; 6,746,838; 6,794,136; 6,824,978; 6,866,997; 6,933,113; and 6,979,539.
In certain embodiments, the nucleic acid modification is effected by a (modified) meganuclease. Meganucleases are endodeoxyribonucleases characterized by a large recognition site (double-stranded DNA sequences of 12 to 40 base pairs). Exemplary methods for using meganucleases can be found in US Patent Nos: 8,163,514; 8,133,697; 8,021,867; 8,119,361; 8,119,381; 8,124,369; and 8,129,134, which are specifically incorporated by reference.
In certain embodiments, the nucleic acid modification is effected by a (modified) CRISPR/Cas complex or system. In certain embodiments, the CRISPR/Cas system or complex is a class 2 CRISPR/Cas system. In certain embodiments, said CRISPR/Cas system or complex is a type II, type V, or type VI CRISPR/Cas system or complex. The CRISPR/Cas system does not require the generation of customized proteins to target specific sequences; rather, a single Cas protein can be programmed by a guide RNA (gRNA) to recognize a specific nucleic acid target. In other words, the Cas enzyme can be recruited to a specific nucleic acid target locus of interest (which may comprise or consist of RNA and/or DNA) using the short guide RNA.
In general, the CRISPR/Cas or CRISPR system, as used herein, refers collectively to transcripts and other elements involved in the expression of, or directing the activity of, CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene and one or more of: a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA); a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system); a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system); “RNA(s)” as that term is used herein (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and, where applicable, trans-activating (tracr) RNA, or a single guide RNA (sgRNA) (chimeric RNA)); or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
In certain embodiments, the gRNA is a chimeric guide RNA or single guide RNA (sgRNA). In certain embodiments, the gRNA comprises a guide sequence and a tracr mate sequence (or direct repeat). In certain embodiments, the gRNA comprises a guide sequence, a tracr mate sequence (or direct repeat), and a tracr sequence. In certain embodiments, the CRISPR/Cas system or complex as described herein does not comprise and/or does not rely on the presence of a tracr sequence (e.g., if the Cas protein is Cas12a).
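To illustrate how a guide sequence is matched to a target sequence, the sketch below scans the forward strand of a DNA sequence for candidate SpCas9 sites, i.e., 20-nt protospacers immediately followed by an NGG protospacer-adjacent motif (PAM). It is a simplified, hypothetical helper rather than a method described in this disclosure: it does not scan the reverse strand, does not score off-targets, and does not cover other Cas proteins with different PAM requirements (for example, Cas12a recognizes a T-rich PAM located 5' of the protospacer). The example sequence is made up.

```python
import re
from typing import List, Tuple


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Forward-strand SpCas9 candidate sites as (start, protospacer, PAM).

    A site is `spacer_len` nucleotides immediately followed by an NGG PAM.
    The zero-width lookahead lets overlapping sites all be reported.
    """
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(seq.upper())]


# Hypothetical sequence with one 20-nt protospacer followed by a TGG PAM.
example = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "CCCC"
print(find_spcas9_sites(example))  # [(5, 'GATTACAGATTACAGATTAC', 'TGG')]
```

The reported protospacer corresponds to the guide sequence (as DNA) that would be encoded in an sgRNA for that site under these assumptions.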
The Cas protein as referred to herein, such as but not limited to Cas9, Cas12a (formerly referred to as Cpf1), Cas12b (formerly referred to as C2c1), Cas13a (formerly referred to as C2c2), C2c3, or Cas13b protein, may originate from any suitable source, and hence may include different orthologues originating from a variety of (prokaryotic) organisms, as is well documented in the art. In certain embodiments, the Cas protein is (modified) Cas9, preferably (modified) Staphylococcus aureus Cas9 (SaCas9) or (modified) Streptococcus pyogenes Cas9 (SpCas9). In certain embodiments, the Cas protein is Cas12a, optionally from Acidaminococcus sp., such as Acidaminococcus sp. BV3L6 Cpf1 (AsCas12a), or Lachnospiraceae bacterium Cas12a, such as Lachnospiraceae bacterium MA2020 or Lachnospiraceae bacterium ND2006 (LbCas12a). See U.S. Pat. No. 10,669,540, incorporated herein by reference in its entirety. Alternatively, the Cas12a protein may be from Moraxella bovoculi AAX08_00205 [Mb2Cas12a] or Moraxella bovoculi AAX11_00205 [Mb3Cas12a]. See WO 2017/189308, incorporated herein by reference in its entirety. In certain embodiments, the Cas protein is (modified) C2c2, preferably Leptotrichia wadei C2c2 (LwC2c2) or Listeria newyorkensis FSL M6-0635 C2c2 (LbFSLC2c2). In certain embodiments, the (modified) Cas protein is C2c1. In certain embodiments, the (modified) Cas protein is C2c3. In certain embodiments, the (modified) Cas protein is Cas13b. Other Cas enzymes are available to a person skilled in the art.
Gene editing methods and compositions are also disclosed in US Pat. Nos. 10,519,456 and 10,285,348, the entire contents of which are herein incorporated by reference.
The gene-editing machinery (e.g., the DNA modifying enzyme) introduced into the plants can be controlled by any promoter that can drive recombinant gene expression in plants. In some embodiments, the promoter is a constitutive promoter. In some embodiments, the promoter is a tissue-specific promoter, e.g., a pollen-specific promoter, a sperm cell-specific promoter, a zygote-specific promoter, or a promoter that is highly expressed in sperm, eggs, and zygotes (e.g., prOsActin1). Suitable promoters are disclosed in U.S. Pat. No. 10,519,456, the entire content of which is herein incorporated by reference.
In another aspect, provided herein is a method of editing plant genomic DNA. In some embodiments, the method comprises using a first soybean plant expressing a DNA modification enzyme and at least one optional guide nucleic acid as described above to pollinate a target plant comprising genomic DNA to be edited.
V. Stacking
The various polynucleotides and variants thereof provided herein can be stacked with one or more polynucleotides encoding a desirable trait such as a polynucleotide that confers, for example, insect, disease or herbicide resistance or other desirable agronomic traits of interest including, but not limited to, traits associated with high oil content; increased digestibility; balanced amino acid content; and high energy content. Such traits may refer to properties of both seed and non-seed plant tissues, or to food or feed prepared from plants or seeds having such traits.
As used herein, gene or trait “stacking” is combining desired genes or traits into one transgenic plant line. As one approach, plant breeders stack transgenic traits by making crosses between parents that each have a desired trait and then identifying offspring that have both of these desired traits (so-called “breeding stacks” ) . Another way to stack genes is by transferring two or more genes into the cell nucleus of a plant at the same time during transformation. Another way to stack genes is by re-transforming a transgenic plant with another gene of interest. For example, gene stacking can be used to combine two different insect resistance traits, an insect resistance trait and a disease resistance trait, or a herbicide resistance trait (such as, for example, Bt11) . The use of a selectable marker in addition to a gene of interest would also be considered gene stacking.
In some embodiments, a nucleic acid molecule or vector of the disclosure can include an additional coding sequence for one or more polypeptides or double stranded RNA molecules (dsRNA) of interest for agronomic traits that primarily are of benefit to a seed company, grower or grain processor. A polypeptide of interest can be any polypeptide encoded by a nucleotide sequence of interest. Non-limiting examples of polypeptides of interest that are suitable for production in plants include those resulting in agronomically important traits such as herbicide resistance (also sometimes referred to as “herbicide tolerance” ) , virus resistance, bacterial pathogen resistance, insect resistance, nematode resistance, or fungal resistance. See, e.g., U.S. Patent Nos. 5,569,823; 5,304,730; 5,495,071; 6,329,504; and 6,337,431. The polypeptide also can be one that increases plant vigor or yield (including traits that allow a plant to grow at different temperatures, soil conditions and levels of sunlight and precipitation) , or one that allows identification of a plant exhibiting a trait of interest (e.g., a selectable marker, seed coat color, relative maturity group, etc. ) . Various polypeptides of interest, as well as methods for introducing these polypeptides into a plant, are described, for example, in US Patent Nos. 4,761,373; 4,769,061; 4,810,648; 4,940,835; 4,975,374; 5,013,659; 5,162,602; 5,276,268; 5,304,730; 5,495,071; 5,554,798; 5,561,236; 5,569,823; 5,767,366; 5,879,903, 5,928,937; 6,084,155; 6,329,504 and 6,337,431; as well as US Patent Publication No. 2001/0016956.
Polynucleotides conferring resistance/tolerance to an herbicide that inhibits the growing point or meristem, such as an imidazalinone or a sulfonylurea can also be suitable in some embodiments. Exemplary polynucleotides in this category code for mutant ALS and AHAS enzymes as described, e.g., in U.S. Patent Nos. 5,767,366 and 5,928,937. U.S. Patent Nos. 4,761,373 and 5,013,659 are directed to plants resistant to various imidazalinone or sulfonamide herbicides. U.S. Patent No. 4,975,374 relates to plant cells and plants containing a nucleic acid encoding a mutant glutamine synthetase (GS) resistant to inhibition by herbicides that are known to inhibit GS, e.g., phosphinothricin and methionine sulfoximine. U.S. Patent No. 5,162,602  discloses plants resistant to inhibition by cyclohexanedione and aryloxyphenoxypropanoic acid herbicides. The resistance is conferred by an altered acetyl coenzyme A carboxylase (ACCase) .
Polypeptides encoded by nucleotide sequences conferring resistance to glyphosate are also suitable for the disclosure. See, e.g., U.S. Patent Nos. 4,940,835 and 4,769,061. U.S. Patent No. 5,554,798 discloses transgenic glyphosate-resistant maize plants in which resistance is conferred by an altered 5-enolpyruvyl-3-phosphoshikimate (EPSP) synthase gene.
Polynucleotides coding for resistance to phosphono compounds such as glufosinate ammonium or phosphinothricin, and to pyridinoxy or phenoxy propionic acids and cyclohexones, are also suitable. See European Patent Application No. 0 242 246; see also U.S. Patent Nos. 5,879,903, 5,276,268, and 5,561,236.
Other suitable polynucleotides include those coding for resistance to herbicides that inhibit photosynthesis, such as a triazine and a benzonitrile (nitrilase). See U.S. Patent No. 4,810,648. Additional suitable polynucleotides coding for herbicide resistance include those coding for resistance to 2,2-dichloropropionic acid, sethoxydim, haloxyfop, imidazolinone herbicides, sulfonylurea herbicides, triazolopyrimidine herbicides, s-triazine herbicides and bromoxynil. Also suitable are polynucleotides that confer resistance to a protox enzyme; enhanced resistance to plant diseases; enhanced tolerance of adverse environmental conditions (abiotic stresses), including but not limited to drought, excessive cold, excessive heat, excessive soil salinity, or extreme acidity or alkalinity; or alterations in plant architecture or development, including changes in developmental timing. See, e.g., U.S. Patent Publication No. 2001/0016956 and U.S. Patent No. 6,084,155.
Additional suitable polynucleotides include those coding for insecticidal polypeptides. These polypeptides may be produced in amounts sufficient to control, for example, insect pests (i.e., insect-controlling amounts). It is recognized that the amount of an insecticidal polypeptide that must be produced in a plant to control insects or other pests may vary depending upon the cultivar, type of pest, environmental factors and the like. Polynucleotides useful for additional insect or pest resistance include, for example, those that encode toxins identified in Bacillus organisms. Polynucleotides comprising nucleotide sequences encoding Bacillus thuringiensis (Bt) Cry proteins from several subspecies have been cloned, and recombinant clones have been found to be toxic to lepidopteran, dipteran and/or coleopteran insect larvae. Examples of such Bt insecticidal proteins include Cry proteins such as Cry1Aa, Cry1Ab, Cry1Ac, Cry1B, Cry1C, Cry1D, Cry1Ea, Cry1Fa, Cry3A, Cry9A, Cry9B, Cry9C, and the like, as well as vegetative insecticidal proteins such as Vip1, Vip2, Vip3, and the like. A full list of Bt-derived proteins can be found in the Bacillus thuringiensis Toxin Nomenclature Database maintained by the University of Sussex (see also Crickmore et al. (1998) Microbiol. Mol. Biol. Rev. 62: 807-813).
In some embodiments, an additional polypeptide is an insecticidal polypeptide derived from a non-Bt source, including without limitation an alpha-amylase, a peroxidase, a cholesterol oxidase, a patatin, a protease, a protease inhibitor, a urease, an alpha-amylase inhibitor, a pore-forming protein, a chitinase, a lectin, an engineered antibody or antibody fragment, a Bacillus cereus insecticidal protein, a Xenorhabdus spp. (such as X. nematophila or X. bovienii) insecticidal protein, a Photorhabdus spp. (such as P. luminescens or P. asymbiotica) insecticidal protein, a Brevibacillus spp. (such as B. laterosporus) insecticidal protein, a Lysinibacillus spp. (such as L. sphaericus) insecticidal protein, a Chromobacterium spp. (such as C. subtsugae or C. piscinae) insecticidal protein, a Yersinia spp. (such as Y. entomophaga) insecticidal protein, a Paenibacillus spp. (such as P. propylaea) insecticidal protein, a Clostridium spp. (such as C. bifermentans) insecticidal protein, a Pseudomonas spp. (such as P. fluorescens) insecticidal protein, and a lignin.
Polypeptides that are suitable for production in plants further include those that improve or otherwise facilitate the conversion of harvested plants or plant parts into a commercially useful product, including, for example, increased or altered carbohydrate content or distribution, improved fermentation properties, increased oil content, increased protein content, modified oil profile, improved digestibility, and increased nutraceutical content, e.g., increased phytosterol content, increased tocopherol content, increased stanol content or increased vitamin content. Polypeptides of interest also include, for example, those resulting in or contributing to a reduced content of an unwanted component in a harvested crop, e.g., phytic acid or sugar-degrading enzymes. By “resulting in” or “contributing to” is intended that the polypeptide of interest can directly or indirectly contribute to the existence of a trait of interest (e.g., increasing cellulose degradation by the use of a heterologous cellulase enzyme).
In some embodiments, the polypeptide contributes to improved digestibility for food or feed. Xylanases are hemicellulolytic enzymes that improve the breakdown of plant cell walls, which leads to better utilization of plant nutrients by an animal and, in turn, to improved growth rate and feed conversion. The viscosity of feeds containing xylan can also be reduced. Heterologous production of xylanases in plant cells also can facilitate lignocellulosic conversion to fermentable sugars in industrial processing.
Numerous xylanases from fungal and bacterial microorganisms have been identified and characterized (see, e.g., U.S. Patent No. 5,437,992; Coughlin et al. (1993) “Proceedings of the Second TRICEL Symposium on Trichoderma reesei Cellulases and Other Hydrolases” Espoo; Suominen and Reinikainen, eds. (1993) Foundation for Biotechnical and Industrial Fermentation Research 8: 125-135; U.S. Patent Publication No. 2005/0208178; and PCT Publication No. WO 03/16654). In particular, three specific xylanases (XYL-I, XYL-II, and XYL-III) have been identified in T. reesei (Tenkanen et al. (1992) Enzyme Microb. Technol. 14: 566; Torronen et al. (1992) Bio/Technology 10: 1461; and Xu et al. (1998) Appl. Microbiol. Biotechnol. 49: 718).
In other embodiments, a polypeptide useful for the disclosure can be a polysaccharide degrading enzyme. Plants of this disclosure producing such an enzyme may be useful for generating, for example, fermentation feedstocks for bioprocessing. In some embodiments, enzymes useful for a fermentation process include alpha amylases, proteases, pullulanases, isoamylases, cellulases, hemicellulases, xylanases, cyclodextrin glycotransferases, lipases, phytases, laccases, oxidases, esterases, cutinases, granular starch hydrolyzing enzyme and other glucoamylases.
Polysaccharide-degrading enzymes include: starch-degrading enzymes such as α-amylases (EC 3.2.1.1), glucuronidases (EC 3.2.1.131); exo-1,4-α-D-glucanases such as amyloglucosidases and glucoamylase (EC 3.2.1.3), β-amylases (EC 3.2.1.2), α-glucosidases (EC 3.2.1.20), and other exo-amylases; starch debranching enzymes, such as a) isoamylase (EC 3.2.1.68), pullulanase (EC 3.2.1.41), and the like; b) cellulases such as exo-1,4-β-cellobiohydrolase (EC 3.2.1.91), exo-1,3-β-D-glucanase (EC 3.2.1.39), β-glucosidase (EC 3.2.1.21); c) L-arabinases, such as endo-1,5-α-L-arabinase (EC 3.2.1.99), α-arabinosidases (EC 3.2.1.55) and the like; d) galactanases such as endo-1,4-β-D-galactanase (EC 3.2.1.89), endo-1,3-β-D-galactanase (EC 3.2.1.90), α-galactosidase (EC 3.2.1.22), β-galactosidase (EC 3.2.1.23) and the like; e) mannanases, such as endo-1,4-β-D-mannanase (EC 3.2.1.78), β-mannosidase (EC 3.2.1.25), α-mannosidase (EC 3.2.1.24) and the like; f) xylanases, such as endo-1,4-β-xylanase (EC 3.2.1.8), β-D-xylosidase (EC 3.2.1.37), 1,3-β-D-xylanase, and the like; and g) other enzymes such as α-L-fucosidase (EC 3.2.1.51), α-L-rhamnosidase (EC 3.2.1.40), levanase (EC 3.2.1.65), inulanase (EC 3.2.1.7), and the like. In one embodiment, the α-amylase is the synthetic α-amylase Amy797E, described in U.S. Patent No. 8,093,453, herein incorporated by reference in its entirety.
Further enzymes that may be used with the disclosure include proteases, such as fungal and bacterial proteases. Fungal proteases include, but are not limited to, those obtained from Aspergillus, Trichoderma, Mucor and Rhizopus, such as A. niger, A. awamori, A. oryzae and M. miehei. In some embodiments, the polypeptides of this disclosure can be cellobiohydrolase (CBH) enzymes (EC 3.2.1.91). In one embodiment, the cellobiohydrolase enzyme can be CBH1 or CBH2.
Other enzymes useful with the disclosure include, but are not limited to, hemicellulases, such as mannanases and arabinofuranosidases (EC 3.2.1.55); ligninases; lipases (e.g., EC 3.1.1.3), glucose oxidases, pectinases, xylanases, transglucosidases, alpha-1,6-glucosidases (e.g., EC 3.2.1.20); esterases such as ferulic acid esterase (EC 3.1.1.73) and acetyl xylan esterases (EC 3.1.1.72); and cutinases (e.g., EC 3.1.1.74).
Double-stranded RNA molecules useful with the disclosure include, but are not limited to, those that suppress target insect genes. As used herein, the term "gene suppression" refers to any of the well-known methods for reducing the levels of protein produced as a result of gene transcription to mRNA and subsequent translation of the mRNA. Gene suppression is also intended to mean the reduction of protein expression from a gene or a coding sequence, including posttranscriptional gene suppression and transcriptional suppression. Posttranscriptional gene suppression is mediated by homology between all or part of an mRNA transcribed from a gene or coding sequence targeted for suppression and the corresponding double-stranded RNA used for suppression, and refers to the substantial and measurable reduction of the amount of mRNA available in the cell for binding by ribosomes. The transcribed RNA can be in the sense orientation to effect what is called co-suppression, in the antisense orientation to effect what is called antisense suppression, or in both orientations, producing a dsRNA, to effect what is called RNA interference (RNAi). Transcriptional suppression is mediated by the presence in the cell of a dsRNA gene suppression agent exhibiting substantial sequence identity to a promoter DNA sequence, or the complement thereof, to effect what is referred to as promoter trans suppression. Gene suppression may be effective against a native plant gene associated with a trait, e.g., to provide plants with reduced levels of a protein encoded by the native gene or with enhanced or reduced levels of an affected metabolite. Gene suppression can also be effective against target genes in plant pests that may ingest or contact plant material containing gene suppression agents specifically designed to inhibit or suppress the expression of one or more homologous or complementary sequences in the cells of the pest.
Such genes targeted for suppression can encode an essential protein, the predicted function of which is selected from the group consisting of muscle formation, juvenile hormone formation, juvenile hormone regulation, ion regulation and transport, digestive enzyme synthesis, maintenance of cell membrane potential, amino acid biosynthesis, amino acid degradation, sperm formation, pheromone synthesis, pheromone sensing, antennae formation, wing formation, leg formation, development and differentiation, egg formation, larval maturation, digestive enzyme formation, hemolymph synthesis, hemolymph maintenance, neurotransmission, cell division, energy metabolism, respiration, and apoptosis.
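Because posttranscriptional suppression depends on sequence homology between the dsRNA and the target mRNA, a simple first check when designing a dsRNA against a pest gene is to list the stretches the two share. The sketch below assumes that the dsRNA is processed into 21-nt small RNAs (a common, but not the only, small-RNA length) and reports every 21-nt window of the dsRNA sense strand that occurs exactly in the target transcript. The function names and the toy sequence are hypothetical and are not taken from this disclosure.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return every length-k window of seq, treating RNA (U) and DNA (T) alike."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_windows(dsrna_sense: str, target_mrna: str, k: int = 21) -> set[str]:
    """Windows of the dsRNA sense strand that match the target exactly.

    k = 21 is an assumed small-RNA length used for illustration only.
    """
    return kmers(dsrna_sense, k) & kmers(target_mrna, k)


if __name__ == "__main__":
    # Hypothetical target transcript fragment and a dsRNA covering part of it.
    target = "ATGGCTAGCAAGGAGGAACTTTTCACTGGAGTTGTCCCAATTCTTGTTGAATTAGATGG"
    dsrna = target[5:40].replace("T", "U")  # 35-nt sense strand written as RNA
    hits = shared_windows(dsrna, target)
    print(len(hits))  # 35 - 21 + 1 = 15 matching windows
```

Exact 21-nt matches are only a coarse proxy for homology; the sketch ignores mismatch tolerance and does not look at the antisense strand separately.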
In one non-limiting embodiment, the polynucleotides provided herein are stacked with other polynucleotides that increase protein content, amino acid content, and/or oil content, and/or modify oil profile, including, for example, the polynucleotides set forth in METHODS AND COMPOSITIONS FOR INCREASING PROTEIN AND/OR OIL CONTENT AND MODIFYING OIL PROFILE IN A PLANT, International Application No. ____________, filed ______, 2022 (Attorney Docket No. 086879-1262815; Syngenta Ref. No. 82423-WO-REG-ORG-P-1), which is filed concurrently herewith and is herein incorporated by reference in its entirety.
As used herein, “selectable marker” means a nucleotide sequence that, when expressed, imparts a distinct phenotype to the plant, plant part and/or plant cell expressing the marker and thus allows such transformed plants, plant parts and/or plant cells to be distinguished from those that do not have the marker. Such a nucleotide sequence may encode either a selectable or a screenable marker, depending on whether the marker confers a trait that can be selected for by chemical means, such as by using a selective agent (e.g., an antibiotic, herbicide, or the like), or whether the marker is simply a trait that one can identify through observation or testing, such as by screening (e.g., the R-locus trait). Selectable markers can also include the markers associated with oil and/or protein content and fatty acid profile (e.g., as described in Whiting, R.M., et al., BMC Plant Biol. 2020 Oct 23; 20(1): 485).
VI. Marker-assisted selection of plants with improved traits
In addition to phenotypic traits, the genetic characteristics of a plant, as represented by its genetic marker profile, can be used to select plants with desired traits. The term “marker-based selection” refers to the use of genetic markers to detect one or more nucleic acids in a plant that are associated with a trait of interest, in order to identify plants that carry genes for desirable (or undesirable) traits. Markers include but are not limited to Restriction Fragment Length Polymorphisms (RFLPs), Randomly Amplified Polymorphic DNAs (RAPDs), Arbitrarily Primed Polymerase Chain Reaction (AP-PCR), DNA Amplification Fingerprinting (DAF), Sequence Characterized Amplified Regions (SCARs), Amplified Fragment Length Polymorphisms (AFLPs), Simple Sequence Repeats (SSRs), which are also referred to as microsatellites, and Single Nucleotide Polymorphisms (SNPs). There are known sets of public markers that are being examined by ASTA and other industry groups for their applicability in standardizing determinations of what constitutes an essentially derived variety under the US Plant Variety Protection Act. However, these standard markers do not limit the type of marker and marker profile that can be employed in breeding or developing backcross conversions, in distinguishing varieties, plant parts or plant cells, or in verifying a progeny pedigree. Primers and PCR protocols for assaying these and other markers are disclosed in SoyBase (sponsored by the USDA Agricultural Research Service and Iowa State University), located on the world wide web at 129.186.26.94/SSR.html.
The term “associated with” as used herein refers to a recognizable and/or detectable relationship between two entities. For example, the phrase “associated with increased protein content” refers to a trait, locus, gene, allele, marker, phenotype, etc., or the expression product thereof, the presence or absence of which can influence or indicate an extent and/or degree to which a plant or its progeny exhibits increased protein content as compared to a control plant. As such, a marker is “associated with” a trait when it is linked to it and when the presence of the marker is an indicator of whether and/or to what extent the desired trait or trait form will occur in a plant or germplasm comprising the marker. Similarly, a marker is “associated with” an allele when it is linked to it and when the presence (or absence) of the marker is an indicator of whether the allele is present (or absent) in a plant, germplasm, or population comprising the marker. For example, “a marker associated with increased protein content” refers to a marker whose presence or absence can be used to predict whether and/or to what extent a plant will display increased protein content as compared to a control plant.
The term “allele(s)” refers to any of one or more alternative forms of a gene, all of which relate to at least one trait or characteristic. In a diploid cell, the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes.
The term “genotype” and variants thereof refer to the genetic composition of an organism, including, for example, whether a diploid organism is heterozygous (i.e., has two different alleles for a given gene or QTL) or homozygous (i.e., has the same allele for a given gene or QTL) at one or more genes or loci (e.g., a SNP, a haplotype, a gene mutation, an insertion, or a deletion).
In one embodiment, the markers used to identify the plants comprising the polynucleotides disclosed herein are SNPs. Non-limiting examples of SNP genotyping methods include hybridization, primer extension, oligonucleotide ligation, nuclease cleavage, minisequencing and coded spheres. Such methods are well known and disclosed in, e.g., Gut, I. G., Hum. Mutat. 17: 475-492 (2001); Shi, Clin. Chem. 47(2): 164-172 (2001); Kwok, Pharmacogenomics 1(1): 95-100 (2000); and Bhattramakki and Rafalski, Discovery and application of single nucleotide polymorphism markers in plants, in PLANT GENOTYPING: THE DNA FINGERPRINTING OF PLANTS, CABI Publishing, Wallingford (2001). A wide range of commercially available technologies utilize these and other methods to interrogate SNPs, including Masscode™ (Qiagen, Germantown, MD), (Hologic, Madison, WI), (Applied Biosystems, Foster City, CA), (Applied Biosystems, Foster City, CA) and Beadarrays™ (Illumina, San Diego, CA).
In some embodiments, an assay (e.g., generally a two-step allelic discrimination assay or similar), a KASP™ assay (generally a one-step allelic discrimination assay as defined below, or similar), or both can be employed to identify the SNPs that are associated with increased protein content, increased oil content, and/or modified oil profile as disclosed herein (e.g., favorable alleles as depicted in Tables 2-5 below). In an exemplary two-step assay, a forward primer, a reverse primer, and two assay probes (or hybridization oligos) that recognize two different alleles at the SNP site are employed. The forward and reverse primers are employed to amplify genetic loci comprising SNPs that are associated with increased protein content, increased oil content, and/or modified oil profile (for example, any of the favorable alleles shown in Tables 2-5 below). The particular nucleotides present at the SNP positions are then assayed using the probes. In some embodiments, the assay probes and the reaction conditions are designed such that an assay probe will only hybridize to the reverse complement of a 100% (perfectly) matched sequence, thereby permitting identification of which allele(s) are present based upon detection of hybridization. In some embodiments, the probes are differentially labeled with, for example, fluorophores to permit distinguishing between the two assay probes in a single reaction. Exemplary methods of amplification include the polymerase chain reaction (PCR) or the ligase chain reaction (LCR), using a nucleic acid isolated from a soybean plant or germplasm as a template.
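To illustrate how the signals of the two differentially labeled probes translate into a genotype, the sketch below assigns a diploid call from the background-corrected end-point signal of each allele-specific probe. The cut-offs (`min_total`, `homo_fraction`) are hypothetical placeholders rather than values from this disclosure; in practice, calls are usually made by clustering many samples together with known controls. The example uses the C/T alleles of SNP #4 from Table 3.

```python
def call_genotype(signal_1: float, signal_2: float, allele_1: str, allele_2: str,
                  min_total: float = 0.5, homo_fraction: float = 0.8) -> str:
    """Call a SNP genotype from two allele-specific probe signals.

    signal_1 / signal_2 are the signals of the probes that recognize
    allele_1 / allele_2 (hypothetical normalized units).
    """
    total = signal_1 + signal_2
    if total < min_total:
        return "no call"  # too little signal from either probe
    fraction_1 = signal_1 / total
    if fraction_1 >= homo_fraction:
        return f"{allele_1}/{allele_1}"  # essentially only probe 1 hybridized
    if fraction_1 <= 1 - homo_fraction:
        return f"{allele_2}/{allele_2}"  # essentially only probe 2 hybridized
    return f"{allele_1}/{allele_2}"  # both probes hybridized: heterozygous


def is_heterozygous(genotype: str) -> bool:
    """True for calls such as 'C/T'; False for homozygous calls or 'no call'."""
    if "/" not in genotype:
        return False
    first, second = genotype.split("/")
    return first != second


# Hypothetical signals for SNP #4 (reference C, alternative T).
for s1, s2 in [(0.92, 0.04), (0.05, 0.90), (0.50, 0.45), (0.10, 0.10)]:
    print(call_genotype(s1, s2, "C", "T"))
# prints C/C, T/T, C/T and no call, one per line
```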
In some embodiments, a number of SNP alleles together within a sequence, or across linked sequences, can be used to describe a haplotype for any particular genotype (Ching et al., BMC Genet. 3: 19 (2002); Gupta et al., Curr. Sci. 80: 524-535 (2001); Rafalski, Plant Sci. 162: 329-333 (2002)). In some cases, haplotypes can be more informative than single SNPs and can be more descriptive of any particular genotype. For example, a single SNP may carry allele “T” in a specific disease-resistant line or variety, but allele “T” might also occur in the soybean breeding population being used for recurrent parents. In this case, a combination of alleles at linked SNPs may be more informative. Once a unique haplotype has been assigned to a donor chromosomal region, that haplotype can be used in that population or any subset thereof to determine whether an individual has a particular gene. Automated high-throughput marker detection platforms known to those of ordinary skill in the art make this process highly efficient and effective.
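The Block 1 haplotypes listed in Table 3 illustrate the point: the allele at SNP #1 alone cannot separate Hap_1 from Hap_2, whereas the alleles at all five linked SNPs identify each listed haplotype uniquely. The dictionary layout and function names in this sketch are illustrative assumptions.

```python
from __future__ import annotations

# Block 1 haplotypes from Table 3; string positions 0-4 correspond to SNPs #1-#5.
BLOCK1 = {"Hap_1": "CTCCC", "Hap_2": "CCTTA", "Hap_3": "TTCCC"}
SNP_IDS = [1, 2, 3, 4, 5]


def haplotypes_with_allele(snp_id: int, allele: str) -> list[str]:
    """Haplotypes that carry the given allele at a single SNP."""
    idx = SNP_IDS.index(snp_id)
    return sorted(name for name, seq in BLOCK1.items() if seq[idx] == allele)


def match_haplotype(alleles: str) -> list[str]:
    """Haplotypes whose alleles at all five SNPs match the observed string."""
    return sorted(name for name, seq in BLOCK1.items() if seq == alleles)


print(haplotypes_with_allele(1, "C"))  # ['Hap_1', 'Hap_2'] -> ambiguous
print(match_haplotype("CCTTA"))        # ['Hap_2'] -> unique
```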
The term “haplotype” can refer to the set of alleles an individual inherited from one parent. A diploid individual thus has two haplotypes. The term “haplotype” can be used in a more limited sense to refer to physically linked and/or unlinked genetic markers (e.g., sequence polymorphisms) associated with a phenotypic trait. The phrase “haplotype block” (sometimes also referred to in the literature simply as a haplotype) refers to a group of two or more genetic markers that are physically linked on a single chromosome (or a portion thereof) . Typically, each block has a few common haplotypes, and a subset of the genetic markers (i.e., a “haplotype tag” ) can be chosen that uniquely identifies each of these haplotypes.
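A haplotype tag can be chosen by searching for the smallest subset of SNPs whose alleles still distinguish the common haplotypes. The brute-force sketch below applies this idea to the Block 1 and Block 2 haplotypes of Tables 3 and 4. It considers only the haplotypes listed there, so rarer haplotypes present in a breeding population might require additional tag SNPs, and when several subsets of the same size work it simply returns the first one found. The function name is hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def minimal_tag_snps(haplotypes: dict[str, str], snp_ids: list[int]) -> tuple[int, ...]:
    """Smallest set of SNPs whose alleles distinguish every listed haplotype.

    Tries SNP subsets of increasing size; adequate for small blocks only.
    """
    n = len(snp_ids)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            patterns = {"".join(seq[i] for i in subset) for seq in haplotypes.values()}
            if len(patterns) == len(haplotypes):
                return tuple(snp_ids[i] for i in subset)
    raise ValueError("haplotypes cannot be distinguished with these SNPs")


BLOCK1 = {"Hap_1": "CTCCC", "Hap_2": "CCTTA", "Hap_3": "TTCCC"}  # Table 3
BLOCK2 = {"Hap_4": "CGGTGGACGACT", "Hap_5": "GAATACGTTGAC"}      # Table 4

print(minimal_tag_snps(BLOCK1, [1, 2, 3, 4, 5]))     # (1, 2)
print(minimal_tag_snps(BLOCK2, list(range(7, 19))))  # (7,)
```

For Block 1, no single SNP separates all three haplotypes, but SNPs #1 and #2 together do; for Block 2, any SNP except #10 would serve as a tag.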
Exemplary markers that are associated with and can be used to identify plants having increased protein content and/or increased oil content are shown in Tables 2-5.
Table 2. SNP sites in Glyma.06G303700
The 20 SNPs shown in Table 2 can be divided into three blocks using the Haploview 4.2 software. The SNPs within each block showed strong linkage disequilibrium. Block 1 contains SNPs #1-#5, of which SNP #4 is located in the coding sequence (CDS). Block 2 contains the 12 SNPs #7-#18, of which SNPs #7 and #8 are located in the CDS. Block 3 contains SNPs #19 and #20, both of which are outside the CDS.
SNP genotyping revealed seven different haplotypes that are associated with increased protein content and/or increased oil content. Tables 3-5 show the genotype of each haplotype.
Table 3. Block 1 haplotypes
SNP # 1 2 3 4 5 Frequency
Ref C T C C C  
Hap_1 C T C C C 71.07%
Hap_2 C C T T A 12.60%
Hap_3 T T C C C 7.31%
Table 4. Block 2 haplotypes
SNP # 7 8 9 10 11 12 13 14 15 16 17 18 Frequency
Ref C G G T G G A C G A C T  
Hap_4 C G G T G G A C G A C T 58.01%
Hap_5 G A A T A C G T T G A C 20.68%
Table 5. Block 3 haplotypes
SNP # 19 20 Frequency
Ref C T
Hap_6 C T 64.23%
Hap_7 T C 35.93%
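The strong linkage disequilibrium (LD) within blocks can be quantified for a pair of biallelic SNPs from two-locus haplotype frequencies using the textbook statistics D = p_AB − p_A·p_B, D′ = |D|/D_max, and r² = D²/[p_A(1−p_A)p_B(1−p_B)]. As a worked example, the sketch below computes these for SNPs #19 and #20 from the Block 3 frequencies in Table 5. Because those frequencies sum to slightly more than 100% (presumably through rounding), they are renormalized first. Only two of the four possible two-SNP haplotypes are listed, so D′ and r² both come out at 1, i.e., complete LD. This is an illustration of the standard formulas and does not reproduce Haploview's block-definition procedure.

```python
from __future__ import annotations


def ld_stats(hap_freqs: dict[tuple[str, str], float],
             allele_a: str, allele_b: str) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic SNPs.

    hap_freqs maps (allele at SNP 1, allele at SNP 2) -> frequency; unobserved
    two-locus haplotypes may be omitted. allele_a / allele_b are the alleles
    whose joint frequency defines D.
    """
    total = sum(hap_freqs.values())
    freqs = {hap: f / total for hap, f in hap_freqs.items()}  # renormalize to 1
    p_a = sum(f for (x, _), f in freqs.items() if x == allele_a)
    p_b = sum(f for (_, y), f in freqs.items() if y == allele_b)
    p_ab = freqs.get((allele_a, allele_b), 0.0)
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Table 5 (Block 3): Hap_6 = C at SNP #19 and T at SNP #20; Hap_7 = T and C.
table5 = {("C", "T"): 0.6423, ("T", "C"): 0.3593}
d, d_prime, r2 = ld_stats(table5, allele_a="C", allele_b="T")
print(f"D = {d:.4f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
# D' and r^2 both evaluate to 1 (up to floating-point rounding)
```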
As shown in the examples, haplotypes Hap_2, Hap_3, and Hap_6 were found to be associated with increased protein content, and haplotypes Hap_1, Hap_2, Hap_5, and Hap_7 were found to be associated with increased oil content. Hap_2 was associated with both increased oil content and increased protein content. See FIGS. 13-15.
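The haplotype definitions in Tables 3-5 and the associations reported in the examples can be combined into a simple lookup. The sketch below assumes homozygous, fully called SNP genotypes (as is typical of inbred soybean lines), assigns each block to the listed haplotype whose alleles match exactly, and reports which of the assigned haplotypes were associated with increased protein or oil content. Unlisted, heterozygous or incomplete patterns are returned as None; the data layout and function names are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Haplotype definitions from Tables 3-5: (SNP numbers, {haplotype: alleles}).
BLOCKS = {
    "Block 1": ([1, 2, 3, 4, 5],
                {"Hap_1": "CTCCC", "Hap_2": "CCTTA", "Hap_3": "TTCCC"}),
    "Block 2": (list(range(7, 19)),
                {"Hap_4": "CGGTGGACGACT", "Hap_5": "GAATACGTTGAC"}),
    "Block 3": ([19, 20], {"Hap_6": "CT", "Hap_7": "TC"}),
}
# Associations as reported in the examples.
PROTEIN_HAPS = {"Hap_2", "Hap_3", "Hap_6"}
OIL_HAPS = {"Hap_1", "Hap_2", "Hap_5", "Hap_7"}


def assign_haplotypes(calls: dict[int, str]) -> dict[str, Optional[str]]:
    """Map homozygous SNP calls (SNP number -> allele) to one haplotype per block."""
    result: dict[str, Optional[str]] = {}
    for block, (snps, haps) in BLOCKS.items():
        observed = "".join(calls.get(s, "N") for s in snps)  # N marks a missing call
        result[block] = next((name for name, seq in haps.items() if seq == observed), None)
    return result


def trait_haplotypes(assigned: dict[str, Optional[str]]) -> dict[str, list[str]]:
    """List the assigned haplotypes that were associated with each trait."""
    found = {h for h in assigned.values() if h is not None}
    return {"increased protein": sorted(found & PROTEIN_HAPS),
            "increased oil": sorted(found & OIL_HAPS)}


# Hypothetical line carrying Hap_2, Hap_5 and Hap_6.
line_calls = dict(zip([1, 2, 3, 4, 5], "CCTTA"))
line_calls.update(zip(range(7, 19), "GAATACGTTGAC"))
line_calls.update(zip([19, 20], "CT"))
assigned = assign_haplotypes(line_calls)
print(assigned)  # {'Block 1': 'Hap_2', 'Block 2': 'Hap_5', 'Block 3': 'Hap_6'}
print(trait_haplotypes(assigned))
# {'increased protein': ['Hap_2', 'Hap_6'], 'increased oil': ['Hap_2', 'Hap_5']}
```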
These SNP markers can be used in a marker-assisted breeding program to move traits, such as native traits, traits conferred by transgenes, or traits conferred by genome editing, into a desired plant background. As used herein, the term “native trait” refers to a trait already existing in germplasm, including wild relatives of crop species, or that can be produced by recombination of existing traits. For example, progeny plants from a cross between a donor soybean plant comprising in its genome a nucleic acid sequence encoding SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant of any one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and a recipient soybean plant not comprising said nucleic acid sequence can be screened to detect the presence of the markers associated with increased protein content, increased oil content, and/or modified oil profile. Plants comprising said markers can be selected and verified for increased protein content, increased oil content, and/or modified oil profile as compared to control plants. In some embodiments, the donor plant comprises a nucleic acid sequence encoding SEQ ID NO: 3 and the markers are those listed in Table 2. In some embodiments, the markers that can be used to select plants having increased protein content are the alleles associated with one or more of haplotypes Hap_2, Hap_3, or Hap_6. In some embodiments, the markers that can be used to select plants having increased oil content are the alleles associated with one or more of haplotypes Hap_1, Hap_2, Hap_5, or Hap_7. The favorable alleles of the SNPs are those present in one or more of the aforementioned haplotypes.
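The screening step described above amounts to ranking progeny by how many copies of a favorable haplotype they carry. The sketch below uses the protein-associated haplotypes reported in the examples (Hap_2, Hap_3 and Hap_6), hypothetical F2 progeny identifiers, and Block 1 haplotype pairs that are assumed to have been inferred already from SNP genotypes. Plants selected this way would still need to be verified phenotypically against control plants.

```python
from __future__ import annotations

# Protein-associated haplotypes as reported in the examples.
FAVORABLE_PROTEIN = {"Hap_2", "Hap_3", "Hap_6"}


def favorable_copies(pair: tuple[str, str], favorable: set[str]) -> int:
    """Number of favorable haplotypes (0, 1 or 2) carried by a diploid plant."""
    return sum(h in favorable for h in pair)


def select_progeny(progeny: dict[str, tuple[str, str]], favorable: set[str],
                   min_copies: int = 1) -> list[tuple[str, int]]:
    """Keep progeny with at least min_copies favorable haplotypes, best first."""
    scored = [(pid, favorable_copies(pair, favorable)) for pid, pair in progeny.items()]
    kept = [item for item in scored if item[1] >= min_copies]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical F2 progeny with their Block 1 haplotype pairs.
progeny = {
    "F2-001": ("Hap_1", "Hap_2"),
    "F2-002": ("Hap_2", "Hap_2"),
    "F2-003": ("Hap_1", "Hap_3"),
    "F2-004": ("Hap_1", "Hap_1"),
}
print(select_progeny(progeny, FAVORABLE_PROTEIN))
# [('F2-002', 2), ('F2-001', 1), ('F2-003', 1)]
```

Setting `min_copies=2` would keep only plants homozygous for favorable haplotypes, which may be preferable when fixing the trait in an inbred line.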
VII. Assay, kits, and primers
Also provided herein are kits and primers that can be used to introduce a polynucleotide sequence as described in this disclosure into a recipient plant, or to detect such a polynucleotide sequence in a plant.
Also provided herein are kits and primers that can be used to identify plants that have increased protein content, increased oil content, and/or modified oil profile. As a non-limiting example, the primers can include Glyma.06G303700-F: ATAACTAGTATGTTCCAGCCGAACC (SEQ ID NO: 40); and Glyma.06G303700-R: ATAGGATCCAGCAGGTTCACCAGA (SEQ ID NO: 41).
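Read 5′ to 3′, the forward primer (SEQ ID NO: 40) contains the motif ACTAGT immediately followed by an ATG codon, and the reverse primer (SEQ ID NO: 41) contains the motif GGATCC. These are the recognition sequences of SpeI and BamHI, respectively, which suggests, although the text does not say so, that the primers add restriction sites for cloning. The sketch below scans both primers against a small, deliberately incomplete table of recognition sites; the enzyme list and function name are illustrative assumptions.

```python
from __future__ import annotations

# A few common type II restriction enzymes and their recognition sequences
# (illustrative subset only).
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC", "SpeI": "ACTAGT", "XbaI": "TCTAGA"}


def find_sites(primer: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition site in the primer."""
    primer = primer.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(primer) - len(motif) + 1)
                     if primer[i:i + len(motif)] == motif]
        if positions:
            hits[enzyme] = positions
    return hits


forward = "ATAACTAGTATGTTCCAGCCGAACC"  # Glyma.06G303700-F, SEQ ID NO: 40
reverse = "ATAGGATCCAGCAGGTTCACCAGA"   # Glyma.06G303700-R, SEQ ID NO: 41
print(find_sites(forward))  # {'SpeI': [3]}
print(find_sites(reverse))  # {'BamHI': [3]}
print(forward[9:12])        # ATG, directly after the SpeI site
```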
Also provided herein are kits and primers that can be used to detect the expression level of the polypeptide disclosed herein in plants. As a non-limiting example, the primers can include Glyma.06G303700-q-F: AGTTGCACCGATTCAACAGGC (SEQ ID NO: 63); and Glyma.06G303700-q-R: CCATGCGATGTGGTTCCATCT (SEQ ID NO: 64).
Also provided herein are kits and primers that can be used to detect the expression level of the polypeptide disclosed herein in plants. As a non-limiting example, the primers can include Glyma.06G303700-q-F: AGTTGCACCGATTCAACAGGC (SEQ ID NO: 65); and Glyma.06G303700-q-R: CCATGCGATGTGGTTCCATCT (SEQ ID NO: 66).
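A quick sanity check on a qPCR primer pair is to compare GC content and an approximate melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), a rough estimate originally intended for short oligonucleotides; primer-design software normally uses nearest-neighbor thermodynamic models, so these numbers are indicative only. Both primers above are 21 nt long with 11 G/C bases, so the two estimates coincide.

```python
def gc_count(seq: str) -> int:
    """Number of G and C bases."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def gc_percent(seq: str) -> float:
    return 100.0 * gc_count(seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate Tm in deg C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    return 2 * at + 4 * gc_count(seq)


primers = {
    "Glyma.06G303700-q-F": "AGTTGCACCGATTCAACAGGC",
    "Glyma.06G303700-q-R": "CCATGCGATGTGGTTCCATCT",
}
for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, Tm ~{wallace_tm(seq)} C")
# both lines report 21 nt, GC 52.4%, Tm ~64 C
```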
In some embodiments, the kit may also comprise one or more probes having a sequence corresponding to or complementary to a sequence having 80% to 100% sequence identity with a specific region of the transgenic event or gene editing event. In some embodiments, the kit may comprise any reagent and material required to perform the assay or detection method.
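Whether a candidate probe falls within the 80% to 100% identity range can be approximated by an ungapped comparison of the probe, and of its reverse complement (for the "complementary" case), against every equally long window of the target region. This is a simplification that ignores insertions and deletions, which an alignment tool would handle. The sequences below are hypothetical stand-ins, and the function names are illustrative.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def percent_identity(a: str, b: str) -> float:
    """Ungapped identity between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_identity(probe: str, region: str) -> float:
    """Highest identity of the probe or its reverse complement to any window."""
    n = len(probe)
    if n > len(region):
        raise ValueError("probe is longer than the target region")
    windows = [region[i:i + n] for i in range(len(region) - n + 1)]
    return max(percent_identity(p, w)
               for p in (probe, reverse_complement(probe)) for w in windows)


def probe_qualifies(probe: str, region: str, min_identity: float = 80.0) -> bool:
    return best_identity(probe, region) >= min_identity


core = "AGTTGCACCGATTCAACAGGC"              # hypothetical target-specific stretch
region = "GGGG" + core + "TTTT"             # hypothetical target region
one_mismatch = core[:10] + "T" + core[11:]  # position 10 changed A -> T
print(best_identity(one_mismatch, region))                # about 95.24 (20 of 21 match)
print(probe_qualifies(reverse_complement(core), region))  # True
```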
EXEMPLARY EMBODIMENTS
Embodiment 1. An elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein content, increased oil content, and/or a modified oil profile on the elite Glycine max plant.
Embodiment 2. The elite Glycine max plant of embodiment 1, wherein the donor Glycine plant is from Glycine soja or Glycine max.
Embodiment 3. The elite Glycine max plant of embodiment 2, wherein the Glycine soja is the ZYD00006 variety.
Embodiment 4. The elite Glycine max plant of embodiment 1 or 2, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
Embodiment 5. The elite Glycine max plant of any one of embodiments 1-3, wherein the nucleic acid sequence has at least 90%, 95% or 100% sequence identity to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, or to a polynucleotide encoding any one of SEQ ID NO: 22 or 24-59.
Embodiment 6. The elite Glycine max plant of embodiment 1, wherein the polypeptide encoded by the nucleic acid sequence has at least 90%, or at least 95%, identity to SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 22, and wherein the polypeptide comprises one or more of the following: (i) a START domain, wherein the START domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 246-466 of SEQ ID NO: 20; or (ii) a homeodomain, wherein the homeodomain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 55-113 of SEQ ID NO: 20.
Embodiment 7. The elite Glycine max plant of any one of embodiments 1-6, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16 or 17, wherein the genome editing confers increased protein content, increased oil content, and/or a modified oil profile.
Embodiment 8. The elite Glycine max plant of any one of embodiments 1-6, wherein the nucleic acid sequence is introduced by genome editing of a Glycine max genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region of at least one allele change corresponding to any allele change described in Tables 21-23, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, wherein said one or more alleles confer in the plant increased protein and/or oil content, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein and/or oil content.
Embodiment 9. The elite Glycine max plant of embodiment 7 or 8, wherein the genomic editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
Embodiment 10. The elite Glycine max plant of any one of embodiments 1-6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of (a) a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 and a polynucleotide encoding any one of SEQ ID NO: 22 or 24-59, or (b) a nucleic acid sequence encoding at least one polypeptide set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein and/or oil content on the elite Glycine max plant.
Embodiment 11. The elite Glycine max plant of any of embodiments 1-10, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
Embodiment 12. The elite Glycine max plant of any of embodiments 1-10, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
Embodiment 13. The elite Glycine max plant of any one of embodiments 1-6, wherein at least one parental line of said elite Glycine max plant was selected or identified through molecular marker selection, wherein said parental line is selected or identified based on the presence of a molecular marker located within or closely linked with said nucleic acid sequence corresponding to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, or any portion thereof, wherein said molecular marker is associated with increased protein and/or oil content and/or modified oil profile.
Embodiment 14. The elite Glycine max plant of embodiment 13, wherein the molecular marker is a single nucleotide polymorphism (SNP) , a quantitative trait locus (QTL) , an amplified fragment length polymorphism (AFLP) , randomly amplified polymorphic DNA (RAPD) , a restriction fragment length polymorphism (RFLP) , or a microsatellite.
Embodiment 15. The elite Glycine max plant of embodiment 13 or 14, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
Embodiment 16. The elite Glycine max plant of any one of embodiments 1-15, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, and/or herbicide tolerance.
Embodiment 17. A plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 and a polynucleotide encoding any one of SEQ ID NO: 22, or 24-59, wherein said polypeptide confers increased protein and/or oil content and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
Embodiment 18. The plant of embodiment 17, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of a nucleic acid sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding any one of SEQ ID NO: 22, or 24-59.
Embodiment 19. The plant of embodiment 17 or 18, wherein the nucleic acid sequence is introduced by genome editing of a genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region of at least one allele change corresponding to any allele change described in Table 2, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6 and/or Hap_7, wherein said one or more alleles confer in the plant increased protein and/or oil content, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein and/or oil content.
Embodiment 20. The plant of embodiment 17, wherein the nucleic acid sequence is modified into said plant genome by duplication, inversion, promoter modification, terminator modification and/or splicing modification via genome editing of a nucleic acid sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 and a nucleic acid sequence encoding any one of SEQ ID NO: 22, or 24-59.
Embodiment 21. The plant of any one of embodiments 17-20, wherein the genomic editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
Embodiment 22. The plant of any one of embodiments 17-21, wherein the plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
Embodiment 23. The plant of any one of embodiments 17-21, wherein the plant has in its genome at least one genetic marker that is an allele associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
Embodiment 24. The plant of any one of embodiments 17-23, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
Embodiment 25. The plant of any one of embodiments 17-24, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
Embodiment 26. The plant of any one of embodiments 17-25, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.
Embodiment 27. The plant of embodiment 26, wherein the promoter is a native promoter or active variant or fragment thereof.
Embodiment 28. A plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having (a) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (b) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content and/or increased oil and/or modified oil profile as compared to a control plant.
Embodiment 29. The plant of embodiment 28, wherein (a) said nucleic acid sequence comprises at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or to a polynucleotide that encodes any one of SEQ ID NO: 22, 24-59, or (b) said nucleic acid sequence is any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or encodes any one of SEQ ID NO: 22, 24-59.
Embodiment 30. The plant of embodiment 28 or 29, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.
Embodiment 31. The plant of embodiment 28 or 29, wherein the nucleic acid sequence is introduced by genome editing.
Embodiment 32. The plant of any one of embodiments 28-31, wherein the promoter is an endogenous promoter.
Embodiment 33. The plant of any one of embodiments 28-31, wherein the promoter is a constitutive promoter, an inducible promoter, or a tissue-specific promoter.
Embodiment 34. The plant of any one of embodiments 28-30, wherein said genomic region of the plant comprises at least one allele corresponding to one or more alleles as described in any of Tables 2-5, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, and wherein said one or more alleles confer in the plant increased protein and/or oil content.
Embodiment 35. The plant of any one of embodiments 28-34, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
Embodiment 36. The plant of any one of embodiments 28-34, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
Embodiment 37. The plant of any one of embodiments 28-36, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
Embodiment 38. The plant of any one of embodiments 28-37, wherein the plant is a dicot plant.
Embodiment 39. The plant of embodiment 38, wherein the dicot plant is a soybean plant or an elite soybean plant.
Embodiment 40. The plant of any one of embodiments 28-37, wherein the plant is a monocot plant.
Embodiment 41. The plant of embodiment 40, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
Embodiment 42. The plant of any one of embodiments 28-41, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
Embodiment 43. A progeny plant from the elite Glycine max plant of any one of embodiments 1-16 or the plant of any one of embodiments 17-42, wherein said progeny plant has stably incorporated into its genome the nucleic acid sequence.
Embodiment 44. A plant cell, seed, or plant part derived from the elite Glycine max plant of any one of embodiments 1-16 or the plant of any one of embodiments 17-42, wherein said plant cell, seed or plant part has stably incorporated into its genome the nucleic acid sequence.
Embodiment 45. A harvest product derived from the elite Glycine max plant of any one of embodiments 1-16 or the plant of any one of embodiments 17-42.
Embodiment 46. A processed product derived from the harvest product of embodiment 45, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.
Embodiment 47. A method of producing a soybean plant having increased protein and/or oil content and/or modified oil profile, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding any one of SEQ ID NO: 22, or 24-59, wherein said nucleic acid sequence confers onto said donor soybean plant an increased protein and/or oil content and/or modified oil profile; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by isolating a nucleic acid from said progeny plant and detecting within said nucleic acid a molecular marker associated with said nucleic acid sequence, thereby producing a soybean plant having increased protein content and/or increased oil content and/or modified oil profile.
Embodiment 48. The method of embodiment 47, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
Embodiment 49. The method of embodiment 47 or 48, wherein the molecular markers are markers as set forth in Tables 2-5.
Embodiment 50. The method of any one of embodiments 47-49, wherein either the recipient or the donor soybean plant is an elite Glycine max plant.
Embodiment 51. A method of producing a Glycine max plant with increased protein and/or oil content, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with, or closely linked with, a nucleic acid sequence comprising any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, or a portion of any thereof, wherein said portion confers to a plant increased protein content and/or increased oil content; c) selecting a plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said marker associated with increased protein and/or increased oil content.
Embodiment 52. The method of embodiment 51, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
Embodiment 53. The method of embodiment 51 or 52, wherein the molecular marker is one or more SNPs set forth in Table 2.
Embodiment 54. The method of any one of embodiments 51-53, wherein the molecular marker comprises alleles associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, and/or Hap_7.
Embodiment 55. The method of embodiment 51, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.
Embodiment 56. The method of embodiment 51, wherein the nucleic acid is selected from DNA or RNA.
Embodiment 57. A plant produced by the method of any one of embodiments 47-56.
Embodiment 58. A method of conferring increased protein content and/or increased oil content and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, or (ii) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content and/or increases oil content compared to a control plant not expressing said nucleic acid sequence.
Embodiment 59. The method of embodiment 58, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.
Embodiment 60. The method of embodiment 58, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content and/or increased oil content.
Embodiment 61. The method of embodiment 58, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.
Embodiment 62. The method of embodiment 58, wherein the method comprises Cas12a mediated gene replacement.
Embodiment 63. The method of embodiment 62, wherein the method comprises at least one gRNA.
Embodiment 64. The method of any one of embodiments 58-63, wherein the promoter is an exogenous promoter.
Embodiment 65. The method of any one of embodiments 58-63, wherein the promoter is an endogenous promoter.
Embodiment 66. The method of embodiment 64, wherein the exogenous promoter comprises SEQ ID NO: 23 or an active variant or fragment thereof.
Embodiment 67. The method of embodiment 59, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
Embodiment 68. The method of any one of embodiments 58-67, wherein the plant is a dicot plant.
Embodiment 69. The method of embodiment 68, wherein the dicot plant is a soybean plant.
Embodiment 70. The method of any one of embodiments 58-67, wherein the plant is a monocot plant.
Embodiment 71. The method of embodiment 70, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
Embodiment 72. A plant produced by the method of any one of embodiments 58-71.
Embodiment 73. A polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein expression of the polypeptide in a plant confers increased protein and/or oil content and/or modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in the plant confers increased protein and/or oil content on said plant; (c) a polypeptide having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity with and having the same function as the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein the polypeptide when expressed in a plant confers increased protein and/or oil content on the plant; or (d) a fusion protein comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59 or the polypeptide as defined in any one of (a) to (c).
Embodiment 74. A nucleic acid molecule comprising (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21 and a sequence encoding SEQ ID NO: 22, 24-59; or (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21 or a polynucleotide of SEQ ID NO: 22, 24-59.
Embodiment 75. An expression cassette comprising the nucleic acid molecule of embodiment 74 or a nucleic acid sequence encoding the polypeptide of embodiment 73.
Embodiment 76. The expression cassette of embodiment 75, wherein the nucleic acid molecule is operably linked to a promoter that is capable of directing expression in a plant cell.
Embodiment 77. The expression cassette of embodiment 76, wherein the promoter is an endogenous promoter.
Embodiment 78. The expression cassette of embodiment 76, wherein the promoter is an exogenous promoter.
Embodiment 79. A vector comprising the nucleic acid molecule of embodiment 74 or the expression cassette of any one of embodiments 75-78.
Embodiment 80. A transgenic cell comprising the nucleic acid molecule of embodiment 74 or the expression cassette of any one of embodiments 75-78.
Embodiment 81. Use of the polypeptide of embodiment 73 or the nucleic acid molecule of embodiment 74, or the expression cassette of any one of embodiments 75 to 78 in conferring increased protein content and/or increased oil content and/or modified oil profile in a plant.
Embodiment 82. Use of the expression cassette of any one of embodiments 75-78 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content and/or oil content of a plant is increased upon expression in the plant.
Embodiment 83. A method for increasing protein content and/or oil content in a plant, comprising increasing the expression level and/or activity of the polypeptide of embodiment 73 in the plant.
Embodiment 84. A method for producing a plant variety with increased protein content and/or oil content, comprising increasing the expression level and/or activity of the polypeptide of embodiment 73 in a recipient plant.
Embodiment 85. The method of embodiment 83 or 84, wherein the expression level and/or activity of the polypeptide in the plant is increased by transgenic means or by breeding.
Embodiment 86. A method for producing a transgenic plant with increased protein content and/or oil content, comprising the following step: introducing the nucleic acid molecule of embodiment 74 or the expression cassette of any one of embodiments 75-78 into a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content and/or oil content compared with the recipient plant.
Embodiment 87. The method of embodiment 86, wherein introducing the nucleic acid molecule into the recipient plant is performed by introducing the expression cassette of any one of embodiments 75-78 into the recipient plant.
Embodiment 88. A primer pair for amplifying the nucleic acid molecule of embodiment 74.
Embodiment 89. The primer pair of embodiment 88, wherein the primer pair is primer pair 1, composed of two single-stranded DNA molecules comprising the sequences of SEQ ID NO: 63 and SEQ ID NO: 64.
Embodiment 90. A kit comprising the primer pair of embodiment 88 or 89.
EXAMPLES
Example 1. Experimental Materials Used in Examples 2-11
The roots, stems, leaves, flowers, pods and seeds of the parent SN14 were used as template materials for tissue-specific expression analysis. The materials were placed in RNase-free Eppendorf (EP) tubes, immediately frozen in liquid nitrogen, and stored at -80℃. The soybean template material was SN14, and the soybean transformation material was DN50. The Arabidopsis transformation material was Col-0, and the Arabidopsis mutant material was SALK_127828.47.00.x (ordered from the ABRC website).
Unless explicitly stated otherwise, the Escherichia coli strain used in this application was DH5α and the Agrobacterium tumefaciens strain was EHA105. The target gene fragment in the entry vector Fu28 was transferred into the plant expression vector Pr35S using the Gateway vector system. The entry vector Fu28 (FIG. 16) and expression vector Pr35S (FIG. 17) were provided by Professor Fu Yongfu of the Institute of Crop Science, Chinese Academy of Agricultural Sciences.
The main reagents involved in this experiment are shown in Table 6.
Table 6. Main experimental reagents
Figure PCTCN2022075977-appb-000003
The culture media and antibiotics used in this experiment are shown in Table 7 and Table 8.
Table 7. Experimental media
Figure PCTCN2022075977-appb-000004
Figure PCTCN2022075977-appb-000005
Table 8. Antibiotics
Figure PCTCN2022075977-appb-000006
The websites used in the experimental analysis are shown in Table 9.
Table 9. Websites used for candidate gene function prediction
Figure PCTCN2022075977-appb-000007
The genomic sequence, CDS sequence, peptide sequence and tissue expression data of each candidate gene were obtained from the Phytozome website. Gene function was annotated using the SoyBase, Phytozome, InterProScan and NCBI websites to obtain Gene Ontology (GO) database numbers, Kyoto Encyclopedia of Genes and Genomes (KEGG) numbers, Pfam numbers and structural domain information.
The parental lines Suinong 14 (SN14) and ZYD00006 were fully sequenced, but the sequencing data have not been published. Williams 82 is the soybean cultivar used to produce the reference genome sequence. The relevant sequences from Williams 82, SN14 and ZYD00006 were compared using DNAMAN software.
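DNAMAN was used for this comparison. To illustrate what such a comparison produces, the Python sketch below lists the positions where two already-aligned sequences differ and labels each difference as a SNP or an indel. It assumes the alignment was done beforehand, with gaps written as '-'. The sequences in the example are toy strings, not Williams 82, SN14 or ZYD00006 data, and the function names are hypothetical.

```python
def compare_aligned(ref: str, query: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref base, query base) for every mismatch.

    Both sequences must already be aligned to the same length; '-' marks a gap.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    diffs = []
    for pos, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if r != q:
            diffs.append((pos, r, q))
    return diffs


def classify(diff: tuple[int, str, str]) -> str:
    """A difference involving a gap is an indel; otherwise it is a SNP."""
    _, r, q = diff
    return "indel" if "-" in (r, q) else "SNP"


if __name__ == "__main__":
    # Toy aligned sequences for illustration only.
    reference = "ATGGCTAC-GTTA"
    parent = "ATGGCTACTGTCA"
    for d in compare_aligned(reference, parent):
        print(d, classify(d))
```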
The media used for genetic transformation of soybean cotyledonary nodes are shown in Table 10.
Table 10. Media formulations for genetic transformation of soybean cotyledonary nodes
Figure PCTCN2022075977-appb-000008
Figure PCTCN2022075977-appb-000009
Note: B5 salt: Gamborg Basal Salt Mixture; MES: 2-(4-morpholino)ethanesulfonic acid; 6-BA: 6-benzylaminopurine; GA3: gibberellic acid; AS: acetosyringone; L-Cys: L-cysteine; DTT: DL-dithiothreitol; ZT: zeatin; Pro: proline; Asp: aspartic acid; Glu: glutamic acid; IAA: 3-indoleacetic acid.
Example 2. Methods for the Identification of Arabidopsis mutants
A.  Arabidopsis mutant search and order
Using the BLAST function of the Phytozome website, with Arabidopsis thaliana selected as the target species and the Glyma.START CDS sequence as the query, the homologous gene in Arabidopsis thaliana was obtained. The conserved functional domain of this Arabidopsis homolog was predicted and found to be similar to that of the target gene Glyma.START. Mutants of the Arabidopsis homolog were searched on the TAIR10 website, and mutants meeting the following conditions were screened and ordered: (1) the control background was wild-type Col-0; (2) there were few mutations in other transcribed genes, preferably with the mutation only in the target gene; (3) the mutation was a T-DNA insertion; and (4) the mutant genotype was homozygous (AT1G05230, FIG. 1 and FIG. 3).
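The screening conditions above can be written as a simple filter. In this hypothetical Python sketch, conditions (1), (3) and (4) are hard requirements and condition (2) is used to rank the lines. The `MutantLine` fields and the stock IDs are invented for illustration and do not follow any TAIR or ABRC data format.

```python
from dataclasses import dataclass


@dataclass
class MutantLine:
    """Hypothetical record for one candidate stock; fields are illustrative."""
    stock_id: str
    background: str        # genetic background, e.g. "Col-0"
    insertion_type: str    # e.g. "T-DNA"
    zygosity: str          # e.g. "homozygous" or "heterozygous"
    other_genes_hit: int   # additional transcribed genes carrying mutations


def meets_hard_criteria(line: MutantLine) -> bool:
    """Conditions (1), (3) and (4): Col-0 background, T-DNA insertion, homozygous."""
    return (
        line.background == "Col-0"
        and line.insertion_type == "T-DNA"
        and line.zygosity == "homozygous"
    )


def rank_candidates(lines: list[MutantLine]) -> list[MutantLine]:
    """Keep passing lines, fewest off-target mutations first (condition 2)."""
    passing = [line for line in lines if meets_hard_criteria(line)]
    return sorted(passing, key=lambda line: line.other_genes_hit)


if __name__ == "__main__":
    candidates = [
        MutantLine("LINE_A", "Col-0", "T-DNA", "homozygous", 2),
        MutantLine("LINE_B", "Col-0", "T-DNA", "homozygous", 0),
        MutantLine("LINE_C", "Ler", "T-DNA", "homozygous", 0),
        MutantLine("LINE_D", "Col-0", "T-DNA", "heterozygous", 0),
    ]
    print([line.stock_id for line in rank_candidates(candidates)])
```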
B.  Planting Arabidopsis thaliana
The planting soil was flower nutrient soil and vermiculite at a ratio of 3:1. The soil was put into small flower pots and slowly soaked with water, and Arabidopsis thaliana seeds were sown evenly on the moist soil. Each pot was sealed with plastic wrap and placed in a refrigerator at 4℃ for vernalization for 48-72 h. After vernalization, the pots were placed in an incubator (22℃, 16 h/8 h light/dark, 70 μmol·m⁻²·s⁻¹) for 1 week until the seedlings emerged. Seedlings with two large leaves and two small leaves were selected for transplanting, 1-2 plants per pot. Water or flower fertilizer was added when the soil in the pots became dry.
C.  DNA extraction from Arabidopsis leaves
Total DNA was extracted by the CTAB (hexadecyltrimethylammonium bromide) method (Porebski, S. et al., Plant Molecular Biology Reporter, 1997, 15(1): 8-15). The prepared CTAB extraction buffer was stored at 4℃. Rosette leaves of Arabidopsis thaliana were collected into an EP tube containing small 2 mm steel balls, quick-frozen in liquid nitrogen, and then fully disrupted in a tissue grinder. 700 μL of CTAB extraction buffer was added to the tube and mixed thoroughly with a vortexer. The tube was placed in a 65℃ water bath for 1 h and inverted to mix once every 10 minutes. After the tube was removed from the water bath and cooled, 650 μL of chloroform was added, and the tube was inverted 30 times to mix thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 μL of the supernatant was transferred to a new EP tube, 650 μL of chloroform was added, and the mixture was shaken, mixed thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 μL of the supernatant was then transferred to a new EP tube containing 700 μL of pre-cooled isopropanol, inverted 30 times to mix thoroughly, and centrifuged at 12000 rpm for 15 minutes at room temperature. The supernatant was discarded, and the pellet was washed once with 95% ethanol and then once with 75% ethanol, with centrifugation at 7500 rpm for 5 min at room temperature. The DNA pellet was dried and dissolved in 50 μL of sterilized water. The DNA concentration was measured spectrophotometrically (OD260), and the DNA was stored at -20℃.
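The centrifugation steps above are given in rpm, and the force they produce depends on the rotor. The Python sketch below applies the standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The 8 cm rotor radius in the example is an assumption, not a value from this document.

```python
import math

# Standard conversion between rotor speed and relative centrifugal force:
# RCF = 1.118e-5 * r * rpm^2, where r is the rotor radius in centimetres.
RCF_CONSTANT = 1.118e-5


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at the given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Rotor speed needed to reach a target RCF on a rotor of the given radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius_cm = 8.0  # hypothetical microcentrifuge rotor radius; check your rotor
    for rpm in (12000, 7500):
        print(f"{rpm} rpm ~ {rpm_to_rcf(rpm, radius_cm):,.0f} x g")
```

With an 8 cm radius, 12000 rpm corresponds to roughly 12,900 × g and 7500 rpm to roughly 5,000 × g. The force scales linearly with the actual rotor radius.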
D.  PCR identification of Arabidopsis mutants
DNA was extracted from Arabidopsis wild-type Col-0 and mutant plants and used as the template for PCR amplification. The amplified products were subjected to 1.5% agarose gel electrophoresis to determine whether each mutant was homozygous. The primers used are shown in Table 11.
Table 11. PCR primers for identification of Arabidopsis homozygous mutant
Figure PCTCN2022075977-appb-000010
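Table 11 is reproduced only as an image, so the primer design is not visible here. Assuming the common design for SALK T-DNA lines, two gene-specific primers (LP, RP) flank the insertion site and a T-DNA border primer pairs with RP. Under that design, band patterns could be interpreted as in the Python sketch below. The primer names and the function are assumptions for illustration.

```python
def call_genotype(wt_band: bool, tdna_band: bool) -> str:
    """Interpret a SALK-style genotyping PCR (assumed primer design).

    wt_band:   product from the two gene-specific primers (LP + RP); expected
               only when at least one allele lacks the T-DNA insertion.
    tdna_band: product from the T-DNA border primer with RP; expected only
               when at least one allele carries the insertion.
    """
    if wt_band and tdna_band:
        return "heterozygous"
    if tdna_band:
        return "homozygous mutant"
    if wt_band:
        return "wild type"
    return "no product: repeat PCR"


if __name__ == "__main__":
    print(call_genotype(wt_band=True, tdna_band=False))   # Col-0 control
    print(call_genotype(wt_band=False, tdna_band=True))   # candidate mutant
```

Under this design, a homozygous mutant shows only the T-DNA band, while the Col-0 control should show only the wild-type band.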
E.  Arabidopsis genetic transformation and identification
i. Flower dipping transformation of Arabidopsis thaliana
Arabidopsis cultivation and preparation for transformation. Control Col-0 plants and homozygous mutant plants were grown as described above. After bolting, the stalks were removed to increase the number of bolts. The plants were ready for transformation when the stalks had grown to the same height and only the uppermost flowers had not yet opened.
Agrobacterium preparation. Agrobacterium tumefaciens carrying the expression vector, stored at -80℃, was inoculated into 10 mL of LB liquid medium containing spectinomycin and cultured overnight at 28℃ with shaking at 160 rpm. 100 μL of this starter culture was then transferred to 100 mL of fresh YEP liquid medium containing spectinomycin and cultured further at 28℃ with shaking at 200 rpm. When the culture reached an OD600 of 0.8, the cells were harvested and the bacterial pellet was resuspended in 100 mL of resuspension solution containing 5% sucrose and 0.01% Silwet L-77. The suspension was kept at room temperature for 1-3 h before use in transformation.
Transformation of Arabidopsis thaliana. Plants that had bolted to a suitable height and carried a large number of inflorescences were used for transformation. Open flowers and developing pods were removed, and the unopened inflorescences were immersed in the Agrobacterium suspension for 30 s. The infected plants were then wrapped in plastic wrap and kept in a dark box for 24 hours, after which they were taken out of the box. To improve transformation efficiency, a second round of transformation was performed on the same plants one week later. Mature seeds were then harvested from these plants.
ii. Screening Transgenic Arabidopsis with Basta
The mature T0 seeds of the transformed Arabidopsis thaliana were harvested and planted as described above. When the two true leaves were fully expanded, Basta solution (diluted 1:1000) was sprayed on the plants 2-3 times, once every other day, and plant growth was observed. Non-transgenic plants showed chlorosis and gradually died, while transgenic plants grew normally. After the transgenic plants had grown 4 leaves, those identified as positive were transplanted into new small pots and grown on before further identification.
iii. Bar test strip detection of transgenic Arabidopsis thaliana
Rosette leaves of transgenic Arabidopsis thaliana (T1, T2 and T3) were placed in an EP tube and ground with a small pestle. A Bar test strip was then inserted into the tube in the direction indicated on the strip, and the number of bands on the strip was observed: two bands indicate that the plant is transgenic, and one band indicates that it is non-transgenic.
iv. Identification of transgenic Arabidopsis thaliana
Leaf DNA of transgenic Arabidopsis thaliana (T1, T2 and T3) was extracted, and transgenic plants were identified by PCR using Glyma.06G303700 gene-specific primers and Bar primers (Table 12). The PCR products were detected by 1.5% agarose gel electrophoresis.
Table 12. Primers for identification of transgenic Arabidopsis thaliana or transgenic soybean expressing Glyma.06G303700
Figure PCTCN2022075977-appb-000011
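Plants are checked here with the Bar test strip and by PCR with both gene-specific and Bar primers. One way to tabulate these results is to count a plant as transgenic only when all three checks agree, as in the hypothetical Python sketch below. The document does not state a rule for combining the checks, so both that rule and the record fields are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ScreenResult:
    """Hypothetical record of the checks run on one putative transgenic plant."""
    plant_id: str
    strip_bands: int   # bands visible on the Bar test strip
    gene_pcr: bool     # product obtained with Glyma.06G303700 primers
    bar_pcr: bool      # product obtained with Bar primers


def strip_positive(n_bands: int) -> bool:
    """Two bands = transgenic, one band = non-transgenic; anything else is invalid."""
    if n_bands not in (1, 2):
        raise ValueError(f"unexpected band count {n_bands}; repeat the strip test")
    return n_bands == 2


def confirmed_transgenic(result: ScreenResult) -> bool:
    """Count a plant as transgenic only if the strip and both PCR assays agree."""
    return strip_positive(result.strip_bands) and result.gene_pcr and result.bar_pcr


if __name__ == "__main__":
    plants = [ScreenResult("T1-01", 2, True, True), ScreenResult("T1-02", 1, False, False)]
    for p in plants:
        print(p.plant_id, confirmed_transgenic(p))
```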
v. qRT-PCR identification of T3 generation transgenic Arabidopsis thaliana
Total RNA was extracted from Arabidopsis rosette leaves and reverse-transcribed into cDNA. The expression level of Glyma.06G303700 in transgenic Arabidopsis was determined using the primers shown in Table 13, with AtACTIN2 as the internal reference gene.
Table 13. qRT-PCR primers for transgenic Arabidopsis
Figure PCTCN2022075977-appb-000012
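The text does not say how relative expression was calculated from the qRT-PCR data. A widely used option is the 2^-ΔΔCt method, which assumes close to 100% amplification efficiency for both target and reference genes. The Python sketch below implements 2^-ΔCt (relative to the AtACTIN2 reference) and 2^-ΔΔCt (relative to a calibrator sample). Because Glyma.06G303700 is a soybean gene, wild-type Col-0 may give no specific product, so the calibrator is assumed to be one chosen transgenic line. The Ct values in the example are invented.

```python
from statistics import mean


def delta_ct(ct_target: list[float], ct_reference: list[float]) -> float:
    """ΔCt = mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(ct_target) - mean(ct_reference)


def expression_vs_reference(ct_target: list[float], ct_reference: list[float]) -> float:
    """2^-ΔCt: target expression relative to the reference gene (e.g. AtACTIN2)."""
    return 2 ** -delta_ct(ct_target, ct_reference)


def fold_change(ct_target: list[float], ct_reference: list[float],
                ct_target_cal: list[float], ct_reference_cal: list[float]) -> float:
    """2^-ΔΔCt: expression in a sample relative to a calibrator sample."""
    dd_ct = delta_ct(ct_target, ct_reference) - delta_ct(ct_target_cal, ct_reference_cal)
    return 2 ** -dd_ct


if __name__ == "__main__":
    line_a = ([24.0, 24.0, 24.0], [18.0, 18.0, 18.0])       # hypothetical line
    calibrator = ([26.0, 26.0, 26.0], [18.0, 18.0, 18.0])   # hypothetical calibrator line
    print(expression_vs_reference(*line_a))
    print(fold_change(*line_a, *calibrator))
```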
vi. Determination of total nitrogen content in seeds of Arabidopsis mutants and transgenic plants
To determine seed nitrogen content, 0.1 mol/L HCl was prepared and standardized against 0.1 mol/L Na2CO3. A 1% H3BO3 solution was prepared and adjusted to pH 4-5. Seven milliliters of 0.1% methyl red and 10 mL of 0.1% bromocresol green indicator were added per 1 L of H3BO3 solution, giving a wine-red solution. A 40% NaOH solution was also prepared for the determination.
The seeds were dried in an oven at 60℃ for 12-14 hours. A 0.1 g sample (weighed to the nearest 0.001 g) was transferred into a 50 mL digestion tube through a paper trough, and each sample was tested in triplicate. 5 mL of concentrated sulfuric acid and a small amount of catalyst (potassium sulfate and copper sulfate, 5:1) were added, and each sample was digested in an oven at 400℃ for 90 minutes. The samples were then removed and allowed to cool, and the total nitrogen content was determined with a FOSS Kjeltec 2300 automatic analyzer.
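The Kjeltec 2300 reports nitrogen directly, but the underlying calculation for boric-acid trapping followed by titration with standardized HCl is short. In the Python sketch below, each mmol of HCl consumed corresponds to one mmol of nitrogen. The blank volume, the example numbers, and the optional nitrogen-to-protein factor are assumptions. 6.25 is a generic default and is not given in this document.

```python
N_MOLAR_MASS = 14.007  # g/mol, equivalently mg/mmol


def kjeldahl_nitrogen_pct(titrant_ml: float, blank_ml: float,
                          hcl_molarity: float, sample_g: float) -> float:
    """Nitrogen (% w/w) from the HCl volume used to titrate the boric-acid trap.

    (mL x mol/L) gives mmol HCl = mmol NH3 = mmol N; multiplying by 14.007 gives mg N.
    """
    mg_n = (titrant_ml - blank_ml) * hcl_molarity * N_MOLAR_MASS
    return mg_n / (sample_g * 1000) * 100


def crude_protein_pct(nitrogen_pct: float, factor: float = 6.25) -> float:
    """Crude protein estimate; the 6.25 default is a generic assumption."""
    return nitrogen_pct * factor


if __name__ == "__main__":
    # Hypothetical titration: 4.0 mL sample, 0.1 mL blank, 0.1 mol/L HCl, 0.1 g seed.
    n_pct = kjeldahl_nitrogen_pct(4.0, 0.1, 0.1, 0.1)
    print(round(n_pct, 3), round(crude_protein_pct(n_pct), 2))
```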
vii. Determination of fatty acid content in seeds of Arabidopsis mutants and transgenic plants
The fatty acid content of seeds was determined by gas chromatography as follows. The seeds were dried in an oven at 105℃ for 20-30 minutes and then at 65℃ for 12-14 hours. Five replicate tests were performed for each sample. In each test, about 5 mg of seed was mixed with 1 mL of 2.5% concentrated sulfuric acid in methanol and 5 μL of 50 mg/mL BHT (2,6-di-tert-butyl-4-methylphenol), and 50 μL of 10 mg/L heptadecanoic acid or acetic acid was added as the internal standard. The storage tube was immediately sealed and placed in a water bath at 85℃ for 1.5 h, inverted every 10 minutes to mix the sample and reagents thoroughly, and then allowed to cool to room temperature. 160 μL of 9% NaCl solution and 700 μL of n-hexane were then added to the tube, and the mixture was vortexed for 3 minutes and centrifuged at 4,500 rpm for 10 minutes at room temperature. 400 μL of the supernatant from each sample was transferred to a new centrifuge tube and dried overnight in a fume hood. The dried residue was fully dissolved in 400 μL of ethyl acetate before measurement.
The Agilent 6890 gas chromatograph was fitted with a 30 m × 320 μm × 0.25 μm column. Gases: nitrogen (carrier gas) 60 mL/min, hydrogen 60 mL/min, air 450 mL/min. Injection volume: 1 μL, split injection mode with a split ratio of 10:1; injection port temperature 170℃. Oven temperature program: hold at 180℃ for 1 min, then increase to 250℃ at 25℃/min and hold for 7 min.
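Two quantities follow directly from these settings: the total length of the oven program and the share of each injection that reaches the column. The Python sketch below computes both. It assumes the 10:1 split ratio means split-vent flow to column flow, so about 1/11 of the injected volume enters the column. The function names are hypothetical.

```python
def oven_program_minutes(start_temp_c: float, start_hold_min: float,
                         ramps: list[tuple[float, float, float]]) -> float:
    """Total run time of a GC oven program.

    ramps: sequence of (rate in °C/min, target temperature in °C, hold at target in min).
    """
    total = start_hold_min
    temp = start_temp_c
    for rate, target, hold in ramps:
        total += abs(target - temp) / rate + hold
        temp = target
    return total


def fraction_on_column(split_ratio: float) -> float:
    """Share of the injected sample reaching the column for a split ratio of N:1."""
    return 1 / (split_ratio + 1)


if __name__ == "__main__":
    print(oven_program_minutes(180, 1, [(25, 250, 7)]))
    print(1.0 * fraction_on_column(10))  # µL of a 1 µL injection reaching the column
```

For the program above this gives about 10.8 min per run (1 min hold, 2.8 min ramp, 7 min hold), not counting any cool-down time.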
The absolute content of each fatty acid component was calculated as:
$$C_i = \frac{A_i}{A_s} \times \frac{m_s}{m}$$
where C_i is the content of the ith fatty acid component, A_i is the peak area of the ith fatty acid component, A_s is the peak area of the internal standard, m_s is the mass of the internal standard, and m is the dry weight of the sample.
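Relative content was calculated as:

$$P_i = \frac{A_i}{\sum_{j} A_j} \times 100\%$$

where $P_i$ is the relative content (%) of the ith fatty acid component, $A_i$ is its peak area, and the sum runs over the peak areas $A_j$ of all fatty acid components.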
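The two calculations can be expressed directly in code. This Python sketch computes absolute content from the peak area of each component, the internal-standard peak area, the internal-standard mass, and the sample dry weight, and computes relative content by area normalisation (an assumption about the relative formula). It also assumes every fatty acid has the same detector response as the internal standard and that the internal-standard peak is excluded from the component areas; the dictionary keys are hypothetical labels.

```python
def absolute_contents(peak_areas, is_area, is_mass, sample_mass):
    """Absolute content of each component: (A_i * m_s) / (A_s * m).

    Assumes a response factor of 1 relative to the internal standard.
    The result is in units of is_mass per unit of sample_mass.
    """
    if is_area <= 0 or sample_mass <= 0:
        raise ValueError("internal-standard area and sample mass must be positive")
    return {fa: (area * is_mass) / (is_area * sample_mass)
            for fa, area in peak_areas.items()}


def relative_contents(peak_areas):
    """Relative content (%) of each component by area normalisation."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {fa: area / total * 100 for fa, area in peak_areas.items()}


# Hypothetical peak areas (internal-standard peak not included).
areas = {"C16:0": 100.0, "C18:1": 300.0}
absolute = absolute_contents(areas, is_area=200.0, is_mass=0.5, sample_mass=5.0)
relative = relative_contents(areas)
```

With these made-up numbers, palmitic acid (C16:0) accounts for 25% of the total fatty acid peak area and oleic acid (C18:1) for 75%.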
viii. Transformation of soybean cotyledon nodes
Soybean cotyledon nodes were transformed and cultivated using the following protocol:
Preparation of Agrobacterium tumefaciens and soybean cotyledons. Take the Agrobacterium tumefaciens strain carrying the expression vector out of -80℃ storage, inoculate it into 10 mL of LB liquid medium containing spectinomycin, and culture it overnight at 28℃ and 160 rpm. Transfer 100 μL of this starter culture to 100 mL of fresh YEP liquid medium containing spectinomycin and culture with shaking at 28℃ and 200 rpm to OD600 = 0.8. Centrifuge at 4000 rpm for 10 min at room temperature, discard the supernatant, resuspend the bacteria in 100 mL of resuspension solution containing 5% sucrose and 0.01% Silwet L-77, and let stand for 1-3 h at room temperature before use. Resuspend the bacteria in 100 mL of LCCM and incubate at 28℃ and 200 rpm for 30 min for subsequent transformation. Sterilize the soybean seeds as follows. Place plump, undamaged seeds in a Petri dish, put the Petri dish and a beaker into an airtight container in a fume hood, and open the lid of the Petri dish. Add sodium hypochlorite and hydrochloric acid (94:6) to the beaker, quickly seal the airtight container, and switch on the fume hood. After 8-12 hours of chlorine gas sterilization, ventilate the seeds on a clean bench for 30 minutes to remove chlorine from the seed surface and avoid damaging the seeds. Add just enough sterile water for the seeds to complete imbibition, and keep the seeds in the dark for 12-14 h.
Co-cultivation. Split each seed into two halves along the hypocotyl with a razor blade, and lightly scratch 2-3 points at the cotyledonary node with the blade to create wounds. Place the explants into the prepared Agrobacterium suspension and incubate at 28℃ and 160 rpm for 30 min to facilitate infection. Remove the infected explants from the suspension with tweezers, place them on SCCM covered with filter paper, and incubate for 3-5 days at 25℃ in the dark.
Induction of cluster buds. After 3-5 days of co-cultivation, once the hypocotyls have enlarged, trim the hypocotyl of each explant with a blade, leaving about 2 mm. Wash the trimmed explants several times in sterile water until the liquid is clear, to remove excess bacteria. Blot the explants on sterile paper to absorb residual surface liquid, and insert them into SIM+ with tweezers. Set the sterile tissue culture room to 25℃ with a 16 h/8 h light/dark cycle, and keep the selection plates with explants there for about 14 days. Check the growth of the cluster buds: re-wound the base of slow-growing cluster buds and insert them into fresh SIM+, and transfer well-growing ones to SEM.
Elongation of cluster buds. Cut the large cluster buds, insert them into SEM, and keep them in the sterile tissue culture room for about 14 days. Cluster buds that have not produced shoots are removed from the SEM, lightly scratched at the base to create a fresh wound, and inserted into fresh SEM for subculture. This roughly 14-day cycle is repeated.
Identification of positive elongated buds. When the buds are about 5 cm long and have about 3 leaves, select a leaf and perform the Bar test strip assay described below to preliminarily identify positive seedlings.
Rooting of positive elongated buds. The positive buds were cut from the cluster buds, dipped in IBA for 30 s, inserted into RM, and cultured in a sterile tissue culture room until they rooted.
Transplanting and cultivation of positive seedlings. The positive seedlings were taken out from the culture medium, and the roots were cleaned with clean water to remove the residual culture medium. The positive seedlings were transplanted into the soil and cultured in the plant greenhouse.
ix. Bar test strip detection of T1 generation transgenic soybean
Place leaves of T1 transgenic soybean plants in an EP tube, add extraction buffer, and grind the leaves with a small pestle. Insert the Bar test strip into the EP tube in the direction indicated on the strip and count the bands shown: two bands indicate a transgenic plant, and one band indicates a non-transgenic plant.
x. PCR identification of T1 generation transgenic soybean
Genomic DNA was extracted from leaves of T1 transgenic soybean plants, and transgenic soybeans were identified by PCR using Glyma. 06G303700 gene primers and Bar primers. The primer sequences are shown in Table 12 above.
xi. qRT-PCR identification of T1 generation transgenic soybean
Leaves from the soybean plants were placed in an RNase-free EP tube and frozen in liquid nitrogen. Total RNA was extracted and reverse transcribed into cDNA, and qRT-PCR was performed with the primers in Table 14 to analyze the expression of Glyma. 06G303700.
Table 14. qRT-PCR primers for gene expression pattern analysis
Figure PCTCN2022075977-appb-000015
xii. Determination of protein, oil and fatty acid content in transgenic soybean seeds
An InfraTec™ 1241 Grain Analyzer (FOSS Analytics) was used to determine the protein and oil content of soybean seeds. Each sample was measured 3-5 times, and the mean value was used for phenotypic data analysis.
The content of fatty acids in seeds was determined by gas chromatography and calculated as described in section vii of this Example 2 above.
xiii. Haplotype analysis
SNP data from 680 re-sequenced genomes of soybean germplasm from Northeast China were collected and curated, and the protein and oil content of the same 680 accessions was determined with the FOSS grain analyzer for haplotype analysis. The sequence of Glyma. 06G303700 (including its promoter) and its length were obtained from the Phytozome website, and the resequencing data (10× coverage) for the 680 accessions served as the population validation data for this experiment. SNPs within the Glyma. 06G303700 sequence (including the promoter) were extracted, formatted, and submitted to Haploview, which was used to divide the gene into blocks and to classify the haplotypes within each block.
Haplotypes carried by more than 5.0% of the population were designated as excellent haplotypes, and one-way analysis of variance (ANOVA) in SPSS was used to test for significant phenotypic differences among them.
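The frequency cut-off and the one-way ANOVA can be sketched in plain Python. The input format (a mapping from accession ID to haplotype label, and one list of phenotype values per haplotype) is hypothetical, and the sketch stops at the F statistic; SPSS, or `scipy.stats.f_oneway` in Python, additionally reports the P value, and the post hoc multiple comparisons used later are not reproduced.

```python
from collections import Counter
from statistics import mean


def excellent_haplotypes(haplotype_of, min_fraction=0.05):
    """Return {haplotype: fraction} for haplotypes carried by MORE than min_fraction.

    haplotype_of maps accession ID -> haplotype label (hypothetical format).
    """
    n = len(haplotype_of)
    counts = Counter(haplotype_of.values())
    return {hap: c / n for hap, c in counts.items() if c / n > min_fraction}


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA; each group holds one haplotype's phenotypes."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = mean(values)
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

With 680 accessions, a haplotype needs at least 35 carriers to pass the strict "more than 5.0%" cut-off, because 34/680 is exactly 5.0%.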
Example 3. Identification of candidate genes by SNP analysis
The grain protein and oil content of the plant population (SN14) was determined, and the samples were ranked by protein content. Twenty samples were selected from the high- and low-protein extremes, and high-quality DNA was extracted to prepare high- and low-phenotype pools for BSA sequencing. Candidate regions were selected with the SNP-index association algorithm. With SN14 as the reference parent, three candidate segments were associated at the 95% confidence level. Genes within these segments carrying stop-loss or stop-gain variants, non-synonymous mutations, or alternative splicing sites were selected as candidate genes, giving a total of five genes. The bulked segregant analysis (BSA) pool sequencing results are from the master's thesis of Li Wei, Northeast Agricultural University (2016).
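The SNP-index approach compares allele frequencies between the two bulks. In a common formulation, assumed here because the thesis details are not reproduced in this section, the SNP-index of a bulk is the fraction of reads carrying the allele that differs from the reference parent, the ΔSNP-index is the high-bulk value minus the low-bulk value, and values are smoothed in sliding windows. The Python sketch below covers those three steps; the window and step sizes are hypothetical, and the 95% confidence threshold, usually derived by simulation, is not implemented.

```python
from statistics import mean


def snp_index(alt_depth, total_depth):
    """Fraction of reads in one bulk carrying the allele that differs from SN14."""
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    return alt_depth / total_depth


def delta_snp_index(high_alt, high_total, low_alt, low_total):
    """SNP-index of the high-protein bulk minus that of the low-protein bulk."""
    return snp_index(high_alt, high_total) - snp_index(low_alt, low_total)


def sliding_window_means(positions, deltas, window, step):
    """Mean delta SNP-index in windows [start, start + window), moved by `step` bp.

    Returns (window_start, mean_delta) for every window containing at least one SNP.
    """
    if not positions:
        return []
    results = []
    start, last = min(positions), max(positions)
    while start <= last:
        in_window = [d for p, d in zip(positions, deltas) if start <= p < start + window]
        if in_window:
            results.append((start, mean(in_window)))
        start += step
    return results
```

A SNP where 18 of 20 high-bulk reads but only 2 of 20 low-bulk reads carry the non-SN14 allele has a ΔSNP-index of 0.8, a strong signal that the segment is linked to the trait.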
Table 15. Identification of ΔSNP in putative candidate genes
Figure PCTCN2022075977-appb-000016
Figure PCTCN2022075977-appb-000017
Example 4. Tissue expression analysis of candidate genes
Glyma. 03G040200 is reported to have its highest expression in seeds, with slight expression in stems and none in other tissues, but the relative expression levels in seeds and stems do not exceed 1. Glyma. 03G036300 is not expressed in any organ. Glyma. 06G297500 is very highly expressed in various tissues: expression is highest in root hairs and roots, followed by apical meristems, then decreases in the order root nodules, stems, pods, leaves, and flowers, and is lowest in seeds. Glyma. 07G192400 is expressed in all tissues, with the highest expression in the apical meristem, followed by pods and seeds, and the lowest in roots. Glyma. 06G303700 is not expressed in root nodules; its expression is highest in apical meristems, followed by pods and seeds, and then decreases in the order flowers, roots, stems, leaves, and root hairs.
Example 5. Promoter element analysis of Glyma. 06G303700
The 3000 bp upstream of the Glyma. 06G303700 genomic sequence was taken as the promoter sequence of the gene and submitted to the PlantCARE website. The predicted promoter elements were screened and integrated, and then visualized with the TBtools software. The results show that the promoter region of Glyma. 06G303700 contains at least the following: (i) a 60K protein binding site, (ii) a cis-acting element involved in defense and stress responses, (iii) common cis-acting elements of promoter and enhancer regions, (iv) a core promoter element, (v) an element for maximal elicitor-mediated activation, (vi) a conserved DNA module array (CMA3), and (vii) light-responsive elements. The Glyma. 06G303700 promoter contains a large number of TATA boxes (core promoter elements located near the transcription start site), which play a role in regulating gene expression. The light-responsive elements include MYB binding sites involved in light responses and several conserved DNA modules involved in light responses.
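Extracting the 3000 bp promoter region and scanning it for a motif can be sketched as follows. The coordinate convention (1-based, inclusive gene coordinates), the function names, and the use of TATAAA as a TATA-box consensus are assumptions; PlantCARE itself recognises several TATA-box variants and many other element types, so this only illustrates the sequence handling, not the PlantCARE search.

```python
COMPLEMENT = str.maketrans("ACGTacgtN", "TGCAtgcaN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (case preserved, N kept as N)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq, gene_start, gene_end, strand, length=3000):
    """Return `length` bp upstream of a gene, oriented 5'->3' on the gene's strand.

    gene_start and gene_end are assumed 1-based, inclusive, with gene_start <= gene_end.
    """
    if strand == "+":
        # Upstream lies to the left of the gene on the forward strand.
        begin = max(0, gene_start - 1 - length)
        return chrom_seq[begin:gene_start - 1]
    if strand == "-":
        # Upstream lies to the right; reverse-complement it to read 5'->3'.
        return reverse_complement(chrom_seq[gene_end:gene_end + length])
    raise ValueError("strand must be '+' or '-'")


def find_motif(seq, motif):
    """0-based start positions of every exact, possibly overlapping, motif match."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]
```

Usage: `find_motif(upstream_region(chrom, start, end, strand), "TATAAA")` lists candidate TATA-box positions in the extracted promoter, counted from its 5' end.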
Example 6. Protein structure prediction of Glyma. 06G303700
The amino acid sequences of the gene from parents SN14 and ZYD00006 were submitted to the SOPMA website for protein secondary structure prediction. In SN14, the predicted protein contains 36.63% α-helix, 14.13% extended strand, 5.21% β-turn and 44.03% random coil; in ZYD00006, it contains 36.35% α-helix, 14.13% extended strand, 4.12% β-turn and 45.40% random coil. A single amino acid substitution thus reduces the proportions of α-helix and β-turn in ZYD00006 and increases the proportion of random coil (FIG. 6).
The amino acid sequences of the gene from parents SN14 and ZYD00006 were submitted to the SWISS-MODEL website for protein tertiary structure prediction (FIG. 6), and models with a QMEAN Z-score higher than -4.0 that covered the amino acid mutation site were selected. The predicted tertiary structure of the protein differs between SN14 and ZYD00006.
Example 7. Tissue-specific expression analysis of Glyma. 06G303700
RNA was extracted from SN14 organs (roots, stems, leaves, flowers, pods and seeds) and reverse transcribed into cDNA for qRT-PCR. Glyma. 06G303700 was expressed in all tissues and organs: expression was very low in roots, relatively high in stems, leaves and flowers, higher in pods, and highest in seeds, where relative expression exceeded 5-fold.
To analyze the tissue-specific expression of Glyma. 06G303700, RNA was extracted, cDNA was synthesized, and qRT-PCR was performed with gene-specific primers using the reaction solution in Table 16. The reference gene was GmActin4 (GenBank No. AF049106, Table 14).
Table 16. Reaction solution preparation for qRT-PCR
Figure PCTCN2022075977-appb-000018
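The text does not state how relative expression was calculated. One common choice, assumed here, is the 2^-ΔΔCt method with GmActin4 as the reference gene and one tissue (for example, roots) as the calibrator; it presumes near-100% amplification efficiency for both primer pairs. The Ct values in the usage note are hypothetical.

```python
from statistics import mean


def delta_delta_ct(target_ct, ref_ct, calibrator_target_ct, calibrator_ref_ct):
    """Relative expression by the 2^-ddCt method (assumed, not stated in the source).

    Each argument is a list of technical-replicate Ct values:
    target/reference gene in the sample, then in the calibrator tissue.
    """
    # Normalize the target gene to the reference gene within each tissue.
    d_sample = mean(target_ct) - mean(ref_ct)
    d_calibrator = mean(calibrator_target_ct) - mean(calibrator_ref_ct)
    # Fold change relative to the calibrator, assuming perfect doubling per cycle.
    return 2 ** -(d_sample - d_calibrator)
```

For instance, a target Ct of 22 against a reference Ct of 18 in seeds, with 25 against 18 in the calibrator, gives 2^3 = 8-fold relative expression.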
Example 8. Subcellular localization of Glyma. 06G303700
Tobacco cultivation. Tobacco planting soil was prepared by mixing flower nutrient soil with vermiculite at a ratio of 3:1. After germination, the seedlings were transferred to new small pots, one plant per pot, and grown in an incubator (22℃, 16 h/8 h light/dark, 70 μmol·m⁻²·s⁻¹), with watering every 2 days to ensure adequate water.
Agrobacterium infiltration of tobacco leaves. Agrobacterium tumefaciens carrying pr35S-Glyma. 06G303700 was thawed from -80℃ storage and inoculated into 10 mL of YEP liquid medium containing spectinomycin. The culture was grown in a shaker at 200 rpm and 28℃ to OD600 = 0.8, and the cells were harvested by centrifugation at 10,000 rpm for 1 min at room temperature. Resuspension buffer was prepared (1 mL of 20 mM MES, pH 5.6, and 500 μL of 1 M MgCl2, made up to 50 mL with sterile water) and used to wash the cells twice. The pellet was then resuspended in 1 mL of resuspension buffer, and 2 μL of acetosyringone (dissolved in DMSO) was added to a final concentration of 0.04 g/L. The bacterial suspension was transferred to a large EP tube, diluted with resuspension buffer to an OD600 of about 0.2, and left to stand at room temperature for 1-3 h. Healthy tobacco plants grown for 3 weeks were selected, and their leaves were pierced with a syringe needle and infiltrated with the prepared Agrobacterium. The inoculated plants were placed in an incubator (22℃, 16 h light/8 h dark, 70 μmol·m⁻²·s⁻¹) for 48 hours, after which the leaf epidermis was peeled off and the subcellular localization of the target protein was observed with a confocal microscope. FIG. 8 shows the subcellular localization of Glyma. 06G303700 in an exemplary assay. Green fluorescence from pr35S-Glyma. 06G303700-GFP appears in the cell membrane and the nucleus, indicating that the protein localizes to both.
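Bringing the suspension from its measured OD600 down to about 0.2 is a C1V1 = C2V2 dilution. A minimal Python sketch, assuming OD600 is proportional to cell density over this range; the function name and the 10 mL final volume in the usage note are hypothetical.

```python
def od_dilution(od_measured, od_target, final_volume_ml):
    """Volumes (culture_ml, buffer_ml) that give od_target in final_volume_ml.

    Uses C1 * V1 = C2 * V2 and assumes OD600 scales linearly with dilution.
    """
    if not 0 < od_target <= od_measured:
        raise ValueError("target OD must be positive and not exceed the measured OD")
    culture_ml = od_target * final_volume_ml / od_measured
    return culture_ml, final_volume_ml - culture_ml
```

Diluting a suspension at OD600 0.8 to 0.2 in 10 mL therefore takes 2.5 mL of suspension and 7.5 mL of resuspension buffer.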
Example 9. Cloning and Vector Construction of Glyma. 06G303700
Total RNA extraction from soybean SN14 leaves. RNA was extracted from young SN14 trifoliate leaves by the TRIzol method. Electrophoresis on a 2% agarose gel showed three bands (28S, 18S and 5S), indicating good RNA integrity. The RNA was reverse transcribed into cDNA, which was used for cloning Glyma. 06G303700.
Cloning of Glyma. 06G303700. The CDS of Glyma. 06G303700 was obtained from the Phytozome database; its total length is 2190 bp. This sequence was used as a template to design primers at both ends of the CDS (with the stop codon TGA removed). A second primer pair was designed to add the restriction sites (SpeI and BamHI) that flank the ccdB gene in the entry vector. First, the full-length CDS of Glyma. 06G303700 was amplified with the CDS primers using cDNA from Suinong 14 (SN14) leaves as a template; this product was then used as a template for PCR with the restriction-site primers to obtain Glyma. 06G303700 flanked by restriction sites at both ends. The products were recovered with a gel extraction kit for subsequent experiments. The CDS was amplified using the following primers.
Table 17. Clone primer
Figure PCTCN2022075977-appb-000019
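The design of the restriction-site primers can be sketched in Python. The orientation (SpeI at the 5' end, BamHI at the 3' end), the 20 nt annealing length, and the two-base "GC" clamp are assumptions, since the actual sequences are given in Table 17. The sketch also checks that neither site occurs inside the insert, which would cut the insert during digestion; both recognition sites are palindromic, so checking one strand suffices. It does not check that the insert stays in frame with a downstream tag.

```python
SPEI_SITE = "ACTAGT"   # SpeI recognition site (A^CTAGT)
BAMHI_SITE = "GGATCC"  # BamHI recognition site (G^GATCC)
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def strip_stop_codon(cds):
    """Remove a terminal stop codon so a C-terminal tag can be fused."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def internal_sites(cds, sites):
    """Names of enzymes whose (palindromic) site occurs inside the insert."""
    return sorted(name for name, site in sites.items() if site in cds.upper())


def site_primers(cds, anneal_len=20, clamp="GC"):
    """Hypothetical primers: clamp + SpeI + 5' CDS, clamp + BamHI + revcomp(3' CDS)."""
    body = strip_stop_codon(cds)
    forward = clamp + SPEI_SITE + body[:anneal_len]
    reverse = clamp + BAMHI_SITE + reverse_complement(body[-anneal_len:])
    return forward, reverse
```

Before ordering primers, `internal_sites(cds, {"SpeI": SPEI_SITE, "BamHI": BAMHI_SITE})` should return an empty list for the Glyma.06G303700 CDS; otherwise a different enzyme pair would be needed.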
Construction of the entry vector (Fu28-Glyma. 06G303700). The amplified Glyma. 06G303700 fragment (about 2200 bp) was gel-purified, and the fragment and the Fu28 empty vector were digested with SpeI and BamHI; the Fu28 vector fragment remaining after excision of the ccdB gene was about 3200 bp. The two fragments were ligated with Solution I ligase, and the ligation product was transformed into E. coli DH5α competent cells and cultured on chloramphenicol plates for about 16 hours. Single colonies were picked, and bacterial clones carrying the Glyma. 06G303700 cDNA were identified by PCR with 1.5% agarose gel electrophoresis and verified by sequencing with the primers described below. A target band appeared at 2190 bp, and the sequencing results were consistent with the Glyma. 06G303700 gene sequence.
Construction of the expression vector (pr35S-Glyma. 06G303700). The Fu28-Glyma. 06G303700 and pr35S vector plasmids were extracted and recombined by an LR reaction, and the products were transformed into E. coli DH5α competent cells and cultured on spectinomycin plates for about 16 h. Single colonies were picked, and the activated cultures were identified by PCR with the primers in 2-15, followed by 1.5% agarose gel electrophoresis and sequencing. A target band appeared at 2190 bp, and the sequencing results were completely consistent with the Glyma. 06G303700 gene sequence.
Transfer of the expression vector into Agrobacterium tumefaciens EHA105. Competent EHA105 cells were transformed with pr35S-Glyma. 06G303700, the transformed cells were grown on YEP plates containing both rifampicin and spectinomycin, and single colonies were selected. PCR detection of a 2190 bp fragment confirmed that the expression vector (pr35S-Glyma. 06G303700) had been transferred into EHA105. Using Gateway technology, the Glyma. 06G303700 gene fragment and associated tags in the Fu28 entry vector were transferred into the expression vector pr35S (spectinomycin resistant) through the LR recombination reaction. The reaction system and reaction conditions are shown in Table 18.
Table 18. LR reaction system
Figure PCTCN2022075977-appb-000020
The resulting plasmid pr35S-Glyma. 06G303700 was transformed into Escherichia coli. The positive clones were identified by PCR and sequencing analysis. pr35S-Glyma. 06G303700 plasmid was then extracted from the positive monoclonal bacteria culture and transformed into Agrobacterium tumefaciens EHA105. Positive clones were identified by PCR.
Transient expression and localization of Glyma. 06G303700. Agrobacterium tumefaciens was infiltrated into tobacco leaves. After 48 hours, the infiltrated leaves were cut, and the epidermis was peeled off, spread in clean water on a glass slide, and covered with a cover glass. The subcellular localization of the pr35S-Glyma. 06G303700-GFP fusion protein was observed under a confocal microscope. As shown in FIG. 8, green fluorescence from pr35S-Glyma. 06G303700-GFP appears on the cell membrane and in the nucleus, indicating that Glyma. 06G303700 localizes to both the nucleus and the cell membrane.
Example 10. Expressing Glyma. 06G303700 in Arabidopsis
i.  Selection of Arabidopsis mutants
Using the BLAST function of the Phytozome website, the Arabidopsis gene AT1G05230, homologous to Glyma. 06G303700, was selected and its conserved domains identified. AT1G05230 contains three conserved domains: START_ArGLABRA2_like, Homeobox, and MrC superfamily. Like Glyma. 06G303700, AT1G05230 has the START_ArGLABRA2_like domain, as shown in Table 19.
Table 19. AT1G05230 gene function annotation
Figure PCTCN2022075977-appb-000021
Figure PCTCN2022075977-appb-000022
A search for AT1G05230 mutants on the TAIR10 website identified SALK_127828.4700. x (SEQ ID NO: 60) as the Arabidopsis mutant corresponding to Glyma. 06G303700. SALK_127828.4700. x is an Arabidopsis mutant in the Col-0 background carrying a 186 bp insertion in the coding region, generated by T-DNA insertion mutagenesis.
ii.  PCR identification of Arabidopsis mutants
Arabidopsis mutant SALK_127828.4700. x and wild-type Col-0 were planted, and DNA was extracted from the rosette leaves. PCR was performed with the LP+RP and LP+BP primer combinations shown in Table 11. The LP+RP product was 1170 bp, and the LP+BP product was 578-878 bp. Electrophoresis on a 1.5% agarose gel indicated that the mutant was homozygous.
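The homozygosity call rests on which of the two PCR products appears: the LP+RP product comes from the intact gene, and the LP+BP product from the T-DNA-tagged allele. A Python sketch of the usual decision logic for T-DNA insertion lines follows; the band-size windows (a tolerance around 1170 bp, and the 578-878 bp range quoted above) and the function names are assumptions.

```python
def band_present(observed_bp, low_bp, high_bp):
    """True if any observed band size (bp) falls within [low_bp, high_bp]."""
    return any(low_bp <= size <= high_bp for size in observed_bp)


def salk_genotype(observed_bp, wt_range=(1100, 1250), insert_range=(578, 878)):
    """Genotype call from gel bands; observed_bp maps primer pair -> band sizes.

    wt_range is a hypothetical tolerance around the 1170 bp LP+RP product.
    """
    wild_type = band_present(observed_bp.get("LP+RP", []), *wt_range)
    insertion = band_present(observed_bp.get("LP+BP", []), *insert_range)
    if wild_type and insertion:
        return "heterozygous"
    if insertion:
        return "homozygous mutant"
    if wild_type:
        return "wild type"
    return "no product: repeat PCR"
```

A line showing only the LP+BP band, with no 1170 bp LP+RP product, is called homozygous, which matches the result reported above.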
iii.  Basta screening of transgenic Arabidopsis
The T0 seeds harvested from the transformed Arabidopsis plants were planted. Once the seedlings had two leaves, Basta reagent (containing the herbicide Basta) was sprayed every other day. After three sprays, most of the plants had turned yellow and stopped growing, and only a few continued to grow. These surviving plants were preliminarily identified as transgenic complementation and overexpression plants.
iv.  Bar test strip detection and PCR identification of transgenic Arabidopsis
Transgenic Arabidopsis plants of the T1, T2, and T3 generations were planted. Leaf extract was prepared as described above, and a Bar test strip was inserted into the extract in the direction specified in the manufacturer's instructions. The results on the Bar test strip indicated that the Arabidopsis plants from which the leaf extracts were obtained were genetically modified.
Transgenic Arabidopsis plants of the T1, T2, and T3 generations were planted. DNA was extracted from rosette leaves, and PCR was performed for the target gene Glyma. 06G303700 and for the Bar gene. Electrophoresis on a 1.5% agarose gel showed bands at 516 bp (Bar gene) and 2190 bp (Glyma. 06G303700 gene), indicating that transgenic Arabidopsis plants were obtained.
v.  qRT-PCR identification of T 3 transgenic Arabidopsis
Transgenic Arabidopsis (overexpression plants pr35S: Glyma. 06G303700 and mutant complementation plants pr35S: Glyma. 06G303700/SALK_127828.4700. X), Col-0, and the mutant SALK_127828.4700. X were grown under the same conditions. Total RNA was extracted from rosette leaves and reverse transcribed into cDNA, and expression of the target gene Glyma. 06G303700 was measured by qRT-PCR. Glyma. 06G303700 was expressed in both the complementation plants and the overexpression plants, with higher expression in the overexpression plants.
vi.  Determination of fatty acids and total nitrogen in T3 transgenic Arabidopsis
Transgenic Arabidopsis thaliana (overexpression plants pr35S: Glyma. 06G303700 and mutant complementation plants pr35S: Glyma. 06G303700/SALK_127828.4700. X), Col-0, and the mutant SALK_127828.4700. X were grown under the same conditions. Mature pods of T3 transgenic plants were collected, and the seeds were harvested and dried. Seed fatty acid composition was determined by gas chromatography, and seed total nitrogen content by the Kjeldahl method. Comparing the mutant with the wild type, the fatty acid components of mutant seeds were lower than those of the wild type, with oleic, linoleic and eicosenoic acid contents significantly lower, and the total nitrogen content of mutant seeds was also significantly lower than that of the wild type. In T3 transgenic seeds, the fatty acid content of the complementation plants was significantly increased relative to the mutant, though still lower than that of the control plants, with a significant increase in linoleic acid. In the overexpression plants, the fatty acid components were higher than in the wild type: palmitic acid was highly significantly increased, and oleic and eicosenoic acids were significantly increased. For protein, the total nitrogen content of the complementation plants increased significantly and differed significantly from that of the control materials, and the total nitrogen content of the overexpression plants was significantly higher than that of the wild type. These results indicate that, in Arabidopsis seed protein and oil accumulation, Glyma. 06G303700 can promote fatty acid content to some extent and has a more pronounced effect on protein content (FIGS. 9 and 10).
Example 11. Expressing Glyma. 06G303700 in soybean
i.  Bar test strip detection and PCR identification of T 1 transgenic soybean
The T1 transgenic soybeans were planted, and their leaves were ground and tested with the Bar test strip as described above. Two bands appeared on the Bar test strip, indicating that the tested plants were genetically modified soybean plants.
DNA was extracted from leaves of T1 transgenic soybean, and PCR was performed with primers for the target gene Glyma. 06G303700 and for Bar. Electrophoresis on a 1.5% agarose gel showed bands at 516 bp (Bar) and 2190 bp (Glyma. 06G303700), indicating that the tested plants were transgenic soybean plants.
ii.  qRT-PCR identification of T1 transgenic soybeans
The transgenic soybean (overexpression plant 35S: Glyma. 06G303700) and the control plant DN50 were grown under the same conditions. Total RNA was extracted from young leaves and reverse transcribed into cDNA, and the expression of Glyma. 06G303700 was measured by qRT-PCR. Expression of Glyma. 06G303700 was higher in the overexpression plants than in the control plants, indicating that Glyma. 06G303700 was successfully transformed into soybean (FIG. 11).
iii.  Determination of protein and fatty acids in T1 transgenic soybean
The transgenic soybean (overexpression plant 35S: Glyma. 06G303700) and the control plant DN50 were grown under the same conditions, and their mature seeds were harvested; some of the seeds were dried for phenotyping. Grain protein and oil content and fatty acid composition were determined as described in section xii of Example 2 (InfraTec grain analysis for protein and oil, gas chromatography for fatty acids). The protein, oil, and fatty acid contents of the overexpression plants were significantly higher than those of the control plants, indicating that Glyma. 06G303700 improves quality traits (protein and oil content) (FIG. 12).
Example 12. Haplotype analysis of Glyma. 06G303700 in soybean
i.  Materials for haplotype analysis
Haplotype analysis was performed on 680 soybean germplasm accessions from Northeast China. Phenotypic analysis of protein and oil content showed that this population varies widely in both traits. In 2019, protein content ranged from 37.09% to 52.94% (mean 42.69%), and oil content ranged from 14.45% to 23.04% (mean 20.74%). This variation follows the pattern expected of quantitative phenotypic traits, so the population can be used for haplotype analysis of candidate genes. In 2018, the research team carried out whole-genome resequencing of this population, and those data were used here for gene haplotype analysis.
Table 20. Protein and oil content in soybean resources in northeast China
Figure PCTCN2022075977-appb-000023
ii.  Division of the Glyma. 06G303700 haplotype blocks
The genomes of the Northeast China soybean population were resequenced, and unqualified SNPs were filtered out. Twenty SNP variation sites in Glyma. 06G303700 and its promoter region met the research requirements (Table 2). Using HaploView 4.2, the SNP sites were divided into three blocks, with strong linkage disequilibrium among the SNPs within each block. Block 1 contains 5 SNPs, 1 of which is in the CDS; Block 2 contains 12 SNPs, 2 of which are in the CDS; Block 3 contains 2 SNPs, neither of which is in the CDS.
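Linkage disequilibrium between two SNPs is usually summarised by D′ and r². The Python sketch below computes both from haplotype strings; it assumes biallelic SNPs and, because soybean accessions are largely homozygous inbred lines, treats each accession's genotype across the SNPs as one haplotype. Haploview partitions blocks from pairwise LD using its own rules, which this sketch does not reproduce.

```python
def ld_stats(haplotypes, i, j):
    """(D', r^2) between SNP positions i and j of equal-length haplotype strings."""
    n = len(haplotypes)
    alleles_i = sorted({h[i] for h in haplotypes})
    alleles_j = sorted({h[j] for h in haplotypes})
    if len(alleles_i) != 2 or len(alleles_j) != 2:
        raise ValueError("both SNPs must be biallelic in this sample")
    a, b = alleles_i[0], alleles_j[0]
    # Allele and two-locus haplotype frequencies.
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin's normalisation).
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2
```

The two statistics answer different questions: D′ = 1 means no recombinant haplotype was observed, while r² = 1 additionally requires the two SNPs to have matching allele frequencies, so one SNP can stand in for the other.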
iii.  Excellent haplotype analysis and phenotypic correlation analysis
Haplotypes carried by more than 5.0% of the population (more than 34 of the 680 Northeast accessions) are called excellent haplotypes. Block 1 contains 3 excellent haplotypes, Hap_1, Hap_2 and Hap_3, which account for 71.07%, 12.60% and 7.31% of all haplotypes, respectively. The protein and oil phenotypes of the accessions in each excellent-haplotype group were compared with the multiple comparison function of SPSS. Hap_1 and Hap_2 differed highly significantly in protein content (P < 0.01) and significantly in oil content (P < 0.05); Hap_1 and Hap_3 differed highly significantly in both protein content (P < 0.01) and oil content (P < 0.01); Hap_2 and Hap_3 did not differ significantly in protein content but differed highly significantly in oil content (P < 0.01). Hap_1 showed a low-protein phenotype, and Hap_2 and Hap_3 showed a high-protein phenotype; for oil content, Hap_1 and Hap_2 showed a high-oil phenotype, and Hap_3 showed a low-oil phenotype. The exonic base variation (C:T) occurs at 1890 bp of the gene. Hap_2 carries the SNP variant that differs from the reference genome; its protein content differs highly significantly from that of Hap_1 (P < 0.01), and its oil content differs significantly from Hap_1 (P < 0.05) and highly significantly from Hap_3 (P < 0.01). See FIG. 13 and Table 21.
Table 21. Significance analysis of phenotypes among excellent haplotypes in Block 1
Figure PCTCN2022075977-appb-000024
Block 2 contains two excellent haplotypes, Hap_4 and Hap_5, which account for 58.01% and 20.68% of the haplotypes, respectively. Multiple comparison in SPSS of the protein and oil phenotypes of the accessions in each haplotype group showed no significant difference in protein content between Hap_4 and Hap_5, but a significant difference in oil content (P < 0.05). Hap_4 showed a low-oil phenotype, while Hap_5 showed a high-oil phenotype. Exonic base variations (C:G) and (G:A) occur at 6941 bp and 6977 bp of the gene. Hap_5 carries SNP variants that differ from the reference genome, and its oil phenotype differs significantly from that of Hap_4 (P < 0.05). See FIG. 14 and Table 22.
Table 22. Significance analysis of phenotypes among excellent haplotypes in Block 2
Figure PCTCN2022075977-appb-000025
Block 3 contains two excellent haplotypes, Hap_6 and Hap_7, which account for 64.23% and 35.93% of the haplotypes, respectively. Multiple comparison in SPSS of the protein and oil phenotypes of the accessions in each haplotype group showed that Hap_6 and Hap_7 differ significantly in protein content (P < 0.05) and highly significantly in oil content (P < 0.01). Hap_6 had higher protein content than Hap_7, giving a high-protein phenotype for Hap_6 and a low-protein phenotype for Hap_7. Hap_6 had lower oil content than Hap_7, giving a low-oil phenotype for Hap_6 and a high-oil phenotype for Hap_7. See FIG. 15 and Table 23.
Table 24. Significance analysis of phenotypes among excellent haplotypes in Block 3
Table 25. Haplotypes associated with increased protein content and oil content
Candidate genes were obtained by BSA mixed-pool sequencing. Glyma.03G040200 has an OPT domain; its expression in seeds is low, and its amino acid sequence does not differ between the parents. Glyma.03G036300 has a domain with a function related to DNA repair, and its expression is absent in various tissues. Glyma.07G192400 has no recognizable domains and is highly expressed in seeds. Glyma.06G303700 has structural domains with a function related to lipid transfer and is highly expressed in seeds. Glyma.06G297500 has no recognizable domains and is expressed at low levels in seeds.
Glyma.06G303700 has the domain START_ArGLABRA2_like, which has a function related to lipid transfer. Tissue-specific expression results indicate that the gene is highly expressed in seeds, which may be related to soybean quality and to the regulation of the synthesis and metabolism of grain storage substances.
Glyma.06G303700 is expressed in all tissues and organs, with the highest expression level in seeds. During soybean seed development, its expression pattern is partly similar to those of published soybean seed protein- and oil-related genes (GmWRI1a, GmWRI1b, GmLEC1a, GmLEC1b, GmFUSa, GmABI3, GmABI5, and GmDREBL), showing a low-high-low trend.
Arabidopsis mutants of genes highly homologous to Glyma.06G303700 were purchased through the ABRC website (abrc.osu.edu), homozygous mutant seeds were screened, and the fatty acid content and total nitrogen content of the seeds were determined. The fatty acid content and total nitrogen content of the Arabidopsis mutant seeds were significantly lower than those of control plants. In complemented mutant plants, the fatty acid content and total nitrogen content increased significantly, and both also increased in overexpression plants. In summary, Glyma.06G303700 has important potential for improving soybean quality and has a regulatory effect on soybean seed protein and oil content.
Resequencing was performed on 680 soybean resources from Northeast China, and haplotype analysis of the gene Glyma.06G303700 (including the promoter) was carried out. The gene carried 20 SNPs across these resources, 3 of which were located in the exon region: the base variations at 1890 bp, 6941 bp, and 6977 bp are (C:T), (C:G), and (G:A), respectively. These sites may be closely related to the accumulation of protein and oil. Based on linkage disequilibrium, the sites were divided into three blocks. Block 1 has three excellent haplotypes: Hap_1 is a low-protein, high-oil phenotype; Hap_2 is a high-protein, high-oil phenotype; and Hap_3 is a high-protein, low-oil phenotype. Block 2 has two excellent haplotypes: Hap_4 is a low-oil phenotype and Hap_5 is a high-oil phenotype. Block 3 has two excellent haplotypes: Hap_6 is a high-protein, low-oil phenotype and Hap_7 is a low-protein, high-oil phenotype.
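The block-by-block results summarized above can be collected into a small lookup table. In this Python sketch the data structure and helper name are hypothetical. The phenotype calls are copied from the text, and Hap_4 and Hap_5 carry no protein call because no significant protein difference was found in Block 2.

```python
# Haplotype -> phenotype summary transcribed from the paragraph above.
# None marks a trait for which the text reports no high/low call.
PHENOTYPES = {
    "Hap_1": {"block": 1, "protein": "low", "oil": "high"},
    "Hap_2": {"block": 1, "protein": "high", "oil": "high"},
    "Hap_3": {"block": 1, "protein": "high", "oil": "low"},
    "Hap_4": {"block": 2, "protein": None, "oil": "low"},
    "Hap_5": {"block": 2, "protein": None, "oil": "high"},
    "Hap_6": {"block": 3, "protein": "high", "oil": "low"},
    "Hap_7": {"block": 3, "protein": "low", "oil": "high"},
}


def haplotypes_with(trait: str, level: str = "high") -> list:
    """Return haplotypes whose reported phenotype for `trait` equals `level`."""
    return sorted(name for name, p in PHENOTYPES.items() if p[trait] == level)


print(haplotypes_with("oil"))      # ['Hap_1', 'Hap_2', 'Hap_5', 'Hap_7']
print(haplotypes_with("protein"))  # ['Hap_2', 'Hap_3', 'Hap_6']
```

The two queries return the same haplotype sets that the claims associate with increased oil content (Hap_1, Hap_2, Hap_5, Hap_7) and with increased protein content (Hap_2, Hap_3, Hap_6).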
All patents, patent publications, patent applications, journal articles, books, technical references, and the like discussed in the instant disclosure are incorporated herein by reference in their entirety for all purposes.
It can be appreciated that, in certain aspects of the disclosure, a single component may be replaced by multiple components, and multiple components may be replaced by a single component, to provide an element or structure or to perform a given function or functions. Except where such substitution would not be operative to practice certain embodiments of the disclosure, such substitution is considered within the scope of the disclosure.
The examples presented herein are intended to illustrate potential and specific implementations of the disclosure. It can be appreciated that the examples are intended primarily for purposes of illustration of the disclosure for those skilled in the art. There may be variations to these diagrams or the operations described herein without departing from the spirit of the disclosure. For instance, in certain cases, method steps or operations may be performed or executed in differing order, or operations may be added, deleted or modified.
All numerical designations, e.g., pH, temperature, time, concentration, and molecular weight, including ranges, are approximations which are varied (+) or (-) by increments of 0.1 or 1.0, as appropriate. It is to be understood, although not always explicitly stated, that all numerical designations are preceded by the term “about.” Where a range of values is provided, it is understood that each intervening value, to the smallest fraction of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Any narrower range between any stated values or unstated intervening values in a stated range and any other stated or intervening value in that stated range is encompassed. The upper and lower limits of those smaller ranges may independently be included or excluded in the range, and each range where either, neither, or both limits are included in the smaller ranges is also encompassed within the technology, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included.
The following copending commonly owned patent application is incorporated by reference in its entirety for all purposes:
METHODS AND COMPOSITIONS FOR INCREASING PROTEIN AND/OR OIL CONTENT AND MODIFYING OIL PROFILE IN A PLANT, International Application No. ____________, filed ______, 2022, and filed concurrently herewith. (Attorney Docket No. 086879-1262815; Syngenta Reference No. 82423-WO-REG-ORG-P-1) .
In the foregoing description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the invention described in this disclosure may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described to avoid obscuring the invention. Embodiments of the disclosure have been described for illustrative and not restrictive purposes. Although the present invention is described primarily with reference to specific embodiments, it is also envisioned that other embodiments will become apparent to those skilled in the art upon reading the present disclosure, and it is intended that such embodiments be contained within the present inventive methods. Accordingly, the present disclosure is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications can be made without departing from the scope of the claims below.

Claims (91)

  1. An elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, and/or 24-59, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile on the elite Glycine max plant as compared to a control plant not comprising said nucleic acid sequence.
  2. The elite Glycine max plant of claim 1, wherein the donor Glycine plant is a Glycine soja plant or Glycine max plant.
  3. The elite Glycine max plant of claim 2, wherein the Glycine soja plant is a ZYD00006 variety.
  4. The elite Glycine max plant of claim 2, wherein the Glycine max plant is a DN50 variety or a SN14 variety.
  5. The elite Glycine max plant of claim 1 or 4, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  6. The elite Glycine max plant of any one of claims 1-5, wherein the nucleic acid sequence has at least 90%, 95%, or 100% sequence identity to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21.
  7. The elite Glycine max plant of any one of claims 1-6, wherein the polypeptide encoded by the nucleic acid sequence has at least 90%, or at least 95% identity to any one of SEQ ID NO: 3, 5, or 22, wherein the polypeptide comprises one or more of the following:
    (i) a START domain, wherein the START domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 246-466 of SEQ ID NO: 20, or
    (ii) a homeodomain, wherein the homeodomain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 55-113 of SEQ ID NO: 20.
  8. The elite Glycine max plant of any one of claims 1-6, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the genomic sequences corresponding to and comprising any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16 or 17, wherein the genome editing confers increased protein content, increased oil content, and/or modified oil profile.
  9. The elite Glycine max plant of any one of claims 1-6, wherein the nucleic acid sequence is introduced by genome editing of a Glycine max genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region of at least one allele change corresponding to any described in any of Tables 21-23, wherein the one or more alleles are associated with the one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, wherein said one or more alleles confer in the plant increased protein content, increased oil content, and/or modified oil profile, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein content, increased oil content, and/or modified oil profile.
  10. The elite Glycine max plant of claim 8 or 9, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
  11. The elite Glycine max plant of any one of claims 1-6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of
    (a) a nucleic acid sequence having at least 90% identity or 95% identity to SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21,
    (b) a nucleic acid sequence encoding a polypeptide having at least 90% identity or 95% identity to the sequence of any one of SEQ ID NO: 22 or 24-59, or
    (c) a nucleic acid sequence encoding a polypeptide having the sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59,
    wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile on the elite Glycine max plant.
  12. The elite Glycine max plant of any one of claims 1-11, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  13. The elite Glycine max plant of any one of claims 1-11, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  14. The elite Glycine max plant of any one of claims 1-13, wherein at least one parental line of said elite Glycine max plant was selected or identified through molecular marker selection, wherein said parental line is selected or identified based on the presence of a molecular marker located within or closely linked with said nucleic acid sequence corresponding to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, or any portion thereof, wherein said molecular marker is associated with increased protein content, increased oil content, and/or modified oil profile.
  15. The elite Glycine max plant of claim 14, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP), or a microsatellite.
  16. The elite Glycine max plant of claim 14 or 15, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein content, increased oil content, and/or modified oil profile, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
  17. The elite Glycine max plant of any one of claims 1-16, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  18. A plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to the sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, and/or 24-59, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
  19. The plant of claim 18, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of a nucleic acid sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, and/or 21.
  20. The plant of claim 18 or 19, wherein the genome editing comprises duplication, inversion, promoter modification, terminator modification and/or splicing modification of the nucleic acid sequence.
  21. The plant of any one of claims 18-20, wherein the nucleic acid sequence is introduced by genome editing of a genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region of at least one allele change corresponding to any described in Table 2, wherein the one or more alleles are associated with the one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6 and/or Hap_7, wherein said one or more alleles confer in the plant increased protein content, increased oil content, and/or modified oil profile, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein content, increased oil content, and/or modified oil profile.
  22. The plant of any one of claims 18-21, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
  23. The plant of any one of claims 18-22, wherein the plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  24. The plant of any one of claims 18-22, wherein the plant has in its genome at least one allele that is associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  25. The plant of any one of claims 18-24, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein content, increased oil content, and/or modified oil profile, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
  26. The plant of any one of claims 18-25, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  27. The plant of any one of claims 18-26, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.
  28. The plant of claim 27, wherein the promoter is a native promoter or active variant or fragment thereof.
  29. A plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having
    (a) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or,
    (b) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59,
    wherein said nucleic acid sequence is heterologous to the plant, and
    wherein the plant has increased protein content, increased oil content, and/or modified oil profile as compared to a control plant.
  30. The plant of claim 29, wherein
    (a) the nucleic acid sequence comprises at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, or
    (b) the nucleic acid sequence is any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21.
  31. The plant of claim 29 or 30, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.
  32. The plant of claim 29 or 30, wherein the nucleic acid sequence is introduced by genome editing.
  33. The plant of any one of claims 29-32, wherein the promoter is an endogenous promoter.
  34. The plant of any one of claims 29-32, wherein the promoter is a constitutive promoter, an inducible promoter, or a tissue-specific promoter.
  35. The plant of any one of claims 29-31, wherein said genomic region of the plant comprises at least one allele corresponding to one or more alleles as described in any of Tables 2-5, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, and wherein said one or more alleles confer in the plant increased protein content and/or oil content.
  36. The plant of any one of claims 29-35, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  37. The plant of any one of claims 29-35, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  38. The plant of any one of claims 29-37, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein content, increased oil content, and/or modified oil profile, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 2.
  39. The plant of any one of claims 29-38, wherein the plant is a dicot plant.
  40. The plant of claim 39, wherein the dicot plant is a soybean plant or an elite soybean plant.
  41. The plant of any one of claims 29-38, wherein the plant is a monocot plant.
  42. The plant of claim 41, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
  43. The plant of any one of claims 29-42, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  44. A progeny plant from the elite Glycine max plant of any one of claims 1-17 or the plant of any one of claims 18-43, wherein said progeny plant has stably incorporated into its genome the nucleic acid sequence.
  45. A plant cell, seed, or plant part derived from the elite Glycine max plant of any one of claims 1-17 or the plant of any one of claims 18-43, wherein said plant cell, seed or plant part has stably incorporated into its genome the nucleic acid sequence.
  46. A harvest product derived from the elite Glycine max plant of any one of claims 1-17 or the plant of any one of claims 18-43.
  47. A processed product derived from the harvest product of claim 46, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.
  48. A method of producing a soybean plant having increased protein content, increased oil content, and/or modified oil profile, the method comprising the steps of:
    a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence confers to said donor soybean plant increased protein content, increased oil content, and/or modified oil profile;
    b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and
    c) selecting a progeny plant from the cross of b) by isolating a nucleic acid from said progeny plant and detecting within said nucleic acid a molecular marker associated with said nucleic acid sequence thereby producing a soybean plant having increased protein content, increased oil content, and/or modified oil profile.
  49. The method of claim 48, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
  50. The method of claim 48 or 49, wherein the molecular markers are markers as set forth in Tables 2-5.
  51. The method of any one of claims 48-50, wherein either the recipient or the donor soybean plant is an elite Glycine max plant.
  52. A method of producing a Glycine max plant with increased protein content, increased oil content, and/or modified oil profile, the method comprising the steps of:
    a) isolating a nucleic acid from a Glycine max plant;
    b) detecting in the nucleic acid of a) at least one molecular marker associated with, or closely linked with, a nucleic acid sequence comprising any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, or a portion of any thereof, wherein said portion confers to a plant increased protein content, increased oil content, and/or modified oil profile;
    c) selecting a plant based on the presence of the molecular marker detected in b); and
    d) producing a Glycine max progeny plant from the plant of c) identified as having said marker associated with increased protein content, increased oil content, and/or modified oil profile.
  53. The method of claim 52, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
  54. The method of claim 52 or 53, wherein the molecular marker is one or more SNPs set forth in Table 2.
  55. The method of any one of claims 52-54, wherein the molecular marker comprises alleles associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, and/or Hap_7.
  56. The method of claim 52, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.
  57. The method of claim 52, wherein the nucleic acid is selected from DNA or RNA.
  58. A plant produced by the method of any one of claims 48-57.
  59. A method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising:
    a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having
    (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or
    (ii) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59,
    wherein said nucleic acid sequence is heterologous to the plant, and
    wherein expression of said nucleic acid sequence increases protein content, increases oil content, and/or modifies oil profile compared to a control plant not expressing said nucleic acid sequence.
  60. The method of claim 59, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.
  61. The method of claim 59, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content, increased oil content, and/or modified oil profile.
  62. The method of claim 59, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.
  63. The method of claim 59, wherein the method comprises Cas12a mediated gene replacement.
  64. The method of claim 63, wherein the method comprises at least one gRNA.
  65. The method of any one of claims 59-64, wherein the promoter is an exogenous promoter.
  66. The method of any one of claims 59-64, wherein the promoter is an endogenous promoter.
  67. The method of claim 65, wherein the exogenous promoter comprises SEQ ID NO: 23 or an active variant or fragment thereof.
  68. The method of any one of claims 59-67, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
  69. The method of any one of claims 59-68, wherein the plant is a dicot plant.
  70. The method of claim 69, wherein the dicot plant is a soybean plant.
  71. The method of any one of claims 59-68, wherein the plant is a monocot plant.
  72. The method of claim 71, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
  73. A plant produced by the method of any one of claims 59-72.
  74. A polypeptide selected from:
    (a) a polypeptide having the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein expression of the polypeptide in a plant confers increased protein content, increased oil content, and/or modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto;
    (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in the plant confers increased protein content, increased oil content, and/or modified oil profile on said plant;
    (c) a polypeptide having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity with and having the same function as the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein the polypeptide when expressed in a plant confers increased protein content, increased oil content, and/or modified oil profile on the plant; or
    (d) a fusion protein comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59 or the polypeptide as defined in any one of (a) to (c).
  75. A nucleic acid molecule comprising
    (a) a nucleotide sequence encoding a polypeptide having an amino acid sequence having at least 90%, 95% or 100% sequence identity to SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein content, increases oil content, and/or modifies oil profile in the plant;
    (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21;
    (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21; or
    (d) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to a polynucleotide encoding any one of SEQ ID NO: 22 or 24-59.
  76. An expression cassette comprising the nucleic acid molecule of claim 74 or a nucleic acid sequence encoding the polypeptide of claim 73.
  77. The expression cassette of claim 75, wherein the nucleic acid molecule is operably linked to a promoter that is capable of directing expression in a plant cell.
  78. The expression cassette of claim 75, wherein the promoter is an endogenous promoter.
  79. The expression cassette of claim 75, wherein the promoter is an exogenous promoter.
  80. A vector comprising the nucleic acid molecule of claim 74 or the expression cassette of any one of claims 75-78.
  81. A transgenic cell comprising the nucleic acid molecule of claim 74 or the expression cassette of any one of claims 75-78.
  82. Use of the polypeptide of claim 73 or the nucleic acid molecule of claim 74, or the expression cassette of any one of claims 75 to 78 in conferring increased protein content, increased oil content, and/or modified oil profile in a plant.
  83. Use of the expression cassette of any one of claims 75-78 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content is increased, the oil content is increased and/or the oil profile is modified in the cell.
  84. A method for increasing protein content and/or oil content and/or modifying oil profile in a plant, comprising increasing the expression level and/or activity of the polypeptide of claim 73 in the plant.
  85. A method for producing a plant variety with increased protein content, increased oil content, and/or modified oil profile, comprising increasing the expression level and/or activity of the polypeptide of claim 73 in a recipient plant.
  86. The method of claims 83 or 84, wherein the increasing the expression level and/or activity of the polypeptide in the plant is by transgenic means or by breeding.
  87. A method for producing a transgenic plant with increased protein content, increased oil content, and/or modified oil profile, comprising the following step: introducing the nucleic acid molecule of claim 67 or the expression cassette of any one of claims 75-78 to a recipient plant to obtain a transgenic plant; the transgenic plant has increased protein content, increased oil content, and/or modified oil profile compared with the recipient plant.
  88. The method of claim 86, wherein the introducing the nucleic acid molecule to the recipient plant is performed by introducing the expression cassette of any one of claims 75-78 into the recipient plant.
  89. A primer pair for amplifying the nucleic acid molecule of claim 74.
  90. The primer pair of claim 89, wherein the primer pair is primer pair 1, composed of two single-stranded DNA molecules comprising the sequences of SEQ ID NO: 63 and SEQ ID NO: 64, respectively.
  91. A kit comprising the primer pair of claim 89 or 90.

Priority Applications (2)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/075977 (WO2023151004A1, en) | 2022-02-11 | 2022-02-11 | Methods and compositions for increasing protein and oil content and/or modifying oil profile in plant
PCT/US2023/062421 (WO2023154887A1, en) | 2022-02-11 | 2023-02-10 | Methods and compositions for increasing protein and/or oil content and modifying oil profile in a plant

Publications (1)

Publication Number | Publication Date
WO2023151004A1 (en) | 2023-08-17

Family ID: 87563480

Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2002063020A2 (en) * | 2001-01-05 | 2002-08-15 | Monsanto Technology LLC | Soybean plants with enhanced yields and methods for breeding for and screening of soybean plants with enhanced yields
CN1681384A (en) * | 2002-07-11 | 2005-10-12 | Monsanto Technology LLC | High yielding soybean plants with increased seed protein plus oil
AR051465A1 (en) * | 2004-10-22 | 2007-01-17 | Agrinomics LLC | Generation of plants with altered oil content
US20130333061A1 (en) * | 2008-02-05 | 2013-12-12 | Wei Wu | Isolated novel nucleic acid and protein molecules from soy and methods of using those molecules to generate transgenic plants with enhanced agronomic traits
CN105475116A (en) * | 2006-03-10 | 2016-04-13 | Monsanto Technology LLC | Soybean seed and oil compositions and methods of making same
US20190085038A1 (en) * | 2015-12-28 | 2019-03-21 | Evogene Ltd. | Plant traits conferred by isolated polynucleotides and polypeptides

Legal Events

Code | Title | Description
121 | EP: The EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22925366; Country of ref document: EP; Kind code of ref document: A1