WO2023154887A1 - Methods and compositions for increasing protein and/or oil content and modifying oil profile in a plant - Google Patents


Info

Publication number
WO2023154887A1
Authority
WO
WIPO (PCT)
Application number
PCT/US2023/062421
Other languages
French (fr)
Inventor
Qingshan Chen
Zhaoming QI
Dawei XIN
Jian LV
Xiaoping Tan
Original Assignee
Northeast Agricultural University
Syngenta Group Co, Ltd
Syngenta Crop Protection Ag
Priority claimed from PCT/CN2022/075977 (WO2023151004A1)
Priority claimed from PCT/CN2022/075982 (WO2023151007A1)
Application filed by Northeast Agricultural University, Syngenta Group Co, Ltd, and Syngenta Crop Protection Ag
Publication of WO2023154887A1


Classifications

    • C: CHEMISTRY; METALLURGY
    • C07: ORGANIC CHEMISTRY
    • C07K: PEPTIDES
    • C07K14/00: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415: Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/63: Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79: Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82: Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241: Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8261: Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield

Definitions

  • This disclosure relates to the field of plant biotechnology.
  • More specifically, it relates to methods and compositions for increasing plant protein and/or oil content and modifying the oil profile.
  • Soybean is a valuable field crop. Soybean oil extracted from the seed is employed in a number of retail products such as cooking oil, baked goods, margarines, and the like.
  • Soybean grain is also used as a food source for both animals and humans.
  • Soybean meal is a component of many foods and animal feeds. Typically, during the processing of whole soybeans, the fibrous hull is removed and the oil is extracted; the remaining soybean meal is approximately 50% carbohydrate and 50% protein.
  • Soybean meal is also made into soybean flour, which is processed into protein concentrates used for meat extenders or specialty pet foods. Production of edible protein ingredients from soybean offers a healthier and less expensive replacement for animal protein in meats as well as in dairy-type products.
  • Provided herein is an elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% or at least 95% identity to the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139, wherein said polypeptide confers increased protein content, increased oil content, and/or a modified oil profile on the elite Glycine max plant as compared to a control plant not comprising said nucleic acid sequence.
  • In another aspect, provided herein is a plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% or at least 95% identity to the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 and/or 139, wherein said polypeptide confers increased protein content, increased oil content, and/or a modified oil profile relative to a control plant, and wherein the plant did not comprise said nucleic acid sequence before the genome editing.
  • nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having (a) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or (b) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content and/or increased
  • A method of producing a soybean plant having increased protein content, increased oil content, and/or a modified oil profile, comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% or at least 95% identity to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137, or a nucleic acid sequence encoding any one of SEQ ID NOs: 22, 24-59, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137, wherein said nucleic acid sequence confers on said donor soybean plant increased protein content, increased oil content, and/or a modified oil profile; b) crossing the donor soybean plant of a)
  • A method of conferring increased protein content, increased oil content, and/or a modified oil profile to a plant, comprising: a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or (ii) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130,
  • A polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in any one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein expression of the polypeptide in a plant confers increased protein content, increased oil content, and/or a modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in
  • A nucleic acid molecule comprising (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein content, increases oil content, and/or modifies the oil profile in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131,
  • Primer pairs for amplifying the nucleic acid molecule disclosed above are also provided herein.
  • A method of producing a Glycine max plant with increased protein content, increased oil content, and/or a modified oil profile, comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with a nucleic acid sequence comprising any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77,
  • FIG. 1 shows a bolting assessment of T3-generation wild-type Col-0 plants (WT), mutant SALK_021984C plants, transgenic Arabidopsis replenishment plants (pSOY1:Glyma.20G092400/SALK_021984C), and overexpression plants (pSOY1:Glyma.20G092400) according to certain aspects of this disclosure.
  • FIG. 2 shows the inflorescence of T3-generation wild-type Col-0 plants (WT), mutant SALK_021984C plants, and transgenic Arabidopsis replenishment plants
  • FIGS. 3A-3B show fatty acid compositions in seeds from wild-type plants (WT), mutant SALK_021984C plants, and transgenic Arabidopsis replenishment plants
  • FIG. 3A shows the content of various fatty acids. From left to right: WT (Col-0),
  • FIG. 3B shows total fatty acid content
  • FIG. 4 shows a phylogenetic tree of Glyma.20G092400 according to certain aspects of this disclosure.
  • FIG. 5 shows a phylogenetic tree of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIGS. 6A-6B show the protein content and oil content distributions, respectively, of the excellent-haplotype phenotypes in block 1 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIGS. 7A-7B show the protein content and oil content distributions, respectively, of the excellent-haplotype phenotypes in block 2 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
  • FIGS. 8A-8B show the protein content and oil content distributions, respectively, of the excellent-haplotype phenotypes in block 3 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.

DETAILED DESCRIPTION

  • Provided herein are polypeptides that increase protein content and/or oil content when expressed in a plant.
  • In some embodiments, the polypeptides result in a modified oil profile when expressed in a plant or part thereof, as compared to a control plant that does not express the polypeptides.
  • The terms “oil content” and “fatty acid content” are used interchangeably herein.
  • The terms “fatty acid profile” and “oil profile” are used interchangeably herein.
  • The polypeptides include SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, and/or 139 and variants thereof.
  • Methods of introducing the nucleic acid sequence into the soybean plant include transgenic means, gene editing, and breeding.
  • Markers for identifying the presence of these nucleic acid sequences in the plant are also disclosed.
  • The term “phenotype” refers to one or more distinguishable characteristics of a genetically controlled trait.
  • the plants provided herein are a non-naturally occurring variety of soybean having the desired trait.
  • the non-naturally occurring variety of soybean is an elite soybean variety.
  • A “non-naturally occurring variety of soybean” is any variety of soybean that does not occur in nature.
  • a “non-naturally occurring variety of soybean” may be produced by any method known in the art, including, but not limited to, transforming a soybean plant or germplasm, transfecting a soybean plant or germplasm, and crossing a naturally occurring variety of soybean with a non-naturally occurring variety of soybean.
  • A “non-naturally occurring variety of soybean” may comprise one or more heterologous nucleotide sequences.
  • A “non-naturally occurring variety of soybean” may comprise one or more non-naturally occurring copies of a naturally occurring nucleotide sequence (i.e., extraneous copies of a gene that naturally occurs in soybean). In some embodiments, a “non-naturally occurring variety of soybean” may comprise a non-natural combination of two or more naturally occurring nucleotide sequences (i.e., two or more naturally occurring genes that do not naturally occur in the same soybean, for instance genes not found in Glycine max lines).
  • Methods and compositions are provided that modulate the level of oil, protein and/or fatty acids in a plant, a plant part, or a seed.
  • an increase in protein content includes any statistically significant increase in the protein content in the plant, plant part or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher
  • an increase in protein content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, 30% or higher
  • an increase in oil content includes any statistically significant increase in the oil content in the plant, plant part or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher.
  • an increase in oil content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, about 25% to about 30%.
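The definitions above leave two choices open: whether "an increase of X%" is relative to the control value or an absolute difference in percentage points of seed composition, and which test establishes statistical significance. The sketch below assumes a relative increase of the subject mean over the control mean and uses a one-sided permutation test. The function names, the choice of test, and the seed values are hypothetical illustrations, not data or methods from this disclosure.

```python
import random
from statistics import mean
from typing import Sequence


def percent_increase(subject: Sequence[float], control: Sequence[float]) -> float:
    """Relative increase of the subject mean over the control mean, in percent.

    Assumption: "increase of X%" is read as relative to the control mean,
    not as an absolute difference in percentage points.
    """
    control_mean = mean(control)
    return (mean(subject) - control_mean) / control_mean * 100.0


def permutation_p_value(subject: Sequence[float], control: Sequence[float],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation test of H1: mean(subject) > mean(control)."""
    rng = random.Random(seed)
    observed = mean(subject) - mean(control)
    pooled = list(subject) + list(control)
    n = len(subject)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign plants to the two groups
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical seed protein contents (% of dry weight), for illustration only.
subject_plants = [42.1, 42.5, 42.3, 42.8]
control_plants = [40.0, 40.4, 40.2, 40.6]
print(f"{percent_increase(subject_plants, control_plants):.2f}% increase")
print(f"p = {permutation_p_value(subject_plants, control_plants):.4f}")
```

With only four plants per group, the smallest achievable exact p-value is 1/70 (one of the 70 ways to split eight values into two groups of four). A real comparison would therefore need more replicates.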
  • Various methods of assaying protein and oil content levels are known. For example, mature seeds can be harvested, and grain protein and oil content can be determined by FOSS analysis (see Examples).
  • an increase in fatty acid content includes any statistically significant increase in the fatty acid content in the plant, plant part or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher.
  • an increase in fatty acid content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, about 25% to about 30%.
  • Various methods of assaying fatty acid content levels are known. For example, mature seeds can be harvested, and fatty acid content can be determined by gas chromatography (see Examples).
  • In some embodiments, the methods and compositions provide for an increase in linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid (or any combination thereof) when compared to an appropriate control plant.
  • Such increases include, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher.
  • an increase in linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, about 25% to about 30%, or higher.
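An oil (fatty acid) profile is commonly reported as the share of each fatty acid in the total measured by gas chromatography. The sketch below assumes simple area normalization, in which each peak area is divided by the sum of all peak areas with no detector response factors. It reports changes relative to a control in percentage points. The function names and peak areas are hypothetical.

```python
from typing import Dict


def fatty_acid_profile(peak_areas: Dict[str, float]) -> Dict[str, float]:
    """Convert GC peak areas to percent of total fatty acids (area normalization)."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {fa: 100.0 * area / total for fa, area in peak_areas.items()}


def profile_change(subject: Dict[str, float], control: Dict[str, float]) -> Dict[str, float]:
    """Per-fatty-acid difference (subject minus control), in percentage points."""
    s, c = fatty_acid_profile(subject), fatty_acid_profile(control)
    return {fa: s.get(fa, 0.0) - c.get(fa, 0.0) for fa in sorted(set(s) | set(c))}


# Hypothetical peak areas, for illustration only.
control = {"palmitic": 11, "stearic": 4, "oleic": 23, "linoleic": 53,
           "linolenic": 8, "eicosenoic": 1}
subject = {"palmitic": 10, "stearic": 4, "oleic": 29, "linoleic": 49,
           "linolenic": 6, "eicosenoic": 2}
for fa, delta in profile_change(subject, control).items():
    print(f"{fa:>10}: {delta:+.1f} points")
```

An area-normalized profile always sums to 100%, so a larger share for one fatty acid necessarily means a smaller share for others. Absolute content, such as mg per g of seed, would require an internal standard.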
  • A “subject plant or plant cell” is one in which a genetic alteration, such as transformation, has been effected as to a polynucleotide of interest, or is a plant or plant cell that is descended from a plant or cell so altered and that comprises the alteration.
  • A “control” or “control plant” or “control plant cell” provides a reference point for measuring changes in phenotype of the subject plant or plant cell.
  • A control plant or plant cell may comprise, for example: (a) a wild-type plant or cell, i.e., of the same genotype as the starting material for the genetic alteration that resulted in the subject plant or cell; (b) a plant or plant cell of the same genotype as the starting material but which has been transformed with a null construct (i.e., with a construct which has no known effect on the trait of interest, such as a construct comprising a marker gene); (c) a plant or plant cell that is a non-transformed segregant among progeny of a subject plant or plant cell; (d) a plant or plant cell genetically identical to the subject plant or plant cell but which is not exposed to conditions or stimuli that would induce expression of the gene of interest; or (e) the subject plant or plant cell itself, under conditions in which the gene of interest is not expressed.
  • compositions and methods for conferring increased protein content, increased oil content, and/or modified oil profile are provided.
  • Polypeptides, polynucleotides and fragments and variants thereof that confer increased protein content, increased oil content, and/or modified oil profile are provided.
  • the polypeptide is SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139 or a fragment or variant of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139.
  • the polynucleotide is any one of SEQ ID NOs: 74,
  • the polynucleotide encodes a polypeptide having the sequence of any one of SEQ ID NOs: 75,
  • the polypeptide is SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • the polynucleotide is any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 22, or 24-59, or a fragment or variant of any one thereof.
  • The term “gene” refers to a hereditary unit including a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a particular characteristic or trait in an organism.
  • the genome of the soybean cultivar Williams 82 is used as the reference soybean genome.
  • Glyma.20G092400 (SEQ ID NO: 76) is detected in all tissues and organs, with the highest expression level in seeds (herein also referred to as grains) (Table 1). Expression is highest at the late milk (LM) stage of the grain. Glyma.20G092400 contains several conserved domains of the aminotransferase class-V family. This domain is found in aminotransferases and other enzymes, including cysteine desulfurase.
  • Glyma.20G092400 comprises a selenocysteine lyase/cysteine desulfurase domain (aa 50-437 of SEQ ID NO: 76); a cysteine desulfurase (SufS)-like domain (aa 91-274 of SEQ ID NO: 76); an aminotransferase class-V domain (aa 93-274 of SEQ ID NO: 76); and a bifunctional selenocysteine lyase/cysteine desulfurase domain (aa 92-275 of SEQ ID NO: 76).
  • Glyma.20G092000 (SEQ ID NO. 6) is detected in all tissues and organs, with the highest expression level in seeds (grains) (Table 1). The expression level is the highest in the LM stage of the grain.
  • Glyma.20G092000 contains several conserved domains of the retroviral protease superfamily, which includes the pepsin-like aspartic proteases of cells and retroviruses. It also contains sphingolipid activator-like protein type B regions 1 and 2.
  • Glyma.20G092000 comprises: a phytepsin domain (aa 76-505 of SEQ ID NO: 79); a eukaryotic aspartyl protease (ASP) domain (aa 84-506 of SEQ ID NO: 79); an aspartyl protease domain (aa 77-507 of SEQ ID NO: 79); and two saposin B domains (aa 316-351 and aa 380-418 of SEQ ID NO: 79).
  • The Glyma.20G094900 (SEQ ID NO: 9) sequence is detected in all tissues and organs, with the highest expression level in seeds (grains) (Table 1). Expression is highest at the LM stage of the grain. Glyma.20G094900 is a protein of unknown function, identified as DUF1336, and appears to belong to the DUF1336 superfamily. This family represents the C-terminus of many hypothetical proteins of unknown function.
  • Glyma.20G094900 comprises a protein enhanced disease resistance 2 (EDR2) C-terminal domain (aa 2-68 of SEQ ID NO: 82).
  • EDR2: protein enhanced disease resistance 2
  • The Glyma.20G092100 (SEQ ID NO: 85) sequence is detected in all tissues and organs, with the highest expression level in seeds (grains) (Table 1). Expression is highest at the DS stage of the grain. Glyma.20G092100 contains several conserved domains matching the PPR repeat family.
  • Glyma.20G092100 comprises several tetratricopeptide-like (TPR) helical domains (aa 57-253, aa 229-365, and aa 404-461 of SEQ ID NO: 85) and pentatricopeptide repeats (aa 403-429, aa 578-607, aa 438-461, aa 370-398, aa 647-675, aa 194-241, aa 264-313, aa 265-299, aa 540-574, aa 300-334, aa 435-469, aa 644-678, aa 403-434, aa 195-229, aa 230-264, aa 88-122, aa 158-194, aa 335-365, aa 470-504, aa 366-400, aa 123-157, aa 575-609, aa 370-398, aa 648
  • Glyma.06G303700 (SEQ ID NO: 1-5) sequence is expressed in all tissues and organs, with the highest expression level in seeds.
  • Glyma.START (Glyma.06G303700) comprises several conserved domains: a START_ArGLABRA2_like domain (aa 241-465 of SEQ ID NOs: 3 and 5); a START domain (aa 246-466 of SEQ ID NOs: 3 and 5); a homeobox domain (aa 57-110 of SEQ ID NOs: 3 and 5); a homeodomain (aa 55-113 of SEQ ID NOs: 3 and 5); a COG5576 superfamily domain (aa 13-129 of SEQ ID NOs: 3 and 5); and a MreC superfamily domain (aa 120-193 of SEQ ID NOs: 3 and 5).
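Domain positions such as "aa 241-465 of SEQ ID NO: 3" are read here as 1-based and inclusive, which is the usual convention for residue numbering. The minimal Python sketch below extracts such a span and measures how much two annotated domains overlap, using the coordinates listed above. The helper names are hypothetical, and the sequence in the usage example is a placeholder, not SEQ ID NO: 3.

```python
from typing import Tuple

Span = Tuple[int, int]  # (start, end), 1-based and inclusive


def domain_slice(protein: str, span: Span) -> str:
    """Return the residues covered by a 1-based, inclusive span."""
    start, end = span
    if not 1 <= start <= end <= len(protein):
        raise ValueError(f"span {span} is outside a {len(protein)}-residue protein")
    return protein[start - 1:end]  # shift start to 0-based; Python's end is exclusive


def overlap(a: Span, b: Span) -> int:
    """Number of residues shared by two inclusive spans (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Coordinates for Glyma.START (Glyma.06G303700) as listed above.
domains = {
    "START_ArGLABRA2_like": (241, 465),
    "START": (246, 466),
    "homeobox": (57, 110),
    "homeodomain": (55, 113),
    "COG5576": (13, 129),
    "MreC": (120, 193),
}
placeholder = "M" * 500  # stand-in sequence; SEQ ID NO: 3 is not reproduced here
print(len(domain_slice(placeholder, domains["homeobox"])))         # 54
print(overlap(domains["START_ArGLABRA2_like"], domains["START"]))  # 220
```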
  • the START_ArGLABRA2_like domain is the C-terminal lipid-binding START domain of the Arabidopsis homeobox protein GLABRA 2.
  • the START_ArGLABRA2_like subfamily includes the Arabidopsis homeobox protein GLABRA2 and other proteins related to steroid production.
  • The homeobox domain encodes a 61-amino-acid sequence that can bind specific DNA sequences and control gene expression at the transcriptional level.
  • The COG5576 superfamily domain is a homeodomain-containing transcriptional regulation domain. The MreC superfamily domain is usually involved in the formation and maintenance of cell shape and can position cell wall synthetic complexes.
  • Glyma.06G303700 is 8466 bp in length, and the CDS sequence is 2190 bp in length.
  • the exon region of Glyma.06G303700 (SEQ ID NO: 3) in soy variety SN14 is identical to the corresponding gene in soy variety Williams82 (W82). Wild soybean (G.
  • ZYD00006 comprises four mutations in Glyma.06G303700 relative to Williams82: C1162T (i.e., a change from C to T at the 1162 bp position), A1370G (i.e., a change from A to G at the 1370 bp position), C2063G (i.e., a change from C to G at the 2063 bp position), and C2098G (i.e., a change from G to A at the 2098 bp position).
  • The last three base mutations do not change the encoded amino acids, but the first, C1162T, results in an alanine-to-valine substitution at position 388 (A388V).
  • The phylogenetic tree of Glyma.06G303700 was constructed with MEGA5 software using homologous sequences from soybean, Arabidopsis, rice, corn, and other plants. See FIG. 5 and Table 62.
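This sketch assumes that positions in notations such as C1162T are counted from the first base (the A of ATG) of the Glyma.06G303700 CDS. Under that assumption, the codon affected by a substitution at position p is (p - 1) // 3 + 1. The formula places C1162T in codon 388, which is consistent with A388V. The parser and function names below are hypothetical helpers.

```python
import re
from typing import Tuple

# Reference base, 1-based CDS position, alternate base, e.g. "C1162T".
_SNV = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_snv(notation: str) -> Tuple[str, int, str]:
    """Split a substitution such as 'C1162T' into ('C', 1162, 'T')."""
    match = _SNV.match(notation)
    if match is None:
        raise ValueError(f"not a simple substitution: {notation!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def codon_number(cds_position: int) -> int:
    """1-based codon (= amino acid position) containing a 1-based CDS position."""
    return (cds_position - 1) // 3 + 1


for snv in ["C1162T", "A1370G", "C2063G", "C2098G"]:
    ref, pos, alt = parse_snv(snv)
    print(snv, "-> codon", codon_number(pos))
```

This sketch only locates the affected codon. Checking whether a substitution is synonymous, as reported for the last three, requires the actual codon in the CDS sequence, which is not included here.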
  • Glyma.06G303700 shows high homology with Glyma.15G220200, Glyma.12G100100, and AT1G05230.
  • Glyma.12G100100 contains the same conserved domains as Glyma.06G303700.
  • AT1G05230 contains START_ArGLABRA2_like and homeobox domains, which are also present in Glyma.06G303700.
  • AT1G05230 and Glyma.START (Glyma.06G303700) share 78.9% amino acid sequence identity.
  • Glyma.03G040200 (SEQ ID NOs: 10-12) has an OPT domain (aa 4-73 of SEQ ID NO: 12), which is related to transmembrane transport. Glyma.03G040200 is expressed at low levels in seeds.
  • the genomic sequence of Glyma.03G040200 (SEQ ID NO: 10) is 463 bp in length, and the CDS sequence (SEQ ID NO: 11) is 237 bp in length.
  • soy variety Williams82 SEQ ID NO: 12
  • Glyma.03G036300 (SEQ ID NOs: 6-9) is a PIF1 helicase involved in a number of cellular processes, including DNA repair, DNA strand breaking, recombination, nucleotide binding, ATP binding, telomere maintenance, and the cellular response to DNA damage stimulus.
  • The protein possesses helicase activity and hydrolase activity.
  • Glyma.03G036300 comprises a PIF1 domain (aa 2-211 of SEQ ID NO: 8), an SF1_C_RecD domain (aa 258-303 of SEQ ID NO: 8), and a RecD domain (aa 250-294).
  • The PIF1 domain is a conserved domain shared by the PIF1-like helicase family.
  • The SF1_C_RecD domain is found in the C-terminal helicase domain of RecD family helicases.
  • The RecD domain is found in ATP-dependent exoDNases and the like and acts as a 3'-5' helicase.
  • The RecBCD enzyme can unwind (separate) DNA strands and can also form single-stranded gaps in DNA.
  • Glyma.03G036300 in W82 (SEQ ID NO: 6) is 988 bp, and the full length of CDS (SEQ ID NO: 7) is 987 bp.
  • Glyma.03G036300 in ZYD is the same as that in W82.
  • Translation of Glyma.03G036300 terminates at the 294th amino acid in SN14 (SEQ ID NO: 9), whereas it proceeds normally in ZYD00006 (SEQ ID NO: 8).
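A premature stop, such as the one reported for Glyma.03G036300 in SN14, appears when a CDS is translated codon by codon and translation halts at the first stop codon. The minimal sketch below uses the standard genetic code. The function name and toy sequences are hypothetical and are not SEQ ID NOs: 7-9.

```python
from typing import Dict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_to_stop(cds: str) -> str:
    """Translate from the first base and stop at the first stop codon (not included)."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):  # ignore a trailing partial codon
        aa = CODON_TABLE.get(cds[i:i + 3], "X")      # 'X' for ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


wild_type = "ATGCAGTGGTGA"  # hypothetical: M Q W, then stop
truncated = "ATGTAGTGGTGA"  # CAG -> TAG creates a premature stop
print(translate_to_stop(wild_type), translate_to_stop(truncated))  # MQW M
```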
  • Glyma.07G192400 (SEQ ID NOs: 16-19) is highly expressed in seeds and is involved in transmembrane transport. No conserved domain information is known for Glyma.07G192400.
  • The genomic sequence of Glyma.07G192400 (SEQ ID NO: 16) is 4263 bp in length, and the CDS sequence (SEQ ID NO: 18) is 417 bp in length. Only one base mutation occurred in ZYD00006, a G-to-A change. Translating the CDS into its amino acid sequence showed that this mutation changes the amino acid at position 46 of the Glyma.07G192400 polypeptide from V (valine), as in SN14 or W82 (SEQ ID NO: ), to I (isoleucine), as in ZYD00006.
  • Glyma.06G297500: SEQ ID NOs: 13-15.
  • The full-length genomic sequence of Glyma.06G297500 is SEQ ID NO: 13.
  • The full-length CDS sequence is SEQ ID NO: 14.
  • the CDS sequence and amino acid sequence are identical in all three of soy varieties SN14, ZYD00006, and Williams82.
  • “Corresponding to,” in the context of nucleic acid sequences, means that when certain nucleic acid sequences are aligned with each other, the nucleic acids that “correspond to” certain enumerated positions in the present invention are those that align with these positions in a reference sequence, but that are not necessarily in these exact numerical positions relative to a particular nucleic acid sequence of the invention.
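The "corresponds to" rule can be made concrete with two sequences that have already been aligned, with gaps written as "-". A position in the reference maps to the query residue that sits in the same alignment column. The minimal sketch below uses a hypothetical helper name. A reference position that aligns to a gap has no corresponding query position.

```python
from typing import Optional


def corresponding_position(ref_aln: str, qry_aln: str, ref_pos: int) -> Optional[int]:
    """Map a 1-based reference position to the 1-based query position in the same column.

    Returns None when the reference residue is aligned to a gap in the query.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    r = q = 0  # residues seen so far in the reference and the query
    for ref_char, qry_char in zip(ref_aln, qry_aln):
        if ref_char != "-":
            r += 1
        if qry_char != "-":
            q += 1
        if ref_char != "-" and r == ref_pos:
            return q if qry_char != "-" else None
    raise ValueError(f"reference has fewer than {ref_pos} residues")


ref = "ACG-TTA"  # hypothetical aligned sequences
qry = "A-GCTTA"
print([corresponding_position(ref, qry, p) for p in range(1, 7)])  # [1, None, 2, 4, 5, 6]
```

In this example, reference position 3 corresponds to query position 2. This shows that corresponding positions need not have the same number.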
  • Optimal alignment of sequences for comparison can be conducted by computerized implementations of known algorithms, or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST) and ClustalW/ClustalW2/Clustal Omega programs available on the Internet (e.g., the website of the EMBL-EBI).
  • BLAST: Basic Local Alignment Search Tool
  • Variants and fragments of the above-described polynucleotides and polypeptides increase protein content, increase oil content, and/or modify the oil profile when expressed in a plant, plant part, or seed.
  • Fragments of the proteins that increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part, or seed include those that are shorter than the full-length sequences, either due to the use of an alternate downstream start site, or due to processing that produces a shorter protein having the activity
  • A fragment of a protein that increases protein content, increases oil content, and/or modifies the oil profile when expressed in a plant can be, for example, a polypeptide of 10, 25, 50, 100, 150, 200, 250 or more amino acids of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139.
  • Such biologically active portions can be prepared by recombinant techniques and evaluated for their ability to confer increased protein content, increased oil content, and/or a modified oil profile.
  • a fragment comprises at least 8 contiguous amino acids of SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139.
  • A fragment of a protein that increases protein content, increases oil content, and/or modifies the oil profile when expressed in a plant can be, for example, a polypeptide of 10, 25, 50, 100, 150, 200, 250 or more amino acids of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • Such biologically active portions can be prepared by recombinant techniques and evaluated for their ability to confer increased protein content, increased oil content, and/or a modified oil profile.
  • a fragment comprises at least 8 contiguous amino acids of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • Variants disclosed herein are polypeptides having an amino acid sequence that has at least 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98% or about 99% identity to the amino acid sequence of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
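The value of a percent identity depends on how identical positions are counted and what they are divided by. The sketch below uses a hypothetical function name and counts identical non-gap columns in a precomputed pairwise alignment. It divides that count either by the full alignment length, with gaps included, which is how EMBOSS needle reports "Identity", or by the number of residues in the reference sequence. The text above does not say which denominator applies to the listed thresholds, so that choice is an assumption.

```python
def percent_identity(aln_a: str, aln_b: str, denominator: str = "alignment") -> float:
    """Percent identity of two aligned sequences of equal length ('-' marks gaps)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    if denominator == "alignment":    # all columns, gaps included
        length = len(aln_a)
    elif denominator == "reference":  # residues of aln_a only
        length = sum(1 for x in aln_a if x != "-")
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    return 100.0 * matches / length


reference = "MKT-AYIAK"  # hypothetical aligned sequences
variant = "MKTQAY-AK"
print(round(percent_identity(reference, variant), 1))                 # 77.8
print(percent_identity(reference, variant, denominator="reference"))  # 87.5
```

With the two denominators, the same pair of sequences is 77.8% or 87.5% identical. That difference is enough to move a variant across one of the thresholds listed above.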
  • Such variants increase protein content, increase oil content, and/or modify the oil profile when expressed in a plant, plant part, or seed.
  • a variant polynucleotide comprises a deletion and/or addition of one or more nucleotides at one or more internal sites within the native polynucleotide and/or a substitution of one or more nucleotides at one or more sites in the native polynucleotide.
  • Equivalent programs may also be used.
  • By “equivalent program” is meant any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by needle from EMBOSS version 6.3.1.
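needle performs a global (Needleman-Wunsch) alignment. The sketch below is a simplified Needleman-Wunsch with a linear gap penalty and match/mismatch scores chosen for illustration. needle itself uses affine gap penalties (gap open and gap extend) and a substitution matrix. This code is therefore not an "equivalent program" in the sense defined above, and it will not always reproduce needle's alignments. The function name and scores are hypothetical.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[str, str, int]:
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),  # (mis)match
                              score[i - 1][j] + gap,            # gap in b
                              score[i][j - 1] + gap)            # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))  # ('ACGT', 'A-GT', 1)
```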
  • BLAST nucleotide searches can be performed with the BLASTN program (nucleotide query searched against nucleotide sequences) to obtain nucleotide sequences homologous to nucleic acid molecules of the invention, or with the BLASTX program (translated nucleotide query searched against protein sequences) to obtain protein sequences homologous to nucleic acid molecules of the invention.
  • BLAST protein searches can be performed with the BLASTP program (protein query searched against protein sequences) to obtain amino acid sequences homologous to protein molecules of the invention, or with the TBLASTN program (protein query searched against translated nucleotide sequences) to obtain nucleotide sequences homologous to protein molecules of the invention.
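  • to obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be used.
  • PSI-BLAST can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra.
  • when using BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g., BLASTX and BLASTN) can be used.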
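The four BLAST programs named above differ only in the molecule type of the query and of the sequences searched. The small Python lookup below restates that mapping; the program names are those of the NCBI BLAST+ command-line tools (blastn, blastx, blastp, tblastn), while the table and function names are hypothetical helpers for illustration. It does not run a search.

```python
# (query type, searched-sequence type) -> BLAST program, as described in the text.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",  # nucleotide query vs nucleotide sequences
    ("nucleotide", "protein"): "blastx",     # translated nucleotide query vs protein sequences
    ("protein", "protein"): "blastp",        # protein query vs protein sequences
    ("protein", "nucleotide"): "tblastn",    # protein query vs translated nucleotide sequences
}


def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the BLAST program matching the query and database molecule types."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {query_type!r} vs {database_type!r}"
        ) from None
```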
  • Alignment may also be performed manually by inspection.
  • Two sequences are “optimally aligned” when they are aligned for similarity scoring using a defined amino acid substitution matrix (e.g., BLOSUM62), gap existence penalty and gap extension penalty so as to arrive at the highest score possible for that pair of sequences.
  • Amino acid substitution matrices and their use in quantifying the similarity between two sequences are well-known in the art and described, e.g., in Dayhoff et al. (1978) “A model of evolutionary change in proteins.” In “Atlas of Protein Sequence and Structure,” Vol. 5, Suppl. 3 (ed. M. O. Dayhoff), pp. 345-352. Natl. Biomed. Res. Found., Washington, D.C. and Henikoff et al.
  • the BLOSUM62 matrix is often used as a default scoring substitution matrix in sequence alignment protocols.
  • the gap existence penalty is imposed for the introduction of a single amino acid gap in one of the aligned sequences, and the gap extension penalty is imposed for each additional empty amino acid position inserted into an already opened gap.
  • the alignment is defined by the amino acids positions of each sequence at which the alignment begins and ends, and optionally by the insertion of a gap or multiple gaps in one or both sequences, so as to arrive at the highest possible score.
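To make the gap definitions above concrete, the Python sketch below scores an already-computed pairwise alignment. Following the definitions in this section, the first position of each gap costs the gap existence penalty and every additional position in the same gap costs the gap extension penalty; some tools count extension differently, so treat this convention as an assumption when comparing against a particular program. The substitution function `toy_sub` is an illustrative stand-in, not BLOSUM62, and the penalties 11 and 1 are example values only.

```python
from typing import Callable


def toy_sub(x: str, y: str) -> int:
    """Illustrative substitution score (NOT BLOSUM62): +4 identical, -1 otherwise."""
    return 4 if x == y else -1


def alignment_score(aln_a: str, aln_b: str, sub: Callable[[str, str], int],
                    gap_existence: int, gap_extension: int) -> int:
    """Score a gapped alignment; '-' marks a gap position."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    in_gap_a = in_gap_b = False  # whether a gap is currently open in each sequence
    for x, y in zip(aln_a, aln_b):
        if x == "-" and y == "-":
            raise ValueError("column with gaps in both sequences")
        if x == "-":
            # Opening a gap costs the existence penalty; each further position costs extension.
            score -= gap_extension if in_gap_a else gap_existence
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            score -= gap_extension if in_gap_b else gap_existence
            in_gap_a, in_gap_b = False, True
        else:
            score += sub(x, y)
            in_gap_a = in_gap_b = False
    return score
```

With `toy_sub`, `alignment_score("MKT--AY", "MKTLLAY", toy_sub, 11, 1)` returns 8: five identical pairs contribute 20, and the two-residue gap costs 11 + 1 = 12.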
  • fragments and variants of the polypeptides disclosed herein each comprise one or more conserved domains of the canonical polypeptide.
  • the variant or fragment can comprise a polypeptide comprising at least 40%, 50%, 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains in the canonical polypeptide sequence.
  • a variant or fragment of Glyma.20G092400 may comprise a selenocysteine lyase/Cysteine desulfurase (aa 50-437 of SEQ ID NO: 76); a Cysteine desulfurase (SufS)-like domain (aa 1-274 of SEQ ID NO: 76); an Aminotransferase class-V domain (aa 93-274 of SEQ ID NO: 76), and a Bifunctional selenocysteine lyase/cysteine desulfurase (aa 92-275 of SEQ ID NO: 76).
  • a variant or fragment of Glyma.20G092400 (SEQ ID NO: 76) can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.20G092400 (SEQ ID NO: 76).
  • a variant or fragment of Glyma.20G092000 can comprise a Phytepsin domain (aa 76-505 of SEQ ID NO: 79); a Eukaryotic aspartyl protease (ASP) domain (aa 84-506 of SEQ ID NO: 79); an aspartyl protease domain (aa 77-507 of SEQ ID NO: 79); and two Saposin (B) domains (aa 316-351 and aa 380-418 of SEQ ID NO: 79).
  • a variant or fragment of Glyma.20G092000 can retain functionality as an aspartic proteinase.
  • a variant or fragment of Glyma.20G092000 (SEQ ID NO: 79) can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.20G092000 (SEQ ID NO: 79).
  • a variant or fragment of Glyma.20G094900 can comprise one or more of the conserved domains of a DUF1336 superfamily protein.
  • the variant or fragment can comprise a protein enhanced disease resistance 2 (EDR2) C-terminal domain (aa 2-68 of SEQ ID NO: 82).
  • a variant or fragment of Glyma.20G094900 (SEQ ID NO: 82) can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.20G094900 (SEQ ID NO: 82).
  • a variant or a fragment of Glyma.20G094900 can retain activities similar to EDR2 in regulating pathogen resistance.
  • a variant or fragment of Glyma.20G092100 can comprise one or more of the tetratricopeptide-like (TPR) helical domains (aa 57-253, aa 229-365, and aa 404-461 of SEQ ID NO: 85) and/or one or more of the pentatricopeptide repeats (aa 403-429, aa 578-607, aa 438-461, aa 370-398, aa 647-675, aa 194-241, aa 264-313, aa 265-299, aa 540-574, aa 300-334, aa 435-469, aa 644-678, aa 403-434, aa 195-229, aa 230-264, aa 88-122, aa 158-194, aa 335-365, aa 470-504, …
  • a variant or fragment of Glyma.20G092100 can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.20G092100 (SEQ ID NO: 85).
  • A variant or fragment of Glyma.20G092100 can retain activities similar to TPR in mediating protein-protein interactions and the assembly of multiple protein complexes.
  • a variant or fragment of Glyma.06G303700 may comprise one or more of the conserved domains of the START_ArGLABRA2_like domain (aa 241-465 of SEQ ID NOs: 3 and 5); the START domain (aa 246-466 of SEQ ID NOs: 3 and 5); the homeobox domain (aa 57-110 of SEQ ID NOs: 3 and 5); the homeodomain (aa 55-113 of SEQ ID NOs: 3 and 5); the COG5576 superfamily domain (aa 13-129 of SEQ ID NOs: 3 and 5); and/or the MreC superfamily domain.
  • A variant or fragment of Glyma.06G303700 can retain activity as a transcription factor.
  • a variant or fragment of Glyma.03G040200 can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.03G040200 (SEQ ID NO: 12).
  • A variant or fragment of Glyma.03G040200 can retain activity in transmembrane transport.
  • a variant or fragment of Glyma.03G036300 can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.03G036300 (SEQ ID NO: 8).
  • A variant or fragment of Glyma.03G036300 (SEQ ID NO: 8) can retain activity as a PIF1 helicase.
  • a variant or fragment of Glyma.06g297500 can comprise a polypeptide comprising at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.06g297500 (SEQ ID NO: 15).
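The conserved domains above are given as 1-based, inclusive residue ranges (for example, aa 50-437 of SEQ ID NO: 76). The Python sketch below shows how such a range maps onto a sequence string and how identity over one domain could be checked against a threshold. For simplicity it assumes the variant has already been aligned to the reference without gaps, so the same coordinates apply to both; a real comparison would first align the sequences. The sequences and function names are hypothetical.

```python
def domain_slice(seq: str, start: int, end: int) -> str:
    """Return residues start..end of seq using 1-based, inclusive coordinates."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("domain coordinates fall outside the sequence")
    # Python slices are 0-based and end-exclusive, hence start - 1 and end.
    return seq[start - 1:end]


def ungapped_identity(ref: str, var: str) -> float:
    """Percent identical positions between two equal-length, gap-free sequences."""
    if not ref or len(ref) != len(var):
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(r == v for r, v in zip(ref, var)) / len(ref)


def domain_meets_threshold(ref: str, var: str, start: int, end: int,
                           threshold: float) -> bool:
    """True if the variant's identity over the reference domain is >= threshold (%)."""
    return ungapped_identity(domain_slice(ref, start, end),
                             domain_slice(var, start, end)) >= threshold
```

For the hypothetical pair `"ACDEFGHIKL"` and `"ACDEFGHIKV"`, residues 1-5 are 100% identical, while residues 6-10 (`GHIKL` vs `GHIKV`) are 80% identical.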
  • fragments and variants of the polypeptides disclosed herein will retain the activity of conferring increased protein content, increased oil content, and/or modified oil profile to a plant expressing the polypeptide.
  • increase in protein content and/or oil content can comprise any statistically significant increase, including, for example, an increase of about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 85%, 90%, 95% or greater relative to a control. Methods of determining protein content or oil content are further described below.
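The relative increase described above is computed against the control's mean. The Python sketch below shows that arithmetic for hypothetical seed protein measurements (percent of dry weight); the numbers and the function name are illustrative assumptions. It computes only the size of the change: whether an increase is statistically significant has to be established separately with replication and an appropriate test.

```python
from statistics import mean


def percent_increase(treated: list[float], control: list[float]) -> float:
    """Percent change of the treated mean relative to the control mean."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * (mean(treated) - control_mean) / control_mean
```

With a hypothetical control of `[40.0, 41.0, 39.0]` (mean 40.0) and treated plants at `[44.0, 43.5, 44.5]` (mean 44.0), the increase is 10%.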
  • the polypeptides disclosed herein may comprise a heterologous amino acid sequence attached thereto.
  • a polypeptide may have a polypeptide tag or additional protein domain attached thereto.
  • the heterologous amino acid sequence can be attached to the N terminus, the C terminus, or internally within the polypeptide.
  • the polypeptide may have one or more polypeptide tags and/or additional protein domains attached thereto at one or more positions of the polypeptide.
  • the nucleic acid sequence encoding the polypeptides disclosed herein may comprise a heterologous nucleic acid sequence attached thereto.
  • the heterologous nucleic acid sequence may encode a polypeptide tag or additional protein domain that will be attached to the encoded polypeptide.
  • the heterologous nucleic acid sequence may encode a regulatory element such as an intron, an enhancer, a promoter, a terminator, etc.
  • the heterologous nucleic acid sequence can be positioned at the 5' end, the 3' end, or in-frame within the coding sequence of the polypeptide.
  • the nucleic acid sequence encoding the polypeptides disclosed herein may have one or more heterologous nucleic acid sequences attached thereto at one or more positions of the nucleic acid sequence.
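When a heterologous sequence encoding a tag or domain is placed within a coding sequence, the downstream reading frame is preserved only if the insert length is a multiple of three and the insertion falls on a codon boundary. The Python sketch below checks those conditions. It assumes the coding sequence begins with its start codon and ends with its stop codon, keeps the insert between them, and rejects inserts containing an in-frame stop codon (standard genetic code). The function name and sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def insert_in_frame(cds: str, insert: str, position: int) -> str:
    """Insert `insert` into `cds` at 0-based `position` without shifting the frame.

    Assumes `cds` begins with a start codon and ends with a stop codon; the
    insertion must fall on a codon boundary between those two codons.
    """
    if position % 3 != 0 or not 3 <= position <= len(cds) - 3:
        raise ValueError("insertion point must be a codon boundary "
                         "after the start codon and before the stop codon")
    if len(insert) % 3 != 0:
        raise ValueError("insert length must be a multiple of 3 "
                         "to preserve the reading frame")
    codons = (insert[i:i + 3].upper() for i in range(0, len(insert), 3))
    if any(codon in STOP_CODONS for codon in codons):
        raise ValueError("insert contains an in-frame stop codon")
    return cds[:position] + insert + cds[position:]
```

For example, `insert_in_frame("ATGAAACCCTAA", "GGTGGC", 3)` adds two glycine codons directly after the start codon (an N-terminal addition), and using position 9 places them just before the stop codon (a C-terminal addition).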
  • “heterologous” or “recombinant,” in reference to a polypeptide or polynucleotide sequence, is a sequence that originates, for example, from a cell or an organism with another genetic background of the same species or from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. As such, heterologous sequences are in a configuration not found in nature.
  • a “native” polynucleotide or polypeptide comprises a naturally occurring nucleotide sequence or amino acid sequence, respectively.
  • “heterologous” or “recombinant,” when used in reference to a polynucleotide, refers to a polynucleotide encoding a factor that is not in its natural environment (i.e., has been altered by the hand of man).
  • a heterologous gene may include a polynucleotide from one species introduced into another species.
  • a heterologous polynucleotide may also include a polynucleotide native to an organism that has been altered in some way (e.g., mutated, added in multiple copies, linked to a non-native promoter or enhancer polynucleotide, etc.).
  • Heterologous genes further may comprise plant polynucleotides that comprise cDNA forms of a plant gene; the cDNAs may be expressed in either a sense (to produce mRNA) or antisense orientation (to produce an antisense RNA transcript that is complementary to the mRNA transcript).
  • heterologous polynucleotides are distinguished from endogenous plant genes in that the heterologous polynucleotides are joined to polynucleotides comprising regulatory elements, such as promoters, that are not found naturally associated with the polynucleotide for the protein encoded by the heterologous polynucleotide or with the plant polynucleotide in the chromosome, or are associated with portions of the chromosome not found in nature (e.g., polynucleotides expressed in loci where the polynucleotide is not normally expressed).
  • a “heterologous” or “recombinant” polynucleotide is a polynucleotide not naturally associated with a host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring polynucleotide.
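In the antisense orientation mentioned above, the cDNA is placed so that the transcribed strand is the reverse complement of the sense sequence, which yields an RNA complementary to the mRNA. The Python sketch below computes that reverse complement for a DNA string. The function name is a hypothetical helper, and characters other than A, C, G and T (such as N) are passed through unchanged rather than validated.

```python
# Watson-Crick complements for DNA, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense(sense_dna: str) -> str:
    """Return the reverse complement of a sense-strand DNA sequence (5'->3')."""
    return sense_dna.translate(_COMPLEMENT)[::-1]
```

For example, `antisense("ATGGCC")` returns `"GGCCAT"`, and applying the function twice returns the original sequence.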
  • Polynucleotides encoding the polypeptides provided herein can be provided in expression cassettes for expression in an organism of interest.
  • the cassette will include 5' and 3' regulatory sequences operably linked to a polynucleotide encoding a polypeptide provided herein that allows for expression of the polynucleotide.
  • the cassette may additionally contain at least one additional gene or genetic element to be co-transformed into the organism. Where additional genes or elements are included, the components are operably linked. Alternatively, the additional gene(s) or element(s) can be provided on multiple expression cassettes.
  • Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory elements or regions.
  • the expression cassette may additionally contain a selectable marker gene.
  • the expression cassette will include, in the 5'-3' direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., a termination region) functional in the organism of interest, i.e., a plant or a bacterium.
  • the promoters of the invention are capable of directing or driving transcription and expression of a coding sequence in a host cell.
  • the regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions)
  • a chimeric gene or a chimeric nucleic acid molecule comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
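The 5'-3' arrangement of an expression cassette, together with the notion of a chimeric gene whose initiation region is heterologous to its coding sequence, can be modelled with a small data structure. In the Python sketch below, the class, its field names, and the `*_source` labels are hypothetical, and the sequences are placeholders rather than real promoters, coding sequences, or terminators. Comparing source labels is a simplification of what "heterologous" means in this disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpressionCassette:
    promoter: str          # transcriptional and translational initiation region
    coding_sequence: str   # polynucleotide to be expressed
    terminator: str        # transcriptional and translational termination region
    promoter_source: str   # hypothetical label for the promoter's origin
    cds_source: str        # hypothetical label for the coding sequence's origin

    def assemble(self) -> str:
        """Join the parts in the 5'-3' direction of transcription."""
        return self.promoter + self.coding_sequence + self.terminator

    def is_chimeric(self) -> bool:
        """Treat the gene as chimeric when promoter and coding sequence differ in origin."""
        return self.promoter_source != self.cds_source
```

For instance, `ExpressionCassette("GGGG", "ATGAAATAA", "CCCC", "synthetic", "Glycine max").assemble()` gives `"GGGGATGAAATAACCCC"`, and the cassette counts as chimeric because the two source labels differ.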
  • transcriptional terminators are available for use in expression cassettes. These are responsible for the termination of transcription beyond the transgene and correct mRNA polyadenylation.
  • the termination region may be native with the transcriptional initiation region, may be native to the operably linked DNA sequence of interest, may be native to the plant host, or may be derived from another source (i.e., foreign or heterologous to the promoter, the DNA sequence of interest, the plant host, or any combination thereof).
  • Appropriate transcriptional terminators are those that are known to function in plants and include the CAMV pSOYl terminator, the tml terminator, the nopaline synthase terminator and the pea rbcs E9 terminator.
  • Termination regions used in the expression cassettes can be obtained from, e.g., the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262:141-144; Proudfoot (1991) Cell 64:671-674; Sanfacon et al. (1991) Genes Dev. 5:141-149; Mogen et al. (1990) Plant Cell 2:1261-1272; Munroe et al.
  • Additional regulatory signals include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, termination signals, and the like. See, for example, U.S. Pat. Nos. 5,039,523 and 4,853,331; EPO 0480762A2; Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), hereinafter “Sambrook II”; Davis et al., eds. (1980).
  • the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
  • adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like.
  • in vitro mutagenesis, primer repair, restriction, annealing, and resubstitutions (e.g., transitions and transversions) may be involved.
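The transitions and transversions mentioned above are the two classes of single-base substitution: a transition exchanges one purine for the other (A and G) or one pyrimidine for the other (C and T), while a transversion exchanges a purine for a pyrimidine or vice versa. The Python sketch below classifies a substitution; the function name is a hypothetical helper.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single DNA base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError("bases must be A, C, G or T")
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    # Same chemical class (both purines or both pyrimidines) means a transition.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```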
  • a number of promoters can be used in the practice of the invention.
  • the promoters can be selected based on the desired outcome.
  • the nucleic acids can be combined with constitutive, inducible, tissue-preferred, or other promoters for expression in the organism of interest.
  • the promoter used herein to drive the expression of the polynucleotides provided herein comprises an exogenous promoter.
  • an “exogenous promoter” refers to a promoter that is not found in plants in nature, for example, a synthetic promoter.
  • constitutive promoters can also be used.
  • constitutive promoters include the CaMV pSOY 1 promoter (Odell et al. (1985) Nature 313:810-812); rice actin (McElroy et al. (1990) Plant Cell 2:163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12:619-632 and Christensen et al. (1992) Plant Mol. Biol. 18:675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81:581-588); MAS (Velten et al.
  • Inducible promoters include those that drive expression of pathogenesis-related proteins (PR proteins), which are induced following infection by a pathogen.
  • Promoters that are expressed locally at or near the site of pathogen infection may also be used (Marineau et al. (1987) Plant Mol. Biol. 9:335-342; Matton et al.
  • Wound-inducible promoters may be used in methods and compositions in this disclosure.
  • Such wound-inducible promoters include the pin II promoter (Ryan (1990) Ann. Rev. Phytopath. 28:425-449; Duan et al. (1996) Nature Biotechnology 14:494-498); wun1 and wun2 (U.S. Patent No. 5,428,148); win1 and win2 (Stanford et al. (1989) Mol. Gen. Genet. 215:200-208); systemin (McGurl et al. (1992) Science 225:1570-1573); WIP1 (Rohmeier et al. (1993) Plant Mol. Biol.
  • Tissue-preferred promoters for use in the invention include those set forth in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al. (1997) Plant Cell Physiol.
  • Leaf-preferred promoters include those set forth in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kwon et al. (1994) Plant Physiol. 105:357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Gotor et al. (1993) Plant J. 3:509-18; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590. Root-preferred promoters are known and include those in Hire et al. (1992) Plant Mol.
  • seed-preferred promoters include both “seed-specific” promoters (those promoters active during seed development, such as promoters of seed storage proteins) and “seed-germinating” promoters (those promoters active during seed germination). See Thompson et al. (1989) BioEssays 10:108. Seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message); cZ19B1 (maize 19 kDa zein); and milps (myo-inositol-1-phosphate synthase) (see WO 00/11177 and U.S. Patent No. 6,225,529).
  • Gamma-zein is an endosperm-specific promoter.
  • Globulin 1 (Glb-1) is a representative embryo-specific promoter.
  • seed-specific promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like.
  • seed-specific promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa zein, gamma-zein, waxy, shrunken 1, shrunken 2, Globulin 1, etc. See also WO 00/12733, where seed-preferred promoters from end1 and end2 genes are disclosed.
  • the polynucleotides or variants thereof provided herein are not expressed using a root-specific promoter. In further embodiments, the polynucleotides or variants thereof provided herein are not expressed with the RCc3 root-specific promoter. (See US 20130139280).
  • promoters that function in bacteria are well-known in the art.
  • Such promoters include any of the known crystal protein gene promoters, including the promoters of any of the proteins of the invention, and promoters specific for B. thuringiensis sigma factors.
  • mutagenized, or recombinant crystal protein-encoding gene promoters may be recombinantly engineered and used to promote the expression of the novel gene segments disclosed herein.
  • a number of non-translated leader sequences derived from viruses are also known to enhance expression, and these are particularly effective in dicotyledonous cells.
  • the expression cassette may comprise one or more of such leader sequences.
  • such leader sequences include those from tobacco mosaic virus (TMV, the “Ω-sequence”), maize chlorotic mottle virus (MCMV), and alfalfa mosaic virus (AMV).
  • Other leader sequences known in the art include but are not limited to: picornavirus leaders, for example, EMCV leader (encephalomyocarditis 5' noncoding region) (Elroy-Stein, O., Fuerst, T. R., and Moss, B.
  • potyvirus leaders, for example, tobacco etch virus (TEV) leader (Allison et al. (1986) Virology 154:9-20); maize dwarf mosaic virus (MDMV) leader; human immunoglobulin heavy-chain binding protein (BiP) leader (Macejak, D. G., and Sarnow, P., Nature 353:90-94 (1991)); untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling, S. A., and Gehrke, L., Nature 325:622-625 (1987)); tobacco mosaic virus leader (TMV) (Gallie, D. R.
  • the expression cassette can also comprise a selectable marker gene for the selection of transformed cells.
  • Selectable marker genes are utilized for the selection of transformed cells or tissues.
  • Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin, 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), spectinomycin, or acetolactate synthase (ALS).
  • Selection markers used routinely in transformation include the nptII gene, which confers resistance to kanamycin and related antibiotics (Messing & Vierra, Gene 1:259-268 (1982); Bevan et al., Nature 304:184-187 (1983)), the pat and bar genes, which confer resistance to the herbicide glufosinate (also called phosphinothricin; see White et al., Nucl. Acids Res. 18:1062 (1990), Spencer et al., Theor. Appl. Genet. 79:625-631 (1990) and U.S. Patent Nos.
  • the promoter used herein to drive the expression of the polynucleotides provided herein comprises a native promoter or an active variant or fragment thereof.
  • the term “native promoter,” used interchangeably with the term “endogenous promoter,” refers to a promoter that is found in plants in nature.
  • An active variant or fragment of a native promoter refers to a promoter sequence that has one or more nucleotide substitutions, deletions, or insertions and that can drive the expression of an operably linked polynucleotide sequence under conditions similar to those under which the native promoter is active.
  • the native promoter comprises a polynucleotide having the sequence of SEQ ID NO: 58.
  • a construct comprising a native promoter or its active variant or fragment operably linked to a polynucleotide having the sequence of any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137, or a fragment or variant of any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137 (e.g., having at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity); when introduced into a plant, the construct confers increased protein content, increased oil content, and/or modified oil profile.
  • a construct comprising a native promoter or an active variant or fragment thereof operably linked to a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant (e.g., having at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity) of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59; when introduced into a plant, the construct confers increased protein content, increased oil content, and/or modified oil profile.
  • the native promoter is a heterologous promoter to the polynucleotide.
  • the polynucleotide encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139.
  • the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to SEQ ID NO: 75, 78, 81, 84, 87, 111, 114, 117, 120, 123, 126, 129, 132, 135 or 138.
  • the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to a polynucleotide having a sequence of any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137.
  • the polynucleotide encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to a polynucleotide encoding any one of SEQ ID NOs: 22 or 24-59.
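Counting mutations relative to a reference sequence, as in the "at least one ... at least six mutations" language above, is straightforward when the two sequences have the same length and differ only by substitutions. The Python sketch below covers only that case; insertions and deletions would require aligning the sequences first (for example, with a global aligner such as needle). The function names and sequences are hypothetical.

```python
def count_substitutions(reference: str, variant: str) -> int:
    """Number of positions at which two equal-length sequences differ (case-insensitive)."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only; lengths must match")
    return sum(r != v for r, v in zip(reference.upper(), variant.upper()))


def has_at_least_mutations(reference: str, variant: str, n: int) -> bool:
    """True if the variant carries at least n substitutions relative to the reference."""
    return count_substitutions(reference, variant) >= n
```

For the hypothetical pair `"ATGAAACCC"` and `"ATGAGACCT"`, two positions differ, so the variant has "at least two" but not "at least three" mutations.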
  • the plant is a dicot plant. In some embodiments, the plant is a monocot plant. In some embodiments, the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane. In some embodiments, the plant is a soybean plant. In some embodiments, the plant is an elite soybean plant.
  • Also provided herein is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter comprising SEQ ID NO: 93 or an active variant or fragment thereof, where the nucleic acid sequence encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to at least one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • the nucleic acid sequence encodes a polypeptide having an amino acid sequence set forth in SEQ ID NO: 75, 78, 81, 84, 87, 111, 114, 117, 120, 123, 126, 129, 132, 135, 138, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • the polynucleotide as described in Section I of this disclosure is a heterologous nucleic acid sequence in the genome of the plant.
  • “heterologous,” in the context of a chromosomal segment, refers to one or more DNA sequences (e.g., genetic loci) in a configuration in which they are not found in nature, for example as a result of a recombination event between homologous chromosomes during meiosis, as a result of the introduction of a transgenic sequence, or as a result of modification through gene editing.
  • while soybean plants are used to exemplify the compositions and methods throughout the application, a polynucleotide as provided herein may be introduced into any plant species, including, but not limited to, monocots and dicots.
  • plants of interest include but are not limited to, corn (maize), sorghum, wheat, sunflower, tomato, crucifers, peppers, potato, cotton, rice, soybean, sugarbeet, sugarcane, tobacco, barley, and oilseed rape, Brassica sp., alfalfa, rye, millet, safflower, peanuts, sweetpotato, cassava, coffee, coconut, pineapple, citrus trees, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oats, vegetables, ornamentals, and conifers.
  • Glycine is a genus in the bean family Fabaceae.
  • the Glycine plants can be Glycine arenaria, Glycine argyrea, Glycine cyrtoloba, Glycine canescens, Glycine clandestina, Glycine curvata, Glycine falcata, Glycine latifolia, Glycine microphylla, Glycine pescadrensis, Glycine stenophita, Glycine syndetica, Glycine soja Sieb. et Zucc., Glycine max (L.) Merrill, Glycine tabacina, or Glycine tomentella.
  • the plants provided herein are elite plants or derived from an elite line.
  • an “elite line” is an agronomically superior line that has resulted from many cycles of breeding and selection for superior agronomic performance. Numerous elite lines are available and known to those of skill in the art of soybean breeding. An “elite population,” is an assortment of elite individuals or lines that can be used to represent the state of the art in terms of agronomically superior genotypes of a given crop species, such as soybean. Similarly, an “elite germplasm” or elite strain of germplasm is an agronomically superior germplasm, typically derived from, and/or can give rise to, a plant with superior agronomic performance, such as an existing or newly developed elite line of soybean.
  • An “elite” plant is any plant from an elite line, such that an elite plant is a representative plant from an elite variety.
  • the soybean plant comprising a polynucleotide encoding any one of the polypeptides disclosed herein is an elite soybean plant.
  • Non-limiting examples of elite soybean varieties that are commercially available to farmers or soybean breeders include: AG00802, A0868, AG0902, A1923, AG2403, A2824, A3704, A4324, A5404, AG5903, AG6202, AG0934, AG1435, AG2031, AG2035, AG2433, AG2733, AG2933, AG3334, AG3832, AG4135, AG4632, AG4934, AG5831, AG6534, and AG7231 (Asgrow Seeds, Des Moines, Iowa, USA); BPR0144RR, BPR 4077NRR and BPR 4390NRR (Bio Plant Research, Camp Point, Ill., USA); DKB 17-51 and DKB37-51 (DeKalb Genetics, DeKalb, Ill., USA); DP 4546 RR, and DP 7870 RR (Delta & Pine Land Company, Lubbock, Tex., USA); JG 03R501, IG 32R606C ADD and
  • the plants provided herein can comprise one or more additional polynucleotides that encode an additional polypeptide that can confer a phenotype of increased protein content, increased oil content, or modified oil profile on a plant.
  • the additional polynucleotide encodes a polypeptide having the sequence of any one of SEQ ID NOs: 3, 5, 6, 8, 9, 11, 12, 15, 18, 19, 22, or 24-59.
  • the additional polynucleotide can be introduced using similar approaches as disclosed above, e.g., by transgenic means, by breeding, or by genome editing.
  • the plants, plant parts, or seeds having the heterologous polynucleotide or polypeptide disclosed herein or active variants and fragments thereof can have a modified level of expression of the polynucleotide or polypeptide (i.e., an increase or a decrease in expression level).
  • the plants, plant parts or seeds having the heterologous polynucleotide or polypeptide disclosed herein or active variants and fragments thereof can have a modified level of activity of the polypeptide (i.e., an increase or a decrease in activity level).
  • Methods to generate such modified levels of expression or activity are disclosed elsewhere herein and include, but are not limited to, breeding, gene editing, and transgenic techniques.
  • plants produced as described above can be propagated to produce progeny plants, and progeny plants that have stably incorporated into their genome a polynucleotide conferring increased protein content, increased oil content, and/or modified oil profile can be selected and further propagated if desired.
  • progeny refers to the descendant(s) of a particular cross. Typically, progeny plants result from the breeding of two individuals, although some species (particularly some plants and hermaphroditic animals) can be selfed (i.e., the same plant acts as the donor of both male and female gametes). The descendant(s) can be, for example, of the F1, the F2, or any subsequent generation.
  • a plant cell, seed, or plant part or harvest product can be obtained from the plant produced as above, and the plant cell, seed, or plant part can be screened using methods disclosed above for the evidence of stable incorporation of the polynucleotide.
  • stable incorporation refers to the integration of a nucleic acid sequence into the genome of a plant and said nucleic acid sequence is capable of being inherited by the progeny thereof.
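Because a stably incorporated polynucleotide is inherited, its distribution among selfed progeny can be predicted. The sketch below gives the expected zygosity classes after selfing a plant that is hemizygous for the polynucleotide. It assumes a single-locus insertion, Mendelian segregation, and no selection against any genotype. The allele labels ("T", "-") and the function name are hypothetical and are not taken from this disclosure.

```python
from fractions import Fraction
from itertools import product


def selfed_progeny_classes(parent=("T", "-")):
    """Expected zygosity classes among selfed progeny of one parent plant.

    "T" marks the inserted polynucleotide and "-" the empty allele at the same
    locus (hypothetical labels). Assumes a single-locus insertion, Mendelian
    segregation, and no selection against any genotype.
    """
    names = {2: "homozygous", 1: "hemizygous", 0: "null"}
    classes = {name: Fraction(0) for name in names.values()}
    # Selfing: the same plant supplies both gametes. Each of its two alleles is
    # transmitted with probability 1/2, so each egg/sperm pairing has probability 1/4.
    for egg, sperm in product(parent, repeat=2):
        classes[names[(egg, sperm).count("T")]] += Fraction(1, 4)
    return classes


classes = selfed_progeny_classes()
carriers = classes["homozygous"] + classes["hemizygous"]
print(classes["homozygous"], classes["hemizygous"], classes["null"])  # 1/4 1/2 1/4
print(carriers)  # 3/4
```

Under these assumptions, about three quarters of the selfed progeny carry the polynucleotide. Only the homozygous quarter stops segregating null plants when it is selfed again, which is one reason to continue screening over more than one generation.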
  • plant part indicates a part of a plant, including single cells and cell tissues such as plant cells that are intact in plants, cell clumps and tissue cultures from which plants can be regenerated.
  • plant parts include, but are not limited to, single cells and tissues from pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, and seeds; as well as pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, calli, and the like.
  • plant products can be harvested from the plant disclosed above and processed to produce processed products, such as flour, soy meal, oil, starch, and the like. These processed products are also within the scope of this invention provided that they comprise a polynucleotide or polypeptide or variant thereof disclosed herein.
  • processed products include but are not limited to protein concentrate, protein isolate, soybean hulls, meal, flour, oil, and the whole soybean itself.
  • a nucleic acid sequence may be introduced into a plant cell in various ways, for example, by transformation, by genome modification techniques (such as genome editing), or by breeding.
  • the plant can be produced by transforming the nucleic acid sequence encoding a polypeptide disclosed above into a recipient plant.
  • the method can comprise editing the genome of the recipient plant so that the resulting plant comprises a polynucleotide encoding a polypeptide disclosed above.
  • the method can comprise increasing the expression level and/or activity of the above-mentioned proteins in a recipient plant, for example, by enhancing promoter activity or replacing the endogenous promoter with a stronger promoter.
  • the method can comprise breeding a donor plant comprising a polynucleotide as described above with a recipient plant and selecting for incorporation of the polynucleotide into the recipient plant genome.
  • the method comprises transforming a polynucleotide disclosed herein or an active variant or fragment thereof into a recipient plant to obtain a transgenic plant, and said transgenic plant has increased protein content, increased oil content, and/or modified oil profile.
  • Expression cassettes comprising polynucleotides encoding the polypeptides as described above can be used to transform plants of interest.
  • the term “transgenic” and grammatical variations thereof refer to a plant, including any part derived from the plant, such as a cell, tissue, or organ, in which a heterologous nucleic acid is integrated into the genome.
  • the heterologous nucleic acid is a recombinant construct, vector, or expression cassette comprising one or more nucleic acids.
  • a transgenic plant is produced by a genetic engineering method, such as Agrobacterium transformation. Through gene technology, the heterologous nucleic acid is stably integrated into chromosomes, so that the next generation can also be transgenic.
  • “transgenic” and grammatical variations thereof also encompass biological treatments, which include plant hybridization and/or natural recombination.
  • Transformation results in a transformed plant, including whole plants, as well as plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, propagules, embryos, and progeny of the same.
  • Plant cells can be differentiated or undifferentiated (e.g., callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells, pollen).
  • Transformation may result in stable or transient incorporation of the nucleic acid into the cell.
  • Stable transformation is intended to mean that the nucleotide construct introduced into a host cell integrates into the genome of the host cell and is capable of being inherited by the progeny thereof.
  • Transient transformation is intended to mean that a polynucleotide is introduced into the host cell and does not integrate into the genome of the host cell.
  • Methods for transformation typically involve introducing a nucleotide construct into a plant.
  • the transformation method is an Agrobacterium-mediated transformation
  • the transformation method is a biolistic-mediated transformation. Transformation may also be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound-mediated transformation, PEG-mediated transformation, calcium phosphate co-precipitation, the polycation DMSO technique, the DEAE dextran procedure, Agrobacterium- and viral-mediated transformation (e.g., Caulimoviruses, Geminiviruses, RNA plant viruses), liposome-mediated transformation, and the like.
  • Transformation protocols as well as protocols for introducing polypeptides or polynucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation.
  • Methods for transformation are known in the art and include those set forth in U.S. Patent Nos. 8,575,425; 7,692,068; 8,802,934; and 7,541,517; each of which is herein incorporated by reference. See also Rakoczy-Trojanowska, M. (2002) Cell Mol Biol Lett. 7:849-858; Jones et al. (2005) Plant Methods, Vol. 1, Article 5; Rivera et al. (2012) Physics of Life Reviews 9:308-345; Bartlett et al.
  • the method relies on particle gun delivery of DNA containing a selectable marker and targeting of the DNA to the plastid genome through homologous recombination. Additionally, plastid transformation can be accomplished by transactivation of a silent plastid-borne transgene by tissue-preferred expression of a nuclear-encoded and plastid-directed RNA polymerase. Such a system has been reported in McBride et al. (1994) Proc. Natl. Acad. Sci. USA 91(15):7301-7305.
  • the cells that have been transformed may be grown into plants in accordance with conventional ways. See, for example, McCormick et al. (1986) Plant Cell Reports 5:81-84. These plants may then be grown, and either pollinated with the same transformed strain or different strains, and the resulting hybrid having constitutive expression of the desired phenotypic characteristic identified. Two or more generations may be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited and then seeds harvested to ensure expression of the desired phenotypic characteristic has been achieved. In this manner, the present invention provides a transformed seed (also referred to as "transgenic seed") having a nucleotide construct of the invention, for example, an expression cassette of the invention, stably incorporated into their genome.
  • the method comprises crossing a donor plant comprising a polynucleotide encoding a polypeptide disclosed herein with a recipient plant, and the polypeptide is able to confer increased protein content, increased oil content, and/or modified oil profile in the recipient plant.
  • “crossing” and “breeding” refer to the fusion of gametes to produce progeny (e.g., by fertilization, such as to produce seed by pollination in plants).
  • a “cross,” “breeding,” or “cross-fertilization” is fertilization of one individual by another (e.g., cross-pollination in plants).
  • the plant disclosed herein may be a whole plant, or may be a plant cell, seed, or tissue, or a plant part such as leaf, stem, pollen, or cell that can be cultivated into a whole plant.
  • a progeny plant created by the crossing or breeding process is repeatedly crossed back to one of its parents through a process referred to herein as “backcrossing”.
  • the “donor” parent refers to the parental plant with the desired gene or locus to be introgressed.
  • the “recipient” parent (used one or more times) or “recurrent” parent (used two or more times) refers to the parental plant into which the gene or locus is being introgressed. For example, see Ragot, M. et al. Marker-assisted Backcrossing: A Practical Example, in Techniques et Utilisations des Marqueurs Moléculaires, Les Colloques, Vol. 72, pp.
  • BC1 refers to the second use of the recurrent parent
  • BC2 refers to the third use of the recurrent parent
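The BC1/BC2 convention above corresponds to a simple expectation. The initial donor × recurrent cross (first use of the recurrent parent) yields an F1 carrying half of the recurrent-parent genome, and each later backcross halves the remaining donor share. The sketch below computes that expected value. It ignores linkage drag around the introgressed locus and any gain from marker-assisted background selection, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_recurrent_fraction(n_backcrosses: int) -> Fraction:
    """Expected share of the recurrent-parent genome after n backcrosses.

    n = 0 is the F1 (first use of the recurrent parent, 1/2); n = 1 is BC1
    (second use); n = 2 is BC2 (third use). Each backcross halves the remaining
    donor share. Linkage drag and marker-assisted selection are ignored.
    """
    if n_backcrosses < 0:
        raise ValueError("n_backcrosses must be non-negative")
    return 1 - Fraction(1, 2 ** (n_backcrosses + 1))


for n in range(4):
    label = "F1" if n == 0 else f"BC{n}"
    share = expected_recurrent_fraction(n)
    print(label, share, float(share))
```

This prints 1/2 for the F1, 3/4 for BC1, 7/8 for BC2, and 15/16 for BC3. These are population averages. Individual backcross progeny vary around them, which is why marker-assisted backcrossing can reach a given recovery level in fewer generations.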
  • the donor soybean plant is a Glycine max plant. In some embodiments, the donor soybean plant is a Glycine soja plant. In some embodiments, the recipient soybean plant is an elite Glycine max plant or an elite Glycine soja plant. In some embodiments, the donor plant is from soy variety Suinong 14 (SN14). In some embodiments, the donor plant is the soy variety Glycine soja ZYD0006.
  • the polynucleotide sequences provided herein can be targeted to specific sites within the genome of a recipient plant cell.
  • Such methods include, but are not limited to, meganucleases designed against the plant genomic sequence of interest; CRISPR-Cas9, TALENs, and other technologies for precise editing of genomes (Feng et al. Cell Research 23:1229-1232, 2013; WO 2013/026740); Cre-lox site-specific recombination; FLP-FRT recombination (Li et al. (2009) Plant Physiol 151:1087-1095); Bxb1-mediated integration (Yau et al.
  • gene editing is used to mutagenize the genome of a plant to produce plants having one or more of the polypeptides that is able to confer increased protein content, increased oil content, and/or modified oil profile.
  • plants transformed with and expressing gene-editing machinery as described above which, when crossed with a target plant, result in gene editing in the target plant.
  • Gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems. Gene editing may involve genomic integration or episomal presence of the gene editing components or systems.
  • Gene editing generally refers to the use of a site-directed nuclease (including but not limited to CRISPR/Cas, zinc fingers, meganucleases, and the like) to cut a nucleotide sequence at a desired location. The cut may be used to cause an insertion/deletion (“indel”) mutation (i.e., “SDN1”), a base edit (i.e., “SDN2”), or an allele insertion or replacement (i.e., “SDN3”).
  • SDN2 or SDN3 gene editing may comprise the provision of one or more recombination templates (e.g., in a vector) comprising a gene sequence of interest that can be used for homology-directed repair (HDR) within the plant (i.e., to be introduced into the plant genome).
  • the gene or allele of interest is one that is able to confer to the plant an improved trait, e.g., increased protein content, increased oil content, and/or modified oil profile.
  • the recombination template can be introduced into the plant to be edited either through transformation or through breeding with a donor plant comprising the recombination template. Breaks in the plant genome may be introduced within, upstream, and/or downstream of a target sequence.
  • a double strand DNA break is made within or near the target sequence locus.
  • breaks are made upstream and downstream of the target sequence locus, which may lead to its excision from the genome.
  • one or more single-strand DNA breaks are made within, upstream, and/or downstream of the target sequence (e.g., using a nickase Cas9 variant). Any of these DNA breaks, as well as those introduced via other methods known to one of skill in the art, may induce HDR.
  • the target sequence is replaced by the sequence of the provided recombination template, which comprises a polynucleotide of interest. For example, any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137, or a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 22, or 24-59, may be provided on or as a template.
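A recombination template for HDR is commonly a donor in which the replacement sequence is flanked by homology arms matching the genomic sequence on either side of the target. The sketch below assembles such a donor from a genomic string and shows the idealized edited locus. The function names, the 0-based end-exclusive coordinate convention, the default arm length, and the toy sequence are all hypothetical and are not taken from this disclosure. Practical template design, such as arm length or disrupting the nuclease site so the edited allele is not re-cut, depends on the editing system used.

```python
def build_hdr_template(genome: str, start: int, end: int, replacement: str,
                       arm_len: int = 50) -> str:
    """Return left homology arm + replacement + right homology arm.

    genome[start:end] is the target sequence to be replaced (0-based,
    end-exclusive). arm_len is an illustrative value, not one taken from
    this disclosure.
    """
    if not 0 <= start <= end <= len(genome):
        raise ValueError("target coordinates out of range")
    if start < arm_len or len(genome) - end < arm_len:
        raise ValueError("not enough flanking sequence for the requested arms")
    left_arm = genome[start - arm_len:start]
    right_arm = genome[end:end + arm_len]
    return left_arm + replacement.upper() + right_arm


def simulate_hdr(genome: str, start: int, end: int, replacement: str) -> str:
    """Idealized HDR outcome: the target span is swapped for the replacement."""
    return genome[:start] + replacement.upper() + genome[end:]


toy_genome = "A" * 10 + "GGGG" + "C" * 10  # hypothetical locus; "GGGG" is the target
donor = build_hdr_template(toy_genome, 10, 14, "tttttt", arm_len=5)
print(donor)  # AAAAATTTTTTCCCCC
print(simulate_hdr(toy_genome, 10, 14, "tttttt"))
```

The donor is a substring of the edited locus. That is the property HDR relies on: each arm pairs with the sequence on its side of the break, and the sequence between the arms is copied in.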
  • the polynucleotide of interest is operably linked to a promoter, and expression of the polynucleotide of interest under the control of the promoter confers increased protein content, increased oil content, and/or modified oil profile to the plant.
  • the promoter is a native promoter, or an active variant or fragment thereof as described above.
  • the native promoter comprises SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137.
  • mutations in the genes of interest described herein may be generated without the use of a recombination template via targeted introduction of DNA double strand breaks. Such breaks may be repaired through the process of non-homologous end joining (NHEJ), which can result in the generation of small insertions or deletions (indels) at the repair site. Such indels may lead to frameshift mutations causing premature stop codons or other types of loss-of-function mutations in the targeted genes.
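The frameshift mechanism described above can be illustrated with a toy open reading frame. The sketch models an NHEJ outcome as a small deletion and then scans the reading frame for the first in-frame stop codon (TAA, TAG, or TGA). The sequence is hypothetical and is not one of the genes of this disclosure. The function names are illustrative.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_in_frame_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def apply_indel(seq: str, pos: int, deleted: int = 0, inserted: str = "") -> str:
    """Model a repair outcome: delete `deleted` bases at `pos`, then insert `inserted`."""
    return seq[:pos] + inserted + seq[pos + deleted:]


wild_type = "ATGAAACTGACCGGCTTTTAA"  # hypothetical ORF: ATG AAA CTG ACC GGC TTT TAA
mutant = apply_indel(wild_type, pos=4, deleted=1)  # 1-bp deletion in codon 2
frameshift = (len(mutant) - len(wild_type)) % 3 != 0

print(first_in_frame_stop(wild_type))  # 6 (the native stop codon)
print(first_in_frame_stop(mutant))     # 2 (premature TGA created by the frameshift)
print(frameshift)                      # True
```

By contrast, a 3-bp deletion (for example, `apply_indel(wild_type, 3, deleted=3)`) removes one codon but keeps the frame. Such in-frame indels are less likely to cause loss of function.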
  • gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems in the target plant.
  • Gene editing may also involve genomic integration or episomal presence of the gene editing components or systems in the target plant.
  • the nucleic acid modification or mutation is effected by a (modified) zinc-finger nuclease (ZFN) system.
  • the ZFN system uses artificial restriction enzymes generated by fusing a zinc finger DNA-binding domain to a DNA-cleavage domain that can be engineered to target desired DNA sequences. Exemplary methods of genome editing using ZFNs can be found, for example, in U.S. Patent Nos. 6,534,261; 6,607,882; 6,746,838; 6,794,136; 6,824,978; 6,866,997; 6,933,113; and 6,979,539.
  • the nucleic acid modification is effected by a (modified) meganuclease. Meganucleases are endodeoxyribonucleases characterized by a large recognition site (double-stranded DNA sequences of 12 to 40 base pairs).
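A long recognition site matters because it makes chance matches elsewhere in the genome rare. The sketch below estimates the expected number of exact matches to one fixed site in a random genome. It uses a crude model in which bases are independent and equiprobable and palindromic sites are not de-duplicated across strands. Real genomes are not random, so these values are only orders of magnitude. The soybean genome size used here (about 1.1 × 10^9 bp) is an approximate figure assumed for illustration, and the function name is hypothetical.

```python
def expected_random_matches(site_len: int, genome_bp: float,
                            both_strands: bool = True) -> float:
    """Expected exact matches of one fixed recognition site in a random genome.

    Crude model: bases are independent and equiprobable (p = 1/4 each), and
    palindromic sites are not de-duplicated across strands.
    """
    strands = 2 if both_strands else 1
    return strands * genome_bp / 4 ** site_len


SOY_GENOME_BP = 1.1e9  # approximate soybean genome size; an assumption for illustration
for k in (12, 18, 24, 40):
    print(k, f"{expected_random_matches(k, SOY_GENOME_BP):.3g}")
```

Under this model, a 12-bp site is expected to occur on the order of a hundred times by chance. An 18-bp site is expected well under once, and sites toward the 40-bp end of the range are effectively unique.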
  • Exemplary methods for using meganucleases can be found in U.S. Patent Nos. 8,163,514; 8,133,697; 8,021,867; 8,119,361; 8,119,381; 8,124,369; and 8,129,134, which are specifically incorporated by reference.
  • the nucleic acid modification is effected by a (modified) CRISPR/Cas complex or system.
  • the CRISPR/Cas system or complex is a class 2 CRISPR/Cas system.
  • said CRISPR/Cas system or complex is a type II, type V, or type VI CRISPR/Cas system or complex.
  • the CRISPR/Cas system does not require the generation of customized proteins to target specific sequences; rather, a single Cas protein can be programmed by a guide RNA (gRNA) to recognize a specific nucleic acid target. In other words, the Cas enzyme can be recruited to a specific nucleic acid target locus of interest (which may comprise or consist of RNA and/or DNA) using said short guide RNA.
  • the CRISPR/Cas or CRISPR system, as used herein, refers collectively to transcripts and other elements involved in the expression of, or directing the activity of, CRISPR-associated (“Cas”) genes. These include sequences encoding a Cas gene and one or more of: a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA); a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system); a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system); or “RNA(s)” as that term is used herein (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and, where applicable, trans-activating (tracr) RNA, or single guide RNA(s)).
  • a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system).
  • target sequence refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex.
  • a target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
  • the gRNA is a chimeric guide RNA or single guide RNA (sgRNA).
  • the gRNA comprises a guide sequence and a tracr mate sequence (or direct repeat).
  • the gRNA comprises a guide sequence, a tracr mate sequence (or direct repeat), and a tracr sequence.
  • the CRISPR/Cas system or complex as described herein does not comprise and/or does not rely on the presence of a tracr sequence (e.g., if the Cas protein is Cas12a).
  • the Cas protein as referred to herein, such as but not limited to Cas9, Cas12a (formerly referred to as Cpf1), Cas12b (formerly referred to as C2c1), Cas13a (formerly referred to as C2c2), C2c3, and Cas13b, may originate from any suitable source, and hence may include different orthologues originating from a variety of (prokaryotic) organisms, as is well documented in the art.
  • the Cas protein is (modified) Cas9, preferably (modified) Staphylococcus aureus Cas9 (SaCas9) or (modified) Streptococcus pyogenes Cas9 (SpCas9).
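For SpCas9, a candidate target sequence (protospacer) is typically about 20 nt long and must be immediately followed, on the same strand, by a 5'-NGG-3' protospacer-adjacent motif (PAM). The guide sequence is designed to match that protospacer. The sketch below lists such candidate sites on both strands of a DNA string. It is a minimal illustration under these assumptions. It does not score off-target matches, GC content, or editing efficiency. Other nucleases recognize different PAMs (for example, NNGRRT for SaCas9, or TTTV located 5' of the protospacer for Cas12a). The function names and the toy sequence are hypothetical.

```python
import re
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str, str]]:
    """List (strand, start, protospacer, PAM) for every NGG PAM on either strand.

    `start` is the 0-based protospacer position on the strand read 5'->3'
    (for "-" hits, that is the reverse-complement string).
    """
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits


toy = "TTT" + "GACGTTACCGATCGATCGTA" + "AGG" + "TTT"  # hypothetical sequence
for hit in find_spcas9_sites(toy):
    print(hit)
```

The forward-strand hit `("+", 3, "GACGTTACCGATCGATCGTA", "AGG")` is the site built into the toy sequence. Any "-" hits come from CC dinucleotides on the top strand, which read as GG on the bottom strand.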
  • the Cas protein is Cas12a, optionally from Acidaminococcus sp., such as Acidaminococcus sp. BV3L6 Cpf1 (AsCas12a), or Lachnospiraceae bacterium Cas12a, such as Lachnospiraceae bacterium MA2020 or Lachnospiraceae bacterium ND2006 (LbCas12a). See U.S. Pat. No. 10,669,540, incorporated herein by reference in its entirety.
  • the Cas12a protein may be from Moraxella bovoculi AAX08_00205 [Mb2Cas12a] or Moraxella bovoculi AAX11_00205 [Mb3Cas12a]. See WO 2017/189308, incorporated herein by reference in its entirety.
  • the Cas protein is (modified) C2c2, preferably Leptotrichia wadei C2c2 (LwC2c2) or Listeria newyorkensis FSL M6-0635 C2c2 (LbFSLC2c2).
  • the (modified) Cas protein is C2c1.
  • the (modified) Cas protein is C2c3.
  • the (modified) Cas protein is Cas13b.
  • Other Cas enzymes are available to a person skilled in the art.
  • the gene-editing machinery (e.g., the DNA modifying enzyme) introduced into the plants can be controlled by any promoter that can drive recombinant gene expression in plants.
  • the promoter is a constitutive promoter.
  • the promoter is a tissue-specific promoter, e.g., a pollen-specific promoter, a sperm cell-specific promoter, a zygote-specific promoter, or a promoter that is highly expressed in sperm, eggs and zygotes (e.g., prOsActin1).
  • Suitable promoters are disclosed in U.S. Pat. No. 10,519,456, the entire content of which is herein incorporated by reference.
  • a method of editing plant genomic DNA comprises using a first soybean plant expressing a DNA modification enzyme and at least one optional guide nucleic acid as described above to pollinate a target plant comprising genomic DNA to be edited.
  • the various polynucleotides and variants thereof provided herein can be stacked with one or more polynucleotides encoding a desirable trait such as a polynucleotide that confers, for example, insect, disease or herbicide resistance or other desirable agronomic traits of interest including, but not limited to, traits associated with high oil content; increased digestibility; balanced amino acid content; and high energy content.
  • Such traits may refer to properties of both seed and non-seed plant tissues, or to food or feed prepared from plants or seeds having such traits.
  • gene or trait “stacking” is combining desired genes or traits into one transgenic plant line.
  • plant breeders stack transgenic traits by making crosses between parents that each have a desired trait and then identifying offspring that have both of these desired traits (so-called “breeding stacks”).
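For a breeding stack, the fraction of offspring that carry both traits depends on each parent's zygosity. The sketch below computes that fraction for a single cross. It assumes the traits sit at unlinked loci (independent assortment), segregate in a Mendelian fashion, and are expressed whenever at least one trait allele is present. The genotype encoding ("+" for the trait allele) and the function names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple

Genotype = Dict[str, Tuple[str, str]]  # locus -> two alleles; "+" marks the trait allele


def p_carries(locus1: Tuple[str, str], locus2: Tuple[str, str]) -> Fraction:
    """Probability that one progeny receives at least one "+" allele at a locus."""
    p_none_1 = Fraction(locus1.count("-"), 2)  # parent 1 transmits "-"
    p_none_2 = Fraction(locus2.count("-"), 2)  # parent 2 transmits "-"
    return 1 - p_none_1 * p_none_2


def p_breeding_stack(parent1: Genotype, parent2: Genotype) -> Fraction:
    """Probability that one progeny carries every trait present in either parent.

    Assumes unlinked loci (independent assortment) and Mendelian segregation.
    """
    empty = ("-", "-")
    prob = Fraction(1)
    for locus in set(parent1) | set(parent2):
        prob *= p_carries(parent1.get(locus, empty), parent2.get(locus, empty))
    return prob


# Hypothetical example: each parent is hemizygous for a different trait.
parent_a = {"trait_A": ("+", "-")}
parent_b = {"trait_B": ("+", "-")}
print(p_breeding_stack(parent_a, parent_b))  # 1/4
```

If both parents are homozygous for their respective traits, every progeny carries both traits, and the probability is 1. Tightly linked loci would violate the independence assumption and change these numbers.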
  • Another way to stack genes is by transferring two or more genes into the cell nucleus of a plant at the same time during transformation.
  • Another way to stack genes is by re-transforming a transgenic plant with another gene of interest.
  • gene stacking can be used to combine two different insect resistance traits, an insect resistance trait and a disease resistance trait, or an herbicide resistance trait (such as, for example, Bt11).
  • the use of a selectable marker in addition to a gene of interest would also be considered gene stacking.
  • a nucleic acid molecule or vector of the disclosure can include an additional coding sequence for one or more polypeptides or double stranded RNA molecules (dsRNA) of interest for agronomic traits that primarily are of benefit to a seed company, grower or grain processor.
  • a polypeptide of interest can be any polypeptide encoded by a nucleotide sequence of interest.
  • Non-limiting examples of polypeptides of interest that are suitable for production in plants include those resulting in agronomically important traits such as herbicide resistance (also sometimes referred to as “herbicide tolerance”), virus resistance, bacterial pathogen resistance, insect resistance, nematode resistance, or fungal resistance. See, e.g., U.S.
  • the polypeptide also can be one that increases plant vigor or yield (including traits that allow a plant to grow at different temperatures, soil conditions and levels of sunlight and precipitation), or one that allows identification of a plant exhibiting a trait of interest (e.g., a selectable marker, seed coat color, relative maturity group, etc.).
  • Polynucleotides conferring resistance/tolerance to an herbicide that inhibits the growing point or meristem can also be suitable in some embodiments.
  • Exemplary polynucleotides in this category code for mutant ALS and AHAS enzymes as described, e.g., in U.S. Patent Nos. 5,767,366 and 5,928,937.
  • U.S. Patent Nos. 4,761,373 and 5,013,659 are directed to plants resistant to various imidazolinone or sulfonamide herbicides.
  • U.S. Patent No. 4,975,374 relates to plant cells and plants containing a nucleic acid encoding a mutant glutamine synthetase (GS) resistant to inhibition by herbicides that are known to inhibit GS, e.g., phosphinothricin and methionine sulfoximine.
  • U.S. Patent No. 5,162,602 discloses plants resistant to inhibition by cyclohexanedione and aryloxyphenoxypropanoic acid herbicides. The resistance is conferred by an altered acetyl coenzyme A carboxylase (ACCase).
  • Polypeptides encoded by nucleotide sequences conferring resistance to glyphosate are also suitable for the disclosure. See, e.g., U.S. Patent No. 4,940,835 and U.S. Patent No. 4,769,061.
  • U.S. Patent No. 5,554,798 discloses transgenic glyphosate-resistant maize plants, in which resistance is conferred by an altered 5-enolpyruvyl-3-phosphoshikimate (EPSP) synthase gene.
  • Polynucleotides coding for resistance to phosphono compounds such as glufosinate ammonium or phosphinothricin, and pyridinoxy or phenoxy propionic acids and cyclohexones are also suitable. See, European Patent Application No. 0242246. See also, U.S. Patent Nos. 5,879,903, 5,276,268, and 5,561,236.
  • Suitable polynucleotides include those coding for resistance to herbicides that inhibit photosynthesis, such as a triazine and a benzonitrile (nitrilase). See, U.S. Patent No.
  • polynucleotides coding for herbicide resistance include those coding for resistance to 2,2-dichloropropionic acid, sethoxydim, haloxyfop, imidazolinone herbicides, sulfonylurea herbicides, triazolopyrimidine herbicides, s-triazine herbicides and bromoxynil.
  • Additional suitable polynucleotides include those coding for insecticidal polypeptides. These polypeptides may be produced in amounts sufficient to control, for example, insect pests (i.e., insect-controlling amounts). It is recognized that the amount of an insecticidal polypeptide a plant must produce to control insects or other pests may vary depending upon the cultivar, type of pest, environmental factors and the like. Polynucleotides useful for additional insect or pest resistance include, for example, those that encode toxins identified in Bacillus organisms.
  • Bt insecticidal proteins include the Cry proteins such as Cry1Aa, Cry1Ab, Cry1Ac, Cry1B, Cry1C, Cry1D, Cry1Ea, Cry1Fa, Cry3A, Cry9A, Cry9B, Cry9C, and the like, as well as vegetative insecticidal proteins such as Vip1, Vip2, Vip3, and the like.
  • an additional polypeptide is an insecticidal polypeptide derived from a non-Bt source, including without limitation an alpha-amylase, a peroxidase, a cholesterol oxidase, a patatin, a protease, a protease inhibitor, a urease, an alpha-amylase inhibitor, a pore-forming protein, a chitinase, a lectin, an engineered antibody or antibody fragment, a Bacillus cereus insecticidal protein, a Xenorhabdus spp. (such as X. nematophila or X. bovienii) insecticidal protein, a Photorhabdus spp. (such as P. luminescens or P. asymbiotica) insecticidal protein, a Brevibacillus spp. (such as B. laterosporus) insecticidal protein, a Lysinibacillus spp. (such as L. sphaericus) insecticidal protein, a Chromobacterium spp. (such as C. subtsugae or C. foundedae) insecticidal protein, a Yersinia spp. (such as Y. entomophaga) insecticidal protein, a Paenibacillus spp. (such as P. propylaea) insecticidal protein, a Clostridium spp. (such as C. bifermentans) insecticidal protein, a Pseudomonas spp. (such as P. fluorescens), and a lignin.
  • Polypeptides that are suitable for production in plants further include those that improve or otherwise facilitate the conversion of harvested plants or plant parts into a commercially useful product, including, for example, increased or altered carbohydrate content or distribution, improved fermentation properties, increased oil content, increased protein content, modified oil profile, improved digestibility, and increased nutraceutical content, e.g., increased phytosterol content, increased tocopherol content, increased stanol content or increased vitamin content.
  • Polypeptides of interest also include, for example, those resulting in or contributing to a reduced content of an unwanted component in a harvested crop, e.g., phytic acid, or sugar degrading enzymes.
  • “resulting in” or “contributing to” means that the polypeptide of interest can directly or indirectly contribute to the existence of a trait of interest (e.g., increasing cellulose degradation using a heterologous cellulase enzyme).
  • the polypeptide contributes to improved digestibility for food or feed.
  • Xylanases are hemicellulolytic enzymes that improve the breakdown of plant cell walls, which leads to better utilization of the plant nutrients by an animal. This leads to improved growth rate and feed conversion. Also, the viscosity of the feeds containing xylan can be reduced. Heterologous production of xylanases in plant cells also can facilitate lignocellulosic conversion to fermentable sugars in industrial processing.
  • a polypeptide useful for the disclosure can be a polysaccharide degrading enzyme. Plants of this disclosure producing such an enzyme may be useful for generating, for example, fermentation feedstocks for bioprocessing.
  • enzymes useful for a fermentation process include alpha amylases, proteases, pullulanases, isoamylases, cellulases, hemicellulases, xylanases, cyclodextrin glycotransferases, lipases, phytases, laccases, oxidases, esterases, cutinases, granular starch hydrolyzing enzyme and other glucoamylases.
  • Polysaccharide-degrading enzymes include: starch degrading enzymes such as α-amylases (EC 3.2.1.1), glucuronidases (E.C. 3.2.1.131); exo-1,4-α-D-glucanases such as amyloglucosidases and glucoamylase (EC 3.2.1.3), β-amylases (EC 3.2.1.2), α-glucosidases (EC 3.2.1.20), and other exo-amylases; starch debranching enzymes, such as a) isoamylase (EC 3.2.1.68), pullulanase (EC 3.2.1.41), and the like; b) cellulases such as exo-1,4-β-cellobiohydrolase (EC 3.2.1.91), exo-1,3-β-D-glucanase (EC 3.2.
  • L-arabinases such as endo-1,5-α-L-arabinase (EC 3.2.1.99), α-arabinosidases (EC 3.2.1.55) and the like
  • galactanases such as endo-1,4-β-D-galactanase (EC 3.2.1.89), endo-1,3-β-D-galactanase (EC 3.2.1.90), α-galactosidase (EC 3.2.1.22), β-galactosidase (EC 3.2.1.23) and the like
  • mannanases such as endo-1,4-β-D-mannanase (EC 3.2.1.78), β-mannosidase (EC 3.2.1.25), α-mannosidase (EC 3.2.1.24) and the like
  • xylanases such as endo-1,4
  • proteases such as fungal and bacterial proteases.
  • Fungal proteases include, but are not limited to, those obtained from Aspergillus, Trichoderma, Mucor and Rhizopus, such as A. niger, A. awamori, A. oryzae and M. miehei.
  • the polypeptides of this disclosure can be cellobiohydrolase (CBH) enzymes (EC 3.2.1.91).
  • the cellobiohydrolase enzyme can be CBH1 or CBH2.
  • hemicellulases such as mannanases and arabinofuranosidases (EC 3.2.1.55); ligninases; lipases (e.g., E.C. 3.1.1.3), glucose oxidases, pectinases, xylanases, transglucosidases, alpha 1,6 glucosidases (e.g., E.C. 3.2.1.20); esterases such as ferulic acid esterase (EC 3.1.1.73) and acetyl xylan esterases (EC 3.1.1.72); and cutinases (e.g., E.C. 3.1.1.74).
  • two or more polynucleotides encoding two or more polypeptides, each conferring modified oil content and/or altered lipid profile when recombinantly expressed in a plant are stacked in a plant using methods disclosed herein.
  • the resultant genetically modified plant has modified oil content and/or altered lipid profile relative to a control plant, where the control plant does not recombinantly express the two or more polynucleotides.
  • the two or more polynucleotides are expressed in the plant under two or more heterologous promoters.
  • a polynucleotide encoding GmDESI (SEQ ID NO: 76, which corresponds to SEQ ID NO: 1 of PCT/CN2022/075982) and a polynucleotide encoding GmSTART (SEQ ID NO: 3, which corresponds to SEQ ID NO: 3 of PCT/CN2022/075977) are stacked in a transgenic soybean plant, resulting in an altered lipid profile and modified total oil content in seeds as compared to a control plant that does not recombinantly express both GmDESI and GmSTART.
  • Double stranded RNA molecules useful with the disclosure include but are not limited to those that suppress target insect genes.
  • gene suppression when taken together, are intended to refer to any of the well-known methods for reducing the levels of protein produced as a result of gene transcription to mRNA and subsequent translation of the mRNA.
  • Gene suppression is also intended to mean the reduction of protein expression from a gene or a coding sequence including posttranscriptional gene suppression and transcriptional suppression.
  • Posttranscriptional gene suppression is mediated by the homology between all or part of an mRNA transcribed from a gene or coding sequence targeted for suppression and the corresponding double stranded RNA used for suppression. It refers to the substantial and measurable reduction of the amount of mRNA available in the cell for binding by ribosomes.
  • the transcribed RNA can be in the sense orientation to effect what is called co-suppression, in the anti-sense orientation to effect what is called anti-sense suppression, or in both orientations producing a dsRNA to effect what is called RNA interference (RNAi).
  • Transcriptional suppression is mediated by the presence in the cell of a dsRNA, a gene suppression agent, exhibiting substantial sequence identity to a promoter DNA sequence or the complement thereof, to effect what is referred to as promoter trans suppression.
  • Gene suppression may be effective against a native plant gene associated with a trait, e.g., to provide plants with reduced levels of a protein encoded by the native gene or with enhanced or reduced levels of an affected metabolite.
  • Gene suppression can also be effective against target genes in plant pests that may ingest or contact plant material containing gene suppression agents, specifically designed to inhibit or suppress the expression of one or more homologous or complementary sequences in the cells of the pest.
  • genes targeted for suppression can encode an essential protein, the predicted function of which is selected from the group consisting of muscle formation, juvenile hormone formation, juvenile hormone regulation, ion regulation and transport, digestive enzyme synthesis, maintenance of cell membrane potential, amino acid biosynthesis, amino acid degradation, sperm formation, pheromone synthesis, pheromone sensing, antennae formation, wing formation, leg formation, development and differentiation, egg formation, larval maturation, digestive enzyme formation, hemolymph synthesis, hemolymph maintenance, neurotransmission, cell division, energy metabolism, respiration, and apoptosis.
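  • To illustrate the dsRNA (RNAi) route described above, the Python sketch below assembles a hairpin RNA from a target fragment: a sense arm, a short loop, and an antisense arm that can fold back to form the double-stranded stem. It is a minimal sketch under stated assumptions: the function names, the toy target fragment, and the GAAA loop are hypothetical placeholders, and practical hairpin constructs typically use much longer arms and often an intron spacer.

```python
# Hypothetical sketch: assemble a hairpin RNA (sense arm + loop + antisense arm)
# whose arms can pair to form a dsRNA stem for RNA interference.

RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def dna_to_rna(dna: str) -> str:
    """Convert a coding-strand DNA sequence to RNA (T -> U)."""
    return dna.upper().replace("T", "U")


def reverse_complement_rna(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return "".join(RNA_PAIRS[base] for base in reversed(rna.upper()))


def hairpin_rna(target_dna: str, loop_dna: str = "GAAA") -> str:
    """Sense arm + loop + antisense arm (the loop is a placeholder, not from the source)."""
    sense = dna_to_rna(target_dna)
    antisense = reverse_complement_rna(sense)
    return sense + dna_to_rna(loop_dna) + antisense


def stem_pairs(hairpin: str, arm_length: int) -> bool:
    """True if the 3' arm is the reverse complement of the 5' arm."""
    return reverse_complement_rna(hairpin[:arm_length]) == hairpin[-arm_length:]


if __name__ == "__main__":
    construct = hairpin_rna("ATGGCC")  # toy target fragment, hypothetical
    print(construct, stem_pairs(construct, 6))
```

For the toy fragment, the construct is AUGGCCGAAAGGCCAU, whose first and last six bases pair with each other.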
  • selectable marker means a nucleotide sequence that when expressed imparts a distinct phenotype to the plant, plant part and/or plant cell expressing the marker and thus allows such transformed plants, plant parts and/or plant cells to be distinguished from those that do not have the marker.
  • Such a nucleotide sequence may encode either a selectable or screenable marker, depending on whether the marker confers a trait that can be selected for by chemical means, such as by using a selective agent (e.g., an antibiotic, herbicide, or the like), or on whether the marker is simply a trait that one can identify through observation or testing, such as by screening (e.g., the R-locus trait).
  • Selectable markers can also include the markers associated with oil and/or protein content and fatty acid profile (e.g., as described in Whiting, R.M., et al., BMC Plant Biol. 2020 Oct 23;20(1):485).
  • the genetic characteristic of the plant as represented by its genetic marker profile can be used to select plants of desired traits.
  • the term “marker-based selection” refers to the use of genetic markers to detect one or more nucleic acids from the plant, where the nucleic acid is associated with a desired trait to identify plants that carry genes for desirable (or undesirable) traits.
  • Markers include but are not limited to Restriction Fragment Length Polymorphisms (RFLPs), Randomly Amplified Polymorphic DNAs (RAPDs), Arbitrarily Primed Polymerase Chain Reaction (AP-PCR), DNA Amplification Fingerprinting (DAF), Sequence Characterized Amplified Regions (SCARs), Amplified Fragment Length Polymorphisms (AFLPs), Simple Sequence Repeats (SSRs) which are also referred to as Microsatellites, and Single Nucleotide Polymorphisms (SNPs).
  • “associated with” or “in association with” refers to a recognizable and/or detectable relationship between two entities.
  • the phrase “associated with increased protein content” refers to a trait, locus, gene, allele, marker, phenotype, etc., or the expression product thereof, the presence or absence of which can influence or indicate an extent and/or degree to which a plant or its progeny exhibits increased protein content as compared to a control plant.
  • a marker is “associated with” a trait when it is linked to it and when the presence of the marker is an indicator of whether and/or to what extent the desired trait or trait form will occur in a plant/germplasm comprising the marker.
  • a marker is “associated with” an allele when it is linked to it and when the presence (or absence) of the marker is an indicator of whether the allele is present (or absent) in a plant, germplasm, or population comprising the marker.
  • a marker associated with increased protein content refers to a marker whose presence or absence can be used to predict whether and/or to what extent a plant will display increased protein content as compared to a control plant.
  • allele(s) refers to any of one or more alternative forms of a gene, all of which alleles relate to at least one trait or characteristic. In a diploid cell, the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes.
  • genotype and variants thereof refer to the genetic composition of an organism, including, for example, whether a diploid organism is heterozygous (i.e., has two different alleles for a given gene or QTL) or homozygous (i.e., has the same allele for a given gene or QTL) for one or more genes or loci (e.g., an SNP, a haplotype, a gene mutation, an insertion, or a deletion).
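  • The heterozygous/homozygous distinction in the genotype definition can be expressed as a small Python sketch that classifies diploid calls per locus. The function names, locus names, and alleles are hypothetical; indel or multi-base alleles could be represented as strings in the same way.

```python
from typing import Dict, Tuple


def zygosity(alleles: Tuple[str, str]) -> str:
    """Classify one diploid call: the same allele twice is homozygous, otherwise heterozygous."""
    first, second = alleles
    if not first or not second:
        raise ValueError("a diploid genotype call needs two alleles")
    return "homozygous" if first == second else "heterozygous"


def genotype_summary(calls: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """Map each locus (e.g., a SNP identifier) to its zygosity."""
    return {locus: zygosity(pair) for locus, pair in calls.items()}


if __name__ == "__main__":
    # Hypothetical calls for two SNP loci in one diploid plant.
    print(genotype_summary({"SNP_1": ("T", "T"), "SNP_2": ("T", "C")}))
```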
  • the markers used to identify the plants comprising the polynucleotides disclosed herein are SNPs.
  • SNP genotyping methods include hybridization, primer extension, oligonucleotide ligation, nuclease cleavage, minisequencing and coded spheres. Such methods are well known and disclosed in e.g., Gut, I.G., Hum. Mutat. 17: 475-492 (2001); Shi, Clin. Chem.
  • an assay (e.g., generally a two-step allelic discrimination assay or similar) and a KASP™ assay (generally a one-step allelic discrimination assay, defined below, or similar) can both be employed to identify the SNPs that associate with increased protein content, increased oil content, and/or modified oil profile.
  • a forward primer, a reverse primer, and two assay probes that recognize two different alleles at the SNP site (or hybridization oligos) are employed.
  • the forward and reverse primers are employed to amplify genetic loci that comprise SNPs that are associated with increased protein content, increased oil content, and/or modified oil profile.
  • the assay probes and the reaction conditions are designed such that an assay probe will only hybridize to the reverse complement of a 100% perfectly matched sequence, thereby permitting identification of which allele(s) are present based upon detection of hybridization.
  • the probes are differentially labeled with, for example, fluorophores to permit distinguishing between the two assay probes in a single reaction.
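  • With two differentially labeled probes, a genotype call is commonly made by comparing the two probe signals. The Python sketch below assumes background-corrected fluorescence values for the probe matching allele 1 and the probe matching allele 2; the minimum-signal value and the 0.8 cutoff are illustrative assumptions, not parameters from the source, and real platforms typically call genotypes by clustering many samples together.

```python
def call_snp(signal_1: float, signal_2: float,
             min_total: float = 1.0, homozygous_cutoff: float = 0.8) -> str:
    """Call a biallelic SNP from two probe signals (thresholds are illustrative)."""
    if signal_1 < 0 or signal_2 < 0:
        raise ValueError("signals must be background-corrected, non-negative values")
    total = signal_1 + signal_2
    if total < min_total:
        return "no call"            # too little signal from either probe
    fraction_1 = signal_1 / total   # share of signal from the allele-1 probe
    if fraction_1 >= homozygous_cutoff:
        return "1/1"                # essentially only the allele-1 probe hybridized
    if fraction_1 <= 1 - homozygous_cutoff:
        return "2/2"                # essentially only the allele-2 probe hybridized
    return "1/2"                    # both probes hybridized: heterozygous


if __name__ == "__main__":
    for pair in [(9.0, 1.0), (1.0, 9.0), (5.0, 5.0), (0.2, 0.3)]:
        print(pair, call_snp(*pair))
```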
  • Exemplary methods of amplifying include employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm as a template in the PCR or LCR.
  • a number of SNP alleles together within a sequence, or across linked sequences can be used to describe a haplotype for any particular genotype.
  • haplotypes can be more informative than single SNPs and can be more descriptive of any particular genotype.
  • a single SNP may be allele “T” for a specific disease resistant line or variety, but the allele “T” might also occur in the soybean breeding population being utilized for recurrent parents.
  • a combination of alleles at linked SNPs may be more informative.
  • once a unique haplotype has been assigned to a donor chromosomal region, that haplotype can be used in that population or any subset thereof to determine whether an individual has a particular gene.
  • the use of automated high throughput marker detection platforms known to those of ordinary skill in the art makes this process highly efficient and effective.
  • haplotype can refer to the set of alleles an individual inherited from one parent. A diploid individual thus has two haplotypes.
  • haplotype can be used in a more limited sense to refer to physically linked and/or unlinked genetic markers (e.g., sequence polymorphisms) associated with a phenotypic trait.
  • haplotype block (sometimes also referred to in the literature simply as a haplotype) refers to a group of two or more genetic markers that are physically linked on a single chromosome (or a portion thereof). Typically, each block has a few common haplotypes, and a subset of the genetic markers (i.e., a “haplotype tag”) can be chosen that uniquely identifies each of these haplotypes.
  • Block 1 contains SNP #1-#5, of which SNP #4 is located in the CDS coding region.
  • Block 2 contains SNP #7-#18 (12 SNPs), of which SNP #7 and #8 are located in the CDS coding region;
  • Block 3 contains SNP #19 and #20, both of which are outside the CDS coding region.
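  • The block structure above lends itself to a simple representation in which the alleles at the SNPs of each block, read along one chromosome, are concatenated into a block haplotype string. In the Python sketch below, the SNP-to-block assignment follows the text, with Block 2 read as SNP #7-#18 (an assumption, since that passage is garbled); the example allele calls and reference haplotypes are hypothetical, because the actual alleles are given in Tables 4-6.

```python
from typing import Dict, List

# SNP membership of each haplotype block as described in the text;
# the Block 2 range (#7-#18) is an assumed reading.
BLOCKS: Dict[str, range] = {
    "Block 1": range(1, 6),    # SNP #1-#5
    "Block 2": range(7, 19),   # SNP #7-#18
    "Block 3": range(19, 21),  # SNP #19-#20
}


def block_strings(calls: Dict[int, str]) -> Dict[str, str]:
    """Concatenate the alleles of each block's SNPs (one chromosome); missing calls become 'N'."""
    return {name: "".join(calls.get(snp, "N") for snp in snps)
            for name, snps in BLOCKS.items()}


def matching_haplotypes(observed: str, reference: Dict[str, str]) -> List[str]:
    """Reference haplotypes consistent with the observed block string ('N' matches any allele)."""
    return [name for name, ref in reference.items()
            if len(ref) == len(observed)
            and all(o == "N" or o == r for o, r in zip(observed, ref))]


if __name__ == "__main__":
    # Hypothetical allele calls; Block 2 is left uncalled here.
    calls = {1: "A", 2: "C", 3: "G", 4: "T", 5: "A", 19: "C", 20: "T"}
    print(block_strings(calls))
```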
  • the SNP genotyping reveals seven different haplotypes that are associated with increased protein content and/or increased oil content. Tables 4-6 show the genotype of each haplotype.
  • As shown in FIG. 68, haplotypes Hap_2, Hap_3, and Hap_6 were found associated with increased protein content; haplotypes Hap_1, Hap_2, Hap_5, and Hap_7 were found associated with increased oil content. Hap_2 was associated with both increased oil content and increased protein content.
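  • These associations can be captured as two sets, which makes the overlap explicit: only Hap_2 belongs to both. The Python sketch below simply encodes the groupings stated in the text; the function name is hypothetical.

```python
from typing import List

# Haplotype groupings as stated in the text (FIG. 68).
PROTEIN_HAPLOTYPES = {"Hap_2", "Hap_3", "Hap_6"}
OIL_HAPLOTYPES = {"Hap_1", "Hap_2", "Hap_5", "Hap_7"}


def associated_traits(haplotype: str) -> List[str]:
    """List the traits the text associates with a haplotype label."""
    traits = []
    if haplotype in PROTEIN_HAPLOTYPES:
        traits.append("increased protein content")
    if haplotype in OIL_HAPLOTYPES:
        traits.append("increased oil content")
    return traits


if __name__ == "__main__":
    print(PROTEIN_HAPLOTYPES & OIL_HAPLOTYPES)  # haplotypes linked to both traits
    print(associated_traits("Hap_2"))
```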
  • SNP markers can be used in a marker-assisted breeding program to move traits, such as native traits or traits conferred by transgenes or traits conferred by genome editing, into a desired plant background.
  • native trait refers to a trait already existing in germplasm, including wild relatives of crop species, or that can be produced by the recombination of existing traits.
  • progeny plants from a cross between a donor soybean plant comprising in its genome a nucleic acid sequence encoding SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and a recipient soybean plant not comprising said nucleic acid sequence can be screened to detect the presence of the markers associated with increased protein content, increased oil content, and/or modified oil profile.
  • Plants comprising said markers can be selected and verified for increased protein content, increased oil content, and/or modified oil profile as compared to control plants.
  • the donor plant comprises a nucleic acid sequence encoding SEQ ID NO: 76.
  • the donor plant comprises a nucleic acid sequence encoding SEQ ID NO: 3 and the markers listed in Table 3.
  • the markers that can be used to select plants having increased protein content are the alleles associated with one or more haplotypes of Hap_2, Hap_3, or Hap_6.
  • the markers that can be used to select plants having increased oil content are the alleles associated with one or more haplotypes of Hap_1, Hap_2, Hap_5, or Hap_7.
  • the favorable alleles of the SNPs are those present in one or more of the aforementioned haplotypes.
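  • A marker-based screen of progeny from such a cross reduces to keeping the plants in which the favorable alleles were detected. In the Python sketch below, the plant identifiers and marker labels are hypothetical placeholders; in practice the markers would be the favorable SNP alleles of the haplotypes described above, and selected plants would still be verified for the trait against control plants.

```python
from typing import Dict, List, Set


def select_progeny(progeny: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Return sorted IDs of progeny plants in which every required marker allele was detected."""
    return sorted(plant for plant, detected in progeny.items() if required <= detected)


if __name__ == "__main__":
    # Hypothetical marker detections for three progeny plants.
    detected = {
        "F2-01": {"SNP4_fav", "SNP7_fav"},
        "F2-02": {"SNP4_fav"},
        "F2-03": {"SNP4_fav", "SNP7_fav", "SNP19_fav"},
    }
    print(select_progeny(detected, {"SNP4_fav", "SNP7_fav"}))
```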
  • kits and primers that can be used to introduce a polynucleotide sequence as described in this disclosure into a recipient plant or to detect a polynucleotide sequence as described in this disclosure in a plant.
  • kits and primers that can be used to identify plants that have increased protein content, increased oil content, and/or modified oil profile.
  • the primers can include Glyma.20G092400-zF, ATGGCCTCCAACGGCG (SEQ ID NO: 37); and Glyma.20G092400-zR, AGCCGAAAGAAGAGCACAAGTAAACC (SEQ ID NO: 38).
  • the primers can include Glyma.06G303700-F, ATAACTAGTATGTTCCAGCCGAACC (SEQ ID NO: 63); and Glyma.06G303700-R, ATAGGATCCAGCAGGTTCACCAGA (SEQ ID NO: 64).
  • kits and primers that can be used to detect the expression level of the polypeptide disclosed herein in plants.
  • the primers can include Glyma.20G092400-q-F CTGATGCTCAAAAGCTTAGGACCCG (SEQ ID NO: 100); and Glyma.20G092400-q-R AACCTTGTTGTAAACCTGACGAGAAAT (SEQ ID NO: 101) (Table 14).
  • the primers can include Glyma.06G303700-q-F: AGTTGCACCGATTCAACAGGC (SEQ ID NO: 65); and Glyma.06G303700-q-R CCATGCGATGTGGTTCCATCT (SEQ ID NO: 66).
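  • Length and GC content are quick sanity checks on the primers listed above. The Python sketch below computes both and adds a rough melting-temperature estimate by the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C; that rule is only a crude approximation intended for short oligonucleotides, and a nearest-neighbor calculation would normally be used for real assay design. The sequences are copied from the text; the helper names are hypothetical.

```python
# Primer sequences as listed in the text (5'->3').
PRIMERS = {
    "Glyma.20G092400-zF (SEQ ID NO: 37)": "ATGGCCTCCAACGGCG",
    "Glyma.20G092400-zR (SEQ ID NO: 38)": "AGCCGAAAGAAGAGCACAAGTAAACC",
    "Glyma.06G303700-F (SEQ ID NO: 63)": "ATAACTAGTATGTTCCAGCCGAACC",
    "Glyma.06G303700-R (SEQ ID NO: 64)": "ATAGGATCCAGCAGGTTCACCAGA",
    "Glyma.20G092400-q-F (SEQ ID NO: 100)": "CTGATGCTCAAAAGCTTAGGACCCG",
    "Glyma.20G092400-q-R (SEQ ID NO: 101)": "AACCTTGTTGTAAACCTGACGAGAAAT",
    "Glyma.06G303700-q-F (SEQ ID NO: 65)": "AGTTGCACCGATTCAACAGGC",
    "Glyma.06G303700-q-R (SEQ ID NO: 66)": "CCATGCGATGTGGTTCCATCT",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Crude Tm estimate in deg C (Wallace rule): 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.1%}, Wallace Tm ~{wallace_tm(seq)} C")
```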
  • the kit may also comprise one or more probes having a sequence corresponding to or complementary to a sequence having 80% to 100% sequence identity with a specific region of the transgenic event or gene editing event.
  • the kit may comprise any reagent and material required to perform the assay or detection method.
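  • The 80% to 100% identity window for kit probes can be checked with a simple calculation. The Python sketch below assumes the probe and the target region are already aligned without gaps and have the same length; this is a simplification, since identity between sequences of different lengths is normally computed from a gapped alignment (for example, with BLAST). For a probe complementary to the target, the comparison would be made against the reverse complement of the region. All names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two equal-length, gap-free aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def probe_qualifies(probe: str, region: str, minimum: float = 80.0) -> bool:
    """True if the probe falls within the 80%-100% identity window for the region."""
    return percent_identity(probe, region) >= minimum


if __name__ == "__main__":
    # Toy sequences: one mismatch in ten positions.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```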
  • any reference to a series of embodiments is to be understood as a reference to each of those embodiments disjunctively (e.g., “Embodiments 1-4” is to be understood as “Embodiments 1, 2, 3, or 4”).
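  • The disjunctive reading of embodiment ranges can be made mechanical. The Python sketch below (helper names hypothetical) expands a range such as “1-4” or “A12-A29” into individual embodiment labels and joins them with “or”, reproducing the example given above.

```python
import re
from typing import List

# Optional letter prefix, start number, hyphen, optional prefix, end number.
_RANGE = re.compile(r"\s*([A-Z]*)(\d+)\s*-\s*([A-Z]*)(\d+)\s*")


def expand_range(spec: str) -> List[str]:
    """Expand '1-4' or 'A12-A29' into individual labels; a single label is returned as-is."""
    match = _RANGE.fullmatch(spec)
    if match is None:
        return [spec.strip()]
    prefix, start, end_prefix, end = match.groups()
    if end_prefix and end_prefix != prefix:
        raise ValueError(f"mismatched prefixes in {spec!r}")
    return [f"{prefix}{n}" for n in range(int(start), int(end) + 1)]


def disjunctive(labels: List[str]) -> str:
    """Join labels as 'x', 'x or y', or 'x, y, or z'."""
    if len(labels) <= 2:
        return " or ".join(labels)
    return ", ".join(labels[:-1]) + ", or " + labels[-1]


if __name__ == "__main__":
    print("Embodiments " + disjunctive(expand_range("1-4")))
```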
  • Embodiment A1 is an elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said polypeptide confers increased protein content, oil content, and/or modified oil profile on the elite Glycine max plant as compared to a control plant not comprising said nucleic acid sequence.
  • Embodiment A2 is the elite Glycine max plant of embodiment A1, wherein the donor Glycine plant is a Glycine soja plant or Glycine max plant.
  • Embodiment A3 is the elite Glycine max plant of embodiment A2, wherein the Glycine soja plant is a ZYD00006 variety.
  • Embodiment A4 is the elite Glycine max plant of embodiment A2, wherein the Glycine max plant is a DN50 variety or a SN14 variety.
  • Embodiment A5 is the elite Glycine max plant of any one of embodiments A1-A4, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139.
  • Embodiment A6 is the elite Glycine max plant of any one of embodiments A1-A4, wherein the nucleic acid sequence has at least 90% identity, at least 95% identity, or at least 100% identity to any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or the nucleic acid sequence has at least 90% identity, at least 95% identity, or at least 100% identity to any one of SEQ ID NOs: 75, 78, 81, 84, 87, 111, 114, 117, 120, 123, 126, 129, 132, 135, or 138.
  • Embodiment A7 is the elite Glycine max plant of any one of embodiments A1-A6, wherein the polypeptide encoded by the nucleic acid sequence has at least 90% identity or at least 95% identity to SEQ ID NO: 3, wherein the polypeptide comprises an aminotransferase domain, wherein the aminotransferase domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 91-274 of SEQ ID NO: 76.
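  • The substitution limit in Embodiment A7 refers to a residue window (amino acid residues 91-274) of a reference sequence. Assuming the candidate polypeptide has already been aligned to the reference without gaps, the count can be sketched in Python as below; the toy sequences are placeholders, not SEQ ID NO: 76, and a gapped alignment would first require mapping alignment columns back to reference residue numbers. Function names are hypothetical.

```python
def substitutions_in_window(query: str, reference: str, start: int, end: int) -> int:
    """Count mismatches within reference residues start..end (1-based, inclusive)."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length without gaps")
    if not 1 <= start <= end <= len(reference):
        raise ValueError("window lies outside the reference sequence")
    window = slice(start - 1, end)  # convert 1-based inclusive positions to a Python slice
    return sum(q != r for q, r in zip(query[window], reference[window]))


def domain_within_limit(query: str, reference: str, max_substitutions: int,
                        start: int = 91, end: int = 274) -> bool:
    """True if the domain window carries no more than max_substitutions changes."""
    return substitutions_in_window(query, reference, start, end) <= max_substitutions


if __name__ == "__main__":
    reference = "M" + "A" * 299          # toy 300-residue reference
    variant = list(reference)
    for position in (100, 200, 280):     # 1-based; 280 lies outside 91-274
        variant[position - 1] = "G"
    query = "".join(variant)
    print(substitutions_in_window(query, reference, 91, 274))  # 2
```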
  • Embodiment A8 is the elite Glycine max plant of any one of embodiments A1-A7, wherein said nucleic acid sequence is introduced into said plant genome by genome editing of genomic sequences corresponding to and comprising any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, wherein the genome editing confers increased protein content, oil content, and/or oil profile.
  • Embodiment A9 is the elite Glycine max plant of embodiment A8, wherein the gene editing is by CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
  • Embodiment A10 is the elite Glycine max plant of any one of embodiments A1-A6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of (a) a nucleic acid sequence having at least 90% identity or at least 95% identity to any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, (b) a nucleic acid sequence encoding a polypeptide having at least 90% identity or at least 95% identity to the sequence of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or (c) a nucleic acid sequence encoding a polypeptide having the sequence of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115,
  • Embodiment A11 is the elite Glycine max plant of any one of embodiments A1-A10, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  • Embodiment A12 is a plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
  • Embodiment A13 is the plant of embodiment A12, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137.
  • Embodiment A14 is the plant of embodiment A12 or A13, wherein the genome editing comprises duplication, inversion, promoter modification, terminator modification and/or splicing modification of the nucleic acid sequence.
  • Embodiment A15 is the plant of any one of embodiments A12-A14, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
  • Embodiment A16 is the plant of any one of embodiments A12-A15, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  • Embodiment A17 is the plant of any one of embodiments A12-A16, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.
  • Embodiment A18 is the plant of embodiment A17, wherein the heterologous promoter is a native promoter or active variant or fragment thereof.
  • Embodiment A19 is a plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having an amino acid sequence that has at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or an amino acid sequence set forth in SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content, increased oil content, and/or modified oil profile as compared to a control plant.
  • Embodiment A20 is the plant of embodiment A19, wherein the nucleic acid sequence comprises at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or the nucleic acid sequence is any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137.
  • Embodiment A21 is the plant of embodiment A19 or A20, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.
  • Embodiment A22 is the plant of embodiment A19 or A20, wherein the nucleic acid sequence is introduced into the genome by genome editing.
  • Embodiment A23 is the plant of embodiment A22, wherein the promoter is an endogenous promoter.
  • Embodiment A24 is the plant of any one of embodiments A19-A23, wherein the promoter is a constitutive promoter, an inducible promoter, or a tissue-specific promoter.
  • Embodiment A25 is the plant of any one of embodiments A19-A24, wherein the plant is a dicot plant.
  • Embodiment A26 is the plant of embodiment A25, wherein the dicot plant is a soybean plant or an elite soybean plant.
  • Embodiment A27 is the plant of any one of embodiments A19-A24, wherein the plant is a monocot plant.
  • Embodiment A28 is the plant of embodiment A27, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
  • Embodiment A29 is the plant of any one of embodiments A19-A28, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  • Embodiment A30 is a progeny plant from the elite Glycine max plant of any one of embodiments A1-A11 or the plant of any one of embodiments A12-A29, wherein the progeny plant has stably incorporated into its genome the nucleic acid sequence.
  • Embodiment A31 is a plant cell, seed, or plant part derived from the elite Glycine max plant of any one of embodiments A1-A11 or the plant of any one of embodiments A12-A29, wherein said plant cell, seed, or plant part has stably incorporated into its genome the nucleic acid sequence.
  • Embodiment A32 is a harvest product derived from the elite Glycine max plant of any one of embodiments A1-A11 or the plant of any one of embodiments A12-A29.
  • Embodiment A33 is a processed product derived from the harvest product of embodiment A32, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.
  • Embodiment A34 is a method of producing a soybean plant having increased polypeptide and/or oil content, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence confers to said donor soybean plant increased protein content, increased oil content, and/or modified oil profile compared to donor Glycine plant; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by detecting the presence of the nucleic acid sequence, or the presence of one or more molecular markers associated with the nucleic acid sequence in the progeny plant, thereby producing
  • Embodiment A35 is the method of embodiment A34, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
  • Embodiment A36 is the method of embodiment A34 or A35, wherein either the donor or the recipient soybean plant is an elite Glycine max plant.
  • Embodiment A37 is a method of producing a Glycine max plant with increased protein content, increased oil content, and/or modified oil profile, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with a nucleic acid sequence comprising any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, wherein said nucleic acid sequence confers to the Glycine max plant increased protein content, increased oil content, and/or modified oil profile; c) selecting a Glycine max plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said molecular marker associated with increased polypeptide and/or increased oil content.
  • Embodiment A38 is the method of embodiment A37, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
  • Embodiment A39 is the method of embodiment A38, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.
  • Embodiment A40 is the method of embodiment A39, wherein the amplifying comprises employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm as a template in the PCR or LCR.
  • Embodiment A41 is the method of embodiment A39, wherein the nucleic acid is selected from DNA or RNA.
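  • The amplification step of Embodiments A39-A40 can be previewed in silico before running the PCR. The Python sketch below assumes exact primer matches on the given strand of a template sequence and returns the predicted amplicon (from the forward primer through the binding site of the reverse primer); it ignores mismatches, the opposite strand, and reaction conditions, and the toy template, primers, and function names are hypothetical.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def predict_amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Predicted PCR product on the given template strand, or None if a primer site is missing."""
    template, forward = template.upper(), forward.upper()
    # The reverse primer anneals to this strand where its reverse complement appears.
    reverse_site = reverse_complement(reverse.upper())
    start = template.find(forward)
    if start == -1:
        return None
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


if __name__ == "__main__":
    toy_template = "GGGATGCATTTTCCGTAAAA"  # hypothetical template
    print(predict_amplicon(toy_template, "ATGCA", "TACGG"))
```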
  • Embodiment A42 is a plant produced by the method of any one of embodiments A34-A41.
  • Embodiment A43 is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid molecule operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or (ii) an amino acid sequence set forth in SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence is heterologous to
  • Embodiment A44 is the method of embodiment A43, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.
  • Embodiment A45 is the method of embodiment A44, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content, increased oil content, and/or modified oil profile.
  • Embodiment A46 is the method of embodiment A45, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.
  • Embodiment A47 is the method of embodiment A45, wherein the method comprises Cas12a-mediated gene replacement.
  • Embodiment A48 is the method of any one of embodiments A43-A47, wherein the promoter is an exogenous promoter.
  • Embodiment A49 is the method of any of embodiments A43-A47, wherein the promoter is an endogenous promoter.
  • Embodiment A50 is the method of any one of embodiments A43-A49, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
  • Embodiment A51 is the method of any one of embodiments A43-A50, wherein the plant is a dicot plant.
  • Embodiment A52 is the method of embodiment A51, wherein the dicot plant is a soybean plant.
  • Embodiment A53 is the method of any one of embodiments A43-A51, wherein the plant is a monocot plant.
  • Embodiment A54 is the method of embodiment A53, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
  • Embodiment A55 is a plant produced by the method of any one of embodiments A43-A54.
  • Embodiment A56 is a polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or any portion thereof, wherein the portion confers increased polypeptide and/or oil content, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, and having substitution and/or deletion and/or addition of one or more amino acid residues, wherein expression of the polypeptide confers increased polypeptide and/or oil content on the plant; (c) a polypeptide having more than 99%, more than 95%, more than 90%, more than 85%, or more than 80% identity
  • Embodiment A57 is a nucleic acid molecule comprising: (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137; or (c) the nucleotide sequence of part (a) having at least more than 99%, at least 95%, at least 90%
  • Embodiment A58 is an expression cassette comprising the nucleic acid molecule of embodiment A57 or encoding the polypeptide of embodiment A56.
  • Embodiment A59 is the expression cassette of embodiment A58, wherein the nucleic acid molecule is operably linked to a promoter capable of directing expression in a plant cell.
  • Embodiment A60 is the expression cassette of embodiment A59, wherein the promoter is an endogenous promoter.
  • Embodiment A61 is the expression cassette of embodiment A59, wherein the promoter is an exogenous promoter.
  • Embodiment A62 is the expression cassette of embodiment A61, wherein the promoter comprises pSOY1 (SEQ ID NO: 20).
  • Embodiment A63 is a vector comprising the nucleic acid molecule of embodiment A62, the expression cassette of any one of embodiments A56-A61, a nucleic acid molecule having the sequence set forth in SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or a nucleic acid sequence encoding the polypeptide having the sequence set forth in SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139.
  • Embodiment A64 is a transgenic cell comprising the nucleic acid molecule of embodiment A63 or the expression cassette of any one of embodiments A56-A63.
  • Embodiment A65 is use of the polypeptide of embodiment A56, the nucleic acid molecule of embodiment A57, the expression cassette of any one of embodiments A56-A64, or the transgenic cell of embodiment A63 in conferring increased protein content, increased oil content, and/or modified oil profile.
  • Embodiment A66 is use of the expression cassette of any one of embodiments A56-A64 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content is increased, the oil content is increased and/or the oil profile is modified in the cell.
  • Embodiment A67 is a method for increasing protein content, increasing oil content, and/or modifying oil profile in a plant, comprising increasing the expression level and/or activity of the polypeptide of embodiment A56 in the plant.
  • Embodiment A68 is a method for producing a plant variety with increased protein content, increased oil content, and/or modified oil profile in a plant, comprising increasing the expression level and/or activity of the nucleic acid molecule of embodiment A57 in the plant.
  • Embodiment A69 is the method of embodiments A67 or A68, wherein the increasing the expression level and/or activity of the polypeptide in the plant is by transgenic means or by breeding.
  • Embodiment A70 is a method for producing a transgenic plant with increased protein content, increased oil content, and/or modified oil profile, comprising introducing the nucleic acid molecule of embodiment A57 or the expression cassette of any one of embodiments A65-A69 to a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content, increased oil content, and/or modified oil profile compared to the recipient plant.
  • Embodiment A71 is the method of embodiment A70, wherein the introducing the nucleic acid molecule to the recipient plant is performed by introducing the expression cassette of any one of embodiments A6-A64 into the recipient plant.
  • Embodiment A72 is a primer pair for amplifying the nucleic acid molecule of embodiment A57.
  • Embodiment A73 is the primer pair of embodiment A72, wherein the primer pair is a primer pair composed of two single-stranded DNA shown in at least one of Table 14, Table 17, Table 18, and Table 19.
  • Embodiment A74 is a kit comprising the primer pair of embodiment A72 or A73.
  • Embodiment B1 An elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile on the elite Glycine max plant.
  • Embodiment B2 The elite Glycine max plant of embodiment B1, wherein the donor Glycine plant is from Glycine soja or Glycine max.
  • Embodiment B3 The elite Glycine max plant of embodiment B2, wherein the Glycine soja is the ZYD00006 variety.
  • Embodiment B4 The elite Glycine max plant of embodiment B1 or B2, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
  • Embodiment B5 The elite Glycine max plant of any one of embodiments B1-B3, wherein the nucleic acid sequence has at least 90%, 95% or 100% sequence identity to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or is a polynucleotide encoding a polypeptide having the amino acid sequence of any one of SEQ ID NOs: 22 or 24-59.
  • Embodiment B6 The elite Glycine max plant of embodiment B1, wherein the polypeptide encoded by the nucleic acid sequence has at least 90%, or at least 95% identity to SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 22, and wherein the polypeptide comprises one or more of the following: (i) a START domain, wherein the START domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 246-466 of SEQ ID NO: 20; or (ii) a homeodomain, wherein the homeodomain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 55-113 of SEQ ID NO: 20.
  • Embodiment B7 The elite Glycine max plant of any one of embodiments B1-B6, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16 or 17, wherein the genome editing confers increased protein content, increased oil content, and/or modified oil profile.
  • Embodiment B8 The elite Glycine max plant of any one of embodiments B1-B6, wherein the nucleic acid sequence is introduced by genome editing of a Glycine max genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region of at least one allele change corresponding to any described in any of Tables 4-6, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, wherein said one or more alleles confer in the plant increased protein and/or oil content, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein and/or oil content.
  • Embodiment B9 The elite Glycine max plant of embodiment B7 or B8, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
  • Embodiment B10 The elite Glycine max plant of any one of embodiments B1-B6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of (a) a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a polynucleotide encoding a polypeptide comprising the amino acid sequence of any one of SEQ ID NOs: 22 or 24-59, or (b) a nucleic acid sequence encoding at least one polypeptide comprising the amino acid sequence set forth in SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein and/or oil content on the elite Glycine max plant.
  • Embodiment B11 The elite Glycine max plant of any of embodiments B1-B10, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  • Embodiment B12 The elite Glycine max plant of any of embodiments B1-B10, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  • Embodiment B13 The elite Glycine max plant of any one of embodiments B1-B6, wherein at least one parental line of said elite Glycine max plant was selected or identified through molecular marker selection, wherein said parental line is selected or identified based on the presence of a molecular marker located within or closely linked with said nucleic acid sequence of any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, or any portion thereof, wherein said molecular marker is associated with increased protein and/or oil content and/or modified oil profile.
  • Embodiment B14 The elite Glycine max plant of embodiment B13, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP), or a microsatellite.
  • Embodiment B15 The elite Glycine max plant of embodiment B13 or B14, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 3.
  • Embodiment B16 The elite Glycine max plant of any one of embodiments B1-B15, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  • Embodiment B17 A plant the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or at least one polynucleotide encoding a polypeptide comprising the amino acid sequence of any one of SEQ ID NOs: 22 or 24-59, wherein said polypeptide confers increased protein and/or oil content and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
  • Embodiment B18 The plant of embodiment B17, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of a nucleic acid sequence set forth in SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding any one of SEQ ID NOs: 22 or 24-59.
  • Embodiment B19 The plant of embodiment B17 or B18, wherein the nucleic acid sequence is introduced by genome editing of a genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region of at least one allele change corresponding to any described in Table 3, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, wherein said one or more alleles confer in the plant increased protein and/or oil content, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein and/or oil content.
  • Embodiment B20 The plant of embodiment B17, wherein the nucleic acid sequence is modified in said plant genome by duplication, inversion, promoter modification, terminator modification, and/or splicing modification via genome editing of a nucleic acid sequence set forth in any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding a polypeptide comprising an amino acid sequence of any one of SEQ ID NOs: 22 or 24-59.
  • Embodiment B21 The plant of any one of embodiments B17-B20, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
  • Embodiment B22 The plant of any one of embodiments B17-B21, wherein the plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  • Embodiment B23 The plant of any one of embodiments B17-B21, wherein the plant has in its genome at least one genetic marker that is an allele associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  • Embodiment B24 The plant of any one of embodiments B17-B23, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 3.
  • Embodiment B25 The plant of any one of embodiments B17-B24, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  • Embodiment B26 The plant of any one of embodiments B17-B25, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.
  • Embodiment B27 The plant of embodiment B26, wherein the promoter is a native promoter or an active variant or fragment thereof.
  • Embodiment B28 A plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having (a) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (b) an amino acid sequence set forth in SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content and/or increased oil content and/or modified oil profile as compared to a control plant.
  • Embodiment B29 The plant of embodiment B28, wherein (a) said nucleic acid sequence comprises at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or to a polynucleotide encoding any one of SEQ ID NOs: 22 or 24-59, or (b) said nucleic acid sequence is any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or encodes any one of SEQ ID NOs: 22 or 24-59.
  • Embodiment B30 The plant of embodiment B28 or B29, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.
  • Embodiment B31 The plant of embodiment B28 or B29, wherein the nucleic acid sequence is introduced by genome editing.
  • Embodiment B32 The plant of any one of embodiments B28-B31, wherein the promoter is an endogenous promoter.
  • Embodiment B33 The plant of any one of embodiments B28-B31, wherein the promoter is a constitutive promoter, an inducible promoter, or a tissue-specific promoter.
  • Embodiment B34 The plant of any one of embodiments B28-B30, wherein said genomic region of the plant comprises at least one allele corresponding to one or more alleles as described in any of Tables 3-6, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, and wherein said one or more alleles confer in the plant increased protein and/or oil content.
  • Embodiment B35 The plant of any one of embodiments B28-B34, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
  • Embodiment B36 The plant of any one of embodiments B28-B34, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
  • Embodiment B37 The plant of any one of embodiments B28-B36, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 3.
  • Embodiment B38 The plant of any one of embodiments B28-B37, wherein the plant is a dicot plant.
  • Embodiment B39 The plant of embodiment B38, wherein the dicot plant is a soybean plant or an elite soybean plant.
  • Embodiment B40 The plant of any one of embodiments B28-B37, wherein the plant is a monocot plant.
  • Embodiment B41 The plant of embodiment B40, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
  • Embodiment B42 The plant of any one of embodiments B28-B41, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
  • Embodiment B43 A progeny plant from the elite Glycine max plant of any one of embodiments B1-B16 or the plant of any one of embodiments B17-B42, wherein said progeny plant has stably incorporated into its genome the nucleic acid sequence.
  • Embodiment B44 A plant cell, seed, or plant part derived from the elite Glycine max plant of any one of embodiments B1-B16 or the plant of any one of embodiments B17-B42, wherein said plant cell, seed, or plant part has stably incorporated into its genome the nucleic acid sequence.
  • Embodiment B45 A harvest product derived from the elite Glycine max plant of any one of embodiments B1-B16 or the plant of any one of embodiments B17-B42.
  • Embodiment B46 A processed product derived from the harvest product of embodiment B45, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.
  • Embodiment B47 A method of producing a soybean plant having increased protein and/or oil content and/or modified oil profile, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity, at least 95% identity, or at least 98% identity to SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding a polypeptide having an amino acid sequence of any one of SEQ ID NOs: 22 or 24-59, wherein said nucleic acid sequence confers onto said donor soybean plant an increased protein and/or oil content and/or modified oil profile; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by isolating a nucleic acid from said progeny plant and detecting within said nucleic acid a molecular marker associated with said
  • Embodiment B48 The method of embodiment B47, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
  • Embodiment B49 The method of embodiment B47 or B48, wherein the molecular markers are markers as set forth in Tables 3-6.
  • Embodiment B50 The method of any one of embodiments B47-B49, wherein either the recipient or the donor soybean plant is an elite Glycine max plant.
  • Embodiment B51 A method of producing a Glycine max plant with increased protein and/or oil content, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with, or closely linked with, a nucleic acid sequence comprising any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, or a portion of any thereof, wherein said portion confers to a plant increased protein content and/or increased oil content; c) selecting a plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said marker associated with increased protein and/or increased oil content.
  • Embodiment B52 The method of embodiment B51, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
  • Embodiment B53 The method of embodiment B51 or B52, wherein the molecular marker is one or more SNPs set forth in Table 3.
  • Embodiment B54 The method of any one of embodiments B51-B53, wherein the molecular marker comprises alleles associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, and/or Hap_7.
  • Embodiment B55 The method of embodiment B51, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.
  • Embodiment B56 The method of embodiment B51, wherein the nucleic acid is selected from DNA or RNA.
  • Embodiment B57 A plant produced by the method of any one of embodiments B47-B56.
  • Embodiment B58 A method of conferring increased protein content and/or increased oil content and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (ii) an amino acid sequence set forth in any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content and/or increases oil content compared to a control plant not expressing said nucleic acid sequence.
  • Embodiment B59 The method of embodiment B58, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.
  • Embodiment B60 The method of embodiment B58, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content and/or increased oil content.
  • Embodiment B61 The method of embodiment B58, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.
  • Embodiment B62 The method of embodiment B58, wherein the method comprises Cas12a-mediated gene replacement.
  • Embodiment B63 The method of embodiment B62, wherein the method comprises at least one gRNA.
  • Embodiment B64 The method of any one of embodiments B58-B63, wherein the promoter is an exogenous promoter.
  • Embodiment B65 The method of any one of embodiments B58-B63, wherein the promoter is an endogenous promoter.
  • Embodiment B66 The method of embodiment B64, wherein the exogenous promoter comprises SEQ ID NO: 23 or an active variant or fragment thereof.
  • Embodiment B67 The method of embodiment B59, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
  • Embodiment B68 The method of any one of embodiments B58-B67, wherein the plant is a dicot plant.
  • Embodiment B69 The method of embodiment B68, wherein the dicot plant is a soybean plant.
  • Embodiment B70 The method of any one of embodiments B58-B67, wherein the plant is a monocot plant.
  • Embodiment B71 The method of embodiment B70, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
  • Embodiment B72 A plant produced by the method of any one of embodiments B58-B71.
  • Embodiment B73 A polypeptide selected from: (a) a polypeptide comprising the amino acid sequence of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein expression of the polypeptide in a plant confers increased protein content, increased oil content, and/or modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59 and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in the plant confers increased protein and/or oil content on said plant; (c) a polypeptide having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity with and having the same function as the amino acid sequence of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein the polypeptide when expressed in
  • Embodiment B74 A nucleic acid molecule comprising (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a sequence encoding SEQ ID NOs: 22 or 24-59; or (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or to a polynucleotide encoding SEQ ID NOs: 22 or 24-59.
  • Embodiment B75 An expression cassette comprising the nucleic acid molecule of embodiment B74 or a nucleic acid sequence encoding the polypeptide of embodiment B73.
  • Embodiment B76 The expression cassette of embodiment B75, wherein the nucleic acid molecule is operably linked to a promoter that is capable of directing expression in a plant cell.
  • Embodiment B77 The expression cassette of embodiment B75, wherein the promoter is an endogenous promoter.
  • Embodiment B78 The expression cassette of embodiment B75, wherein the promoter is an exogenous promoter.
  • Embodiment B79 A vector comprising the nucleic acid molecule of embodiment B74 or the expression cassette of any one of embodiments B75-B78.
  • Embodiment B80 A transgenic cell comprising the nucleic acid molecule of embodiment B74 or the expression cassette of any one of embodiments B75-B78.
  • Embodiment B81 Use of the polypeptide of embodiment B73 or the nucleic acid molecule of embodiment B74, or the expression cassette of any one of embodiments B75 to B78 in conferring increased protein content and/or increased oil content and/or modified oil profile in a plant.
  • Embodiment B82 Use of the expression cassette of any one of embodiments B75-B78 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content and/or oil content is increased upon expression in a plant.
  • Embodiment B83 A method for increasing protein content and/or oil content in a plant, comprising increasing the expression level and/or activity of the polypeptide of embodiment B73 in the plant.
  • Embodiment B84 A method for producing a plant variety with increased protein content and/or oil content, comprising increasing the expression level and/or activity of the polypeptide of embodiment B73 in a recipient plant.
  • Embodiment B85 The method of embodiment B83 or B84, wherein the increasing the expression level and/or activity of the polypeptide in the plant is by transgenic means or by breeding.
  • Embodiment B86 A method for producing a transgenic plant with increased protein content and/or oil content, comprising the following step: introducing the nucleic acid molecule of embodiment B67 or the expression cassette of any one of embodiments B75-B78 into a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content and/or oil content compared with the recipient plant.
  • Embodiment B87 The method of embodiment B86, wherein the introducing the nucleic acid molecule to the recipient plant is performed by introducing the expression cassette of any one of embodiments B75-B78 into the recipient plant.
  • Embodiment B88 A primer pair for amplifying the nucleic acid molecule of embodiment B74.
  • Embodiment B89 The primer pair of embodiment B88, wherein the primer pair is a primer pair 1 composed of two single-stranded DNA comprising a sequence of SEQ ID NO: 63 and SEQ ID NO: 64.
  • Embodiment B90 A kit comprising the primer pair of embodiment B88 or B89.
  • The four extreme materials were sown in the experimental field of Xiangyang Farm under the following conditions: an appropriate soil moisture content of about 15-20%, a row length of about 5 m, a row spacing of about 60 cm, a seeding depth (distance below the soil surface) of about 3-4 cm, and 20 rows per material. After 3 weeks, the seedlings were manually thinned to a plant spacing of about 6.5 cm.
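  • The planting layout above implies an approximate stand density. The following Python sketch converts the nominal row spacing (about 60 cm), plant spacing after thinning (about 6.5 cm), row length (about 5 m), and 20 rows per material into area per plant, plants per hectare, and plot area. It assumes idealized uniform spacing with no gaps and counts each row as one row-spacing wide; both are assumptions not stated in the source, and the function names are hypothetical.

```python
"""Approximate stand density for the field layout (idealized sketch)."""

M2_PER_HECTARE = 10_000.0


def area_per_plant_m2(row_spacing_cm: float, plant_spacing_cm: float) -> float:
    """Ground area allotted to one plant, assuming a uniform rectangular grid."""
    return (row_spacing_cm / 100.0) * (plant_spacing_cm / 100.0)


def plants_per_hectare(row_spacing_cm: float, plant_spacing_cm: float) -> float:
    """Nominal density; ignores gaps, border effects, and emergence losses."""
    return M2_PER_HECTARE / area_per_plant_m2(row_spacing_cm, plant_spacing_cm)


def plot_area_m2(rows: int, row_length_m: float, row_spacing_cm: float) -> float:
    """Plot area when each row is counted as one row-spacing wide."""
    return rows * row_length_m * (row_spacing_cm / 100.0)


if __name__ == "__main__":
    # Nominal values from the field description (all "about" in the source).
    area = area_per_plant_m2(60, 6.5)
    print(f"area per plant: {area:.3f} m^2")
    print(f"plants per hectare: {plants_per_hectare(60, 6.5):,.0f}")
    print(f"plot area per material: {plot_area_m2(20, 5, 60):.0f} m^2")
```

With these nominal values the sketch gives roughly 0.039 m² per plant, about 256,000 plants per hectare, and about 60 m² per material; actual stands will differ with emergence and thinning.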
  • The growth stages of soybean seeds (Glob, Hrt, Cot, EM1, EM2, MM, LM, and DS) are as described on the SoyBase website (soybase.org) and are shown in Table 7.
  • Field sampling was performed by selecting plants blooming at nodes 6-8. Leaf samples from nodes 6-8 were taken at each sampling, and approximately one full centrifuge tube was collected as one biological replicate. Three biological replicates were used for each material. Each biological replicate was immediately placed in an ice box and stored for determination of protein and fatty acid phenotypes.
  • The Kjeldahl method is commonly used for the quantitative determination of nitrogen contained in organic substances plus the nitrogen contained in the inorganic compounds ammonia and ammonium (NH3/NH4+). Without modification, other forms of inorganic nitrogen, for instance nitrate, are not included in this measurement.
  • The Kjeldahl reagents required for determining soybean grain protein content are shown in Table 11.
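  • A Kjeldahl result is usually converted to crude protein in two steps: nitrogen is calculated from the standard acid consumed by the distilled ammonia, then multiplied by a nitrogen-to-protein conversion factor. The Python sketch below assumes that the ammonia is trapped and titrated with a standard acid of known normality, with a reagent blank subtracted. It uses the general factor 6.25 as a default, although a soybean-specific factor (5.71) is also used in the literature. The titration volumes, normality, and factor are illustrative assumptions rather than values from this study (the actual reagents are in Table 11), and the function names are hypothetical.

```python
"""Kjeldahl nitrogen and crude protein calculation (illustrative sketch)."""

NITROGEN_G_PER_MOL = 14.007  # atomic mass of nitrogen


def kjeldahl_nitrogen_percent(sample_titrant_ml: float,
                              blank_titrant_ml: float,
                              acid_normality: float,
                              sample_mass_g: float) -> float:
    """Percent nitrogen (w/w) from the net volume of standard acid used.

    Assumes one equivalent of acid neutralizes one mole of trapped NH3.
    """
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    net_ml = sample_titrant_ml - blank_titrant_ml
    moles_n = (net_ml / 1000.0) * acid_normality
    return moles_n * NITROGEN_G_PER_MOL / sample_mass_g * 100.0


def crude_protein_percent(nitrogen_percent: float, factor: float = 6.25) -> float:
    """Crude protein = nitrogen content x conversion factor (assumed default 6.25)."""
    return nitrogen_percent * factor


if __name__ == "__main__":
    # Hypothetical example: 0.5 g seed powder, 0.1 N acid, 20.0 mL vs 0.2 mL blank.
    n_pct = kjeldahl_nitrogen_percent(20.0, 0.2, 0.1, 0.5)
    print(f"N = {n_pct:.2f}%, crude protein = {crude_protein_percent(n_pct):.2f}%")
```

The example inputs are hypothetical and are not measured data from this study.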
  • Gene expression was analyzed by real-time quantitative PCR (qRT-PCR). Reaction solutions for genomic DNA removal were prepared as shown in Table 13, primers for qRT-PCR amplification are shown in Table 14, and reaction solutions for qRT-PCR amplification were prepared as shown in Table 15.
  • Table 14. qRT-PCR amplification primers
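  • This passage does not state how relative expression was calculated from the qRT-PCR data. A common choice is the 2^-ΔΔCt (Livak) method, which assumes close to 100% amplification efficiency for both the target and the reference gene. The Python sketch below shows that calculation under that assumption. The Ct values are hypothetical, the function name is illustrative, and the reference gene (for example AtACTIN2 or At18S rRNA, as used in the Arabidopsis experiments described here) depends on the experiment.

```python
"""Relative expression by the 2^-ddCt method (sketch; assumes ~100% PCR efficiency)."""
from statistics import mean
from typing import Sequence


def relative_expression(target_ct_sample: Sequence[float],
                        ref_ct_sample: Sequence[float],
                        target_ct_control: Sequence[float],
                        ref_ct_control: Sequence[float]) -> float:
    """Fold change of the target gene in the sample relative to the control.

    Technical-replicate Ct values are averaged before computing dCt.
    """
    d_ct_sample = mean(target_ct_sample) - mean(ref_ct_sample)     # normalize to reference gene
    d_ct_control = mean(target_ct_control) - mean(ref_ct_control)
    dd_ct = d_ct_sample - d_ct_control                           # normalize to control (calibrator)
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses threshold 3 cycles earlier in the sample.
    fold = relative_expression([22.0, 22.1, 21.9], [18.0, 18.0, 18.0],
                               [25.0, 25.0, 25.0], [18.0, 18.0, 18.0])
    print(f"fold change: {fold:.2f}")
```

If primer efficiencies differ substantially from 100%, an efficiency-corrected model is generally preferred over this simple form.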
  • The planting soil consisted of flower nutrient soil and vermiculite mixed at a ratio of 3:1 (flower nutrient soil:vermiculite).
  • The soil was put into small flowerpots and slowly soaked with water.
  • Arabidopsis thaliana seeds were sown evenly in moist soil.
  • The opening of each pot was sealed with plastic wrap, and the pots were placed in a refrigerator at 4°C for vernalization for 48-72 h. After vernalization, the pots were placed in an incubator (22°C, 16 h/8 h light/dark, 70 µmol/m²/s) for 1 week until the Arabidopsis seedlings emerged. After 1 week of culture, the wrap was removed.
  • CTAB hexadecyltrimethylammonium bromide
  • The mixture in the EP tube was then placed in a 65°C water bath for 1 h and mixed by inversion every 10 minutes.
  • The EP tube was then removed from the water bath and allowed to cool, and 650 µL of chloroform was added. The tube was inverted 30 times to mix thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature.
  • 400-500 µL of the supernatant was transferred to a new EP tube, and 650 µL of chloroform was added.
  • The mixture was shaken to mix thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature.
  • 400-500 µL of the supernatant was transferred to a new EP tube containing 700 µL of pre-cooled isopropanol, and the tube was inverted 30 times to mix thoroughly.
  • The mixture was then centrifuged at 12000 rpm for 15 minutes at room temperature. The supernatant was discarded, and the pellet was washed once with 95% ethanol and then once with 75% ethanol, with centrifugation at 7500 rpm for 5 min at room temperature. The DNA pellet was dried and dissolved in 50 µL of sterilized water. DNA concentration (based on absorbance at 260 nm) was measured using a NanoDrop 2000C, and the DNA was stored at -20°C.
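  • For double-stranded DNA, spectrophotometers such as the NanoDrop derive concentration from absorbance at 260 nm using the conventional factor that an A260 of 1.0 corresponds to about 50 µg/mL (50 ng/µL) over a 1 cm path. Purity is often judged from the A260/A280 ratio, with values near 1.8 indicating relatively pure DNA. The Python sketch below reproduces this arithmetic and calculates the total yield in the 50 µL of water used above. The absorbance readings are hypothetical, and the function names are illustrative only.

```python
"""dsDNA concentration, yield, and purity from absorbance (illustrative sketch)."""

DSDNA_NG_PER_UL_PER_A260 = 50.0  # conventional factor for dsDNA, 1 cm path length


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration of dsDNA in ng/uL (numerically equal to ug/mL)."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def total_yield_ng(concentration_ng_per_ul: float, volume_ul: float) -> float:
    """Total DNA mass in the dissolved sample."""
    return concentration_ng_per_ul * volume_ul


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280; values near 1.8 suggest little protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


if __name__ == "__main__":
    conc = dsdna_ng_per_ul(0.5)            # hypothetical reading
    print(conc, total_yield_ng(conc, 50))  # DNA dissolved in 50 uL of water, as above
    print(round(purity_ratio(0.5, 0.27), 2))
```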
  • Arabidopsis homozygotes were screened using the detection primers (LP+RP, BP+RP) for the Arabidopsis mutant SALK_021984C (Table 17) provided by SIGnAL (signal.salk.edu/tdnaprimers.2.html).
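  • In the SIGnAL T-DNA primer design, LP and RP flank the predicted insertion site. As a result, the LP+RP pair amplifies only the allele without the insertion, while BP (a T-DNA border primer) paired with RP amplifies only the insertion allele. Homozygous mutants are therefore expected to produce only the BP+RP product. The Python sketch below encodes this standard interpretation of the two reactions. It is a simplification that treats a result with no bands as a "no call", and the function name is hypothetical.

```python
"""Genotype call for a T-DNA insertion line from two PCR reactions (sketch)."""


def call_tdna_genotype(lp_rp_band: bool, bp_rp_band: bool) -> str:
    """Interpret band presence in the LP+RP and BP+RP reactions.

    LP+RP amplifies only an allele without the insertion (the T-DNA is too
    large to span under standard conditions); BP+RP amplifies only the
    insertion allele.
    """
    if lp_rp_band and bp_rp_band:
        return "heterozygous"
    if bp_rp_band:
        return "homozygous mutant"
    if lp_rp_band:
        return "wild type"
    return "no call"  # neither band: repeat the PCR or check the DNA template


if __name__ == "__main__":
    for plant, bands in {"plant_1": (False, True), "plant_2": (True, True)}.items():
        print(plant, call_tdna_genotype(*bands))
```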
  • RT-PCR primers for detection of homozygous Arabidopsis thaliana mutants are provided in Table 16. Arabidopsis total RNA extraction and cDNA synthesis were performed as described in Example 7. At18S rRNA was used as the internal reference gene, and the RT-PCR detection primers for the SALK_021984C homozygous mutant are shown in Table 18.
  • Arabidopsis cultivation and plant transformation preparation: The wild-type Col-0 control group and the mutant materials were planted as described above. After the Arabidopsis plants bolted, the primary stalks were removed to increase the number of bolts. The plants were ready for transformation when the stalks had grown to the same height and only the uppermost flowers had not yet opened.
  • Agrobacterium preparation: Agrobacterium tumefaciens containing the expression vector, stored at -80°C, was inoculated into 10 mL of LB liquid medium containing spectinomycin and cultured overnight at 28°C with shaking at 160 rpm. 100 μL of this starter culture was then transferred to 100 mL of fresh YEP liquid medium containing spectinomycin and cultured further at 28°C with shaking at 200 rpm. When the culture reached an OD600 of 0.8, the cells were harvested by centrifugation. The bacterial pellet was resuspended in 100 mL of resuspension solution containing 5% sucrose and 0.01% Silwet L-77. The suspension was kept at room temperature for 1-3 h before use.
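The amounts of each component in the resuspension solution scale linearly with volume. The sketch below assumes that sucrose is given as % (w/v) and Silwet L-77 as % (v/v); the text does not state this explicitly. The function name is hypothetical.

```python
def resuspension_recipe(volume_ml: float,
                        sucrose_pct_wv: float = 5.0,
                        silwet_pct_vv: float = 0.01) -> dict:
    """Return grams of sucrose and microliters of Silwet L-77 for a given volume.

    Assumes % (w/v) for sucrose (g per 100 mL) and % (v/v) for Silwet L-77.
    """
    sucrose_g = volume_ml * sucrose_pct_wv / 100.0
    silwet_ul = volume_ml * silwet_pct_vv / 100.0 * 1000.0  # mL -> uL
    return {"sucrose_g": sucrose_g, "silwet_l77_ul": silwet_ul}


recipe = resuspension_recipe(100.0)  # the 100 mL resuspension volume
print(recipe)
```

For 100 mL this comes to 5 g of sucrose and 10 μL of Silwet L-77.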
  • Transgenic Arabidopsis thaliana T1, T2 and T3 generations
  • The transgenic plants were identified by PCR using Glyma.20G092400 gene primers and Bar primers (Table 19).
  • The PCR products were detected by 1.5% agarose gel electrophoresis.
  • RNA was extracted from Arabidopsis rosette leaves and reverse transcribed into cDNA.
  • The expression level of Glyma.20G092400 in transgenic Arabidopsis was determined using the primer sequences shown in Table 20. AtACTIN2 was used as the internal reference gene.
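The text does not say how relative expression was calculated. A common choice with a single internal reference gene such as AtACTIN2 is the 2^-ΔΔCt method, which assumes roughly 100% amplification efficiency for both genes. The sketch below shows that calculation on hypothetical Ct values. It is not necessarily the calculation used here.

```python
from statistics import mean


def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_calibrator: float, ct_ref_calibrator: float) -> float:
    """2^-ddCt fold change of a sample relative to a calibrator (e.g. wild type)."""
    delta_ct_sample = ct_target - ct_ref
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical technical-replicate Ct values (target gene, AtACTIN2 reference).
transgenic_target = mean([24.1, 23.9, 24.0])
transgenic_ref = mean([18.0, 18.1, 17.9])
wild_type_target = mean([27.0, 27.1, 26.9])
wild_type_ref = mean([18.0, 17.9, 18.1])

fold = relative_expression(transgenic_target, transgenic_ref,
                           wild_type_target, wild_type_ref)
print(f"fold change vs. wild type: {fold:.2f}")
```

Here the ΔCt values are 6 and 9, so the transgenic line shows about 8-fold higher expression.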
  • Table 20: RT-qPCR primers for detecting transgenic Arabidopsis
  • v. Determination of total nitrogen content in seeds of Arabidopsis mutants and transgenic plants
  • Nitrogen content of the seeds, which reflects their protein content, was determined using the Kjeldahl reagents described in Table 11. 0.1 mol/L HCl was prepared and standardized against 0.1 mol/L Na2CO3.
  • 1% H3BO3 was prepared, and its pH was adjusted to between pH 4 and pH 5.
  • 7 mL of 0.1% methyl red and 10 mL of 0.1% bromocresol green indicator were added per 1 L of H3BO3, and the solution appeared wine red.
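The titration result is converted first to percent nitrogen and then to crude protein. The text does not give the formula. The sketch below uses the standard Kjeldahl calculation and assumes two things: the ammonia is trapped in the boric acid and titrated directly with the standardized HCl, and the nitrogen-to-protein factor is 6.25. A different factor (e.g. 5.71, sometimes used for soybean) would scale the protein value. The titration volumes are hypothetical.

```python
N_MOLAR_MASS = 14.007  # g/mol


def kjeldahl_nitrogen_pct(titrant_ml: float, blank_ml: float,
                          hcl_mol_per_l: float, sample_g: float) -> float:
    """Percent nitrogen from a boric-acid-trap Kjeldahl titration with HCl."""
    mol_n = (titrant_ml - blank_ml) / 1000.0 * hcl_mol_per_l  # 1 mol HCl per mol NH3
    return mol_n * N_MOLAR_MASS / sample_g * 100.0


def crude_protein_pct(nitrogen_pct: float, factor: float = 6.25) -> float:
    """Crude protein = N% x conversion factor (6.25 assumed here)."""
    return nitrogen_pct * factor


# Hypothetical titration of a 0.1 g seed sample with 0.1 mol/L HCl.
n_pct = kjeldahl_nitrogen_pct(titrant_ml=5.0, blank_ml=0.1, hcl_mol_per_l=0.1, sample_g=0.1)
protein_pct = crude_protein_pct(n_pct)
print(f"N = {n_pct:.2f}%, crude protein = {protein_pct:.1f}%")
```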
  • The content of fatty acids in seeds was determined by gas chromatography as follows: The seeds were dried in an oven at 105°C for 20-30 minutes and then at 65°C for 12-14 hours. Five replicate tests were performed for each sample. In each test, about 5 mg of the seed sample was mixed with 1 mL of 2.5% sulfuric acid in methanol and 5 μL of 50 mg/mL BHT (2,6-di-tert-butyl-4-methylphenol). 50 μL of 10 mg/L heptadecanoic acid or acetic acid was used as the internal standard. The tubes containing the samples were immediately sealed and placed in a water bath at 85°C for 1.5 h.
  • Each tube was inverted every 10 minutes to mix the sample and reagents thoroughly and then allowed to cool to room temperature.
  • 160 μL of 9% NaCl solution and 700 μL of n-hexane were then added to each tube, and the mixture was vortexed for 3 minutes and centrifuged at 4,500 rpm for 10 minutes at room temperature.
  • 400 μL of the supernatant of each sample was transferred to a new centrifuge tube and dried overnight in a fume hood. 400 μL of ethyl acetate was then added to fully dissolve the dried residue before measurement.
  • Ax is the peak area of the ith fatty acid component;
  • As is the peak area of the internal standard;
  • ms is the mass of the internal standard;
  • m is the dry weight of the sample.
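The formula that uses these four quantities is not legible in the source. With an internal standard, each fatty acid is conventionally quantified as content = (Ax / As) × ms / m, assuming the detector responds equally to the fatty acid and the standard (response factor 1). The sketch below makes that assumption explicit. The peak areas and masses are hypothetical.

```python
def fatty_acid_content(a_x: float, a_s: float, m_s: float, m: float,
                       response_factor: float = 1.0) -> float:
    """Amount of fatty acid x per unit dry sample weight (units of m_s per unit of m).

    a_x: peak area of the fatty acid; a_s: peak area of the internal standard;
    m_s: mass of internal standard added; m: dry weight of the sample.
    response_factor: assumed 1.0 (equal detector response to the standard).
    """
    if a_s <= 0 or m <= 0:
        raise ValueError("internal-standard area and sample weight must be positive")
    return response_factor * (a_x / a_s) * m_s / m


# Hypothetical peak areas for one 5 mg seed sample; m_s in ug, m in mg -> ug/mg.
areas = {"C16:0": 1200.0, "C18:0": 450.0, "C18:1": 2400.0, "C18:2": 5100.0, "C18:3": 900.0}
internal_std_area, internal_std_ug, sample_mg = 400.0, 50.0, 5.0

content = {fa: fatty_acid_content(a, internal_std_area, internal_std_ug, sample_mg)
           for fa, a in areas.items()}
total_ug_per_mg = sum(content.values())
print(content, total_ug_per_mg)
```

Under these assumptions, total fatty acid content is simply the sum of the individual components.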
  • Soybean cotyledon nodes were transformed and cultivated using the following protocol:
  • Co-culture: Divide the seed into two halves along the hypocotyl with a razor blade, and lightly scratch 2-3 points at the cotyledon node with the blade to make wounds. Put the explants into the prepared Agrobacterium resuspension and incubate at 28°C and 160 rpm for 30 min to allow Agrobacterium infection, then remove the infected explants from the resuspension with tweezers. Place them on SCCM covered with filter paper and incubate for 3-5 days at 25°C in the dark.
  • Rooting of positive elongated buds: The positive buds were cut from the bud clusters, dipped in IBA hormone for 30 s, and inserted into rooting medium (RM) at 25°C under a 16 h/8 h light/dark cycle. The rooting culture was carried out under dark conditions in a sterile tissue culture room until roots formed.
  • Transplanting and cultivation of positive seedlings: The positive seedlings were removed from the culture medium, and their roots were rinsed with clean water to remove residual medium. The positive seedlings were then transplanted into soil and cultured in the plant greenhouse.
  • viii. PCR identification of T1 generation transgenic soybean
  • RNA was extracted and reverse transcribed into cDNA.
  • qRT-PCR was performed with the primers in Table 14 to identify T1 generation transgenic soybean by analyzing the expression of Glyma.20G092400; the internal reference gene was GmActin4 (GenBank No. AF049106).
  • x. Determination of protein, oil and fatty acid content in transgenic soybean seeds
  • An InfraTec™ 1241 Grain Analyzer (FOSS Analytics) was used to determine the protein and oil content of soybean seeds. Each sample was measured 3-5 times, and the average value was used for phenotypic data analysis.
  • Expression profiles of candidate genes, including Glyma.20G092000, Glyma.20G092100, Glyma.20G092400 and Glyma.20G094900, were analyzed by RT-qPCR (Table 1) in wild-type SN14. RNA was extracted from roots, stems, leaves, flowers, pods and seeds (also referred to herein as grains) of SN14. Expression of the candidate genes was analyzed at 8 developmental stages of the grain: Glob, Hrt, Cot, EM1, EM2, MM, LM and DS (Table 2). All candidate genes were expressed in the tested tissues, and each showed its highest expression level at a particular developmental stage of the grain.
  • Glyma.20G092000, Glyma.20G092400 and Glyma.20G094900 had the highest expression levels in the LM stage of the grain.
  • Glyma.20G092100 had the highest expression level in the DS stage of grains.
  • The expression level of Glyma.20G092000 at the grain Cot, LM and DS stages was higher than that in the non-grain tissues and organs.
  • The expression level of Glyma.20G092100 in seeds at the Cot, EM1, MM, LM and DS stages was higher than in the non-seed tissues and organs (root, stem, leaf, flower and pod).
  • The expression level of Glyma.20G092400 at six developmental stages of the grain (Cot, EM1, EM2, MM, LM and DS) was higher than that in the non-grain tissues and organs.
  • The expression level of Glyma.20G094900 at the LM and DS stages was higher than that in the non-grain tissues and organs. It is therefore speculated that Glyma.20G092000, Glyma.20G092100, Glyma.20G092400 and Glyma.20G094900 play important regulatory roles during grain development.
  • Glyma.20G092000 belongs to the retroviral protease superfamily, which includes the pepsin-like aspartic proteases of cells and retroviruses, and it also contains saposin-like (sphingolipid activator protein) type B regions 1 and 2.
  • Glyma.20G092100 belongs to the pentatricopeptide repeat (PPR) family. This repeat has no known function; it is about 35 amino acids long, and up to 18 copies are found in some proteins.
  • Glyma.20G092400 belongs to the aminotransferase class-V family; it contains an aminotransferase domain of the type found in aminotransferases and other enzymes, including cysteine desulfurases.
  • Glyma.20G094900 belongs to the DUF1336 superfamily and encodes a protein of unknown function. This family represents the C-terminus of many hypothetical proteins of unknown function.
  • RNA was extracted, cDNA was synthesized, and qRT-PCR was performed using the specific primers provided in Table 20, shown above in Example 2.
  • The reference gene was GmActin4 (GenBank No. AF049106).
  • Example 6 Subcellular localization of Glyma.20G092400
  • Tobacco planting soil was prepared by mixing flower nutrient soil with vermiculite at a ratio of 3:1. After germination, the seedlings were transferred to new small flowerpots, one plant per pot, placed in an incubator (22°C, 16 h/8 h light/dark, 70 μmol m⁻² s⁻¹) for cultivation, and watered once every 2 days to ensure adequate water.
  • Nicotiana benthamiana was inoculated with Agrobacterium tumefaciens and kept in an incubator (22°C, 16 h light/8 h dark, 70 μmol m⁻² s⁻¹) for 48 h, after which the subcellular localization of the target protein was observed by confocal microscopy.
  • The subcellular localization of the pSOY1-Glyma.20G092400-GFP fusion protein was observed under a confocal microscope.
  • The green fluorescence of the pSOY1-Glyma.20G092400-GFP fusion protein was observed in the nucleus, indicating that the protein encoded by Glyma.20G092400 is localized in the nucleus.
  • RNA extraction from soybean SN14 leaves: RNA was extracted from young, tender SN14 trifoliate leaves by the TRIzol method. Electrophoresis on a 2% agarose gel showed three bands (28S, 18S and 5S), indicating good RNA integrity.
  • The cDNA was obtained by reverse transcription and used for cloning of the Glyma.20G092400 gene.
  • Glyma.20G092400 cloning: The CDS of Glyma.20G092400 was obtained from the Phytozome database. The CDS is 1388 bp in length. Cloning primers were designed according to the CDS of Glyma.20G092400 (Table 16).
  • This sequence was used as a template to design primers at both ends of the gene's CDS (with the stop codon removed).
  • The primer pair was designed to add the restriction sites (SpeI and BamHI) that flank the ccdB gene in the entry vector.
  • SN14 leaf cDNA was used as a template to clone the CDS of the Glyma.20G092400 gene with the CDS primers, and this product was then used as a template for PCR with the restriction-site primers to obtain Glyma.20G092400 flanked by restriction sites at both ends.
  • The gene products carrying restriction sites were recovered with a gel recovery kit for subsequent experiments.
  • The full-length CDS of Glyma.20G092400 (with the TGA stop codon removed) was cloned using cDNA from leaves of soybean Suinong 14 (SN14) as a template.
  • The CDS was amplified using the following primers.
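The two-step amplification (CDS primers first, then primers carrying restriction sites) can be sketched as follows. The SpeI (ACTAGT) and BamHI (GGATCC) recognition sites are standard. Everything else is assumed for illustration, because the actual primers are not reproduced in this section: which site goes on which primer, the annealing length, the "CG" protective bases, the function names and the toy sequence.

```python
SPEI_SITE = "ACTAGT"
BAMHI_SITE = "GGATCC"
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def remove_stop_codon(cds: str) -> str:
    """Drop a terminal stop codon so a C-terminal GFP fusion stays in frame."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def restriction_tailed_primers(cds: str, anneal_len: int = 20, clamp: str = "CG") -> tuple:
    """Forward primer gets SpeI, reverse primer gets BamHI (assumed orientation)."""
    core = remove_stop_codon(cds)
    forward = clamp + SPEI_SITE + core[:anneal_len]
    reverse = clamp + BAMHI_SITE + reverse_complement(core[-anneal_len:])
    return forward, reverse


toy_cds = "ATGGCTGCTAAATGA"  # hypothetical 5-codon CDS ending in TGA
fwd, rev = restriction_tailed_primers(toy_cds, anneal_len=6)
print(fwd, rev)
```

A stop-less CDS should be a whole number of codons; a 1338 bp product, for example, corresponds to exactly 446 codons.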
  • EHA105 Agrobacterium competent cells were first transformed with pSOY1-Glyma.20G092400, the transformed cells were grown on YEP plates containing rifampicin and spectinomycin, and single colonies were selected. Transformation was confirmed by PCR amplification of a 1338 bp DNA fragment (Glyma.20G092400; data not shown), indicating that the expression vector pSOY1-Glyma.20G092400 had been introduced into Agrobacterium tumefaciens EHA105.
  • Glyma.20G092400: We obtained the amino acid sequence of Arabidopsis AT5G26600 by querying the Phytozome database (phytozome.jgi.doe.gov/pz/portal.html) and aligned it with the amino acid sequence of Glyma.20G092400. The amino acid sequence identity between Glyma.20G092400 and Arabidopsis AT5G26600 is about 75.8%, and the two proteins share the same conserved protein domains, both belonging to the aminotransferase class-V family (FIG. 4).
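Percent identity depends on the alignment and on the denominator used, so identity values for the same pair of proteins can differ between tools. The sketch below computes identity from a pairwise alignment that has already been produced, with gaps written as "-". The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gap_columns: bool = True) -> float:
    """Identical positions as a percentage of alignment columns.

    count_gap_columns=True divides by all columns (gaps count as mismatches);
    False divides only by columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        is_gap = a == "-" or b == "-"
        if is_gap and not count_gap_columns:
            continue
        columns += 1
        if not is_gap and a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / columns


# Hypothetical aligned fragments.
print(percent_identity("MKT-LV", "MRTALV"))                           # 4 of 6 columns
print(percent_identity("MKT-LV", "MRTALV", count_gap_columns=False))  # 4 of 5 columns
```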
  • The Arabidopsis AT5G26600 mutant SALK_021984C was purchased from ABRC (abrc.osu.edu) and used as the Arabidopsis counterpart of a soybean Glyma.20G092400 mutant in subsequent experiments.
  • ii. PCR identification of Arabidopsis mutants
  • Transgenic Arabidopsis T1, T2 and T3 generation plants were selected and planted.
  • Leaf extract was prepared as described above.
  • A Bar test strip (Linear Chemicals) was used.
  • Bar test strips showed two clear bands for leaves of transgenic plants (overexpression: pSOY1:Glyma.20G092400; complementation: pSOY1:
  • Glyma.20G092400 transcripts were assessed in Arabidopsis wild-type ecotype Col-0 plants (WT), mutant SALK_021984C plants, transgenic Arabidopsis complementation plants (pSOY1:Glyma.20G092400/SALK_021984C), and overexpression plants (pSOY1:Glyma.20G092400). See Table 22.
  • The expression level of Glyma.20G092400 in overexpression plants was higher than that in complementation plants.
  • These results indicate that the mutation in Arabidopsis AT5G26600 (a homolog of Glyma.20G092400) significantly reduces its expression, and that this may be rescued by reintroducing an exogenous copy of Glyma.20G092400 as shown herein.
  • The AT5G26600 and Glyma.20G092400 polypeptides share 61% amino acid sequence identity.
  • The Glyma.20G092400 complementation experiment was further carried out on the mutant plants.
  • The palmitic acid, linoleic acid, linolenic acid and eicosenoic acid contents of the complementation lines were significantly higher than those of the control plants, and the stearic acid and oleic acid contents were lower than those of the control plants (FIG. 3A).
  • The complementation lines had harder grains (data not shown).
  • The fatty acid and oleic acid contents were significantly higher in the control plants than in the mutant plants (FIG. 3A).
  • The total fatty acid content of the grains of the complementation lines was significantly higher than that of the control plants (FIG. 3B).
  • These results suggest that Glyma.20G092400 can promote the accumulation of fatty acids in grains.
  • Example 9 Expressing Glyma.20G092400 in soybean
  • Bar test strip detection and PCR identification of T1 transgenic soybean: The T1 transgenic soybean plants were planted, and their leaves were crushed and tested with Bar test strips as described above. Two horizontal lines appeared on the Bar test strip for the overexpressing plants (data not shown), indicating that these plants are transgenic soybean plants. The overexpressing plants were further verified by PCR using the full-length CDS primers of Glyma.20G092400 (1338 bp) and Bar primers (516 bp) (data not shown), confirming that they were transgenic soybean plants.
  • The transgenic soybean overexpression plant pSOY1:Glyma.20G092400 and the wild-type control plant Dongnong 50 (DN50) (WT) were planted under the same conditions.
  • Young leaves were collected for total RNA extraction and reverse transcription into cDNA.
  • The expression level of Glyma.20G092400 was tested by qRT-PCR using Glyma.20G092400-specific primers. The expression level of Glyma.20G092400 in the overexpression plants was higher than in the control plants, indicating that Glyma.20G092400 was successfully transformed into soybean plants (Table 23).
  • Table 23: Glyma.20G092400 transcripts in leaves of wild-type (WT) and
  • The numbers represent relative expression levels.
  • iii. Determination of protein and fatty acids in T1 transgenic soybean grains
  • The transgenic soybean (overexpression plant pSOY1:Glyma.20G092400) and the control plant DN50 were planted under the same conditions; their mature T1 seeds were harvested, and some of the seeds were dried for phenotyping.
  • Grain protein and oil content were determined by Kjeldahl nitrogen determination, and fatty acid content was determined by gas chromatography, e.g., as disclosed in Rapid Commun. Mass Spectrom. 2007;21(12):1937-43.
  • The protein, oil and fatty acid contents of the overexpression plants were significantly higher than those of the control plants, indicating that Glyma.20G092400 promotes quality traits (protein and oil content) (Table 24).
  • A FOSS grain analyzer (InfraTec 1241) was used to determine the seed protein and oil content of the CSSL population (2013-2015). Three biological replicates were measured for each test material, and the average value was used for phenotypic data analysis of protein and oil content. Protein content ranged from about 37.00% to 46.77%, and oil content ranged from about 18.02% to 23.19%. The data followed an approximately normal distribution and were therefore suitable for quantitative trait locus (QTL) mapping of protein and oil content. As used herein, QTL mapping refers to genome-wide inference of the relationship between the genotype at various genomic locations and the phenotype for a set of quantitative traits, in terms of the number, genomic positions, effects and interactions of QTL.
  • The logarithm of the odds (LOD) values ranged from 3.72 to 12.62.
  • The minimum LOD value, 3.72, was for Qpro_Gm20_2, and the maximum LOD value, 15.16, was for Qoil_Gm20_1.
  • The genetic contribution rate (R²) ranged from 2.27% to 22.86%.
  • The minimum genetic contribution rate, 2.27%, was for Qpro_Gm20_2, and the maximum, 22.86%, was for Qpro_Gm20_1.
  • The additive effects ranged from -0.52 to 1.27.
  • The minimum additive effect, -0.52, was for Qoil_Gm20_1, and the maximum additive effect, 1.27, was for Qpro_Gm20_1.
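LOD scores and R² values are linked. Take a single-QTL model with normally distributed residuals, as in regression-based interval mapping. Then LOD = (n/2)·log10(RSS0/RSS1) and R² = 1 − RSS1/RSS0, which gives LOD = −(n/2)·log10(1 − R²). The sketch below illustrates only that relationship. This section does not state the mapping software or model used for the CSSL population, and the population size in the example is hypothetical.

```python
import math


def lod_from_rss(rss_null: float, rss_qtl: float, n: int) -> float:
    """LOD of a QTL model vs. the no-QTL model, normal residuals assumed."""
    return (n / 2.0) * math.log10(rss_null / rss_qtl)


def lod_from_r2(r2: float, n: int) -> float:
    """Same quantity expressed through the proportion of variance explained."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R^2 must be in [0, 1)")
    return -(n / 2.0) * math.log10(1.0 - r2)


def r2_from_lod(lod: float, n: int) -> float:
    """Invert the relationship: proportion of variance explained for a given LOD."""
    return 1.0 - 10.0 ** (-2.0 * lod / n)


# Hypothetical: 200 lines, QTL explaining 20% of phenotypic variance.
lod = lod_from_r2(0.20, n=200)
back = r2_from_lod(lod, n=200)
print(f"LOD = {lod:.2f}, R^2 recovered = {back:.3f}")
```

For a fixed population size, a larger R² implies a larger LOD under this model. Between different QTL and years, however, the two need not rank identically if the mapping model differs.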
  • Because the confidence intervals of the four QTLs were close, they were integrated into a “hot spot” interval (33.54-34.70 Mb) for the study of protein and oil content-related QTLs.
  • The QTL results were used for the mining and functional analysis of subsequent candidate genes related to protein and oil content.
  • This “hot spot” interval is consistent with MQTLOil-62 (Gm20: 33.14-33.84 Mb) described in Qi et al., Plant Cell Environ. 41(9):2109-2127 (2018).
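The two Gm20 intervals partly coincide. A small sketch makes the shared segment explicit. It treats each interval as a (start, end) range in Mb on the same chromosome and uses only the coordinates given above. The function name is hypothetical.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (start_Mb, end_Mb) on the same chromosome


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Shared segment of two intervals, or None if they do not overlap."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


hot_spot = (33.54, 34.70)      # integrated protein/oil QTL "hot spot", Gm20
mqtl_oil_62 = (33.14, 33.84)   # MQTLOil-62, Gm20
shared = overlap(hot_spot, mqtl_oil_62)
shared_length_mb = shared[1] - shared[0] if shared else 0.0
print(shared, round(shared_length_mb, 2))
```

With these coordinates the shared segment is 33.54-33.84 Mb, about 0.30 Mb.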
  • We identified the “hot spot” interval through meta-analysis of 312 oil content QTLs (Table 26), further verifying the precision and accuracy of the fine mapping of the protein and oil content QTL (Qpro&oil_Gm20).
  • The “hot spot” interval (33.54-34.70 Mb) was obtained by integrating the confidence intervals of the CSSL population protein and oil content QTL (Qpro&oil_Gm20) finely mapped from 2013 to 2015.
  • Candidate gene mining and Web Gene Ontology (WEGO) analysis were performed on the “hot spot” interval.
  • There are 130 candidate genes in this “hot spot” interval.
  • Glyma.20G092100 is related to embryo development ending in seed dormancy (GO:0009793) and has protein amino acid binding and glycoprotein binding functions (GO:0005515); Glyma.20G092400 has catalytic activity (GO:0003824); Glyma.20G094900 is related to lipid binding (GO:0005543) (Table 27). Therefore, the above four genes are used as candidate genes for protein and oil content analysis. These results suggest that the four genes may be related to the metabolism and synthesis of protein and oil.
  • Glyma.20G092100: GO:0009793 embryo development ending in seed dormancy; GO:0005515 protein amino acid binding, glycoprotein binding; GO:0005739 chloroplast
  • GO:0010228 ovule development
  • GO:0016226 thylakoid membrane organization
  • GO:0048481 vegetative to reproductive phase transition of meristem
  • GO:0008152 indoleacetic
  • GO:0030170 acid biosynthetic process, pyridoxal phosphate binding
  • GO:0009610 metabolic process
  • GO:0009684 para-aminobenzoic acid metabolic process
  • GO:0019344 response to symbiotic fungus
  • GO:0046482 tryptophan catabolic process
  • Glyma.20G094900: GO:0008150 biological process; GO:0005543 lipid binding; GO:0008289 phospholipid binding; GO:0005739 mitochondrion; GO:0005886 plasma membrane
  • Soybean grain protein content is one of the important traits to measure soybean quality.
  • The Kjeldahl method was used to determine the grain protein content of the parent SN14 and the extreme materials (HPLO, LPHO, HPHO and LPLO) to analyze the protein accumulation characteristics of soybean grains at different developmental stages (Table 28).
  • Grains of all five materials had the highest total nitrogen/protein content at the EM1 stage, and the nitrogen/protein content decreased as grain development progressed.
  • The grain protein content of all five materials dropped sharply from the EM1 to the MM stage.
  • The grain protein content of HPHO and LPLO was lowest at the MM stage.
  • The grain protein content of SN14, HPHO and LPLO showed an upward trend, while that of HPLO and LPHO continued to decrease, with HPLO grain protein content reaching its lowest level at the LM stage.
  • The grain protein content of SN14, HPHO, LPHO and LPLO decreased.
  • The LPHO grain protein content declined throughout grain development and was lowest at the DS stage.
  • The two high-protein materials, HPLO and HPHO, had higher protein content than the parent SN14 at all stages of soybean kernel development.
  • The two low-protein materials, LPHO and LPLO, had lower protein content than the parent SN14 at all stages of soybean kernel development.
  • Example 14 Determination of fatty acid content of soybean kernels at different developmental stages
  • Types of fatty acids in soybean seed oil include palmitic acid (C16:0), stearic acid (C18:0), oleic acid (C18:1), linoleic acid (C18:2) and linolenic acid (C18:3).
  • The fatty acid content of the parent SN14 and the extreme materials HPLO, LPHO, HPHO and LPLO was measured to analyze the fatty acid accumulation characteristics of soybean grains at different developmental stages (Tables 29-34). Fatty acids were detected in the grains of all five materials at the EM1-EM2 developmental stages.
  • The palmitic acid level remained low but detectable at the EM1 and EM2 stages, increased sharply from EM2 to MM, peaked at LM, and then decreased from LM to DS.
  • The stearic acid level was high at the EM1 stage and decreased sharply from EM1 to EM2.
  • The stearic acid level then increased gradually from EM2 to LM and peaked at LM.
  • The oleic acid level (Table 31) and linoleic acid level (Table 32) followed the same trend as the palmitic acid level (Table 29) across all five developmental stages.
  • All materials except LPLO had high linolenic acid levels at the EM1 and EM2 stages, followed by a downward trend from EM2 to LM and an increase from LM to DS.
  • The LPLO linolenic acid level was irregular: it was high at EM1, decreased from EM1 to EM2, increased sharply from EM2 to MM, decreased sharply from MM to LM, and increased again from LM to DS. All five materials had similar trends of palmitic, stearic, oleic and linoleic acid across the five developmental stages.
  • The total fatty acid content followed the same trend as several individual fatty acids, e.g., palmitic acid (Table 29).
  • The fatty acid content of the two high-oil materials, LPHO and HPHO, was higher than that of the parent SN14, while that of the two low-oil materials, HPLO and LPLO, was lower than that of the parent SN14.
  • Stearic acid amounts in soybean seeds at different developmental stages in SN14, HPLO, LPHO, HPHO and LPLO
  • Table 31: Oleic acid amounts in soybean seeds at different developmental stages in SN14, HPLO, LPHO, HPHO and LPLO
  • Example 15 Expression analysis of candidate genes in soybean grains at different developmental stages
  • Glyma.20G094900 in SN14 and four extreme materials were analyzed by RT-qPCR.
  • RNA extraction, cDNA synthesis and RT-qPCR were performed according to the methods described herein. Expression was examined at eight developmental stages of the grain: Glob, Hrt, Cot, EM1, EM2, MM, LM and DS (Table 7).
  • In SN14, LPHO and LPLO, the expression level increased from Hrt to Cot, dropped from Cot to EM2, increased from EM2 to LM, and dropped from LM to DS.
  • The expression level of Glyma.20G092000 in HPHO and HPLO continued to increase from LM to DS and was highest at the DS stage.
  • The expression level of Glyma.20G092100 remained relatively steady from Glob to EM2 (except in LPLO), then increased from EM2 to DS and remained high at the DS stage.
  • The expression level of Glyma.20G092400 in LPLO was slightly lower than that of Glyma.20G092100 at the DS stage.
  • At each developmental stage in all five materials, the expression level of Glyma.20G092400 was higher than that of Glyma.20G092000, Glyma.20G092100 and Glyma.20G094900, and the expression level of Glyma.20G092400 in HPHO was highest at the LM stage. Therefore, Glyma.20G092400 was selected for further analysis of its role in regulating protein and oil accumulation during grain development.
  • GmDES1 (Glyma.20G092400): The phylogenetic tree of GmDES1 (Glyma.20G092400) was constructed with MEGA5 software using homologous sequences from soybean, Arabidopsis, rice and maize. See FIG. 4.
  • GmDES1 (Glyma.20G092400) shows identity with AT5G26600 (60.6%), AT3G62130 (55.5%), Zm00001d008187 (54.8%), Zm00001d040555 (57.2%), LOC_Os01g18640 (56.3%) and LOC_Os01g18660 (52.4%).
  • The roots, stems, leaves, flowers, pods and seeds of the parent SN14 were selected as template materials for tissue-specific expression analysis.
  • The materials were placed in RNase-free Eppendorf (EP) tubes, immediately frozen in liquid nitrogen, and stored at -80°C.
  • The soybean template material was SN14, and the soybean transformation material was DN50.
  • The Arabidopsis transformation material was Col-0, and the Arabidopsis mutant material was SALK_127828.47.00.x (ordered from the ABRC website).
  • The Escherichia coli strain used in this application was DH5α, and the Agrobacterium tumefaciens strain was EHA105.
  • The target gene fragment in the entry vector Fu28 was transferred into the plant expression vector Pr35S using the Gateway vector system.
  • The entry vector Fu28 and the expression vector Pr35S were provided by Professor Fu Yongfu of the Institute of Crop Science, Chinese Academy of Agricultural Sciences. Both vectors are described in Wang X, et al. (2013) BioVector, a flexible system for gene specific expression in plants. BMC Plant Biol 13:198, the entire content of which is herein incorporated by reference.
  • The target crop was Arabidopsis thaliana, and the gene sequence was the Glyma. START CDS sequence.
  • The homologous gene in Arabidopsis thaliana was obtained.
  • The conserved functional domain of the homologous Arabidopsis gene was predicted, and the domain was similar to that of the target gene Glyma. START.
  • The planting soil was a 3:1 mixture of flower nutrient soil and vermiculite.
  • The soil was placed in small flowerpots and slowly soaked with water.
  • Arabidopsis thaliana seeds were sown evenly in moist soil.
  • The opening of each pot was sealed with plastic wrap, and the pots were placed in a refrigerator at 4°C for stratification for 48-72 h.
  • The pots were then placed in an incubator (22°C, 16 h/8 h light/dark, 70 μmol m⁻² s⁻¹) for 1 week until the Arabidopsis seedlings emerged.
  • Arabidopsis plants with two large leaves and two small leaves were selected and transplanted into pots, 1-2 plants per pot. Water or flower fertilizer was added when the soil in the pots became dry.
  • CTAB: hexadecyltrimethylammonium bromide.
  • The prepared CTAB extraction buffer was stored at 4°C.
  • Rosette leaves of Arabidopsis thaliana were collected and placed in an EP tube containing small 2 mm steel balls, and the leaves were quick-frozen in liquid nitrogen.
  • The frozen leaves were ground thoroughly in a tissue grinder. 700 μL of CTAB extraction solution was added to the EP tube containing the sample and mixed thoroughly on a vortex mixer.
  • The mixture in the EP tube was then placed in a 65°C water bath for 1 h and inverted to mix every 10 minutes.
  • After cooling, the EP tube was removed from the water bath and 650 μL of chloroform was added. The tube was inverted 30 times to mix thoroughly and centrifuged at 12,000 rpm for 15 minutes at room temperature.
  • 400-500 μL of the supernatant was transferred to a new EP tube, and 650 μL of chloroform was added.
  • The mixture was shaken, mixed thoroughly and centrifuged at 12,000 rpm for 15 minutes at room temperature.
  • 400-500 μL of the supernatant was transferred to a new EP tube containing 700 μL of pre-cooled isopropanol and inverted 30 times to mix thoroughly.
  • The mixture was then centrifuged at 12,000 rpm for 15 minutes at room temperature. The supernatant was discarded, and the pellet was washed once with 95% ethanol and once with 75% ethanol, then centrifuged at 7,500 rpm for 5 min at room temperature. The DNA pellet was dried and dissolved in 50 μL of sterile water. DNA concentration was measured (absorbance at 260 nm), and the DNA was stored at -20°C.
  • DNA from Arabidopsis wild-type Col-0 and the mutants was extracted and used as the template for PCR amplification.
  • The amplified products were analyzed by 1.5% agarose gel electrophoresis to determine whether the mutants were homozygous.
  • The primers used are shown in Table 41.
  • Arabidopsis cultivation and plant transformation preparation: The Col-0 control group and the homozygous mutant materials were planted as described above. After the Arabidopsis plants bolted, the primary stalks were removed to increase the number of bolts. The plants were ready for transformation when the stalks had grown to the same height and only the uppermost flowers had not yet opened.
  • Agrobacterium preparation: Agrobacterium tumefaciens containing the expression vector, stored at -80°C, was inoculated into 10 mL of LB liquid medium containing spectinomycin and cultured overnight at 28°C with shaking at 160 rpm.
  • Rosette leaves of transgenic Arabidopsis thaliana (T1, T2 and T3) were placed in an EP tube and ground with a small pestle. A Bar test strip was inserted into the EP tube in the direction indicated on the strip, and the number of bands on the strip was observed: two bands indicate that the plant is transgenic, and one band indicates a non-transgenic plant.
  • iv. Identification of transgenic Arabidopsis thaliana
  • Transgenic Arabidopsis thaliana T1, T2 and T3 generations
  • The transgenic plants were identified by PCR using Glyma.06G303700 gene primers and Bar primers (Table 42).
  • The PCR products were detected by 1.5% agarose gel electrophoresis.
  • RNA was extracted from Arabidopsis rosette leaves and reverse transcribed into cDNA.
  • The expression level of Glyma.06G303700 in transgenic Arabidopsis was determined using the primer sequences shown in Table 43. AtACTIN2 was used as the internal reference gene.
  • Nitrogen content of the seeds was determined as follows. 0.1 mol/L HCl was prepared and standardized against 0.1 mol/L Na2CO3. 1% H3BO3 was prepared and adjusted to pH 4-5. Seven milliliters of 0.1% methyl red and 10 mL of 0.1% bromocresol green indicator were added per 1 L of H3BO3, and the solution appeared wine red. 40% NaOH was prepared for the determination. The seeds were dried in an oven at 60°C for 12-14 hours. A 0.1 g sample (weighed to 0.001 g) was poured into a 50 mL digestion tube through a paper trough. Each sample was tested 3 times.
  • The fatty acid content of seeds was determined by gas chromatography as follows. The seeds were dried in an oven at 105°C for 20-30 minutes and then at 65°C for 12-14 hours. Five replicate tests were performed for each sample. In each test, about 5 mg of the seed sample was mixed with 1 mL of 2.5% concentrated sulfuric acid in methanol and 5 μL of 50 mg/mL BHT (2,6-di-tert-butyl-4-methylphenol). 50 μL of 10 mg/L heptadecanoic acid or acetic acid was used as the internal standard. The storage tube was immediately sealed and placed in a water bath at 85°C for 1.5 h.
  • The tube was inverted every 10 minutes to mix the sample and reagents thoroughly and then allowed to cool to room temperature. 160 μL of 9% NaCl solution and 700 μL of n-hexane were then added to the storage tube, and the mixture was vortexed for 3 minutes and centrifuged at 4,500 rpm for 10 minutes at room temperature. 400 μL of the supernatant of each sample was transferred to a new centrifuge tube and dried overnight in a fume hood. 400 μL of ethyl acetate was then added to fully dissolve the dried residue before measurement.
  • The column used with the Agilent 6890 gas chromatograph was 30 m × 320 μm × 0.25 μm.
  • Carrier gas nitrogen 60 mL/min, hydrogen 60 mL/min, air 450 mL/min.
  • Injection volume 1 μL, split injection mode, split ratio 10:1, injection port temperature 170°C.
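  • Oven temperature program: hold at 180°C for 1 min, increase to 250°C at 25°C/min, and hold for 7 min. The content of each fatty acid component was calculated as $X_i = \frac{A_i \times m_s}{A_s \times m}$, where:
  • $A_i$ is the peak area of the i-th fatty acid component;
  • $A_s$ is the peak area of the internal standard;
  • $m_s$ is the mass of the internal standard;
  • $m$ is the dry weight of the sample.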
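With the quantities defined above (peak area of each component and of the internal standard, mass of internal standard, sample dry weight), internal-standard quantification gives X_i = (A_i × m_s)/(A_s × m). The sketch below applies this to every peak; the function names, the reporting unit (mg per g dry weight), and the example peak areas are hypothetical, and the relative composition (percent of total fatty acids) is an additional, commonly reported view of the same data.

```python
def fatty_acid_contents(peak_areas, is_area, is_mass_mg, sample_dry_mg):
    """Apply X_i = (A_i * m_s) / (A_s * m) to every fatty acid peak.

    Returns mg of each fatty acid per g of dry sample. Like the formula itself,
    this assumes every fatty acid has the same detector response as the internal standard.
    """
    return {
        name: area * is_mass_mg / (is_area * sample_dry_mg) * 1000.0
        for name, area in peak_areas.items()
    }


def relative_composition(contents):
    """Percent of total fatty acids for each component."""
    total = sum(contents.values())
    return {name: value / total * 100.0 for name, value in contents.items()}


# Hypothetical chromatogram of one ~5 mg seed sample
areas = {"palmitic (C16:0)": 200.0, "oleic (C18:1)": 400.0, "linoleic (C18:2)": 400.0}
contents = fatty_acid_contents(areas, is_area=100.0, is_mass_mg=0.01, sample_dry_mg=5.0)
print(relative_composition(contents))
```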
  • Soybean cotyledon nodes were transformed and cultivated using the following protocol:
  • Co-culture: Divide each seed into two halves along the hypocotyl with a razor blade, and lightly scratch 2-3 points at the cotyledonary node with the blade to make wounds. Place the explants in the prepared Agrobacterium resuspension, incubate at 160 rpm and 28°C for 30 min to facilitate Agrobacterium infection, and remove the infected explants from the resuspension with tweezers. Place the explants on SCCM covered with filter paper and incubate for 3-5 days at 25°C in the dark.
  • GmActin4-q-R: GTTTCAAGCTCTTGCTCGTAATCA (SEQ ID NO: 70). xii. Determination of protein, oil and fatty acid content in transgenic soybean seeds
  • The grain protein and oil contents of the plant population were determined, and the plants were ranked by protein content. Based on this ranking, 20 samples from the high-protein and low-protein ranges were selected, and high-quality DNA was extracted to prepare high- and low-phenotype pools for bulked segregant analysis (BSA) sequencing. Candidate regions were selected using the SNP-index association algorithm. With SN14 as the reference parent, 3 candidate segments were associated at the 95% confidence level, and genes causing stop loss or stop gain, or containing non-synonymous mutations or alternative splicing sites, were selected as candidate genes; a total of 5 genes were screened. The BSA mixed-pool sequencing results are from the master's thesis of Li Wei, Northeast Agricultural University (2016).
  • BSA: bulked segregant analysis.
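The SNP-index association step can be illustrated with the ΔSNP-index calculation used in QTL-seq-style bulked segregant analysis: for each SNP, the SNP-index of a pool is the fraction of reads carrying the allele that differs from the reference parent (here SN14), and the ΔSNP-index is the high-pool value minus the low-pool value. A minimal sketch follows; the read counts and the fixed cut-off are hypothetical, since in practice the 95% confidence threshold is usually derived by simulation for the given pool size and sequencing depth.

```python
from dataclasses import dataclass


@dataclass
class PooledSnp:
    position: int
    high_alt: int    # reads with the non-reference allele in the high-protein pool
    high_depth: int  # total reads at this SNP in the high-protein pool
    low_alt: int     # reads with the non-reference allele in the low-protein pool
    low_depth: int   # total reads at this SNP in the low-protein pool


def snp_index(alt_reads, depth):
    """Fraction of reads carrying the non-reference allele."""
    if depth == 0:
        raise ValueError("no read coverage at this SNP")
    return alt_reads / depth


def delta_snp_index(snp):
    return snp_index(snp.high_alt, snp.high_depth) - snp_index(snp.low_alt, snp.low_depth)


def candidate_positions(snps, threshold):
    """Positions whose |delta SNP-index| reaches the (assumed) confidence threshold."""
    return [s.position for s in snps if abs(delta_snp_index(s)) >= threshold]


snps = [
    PooledSnp(position=1_000, high_alt=18, high_depth=20, low_alt=2, low_depth=20),
    PooledSnp(position=2_000, high_alt=10, high_depth=20, low_alt=9, low_depth=20),
]
print(candidate_positions(snps, threshold=0.5))  # [1000]
```

Real analyses typically average the ΔSNP-index over sliding genomic windows before applying the threshold, which is how contiguous candidate segments rather than single SNPs are obtained.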
  • Glyma.03G040200 is reported to be expressed most highly in seeds, with slight expression in stems and no expression in other tissues, but the relative expression levels in seeds and stems do not exceed 1.
  • Glyma.03G036300 has no expression in any organ.
  • Glyma.06G297500 is expressed at extremely high levels in various tissues: expression is highest in root hairs and roots, followed by apical meristems, then decreases progressively in root nodules, stems, pods, leaves and flowers, and is lowest in seeds.
  • Glyma.07G192400 is expressed in all tissues. Expression is highest in the apical meristem, followed by pods and seeds, and lowest in roots.
  • Glyma.06G303700 is not expressed in root nodules; its expression is highest in apical meristems, followed by pods and seeds, and then decreases in the order of flowers, roots, stems, leaves and root hairs.
  • The promoter region of the gene Glyma.06G303700 includes at least the following elements: (i) a 60K protein binding site, (ii) a cis-acting element involved in defense and stress responses, (iii) a common cis-acting element in promoter and enhancer regions, (iv) a core promoter element, (v) an element for maximal elicitor-mediated activation, (vi) a conserved DNA module array (CMA3), and (vii) light-responsive elements.
  • The promoter sequence of Glyma.06G303700 contains a large number of TATA boxes (the core promoter element near the transcription start site), which play a role in regulating gene expression.
  • The light-responsive elements of the promoter include MYB binding sites involved in light responsiveness, as well as several conserved DNA modules involved in light responsiveness.
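The source does not name the tool used to annotate the promoter. As a rough illustration of how TATA boxes can be located, the sketch below scans both strands for the plant TATA-box consensus TATAWAW (W = A or T). The consensus choice, function names, and example sequence are assumptions; dedicated databases such as PlantCARE use a broader set of motif variants.

```python
import re

# Lookahead so that overlapping matches are all reported
TATA_CONSENSUS = re.compile(r"(?=(TATA[AT]A[AT]))")
MOTIF_LEN = 7


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_tata_boxes(promoter):
    """0-based plus-strand start positions of TATAWAW motifs found on each strand."""
    seq = promoter.upper()
    plus = [m.start() for m in TATA_CONSENSUS.finditer(seq)]
    rc = reverse_complement(seq)
    # Convert minus-strand hits back to plus-strand start coordinates
    minus = sorted(len(seq) - m.start() - MOTIF_LEN for m in TATA_CONSENSUS.finditer(rc))
    return {"+": plus, "-": minus}


# Hypothetical sequence with one motif on each strand
print(find_tata_boxes("GGTATAAATCCATTTATA"))
```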
  • The amino acid sequences encoded by the gene in parents SN14 and ZYD00006 were submitted to the SOPMA website for protein secondary structure prediction.
  • The predicted secondary structure of this protein in parent SN14 contains 36.63% α-helix, 14.13% extended strand, 5.21% β-turn and 44.03% random coil; in parent ZYD00006, it contains 36.35% α-helix, 14.13% extended strand, 4.12% β-turn and 45.40% random coil.
  • A single amino acid change lowers the proportions of α-helix and β-turn in ZYD00006 relative to SN14, resulting in a higher proportion of random coil.
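The SOPMA comparison can be checked with simple arithmetic on the reported percentages. The sketch below (dictionary keys are illustrative names) confirms that each prediction sums to 100% and that the α-helix and β-turn losses together balance the random-coil gain.

```python
SN14 = {"alpha_helix": 36.63, "extended_strand": 14.13, "beta_turn": 5.21, "random_coil": 44.03}
ZYD00006 = {"alpha_helix": 36.35, "extended_strand": 14.13, "beta_turn": 4.12, "random_coil": 45.40}


def composition_shift(reference, variant):
    """Percentage-point change of each secondary-structure class (variant minus reference)."""
    return {state: round(variant[state] - reference[state], 2) for state in reference}


for name, profile in (("SN14", SN14), ("ZYD00006", ZYD00006)):
    # Each SOPMA prediction should account for 100% of residues
    assert round(sum(profile.values()), 2) == 100.0, name

print(composition_shift(SN14, ZYD00006))
# {'alpha_helix': -0.28, 'extended_strand': 0.0, 'beta_turn': -1.09, 'random_coil': 1.37}
```

Most of the shift comes from β-turn (-1.09 points); the α-helix change is small (-0.28 points), and the extended-strand fraction is unchanged.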
  • RNA was extracted, cDNA was synthesized (Table 46), and qRT-PCR was performed using the following specific primers.
  • The reference gene was GmActin (GenBank No. AF049106; Table 44).
  • Tobacco planting soil was prepared by mixing flower nutrient soil with vermiculite at a ratio of 3:1. After germination, the seedlings were transferred to new small flowerpots, one plant per pot, and cultivated in an incubator (22°C, 16 h/8 h light/dark, 70 μmol·m⁻²·s⁻¹), with watering once every 2 days to ensure adequate water.
  • The Agrobacterium pellet was then resuspended in 1 mL of resuspension buffer, and 2 μL of acetosyringone (dissolved in DMSO) was added to a final concentration of 0.04 g/L.
  • The bacterial suspension was then transferred to a large EP tube, adjusted to an OD600 of about 0.2 with resuspension buffer, and left to stand at room temperature for 1-3 h. Healthy tobacco plants grown for 3 weeks were selected.
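Both concentration adjustments in this step follow C1V1 = C2V2. The sketch below computes how much resuspension buffer brings a suspension to OD600 ≈ 0.2, and the acetosyringone stock concentration implied by adding 2 μL to 1 mL for a final 0.04 g/L (about 20 g/L, i.e. 20 mg/mL). The measured OD600 value and the function names are hypothetical.

```python
def buffer_to_add_ml(measured_od, current_volume_ml, target_od=0.2):
    """Volume of buffer to add so that OD600 falls to target_od (C1*V1 = C2*V2)."""
    if measured_od < target_od:
        raise ValueError("suspension is below the target OD600; pellet and resuspend in less buffer")
    final_volume_ml = measured_od * current_volume_ml / target_od
    return final_volume_ml - current_volume_ml


def implied_stock_g_per_l(final_g_per_l, added_ul, buffer_ul):
    """Stock concentration needed so that added_ul in buffer_ul reaches final_g_per_l."""
    return final_g_per_l * (buffer_ul + added_ul) / added_ul


print(buffer_to_add_ml(measured_od=0.8, current_volume_ml=1.0))               # about 3.0 mL
print(implied_stock_g_per_l(final_g_per_l=0.04, added_ul=2, buffer_ul=1000))  # about 20 g/L
```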
  • Tobacco plants inoculated with Agrobacterium tumefaciens were then placed in an incubator (22°C, 16 h light/8 h dark, 70 μmol·m⁻²·s⁻¹) for 48 hours, after which the leaf epidermis was peeled off to observe the subcellular localization of the target protein under a confocal microscope.
  • The green fluorescence of pr35S-Glyma.06G303700-GFP appears in the cell membrane and the nucleus, indicating that the fusion protein localizes to both the cell membrane and the nucleus.
  • RNA was extracted from soybean SN14 leaves by the TRIzol method. Electrophoresis on a 2% agarose gel showed three bands, 28S, 18S and 5S, indicating that the RNA integrity was good.
  • The cDNA was obtained by reverse transcription and used for Glyma.06G303700 gene cloning.
  • Glyma.06G303700 cloning: The CDS sequence of Glyma.06G303700 was obtained from the Phytozome database. The total length of the sequence is 2190 bp. This sequence was used as the template to design primers at both ends of the gene's CDS (with the stop codon removed).
  • The primer pair was designed to carry the restriction sites SpeI and BamHI, matching the sites at both ends of the ccdB gene in the entry vector. First, SN14 leaf cDNA was used as a template to clone the CDS of the Glyma.06G303700 gene with CDS primers; this product was then used as the template for PCR with primers carrying the restriction sites, yielding Glyma.06G303700 flanked by restriction sites at both ends. The gene products with restriction sites were recovered with a gel recovery kit for subsequent experiments. The full-length CDS of Glyma.06G303700 (with the stop codon TGA removed) was cloned using cDNA from leaves of soybean Suinong 14 as a template. The CDS sequence was amplified using the following primers.
  • Glyma.06G303700-F ATAACTAGTATGTTCCAGCCGAACC (SEQ ID NO: 63)
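The forward primer can be checked for its cloning features with a few lines of code. The sketch below searches for the standard recognition sequences of SpeI (ACTAGT, which appears in this primer) and BamHI (GGATCC, named in the text) and reports the position of the ATG start codon. The enzyme table is an assumption for illustration, and the reverse primer is not included because its sequence is not given here.

```python
RECOGNITION_SITES = {"SpeI": "ACTAGT", "BamHI": "GGATCC"}


def find_sites(primer, sites=RECOGNITION_SITES):
    """Return 0-based positions of each recognition site found in the primer."""
    primer = primer.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = [i for i in range(len(primer) - len(motif) + 1) if primer.startswith(motif, i)]
        if positions:
            hits[enzyme] = positions
    return hits


forward = "ATAACTAGTATGTTCCAGCCGAACC"  # Glyma.06G303700-F, SEQ ID NO: 63
print(find_sites(forward))   # {'SpeI': [3]}
print(forward.find("ATG"))   # 9: start codon immediately after the SpeI site
```

The three 5' bases upstream of the site (ATA) are the kind of extra bases commonly added so that the enzyme can cut close to the end of a PCR product.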
  • The amplified Glyma.06G303700 fragments were gel-purified and cloned into the entry vector Fu28 by restriction digestion and ligation.
  • The Fu28 vector fragment with the ccdB gene cut out was about 3200 bp.
  • The Glyma.06G303700 gene fragment was about 2200 bp.
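With the approximate fragment sizes given above (vector about 3200 bp, insert about 2200 bp), the insert amount for a ligation can be estimated from a chosen insert:vector molar ratio. The 3:1 ratio and the 50 ng of vector below are common starting points assumed for illustration; the source does not state the amounts used.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Insert mass giving the requested insert:vector molar ratio.

    Equal masses of DNA contain moles in inverse proportion to length,
    so mass_insert = mass_vector * (insert_bp / vector_bp) * ratio.
    """
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


print(insert_mass_ng(vector_ng=50, vector_bp=3200, insert_bp=2200))  # 103.125
```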
  • the ligation products were transformed into Escherichia coli.
  • Bacterial clones comprising the cDNA sequence of Glyma.06G303700 were identified by PCR and verified by sequencing analysis using primers described below.
  • EHA105 Agrobacterium competent cells were first transformed with pr35S-Glyma.06G303700, the transformed cells were grown on a YEP plate containing both rifampicin and spectinomycin, and monoclonal colonies were selected. Transformation was confirmed by PCR through the presence of a 2190 bp DNA fragment, indicating that the expression vector (pr35S-Glyma.06G303700) had been transferred into Agrobacterium tumefaciens EHA105.
  • The Glyma.06G303700 gene fragment and related tags contained in the Fu28 entry vector were transferred to the expression vector pr35S (spectinomycin resistant) through the LR recombination reaction.
  • The reaction system and reaction conditions are shown in Table 48.
  • pr35S-Glyma.06G303700 was transformed into Escherichia coli.
  • The positive clones were identified by PCR and sequencing analysis.
  • The pr35S-Glyma.06G303700 plasmid was then extracted from a positive monoclonal bacterial culture and transformed into Agrobacterium tumefaciens EHA105. Positive clones were identified by PCR.
  • Example 26 Expressing Glyma.06G303700 in Arabidopsis i. Selection of Arabidopsis mutants
  • The Arabidopsis gene AT1G05230, homologous to Glyma.06G303700, was selected and its conserved domains were identified.
  • The Arabidopsis homologous gene AT1G05230 contains three conserved domains: START_ArGLABRA2_like, Homeobox, and MreC superfamily. Like Glyma.06G303700, the Arabidopsis homologous gene AT1G05230 has the START_ArGLABRA2_like domain, as shown in Table 49.
  • START_ArGLABRA2_like; PTHR24326: Family not named; PTHR24326:SF239: HOMEOBOX-LEUCINE ZIPPER
  • MreC superfamily; PF00046: Homeobox domain
  • KOG0483: Transcription factor HEX, contains HOX and HALZ domains
  • KOG0484: Transcription factor PHOX2/ARIX, contains HOX domain
  • GO:0008289: Interacting selectively and non-covalently with a lipid; GO:0010090; GO:0005634; GO:0010103; GO:0048497
  • SALK 127828.4700.x (SEQ ID NO: 60) was used as the Arabidopsis mutant for Glyma.06G303700.
  • SALK 127828.4700.x is an Arabidopsis mutant in the Col-0 background, with a 186 bp sequence inserted into the coding region by T-DNA insertion mutagenesis. ii. PCR identification of Arabidopsis mutants
  • T1, T2 and T3 generation transgenic Arabidopsis plants were planted.
  • Leaf extract was prepared as described above.
  • A Bar test strip was inserted into the extract in the direction specified in the manufacturer's instructions.
  • The results displayed on the Bar test strip indicated that the Arabidopsis plants from which the leaf extract was obtained were genetically modified.
  • Transgenic Arabidopsis plants were planted in the T1, T2 and T3 generations. DNA was extracted from rosette leaves of Arabidopsis thaliana, and PCR of the target gene Glyma.06G303700 and of the Bar gene was performed separately. After 1.5% agarose gel electrophoresis, bands were observed at 516 bp (Bar gene) and 2190 bp (Glyma.06G303700 gene), indicating that transgenic Arabidopsis plants were obtained. v. qRT-PCR identification of T3 transgenic Arabidopsis
  • Table 50 Relative expression levels of various transgenes in the transgenic Arabidopsis plants vi. Determination of fatty acids and total nitrogen in T3 transgenic Arabidopsis
  • Transgenic Arabidopsis thaliana (the overexpression plant pr35S:Glyma.06G303700 and the mutant complementation plant pr35S:Glyma.06G303700/SALK 127828.4700.x), Col-0, and the mutant plant SALK 127828.4700.x were planted under the same conditions. The mature pods of T3 generation transgenic Arabidopsis thaliana plants were collected, and the seeds were obtained and dried. The fatty acid composition of Arabidopsis thaliana seeds was determined by gas chromatography, and the total nitrogen content was determined by the Kjeldahl method.
  • When the phenotypes of the mutant and wild-type materials were determined, the fatty acid component contents of mutant plants were lower than those of wild-type materials, and the contents of oleic acid, linoleic acid and eicosenoic acid were significantly lower. The total nitrogen content of mutant plants was significantly lower than that of wild-type plants.
  • The fatty acid content in the seeds of the mutant complementation plants was significantly increased but still lower than that of the control plants, and the linoleic acid content was significantly increased; the component contents of the overexpression plants were higher than those of the wild type, with an extremely significant increase in palmitic acid and significant increases in oleic acid and eicosenoic acid (Table 51).
  • The total nitrogen content of the mutant complementation plants increased significantly and differed significantly from that of the control materials.
  • The total nitrogen content of the overexpression plants was significantly higher than that of wild-type plants.
  • These results showed that, in Arabidopsis seed protein and oil accumulation, Glyma.06G303700 could promote fatty acid content to a certain extent, and its effect on protein content was more pronounced.
  • Table 51 Analysis of seed fatty acid content/profile and protein content in the Arabidopsis mutant and in transgenic Arabidopsis expressing a Glyma.START gene (Glyma.06G303700).
  • Example 27 Expressing Glyma.06G303700 in soybean i. Bar test strip detection and PCR identification of T1 transgenic soybean
  • The transgenic soybean overexpression plant 35S:Glyma.06G303700 and the control plant DN50 were planted under the same conditions.
  • Young leaves were taken to extract total RNA, which was reverse transcribed into cDNA.
  • Glyma.06G303700 was tested by qRT-PCR reaction. The results showed that the expression level of Glyma.06G303700 in the overexpression plants was higher than the control plants, indicating that Glyma.06G303700 was successfully transformed into soybean plants. See
  • The transgenic soybean (overexpression plant 35S:Glyma.06G303700) and the control plant DN50 were planted under the same conditions. Their mature seeds were harvested, some of the seeds were dried for phenotyping, and the grain protein and oil contents were determined by gas chromatography analysis. The content of fatty acid components in Arabidopsis seeds was determined by gas chromatography. The protein, oil and fatty acid contents of the overexpression plants were significantly higher than those of the control plants, indicating that Glyma.06G303700 promoted quality traits (protein and oil content).
  • Haplotype analysis was performed on 680 accessions of the soybean resource population in Northeast China.
  • The protein and oil contents of this soybean resource population were analyzed phenotypically, and the analysis showed that protein and oil content varied across the population.
  • In 2019, the protein content of this Northeast resource population ranged from 37.09% to 52.94% (average 42.69%), and the oil content ranged from 14.45% to 23.04% (average 20.74%).
  • This continuous variation is typical of quantitative phenotypic traits, so the population can be used for haplotype analysis of candidate genes.
  • The research team had conducted whole-genome resequencing of this soybean resource population, and these data were used here for gene haplotype analysis.
  • Haplotypes carried by more than 5.0% of the population (more than 34 accessions of the Northeast resource population) are called excellent haplotypes. Block 1 contains 3 excellent haplotypes, Hap_1, Hap_2 and Hap_3, which account for 71.07%, 12.60% and 7.31% of all haplotypes, respectively. Using the multiple comparison function of SPSS software, the protein and oil phenotypes of the resource materials carrying each excellent haplotype were analyzed.
  • Hap_1 and Hap_2 showed an extremely significant difference in protein content (P < 0.01) and a significant difference in oil content (P < 0.05); Hap_1 and Hap_3 showed extremely significant differences in protein content (P < 0.01) and oil content (P < 0.01); Hap_2 and Hap_3 showed no significant difference in protein content but an extremely significant difference in oil content (P < 0.01).
  • Hap_1 showed a low-protein phenotype.
  • Hap_2 and Hap_3 showed a high-protein phenotype.
  • In terms of oil content, Hap_1 and Hap_2 showed a high-oil phenotype.
  • Hap_3 showed a low-oil phenotype.
  • The base variation (C/T) in the exon region occurs at position 1890 bp of the gene.
  • The SNP of Hap_2 differs from the reference genome; the protein content of Hap_2 differed extremely significantly from that of Hap_1 (P < 0.01), and its oil content differed significantly from Hap_1 (P < 0.05) and extremely significantly from Hap_3 (P < 0.01). See FIG. 6 and Table 55.
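The 5.0% cut-off for excellent haplotypes translates to more than 34 of the 680 accessions (680 × 0.05 = 34). A minimal sketch of this filter follows; the haplotype labels and counts are hypothetical, and integer arithmetic is used so that a count of exactly 34 is excluded without floating-point edge effects.

```python
def excellent_haplotypes(counts, population_size=680, min_percent=5):
    """Haplotypes carried by more than min_percent % of the population, with their frequency (%)."""
    return {
        hap: round(n * 100 / population_size, 2)
        for hap, n in counts.items()
        if n * 100 > population_size * min_percent  # strictly more than 34 of 680
    }


block_counts = {"Hap_A": 420, "Hap_B": 60, "Hap_C": 34, "Hap_D": 12}  # hypothetical
print(excellent_haplotypes(block_counts))
```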
  • Block 2 contains two excellent haplotypes, Hap_4 and Hap_5, which account for 58.01% and 20.68% of the haplotypes, respectively.
  • Analysis of the protein and oil phenotypes of the resource materials in each haplotype group with the multiple comparison function of SPSS software showed no significant difference in protein content between Hap_4 and Hap_5, but a significant difference in oil content (P < 0.05).
  • Hap_4 showed a low oil phenotype
  • Hap_5 showed a high oil phenotype.
  • Hap_5 carries SNP variations that differ from the reference genome, and the oil phenotype of Hap_5 differed significantly from that of Hap_4 (P < 0.05). See FIG. 7 and Table 56.
  • Table 56 (oil content): Hap_4 vs. Hap_5, P = 0.028, significant. Block 3 contains two excellent haplotypes, Hap_6 and Hap_7, which account for 64.23% and 35.93% of the haplotypes, respectively.
  • Hap_6 and Hap_7 showed a significant difference in protein content (P < 0.05) and an extremely significant difference in oil content (P < 0.01).
  • The protein content of Hap_6 was higher than that of Hap_7; Hap_6 had a high-protein phenotype, while Hap_7 had a low-protein phenotype.
  • The oil content of Hap_6 was lower than that of Hap_7; Hap_6 showed a low-oil phenotype, while Hap_7 showed a high-oil phenotype. See FIG. 8A and 8B and Table 57.
  • Table 58 Haplotypes associated with increased protein content and oil content
  • Glyma.03G040200 has an OPT domain, is expressed at low levels in seeds, and shows no amino acid sequence difference between the parents;
  • Glyma.03G036300 has a domain with a function related to DNA repair and is not expressed in any of the tissues examined;
  • Glyma.07G192400 has no recognizable domains and is highly expressed in seeds;
  • Glyma.06G303700 has structural domains with a function related to lipid transfer and is highly expressed in seeds;
  • Glyma.06G297500 has no recognizable domains and is expressed at low levels in seeds.
  • Glyma.06G303700 has the START_ArGLABRA2_like domain, which has a function related to lipid transfer. Tissue-specific expression results indicate that the gene is highly expressed in seeds, which may be related to soybean quality and to the regulation of the synthesis and metabolism of grain storage substances.
  • Glyma.06G303700 is expressed in all tissues and organs, with the highest expression level in seeds.
  • During soybean seed development, the expression pattern of Glyma.06G303700 is partly similar to those of published soybean seed protein- and oil-related genes (GmWRI1a, GmWRI1b, GmLEC1a, GmLEC1b, GmFUSa, GmABI3, GmABI5, GmDREBE), showing a low-high-low trend.
  • Arabidopsis mutants of the gene highly homologous to Glyma.06G303700 were purchased through the ABRC website (abrc.osu.edu), homozygous mutant seeds were screened, and the fatty acid content and total nitrogen content of the grains were determined.
  • the fatty acid content and total nitrogen content of Arabidopsis mutant seeds were significantly lower than control plants.
  • The fatty acid content and total nitrogen content of the mutant complementation plants increased significantly, and those of the overexpression plants also increased.
  • Glyma.06G303700 has important potential for improving soybean quality and has a regulatory effect in improving soybean grain protein and oil content.
  • Block 1 has three excellent haplotypes: Hap_1 is a low-protein, high-oil phenotype, Hap_2 is a high-protein, high-oil phenotype, and Hap_3 is a high-protein, low-oil phenotype.
  • Block 2 has two excellent haplotypes: Hap_4 is a low-oil phenotype and Hap_5 is a high-oil phenotype.
  • Block 3 has two excellent haplotypes: Hap_6 is a high-protein, low-oil phenotype and Hap_7 is a low-protein, high-oil phenotype.
  • Example 29 Stacking genes
  • Construct 26627 was built, in which the GmDES1 gene (SEQ ID NO: 74) is constitutively expressed from the Cauliflower mosaic virus (CaMV) 35S promoter (SEQ ID NO: 23) and the GmSTART gene (SEQ ID NO: 1) is constitutively expressed from the Medicago truncatula glyceraldehyde-3-phosphate dehydrogenase C subunit 1 promoter (SEQ ID NO: 143).
  • The 26627 construct also includes an acetolactate synthase gene from N. tabacum to provide sulfonylurea resistance for selection of positive transformants.
  • The 26627 binary vector was used to transform soybean plants, and the resulting plants were characterized. Positive transformants were identified and retained; null segregants were also retained to determine the effects of the 26627 construct on soybean composition.
  • A pairwise comparison trial was designed to identify differences between transgenic (GM) and null segregant plants in a greenhouse setting. Seed composition data were collected at the T2 seed stage and analyzed by paired t-test. Seed protein was measured with an elemental analyzer, seed oil was extracted with diethyl ether, and the lipid profile was determined by GC-FID.
  • T1 transgenic soybean seeds were germinated and sampled for Taqman analysis to identify GM homozygous and null plants from the same event.
  • Twenty seedlings of uniform growth (10 GM and 10 null) were selected and transplanted into soil, with 1 GM and 1 null plant placed side by side to make 10 pairs within a 1.2 × 0.7 m block.
  • 10 single copy events from the same construct were selected according to genotype and expression data.
  • Leaf and seed samples were taken at the R6 stage from 3 individual GM plants and 3 individual null plants per event for gene expression analysis. Single plants were harvested and threshed to collect seeds at the R8 stage, and the seeds were air dried to ~12% water content and delivered to the lab for protein and amino acid analysis.
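The paired comparison described above pairs each GM plant with its side-by-side null neighbor, so the test operates on within-pair differences. A minimal standard-library sketch follows; the trait values are hypothetical, and the two-sided 5% critical value for 9 degrees of freedom (about 2.262) is hard-coded for the 10-pair layout.

```python
import math
from statistics import mean, stdev

T_CRIT_DF9_TWO_SIDED_005 = 2.262  # Student's t, df = 9, alpha = 0.05 (two-sided)


def paired_t(gm_values, null_values):
    """t statistic for paired samples: mean(d) / (sd(d) / sqrt(n))."""
    if len(gm_values) != len(null_values) or len(gm_values) < 2:
        raise ValueError("need at least two complete GM/null pairs")
    diffs = [g - n for g, n in zip(gm_values, null_values)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical seed oil contents (%) for 10 GM/null pairs of one event
gm = [19.1, 18.7, 19.4, 18.9, 19.0, 18.5, 19.2, 18.8, 19.3, 18.6]
null = [20.3, 20.1, 20.6, 19.9, 20.4, 19.8, 20.5, 20.0, 20.2, 19.7]
t = paired_t(gm, null)
print(round(t, 2), abs(t) > T_CRIT_DF9_TWO_SIDED_005)
```

If SciPy is available, `scipy.stats.ttest_rel(gm, null)` returns the same statistic together with a p-value.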
  • Example 29.3 GmDES1-GmSTART co-expression reduces oil content in soy seed [0473] Among the 7 soybean events transformed with the 26627 vector that were tested, 3 events accumulated significantly less oil than null segregant plants, with changes ranging from -3.60% to -7.97% (Table 59). The protein content of GM plants was quite similar to that of the corresponding null plants. Table 59. T2 seed composition change in GmDES1-GmSTART co-expression transgenic and null
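The changes reported for the GM events are expressed against the null segregants. A minimal sketch of a relative percent change is shown below, with hypothetical oil values; it assumes the reported figures are relative changes, since the source does not say whether they are relative changes or percentage-point differences.

```python
def percent_change(gm_value, null_value):
    """Relative change of a GM trait value versus its null segregant, in %."""
    if null_value == 0:
        raise ValueError("null value must be non-zero")
    return (gm_value - null_value) / null_value * 100.0


# Hypothetical mean seed oil (%) of one event and its null segregants
print(round(percent_change(18.4, 20.0), 2))  # -8.0
```

For the same hypothetical values, the absolute difference would be -1.6 percentage points, which is why the basis of a reported percentage matters.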
  • Example 29.4 GmDES1-GmSTART co-expression modifies seed lipid profile
  • Soybean plants transformed with the 26627 vector accumulate more palmitoleic acid than null segregants, with an increase in the range from 12.53% to 18.78% (Table 60A and Table 60B).
  • The content of oleic acid in transgenic seeds also increased relative to the null segregants, while linoleic acid and linolenic acid were reduced significantly.
  • Myristic acid, stearic acid, palmitic acid, and eicosadienoic acid were all reduced in transgenic plants relative to the null segregants, which is consistent with the reduced total oil content in seeds of transgenic plants relative to the null segregants.
  • Table 60A Lipid profile change of T2 seed of GmDES1-GmSTART co-expression transgenic and null segregant plants
  • Table 60B Lipid profile change of T2 seed of GmDES1-GmSTART co-expression transgenic and null segregant plants

Abstract

Compositions and methods for increasing the protein content and/or oil content of soybean plants are provided. Compositions include isolated and recombinant polynucleotides encoding polypeptides, expression cassettes, host cells, plants, and plant parts stably incorporating these polynucleotides. Methods and kits are provided for producing these plants via transgenic means, breeding, or genome editing approaches, and for identifying plants having increased protein content, increased oil content, and/or modified oil profile.

Description

METHODS AND COMPOSITIONS FOR INCREASING PROTEIN AND/OR OIL
CONTENT AND MODIFYING OIL PROFILE IN A PLANT
RELATED APPLICATIONS
[0001] This application claims priority to International Application No. PCT/CN2022/075982, filed on February 11, 2022, and International Application No. PCT/CN2022/075977, filed on February 11, 2022. Both international applications are herein incorporated by reference in their entireties for all purposes.
SEQUENCE LISTING
[0002] The instant application contains a Sequence Listing which has been submitted electronically in XML format and is hereby incorporated by reference in its entirety. Said XML copy, created on January 6, 2023, is named 82423_ST26 and is 281 Kilobytes in size.
FIELD
[0003] This disclosure relates to the field of plant biotechnology. In particular, it relates to methods and compositions for increasing plant protein/oil content and modifying oil profile.
BACKGROUND
[0004] Soybean is a valuable field crop. Soybean oil extracted from the seed is employed in a number of retail products such as cooking oil, baked goods, margarines, and the like.
Soybean is also used as a grain and as a food source for both animals and humans. Soybean meal is a component of many foods and animal feed. Typically, during the processing of whole soybeans, the fibrous hull is removed and the oil is extracted, and the remaining soybean meal is a combination of approximately 50% carbohydrates and 50% protein. For human consumption, soybean meal is made into soybean flour that is processed to protein concentrates used for meat extenders or specialty pet foods. Production of edible protein ingredients from soybean offers a healthier and less expensive replacement for animal protein in meats as well as dairy-type products.
BRIEF SUMMARY
[0005] In one aspect, provided herein is an elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139, wherein said polypeptide confers increased protein, oil content, and/or modified oil profile on the elite Glycine max plant as compared to a control plant not comprising said nucleic acid sequence.
[0006] In another aspect, provided herein is a plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 and/or 139, wherein said polypeptide confers increased protein, increased oil content, and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
[0007] In another aspect, provided herein is a plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having (a) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to the amino acid sequence of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or, (b) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content and/or increased oil and/or modified oil profile as compared to a control plant.
[0008] In yet another aspect, provided herein is a method of producing a soybean plant having increased protein, increased oil content, and/or modified oil profile, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to any one of SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137, or a nucleic acid sequence encoding any one of SEQ ID NOs: 22, or 24-59, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137, wherein said nucleic acid sequence confers onto said donor soybean plant an increased protein, increased oil content, and/or modified oil profile; b) crossing the donor soybean plant of a) with the recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by isolating a nucleic acid from said progeny plant and detecting within said nucleic acid a molecular marker associated with said nucleic acid sequence thereby producing a soybean plant having increased protein content, increased oil content, and/or modified oil profile.
[0009] In yet another aspect, provided herein is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or (ii) an amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content, increases oil content, and/or modifies oil profile compared to a control plant not expressing said nucleic acid sequence.
[0010] In yet another aspect, provided herein is a polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in any one of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein expression of the polypeptide in a plant confers increased protein, oil content, and/or modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in the plant confers increased protein, increased oil content, and/or modified oil profile on said plant; (c) a polypeptide having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity with and having the same function as the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein the polypeptide when expressed in a plant confers increased protein content, increased oil content, and/or modified oil profile on the plant; or (d) a fusion protein comprising the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59 or the polypeptide as defined in any one of (a) to (c).
[0011] In yet another aspect, provided herein is a nucleic acid molecule comprising (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95%, or 100% sequence identity to any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein content, increases oil content, and/or modifies the oil profile in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or a sequence encoding SEQ ID NO: 22 or 24-59; or (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or a polynucleotide encoding a polypeptide having the amino acid sequence of any one of SEQ ID NOs: 22, 24-59, 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, and/or 139.
[0012] In yet another aspect, provided herein are primer pairs for amplifying the nucleic acid molecule as disclosed above.
[0013] In some embodiments, disclosed herein is a method of producing a Glycine max plant with increased protein content, increased oil content, and/or a modified oil profile, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with a nucleic acid sequence comprising any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, wherein said nucleic acid sequence confers to the Glycine max plant increased protein content, increased oil content, and/or a modified oil profile; c) selecting a Glycine max plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said molecular marker associated with increased protein content and/or increased oil content.
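Steps b) to d) reduce to filtering a genotyped population on a marker call. The following is a minimal Python sketch, assuming each plant has already been genotyped at a marker linked to one of the listed sequences and that the favorable allele is known; the marker ID, allele calls, plant IDs, and the `select_carriers` helper are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Genotype:
    """Marker calls for one plant (hypothetical data model)."""
    plant_id: str
    calls: dict = field(default_factory=dict)  # marker ID -> detected allele


def select_carriers(genotypes, marker_id, favorable_allele):
    """Step c): keep plants whose call at marker_id equals the favorable allele.

    Plants with no call at the marker (missing data) are not selected.
    """
    return [g.plant_id for g in genotypes if g.calls.get(marker_id) == favorable_allele]


# Step b) output for a small, hypothetical population.
population = [
    Genotype("plant_01", {"M1": "T"}),
    Genotype("plant_02", {"M1": "C"}),
    Genotype("plant_03"),  # marker not detected
]

# The selected plants would serve as parents for producing progeny in step d).
parents = select_carriers(population, "M1", "T")
print(parents)  # ['plant_01']
```

Heterozygous calls and the genotyping assay itself are outside what this sketch models.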
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] The present application includes the following figures. The figures are intended to illustrate certain embodiments and/or features of the compositions and methods, and to supplement any description(s) of the compositions and methods. The figures do not limit the scope of the compositions and methods, unless the written description expressly indicates that such is the case.
[0015] FIG. 1 shows a bolting assessment of T3-generation wild-type Col-0 plants (WT), mutant SALK_021984C plants, transgenic Arabidopsis complementation plants (pSOY1:Glyma.20G092400/SALK_021984C), and overexpression plants (pSOY1:Glyma.20G092400) according to certain aspects of this disclosure.
[0016] FIG. 2 shows the inflorescences of T3-generation wild-type Col-0 plants (WT), mutant SALK_021984C plants, transgenic Arabidopsis complementation plants (pSOY1:Glyma.20G092400/SALK_021984C), and overexpression plants (pSOY1:Glyma.20G092400) according to certain aspects of this disclosure.
[0017] FIGS. 3A-3B show fatty acid compositions in seeds from wild-type plants (WT), mutant SALK_021984C plants, transgenic Arabidopsis complementation plants (pSOY1:Glyma.20G092400/SALK_021984C), and overexpression plants (pSOY1:Glyma.20G092400) according to certain aspects of this disclosure. Asterisks indicate significant differences compared with WT (*, 0.01 < P < 0.05; **, P < 0.01).
FIG. 3A shows the content of various fatty acids. From left to right: WT (Col-0), SALK_021984C, pSOY1:Glyma.20G092400/SALK_021984C, and pSOY1:Glyma.20G092400. FIG. 3B shows total fatty acid content.
[0018] FIG. 4 shows a phylogenetic tree of Glyma.20G092400 according to certain aspects of this disclosure.
[0019] FIG. 5 shows a phylogenetic tree of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
[0020] FIGS. 6A-6B show the protein content distribution and oil content distribution, respectively, of the excellent haplotype in block 1 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
[0021] FIGS. 7A-7B show the protein content distribution and oil content distribution, respectively, of the excellent haplotype in block 2 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
[0022] FIGS. 8A-8B show the protein content distribution and oil content distribution, respectively, of the excellent haplotype in block 3 of Glyma.START (Glyma.06G303700) according to certain aspects of this disclosure.
DETAILED DESCRIPTION
[0023] All technical and scientific terms used herein, unless otherwise defined below, are intended to have the same meaning as commonly understood by one of ordinary skill in the art. References to techniques employed herein are intended to refer to the techniques as commonly understood in the art, including variations on those techniques and/or substitutions of equivalent techniques that would be apparent to one of skill in the art.
[0024] Provided herein are plants expressing polypeptides that increase protein content and/or increase oil content when expressed in a plant. In some instances, the polypeptides result in a modified oil profile when expressed in a plant or part thereof as compared to a control plant that does not express the polypeptides. The terms “oil content” and “fatty acid content” are used interchangeably herein. The terms “fatty acid profile” and “oil profile” are used interchangeably herein. The polypeptides include SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, and/or 139 and variants thereof. Various means of introducing nucleic acid sequences into the soybean plant are also disclosed, including transgenic means, gene editing, and breeding. Markers for identifying the presence of these nucleic acid sequences in the plant are also disclosed. As used herein, the terms “phenotype,” “phenotypic trait,” or “trait” refer to a distinguishable characteristic(s) of a genetically controlled trait.
[0025] In some embodiments, the plants provided herein are a non-naturally occurring variety of soybean having the desired trait. In specific embodiments, the non-naturally occurring variety of soybean is an elite soybean variety. A “non-naturally occurring variety of soybean” is any variety of soybean that does not exist in nature. A “non-naturally occurring variety of soybean” may be produced by any method known in the art, including, but not limited to, transforming a soybean plant or germplasm, transfecting a soybean plant or germplasm, and crossing a naturally occurring variety of soybean with a non-naturally occurring variety of soybean. In some embodiments, a “non-naturally occurring variety of soybean” may comprise one or more heterologous nucleotide sequences. In some embodiments, a “non-naturally occurring variety of soybean” may comprise one or more non-naturally occurring copies of a naturally occurring nucleotide sequence (i.e., extraneous copies of a gene that naturally occurs in soybean). In some embodiments, a “non-naturally occurring variety of soybean” may comprise a non-natural combination of two or more naturally occurring nucleotide sequences (i.e., two or more naturally occurring genes that do not naturally occur in the same soybean, for instance genes not found in Glycine max lines).
[0026] Methods and compositions are provided that modulate the level of oil, protein, and/or fatty acids in a plant, a plant part, or a seed. In specific embodiments, various methods and compositions are provided that produce an increase in protein content in the plant, plant part, or seed.
An increase in protein content includes any statistically significant increase in the protein content in the plant, plant part, or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in protein content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying for protein content levels are known. For example, mature seeds can be harvested, and grain protein content can be determined by FOSS NIR analysis (see Examples) or by assaying for nitrogen content with an automatic Kjeldahl apparatus.
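Whether a figure such as “an increase of at least 5%” is relative to the control mean or an absolute difference in percentage points is not spelled out here, and no particular statistical test is named. The Python sketch below computes both readings and a Welch's t statistic as one possible significance check; the replicate values are hypothetical, and a p-value would additionally require the t distribution (for example via SciPy's `scipy.stats.ttest_ind(..., equal_var=False)`).

```python
from math import sqrt
from statistics import mean, variance


def percent_increase(subject, control):
    """Relative increase of the subject mean over the control mean, in percent."""
    return 100.0 * (mean(subject) - mean(control)) / mean(control)


def point_difference(subject, control):
    """Absolute difference in percentage points (for values already expressed in %)."""
    return mean(subject) - mean(control)


def welch_t(subject, control):
    """Welch's t statistic (unequal variances); needs at least 2 values per group."""
    se2 = variance(subject) / len(subject) + variance(control) / len(control)
    return (mean(subject) - mean(control)) / sqrt(se2)


# Hypothetical seed protein contents (% of dry weight), three replicates each.
subject = [41.0, 42.0, 43.0]
control = [39.0, 40.0, 41.0]
print(percent_increase(subject, control))   # 5.0 (relative reading)
print(point_difference(subject, control))   # 2.0 (percentage-point reading)
print(round(welch_t(subject, control), 3))  # 2.449
```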
[0027] In other embodiments, various methods and compositions are provided that produce an increase in oil content (e.g., an increase in fatty acid content) in the plant, plant part, or seed. An increase in oil content includes any statistically significant increase in the oil content in the plant, plant part, or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in oil content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying for oil content levels are known. For example, mature seeds can be harvested, and grain oil content can be determined by FOSS analysis (see Examples).
[0028] In other embodiments, various methods and compositions are provided that produce an increase in fatty acid content in the plant, plant part, or seed. An increase in fatty acid content includes any statistically significant increase in the fatty acid content in the plant, plant part, or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher.
In other embodiments, an increase in fatty acid content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying for fatty acid content levels are known. For example, mature seeds can be harvested, and fatty acid content can be determined by gas chromatography (see Examples). In specific embodiments, the methods and compositions provide for an increase in linoleic acid, palmitic acid, oleic acid, and/or eicosenoic acid (or any combination thereof) when compared to an appropriate control plant. Such increases include, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in linoleic acid, palmitic acid, oleic acid, and/or eicosenoic acid (or any combination thereof) includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30% or higher of linoleic acid, palmitic acid, oleic acid, and/or eicosenoic acid.
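Gas chromatography results for a fatty acid profile are often reported by area normalization, in which each fatty acid methyl ester peak is expressed as a percentage of the total integrated peak area. The Python sketch below assumes that convention and uses hypothetical peak areas; absolute fatty acid content would instead require an internal standard and response factors, which are not modeled here.

```python
def fatty_acid_profile(peak_areas):
    """Express each fatty acid peak as a percentage of the total integrated area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def profile_shift(subject_profile, control_profile):
    """Per-fatty-acid difference in percentage points (subject minus control)."""
    return {fa: subject_profile[fa] - control_profile.get(fa, 0.0) for fa in subject_profile}


# Hypothetical integrated peak areas from one GC run.
areas = {"palmitic": 10.0, "oleic": 20.0, "linoleic": 70.0}
profile = fatty_acid_profile(areas)
print(profile)  # {'palmitic': 10.0, 'oleic': 20.0, 'linoleic': 70.0}
```

Comparing a subject profile with a control profile through `profile_shift` gives one simple way to express a modified oil profile as per-fatty-acid differences.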
[0029] A "subject plant or plant cell" is one in which a genetic alteration, such as transformation, has been effected as to a polynucleotide of interest, or is a plant or plant cell which is descended from a plant or cell so altered and which comprises the alteration. A "control" or "control plant" or "control plant cell" provides a reference point for measuring changes in phenotype of the subject plant or plant cell. A control plant or plant cell may comprise, for example: (a) a wild-type plant or cell, i.e., of the same genotype as the starting material for the genetic alteration which resulted in the subject plant or cell; (b) a plant or plant cell of the same genotype as the starting material but which has been transformed with a null construct (i.e., with a construct which has no known effect on the trait of interest, such as a construct comprising a marker gene); (c) a plant or plant cell which is a non-transformed segregant among progeny of a subject plant or plant cell; (d) a plant or plant cell genetically identical to the subject plant or plant cell but which is not exposed to conditions or stimuli that would induce expression of the gene of interest; or (e) the subject plant or plant cell itself, under conditions in which the gene of interest is not expressed.
I. Polynucleotides and polypeptides that confer increased protein content, increased oil content, and/or modified oil profile
[0030] Compositions and methods for conferring increased protein content, increased oil content, and/or a modified oil profile are provided. Polypeptides, polynucleotides, and fragments and variants thereof that confer increased protein content, increased oil content, and/or a modified oil profile are provided. In some embodiments, the polypeptide is SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or a fragment or variant of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139. In some embodiments, the polynucleotide is any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137. In some embodiments, the polynucleotide encodes a polypeptide having the sequence of any one of SEQ ID NOs: 75, 78, 81, 84, 87, 111, 114, 117, 120, 123, 126, 129, 132, 135, or 138, or a fragment or variant of any one thereof. In some embodiments, the polypeptide is SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. In some embodiments, the polynucleotide is any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 22 or 24-59, or a fragment or variant of any one thereof.
As used herein, the term “gene” refers to a hereditary unit including a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a particular characteristic or trait in an organism. In various embodiments, the genome of the soybean cultivar Williams 82 is used as the reference soybean genome. Williams 82 (ncbi(.)nlm(.)nih(.)gov/assembly/GCF_000004515.6/?&utm_source=gene) was derived from backcrossing a Phytophthora root rot resistance locus from the donor parent Kingwa into the recurrent parent Williams. See Schmutz et al., Nature, 2010 Jan 14;463(7278):178-83. doi: 10.1038/nature08670.
Glyma.20G092400
[0031] Glyma.20G092400 (SEQ ID NO: 76) is detected in all tissues and organs, with the highest expression level in seeds (also referred to herein as grains) (Table 1). Expression is highest at the late milk (LM) stage of the grain. Glyma.20G092400 includes several conserved domains of the aminotransferase class-V family. This domain is found in aminotransferases and other enzymes, including cysteine desulfurase. Glyma.20G092400 comprises a selenocysteine lyase/cysteine desulfurase domain (aa 50-437 of SEQ ID NO: 76); a cysteine desulfurase (SufS)-like domain (aa 91-274 of SEQ ID NO: 76); an aminotransferase class-V domain (aa 93-274 of SEQ ID NO: 76); and a bifunctional selenocysteine lyase/cysteine desulfurase domain (aa 92-275 of SEQ ID NO: 76).
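The domain boundaries above are residue ranges on SEQ ID NO: 76. The Python sketch below assumes they are 1-based and inclusive, the usual convention for protein annotations, and shows how such ranges translate to lengths and to string slices; the protein string in the last line is a hypothetical placeholder, since SEQ ID NO: 76 is not reproduced here.

```python
# Domain ranges from [0031], residues of SEQ ID NO: 76 (assumed 1-based, inclusive).
DOMAINS_20G092400 = {
    "selenocysteine lyase/cysteine desulfurase": (50, 437),
    "cysteine desulfurase (SufS)-like": (91, 274),
    "aminotransferase class-V": (93, 274),
    "bifunctional selenocysteine lyase/cysteine desulfurase": (92, 275),
}


def domain_length(start, end):
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def extract_domain(protein, start, end):
    """Return residues start..end (1-based, inclusive) of a protein string."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError("range falls outside the sequence")
    return protein[start - 1:end]  # Python slices are 0-based and end-exclusive


for name, (s, e) in DOMAINS_20G092400.items():
    print(f"{name}: aa {s}-{e}, {domain_length(s, e)} residues")

print(extract_domain("MKTAYIAK", 2, 4))  # KTA (hypothetical placeholder sequence)
```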
Glyma.20G092000
Glyma.20G092000 (SEQ ID NO: 6) is detected in all tissues and organs, with the highest expression level in seeds (grains) (Table 1). Expression is highest at the LM stage of the grain. Glyma.20G092000 comprises several conserved domains of the retroviral protease superfamily, which includes the pepsin-like aspartic proteases of cells and retroviruses, and also contains sphingolipid activator protein type B (saposin B) regions 1 and 2. Glyma.20G092000 comprises: a phytepsin domain (aa 76-505 of SEQ ID NO: 79); a eukaryotic aspartyl protease (ASP) domain (aa 84-506 of SEQ ID NO: 79); an aspartyl protease domain (aa 77-507 of SEQ ID NO: 79); and two saposin B domains (aa 316-351 and aa 380-418 of SEQ ID NO: 79).
Glyma.20G094900
[0032] The Glyma.20G094900 (SEQ ID NO: 9) sequence is detected in all tissues and organs, with the highest expression level in seeds (grains) (Table 1). Expression is highest at the LM stage of the grain. Glyma.20G094900 is a protein of unknown function (DUF1336) and appears to belong to the DUF1336 superfamily. This family represents the C-terminus of many hypothetical proteins of unknown function.
Glyma.20G094900 comprises an enhanced disease resistance 2 (EDR2) C-terminal domain (aa 2-68 of SEQ ID NO: 82).
Glyma.20G092100
[0033] The Glyma.20G092100 (SEQ ID NO: 85) sequence is detected in all tissues and organs, with the highest expression level in seeds (grains) (Table 1). Expression is highest at the DS stage of the grain. Glyma.20G092100 comprises several conserved domains matching the pentatricopeptide repeat (PPR) family. Glyma.20G092100 comprises several tetratricopeptide-like (TPR) helical domains (aa 57-253, aa 229-365, and aa 404-461 of SEQ ID NO: 85) and pentatricopeptide repeats (aa 88-122, aa 123-157, aa 158-194, aa 194-241, aa 195-229, aa 230-264, aa 233-265, aa 264-313, aa 265-299, aa 269-301, aa 300-334, aa 335-365, aa 366-400, aa 370-398, aa 403-429, aa 403-434, aa 435-469, aa 438-461, aa 438-470, aa 470-504, aa 505-539, aa 540-574, aa 575-609, aa 578-607, aa 578-610, aa 644-678, aa 647-675, and aa 648-680 of SEQ ID NO: 85).
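Many of the pentatricopeptide repeat ranges in [0033] overlap. The Python sketch below, assuming 1-based inclusive coordinates, collapses overlapping ranges into contiguous blocks to show how much of SEQ ID NO: 85 the annotated repeats cover; touching but non-overlapping ranges (such as aa 88-122 and aa 123-157) are deliberately kept separate, which is a choice of this sketch rather than of the annotation.

```python
# Pentatricopeptide repeat ranges from [0033] on SEQ ID NO: 85 (1-based, inclusive).
PPR_REPEATS = [
    (88, 122), (123, 157), (158, 194), (194, 241), (195, 229), (230, 264),
    (233, 265), (264, 313), (265, 299), (269, 301), (300, 334), (335, 365),
    (366, 400), (370, 398), (403, 429), (403, 434), (435, 469), (438, 461),
    (438, 470), (470, 504), (505, 539), (540, 574), (575, 609), (578, 607),
    (578, 610), (644, 678), (647, 675), (648, 680),
]


def merge_overlapping(intervals):
    """Merge ranges that share at least one residue; duplicates are dropped first."""
    merged = []
    for start, end in sorted(set(intervals)):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


blocks = merge_overlapping(PPR_REPEATS)
covered = sum(end - start + 1 for start, end in blocks)
print(blocks)
print(f"{len(blocks)} blocks covering {covered} residues")
```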
Table 1. Relative polypeptide expression levels in various plant tissues
Glyma.06G303700
[0034] The Glyma.06G303700 (SEQ ID NOs: 1-5) sequence is expressed in all tissues and organs, with the highest expression level in seeds. Glyma.START (Glyma.06G303700) comprises several conserved domains: a START_ArGLABRA2_like domain (aa 241-465 of SEQ ID NOs: 3 and 5); a START domain (aa 246-466 of SEQ ID NOs: 3 and 5); a homeobox domain (aa 57-110 of SEQ ID NOs: 3 and 5); a homeodomain (aa 55-113 of SEQ ID NOs: 3 and 5); a COG5576 superfamily domain (aa 13-129 of SEQ ID NOs: 3 and 5); and an MreC superfamily domain (aa 120-193 of SEQ ID NOs: 3 and 5). The START_ArGLABRA2_like domain is the C-terminal lipid-binding START domain of the Arabidopsis homeobox protein GLABRA2. The START_ArGLABRA2_like subfamily includes the Arabidopsis homeobox protein GLABRA2 and other proteins related to steroid production. The homeobox encodes a 61-amino acid sequence that can bind specific DNA sequences and control gene expression at the transcriptional level. The COG5576 superfamily domain is a homeodomain-containing transcriptional regulation domain. MreC superfamily domains are usually involved in the formation and maintenance of cell shape and can position cell wall synthetic complexes.
[0035] The genomic sequence of Glyma.06G303700 is 8466 bp in length, and the CDS sequence is 2190 bp in length. The exon region of Glyma.06G303700 (SEQ ID NO: 3) in soy variety SN14 is identical to the corresponding gene in soy variety Williams82 (W82). Wild soybean (G. soja) variety ZYD00006 (ZYD) comprises four mutations in Glyma.06G303700 relative to Williams82: C1162T (i.e., a change from C to T at the 1162 bp position), A1370G (i.e., a change from A to G at the 1370 bp position), C2063G (i.e., a change from C to G at the 2063 bp position), and C2098G (i.e., a change from G to A at the 2098 bp position). The last three base mutations do not change the encoded amino acids, but the first, C1162T, results in an alanine-to-valine substitution at position 388, i.e., A388V.
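The link between the CDS change C1162T and the protein change at position 388 follows from codon arithmetic. The Python sketch below assumes 1-based CDS numbering that starts at the first base of the start codon; deciding whether a change is synonymous would also require the actual codons, which are not reproduced here.

```python
def codon_of(cds_pos):
    """Map a 1-based CDS position to (codon number, base 1-3 within that codon)."""
    if cds_pos < 1:
        raise ValueError("CDS positions are 1-based")
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


def parse_snp(label):
    """Split a label such as 'C1162T' into (reference base, position, alternate base)."""
    return label[0], int(label[1:-1]), label[-1]


# Changes reported for ZYD00006 relative to Williams82.
for label in ["C1162T", "A1370G", "C2063G", "C2098G"]:
    ref, pos, alt = parse_snp(label)
    codon, _ = codon_of(pos)
    print(f"{label}: falls in codon {codon}")

# A 2190 bp CDS holds 730 codons, the last of which is the stop codon.
print(2190 // 3)  # 730
```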
[0036] The phylogenetic tree of Glyma.06G303700 was constructed with MEGA5 software using homologous sequences from soybean, Arabidopsis, rice, corn, and other plants. See FIG. 5 and Table 62. Glyma.06G303700 shows high homology with Glyma.15G220200, Glyma.12G100100, and AT1G05230. Glyma.12G100100 contains the same conserved domains as Glyma.06G303700. AT1G05230 contains the START_ArGLABRA2_like and homeobox domains, which are also present in Glyma.06G303700. AT1G05230 and Glyma.START (Glyma.06G303700) share 78.9% amino acid sequence identity.
Glyma.03G040200
[0037] Glyma.03G040200 (SEQ ID NOs: 10-12) has an OPT domain (aa 4-73 of SEQ ID NO: 12), which is related to transmembrane transport. Glyma.03G040200 is expressed at low levels in seeds. The genomic sequence of Glyma.03G040200 (SEQ ID NO: 10) is 463 bp in length, and the CDS sequence (SEQ ID NO: 11) is 237 bp in length. Compared with soy variety Williams82 (SEQ ID NO: 12), soy varieties SN14 and ZYD00006 both carry five coding-region mutations, all of which result in amino acid changes: A2V, R13S, T24I, G70S, and W48* (tryptophan to a stop codon).
Glyma.03G036300
[0038] Glyma.03G036300 (SEQ ID NOs: 6-9) is a PIF1 helicase and is involved in a number of cellular processes, including DNA repair, DNA strand breaking, recombination, nucleotide binding, ATP binding, telomere maintenance, and the cellular response to DNA damage stimuli. The protein possesses helicase activity and hydrolase activity. Glyma.03G036300 comprises a PIF1 domain (aa 2-211 of SEQ ID NO: 8), an SF1_C_RecD domain (aa 258-303 of SEQ ID NO: 8), and a RecD domain (aa 250-294). The PIF1 domain is a conserved domain shared by the PIF1-like helicase family. The SF1_C_RecD domain is found in the C-terminal helicase domain of RecD family helicases. The RecD domain is found in ATP-dependent exoDNases and related enzymes and acts as a 3'-5' helicase. The RecBCD enzyme can unwind or separate DNA strands and also forms single-stranded gaps in DNA.
[0039] The full-length genomic sequence of Glyma.03G036300 in W82 (SEQ ID NO: 6) is 988 bp, and the full-length CDS (SEQ ID NO: 7) is 987 bp. Glyma.03G036300 in ZYD is the same as that in W82. Translation of Glyma.03G036300 terminates at the 294th amino acid in SN14 (SEQ ID NO: 9), whereas it is translated normally in ZYD00006 (SEQ ID NO: 8).
Glyma.07G192400
[0040] Glyma.07G192400 (SEQ ID NOs: 16-19) is highly expressed in seeds and is involved in transmembrane transport. No conserved domain information is known for Glyma.07G192400. The genomic sequence of Glyma.07G192400 (SEQ ID NO: 16) is 4263 bp in length, and the CDS sequence (SEQ ID NO: 18) is 417 bp in length. Only one base mutation, a G-to-A change, occurred in ZYD00006. Translation of the CDS into an amino acid sequence showed that this mutation changes the encoded amino acid at position 46 of the Glyma.07G192400 polypeptide from V (valine), as in SN14 or W82 (SEQ ID NO: ), to I (isoleucine), as in ZYD00006.
Glyma.06G297500
[0041] Currently, no conserved domain information is known for Glyma.06G297500 (SEQ ID NOs: 13-15). The full-length genomic sequence of Glyma.06G297500 (SEQ ID NO: 13) is 463 bp, and the full-length CDS sequence (SEQ ID NO: 14) is 237 bp. The CDS sequence and amino acid sequence are identical in all three soy varieties SN14, ZYD00006, and Williams82.
[0042] The functional domains of the genes are further described in Table 2, below.
Table 2. Functional annotation of genes
[0043] The term “corresponding to” in the context of nucleic acid sequences means that when certain nucleic acid sequences are aligned with each other, the nucleic acids that “correspond to” certain enumerated positions in the present invention are those that align with these positions in a reference sequence, but that are not necessarily in these exact numerical positions relative to a particular nucleic acid sequence of the invention. Optimal alignment of sequences for comparison can be conducted by computerized implementations of known algorithms or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST) and the ClustalW/ClustalW2/Clustal Omega programs available on the Internet (e.g., the website of the EMBL-EBI). Other suitable programs include, but are not limited to, GAP, BestFit, Plot Similarity, and FASTA, which are part of the Accelrys GCG Package available from Accelrys, Inc. of San Diego, Calif., United States of America. See also Smith & Waterman, 1981; Needleman & Wunsch, 1970; Pearson & Lipman, 1988; Ausubel et al., 1988; and Sambrook & Russell, 2001.
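Finding the position that “corresponds to” a reference position amounts to walking along the columns of a pairwise alignment. The Python sketch below assumes the alignment is already available as two equal-length gapped strings (for example, exported from a BLAST or Clustal Omega run) with '-' as the gap character; the toy sequences are hypothetical.

```python
def corresponding_positions(aligned_ref, aligned_query, gap="-"):
    """Map each 1-based reference position to the 1-based query position in the same column.

    Reference positions aligned to a gap in the query map to None.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned strings must have equal length")
    mapping = {}
    ref_pos = query_pos = 0
    for r, q in zip(aligned_ref, aligned_query):
        if q != gap:
            query_pos += 1
        if r != gap:
            ref_pos += 1
            mapping[ref_pos] = query_pos if q != gap else None
    return mapping


# Hypothetical alignment: the query has an insertion after reference position 3
# and lacks reference position 5.
m = corresponding_positions("MKT-AYIAK", "MKTQA-IAK")
print(m[4], m[5], m[6])  # 5 None 6
```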
[0044] In some embodiments, variants and fragments of the above-described polynucleotides and polypeptides increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part, or seed.
[0045] Fragments of the proteins that increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part, or seed include those that are shorter than the full-length sequences, either due to the use of an alternate downstream start site or due to processing that produces a shorter protein having the activity. In some embodiments, a fragment of a protein that increases protein content, increases oil content, and/or modifies oil profile when expressed in a plant can be a polypeptide that is, for example, 10, 25, 50, 100, 150, 200, 250 or more amino acids in length of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139. Such biologically active portions can be prepared by recombinant techniques and evaluated for the ability to confer increased protein content, increased oil content, and/or a modified oil profile. As used herein, in particular embodiments, a fragment comprises at least 8 contiguous amino acids of SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139. A fragment of a protein that increases protein content, increases oil content, and/or modifies oil profile when expressed in a plant can be a polypeptide that is, for example, 10, 25, 50, 100, 150, 200, 250 or more amino acids in length of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. Such biologically active portions can be prepared by recombinant techniques and evaluated for the ability to confer increased protein content, increased oil content, and/or a modified oil profile. As used herein, in other particular embodiments, a fragment comprises at least 8 contiguous amino acids of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
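The “at least 8 contiguous amino acids” criterion is a substring test. The Python sketch below, using a hypothetical toy sequence in place of any SEQ ID NO, checks a candidate fragment and enumerates all fragments of a given length, for example to prepare candidates for activity testing.

```python
def is_fragment(candidate, reference, min_len=8):
    """True if candidate is a contiguous stretch of at least min_len residues of reference."""
    return len(candidate) >= min_len and candidate in reference


def fragments(reference, k):
    """All contiguous fragments of exactly k residues, in N- to C-terminal order."""
    if k < 1:
        raise ValueError("k must be positive")
    return [reference[i:i + k] for i in range(len(reference) - k + 1)]


toy = "MKTAYIAKQRQISFVK"  # hypothetical 16-residue sequence, not a SEQ ID NO
print(is_fragment("TAYIAKQR", toy))  # True
print(is_fragment("TAYIAKQ", toy))   # False: only 7 residues
print(len(fragments(toy, 8)))        # 9
```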
[0046] Variants disclosed herein are polypeptides having an amino acid sequence that has at least 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99% identity to the amino acid sequence of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. Such variants increase protein content, increase oil content, and/or modify the oil profile when expressed in a plant, plant part, or seed. In some embodiments, a variant polynucleotide comprises a deletion and/or addition of one or more nucleotides at one or more internal sites within the native polynucleotide and/or a substitution of one or more nucleotides at one or more sites in the native polynucleotide.
[0047] Unless otherwise stated, identity and similarity will be calculated by the Needleman-Wunsch global alignment and scoring algorithms (Needleman and Wunsch (1970) J. Mol. Biol. 48(3):443-453) as implemented by the "needle" program, distributed as part of the EMBOSS software package (Rice, P., Longden, I., and Bleasby, A., EMBOSS: The European Molecular Biology Open Software Suite, 2000, Trends in Genetics 16, (6) pp276-277, version 6.3.1 available from EMBnet at embnet.org/resource/emboss and emboss.sourceforge.net, among other sources) using default gap penalties and scoring matrices (EBLOSUM62 for protein and EDNAFULL for DNA). Equivalent programs may also be used. By "equivalent program" is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by needle from EMBOSS version 6.3.1.
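The core of a Needleman-Wunsch global alignment is a dynamic-programming table followed by a traceback. The Python sketch below is a simplified stand-in, not a reimplementation of needle: it replaces EBLOSUM62 and needle's affine gap penalties with a flat match/mismatch score and a linear gap penalty, so its alignments and percentages can differ from needle's. Percent identity is computed as identical positions divided by alignment length including gap columns, which is how needle reports identity; for claim-relevant numbers, needle itself with the stated defaults should be used.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns two equal-length gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Traceback from the bottom-right corner, as required for a global alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns divided by alignment length (gap columns included)."""
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a)


aln = needleman_wunsch("ACDEFG", "ACDFG")
print(aln)                               # ('ACDEFG', 'ACD-FG')
print(round(percent_identity(*aln), 1))  # 83.3
```

A variant would then be tested against a threshold such as 85% by comparing `percent_identity` with that value.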
[0048] Additional mathematical algorithms are known in the art and can be utilized for the comparison of two sequences. See, for example, the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into the BLAST programs of Altschul et al. (1990) J. Mol. Biol. 215:403. BLAST nucleotide searches can be performed with the BLASTN program (nucleotide query searched against nucleotide sequences) to obtain nucleotide sequences homologous to nucleic acid molecules of the invention, or with the BLASTX program (translated nucleotide query searched against protein sequences) to obtain protein sequences homologous to nucleic acid molecules of the invention. BLAST protein searches can be performed with the BLASTP program (protein query searched against protein sequences) to obtain amino acid sequences homologous to protein molecules of the invention, or with the TBLASTN program (protein query searched against translated nucleotide sequences) to obtain nucleotide sequences homologous to protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, and PSI-BLAST programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) can be used. Alignment may also be performed manually by inspection.
[0049] Two sequences are "optimally aligned" when they are aligned for similarity scoring using a defined amino acid substitution matrix (e.g., BLOSUM62), gap existence penalty, and gap extension penalty so as to arrive at the highest score possible for that pair of sequences. Amino acid substitution matrices and their use in quantifying the similarity between two sequences are well known in the art and described, e.g., in Dayhoff et al. (1978) "A model of evolutionary change in proteins." In "Atlas of Protein Sequence and Structure," Vol. 5, Suppl. 3 (ed. M. O. Dayhoff), pp. 345-352. Natl. Biomed. Res. Found., Washington, D.C. and Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89:10915-10919. The BLOSUM62 matrix is often used as a default scoring substitution matrix in sequence alignment protocols. The gap existence penalty is imposed for the introduction of a single amino acid gap in one of the aligned sequences, and the gap extension penalty is imposed for each additional empty amino acid position inserted into an already opened gap. The alignment is defined by the amino acid positions of each sequence at which the alignment begins and ends, and optionally by the insertion of a gap or multiple gaps in one or both sequences, so as to arrive at the highest possible score. While optimal alignment and scoring can be accomplished manually, the process is facilitated by the use of a computer-implemented alignment algorithm, e.g., gapped BLAST 2.0, described in Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402, and made available to the public at the National Center for Biotechnology Information website (www.ncbi.nlm.nih.gov). Optimal alignments, including multiple alignments, can be prepared using, e.g., PSI-BLAST, available through www.ncbi.nlm.nih.gov and described by Altschul et al. (1997) Nucleic Acids Res. 25:3389-3402.
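The gap convention in this paragraph can be made concrete with a short scoring function. The Python sketch below scores an already-computed pairwise alignment using a caller-supplied substitution function and the rule stated above: the first position of a gap costs the gap existence penalty, and each further position of the same gap costs the gap extension penalty. The toy substitution scores and the penalty values of 10 and 1 are assumptions for illustration only, not BLOSUM62 values or any program's defaults. Note also that some tools, such as NCBI BLAST, charge existence + k × extension for a gap of length k, so penalty values are not directly interchangeable between conventions.

```python
def score_alignment(aligned_a, aligned_b, substitution, gap_existence, gap_extension):
    """Score a pairwise alignment in which '-' marks a gap position.

    The first position of each gap costs gap_existence; every additional
    position of the same gap costs gap_extension. substitution(x, y)
    returns the score for pairing residues x and y.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    open_gap = None  # which sequence ('a' or 'b') currently has an open gap
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Extending the same gap is cheaper than opening a new one
            total -= gap_extension if open_gap == side else gap_existence
            open_gap = side
        else:
            total += substitution(x, y)
            open_gap = None
    return total


def toy_substitution(x, y):
    # Hypothetical scores for illustration only; not BLOSUM62 values.
    return 4 if x == y else -1


if __name__ == "__main__":
    # One gap of length 2: four identities (16) minus 10 (existence) minus 1 (extension)
    print(score_alignment("ACD--E", "ACDFGE", toy_substitution, 10, 1))  # 5
```

Two separate one-residue gaps (for example, "A-C-D" against "AXCYD") each pay the full existence penalty, which is why optimal alignments under this scheme tend to consolidate insertions into fewer, longer gaps.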
[0050] In some embodiments, fragments and variants of the polypeptides disclosed herein each comprise one or more conserved domains of the canonical polypeptide. In some embodiments, the variant or fragment can comprise a polypeptide having at least 40%, at least 50%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains in the canonical polypeptide sequence.
[0051] In one example, a variant or fragment of Glyma.20G092400 (SEQ ID NO: 76) may comprise a selenocysteine lyase/cysteine desulfurase domain (aa 50-437 of SEQ ID NO: 76); a cysteine desulfurase (SufS)-like domain (aa 1-274 of SEQ ID NO: 76); an aminotransferase class-V domain (aa 93-274 of SEQ ID NO: 76); and a bifunctional selenocysteine lyase/cysteine desulfurase domain (aa 92-275 of SEQ ID NO: 76). A variant or fragment of Glyma.20G092400 (SEQ ID NO: 76) can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.20G092400 (SEQ ID NO: 76).
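Domain boundaries such as "aa 50-437 of SEQ ID NO: 76" are 1-based and inclusive, which differs from the 0-based, end-exclusive slicing used by many programming languages. The Python sketch below shows the off-by-one adjustment and a position-by-position identity check of the kind a threshold such as "at least 80% identity to a conserved domain" implies. The protein strings are hypothetical placeholders (the actual sequence of SEQ ID NO: 76 is not reproduced here), and the ungapped comparison is a simplifying assumption that only holds when the two segments contain no insertions or deletions; otherwise they would first be aligned as described in paragraph [0047].

```python
def extract_domain(protein, start, end):
    """Return residues start..end of protein, using the 1-based, inclusive
    coordinates of notation such as 'aa 50-437 of SEQ ID NO: 76'."""
    if not 1 <= start <= end <= len(protein):
        raise ValueError(f"range {start}-{end} is outside a {len(protein)}-residue protein")
    return protein[start - 1:end]


def ungapped_identity(segment_a, segment_b):
    """Percent identity of two equal-length segments compared position by position.
    Segments that differ by insertions or deletions should be aligned first."""
    if len(segment_a) != len(segment_b):
        raise ValueError("segments differ in length; align them first")
    same = sum(1 for x, y in zip(segment_a, segment_b) if x == y)
    return 100.0 * same / len(segment_a)


if __name__ == "__main__":
    canonical = "MKTAYIAKQR"  # hypothetical placeholder, not SEQ ID NO: 76
    variant = "MKTAYVAKQR"    # hypothetical single substitution (I6V)
    dom_c = extract_domain(canonical, 3, 8)  # residues 3-8: "TAYIAK"
    dom_v = extract_domain(variant, 3, 8)    # "TAYVAK"
    print(dom_c, dom_v, round(ungapped_identity(dom_c, dom_v), 1))  # TAYIAK TAYVAK 83.3
```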
[0052] In another example, a variant or fragment of Glyma.20G092000 (SEQ ID NO: 79) can comprise a phytepsin domain (aa 76-505 of SEQ ID NO: 79); a eukaryotic aspartyl protease (ASP) domain (aa 84-506 of SEQ ID NO: 79); an aspartyl protease domain (aa 77-507 of SEQ ID NO: 79); and two saposin B domains (aa 316-351 and aa 380-418 of SEQ ID NO: 79). A variant or fragment of Glyma.20G092000 (SEQ ID NO: 79) can retain functionality as an aspartic proteinase. A variant or fragment of Glyma.20G092000 (SEQ ID NO: 79) can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.20G092000 (SEQ ID NO: 79).
[0053] In another example, a variant or fragment of Glyma.20G094900 (SEQ ID NO: 82) can comprise one or more of the conserved domains of a DUF1336 superfamily protein. In some embodiments, the variant or fragment can comprise an enhanced disease resistance 2 (EDR2) protein C-terminal domain (aa 2-68 of SEQ ID NO: 82). A variant or fragment of Glyma.20G094900 (SEQ ID NO: 82) can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.20G094900 (SEQ ID NO: 82). A variant or fragment of Glyma.20G094900 can retain activities similar to EDR2 in regulating pathogen resistance.
[0054] In another example, a variant or fragment of Glyma.20G092100 (SEQ ID NO: 85) can comprise one or more tetratricopeptide-like (TPR) helical domains (aa 57-253, aa 229-365, and aa 404-461 of SEQ ID NO: 85) and/or one or more pentatricopeptide repeats (aa 403-429, aa 578-607, aa 438-461, aa 370-398, aa 647-675, aa 194-241, aa 264-313, aa 265-299, aa 540-574, aa 300-334, aa 435-469, aa 644-678, aa 403-434, aa 195-229, aa 230-264, aa 88-122, aa 158-194, aa 335-365, aa 470-504, aa 366-400, aa 123-157, aa 575-609, aa 370-398, aa 648-680, aa 269-301, aa 438-470, aa 233-265, aa 578-610, aa 403-429, aa 578-607, aa 438-461, aa 370-398, aa 647-675, aa 194-241, aa 264-313, aa 265-299, aa 540-574, aa 300-334, aa 435-469, aa 644-678, aa 403-434, aa 195-229, aa 335-365, aa 158-194, aa 88-122, aa 230-264, aa 470-504, aa 366-400, aa 505-539, aa 123-157, aa 575-609, aa 370-398, aa 269-301, aa 648-680, aa 438-470, aa 233-265, and aa 578-610 of SEQ ID NO: 85).
A variant or fragment of Glyma.20G092100 (SEQ ID NO: 85) can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.20G092100 (SEQ ID NO: 85). A variant or fragment of Glyma.20G092100 (SEQ ID NO: 85) can retain activities similar to TPR proteins in mediating protein-protein interactions and the assembly of multiprotein complexes.
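Several of the pentatricopeptide repeat coordinates listed in paragraph [0054] overlap or are repeated. The Python sketch below collapses overlapping or directly abutting 1-based, inclusive ranges into contiguous regions, which can be convenient when deciding which residues of SEQ ID NO: 85 fall within any annotated repeat. Treating abutting ranges (e.g., aa 88-122 and aa 123-157) as one region is a design choice made here for illustration; the function and variable names are hypothetical.

```python
def merge_ranges(ranges):
    """Merge 1-based, inclusive (start, end) ranges that overlap or abut."""
    merged = []
    for start, end in sorted(set(ranges)):  # set() drops repeated entries
        if merged and start <= merged[-1][1] + 1:
            # Extend the current region; keep whichever end lies further right
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def residues_covered(merged):
    """Count residues inside a list of non-overlapping inclusive ranges."""
    return sum(end - start + 1 for start, end in merged)


# Distinct pentatricopeptide repeat ranges from paragraph [0054] (SEQ ID NO: 85)
PPR_RANGES = [
    (403, 429), (578, 607), (438, 461), (370, 398), (647, 675), (194, 241),
    (264, 313), (265, 299), (540, 574), (300, 334), (435, 469), (644, 678),
    (403, 434), (195, 229), (230, 264), (88, 122), (158, 194), (335, 365),
    (470, 504), (366, 400), (123, 157), (575, 609), (648, 680), (269, 301),
    (438, 470), (233, 265), (578, 610), (505, 539),
]

if __name__ == "__main__":
    regions = merge_ranges(PPR_RANGES)
    print(regions)                    # [(88, 400), (403, 610), (644, 680)]
    print(residues_covered(regions))  # 558
```

If only strictly overlapping ranges should be joined, changing `merged[-1][1] + 1` to `merged[-1][1]` keeps abutting repeats as separate regions.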
[0055] In another example, a variant or fragment of Glyma.06G303700 (SEQ ID NOs: 3 and 5) may comprise one or more of the following conserved domains: the START_ArGLABRA2_like domain (aa 241-465 of SEQ ID NOs: 3 and 5); the START domain (aa 246-466 of SEQ ID NOs: 3 and 5); the homeobox domain (aa 57-110 of SEQ ID NOs: 3 and 5); the homeodomain (aa 55-113 of SEQ ID NOs: 3 and 5); the COG5576 superfamily domain (aa 13-129 of SEQ ID NOs: 3 and 5); and/or the MreC superfamily domain. A variant or fragment of Glyma.06G303700 (SEQ ID NOs: 3 and 5) can retain activity as a transcription factor.
[0056] In another example, a variant or fragment of Glyma.03G040200 (SEQ ID NO: 12) can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.03G040200 (SEQ ID NO: 12). A variant or fragment of Glyma.03G040200 (SEQ ID NO: 12) can retain activity in transmembrane transport.
[0057] In another example, a variant or fragment of Glyma.03G036300 (SEQ ID NO: 8) can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.03G036300 (SEQ ID NO: 8). A variant or fragment of Glyma.03G036300 (SEQ ID NO: 8) can retain activity as a PIF1 helicase.
[0058] In another example, a variant or fragment of Glyma.06G297500 (SEQ ID NO: 15) can comprise a polypeptide having at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains of Glyma.06G297500 (SEQ ID NO: 15).
[0059] As indicated, fragments and variants of the polypeptides disclosed herein will retain the activity of conferring increased protein content, increased oil content, and/or a modified oil profile to a plant expressing the polypeptide. Such an increase in protein content and/or oil content can comprise any statistically significant increase, including, for example, an increase of about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 85%, 90%, 95% or greater relative to a control. Methods of determining protein content or oil content are further described below.
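Because protein and oil content are themselves often reported as percentages of seed weight, it helps to distinguish a relative increase over the control from a difference in percentage points. The Python sketch below reads "relative to a control" as a relative change (an interpretation assumed here) and prints both quantities for comparison. The example values (44.1% versus 42.0% protein) are hypothetical and are not data from this disclosure. Establishing statistical significance would additionally require replicate measurements and an appropriate statistical test, which this sketch does not attempt.

```python
def percent_increase(measured, control):
    """Relative increase of measured over control, in percent."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return 100.0 * (measured - control) / control


if __name__ == "__main__":
    control_protein = 42.0  # hypothetical seed protein, % of dry weight
    line_protein = 44.1     # hypothetical value for a modified line
    print(round(percent_increase(line_protein, control_protein), 1))  # 5.0 (relative)
    print(round(line_protein - control_protein, 1))                   # 2.1 (percentage points)
```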
[0060] In some embodiments, the polypeptides disclosed herein may comprise a heterologous amino acid sequence attached thereto. For example, a polypeptide may have a polypeptide tag or additional protein domain attached thereto. The heterologous amino acid sequence can be attached to the N terminus, the C terminus, or internally within the polypeptide. In some instances, the polypeptide may have one or more polypeptide tags and/or additional protein domains attached thereto at one or more positions of the polypeptide.
[0061] In some embodiments, the nucleic acid sequence encoding the polypeptides disclosed herein may comprise a heterologous nucleic acid sequence attached thereto. For example, the heterologous nucleic acid sequence may encode a polypeptide tag or additional protein domain that will be attached to the encoded polypeptide. As another example, the heterologous nucleic acid sequence may encode a regulatory element such as an intron, an enhancer, a promoter, a terminator, etc. The heterologous nucleic acid sequence can be positioned at the 5' end, the 3' end, or in-frame within the coding sequence of the polypeptide. In some instances, the nucleic acid sequence encoding the polypeptides disclosed herein may have one or more heterologous nucleic acid sequences attached thereto at one or more positions of the nucleic acid sequence.
[0062] As used herein, “heterologous” or “recombinant” in reference to a polypeptide or polynucleotide sequence is a sequence that originates, for example, from a cell or an organism with another genetic background of the same species or from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. As such, heterologous sequences are in a configuration not found in nature. As used herein, a “native” polynucleotide or polypeptide comprises a naturally occurring nucleotide sequence or amino acid sequence, respectively. As such, “heterologous” or “recombinant”, when used in reference to a polynucleotide, refers to a polynucleotide encoding a factor that is not in its natural environment (i.e., has been altered by the hand of man). For example, a heterologous gene may include a polynucleotide from one species introduced into another species.
A heterologous polynucleotide may also include a polynucleotide native to an organism that has been altered in some way (e.g., mutated, added in multiple copies, linked to a non-native promoter or enhancer polynucleotide, etc.). Heterologous genes further may comprise plant polynucleotides that comprise cDNA forms of a plant gene; the cDNAs may be expressed in either a sense orientation (to produce mRNA) or an antisense orientation (to produce an antisense RNA transcript that is complementary to the mRNA transcript). In one aspect of the invention, heterologous polynucleotides are distinguished from endogenous plant genes in that the heterologous polynucleotides are joined to polynucleotides comprising regulatory elements, such as promoters, that are not found naturally associated with the polynucleotide for the protein encoded by the heterologous polynucleotide or with plant polynucleotides in the chromosome, or are associated with portions of the chromosome not found in nature (e.g., polynucleotides expressed in loci where the polynucleotide is not normally expressed). Further, in embodiments, a “heterologous” or “recombinant” polynucleotide is a polynucleotide not naturally associated with a host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring polynucleotide.
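Paragraph [0061] notes that a heterologous sequence, such as one encoding a polypeptide tag, can be placed in-frame within a coding sequence. The Python sketch below illustrates the checks such a fusion implies: the insert must sit on a codon boundary and have a length divisible by three, and the fused sequence should contain no premature in-frame stop codon. The coding sequence and the six-histidine tag in the example are hypothetical placeholders chosen for illustration, not sequences from this disclosure, and the function names are likewise hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a sequence into consecutive triplets, starting at position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(cds, insert, position):
    """Insert DNA into a coding sequence at a 0-based nucleotide offset,
    refusing fusions that shift the frame or add an early stop codon."""
    cds, insert = cds.upper(), insert.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    if position % 3 or not 0 <= position <= len(cds):
        raise ValueError("insertion point must fall on a codon boundary")
    if len(insert) % 3:
        raise ValueError("insert length is not a multiple of 3 (frameshift)")
    fused = cds[:position] + insert + cds[position:]
    # Only the final codon may be a stop codon
    if any(c in STOP_CODONS for c in codons(fused)[:-1]):
        raise ValueError("fusion introduces a premature in-frame stop codon")
    return fused


if __name__ == "__main__":
    cds = "ATGGCTAAATAA"             # hypothetical: Met-Ala-Lys-stop
    his_tag = "CATCACCATCACCATCAC"   # six His codons (hypothetical tag)
    print(fuse_in_frame(cds, his_tag, 3))  # N-terminal tag right after ATG
    print(fuse_in_frame(cds, his_tag, 9))  # C-terminal tag just before the stop codon
```

Placing the insert after the native stop codon (position 12 in this example) is rejected, because the original stop would then terminate translation before the tag.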
II. Expression cassettes and promoters
[0063] Polynucleotides encoding the polypeptides provided herein can be provided in expression cassettes for expression in an organism of interest. The cassette will include 5' and 3' regulatory sequences operably linked to a polynucleotide encoding a polypeptide provided herein that allow for expression of the polynucleotide. The cassette may additionally contain at least one additional gene or genetic element to be co-transformed into the organism. Where additional genes or elements are included, the components are operably linked. Alternatively, the additional gene(s) or element(s) can be provided on multiple expression cassettes. Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory elements or regions. The expression cassette may additionally contain a selectable marker gene.
[0064] The expression cassette will include, in the 5'-3' direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., a termination region) functional in the organism of interest, i.e., a plant or a bacterium. The promoters of the invention are capable of directing or driving transcription and expression of a coding sequence in a host cell. The regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions) may be endogenous or heterologous to the host cell or to each other. As used herein, a chimeric gene or a chimeric nucleic acid molecule comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
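The 5'-to-3' arrangement described in paragraph [0064] (promoter, coding polynucleotide, termination region) can be modelled as simple string assembly, which also shows why fragment orientation matters: a coding sequence supplied on the opposite strand must be reverse-complemented before it is joined. The Python sketch below is illustrative only. The short promoter, coding, and terminator strings are hypothetical placeholders far shorter than real regulatory regions, and real cloning also involves the junction sequences (restriction or recombination sites) mentioned in paragraph [0063], which are omitted here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_cassette(promoter, cds, terminator, cds_on_reverse_strand=False):
    """Join parts 5'->3' as promoter + coding sequence + terminator."""
    if cds_on_reverse_strand:
        cds = reverse_complement(cds)
    if not cds.upper().startswith("ATG"):
        raise ValueError("coding sequence does not start with ATG; check orientation")
    if len(cds) % 3:
        raise ValueError("coding sequence length is not a multiple of 3")
    return promoter + cds + terminator


if __name__ == "__main__":
    promoter = "TTGACA"    # hypothetical placeholder
    terminator = "AATAAA"  # hypothetical placeholder
    cds_rc = "TTATTTCAT"   # ATGAAATAA supplied on the opposite strand
    print(assemble_cassette(promoter, cds_rc, terminator, cds_on_reverse_strand=True))
    # TTGACAATGAAATAAAATAAA
```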
[0065] A variety of transcriptional terminators are available for use in expression cassettes. These are responsible for the termination of transcription beyond the transgene and for correct mRNA polyadenylation. The termination region may be native with the transcriptional initiation region, may be native to the operably linked DNA sequence of interest, may be native to the plant host, or may be derived from another source (i.e., foreign or heterologous to the promoter, the DNA sequence of interest, the plant host, or any combination thereof). Appropriate transcriptional terminators are those that are known to function in plants and include the CaMV pSOY1 terminator, the tml terminator, the nopaline synthase terminator, and the pea rbcS E9 terminator. These can be used in both monocotyledons and dicotyledons. In addition, a gene's native transcription terminator may be used. Termination regions used in the expression cassettes can be obtained from, e.g., the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262:141-144; Proudfoot (1991) Cell 64:671-674; Sanfacon et al. (1991) Genes Dev. 5:141-149; Mogen et al. (1990) Plant Cell 2:1261-1272; Munroe et al. (1990) Gene 91:151-158; Ballas et al. (1989) Nucleic Acids Res. 17:7891-7903; and Joshi et al. (1987) Nucleic Acids Res. 15:9627-9639.
[0066] Additional regulatory signals include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, termination signals, and the like. See, for example, U.S. Pat. Nos. 5,039,523 and 4,853,331; EPO 0480762A2; Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), hereinafter “Sambrook 11”; Davis et al., eds. (1980).
[0067] In preparing the expression cassette, the various DNA fragments may be manipulated so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments, or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.
a. Promoters
[0068] A number of promoters can be used in the practice of the invention. The promoters can be selected based on the desired outcome. The nucleic acids can be combined with constitutive, inducible, tissue-preferred, or other promoters for expression in the organism of interest. See, for example, promoters set forth in WO 99/43838 and in U.S. Patent Nos. 8,575,425; 7,790,846; 8,147,856; 8,586,832; 7,772,369; 7,534,939; 6,072,050; 5,659,026; 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142; and 6,177,611; herein incorporated by reference. In some embodiments, the promoter used herein to drive the expression of the polynucleotides provided herein comprises an exogenous promoter. The term “exogenous promoter” refers to a promoter that is not found in plants in nature, for example, a synthetic promoter.
[0069] For expression in plants, constitutive promoters can also be used. Non-limiting examples of constitutive promoters include the CaMV pSOY1 promoter (Odell et al. (1985) Nature 313:810-812); rice actin (McElroy et al. (1990) Plant Cell 2:163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12:619-632 and Christensen et al. (1992) Plant Mol. Biol. 18:675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81:581-588); and MAS (Velten et al. (1984) EMBO J. 3:2723-2730). Inducible promoters include those that drive expression of pathogenesis-related proteins (PR proteins), which are induced following infection by a pathogen. See, for example, Redolfi et al. (1983) Neth. J. Plant Pathol. 89:245-254; Uknes et al. (1992) Plant Cell 4:645-656; Van Loon (1985) Plant Mol. Virol. 4:111-116; and WO 99/43819, herein incorporated by reference. Promoters that are expressed locally at or near the site of pathogen infection may also be used (Marineau et al. (1987) Plant Mol. Biol. 9:335-342; Matton et al. (1989) Molecular Plant-Microbe Interactions 2:325-331; Somssich et al. (1986) Proc. Natl. Acad. Sci. USA 83:2427-2430; Somssich et al. (1988) Mol. Gen. Genet. 2:93-98; Yang (1996) Proc. Natl. Acad. Sci. USA 93:14972-14977; Chen et al. (1996) Plant J. 10:955-966; Zhang et al. (1994) Proc. Natl. Acad. Sci. USA 91:2507-2511; Warner et al. (1993) Plant J. 3:191-201; Siebertz et al. (1989) Plant Cell 1:961-968; Cordero et al. (1992) Physiol. Mol. Plant Path. 41:189-200; U.S. Patent No. 5,750,386 (nematode-inducible); and the references cited therein).
[0070] Wound-inducible promoters may be used in methods and compositions in this disclosure. Such wound-inducible promoters include the pin II promoter (Ryan (1990) Ann. Rev. Phytopath. 28:425-449; Duan et al. (1996) Nature Biotechnology 14:494-498); wun1 and wun2 (U.S. Patent No. 5,428,148); win1 and win2 (Stanford et al. (1989) Mol. Gen. Genet. 215:200-208); systemin (McGurl et al. (1992) Science 255:1570-1573); WIP1 (Rohmeier et al. (1993) Plant Mol. Biol. 22:783-792; Eckelkamp et al. (1993) FEBS Letters 323:73-76); the MPI gene (Cordero et al. (1994) Plant J. 6(2):141-150); and the like, herein incorporated by reference.
[0071] Tissue-preferred promoters for use in the invention include those set forth in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7):792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3):337-343; Russell et al. (1997) Transgenic Res. 6(2):157-168; Rinehart et al. (1996) Plant Physiol. 112(3):1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2):525-535; Canevascini et al. (1996) Plant Physiol. 112(2):513-524; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Lam (1994) Results Probl. Cell Differ. 20:181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590; and Guevara-Garcia et al. (1993) Plant J. 4(3):495-505.
[0072] Leaf-preferred promoters include those set forth in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kwon et al. (1994) Plant Physiol. 105:357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Gotor et al. (1993) Plant J. 3:509-18; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590.
[0073] Root-preferred promoters are known and include those in Hire et al. (1992) Plant Mol. Biol. 20(2):207-218 (soybean root-specific glutamine synthetase gene); Keller and Baumgartner (1991) Plant Cell 3(10):1051-1061 (root-specific control element); Sanger et al. (1990) Plant Mol. Biol. 14(3):433-443 (mannopine synthase (MAS) gene of Agrobacterium tumefaciens); Miao et al. (1991) Plant Cell 3(1):11-22 (cytosolic glutamine synthetase (GS)); Bogusz et al. (1990) Plant Cell 2(7):633-641; Leach and Aoyagi (1991) Plant Science (Limerick) 79(1):69-76 (rolC and rolD); Teeri et al. (1989) EMBO J. 8(2):343-350; Kuster et al. (1995) Plant Mol. Biol. 29(4):759-772 (the VfENOD-GRP3 gene promoter); and Capana et al. (1994) Plant Mol. Biol. 25(4):681-691 (rolB promoter). See also U.S. Patent Nos. 5,837,876; 5,750,386; 5,633,363; 5,459,252; 5,401,836; 5,110,732; and 5,023,179.
[0074] "Seed-preferred" promoters include both "seed-specific" promoters (those promoters active during seed development, such as promoters of seed storage proteins) and "seed-germinating" promoters (those promoters active during seed germination). See Thompson et al. (1989) BioEssays 10:108. Seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message); cZ19B1 (maize 19 kDa zein); and milps (myo-inositol-1-phosphate synthase) (see WO 00/11177 and U.S. Patent No. 6,225,529). Gamma-zein is an endosperm-specific promoter. Globulin 1 (Glb-1) is a representative embryo-specific promoter. For dicots, seed-specific promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like. For monocots, seed-specific promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa zein, gamma-zein, waxy, shrunken 1, shrunken 2, Globulin 1, etc. See also WO 00/12733, where seed-preferred promoters from end1 and end2 genes are disclosed.
[0075] In specific embodiments, the polynucleotides or variants thereof provided herein are not expressed using a root-specific promoter. In further embodiments, the polynucleotides or variants thereof provided herein are not expressed with the RCc3 root-specific promoter (see US 20130139280).
[0076] For expression in a bacterial host, promoters that function in bacteria are well known in the art. Such promoters include any of the known crystal protein gene promoters, including the promoters of any of the proteins of the invention, and promoters specific for B. thuringiensis sigma factors. Alternatively, mutagenized or recombinant crystal protein-encoding gene promoters may be recombinantly engineered and used to promote the expression of the novel gene segments disclosed herein.
[0077] A number of non-translated leader sequences derived from viruses are also known to enhance expression, and these are particularly effective in dicotyledonous cells. The expression cassette may comprise one or more of such leader sequences. Specifically, leader sequences from tobacco mosaic virus (TMV, the “Ω-sequence”), maize chlorotic mottle virus (MCMV), and alfalfa mosaic virus (AMV) have been shown to be effective in enhancing expression (e.g., Gallie et al. Nucl. Acids Res. 15:8693-8711 (1987); Skuzeski et al. Plant Molec. Biol. 15:65-79 (1990)). Other leader sequences known in the art include, but are not limited to: picornavirus leaders, for example, the EMCV leader (encephalomyocarditis 5' noncoding region) (Elroy-Stein, O., Fuerst, T. R., and Moss, B. PNAS USA 86:6126-6130 (1989)); potyvirus leaders, for example, the tobacco etch virus (TEV) leader and the maize dwarf mosaic virus (MDMV) leader (Allison et al. (1986) Virology 154:9-20); the human immunoglobulin heavy-chain binding protein (BiP) leader (Macejak, D. G., and Sarnow, P., Nature 353:90-94 (1991)); the untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling, S. A., and Gehrke, L., Nature 325:622-625 (1987)); the tobacco mosaic virus (TMV) leader (Gallie, D. R. et al., Molecular Biology of RNA, 237-256 (1989)); and the maize chlorotic mottle virus (MCMV) leader (Lommel, S. A. et al., Virology 181:382-385 (1991)). See also Della-Cioppa et al., Plant Physiology 84:965-968 (1987).
[0078] The expression cassette can also comprise a selectable marker gene for the selection of transformed cells. Selectable marker genes are utilized for the selection of transformed cells or tissues. Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin, 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), spectinomycin, or acetolactate synthase (ALS). Selection markers used routinely in transformation include the nptII gene, which confers resistance to kanamycin and related antibiotics (Messing & Vieira Gene 19:259-268 (1982); Bevan et al., Nature 304:184-187 (1983)); the pat and bar genes, which confer resistance to the herbicide glufosinate (also called phosphinothricin; see White et al., Nucl. Acids Res. 18:1062 (1990), Spencer et al. Theor. Appl. Genet. 79:625-631 (1990), and U.S. Patent Nos. 5,561,236 and 5,276,268); the hph gene, which confers resistance to the antibiotic hygromycin (Blochinger & Diggelmann, Mol. Cell Biol. 4:2929-2931); the dhfr gene, which confers resistance to methotrexate (Bourouis et al., EMBO J. 2(7):1099-1104 (1983)); the EPSPS gene, which confers resistance to glyphosate (U.S. Patent Nos. 4,940,935 and 5,188,642); the glyphosate N-acetyltransferase (GAT) gene, which also confers resistance to glyphosate (Castle et al. (2004) Science 304:1151-1154; U.S. Patent App. Pub. Nos. 20070004912, 20050246798, and 20050060767); and the mannose-6-phosphate isomerase gene, which provides the ability to metabolize mannose (U.S. Patent Nos. 5,767,378 and 5,994,629).
b. Native promoters
[0079] In some embodiments, the promoter used herein to drive the expression of the polynucleotides provided herein comprises a native promoter or an active variant or fragment thereof. For purposes of this disclosure, the term “native promoter,” used interchangeably with the term “endogenous promoter,” refers to a promoter that is found in plants in nature. An active variant or fragment of a native promoter refers to a promoter sequence that has one or more nucleotide substitutions, deletions, or insertions and that can drive the expression of an operably linked polynucleotide sequence under conditions similar to those under which the native promoter is active. Such active variants or fragments may be created by site-directed mutagenesis or induced mutation, or may occur as allelic variants (polymorphisms). In some embodiments, the native promoter comprises a polynucleotide having the sequence of SEQ ID NO: 58. In some embodiments, disclosed herein is a construct comprising a native promoter or its active variant or fragment operably linked to a polynucleotide having the sequence of any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or a fragment or variant of any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137 (e.g., having at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity); and, when introduced into a plant, the construct confers increased protein content, increased oil content, and/or a modified oil profile.
In some embodiments, disclosed herein is a construct comprising a native promoter or an active variant or fragment thereof operably linked to a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant (e.g., having at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity) of any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59; and, when introduced into a plant, the construct confers increased protein content, increased oil content, and/or a modified oil profile. In some embodiments, the native promoter is a heterologous promoter to the polynucleotide.
[0080] Also provided herein is a plant, a plant cell, or a plant part (e.g., a plant seed) comprising the construct described above. In some embodiments, the polynucleotide encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to SEQ ID NO: 75, 78, 81, 84, 87, 111, 114, 117, 120, 123, 126, 129, 132, 135, or 138. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to a polynucleotide having a sequence of any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137. In some embodiments, the polynucleotide encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to a polynucleotide encoding any one of SEQ ID NOs: 22 or 24-59. In some embodiments, the plant is a dicot plant. In some embodiments, the plant is a monocot plant. In some embodiments, the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane. In some embodiments, the plant is a soybean plant. In some embodiments, the plant is an elite soybean plant.
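Embodiments defined by "at least one ... or at least six mutations as compared to" a reference SEQ ID NO imply counting the differences between a polynucleotide and its reference. The Python sketch below lists and counts substitutions between two equal-length sequences, labelled as reference base, 1-based position, and new base (for example, T6A). It is a simplification assumed here: it handles substitutions only, whereas insertions and deletions would require aligning the sequences first, and the example sequences are hypothetical placeholders rather than any SEQ ID NO of this disclosure.

```python
def list_substitutions(reference, variant):
    """Return substitutions as 'RefPosAlt' strings with 1-based positions."""
    reference, variant = reference.upper(), variant.upper()
    if len(reference) != len(variant):
        raise ValueError("sequences differ in length; align them to capture indels")
    return [f"{r}{pos}{v}"
            for pos, (r, v) in enumerate(zip(reference, variant), start=1)
            if r != v]


def has_at_least(reference, variant, n_mutations):
    """True if variant carries at least n_mutations substitutions vs reference."""
    return len(list_substitutions(reference, variant)) >= n_mutations


if __name__ == "__main__":
    ref = "ATGGCTAAATAA"  # hypothetical reference
    var = "ATGGCAAAGTAA"  # hypothetical variant
    print(list_substitutions(ref, var))  # ['T6A', 'A9G']
    print(has_at_least(ref, var, 2))     # True
```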
[0081] Also provided herein is a method of conferring increased protein content, increased oil content, and/or a modified oil profile to a plant, comprising introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter comprising SEQ ID NO: 93 or an active variant or fragment thereof, wherein the nucleic acid sequence encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to at least one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59. In some embodiments, the nucleic acid sequence encodes a polypeptide having an amino acid sequence set forth in SEQ ID NO: 75, 78, 81, 84, 87, 111, 114, 117, 120, 123, 126, 129, 132, 135, 138, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
III. Plants, plant cells, and plant parts
[0082] In the plants provided herein, the polynucleotide as described in Section I of this disclosure is a heterologous nucleic acid sequence in the genome of the plant. As used herein, the term “heterologous” in the context of a chromosomal segment refers to one or more DNA sequences (e.g., genetic loci) in a configuration in which they are not found in nature, for example as a result of a recombination event between homologous chromosomes during meiosis, as a result of the introduction of a transgenic sequence, or as a result of modification through gene editing.
[0083] Although soybean plants are used to exemplify the compositions and methods throughout the application, a polynucleotide as provided herein may be introduced into any plant species, including, but not limited to, monocots and dicots. Examples of plants of interest include, but are not limited to, corn (maize), sorghum, wheat, sunflower, tomato, crucifers, peppers, potato, cotton, rice, soybean, sugarbeet, sugarcane, tobacco, barley, oilseed rape, Brassica sp., alfalfa, rye, millet, safflower, peanuts, sweet potato, cassava, coffee, coconut, pineapple, citrus trees, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oats, vegetables, ornamentals, and conifers.
[0084] Glycine (soybean or soya bean) is a genus in the bean family Fabaceae. The Glycine plants can be Glycine arenaria, Glycine argyrea, Glycine cyrtoloba, Glycine canescens, Glycine clandestina, Glycine curvata, Glycine falcata, Glycine latifolia, Glycine microphylla, Glycine pescadrensis, Glycine stenophita, Glycine syndetica, Glycine soja Sieb. et Zucc., Glycine max (L.) Merrill, Glycine tabacina, or Glycine tomentella.
[0085] In some embodiments, the plants provided herein are elite plants or derived from an elite line.
[0086] As used herein, an “elite line” is an agronomically superior line that has resulted from many cycles of breeding and selection for superior agronomic performance. Numerous elite lines are available and known to those of skill in the art of soybean breeding. An “elite population” is an assortment of elite individuals or lines that can be used to represent the state of the art in terms of agronomically superior genotypes of a given crop species, such as soybean. Similarly, an “elite germplasm” or elite strain of germplasm is an agronomically superior germplasm, typically derived from, and/or capable of giving rise to, a plant with superior agronomic performance, such as an existing or newly developed elite line of soybean.
[0087] An “elite” plant is any plant from an elite line, such that an elite plant is a representative plant from an elite variety. In some embodiments, the soybean plant comprising a polynucleotide encoding any one of the polypeptides disclosed herein is an elite soybean plant. Non-limiting examples of elite soybean varieties that are commercially available to farmers or soybean breeders include: AG00802, A0868, AG0902, A1923, AG2403, A2824, A3704, A4324, A5404, AG5903, AG6202, AG0934, AG1435, AG2031, AG2035, AG2433, AG2733, AG2933, AG3334, AG3832, AG4135, AG4632, AG4934, AG5831, AG6534, and AG7231 (Asgrow Seeds, Des Moines, Iowa, USA); BPR0144RR, BPR 4077NRR and BPR 4390NRR (Bio Plant Research, Camp Point, Ill., USA); DKB 17-51 and DKB37-51 (DeKalb Genetics, DeKalb, Ill., USA); DP 4546 RR and DP 7870 RR (Delta & Pine Land Company, Lubbock, Tex., USA); JG 03R501, IG 32R606C ADD and IG 55R503C (JGL Inc., Greencastle, Ind., USA); NKS 13-K2 (NK Division of Syngenta Seeds, Golden Valley, Minnesota, USA); 90M01, 91M30, 92M33, 93M11, 94M30, 95M30, 97B52, P008T22R2, P16T17R2, P22T69R, P25T51R, P34T07R2, P35T58R, P39T67R, P47T36R, P46T21R, and P56T03R2 (Pioneer Hi-Bred International, Johnston, Iowa, USA); SG4771NRR and SG5161NRR/STS (Soy genetics, LLC, Lafayette, Ind., USA); S00-K5, S11-L2, S28-Y2, S43-B1, S53-A1, S76-L9, S78-G6, S0009-M2, S007-Y4, S04-D3, S14-A6, S20-T6, S21-M7, S26-P3, S28-N6, S30-V6, S35-C3, S36-Y6, S39-C4, S47-K5, S48-D9, S52-Y2, S58-Z4, S67-R6, S73-S8, and S78-G6 (Syngenta Seeds, Henderson, Ky., USA); Richer (Northstar Seed Ltd., Alberta, CA); 14RD62 (Stine Seed Co., Ia., USA); or Armor 4744 (Armor Seed, LLC, Ar., USA).
[0088] In some embodiments, the plants provided herein can comprise one or more additional polynucleotides that encode an additional polypeptide that can confer a phenotype of increased protein content, increased oil content, or modified oil profile on a plant. In some embodiments, the additional polynucleotide encodes a polypeptide having the sequence of any one of SEQ ID NOs: 3, 5, 6, 8, 9, 11, 12, 15, 18, 19, 22, or 24-59. The additional polynucleotide can be introduced using approaches similar to those disclosed above, e.g., by transgenic means, by breeding, or by genome editing.
[0089] In specific embodiments, the plants, plant parts, or seeds having the heterologous polynucleotide or polypeptide disclosed herein or active variants and fragments thereof can have a modified level of expression of the polynucleotide or polypeptide (i.e., an increase or a decrease in expression level). In other embodiments, the plants, plant parts, or seeds having the heterologous polynucleotide or polypeptide disclosed herein or active variants and fragments thereof can have a modified level of activity of the polypeptide (i.e., an increase or a decrease in activity level). Methods to generate such modified levels of expression or activity are disclosed elsewhere herein and include, but are not limited to, breeding, gene editing, and transgenic techniques.
[0090] Plants produced as described above can be propagated to produce progeny plants, and the progeny plants that have stably incorporated into their genome a polynucleotide conferring increased protein content, increased oil content, and/or modified oil profile can be selected and further propagated if desired. The term “progeny” refers to the descendant(s) of a particular cross. Typically, progeny plants result from the breeding of two individuals, although some species (particularly some plants and hermaphroditic animals) can be selfed (i.e., the same plant acts as the donor of both male and female gametes). The descendant(s) can be, for example, of the F1, the F2, or any subsequent generation.
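One common way to check that an introduced polynucleotide behaves as a single, stably inherited locus is a segregation test in selfed progeny. As a hedged sketch, assuming a single dominant insertion scored only as present or absent (e.g., by PCR) in the progeny of a self-pollinated hemizygous plant, the expected carrier to non-carrier ratio is 3:1, and a chi-square goodness-of-fit statistic can be compared with the critical value for 1 degree of freedom (3.841 at alpha = 0.05). This is a generic genetics calculation rather than a procedure stated in this disclosure, and the function names are hypothetical.

```python
CHI2_CRITICAL_1DF_005 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def chi_square_3_to_1(carriers: int, non_carriers: int) -> float:
    """Chi-square statistic for observed carrier/non-carrier counts vs. a 3:1 expectation."""
    total = carriers + non_carriers
    if total == 0:
        raise ValueError("no progeny scored")
    observed = (carriers, non_carriers)
    expected = (0.75 * total, 0.25 * total)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def consistent_with_single_locus(carriers: int, non_carriers: int) -> bool:
    """True if the 3:1 single-locus hypothesis is not rejected at alpha = 0.05."""
    return chi_square_3_to_1(carriers, non_carriers) < CHI2_CRITICAL_1DF_005


print(chi_square_3_to_1(90, 10))             # 12.0 (departs from 3:1)
print(consistent_with_single_locus(74, 26))  # True
```

A significant departure can point to multiple insertion loci, transgene silencing, or segregation distortion, which is one reason more than one generation is typically evaluated.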
[0091] In some embodiments, a plant cell, seed, plant part, or harvest product can be obtained from the plant produced as above, and the plant cell, seed, or plant part can be screened for evidence of stable incorporation of the polynucleotide using methods disclosed above. The term “stable incorporation” refers to the integration of a nucleic acid sequence into the genome of a plant such that the nucleic acid sequence is capable of being inherited by the progeny thereof. As used herein, the term “plant part” indicates a part of a plant, including single cells and cell tissues such as plant cells that are intact in plants, cell clumps, and tissue cultures from which plants can be regenerated. Examples of plant parts include, but are not limited to, single cells and tissues from pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, and seeds; as well as pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, calli, and the like.
[0092] In some embodiments, plant products can be harvested from the plant disclosed above and processed to produce processed products, such as flour, soy meal, oil, starch, and the like. These processed products are also within the scope of this invention provided that they comprise a polynucleotide or polypeptide or variant thereof disclosed herein. Other soybean plant products include, but are not limited to, protein concentrate, protein isolate, soybean hulls, meal, flour, oil, and the whole soybean itself.
IV. Methods for producing a plant variety that has increased protein content, increased oil content, and/or modified oil profile
[0093] Provided herein are methods of producing a plant that has increased protein content, increased oil content, and/or modified oil profile by introducing a nucleic acid sequence encoding a polypeptide as provided herein. A nucleic acid sequence may be introduced into a plant cell in various ways, for example, by transformation, by genome modification techniques (such as genome editing), or by breeding. In one aspect, the plant can be produced by transforming the nucleic acid sequence encoding a polypeptide disclosed above into a recipient plant. In another aspect, the method can comprise editing the genome of the recipient plant so that the resulting plant comprises a polynucleotide encoding a polypeptide disclosed above. In yet another aspect, the method can comprise increasing the expression level and/or activity of the above-mentioned proteins in a recipient plant, for example, by enhancing promoter activity or replacing the endogenous promoter with a stronger promoter. In another aspect, the method can comprise breeding a donor plant comprising a polynucleotide as described above with a recipient plant and selecting for incorporation of the polynucleotide into the recipient plant genome.
1. Transgenic means
[0094] In some embodiments, the method comprises transforming a polynucleotide disclosed herein or an active variant or fragment thereof into a recipient plant to obtain a transgenic plant, and said transgenic plant has increased protein content, increased oil content, and/or modified oil profile. Expression cassettes comprising polynucleotides encoding the polypeptides as described above can be used to transform plants of interest.
[0095] As used herein, the term “transgenic” and grammatical variations thereof refer to a plant, including any part derived from the plant, such as a cell, tissue, or organ, in which a heterologous nucleic acid is integrated into the genome. In specific embodiments, the heterologous nucleic acid is a recombinant construct, vector, or expression cassette comprising one or more nucleic acids. In other embodiments, a transgenic plant is produced by a genetic engineering method, such as Agrobacterium transformation. Through gene technology, the heterologous nucleic acid is stably integrated into chromosomes, so that the next generation can also be transgenic. As used herein, “transgenic” and grammatical variations thereof also encompass biological treatments, which include plant hybridization and/or natural recombination.
[0096] Transformation results in a transformed plant, including whole plants, as well as plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, propagules, embryos, and progeny of the same. Plant cells can be differentiated or undifferentiated (e.g., callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells, pollen).
Transformation may result in stable or transient incorporation of the nucleic acid into the cell. "Stable transformation" is intended to mean that the nucleotide construct introduced into a host cell integrates into the genome of the host cell and is capable of being inherited by the progeny thereof. "Transient transformation" is intended to mean that a polynucleotide is introduced into the host cell and does not integrate into the genome of the host cell.
[0097] Methods for transformation typically involve introducing a nucleotide construct into a plant. In some embodiments, the transformation method is an Agrobacterium-mediated transformation. In some embodiments, the transformation method is a biolistic-mediated transformation. Transformation may also be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound-mediated methods, PEG-mediated methods, calcium phosphate co-precipitation, the polycation DMSO technique, the DEAE dextran procedure, Agrobacterium- and viral-mediated methods (e.g., Caulimoviruses, Geminiviruses, RNA plant viruses), liposome-mediated methods, and the like.
[0098] Transformation protocols, as well as protocols for introducing polypeptides or polynucleotide sequences into plants, may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Methods for transformation are known in the art and include those set forth in U.S. Patent Nos. 8,575,425; 7,692,068; 8,802,934; and 7,541,517; each of which is herein incorporated by reference. See, also, Rakoczy-Trojanowska, M. (2002) Cell Mol Biol Lett. 7:849-858; Jones et al. (2005) Plant Methods, Vol. 1, Article 5; Rivera et al. (2012) Physics of Life Reviews 9:308-345; Bartlett et al. (2008) Plant Methods 4:1-12; Bates, G.W. (1999) Methods in Molecular Biology 111:359-366; Binns and Thomashow (1988) Annual Reviews in Microbiology 42:575-606; Christou, P. (1992) The Plant Journal 2:275-281; Christou, P. (1995) Euphytica 85:13-27; Tzfira et al. (2004) TRENDS in Genetics 20:375-383; Yao et al. (2006) Journal of Experimental Botany 57:3737-3746; Zupan and Zambryski (1995) Plant Physiology 107:1041-1047.
[0099] Methods for the transformation of chloroplasts are known in the art. See, for example, Svab et al. (1990) Proc. Natl. Acad. Sci. USA 87(21):8526-8530; Svab and Maliga (1993) Proc. Natl. Acad. Sci. USA 90(3):913-917; Staub and Maliga (1993) EMBO J. 12(2):601-606. The method relies on particle gun delivery of DNA containing a selectable marker and targeting of the DNA to the plastid genome through homologous recombination. Additionally, plastid transformation can be accomplished by transactivation of a silent plastid-borne transgene by tissue-preferred expression of a nuclear-encoded and plastid-directed RNA polymerase. Such a system has been reported in McBride et al. (1994) Proc. Natl. Acad. Sci. USA 91(15):7301-7305.
[0100] The cells that have been transformed may be grown into plants in accordance with conventional methods. See, for example, McCormick et al. (1986) Plant Cell Reports 5:81-84. These plants may then be grown and either pollinated with the same transformed strain or with different strains, and the resulting hybrids having constitutive expression of the desired phenotypic characteristic identified. Two or more generations may be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited, and seeds then harvested to ensure that expression of the desired phenotypic characteristic has been achieved. In this manner, the present invention provides a transformed seed (also referred to as "transgenic seed") having a nucleotide construct of the invention, for example, an expression cassette of the invention, stably incorporated into its genome.
2. Crossing
[0101] In some embodiments, the method comprises crossing a donor plant comprising a polynucleotide encoding a polypeptide disclosed herein with a recipient plant, and the polypeptide is able to confer increased protein content, increased oil content, and/or modified oil profile in the recipient plant. As used herein, the terms “crossing” and “breeding” refer to the fusion of gametes to produce progeny (e.g., by fertilization, such as to produce seed by pollination in plants). In some embodiments, a “cross,” “breeding,” or “cross-fertilization” is fertilization of one individual by another (e.g., cross-pollination in plants). The plant disclosed herein may be a whole plant, or may be a plant cell, seed, or tissue, or a plant part such as a leaf, stem, pollen, or cell that can be cultivated into a whole plant.
[0102] In some embodiments, a progeny plant created by the crossing or breeding process is repeatedly crossed back to one of its parents through a process referred to herein as “backcrossing.” In a backcrossing scheme, the “donor” parent refers to the parental plant with the desired gene or locus to be introgressed. The “recipient” parent (used one or more times) or “recurrent” parent (used two or more times) refers to the parental plant into which the gene or locus is being introgressed. For example, see Ragot, M. et al. Marker-assisted Backcrossing: A Practical Example, in Techniques et Utilisations des Marqueurs Moleculaires Les Colloques, Vol. 72, pp. 45-56 (1995); and Openshaw et al., Marker-assisted Selection in Backcross Breeding, in Proceedings of the Symposium “Analysis of Molecular Marker Data,” pp. 41-43 (1994). The initial cross gives rise to the F1 generation. The term “BC1” refers to the second use of the recurrent parent, “BC2” refers to the third use of the recurrent parent, and so on.
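Following the nomenclature above, in which the initial cross produces the F1 and BC1 denotes the second use of the recurrent parent, the expected share of the recurrent-parent genome grows as the remaining donor share is halved each generation: 1/2 in the F1, 3/4 after BC1, 7/8 after BC2, and in general 1 - (1/2)^(n+1) after BCn. The sketch below assumes no selection and ignores linkage drag around the introgressed locus, both of which shift real outcomes; the function name is hypothetical.

```python
from fractions import Fraction


def expected_recurrent_parent_share(bc_generation: int) -> Fraction:
    """Expected recurrent-parent genome fraction after BCn (n = 0 is the F1).

    Assumes unselected Mendelian segregation; marker-assisted background
    selection can raise the realized value faster than this expectation.
    """
    if bc_generation < 0:
        raise ValueError("generation must be >= 0")
    return 1 - Fraction(1, 2 ** (bc_generation + 1))


for n in range(4):
    label = "F1" if n == 0 else f"BC{n}"
    print(label, expected_recurrent_parent_share(n))
# F1 1/2
# BC1 3/4
# BC2 7/8
# BC3 15/16
```

Exact fractions are used so that the halving pattern is visible without floating-point rounding.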
[0103] In some embodiments, the donor soybean plant is a Glycine max plant. In some embodiments, the donor soybean plant is a Glycine soja plant. In some embodiments, the recipient soybean plant is an elite Glycine max plant or an elite Glycine soja plant. In some embodiments, the donor plant is from soy variety Suinong 14 (SN14). In some embodiments, the donor plant is the soy variety Glycine soja ZYD0006.
3. Gene editing
[0104] In some embodiments, the polynucleotide sequences provided herein can be targeted to specific sites within the genome of a recipient plant cell. Such methods include, but are not limited to, meganucleases designed against the plant genomic sequence of interest; CRISPR-Cas9, TALENs, and other technologies for precise editing of genomes (Feng et al. Cell Research 23:1229-1232, 2013; WO 2013/026740); Cre-lox site-specific recombination; FLP-FRT recombination (Li et al. (2009) Plant Physiol 151:1087-1095); Bxb1-mediated integration (Yau et al. Plant J (2011) 701:147-166); zinc-finger mediated integration (Wright et al. (2005) Plant J 44:693-705; Cai et al. (2009) Plant Mol Biol 69:699-709); homologous recombination (Lieberman-Lazarovich and Levy (2011) Methods Mol Biol: 51-65); prime editing and transposases (Anzalone, A. et al., Nat Biotechnol. 2020 Jul;38(7):824-844); translocation; and inversion.
[0105] Various embodiments of the methods described herein use gene editing. In some embodiments, gene editing is used to mutagenize the genome of a plant to produce plants having one or more of the polypeptides that is able to confer increased protein content, increased oil content, and/or modified oil profile.
[0106] In some embodiments, provided herein are plants transformed with and expressing gene-editing machinery as described above, which, when crossed with a target plant, result in gene editing in the target plant.
[0107] In general, gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems. Gene editing may involve genomic integration or episomal presence of the gene editing components or systems.
[0108] Gene editing generally refers to the use of a site-directed nuclease (including but not limited to CRISPR/Cas, zinc fingers, meganucleases, and the like) to cut a nucleotide sequence at a desired location. This may be to cause an insertion/deletion (“indel”) mutation (i.e., “SDN1”), a base edit (i.e., “SDN2”), or allele insertion or replacement (i.e., “SDN3”). SDN2 or SDN3 gene editing may comprise the provision of one or more recombination templates (e.g., in a vector) comprising a gene sequence of interest that can be used for homology-directed repair (HDR) within the plant (i.e., to be introduced into the plant genome). In some embodiments, the gene or allele of interest is one that is able to confer to the plant an improved trait, e.g., increased protein content, increased oil content, and/or modified oil profile. The recombination template can be introduced into the plant to be edited either through transformation or through breeding with a donor plant comprising the recombination template. Breaks in the plant genome may be introduced within, upstream, and/or downstream of a target sequence. In some embodiments, a double-strand DNA break is made within or near the target sequence locus. In some embodiments, breaks are made upstream and downstream of the target sequence locus, which may lead to its excision from the genome. In some embodiments, one or more single-strand DNA breaks (nicks) are made within, upstream, and/or downstream of the target sequence (e.g., using a nickase Cas9 variant). Any of these DNA breaks, as well as those introduced via other methods known to one of skill in the art, may induce HDR.
Through HDR, the target sequence is replaced by the sequence of the provided recombination template comprising a polynucleotide of interest. For example, any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 22, or 24-59, may be provided on or as a template. By designing the system such that one or more single-strand or double-strand breaks are introduced within, upstream, and/or downstream of the corresponding region in the genome of a plant not comprising the gene sequence of interest, this region can be replaced with the template. In some embodiments, the polynucleotide of interest is operably linked to a promoter, and expression of the polynucleotide of interest under the control of the promoter confers increased protein content, increased oil content, and/or modified oil profile to the plant. In some embodiments, the promoter is a native promoter, or an active variant or fragment thereof as described above. In some embodiments, the native promoter comprises SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, 21, 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137.
[0109] In some embodiments, mutations in the genes of interest described herein may be generated without the use of a recombination template via targeted introduction of DNA double-strand breaks. Such breaks may be repaired through the process of non-homologous end joining (NHEJ), which can result in small insertions or deletions (indels) at the repair site. Such indels may lead to frameshift mutations causing premature stop codons or other types of loss-of-function mutations in the targeted genes.
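The frameshift mechanism described for NHEJ repair can be illustrated with a toy coding sequence. The sketch below assumes the standard genetic code (stop codons TAA, TAG, and TGA) and uses a made-up sequence rather than any sequence of this disclosure; it removes one base, as a 1-bp NHEJ deletion might, and locates the first in-frame stop codon before and after the edit. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(indel_length: int) -> bool:
    """An indel whose length is not a multiple of 3 shifts the downstream reading frame."""
    return indel_length % 3 != 0


def delete_bases(seq: str, position: int, length: int) -> str:
    """Return seq with `length` bases removed starting at 0-based `position`."""
    return seq[:position] + seq[position + length:]


def first_stop_codon(cds: str):
    """0-based start of the first in-frame stop codon read from position 0, or None."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


wild_type = "ATGATAAGCCCGGGCTAA"        # ATG ATA AGC CCG GGC TAA (toy example)
mutant = delete_bases(wild_type, 3, 1)  # 1-bp deletion after the start codon

print(is_frameshift(1))             # True
print(first_stop_codon(wild_type))  # 15 (natural stop)
print(first_stop_codon(mutant))     # 3 (premature stop)
```

The out-of-frame TAA at positions 4 to 6 of the wild-type sequence is harmless until the deletion shifts it into the reading frame, which is the situation the paragraph describes.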
[0110] In some embodiments, gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems in the target plant. Gene editing may also involve genomic integration or episomal presence of the gene editing components or systems in the target plant.
[0111] In certain embodiments, the nucleic acid modification or mutation is effected by a (modified) zinc-finger nuclease (ZFN) system. The ZFN system uses artificial restriction enzymes generated by fusing a zinc finger DNA-binding domain to a DNA-cleavage domain; the zinc finger domain can be engineered to target desired DNA sequences. Exemplary methods of genome editing using ZFNs can be found, for example, in U.S. Patent Nos. 6,534,261; 6,607,882; 6,746,838; 6,794,136; 6,824,978; 6,866,997; 6,933,113; and 6,979,539.
[0112] In certain embodiments, the nucleic acid modification is effected by a (modified) meganuclease. Meganucleases are endodeoxyribonucleases characterized by a large recognition site (double-stranded DNA sequences of 12 to 40 base pairs). Exemplary methods for using meganucleases can be found in U.S. Patent Nos. 8,163,514; 8,133,697; 8,021,867; 8,119,361; 8,119,381; 8,124,369; and 8,129,134, which are specifically incorporated by reference.
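The value of a 12 to 40 bp recognition site lies in its rarity. Under the simplifying assumption of a random genome with equal base frequencies and independent positions, the expected number of chance occurrences of one specific site of length L is about 2(G - L + 1)/4^L when both strands are counted, where G is the genome size. The sketch below applies this to an assumed genome size of roughly 1.1 Gb, a round figure used only for illustration; it gives roughly 130 chance matches for a 12-bp site but about 0.03 for an 18-bp site. Real genomes are not random, so these are order-of-magnitude estimates, and the function name is hypothetical.

```python
def expected_random_sites(site_length: int, genome_size_bp: int,
                          both_strands: bool = True) -> float:
    """Expected chance matches of one fixed site in a uniform random genome.

    Simplifications: equal base frequencies, independent positions, and a
    palindromic site would be double-counted when both strands are included.
    """
    positions = max(genome_size_bp - site_length + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** site_length


GENOME_BP = 1_100_000_000  # assumed, illustrative genome size (~1.1 Gb)
for length in (12, 18, 24, 40):
    print(length, f"{expected_random_sites(length, GENOME_BP):.3g}")
```

Each additional base in the recognition site divides the expected number of chance matches by four, which is why long sites can be effectively unique in a large genome.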
[0113] In certain embodiments, the nucleic acid modification is effected by a (modified) CRISPR/Cas complex or system. In certain embodiments, the CRISPR/Cas system or complex is a class 2 CRISPR/Cas system. In certain embodiments, said CRISPR/Cas system or complex is a type II, type V, or type VI CRISPR/Cas system or complex. The CRISPR/Cas system does not require the generation of customized proteins to target specific sequences; rather, a single Cas protein can be programmed by an RNA guide (gRNA) to recognize a specific nucleic acid target. In other words, the Cas enzyme protein can be recruited to a specific nucleic acid target locus of interest (which may comprise or consist of RNA and/or DNA) using said short RNA guide.
[0114] In general, the CRISPR/Cas or CRISPR system, as used herein, refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene and one or more of: a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is used herein (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and, where applicable, trans-activating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)), or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
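To make the notion of a target sequence concrete: for Streptococcus pyogenes Cas9 (SpCas9), a target consists of a protospacer of about 20 nt lying immediately 5' of a 5'-NGG-3' protospacer-adjacent motif (PAM), and either DNA strand can be targeted. The sketch below scans a sequence for such sites on both strands. It is a simplified illustration (no off-target scoring, no handling of ambiguous bases), the 20-nt length is an assumption that varies between guide designs, and the function names are hypothetical rather than part of any particular guide-design tool.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_targets(seq: str, spacer_len: int = 20):
    """Return (strand, start, protospacer, pam) for each protospacer followed by NGG.

    `start` is the 0-based forward-strand coordinate of the leftmost base of the
    protospacer+PAM footprint, so minus-strand hits can be located on the input.
    """
    seq = seq.upper()
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})([ACGT]GG))")
    footprint = spacer_len + 3
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for m in pattern.finditer(strand_seq):
            start = m.start()
            if strand == "-":
                # Map the reverse-complement position back to forward coordinates.
                start = len(seq) - (start + footprint)
            hits.append((strand, start, m.group(1), m.group(2)))
    return hits


example = "AAAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAAA"
print(find_spcas9_targets(example))
# [('+', 5, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

The guide sequence would then be designed to match the reported protospacer, with the PAM itself left out of the guide.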
[0115] In certain embodiments, the gRNA is a chimeric guide RNA or single guide RNA (sgRNA). In certain embodiments, the gRNA comprises a guide sequence and a tracr mate sequence (or direct repeat). In certain embodiments, the gRNA comprises a guide sequence, a tracr mate sequence (or direct repeat), and a tracr sequence. In certain embodiments, the CRISPR/Cas system or complex as described herein does not comprise and/or does not rely on the presence of a tracr sequence (e.g., if the Cas protein is Cas12a).
[0116] The Cas protein as referred to herein, such as but not limited to Cas9, Cas12a (formerly referred to as Cpf1), Cas12b (formerly referred to as C2c1), Cas13a (formerly referred to as C2c2), C2c3, or Cas13b protein, may originate from any suitable source, and hence may include different orthologues originating from a variety of (prokaryotic) organisms, as is well documented in the art. In certain embodiments, the Cas protein is (modified) Cas9, preferably (modified) Staphylococcus aureus Cas9 (SaCas9) or (modified) Streptococcus pyogenes Cas9 (SpCas9). In certain embodiments, the Cas protein is Cas12a, optionally from Acidaminococcus sp., such as Acidaminococcus sp. BV3L6 Cpf1 (AsCas12a), or Lachnospiraceae bacterium Cas12a, such as Lachnospiraceae bacterium MA2020 or Lachnospiraceae bacterium MD2006 (LbCas12a). See U.S. Pat. No. 10,669,540, incorporated herein by reference in its entirety. Alternatively, the Cas12a protein may be from Moraxella bovoculi AAX08_00205 [Mb2Cas12a] or Moraxella bovoculi AAX11_00205 [Mb3Cas12a]. See WO 2017/189308, incorporated herein by reference in its entirety. In certain embodiments, the Cas protein is (modified) C2c2, preferably Leptotrichia wadei C2c2 (LwC2c2) or Listeria newyorkensis FSL M6-0635 C2c2 (LbFSLC2c2). In certain embodiments, the (modified) Cas protein is C2c1. In certain embodiments, the (modified) Cas protein is C2c3. In certain embodiments, the (modified) Cas protein is Cas13b. Other Cas enzymes are available to a person skilled in the art.
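Cas12a differs from Cas9 in where its PAM sits: AsCas12a and LbCas12a recognize a T-rich 5'-TTTV-3' motif (V = A, C, or G) located 5' of the protospacer rather than 3' of it, and, as noted above, Cas12a does not rely on a tracr sequence. A minimal forward-strand scan under those assumptions is sketched below; the 23-nt spacer length is an assumed, commonly used value rather than a requirement, and the function name is hypothetical.

```python
import re


def find_cas12a_targets(seq: str, spacer_len: int = 23):
    """Forward-strand hits as (pam_start, pam, protospacer) for a 5'-TTTV PAM.

    The protospacer is the `spacer_len` bases immediately 3' of the PAM.
    """
    seq = seq.upper()
    # TTT followed by A, C or G (V), then the protospacer; lookahead keeps overlaps.
    pattern = re.compile(rf"(?=(TTT[ACG])([ACGT]{{{spacer_len}}}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


example = "GG" + "TTTA" + "C" * 23 + "GG"
print(find_cas12a_targets(example))
```

Scanning the minus strand would follow the same reverse-complement approach used for Cas9 targets.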
[0117] Gene editing methods and compositions are also disclosed in US Pat Nos. 10,519,456 and 10,285,34882, the entire content of which is herein incorporated by reference.
[0118] The gene-editing machinery (e.g., the DNA modifying enzyme) introduced into the plants can be controlled by any promoter that can drive recombinant gene expression in plants. In some embodiments, the promoter is a constitutive promoter. In some embodiments, the promoter is a tissue-specific promoter, e.g., a pollen-specific promoter, a sperm cell-specific promoter, a zygote-specific promoter, or a promoter that is highly expressed in sperm, eggs, and zygotes (e.g., prOsActin1). Suitable promoters are disclosed in U.S. Pat. No. 10,519,456, the entire content of which is herein incorporated by reference.
[0119] In another aspect, provided herein is a method of editing plant genomic DNA. In some embodiments, the method comprises using a first soybean plant expressing a DNA modification enzyme and at least one optional guide nucleic acid as described above to pollinate a target plant comprising genomic DNA to be edited.
V. Stacking
[0120] The various polynucleotides and variants thereof provided herein can be stacked with one or more polynucleotides encoding a desirable trait such as a polynucleotide that confers, for example, insect, disease or herbicide resistance or other desirable agronomic traits of interest including, but not limited to, traits associated with high oil content; increased digestibility; balanced amino acid content; and high energy content. Such traits may refer to properties of both seed and non-seed plant tissues, or to food or feed prepared from plants or seeds having such traits.
[0121] As used herein, gene or trait “stacking” is combining desired genes or traits into one transgenic plant line. As one approach, plant breeders stack transgenic traits by making crosses between parents that each have a desired trait and then identifying offspring that have both of these desired traits (so-called “breeding stacks”). Another way to stack genes is by transferring two or more genes into the cell nucleus of a plant at the same time during transformation. Another way to stack genes is by re-transforming a transgenic plant with another gene of interest. For example, gene stacking can be used to combine two different insect resistance traits, an insect resistance trait and a disease resistance trait, or an insect resistance trait and an herbicide resistance trait (such as, for example, Bt11). The use of a selectable marker in addition to a gene of interest would also be considered gene stacking.
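For a breeding stack, the number of progeny to screen follows from simple Mendelian expectations. Assuming each parent is homozygous for a different trait, that the loci are unlinked, and that there is no segregation distortion, the F1 is heterozygous at each locus and a fraction (1/4)^k of F2 plants is homozygous for all k stacked traits (1/16 for two traits). The sketch below computes that fraction and the smallest F2 population giving a chosen probability of recovering at least one such plant. The function names are hypothetical, and linked loci would instead require a calculation that accounts for recombination frequency.

```python
import math
from fractions import Fraction


def f2_fraction_homozygous_all(num_loci: int) -> Fraction:
    """Expected F2 fraction homozygous for every one of `num_loci` unlinked traits."""
    return Fraction(1, 4) ** num_loci


def f2_plants_needed(num_loci: int, confidence: float = 0.95) -> int:
    """Smallest F2 population with P(at least one all-homozygous plant) >= confidence."""
    if num_loci < 1:
        raise ValueError("need at least one locus")
    p = float(f2_fraction_homozygous_all(num_loci))
    # Solve 1 - (1 - p)**n >= confidence for the smallest integer n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))


print(f2_fraction_homozygous_all(2))  # 1/16
print(f2_plants_needed(2))            # 47
```

The required population grows roughly fourfold with each additional unlinked locus, which is one practical reason stacks are often assembled stepwise or with marker-assisted selection.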
[0122] In some embodiments, a nucleic acid molecule or vector of the disclosure can include an additional coding sequence for one or more polypeptides or double-stranded RNA molecules (dsRNA) of interest for agronomic traits that are primarily of benefit to a seed company, grower, or grain processor. A polypeptide of interest can be any polypeptide encoded by a nucleotide sequence of interest. Non-limiting examples of polypeptides of interest that are suitable for production in plants include those resulting in agronomically important traits such as herbicide resistance (also sometimes referred to as “herbicide tolerance”), virus resistance, bacterial pathogen resistance, insect resistance, nematode resistance, or fungal resistance. See, e.g., U.S. Patent Nos. 5,569,823; 5,304,730; 5,495,071; 6,329,504; and 6,337,431. The polypeptide also can be one that increases plant vigor or yield (including traits that allow a plant to grow at different temperatures, soil conditions, and levels of sunlight and precipitation), or one that allows identification of a plant exhibiting a trait of interest (e.g., a selectable marker, seed coat color, relative maturity group, etc.). Various polypeptides of interest, as well as methods for introducing these polypeptides into a plant, are described, for example, in U.S. Patent Nos. 4,761,373; 4,769,061; 4,810,648; 4,940,835; 4,975,374; 5,013,659; 5,162,602; 5,276,268; 5,304,730; 5,495,071; 5,554,798; 5,561,236; 5,569,823; 5,767,366; 5,879,903; 5,928,937; 6,084,155; 6,329,504; and 6,337,431; as well as U.S. Patent Publication No. 2001/0016956.
[0123] Polynucleotides conferring resistance/tolerance to an herbicide that inhibits the growing point or meristem, such as an imidazolinone or a sulfonylurea, can also be suitable in some embodiments. Exemplary polynucleotides in this category code for mutant ALS and AHAS enzymes as described, e.g., in U.S. Patent Nos. 5,767,366 and 5,928,937. U.S. Patent Nos. 4,761,373 and 5,013,659 are directed to plants resistant to various imidazolinone or sulfonamide herbicides. U.S. Patent No. 4,975,374 relates to plant cells and plants containing a nucleic acid encoding a mutant glutamine synthetase (GS) resistant to inhibition by herbicides that are known to inhibit GS, e.g., phosphinothricin and methionine sulfoximine. U.S. Patent No. 5,162,602 discloses plants resistant to inhibition by cyclohexanedione and aryloxyphenoxypropanoic acid herbicides. The resistance is conferred by an altered acetyl coenzyme A carboxylase (ACCase).
[0124] Polypeptides encoded by nucleotide sequences conferring resistance to glyphosate are also suitable for the disclosure. See, e.g., U.S. Patent No. 4,940,835 and U.S. Patent No. 4,769,061. U.S. Patent No. 5,554,798 discloses transgenic glyphosate-resistant maize plants, in which resistance is conferred by an altered 5-enolpyruvyl-3-phosphoshikimate (EPSP) synthase gene.
[0125] Polynucleotides coding for resistance to phosphono compounds such as glufosinate ammonium or phosphinothricin, and to pyridinoxy or phenoxy propionic acids and cyclohexones, are also suitable. See European Patent Application No. 0242246. See also U.S. Patent Nos. 5,879,903, 5,276,268, and 5,561,236.
[0126] Other suitable polynucleotides include those coding for resistance to herbicides that inhibit photosynthesis, such as a triazine and a benzonitrile (nitrilase). See U.S. Patent No. 4,810,648. Additional suitable polynucleotides coding for herbicide resistance include those coding for resistance to 2,2-dichloropropionic acid, sethoxydim, haloxyfop, imidazolinone herbicides, sulfonylurea herbicides, triazolopyrimidine herbicides, s-triazine herbicides, and bromoxynil. Also suitable are polynucleotides conferring resistance to a protox enzyme, or that provide enhanced resistance to plant diseases; enhanced tolerance of adverse environmental conditions (abiotic stresses) including but not limited to drought, excessive cold, excessive heat, excessive soil salinity, or extreme acidity or alkalinity; and alterations in plant architecture or development, including changes in developmental timing. See, e.g., U.S. Patent Publication No. 2001/0016956 and U.S. Patent No. 6,084,155.
[0127] Additional suitable polynucleotides include those coding for insecticidal polypeptides. These polypeptides may be produced in amounts sufficient to control, for example, insect pests (i.e., insect-controlling amounts). It is recognized that the amount of an insecticidal polypeptide a plant must produce to control insects or other pests may vary depending upon the cultivar, type of pest, environmental factors, and the like. Polynucleotides useful for additional insect or pest resistance include, for example, those that encode toxins identified in Bacillus organisms. Polynucleotides comprising nucleotide sequences encoding Bacillus thuringiensis (Bt) Cry proteins from several subspecies have been cloned, and recombinant clones have been found to be toxic to lepidopteran, dipteran, and/or coleopteran insect larvae. Examples of such Bt insecticidal proteins include the Cry proteins such as Cry1Aa, Cry1Ab, Cry1Ac, Cry1B, Cry1C, Cry1D, Cry1Ea, Cry1Fa, Cry3A, Cry9A, Cry9B, Cry9C, and the like, as well as vegetative insecticidal proteins such as Vip1, Vip2, Vip3, and the like. A full list of Bt-derived proteins can be found in the Bacillus thuringiensis Toxin Nomenclature Database maintained by the University of Sussex on the World Wide Web (see also Crickmore et al. (1998) Microbiol. Mol. Biol. Rev. 62:807-813).
[0128] In some embodiments, an additional polypeptide is an insecticidal polypeptide derived from a non-Bt source, including without limitation an alpha-amylase, a peroxidase, a cholesterol oxidase, a patatin, a protease, a protease inhibitor, a urease, an alpha-amylase inhibitor, a pore-forming protein, a chitinase, a lectin, an engineered antibody or antibody fragment, a Bacillus cereus insecticidal protein, a Xenorhabdus spp. (such as X. nematophila or X. bovienii) insecticidal protein, a Photorhabdus spp. (such as P. luminescens or P. asymbiotica) insecticidal protein, a Brevibacillus spp. (such as B. laterosporus) insecticidal protein, a Lysinibacillus spp. (such as L. sphaericus) insecticidal protein, a Chromobacterium spp. (such as C. subtsugae or C. piscinae) insecticidal protein, a Yersinia spp. (such as Y. entomophaga) insecticidal protein, a Paenibacillus spp. (such as P. propylaea) insecticidal protein, a Clostridium spp. (such as C. bifermentans) insecticidal protein, a Pseudomonas spp. (such as P. fluorescens) insecticidal protein, and a lignin.
[0129] Polypeptides that are suitable for production in plants further include those that improve or otherwise facilitate the conversion of harvested plants or plant parts into a commercially useful product, including, for example, increased or altered carbohydrate content or distribution, improved fermentation properties, increased oil content, increased protein content, modified oil profile, improved digestibility, and increased nutraceutical content, e.g., increased phytosterol content, increased tocopherol content, increased stanol content, or increased vitamin content. Polypeptides of interest also include, for example, those resulting in or contributing to a reduced content of an unwanted component in a harvested crop, e.g., phytic acid, or sugar-degrading enzymes. By “resulting in” or “contributing to” is intended that the polypeptide of interest can directly or indirectly contribute to the existence of a trait of interest (e.g., increasing cellulose degradation using a heterologous cellulase enzyme).
[0130] In some embodiments, the polypeptide contributes to improved digestibility for food or feed. Xylanases are hemicellulolytic enzymes that improve the breakdown of plant cell walls, which leads to better utilization of the plant nutrients by an animal. This leads to improved growth rate and feed conversion. Also, the viscosity of the feeds containing xylan can be reduced. Heterologous production of xylanases in plant cells also can facilitate lignocellulosic conversion to fermentable sugars in industrial processing.
[0131] Numerous xylanases from fungal and bacterial microorganisms have been identified and characterized (see, e.g., U.S. Patent No. 5,437,992; Coughlin et al. (1993) “Proceedings of the Second TRICEL Symposium on Trichoderma reesei Cellulases and Other Hydrolases” Espoo; Suominen and Reinikainen, eds. (1993) Foundation for Biotechnical and Industrial Fermentation Research 8:125-135; U.S. Patent Publication No. 2005/0208178; and PCT Publication No. WO 03/16654). In particular, three specific xylanases (XYL-I, XYL-II, and XYL-III) have been identified in T. reesei (Tenkanen et al. (1992) Enzyme Microb. Technol. 14:566; Torronen et al. (1992) Bio/Technology 10:1461; and Xu et al. (1998) Appl. Microbiol. Biotechnol. 49:718).
[0132] In other embodiments, a polypeptide useful for the disclosure can be a polysaccharide-degrading enzyme. Plants of this disclosure producing such an enzyme may be useful for generating, for example, fermentation feedstocks for bioprocessing. In some embodiments, enzymes useful for a fermentation process include alpha-amylases, proteases, pullulanases, isoamylases, cellulases, hemicellulases, xylanases, cyclodextrin glycosyltransferases, lipases, phytases, laccases, oxidases, esterases, cutinases, granular starch hydrolyzing enzyme, and other glucoamylases.
[0133] Polysaccharide-degrading enzymes include: starch-degrading enzymes such as α-amylases (EC 3.2.1.1), glucuronidases (EC 3.2.1.131); exo-1,4-α-D-glucanases such as amyloglucosidases and glucoamylase (EC 3.2.1.3), β-amylases (EC 3.2.1.2), α-glucosidases (EC 3.2.1.20), and other exo-amylases; starch-debranching enzymes, such as a) isoamylase (EC 3.2.1.68), pullulanase (EC 3.2.1.41), and the like; b) cellulases such as exo-1,4-β-cellobiohydrolase (EC 3.2.1.91), exo-1,3-β-D-glucanase (EC 3.2.1.39), and β-glucosidase (EC 3.2.1.21); c) L-arabinases, such as endo-1,5-α-L-arabinase (EC 3.2.1.99), α-arabinosidases (EC 3.2.1.55), and the like; d) galactanases such as endo-1,4-β-D-galactanase (EC 3.2.1.89), endo-1,3-β-D-galactanase (EC 3.2.1.90), α-galactosidase (EC 3.2.1.22), β-galactosidase (EC 3.2.1.23), and the like; e) mannanases, such as endo-1,4-β-D-mannanase (EC 3.2.1.78), β-mannosidase (EC 3.2.1.25), α-mannosidase (EC 3.2.1.24), and the like; f) xylanases, such as endo-1,4-β-xylanase (EC 3.2.1.8), β-D-xylosidase (EC 3.2.1.37), 1,3-β-D-xylanase, and the like; and g) other enzymes such as α-L-fucosidase (EC 3.2.1.51), α-L-rhamnosidase (EC 3.2.1.40), levanase (EC 3.2.1.65), inulanase (EC 3.2.1.7), and the like. In one embodiment, the α-amylase is the synthetic α-amylase Amy797E, described in U.S. Patent No. 8,093,453, herein incorporated by reference in its entirety.
[0134] Further enzymes that may be used with the disclosure include proteases, such as fungal and bacterial proteases. Fungal proteases include, but are not limited to, those obtained from Aspergillus, Trichoderma, Mucor, and Rhizopus, such as A. niger, A. awamori, A. oryzae, and M. miehei. In some embodiments, the polypeptides of this disclosure can be cellobiohydrolase (CBH) enzymes (EC 3.2.1.91). In one embodiment, the cellobiohydrolase enzyme can be CBH1 or CBH2.
[0135] Other enzymes useful with the disclosure include, but are not limited to, hemicellulases, such as mannanases and arabinofuranosidases (EC 3.2.1.55); ligninases; lipases (e.g., EC 3.1.1.3), glucose oxidases, pectinases, xylanases, transglucosidases, and alpha-1,6-glucosidases (e.g., EC 3.2.1.20); esterases such as ferulic acid esterase (EC 3.1.1.73) and acetyl xylan esterases (EC 3.1.1.72); and cutinases (e.g., EC 3.1.1.74).
[0136] In some embodiments, two or more polynucleotides encoding two or more polypeptides, each conferring modified oil content and/or an altered lipid profile when recombinantly expressed in a plant, are stacked in a plant using methods disclosed herein. The resulting genetically modified plant has modified oil content and/or an altered lipid profile relative to a control plant, where the control plant does not recombinantly express the two or more polynucleotides. In some embodiments, the two or more polynucleotides are expressed in the plant under the control of two or more heterologous promoters. In one illustrative example, a polynucleotide encoding GmDESI (SEQ ID NO: 76, which corresponds to SEQ ID NO: 1 of PCT/CN2022/075982) and a polynucleotide encoding GmSTART (SEQ ID NO: 3, which corresponds to SEQ ID NO: 3 of PCT/CN2022/075977) are stacked in a transgenic soybean plant, resulting in an altered lipid profile and modified total oil content in seeds as compared to a control plant that does not recombinantly express both GmDESI and GmSTART. Stacking the two polynucleotides increases oleic acid content but decreases linoleic acid and linolenic acid content in the seeds. In addition, myristic acid, stearic acid, palmitic acid, and eicosadienoic acid were all reduced in the transgenic plants. See Example 29.
[0137] Double-stranded RNA molecules useful with the disclosure include but are not limited to those that suppress target insect genes. As used herein, the words "gene suppression", when taken together, are intended to refer to any of the well-known methods for reducing the levels of protein produced as a result of gene transcription to mRNA and subsequent translation of the mRNA. Gene suppression is also intended to mean the reduction of protein expression from a gene or a coding sequence, including posttranscriptional gene suppression and transcriptional suppression.
Posttranscriptional gene suppression is mediated by homology between all or part of an mRNA transcribed from a gene or coding sequence targeted for suppression and the corresponding double-stranded RNA used for suppression, and refers to the substantial and measurable reduction of the amount of mRNA available in the cell for binding by ribosomes. The transcribed RNA can be in the sense orientation to effect what is called co-suppression, in the antisense orientation to effect what is called antisense suppression, or in both orientations, producing a dsRNA, to effect what is called RNA interference (RNAi). Transcriptional suppression is mediated by the presence in the cell of a dsRNA, a gene suppression agent, exhibiting substantial sequence identity to a promoter DNA sequence or the complement thereof to effect what is referred to as promoter trans suppression. Gene suppression may be effective against a native plant gene associated with a trait, e.g., to provide plants with reduced levels of a protein encoded by the native gene or with enhanced or reduced levels of an affected metabolite. Gene suppression can also be effective against target genes in plant pests that may ingest or contact plant material containing gene suppression agents specifically designed to inhibit or suppress the expression of one or more homologous or complementary sequences in the cells of the pest.
Such genes targeted for suppression can encode an essential protein, the predicted function of which is selected from the group consisting of muscle formation, juvenile hormone formation, juvenile hormone regulation, ion regulation and transport, digestive enzyme synthesis, maintenance of cell membrane potential, amino acid biosynthesis, amino acid degradation, sperm formation, pheromone synthesis, pheromone sensing, antennae formation, wing formation, leg formation, development and differentiation, egg formation, larval maturation, digestive enzyme formation, hemolymph synthesis, hemolymph maintenance, neurotransmission, cell division, energy metabolism, respiration, and apoptosis.
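As an illustrative sketch only (not an assay disclosed herein), the homology requirement for posttranscriptional suppression can be screened computationally by listing the 21-nucleotide stretches that a dsRNA trigger shares with a candidate target transcript. Both strands of the trigger are checked because a dsRNA carries both. The choice of k = 21 is an assumption reflecting a typical siRNA length, matches are exact (no mismatch tolerance), and the function names are hypothetical.

```python
# Minimal k-mer screen for sequence shared between a dsRNA trigger and a target.
# Assumptions: uppercase DNA alphabet (A, C, G, T), exact matches only, and
# k = 21 as a typical siRNA length. Function names are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(trigger: str, target: str, k: int = 21) -> set[str]:
    """k-mers of the target that occur on either strand of the dsRNA trigger."""
    trigger_kmers = kmers(trigger, k) | kmers(reverse_complement(trigger), k)
    return kmers(target, k) & trigger_kmers


if __name__ == "__main__":
    target = "ATGGCCTCCAACGGCGCTGATGCTCAAAAGCTTAGG"  # arbitrary illustrative sequence
    trigger = reverse_complement(target[:25])      # trigger given as the antisense strand
    print(len(shared_kmers(trigger, target)), "shared 21-mers")
```

A trigger with no shared 21-mers would be a poor candidate for suppressing that target under this simplified criterion; real designs also weigh off-target matches elsewhere in the genome.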
[0138] As used herein, “selectable marker” means a nucleotide sequence that, when expressed, imparts a distinct phenotype to the plant, plant part, and/or plant cell expressing the marker and thus allows such transformed plants, plant parts, and/or plant cells to be distinguished from those that do not have the marker. Such a nucleotide sequence may encode either a selectable or screenable marker, depending on whether the marker confers a trait that can be selected for by chemical means, such as by using a selective agent (e.g., an antibiotic, herbicide, or the like), or whether the marker is simply a trait that one can identify through observation or testing, such as by screening (e.g., the R-locus trait). Selectable markers can also include the markers associated with oil and/or protein content and fatty acid profile (e.g., as described in Whiting, R.M., et al., BMC Plant Biol. 2020 Oct 23;20(1):485).
VI. Marker-assisted selection of plants with improved traits.
[0139] In addition to phenotypic traits, the genetic characteristics of a plant, as represented by its genetic marker profile, can be used to select plants with desired traits. The term “marker-based selection” refers to the use of genetic markers to detect one or more nucleic acids from the plant, where the nucleic acid is associated with a desired trait, to identify plants that carry genes for desirable (or undesirable) traits. Markers include but are not limited to Restriction Fragment Length Polymorphisms (RFLPs), Randomly Amplified Polymorphic DNAs (RAPDs), Arbitrarily Primed Polymerase Chain Reaction (AP-PCR), DNA Amplification Fingerprinting (DAF), Sequence Characterized Amplified Regions (SCARs), Amplified Fragment Length Polymorphisms (AFLPs), Simple Sequence Repeats (SSRs), which are also referred to as microsatellites, and Single Nucleotide Polymorphisms (SNPs). There are known sets of public markers that are being examined by ASTA and other industry groups for their applicability in standardizing determinations of what constitutes an essentially derived variety under the US Plant Variety Protection Act. However, these standard markers do not limit the type of marker and marker profile that can be employed in breeding or developing backcross conversions, in distinguishing varieties, plant parts, or plant cells, or in verifying a progeny pedigree. Primers and PCR protocols for assaying these and other markers are disclosed in SoyBase (sponsored by the USDA Agricultural Research Service and Iowa State University), located on the World Wide Web at 129.186.26.94/SSR.html.
[0140] The term “associated with” as used herein refers to a recognizable and/or detectable relationship between two entities. For example, the phrase “associated with increased protein content” refers to a trait, locus, gene, allele, marker, phenotype, etc., or the expression product thereof, the presence or absence of which can influence or indicate an extent and/or degree to which a plant or its progeny exhibits increased protein content as compared to a control plant. As such, a marker is “associated with” a trait when it is linked to it and when the presence of the marker is an indicator of whether and/or to what extent the desired trait or trait form will occur in a plant/germplasm comprising the marker. Similarly, a marker is “associated with” an allele when it is linked to it and when the presence (or absence) of the marker is an indicator of whether the allele is present (or absent) in a plant, germplasm, or population comprising the marker. For example, “a marker associated with increased protein content” refers to a marker whose presence or absence can be used to predict whether and/or to what extent a plant will display increased protein content as compared to a control plant.
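One common way (among many) to check whether a single marker is "associated with" a trait is to compare trait values of plants that carry the marker with those that lack it. The sketch below, which is illustrative and not the method of this disclosure, computes the mean difference (the "extent") and Welch's t statistic using only the Python standard library. The protein values are invented for illustration; a p-value would additionally require the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def marker_effect(with_marker: list[float], without_marker: list[float]) -> tuple[float, float]:
    """Return (mean difference, Welch's t) for a trait split by marker presence.

    Each list holds one trait value per plant (e.g., seed protein content, %).
    Welch's t does not assume equal variances in the two groups.
    """
    diff = mean(with_marker) - mean(without_marker)
    std_err = sqrt(variance(with_marker) / len(with_marker)
                   + variance(without_marker) / len(without_marker))
    return diff, diff / std_err


# Hypothetical seed protein values (%), for illustration only.
carriers = [42.0, 43.0, 44.0]
non_carriers = [39.0, 40.0, 41.0]
difference, t_stat = marker_effect(carriers, non_carriers)
print(f"difference = {difference:.2f} %, Welch t = {t_stat:.2f}")
```

In practice a marker would be evaluated across many plants and environments, and linkage with the causal locus, not just a single test, determines how well the marker predicts the trait.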
[0141] The term “allele(s)” refers to any of one or more alternative forms of a gene, all of which alleles relate to at least one trait or characteristic. In a diploid cell, the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes.
[0142] The term “genotype” and variants thereof refer to the genetic composition of an organism, including, for example, whether a diploid organism is heterozygous (i.e., has two different alleles for a given gene or QTL) or homozygous (i.e., has the same allele for a given gene or QTL) for one or more genes or loci (e.g., an SNP, a haplotype, a gene mutation, an insertion, or a deletion).
[0143] In one embodiment, the markers used to identify the plants comprising the polynucleotides disclosed herein are SNPs. Non-limiting examples of SNP genotyping methods include hybridization, primer extension, oligonucleotide ligation, nuclease cleavage, minisequencing, and coded spheres. Such methods are well known and disclosed in, e.g., Gut, I.G., Hum. Mutat. 17:475-492 (2001); Shi, Clin. Chem. 47(2):164-172 (2001); Kwok, Pharmacogenomics 1(1):95-100 (2000); and Bhattramakki and Rafalski, Discovery and application of single nucleotide polymorphism markers in plants, in PLANT GENOTYPING: THE DNA FINGERPRINTING OF PLANTS, CABI Publishing, Wallingford (2001). A wide range of commercially available technologies utilize these and other methods to interrogate SNPs, including Masscode™ (Qiagen, Germantown, MD), (Hologic, Madison, WI), (Applied Biosystems, Foster City, CA), (Applied Biosystems, Foster City, CA), and Beadarrays™ (Illumina, San Diego, CA).
[0144] In some embodiments, an assay (e.g., generally a two-step allelic discrimination assay or similar), a KASP™ assay (generally a one-step allelic discrimination assay defined below or similar), or both can be employed to identify the SNPs that associate with increased protein content, increased oil content, and/or modified oil profile. In an exemplary two-step assay, a forward primer, a reverse primer, and two assay probes that recognize two different alleles at the SNP site (or hybridization oligos) are employed. The forward and reverse primers are employed to amplify genetic loci that comprise SNPs that are associated with increased protein content, increased oil content, and/or modified oil profile. The particular nucleotides that are present at the SNP positions are then assayed using the probes. In some embodiments, the assay probes and the reaction conditions are designed such that an assay probe will only hybridize to the reverse complement of a 100% perfectly matched sequence, thereby permitting identification of which allele(s) are present based upon detection of hybridizations. In some embodiments, the probes are differentially labeled with, for example, fluorophores to permit distinguishing between the two assay probes in a single reaction. Exemplary methods of amplifying include employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm as a template in the PCR or LCR.
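With two differentially labeled allele-specific probes, each well yields one signal per allele, and the genotype follows from their relative strength. The sketch below is a simplified illustration under stated assumptions: background-corrected end-point signals are available, and fixed, hypothetical cutoffs stand in for the cluster analysis that genotyping software normally performs against control samples.

```python
from typing import Optional


def call_genotype(x_signal: float, y_signal: float,
                  alleles: tuple[str, str] = ("C", "T"),
                  min_total: float = 0.2,
                  homozygous_cutoff: float = 0.8) -> Optional[str]:
    """Call a diploid genotype from two allele-specific probe signals.

    x_signal and y_signal are background-corrected end-point fluorescence of
    the probes matching alleles[0] and alleles[1]. The cutoffs are
    hypothetical. Returns e.g. "CC", "CT", "TT", or None for a failed well.
    """
    allele_x, allele_y = alleles
    total = x_signal + y_signal
    if total < min_total:
        return None  # too little signal: no template or failed amplification
    frac_x = x_signal / total
    if frac_x >= homozygous_cutoff:
        return allele_x + allele_x  # only the allele-X probe hybridized
    if frac_x <= 1 - homozygous_cutoff:
        return allele_y + allele_y  # only the allele-Y probe hybridized
    return allele_x + allele_y  # both probes hybridized: heterozygous


# Hypothetical end-point readings for SNP #1 of Table 3 (alleles C/T).
wells = {"plant_01": (0.92, 0.05), "plant_02": (0.48, 0.51),
         "plant_03": (0.04, 0.88), "no_template": (0.03, 0.02)}
for plant, (x, y) in wells.items():
    print(plant, call_genotype(x, y))
```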
[0145] In some embodiments, a number of SNP alleles together within a sequence, or across linked sequences, can be used to describe a haplotype for any particular genotype. See, e.g., Ching et al., BMC Genet. 3:19 (2002) (14 pages); Gupta et al., Curr. Sci. 80:524-535 (2001); Rafalski, Plant Sci. 162:329-333 (2002). In some cases, haplotypes can be more informative than single SNPs and can be more descriptive of any particular genotype. For example, a single SNP may be allele “T” for a specific disease-resistant line or variety, but the allele “T” might also occur in the soybean breeding population being utilized for recurrent parents. In this case, a combination of alleles at linked SNPs may be more informative. Once a unique haplotype has been assigned to a donor chromosomal region, that haplotype can be used in that population or any subset thereof to determine whether an individual has a particular gene. The use of automated high-throughput marker detection platforms known to those of ordinary skill in the art makes this process highly efficient and effective.
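The "T" example in the preceding paragraph can be made concrete with a small sketch. It assumes inbred (homozygous) lines, so each SNP is recorded as a single allele; the line names and allele calls are hypothetical. A marker set is diagnostic for the donor region only if no recurrent parent shares the donor's combination of alleles.

```python
def haplotype(calls: dict[str, str], snp_ids: list[str]) -> str:
    """Concatenate the alleles at the given linked SNPs, in order."""
    return "".join(calls[snp] for snp in snp_ids)


def is_diagnostic(snp_ids: list[str], donor: dict[str, str],
                  recurrent_parents: dict[str, dict[str, str]]) -> bool:
    """True if no recurrent parent carries the donor's haplotype at these SNPs."""
    target = haplotype(donor, snp_ids)
    return all(haplotype(calls, snp_ids) != target
               for calls in recurrent_parents.values())


# Hypothetical homozygous lines scored at three linked SNPs.
donor = {"snp1": "T", "snp2": "C", "snp3": "G"}
recurrent_parents = {
    "RP1": {"snp1": "T", "snp2": "A", "snp3": "G"},
    "RP2": {"snp1": "C", "snp2": "C", "snp3": "G"},
}

print(is_diagnostic(["snp1"], donor, recurrent_parents))                  # False: RP1 also carries "T"
print(is_diagnostic(["snp1", "snp2", "snp3"], donor, recurrent_parents))  # True: "TCG" is unique
```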
[0146] The term “haplotype” can refer to the set of alleles an individual inherited from one parent. A diploid individual thus has two haplotypes. The term “haplotype” can be used in a more limited sense to refer to physically linked and/or unlinked genetic markers (e.g., sequence polymorphisms) associated with a phenotypic trait. The phrase “haplotype block” (sometimes also referred to in the literature simply as a haplotype) refers to a group of two or more genetic markers that are physically linked on a single chromosome (or a portion thereof). Typically, each block has a few common haplotypes, and a subset of the genetic markers (i.e., a “haplotype tag”) can be chosen that uniquely identifies each of these haplotypes.
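Choosing a haplotype tag can be framed as finding the smallest subset of SNP columns whose alleles still separate every common haplotype in a block. The sketch below uses an exhaustive search, which is simple and exact but grows exponentially with block size; it is an illustration only, and the example haplotypes are hypothetical rather than those of Tables 4-6.

```python
from itertools import combinations
from typing import Optional


def find_tag_snps(haplotypes: dict[str, str]) -> Optional[tuple[int, ...]]:
    """Smallest set of SNP columns whose alleles uniquely identify every haplotype.

    `haplotypes` maps a haplotype name to its allele string; all strings have
    the same length, one character per SNP in the block. Returns None when two
    haplotypes are identical and therefore cannot be separated.
    """
    n_snps = len(next(iter(haplotypes.values())))
    for size in range(1, n_snps + 1):
        for cols in combinations(range(n_snps), size):
            keys = {tuple(seq[c] for c in cols) for seq in haplotypes.values()}
            if len(keys) == len(haplotypes):
                return cols  # first (smallest, lowest-index) subset that works
    return None


# Hypothetical haplotypes over a four-SNP block.
block = {"H1": "CTCA", "H2": "CTTA", "H3": "TCCA"}
print(find_tag_snps(block))  # (0, 2): the first and third SNPs suffice
```

For larger blocks, greedy or linkage-based tag selection is the usual compromise, at the cost of no longer guaranteeing the smallest tag set.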
[0147] Exemplary markers that are associated with and can be used to identify plants having increased protein content and/or increased oil content are shown in Tables 3-6.
Table 3. SNP sites in Glyma.06G303700
SNP# Position (bp) Alleles
1 454 C/T
2 1144 T/C
3 1165 C/T
4 1890 C/T
5 5790 C/A
6 6827 T/G
7 6941 C/G
8 6977 G/A
9 7102 G/A
10 7206 T/A
11 7212 G/A
12 7239 G/C
13 7255 A/G
14 7852 C/T
15 7988 G/T
16 8202 A/G
17 8315 C/A
18 8366 T/C
19 8393 C/T
20 8396 T/C
[0148] The 20 SNPs shown in Table 3 can be divided into three blocks using the Haploview 4.2 software. The SNPs within each block showed strong linkage disequilibrium. Block 1 contains SNPs #1-#5, of which SNP #4 is located in the coding sequence (CDS). Block 2 contains SNPs #7-#18 (12 SNPs), among which SNPs #7 and #8 are located in the CDS. Block 3 contains SNPs #19 and #20, both of which are outside the CDS.
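Linkage disequilibrium between two biallelic SNPs is commonly summarized by D' and r². The sketch below computes point estimates of both from phased two-SNP haplotypes using the standard definitions (D = p(A1B1) - p(A1)p(B1), normalized by its maximum for D', and by the allele-frequency product for r²). It is an illustration only: it does not reproduce Haploview's block definitions, which rely on additional criteria such as confidence bounds on D', and the example haplotypes are hypothetical.

```python
def ld_stats(haplotypes: list[str]) -> tuple[float, float]:
    """Return (D', r^2) for phased haplotypes written as two-letter strings.

    The first letter is the allele at SNP A and the second the allele at
    SNP B, e.g. "CT". Both SNPs must be biallelic in the sample.
    """
    n = len(haplotypes)
    alleles_a = sorted({h[0] for h in haplotypes})
    alleles_b = sorted({h[1] for h in haplotypes})
    if len(alleles_a) != 2 or len(alleles_b) != 2:
        raise ValueError("both SNPs must be biallelic in this sample")
    a1, b1 = alleles_a[0], alleles_b[0]

    p_a1 = sum(h[0] == a1 for h in haplotypes) / n
    p_b1 = sum(h[1] == b1 for h in haplotypes) / n
    p_a1b1 = sum(h == a1 + b1 for h in haplotypes) / n

    d = p_a1b1 - p_a1 * p_b1  # deviation from independent assortment
    if d >= 0:
        d_max = min(p_a1 * (1 - p_b1), (1 - p_a1) * p_b1)
    else:
        d_max = min(p_a1 * p_b1, (1 - p_a1) * (1 - p_b1))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a1 * (1 - p_a1) * p_b1 * (1 - p_b1))
    return d_prime, r2


# Hypothetical phased haplotypes for SNP #19 (C/T) and SNP #20 (T/C) of Table 3.
sample = ["CT"] * 6 + ["TC"] * 5 + ["CC"]
print(ld_stats(sample))
```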
[0149] SNP genotyping reveals seven different haplotypes that are associated with increased protein content and/or increased oil content. Tables 4-6 show the genotype of each haplotype.
Table 4. Block 1 haplotypes
Table 5. Block 2 haplotypes
Table 6. Block 3 haplotypes
[0150] As shown in the examples, haplotypes Hap_2, Hap_3, and Hap_6 were found to be associated with increased protein content, and haplotypes Hap_1, Hap_2, Hap_5, and Hap_7 were found to be associated with increased oil content. Hap_2 was associated with both increased oil content and increased protein content. See FIG. 68.
[0151] These SNP markers can be used in a marker-assisted breeding program to move traits, such as native traits, traits conferred by transgenes, or traits conferred by genome editing, into a desired plant background. As used herein, the term “native trait” refers to a trait already existing in germplasm, including wild relatives of crop species, or that can be produced by the recombination of existing traits. For example, progeny plants from a cross between a donor soybean plant comprising in its genome a nucleic acid sequence encoding SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or a fragment or variant of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, 139, 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and a recipient soybean plant not comprising said nucleic acid sequence can be screened to detect the presence of the markers associated with increased protein content, increased oil content, and/or modified oil profile. Plants comprising said markers can be selected and verified for increased protein content, increased oil content, and/or modified oil profile as compared to control plants. In some embodiments, the donor plant comprises a nucleic acid sequence encoding SEQ ID NO: 76. In some embodiments, the donor plant comprises a nucleic acid sequence encoding SEQ ID NO: 3 and the markers listed in Table 3. In some embodiments, the markers that can be used to select plants having increased protein content are the alleles associated with one or more of haplotypes Hap_2, Hap_3, or Hap_6. In some embodiments, the markers that can be used to select plants having increased oil content are the alleles associated with one or more of haplotypes Hap_1, Hap_2, Hap_5, or Hap_7. The favorable alleles of the SNPs are those present in one or more of the aforementioned haplotypes.
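A marker-assisted screen of progeny can be sketched as counting, for each plant, how many copies of the favorable allele it carries at every SNP in a block and keeping plants above a threshold. In the sketch below, the SNP IDs and allele pairs are those of Block 3 in Table 3, but the "favorable" alleles are hypothetical placeholders, because the allele content of Tables 4-6 is not reproduced in this text. Calls are unphased diploid genotypes, so a dosage of 1 does not guarantee that the favorable alleles lie on the same chromosome.

```python
# HYPOTHETICAL favorable alleles for the two Block 3 SNPs of Table 3.
FAVORABLE_BLOCK3 = {
    "SNP19": "T",  # position 8393 bp, alleles C/T (Table 3)
    "SNP20": "C",  # position 8396 bp, alleles T/C (Table 3)
}


def favorable_dosage(genotype: dict[str, str], favorable: dict[str, str]) -> int:
    """Minimum number of favorable alleles (0, 1, or 2) across the block's SNPs.

    `genotype` maps SNP ID to an unphased diploid call such as "CT".
    A value of 2 means homozygous favorable at every SNP in the block.
    """
    return min(genotype[snp].count(allele) for snp, allele in favorable.items())


def select_progeny(progeny: dict[str, dict[str, str]], favorable: dict[str, str],
                   min_dosage: int = 2) -> list[str]:
    """Names of progeny meeting the favorable-allele dosage threshold."""
    return [name for name, genotype in progeny.items()
            if favorable_dosage(genotype, favorable) >= min_dosage]


# Hypothetical progeny genotype calls.
progeny = {
    "F2-001": {"SNP19": "TT", "SNP20": "CC"},
    "F2-002": {"SNP19": "CT", "SNP20": "TC"},
    "F2-003": {"SNP19": "CC", "SNP20": "TT"},
}
print(select_progeny(progeny, FAVORABLE_BLOCK3))                # ['F2-001']
print(select_progeny(progeny, FAVORABLE_BLOCK3, min_dosage=1))  # ['F2-001', 'F2-002']
```

Selected plants would still be verified phenotypically for protein content, oil content, and/or oil profile, as described above.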
VII. Assays, kits, and primers
[0152] Also provided herein are kits and primers that can be used to introduce a polynucleotide sequence as described in this disclosure into a recipient plant or to detect a polynucleotide sequence as described in this disclosure in a plant.
[0153] Also provided herein are kits and primers that can be used to identify plants that have increased protein content, increased oil content, and/or modified oil profile. As a non-limiting example, the primers can include Glyma.20G092400-zF, ATGGCCTCCAACGGCG (SEQ ID NO: 37); and Glyma.20G092400-zR, AGCCGAAAGAAGAGCACAAGTAAACC (SEQ ID NO: 38). As another non-limiting example, the primers can include Glyma.06G303700-F, ATAACTAGTATGTTCCAGCCGAACC (SEQ ID NO: 63); and Glyma.06G303700-R, ATAGGATCCAGCAGGTTCACCAGA (SEQ ID NO: 64).
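Before ordering primers such as those listed above, a few in-silico checks are routine. The sketch below reports length, GC fraction, a rough melting temperature, and the positions of a few restriction recognition sequences. The Tm uses the basic GC-content approximation (Tm = 64.9 + 41 × (nGC − 16.4) / N), which ignores salt and primer concentration, and the enzyme list is an illustrative assumption; for example, the script will report ACTAGT (the SpeI site) near the 5' end of SEQ ID NO: 63, although this text does not state whether such sites were intended as cloning sites.

```python
RESTRICTION_SITES = {  # illustrative subset of recognition sequences
    "BamHI": "GGATCC",
    "SpeI": "ACTAGT",
    "EcoRI": "GAATTC",
}


def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    return sum(base in "GC" for base in primer) / len(primer)


def rough_tm(primer: str) -> float:
    """Basic GC-content melting-temperature estimate, in degrees Celsius."""
    n_gc = sum(base in "GC" for base in primer)
    return 64.9 + 41 * (n_gc - 16.4) / len(primer)


def find_sites(primer: str) -> dict[str, list[int]]:
    """0-based start positions of each listed recognition sequence."""
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        positions = [i for i in range(len(primer) - len(site) + 1)
                     if primer[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


PRIMERS = {
    "Glyma.20G092400-zF": "ATGGCCTCCAACGGCG",
    "Glyma.20G092400-zR": "AGCCGAAAGAAGAGCACAAGTAAACC",
    "Glyma.06G303700-F": "ATAACTAGTATGTTCCAGCCGAACC",
    "Glyma.06G303700-R": "ATAGGATCCAGCAGGTTCACCAGA",
}

for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
          f"Tm ~{rough_tm(seq):.1f} C, sites {find_sites(seq)}")
```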
[0154] Also provided herein are kits and primers that can be used to detect the expression level of the polypeptide disclosed herein in plants. As a non-limiting example, the primers can include Glyma.20G092400-q-F, CTGATGCTCAAAAGCTTAGGACCCG (SEQ ID NO: 100); and Glyma.20G092400-q-R, AACCTTGTTGTAAACCTGACGAGAAAT (SEQ ID NO: 101) (Table 14). As another non-limiting example, the primers can include Glyma.06G303700-q-F, AGTTGCACCGATTCAACAGGC (SEQ ID NO: 65); and Glyma.06G303700-q-R, CCATGCGATGTGGTTCCATCT (SEQ ID NO: 66).
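Quantitative PCR with primer pairs such as these is often summarized as relative expression by the 2^-ΔΔCt method, which normalizes the target gene's Ct to a reference gene and then to a control sample. The sketch below is one common analysis, not necessarily the one used in the examples; it assumes both primer pairs amplify with roughly 100% efficiency, and the reference gene and Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(target_sample: list[float], ref_sample: list[float],
                        target_control: list[float], ref_control: list[float]) -> float:
    """Fold change of a target gene by the 2^-ddCt method.

    Each argument is a list of Ct values (e.g., technical replicates).
    Assumes target and reference primer pairs both amplify at ~100% efficiency.
    """
    delta_sample = mean(target_sample) - mean(ref_sample)     # normalize to reference gene
    delta_control = mean(target_control) - mean(ref_control)
    return 2 ** -(delta_sample - delta_control)               # normalize to control sample


# Hypothetical Ct values: transgenic line vs. control plant, target vs. reference gene.
fold = relative_expression([24.1, 23.9], [18.0, 18.0], [26.0, 26.0], [18.0, 18.0])
print(f"{fold:.2f}-fold")
```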
[0155] In some embodiments, the kit may also comprise one or more probes having a sequence corresponding to or complementary to a sequence having 80% to 100% sequence identity with a specific region of the transgenic event or gene editing event. In some embodiments, the kit may comprise any reagent and material required to perform the assay or detection method.
EXEMPLARY EMBODIMENTS
[0156] As used below, any reference to a series of embodiments is to be understood as a reference to each of those embodiments disjunctively (e.g., "Embodiments 1-4" is to be understood as "Embodiments 1, 2, 3, or 4").
[0157] Embodiment A1 is an elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, and wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile on the elite Glycine max plant as compared to a control plant not comprising said nucleic acid sequence.
[0158] Embodiment A2 is the elite Glycine max plant of embodiment A1, wherein the donor Glycine plant is a Glycine soja plant or a Glycine max plant.
[0159] Embodiment A3 is the elite Glycine max plant of embodiment A2, wherein the Glycine soja plant is a ZYD00006 variety.
[0160] Embodiment A4 is the elite Glycine max plant of embodiment A2, wherein the Glycine max plant is a DN50 variety or a SN14 variety.
[0161] Embodiment A5 is the elite Glycine max plant of any one of embodiments A1-A4, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139.
[0162] Embodiment A6 is the elite Glycine max plant of any one of embodiments A1-A4, wherein the nucleic acid sequence has at least 90% identity, at least 95% identity, or 100% identity to any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or the nucleic acid sequence has at least 90% identity, at least 95% identity, or 100% identity to any one of SEQ ID NOs: 75, 78, 81, 84, 87, 111, 114, 117, 120, 123, 126, 129, 132, 135, or 138.
[0163] Embodiment A7 is the elite Glycine max plant of any one of embodiments A1-A6, wherein the polypeptide encoded by the nucleic acid sequence has at least 90% identity or at least 95% identity to SEQ ID NO: 3, wherein the polypeptide comprises an aminotransferase domain, and wherein the aminotransferase domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 91-274 of SEQ ID NO: 76.
[0164] Embodiment A8 is the elite Glycine max plant of any one of embodiments A1-A7, wherein said nucleic acid sequence is introduced into said plant genome by genome editing of genomic sequences corresponding to and comprising any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, wherein the genome editing confers increased protein content, increased oil content, and/or modified oil profile.
[0165] Embodiment A9 is the elite Glycine max plant of embodiment A8, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
[0166] Embodiment A10 is the elite Glycine max plant of any one of embodiments A1-A6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of (a) a nucleic acid sequence having at least 90% identity or at least 95% identity to any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, (b) a nucleic acid sequence encoding a polypeptide having at least 90% identity or at least 95% identity to the sequence of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or (c) a nucleic acid sequence encoding a polypeptide having the sequence of any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile on the elite Glycine max plant.
[0167] Embodiment A11 is the elite Glycine max plant of any one of embodiments A1-A10, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
[0168] Embodiment A12 is a plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
[0169] Embodiment A13 is the plant of embodiment A12, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the nucleic acid sequence set forth in any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137.
[0170] Embodiment A14 is the plant of embodiment A12 or A13, wherein the genome editing comprises duplication, inversion, promoter modification, terminator modification and/or splicing modification of the nucleic acid sequence.
[0171] Embodiment A15 is the plant of any one of embodiments A12-A14, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
[0172] Embodiment A16 is the plant of any one of embodiments A12-A15, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
[0173] Embodiment A17 is the plant of any one of embodiments A12-A16, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.
[0174] Embodiment A18 is the plant of embodiment A17, wherein the heterologous promoter is a native promoter or active variant or fragment thereof.
[0175] Embodiment A19 is a plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having an amino acid sequence that has at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or an amino acid sequence set forth in SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content, increased oil content, and/or modified oil profile as compared to a control plant.
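Several embodiments turn on percent identity thresholds (at least 85%, 90%, or 95%), but this section does not state which alignment algorithm or denominator is used to compute identity. The sketch below assumes a Needleman-Wunsch global alignment with illustrative scores (match +1, mismatch -1, linear gap -1) and divides the number of identical aligned positions by the alignment length. Tools such as BLAST or EMBOSS Needle may use other scoring matrices and denominators, so their values can differ. The function names and toy sequences are hypothetical and are not taken from the sequence listing.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty (illustrative scores)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned positions divided by alignment length, as a percentage."""
    if not query or not reference:
        raise ValueError("sequences must be non-empty")
    aln_q, aln_r = global_align(query.upper(), reference.upper())
    identical = sum(1 for x, y in zip(aln_q, aln_r) if x == y and x != "-")
    return 100.0 * identical / len(aln_q)


def meets_threshold(query: str, reference: str, threshold: float = 90.0) -> bool:
    """True if the query reaches the given percent identity to the reference."""
    return percent_identity(query, reference) >= threshold


if __name__ == "__main__":
    # Toy peptides differing at one of eight positions: 7/8 identical.
    print(percent_identity("MKTAYIAK", "MKTAFIAK"))
```

Running the file prints 87.5, so this toy pair would satisfy an 85% threshold but not a 90% one under these assumed scoring rules.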
[0176] Embodiment A20 is the plant of embodiment A19, wherein the nucleic acid sequence comprises at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or the nucleic acid sequence is any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137.
[0177] Embodiment A21 is the plant of embodiment A19 or A20, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.
[0178] Embodiment A22 is the plant of embodiment A19 or A20, wherein the nucleic acid sequence is introduced into the genome by genome editing.
[0179] Embodiment A23 is the plant of embodiment A22, wherein the promoter is an endogenous promoter.
[0180] Embodiment A24 is the plant of any one of embodiments A19-A23, wherein the promoter is a constitutive promoter, an inducible promoter, or a tissue-specific promoter.
[0181] Embodiment A25 is the plant of any one of embodiments A19-A24, wherein the plant is a dicot plant.
[0182] Embodiment A26 is the plant of embodiment A25, wherein the dicot plant is a soybean plant or an elite soybean plant.
[0183] Embodiment A27 is the plant of any one of embodiments A19-A24, wherein the plant is a monocot plant.
[0184] Embodiment A28 is the plant of embodiment A27, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
[0185] Embodiment A29 is the plant of any one of embodiments A19-A28, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
[0186] Embodiment A30 is a progeny plant from the elite Glycine max plant of any one of embodiments A1-A11 or the plant of any one of embodiments A12-A29, wherein the progeny plant has stably incorporated into its genome the nucleic acid sequence.
[0187] Embodiment A31 is a plant cell, seed, or plant part derived from the elite Glycine max plant of any one of embodiments A1-A11 or the plant of any one of embodiments A12-A29, wherein said plant cell, seed or plant part has stably incorporated into its genome the nucleic acid sequence.
[0188] Embodiment A32 is a harvest product derived from the elite Glycine max plant of any one of embodiments A1-A11 or the plant of any one of embodiments A12-A29.
[0189] Embodiment A33 is a processed product derived from the harvest product of embodiment A32, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.
[0190] Embodiment A34 is a method of producing a soybean plant having increased polypeptide and/or oil content, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence confers to said donor soybean plant increased protein content, increased oil content, and/or modified oil profile compared to donor Glycine plant, b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by detecting the presence of the nucleic acid sequence, or the presence of one or more molecular markers associated with the nucleic acid sequence in the progeny plant, thereby producing a soybean plant having increased protein content, increased oil content, and/or modified oil profile.
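Step c) of embodiment A34 selects progeny by detecting the nucleic acid sequence or an associated molecular marker. A minimal sketch of that selection step follows, assuming each progeny plant has already been genotyped at a single biallelic SNP and that the donor allele at that SNP is known. The allele, plant identifiers, and genotype calls are hypothetical placeholders, not data from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Progeny:
    plant_id: str
    genotype: tuple[str, str]  # diploid call at the marker, e.g. ("T", "C")


def select_carriers(progeny: list[Progeny], donor_allele: str, homozygous_only: bool = False) -> list[str]:
    """Return IDs of progeny carrying the donor allele at the (hypothetical) marker.

    With homozygous_only=True, only plants with two copies of the donor allele are kept.
    """
    selected = []
    for plant in progeny:
        copies = plant.genotype.count(donor_allele)
        if copies == 2 or (copies == 1 and not homozygous_only):
            selected.append(plant.plant_id)
    return selected


if __name__ == "__main__":
    population = [
        Progeny("F2-001", ("T", "T")),
        Progeny("F2-002", ("T", "C")),
        Progeny("F2-003", ("C", "C")),
    ]
    print(select_carriers(population, "T"))
    print(select_carriers(population, "T", homozygous_only=True))
```

Whether heterozygous carriers are kept is a breeding-scheme choice, for example keeping them in early backcross generations and fixing the allele later. The flag exposes that choice rather than assuming one.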
[0191] Embodiment A35 is the method of embodiment A34, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
[0192] Embodiment A36 is the method of embodiment A34 or A35, wherein either the donor or the recipient soybean plant is an elite Glycine max plant.
[0193] Embodiment A37 is a method of producing a Glycine max plant with increased protein content, increased oil content, and/or modified oil profile, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with a nucleic acid sequence comprising any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, wherein said nucleic acid sequence confers to the Glycine max plant increased protein content, increased oil content, and/or modified oil profile; c) selecting a Glycine max plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said molecular marker associated with increased polypeptide and/or increased oil content.
[0194] Embodiment A38 is the method of embodiment A37, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
[0195] Embodiment A39 is the method of embodiment A38, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.
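Embodiments A39 and A40 describe amplifying a marker locus, for example by PCR, and detecting the resulting amplicon. For illustration only, the amplicon a primer pair would be expected to produce can be predicted in silico by locating the forward primer on the template and then the reverse complement of the reverse primer downstream of it. This sketch assumes exact primer matches on a single strand and ignores mismatches, melting temperature, and the actual primers listed in Tables 14 and 17-19. The template and primers are made-up sequences.

```python
from typing import Optional

# Translation table for base complementation (upper- and lowercase).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def predict_amplicon(template: str, fwd: str, rev: str) -> Optional[str]:
    """Predict the PCR product of an exact-match primer pair, or None if either site is absent."""
    template = template.upper()
    fwd = fwd.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    # The reverse primer anneals to the top strand as its reverse complement.
    rev_site = reverse_complement(rev.upper())
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


if __name__ == "__main__":
    template = "GGATCCAAGCTTGCATGCCTGCAGGTCGACTCTAGA"  # hypothetical locus
    product = predict_amplicon(template, "AAGCTTGCAT", "GAGTCGAC")
    print(product, len(product))
```

For the toy template this prints `AAGCTTGCATGCCTGCAGGTCGACTC 26`. In a wet-lab assay the amplicon would instead be detected by, for example, gel electrophoresis or a probe.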
[0196] Embodiment A40 is the method of embodiment A39, wherein the amplifying comprises employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm as a template in the PCR or LCR.
[0197] Embodiment A41 is the method of embodiment A39, wherein the nucleic acid is selected from DNA or RNA.
[0198] Embodiment A42 is a plant produced by the method of any one of embodiments A34-A41.
[0199] Embodiment A43 is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid molecule operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, or (ii) an amino acid sequence set forth in SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content, increases oil content, and/or modifies oil profile compared to a control plant not expressing said nucleic acid sequence.
[0200] Embodiment A44 is the method of embodiment A43, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.
[0201] Embodiment A45 is the method of embodiment A44, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content, increased oil content, and/or modified oil profile.
[0202] Embodiment A46 is the method of embodiment A45, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.
[0203] Embodiment A47 is the method of embodiment A45, wherein the method comprises Cas12a-mediated gene replacement.
[0204] Embodiment A48 is the method of any one of embodiments A43-A47, wherein the promoter is an exogenous promoter.
[0205] Embodiment A49 is the method of any of embodiments A43-A47, wherein the promoter is an endogenous promoter.
[0206] Embodiment A50 is the method of any one of embodiments A43-A49, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
[0207] Embodiment A51 is the method of any one of embodiments A43-A50, wherein the plant is a dicot plant.
[0208] Embodiment A52 is the method of embodiment A51, wherein the dicot plant is a soybean plant.
[0209] Embodiment A53 is the method of any one of embodiments A43-A51, wherein the plant is a monocot plant.
[0210] Embodiment A54 is the method of embodiment A53, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
[0211] Embodiment A55 is a plant produced by the method of any one of embodiments A43-A54.
[0212] Embodiment A56 is a polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139, or any portion thereof, wherein the portion confers increased polypeptide and/or oil content, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139, and having substitution and/or deletion and/or addition of one or more amino acid residues, wherein expression of the polypeptide confers increased polypeptide and/or oil content on the plant; (c) a polypeptide having more than 99%, more than 95%, more than 90%, more than 85%, or more than 80% identity with the amino acid sequence of SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139, wherein the polypeptide when expressed in a plant confers increased polypeptide and/or oil content on the plant; or (d) a fusion polypeptide comprising the amino acid sequence of SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136 or 139, or the polypeptide as defined in any one of (a) to (c).
[0213] Embodiment A57 is a nucleic acid molecule comprising: (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137; or (c) the nucleotide sequence of part (a) having more than 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity to any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137.
[0214] Embodiment A58 is an expression cassette comprising the nucleic acid molecule of embodiment A57 or encoding the polypeptide of embodiment A56.
[0215] Embodiment A59 is the expression cassette of embodiment A58, wherein the nucleic acid molecule is operably linked to a promoter capable of directing expression in a plant cell.
[0216] Embodiment A60 is the expression cassette of embodiment A59, wherein the promoter is an endogenous promoter.
[0217] Embodiment A61 is the expression cassette of embodiment A59, wherein the promoter is an exogenous promoter.
[0218] Embodiment A62 is the expression cassette of embodiment A61, wherein the promoter comprises pSOY1 (SEQ ID NO: 20).
[0219] Embodiment A63 is a vector comprising the nucleic acid molecule of embodiment A62, the expression cassette of any one of embodiments A56-A61, a nucleic acid molecule having the sequence set forth in SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137, or a nucleic acid sequence encoding the polypeptide having the sequence set forth in SEQ ID NO: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, or 139.
[0220] Embodiment A64 is a transgenic cell comprising the nucleic acid molecule of embodiment A63 or the expression cassette of any one of embodiments A56-A63.
[0221] Embodiment A65 is use of the polypeptide of embodiment A56, the nucleic acid molecule of embodiment A57, the expression cassette of any one of embodiments A56-A64, or the transgenic cell of embodiment A63 in conferring increased protein content, increased oil content, and/or modified oil profile.
[0222] Embodiment A66 is use of the expression cassette of any one of embodiments A56-A64 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content is increased, the oil content is increased and/or the oil profile is modified in the cell.
[0223] Embodiment A67 is a method for increasing protein content, increasing oil content, and/or modifying oil profile in a plant, comprising increasing the expression level and/or activity of the polypeptide of embodiment A56 in the plant.
[0224] Embodiment A68 is a method for producing a plant variety with increased protein content, increased oil content, and/or modified oil profile in a plant, comprising increasing the expression level and/or activity of the nucleic acid molecule of embodiment A57 in the plant.
[0225] Embodiment A69 is the method of embodiments A67 or A68, wherein the increasing the expression level and/or activity of the polypeptide in the plant is by transgenic means or by breeding.
[0226] Embodiment A70 is a method for producing a transgenic plant with increased protein content, increased oil content, and/or modified oil profile, comprising introducing the nucleic acid molecule of embodiment A57 or the expression cassette of any one of embodiments A65-A69 to a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content, increased oil content, and/or modified oil profile compared to the recipient plant.
[0227] Embodiment A71 is the method of embodiment A70, wherein the introducing the nucleic acid molecule to the recipient plant is performed by introducing the expression cassette of any one of embodiments A6-A64 into the recipient plant.
[0228] Embodiment A72 is a primer pair for amplifying the nucleic acid molecule of embodiment A57.
[0229] Embodiment A73 is the primer pair of embodiment A72, wherein the primer pair is a primer pair composed of two single-stranded DNA shown in at least one of Table 14, Table 17, Table 18, and Table 19.
[0230] Embodiment A74 is a kit comprising the primer pair of embodiment A72 or A73.
[0231] Embodiment B1. An elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to the amino acid sequence of SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile on the elite Glycine max plant.
[0232] Embodiment B2. The elite Glycine max plant of embodiment B1, wherein the donor Glycine plant is from Glycine soja or Glycine max.
[0233] Embodiment B3. The elite Glycine max plant of embodiment B2, wherein the Glycine soja is the ZYD00006 variety.
[0234] Embodiment B4. The elite Glycine max plant of embodiment B1 or B2, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in SEQ ID NO: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59.
[0235] Embodiment B5. The elite Glycine max plant of any one of embodiments B1-B3, wherein the nucleic acid sequence has at least 90%, 95% or 100% sequence identity to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a polynucleotide encoding a polypeptide having the amino acid sequence of any one of SEQ ID NOs: 22 or 24-59.
[0236] Embodiment B6. The elite Glycine max plant of embodiment B1, wherein the polypeptide encoded by the nucleic acid sequence has at least 90%, or at least 95% identity to SEQ ID NO: 3, SEQ ID NO: 5, or SEQ ID NO: 22, wherein the polypeptide comprises one or more of the following: (i) a START domain, wherein the START domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 246-466 of SEQ ID NO: 20, or (ii) a homeodomain, wherein the homeodomain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 55-113 of SEQ ID NO: 20.
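Embodiment B6 caps the number of amino acid substitutions in the START domain (residues 246-466) and in the homeodomain (residues 55-113) relative to SEQ ID NO: 20. A minimal sketch of that count follows, assuming the candidate polypeptide has already been aligned to the reference so that positions correspond one-to-one with no gaps inside the domain window. The reference sequence is not reproduced in this section, so the example uses toy sequences, and the function names are hypothetical.

```python
# Domain windows from embodiment B6 (1-based, inclusive residue numbers).
START_DOMAIN = (246, 466)
HOMEODOMAIN = (55, 113)


def count_substitutions(reference: str, variant: str, start: int, end: int) -> int:
    """Count residue differences at positions start..end (1-based, inclusive).

    Assumes reference and variant are already aligned position-by-position
    with no gaps inside the window.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    if end > len(reference) or end > len(variant):
        raise ValueError("window extends past the end of a sequence")
    return sum(1 for i in range(start - 1, end) if reference[i] != variant[i])


def within_substitution_limit(reference: str, variant: str, start: int, end: int, max_substitutions: int) -> bool:
    """True if the window has no more than max_substitutions differences."""
    return count_substitutions(reference, variant, start, end) <= max_substitutions


if __name__ == "__main__":
    ref = "A" * 10           # toy reference
    var = "AAAGAAAAAT"       # toy variant: substitutions at positions 4 and 10
    print(count_substitutions(ref, var, 1, 10))
    print(within_substitution_limit(ref, var, 1, 10, max_substitutions=2))
```

With the real sequences, the windows would be passed as `count_substitutions(ref, var, *START_DOMAIN)` and `count_substitutions(ref, var, *HOMEODOMAIN)`, and the result compared against the limits of two, five, or ten named in B6.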
[0237] Embodiment B7. The elite Glycine max plant of any one of embodiments B1-B6, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the sequence set forth in SEQ ID NO: 1, 2, 4, 7, 10, 11, 14, 16 or 17, wherein the genome editing confers increased protein content, increased oil content, and/or modified oil profile.
[0238] Embodiment B8. The elite Glycine max plant of any one of embodiments B1-B6, wherein the nucleic acid sequence is introduced by genome editing of a Glycine max genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region of at least one allele change corresponding to any of those described in any of Tables 4-6, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, wherein said one or more alleles confer in the plant increased protein and/or oil content, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein and/or oil content.
[0239] Embodiment B9. The elite Glycine max plant of embodiment B7 or B8, wherein the genomic editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
[0240] Embodiment B10. The elite Glycine max plant of any one of embodiments B1-B6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of (a) a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a polynucleotide encoding a polypeptide comprising the amino acid sequence of any one of SEQ ID NOs: 22 or 24-59, or (b) a nucleic acid sequence encoding at least one polypeptide comprising the amino acid sequence set forth in SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said polypeptide confers increased protein and/or oil content on the elite Glycine max plant.
[0241] Embodiment B11. The elite Glycine max plant of any of embodiments B1-B10, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
[0242] Embodiment B12. The elite Glycine max plant of any of embodiments B1-B10, wherein the elite Glycine max plant has in its genome at least one allele that is associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
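Embodiments B11 and B12 tie haplotypes to traits. Alleles associated with Hap_1, Hap_2, Hap_5, or Hap_7 go with increased oil content. Alleles associated with Hap_2, Hap_3, or Hap_6 go with increased protein content. Hap_2 appears in both groups. The lookup below only encodes those two lists as a screening aid. It assumes haplotype calls for a plant are already available and does not reproduce the allele definitions in Tables 3-6. The names are hypothetical.

```python
# Haplotype groups as listed in embodiments B11 (oil) and B12 (protein).
OIL_HAPLOTYPES = frozenset({"Hap_1", "Hap_2", "Hap_5", "Hap_7"})
PROTEIN_HAPLOTYPES = frozenset({"Hap_2", "Hap_3", "Hap_6"})


def expected_traits(haplotypes: set[str]) -> set[str]:
    """Traits that B11/B12 associate with any of the given haplotype calls."""
    traits = set()
    if haplotypes & OIL_HAPLOTYPES:
        traits.add("increased oil content")
    if haplotypes & PROTEIN_HAPLOTYPES:
        traits.add("increased protein content")
    return traits


if __name__ == "__main__":
    for calls in ({"Hap_2"}, {"Hap_7"}, {"Hap_6"}, {"Hap_4"}):
        print(sorted(calls), sorted(expected_traits(calls)))
```

A plant carrying Hap_2 maps to both traits. A haplotype not named in B11 or B12 maps to an empty set, which means only that these two embodiments make no claim about it.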
[0243] Embodiment B13. The elite Glycine max plant of any one of embodiments B1-B6, wherein at least one parental line of said elite Glycine max plant was selected or identified through molecular marker selection, wherein said parental line is selected or identified based on the presence of a molecular marker located within or closely linked with said nucleic acid sequence of any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, or any portion thereof, wherein said molecular marker is associated with increased protein and/or oil content and/or modified oil profile.
[0244] Embodiment B14. The elite Glycine max plant of embodiment B13, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP), or a microsatellite.
[0245] Embodiment B15. The elite Glycine max plant of embodiment B13 or B14, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 3.
[0246] Embodiment B16. The elite Glycine max plant of any one of embodiments B1-B15, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
[0247] Embodiment B17. A plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or at least one polynucleotide encoding a polypeptide comprising the amino acid sequence of any one of SEQ ID NOs: 22 or 24-59, wherein said polypeptide confers increased protein and/or oil content and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
[0248] Embodiment B18. The plant of embodiment B17, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of a nucleic acid sequence set forth in SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding any one of SEQ ID NOs: 22 or 24-59.
[0249] Embodiment B19. The plant of embodiment B17 or B18, wherein the nucleic acid sequence is introduced by genome editing of a genomic region homologous to or an ortholog of the nucleic acid sequence corresponding to SEQ ID NO: 1, and further making at least one genomic edit to said Glycine max genomic region of at least one allele change corresponding to any of those described in Table 3, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, wherein said one or more alleles confer in the plant increased protein and/or oil content, wherein said Glycine max genomic region did not comprise said allele change before genome editing, and wherein said genomic edit confers in the plant increased protein and/or oil content.
[0250] Embodiment B20. The plant of embodiment B17, wherein the nucleic acid sequence is modified in said plant genome by duplication, inversion, promoter modification, terminator modification and/or splicing modification via genome editing of a nucleic acid sequence set forth in any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding a polypeptide comprising an amino acid sequence of any one of SEQ ID NOs: 22 or 24-59.
[0251] Embodiment B21. The plant of any one of embodiments B17-B20, wherein the genomic editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
[0252] Embodiment B22. The plant of any one of embodiments B17-B21, wherein the plant has in its genome at least one allele that is associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
[0253] Embodiment B23. The plant of any one of embodiments B17-B21, wherein the plant has in its genome at least one allele that is associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
[0254] Embodiment B24. The plant of any one of embodiments B17-B23, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 3.
[0255] Embodiment B25. The plant of any one of embodiments B17-B24, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
[0256] Embodiment B26. The plant of any one of embodiments B17-B25, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.
[0257] Embodiment B27. The plant of embodiment B26, wherein the promoter is a native promoter or active variant or fragment thereof.
[0258] Embodiment B28. A plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having (a) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (b) an amino acid sequence set forth in SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content and/or increased oil content and/or modified oil profile as compared to a control plant.
[0259] Embodiment B29. The plant of embodiment B28, wherein (a) said nucleic acid sequence comprises at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, or to a polynucleotide encoding any one of SEQ ID NOs: 22 or 24-59, or (b) said nucleic acid sequence is any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or encodes any one of SEQ ID NOs: 22 or 24-59.
[0260] Embodiment B30. The plant of embodiment B28 or B29, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.
[0261] Embodiment B31. The plant of embodiment B28 or B29, wherein the nucleic acid sequence is introduced by genome editing.
[0262] Embodiment B32. The plant of any one of embodiments B28-B31, wherein the promoter is an endogenous promoter.
[0263] Embodiment B33. The plant of any one of embodiments B28-B31, wherein the promoter is a constitutive promoter, an inducible promoter, or a tissue-specific promoter.
[0264] Embodiment B34. The plant of any one of embodiments B28-B30, wherein said genomic region of the plant comprises at least one allele corresponding to one or more alleles as described in any of Tables 3-6, wherein the one or more alleles are associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, Hap_6, and/or Hap_7, and wherein said one or more alleles confer in the plant increased protein and/or oil content.
[0265] Embodiment B35. The plant of any one of embodiments B28-B34, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_1, Hap_2, Hap_5, and/or Hap_7, wherein the plant has increased oil content.
[0266] Embodiment B36. The plant of any one of embodiments B28-B34, wherein the plant has in its genome at least one allele associated with a haplotype of Hap_2, Hap_3, and/or Hap_6, wherein the plant has increased protein content.
[0267] Embodiment B37. The plant of any one of embodiments B28-B36, wherein the nucleic acid sequence comprises a SNP marker associated with increased protein and/or oil content, and wherein the molecular marker is any one or more of the SNP markers as shown in Table 3.
[0268] Embodiment B38. The plant of any one of embodiments B28-B37, wherein the plant is a dicot plant.
[0269] Embodiment B39. The plant of embodiment B38, wherein the dicot plant is a soybean plant or an elite soybean plant.
[0270] Embodiment B40. The plant of any one of embodiments B28-B37, wherein the plant is a monocot plant.
[0271] Embodiment B41. The plant of embodiment B40, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
[0272] Embodiment B42. The plant of any one of embodiments B28-B41, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
[0273] Embodiment B43. A progeny plant from the elite Glycine max plant of any one of embodiments B1-B16 or the plant of any one of embodiments B17-B42, wherein said progeny plant has stably incorporated into its genome the nucleic acid sequence.
[0274] Embodiment B44. A plant cell, seed, or plant part derived from the elite Glycine max plant of any one of embodiments B1-B16 or the plant of any one of embodiments B17-B42, wherein said plant cell, seed or plant part has stably incorporated into its genome the nucleic acid sequence.
[0275] Embodiment B45. A harvest product derived from the elite Glycine max plant of any one of embodiments B1-B16 or the plant of any one of embodiments B17-B42.
[0276] Embodiment B46. A processed product derived from the harvest product of embodiment B45, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.
[0277] Embodiment B47. A method of producing a soybean plant having increased protein and/or oil content and/or modified oil profile, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity, at least 95% identity, or at least 98% identity to SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a nucleic acid sequence encoding a polypeptide having an amino acid sequence of any one of SEQ ID NOs: 22 or 24-59, wherein said nucleic acid sequence confers onto said donor soybean plant an increased protein and/or oil content and/or modified oil profile; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by isolating a nucleic acid from said progeny plant and detecting within said nucleic acid a molecular marker associated with said nucleic acid sequence, thereby producing a soybean plant having increased protein content and/or increased oil content and/or modified oil profile.
[0278] Embodiment B48. The method of embodiment B47, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
[0279] Embodiment B49. The method of embodiment B47 or B48, wherein the molecular markers are markers as set forth in Tables 3-6.
[0280] Embodiment B50. The method of any one of embodiments B47-B49, wherein either the recipient or the donor soybean plant is an elite Glycine max plant.
[0281] Embodiment B51. A method of producing a Glycine max plant with increased protein and/or oil content, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with, or closely linked to, a nucleic acid sequence comprising any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, or 17, or a portion of any thereof, wherein said portion confers to a plant increased protein content and/or increased oil content; c) selecting a plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said marker associated with increased protein and/or increased oil content.
[0282] Embodiment B52. The method of embodiment B51, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
[0283] Embodiment B53. The method of embodiment B51 or B52, wherein the molecular marker is one or more SNPs set forth in Table 3.
[0284] Embodiment B54. The method of any one of embodiments B51-B53, wherein the molecular marker comprises alleles associated with one or more of haplotypes Hap_1, Hap_2, Hap_3, Hap_5, and/or Hap_7.
[0285] Embodiment B55. The method of embodiment B51, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.
[0286] Embodiment B56. The method of embodiment B51, wherein the nucleic acid is selected from DNA or RNA.
[0287] Embodiment B57. A plant produced by the method of any one of embodiments B47-B56.
[0288] Embodiment B58. A method of conferring increased protein content and/or increased oil content and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, or (ii) an amino acid sequence set forth in any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content and/or increases oil content compared to a control plant not expressing said nucleic acid sequence.
[0289] Embodiment B59. The method of embodiment B58, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.
[0290] Embodiment B60. The method of embodiment B58, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content and/or increased oil content.
[0291] Embodiment B61. The method of embodiment B58, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.
[0292] Embodiment B62. The method of embodiment B58, wherein the method comprises Cas12a-mediated gene replacement.
[0293] Embodiment B63. The method of embodiment B62, wherein the method comprises at least one gRNA.
[0294] Embodiment B64. The method of any one of embodiments B58-B63, wherein the promoter is an exogenous promoter.
[0295] Embodiment B65. The method of any one of embodiments B58-B63, wherein the promoter is an endogenous promoter.
[0296] Embodiment B66. The method of embodiment B64, wherein the exogenous promoter comprises SEQ ID NO: 23 or an active variant or fragment thereof.
[0297] Embodiment B67. The method of embodiment B59, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
[0298] Embodiment B68. The method of any one of embodiments B58-B67, wherein the plant is a dicot plant.
[0299] Embodiment B69. The method of embodiment B68, wherein the dicot plant is a soybean plant.
[0300] Embodiment B70. The method of any one of embodiments B58-B67, wherein the plant is a monocot plant.
[0301] Embodiment B71. The method of embodiment B70, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
[0302] Embodiment B72. A plant produced by the method of any one of embodiments B58-B71.
[0303] Embodiment B73. A polypeptide selected from: (a) a polypeptide comprising the amino acid sequence of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein expression of the polypeptide in a plant confers increased protein and/or oil content and/or a modified oil profile on said plant, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, and having a substitution and/or a deletion and/or an addition of one or more amino acid residues, wherein expression of the polypeptide in the plant confers increased protein and/or oil content on said plant; (c) a polypeptide having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity with, and having the same function as, the amino acid sequence of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein the polypeptide when expressed in a plant confers increased protein and/or oil content on the plant; or (d) a fusion protein comprising the amino acid sequence of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59 or the polypeptide as defined in any one of (a) to (c).
[0304] Embodiment B74. A nucleic acid molecule comprising (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, or 24-59, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or a sequence encoding SEQ ID NOs: 22 or 24-59; or (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, or at least 90% identity to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21 or to a polynucleotide of SEQ ID NOs: 22 or 24-59.
[0305] Embodiment B75. An expression cassette comprising the nucleic acid molecule of embodiment B74 or a nucleic acid sequence encoding the polypeptide of embodiment B73.
[0306] Embodiment B76. The expression cassette of embodiment B75, wherein the nucleic acid molecule is operably linked to a promoter that is capable of directing expression in a plant cell.
[0307] Embodiment B77. The expression cassette of embodiment B75, wherein the promoter is an endogenous promoter.
[0308] Embodiment B78. The expression cassette of embodiment B75, wherein the promoter is an exogenous promoter.
[0309] Embodiment B79. A vector comprising the nucleic acid molecule of embodiment B74 or the expression cassette of any one of embodiments B75-B78.
[0310] Embodiment B80. A transgenic cell comprising the nucleic acid molecule of embodiment B74 or the expression cassette of any one of embodiments B75-B78.
[0311] Embodiment B81. Use of the polypeptide of embodiment B73 or the nucleic acid molecule of embodiment B74, or the expression cassette of any one of embodiments B75 to B78 in conferring increased protein content and/or increased oil content and/or modified oil profile in a plant.
[0312] Embodiment B82. Use of the expression cassette of any one of embodiments B75-B78 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content and/or oil content of a plant is increased upon expression in the plant.
[0313] Embodiment B83. A method for increasing protein content and/or oil content in a plant, comprising increasing the expression level and/or activity of the polypeptide of embodiment B73 in the plant.
[0314] Embodiment B84. A method for producing a plant variety with increased protein content and/or oil content, comprising increasing the expression level and/or activity of the polypeptide of embodiment B73 in a recipient plant.
[0315] Embodiment B85. The method of embodiment B83 or B84, wherein the expression level and/or activity of the polypeptide in the plant is increased by transgenic means or by breeding.
[0316] Embodiment B86. A method for producing a transgenic plant with increased protein content and/or oil content, comprising the following step: introducing the nucleic acid molecule of embodiment B74 or the expression cassette of any one of embodiments B75-B78 into a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content and/or oil content compared with the recipient plant.
[0317] Embodiment B87. The method of embodiment B86, wherein the introducing the nucleic acid molecule to the recipient plant is performed by introducing the expression cassette of any one of embodiments B75-B78 into the recipient plant.
[0318] Embodiment B88. A primer pair for amplifying the nucleic acid molecule of embodiment B74.
[0319] Embodiment B89. The primer pair of embodiment B88, wherein the primer pair is primer pair 1, composed of two single-stranded DNA molecules comprising the sequences of SEQ ID NO: 63 and SEQ ID NO: 64.
[0320] Embodiment B90. A kit comprising the primer pair of embodiment B88 or B89.
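Several of the embodiments above define variants by percent sequence identity thresholds (for example at least 80%, 85%, 90%, 95% or 99%). A minimal Python sketch of such a threshold check is shown below. It assumes the two sequences have already been pairwise aligned (gaps written as "-") and that identity is counted over all alignment columns not gapped in both sequences; that denominator is an assumption, other conventions (such as dividing by the reference length) give different values, and the sketch is not presented as the procedure used in this document. The toy peptides are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_ref: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: identical residues are divided by the number of alignment
    columns, excluding columns that are gaps in both sequences.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    columns = 0
    for q, r in zip(aligned_query.upper(), aligned_ref.upper()):
        if q == gap and r == gap:
            continue  # column carries no residue in either sequence
        columns += 1
        if q == r:  # both-gap columns were skipped, so this is a residue match
            matches += 1
    if columns == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / columns


def meets_identity_threshold(aligned_query: str, aligned_ref: str, threshold: float) -> bool:
    """True if the aligned query reaches the given percent identity threshold."""
    return percent_identity(aligned_query, aligned_ref) >= threshold


# Hypothetical toy peptides (not sequences from this document): one mismatch in 8 columns.
print(percent_identity("MKTAYIAK", "MKTAHIAK"))
print(meets_identity_threshold("MKTAYIAK", "MKTAHIAK", 85.0))
```

In practice the alignment itself would come from a standard global or local alignment tool; only the counting step is shown here.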
EXAMPLES
Example 1. Experimental Materials Used in Examples 2-11
[0321] Four materials (herein also referred to as “extreme materials”) that showed significant differences in protein and oil content from the recurrent parent Suinong 14 (SN14, control material) were selected from the chromosome segment substitution line (CSSL) population evaluated in 2014-2015. These materials were disclosed in Qi et al., Plant Cell Environ. 2018 Sep;41(9):2109-2127. The materials were identified as having High Protein Low Oil (HPLO), Low Protein High Oil (LPHO), High Protein High Oil (HPHO), and Low Protein Low Oil (LPLO) content and were used herein as the test materials. From 2016 to 2018, the four extreme materials were sown in the experimental field of Xiangyang Farm under the following conditions: soil moisture content of about 15-20%, row length of about 5 m, row spacing of about 60 cm, seeding depth (distance below the soil surface) of about 3-4 cm, and 20 rows per material. After 3 weeks, the seedlings were manually thinned to a plant spacing of about 6.5 cm.
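As a rough check on the field layout above, the nominal stand density follows directly from the row and plant spacings. The sketch below is illustrative arithmetic only: it uses the approximate values stated in the text and ignores border effects, emergence losses and the actual number of plants retained after thinning.

```python
def plants_per_m2(row_spacing_m: float, plant_spacing_m: float) -> float:
    """Nominal density: each plant occupies one row spacing x one plant spacing."""
    return 1.0 / (row_spacing_m * plant_spacing_m)


def plants_per_plot(n_rows: int, row_length_m: float, plant_spacing_m: float) -> float:
    """Nominal stand per plot: plants per row (length / spacing) times number of rows."""
    return n_rows * row_length_m / plant_spacing_m


# Approximate values from the field layout above (row spacing 0.60 m,
# plant spacing 0.065 m after thinning, 20 rows of 5 m per material).
density = plants_per_m2(0.60, 0.065)
print(f"about {density:.1f} plants per m2 ({density * 10_000:,.0f} per hectare)")
print(f"about {plants_per_plot(20, 5.0, 0.065):.0f} plants per material")
```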
[0322] Field sampling of soy protein and oil content of the extreme materials.
The developmental stages of soybean seeds (Glob, Hrt, Cot, EM1, EM2, MM, LM and DS) are as described on the SoyBase website (soybase.org) and are shown in Table 7.
Table 7. Soybean seed development stages
Accession Name Description
SOY:0001288 Globular (Glob) 4-5 days after flowering; the embryo is spherical, the endosperm surrounds the embryo, the proximal end of the embryo is cellular, and the distal end of the embryo is noncellular
SOY:0001289 Heart (Hrt) 6-7 days after flowering; the cotyledons begin to develop
SOY:0001290 Cotyledon (Cot) 10-14 days after flowering; the cotyledons are in the normal position, the primary leaf primordium is visible, and the endosperm accounts for about 1/2 of the seed volume
SOY:0001291 Early Maturity 1 (EM1) 20-30 days after flowering; the primordium of the first trifoliate leaf has formed, the endosperm has been completely absorbed, and the cotyledons have reached their final size
SOY:0001292 Early Maturity 2 (EM2) 30-50 days after flowering; all endosperm has disappeared, and the seed size gradually increases
SOY:0001293 Mid Seed Maturity (MM) 50-80 days after flowering; the weight and size of the seeds reach about half of their mature values
SOY:0001294 Late Seed Maturity (LM) 80-110 days after flowering; the seeds reach their mature size, and the seed weight gradually decreases because of dehydration and drying
SOY:0001360 Dry Seed (DS) The seeds are mature, and the moisture content reaches about 12%
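The day ranges in Table 7 can be used to assign a sampling date to a nominal seed stage. The sketch below encodes the table as a simple lookup; the data layout and the function name are illustrative assumptions, not part of the described method. Because the published ranges share endpoints and leave gaps, the function returns every matching stage instead of forcing a single answer.

```python
# (stage, first day, last day after flowering) as listed in Table 7.
# DS (dry seed) is defined by moisture content (~12%), not by days, so it is omitted.
SEED_STAGES_DAF = [
    ("Glob", 4, 5),
    ("Hrt", 6, 7),
    ("Cot", 10, 14),
    ("EM1", 20, 30),
    ("EM2", 30, 50),
    ("MM", 50, 80),
    ("LM", 80, 110),
]


def stages_for_daf(days_after_flowering: float) -> list[str]:
    """Return every Table 7 stage whose day range includes the given day.

    The ranges share endpoints (30, 50 and 80 days) and leave gaps
    (e.g. days 8-9 and 15-19), so a day may map to two stages or to none.
    """
    return [name for name, first, last in SEED_STAGES_DAF
            if first <= days_after_flowering <= last]


for day in (12, 30, 17, 95):
    print(day, stages_for_daf(day))
```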
[0323] Field sampling was performed on plants flowering at nodes 6-8. At each sampling, leaf samples were taken from nodes 6-8, and approximately one full centrifuge tube was collected as one biological replicate. Three biological replicates were collected for each material. Each biological replicate was immediately placed in an ice box and stored for protein and fatty acid phenotype determination.
[0324] Soybean and Arabidopsis genetic transformation. Unless explicitly stated otherwise, the Escherichia coli strain used in this application was DH5α and the Agrobacterium tumefaciens strain was EHA105. The target gene fragment in the entry vector Fu28 was transferred into the plant expression vector pSoy1 by LR recombination using the Gateway vector system. The entry vector Fu28 and the expression vector pSoy1 were provided by Professor Fu Yongfu of the Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (Wang X, et al. (2013) BioVector, a flexible system for gene specific expression in plants. BMC Plant Biol 13:198).
[0325] The main reagents involved in this experiment are shown in Table 8.
Table 8. Main experimental reagents
Reagent name Manufacturer
EasyTaq DNA Polymerase Beijing Quanshijin Biotechnology Co., Ltd.
2000 Plus DNA Marker Beijing Quanshijin Biotechnology Co., Ltd.
EasyPure HiPure Plasmid MiniPrep Kit Beijing Quanshijin Biotechnology Co., Ltd.
Trans2K Plus DNA Marker Beijing Quanshijin Biotechnology Co., Ltd.
2× ChamQ Universal SYBR qPCR Master Mix Nanjing Nuoweizan Biotechnology Co., Ltd.
HiScript II Q RT SuperMix for qPCR (+gDNA wiper) Nanjing Nuoweizan Biotechnology Co., Ltd.
TRIzol Reagent Invitrogen
Gateway LR Clonase II Enzyme Mix Invitrogen
Gel Extraction Kit (200) OMEGA
Cycle-Pure Kit (100) OMEGA
KOD-Plus-Neo TOYOBO
Solution I Takara
Restriction endonucleases (XhoI, BamHI) NEB
Hormones, antibiotics, etc. Sigma, PhytoTech Labs and Amresco
[0326] The culture media and antibiotics used in this experiment are shown in Table 9 and Table 10.
Table 9. Experimental medium
Medium Component and Volume
LB liquid 5 g/L yeast extract + 10 g/L NaCl + 10 g/L peptone
LB solid 5 g/L yeast extract + 10 g/L NaCl + 10 g/L peptone + 15 g/L agar
YEP liquid 5 g/L yeast extract + 5 g/L NaCl + 10 g/L peptone
YEP solid 5 g/L yeast extract + 5 g/L NaCl + 10 g/L peptone + 15 g/L agar
Table 10. Antibiotics
Antibiotic Working concentration
Cm (chloramphenicol) 27.38 µg/mL
Spec (spectinomycin) 50.00 µg/mL
Rif (rifampicin) 50.00 µg/mL
[0327] Determination of protein and oil content of the CSSL population. A FOSS grain analyzer (Infratec 1241) was used to determine the protein and oil content of soybean seeds in the CSSL population. Each test material was measured 3-5 times, and the average value was used for phenotypic data analysis.
[0328] The Kjeldahl method is commonly used for the quantitative determination of the nitrogen contained in organic substances plus the nitrogen contained in the inorganic compounds ammonia and ammonium (NH3/NH4+). Without modification, other forms of inorganic nitrogen, for instance nitrate, are not included in this measurement. The Kjeldahl reagents required for determining soybean grain protein content are shown in Table 11.
Table 11. Kjeldahl reagents for determining soybean grain protein content
[0329] Bioinformatic analysis of candidate genes. The websites used to predict the functions of candidate genes in the “hot spot” interval are shown in Table 12 below.
Table 12. Websites used for bioinformatic analysis of candidate genes
[0330] Gene expression analysis. Gene expression was determined by real-time quantitative PCR (qRT-PCR). Reaction solutions for genomic DNA removal were prepared as shown in Table 13, the primers for qRT-PCR amplification are shown in Table 14, and reaction solutions for qRT-PCR amplification were prepared as shown in Table 15.
Table 13. Reaction solution for genomic DNA removal
Table 14. qRT-PCR amplification primers
Table 15. Reaction solution preparation for qRT-PCR
Component Volume
2× ChamQ Universal SYBR qPCR Master Mix 10 µL
Primer 1 (10 µM) 0.4 µL
Primer 2 (10 µM) 0.4 µL
Template cDNA 1.5 µL
ddH2O Up to 20 µL
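When many wells are run, the Table 15 recipe is usually scaled into a master mix. The sketch below derives the ddH2O volume per reaction (20 - 10 - 0.4 - 0.4 - 1.5 = 7.7 µL) and scales the shared components; the 10% pipetting overage and the function names are assumptions for illustration, not values or procedures given in this document. Template cDNA is left out of the mix because it is added to each well separately.

```python
# Per-reaction volumes (µL) for one 20 µL qRT-PCR reaction, from Table 15.
SHARED_COMPONENTS_UL = {
    "2x ChamQ Universal SYBR qPCR Master Mix": 10.0,
    "Primer 1 (10 uM)": 0.4,
    "Primer 2 (10 uM)": 0.4,
}
TEMPLATE_UL = 1.5
FINAL_VOLUME_UL = 20.0


def water_per_reaction() -> float:
    """ddH2O needed to bring one reaction up to the final volume."""
    return FINAL_VOLUME_UL - TEMPLATE_UL - sum(SHARED_COMPONENTS_UL.values())


def master_mix(n_reactions: int, overage: float = 0.10) -> dict[str, float]:
    """Volumes (µL) of the shared components for n reactions plus an assumed overage."""
    scale = n_reactions * (1.0 + overage)
    mix = {name: round(vol * scale, 2) for name, vol in SHARED_COMPONENTS_UL.items()}
    mix["ddH2O"] = round(water_per_reaction() * scale, 2)
    return mix


for component, volume in master_mix(24).items():
    print(f"{component}: {volume} uL")
```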
[0331] Primers for cloning Glyma.20G092400 are provided in Table 16.
Table 16. Cloning primers for Glyma.20G092400
Example 2. Methods for the Identification of Arabidopsis mutants
A. Planting of Arabidopsis thaliana
[0332] The planting soil consisted of flower nutrient soil and vermiculite at a ratio of 3:1. The soil was put into small flowerpots and slowly soaked with water. Arabidopsis thaliana seeds were sown evenly on the moist soil. The opening of each pot was sealed with plastic wrap, and the pots were placed in a refrigerator at 4°C for vernalization for 48-72 h. After vernalization, the pots were placed in an incubator (22°C, 16 h/8 h light/dark, 70 µmol/m²/s) for 1 week until the Arabidopsis seedlings emerged. After culturing for 1 week, the wrap was removed.
B. DNA extraction from Arabidopsis leaves
[0333] Total DNA was extracted by the CTAB (hexadecyltrimethylammonium bromide) method (Porebski, S. et al., Plant Molecular Biology Reporter, 1997, 15(1):8-15). The prepared CTAB extraction buffer was stored at 4°C. Rosette leaves of Arabidopsis thaliana were collected and placed in an Eppendorf® (“EP”) tube containing small steel balls. The leaves were quick-frozen in liquid nitrogen and then fully ground in a tissue grinder. 700 µL of CTAB extraction buffer was added to the EP tube containing the sample and mixed thoroughly with a vortexer. The EP tube was then placed in a 65°C water bath for 1 h and inverted to mix once every 10 minutes. After the tube was taken out of the water bath and had cooled, 650 µL of chloroform was added. The tube was inverted 30 times to mix thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 µL of the supernatant was transferred into a new EP tube, and 650 µL of chloroform was added. The mixture was shaken, mixed thoroughly, and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 µL of the supernatant was transferred to a new EP tube containing 700 µL of pre-cooled isopropanol and inverted 30 times to mix thoroughly. The mixture was then centrifuged at 12000 rpm for 15 minutes at room temperature. The supernatant was discarded, and the pellet was washed once with 95% ethanol, then once with 75% ethanol, and centrifuged at 7500 rpm for 5 min at room temperature. The DNA pellet was dried and dissolved in 50 µL of sterilized water. DNA concentration was measured from the absorbance at 260 nm (OD260) using a NanoDrop 2000c, and the DNA was stored at -20°C.
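The NanoDrop instrument reports DNA concentration directly, but the underlying conversion is simple. The sketch below assumes the conventional factor of about 50 ng/µL per A260 unit for double-stranded DNA at a 1 cm path length, and adds the usual purity ratios; the function names and the readings in the example are hypothetical.

```python
# Conventional conversion for double-stranded DNA at a 1 cm path length:
# an absorbance of 1.0 at 260 nm corresponds to about 50 µg/mL (= 50 ng/µL).
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration from A260, correcting for any dilution."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratios(a260: float, a280: float, a230: float) -> tuple[float, float]:
    """A260/A280 (about 1.8 for pure DNA) and A260/A230 (low values suggest contaminant carry-over)."""
    return a260 / a280, a260 / a230


# Hypothetical readings for illustration only.
conc = dsdna_ng_per_ul(0.75)
r280, r230 = purity_ratios(0.75, 0.40, 0.36)
print(f"{conc:.1f} ng/uL, A260/A280 = {r280:.2f}, A260/A230 = {r230:.2f}")
```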
C. PCR identification of Arabidopsis mutants
[0334] Homozygous Arabidopsis mutants were screened using the detection primers (LP+RP, BP+RP) for the Arabidopsis mutant SALK_021984C provided by SIGnAL (signal.salk.edu/tdnaprimers.2.html). Total DNA of the Arabidopsis wild-type ecotype Col-0 (“Col-0”) and of the mutant was extracted and used as a template for PCR amplification. The amplified products were subjected to 1.5% agarose gel electrophoresis to determine whether the mutant was homozygous. The primers used are shown in Table 17.
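The band pattern from the two reactions above can be read using the usual logic of SALK T-DNA genotyping: LP+RP spans the insertion site and amplifies only the intact allele, while BP+RP amplifies only the T-DNA/genome junction. The helper below is an illustrative sketch of that interpretation; the function name is hypothetical, and band scoring from the 1.5% agarose gel is assumed to be done by eye.

```python
def call_salk_genotype(lp_rp_band: bool, bp_rp_band: bool) -> str:
    """Interpret the two PCR reactions used to genotype a SALK T-DNA insertion line.

    LP+RP amplifies only the intact (wild-type) allele; the inserted T-DNA is
    too long to amplify across under standard PCR conditions.
    BP+RP amplifies only the T-DNA/genome junction of the mutant allele.
    """
    if lp_rp_band and bp_rp_band:
        return "heterozygous"
    if bp_rp_band:
        return "homozygous mutant"
    if lp_rp_band:
        return "wild type"
    return "no call (check DNA quality or PCR)"


# Expected patterns: Col-0 gives only the LP+RP band; a homozygous
# SALK_021984C plant gives only the BP+RP band.
for sample, bands in {"Col-0": (True, False), "mutant plant": (False, True)}.items():
    print(sample, call_salk_genotype(*bands))
```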
Table 17. PCR detection primers for Arabidopsis mutant (SALK_021984C)
D. RT-PCR identification of Arabidopsis mutants
[0335] Arabidopsis total RNA extraction and cDNA synthesis were performed as described in Example 7. At18S rRNA was used as the internal reference gene, and the RT-PCR primers for detecting SALK_021984C homozygous mutants are shown in Table 18.
Table 18. RT-PCR for detection of Arabidopsis thaliana homozygous mutants
E. Arabidopsis genetic transformation and identification
i. Floral dip transformation of Arabidopsis thaliana
[0336] Arabidopsis cultivation and preparation for plant transformation. Wild-type Col-0 (control group) and mutant Arabidopsis materials were planted as described above. After the Arabidopsis plants bolted, the stalks were removed to increase the number of bolts. The plants were ready to be transformed when the stalks had grown to the same height and only the upper flowers had not yet bloomed.
[0337] Agrobacterium preparation. Agrobacterium tumefaciens containing the expression vector, stored at -80°C, was inoculated into 10 mL of LB liquid medium containing spectinomycin and cultured overnight at 28°C and 160 rpm. 100 µL of this starter culture was then transferred to 100 mL of fresh YEP liquid medium containing spectinomycin and further cultured at 28°C with shaking at 200 rpm. When the culture reached an OD600 of 0.8, the cells were harvested by centrifugation. The bacterial pellet was resuspended in 100 mL of resuspension solution containing 5% sucrose and 0.01% Silwet L-77. The suspension was kept at room temperature for 1-3 h before use.
[0338] Transformation of Arabidopsis thaliana. Arabidopsis thaliana plants that had reached a suitable bolting height with a large number of inflorescences were used for the transformation. Open flowers and siliques that had already formed were removed, and the unopened inflorescences were immersed in the Agrobacterium resuspension for 30 s. The infected Arabidopsis thaliana plants were then wrapped in plastic wrap and placed in a dark box. After 24 hours of incubation in the dark, the plants were taken out of the box. A second round of transformation was performed on the same plants one week later to improve the transformation efficiency. Mature seeds of the plants were harvested.
ii. Screening transgenic Arabidopsis with Basta
[0339] The mature T0 seeds of the transformed Arabidopsis thaliana were harvested and planted as described above. When the first two true leaves were fully expanded, Basta solution (1:1000 dilution) was sprayed on the plants 2-3 times, once every other day, and the growth of the Arabidopsis plants was observed. Non-transgenic Arabidopsis plants showed chlorosis and gradually died, while transgenic Arabidopsis plants grew normally. After the transgenic Arabidopsis thaliana plants had grown 4 leaves, the plants positively identified as transgenic were transplanted into new small pots and grown for verification of transgene status.
iii. Identification of transgenic Arabidopsis thaliana
[0340] Leaf DNA of transgenic Arabidopsis thaliana (T1, T2 and T3) was extracted. The transgenic plants were identified by PCR using the Glyma.20G092400 gene primers and Bar primers shown in Table 19. The PCR products were detected by 1.5% agarose gel electrophoresis.
Table 19. PCR primers for detecting transgenic Arabidopsis
iv. qRT-PCR identification of T3-generation transgenic Arabidopsis thaliana
[0341] Total RNA of Arabidopsis rosette leaves was extracted and reverse-transcribed into cDNA. The expression level of Glyma.20G092400 in transgenic Arabidopsis was determined using the primer sequences shown in Table 20, with AtACTIN2 as the internal reference gene.
Table 20. RT-qPCR primers for detecting transgenic Arabidopsis
v. Determination of total nitrogen content in seeds of Arabidopsis mutants and transgenic plants
[0342] Seed nitrogen content, which reflects seed protein content, was determined using the Kjeldahl reagents described in Table 11. A 0.1 mol/L HCl solution was prepared and standardized against 0.1 mol/L Na2CO3. A 1% H3BO3 solution was prepared and its pH adjusted to 4-5. For every 1 L of H3BO3, 7 mL of 0.1% methyl red and 10 mL of 0.1% bromocresol green indicator were added, giving a wine-red solution. A 40% NaOH solution was also prepared for the determination.
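The Kjeltec 2300 reports nitrogen directly, but the calculation behind a distillation-titration Kjeldahl result, using the standardized 0.1 mol/L HCl and the boric acid receiver described above, can be sketched as follows. The nitrogen-to-protein conversion factor of 6.25 is a common generic default and is an assumption here; this document states only that seed nitrogen reflects protein content. The function names and titration volumes are hypothetical.

```python
N_MOLAR_MASS = 14.007  # g/mol, equivalently mg/mmol


def nitrogen_percent(titrant_ml: float, blank_ml: float,
                     hcl_mol_per_l: float, sample_mass_g: float) -> float:
    """Total N (% w/w) from titrating the boric-acid-trapped ammonia with standardized HCl."""
    # mL x mol/L = mmol of HCl; each mmol of HCl neutralizes one mmol of NH3 (one mmol of N)
    mmol_n = (titrant_ml - blank_ml) * hcl_mol_per_l
    mg_n = mmol_n * N_MOLAR_MASS
    return 100.0 * mg_n / (sample_mass_g * 1000.0)  # sample mass converted to mg


def crude_protein_percent(n_percent: float, conversion_factor: float = 6.25) -> float:
    """Crude protein estimate; 6.25 is a generic default factor (assumption)."""
    return n_percent * conversion_factor


# Hypothetical titration: 0.1 g of seed, 5.00 mL of 0.1 mol/L HCl, 0.10 mL blank.
n_pct = nitrogen_percent(5.00, 0.10, 0.1, 0.1)
print(f"N = {n_pct:.2f}%, crude protein = {crude_protein_percent(n_pct):.1f}%")
```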
[0343] The seeds were dried in an oven at 60°C for 12-14 hours. A 0.1 g sample (accurate to 0.001 g) was poured into a 50 mL digestion tube through a paper trough, and each sample was tested 3 times. 5 mL of concentrated sulfuric acid and a small amount of catalyst (potassium sulfate and copper sulfate, 5:1) were added, and each sample was digested at 400°C for 90 minutes. The samples were then taken out and allowed to cool, and the total nitrogen content was determined with a FOSS Kjeltec 2300 automatic analyzer.
vi. Determination of fatty acid content in seeds of Arabidopsis mutants and transgenic plants
[0344] The fatty acid content of seeds was determined by gas chromatography as follows. The seeds were dried in an oven at 105°C for 20-30 minutes and then at 65°C for 12-14 hours. Five replicate tests were performed for each sample. In each test, about 5 mg of seeds was mixed with 1 mL of 2.5% concentrated sulfuric acid in methanol and 5 µL of 50 mg/mL BHT (2,6-di-tert-butyl-4-methylphenol), and 50 µL of 10 mg/L heptadecanoic acid or acetic acid was added as the internal standard. The tubes containing the samples were immediately sealed and placed in a water bath at 85°C for 1.5 h, inverted every 10 minutes to mix the sample and reagents thoroughly, and then allowed to cool to room temperature. 160 µL of 9% NaCl solution and 700 µL of n-hexane were then added, and the mixture was vortexed for 3 minutes and centrifuged at 4,500 rpm for 10 minutes at room temperature. 400 µL of the supernatant of each sample was transferred into a new centrifuge tube and dried overnight in a fume hood. The dry residue was fully dissolved in 400 µL of ethyl acetate before measurement.
[0345] An Agilent 6890 gas chromatograph was used with a 30 m × 320 µm × 0.25 µm column. Additional operating parameters were: carrier gas, nitrogen at 60 mL/min; hydrogen, 60 mL/min; air, 450 mL/min; injection volume, 1 µL; split injection mode with a split ratio of 10:1; injection port temperature, 170°C. The oven temperature program was: hold at 180°C for 1 min, increase to 250°C at 25°C/min, and hold for 7 min.
[0346] Calculation formula for absolute quantity:
$$\omega_i = \frac{A_i}{A_s} \times \frac{m_s}{m}$$
where $A_i$ is the peak area of the $i$th fatty acid component, $A_s$ is the peak area of the internal standard, $m_s$ is the mass of the internal standard, and $m$ is the dry weight of the sample.
[0347] Calculation formula for relative quantity:
$$w_i(\%) = \frac{\omega_i}{\sum_j \omega_j} \times 100\%$$
vii. Transformation of soybean cotyledon nodes
[0348] Soybean cotyledon nodes were transformed and cultivated using the following protocol:
[0349] Preparation of Agrobacterium tumefaciens and soybean cotyledons. Take the Agrobacterium tumefaciens strain containing the expression vector out of -80°C storage, inoculate it into 10 mL of LB liquid medium containing spectinomycin, and culture it overnight at 28°C and 160 rpm. Transfer 100 µL of the starter culture to 100 mL of fresh LB liquid medium containing spectinomycin and culture at 28°C with shaking at 200 rpm to OD600 = 0.8.
Centrifuge at 4000 rpm for 10 minutes at room temperature, discard the supernatant, resuspend the bacteria in 100 mL of liquid co-culture medium (LCCM), and incubate at 28°C and 200 rpm for 30 minutes before transformation.
[0350] Sterilize soybean seeds as follows. Place full, undamaged seeds in a petri dish, and put the petri dish and a beaker into an airtight container in a fume hood. Open the lid of the petri dish, add sodium hypochlorite and hydrochloric acid to the beaker at a ratio of 94:6, quickly seal the airtight container, turn on the fume hood, and sterilize the seeds in the sealed container for 10-12 hours. After sterilization, take out the seeds and ventilate them to remove chlorine adhering to the seed surface, which could otherwise damage the seeds. Add an appropriate amount of sterilized water so that the seeds absorb just enough water to complete imbibition, and keep the seeds in the dark for 12-14 h.
[0351] Co-culture. Split each seed into two halves along the hypocotyl with a razor blade, and lightly scratch 2-3 points at the cotyledon node with the blade to make wounds. Put the explants into the prepared Agrobacterium resuspension and incubate at 160 rpm and 28°C for 30 min to facilitate Agrobacterium infection, then remove the infected explants from the resuspension with tweezers. Place them on SCCM covered with filter paper and incubate for 3-5 days at 25°C in the dark.
[0352] Induction of clumping buds. After 3-5 days of co-cultivation, when the hypocotyls have enlarged, cut the hypocotyls of the explants with a blade, leaving about 2 mm. Wash the trimmed explants several times in sterilized water until the liquid is clear, to remove excess bacteria. Place the explants on sterile paper to absorb the remaining surface liquid and insert them into SIM+ with tweezers. Set the sterile tissue culture room to 25°C with a 16 h/8 h light/dark cycle, and keep the screening medium plates with explants there for about 14 days. Observe the growth of the clump buds: take out slow-growing clump buds, scratch the wound at the bottom again, and insert them into fresh SIM+; transfer well-growing clump buds to bud elongation medium (SEM).
[0353] Elongation of cluster buds. Cut off the buds, insert them into SEM, and culture them in a sterile tissue culture room for 2 weeks at 25°C with a 16 h/8 h light/dark cycle. Clump buds that have not produced shoots are taken out of the SEM, lightly scratched at the bottom to create a new wound, and inserted into fresh SEM for secondary culture. The culture cycle is about 14 days, and the process is repeated.
[0354] Identification of positive elongated buds. When the buds are about 5 cm long and have about 3 leaves, select a leaf and perform a Bar test strip assay to preliminarily identify positive seedlings.
[0355] Rooting of positive elongated buds. The positive buds were cut from the clumping buds, dipped in IBA hormone for 30 s, and inserted into rooting medium (RM) at 25°C with a 16 h/8 h light/dark cycle. Rooting culture was carried out under dark conditions in a sterile tissue culture room until the buds took root.
[0356] Transplanting and cultivation of positive seedlings. The positive seedlings were taken out of the culture medium, and the roots were rinsed with clean water to remove residual culture medium. The positive seedlings were transplanted into soil and cultured in the plant greenhouse.
viii. PCR identification of T1 generation transgenic soybean
[0357] Leaf DNA of T1 transgenic soybean was extracted as described above, and transgenic soybeans were identified by PCR using the Glyma.20G092400 gene primers and Bar primers. The related primer sequences are shown in Table 19.
ix. qRT-PCR identification of T1 generation transgenic soybean
[0358] Leaves from the soybean plants were placed in an RNase-free EP tube and frozen in liquid nitrogen. Total RNA was extracted and reverse transcribed into cDNA. qRT-PCR was performed with the primers in Table 14 to identify T1 generation transgenic soybean by analyzing the expression of Glyma.20G092400; the internal reference gene was GmActin4 (GenBank No.: AF049106).
x. Determination of protein, oil and fatty acid content in transgenic soybean seeds
[0359] An InfraTec™ 1241 Grain Analyzer (FOSS Analytics) was used to determine the protein and oil content of soybean seeds. Each sample was measured 3-5 times, and the average value was used for phenotypic data analysis.
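Averaging the repeated analyzer readings is simple arithmetic. The sketch below assumes one list of 3-5 percentage readings per sample; the helper name `mean_reading` and the readings are hypothetical and not part of any FOSS software.

```python
from statistics import mean


def mean_reading(readings):
    """Average 3-5 replicate analyzer readings (percent of seed) for one sample.

    Hypothetical helper: the source only states that each sample was
    measured 3-5 times and that the average value was used.
    """
    if not 3 <= len(readings) <= 5:
        raise ValueError("expected 3-5 replicate readings per sample")
    return mean(readings)


# Hypothetical protein readings (%) for one seed sample
print(mean_reading([40.1, 40.3, 40.5]))
```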
[0360] The fatty acid content of seeds was determined by gas chromatography and calculated as described in section vii above.
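Section vii is not reproduced in this excerpt, so the exact calculation is not known from here. One common way to express GC results, assumed in the sketch below, is area normalization: each fatty acid's peak area as a percentage of the summed fatty acid peak areas. The peak areas in the example are hypothetical.

```python
def fatty_acid_percentages(peak_areas):
    """Area-normalized fatty acid composition (% of total fatty acids).

    peak_areas maps a fatty acid label (e.g. "C18:2") to its integrated GC
    peak area. Assumes equal detector response for all fatty acid methyl
    esters, i.e. no response-factor correction is applied.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical peak areas for one seed extract
areas = {"C16:0": 110.0, "C18:0": 40.0, "C18:1": 220.0, "C18:2": 530.0, "C18:3": 100.0}
for name, pct in fatty_acid_percentages(areas).items():
    print(f"{name}: {pct:.1f}%")
```

Absolute amounts (e.g. mg per g seed) would additionally need an internal standard, which this sketch does not model.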
Example 3. Tissue expression analysis of candidate genes
[0361] Expression profiles of the candidate genes Glyma.20G092000, Glyma.20G092100, Glyma.20G092400 and Glyma.20G094900 in wild-type SN14 were analyzed by RT-qPCR (Table 1). RNA was extracted from roots, stems, leaves, flowers, pods and seeds (herein also referred to as grains) of SN14. Expression of the candidate genes was analyzed at 8 developmental stages of the grain: Glob, Hrt, Cot, EM1, EM2, MM, LM and DS (Table 2). All candidate genes were expressed in the tested tissues, and each showed its highest expression level at a particular developmental stage of the grain. As shown in Table 1, Glyma.20G092000, Glyma.20G092400 and Glyma.20G094900 had their highest expression levels at the LM stage of the grain, whereas Glyma.20G092100 had its highest expression level at the DS stage. The expression level of Glyma.20G092000 in the grain at the Cot, LM and DS stages was higher than in the non-grain tissues and organs.
The expression level of Glyma.20G092100 in the seed at the Cot, EM1, MM, LM and DS stages was higher than in all non-seed tissues and organs (root, stem, leaf, flower and pod). The expression level of Glyma.20G092400 at six developmental stages of the grain (Cot, EM1, EM2, MM, LM and DS) was higher than in the non-grain tissues and organs, and the expression level of Glyma.20G094900 at the LM and DS stages was higher than in the non-grain tissues and organs. It is therefore speculated that Glyma.20G092000, Glyma.20G092100, Glyma.20G092400 and Glyma.20G094900 play important regulatory roles during grain development.
Example 4. Protein structure analysis of candidate genes
[0362] The domains of the proteins encoded by Glyma.20G092000, Glyma.20G092400, Glyma.20G094900 and Glyma.20G092100 were analyzed using the NCBI database. Glyma.20G092000 belongs to the retroviral protease superfamily, which includes the pepsin-like aspartic proteases of cells and retroviruses, and it also contains sphingolipid activator-like protein type B regions 1 and 2. Glyma.20G092100 belongs to the PPR repeat family; this repeat has no known function, is about 35 amino acids long, and is found in up to 18 copies in some proteins. Glyma.20G092400 belongs to the amino acid transferase-V family; it contains an amino acid transferase domain, a domain also found in other enzymes including cysteine desulfurase. Glyma.20G094900 belongs to the DUF1336 superfamily and is a protein of unknown function; this family represents the C-terminus of many hypothetical proteins of unknown function.
Example 5. Tissue-specific expression analysis of Glyma.20G092400
[0363] RNA was extracted from the organs (roots, stems, leaves, flowers, pods and seeds) of SN14, reverse transcribed into cDNA, and analyzed by qRT-PCR. Glyma.20G092400 was expressed in all tissues and organs, with very low expression in roots, relatively high expression in stems, leaves and flowers, higher expression in pods, and the highest expression in seeds, where relative expression exceeded 5-fold.
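The text reports expression relative to a reference gene (GmActin4) but does not state how fold changes were calculated. A widely used option, assumed here rather than taken from the source, is the 2^-ΔΔCt method, which presumes near-100% amplification efficiency for both primer pairs. The Ct values in the example are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target gene by the 2^-ddCt method.

    ct_target / ct_reference: Ct values of the target gene (for example
    Glyma.20G092400) and the reference gene (for example GmActin4) in the
    sample of interest. The *_calibrator arguments are the same two genes
    measured in the calibrator sample (for example root tissue).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: seed sample versus root calibrator
print(relative_expression(24.0, 18.0, 26.5, 18.0))
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model would be the more appropriate choice.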
[0364] To analyze the tissue-specific expression of Glyma.20G092400, RNA was extracted, cDNA was synthesized, and qRT-PCR was performed using the specific primers provided in Table 20 shown above in Example 2. The reference gene was GmActin4 (GenBank No.: AF049106).
Example 6. Subcellular localization of Glyma.20G092400
[0365] Tobacco cultivation. Tobacco planting soil was prepared by mixing flower nutrient soil with vermiculite at a ratio of 3:1. After germination, the seedlings were transferred to new small flowerpots, one plant per pot, placed in an incubator (22 °C, 16 h/8 h light/dark, 70 µmol·m⁻²·s⁻¹) for cultivation, and watered once every 2 days to ensure adequate water.
[0366] Agrobacterium injection of tobacco leaves. Agrobacterium tumefaciens containing the expression vector pSOY1-Glyma.20G092400-GFP, or an expression vector encoding GFP alone (as a control), was inoculated into 10 mL of LB liquid medium (containing the corresponding antibiotics) and cultured at 28 °C with shaking at 200 rpm until OD600 = 0.8. The culture was transferred in batches to sterilized 1.5 mL EP tubes and centrifuged at 10,000 rpm for 1 min at room temperature to pellet the bacteria. To prepare the resuspension buffer, about mL MES and 500 µL MgCl2 were made up to 50 mL with sterile water. The bacteria were resuspended in 1 mL of resuspension buffer and centrifuged at 10,000 rpm at room temperature for 1 min, and the supernatant was discarded. The bacteria were resuspended, washed and centrifuged again, and acetosyringone was added (final concentration 40 mg/L). The bacterial suspension was transferred to an EP tube, buffer was added to adjust it to OD600 = 0.2, and the mixture was left at room temperature for 1 hour. Vigorous tobacco plants were selected after 3 weeks of growth, and their leaves were injected with the Agrobacterium suspension using a syringe needle. The inoculated Nicotiana benthamiana plants were kept in an incubator (22 °C, 16 h light/8 h dark, 70 µmol·m⁻²·s⁻¹) for 48 h, after which the subcellular localization of the pSOY1-Glyma.20G092400-GFP fusion protein was observed under a confocal microscope. Green fluorescence of the pSOY1-Glyma.20G092400-GFP fusion protein was observed in the nucleus, indicating that the protein encoded by Glyma.20G092400 is localized in the nucleus.
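Adjusting the suspension to OD600 = 0.2 is a C1·V1 = C2·V2 dilution. The sketch below assumes OD600 scales linearly with dilution (true only within the spectrophotometer's linear range); the function name and the example reading are hypothetical.

```python
def buffer_to_add_ml(measured_od600, volume_ml, target_od600=0.2):
    """Resuspension buffer (mL) to add so a suspension reaches target_od600.

    Uses C1 * V1 = C2 * V2, assuming OD600 is proportional to cell
    density over the dilution range involved.
    """
    if target_od600 <= 0:
        raise ValueError("target OD600 must be positive")
    if measured_od600 < target_od600:
        raise ValueError("suspension is already below the target OD600")
    final_volume_ml = measured_od600 * volume_ml / target_od600
    return final_volume_ml - volume_ml


# Hypothetical: 1 mL of resuspended cells reading OD600 = 0.8
print(buffer_to_add_ml(0.8, 1.0))
```

For very dense suspensions, measuring a pre-diluted aliquot and back-calculating keeps the reading in the linear range.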
Example 7. Cloning and Vector Construction of Glyma.20G092400
[0367] Total RNA extraction from soybean SN14 leaves. RNA was extracted from young, tender SN14 trifoliate leaves by the TRIzol method. Electrophoresis on a 2% agarose gel showed three bands (28S, 18S and 5S), indicating good RNA integrity. The cDNA obtained by reverse transcription was used for Glyma.20G092400 gene cloning.
[0368] Glyma.20G092400 clone. The CDS sequence of Glyma.20G092400 was obtained from the Phytozome database; the CDS sequence is 1388 bp in length. Cloning primers were designed according to the CDS sequence of Glyma.20G092400 (Table 16), at both ends of the gene's CDS (with the stop codon removed). A further primer pair was designed to add restriction sites (SpeI and BamHI) matching those at both ends of the ccdB gene in the entry vector. First, SN14 leaf cDNA was used as a template to clone the CDS of the Glyma.20G092400 gene with the CDS primers; this product was then used as a template for PCR with the restriction-site primers to obtain Glyma.20G092400 carrying restriction sites at both ends. The gene product with restriction sites was recovered with a gel recovery kit for subsequent experiments. The full-length CDS of Glyma.20G092400 (with the TGA stop codon removed) was cloned using cDNA from leaves of soybean Suinong 14 (SN14) as a template. The CDS sequence was amplified using the following primers.
[0369] Construction of entry vector (Fu28-Glyma.20G092400). The Fu28 empty vector and the target gene were digested with restriction endonucleases (SpeI and BamHI), and the digested products were ligated with Solution I ligase. The ligation product was transformed into E. coli DH5α competent cells and cultured on a chloramphenicol-resistant plate for about 16 hours. A single colony was picked and cultured. The presence of the Glyma.20G092400 insert in the bacteria was verified by PCR using the primers shown in Table 14; gel electrophoresis detected the target band at 1338 bp (data not shown). The Glyma.20G092400 insert was further confirmed by sequencing.
[0370] Construction of expression vector. The Fu28-Glyma.20G092400 and pSOY1 vector plasmids were extracted. The Fu28-Glyma.20G092400 plasmid additionally carries a green fluorescent protein (GFP) gene from Fu28. The plasmids were recombined by an LR reaction (Table 21), and the products were transformed into E. coli DH5α and cultured. A single colony was picked and cultured. The presence of the Glyma.20G092400 insert in the bacteria was confirmed by PCR using the primers shown in Table 16, and the 1338 bp target band was detected by gel electrophoresis (data not shown). The insert was further confirmed by sequencing, and the sequence was consistent with the Glyma.20G092400 gene sequence.
Table 21. LR reaction
Component | Volume
Expression vector (pSOY1 empty vector plasmid) | 2 µL
Entry vector (Fu28-Glyma.20G092400) | 2 µL
LR Clonase Enzyme Mix II | 1 µL
[0371] Expression vector transferred into EHA105 Agrobacterium tumefaciens. EHA105 Agrobacterium competent cells were transformed with pSOY1-Glyma.20G092400, the transformed cells were grown on a YEP plate containing both rifampicin and spectinomycin, and single colonies were selected. Transformation was confirmed by PCR, which detected a 1338 bp DNA fragment (Glyma.20G092400; data not shown), indicating that the expression vector (pSOY1-Glyma.20G092400) had been transferred into EHA105 Agrobacterium tumefaciens.
[0372] Transient expression localization of Glyma.20G092400. Agrobacterium tumefaciens was injected into tobacco leaves. After 48 hours, the injected leaves were cut and the epidermis was peeled off, spread out in clean water on a glass slide, and covered with a cover glass. The subcellular localization of the pSOY1-Glyma.20G092400-GFP fusion protein was observed under a confocal microscope. The green fluorescence of pSOY1-Glyma.20G092400-GFP appeared on the cell membrane and in the nucleus, indicating that the protein encoded by Glyma.20G092400 localizes to both the cell membrane and the nucleus.
Example 8. Expressing Glyma.20G092400 in Arabidopsis
i. Selection of Arabidopsis mutants
[0373] Arabidopsis AT5G26600 is highly homologous to soybean Glyma.20G092400. The amino acid sequence of Arabidopsis AT5G26600 was obtained by querying the Phytozome database (phytozome.jgi.doe.gov, pz/portal.html) and aligned with the amino acid sequence of Glyma.20G092400. The amino acid sequence identity between Glyma.20G092400 and Arabidopsis AT5G26600 is about 75.8%, and the two proteins have the same conserved domains, both belonging to the amino acid transferase-V family (FIG. 4). Therefore, the Arabidopsis AT5G26600 gene mutant SALK_021984C was purchased from ABRC (abrc.osu.edu) and used as the Arabidopsis counterpart of a soybean Glyma.20G092400 mutant in subsequent experiments.
ii. PCR identification of Arabidopsis mutants
[0374] Arabidopsis mutant SALK_021984C and wild-type Col-0 (WT) plants were planted, and DNA was extracted from their rosette leaves. PCR was performed with the LP+RP and LP+BP primer combinations shown in Table 17. The LP+BP product was about 813 bp (data not shown), and gel electrophoresis analysis indicated that the mutant was homozygous.
iii. PCR identification of homozygous Arabidopsis mutants
[0375] To determine whether the target gene is transcribed at the mRNA level in the Arabidopsis mutant SALK_021984C, total RNA was extracted from mutant leaves and reverse transcribed into cDNA, which was used as a template for RT-PCR amplification. The RT-PCR products were analyzed by 1.5% agarose gel electrophoresis. No target gene transcript was detected in the Arabidopsis mutant, whereas the transcript of the internal control, AT18sRNA, was detected (data not shown). This result further verified that the mutant is a homozygous Arabidopsis mutant.
iv. Basta screening of transgenic Arabidopsis thaliana replenishment and overexpression plants
[0376] To verify the function of Glyma.20G092400 in Arabidopsis, the expression vector (pSOY1-Glyma.20G092400) was transformed by the Agrobacterium inflorescence infection method into Arabidopsis wild-type ecotype Col-0 (WT) and mutant SALK_021984C to produce overexpression plants (pSOY1-Glyma.20G092400) and replenishment plants (pSOY1-Glyma.20G092400/SALK_021984C), respectively. After Agrobacterium infection, mature T0 generation seeds of Arabidopsis thaliana were harvested, mixed with fine sand, and sown in prepared soil. After one week, when the two young leaves of the T1 generation plants were fully developed, the plants were sprayed evenly with Basta solution (1:1000 dilution) every other day. After 2-3 sprays, the non-transgenic Arabidopsis plants withered and gradually died, while the transgenic Arabidopsis plants grew normally and remained green (data not shown). Positive plants were transferred to new small pots and identified by their green leaves as the seedlings grew. These plants were preliminarily identified as transgenic Arabidopsis replenishment or overexpression plants.
v. Bar test strip detection and PCR identification of transgenic Arabidopsis thaliana replenishment and overexpression plants
[0377] Transgenic Arabidopsis T1, T2 and T3 generation plants were selected and planted. Leaf extract was prepared as described above. A Bar test strip (Linear Chemicals) was inserted into the extract in the direction specified in the manufacturer's instructions. The Bar test strips showed two clear bands for leaves of the transgenic plants (overexpression: pSOY1:Glyma.20G092400; replenishment: pSOY1:Glyma.20G092400/SALK_021984C) (data not shown). In parallel, total DNA was extracted from Arabidopsis leaves, and primers spanning the full-length CDS of Glyma.20G092400 and Bar primers were used for PCR identification of transgenic plants. Neither the Glyma.20G092400 band (1338 bp) nor the Bar band (516 bp) was present in the control plants, whereas both were present in the replenishment and overexpression plants. The Bar test strip and PCR results confirmed that the Arabidopsis plants were genetically modified.
vi. RT-qPCR identification of T3 transgenic Arabidopsis
[0378] Arabidopsis wild-type (WT) Col-0, mutant plants (SALK_021984C), replenishment plants (pSOY1:Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1:Glyma.20G092400) were planted under the same conditions as described for Arabidopsis above. Total RNA was extracted from the plant leaves and reverse transcribed into cDNA as described above, and the cDNA was used as a template for RT-qPCR amplification with Glyma.20G092400-specific primers to identify the transgenic Arabidopsis thaliana plants. Glyma.20G092400 was not expressed in wild-type Col-0 or mutant plants, but it was expressed in replenishment and overexpression plants. The amount of Glyma.20G092400 transcript in Arabidopsis wild-type ecotype Col-0 (WT) plants, mutant SALK_021984C plants, transgenic Arabidopsis replenishment plants (pSOY1:Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1:Glyma.20G092400) is shown in Table 22. The expression level of Glyma.20G092400 in overexpression plants was higher than that in replenishment plants. The results indicate that the mutation in Arabidopsis AT5G26600 (a homolog of Glyma.20G092400) significantly reduces its expression, which may be rescued by reintroducing an exogenous copy of Glyma.20G092400 as shown herein. The AT5G26600 and Glyma.20G092400 polypeptides share 61% amino acid sequence identity.
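Percent identity depends on how the alignment is scored and which length is used as the denominator, so the same pair of proteins can yield different values under different conventions. The sketch below takes an already-computed pairwise alignment and shows two common conventions side by side; the function names and toy sequences are hypothetical, not the AT5G26600 or Glyma.20G092400 proteins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues / alignment columns that are not gap-gap (x 100)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no non-gap columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


def percent_identity_shorter(aligned_a, aligned_b, gap="-"):
    """Identical residues / length of the shorter ungapped sequence (x 100)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("a sequence contains only gaps")
    return 100.0 * matches / shorter


# Toy alignment: 4 identical residues, 6 columns, shorter sequence has 5 residues
print(percent_identity("MKT-LV", "MKSALV"), percent_identity_shorter("MKT-LV", "MKSALV"))
```

Reporting which convention (and which alignment program) was used makes identity values comparable.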
Table 22. Relative expression level of Glyma.20G092400 (transcript levels) in various plants.
vii. Investigation on bolting of T3 generation Arabidopsis
[0379] Arabidopsis wild-type Col-0, mutant plants (SALK_021984C), replenishment plants (pSOY1:Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1:Glyma.20G092400) were planted under the same conditions. After 25 days, the plants were examined for bolting. Bolting occurs when a plant prematurely grows flower stalks and produces seeds. Wild-type Col-0, replenishment and overexpression plants bolted earlier than mutant plants, and the bolting height of wild-type Col-0 and replenishment plants was about the same. In contrast, overexpression plants appeared to have the greatest bolting height (FIG. 1), indicating that the Glyma.20G092400 gene may play a role in promoting plant bolting.
viii. Investigation on inflorescences of T3 generation transgenic Arabidopsis
[0380] Arabidopsis wild-type Col-0, mutant plants (SALK_021984C), replenishment plants (pSOY1:Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1:Glyma.20G092400) were planted under the same conditions. After 35 days, the plants were examined for inflorescences. Wild-type Col-0, replenishment and overexpression plants had more inflorescences than mutant plants, and overexpression plants had the most inflorescences (FIG. 2). It is speculated that the Glyma.20G092400 gene may promote the growth of plant inflorescences.
ix. Determination of fatty acids and total nitrogen in T3 Arabidopsis seeds
[0381] Arabidopsis wild-type Col-0, mutant plants (SALK_021984C), replenishment plants (pSOY1:Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1:Glyma.20G092400) were planted under the same conditions. After the seeds matured, the fatty acid content of the grains was determined by gas chromatography. The fatty acid composition of the grains of the Glyma.20G092400 overexpression plants differed from that of the control plants, and their total fatty acid content was significantly higher (FIGS. 3A-B). In contrast, the fatty acid composition and total fatty acid content of the grains of the mutant plants were significantly lower than those of the control plants. The Glyma.20G092400 replenishment experiment was further carried out on the mutant plants. The palmitic acid, linoleic acid and eicosenoic acid contents of the replenishment lines were significantly higher than those of the control plants, and the stearic acid and oleic acid contents were lower (FIG. 3A). The replenishment lines also had harder grains (data not shown). The fatty acid and oleic acid contents were significantly higher in the control plants than in the mutant plants (FIG. 3A). Moreover, the total fatty acid content of the grains of the replenishment lines was significantly higher than that of the control plants (FIG. 3B). These results suggest that Glyma.20G092400 can promote the accumulation of fatty acids in grains.
Example 9. Expressing Glyma.20G092400 in soybean
i. Bar test strip detection and PCR identification of T1 transgenic soybean
[0382] The T1 genetically modified soybeans were planted, and the leaves were crushed and tested with the Bar test strip as described above. Two horizontal lines appeared on the Bar test strip for the overexpressing plants (data not shown), indicating that the tested plants are genetically modified soybean plants. The overexpressing plants were further verified by PCR using primers spanning the full-length CDS of Glyma.20G092400 (1338 bp) and Bar primers (516 bp) (data not shown), confirming that they were transgenic soybean plants.
ii. qRT-PCR identification of T1 transgenic soybeans
[0383] The transgenic soybean (overexpression plant pSOY1:Glyma.20G092400) and the control wild-type plant Dongnong 50 (DN50) (WT) were planted under the same conditions. Total RNA was extracted from young leaves and reverse transcribed into cDNA. The expression level of Glyma.20G092400 was tested by qRT-PCR using Glyma.20G092400-specific primers. The expression level of Glyma.20G092400 in the overexpression plants was higher than in the control plants, indicating that Glyma.20G092400 was successfully transformed into soybean plants (Table 23).
Table 23. Glyma.20G092400 transcripts in leaves of wild-type (WT) and Glyma.20G092400-OE plants by RT-PCR
Note: the numbers represent the relative expression levels.
iii. Determination of protein and fatty acids in T1 transgenic soybean grains
[0384] The transgenic soybean (overexpression plant pSOY1:Glyma.20G092400) and the control plant DN50 were planted under the same conditions; their mature T1 seeds were harvested, and some of the seeds were dried for phenotyping. Grain protein and oil content were determined by Kjeldahl nitrogen determination, and fatty acid content was determined by gas chromatography, e.g., as disclosed in Rapid Commun. Mass Spectrom. 2007;21(12):1937-43. The protein, oil and fatty acid contents of the overexpression plants were significantly higher than those of the control plants, indicating that Glyma.20G092400 improved quality traits (protein and oil content) (Table 24).
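The source does not give the Kjeldahl titration formula or the nitrogen-to-protein conversion factor. The sketch below assumes the common boric-acid-capture calculation, %N = (sample titre - blank titre) x acid normality x 1.4007 / sample mass (g), and a default factor of 6.25 (5.71 is also used for soybean); the titration values are hypothetical.

```python
def kjeldahl_nitrogen_percent(titre_ml, blank_ml, acid_normality, sample_g):
    """Total nitrogen (% w/w) from a Kjeldahl titration.

    1.4007 = 14.007 mg N per meq of acid, / 1000 mg per g, x 100 %.
    """
    if sample_g <= 0:
        raise ValueError("sample mass must be positive")
    return (titre_ml - blank_ml) * acid_normality * 1.4007 / sample_g


def crude_protein_percent(nitrogen_percent, factor=6.25):
    """Crude protein (% w/w) = total nitrogen (%) x conversion factor."""
    return nitrogen_percent * factor


# Hypothetical titration of a 0.7 g seed sample with 0.1 N HCl
n_pct = kjeldahl_nitrogen_percent(titre_ml=33.0, blank_ml=0.2, acid_normality=0.1, sample_g=0.7)
print(f"N = {n_pct:.2f}%, protein = {crude_protein_percent(n_pct):.1f}%")
```

Because the factor scales every value equally, it does not change the ranking of transgenic versus control lines, but it does matter when comparing absolute protein percentages across studies.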
Table 24. Protein and fatty acid contents in wild-type (WT) and Glyma.20G092400-OE plants
Note: the numbers represent the amount of each component as a weight percentage of the seed
Example 10. Distribution of protein and oil content of CSSLs population
[0385] A FOSS grain analyzer (InfraTec 1241) was used to determine the seed protein and oil content of the CSSL population (2013-2015). Three biological replicates were measured for each test material, and the average value was used for phenotypic analysis of protein and oil content. Protein content ranged from about 37.00% to 46.77%, and oil content from about 18.02% to 23.19%. The data are consistent with a normal distribution and are suitable for quantitative trait locus (QTL) mapping of protein and oil content. As used herein, QTL mapping refers to a genome-wide inference of the relationship between genotype at various genomic locations and phenotype for a set of quantitative traits, in terms of the number, genomic positions, effects and interactions of QTLs.
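The source does not say how normality was judged. A quick descriptive check, assumed here, computes moment-based skewness and excess kurtosis, both of which are near 0 for normally distributed data; a formal test (for example Shapiro-Wilk) would be the stricter option. The phenotype values in the example are hypothetical.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(values):
    """Population (moment-based) skewness and excess kurtosis.

    Both are close to 0 for normally distributed data. This is a
    descriptive check only, not a formal normality test.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mu = fmean(values)
    deviations = [x - mu for x in values]
    m2 = sum(d ** 2 for d in deviations) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    m3 = sum(d ** 3 for d in deviations) / n
    m4 = sum(d ** 4 for d in deviations) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical protein contents (%) for a handful of lines
print(skewness_and_excess_kurtosis([37.5, 39.8, 41.2, 41.9, 42.6, 44.3, 46.1]))
```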
Example 11. Fine mapping of QTL (Qpro&oil_Gm20) for soybean protein and oil content
[0386] The chromosome segment substitution lines (CSSLs) constructed from SN14 and wild soybean (ZYD00006) were used to fine-map the QTL for soybean protein and oil content (Qpro&oil_Gm20) from 2013 to 2015 (Table 25). Four protein and oil content-related QTLs (Qpro_Gm20_1, Qpro_Gm20_2, Qoil_Gm20_1 and Qoil_Gm20_2) with similar confidence intervals were detected on the same chromosome (Gm20) from 2013 to 2015. Their map distances ranged from 0.02 Mb (minimum, Qpro_Gm20_2) to 0.16 Mb (maximum, Qoil_Gm20_2). The logarithm of the odds (LOD) values ranged from 3.72 to 12.62; the minimum LOD value, for Qpro_Gm20_2, is 3.72, and the maximum LOD value, for Qoil_Gm20_1, is 15.16. The genetic contribution rates (R2) ranged from 2.27% (minimum, Qpro_Gm20_2) to 22.86% (maximum, Qpro_Gm20_1). The additive effects ranged from -0.52 (minimum, Qoil_Gm20_1) to 1.27 (maximum, Qpro_Gm20_1). Because the confidence intervals of the four QTLs are close, they were integrated into a "hot spot" interval (33.54 Mb-34.70 Mb) for the study of protein and oil content-related QTLs, and these QTL results were used for the mining and functional analysis of subsequent protein and oil content-related candidate genes. This "hot spot" interval is consistent with MQTLOil-62 (Gm20: 33.14 Mb-33.84 Mb) described in Qi et al., Plant Cell Environ. 41(9):2109-2127 (2018). In this study, we identified the "hot spot" interval through meta-analysis of 312 oil content QTLs (Table 26), further verifying the precision and accuracy of the fine mapping of the protein and oil content QTL (Qpro&oil_Gm20).
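Integrating the QTL confidence intervals into one "hot spot" can be read as taking their envelope (the smallest interval covering all of them), and agreement with MQTLOil-62 can be expressed as the length of overlap. The sketch assumes that reading; it uses only the two protein-QTL intervals legible in Table 25, which on their own already span the reported 33.54-34.70 Mb interval. The function names are hypothetical.

```python
def envelope(intervals_mb):
    """Smallest (start, end) interval in Mb that covers all given intervals."""
    if not intervals_mb:
        raise ValueError("need at least one interval")
    starts, ends = zip(*intervals_mb)
    return min(starts), max(ends)


def overlap_mb(a, b):
    """Length in Mb of the overlap between intervals a and b (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


# Confidence intervals from Table 25 (2013 and 2014 protein QTLs on Gm20)
hot_spot = envelope([(33.54, 33.59), (34.68, 34.70)])
mqtl_oil_62 = (33.14, 33.84)
print(hot_spot, round(overlap_mb(hot_spot, mqtl_oil_62), 2))
```

An envelope is only meaningful when the intervals are close, as the text notes; widely separated intervals would instead suggest distinct loci.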
Table 25. QTL (Qpro&oil_Gm20) fine mapping of protein and oil content in CSSLs population
Year | QTL name | Gm | Confidence interval (Mb) | Map distance (Mb) | Marker name | LOD | R2 (%) | Additive effect
2013 | Qpro_Gm20_1 | Gm20 | 33.54-33.59 | 0.05 | Block10563 | 12.62 | 22.86 | 1.27
2014 | Qpro_Gm20_2 | Gm20 | 34.68-34.70 | 0.02 | Block10634 | 3.72 | 2.27 | -0.41
- | MQTLOil-62 | Gm20 | 33.14-33.84 | 0.70 | - | 15.52 | 7.00 | 1.70
Example 12. Candidate gene mining and WEGO analysis in the “hot spot” interval
[0387] The "hot spot" interval (33.54 Mb-34.70 Mb) was obtained by integrating the confidence intervals of the protein and oil content QTL (Qpro&oil_Gm20) finely mapped in the CSSL population from 2013 to 2015. Candidate gene mining and Web Gene Ontology (WEGO) analysis were performed on this "hot spot" interval. There are 130 candidate genes in the interval, of which 112 have Gene Ontology (GO) annotations: 98 candidate genes are related to cellular components, 64 to molecular functions, and 72 to biological processes. Further GO analysis of the 130 candidate genes was carried out using GO analysis tools in the SoyBase and QuickGO databases, together with the Kyoto Encyclopedia of Genes and Genomes (KEGG; www.kegg.jp/) and Gene Ontology (geneontology.org/) resources. Glyma.20G092000 is involved in lipid metabolism (GO:0006629); Glyma.20G092100 is related to embryonic grain development (GO:0009793) and has protein amino acid binding and glycoprotein binding functions (GO:0005515); Glyma.20G092400 has catalytic activity (GO:0003824); and Glyma.20G094900 is related to lipid binding (GO:0005543) (Table 27). These four genes were therefore selected as candidate genes for protein and oil content analysis, and the results suggest that they may be related to the metabolism and synthesis of protein and oil.
Table 27. GO analysis of candidate genes
Gene_ID | Biological Process | Molecular Function | Cellular Component
Glyma.20G092000 | GO:0006096: Golgi organization; GO:0006508: glycolysis; GO:0006623: hyperosmotic response; GO:0006629: lipid metabolic process; GO:0006833: organ development; GO:0006972: protein targeting to vacuole; GO:0009266: response to cadmium ion; GO:0048513: water transport | GO:0004175: aspartic-type endopeptidase activity; GO:0004190: endopeptidase activity | GO:0005576: cytosol; GO:0005773: extracellular region; GO:0005829: plasmodesma; GO:0009506: vacuole
Glyma.20G092100 | GO:0009793: embryo development ending in seed dormancy; GO:0010027: iron-sulfur cluster assembly; GO:0010228: ovule development; GO:0016226: thylakoid membrane organization; GO:0048481: vegetative to reproductive phase transition of meristem | GO:0005515: protein amino acid binding, glycoprotein binding | GO:0005739: chloroplast; GO:0009507: mitochondrion
Glyma.20G092400 | GO:0006569: cysteine biosynthetic process; GO:0008152: indoleacetic acid biosynthetic process; GO:0009610: metabolic process; GO:0009684: para-aminobenzoic acid metabolic process; GO:0019344: response to symbiotic fungus; GO:0046482: tryptophan catabolic process | GO:0003824: catalytic activity; GO:0030170: pyridoxal phosphate binding | -
Glyma.20G094900 | GO:0008150: biological process | GO:0005543: lipid binding; GO:0008289: phospholipid binding | GO:0005739: mitochondrion; GO:0005886: plasma membrane; GO:0005634: nucleus
Example 13. Determination of protein content of soybean grains at different developmental stages
[0388] Soybean grain protein content is one of the important traits for measuring soybean quality. In this study, the Kjeldahl method was used to determine the grain protein content of the parent SN14 and the extreme materials (HPLO, LPHO, HPHO and LPLO) to analyze the protein accumulation characteristics of soybean grains at different developmental stages (Table 28). All five materials had their highest grain total nitrogen/protein content at the EM1 stage, and nitrogen/protein content decreased as grain development progressed. The grain protein content of all five materials dropped sharply from the EM1 to the MM stage, and the grain protein content of HPHO and LPLO reached its lowest level at the MM stage. From the MM to the LM stage, the grain protein content of SN14, HPHO and LPLO showed an upward trend, while that of HPLO and LPHO continued to decrease, with HPLO grain protein content reaching its lowest level at the LM stage. During the LM-DS stages, the grain protein content of SN14, HPHO, LPHO and LPLO decreased. The LPHO grain protein content declined throughout grain development and reached its lowest level at the DS stage. Of note, the two high-protein materials, HPLO and HPHO, had higher protein content than the parent SN14 at all stages of soybean kernel development, and the two low-protein materials, LPHO and LPLO, had lower protein content than the parent SN14 at all stages.
Table 28. Protein accumulation characteristics of soybean grains at different developmental stages
Example 14. Determination of fatty acid content of soybean kernels at different developmental stages
[0389] Types of fatty acids in soybean seed oil include palmitic acid (C16:0), stearic acid (C18:0), oleic acid (C18:1), linoleic acid (C18:2) and linolenic acid (C18:3). In this study, the fatty acid content of the parent SN14 and the extreme materials (HPLO, LPHO, HPHO and LPLO) was measured to analyze the fatty acid accumulation characteristics of soybean grains at different developmental stages (Tables 29-34). Fatty acids were detected in the grains of all five materials beginning at the EM1-EM2 developmental stages. Referring to Table 29, the palmitic acid level remained low but detectable at the EM1 and EM2 stages, increased sharply from EM2 to MM, peaked at LM, and then dropped from LM to DS. Referring to Table 30, the stearic acid level was high at EM1 and decreased sharply from EM1 to EM2; it then increased gradually from EM2 to LM, peaking at LM. The oleic acid level (Table 31) and linoleic acid level (Table 32) followed the same trend as the palmitic acid level (Table 29) throughout all five developmental stages. Referring to Table 33, all materials except LPLO generally had high linolenic acid levels at EM1 and EM2, followed by a downward trend from EM2 to LM and an increase from LM to DS. The LPLO linolenic acid level was irregular: it was high at EM1 and decreased from EM1 to EM2, then increased sharply from EM2 to MM, decreased sharply from MM to LM, and increased again from LM to DS. All five materials had similar trends of palmitic, stearic, oleic and linoleic acid throughout the five developmental stages. Referring to Table 34, the total fatty acid content followed the same trend as several individual fatty acids, e.g., palmitic acid in Table 29.
Of note, the fatty acid content of the two high-oil materials, LPHO and HPHO, was higher than that of the parent SN14, while the fatty acid content of the two low-oil materials, HPLO and LPLO, was lower than that of the parent SN14.
Table 29. Palmitic acid amounts in soybean seeds at different developmental stages in SN14, HPLO, LPHO, HPHO, and LPLO
Table 30. Stearic acid amounts in soybean seeds at different developmental stages in SN14, HPLO, LPHO, HPHO, and LPLO
Table 31. Oleic acid amounts in soybean seeds at different developmental stages in SN14, HPLO, LPHO, HPHO, and LPLO
Table 32. Linoleic acid amounts in soybean seeds at different developmental stages in SN14, HPLO, LPHO, HPHO, and LPLO
Table 33. Linolenic acid amounts in soybean seeds at different developmental stages in SN14, HPLO, LPHO, HPHO, and LPLO
Table 34. Total fatty acid amounts in soybean seeds at different developmental stages in SN14, HPLO, LPHO, HPHO, and LPLO
Example 15. Expression analysis of candidate genes in soybean grains at different developmental stages
[0390] Expression of Glyma.20G092000, Glyma.20G092100, Glyma.20G092400 and Glyma.20G094900 in SN14 and the four extreme materials (HPLO, LPHO, HPHO and LPLO) was analyzed by RT-qPCR. RNA extraction, cDNA synthesis and RT-qPCR were performed according to the methods described herein. Expression was examined at eight grain developmental stages: Glob, Hrt, Cot, EM1, EM2, MM, LM and DS (Table 7). The results (Table 35A and Table 35B) indicated that, in general, the expression levels of Glyma.20G092000, Glyma.20G092100, Glyma.20G092400 and Glyma.20G094900 were low at stages Glob and Hrt and peaked at stage LM or DS. Three genes, Glyma.20G092000, Glyma.20G092400 and Glyma.20G094900, shared a similar expression trend across the eight developmental stages. Briefly, in SN14, LPHO and LPLO, their expression increased from stage Hrt to Cot, dropped from Cot to EM2, increased from EM2 to LM, and dropped from LM to DS. In HPHO and HPLO, the expression of Glyma.20G092000 continued to increase from LM to DS and was highest at stage DS. The expression of Glyma.20G092100 remained relatively steady from stage Glob to EM2 (except in LPLO), increased from EM2 to DS, and remained high at stage DS. At stage DS, the expression of Glyma.20G092400 in LPLO was slightly lower than that of Glyma.20G092100. The expression of Glyma.20G092400 was higher than that of Glyma.20G092000, Glyma.20G092100 and Glyma.20G094900 at each developmental stage in the five materials, and the expression of Glyma.20G092400 in HPHO was highest at stage LM. Therefore, Glyma.20G092400 was selected for further analysis of its role in regulating protein and oil accumulation during grain development.
Table 35A. Expression profiles of Glyma.20G092000, Glyma.20G092100, Glyma.20G092400, and Glyma.20G094900 at different developmental stages in seed
Note: the numbers represent relative expression levels.
Table 35B. Expression profiles of Glyma.20G092000, Glyma.20G092100, Glyma.20G092400, and Glyma.20G094900 at different developmental stages in seed
Note: the numbers represent relative expression levels.
Example 16. Phylogenetic analysis
[0391] A phylogenetic tree of GmDES1 (Glyma.20G092400) was constructed with MEGA5 software using homologous sequences from soybean, Arabidopsis, rice and corn. See Fig. 4. GmDES1 (Glyma.20G092400) shares sequence identity with AT5G26600 (60.6%), AT3G62130 (55.5%), Zm00001d008187 (54.8%), Zm00001d040555 (57.2%), LOC_Os01g18640 (56.3%) and LOC_Os01g18660 (52.4%).
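Identity percentages such as those above depend on how they are counted. As an illustration only, the sketch below computes percent identity from two sequences that have already been aligned (gaps written as '-'). The denominator used here (alignment columns in which at least one sequence has a residue) is an assumption; MEGA5 and other tools may count differently, so this is not presented as the method used to obtain the values above. The sequences in the usage line are hypothetical toy strings.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences (gaps as '-').

    Identical columns are divided by all columns in which at least one
    sequence has a residue; gap-gap columns are skipped (assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information
        columns += 1
        if x == y:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no informative columns")
    return 100.0 * matches / columns


print(round(percent_identity("MKT-LLV", "MKTALIV"), 1))  # prints 71.4
```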
Example 17. Experimental Materials Used in Examples 18-27
[0392] The roots, stems, leaves, flowers, pods and seeds of the parent SN14 were used as template materials for tissue-specific expression analysis. The materials were placed in RNase-free Eppendorf (EP) tubes, immediately frozen in liquid nitrogen, and stored at -80°C. The soybean template material was SN14, and the soybean transformation material was DN50. The Arabidopsis transformation material was Col-0, and the Arabidopsis mutant material was SALK_127828.47.00.x (ordered from the ABRC website).
[0393] Unless explicitly stated otherwise, the Escherichia coli strain used in this application was DH5α and the Agrobacterium tumefaciens strain was EHA105. The target gene fragment in the entry vector Fu28 was transferred into the plant expression vector Pr35S using the Gateway vector system. The entry vector Fu28 and the expression vector Pr35S were provided by Professor Fu Yongfu of the Institute of Crop Science, Chinese Academy of Agricultural Sciences. Both vectors are described in Wang X, et al. (2013) BioVector, a flexible system for gene specific expression in plants. BMC Plant Biol 13:198, the entire content of which is herein incorporated by reference.
[0394] The main reagents involved in this experiment are shown in Table 36.
Table 36. Main experimental reagents
Reagent name | Manufacturer
EasyPure HiPure Plasmid MiniPrep Kit | TransGen Biotech
Trans2K Plus DNA Marker | TransGen Biotech
2× ChamQ Universal SYBR qPCR Master Mix | Vazyme Biotech
HiScript II Q RT SuperMix for qPCR (+gDNA wiper) | Vazyme Biotech
TRIzol Reagent | Invitrogen
2× Rapid Taq Master Mix | Vazyme Biotech
FastPure Plasmid Mini Kit | Vazyme Biotech
Gateway LR Clonase II enzyme mix | Invitrogen
Gel Extraction Kit (200) | OMEGA
Cycle-Pure Kit (100) | OMEGA
KOD-Plus-Neo | TOYOBO
Solution I | Takara
Restriction endonucleases (SpeI, BamHI) | NEB
Hormones and antibiotics | Sigma, Phyto Techlab and Amresco
[0395] The media and antibiotics used in this experiment are listed in Table 37 and Table 38.
Table 37. Experimental media
Medium | Components
Liquid LB | 5 g/L yeast extract + 10 g/L NaCl + 10 g/L tryptone
Solid LB | 5 g/L yeast extract + 10 g/L NaCl + 10 g/L tryptone + 15 g/L agar
Liquid YEP | 5 g/L yeast extract + 5 g/L NaCl + 10 g/L tryptone
Solid YEP | 5 g/L yeast extract + 5 g/L NaCl + 10 g/L tryptone + 15 g/L agar
Table 38. Antibiotics
Antibiotic | Working concentration
Chloramphenicol | 27.38 µg/mL
Spectinomycin | 50.00 µg/mL
Rifampicin | 50.00 µg/mL
[0396] The websites used in the experimental analysis are shown in Table 39.
Table 39. Websites used for candidate gene function prediction
Database | URL
SoyBase | www.soybase.org/
Phytozome | phytozome.jgi.doe.gov/pz/portal.html
NCBI | www.ncbi.nlm.nih.gov/
KEGG | www.kegg.jp/
TAIR10 | www.arabidopsis.org/
InterProScan | www.ebi.ac.uk/interpro/search/sequence-search
PlantCARE | bioinformatics.psb.ugent.be/webtools/plantcare/html/
SOPMA | npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_sopma.html
SIGnAL | signal.salk.edu/tdnaprimers.2.html
SWISS-MODEL | swissmodel.expasy.org/
[0397] The genomic sequence, CDS sequence, peptide sequence and tissue expression data of each gene were obtained from the Phytozome website. The SoyBase, Phytozome, InterProScan and NCBI websites were used to annotate gene function and to obtain Gene Ontology (GO) numbers, Kyoto Encyclopedia of Genes and Genomes (KEGG) numbers, Pfam numbers and structural domain information.
[0398] The genomic sequence, CDS sequence and peptide sequence of the candidate genes were obtained from the Phytozome website. The parental lines Suinong 14 (SN14) and ZYD00006 were fully sequenced, but the sequencing information has not been published. Williams 82 is the soybean cultivar used to produce the reference genome sequence. The relevant sequences in Williams 82, SN14 and ZYD00006 were analyzed and compared using DNAMAN software.
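Comparing the Williams 82, SN14 and ZYD00006 sequences amounts to listing the columns where two aligned sequences differ. The sketch below is not DNAMAN; it assumes the sequences are already aligned (gaps as '-') and reports each difference at its 1-based position in the ungapped reference, a common way of tabulating SNPs and small indels between parents. The toy sequences are hypothetical.

```python
from typing import List, Tuple


def list_differences(ref_aln: str, qry_aln: str) -> List[Tuple[int, str, str]]:
    """Return (reference position, ref char, query char) for each differing column.

    Positions are 1-based in the ungapped reference. For an insertion in the
    query (gap in the reference), the position of the preceding reference base
    is reported.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    diffs = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on real reference bases
        if r != q:
            diffs.append((ref_pos, r, q))
    return diffs


print(list_differences("ATG-CCA", "ATGTCGA"))  # prints [(3, '-', 'T'), (5, 'C', 'G')]
```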
[0399] The media used for genetic transformation of soybean cotyledonary nodes are shown in Table 40.
Table 40. Medium formulas for genetic transformation of soybean cotyledonary nodes
Medium | Components and dosage
LCCM | 0.321 g/L B5 salt, B5 vitamin, 30 g/L sucrose, 3.9 g/L MES, 1.67 mg/L 6-BA, 0.25 mg/L GA3, 0.04 g/L AS, 0.4 g/L L-Cys, pH 5.4
SCCM | LCCM, 0.15 g/L DTT, 6.5 g/L agar, pH 5.4
SIM+ | 3.21 g/L B5 salt, B5 vitamin, 30 g/L sucrose, 0.59 g/L MES, 1.67 mg/L 6-BA, 200 mg/L Cef, 250 mg/L Tim, 8 mg/L Basta, 7.5 g/L agar, pH 5.7
SEM | 4.33 g/L MS salt, B5 vitamin, 30 g/L sucrose, 0.59 g/L MES, 0.5 mg/L GA3, 1 mg/L ZT, 0.1 g/L Pro, 0.1 g/L Asp, 0.1 g/L Glu, 0.1 mg/L IAA, 0.2 g/L Cef, 0.25 g/L Tim, 4 mg/L Basta, 7.5 g/L agar, pH 5.6
RM | 1.605 g/L B5 salt, B5 vitamin, 20 g/L sucrose, 0.59 g/L MES, 1 mg/L IBA, 8 g/L agar, 0.2 g/L Cef, 0.25 g/L Tim, pH 5.7
Note: B5 salt: Gamborg basal salt mixture; MES: 2-(4-morpholino)ethanesulfonic acid; 6-BA: 6-benzylaminopurine; GA3: gibberellic acid; AS: acetosyringone; L-Cys: L-cysteine; DTT: DL-dithiothreitol; ZT: zeatin; Pro: proline; Asp: aspartic acid; Glu: glutamic acid; IAA: 3-indoleacetic acid.
Example 18. Methods for the Identification of Arabidopsis mutants
A. Arabidopsis mutant search and order
[0400] The BLAST function of the Phytozome website was run with Arabidopsis thaliana as the target species and the Glyma.START CDS sequence as the query to obtain the homologous gene in Arabidopsis thaliana. The conserved functional domain of this Arabidopsis homolog was predicted and found to be similar to that of the target gene Glyma.START. Homologous genes in Arabidopsis thaliana were then searched on the TAIR10 website, and mutants meeting the following conditions were screened and ordered: (1) the control background is wild-type Col-0; (2) there are few mutations in transcribed genes, preferably a mutation only in the target gene; (3) the mutation is a T-DNA insertion; and (4) the mutant genotype is homozygous (AT1G05230).
B. Cultivation of Arabidopsis thaliana
[0401] The planting soil was a 3:1 mixture of flower nutrient soil and vermiculite. The soil was put into small flower pots and slowly soaked with water, and Arabidopsis thaliana seeds were sown evenly on the moist soil. Each pot was sealed with plastic wrap and placed in a refrigerator at 4°C for vernalization for 48-72 h. After vernalization, the pots were placed in an incubator (22°C, 16 h/8 h light/dark, 70 µmol·m⁻²·s⁻¹) for 1 week until the seedlings emerged. Seedlings with two large leaves and two small leaves were selected and transplanted into pots, 1-2 plants per pot. Water or flower fertilizer was added when the soil in the pots became dry.
C. DNA extraction from Arabidopsis leaves
[0402] Total DNA was extracted by the CTAB (hexadecyltrimethylammonium bromide) method (Porebski, S. et al., Plant Molecular Biology Reporter, 1997, 15(1):8-15). The prepared CTAB extraction buffer was stored at 4°C. Rosette leaves of Arabidopsis thaliana were collected and placed in an EP tube containing 2 mm steel balls. The leaves were quick-frozen in liquid nitrogen and then fully disrupted in a tissue grinder. 700 µL of CTAB extraction buffer was added to the EP tube containing the sample and mixed thoroughly with a vortexer. The tube was then placed in a 65°C water bath for 1 h and inverted to mix once every 10 minutes. The tube was removed from the water bath and, after cooling, 650 µL of chloroform was added. The tube was inverted 30 times to mix thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 µL of the supernatant was transferred to a new EP tube and 650 µL of chloroform was added. The mixture was shaken thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 µL of the supernatant was transferred to a new EP tube containing 700 µL of pre-cooled isopropanol and inverted 30 times to mix thoroughly. The mixture was then centrifuged at 12000 rpm for 15 minutes at room temperature. The supernatant was discarded, and the precipitate was washed once with 95% ethanol and once with 75% ethanol, centrifuging at 7500 rpm for 5 min at room temperature after each wash. The DNA precipitate was dried and dissolved in 50 µL of sterilized water. DNA concentration (as reflected by the OD600 value) was measured, and the DNA was stored at -20°C.
D. PCR identification of Arabidopsis mutants
[0403] DNA from Arabidopsis wild-type Col-0 and mutants was extracted and used as a template for PCR amplification. The amplified product was subjected to 1.5% agarose gel electrophoresis to detect whether the mutant was a homozygous mutant. The primers used are shown in Table 41.
Table 41. PCR primers for identification of Arabidopsis homozygous mutant
Primer name | Sequence
SALK-LP | GCTCTGCCAATTTCAGCATAC (SEQ ID NO: 71)
SALK-RP | TGTCTCCTCCTCCTCTTCCTC (SEQ ID NO: 72)
SALK-BP | ATTTTGCCGATTTCGGAAC (SEQ ID NO: 73)
E. Arabidopsis genetic transformation and identification
i. Floral dip transformation of Arabidopsis thaliana
[0404] Arabidopsis cultivation and plant transformation preparation. The Arabidopsis control Col-0 and the homozygous mutant materials were planted as described above. After the plants bolted, the primary stalks were removed to increase the number of bolts. The plants were ready for transformation when the stalks had grown to the same height and only the uppermost flowers had not yet bloomed.
[0405] Agrobacterium preparation. Agrobacterium tumefaciens containing the expression vector, stored at -80°C, was inoculated into 10 mL of LB liquid medium containing spectinomycin and cultured overnight at 28°C and 160 rpm. Then 100 µL of this starter culture was transferred to 100 mL of fresh YEP liquid medium containing spectinomycin and cultured further at 28°C with shaking at 200 rpm. When the culture reached an OD600 of 0.8, the cells were harvested, and the bacterial pellet was resuspended in 100 mL of resuspension solution containing 5% sucrose and 0.01% Silwet L-77. The suspension was kept at room temperature for 1-3 h before use.
[0406] Transformation of Arabidopsis thaliana. Arabidopsis thaliana plants that had bolted to a suitable height and carried many inflorescences were used for transformation. Open flowers and developing pods were removed, and the unopened inflorescences were immersed in the Agrobacterium suspension for 30 s. The infected plants were then wrapped in plastic wrap and kept in a dark box, protected from light, for 24 hours before being taken out. A second round of transformation was performed on the same plants one week later to improve transformation efficiency. Mature seeds of the plants were harvested.
ii. Screening Transgenic Arabidopsis with Basta
[0407] The mature T0 seeds of the transformed Arabidopsis thaliana plants were harvested and planted as described above. When the first two true leaves were fully expanded, Basta solution (1:1000 dilution) was sprayed on the plants 2-3 times, once every other day, and the growth of the plants was observed. Non-transgenic Arabidopsis plants showed chlorosis and gradually died, while transgenic Arabidopsis plants grew normally. After the transgenic plants had grown 4 leaves, the plants identified as positive were transplanted into new small pots and grown further before identification.
iii. Bar test strip detection of transgenic Arabidopsis thaliana
[0408] Rosette leaves of transgenic Arabidopsis thaliana (T1, T2 and T3) were placed in an EP tube and ground with a small pestle. A Bar test strip was inserted into the EP tube in the direction indicated on the strip, and the number of bands on the strip was observed: two bands indicate that the plant is transgenic, and one band indicates a non-transgenic plant.
iv. Identification of transgenic Arabidopsis thaliana
[0409] Leaf DNA of transgenic Arabidopsis thaliana (T1, T2 and T3) was extracted. Transgenic plants were identified by PCR using the Glyma.06G303700 gene primers and the Bar primers shown in Table 42. The PCR products were detected by 1.5% agarose gel electrophoresis.
Table 42. Primers for identification of transgenic Arabidopsis thaliana or transgenic soybean (Glyma.06G303700)
v. qRT-PCR identification of T3-generation transgenic Arabidopsis thaliana
[0410] Total RNA was extracted from Arabidopsis rosette leaves and reverse transcribed into cDNA. The expression level of Glyma.06G303700 in transgenic Arabidopsis was determined using the primer sequences shown in Table 43, with AtACTIN2 as the internal reference gene.
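The text does not state how relative expression was calculated from the qRT-PCR data. A common choice when a single reference gene such as AtACTIN2 (or GmActin for soybean) is used is the 2^-ΔΔCt method; the sketch below assumes that method, and its built-in assumption of roughly equal amplification efficiencies for target and reference genes. The function name and the Ct values in the usage line are hypothetical.

```python
from statistics import mean
from typing import Sequence


def relative_expression(ct_target_sample: Sequence[float],
                        ct_ref_sample: Sequence[float],
                        ct_target_control: Sequence[float],
                        ct_ref_control: Sequence[float]) -> float:
    """Fold change of a target gene in a sample versus a calibrator (2^-ΔΔCt).

    ΔCt = mean Ct(target) - mean Ct(reference), computed per condition;
    ΔΔCt = ΔCt(sample) - ΔCt(control). Technical replicates are averaged first.
    """
    d_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_control = mean(ct_target_control) - mean(ct_ref_control)
    return 2.0 ** -(d_sample - d_control)


# Transgenic line vs. wild type, hypothetical Ct values (target, AtACTIN2):
print(relative_expression([22.0, 22.0], [18.0, 18.0], [25.0, 25.0], [18.0, 18.0]))
```

With these numbers ΔΔCt is -3, so the target appears 8-fold higher in the sample than in the calibrator.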
vi. Determination of total nitrogen content in seeds of Arabidopsis mutants and transgenic plants
[0411] For seed nitrogen determination, 0.1 mol/L HCl was prepared and standardized against 0.1 mol/L Na2CO3. A 1% H3BO3 solution was prepared and adjusted to pH 4-5; 7 mL of 0.1% methyl red and 10 mL of 0.1% bromophenol green indicator were added per 1 L of H3BO3, giving a wine-red solution. A 40% NaOH solution was also prepared for the determination.
[0412] The seeds were dried in an oven at 60°C for 12-14 hours. A 0.1 g sample (accurate to 0.001 g) was poured into a 50 mL digestion tube through a paper trough, and each sample was tested 3 times. 5 mL of concentrated sulfuric acid and a small amount of catalyst (potassium sulfate and copper sulfate, 5:1) were added, and each sample was digested in an oven at 400°C for 90 minutes. The sample was then taken out of the oven and allowed to cool, and the total nitrogen content was determined with a FOSS Kjeltec 2300 automatic analyzer.
vii. Determination of fatty acid content in seeds of Arabidopsis mutants and transgenic plants
[0413] The fatty acid content of seeds was determined by gas chromatography as follows. The seeds were dried in an oven at 105°C for 20-30 minutes and then at 65°C for 12-14 hours. Five replicate tests were performed for each sample. In each test, about 5 mg of seed was mixed with 1 mL of 2.5% concentrated sulfuric acid in methanol and 5 µL of 50 mg/mL BHT (2,6-di-tert-butyl-4-methylphenol), and 50 µL of 10 mg/L heptadecanoic acid or acetic acid was added as the internal standard. The storage tube was immediately sealed and placed in a water bath at 85°C for 1.5 h, inverted every 10 minutes to mix the sample and reagents thoroughly, and then allowed to cool to room temperature. 160 µL of 9% NaCl solution and 700 µL of n-hexane were then added to the tube, and the mixture was vortexed for 3 minutes and centrifuged at 4,500 rpm for 10 minutes at room temperature. 400 µL of the supernatant of each sample was transferred into a new centrifuge tube and dried overnight in a fume hood. The dried residue was fully dissolved in 400 µL of ethyl acetate before measurement.
[0414] The column used with the Agilent 6890 gas chromatograph was 30 m × 320 µm × 0.25 µm. Gases: nitrogen 60 mL/min, hydrogen 60 mL/min, air 450 mL/min. Injection volume: 1 µL, split injection mode, split ratio 10:1, injection port temperature 170°C. Temperature program: hold at 180°C for 1 min, increase to 250°C at a rate of 25°C/min, and hold for 7 min.
[0415] The absolute amount of each fatty acid was calculated as $\omega_i = \frac{A_i}{A_s} \times \frac{m_s}{m}$, where $\omega_i$ is the absolute amount of the ith fatty acid, $A_i$ is the peak area of the ith fatty acid component, $A_s$ is the peak area of the internal standard, $m_s$ is the mass of the internal standard, and $m$ is the dry weight of the sample.
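The internal-standard calculation in [0415] is easy to misapply when units differ, so a minimal sketch follows. It assumes a response factor of 1 between each fatty acid and the internal standard (implicit in the formula), with m_s and m in consistent units (e.g. µg and mg, giving µg/mg). The second function computes each fatty acid's share of the summed fatty acids, which is assumed here to be the relative quantity of [0416]. The function names and the peak areas in the test are hypothetical.

```python
from typing import Dict


def absolute_amounts(peak_areas: Dict[str, float], is_area: float,
                     is_mass: float, sample_dry_weight: float) -> Dict[str, float]:
    """omega_i = (A_i / A_s) * (m_s / m) for every fatty acid i."""
    if is_area <= 0 or sample_dry_weight <= 0:
        raise ValueError("internal-standard area and sample weight must be positive")
    factor = is_mass / (is_area * sample_dry_weight)  # shared by all components
    return {fa: area * factor for fa, area in peak_areas.items()}


def relative_percentages(amounts: Dict[str, float]) -> Dict[str, float]:
    """Each fatty acid as a percentage of the summed fatty acids (assumed form)."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total fatty acid amount must be positive")
    return {fa: 100.0 * value / total for fa, value in amounts.items()}


# Hypothetical run: m_s = 0.5 µg internal standard, m = 5.0 mg seed dry weight.
amounts = absolute_amounts({"C16:0": 2.0, "C18:1": 6.0}, is_area=4.0,
                           is_mass=0.5, sample_dry_weight=5.0)
print(amounts, relative_percentages(amounts))
```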
[0416] The relative amount of each fatty acid was calculated as $w_i(\%) = \frac{\omega_i}{\sum_j \omega_j} \times 100\%$.
viii. Transformation of soybean cotyledon nodes
[0417] Soybean cotyledon nodes were transformed and cultivated using the following protocol:
[0418] Preparation of Agrobacterium tumefaciens and soybean cotyledons. Agrobacterium tumefaciens containing the expression vector was taken out of -80°C storage, inoculated into 10 mL of LB liquid medium containing spectinomycin, and cultured overnight at 28°C and 160 rpm. Then 100 µL of this starter culture was transferred to 100 mL of fresh YEP liquid medium containing spectinomycin and cultured at 28°C with shaking at 200 rpm to OD600 = 0.8. The culture was centrifuged at 4000 rpm for 10 min at room temperature, the supernatant medium was discarded, and the bacteria were resuspended in 100 mL of resuspension solution containing 5% sucrose and 0.01% Silwet L-77 and left to stand for 1-3 h at room temperature. The bacteria were then resuspended in 100 mL of LCCM and incubated at 28°C and 200 rpm for 30 min before transformation. Soybean seeds were sterilized as follows. Full, undamaged seeds were placed in a petri dish, and the petri dish and a beaker were put into an airtight container in a fume hood. The lid of the petri dish was opened, sodium hypochlorite and hydrochloric acid (94:6) were added to the beaker, and the airtight container was quickly sealed. With the fume hood running, the seeds were sterilized in the sealed container for 8-12 hours and then aired in a clean bench for 30 minutes to remove chlorine from the seed surface and avoid damage to the seeds. An appropriate amount of sterilized water was added so that the seeds just completed imbibition, and the seeds were kept in the dark for 12-14 h.
[0419] Co-culture. Each seed was split into two halves along the hypocotyl with a razor blade, and 2-3 points at the cotyledonary node were lightly scratched with the blade to make wounds. The explants were put into the prepared Agrobacterium suspension and incubated at 160 rpm and 28°C for 30 min to facilitate infection, removed from the suspension with tweezers, placed on SCCM covered with filter paper, and incubated for 3-5 days at 25°C in the dark.
[0420] Induction of cluster buds. After 3-5 days of co-cultivation, when the hypocotyls had enlarged, the hypocotyls of the explants were cut with a blade, leaving about 2 mm. The trimmed explants were washed several times in sterilized water until the liquid was clear, to remove excess bacteria, blotted on sterile paper to absorb the remaining surface liquid, and inserted into SIM+ with tweezers. The plates were placed in a sterile tissue culture room set to 25°C with a 16 h/8 h light/dark cycle for about 14 days. The growth of the cluster buds was then observed: slow-growing cluster buds were taken out, re-wounded at the base, and inserted into fresh SIM+, while well-grown ones were transferred to SEM.
[0421] Elongation of cluster buds. Large cluster buds were cut, inserted into SEM, and cultured in a sterile tissue culture room for about 14 days. Cluster buds that had not elongated were taken out of the SEM, lightly scratched at the base to create a new wound, and inserted into fresh SEM for subculture. Each culture cycle lasted about 14 days, and the process was repeated.
[0422] Identification of positive elongated buds. When the buds were about 5 cm long and had about 3 leaves, one leaf was sampled and tested with a Bar test strip as described below to preliminarily identify positive seedlings.
[0423] Rooting of positive elongated buds. The positive buds were cut from the cluster buds, dipped in IBA hormone for 30 s, inserted into RM, and cultured in a sterile tissue culture room until they rooted.
[0424] Transplanting and cultivation of positive seedlings. The positive seedlings were taken out of the culture medium, and the roots were rinsed with clean water to remove residual medium. The seedlings were then transplanted into soil and grown in the greenhouse.
ix. Bar test strip detection of T1-generation transgenic soybean
[0425] Leaves of T1 transgenic soybean plants were placed in an EP tube, extraction buffer was added, and the leaves were ground with a small pestle. A Bar test strip was inserted into the EP tube in the direction indicated on the strip, and the number of bands was observed: two bands indicate that the soybean plant is transgenic, and one band indicates a non-transgenic plant.
x. PCR identification of T1-generation transgenic soybean
[0426] Leaf DNA of T1 transgenic soybean was extracted as described above. Transgenic soybeans were identified by PCR using the Glyma.06G303700 gene primers and Bar primers; the primer sequences are shown in Table 42 above.
xi. qRT-PCR identification of T1-generation transgenic soybean
[0427] Leaves from the soybean plants were placed in an RNase-free EP tube and frozen in liquid nitrogen. Total RNA was extracted and reverse transcribed into cDNA, and qRT-PCR was performed with the primers in Table 44 to analyze the expression of Glyma.06G303700.
Table 44. qRT-PCR primers for gene expression pattern analysis
Primer name | Sequence
Glyma.06G303700-q-F | AGTTGCACCGATTCAACAGGC (SEQ ID NO: 65)
Glyma.06G303700-q-R | CCATGCGATGTGGTTCCATCT (SEQ ID NO: 66)
GmActin4-q-F | GTGTCAGCCATACTGTCCCCATTT (SEQ ID NO: 69)
GmActin4-q-R | GTTTCAAGCTCTTGCTCGTAATCA (SEQ ID NO: 70)
xii. Determination of protein, oil and fatty acid content in transgenic soybean seeds
[0428] An InfraTec™ 1241 Grain Analyzer (FOSS Analytics) was used to determine the protein and oil content of soybean seeds. Each sample was measured 3-5 times, and the average value was used for phenotypic data analysis.
[0429] The fatty acid content of the seeds was determined by gas chromatography and calculated as described in section vii of this Example above.
xiii. Haplotype analysis
[0430] SNP data from 680 re-sequenced soybean accessions from Northeast China were collected and curated, and the protein and oil contents of these 680 accessions were determined by the FOSS grain analysis method for haplotype analysis. The sequence of Glyma.06G303700 (including its promoter) was obtained from the Phytozome website, and the resequencing (10× coverage) genome data of the 680 accessions were used as the population verification data for this experiment. The SNPs within the Glyma.06G303700 sequence (including the promoter) were extracted, formatted, and submitted to Haploview software to define haplotype blocks and obtain the haplotype classes within each block.
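The haplotype grouping above and the 5.0% frequency filter and one-way ANOVA applied to it in [0431] can be sketched as follows. This is an illustration under stated assumptions, not the Haploview or SPSS workflow: each accession's haplotype is assumed to be given as a tuple of alleles at the SNP sites of one block, frequency is counted among accessions with a measured phenotype, and only the F statistic is computed. For a p-value, `scipy.stats.f_oneway(*groups)` returns both F and p. All names and inputs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Haplotype = Tuple[str, ...]  # alleles at the SNP sites of one haplotype block


def group_by_haplotype(genotypes: Dict[str, Haplotype],
                       phenotypes: Dict[str, float],
                       min_freq: float = 0.05) -> Dict[Haplotype, List[float]]:
    """Collect phenotype values per haplotype; keep haplotypes with frequency > min_freq."""
    groups: Dict[Haplotype, List[float]] = defaultdict(list)
    for accession, hap in genotypes.items():
        if accession in phenotypes:  # skip accessions without a measured phenotype
            groups[hap].append(phenotypes[accession])
    total = sum(len(v) for v in groups.values())
    if total == 0:
        return {}
    return {hap: vals for hap, vals in groups.items() if len(vals) / total > min_freq}


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a classic one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A typical call would be `one_way_anova_f(list(group_by_haplotype(genotypes, protein).values()))`, with `genotypes` and `protein` built from the population data.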
[0431] Haplotypes carried by more than 5.0% of the population were taken as the excellent haplotypes, and one-way analysis of variance (ANOVA) in SPSS software was used to test for significant phenotypic differences among them.
Example 19. Identification of candidate genes by SNP analysis
[0432] Grain protein and oil contents of the plant population (SN14) were determined, and the samples were sorted by protein content. After sorting, 20 samples from each of the high- and low-protein extremes were selected, and high-quality DNA was extracted from them to construct high- and low-phenotype pools for bulked segregant analysis (BSA) sequencing. The SNP-index association algorithm was used to select candidate regions. With SN14 as the reference parent, 3 candidate segments were associated at the 95% confidence level. Genes within these segments that carry stop-loss or stop-gain mutations, non-synonymous mutations, or alternative splicing sites were selected as candidate genes, and a total of 5 genes were screened. The BSA mixed-pool sequencing results are from the master's thesis of Li Wei, Northeast Agricultural University (2016).
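In BSA-seq, the SNP-index at a site is commonly defined as the fraction of reads carrying the allele that differs from the reference parent (assumed here to be SN14), and Δ(SNP-index) as the difference between the two extreme pools, the quantity listed in Table 45. The minimal sketch below assumes that definition; the read counts in the test are hypothetical, and the windowing and 95% confidence thresholds used to call candidate regions are not reproduced.

```python
from typing import Tuple

ReadCounts = Tuple[int, int]  # (reads with the non-reference allele, total reads)


def snp_index(counts: ReadCounts) -> float:
    """SNP-index = reads carrying the non-reference allele / total reads at the site."""
    alt, total = counts
    if total <= 0 or not 0 <= alt <= total:
        raise ValueError("invalid read counts")
    return alt / total


def delta_snp_index(high_pool: ReadCounts, low_pool: ReadCounts) -> float:
    """Δ(SNP-index) = SNP-index(high-protein pool) - SNP-index(low-protein pool)."""
    return snp_index(high_pool) - snp_index(low_pool)


# A site where 30 of 40 high-pool reads and 10 of 40 low-pool reads differ from SN14:
print(delta_snp_index((30, 40), (10, 40)))
```

Values near +1 or -1 mark sites where the two pools carry opposite parental alleles; values near 0 indicate no association with the trait.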
Table 45. Identification of ΔSNP in putative candidate genes
Linkage | Gene | Position | Δ(SNP-index)
Example 20. Analysis of candidate genes tissue expression
[0433] Glyma.03G040200 is reported to have the highest expression in seeds, slight expression in stems, and no expression in other tissues, but the relative expression levels in seeds and stems are not more than 1. Glyma.03G036300 is not expressed in any organ. Glyma.06G297500 has very high expression levels in various tissues: root hairs and roots have the highest expression, followed by the apical meristem, with expression decreasing progressively in root nodules, stems, pods, leaves and flowers, and seeds have the lowest expression. Glyma.07G192400 is expressed in all tissues, with the highest expression in the apical meristem, followed by pods and seeds, and the lowest in roots. Glyma.06G303700 is not expressed in root nodules; it has the highest expression in the apical meristem, followed by pods and seeds, and then decreases in the order flowers, roots, stems, leaves and root hairs.
Example 21. Promoter element analysis of Glyma.06G303700
[0434] The 3000 bp upstream of the genomic sequence of Glyma.06G303700 was taken as the promoter sequence of the gene and submitted to the PlantCARE website. The predicted promoter elements were screened, integrated and visualized using the TBtools software. The results show that the promoter region of Glyma.06G303700 includes at least the following elements: (i) a 60K protein binding site, (ii) a cis-acting element involved in defense and stress responsiveness, (iii) common cis-acting elements of promoter and enhancer regions, (iv) a core promoter element, (v) an element required for maximal elicitor-mediated activation, (vi) a conserved DNA module array (CMA3), and (vii) light-responsive elements. The promoter sequence of Glyma.06G303700 contains a large number of TATA boxes (core promoter elements near the transcription start site), which play a role in regulating gene expression. The light-responsive elements of the promoter include MYB binding sites involved in light responsiveness and several conserved DNA modules involved in light responsiveness.
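Taking the 3000 bp upstream of a gene, as above, requires attention to strand: for a gene on the minus strand, the upstream region lies beyond the annotated gene end and must be reverse-complemented. The sketch below uses hypothetical helpers that assume 1-based inclusive gene coordinates (e.g. from a Phytozome GFF3 annotation) and a chromosome sequence already loaded as a string. The motif counter is a deliberately simplified stand-in for PlantCARE. It scans one exact motif on one strand, whereas PlantCARE recognizes many TATA-box variants.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N, either case)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def upstream_region(chrom_seq: str, gene_start: int, gene_end: int,
                    strand: str, length: int = 3000) -> str:
    """Return up to `length` bp upstream of a gene, written 5'->3' on the gene's strand.

    gene_start and gene_end are 1-based inclusive coordinates with start <= end.
    """
    if strand == "+":
        begin = max(0, gene_start - 1 - length)
        return chrom_seq[begin:gene_start - 1]
    if strand == "-":
        # Upstream of a minus-strand gene lies after its end on the forward strand.
        return reverse_complement(chrom_seq[gene_end:gene_end + length])
    raise ValueError("strand must be '+' or '-'")


def count_motif(seq: str, motif: str) -> int:
    """Count possibly overlapping exact occurrences of motif on the given strand."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)
```

Counting on the other strand is a matter of calling `count_motif(reverse_complement(promoter), motif)`.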
Example 22. Protein structure prediction of Glyma.06G303700
[0435] The amino acid sequences encoded by the gene in the parents SN14 and ZYD00006 were submitted to the SOPMA website for protein secondary structure prediction. In SN14, the predicted secondary structure contains 36.63% α-helix, 14.13% extended strand, 5.21% β-turn and 44.03% random coil; in ZYD00006, it contains 36.35% α-helix, 14.13% extended strand, 4.12% β-turn and 45.40% random coil. The single amino acid change reduces the proportions of α-helix and β-turn in ZYD00006 relative to SN14, resulting in a higher proportion of random coil.
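The percentages above are residue fractions of a per-residue prediction, and both profiles sum to 100% (36.63 + 14.13 + 5.21 + 44.03 and 36.35 + 14.13 + 4.12 + 45.40). Assuming the prediction is available as one state letter per residue in SOPMA's convention (h = alpha helix, e = extended strand, t = beta turn, c = random coil), the composition can be recomputed as below. Any other state letter is reported under its own code. The short input string is a toy example, not the SN14 or ZYD00006 prediction.

```python
from collections import Counter
from typing import Dict

# Assumed mapping of SOPMA one-letter states to names.
STATE_NAMES = {"h": "alpha helix", "e": "extended strand",
               "t": "beta turn", "c": "random coil"}


def structure_composition(states: str) -> Dict[str, float]:
    """Percentage of residues assigned to each predicted secondary-structure state."""
    states = states.strip().lower()
    if not states:
        raise ValueError("empty prediction")
    counts = Counter(states)
    return {STATE_NAMES.get(s, s): 100.0 * n / len(states) for s, n in counts.items()}


print(structure_composition("hhhheectcc"))
```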
[0436] The amino acid sequences encoded by the gene in the parents SN14 and ZYD00006 were submitted to the SWISS-MODEL website for protein tertiary structure prediction, and models with a QMEAN Z-score higher than -4.0 that covered the amino acid mutation site were selected. The predicted tertiary structures of the protein differ between SN14 and ZYD00006.
Example 23. Tissue-specific expression analysis of Glyma.06G303700
[0437] RNA was extracted from the organs (roots, stems, leaves, flowers, pods and seeds) of SN14, reverse transcribed into cDNA, and analyzed by qRT-PCR. Glyma.06G303700 was expressed in all tissues and organs, with very low expression in roots, relatively high expression in stems, leaves and flowers, higher expression in pods, and the highest expression in seeds, where the relative expression exceeded 5-fold.
[0438] To analyze the tissue-specific expression of Glyma.06G303700, RNA was extracted, cDNA was synthesized, and qRT-PCR was performed with gene-specific primers using the reaction mixture in Table 46. The reference gene was GmActin (GenBank No: AF049106, Table 44).
Table 46. Reaction solution preparation for qRT-PCR
Component Volume
2× ChamQ Universal SYBR qPCR Master Mix 10 μL
Primer F (10 μM) 0.4 μL
Primer R (10 μM) 0.4 μL
Template cDNA 1.5 μL
ddH2O Up to 20 μL
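The ddH2O volume in Table 46 follows from the other components. The sketch below computes it and scales a premix for several wells. Two details are assumptions, not part of the original protocol: the 10% pipetting overage and the choice to add template cDNA separately to each well.

```python
# Per-reaction volumes (uL) from Table 46; ddH2O brings each reaction to 20 uL.
REACTION_VOLUME_UL = 20.0
COMPONENTS_UL = {
    "2x ChamQ Universal SYBR qPCR Master Mix": 10.0,
    "Primer F (10 uM)": 0.4,
    "Primer R (10 uM)": 0.4,
    "Template cDNA": 1.5,
}


def water_per_reaction(total: float = REACTION_VOLUME_UL, components: dict = COMPONENTS_UL) -> float:
    """ddH2O needed to reach the final reaction volume."""
    water = total - sum(components.values())
    if water < 0:
        raise ValueError("component volumes exceed the reaction volume")
    return round(water, 2)


def premix(n_reactions: int, overage: float = 0.1) -> dict:
    """Volumes for a premix of n reactions plus an assumed overage; template is added per well."""
    factor = n_reactions * (1 + overage)
    mix = {k: round(v * factor, 2) for k, v in COMPONENTS_UL.items() if k != "Template cDNA"}
    mix["ddH2O"] = round(water_per_reaction() * factor, 2)
    return mix


if __name__ == "__main__":
    print(water_per_reaction())
    print(premix(10))
```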
Example 24. Subcellular localization of Glyma.06G303700
[0439] Tobacco cultivation. Tobacco planting soil was prepared by mixing flower nutrient soil with vermiculite at a ratio of 3:1. After germination, the seedlings were transferred to new small flowerpots, one plant per pot, and cultivated in an incubator (22°C, 16 h/8 h light/dark, 70 μmol·m⁻²·s⁻¹). They were watered once every 2 days to ensure adequate water.
[0440] Agrobacterium injection of tobacco leaves. Agrobacterium tumefaciens carrying pr35S-Glyma.06G303700, stored at -80°C, was thawed and inoculated into 10 mL of YEP liquid medium containing spectinomycin. The culture was grown on a shaker at 200 rpm and 28°C until it reached an OD600 of 0.8. The Agrobacterium cells were then harvested by centrifugation at 10,000 rpm at room temperature for 1 min.

Resuspension buffer was prepared (1 mL 20 mM MES pH 5.6 and 500 μL 1 M MgCl2, made up to 50 mL with sterile water) and used to wash the cells twice. The Agrobacterium pellet was then resuspended in 1 mL of resuspension buffer, and 2 μL of acetosyringone (dissolved in DMSO) was added to a final concentration of 0.04 g/L. The bacterial suspension was transferred to a large EP tube, adjusted to an OD600 of about 0.2 with resuspension buffer, and left to stand at room temperature for 1-3 h.

Healthy tobacco plants grown for 3 weeks were selected. Their leaves were pierced with a syringe needle and injected with the prepared Agrobacterium. The inoculated plants were then placed in an incubator (22°C, 16 h light/8 h dark, 70 μmol·m⁻²·s⁻¹) for 48 hours. The epidermis was then removed, and the subcellular localization of the target protein was observed under a confocal microscope. The green fluorescence of pr35S-Glyma.06G303700-GFP appeared in the cell membrane and nucleus, indicating that the protein localizes to both the nucleus and the cell membrane.
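Adjusting the suspension to OD600 ≈ 0.2 is a C1V1 = C2V2 dilution. This minimal sketch assumes that OD600 is proportional to cell density in this range. The measured OD and final volume in the example are hypothetical.

```python
def dilution_volumes(od_measured: float, od_target: float, final_volume_ml: float):
    """Return (culture_ml, buffer_ml) to reach od_target in final_volume_ml via C1*V1 = C2*V2.

    Assumes OD600 is linear in cell density over this range.
    """
    if od_target > od_measured:
        raise ValueError("target OD exceeds measured OD; concentrate the cells instead")
    culture_ml = od_target * final_volume_ml / od_measured
    return culture_ml, final_volume_ml - culture_ml


if __name__ == "__main__":
    # Hypothetical: suspension reads OD600 = 0.8, 10 mL needed at OD600 = 0.2
    print(dilution_volumes(0.8, 0.2, 10.0))
```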
Example 25. Cloning and Vector Construction of Glyma.06G303700
[0441] Total RNA extraction from soybean SN14 leaves. RNA was extracted from young, tender SN14 trifoliate leaves by the TRIzol method. Electrophoresis on a 2% agarose gel showed three bands (28S, 18S and 5S), indicating good RNA integrity. cDNA was obtained by reverse transcription and used for cloning of the Glyma.06G303700 gene.
[0442] Cloning of Glyma.06G303700. The CDS sequence of Glyma.06G303700 was obtained from the Phytozome database; the total length of the sequence is 2190 bp. This sequence was used as a template to design primers at both ends of the gene's CDS (with the stop codon removed). The primers were designed to carry the restriction sites (SpeI and BamHI) that flank the ccdB gene in the entry vector.

First, SN14 leaf cDNA was used as a template to amplify the CDS of Glyma.06G303700 with CDS primers. This product was then used as a template for PCR with the restriction-site primers, yielding Glyma.06G303700 flanked by restriction sites on both ends. The products were recovered with a gel recovery kit for subsequent experiments. In this way, the full-length CDS of Glyma.06G303700 (with the stop codon TGA removed) was cloned from the cDNA of soybean Suinong 14 (SN14) leaves, using the following primers.
Table 47. Cloning primers
Primer name Sequence
Glyma.06G303700-F ATAACTAGTATGTTCCAGCCGAACC (SEQ ID NO: 63)
Glyma.06G303700-R ATAGGATCCAGCAGGTTCACCAGA (SEQ ID NO: 64)
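The primer design in [0442] can be checked programmatically. Each cloning primer should carry one restriction site upstream of its gene-specific sequence: SpeI (ACTAGT) in the forward primer and BamHI (GGATCC) in the reverse primer. The helper names below are illustrative only.

```python
# Recognition sequences of the two enzymes named in [0442] and [0443]
SITES = {"SpeI": "ACTAGT", "BamHI": "GGATCC"}

# Cloning primers from Table 47, written 5'->3'
PRIMERS = {
    "Glyma.06G303700-F": "ATAACTAGTATGTTCCAGCCGAACC",  # SEQ ID NO: 63
    "Glyma.06G303700-R": "ATAGGATCCAGCAGGTTCACCAGA",   # SEQ ID NO: 64
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(primer: str) -> dict:
    """Map each enzyme whose recognition site occurs in the primer to its 0-based position."""
    return {enzyme: primer.find(site) for enzyme, site in SITES.items() if site in primer}


def gene_specific_part(primer: str) -> str:
    """Sequence 3' of the first recognition site, i.e. the part expected to anneal to the CDS."""
    hits = find_sites(primer)
    if not hits:
        return primer
    enzyme, pos = min(hits.items(), key=lambda item: item[1])
    return primer[pos + len(SITES[enzyme]):]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq), gene_specific_part(seq))
```

Run as-is, the forward primer's gene-specific part begins with ATG, as expected for a primer at the start of the CDS. The three extra 5' bases (ATA) ahead of each site are a common addition that helps the enzyme cut close to the end of a PCR product.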
[0443] Construction of the entry vector (Fu28-Glyma.06G303700). The Fu28 empty vector and the target gene were digested with restriction endonucleases (SpeI and BamHI) and ligated with Solution I ligase. The ligation product was transformed into competent E. coli DH5α and cultured on a plate containing chloramphenicol for about 16 hours. Single colonies were picked, and the activated bacterial cultures were checked by 1.5% agarose gel electrophoresis and by sequencing. The target band appeared at 2190 bp, and the sequencing results were consistent with the Glyma.06G303700 gene sequence.

In summary, the amplified Glyma.06G303700 fragments were gel-purified and cloned into the entry vector Fu28 by restriction digestion and ligation. The Fu28 vector fragment with the ccdB gene cut out was about 3200 bp, and the Glyma.06G303700 fragment was about 2200 bp. The ligation products were transformed into Escherichia coli. Bacterial clones comprising the cDNA sequence of Glyma.06G303700 were identified by PCR and verified by sequencing analysis using primers described below.
[0444] Construction of the expression vector (pr35S-Glyma.06G303700). The Fu28-Glyma.06G303700 and pr35S vector plasmids were extracted and recombined by the LR reaction. The products were transformed into competent E. coli DH5α and cultured on plates containing spectinomycin for about 16 h. Single colonies were picked, and PCR identification was performed on the activated bacterial cultures using the primers in 2-15, followed by 1.5% agarose gel electrophoresis and sequencing. The target band appeared at 2190 bp, and the sequencing results were completely consistent with the Glyma.06G303700 gene sequence.
[0445] Transfer of the expression vector into Agrobacterium tumefaciens EHA105. Competent EHA105 Agrobacterium cells were transformed with pr35S-Glyma.06G303700. The transformed cells were grown on a YEP plate containing both rifampicin and spectinomycin, and single colonies were selected. PCR detected a 2190 bp DNA fragment, confirming that the expression vector (pr35S-Glyma.06G303700) had been transferred into Agrobacterium tumefaciens EHA105.

Using Gateway technology, the Glyma.06G303700 gene fragment and related tags contained in the Fu28 entry vector were transferred to the expression vector pr35S (spectinomycin resistant) through the LR recombination reaction. The reaction system and reaction conditions are shown in Table 48.
Table 48. LR reaction system
Component Volume Reaction conditions
pr35S empty vector plasmid 2 μL 25°C
Fu28-Glyma.06G303700
LR Clonase Enzyme Mix
[0446] The resulting plasmid pr35S-Glyma.06G303700 was transformed into Escherichia coli, and positive clones were identified by PCR and sequencing analysis. The pr35S-Glyma.06G303700 plasmid was then extracted from a culture of a positive single clone and transformed into Agrobacterium tumefaciens EHA105. Positive clones were identified by PCR.
[0447] Transient expression and localization. Agrobacterium tumefaciens carrying the expression construct was injected into tobacco leaves for transformation. After 48 hours, the injected leaves were cut and the epidermis was removed. The epidermis was spread out in clean water on a glass slide and covered with a cover glass. The subcellular localization of the pr35S-Glyma.06G303700-GFP fusion protein was then observed under a confocal microscope. The green fluorescence of pr35S-Glyma.06G303700-GFP appeared on the cell membrane and in the nucleus, indicating that the Glyma.06G303700 protein localizes to both the nucleus and the cell membrane.
Example 26. Expressing Glyma.06G303700 in Arabidopsis
i. Selection of Arabidopsis mutants
[0448] Using the BLAST function on the Phytozome website, the Arabidopsis gene AT1G05230, which is homologous to Glyma.06G303700, was selected and its conserved domains were identified. The Arabidopsis homolog AT1G05230 contains three conserved domains: START_ArGLABRA2_like, Homeobox, and MreC superfamily. Like Glyma.06G303700, AT1G05230 has the START_ArGLABRA2_like domain, as shown in Table 49.
Table 49. AT1G05230 gene function annotation
Conserved domains: START_ArGLABRA2_like; Homeobox; MreC superfamily
Annotation:
PTHR24326: Family not named
PTHR24326:SF239: HOMEOBOX-LEUCINE ZIPPER PROTEIN HDG2
PF00046: Homeobox domain
PF01852: START domain
KOG0483: Transcription factor HEX, contains HOX and HALZ domains
KOG0484: Transcription factor PHOX2/ARIX, contains HOX domain
GO:0003677: DNA binding
GO:0003700: sequence-specific DNA binding transcription factor activity
GO:0006355: regulation of transcription, DNA-templated
GO:0008289: interacting selectively and non-covalently with a lipid
GO:0010090; GO:0005634; GO:0010103; GO:0048497
[0449] A search for AT1G05230 mutants on the TAIR10 website identified SALK_127828.4700.x (SEQ ID NO: 60) as the Arabidopsis mutant corresponding to Glyma.06G303700. SALK_127828.4700.x is an Arabidopsis mutant in the Col-0 background that carries a 186 bp insertion in the coding region, produced by T-DNA insertion mutagenesis.
ii. PCR identification of Arabidopsis mutants
[0450] The Arabidopsis mutant SALK_127828.4700.x and wild-type Col-0 were planted, and DNA was extracted from rosette leaves. PCR was performed with the LP+RP and LP+BP primer combinations shown in Table 41. The LP+RP PCR product was 1170 bp, and the LP+BP product was 578-878 bp. The results of 1.5% agarose gel electrophoresis indicated that the mutant was homozygous.
iii. Basta screening of transgenic Arabidopsis
[0451] T0 seeds from the transformed Arabidopsis mutants were planted. After the seedlings had grown two leaves, Basta reagent (a reagent comprising Basta herbicide) was sprayed once every other day. After three sprayings, many Arabidopsis plants had turned yellow and stopped growing, and only a few continued to grow. These surviving plants were preliminarily identified as transgenic Arabidopsis complementation and overexpression plants.
iv. Bar test strip detection and PCR identification of transgenic Arabidopsis
[0452] Transgenic Arabidopsis plants of the T1, T2 and T3 generations were planted. Leaf extract was prepared as described above. A Bar test strip was inserted into the extract in the direction specified in the manufacturer's instructions. The results displayed on the Bar test strip indicated that the Arabidopsis plants from which the leaf extract was obtained were genetically modified.
[0453] Transgenic Arabidopsis plants of the T1, T2 and T3 generations were planted. DNA was extracted from rosette leaves of Arabidopsis thaliana, and PCR was performed for the target gene Glyma.06G303700 and for the Bar gene. Electrophoresis on a 1.5% agarose gel showed bands at 516 bp (Bar gene) and 2190 bp (Glyma.06G303700 gene), indicating that transgenic Arabidopsis plants were obtained.
v. qRT-PCR identification of T3 transgenic Arabidopsis
[0454] The following lines were planted under the same conditions: transgenic Arabidopsis (overexpression plants pr35S:Glyma.06G303700 and mutant complementation plants pr35S:Glyma.06G303700/SALK_127828.4700.x), Col-0, and the mutant SALK_127828.4700.x. Total RNA was extracted from rosette leaves and reverse transcribed into cDNA, and expression of the target gene Glyma.06G303700 was checked by qRT-PCR. Glyma.06G303700 was expressed in the mutant complementation plants and the overexpression plants. Expression was not detected in WT or SALK_127828.4700.x mutant samples. The expression level of the gene was higher in overexpression plants than in mutant complementation plants. See Table 50.
Table 50. Relative expression levels of various transgenes in the transgenic Arabidopsis plants
vi. Determination of fatty acids and total nitrogen in T3 transgenic Arabidopsis
[0455] The following lines were planted under the same conditions: transgenic Arabidopsis thaliana (overexpression plants pr35S:Glyma.06G303700 and mutant complementation plants pr35S:Glyma.06G303700/SALK_127828.4700.x), Col-0, and the mutant SALK_127828.4700.x. Mature pods of the T3-generation transgenic plants were collected, and the seeds were obtained and dried. The fatty acid composition of the seeds was determined by gas chromatography, and their total nitrogen content by the Kjeldahl method.

Compared with wild-type materials, mutant plants had lower contents of fatty acid components, with oleic acid, linoleic acid and eicosenoic acid significantly lower. The total nitrogen content of mutant plants was also significantly lower than that of wild-type plants.

In the T3 transgenic seeds, the fatty acid content of the mutant complementation plants was significantly increased but still lower than that of the control plants; linoleic acid in particular was significantly increased. In the overexpression plants, the contents of fatty acid components were higher than in the wild type: palmitic acid was extremely significantly increased, and oleic acid and eicosenoic acid were significantly increased (Table 51).

For protein, the total nitrogen content of mutant complementation plants increased significantly and differed significantly from that of the control materials. The total nitrogen content of overexpression plants was significantly higher than that of wild-type plants. These results show that, in Arabidopsis seed protein and oil accumulation, Glyma.06G303700 can promote fatty acid content to a certain extent, and its effect on protein content is more pronounced.
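The Kjeldahl method in [0455] reports total nitrogen. The standard titration calculation is sketched below, assuming that ammonia is trapped and titrated with a monoprotic acid such as HCl. The titration volumes, acid molarity and sample mass are assumptions for illustration. So is the optional protein conversion factor of 6.25, since the source reports nitrogen rather than converted protein.

```python
N_MOLAR_MASS = 14.007  # g/mol, equivalently mg/mmol


def kjeldahl_nitrogen_percent(v_sample_ml: float, v_blank_ml: float,
                              acid_molarity: float, sample_mass_g: float) -> float:
    """Total nitrogen (% w/w) from the titration of distilled ammonia.

    (V_sample - V_blank) [mL] x molarity [mol/L] gives mmol N (monoprotic acid assumed).
    Multiplying by 14.007 mg/mmol gives mg N, which is divided by the sample mass in mg and x 100.
    """
    mmol_n = (v_sample_ml - v_blank_ml) * acid_molarity
    mg_n = mmol_n * N_MOLAR_MASS
    return 100 * mg_n / (sample_mass_g * 1000)


def crude_protein_percent(nitrogen_percent: float, factor: float = 6.25) -> float:
    """Optional nitrogen-to-protein conversion; 6.25 is a generic factor (assumption)."""
    return nitrogen_percent * factor


if __name__ == "__main__":
    n_pct = kjeldahl_nitrogen_percent(5.2, 0.2, 0.1, 0.5)  # hypothetical titration data
    print(round(n_pct, 4), round(crude_protein_percent(n_pct), 4))
```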
Table 51. Analysis of seed fatty acid content/profile and protein content in the Arabidopsis mutant and in transgenic Arabidopsis expressing Glyma.START (Glyma.06G303700).
Example 27. Expressing Glyma.06G303700 in soybean
i. Bar test strip detection and PCR identification of T1 transgenic soybean
[0456] The T1 genetically modified soybeans were planted, and their leaves were crushed and tested with the Bar test strip as described above. Two horizontal lines appeared on the Bar test strip, indicating that the tested plants were genetically modified soybean plants.
[0457] DNA was extracted from the leaves of T1-generation transgenic soybean, and PCR was performed with primers for the target gene Glyma.06G303700 and for Bar. Electrophoresis on a 1.5% agarose gel showed bands at 516 bp (Bar) and 2190 bp (Glyma.06G303700), indicating that the tested plants were transgenic soybean plants.
ii. qRT-PCR identification of T1 transgenic soybeans
[0458] The transgenic soybean (overexpression plants 35S:Glyma.06G303700) and the control plant DN50 were planted under the same conditions. Total RNA was extracted from young leaves and reverse transcribed into cDNA, and the expression level of Glyma.06G303700 was tested by qRT-PCR. Expression of Glyma.06G303700 was higher in the overexpression plants than in the control plants, indicating that Glyma.06G303700 was successfully transformed into soybean plants. See Table 52.
Table 52. Expression level of Glyma.START (Glyma.06G303700) in transgenic soybean plants.
Note: the numbers represent relative expression levels.
iii. Determination of protein and fatty acids in T1 transgenic soybean
[0459] The transgenic soybean (overexpression plants 35S:Glyma.06G303700) and the control plant DN50 were planted under the same conditions. Their mature seeds were harvested, and some of the seeds were dried for phenotyping. Seed protein and oil contents were determined by gas chromatography analysis, as was the fatty acid composition of the seeds. The protein, oil and fatty acid contents of the overexpression plants were significantly higher than those of the control plants, indicating that Glyma.06G303700 improved quality traits (protein and oil content). See Table 53.
Table 53. Seed protein content, fatty acid content and fatty acid profile in transgenic soybean expressing Glyma.START (Glyma.06G303700).
Note: the numbers represent the percentage of the seed by weight for each component.
Example 28. Haplotype analysis of Glyma.06G303700 in soybean
i. Materials for haplotype analysis
[0460] Haplotype analysis was performed on a resource population of 680 soybean accessions from Northeast China. Phenotypic analysis showed that protein and oil content varied across this population. In 2019, protein content in the population ranged from 37.09% to 52.94%, with an average of 42.69%. Oil content ranged from 14.45% to 23.04%, with an average of 20.74% (Table 54). This pattern is consistent with the variation expected of phenotypic traits, so the population can be used for haplotype analysis of candidate genes. In 2018, the research team conducted whole-genome resequencing of this resource population, and those data were used here for gene haplotype analysis.
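Table 54 summarizes each trait by its range, mean, coefficient of variation, skewness and kurtosis. The sketch below computes these from a list of phenotype values, using moment-based skewness and excess kurtosis. This is an assumption: statistical packages such as SPSS may apply small-sample bias corrections. Those corrections change the values slightly for small n but little for 680 accessions.

```python
import statistics


def describe(values):
    """Min, max, mean, CV (%), moment-based skewness and excess kurtosis of a trait."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)                      # sample SD (n - 1 denominator)
    m2 = sum((x - mean) ** 2 for x in values) / n      # central moments
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "cv_percent": 100 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
    }


if __name__ == "__main__":
    print(describe([1, 2, 3, 4, 5]))   # toy data, not the soybean phenotypes
```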
Table 54. Protein and oil content in soybean resources in northeast China
Phenotype Year Min Max Mean CV (%) Skewness Kurtosis
Protein content 2019 37.09% 52.94% 42.69% 4.73 0.50 1.33
Oil content 2019 14.45% 23.04% 20.74% 4.98 -1.56 3.28
ii. Division of the Glyma.06G303700 haplotype blocks
[0461] The whole genomes of the Northeast China soybean resource population were resequenced, and low-quality SNPs were filtered out. Twenty SNP variation sites in Glyma.06G303700 and its promoter region met the research requirements (Table 3). Using HaploView 4.2 software, the SNP sites were divided into three blocks, with strong linkage disequilibrium among the SNPs within each block. Block 1 contains 5 SNPs, of which 1 is located in the CDS. Block 2 contains 12 SNPs, of which 2 are located in the CDS. Block 3 contains 2 SNPs, neither of which is in the CDS.
iii. Excellent haplotype analysis and phenotypic correlation analysis
[0462] Haplotypes carried by more than 5.0% of the population (more than 34 of the 680 accessions) are called excellent haplotypes. Block 1 has 3 excellent haplotypes, Hap_1, Hap_2 and Hap_3, which account for 71.07%, 12.60% and 7.31% of all haplotypes, respectively. The protein and oil phenotypes of the accessions carrying each excellent haplotype were compared with the multiple comparison function of SPSS software. The pairwise comparisons were as follows:
- Hap_1 vs Hap_2: extremely significant difference in protein content (P < 0.01) and significant difference in oil content (P < 0.05).
- Hap_1 vs Hap_3: extremely significant differences in both protein content (P < 0.01) and oil content (P < 0.01).
- Hap_2 vs Hap_3: no significant difference in protein content, but an extremely significant difference in oil content (P < 0.01).

Hap_1 showed a low-protein phenotype, while Hap_2 and Hap_3 showed a high-protein phenotype. For oil content, Hap_1 and Hap_2 showed a high-oil phenotype, and Hap_3 showed a low-oil phenotype. The base variation (C:T) in the exon region occurs at 1890 bp of the gene. Hap_2 carries the SNP variant that differs from the reference genome. Its protein content differed extremely significantly from that of Hap_1 (P < 0.01). Its oil content differed significantly from Hap_1 (P < 0.05) and extremely significantly from Hap_3 (P < 0.01). See FIG. 6 and Table 55.
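The 5.0% threshold for excellent haplotypes can be applied by counting how many accessions carry each haplotype. In this sketch, haplotypes are represented as hypothetical allele strings, one per accession. Because the comparison is strictly greater than 5%, a haplotype must be carried by at least 35 of 680 accessions.

```python
from collections import Counter


def excellent_haplotypes(haplotypes, min_fraction: float = 0.05) -> dict:
    """Return {haplotype: (count, percent)} for haplotypes carried by more than
    min_fraction of all accessions, most frequent first."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return {h: (c, round(100 * c / total, 2))
            for h, c in counts.most_common()
            if c / total > min_fraction}


if __name__ == "__main__":
    # Hypothetical allele strings for one block across 680 accessions
    toy = ["CTG"] * 600 + ["TTG"] * 46 + ["CAG"] * 34
    print(excellent_haplotypes(toy))
```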
Table 55. Significance analysis of phenotypes among excellent haplotypes in block 1
Phenotype Haplotype Haplotype P value Significance
Protein content Hap_1 Hap_2 0.000 Very significant
Protein content Hap_1 Hap_3 0.000 Very significant
Protein content Hap_2 Hap_3 0.236 No significance
Oil content Hap_1 Hap_2 0.021 Significant
Oil content Hap_1 Hap_3 0.000 Very significant
Oil content Hap_2 Hap_3 0.001 Very significant
[0463] Block 2 has two excellent haplotypes, Hap_4 and Hap_5, which account for 58.01% and 20.68% of the haplotypes, respectively. Multiple comparison in SPSS found no significant difference in protein content between Hap_4 and Hap_5, but a significant difference in oil content (P < 0.05). Hap_4 showed a low-oil phenotype, while Hap_5 showed a high-oil phenotype. The base variations (C:G) and (G:A) in the exon region occur at 6941 bp and 6977 bp of the gene, respectively. Hap_5 carries SNP variants that differ from the reference genome, and its oil phenotype differed significantly from that of Hap_4 (P < 0.05). See FIG. 7 and Table 56.
Table 56. Significance analysis of phenotypes among excellent haplotypes in block 2
Phenotype Haplotype Haplotype P value Significance
Protein content Hap_4 Hap_5 0.274 No significance
Oil content Hap_4 Hap_5 0.028 Significant
[0464] Block 3 has two excellent haplotypes, Hap_6 and Hap_7, which account for 64.23% and 35.93% of the haplotypes, respectively. Multiple comparison in SPSS found a significant difference in protein content between Hap_6 and Hap_7 (P < 0.05) and an extremely significant difference in oil content (P < 0.01). The protein content of Hap_6 was higher than that of Hap_7, so Hap_6 has a high-protein phenotype and Hap_7 a low-protein phenotype. The oil content of Hap_6 was lower than that of Hap_7, so Hap_6 shows a low-oil phenotype and Hap_7 a high-oil phenotype. See FIG. 8A and 8B and Table 57.
Table 57. Significance analysis of phenotypes among excellent haplotypes in block 3
Phenotype Haplotype Haplotype P value Significance
Protein content Hap_6 Hap_7 0.030 Significant
Oil content Hap_6 Hap_7 0.002 Very significant
Table 58. Haplotypes associated with increased protein content and oil content
Haplotype Increased protein content Increased oil content
Hap_1 No Yes
Hap_2 Yes Yes
Hap_3 Yes No
Hap_4 No No
Hap_5 No Yes
Hap_6 Yes No
Hap_7 No Yes
[0465] Candidate genes were obtained by bulked segregant analysis (BSA) pooled sequencing.
- Glyma.03G040200 has an OPT domain, is expressed at low levels in seeds, and shows no amino acid sequence difference between the parents.
- Glyma.03G036300 has a domain with a function related to DNA repair, and no expression of it was detected in any of the tissues examined.
- Glyma.07G192400 has no recognizable domains and is highly expressed in seeds.
- Glyma.06G303700 has structural domains with a function related to lipid transfer and is highly expressed in seeds.
- Glyma.06G297500 has no recognizable domains and is expressed at low levels in seeds.

[0466] Glyma.06G303700 has the START_ArGLABRA2_like domain, which has a function related to lipid transfer. Tissue-specific expression results indicate that the gene is highly expressed in seeds. This may relate to soybean quality and to the regulation of the synthesis and metabolism of seed storage compounds.
[0467] Glyma.06G303700 is expressed in all tissues and organs, with the highest expression level in seeds. During soybean seed development, its expression pattern partly resembles those of previously reported soybean seed protein and oil related genes, showing a low-high-low trend. These genes are GmWRI1a, GmWRI1b, GmLEC1a, GmLEC1b, GmFUSa, GmABI3, GmABI5 and GmDREBE.
[0468] Arabidopsis mutants of the gene highly homologous to Glyma.06G303700 were purchased from the ABRC website (abrc.osu.edu). Homozygous mutant seeds were screened, and the fatty acid content and total nitrogen content of the seeds were determined. Both were significantly lower in the Arabidopsis mutant seeds than in control plants. Both increased significantly in the mutant complementation plants, and both also increased in the overexpression plants. In summary, Glyma.06G303700 has important potential for improving soybean quality and has a regulatory effect on soybean seed protein and oil content.
[0469] A total of 680 soybean accessions from Northeast China were resequenced, and haplotype analysis of the gene Glyma.06G303700 (including the promoter) was performed. The gene carried 20 SNPs across these accessions, of which 3 were located in the exon region. The base variations at 1890 bp, 6941 bp and 6977 bp are (C:T), (C:G) and (G:A), respectively. These sites may be closely related to the accumulation of protein and oil. Based on linkage disequilibrium, the sites were divided into three blocks.
- Block 1 has three excellent haplotypes: Hap_1 is a low-protein, high-oil phenotype; Hap_2 is a high-protein, high-oil phenotype; and Hap_3 is a high-protein, low-oil phenotype.
- Block 2 has two excellent haplotypes: Hap_4 is a low-oil phenotype and Hap_5 is a high-oil phenotype.
- Block 3 has two excellent haplotypes: Hap_6 is a high-protein, low-oil phenotype and Hap_7 is a low-protein, high-oil phenotype.

Example 29. Stacking genes
Example 29.1. Soybean transformation
[0470] Construct 26627 was built. In this construct, the GmDES1 gene (SEQ ID NO: 74) is constitutively expressed from the cauliflower mosaic virus (CaMV) 35S promoter (SEQ ID NO: 23). The GmSTART gene (SEQ ID NO: 1) is constitutively expressed from the Medicago truncatula glyceraldehyde-3-phosphate dehydrogenase C subunit 1 promoter (SEQ ID NO: 143). The construct also includes an acetolactate synthase gene from N. tabacum, which confers sulfonylurea resistance for selection of positive transformants. The 26627 binary vector was used to transform soybean plants, and the resulting plants were characterized. Positive transformants were identified and retained; null segregants were also retained to determine the effects of the 26627 construct on soybean composition.
Example 29.2. Greenhouse trial
[0471] A pairwise comparison trial was designed to identify differences between transgenic (GM) and null segregant plants in a greenhouse setting. Seed composition data were collected at the T2 seed stage and analyzed by paired t-test. Seed protein was measured with an elemental analyzer, seed oil was extracted with diethyl ether, and the lipid profile was determined by GC-FID.
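GC-FID lipid profiles are usually reported as each fatty acid methyl ester's share of the total peak area. This minimal sketch assumes equal detector response factors for all fatty acids and uses hypothetical peak areas. The source does not say whether response-factor corrections were applied.

```python
def fatty_acid_profile(peak_areas: dict) -> dict:
    """Relative fatty acid composition (% of total FAME peak area), rounded to 0.01.

    Assumes equal FID response factors for all fatty acid methyl esters.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {fa: round(100 * area / total, 2) for fa, area in peak_areas.items()}


if __name__ == "__main__":
    # Hypothetical integrated peak areas
    areas = {"C16:0": 110, "C18:0": 40, "C18:1": 220, "C18:2": 530, "C18:3": 100}
    print(fatty_acid_profile(areas))
```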
[0472] For the greenhouse trial, T1 transgenic soybean seeds were germinated and sampled for TaqMan analysis to identify GM homozygous and null plants from the same event. Twenty seedlings of uniform growth (10 GM and 10 null) were selected and transplanted into soil. One GM and one null plant were placed side by side to form 10 pairs within a 1.2 × 0.7 m block. Ten single-copy events from the same construct were selected according to genotype and expression data. At the R6 stage, leaf and seed samples for gene expression were taken from 3 individual GM plants and 3 individual null plants per event. At the R8 stage, single plants were harvested and threshed to collect seeds. The seeds were air dried to ~12% water content and delivered to the laboratory for protein and amino acid analysis.
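The paired design (one GM and one null plant side by side) is analyzed with a paired t-test, as stated in [0471]. The sketch below computes the t statistic on within-pair differences and the relative change of the GM mean versus the null mean. The input values in the example are hypothetical. An exact P value would need a t-distribution function (for example from SciPy), which is left out to keep the example dependency-free.

```python
import math
import statistics


def paired_t(gm, null):
    """Paired t statistic and degrees of freedom for matched GM/null pairs."""
    if len(gm) != len(null) or len(gm) < 2:
        raise ValueError("need at least two complete pairs")
    diffs = [g - n for g, n in zip(gm, null)]
    sd_d = statistics.stdev(diffs)
    if sd_d == 0:
        raise ValueError("all pair differences are identical; t is undefined")
    t = statistics.fmean(diffs) / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def percent_change(gm, null):
    """Relative change of the GM mean versus the null mean, in %."""
    null_mean = statistics.fmean(null)
    return 100 * (statistics.fmean(gm) - null_mean) / null_mean


if __name__ == "__main__":
    gm_oil = [3, 4, 5]     # hypothetical values
    null_oil = [2, 2, 2]
    print(paired_t(gm_oil, null_oil), percent_change(gm_oil, null_oil))
```

For 10 pairs (df = 9), |t| > 2.262 corresponds to P < 0.05, two-sided.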
Example 29.3. GmDES1-GmSTART co-expression reduces oil content in soy seed
[0473] Among the 7 soybean events transformed with the 26627 vector that were tested, 3 events accumulated significantly less oil than null segregant plants, with changes ranging from -3.60% to -7.97% (Table 59). The protein content of GM plants was similar to that of the corresponding null plants.
Table 59. T2 seed composition change in GmDES1-GmSTART co-expression transgenic and null segregant plants
Example 29.4. GmDES1-GmSTART co-expression modifies seed lipid profile
[0474] Soybean plants transformed with the 26627 vector accumulated more palmitoleic acid than null segregants, with increases ranging from 12.53% to 18.78% (Table 60A and Table 60B). The oleic acid content of transgenic seeds also increased relative to the null segregants, while linoleic acid and linolenic acid were significantly reduced. Myristic acid, stearic acid, palmitic acid and eicosadienoic acid were also all reduced in transgenic plants relative to the null segregants. This is consistent with the reduced total oil content in seeds of transgenic plants relative to the null segregants.
Table 60A. Lipid profile change of T2 seed of GmDES1-GmSTART co-expression transgenic and null segregant plants
Table 60B. Lipid profile change of T2 seed of GmDES1-GmSTART co-expression transgenic and null segregant plants
Example 30. Sequences of the disclosure
[0475] A listing of the sequences by SEQ ID NO is provided in Table 61.
Table 61. Annotation of sequences in the sequence listing
SEQ ID NO: Description _
1 Glyma. START (Glyma.06G303700) genomic sequence(in W82 and SN14)
2 Glyma.START (Glyma.06G303700) CDS (in W82 and SN14)
3 Glyma.START (Glyma.06G303700) protein (in W82 and SN14)
4 Glyma.START (Glyma.06G303700) CDS (in ZYD)
5 Glyma.START (Glyma.06G303700) protein (in ZYD)
6 Glyma.03 G036300 genomic sequence(inW82 and ZYD)
7 Glyma.03 G036300 CDS (in W82 and ZYD)
8 Glyma.03 G036300 protein (in W82 and ZYD)
9 Glyma.03 G036300 protein (in SN14)
10 Glyma.03 G040200 genomic sequence
11 Glyma.03 G040200 CDS
12 Glyma.03 G040200 protein
13 Glyma.06G297500 genomic sequence
14 Glyma.06G297500 CDS
15 Glyma.06G297500 protein Glyma.07G192400 genomic sequence (in SN14 andW82)
Glyma.07G192400 CDS (in SN14 and W82)
Glyma.07Gl 92400 protein (in SN14 andW82)
Glyma.07Gl 92400 protein (in ZYD)
AT1G05230 genomic sequence
AT1G05230 CDS
AT1G05230 protein pr35S
LOC_Os04g48070 protein
LOC_Os04g53540 protein
LOC_Os08g04190 protein
LOC_Os08g08820 protein
LOC_Os08gl 9590 protein
LOC Os 10g42490 protein
Zm00001d000247 protein
Zm00001d002234 protein
Zm00001d004230 protein
Zm00001d024701 protein
Zm00001d026351 protein
Zm00001d049443 protein
Zm00001d052133 protein
Glyma. l2Gl 00100 protein
Glyma.13 G308200 protein
Glyma.12G194400 protein
Glyma.13 G357100 protein
AT1G73360 protein
ATI G17920 protein
ATI G34650 protein
AT4G21750 protein
AT4G04890 protein
AT4G00730 protein
AT4G17710 protein
AT2G32370 protein
Medtr2g088470 protein
Medtr2g030570 protein
Medtr2g030600 protein
Medtr4g047800 protein
Os08g04190 protein
0s08g08820 protein
Os04g53540 protein
0s04g48070 protein
Phvul.011G104700 protein
Phvul.005Gl 14600 protein
Phvul.005G168900 protein
SALK_127828.47.00.x genomic sequence
Bar-F
Bar-R
Glyma.06G303700-F Glyma.06G303700-R
Glyma.06G303700-q-F
Glyma.06G303700-q-R
AtACTIN2-q-F
AtACTIN2-q-R
GmActin4-q-F
GmActin4-q-R
SALK-LP-START
SALK-RP- START
SALK-BP- START
GmDESl(Glyma.20G092400) genomic sequence
GmDESl(Glyma.20G092400) CDS
GmDESl(Glyma.20G092400) protein
Glyma.20G092000 genomic sequence
Glyma.20G092000 CDS
Glyma.20G092000 protein
Glyma.20G094900 genomic sequence
Glyma.20G094900 CDS
Glyma.20G094900 protein
Glyma.20G092100 genomic sequence
Glyma.20G092100 CDS
Glyma.20G092100 Protein
AT5G26600 genomic sequence
AT5G26600 CDS
AT5G26600 Protein
SALK-LP-DES1
SALK-RP-DES1
SALK-BP-DES1
SALK_021984C genomic sequence pSOYl
Glyma.20G092400-cF
Glyma.20G092400-cF
Glyma.20G092000-qF
Glyma.20G092000-qR
Glyma.20G092100-qF
Glyma.20G092100-qR
Glyma.20G092400-qF
Glyma.20G092400-qR
Glyma.20G094900-qF
Glyma.20G094900-qR
Glyma.20G092400-zF Arabidopsis thaliana mutant DNA detection
Glyma.20G092400-zR Arabidopsis thaliana mutant DNA detection
SALK_021984C-F Arabidopsis thaliana mutant mRNA detection
SALK_021984C-R Arabidopsis thaliana mutant mRNA detection
108 At18SrRNA-F
109 At18SrRNA-R
110 LOC_Os01g18640 genomic sequence
111 LOC_Os01g18640 CDS
112 LOC_Os01g18640 protein
113 LOC_Os01g18660 genomic sequence
114 LOC_Os01g18660 CDS
115 LOC_Os01g18660 protein
116 Zm00001d008187 genomic sequence
117 Zm00001d008187 CDS
118 Zm00001d008187 protein
119 Zm00001d040555 genomic sequence
120 Zm00001d040555 CDS
121 Zm00001d040555 protein
122 AT3G26115 (DCD2) genomic sequence
123 AT3G26115 (DCD2) CDS
124 AT3G26115 (DCD2) protein
125 AT1G48420 (DCD1) genomic sequence
126 AT1G48420 (DCD1) CDS
127 AT1G48420 (DCD1) protein
128 AT5G28030 (DES1) genomic sequence
129 AT5G28030 (DES1) CDS
130 AT5G28030 (DES1) protein
131 At5g65720 (NFS1) genomic sequence
132 At5g65720 (NFS1) CDS
133 At5g65720 (NFS1) protein
134 At1g08490 (NFS2) genomic sequence
135 At1g08490 (NFS2) CDS
136 At1g08490 (NFS2) protein
137 AT3G62130 (LCD) genomic sequence
138 AT3G62130 (LCD) CDS
139 AT3G62130 (LCD) protein
140 Reference haplotype
141 Hap_4
142 Hap_5
143 Medicago truncatula glyceraldehyde-3-phosphate dehydrogenase C subunit 1 promoter
Table 62 Glyma.06G303700 related sequences
SEQ ID Description
3 Glyma. START (Glyma.06G303700) protein (in W82 and SN14)
5 Glyma. START (Glyma.06G303700) protein (in ZYD)
8 Glyma.03G036300 protein (in W82 and ZYD)
9 Glyma.03G036300 protein (in SN14)
12 Glyma.03G040200 protein
15 Glyma.06G297500 protein
Glyma.07G192400 protein (in SN14 and W82)
Glyma.07G192400 protein (in ZYD)
AT1G05230 protein
LOC_Os04g48070 protein
LOC_Os04g53540 protein
LOC_Os08g04190 protein
LOC_Os08g08820 protein
LOC_Os08g19590 protein
LOC_Os10g42490 protein
Zm00001d000247 protein
Zm00001d002234 protein
Zm00001d004230 protein
Zm00001d024701 protein
Zm00001d026351 protein
Zm00001d049443 protein
Zm00001d052133 protein
Glyma.12G100100 protein
Glyma.13G308200 protein
Glyma.12G194400 protein
Glyma.13G357100 protein
AT1G73360 protein
AT1G17920 protein
AT1G34650 protein
AT4G21750 protein
AT4G04890 protein
AT4G00730 protein
AT4G17710 protein
AT2G32370 protein
Medtr2g088470 protein
Medtr2g030570 protein
Medtr2g030600 protein
Medtr4g047800 protein
Os08g04190 protein
Os08g08820 protein
Os04g53540 protein
Os04g48070 protein
Phvul.011G104700 protein
Phvul.005G114600 protein
Phvul.005G168900 protein
GmDES1 (Glyma.20G092400) protein
Glyma.20G092000 protein
Glyma.20G094900 protein
Glyma.20G092100 protein
AT5G26600 protein
LOC_Os01g18640 protein
LOC_Os01g18660 protein
Zm00001d008187 protein
Zm00001d040555 protein
124 AT3G26115 (DCD2) protein
127 AT1G48420 (DCD1) protein
130 AT5G28030 (DES1) protein
133 At5g65720 (NFS1) protein
136 At1g08490 (NFS2) protein
139 AT3G62130 (LCD) protein
[0476] All patents, patent publications, patent applications, journal articles, books, technical references, and the like discussed in the instant disclosure are incorporated herein by reference in their entirety for all purposes.
[0477] It can be appreciated that, in certain aspects of the disclosure, a single component may be replaced by multiple components, and multiple components may be replaced by a single component, to provide an element or structure or to perform a given function or functions. Except where such substitution would not be operative to practice certain embodiments of the disclosure, such substitution is considered within the scope of the disclosure.
[0478] The examples presented herein are intended to illustrate potential and specific implementations of the disclosure. It can be appreciated that the examples are intended primarily for purposes of illustration of the disclosure for those skilled in the art. There may be variations to these diagrams or the operations described herein without departing from the spirit of the disclosure. For instance, in certain cases, method steps or operations may be performed or executed in differing order, or operations may be added, deleted or modified.
[0479] All numerical designations, e.g., pH, temperature, time, concentration, and molecular weight, including ranges, are approximations which are varied (+) or (-) by increments of 0.1 or 1.0, as appropriate. It is to be understood, although not always explicitly stated, that all numerical designations are preceded by the term “about.” Where a range of values is provided, it is understood that each intervening value, to the smallest fraction of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Any narrower range between any stated values or unstated intervening values in a stated range and any other stated or intervening value in that stated range is encompassed. The upper and lower limits of those smaller ranges may independently be included or excluded in the range, and each range where either, neither, or both limits are included in the smaller ranges is also encompassed within the technology, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included.
[0480] In the foregoing description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the invention described in this disclosure may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described to avoid obscuring the invention. Embodiments of the disclosure have been described for illustrative and not restrictive purposes. Although the present invention is described primarily with reference to specific embodiments, it is also envisioned that other embodiments will become apparent to those skilled in the art upon reading the present disclosure, and it is intended that such embodiments be contained within the present inventive methods. Accordingly, the present disclosure is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications can be made without departing from the scope of the claims below.

Claims

WHAT IS CLAIMED IS:
1. An elite soybean plant having stably incorporated into its genome a heterologous polynucleotide, wherein the polynucleotide comprises a nucleic acid sequence encoding
(a) a polypeptide having an amino acid sequence that has at least 90% identity, at least 95% identity or 100% identity to at least one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, and/or 139, or
(b) a polypeptide having an amino acid sequence that has at least 90% identity, at least 95% identity or 100% identity to at least one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, and/or 139, wherein expression of the heterologous polynucleotide in the plant increases protein content, increases oil content, and/or modifies oil profile as compared to a control plant.
2. The elite soybean plant of claim 1, wherein the nucleic acid sequence encoding the polypeptide of (a) has at least 90% identity, at least 95% identity, or 100% identity to at least one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137; and the nucleic acid sequence encoding the polypeptide of (b) has at least 90% identity, at least 95% identity, or 100% identity to at least one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21.
3. The elite soybean plant of claim 1 or 2, wherein the nucleic acid sequence encoding the polypeptide of (a) or (b) is operably linked to a heterologous promoter active in the plant.
4. The elite soybean plant of claim 1 or 2, wherein the polynucleotide comprises a nucleic acid sequence encoding the polypeptide of both (a) and (b).
5. The elite soybean plant of claim 4, wherein the nucleic acid sequence encoding the polypeptide of (a) is operably linked to a first heterologous promoter active in the plant and the nucleic acid sequence encoding the polypeptide of (b) is operably linked to a second heterologous promoter active in the plant.
6. The elite soybean plant of any one of claims 1-5, wherein the polynucleotide is introduced into the genome of the plant by transgenic expression.
7. The elite soybean plant of any one of claims 1-5, wherein the polynucleotide is introduced into the genome of the plant by genome editing.
8. The elite soybean plant of claim 3, wherein the heterologous promoter is an endogenous promoter, a constitutive promoter, an inducible promoter, or a tissue-specific promoter, and wherein the heterologous promoter is not a root-specific promoter.
9. The elite soybean plant of any one of claims 1-8, wherein the elite soybean plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
10. A plant cell, seed, or plant part derived from the elite soybean plant of any one of claims 1-9, wherein said plant cell, seed or plant part has the polynucleotide stably incorporated into its genome.
11. A harvested product derived from the seed of claim 10, wherein the harvested product comprises the polynucleotide.
12. A processed product derived from the harvested product of claim 11, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing, and wherein the processed product comprises the polynucleotide.
13. A method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising: a) introducing into the genome of a plant, a nucleic acid sequence encoding a polypeptide having
(i) an amino acid sequence comprising at least 85%, at least 90%, at least 95%, or 100% identity to any one of SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, and/or 139, or
(ii) an amino acid sequence comprising at least 85%, at least 90%, at least 95%, or 100% identity to any one of SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, and/or 24-59, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence in the plant results in increased protein content, increased oil content, and/or modified oil profile compared to a control plant not expressing said nucleic acid sequence.
14. The method of claim 13, wherein the nucleic acid sequence is introduced into the genome of the plant by one of transformation, gene editing of the genome of the plant, or by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content, increased oil content, and/or modified oil profile.
15. The method of claim 13 or 14, wherein the method further comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
16. The method of any of claims 13-15, wherein the nucleic acid sequence encodes the polypeptide of (i) and (ii).
17. The method of any one of claims 13-16, wherein the plant is a soybean plant, optionally wherein the soybean plant is an elite soybean plant.
18. A plant produced by the method of any one of claims 13-17.
19. An expression cassette, comprising a heterologous promoter operably linked to:
(a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 76, 79, 82, 85, 88, 112, 115, 118, 121, 124, 127, 130, 133, 136, and/or 139; or
(b) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 3, 5, 8, 9, 12, 15, 18, 19, 22, and/or 24-59, wherein expression of the nucleic acid sequence in a plant increases protein and/or oil content relative to a control plant.
20. The expression cassette of claim 19, wherein the nucleotide sequence of part (a) comprises a sequence of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134 or 137, or has at least more than 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity to any one of SEQ ID NOs: 74, 77, 80, 83, 86, 110, 113, 116, 119, 122, 125, 128, 131, 134, or 137; and wherein the nucleotide sequence of part (b) comprises a sequence of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21, or has at least more than 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity to any one of SEQ ID NOs: 1, 2, 4, 7, 10, 11, 14, 16, 17, 20, or 21.
21. The expression cassette of claim 19 or 20, comprising the nucleotide sequence of (a) operably linked to a first heterologous promoter and the nucleotide sequence of (b) operably linked to a second heterologous promoter, wherein expression of the nucleic acid sequence in a plant increases protein and/or oil content in the plant relative to a control plant, and wherein the first and second heterologous promoters are capable of directing expression in a plant cell, and wherein the control plant does not comprise the nucleotide sequence of (a), the nucleotide sequence of (b), or both the nucleotide sequence (a) and the nucleotide sequence (b).
22. The expression cassette of any one of claims 19-21, wherein the heterologous promoter is an endogenous promoter or an exogenous promoter, wherein the heterologous promoter is not a root-specific promoter, and optionally wherein the exogenous promoter comprises SEQ ID NO: 20 (pSOY1).
23. A vector comprising the expression cassette of any of claims 19-22.
24. A transgenic plant cell comprising the expression cassette of any of claims 19-22 or the vector of claim 23, wherein the nucleotide sequence is stably incorporated into the genome of the transgenic plant cell, optionally wherein the transgenic cell is a transgenic soybean plant cell.
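The claims define scope by thresholds such as "at least 90% identity" to a reference SEQ ID NO. As a minimal sketch only, assuming the two sequences have already been aligned (for example with BLAST, per the Altschul et al. references cited below) and that gaps are written as "-", percent identity can be computed as identical aligned columns divided by alignment length, similar to the "Identities" ratio BLAST reports. The denominator convention and the helper names `percent_identity` and `meets_threshold` are illustrative assumptions, not a definition taken from this disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an existing pairwise alignment.

    Assumption: both strings are rows of the same alignment (equal length,
    '-' marks a gap). Gap columns count toward the alignment length but
    never count as identities.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float) -> bool:
    """Return True if identity is at least `threshold` percent (e.g. 90.0)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical toy peptides, not sequences from this disclosure.
    ref = "MKTAYIAKQR"
    var = "MKTAYIAKHR"  # one substitution across ten aligned columns
    print(percent_identity(ref, var))       # 90.0
    print(meets_threshold(ref, var, 90.0))  # True
    print(meets_threshold(ref, var, 95.0))  # False
```

Counting gap columns in the denominator penalizes insertions and deletions; other conventions (for example, dividing by the length of the shorter or the reference sequence) can give different values for the same alignment.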
PCT/US2023/062421 2022-02-11 2023-02-10 Methods and compositions for increasing protein and/or oil content and modifying oil profile in a plant WO2023154887A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
PCT/CN2022/075977 WO2023151004A1 (en) 2022-02-11 2022-02-11 Methods and compositions for increasing protein and oil content and/or modifying oil profile in plant
CNPCT/CN2022/075977 2022-02-11
PCT/CN2022/075982 WO2023151007A1 (en) 2022-02-11 2022-02-11 Methods and compositions for increasing protein and/or oil content and modifying oil profile in a plant
CNPCT/CN2022/075982 2022-02-11

Publications (1)

Publication Number Publication Date
WO2023154887A1 2023-08-17

Family

ID=85703814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/062421 WO2023154887A1 (en) 2022-02-11 2023-02-10 Methods and compositions for increasing protein and/or oil content and modifying oil profile in a plant

Country Status (1)

Country Link
WO (1) WO2023154887A1 (en)

Patent Citations (76)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4769061A (en) 1983-01-05 1988-09-06 Calgene Inc. Inhibition resistant 5-enolpyruvyl-3-phosphoshikimate synthase, production and use
US4761373A (en) 1984-03-06 1988-08-02 Molecular Genetics, Inc. Herbicide resistance in plants
US5569597A (en) 1985-05-13 1996-10-29 Ciba Geigy Corp. Methods of inserting viral DNA into plant material
US5188642A (en) 1985-08-07 1993-02-23 Monsanto Company Glyphosate-resistant plants
US4853331A (en) 1985-08-16 1989-08-01 Mycogen Corporation Cloning and expression of Bacillus thuringiensis toxin gene toxic to beetles of the order Coleoptera
US4940835A (en) 1985-10-29 1990-07-10 Monsanto Company Glyphosate-resistant plants
US4810648A (en) 1986-01-08 1989-03-07 Rhone Poulenc Agrochimie Haloarylnitrile degrading gene, its use, and cells containing the gene
EP0242246A1 (en) 1986-03-11 1987-10-21 Plant Genetic Systems N.V. Plant cells resistant to glutamine synthetase inhibitors, made by genetic engineering
US5561236A (en) 1986-03-11 1996-10-01 Plant Genetic Systems Genetically engineered plant cells and plants exhibiting resistance to glutamine synthetase inhibitors, DNA fragments and recombinants for use in the production of said cells and plants
US4975374A (en) 1986-03-18 1990-12-04 The General Hospital Corporation Expression of wild type and mutant glutamine synthetase in foreign hosts
US5879903A (en) 1986-08-23 1999-03-09 Hoechst Aktiengesellschaft Phosphinothricin-resistance gene, and its use
US5276268A (en) 1986-08-23 1994-01-04 Hoechst Aktiengesellschaft Phosphinothricin-resistance gene, and its use
US5268463A (en) 1986-11-11 1993-12-07 Jefferson Richard A Plant promoter α-glucuronidase gene construct
US5608142A (en) 1986-12-03 1997-03-04 Agracetus, Inc. Insecticidal cotton plants
US5495071A (en) 1987-04-29 1996-02-27 Monsanto Company Insect resistant tomato and potato plants
US5013659A (en) 1987-07-27 1991-05-07 E. I. Du Pont De Nemours And Company Nucleic acid fragment encoding herbicide resistant plant acetolactate synthase
US5039523A (en) 1988-10-27 1991-08-13 Mycogen Corporation Novel Bacillus thuringiensis isolate denoted B.t. PS81F, active against lepidopteran pests, and a gene encoding a lepidopteran-active toxin
US5162602A (en) 1988-11-10 1992-11-10 Regents Of The University Of Minnesota Corn plants tolerant to sethoxydim and haloxyfop herbicides
US5023179A (en) 1988-11-14 1991-06-11 Eric Lam Promoter enhancer element for gene expression in plant roots
US5110732A (en) 1989-03-14 1992-05-05 The Rockefeller University Selective gene expression in plants
US4940935A (en) 1989-08-28 1990-07-10 Ried Ashman Manufacturing Automatic SMD tester
US5554798A (en) 1990-01-22 1996-09-10 Dekalb Genetics Corporation Fertile glyphosate-resistant transgenic corn plants
US5466785A (en) 1990-04-12 1995-11-14 Ciba-Geigy Corporation Tissue-preferential promoters
US5608149A (en) 1990-06-18 1997-03-04 Monsanto Company Enhanced starch biosynthesis in tomatoes
EP0480762A2 (en) 1990-10-12 1992-04-15 Mycogen Corporation Novel bacillus thuringiensis isolates active against dipteran pests
US5459252A (en) 1991-01-31 1995-10-17 North Carolina State University Root specific gene promoter
US5767366A (en) 1991-02-19 1998-06-16 Louisiana State University Board Of Supervisors, A Governing Body Of Louisiana State University Agricultural And Mechanical College Mutant acetolactate synthase gene from Ararbidopsis thaliana for conferring imidazolinone resistance to crop plants
US5399680A (en) 1991-05-22 1995-03-21 The Salk Institute For Biological Studies Rice chitinase promoter
US5604121A (en) 1991-08-27 1997-02-18 Agricultural Genetics Company Limited Proteins with insecticidal properties against homopteran insects and their use in plant protection
US5994629A (en) 1991-08-28 1999-11-30 Novartis Ag Positive selection
US5304730A (en) 1991-09-03 1994-04-19 Monsanto Company Virus resistant plants and method therefore
US5750386A (en) 1991-10-04 1998-05-12 North Carolina State University Pathogen-resistant transgenic plants
US5428148A (en) 1992-04-24 1995-06-27 Beckman Instruments, Inc. N4 - acylated cytidinyl compounds useful in oligonucleotide synthesis
US5401836A (en) 1992-07-16 1995-03-28 Pioneer Hi-Bred International, Inc. Brassica regulatory sequence for root-specific or root-abundant gene expression
US5767378A (en) 1993-03-02 1998-06-16 Novartis Ag Mannose or xylose based positive selection
US5569823A (en) 1993-05-28 1996-10-29 Bayer Aktiengesellschaft DNA comprising plum pox virus and tomato spotted wilt virus cDNAS for disease resistance
US5437992A (en) 1994-04-28 1995-08-01 Genencor International, Inc. Five thermostable xylanases from microtetraspora flexuosa for use in delignification and/or bleaching of pulp
US5633363A (en) 1994-06-03 1997-05-27 Iowa State University, Research Foundation In Root preferential promoter
US20010016956A1 (en) 1994-06-16 2001-08-23 Ward Eric R. Herbicide-tolerant protox genes produced by DNA shuffling
US5608144A (en) 1994-08-12 1997-03-04 Dna Plant Technology Corp. Plant group 2 promoters and uses thereof
US6337431B1 (en) 1994-12-30 2002-01-08 Seminis Vegetable Seeds, Inc. Transgenic plants expressing DNA constructs containing a plurality of genes to impart virus resistance
US5659026A (en) 1995-03-24 1997-08-19 Pioneer Hi-Bred International ALS3 promoter
US5928937A (en) 1995-04-20 1999-07-27 American Cyanamid Company Structure-based designed herbicide resistant products
US6084155A (en) 1995-06-06 2000-07-04 Novartis Ag Herbicide-tolerant protoporphyrinogen oxidase ("protox") genes
US5837876A (en) 1995-07-28 1998-11-17 North Carolina State University Root cortex specific gene promoter
US6072050A (en) 1996-06-11 2000-06-06 Pioneer Hi-Bred International, Inc. Synthetic promoters
US6329504B1 (en) 1996-12-13 2001-12-11 Monsanto Company Antifungal polypeptide and methods for controlling plant pathogenic fungi
WO1999043838A1 (en) 1998-02-24 1999-09-02 Pioneer Hi-Bred International, Inc. Synthetic promoters
WO1999043819A1 (en) 1998-02-26 1999-09-02 Pioneer Hi-Bred International, Inc. Family of maize pr-1 genes and promoters
US6177611B1 (en) 1998-02-26 2001-01-23 Pioneer Hi-Bred International, Inc. Maize promoters
US20050060767A1 (en) 1998-08-12 2005-03-17 Venkiteswaran Subramanian DNA shuffling to produce herbicide selective crops
US6225529B1 (en) 1998-08-20 2001-05-01 Pioneer Hi-Bred International, Inc. Seed-preferred promoters
WO2000011177A1 (en) 1998-08-20 2000-03-02 Pioneer Hi-Bred International, Inc. Seed-preferred promoters
WO2000012733A1 (en) 1998-08-28 2000-03-09 Pioneer Hi-Bred International, Inc. Seed-preferred promoters from end genes
US7772369B2 (en) 1999-05-04 2010-08-10 Monsanto Technology Llc Coleopteran-toxic polypeptide compositions and insect-resistant transgenic plants
US20130326723A1 (en) * 1999-05-06 2013-12-05 Thomas J. La Rosa Soy nucleic acid molecules and other molecules associated with plants and uses thereof for plant improvement
US7534939B2 (en) 1999-09-15 2009-05-19 Monsanto Technology Llc Plant transformed with polynucleotide encoding lepidopteran-active Bacillus thuringiensis δ-endotoxin
US20070004912A1 (en) 2000-10-30 2007-01-04 Pioneer Hi-Bred International, Inc. Novel glyphosate-N-acetyltransferase (GAT) genes
WO2003016654A1 (en) 2001-08-10 2003-02-27 Akzenta Paneele + Profile Gmbh Panel and fastening system for such a panel
US7692068B2 (en) 2003-10-14 2010-04-06 Athenix Corporation AXMI-010, a delta-endotoxin gene and methods for its use
US20050208178A1 (en) 2003-12-19 2005-09-22 Syngenta Participations Ag Microbially expressed xylanases and their use as feed additives and other uses
US7790846B2 (en) 2003-12-22 2010-09-07 Pioneer Hi-Bred International, Inc. Bacillus thuringiensis Cry9 toxins
US7541517B2 (en) 2003-12-22 2009-06-02 Pioneer Hi-Bred International, Inc. Bacillus thuringiensis CRY9 nucleic acids
US20050246798A1 (en) 2004-04-29 2005-11-03 Verdia Inc. Novel glyphosate-N-acetyltransferase (GAT) genes
US8093453B2 (en) 2005-03-16 2012-01-10 Syngenta Participations Ag Corn event 3272 and methods of detection thereof
US8147856B2 (en) 2006-06-14 2012-04-03 Athenix Corp. AXMI-031, AXMI-039, AXMI-040 and AXMI-049, a family of novel delta-endotoxin genes and methods for their use
US8575425B2 (en) 2009-07-02 2013-11-05 Athenix Corporation AXMI-205 pesticidal gene and methods for its use
US8586832B2 (en) 2009-12-21 2013-11-19 Pioneer Hi Bred International Inc Bacillus thuringiensis gene with Lepidopteran activity
US20130139280A1 (en) 2010-06-25 2013-05-30 Jeong Sheop Shin Plants Having Enhanced Yield-Related Traits and a Method for Making the Same
US8802934B2 (en) 2010-08-19 2014-08-12 Pioneer Hi Bred International Inc Bacillus thuringiensis gene with lepidopteran activity
WO2012117324A1 (en) * 2011-02-28 2012-09-07 Basf Plant Science Company Gmbh Plants having enhanced yield-related traits and producing methods thereof
WO2013026740A2 (en) 2011-08-22 2013-02-28 Bayer Cropscience Nv Methods and means to modify a plant genome
US10669540B2 (en) 2015-06-18 2020-06-02 The Broad Institute, Inc. CRISPR enzymes and systems
WO2017189308A1 (en) 2016-04-19 2017-11-02 The Broad Institute Inc. Novel crispr enzymes and systems
US10285348B2 (en) 2016-12-02 2019-05-14 Syngenta Participations Ag Simultaneous gene editing and haploid induction
US10519456B2 (en) 2016-12-02 2019-12-31 Syngenta Participations Ag Simultaneous gene editing and haploid induction

Non-Patent Citations (119)

* Cited by examiner, † Cited by third party
Title
"Wild Germplasm for Genetic Improvement in Crop Plants", 1 January 2021, ELSEVIER, ISBN: 978-0-12-822137-2, article HAMMAD NADEEM TAHIR MUHAMMAD ET AL: "Untapped Soybeans: A Genetic Reservoir for its Improvement", pages: 139 - 151, XP093044919, DOI: 10.1016/B978-0-12-822137-2.00008-4 *
ALLISON ET AL.: "maize dwarf mosaic virus (MDMV) leader", VIROLOGY, vol. 154, 1986, pages 9 - 20
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403
ALTSCHUL ET AL., NUCLEIC ACIDS RES, vol. 25, 1997, pages 3389 - 3402
ALTSCHUL ET AL., NUCLEIC ACIDS RES., vol. 25, 1997, pages 3389 - 3402
ANZALONE, A. ET AL., NAT BIOTECHNOL, vol. 38, no. 7, July 2020 (2020-07-01), pages 824 - 844
BALLAS ET AL., NUCLEIC ACIDS RES., vol. 17, 1989, pages 7891 - 7903
BARTLETT, PLANT METHODS, vol. 4, 2008, pages 1 - 12
BATES, G.W., METHODS IN MOLECULAR BIOLOGY, vol. 111, 1999, pages 359 - 366
BEVAN ET AL., NATURE, vol. 304, 1983, pages 184 - 187
BHATTRAMAKKI, RAFALSKI: "PLANT GENOTYPING: THE DNA FINGERPRINTING OF PLANTS", 2001, CABI PUBLISHING, article "Discovery and application of single nucleotide polymorphism markers in plants"
BINNS, THOMASHOW, ANNUAL REVIEWS IN MICROBIOLOGY, vol. 42, no. 57, 1988, pages 606
BLOCHINGER, DIGGELMANN, MOL. CELL BIOL., vol. 4, pages 2929 - 2931
BOUROUIS ET AL., EMBO J, vol. 2, no. 7, 1983, pages 1099 - 1104
CAI ET AL., PLANT MOL BIOL, vol. 69, 2009, pages 699 - 709
CANEVASCINI ET AL., PLANT PHYSIOL., vol. 112, no. 2, 1996, pages 513 - 524
CAPANA ET AL., PLANT MOL. BIOL., vol. 25, no. 4, 1994, pages 681 - 691
CASTLE, SCIENCE, vol. 304, 2004, pages 1151 - 1154
CHEN ET AL., PLANT J, vol. 10, 1996, pages 955 - 966
CHING ET AL., BMC GENET, vol. 3, no. 19, 2002, pages 14
CHRISTENSEN ET AL., PLANT MOL. BIOL., vol. 12, 1989, pages 619 - 632
CHRISTENSEN ET AL., PLANT MOL. BIOL., vol. 20, no. 2, 1992, pages 207 - 218
CHRISTOU, P, EUPHYTICA, vol. 85, 1995, pages 13 - 27
CHRISTOU, P, THE PLANT JOURNAL, vol. 2, 1992, pages 275 - 281
CORDERO ET AL., PHYSIOL. MOL. PLANT PATH., vol. 41, 1992, pages 189 - 200
CORDERO ET AL., PLANT J, vol. 6, no. 2, 1994, pages 141 - 150
COUGHLIN ET AL.: "Espoo", vol. 8, 1993, FOUNDATION FOR BIOTECHNICAL AND INDUSTRIAL FERMENTATION RESEARCH, article "Proceedings of the Second TRICEL Symposium on Trichoderma reesei Cellulases and Other Hydrolases", pages: 125 - 135
CRICKMORE ET AL., MICROBIOL. MOL. BIOL. REV., vol. 62, 1998, pages 807 - 813
DAYHOFF ET AL.: "Atlas of Protein Sequence and Structure", vol. 5, 1978, NATL. BIOMED. RES. FOUND., article "A model of evolutionary change in proteins", pages: 345 - 352
DELLA-CIOPPA ET AL., PLANT PHYSIOLOGY, vol. 84, 1987, pages 965 - 968
ECKELKAMP ET AL., FEBS LETTERS, vol. 323, 1993, pages 73 - 76
ELROY-STEIN, O., FUERST, T. R., MOSS, B., PNAS USA, vol. 86, 1989, pages 6126 - 6130
FENG ET AL., CELL RESEARCH, vol. 23, 2013, pages 1229 - 1232
GALLIE ET AL., NUCL. ACIDS RES., vol. 15, 1987, pages 8693 - 8711
GALLIE, D. R. ET AL., MOLECULAR BIOLOGY OF RNA, 1989, pages 237 - 256
GUERINEAU, MOL. GEN. GENET., vol. 262, 1991, pages 141 - 144
GUEVARA-GARCIA ET AL., PLANT J, vol. 3, no. 3, 1993, pages 509 - 505
GUPTA ET AL., CURR SCI, vol. 80, 2001, pages 524 - 535
GUT, I.G., HUM. MUTAT., vol. 17, 2001, pages 475 - 492
HANSEN ET AL., MOL. GEN GENET., vol. 254, no. 3, 1997, pages 337 - 343
HENIKOFF ET AL., PROC. NATL. ACAD. SCI. USA, vol. 89, 1992, pages 10915 - 10919
JOBLING, S. A., GEHRKE, L., NATURE, vol. 325, 1987, pages 622 - 625
JONES ET AL., PLANT METHODS, vol. 1, 2005
JOSHI ET AL., NUCLEIC ACIDS RES, vol. 15, 1987, pages 9627 - 9639
KARLIN, ALTSCHUL, PROC. NATL. ACAD. SCI. USA, vol. 87, no. 21, 1990, pages 8526 - 8530
KAWAMATA ET AL., PLANT CELL PHYSIOL., vol. 38, no. 7, 1997, pages 792 - 803
KELLER, BAUMGARTNER, PLANT CELL, vol. 3, no. 10, 1991, pages 1051 - 1061
KUSTER ET AL., PLANT MOL. BIOL., vol. 29, no. 4, 1995, pages 759 - 772
KWOK, PHARMACOGENOMICS, vol. 1, no. 1, 2000, pages 95 - 100
KWON ET AL., PLANT PHYSIOL., vol. 105, 1994, pages 357 - 67
LAM, RESULTS PROBL. CELL DIFFER., vol. 20, 1994, pages 181 - 196
LAST ET AL., THEOR. APPL. GENET., vol. 81, 1991, pages 581 - 588
LEACH, AOYAGI, PLANT SCIENCE (LIMERICK), vol. 79, no. 1, 1991, pages 69 - 76
LI ET AL., PLANT PHYSIOL, vol. 151, 2009, pages 1087 - 1095
LIEBERMAN-LAZAROVICH, LEVY, METHODS MOL BIOL, 2011, pages 51 - 65
LOMMEL, S. A. ET AL., VIROLOGY, vol. 81, 1991, pages 382 - 385
MACEJAK, D. G., SARNOW, P., NATURE, vol. 353, 1991, pages 810 - 812
MARINEAU ET AL., PLANT MOL. BIOL., vol. 9, 1987, pages 335 - 342
MATSUOKA ET AL., PROC NATL. ACAD. SCI. USA, vol. 90, no. 20, 1993, pages 9586 - 9590
MATSUOKA ET AL., PROC. NATL. ACAD. SCI. USA, vol. 90, no. 20, 1993, pages 9586 - 9590
MATTON ET AL., MOLECULAR PLANT-MICROBE INTERACTIONS, vol. 2, 1989, pages 325 - 331
MCBRIDE ET AL., PROC. NATL. ACAD. SCI. USA, vol. 91, no. 15, 1994, pages 7301 - 7305
MCCORMICK ET AL., PLANT CELL REPORTS, vol. 5, 1986, pages 81 - 84
MCELROY ET AL., PLANT CELL, vol. 2, no. 7, 1990, pages 1261 - 1272
MCGURL ET AL., SCIENCE, vol. 225, 1992, pages 1570 - 1573
MESSING, VIERRA, GENE, vol. 19, 1982, pages 259 - 268
MUNROE, GENE, vol. 91, 1990, pages 151 - 158
NEEDLEMAN, WUNSCH, J. MOL. BIOL., vol. 48, no. 3, 1970, pages 443 - 453
OPENSHAW ET AL.: "Marker-assisted Selection in Backcross Breeding", PROCEEDINGS OF THE SYMPOSIUM, 1994, pages 41 - 43
OROZCO ET AL., PLANT MOL. BIOL., vol. 23, no. 6, 1993, pages 1129 - 1138
OROZCO ET AL., PLANT MOL BIOL, vol. 23, no. 6, 1993, pages 1129 - 1138
OUAN ET AL., NATURE BIOTECHNOLOGY, vol. 14, 1996, pages 494 - 498
POREBSKI, S ET AL., PLANT MOLECULAR BIOLOGY REPORTER, vol. 15, no. 1, 1997, pages 8 - 15
PROUDFOOT, CELL, vol. 64, 1991, pages 671 - 674
QI ET AL., PLANT CELL ENVIRON, vol. 41, no. 9, 2018, pages 2109 - 2127
QI ET AL., PLANT CELL ENVIRON, vol. 41, no. 9, September 2018 (2018-09-01), pages 2109 - 2127
RAFALSKI, PLANT SCI, vol. 162, 2002, pages 329 - 333
RAGOT, M ET AL.: "Marker-assisted Backcrossing: A Practical Example", TECHNIQUES ET UTILISATIONS DES MARQUEURS MOLECULAIRES LES COLLOQUES, vol. 72, 1995, pages 45 - 56
RAKOCZY-TROJANOWSKA, M, CELL MOL BIOL LETT, vol. 7, 2002, pages 849 - 858
RAPID COMMUN MASS SPECTROM, vol. 21, no. 12, 2007, pages 1937 - 43
REDOLFI ET AL., NETH. J. PLANT PATHOL., vol. 89, 1983, pages 245 - 254
RICE, P., LONGDEN, I., BLEASBY, A.: "EMBOSS: The European Molecular Biology Open Software Suite", TRENDS IN GENETICS, vol. 16, no. 6, 2000, pages 276 - 277, XP004200114, DOI: 10.1016/S0168-9525(00)02024-2
RINEHART ET AL., PLANT PHYSIOL., vol. 112, no. 3, 1996, pages 1331 - 1341
RIVERA ET AL., PHYSICS OF LIFE REVIEWS, vol. 9, 2012, pages 308 - 345
ROHMEIER ET AL., PLANT MOL. BIOL., vol. 22, 1993, pages 783 - 792
RUSSELL ET AL., TRANSGENIC RES., vol. 6, no. 2, 1997, pages 157 - 168
RYAN, ANN. REV. PHYTOPATH., vol. 28, 1990, pages 425 - 449
SANFACON, GENES DEV, vol. 5, 1991, pages 141 - 149
SANGER, PLANT MOL. BIOL., vol. 14, no. 3, 1990, pages 433 - 443
SCHMUTZ ET AL., NATURE, vol. 463, no. 7278, 14 January 2010 (2010-01-14), pages 178 - 83
SHI, CLIN. CHEM., vol. 47, no. 2, 2001, pages 164 - 172
SIEBERTZ ET AL., PLANT CELL, vol. 1, 1989, pages 961 - 968
SKUZESKI ET AL., PLANT MOLEC. BIOL., vol. 15, 1990, pages 65 - 79
SOMSSICH ET AL., MOL. GEN. GENET., vol. 2, 1988, pages 93 - 98
SOMSSICH ET AL., PROC. NATL. ACAD. SCI. USA, vol. 83, 1986, pages 2427 - 2430
SPENCER ET AL., THEOR. APPL. GENET, vol. 79, 1990, pages 625 - 631
STANFORD ET AL., MOL. GEN. GENET., vol. 215, 1989, pages 200 - 208
STAUB, MALIGA, EMBO J., vol. 12, no. 2, 1993, pages 601 - 606
TEERI ET AL., EMBO J., vol. 8, no. 2, 1989, pages 343 - 350
TENKANEN ET AL., ENZYME MICROB. TECHNOL., vol. 14, 1992, pages 566
THOMPSON ET AL., BIOESSAYS, vol. 10, 1989, pages 108
TORRONEN ET AL., BIO/TECHNOLOGY, vol. 10, 1992, pages 1461
TZFIRA ET AL., TRENDS IN GENETICS, vol. 20, 2004, pages 375 - 383
UKNES ET AL., PLANT CELL, vol. 4, 1992, pages 645 - 656
VAN LOON, PLANT MOL. VIROL., vol. 4, 1985, pages 111 - 116
VELTEN, EMBO J., vol. 3, 1984, pages 2723 - 2730
VIRENDER KUMAR: "Omics advances and integrative approaches for the simultaneous improvement of seed oil and protein content in soybean (Glycine max L.),", CRITICAL REVIEWS IN PLANT SCIENCES, vol. 40, no. 5, 10 August 2021 (2021-08-10), pages 398 - 421, XP009544301, ISSN: 0735-2689, Retrieved from the Internet <URL:https://www.tandfonline.com/doi/full/10.1080/07352689.2021.1954778> [retrieved on 20230508], DOI: 10.1080/07352689.2021.1954778 *
WANG X ET AL.: "BioVector, a flexible system for gene specific expression in plants", BMC PLANT BIOL, vol. 13, 2013, pages 198, XP021169278, DOI: 10.1186/1471-2229-13-198
WHITE ET AL., NUCL. ACIDS RES, vol. 18, 1990, pages 1062
WHITING, R.M. ET AL., BMC PLANT BIOL, vol. 20, no. 1, 23 October 2020 (2020-10-23), pages 485
WRIGHT ET AL., PLANT J, vol. 44, 2005, pages 693 - 705
XU ET AL., APPL. MICROBIOL. BIOTECHNOL., vol. 49, 1998, pages 718
YAMAMOTO ET AL., PLANT CELL PHYSIOL, vol. 35, no. 5, 1994, pages 773 - 778
YAMAMOTO ET AL., PLANT CELL PHYSIOL., vol. 35, no. 5, 1994, pages 773 - 778
YAMAMOTO ET AL., PLANT J, vol. 12, no. 2, 1997, pages 255 - 265
YANG, PROC. NATL. ACAD. SCI. USA, vol. 93, 1996, pages 14972 - 14977
YAO ET AL., JOURNAL OF EXPERIMENTAL BOTANY, vol. 57, 2006, pages 3737 - 3746
YAU ET AL., PLANT J, vol. 701, 2011, pages 147 - 166
ZUPAN, ZAMBRYSKI, PLANT PHYSIOLOGY, vol. 107, 1995, pages 1041 - 1047

Similar Documents

Publication Publication Date Title
US20230212595A1 (en) Generation of site specific integration sites for complex trait loci in corn and soybean, and methods of use
US20230203525A1 (en) Compositions and methods for enhancing resistance to northern leaf blight in maize
US11932865B2 (en) Acetyl co-enzyme a carboxylase herbicide resistant plants
US10662435B2 (en) Plants having altered agronomic characteristics under abiotic stress conditions and related constructs and methods involving genes encoding NAC3/ONAC067 polypeptides
US20190085355A1 (en) Drought tolerant maize
US20130312136A1 (en) Methods and Compositions for Modulating Gene Expression in Plants
WO2023154887A1 (en) Methods and compositions for increasing protein and/or oil content and modifying oil profile in a plant
WO2023151007A1 (en) Methods and compositions for increasing protein and/or oil content and modifying oil profile in a plant
WO2023151004A1 (en) Methods and compositions for increasing protein and oil content and/or modifying oil profile in plant
US20170306346A1 (en) Improved agronomic characteristics under water limiting conditions for plants expressing pub10 polypeptides
EP2638167A2 (en) Dominant negative mutant kip-related proteins (krp) in zea mays and methods of their use
Mandal et al. Osmotin: A PR gene impart tolerance to excess salt in Indica Rice
CN110959043A (en) Method for improving agronomic traits of plants by using BCS1L gene and guide RNA/CAS endonuclease system
WO2023168691A1 (en) Methods and compositions for modifying flowering time genes in plants
Öz Microarray based expression profiling of barley under boron stress and cloning of 3H boron tolerance gene
CN116802305A (en) Novel resistance genes associated with disease resistance in soybean
WO2024008752A1 (en) Methods to increase iron content in plants
AU2012212301B9 (en) Acetyl Co-Enzyme A carboxylase herbicide resistant plants
CN116234816A (en) Compositions and methods for enhancing resistance to northern leaf blight in maize
Mall Evaluation of novel input output traits in sorghum through biotechnology

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23711890

Country of ref document: EP

Kind code of ref document: A1