WO2023151007A1

WO2023151007A1 - Methods and compositions for increasing protein and/or oil content and modifying oil profile in a plant

Info

Publication number: WO2023151007A1
Application number: PCT/CN2022/075982
Authority: WO
Inventors: Qingshan Chen; Zhaoming QI; Dawei XIN; Jian LV; Xiaoping Tan
Original assignee: Northeast Agriculture University; Syngenta Group Co, Ltd.; Syngenta Crop Protection Ag
Priority date: 2022-02-11
Filing date: 2022-02-11
Publication date: 2023-08-17

Abstract

Compositions and methods for increasing the protein content and/or increasing the oil content of a soybean plant are provided. Compositions include isolated and recombinant polynucleotides encoding polypeptides, expression cassettes, host cells, plants, and plant parts stably incorporating these polynucleotides. Methods and kits are provided for producing these plants via transgenic means, breeding or genomic editing approaches and for identifying plants having increased protein content, increased oil content, and/or modified oil profile.

Description

METHODS AND COMPOSITIONS FOR INCREASING PROTEIN AND/OR OIL CONTENT AND MODIFYING OIL PROFILE IN A PLANT

FIELD

This disclosure relates to the field of plant biotechnology. In particular, it relates to methods and compositions for increasing plant protein/oil content and modifying oil profile.

BACKGROUND

Soybean is a valuable field crop. Soybean oil extracted from the seed is employed in a number of retail products such as cooking oil, baked goods, margarines and the like. Soybean grain is also used as a food source for both animals and humans. Soybean meal is a component of many foods and animal feed. Typically, during processing of whole soybeans, the fibrous hull is removed and the oil is extracted, and the remaining soybean meal is a combination of approximately 50% carbohydrates and 50% protein. For human consumption, soybean meal is made into soybean flour that is processed to protein concentrates used for meat extenders or specialty pet foods. Production of edible protein ingredients from soybean offers a healthier and less expensive replacement for animal protein in meats as well as dairy-type products.

BRIEF SUMMARY

In some embodiments, disclosed herein is an elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said polypeptide confers increased protein content, oil content, and/or modified oil profile on the elite Glycine max plant as compared to a control plant not comprising said nucleic acid sequence.

In some embodiments, disclosed herein is a plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.

In some embodiments, disclosed herein is a plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having an amino acid sequence that has at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, or an amino acid sequence set forth in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content, increased oil content, and/or modified oil profile as compared to a control plant.

In some embodiments, disclosed herein is a method of producing a soybean plant having increased polypeptide and/or oil content, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said nucleic acid sequence confers to said donor soybean plant increased protein content, increased oil content, and/or modified oil profile compared to donor Glycine plant; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by detecting the presence of the nucleic acid sequence, or the presence of one or more molecular markers associated with the nucleic acid sequence in the progeny plant, thereby producing a soybean plant having increased protein content, increased oil content, and/or modified oil profile.

In some embodiments, disclosed herein is a method of producing a Glycine max plant with increased protein content, increased oil content, and/or modified oil profile, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with a nucleic acid sequence comprising any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70, wherein said nucleic acid sequence confers to the Glycine max plant increased protein content, increased oil content, and/or modified oil profile; c) selecting a Glycine max plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said molecular marker associated with increased polypeptide and/or increased oil content.

In some embodiments, disclosed herein is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid molecule operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or (ii) an amino acid sequence set forth in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content, increases oil content, and/or modifies oil profile compared to a control plant not expressing said nucleic acid sequence.

In some embodiments, disclosed herein is a polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or any portion thereof, wherein the portion confers increased polypeptide and/or oil content, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, and having substitution and/or deletion and/or addition of one or more amino acid residues, wherein expression of the polypeptide confers increased polypeptide and/or oil content on the plant; (c) a polypeptide having more than 99%, more than 95%, more than 90%, more than 85%, or more than 80% identity with the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein the polypeptide when expressed in a plant confers increased polypeptide and/or oil content on the plant; or (d) a fusion polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or the polypeptide as defined in any one of (a) to (c).

In some embodiments, disclosed herein is a nucleic acid molecule comprising: (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NOs: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70; or (c) the nucleotide sequence of part (a) having at least 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity to any one of SEQ ID NOs: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70. In some embodiments, disclosed herein is a primer pair for amplifying the nucleic acid molecule described above. In some embodiments, disclosed herein is a kit comprising the primer pair.

BRIEF DESCRIPTION OF THE DRAWINGS

The present application includes the following figures. The figures are intended to illustrate certain embodiments and/or features of the compositions and methods, and to supplement any description(s) of the compositions and methods. The figures do not limit the scope of the compositions and methods, unless the written description expressly indicates that such is the case.

FIG. 1 shows bar charts of protein content (PC) and oil content (OC) values in the CSSL population across the 2013-2015 trials. The X-axis represents seed protein and oil content, the Y-axis represents the density of the frequency distribution, and the red solid line is the normal distribution curve of the CSSL population. The black arrow represents the protein/oil content of wild soybean (ZYD00006), and the red arrow represents the protein/oil content of Suinong14.

FIG. 2 shows the results of the WEGO analysis of candidate genes in the “hot spot” region (33.54Mb-34.70Mb) according to certain aspects of this disclosure.

FIG. 3 is a schematic illustration of the structure of the soybean protein Glyma.20G092000 according to certain aspects of this disclosure.

FIG. 4 is a schematic illustration of the structure of the soybean protein Glyma.20G092100 according to certain aspects of this disclosure.

FIG. 5 is a schematic illustration of the structure of the soybean protein Glyma.20G092400 according to certain aspects of this disclosure.

FIG. 6 is a schematic illustration of the structure of the soybean protein Glyma.20G094900 according to certain aspects of this disclosure.

FIG. 7 illustrates protein content, as reflected by the total nitrogen content on the Y axis, in soybean seed at different developmental stages in recurrent parent Suinong 14 (SN14; diamond) and four chromosome segment substitution lines: High Protein Low Oil (HPLO; circle), Low Protein High Oil (LPHO; triangle), High Protein High Oil (HPHO; square), and Low Protein Low Oil (LPLO; star), according to certain aspects of this disclosure.

FIGS. 8A-8F illustrate fatty acid compositions in soybean seed at different developmental stages in SN14 (diamond), HPLO (circle), LPHO (triangle), HPHO (square), and LPLO (star) according to certain aspects of this disclosure. FIG. 8A shows palmitic acid measurements, FIG. 8B shows stearic acid measurements, FIG. 8C shows oleic acid measurements, FIG. 8D shows linoleic acid measurements, FIG. 8E shows linolenic acid measurements, and FIG. 8F shows total fatty acid measurements.

FIG. 9 illustrates tissue-specific expression of candidate genes Glyma.20G092000, Glyma.20G092100, Glyma.20G092400, and Glyma.20G094900 according to certain aspects of this disclosure. Expression was assessed by RT-qPCR with 3 replicates.

FIG. 10 illustrates expression profiles in seeds at different developmental stages for candidate genes Glyma.20G092000, Glyma.20G092100, Glyma.20G092400, and Glyma.20G094900 according to certain aspects of this disclosure.

FIG. 11 depicts subcellular localization of Glyma.20G092400.

FIG. 12 shows quantitative analysis of Glyma.20G092400 transcripts in Arabidopsis wild type ecotype Col-0 plants (WT), mutant SALK_021984C plants, and transgenic Arabidopsis replenishment plants (pSOY1: Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1: Glyma.20G092400) according to certain aspects of this disclosure. Expression was assessed by RT-qPCR.

FIG. 13 shows bolting assessment of T3 generation wild type Col-0 plants (WT), mutant SALK_021984C plants, and transgenic Arabidopsis replenishment plants (pSOY1: Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1: Glyma.20G092400) according to certain aspects of this disclosure.

FIG. 14 shows inflorescence of T3 generation wild type Col-0 plants (WT), mutant SALK_021984C plants, and transgenic Arabidopsis replenishment plants (pSOY1: Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1: Glyma.20G092400) according to certain aspects of this disclosure.

FIGS. 15A-15B show fatty acid compositions in seeds from wild type plants (WT), mutant SALK_021984C plants, and transgenic Arabidopsis replenishment plants (pSOY1: Glyma.20G092400/SALK_021984C) and overexpression plants (pSOY1: Glyma.20G092400) according to certain aspects of this disclosure. Asterisks indicate significant differences when compared with WT (*, 0.05>P≥0.01 and **, P<0.01). FIG. 15A shows the content of various fatty acids. From left to right: WT (Col-0), SALK_021984C, pSOY1: Glyma.20G092400/SALK_021984C, and pSOY1: Glyma.20G092400. FIG. 15B shows total fatty acid content.

FIG. 16 shows quantitative analysis of Glyma.20G092400 transcripts in plant leaves of wild type (WT) and Glyma.20G092400-OE soybean mutants according to certain aspects of this disclosure. Expression was assessed by RT-qPCR.

FIGS. 17A-17C show protein and fatty acid contents in wild type (WT) and Glyma.20G092400-OE soybean mutant seeds according to certain aspects of this disclosure. FIG. 17A shows contents of various fatty acids. From left to right: WT, Glyma.20G092400-OE mutants 1, 2, and 3. FIG. 17B shows total fatty acid contents. FIG. 17C shows protein contents. Asterisks indicate significant differences compared with WT (*, 0.05>P≥0.01 and **, P<0.01).

FIG. 18 shows a phylogenetic tree of Glyma.20G092400 according to certain aspects of this disclosure.

DETAILED DESCRIPTION

All technical and scientific terms used herein, unless otherwise defined below, are intended to have the same meaning as commonly understood by one of ordinary skill in the art. References to techniques employed herein are intended to refer to the techniques as commonly understood in the art, including variations on those techniques and/or substitutions of equivalent techniques that would be apparent to one of skill in the art.

Provided herein are plants expressing polypeptides that increase protein content and/or increase oil content when expressed in a plant. In some instances, the polypeptides result in a modified oil profile when expressed in a plant or part thereof as compared to a control plant that does not express the polypeptides. The terms “oil content” and “fatty acid content” are used interchangeably herein. The terms “fatty acid profile” and “oil profile” are used interchangeably herein. The polypeptides include SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72 and variants thereof. Various means of introducing a nucleic acid sequence into the soybean plant are also disclosed, which include transgenic means, gene editing, and breeding. Markers for identifying the presence of these nucleic acid sequences in the plant are also disclosed. As used herein, the terms “phenotype,” “phenotypic trait” or “trait” refer to a distinguishable characteristic(s) of a genetically controlled trait.

In some embodiments, the plants provided herein are a non-naturally occurring variety of soybean having the desired trait. In specific embodiments, the non-naturally occurring variety of soybean is an elite soybean variety. A “non-naturally occurring variety of soybean” is any variety of soybean that does not exist in nature. A “non-naturally occurring variety of soybean” may be produced by any method known in the art, including, but not limited to, transforming a soybean plant or germplasm, transfecting a soybean plant or germplasm, and crossing a naturally occurring variety of soybean with a non-naturally occurring variety of soybean. In some embodiments, a “non-naturally occurring variety of soybean” may comprise one or more heterologous nucleotide sequences. In some embodiments, a “non-naturally occurring variety of soybean” may comprise one or more non-naturally occurring copies of a naturally occurring nucleotide sequence (i.e., extraneous copies of a gene that naturally occurs in soybean). In some embodiments, a “non-naturally occurring variety of soybean” may comprise a non-natural combination of two or more naturally occurring nucleotide sequences (i.e., two or more naturally occurring genes that do not naturally occur in the same soybean, for instance genes not found in Glycine max lines).

Methods and compositions are provided that modulate the level of oil, protein and/or fatty acids in a plant, a plant part, or a seed. In specific embodiments, various methods and compositions are provided that produce an increase in protein content in the plant, plant part or seed. An increase in protein content includes any statistically significant increase in the protein content in the plant, plant part or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in protein content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying for protein content levels are known. For example, mature seeds can be harvested, and grain protein content can be determined by FOSS NIR analysis (see Examples) or by assaying for nitrogen content with an automatic Kjeldahl apparatus.

In other embodiments, various methods and compositions are provided that produce an increase in oil content (e.g., increase in fatty acid content) in the plant, plant part or seed. An increase in oil content includes any statistically significant increase in the oil content in the plant, plant part or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in oil content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying for oil content levels are known. For example, mature seeds can be harvested, and oil content can be determined by FOSS analysis (see Examples).

In other embodiments, various methods and compositions are provided that produce an increase in fatty acid content in the plant, plant part or seed. An increase in fatty acid content includes any statistically significant increase in the fatty acid content in the plant, plant part or seed when compared to an appropriate control plant or plant part and includes, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in fatty acid content includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, or about 25% to about 30%. Various methods of assaying for fatty acid content levels are known. For example, mature seeds can be harvested, and fatty acid content can be determined by gas chromatography (see Examples). In specific embodiments, the methods and compositions provide for an increase in linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid (or any combination thereof) when compared to an appropriate control plant. Such increases include, for example, an increase of at least 0.2%, 0.4%, 0.6%, 0.8%, 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30% or higher. In other embodiments, an increase in linoleic acid and/or palmitic acid and/or oleic acid and/or eicosenoic acid (or any combination thereof) includes an increase of about 0.2% to about 0.5%, about 0.5% to about 1%, about 1% to about 2%, about 2% to about 3%, about 4% to about 5%, about 5% to about 6%, about 6% to about 7%, about 7% to about 8%, about 8% to about 9%, about 9% to about 10%, about 10% to about 12%, about 12% to about 14%, about 14% to about 16%, about 16% to about 18%, about 18% to about 20%, about 22% to about 25%, about 25% to about 30%, or higher.
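
The percentage increases listed above can be illustrated with a short calculation. The following is a minimal sketch only, and it assumes the percentages are expressed relative to the control mean; the disclosure does not state whether the values are relative or absolute. The function name and the measurement values are hypothetical and are not data from this disclosure.

```python
from statistics import mean


def relative_increase_percent(subject_values, control_values):
    """Return the percent increase of the subject mean over the control mean.

    Assumption: the increase is expressed relative to the control mean.
    """
    control_mean = mean(control_values)
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * (mean(subject_values) - control_mean) / control_mean


# Hypothetical seed protein measurements (% of dry weight), for illustration only.
control = [40.0, 40.4, 39.6]
subject = [42.0, 41.6, 42.4]

increase = relative_increase_percent(subject, control)
print(f"relative increase: {increase:.1f}%")
```

Whether such an increase is statistically significant would have to be assessed separately with replicate measurements and an appropriate test, which this sketch does not attempt.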

A "subject plant or plant cell" is one in which genetic alteration, such as transformation, has been effected as to a polynucleotide of interest, or is a plant or plant cell which is descended from a plant or cell so altered and which comprises the alteration. A "control" or "control plant" or "control plant cell" provides a reference point for measuring changes in phenotype of the subject plant or plant cell. A control plant or plant cell may comprise, for example: (a) a wild-type plant or cell, i.e., of the same genotype as the starting material for the genetic alteration which resulted in the subject plant or cell; (b) a plant or plant cell of the same genotype as the starting material but which has been transformed with a null construct (i.e., with a construct which has no known effect on the trait of interest, such as a construct comprising a marker gene); (c) a plant or plant cell which is a non-transformed segregant among progeny of a subject plant or plant cell; (d) a plant or plant cell genetically identical to the subject plant or plant cell but which is not exposed to conditions or stimuli that would induce expression of the gene of interest; or (e) the subject plant or plant cell itself, under conditions in which the gene of interest is not expressed.

I. Polynucleotides and polypeptides that confer increased protein content, increased oil content, and/or modified oil profile

Compositions and methods for conferring increased protein content, increased oil content, and/or modified oil profile are provided. Polypeptides, polynucleotides and fragments and variants thereof that confer increased protein content, increased oil content, and/or modified oil profile are provided. In some embodiments, the polypeptide is SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72; or a fragment or variant of any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72. In some embodiments, the polynucleotide is any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70; a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NO: 2, 5, 8, 11, 14, 44, 47, 50, 53, 56, 59, 62, 65, 68 or 71, or a fragment or variant of any one thereof. As used herein, the term “gene” refers to a hereditary unit including a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a particular characteristic or trait in an organism. In various embodiments, the genome of the soybean cultivar Williams 82 is used as the reference soybean genome. Williams 82 was derived from backcrossing a Phytophthora root rot resistance locus from the donor parent Kingwa into the recurrent parent Williams. See Schmutz et al., Nature, 2010 Jan 14; 463 (7278): 178-83. doi: 10.1038/nature08670.

Glyma.20G092400

Glyma.20G092400 (SEQ ID NO: 3) is detected in all tissues and organs, with the highest expression level in seeds (herein also referred to as grains) (FIG. 9). The expression level is the highest in the late milk (LM) stage of the grain. Glyma.20G092400 includes several conserved domains of the aminotransferase class-V family. This domain is found in aminotransferases and other enzymes including cysteine desulfurase (FIG. 5). Glyma.20G092400 comprises a selenocysteine lyase/cysteine desulfurase domain (aa 50-437 of SEQ ID NO: 3); a cysteine desulfurase (SufS)-like domain (aa 91-274 of SEQ ID NO: 3); an aminotransferase class-V domain (aa 93-274 of SEQ ID NO: 3); and a bifunctional selenocysteine lyase/cysteine desulfurase domain (aa 92-275 of SEQ ID NO: 3).

Glyma.20G092000

Glyma.20G092000 (SEQ ID NO: 6) is detected in all tissues and organs, with the highest expression level in seeds (grains) (FIG. 9). The expression level is the highest in the LM stage of the grain. Glyma.20G092000 comprises several conserved domains in the retroviral protease superfamily, which includes the pepsin-like aspartic protease of cells and retroviruses, and also has sphingolipid activator-like protein type B, region 1 and region 2 (FIG. 3). Glyma.20G092000 comprises: a Phytepsin domain (aa 76-505 of SEQ ID NO: 6); a Eukaryotic aspartyl protease (ASP) domain (aa 84-506 of SEQ ID NO: 6); an aspartyl protease domain (aa 77-507 of SEQ ID NO: 6); and two Saposin (B) domains (aa 316-351 and aa 380-418 of SEQ ID NO: 6).

Glyma.20G094900

Glyma.20G094900 (SEQ ID NO: 9) is detected in all tissues and organs, with the highest expression level in seeds (grains) (FIG. 9). The expression level is the highest in the LM stage of the grain. Glyma.20G094900 is a protein of unknown function (identified as DUF1336) and appears to belong to the DUF1336 superfamily. This family represents the C-terminus of many pseudoproteins with unknown function (FIG. 6). Glyma.20G094900 comprises a protein enhanced disease resistance 2 (EDR2) C-terminal domain (aa 2-68 of SEQ ID NO: 9).

Glyma.20G092100

Glyma.20G092100 (SEQ ID NO: 12) is detected in all tissues and organs, with the highest expression level in seeds (grains) (FIG. 9). The expression level is the highest in the DS stage of the grain. Glyma.20G092100 comprises several conserved domains matching the PPR repeat family (FIG. 4). Glyma.20G092100 comprises several tetratricopeptide-like (TPR) helical domains (aa 57-253, aa 229-365, and aa 404-461 of SEQ ID NO: 12) and pentatricopeptide repeats (aa 403-429, aa 578-607, aa 438-461, aa 370-398, aa 647-675, aa 194-241, aa 264-313, aa 265-299, aa 540-574, aa 300-334, aa 435-469, aa 644-678, aa 403-434, aa 195-229, aa 230-264, aa 88-122, aa 158-194, aa 335-365, aa 470-504, aa 366-400, aa 123-157, aa 575-609, aa 370-398, aa 648-680, aa 269-301, aa 438-470, aa 233-265, aa 578-610, aa 403-429, aa 578-607, aa 438-461, aa 370-398, aa 647-675, aa 194-241, aa 264-313, aa 265-299, aa 540-574, aa 300-334, aa 435-469, aa 644-678, aa 403-434, aa 195-229, aa 335-365, aa 158-194, aa 88-122, aa 230-264, aa 470-504, aa 366-400, aa 505-539, aa 123-157, aa 575-609, aa 370-398, aa 269-301, aa 648-680, aa 438-470, aa 233-265, and aa 578-610 of SEQ ID NO: 12).

The term “corresponding to” in the context of nucleic acid sequences means that when the nucleic acid sequences of certain sequences are aligned with each other, the nucleic acids that “correspond to” certain enumerated positions in the present invention are those that align with these positions in a reference sequence, but that are not necessarily in these exact numerical positions relative to a particular nucleic acid sequence of the invention. Optimal alignment of sequences for comparison can be conducted by computerized implementations of known algorithms or by visual inspection. Readily available sequence comparison and multiple sequence alignment algorithms are, respectively, the Basic Local Alignment Search Tool (BLAST) and ClustalW/ClustalW2/Clustal Omega programs available on the Internet (e.g., the website of the EMBL-EBI). Other suitable programs include, but are not limited to, GAP, BestFit, Plot Similarity, and FASTA, which are part of the Accelrys GCG Package available from Accelrys, Inc. of San Diego, Calif., United States of America. See also Smith & Waterman, 1981; Needleman & Wunsch, 1970; Pearson & Lipman, 1988; Ausubel et al., 1988; and Sambrook & Russell, 2001.
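
The notion of “corresponding” positions can be illustrated with a small sketch. It assumes the two sequences have already been aligned by some tool and are supplied as equal-length strings with "-" marking gaps; the function name and the toy sequences are hypothetical and are not taken from the sequence listing.

```python
from typing import Dict, Optional


def map_positions(ref_aln: str, qry_aln: str, gap: str = "-") -> Dict[int, Optional[int]]:
    """Map 1-based reference positions to 1-based query positions.

    A reference position that aligns with a gap in the query maps to None.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, Optional[int]] = {}
    ref_pos = 0
    qry_pos = 0
    for r, q in zip(ref_aln, qry_aln):
        if q != gap:
            qry_pos += 1  # advance the query coordinate on every query residue
        if r != gap:
            ref_pos += 1  # advance the reference coordinate on every reference residue
            mapping[ref_pos] = qry_pos if q != gap else None
    return mapping


# Toy alignment: the query has one inserted base and one deleted base.
ref = "ATG-CCGTA"
qry = "ATGACC-TA"
positions = map_positions(ref, qry)
print(positions)
```

In this toy case, reference position 4 corresponds to query position 5 because of the upstream insertion, which is the situation the definition above describes: the aligned residue is not at the same numerical position.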

In some embodiments, variants and fragments of the above-described polynucleotides and polypeptides increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part, or seed.

Fragments of the proteins that increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part, or seed include those that are shorter than the full-length sequences, either due to the use of an alternate downstream start site, or due to processing that produces a shorter protein having the activity. A fragment of a protein that increases protein content, increases oil content, and/or modifies oil profile when expressed in a plant can be a polypeptide that is, for example, 10, 25, 50, 100, 150, 200, 250 or more amino acids in length of any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72. Such biologically active portions can be prepared by recombinant techniques and evaluated for activity of being able to confer increased protein content, increased oil content, and/or modified oil profile. As used herein, a fragment comprises at least 8 contiguous amino acids of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72.
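
The structural part of the fragment definition (at least 8 contiguous amino acids of a reference sequence) can be checked mechanically. The sketch below is illustrative only: the function name is hypothetical, the sequences are invented toy strings rather than any SEQ ID NO, and the check says nothing about whether a fragment retains activity.

```python
def comprises_contiguous_stretch(candidate: str, reference: str, min_len: int = 8) -> bool:
    """Return True if candidate contains at least min_len contiguous residues of reference."""
    if min_len <= 0:
        raise ValueError("min_len must be positive")
    # Slide a window of min_len residues along the reference and look for it in the candidate.
    for start in range(len(reference) - min_len + 1):
        if reference[start:start + min_len] in candidate:
            return True
    return False


# Invented toy sequences, for illustration only.
toy_reference = "MKTAYIAKQRQISFVKSHFSRQ"
has_stretch = comprises_contiguous_stretch("GGAKQRQISFGG", toy_reference)
too_short = comprises_contiguous_stretch("AKQRQIS", toy_reference)
print(has_stretch, too_short)
```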

Variants disclosed herein are polypeptides having an amino acid sequence that has at least 60%, about 65%, about 70%, about 75%, about 80%, about 85%, about 86%, about 87%, about 88%, about 89%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98% or about 99% identity to the amino acid sequence of any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72. Such variants will increase protein content, increase oil content, and/or modify oil profile when expressed in a plant, plant part or seed. In some embodiments, a variant polynucleotide comprises a deletion and/or addition of one or more nucleotides at one or more internal sites within the native polynucleotide and/or a substitution of one or more nucleotides at one or more sites in the native polynucleotide.

Unless otherwise stated, identity and similarity will be calculated by the Needleman-Wunsch global alignment and scoring algorithms (Needleman and Wunsch (1970) J. Mol. Biol. 48 (3): 443-453) as implemented by the "needle" program, distributed as part of the EMBOSS software package (Rice, P., Longden, I., and Bleasby, A., EMBOSS: The European Molecular Biology Open Software Suite, 2000, Trends in Genetics 16 (6), pp. 276-277, version 6.3.1 available from EMBnet at embnet.org/resource/emboss and emboss.sourceforge.net, among other sources) using default gap penalties and scoring matrices (EBLOSUM62 for protein and EDNAFULL for DNA). Equivalent programs may also be used. By "equivalent program" is intended any sequence comparison program that, for any two sequences in question, generates an alignment having identical nucleotide residue matches and an identical percent sequence identity when compared to the corresponding alignment generated by needle from EMBOSS version 6.3.1.
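
For orientation, the general idea of a global alignment followed by a percent identity calculation can be sketched as below. This is a simplified illustration and not an equivalent of EMBOSS needle: it assumes a flat match/mismatch score and a linear gap penalty instead of the EBLOSUM62 or EDNAFULL matrices and the separate gap opening and extension penalties that needle uses, and it assumes identity is reported over the full alignment length. All function names and parameter values are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b with a linear gap penalty; return the two aligned strings."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Percent of alignment columns holding identical residues (gap columns count as non-identical)."""
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


aligned = needleman_wunsch("ACGT", "AGT")
identity = percent_identity(*aligned)
print(aligned, identity)
```

Because the scoring scheme differs from needle's defaults, the percent identity from this sketch may differ from the value needle would report for the same pair of sequences.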

Additional mathematical algorithms are known in the art and can be utilized for the comparison of two sequences. See, for example, the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87: 2264, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90: 5873-5877. Such an algorithm is incorporated into the BLAST programs of Altschul et al. (1990) J. Mol. Biol. 215: 403. BLAST nucleotide searches can be performed with the BLASTN program (nucleotide query searched against nucleotide sequences) to obtain nucleotide sequences homologous to nucleic acid molecules of the invention, or with the BLASTX program (translated nucleotide query searched against protein sequences) to obtain protein sequences homologous to nucleic acid molecules of the invention. BLAST protein searches can be performed with the BLASTP program (protein query searched against protein sequences) to obtain amino acid sequences homologous to protein molecules of the invention, or with the TBLASTN program (protein query searched against translated nucleotide sequences) to obtain nucleotide sequences homologous to protein molecules of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25: 3389. Alternatively, PSI-Blast can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, and PSI-Blast programs, the default parameters of the respective programs (e.g., BLASTX and BLASTN) can be used. Alignment may also be performed manually by inspection.

Two sequences are "optimally aligned" when they are aligned for similarity scoring using a defined amino acid substitution matrix (e.g., BLOSUM62), gap existence penalty and gap extension penalty so as to arrive at the highest score possible for that pair of sequences. Amino acid substitution matrices and their use in quantifying the similarity between two sequences are well-known in the art and described, e.g., in Dayhoff et al. (1978) "A model of evolutionary change in proteins." In "Atlas of Protein Sequence and Structure," Vol. 5, Suppl. 3 (ed. M.O. Dayhoff), pp. 345-352. Natl. Biomed. Res. Found., Washington, D.C. and Henikoff et al. (1992) Proc. Natl. Acad. Sci. USA 89: 10915-10919. The BLOSUM62 matrix is often used as a default scoring substitution matrix in sequence alignment protocols. The gap existence penalty is imposed for the introduction of a single amino acid gap in one of the aligned sequences, and the gap extension penalty is imposed for each additional empty amino acid position inserted into an already opened gap. The alignment is defined by the amino acid positions of each sequence at which the alignment begins and ends, and optionally by the insertion of a gap or multiple gaps in one or both sequences, so as to arrive at the highest possible score. While optimal alignment and scoring can be accomplished manually, the process is facilitated by the use of a computer-implemented alignment algorithm, e.g., gapped BLAST 2.0, described in Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402, and made available to the public at the National Center for Biotechnology Information website (www.ncbi.nlm.nih.gov). Optimal alignments, including multiple alignments, can be prepared using, e.g., PSI-BLAST, available through www.ncbi.nlm.nih.gov and described by Altschul et al. (1997) Nucleic Acids Res. 25: 3389-3402.
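
To make the roles of the gap existence penalty and the gap extension penalty concrete, the Python sketch below scores an already-gapped pairwise alignment. The first position of a gap is charged the existence penalty and each additional position of the same gap is charged the extension penalty. The substitution function is a hypothetical stand-in for a real matrix such as BLOSUM62, and the penalty values of 10 and 0.5 are assumptions chosen for illustration.

```python
from typing import Callable


def score_alignment(aligned_a: str, aligned_b: str,
                    substitution: Callable[[str, str], float],
                    gap_open: float = 10.0, gap_extend: float = 0.5) -> float:
    """Score a gapped pairwise alignment with affine gap penalties."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0.0
    in_gap_a = in_gap_b = False  # is a gap currently open in a / in b?
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if x == "-":
            # Extending an open gap is cheaper than opening a new one.
            total -= gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            total -= gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            total += substitution(x, y)
            in_gap_a = in_gap_b = False
    return total


def toy_substitution(x: str, y: str) -> float:
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 4.0 if x == y else -1.0


# One two-residue gap versus two separate one-residue gaps.
one_long_gap = score_alignment("ACDEF", "AC--F", toy_substitution)
two_short_gaps = score_alignment("ACDEF", "A-D-F", toy_substitution)
```

With these assumed values, a single two-residue gap is penalized less than two separate one-residue gaps, which is the behavior the two-penalty scheme is designed to produce.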

In some embodiments, fragments and variants of the polypeptides disclosed herein each comprise one or more conserved domains of the canonical polypeptide. In some embodiments, the variant or fragment can comprise a polypeptide comprising at least 40%, at least 50%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity to one or more of the conserved domains in the canonical polypeptide sequence.

In one example, a variant or fragment of Glyma.20G092400 (SEQ ID NO: 3) may comprise a selenocysteine lyase/cysteine desulfurase domain (aa 50-437 of SEQ ID NO: 3); a cysteine desulfurase (SufS)-like domain (aa 91-274 of SEQ ID NO: 3); an aminotransferase class-V domain (aa 93-274 of SEQ ID NO: 3); and a bifunctional selenocysteine lyase/cysteine desulfurase domain (aa 92-275 of SEQ ID NO: 3). A variant or fragment of Glyma.20G092400 (SEQ ID NO: 3) can comprise a polypeptide that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to one or more of the conserved domains of Glyma.20G092400 (SEQ ID NO: 3).

In another example, a variant or fragment of Glyma.20G092000 (SEQ ID NO: 6) can comprise a phytepsin domain (aa 76-505 of SEQ ID NO: 6); a eukaryotic aspartyl protease (ASP) domain (aa 84-506 of SEQ ID NO: 6); an aspartyl protease domain (aa 77-507 of SEQ ID NO: 6); and two saposin (B) domains (aa 316-351 and aa 380-418 of SEQ ID NO: 6). A variant or fragment of Glyma.20G092000 (SEQ ID NO: 6) can retain functionality as an aspartic proteinase. A variant or fragment of Glyma.20G092000 (SEQ ID NO: 6) can comprise a polypeptide that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to one or more of the conserved domains of Glyma.20G092000 (SEQ ID NO: 6).

In another example, a variant or fragment of Glyma.20G094900 (SEQ ID NO: 9) can comprise one or more of the conserved domains of a DUF1336 superfamily protein. In some embodiments, the variant or fragment can comprise a protein enhanced disease resistance 2 (EDR2) C-terminal domain (aa 2-68 of SEQ ID NO: 9). A variant or fragment of Glyma.20G094900 (SEQ ID NO: 9) can comprise a polypeptide that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to one or more of the conserved domains of Glyma.20G094900 (SEQ ID NO: 9). A variant or a fragment of Glyma.20G094900 can retain activities similar to EDR2 in regulating pathogen resistance.

In another example, a variant or fragment of Glyma.20G092100 (SEQ ID NO: 12) can comprise one or more of the tetratricopeptide-like (TPR) helical domains (aa 57-253, aa 229-365, and aa 404-461 of SEQ ID NO: 12) and/or one or more of the pentatricopeptide repeats (aa 403-429, aa 578-607, aa 438-461, aa 370-398, aa 647-675, aa 194-241, aa 264-313, aa 265-299, aa 540-574, aa 300-334, aa 435-469, aa 644-678, aa 403-434, aa 195-229, aa 230-264, aa 88-122, aa 158-194, aa 335-365, aa 470-504, aa 366-400, aa 123-157, aa 575-609, aa 370-398, aa 648-680, aa 269-301, aa 438-470, aa 233-265, aa 578-610, aa 403-429, aa 578-607, aa 438-461, aa 370-398, aa 647-675, aa 194-241, aa 264-313, aa 265-299, aa 540-574, aa 300-334, aa 435-469, aa 644-678, aa 403-434, aa 195-229, aa 335-365, aa 158-194, aa 88-122, aa 230-264, aa 470-504, aa 366-400, aa 505-539, aa 123-157, aa 575-609, aa 370-398, aa 269-301, aa 648-680, aa 438-470, aa 233-265, and aa 578-610 of SEQ ID NO: 12). A variant or fragment of Glyma.20G092100 (SEQ ID NO: 12) can comprise a polypeptide that is at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identical to one or more of the conserved domains of Glyma.20G092100 (SEQ ID NO: 12). A variant or fragment of Glyma.20G092100 (SEQ ID NO: 12) can retain activities similar to TPR in mediating protein-protein interactions and the assembly of multiple protein complexes.

As indicated, fragments and variants of the polypeptides disclosed herein will retain the activity of conferring increased protein content, increased oil content, and/or modified oil profile to a plant expressing the polypeptide. Such increase in protein content and/or oil content can comprise any statistically significant increase, including, for example, an increase of about 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 85%, 90%, 95% or greater relative to a control. Methods of determining protein content or oil content are further described below.
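
As a worked illustration of an increase "relative to a control," the Python sketch below computes the relative percent increase of a mean test measurement over a mean control measurement. The replicate values are invented for illustration and are not data from this disclosure, and the sketch does not itself assess statistical significance, which would require an appropriate test on the replicate data.

```python
from statistics import mean


def percent_increase(test_value: float, control_value: float) -> float:
    """Relative increase of test over control, expressed in percent."""
    if control_value <= 0:
        raise ValueError("control_value must be positive")
    return 100.0 * (test_value - control_value) / control_value


# Hypothetical seed protein measurements (% of dry weight), illustration only.
control_replicates = [39.0, 40.0, 41.0]
test_replicates = [43.0, 44.0, 45.0]

increase = percent_increase(mean(test_replicates), mean(control_replicates))
```

Note that this is a relative increase: a change from a control mean of 40 to a test mean of 44 is an increase of 10% relative to the control, although the difference is 4 percentage points.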

In some embodiments, the polypeptides disclosed herein may comprise a heterologous amino acid sequence attached thereto. For example, a polypeptide may have a polypeptide tag or additional protein domain attached thereto. The heterologous amino acid sequence can be attached to the N terminus, the C terminus, or internally within the polypeptide. In some instances, the polypeptide may have one or more polypeptide tags and/or additional protein domains attached thereto at one or more positions of the polypeptide.

In some embodiments, the nucleic acid sequence encoding the polypeptides disclosed herein may comprise a heterologous nucleic acid sequence attached thereto. For example, the heterologous nucleic acid sequence may encode a polypeptide tag or additional protein domain that will be attached to the encoded polypeptide. As another example, the heterologous nucleic acid sequence may encode a regulatory element such as an intron, an enhancer, a promoter, a terminator, etc. The heterologous nucleic acid sequence can be positioned at the 5' end, the 3' end, or in-frame within the coding sequence of the polypeptide. In some instances, the nucleic acid sequence encoding the polypeptides disclosed herein may have one or more heterologous nucleic acid sequences attached thereto at one or more positions of the nucleic acid sequence.

As used herein, "heterologous" in reference to a polypeptide or polynucleotide sequence is a sequence that originates, for example, from a cell or an organism with another genetic background of the same species or from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. As such, heterologous sequences are in a configuration not found in nature. As used herein, a “native” polynucleotide or polypeptide comprises a naturally occurring nucleotide sequence or amino acid sequence, respectively. As such, “heterologous” refers to, when used in reference to a gene or nucleic acid, a gene encoding a factor that is not in its natural environment (i.e., has been altered by the hand of man). For example, a heterologous gene may include a gene from one species introduced into another species. A heterologous gene may also include a gene native to an organism that has been altered in some way (e.g., mutated, added in multiple copies, linked to a non-native promoter or enhancer polynucleotide, etc.). Heterologous genes further may comprise plant gene polynucleotides that comprise cDNA forms of a plant gene; the cDNAs may be expressed in either a sense orientation (to produce mRNA) or an anti-sense orientation (to produce an antisense RNA transcript that is complementary to the mRNA transcript). In one aspect of the invention, heterologous genes are distinguished from endogenous plant genes in that the heterologous gene polynucleotides are joined to polynucleotides comprising regulatory elements such as promoters that are not found naturally associated with the gene for the protein encoded by the heterologous gene or with the plant gene polynucleotide in the chromosome, or are associated with portions of the chromosome not found in nature (e.g., genes expressed in loci where the gene is not normally expressed). Further, in embodiments, a “heterologous” polynucleotide is a polynucleotide not naturally associated with a host cell into which it is introduced, including non-naturally occurring multiple copies of a naturally occurring polynucleotide.

II. Expression cassettes and promoters

Polynucleotides encoding the polypeptides provided herein can be provided in expression cassettes for expression in an organism of interest. The cassette will include 5' and 3' regulatory sequences operably linked to a polynucleotide encoding a polypeptide provided herein that allows for expression of the polynucleotide. The cassette may additionally contain at least one additional gene or genetic element to be co-transformed into the organism. Where additional genes or elements are included, the components are operably linked. Alternatively, the additional gene (s) or element (s) can be provided on multiple expression cassettes. Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory elements or regions. The expression cassette may additionally contain a selectable marker gene.

The expression cassette will include in the 5'-3' direction of transcription, a transcriptional and translational initiation region (i.e., a promoter) , a polynucleotide of the invention, and a transcriptional and translational termination region (i.e., termination region) functional in the organism of interest, i.e., a plant or bacteria. The promoters of the invention are capable of directing or driving transcription and expression of a coding sequence in a host cell. The regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions) may be endogenous or heterologous to the host cell or to each other. As used herein, a chimeric gene or a chimeric nucleic acid molecule comprises a coding sequence operably linked to a transcription initiation region that is heterologous to the coding sequence.
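
The 5'-3' arrangement described above can be pictured as a simple data structure. The Python sketch below is illustrative only: the class names, field names, and example element labels are hypothetical and do not correspond to any particular construct of this disclosure. It records the promoter, coding sequence, and termination region in order and flags a cassette as chimeric when the promoter and the coding sequence derive from different sources.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Element:
    name: str    # descriptive label (hypothetical)
    source: str  # gene or organism of origin (hypothetical label)


@dataclass(frozen=True)
class ExpressionCassette:
    promoter: Element         # transcriptional/translational initiation region
    coding_sequence: Element  # polynucleotide to be expressed
    terminator: Element       # transcriptional/translational termination region

    def ordered_elements(self) -> Tuple[Element, Element, Element]:
        """Elements in the 5'-3' direction of transcription."""
        return (self.promoter, self.coding_sequence, self.terminator)

    def is_chimeric(self) -> bool:
        """True when the promoter is heterologous to the coding sequence."""
        return self.promoter.source != self.coding_sequence.source


# Hypothetical example: a promoter from one gene driving another gene's coding sequence.
cassette = ExpressionCassette(
    promoter=Element("seed-preferred promoter", source="gene A"),
    coding_sequence=Element("polynucleotide of interest", source="gene B"),
    terminator=Element("terminator", source="gene C"),
)
element_order = [e.name for e in cassette.ordered_elements()]
```

A cassette in which the promoter and coding sequence share the same source label would not be flagged as chimeric under this simplified test, although its terminator may still be heterologous.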

A variety of transcriptional terminators are available for use in expression cassettes. These are responsible for the termination of transcription beyond the transgene and correct mRNA polyadenylation. The termination region may be native with the transcriptional initiation region, may be native with the operably linked DNA sequence of interest, may be native with the plant host, or may be derived from another source (i.e., foreign or heterologous to the promoter, the DNA sequence of interest, the plant host, or any combination thereof). Appropriate transcriptional terminators are those that are known to function in plants and include the CaMV 35S terminator, the tml terminator, the nopaline synthase terminator and the pea rbcS E9 terminator. These can be used in both monocotyledons and dicotyledons. In addition, a gene's native transcription terminator may be used. Termination regions used in the expression cassettes can be obtained from, e.g., the Ti-plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262: 141-144; Proudfoot (1991) Cell 64: 671-674; Sanfacon et al. (1991) Genes Dev. 5: 141-149; Mogen et al. (1990) Plant Cell 2: 1261-1272; Munroe et al. (1990) Gene 91: 151-158; Ballas et al. (1989) Nucleic Acids Res. 17: 7891-7903; and Joshi et al. (1987) Nucleic Acids Res. 15: 9627-9639.

Additional regulatory signals include, but are not limited to, transcriptional initiation start sites, operators, activators, enhancers, other regulatory elements, ribosomal binding sites, an initiation codon, termination signals, and the like. See, for example, U.S. Pat. Nos. 5,039,523 and 4,853,331; EPO 0480762A2; Sambrook et al. (1992) Molecular Cloning: A Laboratory Manual, ed. Maniatis et al. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.), hereinafter “Sambrook 11”; Davis et al., eds. (1980).

In preparing the expression cassette, the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions, may be involved.

a. Promoters

A number of promoters can be used in the practice of the invention. The promoters can be selected based on the desired outcome. The nucleic acids can be combined with constitutive, inducible, tissue-preferred, or other promoters for expression in the organism of interest. See, for example, promoters set forth in WO 99/43838 and in US Patent Nos: 8,575,425; 7,790,846; 8,147,856; 8,586,832; 7,772,369; 7,534,939; 6,072,050; 5,659,026; 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142; and 6,177,611; herein incorporated by reference. In some embodiments, the promoter used herein to drive the expression of the polynucleotides provided herein comprises an exogenous promoter. The term “exogenous promoter” refers to a promoter that is not found in plants in nature, for example, a synthetic promoter.

For expression in plants, constitutive promoters can also be used. Non-limiting examples of constitutive promoters include the CaMV 35S promoter (Odell et al. (1985) Nature 313: 810-812); rice actin (McElroy et al. (1990) Plant Cell 2: 163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12: 619-632 and Christensen et al. (1992) Plant Mol. Biol. 18: 675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81: 581-588); and MAS (Velten et al. (1984) EMBO J. 3: 2723-2730). Inducible promoters include those that drive expression of pathogenesis-related proteins (PR proteins), which are induced following infection by a pathogen. See, for example, Redolfi et al. (1983) Neth. J. Plant Pathol. 89: 245-254; Uknes et al. (1992) Plant Cell 4: 645-656; Van Loon (1985) Plant Mol. Virol. 4: 111-116; and WO 99/43819, herein incorporated by reference. Promoters that are expressed locally at or near the site of pathogen infection may also be used (Marineau et al. (1987) Plant Mol. Biol. 9: 335-342; Matton et al. (1989) Molecular Plant-Microbe Interactions 2: 325-331; Somssich et al. (1986) Proc. Natl. Acad. Sci. USA 83: 2427-2430; Somssich et al. (1988) Mol. Gen. Genet. 2: 93-98; Yang (1996) Proc. Natl. Acad. Sci. USA 93: 14972-14977; Chen et al. (1996) Plant J. 10: 955-966; Zhang et al. (1994) Proc. Natl. Acad. Sci. USA 91: 2507-2511; Warner et al. (1993) Plant J. 3: 191-201; Siebertz et al. (1989) Plant Cell 1: 961-968; Cordero et al. (1992) Physiol. Mol. Plant Path. 41: 189-200; U.S. Patent No. 5,750,386 (nematode-inducible); and the references cited therein).

Wound-inducible promoters may be used in the constructions of the invention. Such wound-inducible promoters include the pin II promoter (Ryan (1990) Ann. Rev. Phytopath. 28: 425-449; Duan et al. (1996) Nature Biotechnology 14: 494-498); wun1 and wun2 (U.S. Patent No. 5,428,148); win1 and win2 (Stanford et al. (1989) Mol. Gen. Genet. 215: 200-208); systemin (McGurl et al. (1992) Science 225: 1570-1573); WIP1 (Rohmeier et al. (1993) Plant Mol. Biol. 22: 783-792; Eckelkamp et al. (1993) FEBS Letters 323: 73-76); MPI gene (Cordero et al. (1994) Plant J. 6(2): 141-150); and the like, herein incorporated by reference.

Tissue-preferred promoters for use in the invention include those set forth in Yamamoto et al. (1997) Plant J. 12(2): 255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7): 792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3): 337-343; Russell et al. (1997) Transgenic Res. 6(2): 157-168; Rinehart et al. (1996) Plant Physiol. 112(3): 1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2): 525-535; Canevascini et al. (1996) Plant Physiol. 112(2): 513-524; Yamamoto et al. (1994) Plant Cell Physiol. 35(5): 773-778; Lam (1994) Results Probl. Cell Differ. 20: 181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6): 1129-1138; Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20): 9586-9590; and Guevara-Garcia et al. (1993) Plant J. 4(3): 495-505.

Leaf-preferred promoters include those set forth in Yamamoto et al. (1997) Plant J. 12 (2) : 255-265; Kwon et al. (1994) Plant Physiol. 105: 357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35 (5) : 773-778; Gotor et al. (1993) Plant J. 3: 509-18; Orozco et al. (1993) Plant Mol. Biol. 23 (6) : 1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90 (20) : 9586-9590.

Root-preferred promoters are known and include those in Hire et al. (1992) Plant Mol. Biol. 20(2): 207-218 (soybean root-specific glutamine synthetase gene); Keller and Baumgartner (1991) Plant Cell 3(10): 1051-1061 (root-specific control element); Sanger et al. (1990) Plant Mol. Biol. 14(3): 433-443 (mannopine synthase (MAS) gene of Agrobacterium tumefaciens); Miao et al. (1991) Plant Cell 3(1): 11-22 (cytosolic glutamine synthetase (GS)); Bogusz et al. (1990) Plant Cell 2(7): 633-641; Leach and Aoyagi (1991) Plant Science (Limerick) 79(1): 69-76 (rolC and rolD); Teeri et al. (1989) EMBO J. 8(2): 343-350; Kuster et al. (1995) Plant Mol. Biol. 29(4): 759-772 (the VfENOD-GRP3 gene promoter); and Capana et al. (1994) Plant Mol. Biol. 25(4): 681-691 (rolB promoter). See also U.S. Patent Nos. 5,837,876; 5,750,386; 5,633,363; 5,459,252; 5,401,836; 5,110,732; and 5,023,179.

"Seed-preferred" promoters include both "seed-specific" promoters (those promoters active during seed development such as promoters of seed storage proteins) as well as "seed-germinating" promoters (those promoters active during seed germination). See Thompson et al. (1989) BioEssays 10: 108. Seed-preferred promoters include, but are not limited to, Cim1 (cytokinin-induced message); cZ19B1 (maize 19 kDa zein); and mi1ps (myo-inositol-1-phosphate synthase) (see WO 00/11177 and U.S. Patent No. 6,225,529). Gamma-zein is an endosperm-specific promoter. Globulin 1 (Glb-1) is a representative embryo-specific promoter. For dicots, seed-specific promoters include, but are not limited to, bean β-phaseolin, napin, β-conglycinin, soybean lectin, cruciferin, and the like. For monocots, seed-specific promoters include, but are not limited to, maize 15 kDa zein, 22 kDa zein, 27 kDa zein, gamma-zein, waxy, shrunken 1, shrunken 2, Globulin 1, etc. See also WO 00/12733, where seed-preferred promoters from end1 and end2 genes are disclosed.

In specific embodiments, the polynucleotides or variants thereof provided herein are not expressed using a root-specific promoter. In further embodiments, the polynucleotides or variants thereof provided herein are not expressed with the RCc3 root-specific promoter (see US 20130139280).

For expression in a bacterial host, promoters that function in bacteria are well-known in the art. Such promoters include any of the known crystal protein gene promoters, including the promoters of any of the proteins of the invention, and promoters specific for B. thuringiensis sigma factors. Alternatively, mutagenized, or recombinant crystal protein-encoding gene promoters may be recombinantly engineered and used to promote expression of the novel gene segments disclosed herein.

A number of non-translated leader sequences derived from viruses are also known to enhance expression, and these are particularly effective in dicotyledonous cells. The expression cassette may comprise one or more of such leader sequences. Specifically, leader sequences from tobacco mosaic virus (TMV, the “Ω-sequence”), maize chlorotic mottle virus (MCMV), and alfalfa mosaic virus (AMV) have been shown to be effective in enhancing expression (e.g., Gallie et al. Nucl. Acids Res. 15: 8693-8711 (1987); Skuzeski et al. Plant Molec. Biol. 15: 65-79 (1990)). Other leader sequences known in the art include but are not limited to: picornavirus leaders, for example, EMCV leader (encephalomyocarditis 5' noncoding region) (Elroy-Stein, O., Fuerst, T.R., and Moss, B. PNAS USA 86: 6126-6130 (1989)); potyvirus leaders, for example, tobacco etch virus (TEV) leader (Allison et al., Virology 154: 9-20 (1986)) and maize dwarf mosaic virus (MDMV) leader; human immunoglobulin heavy-chain binding protein (BiP) leader (Macejak, D.G., and Sarnow, P., Nature 353: 90-94 (1991)); untranslated leader from the coat protein mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling, S.A., and Gehrke, L., Nature 325: 622-625 (1987)); tobacco mosaic virus leader (TMV) (Gallie, D. R. et al., Molecular Biology of RNA, 237-256 (1989)); and maize chlorotic mottle virus leader (MCMV) (Lommel, S.A. et al., Virology 81: 382-385 (1991)). See also Della-Cioppa et al., Plant Physiology 84: 965-968 (1987).

The expression cassette can also comprise a selectable marker gene for the selection of transformed cells. Selectable marker genes are utilized for the selection of transformed cells or tissues. Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin, 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS), spectinomycin, or acetolactate synthase (ALS). Selection markers used routinely in transformation include the nptII gene, which confers resistance to kanamycin and related antibiotics (Messing & Vierra Gene 19: 259-268 (1982); Bevan et al., Nature 304: 184-187 (1983)); the pat and bar genes, which confer resistance to the herbicide glufosinate (also called phosphinothricin; see White et al., Nucl. Acids Res 18: 1062 (1990), Spencer et al. Theor. Appl. Genet 79: 625-631 (1990) and U.S. Patent Nos. 5,561,236 and 5,276,268); the hph gene, which confers resistance to the antibiotic hygromycin (Blochinger & Diggelmann, Mol. Cell Biol. 4: 2929-2931); the dhfr gene, which confers resistance to methotrexate (Bourouis et al., EMBO J. 2(7): 1099-1104 (1983)); the EPSPS gene, which confers resistance to glyphosate (U.S. Patent Nos. 4,940,935 and 5,188,642); the glyphosate N-acetyltransferase (GAT) gene, which also confers resistance to glyphosate (Castle et al. (2004) Science 304: 1151-1154; U.S. Patent App. Pub. Nos. 20070004912, 20050246798, and 20050060767); and the mannose-6-phosphate isomerase gene, which provides the ability to metabolize mannose (U.S. Patent Nos. 5,767,378 and 5,994,629).

b. Native promoters

In some embodiments, the promoter used herein to drive the expression of the polynucleotides provided herein comprises a native promoter or an active variant or fragment thereof. For purpose of this disclosure, the term “native promoter,” used interchangeably with the term “endogenous promoter,” refers to a promoter that is found in plants in nature. An active variant or fragment of a native promoter refers to a promoter sequence that has one or more nucleotide substitutions, deletions, or insertions and that can drive expression of an operably linked polynucleotide sequence under conditions similar to those under which the native promoter is active. Such active variants or fragments may be created by site-directed mutagenesis or induced mutation, or may occur as allelic variants (polymorphisms). In some embodiments, the native promoter comprises a polynucleotide having the sequence of SEQ ID NO: 58. In some embodiments, disclosed herein is a construct comprising a native promoter (e.g., a native promoter comprising SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70) or its active variant or fragment operably linked to a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70, or a fragment or variant of any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70 (e.g., having at least 85%, at least 90%, at least 95%, at least 98%, or at least 99% identity); and when introduced into a plant, the construct confers increased protein content, increased oil content, and/or modified oil profile. In some embodiments, the native promoter is a heterologous promoter to the polynucleotide.

Also provided herein is a plant, a plant cell, or a plant part (e.g., a plant seed) comprising the construct described above. In some embodiments, the polynucleotide encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to at least one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to SEQ ID NO: 2, 5, 8, 11, 14, 44, 47, 50, 53, 56, 59, 62, 65, 68 or 71. In some embodiments, the polynucleotide comprises at least one, at least two, at least three, at least four, at least five, or at least six mutations as compared to a polynucleotide encoding any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70. In some embodiments, the plant is a dicot plant. In some embodiments, the plant is a monocot plant. In some embodiments, the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane. In some embodiments, the plant is a soybean plant. In some embodiments, the plant is an elite soybean plant.

Also provided herein is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising introducing into the genome of the plant a nucleic acid sequence operably linked to a promoter comprising SEQ ID NO: 20 or an active variant or fragment thereof, where the nucleic acid sequence encodes a polypeptide having an amino acid sequence comprising at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, or at least 99% identity to at least one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72. In some embodiments, the nucleic acid sequence encodes a polypeptide having an amino acid sequence set forth in SEQ ID NO: 2, 5, 8, 11, 14, 44, 47, 50, 53, 56, 59, 62, 65, 68 or 71.

III. Plants, plant cells and plant parts

In the plants provided herein, the polynucleotide as described in Section I of this disclosure is a heterologous nucleic acid sequence in the genome of the plant. As used herein, the term “heterologous” in the context of a chromosomal segment refers to one or more DNA sequences (e.g., genetic loci) in a configuration in which they are not found in nature, for example as a result of a recombination event between homologous chromosomes during meiosis, or for example as a result of introduction of a transgenic sequence, or for example as a result of modification through gene editing.

Although soybean plants are used to exemplify the composition and methods throughout the application, a polynucleotide as provided herein may be introduced to any plant species, including, but not limited to, monocots and dicots. Examples of plants of interest include, but are not limited to, corn (maize) , sorghum, wheat, sunflower, tomato, crucifers, peppers, potato, cotton, rice, soybean, sugarbeet, sugarcane, tobacco, barley, and oilseed rape, Brassica sp., alfalfa, rye, millet, safflower, peanuts, sweet potato, cassava, coffee, coconut, pineapple, citrus trees, cocoa, tea, banana, avocado, fig, guava, mango, olive, papaya, cashew, macadamia, almond, oats, vegetables, ornamentals, and conifers.

Glycine (soybean or soya bean) is a genus in the bean family Fabaceae. The Glycine plants can be Glycine arenaria, Glycine argyrea, Glycine cyrtoloba, Glycine canescens, Glycine clandestina, Glycine curvata, Glycine falcata, Glycine latifolia, Glycine microphylla, Glycine pescadrensis, Glycine stenophita, Glycine syndetica, Glycine soja Sieb. et Zucc., Glycine max (L.) Merrill., Glycine tabacina, or Glycine tomentella.

In some embodiments, the plants provided herein are elite plants or derived from an elite line.

As used herein, an “elite line” is an agronomically superior line that has resulted from many cycles of breeding and selection for superior agronomic performance. Numerous elite lines are available and known to those of skill in the art of soybean breeding. An “elite population” is an assortment of elite individuals or lines that can be used to represent the state of the art in terms of agronomically superior genotypes of a given crop species, such as soybean. Similarly, an “elite germplasm” or elite strain of germplasm is an agronomically superior germplasm, typically derived from, and/or capable of giving rise to, a plant with superior agronomic performance, such as an existing or newly developed elite line of soybean.

An “elite” plant is any plant from an elite line, such that an elite plant is a representative plant from an elite variety. In some embodiments, the soybean plant comprising a polynucleotide encoding any one of the polypeptides disclosed herein is an elite soybean plant. Non-limiting examples of elite soybean varieties that are commercially available to farmers or soybean breeders include: AG00802, A0868, AG0902, A1923, AG2403, A2824, A3704, A4324, A5404, AG5903, AG6202, AG0934, AG1435, AG2031, AG2035, AG2433, AG2733, AG2933, AG3334, AG3832, AG4135, AG4632, AG4934, AG5831, AG6534, and AG7231 (Asgrow Seeds, Des Moines, Iowa, USA); BPR0144RR, BPR 4077NRR and BPR 4390NRR (Bio Plant Research, Camp Point, Ill., USA); DKB 17-51 and DKB37-51 (DeKalb Genetics, DeKalb, Ill., USA); DP 4546 RR and DP 7870 RR (Delta & Pine Land Company, Lubbock, Tex., USA); JG 03R501, JG 32R606C ADD and JG 55R503C (JGL Inc., Greencastle, Ind., USA); NKS 13-K2 (NK Division of Syngenta Seeds, Golden Valley, Minnesota, USA); 90M01, 91M30, 92M33, 93M11, 94M30, 95M30, 97B52, P008T22R2, P16T17R2, P22T69R, P25T51R, P34T07R2, P35T58R, P39T67R, P47T36R, P46T21R, and P56T03R2 (Pioneer Hi-Bred International, Johnston, Iowa, USA); SG4771NRR and SG5161NRR/STS (Soygenetics, LLC, Lafayette, Ind., USA); S00-K5, S11-L2, S28-Y2, S43-B1, S53-A1, S76-L9, S78-G6, S0009-M2, S007-Y4, S04-D3, S14-A6, S20-T6, S21-M7, S26-P3, S28-N6, S30-V6, S35-C3, S36-Y6, S39-C4, S47-K5, S48-D9, S52-Y2, S58-Z4, S67-R6, and S73-S8 (Syngenta Seeds, Henderson, Ky., USA); Richer (Northstar Seed Ltd. Alberta, CA); 14RD62 (Stine Seed Co. Ia., USA); or Armor 4744 (Armor Seed, LLC, Ar., USA).

In some embodiments, the plants provided herein can comprise one or more additional polynucleotides that encode an additional polypeptide that can confer a phenotype of increased protein content, increased oil content, or modified oil profile on a plant. In some embodiments, the additional polynucleotide encodes a polypeptide having the sequence of any one of SEQ ID NO: 3, 6, 9, 11, or 15. The additional polynucleotide can be introduced using approaches similar to those disclosed above, e.g., by transgenic means, by breeding, or by genome editing.

In specific embodiments, the plants, plant parts or seeds having the heterologous polynucleotide or polypeptide disclosed herein, or active variants and fragments thereof, can have a modified level of expression of the polynucleotide or polypeptide (i.e., an increase or a decrease in expression level). In other embodiments, the plants, plant parts or seeds having the heterologous polynucleotide or polypeptide disclosed herein, or active variants and fragments thereof, can have a modified level of activity of the polypeptide (i.e., an increase or a decrease in activity level). Methods to generate such modified levels of expression or activity are disclosed elsewhere herein and include, but are not limited to, breeding, gene editing, and transgenic techniques.

Plants produced as described above can be propagated to produce progeny plants, and the progeny plants that have stably incorporated into their genomes a polynucleotide conferring increased protein content, increased oil content, and/or modified oil profile can be selected and can be further propagated if desired. The term “progeny” refers to the descendant(s) of a particular cross. Typically, progeny result from breeding of two individuals, although some species (particularly some plants and hermaphroditic animals) can be selfed (i.e., the same plant acts as the donor of both male and female gametes). The descendant(s) can be, for example, of the F1, the F2, or any subsequent generation.

In some embodiments, a plant cell, seed, or plant part or harvest product can be obtained from the plant produced as above and the plant cell, seed, or plant part can be screened using methods disclosed above for the evidence of stable incorporation of the polynucleotide. The term “stable incorporation” refers to the integration of a nucleic acid sequence into the genome of a plant and said nucleic acid sequence is capable of being inherited by the progeny thereof. As used herein, the term “plant part” indicates a part of a plant, including single cells and cell tissues such as plant cells that are intact in plants, cell clumps and tissue cultures from which plants can be regenerated. Examples of plant parts include, but are not limited to, single cells and tissues from pollen, ovules, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, and seeds; as well as pollen, ovules, egg cells, zygotes, leaves, embryos, roots, root tips, anthers, flowers, flower parts, fruits, stems, shoots, cuttings, scions, rootstocks, seeds, protoplasts, calli, and the like.

In some embodiments, plant products can be harvested from the plant disclosed above and processed to produce processed products, such as flour, soy meal, oil, starch, and the like. These processed products are also within the scope of this invention provided that they comprise a polynucleotide or polypeptide or variant thereof disclosed herein. Other soybean plant products include, but are not limited to, protein concentrate, protein isolate, soybean hulls, meal, flour, oil and the whole soybean itself.

IV. Methods for producing a plant variety that has increased protein content, increased oil content, and/or modified oil profile

Provided herein are methods of producing a plant that has increased protein content, increased oil content, and/or modified oil profile by introducing a nucleic acid sequence encoding a polypeptide as provided herein. A nucleic acid sequence may be introduced to a plant cell by various ways, for example, by transformation, by genome modification techniques (such as by genome editing) , or by breeding. In one aspect, the plant can be produced by transforming the nucleic acid sequence encoding a polypeptide disclosed above into a recipient plant. In one aspect, the method can comprise editing the genome of the recipient plant so that the resulting plant comprises a polynucleotide encoding a polypeptide disclosed above. In yet another aspect, the method can comprise increasing the expression level and/or activity of the above-mentioned proteins in a recipient plant, for example, by enhancing promoter activity or replacing the endogenous promoter with a stronger promoter. In another aspect, the method can comprise breeding a donor plant comprising a polynucleotide as described above with a recipient plant and selecting for incorporation of the polynucleotide into the recipient plant genome.

1. Transgenic means

In some embodiments, the method comprises transforming a polynucleotide disclosed herein or an active variant or fragment thereof into a recipient plant to obtain a transgenic plant, and said transgenic plant has increased protein content, increased oil content, and/or modified oil profile. Expression cassettes comprising polynucleotides encoding the polypeptides as described above can be used to transform plants of interest.

As used herein, the term “transgenic” and grammatical variations thereof refer to a plant, including any part derived from the plant, such as a cell, tissue or organ, in which a heterologous nucleic acid is integrated into the genome. In specific embodiments, the heterologous nucleic acid is a recombinant construct, vector or expression cassette comprising one or more nucleic acids. In other embodiments, a transgenic plant is produced by a genetic engineering method, such as Agrobacterium transformation. Through gene technology, the heterologous nucleic acid is stably integrated into chromosomes, so that the next generation can also be transgenic. As used herein, “transgenic” and grammatical variations thereof also encompass biological treatments, which include plant hybridization and/or natural recombination.

Transformation results in a transformed plant, including whole plants, as well as plant organs (e.g., leaves, stems, roots, etc. ) , seeds, plant cells, propagules, embryos and progeny of the same. Plant cells can be differentiated or undifferentiated (e.g., callus, suspension culture cells, protoplasts, leaf cells, root cells, phloem cells, pollen) . Transformation may result in stable or transient incorporation of the nucleic acid into the cell. "Stable transformation" is intended to mean that the nucleotide construct introduced into a host cell integrates into the genome of the host cell and is capable of being inherited by the progeny thereof. "Transient transformation" is intended to mean that a polynucleotide is introduced into the host cell and does not integrate into the genome of the host cell.

Methods for transformation typically involve introducing a nucleotide construct into a plant. In some embodiments, the transformation method is an Agrobacterium-mediated transformation. In some embodiments, the transformation method is a biolistic-mediated transformation. Transformation may also be performed by infection, transfection, microinjection, electroporation, microprojection, biolistics or particle bombardment, silica/carbon fibers, ultrasound mediated, PEG mediated, calcium phosphate co-precipitation, polycation DMSO technique, DEAE dextran procedure, Agrobacterium and viral mediated (e.g., Caulimoviruses, Geminiviruses, RNA plant viruses), liposome mediated and the like.

Transformation protocols as well as protocols for introducing polypeptides or polynucleotide sequences into plants may vary depending on the type of plant or plant cell, i.e., monocot or dicot, targeted for transformation. Methods for transformation are known in the art and include those set forth in US Patent Nos: 8,575,425; 7,692,068; 8,802,934; and 7,541,517; each of which is herein incorporated by reference. See, also, Rakoczy-Trojanowska, M. (2002) Cell Mol Biol Lett. 7: 849-858; Jones et al. (2005) Plant Methods, Vol. 1, Article 5; Rivera et al. (2012) Physics of Life Reviews 9: 308-345; Bartlett et al. (2008) Plant Methods 4: 1-12; Bates, G.W. (1999) Methods in Molecular Biology 111: 359-366; Binns and Thomashow (1988) Annual Reviews in Microbiology 42: 575-606; Christou, P. (1992) The Plant Journal 2: 275-281; Christou, P. (1995) Euphytica 85: 13-27; Tzfira et al. (2004) TRENDS in Genetics 20: 375-383; Yao et al. (2006) Journal of Experimental Botany 57: 3737-3746; Zupan and Zambryski (1995) Plant Physiology 107: 1041-1047.

Methods for transformation of chloroplasts are known in the art. See, for example, Svab et al. (1990) Proc. Natl. Acad. Sci. USA 87 (21) : 8526-8530; Svab and Maliga (1993) Proc. Natl. Acad. Sci. USA 90 (3) : 913-917; Staub and Maliga (1993) EMBO J. 12 (2) : 601-606. The method relies on particle gun delivery of DNA containing a selectable marker and targeting of the DNA to the plastid genome through homologous recombination. Additionally, plastid transformation can be accomplished by transactivation of a silent plastid-borne transgene by tissue-preferred expression of a nuclear-encoded and plastid-directed RNA polymerase. Such a system has been reported in McBride et al. (1994) Proc. Natl. Acad. Sci. USA 91 (15) : 7301-7305.

The cells that have been transformed may be grown into plants in accordance with conventional ways. See, for example, McCormick et al. (1986) Plant Cell Reports 5: 81-84. These plants may then be grown, and either pollinated with the same transformed strain or different strains, and the resulting hybrid having constitutive expression of the desired phenotypic characteristic identified. Two or more generations may be grown to ensure that expression of the desired phenotypic characteristic is stably maintained and inherited and then seeds harvested to ensure expression of the desired phenotypic characteristic has been achieved. In this manner, the present invention provides transformed seed (also referred to as "transgenic seed" ) having a nucleotide construct of the invention, for example, an expression cassette of the invention, stably incorporated into their genome.

2. Crossing

In some embodiments, the method comprises crossing a donor plant comprising a polynucleotide encoding a polypeptide disclosed herein with a recipient plant, and the polypeptide is able to confer increased protein content, increased oil content, and/or modified oil profile in the recipient plant. As used herein, the terms “crossing” and “breeding” refer to the fusion of gametes to produce progeny (e.g., by fertilization, such as to produce seed by pollination in plants) . In some embodiments, a “cross, ” “breeding, ” or “cross-fertilization” is fertilization of one individual by another (e.g., cross-pollination in plants) . The plant disclosed herein may be a whole plant, or may be a plant cell, seed, or tissue, or a plant part such as leaf, stem, pollen, or cell that can be cultivated into a whole plant.

In some embodiments, a progeny plant created by the crossing or breeding process is repeatedly crossed back to one of its parents through a process referred to herein as “backcrossing” . In a backcrossing scheme, the “donor” parent refers to the parental plant with the desired gene or locus to be introgressed. The “recipient” parent (used one or more times) or “recurrent” parent (used two or more times) refers to the parental plant into which the gene or locus is being introgressed. For example, see Ragot, M. et al. Marker-assisted Backcrossing: A Practical Example, in Techniques et Utilisations des Marqueurs Moleculaires Les Colloques, Vol. 72, pp. 45-56 (1995) ; and Openshaw et al., Marker-assisted Selection in Backcross Breeding, in Proceedings of the Symposium “Analysis of Molecular Marker Data, ” pp. 41-43 (1994) . The initial cross gives rise to the F1 generation. The term “BC1” refers to the second use of the recurrent parent, “BC2” refers to the third use of the recurrent parent, and so on.
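The effect of repeated backcrossing on genome composition can be illustrated with a short calculation. The sketch below uses the standard textbook expectation that, absent selection and averaged over unlinked loci, the recurrent-parent share of the genome is 1/2 in the F1 and 1 - (1/2)^(n+1) in generation BCn. This expectation and the function name are illustrative assumptions and are not taken from this disclosure; marker-assisted selection would typically shift the realized values.

```python
from fractions import Fraction


def expected_recurrent_fraction(generation: str) -> Fraction:
    """Expected average recurrent-parent genome share for 'F1' or 'BCn'.

    Assumes no selection and unlinked loci (a textbook simplification).
    """
    if generation == "F1":
        # The initial cross: half of the genome comes from each parent.
        return Fraction(1, 2)
    if generation.startswith("BC") and generation[2:].isdigit():
        n = int(generation[2:])
        if n < 1:
            raise ValueError("backcross generation must be BC1 or later")
        # Each backcross halves the remaining donor share.
        return 1 - Fraction(1, 2) ** (n + 1)
    raise ValueError(f"unrecognized generation label: {generation!r}")


if __name__ == "__main__":
    for label in ("F1", "BC1", "BC2", "BC3", "BC4"):
        share = expected_recurrent_fraction(label)
        print(f"{label}: {share} ({float(share):.4f})")
```

Under these assumptions, BC1 is expected to carry on average three quarters of the recurrent-parent genome and BC3 fifteen sixteenths, which is one reason several backcross generations are commonly used before selfing.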

In some embodiments, the donor soybean plant is a Glycine max plant. In some embodiments, the donor soybean plant is a Glycine soja plant. In some embodiments, the recipient soybean plant is an elite Glycine max plant or an elite Glycine soja plant. In some embodiments, the donor plant is from soy variety Suinong 14 (SN14) . In some embodiments, the donor plant is soy variety Glycine soja ZYD0006.

3. Gene editing

In some embodiments, the polynucleotide sequences provided herein can be targeted to specific sites within the genome of a recipient plant cell. Such methods include, but are not limited to, meganucleases designed against the plant genomic sequence of interest; CRISPR-Cas9, TALENs, and other technologies for precise editing of genomes (Feng et al. (2013) Cell Research 23: 1229-1232; WO 2013/026740); Cre-lox site-specific recombination; FLP-FRT recombination (Li et al. (2009) Plant Physiol 151: 1087-1095); Bxb1-mediated integration (Yau et al. (2011) Plant J 701: 147-166); zinc-finger mediated integration (Wright et al. (2005) Plant J 44: 693-705; Cai et al. (2009) Plant Mol Biol 69: 699-709); homologous recombination (Lieberman-Lazarovich and Levy (2011) Methods Mol Biol : 51-65); prime editing and transposases (Anzalone, A. et al. (2020) Nat Biotechnol 38 (7): 824-844); translocation; and inversion.

Various embodiments of the methods described herein use gene editing. In some embodiments, gene editing is used to mutagenize the genome of a plant to produce plants having one or more of the polypeptides that are able to confer increased protein content, increased oil content, and/or modified oil profile.

In some embodiments, provided herein are plants transformed with and expressing gene-editing machinery as described above, which, when crossed with a target plant, result in gene editing in the target plant.

In general, gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems. Gene editing may involve genomic integration or episomal presence of the gene editing components or systems.

Gene editing generally refers to the use of a site-directed nuclease (including but not limited to CRISPR/Cas, zinc fingers, meganucleases, and the like) to cut a nucleotide sequence at a desired location. This may be to cause an insertion/deletion ( “indel” ) mutation, (i.e., “SDN1” ) , a base edit (i.e., “SDN2” ) , or allele insertion or replacement (i.e., “SDN3” ) . SDN2 or SDN3 gene editing may comprise the provision of one or more recombination templates (e.g., in a vector) comprising a gene sequence of interest that can be used for homology directed repair (HDR) within the plant (i.e., to be introduced into the plant genome) . In some embodiments, the gene or allele of interest is one that is able to confer to the plant an improved trait, e.g., increased protein content, increased oil content, and/or modified oil profile. The recombination template can be introduced into the plant to be edited either through transformation or through breeding with a donor plant comprising the recombination template. Breaks in the plant genome may be introduced within, upstream, and/or downstream of a target sequence. In some embodiments, a double strand DNA break is made within or near the target sequence locus. In some embodiments, breaks are made upstream and downstream of the target sequence locus, which may lead to its excision from the genome. In some embodiments, one or more single strand DNA breaks (nicks) are made within, upstream, and/or downstream of the target sequence (e.g., using a nickase Cas9 variant) . Any of these DNA breaks, as well as those introduced via other methods known to one of skill in the art, may induce HDR. 
Through HDR, the target sequence is replaced by the sequence of the provided recombination template comprising a polynucleotide of interest. For example, a polynucleotide having the sequence of any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67, or 70, or a polynucleotide encoding a polypeptide having the sequence of any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, may be provided on/as a template. By designing the system such that one or more single strand or double strand breaks are introduced within, upstream, and/or downstream of the corresponding region in the genome of a plant not comprising the gene sequence of interest, this region can be replaced with the template. In some embodiments, the polynucleotide of interest is operably linked to a promoter, and expression of the polynucleotide of interest under the control of the promoter confers increased protein content, increased oil content, and/or modified oil profile on the plant. In some embodiments, the promoter is a native promoter, or an active variant or fragment thereof as described above. In some embodiments, the native promoter comprises SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70.

In some embodiments, mutations in the genes of interest described herein may be generated without the use of a recombination template via targeted introduction of DNA double strand breaks. Such breaks may be repaired through the process of non-homologous end joining (NHEJ) , which can result in the generation of small insertions or deletions (indels) at the repair site. Such indels may lead to frameshift mutations causing premature stop codons or other types of loss-of-function mutations in the targeted genes.
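The link between indel length and frameshift can be made concrete with a toy example. The sketch below is illustrative only: the 15-nucleotide coding sequence is invented for the demonstration and is not any sequence of this disclosure, and the helper names are hypothetical. It shows that a 1-bp deletion shifts the reading frame and exposes a premature stop codon, whereas a 3-bp deletion removes one codon and leaves the frame intact.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    if indel_length == 0:
        raise ValueError("indel length must be non-zero")
    return abs(indel_length) % 3 != 0


def apply_deletion(cds: str, start: int, length: int) -> str:
    """Return the coding sequence with `length` bases removed at `start`."""
    return cds[:start] + cds[start + length:]


def first_stop_codon_index(cds: str):
    """Index (in codons) of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


if __name__ == "__main__":
    toy_cds = "ATGGCACTGACCTAA"  # invented: ATG GCA CTG ACC TAA
    one_bp = apply_deletion(toy_cds, 3, 1)    # ATG CAC TGA CCT AA
    three_bp = apply_deletion(toy_cds, 3, 3)  # ATG CTG ACC TAA
    print(first_stop_codon_index(toy_cds))   # stop is the 5th codon (index 4)
    print(first_stop_codon_index(one_bp))    # premature stop at index 2
    print(first_stop_codon_index(three_bp))  # stop remains last codon (index 3)
```

In practice the outcome of NHEJ repair is not predetermined, so edited plants are typically genotyped to determine which indel was actually generated.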

In some embodiments, gene editing may involve transient, inducible, or constitutive expression of the gene editing components or systems in the target plant. Gene editing may also involve genomic integration or episomal presence of the gene editing components or systems in the target plant.

In certain embodiments, the nucleic acid modification or mutation is effected by a (modified) zinc-finger nuclease (ZFN) system. The ZFN system uses artificial restriction enzymes generated by fusing a zinc finger DNA-binding domain to a DNA-cleavage domain that can be engineered to target desired DNA sequences. Exemplary methods of genome editing using ZFNs can be found for example in U.S. Patent Nos. 6,534,261; 6,607,882; 6,746,838; 6,794,136; 6,824,978; 6,866,997; 6,933,113; and 6,979,539.

In certain embodiments, the nucleic acid modification is effected by a (modified) meganuclease. Meganucleases are endodeoxyribonucleases characterized by a large recognition site (double-stranded DNA sequences of 12 to 40 base pairs). Exemplary methods for using meganucleases can be found in US Patent Nos: 8,163,514; 8,133,697; 8,021,867; 8,119,361; 8,119,381; 8,124,369; and 8,129,134, which are specifically incorporated by reference.
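A rough calculation illustrates why a large recognition site is associated with high specificity. The sketch below estimates how many times a site of a given length would be expected to occur by chance on one strand of a random sequence with equal base frequencies. The uniform-composition model and the genome size of about 1.1 Gb used for soybean are assumptions made for illustration and are not taken from this disclosure.

```python
from fractions import Fraction


def expected_chance_sites(genome_bp: int, site_len: int) -> Fraction:
    """Expected chance occurrences of one specific site of `site_len` bp.

    Assumes equal, independent base frequencies and counts a single strand.
    """
    if genome_bp < 0 or site_len < 1:
        raise ValueError("genome_bp must be >= 0 and site_len >= 1")
    return Fraction(genome_bp, 4 ** site_len)


if __name__ == "__main__":
    ASSUMED_GENOME_BP = 1_100_000_000  # assumption: approximate soybean genome size
    for n in (6, 12, 18, 40):
        hits = float(expected_chance_sites(ASSUMED_GENOME_BP, n))
        print(f"{n:>2}-bp site: ~{hits:.3g} expected chance occurrences")
```

Under this simplified model, a 6-bp site would be expected to recur many thousands of times in such a genome, while sites toward the upper end of the 12 to 40 base pair range would be expected to occur by chance far less than once.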

In certain embodiments, the nucleic acid modification is effected by a (modified) CRISPR/Cas complex or system. In certain embodiments, the CRISPR/Cas system or complex is a class 2 CRISPR/Cas system. In certain embodiments, said CRISPR/Cas system or complex is a type II, type V, or type VI CRISPR/Cas system or complex. The CRISPR/Cas system does not require the generation of customized proteins to target specific sequences but rather a single Cas protein can be programmed by an RNA guide (gRNA) to recognize a specific nucleic acid target, in other words the Cas enzyme protein can be recruited to a specific nucleic acid target locus (which may comprise or consist of RNA and/or DNA) of interest using said short RNA guide.

In general, the CRISPR/Cas or CRISPR system, as used herein, refers collectively to transcripts and other elements involved in the expression of or directing the activity of CRISPR-associated (“Cas”) genes, including sequences encoding a Cas gene and one or more of: a tracr (trans-activating CRISPR) sequence (e.g., tracrRNA or an active partial tracrRNA), a tracr-mate sequence (encompassing a “direct repeat” and a tracrRNA-processed partial direct repeat in the context of an endogenous CRISPR system), a guide sequence (also referred to as a “spacer” in the context of an endogenous CRISPR system), or “RNA(s)” as that term is herein used (e.g., RNA(s) to guide Cas, such as Cas9, e.g., CRISPR RNA and, where applicable, transactivating (tracr) RNA or a single guide RNA (sgRNA) (chimeric RNA)), or other sequences and transcripts from a CRISPR locus. In general, a CRISPR system is characterized by elements that promote the formation of a CRISPR complex at the site of a target sequence (also referred to as a protospacer in the context of an endogenous CRISPR system). In the context of formation of a CRISPR complex, “target sequence” refers to a sequence to which a guide sequence is designed to have complementarity, where hybridization between a target sequence and a guide sequence promotes the formation of a CRISPR complex. A target sequence may comprise any polynucleotide, such as DNA or RNA polynucleotides.
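The relationship between a guide sequence and its target sequence can be sketched with a simple candidate-site search. The example below assumes the commonly used SpCas9 convention of a 20-nucleotide protospacer immediately followed by an NGG protospacer adjacent motif (PAM); that convention, the function names, and the toy input are illustrative assumptions and are not specified in this disclosure. Other Cas proteins, such as Cas12a, use different PAM sequences and orientations.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, guide_len: int = 20):
    """List candidate (strand, start, protospacer, pam) tuples.

    Assumes an NGG PAM directly 3' of the protospacer. Start positions on
    the '-' strand are relative to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    toy_target = "A" * 20 + "TGG"  # invented toy sequence
    for site in find_spcas9_sites(toy_target):
        print(site)
```

A search of this kind only lists sequence candidates; guide selection in practice also considers off-target matches elsewhere in the genome and the position of the cut relative to the region to be modified.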

In certain embodiments, the gRNA is a chimeric guide RNA or single guide RNA (sgRNA) . In certain embodiments, the gRNA comprises a guide sequence and a tracr mate sequence (or direct repeat) . In certain embodiments, the gRNA comprises a guide sequence, a tracr mate sequence (or direct repeat) , and a tracr sequence. In certain embodiments, the CRISPR/Cas system or complex as described herein does not comprise and/or does not rely on the presence of a tracr sequence (e.g. if the Cas protein is Cas12a) .

The Cas protein as referred to herein, such as but not limited to Cas9, Cas12a (formerly referred to as Cpf1), Cas12b (formerly referred to as C2c1), Cas13a (formerly referred to as C2c2), C2c3, or Cas13b protein, may originate from any suitable source, and hence may include different orthologues, originating from a variety of (prokaryotic) organisms, as is well documented in the art. In certain embodiments, the Cas protein is (modified) Cas9, preferably (modified) Staphylococcus aureus Cas9 (SaCas9) or (modified) Streptococcus pyogenes Cas9 (SpCas9). In certain embodiments, the Cas protein is Cas12a, optionally from Acidaminococcus sp., such as Acidaminococcus sp. BV3L6 Cpf1 (AsCas12a), or Lachnospiraceae bacterium Cas12a, such as Lachnospiraceae bacterium MA2020 or Lachnospiraceae bacterium MD2006 (LbCas12a). See U.S. Pat. No. 10,669,540, incorporated herein by reference in its entirety. Alternatively, the Cas12a protein may be from Moraxella bovoculi AAX08_00205 [Mb2Cas12a] or Moraxella bovoculi AAX11_00205 [Mb3Cas12a]. See WO 2017/189308, incorporated herein by reference in its entirety. In certain embodiments, the Cas protein is (modified) C2c2, preferably Leptotrichia wadei C2c2 (LwC2c2) or Listeria newyorkensis FSL M6-0635 C2c2 (LbFSLC2c2). In certain embodiments, the (modified) Cas protein is C2c1. In certain embodiments, the (modified) Cas protein is C2c3. In certain embodiments, the (modified) Cas protein is Cas13b. Other Cas enzymes are available to a person skilled in the art.

Gene editing methods and compositions are also disclosed in US Pat. Nos. 10,519,456 and 10,285,348, the entire contents of which are herein incorporated by reference.

The gene-editing machinery (e.g., the DNA modifying enzyme) introduced into the plants can be controlled by any promoter that can drive recombinant gene expression in plants. In some embodiments, the promoter is a constitutive promoter. In some embodiments, the promoter is a tissue-specific promoter, e.g., a pollen-specific promoter or a sperm cell specific promoter, a zygote specific promoter, or a promoter that is highly expressed in sperm, eggs and zygotes (e.g., prOsActin1) . Suitable promoters are disclosed in U.S. Pat. No. 10,519,456, the entire content of which is herein incorporated by reference.

In another aspect, provided herein is a method of editing plant genomic DNA. In some embodiments, the method comprises using a first soybean plant expressing a DNA modification enzyme and at least one optional guide nucleic acid as described above to pollinate a target plant comprising genomic DNA to be edited.

V. Stacking

The various polynucleotides and variants thereof provided herein can be stacked with one or more polynucleotides encoding a desirable trait such as a polynucleotide that confers, for example, insect, disease or herbicide resistance or other desirable agronomic traits of interest including, but not limited to, traits associated with high oil content; increased digestibility; balanced amino acid content; and high energy content. Such traits may refer to properties of both seed and non-seed plant tissues, or to food or feed prepared from plants or seeds having such traits.

As used herein, gene or trait “stacking” is combining desired genes or traits into one transgenic plant line. As one approach, plant breeders stack transgenic traits by making crosses between parents that each have a desired trait and then identifying offspring that have both of these desired traits (so-called “breeding stacks” ) . Another way to stack genes is by transferring two or more genes into the cell nucleus of a plant at the same time during transformation. Another way to stack genes is by re-transforming a transgenic plant with another gene of interest. For example, gene stacking can be used to combine two different insect resistance traits, an insect resistance trait and a disease resistance trait, or an herbicide resistance trait (such as, for example, Bt11) . The use of a selectable marker in addition to a gene of interest would also be considered gene stacking.
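The screening burden of a breeding stack can be estimated with simple Mendelian arithmetic. The sketch below enumerates F2 genotypes from a cross between two parents that are each homozygous for one trait locus, and reports the fraction of F2 offspring that carry both traits and the fraction that are homozygous for both. It assumes two unlinked, single-copy loci and equal gamete viability; these assumptions, and the locus labels A and B, are illustrative and not taken from this disclosure.

```python
from fractions import Fraction
from itertools import product


def f2_stack_fractions():
    """Return (fraction carrying both traits, fraction homozygous for both).

    The F1 is taken to be hemizygous at both loci (A/a, B/b), where upper
    case marks presence of the trait allele and lower case its absence.
    """
    gametes = list(product("Aa", "Bb"))  # four equally likely F1 gametes
    total = carrying_both = homozygous_both = 0
    for (a1, b1), (a2, b2) in product(gametes, repeat=2):
        total += 1
        locus_a, locus_b = a1 + a2, b1 + b2
        if "A" in locus_a and "B" in locus_b:
            carrying_both += 1
        if locus_a == "AA" and locus_b == "BB":
            homozygous_both += 1
    return Fraction(carrying_both, total), Fraction(homozygous_both, total)


if __name__ == "__main__":
    both, fixed = f2_stack_fractions()
    print(f"F2 carrying both traits: {both}")
    print(f"F2 homozygous for both traits: {fixed}")
```

Under these assumptions, 9/16 of F2 plants carry both traits but only 1/16 are homozygous for both, which is why stacked lines are typically identified by screening a segregating population.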

In some embodiments, a nucleic acid molecule or vector of the disclosure can include an additional coding sequence for one or more polypeptides or double stranded RNA molecules (dsRNA) of interest for agronomic traits that primarily are of benefit to a seed company, grower or grain processor. A polypeptide of interest can be any polypeptide encoded by a nucleotide sequence of interest. Non-limiting examples of polypeptides of interest that are suitable for production in plants include those resulting in agronomically important traits such as herbicide resistance (also sometimes referred to as “herbicide tolerance” ) , virus resistance, bacterial pathogen resistance, insect resistance, nematode resistance, or fungal resistance. See, e.g., U.S. Patent Nos. 5,569,823; 5,304,730; 5,495,071; 6,329,504; and 6,337,431. The polypeptide also can be one that increases plant vigor or yield (including traits that allow a plant to grow at different temperatures, soil conditions and levels of sunlight and precipitation) , or one that allows identification of a plant exhibiting a trait of interest (e.g., a selectable marker, seed coat color, relative maturity group, etc. ) . Various polypeptides of interest, as well as methods for introducing these polypeptides into a plant, are described, for example, in US Patent Nos. 4,761,373; 4,769,061; 4,810,648; 4,940,835; 4,975,374; 5,013,659; 5,162,602; 5,276,268; 5,304,730; 5,495,071; 5,554,798; 5,561,236; 5,569,823; 5,767,366; 5,879,903, 5,928,937; 6,084,155; 6,329,504 and 6,337,431; as well as US Patent Publication No. 2001/0016956.

Polynucleotides conferring resistance/tolerance to an herbicide that inhibits the growing point or meristem, such as an imidazolinone or a sulfonylurea, can also be suitable in some embodiments. Exemplary polynucleotides in this category code for mutant ALS and AHAS enzymes as described, e.g., in U.S. Patent Nos. 5,767,366 and 5,928,937. U.S. Patent Nos. 4,761,373 and 5,013,659 are directed to plants resistant to various imidazolinone or sulfonamide herbicides. U.S. Patent No. 4,975,374 relates to plant cells and plants containing a nucleic acid encoding a mutant glutamine synthetase (GS) resistant to inhibition by herbicides that are known to inhibit GS, e.g., phosphinothricin and methionine sulfoximine. U.S. Patent No. 5,162,602 discloses plants resistant to inhibition by cyclohexanedione and aryloxyphenoxypropanoic acid herbicides. The resistance is conferred by an altered acetyl coenzyme A carboxylase (ACCase).

Polypeptides encoded by nucleotide sequences conferring resistance to glyphosate are also suitable for the disclosure. See, e.g., U.S. Patent No. 4,940,835 and U.S. Patent No. 4,769,061. U.S. Patent No. 5,554,798 discloses transgenic glyphosate resistant maize plants, in which resistance is conferred by an altered 5-enolpyruvyl-3-phosphoshikimate (EPSP) synthase gene.

Polynucleotides coding for resistance to phosphono compounds such as glufosinate ammonium or phosphinothricin, and pyridinoxy or phenoxy propionic acids and cyclohexones are also suitable. See, European Patent Application No. 0 242 246. See also, U.S. Patent Nos. 5,879,903, 5,276,268, and 5,561,236.

Other suitable polynucleotides include those coding for resistance to herbicides that inhibit photosynthesis, such as a triazine and a benzonitrile (nitrilase). See U.S. Patent No. 4,810,648. Additional suitable polynucleotides coding for herbicide resistance include those coding for resistance to 2,2-dichloropropionic acid, sethoxydim, haloxyfop, imidazolinone herbicides, sulfonylurea herbicides, triazolopyrimidine herbicides, s-triazine herbicides and bromoxynil. Also suitable are polynucleotides conferring resistance to a protox enzyme, or that provide enhanced resistance to plant diseases; enhanced tolerance of adverse environmental conditions (abiotic stresses) including but not limited to drought, excessive cold, excessive heat, or excessive soil salinity or extreme acidity or alkalinity; and alterations in plant architecture or development, including changes in developmental timing. See, e.g., U.S. Patent Publication No. 2001/0016956 and U.S. Patent No. 6,084,155.

Additional suitable polynucleotides include those coding for insecticidal polypeptides. These polypeptides may be produced in amounts sufficient to control, for example, insect pests (i.e., insect controlling amounts) . It is recognized that the amount of production of an insecticidal polypeptide in a plant necessary to control insects or other pests may vary depending upon the cultivar, type of pest, environmental factors and the like. Polynucleotides useful for additional insect or pest resistance include, for example, those that encode toxins identified in Bacillus organisms. Polynucleotides comprising nucleotide sequences encoding Bacillus thuringiensis (Bt) Cry proteins from several subspecies have been cloned and recombinant clones have been found to be toxic to lepidopteran, dipteran and/or coleopteran insect larvae. Examples of such Bt insecticidal proteins include the Cry proteins such as Cry1Aa, Cry1Ab, Cry1Ac, Cry1B, Cry1C, Cry1D, Cry1Ea, Cry1Fa, Cry3A, Cry9A, Cry9B, Cry9C, and the like, as well as vegetative insecticidal proteins such as Vip1, Vip2, Vip3, and the like. A full list of Bt-derived proteins can be found on the worldwide web at Bacillus thuringiensis Toxin Nomenclature Database maintained by the University of Sussex (see also, Crickmore et al. (1998) Microbiol. Mol. Biol. Rev. 62: 807-813) .

In embodiments, an additional polypeptide is an insecticidal polypeptide derived from a non-Bt source, including without limitation, an alpha-amylase, a peroxidase, a cholesterol oxidase, a patatin, a protease, a protease inhibitor, a urease, an alpha-amylase inhibitor, a pore-forming protein, a chitinase, a lectin, an engineered antibody or antibody fragment, a Bacillus cereus insecticidal protein, a Xenorhabdus spp. (such as X. nematophila or X. bovienii) insecticidal protein, a Photorhabdus spp. (such as P. luminescens or P. asymbiotica) insecticidal protein, a Brevibacillus spp. (such as B. laterosporus) insecticidal protein, a Lysinibacillus spp. (such as L. sphaericus) insecticidal protein, a Chromobacterium spp. (such as C. subtsugae or C. piscinae) insecticidal protein, a Yersinia spp. (such as Y. entomophaga) insecticidal protein, a Paenibacillus spp. (such as P. propylaea) insecticidal protein, a Clostridium spp. (such as C. bifermentans) insecticidal protein, a Pseudomonas spp. (such as P. fluorescens) insecticidal protein, and a lignin.

Polypeptides that are suitable for production in plants further include those that improve or otherwise facilitate the conversion of harvested plants or plant parts into a commercially useful product, including, for example, increased or altered carbohydrate content or distribution, improved fermentation properties, increased oil content, increased protein content, modified oil profile, improved digestibility, and increased nutraceutical content, e.g., increased phytosterol content, increased tocopherol content, increased stanol content or increased vitamin content. Polypeptides of interest also include, for example, those resulting in or contributing to a reduced content of an unwanted component in a harvested crop, e.g., phytic acid, or sugar degrading enzymes. By “resulting in” or “contributing to” is intended that the polypeptide of interest can directly or indirectly contribute to the existence of a trait of interest (e.g., increasing cellulose degradation using a heterologous cellulase enzyme) .

In some embodiments, the polypeptide contributes to improved digestibility for food or feed. Xylanases are hemicellulolytic enzymes that improve the breakdown of plant cell walls, which leads to better utilization of the plant nutrients by an animal. This leads to improved growth rate and feed conversion. Also, the viscosity of the feeds containing xylan can be reduced. Heterologous production of xylanases in plant cells also can facilitate lignocellulosic conversion to fermentable sugars in industrial processing.

Numerous xylanases from fungal and bacterial microorganisms have been identified and characterized (see, e.g., U.S. Patent No. 5,437,992; Coughlin et al. (1993) “Proceedings of the Second TRICEL Symposium on Trichoderma reesei Cellulases and Other Hydrolases” Espoo; Suominen and Reinikainen, eds. (1993) Foundation for Biotechnical and Industrial Fermentation Research 8: 125-135; U.S. Patent Publication No. 2005/0208178; and PCT Publication No. WO 03/16654). In particular, three specific xylanases (XYL-I, XYL-II, and XYL-III) have been identified in T. reesei (Tenkanen et al. (1992) Enzyme Microb. Technol. 14: 566; Torronen et al. (1992) Bio/Technology 10: 1461; and Xu et al. (1998) Appl. Microbiol. Biotechnol. 49: 718).

In other embodiments, a polypeptide useful for the disclosure can be a polysaccharide degrading enzyme. Plants of this disclosure producing such an enzyme may be useful for generating, for example, fermentation feedstocks for bioprocessing. In some embodiments, enzymes useful for a fermentation process include alpha amylases, proteases, pullulanases, isoamylases, cellulases, hemicellulases, xylanases, cyclodextrin glycotransferases, lipases, phytases, laccases, oxidases, esterases, cutinases, granular starch hydrolyzing enzyme and other glucoamylases.

Polysaccharide-degrading enzymes include: a) starch degrading enzymes such as α-amylases (EC 3.2.1.1), glucuronidases (E.C. 3.2.1.131); exo-1,4-α-D glucanases such as amyloglucosidases and glucoamylase (EC 3.2.1.3), β-amylases (EC 3.2.1.2), α-glucosidases (EC 3.2.1.20), and other exo-amylases; starch debranching enzymes, such as isoamylase (EC 3.2.1.68), pullulanase (EC 3.2.1.41), and the like; b) cellulases such as exo-1,4-β-cellobiohydrolase (EC 3.2.1.91), exo-1,3-β-D-glucanase (EC 3.2.1.39), β-glucosidase (EC 3.2.1.21); c) L-arabinases, such as endo-1,5-α-L-arabinase (EC 3.2.1.99), α-arabinosidases (EC 3.2.1.55) and the like; d) galactanases such as endo-1,4-β-D-galactanase (EC 3.2.1.89), endo-1,3-β-D-galactanase (EC 3.2.1.90), α-galactosidase (EC 3.2.1.22), β-galactosidase (EC 3.2.1.23) and the like; e) mannanases, such as endo-1,4-β-D-mannanase (EC 3.2.1.78), β-mannosidase (EC 3.2.1.25), α-mannosidase (EC 3.2.1.24) and the like; f) xylanases, such as endo-1,4-β-xylanase (EC 3.2.1.8), β-D-xylosidase (EC 3.2.1.37), 1,3-β-D-xylanase, and the like; and g) other enzymes such as α-L-fucosidase (EC 3.2.1.51), α-L-rhamnosidase (EC 3.2.1.40), levanase (EC 3.2.1.65), inulinase (EC 3.2.1.7), and the like. In one embodiment, the α-amylase is the synthetic α-amylase, Amy797E, described in US Patent No. 8,093,453, herein incorporated by reference in its entirety.

Further enzymes which may be used with the disclosure include proteases, such as fungal and bacterial proteases. Fungal proteases include, but are not limited to, those obtained from Aspergillus, Trichoderma, Mucor and Rhizopus, such as A. niger, A. awamori, A. oryzae and M. miehei. In some embodiments, the polypeptides of this disclosure can be cellobiohydrolase (CBH) enzymes (EC 3.2.1.91) . In one embodiment, the cellobiohydrolase enzyme can be CBH1 or CBH2.

Other enzymes useful with the disclosure include, but are not limited to, hemicellulases, such as mannases and arabinofuranosidases (EC 3.2.1.55); ligninases; lipases (e.g., E.C. 3.1.1.3), glucose oxidases, pectinases, xylanases, transglucosidases, alpha 1,6 glucosidases (e.g., E.C. 3.2.1.20); esterases such as ferulic acid esterase (EC 3.1.1.73) and acetyl xylan esterases (EC 3.1.1.72); and cutinases (e.g., E.C. 3.1.1.74).

Double stranded RNA molecules useful with the disclosure include but are not limited to those that suppress target insect genes. As used herein the words "gene suppression", when taken together, are intended to refer to any of the well-known methods for reducing the levels of protein produced as a result of gene transcription to mRNA and subsequent translation of the mRNA. Gene suppression is also intended to mean the reduction of protein expression from a gene or a coding sequence including posttranscriptional gene suppression and transcriptional suppression. Posttranscriptional gene suppression is mediated by the homology between all or a part of a mRNA transcribed from a gene or coding sequence targeted for suppression and the corresponding double stranded RNA used for suppression and refers to the substantial and measurable reduction of the amount of mRNA available in the cell for binding by ribosomes. The transcribed RNA can be in the sense orientation to effect what is called co-suppression, in the anti-sense orientation to effect what is called anti-sense suppression, or in both orientations producing a dsRNA to effect what is called RNA interference (RNAi). Transcriptional suppression is mediated by the presence in the cell of a dsRNA, a gene suppression agent, exhibiting substantial sequence identity to a promoter DNA sequence or the complement thereof to effect what is referred to as promoter trans suppression. Gene suppression may be effective against a native plant gene associated with a trait, e.g., to provide plants with reduced levels of a protein encoded by the native gene or with enhanced or reduced levels of an affected metabolite. Gene suppression can also be effective against target genes in plant pests that may ingest or contact plant material containing gene suppression agents, specifically designed to inhibit or suppress the expression of one or more homologous or complementary sequences in the cells of the pest.
Such genes targeted for suppression can encode an essential protein, the predicted function of which is selected from the group consisting of muscle formation, juvenile hormone formation, juvenile hormone regulation, ion regulation and transport, digestive enzyme synthesis, maintenance of cell membrane potential, amino acid biosynthesis, amino acid degradation, sperm formation, pheromone synthesis, pheromone sensing, antennae formation, wing formation, leg formation, development and differentiation, egg formation, larval maturation, digestive enzyme formation, hemolymph synthesis, hemolymph maintenance, neurotransmission, cell division, energy metabolism, respiration, and apoptosis.

In one non-limiting embodiment, the polynucleotides provided herein are stacked with other polynucleotides that increase protein content, amino acid content, oil content and/or fatty acid content, including, for example, the polynucleotides set forth in METHODS AND COMPOSITIONS FOR INCREASING PROTEIN AND/OR OIL CONTENT AND MODIFYING OIL PROFILE IN A PLANT, International Application No. ____________, filed ______, 2022 concurrently herewith (Attorney Docket No. 086879-1262814; Syngenta Ref. No. 82424-WO-REG-ORG-P-1), herein incorporated by reference in its entirety.

As used herein, “selectable marker” means a nucleotide sequence that when expressed imparts a distinct phenotype to the plant, plant part and/or plant cell expressing the marker and thus allows such transformed plants, plant parts and/or plant cells to be distinguished from those that do not have the marker. Such a nucleotide sequence may encode either a selectable or screenable marker, depending on whether the marker confers a trait that can be selected for by chemical means, such as by using a selective agent (e.g., an antibiotic, herbicide, or the like), or on whether the marker is simply a trait that one can identify through observation or testing, such as by screening (e.g., the R-locus trait). Selectable markers can also include the markers associated with oil and/or protein content and fatty acid profile (e.g., as described in Whiting, R.M., et al., BMC Plant Biol. 2020 Oct 23; 20(1): 485).

VI. Marker assisted selection of the plants with improved traits.

In addition to the phenotypic traits, the genetic characteristic of the plant as represented by its genetic marker profile can be used to select plants of desired traits. The term “marker-based selection” refers to the use of genetic markers to detect one or more nucleic acids from the plant, where the nucleic acid is associated with a desired trait to identify plants that carry genes for desirable (or undesirable) traits. Markers include but are not limited to Restriction Fragment Length Polymorphisms (RFLPs), Randomly Amplified Polymorphic DNAs (RAPDs), Arbitrarily Primed Polymerase Chain Reaction (AP-PCR), DNA Amplification Fingerprinting (DAF), Sequence Characterized Amplified Regions (SCARs), Amplified Fragment Length Polymorphisms (AFLPs), Simple Sequence Repeats (SSRs), which are also referred to as Microsatellites, and Single Nucleotide Polymorphisms (SNPs). There are known sets of public markers that are being examined by ASTA and other industry groups for their applicability in standardizing determinations of what constitutes an essentially derived variety under the US Plant Variety Protection Act. However, these standard markers do not limit the type of marker and marker profile which can be employed in breeding or developing backcross conversions, in distinguishing varieties or plant parts or plant cells, or in verifying a progeny pedigree. Primers and PCR protocols for assaying these and other markers are disclosed in the Soybase (sponsored by the USDA Agricultural Research Service and Iowa State University) located at the world wide web at 129.186.26.94/SSR.html.

The term “associated with” as used herein refers to a recognizable and/or detectable relationship between two entities. For example, the phrase “associated with increased protein content” refers to a trait, locus, gene, allele, marker, phenotype, etc., or the expression product thereof, the presence or absence of which can influence or indicate an extent and/or degree to which a plant or its progeny exhibits increased protein content as compared to a control plant. As such, a marker is “associated with” a trait when it is linked to it and when the presence of the marker is an indicator of whether and/or to what extent the desired trait or trait form will occur in a plant/germplasm comprising the marker. Similarly, a marker is “associated with” an allele when it is linked to it and when the presence (or absence) of the marker is an indicator of whether the allele is present (or absent) in a plant, germplasm, or population comprising the marker. For example, “a marker associated with increased protein content” refers to a marker whose presence or absence can be used to predict whether and/or to what extent a plant will display increased protein content as compared to a control plant.

The term “allele(s)” refers to any of one or more alternative forms of a gene, all of which alleles relate to at least one trait or characteristic. In a diploid cell, the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes.

The term “genotype” and variants thereof refer to the genetic composition of an organism, including, for example, whether a diploid organism is heterozygous (i.e., has two different alleles for a given gene or QTL) or homozygous (i.e., has the same allele for a given gene or QTL) for one or more genes or loci (e.g., a SNP, a haplotype, a gene mutation, an insertion, or a deletion) .

In one embodiment, the markers used to identify the plants comprising the polynucleotides disclosed herein are SNPs. Non-limiting examples of SNP genotyping methods include hybridization, primer extension, oligonucleotide ligation, nuclease cleavage, minisequencing and coded spheres. Such methods are well known and disclosed in e.g., Gut, I.G., Hum. Mutat. 17: 475-492 (2001); Shi, Clin. Chem. 47(2): 164-172 (2001); Kwok, Pharmacogenomics 1(1): 95-100 (2000); and Bhattramakki and Rafalski, Discovery and application of single nucleotide polymorphism markers in plants, in PLANT GENOTYPING: THE DNA FINGERPRINTING OF PLANTS, CABI Publishing, Wallingford (2001). A wide range of commercially available technologies utilize these and other methods to interrogate SNPs, including Masscode™ (Qiagen, Germantown, MD), technologies from Hologic (Madison, WI) and Applied Biosystems (Foster City, CA), and Beadarrays™ (Illumina, San Diego, CA).

In some embodiments, an assay (e.g., generally a two-step allelic discrimination assay or similar), a KASP™ assay (generally a one-step allelic discrimination assay defined below or similar), or both can be employed to identify the SNPs that associate with increased protein content, increased oil content, and/or modified oil profile. In an exemplary two-step assay, a forward primer, a reverse primer, and two assay probes that recognize two different alleles at the SNP site (or hybridization oligos) are employed. The forward and reverse primers are employed to amplify genetic loci that comprise SNPs that are associated with increased protein content, increased oil content, and/or modified oil profile. The particular nucleotides that are present at the SNP positions are then assayed using the probes. In some embodiments, the assay probes and the reaction conditions are designed such that an assay probe will only hybridize to the reverse complement of a 100% perfectly matched sequence, thereby permitting identification of which allele(s) are present based upon detection of hybridizations. In some embodiments, the probes are differentially labeled with, for example, fluorophores to permit distinguishing between the two assay probes in a single reaction. Exemplary methods of amplifying include employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm as a template in the PCR or LCR.
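As a minimal sketch of the read-out step of a two-probe allelic discrimination assay, the following Python function turns two probe signals into a genotype call. The function name, the normalized signal scale, the 0.5 threshold and the example alleles "A" and "G" are illustrative assumptions and are not taken from this disclosure; an actual assay would calibrate the calls against control samples.

```python
from typing import Optional, Tuple


def call_genotype(
    signal_allele1: float,
    signal_allele2: float,
    alleles: Tuple[str, str] = ("A", "G"),  # hypothetical alleles at the SNP site
    threshold: float = 0.5,                 # hypothetical normalized signal cutoff
) -> Optional[str]:
    """Call a diploid genotype from the signals of two allele-specific probes.

    Each probe is assumed to hybridize only to its perfectly matched allele,
    so a signal at or above the threshold is read as "allele present".
    """
    has_allele1 = signal_allele1 >= threshold
    has_allele2 = signal_allele2 >= threshold
    if has_allele1 and has_allele2:
        return alleles[0] + alleles[1]   # heterozygous
    if has_allele1:
        return alleles[0] * 2            # homozygous for allele 1
    if has_allele2:
        return alleles[1] * 2            # homozygous for allele 2
    return None                          # neither probe hybridized: no call
```

Returning `None` when neither probe gives a signal keeps failed reactions separate from genuine genotype calls.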

In some embodiments, a number of SNP alleles together within a sequence, or across linked sequences, can be used to describe a haplotype for any particular genotype. See, e.g., Ching et al., BMC Genet. 3: 19 (2002) (14 pages); Gupta et al., (2001) Curr Sci. 80: 524–535; Rafalski, Plant Sci. 162: 329-333 (2002). In some cases, haplotypes can be more informative than single SNPs and can be more descriptive of any particular genotype. For example, a single SNP may be allele “T” for a specific disease resistant line or variety, but the allele “T” might also occur in the soybean breeding population being utilized for recurrent parents. In this case, a combination of alleles at linked SNPs may be more informative. Once a unique haplotype has been assigned to a donor chromosomal region, that haplotype can be used in that population or any subset thereof to determine whether an individual has a particular gene. The use of automated high throughput marker detection platforms known to those of ordinary skill in the art makes this process highly efficient and effective.
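The point that a haplotype can discriminate where a single SNP cannot may be illustrated with a short Python sketch. The SNP names, line names and allele calls below are invented for illustration only and do not correspond to any marker of this disclosure.

```python
from typing import Dict, List, Tuple

# Hypothetical allele calls at three linked SNPs for a donor and two recurrent parents.
SNP_ORDER: List[str] = ["snp1", "snp2", "snp3"]
PANEL: Dict[str, Dict[str, str]] = {
    "donor":       {"snp1": "T", "snp2": "C", "snp3": "G"},
    "recurrent_1": {"snp1": "T", "snp2": "A", "snp3": "G"},
    "recurrent_2": {"snp1": "C", "snp2": "A", "snp3": "A"},
}


def haplotype(calls: Dict[str, str], snp_order: List[str]) -> Tuple[str, ...]:
    """Combine the alleles at linked SNPs, in a fixed order, into one haplotype."""
    return tuple(calls[snp] for snp in snp_order)


def lines_with_haplotype(target: Tuple[str, ...]) -> List[str]:
    """Return the names of all lines in the panel carrying the target haplotype."""
    return sorted(name for name, calls in PANEL.items()
                  if haplotype(calls, SNP_ORDER) == target)


# The single SNP "snp1" = T does not distinguish the donor from recurrent_1 ...
carriers_of_T = sorted(name for name, calls in PANEL.items() if calls["snp1"] == "T")
# ... whereas the three-SNP haplotype is found only in the donor.
donor_matches = lines_with_haplotype(haplotype(PANEL["donor"], SNP_ORDER))
```

In this toy panel, `carriers_of_T` contains both the donor and one recurrent parent, while `donor_matches` contains the donor alone.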

These SNP markers can be used in a marker assisted breeding program to move traits, such as native traits or traits conferred by transgenes or traits conferred by genome editing, into a desired plant background. As used herein, the term “native trait” refers to a trait already existing in germplasm, including wild relatives of crop species, or that can be produced by recombination of existing traits. For example, progeny plants from a cross between a donor soybean plant comprising in its genome a nucleic acid sequence encoding SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, and a recipient soybean plant not comprising said nucleic acid sequence can be screened to detect the presence of the markers associated with increased protein content, increased oil content, and/or modified oil profile. Plants comprising said markers can be selected and verified for increased protein content, increased oil content, and/or modified oil profile as compared to control plants. In some embodiments, the donor plant comprises a nucleic acid sequence encoding SEQ ID NO: 3.
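The screening step of such a marker assisted breeding program can be sketched as a simple filter over progeny genotype calls. The progeny identifiers, the genotype strings and the choice of "T" as the donor allele are hypothetical; selected plants would still need to be verified phenotypically as described above.

```python
from typing import Dict, List


def select_progeny(marker_calls: Dict[str, str], donor_allele: str) -> List[str]:
    """Return the IDs of progeny whose genotype call contains the donor allele.

    marker_calls maps a progeny ID to a two-letter diploid genotype call
    (e.g., "CT") at a marker associated with the trait of interest.
    """
    return sorted(pid for pid, genotype in marker_calls.items()
                  if donor_allele in genotype)


# Hypothetical calls for three progeny plants from a donor x recipient cross.
calls = {"F2-001": "TT", "F2-002": "CT", "F2-003": "CC"}
selected = select_progeny(calls, donor_allele="T")
```

Here both the homozygous and the heterozygous carriers of the donor allele are retained; a stricter program could keep only the homozygous class.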

VII. Assay, kits, and primers

Also provided herein are the kits and primers that can be used to introduce a polynucleotide sequence as described in this disclosure into a recipient plant or to detect a polynucleotide sequence as described in this disclosure in a plant.

Also provided herein are kits and primers that can be used to identify plants that have increased protein content, increased oil content, and/or modified oil profile. As a non-limiting example, the primers can include Glyma.20G092400-zF, ATGGCCTCCAACGGCG (SEQ ID NO: 37); and Glyma.20G092400-zR, AGCCGAAAGAAGAGCACAAGTAAACC (SEQ ID NO: 38).

Also provided herein are the kits and primers that can be used to detect the expression level of the polypeptide disclosed herein in plants. As a non-limiting example, the primers can include Glyma.20G092400-q-F: CTGATGCTCAAAAGCTTAGGACCCG (SEQ ID NO: 29); and Glyma.20G092400-q-R: AACCTTGTTGTAAACCTGACGAGAAAT (SEQ ID NO: 30) (Table 14).
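For orientation, basic properties of a primer such as its length, GC fraction and reverse complement can be computed with a few lines of Python. This is only an illustrative sketch: the helper names are assumptions, the example uses the SEQ ID NO: 37 sequence given above, and practical primer design would also consider melting temperature and secondary structure, which are not modeled here.

```python
# Watson-Crick complements for unambiguous DNA bases.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


def reverse_complement(primer: str) -> str:
    """Sequence of the strand to which the primer anneals, written 5' to 3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(primer.upper()))


forward = "ATGGCCTCCAACGGCG"  # SEQ ID NO: 37 as listed in the text
forward_length = len(forward)
forward_gc = gc_fraction(forward)
forward_rc = reverse_complement(forward)
```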

In some embodiments, the kit may also comprise one or more probes having a sequence corresponding to or complementary to a sequence having 80% to 100% sequence identity with a specific region of the transgenic event or gene editing event. In some embodiments, the kit may comprise any reagent and material required to perform the assay or detection method.

EXEMPLARY EMBODIMENTS

As used below, any reference to a series of embodiments is to be understood as a reference to each of those embodiments disjunctively (e.g., "Embodiments 1-4" is to be understood as "Embodiments 1, 2, 3, or 4").

Embodiment 1 is an elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said polypeptide confers increased protein content, oil content, and/or modified oil profile on the elite Glycine max plant as compared to a control plant not comprising said nucleic acid sequence.

Embodiment 2 is the elite Glycine max plant of embodiment 1, wherein the donor Glycine plant is a Glycine soja plant or Glycine max plant.

Embodiment 3 is the elite Glycine max plant of embodiment 2, wherein the Glycine soja plant is a ZYD00006 variety.

Embodiment 4 is the elite Glycine max plant of embodiment 2, wherein the Glycine max plant is a DN50 variety or a SN14 variety.

Embodiment 5 is the elite Glycine max plant of any one of embodiments 1-4, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in the SEQ ID NOs: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72.

Embodiment 6 is the elite Glycine max plant of any one of embodiments 1-4, wherein the nucleic acid sequence has at least 90% identity, at least 95% identity, or at least 100% identity to any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70 or the nucleic acid sequence has at least 90% identity, at least 95% identity, or at least 100% identity to any one of SEQ ID NO: 2, 5, 8, 11, 14, 44, 47, 50, 53, 56, 59, 62, 65, 68 or 71.

Embodiment 7 is the elite Glycine max plant of any one of embodiments 1-6, wherein the polypeptide encoded by the nucleic acid sequence has at least 90% identity or at least 95% identity to SEQ ID NO: 3, wherein the polypeptide comprises an amino transferase domain, wherein the amino transferase domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 91-274 of SEQ ID NO: 3.
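The two quantities recited in the foregoing embodiments, percent identity to a reference sequence and the number of substitutions within a defined residue range, can be sketched in Python as follows. This sketch assumes the two sequences are already aligned, of equal length and free of gaps, which is a simplification; identity values for real sequences would normally be obtained from a sequence alignment program. The short peptides in the example are invented and are not SEQ ID NO: 3.

```python
def percent_identity(reference: str, query: str) -> float:
    """Percent of aligned positions at which the two sequences are identical.

    Assumes pre-aligned, gap-free sequences of equal length (a simplification).
    """
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(r == q for r, q in zip(reference, query))
    return 100.0 * matches / len(reference)


def substitutions_in_region(reference: str, query: str, start: int, end: int) -> int:
    """Count substitutions between residues start..end (1-based, inclusive)."""
    ref_region = reference[start - 1:end]
    query_region = query[start - 1:end]
    return sum(r != q for r, q in zip(ref_region, query_region))


# Toy 10-residue example with a single substitution at position 10.
toy_reference = "MKTAYIAKQR"
toy_query = "MKTAYIAKQK"
toy_identity = percent_identity(toy_reference, toy_query)
toy_domain_subs = substitutions_in_region(toy_reference, toy_query, 9, 10)
```

For a domain defined as residues 91-274 of a reference sequence, the same helper would be called with `start=91` and `end=274`.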

Embodiment 8 is the elite Glycine max plant of any one of embodiments 1-7, wherein said nucleic acid sequence is introduced into said plant genome by genome editing of genomic sequences corresponding to and comprising any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70, wherein the genome editing confers increased protein content, oil content, and/or oil profile.

Embodiment 9 is the elite Glycine max plant of embodiment 8, wherein the gene editing is by CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.

Embodiment 10 is the elite Glycine max plant of any one of embodiments 1-6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of (a) a nucleic acid sequence having at least 90% identity or at least 95% identity to any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70, (b) a nucleic acid sequence encoding a polypeptide having at least 90% identity or at least 95% identity to the sequence of any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, or (c) a nucleic acid sequence encoding a polypeptide having the sequence of any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile on the elite Glycine max plant.

Embodiment 11 is the elite Glycine max plant of any one of embodiments 1-10, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.

Embodiment 12 is a plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.

Embodiment 13 is the plant of embodiment 12, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the nucleic acid sequence set forth in SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70.

Embodiment 14 is the plant of embodiment 12 or 13, wherein the genome editing comprises duplication, inversion, promoter modification, terminator modification and/or splicing modification of the nucleic acid sequence.

Embodiment 15 is the plant of any one of embodiments 12-14, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.

Embodiment 16 is the plant of any one of embodiments 12-15, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.

Embodiment 17 is the plant of any one of embodiments 12-16, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.

Embodiment 18 is the plant of embodiment 17, wherein the heterologous promoter is a native promoter or active variant or fragment thereof.

Embodiment 19 is a plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having an amino acid sequence that has at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, or an amino acid sequence set forth in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content, increased oil content, and/or modified oil profile as compared to a control plant.

Embodiment 20 is the plant of embodiment 19, wherein the nucleic acid sequence comprises at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70, or the nucleic acid sequence is any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70.

Embodiment 21 is the plant of embodiment 19 or 20, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.

Embodiment 22 is the plant of embodiment 19 or 20, wherein the nucleic acid sequence is introduced into the genome by genome editing.

Embodiment 23 is the plant of embodiment 22, wherein the promoter is an endogenous promoter.

Embodiment 24 is the plant of any one of embodiments 19-23, wherein the promoter is a constitutive promoter, inducible promoter, or a tissue-specific promoter.

Embodiment 25 is the plant of any one of embodiments 19-24, wherein the plant is a dicot plant.

Embodiment 26 is the plant of embodiment 25, wherein the dicot plant is a soybean plant or an elite soybean plant.

Embodiment 27 is the plant of any one of embodiments 19-24, wherein the plant is a monocot plant.

Embodiment 28 is the plant of embodiment 27, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.

Embodiment 29 is the plant of any one of embodiments 19-28, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.

Embodiment 30 is a progeny plant from the elite Glycine max plant of any one of embodiments 1-11 or the plant of any one of embodiments 12-29, wherein the progeny plant has stably incorporated into its genome the nucleic acid sequence.

Embodiment 31 is a plant cell, seed, or plant part derived from the elite Glycine max plant of any one of embodiments 1-11 or the plant of any one of embodiments 12-29, wherein said plant cell, seed or plant part has stably incorporated into its genome the nucleic acid sequence.

Embodiment 32 is a harvest product derived from the elite Glycine max plant of any one of embodiments 1-11 or the plant of any one of embodiments 12-29.

Embodiment 33 is a processed product derived from the harvest product of embodiment 32, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.

Embodiment 34 is a method of producing a soybean plant having increased polypeptide and/or oil content, the method comprising the steps of: a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said nucleic acid sequence confers to said donor soybean plant increased protein content, increased oil content, and/or modified oil profile compared to donor Glycine plant; b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and c) selecting a progeny plant from the cross of b) by detecting the presence of the nucleic acid sequence, or the presence of one or more molecular markers associated with the nucleic acid sequence in the progeny plant, thereby producing a soybean plant having increased protein content, increased oil content, and/or modified oil profile.

Embodiment 35 is the method of embodiment 34, wherein the molecular marker is a single nucleotide polymorphism (SNP) , a quantitative trait locus (QTL) , an amplified fragment length polymorphism (AFLP) , randomly amplified polymorphic DNA (RAPD) , a restriction fragment length polymorphism (RFLP) or a microsatellite.

Embodiment 36 is the method of embodiment 34 or 35, wherein either the donor or the recipient soybean plant is an elite Glycine max plant.

Embodiment 37 is a method of producing a Glycine max plant with increased protein content, increased oil content, and/or modified oil profile, the method comprising the steps of: a) isolating a nucleic acid from a Glycine max plant; b) detecting in the nucleic acid of a) at least one molecular marker associated with a nucleic acid sequence comprising any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70, wherein said nucleic acid sequence confers to the Glycine max plant increased protein content, increased oil content, and/or modified oil profile; c) selecting a Glycine max plant based on the presence of the molecular marker detected in b); and d) producing a Glycine max progeny plant from the plant of c) identified as having said molecular marker associated with increased polypeptide and/or increased oil content.

Embodiment 38 is the method of embodiment 37, wherein the molecular marker is a single nucleotide polymorphism (SNP) , a quantitative trait locus (QTL) , an amplified fragment length polymorphism (AFLP) , randomly amplified polymorphic DNA (RAPD) , a restriction fragment length polymorphism (RFLP) or a microsatellite.

Embodiment 39 is the method of embodiment 38, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.

Embodiment 40 is the method of embodiment 39, wherein the amplifying comprises employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm as a template in the PCR or LCR.

Embodiment 41 is the method of embodiment 39, wherein the nucleic acid is selected from DNA or RNA.

Embodiment 42 is a plant produced by the method of any one of embodiments 34-41.

Embodiment 43 is a method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising: a) introducing into the genome of the plant a nucleic acid molecule operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having (i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or (ii) an amino acid sequence set forth in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said nucleic acid sequence is heterologous to the plant, and wherein expression of said nucleic acid sequence increases protein content, increases oil content, and/or modifies the oil profile compared to a control plant not expressing said nucleic acid sequence.

Embodiment 44 is the method of embodiment 43, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.

Embodiment 45 is the method of embodiment 44, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content, increased oil content, and/or modified oil profile.

Embodiment 46 is the method of embodiment 45, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.

Embodiment 47 is the method of embodiment 45, wherein the method comprises Cas12a mediated gene replacement.

Embodiment 48 is the method of any one of embodiments 43-47, wherein the promoter is an exogenous promoter.

Embodiment 49 is the method of any of embodiments 43-47, wherein the promoter is an endogenous promoter.

Embodiment 50 is the method of any one of embodiments 43-49, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.

Embodiment 51 is the method of any one of embodiments 43-50, wherein the plant is a dicot plant.

Embodiment 52 is the method of embodiment 51, wherein the dicot plant is a soybean plant.

Embodiment 53 is the method of any one of embodiments 43-51, wherein the plant is a monocot plant.

Embodiment 54 is the method of embodiment 53, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.

Embodiment 55 is a plant produced by the method of any one of embodiments 43-54.

Embodiment 56 is a polypeptide selected from: (a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or any portion thereof, wherein the portion confers increased polypeptide and/or oil content, and having a heterologous amino acid sequence attached thereto; (b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, and having substitution and/or deletion and/or addition of one or more amino acid residues, wherein expression of the polypeptide confers increased polypeptide and/or oil content on the plant; (c) a polypeptide having more than 99%, more than 95%, more than 90%, more than 85%, or more than 80% identity with the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein the polypeptide when expressed in a plant confers increased polypeptide and/or oil content on the plant; or (d) a fusion polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or the polypeptide as defined in any one of (a) to (c).

Embodiment 57 is a nucleic acid molecule comprising: (a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant; (b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NOs: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70; or (c) the nucleotide sequence of part (a) having more than 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity to any one of SEQ ID NOs: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70.

Embodiment 58 is an expression cassette comprising the nucleic acid molecule of embodiment 56 or encoding the polypeptide of embodiment 57.

Embodiment 59 is the expression cassette of embodiment 58, wherein the nucleic acid molecule is operably linked to a promoter capable of directing expression in a plant cell.

Embodiment 60 is the expression cassette of embodiment 59, wherein the promoter is an endogenous promoter.

Embodiment 61 is the expression cassette of embodiment 59, wherein the promoter is an exogenous promoter.

Embodiment 62 is the expression cassette of embodiment 61, wherein the promoter comprises pSOY1 (SEQ ID NO: 20) .

Embodiment 63 is a vector comprising the nucleic acid molecule of embodiment 62, the expression cassette of any one of embodiments 56-61, a nucleic acid molecule having the sequence set forth in SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70, or a nucleic acid sequence encoding the polypeptide of embodiment 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72.

Embodiment 64 is a transgenic cell comprising the nucleic acid molecule of embodiment 63 or the expression cassette of any one of embodiments 56-63.

Embodiment 65 is use of the polypeptide of embodiment 56 or the nucleic acid molecule of embodiment 57, or the expression cassette of any one of embodiments 56-64, or the transgenic cell of embodiment 63 in conferring increased protein content, increased oil content, and/or modified oil profile.

Embodiment 66 is use of the expression cassette of any one of embodiments 56-64 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content is increased, the oil content is increased and/or the oil profile is modified in the cell.

Embodiment 67 is a method for increasing protein content, increasing oil content, and/or modifying oil profile in a plant, comprising increasing the expression level and/or activity of the polypeptide of embodiment 56 in the plant.

Embodiment 68 is a method for producing a plant variety with increased protein content, increased oil content, and/or modified oil profile in a plant, comprising increasing the expression level and/or activity of the nucleic acid molecule of embodiment 57 in the plant.

Embodiment 69 is the method of embodiments 67 or 68, wherein the increasing the expression level and/or activity of the polypeptide in the plant is by transgenic means or by breeding.

Embodiment 70 is a method for producing a transgenic plant with increased protein content, increased oil content, and/or modified oil profile, comprising introducing the nucleic acid molecule of embodiment 57 or the expression cassette of any one of embodiments 65-69 to a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content, increased oil content, and/or modified oil profile compared to the recipient plant.

Embodiment 71 is the method of embodiment 70, wherein the introducing the nucleic acid molecule to the recipient plant is performed by introducing the expression cassette of any one of embodiments 6-64 into the recipient plant.

Embodiment 72 is a primer pair for amplifying the nucleic acid molecule of embodiment 57.

Embodiment 73 is the primer pair of embodiment 72, wherein the primer pair is a primer pair composed of two single-stranded DNA shown in Table 10, Table 13, Table 14, and Table 15.

Embodiment 74 is a kit comprising the primer pair of embodiment 72 or 73.

EXAMPLES

Example 1. Experimental Materials Used in Examples 2-11

Four materials (herein also referred to as “extreme materials”) from the chromosome segment substitution lines (CSSLs) population in 2014-2015 that showed significant differences in protein and oil content from the recurrent parent Suinong 14 (SN14, control material) were selected. These materials were disclosed in Qi et al., Plant Cell Environ. 2018 Sep; 41(9): 2109-2127. The materials were identified as having High Protein Low Oil (HPLO) content, Low Protein High Oil (LPHO) content, High Protein High Oil (HPHO) content, and Low Protein Low Oil (LPLO) content and were used herein as the test materials. From 2016 to 2018, the four extreme materials were sown in the experimental field of Xiangyang Farm under the following conditions: the appropriate soil moisture content was about 15-20%, the row length was about 5 m, the row spacing was about 60 cm, the seeding depth (the distance from the surface of the soil) was about 3-4 cm, and each material was sown in 20 rows. After 3 weeks, the seedlings were manually thinned to reach a plant spacing of about 6.5 cm.

Field sampling of soy protein and oil content of the extreme materials.

The developmental stages of soybean grains (Glob, Hrt, Cot, EM1, EM2, MM, LM and DS) are as described on the SoyBase website (soybase.org) and are shown in Table 3.

Table 3. Soybean seed development stages.

Field sampling was performed by selecting plants blooming at nodes 6-8, and leaf samples from the nodes 6-8 were taken each time, and approximately one full centrifuge tube was taken as a biological replicate each time. Three biological replicates of each material were used. Each biological replicate was immediately placed in the ice box for storage for protein and fatty acid phenotype determination.

Soybean and Arabidopsis genetic transformation. Unless explicitly stated otherwise, the Escherichia coli strain used in this application was DH5α and the Agrobacterium tumefaciens strain was EHA105. The target gene fragment of the entry vector Fu28 was introduced into the plant expression vector pSOY1 via ligation using the Gateway vector system. The entry vector Fu28 and the expression vector pSOY1 were provided by Professor Fu Yongfu of the Institute of Crop Science, Chinese Academy of Agricultural Sciences (Wang X, et al. (2013) BioVector, a flexible system for gene specific expression in plants. BMC Plant Biol 13: 198).

The main reagents involved in this experiment are shown in Table 4.

Table 4. Main experimental reagents

The culture media and antibiotics used in this experiment are shown in Table 5 and Table 6.

Table 5. Experimental medium

Table 6. Antibiotics

Determination of protein and oil content of CSSLs population. The FOSS grain analyzer (INFRATEC1241) was used to determine the protein and oil content of soybean grains in CSSLs population. Each test material was tested 3-5 times, and the average value was used for phenotypic data analysis.

The Kjeldahl method is commonly used for the quantitative determination of nitrogen contained in organic substances plus the nitrogen contained in the inorganic compounds ammonia and ammonium (NH₃/NH₄⁺). Without modification, other forms of inorganic nitrogen, for instance nitrate, are not included in this measurement. The Kjeldahl reagents required for determining soybean grain protein content are shown in Table 7.

Table 7. Kjeldahl reagents

Bioinformatic analysis of candidate genes. The websites used to predict the functions of candidate genes in the “hot spot” interval are shown in Table 8 below.

Table 8. Function prediction website of candidate gene

Gene expression analysis. Gene expression was determined by real-time quantitative PCR (qRT-PCR) analysis. Reaction solutions for genomic DNA removal were prepared as shown in Table 9, primers for qRT-PCR amplification are shown in Table 10, and reaction solutions for qRT-PCR amplification were prepared as shown in Table 11.

Table 9. Reaction solution preparation for genomic DNA removal

Table 10. qRT-PCR amplification primers

Primer	Sequence (5’→3’)	SEQ ID NO
Glyma. 20G092000-qF	ACTGTTGGAGACAAACCTTTCACCC	SEQ ID NO: 25
Glyma. 20G092000-qR	GAGGACCCCGTGGAGGAGGAAT	SEQ ID NO: 26
Glyma. 20G092100-qF	CCATCACGAGTGCTGCTATCAAGGA	SEQ ID NO: 27
Glyma. 20G092100-qR	TGAAAAGTCATCAAGTAGAGCGTGG	SEQ ID NO: 28
Glyma. 20G092400-qF	CTGATGCTCAAAAGCTTAGGACCCG	SEQ ID NO: 29
Glyma. 20G092400-qR	AACCTTGTTGTAAACCTGACGAGAAAT	SEQ ID NO: 30
Glyma. 20G094900-qF	TGGATTATATGCCTGAGTGTTGGTAG	SEQ ID NO: 31
Glyma. 20G094900-qR	TGTAATCACACCAATAACCAGACCC	SEQ ID NO: 32
GmActin4-qF	GTGTCAGCCATACTGTCCCCATTT	SEQ ID NO: 33
GmActin4-qR	GTTTCAAGCTCTTGCTCGTAATCA	SEQ ID NO: 34

Table 11. Reaction solution preparation for qRT-PCR

Primers for cloning Glyma. 20G092400 are provided in Table 12.

Table 12. Cloning primers for Glyma. 20G092400

Primer	Sequence (5’→3’)	SEQ ID NO
Glyma. 20G092400-cF	TCTCTCGAGATGGCCTCCAACGGCG	SEQ ID NO: 21
Glyma. 20G092400-cR	CCAGGATCCAGCCGAAAGAAGAGCACAAGTAAACC	SEQ ID NO: 22
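The two cloning primers in Table 12 each carry a short 5' tail ahead of the gene-specific sequence. As a minimal sketch, the snippet below scans the primers of SEQ ID NO: 21 and SEQ ID NO: 22 for two well-known restriction enzyme recognition sequences and splits each primer into its tail and its gene-specific part. The source does not state which enzymes were used for cloning, so the enzyme names here are an assumption based only on the recognition sequences; the function names are hypothetical.

```python
# Recognition sequences of two common restriction enzymes.
# ASSUMPTION: the source does not name the enzymes used for cloning.
RESTRICTION_SITES = {"XhoI": "CTCGAG", "BamHI": "GGATCC"}

# Primer sequences copied from Table 12.
CLONING_PRIMERS = {
    "SEQ ID NO: 21": "TCTCTCGAGATGGCCTCCAACGGCG",
    "SEQ ID NO: 22": "CCAGGATCCAGCCGAAAGAAGAGCACAAGTAAACC",
}


def find_sites(primer: str) -> dict:
    """Return {enzyme: 0-based start index} for each recognition site in the primer."""
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        pos = primer.upper().find(site)
        if pos != -1:
            hits[enzyme] = pos
    return hits


def split_primer(primer: str) -> tuple:
    """Split a primer into (5' tail including the site, gene-specific 3' part)."""
    hits = find_sites(primer)
    if not hits:
        return "", primer
    # Use the site closest to the 5' end as the boundary.
    enzyme, pos = min(hits.items(), key=lambda item: item[1])
    end = pos + len(RESTRICTION_SITES[enzyme])
    return primer[:end], primer[end:]


if __name__ == "__main__":
    for name, seq in CLONING_PRIMERS.items():
        print(name, find_sites(seq), split_primer(seq))
```

Under these assumptions, the gene-specific parts returned by `split_primer` coincide with the Glyma. 20G092400-zF and Glyma. 20G092400-zR detection primers listed in Table 15.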

Example 2. Methods for the Identification of Arabidopsis mutants

A. Planting Arabidopsis thaliana

The planting soil comprised flower nutrient soil and vermiculite at a ratio of 3:1 (flower nutrient soil: vermiculite). The soil was put into small flowerpots and slowly soaked in water. Arabidopsis thaliana seeds were sown evenly on the moist soil. The opening of each pot was sealed with plastic wrap, and the pots were placed in a refrigerator at 4℃ for vernalization for 48-72 h. After vernalization, the pots were placed in an incubator (22℃, 16 h/8 h light/dark, 70 μmol/m²/s) for 1 week until the Arabidopsis seedlings emerged. After culturing for 1 week, the wrap was removed.

B. DNA extraction from Arabidopsis leaves

Total DNA was extracted by the CTAB (hexadecyltrimethylammonium bromide) method (Porebski, S. et al., Plant Molecular Biology Reporter, 1997, 15 (1): 8-15). The prepared CTAB extract was stored at 4℃. The rosette leaves of Arabidopsis thaliana were collected and placed in an Eppendorf (“EP”) tube with small steel balls. Liquid nitrogen was used to quick-freeze the leaves. Next, the frozen leaves were placed in a tissue grinder to fully break up the leaves. 700 μL of CTAB extract solution was added to the EP tube containing the sample and mixed thoroughly with a vortexer. The mixture in the EP tube was then placed in a 65℃ water bath for 1 h, with inversion and mixing once every 10 minutes. The EP tube was then taken out of the water bath and, after cooling, 650 μL of chloroform was added. The tube was inverted 30 times to mix thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 μL of the supernatant was transferred into a new EP tube and 650 μL of chloroform was added. The mixture was shaken and mixed thoroughly and centrifuged at 12000 rpm for 15 minutes at room temperature. 400-500 μL of the supernatant was transferred to a new EP tube containing 700 μL of pre-cooled isopropanol and inverted 30 times to mix thoroughly. The mixture was then centrifuged at 12000 rpm for 15 minutes at room temperature. The supernatant was discarded, and the precipitate was washed once with 95% ethanol, then once with 75% ethanol, and centrifuged at 7500 rpm for 5 min at room temperature. The DNA precipitate was dried and dissolved in 50 μL of sterilized water. DNA concentration (as reflected by the OD260 value) was measured using a NanoDrop 2000C, and the DNA was stored at -20℃.

C. PCR identification of Arabidopsis mutants

Arabidopsis homozygotes were screened using the Arabidopsis mutant (SALK_021984C) detection primers (LP+RP, BP+RP) provided by SIGnAL (signal.salk.edu/tdnaprimers.2.html). Total DNA of the Arabidopsis wild-type ecotype Col-0 (“Col-0”) and of the mutant was extracted and used as a template for PCR amplification. The amplified product was subjected to 1.5% agarose gel electrophoresis to detect whether the mutant was a homozygous mutant. The primers used are shown in Table 13.

Table 13. PCR detection primer for Arabidopsis mutant (SALK_021984C)

Primer	Sequence (5’→3’)	SEQ ID NO
SALK-LP	TTAATAAACGCCGAGCATGAC	SEQ ID NO: 16
SALK-RP	GGATTATTGGCAAGCACTGAC	SEQ ID NO: 17
SALK-BP	ATTTTGCCGATTTCGGAAC	SEQ ID NO: 18
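The source does not spell out how the two primer combinations are read on the gel. The sketch below assumes the conventional interpretation for T-DNA insertion lines: the LP+RP pair amplifies the intact wild-type allele, and the BP+RP pair amplifies the junction between the T-DNA border and the flanking genomic sequence. The function name and the returned labels are hypothetical.

```python
def call_genotype(lp_rp_band: bool, bp_rp_band: bool) -> str:
    """Classify a plant from the presence of the two PCR products.

    lp_rp_band: True if the LP+RP reaction gave a product (wild-type allele present).
    bp_rp_band: True if the BP+RP reaction gave a product (T-DNA insertion present).
    ASSUMPTION: conventional reading of a two-reaction T-DNA genotyping assay.
    """
    if lp_rp_band and bp_rp_band:
        return "heterozygous"
    if bp_rp_band:
        return "homozygous mutant"
    if lp_rp_band:
        return "wild type"
    # Neither reaction worked: the template or the PCR likely failed.
    return "no call"


if __name__ == "__main__":
    # Hypothetical gel readings for three plants.
    readings = {"Col-0": (True, False), "plant A": (True, True), "plant B": (False, True)}
    for plant, (lp_rp, bp_rp) in readings.items():
        print(plant, call_genotype(lp_rp, bp_rp))
```

Treating the absence of both products as "no call" rather than as a genotype avoids misclassifying a failed DNA preparation.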

D. RT-PCR identification of Arabidopsis mutants

Arabidopsis total RNA extraction and generation of cDNA were performed as described in Example 7. At18SrRNA was used as an internal reference gene. The RT-PCR primers for detecting homozygous SALK_021984C mutants of Arabidopsis thaliana are shown in Table 14.

Table 14. RT-PCR for detection of Arabidopsis thaliana homozygous mutants

Primer name	Sequence (5’→3’)	SEQ ID NO
At18SrRNA-F	CGTCCCTGCCCTTTGTACAC	SEQ ID NO: 41
At18SrRNA-R	CGAACACTTCACCGGATCATT	SEQ ID NO: 42
SALK_021984C-F	GATAACGCAACAACCGCAGC	SEQ ID NO: 39
SALK_021984C-R	TGCGTCCCTGAGCCTATGAT	SEQ ID NO: 40

E. Arabidopsis genetic transformation and identification

i. Flower dipping transformation of Arabidopsis thaliana

Arabidopsis cultivation and plant transformation preparation. Arabidopsis control group wild-type Col-0 and mutant materials were planted as described above. After the Arabidopsis plants bolted, the stalks were removed to increase the number of bolts. The plants were ready to be transformed when the stalks had grown to the same height and only the upper flowers had not yet bloomed.

Agrobacterium preparation. Agrobacterium tumefaciens containing the expression vector, stored at -80℃, was inoculated into 10 mL of LB liquid medium containing spectinomycin and cultured overnight at 28℃ at 160 rpm. 100 μL of this starter culture was then transferred to 100 mL of fresh YEP liquid medium containing spectinomycin for further culturing at 28℃ with shaking at 200 rpm. When the density of the culture reached an OD₆₀₀ of 0.8, the culture was harvested and centrifuged. The bacterial pellet was resuspended in 100 mL of resuspension solution containing 5% sucrose and 0.01% Silwet-L77. The suspension was kept at room temperature for 1-3 h before use in transformation.

Transformation of Arabidopsis thaliana. Arabidopsis thaliana plants that had grown to a suitable bolting height with a large number of inflorescences were used for the transformation. The opened flowers and the established pods were removed. The unopened flower buds were immersed in the Agrobacterium resuspension for 30 s. The Arabidopsis thaliana plants infected by Agrobacterium were then wrapped in plastic wrap and placed in a dark box for light-proof treatment. After an incubation period of 24 hours, the infected plants were taken out of the dark box. A second round of transformation was performed on these plants a week later in order to improve the transformation efficiency. Mature seeds of the plants were harvested.

ii. Screening Transgenic Arabidopsis with Basta

The mature T₀ seeds of the transformed Arabidopsis thaliana were harvested and planted as described above. When the first two true leaves were fully expanded, Basta solution (diluted 1:1000) was sprayed on the plants 2-3 times, once every other day, and the growth state of the Arabidopsis plants was observed. Non-transgenic Arabidopsis plants showed chlorosis and gradually died, while transgenic Arabidopsis plants grew normally. After the transgenic Arabidopsis thaliana plants had grown 4 leaves, the plants that were positively identified as transgenic were transplanted into new small pots to allow the seedlings to grow for verification of transgene status.

iii. Identification of transgenic Arabidopsis thaliana

The leaf DNA of transgenic Arabidopsis thaliana (T₁, T₂ and T₃) was extracted. The transgenic plants were identified by PCR using the Glyma. 20G092400 gene primers and the Bar primers shown in Table 15. The PCR products were detected by 1.5% agarose gel electrophoresis.

Table 15. PCR primers for detecting transgenic Arabidopsis

Primer	Sequence (5’→3’)	SEQ ID NO
Bar-F	CCAGCTGCCAGAAACCCACG	SEQ ID NO: 23
Bar-R	CGAACGGGGGATCTACCATG	SEQ ID NO: 24
Glyma. 20G092400-zF	ATGGCCTCCAACGGCG	SEQ ID NO: 37
Glyma. 20G092400-zR	AGCCGAAAGAAGAGCACAAGTAAACC	SEQ ID NO: 38
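As a quick sanity check on the detection primers in Table 15, the sketch below reports the length and GC fraction of each primer. The sequences are copied from the table; the function name is hypothetical, and no melting temperature is estimated because the source does not give annealing conditions.

```python
# Primer sequences copied from Table 15.
DETECTION_PRIMERS = {
    "Bar-F": "CCAGCTGCCAGAAACCCACG",
    "Bar-R": "CGAACGGGGGATCTACCATG",
    "Glyma.20G092400-zF": "ATGGCCTCCAACGGCG",
    "Glyma.20G092400-zR": "AGCCGAAAGAAGAGCACAAGTAAACC",
}


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    for name, seq in DETECTION_PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_fraction(seq):.2f}")
```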

iv. qRT-PCR identification of T ₃ generation transgenic Arabidopsis thaliana

Total RNA of Arabidopsis rosette leaves was extracted and reverse transcribed into cDNA. The expression level of Glyma. 20G092400 in transgenic Arabidopsis was determined using the primer sequences shown in Table 16. AtACTIN2 was used as an internal reference gene.

Table 16. RT-qPCR primers for detecting transgenic Arabidopsis

Primer	Sequence (5’→3’)	SEQ ID NO
Glyma. 20G092400-qF	CTGATGCTCAAAAGCTTAGGACCCG	SEQ ID NO: 29
Glyma. 20G092400-qR	AACCTTGTTGTAAACCTGACGAGAAAT	SEQ ID NO: 30
AtACTIN2-qF	CCGGTATTGTGCTGGATTCT	SEQ ID NO: 33
AtACTIN2-qR	TTCTCGATGGAAGAGCTGGT	SEQ ID NO: 34

v. Determination of total nitrogen content in Arabidopsis seeds of Arabidopsis mutants and transgenic plants

Nitrogen content of the seeds, which reflects the protein content of the seeds, was determined using the Kjeldahl reagents described in Table 7. 0.1 mol/L HCl was prepared and calibrated against 0.1 mol/L Na₂CO₃. 1% H₃BO₃ was prepared, and the pH was calibrated to within a range of pH 4 to pH 5. 7 mL of 0.1% methyl red and 10 mL of 0.1% bromophenol green indicator were added for every 1 L of H₃BO₃, and the solution appeared wine red. 40% NaOH was prepared for the determination.

The seeds were placed in an oven at 60℃ for 12-14 hours. A 0.1 g sample (accurate to 0.001 g) was poured into a 50 mL digestion tube through a paper trough. Each sample was tested 3 times. 5 mL of concentrated sulfuric acid and a small amount of catalyst (potassium sulfate and copper sulfate, 5:1) were added to digest each sample in an oven at 400℃ for 90 minutes. The sample was then taken out of the oven and allowed to cool, and the total nitrogen content was determined with a FOSS automatic Kjeltec 2300.
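The Kjeltec instrument reports total nitrogen directly, and the source does not give the underlying arithmetic. As an illustrative sketch only, the snippet below shows the conventional Kjeldahl titration calculation, in which the acid consumed by the sample (minus a blank) is converted to a mass of nitrogen. The nitrogen-to-protein factor of 6.25 is a commonly used general value and is an assumption here; the source does not state which factor, if any, was applied. All function names and the example titration volumes are hypothetical.

```python
N_MOLAR_MASS = 14.007  # g/mol, atomic mass of nitrogen


def nitrogen_percent(v_sample_ml: float, v_blank_ml: float,
                     acid_molarity: float, sample_mass_g: float) -> float:
    """Percent nitrogen (w/w) from a Kjeldahl titration with a monoprotic acid."""
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    mmol_n = (v_sample_ml - v_blank_ml) * acid_molarity  # mL x mol/L = mmol
    mg_n = mmol_n * N_MOLAR_MASS
    return mg_n / (sample_mass_g * 1000.0) * 100.0


def crude_protein_percent(n_percent: float, factor: float = 6.25) -> float:
    """Convert percent nitrogen to percent crude protein.

    ASSUMPTION: factor 6.25 is a general default, not a value from the source.
    """
    return n_percent * factor


if __name__ == "__main__":
    # Hypothetical titration of a 0.1 g sample with 0.1 mol/L HCl.
    n = nitrogen_percent(v_sample_ml=4.1, v_blank_ml=0.1,
                         acid_molarity=0.1, sample_mass_g=0.1)
    print(f"N = {n:.2f} %, crude protein = {crude_protein_percent(n):.1f} %")
```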

vi. Determination of fatty acid content in Arabidopsis seeds of Arabidopsis mutants and transgenic plants

The content of fatty acids in seeds was determined by gas chromatography as follows. The seeds were placed in an oven at 105℃ for 20-30 minutes, and then at 65℃ for 12-14 hours. 5 replicate tests were performed for each sample. In each test, about 5 mg of the seed sample was mixed with 1 mL of 2.5% concentrated sulfuric acid in methanol and 5 μL of 50 mg/mL BHT (2,6-di-tert-butyl-4-methylphenol). 50 μL of 10 mg/L heptadecanoic acid or acetic acid was used as the internal standard. The tubes containing the samples were immediately sealed and placed into a water bath at 85℃ for 1.5 h. Each tube was inverted every 10 minutes to mix the sample and reagents thoroughly, and then allowed to cool to room temperature. 160 μL of 9% NaCl solution and 700 μL of n-hexane were then added to the tube, and the mixture was vortexed for 3 minutes and centrifuged at 4,500 rpm for 10 minutes at room temperature. 400 μL of the supernatant of each sample was placed into a new centrifuge tube and dried overnight in a fume hood. 400 μL of ethyl acetate was then added to the dry pellet to fully dissolve it before the measurement.

An Agilent 6890 gas chromatograph was used with a 30 m × 320 μm × 0.25 μm column. Additional operational parameters were: carrier gas, nitrogen 60 mL/min; hydrogen 60 mL/min; air 450 mL/min; injection volume, 1 μL; split injection mode; split ratio 10:1; injection port temperature 170℃. The temperature program consisted of holding at 180℃ for 1 min, increasing to 250℃ at a rate of 25℃/min, and holding for 7 min.
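The temperature program above implies a fixed run time per injection. A minimal sketch of that arithmetic follows; the function name is hypothetical, and equilibration or cool-down time between injections is not included because the source does not mention it.

```python
def program_minutes(initial_hold_min: float, start_c: float, end_c: float,
                    ramp_c_per_min: float, final_hold_min: float) -> float:
    """Total duration of a single-ramp temperature program, in minutes."""
    if ramp_c_per_min <= 0:
        raise ValueError("ramp rate must be positive")
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min


if __name__ == "__main__":
    # Values taken from the program described in the text:
    # 1 min at 180 C, ramp to 250 C at 25 C/min, then hold 7 min.
    total = program_minutes(1, 180, 250, 25, 7)
    print(f"run time per injection: {total:.1f} min")
```

With the stated values the ramp takes 2.8 min, giving a program of about 10.8 min per injection.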

Calculation formula of absolute quantity:

Ai is the peak area of the ith fatty acid component, As is the peak area of internal standard, ms is the mass of internal standard, m is the dry weight of the sample.

Relative quantity calculation formula:
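The formulas themselves are not reproduced in this text. As a hedged sketch, the snippet below implements the conventional internal-standard calculation that is consistent with the symbols defined above (Ai, As, ms, m): the absolute quantity is taken as (Ai / As) × ms / m, and the relative quantity as the share of each peak area in the summed peak areas of all fatty acid components. Both expressions are assumptions, as is a detector response factor of 1; the function names and the example peak areas are hypothetical.

```python
def absolute_content(a_i: float, a_s: float, m_s: float, m: float) -> float:
    """Absolute quantity of one fatty acid per unit dry weight.

    a_i: peak area of the i-th fatty acid component
    a_s: peak area of the internal standard
    m_s: mass of the internal standard
    m:   dry weight of the sample
    ASSUMPTION: (a_i / a_s) * m_s / m, with a response factor of 1.
    """
    if a_s <= 0 or m <= 0:
        raise ValueError("internal standard area and sample mass must be positive")
    return (a_i / a_s) * m_s / m


def relative_content(peak_areas: dict) -> dict:
    """Percent share of each fatty acid in the total fatty acid peak area.

    ASSUMPTION: the internal standard peak is excluded from peak_areas.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: area / total * 100.0 for name, area in peak_areas.items()}


if __name__ == "__main__":
    # Hypothetical peak areas, for illustration only.
    areas = {"C16:0": 10.0, "C18:1": 30.0}
    print(relative_content(areas))
    print(absolute_content(a_i=200.0, a_s=100.0, m_s=0.5, m=5.0))
```

The result of `absolute_content` carries the unit of ms divided by the unit of m, for example μg of fatty acid per mg of dry seed.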

vii. Transformation of soybean cotyledon nodes

Soybean cotyledon nodes were transformed and cultivated using the following protocol:

Preparation of the Agrobacterium tumefaciens and soybean cotyledon. Take the Agrobacterium tumefaciens containing the expression vector out of -80℃ storage, inoculate it into 10 mL of LB liquid medium containing spectinomycin, and culture it overnight at 28℃ at 160 rpm. Transfer 100 μL of this starter culture to 100 mL of fresh LB liquid medium containing spectinomycin and culture at 28℃ with shaking at 200 rpm to OD₆₀₀ = 0.8. Centrifuge at 4000 rpm for 10 minutes at room temperature, discard the supernatant, resuspend the bacteria in 100 mL of co-culture liquid medium (LCCM), and incubate at 28℃, 200 rpm for 30 minutes for subsequent transformation.

Sterilize soybean seeds by the following procedure. Select full and undamaged seeds and place them in a petri dish. Put the petri dish with the selected seeds and a beaker into an airtight container in the fume hood, open the lid of the petri dish, add sodium hypochlorite and hydrochloric acid to the beaker at a ratio of 94:6, and quickly seal the airtight container. Turn on the fume hood and sterilize the seeds in the sealed container for 10-12 hours. After sterilization, take the seeds out and air them to remove the chlorine attached to the seed surface and avoid damage to the seeds. Add just enough sterilized water to the soybean seeds for them to complete imbibition. Keep the seeds in the dark for 12-14 h.

Co-culture. Divide the seed into two halves along the hypocotyl with a razor blade and use a razor blade to lightly scratch 2-3 points at the cotyledon node to make a cut. Put the explants into the prepared Agrobacterium resuspension, incubate at 160 rpm at 28℃ for 30 min to facilitate the Agrobacterium infection, and remove the infected explants from the resuspension with tweezers. Place it on the SCCM covered with filter paper and incubate for 3-5 days at 25℃ in the dark.

Induction of clumping buds. After 3-5 days of co-cultivation, once the hypocotyls have enlarged, cut the hypocotyls of the explants with a blade, leaving about 2 mm of hypocotyl. Wash the trimmed explants in sterilized water several times until the liquid is clear, in order to remove excess bacterial liquid. Place the trimmed explants on sterile paper to absorb the remaining liquid on the surface and insert the explants into the SIM⁺ with tweezers. Set the conditions of the sterile tissue culture room to 25℃, 16 h/8 h light/darkness, and keep the screening medium plates with explants in the sterile tissue culture room for about 14 days. Observe the growth of the clump buds; take out the slow-growing clump buds, scratch the wound at the bottom again, and insert them into a new SIM⁺. The well-growing ones are transferred to the bud elongation medium (SEM).

Elongation of cluster buds. Cut the sprouting buds, insert them into the SEM, and place them in a sterile tissue culture room for 2 weeks at 25℃, 16 h/8 h light/darkness. Clump buds that have not produced buds are taken out of the SEM, lightly scratched at the bottom to create a new wound, and then inserted into a new SEM for secondary culture. The culture cycle is about 14 days, and the process is repeated.

Identification of positive elongated buds. When the buds are about 5 cm long and there are about 3 leaves, select a leaf, and perform Bar test strip test as described below to preliminarily determine the positive seedlings.

Rooting of positive elongated buds. The positive buds were cut from the clumping buds, dipped in IBA hormone for 30 s, and inserted into the rooting medium (RM) at 25℃, 16 h/8 h light/darkness. The rooting culture was carried out under dark conditions, and the buds were cultured in a sterile tissue culture room until they took root.

Transplanting and cultivation of positive seedlings. The positive seedlings were taken out from the culture medium, and the roots were cleaned with clean water to remove the residual culture medium. The positive seedlings were transplanted into the soil and cultured in the plant greenhouse.

viii. PCR identification of T1 generation transgenic soybean

Leaf DNA of the T₁ transgenic soybean was extracted following the DNA extraction method described above. Glyma. 20G092400 gene primers and Bar primers were used to identify transgenic soybeans by PCR. The related primer sequences are shown in Table 15.

ix. qRT-PCR identification of T ₁ generation transgenic soybean

Leaves from the soybean plants were placed in an RNase-free EP tube and frozen in liquid nitrogen. Total RNA was extracted and reverse transcribed into cDNA. qRT-PCR was performed with the primers in Table 16 to identify T₁ generation transgenic soybean by analyzing the expression of Glyma. 20G092400. The internal reference gene was GmActin4 (Genbank No: AF049106).

x. Determination of protein, oil and fatty acid content in transgenic soybean seeds

An InfraTec™ 1241 Grain Analyzer (FOSS Analytics) was used to determine the protein and oil content of soybean seeds. Each sample was measured 3-5 times, and the average value was used for phenotypic data analysis.

The content of fatty acids in seeds was determined by gas chromatography and calculated as described in section vi above.

Example 3. Analysis of candidate genes tissue expression

Expression profiles of the candidate genes Glyma. 20G092000, Glyma. 20G092100, Glyma. 20G092400 and Glyma. 20G094900 were analyzed by RT-qPCR (FIG. 9) in WT SN14. RNA was extracted from roots, stems, leaves, flowers, pods and seeds (herein also referred to as grains) of SN14. Expression of the candidate genes was analyzed in 8 developmental stages of the grain: Glob, Hrt, Cot, EM1, EM2, MM, LM and DS (Table 1). The results showed that all candidate genes were expressed in the tested tissues, and each showed its highest expression level at a particular developmental stage of the grain. Glyma. 20G092000, Glyma. 20G092400 and Glyma. 20G094900 had the highest expression levels in the LM stage of the grain. Glyma. 20G092100 had the highest expression level in the DS stage of the grain (FIG. 9, upper right panel). The expression level of Glyma. 20G092000 in the grain at the Cot, LM and DS stages was higher than that in the non-grain tissues and organs (FIG. 9, upper left panel). The expression level of Glyma. 20G092100 in the grain at the Cot, EM1, MM, LM and DS stages was higher than that in the non-grain tissues and organs, i.e., root, stem, leaf, flower and pod. The expression level of Glyma. 20G092400 in six developmental stages of the grain (Cot, EM1, EM2, MM, LM and DS) was higher than that in the non-grain tissues and organs (FIG. 9, lower left panel). The expression level of Glyma. 20G094900 at the LM and DS stages was higher than that in the non-grain tissues and organs (FIG. 9, lower right panel). Therefore, it is speculated that Glyma. 20G092000, Glyma. 20G092100, Glyma. 20G092400 and Glyma. 20G094900 play an important regulatory role during grain development.

Example 4. Protein structure analysis of candidate genes

The domains of the proteins encoded by the Glyma. 20G092000, Glyma. 20G092400, Glyma. 20G094900 and Glyma. 20G092100 genes were analyzed using the NCBI database. The results showed that Glyma. 20G092000 belongs to the retroviral protease superfamily, which includes the pepsin-like aspartic proteases of cells and retroviruses, and also has sphingolipid activator-like protein type B, region 1 and region 2 (FIG. 3). Glyma. 20G092100 belongs to the PPR repeat family (FIG. 4). This repeat has no known function. It is about 35 amino acids long, and up to 18 copies are found in some proteins. Glyma. 20G092400 belongs to the amino acid transferase-V family, and this protein contains an amino acid transferase domain and other enzyme domains including a cysteine desulfurase domain (FIG. 5). Glyma. 20G094900 belongs to the DUF1336 superfamily and is a protein with unknown function (FIG. 6). This family represents the C-terminus of many hypothetical proteins with unknown function.

Example 5. Tissue-specific expression analysis of Glyma. 20G092400

RNA was extracted from the organs (roots, stems, leaves, flowers, pods and seeds) of SN14 and reverse transcribed into cDNA, which was analyzed by qRT-PCR. Glyma. 20G092400 was expressed in all tissues and organs: expression was very low in roots, relatively high in stems, leaves and flowers, higher in pods, and highest in seeds, where it reached a relative expression level of more than 5-fold.

To analyze tissue-specific expression of Glyma. 20G092400, RNA was extracted, cDNA was synthesized, and qRT-PCR was performed using the specific primers provided in Table 16 shown above in Example 2. The reference gene was GmActin4 (GenBank No: AF049106).
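The text does not state how relative expression was calculated from the qRT-PCR data. One widely used option is the comparative Ct (2^-ΔΔCt) method, sketched below purely as an assumption. The function name, the Ct values and the choice of root as the calibrator sample are all hypothetical; only the use of GmActin4 as the reference gene comes from the text.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target gene versus a calibrator sample (2^-ddCt).

    ct_target / ct_reference: Ct values in the sample of interest
    (target gene and reference gene, e.g. GmActin4).
    ct_target_cal / ct_reference_cal: the same two Ct values in the
    calibrator sample (assumed here to be root tissue).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values, for illustration only (not data from this disclosure).
seed_fold = relative_expression(
    ct_target=22.0, ct_reference=18.0,          # seed sample
    ct_target_cal=24.5, ct_reference_cal=18.0,  # root calibrator
)
print(f"seed vs. root fold change: {seed_fold:.2f}")
```

The method assumes roughly equal amplification efficiency for the target and reference primers; if that does not hold, an efficiency-corrected model would be more appropriate.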

Example 6. Subcellular localization of Glyma. 20G092400

Tobacco cultivation. Tobacco planting soil was prepared by mixing flower nutrient soil with vermiculite at a ratio of 3:1. After germination, the seedlings were transferred to new small flowerpots, one plant per pot, placed in an incubator (22℃, 16 h/8 h light/dark, 70 μmol·m⁻²·s⁻¹) for cultivation, and watered once every 2 days to ensure adequate water.

Agrobacterium injection of tobacco leaves. Agrobacterium tumefaciens containing the expression vector (pSOY1-Glyma. 20G092400-GFP) or an expression vector encoding a GFP protein (as a control) was inoculated into 10 mL of LB liquid medium (containing the corresponding antibiotics) and cultured at 28℃ with shaking at 200 rpm until OD600=0.8. The bacterial culture was transferred in batches to 1.5 mL sterilized EP tubes and centrifuged at 10,000 rpm for 1 min at room temperature to collect the bacteria. To prepare the resuspension buffer, about mL MES+500μL MgCl₂ was made up to 50 mL with sterile water. The bacteria were resuspended in 1 mL of the resuspension buffer and centrifuged at 10,000 rpm at room temperature for 1 min, and the supernatant was discarded. The bacteria were resuspended, washed and centrifuged again, and acetosyringone was added (final concentration 40 mg/L). The bacterial suspension was transferred into an EP tube, buffer was added to adjust it to OD600=0.2, and the mixture was left at room temperature for 1 hour. Robust tobacco plants were selected after 3 weeks of growth, and their leaves were injected with the Agrobacterium suspension using a syringe needle. The inoculated Nicotiana benthamiana plants were kept in an incubator (22℃, 16 h light/8 h dark, 70 μmol·m⁻²·s⁻¹) for 48 h, and the subcellular localization of the pSOY1-Glyma. 20G092400-GFP fusion protein was then observed under a confocal microscope. FIG. 11 shows detection of the green fluorescence of the pSOY1-Glyma. 20G092400-GFP fusion protein in the nucleus, indicating that the protein encoded by Glyma. 20G092400 is localized in the nucleus.
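The adjustment of the bacterial suspension to OD600=0.2 described above is a simple dilution (C1·V1 = C2·V2). The sketch below is a minimal illustration that assumes optical density scales linearly with cell density over this range. The function name and the 1 mL final volume are hypothetical, and the starting OD of 0.8 is reused only as an example, because the OD of the washed, resuspended cells would need to be measured again.

```python
def dilution_volumes(od_stock, od_target, final_volume_ml):
    """Return (suspension_ml, buffer_ml) needed to reach od_target.

    Assumes OD600 is proportional to cell density (C1 * V1 = C2 * V2).
    """
    if od_stock <= 0 or od_target <= 0 or final_volume_ml <= 0:
        raise ValueError("OD values and volume must be positive")
    if od_target > od_stock:
        raise ValueError("cannot dilute to a higher OD than the stock")
    suspension_ml = od_target * final_volume_ml / od_stock
    buffer_ml = final_volume_ml - suspension_ml
    return suspension_ml, buffer_ml


# Example: suspension measured at OD600 = 0.8, target 0.2, 1 mL final volume.
suspension_ml, buffer_ml = dilution_volumes(0.8, 0.2, 1.0)
print(f"suspension: {suspension_ml:.2f} mL, buffer: {buffer_ml:.2f} mL")
```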

Example 7. Cloning and Vector Construction of Glyma. 20G092400

Total RNA extraction from soybean SN14 leaves. RNA from young, tender SN14 trifoliate leaves was extracted by the TRIzol method. Electrophoresis on a 2% agarose gel showed three bands (28S, 18S and 5S), indicating that the integrity of the RNA was good. The cDNA was obtained by reverse transcription and used for Glyma. 20G092400 gene cloning.

Glyma. 20G092400 clone. The CDS sequence of Glyma. 20G092400 was obtained from the Phytozome database. The CDS sequence is 1338 bp in length. Cloning primers were designed at both ends of the CDS sequence, with the termination codon (TGA) removed (Table 12). A second primer pair was designed to carry the restriction sites (SpeI and BamHI) found at both ends of the ccdB gene in the entry vector. First, cDNA from soybean Suinong 14 (SN14) leaves was used as a template to clone the full-length CDS sequence of the Glyma. 20G092400 gene (without the termination codon) with the CDS primers. This product was then used as a template for PCR with the restriction-site primers to obtain Glyma. 20G092400 with restriction sites at both ends. The products with restriction sites were recovered with a gel recovery kit for subsequent experiments. The CDS sequence was amplified using the following primers.

Construction of entry vector (Fu28-Glyma. 20G092400). The Fu28 empty vector and the target gene were digested with restriction endonucleases (SpeI and BamHI), and the digested products were ligated with Solution I ligase. The ligation product was transformed into E. coli DH5α competent cells, which were cultured on a chloramphenicol-containing plate for about 16 hours. A single colony was picked and cultured. The presence of the Glyma. 20G092400 insert in the transformed bacteria was verified by PCR using the primers shown in Table 10. Gel electrophoresis detected the target band at 1338 bp (data not shown). The presence of the Glyma. 20G092400 insert was further confirmed by sequencing.

Construction of expression vector (pSOY1-Glyma. 20G092400). The Fu28-Glyma. 20G092400 and pSOY1 vector plasmids were extracted. The Fu28-Glyma. 20G092400 plasmid additionally carries a green fluorescent protein (GFP) from Fu28. The plasmids were recombined by LR reaction (Table 16), and the products were transformed into E. coli DH5α and cultured. A single colony was picked and cultured. The presence of the Glyma. 20G092400 insert in the bacteria was confirmed by PCR using the primers shown in Table 12. The target band of 1338 bp was detected by gel electrophoresis (data not shown). The presence of the Glyma. 20G092400 insert was further confirmed by sequencing, and the sequence was consistent with the Glyma. 20G092400 gene sequence.
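The restriction-site cloning strategy described above (removing the termination codon, then adding SpeI and BamHI sites to the ends of the CDS) can be illustrated with a short sketch. This is not the primer design actually used, which is given in the primer tables. The toy sequence, the function names, the annealing length and the assignment of SpeI to the forward primer and BamHI to the reverse primer are all assumptions. Only the recognition sequences themselves (SpeI: ACTAGT, BamHI: GGATCC) are standard facts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
SPEI_SITE = "ACTAGT"   # SpeI recognition sequence
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def strip_stop_codon(cds):
    """Remove a terminal stop codon so a C-terminal tag can be fused in frame."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3")
    return cds[:-3] if cds[-3:] in STOP_CODONS else cds


def add_site_primers(cds_no_stop, anneal_len=20):
    """Primers whose 5' ends carry the restriction sites (orientation assumed)."""
    forward = SPEI_SITE + cds_no_stop[:anneal_len]
    reverse = BAMHI_SITE + reverse_complement(cds_no_stop[-anneal_len:])
    return forward, reverse


# Toy CDS for illustration only; it is NOT the Glyma. 20G092400 sequence.
toy_cds = "ATGGCTTCTAAAGGTGAATGA"
trimmed = strip_stop_codon(toy_cds)
forward_primer, reverse_primer = add_site_primers(trimmed, anneal_len=6)
print(forward_primer, reverse_primer)
```

In practice a few protective bases are usually added 5' of each site so the enzymes cut efficiently near the end of the PCR product, and the insert must be checked for internal SpeI or BamHI sites. Both steps are omitted here for brevity.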

Table 16. LR reaction

Transfer of the expression vector into Agrobacterium tumefaciens EHA105. EHA105 Agrobacterium competent cells were first transformed with pSOY1-Glyma. 20G092400, the transformed bacterial cells were grown on a YEP plate containing both rifampicin and spectinomycin, and single colonies were selected. The transformation was confirmed by PCR, as indicated by the presence of a 1338 bp DNA fragment (Glyma. 20G092400, data not shown), showing that the expression vector (pSOY1-Glyma. 20G092400) had been transferred into Agrobacterium tumefaciens EHA105.

Transient expression localization of Glyma. 20G092400. Tobacco leaves were injected with the transformed Agrobacterium tumefaciens. After 48 hours, the injected leaves were cut, and the epidermis was removed. The epidermal pieces were spread out in clean water, placed on a glass slide and covered with a cover glass. The subcellular localization of the pSOY1-Glyma. 20G092400-GFP fusion protein was observed under a confocal microscope. The results are shown in FIG. 11: the green fluorescence (bright lines in the top left panel) of pSOY1-Glyma. 20G092400-GFP appears at the cell membrane and in the nucleus, indicating that the protein encoded by Glyma. 20G092400 localizes to both the cell membrane and the nucleus.

Example 8. Expressing Glyma. 20G092400 in Arabidopsis

i. Selection of Arabidopsis mutants

Arabidopsis AT5G26600 is highly homologous to soybean Glyma. 20G092400. We obtained the amino acid sequence of Arabidopsis AT5G26600 from the Phytozome database (phytozome. jgi. doe. gov/pz/portal. html#) and aligned it with the amino acid sequence of Glyma. 20G092400. The percentage identity of the amino acid sequences of Glyma. 20G092400 and Arabidopsis AT5G26600 was found to be about 75.8%, and the two proteins have the same conserved domains and both belong to the amino acid transferase-V family (FIG. 7, FIG. 18). Therefore, the Arabidopsis AT5G26600 gene mutant, SALK_021984C, was purchased through ABRC (abrc. osu. edu/) as the soybean Glyma. 20G092400 mutant in Arabidopsis for subsequent experiments.
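For readers unfamiliar with how a percentage identity is derived from an alignment, the following minimal sketch counts identical residues over the aligned, ungapped columns of two pre-aligned sequences. The alignment tool and the denominator used for the figure reported above are not stated, so the convention chosen here (ungapped columns) is an assumption, and the two short sequences are invented toy strings rather than the AT5G26600 or Glyma. 20G092400 proteins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy pre-aligned fragments, for illustration only.
toy_identity = percent_identity("MKT-AYLG", "MKSQAY-G")
print(f"{toy_identity:.1f}% identity")
```

Other conventions, such as dividing by the full alignment length or by the length of the shorter sequence, give different percentages for the same alignment, so identity values are only comparable when the convention is the same.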

ii. PCR identification of Arabidopsis mutants

Arabidopsis mutant SALK_021984C and the Arabidopsis wild-type Col-0 (WT) plants were planted, and DNA was extracted from the rosette leaves of the plants. PCR was performed with the LP+RP and LP+BP primer combinations shown in Table 13. The length of the LP+BP product was about 813 bp (data not shown), and gel electrophoresis analysis indicated that the mutant was homozygous.
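The logic usually applied when reading such a two-reaction T-DNA genotyping gel can be written out explicitly. The sketch below assumes that the LP+RP reaction amplifies only the uninterrupted wild-type allele and that the border-primer reaction (LP+BP in the text above) amplifies only the insertion allele. The function name and the returned labels are hypothetical.

```python
def call_genotype(wild_type_band, insertion_band):
    """Interpret band presence (True/False) from the two PCR reactions.

    wild_type_band: product seen with the gene-specific pair (LP+RP).
    insertion_band: product seen with the T-DNA border primer reaction.
    """
    if wild_type_band and insertion_band:
        return "heterozygous"
    if insertion_band:
        return "homozygous mutant"
    if wild_type_band:
        return "wild type"
    return "no amplification (repeat PCR)"


# A plant showing only the border-primer product is scored as homozygous.
print(call_genotype(wild_type_band=False, insertion_band=True))
```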

iii. PCR identification of homozygous Arabidopsis mutants

In order to determine whether the target gene is transcribed at the mRNA level in the Arabidopsis mutant SALK_021984C, total RNA was extracted from leaves of the mutant and reverse transcribed to obtain cDNA, and the cDNA was used as a template for RT-PCR amplification. The RT-PCR product was detected by 1.5% agarose gel electrophoresis. The results showed that transcription of the target gene was not detected in the Arabidopsis mutant, while transcription was detected for the internal control, AT18sRNA (data not shown). This result further verified that the mutant is a homozygous mutant of Arabidopsis.

iv. Basta screening of transgenic Arabidopsis thaliana replenishment and overexpression plants

In order to verify the function of Glyma. 20G092400 in Arabidopsis, we used the Agrobacterium inflorescence infection method to transform the expression vector (pSOY1-Glyma. 20G092400) into Arabidopsis wild-type ecotype Col-0 (WT) and mutant SALK_021984C to produce overexpression plants (pSOY1-Glyma. 20G092400) and replenishment plants (pSOY1-Glyma. 20G092400/SALK_021984C), respectively. After Agrobacterium infection, mature T0 generation seeds of Arabidopsis thaliana were harvested, mixed with fine sand and sown in the prepared soil. After one week, when the two young leaves of the T1 generation plants were fully developed, the plants were sprayed evenly with Basta solution (dilution 1:1000) once every other day. After 2-3 sprayings, the non-transgenic Arabidopsis plants withered and gradually died, while the transgenic Arabidopsis plants grew normally and remained green (data not shown). Positive plants, identified by their green leaves as the seedlings grew, were transferred to new small pots. These plants were preliminarily identified as transgenic Arabidopsis replenishment or overexpression plants.

v. Bar test strip detection and PCR identification of transgenic Arabidopsis thaliana replenishment and overexpression plants

Transgenic Arabidopsis T1, T2 and T3 generation plants were selected and planted. Leaf extract was prepared as described above. A Bar test strip (Linear Chemicals) was inserted into the extract in the direction specified in the manufacturer's instructions. The Bar test strips showed two clear bands for the leaves of transgenic plants (overexpression: pSOY1: Glyma. 20G092400; replenishment: pSOY1: Glyma. 20G092400/SALK_021984C) (data not shown). At the same time, the total DNA of Arabidopsis leaves was extracted, and the full-length primers for the CDS sequence of Glyma. 20G092400 and Bar primers were used for PCR identification of transgenic plants. The results showed that the target bands for Glyma. 20G092400 (e.g., 1338 bp) and for Bar (e.g., 516 bp) were absent in the control plants but present in the replenishment and overexpression plants. The Bar test strip and PCR results confirmed that the Arabidopsis plants were genetically modified.

vi. RT-qPCR identification of T₃ transgenic Arabidopsis

Arabidopsis wild-type (WT) Col-0, mutant plants (SALK_021984C), replenishment plants (pSOY1: Glyma. 20G092400/SALK_021984C) and overexpression plants (pSOY1: Glyma. 20G092400) were planted under the same conditions as described for Arabidopsis above. The total RNA of plant leaves was extracted and reverse transcribed to obtain cDNA as described above. The cDNA was used as a template for RT-qPCR amplification using Glyma. 20G092400-specific primers to identify the transgenic Arabidopsis thaliana plants. The results showed that Glyma. 20G092400 was not expressed in wild-type Col-0 or mutant plants, but was expressed in replenishment plants and overexpression plants. As shown in FIG. 12, the expression level of Glyma. 20G092400 in overexpression plants was higher than that in replenishment plants. The results indicate that the mutation in Arabidopsis AT5G26600 (a homolog of Glyma. 20G092400) significantly reduces its expression, which may be rescued by reintroducing an exogenous copy of Glyma. 20G092400 as shown herein. The AT5G26600 and Glyma. 20G092400 polypeptides share 61% amino acid sequence identity.

vii. Investigation of bolting of T₃ generation transgenic Arabidopsis

Arabidopsis wild-type Col-0, mutant plants (SALK_021984C), replenishment plants (pSOY1: Glyma. 20G092400/SALK_021984C) and overexpression plants (pSOY1: Glyma. 20G092400) were planted under the same conditions. After 25 days, the plants were examined for bolting. Bolting occurs when a crop prematurely grows flower stalks and produces seeds. The results showed that wild-type Col-0 plants, replenishment plants and overexpression plants bolted earlier than mutant plants, and that the bolting height (dotted line) of wild-type Col-0 and replenishment plants was about the same. In contrast, overexpression plants appeared to have the maximum bolting height (arrow) (FIG. 13), which indicates that the Glyma. 20G092400 gene may play a role in promoting plant bolting.

viii. Investigation of inflorescences of T₃ generation transgenic Arabidopsis

Arabidopsis wild-type Col-0, mutant plants (SALK_021984C), replenishment plants (pSOY1: Glyma. 20G092400/SALK_021984C) and overexpression plants (pSOY1: Glyma. 20G092400) were planted under the same conditions. After 35 days, the plants were examined for inflorescences. The results showed that wild-type Col-0, replenishment plants and overexpression plants had more inflorescences (arrows) than mutant plants. Further, overexpression plants had the most inflorescences (left) (FIG. 14). It is speculated that the Glyma. 20G092400 gene may promote the growth of plant inflorescences.

ix. Determination of fatty acids and total nitrogen in T3 transgenic Arabidopsis seeds

Arabidopsis wild-type Col-0, mutant plants (SALK_021984C), replenishment plants (pSOY1: Glyma. 20G092400/SALK_021984C) and overexpression plants (pSOY1: Glyma. 20G092400) were planted under the same conditions. After the seeds matured, the fatty acid content of the grains was determined by gas chromatography. The results showed that the content of the individual fatty acids and the total fatty acid content of the grains of the Glyma. 20G092400 overexpression plants were significantly higher than those of the control plants (FIGS. 15A-B). Additionally, the content of the individual fatty acids and the total fatty acid content of the grains of the mutant plants were significantly lower than those of the control plants. The Glyma. 20G092400 replenishment experiment was further carried out on the mutant plants. The results showed that the palmitic acid, linoleic acid, linolenic acid and eicosenoic acid contents of the replenishment lines were significantly higher than those of the control plants, and that the stearic acid and oleic acid contents were lower than those of the control plants (FIG. 15A). We also found that the replenishment lines had harder grains (data not shown). The fatty acid and oleic acid contents were significantly higher in the control plants than in the mutant plants (FIG. 15A). Moreover, the total fatty acid content of the grains of the replenishment lines was significantly higher than that of the control plants (FIG. 15B). The results suggest that Glyma. 20G092400 can promote the accumulation of fatty acids in grains.

Example 9. Expressing Glyma. 20G092400 in soybean

i. Bar test strip detection and PCR identification of T₁ transgenic soybean

The T₁ genetically modified soybeans were planted, and the leaves were crushed and tested using the Bar test strip as described above. Two horizontal lines appeared on the Bar test strip for the overexpressing plants (data not shown), indicating that the tested plants were genetically modified soybean plants. The overexpressing plants were also verified by PCR using the full-length primers for the CDS sequence of Glyma. 20G092400 (1338 bp) and Bar primers (516 bp) (data not shown), confirming that the tested plants were transgenic soybean plants.

ii. qRT-PCR identification of T1 transgenic soybeans

The transgenic soybean (overexpression plant pSOY1: Glyma. 20G092400) and the control wild-type plant Dongnong 50 (DN50) (WT) were planted under the same conditions. Total RNA was extracted from young leaves and reverse transcribed into cDNA. The expression level of Glyma. 20G092400 was tested by qRT-PCR using Glyma. 20G092400-specific primers. The results showed that the expression level of Glyma. 20G092400 in the overexpression plants was higher than that in the control plants, indicating that Glyma. 20G092400 was successfully transformed into soybean plants (FIG. 16).

iii. Determination of protein and fatty acids in T1 transgenic soybean grains

The transgenic soybean (overexpression plant pSOY1: Glyma. 20G092400) and the control plant DN50 were planted under the same conditions, their mature T1 seeds were harvested, and some of the seeds were dried for phenotyping. The grain protein and oil content were determined by Kjeldahl nitrogen determination, and the fatty acid content was determined by gas chromatography (e.g., as disclosed in Rapid Commun Mass Spectrom. 2007; 21(12): 1937-43). The protein, oil and fatty acid contents in the overexpression plants were significantly higher than those in the control plants, indicating that Glyma. 20G092400 promoted quality traits (protein and oil content) (FIG. 17).

Example 10. Distribution of protein and oil content of CSSLs population

A FOSS grain analyzer (INFRATEC1241) was used to determine the seed protein and oil content of the CSSLs population (2013-2015). Three biological replicates were measured for each test material, and the average value was used for protein and oil content phenotypic data analysis. FIG. 1 presents a histogram of the density distribution of protein (upper three panels) and oil (lower three panels) content. The range of protein content is about 37.00%-46.77%, and the range of oil content is about 18.02%-23.19%. The results are consistent with a normal distribution and are suitable for quantitative trait locus (QTL) mapping of protein and oil content. As described herein, QTL mapping refers to a genome-wide inference of the relationship between genotype at various genomic locations and phenotype for a set of quantitative traits in terms of the number, genomic positions, effects and interaction of QTL. The X-axis represents seed protein and oil content, the Y-axis represents the density of the frequency distribution, and the solid line represents the normal curve of the CSSLs population. In each panel, the left arrow represents the location of the wild soybean (ZYD00006) protein and oil content, and the right arrow represents the location of the SN14 protein and oil content.
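The replicate averaging described above, followed by a rough look at the shape of the resulting distribution, might be sketched as follows. How agreement with the normal distribution was actually assessed is not stated, so the use of sample skewness here is an assumption. The line names and all measurement values are invented for illustration.

```python
from statistics import mean, pstdev


def line_means(replicates_by_line):
    """Average the biological replicates measured for each line."""
    return {line: mean(values) for line, values in replicates_by_line.items()}


def skewness(values):
    """Population skewness; values near 0 suggest a symmetric distribution."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        raise ValueError("skewness is undefined for constant data")
    return sum(((v - centre) / spread) ** 3 for v in values) / len(values)


# Hypothetical protein contents (%) for three replicates of four lines.
toy_data = {
    "line_A": [40.0, 41.0, 42.0],
    "line_B": [38.5, 39.0, 39.5],
    "line_C": [43.0, 44.0, 45.0],
    "line_D": [41.0, 42.0, 43.0],
}
averages = line_means(toy_data)
print(averages)
print(f"skewness of line means: {skewness(list(averages.values())):.3f}")
```

A formal normality test from a statistics package would be a natural next step for a real data set of this kind.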

Example 11. Fine mapping of QTL (Qpro&oil_Gm20) for soybean protein and oil content

The genome-wide introgression lines (CSSLs) constructed from SN14 and wild soybean (ZYD00006) were used to fine-map the QTL (Qpro&oil_Gm20) for soybean protein and oil content using the 2013-2015 data (Table 17). The results showed that four protein and oil content-related QTLs (Qpro_Gm20_1, Qpro_Gm20_2, Qoil_Gm20_1 and Qoil_Gm20_2) having similar confidence intervals were detected on the same chromosome (Gm20) from 2013 to 2015. The map distance ranged from 0.02 Mb to 0.16 Mb: the minimum map distance, that of Qpro_Gm20_2, is 0.02 Mb, and the maximum map distance, that of Qoil_Gm20_2, is 0.16 Mb. The logarithm of the odds (LOD) value range is 3.72-12.62. The minimum LOD value of Qpro_Gm20_2 is 3.72, and the maximum LOD value of Qoil_Gm20_1 is 15.16. The range of the genetic contribution rate (R²) is 2.27%-22.86%. The minimum genetic contribution rate, that of Qpro_Gm20_2, is 2.27%, and the maximum genetic contribution rate, that of Qpro_Gm20_1, is 22.86%. The range of additive effects is -0.52 to 1.27. The minimum additive effect, that of Qoil_Gm20_1, is -0.52, and the maximum additive effect, that of Qpro_Gm20_1, is 1.27. Since the confidence intervals of the four QTLs are close, they are integrated as the "hot spot" interval (33.54 Mb-34.70 Mb) for the study of protein and oil content-related QTLs. The QTL results are used for the mining and functional analysis of subsequent protein and oil content-related candidate genes. This "hot spot" interval is consistent with MQTLOil-62 (Gm20: 33.14 Mb-33.84 Mb) described in Qi et al., Plant Cell Environ. 41(9): 2109-2127 (2018). In this study, we identified the "hot spot" interval through meta-analysis of 312 oil content QTLs (Table 18), thus further verifying the precision and accuracy of the fine mapping of the protein and oil content QTL (Qpro&oil_Gm20).
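The consistency between the "hot spot" interval and MQTLOil-62 noted above can be checked with a simple interval intersection. The sketch below uses the two physical intervals quoted in the text; the function and variable names are hypothetical.

```python
def interval_overlap(a, b):
    """Return the (start, end) overlap of two intervals in Mb, or None."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    if start >= end:
        return None
    return (start, end)


hot_spot = (33.54, 34.70)     # Gm20 "hot spot" interval from this Example
mqtl_oil_62 = (33.14, 33.84)  # MQTLOil-62 interval cited from Qi et al. (2018)

shared = interval_overlap(hot_spot, mqtl_oil_62)
if shared is not None:
    print(f"shared region: {shared[0]:.2f}-{shared[1]:.2f} Mb "
          f"({shared[1] - shared[0]:.2f} Mb)")
```

With these coordinates the two intervals share the 33.54-33.84 Mb region, about 0.30 Mb.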

Table 17. QTL (Qpro&oil_Gm20) fine mapping of protein and oil content in CSSLs population

Table 18. Meta-analysis of QTL for soybean oil content

Example 12. Candidate gene mining and WEGO analysis in the “hot spot” interval

The "hot spot" interval (33.54Mb-34.70Mb) was obtained by integrating the confidence intervals of the CSSLs population protein and oil content QTL (Qpro&oil_Gm20) finely mapped from 2013 to 2015. The candidate gene mining and Web Gene Ontology (WEGO) analysis were performed on the "hot spot" interval. The results show that there are 130 candidate genes in this “hot spot” interval. There are 112 candidate genes having Gene Ontology (GO) annotations: 98 candidate genes are related to cell composition, 64 candidate genes are related to molecular functions, 72 candidate genes are involved in biological processes (FIG. 2) . Further, GO analysis of 130 candidate genes was carried out using GO analysis tools in Soybase and Quick GO databases, available at Kyoto Encyclopedia of Genes and Genomes (KEGG; www. kegg. jp/) and Gene Ontology (geneontology. org/) . The results showed that: Glyma. 20G092000 is involved in lipid metabolism (GO: 0006629) ; Glyma. 20G092100 is related to the development of embryonic grains (GO: 0009793) and has the functions of protein amino acid binding and glycoprotein binding (GO: 0005515) ; Glyma. 20G092400 has catalytic activity (GO: 0003824) ; Glyma. 20G094900 is related to lipid binding (GO: 0005543) (Table 19) . Therefore, the above four genes are used as candidate genes for protein and oil content analysis. The results suggest that these four genes may be related to the metabolism and synthesis of protein and oil.

Table 19. GO analysis of candidate genes

Example 13. Determination of protein content of soybean grains at different developmental stages

Soybean grain protein content is one of the important traits for measuring soybean quality. In this study, the Kjeldahl method was used to determine the grain protein content of the parent SN14 and the extreme materials (HPLO, LPHO, HPHO and LPLO) in order to analyze the protein accumulation characteristics of soybean grains at different developmental stages (FIG. 7). The results showed that the grains of all five materials had their highest total nitrogen/protein content at the EM1 stage, and that the nitrogen/protein content decreased as grain development progressed. The grain protein content of all five materials showed a sharp downward trend from the EM1 stage to the MM stage. The grain protein content of SN14, HPHO and LPLO was lowest at the MM stage. From the MM stage to the LM stage, the grain protein content of SN14, HPHO and LPLO showed an upward trend, while the HPLO and LPHO grain protein content continued to decrease, with the HPLO grain protein content reaching its lowest level at the LM stage. From the LM stage to the DS stage, the grain protein content of SN14, HPHO, LPHO and LPLO decreased. The LPHO grain protein content maintained a downward trend during the entire grain development process and was lowest at the DS stage. Of note, the two high-protein materials, HPLO and HPHO, had higher protein content than the parent SN14 at all stages of soybean kernel development. Moreover, the two low-protein materials, LPHO and LPLO, had lower protein content than the parent SN14 at all stages of soybean kernel development.

Example 14. Determination of fatty acid content of soybean kernels at different developmental stages

Types of fatty acids in soybean seed oil include palmitic acid (C16:0), stearic acid (C18:0), oleic acid (C18:1), linoleic acid (C18:2) and linolenic acid (C18:3). In this study, the fatty acid content of the parent SN14 and the extreme materials (HPLO, LPHO, HPHO and LPLO) was measured to analyze the fatty acid accumulation characteristics of soybean grains at different developmental stages (FIGS. 8A-F). The results showed that fatty acids were detectable in the grains of all five materials at the EM1-EM2 developmental stages. Referring to FIG. 8A, the palmitic acid level remained low but detectable at stages EM1 and EM2, increased sharply from stage EM2 to MM, peaked at stage LM, and dropped from stage LM to DS. Referring to FIG. 8B, the stearic acid level was high at stage EM1 and decreased sharply from stage EM1 to EM2. The stearic acid level then increased gradually from stage EM2 to LM and peaked at stage LM. The oleic acid level (FIG. 8C) and linoleic acid level (FIG. 8D) showed the same trend as the palmitic acid level (FIG. 8A) throughout all five developmental stages. Referring to FIG. 8E, all materials except LPLO generally had high linolenic acid levels at stages EM1 and EM2, followed by a downward trend from stage EM2 to LM and an increase from stage LM to DS. The LPLO linolenic acid level was irregular: it was high at stage EM1 and decreased from stage EM1 to EM2, followed by a sharp increase from stage EM2 to MM, a sharp decrease from stage MM to LM, and an increase from stage LM to DS. All five materials have similar trends of palmitic acid, stearic acid, oleic acid and linoleic acid throughout the five developmental stages. Referring to FIG. 8F, the total fatty acid content has the same trend as that of several individual fatty acids, e.g., palmitic acid in FIG. 8A. Of note, the fatty acid content of the two high-oil materials, LPHO and HPHO, was higher than that of the parent SN14, while the fatty acid content of the two low-oil materials, HPLO and LPLO, was lower than that of the parent SN14.

Example 15. Expression analysis of candidate genes in soybean grains at different developmental stages

Expression of Glyma. 20G092000, Glyma. 20G092100, Glyma. 20G092400 and Glyma. 20G094900 in SN14 and the four extreme materials (HPLO, LPHO, HPHO and LPLO) was analyzed by RT-qPCR. RNA extraction, cDNA generation and RT-qPCR were performed according to the methods described herein. Expression was examined at eight developmental stages of the grains: Glob, Hrt, Cot, EM1, EM2, MM, LM and DS (Table 3) (FIG. 10). The results showed that, in general, the expression levels of Glyma. 20G092000, Glyma. 20G092100, Glyma. 20G092400 and Glyma. 20G094900 started low at stages Glob and Hrt and peaked at stage LM or DS. Three genes, Glyma. 20G092000, Glyma. 20G092400 and Glyma. 20G094900, share a similar trend of expression throughout the eight developmental stages. Briefly, in the SN14, LPHO and LPLO materials, the expression level increased from stage Hrt to Cot, dropped from stage Cot to EM2, increased from stage EM2 to LM, and dropped from stage LM to DS. The expression level of Glyma. 20G092000 in the HPHO and HPLO materials continued to increase from stage LM to DS and reached its highest level at stage DS (FIG. 10, top left panel). The expression level of Glyma. 20G092100 remained relatively steady from stage Glob to EM2 (except in LPLO), followed by an increase from stage EM2 to DS, and the expression level remained high at stage DS (FIG. 10, top right panel). The expression level of Glyma. 20G092400 in LPLO was slightly lower than that of Glyma. 20G092100 at stage DS. Apart from this, the expression level of Glyma. 20G092400 was higher than that of Glyma. 20G092000, Glyma. 20G092100 and Glyma. 20G094900 at each developmental stage in the five materials (SN14, LPHO, HPLO, HPHO and LPLO), and the expression level of Glyma. 20G092400 in HPHO was the highest at stage LM. Therefore, Glyma. 20G092400 is selected for further analysis of its role in the regulation of protein and oil accumulation during grain development.

Example 16. Phylogenetic analysis

The phylogenetic tree of GmDES1 (Glyma. 20G092400) was constructed from homologous sequences from soybean, Arabidopsis, rice and corn using MEGA5 software (see FIG. 18). GmDES1 (Glyma. 20G092400) shows sequence identity with AT5G26600 (60.6%), AT3G62130 (55.5%), Zm00001d008187 (54.8%), Zm00001d040555 (57.2%), LOC_Os01g18640 (56.3%) and LOC_Os01g18660 (52.4%).

All patents, patent publications, patent applications, journal articles, books, technical references, and the like discussed in the instant disclosure are incorporated herein by reference in their entirety for all purposes.

It can be appreciated that, in certain aspects of the disclosure, a single component may be replaced by multiple components, and multiple components may be replaced by a single component, to provide an element or structure or to perform a given function or functions. Except where such substitution would not be operative to practice certain embodiments of the disclosure, such substitution is considered within the scope of the disclosure.

The examples presented herein are intended to illustrate potential and specific implementations of the disclosure. It can be appreciated that the examples are intended primarily for purposes of illustration of the disclosure for those skilled in the art. There may be variations to these diagrams, or the operations described herein without departing from the spirit of the disclosure. For instance, in certain cases, method steps or operations may be performed or executed in differing order, or operations may be added, deleted, or modified.

All numerical designations, e.g., pH, temperature, time, concentration, and molecular weight, including ranges, are approximations which are varied (+) or (-) by increments of 0.1 or 1.0, as appropriate. It is to be understood, although not always explicitly stated, that all numerical designations are preceded by the term “about.” Where a range of values is provided, it is understood that each intervening value, to the smallest fraction of the unit of the lower limit, unless the context clearly dictates otherwise, between the upper and lower limits of that range is also specifically disclosed. Any narrower range between any stated values or unstated intervening values in a stated range and any other stated or intervening value in that stated range is encompassed. The upper and lower limits of those smaller ranges may independently be included or excluded in the range, and each range where either, neither, or both limits are included in the smaller ranges is also encompassed within the technology, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included.

The following copending commonly owned patent application is incorporated by reference in its entirety for all purposes:

METHODS AND COMPOSITIONS FOR INCREASING PROTEIN AND/OR OIL CONTENT AND MODIFYING OIL PROFILE IN A PLANT, International Application No. ____________, filed ______, 2022, and filed concurrently herewith. (Attorney Docket No. 086879-1262814; Client Reference No. 82424-WO-REG-ORG-P-1) .

In the foregoing description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the invention described in this disclosure may be practiced without one or more of these specific details. In other instances, features and procedures well known to those skilled in the art have not been described in order to avoid obscuring the invention. Embodiments of the disclosure have been described for illustrative and not restrictive purposes. Although the present invention is described primarily with reference to specific embodiments, it is also envisioned that other embodiments will become apparent to those skilled in the art upon reading the present disclosure, and it is intended that such embodiments be contained within the present inventive methods. Accordingly, the present disclosure is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications can be made without departing from the scope of the claims below.

INFORMAL SEQUENCE LISTING

SEQ ID NO: 1 GmDES1 (Glyma.20G092400) genomic sequence

SEQ ID NO: 2 GmDES1 (Glyma.20G092400) CDS

SEQ ID NO: 3 GmDES1 (Glyma.20G092400) protein

SEQ ID NO: 4 Glyma.20G092000 genomic sequence

SEQ ID NO: 5 Glyma.20G092000 CDS

SEQ ID NO: 6 Glyma.20G092000 protein

SEQ ID NO: 7 Glyma.20G094900 genomic sequence

SEQ ID NO: 8 Glyma.20G094900 CDS

SEQ ID NO: 9 Glyma.20G094900 protein

SEQ ID NO: 10 Glyma.20G092100 genomic sequence

SEQ ID NO: 11 Glyma.20G092100 CDS

SEQ ID NO: 12 Glyma.20G092100 Protein

SEQ ID NO: 13 AT5G26600 genomic sequence

SEQ ID NO: 14 AT5G26600 CDS

SEQ ID NO: 15 AT5G26600 Protein

SEQ ID NO: 16 SALK-LP

SEQ ID NO: 17 SALK-RP

SEQ ID NO: 18 SALK-BP

SEQ ID NO: 19 SALK_021984C genomic sequence

SEQ ID NO: 20 pSOY1

SEQ ID NO: 21 Glyma.20G092400-cF

SEQ ID NO: 22 Glyma.20G092400-cR

SEQ ID NO: 23 Bar-F

SEQ ID NO: 24 Bar-R

SEQ ID NO: 25 Glyma.20G092000-qF

SEQ ID NO: 26 Glyma.20G092000-qR

SEQ ID NO: 27 Glyma.20G092100-qF

SEQ ID NO: 28 Glyma.20G092100-qR

SEQ ID NO: 29 Glyma.20G092400-qF

SEQ ID NO: 30 Glyma.20G092400-qR

SEQ ID NO: 31 Glyma.20G094900-qF

SEQ ID NO: 32 Glyma.20G094900-qR

SEQ ID NO: 33 GmActin4-qF

SEQ ID NO: 34 GmActin4-qR

SEQ ID NO: 35 AtACTIN2-q-F

SEQ ID NO: 36 AtACTIN2-q-R

SEQ ID NO: 37 Glyma.20G092400-zF Arabidopsis thaliana mutant DNA detection

SEQ ID NO: 38 Glyma.20G092400-zR Arabidopsis thaliana mutant DNA detection

SEQ ID NO: 39 SALK_021984C-F Arabidopsis thaliana mutant mRNA detection

SEQ ID NO: 40 SALK_021984C-R Arabidopsis thaliana mutant mRNA detection

SEQ ID NO: 41 At18SrRNA-F

SEQ ID NO: 42 At18SrRNA-R

SEQ ID NO: 43 LOC_Os01g18640 genomic sequence

SEQ ID NO: 44 LOC_Os01g18640 CDS

SEQ ID NO: 45 LOC_Os01g18640 protein

SEQ ID NO: 46 LOC_Os01g18660 genomic sequence

SEQ ID NO: 47 LOC_Os01g18660 CDS

SEQ ID NO: 48 LOC_Os01g18660 protein

SEQ ID NO: 49 Zm00001d008187 genomic sequence

SEQ ID NO: 50 Zm00001d008187 CDS

SEQ ID NO: 51 Zm00001d008187 protein

SEQ ID NO: 52 Zm00001d040555 genomic sequence

SEQ ID NO: 53 Zm00001d040555 CDS

SEQ ID NO: 54 Zm00001d040555 protein

SEQ ID NO: 55 AT3G26115 (DCD2) genomic sequence

SEQ ID NO: 56 AT3G26115 (DCD2) CDS

SEQ ID NO: 57 AT3G26115 (DCD2) Protein

SEQ ID NO: 58 AT1G48420 (DCD1) genomic sequence

SEQ ID NO: 59 AT1G48420 (DCD1) CDS

SEQ ID NO: 60 AT1G48420 (DCD1) Protein

SEQ ID NO: 61 AT5G28030 (DES1) genomic sequence

SEQ ID NO: 62 AT5G28030 (DES1) CDS

SEQ ID NO: 63 AT5G28030 (DES1) protein

SEQ ID NO: 64 At5g65720 (NFS1) genomic sequence

SEQ ID NO: 65 At5g65720 (NFS1) CDS

SEQ ID NO: 66 At5g65720 (NFS1) protein

SEQ ID NO: 67 At1g08490 (NFS2) genomic sequence

SEQ ID NO: 68 At1g08490 (NFS2) CDS

SEQ ID NO: 69 At1g08490 (NFS2) protein

SEQ ID NO: 70 AT3G62130 (LCD) genomic sequence

SEQ ID NO: 71 AT3G62130 (LCD) CDS

SEQ ID NO: 72 AT3G62130 (LCD) protein
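
The gene entries in the listing above follow a repeating genomic sequence / CDS / protein pattern. As a reading aid only, that pattern can be written out as a small lookup table. The following is a minimal sketch: the names `GENE_SEQ_IDS` and `seq_ids` are hypothetical and are not part of the disclosure, and the primer, promoter, and SALK entries (SEQ ID NO: 16-42) are deliberately left out.

```python
# Illustrative lookup only. Each value is the (genomic, CDS, protein)
# SEQ ID NO triplet transcribed from the informal sequence listing.
# GENE_SEQ_IDS and seq_ids are hypothetical names used for this sketch.
GENE_SEQ_IDS = {
    "Glyma.20G092400 (GmDES1)": (1, 2, 3),
    "Glyma.20G092000": (4, 5, 6),
    "Glyma.20G094900": (7, 8, 9),
    "Glyma.20G092100": (10, 11, 12),
    "AT5G26600": (13, 14, 15),
    "LOC_Os01g18640": (43, 44, 45),
    "LOC_Os01g18660": (46, 47, 48),
    "Zm00001d008187": (49, 50, 51),
    "Zm00001d040555": (52, 53, 54),
    "AT3G26115 (DCD2)": (55, 56, 57),
    "AT1G48420 (DCD1)": (58, 59, 60),
    "AT5G28030 (DES1)": (61, 62, 63),
    "At5g65720 (NFS1)": (64, 65, 66),
    "At1g08490 (NFS2)": (67, 68, 69),
    "AT3G62130 (LCD)": (70, 71, 72),
}


def seq_ids(kind):
    """Return all SEQ ID NOs of one kind: 'genomic', 'cds', or 'protein'."""
    index = {"genomic": 0, "cds": 1, "protein": 2}[kind]
    return [ids[index] for ids in GENE_SEQ_IDS.values()]


if __name__ == "__main__":
    for kind in ("genomic", "cds", "protein"):
        print(kind, seq_ids(kind))
```

Calling `seq_ids("protein")` collects the protein identifiers in listing order, which is a convenient way to cross-check any enumeration of SEQ ID NOs against the listing.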

Claims

An elite Glycine max plant having in its genome a nucleic acid sequence from a donor Glycine plant, wherein the donor Glycine plant is a different strain from the elite Glycine max plant, and wherein the nucleic acid sequence encodes at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NOs: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said polypeptide confers increased protein content, oil content, and/or modified oil profile on the elite Glycine max plant as compared to a control plant not comprising said nucleic acid sequence.
The elite Glycine max plant of claim 1, wherein the donor Glycine plant is a Glycine soja plant or Glycine max plant.
The elite Glycine max plant of claim 2, wherein the Glycine soja plant is a ZYD00006 variety.
The elite Glycine max plant of claim 2, wherein the Glycine max plant is a DN50 variety or a SN14 variety.
The elite Glycine max plant of any one of claims 1-4, wherein the nucleic acid sequence encodes at least one polypeptide having the amino acid sequence set forth in the SEQ ID NOs: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72.
The elite Glycine max plant of any one of claims 1-4, wherein

the nucleic acid sequence has at least 90% identity, at least 95% identity, or 100% identity to any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70 or

the nucleic acid sequence has at least 90% identity, at least 95% identity, or 100% identity to any one of SEQ ID NO: 2, 5, 8, 11, 14, 44, 47, 50, 53, 56, 59, 62, 65, 68 or 71.
The elite Glycine max plant of any one of claims 1-6, wherein the polypeptide encoded by the nucleic acid sequence has at least 90% identity or at least 95% identity to SEQ ID NO: 3, wherein the polypeptide comprises an amino transferase domain, wherein the amino transferase domain has no more than two, no more than five, or no more than ten amino acid substitutions as compared to amino acid residues 91-274 of SEQ ID NO: 3.
The elite Glycine max plant of any one of claims 1-7, wherein said nucleic acid sequence is introduced into said plant genome by genome editing of genomic sequences corresponding to and comprising any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70, wherein the genome editing confers increased protein content, oil content, and/or modified oil profile.
The elite Glycine max plant of claim 8, wherein the gene editing is by CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
The elite Glycine max plant of any one of claims 1-6, wherein said nucleic acid sequence is introduced into said plant genome by transgenic expression of

(a) a nucleic acid sequence having at least 90% identity or at least 95% identity to any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70,

(b) a nucleic acid sequence encoding a polypeptide having at least 90% identity or at least 95% identity to the sequence of any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, or

(c) a nucleic acid sequence encoding a polypeptide having the sequence of any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72,

wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile on the elite Glycine max plant.
The elite Glycine max plant of any one of claims 1-10, wherein the elite Glycine max plant is an agronomically elite Glycine max plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
A plant, the genome of which has been edited to comprise a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, wherein said polypeptide confers increased protein content, increased oil content, and/or modified oil profile relative to a control plant, wherein the plant does not comprise said nucleic acid sequence before the genome editing.
The plant of claim 12, wherein the nucleic acid sequence is introduced into said plant genome by genome editing of the nucleic acid sequence set forth in SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70.
The plant of claim 12 or 13, wherein the genome editing comprises duplication, inversion, promoter modification, terminator modification and/or splicing modification of the nucleic acid sequence.
The plant of any one of claims 12-14, wherein the genome editing is accomplished through CRISPR, TALEN, meganucleases, or through modification of genomic nucleic acids.
The plant of any one of claims 12-15, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
The plant of any one of claims 12-16, wherein the nucleic acid sequence is operably linked to a heterologous promoter and wherein the heterologous promoter is active in the plant.
The plant of claim 17, wherein the heterologous promoter is a native promoter or active variant or fragment thereof.
A plant having stably incorporated into its genome a nucleic acid sequence operably linked to a promoter active in the plant, wherein the nucleic acid sequence encodes a polypeptide having

(a) an amino acid sequence that has at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72, or

(b) an amino acid sequence set forth in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 and/or 72,

wherein said nucleic acid sequence is heterologous to the plant, and wherein the plant has increased protein content, increased oil content, and/or modified oil profile as compared to a control plant.
The plant of claim 19, wherein

(a) the nucleic acid sequence comprises at least 85% identity, at least 90% identity, or at least 95% identity to at least one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70, or

(b) the nucleic acid sequence is any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 and/or 70.
The plant of claim 19 or 20, wherein the nucleic acid sequence is introduced into the genome by transgenic expression.
The plant of claim 19 or 20, wherein the nucleic acid sequence is introduced into the genome by genome editing.
The plant of claim 22, wherein the promoter is an endogenous promoter.
The plant of any one of claims 19-23, wherein the promoter is a constitutive promoter, inducible promoter, or a tissue-specific promoter.
The plant of any one of claims 19-24, wherein the plant is a dicot plant.
The plant of claim 25, wherein the dicot plant is a soybean plant or an elite soybean plant.
The plant of any one of claims 19-24, wherein the plant is a monocot plant.
The plant of claim 27, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
The plant of any one of claims 19-28, wherein the plant is an agronomically elite plant having a commercially significant yield and/or commercially susceptible vigor, seed set, standability, threshability, abiotic/biotic resistance, or herbicide tolerance.
A progeny plant from the elite Glycine max plant of any one of claims 1-11 or the plant of any one of claims 10-29, wherein the progeny plant has stably incorporated into its genome the nucleic acid sequence.
A plant cell, seed, or plant part derived from the elite Glycine max plant of any one of claims 1-11 or the plant of any one of claims 10-29, wherein said plant cell, seed or plant part has stably incorporated into its genome the nucleic acid sequence.
A harvest product derived from the elite Glycine max plant of any one of claims 1-11 or the plant of any one of claims 12-29.
A processed product derived from the harvest product of claim 32, wherein the processed product is a flour, a meal, an oil, a starch, or a product derived from any of the foregoing.
A method of producing a soybean plant having increased polypeptide and/or oil content, the method comprising the steps of:

a) providing a donor soybean plant comprising in its genome a nucleic acid sequence encoding at least one polypeptide having at least 90% identity or 95% identity to SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said nucleic acid sequence confers to said donor soybean plant increased protein content, increased oil content, and/or modified oil profile compared to donor Glycine plant,

b) crossing the donor soybean plant of a) with a recipient soybean plant not comprising said nucleic acid sequence; and

c) selecting a progeny plant from the cross of b) by detecting the presence of the nucleic acid sequence, or the presence of one or more molecular markers associated with the nucleic acid sequence in the progeny plant, thereby producing a soybean plant having increased protein content, increased oil content, and/or modified oil profile.
The method of claim 34, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
The method of claim 34 or 35, wherein either the donor or the recipient soybean plant is an elite Glycine max plant.
A method of producing a Glycine max plant with increased protein content, increased oil content, and/or modified oil profile, the method comprising the steps of:

a) isolating a nucleic acid from a Glycine max plant;

b) detecting in the nucleic acid of a) at least one molecular marker associated with a nucleic acid sequence comprising any one of SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70, wherein said nucleic acid sequence confers to the Glycine max plant increased protein content, increased oil content, and/or modified oil profile;

c) selecting a Glycine max plant based on the presence of the molecular marker detected in b); and

d) producing a Glycine max progeny plant from the plant of c) identified as having said molecular marker associated with increased polypeptide and/or increased oil content.
The method of claim 37, wherein the molecular marker is a single nucleotide polymorphism (SNP), a quantitative trait locus (QTL), an amplified fragment length polymorphism (AFLP), randomly amplified polymorphic DNA (RAPD), a restriction fragment length polymorphism (RFLP) or a microsatellite.
The method of claim 38, wherein the detecting comprises amplifying a molecular marker locus or a portion of the molecular marker locus and detecting the resulting amplified molecular marker amplicon.
The method of claim 39, wherein the amplifying comprises employing a polymerase chain reaction (PCR) or ligase chain reaction (LCR) using a nucleic acid isolated from a soybean plant or germplasm as a template in the PCR or LCR.
The method of claim 39, wherein the nucleic acid is selected from DNA or RNA.
A plant produced by the method of any one of claims 34-41.
A method of conferring increased protein content, increased oil content, and/or modified oil profile to a plant comprising:

a) introducing into the genome of the plant a nucleic acid molecule operably linked to a promoter active in the plant, wherein the nucleic acid sequence is stably incorporated into the genome, wherein the nucleic acid sequence encodes a polypeptide having

(i) an amino acid sequence comprising at least 85%, at least 90%, or at least 95% identity to any one of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or

(ii) an amino acid sequence set forth in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72,

wherein said nucleic acid sequence is heterologous to the plant, and

wherein expression of said nucleic acid sequence increases protein content, increases oil content, and/or modifies oil profile compared to a control plant not expressing said nucleic acid sequence.
The method of claim 43, wherein the nucleic acid sequence is introduced into the genome of the plant by transformation.
The method of claim 44, wherein the nucleic acid sequence is introduced into the genome of the plant by crossing a donor plant comprising the nucleic acid sequence with the plant to produce a progeny plant having increased protein content, increased oil content, and/or modified oil profile.
The method of claim 45, wherein the nucleic acid sequence is introduced into the genome of the plant by gene editing of the genome of the plant.
The method of claim 45, wherein the method comprises Cas12a mediated gene replacement.
The method of any one of claims 43-47, wherein the promoter is an exogenous promoter.
The method of any one of claims 43-47, wherein the promoter is an endogenous promoter.
The method of any one of claims 43-49, wherein the method comprises screening for the introduced nucleic acid sequence with PCR and/or sequencing.
The method of any one of claims 43-50, wherein the plant is a dicot plant.
The method of claim 51, wherein the dicot plant is a soybean plant.
The method of any one of claims 43-51, wherein the plant is a monocot plant.
The method of claim 53, wherein the monocot plant is selected from the group consisting of rice, wheat, maize, and sugar cane.
A plant produced by the method of any one of claims 43-54.
A polypeptide selected from:

(a) a polypeptide having the amino acid sequence shown in SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or any portion thereof, wherein the portion confers increased polypeptide and/or oil content, and having a heterologous amino acid sequence attached thereto;

(b) a polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, and having substitution and/or deletion and/or addition of one or more amino acid residues, wherein expression of the polypeptide confers increased polypeptide and/or oil content on the plant;

(c) a polypeptide having more than 99%, more than 95%, more than 90%, more than 85%, or more than 80% identity with the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein the polypeptide when expressed in a plant confers increased polypeptide and/or oil content on the plant; or

(d) a fusion polypeptide comprising the amino acid sequence of SEQ ID NO: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, or the polypeptide as defined in any one of (a) to (c).
A nucleic acid molecule comprising:

(a) a nucleotide sequence encoding a protein having an amino acid sequence sharing at least 90%, 95% or 100% sequence identity to SEQ ID NOs: 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72, wherein said nucleotide sequence comprises a heterologous nucleic acid sequence attached thereto and expression of the nucleic acid molecule in a plant increases protein and/or oil content in the plant;

(b) the nucleotide sequence of part (a) comprising a sequence of SEQ ID NOs: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70; or

(c) the nucleotide sequence of part (a) having more than 99%, at least 95%, at least 90%, at least 85%, or at least 80% identity to any one of SEQ ID NOs: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70.
An expression cassette comprising the nucleic acid molecule of claim 56 or encoding the polypeptide of claim 57.
The expression cassette of claim 58, wherein the nucleic acid molecule is operably linked to a promoter capable of directing expression in a plant cell.
The expression cassette of claim 59, wherein the promoter is an endogenous promoter.
The expression cassette of claim 59, wherein the promoter is an exogenous promoter.
The expression cassette of claim 61, wherein the promoter comprises pSOY1 (SEQ ID NO: 20).
A vector comprising the nucleic acid molecule of claim 62, the expression cassette of any one of claims 56-36, a nucleic acid molecule having the sequence set forth in SEQ ID NO: 1, 4, 7, 10, 13, 43, 46, 49, 52, 55, 58, 61, 64, 67 or 70, or a nucleic acid sequence encoding the polypeptide of claim 3, 6, 9, 12, 15, 45, 48, 51, 54, 57, 60, 63, 66, 69 or 72.
A transgenic cell comprising the nucleic acid molecule of claim 63 or the expression cassette of any one of claims 56-63.
Use of the polypeptide of claim 56 or the nucleic acid molecule of claim 57, or the expression cassette of any one of claims 56-64, or the transgenic cell of claim 63 in conferring increased protein content, increased oil content, and/or modified oil profile.
Use of the expression cassette of any one of claims 56-64 in a cell, wherein the expression level and/or activity of the polypeptide in the cell is increased, and the protein content is increased, the oil content is increased and/or the oil profile is modified in the cell.
A method for increasing protein content, increasing oil content, and/or modifying oil profile in a plant, comprising increasing the expression level and/or activity of the polypeptide of claim 56 in the plant.
A method for producing a plant variety with increased protein content, increased oil content, and/or modified oil profile in a plant, comprising increasing the expression level and/or activity of the nucleic acid molecule of claim 57 in the plant.
The method of claim 67 or 68, wherein the increasing the expression level and/or activity of the polypeptide in the plant is by transgenic means or by breeding.
A method for producing a transgenic plant with increased protein content, increased oil content, and/or modified oil profile, comprising introducing the nucleic acid molecule of claim 57 or the expression cassette of any one of claims 65-69 to a recipient plant to obtain a transgenic plant, wherein the transgenic plant has increased protein content, increased oil content, and/or modified oil profile compared to the recipient plant.
The method of claim 70, wherein the introducing the nucleic acid molecule to the recipient plant is performed by introducing the expression cassette of any one of claims 6-36 into the recipient plant.
A primer pair for amplifying the nucleic acid molecule of claim 57.
The primer pair of claim 72, wherein the primer pair is composed of two single-stranded DNA molecules shown in Table 10, Table 13, Table 14, and Table 15.
A kit comprising the primer pair of claim 72 or 73.
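
The identity thresholds recited above (for example, at least 90% or 95% identity to SEQ ID NO: 3) and the substitution limits recited for amino acid residues 91-274 of SEQ ID NO: 3 can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned by some external tool and are supplied as equal-length strings with `-` marking gaps. It also assumes one common convention for percent identity (identical columns divided by all aligned columns, gaps included); the convention that governs the claims is whatever the description defines, and may differ. The function names `percent_identity` and `count_substitutions` are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a pre-computed pairwise alignment.

    Assumption for this sketch: identity = identical columns / all
    aligned columns (columns where both sequences are gaps are dropped).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def count_substitutions(aligned_ref, aligned_query, start, end):
    """Count substituted residues within reference positions start..end.

    Positions are 1-based and inclusive, and are counted along the
    ungapped reference. Insertions and deletions are not counted,
    only residue-for-residue substitutions.
    """
    position = 0
    substitutions = 0
    for ref_char, query_char in zip(aligned_ref, aligned_query):
        if ref_char == "-":
            continue  # insertion in the query; no reference position consumed
        position += 1
        if start <= position <= end and query_char != "-" and query_char != ref_char:
            substitutions += 1
    return substitutions


if __name__ == "__main__":
    # Toy strings for demonstration only; these are not sequences from the listing.
    ref = "MKTA-YIAK"
    query = "MKSAQYIAK"
    print(round(percent_identity(ref, query), 2))
    print(count_substitutions(ref, query, 1, 4))
```

With real data, the reference would be the full-length aligned SEQ ID NO: 3 and the window would be `start=91, end=274`; the toy strings here only show the mechanics.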