WO2021156505A1 - Methods of controlling grain size and weight - Google Patents

Methods of controlling grain size and weight

Info

Publication number
WO2021156505A1
Authority
WO
WIPO (PCT)
Prior art keywords
att
oml4
aat
plant
gsk2
Prior art date
Application number
PCT/EP2021/052951
Other languages
French (fr)
Inventor
Yunhai Li
Jia LYU
Penggen DUAN
Liming Zhang
Baolan ZHANG
Original Assignee
Institute Of Genetics And Developmental Biology Chinese Academy Of Sciences
Williams, Andrea
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute Of Genetics And Developmental Biology Chinese Academy Of Sciences, Williams, Andrea
Priority to CA3167040A (CA3167040A1)
Priority to US17/760,160 (US20230081195A1)
Priority to EP21705118.4A (EP4099818A1)
Priority to AU2021216126A (AU2021216126A1)
Priority to CN202180011352.5A (CN115135142A)
Publication of WO2021156505A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8261 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield
    • C12N15/8262 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield involving plant development
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H1/00 Processes for modifying genotypes; Plants characterised by associated natural traits
    • A01H1/12 Processes for modifying agronomic input traits, e.g. crop yield
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H5/00 Angiosperms, i.e. flowering plants, characterised by their plant parts; Angiosperms characterised otherwise than by their botanic taxonomy
    • A01H5/10 Seeds
    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01H NEW PLANTS OR NON-TRANSGENIC PROCESSES FOR OBTAINING THEM; PLANT REPRODUCTION BY TISSUE CULTURE TECHNIQUES
    • A01H6/00 Angiosperms, i.e. flowering plants, characterised by their botanic taxonomy
    • A01H6/46 Gramineae or Poaceae, e.g. ryegrass, rice, wheat or maize
    • A01H6/4636 Oryza sp. [rice]
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/415 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from plants
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8261 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1205 Phosphotransferases with an alcohol group as acceptor (2.7.1), e.g. protein kinases
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y207/00 Transferases transferring phosphorus-containing groups (2.7)
    • C12Y207/11 Protein-serine/threonine kinases (2.7.11)
    • C12Y207/11001 Non-specific serine/threonine protein kinase (2.7.11.1), i.e. casein kinase or checkpoint kinase
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
    • Y02A40/146 Genetically Modified [GMO] plants, e.g. transgenic plants

Definitions

  • the invention relates to methods of increasing grain size and/or weight in a plant, as well as plants with increased grain size and/or weight by reducing the expression and/or activity of OML4.
  • the invention relates to methods of increasing grain number by increasing the expression and/or activity of OML4.
  • Grain yield is determined by tiller number, grain number and grain weight.
  • grain size is a key component of grain weight
  • regulation of grain size is a crucial strategy to increase grain production.
  • Grain growth is restricted by spikelet hulls, which influence final grain size in rice.
  • the growth of the spikelet hull is determined by cell proliferation and cell expansion processes.
  • Several genes that regulate grain size by influencing cell proliferation in the spikelet hull have been described in rice, such as GW2, GW5/GSE5, GW8/OsSPL16, GS3, GS9, OsMKKK10-OsMKK4-OsMPK6 and MKP1.
  • OML4 Mei2-Like protein 4
  • the Mei2-Like protein 4 encoded by the LARGE1 gene is phosphorylated by glycogen synthase kinase 2 (GSK2) and negatively controls grain size and weight in rice.
  • GSK2 glycogen synthase kinase 2
  • Loss of function of OML4 leads to large and heavy grains, while overexpression of OML4 causes small and light grains.
  • OML4 regulates grain size by restricting cell expansion in the spikelet hull.
  • OML4 is expressed in developing inflorescences (e.g. panicles of rice) and grains, and expression (indicated by GFP-OML4 fusion protein) is localized in the nuclei.
  • a method of increasing grain size and/or weight comprising reducing or abolishing the expression and/or activity of Mei2-Like protein 4 (OML4).
  • the method comprises introducing at least one mutation into at least one nucleic acid sequence encoding OML4 and/or at least one mutation into the promoter of OML4.
  • the method further comprises additionally reducing or abolishing the expression and/or activity of a SHAGGY-like kinase (GSK2).
  • GSK2 SHAGGY-like kinase
  • the method comprises introducing at least one mutation into at least one nucleic acid sequence encoding GSK2 and/or at least one mutation into the promoter of GSK2.
  • the mutation is a loss of function or partial loss of function mutation.
  • the mutation is introduced using targeted genome modification, preferably ZFNs, TALENs or CRISPR/Cas9, or mutagenesis, preferably TILLING or T-DNA insertion.
  • the method comprises using RNA interference to reduce or abolish the expression of an OML4 nucleic acid sequence or a GSK2 nucleic acid sequence.
  • a genetically modified plant characterised by reduced or abolished expression of OML4.
  • the plant comprises at least one mutation in at least one nucleic acid sequence encoding an OML4 gene and/or at least one mutation in the promoter of OML4.
  • the plant part is a seed or grain (such terms can be used interchangeably).
  • progeny plants obtained or obtainable from the seeds, as well as seeds obtained from said progeny plants.
  • the plant further comprises at least one mutation in at least one nucleic acid sequence encoding GSK2 and/or at least one mutation into the promoter of GSK2.
  • the mutation is a loss of function or partial loss of function mutation.
  • the plant comprises an RNA interference construct that reduces or abolishes the expression of OML4.
  • a method of producing a plant with increased grain size and/or weight comprising introducing at least one mutation into at least one nucleic acid sequence encoding an OML4 polypeptide and/or at least one mutation into the promoter of OML4.
  • the method further comprises introducing at least one mutation into at least one nucleic acid sequence encoding a GSK2 polypeptide and/or at least one mutation into the promoter of GSK2.
  • the mutation is a loss of function or partial loss of function mutation.
  • the OML4 nucleic acid sequence encodes a polypeptide comprising SEQ ID NO: 1 or a functional variant or homolog thereof, and preferably the nucleic acid sequence encoding OML4 comprises a nucleic sequence as defined in SEQ ID NO: 2.
  • the promoter of OML4 comprises a sequence as defined in SEQ ID NO: 3 or a functional variant or homolog thereof.
  • the GSK2 nucleic acid sequence encodes a polypeptide as defined in SEQ ID NO: 4 or a functional variant or homolog thereof, and preferably, the GSK2 nucleic acid sequence comprises a nucleic acid sequence as defined in SEQ ID NO: 5 or a functional variant or homolog thereof.
  • the GSK2 promoter comprises a nucleic acid sequence as defined in SEQ ID NO: 6 or a functional variant or homolog thereof.
  • the mutation is introduced using targeted genome modification, preferably ZFNs, TALENs or CRISPR/Cas9, or the mutation is introduced using mutagenesis, preferably TILLING or T-DNA insertion.
  • the plant is a crop plant.
  • the plant is selected from rice, wheat, maize, soybean and brassicas.
  • DESCRIPTION OF THE FIGURES
  • Figure 1 shows that LARGE1 influences grain size and plant morphology.
  • A, B ZHJ and large1-1 grains.
  • C, D ZHJ and large1-1 plants.
  • E ZHJ (left) and large1-1 (right) panicles.
  • F, G Grain length and width of ZHJ and large1-1.
  • H 1000-grain weight of ZHJ and large1-1.
  • I Plant height of ZHJ and large1-1.
  • J Panicle length of ZHJ and large1-1.
  • K The number of ZHJ and large1-1 primary panicle branches.
  • L The number of ZHJ and large1-1 secondary panicle branches. Values in F-H are given as mean ± SD (n ≥ 50).
  • Figure 2 shows that the large1 mutant forms large grains due to increased cell expansion in the spikelet hull.
  • A, B SEM analysis of the outer surface of ZHJ (A) and large1-1 (B) lemmas.
  • C, D SEM analysis of the inner surface of ZHJ (C) and large1-1 (D) lemmas.
  • E, F The average length (E) and width (F) of outer epidermal cells in ZHJ and large1-1 lemmas.
  • G Outer epidermal cell number in the longitudinal direction in ZHJ and large1-1 lemmas.
  • H Outer epidermal cell number in the transverse direction in ZHJ and large1-1 lemmas.
  • FIG. 3 shows that LARGE1 encodes the mei2-like protein OML4.
  • A The LARGE1/OML4 gene structure. The coding sequence is shown as black boxes, and introns are shown as black lines. ATG and TGA represent the start codon and the stop codon, respectively.
  • B OML4 and the mutated protein encoded by large1-1. The OML4 protein contains three RNA recognition motif (RRM) domains. The mutation results in a premature termination codon in OML4, causing a truncated protein.
  • C The dCAPS1 marker was developed according to the large1-1 mutation. The PCR products were digested by the restriction enzyme Hph I.
  • OML4 expression activity was monitored by proOML4::GUS transgene expression. Histochemical analysis of GUS activity in panicles at different developmental stages.
  • J, K Mature paddy (J) and brown (K) rice grains of ZHJ, large1-1 and gLARGE1-GFP;large1-1 #1.
  • L-O Subcellular localization of OML4-GFP in gLARGE1-GFP;large1-1 #1 root cells.
  • GFP fluorescence of GFP-OML4 (L), DAPI staining (M), DIC (N) and merged (O) images are shown. Bars: 2 mm in D, E, J and K; 1 cm in I; 10 µm in L-O.
  • Figure 4 shows that overexpression of OML4 results in smaller grains.
  • A, B ZHJ and proActin:OML4 grains.
  • C, D Grain length and width of ZHJ and proActin:OML4 transgenic lines.
  • E 1000-grain weight of ZHJ and proActin:OML4 transgenic lines.
  • G ZHJ and proActin:OML4 plants.
  • H Plant height of ZHJ and proActin:OML4 transgenic lines.
  • I ZHJ and proActin:OML4 panicles.
  • J Panicle length of ZHJ and proActin:OML4 transgenic lines.
  • K, L The primary and secondary panicle branch number of ZHJ and proActin:OML4 transgenic lines.
  • M Total grain number per panicle of ZHJ and proActin:OML4 transgenic lines.
  • N, O SEM analysis of the outer surface of ZHJ (N) and proActin:OML4 #1 (O) lemmas.
  • P, Q The average length and width of outer epidermal cells in the longitudinal direction in ZHJ and proActin:OML4 #1 lemmas.
  • R, S The number of outer epidermal cells in the longitudinal and transverse direction in ZHJ and proActin:OML4 #1 lemmas.
  • Values in C-E and P-S are given as the means ± SD (n ≥ 50).
  • The value in F is given as the mean ± SD.
  • Asterisks indicate significant differences between ZHJ and proActin:OML4 transgenic lines. * P < 0.05; ** P < 0.01 compared with the wild type by Student’s t-test. Bars: 2 mm in A and B; 10 cm in G and I; 50 µm in N and O.
  • FIG. 5 shows that OML4 physically interacts with GSK2 in vitro and in vivo.
  • OML4 interacts with GSK2 in yeast cells. Yeast cells were cultured on SD/-Trp-Leu or SD/-Trp-Leu-His-Ade media.
  • B OML4 associates with GSK2 in N. benthamiana. OML4-nLUC and GSK2-cLUC were co-expressed in N. benthamiana leaves. Luciferase activity was observed 48 hours after infiltration. The range of luminescence intensity was scaled by the pseudocolor bar.
  • C Bimolecular fluorescence complementation (BiFC) assays show that OML4 interacts with GSK2 in N. benthamiana.
  • OML4-cYFP was coexpressed with GSK2-nYFP in leaves of N. benthamiana.
  • D OML4 binds GSK2 in vitro. GSK2-GST was incubated with OML4-MBP, pulled down by OML4-MBP and detected by immunoblot with anti-GST antibody. IB: immunoblot.
  • E Interaction between OML4 and GSK2 in the Co-IP assays. Anti-MYC beads were used to immunoprecipitate GSK2-GFP proteins. Gel blots were probed with anti-MYC or anti-GFP antibody. Bars: 50 µm in C.
  • FIG. 6 shows that GSK2 is required for the phosphorylation of OML4.
  • A GSK2 phosphorylates OML4 in vitro. The phosphorylated OML4-FLAG, nOML4-FLAG (the N-terminal region of OML4) and cOML4-FLAG (the C-terminal region of OML4) were separated by phos-tag SDS-PAGE. The phosphorylated protein is marked with the red vertical line.
  • the phosphorylated OML4-MBP, OML4(S105A,S607A)-MBP and GSK2-GST were separated by phos-tag SDS-PAGE. The phosphorylated protein is marked with the red vertical line.
  • GSK2 influences the abundance of OML4.
  • GSK2-GFP and OML4-MYC were co-expressed in tobacco leaves and protein levels were detected by western blotting. This result was repeated three times.
  • S(105) and S(607) partially influence the abundance of OML4.
  • GSK2-GFP and OML4-MYC or OML4(S105A,S607A)-MYC were co-expressed in tobacco leaves and protein levels were detected by western blotting. This result was repeated three times.
  • FIG. 7 shows that GSK2 acts genetically with OML4 to regulate seed size.
  • A, B ZHJ and GSK2-RNAi grains.
  • D, E Grain length (D) and width (E) of ZHJ and GSK2-RNAi transgenic lines.
  • F 1000-grain weight of ZHJ and GSK2-RNAi transgenic lines.
  • G, H SEM analysis of the outer surface of ZHJ (G) and GSK2-RNAi #1 (H) lemmas.
  • I, J The average length and width of outer epidermal cells in the longitudinal direction in ZHJ and GSK2-RNAi #1 lemmas.
  • K Grains of ZHJ, large1-1, GSK2-RNAi#1 and large1-1;GSK2-RNAi#1.
  • L Grain length of ZHJ, large1-1, GSK2-RNAi#1 and large1-1;GSK2-RNAi#1. Values in D-F, I-J and L are given as the means ± SD (n ≥ 50). *P < 0.05; **P < 0.01 compared with the wild type by Student’s t-test. Bars: 2 mm in A, B and K; 50 µm in G and H.
  • Figure 8 shows the expression level of the indicated genes in ZHJ and large1-1 panicles.
  • Figure 9 shows the CDS and protein sequence of OML4.
  • A The full-length cDNA sequence of OML4. The sequence deleted in large1-1 in the OML4 gene is shown in red.
  • B The amino acid sequence of OML4.
  • C The amino acid sequence of large1-1.
  • Figure 10 shows the plant height, panicle size and grain number per panicle of gLARGE1;large1-1.
  • A Plants of ZHJ, large1-1, gLARGE1;large1-1 #1 and gLARGE1;large1-1 #2.
  • B Phenotypes of ZHJ (left), large1-1 (middle) and gLARGE1;large1-1 #1 (right) panicles.
  • C Plant height of ZHJ, large1-1 and gLARGE1;large1-1 #1.
  • D Panicle length of ZHJ, large1-1 and gLARGE1;large1-1 #1.
  • FIG. 11 shows the structural features and phylogenetic tree of OML4.
  • A Amino acid sequence alignment of MEI2-LIKE proteins in rice. The three conserved RNA Recognition Motifs (RRMs) are marked.
  • B Phylogenetic tree of MEI2-LIKE proteins in rice and Arabidopsis.
  • OML1, OML2, OML3, OML4, and OML5 are from O. sativa
  • TE1 and LOC103653544 (MEI2-LIKE protein 1) are from Z. mays
  • AML1, AML2, AML3, AML4, and AML5 are from Arabidopsis.
  • the multiple sequence alignment and construction of phylogenetic tree were performed with MEGA7 using neighbor-joining method with 100 bootstrap replicates.
  • Figure 12 shows the identification of the large1-1 mutation.
  • CHR chromosome
  • POS position in chromosome.
  • As used herein, the words “nucleic acid”, “nucleic acid sequence”, “nucleotide”, “nucleic acid molecule” or “polynucleotide” are intended to include DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), naturally occurring, mutated, synthetic DNA or RNA molecules, and analogs of the DNA or RNA generated using nucleotide analogs. It can be single-stranded or double-stranded. Such nucleic acids or polynucleotides include, but are not limited to, coding sequences of structural genes, anti-sense sequences, and non-coding regulatory sequences that do not encode mRNAs or protein products.
  • genes may include introns and exons as in the genomic sequence, or may comprise only a coding sequence as in cDNAs, and/or may include cDNAs in combination with regulatory sequences.
  • polypeptide and “protein” are used interchangeably herein and refer to amino acids in a polymeric form of any length, linked together by peptide bonds.
  • the aspects of the invention involve recombinant DNA technology and exclude embodiments that are solely based on generating plants by traditional breeding methods.
  • a method of increasing grain size and/or weight in a plant comprising reducing or abolishing the expression and/or activity of Mei2-Like protein 4 (OML4).
  • an “increase” in grain size and/or weight may comprise an increase of at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45% or 50% compared to the grain size and/or weight in a wild-type or control plant.
  • the increase may be between 5 and 30% and even more preferably between 10 and 25% compared to the grain size and/or weight in a wild-type or control plant.
  • grain size may comprise one of grain length and/or grain width.
  • the grain weight may comprise thousand-grain weight. Any of the above can be measured using standard techniques in the art.
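The percentage thresholds above reduce to simple arithmetic against a control. The following is a minimal sketch only: the function names and the sample thousand-grain weights are hypothetical, and it assumes the control-group mean is used as the baseline.

```python
from statistics import mean


def percent_increase(modified, control):
    """Percent change of the modified-plant mean relative to the control mean."""
    base = mean(control)
    if base <= 0:
        raise ValueError("control mean must be positive")
    return (mean(modified) - base) / base * 100.0


def meets_threshold(modified, control, threshold_pct=5.0):
    """True if the increase is at least threshold_pct (5% is the lowest level listed above)."""
    return percent_increase(modified, control) >= threshold_pct


# Hypothetical thousand-grain weights in grams (made-up values, not data from the source)
control_tgw = [25.0, 25.0, 25.0]
modified_tgw = [30.0, 30.0, 30.0]
print(round(percent_increase(modified_tgw, control_tgw), 1))  # 20.0
```

The same calculation applies to grain length or grain width when those are the measured quantities.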
  • yield in general means a measurable produce of economic value, typically related to a specified crop, to an area, and to a period of time. Individual plant parts directly contribute to yield based on their number, size and/or weight. The actual yield is the yield per square meter for a crop and year, which is determined by dividing total production (includes both harvested and appraised production) by planted square metres.
  • yield is increased by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% compared to a control or wild-type plant. In a preferred embodiment, yield is increased by at least 10%, and even more preferably between 10 and 60% compared to a control or wild-type plant.
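The definition of actual yield given above (total production, harvested plus appraised, divided by planted area) can be written out as follows. This is an illustrative sketch: the function name, the kilogram unit and the example figures are assumptions, not values from the source.

```python
def yield_per_square_metre(harvested_kg, appraised_kg, planted_m2):
    """Actual yield: total production (harvested + appraised) divided by planted area."""
    if planted_m2 <= 0:
        raise ValueError("planted area must be positive")
    return (harvested_kg + appraised_kg) / planted_m2


# Hypothetical plot: 900 kg harvested, 100 kg appraised, 2000 square metres planted
print(yield_per_square_metre(900.0, 100.0, 2000.0))  # 0.5
```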
  • the method further comprises reducing or abolishing the expression or activity of SHAGGY-like kinase (GSK2).
  • the method comprises introducing at least one mutation into OML4. In a further embodiment, the method comprises introducing at least one mutation into OML4 and at least one mutation into GSK2.
  • “By at least one mutation” is meant that where the OML4 or GSK2 gene is present as more than one copy or homoeologue (with the same or slightly different sequence) there is at least one mutation in at least one gene. Preferably all genes are mutated in OML4 and/or GSK2.
  • reducing means a decrease in the levels of OML4 or GSK2 expression and/or activity by up to 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% when compared to the level in a wild-type or control plant.
  • “abolish” means that no expression of OML4 or GSK2 polypeptide is detectable or that no functional OML4 or GSK2 polypeptide is produced. Methods for determining the level of OML4 or GSK2 polypeptide expression and/or activity would be well known to the skilled person. These reductions can be measured by any standard technique known to the skilled person.
  • a reduction in the expression and/or content levels of OML4 or GSK2 may be assessed at the protein and/or nucleic acid level and can be measured by any technique known to the skilled person, such as, but not limited to, any form of gel electrophoresis or chromatography (e.g. HPLC).
  • the method comprises introducing at least one mutation into the, preferably endogenous, gene encoding OML4 and/or the OML4 promoter.
  • the method comprises introducing a further mutation into the, preferably endogenous, gene encoding GSK2 and/or the GSK2 promoter.
  • said mutation is in the coding region of the OML4 or the GSK2 gene.
  • at least one mutation or structural alteration may be introduced into the OML4 or GSK2 promoter such that the OML4 or GSK2 gene is either not expressed (i.e. expression is abolished) or expression is reduced, as defined herein.
  • At least one mutation may be introduced into the OML4 or GSK2 gene such that the altered gene does not express a full-length (i.e. expresses a truncated) OML4 or GSK2 protein or does not express a fully functional OML4 or GSK2 protein.
  • the activity of the OML4 or GSK2 polypeptide can be considered to be reduced or abolished as described herein.
  • the mutation may result in the expression of OML4 or GSK2 with no, significantly reduced or altered biological activity in vivo.
  • OML4 or GSK2 may not be expressed at all.
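To illustrate how a mutation in a coding region can produce a truncated protein, the sketch below translates a toy coding sequence before and after a single-base deletion that shifts the reading frame onto a premature stop codon. The sequences are invented for illustration and are not OML4, GSK2 or the large1-1 allele; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate from the first codon up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy wild-type CDS (hypothetical): ATG GCC TTG AAA CTG TAA
wild_type = "ATGGCCTTGAAACTGTAA"
# Delete one base (index 5); the frame shifts to ATG GCT TGA ...
mutant = wild_type[:5] + wild_type[6:]

print(translate(wild_type))  # MALKL
print(translate(mutant))     # MA
```

Comparing the two translated lengths is a quick way to flag a candidate truncating mutation, although it says nothing about whether the shortened product retains activity.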
  • the sequence of the OML4 gene comprises or consists of a nucleic acid sequence as defined in SEQ ID NO: 2 (genomic) or a functional variant or homologue thereof and encodes a polypeptide as defined in SEQ ID NO: 1 or a functional variant or homologue thereof.
  • By “OML4 promoter” is meant a region extending for at least 2000-2500 bp, preferably 2049 bp, upstream of the ATG codon of the OML4 ORF (open reading frame).
  • the sequence of the OML4 promoter comprises or consists of a nucleic acid sequence as defined in SEQ ID NO: 3 or a functional variant or homologue thereof.
  • By “GSK2 promoter” is meant a region extending at least 200-300 bp, preferably 247 bp, upstream of the ATG codon of the GSK2 ORF (open reading frame).
  • the sequence of the GSK2 promoter comprises or consists of a nucleic acid sequence as defined in SEQ ID NO: 6 or a functional variant or homologue thereof.
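The promoter definitions above are positional: a fixed number of bases immediately upstream of the ATG start codon. A minimal sketch of that extraction is shown below. The function name, the toy sequence and the assumption that the gene lies on the forward strand are all hypothetical; with a real genomic sequence the length argument would be, for example, 2049 for OML4 or 247 for GSK2.

```python
def upstream_region(genomic, atg_index, length):
    """Return up to `length` bases immediately upstream of the ATG at atg_index (0-based)."""
    if genomic[atg_index:atg_index + 3].upper() != "ATG":
        raise ValueError("atg_index does not point at an ATG codon")
    start = max(0, atg_index - length)  # clip if the sequence is shorter than requested
    return genomic[start:atg_index]


# Toy forward-strand sequence (hypothetical); the ATG starts at index 12
toy = "CCCCGGGGTTTTATGAAA"
print(upstream_region(toy, 12, 4))  # TTTT
```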
  • an ‘endogenous’ nucleic acid may refer to the native or natural sequence in the plant genome.
  • the endogenous sequence of the OML4 gene comprises SEQ ID NO: 2 and encodes an amino acid sequence as defined in SEQ ID NO: 1 or homologs thereof.
  • Also included in the scope of this invention are functional variants (as defined herein) and homologs of the above identified sequences. Examples of OML4 homologs are shown in SEQ ID NOs: 7-9, 13-15, 19-21 and 25-27. Accordingly, in one embodiment, the homolog encodes a polypeptide selected from SEQ ID NOs: 7, 13, 19 or 25 or the homolog comprises or consists of a nucleic acid sequence selected from SEQ ID NOs: 8, 14, 20 or 26.
  • the endogenous sequence of the GSK2 gene comprises SEQ ID NO: 5 and encodes an amino acid sequence as defined in SEQ ID NO: 4 or homologs thereof. Also included in the scope of this invention are functional variants (as defined herein) and homologs of the above identified sequences. Examples of GSK2 homologs are shown in SEQ ID NOs: 10-12, 16-18, 22-24 and 28-30. Accordingly, in one embodiment, the homolog encodes a polypeptide selected from SEQ ID NOs: 10, 16, 22 or 28 or the homolog comprises or consists of a nucleic acid sequence selected from SEQ ID NOs: 11, 17, 23 or 29.
  • a functional variant of a nucleic acid sequence refers to a variant gene sequence or part of the gene sequence which retains the biological function of the full non-variant sequence.
  • a functional variant also comprises a variant of the gene of interest which has sequence alterations that do not affect function, for example in non-conserved residues.
  • a codon for the amino acid alanine, a hydrophobic amino acid may be substituted by a codon encoding another less hydrophobic residue, such as glycine, or a more hydrophobic residue, such as valine, leucine, or isoleucine.
  • changes which result in substitution of one negatively charged residue for another such as aspartic acid for glutamic acid, or one positively charged residue for another, such as lysine for arginine, can also be expected to produce a functionally equivalent product.
  • Nucleotide changes which result in alteration of the N-terminal and C-terminal portions of the polypeptide molecule would also not be expected to alter the activity of the polypeptide.
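The substitution examples above (alanine for glycine, valine, leucine or isoleucine; aspartic acid for glutamic acid; lysine for arginine) can be expressed as a small lookup. This sketch is deliberately limited to the residues named in the text, and the group names and function name are hypothetical; practical analyses usually rely on a substitution matrix such as BLOSUM62 instead.

```python
# Residue groups taken only from the examples in the text (one-letter codes)
GROUPS = {
    "alanine_like": set("AGVLI"),   # alanine, glycine, valine, leucine, isoleucine
    "negatively_charged": set("DE"),  # aspartic acid, glutamic acid
    "positively_charged": set("KR"),  # lysine, arginine
}


def is_conservative(a, b):
    """True if residues a and b are identical or fall in the same group above."""
    a, b = a.upper(), b.upper()
    if a == b:
        return True  # no change at this position
    return any(a in group and b in group for group in GROUPS.values())


print(is_conservative("D", "E"))  # True
print(is_conservative("D", "K"))  # False
```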
  • a functional variant has at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%,
  • homolog also designates an OML4 or GSK2 promoter or OML4 or GSK2 gene orthologue from other plant species.
  • a homolog may have, in increasing order of preference, at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%,
  • The “OML4” or “LARGE1” gene (such terms are used interchangeably herein) encodes a Mei2-like protein, OML4. This protein is characterised by three RNA recognition motifs or RRMs.
  • the sequence of the RRMs is selected from: SRTLFVRNINSNVEDSELKLLFEHFGDIRALYTACKHRGFVMISYYDIRSALNAKMELQNKALRRRKLDIHYSIPKD (SEQ ID NO: 37)
  • the OML4 nucleic acid (coding) sequence encodes an OML4 protein comprising at least one RRM motif, preferably all three motifs as defined above, or a variant thereof, wherein the variant has at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%,
  • the “GSK2” gene (SHAGGY-like kinase) encodes a serine/threonine kinase, which is an ortholog of BIN2, and is involved in BR signalling.
  • nucleic acid sequences or polypeptides are said to be “identical” if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below.
  • the terms “identical” or percent “identity,” in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a comparison window, as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection.
  • When percentage of sequence identity is used in reference to proteins or peptides, it is recognised that residue positions that are not identical often differ by conservative amino acid substitutions, where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art. For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated.
  • sequence comparison algorithm calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
  • algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms.
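As a rough illustration of what "percent identity" means for two sequences that have already been aligned, the sketch below counts matching columns. It is not BLAST and performs no alignment itself; the function name and the choice of denominator (all aligned columns except those where both sequences have a gap) are assumptions made for illustration, and other tools may normalise differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumption: the denominator is the number of aligned columns, skipping
    columns where both sequences carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy sequences, not taken from any SEQ ID NO.
    print(round(percent_identity("ACGT-A", "ACCTGA"), 2))
```

A threshold such as "at least 75% overall sequence identity" would then be checked by comparing the returned value with 75.0.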
  • Suitable homologues can be identified by sequence comparisons and identifications of conserved domains. There are predictors in the art that can be used to identify such sequences. The function of the homologue can be identified as described herein and a skilled person would thus be able to confirm the function, for example when overexpressed in a plant.
  • nucleotide sequences of the invention and described herein can also be used to isolate corresponding sequences from other organisms, particularly other plants, for example crop plants.
  • methods such as PCR, hybridization, and the like can be used to identify such sequences based on their sequence homology to the sequences described herein.
  • Topology of the sequences and the characteristic domains structure can also be considered when identifying and isolating homologs.
  • Sequences may be isolated based on their sequence identity to the entire sequence or to fragments thereof.
  • In hybridization techniques, all or part of a known nucleotide sequence is used as a probe that selectively hybridizes to other corresponding nucleotide sequences present in a population of cloned genomic DNA fragments or cDNA fragments (i.e., genomic or cDNA libraries) from a chosen plant.
  • the hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments, or other oligonucleotides, and may be labelled with a detectable group, or any other detectable marker.
  • Hybridization of such sequences may be carried out under stringent conditions.
  • By “stringent conditions” or “stringent hybridization conditions” is intended conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background).
  • Stringent conditions are sequence dependent and will be different in different circumstances.
  • target sequences that are 100% complementary to the probe can be identified (homologous probing).
  • stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing).
  • a probe is less than about 1000 nucleotides in length, preferably less than 500 nucleotides in length.
  • stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Duration of hybridization is generally less than about 24 hours, usually about 4 to 12 hours. Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
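The numeric ranges given for stringent conditions can be expressed as a simple check. This is only a sketch: the function name is hypothetical, the "about" qualifiers in the text are treated as hard cut-offs, and the effect of formamide or other destabilizing agents is not modelled.

```python
def meets_stringent_conditions(na_molar: float, ph: float,
                               temp_c: float, probe_len_nt: int) -> bool:
    """Check hybridization conditions against the ranges quoted in the text.

    Assumptions: probes of 50 nt or fewer count as "short"; the approximate
    limits in the text are applied as exact thresholds.
    """
    min_temp = 30.0 if probe_len_nt <= 50 else 60.0
    return na_molar < 1.5 and 7.0 <= ph <= 8.3 and temp_c >= min_temp


if __name__ == "__main__":
    # Hypothetical set-ups for illustration only.
    print(meets_stringent_conditions(0.5, 7.5, 65.0, 300))  # long probe, hot enough
    print(meets_stringent_conditions(0.5, 7.5, 42.0, 300))  # long probe, too cool
```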
  • a variant as used herein can comprise a nucleic acid sequence encoding a OML4 or a GSK2 polypeptide as defined herein that is capable of hybridising under stringent conditions as defined herein to a nucleic acid sequence as defined in SEQ ID NO: 2 or 5 respectively.
  • a method of increasing grain size and/or weight in a plant comprising reducing or abolishing the expression of at least one nucleic acid encoding a OML4 polypeptide, as described herein, wherein the method comprises introducing at least one mutation into at least one OML4 gene and/or promoter, wherein the OML4 gene comprises or consists of
    a. a nucleic acid sequence encoding a polypeptide as defined in SEQ ID NO: 1; or
    b. a nucleic acid sequence as defined in SEQ ID NO: 2; or
    c. a nucleic acid sequence with at least 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to either (a) or (b); or
    d. a nucleic acid sequence encoding a OML4 polypeptide as defined herein that is capable of hybridising under stringent conditions as defined herein to the nucleic acid sequence of any of (a) to (c);
    and wherein the OML4 promoter comprises or consists of
    e. a nucleic acid sequence as defined in SEQ ID NO: 3;
    f. a nucleic acid sequence with at least 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to (e); or
    g. a nucleic acid sequence capable of hybridising under stringent conditions as defined herein to the nucleic acid sequence of any of (e) to (f).
  • the mutation that is introduced into the endogenous OML4 gene or promoter or the GSK2 gene or promoter thereof to silence, reduce, or inhibit the biological activity and/or expression levels of the OML4 or GSK2 gene or protein can be selected from the following mutation types
  • a "missense mutation” which is a change in the nucleic acid sequence that results in the substitution of an amino acid for another amino acid
  • a "nonsense mutation” or "STOP codon mutation” which is a change in the nucleic acid sequence that results in the introduction of a premature STOP codon and, thus, the termination of translation (resulting in a truncated protein); plant genes contain the translation stop codons "TGA” (UGA in RNA), "TAA” (UAA in RNA) and “TAG” (UAG in RNA); thus any nucleotide substitution, insertion, deletion which results in one of these codons to be in the mature mRNA being translated (in the reading frame) will terminate translation.
  • a frameshift mutation resulting in the nucleic acid sequence being translated in a different frame downstream of the mutation.
  • a frameshift mutation can have various causes, such as the insertion, deletion or duplication of one or more nucleotides.
  • a “splice site” mutation, which is a mutation that results in the insertion, deletion or substitution of a nucleotide at the site of splicing.
  • a “deletion” may refer to the deletion of at least one nucleotide. In one embodiment, said deletion may be between 1 and 20 base pairs. In a preferred embodiment, the at least one mutation is a deletion of at least one nucleotide.
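The mutation types listed above (missense, nonsense and frameshift) can be told apart by comparing the translated wild-type and mutant coding sequences. The following is a minimal sketch under stated assumptions: it uses the standard genetic code, ignores introns and splice sites, treats any length change that is a multiple of three as an "in-frame indel", and runs on toy sequences rather than on any SEQ ID NO from this document. All function names are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":  # TGA, TAA or TAG terminates translation
            break
        protein.append(aa)
    return "".join(protein)


def classify_mutation(wild_cds: str, mutant_cds: str) -> str:
    """Rough classification of a mutant coding sequence against the wild type."""
    wild, mutant = wild_cds.upper(), mutant_cds.upper()
    if wild == mutant:
        return "no change"
    delta = len(mutant) - len(wild)
    if delta % 3 != 0:
        return "frameshift"       # reading frame shifts downstream of the change
    if delta != 0:
        return "in-frame indel"   # whole codons gained or lost
    wt_protein, mut_protein = translate(wild), translate(mutant)
    if mut_protein == wt_protein:
        return "silent"
    if len(mut_protein) < len(wt_protein):
        return "nonsense"         # premature stop codon, truncated protein
    if len(mut_protein) == len(wt_protein):
        return "missense"         # one amino acid replaced by another
    return "other"


if __name__ == "__main__":
    wild_type = "ATGAAAGGGTGGTAA"  # toy sequence: M K G W stop
    print(classify_mutation(wild_type, "ATGAAAGGGTGATAA"))  # TGG changed to TGA
```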
  • At least one mutation as defined above and which leads to the insertion, deletion or substitution of at least one nucleic acid or amino acid compared to the wild-type OML4 or GSK2 promoter or OML4 or GSK2 nucleic acid or protein sequence can affect the biological activity of the OML4 protein or GSK2 protein respectively.
  • the mutation is a loss of function mutation such as a premature stop codon, or an amino acid change in a highly conserved region that is predicted to be important for protein structure.
  • the mutation may be introduced into at least one RRM as defined herein of the OML4 gene.
  • the mutation may be a substitution or deletion of a phosphorylation site in OML4.
  • the mutation may be at position S105, S146 and/or S607 of SEQ ID NO: 1 or a homologous position in a homologous sequence.
  • the mutation prevents the phosphorylation of OML4 at one or more of these sites. As described in the examples, preventing phosphorylation (by GSK2) of OML4 at one or more of these sites reduces the protein levels of OML4.
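To illustrate a substitution at named phosphorylation sites, the sketch below checks that each listed position actually holds the expected residue and then replaces it. The serine-to-alanine choice is an assumption (a commonly used non-phosphorylatable substitution), the function name is hypothetical, and the toy sequence is not SEQ ID NO: 1; with the real sequence the sites would be given as "S105", "S146" and "S607".

```python
def substitute_phosphosites(protein: str, sites: list[str],
                            replacement: str = "A") -> str:
    """Replace residues at 1-based sites such as 'S105' with `replacement`.

    Raises ValueError if a site is out of range or the residue found at that
    position differs from the one named in the site label.
    """
    residues = list(protein)
    for site in sites:
        expected, pos = site[0], int(site[1:])
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{site}: position outside the sequence")
        if residues[pos - 1] != expected:
            raise ValueError(
                f"{site}: found {residues[pos - 1]} instead of {expected}")
        residues[pos - 1] = replacement
    return "".join(residues)


if __name__ == "__main__":
    toy = "MSTSA"  # toy peptide with serines at positions 2 and 4
    print(substitute_phosphosites(toy, ["S2", "S4"]))
```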
  • the mutation is introduced into the OML4 or GSK2 promoter and is at least the deletion and/or insertion of at least one nucleic acid.
  • Other major changes such as deletions that remove functional regions of the promoter are also included as these will reduce the expression of OML4 and GSK2.
  • At least one mutation may be introduced into the OML4 promoter and at least one mutation is introduced into the OML4 gene. In a further embodiment, at least one mutation may also be introduced into the GSK2 gene and at least one mutation is introduced into the GSK2 promoter.
  • the mutation is introduced using mutagenesis or targeted genome editing. That is, in one embodiment, the invention relates to a method and plant that has been generated by genetic engineering methods as described above, and does not encompass naturally occurring varieties.
  • Targeted genome modification or targeted genome editing is a genome engineering technique that uses targeted DNA double-strand breaks (DSBs) to stimulate genome editing through homologous recombination (HR)-mediated recombination events.
  • customisable DNA binding proteins can be used: meganucleases derived from microbial mobile genetic elements, ZF nucleases based on eukaryotic transcription factors, transcription activator-like effectors (TALEs) from Xanthomonas bacteria, and the RNA-guided DNA endonuclease Cas9 from the type II bacterial adaptive immune system CRISPR (clustered regularly interspaced short palindromic repeats).
  • Meganuclease, ZF and TALE proteins all recognize specific DNA sequences through protein-DNA interactions. Although meganucleases integrate nuclease and DNA-binding domains, ZF and TALE proteins consist of individual modules targeting 3 or 1 nucleotides (nt) of DNA, respectively. ZFs and TALEs can be assembled in desired combinations and attached to the nuclease domain of FokI to direct nucleolytic activity toward specific genomic loci.
  • CRISPR is a microbial nuclease system involved in defense against invading phages and plasmids.
  • CRISPR loci in microbial hosts contain a combination of CRISPR-associated (Cas) genes as well as non-coding RNA elements capable of programming the specificity of the CRISPR-mediated nucleic acid cleavage (sgRNA).
  • each CRISPR locus contains an array of repetitive sequences (direct repeats) interspaced by short stretches of non-repetitive sequences (spacers).
  • the non-coding CRISPR array is transcribed and cleaved within direct repeats into short crRNAs containing individual spacer sequences, which direct Cas nucleases to the target site (protospacer).
  • the Type II CRISPR is one of the most well characterized systems and carries out targeted DNA double-strand break in four sequential steps. First, two non-coding RNA, the pre-crRNA array and tracrRNA, are transcribed from the CRISPR locus.
  • tracrRNA hybridizes to the repeat regions of the pre-crRNA and mediates the processing of pre-crRNA into mature crRNAs containing individual spacer sequences.
  • the mature crRNA:tracrRNA complex directs Cas9 to the target DNA via Watson-Crick base-pairing between the spacer on the crRNA and the protospacer on the target DNA next to the protospacer adjacent motif (PAM), an additional requirement for target recognition.
  • Cas9 mediates cleavage of target DNA to create a double-stranded break within the protospacer.
  • an advantage of CRISPR-Cas9 compared to conventional gene targeting and other programmable endonucleases is the ease of multiplexing, where multiple genes can be mutated simultaneously simply by using multiple sgRNAs each targeting a different gene.
  • the intervening section can be deleted or inverted (Wiles et al., 2015).
  • Cas9 is thus the hallmark protein of the type II CRISPR-Cas system, and is a large monomeric DNA nuclease guided to a DNA target sequence adjacent to the PAM (protospacer adjacent motif) sequence motif by a complex of two noncoding RNAs: CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA).
  • the Cas9 protein contains two nuclease domains homologous to RuvC and HNH nucleases.
  • the HNH nuclease domain cleaves the complementary DNA strand whereas the RuvC-like domain cleaves the non-complementary strand and, as a result, a blunt cut is introduced in the target DNA.
  • Cpf1 which is another Cas protein, can be used as the endonuclease.
  • Cpf1 differs from Cas9 in several ways: Cpf1 requires a T-rich PAM sequence (TTTV) for target recognition, Cpf1 does not require a tracrRNA, (i.e.
  • the CRISPR/Cpf1 system consists of a Cpf1 enzyme and a crRNA.
  • the nuclease may be MAD7.
  • the single guide RNA is the second component of the CRISPR/Cas (Cpf1 or MAD7) system that forms a complex with the Cas9/Cpf1/MAD7 nuclease.
  • sgRNA is a synthetic RNA chimera created by fusing crRNA with tracrRNA.
  • the sgRNA guide sequence located at its 5’end confers DNA target specificity. Therefore, by modifying the guide sequence, it is possible to create sgRNAs with different target specificities.
  • the canonical length of the guide sequence is 20 bp.
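Candidate guide sequences of the canonical 20 nt can be located by scanning a target region for a protospacer followed by a PAM. The sketch below assumes the 5'-NGG-3' PAM of Streptococcus pyogenes Cas9, which the text above does not specify; the function names are hypothetical, the example target is a toy sequence, and a real design would also screen for off-target matches elsewhere in the genome.

```python
import re


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cas9_guides(target: str, guide_len: int = 20):
    """List (strand, start, protospacer, pam) for every protospacer + NGG site.

    Assumption: an NGG PAM immediately 3' of the protospacer. For the '-'
    strand, `start` is the 0-based offset within the reverse complement.
    """
    target = target.upper()
    # A lookahead lets overlapping sites be reported as well.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    hits = []
    for strand, seq in (("+", target), ("-", reverse_complement(target))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    # Toy target: 5 nt of flank, a 20 nt protospacer, a TGG PAM, 4 nt of flank.
    toy_target = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "AAAA"
    for hit in find_cas9_guides(toy_target):
        print(hit)
```

For a Cpf1-type nuclease the same scan could be adapted to look for a TTTV PAM on the 5' side of the protospacer instead.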
  • Cas9 (or Cpf1/MAD7) expression plasmids for use in the methods of the invention can be constructed as described in the art.
  • Cas9 or Cpf1 or MAD7 and the one or more sgRNA molecules may be delivered as separate or as single constructs.
  • the promoters used to drive expression of the CRISPR enzyme/sgRNA molecule may be the same or different.
  • RNA polymerase (Pol) II-dependent promoters or the CaMV35S promoter can be used to drive expression of the CRISPR enzyme.
  • Pol III-dependent promoters such as U6 or U3, can be used to drive expression of the sgRNA.
  • the sgRNA molecules target a sequence selected from SEQ ID No: 33 (OML4 target sequence) or SEQ ID NO: 34 (GSK2 target sequence) or a variant thereof as defined herein.
  • the sgRNA molecules comprise a protospacer sequence selected from SEQ ID No: 35 (OML4 target sequence) or SEQ ID NO: 36 (GSK2 target sequence) or a variant thereof, as defined herein.
  • the method uses the sgRNA constructs defined in detail below to introduce a targeted mutation into a OML4 gene and/or promoter, and in a further embodiment, to additionally introduce a mutation into a GSK2 gene and/or promoter.
  • aspects of the invention involve targeted mutagenesis methods, specifically genome editing, and in a preferred embodiment exclude embodiments that are solely based on generating plants by traditional breeding methods.
  • the genome editing constructs may be introduced into a plant cell using any suitable method known to the skilled person (the term “introduced” can be used interchangeably with “transformation”, which is described below).
  • any of the nucleic acid constructs described herein may be first transcribed to form a preassembled Cas9 (or other CRISPR nuclease)-sgRNA ribonucleoprotein and then delivered to at least one plant cell using any of the above described methods, such as lipofection, electroporation, biolistic bombardment or microinjection.
  • the invention also extends to a plant obtained or obtainable by any method described herein.
  • mutagenesis methods can be used to introduce at least one mutation into a OML4 gene or OML4 promoter sequence, or into a GSK2 gene or GSK2 promoter sequence. These methods include both physical and chemical mutagenesis. A skilled person will know further approaches can be used to generate such mutants, and methods for mutagenesis and polynucleotide alterations are well known in the art. See, for example, Kunkel (1985) Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) Methods in Enzymol. 154:367-382; U.S. Patent No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein.
  • insertional mutagenesis is used, for example using T-DNA mutagenesis (which inserts pieces of the T-DNA from the Agrobacterium tumefaciens Ti plasmid into DNA causing either loss of gene function or gain of gene function mutations), site-directed nucleases (SDNs) or transposons as a mutagen. Insertional mutagenesis is an alternative means of disrupting gene function and is based on the insertion of foreign DNA into the gene of interest (see Krysan et al, The Plant Cell, Vol. 11, 2283-2290, December 1999).
  • T-DNA is used as an insertional mutagen to disrupt the OML4 or GSK2 gene or OML4 or GSK2 promoter expression.
  • T-DNA not only disrupts the expression of the gene into which it is inserted, but also acts as a marker for subsequent identification of the mutation. Since the sequence of the inserted element is known, the gene in which the insertion has occurred can be recovered, using various cloning or PCR-based strategies.
  • the insertion of a piece of T-DNA in the order of 5 to 25 kb in length generally produces a disruption of gene function. If a large enough population of T-DNA transformed lines is generated, there are reasonably good chances of finding a transgenic plant carrying a T-DNA insert within any gene of interest. Transformation of spores with T-DNA is achieved by an Agrobacterium-mediated method which involves exposing plant cells and tissues to a suspension of Agrobacterium cells.
  • mutagenesis is physical mutagenesis, such as application of ultraviolet radiation, X-rays, gamma rays, fast or thermal neutrons or protons.
  • the targeted population can then be screened to identify a OML4 or GSK2 loss of function mutant.
  • the method comprises mutagenizing a plant population with a mutagen.
  • the mutagen may be a fast neutron irradiation or a chemical mutagen, for example selected from the following non-limiting list: ethyl methanesulfonate (EMS), methylmethane sulfonate (MMS), N-ethyl-N-nitrosourea (ENU), triethylmelamine (TEM), N-methyl-N-nitrosourea (MNU), procarbazine, chlorambucil, cyclophosphamide, diethyl sulfate, acrylamide monomer, melphalan, nitrogen mustard, vincristine, dimethylnitrosamine, N-methyl-N'-nitro-N-nitrosoguanidine (MNNG), nitrosoguanidine, 2-aminopurine, 7,12-dimethylbenz(a)anthracene (DMBA), ethylene oxide, hexamethylphosphoramide, busulfan, diepoxyalkanes (diepoxyoctane (DE
  • the method used to create and analyse mutations is targeting induced local lesions in genomes (TILLING), reviewed in Henikoff et al, 2004.
  • seeds are mutagenised with a chemical mutagen, for example EMS.
  • the resulting M1 plants are self-fertilised and the M2 generation of individuals is used to prepare DNA samples for mutational screening.
  • DNA samples are pooled and arrayed on microtiter plates and subjected to gene specific PCR.
  • the PCR amplification products may be screened for mutations in the target gene using any method that identifies heteroduplexes between wild type and mutant genes.
  • dHPLC denaturing high pressure liquid chromatography
  • DCE constant denaturant capillary electrophoresis
  • TGCE temperature gradient capillary electrophoresis
  • the PCR amplification products are incubated with an endonuclease that preferentially cleaves mismatches in heteroduplexes between wild type and mutant sequences.
  • Cleavage products are electrophoresed using an automated sequencing gel apparatus, and gel images are analyzed with the aid of a standard commercial image-processing program.
  • any primer specific to the OML4 or GSK2 nucleic acid sequence may be utilized to amplify the OML4 or GSK2 nucleic acid sequence within the pooled DNA sample.
  • the primer is designed to amplify the regions of the OML4 or GSK2 gene where useful mutations are most likely to arise, specifically in the areas of the genes that are highly conserved and/or confer activity as explained elsewhere.
  • the PCR primer may be labelled using any conventional labelling method.
  • the method used to create and analyse mutations is EcoTILLING.
  • EcoTILLING is a molecular technique that is similar to TILLING, except that its objective is to uncover natural variation in a given population as opposed to induced mutations. The EcoTILLING method was first described in Comai et al. 2004.
  • Rapid high-throughput screening procedures thus allow the analysis of amplification products for identifying a mutation conferring the reduction or inactivation of the expression of the OML4 or GSK2 gene as compared to a corresponding non-mutagenised wild type plant.
  • the seeds of the M2 plant carrying that mutation are grown into adult M3 plants and screened for the phenotypic characteristics associated with the target gene. Loss of and reduced function mutants with increased grain weight and/or grain size compared to a control can thus be identified.
  • Plants obtained or obtainable by such method which carry a partial or complete loss of function mutation in the endogenous OML4 gene or promoter locus are also within the scope of the invention.
  • the expression of the OML4 or GSK2 gene may be reduced at either the level of transcription or translation.
  • expression of a OML4 or GSK2 nucleic acid as defined herein can be reduced or silenced using a number of gene silencing methods known to the skilled person, such as, but not limited to, the use of small interfering nucleic acids (siNA) against OML4 or GSK2.
  • silencing is a term generally used to refer to suppression of expression of a gene via sequence-specific interactions that are mediated by RNA molecules. The degree of reduction may be so as to totally abolish production of the encoded gene product, but more usually the abolition of expression is partial, with some degree of expression remaining. The term should not therefore be taken to require complete “silencing" of expression.
  • the siNA may include short interfering RNA (siRNA), double-stranded RNA (dsRNA), micro-RNA (miRNA), antagomirs and short hairpin RNA (shRNA) capable of mediating RNA interference.
  • the inhibition of expression and/or activity can be measured by determining the presence and/or amount of OML4 or GSK2 transcript using techniques well known to the skilled person (such as Northern Blotting, RT-PCR and so on).
  • Transgenes may be used to suppress endogenous plant genes. This was discovered originally when chalcone synthase transgenes in petunia caused suppression of the endogenous chalcone synthase genes and indicated by easily visible pigmentation changes. Subsequently it has been described how many, if not all plant genes can be "silenced" by transgenes. Gene silencing requires sequence similarity between the transgene and the gene that becomes silenced. This sequence homology may involve promoter regions or coding regions of the silenced target gene. When coding regions are involved, the transgene able to cause gene silencing may have been constructed with a promoter that would transcribe either the sense or the antisense orientation of the coding sequence RNA. It is likely that the various examples of gene silencing involve different mechanisms that are not well understood. In different examples there may be transcriptional or post-transcriptional gene silencing and both may be used according to the methods of the invention.
  • RNA interference is another post-transcriptional gene-silencing phenomenon which may be used according to the methods of the invention. This is induced by double-stranded RNA in which mRNA that is homologous to the dsRNA is specifically degraded. It refers to the process of sequence-specific post-transcriptional gene silencing mediated by short interfering RNAs (siRNA).
  • the process of RNAi begins when the enzyme, DICER, encounters dsRNA and chops it into pieces called small interfering RNAs (siRNA).
  • This enzyme belongs to the RNase III nuclease family. A complex of proteins gathers up these RNA remains and uses their code as a guide to search out and destroy any RNAs in the cell with a matching sequence, such as target mRNA.
  • MicroRNAs may be used to knock out gene expression and/or mRNA translation.
  • miRNAs are typically single stranded small RNAs typically 19-24 nucleotides long. Most plant miRNAs have perfect or near-perfect complementarity with their target sequences. However, there are natural targets with up to five mismatches. They are processed from longer non-coding RNAs with characteristic fold-back structures by double-strand specific RNases of the Dicer family. Upon processing, they are incorporated in the RNA-induced silencing complex (RISC) by binding to its main component, an Argonaute protein.
  • miRNAs serve as the specificity components of RISC, since they base-pair to target nucleic acids, mostly mRNAs, in the cytoplasm. Subsequent regulatory events include target mRNA cleavage and destruction and/or translational inhibition. Effects of miRNA overexpression are thus often reflected in decreased mRNA levels of target genes.
  • Artificial microRNA (amiRNA) technology has been applied in Arabidopsis thaliana and other plants to efficiently silence target genes of interest. The design principles for amiRNAs have been generalized and integrated into a Web-based tool (http://wmd.weigelworld.org).
  • a plant may be transformed to introduce a RNAi, shRNA, snRNA, dsRNA, siRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule that has been designed to target the expression of an OML4 or GSK2 nucleic acid sequence and selectively decreases or inhibits the expression of the gene or stability of its transcript.
  • the RNAi, snRNA, dsRNA, shRNA, siRNA, miRNA, amiRNA, ta-siRNA or cosuppression molecule used according to the various aspects of the invention comprises a fragment of at least 17 nt, preferably 22 to 26 nt and can be designed on the basis of the information shown in any of SEQ ID NOs: 2, 5, 8, 11, 14, 17, 20, 23, 26 and 29. Guidelines for designing effective siRNAs are known to the skilled person. Briefly, a short fragment of the target gene sequence (e.g., 19-40 nucleotides in length) is chosen as the target sequence of the siRNA of the invention. The short fragment of target gene sequence is a fragment of the target gene mRNA.
  • the criteria for choosing a sequence fragment from the target gene mRNA to be a candidate siRNA molecule include 1) a sequence from the target gene mRNA that is at least 50-100 nucleotides from the 5’ or 3’ end of the native mRNA molecule, 2) a sequence from the target gene mRNA that has a G/C content of between 30% and 70%, most preferably around 50%, 3) a sequence from the target gene mRNA that does not contain repetitive sequences (e.g., AAA, CCC, GGG, TTT, AAAA, CCCC, GGGG, TTTT), 4) a sequence from the target gene mRNA that is accessible in the mRNA, 5) a sequence from the target gene mRNA that is unique to the target gene, 6) avoids regions within 75 bases of a start codon.
  • the sequence fragment from the target gene mRNA may meet one or more of the criteria identified above.
  • the selected gene is introduced as a nucleotide sequence in a prediction program that takes into account all the variables described above for the design of optimal oligonucleotides.
  • This program scans any mRNA nucleotide sequence for regions susceptible to be targeted by siRNAs.
  • the output of this analysis is a score of possible siRNA oligonucleotides. The highest scores are used to design double stranded RNA oligonucleotides that are typically made by chemical synthesis.
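The selection criteria and scoring step described above can be sketched as a simple scan over the target mRNA. This is an illustrative sketch and not the prediction program referred to in the text: the function names, the 22 nt default window and the scoring rule (closeness of G/C content to 50%) are assumptions, and the accessibility and uniqueness criteria are not modelled because they need structural and genome-wide data.

```python
import re

# Three or more identical bases in a row (e.g. AAA, CCCC).
REPEAT_RUN = re.compile(r"([ACGT])\1\1")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def sirna_candidates(mrna: str, start_codon_pos: int, length: int = 22,
                     end_margin: int = 50, start_margin: int = 75):
    """Return (position, fragment, score) tuples, best score first.

    Applied criteria: at least `end_margin` nt from either mRNA end, no overlap
    with the region within `start_margin` bases of the start codon, G/C content
    between 30% and 70%, and no runs of three or more identical bases.
    Positions are 0-based; `start_codon_pos` is the index of the A of the ATG.
    """
    seq = mrna.upper().replace("U", "T")
    excluded_from = start_codon_pos - start_margin
    excluded_to = start_codon_pos + 3 + start_margin
    candidates = []
    for pos in range(end_margin, len(seq) - end_margin - length + 1):
        if pos < excluded_to and pos + length > excluded_from:
            continue  # too close to the start codon
        fragment = seq[pos:pos + length]
        gc = gc_fraction(fragment)
        if not 0.30 <= gc <= 0.70:
            continue
        if REPEAT_RUN.search(fragment):
            continue
        score = 1.0 - 2.0 * abs(gc - 0.5)  # assumed rule: 1.0 at exactly 50% G/C
        candidates.append((pos, fragment, score))
    candidates.sort(key=lambda c: (-c[2], c[0]))
    return candidates


if __name__ == "__main__":
    toy_mrna = "ACGT" * 60  # toy sequence, not a real transcript
    best = sirna_candidates(toy_mrna, start_codon_pos=0)[:3]
    for pos, fragment, score in best:
        print(pos, fragment, round(score, 3))
```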
  • degenerate siRNA sequences may be used to target homologous regions.
  • siRNAs according to the invention can be synthesized by any method known in the art. RNAs are preferably chemically synthesized using appropriately protected ribonucleoside phosphoramidites and a conventional DNA/RNA synthesizer. Additionally, siRNAs can be obtained from commercial RNA oligonucleotide synthesis suppliers.
  • the silencing RNA molecule is introduced into the plant using conventional methods, for example a vector and Agrobacterium-mediated transformation. Stably transformed plants are generated and expression of the OML4 or GSK2 gene compared to a wild type control plant is analysed.
  • Silencing of the OML4 or GSK2 nucleic acid sequence may also be achieved using virus-induced gene silencing.
  • the plant expresses a nucleic acid construct comprising a RNAi, shRNA, snRNA, dsRNA, siRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule that targets the OML4 nucleic acid sequence as described herein and reduces expression of the endogenous OML4 nucleic acid sequence.
  • a gene is targeted when, for example, the RNAi, snRNA, dsRNA, siRNA, shRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule selectively decreases or inhibits the expression of the gene compared to a control plant.
  • RNAi, snRNA, dsRNA, siRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule targets a OML4 or GSK2 nucleic acid sequence when the RNAi, shRNA, snRNA, dsRNA, siRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule hybridises under stringent conditions to the gene transcript.
  • a further approach to gene silencing is by targeting nucleic acid sequences complementary to the regulatory region of the gene (e.g., the promoter and/or enhancers) of OML4 or GSK2 to form triple helical structures that prevent transcription of the gene in target cells.
  • Other methods such as the use of antibodies directed to an endogenous polypeptide for inhibiting its function in plants, or interference in the signalling pathway in which a polypeptide is involved, will be well known to the skilled man.
  • man-made molecules may be useful for inhibiting the biological function of a target polypeptide, or for interfering with the signalling pathway in which the target polypeptide is involved.
  • RNAi construct to silence GSK2 comprises or consists of the sequence defined in SEQ ID NO: 31 or a functional variant thereof.
  • the invention extends to a plant obtained or obtainable by a method as described herein.
  • a method of increasing the grain number in a plant. As shown in Figure 4(m), overexpressing OML4 results in a significant increase in grain number. Accordingly, in a further aspect of the invention, there is provided a method of increasing grain number in a plant, the method comprising increasing the expression and/or activity of OML4. Preferably said increase is relative to a wild-type or control plant.
  • an “increase” in grain number may comprise an increase of at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45% or 50% compared to the grain number in a wild-type or control plant.
  • an increase in grain number may be an increase in grain number per panicle. Any of the above can be measured using standard techniques in the art.
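The percentage increase relative to a control can be computed as follows. This is a minimal sketch: the function names are hypothetical, the tiers are the percentages listed above, and the grain counts in the example are invented for illustration rather than taken from the examples of this document.

```python
# Percentage tiers listed in the text for an "increase" in grain number.
TIERS = (5, 10, 15, 20, 25, 30, 35, 40, 45, 50)


def percent_increase(control_mean: float, test_mean: float) -> float:
    """Percentage change of the test value relative to the control value."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return 100.0 * (test_mean - control_mean) / control_mean


def highest_tier_met(increase_pct: float):
    """Largest listed tier that the observed increase reaches, or None."""
    met = [tier for tier in TIERS if increase_pct >= tier]
    return max(met) if met else None


if __name__ == "__main__":
    # Hypothetical mean grain numbers per panicle.
    increase = percent_increase(control_mean=100.0, test_mean=112.0)
    print(increase, highest_tier_met(increase))
```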
  • the method further comprises increasing the expression or activity of SHAGGY-like kinase (GSK2).
  • the method may comprise introducing and expressing in a plant or plant cell a nucleic acid construct comprising a nucleic acid sequence encoding an OML4 polypeptide as defined in SEQ ID NO: 1 or a homolog or functional variant thereof, as defined herein.
  • the nucleic acid sequence is operably linked to a regulatory sequence, preferably a promoter.
  • the nucleic acid construct may comprise a first nucleic acid sequence encoding an OML4 polypeptide as defined above and a second nucleic acid sequence encoding a GSK2 polypeptide as defined in SEQ ID NO: 4 or a homolog or functional variant thereof.
  • the first and second nucleic acid sequences are operably linked to a regulatory sequence, preferably a promoter.
  • the first and second nucleic acid sequences may be operably linked to the same or a different regulatory sequence.
  • the method may comprise introducing and expressing a first nucleic acid construct comprising a nucleic acid sequence encoding an OML4 polypeptide as defined above and a second nucleic acid construct comprising a nucleic acid sequence encoding a GSK2 polypeptide as defined above.
  • the nucleic acid sequences are preferably operably linked to a regulatory sequence.
  • the second nucleic acid construct may be introduced and expressed in the plant before, after or concurrently with the first nucleic acid construct.
  • the progeny plant is stably transformed with the nucleic acid construct described herein and comprises the exogenous polypeptide or polypeptides that are heritably maintained in the plant cell.
  • the method may also comprise the additional step of collecting seeds from the selected progeny plant.
  • the method may further comprise the step of regenerating a transgenic plant from the plant cell wherein the transgenic plant comprises in its genome a nucleic acid sequence selected from SEQ ID NO: 2 and a nucleic acid sequence selected from SEQ ID NO: 5 or a homolog or functional variant thereof, and obtaining progeny derived from the transgenic plant, where the progeny exhibits an increase in grain number.
  • the method may comprise introducing a mutation into the plant genome, where said mutation is the insertion of at least one or more additional copy(ies) of a nucleic acid encoding an OML4 polypeptide or a homolog or variant thereof such that said sequence is operably linked to a regulatory sequence and wherein said mutation is introduced using targeted genome editing.
  • said mutation results in an increase in the expression of an OML4 nucleic acid compared to a control or wild-type plant.
  • the method may further comprise introducing one or more further mutations into the plant genome, where the one or more further mutations is the insertion of at least one or more additional copy(ies) of a nucleic acid encoding a GSK2 polypeptide or a homologue or functional variant thereof such that said sequence is operably linked to a regulatory sequence.
  • the mutation is introduced using targeted genome editing.
  • the mutation also results in an increase in the expression of a GSK2 polypeptide compared to a control or wild-type plant.
  • the genomic and amino acid sequences of rice OML4 and GSK2 and their homologs are defined below.
  • the mutation is introduced using CRISPR as described herein.
  • the invention also extends to plants obtained or obtainable by any method described herein.
  • a genetically altered plant, part thereof or plant cell characterised in that the plant does not express OML4, has reduced levels of OML4 expression, does not express a functional OML4 protein or expresses an OML4 protein with reduced function and/or activity.
  • the plant is a reduction (knock down) or loss of function (knock out) mutant wherein the function of the OML4 nucleic acid sequence is reduced or lost compared to a wild type control plant.
  • a mutation is introduced into either the OML4 gene sequence or the corresponding promoter sequence, which disrupts the transcription of the gene. Therefore, preferably said plant comprises at least one mutation in the promoter and/or gene for OML4.
  • the plant may comprise a mutation in both the promoter and gene for OML4.
  • the genetically altered plant, part thereof or plant cell is further characterised in that the plant also does not express GSK2, has reduced levels of GSK2 expression, does not express a functional GSK2 protein or expresses a GSK2 protein with reduced function and/or activity.
  • a plant, part thereof or plant cell characterised by an increase in grain weight and/or size compared to a wild-type or control plant, wherein preferably, the plant comprises at least one mutation in the OML4 gene and/or its promoter.
  • the plant may be produced by introducing a mutation, preferably a deletion, insertion or substitution into the OML4 gene and/or promoter sequence by any of the above described methods.
  • said mutation is introduced into at least one plant cell and a plant regenerated from the at least one mutated plant cell.
  • the plant or plant cell may comprise a nucleic acid construct expressing an RNAi molecule targeting the OML4 or GSK2 gene as described herein.
  • said construct is stably incorporated into the plant genome.
  • These techniques also include gene targeting using vectors that target the gene of interest and which allow integration of a transgene at a specific site.
  • the targeting construct is engineered to recombine with the target gene, which is accomplished by incorporating sequences from the gene itself into the construct. Recombination then occurs in the region of that sequence within the gene, resulting in the insertion of a foreign sequence to disrupt the gene. With its sequence interrupted, the altered gene will be translated into a nonfunctional protein, if it is translated at all.
  • the method comprises introducing at least one mutation into the OML4 gene and/or OML4 promoter of preferably at least one plant cell using any mutagenesis technique described herein.
  • the method further comprises introducing at least one mutation into the GSK2 gene and/or GSK2 promoter. Preferably, said method further comprises regenerating a plant from the mutated plant cell.
  • the method may further comprise selecting one or more mutated plants, preferably for further propagation.
  • said selected plants comprise at least one mutation in the target gene(s) and/or promoter sequence(s).
  • said plants or said seeds of said plant are characterised by abolished or a reduced level of OML4 expression and/or a reduced level of OML4 polypeptide activity.
  • Expression and/or activity levels of OML4 can be measured by any standard technique known to the skilled person. A reduction is as described herein.
  • the selected plants may be propagated by a variety of means, such as by clonal propagation or classical breeding techniques.
  • a first generation (or T1) transformed plant may be selfed and homozygous second-generation (or T2) transformants selected, and the T2 plants may then further be propagated through classical breeding techniques.
  • the generated transformed organisms may take a variety of forms. For example, they may be chimeras of transformed cells and non-transformed cells; clonal transformants (e.g., all cells transformed to contain the expression cassette); grafts of transformed and untransformed tissues (e.g., in plants, a transformed rootstock grafted to an untransformed scion).
  • a genetically altered plant characterised in that the expression of OML4 is increased compared to the level of expression in a control or wild-type plant.
  • the plant expresses a polynucleotide that is either exogenous or endogenous to that plant. That is, a polynucleotide that is introduced into the plant by any means other than a sexual cross.
  • an exogenous nucleic acid is expressed in the transgenic plant, which is a nucleic acid construct comprising a nucleic acid construct as described above.
  • the plant carries a mutation in its genome where the mutation is the insertion of at least one or more additional copy of a nucleic acid sequence encoding an OML4 polypeptide, as defined herein, or a homolog or variant thereof such that said sequence is operably linked to a regulatory sequence.
  • the plant may further comprise a second mutation in the plant genome, wherein the mutation is the insertion of at least one or more additional copy of a nucleic acid sequence encoding a GSK2 polypeptide, as defined herein, or a homolog or variant thereof such that said sequence is operably linked to a regulatory sequence.
  • the mutation is introduced using targeted genome editing.
  • a “genetically altered plant” or “mutant plant” is a plant that has been genetically altered compared to the naturally occurring wild type (WT) plant.
  • a mutant plant is a plant that has been altered compared to the naturally occurring wild type (WT) plant using a mutagenesis method, such as any of the mutagenesis methods described herein.
  • the mutagenesis method is targeted genome modification or genome editing.
  • the plant genome has been altered compared to wild type sequences using a mutagenesis method. Such plants have an altered phenotype as described herein, such as an increased disease resistance.
  • increased grain weight and/or size is conferred by the presence of an altered plant genome, for example, a mutated endogenous OML4 gene or OML4 promoter sequence.
  • the endogenous promoter or gene sequence is specifically targeted using targeted genome modification and the presence of a mutated gene or promoter sequence is not conferred by the presence of transgenes expressed in the plant.
  • the genetically altered plant can be described as transgene-free.
  • a plant according to the various aspects of the invention, including the transgenic plants, methods and uses described herein may be a monocot or a dicot plant.
  • the plant is a crop plant.
  • By crop plant is meant any plant which is grown on a commercial scale for human or animal consumption or use.
  • the crop plant is selected from rice, wheat, maize, soybean and brassicas, such as for example, B. napus. More preferably, the crop plant is rice and even more preferably the japonica or indica variety.
  • plant encompasses whole plants and progeny of the plants and plant parts, including seeds, fruit, shoots, stems, leaves, roots (including tubers), flowers, tissues and organs, wherein each of the aforementioned comprise at least one of the mutations described herein or a sgRNA or an RNAi construct as described herein.
  • plant also encompasses plant cells, suspension cultures, callus tissue, embryos, meristematic regions, gametophytes, sporophytes, pollen and microspores, again wherein each of the aforementioned comprises at least one of the mutations described herein or nucleic acid construct, a sgRNA or an RNAi construct as described herein.
  • the plant part is a grain or seed.
  • the invention also extends to harvestable parts of a plant of the invention as described herein, including but not limited to seeds, leaves, fruits, flowers, stems, roots, rhizomes, tubers and bulbs.
  • the aspects of the invention also extend to products derived, preferably directly derived, from a harvestable part of such a plant, such as dry pellets or powders, oil, fat and fatty acids, starch or proteins.
  • Another product that may be derived from the harvestable parts of the plant of the invention is biodiesel.
  • the invention also relates to food products and food supplements comprising the plant of the invention or parts thereof. In one embodiment, the food products may be animal feed.
  • there is provided a product derived from a plant as described herein or from a part thereof.
  • the plant part or harvestable product is a seed or grain. Therefore, in a further aspect of the invention, there is provided a seed produced from a genetically altered plant as described herein.
  • the plant part is pollen, a propagule or progeny of the genetically altered plant described herein. Accordingly, in a further aspect of the invention there is provided pollen, a propagule or progeny of the genetically altered plant as described herein.
  • a control plant as used herein according to all of the aspects of the invention is a plant which has not been modified according to the methods of the invention. Accordingly, in one embodiment, the control plant does not have reduced expression of an OML4 nucleic acid and/or reduced activity of an OML4 polypeptide. In an alternative embodiment, the plant has been genetically modified, as described above. In one embodiment, the control plant is a wild type plant. The control plant is typically of the same plant species, preferably having the same genetic background as the modified plant.
  • Genome editing constructs for use with the methods for targeted genome modification described herein
  • By crRNA or CRISPR RNA is meant the sequence of RNA that contains the protospacer element and additional nucleotides that are complementary to the tracrRNA.
  • tracrRNA transactivating RNA
  • a CRISPR enzyme such as Cas9 thereby activating the nuclease complex to introduce double-stranded breaks at specific sites within the genomic sequence of at least one OML4 or GSK2 nucleic acid or promoter sequence.
  • By protospacer element is meant the portion of crRNA (or sgRNA) that is complementary to the genomic DNA target sequence, usually around 20 nucleotides in length. This may also be known as a spacer or targeting sequence.
  • sgRNA single-guide RNA
  • gRNA single-guide RNA
  • the sgRNA or gRNA provide both targeting specificity and scaffolding/binding ability for a Cas nuclease.
  • a gRNA may refer to a dual RNA molecule comprising a crRNA molecule and a tracrRNA molecule.
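The protospacer definition above (a targeting sequence of around 20 nucleotides) can be illustrated with a small search routine. This is a hedged sketch, not the procedure used in the source: it assumes a Streptococcus pyogenes Cas9, whose targets are followed by an NGG protospacer-adjacent motif (PAM), a requirement the text does not state. It scans only the forward strand, and the function name and example sequence are hypothetical.

```python
import re

# Illustrative sketch only. Assumption (not stated in the source): the
# nuclease is S. pyogenes Cas9, which needs an NGG PAM immediately 3'
# of the ~20-nt target sequence. Only the forward strand is scanned.


def find_protospacers(dna: str, length: int = 20):
    """Return (start, protospacer, pam) for every site followed by NGG."""
    dna = dna.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Hypothetical sequence, made up for the example (not an OML4 or GSK2 sequence).
example = "A" * 20 + "TGG"
for start, protospacer, pam in find_protospacers(example):
    print(start, protospacer, pam)
```

A real design workflow would also scan the reverse complement and screen candidates for off-target matches, which this sketch does not attempt.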
  • TAL effector transcription activator-like (TAL) effector
  • TALE transcription activator-like (TAL) effector
  • genomic DNA target sequence e.g. a sequence within the OML4 gene or promoter sequence
  • a TALE protein is composed of a central domain that is responsible for DNA binding, a nuclear-localisation signal and a domain that activates target gene transcription.
  • the DNA-binding domain consists of monomers and each monomer can bind one nucleotide in the target nucleotide sequence.
  • Monomers are tandem repeats of 33-35 amino acids, of which the two amino acids located at positions 12 and 13 are highly variable (repeat variable diresidue, RVD). It is the RVDs that are responsible for the recognition of a single specific nucleotide.
  • HD targets cytosine, NI targets adenine, NG targets thymine and NN targets guanine (although NN can also bind to adenine with lower specificity).
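The RVD code described above is a one-to-one lookup from target nucleotide to repeat, so it can be sketched as a small table. The snippet below is illustrative only: the function names are hypothetical, and it covers just the four RVDs named in the text.

```python
# Illustrative sketch only: function names are hypothetical. The mapping
# follows the RVD code given in the text (HD -> C, NI -> A, NG -> T, NN -> G).

RVD_FOR_BASE = {"C": "HD", "A": "NI", "T": "NG", "G": "NN"}

# Bases each RVD can recognise; NN also binds adenine with lower specificity.
BASES_FOR_RVD = {"HD": {"C"}, "NI": {"A"}, "NG": {"T"}, "NN": {"G", "A"}}


def rvds_for_target(target: str) -> list[str]:
    """Return one RVD per nucleotide of the target, in 5' to 3' order."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported nucleotides: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]


def rvds_can_bind(rvds: list[str], dna: str) -> bool:
    """True if every RVD can recognise the base at its position."""
    dna = dna.upper()
    return len(rvds) == len(dna) and all(
        base in BASES_FOR_RVD[rvd] for rvd, base in zip(rvds, dna)
    )


print(rvds_for_target("ACGT"))  # ['NI', 'HD', 'NN', 'NG']
```

Because NN tolerates adenine, `rvds_can_bind(["NN"], "A")` is true, which is one reason a designed array may bind sequences other than its intended target.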
  • a nucleic acid construct wherein the nucleic acid construct encodes at least one DNA-binding domain, wherein the DNA-binding domain can bind to a sequence in the OML4 gene, wherein said sequence comprises or consists of SEQ ID NO: 33 or a variant thereof.
  • the DNA-binding domain can bind to a sequence in the GSK2 gene, wherein said sequence comprises or consists of SEQ ID NO: 34 or a variant thereof.
  • said construct further comprises a nucleic acid encoding a SSN, such as FokI or a Cas protein.
  • the nucleic acid construct encodes at least one protospacer element wherein the sequence of the protospacer element is selected from SEQ ID NO: 35 (to target OML4) or SEQ ID NO: 36 (to target GSK2) or a variant thereof.
  • the nucleic acid construct comprises a crRNA-encoding sequence.
  • a crRNA sequence may comprise the protospacer elements as defined above and preferably additional nucleotides that are complementary to the tracrRNA.
  • An appropriate sequence for the additional nucleotides will be known to the skilled person as these are defined by the choice of Cas protein.
  • the nucleic acid construct further comprises a tracrRNA sequence.
  • a tracrRNA sequence would be known to the skilled person as this sequence is defined by the choice of Cas protein.
  • the nucleic acid construct comprises at least one nucleic acid sequence that encodes a sgRNA (or gRNA).
  • sgRNA typically comprises a crRNA sequence, a tracrRNA sequence and preferably a sequence for a linker loop.
  • the nucleic acid construct may further comprise at least one nucleic acid sequence encoding an endoribonuclease cleavage site.
  • the endoribonuclease is Csy4 (also known as Cas6f).
  • where the nucleic acid construct comprises multiple sgRNA nucleic acid sequences, the construct may comprise the same number of endoribonuclease cleavage sites.
  • the cleavage site is 5’ of the sgRNA nucleic acid sequence. Accordingly, each sgRNA nucleic acid sequence is flanked by an endoribonuclease cleavage site.
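The layout described above (one cleavage site placed 5’ of each sgRNA, so that sites and sgRNAs are equal in number) can be sketched as an ordered list of elements. This is a schematic illustration only: the function name is hypothetical and the element labels are symbolic placeholders, not real sequences.

```python
# Illustrative sketch only: labels are symbolic placeholders, not real
# nucleotide sequences, and the function name is hypothetical.


def build_sgrna_array(sgrnas: list[str], cleavage_site: str) -> list[str]:
    """Order the elements so that a cleavage site sits 5' of every sgRNA."""
    elements = []
    for sgrna in sgrnas:
        elements.append(cleavage_site)  # site immediately 5' of the sgRNA
        elements.append(sgrna)
    return elements


array = build_sgrna_array(["[sgRNA-OML4]", "[sgRNA-GSK2]"], "[Csy4-site]")
print(" ".join(array))
# [Csy4-site] [sgRNA-OML4] [Csy4-site] [sgRNA-GSK2]
```

With this arrangement the number of cleavage sites always equals the number of sgRNA sequences, matching the statement in the text.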
  • variant refers to a nucleotide sequence where the nucleotides are substantially identical to one of the above sequences.
  • the variant may be achieved by modifications such as an insertion, substitution or deletion of one or more nucleotides.
  • the variant has at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% identity to any one of the above sequences.
  • sequence identity is at least 90%.
  • sequence identity is 100%. Sequence identity can be determined by any one known sequence alignment program in the art.
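The identity thresholds above depend on how identity is counted. The following is a minimal sketch under one common convention, assumed here rather than taken from the source: the two sequences are already aligned to equal length with "-" marking gaps, and identity is the number of matching columns divided by the number of alignment columns. Alignment programs may use other denominators, so results can differ slightly from tool to tool.

```python
# Illustrative sketch only. Assumption: inputs are pre-aligned, equal-length
# strings with '-' for gaps; identity = matching columns / alignment columns.


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no alignment columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical 8-column alignment with a single mismatch.
print(percent_identity("ATGCCGTA", "ATGACGTA"))  # 87.5
```

Under this convention the example pair would not meet the preferred "at least 90%" level, while an identical pair returns 100.0.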
  • the invention also relates to a nucleic acid construct comprising a nucleic acid sequence operably linked to a suitable plant promoter.
  • a suitable plant promoter may be a constitutive or strong promoter or may be a tissue-specific promoter.
  • suitable plant promoters are selected from, but not limited to U3 and U6.
  • the nucleic acid construct of the present invention may also further comprise a nucleic acid sequence that encodes a CRISPR enzyme.
  • By CRISPR enzyme is meant an RNA-guided DNA endonuclease that can associate with the CRISPR system. Specifically, such an enzyme binds to the tracrRNA sequence.
  • the CRISPR enzyme is a Cas protein (“CRISPR associated protein”), preferably Cas9 or Cpf1, more preferably Cas9.
  • Cas9 is a codon-optimised Cas9 (specific for the plant in question).
  • Cas9 has the sequence described in SEQ ID NO: 32 or a functional variant or homolog thereof.
  • the CRISPR enzyme is a protein from the family of Class 2 candidate x proteins, such as C2c1, C2c2 and/or C2c3.
  • the Cas protein is from Streptococcus pyogenes.
  • the Cas protein may be from any one of Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus or Treponema denticola.
  • the CRISPR enzyme is MAD7.
  • the term “functional variant” as used herein with reference to Cas9 refers to a variant Cas9 gene sequence or part of the gene sequence which retains the biological function of the full non-variant sequence, for example, acts as a DNA endonuclease, or recognises and/or binds to DNA.
  • a functional variant also comprises a variant of the gene of interest, which has sequence alterations that do not affect function, for example non-conserved residues.
  • a functional variant of SEQ ID NO: 32 has at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% overall sequence identity to the nucleic acid represented by SEQ ID NO: 32.
  • the Cas9 protein has been modified to improve activity.
  • Suitable homologs or orthologs can be identified by sequence comparisons and identifications of conserved domains.
  • the function of the homolog or ortholog can be identified as described herein and a skilled person would thus be able to confirm the function when expressed in a plant.
  • the nucleic acid construct comprises at least one nucleic acid sequence that encodes a TAL effector, wherein said effector targets an OML4 sequence, such as SEQ ID NO: 33, or a GSK2 sequence, such as SEQ ID NO: 34.
  • Methods for designing a TAL effector would be well known to the skilled person, given the target sequence. Examples of suitable methods are given in Sanjana et al., and Cermak T et al, both incorporated herein by reference.
  • said nucleic acid construct comprises two nucleic acid sequences encoding a TAL effector, to produce a TALEN pair.
  • the nucleic acid construct further comprises a sequence-specific nuclease (SSN).
  • SSN is an endonuclease such as FokI.
  • the TALENs are assembled by the Golden Gate cloning method in a single plasmid or nucleic acid construct.
  • a sgRNA molecule wherein the sgRNA molecule comprises a crRNA sequence and a tracrRNA sequence and wherein the crRNA sequence can bind to at least one sequence such as SEQ ID NO: 33 (for OML4) or SEQ ID NO: 34 (for GSK2) or a variant thereof.
  • the sgRNA molecule may comprise at least one chemical modification, for example that enhances its stability and/or binding affinity to the target sequence or the crRNA sequence to the tracrRNA sequence.
  • modifications would be well known to the skilled person, and include for example, but not limited to, the modifications described in Rahdar et al., 2015, incorporated herein by reference.
  • the crRNA may comprise a phosphorothioate backbone modification, such as 2’-fluoro (2’-F), 2’-O-methyl (2’-O-Me) and S-constrained ethyl (cET) substitutions.
  • nucleic acid sequence that encodes for a protospacer element (as defined in any of SEQ ID NO: 35 or 36.)
  • Cas9 and sgRNA may be combined or in separate expression vectors (or nucleic acid constructs, such terms are used interchangeably).
  • an isolated plant cell is transfected with a single nucleic acid construct comprising both sgRNA and Cas9 as described in detail above.
  • an isolated plant cell is transfected with two nucleic acid constructs, a first nucleic acid construct comprising at least one sgRNA as defined above and a second nucleic acid construct comprising Cas9 or a functional variant or homolog thereof.
  • the second nucleic acid construct may be transfected before, after or concurrently with the first nucleic acid construct.
  • the advantage of a separate, second construct comprising a cas protein is that the nucleic acid construct encoding at least one sgRNA can be paired with any type of cas protein, as described herein, and therefore is not limited to a single cas function (as would be the case when both cas and sgRNA are encoded on the same nucleic acid construct).
  • the nucleic acid construct comprising a cas protein is transfected first and is stably incorporated into the genome, before the second transfection with a nucleic acid construct comprising at least one sgRNA nucleic acid.
  • a plant or part thereof or at least one isolated plant cell is transfected with mRNA encoding a cas protein and co-transfected with at least one nucleic acid construct as defined herein.
  • Cas9 expression vectors for use in the present invention can be constructed as described in the art.
  • the expression vector comprises a nucleic acid sequence as defined herein or a functional variant or homolog thereof, wherein said nucleic acid sequence is operably linked to a suitable promoter.
  • suitable promoters include, but are not limited to Cas9, 35S and Actin.
  • an isolated plant cell transfected with at least one sgRNA molecule as described herein.
  • a genetically modified or edited plant comprising the transfected cell described herein.
  • the nucleic acid construct or constructs may be integrated in a stable form.
  • the nucleic acid construct or constructs are not integrated (i.e. are transiently expressed).
  • the genetically modified plant is free of any sgRNA and/or Cas protein nucleic acid. In other words, the plant is transgene free.
  • introduction encompasses the transfer of an exogenous polynucleotide into a host cell, irrespective of the method used for transfer.
  • Plant tissue capable of subsequent clonal propagation may be transformed with a genetic construct of the present invention and a whole plant regenerated there from.
  • the particular tissue chosen will vary depending on the clonal propagation systems available for, and best suited to, the particular species being transformed.
  • Exemplary tissue targets include leaf disks, pollen, embryos, cotyledons, hypocotyls, megagametophytes, callus tissue, existing meristematic tissue (e.g., apical meristem, axillary buds, and root meristems), and induced meristem tissue (e.g., cotyledon meristem and hypocotyl meristem).
  • the resulting transformed plant cell may then be used to regenerate a transformed plant in a manner known to persons skilled in the art.
  • transformation Transformation of plants is now
  • Transformation methods include the use of liposomes, electroporation, chemicals that increase free DNA uptake, injection of the DNA directly into the plant (microinjection), gene guns (or biolistic particle delivery systems (biolistics)) as described in the examples, lipofection, transformation using viruses or pollen and microprojection.
  • Methods may be selected from the calcium/polyethylene glycol method for protoplasts, ultrasound-mediated gene transfection, optical or laser transfection, transfection using silicon carbide fibers, electroporation of protoplasts, microinjection into plant material, DNA or RNA-coated particle bombardment, infection with (non-integrative) viruses and the like.
  • Transgenic plants can also be produced via Agrobacterium tumefaciens mediated transformation, including but not limited to using the floral dip/Agrobacterium vacuum infiltration method as described in Clough & Bent (1998) and incorporated herein by reference.
  • At least one nucleic acid construct or sgRNA molecule as described herein can be introduced to at least one plant cell using any of the above described methods.
  • any of the nucleic acid constructs described herein may be first transcribed to form a preassembled Cas9-sgRNA ribonucleoprotein and then delivered to at least one plant cell using any of the above described methods, such as lipofection, electroporation or microinjection.
  • the plant material obtained in the transformation is, as a rule, subjected to selective conditions so that transformed plants can be distinguished from untransformed plants.
  • the seeds obtained in the above-described manner can be planted and, after an initial growing period, subjected to a suitable selection by spraying.
  • a further possibility is growing the seeds, if appropriate after sterilization, on agar plates using a suitable selection agent so that only the transformed seeds can grow into plants.
  • a suitable marker can be bar-phosphinothricin or PPT.
  • the transformed plants are screened for the presence of a selectable marker, such as, but not limited to, GFP, GUS (β-glucuronidase). Other examples would be readily known to the skilled person.
  • the generated transformed plants may be propagated by a variety of means, such as by clonal propagation or classical breeding techniques.
  • a first generation (or T1) transformed plant may be selfed and homozygous second-generation (or T2) transformants selected, and the T2 plants may then further be propagated through classical breeding techniques.
  • a method of obtaining a genetically modified plant as described herein comprising a. selecting a part of the plant; b. transfecting at least one cell of the part of the plant of paragraph (a) with at least one nucleic acid construct as described herein or at least one sgRNA molecule as described herein, using the transfection or transformation techniques described above; c. regenerating at least one plant derived from the transfected cell or cells; d. selecting one or more plants obtained according to paragraph (c) that show silencing or reduced expression of OML4.
  • the method also comprises the step of screening the genetically modified plant for SSN (preferably CRISPR)-induced mutations in the OML4 gene or promoter sequence.
  • the method comprises obtaining a DNA sample from a transformed plant and carrying out DNA amplification to detect a mutation in at least one OML4 gene or promoter sequence.
  • the methods comprise generating stable T2 plants preferably homozygous for the mutation (that is a mutation in at least one OML4 gene or promoter sequence). Plants that have a mutation in at least one OML4 gene and/or promoter sequence can also be crossed with another plant also containing at least one mutation in at least one OML4 gene and/or promoter sequence to obtain plants with additional mutations in the OML4 gene promoter sequence.
  • This method can be used to generate T2 plants with mutations on all or an increased number of homoeologs, when compared to the number of homoeolog mutations in a single T1 plant transformed as described above.
  • a genetically altered plant of the present invention may also be obtained by transference of any of the sequences of the invention by crossing, e.g., using pollen of the genetically altered plant described herein to pollinate a wild-type or control plant, or pollinating the gynoecia of plants described herein with other pollen that does not contain a mutation in at least one of the OML4 gene or promoter sequence.
  • the methods for obtaining the plant of the invention are not exclusively limited to those described in this paragraph; for example, genetic transformation of germ cells from the ear of wheat could be carried out as mentioned, but without having to regenerate a plant afterward.
  • the large1 mutant forms large and heavy grains
  • The large1-1 mutant was isolated from γ-ray-treated M2 populations of the japonica variety Zhonghuajing (ZHJ).
  • the large1-1 mutant displayed large grains and high plants (Figure 1A-1E).
  • the length of large1-1 grains was increased by 16.24% compared with that of ZHJ grains (Figure 1F).
  • the width of large1-1 grains was increased by 11.54% compared with that of ZHJ grains (Figure 1G).
  • the large1-1 grains were also significantly heavier than ZHJ grains (Figure 1H).
  • the weight of large1-1 grains was increased by 23.11% compared with that of ZHJ grains.
  • LARGE1 encodes the Mei-2 like protein OML4
  • the MutMap approach was used to identify the large1-1 mutation.
  • the progeny segregation showed that a single recessive mutation determines the large grain phenotype of large1-1.
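A single recessive mutation predicts a 3:1 ratio of wild-type-like to mutant-like plants in the F2 generation, which is conventionally checked with a chi-square goodness-of-fit test. The sketch below is illustrative only: the source does not report the segregation counts or the test used, so the function names and counts are hypothetical. The critical value 3.841 is the standard chi-square cutoff for one degree of freedom at p = 0.05.

```python
# Illustrative sketch only: the source gives no segregation counts, so the
# numbers below are hypothetical placeholders.

CRITICAL_VALUE_DF1_P05 = 3.841  # chi-square cutoff, 1 degree of freedom, p = 0.05


def chi_square_3_to_1(wild_type_like: int, mutant_like: int) -> float:
    """Chi-square statistic for observed counts against a 3:1 expectation."""
    total = wild_type_like + mutant_like
    if total == 0:
        raise ValueError("at least one plant must be scored")
    expected_wt = 0.75 * total
    expected_mut = 0.25 * total
    return (
        (wild_type_like - expected_wt) ** 2 / expected_wt
        + (mutant_like - expected_mut) ** 2 / expected_mut
    )


def consistent_with_single_recessive(wild_type_like: int, mutant_like: int) -> bool:
    """True if the counts do not deviate significantly from 3:1."""
    return chi_square_3_to_1(wild_type_like, mutant_like) < CRITICAL_VALUE_DF1_P05


# Hypothetical F2 counts: 290 normal-grain plants and 110 large-grain plants.
print(round(chi_square_3_to_1(290, 110), 3))       # 1.333
print(consistent_with_single_recessive(290, 110))  # True
```

A statistic below the cutoff means the data do not reject the 3:1 hypothesis; it does not by itself prove single-gene inheritance.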
  • the genomic DNAs from F2 plants with large-grain phenotype were pooled and applied for whole-genome resequencing.
  • the wild-type ZHJ was also sequenced as a control. SNP analyses were performed as described previously (Fang, et al. 2016; Huang, et al. 2017). We detected 3913 SNPs and 1280 INDELs between ZHJ and the pooled F2 plants with large1-1 phenotypes.
  • This INDEL contains a 4-bp deletion in large1-1 in the gene LOC_Os02g31290 (Figure 3A; Figure 9; Table 13), which leads to a premature stop codon (Figure 3B).
  • Figure 3C dCAPSI marker
  • the genetic complementation test was conducted to confirm whether the deletion in LOC_Os02g31290 was responsible for the large1-1 phenotypes.
  • the genomic fragment of LOC_Os02g31290 (gLARGE1) was transformed into the large1-1 mutant, generating eleven transgenic lines.
  • the gLARGE1 construct complemented the large grain phenotypes of the large1-1 mutant (Figure 3D and 3E).
  • the grain length and width of gLARGE1;large1-1 transgenic plants were similar to those of ZHJ (Figure 3F and 3G).
  • Genomic complementation plants also recovered wild-type plant height and morphology (Figure 10). Therefore, the complementation test supported that the LARGE1 gene is LOC_Os02g31290.
  • LARGE1/LOC_Os02g31290 encodes the Mei-2 like protein OML4 with three RNA Recognition Motifs (RRMs) (Figure 3B and Figure 11). Homologs of OML4 were found in crops (Figure 11), but the role of OML4 and its homologs in grain size control has so far been unknown. The mutation in large1-1 resulted in a premature stop codon. The protein encoded by large1-1 (OML4 large1-1) lacked RRM motifs (Figure 3B), which indicated that large1-1 is a loss-of-function allele.
  • OML4 results in short grains due to short cells in spikelet hulls
  • OML4 interacts with GSK2
  • OML4 in grain growth control, we identified its interacting partners through a yeast two-hybrid (Y2H) assay.
  • the OML4 full-length protein was used as the bait.
  • six different clones corresponding to GSK2 were found in this screen.
  • GSK2 has been reported to restrict grain growth in rice, suggesting that GSK2 is a candidate OML4-interacting partner.
  • OML4 was fused with the C-terminus of the yellow fluorescent protein (OML4-cYFP), and GSK2 was fused with the N-terminus of the yellow fluorescent protein (GSK2-nYFP). Confocal laser scanning microscopy observation showed that a strong YFP fluorescence was observed in nuclei when we co-expressed OML4-cYFP and GSK2- nYFP in N.benthamiana leaves. These results indicate that OML4 associates with GSK2 in plant cells.
  • GSK2 phosphorylates OML4 and modulates its protein level
  • GSK2 acts genetically with OML4 to regulate grain size
  • GSK2-RNAi produced long grains, like those observed in the large1-1 mutant, and GSK2 and OML4 restrict cell elongation in spikelet hulls (Figure 2 and Figure 7).
  • GSK2 can phosphorylate OML4 in vitro.
  • We crossed large1-1 with GSK2-RNAi and isolated large1-1;GSK2-RNAi plants (Figure 7K).
  • the length of large1-1 grains was increased by 16.24% in comparison to that of ZHJ, while the length of large1-1;GSK2-RNAi grains was increased by 7.90% compared with GSK2-RNAi.
  • Grain size and weight are critical determinants of grain yield, but our understanding of the genetic and molecular mechanisms of grain size control in rice is still limited.
  • OML4 as a novel regulator of grain size and weight.
  • GSK2 interacts with and phosphorylates OML4.
  • GSK2 and OML4 function, at least in part, in a common pathway to control grain length in rice.
  • The large1-1 mutant produced long, wide and heavy grains in comparison to the wild type. By contrast, overexpression of LARGE1 caused short and light grains. Thus, LARGE1 is a negative regulator of grain size and weight. Cellular analyses support that LARGE1 controls grain size by restricting cell expansion. Consistent with this, expression of several genes (e.g. SPL13, GS2, GS5 and GL7) (Li, et al. 2011; Che, et al. 2015; Duan, et al. 2015; Hu, et al. 2015; Zhou, et al. 2015; Si, et al. 2016), which control grain size by regulating cell expansion, was altered in large1-1 (Figure 8).
  • LARGE1 encodes the Mei2-like protein (OML4) in rice.
  • Mei2-like proteins There are many Mei2-like proteins in plants, which have the conserved RRMs, but appear to have taken on distinct functions in plant development (Jeffares, et al. 2004).
  • the Arabidopsis-Mei2-Like (AML) genes constitute a five-member gene family, which plays a role in meiosis and vegetative growth (Kaur, et al. 2006).
  • TERMINAL EAR 1 ( TE1 ) encoding a Mei2-like protein, plays a role in regulating leaf initiation (Veit, et al. 1998).
  • PLASTOCHRON2(PLA2)/LEAFY HEAD2 (LHD2) encodes a Mei2-like protein (OML1) (Kawakatsu, et al. 2006).
  • the pla2 mutant exhibited precocious maturation of leaves, shortened plastochron, and ectopic shoot formation during the reproductive phase (Kawakatsu, et al. 2006).
  • OML4 a negative regulator of grain size in rice.
  • GSK2 regulates grain size by interacting with GS2 that predominately promotes cell expansion in spikelet hulls (Che et al., 2015).
  • GSK5 a homolog of GSK2
  • GSK2 has been reported to control grain size by restricting cell expansion in spikelet hulls (Hu, et al. 2018).
  • GSK2 is a functional protein kinase
  • GSK2 could phosphorylate OML4.
  • GSK2 can interact with and phosphorylate OML4.
  • GSK2 influences the level of OML4 (Figure 6E). It is possible that GSK2 might phosphorylate OML4 and prevent the degradation of OML4.
  • γ-rays were used to irradiate the grains of the wild type Zhonghuajing (ZHJ), and the large1-1 mutant was isolated from the M2 population.
  • Rice plants were grown in the field according to a previous report (Huang, et al. 2017). Rice plants were cultivated in Lingshui from December 2016 to April 2017 and from December 2017 to April 2018, and at the Zhejiang Academy of Agricultural Sciences (Hangzhou) from July 2017 to November 2017 and from July 2018 to November 2018.
  • RNA from seedlings or young panicles was extracted using an RNA Pre Pure Plant Kit (Tiangen, Beijing). cDNA was synthesized according to the previous study (Duan, et al. 2015). Real-time RT-PCR was conducted on an ABI7500 real-time PCR system using a SYBR Green Mix Kit (Bio-Rad, Hercules, CA). The rice Actin1 gene was used as an internal control.
  • the latter series of the recombinant vectors were constructed using the same kit and similar methods.
  • the related vectors we used in this study were pIPKB003 (containing the ACTIN promoter and fused with the CDS of the OML4 gene), pMDC107 (constructing the gOML4-GFP plasmid), and pMDC164 (constructing the proOML4:GUS vector).
  • the plasmids gOML4, proACTIN:OML4, gOML4-GFP and proOML4:GUS were introduced into the Agrobacterium strain GV3101, respectively.
  • the gOML4 and gOML4-GFP were transferred into large1-1, and other plasmids were transferred into the wild type according to a previous report (Hiei, et al. 1994).
  • the cDNA sequences of GSK2 and OML4 were amplified using gene-specific primers (Table S4), and products were fused into the linearized pGADT7 and pGBKT7 vectors, respectively. Yeast two-hybrid analysis was conducted according to the manufacturer’s instruction (Clontech, USA).
  • Recombinant proteins (OML4-MBP and MBP) and the prey proteins (GSK2-GST and GST) were incubated in TGH buffer (50 mM HEPES, pH 7.5, 10% glycerol, 150 mM NaCl, Triton X-100, 1.5 mM MgCl2, 1 mM EGTA, and protease inhibitor cocktail tablet) for 0.5 hr at 4 °C with 20 µl MBP-beads per tube. Centrifuge at 500 rpm for 2 min and discard the supernatant to stop the reaction. Wash the beads five times with ice-cold TGH buffer and then add 50 µl SDS-loading buffer.
  • TGH buffer 50 mM HEPES, pH 7.5, 10% glycerol, 150 mM NaCl, Triton X-100, 1.5 mM MgCl2, 1 mM EGTA, and protease inhibitor cocktail tablet
  • OML4, nOML4 and cOML4 were amplified using the specific primers (OML4-FLAG-F/R, nOML4-FLAG-F/R and cOML4-FLAG-F/R) in Table S4.
  • the products were cloned into the vector pETnT to construct OML4-FLAG, nOML4-FLAG and cOML4-FLAG plasmids.
  • the GSK2 coding sequence was amplified using the primers GSK2-GST-F/R and subcloned into the vector pGEX4T-1 to construct the GSK2-GST plasmid. All these plasmids were transformed into Escherichia coli (host strain BL21).
  • OML4-FLAG Induction, isolation and purification of OML4-FLAG, nOML4-FLAG, cOML4-FLAG and GSK2-GST proteins were done as described previously (Xia, et al. 2013). 10 µL of GSK2-GST was incubated with 5 µL of OML4-FLAG, nOML4-FLAG and cOML4-FLAG in 20 µL reaction buffer (25 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 1 mM DTT, 50 mM ATP) for 2 hours, respectively. Phosphorylated products were analyzed by phos-tag SDS-PAGE. Anti-GST and anti-FLAG antibodies were utilized to detect the phosphorylated products and the input.
  • SEQ ID NO: 1 OML4 amino acid sequence
  • SEQ ID NO: 2 OML4 nucleic acid sequence (genomic)
  • ATGCCAT CT CAGGTCATGG AT CAG AGGCAT CACAT GT CCCAGT ACAGCCACCCCACCTT G GCTGCAT CCT CCTT CT CGG AGG AGCTT CGT CT CCCCACAG AGGTACT CCATAATTGCG AT A ATTTT G GT CC AAAT CTT CCTT CT G G AAGTCTTTT CTATGTG ATG GCT AAT G GTG ATCTGTCT GG AAATTTT ATTT GTTT AGCCTTT CCTGGT G ACCTGGTT AT GATT CAT AT CT ACAAAT CTTT A CC AATT ATT CT C ACC AT GTTT AT AT AT ATT C ATT AT G ATG AAT AT CT AT AATTT GT ACT AATTTTT CTCTCACCATGTTCATCTCTTCTTCTATCTTTGCAGAGGCAAGTTGGATTTTGGAAGCAGGA GTCATTACCTCATCACATGGGTTAGTGCTGAGTTTGATTTAACTGGGTTTTGTTCTA CATTTGTCTATTAGTATGCCTTGCGGTTGC
  • SEQ ID NO: 4 GSK2 amino acid sequence
  • SEQ ID NO: 5 GSK2 nucleic acid sequence
  • SEQ ID NO: 7 OML4 amino acid sequence
  • SEQ ID NO: 8 OML4 nucleic acid sequence
  • GCT G ACTTT GTT CC AAT AG G AG AACCCTTT G G AG AATG AC AAT AACC AG GG C AC ACTT G C A
  • GCT G AAGG AATGG AG AGCAG ACAT CTTTACAAAGTTGGTT CTGCT AACCTTGGTGGT CATT
  • ACCT CCT CTCCC ATGCT ATGG ACG AACT CAG GAT C ATTT AT C AAT AAT AT ACC AT CTCG ACC
  • SEQ ID NO: 9 OML4 promoter sequence
  • CTGCTT C AGT CG AG CTACCTG AG GTGTT G AAACTTGGT ATCTGT CT AT CTTT C AAG GT G CTA
  • SEQ ID NO: 10 GSK2 amino acid sequence
  • SEQ ID NO: 11 GSK2 nucleic acid sequence
  • SEQ ID NO: 12 GSK2 promoter sequence
  • SEQ ID NO: 13 OML4 amino acid sequence
  • SEQ ID NO: 14 OML4 nucleic acid sequence
  • SEQ ID NO: 15 OML4 promoter sequence
  • SEQ ID NO: 16 GSK2 amino acid sequence
  • SEQ ID NO: 17 GSK2 nucleic acid sequence
  • SEQ ID NO: 18 GSK2 promoter sequence
  • SEQ ID NO: 19 OML4 amino acid sequence
  • SEQ ID NO: 20 OML4 nucleic acid sequence
  • SEQ ID NO: 21 OML4 promoter sequence
  • SEQ ID NO: 22 GSK2 amino acid sequence
  • SEQ ID NO: 23 GSK2 nucleic acid sequence
  • SEQ ID NO: 24 GSK2 promoter sequence
  • SEQ ID NO: 25 OML4 amino acid sequence
  • SEQ ID NO: 26 OML4 nucleic acid sequence
  • SEQ ID NO: 27 OML4 promoter sequence
  • AAAGT CAATT AATTT CTTT G AAATTT ACAAATTTTT CAT AG AAAAC A C AAAAAT AC ATTTGT G AAAC AAACTTTTT C AAAAAAGT CT AT CTT GAT G AAACGG AT G G A GTATTATGTAT AAT ATTTTT ATT AT AT ATTTT ATTGCTAAATAAAAATTTTATGACTTTTGTTTA CTTTTT CACCAAT AAAAG ACT AT AATGCAAAAT GT AAAAT ATTT AAAGTTT AATTT G AAGTT GT T ATTT CGG AAAT AAT CACCTT CG AAGTTT AAATTT GT AAT ATTGCAAACTTT ATTTGG AG ATG TTTT C ACG GT CG ACTTG
  • SEQ ID NO: 28 GSK2 amino acid sequence
  • SEQ ID NO: 30 GSK2 promoter sequence
  • ATG CGTTCT AAGTAT C AAG AT CCT ATT ACT ACT ACT AC ACCTT GT AAT GAG AAT CAT AAG GT G AAG AT AAATGG AT CTT CT ACT CCAG AAGGG AAAG AG AG ACT AG AG AACTT G AGCT CAG CTT C ACG CACT AAAACC AGC AAAAACTTT G GTG AGCT CTT G GCTAGTG AT G AC AAT AC AT G GGAACCTTATTCTGAGGCTCCTGTTGCTGAGAAAACTCTGTATGTAGACACTGTGCATTCA GT ACACAAG AAGGT ACAAG AAG AGT CTTT ATT AAAAG ATT ACCCTT CACT AG AAGTT GTT CC TGTTAAAGAAGATGTTCAGAACTTGATTGGAGCCAGTGAAGAAGCTATCTCAGGTCTAAAA GTTGAAGAATGTGCTGATCAAGCTATTTCTGAAGTAGTAGAGATTACAAAGGATTTTGAATG TT C AAGG CTT CAT CAT CAT C AC ATT
  • SEQ ID NO: 31 GSK2 RNAi sequence
  • SEQ ID NO: 33 CRISPR target sequence for OML4 GTGGGTTCCGGCAACCTCAATGG
  • SEQ ID NO: 34 CRISPR target sequence for GSK2 AGGGGAATGACGCGGTGACCGGG
  • SEQ ID NO: 35 CRISPR protospacer sequence for OML4 GTGGGTTCCGGCAACCTCAA
  • SEQ ID NO: 36 CRISPR protospacer sequence for GSK2 AGGGGAATGACGCGGTGACC
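
The relationship between the CRISPR target sequences (SEQ ID NOs: 33 and 34) and the protospacer sequences (SEQ ID NOs: 35 and 36) listed above can be checked with a short sketch. It assumes that each 23-nt target consists of a 20-nt protospacer followed by a 3-nt PAM of the form NGG, which is the usual convention for Cas9 from Streptococcus pyogenes but is not stated in the listing itself. The function names `split_target` and `has_ngg_pam` are illustrative only.

```python
import re
from typing import Tuple


def split_target(target: str) -> Tuple[str, str]:
    """Split a 23-nt target into (20-nt protospacer, 3-nt PAM).

    Assumption: the PAM is the last three nucleotides of the target.
    """
    target = target.upper()
    if not re.fullmatch(r"[ACGT]{23}", target):
        raise ValueError("expected a 23-nt DNA sequence")
    return target[:20], target[20:]


def has_ngg_pam(pam: str) -> bool:
    """Return True if the PAM matches NGG (assumed Cas9 convention)."""
    return re.fullmatch(r"[ACGT]GG", pam.upper()) is not None


# Sequences copied from SEQ ID NOs: 33-36 in the listing above.
targets = {
    "OML4": "GTGGGTTCCGGCAACCTCAATGG",
    "GSK2": "AGGGGAATGACGCGGTGACCGGG",
}
protospacers = {
    "OML4": "GTGGGTTCCGGCAACCTCAA",
    "GSK2": "AGGGGAATGACGCGGTGACC",
}

for gene, target in targets.items():
    protospacer, pam = split_target(target)
    # Report whether the listed protospacer matches and the PAM is NGG.
    print(gene, protospacer, pam,
          protospacer == protospacers[gene], has_ngg_pam(pam))
```

Under these assumptions, both listed targets split into the listed protospacer plus a PAM ending in GG (TGG for OML4 and GGG for GSK2).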

Abstract

The invention relates to methods of increasing grain size and/or weight in a plant, as well as plants with increased grain size and/or weight.

Description

Methods of controlling grain size and weight
FIELD OF THE INVENTION
The invention relates to methods of increasing grain size and/or weight in a plant, as well as plants with increased grain size and/or weight by reducing the expression and/or activity of OML4. Alternatively, the invention relates to methods of increasing grain number by increasing the expression and/or activity of OML4.
BACKGROUND OF THE INVENTION
The world population continues to increase rapidly, and this increase has led to a growing demand for staple crops, such as rice, wheat and maize. Grain yield is determined by tiller number, grain number and grain weight. As grain size is a key component of grain weight, regulation of grain size is a crucial strategy to increase grain production. Grain growth is restricted by spikelet hulls, which influence final grain size in rice. In turn, the growth of the spikelet hull is determined by cell proliferation and cell expansion processes. Several genes that regulate grain size by influencing cell proliferation in the spikelet hull have been described in rice, such as GW2, GW5/GSE5, GW8/OsSPL16, GS3, GS9, OsMKKK10-OsMKK4-OsMPK6 and MKP1. In addition, several genes that control grain size by influencing cell expansion in the spikelet hulls have been reported in rice, such as GS2/OsGRF4, OsGSK5, GLW7 (SPL13), GL7, PGL1/2 and APG. However, the genetic and molecular relationships between these factors remain largely unknown. There therefore exists a need to increase grain size and/or grain weight in staple crops. There also exists a need to increase grain number in staple crops. The present invention addresses these needs.
SUMMARY OF THE INVENTION
We have identified genes whose loss and gain of functions lead to opposite effects on grain size. Here we report that the Mei2-Like protein 4 (OML4) encoded by the LARGE1 gene is phosphorylated by the glycogen synthase kinase 2 (GSK2) and negatively controls grain size and weight in rice. Loss of function of OML4 leads to large and heavy grains, while overexpression of OML4 causes small and light grains. OML4 regulates grain size by restricting cell expansion in the spikelet hull. OML4 is expressed in developing inflorescences (e.g. panicles of rice) and grains, and expression (indicated by GFP-OML4 fusion protein) is localized in the nuclei. Biochemical analyses show that GSK2 physically interacts with OML4 and phosphorylates it, therefore possibly influencing the stability of OML4. Genetic analyses support that GSK2 and OML4 act, at least in part, in a common pathway to control grain size in rice. Therefore, our findings reveal a significant genetic and molecular mechanism to control both grain size and weight in crops.
In a first aspect of the invention, there is provided a method of increasing grain size and/or weight, the method comprising reducing or abolishing the expression and/or activity of Mei2-Like protein 4 (OML4).
Preferably, the method comprises introducing at least one mutation into at least one nucleic acid sequence encoding OML4 and/or at least one mutation into the promoter of OML4.
In a further embodiment, the method further comprises additionally reducing or abolishing the expression and/or activity of a SHAGGY-like kinase (GSK2). Preferably, the method comprises introducing at least one mutation into at least one nucleic acid sequence encoding GSK2 and/or at least one mutation into the promoter of GSK2.
In one embodiment, the mutation is a loss of function or partial loss of function mutation. Preferably, the mutation is introduced using targeted genome modification, preferably ZFNs, TALENs or CRISPR/Cas9 or mutagenesis, preferably TILLING or T-DNA insertion. Alternatively, the method comprises using RNA interference to reduce or abolish the expression of an OML4 nucleic acid sequence or a GSK2 nucleic acid sequence.
In another aspect of the invention, there is provided a genetically modified plant, plant cell or part thereof characterised by reduced or abolished expression of OML4. Preferably, the plant comprises at least one mutation in at least one nucleic acid sequence encoding an OML4 gene and/or at least one mutation in the promoter of OML4. Most preferably the plant part is a seed or grain (such terms can be used interchangeably). Also provided are progeny plants obtained or obtainable from the seeds, as well as seeds obtained from said progeny plants.
In another embodiment, the plant further comprises at least one mutation in at least one nucleic acid sequence encoding GSK2 and/or at least one mutation into the promoter of GSK2. Preferably, the mutation is a loss of function or partial loss of function mutation.
In an alternative embodiment, the plant comprises an RNA interference construct that reduces or abolishes the expression of OML4.
In another aspect of the invention, there is provided a method of producing a plant with increased grain size and/or weight, the method comprising introducing at least one mutation into at least one nucleic acid sequence encoding an OML4 polypeptide and/or at least one mutation into the promoter of OML4. In one embodiment, the method further comprises introducing at least one mutation into at least one nucleic acid sequence encoding a GSK2 polypeptide and/or at least one mutation into the promoter of GSK2. Preferably, the mutation is a loss of function or partial loss of function mutation.
According to any aspect of the invention, in one embodiment, the OML4 nucleic acid sequence encodes a polypeptide comprising SEQ ID NO: 1 or a functional variant or homolog thereof, and preferably the nucleic acid sequence encoding OML4 comprises a nucleic acid sequence as defined in SEQ ID NO: 2. In another embodiment, the promoter of OML4 comprises a sequence as defined in SEQ ID NO: 3 or a functional variant or homolog thereof.
In a further embodiment, the GSK2 nucleic acid sequence encodes a polypeptide as defined in SEQ ID NO: 4 or a functional variant or homolog thereof, and preferably, the GSK2 nucleic acid sequence comprises a nucleic acid sequence as defined in SEQ ID NO: 5 or a functional variant or homolog thereof. In another embodiment, the GSK2 promoter comprises a nucleic acid sequence as defined in SEQ ID NO: 6 or a functional variant or homolog thereof.
In one embodiment of any of the above described methods, the mutation is introduced using targeted genome modification, preferably ZFNs, TALENs or CRISPR/Cas9, or the mutation is introduced using mutagenesis, preferably TILLING or T-DNA insertion.
According to any aspect of the invention, in one embodiment, the plant is a crop plant. Preferably, the plant is selected from rice, wheat, maize, soybean and brassicas.
DESCRIPTION OF THE FIGURES
The invention is further described in the following non-limiting figures:
Figure 1 shows that LARGE1 influences grain size and plant morphology. (A, B) ZHJ and large1-1 grains. (C, D) ZHJ and large1-1 plants. (E) ZHJ (left) and large1-1 (right) panicles. (F, G) Grain length and width of ZHJ and large1-1. (H) 1000-grain weight of ZHJ and large1-1. (I) Plant height of ZHJ and large1-1. (J) Panicle length of ZHJ and large1-1. (K) The number of ZHJ and large1-1 primary panicle branches. (L) The number of ZHJ and large1-1 secondary panicle branches. Values in F-H are given as means ± SD (n ≥ 50). Values in I-L are given as means ± SD (n = 20). Asterisks indicate significant differences between ZHJ and large1-1. **P<0.01 compared with the wild type (ZHJ) by Student’s t-test. Bars: 2 mm in A and B; 10 cm in C-E.
Figure 2 shows that large1-1 forms large grains due to increased cell expansion in the spikelet hull. (A, B) SEM analysis of the outer surface of ZHJ (A) and large1-1 (B) lemmas. (C, D) SEM analysis of the inner surface of ZHJ (C) and large1-1 (D) lemmas. (E, F) The average length (E) and width (F) of outer epidermal cells in ZHJ and large1-1 lemmas. (G) Outer epidermal cell number in the longitudinal direction in ZHJ and large1-1 lemmas. (H) Outer epidermal cell number in the transverse direction in ZHJ and large1-1 lemmas. (I, J) The average length (I) and width (J) of inner epidermal cells in the longitudinal direction in ZHJ and large1-1 lemmas. Values in E-J are given as the means ± SD (n ≥ 50). **P<0.01 compared with the wild type by Student’s t-test. Bars: 50 µm in A-D.
Figure 3 shows that LARGE1 encodes the Mei2-like protein OML4. (A) The LARGE1/OML4 gene structure. The coding sequence is shown as black boxes, and introns are indicated by black lines. ATG and TGA represent the start codon and the stop codon, respectively. (B) OML4 and the mutated protein encoded by large1-1. The OML4 protein contains three RNA recognition motif (RRM) domains. The mutation results in a premature termination codon in OML4, causing a truncated protein. (C) The dCAPS1 marker was developed according to the large1-1 mutation. The PCR products were digested by the restriction enzyme Hph I. (D, E) Mature paddy (D) and brown (E) rice grains of ZHJ, large1-1, gLARGE1;large1-1 #1 and gLARGE1;large1-1 #2. (F, G) Grain length (F) and width (G) of ZHJ, large1-1, gLARGE1;large1-1 #1 and gLARGE1;large1-1 #2. Asterisks indicate significant differences between ZHJ and large1-1. **P<0.01 compared with the wild type by Student’s t-test. (H) The relative OML4 gene expression level in young panicles of 1 cm (YP1) to 15 cm (YP15) in ZHJ. Values are given as means ± SD. Three biological replicates were used (n = 3). (I) OML4 expression activity was monitored by proOML4::GUS transgene expression. Histochemical analysis of GUS activity in panicles at different developmental stages. (J, K) Mature paddy (J) and brown (K) rice grains of ZHJ, large1-1 and gLARGE1-GFP;large1-1 #1. (L-O) Subcellular location of OML4-GFP in gLARGE1-GFP;large1-1 #1 root cells. GFP fluorescence of GFP-OML4 (L), DAPI staining (M), DIC (N) and merged (O) images are shown. Bars: 2 mm in D, E, J and K; 1 cm in I; 10 µm in L-O.
Figure 4 shows that overexpression of OML4 results in smaller grains. (A, B) ZHJ and proActin:OML4 grains. (C, D) Grain length and width of ZHJ and proActin:OML4 transgenic lines. (E) 1000-grain weight of ZHJ and proActin:OML4 transgenic lines. (F) Expression level of OML4 in ZHJ and proActin:OML4 transgenic lines. Three biological replicates were used (n = 3). ACTIN1 was used to normalize expression. (G) ZHJ and proActin:OML4 plants. (H) Plant height of ZHJ and proActin:OML4 transgenic lines. (I) ZHJ and proActin:OML4 panicles. (J) Panicle length of ZHJ and proActin:OML4 transgenic lines. (K, L) The primary and secondary panicle branch number of ZHJ and proActin:OML4 transgenic lines. (M) Total grain number per panicle of ZHJ and proActin:OML4 transgenic lines. (N, O) SEM analysis of the outer surface of ZHJ (N) and proActin:OML4 #1 (O) lemmas. (P, Q) The average length and width of outer epidermal cells in the longitudinal direction in ZHJ and proActin:OML4 #1 lemmas. (R, S) The number of outer epidermal cells in the longitudinal and transverse direction in ZHJ and proActin:OML4 #1 lemmas. Values in C-E and P-S are given as the means ± SD (n ≥ 50). Values in F are given as the mean ± SD. Values in H and J-M are given as the means ± SD (n = 20). Asterisks indicate significant differences between ZHJ and proActin:OML4 transgenic lines. *P<0.05; **P<0.01 compared with the wild type by Student’s t-test. Bars: 2 mm in A and B; 10 cm in G and I; 50 µm in N and O.
Figure 5 shows that OML4 physically interacts with GSK2 in vitro and in vivo. (A) OML4 interacts with GSK2 in yeast cells. Yeast cells were cultured on SD/-Trp-Leu or SD/-Trp-Leu-His-Ade media. (B) OML4 associates with GSK2 in N. benthamiana. OML4-nLUC and GSK2-cLUC were co-expressed in N. benthamiana leaves. Luciferase activity was observed 48 hours after infiltration. The range of luminescence intensity was scaled by the pseudocolor bar. (C) Bimolecular fluorescence complementation (BiFC) assays show that OML4 interacts with GSK2 in N. benthamiana. OML4-cYFP was co-expressed with GSK2-nYFP in leaves of N. benthamiana. (D) OML4 binds GSK2 in vitro. GSK2-GST was incubated with OML4-MBP and pulled down by OML4-MBP and detected by immunoblot with anti-GST antibody. IB: immunoblot. (E) Interaction between OML4 and GSK2 in the Co-IP assays. Anti-MYC beads were used to immunoprecipitate GSK2-GFP proteins. Gel blots were probed with anti-MYC or anti-GFP antibody. Bars: 50 µm in C.
Figure 6 shows that GSK2 is required for the phosphorylation of OML4. (A) GSK2 phosphorylates OML4 in vitro. The phosphorylated OML4-FLAG, nOML4-FLAG (the N-terminal of OML4) and cOML4-FLAG (the C-terminal of OML4) were separated by phos-tag SDS-PAGE. The phosphorylated protein was marked with the red vertical line. (B) Detection of phosphorylation sites of OML4 by LC-MS/MS after in vitro phosphorylation reaction. OML4 contains 1001 residues. The phosphorylated residues detected by LC-MS/MS are shown in red. Two important residues, shown underlined, were substituted with phospho-dead residues. (C) S(105) and S(607) partially influence the phosphorylation of OML4. The phosphorylated nOML4-FLAG, nOML4(S105A)-FLAG, cOML4-FLAG and cOML4(S607A)-FLAG were separated by phos-tag SDS-PAGE. The phosphorylated protein was marked with the red vertical line. (D) S(105) and S(607) partially influence the phosphorylation of OML4. The phosphorylated OML4-MBP, OML4(S105A, S607A)-MBP and GSK2-GST were separated by phos-tag SDS-PAGE. The phosphorylated protein was marked with the red vertical line. (E) GSK2 influences the abundance of OML4. GSK2-GFP and OML4-MYC were co-expressed in tobacco leaves and protein levels were detected by western blotting. This result was repeated three times. (F) S(105) and S(607) partially influence the abundance of OML4. GSK2-GFP and OML4-MYC or OML4(S105A, S607A)-MYC were co-expressed in tobacco leaves and protein levels were detected by western blotting. This result was repeated three times.
Figure 7 shows that GSK2 acts genetically with OML4 to regulate seed size. (A, B) ZHJ and GSK2-RNAi grains. (C) Expression level of GSK2 in ZHJ and GSK2-RNAi transgenic lines. Three biological replicates were used (n = 3). ACTIN1 was used to normalize expression. (D, E) Grain length (D) and width (E) of ZHJ and GSK2-RNAi transgenic lines. (F) 1000-grain weight of ZHJ and GSK2-RNAi transgenic lines. (G, H) SEM analysis of the outer surface of ZHJ (G) and GSK2-RNAi #1 (H) lemmas. (I, J) The average length and width of outer epidermal cells in the longitudinal direction in ZHJ and GSK2-RNAi #1 lemmas. (K) Grains of ZHJ, large1-1, GSK2-RNAi#1 and large1-1;GSK2-RNAi#1. (L) Grain length of ZHJ, large1-1, GSK2-RNAi#1 and large1-1;GSK2-RNAi#1. Values in D-F, I-J and L are given as the means ± SD (n ≥ 50). *P<0.05; **P<0.01 compared with the wild type by Student’s t-test. Bars: 2 mm in A, B and K; 50 µm in G and H.
Figure 8 shows the expression level of the indicated genes in ZHJ and large1-1 panicles. ACTIN1 was used to normalize expression. Values are means ± SD relative to the ZHJ value set at 1. Three biological replicates were used (n = 3). *P<0.05; **P<0.01 compared with the wild type by Student’s t-test.
Figure 9 shows the CDS and protein sequence of OML4. (A) The full-length cDNA sequence of OML4. The sequence deleted in large1-1 in the OML4 gene is shown in red. (B) The amino acid sequence of OML4. (C) The amino acid sequence of large1-1.
Figure 10 shows the plant height, panicle size and grain number per panicle of gLARGE1;large1-1. (A) Plants of ZHJ, large1-1, gLARGE1;large1-1 #1 and gLARGE1;large1-1 #2. (B) Phenotypes of ZHJ (left), large1-1 (middle) and gLARGE1;large1-1 #1 (right) panicles. (C) Plant height of ZHJ, large1-1 and gLARGE1;large1-1 #1. (D) Panicle length of ZHJ, large1-1 and gLARGE1;large1-1 #1. (E) The number of ZHJ, large1-1 and gLARGE1;large1-1 #1 primary panicle branches. (F) 1000-grain weight of ZHJ, large1-1 and gLARGE1;large1-1 #1. Values in C-E are given as the means ± SD (n = 20). Values in F are given as the mean ± SD (n = 100). Asterisks indicate significant differences between ZHJ and large1-1 or ZHJ and gLARGE1;large1-1 #1. **P<0.01 compared with the wild type by Student’s t-test. Bars: 10 cm in A and B.
Figure 11 shows the structural features and phylogenetic tree of OML4. (A) Amino acid sequence alignment of MEI2-LIKE proteins in rice. The three conserved RNA Recognition Motifs (RRMs) are marked. (B) Phylogenetic tree of MEI2-LIKE proteins in rice and Arabidopsis. OML1, OML2, OML3, OML4 and OML5 are from O. sativa; TE1 and LOC103653544 (MEI2-LIKE protein 1) are from Z. mays; AML1, AML2, AML3, AML4 and AML5 are from Arabidopsis. The multiple sequence alignment and construction of the phylogenetic tree were performed with MEGA7 using the neighbor-joining method with 100 bootstrap replicates.
Figure 12 shows the identification of the large1-1 mutation. CHR, chromosome; POS, position in chromosome. Whole-genome sequencing reveals one deletion in the LOC_Os02g31290 gene, which has a SNP/INDEL-index = 1.
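
The SNP/INDEL-index referred to for Figure 12 can be illustrated with a minimal sketch. It assumes the usual MutMap definition, in which the index at a position is the fraction of reads from the pooled F2 sample that carry the mutant allele, so that an index of 1 means every read supports the mutation. The variant names and read counts below are invented for illustration and are not data from this application.

```python
def snp_index(alt_reads: int, total_reads: int) -> float:
    """Fraction of pooled reads carrying the mutant allele (assumed definition)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


# Hypothetical (mutant-allele reads, total reads) at two variant positions.
candidates = {
    "linked_indel": (30, 30),   # all reads carry the mutant allele
    "unlinked_snp": (14, 30),   # segregates roughly 1:1 in the pool
}

# Variants with an index of exactly 1 are candidates for the causal mutation.
causal = [name for name, (alt, total) in candidates.items()
          if snp_index(alt, total) == 1.0]
print(causal)
```

In a pool of plants that are all homozygous for a recessive mutation, variants linked to the causal site are expected to approach an index of 1, while unlinked variants stay near 0.5.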
DETAILED DESCRIPTION OF THE INVENTION
The present invention will now be further described. In the following passages, different aspects of the invention are defined in more detail. Each aspect so defined may be combined with any other aspect or aspects unless clearly indicated to the contrary. In particular, any feature indicated as being preferred or advantageous may be combined with any other feature or features indicated as being preferred or advantageous.
The practice of the present invention will employ, unless otherwise indicated, conventional techniques of botany, microbiology, tissue culture, molecular biology, chemistry, biochemistry and recombinant DNA technology, bioinformatics, which are within the skill of the art. Such techniques are explained fully in the literature.
As used herein, the words "nucleic acid", "nucleic acid sequence", "nucleotide", "nucleic acid molecule" or "polynucleotide" are intended to include DNA molecules (e.g., cDNA or genomic DNA), RNA molecules (e.g., mRNA), naturally occurring, mutated, synthetic DNA or RNA molecules, and analogs of the DNA or RNA generated using nucleotide analogs. It can be single-stranded or double-stranded. Such nucleic acids or polynucleotides include, but are not limited to, coding sequences of structural genes, anti-sense sequences, and non-coding regulatory sequences that do not encode mRNAs or protein products. These terms also encompass a gene. The term "gene" or "gene sequence" is used broadly to refer to a DNA nucleic acid associated with a biological function. Thus, genes may include introns and exons as in the genomic sequence, or may comprise only a coding sequence as in cDNAs, and/or may include cDNAs in combination with regulatory sequences.
The terms "polypeptide" and "protein" are used interchangeably herein and refer to amino acids in a polymeric form of any length, linked together by peptide bonds. The aspects of the invention involve recombinant DNA technology and exclude embodiments that are solely based on generating plants by traditional breeding methods.
Methods of increasing grain size and/or weight
In a first aspect of the invention, there is provided a method of increasing grain size and/or weight in a plant, wherein the method comprises reducing or abolishing the expression and/or activity of Mei2-Like protein 4 (OML4).
In one embodiment, an “increase” in grain size and/or weight may comprise an increase of at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45% or 50% compared to the grain size and/or weight in a wild-type or control plant. In one embodiment, the increase may be between 5 and 30% and even more preferably between 10 and 25% compared to the grain size and/or weight in a wild-type or control plant. In one embodiment grain size may comprise one of grain length and/or grain width. In a further embodiment, the grain weight may comprise thousand-grain weight. Any of the above can be measured using standard techniques in the art.
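
As a worked illustration of the thresholds above, the percentage increase relative to a control can be computed as (mutant value minus control value) divided by the control value, times 100. The sketch below uses hypothetical thousand-grain weights, not measurements from this application, and the helper names are illustrative.

```python
def percent_increase(control: float, mutant: float) -> float:
    """Percentage change of `mutant` relative to `control`."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (mutant - control) / control * 100.0


def in_range(increase_pct: float, low: float, high: float) -> bool:
    """Check whether an increase falls inside an inclusive percentage range."""
    return low <= increase_pct <= high


# Hypothetical thousand-grain weights in grams (illustrative values only).
control_tgw = 25.0
mutant_tgw = 30.0

increase = percent_increase(control_tgw, mutant_tgw)
print(round(increase, 2))              # percentage increase over the control
print(in_range(increase, 5.0, 30.0))   # within the 5-30% range
print(in_range(increase, 10.0, 25.0))  # within the preferred 10-25% range
```

The same calculation applies to grain length or grain width when those are used as the measure of grain size.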
In a further aspect of the invention, there is provided a method of increasing yield, the method comprising reducing or abolishing the expression or activity of the OML4 gene. The term "yield" in general means a measurable produce of economic value, typically related to a specified crop, to an area, and to a period of time. Individual plant parts directly contribute to yield based on their number, size and/or weight. The actual yield is the yield per square metre for a crop and year, which is determined by dividing total production (including both harvested and appraised production) by planted square metres.
In one example, yield is increased by at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% compared to a control or wild-type plant. In a preferred embodiment, yield is increased by at least 10%, and even more preferably between 10 and 60% compared to a control or wild-type plant.
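
The definition of actual yield given above can be written out as a short sketch: total production, taken as the sum of harvested and appraised production, divided by the planted area. The function name, the choice of kilograms as the unit and the figures are assumptions made purely for illustration.

```python
def actual_yield(harvested_kg: float, appraised_kg: float,
                 planted_area_m2: float) -> float:
    """Yield per square metre: (harvested + appraised production) / planted area."""
    if planted_area_m2 <= 0:
        raise ValueError("planted area must be positive")
    return (harvested_kg + appraised_kg) / planted_area_m2


# Hypothetical plot: 550 kg harvested, 50 kg appraised, 1000 square metres planted.
print(actual_yield(550.0, 50.0, 1000.0))  # kg per square metre
```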
In a further aspect of the invention, the method further comprises reducing or abolishing the expression or activity of SHAGGY-like kinase (GSK2). In one embodiment the method comprises introducing at least one mutation into OML4. In a further embodiment, the method comprises introducing at least one mutation into OML4 and at least one mutation into GSK2.
“By at least one mutation” is meant that where the OML4 or GSK2 gene is present as more than one copy or homoeologue (with the same or slightly different sequence) there is at least one mutation in at least one gene. Preferably all genes are mutated in OML4 and/or GSK2.
The term “reducing” means a decrease in the levels of OML4 or GSK2 expression and/or activity by up to 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% or 90% when compared to the level in a wild-type or control plant. The term “abolish” expression means that no expression of OML4 or GSK2 polypeptide is detectable or that no functional OML4 or GSK2 polypeptide is produced. Methods for determining the level of OML4 or GSK2 polypeptide expression and/or activity would be well known to the skilled person. These reductions can be measured by any standard technique known to the skilled person. For example, a reduction in the expression and/or content levels of at least OML4 or GSK2 may be determined as a measure of protein and/or nucleic acid levels and can be measured by any technique known to the skilled person, such as, but not limited to, any form of gel electrophoresis or chromatography (e.g. HPLC).
In one embodiment, the method comprises introducing at least one mutation into the, preferably endogenous, gene encoding OML4 and/or the OML4 promoter. In another embodiment, the method comprises introducing a further mutation into the, preferably endogenous, gene encoding GSK2 and/or the GSK2 promoter. Preferably, said mutation is in the coding region of the OML4 or the GSK2 gene. In a further embodiment, at least one mutation or structural alteration may be introduced into the OML4 or GSK2 promoter such that the OML4 or GSK2 gene is either not expressed (i.e. expression is abolished) or expression is reduced, as defined herein. In an alternative embodiment, at least one mutation may be introduced into the OML4 or GSK2 gene such that the altered gene does not express a full-length (i.e. expresses a truncated) OML4 or GSK2 protein or does not express a fully functional OML4 or GSK2 protein. In this manner, the activity of the OML4 or GSK2 polypeptide can be considered to be reduced or abolished as described herein. In any case, the mutation may result in the expression of OML4 or GSK2 with no, significantly reduced or altered biological activity in vivo. Alternatively, OML4 or GSK2 may not be expressed at all.
In one embodiment, the sequence of the OML4 gene comprises or consists of a nucleic acid sequence as defined in SEQ ID NO: 2 (genomic) or a functional variant or homologue thereof and encodes a polypeptide as defined in SEQ ID NO: 1 or a functional variant or homologue thereof.
By “OML4 promoter” is meant a region extending for at least 2000-2500bp, preferably 2049bp upstream of the ATG codon of the OML4 ORF (open reading frame). In one embodiment, the sequence of the OML4 promoter comprises or consists of a nucleic acid sequence as defined in SEQ ID NO: 3 or a functional variant or homologue thereof. Similarly, by “GSK2 promoter” is meant a region extending at least 200-300bp, preferably 247bp upstream of the ATG codon of the GSK2 ORF (open reading frame). In one embodiment, the sequence of the GSK2 promoter comprises or consists of a nucleic acid sequence as defined in SEQ ID NO: 6 or a functional variant or homologue thereof.
In the above embodiments an ‘endogenous’ nucleic acid may refer to the native or natural sequence in the plant genome. In one embodiment, the endogenous sequence of the OML4 gene comprises SEQ ID NO: 2 and encodes an amino acid sequence as defined in SEQ ID NO: 1 or homologs thereof. Also included in the scope of this invention are functional variants (as defined herein) and homologs of the above identified sequences. Examples of OML4 homologs are shown in SEQ ID NOs: 7-9, 13-15, 19-21 and 25-27. Accordingly, in one embodiment, the homolog encodes a polypeptide selected from SEQ ID NOs: 7, 13, 19 or 25 or the homolog comprises or consists of a nucleic acid sequence selected from SEQ ID NOs: 8, 14, 20 or 26. In a further embodiment, the endogenous sequence of the GSK2 gene comprises SEQ ID NO: 5 and encodes an amino acid sequence as defined in SEQ ID NO: 4 or homologs thereof. Also included in the scope of this invention are functional variants (as defined herein) and homologs of the above identified sequences. Examples of GSK2 homologs are shown in SEQ ID NOs: 10-12, 16-18, 22-24 and 28-30. Accordingly, in one embodiment, the homolog encodes a polypeptide selected from SEQ ID NOs: 10, 16, 22 or 28 or the homolog comprises or consists of a nucleic acid sequence selected from SEQ ID NOs: 11, 17, 23 or 29.
The term “functional variant of a nucleic acid sequence” as used herein with reference to any SEQ ID described herein refers to a variant gene sequence or part of the gene sequence which retains the biological function of the full non-variant sequence. A functional variant also comprises a variant of the gene of interest which has sequence alterations that do not affect function, for example in non-conserved residues. Also encompassed is a variant that is substantially identical, i.e. has only some sequence variations, for example in non-conserved residues, compared to the wild type sequences as shown herein and is biologically active.
Alterations in a nucleic acid sequence which result in the production of a different amino acid at a given site that do not affect the functional properties of the encoded polypeptide are well known in the art. For example, a codon for the amino acid alanine, a hydrophobic amino acid, may be substituted by a codon encoding another less hydrophobic residue, such as glycine, or a more hydrophobic residue, such as valine, leucine, or isoleucine. Similarly, changes which result in substitution of one negatively charged residue for another, such as aspartic acid for glutamic acid, or one positively charged residue for another, such as lysine for arginine, can also be expected to produce a functionally equivalent product. Nucleotide changes which result in alteration of the N-terminal and C-terminal portions of the polypeptide molecule would also not be expected to alter the activity of the polypeptide. Each of the proposed modifications is well within the routine skill in the art, as is determination of retention of biological activity of the encoded products.
In one embodiment, a functional variant has at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to the non-variant nucleic acid or amino acid sequence.
The term homolog, as used herein, also designates a OML4 or GSK2 promoter or OML4 or GSK2 gene orthologue from other plant species. A homolog may have, in increasing order of preference, at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to the amino acid represented by any of SEQ ID NO: 1 or 4 or to the nucleic acid sequences as shown in SEQ ID NO: 2 or 5. Functional variants of OML4 homologs as defined above are also within the scope of the invention.
The “OML4” or “LARGE1” gene (such terms are used interchangeably herein) encodes a Mei2-like protein, OML4. This protein is characterised by three RNA recognition motifs or RRMs.
In one embodiment, the sequence of the RRMs is selected from:
SRTLFVRNINSNVEDSELKLLFEHFGDIRALYTACKHRGFVMISYYDIRSALNAKMELQNKALRRRKLDIHYSIPKD: SEQ ID NO: 37;
QGTIVLFNVDLSLTNDDLHKIFGDYGEIKEIRDTPQKGHHKIIEFYDVRAAEAALRALNRNDIAGKKIKLE: SEQ ID NO: 38; and
LMIKNIPNKYTSKMLLAAIDENHKGTYDFIYLPIDFKNKCNVGYAFINMTNPQHIIPFYQTFNGKKWEKFNSEKVASLAYARIQGK: SEQ ID NO: 39.
Accordingly, in one embodiment, the OML4 nucleic acid (coding) sequence encodes a OML4 protein comprising at least one RRM motif, preferably all three motifs as defined above, or a variant thereof, wherein the variant has at least 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to at least one of SEQ ID NOs: 37 to 39 as defined herein.
The “GSK2” gene (SHAGGY-like kinase) encodes a serine/threonine kinase, which is an ortholog of BIN2, and is involved in brassinosteroid (BR) signalling.
Two nucleic acid sequences or polypeptides are said to be "identical" if the sequence of nucleotides or amino acid residues, respectively, in the two sequences is the same when aligned for maximum correspondence as described below. The terms "identical" or percent "identity," in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are the same or have a specified percentage of amino acid residues or nucleotides that are the same, when compared and aligned for maximum correspondence over a comparison window, as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection. When percentage of sequence identity is used in reference to proteins or peptides, it is recognised that residue positions that are not identical often differ by conservative amino acid substitutions, where amino acids residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not change the functional properties of the molecule. Where sequences differ in conservative substitutions, the percent sequence identity may be adjusted upwards to correct for the conservative nature of the substitution. Means for making this adjustment are well known to those of skill in the art. For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters. 
Non-limiting examples of algorithms that are suitable for determining percent sequence identity and sequence similarity are the BLAST and BLAST 2.0 algorithms.
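To illustrate what a percent identity figure means, the sketch below counts identical residues across the columns of an alignment that has already been computed. It is not an implementation of BLAST: it assumes the two input strings are pre-aligned, of equal length, and use "-" for gaps, and it divides by the number of alignment columns, whereas alignment tools differ in the denominator they use. The variant peptide is hypothetical; the reference is the first ten residues of SEQ ID NO: 37.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns in which both sequences carry a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


rrm1_start = "SRTLFVRNIN"            # first ten residues of SEQ ID NO: 37
hypothetical_variant = "SRTLFVKNIN"  # invented R-to-K substitution
identity = percent_identity(rrm1_start, hypothetical_variant)
print(f"{identity:.1f}% identity")
```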
Suitable homologues can be identified by sequence comparisons and identifications of conserved domains. There are predictors in the art that can be used to identify such sequences. The function of the homologue can be identified as described herein and a skilled person would thus be able to confirm the function, for example when overexpressed in a plant.
Thus, the nucleotide sequences of the invention and described herein can also be used to isolate corresponding sequences from other organisms, particularly other plants, for example crop plants. In this manner, methods such as PCR, hybridization, and the like can be used to identify such sequences based on their sequence homology to the sequences described herein. Topology of the sequences and the characteristic domain structure can also be considered when identifying and isolating homologs. Sequences may be isolated based on their sequence identity to the entire sequence or to fragments thereof. In hybridization techniques, all or part of a known nucleotide sequence is used as a probe that selectively hybridizes to other corresponding nucleotide sequences present in a population of cloned genomic DNA fragments or cDNA fragments (i.e., genomic or cDNA libraries) from a chosen plant. The hybridization probes may be genomic DNA fragments, cDNA fragments, RNA fragments, or other oligonucleotides, and may be labelled with a detectable group, or any other detectable marker. Methods for preparation of probes for hybridization and for construction of cDNA and genomic libraries are generally known in the art and are disclosed in Sambrook, et al., (1989) Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, New York).
Hybridization of such sequences may be carried out under stringent conditions. By "stringent conditions" or "stringent hybridization conditions" is intended conditions under which a probe will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are sequence dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences that are 100% complementary to the probe can be identified (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, preferably less than 500 nucleotides in length. Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Duration of hybridization is generally less than about 24 hours, usually about 4 to 12. Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
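The typical numerical bounds given above for stringent conditions can be summarised as a simple check. This is a sketch of the stated thresholds only; the function name and parameters are assumptions, probes of 50 nucleotides or fewer are treated as short, and real hybridization design involves further factors (such as formamide) that are not modelled here.

```python
def meets_typical_stringent_conditions(na_molar: float,
                                       ph: float,
                                       temp_c: float,
                                       probe_len_nt: int) -> bool:
    """Check salt, pH and temperature against the typical bounds in the text."""
    # At least about 30 C for short probes, at least about 60 C for long probes.
    min_temp_c = 30.0 if probe_len_nt <= 50 else 60.0
    salt_ok = na_molar < 1.5
    ph_ok = 7.0 <= ph <= 8.3
    temp_ok = temp_c >= min_temp_c
    return salt_ok and ph_ok and temp_ok


# Hypothetical conditions for a 200-nucleotide probe.
print(meets_typical_stringent_conditions(1.0, 7.5, 65.0, 200))
print(meets_typical_stringent_conditions(1.0, 7.5, 42.0, 200))
```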
In a further embodiment, a variant as used herein can comprise a nucleic acid sequence encoding a OML4 or a GSK2 polypeptide as defined herein that is capable of hybridising under stringent conditions as defined herein to a nucleic acid sequence as defined in SEQ ID NO: 2 or 5 respectively.
In one embodiment, there is provided a method of increasing grain size and/or weight in a plant, as described herein, the method comprising reducing or abolishing the expression of at least one nucleic acid encoding a OML4 polypeptide, as described herein, wherein the method comprises introducing at least one mutation into at least one OML4 gene and/or promoter, wherein the OML4 gene comprises or consists of
a. a nucleic acid sequence encoding a polypeptide as defined in SEQ ID NO: 1; or
b. a nucleic acid sequence as defined in SEQ ID NO: 2; or
c. a nucleic acid sequence with at least 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to either (a) or (b); or
d. a nucleic acid sequence encoding a OML4 polypeptide as defined herein that is capable of hybridising under stringent conditions as defined herein to the nucleic acid sequence of any of (a) to (c);
and wherein the OML4 promoter comprises or consists of
e. a nucleic acid sequence as defined in SEQ ID NO: 3; or
f. a nucleic acid sequence with at least 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or at least 99% overall sequence identity to (e); or
g. a nucleic acid sequence capable of hybridising under stringent conditions as defined herein to the nucleic acid sequence of any of (e) to (f).
In a preferred embodiment, the mutation that is introduced into the endogenous OML4 gene or promoter or the GSK2 gene or promoter thereof to silence, reduce, or inhibit the biological activity and/or expression levels of the OML4 or GSK2 gene or protein can be selected from the following mutation types:
1. a "missense mutation", which is a change in the nucleic acid sequence that results in the substitution of an amino acid for another amino acid;
2. a "nonsense mutation" or "STOP codon mutation", which is a change in the nucleic acid sequence that results in the introduction of a premature STOP codon and, thus, the termination of translation (resulting in a truncated protein); plant genes contain the translation stop codons "TGA" (UGA in RNA), "TAA" (UAA in RNA) and "TAG" (UAG in RNA); thus any nucleotide substitution, insertion, deletion which results in one of these codons to be in the mature mRNA being translated (in the reading frame) will terminate translation.
3. an "insertion mutation" of one or more amino acids, due to one or more codons having been added in the coding sequence of the nucleic acid;
4. a "deletion mutation" of one or more amino acids, due to one or more codons having been deleted in the coding sequence of the nucleic acid;
5. a "frameshift mutation", resulting in the nucleic acid sequence being translated in a different frame downstream of the mutation. A frameshift mutation can have various causes, such as the insertion, deletion or duplication of one or more nucleotides.
6. a “splice site” mutation, which is a mutation that results in the insertion, deletion or substitution of a nucleotide at the site of splicing.
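Two of the mutation types listed above, the nonsense mutation and the frameshift mutation, can be recognised by simple inspection of a coding sequence. The sketch below assumes a DNA coding sequence that starts in frame at the first nucleotide; the function names and the short reference sequence are hypothetical and are not taken from SEQ ID NO: 2 or 5.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon_index(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon."""
    cds = cds.upper()
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def has_premature_stop(ref_cds: str, mut_cds: str) -> bool:
    """True if the mutant reaches an in-frame stop earlier than the reference."""
    ref_stop = first_stop_codon_index(ref_cds)
    mut_stop = first_stop_codon_index(mut_cds)
    return mut_stop is not None and (ref_stop is None or mut_stop < ref_stop)


def is_frameshift(ref_cds: str, mut_cds: str) -> bool:
    """True if the net length change is not a multiple of three nucleotides."""
    return (len(mut_cds) - len(ref_cds)) % 3 != 0


reference = "ATGGCTCAAGGATTTTAA"   # hypothetical CDS: ATG GCT CAA GGA TTT TAA
nonsense = "ATGGCTTAAGGATTTTAA"    # CAA changed to TAA (premature stop)
one_nt_deletion = "ATGCTCAAGGATTTTAA"  # one nucleotide removed after ATG

print(has_premature_stop(reference, nonsense))
print(is_frameshift(reference, one_nt_deletion))
```

A length test of this kind only detects net insertions or deletions; it does not identify where in the sequence the change occurred.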
As used herein, a “deletion” may refer to the deletion of at least one nucleotide. In one embodiment, said deletion may be between 1 and 20 base pairs. In a preferred embodiment, the at least one mutation is a deletion of at least one nucleotide.
In general, the skilled person will understand that at least one mutation as defined above and which leads to the insertion, deletion or substitution of at least one nucleic acid or amino acid compared to the wild-type OML4 or GSK2 promoter or OML4 or GSK2 nucleic acid or protein sequence can affect the biological activity of the OML4 protein or GSK2 protein respectively.
In one embodiment, the mutation is a loss of function mutation such as a premature stop codon, or an amino acid change in a highly conserved region that is predicted to be important for protein structure.
In one embodiment, the mutation may be introduced into at least one RRM as defined herein of the OML4 gene. In an alternative or further embodiment, the mutation may be a substitution or deletion of a phosphorylation site in OML4. In one embodiment, the mutation may be at position S105, S146 and/or S607 of SEQ ID NO: 1 or a homologous position in a homologous sequence. Preferably, the mutation prevents the phosphorylation of OML4 at one or more of these sites. As described in the examples, preventing phosphorylation (by GSK2) of OML4 at one or more of these sites reduces the protein levels of OML4. In another embodiment, the mutation is introduced into the OML4 or GSK2 promoter and is at least the deletion and/or insertion of at least one nucleic acid. Other major changes such as deletions that remove functional regions of the promoter are also included as these will reduce the expression of OML4 and GSK2.
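A substitution at a phosphorylation site can be modelled on a protein sequence as shown in the following sketch. The choice of alanine as the replacement residue is an assumption made for illustration (the text above only requires a substitution or deletion), and the short peptide and its positions are hypothetical rather than SEQ ID NO: 1; positions are 1-based, as in S105, S146 and S607.

```python
from typing import Iterable


def substitute_serines(protein: str,
                       positions: Iterable[int],
                       replacement: str = "A") -> str:
    """Replace serine residues at the given 1-based positions."""
    residues = list(protein.upper())
    for pos in positions:
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != "S":
            raise ValueError(f"residue {pos} is {residues[pos - 1]}, not serine")
        residues[pos - 1] = replacement
    return "".join(residues)


toy_peptide = "MASKSTS"  # hypothetical peptide: serines at positions 3, 5 and 7
mutated = substitute_serines(toy_peptide, [3, 5])
print(mutated)
```

Checking that the targeted residue is a serine before replacing it guards against numbering errors when the positions are transferred to a homologous sequence.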
In one embodiment at least one mutation may be introduced into the OML4 promoter and at least one mutation is introduced into the OML4 gene. In a further embodiment, at least one mutation may also be introduced into the GSK2 gene and at least one mutation is introduced into the GSK2 promoter.
In one embodiment, the mutation is introduced using mutagenesis or targeted genome editing. That is, in one embodiment, the invention relates to a method and plant that has been generated by genetic engineering methods as described above, and does not encompass naturally occurring varieties.
Targeted genome modification or targeted genome editing is a genome engineering technique that uses targeted DNA double-strand breaks (DSBs) to stimulate genome editing through homologous recombination (HR)-mediated recombination events. To achieve effective genome editing via introduction of site-specific DNA DSBs, four major classes of customisable DNA binding proteins can be used: meganucleases derived from microbial mobile genetic elements, zinc finger (ZF) nucleases based on eukaryotic transcription factors, transcription activator-like effectors (TALEs) from Xanthomonas bacteria, and the RNA-guided DNA endonuclease Cas9 from the type II bacterial adaptive immune system CRISPR (clustered regularly interspaced short palindromic repeats). Meganuclease, ZF, and TALE proteins all recognize specific DNA sequences through protein-DNA interactions. Although meganucleases integrate nuclease and DNA-binding domains, ZF and TALE proteins consist of individual modules targeting 3 or 1 nucleotides (nt) of DNA, respectively. ZFs and TALEs can be assembled in desired combinations and attached to the nuclease domain of FokI to direct nucleolytic activity toward specific genomic loci.
In a preferred embodiment, the genome editing method that can be used according to the various aspects of the invention is CRISPR. The use of this technology in genome editing is well described in the art, for example in US 8,697,359 and references cited herein. In short, CRISPR is a microbial nuclease system involved in defense against invading phages and plasmids. CRISPR loci in microbial hosts contain a combination of CRISPR-associated (Cas) genes as well as non-coding RNA elements capable of programming the specificity of the CRISPR-mediated nucleic acid cleavage (sgRNA). Three types (I-III) of CRISPR systems have been identified across a wide range of bacterial hosts. One key feature of each CRISPR locus is the presence of an array of repetitive sequences (direct repeats) interspaced by short stretches of non-repetitive sequences (spacers). The non-coding CRISPR array is transcribed and cleaved within direct repeats into short crRNAs containing individual spacer sequences, which direct Cas nucleases to the target site (protospacer). The Type II CRISPR is one of the most well characterized systems and carries out targeted DNA double-strand break in four sequential steps. First, two non-coding RNAs, the pre-crRNA array and tracrRNA, are transcribed from the CRISPR locus. Second, tracrRNA hybridizes to the repeat regions of the pre-crRNA and mediates the processing of pre-crRNA into mature crRNAs containing individual spacer sequences. Third, the mature crRNA:tracrRNA complex directs Cas9 to the target DNA via Watson-Crick base-pairing between the spacer on the crRNA and the protospacer on the target DNA next to the protospacer adjacent motif (PAM), an additional requirement for target recognition. Finally, Cas9 mediates cleavage of target DNA to create a double-stranded break within the protospacer.
One major advantage of the CRISPR-Cas9 system, as compared to conventional gene targeting and other programmable endonucleases is the ease of multiplexing, where multiple genes can be mutated simultaneously simply by using multiple sgRNAs each targeting a different gene. In addition, where two sgRNAs are used flanking a genomic region, the intervening section can be deleted or inverted (Wiles et al., 2015).
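The outcome of using two sgRNAs that flank a genomic region can be sketched as a string operation on the target sequence. The cut positions below are hypothetical 0-based indices chosen for illustration; in practice they would follow from the sgRNA target sites, and repair outcomes in a cell are more varied than this idealised model.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def delete_between(seq: str, cut1: int, cut2: int) -> str:
    """Idealised product when the segment between two cuts is lost."""
    left, right = sorted((cut1, cut2))
    return seq[:left] + seq[right:]


def invert_between(seq: str, cut1: int, cut2: int) -> str:
    """Idealised product when the segment between two cuts is re-ligated inverted."""
    left, right = sorted((cut1, cut2))
    return seq[:left] + reverse_complement(seq[left:right]) + seq[right:]


locus = "AAACCGTTT"  # hypothetical locus; cuts assumed after positions 3 and 6
print(delete_between(locus, 3, 6))
print(invert_between(locus, 3, 6))
```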
Cas9 is thus the hallmark protein of the type II CRISPR-Cas system, and is a large monomeric DNA nuclease guided to a DNA target sequence adjacent to the PAM (protospacer adjacent motif) sequence motif by a complex of two noncoding RNAs: CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA). The Cas9 protein contains two nuclease domains homologous to RuvC and HNH nucleases. The HNH nuclease domain cleaves the complementary DNA strand whereas the RuvC-like domain cleaves the non-complementary strand and, as a result, a blunt cut is introduced in the target DNA. Heterologous expression of Cas9 together with an sgRNA can introduce site-specific double strand breaks (DSBs) into genomic DNA of live cells from various organisms. For applications in eukaryotic organisms, codon optimized versions of Cas9, which is originally from the bacterium Streptococcus pyogenes, have been used. Alternatively, Cpf1, which is another Cas protein, can be used as the endonuclease. Cpf1 differs from Cas9 in several ways: Cpf1 requires a T-rich PAM sequence (TTTV) for target recognition, Cpf1 does not require a tracrRNA (i.e. only a crRNA is required) and the Cpf1 cleavage site is located distal and downstream to the PAM sequence in the protospacer sequence (Li et al., 2017). Furthermore, after identification of the PAM motif, Cpf1 introduces a sticky-end-like DNA double-stranded break with several nucleotides of overhang. As such, the CRISPR/Cpf1 system consists of a Cpf1 enzyme and a crRNA. In a further alternative embodiment, the nuclease may be MAD7.
The single guide RNA (sgRNA) is the second component of the CRISPR/Cas (Cpf1 or MAD7) system that forms a complex with the Cas9/Cpf1/MAD7 nuclease. sgRNA is a synthetic RNA chimera created by fusing crRNA with tracrRNA. The sgRNA guide sequence located at its 5’ end confers DNA target specificity. Therefore, by modifying the guide sequence, it is possible to create sgRNAs with different target specificities. The canonical length of the guide sequence is 20 bp.
Cas9 (or Cpf1/MAD7) expression plasmids for use in the methods of the invention can be constructed as described in the art. Cas9 or Cpf1 or MAD7 and the one or more sgRNA molecules may be delivered as separate or as single constructs. Where separate constructs are used for the delivery of the CRISPR enzyme (i.e. Cas9 or Cpf1 or MAD7) and the sgRNA molecule(s), the promoters used to drive expression of the CRISPR enzyme/sgRNA molecule may be the same or different. In one embodiment, RNA polymerase (Pol) II-dependent promoters or the CaMV35S promoter can be used to drive expression of the CRISPR enzyme. In another embodiment, Pol III-dependent promoters, such as U6 or U3, can be used to drive expression of the sgRNA.
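As a rough sketch of how candidate target sites might be located, the code below scans the forward strand of a DNA sequence for 20-nucleotide protospacers next to a PAM. It assumes the NGG PAM of Streptococcus pyogenes Cas9 located immediately 3' of the protospacer, which is general knowledge rather than something stated in this description, and the TTTV PAM of Cpf1 located 5' of the protospacer. The function names, the use of a 20-nucleotide guide length for Cpf1 and the demonstration sequence are illustrative assumptions; dedicated design tools also score off-targets and search the reverse strand.

```python
from typing import List, Tuple

Hit = Tuple[int, str, str]  # (0-based start, protospacer, PAM)


def find_cas9_targets(seq: str, guide_len: int = 20) -> List[Hit]:
    """Forward-strand protospacers followed directly by an NGG PAM."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - guide_len - 2):
        pam = seq[start + guide_len:start + guide_len + 3]
        if pam[1:] == "GG":
            hits.append((start, seq[start:start + guide_len], pam))
    return hits


def find_cpf1_targets(seq: str, guide_len: int = 20) -> List[Hit]:
    """Forward-strand protospacers preceded directly by a TTTV PAM (V = A, C or G)."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - guide_len - 3):
        pam = seq[start:start + 4]
        if pam[:3] == "TTT" and pam[3] in "ACG":
            hits.append((start, seq[start + 4:start + 4 + guide_len], pam))
    return hits


# Hypothetical sequence: TTTC PAM, a 20-nt protospacer, then an AGG PAM.
demo = "TTTC" + "ACGTACGTACGTACGTACGT" + "AGG"
print(find_cas9_targets(demo))
print(find_cpf1_targets(demo))
```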
Accordingly, using techniques and tools known in the art (such as https://chopchop.cbu.uib.no/), it is possible to find target sites and design sgRNA molecules that target a OML4 or GSK2 gene or promoter sequence as described herein. In one embodiment, the sgRNA molecules target a sequence selected from SEQ ID NO: 33 (OML4 target sequence) or SEQ ID NO: 34 (GSK2 target sequence) or a variant thereof as defined herein. In a further embodiment, the sgRNA molecules comprise a protospacer sequence selected from SEQ ID NO: 35 (OML4 target sequence) or SEQ ID NO: 36 (GSK2 target sequence) or a variant thereof, as defined herein.
In one embodiment, the method uses the sgRNA constructs defined in detail below to introduce a targeted mutation into a OML4 gene and/or promoter, and in a further embodiment, to additionally introduce a mutation into a GSK2 gene and/or promoter.
Thus, aspects of the invention involve targeted mutagenesis methods, specifically genome editing, and in a preferred embodiment exclude embodiments that are solely based on generating plants by traditional breeding methods.
The genome editing constructs may be introduced into a plant cell using any suitable method known to the skilled person (the term “introduced” can be used interchangeably with “transformation”, which is described below). In an alternative embodiment, any of the nucleic acid constructs described herein may be first transcribed to form a preassembled Cas9 (or other CRISPR nuclease)-sgRNA ribonucleoprotein and then delivered to at least one plant cell using any of the above described methods, such as lipofection, electroporation, biolistic bombardment or microinjection.
Specific protocols for using the above described CRISPR constructs would be well known to the skilled person. As one example, a suitable protocol is described in Ma & Liu (“CRISPR/Cas-based multiplex genome editing in monocot and dicot plants”) incorporated herein by reference.
The invention also extends to a plant obtained or obtainable by any method described herein.
Alternatively, more conventional mutagenesis methods can be used to introduce at least one mutation into a OML4 gene or OML4 promoter sequence, or into a GSK2 gene or GSK2 promoter sequence. These methods include both physical and chemical mutagenesis. A skilled person will know further approaches can be used to generate such mutants, and methods for mutagenesis and polynucleotide alterations are well known in the art. See, for example, Kunkel (1985) Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et al. (1987) Methods in Enzymol. 154:367-382; U.S. Patent No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein.
In one embodiment, insertional mutagenesis is used, for example using T-DNA mutagenesis (which inserts pieces of the T-DNA from the Agrobacterium tumefaciens Ti plasmid into DNA causing either loss of gene function or gain of gene function mutations), site-directed nucleases (SDNs) or transposons as a mutagen. Insertional mutagenesis is an alternative means of disrupting gene function and is based on the insertion of foreign DNA into the gene of interest (see Krysan et al, The Plant Cell, Vol. 11, 2283-2290, December 1999). Accordingly, in one embodiment, T-DNA is used as an insertional mutagen to disrupt the OML4 or GSK2 gene or OML4 or GSK2 promoter expression. T-DNA not only disrupts the expression of the gene into which it is inserted, but also acts as a marker for subsequent identification of the mutation. Since the sequence of the inserted element is known, the gene in which the insertion has occurred can be recovered, using various cloning or PCR-based strategies. The insertion of a piece of T-DNA in the order of 5 to 25 kb in length generally produces a disruption of gene function. If a large enough population of T-DNA transformed lines is generated, there are reasonably good chances of finding a transgenic plant carrying a T-DNA insert within any gene of interest. Transformation of spores with T-DNA is achieved by an Agrobacterium-mediated method which involves exposing plant cells and tissues to a suspension of Agrobacterium cells.
The details of this method are well known to a skilled person. In short, plant transformation by Agrobacterium results in the integration into the nuclear genome of a sequence called T-DNA, which is carried on a bacterial plasmid. The use of T-DNA transformation leads to stable single insertions. Further mutant analysis of the resultant transformed lines is straightforward and each individual insertion line can be rapidly characterized by direct sequencing and analysis of DNA flanking the insertion. Gene expression in the mutant is compared to expression of the OML4 or GSK2 nucleic acid sequence in a wild type plant and phenotypic analysis is also carried out.
In another embodiment, mutagenesis is physical mutagenesis, such as application of ultraviolet radiation, X-rays, gamma rays, fast or thermal neutrons or protons. The targeted population can then be screened to identify a OML4 or GSK2 loss of function mutant. In another embodiment of the various aspects of the invention, the method comprises mutagenizing a plant population with a mutagen. The mutagen may be a fast neutron irradiation or a chemical mutagen, for example selected from the following non-limiting list: ethyl methanesulfonate (EMS), methyl methanesulfonate (MMS), N-ethyl-N-nitrosourea (ENU), triethylmelamine (TEM), N-methyl-N-nitrosourea (MNU), procarbazine, chlorambucil, cyclophosphamide, diethyl sulfate, acrylamide monomer, melphalan, nitrogen mustard, vincristine, dimethylnitrosamine, N-methyl-N'-nitro-N-nitrosoguanidine (MNNG), nitrosoguanidine, 2-aminopurine, 7,12-dimethylbenz(a)anthracene (DMBA), ethylene oxide, hexamethylphosphoramide, busulfan, diepoxyalkanes (diepoxyoctane (DEO), diepoxybutane (BEB), and the like), 2-methoxy-6-chloro-9[3-(ethyl-2-chloroethyl)aminopropylamino]acridine dihydrochloride (ICR-170) or formaldehyde. Again, the targeted population can then be screened to identify a OML4 or GSK2 gene or promoter mutant.
In another embodiment, the method used to create and analyse mutations is targeting induced local lesions in genomes (TILLING), reviewed in Henikoff et al., 2004. In this method, seeds are mutagenised with a chemical mutagen, for example EMS. The resulting M1 plants are self-fertilised and the M2 generation of individuals is used to prepare DNA samples for mutational screening. DNA samples are pooled and arrayed on microtiter plates and subjected to gene specific PCR. The PCR amplification products may be screened for mutations in the target gene using any method that identifies heteroduplexes between wild type and mutant genes, for example, but not limited to, denaturing high pressure liquid chromatography (dHPLC), constant denaturant capillary electrophoresis (CDCE), temperature gradient capillary electrophoresis (TGCE), or fragmentation using chemical cleavage. Preferably the PCR amplification products are incubated with an endonuclease that preferentially cleaves mismatches in heteroduplexes between wild type and mutant sequences. Cleavage products are electrophoresed using an automated sequencing gel apparatus, and gel images are analyzed with the aid of a standard commercial image-processing program. Any primer specific to the OML4 or GSK2 nucleic acid sequence may be utilized to amplify the OML4 or GSK2 nucleic acid sequence within the pooled DNA sample. Preferably, the primer is designed to amplify the regions of the OML4 or GSK2 gene where useful mutations are most likely to arise, specifically in the areas of the genes that are highly conserved and/or confer activity as explained elsewhere. To facilitate detection of PCR products on a gel, the PCR primer may be labelled using any conventional labelling method. In an alternative embodiment, the method used to create and analyse mutations is EcoTILLING.
EcoTILLING is a molecular technique that is similar to TILLING, except that its objective is to uncover natural variation in a given population as opposed to induced mutations. The EcoTILLING method was first described in Comai et al., 2004.
Rapid high-throughput screening procedures thus allow the analysis of amplification products for identifying a mutation conferring the reduction or inactivation of the expression of the OML4 or GSK2 gene as compared to a corresponding non-mutagenised wild type plant. Once a mutation is identified in a gene of interest, the seeds of the M2 plant carrying that mutation are grown into adult M3 plants and screened for the phenotypic characteristics associated with the target gene. Loss of function and reduced function mutants with increased grain weight and/or grain size compared to a control can thus be identified.
Plants obtained or obtainable by such method which carry a partial or complete loss of function mutation in the endogenous OML4 gene or promoter locus are also within the scope of the invention.
In an alternative embodiment, the expression of the OML4 or GSK2 gene may be reduced at either the level of transcription or translation. For example, expression of an OML4 or GSK2 nucleic acid as defined herein can be reduced or silenced using a number of gene silencing methods known to the skilled person, such as, but not limited to, the use of small interfering nucleic acids (siNA) against OML4 or GSK2.
“Gene silencing” is a term generally used to refer to suppression of expression of a gene via sequence-specific interactions that are mediated by RNA molecules. The degree of reduction may be so as to totally abolish production of the encoded gene product, but more usually the abolition of expression is partial, with some degree of expression remaining. The term should not therefore be taken to require complete "silencing" of expression. In one embodiment, the siNA may include short interfering RNA (siRNA), double-stranded RNA (dsRNA), micro-RNA (miRNA), antagomirs and short hairpin RNA (shRNA) capable of mediating RNA interference.
The inhibition of expression and/or activity can be measured by determining the presence and/or amount of OML4 or GSK2 transcript using techniques well known to the skilled person (such as Northern Blotting, RT-PCR and so on).
Transgenes may be used to suppress endogenous plant genes. This was discovered originally when chalcone synthase transgenes in petunia caused suppression of the endogenous chalcone synthase genes and indicated by easily visible pigmentation changes. Subsequently it has been described how many, if not all plant genes can be "silenced" by transgenes. Gene silencing requires sequence similarity between the transgene and the gene that becomes silenced. This sequence homology may involve promoter regions or coding regions of the silenced target gene. When coding regions are involved, the transgene able to cause gene silencing may have been constructed with a promoter that would transcribe either the sense or the antisense orientation of the coding sequence RNA. It is likely that the various examples of gene silencing involve different mechanisms that are not well understood. In different examples there may be transcriptional or post-transcriptional gene silencing and both may be used according to the methods of the invention.
The mechanisms of gene silencing and their application in genetic engineering, which were first discovered in plants in the early 1990s and then shown in Caenorhabditis elegans are extensively described in the literature.
RNA interference (RNAi) is another post-transcriptional gene-silencing phenomenon which may be used according to the methods of the invention. This is induced by double-stranded RNA in which mRNA that is homologous to the dsRNA is specifically degraded. It refers to the process of sequence-specific post-transcriptional gene silencing mediated by short interfering RNAs (siRNA). The process of RNAi begins when the enzyme, DICER, encounters dsRNA and chops it into pieces called small interfering RNAs (siRNA). This enzyme belongs to the RNase III nuclease family. A complex of proteins gathers up these RNA remains and uses their code as a guide to search out and destroy any RNAs in the cell with a matching sequence, such as target mRNA.
Artificial and/or natural microRNAs (miRNAs) may be used to knock out gene expression and/or mRNA translation. miRNAs are typically single stranded small RNAs, typically 19-24 nucleotides long. Most plant miRNAs have perfect or near-perfect complementarity with their target sequences. However, there are natural targets with up to five mismatches. They are processed from longer non-coding RNAs with characteristic fold-back structures by double-strand specific RNases of the Dicer family. Upon processing, they are incorporated in the RNA-induced silencing complex (RISC) by binding to its main component, an Argonaute protein. miRNAs serve as the specificity components of RISC, since they base-pair to target nucleic acids, mostly mRNAs, in the cytoplasm. Subsequent regulatory events include target mRNA cleavage and destruction and/or translational inhibition. Effects of miRNA overexpression are thus often reflected in decreased mRNA levels of target genes. Artificial microRNA (amiRNA) technology has been applied in Arabidopsis thaliana and other plants to efficiently silence target genes of interest. The design principles for amiRNAs have been generalized and integrated into a Web-based tool (http://wmd.weigelworld.org).
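As an illustration of the complementarity requirement described above, the following minimal sketch counts mismatches between a miRNA and a candidate target site of the same length. The function names, the example sequence and the choice to treat G:U wobble pairs as mismatches are assumptions made for illustration only; the default limit of five mismatches is taken from the statement above about natural targets.

```python
# Watson-Crick partners for RNA bases.
# Assumption: G:U wobble pairs are counted as mismatches in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def count_mismatches(mirna: str, target_site: str) -> int:
    """Count non-complementary positions between a miRNA and a target site.

    Both sequences are written 5'->3' and must have the same length.
    """
    mirna = mirna.upper().replace("T", "U")
    target_site = target_site.upper().replace("T", "U")
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must have the same length")
    # The duplex is antiparallel: miRNA base i pairs with the target base
    # counted from the 3' end of the site.
    return sum(
        1 for m, t in zip(mirna, reversed(target_site)) if COMPLEMENT[m] != t
    )


def is_plausible_target(mirna: str, target_site: str, max_mismatches: int = 5) -> bool:
    """Return True if the site has no more than max_mismatches mismatches."""
    return count_mismatches(mirna, target_site) <= max_mismatches


if __name__ == "__main__":
    mirna = "ACGUACGUACGUACGUACGUA"  # hypothetical 21-nt sequence
    site = "".join(COMPLEMENT[b] for b in reversed(mirna))  # perfect target
    print(count_mismatches(mirna, site))
```

A real amiRNA design would also weigh the position of each mismatch and the hybridisation energy, which this sketch does not attempt.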
Thus, according to the various aspects of the invention a plant may be transformed to introduce a RNAi, shRNA, snRNA, dsRNA, siRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule that has been designed to target the expression of an OML4 or GSK2 nucleic acid sequence and selectively decreases or inhibits the expression of the gene or stability of its transcript. Preferably, the RNAi, snRNA, dsRNA, shRNA, siRNA, miRNA, amiRNA, ta-siRNA or cosuppression molecule used according to the various aspects of the invention comprises a fragment of at least 17 nt, preferably 22 to 26 nt and can be designed on the basis of the information shown in any of SEQ ID NOs: 2, 5, 8, 11, 14, 17, 20, 23, 26 and 29. Guidelines for designing effective siRNAs are known to the skilled person. Briefly, a short fragment of the target gene sequence (e.g., 19-40 nucleotides in length) is chosen as the target sequence of the siRNA of the invention. The short fragment of target gene sequence is a fragment of the target gene mRNA. In preferred embodiments, the criteria for choosing a sequence fragment from the target gene mRNA to be a candidate siRNA molecule include 1) a sequence from the target gene mRNA that is at least 50-100 nucleotides from the 5’ or 3’ end of the native mRNA molecule, 2) a sequence from the target gene mRNA that has a G/C content of between 30% and 70%, most preferably around 50%, 3) a sequence from the target gene mRNA that does not contain repetitive sequences (e.g., AAA, CCC, GGG, TTT, AAAA, CCCC, GGGG, TTTT), 4) a sequence from the target gene mRNA that is accessible in the mRNA, 5) a sequence from the target gene mRNA that is unique to the target gene, 6) a sequence that avoids regions within 75 bases of a start codon. The sequence fragment from the target gene mRNA may meet one or more of the criteria identified above.
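The sequence-based criteria listed above (1, 2, 3 and 6) can be sketched as a simple sliding-window filter. This is a minimal illustration under stated assumptions, not the prediction program referred to in this description: the function name, the default 21-nt window and the parameter names are hypothetical, and criteria 4 (accessibility) and 5 (uniqueness) are not checked because they require structure prediction and a genome-wide search.

```python
import re

# Three or more identical bases in a row (e.g. AAA, CCCC), per criterion 3.
HOMOPOLYMER = re.compile(r"(.)\1{2,}")


def candidate_sirna_sites(mrna, start_codon_pos, length=21,
                          end_margin=50, start_margin=75,
                          gc_min=0.30, gc_max=0.70):
    """Return (position, fragment) pairs passing criteria 1, 2, 3 and 6.

    mrna            -- transcript sequence, 5'->3'
    start_codon_pos -- 0-based index of the first base of the start codon
    """
    mrna = mrna.upper()
    hits = []
    # Criterion 1: stay at least end_margin nt away from both transcript ends.
    for i in range(end_margin, len(mrna) - end_margin - length + 1):
        frag = mrna[i:i + length]
        # Criterion 2: G/C content between gc_min and gc_max.
        gc = (frag.count("G") + frag.count("C")) / length
        if not gc_min <= gc <= gc_max:
            continue
        # Criterion 3: no runs of three or more identical bases.
        if HOMOPOLYMER.search(frag):
            continue
        # Criterion 6: skip windows overlapping the region within
        # start_margin bases of the start codon.
        window_start = start_codon_pos - start_margin
        window_end = start_codon_pos + 3 + start_margin
        if i < window_end and i + length > window_start:
            continue
        hits.append((i, frag))
    return hits


if __name__ == "__main__":
    toy_mrna = "ACGT" * 100  # hypothetical 400-nt transcript
    sites = candidate_sirna_sites(toy_mrna, start_codon_pos=0)
    print(len(sites), sites[0])
```

The surviving windows would still need to be ranked and checked for accessibility and uniqueness before any oligonucleotide is designed.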
The selected gene is introduced as a nucleotide sequence in a prediction program that takes into account all the variables described above for the design of optimal oligonucleotides. This program scans any mRNA nucleotide sequence for regions susceptible to be targeted by siRNAs. The output of this analysis is a score of possible siRNA oligonucleotides. The highest scores are used to design double stranded RNA oligonucleotides that are typically made by chemical synthesis. In addition to siRNA which is complementary to the mRNA target region, degenerate siRNA sequences may be used to target homologous regions. siRNAs according to the invention can be synthesized by any method known in the art. RNAs are preferably chemically synthesized using appropriately protected ribonucleoside phosphoramidites and a conventional DNA/RNA synthesizer. Additionally, siRNAs can be obtained from commercial RNA oligonucleotide synthesis suppliers.
The silencing RNA molecule is introduced into the plant using conventional methods, for example a vector and Agrobacterium-mediated transformation. Stably transformed plants are generated and expression of the OML4 or GSK2 gene compared to a wild type control plant is analysed.
Silencing of the OML4 or GSK2 nucleic acid sequence may also be achieved using virus-induced gene silencing.
Thus, in one embodiment of the invention, the plant expresses a nucleic acid construct comprising a RNAi, shRNA, snRNA, dsRNA, siRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule that targets the OML4 nucleic acid sequence as described herein and reduces expression of the endogenous OML4 nucleic acid sequence. A gene is targeted when, for example, the RNAi, snRNA, dsRNA, siRNA, shRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule selectively decreases or inhibits the expression of the gene compared to a control plant. Alternatively, a RNAi, snRNA, dsRNA, siRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule targets an OML4 or GSK2 nucleic acid sequence when the RNAi, shRNA, snRNA, dsRNA, siRNA, miRNA, ta-siRNA, amiRNA or cosuppression molecule hybridises under stringent conditions to the gene transcript.
A further approach to gene silencing is by targeting nucleic acid sequences complementary to the regulatory region of the gene (e.g., the promoter and/or enhancers) of OML4 or GSK2 to form triple helical structures that prevent transcription of the gene in target cells. Other methods, such as the use of antibodies directed to an endogenous polypeptide for inhibiting its function in plants, or interference in the signalling pathway in which a polypeptide is involved, will be well known to the skilled man. In particular, it can be envisaged that man-made molecules may be useful for inhibiting the biological function of a target polypeptide, or for interfering with the signalling pathway in which the target polypeptide is involved.
In another aspect, the invention relates to a silencing construct obtainable or obtained by a method as described herein and to a plant cell comprising such construct. In one example an RNAi construct to silence GSK2 comprises or consists of the sequence defined in SEQ ID NO: 31 or a functional variant thereof.
In another aspect, the invention extends to a plant obtained or obtainable by a method as described herein.
Methods of increasing grain number
In another aspect of the invention, there is provided a method of increasing the grain number in a plant. As shown in Figure 4(m), overexpressing OML4 results in a significant increase in grain number. Accordingly, in a further aspect of the invention, there is provided a method of increasing grain number in a plant, the method comprising increasing the expression and/or activity of OML4. Preferably said increase is relative to a wild-type or control plant.
In one embodiment, an “increase” in grain number may comprise an increase of at least 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45% or 50% compared to the grain number in a wild-type or control plant. In one embodiment, an increase in grain number may be an increase in grain number per panicle. Any of the above can be measured using standard techniques in the art.
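By way of a worked example, the percentage increase referred to above can be computed from mean grain number per panicle in modified and control plants. The function names and the example counts below are hypothetical and are not data from this application.

```python
def percent_increase(modified: float, control: float) -> float:
    """Percentage increase of a modified-plant value over the control value."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (modified - control) * 100.0 / control


def meets_threshold(modified: float, control: float, threshold_pct: float = 5.0) -> bool:
    """True if the increase is at least threshold_pct percent."""
    return percent_increase(modified, control) >= threshold_pct


if __name__ == "__main__":
    # Hypothetical mean grain numbers per panicle.
    control_grains = 100.0
    modified_grains = 120.0
    print(percent_increase(modified_grains, control_grains))
    print(meets_threshold(modified_grains, control_grains, threshold_pct=15.0))
```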
In a further aspect of the invention, the method further comprises increasing the expression or activity of SHAGGY-like kinase (GSK2).
In one embodiment, the method may comprise introducing and expressing in a plant or plant cell a nucleic acid construct comprising a nucleic acid sequence encoding an OML4 polypeptide as defined in SEQ ID NO: 1 or a homolog or functional variant thereof, as defined herein. Preferably, the nucleic acid sequence is operably linked to a regulatory sequence, preferably a promoter. In another embodiment, the nucleic acid construct may comprise a first nucleic acid sequence encoding an OML4 polypeptide as defined above and a second nucleic acid sequence encoding a GSK2 polypeptide as defined in SEQ ID NO: 4 or a homolog or functional variant thereof. Preferably, the first and second nucleic acid sequences are operably linked to a regulatory sequence, preferably a promoter. The first and second nucleic acid sequences may be operably linked to the same or a different regulatory sequence.
In an alternative embodiment, the method may comprise introducing and expressing a first nucleic acid construct comprising a nucleic acid sequence encoding an OML4 polypeptide as defined above and a second nucleic acid construct comprising a nucleic acid sequence encoding a GSK2 polypeptide as defined above. Again, the nucleic acid sequences are preferably operably linked to a regulatory sequence. The second nucleic acid construct may be introduced and expressed in the plant before, after or concurrently with the first nucleic acid construct.
Methods for the introduction of a nucleic acid construct as described above into a plant or plant cell (also called “transformation” (such terms may be used interchangeably)) are described herein. In one embodiment, the progeny plant is stably transformed with the nucleic acid construct described herein and comprises the exogenous polypeptide or polypeptides that are heritably maintained in the plant cell. The method may also comprise the additional step of collecting seeds from the selected progeny plant.
The method may further comprise the step of regenerating a transgenic plant from the plant cell wherein the transgenic plant comprises in its genome a nucleic acid sequence selected from SEQ ID NO: 2 and a nucleic acid sequence selected from SEQ ID NO: 5 or a homolog or functional variant thereof, and obtaining progeny derived from the transgenic plant, where the progeny exhibits an increase in grain number.
In a further embodiment, the method may comprise introducing a mutation into the plant genome, where said mutation is the insertion of at least one or more additional copy(ies) of a nucleic acid encoding an OML4 polypeptide or a homolog or variant thereof such that said sequence is operably linked to a regulatory sequence and wherein said mutation is introduced using targeted genome editing. Preferably, said mutation results in an increase in the expression of an OML4 nucleic acid compared to a control or wild-type plant. In an additional embodiment, the method may further comprise introducing one or more further mutations into the plant genome, where the one or more further mutations is the insertion of at least one or more additional copy(ies) of a nucleic acid encoding a GSK2 polypeptide or a homolog or functional variant thereof such that said sequence is operably linked to a regulatory sequence. Again, preferably the mutation is introduced using targeted genome editing. Preferably the mutation also results in an increase in the expression of a GSK2 polypeptide compared to a control or wild-type plant. The genomic and amino acid sequences of rice OML4 and GSK2 and their homologs are defined below.
In one embodiment, the mutation is introduced using CRISPR as described herein.
The invention also extends to plants obtained or obtainable by any method described herein.
Genetically altered or modified plants and methods of producing such plants
In another aspect of the invention there is provided a genetically altered plant, part thereof or plant cell characterised in that the plant does not express OML4, has reduced levels of OML4 expression, does not express a functional OML4 protein or expresses an OML4 protein with reduced function and/or activity. For example, the plant is a reduction (knock down) or loss of function (knock out) mutant wherein the function of the OML4 nucleic acid sequence is reduced or lost compared to a wild type control plant. To this end, a mutation is introduced into either the OML4 gene sequence or the corresponding promoter sequence, which disrupts the transcription of the gene. Therefore, preferably said plant comprises at least one mutation in the promoter and/or gene for OML4. In one embodiment the plant may comprise a mutation in both the promoter and gene for OML4.
In a further embodiment, the genetically altered plant, part thereof or plant cell is further characterised in that the plant also does not express GSK2, has reduced levels of GSK2 expression, does not express a functional GSK2 protein or expresses a GSK2 protein with reduced function and/or activity.
In a further aspect of the invention, there is provided a plant, part thereof or plant cell characterised by an increase in grain weight and/or size compared to a wild-type or control plant, wherein preferably, the plant comprises at least one mutation in the OML4 gene and/or its promoter.
The plant may be produced by introducing a mutation, preferably a deletion, insertion or substitution into the OML4 gene and/or promoter sequence by any of the above described methods. Preferably said mutation is introduced into at least one plant cell and a plant regenerated from the at least one mutated plant cell.
Alternatively, the plant or plant cell may comprise a nucleic acid construct expressing an RNAi molecule targeting the OML4 or GSK2 gene as described herein. In one embodiment, said construct is stably incorporated into the plant genome. These techniques also include gene targeting using vectors that target the gene of interest and which allow integration of a transgene at a specific site. The targeting construct is engineered to recombine with the target gene, which is accomplished by incorporating sequences from the gene itself into the construct. Recombination then occurs in the region of that sequence within the gene, resulting in the insertion of a foreign sequence to disrupt the gene. With its sequence interrupted, the altered gene will be translated into a nonfunctional protein, if it is translated at all.
In another aspect of the invention there is provided a method for producing a genetically altered plant as described herein. In one embodiment, the method comprises introducing at least one mutation into the OML4 gene and/or OML4 promoter of preferably at least one plant cell using any mutagenesis technique described herein. In a further embodiment, the method comprises further introducing at least one mutation into the GSK2 gene and/or GSK2 promoter. Preferably, said method further comprises regenerating a plant from the mutated plant cell.
The method may further comprise selecting one or more mutated plants, preferably for further propagation. Preferably said selected plants comprise at least one mutation in the target gene(s) and/or promoter sequence(s). Preferably said plants or said seeds of said plant are characterised by abolished or a reduced level of OML4 expression and/or a reduced level of OML4 polypeptide activity. Expression and/or activity levels of OML4 can be measured by any standard technique known to the skilled person. A reduction is as described herein.
The selected plants may be propagated by a variety of means, such as by clonal propagation or classical breeding techniques. For example, a first generation (or T1) transformed plant may be selfed and homozygous second-generation (or T2) transformants selected, and the T2 plants may then further be propagated through classical breeding techniques. The generated transformed organisms may take a variety of forms. For example, they may be chimeras of transformed cells and non-transformed cells; clonal transformants (e.g., all cells transformed to contain the expression cassette); grafts of transformed and untransformed tissues (e.g., in plants, a transformed rootstock grafted to an untransformed scion).
In a further aspect of the invention there is provided a plant obtained or obtainable by the above-described methods.
In another aspect of the invention, there is provided a genetically altered plant, part thereof or plant cell characterised in that the expression of OML4 is increased compared to the level of expression in a control or wild-type plant. Preferably, the plant expresses a polynucleotide that is either exogenous or endogenous to that plant. That is, a polynucleotide that is introduced into the plant by any means other than a sexual cross. In one embodiment of the method, an exogenous nucleic acid is expressed in the transgenic plant, which is a nucleic acid construct comprising a nucleic acid construct as described above. Alternatively, the plant carries a mutation in its genome where the mutation is the insertion of at least one or more additional copy of a nucleic acid sequence encoding an OML4 polypeptide, as defined herein, or a homolog or variant thereof such that said sequence is operably linked to a regulatory sequence. The plant may further comprise a second mutation in the plant genome, wherein the mutation is the insertion of at least one or more additional copy of a nucleic acid sequence encoding a GSK2 polypeptide, as defined herein, or a homolog or variant thereof such that said sequence is operably linked to a regulatory sequence. Preferably the mutation is introduced using targeted genome editing.
For the purposes of the invention, a “genetically altered plant” or “mutant plant” is a plant that has been genetically altered compared to the naturally occurring wild type (WT) plant. In one embodiment, a mutant plant is a plant that has been altered compared to the naturally occurring wild type (WT) plant using a mutagenesis method, such as any of the mutagenesis methods described herein. In one embodiment, the mutagenesis method is targeted genome modification or genome editing. In one embodiment, the plant genome has been altered compared to wild type sequences using a mutagenesis method. Such plants have an altered phenotype as described herein, such as an increased disease resistance. Therefore, in one example, increased grain weight and/or size is conferred by the presence of an altered plant genome, for example, a mutated endogenous OML4 gene or OML4 promoter sequence. In one embodiment, the endogenous promoter or gene sequence is specifically targeted using targeted genome modification and the presence of a mutated gene or promoter sequence is not conferred by the presence of transgenes expressed in the plant. In other words, the genetically altered plant can be described as transgene-free.
A plant according to the various aspects of the invention, including the transgenic plants, methods and uses described herein may be a monocot or a dicot plant. Preferably, the plant is a crop plant. By crop plant is meant any plant which is grown on a commercial scale for human or animal consumption or use.
Preferably, the crop plant is selected from rice, wheat, maize, soybean and brassicas, such as for example, B.napus. More preferably, the crop plant is rice and even more preferably the japonica or indica variety.
The term "plant" as used herein encompasses whole plants and progeny of the plants and plant parts, including seeds, fruit, shoots, stems, leaves, roots (including tubers), flowers, tissues and organs, wherein each of the aforementioned comprise at least one of the mutations described herein or a sgRNA or an RNAi construct as described herein. The term "plant" also encompasses plant cells, suspension cultures, callus tissue, embryos, meristematic regions, gametophytes, sporophytes, pollen and microspores, again wherein each of the aforementioned comprises at least one of the mutations described herein or nucleic acid construct, a sgRNA or an RNAi construct as described herein. Accordingly, in one embodiment, the plant part is a grain or seed.
The invention also extends to harvestable parts of a plant of the invention as described herein, including, but not limited to, seeds, leaves, fruits, flowers, stems, roots, rhizomes, tubers and bulbs. The aspects of the invention also extend to products derived, preferably directly derived, from a harvestable part of such a plant, such as dry pellets or powders, oil, fat and fatty acids, starch or proteins. Another product that may be derived from the harvestable parts of the plant of the invention is biodiesel. The invention also relates to food products and food supplements comprising the plant of the invention or parts thereof. In one embodiment, the food products may be animal feed. In another aspect of the invention, there is provided a product derived from a plant as described herein or from a part thereof.
In a most preferred embodiment, the plant part or harvestable product is a seed or grain. Therefore, in a further aspect of the invention, there is provided a seed produced from a genetically altered plant as described herein.
In an alternative embodiment, the plant part is pollen, a propagule or progeny of the genetically altered plant described herein. Accordingly, in a further aspect of the invention there is provided pollen, a propagule or progeny of the genetically altered plant as described herein.
A control plant as used herein according to all of the aspects of the invention is a plant which has not been modified according to the methods of the invention. Accordingly, in one embodiment, the control plant does not have reduced expression of an OML4 nucleic acid and/or reduced activity of an OML4 polypeptide. In an alternative embodiment, the plant has been genetically modified, as described above. In one embodiment, the control plant is a wild type plant. The control plant is typically of the same plant species, preferably having the same genetic background as the modified plant.
Genome editing constructs for use with the methods for targeted genome modification described herein
By “crRNA” or CRISPR RNA is meant the sequence of RNA that contains the protospacer element and additional nucleotides that are complementary to the tracrRNA.
By “tracrRNA” (transactivating RNA) is meant the sequence of RNA that hybridises to the crRNA and binds a CRISPR enzyme, such as Cas9 thereby activating the nuclease complex to introduce double-stranded breaks at specific sites within the genomic sequence of at least one OML4 or GSK2 nucleic acid or promoter sequence.
By “protospacer element” is meant the portion of crRNA (or sgRNA) that is complementary to the genomic DNA target sequence, usually around 20 nucleotides in length. This may also be known as a spacer or targeting sequence.
By “sgRNA” (single-guide RNA) is meant the combination of tracrRNA and crRNA in a single RNA molecule, preferably also including a linker loop (that links the tracrRNA and crRNA into a single molecule). “sgRNA” may also be referred to as “gRNA” and in the present context, the terms are interchangeable. The sgRNA or gRNA provide both targeting specificity and scaffolding/binding ability for a Cas nuclease. A gRNA may refer to a dual RNA molecule comprising a crRNA molecule and a tracrRNA molecule.
By “TAL effector” (transcription activator-like (TAL) effector) or TALE is meant a protein sequence that can bind the genomic DNA target sequence (e.g. a sequence within the OML4 gene or promoter sequence) and that can be fused to the cleavage domain of an endonuclease such as FokI to create TAL effector nucleases or TALENs, or to meganucleases to create megaTALs. A TALE protein is composed of a central domain that is responsible for DNA binding, a nuclear-localisation signal and a domain that activates target gene transcription. The DNA-binding domain consists of monomers and each monomer can bind one nucleotide in the target nucleotide sequence. Monomers are tandem repeats of 33-35 amino acids, of which the two amino acids located at positions 12 and 13 are highly variable (repeat variable diresidue, RVD). It is the RVDs that are responsible for the recognition of a single specific nucleotide. HD targets cytosine, NI targets adenine, NG targets thymine and NN targets guanine (although NN can also bind to adenine with lower specificity).
In another aspect of the invention there is provided a nucleic acid construct wherein the nucleic acid construct encodes at least one DNA-binding domain, wherein the DNA-binding domain can bind to a sequence in the OML4 gene, wherein said sequence comprises or consists of SEQ ID NO: 33 or a variant thereof. In an alternative embodiment, the DNA-binding domain can bind to a sequence in the GSK2 gene, wherein said sequence comprises or consists of SEQ ID NO: 34 or a variant thereof. In one embodiment, said construct further comprises a nucleic acid encoding a SSN, such as FokI or a Cas protein.
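The RVD code described above (HD for cytosine, NI for adenine, NG for thymine, NN for guanine and, with lower specificity, adenine) can be sketched as a lookup table. The function names and the four-base example are hypothetical; the sketch only maps bases to RVDs and does not model repeat context or binding strength.

```python
# One RVD per target base, following the code given in the text.
RVD_FOR_BASE = {"C": "HD", "A": "NI", "T": "NG", "G": "NN"}

# Bases each RVD can recognise; NN also binds adenine with lower specificity.
BASES_FOR_RVD = {"HD": {"C"}, "NI": {"A"}, "NG": {"T"}, "NN": {"G", "A"}}


def rvds_for_target(target: str) -> list:
    """Return the RVD array (one RVD per monomer) for a DNA target sequence."""
    return [RVD_FOR_BASE[base] for base in target.upper()]


def rvds_can_bind(rvds: list, target: str) -> bool:
    """Check that each RVD in the array can recognise the base at its position."""
    target = target.upper()
    if len(rvds) != len(target):
        return False
    return all(base in BASES_FOR_RVD[rvd] for rvd, base in zip(rvds, target))


if __name__ == "__main__":
    print(rvds_for_target("ACGT"))  # hypothetical 4-base target
```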
In one embodiment, the nucleic acid construct encodes at least one protospacer element wherein the sequence of the protospacer element is selected from SEQ ID No: 35 (to target OML4) or SEQ ID NO: 36 (to target GSK2) or a variant thereof.
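For orientation, the following sketch shows how candidate protospacer elements of about 20 nucleotides might be located in a target sequence. It assumes Cas9 from Streptococcus pyogenes, which requires an NGG protospacer adjacent motif (PAM) immediately 3' of the target; the PAM requirement is background knowledge rather than something stated in this section. The function name and the toy sequence are hypothetical, only the given strand is scanned, and the sketch is not the method used to select SEQ ID NO: 35 or 36.

```python
import re


def find_protospacers(dna: str, length: int = 20) -> list:
    """Return (position, protospacer, PAM) for each NGG-adjacent site.

    Only the strand supplied is scanned; a fuller search would also scan
    the reverse complement.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    toy = "A" * 20 + "TGG" + "C" * 5  # hypothetical sequence with one site
    for pos, spacer, pam in find_protospacers(toy):
        print(pos, spacer, pam)
```

Candidate sites found this way would still need to be checked for off-target matches elsewhere in the genome.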
In a further embodiment, the nucleic acid construct comprises a crRNA-encoding sequence. As defined above, a crRNA sequence may comprise the protospacer elements as defined above and preferably additional nucleotides that are complementary to the tracrRNA. An appropriate sequence for the additional nucleotides will be known to the skilled person as these are defined by the choice of Cas protein.
In another embodiment, the nucleic acid construct further comprises a tracrRNA sequence. Again, an appropriate tracrRNA sequence would be known to the skilled person as this sequence is defined by the choice of Cas protein.
In a further embodiment, the nucleic acid construct comprises at least one nucleic acid sequence that encodes a sgRNA (or gRNA). Again, as already discussed, sgRNA typically comprises a crRNA sequence, a tracrRNA sequence and preferably a sequence for a linker loop.
In a further embodiment, the nucleic acid construct may further comprise at least one nucleic acid sequence encoding an endoribonuclease cleavage site. Preferably the endoribonuclease is Csy4 (also known as Cas6f). Where the nucleic acid construct comprises multiple sgRNA nucleic acid sequences the construct may comprise the same number of endoribonuclease cleavage sites. In another embodiment, the cleavage site is 5’ of the sgRNA nucleic acid sequence. Accordingly, each sgRNA nucleic acid sequence is flanked by an endoribonuclease cleavage site.
The term ‘variant’ refers to a nucleotide sequence where the nucleotides are substantially identical to one of the above sequences. The variant may be achieved by modifications such as an insertion, substitution or deletion of one or more nucleotides. In a preferred embodiment, the variant has at least 50%, at least 55%, at least 60%, at least 65%, at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% identity to any one of the above sequences. In one embodiment, sequence identity is at least 90%. In another embodiment, sequence identity is 100%. Sequence identity can be determined by any sequence alignment program known in the art.
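As a minimal sketch of how a percentage identity threshold such as "at least 90%" might be checked, the following computes identity over two sequences that have already been aligned (gaps written as `-`). The function name `percent_identity`, the choice of denominator (all alignment columns that are not gap-only) and the example sequences are assumptions made for illustration; alignment programs differ in how they define identity, and the text does not prescribe a particular definition.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: identity = matching columns / columns that are not gap-only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical 10-nt sequences differing by one substitution.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity)          # prints: 90.0
print(identity >= 90.0)  # prints: True
```

A single-nucleotide deletion is counted as a mismatched column under this definition, so `percent_identity("ACGT-CGT", "ACGTACGT")` evaluates to 87.5.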
The invention also relates to a nucleic acid construct comprising a nucleic acid sequence operably linked to a suitable plant promoter. A suitable plant promoter may be a constitutive or strong promoter or may be a tissue-specific promoter. In one embodiment, suitable plant promoters are selected from, but not limited to U3 and U6.
The nucleic acid construct of the present invention may also further comprise a nucleic acid sequence that encodes a CRISPR enzyme. By “CRISPR enzyme” is meant an RNA-guided DNA endonuclease that can associate with the CRISPR system. Specifically, such an enzyme binds to the tracrRNA sequence. In one embodiment, the CRISPR enzyme is a Cas protein (“CRISPR-associated protein”), preferably Cas9 or Cpf1, more preferably Cas9. In a specific embodiment Cas9 is a codon-optimised Cas9 (specific for the plant in question). In one embodiment, Cas9 has the sequence described in SEQ ID NO: 32 or a functional variant or homolog thereof. In another embodiment, the CRISPR enzyme is a protein from the family of Class 2 candidate x proteins, such as C2c1, C2c2 and/or C2c3. In one embodiment, the Cas protein is from Streptococcus pyogenes. In an alternative embodiment, the Cas protein may be from any one of Staphylococcus aureus, Neisseria meningitidis, Streptococcus thermophilus or Treponema denticola. Alternatively, the CRISPR enzyme is MAD7.
The term “functional variant” as used herein with reference to Cas9 refers to a variant Cas9 gene sequence or part of the gene sequence which retains the biological function of the full non-variant sequence, for example, acts as a DNA endonuclease, or recognises and/or binds to DNA. A functional variant also comprises a variant of the gene of interest, which has sequence alterations that do not affect function, for example in non-conserved residues. Also encompassed is a variant that is substantially identical, i.e. has only some sequence variations, for example in non-conserved residues, compared to the wild type sequences as shown herein and is biologically active. In one embodiment, a functional variant of SEQ ID NO: 32 has at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% overall sequence identity to the nucleic acid represented by SEQ ID NO: 32. In a further embodiment, the Cas9 protein has been modified to improve activity.
Suitable homologs or orthologs can be identified by sequence comparisons and identifications of conserved domains. The function of the homolog or ortholog can be identified as described herein and a skilled person would thus be able to confirm the function when expressed in a plant.
In an alternative aspect of the invention, the nucleic acid construct comprises at least one nucleic acid sequence that encodes a TAL effector, wherein said effector targets an OML4 sequence, such as SEQ ID NO: 33 or a GSK2 sequence such as SEQ ID NO: 34. Methods for designing a TAL effector would be well known to the skilled person, given the target sequence. Examples of suitable methods are given in Sanjana et al., and Cermak T et al, both incorporated herein by reference. Preferably, said nucleic acid construct comprises two nucleic acid sequences encoding a TAL effector, to produce a TALEN pair. In a further embodiment, the nucleic acid construct further comprises a sequence-specific nuclease (SSN). Preferably such SSN is an endonuclease such as FokI. In a further embodiment, the TALENs are assembled by the Golden Gate cloning method in a single plasmid or nucleic acid construct.
In another aspect of the invention, there is provided a sgRNA molecule, wherein the sgRNA molecule comprises a crRNA sequence and a tracrRNA sequence and wherein the crRNA sequence can bind to at least one sequence such as SEQ ID NO: 33 (for OML4) or SEQ ID NO: 34 (for GSK2) or a variant thereof.
A “variant” is as defined herein. In one embodiment, the sgRNA molecule may comprise at least one chemical modification, for example one that enhances its stability and/or binding affinity to the target sequence or of the crRNA sequence to the tracrRNA sequence. Such modifications would be well known to the skilled person, and include for example, but not limited to, the modifications described in Rahdar et al., 2015, incorporated herein by reference. In this example the crRNA may comprise a phosphorothioate backbone modification, such as 2’-fluoro (2’-F), 2’-O-methyl (2’-O-Me) and S-constrained ethyl (cET) substitutions.
In another aspect of the invention, there is provided an isolated nucleic acid sequence that encodes for a protospacer element (as defined in any of SEQ ID NO: 35 or 36).
In another aspect of the invention, there is provided a plant or part thereof or at least one isolated plant cell transfected with at least one nucleic acid construct as described herein. Cas9 and sgRNA may be combined or in separate expression vectors (or nucleic acid constructs, such terms are used interchangeably). In other words, in one embodiment, an isolated plant cell is transfected with a single nucleic acid construct comprising both sgRNA and Cas9 as described in detail above. In an alternative embodiment, an isolated plant cell is transfected with two nucleic acid constructs, a first nucleic acid construct comprising at least one sgRNA as defined above and a second nucleic acid construct comprising Cas9 or a functional variant or homolog thereof. The second nucleic acid construct may be transfected before, after or concurrently with the first nucleic acid construct. The advantage of a separate, second construct comprising a Cas protein is that the nucleic acid construct encoding at least one sgRNA can be paired with any type of Cas protein, as described herein, and therefore is not limited to a single Cas function (as would be the case when both Cas and sgRNA are encoded on the same nucleic acid construct).
In one embodiment, the nucleic acid construct comprising a Cas protein is transfected first and is stably incorporated into the genome, before the second transfection with a nucleic acid construct comprising at least one sgRNA nucleic acid. In an alternative embodiment, a plant or part thereof or at least one isolated plant cell is transfected with mRNA encoding a Cas protein and co-transfected with at least one nucleic acid construct as defined herein.
Cas9 expression vectors for use in the present invention can be constructed as described in the art. In one example, the expression vector comprises a nucleic acid sequence as defined herein or a functional variant or homolog thereof, wherein said nucleic acid sequence is operably linked to a suitable promoter. Examples of suitable promoters include, but are not limited to Cas9, 35S and Actin.
In an alternative aspect of the present invention, there is provided an isolated plant cell transfected with at least one sgRNA molecule as described herein.
In a further aspect of the invention, there is provided a genetically modified or edited plant comprising the transfected cell described herein. In one embodiment, the nucleic acid construct or constructs may be integrated in a stable form. In an alternative embodiment, the nucleic acid construct or constructs are not integrated (i.e. are transiently expressed). Accordingly, in a preferred embodiment, the genetically modified plant is free of any sgRNA and/or Cas protein nucleic acid. In other words, the plant is transgene free.
The term "introduction", “transfection” or "transformation" as referred to throughout the application encompasses the transfer of an exogenous polynucleotide into a host cell, irrespective of the method used for transfer. Plant tissue capable of subsequent clonal propagation, whether by organogenesis or embryogenesis, may be transformed with a genetic construct of the present invention and a whole plant regenerated therefrom. The particular tissue chosen will vary depending on the clonal propagation systems available for, and best suited to, the particular species being transformed. Exemplary tissue targets include leaf disks, pollen, embryos, cotyledons, hypocotyls, megagametophytes, callus tissue, existing meristematic tissue (e.g., apical meristem, axillary buds, and root meristems), and induced meristem tissue (e.g., cotyledon meristem and hypocotyl meristem). The resulting transformed plant cell may then be used to regenerate a transformed plant in a manner known to persons skilled in the art. The transfer of foreign genes into the genome of a plant is called transformation. Transformation of plants is now a routine technique in many species. Any of several transformation methods known to the skilled person may be used to introduce any of the nucleic acid constructs described herein or an sgRNA molecule of interest into a suitable ancestor cell. The methods described for the transformation and regeneration of plants from plant tissues or plant cells may be utilized for transient or for stable transformation. Transformation methods include the use of liposomes, electroporation, chemicals that increase free DNA uptake, injection of the DNA directly into the plant (microinjection), gene guns (or biolistic particle delivery systems (biolistics)) as described in the examples, lipofection, transformation using viruses or pollen and microprojection.
Methods may be selected from the calcium/polyethylene glycol method for protoplasts, ultrasound-mediated gene transfection, optical or laser transfection, transfection using silicon carbide fibers, electroporation of protoplasts, microinjection into plant material, DNA or RNA-coated particle bombardment, infection with (non-integrative) viruses and the like. Transgenic plants can also be produced via Agrobacterium tumefaciens mediated transformation, including but not limited to using the floral dip/Agrobacterium vacuum infiltration method as described in Clough & Bent (1998) and incorporated herein by reference.
Accordingly, in one embodiment, at least one nucleic acid construct or sgRNA molecule as described herein can be introduced to at least one plant cell using any of the above described methods. In an alternative embodiment, any of the nucleic acid constructs described herein may be first transcribed to form a preassembled Cas9-sgRNA ribonucleoprotein and then delivered to at least one plant cell using any of the above described methods, such as lipofection, electroporation or microinjection.
Optionally, to select transformed plants, the plant material obtained in the transformation is, as a rule, subjected to selective conditions so that transformed plants can be distinguished from untransformed plants. For example, the seeds obtained in the above-described manner can be planted and, after an initial growing period, subjected to a suitable selection by spraying. A further possibility is growing the seeds, if appropriate after sterilization, on agar plates using a suitable selection agent so that only the transformed seeds can grow into plants. As described in the examples, a suitable marker can be bar-phosphinothricin or PPT. Alternatively, the transformed plants are screened for the presence of a selectable marker, such as, but not limited to, GFP or GUS (β-glucuronidase). Other examples would be readily known to the skilled person. Alternatively, no selection is performed, and the seeds obtained in the above-described manner are planted and grown and OML4 expression or protein levels measured at an appropriate time using standard techniques in the art. This alternative, which avoids the introduction of transgenes, is preferable to produce transgene-free plants. Following DNA transfer and regeneration, putatively transformed plants may also be evaluated, for instance using PCR to detect the presence of the gene of interest, copy number and/or genomic organisation. Alternatively or additionally, integration and expression levels of the newly introduced DNA may be monitored using Southern, Northern and/or Western analysis, all of these techniques being well known to persons having ordinary skill in the art.
The generated transformed plants may be propagated by a variety of means, such as by clonal propagation or classical breeding techniques. For example, a first generation (or T1) transformed plant may be selfed and homozygous second-generation (or T2) transformants selected, and the T2 plants may then further be propagated through classical breeding techniques.
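The selfing step described above can be illustrated with a minimal sketch of the genotype classes expected among the progeny of a selfed plant. The sketch assumes a single diploid locus, ordinary Mendelian transmission and a T1 plant that is heterozygous for the edit; none of these assumptions is stated in the text, and the function name `selfing_offspring` and the allele labels are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def selfing_offspring(genotype: tuple[str, str]) -> dict[tuple[str, str], Fraction]:
    """Expected genotype frequencies after selfing a diploid plant at one locus."""
    # Each parent gamete carries one of the two alleles with equal probability.
    counts = Counter(tuple(sorted(pair)) for pair in product(genotype, repeat=2))
    total = sum(counts.values())
    return {geno: Fraction(n, total) for geno, n in counts.items()}


# Hypothetical heterozygous T1 plant: one edited ("mut") and one wild-type ("wt") allele.
t2 = selfing_offspring(("wt", "mut"))
for geno, freq in sorted(t2.items()):
    print(geno, freq)
# prints:
# ('mut', 'mut') 1/4
# ('mut', 'wt') 1/2
# ('wt', 'wt') 1/4
```

Under these assumptions about one quarter of T2 plants would be homozygous for the edit, which is why homozygous lines are selected by genotyping at this generation. A biallelic or already homozygous T1 plant would give different proportions.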
In a further related aspect of the invention, there is also provided a method of obtaining a genetically modified plant as described herein, the method comprising:
a. selecting a part of the plant;
b. transfecting at least one cell of the part of the plant of paragraph (a) with at least one nucleic acid construct as described herein or at least one sgRNA molecule as described herein, using the transfection or transformation techniques described above;
c. regenerating at least one plant derived from the transfected cell or cells;
d. selecting one or more plants obtained according to paragraph (c) that show silencing or reduced expression of OML4.
In a further embodiment, the method also comprises the step of screening the genetically modified plant for SSN (preferably CRISPR)-induced mutations in the OML4 gene or promoter sequence. In one embodiment, the method comprises obtaining a DNA sample from a transformed plant and carrying out DNA amplification to detect a mutation in at least one OML4 gene or promoter sequence.
In a further embodiment, the methods comprise generating stable T2 plants preferably homozygous for the mutation (that is a mutation in at least one OML4 gene or promoter sequence). Plants that have a mutation in at least one OML4 gene and/or promoter sequence can also be crossed with another plant also containing at least one mutation in at least one OML4 gene and/or promoter sequence to obtain plants with additional mutations in the OML4 gene or promoter sequence. The combinations will be apparent to the skilled person. Accordingly, this method can be used to generate T2 plants with mutations on all or an increased number of homoeologs, when compared to the number of homoeolog mutations in a single T1 plant transformed as described above.
A plant obtained or obtainable by the methods described above is also within the scope of the invention.
A genetically altered plant of the present invention may also be obtained by transference of any of the sequences of the invention by crossing, e.g., using pollen of the genetically altered plant described herein to pollinate a wild-type or control plant, or pollinating the gynoecia of plants described herein with other pollen that does not contain a mutation in at least one of the OML4 gene or promoter sequence. The methods for obtaining the plant of the invention are not exclusively limited to those described in this paragraph; for example, genetic transformation of germ cells from the ear of wheat could be carried out as mentioned, but without having to regenerate a plant afterward.
While the foregoing disclosure provides a general description of the subject matter encompassed within the scope of the present invention, including methods, as well as the best mode thereof, of making and using this invention, the following examples are provided to further enable those skilled in the art to practice this invention and to provide a complete written description thereof. However, those skilled in the art will appreciate that the specifics of these examples should not be read as limiting on the invention, the scope of which should be apprehended from the claims and equivalents thereof appended to this disclosure. Various further aspects and embodiments of the present invention will be apparent to those skilled in the art in view of the present disclosure. "and/or" where used herein is to be taken as specific disclosure of each of the two specified features or components with or without the other. For example "A and/or B" is to be taken as specific disclosure of each of (i) A, (ii) B and (iii) A and B, just as if each is set out individually herein. Unless context dictates otherwise, the descriptions and definitions of the features set out above are not limited to any particular aspect or embodiment of the invention and apply equally to all aspects and embodiments which are described.
The foregoing application, and all documents and sequence accession numbers cited therein or during their prosecution ("appln cited documents") and all documents cited or referenced in the appln cited documents, and all documents cited or referenced herein ("herein cited documents"), and all documents cited or referenced in herein cited documents, together with any manufacturer's instructions, descriptions, product specifications, and product sheets for any products mentioned herein or in any document incorporated by reference herein, are hereby incorporated herein by reference, and may be employed in the practice of the invention. More specifically, all referenced documents are incorporated by reference to the same extent as if each individual document was specifically and individually indicated to be incorporated by reference.
The invention is now described in the following non-limiting example.
EXAMPLE
The large1 mutant forms large and heavy grains
We have identified a number of grain size mutants in rice. The large1-1 mutant was isolated from γ-ray-treated M2 populations of the japonica variety Zhonghuajing (ZHJ). The large1-1 mutant displayed large grains and tall plants (Figure 1A-1E). The length of large1-1 grains was increased by 16.24% compared with that of ZHJ grains (Figure 1F). Similarly, the width of large1-1 grains was increased by 11.54% compared with that of ZHJ grains (Figure 1G). The large1-1 grains were also significantly heavier than ZHJ grains (Figure 1H). The weight of large1-1 grains was increased by 23.11% compared with that of ZHJ grains. These results indicate that LARGE1 negatively regulates grain size and weight in rice.
Mature large1-1 plants were significantly taller than ZHJ plants (Figure 1I). The large1-1 panicles were long and loose in comparison to the wild-type panicles (Figure 1J), indicating that LARGE1 also negatively influences panicle length. As panicle structure and shape are determined by panicle branches, we investigated ZHJ and large1-1 panicle branches. The large1-1 panicles had more primary branches than ZHJ panicles (Figure 1K), but fewer secondary branches (Figure 1L).
LARGE1 regulates cell expansion in spikelet hulls
Grain growth is limited by spikelet hulls, and spikelet hull growth is determined by cell proliferation and cell expansion processes. To uncover the cellular basis for LARGE1 function in grain growth, we investigated cells in ZHJ and large1-1 spikelet hulls. As shown in Figure 2, the outer epidermal cells in large1-1 lemmas were longer and wider than those of ZHJ lemmas, while the cell number in large1-1 lemmas was similar to that in wild-type lemmas in both longitudinal and transverse directions (Figures 2A, 2B, 2E-2H). Similarly, the inner epidermal cells of large1-1 were on average longer and wider than those of ZHJ (Figure 2C, 2D, 2I, 2J). These results indicate that the long and wide grain phenotypes of large1-1 result from the long and wide cells in spikelet hulls. Thus, LARGE1 regulates grain size by limiting cell expansion in spikelet hulls.
As several genes were reported to regulate grain size by influencing cell expansion in spikelet hulls, we investigated their expression levels in wild-type and large1-1 panicles (Figure 8). SPL13/GLW7, a transcription factor, positively influences grain length by increasing cell expansion (Si, et al. 2016). A higher expression level of SPL13 was observed in large1-1 panicles. GL7/GW7/SLG7 promotes cell elongation in spikelet hulls, resulting in long grains (Wang, et al. 2015; Wang, et al. 2015; Zhou, et al. 2015), although GL7/GW7/SLG7 is also proposed to increase grain length by influencing cell proliferation (Wang, et al. 2015). Expression of GL7 was clearly increased in large1-1 compared with that in ZHJ (Figure 8). The putative serine carboxypeptidase GS5 and the transcription factor GS2 affect grain growth by increasing both cell expansion and cell proliferation (Li, et al. 2011; Duan, et al. 2015; Hu, et al. 2015). Expression levels of GS5 and GS2 in large1-1 were significantly higher than those in ZHJ (Figure 8). The bHLH transcription factor PGL1 controls grain length by increasing cell expansion (Heang and Sassa 2012a, b). APG, another bHLH transcription factor, regulates grain length by restricting cell expansion in spikelet hulls (Heang and Sassa 2012a, b). Expression levels of APG and PGL1 in large1-1 were lower and higher than those in ZHJ, respectively (Figure 8). These data indicate that LARGE1 influences expression of several grain size genes that regulate cell expansion.
LARGE1 encodes the Mei-2 like protein OML4
The MutMap approach was used to identify the large1-1 mutation. We crossed ZHJ with large1-1 and generated an F2 population. In the F2 population, the progeny segregation showed that a single recessive mutation determines the large grain phenotype of large1-1. The genomic DNAs from F2 plants with the large-grain phenotype were pooled and subjected to whole-genome resequencing. The wild-type ZHJ was also sequenced as a control. SNP analyses were performed as described previously (Fang, et al. 2016; Huang, et al. 2017). We detected 3913 SNPs and 1280 INDELs between ZHJ and the pooled F2 plants with large1-1 phenotypes. The SNP/INDEL ratio in the pooled F2 plants was calculated across the whole genome. Among these variants, only one INDEL in a coding region had a SNP/INDEL ratio of 1. This INDEL is a 4-bp deletion in large1-1 in the gene (LOC_Os02g31290) (Figure 3A; Figure 9; Table 13), which leads to a premature stop codon (Figure 3B). We further confirmed this deletion in LOC_Os02g31290 by developing a dCAPS1 marker (Figure 3C). These results indicate that LOC_Os02g31290 is the candidate gene for LARGE1.
The genetic complementation test was conducted to confirm whether the deletion in LOC_Os02g31290 was responsible for the large1-1 phenotypes. The genomic fragment of LOC_Os02g31290 (gLARGE1) was transformed into the large1-1 mutant, generating eleven transgenic lines. The gLARGE1 construct complemented the large grain phenotypes of the large1-1 mutant (Figure 3D and 3E). The grain length and width of gLARGE1;large1-1 transgenic plants were similar to those of ZHJ (Figure 3F and 3G). Genomic complementation plants also recovered to the wild type in plant height and morphology (Figure 10). Therefore, the complementation test supported that the LARGE1 gene is LOC_Os02g31290.
LARGE1/LOC_Os02g31290 encodes the Mei-2 like protein OML4 with three RNA Recognition Motifs (RRMs) (Figure 3B and Figure 11). Homologs of OML4 were found in crops (Figure 11) but the role of OML4 and its homologs in grain size control is so far unknown. The mutation in large1-1 resulted in a premature stop codon. The protein encoded by large1-1 (OML4large1-1) lacked RRM motifs (Figure 3B), which indicated that large1-1 is a loss-of-function allele.
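The filtering logic behind this step can be sketched as follows. In a pool of F2 plants that all show a recessive phenotype, the causal mutation is expected to be homozygous in every plant, so every read covering that site should carry the mutant allele (ratio of 1), whereas unlinked variants are expected near 0.5. The sketch below is an illustration only and not the pipeline used in the study: the `Variant` class, the `min_depth` cut-off, the locus names and all read counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    locus: str         # hypothetical label, not a real gene identifier
    kind: str          # "SNP" or "INDEL"
    in_coding: bool    # whether the variant falls in a coding region
    alt_reads: int     # pooled-F2 reads carrying the mutant allele
    total_reads: int   # all pooled-F2 reads covering the site

    @property
    def ratio(self) -> float:
        """Fraction of pooled reads carrying the mutant allele."""
        return self.alt_reads / self.total_reads


def candidate_variants(variants: list[Variant], min_depth: int = 10) -> list[Variant]:
    """Keep coding-region variants where every pooled read is mutant (ratio == 1)."""
    return [
        v
        for v in variants
        if v.in_coding and v.total_reads >= min_depth and v.alt_reads == v.total_reads
    ]


# Entirely hypothetical read counts, for illustration only.
pool = [
    Variant("locus_A", "SNP", True, 14, 30),     # unlinked: ratio near 0.5
    Variant("locus_B", "INDEL", True, 28, 28),   # fully mutant in the pool
    Variant("locus_C", "SNP", False, 25, 25),    # fully mutant but non-coding
    Variant("locus_D", "SNP", True, 3, 3),       # too few reads to trust
]
print([v.locus for v in candidate_variants(pool)])  # prints: ['locus_B']
```

In practice a tolerance slightly below 1 is sometimes used to allow for sequencing errors; the strict equality here simply mirrors the "ratio of 1" criterion described in the text.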
Expression and subcellular localization of OML4
We investigated the expression of OML4 in developing panicles using quantitative RT-PCR analysis. OML4 expression was detected and varied during panicle development (Figure 3H). We further generated OML4 promoter:GUS transgenic plants (proOML4:GUS) and examined the expression patterns of OML4 in developing panicles. During panicle development, GUS activity was detected in panicles about 1 cm in length. The strongest GUS activity was observed in panicles about 5 cm in length. GUS activity then gradually decreased during panicle development (Figure 3I). Similarly, quantitative RT-PCR analysis indicated that expression of OML4 was relatively high in panicles about 5 cm in length (Figure 3H).
To investigate the subcellular localization of OML4 in rice, we generated gLARGE1-GFP transgenic plants. The gLARGE1-GFP construct rescued the phenotypes of the large1-1 mutant (Figure 3J and 3K), indicating that the LARGE1-GFP fusion protein is functional. GFP signal in gLARGE1-GFP;large1-1 roots was predominantly detected in nuclei (Figure 3L-3O). Thus, this finding indicated that OML4 is localized in nuclei in rice.
Overexpression of OML4 results in short grains due to short cells in spikelet hulls
To further reveal functions of OML4 in grain growth, we constructed the proActin:OML4 construct, transformed it into ZHJ and generated fourteen transgenic lines. The proActin:OML4 transgenic plants had short grains compared with ZHJ (Figure 4A-4C), while the width of proActin:OML4 grains was similar to that of ZHJ (Figure 4D). The grains were also significantly lighter than ZHJ grains (Figure 4E). Grain length of proActin:OML4 transgenic lines was associated with the expression levels of OML4 (Figure 4F). These data reveal that OML4 functions to restrict grain growth in rice.
Mature proActin:OML4 transgenic plants were shorter than ZHJ (Figures 4G and 4H). The average length of proActin:OML4 panicles was significantly decreased compared with that of ZHJ panicles (Figure 4I and 4J). The primary panicle branches of proActin:OML4 were comparable to those of ZHJ, while the secondary panicle branches of proActin:OML4 were obviously increased in comparison to those of ZHJ (Figure 4K and 4L), resulting in the increased grain number per panicle (Figure 4M).
As proActin:OML4 transgenic lines produced short grains, we tested whether overexpression of OML4 could decrease cell length in spikelet hulls. We examined the size of outer epidermal cells in wild-type and proActin:OML4 spikelet hulls (Figure 4N and 4O). Outer epidermal cells in proActin:OML4 spikelet hulls were shorter than those of ZHJ spikelet hulls (Figure 4P and 4Q). By contrast, the number of epidermal cells in the longitudinal and transverse direction in proActin:OML4 spikelet hulls was similar to that in ZHJ spikelet hulls (Figure 4R and 4S). These results further revealed that OML4 affects grain growth by limiting cell expansion in spikelet hulls.
OML4 interacts with GSK2
To further understand the molecular role of OML4 in grain growth control, we identified its interacting partners through a yeast two-hybrid (Y2H) assay. The OML4 full-length protein was used as the bait. Among several interacting proteins, six different clones corresponding to GSK2 were found in this screen. GSK2 has been reported to restrict grain growth in rice, suggesting that GSK2 is a candidate OML4-interacting partner. We further confirmed the interaction of OML4 with the full length GSK2 in yeast cells (Figure 5A).
We next verified the interaction between OML4 and GSK2 in plant cells using the firefly luciferase (LUC) complementation imaging assay (Figure 5B). OML4-nLUC and GSK2-cLUC were transformed and co-expressed in N. benthamiana leaves. LUC activity was detected when we co-expressed OML4-nLUC and GSK2-cLUC, while no signal was observed for either control combination, OML4-nLUC/cLUC or nLUC/GSK2-cLUC. We then performed a bimolecular fluorescence complementation (BiFC) assay to test the interaction between OML4 and GSK2 in plant cells (Figure 5C). OML4 was fused with the C-terminus of the yellow fluorescent protein (OML4-cYFP), and GSK2 was fused with the N-terminus of the yellow fluorescent protein (GSK2-nYFP). Confocal laser scanning microscopy showed strong YFP fluorescence in nuclei when we co-expressed OML4-cYFP and GSK2-nYFP in N. benthamiana leaves. These results indicate that OML4 associates with GSK2 in plant cells.
To investigate whether OML4 could directly interact with GSK2, we performed an in vitro pull-down assay (Figure 5D). We expressed maltose binding protein (MBP)-fused OML4 (OML4-MBP) and GST tag-fused GSK2 (GSK2-GST) proteins in E. coli cells. As shown in Figure 5D, OML4-MBP physically interacted with GSK2-GST but not the negative control (GST) in vitro. Co-immunoprecipitation (Co-IP) analyses were used to examine the association of GSK2 and OML4 in N. benthamiana. We co-expressed GSK2-GFP and OML4-MYC in N. benthamiana leaves (Figure 5E). Total proteins were isolated and incubated with MYC beads to immunoprecipitate OML4-MYC. Anti-MYC and anti-GFP antibodies were used to detect the immunoprecipitated proteins. GSK2-GFP proteins were detected in the immunoprecipitated OML4-MYC complexes (Figure 5E), indicating that GSK2 associated with OML4 in vivo. These results reveal that OML4 can directly interact with GSK2 in vitro and in vivo.
GSK2 phosphorylates OML4 and modulates its protein level
As GSK2 possesses kinase activity and interacts with OML4, we examined whether GSK2 could phosphorylate OML4. To test this, we performed an in vitro kinase assay. GST-fused GSK2 (GSK2-GST) proteins were incubated with OML4-Flag, the N-terminal region of OML4-fused Flag (nOML4-Flag), and the C-terminal region of OML4-fused Flag (cOML4-Flag) in an in vitro kinase assay buffer, respectively. The phosphorylated OML4-Flag, nOML4-Flag and cOML4-Flag were detected in the presence of GSK2-GST, while the phosphorylated OML4-Flag, nOML4-Flag and cOML4-Flag were not found in the absence of GSK2-GST (Figure 6A). These results show that GSK2 can phosphorylate OML4 in vitro.
To further verify that GSK2 can phosphorylate OML4, we investigated phosphorylation sites of OML4. To identify the phosphorylation sites in OML4, the recombinant OML4 was incubated with the recombinant GSK2 in an in vitro kinase assay buffer, separated by SDS-PAGE, and then subjected to LC-MS/MS analysis for phosphopeptides. We identified 18 phosphopeptides of OML4, which correspond to 14 phosphosites (Figure 6B). Among the 14 phosphorylation sites of OML4, we observed that S105, S146 and S607 are Ser/Thr, Ser and Ser in its closest homologs in different plant species, respectively, suggesting that these three amino acids are possible conserved phosphorylation sites. We then mutated two of these amino acids, S105 and S607, into phospho-dead alanine (OML4S105A S607A) and detected the phosphorylation level of the mutant protein by GSK2. Mutations of the two aforementioned Ser residues to Ala reduced the phosphorylation level of OML4, although OML4S105A S607A was still phosphorylated by GSK2 (Figure 6C and 6D), indicating that S105 and S607 partially contribute to its phosphorylation by GSK2. This result further supports that GSK2 can phosphorylate OML4 in vitro.
Considering that GSK2 can interact with and phosphorylate OML4 in vitro, we asked if the protein level of OML4 could be affected by GSK2. As shown in Figure 6E, we found that the level of OML4-MYC was increased when GSK2-GFP was coexpressed in leaves of N. benthamiana. Considering that the phosphorylation level of OML4S105A S607A was lower than that of OML4 in vitro, we asked whether mutations in S105 and S607 could influence the protein level of OML4. As shown in Figure 6F, the level of OML4S105A S607A was clearly lower than that of OML4 when we transiently overexpressed GSK2-GFP with OML4-MYC or OML4S105A S607A-MYC in leaves of N. benthamiana. These results indicate that GSK2 affects the level of OML4 possibly by influencing its phosphorylation.
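The substitution notation used for the phospho-dead variant (S105A, S607A) can be made concrete with a minimal sketch that applies such substitutions to a protein sequence and checks that the expected residue is present at each position. The function name `apply_substitutions` is hypothetical, and the sequence below is a toy placeholder built only so that serines sit at positions 105 and 607; it is not the OML4 sequence.

```python
import re


def apply_substitutions(protein: str, substitutions: list[str]) -> str:
    """Apply substitutions written like 'S105A' (1-based positions) to a protein."""
    residues = list(protein)
    for sub in substitutions:
        match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", sub)
        if match is None:
            raise ValueError(f"cannot parse substitution: {sub}")
        old, pos, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"position {pos} is outside the sequence")
        if residues[pos - 1] != old:
            raise ValueError(f"expected {old} at position {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = new
    return "".join(residues)


# Toy placeholder sequence (NOT OML4): glycines with serines at positions 105 and 607.
toy = list("G" * 650)
toy[104] = "S"
toy[606] = "S"
toy_protein = "".join(toy)

mutant = apply_substitutions(toy_protein, ["S105A", "S607A"])
print(mutant[104], mutant[606])  # prints: A A
```

Checking the original residue before replacing it guards against off-by-one numbering errors when the same notation is applied to a real sequence.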
GSK2 acts genetically with OML4 to regulate grain size
Although GSK2 has been described to affect grain size, its function in grain size control has not been characterized in detail. To investigate the role of GSK2 in grain size control, we downregulated the expression of GSK2 using an RNA interference (RNAi) approach (GSK2-RNAi), as described previously (Tong, et al. 2012). GSK2-RNAi lines showed longer and slightly wider grains than ZHJ (Figure 7A-7E), indicating that GSK2 predominantly regulates grain length in rice. The grain weight of GSK2-RNAi transgenic lines was also significantly increased in comparison to that of ZHJ (Figure 7F). We then observed epidermal cells in ZHJ and GSK2-RNAi spikelet hulls. GSK2-RNAi spikelet hulls contained longer and slightly wider epidermal cells than ZHJ spikelet hulls (Figure 7G-7J). These results demonstrate that GSK2 controls grain growth by limiting cell elongation in spikelet hulls.
GSK2-RNAi produced long grains, like those observed in the large1-1 mutant, and both GSK2 and OML4 restrict cell elongation in spikelet hulls (Figure 2 and Figure 7). In addition, GSK2 can phosphorylate OML4 in vitro. We therefore speculated that GSK2 and OML4 could function in a common pathway to regulate grain length in rice. To test this, we crossed large1-1 with GSK2-RNAi and isolated large1-1;GSK2-RNAi plants (Figure 7K). As shown in Figure 7L, the length of large1-1 grains was increased by 16.24% in comparison to that of ZHJ, while the length of large1-1;GSK2-RNAi grains was increased by 7.90% compared with GSK2-RNAi. Thus, the effect of large1-1 on grain length was weaker in the GSK2-RNAi background than in the ZHJ background. The results suggest that GSK2 acts, at least in part, in a common genetic pathway with OML4 to control grain length.
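The percentage comparisons in this section are relative increases over a reference genotype. A minimal sketch of that arithmetic is given below, applied to the mean grain lengths reported in this section for gsk2-cri (7.99) and ZHJ (7.20). The function name `percent_increase` is an illustrative assumption, and the calculation ignores the reported standard deviations.

```python
def percent_increase(reference, value):
    """Return the increase of `value` over `reference`, in percent."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (value - reference) / reference * 100.0


# Mean grain lengths reported in this section.
zhj_length = 7.20
gsk2_cri_length = 7.99

increase = percent_increase(zhj_length, gsk2_cri_length)
print(f"gsk2-cri vs ZHJ: {increase:.2f}% longer")  # 10.97% longer
```

The same function applies to the genetic interaction test: the effect of large1-1 is computed once against ZHJ and once against GSK2-RNAi, and the two percentages are then compared.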
In addition, we used the CRISPR constructs described herein to introduce at least one mutation into GSK2. In these CRISPR lines, the grain length of gsk2-cri (7.99±0.30) was increased compared with that of ZHJ (7.20±0.17).
DISCUSSION
Grain size and weight are critical determinants of grain yield, but our understanding of the genetic and molecular mechanisms of grain size control in rice is still limited. In this study, we identify OML4 as a novel regulator of grain size and weight. GSK2 interacts with and phosphorylates OML4. GSK2 and OML4 function, at least in part, in a common pathway to control grain length in rice. These findings reveal an important genetic and molecular mechanism of the GSK2-OML4 regulatory module in grain size control.
The large1-1 mutant produced long, wide and heavy grains in comparison to the wild type. By contrast, overexpression of LARGE1 caused short and light grains. Thus, LARGE1 is a negative regulator of grain size and weight. Cellular analyses support that LARGE1 controls grain size by restricting cell expansion. Consistent with this, expression of several genes (e.g. SPL13, GS2, GS5 and GL7) (Li, et al. 2011; Che, et al. 2015; Duan, et al. 2015; Hu, et al. 2015; Zhou, et al. 2015; Si, et al. 2016), which control grain size by regulating cell expansion, was altered in large1-1 (Figure 8).
LARGE1 encodes the Mei2-like protein OML4 in rice. Plants have many Mei2-like proteins, which share conserved RRMs but appear to have taken on distinct functions in plant development (Jeffares, et al. 2004). The Arabidopsis-Mei2-Like (AML) genes comprise a five-member gene family that plays a role in meiosis and vegetative growth (Kaur, et al. 2006). In maize, TERMINAL EAR 1 (TE1), encoding a Mei2-like protein, plays a role in regulating leaf initiation (Veit, et al. 1998). In rice, PLASTOCHRON2 (PLA2)/LEAFY HEAD2 (LHD2) encodes a Mei2-like protein (OML1) (Kawakatsu, et al. 2006). The pla2 mutant exhibited precocious maturation of leaves, a shortened plastochron, and ectopic shoot formation during the reproductive phase (Kawakatsu, et al. 2006). However, the function of Mei2-like proteins in seed/grain size control has not been reported in plants. In this study, we identify OML4 as a negative regulator of grain size in rice.
We further identified OML4-interacting proteins. Interestingly, one of them is GSK2, a homologue of the Arabidopsis BIN2 (BRASSINOSTEROID INSENSITIVE2) kinase, which has been reported to influence grain size and multiple growth processes in rice (Tong, et al. 2012). Previous studies showed that GSK2 interacts with several grain size regulators. However, the effect of GSK2 on cell proliferation and/or cell expansion in spikelet hulls has not been characterized in detail. In this study, we found that downregulation of GSK2 produced large grains as a result of large cells in spikelet hulls (Figure 7D and 7I). These results indicate that GSK2 restricts cell expansion rather than cell proliferation in spikelet hulls. Consistent with this, it has been proposed that GSK2 regulates grain size by interacting with GS2, which predominantly promotes cell expansion in spikelet hulls (Che, et al. 2015). GSK5, a homolog of GSK2, has been reported to control grain size by restricting cell expansion in spikelet hulls (Hu, et al. 2018). Considering that GSK2 is a functional protein kinase, we presumed that GSK2 could phosphorylate OML4. Consistent with this idea, we found that GSK2 can interact with and phosphorylate OML4. We further observed that GSK2 influences the level of OML4 (Figure 6E). It is possible that GSK2 phosphorylates OML4 and thereby prevents its degradation. Supporting this, we observed that mutations in S105 and S607 partially influence the abundance of OML4 (Figure 6F). In addition, our genetic analyses suggest that GSK2 and OML4 function, at least in part, in a common pathway to control grain length in rice. Therefore, our findings reveal an important genetic and molecular mechanism of grain size control involving the GSK2-OML4 regulatory module in rice, suggesting that this module is a promising target for grain size improvement in crops.
Materials and methods
Plant materials and growth conditions
γ-rays were used to irradiate grains of the wild type Zhonghuajing (ZHJ), and the large1-1 mutant was isolated from the M2 population. Rice plants were grown in the field according to a previous report (Huang, et al. 2017). Rice plants were cultivated in Lingshui from December 2016 to April 2017 and from December 2017 to April 2018, and at the Zhejiang Academy of Agricultural Sciences (Hangzhou) from July 2017 to November 2017 and from July 2018 to November 2018.
Phenotypic evaluation and cellular analysis
ZHJ and large1-1 plants grown in paddy fields were photographed after grain filling was complete. A MICROTEK ScanMaker i560 (MICROTEK, Shanghai, China) was used to scan mature seeds. We used the WSEEN Rice Test System (WSeen, Zhejiang, China) to measure grain length and width. We also measured the 1000-grain weight with three replicates (Huang, et al. 2017). We used a scanning electron microscope (SEM) to observe cell size and cell number. SEM observation was performed as described previously (Duan, et al. 2015). ImageJ software was used to measure cell length and width.
RNA extraction and real-time RT-PCR analysis
Total RNA from seedlings or young panicles was extracted using an RNAprep Pure Plant Kit (Tiangen, Beijing). cDNA was synthesized as described previously (Duan, et al. 2015). Real-time RT-PCR was conducted on an ABI7500 real-time PCR system using a SYBR Green Mix Kit (Bio-Rad, Hercules, CA). The rice Actin1 gene was used as an internal control.
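The text above does not state how relative expression was calculated from the real-time RT-PCR data. One commonly used approach with a single internal control gene is the 2^-ΔΔCt method, sketched below purely as an assumption; the function name and all Ct values are hypothetical and are not taken from this work.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Relative expression by the 2^-ddCt method (assumed, not from the source).

    Each target Ct is first normalized to the internal control gene
    (here Actin1), then the sample is compared with the control genotype.
    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: target gene vs Actin1, mutant vs wild type.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_reference_sample=18.0,    # mutant
    ct_target_control=25.0, ct_reference_control=18.0)  # wild type
print(fold_change)  # 2.0
```

With these hypothetical values the target gene reaches threshold one cycle earlier in the mutant after normalization, which corresponds to a two-fold higher relative expression.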
Identification of the LARGE1 gene
We crossed large1-1 with the wild type ZHJ to produce an F2 population, which was used to clone the LARGE1 gene. The whole genomes of wild-type ZHJ and of a mixed pool of 50 individual plants with the mutant phenotype were resequenced using a NextSeq 500 (Illumina, USA). MutMap was used to isolate the LARGE1 gene as described previously (Abe, et al. 2012), and the SNP/INDEL-ratio was analysed as described previously (Fang, et al. 2016).
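In MutMap-style analyses, each variant site in the bulked mutant pool is scored by the fraction of reads carrying the mutant (non-reference) allele, and sites with a ratio close to 1 are candidates for linkage to the causal mutation. The sketch below illustrates that ratio under stated assumptions: the function names, the depth and ratio thresholds, and the example sites are all hypothetical and are not the parameters used in this work.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return alt_reads / total_reads


def candidate_sites(sites, min_depth=10, min_index=0.9):
    """Keep sites with enough coverage and a high mutant-allele ratio.

    `min_depth` and `min_index` are illustrative thresholds only.
    """
    return [
        site for site in sites
        if site["depth"] >= min_depth
        and snp_index(site["alt"], site["depth"]) >= min_index
    ]


# Hypothetical sites from a bulked mutant pool.
pool_sites = [
    {"pos": 1001, "alt": 29, "depth": 30},  # ratio about 0.97: kept
    {"pos": 2002, "alt": 14, "depth": 30},  # ratio about 0.47: unlinked
    {"pos": 3003, "alt": 5, "depth": 5},    # ratio 1.0 but low depth
]
print([site["pos"] for site in candidate_sites(pool_sites)])  # [1001]
```

The depth filter is included because a ratio of 1.0 from only a few reads is weak evidence; unlinked sites in an F2 bulk are expected to show ratios near 0.5.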
Constructs and plant transformation
The genomic sequence of OML4, which contained a 2049-bp 5’ flanking region, the whole gene region and a 1259-bp 3’ flanking region, was amplified using the primers gOML4-99-F and gOML4-99-R. We used the GBclonart Seamless Clone Kit to fuse the OML4 genomic sequence to the pMDC99 vector and generated the gOML4 recombinant construct. The other recombinant vectors were constructed using the same kit and similar methods. The related vectors used in this study were pIPKB003 (containing the ACTIN promoter and fused with the CDS of the OML4 gene), pMDC107 (for constructing the gOML4-GFP plasmid), and pMDC164 (for constructing the proOML4:GUS vector). The plasmids gOML4, proACTIN:OML4, gOML4-GFP and proOML4:GUS were each introduced into the Agrobacterium strain GV3101. The gOML4 and gOML4-GFP plasmids were transferred into large1-1, and the other plasmids were transferred into the wild type according to a previous report (Hiei, et al. 1994).
GUS staining and subcellular localization of OML4
GUS staining of panicles at different developmental stages was performed as described previously (Fang, et al. 2016). The GFP fluorescence of gOML4-GFP transgenic seedlings was observed using a Zeiss LSM 710 confocal microscope. 4’,6-Diamidino-2-phenylindole (DAPI) (1 μg/mL) was used to stain cell nuclei.
Yeast two-hybrid assays
The cDNA sequences of GSK2 and OML4 were amplified using gene-specific primers (Table S4), and the products were fused into the linearized pGADT7 and pGBKT7 vectors, respectively. Yeast two-hybrid analysis was conducted according to the manufacturer’s instructions (Clontech, USA).
BiFC assay
Full-length cDNA fragments of OML4 and GSK2 were recombined into the pGBW414-cYFP and pGBW414-nYFP vectors. The constructs were transformed into Nicotiana benthamiana mesophyll cells using acetosyringone (AS) for transient expression. Confocal imaging analysis was performed using a Zeiss LSM 710 confocal microscope.
Pull down assay
Recombinant proteins (OML4-MBP and MBP) and the prey proteins (GSK2-GST and GST) were incubated in TGH buffer (50 mM HEPES, pH 7.5, 10% glycerol, 150 mM NaCl, Triton X-100, 1.5 mM MgCl2, 1 mM EGTA, and protease inhibitor cocktail tablet) for 0.5 h at 4 °C with 20 μL MBP beads per tube. The tubes were centrifuged at 500 rpm for 2 min and the supernatant was discarded to stop the reaction. The beads were washed five times with ice-cold TGH buffer, and 50 μL SDS loading buffer was then added. The samples were denatured at 98 °C for 5 min and subjected to SDS-PAGE analysis. We used anti-MBP (Beyotime) and anti-GST (Beyotime) antibodies to detect the input and the pull-down samples, respectively.
Phosphorylation analysis
The coding sequences of OML4, nOML4 and cOML4 were amplified using the specific primers (OML4-FLAG-F/R, nOML4-FLAG-F/R and cOML4-FLAG-F/R) in Table S4. The products were cloned into the vector pETnT to construct the OML4-FLAG, nOML4-FLAG and cOML4-FLAG plasmids. The GSK2 coding sequence was amplified using the primers GSK2-GST-F/R and subcloned into the vector pGEX4T-1 to construct the GSK2-GST plasmid. All these plasmids were transformed into Escherichia coli (host strain BL21). Induction, isolation and purification of OML4-FLAG, nOML4-FLAG, cOML4-FLAG and GSK2-GST proteins were done as described previously (Xia, et al. 2013). 10 μL of GSK2-GST was incubated separately with 5 μL of OML4-FLAG, nOML4-FLAG or cOML4-FLAG in 20 μL reaction buffer (25 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 1 mM DTT, 50 mM ATP) for 2 hours. Phosphorylated products were analyzed by phos-tag SDS-PAGE. Anti-GST and anti-FLAG antibodies were used to detect the phosphorylated products and the input.
SEQUENCE LISTING
Rice
SEQ ID NO: 1 : OML4 amino acid sequence
MPSQVMDQRHHMSQYSHPTLAASSFSEELRLPTERQVGFWKQESLPHHMGSKSVASSPIEKP
QPIGTRMAGRLELLQPYKLRDQGAAFSLEHKLFGQERHANLPPSPWRPDQETGRQTDSSLKS
AALFSDGRINPNGAYNENGLFSSSVSDIFDKKLRLTSKNGLVGQSIEKVDLNHVDDEPFELTEEI
EAQIIGNLLPDDDDLLSGVVDEVGYPTNANNRDDADDDIFYTGGGMELETDENKKLQEFNGSA
NDGIGLLNGVLNGEHLYREQPSRTLFVRNINSNVEDSELKLLFEHFGDIRALYTACKHRGFVMIS
YYDIRSALNAKMELQNKALRRRKLDIHYSIPKDNPSEKDINQGTIVLFNVDLSLTNDDLHKIFGDY
GEIKEIRDTPQKGHHKIIEFYDVRAAEAALRALNRNDIAGKKIKLETSRLGAARRLSQHMSSELC
QEEFGVCKLGSPSTSSPPIASFGSTNLATITSTGHENGSIQGMHSGLQTSISQFRETSFPGLSST
IPQSLSTPIGISSGATHSNQAALGEISQSLGRMNGHMNYSFQGMSALHPHSLPEVHNGVNNGV
PYNLNSMAQVVNGTNSRTAEAVDNRHLHKVGSGNLNGHSFDRAEGALGFSRSGSSSVRGHQ
LMWNNSSNFHHHPNSPVLWPSPGSFVNNVPSRSPAQMHGVPRAPSSHMIDNVLPMHHLHVG
SAPAINPSLWDRRHGYAGELTEAPNFHPGSVGSMGFPGSPQLHSMELNNIYPQTGGNCMDPT
VSPAQIGGPSPQQRGSMFHGRNPMVPLPSFDSPGERMRSRRNDSNGNQSDNKKQYELDVD
RIVRGDDSRTTLMIKNIPNKYTSKMLLAAIDENHKGTYDFIYLPIDFKNKCNVGYAFINMTNPQHII
PFYQTFNGKKWEKFNSEKVASLAYARIQGKSALIAHFQNSSLMNEDKRCRPILFHSDGPNAGD
QEPFPMGTNIRARSGRSRASSGEESHQDISITSVNCDTSTNGVDTTGPAKD
(RRM domains are underlined)
SEQ ID NO: 2: OML4 nucleic acid sequence (genomic)
ATGCCAT CT CAGGTCATGG AT CAG AGGCAT CACAT GT CCCAGT ACAGCCACCCCACCTT G GCTGCAT CCT CCTT CT CGG AGG AGCTT CGT CT CCCCACAG AGGTACT CCATAATTGCG AT A ATTTT G GT CC AAAT CTT CCTT CT G G AAGTCTTTT CTATGTG ATG GCT AAT G GTG ATCTGTCT GG AAATTTT ATTT GTTT AGCCTTT CCTGGT G ACCTGGTT AT GATT CAT AT CT ACAAAT CTTT A CC AATT ATT CT C ACC AT GTTT AT AT ATT C ATT AT G ATG AAT AT CT AT AATTT GT ACT AATTTTT CTCTCACCATGTTCATCTCTTCTTCTATCTTTGCAGAGGCAAGTTGGATTTTGGAAGCAGGA GTCATTACCTCATCACATGGGTTAGTGCTGAGTTTGATTTAACTTATACTGGGTTTTGTTCTA CATTTGTCTATTAGTATGCCTTGCGGTTGCAGCTTTAAATTTTCACGCTGTTGGGGGCATGT ACTT AGTCG TTT CTTT ATG C ATG GAT AG C A A AACTTT G G G G AC ATCT ATT G G CT CTTTTTT CT GCATGAATTACAAACCATCTATAGGAGGGCTTTCTTTGAAAGGTTTACCTGGCCTTGACAG CCATCTAGCCTGCCTAAATTGAGTTAACACTAGGTGCTGGCCTTGCCACCTGATTAGTGCC TTGGT G AACATTGGTTTTAAGT ATTTT CCCCT CT ATTT AT GTT AG ATT AATTTGCAAT AAAT AA AT AAAT AAAT AAAT AAAC AT G CAT GTT CTT CTT AT AT ATGC AATTGGTTGTTGT GTTTTTT CTT GTT ATG GTT ACTTT CTTTGTT CT ATT GT ACT ACT CTTT G AGT CTTT GAT AAT GTG ATGGTTC A TAAATATGTGGGTTTCCCATGATATTTTCTCATAACTAGGTGGGTTTCCAATATTGACAGGA AG C AAGTCTGTTG CAT CTT C ACC AATT G AAAAACCT C AACCT ATT G GG AC AAG G ATGG CT G GT CG ACTAG AACTT CT ACAACCAT AT AAACT AAG AG ACCAGGG AGCTGCATTT AGCCTTG A GC AC AAG CT ATT CG GT CAAG AG AGG C AT G CT AACTT G CC ACC AT CT CCTTGG AG ACCT GAT CAAGAAACTGGCCGCCAAACTGATTCATCTTTGAAGTCGGCAGCTTTATTTTCTGATGGGA GGATTAATCCGAATGGTGCCTATAACGAGAATGGGCTTTTCTCAAGCTCTGTATCAGATATT TTTG ACAAG AAAT GTG AGTGGTTTTT CTTT AT CATTTGCATTTGCTT CAT CAAAATGCTT GAT T CT ATG AAAC AC AG ACT CG AG AAATTT CC ATT CC ATT GAT AGT AAAT GTGCT G AAAT AT ACC AT C AC ATG AC AT ATGT ATTGG C AACT AC AACGCTT CCTT ACG AT CTT AC ATT CT AT ACTT AAT GCTT CT CAT G AATG AAT AG AAAT GT ACAAAAGT AAAACAAAAAAT AC AACT G AAAT G AAAGG GT AGT AAAAT G AAAT G ACTTT C ATT CCCTT CCCCTTTTT CC AT AAG AAT CTTGCCT CCTTT AT CT CCT GTTT CTTT CTAGTGG CT AAAAG AAT CAAT CC ACTTT AGTTTGGT ATCGT AGT CCGTC TGTT ATT CTT GT AC ATT CTTTTGCC AAAAAAAAG TCTG C ACT 
CTGGTT C AACCTTT ATT CT AT T GT AAT AT GTT AT CT CCAATTT CCAAT CATTG ACCACT GT CTG ATTTT ATTT GT AACCT GTGC AGTG AG ATT AAC AT CC AAG AAT G GT CTT GTCGGT C AGT C AATT G AAAAGGTT G ACCT AAAC CATGTTGATGATGAGCCCTTTGAGTTGACCGAGGAAATTGAGGCCCAAATAATTGGAAATC TTCTTCCTGATGATGATGACCTGTTATCAGGTGTTGTTGATGAAGTTGGGTATCCAACCAAC GCTAACAACCGGGATGATGCTGATGATGATATATTCTACACTGGAGGCGGGATGGAACTC G AAACT GAT G AAAAT AAAAAACTGC AAG AATTT AATGGCAGTGCT AAT G ATGG AATTGGTTT GTTAAATGGTGTGTTGAATGGTGAACATCTATACCGGGAACAGCCTTCGAGAACTCTTTTT GTT CG AAAC ATT AAT AGT AAT GTT G AGG ACT CTG AATT G AAG CT CCT ATTT G AG GTT AGTT A CTT ATTT CTT CTT CTTT G AAT CACT CTT CT GTT ACAACAG ATTTG ACAT CT G AG AAGCCAT CT GTTCTTCTATGCAGCATTTCGGAGATATCCGTGCCCTTTATACTGCCTGTAAACATCGTGGT TTTGTGATGATATCTTACTATGATATAAGGTCAGCGCTGAATGCCAAGATGGAGCTTCAAAA CAAGGCACTGAGGCGTAGGAAACTTGACATACATTATTCCATTCCGAAGGTAACCATCAAA T CAT CAATTGCCACTT AACT G AAAATGCTT AT CTGCATTTT CT GTTGCCT GTT CTT GTGCTT A G AATGTT ATT ATT CT AG AT ATT CACT AAAATT G AGCACATTTGCTTTT CTTT CCCCACAGG AC AATCCTTCGGAGAAAGATATTAACCAGGGAACTATTGTACTTTTTAACGTTGACCTATCTTTA ACAAATGATG ATCTACAT AAG ATCTTTGGTGACTATGGTGAAATAAAGGAGGTACGAT ATTT C ATTT GCTGACTACT ATT AT AG CT AG AA AG TATGACTCACTAGTTCT ATTT G C AG ATT CG TG ACACTCCACAGAAGGGTCAT C AC A A A AT AAT AG A ATTTT ATG ATGTC AG AG C AG CTG AAG C TGCACTTCGTGCATTAAACAGGAATGATATTGCAGGCAAGAAAATCAAATTGGAGACCAGC CGTCTGGGTGCTGCTAGGCGGTAAGTCATTTGGGTCTTGTCAACAGTGATAATACTCTGTT TGCT GTTTT CTTTTT AGTT CTT ACT ACT ACTTT CTT CAT C ACTTTT AT AAC AT ACAT ATT CACC ATTTT AACATTTTTG ACAT ACT AGCT G AATGCCCAT ACATTGCAATGGG AATT AATT ATT AG A G AACC AC ACTGC AC ACT CT AAAGCCT C AAAAATT AAT AT AAAACT AT CCT C AAT GT AAAT CTT AG GGT CAT ATTTTTT GTCGT C ATTTT C ACCT CC AATTT GTTTT CCCTGTT AG ACGG CTT GAG GTTAGGAAAGGGACAAAAGTCCACCTACCTCACTGTTTGGGGGACTCACATAGCAGTGGT GGTGGGTGGTGGGTGGTGGCAGTGGTAGAGTATAGAGTATATATTTTGAATGCATAGTGTA T CTT CTTTT AT GTTTG AGTTT CTT AT CCACAT AAT GTT CATGCT G AGCT GTGCAGG AAT AGTT 
TAGTTGAATGCAGCATATTGAATAAACGAAAAAAATGTCAAACATGTTGGTAGAATGGCATT T CT CT G AGTATTTT AATTGT AGCT ATTGCTTT G ACTG ATTT C AATGCT CT CT AT C AC AG CTT G TCGCAGCATATGTCTTCAGAATTGTGTCAGGAAGAGTTTGGTGTATGCAAACTGGGGAGTC C AAGC AC AAGT AG CCCT CC AATTGCTT CGTTTGGTATGCT GTTTT CCTTTTT CAT CT C AAT G TATGTTTTGCTGATAGGTGCATTTTCTGACACGGATGGTTATATTGCAAGGTTCTACTAATTT GGCAACAATAACTTCAACTGGTCATGAAAATGGAAGTATCCAGGGTATGCATTCTGGACTT CAG AC AT CAAT AAGCC AGTT CAG AG AAACAT CTTTT CCAGGCCT AT CTT CT ACCAT ACCAC A AAGTTT GT CC ACT CC AATT G G AATTT CAT CCG GTGC AACT CAT AGT AACC AGG CTGCCCTT GGTGAGATCAGCCAATCTCTAGGTCGGATGAATGGGCATATGAACTATAGTTTTCAGGGCA T G AGTGCT CTT CAT CCT C ATT CT CTGCCTG AAGT CCAC AAT G G AGT G AAC AAT GGTGTCCC TT ACAACTTAAACAGCATGGCACAAGTT GT CAATGG AACC AACT CG AGG ACAGCT G AAGCT GTGGACAACAGACATCTCCATAAAGTGGGTTCCGGCAACCTCAATGGACATTCATTTGATC GTGCGG AAGG AGGT AATTT GT AT AT CCT AAT CT CCTTT GTTTG AAAAAT CT GTT AT GTT AAG AGGAACTGAACTATCCTAGGATATGTTGGTTCCATCATGGGTCATGCCATGATTTTGGTGG GATGAATTCCTCGTTTTCTATAATTACATGCTTTTGTGGGATGAGGTGGTGATCGACCAAAC ACATTT CGTTT CT CAAACCAATG AAAGTT GT GT AATGTTTGG ATG AAAG AAATTACAT CTGG AT CAAT CT AC AAGCCTT AT AT GTT AT CT AAT C ATT CCTT G AAT GTGT ATTTTTTTTTT C ACTT G CAGCTCTTGGATTTTCAAGAAGTGGAAGTTCTTCTGTCCGTGGTCACCAGTTAATGTGGAA TAATTCAAGTAACTTCCATCATCACCCAAATTCTCCTGTTCTATGGCCAAGCCCTGGATCAT TTGTAAACA ATGTTCC ATCTCG CTCCCCTG C AC AAATG C ATG G AGTTCC AAG AGC ACC ATC GTCGCACATGATTGACAATGTGCTTCCCATGCACCATCTCCATGTAGGATCGGCACCAGCG ATCAACCCATCACTTTGGGATAGGCGGCATGGCTATGCAGGGGAATTGACAGAAGCACCA AATTTCCATCCTGGTAGTGTGGGAAGCATGGGATTTCCTGGTAGTCCTCAGCTTCACTCGA TGG AGCTT AAT AAC AT AT ACCCT C AAACT G G AGG G AATTGC AT G G ACCC AACT GTGTCTCC TGCACAGATTGGTGGTCCATCTCCTCAGCAGAGAGGTTCGATGTTCCATGGAAGGAATCCT ATGGTTCCCCTTCCATCCTTTGATTCACCTGGTGAACGGATGAGGAGCCGAAGAAATGATT C AAATGGT AAT C AGT CT GAT AAT AAAAAG CAAT AT GAG CTTG AT GTTG ACCG C ATT GTTCGT GGTG AT G ACT CCCG G ACT ACG CTG AT GAT AAAG AAT AT CCC AAAC AAGT ATGTGT AAC AAC TGTT AATTT AG GTT C ATTTTTTTTT CTT 
G CCTTTGCCTT CTTTT CTGT CATTTT CAT GT ATTT CT AATTGACTTGGGATTCCAGGTACACCTCAAAGATGCTTCTAGCTGCTATTGATGAAAATCAT AAAGGG ACTT ATG ATTTT ATTT ACCT ACCAATTG ACTT CAAGGTG AT CT AG ATTT ATTT AGT A TGCAACTAATACATCATATTTGTTCAGATAGTCTTGCCTAATCGAATTACTGAATGGGATGT GTCCT ACTTTT CAG AAC AAGT G CAAT GTAG GCT ATGCTTT CAT CAAT ATG ACC AAT CCT CAG CATATCATTCCATTTTATCAGGTGAGAGATACTATCTATAGGGCCTGCCCAGCTGAGCTGG CTGCAACTGCATCACAGCCAGCTGCTGCCCGAAGCAGCAATGCCAGTGGCTTGCTCCTGC AGCCAGCT CAGCC AAG AG AAACC ATT AT CAAGTGCT AGT CGCAT G AAGGCAAT AGCTT ACG TT CTGCATGCGGCTT GT CAACTTTGG ACATT GT ACATT AT CCAATTTG AAAT AAAT CAAT ATT GTGCCCTCATCCCTTTTTTGCAGACGTTCAATGGCAAGAAGTGGGAAAAGTTTAACAGTGA G A A AG TG G C ATC ACTTG CTT ATG CT AG AAT CC AAG G G A AAT C AG CTCTT ATT G CTC ACTTCC AGAACTCCAGTTTGATGAATGAGGACAAGCGCTGCCGCCCCATACTATTCCATTCGGATGG TCCT AAT G C AG G AG ATC AG G T ATG AT CTTT CTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT CT CTCGTTG AT AAAT G G AGTT AAAG C AG C AG AT G ACACTTGG AC AC AGTTTGCT GTTTT AT G GCAAGTTCTTTTTT GTT AGCAGGCCTTTT CTGCT GT ATTT G AAT GT ATTTT AT CAC AAAT AG A CCTATATTTTGTGGTTGTTTCTGTTCTGCAGTTCCAAATTTCATGCCACATTGTGGGTTCCTT CTC ACTCT CTTTTTT CTTTT GCATGCCATGTCATGGTCT CTTT CCT AT AT ATT AC AG TTG C A A GCACCATT CCTT CT CATTT CTTTGGG AACT AG AAG AT AAT AGT AT CT GTT ACTT ATT ATTCT C TCCT AAT G GC ACTG AGTTT G CT CC AT AAT C ACT AGT C ATT CTT GTTTGGT CTTT CAG AACCT TTTATGTTAG CT CT G AAAG GTTT ATT GTT CC ATGC AG ATT G CT ATT CCTTT AACT AT AT GATT AAC ACCTTTT GT CCTTTT GTTGT CC ATT AG G AACC ATT CCCTATGG GT AC AAAC AT CCG AG C CAGGTCAGGGAGATCGCGAGCTTCCTCTGGCGAAGAAAGCCACCAGGACATCTCAATCAC CTCGGTTAATTGTGACACTTCTACCAATGGAGTTGATACTACAGGGCCTGCCAAGGACTGA GTAACACAACTGCTCTGGATCACTAACCCCCAAATCCCAAATCATAACTTTTGCGACGCGG TTTCCATTTCCCAGTTTTCCGCCCTTTTTCCCCCAACTTTGGTTTTTTTGGTATGACCCCCAA TCTGTATTTATTAACTTCCATGAATGCGGGTTACCGAAGACTTGGCTAGATTGCTGCAACAT TTTGTCCCTGATGGAAACATGGATAGAGAGACAGAGAGGGTGCTTCCAGTTTCCCCTGAAC CTACCATTATCATATTAACCTGAAGGCCGAGAAAGGTGAAAGGCGCAGCGAGAGCTTCCA G ATTTTGGT C ACTTTTT AAG AAT GT ATT 
AACCCC ATGTT GTAT AGC AGTTT CC AGT AACT GTG CTGAGGGGAGAGAGAGAAAGAGAGGAGAGCAAGGAGACAATTTACATGAGTTTTTAGTGG TGGTGTGGAGAGGAAGTCTTTCCCTGCATTTTCTTTTGGAACCTTTTCTGGCGTCTTCATCT ATGTTCCATTTTGAGTTGAGGTCTCCTCTTTTAAGTTGTGTGCAGAGGAGTTCCGATTTTGT CTT C AGGG AACTTT G ACCGTATCTAT CG ACCTT CAT ATGT AAAT C AAC AT CT CT AT AT AGTTT GTGTGCCCTCTGTTGTATGCCTGCGGCCCCTTGCACCAAACGAATTGTCTCTCTAACTCGT G AG ATTGCT GTCCT CGTTTGGT CGT ATT ACAT CT G AAT CT AAG C ATTTG AT GTT ACG C AAAT ACATGCCAATGGCTGCATTGCGACATGTAGCAGACGGCCAATGTTCAAACAAAAATCTTAA CTT AT G AAGTAT ACT AGT ACCT CC AT CCT AAAAT AT AAC AATTTGG G ACTG AT GTGTATATCC TAGTCCAATGAATCTGGCCCTTGTCTAAATTCATTGGACTGGATATGTCTCATCCACTTTCA AAT AG CT AT ATTTT G G G ACG AC ACTTT C A AAT AACT AT ATTTT G G AACG G AG G G AG T AAAT A ATT AT AAAT ACT AGT ACTAT CAATTTGGCACATGGT GT CAAGT CCACTTGGTGCATGGT CAT CT AG GCTT CCCTTT G GTG CCT CT CTT AAG AACCTT CT AAG CGTTT AAC AC AAATT AAAAT CG AAGT AAG AAT CT G ACACG AATT G AATT CG AAATTTGCT CT CACAATG AG AC AAAAACAAAAG AATTTGGCGAATACAGCAGTAACGCCGTGGACGAAGACAATAATAATAGTCTCGGACTCGG G AGTT GTT C AGT C AGT GTCCGTC A
SEQ ID NO: 3 OML4 promoter sequence cttactgtcagatggactactttgagaaaaaaagggggcaaaataactatatcaataaattaacctctgtcaaaacaggcaacaatta aaattaagagcagcttagaccattctttctaattttctagttataagatgcacattctacttcagttttcgttagcgcgtttttcaaactgctaaa cgatatgttccgtgcgaaaactttctatataagtagcttaaagatatcaaataaatccattattcaattttgtaataatcaaaaactcaatta atcatacgctaatgactttatattcccttactcaatcttcatctatcttaaattgggccatgtctctttttttaattaagatgcagattttacttcggtt ttcattagcacgatttttaaatcgctaaatggtgtgtttcatacggaggatctacttttgaaaatttttaatgatttaaactcaattatttatatatta atagctctctcattctgcgtgcccttacttaatcctcatcctcattacttacaaacactgcataacggagtaatagtattattattaatgttatgtt aatcctgatcctaatccctaatccaaagagaaccatctaaatacccggcgcaagcaaccccctctgctctgtcgtaaccaaaaatttcc ctctcccctgcgaactcccaccacccaaatttaactcccccaacctcccgcccgtcgcgccagctgacccgtcactgacagggtggg ccccacgccccggcgcggtgggtcccacgcgtcagcgaccgtgggtagggtgggcgcgggtgcgccccccccccacccggtccc gtgctccgcggtggcggtcaccgggtgcggggggtgggccgcgtatataggcgggccgccgcgccgcgcgctGCTGGCTAG
GTGTAGGAGCTTCAGCTTTGGCCCACATCGCCCCCCTCTCGCCCTCTTCCTTCGCTTTCGT
CTCACCGCCCCCACCGCCTCGCCTGGGGGAGGGGAGGGGAGGGGAGCCCTTCGCCGGA
GCGGCCAGGTTCCGGCGAGCATCTAGAGGAGGAGGAGGGGGAGGGCGGGGAATGGGGA
GCGGCGGCCGGAAGAGGGGGCACGTCGTCGCTGCTGCTGCTGCTGTTGTTGTTGCGTCC
CTCTAGGGTTAGGTAGGGGCGTTGCTGGAGTAGCTTTCTCCCACCCCCAATTTTTTTTGTT
CGTTCT CTTT CG CTCTCG AG GTCTCTCT CT CT CT CT CT CT CT CCCC ACCT CCGCCCCGCCG
CGTCGGGGGGTTGGTCCTCCTTGCCGGCGGCGTTCGTCGTCGTCGTCGTCGCATTGAGG
GGGGAGAGGTGATCCGGCCGTAGTCCATTCCAGCTCGGGGAAGGGGGGGGGCATGGGG
GCAGCTGGTCCGCGTGGTGGTGCCGCCGCTCTCGAATTCGTGCGGGGATTTTGGTTTTGA
AGAGGGAGGTGACCCGCACGCGCCGATCTGGTGAGGCCTTGCTCGTTTTGTGCTGTTTTT
TGTGCCTAGCTTTGGTCGGAGGTGTTTGAATTGTTGGGGAATTTTGAGCTTTTGCTGTGAT
CTGAGCTTCAAATTTCGGTGGGGGTTAACTTGGCCTGGGCACCTCGGAATTTCTGTTTAAT TTTTGGTGGGGTTTCTTTGATCACAAGATACTTGCTTGCTTGGAGCTTTGGGAGCCCGAGG
CGCATTAAATTCCACATCTTCTGCGCTGTTTTATCGGGAAATTAAACATTTCGTGCTCAAGT
CTGTGGGGGGGTTTTTCCCTCGGATTTGTCAAATCTGGCGGCTCTTGTTCGAAAATTTTCA
TCTTGGGAGCTTACGAACGCAAAATTCTTCACATTTCTTTTGCTTCCTGGCTTGGAAGCTGT
GG AAT CCAAATTTTT AT GTGCT G AATTG ACATGGTT AGCCATGTTTTTTTT CCACAG AACC A
CATGATTTTAGCAAAATTTCGCCATTTCTACTTTGATCCGGTGGAATCTAGTTGCCAGATGT
GTCG ACTG GTACCTTGTCTAACTAG CTCC ATGG CTATG CGCTTG C AG G
SEQ ID NO: 4: GSK2 amino acid sequence
MDQPAPAPEPMLLDAQPPAAVACDKKQQEGEAPYAEGNDAVTGHIISTTIGGKNGEPKRTISY
MAERVVGTGSFGIVFQAKCLETGETVAIKKVLQDRRYKNRELQLMRAMDHPNVISLKHCFFSTT
SRDELFLNLVMEYVPETLYRVLKHYSNANHRMPLIYVKLYMYQLFRGLAYIHTVPGVCHRDVKP
QNVLVDPLTHQVKLCDFGSAKTLVPGEPNISYICSRYYRAPELIFGATEYTTSIDIWSAGCVLAEL
LLGQPLFPGESAVDQLVEIIKVLGTPTREEIRCMNPNYTEFRFPQIKAHPWHKVFHKRMPPEAID
LASRLLQYSPSLRCTALDACAHPFFDELREPNARLPNGRPFPPLFNFKHELANSSQELISRLIPE
HVRRQATHNFFNTGS
SEQ ID NO: 5: GSK2 nucleic acid sequence
ATGGACCAGCCGGCGCCGGCGCCGGAGCCGATGCTGCTCGACGCGCAGCCGCCCGCCG CCGTCGCCTGCGACAAGGTATGTGACTAACCGGATCTTGGCGTGCTGATCCGTGGTTTTG CGGTTCTTTGCTGTGTGCTGATTTAGTGTGCTGTTCTTGGTGGAGCAGAAGCAGCAGGAG GGGGAGGCGCCGTACGCGGAGGGGAATGACGCGGTGACCGGGCACATCATCTCCACCA CCATCGGGGGCAAGAACGGCGAGCCCAAGAGGGTGAGACACGAGCCTTCCCCCCCCCCC CTTTGTTGTTTTGGTCTTGGTTCCATTTCTTGAGTTGCAGTGAAATGCTGCCGGTTCTTGGT TTAGG AAG GTGTTCTTGTGTGTTCTG C AG CTAGTTTCTTAG CTCCGTGT AGT G ATTTTTGGT GATGGGAAAGCCATTGGCTCTAAGAGAGGCATGTGGATTAGTGGTCAGATTTTGCAAAAGA AGTAAACT GTTGGTAG ATAT C AGCC AATTT ATTT AGT GTT AGTT GTT CAT GTT CTT GT ATT AC TGCAAGATCTGTTGTAAATAACTAAATATGGCTTGTTTGGTGCTCATTTTTGGTGGTTTGTA GGGGAAAAAGTTGGGTGTGTTGGATTACATTGTTGTGAACACTAGTGCTCATAATTAAATTT TGGTCTTAAGATGGTAATTTTGTACTTGATTTTCAGACAATTAGCTACATGGCGGAGCGCGT TGTGGGCACTGGTTCTTTCGGTATCGTCTTTCAGGTGATTCATCTTTCAGAAAGTTGTTATT TGTTTCTTTCTTTTCGTGCTGTCGACTTGTTGGTCTGATGTTTAGCTTGCTGGTTTCATGTGT AGGCTAAATGCTTGGAGACAGGAGAGACTGTTGCCATTAAGAAGGTATTGCAGGACCGAC GGTACAAG AACCGTG AGCTT CAGCTT ATGCGCGCCATGG ACCACCCCAAT GT CAT CT CCC TGAAGCATTGCTTCTTCTCAACCACAAGTAGGGATGAGCTGTTCCTCAATCTTGTCATGGAA T AT GTT CCAG AAACACT CT ACCGTGTGCTT AAGCACT ACAGCAATGCCAACC ACCGG ATGC C ACTT AT CTACGT C AAG CTTT AC AT GTAT C AG GT GTGTGG ATTGCT AAT C AAT CAT AAATTTT GAAATGCCTGCCTTCCTGTGTGTCTCTTCTAAGTCTATTCTACATTGGCTGCAGTTATTTAG GGGGCTTGCGTACATTCATACTGTTCCAGGGGTCTGTCATAGGGATGTGAAGCCACAAAAT GTTTTGGT AGGT ATT CAT GAT CAG ATT ATT ATTTTGCT ATGCG ATGGCCTTT GATT ATTGGCT CT G AACT CCTTT CTT G C AAT AC AGGTGG AT CCT CT AACT CAT C AAGT C AAG CTCTGT G ACTT TGGGAGCGCAAAAACACTGGTATTGGCCTTTTCCACCCTAAAGTTTTGTAATACGCACACA TT ACTTT AG ACTTT CTTTTTTTT AATTGG ACTTT AG ACG ATT CTTGCT GT AG ACT AGT CAGTTT T G AAT CTT ACC ATTT GTT AAGTTGG AG CT AG CCCT GTGTT ACT G AAT CGTT C AAAG AACT CT T AT AT ACTTGGT G AAT CTT ACCCCTTTTTTT CTT CCTTTTT ATT ATGCTTG ATGG AAG TTT CAT GG AAATT CCTT AGTTTT AC ACCTTTTT CC ACCTT ATT CCAG AT GTTTGCT AC AATT GT ACTTT TG AT AATTTTG AT CTT ACT GTCCT AAT AT CC ATT AATTT ACT ATT CC AT C AG GTCCC AGGTG A 
ACCCAATATATCATATATATGCTCACGCTACTACCGAGCACCGGAGCTCATATTTGGTGCAA CT G AAT AT ACT AC AT C AAT AG AT ATATGGT CAG CTGGGTGTGTT CTT G CAG AG CT ACT CCTT GGTCAGGTTGGTTTCTTTTTTCTATGGTTGACAGATCTGCAAACTTTTGGTTTAGTTATTTAA GC AT G ATGT CAT C ACT GTTG CTGT G ATTTTG ATT AT CTT GT ATTT GTTTTTGCT AG CC ATT GT TT CCAGGGG AG AGTGCAGTCG AT CAGCTTGT AG AG AT AATT AAGGT ACTGCAAG ACATGCC AT G C AGTT CT AATTTT G CTCCT ACT ATT G AGT AT G GGC AT CTT CT CT AACCTT GTATG AT ATT CTTGCAGGTT CTTGGT ACACCAACCCGTG AGG AAAT ACGTTGCAT G AACCCG AACT AT ACA G AGTTT AGGTTT CCACAG AT AAAAGCT CACCCTTGGCACAAGGTAAGCAT ACAAT CTT AT CC ATGTTG AGT CAT AT AT C ACGT CAT CTTTT AT AGTTT CCTGG AC AACT AT G AAAAT GTAG CTG G G CT C ATTT CCA AT AAT AG ATT CTG G AC ACC AG AT AG CTTT ACAAT G C A AT GTAT AAAT AAG GAGGTGCATACAGGTACTGATTTTTCTAACTTCTGCGTAGGTTTTCCACAAGAGGATGCCT CCT G AAGCAATAG ACCT CGCTT CACGCCTT CTT CAAT ATT CACCG AGTCT CCGCTGCACTG CTGTG AGTAT ATT CTT G CTG C AATTTT AAGT AGC AG AAC AGTAG AAAAGTG ATTTTT C ACT A CTGCTCACAGCAGGGGTACTGTAAAACGCCCCTTTTCTTATTGCTGTTATGCAAGTTTGCCT ACTGTAGCTGGTCATATGAGCTGTTACTTTTCACCCTTTAAGAGTTGCACAAATTTGAGCGT AACCAAGG AATTTT CTT AAT C ACTTTGCCCTCC AAGT G CT CTTTG ATTT GTG CAACT CCTG A AATGG GGTGG AGT G G AG AAAC ACT CCTT GTTT CTTT CT CTTTTTT CTTTTTTCCT AAAGT AG A TTGAAGAATGCTAGTCTTCACTAACTTTGGTTTTAGTGGGGCATGGCCATTATGGTTATGAT CTTTAGTGGTCCATTACCAAATCAATGTTGGGGTGGATGAATGATAGTTGTCTCATGTTTAG T CGTT ATT CAGT GT AATTGCAAT AGCCAG ATG ACAACTT AAT ATT G ATTTTTTTT CCG AT GT G CTT ATT C ATTT G AAT AT CTTT ATGC AGCTT G ATGC AT GTG C AC AT CCTTT CTTT GAT G AGCT G CG AG AGCCG AATGC ACG CTT G CC AAACG G ACGTCC ATTT CC ACCACT ATT C AACTT C AAAC ACG AAGTAAGT G AAT CAG AT G AAACAT AAT CTGCTACACAACTT CAG ATCTTGGTAT CCAT G AGAAAATGTGTACTCTCCTTGGTGCTCATTGGTGCTGCCTTTTGGTCTCTACAGCTAGCAAA TTCTTCTCAAGAGCTCATCAGCAGGCTCATACCAGAACATGTTCGACGGCAAGCTACCCAC AACTT CTT CAAT ACTGG G AG CT AAAAAT G CT AAAT G C AC ACC ACC AG ACCTTTT GTTGG ATC GTTTTCGCGGAACCGGTGAAGTTCACATGAAGGCTGAGTCAGATGATTCTTCGAATCCCCG 
CAAAACAAGAAGAATAGAAAATATGATTCCTCAGATGATGATATGCAAATGCTTCGTTGGAA GTTCAATTCAATCATCGAAGAAGAACAACATTGTAAATCGAGAAGTTTTTGCATCGCGAGTT TGGTAGTGAAACCGGGATCAGCTGGTATGACGGAGGAAACCGAAATGTTTAGATCCATGA CT G AGTTTT CTTT C ATTTTTTTTGCCC AATT GT AAC AG AAG AAT AT AGTT CCCT AATGTAG GC GT AGTT GT AACCT GT AAACTGCC ACT GTTTTGTT C AC ATT CCAT G ATGT AAATGCC ACC AT G CCT CT GAT G AAT AACT CTCCTT GT AACCTT GTT CCTT CCAT CCTT G ACT GTTT ACCTT AAAGC CGTGGACAGTGTACACTGTACATGTACCGTGCTACACGGAAGGACATATTTGAATTTTTTTT CTCTCTCTCG AAAG ACTAC ATC AAG CATTG CT G G ATTTTTTTTT AAA AAA AT G G C AC A ACTTT CG ATGGT C AACC AT AAG C AATT AGT GT CGTTTT AAAACCCCTT ACT CCCAT ATGC AC AAT AC TT AT CTTT CATTTTT CT AG ATT GTTT AGCAT AAAAT AAG ATTT AAAAAG AAAG AAAACT AT ATT GCT ATTT AATT GTTGGGTGTAG AATGGG AAAACTTTTT AAAT G AAAG AT GATT ATTT CTT AAT GTAACAAGTAGTACTGTAGTGTGGATTGAATTGGGGCAAACTTTAAACTCTAAAACGAGAA CT ATTTTT G AAT AG AG CG AG C ATTT AAA AG AT G AATT AC AT ACC ACCT AT AG AG ATG AA A AA AAAGGCATGGGACAGTTAGGGCTTGGGCCCCATAGCTTGTTAGGTTTGTAAGATTAAAATT T AG AGTAAATTT C AC AAAACT AC AC AT ACT AT G ACC AAACT AT C ACAAAACT AT AT ATTT AAC T CG AT GT ATCAT AAAACT ACACATTT AAG ATG AAATGTT ACAAAACT ACATGTTT AGTT ACT A C ATT ATT AC AG AACT AT AG GTTT AG AACC AATTT AGTT AC AAAAC AAT AATGTTT ATT G CT CT AG CAT AAT AAT G GTG CT AG G G ATTT AAACT CT A AATT GTG AT AACTT CAAT ATT AAAT ATG C A GTTTT GTG AT ACTT AACTTT AAAT CT AT AGTTTT ATG AT AC AAC
SEQ ID NO: 6 GSK2 promoter sequence
CCGTACTGATTTCGGCAGCATCAAGGACTAGAGGAGGAGGAGGAGCAATAAACAAGTGCC GCCATGTCGCTTGCCCGGCTTTCAGGGGCGCTTTTGGAATTCTCGTTTGACCGCCTACAAA CCCAC AGCCTGTCCCTGCAACCAATTTGGCCCTGCGCCACGCC ACCCCCAAAGCT ATT AG T ACT AACCACACCCCCT CT CT CT CT CT CT ACACTTAGCAGT AGT ACT AAG ACCCT CTTTT AT AAAATTTTT AAGTCACT AT CACATT CGTT AGCAGT AGT ACT AAG ACCCT CTTTGT AAAACTTT T AAGT CCCT AT CACAT CG AAT GTTTGG ACACT AATT AT AAAT ATT AAACGTAG ACT ATT AAT A AAACCC AT CCAT AAT CTT AG ACT AATT CG CG AG ACG AAT CT ATT GAG CCT AATT AAGTC AT G ATT AGCCT ATGTG AT G ATACAGTAAACATT CT CT AATT ATGG ATT AATT AG ACTT AAAAAATT TGTCT CGC AAATT AGCTTT C ATTT AT AT AATT AGTTTT GT AAAT AGT CT AT ATTT AAT ACT CT A AATTAGTGTTTAAAACAGAGACTAAAGTTAAGTCCATGATCCAAACACCACCTAACATGGAC AATT AG GCTGT ACT AC AACCTTTT G CC AAGCT ACGTGT AC AGGTAAAT CG C AC AC AT GTTGT CATCTTTGGAGGCTG AAACATGGGG AAAT ATCATGTGAAAACCGTT AAAT AAGTGAAAACT CATG AAAATT AT ATTT AAAAGTT CT CT AAATTT AT AT AAAAATT AAT AG AG AT AAAT AT AG ACA T G AAAT ACATTT ACAT CAAAT CT CAAGTCG AAACT CAACTTTT ATTT GAG AG AAT AT AAAAG A CAAATTTT AGGTG AAT AGT GTT CT ATT ATTTTTCAT CCG AAATTTGGCATTTTTGTT ACT CCC AAAT AAAGTT G AGTTT AACTT GAT ATTT G GTG AAT AT ATTT C ATGCCT AC ACTT CT CT CC AGT AATTTTT CAT G AATTT ATT AAACTTTT AGTT CCG ATTTT CACGTGTTTT CACTT ATTTG ATGGT TT CC ATCG G AT ATGT CCCTG AAAT ATGGT ATT G AAGT AC AAC AT AGTT CAT ACTTGG GT CGT GTTTGAAGTCATGTAAGGGCGTTATAAGCTCATAGGTTTTGCTTACATACAATTGGTGGGAT AAAAAGGCACCGGTAATTT CTT CAAG ATTG AT AAAAT AAAT GT CT AGCGCT ATAAGGCC AT G GCACACATCAAATGTTGTTTAGAACAGGTTATTTCTAGCTCCATAAATTGTTGGATTTGAATT T GTG ATT AC ATT GAT AAT AATT GATT G AAT C AGTTTGTT CTT ATTTT AG AG AAAAT AAAAAAAA T AAACC ACT AT AATTT AACTT AC AAACT CC AAAACT AGTCTG G ATCTGT AATTT AG GTTGTGC T AAAC AAGG CCT AAAG AAAAG AAGT AAT GTTT G G AG AAC AT GTTTTT AAAT C AAT ATG G ACC AT CTT CT AAAATGG GC AC ACT GTG C AACCG AAATGGTT ATT AG C AACTT AAT AT CC AAT CCT T AAAAAAACT AC ATT G AAAAT AT CCT AAAT CCC AAAATT AAATTTT AAAAT CT AATTTTGGT AG 
CGATTGATTATTTGTAGGGGCAAATGATGAGGCCCTAAATCAACCATGTTTAGCTACTTCCT CGCTTT CTTT AAGTAT GTTT CAT ACGCT ACAAACT GAT ATTTTTTGCAAAT ACTTTTT ATT AAA AAAATT ATTTT AAGT CTGC AAAAG CT AAT GTTT AATT AGTCCT AC ACT AAT AAT CCT CCTT GT TTGGCTTGCCGCTGATAAGCTTAGTCAAAACCCTGATCCGAACTGCACGTAAGAACGGTCA AG AAACCATTT CGGTT ACAT CACAC AACACAGCCT CAT CT CT CATGCT GT CATGCTT GTGGT GCACCT AGCAATT CCT CCCT CCCCAT CT GT CTT CCT CCT CT AAT CT AAT CCACCT CCCCACT AATCC ACC AG CTGTGTAC ACTG C AG C AG C AGC AG C AG CT AACC ACT CT C ACT AAAAACT AT AGCAGCTGCAGTAACAGCAGCAGC AT CACCCACCTT CTT CTTGGT CAAAGCCAT CCAT CCC ACC ACT CACCC AT CCCT CCCAGT AT AAGCCAAACCAAT CCAT AG AGG AGG AAG AGG AGG A CCAGGTGGTGGCACACCTAAGCTTTGTGCAGTGCCATTCACGCACCTGCAGCTTCCAGCT TTGCCAC
Wheat
SEQ ID NO: 7: OML4 amino acid sequence
MEPYKLMDQKTPFGERKLLGHQRHVNLPPTPWRADQDPLQQHDSFSKPLALFPNARKGHLN
MTQYENGLFSSSLPDIFDNKLRLTPKNGLVGQPAEKEVNHADDEPFELTQEIEAQVIGNLLPDD
DDLLSGVLYNVGHPARANNMDDIDDDIFSTGGGMELEADENNKLLKLNGGANTGQTGFNGLLY
GENPSRTLSIRNINTNVEDTELKLLFEQYGDIRTLYTAYKHHGLVMISYYDIRSAERAMKALQSK
PFRQWKLEIHYSIPKENPLENDNNQGTLAVINLDQSVTNDDLRHIFGGYGEIKAIHGTSQNGHH
KYVDFFDTRAAEAALYALNMRDIAGKKIRLERCCAGDGKRLTTLHRPPELEQEEYGACKLGNA
NSLPSTYYGSVNMASMTSAGPEHGISRVLRPRVQPPIHQFREGAFLDVPSSTMQSISSPVRIAT
AVTHNNRSTVGENGHSLGKMGGQINGHLNYGFHGVGAFNPHSLPDFRNGQSNGISCNLGTIS
PIGVKSNSRTAEGMESRHLYKVGSANLGGHSSGHTEAPGFSRTGSCPLHGHQVAWNNSNNS
HHHTSSPMLWPNSGSFINNIPSRPPTQAHGISRTSRMLENVLPVNHHVGSAPAVNPSILDRRT
GYAGELMEAPSFHPGSAGSMGFSGSPHLHQLELTSMFPQSGGNQAMSPAHIGARSPQQRGH
MFHGRGHIGPPPSSFDSPGERARSRRNESCANQSDNKRQYELDIERIVCGEDSRTTLMIKNIP
NKYTSKMLLTAIDENHKGTYDFIYLPIDFFQNKCNVGYAFINMISPEHIVPFYKIFHGKRWEKFNS
EKVASLAYARIQGKSSLIAHFQNSSLMNEDKRCRPILFHSDGPNAGDQEPFPMGTHVRSRPGR
SRVLSCEESHRDTLSSSANNWTPSNGGGHASGYSKEADPTTA
SEQ ID NO: 8: OML4 nucleic acid sequence
ATGGACCCATACAAGTTGATGGACCAGAAAACTCCCTTTGGTGAGCACAAGTTGTTGGGCC ATCAAAGGCATGTTAACCTGCCGCCAACCCCCTGGAGGGCTGATCAAGATCCTCTACAACA ACAT GATT CGTTTT CG AAGCCGTTGGCTTT ATTT CCT AATGCT AG AAAAGG ACATTT AAAT AT G ACCC AAT AT G AG AATGG ACTTTT CT CAAGCT CCCTT CCAG ACATTTTT G ACAACAAAT GT A AG CCCTTG AT CCTT GTCT CTT G C AGTTTTT ATTT C ATTT ATT GT AGC ACTT CAT AAC ACTG AA CT ATG AACTGCGTCC AT CCG ATATG GT ACT CCT CCCTT C AGTT CAT AT AAAT AAT ACT CCCT CCGTCCCAAAATGTAAGACGCTTTTTGACACTATACTAATGTTAAAAAGCGTCTTATATTATG G G ACG G AG G G AG T AG T ATG C AG AT AACG G AAAG G G T AAAC A A A AG AAG AT AAG G AAAAT A TTTTT ATTTGCTT ATT AATAAAAAGCTT GTTTGCTTTT ATTG ACT GTTT CACTT CAGT G AAAT C TGAGCTTTTCTTGCTACATCCAAGTGAGAAACGAGACAAACTGGCCTGAGCTTTTCATGCT ACATCCAATTGAGAAATGAGAGTCTGTCCTGTGCTTTTCATGCTAAGTCCAAGTGAGAAAA GAG AC AAT CTGC AGT AAT ATT AGTG CTT AAT ACT AAACC ACTTTT AATTT G CTG AT GTGC AG TGAGACTAACACCTAAGAATGGCCTTGTTGGCCAGCCAGCTGAAAAGGAACTCAACCATGC AGATGACGAGCCTTTTGAATTAACTCAGGAAATTGAGGCACAAGTAATTGGCAATCTCCTC CCTG ATG AT G ACG ACTT ATT GT C AG GT GTT CTTT AT AAT GTGGGT C ACCCTGCCCGT G CTA ATAACATGGATGACATTGATGACGATATATTCTCTACTGGAGGTGGAATGGAATTGGAAGC TGATGAAAATAACAAATTGCTAAAACTTAATGGAGGTGCCAACACCGGTCAGACTGGGTTC AACGG CCT ACT GTAT G GCG AAAACCCCT CG AG AACCCTTT CC ATT AG AAAC ATT AAT ACC A AT GTTG AG G AT ACT G AATT G AAACT CCT ATTTG AG GT AAGTT CCAT CTT CC AGCTT G ACTTT CTCCCAACTCTGAAGG C AAT AT ATTT C ACCTG AT AG C ATTT ATTTT CTTT GTAG C A AT ATG G A G AC AT CCG AAC ACTTT AC ACTGCCT AC AAAC AT C ATGGTTT AGT G ATG AT AT CTT ACT ATG A TATAAGATCGGCAGAACGTGCCATGAAAGCGCTTCAAAGCAAGCCATTCAGGCAGTGGAA ACTTGAGATACATTACTCCATCCCAACGGTATTTCCTTGATATAATGCCATTCTGACTTGATA
TGATGTGGTGCTTTGACATTACTTAATGTGATATTACTACGATGTTTGCTTGCCATTATTTGT
TGCATTGGTACTTAATTGGCACTGGAAATGTATTTATACTTGCAAGAATGTTCACATTCTAAT
GCTGACTTTGTTCCAATAGGAGAACCCTTTGGAGAATGACAATAACCAGGGCACACTTGCA
GTGATTAACCTAGACCAGTCTGTAACTAATGATGATCTTCGTCATATATTTGGTGGCTATGG
TGAAATCAAGGCGGTATGGCCTGCGCACTAACCAACTCTTATGTCAGCTAGTACACTACAG
ATACTAACTTCCTTGTTTATCAGATTCACGGGACATCACAAAATGGCCATCACAAATACGTT
GAGTTTTTTGATACCAGAGCAGCAGAAGCTGCACTTTATGCTTTGAACATGAGAGATATTGC
AGGAAAGAAAATCAGATTAGAGCGCTGCTGCCTGGGCGACGGTAAACGGTATTACTGTGA
CCATAATTTTGCGCATCTGTCCATTTTTAGTGCTTCTAGTGCCTTCGCTTTTCGAAGAGTCT
TATTACACACCTTGTGTCGGCTACCCTTCATGAAGTTCTTTTTTCTCCAACACGCTTTAGAAT
GCTGTTATTTTTTATTAGAGGAAGGTACTGAAATGGCACAAGGTATGACCAATGGAAGCCA
AAAATGAACAGAAAAACTAAAAAACCAGCAAACAAAAAACCAAAAAGGCTAAGAAGACTAAC
AAAATCTAACCAAAACTAGCATAATGACCTATATATACTATACCAATTAGGAAGCAAGAGAC
CTGAAGCCCCAGTAAGCAGCAGAATATGGCGGTCCAGTCGGTAGCAGCAACCTTCGCAGA
TCAACTTTGTATAAAGTTATCTGGGTATCTGCGAATTGAGGAAATATATAATTCAACGGTGT
TTATGGTCTTGCATATACTGTGAAGTTGGTAAACATATCGTCAATGGACATATACAGTATAC
ACGGGGCTGATGCAATTCCTGTCTCTTCAATAAAATATGTTTTAGTTATTAAACACGCAAAC
TTGTGATTGACGTTTAATATGATTTTTTAGAGTCGTCATTTGCACTTGAATTCAAAGTTGGTT
GTATACTTGTATATTTCTTGTTTTAGGGAAGTGTGCTTTGGAGTTTGGAGGAAATTGGTAGG
TGTAAAAAAATCTTTTCATATGATGTGCAGGAAGATGTGTTTTAGAACTTGATGCAGAACGT
CCCCCCTATGGATTATTATGCTTGTCTAAACTTTATTTTTGGAGGAAGAAACAAGAGCATCT
GACTTTCCTTATGCCTATTCTTACAACTGTATTAGTAATGCTAGTTTTTGCACAACAGTTTGA
CGCGGCACAGGCCTCCTGAGTTGGAGCACGAAGAGTATGGTGCATGCAAGCTAGGAAATG
CAAACAGTCTGCCGTCAACTTACTACGGTATGCAGTTTGATTTCAAATCACGAGACATGTTT
CTGCTGCTAATCGCATTTACTAACCTATGTATGGCATAATACAAGGTTCTGTCAACATGGCT
TCCATGACTTCCGCTGGTCCTGAACATGGGATCTCTCGGGTTCTGCGTCCCAGAGTTCAG
CCACCAATACACCAATTCAGGGAGGGAGCTTTCCTGGATGTTCCCTCAAATACTATGCAAA
GTATATCCTCTCCTGTTAGAATTGCAACTGCAGTAACGCATAACAACCGGTCGACTGTCGG
TGAGAATGGTCATTCACTTGGAAAGATGGGTGGACAGATTAATGGACACTTGAATTATGGA
TTTCATGGGGTTGGAGCTTTCAATCCACATTCCCTTCCTGACTTTCGCAACGGCCAAAGTA
ATGGTATTTCTTGCAACTTAGGCACAATATCACCCATTGGAGTTAAGAGCAACTCTAGAACT
GCTGAAGGAATGGAGAGCAGACATCTTTACAAAGTTGGTTCTGCTAACCTTGGTGGTCATT
CTTCTGGTCATACCGAAGGTACTAATTTGGGTGCCTTATTTACTGATGTAGCCATATGTTTA
TGGAGACGCACTGTTTCCATTAGGTTCATTTGCCATCTCTTTCCCTTCCAGTCATTTTCTTG
AAAATGTCAATTTTGAAAGAACATATGCTTTGATATCAATAATACAGAAGCTTTTATAGCTTA
ATGGTAATTGGTGTAGCCTAAATTATACTATTTTTGAGGTTGCAACTATTCTGTTTAGACAAT
GCAATTAGGCTTACATGGGCATGCCTTGTGTTCTTGTAGCACCCGGGTTTTCAAGAACTGG
AAGCTGCCCCCTTCATGGCCACCAAGTAGCGTGGAATAATTCAAATAACTCCCATCACCAT
ACCTCCTCTCCCATGCTATGGACGAACTCAGGATCATTTATCAATAATATACCATCTCGACC
TCCCATGCAAGCGCATGGAATTTCAAGAACATCTCGCATGCTTGAAAATGTCCTTCCAGTG
AATCATCATGTTGGATCTGCACCAGCTGTCAATCCATCAATTTTGGATAGGAGAACTGGTTA
TGTAGGGGAGCTGATGGAAGCGCCAAGTTTCCACCCTGGGAGTGCTGGAAGCATGGGTTT
CTCTGGTAGTCCGCATCTGCATCAGTTGGAGCTCACTAGCATGTTTCCTCAGAGTGGAGG
GAACCAAGCCATGTCCCCTGCACACATTGGTGCTCGATCTCCTCAGCAGAGGGGGCATAT
GTTTAATGGAAGGGGTCATATAGGTCCCCCTCCATCTTCATTTGATTCACCAGGTGAACGT
GCAAGGAGCCGAAGAAACGAGTCATGTGCTAATCAATCGGATAATAAAAGGCAGTATGAG
CTAGACATTGAGCGTATAGTCTGCGGCGAGGATTCCCGGACTACTTTAATGATAAAGAACA
TCCCAAACAAGTATACATCTGGGACTTTCTGATTTTGTTCTAGTTTATGTGCAAGTGTCACT
CTATTTGAAGTCACGCCATGTTTTGATGTTTCTATTGCCTTAATGGTATTTCAGGTACACCTC
TAAGATGCTTTTGACCGCTATTGATGAAAATCACAAGGGAACTTATGATTTTATCTATCTTCC
AATTGACTTTAAGGTGAATGGAGCTTTTGTAAACAGCTGTTGCATGTTTATCCTTGGTTCGA
CATTACTTGCATACAACGAACTAATGGTGCTCATGTGCATTTTCAGAATAAATGCAACGTGG
GCTACGCATTCATCAATATGATAAGTCCTGAACATATTGTTCCATTCTATAAGGTGAGAGTG
AGATGTTACAAGTTATGAAATGGCGGCAGTGTATTAGATAAAGCTTCATGTTGACATTTTTA
TATGATTTTTCACCCTCTGCTTTCCGTCGTCATTTCTTTTTCCATAACTACCTGTATTACACT
ATCATGCTACAATTGCATGGATTTTGGATATCGCATGTCAGGTAGTCAGTAGTACCTTTACC
ATTT CT G GTTT C ACG CT CT AAG C ATTTTTT ACCT AATG CCAGT CG AT AAATG AAC AAC AT AC A TGCCTGTCTCTTTCAGATATTTCATGGGAAAAGGTGGGAGAAATTCAATAGCGAGAAGGTA GC AT C ACTT G CAT AT G CT AG AAT CC AAG G AAAAT CAT CT CT AATTGCG C ACTTCC AAAATT C AAGTTTG AT G AATG AG G AT AAACGCT G CCGCCCTAT ACTTTTT C ACT C AG AT G GT CC AAAT G CAGGGGATCAAGTATGTTCTCTGATTGTCCATATCCTTTGCTGTATTACTGTTTCGATAGGG C ACCTG ACTT G GTGCC ACT AACT AG ATG ACCT GTATAT CTT ATT GTGT G CCC AT CC AAT AC A T GAT CGGTG AAGT CCACACACAT ACCT AATTTT AT AT CATT AT ATTTTT ATT AT CTTGCAT CT G AAATT AAG CAG TAG AC CTT ACACAGTTT AGT AT GTTTTTTT CTT ATGCT ACGT CAG AACTTT TCCTG AGT ATTT CTTT CCTTT AG AATT GT ATT G ACGCG G AAAG AAAT ACTG AG G AAAAATT C TT ACT CCCT CCGTTCC AAATT ACT CGTCGT G GTTTT AGTT C AAG AGTACT AATT AAAAT CCTA C AAAT CAT G G AAT ATG ATCCT C AATTT ATT CT AAAT CCTTT G AAAC AAG G AGGTCCTTG AG G ATT CAAGCAT AGGCT CAT CGTT CTT GTTT CCT AGT GTTGCT AT GTT CTTTTT AATG AT ACAT A TAGCTGTATAGGCTCATCATTTTCTTCACATATGGTGGTGTTTGATGCGCAATTGGGATACA TTGTGGGTTAGGGAGCGG AAAAT AAAACCATCTTGTTAACATTAGGCAGGGCTATGAGTTT GG AGTAGAGAAT AT AGTGCATACATGACAAGATTGCCCCTCG AT AATGGCTTT ACTT AATTT TGTGTGTATAT ATTTTTTGT ATTTTT AAC ATT ATT ACT C AACCT GTCT AC AAAAAACC AT CGTT CT GATT G G ACTT C AAACT GTGGTATAT G AAACT ACAT AT CCC AT G CC AAAC ACCCC AAT AG A TT G AACT CCT CCC ACCC ACT ATTT CC ATT CTT ACCT CCC AT G ATG AGTCT G AACTG ACC AT G TTTTTGTTGTAAACTTTTCTCAGGAACCTTTTCCAATGGGAACACACGTCCGTTCTAGGCCT GGGCGATCCAGGGTTTTGAGCTGCGAAGAAAGTCACCGGGACACTCTGTCATCTTCTGCC AACAATTGGACTCCTTCCAACGGGGGCGGCCACGCTTCAGGCTACTCAAAGGAGGCTGAC CCAACCACAGCTTGAAAGCTGAAGCACTAACCACAACATCAACATCCAACCTTTTGACATTT GC AAT CCC AGTTTT C AC ATT ACC AT CCTTT CCC ACCT CTTTTT G CTT GTGGT ATTTT CG G AG TCTGTAGCTATTTAGTACTTTCTATGTCGTGGGCTACCAGAGGCTTCCTAGAGGCTGCAAA TTTTGTCGCTGAGTAGAAGCAAGGGAACGGACGGAGGGTGCTCCCAGTTTCTCCTGAGCC TAT ATGCGTGTATT AACTG AAGGCCGTGGAAGGCAAAACTCGTGGGGAGCTCTCTG AG ATT TTGGACTGTAAGGTGTAACCCAGCGTTGTACAGGGTTTCCTAGTAAGAATGCATGACGGG 
GACAGCCGACACTGTATTGGTGCTGTTGTATGAAAGGCAGGCTGTGCCATGCAGCGTCTT TTGAAACTGTTTTGATGTTAACTACTCCCTCCGTCCGCGAATAAGTGTACATCTAGCTTTTA TT CT AACT CAAAGTTTT G AAACTTT G ACCAACTTT AT AGGT AAAAGTAGCAT CATTT ATGGCA CT AAATT AGT AT CACT AG ATT CGTT CT G AAGTGT ATTTTT AT AAT AT ACCAATTTG AT GT CAT A T AAT CCT ACT ACT CTTTTTT AACAAGTTGGT CAAAATT AT AAAACTTTG ACTT AGG AAAAACG GT AG AAGT ACACTT ATT CG AGG ACGG AGGG AGT AGG AAACATGCCCGT GT GTTGCAACGG GAG AAAT AAAATCCTT G AC AT AATG AT AATT GTT
SEQ ID NO: 9: OML4 promoter sequence
GAGGAGGGGACAACAAGAATGCCAGATGAGAAGGGGATGATGCACATGCCGGCAACATG TG ATAT GTAC ATGT CTTGGTT AG AG ACTTTT GTTT AT G C AACCT ATT AAAAACT ATGTG CAT G TTTGCTTG AT GT GTT AAACATTT AAATTTG AAG AAAT CAAAAT GTTTG AAT G AAAAAG AAT AT GGAGACCGAGATATGTCGACTCTGAGTCTGCCGTCGCCTGTCTAGATTTCAAATGAAAAAC GT CACAT AT ACATT CT G ACAAGC AC AAATGCAGCG ACATT ACT CTT AG AACGCAG AAAAGTT GCT AT G AGT AC AAAC AT G CC AACCT AAC AG G AAGTG CTGT CG AG AACG AG CCCTTGCCTT GATGGTTGGCCTTGGCGGTGGCCAACTAGCCCACCTGGGTTTGATCCCTAGGATTAACGC GAGTGTTTCACCTGGCGCAAAAGAACCCATAACCTAGGGTTCTTTTGCCAGTTTTAAAGTAT CTCACATGTCTATGGAGATTCACGGAGTCGAATGATGCGATTTAGGTGTCTGTCACGAAAT T GTTTTGGT GTG AAT CAAAT G ATTTGCCCAT ATT AG ATGCAAAG AAG AAT CATTTT AATT ATT TT CCCT AT AT CT GTT AT CTTT AAT GT ATTG AAAAT GT AAAT AGG AACAACGT AATTTT CAAGG C AAT C AAT AAC AAT C AAT GTTGC ATTTTT AG GCTT GTTT G AAAAT G CAT AT AG CT AGT AGTAG T AT ATTTT GTTT G AAAACGTG ATTT CAAAT AT ACT CCT CACT AAACTG AATTTTCCCAGTGTT TTTGGAAGCAGAGTTCTTCCAACACGGAAGGTGGTACCAGTATAAACGTGCCAACCTAACA GCAAGAGCCGTCGAAATCCTCGGTGGTTCGGCGGCAGCAACTCCAGCGGTCCAGTCCCG AT CG ACCCCCACACAACT CGT ACTGGTG AGCGTT AT CCT GT CCCACACCAG ACAAT CGG A G AACGTG ACCT CCGCGT CACCT CACCGCGCCAACCCCCACCCCT CCGCG AAAT AATT CCG TCCCCGTCCAAACGCCGCTTCCCACCCGGGCCGACGCGCAGCCACGCGGGTCCCGACGT CTGACACCGGCCCCAGCTCACTGACACGTGGGGCCCCTGACCCGGGTACATGTGCCGTT CGGCACGAACGGAATGGCGGAGGAGCTATACGGTGCCGCGGTGGGGTGCGCCGCGGGT GAGCCTACCGCGGTGGGGACCACACCGGGCGCGGTATATAAAGGCCCCGCACCTTCTGC AT CG CGTTT CC ACTT AGT CC AAT AAAT AAT AT AGT AAT AC AG CATTT CG CCGCT CTTTT AATT AG ATTTTTTTGGCCTT CGTCT CCGCTT CGT CT CGTCT CGT CT CACCACCGCC AGTCCACCA CCAGCTCGCCGGAATTCCCTCGCCGGAGACGCGCTGCGGAGGCATACCTAGGAGGAGGA
GCGAAGAATGGGGAGGAGTGGAGGGGTGCCCGCTGCCGCGCTCGCTCGCTAGGGTTAG
GTGCGCGCGTGCGTGAGGATGGAGCTCGCTCTCTGAGCTCGCCTCGACCCCCCGCTGAG
GGCATTAGCTGTGCCGTCGCTGCGCGCTGGGTGATCTGCCCTCCTGTGAGCTCCGGGGG
AGCCGTTTTGGCTTGCGCCATGGAGCCGTCTCTTCCGGGCGCGGCCACGGGTTGCGTGT
TGCGGTAAGCTCCTGGCCGCGACAAGCTGAAGGGGATCTCGACTGCCGATCTGGTGAGC
CTACAAATTCTTCCGTTTCTAACAATTTTTTTGCGGGTTTTCATGCAAATTCGGGGTGCGTTT
GTGCAGAAAAATGGTGTTTGAATTCCTGGAGGTATTTGTTTGGGGGTAATTTTGTGCGTATT
TTTCTCGCTTTGTGTGATCTGTGCTCGGGTTCGGGGGATACTTTGATGCGGTTGGGCAGTA
TTTTGGTCTCTTCTGCCTTTTTATTTTGATCACAAGTTTCTTGTGCTCTTTCGAGCTCGTCGA
GGACGAGGAGCATTAAATTTCCCGTCGTCTTTTGCGCTGTTTTATGGCTAATTAGTTTGGGA
GCACAGAGTTCTGCACGGAATTTTGCATAACCTTTTTTAATCGTTCAATTCGAGTTGTTCCG
GTCCTAAATTTTGCAGAATTTCTCCAGTGTTCCGTAGCCGGTCTCGTGAGTTCGATTTTGG
GTTCACGGTCGATCAAATCTAGGCCTCGGGACTGCATTTTCTCGCGTTTATTATTTGATGAT
CTGCTTCAGTCGAGCTACCTGAGGTGTTGAAACTTGGTATCTGTCTATCTTTCAAGGTGCTA
GCAGGATGCCAGCTAGAATCATGGAGCAGAGGCACCACATGCCCCCATTCCACCTCCCCG
TGGAGTCCGAATCGTCTTCTCCCATGTGGTAAGCCAACTGCAATAGCCATTATTGCCCGAT
ATCCTTAAATGATGTCTAATGATGGACTGCATTCTTTCTTACTTTAGGGTAGGGGGTACTAA
TTGGTTCAGTTTTGGGGTGACTTGGTCAGTATCATTAACAACTAGACCTAGGTTAATTCCTT
CATCATTATCGAATTCTTTTTGTAGAGGCCTGTTGGACACTTCAGGCAGGAATGTTTTCCTG
AGCATGTTAGTGACTACCTGAACATTCGTGTAATTTGTAGGTGCTTATTAGTTTATTCTTCG
GGTAGTTTCTCTCGGACTAAATAAAATGTGACCTAGCAGAACACTGTTACAGTTTACGGATA
TGTGGGGGCATCCGAGGTTCCACATATGGGACAATTGATCGAGCAAGATTGGAGGATGTG
TATCGTTGTTTCCAGGTAGCTTAAGGTAGCTGTCTTGCTATATATGGGTGAAGCAGCTGATT
TGGAAGGCACATGGTCTCACAGGGGTGATTGGGTATCGATTATTGACAGATGCGCATGGA
TGTTGCCTACAATGATTCTTCCATTAAATAATTCTGATGTTGCTTCATCACTTCTCTTTGCGC
TCAGTGTTTGTGTTCGTTTTTATGGCTGATTTATTTCTTGTTTTGAAAAACAGAAAAAAGATT
CATTGATTTCGGGAAGTAGGTCTGCTGCATCTTCTCCAGTCGAAAAGCCAAAGCCTATTGG
CCAAAGGTTGTGCATCAATTAGGACTT
SEQ ID NO: 10: GSK2 amino acid sequence
MEHPAPAPEPMLLDEQPPTAVACEKKQQDGEAPYAEGNDAMTGHIISTTIGGKNGEPKQTISY
MAERVVGTGSFGIVFQAKCLETGETVAIKKVLQDRRYKNRELQLMRSMIHSNVVSLKHCFFSTT
SRDELFLNLVMEYVPETLYRVLKHYSNAKQGMPLIYVKLYTYQLFRGLAYIHTVPGVCHRDVKP
QNVLVDPLTHQVKICDFGSAKVLVAGEPNISYICSRYYRAPELIFGATEYTTSIDIWSAGCVLAEL
LLGQPLFPGESAVDQLVEIIKVLGTPTREEIRCMNPNYTEFRFPQIKAHPWHKVFHKKMPPEAID
LASRLLQYSPSLRCTALDACAHPFFDELWEPNARLPNGRPFPPLFNFKHELANASQDLINRLVP
EHVRRQAGLAFVHAGS
SEQ ID NO: 11: GSK2 nucleic acid sequence
ATGGAGCATCCGGCGCCGGCGCCGGAGCCGATGCTGCTCGACGAGCAGCCCCCCACCG
CAGTCGCCTGCGAGAAGGTAACCGGATCTGTGCTGGGATGGTGTTGGCCGTGTGTTTCTT
GGCGTGGTGTTCCGTTGAGCTGATGTTTAGCGTGTTGTTTTCGTTGGGCGCTCTTGTTGAG
CAGAAGCAGCAGGATGGCGAGGCGCCGTATGCGGAGGGGAACGACGCCATGACCGGTC
ACATCATCTCCACCACCATCGGCGGCAAGAACGGCGAGCCCAAGCAGGTGAGCTCAGCG
TCTCTTATGTTTCGCTTGTGTCTCTTGGCCTGAGTTTGCACGGCCAGTTCTTGCCTTGGTGA
GATGTGTCTGCTCTCCTGCAGCTATTCTCTTTAGCTATGACAACTCATTGAAATATAGCTGT
GTGGATTCTTGGTTAGATTTTTCTTCGTTTACCAAATACGAAAAAAATGTTTCAAAGCGGCT
GAATTTATCAATTATCAAGGACGATGTAGCTTGTCAGCCTATTTTTGTAGTGCTCATTTGTTT
GATCCTCATGTAACTATGGTTTGCTCAAGAGATCTGTTCCAAATATGCCTGTGTGGTGTTCC
ATACTGTGGGTTTTCGGGACAAATTTGGACGGCTTCAGTTAGATTTTGGCCAACACTAGTG
CTCAAATCTGTTACTATGAGCAACAGCTGATACCTCTTTGGCGCCCAGTTGGTAATGTCCT
GCTTTGTTTTTCAGACGATTAGCTACATGGCGGAGCGCGTTGTGGGCACTGGTTCGTTTGG
CATCGTCTTTCAGGTGATTGCTCTAGCCATTGTTTGTTTCCTTGTTTGTGTTGTTGACTACC
AGCCTGATGTTTAGGGAAATGTTGCATGTGTAGGCTAAATGCCTGGAGACCGGGGAGACA
GTGGCCATTAAGAAGGTACTGCAGGACCGACGGTACAAGAATCGTGAGCTGCAGCTTATG
CGTTCGATGATCCATTCCAATGTTGTCTCCCTCAAGCACTGCTTCTTCTCAACCACAAGTAG
AG ATG AGCT GTTT CT G AACCTT GT CATGG AGT AT GT CCCGG AG ACACT CT ACCGCGTGCTT AAGC ACT ACAGT AATGCCAAACAGGGG ATGCCACTT AT CT ACGTCAAGCTTT ACACCT AT C AG GTTT GT G AATTT CC AGT G AAT AAAT GT G AAAT GTGTGTCTGT C ATT GTGC AACT ATT CT A AG T C AATTTT AC ATTT GTGGCAGCT ATT CAGGGGGCTGGCGT AC ATT CAT ACTGTTCCAGG AGTCTGTCACAGGGATGTGAAGCCACAAAATGTTTTGGTATGTATCAGAGGCCGGGGTCTT CCCCTTT CTG AAAAAAAT GTAT C AGTG AAC ACTG AAC AG ATTTGCT C AGTTTT CAT GTATG G CTTTT CTTGCTT G ATTTTG AACTTGCCT CCACTTGCT AT ATTATACAGGTT GAT CCTTT AACA CATCAAGTTAAGATCTGCGACTTTGGAAGCGCGAAAGTTCTGGTATGTTGGCTCTTTCCCC AAG AGTTT AGT G ATACGT AC AC ACTGCTT C AAT CC ATTT GTCCTGTCGTGTAGGCT ACT CAT TCTATTCAGTATTGAACCAGAATCGGCATCATGGTCTGTGCTATTTTGATTTAGTCTTACTGT TTT AG G CTT ATAGCTGGCCAGGTGTT A AG ATT AA AATT AAG T C ACTTTT AT AT ACCTT AC AG T TT G ACTT CTT CAG AT ATTTTTGGTTT AT AAACT ATT AT CT CT GT ATT CCGCTT ATT CCTT CCT A G ATTGCT G AATT CTTGCTTT AGCCG AATGCAAAGTTT CTG AT CTT CACTT CATTT ATTT ACAT TGG ATGT CCG AC ACT G AATTT AAACTTTT GTT CCTTT ACT AC AAT AT C AACCTGC AT AGTACT TT G ATGTT ACTT ACCTGCT AAT CCG ATAT CGTTTTT CTT GTCCTGTTCTAT C AG GTGGCG GG T G AGCCC AAT AT AT CAT ACAT AT G CT C ACGCT ACT ACCGTG CT CCG G AGCTT AT ATTT G GTG CGACTGAATATACAACATCGATAGATATATGGTCAGCTGGTTGTGTTCTTGCAGAACTGCTC CTTGGT C AGGTT AGTT CCTT CGTTT CGTT C AC AT AT ATTGC AAT CT CCT AGGTT CC AACT AA G ATGT CAAT ACT GT AGTTT CT GAT CTTT CATTT GTTTTT G CT AGCC ATT ATTT CC AGG CG AG A GTGC AGT CG AT C AACTT GT AG AG AT AAT C AAGGTCTGC AAAC ATT CC AT AT AT CTTT CTTT C GCTT AT ACT ATT AG AT GTT GTTG ACCTTT GTG ATGTT CTT GTAG GTT CTTGG AACGCC AACT CGG GAG G AAAT ACGTTGTAT G AACCCG AATT AT ACG G AGTTT AGGTTT CC AC AG AT AAAAG CTCATCCTTGG C AC A AG G T AG G CTTG CAAT CT C ATT CT A ATGT CCA AT CAT AT AT C AC ATTT GCTGTT ATT AAT ATATGTGG CT C ACT GTT ATT AAT ATAG GT C AG CCGT AT AT AAG AT CT G CT GTAATATACTTAACCATGTAATGTGATGCCTACGTGTACTGATTGCACTTTGCTGTGACCAG GTTTT CC AC AAG AAAAT G CCT CCTG AAGCC AT AG AT CTT G CTT C ACGT CTT CTT CAAT ATT C 
ACCAAGCCTCCGTTGCACTGCTGTAAGTTTTTTCTTTTCACGTTGCTTGCTCTTCCAGGTGT TTGTTGCGGCAAGTAGGAGAGGAACAGATGAATGTAAATGTAAATATGAATGGTCTTTTAG AG AC AAT CAG AT AT AT AGTT GT CCTT ATT GATT GTTG GT AACTT ATTT ATGTATATGTGTGTA GT GT ACGTTT GT CAAACT AG ATTG AT CAGTACT AGTCTT CTTTTTTTT CTTTT CG AAAAGGGG GGACTCCCCGGCCTCTGCATCAGAGCGATGCATACGGCCACAATT AT AAAT AAAT AAAGTA GTT C A AC A AG GT CTTG CAAT CTG CTG C A AAA A AG TAGGCTGGCT C AC A A AG AG CT AG A AAA ACAAAAAAGGCCCAAAAGCCACAACCGGCTGGCATAAGATAGATAGATAGGTAAACTAATC GCCTATCCT ATT ACT AGT CTT CT AT G AGC AT CAT AAT CAT AGTT ATG AG G ACCGCT AAT CCA GT ACCAT AG AT CG ATGCTTGGC AG AT G ACG AGTT GAT ATTT AGT CGT CTT GTAAC AT GTTTT GCTATAGCGTGATGGCTATATTCACGCATTTGAATATCCGCATGCAGCTTGACGCATGTGC GCATCCTTTCTTTGATGAGCTATGGGAGCCTAACGCGCGCCTGCCAAATGGACGCCCGTT CCCACCTCT GTT CAACTT CAAGC ATG AAGT AAGTGCAT CAG AG AAAAACT AGGCTGCT CAT TTGCAATTTGACAAAAATGTATGCAACCTGTTCGTGCTGTTGTGCTTATGGGATCTGCTTTT TTTTTTTTCTGCAGCTGGCCAATGCTTCACAAGACCTCATCAACAGGCTTGTGCCTGAACAT GTTCGCCGACAAGCTGGTCTTGCTTTCGTGCATGCGGGGAGCTAAATATGCGCACCGGTG CCCTCAACCTTGCACCTTATTGTTTTGCCATGGGCAGAAGGGTGGTGGTTTAAGATGGAGG CAGGTCAGATGATCCCTGGAGCGATATATGCCAGATTCCATCATCAGGAGTACCGGTAGA GCACCG AGG AAT AACAACTGTCTAGATCATCTGCCAGGGAAGGAGACTTGCCAGGG AAAC AGCATAGCCTTACGCCGTGGACCCGAGTTTTCTTTCAGTTTTTGCCCTATTGTAAGAGTTAT T AAT AG CTT CTT AAT GT ACT GTAG CTCGT AAGTT GT C AACT ATTTTGTT CT CC ATT C ACT G AC GTATTGTTGCAGTAAACTTCGCTGTTCAATAAGTTGTGTCATGGCAGAGCTTGCACGCCCA CTGCCTGTCATGTAGTCAAGCTGTCTATTTTCTGTTGGGTAGTTGCGACCCGTCGTGAGAT GGCATGGCTGAACTGGAATTAGGGTTCGTGGGATCGAGAATTGGGGAAGCTATAGGTTTA GTATGGCCAAAGGCTCACAATATAATCCAATGCTGATTCCAGAAAAACGGGGGAGGCTTAA ATTGCCCCGCTAGCAACAGGTAGAAAGGAAACAACTCGGCAAGTGAACTGATACAATAATA CTCCCCTTGT CTT AAAAT AACT GTCT C AATTTT GT ACC AACTTT AAT AT AAAGTT AT ACT ACG GTTAAGACATCTATTTTGGAATGGCACGAGTAGTAAATAATGGTTGAATAGATGGAGTCCCA CGAGCCGTCCGATCCTGTGACAGACGGCGAGTCCCACGAGCCGTCCGATCCTGTGACGA GCTTCAATCTTGAGCGTCCACTAACTGAATCTTGATGAGGAGTTATATAAAGCAGTTTCGGC CT G ACAAT ACCT CCCCGTGAAAG ACG AACTT GT CCT 
CAAAT AGTT G ATGGGCG AT CG AACC AGCCTCCTATTGTTTGCTTGAACAAGGCCGGGAAGGTGGTCTTGATGAAATCAGTGTCTTC CCAGGATGCATCATGATGCGGAGCATCTTACAGTCATTCCCATGGACCTATCTTCGTCTCT CGAGACACCACGTGGACCGATGACACGAGCTCGTGCAAGGGCTACCACGAACGAGGTTA ACT CT CT CTTT GTT G AACT CT CCTTT G ACCCACTTG AG ACATGGCT ACT ACCT CAAATGG A
SEQ ID NO: 12: GSK2 promoter sequence
ACGTT CCAAAAGG AT AATT CAT AACCT AGCAATTTTAG AT AT CT ATG AACTT CAGTAT GTGC CAT CACGGTCT CAACAGG AT ATGGT ACT CT GTTTT G ATTTTTT AT CAAAACCT AATT AT CAAT TTATATATGTG CGT AC AAT CTTT AACC ATT AT AG CT G ACG AT CTTT AACGT GTTT CAT CAT CT GTAGGCTGTGCAATGTGAGATAATCATGTATGGTGTTACCGAATGGGCTGGAAAGTTTCTA AATTATGAGCTCTGCATGATTAAGTGGCGCCGGGACCTGTACTTCATTTAATGCAGAGCAA G AAAG G AAC AT C AAAG AAGCT GTT AAAATGG ATGG G AG AATTT G CT AAAAC AT GTTT ACCT A TTTTTTT AAAG ATT AAGTGTATT CT AG AAAG AAAAT AATT GT ATGTTT CAT G G AGT AAC AAAT CAGGAACTGTGCAGGTATGTGTTCATCTTGATGGGAGTTTGTCCGAGATGCTGGGCGAGG GGATTGTTTCCGCACTGCATGTATCCAAGTGTTTAGGGATGCCTTTTGCGAACAAATTTAGT TTTTTTTTGAGCGAGGAAATTGATACTTTTTTTGTGAATGTATTCGAGTGGGTGAAATACTCC CCATTTGTCTCGAGAGGTCCAGGTGTGGAAGCTTAGCACTGGGTTTGTCATTATTGAGGCA AG AAT AGTTTG AAT AT G CAC ACTTT C ATT ATGT ACTT G C AGT G CGT AATGC AT CAT GTGTAG AAAAAG AT AGTT AT ATTTGTAT AG AGTAAATT ACAAT GTTT G AGGT ATTT G AAT AAAAAAT AC TTT GTTGTTT CATG ACATGCAAC AATGCG AT CTTTT GTGCT CGTT ATT AT AAGTTTG AGT AAA TTTGTATT AGTT AAT AT AATGTACGATGACTGCCACGTTTGACTAATTTT AT ATT AATTT AGAC GTGC AAAC ATTT ACT AGTACT CAAT AT G G AAC AC AG AAACCG ACT AT C AAAG C ATTGCT GAT CCG AT CCCG C AT ACT AT AT ATAG GT C ATT AACT G AC AT AT AAAAATGTTTGG AT AATTTT ACT T CT AC AAAT AT ACT CAC A AAA ACT G AG T AC ATTTTT AG C ACTG G CT AA AAG G G TT ATT ATT CG AACG AAAAC AC AT AATT GTTG CGT AAAAG AT GTT GTATTTT C AGTACC AAATTT ACG ATTT G A TCCAAAAATAAGGAGGCATTAAAATGATGCGGATCTTTGGGTCTCGGGTGCCAATGCACTT G AATTTG AT CTTTTT AAAAGT ATTGCG AAATT AGT CAAAAATTT CAG AAACTT CTTGCAAACA AG AAT G ATACG GTGTTATACCCGTGT G AC AAGTTT C ACG AATG AAT G AGTTTT ATG GT ATTT TAG GTT AAAC AAAAC AAAAT CG AC ACT AT AT AAAC AT ATT C AC ACCTTTT GTTT ATGT C AAG G AGTCCACGG AAGTCACTT CTT CGCT AAACTTTTT AT ACAAGTAT AACACTACAAG ATT CT CG T CT CCG AAAATTTT CAGG AATTTTT G ACTCTTTTT GTT ATTT AT AAATT ATT ATTTTT CAAACA GGTTGCAATGGGACCCAAGATTCATTAGGTATTTCCGGGCATTAAAATGACATAGTATATAG C ACT AAT AAG GTT CT CCT AT ATG AC AT G AAT G CGCC AT C AATTT 
CT CCC ACG AAT ACCCT AG T AT ATTT GT AC AGC AATT AGT GT AC AATTTT CAC AAATTT CT CT G ACG AAT ACCCTCGTATAT T AT CAGTT CATTTT CCGGCAG AAATT G AAAAT ATGCCGT AAAT AT ATTTT AGCGGCATT GTT A T CCTTTTG ACCAAAAAATG AAT CCCATT ACT CGGCAAT AAATGCGGCAG ACT AT AT AAAACC C AACCT GATGCCCGGGGT ACT CCC AGC AATTTG ACT CCCG AGG CT CGTCT AGT CT AAT CCA CCT CCCC ACT AAT CCACCAGCT GTTT ACACAGGGTCAGCT AACCGCT CT CTCT AT AG AT CA ACGTCACT CCCCAT CTTGTT CGT CTTGGT CACCCCCACCCCCACTTT CCCTT CACTGGT CA AAGGCACCACCACCCACATCACAGTACAAGCCAAGCCAAGCCAAGCCAAGCCAGAGAAGA GGACCAGGCGTAGGTGGATGCAAGTGTGAGCCCACCGTGTCCGCCCCATTCACACCCTA GCCAC
Soybean
SEQ ID NO: 13: OML4 amino acid sequence
MPSEIMEKRGVSASSRFLDDISYVSEKNTGLRKPKFIHDHFLQGKSEMAASPGIIFNTSSPHETN
AKTGLLMSQTTLSREITEDLHFGREAGNIEMLKDSTTESLNYHKRSWSNVHRQPASSSYGLVG
SKIVTNAASRESSLFSSSLSDMFSQKLRLLGNGVLSGQPITVGSLPEEEPYKSLEEIEAETIGNLL
PDEDDLFSGVNDELGCSTRTRMNDDFEDFDLFSSSGGMELEGDEHLISGKRTSCGDEDPDYF
GVSKGKIPFGEQSSRTLFVRNINSNVEDSELKALFEQYGDIRTIYTACKHRGFVMISYYDIRAAQ
NAMKALQNRSLRSRKLDIHYSIPKGNSPEKDIGHGTLMISNLDSSVLDDELKQIFGFYGEIREIYE
YPQLNHVKFIEFYDVRAAEASLRALNGICFAGKHIKLEPGLPKIATCMMHQSHKGKDEPDVGHS
LSDNISLRHKAGVSSGFIASGSSLENGYNQGFHSATQLPAFIDNSPFHVNSSIHKITRGASAGKV
SGVFEASNAFDAMKFASISRFHPHSLPEYRESLATGSPYNFSSTINTASNIGTGSTESSESRHIQ
GMSSTGNLAEFNAGGNGNHPHHGLYHMWNGSNLHQQPSSNAMLWQKTPSFVNGACSPGLP
QIPSFPRTPPHVLRASHIDHQVGSAPVVTASPWDRQHSFLGESPDASGFRLGSVGSPGFNGS
WQLHPPASHNMFPHVGGNGTELTSNAGQGSPKQLSHVFPGKLPMTLVSKFDTTNERMRNLY
SRRSEPNTNNNADKKQYELDLGRILRGDDNRTTLMIKNIPNKYTSKMLLVAIDEQCRGTYDFLY
LPIDFKNKCNVGYAFINMIDPGQIIPFHKAFHGKKWEKFNSEKVAVLAYARIQGKSALIAHFQNSS
LMNEDKRCRPILFHTDGPNAGDPEPFPLGNNIRVRPGKIRINGNEENRSQGNPSSLASGEESG
NAIESTSSSSKNSD
SEQ ID NO: 14: OML4 nucleic acid sequence
ATGCCTTCTGAAATAATGGAGAAGAGGGGTGTTTCTGCCTCATCTCGCTTTTTGGATGACA TTT CCT AT GTTT CTG AG GT AATT ATT AAT GT AACT GTCT AAG AAT G GTTT GTT CT AATTT AT AA TGTG ACCCT C AAC AAG CT AATT GTT ATT CT AACT GT CTT AT AAT GTTTTTTTTT AT AAT GATT A T CAGTT CCAAG AAC ATTTT ACAGCCT AAG ACTT CGGTTTT CTTT GT CATTTT GTT AAT CAATT TGACCTGTATGCATGGCCTCAATGCTATTGCCTTTTCGACAATTGGTTTTCTAAACATGCGT T AAACTTTT ATGGGCAG AAG AAT ACAGG ATT ACGG AAGCCAAAATTT ATT CATG ACCATTTT CT AC AAG GT G AGTT C AAT C AACT AATT ATT ATTT GTT C AAAAT G GTTT GTATAT CTT GTG CTG ATTTACCTGTGTATCAATTGCATCCTTAATGCCCCAAATAGATTCACTAACAGATAGTTAAGA TT C AG ACCTTTT G AGT G AACT GTTT AC ACT CC AGTTT AG AAATTGG CT AGT AGCT AT C ATT G AGTTTG AACGT GTG AACTTTTTG AAG AAT CTTT CCAT AT AT GTTT CT GT AT ACCTT ATTTTT GT AT ATTT C A AAG C A AT ATTT CT CT CAATTTTT GTTT CAATTTTTT AT CAAT GTTTT GTTTGCTTTT AG AATT AATG ATT GT CAAT GTTGCT AACT AGTAT CCTT CAT ACG AGTAATT AT CTT AAATT CT AAAACTGGT AT ATTT ATTT C ACTTT AT G GTG ATT G GTG AT AAT ACTT GTTG ATTT GT CTTTTTT AGCCC AT ACACTT CT CACTTT ATGCTG AAAT CAAT AT GT AATTTTT ATTTTGCTT CTGG AAT A ATG AAT AT C ACT AAT CAACGTT G C AAATT G AC AT CAT CT AAAATT AAT GT ATTTTT CT GTTT GT GGTG ACAAT GT AATTGCTGCAAACCT AT AT AAATTGCT GAT AAAAAAAAAAAAACCT AT AT AA ATT AAT GTTTT AT AGTG AAT GT AT AAATT CAAT ACCTT GTTCT CACAACATTTTG ATT GTGGTA TAGCTGGGATAATTAATGATGATTTCATGAATTTAGATGCTGTGCTCTGCTGGACTGAAGCT T ATTT ATG ATTTTG G T AT A A AT ATT ATT A AG A ATTTG CTTTT ATTTT AATTGTGCT A ATTTT G A A TGTAGT AAT AAT GT AAT AT CTGC ATGT AT CCAT ATTT AT GTTT GTTT ACCT AT GTT CC ATT AAT AAGC AGTT CAT CTGCT G AACAT GT AACT AATTT CTGG AT AAAGTAATTT CT AT ATT CAAATTT T CAGGG AAG AGTG AAATGGCTGCATCACCTGGCAT CATTTTT AAT ACTT CGT CACCCCAT G AAACC AAT G CAAAAAC AG GCTTGTT AAT GTCT C AAACT ACT CTATCTCGT G AAATT AC AG AA GACCTACATTTTGGCAGAGAAGCAGGCAATATAGAGATGCTGAAGGATTCTACCACAGAAT CATTGAATTATCACAAGAGATCATGGTCTAATGTGCATCGGCAGCCAGCATCTAGCTCATAT GGTTTAGTTGGGAGCAAGATTGTCACCAATGCTGCCTCACGGGAAAGCAGTCTATTTTCAA GCT C ATTGT CT G AC 
AT GTTT AGCC AAAAGT GT AAG AATTT GTTT C ATGG AT GTT AAT AT AGTT GCATGCATGTGTTATGGGTATTGTAGCATAATCAAATTCTGGTTGCTTTTACACTTCGTAAT ATTTT AG AT AT G AGTTT CTGTTG C ATT C ATTTGCTTGTGT ATTT GT C ATT AGC AATTT AG CAT AG AAG AAT AT AT G CTTGCT AT CTTTT GT AAT GTAG AAG G AC AAT ACCCT C ACC AAACCC ACC ACC AAAAAT AT AAAACT AAGT AAAGTT AT CT ATTT GTTTT AG GTTTT GT GATT AT ACTTT AGTT CTGCTCATGTGTCACTGTGTGTATATATGTATATCTTAATGCAGTGAGGTTATTGGGGAATG G AGTGCT GTCTG GT C AACCC ATT ACTGTTGGTT CCCTT CCT G AGG AAG AACC AT AT AAAT C TCTCGAAGAAATTGAGGCTGAAACTATTGGAAATCTCCTTCCTGATGAAGATGACCTGTTTT CTGGAGTCAATGATGAGTTAGGATGCAGTACTCGCACTAGAATGAATGATGATTTTGAAGA TTTTGACTTGTTCAGCAGCAGTGGAGGCATGGAATTGGAAGGAGATGAACATCTAATTTCT GGAAAAAGAACCAGTTGCGGGGATGAAGATCCTGATTACTTTGGAGTTTCTAAAGGAAAAA TT CCTTTTGGT G AACAAT CTT CT AG AACACTTTTT GTT AG AAAC AT CAAT AGCAAT GT AG AAG ATTCTGAGCTAAAGGCTCTCTTTGAGGTGAACCTTTATTCTTTTATTCTGGCGGATGCTATC TT AG AATTTT CAT G AAAC ATTT CAT ACC ACT AAT AAT G GC AT GT AAAT G G ACT ATTTT GTTT G TT CC AGC AAT AT G GAG AT AT CCG AACC AT AT AT ACT G CCTG C AAGC AT CGTGG ATTT GTT AT G ATTT CTT ATT ATG AT AT AAG G G C AG C AC A A A AT G CAAT G AA AG C ACTT C AAA AT AG G TC AT T GAG AT CT AGG AAACTTG AT AT AC ATT ATT C AATTCC AAAG GT AT CATT ATT AAT AACTT CT C AT G C ATGC AT AATT CCTTTTT CCTT GT CATTTT GAT AAAGTT GTT ATTTTT ATT CTT CAT CAT A TC ATTT ATT ATT ACCG CCAT ATGTTTTG CTT GTT C AATTGCTT G CATGCCT GTTTTT ATG GTT TGCTTATAGATATATCTTGATTTGATGACATGTCAGGGCAATTCTCCAGAGAAGGATATTGG CCAT G GTAC ACT G ATG AT AT CC AAT CTT GATT CAT CT GTTTTGG AT G ATG AACT AAAAC AG A TTTTTGGGTTTT ATGG AG AAATT AG AG AAGT AAGT CGTT CTTGTTGGTTTT CAT CCATTTTT G GT GTTTGT GTTTT AAAAT GAT AC AAG CATT CTT AAAT ATT GTCTCTGT AATTGC AG AT CT AT G AATATCCACAATTGAATCATGTCAAATTTATTGAATTTTATGATGTCCGGGCTGCAGAAGCT TCTCTTCGTGCATTAAACGGGATCTGCTTTGCTGGGAAGCACATTAAGCTTGAGCCTGGTC TT CCCAAG ATTGCAACATGGTTGCT GTT ACCGCTT CTTTTTT ATTTT CAATTT ATTTTTTT CT C TTTTT AT AT CAACTTTTT CAACTGTTTT CT ACTTTTTT AAAT GTGCG AAT CTT AAAACATT 
GTTT T GT AAATG AAGT CTTTT ATTTTGG AT CTT CAT ATTT ATGCT CACCACCTTT AAT AGTAT CCT C ACTTTGTAGAGTTTGATAGAGTGTAAAGTTGTTATCAACCACCTTGATGGAAAAATTACATC TT G AC AATT AT G C A AACCTT G CTTTT GTAGTATGATGCATCAGT C AC AC A A AG G AAA AG AT G AACCT G ATGTTGGT CAT AGT CT G AGT G AC AAC AT AT CCTT AAG AC AT AAAG GT AT AATTTTT GT CTGCTTT CACTT GT CTTTTTT CCCT CAT AAAGCT AAAATGTT CTTGGT CT CACT AG ACAT A ACTTACAGCAGGAGTGTCATCTGGATTTATTGCATCTGGTAGCAGCTTGGAAAATGGATAT AAT C AG GG ATTT C ATT CTGCG AC AC AGCT ACCT G CTTTT ATT GAT AACT C ACCGTTT CAT GT GAATTCTAGCATTCACAAGATCACAAGAGGGGCATCTGCGGGAAAAGTATCTGGTGTTTTT GAGGCCAGTAATGCTTTTGATGCTATGAAATTTGCATCCATTTCGAGGTTCCATCCTCATTC TTTACCTGAATATCGTGAAAGTTTAGCTACTGGCAGTCCTTACAACTTTTCAAGTACCATTAA CACGGCTTCCAATATTGGAACTGGATCGACGGAATCATCTGAAAGCAGGCACATTCAGGG AATGAGTTCAACTGGGAACCTAGCTGAGTTTAATGCAGGAGGTAAGTTTAATGTGCTAAGA AAGCCT CAT GTATATG CTT CCTTT ATTTGC AGC AGTTTT G AAAT GTTT CCTT GTCTAT AG AAA ATT CT GAT AAGG AAT C AATTT GTTG C AAAG GTTG AACTT ATT GTT C ACTTT AAATGG C AT CCT AG G AGTTT G AAACCTT AT AAT G AAG AG CTT G ATTGTT AATTTT AATG AT G ATGCC AG CCT AG GGTTTTCAAT ATTTT CATT CTT CT AAT AACACCCAAAAAT AAT AATT GTT GTTTAAG AGCCAG ACT ATT GAT CT AT ATG AAT CT AG ACTT G CCTGT CCG AGT AT AT GAG ATTT AAG C ATT CC AAAT TGTAAATTGGTCGAGGTCATTTTTCCTACAAGCTTGTAAGTGGTAGAAGGTGCTGGGAAAT TTTAGGCTGAAGCGATATCTAATATGGATTTAATAGTTCTATATTTGAATGCTGGTATGTAAC CTTTTTGTTTGATTTTGGACCTTCAGGAAACGGAAACCACCCCCATCATGGACTTTATCATA TGTGG AATGG GTCC AACTTG CATC AG C AACCTTCTTC AAATG CC ATG CTTTGG C AAAAAAC ACC AT CCTTTGTT AATGGT G CAT GTT CT CC AGGTCTT CC AC AG AT ACCC AG CTTT CCT AG AA CACCACCTCATGTTCTTAGAGCATCACATATAGACCACCAAGTGGGATCAGCACCAGTTGT T ACAGCCT CACCCTGGG AT AG ACAACATT CTTT CTTGGG AG AGT CACCT G ATGCTT CTGGT TTTAGATTGGGTTCTGTTGGAAGTCCAGGCTTTAATGGTAGCTGGCAGTTGCATCCTCCTG CTT CT CAC AAT AT GTTT CCT CAT GTT G GTGGG AATGGT AC AG AATT G ACGT CAAAT G CTG G GCAGGGCTCTCCTAAGCAGTTGTCACATGTTTTCCCTGGGAAACTTCCCATGACTTTGGTT 
TCTAAATTTGATACTACCAATGAACGAATGAGAAACCTCTATTCTCGTAGAAGTGAACCAAA CACT AAC AAC AAT G CTG AT AAAAAAC AAT AT G AACTTG ACCT AGGCCG C ATTTT ACGTG GG GAT G ACAACCG G AC AAC ACT C ATG AT AAAAAAT ATT CCC AAT AAGTAT G CC AATT AT CT CCA T AT CTTTTTT GTG C ATTTTT G CTG CTT ATGCTGTT AT CTT CT CAT CCTT ACTT C ACC AAAG AAT GTGATTATTAGTTAAATAAGCAATTGCTTATTTGGCTTGTCCGCTTTTCATGTTGGTGCATTA ACC AT AC AAGTGCCCT CT CT CTTTTGCTT G CTT ACATGCCT AAAT AG CAT AT ACTTTTT ACAT AACAG ATTTT ACAAAATTT G AAT AACAATTTTT AAAAACAAGCAAATT CTTTTGCACTGGT CT TT C AGTTT GTCT CTTT C ATTT AT ATTTGTTT AAAT ATTTTT G GC AGGT AT ACTT C AAAG ATGCT TCTTGTTGCCATAGATGAGCAATGTCGAGGAACTTATGATTTTCTGTATTTGCCAATTGATTT C AAG G C A AGT ATTT ATTT G G ACTTG CT AGTTG ATT G ATT CTTT ACTT AAAT G AAGT AAT CAT A AAT AT GTTT CT AAGTCAATT CT ACAAATGGTTGCAG AAC AAAT GT AAT GTTGGCT ATGCATT C AT C AAT AT GAT CG AT CCTGG AC AAATT ATT CC ATT CC AC AAGGTT ATTT G G AAACCCTT AT G T AAT ACT AATTTG AT AT AATT ATCTGT G ATTT G AATT CTG AGTT CAT CT GATT CTTTTTGTTT C T AAG AGTTG AT AT ATCT AAT AG AT A AG G A AT ATT GTAC AG G CTTTT C ATG G G AAAAAAT G G G AGAAGTTCAACAGTGAAAAGGTAGCAGTACTCGCCTATGCCCGAATTCAAGGAAAATCTGC TCTTATTGCTCATTTCCAGAATTCAAGCCTGATGAATGAAGATAAACGGTGCCGTCCTATTC T CTT CCAT ACAG ATGGCCCAAATGCTGGT G ATCCGGTAAAT CAGCTT GTT CTTT AGTTGT AA CT ATTTT CCTTTT G CT AACT AC AAT GT ATT ATG G AACT ACT AT ATC AG CTTGTT CTTC AG TTG T AATT GTTTT CCTTTTGCT AACT AC ATT ACT AC AAT GT ATT ATGG AACT ACT AT AG G ACC AG A CT ACT AAT ATT CCT CT CACT GAT ATTTT ATT CCGT AG AT CT G GTTTT ACT G ATG AT AC ATT AA T GTTTT G G G CTG ATG C A AG T AAAG TG G G AG CT AATT AT AG G CTTT CTC ACTT C AA A ACTTTT TTGCCTGATGTCTAATATTAACATCCATTGGGTGTGGCAAGAAAGAGTTTCTTGAAGGAATA TTT CCCCG ATG ATT ATTTG ACTT ACAC AT AAC AACT AAAG CATT ATGTTT CT G ACTT G AGTT G GTTTT AATGGCT ACAAAATGCAAT CT ATT AGT GTG ATTTT AACT GTTT CT AGCATT AT ATGCA TTT A AC A A ACT G G G CTTTCC ATT A AAAG AAT AT AT ATT GGCTTGC AACCT AATT G T ACT ATT G GTACCTGGTT CTT CC ACT GAT ATT AT AT ACCG AG AC AAAATT 
AACCT AAT CTGTCTCTG C AG GAGCCTTTCCCCTTGGGTAACAATATTAGAGTGAGGCCTGGAAAAATTCGCATTAATGGTA ATGAGGAGAATCGCAGCCAAGGGAATCCTTCATCTTTGGCAAGTGGAGAAGAGTCCGGGA AT G C AAT AG AAT CT ACAT CG AGCT CTT C AAAAAATT CT G ACT G ATTT AG CAT CAT GAT CT AA CAGTT CAAT GTTGCAT GT GAT AT CAACT CCAAG ACTGTAT ATTT ACATT AT CTTTTT GTT CG A TCGAGCAAGAGGAGTTGGAGCTGGTAGGAAAGGGGGCTCAAAATTTTTTCCTATAGAGGA GCCTTGCAAGAGTTTTTGGAAGTTGAGGTACATAACCCGAATGAAGTCACTGATTCTATTGT TTT CCGTT ATTTT CCT AAAATTTTGCATGG AGT ACTGCT ACCAT CCT ACAACTTT AG AG AAT G GCCT AACT G AAGCTT AAAATTTTGGCT AGCT GT G AATGG ACAATGTG ACACTTTGCAGTTT C CTT GTG AT AAT GTG CAT C ATT GTGG GTTT C AAGGGTT CTTT G CT G ATTTT GTTTT C ATGGT C AT CTTTGTT GTT CAT ATGT AATTTT GTT CCCTT AT ATT CCCG GG C ATTGGT AT CCTT ATGGGT TTT GTTGT CT AT AT AATGG ATTTTTGAGGAAAAACATTT AAAT G AAACATTTT CTT CGGTGGT GGTAGT ATT CAG AT ATTT CTGCTT CGCTT GTT ATTTCT GT ATT CTTTT AT CAGTGCTT AT ACA CCT GTT ATGCAT GT CAGGGTT CT AGT CATTG ATT AG AAAAATGCTT AATT CAGCCTT AGTT A CAT G CCACT AGC AAATGCT GTTT GAT AAT AAAG GTT CCT G AGT C ATG ACT AT ATT CTTT G AC AG AG AAAAAAATTG AGT AT AT AT AGCAATTGGTT AG AT AG ATTT CT CT AT AAAC ATTT AAAAG AAAACAAG AAGGTAAAAAT ATT CAGTTT CTT AAT AAACT AAATT CAGTTT AT CCATTT AACTTT TGG AG AAGTTAAATG AAAAG AGTTTCTAT AAAAG TTAAGTGCATATGCAGTAAGTTTGATTA CT ATG AAGTT AT AT AT ATTTT CTTTG AAGTTT G ATG AT CAT AAAGTTTT ATTTT AT AT ATGT AT GTAT ATTT C AAAT G ACT GTTT AG ACG AAG AAAACT AATGGT AC AAATTTT CT GTT C AC AAT C A CATACGTTGGCATTTATTTGTACGCAAGTGACTGGTGAGCTTTATGGCATGACTAGCAGTG GT AC ATTTGCTGC AACC AG ACCT GAT G AAAT G AG ATTTGTTTT CTGT CC AT AG AT ATCTGG C TTTT CTT AT CATG AT AGT G ACT CAT CATTTT CCG AAGTTT CAAGT CT CAACATGTATTTTTTTT T CATTTT AATTTTTTG ACCAT AAAGTG AAAT CT GTTG AAAAAGT AGTGCAATGGT AT CT ACAA TTCCAATATATGTGTCTGCGCAGTGCGCACAAACTTTACAAAACTAACCAGTGAGACATGAT TTTGCACTTTTGCCTTTTGGTTT CAAG AGT CAAG A
SEQ ID NO: 15: OML4 promoter sequence
C ACC AAACC AT AAAAC AT ATG AAT GT GTGC AATT AAATT ATTT C ATTGGTT AGTAG AT AT G C A AAAG AAAAAC AGTT CC ATT GT G AAAAAAAC AGTTT CAT C ATTT G AAT AT AAC AAATTT GAT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT A ATG ATTG G A AT AT ATTT A A A A AT A G AGT AAAT ACTTT CTT ATT GTT CACAAAAC ATT AT CTT AG AAG AAAGCATTT ATGG AAATTTT TTTT AG AG AAAAT ATTG AAAGTAGCCATT AAT CAAT AG AATT GT AT AAAATTTTCCT AAAT G A AATTTT AC AAG AAT AT AG AAACCC AACTT CCC AT CTT AAGCT C AAT AC AC ATGCCT G CAT CC AAT AT AAAG AG G G G TTT C A ACTTGTT GTTTTT C ACTGT A AAA AAA A A AG TTTT AAG CAT ATT A TT ATT AT AAT CC AC ACT ACT CCCTGC ACCT ATTT ATT AT C ATTTTTTT AAG AT AAG ATT GT AAT T AAT AAATTTGTT AAAT AG AATTT AATG AATT AAAAGTT AGT AATT AATT AT CATTTTT CTT CG AG AT AAG AAT GT AATT AAT AAATTT GTT AAAT AT AATTT AAT G AATT AAAATT CAAGGTATT AA T AAATT AC A ATTTT AA AA AT AATT A AAATTTT C ACTT AAAT AG T AATTT C ATT ACC A A AT ATTTT AAGG ACACCAAT AAAGTAAAT GT AT CT AGT AATT ACTT AAAAAT AAAT ATTT GTTT AATTTTT A ATG AT GTT ATTTTT ACT AAATT CCT AAT ATGGT AAAAAAAATT CACTT AAT ATTTTT CT ATTT GT TCATGG ACTAAT AGTTTTT CTT ATGTACATTGGTAGTAATG AGGATCG AAACC ATCCATCTA CT AT AAAACCCTTGCT AT CAG ATG ACCCT AGTG AATT AG ACT AAT AATTT AAGCCAT CTTT AG TAGTAAATT AAATT AAATT AACTT ATT AT ATACATGCTT AT ATGCTGAAAACAAT AAT AAATCA CAT GTG AGT AAAT ACAT G AACAATTT ATTTT AAAG ACAA ATTTTTTTT CTTTTG AAT ACGGGT ACGTT ATG AG AAT CCATTT AAAACATTT AT AT AACCATT ATTTT GT CGCATGCAT AT ACCATT AT GT AT CCTGGG ATTTT AT G ATGCACAAT AACT AT AACT ACAAAATT AAAT AACT ATGG AATT TT AAAAATT AAAT AAC AAAT GTTTTT C AGTT AT ACCTT AAAAT G CT AAAT GT GTATTT AAAACT AG G AAAAT AAAAT AGC AAGT ACTT AG ATT AATT AGTGTGGGTT ATTT C AATTT AT CCATT AAT CTT G AATT AAAAT CT AAAAGTAT AT AATT GT ATT AAAT ATTTTT ATT CAAG CAAGT G CAT CT CT TGCAT GT GTTTTTT G AAAT CAAT AT ATTTT CATTGT CGG AAACTAAAAACTTT G AAACT AAAA AG AC AACTTT G AC ACCG ACT CT C AAAG CTT AAAG ATT AT C AAAGT G ATTTT ACCT CTTT AAAC AAAAT ATTTTT CAT AC AAAT AAC AC ATTT G G AAAACC 
AAAAAATT AAAAAGT AT AAT AAAAT C A CC AGT AATT CAC AAT AAAAC AT AG AT ATGTGTATAG AT G AATT ATT AC ATT AAGCT G GT ATT C AAGC AAC ACT AAAAAAAAAAAG AAAC AG ATT CAT GTTG GAT AAAAT C AAAAAAC AAAACG AA ACG GTTT C ATT ATTT G AAT GT AAC AAACCTG AC ACCT GAT ATT AT AAT AAT AAC AAC AC ATT A AATT AT ATTTTT AAAAT ACAAAT ATT ATTTTTTTGTGAAT AT ATT AAATTT ATTTGAAACCATTA T CTT AG AGG AAACATTT ATGG AAAAG AGTTTT AAAAAAAAT ACTG AAAAAGTAACT GT AAT AA AAGT AT CCATTAGT G AAT AAATTTGCAT AAAAAATT CCCAAACAG AATTTT ACAAG AATTT AG AAAGCCCCACTTTCCAAGTTTCCATGTTAAGAACACACAGACTTGCATCCAATAAAGAGAGA AGCTT AT CTT GTT ATTTT CCACT GT AAAAAAAAAAAAAAAAC AAGTTT AGTTT G AAGCTGCTT ATT ATT AAAAC AC ACT ATTT CTG CGCCT ATTT ATT CTCCCTCCT C ATTT CT AGTTTT CATTTTT T AATT ATT ATT ATTTTT ATTTTTTTT GTTTTT GTTTT C AAAGCC AACT AAC ACCCTTTT CCTTT C ACTTT CTCTGT CAG AAG ACT AAAAAAAC ACCT CT GTTGTT G CCTT GT CCGTTT ATTTT CT CT C CAAACCAG AG AGCG ACCGCCGGCG AGT CT CACCT CGCCGG AAATT CGCT CTT CCT CGCCG G AACCT CC ATTTTT CCCTTT CCT ATTGCG CTTGCTTTTT CTT CCC ACCG GCT CCTT C AAG AG GAGCCGTTTCCGTGCAATAGTTTTGTTGCTTCTTTTTTTTTTTCGTTTTCTGAGCGCCTGAG ATTGCAATGCAGAGGGAAGGAGGTTCCTGTCGCGGCAAGAAGCGTGGGATCTCTCGTTTC T G AAATT CCT AT GAT GTGG AG CGTTTG AG AG ATT CGTTTTTTT CTT CTT CTTTTTTTTGT CT C TTTCTCTTTGTTCTCGAGTTTCTGGATGAGGAGCGCGCTGAATTTTGTGGAGGAGAAGTTTT TGTGCTAGCGAGGGCGTGTTGAAATTCAGAAGGTGTGAATTTGTTTTGTTAGCTTGAGAAA AAAAAAGTGCT AATT ATGGTTT GT GTTT AT GT GT GTTT CTTTTTGCTTTTTTTTTTTTT CTGGT T AAATGGTTTTTT CT CTTT AGTGG AAGTGGTTTTT AT G AGTTT ATGG AG AG ATG AG AAT CT CT TTGGTGTTCATGTTTTTGATCTGCTGGTGTTTTCAGATTATGGCCTCTCTGAAATTTTGTTCT TTT GTTTTT ATTTTTT CT CTGG AT CT GTT AGT GTTTG AGGCTTT CCCTGTG AAATT CCCT CT C CAT AT GT GT AATTTTGT GAT AG AACCAAGGTGT AGTG ATAT AAATTT AAGTT AAATGCTT GTT TTTTTTTTTTTTTTTGGTTTGGG AAAAAAAGGG AG AGTT GTGGTT ACAGTTG AATGGG ATTTT ATTTTT GTTTT GTTTT AATTT CCATGG AT AGTTTTT GTTTTT AATTTTTT AAGTTTTTT GTT AAG C AAATGGCCT AAT AACCG C ATT AG GTTGTT AGT AGGT AAG CAT CG AG CTT CTTT CTT CT CCT AAT AC AT CCTT CCTT CTGGT 
AAT GTAT AGTTG AAG AT G C AACT ATTTGT G GCTTT GTTTT CCA CTGCT CTTTTT AAAATTTGCAATTG AAT ATTTG AAGTGCTTTGGT AGGGTTTT CATT AGT CCA TTTTTTTGTCACTTTTTTTTGTGGTGGTGATGTTAGTAAAGTTCCAATTGTTATGATAGATGA T ATTTTT CTGGGT CT G AATTTTT CTTT CTGCCTGT AGCAGTT AT GT ATG AT AT G AAATGCG AT GTT CATT GTT AT C AGT CCTTT CCT ATG AC AAGG G AAT GACCTTG AATTT CT CGC AG AT GTCG CACATAGAGCTTGACGAAAACTAACAAGGAAGGAGGTTTTCAGG
SEQ ID NO: 16: GSK2 amino acid sequence
MASLPLGHHHHHHKPAAAAIHPSQPPQSQPQPEVPRRSSDMETDKDMSATVIEGNDAVTGHII
STTIGGKNGEPKETISYMAERVVGTGSFGVVFQAKCLETGEAVAIKKVLQDRRYKNRELQLMRL
MDHPNVISLKHCFFSTTSRDELFLNLVMEYVPESMYRVIKHYTTMNQRMPLIYVKLYTYQIFRGL
AYIHTALGVCHRDVKPQNLLVHPLTHQVKLCDFGSAKVLVKGESNISYICSRYYRAPELIFGATE
YTASIDIWSAGCVLAELLLGQPLFPGENQVDQLVEIIKVLGTPTREEIRCMNPNYTEFRFPQIKAH
PWHKVFHKRMPPEAIDLASRLLQYSPSLRCTALEACAHPFFDELREPNARLPNGRPLPPLFNFK
QELAGASPELINRLIPEHIRRQMGLSFPHSAGT
SEQ ID NO: 17: GSK2 nucleic acid sequence
ATGGCCTCCTTGCCCTTGGGGCACCACCACCACCACCACAAACCGGCGGCGGCGGCTAT ACATCCGTCGCAACCGCCGCAGTCTCAGCCGCAACCCGAAGTTCCTCGCCGGAGCTCCG ATATGGAGACAGATAAGGTACTTCCGCTCATTGTACTCTTCACGAACCCTCGGAGTGGTTT CCGACTTTCCGGAGCTCCGATCTCCGTCGATTCGCCTCGAAGCTCCGGCGTCGCCGGAGT TTCGACCGATCTACCGGTTTTCCGTGCTCGCCAGAGATTTTCTCCGGCGACGCCGCTGAT CGGAATGGTTATTGTTTTCTTCGAGAGCGATGTTGATTCTCGTTGACGAACTCCAAAAATAG AAAAGAAAATTAGGTTTTACTTTTTTGGAGTGTGTTTTGGTTGATGCTTTTTTGGTAGGGATC TT AAC ACT G AAG AAAAAATT AG AATTTT CT GTTTT AGGTGT CG G AG AAAAGG AAAG G AAT C A AT GTG AAAAT GTGG AATCCT GTGCTTTG ATTTTTTGTTT CCTTT AATT CAAGG AG AG AG ATT C T GATT AGGT GT ACTT AGCT G ACCTG AGTT AAC ATT CTT ATTT C AC ATT CT AAC ATTTTT ATGT TT CTTT CACTT AT CT CT AAT CT ACT GTT AATTTT CTTT AGCT AT GTT AATT CTGTGCT ATT AT A TGGTCTATTATGGGGGTATAGTTTTTGTTTACATTTTTTGGGGTTTGTGTGTGTGATTTCCTT T ACTT CCCTT GTGGT G G ATT GTTGTT C AAAAGGT C AAACG GTT AT AATTT G CTTTGCTT C AG GGAATTAGTGTCCTT AGATTCTCTCTGTATTGTGTCTT AAGTT AT AGCGTTGAAGTTTTTCTT TATGCTTTCTTGTGAGCTGCGGTTACCTGATTTAACTTTAGTTTATGTGTGTGCTCTTTGAG CT CTTT ACACTTTGCCTTT CTT CAACTT CACATT CT G AACTTT GT CTG ATTT CTT CTGGT AAC CCTCTGGTT CAT AT GTTT CATTGCCATGC AATTTT CTT CT CAT AAC ACTTGTTT C AACC AGT A AACT GT C ATG AG AT ACCCCCTTT CCTT AT ATTTGCAT CTT CT C AAATT AACTT C ACT GTCTAT ATGCATGTTTGTTGCCATGTCTGGCACGGCATGCGTTTGATAGTTGATAGGCACATGTTGT TGCC AT ATTTT GTG G ACGTTGCT AAAC AAAATT ATTG AT G AC AAT ATCTGT AAAG CT AAATTT AAAT ATG ATT GAT ATT GT AT CAAT AAAAAT CT G ACATT CAAGT ACT GT ATT AGG AG ATTTGCT TT ACTGC ATT AAAT AT CT AATT CTT GTT AT AAGTTGC AG G AT ATGT C AG CT ACT GT CATT GAG GGGAATGATGCTGTCACTGGCCACATAATCTCCACCACAATTGGAGGCAAAAATGGGGAA CCT AAAG AG GT G AG AATGTGTT CT AACT CCC AACCCCTTT CCT CCTG AACTT AC AATTTTT A TT AAAAAATT CATTT CAT ACCCT CAT AAAT AT AT GTT ATT GTAT AT ACT GATT ATT GTTTTG AT AATGGTTCACTTCCTTATGGGGATAGAGTGGAAGTAGAGTTAGTGGTTGGGGAATCTAAAT T AAT AAATTGCCT ATT AT ATT CAAGG CCTT CCAAAG AT AT AAT ACTGGTT GT CAAT CAG AGTT TGGCTT ATTT CCCCAGTGCTT ACCT CAT CTG AT AATTTTT ATT 
CGCCAACAAT AT ATT AGCT C TT ACAAG AT GT AT AATTTTG AAG AATTT AATT AT G ATGGTT CAAT AAATTT ATGCTT AAAATT G GTGATATTTATCTGAGTTTCCTTATGTGGGCTTGGTTGAAGGGGTGGGAAAGGGATTTCAA TGTCCCTTTTCTCCAGTGGTCCTCAGCAATGTCCTTAGCTTTT ATTT AATGCTTCTTGG AAG GG CAGG GTTT GTGGTTTGTT CTT GAG AT GTTT GTT AT GTTTTT AC AG AAGT ATT AT ATT CT AT GT AT CTTTT AGT ACT ACTGGTACTTTT CATGCATT AAT AT AT ATT AT CTTTGG AGTCCAAAAA AAAAT AAAATTT ATT CTTCGTAAAT AACTT ATTT GTT ATG ATT ACTT CCAT GAT ACC ACCTGCA GACCATCAGTTACATGGCAGAACGTGTTGTTGGCACTGGATCATTTGGAGTTGTTTTTCAG GT ATGG AT G AACAAT CACCT AG AT G ACAAAT ATT CCT ATT AAGCTTT CT CTGCT GT CACAT A TCTCATTGTTTTCCACCCCTGGATGGCATTCCTCTTTT ACCT AAAAT AT AGGCAAAGTGCTT G G AG ACTG G AG AAG C AG TG G CT ATT A AAA AG GT CTTG C A AG ACAGGCGGT AC A AAA AT CG TGAATTGCAGTTAATGCGCTTAATGGATCACCCTAATGTAATTTCCCTGAAGCACTGTTTCT TCTCCACAACAAGCAGAGATGAACTTTTTCTAAACTTGGTAATGGAATATGTTCCCGAATCA ATGT ACCG AGTT AT AAAG C ACT AC ACT ACT ATG AACC AG AG AATGCCT CT CAT CTATGTG AA ACT GT AT ACAT AT CAAGT ATG AACTTTT CT ATT CT GTTTGG AATTT AGCT CAT GT GTTGTTTT ATAACATTGTAACAATCGAGTTTGGATATGATGTTTAGATCTTTAGGGGATTAGCATATATC CATACCGCACTGGGAGTTTGCCATAGGGATGTGAAGCCTCAAAATCTTTTGGTATGCTTTC TTT CAAT G CTTTT CCT CT AT G AGTT GT ATT C ATTT ATT CT AAAT CTT AACCTTTT G CAT AT AT G ACT AC AG GTT CAT CCT CTT ACT C ACC AAGTT AAG CTATGT G ATTTT G GG AGTGCC AAAGTT C TGGTATGTTGGTCTGCATTGTTCTTGTACACATCATTGCTTCATGTACATATGCCACCATGA TAATGGAGGACTACTAAAATCAAATTCTTCCTACCGGACATAGCTATGCTAAAACTTGTATA AG AT CTTT CCATAAATGCAAT ATT G ATTT AACCT GTTT AT GTG AT G ATTTGTT ATT AGT AAAT A GC AATT G AAGT G AAAATG AT G CC AAG AAT CTT G ACT CT G ACCC ATTTTTT CCT ATT AT ATG A AAAAT AAAT AG A AG A AAAG TT AT CG ATT G G CATC ATG T G G ATTTTTT ATT C A ATT AT C A ATTT C ATG AAGCTT CT CAT GTT C ACC ACTTGGT AG GAT AT AGTT AT GAT ATT ATTTTT CC AC AAAAA ATTT ATATCAGGTCAAGGGTG AAT C AA AC ATTT CATACATATGTTCACGTTACTATCGGGCT CC AG A ACTAATATTTG GTG C AAC AG AAT AC AC AGCTT CT ATT G ATATCTG GT CAGCT G GTTG TGTT CTT G CTG AACTTCTTCT AG G 
AC AG GTT AT AAATTT CTG G AAAT CT ATG C ATT A ATGTT G TT GAT ACTT AAG ATTTTTTTG CTTT CTTT CTGGG ATAT GTT AT ATTG ACTT CACGT AGTTT CT A ATGTTTGTATAGCCATTATTTCCTGGAGAAAACCAAGTGGACCAACTTGTGGAAATTATCAA GGTGATGTCCCTTCTATATGAGTGTCTCCATGGATTGCAGAATATATCTGCAGAGATAATTA TTTAGATGTCTTCTTGTAGGTTCTTGGTACTCCAACACGCGAGGAAATCCGTTGTATGAACC CAAATT AT ACAG AGTTT AG ATT CCCT CAG ATT AAAGCT CAT CCTTGGCACAAGGTAAT G ACA TTT CT CAT CCAT CCT CCTTTTG AT ATT CAT CACTTGCCATTGG ACTTT AAAATGGGG ATT AAA AAG ATG AAAAAAT AGTT GT C AAAAT CAAATT CAAT AG CAT G CGTT AC AAGTT AC AACT AG GT TTTTG AG GTT G CTTT CCAT ATT CTTT GTTTT GT AATT GAT G AGC AT GAT AG C ATTG ATTG AT G T AACCT ACT ACCT C ACT AAT AG GAT AAAG C ATTGGCT AGT G AATT AT G C ATTT ATTTTGGT G CTCTATGTTTCAGGTTTTCCACAAGCGAATGCCTCCTGAAGCAATTGACCTTGCATCAAGG CTT CT CCAAT ATT CACCTAGT CT CCGCTGCACTGCGGT G AGT AGG AT G AACT AT GAT ACCT CCCTT CACTTTT CCCCTTT AAAT AAAAGG AAAACAT ACACAGG AAAAAGTTTGCTT ATTTT AA CCTT CTTGCT GTG AT ATT AT AT CT AT ATT CCTT ATGG CAT GTTTTTT AATTT AGTT AACT CAAT TG G CTT A ATTTT CACTGGTGG CTTTT ATT ATTT CAGCTGGAAGCATGTGCACAT CCTTT CTTT GAT G AGCTT CGCG AACCAAATGCCCGGCT ACCT AATGGCCGT CCACTGCCCCCACTTTT C AACTT C AAAC AG G AG GT AT AT AT CTT AGTCGT ATTTTTTTT ATT AAAT GTGACGTGT C AAGG C TGTTG CTTT GT CC ACT GTT C ATT ATGTATATCTGT AT G ACT ACTT ACTTT ACC AT CT GTTCTG CATG AT CCAAACCAAACACAAGGG AAT CCAATT AACAATTT CT CTT AT AT CAAAATTGTG AA CAGTATTT AACACC AG AAT ATT AT CTT AAT CATT CTGCAAT G AAAACTT AACT ACAGTT AGCT GGAGCATCACCTGAACTGATCAATAGGCTCATCCCAGAGCATATTAGGCGGCAGATGGGT CT CAGCTT CCCGCATT CTGCCGGT ACAT AG AT GTAAAGGG AT AATG AAACG AT G AGT CAAC CTACATAGTGATCGATGTGAATCAACAGAAGGGCTGTTTGAGGCCTATGTATAACTGGGAG TCCCAACATAATATGCAGTTTTTCCTCCCCCTTGTGAAGATGTATACATGTGTTGGTTGCTC GGTAAAGCTT G AAAGTTGGT GATT CT GT GT AGTATTT CATT CAAGTT AAAGCAT ACTT AT CC CTGCATCTGTAT ATT GTTTT G GTC AG ATTT CAG AAAG CT AG G AG T AT AAAAT G AT AG CAAT C ATGTCTTCATAGGTAGAGGGGCCCAGCTGAATTGAGGGGCCCCTATAGTAGTTTGGCTTTG CTTTTT ATG AG ATT AAATT CAGCAT GT 
CGTTT AT ATT AT GTTT AT AACAAT CT CTT GATT CAAA AC AAG AAATTTT CT CGTTGTTT AAT ACT CT AGTAACCCCGTT CCTT CT ACCC AAG AAG ATTTT GTTTGTCATATGTGG ACAAG AAG AAAGGATTCAATCAAAAAGTTGATTACGGAAGAAAAAAA T AT G AATT CTTT AT GTTG AT G AC AAGG GT GT GTGC ACTT AGG GTG ACTT GTT AACAAC AT AC GTT GAG AT G AGG GTT AAT AT ACTT CGTTG CTAT AT ATT C AATT AT ATTT C ATTT CT ATTT GTGT TGAAGTCTAAGTCAGAATTTGAAGTCACATATGGTTAGGACTTGGGAGCAAAATATATAAGT GAAAAAGAATCACAAACCTAACGCTTTAAGATCATCCACTATGCATATTGAATTGTTTAGAA GCTTTTTCGGTGGTCCTACACTTCACCTCAGATTTAAAAGTTTTTTTTTCCTCCGATGATAAC ATT AAATG AATTGTTT AAAT G AACTT AAAAG ATGTTTTTTTTTTTT AT AAAAAAATTTGGTT AA GCAGCAATT CTT AAATGCT AT GTT AT CCGCT CT AATGGTAAATT CT GTT AAACAAT GTT GTTT CT G AACGT AT AAT AT AATGTAAT C AACG AAAT AAAATT ACT AT C AAT C AAAG AT ACT AG G GTA TT AACAT AATT AT G AGTTG ATTT AGTTT G AATTT AG AAAAAAAACTG ATT AAACTGGTTTG ATT TG
SEQ ID NO: 18: GSK2 promoter sequence
G AAATT ATTTT AAGTAAG AT AT CT ATTT AT AAATT AGGT CCAAATT CACATTTTTT AAACATT A T AAAT AG AATT AT CT ACCTG AACAT AAAGTGTT AAACAATTT AG AAT GT CT AATTTT AAATTT G AAAAGTAAAAAAG AAAATTT ACT C AG AGTTT C ACTT AT C AC AAAATT G ATAGT AAAAAATT AG TTT C AGT GTAT AATTTTT AC ATT ACT G AAG AAAAAAATTT GTG ACTTT AAG AG CT C ATT AT AC ACATTT AAT ATGTTTTGGATTCTAGCTAGTCTGATTT AAT AT AATTT AAAT AT AAAATATTTCA TAATTGTTTGTTATGCATGTTTTGAACACCACTCCTTCCATAAGGGGG AGTTT ACACTGTTC AATT ACTTTT AC AT GAT CACGT C AAGGTC AAG ATT AT GATT CTT AAT CGC AT CC AT AG CT AG CAAG AAGC AAAAGCAGTT ACCAG AGGTTT CTGG AATT CCCAGCTT CT CT CT CT CT CT CT CT C TCTCTCT AT AT AT AT AT AT AT AT AT AT AT G ATT G CC AA AT G TT AC ATTTT G G G G CTT ATGTG A A GTG AT AAT AAATT C AATT G AACGTCCCTT CT CTT CCTTT AGG AT ATTT CTTTTT CT AT AACAT A AAGG AT AGTTT AG AAT ACAAT AT AT AACT ACCT GTTTT AGGTTTT AACT ATT G AAT CGGGT AA AAACT G AAAAC AACT AAT G CT G AAAAAT AAAAT AAAAT CT AAAATTGG AAAATTGG CC AG GT T AAAAAT AAAAG AGGTT AATTT CT AAT CT AT AAATTT AAT GT AT GTTT AG ATTG C AGTTG GAA G AGTTT AAAAT ATTTT GTCT C AAAGT AGT AGTTTTTTTTT CTT C ATT ATTT CGT AC ATT AAAAT TTTT AAAATTT ATTCTCAACCTTTATTCAAATATAGTCTT AT AATTAGTACT AATT AAAT AGTA GT GT CAAAAAT CCT ACACT CAAAT AT ATT CCAATT AAATTTT AAAAAAT ATT ATTTTT CATG AA GTTACGGAGTGCTGTGCACTAATGAGATGAAACCGAGCAAATTATTAGAATATACTACATAG TT ACAATT AT AAT AAAT G AAAATT AAAAT ATTTTTT AC ATT ATTT G G ATATG CAT AT AG AAATT AT AT ACTT AT AT AT ATATAT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT AT CAGTAAT AT AATTT ATT AAACCCAATGG AAAGT ACTT AT GAG AAGG AGCTGTAATTTTTT ATT TT ATTTT CAAAGT ATTT CCCAAT AAAT AAAT AT CT AAGTAAAAG ATT AATT AATT G AAAAAAAT AGTATGCATG AGTTTT ATT AGTG AAT ATTTT AAAAATTTTGGTT AAAAAGT ACTT AGT AT ATT G TTATGAAAT ATTTT ATTTTTCAACTAATT AAAAT ATTT AAT AT AACTGTAT AAT AT ACAT AT AAT CTT AT GAT C ACTTGTT AAAAAG ACT C ATT C AATTTT AAAT AG AT C AAACT GT AC ATTTG ATTT A AT GTT CATT CTT ATTTT ATTT CTT AAGTT G ACAATT CAT AACAAAGTCAT AAATGCAT AT ATGT AGG 
ACAGCGTTTT CATTTT G AAT G AAT CAATTTT CTTT AAG AT GT GTTT ATTTT AATT ACATTT CTTT CTTT CTTTTGT AAG AGGTTTT CAAAG AT GTT CAT ACT AT ATT AACTGCGT G AACC ATGC AT CG G ATGTTT CGTGTT C AC AAT G ATTTTT AATG AAT ATTT AATT AAT AAAT AAAG AAAAT AT C AAAATGTCTTTT AACGTCAT CAAACGTT AAAT AT AT AT AT AT AT AT AT AT AT AT AT AT AT ATTT A T CAT AAAAAAT CAAAAT ATTT ATT AAG AAG ATT AAAT AT AAAAAT GT AAATTT AT CAT CAATTT GG ACTTG AGTT ATG AAGCACT ACCTTT CGTTTT AAATT CT CT AG AT AAACT GTTTT ACAAAAT ATTGCATGCAAGTACGTAACATTATACGAACATCGAATTTGCTTCGGTTCTCCTGCCCCTTA CGG CACC AG AT C ACT G CT CCCTTT CAT C ACG ACCCT GATT CGCGCGTGCTCT C AAT CTCCC T AAACT CG CG T G A ACT C ACT CTTT CTCTCTTCTT G AAC A A A AAC AG G G CAAG AG AG AG AG A AGAAAAACGAAGAAAGGTAATAGAGAGAGAAAGGGAAGAGGAGAGAGAAACGAAGAAGAA G AGT GTTT CT C AC AT C AC
Maize
SEQ ID NO: 19: OML4 amino acid sequence
MPFQVMDPRHHLSQFTNTTVAASSFSEEQLRLPTERLVGFWKQESLHHIGSKSVASSPIEKPQ
PIGTKTMGRVDPQPYKPRGQKSAFSLEHKTFGQERHVNMPPSLWRADQDPYVQSDSSLFPD
GRSTNPYEAYNENGLFSSSLSEIFDRKLGLRSNDVLLHQPLEKVEPTHVDDEPFELTEEIEAQII
GNILPDDDDLLSGVDVGYTAHASNGDDVDDDIFYTGGGMELETVENKKSTEPNSGANDGLGSL
NGTMNGQHPYGEHPSRTLFVQNINSNVEDSELKVLFEHYGEISNLYTACKHRGFVMISYYDIRS
SWNAMRALQNKPLRHRKLDIHYSIPKDNPSGKDINQGMLVVFNVDPSVTNNDIHKIFSDYGEIK
EIRDAPQKGHHKVIEFYDVRAAEGAVRALNRSDLAGKKINLGTVGLSGVRRLTQHMSKESGQE
EFGVCKLGSLSTNSPPLPSLGSSYMVAMTSSGRENGSIHGLHSGLLTSMSPFREASFPGLSSTI
PQSLSSPIGIASATTHSNQAPLGELSHSLSRMNGHMNYGFQGLGALHPHSLPEVHDGANNGTP
YNLNTMVPIGVNSNSRTAEAVDCRHLHKVGSSNLNGHSFDRVGEGAMGFSRSGSGPVHGHQ
LMWNNSNNLQRHPNSPVLWQNPGSFVNNVPSRSPAQMHGVPRAPSHMIENVLPMHHHHVG
SAPAINPSLWDRRHGYAGELTEASSFHLGSVGSLGFPGSPQLHGLELNNIFSHTGGNRMDPTV
SSAQISAPSPQQRGPMFHGRNPMVPLPSFDSPGERIRSMRNDSGANQSDNKRQYELDVDRIM
RGVDSRTTLMIKNIPNKYTSKMLLAAIDESHKGTYDFIYLPIDFKNKCNVGYAFINMTNAQHIIPFY
QTFNGKKWEKFNSEKVASLAYARIQGKTALIAHFQNSSLMNEDKRCRPILFHSDGPNAGDQEP
FPMGTNIRARSGRSRTSSGEENHHDIQTVLTNGDTSSNGADTSGPTKDTE
SEQ ID NO: 20: OML4 nucleic acid sequence
ATGCCATTT CAAGT CATGG AT CCG AGGCACCACCT CT CCCAGTT CACCAAT ACAACCGTAG CTGCGT CCT CCTT CT CT G AGG AGCAGCTT CGCCTT CCCACAG AGGTAAT AAT CTGCAGTT G CAGAATTGTTGCCCTATTTATTGTTTTCTGTTTTTGTTAGTTTATGATAAGGCTAGTGGTGTC TTT ATT GTTTT AGTT CAT GTTTG AT ACCT ACC AT GTTGT C ACTCG ATTTT CTGG AT AT CT AT G ACATGCACTAATTTTTTTAATCTATCTTTGCAGAGGCTGGTGGGTTTTTGGAAGCAGGAGTC GTTGC AT C AC ATT G GT G AGTACTT AATTT GATT C AAT ACCCCTT AG CTTTTTGCT C ATTT CCA T G C A AAG A AT G CT CTTT G G CTG C A AAA AT CC AC ATGTT ATT G CG G G G AA ATTTT GTG C ATTT AAT AAC ATTTT ATG CGT G ACT AAG GG CT AG TTT G AAT CC ACT AG AG CT AAT AATT AGTTGTC TAAAAAATTG CTAGTAG AATT AG CTAG CT A AC A A AT A ACT AG CT A ACT ATT AG CTAATTTACT AAAAAT AGCT AAT AGTT CAACT ATT AGCT AT ATT GTTTGG AT GT CT AT AG AGCTAATTTT AGC AG CT AACT ATT AT CT CTAGTG C ATT C A AAC AG G G CCT A AAT AAC AT AA AT AATTT GTTT G CTT GT G ATG AAT AT G ATTTT AG CTTTTT ACCCT AACTTT AT C AG G AAT AG AAGGCTTT GTTTTTGT TGTTGTCGT GTTGG AC AT GTTTT ATTGC ACTTTT C ATTT GTT GTTT ATGT ATTT AT AGT CT C AA GCC ATTGTTTT GTGTT CACCTTGG GTTGC AAT G G ATTT AT G AC AT ATTT G ATGT CC AG GTT A T CCTTT GAT ATGCAATTGGTTGCGCT AGTTT CT ACCTT ATT ATT CCTTTTT ACTT ATTTGGCA CTCCTGTCGT ACT CT CT CTTT GTT CT C AC AAT G GTT CATGC ATTTT GTTGTT CATT AT C AAG A TGT CTT CT C AAAGG CAAG CT GTTT CT ATT GTTGT C AGG G AGC AAGT C AGTTGC AT CTT CTCC AATT G AAAAG CCCC AACCC ATTGGT AC AAAAAC AAT GGGTCGGGTAGAT CC AC AACC AT AC AAGCCGAGAGGCCAGAAGTCTGCATTTAGCCTTGAACACAAAACTTTTGGTCAAGAGAGGC AT GTT AACATGCC ACC AT CTCTGT G GAG AGCT GAT C AAG ACCCTT ATGTT C AAT CT GATT C A TCTTTATTTCCCGATGGAAGGAGTACTAATCCATATGAGGCCTACAACGAGAATGGGCTTTT CT CAAGCT CCCT GT CAG AAATTTTTG ACAG AAAAT GTG AG ACAGCTT ACT CTGGCACTTT CA TCAACTTCATTAGAGCGATTGATTATACTGCAGTGAGCCTGCACCATGAGAACCATTCTCTT CAT CTT AG AAAATGC ATT G AACTGTAT C AC AC ATT CC AT AGTATGT ATT GTGTATGTGTGTG CCTTGAAATCAACAGAAAGGAATAAAAAGTACAATAAAGGATATTAGTGAGTATGAATGGGA AG A AAA A AT A AAA AAA AT ACTT AAC AT ATTTTTTT AG C ATTTTT G C ATCTT ATTTT CG AAG 
G AA CCTTACCTGCTTTATTTTTCTTTGGCCCAAGAATCCTTTCACTTAAGTTTGGTATCGTTATCC TTTTATTTTCAGTAACACTTTGTGCAAGATTTGGGCAGTCAGACACTCCGATTAAATCATTG CT ATT GT AGTAAGCAAT ACAT AATT CAT ATTT ATTGCTTTCT AACAAATT AT ATGCTT CAAT GT GTAGTGGGACTGAGATCAAATGATGTGCTTCTACATCAACCACTTGAAAAGGTTGAACCAA CTCATGTAGATGATGAGCCCTTTGAGTTAACAGAGGAAATCGAGGCTCAAATAATAGGAAA CATACTTCCTGATGATGATGATCTACTATCAGGTGTTGATGTTGGGTACACAGCCCATGCTA GCAATGGTG ATG ATGTTGATGATGAT AT ATTTT ACACTGGAGGTGGGATGGAACTGGAGAC CGTTG AAAAT AAAAAAAGTACAGAACCTAACAGTGGAGCTAATGATGGTCTTGGGTCGCTA AATGGCACAATGAATGGTCAACATCCATATGGGGAACACCCTTCAAGAACTCTTTTCGTCC AG AACATT AAT AGCAAT GTTG AGG ATT CTG AATT AAAGGT CCT ATTTG AGGT ATGTT CCTTTT TT CT GTTTT CTGCTT AAACCT AT CGTT CCT GT ACAG AACATTT GTTT CTG AAAAT CATTT ACT CTTTACCCACAGCATTATGGAGAAATCAGCAACCTTTACACTGCCTGCAAACATCGCGGTT TTGT AATG AT AT CTT ACT AT G AC AT AAG GT CAT CATGG AAT G CC ATG AG GG C ACTT C AAAAC AAGCCACTAAGACATAGAAAACTTGACATACATTACTCCATTCCGAAGGTATTCACGAGTCT TACTGGCTTGATGTGTAGACATATTTTGCCCAAGGATGCCAGTATGTAGCTAGTTTACTGTT AT CAGTTTT GT AGTT CTT GTGCT AATTTT CACCTTTTTT CCCTT AGG AT AAT CCTT CGGGG AA GG AT ATT AACC AGGG GAT G CTT GTT GTATTT AAT GTTG ACCCGTCTGT AAC AAAC AAT G ATA TCCAT AAG AT ATTTAGTG ACTATGGTGAAAT AAAAG AGGTATGCTATGCTCTTACATTAACTA CCT ACT ACATT AT AACT AG AACT AT AAT GT CTT AAATT AATTGC AG ATT CGTG ATG C ACCG C A AAAGGGCCATCACAAAGTT AT AG AATTTT ACGATGTCAGAGCAGCTGAAGGTGCAGTTCGT G CTTT AAAC AG GAGTGATCTTGCTGG CAAG A A AAT A AATTT GGGGACTGTTGGTCTGAGTG GTGTT AG ACGGT AT G CCTTT G AAAT GTTATCCTG CT GTT CATT C AC AT ATTT C AGT AAC AAT A CTT ATT ACTTTTGG ACAGTCCAT ATTT AACT GTTG AT CATTT GAT CGTG ATT CTTGCTT AGGC AT CTTT G GTATATAGT ACCAT C ACTT ATT CTATATG ACG GTACCTGTCG AT AG AAT G CAC ATT AGTT GAT CTGG ATTTTT ATTTT CTTTT CT CAAGTGG AAAAT CT CTT CCTGG AGCT GT AAACAT TGCACTGTTTTT ATTTT GT CATGCAT AG AT AGTTG AT CTTT GTTT CTTT ATTT CT ATGTATGGG CTCTG ATGTCCT AC AC AAAAC AG ATTTTT GTTTGTT CTTT CAT ATT GT AGT CTT ATT CTATGTA TTGC ATTT AG GT GTATG G ATATAT ACTT AGT AT GTT 
AGTT AT CT AAGT CAT CC AG AAAAAAG A GC AATT ATT AT GTG AC AAC ATT CT AATTTT G ATTTT ACCGTG CAAACTTTTG AAAAC ATT G GT TTT AAT C ACT G CT CT AACATTG ATTTT AATGTT GTTTT AT AAC AG ATT AAC AC AG CAC AT GTC CAAAGAGTCGGGGCAAGAAGAATTTGGTGTATGCAAACTGGGCAGTCTAAGCACAAATAG CCCTCCATTGCCTTCATTGGGTATGCTGTTGGTTTTTTTCATCTTTAATGTATGTCATGTCTA T AG CT AC ATTT CCT G AC AT G G AGG AT AATT CTT C AAGGTT CAT CTT ATATG GT AGCC AT G AC ATCTTCTGGCCGTGAAAATGGGAGTATTCATGGTTTGCATTCTGGACTGCTCACATCAATG AGCCCGTT CAG AG AGGCTT CTTTT CCGGGCCT AT CAT CT ACCAT ACCACAG AGCCT GT CCT CT CCC ATT G G AATTGC AT CTGCT AC AACT CAT AGT AAT C AGG CT CCCCTT G GTG AG CT C AG CCACTCACTTAGTCGGATGAATGGGCATATGAATTATGGTTTTCAAGGCTTGGGTGCTCTT CATCCCCATTCTCTTCCTGAAGTTCACGATGGAGCAAATAATGGCACCCCGTACAATCTAA ACACCATGGTACCAATTGGTGTGAATAGCAACTCAAGAACAGCCGAAGCAGTTGACTGCAG AC AT CTT CAT AAAGTG GGTT CT AGC AACCT C AAT G G AC ATT CATTT G ATCGTGT CG GTG AAG G AGGT AAGTTT GT AAATTT G G AC ATT CT AAT CT CC ATTTTT AT GTTTG AACCC ATT GT CATTT CT ATT CCTT AAACATGTGTTTT GT AAT AAAGCT GTT AGGTTT AT CAGG ATT GT G AAAACT G AA CT GT G AAAATTTG AT CAATT AAT GT AT GTT ATTT AACT GTT CCGTT CATG ATTGCAT CT GT AA CAAATTTTGCAGCTATGGGATTTTCAAGAAGTGGAAGTGGTCCTGTCCATGGTCACCAGCT AAT GTGG AAT AATT C AAAT AACTT AC AACGT CAT CCC AATT CCCCTGTG CT GTGG CAAAAT C CAG GAT CATTT GT AAAT AAT GTACCGTCTCGCT CCCC AGC AC AAATGC ATGG AGTT CC AAG AG C ACC AT C AC ACAT GATT GAG AAT GT CCTT CC AATGC AT CAT CAT CAT GTGGGCTCTGCG CCAGCAATCAATCCATCACTTTGGGACAGGCGGCATGGCTATGCAGGGGAATTGACAGAA GCATCAAGTTTTCATCTTGGCAGTGTTGGGAGCTTGGGATTTCCTGGTAGCCCTCAGCTTC ATGGCCTGGAGCTAAATAACATATTTTCTCACACTGGTGGGAATCGCATGGATCCAACCGT GT CTT CGG CT CAG AT C AG CGC ACC AT CT CCT C AAC AG AG AG GT CCT AT GTT CC AT G G AAG GAATCCTATGGTTCCCCTTCCATCATTTGATTCACCTGGTGAGCGGATAAGAAGCATGAGA AATGACTCAGGTGCTAACCAGTCTGATAATAAACGGCAGTACGAGCTTGATGTTGACCGCA TAATGCGAGGGGTAGACTCACGAACTACACTGATGATAAAGAATATCCCAAATAAGTATGTT TT GAG AT C ACC AAATTTT ATGCT AC ATTT AT GTTCTGTCT C AAT AT ATT CTTTT GTTCTGGTTG 
GTTCTTTCGGGTTTCAGGTATACCTCCAAGATGCTCTTGGCTGCTATTGATGAAAGTCATAA GGGCACTT ATG ACTTT ATTT ACTTGCCAATT G ATTTT AAGGTAGTTTG AAACTTTG AATTT AA CT CAT AAG CG ACCG GG GCCTT GT ATT AGTTG AG ACT ACTTTT GTGTT CAT GTT ACT AAAT G A GAT CAAT CTCCTTTT CAG AAT AAAT GT AAT GTTGGCT ATGCTTT CAT CAACATG ACC AATGCT CAGCAT AT CATTCCATTTT AT CAG GTC AG AAA ATT ATT CCAATT G ACG AAGTGCT ACTGCAT TG ATGT AAAGTT GT AAACT AG CCTTTGGT C AACTT AT AT G CCTTGCC AAATTT GT ACTTT GAT AAAAT AT CCGGCTT G AAC AT CG ACGTG CTATCCT G AGCC ATTTT GT CAT CTTTTT CAG ACTT TTAATGGTAAAAAGTGGG AG AAGTTT AACAGTGAGAAGGTGGCATCACTTGCTTATGCTAG AAT CC AAG GG AAAAC AG CT CTG ATT G CT CATTT CC AG AACT CT AGTTT G ATG AAT G AG G AC AAACGTTGCCGCCCCATACTCTTCCACTCAGATGGTCCTAATGCAGGAGATCAGGTATGCT T ATTT CTTTTTT ATTTT GTCGTTGGT ACTTT CCCTGCTAT CTTGTT CT CC AGTT AC ATT ATGTT T CGCTGC AGT G C ACT GT G ACG AGT CTT CT AT AT AAT CCAT AT ACCTT G AAT CCTTG ATGG G GCTGATGGCAGATAAAAACATAGGTTTTGTGAAAATAAAATGGGGGGAGGTAAATGTCCAC CTGCCATTTTTGCTGCATTAACTGCCCTGTGACAAGACTTCTCTATACCATCGTACAAAGGC CCTGTTTGAATGCACTAAAGCTAATAGTTAGTTGGCTAAAAAGTTTAGAGAATTGGCTAGCT AACAAAT AGTTGGCT AACT ATT AGCTG ATTTGCT AG AAGT AGCT AAT AGTT G AATT ATT AGC CAG ACT GTTT GGATGTCTGCAGCT AATTTT AG CAG CT AACT ATT AACTCT AGTG G ATT C AA A CAGG G CCA A AG TC AT CAAT AT AT ACCTTG AAT CCTT GATGGGCTGATGGCAGCT AA AA AC A TAGGTTTTGTGTGGCGAATCCTTCTAAATTATATGGCCCACATGCACTTGTCTTTATCCCAA AG ACCT CAG ACG ACT AT G CAT ATGT ACC AG AT AACTT AAAAG AATTT GT CCC AGTAT CTCG A AGG ACCT CGGG AAAT CCACTTT ACAACCAAG AT CGCAAG ATT AAGT AC AC AC A A ATC AC AT ACCG AAG TTTT GT AGCGG AATT CAT ATT ACAAT AAGTTT ACAAATT ACAAT ATCG AAAAGGG CGTACCCAATGCTGTAGGCTTCCCGCACTGTGCGGGGTCTGGGGGAGGGTATCTTTAAGC GCCAAGCCTTACCCGCATAATATGTAGAGGCTGGGGCTCGAACCAGGGACCTTCCGGTTA CAGACGGTAGGCTCTACCGCTGCACTAGGCCTGCCCTTCACAAATTACAATATCGAAATGA GT ACAAATTTG AT ATG AAAGTAAT ACAACTTT G AAT G ACATG AATT ACAATTTTAAGTT CAAA AT AC ATT G CT AT CTT AA AT G AC AAA ACT C AG G T G G A AG T AC AG A A AAT AT ACTT AT AT A AG AA G ACCG AGT 
CCACCG ACACTT AGCTT CT AT CT ACAACAG AAC AAG AACAT CACT CGCAACAT GGTGGGATAAAACCCTGAGTACACAAGTACTCCACAAGGCTTACCCGACTAAAGAAAATGA CTCCAAGGGCATGCAAGAATTGGGGATTCAAGGTGAGGTTATAGCAAGAATAAAAAACTCC TTT G C AT AAA AG CTT ACT AG AAG TG G ATCCTT AAG CC AT ATTT G AATTT AT C A ACTT AG CTCT CT CCT AAAT CT AG ATT AGCCT AAT CT AG AT C AAAC ACTT G CC AAACC ATT GT CTT C ATTTT AC C AG AT CT C ATTT CT CTT CTT AACT ACG ATG C ACTT AACCCTTGC AT ATGT C AACCC AAT CTT C G AGTGGT CCAAG ACC AAAACGGGTTTGGGCCACCTG AT AGCACAGT ACT CCACCCT CCAA CCCATGCT AGTTGGGCACAC ACT ACT CT CCTT AAT CG ACT CGG ACGG AAACACTGCACCG A G ACG AAAAC ACTGC AC AAAT CT C ATTTTT CT CCTT AAT CG ACT C AG AT G G AAAC ACT G CACC GAG ACT CCTTT CTCG ATG C AAGTT ACCC ACCCG GTCT CAT ATT AATT C ACCTTTTT C AC ATT T CTTT AACAT AT CT CAAT ATT CAGCGG AATTGG AAACATTTT CT G AAAACCCCT AATTGG AAA C ATT AC ACT CT ATTT G GTG CAT G C AAGG AG AAAAAT CTT GTTT CCCC AT CT ACT CG ACT G GG AC AAAT AAT CAT G CGT C ACCTT GTTT CC AGCC AT AT AAAG AAACGTGC ATGCT CTG GTAG G AAAATGGAGAAGGGCACATGCTTCTACAGTAGGATAGTAGTAGTACGCCTTATTTTTTAGAC AAAAT CT AAAACTTT AT ACG CCTT GTTTTTT AGG AT G G ACG AAGTAT AT AAGT ATATATGTCC AGAAACATATGGATGACTAAATGGACGACCAGCTCGACTAGGGTCGATTAGTCGACCTAGT CG ACG ACT AAT C AC AACT AAC AAG GTTTT AAAGT CGTTTG ACT AAT CG CG ATT AGTCGGCCT TATTGCTGGAGTAAGCACGATTAGGGGCTTCGACTCGACTAGGGCGACTAGGAAGCGATT AGTC AT CCTAGT CACT G ACT AAT CGTG ATT AGT CGCCCG ATT AG GG GTTATGT CCG ACT AG CTAGTCCTGTTTTGGGCCAGTAATTCGTCCTTTGTTCTATGGGCCGGCAGGCGGCCCACC AT CT CTGCAG AAAG TAG AAA ACTT GT GTTGCCCT ACCT GT ACCGCAGTAGCAGCACAGT AG CCGTCGTCCCTTCTCTAGCGCGCAGTTGCGCACCCTCTGCAGCCCTTCTTCAGCGTGCGG CTGCGTCCTCTCTGCTCCGGCTGCGATCCCTCTGCTCCTGCCAGCGCGTGGTTGTTGCAG AGGCCTCTGTTAAGCCGATGCCCTCTAGTATGGCGCACGCCTCTGCTCCAGGACTCCACC G AAGTCCACCCT CCAGCGCAAG AGCGT CCT CCACT GAT CCACT CTG ACT CCTCCATCAT AC TT CTT C AG AGTG AATT AG TTT AG AGTTT GTT CTG AACTT C AG AAAT C AG AAAT C AG AAATT C A GACTTCAGACTTCGGAGTCCAGAGAACATCAGAGTTCAGACTTCTGAGTTCACAGTTCAGG CTTT C AG AGT CT GTT GTTTTGCT AT 
AG CT AT AAT AT ATT GTTGTCCTGCT AC AGCT CT AGT GT ACT G CT AT ATTGC AGTACT G CT AC AG CT AT AT AT ATT GAT AT AT AT ATTT AT AC AT ATAGTCC TGTTATAGGTGGACGACTATAGGACGACTAGGAGTCTACTAGACTCGACTAATCGAGCAAA TCGATGACTAATCGTGATTAGTCGCCTTATCGGTGCTCAGGCGACTAGAATCGACTAGCCG ACTTTAAAACCTTGATGACTAATCGACTGGTCGGTAGCTATACGACTAGGTTCGATTAGAC G ACTT G AAAAT AGTT AT CT CG AG C AG ACT CCCT ATCCC ACTT CACT CCCT ATTT C AAACT AC ACT ATGCAAACAAT AT AAT CT AT AGTGCAAAACAGT ACTTTGCACGCT CGTTT ACATGGT AT GCTGGAGATGACCTTAGTGCTTGTTAGACGATATTCACTTGGCGATTATCTCCCAACCTAG CACTT G ATCT GT CCATCCAT CTT CAGGTTGGT CTGCCGT CAT CGT CTT GTGGTTGGCTTT G ATCCACGTTCTTTACTCCGCGTAATCAACTAACGTACCTGAATGAGATGCACGATGCATATG T ATG AG CAT AAAAT G A AC CAAT G CT AC AG T G A AG AAAAT C AAAC ACTT AAT G G C A AG G C ATT GCCACAATCCTACGCAAGTACTAGATACATATTGTCACTAACCTTGATTAGGCGAGATAATA ACCCCCTCGGGTACTGTAGCATATATATGTAGGCAGACAAGAATATATGGGCTTTATGGGC CTT AAC ACCCCCT AT CG AACT C AAGG CGG AAGT G G AGG ATTT G AAG C ATT G AGTTT GATT A GATGAAACTGATGTTGTGCCCTAGTTTGTGCTTTTGTGAAGAAATCTGCAAGCTATAATTCT CAGGGCACATATTGAAGATCAATGGTCTTCTGATGACAATGAGATCTAGTGAACGATACAT C AAC ACC AAT GT GTTTT GT G AGTT C ACGCTT CACT G G AT C ATG AC AAATTT GTAT AGTT CCA ATGTTGTCACAGTGAAGAGGCGTAGGCGAGTCACAAGAAACGCCTAGATCAGCCAAGAGC CAACG AAT CCAG ATAAT CTT AGCAGTAGT AGT AGCCAGGGCT CG AAGTT CTGCTT CAGT AC TAGATCTAGATACAACAGCTTGCTTCTTGGATTTCCAAGCAACAGGGGATGATCCAAGAAG AATACAGTAACCAGTGATGGAGCGACGATCTGTAGGATCACTGGCCCAGGTAGCATCAGA GTAAGCACGAAGCTGAAGTGGGGAATTTGAGTCAT AAAAT AAACATTGTGTTTTTGTCCCTC GT AAAT AT CT AAGC AC ACG AAGTAAGT G CCC AT AAT G AACTG AT GTAG GAG C AG AT AC AAA CT CAT AAT CAT AG ACGGTTTG ACG CT GTT CACC AT AG C AG CC AT C ACTTT ACC AT C ATT AAG CTGCCATGTTTTGATATCAGCAGCATTGCGACGATCATCCGCAAGAACGGGTGCCGCATCA GT C AAAT G AAAG AGT AAT CC ATGTCCCCTG AGT G CAGT CT C AAC AC AG AAAG CCC ACTCCG GAT AATTT CG GCC AT C AAG AGTG AT ATT G ACC AC AAT AG C ATTT GT CG CC AT ATTG AATT C A ATGAAAATCAGGGAGAACAGGAGACCTGAAACCAAACAAACCAGAGGACGAGTTGACGGA 
GGTCCTGGGCGCGGAAACCGAGTTGGACAGTCTGCTCGCAATGGCAGCCACCGGCGCAG
AAACAGGGACGACGCAGATGGCGACGAGGCAGGGCGCAGCAGACGGCAGCGCAGATCC
GGATCCGCAGACTGTTTGCGGCGTGATTATCGGATCGATGGCAGTTGCACAAATCTTCTCT
GCAGCGACTGTTTGCGGCGTGCAGCAGGCGGACGGCGACAGGGCGCAGTAGGCGGACG
ACGGCAGCCGAGCGCAGCAGGCAGGTGCAGACGGCGGACGACGGCGGCCGGGAAGCC
CAGATCCGCCCGCGGGGATGGAAGAAACCGCGGCCGGGCGCAGCAGGCAGGTGCGGAC
GGCGGCGACCGGGTAGCCCAGATCCGCCCGCGGCAGAGGCGGGGAGGGGAAAAGCCG
CGGCGGCCGGATCTGCCTCGAGGACGGCCGCGGATCTGGACGGGATCCGCGACGACGG
ACGGGCGGTGGCCGGATCTGCGCGACGGCGGACGGGATCCACGATGGCGGCCGCGGAT
CTGGACGAGGGCGGCGCAGATGAGCCCGCAACGACGGAATCCGCGACGGCGGACGGGC
GGCGGCCGGATCTGCGCGACCGCGGACGGGATCCGCGATGGCGGCCGCGGATGTGGAC
GGGGGCGACACAGATGAGCCTGCAGCCGCGACGGCGGGTGGGAGGAAGGTGGAGAGAG
GGTCGCAACGGCGGCCGGGAGGAGACGATGGTGGCTAAAAAAATCTAAGAAACCCTAATC
GTGACCTGCTCTGTTAATAGGTCACTAACCTTGATTAGGCGAGATAATAACCCCTTCGGGT
ACTGTAGCATATATATAGGCGACAAGAATATATGGGCTTTATGGGCCTTAACACATATAACT
CACTAAACACAACAATCACGTTCTTCCAGTTTAACCAGATCTAACTCAAACATCAAGAAATA
ATAAACTATGTGTAAGTCCTATATCTTCTTTAGGTAGTGCCCAACATCAGAAGACTAGCAAA
ACCTAGACTCATCATTCTTAGACACCTAAATTCAGAATGAGAATAGAAGCAATCTAACTAGC
ACTCTAAACCACCTTTTGGTGAAAGAGTAATTGTGGGAATGACTTGATTCTATTCCACGACA
ATGTGTGCGTATACATAGGAGAGGCCGGGGTTGCTCACAAGGCAACCGCACAGGCGTACA
AGCCAATCAAGGGCAGCCTACAATCAAGGGCTGACTACCATAATTAGGCTTTCTATAATTA
CAATAGTCTAACATTTGGGACTAACTCGCATAGCACAACATCTAAATAAAACATCACACTAT
TAGATCTAGCAGGCAGAACATCATTAAAGATCACAGTCTTTCACAAAACCACAACTTAAAAC
CAAAAGACCTAAAACACTAATGTGCAATGCCCACTATGCAGTATTAAGATTTCAACTAAAGC
AGACCTAGCGATGTTATTTGCTTCGAGATACTTGGAGAAGCAATCAACATCCATCTATGACA
TTTAACCGGTCACTAAGGCCCTGTTTGGACAGCTCCAGCTCCAGAAAATTCGGTAGAGTTG
GTGGAGCAGGTCATTAGGTGCTCCATAAAATCGTGGAGTTGGAGCTGTAAGCCTTCAGAA
GACATTTTGTCTTTGATAAGTCATGCCCCCGCAGTCTAATCGGGAGCATCGCTAACGGTCA
GGCTGGACCGAAACTCCTGGAACAACGAGGTGGGTGGTCCCTTGGTGAAGACATCTGCGT
ACTGAGATGTCGTAGGAACATGAAGAACCCGAGCGTGTCCGAGGGCGACCTTCTCTCGGA
CAAAGTGGAGATCGATCTCAACATGCTTCGTCCGTTGATGCTGCACTGGATTGCTGGAGAG
ATATACAACACTGACGTTGTCACAATAGACTAGGGTGGCACGGCGAGGCGGGTGCCGAAG
CTCAATGAGTAACTAACGTAACCAAGTAGCTTCAGCAACGCCATTTGCCACAACACGGTAT
TCGGCCTCGGCACTGGACCGGGAAACCGTGTGCTGGCGCTTGGAGGACCACGACACTAG
GTTGTCTCCAAGGAACACTGCGTAGCCAGAGGTCGACCGGCGAGTGTCGGGACATCCAG
CCCAGTCGGCATCTGTGTAGACAACGAGCTTCGTAGGGGAAGATCGGCGCATAGTCAGTC
CAAGAGATATAGTGCCTTGCAGGTAGCGCAAGATGCGCTTGAGAGCCGCGAGGTGGGGC
TCTCGTGGATCATGCATATAGAGGCAAATCTGCTGAACAACGAAGGCAATGTCCGGACGG
GTGAAAGTCAAATACTGTAGAGCACCTGCCAGGCTGCGGTACTGAGTAGCGTCATCAACG
GGAGGTCCATCTGCAGATAATTTGGAGTGGAGATCAACAGGTGTGCTACACGGCTTGCAC
ACGCTCATCCCGACGCGCTCCAAAATATCCTGAGTGTACTGTCGTTGAGAGAGAAACAGAC
CATTGGCAGAACGTGTCACAGAAATGCCCAAAAAATGGTGAAGCTGACCCATGTCCGTCAT
AGCAAACTCACGCTGGAGAGCCCCAATCACATACTGAAGAAACTTTGCAGAGGAGGCAGT
GAGAACAATATCATCAACATACAGCAGCAAATAGGCAGTGTCTGGCCCTTGGTGATAGATG
AACAGTGAACTATCTGACTTGGTTTCAATAAATCCAAGGGAAAAAAGATGGGATGCGAACC
TGTGATGCCAAGCACGAGGAGCCTGCTTCAAGCCATATAGGGATTTGTTGAGCCGACAGA
CAAGATCCGGATGAGAGGAATCCACAAAACCAGAGGGTTGTACGCAGTACACTGTCTCGG
TGAGGGTGCCATGTAAGAATGCATTCTTCACATCTAGCTGATGGATGGACCAGTTCTGAGA
GAGAGCCAACGAGAGAACAACTCGAACTGTTGCAGGCTTGACAACCGGACTGAAAGTCTC
ATCATAATCCACACCGGGGCGCTGGGTAAACCCACGGAGGACCCAACGAGCCTTGTAGTG
ATCAAGAGACCCATCTGCGAGCAGTTTGTGTCGAAAAATCCACTTGCCAGTTACCACATTG
ACTCCAGGAGGCCGTGCTACTAAACTCCAGGTGTCATTGGCGAGTAGAGCATCATACTCA
GCTTGCATAGCGGAGCGCCAATTGGGGTCTGACAATGCATCACGAACGGAGCGAGGCAG
TGGCGACATAGACACAACGTGGAGGTTGAGGCGATCCACGGGCTGTGCCATGCCGGTTTT
GCCGCGAGTGTGCATGGGATGCGCATTGGCGATAGGGGTGATGGAGACCGGACGAGTGT
CTGCCGTGCGACCGGTGGCCGCGGTGCTGCTGGCAACGGCCTGTGCAGGATGCACAGG
GGCAGCATCACCCGTCGATGCGGTCGGCGAGGCGGCGGGGGCACTGCTGGCAGCAGCC
TGTGCAGGGTGCACAGGGGCAGCAGCGCCGGTCGACGTGGCGGAGGGACTGGGCGTCC
CCAATCCAGTGGGAGGAGACGTGGGTGCCTCTACACGCCCATGGGCGACGTGAGGTGTA
CCTGCATGCACAAGTCTTGCTCCAGGAATAGGAGCGGTTAGATCATGTTCATCAAGAAGAA
AATCCAAGGCGGAGGATGCCATGGGAGTGGTAGACATGGCTGCGAAAGGGAAGAAGGAC
TCGTCAAAAACGACATGTCTAGAAATAAGAATGCGGTTCGACTCAAGCTGGAGACACCAAT
AGCCTTTGTGTTCCGAGGAATAGCCGAGAAAAACGCATAAGGAGGAGCGGGGGGCAAGTT
TGTGAGGTGCTGTGGAGGACATGTTAGGATAACAGGCACTCCCAAAAACTTTAAGATGATC
ATAGGAGGGTTGGGAGGAAAAGAGGGCACTATATGGTGTGGAGAAAGCAAGGGTTTTAGT
GGGGAGCCGATTCACAAGATATGTCGCAGTGTGAAGGGCTTCAACCCAATAAGCCGGAGG
TATACTGGCCTGAAACAAAAGAGAACGCAGAATGTCATTTATGGTGCGAAGAGAACGTTCT
GCTTTCCCATTTTGCTGAGAAGTTTAGGGGCACGACATGCGTAAGACAATGCCGTGGGAG
AGAAAAAATGTGCGGGCCTGGGAATTATCAAATTCACGGCCATTGTCGCACTGGATGCTCT
TGATGACGGTGCCGAATTGGGTGCGAATATAGGTGAAAAAGTTGGCAAGGGCGGAAAAAG
TCTCGGACTTTAGACGGAGTGGAAACGTCCAAATGTAGTGGGAGCAGTCATCAAGAATTAC
CAGATAATATTTATAGCCCGACACACTAACAATTGGGGAGGTCCATAAATCACAGTGTATTA
AGTCAAAATTGTGAGAAGCTCGAGAGCTAGATGAACTGAATGGCAAACGAACATGACGACC
AAGTTGACACGCATGACAGATGTGGTTGACATCATCTTTATTACAGGAAATAACACTGGAG
GTAATAAGTTTGGACAAAGCTTGATGCCCAAGATGACCGAGACGACGATGCCACAGGGAG
GTGGGTGCAGCGAGGAATGCAGGGGTGCTGGTGGAGGGTGCATAGAACGGGTAGAGGTC
ACCGGAGCTATTGCACCTGGCGATCACGTTCCTGGTTTGCAAATCCTTCACAGAAAGGCCA
AAGGGATCAAACTCAATGGAGCAATTATTGTCGGTGGTAAAACGACGGATAGAAATTAGAT
TCTTAATAATGTTAGGAGACACGAGGACATTATTGAGAACTAAATTGTGATGCGGGAAAGA
AAAAATATGTGATCCAGTGGCTGTGACAGGAAGCAAGACACCATTTCCCACAATGATAGAT
GGAGTGAATGAAGTGGGCAAGGAAATGGTGGAAAGTTTACCAGCGTCCGAGGTCATGTGC
GATCCTGCACCGGAGTCGGCGTACCACTCTGAAGTAGCGTTCGGCGGGTTGAGGGTCAT
GGTGTTGAAAGAGTGCAAGGGCGTCCTGATGCCATGCTCCTCCGTGAGTGGGGTTCCAGG
GCGCCGCTTGGTACGCCGGTGCGGGCGCCTGGAAACCAGGGGCTCCAGTTCCCCCGTAG
TGCATACCGTACGGAGGGGATGGAGCGCCGTAGAAACTGCCATAGGCGTTGTATTGTGGT
ACTGCGTTGAATGCTGGCGGCGGTGGTGGGCGCCCGGACTGATCGTATGGCCACAACCG
TACAGTGCCAACCCAAGGATGCGCGAAGGACGGATGCATGCCGGGGGGCACCTCCCTGT
CTGGCACCAGGAGGCGTTGGTTGGCCCTGGTGATGGCCGTTGCGTCCACCGCGGCCGCG
ACGACGGCCGTTACGTTGGCCGTTTGGCACCGAGGGGCATGCTGGAGGATGTGCCCCTG
GACGTGGAGGTGCTGGAGCCCCGGGAGCCGCAGCTCGCAGCGTTGCAGCGACGAGGGC
GGATGGCGGGGACGGAGGTCGTGCGTCGATTTCCAGCTCCTCCAACAGCAGGTGCGCTC
GTGCCTCTGCGAACGTGGGGAACGGCCTGTGCATCTTGAGGATGGACACCATCTGGCGG
AACTTACCGCCGAGGCCGCGAAGGAGCGTGAGCACCATCTGCCGATCGTCGATGGGATC
GCCGAACTCGGCAAGGGAAGCCGCCATCGATTCGAGCTGGCGGCAGTAGTCGGTGATTC
TCAGGGACTCTTGGCGGAAGTTGCGGAATTTTGTTTCGAGCAGAAGCGCCCGAGACTCCC
TCTGGCCGAGGAACTCGTCCTCGAGGTAGCACCACGCCCCGCGAGCGGGGCCCTGTCGC
ATCATCAGAGATTGCTGCAGGTCACCGGAGACGGTGCTGTAGATCCATGTCAGGACGCAG
CAATTGGCTTGAACCCATGCCGGGCGCGACGGGAACGCTTCATCTTCAAGGACGTGACGA
GTCAGGGCATATTTGCCAAGGACAGTGAGGAACATGCCACGCCACTTGGTGTAGGTATTT
GTCGCCTGATCGAGAAGGACAGGGATGAGCGCCTTCACGTTGACGACGGCAGTGGCCTG
CGCCCAGAGGGCCTCATGGGCATGCTCATACGCGTCCAGAGCGGCAGCACGGAGGCGG
CCTTCCTCGGCGCGGCGGGCGTCTTTGGCACGGTGTGCAGCAGCCTCAGCCGTAGGGCG
CTGGTCCTCGGCGGCAGCGTCGTGGTCGTCCGCCATCGGGAGGGAGACGCGACGGCTG
GGCAGCCGAGCTGGAGCCGCTGCAGACTGGACGGGAGGGAGGCGCGACGGGGATTAGC
GTGGTCTGGAGGCTCGACCGCGCCCGACCAGAGGGAACGACCACGCGATCTGGACGGG
AATCAGCCGAGGGAGACGCGACGGGGATCCGCCGATCTGGTGCCGCGTCGGATCAGCGA
GCGTGGCGGGCGAGCGGATCAGCGGCCGCGCTCGCGGTCTGGAGCTGCGACCGCGCGA
CGAGGCGGGTGAGCGGATCAGCGGCCGCACCCGGCAGCAACAACGACGGGGCGGGTGA
TCGAACGGACGGCGCAGGCGATGGGATCAGCGACGCTCCAGGCGACGAGGTCTGCAGG
GGCGGCGATCGGATCGGCGACGGCGCGGTCTTGGGTTGCGGAAGTGTGGTGGATCGGA
ACCTTGATACCATGAAAGAGTAATTGTGGGAATGACTTGATTCTATTCCACGGCAATGTGTG
CGTATACATAGGAGAGGCCGGGGTTGCTCACAAGGCAACCGCACAGGCGTACAAGCCAAT
CAAGGGCAGCCTACAATCAAGGGCTGACTACCATAATTAGGCTTTCTATAATTACAATAGTC
TAACATTTGGGACTAACTCGCATCGCACAACATCTAAATAAAACATCACACTATTAGATCTA
GCAGGCAGAACATCACCAAAGATCACAGTCTTTCACAAAACCACAACTTAAAACCAAAAGA
CCTAAAACACTAATGTGCAATGCCCACTATGCAGTATTAAGATTTCAACTGAAGCAGACCTA
GCGATGTTATTTGCTTCGAGATACTTGGAGAAGCAATCAACATCCATCTATGACATTTAACC GGTCACTAAGGCCCTGTTTGGACAGCTCCAGCTCCAGAAAATTCGGTAGAGTTGGTGGAG CAGGTCATTAGGTGCTCCATAAAATCGTGGAGTTGGAGCTGTAAGCCTTCAGAAGACATTT TGT CTTTG AT AAGTC ATTTT GATT ATT ATTT AG GTT AAAAAT ATTTTTT AAAACT ATTT AAATT A AT ATT AT AAACT AT AGCTCCG CGCT G G AGCTGG AATTT AG AGT CAT CCC AAAC ACC AACT AA ATATAGAGTATAATGACCACTAGAGCAAGGCATCGACTTTATCAAATAAATAAAATCGACAC AAAC AAC ACT G AG AAC AT GTTGG CT AGCCG ATT G AAAT ACT AAACCT AT CTTT C ACGTC AT C AATT G AC AAT AC ATTGC AT ACTTGT CT ACC AAAAC ACT CTT CT AG G AG ATG GTAT C ATT CT C ACT GTTT CC AG AG C AAGTTTGGT AC AT AGTTTGC AAAT CG C ACC AT ACTT AAAT G GT CCCAG TGTCTGCTTAACAATTTCAGAACTTGCTGTATTTTTGTGTTTGCAGTTCTTCTAAGCACATGG TT GT AATTTT G ACATTTTGTTGTG AT CTTT CT CAGG AACCTTT CCCT ATGGGT ACAAAC AT CC GAGCCAGGTCTGGGAGATCCCGGACTTCCTCTGGTGAAGAAAATCACCATGATATCCAGA CAGTCTTGACCAACGGTGACACTTCTTCCAATGGAGCTGACACTTCAGGTCCCACCAAGGA C ACTG AGTAGCTG AACTG C AG CTTG CTGCGTTG CT G ACC AC AAAG GCCC AAACT AT AACTT TTT G CAAACCC ATTTT C AGTT CTTT CCCCCCTTT CCC ATTTT G GTTCT GTTTTGT AAAGT CT C CCG ATCTGT ATTT ATT G ACTT CC ACG ATGCGG GT CACCG AAG ACTT AG GTTGCTGC AAAAT TTTGTCCCTGACGGGAAGCTATATGCAAGAGGGTGGTACTGGCTATGTGCTTGTTAACCTG AAGGCCGAGAAAGGTGAAAAGCGCAGGGAGAGCCTCCAGATTTTGGTCGCTGTAAGAATT AACCCC AT GTTGT AC AG C AG GT CCC AGT AACTT GT AGT GAT G GG AG AGTGG AGT C ATTTT C ATCAGTTTTTAGTGGTGGTTGTGTGGAGAGGAAGAGTCTTGCCTGCGTTTTCTTTTGGAAC CTTCTCTTGTGCCTTTACATTTTTTTAGTCGAGGGTTCCTCTTAAATTGTGTGCAGAGGGGG CTCAATTTTGTTAACCGGAACAAGGCGCATGTGCGTCTTGGATCAACCCCGGTCTTGTCTT C AG GC ACTGTT ACCTT ATTT AT C AAAC AT ATGT AC ACCT CC AT CT AT AT AT AGTAT G AGTTTT G ATGCCTAT CT ATTTT GTG GCTGT CGT CT C AC AAGGTT ATTT AT CT AT ATATAGTGT G AGT AA TT CTT GTT C AAAT CCTTT CT CCTT ACT AT AAAT ATTT GT C AC AAT ACG CG ATCGCT CCC AAT A ACTGCTATAAATATTTGTCTCCGCCGTGGCCTCCATCCCTAAACGGAGCACAGAGCCAGCC CCACTCCCTTTCTCCTTACTCCGACAGGAGATGCGGATGCCGCCGAGGGCCGTTCCACAT GGCCCCTAAAAACAGTGGGGTCCTAAGCTGCTGGACACTAGCATTTTCCCTATAGTTTATC TGCTTT AT 
AGTTT AT CT ACTTT AG AC AC AAAT ACG C AAAG AGC AT CG C ACT GT CAT CCTGTC TT ATTT AG ATT GTT AT CCT AAT AT CT CAATTGCTT AT CAAACATT ATTT ACT AT ACCC ACG AT G GTT AT ATTGGTT G AG AG ACTTTTT AAAATTG AAATT ATT G GG G AACT ATTT AAGG CCT G C AAT GATT G AAG G A AG ATT AAAT AG TTT G G C A ATT CT ATG C ATG G AG AA A AAGTT G G ATG CT ATT G AT CT CAATGGT AT AAT CTTT G ACTTT GT AT CACAAT GTT AG AAG ACATTTTT AGCGTG AT AT G AAT G AG ACGTGCG ACAGCGCAGCC ACACAAT AGCACACACTTTT AT ACGG
SEQ ID NO: 21: OML4 promoter sequence
CATACTTGTTGGCAAGAGCGCCAATCACGGTGCCTCAAAACAGGTTATTGACAACGTCGAA CATTCTCTCCTCTTCAGGAGTGAACTGTTCGGGTTTCCCCTGTGCGGCGTGATAACAGTTC ATTGC AG CC AACC AC AAT AT CAT CTT ACT ACGT CAT CTTTT GT AAAAT GTCCTAT C AAAAG GT T CACTTGGTTTT AAAGTAGCAACAAAACC ACT AAC AG AAAAATGCCT AAT AT CAGGTTTTT G GATT GTT AG AG AAAT AT G C ATTTT C AGTTTT AATTT AAT CC AG AAAAT C AC AGTG AT GTATGT GAT G ACAT GTAT GTGC AT ATGTGTAT C ACT ACT C AC AT AAGTT GT AAAC AAC AGT AAATT AT A CAC AAAT ACT AAG AACAG AGT GT ACCCT GTGG AGGG ACCG ATGTTGCAAGGCAT CAGTGG CT CT ATT C AC ACG AG AC AT CT CAT GTGTATGTTCGATGT AGTCAT ACGC AGT CG ATGTAG AC AGATGTACGTAGTGCAGTCCCTCGAACGACGCCGGCGACGAGGAACTTGATCAGCGTTGA TTCAGCGGACGAAGCGAGCAGTCGTGAGTACGCTCCCCAAAAACCTAATCGTCCGCACAC CT GTGCAAGT AACAG ACAGCG ATTT CGG AGGCCTGCT CT CCCAAACT CT CT GTGCT CGCA GAAGGTGGGACGAGAATGGCTGTGTGCAACGCGTCTGAGACTCTACGTGCGTACTGTGAA TAGAAGCAGCCTCCACTCCTCCATATAAGTACACGCGCAGAGGGAGGTGAACAGACAGTA ACAGT CACCAT CAG AGCT ACCGTT AT AG ACAGCCAG AAATTG AT ACCATT AGT G ACGT CCG TT ACT AGCCG ACAACC ATT ACAGCCCGT CCGTT AT AGCCAT AACAC AGG AAACAACCAGT A AC AG ACG AT AAT AATGG ACT GT C ATT ACT CT AGG CAAAAT AT G C AACCCTT AG G ACG G AAT ATTCGGATCAAAGTCCGATCCACCACGGCCCCGCCGGCGGCGGCGCGCGCGCATGATAG T CCTT CAT C ATTTT CT CAG CTTT AT C AAT AG ATGC ACC AAT GAT ACTT CT ATTT AAGTT GATT GAATTGTCACTTGAACTTCCGGTATGGTACTAAAGTACTAGTACACTGTAGCATTAAAATGA GCCTTT AACATT AACT ATT ATTG AAT ATT AATTTGTGCCAG ACCCACATT AATT CAACAGTCG TTGC AACT AGCC ATTTTT G G AT CCAAAAAATTT AAAAAAATT G C AAAAACC AC AAATTT C ACC CC AAT CT CTTT AG AAAT ACCCTACG CGG AT G G AGCT CGTT AC AC AAACC ATT CC ATT AT GTT GTGCG ATTT CTG AGCGTT CAAAT AAACGTGCGTG AATT ACTT AATT CT G AAAT AAAAAAGCT AT AG AGGCT GT AGT CTGCT ACAAT CT AT GT ACT AG AGCATT AG AG ATG AAGT G AAGTCG AG
AGCTGATATGATATGGACGAGAGGAGGATGCTGCACTAGAACGAGGCTAATCCAAGCAGT
GAGTGAGAGGAGAACAATCTGGCGCAAGCAAGCAAGCAGCAAGGCTTGCCGCCCGTCCT
AACCAACTCAGCCCAAAGCCGTCGCCTCCCCCAACTCCCACCACCCAAATTTGAACCCAC
CGCACACCAATGCACCGCTCTCTTCCGTCGATCCCACTGCAGTACTGGTCCCACCCCTGT
ATCAAGTCACTGACAAGACAGCCCGCCTAGAGTGGGCCACATCTCGTCAGTTTCAGGTGG
TATGAACAAGCCCCAGGACAGCCGCGCGCGCCGCGGTCCCGTGCTCCGCGGTGGCAGTC
ACCGGGCGTAACCGCAGGGTACGGTATAAAGGGCGTGCCGCCCGTCTCTTGCCGCCCGC
ATTTGGGTAGGTAGTTGCTGTTCCCCTGCGAGGCCCGCGTCTCCCCTTCGTCTCAACACC
CACCGCCTCCTCGCCGGAGCCAGTAAGCTTCCGGCGAAGAAATCCGGCGGGCACATCAC
AAGGGGGCCGAGCAGGGGGACCTAGGCGAGCGCCGGAATGGGGGCCAGCGCCCGGGG
GCTCGCCGTCGTCGTGGCTCTAGGGTTAGATAGGTGTCCGTAGCTTTTTTCTTCGCTCGCC
CTCCCCCACGCGGCCAGGGTGTGCAGCCCCGGCGTCGCATTGGGTCCCGGGACGACCG
TAAGGCCGCGTGATCCACCCGTGCTCGAGCTCGGACAGGGTCCCGGTTGCGTGGCATGG
GGGCATCTCTTCCGCCTGCTCCCGCTGCCTGCGAGTTTGGCACCGTTTTTGAGCTCTGAA
GAGGAGGAGGTGGTGGCAGCAGGCACCGATCTGGTGAGCCCCCACTCGCTTTCCGTTCT
CTACATGGATGGTTTCTGTTTGGGATGTTTCAATTTTGGGAAATTTTGAAAGCTCTCGTATA
AGTCGTTTTGTTTCGTGGGTGTCCTTGCTTGCTGTATGTAACCTGAGCTTGAATTCGGGGT
CTGACAATTATTTTGGGTTGTGTTCTGCCGGGAATTCCTCGTTTTATTTTGATGGTTTCTTTG
ATCACTAGGGACTTGCTGGTTTGGAGCTCGTAGAGCCCGAGGCGCATTAAATTTTACATCT
TCTGTGCTGTCGTATTGGGGGAAATTAAACATTTCTCTCAAATTTGTGGGATTCGCACTCTG
GTTTGTCAAACCTACTGGTTCTGATTCAGAAGTATTGACTTTGGAAGCTCACACGAGCTAAA
ATCCGCCTTTTTCTCTGCTGCCCTGTGGCTCGGTTGTCATGGATTGACAGATTTCTGCCCG
TAAAATTGCTCCTATTCGTCATGTTAACCCCTCGACACTTCATCTTTTCCGCAAGTTTTATTA
ATTTTGCGTTGATCCTGGGCAATTGAGATACGGTGCTGTTGTCTAGGTTTGTGCCTAACAC
GTTATATGGTCTGGACGCCTGCAGG
SEQ ID NO: 22: GSK2 amino acid sequence
MEAPPVPELMDLDAPPPAAADAAAAAPVPPAVSDKKKEGEGGDTVTGHIISTTIGGKNGEPKR
TISYMAERVVGTGSFGIVFQAKCLETGETFAIKKVLQDRRYKNRELQLMRAMEHPNVICLKHCF
FSTTSRDELFLNLVMEFVPETLYRVLKHYSNANQRMPLIYVKLYMYQLFRGLAYIHNVPGVCHR
DVKPQNVLVDPLTHQVKLCDFGSAKVLIPGEPNISYICSRYYRAPELIFGATEYTTSIDIWSAGCV
LAELLLGQPLFPGESAVDQLVEIIKVLGTPTREEIRCMNPNYTEFRFPQIKAHPWHKIFHKRMPP
EAIDLASRLLQYSPSLRCSALDACAHPFFDELRAPNARLPNGRPFPPLFNFKHELANASPDLINR
LVPEQIRRQNGVNFGHTGS
SEQ ID NO: 23: GSK2 nucleic acid sequence
ATGGAGGCGCCGCCGGTACCGGAGCTCATGGATCTGGACGCGCCCCCTCCCGCCGCAGC
CGACGCCGCAGCCGCGGCGCCGGTTCCCCCCGCCGTCAGCGACAAGGTGAGCGAGTGC
CCCAGATCCGGAGCTGGGCTCGGATCTGCGGCCGTGGTCGCGGCTGGGCGCCTCCCGAT
CTGCTGCCTCCGCGAGCGACGTTGCTAATGGTGGTGGCCTGTCTATTTTTTCCTCTCTCAC
TTTCCGTTTGTGTTGCAGAAGAAGGAAGGGGAAGGGGGAGACACTGTTACGGGTCACATC
ATCTCCACCACCATCGGTGGGAAGAACGGCGAGCCGAAGCGGGTAAAGCTACGCTTCTCT
CGCTGTCTGTTTGTCTATCTGTCGTGCCGATGTGCGCGTGAATGCTGCTGCGGTTAGTGC
GGCTGAAGTGCCCCCGCTTGTTTCGTAGCGGCCTTGCGGTCGGAATCCGTTTTGATCTGA
CGGTTTGCGCATGGGGTCGTGTTCTGCGCCTCTTGTTTAGCGGCTACACAGCTACAGCTA
GCATGCTGGTGAAATTTGGTGGGTTTGTTCTGGTTTTGTTGATGTATTATGCTCTCCCCGCT
ACTCTGGGCCTCTGGGGATTCTGGCTGGGTTGCGCTTCCTTGGCTTAGTGTTTGCAGCTG
AATTATGTGTCTGACCGCTTCATTTCGTGCTTCGTTACTTGGTTTTTTAAGGCTAACATGCAT
TTAGGAAGCACGGTCTACCATTCTTGTGATTAGTTCTGCCGTGTGCAGAACAGAAATGGTC
TAACTGTTAGTTTAGGTCCAGGTATGAGTGAGGATTCGAATTCCTTCCTGCTCAGTTGCTCT
GACGCCTGCCTAGTTTGTTACCCTCTTCGTGTCCTCAGTTGCTCATTTGTTCTTCTTCTGGC
CTTAATTGCAGACCATCAGTTACATGGCAGAACGTGTCGTGGGTACGGGCTCATTTGGGAT
CGTCTTCCAGGTATGGTGCTTGGTCATGGGAGCTCTTCTTTGTACGTGCCTAACATTTGTT
GATGTAACATGCACTGAATTAACTTTGACATGTAGGCTAAGTGTTTGGAGACTGGAGAGAC
CTTCGCCATTAAGAAGGTGCTGCAGGATCGGCGTTACAAGAACCGGGAGCTGCAACTTAT
GCGTGCCATGGAGCACCCCAACGTCATCTGCCTGAAGCACTGCTTCTTCTCAACAACGAG
CAGGGACGAGTTGTTTCTAAACCTTGTCATGGAATTTGTCCCCGAGACCCTGTACCGTGTC
CTGAAGCACTACAGCAACGCGAACCAGAGGATGCCTCTTATCTACGTCAAGCTCTACATGT
ATCAGGTTTGTGAACCAGCATCTTAACTTATATGAAGCTGCTAATGTGTGCTTTCATTGTTTT
GCTAACTGTCTCTTTTTTTGTAATGTTCGCAGCTTTTCAGAGGCCTAGCCTATATTCATAATG
TACCAGGAGTCTGCCATAGGGATGTAAAGCCACAAAACGTTTTGGTACGTGTCATGTGGAC
AAGGTTTCGTTCTTTCTTTGATTTGGTAACTAGTTCTGAGTTGGTCTGATCCTTCTTTGATAT
ACAGGTTGATCCTCTCACCCACCAGGTCAAGCTCTGTGACTTTGGTAGCGCAAAAGTCCTG
GTATGTTGTTTTTCTTTCCTTGAGGATTTGTAGTCACATCCAGTTGTTGTATGCTTTCTCTTT
TGAAATATTCTTATCAAAGGCTTGTTTTTCTTTCCTTGAGGATATGTAGTCACATCCAGTTGT
TGTATGCTTTCTTTTTTGAAATATTCTTATCAAAGGCTATCCATACTATTGGCATGGCATTAG
TGGTTTGTGTCACTATGTAAAATGTATCATCAGTATGCTGCTATTGCCTGTTATGATTAATTG
TGATTGTAGTTGGTTGGTCCTATGGAACAAAACACATCTTGAAGGTAGCTTAAGTATAGATG
CAAGGCTCGTGGATATATTTCTCAGTGAACTATTGATACAAAAACTGTCCTGTTACATAGTT
TTGGTCTAGATATCTTGCAGATCAATGTTGGCTACATTTTAGTCAAGCTTATCAAATTTGTCT
TCATCATGTGCAGTTATATTTATCCTATTTGAGCTATGACTTATTCAATTGTTTCTGGGGCTG
TCTTTGTATGTAATAGCATTGATTCTTTTGCTTCTATCCGAAGTTCCAATCTTGGAATTGACT
GTAATTCATGTGTTATAACAATTAGATTCTTTACTTGTATGCTGTATTTTTTTATCCACATACT
AATCAGTTCCATATGTTGTTTTGTCAGATTCCTGGTGAACCGAACATATCTTACATATGCTCT
CGTTATTATCGTGCTCCAGAGCTCATATTTGGAGCGACGGAGTATACAACTTCAATAGACAT
ATGGTCAGCTGGCTGTGTTCTAGCTGAGTTGCTTCTTGGTCAGGTTGGTTGCATTCATATA
ATGATTAACCTAATATTTTTGTACCTCGATTTGACCAAATCATGTATGGTGTGATTCTAACCT
CTTGGGGTCTTGGATCTTATGTATCAGCCACTGTTTCCGGGAGAGAGTGCTGTTGATCAGT
TGGTAGAGATTATCAAGGTACTGCAAAATGTTCCAAAGTAGACATTCTATTCTTCTACCGGG
GTGTTTCTTATGGTTATGTGATGTGCCTGTAGGTTCTTGGTACTCCAACCCGTGAGGAGAT
ACGATGCATGAATCCCAACTATACTGAGTTCAGGTTTCCTCAGATAAAGGCTCATCCGTGG
CACAAGGTATTCTTATGTTAAAATCATGTTTCTGTCCACATCTATGATTCCATTTCACCAGCA
GCTACTGTAGTTATATACTTGTAGCCCACGGTCCAAAATGTTATTGAAGGGCGCTTAAAGAA
ATTGTTTGCAGATCTTAGCGAAAATTTGAGCTCAGAATGCATCAGTTACTGACTGATTGTTC
CACTTTCCGTTTTATCCTCCAGATTTTCCACAAGAGAATGCCTCCGGAAGCCATTGACCTTG
CTTCCCGTCTCCTTCAGTATTCGCCAAGTCTTCGCTGCTCTGCTGTGAGTATTTTTTTTTAC
TTTGTTTATTTAGATTAGAGTCAGCTTTGATGCTTATAGTTTGAGGGGAATAGACAATGGAA
CGCGACAGACTAAGGTCTTTGAGGTCTTTGTGTTGCATATGGTCTATTTTACTTGGCTTTGG
TTATTCGAAAGCTTCCACTGTTGTCAATTATCCGTAGTCCTGTAGACCATGACCAATTAAAA
GTGGCTAAAATCCATGTGGAATTATGTCCTCTCAAACCATAGCGTATGGTCCTGCATGTATA
TGGTAATTATGCTGCCCAGTGGTCCAGAAGGCTAGTAGAACCATCAGTTTTGATGGATGTT
AGCTGATGAAGAGTGGGTGCAACTTTATAGTCACATGTGCTTGTTAAGTGTACTAGCTAGT
GGCCCACCTAAAAAGAGCAGTCCTAGTTCTCACATGCTGTAGGGTGGCACACACCATAATC
TTTAATGCATCAGTTTGTTGGTTAGAAATGTTCTAATGTGCTTCATGTATTCTATTTCTATTCA
GCTTGATGCATGCGCTCATCCCTTCTTTGATGAGCTCCGGGCGCCGAATGCACGTCTACC
AAACGGCCGTCCATTCCCTCCGCTCTTTAACTTCAAACATGAGGTAAGCAAACTAAACACA
GTGCAAAGTTCGTTTAGCCAGACGCTTCAGTTCGGTCATTAAAGACCTGAAAGGATCCAGT
TTGCACTTGCTCTACTGTTTTGCTCATCAGTTACCCCCCCCCCTTTTTTTTTGCTATGCAGC
TAGCAAATGCCTCCCCGGACCTCATCAACAGGCTTGTACCGGAGCAGATTAGACGGCAGA
ACGGTGTCAACTTTGGGCATACCGGGAGCTAGGAGGGCAGGCGGCTGCCATGGTCAAGT
TTTTGGTCTTGGTACCCCATGTGCAGGGCCGATTGCAGGTGACGGTGATATTGCTGCACC
ATCTGGAGAGGAGGGGCTCAGCAGTACCTGAGAGAGCTGAAACTATGTAAATTATCTGAC
CGCGAAGGAGTACGGCCAGTGTTAAGCCAGTAAACTGGCGCATGTTGGTCCAGAGTAGTT
AAGAATGTAGCAGGTGGAGACTGGTAAATGCCTAGGGTCGTTTTTAGTTGTTGTTACTAGT
ATTTTGTAATGTAATGTTCGTCGGTACTTCCCAGCAGTAGTGTAGCTGCTCATGTTTTGTTC
GCCCGTCATGATGTAAATGATCATCACCCAACTGGAACCCCTGTTATCTCGTTACATGCTTA
GGCCTCTGATCGTTCCGTAGTTGCTGTGATAGAGCTCTGACAAGTCTGAGCAGGAAAGTG
GGTAGGAATTGCTTCTGGTGAAATCTGGACAAGTTTTGTCGAATACAGATGCATCTGCTGA
TTGATCGTCTGGTTGCAAGTAGTCTGCACATTCCCAAGGCCACAGATCATTACTTTCAGATT
GTTGATAACGACCAAATGGCAAGTAACAGAAACGACCGAAATTCGCAAGCAGGCAATTACA
GACGCGGCCGCGCCAGCACATCGCCGCCGTAGTCCTTGACCTTGCTCCACAAAACCGGC
CATGCGTCCTTCCTGATGGCGGCTAGGTTCTCTCCATCCCAGTCTCTGGCTGCGAGCTCC
TCAGCCATGGCGACCGCCGCGGCCACGACGTCCTCCACGCCGCCGTCCACGGCCGCATC
CACGATGCCTCGGCGCGCGGCCTGCGCGGCCGTCATCTTCTCCCCCTTCATCACCAGGTC
CCTCCTAGCGGCCGCGTCGGGGACCTTCTGCCGAACCAGCTCGCCGACAAAATCGACGA
TCTTGATCCCGGCGTCGACCTCGCTCATGTAGAGGAACCCGCGGGAGGCGCGCATGGCG
ACGGCGTCGTGCGCCAGCGCGAGCGCGCAGCCGGCGCCCGCGGCGTGGCCCGTGACG
GCCGCGACCGTTGGCACGGGGAGCGCGAGCAGGTCGGCGACGAGGCCGCGGAACGCG
GCGCGCATCTCGGAGAGGCGGTGCCCCGGCGCCGGGGCCGGCCCCGCGCGCGCCCAC
GCGAGGTCGTAGCCGTTGCTGAAGAACTTGCCCTCGCCGGCCAGGACGAGCGCACCGGG
GGAGGCGCGGCGGGCGGCTGCGACCGCGGAGCGGAGGGCGGAGAGCAGGGCCGGGCT
GAGGCGGTGCTCCTCCGCGCCGGTGAGGGTGATGACGTGCACCCGCCCGCGCTTCTCCA
CGGCGCACAGGCTTTCCTCCATCGTTGGAGTGGATTTGGGGCTCCTCACTGCTATGCCAC
TGATAGTATGTTTACTTTTTCCCCTCTTGCATCTGGGAAGGTCCAAAATGTCCCTGGTCCAG
CTCTAGTACAGAAGTGTTCAACTAAACCTTTTCTGTTCTGGCGTCAACACCAAGGCCCTAG
AGCACAAACCAAATTTAGGAGTGAAACTAAATTATATCAGGAACATAATAATTGGAGGTGAA
TTTAATTGTATAGACTACCTTATTCAAATTATGAGCCTATTTTAATTGGATGTGACTTCAACA
TTATTAGATATGTGAAAGACAAGAACACTATGAATGGTGTTCATAAACACAAAACCATCTCA
ATTCTTTGATTCACAATTTTATGAACTTTGAGAGTTGGTAATGTGTGGTGGGCTTTTTCACTT
GGTCAAACAATCAAGAGTTCCCCATTTTGGGAAAAACTTGATCGAATTCTTGTCTCAAAAGG
TGAGAAGTCATTTTTTCCACAAGCTATGGATAAATGAATCCCTAGAGAGATTTATGAACACA
ATCCCCTGCTTCTTTCAACTGGAAACT
SEQ ID NO: 24: GSK2 promoter sequence
AAGTGAAGGATATCTTCTTTGCGAATGGGATATTCCGAGTTAACAATGGACAGGACACAAG GTTTTGGGAGGACAAATGGCTGGGGGATTTCTCGCTCCAGCATAGATTCCCGAGCCTATAT AACCT AGTGCAGCGG AAG AATGCT ACTGTGGCCAAT GTGCT AGGGT CT GT ACCT CT CAAT G TATCCTACAAGAGAGGCTTACATGGTGCTAATTTGGAGAGATGGCATACCTTAGTCAGCCT AGTAGTGGATACGACGTTGAACCAGGCAAGAGATAGTTTTCGTTGGAGCCTTCATCAAAAC GGGTTGTTCTCCACTCAGTCTATGTATGCGGCATTGATTGGGAACGGACAAGTACGGCAG G ATGGCCTCATCTGG AAACTAAAACTCCCCTT AAAG ATCAAGATCTTCTTCTGGTTCTT AAG ACAAGGGGT AACCTT AACT AAAG AT AACCTTGCCAAG AG AAATTGGT CAGG AT CAAAAAAA T GTGTTTT CTGT CC AC AAG AT G AAACC ATT C AAC AT CTTTT CCTCC AGTGT C ATT AT G CG AG ATTT CT ATGGCGT ACGGT AT ATTTT ACATTTGGCATT AG AG AACCAACT AGTAT AG AAG AT A TGTGTTCTTCTTGGCTTCAGGGGTTTCACCCTAATGTTAAAGCTAAGATATATGTGAGCGCT ATAGCTATTTGTTGGGCGTTGTGGCTAAGTAGAAACGATGTGGTTTTTAATAAATCTCCTAC CCAAACTTATTT ACAGGT ACT CTT CCG AGG AACTTACTGGT GT CGTTT CT AGG G ATGCTT CA AAGGCATAAAGAGGACACTAGAAGCATGAGGGAGGCCTGCAGACTTTTGGAGACATCGAT G ATGC AAGT CTT CTCG ACGTATG GTT G G ACCTT C AGTAAT AG ATT AACT ATGTG AT GTTT GT CTATTCTCCCAACTGCGTTTGGGTTTTGTGGCCAAATGTGGCGTTGTTTCGGTACTTTATGT TGTGGTGTGTGGACGGCCGTCATCAGCTGATGTAGGTCGGGATTTGGTTTTTTTTTCCCGT T AT CT A AAA A AT AT ATGTG G CT AG ATTT ATC ATC ATCC AG G T AAAT AT AG AC AT AAA A ATT A A GAT CT C AAAT G AAT AAT AT CTT CG ACCGG AT G G AGT AT G AC AT AATTTT AC AT C ACG ATTT C T AAAC AATTGCT AAGTT CTTT CCGCT C ATT CGGTCT ATT GT AC AT ATGTAT C AAC AT CTT AT A CT CAT CCGTCT C AAATT AAG ATT CGTTTT ACTT AATT AATGGGTT CAT AC AAC ACTTG ATTT A T AT GTT ATGT ATGT GT CT AGGTT CAT CTT CATTT ATTTG AAT ATTG AT AT AAAAAT CAAG AGTT AAAACAACTATTATTTTGGGACGCGGTGAGTATTTTTTCTCCATTTCCTCGCACCTAGGGAT TTCACGCGATGGATACACATTCTATGTAAAAAAAGATTGGGCGTTAACAGTCAGTCATTAAA AAT ATT CTTTTT CT AAAAAATT AAAAAAAG AGG AT CTCCATTGG AAAT AT GTTTTTT CG AAACT ACT G G AG ATG CT CT AGGT ATT GT G AAC AGTTTTTTT CT C ATT AAAAAGATGCTGC AAAAT CC GTT G ATGCT CCT AG AT CACT CG ACAACT ACAGTT ACCAT CGTT CATGCCTT CGGTTTT AGCA ACAAAAAACAGTGCAAT CCT AAACAAAAGCAT CT ATT AAT 
CACAATTGGTTGCTGCCATTGG T ACTGCACT CAGCAACT CT GTT AG C AAAG GTAATG CACT CTT GT AGT CTTT G ACCG G AT CTT TTGGCTAGGGAAAACTAAGGATGCGTTTGGTTACGGGACAGGCAGGATAGAGATGTCCCC AGGCGTACTCTCTCGTCACTCTAATTTCGAGGGGCAACTAGAGACAACATTGGAATAATCC TGTCT C AACCCCT GATT CT G AACT AAAC AACCTT ATTT AAGGTACGT CCT AT CT CAT CCCGT TCTGTCATTATAACCAAACGCACCCTAAAAAAATGTTCATGAAGGAGAGAATTAAAAGGTTC CAGTTTT CAGTATGCT AG TTT AGCAACG AGTGT ATTGCAATT AATT AT CACT ATT GTT CGG A CCCT CCATTTT G GTAGT ACAGGT AAAT CCCT ACT AAG CAAG AAT AAT AT GTTTTTTT ATGCTA C AC AT AGGT AGCGTTT G AGT AG ACTT GT ATTTT AAAT AAAAT G CT ACTGCT GAT AAG ACT AT AACGGTACGGGAAAAAGAAGACAATTTAGAGCTTGCCAAATTTCTTTAGCAGCCAATTAATT CCT ACC ACGGT CCTGTCCT C AG AATTTTTTTT AGTAAC AAAT C AGT G CACT ACT GATT CCTA AACCAGGCTGAAACCGGAAACGGCTCGCTGCGCTGCCGCTGCGTCACTGTCGCTGGCAA AGAAAACAACTCCCGGCCAGGGGTCCGAGCAGGAGCAGCAGTATATTTTCCCGCCGCTAA
TAAAAACAGTCAGCGGCACACTTCGCCAAGCGAGGCAGGCAGCGGCTGTCCCGAGCTGT
CGAAAGCGAGGCGCGGCGGCAGTCCTCGCAGCAGGGCCGACCGGTCAAAAGCACTGCT
GCTCCACACCACCCCCACCATCCCTTTCCCCAACCCCCGAAGCCGAGCCAGCGAACCACC
CCGCCCGCAGCCGCAAGCAAGCAGCCAAGCAGTGTGAACTGACCGTCCGTTCCGTCCAG
CCCACC
B. napus
SEQ ID NO: 25: OML4 amino acid sequence
MMPSDIMEQRGVSTPSHFREDTRISSERQFGFLKTDLIPENQGGRDRFSNLPKSSWTPESHQL
KPQSSLSGVHPSVSPNARNTTNGSQWESSLFSSSLSDTFSRKLRLQRSDMLSPMSANTVVTH
REEEPSESLEEIEAQTIGNLLPDEDDLFAEVMGDVGRKSRAGGDDLDDFDLFSSVGGMELDGD
VFPPMGPRNGERGRNNSVGEHHRAEIPSRTILAGNISSNVEDYELKVLFEQFGDIQALHTACKN
RGFIMVSYYDIRAAQNAARALHNKLLRGTKLDIRYSIPKEIPSGKDASKGALLITNIDSSISNEELN
RMVKSYGEIKEIRRTMHDNPQIYIEFFDIRASEAALGGLNGLEVAGKQLKLALTYPESQRYMSQF
VAHDAEGFLPKMPFTNTSSGHMGRHFPGIIPSTSIDGGPMGISHSSVGSPVNSFIERHRSLSIPI
GFPPLANVISASKPGIQEHVHPFDNSNMGIQSMPNLHPHSFSEYLDNFTNGSPYKSSTAFSEVV
SDGSKANDAFMLHNVRGVDGFNGGGIGSPMNQNSRRPNLNLWSNSNTQQQNPSGGMMWP
SSPSHLNSITSQRPPVTVFSRAPPVMVNMASSPVHHHIGSAPVLNSPFWDRRQAYVAESLESP
GFHIGSHGSMGFPGSSPSHPMEIGSHKSFSHVAGNRMDINSQNAVLRSPQQLSHLFPGRNPM
VSMPGSFDSPNERYRNLSHRRSESSSSHADKKLFELDVDRILRGDDVRTTLMLKNIPNKYTSK
MLLSAIDEHCKGTYDFLYLPIDFKNKCNVGYAFINLIEPEKIVPFYKAFNGKKWEKFNSEKVATLT
YARIQGKVALIAHFQNSSLMNEDKRCRPILFHTDGPNAGDQEPFPMGTNIRSRPGKPRSSSIDN
HNGFSIASVSENREEPPNGTDPFLKEN
SEQ ID NO: 26: OML4 nucleic acid sequence
ATGATGCCGTCTGATATAATGGAACAGAGAGGTGTATCAACACCTTCCCACTTTCGTGAAG AT ACT CGTATT AGTT C AG AG GT AACTTTTT CTTTT ACT GTGTAG C ACC AT CTTTGT C AC ATT A TCTG CC ACT ATTTT CT AT GAT GTTT AAAACT GTTTT CTTTTT GTTT CT C AAGTAT ACTT GTT CT TTTGTCTGGCAGAGGCAATTTGGGTTTCTGAAAACAGACCTGATTCCTGAAAACCAAGGTG GTCGTGATAGATTTTCAAATCTGCCAAAGAGTTCCTGGACACCTGAAAGTCACCAGCTGAA GCCACAAT CT AGCTT GTCTGGGGTGCACCCCT CT GTT AGCCCT AACGCAAG AAACACCACA AATGGTAGCCAGTGGGAAAGTAGTTTATTTTCCAGCTCACTGTCTGATACATTTAGTAGAAA ACGTAAGCTT CTGGTT CACTTTT AT G AATT GTT ACTT ATT AT GTTG ATTTT GTTTT AT CCT CT A CGGTAAAG AAACG CCGTTT GTT AAT CT AGT AC AT CAT AG ACG AT CGT G AAAGTTT GTTT CTT T CT CCTTT AACTT ACT GTACTTT AACT ACTTG ACTGCGTCT CCAAATT CTTGGTTTTTGCAGT ACGGTTACAGAGAAGTGATATGCTATCTCCTATGTCTGCGAACACAGTTGTTACCCACCGT GAGGAAGAACCCTCTGAATCTTTAGAAGAAATTGAGGCGCAAACTATTGGAAATCTTCTGC CAGATGAAGATGACCTCTTTGCAGAAGTGATGGGTGACGTTGGGCGTAAATCTCGTGCCG GTGGAGATGATCTAGATGATTTTGACCTTTTCAGCAGTGTTGGTGGCATGGAGCTAGATGG AGATGTTTTTCCTCCTATGGGCCCCAGAAACGGAGAGAGAGGCCGCAATAATTCTGTTGGC GAACATCATCGAGCGGAAATTCCATCCAGAACAATTTTGGCCGGAAATATCAGTAGCAATG TCG AAG ACTATG AG CT G AAG GT CCTTTTTG AG GT ACCTT ATT CC AG C AG CGTTTCCCCCC A C AG ATTT GTTT AT AT AAT CT G G AATT GATT ACTT CGT ACTG AG AAT ACTTTT ACTT GTT C AG C AATTTGGAGACATCCAGGCTCTTCATACAGCTTGCAAGAATCGTGGTTTTATCATGGTATCC TACTATGATATAAGGGCTGCTCAAAATGCGGCGAGAGCACTCCACAATAAGCTGTTAAGAG G AACG AAACTT GAT ATT CGTT ATT CT AT CCCT AAGGTATG ATT CCTT GTTTTT AT G AAATAT A TTGT CTTTGCT CTGT G G AC AGT ATTT GT G ACTT AT GTT G ATTT GTATCT AT CTT AC AATTTT C TTGGCTCCAGGAAATTCCTTCAGGAAAAGACGCCAGTAAAGGAGCCCTGTTGATTACTAAT ATT GATT CGT CT ATTT CAAAT G AAG AACT CAAT CG AATGGT CAAAT CGT ATGG AG AAAT CAA AGAGGTTGATATATTGAGATGCTCCGTTTAGTTACTTTTCTGAGGTAGATTCTAATGATGTTT CTGTGGTTTGCAGATTCGTAGAACCATGCACGATAACCCACAGATATACATAGAATTCTTTG ACATCCGAGCGTCAGAGGCTGCTCTTGGTGGCCTGAATGGACTCGAGGTTGCTGGGAAGC AGCTT AAACTTGCGTTAACCTATCCAG AG AGTCAAAGGTGGGTGACTGGTTGTTTTTTTTTT CT CCCTGGTTT AT ATT CCTTT GTGGGCTGT G AAT G AAT ACAAAAT 
CCT AAAT CAAAATG ATTT GAACATGTGCTTTGCTGTTAAGTATTTACGAGGATGCCAGTTGTGTTGATGTATGGGGTTC ACCCATTCTTTTTTCTTTATTTCAGGTACATGTCACAGTTTGTTGCACATGATGCTGAAGGG TTT CT ACCT A AAAT G CCTTTT ACT AAT ACATCATCTGGGCACATGGGTATG CTTTT G C ATT C A GCATTT GT AATT CTTTTTTTT ATT G AATG ATTT GT CAT CTTG AT ACT CAAACCACTGCCGTT A AAT ATCTCTGTGT C AG GG AG AC ATTT CCC AGG AAT AATT CCTT C AACCT CC ATT G ATG GTGG ACCTATGGGGATTAGTCATAGTTCTGTTGGATCGCCTGTGAACTCCTTCATTGAACGTCATA GGAGTCTCAGCATTCCTATTGGATTTCCACCTTTGGCAAACGTCATCTCAGCCAGCAAGCC CGGAATTCAGGAGCATGTCCACCCTTTTGACAATTCAAATATGGGGATCCAAAGCATGCCA AACCTT CAT CCT C ATT CTTTTT C AG AGTACCT CG ACAACTTT AC AAAT G GTAGT CC AT AT AAG TCCTCGACAGCATTTTCTGAAGTCGTCAGTGATGGCTCGAAAGCAAATGATGCCTTTATGTT ACATAATGTTCGTGGAGTGGATGGCTTTAACGGAGGGGGTAAGCTCTTTATCTCTAAATTG CT ACT GTTTT GAT AAATTT GT CG AAG AAT AAT G ATG ATATGT AGTT G AC AATT GT G AGTTT AA G AAG AAT GTCTGCCGTAG C AC ACT GTTAG G ATGGT CCTT AC AATTTT AGT G G AAT CT G AAAT GTGCT ACAGCG ATG AAAATT CT AGGT ACT GTTT CT GT AG ACAACTTTTTTT AAAAGC ATT CTT GGTGT AAAACTT GT CAT CCTGGG AAAAT ATT ATT AGT ATT AT GTT CTT AATTGCAGTCAT AT A GACAGATAACTGTGCTGGGTTTGAAATTGAATTTGAAAGTGGCTGAAACATTCGTTGTGTAT GT C AAC AG AATT G C AC AATT ACTG AGT G CT AGT ATTT CTT CT ACTGT CAT AC AT AAT ATT GTT TTTTTCTTTCTCACTTTTAGTTGTTGTGGTCTTTTGACTGTAGGCATAGGGTCTCCCATGAAC C AAAACT CCCGCCGCCCT AACCTT AATTT AT G G AGC AATT CT AACACT C AG C AAC AAAAT CC TTCAGGTGGCATGATGTGGCCTAGCTCGCCGTCTCACCTCAACGGCAAACGTCATCTCAG CCAGCAAGCCCGGAATTCAGGAGCATGTCCACCCTTTTGACAATTCAAATATGGGGATCCA AAGC ATGCCAAACCTT CAT CCT CATT CTTTTT CAG AGTACCT CG ACAACTTT ACAAATGGT A GT CC AT AT AAGT CCT CG AC AG C ATTTT CTG AAGT CGTTAGTG ATG GCT CG AAAGC AAATG A TGCCTTTATGTTACATAATGTTCGTGGAGTGGATGGCTTTAACGGAGGGGGTAAGCTCTTT AT CT CT AAATT G CT ACT GTTTT GAT AAATTTGT CG AAG AAT AATG ATG AT AT GTAGTT G ACAA TTGTG AGTTT AAGAAGAATGTCTGCCGTAGCACACTATTAGGATGGTCCTTACAATTTTAGT GG AAT CTG AAAT GTGCT ACAGCG AT G AAAATT CT AGGT ACT GTTT CT GT AG ACAACTTTTTT T AAAAGC ATT CTT G GT GTAAAACTT GT 
CAT CCTGGG AAAAT ATT ATT AGT ATT AT GTT CTT AA TTGCAGTCATATAGACAGATAACTGTGCTGGGTTTGAAATTGAATTTGAAAGTGGCTGAAAC ATT CGTT GT GT AT GT CAACAG AATTGCACAATT ACT G AGTGCT AGT ATTT CTT CT ACTGT CAT AC AT AAT ATT GTTTTTTT CTTT CT C ACTTTT AGTTGTTGTGGT CTTTTG ACT GT AG GC AT AG G GT CT CCCAT G AACCAAAACT CCCGCCGCCCT AACCTT AATTT ATGG AGCAATT CT AACACT C AGCAACAAAATCCTTCAGGTGGCATGATGTGGCCTAGCTCGCCGTCTCACCTCAACAGCAT T ACT AGT CAGCG CCC ACCT GTT ACT GT ATT CT CT AG AG CACCT CCTGTT ATGGTG AAT AT G GCATCTTCCCCTGTGCACCACCACATTGGATCTGCGCCCGTATTAAACTCGCCTTTCTGGG ATAGAAGACAAGCCTATGTTGCTGAATCTCTAGAATCGCCTGGCTTCCACATAGGTTCTCAT GGTAGCATGGGGTTTCCTGGCTCTTCACCCTCACATCCAATGGAAATTGGTTCTCACAAGT CCTTTTCCCATGTTGCTGGGAATCGCATGGATATAAATTCCCAAAATGCTGTACTGCGATCT CCCCAACAGTTGTCTCATCTCTTCCCCGGGAGGAACCCAATGGTTTCAATGCCGGGTTCGT TT G ACT CGCCT AAT G AACG AT ACAGG AAT CT CT CACACCGTAG AAGCG AGT CT AGCT CT AG TCATGCTGACAAGAAACTGTTTGAGCTTGATGTTGACCGTATATTACGTGGGGATGATGTC AG G AC AAC ACT G ATGCTT AAAAAC ATT CCT AAT AAGT AAGTGG ATT C AGT GT CTTT CCTTT A TT CCTT GTT AT AT AT CTTTT GTT AG CTT CGT AG GTTGTTTG AT GTTTT CCTTTT C AATT CT G AA CTCTATAAAATGCTGCTATGGTTTAGGTATACTTCTAAGATGCTTCTCTCCGCCATTGACGA GCATTGTAAAGGAACGTATGATTTCCTTTATTTGCCAATTGATTTCAAGGCAAGCAGGCGTC CGT CCT ACCTTTTT AT AT AAT AGTCTT AT GT AG AAAATGGGCTTTTGGT ATTTGCAAT AT CAG T ATTTTTTT G CT AACCT AATTTT ACCTT CT CGTTT CAG AAC AAATGC AAT GTGG GAT ACG CTT TCATCAACCTTATTGAACCTGAAAAGATTGTACCATTTTATAAGGTACAGCCAGCCTTTTCT GTTGCTGCTTTTT AT AT ATTTTTTGG CTTTTT CT CTTG AAG AGCATTGGTT AAAAGTTT AAAAA AAACTTGCAGGCTTTTAATGGAAAAAAGTGGGAAAAGTTTAACAGCGAGAAGGTGGCAACT CTTACATATGCTCGAATTCAAGGAAAAGTAGCACTTATTGCCCATTTCCAGAACTCAAGCTT AATGAACGAAGACAAACGTTGCCGGCCTATTCTTTTCCACACCGATGGTCCAAATGCTGGT GAT C AG GTG AAT GTT ACT AAC AC AT CAG AT AAC AT CAT CTT GTT AGGGTT CT C ATTT CGT AG TAGTTGCTCAATTTCGCTCTCCCTTTGGTTGCACATATTGAAATGGGTTCTTAGTGAGATCT CAT AAGTT CAAAG AT GTGGTG ATGCT CAGTT ACT CAAT AAG AG ATT G ATTT GTTT CAT ATTT G 
TCACCTTTGTTGTTATTATTTGCAGGAACCATTTCCAATGGGAACCAACATACGATCAAGAC CAGGAAAGCCACGAAGCAGTAGCATTGATAACCACAACGGCTTTAGCATCGCTTCCGTTTC AGAAAACAGAGAAGAACCTCCTAATGGAACCGATCCTTTCTTGAAGGAGAACTAACCAATG AGCAAAAAAACCAAGCAGAGGTAAAAGAAAGTTAAGGAAAAATGAAGAGCTAAAGATATAA CAC AAGTTTT AT ATT ATT AT AAT CAT AT CAT CAGCACACCCT AG AGTT CT GT AAAT CGGGGG TGTT AAATTT ACCCT G AC AAAACT GTTTTT G CGGT G AAG AT AT ATTTTTGG AG AG AT CATT AA ACTTT GTTG ACCT C AAACCTT C AC AGGTTGCTT C ACC AGTTTTGTT GT ATT AT C AAAT ATCCC CT G AG AAAT AT CTT CG AG AGTTT CT CTTT ACTTTTT GTTTTTTTTTTT GT CTT GTTT G GGGTT A TT C AAGT ATTTTT GT CTT CTTGCT ATCG ATGTAGTATGT AAC AAGCCTT G G ATTT ACATT C AA CGT CTTTGCTGGCT ATTTGTGGCCATTT CAT GTT GTAACTTTTTTGG AG ATTTT AAT G AATGC TT CCTTTTTGG AT AAA
SEQ ID NO: 27: OML4 promoter sequence
AGTAATT AAT ATT CTTTTCGTT CCACAAAT AT AATTTTTTT AGT ATTTT CACAC AT ATT AAG AA AACACGCT AAACT ACCAT AAT AAAT GT ATT GTTTT AT GT AATTTT CAATTTT CAAT AACTTTT A ACC AAT AGTAATT CAAT AAAGT CAATT AATTT CTTT G AAATTT ACAAATTTTT CAT AG AAAAC A C AAAAAT AC AT ATTTGT G AAAC AAACTTTTT C AAAAAAGT CT AT CTT GAT G AAACGG AT G G A GTATTATGTAT AAT ATTTTT ATT AT AT ATTTT ATTGCTAAATAAAAATTTTATGACTTTTGTTTA CTTTTT CACCAAT AAAAG ACT AT AATGCAAAAT GT AAAAT ATTT AAAGTTT AATTT G AAGTT GT T ATTT CGG AAAT AAT CACCTT CG AAGTTT AAATTT GT AAT ATTGCAAACTTT ATTTGG AG ATG TTTT C ACG GT CG ACTTGCT AC ATG ACT CTTTTTTTTTT GT AGC ATGCT AC AT G ACCCT CT ATT CTTTTTTTT CCCCT ATTT ATT GTT ACTTT ACAATT G AAAT AAT AAGGC AAAAT ACAAT AGTGG A TGACTTTTTTCCCCATACCACCTTTTTCGGTTTTCTCTATTTGGTTGTTCGAACCTGCACATG CT C ATTT GATAGCGTGGAAGG ATT G G CC AT C A AAC A A AT A AAA ATT C AC A AT C A AG G AT ATT T ATT AT CAGTTTTT GTTGTTGTG C ACTT C ATT GT AAAAT AAAAAAAAT CC AAAC ACGG AT GAT AACAACCGTGGATCACGAGTAAACTAATTCACTCAGTCATAAAAAGAAAGAGATATAGTGA GCAAAAAATC ATTTT AAG AT AGTATT GAT CCAACCAACC AAACATT AT CTT CAAAAATT ACAA T GTTTTT ACG AC AG TTG AT AAAAAAAAAGCTT ATTT AGT AAAC AT AAAAACT AT G G AGT AGTT TTTTTTGT AAAC AC AAACT AT AGTTT C AG ACTTTGTTTT GTATT CTTT C AAC AAAG AGTGT AA CT AT A AAA AC ATT CTT AT C AACTTTT CG CT C A AG TTG TT AC AG AAA AAA ACTT AAT C AAG A AT T AAAAT G ACACTT AT AAAATT AT CAAT AT AAT AAAATT ATT AATTT AT AG AGGTT AT AT CAATT AAAAACT AAC AACTTTT ATT CGT GTTTT C ATT CAT AT AT AAT AGT AAAGT GT AATTT CCT AACT T C ATTT G AAC AT ATT CT AAT AAAT AGTTT GT AG ATT A A AAAC A A AT C AC ACTTT G AAAAG AAA AAAAAATCAAATAGTCCACATGTTCAATAAATAGGCTGCTCCTTGGTTACAAAACCGCGCTC ATCGACTGCTCGCTGCCGTCGAGACTCTCGTGTGAGACCGTAATTTTTGTCAGTTTTAGTT AT AAT CTACGGT CC AG ATTT AAT ATCGT ACG AAACC ACT AG AT CC ACG AT AC AT CC AAC AC A G AAG AGT G CT CT CCT CTCCT C AACT CT ATTTT GTTTTTTT CCT CT C ATT CTTTTTTT AGTCG A AACT CT AAACC AACT AACCG AAAAAAACAAAAAACT CTTT CT CT CCT CT CCATTT CT CT CT CT 
AGGAGAGACAACCGGAATCGCACGTCGACGGGAAGAGTATCGCCGGAACTATTATAATTA CCGCCGGTCGCATAGATTATTCGTTGGAAACAACGCGTCGTGAGAGGAGAGGAAATTCGA AAAAAAG AAG AAAAAAATT AG AAACACCG ATT CACTTTTTTTTTGGGGGTT ATTTT AATTG AT TTGTGTGAATTAAATATTCTGCGATGGATGTGATTGGATAGAAGGAAACAAAAAGGAAAGG AG G A AG AT A A AAG AG AAG G CG AATT ATT CTGCTCCTCTCTCTCTCTCTCTCT CTTT CTTCTC TGTCGAACATCGCTGTTGCTGCTGTGTGTTTTCTTCGTGCATCCTTTTATTTTTCAAGGTAAT G AATTT CACG AG AT CCATT CTT CACAAGTTT CTTT CTTTTTTT AAATTT AATTT AATTT AGTGG AAAAAAT GTTTGGG AG G AAG CGT AATT GT GTTT GTTT GT AAATT AGGT AAG CT CTTT GT ATTT GTTTTTTT ATTTGCTGGT G AGT AATTT AGGTTT ATTTT CTT AAATT AAGTT AAACTGGGTGCC C AAGTTT GT G AATT AGGT AG G AGTT G GTTCCCT GTTTGC AT AT AAT G AGCTG AAC AAG G AT CATG AATT AGGCG AAATT GT AGT CT CTT ATG GCTTTTTG AAAT ACCT AAT CTTTGTCTT CCAG GTGTTTCTACTCCGCTTT AAAGGAGAG AGGTTT AAG ATG ATTTTTTTCGTATTGAACTTCTTC TT AG AGTACGT AAAGTTGCT G ACTTTGTTTGG ATTTAGGGTTT G ATTTT G CTT AGTT CT AATT G AATT CTT GTGTT GTTTTTTTTT GT GT CCTTTG AGTT ATTTTGCTT AAT CTTTTTT GT CTGGCA AGATCCTTCTTTGCAATGAATAGTGGATTTTGTTTCTTTTGGAGACTTACTGGCTTTGAATCT AAAACTGGTT GTT CAT CTTT C AGG GG AAGTG AT ATG GT CCGTTG AAAAAG ACT AAAAAG CT A C AAAAG AG ATTTT GTTTT ATT ATTCC AAATTTT G CTGT CAT CTGC
SEQ ID NO: 28: GSK2 amino acid sequence
MTSLSLGPQPPATAQPPQLRDGDASRRRSDMDTDKDMSAAVIEGNDAVTGHIISTTIGGKNGE
PKQTISYMAERVVGQGSFGIVFQAKCLETGESVAIKKVLQDRRYKNRELQLMRLMDHPNVVSL
KHCFFSTTSRDELFLNLVMEYVPETLYRVLKHYTNSSQRMPIFYVKLYTYQIFRGLAYIHTVPGV
CHRDVKPQNLLVDPLTHQCKLCDFGSAKVLVKGEANISYICSRYYRAPELIFGATEYTSSIDIWS
AGCVLAELLLGQPLFPGENSVDQLVEIIKVLGTPTREEIRCMNPNYTDFRFPQIKAHPWHKVFH
KRMPPEAIDLASRLLQYSPSLRYTALEACAHPFFNELREPNARLPNGRPLPALFNFKQELAGAS
PELINRLIPEHIRRQMSGGFPSQPGH
SEQ ID NO: 29: GSK2 nucleic acid sequence
ATGACATCACTATCATTGGGCCCTCAGCCTCCGGCTACTGCTCAGCCGCCGCAGCTTCGC G ACGG AG ATGCTT CCAGGCGTCGTT CCG AT ATGG AT ACAG ACAAGGTTGCTCT CT CCCT CT CT CT CT CT CT CT CT CT CT CT ACTTT AACGTTT G GT G AAC AAATT G C ATTT CG ATTGCGTTTGG TGGCT ATT GT AG AT CTCGG CT AG AT CT AG CTTCG ATTT C ACTTTTTTTTT G CG GTTT CT CAG CG AAT CG ATCTGT GTTTT CT CTT G CTATCGT CGT AGTT CGTAGTTCGT AGT AG CTAG CTAGT CTT ACT ATT CAG CT G AAT GTTT C AACC AAT CAT ATT G AAG AT CTT G AGCT AT GTTTT GATT AC TAGTATTAGGGTGAAGAACATTGGTTCTCTCTGGGTTTGAAATTCGATTTCACAGACGATGT AG AT CTT AATT ACT AG ATT GTTT AACT AAT C AC AC ACTT GTT CC ATG ACT GT AAGTG ATTTG A TGTATTGGATTTACATTTGTTTGTTATCTACGTGATTGGACTCTGAGCTAGGCCTTGACTGT T CTTGG ATTTG AAG ATTTCAT AT GTTT AAAG AATGGTTTT GTCT ATT GATT GTTT CGT AAT CT CAT GTTT GTT GTTTT CAGG AG AAG AGCACT ATTTTTTTTTTT AAT CAGTTTT CTTT GTT CTTT C TTGACGAGAATAGTTTGATGATATGTTGAGGTTTGGTTGCAGGATATGTCTGCTGCTGTGAT AGAGGGAAACGATGCTGTTACAGGCCACATCATTTCTACTACAATTGGAGGCAAAAACGGT G AACCT AAACAGGTTT G AGTT CCTTT CTTT GTTT G AAAT CTT CAAAT GT CAT AATT AGT AACA TT GTT AAT GATT AC ATTT AAT CAT AT GTT C ACTT G CTTTT CC ACTT ACAG CTT AAAAC AAT AAC T AAAC AG AG ACT CTTT GTG GTT C ATTT ATT AC AACTTT AAGT AGG CT ACT C ACTT AT GTTTT A CTCTTTCTGTTTTTTTGCAGACCATCAGTTACATGGCCGAACGGGTTGTTGGACAAGGATC ATTCGGAATCGTGTTCCAGGTACCTTTGTGCTTCTCAATCACTGTTACCCTTTGTAGGCGGT AGCTTTCTTCTTTCCTTTCTGATCGAAGTATGAACTTACCATTGTAGGCCAAGTGCTTGGAA ACTGGAGAATCAGTCGCCATTAAGAAGGTTTTGCAAGACCGGCGCTACAAGAATCGTGAG CTGC AGTT GAT G CG ACT AAT G G ACC ACCCAAAT GTGGTTT CCTT G AAG C ATT GTTT CTT CT C TACAACGAGTAGAGATGAGCTCTTCCTCAATCTCGTTATGGAGTATGTACCCGAGACTTTG TACCGGGTTTTGAAGCACTATACTAATTCAAGCCAGAGAATGCCTATTTTCTATGTCAAACT CT AC AC AT ACC AAGTATGC ATT GTT ATT ATGT GTTTCCCTTT C AG GC AGT AT CT CT CTTT GTT GATT CT AAAACGGGT AAG AAT ACTTTTTTT CTGCAG AT CTT CAG AGGCTTGGCTT AT AT CCA T ACT GTTCCTGGTGTCTGT C AC AG AG ATGTG AAACC AC AAAAT CTTTT G GTACGTT GATT CT ATTTTGGGTTT GT CTTT GAT AAT CTTG AT AG ATT GTT AACT AATT CT CTT GT ACGTT CTGCAG GTTG ATCCT CT 
C ACT CAT C AGT GT AAG CTGTGT G ATTTTGG AAGT G C AAAAGTATTGGT AAG G AGCTTT ACCTTT AAT AT CCTGCTTT G CTT ATTT C AACT GTGTATGTGTTCTGTCT CAT G AAA TCTTTGCGACACATGATTATTCGGATTAGGTGAAAGGTGAAGCAAACATATCATACATTTGC T CT CGGTATT ACCG AGCT CCAG AGCT CAT CTTTGGGGCCACAG AGT AT ACAT CCT CCAT AG ACATATGGTCTGCTGGTTGTGTTTTGGCAGAGCTCCTTCTTGGCCAGGTTAGTGTAAACTA TTTT AT CT GTTT AACT CT AG AAT GTTCCGCTAT C ATTTTTG AT ATTT AT AATTTTTT ATCTGTC AGCCGTTGTTCCCGGGAGAAAATTCTGTGGACCAGCTGGTAGAGATCATCAAGGTGAAGT TT C ATTTTG AT CAT AT GTT AT CTTGCTGT CGT ATT CT GTTTT GTATAT AAAATT CAT AT AAT CT T AT AG ATTT GT AAT GAT AT AT GTGCTGCGTTTGTTT AG GTT CTTGGT ACT CC AACT CG AG AA G AAAT CCG ATGCAT G AAT CCAAACT ACACAG ACTT CAG ATT CCCT CAAAT CAAAGCT CACC CGTGGCATAAGGTATTTATATGCATGTCCGATCATACAGTGGCTAAATAGTTGAATCGCTTC T CATT AT ATT CGT AT AAATG AAAAACT AAACAAATTCACAT ACTT CT CT CTG ACCTT CAGGTT TT CCAT AAG AG G ATGCCT CCAG AAG CCATTGACCT CGC AT CT CG GCTT CTT C AAT ACT C AC CG AGCCTGCGTT ACACTGCGGT CAGT AT CT CT AAACCACCAAGTACT CTT AATT GTT AAG A GTGTTCTCT CT G G ATT C ATTGG ACCTGC ACTGC ACT GT CC AAT GTTG CT G ATGTTTT CTTTT AACT G ACATTTTTTT GTTGTTT CTGTGT AAAAGCTT G AAG CAT GTGC ACAT CC ATTTTT C AAT G AACT CCGTG AGCCG AATGCT CGT CTT CCT AACGGCCG ACCT CT ACCAGCCTT GTT CAACT T C AAAC AAG AG GT ACGTC AAT C AC AGC AAAAAAAAAAAAAGT AAT AT AG CT CC AAACC ATT A CT AG AAT GTT CAGTTTT AAAC AGTT G CCT AAT CT GTAAT CT CT CT CT CT ATT CG AAT GTT CAT AACAGTT AGCTGGGGCTT CACCAG AGCTG AT AAACAGGCT CAT ACCGG AGCACAT AAGGC G ACAG AT G AGTGG AGGCTT CCCAT CACAGCCTGGT CATT AG AAAAGG AAT ATGG AAACT G GGATGCTTTTGCGGAGCAAATGCCTTATGGAAAAGAGGAGAGAAGATCTCTGATTTTTCAG AGGGTTT AACT AAAAT AT CAGCTT AT G AGT AG AG AG AT G ATTGGCCAATT AAGCTTTTT GAG AAAT C AG G AGGT G GT G ATG ATT GTGTCT AAT AT AC AATT CT CT CTTTT CT CTTTTT AT GTT AT AATT CGCTTTT G ACTT GTAG AG AT ACCTTTT CTCGTTGT ATT ATTTGT AT AT GTTTTT GTCCG TAAGACAGCAAACCGCGATGATGGAAGAATGGAATGAATGAATGATGTCTAAAACTTAAGC CT AAT AACAAGGT CGG AGCT CAT ACAT AT AT AT AAAGTT AG AAT GTG AG 
AGCT CCAT GTT AA AATAACCTTAACATTGGCACGTGAATACAATTGCATGATTGAATTTCTGGTACGTCGAGAGG AAGT AAGTTT AT AG AAAGTT GTTTGTG AACAAACAAATGG AG AAACATTT GTTTT GTTGCAAA GAAACGTATGGTTCCATAATGTAGAAGAGGCATTTGAATGTGAGCTTTAAAACCTTTCATGA AAG AAAAGG AAAGTT ATGGGT CACT AACCGG AAAAT AT AT CATTTG AAAT GT GT AT AAAACT TAATGGGCTGAAAACTGTAGATAAGGAATTCCGGATTCTGGGAACCCTATTAACTGAGCCA C AAGC AAAG AT ACG AG G ACC AAACCCT AAAT CTT CT CT CTTTTTTT CCCCCT C ATT C AGGTG TTTTT C ATT AGT C AC ATT CGTT CTTT AT ACTTTT ATT AT CTTTG ATTGTT AAT AG ATT GTCTG A AAACGC AT GT CCACTT GTTT CT GTTTT ATTT GTTTTTTT CTTTTGCTGCAGGCTTTGG AAGT C C AC ACT AAG GT G AAAC AACT CTCCCT AAT CT AT ACGCCTTT C ACCT CTTT CCCCG CCTTT G A T CCTTT G AG AGTTTTTTTTT CTTTTTTTTTTTTG AAAATT CAAATTTT ATT CAACCTG AG AAT C GG G AAAT CAT ATT CGGTT AC AAAT CCGCTTTG AAC AAAAGTT CC AAAAT C AAACT ATT ACT A T CTTTGCCC ACT CACT AAACCT G AC AC ATT CTGCT AGCCT GTTTTT CG AAATT CTT C AG AAT CGGTTGCCGTTCTAAACTTTTGACGAAACCCAGAGGACTCCTTGTTGCTGCAGATGCCGG G ACTT CCT AT GT CGT CT CCGG AGTAGCT AGTT CCCG AT G AACT CCACAAAACT CAT AACCG TCAATGTCTATGAAGCGCTTGACGGTTCAATAGGGGATGGTTTCAGGTACTTACGATTAGA GGCTTTGCAACCACCACCAAAACCGGGCTTGTACACAAACGCCACACAGCGCTCGAAACC GAATCCCAAACCGGGCAAGCGCCCGCTACTGCCGCCACTAAACCAACCTTCAAGAAAGCG GGCCGATTTTGCT
SEQ ID NO: 30: GSK2 promoter sequence
ATG CGTTCT AAGTAT C AAG AT CCT ATT ACT ACT ACT ACT AC ACCTT GT AAT GAG AAT CAT AAG GT G AAG AT AAATGG AT CTT CT ACT CCAG AAGGG AAAG AG AG ACT AG AG AACTT G AGCT CAG CTT C ACG CACT AAAACC AGC AAAAACTTT G GTG AGCT CTT G GCTAGTG AT G AC AAT AC AT G GGAACCTTATTCTGAGGCTCCTGTTGCTGAGAAAACTCTGTATGTAGACACTGTGCATTCA GT ACACAAG AAGGT ACAAG AAG AGT CTTT ATT AAAAG ATT ACCCTT CACT AG AAGTT GTT CC TGTTAAAGAAGATGTTCAGAACTTGATTGGAGCCAGTGAAGAAGCTATCTCAGGTCTAAAA GTTGAAGAATGTGCTGATCAAGCTATTTCTGAAGTAGTAGAGATTACAAAGGATTTTGAATG TT C AAGG CTT CAT CAT CAT C AC ATT GTTG C ACC ACC AT CATTGCC AAAAG CT CCTT CAG ATT CTTGGTT AAAG CGT ACGTT G CC AAC AAT CCC AT C AAAG AAC AACT C ATT C AC AT G GTTGC A GTCT CTTGG C ATT G ATG AT AAT AAT AAT C AAAT C ACC AAG AGT ATT C AAG AAAAT CT C AAGT GGGAAACTATGGTCAAAACCTCCAATACACAACAAGGGTTTGTGTGCATCTCCAAGGTAAG CT AAT GT GTATTTTT C AAAGT C AAT G GTT G GCC AAAT GTTTTT GTTTTTTTTTTT GTTTTTG AC AAGTT GATT AG CTT ACTTT GTTG ACC ATT ATTTTT GT CTTT C AGG AC AC ACT C AACCCT AT AC CAG AG G CAT AG C AAT AC C AAATT ACAAG T AA ATTT C A AC A AT A AAA A A AG GAT GAG CCA AT A AAGTTTTGTTTTT GTT CAT CTT CC AAATTTT CT CCT CTTT AATT ATATGT AAAT CT G AAAT AAA AGGTTCCTAAAAAGAGAAAAGCTATGGAGATGAAATAAAAGGTCTCAAATATTGTCTGTCAC TTGTGGGGTTTGGGGGGGGGTCTTATTGAAGTGATGTACAGCTCATGTTAACAGAGATTTT GTTGCAAT AAT ACT CCAT AATT CCAT GT G ACATT GTTT CTTTT G ACCTT CTTT AT AT ATT CT CT GCT AGTAAT AG ACTTTTTGTTTTGTT CTTTT GT AATT AT GTTT CTGT AAT GT AG AG CACT AAA GAGACCTGAAAACTGCAGAACTCAATTGAATGCATTGGCTAAATGGTTATGAGAGGAATTA TT G AAACAATTT ATGGT GT GAG AAGTT CAAAT ATT ATT CT CTTT AT AGT GT CATGG AT AG AT C AG AT AT AGTT CAGG AG AAAGT AAAG AAAG AAAAAAAAACTTT AT AAAGGT ATCTT CATT AGTT AAG AT AT AC ATG AAAG AAACTG CTG CTTT AG G AG ATGTTTTGTT GAT CTT C ATG ATT CTT CT A T CTTT AT C ACTT GTAT GATT GTAT CC ATGG CGGTTTTT G CTT G CTT C AA A AAC AAG AAAG AG AAGAATGGTTCCTGTAGCTGTGGCAGTTGTTGGTGGCTGCGGTTGTGGTGGTGGAGGCTG CGTTT AGT ATT AC AC AAAT GAG AT AT AT CTTGGT CCTTGG CG AGTTT CCTGGTAAT G ATTTT 
GGTTTAGAGCATCTTTATCCGGGTATACCAAAGGGTTTCTTAGCCTGTGGGTCCCGTGTAG G ACCC ATTTTTTTT AAG AAAT CGGTT AC AAAAACT ACT AAAT AGTAGTCG GTT ATT AAG GGTT TCTTACACTGTTCGCGGACCCCGCTAACACGTGACGGCTAACGATTGGTTCATTTTTTTTTT TT AAATT CG AAAAAGTAAAAAAAAT AAAAT AAAAAAAATT AGG AAACT CT ATTTGG AGTTT CA GGGAT AATGATGCTATTAGGTCTTCACCGATTGTGACTG ATTT ACTT AAGCAGCCATTCT AT AT AT AGTTT ACATTACGTACATATAGAACAAAAAT AT ATACAT AAAAT ATCAGATAAATTCAG AAT CAAAT AT AT ATGCG AT AT GTTTTT GT AAATT ATTT GTT CAAATTTT CAAGT CT ACAAAT AA T G AGT CACAAAAC AAAAT AT ACC AAG AAATGG ATT G CG AT CGTCC AT GTG AT AC AT CCAG G GCCCTCT AAG ACTTTT AAACGT ATCTCGT ATT G AACCAAATGTT AAAACCCCGTT G AAAAG G TAGCCATCTTGCTCGTATAAACGAAAATTTTCATAGATGGTAGGGGGTGATTGGTTGAACT GT AGCAAGTG ACTTT AACTTT AATTTTT AT CT ACAGTTTT AAAAACCAT CAAT CGTGCTTT AT ATT AG TTTTT AAAGCT ACCACC AAAAAAT AAAAAGT ACAGCCAAAAAAACAAAAAAAAAAAT A ACT GT AAAAAATTT AATTT CT AAAGCT CCATTTTTTTGG AT GT AGG AAATTTT AAAGCT CT GT TCACGCGTGGGCCATCCTTTTCAAACATACTATACTAGTTGTTATTTGTTACCCAAAATGTA AAT ACATGCT ATGT CCTT ACT AGGCAGTAT AT AG AAATT AGTTT GTTTT AAT G AAT CTGG AAC AAT ACT AACTT C AAT AATT AATTGC AAG GTT AT CC ACCCTT G ACT GAT G AGG AG GTT AGTCG CGTTCTCATTGGTGCGTTACTCTTACGCGCTCTATCGACGCGTGGACGATATCCGAAGCTC TTTTAATAATACAAAGAGAGAGAGAGAGAGAAGGGAAAGATAGTCTTTACTCTTCAGTGGT GGGTAGAGAGCGAAAGTTAGAGAAAGAGAGAGAAGAATAGCAC
SEQ ID NO: 31: GSK2 RNAi sequence
TCCCAGGTGAACCCAATATATCATATATATGCTCACGCTACTACCGAGCACCGGAGCTCATATTTGGTGCAACTGAATATACTACATCAATAGATATATGGTCAGCTGGGTGTGTTCTTGCAGAGCTACTCCTTGGTCAGCCATTGTTTCCAGGGGAGAGTGCAGTCGATCAGCTTGTAGAGATAATTAAGGTTCTTGGTACACCAACCCGTGAGGAAATACGTTGCATGAACCCGAACTATACAGAGTTTAGGTTTCCACAGATAAAAGCTCACCCTTGGCACAAGGTTTTCCACAAGAGGATGCCTCCTGAAGCAATAGACCTCGCTTCACGCCTTCTTCAATATTCACCGAGTCTCCGCTGCACTGCTCTTGATGCATGTGCACATCCTTTCTTTGATGAGCTGCGA
SEQ ID NO: 32: Cas9 nucleic acid sequence
ATGGCTCCTAAGAAGAAGCGGAAGGTTGGTATTCACGGGGTGCCTGCGGCTATGGATAAG
AAGTACAGCATTGGTCTGGACATCGGGACGAATTCCGTTGGCTGGGCCGTGATCACCGAT
GAGTACAAGGTCCCTTCCAAGAAGTTTAAGGTTCTGGGGAACACCGATCGGCACAGCATC
AAGAAGAATCTCATTGGAGCCCTCCTGTTCGACTCAGGCGAGACCGCCGAAGCAACAAGG
CTCAAGAGAACCGCAAGGAGACGGTATACAAGAAGGAAGAATAGGATCTGCTACCTGCAG
GAGATTTTCAGCAACGAAATGGCGAAGGTGGACGATTCGTTCTTTCATAGATTGGAAGAAA
GTTTCCTCGTCGAGGAAGATAAGAAGCACGAGAGGCATCCTATCTTTGGCAACATTGTCGA
CGAGGTTGCCTATCACGAAAAGTACCCCACAATCTATCATCTGCGGAAGAAGCTTGTGGAC
TCGACTGATAAGGCGGACCTTAGATTGATCTACCTCGCTCTGGCACACATGATTAAGTTCA
GGGGCCATTTTCTGATCGAGGGGGATCTTAACCCGGACAATAGCGATGTGGACAAGTTGT
TCATCCAGCTCGTCCAAACCTACAATCAGCTCTTTGAGGAAAACCCAATTAATGCTTCAGG
CGTCGACGCCAAGGCGATCCTGTCTGCACGCCTTTCAAAGTCTCGCCGGCTTGAGAACTT
GATCGCTCAACTCCCGGGCGAAAAGAAGAACGGCTTGTTCGGGAATCTCATTGCACTTTC
GTTGGGGCTCACACCAAACTTCAAGAGTAATTTTGATCTCGCTGAGGACGCAAAGCTGCAG
CTTTCCAAGGACACTTATGACGATGACCTGGATAACCTTTTGGCCCAAATCGGCGATCAGT
ACGCGGACTTGTTCCTCGCCGCGAAGAATTTGTCGGACGCGATCCTCCTGAGTGATATTCT
CCGCGTGAACACCGAGATTACAAAGGCCCCGCTCTCGGCGAGTATGATCAAGCGCTATGA
CGAGCACCATCAGGATCTGACCCTTTTGAAGGCTTTGGTCCGGCAGCAACTCCCAGAGAA
GTACAAGGAAATCTTCTTTGATCAATCCAAGAACGGCTACGCTGGTTATATTGACGGCGGG
GCATCGCAGGAGGAATTCTACAAGTTTATCAAGCCAATTCTGGAGAAGATGGATGGCACAG
AGGAACTCCTGGTGAAGCTCAATAGGGAGGACCTTTTGCGGAAGCAAAGAACTTTCGATAA
CGGCAGCATCCCTCACCAGATTCATCTCGGGGAGCTGCACGCCATCCTGAGAAGGCAGGA
AGACTTCTACCCCTTTCTTAAGGATAACCGGGAGAAGATCGAAAAGATTCTGACGTTCAGA
ATTCCGTACTATGTCGGACCACTCGCCCGGGGTAATTCCAGATTTGCGTGGATGACCAGAA
AGAGCGAGGAAACCATCACACCTTGGAACTTCGAGGAAGTGGTCGATAAGGGCGCTTCCG
CACAGAGCTTCATTGAGCGCATGACAAATTTTGACAAGAACCTGCCTAATGAGAAGGTCCT
TCCCAAGCATTCCCTCCTGTACGAGTATTTCACTGTTTATAACGAACTCACGAAGGTGAAGT
ATGTGACCGAGGGAATGCGCAAGCCCGCCTTCCTGAGCGGCGAGCAAAAGAAGGCGATC
GTGGACCTTTTGTTTAAGACCAATCGGAAGGTCACAGTTAAGCAGCTCAAGGAGGACTACT
TCAAGAAGATTGAATGCTTCGATTCCGTTGAGATCAGCGGCGTGGAAGACAGGTTTAACGC
CTCACTGGGGACTTACCACGATCTCCTGAAGATCATTAAGGATAAGGACTTCTTGGACAAC
GAGGAAAATGAGGATATCCTCGAAGACATTGTCCTGACTCTTACGTTGTTTGAGGATAGGG
AAATGATCGAGGAACGCTTGAAGACGTATGCCCATCTCTTCGATGACAAGGTTATGAAGCA
GCTCAAGAGAAGAAGATACACCGGATGGGGAAGGCTGTCCCGCAAGCTTATCAATGGCAT
TAGAGACAAGCAATCAGGGAAGACAATCCTTGACTTTTTGAAGTCTGATGGCTTCGCGAAC
AGGAATTTTATGCAGCTGATTCACGATGACTCACTTACTTTCAAGGAGGATATCCAGAAGG
CTCAAGTGTCGGGACAAGGTGACAGTCTGCACGAGCATATCGCCAACCTTGCGGGATCTC
CTGCAATCAAGAAGGGTATTCTGCAGACAGTCAAGGTTGTGGATGAGCTTGTGAAGGTCAT
GGGACGGCATAAGCCCGAGAACATCGTTATTGAGATGGCCAGAGAAAATCAGACCACACA
AAAGGGTCAGAAGAACTCGAGGGAGCGCATGAAGCGCATCGAGGAAGGCATTAAGGAGC
TGGGGAGTCAGATCCTTAAGGAGCACCCGGTGGAAAACACGCAGTTGCAAAATGAGAAGC
TCTATCTGTACTATCTGCAAAATGGCAGGGATATGTATGTGGACCAGGAGTTGGATATTAA
CCGCCTCTCGGATTACGACGTCGATCATATCGTTCCTCAGTCCTTCCTTAAGGATGACAGC
ATTGACAATAAGGTTCTCACCAGGTCCGACAAGAACCGCGGGAAGTCCGATAATGTGCCC
AGCGAGGAAGTCGTTAAGAAGATGAAGAACTACTGGAGGCAACTTTTGAATGCCAAGTTGA
TCACACAGAGGAAGTTTGATAACCTCACTAAGGCCGAGCGCGGAGGTCTCAGCGAACTGG
ACAAGGCGGGCTTCATTAAGCGGCAACTGGTTGAGACTAGACAGATCACGAAGCACGTGG
CGCAGATTCTCGATTCACGCATGAACACGAAGTACGATGAGAATGACAAGCTGATCCGGG
AAGTGAAGGTCATCACCTTGAAGTCAAAGCTCGTTTCTGACTTCAGGAAGGATTTCCAATTT
TATAAGGTGCGCGAGATCAACAATTATCACCATGCTCATGACGCATACCTCAACGCTGTGG
TCGGAACAGCATTGATTAAGAAGTACCCGAAGCTCGAGTCCGAATTCGTGTACGGTGACTA
TAAGGTTTACGATGTGCGCAAGATGATCGCCAAGTCAGAGCAGGAAATTGGCAAGGCCAC
TGCGAAGTATTTCTTTTACTCTAACATTATGAATTTCTTTAAGACTGAGATCACGCTGGCTAA
TGGCGAAATCCGGAAGAGACCACTTATTGAGACCAACGGCGAGACAGGGGAAATCGTGTG
GGACAAGGGGAGGGATTTCGCCACAGTCCGCAAGGTTCTCTCTATGCCTCAAGTGAATATT
GTCAAGAAGACTGAAGTCCAGACGGGCGGGTTCTCAAAGGAATCTATTCTGCCCAAGCGG
AACTCGGATAAGCTTATCGCCAGAAAGAAGGACTGGGATCCGAAGAAGTATGGAGGTTTC
GACTCACCAACGGTGGCTTACTCTGTCCTGGTTGTGGCAAAGGTGGAGAAGGGAAAGTCA
AAGAAGCTCAAGTCTGTCAAGGAGCTCCTGGGTATCACCATTATGGAGAGGTCCAGCTTC
GAAAAGAATCCGATCGATTTTCTCGAGGCGAAGGGATATAAGGAAGTGAAGAAGGACCTG
ATCATTAAGCTTCCAAAGTACAGTCTTTTCGAGTTGGAAAACGGCAGGAAGCGCATGTTGG
CTTCCGCAGGAGAGCTCCAGAAGGGTAACGAGCTTGCTTTGCCGTCCAAGTATGTGAACT
TCCTCTATCTGGCATCCCACTACGAGAAGCTCAAGGGCAGCCCAGAGGATAACGAACAGA
AGCAACTGTTTGTGGAGCAACACAAGCATTATCTTGACGAGATCATTGAACAGATTTCGGA
GTTCAGTAAGCGCGTCATCCTCGCCGACGCGAATTTGGATAAGGTTCTCTCAGCCTACAAC
AAGCACCGGGACAAGCCTATCAGAGAGCAGGCGGAAAATATCATTCATCTCTTCACCCTGA
CAAACCTTGGGGCTCCCGCTGCATTCAAGTATTTTGACACTACGATTGATCGGAAGAGATA
CACTTCTACGAAGGAGGTGCTGGATGCAACCCTTATCCACCAATCGATTACTGGCCTCTAC
GAGACGCGGATCGACTTGAGTCAGCTCGGTGGCGATAAGAGACCCGCAGCAACCAAGAA
GGCAGGGCAAGCAAAGAAGAAGAAGTGA
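Sequence listings recovered from a scanned or extracted publication can carry stray spaces inside the nucleotide strings. The following is a minimal sketch, not part of the application, of how such a string might be normalised before use. The function name `clean_sequence` and the sample fragment are illustrative assumptions; the fragment reproduces the spacing pattern that extraction can introduce in the final line of SEQ ID NO: 32.

```python
import re

# Plain DNA alphabet; ambiguity codes (N, R, Y, ...) are deliberately rejected
# here so that extraction damage is noticed rather than silently accepted.
DNA_ALPHABET = set("ACGT")


def clean_sequence(raw: str) -> str:
    """Remove all whitespace from a nucleotide string and upper-case it."""
    seq = re.sub(r"\s+", "", raw).upper()
    unexpected = set(seq) - DNA_ALPHABET
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


# Hypothetical input: a fragment with extraction spaces.
fragment = "G G C AG G G C A AG C A A AG AAG AAG AAG TG A"
cleaned = clean_sequence(fragment)
print(cleaned, len(cleaned))
```

Whitespace removal only repairs spacing. It cannot detect bases that were dropped or substituted during extraction, so a cleaned sequence should still be compared against the official sequence listing.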
SEQ ID NO: 33: CRISPR target sequence for OML4 GTGGGTTCCGGCAACCTCAATGG
SEQ ID NO: 34: CRISPR target sequence for GSK2 AGGGGAATGACGCGGTGACCGGG
SEQ ID NO: 35: CRISPR protospacer sequence for OML4 GTGGGTTCCGGCAACCTCAA
SEQ ID NO: 36: CRISPR protospacer sequence for GSK2 AGGGGAATGACGCGGTGACC
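The relationship between the target sequences (SEQ ID NO: 33 and 34) and the protospacer sequences (SEQ ID NO: 35 and 36) can be checked with a short sketch. It assumes that each 23-nucleotide target is a 20-nucleotide protospacer followed by a 3-nucleotide 5'-NGG-3' PAM, which is the usual arrangement for Cas9 from Streptococcus pyogenes but is not stated in this part of the listing. The names `split_target` and `is_ngg_pam` are hypothetical helpers, not taken from the application.

```python
# Sequences copied from SEQ ID NO: 33-36 above.
TARGETS = {
    "OML4": "GTGGGTTCCGGCAACCTCAATGG",  # SEQ ID NO: 33
    "GSK2": "AGGGGAATGACGCGGTGACCGGG",  # SEQ ID NO: 34
}
PROTOSPACERS = {
    "OML4": "GTGGGTTCCGGCAACCTCAA",  # SEQ ID NO: 35
    "GSK2": "AGGGGAATGACGCGGTGACC",  # SEQ ID NO: 36
}


def split_target(target: str, protospacer_len: int = 20) -> tuple[str, str]:
    """Split a target into (protospacer, PAM), assuming the PAM is 3' of it."""
    return target[:protospacer_len], target[protospacer_len:]


def is_ngg_pam(pam: str) -> bool:
    """Return True if the PAM matches 5'-NGG-3' (assumed PAM rule)."""
    return len(pam) == 3 and pam[0] in "ACGT" and pam[1:] == "GG"


for gene, target in TARGETS.items():
    protospacer, pam = split_target(target)
    matches = protospacer == PROTOSPACERS[gene]
    print(gene, protospacer, pam, matches, is_ngg_pam(pam))
```

Under these assumptions, the OML4 target splits into the SEQ ID NO: 35 protospacer plus TGG, and the GSK2 target into the SEQ ID NO: 36 protospacer plus GGG.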

Claims

CLAIMS:
1. A method of increasing grain size and/or weight in a plant, the method comprising reducing or abolishing the expression and/or activity of a Mei2-Like protein 4 (OML4).
2. The method of claim 1, wherein the method comprises introducing at least one mutation into at least one nucleic acid sequence encoding OML4, wherein preferably the OML4 nucleic acid sequence encodes a polypeptide comprising SEQ ID NO: 1 or a functional variant or homolog thereof and/or introducing at least one mutation into the promoter of OML4, wherein the promoter of OML4 optionally comprises a sequence as defined in SEQ ID NO: 3 or a functional variant or homolog thereof.
3. A method of producing a plant with increased grain size and/or weight, the method comprising introducing at least one mutation into at least one nucleic acid sequence encoding an OML4 polypeptide, wherein the OML4 nucleic acid sequence preferably encodes a polypeptide comprising SEQ ID NO: 1 or a functional variant or homolog thereof, and/or at least one mutation into the promoter of OML4, wherein the promoter of OML4 optionally comprises a sequence as defined in SEQ ID NO: 3 or a functional variant or homolog thereof.
4. The method of any preceding claim, wherein the method further comprises reducing or abolishing the expression and/or activity of a SHAGGY-like kinase (GSK2).
5. The method of claim 4, wherein the method comprises introducing at least one mutation into at least one nucleic acid sequence encoding GSK2, wherein the nucleic acid sequence encoding GSK2 preferably encodes a polypeptide comprising SEQ ID NO: 4 or a functional variant or homolog thereof and/or introducing at least one mutation into the promoter of GSK2, wherein the GSK2 promoter optionally comprises a nucleic acid sequence as defined in SEQ ID NO: 6 or a functional variant or homolog thereof.
6. The method of any of claims 2 to 5, wherein the mutation is a loss of function or partial loss of function mutation.
7. The method of any of claims 2 to 6, wherein the mutation is introduced using targeted genome modification, preferably ZFNs, TALENs or CRISPR/Cas9 or mutagenesis, preferably TILLING or T-DNA insertion.
8. The method of claim 1 or 3, wherein the method comprises using RNA interference to reduce or abolish the expression of an OML4 nucleic acid sequence or a GSK2 nucleic acid sequence.
9. The method of any preceding claim, wherein the plant is a crop plant, optionally selected from rice, wheat, maize, soybean and brassicas.
10. A genetically modified plant, plant cell or part thereof characterised by reduced or abolished expression and/or activity of OML4.
11. The genetically modified plant of claim 10, wherein the plant comprises at least one mutation in at least one nucleic acid sequence encoding an OML4 gene, wherein the OML4 nucleic acid preferably encodes a polypeptide as defined in SEQ ID NO: 1 or a functional variant or homolog thereof and/or at least one mutation in the promoter of OML4, wherein the OML4 promoter optionally comprises a nucleic acid sequence as defined in SEQ ID NO: 3 or a functional variant or homolog thereof.
12. The genetically modified plant of claim 10 or claim 11, wherein the plant further comprises at least one mutation in at least one nucleic acid sequence encoding GSK2, wherein the GSK2 nucleic acid preferably encodes a polypeptide as defined in SEQ ID NO: 4 or a functional variant or homolog thereof and/or at least one mutation in the promoter of GSK2, wherein the GSK2 promoter preferably comprises a nucleic acid sequence as defined in SEQ ID NO: 6 or a functional variant or homolog thereof.
13. The genetically modified plant of claim 11 or claim 12, wherein the mutation is a loss of function or partial loss of function mutation.
14. The genetically modified plant of any of claims 11 to 13, wherein the mutation is introduced using targeted genome modification, preferably ZFNs, TALENs or CRISPR/Cas9, or wherein the mutation is introduced using mutagenesis, preferably TILLING or T-DNA insertion.
15. The genetically modified plant of claim 14, wherein the plant comprises an RNA interference construct that reduces or abolishes the expression of OML4.
16. The genetically modified plant of any of claims 10 to 14, wherein the plant is a crop plant, optionally selected from rice, wheat, maize, soybean and brassicas.
17. A nucleic acid construct, wherein said construct comprises a nucleic acid sequence encoding at least one single-guide RNA (sgRNA), wherein said sgRNA sequence comprises a sequence selected from SEQ ID NO: 35 and 36 or a variant thereof.
18. A method of increasing grain number in a plant, the method comprising increasing the expression and/or activity of a Mei2-Like protein 4 (OML4).
19. The method of claim 18, wherein the method comprises introducing and expressing in the plant a nucleic acid construct, wherein the construct comprises a nucleic acid sequence encoding an OML4 polypeptide as defined in SEQ ID NO: 1 or a functional variant or homolog thereof.
20. A genetically modified plant, plant cell or part thereof characterised by increased expression and/or activity of OML4, wherein the plant is preferably a crop plant.
PCT/EP2021/052951 2020-02-07 2021-02-08 Methods of controlling grain size and weight WO2021156505A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
CA3167040A CA3167040A1 (en) 2020-02-07 2021-02-08 Methods of controlling grain size and weight
US17/760,160 US20230081195A1 (en) 2020-02-07 2021-02-08 Methods of controlling grain size and weight
EP21705118.4A EP4099818A1 (en) 2020-02-07 2021-02-08 Methods of controlling grain size and weight
AU2021216126A AU2021216126A1 (en) 2020-02-07 2021-02-08 Methods of controlling grain size and weight
CN202180011352.5A CN115135142A (en) 2020-02-07 2021-02-08 Method for controlling grain size and grain weight

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020074530 2020-02-07
CNPCT/CN2020/074530 2020-02-07

Publications (1)

Publication Number Publication Date
WO2021156505A1 (en) 2021-08-12

Family

ID=74595262

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2021/052951 WO2021156505A1 (en) 2020-02-07 2021-02-08 Methods of controlling grain size and weight

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4873192A (en) 1987-02-17 1989-10-10 The United States Of America As Represented By The Department Of Health And Human Services Process for site specific mutagenesis without phenotypic selection
US20070016974A1 (en) * 1999-09-30 2007-01-18 Byrum Joseph R Nucleic acid molecules and other molecules associated with plants
WO2010086221A1 (en) * 2009-01-28 2010-08-05 Basf Plant Science Company Gmbh Plants having enhanced yield-related traits and a method for making the same
CN103667314A (en) * 2013-12-09 2014-03-26 中国科学院遗传与发育生物学研究所 Protein OsMKK4 originated from rice and application of biomaterial related to protein OsMKK4 in regulation and control of size of plant seed
US8697359B1 (en) 2012-12-12 2014-04-15 The Broad Institute, Inc. CRISPR-Cas systems and methods for altering expression of gene products
CN110004163A (en) * 2019-02-01 2019-07-12 中国农业科学院作物科学研究所 The method of polygenes editor raising paddy drought resistance
CN110484555A (en) * 2018-05-10 2019-11-22 中国农业科学院作物科学研究所 The construction method of transgenic paddy rice with seediness grain fasciation character

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200255846A1 (en) * 2017-03-24 2020-08-13 Institute of Genetics and Developmental Biology Chinese Acedemy of Sciences Methods for increasing grain yield
CN110692507A (en) * 2018-07-09 2020-01-17 中国科学院遗传与发育生物学研究所 Method for improving plant species
CN110526993B (en) * 2019-03-06 2020-06-16 山东舜丰生物科技有限公司 Nucleic acid construct for gene editing

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
"Techniques in Molecular Biology", 1983, MACMILLAN PUBLISHING COMPANY
FANG NA ET AL: "SMALL GRAIN 11 Controls Grain Size, Grain Number and Grain Yield in Rice", RICE, SPRINGER US, BOSTON, vol. 9, no. 1, 29 November 2016 (2016-11-29), pages 1 - 11, XP036112397, ISSN: 1939-8425, [retrieved on 20161129], DOI: 10.1186/S12284-016-0136-Z *
J. KAUR: "The Arabidopsis-mei2-Like Genes Play a Role in Meiosis and Vegetative Growth in Arabidopsis", THE PLANT CELL ONLINE, vol. 18, no. 3, 1 March 2006 (2006-03-01), pages 545 - 559, XP055195387, ISSN: 1040-4651, DOI: 10.1105/tpc.105.039156 *
KE HUANG ET AL: "WIDE AND THICK GRAIN 1 , which encodes an otubain-like protease with deubiquitination activity, influences grain size and shape in rice", THE PLANT JOURNAL, vol. 91, no. 5, 16 June 2017 (2017-06-16), GB, pages 849 - 860, XP055490401, ISSN: 0960-7412, DOI: 10.1111/tpj.13613 *
KRYSAN ET AL., THE PLANT CELL, vol. 11, December 1999 (1999-12-01), pages 2283 - 2290
KUNKEL ET AL., METHODS IN ENZYMOL., vol. 154, 1987, pages 367 - 382
KUNKEL, PROC. NATL. ACAD. SCI. USA, vol. 82, 1985, pages 488 - 492
LI NA ET AL: "Control of grain size in rice", PLANT REPRODUCTION, SPRINGER, DE, vol. 31, no. 3, 10 March 2018 (2018-03-10), pages 237 - 251, XP036573223, ISSN: 2194-7953, [retrieved on 20180310], DOI: 10.1007/S00497-018-0333-6 *
SAMBROOK ET AL.: "Molecular Cloning: A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY PRESS
SAVADI SIDDANNA ED - GEORGE TIMOTHY S ET AL: "Molecular regulation of seed development and strategies for engineering seed size in crop plants", PLANT GROWTH REGULATION, SPRINGER, DORDRECHT, NL, vol. 84, no. 3, 26 December 2017 (2017-12-26), pages 401 - 422, XP036446161, ISSN: 0167-6903, [retrieved on 20171226], DOI: 10.1007/S10725-017-0355-3 *
XU RAN ET AL: "Control of Grain Size and Weight by the OsMKKK10-OsMKK4-OsMAPK6 Signaling Pathway in Rice", MOLECULAR PLANT, vol. 11, no. 6, 1 June 2018 (2018-06-01), pages 860 - 873, XP055800733, ISSN: 1674-2052, DOI: 10.1016/j.molp.2018.04.004 *
ZHANG TIAN: "When Less Is More: GSK2-OML4 Module Negatively Regulates Grain Size in Rice", THE PLANT CELL, vol. 32, no. 6, 1 June 2020 (2020-06-01), US, pages 1781 - 1781, XP055800736, ISSN: 1040-4651, Retrieved from the Internet <URL:https://academic.oup.com/plcell/article-pdf/32/6/1781/36891975/plcell_v32_6_1781.pdf> DOI: 10.1105/tpc.20.00219 *

Also Published As

Publication number Publication date
CA3167040A1 (en) 2021-08-12
US20230081195A1 (en) 2023-03-16
EP4099818A1 (en) 2022-12-14
AU2021216126A1 (en) 2022-07-07
CN115135142A (en) 2022-09-30

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21705118; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2021216126; Country of ref document: AU; Date of ref document: 20210208; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 3167040; Country of ref document: CA
NENP Non-entry into the national phase. Ref country code: DE
ENP Entry into the national phase. Ref document number: 2021705118; Country of ref document: EP; Effective date: 20220907