WO2020131986A1 - Multiplex genome targeting - Google Patents

Multiplex genome targeting

Info

Publication number
WO2020131986A1
Authority
WO
WIPO (PCT)
Prior art keywords
cell
sequence
guide rna
endoribonuclease
dna
Application number
PCT/US2019/067032
Inventor
Huirong Gao
Joshua K. Young
Original Assignee
Pioneer Hi-Bred International, Inc.
Application filed by Pioneer Hi-Bred International, Inc.
Publication of WO2020131986A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/87 Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 Stable introduction of foreign DNA into chromosome
    • C12N15/902 Stable introduction of foreign DNA into chromosome using homologous recombination
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]

Definitions

  • The disclosure relates to the field of molecular biology, in particular to compositions and methods for modifying the genome of a cell.
  • Recombinant DNA technology has made it possible to insert DNA sequences at targeted genomic locations and/or modify specific endogenous chromosomal sequences.
  • Site-specific integration techniques, which employ site-specific recombination systems, as well as other types of recombination technologies, have been used to generate targeted insertions of genes of interest in a variety of organisms.
  • Genome-editing techniques such as designer zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), or homing meganucleases, are available for producing targeted genome perturbations, but these systems tend to have low specificity and employ designed nucleases that need to be redesigned for each target site, which renders them costly and time-consuming to prepare.
  • CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats)
  • CRISPR-Cas, as a robust double-strand break tool, has been widely used for genome editing.
  • compositions and methods for editing a plurality of target polynucleotides, for generating a plurality of guide RNA molecules in a cell, and for providing a guide RNA to a cell for genome editing.
  • the plurality of guide RNA molecules may be provided to a target polynucleotide or to a target cell as part of a contiguous polynucleotide, comprising discrete guide RNA molecules separated by an endoribonuclease recognition sequence.
  • the endoribonuclease recognition sequence is capable of being cleaved by an endoribonuclease, for example an endoribonuclease identified from a Type I-E CRISPR system, for example from Streptococcus thermophilus.
  • the endoribonuclease shares at least 50%, between 50% and 55%, at least 55%, between 55% and 60%, at least 60%, between 60% and 65%, at least 65%, between 65% and 70%, at least 70%, between 70% and 75%, at least 75%, between 75% and 80%, at least 80%, between 80% and 85%, at least 85%, between 85% and 90%, at least 90%, between 90% and 95%, at least 95%, between 95% and 96%, at least 96%, between 96% and 97%, at least 97%, between 97% and 98%, at least 98%, between 98% and 99%, at least 99%, between 99% and 100%, or 100% sequence identity with at least 50, between 50 and 100, at least 100, between 100 and 150, at least 150, between 150 and 200, at least 200, or greater than 200 contiguous amino acids of SEQ ID NO:48.
  • the endoribonuclease shares at least 50%, between 50% and 55%, at least 55%, between 55% and 60%, at least 60%, between 60% and 65%, at least 65%, between 65% and 70%, at least 70%, between 70% and 75%, at least 75%, between 75% and 80%, at least 80%, between 80% and 85%, at least 85%, between 85% and 90%, at least 90%, between 90% and 95%, at least 95%, between 95% and 96%, at least 96%, between 96% and 97%, at least 97%, between 97% and 98%, at least 98%, between 98% and 99%, at least 99%, between 99% and 100%, or 100% sequence identity with at least 250, between 250 and 300, at least 300, between 300 and 350, at least 350, between 350 and 400, at least 400, between 400 and 450, at least 450, between 450 and 500, at least 500, between 500 and 550, at least 550, between
  • a functional fragment or functional variant of SEQ ID NO:1, SEQ ID NO:39, or SEQ ID NO:48 is provided, wherein the functional fragment or functional variant is capable of, or encodes a molecule capable of, cleaving a ribonucleotide sequence.
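The identity thresholds above compare an endoribonuclease against a stretch of contiguous amino acids of a reference sequence. As a rough illustration only, the Python sketch below computes ungapped percent identity and scans for the best-matching window of a given length. The function names and the toy peptide strings are hypothetical; they are not SEQ ID NO:48, and a real comparison would normally rely on a gapped alignment tool rather than this same-length window scan.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical positions in two equal-length, ungapped sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must have the same length")
    if not aligned_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y)
    return 100.0 * matches / len(aligned_a)


def best_window_identity(candidate: str, reference: str, window: int) -> float:
    """Best ungapped identity between any `window` contiguous residues of the
    reference and any equally long stretch of the candidate (brute force)."""
    if window <= 0 or window > len(reference) or window > len(candidate):
        raise ValueError("window must fit inside both sequences")
    best = 0.0
    for i in range(len(reference) - window + 1):
        ref_win = reference[i:i + window]
        for j in range(len(candidate) - window + 1):
            best = max(best, percent_identity(ref_win, candidate[j:j + window]))
    return best


# Toy peptides (placeholders, not sequences from the disclosure).
print(percent_identity("MKLV", "MKLA"))
print(best_window_identity("AAMKLVAA", "GGMKLVGG", window=4))
```

The brute-force scan is quadratic in sequence length, which is acceptable for a single protein-sized comparison but is not meant as a replacement for a proper aligner.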
  • a poly-guide RNA molecule that comprises a plurality of discrete gRNAs, and a plurality of recognition sequences for an endoribonuclease, for example an endoribonuclease isolated or derived from Streptococcus thermophilus, for example a molecule comprising at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, greater than 99%, or 100% sequence identity to SEQ ID NO:1, SEQ ID NO:39, or SEQ ID NO:48, or any functional fragment or functional variant thereof, wherein the recognition sequence is capable of being recognized and cleaved by said endoribonuclease.
  • the poly-guide RNA molecule comprises two, three, four, five, six, seven, eight, nine, ten, or greater than ten such recognition sequences.
  • Said recognition sequences may be identical or non-identical, or a combination thereof.
  • Said discrete gRNAs may be identical or non-identical, or a combination thereof.
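The poly-guide RNA layout described above (discrete gRNAs separated by endoribonuclease recognition sequences) can be pictured as simple string assembly. The sketch below is illustrative only: the guide sequences are made-up placeholders, the recognition site is one hypothetical instance of the CCCGCNNNNGCGGG motif recited elsewhere in this disclosure (an arbitrary AAAA fills the four N positions), and `build_poly_guide` with its `flank_ends` option is an assumed helper, not something defined in the source.

```python
from typing import Sequence

# Hypothetical instance of the CCCGCNNNNGCGGG motif; the loop "AAAA" is arbitrary.
RECOGNITION_SITE = "CCCGCAAAAGCGGG"


def build_poly_guide(guides: Sequence[str],
                     site: str = RECOGNITION_SITE,
                     flank_ends: bool = False) -> str:
    """Concatenate discrete guide RNAs with a recognition site between them.

    With flank_ends=True a site is also placed at both ends, so that every
    guide is flanked by a recognition sequence on each side.
    """
    if not guides:
        raise ValueError("at least one guide RNA is required")
    body = site.join(guides)
    return f"{site}{body}{site}" if flank_ends else body


# Placeholder guides; identical or non-identical guides are both allowed.
guides = ["GGAA", "UUCC", "GGAA"]
print(build_poly_guide(guides))
print(build_poly_guide(guides, flank_ends=True))
```

Whether the ends are flanked is a design choice in this sketch: the "separated by" wording corresponds to `flank_ends=False`, while the "each flanked by" wording used for the methods corresponds to `flank_ends=True`.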
  • the poly-guide RNA molecule may be provided to a target polynucleotide or target cell on a DNA vector or directly as an RNA molecule; a DNA vector comprising a poly-guide RNA molecule may optionally further comprise a polynucleotide sequence encoding an
  • DNA vectors comprising a poly-guide RNA molecule and one or more additional compositions may comprise all compositions oriented in the same direction or in different directions, and may comprise a single expression element directing the expression of all compositions, or may comprise a plurality of expression elements directing the expression of individual or grouped compositions.
  • DNA vectors comprising components not oriented all in the same way may include a bidirectional promoter to regulate the expression of individual or grouped compositions.
  • an expression element may be provided to regulate the expression of one or more of the compositions provided herein, in either a constitutive manner or a non-constitutive manner, for example temporally- (i.e., during different time points of a cell or organism life cycle, or diurnally regulated), spatially- (i.e., different cell or tissue types), or conditionally- (i.e., inducible or regulated) controlled. Combinations of expression elements to control the expression of different compositions are contemplated.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; further comprising a Cas endonuclease.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the recognition sequence comprises the nucleotides CCCGCNNNNGCGGG.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the poly-guide RNA molecule comprises RNA.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the endoribonuclease is a protein.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the plurality of discrete RNA molecules comprise at least two non-identical guide RNA molecules that are each capable of forming a complex with a Cas endonuclease.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the endoribonuclease shares at least 85% sequence identity with SEQ ID NO:48.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the poly-guide RNA molecule is operably linked to a promoter.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the poly-guide RNA molecule is operably linked to a promoter, wherein the promoter is selected from the group consisting of: U6, Ubiquitin, bidirectional promoter.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein at least one of the discrete guide RNA molecules is capable of selective hybridization with a target polynucleotide.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide, wherein each of the discrete guide RNA molecules are non-identical.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with at least one target polynucleotide, wherein each of the discrete guide RNA molecules are identical.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a different target polynucleotide.
  • a synthetic composition comprising an endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide; wherein the target polynucleotide is in a cell.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; further comprising a Cas endonuclease.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the recognition sequence comprises the nucleotides CCCGCNNNNGCGGG.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein at least one component is a DNA molecule encoding the component.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the poly-guide RNA molecule comprises RNA.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the endoribonuclease is a protein.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the plurality of discrete RNA molecules comprise at least two non-identical guide RNA molecules that are each capable of forming a complex with a Cas endonuclease.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease;
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the poly-guide RNA molecule is operably linked to a promoter.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the poly-guide RNA molecule is operably linked to a promoter; wherein the promoter is selected from the group consisting of: U6, Ubiquitin, bidirectional promoter.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein at least one of the discrete guide RNA molecules is capable of selective hybridization with a target polynucleotide.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide, wherein each of the discrete guide RNA molecules are non-identical.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with at least one target polynucleotide, wherein each of the discrete guide RNA molecules are identical.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a different target polynucleotide.
  • a synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide; wherein the target polynucleotide is in a cell.
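The recognition motif CCCGCNNNNGCGGG recited in the compositions above can be read as a pattern in which N stands for any nucleotide. The following Python sketch, offered as an illustration under that assumption, converts the motif into a regular expression and locates matches in an RNA string. It also checks that the two fixed arms (CCCGC and GCGGG) are reverse complements of each other, which is consistent with a stem around a four-nucleotide loop, although the text here does not describe the structure. The helper names and the test sequence are hypothetical.

```python
import re
from typing import List

# Map motif symbols to RNA regex fragments; N means any nucleotide, T is read as U.
SYMBOLS = {"A": "A", "C": "C", "G": "G", "U": "U", "T": "U", "N": "[ACGU]"}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif into a lookahead regex so overlapping hits are found."""
    body = "".join(SYMBOLS[base] for base in motif.upper())
    return re.compile(f"(?={body})")


def find_sites(rna: str, motif: str = "CCCGCNNNNGCGGG") -> List[int]:
    """Return 0-based start positions of every motif occurrence in `rna`."""
    normalized = rna.upper().replace("T", "U")
    return [m.start() for m in motif_to_regex(motif).finditer(normalized)]


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string made of A, C, G, U."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


# Made-up test sequence containing one site with loop "AUGA".
print(find_sites("AACCCGCAUGAGCGGGUU"))
print(reverse_complement("CCCGC") == "GCGGG")
```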
  • a cell comprises, either transiently introduced or stably integrated, any of the synthetic compositions herein.
  • a cell comprises in its genome a polynucleotide sequence that is capable of selective hybridization with at least one of the discrete gRNA molecules of the poly-guide RNA molecule.
  • a method for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule.
  • a method for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; further comprising providing to the cell a Cas endonuclease.
  • a method is provided for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; wherein the cell is a bacterium, plant cell, or animal cell.
  • a method for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; wherein the Cas endonuclease, the endoribonuclease, and the poly-guide RNA molecule are provided on one vector to a target cell.
  • a method for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; wherein the Cas endonuclease is provided to a target cell on a different vector than that comprising the poly-guide RNA or the endoribonuclease.
  • a method for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; wherein the Cas endonuclease and/or the endoribonuclease is/are delivered to the cell as proteins, and the poly-guide RNA molecule is provided to the cell as RNA.
  • a method for generating a plurality of guide RNA molecules in a cell, the method comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell.
  • a method for generating a plurality of guide RNA molecules in a cell comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; further comprising providing to the cell a Cas endonuclease.
  • a method for generating a plurality of guide RNA molecules in a cell comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; wherein the cell is a bacterium, plant cell, or animal cell.
  • a method for generating a plurality of guide RNA molecules in a cell comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; wherein the Cas endonuclease, the endoribonuclease, and the poly-guide RNA molecule are provided on one vector to a target cell.
  • a method for generating a plurality of guide RNA molecules in a cell comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; wherein the Cas endonuclease is provided to a target cell on a different vector than that comprising the poly-guide RNA or the endoribonuclease.
  • a method for generating a plurality of guide RNA molecules in a cell comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; wherein the Cas endonuclease and/or the endoribonuclease is/are delivered to the cell as proteins, and the poly-guide RNA is provided to the cell as RNA.
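The generating method above (guide RNAs flanked by recognition sequences, then cleaved by the endoribonuclease) can be mimicked in silico by splitting a poly-guide string at every recognition site. This is a deliberately crude sketch: it removes each CCCGCNNNNGCGGG match entirely and does not model where within or next to the site the enzyme actually cuts, so any site-derived nucleotides that might remain on a processed guide are ignored. `release_guides` and the example sequences are hypothetical.

```python
import re
from typing import List

# Recognition motif with N expanded to any RNA nucleotide.
SITE = re.compile(r"CCCGC[ACGU]{4}GCGGG")


def release_guides(poly_guide: str) -> List[str]:
    """Split a poly-guide RNA at each recognition site and return the
    non-empty fragments in 5'-to-3' order."""
    rna = poly_guide.upper().replace("T", "U")
    return [fragment for fragment in SITE.split(rna) if fragment]


# Placeholder poly-guide: two made-up guides, each flanked by a site.
site = "CCCGCAAAAGCGGG"
poly = site + "GGAA" + site + "UUCC" + site
print(release_guides(poly))
```

A string without any recognition site is returned unchanged as a single fragment, which matches the intuition that no processing occurs in the absence of the site.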
  • a method for editing a target polynucleotide in a cell comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA.
  • a method for editing a target polynucleotide in a cell comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA; wherein the cell is a bacterium, plant cell, or animal cell.
  • a method for editing a target polynucleotide in a cell comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA; wherein the Cas endonuclease, the endoribonuclease, and the poly-guide RNA molecule are provided on one vector to a target cell.
  • a method for editing a target polynucleotide in a cell comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA; wherein the Cas endonuclease is provided to a target cell on a different vector than that comprising the poly-guide RNA or the endoribonuclease.
  • a method for editing a target polynucleotide in a cell comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA; wherein the Cas endonuclease and/or the endoribonuclease is/are delivered to the cell as proteins, and the poly-guide RNA molecule is provided to the cell as RNA.
  • the poly-guide RNA molecule is provided as RNA. In any aspect, the poly-guide RNA molecule is provided as a DNA molecule encoding the discrete guide RNA molecules, operably linked to a functional promoter.
  • the Cas endoribonuclease is provided as a protein, or as an RNA molecule that is translated into a protein, or as a DNA molecule that is transcribed into RNA and translated into a protein.
  • the Cas endonuclease is provided as a protein, or as an RNA molecule that is translated into a protein, or as a DNA molecule that is transcribed into RNA and translated into a protein.
  • the Cas endoribonuclease is Cas6. In one aspect, the Cas endoribonuclease is identified from a Type-I system.
  • FIG. 1 depicts the Type I-E CRISPR locus from Streptococcus thermophilus
  • the Cas endoribonuclease gene is indicated with a crosshatched arrow in the locus.
  • FIG. 2 depicts guide RNAs comprising a ~33 nt variable targeting sequence that is flanked by fixed sequences comprising a ~7 nt 5 prime sequence and a ~21 nt 3 prime sequence capable of forming a hairpin-like structure (FIG. 2A).
  • These fixed flanking sequences are the result of cleavage within the repeat sequences of the primary CRISPR array transcript by the Cas endoribonuclease protein in Type I-E CRISPR-Cas systems (FIG. 2B).
  • FIG. 3 shows the experimental design to determine the sequence for the Cas endoribonuclease binding and cleavage.
  • FIG. 3A depicts the cassette comprising the Cas endoribonuclease (RN) for poly-guide RNA cleavage and the Cas9 deoxyribonuclease (DN) for genomic target cleavage.
  • FIG. 3B shows two different arrangements of the genomic target site gRNAs within the poly-guide RNA cassette.
  • FIG. 4 depicts the Zea mays Y1 genomic locus (Zm-Y1, SEQID NO:36) for Cas9 cleavage, using guide RNAs released from the poly-guide RNA by the Cas endoribonuclease (RN) that target two target sites adjacent to Zm-Y1:
  • TS2 is Target Site 2 (SEQID NO:22), and
  • TS3 is Target Site 3 (SEQID NO:23).
  • f1 is the forward primer 1 (SEQID NO: 10)
  • f2 is the forward primer 2 (SEQID NO: 12)
  • r1 is the reverse primer 1 (SEQID NO: 11)
  • r2 is the reverse primer 2 (SEQID NO: 13).
  • FIG. 5 shows the design of the DNA cassettes used in the examples. Arrows depict a promoter sequence.
  • Table 1 Compositions of the DNA cassettes of FIG. 5
  • FIG. 6 shows mutation frequencies (%) for different Cas endoribonuclease recognition sequences, at 7 days post-bombardment.
  • FIG. 7 shows promoter comparisons for multiplex targeting with the Cas endoribonuclease, at 4 days post-bombardment.
  • FIG. 8 shows promoter comparisons for multiplex targeting with the Cas endoribonuclease, at 7 days post-Agrobacterium infection.
  • FIG. 9 is the vector schematic for targeting five sites in maize as described in
  • FIG. 10 is the vector schematic for targeting four sites in sorghum as described in
  • FIG. 11A is the experimental design for targeting two sites in canola, for the A and C genomes.
  • FIG. 11B shows the nucleotide difference for the PGAZ gene (CR1) in the canola A genome vs the C genome.
  • FIG. 12A is the vector schematic for testing two sites in canola as described in
  • FIG. 12B shows the experimental results, demonstrating successful cleavage in canola in both the A and C genomes, with the multiplexed guide RNAs.
  • FIG. 13A is the vector schematic for the Cas9 cassette used in the promoter study in maize.
  • FIG. 13B shows the vector schematics for different promoter cassettes used in the promoter study in maize (Example 4).
  • FIG. 13C shows the results of testing in maize at target site TS45.
  • FIG. 13D shows the results of testing in maize at target site Y1-CR2.
  • sequence descriptions and sequence listing attached hereto comply with the rules governing nucleotide and amino acid sequence disclosures in patent applications as set forth in 37 C.F.R. §§ 1.821 and 1.825.
  • sequence descriptions comprise the three letter codes for amino acids as defined in 37 C.F.R. §§ 1.821 and 1.825, which are incorporated herein by reference.
  • SEQID NO:1 is the Streptococcus thermophilus DGCC7710 DNA sequence of the Type I-E Cas endoribonuclease (RN).
  • SEQID NO:2 is the Streptococcus thermophilus DGCC7710 DNA sequence of the
  • SEQID NO:3 is the Streptococcus thermophilus DGCC7710 DNA sequence of the Type I-E Cas endoribonuclease Recognition Sequence 1.
  • SEQID NO:4 is the Streptococcus thermophilus DGCC7710 DNA sequence of the
  • SEQID NO:5 is the Streptococcus thermophilus DGCC7710 DNA sequence of the
  • SEQID NO:6 is the Streptococcus thermophilus DGCC7710 DNA sequence of the
  • SEQID NO:7 is the artificial DNA sequence of optimized porcine teschovirus-1 2A self-cleaving peptide (p2A).
  • SEQID NO:8 is the artificial DNA sequence of Y1-CR2 guide.
  • SEQID NO:9 is the artificial DNA sequence of Y1-CR3 guide.
  • SEQID NO:10 is the artificial DNA sequence of Y1-CR2-f1.
  • SEQID NO:11 is the artificial DNA sequence of Y1-CR2-r1.
  • SEQID NO:12 is the artificial DNA sequence of Y1-CR3-f2.
  • SEQID NO:13 is the artificial DNA sequence of Y1-CR3-r2.
  • SEQID NO:14 is the Simian Virus 40 DNA sequence of SV40 NLS.
  • SEQID NO:15 is the artificial DNA sequence of VIRD2 NLS.
  • SEQID NO:16 is the artificial DNA sequence of UBTCAS9.
  • SEQID NO: 17 is the Streptococcus pyogenes DNA sequence of Cas9.
  • SEQID NO: 18 is the artificial DNA sequence of UBTRN-CAS9 with all NLS.
  • SEQID NO:19 is the Zea mays DNA sequence of UBTRN-CAS9 no NLS for RN.
  • SEQID NO:20 is the Zea mays DNA sequence of ZM-U6 promoter.
  • SEQID NO:21 is the Zea mays DNA sequence of GUIDE RNA (77bp).
  • SEQID NO:22 is the Zea mays DNA sequence of Y1-TS2.
  • SEQID NO:23 is the Zea mays DNA sequence of Y1-TS3.
  • SEQID NO:24 is the artificial DNA sequence of ZM-UBI TERM-V1.
  • SEQID NO:25 is the artificial DNA sequence of BSV (AY) TR PRO.
  • SEQID NO:26 is the artificial DNA sequence of ZM-HPLV9 INTRON1.
  • SEQID NO:27 is the artificial DNA sequence of CAMV35S TERM.
  • SEQID NO:28 is the Setaria italica DNA sequence of SI-UBI1 PRO.
  • SEQID NO:29 is the Setaria italica DNA sequence of SI-UBI1 INTRON1.
  • SEQID NO:30 is the Setaria italica DNA sequence of SI-UBI TERM (MODI).
  • SEQID NO:31 is the Zea mays DNA sequence of ZM-UBI bidirectional promoter.
  • SEQID NO:32 is the Zea mays DNA sequence of ZM-ODP2.
  • SEQID NO:33 is the Zea mays DNA sequence of ZM-WUS2.
  • SEQID NO:34 is the Zea mays DNA sequence of IN2-2 PRO.
  • SEQID NO:35 is the Zea mays DNA sequence of In2-1 TERM.
  • SEQID NO:36 is the artificial DNA sequence of ZS-YELLOW1 N1.
  • SEQID NO:37 is the Streptococcus thermophilus DNA sequence of the Type I-E Type
  • SEQID NO:38 is the S. thermophilus DNA sequence of the Type I-E casD gene.
  • SEQID NO:39 is the S. thermophilus DNA sequence of the Type I-E cas Endoribonuclease (RN) gene.
  • SEQID NO:40 is the S. thermophilus DNA sequence of the Type I-E casC gene.
  • SEQID NO:41 is the S. thermophilus DNA sequence of the Type I-E casA gene.
  • SEQID NO:42 is the S. thermophilus DNA sequence of the Type I-E casB gene.
  • SEQID NO:43 is the S. thermophilus DNA sequence of the Type I-E cas3 gene.
  • SEQID NO:44 is the S. thermophilus DNA sequence of the Type I-E cas1 gene.
  • SEQID NO:45 is the S. thermophilus DNA sequence of the Type I-E cas2 gene.
  • SEQID NO:46 is the Zea mays DNA sequence of the Ubiquitin promoter.
  • SEQID NO:47 is the DNA sequence of the PinII terminator.
  • SEQID NO:48 is the Protein sequence of the Streptococcus thermophilus Type I-E
  • SEQID NO:49 is the forward primer bridge sequence DNA sequence.
  • SEQID NO:50 is the reverse primer bridge sequence DNA sequence.
  • SEQID NO:51 is the secondary PCR universal forward primer DNA sequence.
  • SEQID NO:52 is the secondary PCR universal reverse primer DNA sequence.
  • SEQID NO:53 is the Zea mays SH2-CR4 guide DNA sequence.
  • SEQID NO:54 is the Zea mays SH2-CR5 guide DNA sequence.
  • SEQID NO:55 is the Zea mays SU1-CR1 guide DNA sequence.
  • SEQID NO:56 is the Zea mays SU1-CR4 guide DNA sequence.
  • SEQID NO:57 is the Sorghum bicolor OSDL1-CR3 guide DNA sequence.
  • SEQID NO:58 is the Sorghum bicolor OSDL3-CR1 guide DNA sequence.
  • SEQID NO:59 is the Sorghum bicolor REC8-CR4 guide DNA sequence.
  • SEQID NO:60 is the Sorghum bicolor SPO11-CR1 guide DNA sequence.
  • SEQID NO:61 is the Brassica napus PGAZ-CR1-C guide DNA sequence.
  • SEQID NO:62 is the Brassica napus PGAZ-CR1-A guide DNA sequence.
  • SEQID NO:63 is the Brassica napus PGAZ-CR2 guide DNA sequence.
  • SEQID NO:64 is the Arabidopsis thaliana AT-UBI PRO bidirectional DNA sequence.
  • SEQID NO:65 is the Arabidopsis thaliana AT-UBI10 5UTR-INTRON 1 DNA seq.
  • SEQID NO:66 is the Arabidopsis thaliana AT-NLS (CO) DNA sequence.
  • SEQID NO:67 is the Zea mays SH2-CR4 target site DNA sequence.
  • SEQID NO:68 is the Zea mays SH2-CR5 target site DNA sequence.
  • SEQID NO:69 is the Zea mays SU1-CR1 target site DNA sequence.
  • SEQID NO:70 is the Zea mays SU1-CR4 target site DNA sequence.
  • SEQID NO:71 is the Sorghum bicolor OSDL1-CR3 target site DNA sequence.
  • SEQID NO:72 is the Sorghum bicolor OSDL3-CR1 target site DNA sequence.
  • SEQID NO:73 is the Sorghum bicolor REC8-CR4 target site DNA sequence.
  • SEQID NO:74 is the Sorghum bicolor SPO11-CR1 target site DNA sequence.
  • SEQID NO:75 is the Brassica napus PGAZ-CR1-C target site DNA sequence.
  • SEQID NO:76 is the Brassica napus PGAZ-CR1-A target site DNA sequence.
  • SEQID NO:77 is the Brassica napus PGAZ-CR2 target site DNA sequence.
  • SEQID NO:78 is the artificial SH2_CR4 forward primer DNA sequence.
  • SEQID NO:79 is the artificial SH2_CR4 reverse primer DNA sequence.
  • SEQID NO:80 is the artificial SH2_CR5 forward primer DNA sequence.
  • SEQID NO:81 is the artificial SH2_CR5 reverse primer DNA sequence.
  • SEQID NO:82 is the artificial Su1-CR1 forward primer DNA sequence.
  • SEQID NO:83 is the artificial Su1-CR1 reverse primer DNA sequence.
  • SEQID NO:84 is the artificial Su1-CR4 forward primer DNA sequence.
  • SEQID NO:85 is the artificial Su1-CR4 reverse primer DNA sequence.
  • SEQID NO:86 is the artificial PGAZ-CR1-A forward primer DNA sequence.
  • SEQID NO:87 is the artificial PGAZ-CR1-A reverse primer DNA sequence.
  • SEQID NO:88 is the artificial PGAZ-CR1-C forward primer DNA sequence.
  • SEQID NO:89 is the artificial PGAZ-CR1-C reverse primer DNA sequence.
  • SEQID NO:90 is the artificial PGAZ-CR2 forward primer 1 DNA sequence.
  • SEQID NO:91 is the artificial PGAZ-CR2 reverse primer 1 DNA sequence.
  • SEQID NO:92 is the artificial OSDL1-CR3 forward primer DNA sequence.
  • SEQID NO:93 is the artificial OSDL1-CR3 reverse primer DNA sequence.
  • SEQID NO:94 is the artificial OSDL3-CR1 forward primer DNA sequence.
  • SEQID NO:95 is the artificial OSDL3-CR1 reverse primer DNA sequence.
  • SEQID NO:96 is the artificial REC8-CR4 forward primer DNA sequence.
  • SEQID NO:97 is the artificial REC8-CR4 reverse primer DNA sequence.
  • SEQID NO:98 is the artificial SPO11-CR1 forward primer DNA sequence.
  • SEQID NO:99 is the artificial SPO11-CR1 reverse primer DNA sequence.
  • Targeting multiple loci for editing (“multiplex” targeting) may be achieved by providing a plurality of guide RNAs to target polynucleotide(s).
  • the plurality of guide RNAs are provided as DNA sequences on a single construct, separated by a sequence comprising a target site for an endoribonuclease (RN), such as but not limited to Cas6.
  • the plurality of guide RNAs share complementarity to different target sites.
  • the target polynucleotide(s) is(are) present in the genome of a cell.
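The processing of a single poly-guide transcript into discrete guide RNAs can be pictured with a toy string model. In this sketch the recognition sequence `RECOGNITION`, the two guide sequences, and the helper names `build_poly_guide` and `release_guides` are hypothetical placeholders, not sequences or methods from the sequence listing. Cleavage is simplified to cutting the transcript at every recognition sequence and discarding it, whereas the Cas endoribonuclease described here cuts within the repeat and leaves fixed flanking sequences on each guide.

```python
# Toy model of poly-guide RNA processing (illustrative assumptions only).
# RECOGNITION and the guide sequences are made-up placeholders.

RECOGNITION = "GGAUCCGAAUUC"  # hypothetical endoribonuclease recognition sequence


def build_poly_guide(guides, recognition=RECOGNITION):
    """Concatenate guides so that each one is flanked by the recognition sequence."""
    return recognition + recognition.join(guides) + recognition


def release_guides(transcript, recognition=RECOGNITION):
    """Model cleavage by splitting the transcript at every recognition sequence."""
    return [piece for piece in transcript.split(recognition) if piece]


guides = ["ACGUACGUACGUACGUACGU", "UUGGCCAAUUGGCCAAUUGG"]
poly_guide = build_poly_guide(guides)
released = release_guides(poly_guide)
print(released)
```

Each released string would then pair with the Cas endonuclease to address its own target site, which is the sense in which one transcript supports multiplex targeting.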
  • nucleic acid means a polynucleotide and includes a single or a double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. Nucleic acids may also include fragments and modified nucleotides. Thus, the terms “polynucleotide”, “nucleic acid sequence”, “nucleotide sequence” and “nucleic acid fragment” are used interchangeably to denote a polymer of RNA and/or DNA and/or RNA-DNA that is single- or double-stranded, optionally comprising synthetic, non-natural, or altered nucleotide bases.
  • Nucleotides are referred to by their single letter designation as follows: “A” for adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” for cytosine or deoxycytosine, “G” for guanosine or deoxyguanosine, “U” for uridine, “T” for thymine or deoxythymidine, “R” for purines (A or G), “Y” for pyrimidines (C or T), “K” for G or T, “H” for A or C or T, “I” for inosine, and “N” for any nucleotide.
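The ambiguity codes in this list can be read as sets of concrete bases. The following minimal sketch expands a DNA sequence written with those codes into every sequence it matches; the table name `AMBIGUITY` and the function `expand` are illustrative names, and only the codes named in the text are included ("I" for inosine is omitted because it denotes a distinct base rather than a set of alternatives).

```python
# Expand the single-letter ambiguity codes listed in the text (DNA alphabet).
from itertools import product

AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purines
    "Y": "CT",    # pyrimidines
    "K": "GT",
    "H": "ACT",
    "N": "ACGT",  # any nucleotide
}


def expand(sequence):
    """Return every concrete DNA sequence matched by an ambiguous sequence."""
    choices = [AMBIGUITY[base] for base in sequence.upper()]
    return ["".join(combo) for combo in product(*choices)]


print(expand("ARY"))  # ['AAC', 'AAT', 'AGC', 'AGT']
```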
  • the term “genome” as it applies to a prokaryotic or eukaryotic cell or organism encompasses not only chromosomal DNA found within the nucleus, but also organelle DNA found within subcellular components (e.g., mitochondria or plastids) of the cell.
  • ORF Open reading frame
  • sequences include reference to hybridization, under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences and to the substantial exclusion of non-target nucleic acids.
  • Selectively hybridizing sequences typically have at least about 80% sequence identity, or 90% sequence identity, up to and including 100% sequence identity (i.e., fully complementary) with each other.
  • stringent conditions or “stringent hybridization conditions” includes reference to conditions under which a probe will selectively hybridize to its target sequence in an in vitro hybridization assay. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, optionally less than 500 nt in length.
  • stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salt(s)) at pH 7.0 to 8.3, and at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides).
  • Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
  • Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.5X to 1X SSC at 55 to 60°C.
  • Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.1X SSC at 60°C to 65°C.
  • a“region of homology to a genomic region” that is found on the donor DNA is a region of DNA that has a similar sequence to a given“genomic region” in the cell or organism genome.
  • a region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target site.
  • the region of homology can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases in length such that the region of homology has sufficient homology to undergo homologous recombination with the
  • “Sufficient homology” indicates that two polynucleotide sequences have structural similarity such that they are capable of acting as substrates for a homologous recombination reaction.
  • the structural similarity includes overall length of each polynucleotide fragment, as well as the sequence similarity of the polynucleotides.
  • Sequence similarity can be described by the percent sequence identity over the whole length of the sequences, and/or by conserved regions comprising localized similarities such as contiguous nucleotides having 100% sequence identity, and percent sequence identity over a portion of the length of the sequences.
  • a“genomic region” is a segment of a chromosome in the genome of a cell that is present on either side of the target site or, alternatively, also comprises a portion of the target site.
  • the genomic region can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800.
  • “homologous recombination” includes the exchange of
  • the frequency of homologous recombination is influenced by a number of factors. Different organisms vary with respect to the amount of homologous recombination and the relative proportion of homologous to non-homologous recombination. Generally, the length of the region of homology affects the frequency of homologous recombination events: the longer the region of homology, the greater the frequency. The length of the homology region needed to observe homologous recombination is also species-variable. In many cases, at least 5 kb of homology has been utilized, but homologous recombination has been observed with as little as 25-50 bp of homology. See, for example, Singer et al.
  • sequence identity or “identity” in the context of nucleic acid or polypeptide sequences refers to the nucleic acid bases or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window.
  • the term “percentage of sequence identity” refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences.
  • the percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the results by 100 to yield the percentage of sequence identity.
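The calculation just described can be sketched in a few lines. This is a minimal illustration that assumes the two sequences have already been aligned to equal length with "-" marking gaps; the function name `percent_identity` and the example sequences are hypothetical, and gap positions are counted in the window but never as matches.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over a comparison window of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Count positions where the same base or residue occurs in both sequences.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    # Divide by the total number of positions in the window, then scale to percent.
    return 100.0 * matches / len(aligned_a)


print(percent_identity("ACGT-ACGTA", "ACGTTACGAA"))  # 80.0
```

In the example, 8 of the 10 window positions match, giving 80%.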
  • percent sequence identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%, or any percentage from 50% to 100%. These identities can be determined using any of the programs described herein.
  • Sequence alignments and percent identity or similarity calculations may be determined using a variety of comparison methods designed to detect homologous sequences including, but not limited to, the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI).
  • sequence analysis software is used for analysis, that the results of the analysis will be based on the “default values” of the program referenced, unless otherwise specified.
  • “default values” will mean any set of values or parameters that originally load with the software when first initialized.
  • The “Clustal V method of alignment” corresponds to the alignment method labeled Clustal V (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ program of the
  • PENALTY 10.
  • DIAGONALS SAVED 5.
  • The “Clustal W method of alignment” corresponds to the alignment method labeled Clustal W (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ v6.1 program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI).
  • identity/similarity values refer to the value obtained using GAP Version 10 (GCG, Accelrys, San Diego, CA) using the following parameters: % identity and % similarity for a nucleotide sequence using a gap creation penalty weight of 50 and a gap length extension penalty weight of 3, and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using a GAP creation penalty weight of 8 and a gap length extension penalty of 2, and the BLOSUM62 scoring matrix (Henikoff and Henikoff, (1992) Proc. Natl. Acad. Sci. USA 89:10915).
  • GAP uses the algorithm of Needleman and Wunsch, (1970) J Mol Biol 48:443-53, to find an alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps, using a gap creation penalty and a gap extension penalty in units of matched bases.
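A global alignment of this kind can be illustrated with a compact dynamic-programming sketch. This is not the GAP program: it is a simplified Needleman-Wunsch implementation that assumes a single linear gap penalty instead of separate creation and extension penalties, and the scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name `needleman_wunsch` are illustrative choices only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Fill the table with the best score for every pair of prefixes.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "ACT"))  # (1, 'ACGT', 'AC-T')
```

Replacing the linear penalty with separate gap creation and gap extension terms, as GAP does, requires tracking additional tables for open gaps and is left out here to keep the sketch short.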
  • BLAST is a searching algorithm provided by the National Center for Biotechnology Information (NCBI).
  • polypeptides from other species or modified naturally or synthetically wherein such polypeptides have the same or similar function or activity.
  • Useful examples of percent identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any percentage from 50% to 100%.
  • any amino acid identity from 50% to 100% may be useful in describing the present disclosure, such as 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.
  • Polynucleotide and polypeptide sequences, variants thereof, and the structural relationships of these sequences can be described by the terms “homology”, “homologous”, “substantially identical”, “substantially similar” and “corresponding substantially” which are used interchangeably herein. These refer to polypeptide or nucleic acid sequences wherein changes in one or more amino acids or nucleotide bases do not affect the function of the molecule, such as the ability to mediate gene expression or to produce a certain phenotype.
  • nucleic acid sequences that do not substantially alter the functional properties of the resulting nucleic acid relative to the initial, unmodified nucleic acid. These modifications include deletion, substitution, and/or insertion of one or more nucleotides in the nucleic acid fragment. Substantially similar nucleic acid sequences encompassed may be defined by their ability to hybridize (under moderately stringent conditions, e.g., 0.5X SSC, 0.1% SDS, 60°C) with the sequences exemplified herein, or to any portion of the nucleotide sequences disclosed herein and which are functionally equivalent to any of the nucleic acid sequences disclosed herein.
  • Stringency conditions can be adjusted to screen for moderately similar fragments, such as homologous sequences from distantly related organisms, to highly similar fragments, such as genes that duplicate functional enzymes from closely related organisms. Post-hybridization washes determine stringency conditions.
  • a centimorgan is the distance between two polynucleotide sequences, linked genes, markers, target sites, loci, or any pair thereof, wherein 1% of the products of meiosis are recombinant.
  • a centimorgan is equivalent to a distance equal to a 1% average recombination frequency between the two linked genes, markers, target sites, loci, or any pair thereof.
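The definition above amounts to reading a recombination frequency, expressed in percent, directly as a distance in centimorgans. A minimal sketch follows, with a hypothetical function name and made-up counts; the direct reading is a reasonable approximation for closely linked loci, while larger distances are normally corrected with a mapping function that is not shown here.

```python
def map_distance_cm(recombinant, total):
    """Recombination frequency in percent, read as centimorgans."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= recombinant <= total:
        raise ValueError("recombinant count must lie between 0 and total")
    return 100.0 * recombinant / total


# Hypothetical cross: 12 recombinant products among 400 scored.
print(map_distance_cm(12, 400))  # 3.0
```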
  • nucleic acid molecule substantially or essentially free from
  • an isolated or purified polynucleotide or polypeptide or protein is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized.
  • an "isolated" polynucleotide is free of sequences (optimally protein encoding sequences) that naturally flank the polynucleotide (i.e., sequences located at the 5' and 3' ends of the polynucleotide) in the genomic DNA of the organism from which the polynucleotide is derived.
  • the isolated polynucleotide can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleotide sequence that naturally flanks the polynucleotide in genomic DNA of the cell from which the polynucleotide is derived.
  • Isolated polynucleotides may be purified from a cell in which they naturally occur. Conventional nucleic acid purification methods known to skilled artisans may be used to obtain isolated polynucleotides.
  • the term also embraces recombinant polynucleotides and chemically synthesized polynucleotides.
  • fragment refers to a contiguous set of nucleotides or amino acids. In one embodiment, a fragment is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous nucleotides. In one embodiment, a fragment is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous amino acids. A fragment may or may not exhibit the function of a sequence sharing some percent identity over the length of said fragment.
  • fragment that is functionally equivalent and “functionally equivalent fragment” are used interchangeably herein. These terms refer to a portion or subsequence of an isolated nucleic acid fragment or polypeptide that displays the same activity or function as the longer sequence from which it derives.
  • the fragment retains the ability to alter gene expression or produce a certain phenotype whether or not the fragment encodes an active protein.
  • the fragment can be used in the design of genes to produce the desired phenotype in a modified plant. Genes can be designed for use in suppression by linking a nucleic acid fragment, whether or not it encodes an active enzyme, in the sense or antisense orientation relative to a plant promoter sequence.
  • Gene includes a nucleic acid fragment that expresses a functional molecule such as, but not limited to, a specific protein, including regulatory sequences preceding (5’ non-coding sequences) and following (3’ non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in its natural endogenous location with its own regulatory sequences.
  • endogenous it is meant a sequence or other molecule that naturally occurs in a cell or organism.
  • an endogenous polynucleotide is normally found in the genome of a cell; that is, not heterologous.
  • An “allele” is one of several alternative forms of a gene occupying a given locus on a chromosome. When all the alleles present at a given locus on a chromosome are the same, that plant is homozygous at that locus. If the alleles present at a given locus on a chromosome differ, that plant is heterozygous at that locus.
  • Coding sequence refers to a polynucleotide sequence which codes for a specific amino acid sequence.
  • Regulatory sequences refer to nucleotide sequences located upstream (5’ non-coding sequences), within, or downstream (3’ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include, but are not limited to, promoters, translation leader sequences, 5’ untranslated sequences, 3’ untranslated sequences, introns, polyadenylation target sequences, RNA processing sites, effector binding sites, and stem-loop structures.
  • A “mutated gene” is a gene that has been altered through human intervention.
  • Such a “mutated gene” has a sequence that differs from the sequence of the corresponding non-mutated gene by at least one nucleotide addition, deletion, or substitution.
  • the mutated gene comprises an alteration that results from a guide polynucleotide/Cas endonuclease system as disclosed herein.
  • a mutated plant is a plant comprising a mutated gene.
  • a “targeted mutation” is a mutation in a gene (referred to as the target gene), including a native gene, that was made by altering a target sequence within the target gene using any method known to one skilled in the art, including a method involving a guided Cas endonuclease system as disclosed herein.
  • knock-out represents a DNA sequence of a cell that has been rendered partially or completely inoperative by targeting with a Cas protein; for example, a DNA sequence prior to knock-out could have encoded an amino acid sequence, or could have had a regulatory function (e.g., promoter).
  • knock-in represents the replacement or insertion of a DNA sequence at a specific DNA sequence in a cell by targeting with a Cas protein (for example by homologous recombination (HR), wherein a suitable donor DNA polynucleotide is also used).
  • knock-ins are a specific insertion of a heterologous amino acid coding sequence in a coding region of a gene, or a specific insertion of a transcriptional regulatory element in a genetic locus.
  • domain it is meant a contiguous stretch of nucleotides (that can be RNA,
  • the term “conserved domain” or “motif” means a set of polynucleotides or amino acids conserved at specific positions along an aligned sequence of evolutionarily related proteins. While amino acids at other positions can vary between homologous proteins, amino acids that are highly conserved at specific positions indicate amino acids that are essential to the structure, the stability, or the activity of a protein. Because they are identified by their high degree of conservation in aligned sequences of a family of protein homologues, they can be used as identifiers, or “signatures”, to determine if a protein with a newly determined sequence belongs to a previously identified protein family.
  • A “codon-modified gene” or “codon-preferred gene” or “codon-optimized gene” is a gene having its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell.
  • An “optimized” polynucleotide is a sequence that has been optimized for improved expression in a particular heterologous host cell.
  • An “optimized nucleotide sequence” is a nucleotide sequence that has been optimized for expression in a particular organism.
  • a plant-optimized nucleotide sequence includes a codon-optimized gene.
  • a plant-optimized nucleotide sequence can be synthesized by modifying a nucleotide sequence encoding a protein such as, for example, a Cas endonuclease as disclosed herein, using one or more plant-preferred codons for improved expression. See, for example, Campbell and Gowri (1990) Plant Physiol. 92:1-11 for a discussion of host-preferred codon usage.
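As a rough illustration of the codon-optimization idea described above, the following minimal sketch back-translates a protein sequence using one preferred codon per amino acid. The table `PREFERRED_CODON` and the function `back_translate` are hypothetical names introduced here; the table is a small illustrative subset and is not measured plant codon-usage data.

```python
# Hypothetical preferred-codon table: illustrative only, NOT real plant codon usage.
# Each entry is a valid standard-genetic-code codon for the given residue.
PREFERRED_CODON = {
    "M": "ATG", "A": "GCC", "K": "AAG", "L": "CTG", "S": "AGC", "*": "TGA",
}

def back_translate(protein: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Return a DNA sequence encoding `protein`, using one preferred codon per residue."""
    try:
        return "".join(table[aa] for aa in protein.upper())
    except KeyError as exc:
        raise ValueError(f"no preferred codon listed for residue {exc.args[0]!r}") from None

print(back_translate("MAKLS*"))  # ATGGCCAAGCTGAGCTGA
```

A practical design would likely draw codons from a full host usage table and also screen for unwanted motifs; this sketch only shows the substitution step.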
  • A “promoter” is a region of DNA involved in recognition and binding of RNA polymerase and other proteins to initiate transcription.
  • the promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers.
  • An “enhancer” is a DNA sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, and/or comprise synthetic DNA segments.
  • promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity.
  • Promoters that cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”.
  • the term “inducible promoter” refers to a promoter that selectively expresses a coding sequence or functional RNA in response to the presence of an endogenous or exogenous stimulus, for example by chemical compounds (chemical inducers) or in response to environmental, hormonal, chemical, and/or developmental signals.
  • Inducible or regulated promoters include, for example, promoters induced or regulated by light, heat, stress, flooding or drought, salt stress, osmotic stress, phytohormones, wounding, or chemicals such as ethanol, abscisic acid (ABA), jasmonate, salicylic acid, or safeners.
  • Translation leader sequence refers to a polynucleotide sequence located between the promoter sequence of a gene and the coding sequence.
  • the translation leader sequence is present in the mRNA upstream of the translation start sequence.
  • the translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. Examples of translation leader sequences have been described (e.g.,
  • “3’ non-coding sequences”, “transcription terminator” or “termination sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression.
  • the polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3’ end of the mRNA precursor.
  • the use of 3’ noncoding sequences is exemplified by Ingelbrecht et al. 1989 Plant Cell 1:671-680.
  • RNA transcript refers to the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or pre-mRNA. An RNA transcript is referred to as the mature RNA or mRNA when it is an RNA sequence derived from post-transcriptional processing of the primary transcript pre-mRNA. “Messenger RNA” or “mRNA” refers to the RNA that is without introns and that can be translated into protein by the cell.
  • cDNA refers to a DNA that is complementary to, and synthesized from, an mRNA template using the enzyme reverse transcriptase.
  • the cDNA can be single-stranded or converted into double-stranded form using the Klenow fragment of DNA polymerase I.
  • Sense RNA refers to RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro.
  • Antisense RNA refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA, and that blocks the expression of a target gene (see, e.g.,
  • the complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5’ non-coding sequence, 3’ non-coding sequence, introns, or the coding sequence.
  • “Functional RNA” refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes.
  • the terms “complement” and “reverse complement” are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.
  • “poly-guide RNA” refers to a contiguous polynucleotide molecule that comprises a plurality of “discrete” guide RNA components, which are individual guide RNA molecules that are separated from each other, for example by one or more nuclease recognition sequences.
  • the poly-guide RNA is encoded by a poly-DNA molecule comprising discrete DNA sequences that each encode a guide RNA that may form a functional complex with a Cas endonuclease for the targeting, recognition, binding, and optionally nicking or cleaving of one or more target polynucleotide sequence(s).
  • the poly-guide RNA is an RNA molecule comprising discrete guide RNA sequences, that each may form a functional complex with a Cas endonuclease for the targeting, recognition, binding, and optionally nicking or cleaving of one or more target polynucleotide sequence(s).
  • Each of the guide-RNA-encoding DNA sequences and/or guide RNA sequences within the poly-DNA or poly-guide RNA molecule may be identical, share some percentage of sequence identity with each other, or be non-identical.
  • one or more of the nuclease recognition sequence(s) may be the target of a Cas endoribonuclease.
  • the nuclease recognition sequence enables a “functional interaction” between the poly-guide RNA and an endoribonuclease (for example a Cas endoribonuclease), that is, the endoribonuclease can recognize, bind to, and cleave the poly-guide RNA at the nuclease recognition sequence.
  • one component of the poly-guide RNA is heterologous to another component.
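The poly-guide RNA concept above (discrete guide RNAs separated by nuclease recognition sequences that an endoribonuclease cleaves) can be sketched as a simple string operation. Everything named here is an assumption for illustration: the function `split_poly_guide`, the placeholder recognition sequence, and the placeholder guide sequences are all made up, and the model simply drops the recognition sequence, whereas a real endoribonuclease cleaves at a defined position and may leave part of the site on the product.

```python
def split_poly_guide(poly_guide: str, recognition_seq: str) -> list[str]:
    """Split a poly-guide RNA string into discrete guide RNAs.

    Simplified model (an assumption): every occurrence of `recognition_seq`
    is removed entirely and the flanking pieces are returned in 5'-to-3' order.
    """
    if not recognition_seq:
        raise ValueError("recognition sequence must be non-empty")
    pieces = poly_guide.upper().split(recognition_seq.upper())
    return [p for p in pieces if p]  # drop empty pieces from leading/trailing sites

# Placeholder sequences, invented for this example only.
SITE = "AAAUUUCCCGGG"
GUIDE_1 = "GACUGACUGACU"
GUIDE_2 = "CCAUGGCCAUGG"

poly = GUIDE_1 + SITE + GUIDE_2 + SITE
print(split_poly_guide(poly, SITE))  # ['GACUGACUGACU', 'CCAUGGCCAUGG']
```

Each returned piece corresponds to one discrete guide RNA that could, in principle, go on to form its own complex with a Cas endonuclease.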
  • genomic refers to the entire complement of genetic material (genes and non-coding sequences) that is present in each cell of an organism, or virus or organelle; and/or a complete set of chromosomes inherited as a (haploid) unit from one parent.
  • operably linked refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is regulated by the other.
  • a promoter is operably linked with a coding sequence when it is capable of regulating the expression of that coding sequence (i.e., the coding sequence is under the transcriptional control of the promoter).
  • Coding sequences can be operably linked to regulatory sequences in a sense or antisense orientation.
  • the complementary RNA regions can be operably linked, directly or indirectly, 5’ to the target mRNA, or 3’ to the target mRNA, or within the target mRNA, or a first complementary region is 5’ and its complement is 3’ to the target mRNA.
  • “host” refers to an organism or cell into which a heterologous component (polynucleotide, polypeptide, other molecule, cell) has been introduced.
  • a “host cell” refers to an in vivo or in vitro eukaryotic cell, prokaryotic cell (e.g., bacterial or archaeal cell), or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, into which a heterologous polynucleotide or polypeptide has been introduced.
  • the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, an insect cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell.
  • the cell is in vitro. In some cases, the cell is in vivo.
  • recombinant refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis, or manipulation of isolated segments of nucleic acids by genetic engineering techniques.
  • the terms “plasmid”, “vector” and “cassette” refer to a linear or circular extrachromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of double-stranded DNA. Such elements may be autonomously replicating sequences, genome integrating sequences, phage, or nucleotide sequences, in linear or circular form, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a polynucleotide of interest into a cell.
  • “Transformation cassette” refers to a specific vector comprising a gene and having elements in addition to the gene that facilitate transformation of a particular host cell.
  • “Expression cassette” refers to a specific vector comprising a gene and having elements in addition to the gene that allow for expression of that gene in a host.
  • a“Donor DNA cassette” refer
  • the Donor DNA cassette further comprises polynucleotide sequences that are homologous to the target site, that flank the polynucleotide of interest operably linked to a noncoding expression regulatory element.
  • a recombinant DNA construct comprises an artificial combination of nucleic acid sequences, e.g., regulatory and coding sequences that are not all found together in nature.
  • a recombinant DNA construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature.
  • Such a construct may be used by itself or may be used in conjunction with a vector. If a vector is used, then the choice of vector is dependent upon the method that will be used to introduce the vector into the host cells as is well known to those skilled in the art.
  • a plasmid vector can be used.
  • the skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells.
  • the skilled artisan will also recognize that different independent transformation events may result in different levels and patterns of expression (Jones et al., (1985) EMBO J 4:2411-2418; De Almeida et al., (1989) Mol Gen Genetics 218:78-86), and thus that multiple events are typically screened in order to obtain lines displaying the desired expression level and pattern.
  • Such screening may be accomplished by standard molecular biological, biochemical, and other assays including Southern analysis of DNA, Northern analysis of mRNA expression, PCR, real time quantitative PCR (qPCR), reverse transcription PCR (RT-PCR), immunoblotting analysis of protein expression, enzyme or activity assays, and/or phenotypic analysis.
  • Non-limiting examples include differences in taxonomic derivation (e.g., a polynucleotide sequence obtained from Zea mays would be heterologous if inserted into the genome of an Oryza sativa plant, or of a different variety or cultivar of Zea mays; or a polynucleotide obtained from a bacterium and introduced into a cell of a plant), or sequence (e.g., a polynucleotide sequence obtained from Zea mays, isolated, modified, and re-introduced into a maize plant).
  • heterologous in reference to a sequence can refer to a sequence that originates from a different species, variety, foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
  • a promoter operably linked to a heterologous polynucleotide is from a species different from the species from which the polynucleotide was derived, or, if from the same/analogous species, one or both are substantially modified from their original form and/or genomic locus, or the promoter is not the native promoter for the operably linked polynucleotide.
  • one or more regulatory region(s) and/or a polynucleotide provided herein may be entirely synthetic.
  • a discrete component of a poly-guide RNA molecule is heterologous to at least one other component, i.e., they do not occur together in nature.
  • the endoribonuclease and/or one or more of the discrete guide RNAs of the poly-guide RNA molecule are heterologous to each other. Any one or more of the components of a system may be heterologous with respect to one another, meaning they do not originate from the same organism.
  • the term “expression”, as used herein, refers to the production of a functional end-product (e.g., an mRNA, guide RNA, or a protein) in either precursor or mature form.
  • A “mature” protein refers to a post-translationally processed polypeptide (i.e., one from which any pre- or propeptides present in the primary translation product have been removed).
  • Precursor protein refers to the primary product of translation of mRNA (i.e., with pre- and propeptides still present). Pre- and propeptides may be but are not limited to intracellular localization signals.
  • CRISPR Clustered Regularly Interspaced Short Palindromic Repeats
  • a CRISPR locus can consist of a CRISPR array comprising short direct repeats (CRISPR repeats) separated by short variable DNA sequences (called spacers), which can be flanked by diverse Cas (CRISPR-associated) genes.
  • an “effector” or “effector protein” is a protein that encompasses an activity including recognizing, binding to, and/or cleaving or nicking a polynucleotide target.
  • An effector, or effector protein, may also be an endonuclease.
  • The “effector complex” of a CRISPR system includes Cas proteins involved in crRNA and target recognition and binding. Some of the component Cas proteins may additionally comprise domains involved in target polynucleotide cleavage.
  • Cas protein refers to a polypeptide encoded by a Cas (CRISPR-associated) gene.
  • a Cas protein includes but is not limited to: a Cas9 protein, a Cpf1 (Cas12) protein, a C2c1 protein, a C2c2 protein, a C2c3 protein, Cas3, Cas3-HD, Cas5, Cas7, Cas8, Cas10, or combinations or complexes of these.
  • a Cas protein may be a “Cas endonuclease” or “Cas effector protein”, that when in complex with a suitable polynucleotide component, is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a specific polynucleotide target sequence.
  • a Cas endonuclease described herein comprises one or more nuclease domains.
  • the endonucleases of the disclosure may include those having one or more RuvC nuclease domains.
  • a Cas protein is further defined as a functional fragment or functional variant of a native Cas protein, or a protein that shares at least 50%, between 50% and 55%, at least 55%, between 55% and 60%, at least 60%, between 60% and 65%, at least 65%, between 65% and 70%, at least 70%, between 70% and 75%, at least 75%, between 75% and 80%, at least 80%, between 80% and 85%, at least 85%, between 85% and 90%, at least 90%, between 90% and 95%, at least 95%, between 95% and 96%, at least 96%, between 96% and 97%, at least 97%, between 97% and 98%, at least 98%, between 98% and 99%, at least 99%, between 99% and 100%, or 100% sequence identity with at least 50, between 50 and 100, at least 100, between 100 and 150, at least 150, between 150 and 200, at least 200, between 200 and 250, at least 250, between 250 and 300, at least 300, between 300 and 350, at least 350, between 350 and 400, at
  • A “Cas endonuclease” may comprise domains that enable it to function as a double-strand-break-inducing agent.
  • A “Cas endonuclease” may also comprise one or more modifications or mutations that abolish or reduce its ability to cleave a double-strand polynucleotide (dCas).
  • the Cas endonuclease molecule may retain the ability to nick a single-strand polynucleotide (for example, a D10A mutation in a Cas9 endonuclease molecule) (nCas9).
  • “functionally equivalent fragment” of a Cas endonuclease are used interchangeably herein, and refer to a portion or subsequence of the Cas endonuclease of the present disclosure in which the ability to recognize, bind to, and optionally unwind, nick or cleave (introduce a single or double strand break in) the target site is retained.
  • the portion or subsequence of the Cas endonuclease can comprise a complete or partial (functional) peptide of any one of its domains such as, for example, but not limited to, a complete or functional part of a Cas3 HD domain, a complete or functional part of a Cas3 Helicase domain, or a complete or functional part of a Cascade protein (such as, but not limited to, a Cas5, Cas5d, Cas7 and Cas8b1).
  • “functionally equivalent variant” of a Cas endonuclease or Cas effector protein are used interchangeably herein, and refer to a variant of the Cas effector protein disclosed herein in which the ability to recognize, bind to, and optionally unwind, nick or cleave all or part of a target sequence is retained.
  • a Cas endonuclease may also include a multifunctional Cas endonuclease.
  • the terms “multifunctional Cas endonuclease” and “multifunctional Cas endonuclease polypeptide” are used interchangeably herein and include reference to a single polypeptide that has Cas endonuclease functionality (comprising at least one protein domain that can act as a Cas endonuclease) and at least one other functionality, such as but not limited to, the functionality to form a cascade (comprises at least a second protein domain that can form a cascade with other proteins).
  • the multifunctional Cas endonuclease comprises at least one additional protein domain relative (either internally, upstream (5’), downstream (3’), or both internally 5’ and 3’, or any combination thereof) to those domains typical of a Cas endonuclease.
  • cascade and “cascade complex” are used interchangeably herein and include reference to a multi-subunit protein complex that can assemble with a polynucleotide forming a polynucleotide-protein complex (PNP).
  • Cascade is a PNP that relies on the polynucleotide for complex assembly and stability, and for the identification of target nucleic acid sequences.
  • Cascade functions as a surveillance complex that finds and optionally binds target nucleic acids that are complementary to a variable targeting domain of the guide polynucleotide.
  • cleavage-ready Cascade “crCascade”
  • cleavage-ready Cascade complex “crCascade complex”
  • cleavage-ready Cascade system “CRC” and“crCascade system”
  • PNP polynucleotide-protein complex
  • RNA polymerase II RNA polymerase II
  • RNA capping occurs generally as follows: The most terminal 5’ phosphate group of the mRNA transcript is removed by RNA terminal phosphatase, leaving two terminal phosphates.
  • a guanosine monophosphate (GMP) is added to the terminal phosphate of the transcript by a guanylyl transferase, leaving a 5’-5’ triphosphate-linked guanine at the transcript terminus.
  • RNA having, for example, a 5’-hydroxyl group instead of a 5’-cap. Such RNA can be referred to as “uncapped RNA”, for example. Uncapped RNA can better accumulate in the nucleus following
  • RNA components herein are uncapped.
  • the term “guide polynucleotide” relates to a polynucleotide sequence that can form a complex with a Cas endonuclease, including the Cas endonuclease described herein, and enables the Cas endonuclease to recognize, optionally bind to, and optionally cleave a DNA target site.
  • the guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence).
  • RNA, crRNA or tracrRNA, respectively refer to a portion or subsequence of the guide RNA, crRNA or tracrRNA, respectively, of the present disclosure in which the ability to function as a guide RNA, crRNA or tracrRNA, respectively, is retained.
  • “functionally equivalent variant” of a guide RNA, crRNA or tracrRNA are used interchangeably herein, and refer to a variant of the guide RNA, crRNA or tracrRNA, respectively, of the present disclosure in which the ability to function as a guide RNA, crRNA or tracrRNA, respectively, is retained.
  • the terms “single guide RNA” and “sgRNA” are used interchangeably herein and relate to a synthetic fusion of two RNA molecules, a crRNA (CRISPR RNA) comprising a variable targeting domain (linked to a tracr mate sequence that hybridizes to a tracrRNA), fused to a tracrRNA (trans-activating CRISPR RNA).
  • the single guide RNA can comprise a crRNA or crRNA fragment and a tracrRNA or tracrRNA fragment of the type II CRISPR/Cas system that can form a complex with a type II Cas endonuclease, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, optionally bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site.
  • variable targeting domain or “VT domain” is used interchangeably herein and includes a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double strand DNA target site.
  • the percent complementation between the first nucleotide sequence domain (VT domain) and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%,
  • variable targeting domain can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
  • variable targeting domain comprises a contiguous stretch of 12 to 30 nucleotides.
  • the variable targeting domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
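To make the notion of percent complementarity between a variable targeting domain and a target strand concrete, here is a minimal sketch. It assumes an RNA VT domain and a DNA target strand of equal length, both written 5' to 3', counts only Watson-Crick pairs (no wobble pairs, gaps, or bulges), and pairs the strands antiparallel. The function name `percent_complementarity` and the example sequences are hypothetical.

```python
# RNA base -> the DNA base it pairs with (Watson-Crick only; an assumption).
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}

def percent_complementarity(vt_domain: str, target_strand: str) -> float:
    """Percent of VT-domain positions that base-pair with the target strand.

    Both sequences are given 5'->3'; the target is reversed so the two
    strands are compared in antiparallel orientation.
    """
    if not vt_domain or len(vt_domain) != len(target_strand):
        raise ValueError("sequences must be non-empty and of equal length")
    antiparallel = target_strand.upper()[::-1]
    matches = sum(
        RNA_TO_DNA_PAIR.get(rna_base) == dna_base
        for rna_base, dna_base in zip(vt_domain.upper(), antiparallel)
    )
    return 100.0 * matches / len(vt_domain)

vt = "GACUGACUGACUGACUGACU"          # 20-nt placeholder VT domain
perfect = "AGTCAGTCAGTCAGTCAGTC"     # fully complementary DNA strand
one_mismatch = "CGTCAGTCAGTCAGTCAGTC"  # first base changed

print(percent_complementarity(vt, perfect))       # 100.0
print(percent_complementarity(vt, one_mismatch))  # 95.0
```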
  • CER domain of a guide polynucleotide
  • a CER domain comprises a (trans-acting) tracrNucleotide mate sequence followed by a tracrNucleotide sequence.
  • the CER domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence (see for example US20150059010A1, published 26 February 2015), or any combination thereof.
  • guide polynucleotide/Cas endonuclease system, “guide polynucleotide/Cas complex”, “guide polynucleotide/Cas system”, “guided Cas system”, “Polynucleotide-guided endonuclease”, and “PGEN” are used interchangeably herein and refer to at least one guide polynucleotide and at least one Cas endonuclease, that are capable of forming a complex, wherein said guide polynucleotide/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site.
  • a guide polynucleotide/Cas endonuclease complex herein can comprise Cas protein(s) and suitable polynucleotide component(s) of any of the known CRISPR systems (Horvath and Barrangou, 2010, Science 327:167-170; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; Zetsche et al., 2015, Cell 163, 1-13;
  • RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site.
  • “target locus”, “genomic target site”, “genomic target sequence”, “genomic target locus”, “target polynucleotide”, and “protospacer” are used interchangeably herein and refer to a polynucleotide sequence such as, but not limited to, a nucleotide sequence on a chromosome, episome, a locus, or any other DNA molecule in the genome (including chromosomal, chloroplastic, mitochondrial DNA, plasmid DNA) of a cell, at which a guide polynucleotide/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave.
  • the target site can be an endogenous site in the genome of a cell, or alternatively, the target site can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target site can be found in a heterologous genomic location compared to where it occurs in nature.
  • terms “endogenous target sequence” and “native target sequence” are used
  • an “artificial target site” or “artificial target sequence” are used interchangeably herein and refer to a target sequence that has been introduced into the genome of a cell.
  • Such an artificial target sequence can be identical in sequence to an endogenous or native target sequence in the genome of a cell but be located in a different position (i.e., a non-endogenous or non-native position) in the genome of a cell.
  • A “protospacer adjacent motif” (PAM) herein refers to a short nucleotide sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease system described herein.
  • the Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not followed by a PAM sequence.
  • the sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used.
  • the PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
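The PAM requirement described above can be illustrated with a small scan for candidate target sites. This sketch assumes a 20-nt protospacer immediately followed (on its 3' side) by an `NGG` PAM, the motif commonly associated with Streptococcus pyogenes Cas9; that choice is an assumption for illustration and is not taken from this document, since PAM sequence and length depend on the Cas protein used. The function name `find_targets` is hypothetical, only the `N` wildcard is supported, and only the given strand is scanned.

```python
import re

def find_targets(dna: str, pam: str = "NGG", protospacer_len: int = 20):
    """Return (start, protospacer, pam) for each protospacer followed by a PAM.

    Uses a zero-width lookahead so that overlapping candidate sites are found.
    """
    pam_regex = pam.upper().replace("N", "[ACGT]")  # only the N wildcard is handled
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna.upper())]

# Placeholder sequence: a 20-nt stretch followed by TGG, then one extra base.
example = "ACGTACGTACGTACGTACGT" + "TGG" + "A"
print(find_targets(example))  # [(0, 'ACGTACGTACGTACGTACGT', 'TGG')]
```

Scanning the reverse complement as well would be needed to find sites on the opposite strand.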
  • modified target sequence are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to non-altered target sequence. Such “alterations” include, for example: replacement of at least one nucleotide, deletion of at least one nucleotide, insertion of at least one nucleotide, chemical modification of at least one nucleotide, or any combination of the preceding.
  • A “modified nucleotide” or “edited nucleotide” refers to a nucleotide sequence of interest that comprises at least one alteration when compared to its non-modified nucleotide sequence.
  • Such “alterations” include, for example: replacement of at least one nucleotide, deletion of at least one nucleotide, insertion of at least one nucleotide, chemical modification of at least one nucleotide, or any combination of the preceding.
  • Methods for “modifying a target site” and “altering a target site” are used interchangeably herein and refer to methods for producing an altered target site.
  • donor DNA is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas endonuclease.
  • polynucleotide modification template includes a polynucleotide that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited.
  • a nucleotide modification can be at least one nucleotide substitution, addition or deletion.
  • the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.
  • Plant generically includes whole plants, plant organs, plant tissues, seeds, plant cells, and progeny of the same.
  • Plant cells include, without limitation, cells from seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen and microspores.
  • a “plant element” or “plant part” is intended to reference either a whole plant or a plant component, which may comprise differentiated and/or undifferentiated tissues, for example but not limited to plant tissues, parts, and cell types.
  • a plant element is one of the following: whole plant, seedling, meristematic tissue, ground tissue, vascular tissue, dermal tissue, seed, leaf, root, shoot, stem, flower, fruit, stolon, bulb, tuber, corm, keiki, shoot, bud, tumor tissue, and various forms of cells and culture (e.g., single cells, embryos, callus tissue), intact plant cells comprising a cell wall, plant protoplasts (lacking a cell wall), plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks
  • Grain is intended to mean the mature seed produced by commercial growers for purposes other than growing or reproducing the species. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the invention, provided that these parts comprise the introduced polynucleotides.
  • plant organ refers to plant tissue or a group of tissues that constitute a morphologically and functionally distinct part of a plant.
  • plant element is synonymous to a “portion” or “part” of a plant, and refers to any part of the plant, and can include distinct tissues and/or organs, and may be used interchangeably with the term “tissue” throughout.
  • a “plant reproductive element” is intended to generically reference any part of a plant that is able to initiate other plants via either sexual or asexual reproduction of that plant, for example but not limited to: seed, seedling, root, shoot, cutting, scion, graft, stolon, bulb, tuber, corm, keiki, or bud.
  • the plant element may be in plant or in a plant organ, tissue culture, or cell culture.
  • the term “monocotyledonous” or “monocot” refers to the subclass of angiosperm plants also known as “monocotyledoneae”, whose seeds typically comprise only one embryonic leaf, or cotyledon.
  • the term includes references to whole plants, plant elements, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, and progeny of the same.
  • the term “dicotyledonous” or “dicot” refers to the subclass of angiosperm plants also known as “dicotyledoneae”, whose seeds typically comprise two embryonic leaves, or cotyledons.
  • the term includes references to whole plants, plant elements, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, and progeny of the same.
  • a “male sterile plant” is a plant that does not produce male gametes that are viable or otherwise capable of fertilization.
  • a “female sterile plant” is a plant that does not produce female gametes that are viable or otherwise capable of fertilization. It is recognized that male-sterile and female-sterile plants can be female-fertile and male-fertile, respectively. It is further recognized that a male fertile (but female sterile) plant can produce viable progeny when crossed with a female fertile plant and that a female fertile (but male sterile) plant can produce viable progeny when crossed with a male fertile plant.
  • non-conventional yeast refers to any yeast that is not a Saccharomyces (e.g., S. cerevisiae) or Schizosaccharomyces yeast species (see “Non-Conventional Yeasts in Genetics, Biochemistry and Biotechnology: Practical Protocols”, K. Wolf, K.D. Breunig, G. Barth, Eds., Springer-Verlag, Berlin, Germany, 2003).
  • crossed or “cross” or “crossing” in the context of this disclosure means the fusion of gametes via pollination to produce progeny (i.e., cells, seeds, or plants).
  • the term encompasses both sexual crosses (the pollination of one plant by another) and selfing (self-pollination, i.e., when the pollen and ovule (or microspores and megaspores) are from the same plant or genetically identical plants).
  • introgression refers to the transmission of a desired allele of a genetic locus from one genetic background to another.
  • introgression of a desired allele at a specified locus can be transmitted to at least one progeny plant via a sexual cross between two parent plants, where at least one of the parent plants has the desired allele within its genome.
  • transmission of an allele can occur by recombination between two donor genomes, e.g., in a fused protoplast, where at least one of the donor protoplasts has the desired allele in its genome.
  • the desired allele can be, e.g., a transgene, a modified (mutated or edited) native allele, or a selected allele of a marker or QTL.
  • the term“isoline” is a comparative term, and references organisms that are genetically identical, but differ in treatment.
  • two genetically identical maize plant embryos may be separated into two different groups, one receiving a treatment (such as the introduction of a CRISPR-Cas effector endonuclease) and one control that does not receive such treatment. Any phenotypic differences between the two groups may thus be attributed solely to the treatment and not to any inherency of the plant's endogenous genetic makeup.
  • “Introducing” is intended to mean presenting to a target, such as a cell or organism, a polynucleotide or polypeptide or polynucleotide-protein complex, in such a manner that the component(s) gains access to the interior of a cell of the organism or to the cell itself.
  • A “polynucleotide of interest” includes any nucleotide sequence that
  • a “polynucleotide of interest” encodes a protein or polypeptide that is “of interest” for a particular purpose, e.g., a selectable marker.
  • a trait or polynucleotide “of interest” is one that improves a desirable phenotype of a plant, particularly a crop plant, i.e., a trait of agronomic interest.
  • Polynucleotides of interest include, but are not limited to, polynucleotides encoding important traits for agronomics, herbicide resistance, insecticidal resistance, disease resistance, nematode resistance, microbial resistance, fungal resistance, viral resistance, fertility or sterility, grain characteristics, commercial products, phenotypic markers, or any other trait of agronomic or commercial importance.
  • a polynucleotide of interest may additionally be utilized in either the sense or anti-sense orientation. Further, more than one polynucleotide of interest may be utilized together, or “stacked”, to provide additional benefit.
  • a “polynucleotide of interest” may encode a gene expression regulatory element, for example a promoter, intron, terminator,
  • a “polynucleotide of interest” may comprise a DNA sequence that encodes an RNA molecule, for example a functional RNA, siRNA, miRNA, or a guide RNA that is capable of interacting with a Cas endonuclease to bind to a target polynucleotide sequence.
  • A“complex trait locus” includes a genomic locus that has multiple transgenes genetically linked to each other.
  • compositions and methods herein may provide for an improved “agronomic trait” or “trait of agronomic importance” or “trait of agronomic interest” to a plant, which may include, but not be limited to, the following: disease resistance, drought tolerance, heat tolerance, cold tolerance, salinity tolerance, metal tolerance, herbicide tolerance, improved water use efficiency, improved nitrogen utilization, improved nitrogen fixation, pest resistance, herbivore resistance, pathogen resistance, yield improvement, health enhancement, vigor improvement, growth improvement, photosynthetic capability improvement, nutrition enhancement, altered protein content, altered oil content, increased biomass, increased shoot length, increased root length, improved root architecture, modulation of a metabolite, modulation of the proteome, increased seed weight, altered seed carbohydrate composition, altered seed oil composition, altered seed protein composition, altered seed nutrient composition, as compared to an isoline plant not comprising a modification derived from the methods or compositions herein.
  • Agronomic trait potential is intended to mean a capability of a plant element for exhibiting a phenotype, preferably an improved agronomic trait, at some point during its life cycle, or conveying said phenotype to another plant element with which it is associated in the same plant.
  • a decrease in a characteristic may be at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, between 5% and 10%, at least 10%, between 10% and 20%, at least 15%, at least 20%, between 20% and 30%, at least 25%, at least 30%, between 30% and 40%, at least 35%, at least 40%, between 40% and 50%, at least 45%, at least 50%, between 50% and 60%, at least about 60%, between 60% and 70%, between 70% and 80%, at least 75%, at least about 80%, between 80% and 90%, at least about 90%, between 90% and 100%, at least 100%, between 100% and 200%, at least 200%, at least about 300%, at least about 400% or more lower than the untreated control and an increase may be at least 1%, at least 2%, at least 3%, at least 4%, at least 5%
  • Double-strand breaks induced by double-strand-break-inducing agents can result in the induction of DNA repair mechanisms, including the non-homologous end-joining pathway, and homologous recombination.
  • Endonucleases include a range of different enzymes, including restriction endonucleases (see e.g., Roberts et al., (2003) Nucleic Acids Res 31:418-20; Roberts et al., (2003) Nucleic Acids Res 31:1805-12; and Belfort et al., (2002) in Mobile DNA II, pp. 761-783, Eds. Craigie et al., (ASM Press, Washington, DC)), meganucleases (see e.g., WO
  • site-specific base conversions can also be achieved to engineer one or more nucleotide changes to create one or more EMEs described herein into the genome.
  • site-specific base edit mediated by C•G to T•A or A•T to G•C base editing deaminase enzymes (Gaudelli et al., “Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage.” Nature (2017); Nishida et al., “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems.” Science 353 (6305) (2016); Komor et al., “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.” Nature 533 (7603) (2016):420-4).
  • Any double-strand-break or -nick or -modification inducing agent may be used for the methods described herein, including for example but not limited to: Cas endonucleases, recombinases, TALENs, zinc finger nucleases, restriction endonucleases, meganucleases, and deaminases.
  • Class 1 Cas endonucleases comprise multisubunit effector complexes (Types I, III, and IV), while Class 2 systems comprise single protein effectors (Types II, V, and VI) (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13; Haft et al., 2005, Computational Biology, PLoS Comput Biol 1(6): e60; and Koonin et al. 2017, Curr Opinion Microbiology 37:67-78).
  • the Cas endonuclease acts in complex with a guide RNA (gRNA) that directs the Cas endonuclease to cleave the DNA target to enable target recognition, binding, and cleavage by the Cas endonuclease.
  • the Cas endonuclease-guide polynucleotide complex recognizes a short nucleotide sequence adjacent to the target sequence (protospacer), called a “protospacer adjacent motif” (PAM).
  • Examples of a Cas endonuclease include but are not limited to Cas9 and Cpf1.
  • Cas9 (formerly referred to as Cas5, Csn1, or Csx12) is a Class 2 Type II Cas endonuclease (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15).
  • a Cas9-gRNA complex recognizes a 3’ PAM sequence (NGG for the S. pyogenes Cas9) at the target site, permitting the spacer of the guide RNA to invade the double-stranded DNA target, and, if sufficient homology between the spacer and protospacer exists, generate a double-strand break cleavage.
  • Cas9 endonucleases comprise RuvC and HNH domains that together produce double strand breaks, and separately can produce single strand breaks. For the S.
  • Cpf1 is a Class 2 Type V Cas endonuclease, and comprises a RuvC nuclease domain but lacks an HNH domain (Yamane et al., 2016, Cell 165:949-962). Cpf1 endonucleases create “sticky” overhang ends.
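  • By way of non-limiting illustration, the PAM-dependent recognition described above for the S. pyogenes Cas9 can be sketched in Python. The function name `find_ngg_sites`, the default 20-nucleotide protospacer length, and the restriction to a single strand are assumptions made for this sketch and are not part of the disclosure.

```python
import re


def find_ngg_sites(dna: str, spacer_len: int = 20):
    """List candidate Cas9 sites on the given strand.

    Each result is (protospacer_start, protospacer, pam), where the PAM
    (NGG) lies immediately 3' of the protospacer. Only the strand passed
    in is scanned; the complementary strand is not considered here.
    """
    dna = dna.upper()
    sites = []
    # A lookahead is used so that overlapping PAM candidates are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        pam_start = match.start()
        start = pam_start - spacer_len
        if start >= 0:  # skip PAMs without enough upstream sequence
            sites.append((start, dna[start:pam_start], match.group(1)))
    return sites
```

Whether a guide RNA actually directs cleavage at a listed site still depends on sufficient homology between the spacer and the protospacer, which this sketch does not evaluate.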
  • Cas-gRNA systems at a genomic target site include but are not limited to insertions, deletions, substitutions, or modifications of one or more nucleotides at the target site; modifying or replacing nucleotide sequences of interest (such as regulatory elements); insertion of polynucleotides of interest; gene knock-out; gene knock-in; modification of splicing sites and/or introducing alternate splicing sites; modifications of nucleotide sequences encoding a protein of interest; amino acid and/or protein fusions; and gene silencing by expressing an inverted repeat into a gene of interest.
  • a“polynucleotide modification template” comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited.
  • a nucleotide modification can be at least one nucleotide substitution, addition, deletion, or chemical alteration.
  • the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.
  • a polynucleotide of interest is inserted at a target site and provided as part of a “donor DNA” molecule.
  • “donor DNA” is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas
  • the donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest.
  • the first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome.
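  • As a minimal sketch of the donor DNA layout described above, the following Python function flanks a polynucleotide of interest with a first and a second region of homology copied from either side of a target position. The name `build_donor`, the `cut_index` coordinate, and the default arm length of 50 bases are hypothetical choices for illustration only.

```python
def build_donor(genome: str, cut_index: int, insert: str, arm_len: int = 50) -> str:
    """Return left homology arm + insert + right homology arm.

    `cut_index` is the position in `genome` (0-based, between bases) at
    which the insert is intended to land. Both arms are copied directly
    from the genome, so they are 100% identical to the flanking regions.
    """
    if cut_index - arm_len < 0 or cut_index + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arm length")
    left_arm = genome[cut_index - arm_len:cut_index]
    right_arm = genome[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm
```

In practice the arms need not be perfect copies; the disclosure only requires that they share sufficient homology with the genomic regions.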
  • the donor DNA can be tethered to the guide polynucleotide. Tethered donor DNAs can allow for co-localizing target and donor DNA, useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells where function of endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10:957-963).
  • the amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions.
  • the process for editing a genomic sequence at a Cas-gRNA double-strand-break site with a modification template generally comprises: providing a host cell with a Cas-gRNA complex that recognizes a target sequence in the genome of the host cell and is able to induce a single- or double-strand-break in the genomic sequence, and optionally at least one polynucleotide modification template comprising at least one nucleotide alteration when compared to the nucleotide sequence to be edited.
  • the polynucleotide modification template can further comprise nucleotide sequences flanking the at least one nucleotide alteration, in which the flanking sequences are substantially homologous to the chromosomal region flanking the double-strand break.
  • Genome editing using double-strand-break-inducing agents, such as Cas9-gRNA complexes, has been described, for example in US20150082478 published on 19 March
  • the gene comprising the Cas endonuclease may be optimized as described in WO2016186953 published 24 November 2016, and then delivered into cells as DNA expression cassettes by methods known in the art.
  • the Cas endonuclease is provided as a polypeptide.
  • the Cas endonuclease is provided as a polynucleotide encoding a polypeptide.
  • the guide RNA is provided as a DNA molecule encoding one or more RNA molecules.
  • the guide RNA is provided as RNA or chemically-modified RNA.
  • the Cas endonuclease protein and guide RNA are provided as a ribonucleoprotein complex (RNP).
  • Cas endonucleases, Cas endoribonucleases, Cas effector proteins, and Cascade proteins may all be generally referred to as “Cas proteins”.
  • Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain, and include restriction endonucleases that cleave DNA at specific sites without damaging the bases.
  • Examples of endonucleases include restriction endonucleases, meganucleases, TAL effector nucleases (TALENs), zinc finger nucleases, and Cas (CRISPR-associated) effector endonucleases.
  • Cas endonucleases either as single effector proteins or in an effector complex with other components, unwind the DNA duplex at the target sequence and optionally cleave at least one DNA strand, as mediated by recognition of the target sequence by a polynucleotide (such as, but not limited to, a crRNA or guide RNA) that is in complex with the Cas effector protein.
  • Such recognition and cutting of a target sequence by a Cas endonuclease typically occurs if the correct protospacer-adjacent motif (PAM) is located at or adjacent to the 3' end of the DNA target sequence.
  • Cas endonuclease herein may lack DNA cleavage or nicking activity, but can still specifically bind to a DNA target sequence when complexed with a suitable RNA component.
  • Cas endonucleases may occur as individual effectors (Class 2 CRISPR systems) or as part of larger effector complexes (Class 1 CRISPR systems). Cas endonucleases that have been described include, but are not limited to, for example: Cas3 (a feature of Class 1 type I systems), Cas9 (a feature of Class 2 type II systems) and Cas12 (Cpf1) (a feature of Class 2 type V systems).
  • Cas endoribonucleases are a feature of some systems, for example Type I.
  • Type I-E Cas endoribonuclease is an integral subunit of the targeting complex of the Cascade, that binds and cleaves within each repeat sequence of the precursor crRNA (pre-crRNA) transcript, generating a library of crRNAs wherein each contains a unique spacer sequence flanked by portions of the adjacent repeats (Hochstrasser and Doudna, TIBS 40(l):58-66, 2015).
  • Type I-E Cas endonucleases, endoribonucleases, and effector proteins can be used for targeted genome editing (via simplex and multiplex double-strand breaks and nicks) and targeted genome regulation (via tethering of epigenetic effector domains to either the Cas protein or sgRNA).
  • a Cas endonuclease can also be engineered to function as an RNA-guided
  • RNA tethers could serve as a scaffold for the assembly of multiprotein and nucleic acid complexes (Mali et al., 2013, Nature Methods Vol. 10:957-963).
  • Fragments and variants of Cas endonucleases, endoribonucleases, and effector proteins can be obtained via methods such as site-directed mutagenesis and synthetic
  • a Cas endonuclease, endoribonuclease, or effector protein can comprise a modified form of the Cas polypeptide.
  • the modified form of the Cas polypeptide can include an amino acid change (e.g., deletion, insertion, chemical alteration, or substitution) that reduces the naturally-occurring nuclease activity of the Cas protein.
  • the modified form of the Cas protein has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding wild- type Cas polypeptide (US20140068797 published 06 March 2014).
  • the modified form of the Cas polypeptide has no substantial nuclease activity and is referred to as catalytically “inactivated Cas” or “deactivated Cas (dCas).”
  • An inactivated Cas/deactivated Cas includes a deactivated Cas endonuclease (dCas).
  • a catalytically inactive Cas effector protein can be fused to a heterologous sequence to induce or modify activity.
  • a Cas endonuclease, endoribonuclease, or effector protein can be part of a fusion protein comprising one or more heterologous protein domains (e.g., 1, 2, 3, or more domains in addition to the Cas protein).
  • a fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains, such as between Cas and a first heterologous domain.
  • protein domains that may be fused to a Cas protein herein include, without limitation, epitope tags (e.g., histidine [His], V5, FLAG, influenza hemagglutinin [HA], myc, VSV-G, thioredoxin [Trx]), reporters (e.g., glutathione-S-transferase [GST], horseradish peroxidase [HRP], chloramphenicol acetyltransferase [CAT], beta-galactosidase, beta-glucuronidase [GUS], luciferase, green fluorescent protein [GFP], HcRed, DsRed, cyan fluorescent protein [CFP], yellow fluorescent protein [YFP], blue fluorescent protein [BFP]), and domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity (e.g., VP16 or VP64), transcription repression activity, transcription release factor activity, histone modification activity, methyl
  • a Cas protein can also be in fusion with a protein that binds DNA molecules or other molecules, such as maltose binding protein (MBP), S-tag, LexA DNA binding domain (DBD), GAL4A DNA binding domain, and herpes simplex virus (HSV) VP16.
  • a catalytically active and/or inactive Cas endonuclease, endoribonuclease, or effector protein can be fused to a heterologous sequence (US20140068797 published 06 March 2014).
  • Suitable fusion partners include, but are not limited to, a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA.
  • Additional suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity.
  • fusion partners include, but are not limited to, a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.).
  • Cas proteins described herein can be expressed and purified by methods known in the art, for example as described in WO2016186953 published 24 November 2016.
  • a Cas endonuclease, endoribonuclease, or effector protein can comprise a heterologous nuclear localization sequence (NLS).
  • a heterologous NLS amino acid sequence herein may be of sufficient strength to drive accumulation of a Cas protein in a detectable amount in the nucleus of a yeast cell herein, for example.
  • An NLS may comprise one
  • NLS may be operably linked to the N-terminus or C-terminus of a Cas protein herein, for example.
  • Two or more NLS sequences can be linked to a Cas protein, for example, such as on both the N- and C-termini of a Cas protein.
  • the Cas endonuclease gene can be operably linked to a SV40 nuclear targeting signal upstream of the Cas codon region and a bipartite VirD2 nuclear localization signal (Tinland et al. (1992) Proc. Natl. Acad. Sci. USA 89:7442-6) downstream of the Cas codon region.
  • suitable NLS sequences herein include those disclosed in U.S. Patent Nos. 6,660,830 and 7,309,576.
  • the guide polynucleotide molecule comprises a Cas endonuclease recognition
  • the guide such as a gRNA, comprises a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA) to guide the Cas endonuclease to its DNA target.
  • the crRNA comprises a spacer region
  • the gRNA is a “single guide RNA” (sgRNA) that comprises a synthetic fusion of crRNA and tracrRNA.
  • a targeting method herein can be performed in such a way that two or more DNA target sites are targeted in the method, for example. Such a method can optionally be
  • a multiplex method is typically performed by a targeting method herein in which multiple different RNA components are provided, each designed to guide a guide polynucleotide/Cas endonuclease complex to a unique DNA target site.
  • RNA sequences are separated by a Cas endoribonuclease binding/cleavage site, such that a contiguous RNA sequence may comprise a plurality of gRNA sequences, for example for forming a complex with a Cas endonuclease for genomic target site cleavage.
  • the plurality of gRNA sequences in the contiguous RNA sequence may comprise identical, similar, or dissimilar gRNA sequences, or any combination thereof.
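  • The arrangement described above, in which a contiguous RNA comprises a plurality of gRNA sequences separated by Cas endoribonuclease binding/cleavage sites, can be modeled with a short Python sketch. The separator `PLACEHOLDER_SITE` is a made-up string, not a real recognition sequence, and the model treats the site as removed whole on processing; an actual endoribonuclease may leave repeat-derived portions on each released guide.

```python
# Hypothetical separator standing in for an endoribonuclease
# binding/cleavage site; it is NOT a real recognition sequence.
PLACEHOLDER_SITE = "GGGAAACCCUUU"


def build_poly_guide(guides, site=PLACEHOLDER_SITE):
    """Join guide RNA sequences into one contiguous RNA, site-separated."""
    for guide in guides:
        if site in guide:
            raise ValueError("guide contains the separator site")
    return site.join(guides)


def process_poly_guide(poly, site=PLACEHOLDER_SITE):
    """Model processing by splitting at every site and dropping empty pieces."""
    return [fragment for fragment in poly.split(site) if fragment]
```

Because the guides are kept as a list, identical, similar, or dissimilar guide sequences can be combined in any order.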
  • the poly-guide RNA may be used in vitro or in vivo, for example as part of an RNA sequence that comprises more than one guide RNA for a Cas endonuclease.
  • the poly-guide RNA may be introduced to a cell, for example, as a component of a DNA vector comprising a DNA sequence that can be transcribed into the poly-guide RNA.
  • the poly-guide RNA may be introduced directly as a polyribonucleotide.
  • the method of introduction to a cell may be by any means known in the art, for example but not limited to Agrobacterium or Ochrobactrum transformation, or by particle bombardment.
  • the polygRNA may be provided to a target cell or target polynucleotide by any method known in the art, including but not limited to: direct contact in a solution, delivery on a solid matrix such as a particle (microparticle or nanoparticle or silicon carbide“whiskers”), via a liposome, or as part of a recombinant vector.
  • the polygRNA may be provided as a DNA sequence encoding an RNA sequence, or provided directly as an RNA sequence, or provided as a combination RNA-DNA sequence.
  • the Cas endoribonuclease may be provided to the polygRNA, or to a target cell, by any method known in the art, including but not limited to: direct contact in a solution, delivery on a solid matrix such as a particle (microparticle or nanoparticle or silicon carbide “whiskers”), via a liposome, or as part of a recombinant vector.
  • the RN may be provided as a DNA sequence encoding an RNA sequence that can optionally be translated into the RN protein, or provided as an RNA sequence that may be optionally translated into the RN protein, or provided as an RN protein.
Double-Strand-Break Repair and Polynucleotide Modification
  • a double-strand-break-inducing agent such as a guided Cas endonuclease can recognize, bind to a DNA target sequence and introduce a single strand (nick) or double-strand break.
  • Once a single or double-strand break is induced in the DNA, the cell’s DNA repair mechanism is activated to repair the break, for example via nonhomologous end-joining (NHEJ) or Homology-Directed Repair (HDR) processes which can lead to modifications at the target site.
  • Modification of a target polynucleotide includes any one or more of the following: insertion of at least one nucleotide, deletion of at least one nucleotide, chemical alteration of at least one nucleotide, replacement of at least one nucleotide, or mutation of at least one nucleotide.
  • the DNA repair mechanism creates an imperfect repair of the double-strand break, resulting in a change of a nucleotide at the break site.
  • a polynucleotide template may be provided to the break site, wherein the repair results in a template-directed repair of the break.
  • a donor polynucleotide may be provided to the break site, wherein the repair results in the incorporation of the donor polynucleotide into the break site.
  • the methods and compositions described herein improve the probability of a non-NHEJ repair mechanism outcome at a DSB. In one aspect, an increase of the HDR to NHEJ repair ratio is effected.
  • Homology-directed repair is a mechanism in cells to repair double- stranded and single stranded DNA breaks.
  • Homology-directed repair includes homologous recombination (HR) and single-strand annealing (SSA) (Lieber, 2010, Annu. Rev. Biochem. 79:181-211).
  • Other forms of HDR include single-strand annealing (SSA) and breakage-induced replication, and these require shorter sequence homology relative to HR.
  • a“region of homology to a genomic region” that is found on the donor DNA is a region of DNA that has a similar sequence to a given“genomic region” in the cell or organism genome.
  • a region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target site.
  • the region of homology can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases in length such that the region of homology has sufficient homology to undergo homologous recombination with the
  • “Sufficient homology” indicates that two polynucleotide sequences share structural similarity to act as substrates for a homologous recombination reaction.
  • the structural similarity includes overall length of each polynucleotide fragment, as well as the sequence similarity of the polynucleotides.
  • Sequence similarity can be described by the percent sequence identity over the whole length of the sequences, and/or by conserved regions comprising localized similarities such as contiguous nucleotides having 100% sequence identity, and percent sequence identity over a portion of the length of the sequences.
  • the amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target site.
  • ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps.
  • the amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity of about at least 50%, 55%, 60%, 65%, 70%, 71%, 72%,
  • Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity, for example sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus.
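  • The example given above, a region of 75-150 bp having at least 80% sequence identity to a region of the target locus, can be expressed as a simple check. This is a sketch under stated assumptions: the two sequences are taken to be already aligned, of equal length, and free of gaps, and the function names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two equal-length, pre-aligned, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def has_sufficient_homology(donor_region: str, target_region: str,
                            min_len: int = 75, max_len: int = 150,
                            min_identity: float = 80.0) -> bool:
    """Apply the combined length and global-identity criterion."""
    if not (min_len <= len(donor_region) <= max_len):
        return False
    return percent_identity(donor_region, target_region) >= min_identity
```

Conserved contiguous stretches or local percent identity, which the definition also permits, are not evaluated by this sketch.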
  • DNA double-strand breaks can be an effective factor to stimulate homologous recombination pathways (Puchta et al., (1995) Plant Mol Biol 28:281-92; Tzfira and White, (2005) Trends Biotechnol 23:567-9; Puchta, (2005) J Exp Bot 56:1-14).
  • Using DNA-breaking agents, a two- to nine-fold increase of homologous recombination was observed between artificially constructed homologous DNA repeats in plants (Puchta et al., (1995) Plant Mol Biol 28:281-92).
  • experiments with linear DNA molecules demonstrated enhanced homologous recombination between plasmids (Lyznik et al., (1991) Mol Gen Genet 230:209-18).
  • HDR are contemplated.
  • the fraction of HR reads relative to the number of total mutant reads is at least 2, 3, 4, 5, 6, 7, 8, 9, 10, between 10 and 15, 15, between 15 and 20, 20, between 20 and 25, 25, between 25 and 30, 30, between 30 and 40, 40, between 40 and 50, 50, between 50 and 60, 60, between 60 and 70, 70, between 70 and 80, 80, between 80 and 90, 90, between 90 and 100, 100, between 100 and 125, 125, between 125 and 150, greater than 150, or infinitely greater than that observed for a single cleavage strategy.
  • the percent of HR reads relative to the number of total mutant reads is at least 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%,
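  • The two read-based measures above, the percent of HR reads among total mutant reads and the fold difference relative to a single cleavage strategy, reduce to simple arithmetic. The sketch below is illustrative only; the function names are hypothetical, and returning infinity when the single-cleavage fraction is zero is an assumed reading of “infinitely greater”.

```python
import math


def hr_percent(hr_reads: int, total_mutant_reads: int) -> float:
    """Percent of HR reads relative to the number of total mutant reads."""
    if total_mutant_reads <= 0:
        raise ValueError("total_mutant_reads must be positive")
    return 100.0 * hr_reads / total_mutant_reads


def fold_over_single(hr_fraction_test: float, hr_fraction_single: float) -> float:
    """Fold difference of an HR fraction over a single cleavage strategy."""
    if hr_fraction_single == 0:
        # No HR reads in the comparison: the fold is unbounded if the test
        # strategy produced any HR reads, and undefined otherwise.
        return math.inf if hr_fraction_test > 0 else math.nan
    return hr_fraction_test / hr_fraction_single
```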
  • compositions and methods described herein can be used for gene targeting.
  • DNA targeting can be performed by cleaving one or both strands at a specific polynucleotide sequence in a cell with a Cas endonuclease associated with a suitable guide polynucleotide component. Once a single or double-strand break is induced in the DNA, the cell’s DNA repair mechanism is activated to repair the break via nonhomologous end-joining (NHEJ) or Homology-Directed Repair (HDR) processes which can lead to modifications at the target site.
  • the length of the DNA sequence at the target site can vary, and includes, for example, target sites that are at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleotides in length. It is further possible that the target site can be palindromic, that is, the sequence on one strand reads the same in the opposite direction on the complementary strand.
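  • The palindromic property described above, in which the sequence on one strand reads the same in the opposite direction on the complementary strand, is equivalent to the sequence being equal to its own reverse complement. A minimal Python sketch, with hypothetical function names and restricted to the bases A, C, G, and T:

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the site reads identically on both strands (5' to 3')."""
    return seq.upper() == reverse_complement(seq)
```

For instance, the hexamer GAATTC satisfies this test, whereas GATTACA does not.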
  • the nick/cleavage site can be within the target sequence or the nick/cleavage site could be outside of the target sequence.
  • the cleavage could occur at nucleotide positions immediately opposite each other to produce a blunt end cut or, in other cases, the incisions could be staggered to produce single-stranded overhangs, also called “sticky ends”, which can be either 5' overhangs, or 3' overhangs.
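  • The distinction between blunt and staggered cuts can be made concrete with a small Python sketch. The coordinate convention is an assumption introduced here: both incision positions are expressed as boundaries between bases, counted from the 5' end of the top strand, and the function name `classify_cut` is hypothetical.

```python
def classify_cut(top_cut: int, bottom_cut: int):
    """Classify the ends produced by one incision on each strand.

    Both positions are boundaries between bases in top-strand
    coordinates (0 = before the first base). Returns a tuple of
    (end_type, overhang_length_in_nucleotides).
    """
    if top_cut == bottom_cut:
        return ("blunt", 0)
    if top_cut < bottom_cut:
        # The top strand is cut further 5', leaving protruding 5' ends.
        return ("5' overhang", bottom_cut - top_cut)
    # The top strand is cut further 3', leaving protruding 3' ends.
    return ("3' overhang", top_cut - bottom_cut)
```

Under this convention, incisions at boundaries 1 and 5 of a six-base site give a four-nucleotide 5' overhang, and the reverse arrangement gives a four-nucleotide 3' overhang.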
  • Active variants of genomic target sites can also be used. Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the given target site, wherein the active variants retain biological activity and hence are capable of being recognized and cleaved by a Cas endonuclease.
  • Assays to measure the single or double-strand break of a target site by an endonuclease are known in the art and generally measure the overall activity and specificity of the agent on DNA substrates comprising recognition sites.
  • the process for editing a genomic sequence combining DSB and modification templates generally comprises: introducing into a host cell a DSB-inducing agent, or a nucleic acid encoding a DSB-inducing agent, that recognizes a target sequence in the chromosomal sequence and is able to induce a DSB in the genomic sequence, and at least one polynucleotide modification template comprising at least one nucleotide alteration when compared to the nucleotide sequence to be edited.
  • the polynucleotide modification template can further comprise nucleotide sequences flanking the at least one nucleotide alteration, in which the flanking sequences are substantially homologous to the chromosomal region flanking the DSB.
  • Genome editing using DSB-inducing agents, such as Cas-gRNA complexes, has been described, for example in US20150082478 published on 19 March 2015, WO2015026886 published on 26 February 2015, WO2016007347 published 14 January 2016, and WO2016025131 published on 18 February 2016.
  • RNA/Cas endonuclease systems have been described (see for example: US20150082478 A1 published 19 March 2015, WO2015026886 published 26 February 2015, and US20150059010 published 26 February 2015) and include but are not limited to modifying or replacing nucleotide sequences of interest (such as regulatory elements), insertion of polynucleotides of interest, gene drop-out, gene knock-out, gene knock-in, modification of splicing sites and/or introducing alternate splicing sites, modifications of nucleotide sequences encoding a protein of interest, amino acid and/or protein fusions, and gene silencing by expressing an inverted repeat into a gene of interest.
  • Proteins may be altered in various ways including amino acid substitutions, deletions, truncations, and insertions. Methods for such manipulations are generally known. For example, amino acid sequence variants of the protein(s) can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations include, for example, Kunkel, (1985) Proc. Natl. Acad. Sci. USA 82:488-92; Kunkel et al., (1987) Meth Enzymol 154:367-82; U.S. Patent No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in
  • Conservative deletions, insertions, and amino acid substitutions are not expected to produce radical changes in the characteristics of the protein, and the effect of any substitution, deletion, insertion, or combination thereof can be evaluated by routine screening assays.
  • Assays for double-strand-break-inducing activity are known and generally measure the overall activity and specificity of the agent on DNA substrates comprising target sites.
  • crCascade cleavage ready Cascade
  • crRNA CRISPR RNA
  • the genes comprising the crCascade may be optimized as described in WO2016186953 published 24 November 2016, and then delivered into cells as DNA expression cassettes by methods known in the art.
  • the components necessary to comprise an active crCascade complex may also be delivered as RNA with or without modifications that protect the RNA from degradation or as mRNA capped or uncapped (Zhang, Y. et al., 2016, Nat. Commun. 7:12617) or Cas protein guide polynucleotide complexes (WO2017070032 published 27 April 2017), or any combination thereof.
  • a part or part(s) of the crCascade complex and crRNA may be expressed from a DNA construct while other components are delivered as RNA with or without modifications that protect the RNA from degradation or as mRNA capped or uncapped (Zhang et al. 2016 Nat. Commun. 7:12617) or Cas protein guide polynucleotide complexes (WO2017070032 published 27 April 2017) or any combination thereof.
  • tRNA derived elements may also be used to recruit endogenous RNases to cleave crRNA transcripts into mature forms capable of guiding the crCascade complex to its DNA target site, as described, for example, in WO2017105991 published 22 June 2017.
  • crCascade nickase complexes may be utilized separately or concertedly to generate a single or multiple DNA nicks on one or both DNA strands. Furthermore, the cleavage activity of the Cas endonuclease may be deactivated by altering key catalytic residues in its cleavage domain (Sinkunas, T. et al., 2013, EMBO J.
  • RNA guided helicase that may be used to enhance homology-directed repair, induce transcriptional activation, or remodel local DNA structures.
  • activity of the Cas cleavage and helicase domains may both be knocked-out and used in combination with other DNA cutting, DNA nicking, DNA binding, transcriptional activation, transcriptional repression, DNA remodeling, DNA deamination, DNA unwinding, DNA recombination enhancing, DNA integration, DNA inversion, and DNA repair agents.
  • the PAM preferences for each new system disclosed herein may be examined. If the cleavage ready Cascade (crCascade) complex results in degradation of the randomized PAM library, the crCascade complex can be converted into a nickase by disabling the ATPase dependent helicase activity either through mutagenesis of critical residues or by assembling the reaction in the absence of ATP as described previously (Sinkunas, T. et al., 2013, EMBO J. 32:385-394). Two regions of PAM randomization separated by two protospacer targets may be utilized to generate a double-stranded DNA break which may be captured and sequenced to examine the PAM sequences that support cleavage by the respective crCascade complex.
  • the invention describes a method for modifying a target site in the genome of a cell, the method comprising introducing into a cell at least one Cas endonuclease and guide RNA, and identifying at least one cell that has a modification at the target site.
  • the nucleotide to be edited can be located within or outside a target site recognized and cleaved by a Cas endonuclease.
  • the at least one nucleotide modification is not a modification at a target site recognized and cleaved by a Cas endonuclease.
  • a knock-out may be produced by an indel (insertion or deletion of nucleotide bases in a target DNA sequence through NHEJ), or by specific removal of sequence that reduces or completely destroys the function of sequence at or near the targeting site.
  • a guide polynucleotide/Cas endonuclease induced targeted mutation can occur in a nucleotide sequence that is located within or outside a genomic target site that is recognized and cleaved by the Cas endonuclease.
  • the method for editing a nucleotide sequence in the genome of a cell can be a method without the use of an exogenous selectable marker by restoring function to a non functional gene product.
  • the invention describes a method for modifying a target site in the genome of a cell, the method comprising introducing into a cell at least one PGEN described herein and at least one donor DNA, wherein said donor DNA comprises a polynucleotide of interest, and optionally further comprising identifying at least one cell that has said polynucleotide of interest integrated in or near said target site.
  • the methods disclosed herein may employ homologous recombination (HR)
  • a polynucleotide of interest is introduced into the organism cell via a donor DNA construct.
  • donor DNA is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas endonuclease.
  • the donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest.
  • the first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome.
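  • The donor DNA layout described above (a polynucleotide of interest flanked by a first and a second region of homology) can be sketched in a few lines. This is an illustration under stated assumptions: the genome is a plain string, the homology arms are taken from the sequence immediately flanking the target site, and the function name and parameters are hypothetical.

```python
def build_donor(genome: str, site_start: int, site_end: int,
                insert: str, arm_len: int) -> str:
    """Assemble left arm + polynucleotide of interest + right arm.

    site_start/site_end are 0-based, end-exclusive coordinates of the
    target site; each arm is arm_len bases of directly flanking sequence.
    """
    if not 0 <= site_start <= site_end <= len(genome):
        raise ValueError("target site lies outside the genome")
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if site_start - arm_len < 0 or site_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the arms")
    left_arm = genome[site_start - arm_len:site_start]   # first region of homology
    right_arm = genome[site_end:site_end + arm_len]      # second region of homology
    return left_arm + insert + right_arm


if __name__ == "__main__":
    toy_genome = "AAAACCCCGGGGTTTT"
    print(build_donor(toy_genome, 6, 10, "ATGCAT", 4))
```

  Arms that overlap the target site or lie further 5' or 3' of it, which the text also permits, would need different slicing than this sketch uses.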
  • the donor DNA can be tethered to the guide polynucleotide. Tethered donor DNAs can allow for co-localizing target and donor DNA, useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells where function of endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10: 957-963).
  • the amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target site.
  • ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps.
  • the amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity of about at least 50%, 55%, 60%, 65%, 70%, 71%, 72%,
  • Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity, for example sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus.
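  • The example criterion above (a region of 75-150 bp having at least 80% sequence identity) can be expressed as a small check. The sketch assumes the two regions are already aligned without gaps and are of equal length; a real comparison would normally use an alignment tool. Function names and default thresholds are illustrative assumptions based on the example in the text.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, ungapped sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def has_sufficient_homology(donor_region: str, genomic_region: str,
                            min_len: int = 75, max_len: int = 150,
                            min_identity: float = 80.0) -> bool:
    """Apply the length and identity criterion to a pair of aligned regions."""
    length_ok = min_len <= len(donor_region) <= max_len
    return length_ok and percent_identity(donor_region, genomic_region) >= min_identity


if __name__ == "__main__":
    genomic = "ACGT" * 25                      # 100 bp toy region
    donor = "T" + genomic[1:]                  # one mismatch at position 0
    print(percent_identity(donor, genomic))
    print(has_sufficient_homology(donor, genomic))
```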
  • Episomal DNA molecules can also be ligated into the double-strand break, for example, integration of T-DNAs into chromosomal double-strand breaks (Chilton and Que,
  • the disclosure comprises a method for editing a nucleotide sequence in the genome of a cell, the method comprising introducing into a cell at least one PGEN described herein, and a polynucleotide modification template, wherein said polynucleotide modification template comprises at least one nucleotide modification of said nucleotide sequence, and optionally further comprising selecting at least one cell that comprises the edited nucleotide sequence.
  • the guide polynucleotide/Cas endonuclease system can be used in combination with at least one polynucleotide modification template to allow for editing (modification) of a genomic nucleotide sequence of interest.
  • Polynucleotides of interest and/or traits can be stacked together in a complex trait locus as described in WO2012129373 published 27 September 2012, and in WO2013112686, published 01 August 2013.
  • the guide polynucleotide/Cas9 endonuclease system described herein provides for an efficient system to generate double-strand breaks and allows for traits to be stacked in a complex trait locus.
  • a guide polynucleotide/Cas system as described herein, mediating gene targeting, can be used in methods for directing heterologous gene insertion and/or for producing complex trait loci comprising multiple heterologous genes in a fashion similar as disclosed in WO2012129373 published 27 September 2012, where instead of using a double-strand break inducing agent to introduce a gene of interest, a guide polynucleotide/Cas system as disclosed herein is used.
  • the transgenes can be bred as a single genetic locus (see, for example, US20130263324 published 03 October 2013 or WO2012129373 published 14 March 2013).
  • plants comprising (at least) one transgene can be crossed to form an F1 that comprises both transgenes.
  • in progeny from these F1 (F2 or BC1), 1/500 progeny would have the two different transgenes recombined onto the same chromosome.
  • the complex locus can then be bred as single genetic locus with both transgene traits. This process can be repeated to stack as many traits as desired.
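  • To give a feel for the 1/500 frequency mentioned above, the following sketch estimates how many progeny would need to be screened to recover at least one recombinant with a chosen probability. It assumes each progeny is an independent trial with the same recombination probability, which is a simplification; the function name is hypothetical.

```python
import math


def progeny_needed(recomb_freq: float, confidence: float) -> int:
    """Smallest n such that P(at least one recombinant among n progeny) >= confidence.

    Uses 1 - (1 - recomb_freq) ** n >= confidence, solved for n.
    """
    if not 0.0 < recomb_freq < 1.0:
        raise ValueError("recomb_freq must be between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - recomb_freq))


if __name__ == "__main__":
    # Frequency taken from the text; the 95% confidence level is an arbitrary choice.
    print(progeny_needed(1 / 500, 0.95))
```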
  • chromosomal intervals that correlate with a phenotype or trait of interest can be identified.
  • a variety of methods well known in the art are available for identifying chromosomal intervals.
  • the boundaries of such chromosomal intervals are drawn to encompass markers that will be linked to the gene controlling the trait of interest.
  • the chromosomal interval is drawn such that any marker that lies within that interval (including the terminal markers that define the boundaries of the interval) can be used as a marker for a particular trait.
  • the chromosomal interval comprises at least one QTL, and furthermore, may indeed comprise more than one QTL.
  • QTL quantitative trait locus
  • An “allele of a QTL” can comprise multiple genes or other genetic factors within a contiguous genomic region or linkage group, such as a haplotype.
  • An allele of a QTL can denote a haplotype within a specified window wherein said window is a contiguous genomic region that can be defined, and tracked, with a set of one or more polymorphic markers.
  • a haplotype can be defined by the unique fingerprint of alleles at each marker within the specified window.
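  • The idea of a haplotype as a fingerprint of marker alleles within a window can be sketched with a simple data structure. The marker names, line names, and allele calls below are invented for illustration only; the sketch assumes every line has a call at every marker in the window.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def haplotype(alleles: Dict[str, str], window: Sequence[str]) -> Tuple[str, ...]:
    """Return the ordered allele calls at each marker in the window."""
    return tuple(alleles[marker] for marker in window)


def group_by_haplotype(lines: Dict[str, Dict[str, str]],
                       window: Sequence[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Group lines that share the same fingerprint across the window."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for name, alleles in lines.items():
        groups[haplotype(alleles, window)].append(name)
    return dict(groups)


if __name__ == "__main__":
    window = ["M1", "M2", "M3"]  # hypothetical polymorphic markers
    lines = {
        "lineA": {"M1": "A", "M2": "G", "M3": "T"},
        "lineB": {"M1": "A", "M2": "G", "M3": "T"},
        "lineC": {"M1": "C", "M2": "G", "M3": "T"},
    }
    print(group_by_haplotype(lines, window))
```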
  • polynucleotide(s) of interest can be introduced into a cell.
  • Cells include, but are not limited to, prokaryotic, eukaryotic, human, non-human, animal, bacterial, fungal, insect, yeast, non-conventional yeast, and plant cells, as well as whole organisms and progeny produced by the methods described herein.
  • Vectors and constructs include circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, regulatory or analysis.
  • a recognition site and/or target site can be comprised within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.
  • the invention further provides expression constructs for expressing in a prokaryotic or eukaryotic cell/organism a guide RNA/Cas system that is capable of recognizing, binding to, and optionally nicking, unwinding, or cleaving all or part of a target sequence.
  • the expression constructs of the disclosure comprise a promoter operably linked to a nucleotide sequence encoding a Cas gene and a promoter operably linked to a guide RNA of the present disclosure.
  • the promoter is capable of driving expression of an operably linked nucleotide sequence in a prokaryotic or eukaryotic cell/organism.
  • CER domain can be selected from, but not limited to, the group consisting of a 5' cap, a 3' polyadenylated tail, a riboswitch sequence, a stability control sequence, a sequence that forms a dsRNA duplex, a modification or sequence that targets the guide polynucleotide to a subcellular location, a modification or sequence that provides for tracking, a modification or sequence that provides a binding site for proteins, a Locked Nucleic Acid (LNA), a 5-methyl dC nucleotide, a 2,6-Diaminopurine nucleotide, a 2’-Fluoro A nucleotide, a 2’-Fluoro U nucleotide; a 2'-O-Methyl RNA nucleotide, a phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 molecule, a
  • the additional beneficial feature is selected from the group of a modified or regulated stability, a subcellular targeting, tracking, a fluorescent label, a binding site for a protein or protein complex, modified binding affinity to complementary target sequence, modified resistance to cellular degradation, and increased cellular permeability.
  • RNA polymerase III promoters, which allow for transcription of RNA with precisely defined, unmodified, 5’- and 3’- ends
  • This strategy has been successfully applied in cells of several different species including maize and soybean (US20150082478 published 19 March 2015). Methods for expressing RNA components that do not have a 5’ cap have been described (WO2016/025131 published 18 February 2016).
  • compositions can be employed to obtain a cell or organism having a polynucleotide of interest inserted in a target site for a Cas endonuclease. Such methods can employ homologous recombination (HR) to provide integration of the polynucleotide of interest at the target site.
  • HR homologous recombination
  • a polynucleotide of interest is introduced into the organism cell via a donor DNA construct.
  • the donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest.
  • the first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome.
  • the donor DNA can be tethered to the guide polynucleotide. Tethered donor DNAs can allow for co-localizing target and donor DNA, useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells where function of endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10: 957-963).
  • the amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target site.
  • ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps.
  • the amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity at least of about 50%, 55%, 60%, 65%, 70%, 71%, 72%,
  • Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity, for example sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus. Sufficient homology can also be described by the predicted ability of two polynucleotides to specifically hybridize under high stringency conditions, see, for example, Sambrook et al. ,
  • the amount of homology or sequence identity shared by the “region of homology” of the donor DNA and the “genomic region” of the organism genome can be at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity, such that the sequences undergo homologous recombination.
  • the region of homology on the donor DNA can have homology to any sequence flanking the target site. While in some instances the regions of homology share significant sequence homology to the genomic sequence immediately flanking the target site, it is recognized that the regions of homology can be designed to have sufficient homology to regions that may be further 5' or 3' to the target site. The regions of homology can also have homology with a fragment of the target site along with downstream genomic regions
  • the first region of homology further comprises a first fragment of the target site and the second region of homology comprises a second fragment of the target site, wherein the first and second fragments are dissimilar.
  • polynucleotides of interest include, for example, genes of interest involved in information, such as zinc fingers, those involved in communication, such as kinases, and those involved in housekeeping, such as heat shock proteins. More specific polynucleotides of interest include, but are not limited to, genes involved in traits of agronomic interest such as but not limited to: crop yield, grain quality, crop nutrient content, starch and carbohydrate quality and quantity as well as those affecting kernel size, sucrose loading, protein quality and quantity, nitrogen fixation and/or utilization, fatty acid and oil composition, genes encoding proteins conferring resistance to abiotic stress (such as drought, nitrogen, temperature, salinity, toxic metals or trace elements, or those conferring resistance to toxins such as pesticides and herbicides), genes encoding proteins conferring resistance to biotic stress (such as attacks by fungi, viruses, bacteria, insects, and nematodes, and development of diseases associated with these organisms).
  • Agronomically important traits such as oil, starch, and protein content can be genetically altered in addition to using traditional breeding methods. Modifications include increasing content of oleic acid, saturated and unsaturated oils, increasing levels of lysine and sulfur, providing essential amino acids, and also modification of starch. Hordothionin protein modifications are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389.
  • Polynucleotide sequences of interest may encode proteins involved in providing disease or pest resistance.
  • By “disease resistance” or “pest resistance” is intended that the plants avoid the harmful symptoms that are the outcome of the plant-pathogen interactions.
  • Pest resistance genes may encode resistance to pests that have great yield drag such as rootworm, cutworm, European Corn Borer, and the like.
  • Disease resistance and insect resistance genes such as lysozymes or cecropins for antibacterial protection, or proteins such as defensins, glucanases or chitinases for antifungal protection, or Bacillus thuringiensis endotoxins, protease inhibitors, collagenases, lectins, or glycosidases for controlling nematodes or insects are all examples of useful gene products.
  • Genes encoding disease resistance traits include detoxification genes, such as against fumonisin (U.S. Patent No. 5,792,931); avirulence (avr) and disease resistance (R) genes (Jones et al. (1994) Science 266:789; Martin et al. (1993) Science 262: 1432; and
  • Insect resistance genes may encode resistance to pests that have great yield drag such as rootworm, cutworm, European Corn Borer, and the like.
  • Such genes include, for example, Bacillus thuringiensis toxic protein genes (U.S. Patent Nos. 5,366,892; 5,747,450; 5,736,514; 5,723,756; 5,593,881; and Geiser et al. (1986) Gene 48: 109); and the like.
  • a herbicide resistance-encoding nucleic acid molecule encodes a protein that confers upon a cell the ability to tolerate a higher concentration of an herbicide than cells that do not express the protein, or to tolerate a certain concentration of an herbicide for a longer period of time than cells that do not express the protein.
  • Herbicide resistance traits may be introduced into plants by genes coding for resistance to herbicides that act to inhibit the action of acetolactate synthase (ALS, also referred to as acetohydroxyacid synthase, AHAS), in particular the sulfonylurea (UK: sulphonylurea) type herbicides
  • genes coding for resistance to herbicides that act to inhibit the action of glutamine synthase such as phosphinothricin or basta (e.g., the bar gene), glyphosate (e.g., the EPSP synthase gene and the GAT gene), HPPD inhibitors (e.g., the HPPD gene) or other such genes known in the art. See, for example, US Patent Nos. 7,626,077, 5,310,667, 5,866,775, 6,225,114, 6,248,876, 7,169,970, 6,867,293, and 9,187,762.
  • the bar gene encodes resistance to the herbicide basta
  • the nptII gene encodes resistance to the antibiotics kanamycin and geneticin
  • the ALS-gene mutants encode resistance to the herbicide chlorsulfuron.
  • polynucleotide of interest may also comprise antisense sequences complementary to at least a portion of the messenger RNA (mRNA) for a targeted gene sequence of interest.
  • Antisense nucleotides are constructed to hybridize with the corresponding mRNA. Modifications of the antisense sequences may be made as long as the sequences hybridize to and interfere with expression of the corresponding mRNA. In this manner, antisense constructions having 70%, 80%, or 85% sequence identity to the corresponding antisense sequences may be used. Furthermore, portions of the antisense nucleotides may be used to disrupt the expression of the target gene. Generally, sequences of at least 50 nucleotides, 100 nucleotides, 200 nucleotides, or greater may be used.
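  • As a minimal sketch of what "complementary to at least a portion of the mRNA" means in sequence terms, the perfectly complementary antisense RNA for an mRNA segment is its reverse complement in the RNA alphabet. The function name is hypothetical, and the sketch ignores the partial-identity variants (70%, 80%, 85%) that the text also allows.

```python
# Hypothetical helper; accepts A/C/G/U and treats any T as U.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna_segment: str) -> str:
    """Return the fully complementary antisense RNA, written 5' to 3'."""
    seq = mrna_segment.upper().replace("T", "U")
    return seq.translate(_RNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    segment = "AUGGCC"
    print(antisense_rna(segment))
    # Applying the operation twice recovers the original segment.
    print(antisense_rna(antisense_rna(segment)) == segment)
```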
  • the polynucleotide of interest may also be used in the sense orientation to suppress the expression of endogenous genes in plants.
  • Methods for suppressing gene expression in plants using polynucleotides in the sense orientation are known in the art.
  • the methods generally involve transforming plants with a DNA construct comprising a promoter that drives expression in a plant operably linked to at least a portion of a nucleotide sequence that corresponds to the transcript of the endogenous gene.
  • a nucleotide sequence has substantial sequence identity to the sequence of the transcript of the endogenous gene, generally greater than about 65% sequence identity, about 85% sequence identity, or greater than about 95% sequence identity. See U.S. Patent Nos. 5,283,184 and 5,034,323.
  • the polynucleotide of interest can also be an expression regulatory element, such as but not limited to a promoter, enhancer, intron, terminator, or UTR (untranslated regulatory sequence).
  • a UTR may be present at either the 5’ end or the 3’ end of a coding or noncoding sequence.
  • Other examples of polynucleotides of interest include genes encoding ribonucleotide molecules, for example mRNA, siRNA, or other ribonucleotides.
  • the regulatory element or RNA molecule may be endogenous to the cell in which the genetic modification occurs, or it may be heterologous to the cell.
  • the polynucleotide of interest can also be a phenotypic marker.
  • a phenotypic marker is a screenable or a selectable marker that includes visual markers and selectable markers, whether positive or negative selectable markers. Any phenotypic marker can be used.
  • a selectable or screenable marker comprises a DNA segment that allows one to identify, or select for or against a molecule or a cell that comprises it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions and the like.
  • selectable markers include, but are not limited to, DNA segments that comprise restriction enzyme sites; DNA segments that encode products which provide resistance against otherwise toxic compounds (including antibiotics, such as spectinomycin, ampicillin, kanamycin, tetracycline, Basta, neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT)); DNA segments that encode products which are otherwise lacking in the recipient cell (e.g., tRNA genes, auxotrophic markers); DNA segments that encode products which can be readily identified (e.g., phenotypic markers such as β-galactosidase, GUS; fluorescent proteins such as green fluorescent protein (GFP), cyan (CFP), yellow (YFP), red (RFP), and cell surface proteins); the generation of new primer sites for PCR (e.g., the juxtaposition of two DNA sequences not previously juxtaposed), the inclusion of DNA sequences not acted upon or acted upon by a restriction endonuclea
  • Additional selectable markers include genes that confer resistance to herbicidal compounds, such as sulphonylureas, glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D). See for example, Acetolactate synthase (ALS) for resistance to sulfonylureas, imidazolinones, triazolopyrimidine sulfonamides,
  • ALS Acetolactate synthase
  • Polynucleotides of interest include genes that can be stacked or used in combination with other traits, such as but not limited to herbicide resistance or any other trait described herein. Polynucleotides of interest and/or traits can be stacked together in a complex trait locus as described in US20130263324 published 03 Oct 2013 and in WO/2013/112686, published 01 August 2013.
  • a polypeptide of interest includes any protein or polypeptide that is encoded by a polynucleotide of interest described herein.
  • identifying at least one plant cell comprising in its genome, a polynucleotide of interest integrated at the target site.
  • a variety of methods are available for identifying those plant cells with insertion into the genome at or near to the target site. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof. See, for example, US20090133152 published 21 May 2009.
  • the method also comprises recovering a plant from the plant cell comprising a polynucleotide of interest integrated into its genome.
  • the plant may be sterile or fertile. It is recognized that any polynucleotide of interest can be provided, integrated into the plant genome at the target site, and expressed in a plant.
  • Any polynucleotide encoding a Cas protein, other CRISPR system component, or other polynucleotide disclosed herein may be functionally linked to a heterologous expression element, to facilitate transcription or regulation in a host cell.
  • expression elements include but are not limited to: promoter, leader, intron, and terminator.
  • Expression elements may be “minimal” - meaning a shorter sequence derived from a native source, that still functions as an expression regulator or modifier.
  • an expression element may be “optimized” - meaning that its polynucleotide sequence has been altered from its native state in order to function with a more desirable characteristic in a particular host cell.
  • an expression element may be “synthetic” - meaning that it is designed in silico and synthesized for use in a host cell. Synthetic expression elements may be entirely synthetic, or partially synthetic.
  • some promoters are able to direct RNA synthesis at a higher rate than others. These are called “strong promoters”. Certain other promoters have been shown to direct RNA synthesis at higher levels only in particular types of cells or tissues and are often referred to as “tissue specific promoters”, or “tissue-preferred promoters” if the promoters direct RNA synthesis preferably in certain tissues but also in other tissues at reduced levels.
  • a plant promoter includes a promoter capable of initiating transcription in a plant cell; see Potenza et al., 2004, In vitro Cell Dev Biol 40: 1-22; Porto et al., 2014, Molecular Biotechnology (2014), 56(1), 38-49.
  • New promoters of various types useful in plant cells are constantly being discovered; numerous examples may be found in the compilation by Okamuro and Goldberg, (1989) In The Biochemistry of Plants, Vol. 115, Stumpf and Conn, eds (New York, NY: Academic Press), pp. 1-82.
  • Morphogenic factors are polynucleotides that act to enhance the rate, efficiency, and/or efficacy of targeted polynucleotide modification by a number of mechanisms, some of which are related to the capability of stimulating growth of a cell or tissue, including but not limited to promoting progression through the cell cycle, inhibiting cell death, such as apoptosis, stimulating cell division, and/or stimulating embryogenesis.
  • the polynucleotides can fall into several categories, including but not limited to, cell cycle stimulatory polynucleotides, developmental polynucleotides, anti-apoptosis polynucleotides, hormone polynucleotides, transcription factors, or silencing constructs targeted against cell cycle repressors or pro-apoptotic factors.
  • a morphogenic factor may be involved in plant metabolism, organ development, stem cell development, cell growth stimulation, organogenesis, somatic embryogenesis initiation, accelerated somatic embryo maturation, initiation and/or development of the apical meristem, initiation and/or development of shoot meristem, or a combination thereof.
  • the morphogenic factor is a molecule selected from one or more of the following categories: 1) cell cycle stimulatory polynucleotides including plant viral replicase genes such as RepA, cyclins, E2F, prolifera, cdc2 and cdc25; 2) developmental polynucleotides such as Lec1, Kn1 family, WUSCHEL, Zwille, BBM, Aintegumenta (ANT), FUS3, and members of the Knotted family, such as Kn1, STM, OSH1, and SbH1; 3) anti-apoptosis polynucleotides such as CED9, Bcl2, Bcl-X(L), Bcl-W, A1, Mcl-1, Mac1, Boo, and Bax-inhibitors; 4) hormone polynucleotides such as IPT, TZS, and CKI-1; and 5) silencing constructs targeted against cell cycle repressors, such as Rb,
  • ODP2 Ovule Development Protein 2
  • BBM Babyboom
  • polypeptides with eight other proteins having two AP2 domains.
  • the expression of the morphogenic factor is transient. In some aspects, the expression of the morphogenic factor is constitutive. In some aspects, the expression of the morphogenic factor is specific to a particular tissue or cell type. In some aspects, the expression of the morphogenic factor is temporally regulated. In some aspects, the expression of the morphogenic factor is regulated by an environmental condition, such as temperature, time of day, or other factor. In some aspects, the expression of the morphogenic factor is stable. In some aspects, expression of the morphogenic factor is controlled. The controlled expression may be a pulsed expression of the morphogenic factor for a particular period of time. Alternatively, the morphogenic factor may be expressed in only some transformed cells and not expressed in others. The control of expression of the morphogenic factor can be achieved by a variety of methods as disclosed herein.
  • Agrobacterium, a natural plant pathogen, has been widely used for the transformation of dicotyledonous plants and more recently for transformation of
  • Agrobacterium-mediated gene transfer system offers the potential to regenerate transgenic cells at relatively high frequencies without a significant reduction in plant regeneration rates.
  • process of DNA transfer to the plant genome is well characterized relative to other DNA delivery methods. DNA transferred via Agrobacterium is less likely to undergo any major rearrangements than is DNA transferred via direct delivery, and it integrates into the plant genome often in single or low copy numbers.
  • the most commonly used Agrobacterium-mediated gene transfer system is a binary transformation vector system where the Agrobacterium has been engineered to include a disarmed, or nononcogenic, Ti helper plasmid, which encodes the vir functions necessary for DNA transfer, and a much smaller separate plasmid called the binary vector plasmid, which carries the transferred DNA, or the T-DNA region.
  • the T-DNA is defined by sequences at each end, called T-DNA borders, which play an important role in the production of T-DNA and in the transfer process.
  • Binary vectors are vectors in which the virulence genes are placed on a different plasmid than the one carrying the T-DNA region (Bevan, 1984, Nucl. Acids. Res. 12: 8711-8721).
  • the development of T-DNA binary vectors has made the transformation of plant cells easier as they do not require recombination.
  • the finding that some of the virulence genes exhibited gene dosage effects led to the development of a superbinary vector, which carried additional virulence genes (Komari, T., et al., Plant Cell Rep. (1990), 9:303-306).
  • Agrobacteria with helper plasmids can significantly improve the transient protein expression, transient T-DNA delivery, somatic embryo phenotypes, transformation frequencies, recovery of quality events, and usable quality events in different plant lines (WO2017078836A1, published 11 May 2017).
  • VIR genes are also used for the improvement of transformation with Ochrobactrum, for example as disclosed in US20180216123, published 02 August 2018.
  • compositions described herein do not depend on a particular method for introducing a sequence into an organism or cell, only that the polynucleotide or polypeptide gains access to the interior of at least one cell of the organism.
  • Introducing includes reference to the incorporation of a nucleic acid into a eukaryotic or prokaryotic cell where the nucleic acid may be incorporated into the genome of the cell, and includes reference to the transient (direct) provision of a nucleic acid, protein or ribonucleoprotein complex to the cell.
  • Methods for introducing polynucleotides or polypeptides or a polynucleotide-protein complex into cells or organisms are known in the art including, but not limited to, microinjection, electroporation, stable transformation methods, transient transformation methods, ballistic particle acceleration (particle bombardment), whiskers-mediated transformation, Agrobacterium-mediated transformation, direct gene transfer, viral-mediated introduction, transfection, transduction, cell-penetrating peptides, mesoporous silica nanoparticle (MSN)-mediated direct protein delivery, topical applications, sexual crossing, sexual breeding, and any combination thereof.
  • General methods for the introduction of polynucleotides into a cell for transformation, for example Agrobacterium-mediated transformation, Ochrobactrum-mediated transformation, and particle bombardment-mediated transformation of cells, are known in the art.
  • the guide polynucleotide (guide RNA, crNucleotide + tracrNucleotide, guide DNA and/or guide RNA-DNA molecule)
  • the guide RNA can also be introduced into a cell indirectly by introducing a recombinant DNA molecule comprising a heterologous nucleic acid fragment encoding the guide RNA (or crRNA + tracrRNA), operably linked to a specific promoter that is capable of transcribing the guide RNA (crRNA+tracrRNA molecules) in said cell.
  • the specific promoter can be, but is not limited to, an RNA polymerase III promoter, which allows for transcription of RNA with precisely defined, unmodified, 5’- and 3’-ends (Ma et al., 2014, Mol. Ther. Nucleic Acids 3:e161; DiCarlo et al., 2013, Nucleic Acids Res. 41: 4336-4343; WO2015026887, published 26 February 2015).
  • Any promoter capable of transcribing the guide RNA in a cell can be used and includes a heat shock/heat inducible promoter operably linked to a nucleotide sequence encoding the guide RNA.
  • Protocols for introducing polynucleotides, polypeptides or polynucleotide-protein complexes into eukaryotic cells, such as plants or plant cells are known and include
  • polynucleotides may be introduced into cells by contacting cells or organisms with a virus or viral nucleic acids. Generally, such methods involve incorporating a polynucleotide within a viral DNA or RNA molecule. In some examples a polypeptide of interest may be initially synthesized as part of a viral polyprotein, which is later processed by proteolysis in vivo or in vitro to produce the desired recombinant protein.
  • Bacterial strains useful in the methods of the disclosure include, but are not limited to, a disarmed Agrobacteria, an Ochrobactrum bacteria, or a Rhizobiaceae bacteria. Standard protocols for particle bombardment (Finer and McMullen, 1991, In Vitro Cell Dev. Biol. - Plant 27: 175-182), Agrobacterium-mediated transformation (Jia et al., 2015, Int J. Mol. Sci. 16: 18552-18543; US2017/0121722 incorporated herein by reference in its entirety), or Ochrobactrum-mediated transformation (US2018/0216123 incorporated herein by reference in its entirety) can be used with the methods and compositions of the disclosure.
  • the polynucleotide or recombinant DNA construct can be provided to or introduced into a prokaryotic and eukaryotic cell or organism using a variety of transient transformation methods.
  • transient transformation methods include, but are not limited to, the introduction of the polynucleotide construct directly into the plant.
  • Nucleic acids and proteins can be provided to a cell by any method including methods using molecules to facilitate the uptake of any one or all components of a guided Cas system (protein and/or nucleic acids), such as cell-penetrating peptides and nanocarriers. See also US20110035836 published 10 February 2011, and EP2821486A1 published 07 January 2015.
  • Stable transformation is intended to mean that the nucleotide construct introduced into an organism integrates into a genome of the organism and is capable of being inherited by the progeny thereof.
  • Transient transformation is intended to mean that a polynucleotide is introduced into the organism and does not integrate into a genome of the organism or a polypeptide is introduced into an organism. Transient transformation indicates that the introduced composition is only temporarily expressed or present in the organism.
  • a variety of methods are available to identify those cells having an altered genome at or near a target site without using a screenable marker phenotype. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof.
  • the presently disclosed polynucleotides and polypeptides can be introduced into a cell.
  • Cells include, but are not limited to, human, non-human, animal, mammalian, bacterial, protist, fungal, insect, yeast, non-conventional yeast, and plant cells, as well as plants and seeds produced by the methods described herein.
  • the cell of the organism is a reproductive cell, a somatic cell, a meiotic cell, a mitotic cell, a stem cell, or a pluripotent stem cell. Any cell from any organism may be used with the compositions and methods described herein, including monocot and dicot plants, and plant elements.
  • Animal cells can include, but are not limited to: an organism of a phylum including chordates, arthropods, mollusks, annelids, cnidarians, or echinoderms; or an organism of a class including mammals, insects, birds, amphibians, reptiles, or fishes.
  • the animal is human, mouse, C. elegans, rat, fruit fly (Drosophila spp.), zebrafish, chicken, dog, cat, guinea pig, hamster, Japanese ricefish, sea lamprey, pufferfish, tree frog (e.g., Xenopus spp.), monkey, or chimpanzee.
  • Particular cell types that are contemplated include haploid cells, diploid cells, reproductive cells, neurons, muscle cells, endocrine or exocrine cells, epithelial cells, tumor cells, embryonic cells, hematopoietic cells, bone cells, germ cells, somatic cells, stem cells, pluripotent stem cells, induced pluripotent stem cells, progenitor cells, meiotic cells, and mitotic cells.
  • a plurality of cells from an organism may be used.
  • compositions and methods described herein may be used to edit the genome of an animal cell in various ways. In one aspect, it may be desirable to delete one or more nucleotides. In another aspect, it may be desirable to insert one or more nucleotides. In one aspect, it may be desirable to replace one or more nucleotides. In another aspect, it may be desirable to modify one or more nucleotides via a covalent or non-covalent interaction with another atom or molecule.
  • Genome modification may be used to effect a genotypic and/or phenotypic change on the target organism.
  • a change is preferably related to an improved phenotype of interest or a physiologically-important characteristic, the correction of an endogenous defect, or the expression of some type of expression marker.
  • the phenotype of interest or physiologically-important characteristic is related to the overall health, fitness, or fertility of the animal, the ecological fitness of the organism, or the relationship or interaction of the organism with other organisms in its environment.
  • Cells that have been genetically modified using the compositions or methods described herein may be transplanted to a subject for purposes such as gene therapy, e.g. to treat a disease, or as an antiviral, antipathogenic, or anticancer therapeutic, for the production of genetically modified organisms in agriculture, or for biological research.
  • Examples of monocot plants that can be used include, but are not limited to, corn (Zea mays), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), wheat (Triticum species, for example Triticum aestivum, Triticum monococcum), sugarcane (Saccharum spp.), oats (Avena), barley (Hordeum), switchgrass (Panicum virgatum), pineapple (Ananas comosus), banana (Musa spp.), palm, ornamentals, turfgrasses, and other grasses.
  • dicot plants that can be used include, but are not limited to, soybean
  • Brassica species for example but not limited to: oilseed rape or Canola
  • Additional plants that can be used include safflower (Carthamus tinctorius), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew
  • Vegetables that can be used include tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo).
  • Ornamentals include azalea (Rhododendron spp.), hydrangea (Macrophylla hydrangea), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.
  • a fertile plant is a plant that produces viable male and female gametes and is self-fertile. Such a self-fertile plant can produce a progeny plant without the contribution from any other plant of a gamete and the genetic material comprised therein.
  • Other embodiments of the disclosure can involve the use of a plant that is not self-fertile because the plant does not produce male gametes, or female gametes, or both, that are viable or otherwise capable of fertilization.
  • the present disclosure finds use in the breeding of plants comprising one or more introduced traits, or edited genomes.
  • a non-limiting example of how two traits can be stacked into the genome at a genetic distance of, for example, 5 cM from each other is described as follows: A first plant comprising a first transgenic target site integrated into a first DSB target site within the genomic window and not having the first genomic locus of interest is crossed to a second transgenic plant, comprising a genomic locus of interest at a different genomic insertion site within the genomic window and the second plant does not comprise the first transgenic target site. About 5% of the plant progeny from this cross will have both the first transgenic target site integrated into a first DSB target site and the first genomic locus of interest integrated at different genomic insertion sites within the genomic window.
  • Progeny plants having both sites in the defined genomic window can be further crossed with a third transgenic plant comprising a second transgenic target site integrated into a second DSB target site and/or a second genomic locus of interest within the defined genomic window and lacking the first transgenic target site and the first genomic locus of interest. Progeny are then selected having the first transgenic target site, the first genomic locus of interest and the second genomic locus of interest integrated at different genomic insertion sites within the genomic window.
  • Such methods can be used to produce a transgenic plant comprising a complex trait locus having at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 or more transgenic target sites integrated into DSB target sites and/or genomic loci of interest integrated at different sites within the genomic window.
  • various complex trait loci can be generated.
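  • The "about 5%" figure in the stacking example above follows from the rule of thumb that 1 cM corresponds to roughly 1% recombination over short distances. The sketch below is illustrative only: it uses Haldane's mapping function, which assumes no crossover interference (an assumption not made in the text), and the function names are hypothetical.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Expected recombination fraction for a map distance given in centimorgans.

    Haldane's mapping function: r = 0.5 * (1 - exp(-2d)), with d in Morgans.
    ASSUMPTION: no crossover interference; the source text only says "about 5%".
    """
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def expected_recombinants(distance_cm: float, n_progeny: int) -> float:
    """Expected number of recombinant progeny among n_progeny screened."""
    return haldane_recombination_fraction(distance_cm) * n_progeny


if __name__ == "__main__":
    # Two sites 5 cM apart, as in the stacking example.
    print(f"{haldane_recombination_fraction(5.0):.4f}")
```

  • For 5 cM the function returns a value just under 0.05, so on the order of 50 recombinants would be expected per 1,000 progeny screened under these assumptions.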
  • Aspect 1 A synthetic composition comprising a Cas endoribonuclease molecule and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease molecule.
  • Aspect 2 A synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with a Cas endoribonuclease molecule, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the Cas endoribonuclease.
  • Aspect 3 The synthetic composition of Aspect 1 or Aspect 2, wherein the Cas endoribonuclease is isolated or derived from Streptococcus thermophilus.
  • Aspect 4 The synthetic composition of Aspect 1 or Aspect 2, further comprising a Cas endonuclease.
  • Aspect 5 The synthetic composition of Aspect 1 or Aspect 2, wherein the recognition sequence comprises the nucleotides CCCGCNNNNGCGGG.
  • Aspect 6 The synthetic composition of Aspect 1 or Aspect 2, wherein at least one component is a DNA molecule encoding the component.
  • Aspect 7 The synthetic composition of Aspect 1 or Aspect 2, wherein the poly-guide RNA molecule comprises RNA.
  • Aspect 8 The synthetic composition of Aspect 1 or Aspect 2, wherein the
  • Aspect 9 The synthetic composition of Aspect 1 or Aspect 2, wherein the plurality of discrete RNA molecules comprise at least two non-identical guide RNA molecules that are each capable of forming a complex with a Cas endonuclease.
  • Aspect 10 The synthetic composition of Aspect 1 or Aspect 2, wherein the endoribonuclease molecule shares at least 85% sequence identity with SEQ ID NO:48.
  • Aspect 11 The synthetic composition of Aspect 1 or Aspect 2, wherein the poly-guide RNA molecule is operably linked to a promoter.
  • Aspect 12 The synthetic composition of Aspect 10, wherein the promoter is selected from the group consisting of: U6, Ubiquitin, and a bidirectional promoter.
  • Aspect 13 A synthetic composition of editing a plurality of target polynucleotides with any of the synthetic compositions of Aspects 1-11.
  • Aspect 14 The synthetic composition of Aspect 12, wherein the plurality of target polynucleotides are non-identical to each other.
  • Aspect 15 A cell comprising any of the synthetic compositions of Aspects 1-
  • Aspect 16 The cell of Aspect 14, wherein the cell comprises a polynucleotide sequence in its genome that is capable of selective hybridization with at least one of the discrete guide RNA molecules of the poly-guide RNA molecule.
  • Aspect 17 A method of providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule.
  • Aspect 18 A method of generating a plurality of guide RNA molecules in a cell comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell.
  • Aspect 19 The method of Aspect 16 or Aspect 17, further comprising providing to the cell a Cas endonuclease.
  • Aspect 20 A method of editing a target polynucleotide in a cell, comprising providing to the cell a Cas endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNAs.
  • Aspect 21 The method of any of Aspects 16-19, wherein the cell is a bacterium, plant cell, or animal cell.
  • Aspect 22 The method of any of Aspects 16-19, wherein the Cas endonuclease, the Cas endoribonuclease, and the poly-guide RNA molecule are provided on one vector to a target cell.
  • Aspect 23 The method of any of Aspects 16-19, wherein the Cas endonuclease is provided to a target cell on a different vector than that comprising the poly-guide RNA or the Cas endoribonuclease.
  • Aspect 24 The method of any of Aspects 16-19, wherein the Cas endonuclease and/or the Cas endoribonuclease is/are delivered to the cell as proteins, and the poly-guide RNA molecule is provided to the cell as RNA.
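  • As an illustration of the recognition sequence recited in Aspect 5 (CCCGCNNNNGCGGG), the following sketch locates every occurrence of the motif in a nucleotide string. It is a hypothetical helper, not part of the disclosure; the function names are assumptions, and N is treated as any of A, C, G, T or U. The example sequence is the 28 bp CRISPR repeat reported in Example 1.

```python
import re
from typing import List

# Recognition sequence from Aspect 5; N stands for any nucleotide.
RECOGNITION_MOTIF = "CCCGCNNNNGCGGG"


def motif_to_regex(motif: str):
    """Translate a motif with N wildcards into a compiled regular expression."""
    parts = ["[ACGTU]" if base == "N" else re.escape(base) for base in motif.upper()]
    # The lookahead makes matches zero-width, so overlapping sites are also found.
    return re.compile("(?=(" + "".join(parts) + "))")


def find_recognition_sites(sequence: str, motif: str = RECOGNITION_MOTIF) -> List[int]:
    """Return the 0-based start position of every motif occurrence."""
    pattern = motif_to_regex(motif)
    return [match.start() for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    crispr_repeat = "GTTTTTCCCGCACACGCGGGGGTGATCC"  # repeat from Example 1
    print(find_recognition_sites(crispr_repeat))
```

  • The two fixed arms of the motif (CCCGC and GCGGG) are reverse complements of each other, which is consistent with a hairpin-forming recognition element.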
  • Example 1 Identification of a Type I-E Cas endoribonuclease protein for poly-guide RNA expression
  • mgRNA Cas9 multi-guide RNA
  • CRISPR associated Cas9 multi-guide RNA
  • Sth7710 NCBI accession number: NZ_AWVZ00000000.1
  • minCED Bland, C. et al. (2007) BMC Bioinformatics, 8:209
  • genomic sequence regions adjacent to the identified CRISPR arrays were examined for the presence of open-reading frames (ORFs) encoding proteins with homology to Cas3, the signature protein of Type I CRISPR-Cas systems, by comparisons with NCBI protein databases using the PSI-BLAST program (Altschul, S. F. et al., 1997, Nucleic Acids Res. 25:3389-3402).
  • One of the CRISPR arrays comprising a 28 bp repeat with the sequence consensus of 5’-GTTTTTCCCGCACACGCGGGGGTGATCC-3’ was demonstrated to be adjacent to an ORF encoding a Cas3 protein.
  • genes encoding the other proteins typical of a Type I-E CRISPR-Cas system were also identified within the locus (SEQ ID NOs: 2-9). This was accomplished by first translating the ORFs between the cas3 gene and CRISPR array into proteins using ORF Finder (Stothard, P. (2000) Biotechniques 28: 1102-1104) followed by comparisons with NCBI protein databases using the PSI-BLAST program (Altschul, S. F. et al., 1997). As shown in FIG.
  • the multi-protein complex of the Type I-E system, termed Cascade, is directed by small CRISPR RNAs (referred to herein as guide RNAs) to bind DNA target sites (Jore, M. M. et al. (2011) Nat. Struct. Mol. Biol. 18:529-536 and Sinkunas, T. et al. (2013) EMBO J. 32:385-394).
  • guide RNAs are comprised of a ~33 nt variable targeting sequence that is flanked by fixed sequences comprised of a ~7 nt 5 prime sequence and a ~21 nt 3 prime hairpin-comprising sequence (FIG. 2A).
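  • The guide RNA layout described above can be sketched in code. This is a minimal model, under stated assumptions, of how a repeat-spacer array would be cut into guides: the 28 nt repeat is taken from this example, the cut position (21 nt into each repeat) is inferred from the ~7 nt and ~21 nt flank lengths rather than stated in the text, and the spacer sequences and function names are hypothetical.

```python
from typing import List

# 28-nt repeat reported in this example (written as DNA).
REPEAT = "GTTTTTCCCGCACACGCGGGGGTGATCC"
# ASSUMPTION: each repeat is cut so that its first 21 nt stay on the 3' end of
# one guide and its last 7 nt become the 5' end of the next guide.
CUT_OFFSET = 21


def build_array(spacers: List[str], repeat: str = REPEAT) -> str:
    """Concatenate repeat-spacer units and close the array with a final repeat."""
    return "".join(repeat + spacer for spacer in spacers) + repeat


def process_array(array: str, repeat: str = REPEAT, cut_offset: int = CUT_OFFSET) -> List[str]:
    """Cut inside every repeat and return the fragments that carry a full spacer."""
    fragments = []
    start = 0
    pos = array.find(repeat)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(array[start:cut])
        start = cut
        pos = array.find(repeat, pos + len(repeat))
    fragments.append(array[start:])
    # The first and last fragments are partial repeats without a spacer.
    return fragments[1:-1]


if __name__ == "__main__":
    # Hypothetical 33-nt spacers, for illustration only.
    spacers = ["ACGT" * 8 + "A", "TTGCA" * 6 + "GCA"]
    for guide in process_array(build_array(spacers)):
        print(len(guide), guide)
```

  • Under these assumptions each processed guide is 61 nt long: 7 nt of repeat, the 33 nt spacer, and 21 nt of repeat containing the hairpin.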
  • compositions disclosed herein may be utilized to modify the transcriptional status of a gene or a target polynucleotide in the genome of a cell.
  • said cell is a eukaryotic cell.
  • a plant cell is used. Transformation of a eukaryotic cell with a Type I-E Cas endoribonuclease and Cas9 and associated guide polynucleotide can be accomplished by various methods known to be effective in plants, including particle-mediated delivery, Agrobacterium-mediated transformation, PEG-mediated delivery, and electroporation. It is appreciated that any method known in the art may be utilized. Example methods are described below.
  • the immature embryos were isolated and placed embryo axis side down (scutellum side up), 25 embryos per plate, on bombardment medium for 4 hours and then aligned within the 2.5-cm target zone in preparation for bombardment.
  • isolated embryos were placed on initiation medium and placed in the dark at temperatures ranging from 26 degrees Celsius to 37 degrees Celsius for 8 to 24 hours prior to placing on bombardment medium for 4 hours at 26 degrees Celsius prior to bombardment as described above.
  • Plasmids comprising genes encoding the Cas endoribonuclease, Cas9 and associated guide polynucleotide constructs were constructed using standard molecular biology techniques and co-bombarded with plasmids comprising the developmental genes ODP2 (AP2 domain transcription factor ODP2 (Ovule development protein 2); US20090328252 A1) and Wuschel (US2011/0167516).
  • the plasmids were precipitated onto 0.6 micrometer (average diameter) gold pellets using a water-soluble cationic lipid transfection reagent as follows.
  • DNA solution was prepared on ice using 1 microgram of plasmid DNA and optionally other constructs for co-bombardment such as 50 ng (0.5 microliters) of each plasmid comprising the developmental genes ODP2 (AP2 domain transcription factor ODP2 (Ovule development protein 2); US20090328252 A1) and Wuschel.
  • gold particles 15 mg/ml
  • 5 microliters of a water-soluble cationic lipid transfection reagent was added in water and mixed carefully. Gold particles were pelleted in a microfuge at 10,000 rpm for 1 min and the supernatant removed.
  • the resulting pellet was carefully rinsed with 100 microliters of 100% EtOH without resuspending the pellet and the EtOH rinse was carefully removed. 105 microliters of 100% EtOH was added and the particles resuspended by brief sonication. Then, 10 microliters was spotted onto the center of each macrocarrier and allowed to dry about 2 minutes before bombardment.
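  • The quantities in the gold-particle preparation above imply a simple loading calculation, sketched here for illustration. It assumes, hypothetically, that all input DNA binds the particles and is evenly distributed on resuspension; the function names are not from the source.

```python
def aliquots_per_tube(resuspension_ul: float, aliquot_ul: float) -> int:
    """Number of full aliquots that one resuspended tube can supply."""
    return int(resuspension_ul // aliquot_ul)


def dna_per_aliquot_ng(total_dna_ng: float, resuspension_ul: float, aliquot_ul: float) -> float:
    """DNA carried by one aliquot, ASSUMING complete and uniform precipitation."""
    return total_dna_ng * aliquot_ul / resuspension_ul


if __name__ == "__main__":
    # 1 microgram of plasmid DNA plus 50 ng each of the ODP2 and Wuschel plasmids.
    total_ng = 1000 + 2 * 50
    print(aliquots_per_tube(105, 10))
    print(round(dna_per_aliquot_ng(total_ng, 105, 10), 1))
```

  • With 105 microliters of suspension and 10 microliters per macrocarrier, one tube supplies ten full aliquots, each carrying roughly 105 ng of total DNA under the stated assumption.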
  • the plasmids and DNA of interest were precipitated onto 1.1 micron (average diameter) tungsten pellets using a calcium chloride (CaCl2) precipitation procedure by mixing 100 microliters prepared tungsten particles in water, 10 microliters (1 microgram) DNA in Tris EDTA buffer (1 microgram total DNA), 100 microliters 2.5 M CaCl2, and 10 microliters 0.1 M spermidine. Each reagent was added sequentially to the tungsten particle suspension, with mixing. The final mixture was sonicated briefly and allowed to incubate under constant vortexing for 10 minutes.
  • the tubes were centrifuged briefly, liquid was removed, and the particles were washed with 500 microliters of 100% ethanol, followed by a 30 second centrifugation. Again, the liquid was removed, and 105 microliters of 100% ethanol was added to the final tungsten particle pellet.
  • the tungsten/DNA particles were briefly sonicated. 10 microliters of the tungsten/DNA particles was spotted onto the center of each macrocarrier, after which the spotted particles were allowed to dry about 2 minutes before bombardment.
  • the sample plates were bombarded at level #4 with a Biorad Helium Gun. All samples received a single shot at 450 PSI, with a total of ten aliquots taken from each tube of prepared particles/DNA. Following bombardment, the embryos were incubated on maintenance medium for 12 to 48 hours at temperatures ranging from 26C to 37C, and then placed at 26C.
  • the following alternative protocol is for stable, rapid assay done by 2-4 days after bombardment. After 5 to 7 days the embryos are transferred to selection medium containing 3 mg/liter Bialaphos, and subcultured every 2 weeks at 26C. After approximately 10 weeks of selection, selection-resistant callus clones are transferred to medium to initiate plant
  • somatic embryo maturation 2-4 weeks
  • somatic embryos are transferred to medium for germination and transferred to a lighted culture room.
  • developing plantlets are transferred to hormone-free medium in tubes for 7-10 days until plantlets are well established.
  • Plants are then transferred to inserts in flats (equivalent to a 2.5" pot) containing potting soil and grown for 1 week in a growth chamber, subsequently grown an additional 1-2 weeks in the greenhouse, then transferred to Classic 600 pots (1.6 gallon) and grown to maturity. Plants are monitored and scored for transformation efficiency, and/or modification of regenerative capabilities.
  • Initiation medium comprised 4.0 g/l N6 basal salts (SIGMA C-1416), 1.0 ml/l Eriksson's Vitamin Mix (1000X SIGMA-1511), 0.5 mg/l thiamine HCl, 20.0 g/l sucrose, 1.0 mg/l 2,4-D, and 2.88 g/l L-proline (brought to volume with DI-H2O following adjustment to pH 5.8 with KOH); 2.0 g/l Gelrite (added after bringing to volume with DI-H2O); and 8.5 mg/l silver nitrate (added after sterilizing the medium and cooling to room temperature).
  • Maintenance medium comprised 4.0 g/l N6 basal salts (SIGMA C-1416), 1.0 ml/l Eriksson's Vitamin Mix (1000X SIGMA-1511), 0.5 mg/l thiamine HCl, 30.0 g/l sucrose, 2.0 mg/l 2,4-D, and 0.69 g/l L-proline (brought to volume with DI-H2O following adjustment to pH 5.8 with KOH); 3.0 g/l Gelrite (added after bringing to volume with DI-H2O); and 0.85 mg/l silver nitrate (added after sterilizing the medium and cooling to room temperature).
  • Bombardment medium comprised 4.0 g/l N6 basal salts (SIGMA C-1416), 1.0 ml/l Eriksson's Vitamin Mix (1000X SIGMA-1511), 0.5 mg/l thiamine HCl, 120.0 g/l sucrose, 1.0 mg/l 2,4-D, and 2.88 g/l L-proline (brought to volume with DI-H2O following adjustment to pH 5.8 with KOH); 2.0 g/l Gelrite (added after bringing to volume with DI-H2O); and 8.5 mg/l silver nitrate (added after sterilizing the medium and cooling to room temperature).
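  • Because each medium is specified per liter, preparing a different batch volume is a matter of linear scaling. The sketch below encodes the bombardment medium as listed above and scales it to an arbitrary volume; the data structure and function name are illustrative assumptions, and the preparation order (pH adjustment, post-sterilization additions) is not modelled.

```python
from typing import Dict, Tuple

# Bombardment medium, amounts per liter as listed in the text: (amount, unit).
BOMBARDMENT_MEDIUM: Dict[str, Tuple[float, str]] = {
    "N6 basal salts": (4.0, "g"),
    "Eriksson's Vitamin Mix (1000X)": (1.0, "ml"),
    "thiamine HCl": (0.5, "mg"),
    "sucrose": (120.0, "g"),
    "2,4-D": (1.0, "mg"),
    "L-proline": (2.88, "g"),
    "Gelrite": (2.0, "g"),
    "silver nitrate": (8.5, "mg"),
}


def scale_recipe(recipe: Dict[str, Tuple[float, str]], volume_l: float) -> Dict[str, Tuple[float, str]]:
    """Scale per-liter amounts to the requested batch volume in liters."""
    if volume_l <= 0:
        raise ValueError("batch volume must be positive")
    return {
        name: (round(amount * volume_l, 4), unit)
        for name, (amount, unit) in recipe.items()
    }


if __name__ == "__main__":
    # Amounts needed for a 0.5 liter batch.
    for name, (amount, unit) in scale_recipe(BOMBARDMENT_MEDIUM, 0.5).items():
        print(f"{name}: {amount} {unit}")
```

  • The same function applies to the other media in this example once their per-liter amounts are entered in the same form.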
  • Selection medium comprised 4.0 g/l N6 basal salts (SIGMA C-1416), 1.0 ml/l Eriksson's Vitamin Mix (1000X SIGMA-1511), 0.5 mg/l thiamine HCl, 30.0 g/l sucrose, and 2.0 mg/l 2,4-D (brought to volume with DI-H2O following adjustment to pH 5.8 with KOH); 3.0 g/l Gelrite (added after bringing to volume with DI-H2O); and 0.85 mg/l silver nitrate and 3.0 mg/l bialaphos (both added after sterilizing the medium and cooling to room temperature).
  • Plant regeneration medium comprised 4.3 g/l MS salts (GIBCO 11117-074), 5.0 ml/l MS vitamins stock solution (0.100 g/l nicotinic acid, 0.02 g/l thiamine HCl, 0.10 g/l pyridoxine HCl, and 0.40 g/l glycine brought to volume with polished DI-H2O) (Murashige and Skoog (1962) Physiol. Plant.
  • Hormone-free medium comprised 4.3 g/l MS salts (GIBCO 11117-074), 5.0 ml/l MS vitamins stock solution (0.100 g/l nicotinic acid, 0.02 g/l thiamine HCl, 0.10 g/l pyridoxine HCl, and 0.40 g/l glycine brought to volume with polished DI-H2O), 0.1 g/l myo-inositol, and 40.0 g/l sucrose (brought to volume with polished DI-H2O after adjusting pH to 5.6); and 6 g/l bacto-agar (added after bringing to volume with polished DI-H2O), sterilized and cooled to 60°C.
  • RN-Cas9 (Cas9 in control treatments), guide RNA plasmid, helper plasmids of
  • Agrobacterium-mediated transformation was performed essentially as described in Djukanovic et al. (2006) Plant Biotech J 4:345-57. Briefly, 10-12 day old immature embryos (0.8-2.5 mm in size) were dissected from sterilized kernels and placed into liquid medium (4.0 g/L N6 Basal Salts (Sigma C-1416), 1.0 ml/L Eriksson's Vitamin Mix (Sigma E-1511), 1.0 mg/L thiamine HCl, 1.5 mg/L 2,4-D, 0.690 g/L L-proline, 68.5 g/L sucrose, 36.0 g/L glucose, pH 5.2).
  • the embryos are then transferred onto new media plates containing 4.0 g/L N6 Basal Salts (Sigma C-1416), 1.0 ml/L Eriksson's Vitamin Mix (Sigma E-1511), 1.0 mg/L thiamine HCl, 1.5 mg/L 2,4-D, 0.69 g/L L-proline, 30.0 g/L sucrose, 0.5 g/L MES buffer, 0.85 mg/L silver nitrate, 3.0 mg/L Bialaphos, 100 mg/L carbenicillin, and 6.0 g/L agar, pH 5.8. Embryos are subcultured every three weeks until transgenic events are identified.
  • Somatic embryogenesis is induced by transferring a small amount of tissue onto regeneration medium (4.3 g/L MS salts (Gibco 11117), 5.0 ml/L MS Vitamins Stock Solution, 100 mg/L myo-inositol, 0.1 µM ABA, 1 mg/L IAA, 0.5 mg/L zeatin, 60.0 g/L sucrose, 1.5 mg/L Bialaphos, 100 mg/L carbenicillin, 3.0 g/L Gelrite, pH 5.6) and incubation in the dark for two weeks at 28°C.
  • Agrobacterium was also used to deliver all components in a single T-DNA.
  • Agrobacterium-mediated transformation was performed for canola.
  • Surface sterilized canola seeds were germinated on medium (2.2 g/L MS Basal Salt mixture, 20 g/L sucrose, 8 g/L Sigma Agar) at 26°C with 16 hours of light at an intensity of 30-60 µmol/m²/sec for 7 days in a phytatray.
  • Hypocotyls or stems of sterilized canola seedlings were cut into ⁇ 2 mm segments, and 40-50 segments were placed in a petri dish containing 10 ml liquid medium (4.3 g/L Basal salt mixture, 0.1 g/L myo-inositol, 5 ml/L 36J-MS vitamin stock, 0.5 g/L MES buffer, 0.1 ml/L 12N-α-NAA (1 mg/ml), 1 ml/L 13108-BAP (1 mg/ml), 0.1 ml/L 17G-Gibberellic acid (0.1 mg/ml), 1 ml/L 14020-Thymidin (50 mg/mL)) with 10 µl acetosyringone (100 mM) and 200 µl Agrobacterium at a concentration of 0.8-1.0 OD600. Plates were incubated at 21°C in dim light (16 hours of light at an intensity of 5 µmol/m²/sec) for 3 days.
  • Agrobacterium-mediated transformation was performed essentially as described in Che et al. (2016) Plant Biotechnology Journal 16, 1388-1395. The transformation process bypassed the callus stage because a quick process was applied as described in Lowe et al. (2016) In Vitro Cell Dev Biol Plant 54, 240-252. Leaves of T0 plants were sampled, DNA was extracted as described above, and target site mutations were analyzed by NGS. Mutations were detected in 1, 2, 3, and all 4 target sites.
  • Samples of a transformed plant were obtained and sequenced, using any method known in the art, and compared to the genomic sequences of an isoline plant not transformed with the Cas endoribonuclease, Cas9, and associated guide polynucleotide constructs.
  • the presence of non-homologous end-joining (NHEJ) insertion and/or deletion (indel) mutations resulting from DNA repair can also be used as a signature to detect cleavage activity.
  • Example 4 The Type I-E Cas endoribonuclease cleaved a poly-guide RNA and enabled multiplex site targeting and double-strand-break creation with a Cas9 endonuclease
  • DNA expression cassettes were constructed according to FIG. 3.
  • the Cas9 gene from Streptococcus pyogenes M1 GAS (SF370) and the potato ST-LS1 intron was introduced in order to eliminate its expression in E. coli and Agrobacterium.
  • the Simian virus 40 (SV40; SEQID NO: 14) monopartite amino-terminal nuclear localization signal and the bipartite nuclear localization signal from the Agrobacterium tumefaciens VirD2 endonuclease (SEQID NO: 15) were incorporated at the 5′ and 3′ ends of the Cas9 open reading frame.
  • the Cas9 gene was operably linked to a maize Ubiquitin promoter using standard molecular biological techniques.
  • SEQID NO: 16 represents the Cas9 cassette.
  • Optimized RN from Streptococcus thermophilus DGCC7710 Type I-E was linked to Cas9 with a glycine-serine-glycine spacer (GSG) and a porcine teschovirus-1 2A self-cleaving peptide (P2A) (SEQ ID:7).
  • SV40 NLS was also incorporated at the amino terminus (5′) of the RN (some without).
  • the RN-Cas9 fusion was also linked to a maize Ubiquitin promoter and pinII (potato intron 2 terminator) using standard molecular biological techniques.
  • SEQID NO: 18 represents the RN-Cas9 cassette.
  • WO2015026883, and WO2015026886 were used to direct initiation and termination of gRNA expression.
  • the guide RNA coding sequence was 77bp long (SEQID NO:21) and comprised a 12-30 bp variable targeting domain chosen from a maize genomic target site.
  • Guide RNA variable targeting domains used in the multiplex development were from the maize Y1 gene, identified as Y1-CR2 and Y1-CR3, which correspond to the genomic target sites Y1-TS2 and Y1-TS3, respectively (SEQ ID: 8, 9, 22, 23, respectively).
  • Constructs were prepared for particle bombardment or Agrobacterium-mediated transformation, and introduced into Zea mays embryos as described above, to determine the optimal recognition sequence for the Type I-E Cas endoribonuclease (RN).
  • Poly-guide RNA cassettes were designed as shown in FIG. 3B, with different orders of the individual gRNAs in the poly-guide RNA for cleavage by the RN to release the gRNAs Y1-CR2 (also called "Y2", targeting the Zea mays target site TS2) and Y1-CR3 (also called "Y3", targeting the Zea mays target site TS3).
  • the constructs of FIG. 3A were co-bombarded with each of the construct options shown in FIG. 3B. Two reps were conducted with 25 embryos each. Immature embryos were harvested at 2-4 days after bombardment, or 6-7 days after Agro infection, respectively.
  • Embryos were collected after transformation and pooled into a single tube per rep; the embryos were lyophilized overnight and ground into powder in a GenoGrinder. Genomic DNA was extracted using the IBI Genomic DNA Mini kit (plant). PCR was done with Thermo Phusion F-548 for the first PCR and NEB Q5 for the second PCR. PCR products flanking CTS2 and CTS3 (FIG. 2) were sequenced with next generation sequencing (NGS) using a two-step approach. Primary PCR was done with primers f1/r1 and f2/r2 (SEQ ID NOs: 10, 11, 12, 13) for CTS2 and CTS3, respectively; SEQID NO:49 and SEQID NO:50 were bridge sequences in the forward and reverse primers, respectively.
  • Target site mutation frequency was calculated by the mutation read number divided by total read number.
  • RN protein for guide RNA processing, five sequences (Table 2) including the original 28 bp repeat sequence (RN-RS0) in the Streptococcus thermophilus DGCC7710 Type I-E CRISPR array and 4 variants derived from RN-RS0 were tested.
  • Guide RNA gene cassettes comprised maize U6 polymerase III promoter, the Y1-CR2 and Y1-CR3, and U6 terminator.
  • the endoribonuclease recognition sequence (RN RS) was used to link the Y1-CR2 and Y1-CR3 guide RNAs (FIG. 3B).
  • Target site Y1-TS2 and Y1-TS3 mutation frequency was used to evaluate the RN-
  • RN-RS4 performed the best (FIG. 6): mutations were detected at both Y1-TS2 and Y1-TS3 regardless of the position of the Y1-CR2 and Y1-CR3 guides in the cassette.
  • RN-RS also showed some multiplexing activity
  • RN-RS1 showed no multiplexing activity at all
  • RN-RS2 can only process the guides when it is adjacent to the U6 promoter
  • RN-RS3 showed the opposite effect to RN-RS2.
  • FIG. 6 shows the frequencies of mutations identified with each of the recognition sequences from each of the two input cassette gRNA orders (Y2-Y3 and Y3-Y2), at 7 days post-bombardment. Based on these data, RS4 was selected for further experimentation.
  • ZM-UBI Setaria italica ubiquitin promoter (SI-UBI) (SEQID NO:28), and maize bidirectional ubiquitin promoter (SEQID NO:31), and corresponding terminators of ZM-UBI TERM, CAMV35S TERM, SI-UBI term, and ZM-U6 TERM were used to control the RN binding site (RS4) linked Y1-CR2 and Y1-CR3 guides.
  • BBM, also known as ovule development protein 2 (ODP2) (SEQID NO:32), and Wuschel 2 (WUS2) (SEQID NO:33) were expressed under control of the maize UBI promoter and the In2-2 promoter (SEQID NO:34), respectively, and yellow fluorescent protein (YFP) was expressed under Zm-UBI promoter control.
  • the RN-Cas9 expression cassette, the guide RNA cassettes, and the helper gene cassettes were in separate plasmids or constructed into a single T-DNA, depending on the transformation method used. All plasmid constructs were assembled using chemically synthesized DNA fragments with standard molecular cloning techniques.
  • ZmUBI Zea mays ubiquitin promoter
  • SiUBI Setaria italica Ubiquitin promoter
  • ZmU6 Zea mays U6 promoter
  • ZmUBI bidirectional Zea mays Ubiquitin bi-directional promoter
  • vectors were prepared for Agrobacterium-mediated transformation of corn embryos, using the methods described above, and introduced into maize embryos, using the ZmUBI, SiUBI, and ZmUBI bidirectional promoters at two different target sites, CR2 and CR3.
  • Table 3 Agrobacterium-mediated introduction of RN for multiplex site targeting in maize
  • OSDL1-CR3 SEQID NO:71
  • OSDL3-CR1 SEQID NO:72
  • REC8-CR4 SEQID NO:73
  • SPO11-CR1 SEQID NO:74
  • CR1 and CR2 were targeted in Canola (A and C genomes), as shown in FIG. 11A, using the vector design depicted in FIG. 12A: PGAZ-CR1 C genome target site (SEQID NO:75), PGAZ-CR1 A genome target site (SEQID NO:76), PGAZ-CR2 target site (A and C genomes) (SEQID NO:77).
  • the CR1 target site (PGAZ gene) differs by one nucleotide as shown in FIG. 11B; therefore, two different guides were needed to target the PGAZ gene in both the A and C genomes (BNA-PGAZ CR1 guide and BNA-PGAZ CR10BC-A guide shown in FIG. 12A).
  • CR2 was targeted with the BNA-PGAZ CR2 guide (FIG. 12A).
  • the guide sequences are given as: PGAZ-CR1 C genome (SEQID NO:61), PGAZ-CR1 A genome (SEQID NO:62), PGAZ-CR2 guide (A and C genomes) (SEQID NO:63).
  • Cas endoribonuclease identified and derived from Streptococcus thermophilus may be used to cleave a poly-guide RNA molecule to release individual guide RNAs that may form a complex with a Cas endonuclease, for the recognition, binding, and optionally nicking or cleaving a DNA target, for example in the genome of a cell.
  • compositions and methods disclosed herein may be used in any prokaryotic or eukaryotic cell, such as a bacterial cell, an animal cell, a fungal cell, or a plant cell.
  • the plant cell may be from any plant, for example from a monocot or dicot, for example but not limited to maize, soy, sorghum, canola, wheat, rice, cotton, or sunflower.
  • the Cas endonuclease may be Cas9, Cpf1, part of a Cascade, or any RNA-guided Cas endonuclease or Cas endonuclease system.
  • the method of introduction of the endonuclease, poly-guide RNA molecule, and/or endoribonuclease may be via any method known in the art, such as but not limited to Agrobacterium-mediated transformation, particle bombardment, whisker-mediated transformation, electroporation, floral dip, co-incubation, or lipofection.
  • Delivery of any component to a target polynucleotide or a target cell may be with all DNA components (such as encoding all components on a DNA vector), or alternatively, some or all components may be delivered as RNA (for example, the poly-guide RNA may be delivered directly to the target polynucleotide or cell as an RNA molecule), or some or all components may be delivered as a protein (for example, the endoribonuclease or endonuclease may be delivered directly to the target polynucleotide or cell as a protein).
  • poly-guide RNA, endonuclease, endoribonuclease may be delivered together to the target polynucleotide or target cell, or one or more components may be introduced separately. Separate introduction may occur concurrently, or be spatially or temporally distinct.
  • the poly-guide RNA and the endoribonuclease may be introduced to each other for poly-guide RNA processing into individual discrete gRNA molecules prior to the introduction of the Cas endonuclease.
  • any guide RNA composition or combination of gRNAs may be utilized.
  • the examples provided herein demonstrate multiplex polynucleotide editing using a plurality of guide RNAs provided as components of a poly-guide RNA molecule that is cleaved by an endonuclease, and is not limited to any particular target site, target polynucleotide, cell type, or specific guide RNA composition.
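
The target site mutation frequency used in the examples above is described as the mutation read number divided by the total read number. The following Python sketch illustrates that arithmetic only. The function name and the read counts are hypothetical placeholders chosen for illustration and are not data from this disclosure.

```python
def mutation_frequency(mutant_reads: int, total_reads: int) -> float:
    """Return the fraction of NGS reads at a target site that carry a mutation."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutant_reads <= total_reads:
        raise ValueError("mutant_reads must be between 0 and total_reads")
    return mutant_reads / total_reads


# Hypothetical read counts per target site: (mutant reads, total reads).
# These numbers are illustrative assumptions, not results from the disclosure.
example_counts = {"Y1-TS2": (150, 10_000), "Y1-TS3": (42, 8_400)}

frequencies = {
    site: mutation_frequency(mutant, total)
    for site, (mutant, total) in example_counts.items()
}

for site, freq in frequencies.items():
    print(f"{site}: {freq:.2%}")
```

Computing the frequency per target site in this way would allow the two gRNA orders (Y2-Y3 and Y3-Y2) to be compared for each recognition sequence on the same scale.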

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Cell Biology (AREA)
  • Mycology (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Enzymes And Modification Thereof (AREA)

Abstract

Methods and compositions are provided for the editing of a plurality of target polynucleotides, for example in the genome of a cell, using a Cas endonuclease guided by a plurality of discrete guide RNAs provided as a poly-guide RNA sequence, wherein each of the discrete guide RNAs is flanked by a sequence (or sequences) capable of recognition and cleavage by an endoribonuclease. Also provided are compositions of an endoribonuclease, and methods of use thereof. The endoribonuclease may be obtained or derived from Streptococcus thermophilus.

Description

MULTIPLEX GENOME TARGETING
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the benefit of U.S. Provisional Patent Application No.
62/783604 filed on 21 December 2018, all of which is hereby incorporated herein in its entirety by reference.
REFERENCE TO SEQUENCE LISTING SUBMITTED ELECTRONICALLY
[0002] The official copy of the sequence listing is submitted electronically via EFS-Web as an ASCII formatted sequence listing with a file named
7840WOPCT_SequenceListing_ST25.txt created on 17 December 2019 and having a size of 86,624 bytes and is filed concurrently with the specification. The sequence listing comprised in this ASCII formatted document is part of the specification and is herein incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0003] The disclosure relates to the field of molecular biology, in particular to compositions and methods for modifying the genome of a cell.
BACKGROUND
[0004] Recombinant DNA technology has made it possible to insert DNA sequences at targeted genomic locations and/or modify specific endogenous chromosomal sequences. Site-specific integration techniques, which employ site-specific recombination systems, as well as other types of recombination technologies, have been used to generate targeted insertions of genes of interest in a variety of organisms. Genome-editing techniques such as designer zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), or homing meganucleases, are available for producing targeted genome perturbations, but these systems tend to have low specificity and employ designed nucleases that need to be redesigned for each target site, which renders them costly and time-consuming to prepare. Newer technologies utilizing archaeal or bacterial adaptive immunity systems have been identified, called CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats), which comprise different domains of effector proteins that encompass a variety of activities (DNA recognition, binding, and optionally cleavage). CRISPR-Cas as a robust double strand break tool has been widely used for genome editing. To develop traits of economical and agronomical importance, simultaneously targeting multiple genes often is desired. For example, some traits may be controlled by several genes, while some trait genes may have multiple copies in the genome. In other examples, it may be desirable to modify multiple different genomic targets related to different traits or phenotypes. To achieve a desired phenotype, knocking out several genes or multiple copies of a gene may be required. In other cases, activating certain genes and knocking out (or repressing) other genes may be desirable to generate a useful phenotype. There remains a need in the art for methods and compositions related to editing multiple genomic target sites.
SUMMARY OF INVENTION
[0005] Provided are compositions and methods for editing a plurality of target polynucleotides, for generating a plurality of guide RNA molecules in a cell, and for providing a guide RNA to a cell for genome editing. The plurality of guide RNA molecules may be provided to a target polynucleotide or to a target cell as part of a contiguous polynucleotide, comprising discrete guide RNA molecules separated by an endoribonuclease recognition sequence. In some aspects, the endoribonuclease recognition sequence is capable of being cleaved by an
endoribonuclease, for example an endoribonuclease identified from a Type I-E CRISPR system, for example from Streptococcus thermophilus.
[0006] In some aspects of any of the methods or compositions described herein, the endoribonuclease shares at least 50%, between 50% and 55%, at least 55%, between 55% and 60%, at least 60%, between 60% and 65%, at least 65%, between 65% and 70%, at least 70%, between 70% and 75%, at least 75%, between 75% and 80%, at least 80%, between 80% and 85%, at least 85%, between 85% and 90%, at least 90%, between 90% and 95%, at least 95%, between 95% and 96%, at least 96%, between 96% and 97%, at least 97%, between 97% and 98%, at least 98%, between 98% and 99%, at least 99%, between 99% and 100%, or 100% sequence identity with at least 50, between 50 and 100, at least 100, between 100 and 150, at least 150, between 150 and 200, at least 200, or greater than 200 contiguous amino acids of SEQID NO:48.
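
The percent identity over a stretch of contiguous residues referred to above can be illustrated with a short Python sketch. This is a simplified, ungapped comparison written for illustration only; sequence identity is ordinarily determined with an alignment program that permits gaps, and the function names and toy sequences below are hypothetical and are not taken from the sequence listing.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_window_identity(query: str, reference: str, window: int) -> float:
    """Highest ungapped identity between any `window`-long stretch of the
    query and any `window`-long contiguous stretch of the reference."""
    if window <= 0 or window > min(len(query), len(reference)):
        raise ValueError("window must fit inside both sequences")
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            best = max(best, percent_identity(q, reference[j:j + window]))
    return best


# Toy sequences (hypothetical, for illustration only).
print(percent_identity("MKTAY", "MKSAY"))
print(best_window_identity("AAMKTAYAA", "GGMKTAYGG", 5))
```

A threshold such as "at least 85% identity with at least 50 contiguous amino acids" would then correspond to checking whether `best_window_identity(query, reference, 50) >= 85.0` under this simplified, gap-free definition.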
[0007] In some aspects of any of the methods or compositions described herein, the endoribonuclease shares at least 50%, between 50% and 55%, at least 55%, between 55% and 60%, at least 60%, between 60% and 65%, at least 65%, between 65% and 70%, at least 70%, between 70% and 75%, at least 75%, between 75% and 80%, at least 80%, between 80% and 85%, at least 85%, between 85% and 90%, at least 90%, between 90% and 95%, at least 95%, between 95% and 96%, at least 96%, between 96% and 97%, at least 97%, between 97% and 98%, at least 98%, between 98% and 99%, at least 99%, between 99% and 100%, or 100% sequence identity with at least 250, between 250 and 300, at least 300, between 300 and 350, at least 350, between 350 and 400, at least 400, between 400 and 450, at least 450, between 450 and 500, at least 500, between 500 and 550, at least 550, between 550 and 600, at least 600, between 600 and 625, at least 625, or even greater than 625 contiguous nucleotides of SEQID NO: 1 or SEQID NO:39.
[0008] In some aspects of any of the methods or compositions described herein, a functional fragment or functional variant of SEQID NO: 1, SEQID NO:39, or SEQ ID NO:48 is provided, wherein the functional fragment or functional variant is capable of, or encodes a molecule capable of, cleaving a ribonucleotide sequence.
[0009] In some aspects of any of the methods or compositions described herein, a poly-guide RNA molecule is provided that comprises a plurality of discrete gRNAs and a plurality of recognition sequences for an endoribonuclease, for example an endoribonuclease isolated or derived from Streptococcus thermophilus, for example a molecule comprising at least 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, greater than 99%, or 100% sequence identity to SEQID NO: 1, SEQID NO:39, or SEQ ID NO:48, or any functional fragment or functional variant thereof, wherein the recognition sequence is capable of being recognized and cleaved by said endoribonuclease. In some aspects, the poly-guide RNA molecule comprises two, three, four, five, six, seven, eight, nine, ten, or greater than ten such recognition sequences. Said recognition sequences may be identical or non-identical, or a combination thereof. Said discrete gRNAs may be identical or non-identical, or a combination thereof. The poly-guide RNA molecule may be provided to a target polynucleotide or target cell on a DNA vector or directly as an RNA molecule; a DNA vector comprising a poly-guide RNA molecule may optionally further comprise a polynucleotide sequence encoding an
endoribonuclease, a polynucleotide sequence encoding an endonuclease, or polynucleotide sequences encoding an endoribonuclease and an endonuclease, respectively. DNA vectors comprising a poly-guide RNA molecule and one or more additional compositions may comprise all compositions oriented in the same direction or in different directions, and may comprise a single expression element directing the expression of all compositions, or may comprise a plurality of expression elements directing the expression of individual or grouped compositions. DNA vectors comprising components not oriented all in the same way may include a bidirectional promoter to regulate the expression of individual or grouped compositions.
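
The arrangement of a poly-guide RNA as discrete gRNAs separated by endoribonuclease recognition sequences can be sketched symbolically in Python. The sketch below is illustrative only: the tokens in angle brackets are placeholders rather than real sequences, the function name is hypothetical, and whether recognition sequences also flank the two ends of the molecule is treated as an assumption exposed through a parameter.

```python
from itertools import permutations
from typing import Dict, Sequence


def build_poly_guide(guides: Sequence[str], recognition_seq: str,
                     flank_ends: bool = True) -> str:
    """Join discrete guides with a recognition sequence between each pair.

    If flank_ends is True (an assumption for this sketch), a recognition
    sequence is also placed at both ends so that every guide is flanked.
    """
    body = recognition_seq.join(guides)
    if flank_ends:
        return recognition_seq + body + recognition_seq
    return body


# Symbolic placeholders, not actual nucleotide sequences.
GUIDES = {"Y2": "<Y1-CR2 gRNA>", "Y3": "<Y1-CR3 gRNA>"}
RS = "<RS4>"

# Enumerate every guide order, e.g. Y2-Y3 and Y3-Y2.
orders: Dict[str, str] = {
    "-".join(names): build_poly_guide([GUIDES[n] for n in names], RS)
    for names in permutations(GUIDES)
}

for name, layout in orders.items():
    print(name, layout)
```

Enumerating the orders this way mirrors the idea of testing the same guides in different positions of the cassette; with more than two guides the number of possible orders grows factorially.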
[0010] In any aspect, an expression element may be provided to regulate the expression of one or more of the compositions provided herein, in either a constitutive manner or a non-constitutive manner, for example temporally- (i.e., during different time points of a cell or organism life cycle, or diurnally regulated), spatially- (i.e., different cell or tissue types), or conditionally- (i.e., inducible or regulated) controlled. Combinations of expression elements to control the expression of different compositions are contemplated.
[0011] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease.
[0012] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; further comprising a Cas endonuclease.
[0013] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the recognition sequence comprises the nucleotides CCCGCNNNNGCGGG.
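
The nucleotide pattern CCCGCNNNNGCGGG recited above, in which N stands for any nucleotide, can be located in a sequence with a regular expression. The following Python sketch is a minimal illustration under stated assumptions: the function names and the toy input sequence are hypothetical, DNA input is converted to RNA letters for matching, and nothing here models where or whether cleavage occurs.

```python
import re
from typing import List, Tuple

# Lookahead form so that overlapping occurrences would also be reported.
MOTIF = re.compile(r"(?=(CCCGC[ACGU]{4}GCGGG))")


def find_recognition_motifs(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for each motif occurrence."""
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(rna)]


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string containing only A, C, G, U."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna))


# Toy sequence (hypothetical): three A's, one motif, three U's.
toy = "AAACCCGCAUAAGCGGGUUU"
print(find_recognition_motifs(toy))

# The two fixed arms of the pattern are reverse complements of each other.
print(reverse_complement("CCCGC") == "GCGGG")
```

Because the fixed arms CCCGC and GCGGG are reverse complements, the pattern is compatible with a base-paired stem around a four-nucleotide loop; that structural reading is an inference from the pattern itself and not a statement made in this paragraph.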
[0014] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein at least one component is a DNA encoding the component.
[0015] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the poly-guide RNA molecule comprises RNA.
[0016] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the endoribonuclease is a protein.
[0017] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the plurality of discrete RNA molecules comprise at least two non-identical guide RNA molecules that are each capable of forming a complex with a Cas endonuclease.
[0018] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the endoribonuclease shares at least 85% sequence identity with SEQID NO:48.
[0019] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the poly-guide RNA molecule is operably linked to a promoter.
[0020] In one aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus, and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the poly-guide RNA molecule is operably linked to a promoter, wherein the promoter is selected from the group consisting of: U6, Ubiquitin, and bidirectional promoter.
[0021] In any aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease.
[0022] In any aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein at least one of the discrete guide RNA molecules is capable of selective hybridization with a target polynucleotide.
[0023] In any aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide.
[0024] In any aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide, wherein each of the discrete guide RNA molecules are non-identical.
[0025] In any aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with at least one target polynucleotide, wherein each of the discrete guide RNA molecules are identical.
[0026] In any aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a different target polynucleotide.
[0027] In any aspect, a synthetic composition is provided, comprising an
endoribonuclease isolated or derived from Streptococcus thermophilus , and a heterologous poly guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide; wherein the target polynucleotide is in a cell.
[0028] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease.
[0029] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; further comprising a Cas endonuclease.
[0030] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the recognition sequence comprises the nucleotides CCCGCNNNNGCGGG.
[0031] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein at least one component is a DNA molecule encoding the component.
[0032] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the poly-guide RNA molecule comprises RNA.
[0033] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the endoribonuclease is a protein.
[0034] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the plurality of discrete RNA molecules comprise at least two non-identical guide RNA molecules that are each capable of forming a complex with a Cas endonuclease.
[0035] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the endoribonuclease shares at least 85% sequence identity with SEQID NO:48.
[0036] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the poly-guide RNA molecule is operably linked to a promoter.
[0037] In one aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the poly-guide RNA molecule is operably linked to a promoter; wherein the promoter is selected from the group consisting of: U6, Ubiquitin, and a bidirectional promoter.
[0038] In any aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease.
[0039] In any aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein at least one of the discrete guide RNA molecules is capable of selective hybridization with a target polynucleotide.
[0040] In any aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide.
[0041] In any aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide, wherein each of the discrete guide RNA molecules are non-identical.
[0042] In any aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with at least one target polynucleotide, wherein each of the discrete guide RNA molecules are identical.
[0043] In any aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a different target polynucleotide.
[0044] In any aspect, a synthetic composition is provided, comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with an endoribonuclease, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the endoribonuclease; wherein the synthetic composition is capable of being cleaved by the endoribonuclease; wherein the discrete guide RNA molecules are each capable of selective hybridization with a target polynucleotide; wherein the target polynucleotide is in a cell.
[0045] In one aspect, a cell is provided that comprises, either transiently introduced or stably integrated, any of the synthetic compositions herein.
[0046] In some aspects, a cell is provided that comprises in its genome a polynucleotide sequence that is capable of selective hybridization with at least one of the discrete gRNA molecules of the poly-guide RNA molecule.
[0047] In one aspect, a method is provided for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule.
[0048] In one aspect, a method is provided for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; further comprising providing to the cell a Cas endonuclease.
[0049] In one aspect, a method is provided for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; wherein the cell is a bacterium, plant cell, or animal cell.
[0050] In one aspect, a method is provided for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; wherein the Cas endonuclease, the endoribonuclease, and the poly-guide RNA molecule are provided on one vector to a target cell.
[0051] In one aspect, a method is provided for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; wherein the Cas endonuclease is provided to a target cell on a different vector than that comprising the poly-guide RNA or the endoribonuclease.
[0052] In one aspect, a method is provided for providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule; wherein the Cas endonuclease and/or the endoribonuclease is/are delivered to the cell as proteins, and the poly-guide RNA molecule is provided to the cell as RNA.
[0053] In one aspect, a method is provided for generating a plurality of guide RNA molecules in a cell, the method comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell.
[0054] In one aspect, a method is provided for generating a plurality of guide RNA molecules in a cell, the method comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; further comprising providing to the cell a Cas endonuclease.
[0055] In one aspect, a method is provided for generating a plurality of guide RNA molecules in a cell, the method comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; wherein the cell is a bacterium, plant cell, or animal cell.
[0056] In one aspect, a method is provided for generating a plurality of guide RNA molecules in a cell, the method comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; wherein the Cas endonuclease, the endoribonuclease, and the poly-guide RNA molecule are provided on one vector to a target cell.
[0057] In one aspect, a method is provided for generating a plurality of guide RNA molecules in a cell, the method comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; wherein the Cas endonuclease is provided to a target cell on a different vector than that comprising the poly-guide RNA or the endoribonuclease.
[0058] In one aspect, a method is provided for generating a plurality of guide RNA molecules in a cell, the method comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell; wherein the Cas endonuclease and/or the endoribonuclease is/are delivered to the cell as proteins, and the poly-guide RNA is provided to the cell as RNA.
[0059] In one aspect, a method is provided for editing a target polynucleotide in a cell, comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA.
[0060] In any aspect, a method is provided for editing a target polynucleotide in a cell, comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA; wherein the cell is a bacterium, plant cell, or animal cell.
[0061] In any aspect, a method is provided for editing a target polynucleotide in a cell, comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA; wherein the Cas endonuclease, the endoribonuclease, and the poly-guide RNA molecule are provided on one vector to a target cell.
[0062] In any aspect, a method is provided for editing a target polynucleotide in a cell, comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA; wherein the Cas endonuclease is provided to a target cell on a different vector than that comprising the poly-guide RNA or the endoribonuclease.
[0063] In any aspect, a method is provided for editing a target polynucleotide in a cell, comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA; wherein the Cas endonuclease and/or the endoribonuclease is/are delivered to the cell as proteins, and the poly-guide RNA molecule is provided to the cell as RNA.
[0064] In any aspect, the poly-guide RNA molecule is provided as RNA. In any aspect, the poly-guide RNA molecule is provided as a DNA molecule encoding the discrete guide RNA molecules, operably linked to a functional promoter. In any aspect, the Cas endoribonuclease is provided as a protein, or as an RNA molecule that gets translated into a protein, or as a DNA molecule that gets transcribed into RNA and translated into a protein. In any aspect, the Cas endonuclease is provided as a protein, or as an RNA molecule that gets translated into a protein, or as a DNA molecule that gets transcribed into RNA and translated into a protein.
[0065] In one aspect, the Cas endoribonuclease is Cas6. In one aspect, the Cas endoribonuclease is identified from a Type-I system.
BRIEF DESCRIPTION OF THE DRAWINGS AND THE SEQUENCE LISTING
[0066] The disclosure can be more fully understood from the following detailed description and the accompanying drawings and Sequence Listing, which form a part of this application.
[0067] FIG. 1 depicts the Type I-E CRISPR locus from Streptococcus thermophilus DGCC7710. The Cas endoribonuclease gene is indicated with a crosshatched arrow in the locus.
[0068] FIG. 2 depicts guide RNAs comprising a ~33 nt variable targeting sequence that is flanked by fixed sequences comprising a ~7 nt 5 prime sequence and a ~21 nt 3 prime sequence capable of forming a hairpin-like structure (FIG. 2A). These fixed flanking sequences are the result of cleavage within the repeat sequences of the primary CRISPR array transcript by the Cas endoribonuclease protein in Type I-E CRISPR-Cas systems (FIG. 2B).
[0069] FIG. 3 shows the experimental design to determine the sequence for the Cas endoribonuclease binding and cleavage. FIG. 3A depicts the cassette comprising the Cas endoribonuclease (RN) for poly-guide RNA cleavage and the Cas9 deoxyribonuclease (DN) for genomic target cleavage. FIG. 3B shows two different arrangements of the genomic target site gRNAs within the poly-guide RNA cassette.
[0070] FIG. 4 depicts the Zea mays Y1 genomic locus (Zm-Y1, SEQID NO:36) for Cas9 cleavage, using guide RNAs released from the poly-guide RNA by the Cas endoribonuclease (RN) that target two target sites adjacent to Zm-Y1: TS2 is Target Site 2 (SEQID NO:22), and TS3 is Target Site 3 (SEQID NO:23). f1 is the forward primer 1 (SEQID NO:10), f2 is the forward primer 2 (SEQID NO:12), r1 is the reverse primer 1 (SEQID NO:11), r2 is the reverse primer 2 (SEQID NO:13).
[0071] FIG. 5 shows the design of the DNA cassettes used in the examples. Arrows depict a promoter sequence.
Table 1: Compositions of the DNA cassettes of FIG. 5
[0072] FIG. 6 shows mutation frequencies (%) for different Cas endoribonuclease recognition sequences, at 7 days post-bombardment.
[0073] FIG. 7 shows promoter comparisons for multiplex targeting with the Cas endoribonuclease, at 4 days post-bombardment.
[0074] FIG. 8 shows promoter comparisons for multiplex targeting with the Cas endoribonuclease, at 7 days post-Agrobacterium infection.
[0075] FIG. 9 is the vector schematic for targeting five sites in maize as described in Example 4.
[0076] FIG. 10 is the vector schematic for targeting four sites in sorghum as described in Example 4.
[0077] FIG. 11A is the experimental design for targeting two sites in canola, for the A and C genomes. FIG. 11B shows the nucleotide difference for the PGAZ gene (CR1) in the canola A genome vs the C genome.
[0078] FIG. 12A is the vector schematic for testing two sites in canola as described in Example 4. FIG. 12B shows the experimental results, demonstrating successful cleavage in canola in both the A and C genomes, with the multiplexed guide RNAs.
[0079] FIG. 13A is the vector schematic for the Cas9 cassette used in the promoter study in maize. FIG. 13B shows the vector schematics for the different promoter cassettes used in the promoter study in maize (Example 4). FIG. 13C shows the results of testing in maize at target site TS45. FIG. 13D shows the results of testing in maize at target site Y1-CR2.
[0080] The sequence descriptions and sequence listing attached hereto comply with the rules governing nucleotide and amino acid sequence disclosures in patent applications as set forth in 37 C.F.R. §§1.821 and 1.825. The sequence descriptions comprise the three letter codes for amino acids as defined in 37 C.F.R. §§ 1.821 and 1.825, which are incorporated herein by reference.
[0081] SEQID NO:1 is the Streptococcus thermophilus DGCC7710 DNA sequence of the Type I-E Cas endoribonuclease (RN).
[0082] SEQID NO:2 is the Streptococcus thermophilus DGCC7710 DNA sequence of the Type I-E Cas endoribonuclease Recognition Sequence 0.
[0083] SEQID NO:3 is the Streptococcus thermophilus DGCC7710 DNA sequence of the Type I-E Cas endoribonuclease Recognition Sequence 1.
[0084] SEQID NO:4 is the Streptococcus thermophilus DGCC7710 DNA sequence of the Type I-E Cas endoribonuclease Recognition Sequence 2.
[0085] SEQID NO:5 is the Streptococcus thermophilus DGCC7710 DNA sequence of the Type I-E Cas endoribonuclease Recognition Sequence 3.
[0086] SEQID NO:6 is the Streptococcus thermophilus DGCC7710 DNA sequence of the Type I-E Cas endoribonuclease Recognition Sequence 4.
[0087] SEQID NO:7 is the artificial DNA sequence of optimized porcine teschovirus-1 2A self-cleaving peptide (p2A).
[0088] SEQID NO:8 is the artificial DNA sequence of Y1-CR2 guide.
[0089] SEQID NO:9 is the artificial DNA sequence of Y1-CR3 guide.
[0090] SEQID NO:10 is the artificial DNA sequence of Y1-CR2-f1.
[0091] SEQID NO:11 is the artificial DNA sequence of Y1-CR2-r1.
[0092] SEQID NO:12 is the artificial DNA sequence of Y1-CR3-f2.
[0093] SEQID NO:13 is the artificial DNA sequence of Y1-CR3-r2.
[0094] SEQID NO:14 is the Simian Virus 40 DNA sequence of SV40 NLS.
[0095] SEQID NO:15 is the artificial DNA sequence of VIRD2 NLS.
[0096] SEQID NO:16 is the artificial DNA sequence of UBTCAS9.
[0097] SEQID NO:17 is the Streptococcus pyogenes DNA sequence of Cas9.
[0098] SEQID NO:18 is the artificial DNA sequence of UBTRN-CAS9 with all NLS.
[0099] SEQID NO:19 is the Zea mays DNA sequence of UBTRN-CAS9 no NLS for RN.
[0100] SEQID NO:20 is the Zea mays DNA sequence of ZM-U6 promoter.
[0101] SEQID NO:21 is the Zea mays DNA sequence of GUIDE RNA (77bp).
[0102] SEQID NO:22 is the Zea mays DNA sequence of Y1-TS2.
[0103] SEQID NO:23 is the Zea mays DNA sequence of Y1-TS3.
[0104] SEQID NO:24 is the artificial DNA sequence of ZM-UBI TERM-V1.
[0105] SEQID NO:25 is the artificial DNA sequence of BSV (AY) TR PRO.
[0106] SEQID NO:26 is the artificial DNA sequence of ZM-HPLV9 INTRON1.
[0107] SEQID NO:27 is the artificial DNA sequence of CAMV35S TERM.
[0108] SEQID NO:28 is the Setaria italica DNA sequence of SI-UBI1 PRO.
[0109] SEQID NO:29 is the Setaria italica DNA sequence of SI-UBI1 INTRON1.
[0110] SEQID NO:30 is the Setaria italica DNA sequence of SI-UBI TERM (MODI).
[0111] SEQID NO:31 is the Zea mays DNA sequence of ZM-UBI bidirectional promoter.
[0112] SEQID NO:32 is the Zea mays DNA sequence of ZM-ODP2.
[0113] SEQID NO:33 is the Zea mays DNA sequence of ZM-WUS2.
[0114] SEQID NO:34 is the Zea mays DNA sequence of IN2-2 PRO.
[0115] SEQID NO:35 is the Zea mays DNA sequence of In2-1 TERM.
[0116] SEQID NO:36 is the artificial DNA sequence of ZS-YELLOW1 N1.
[0117] SEQID NO:37 is the Streptococcus thermophilus DNA sequence of the Type I-E Type I CRISPR repeat.
[0118] SEQID NO:38 is the S. thermophilus DNA sequence of the Type I-E casD gene.
[0119] SEQID NO:39 is the S. thermophilus DNA sequence of the Type I-E cas Endoribonuclease (RN) gene.
[0120] SEQID NO:40 is the S. thermophilus DNA sequence of the Type I-E casC gene.
[0121] SEQID NO:41 is the S. thermophilus DNA sequence of the Type I-E casA gene.
[0122] SEQID NO:42 is the S. thermophilus DNA sequence of the Type I-E casB gene.
[0123] SEQID NO:43 is the S. thermophilus DNA sequence of the Type I-E cas3 gene.
[0124] SEQID NO:44 is the S. thermophilus DNA sequence of the Type I-E cas1 gene.
[0125] SEQID NO:45 is the S. thermophilus DNA sequence of the Type I-E cas2 gene.
[0126] SEQID NO:46 is the Zea mays DNA sequence of the Ubiquitin promoter.
[0127] SEQID NO:47 is the DNA sequence of the Pinll terminator.
[0128] SEQID NO:48 is the Protein sequence of the Streptococcus thermophilus Type I-E Endoribonuclease (RN).
[0129] SEQID NO:49 is the forward primer bridge sequence DNA sequence.
[0130] SEQID NO:50 is the reverse primer bridge sequence DNA sequence.
[0131] SEQID NO:51 is the secondary PCR universal forward primer DNA sequence.
[0132] SEQID NO:52 is the secondary PCR universal reverse primer DNA sequence.
[0133] SEQID NO:53 is the Zea mays SH2-CR4 guide DNA sequence.
[0134] SEQID NO:54 is the Zea mays SH2-CR5 guide DNA sequence.
[0135] SEQID NO:55 is the Zea mays SU1-CR1 guide DNA sequence.
[0136] SEQID NO:56 is the Zea mays SU1-CR4 guide DNA sequence.
[0137] SEQID NO:57 is the Sorghum bicolor OSDL1-CR3 guide DNA sequence.
[0138] SEQID NO:58 is the Sorghum bicolor OSDL3-CR1 guide DNA sequence.
[0139] SEQID NO:59 is the Sorghum bicolor REC8-CR4 guide DNA sequence.
[0140] SEQID NO:60 is the Sorghum bicolor SPO11-CR1 guide DNA sequence.
[0141] SEQID NO:61 is the Brassica napus PGAZ-CR1-C guide DNA sequence.
[0142] SEQID NO:62 is the Brassica napus PGAZ-CR1-A guide DNA sequence.
[0143] SEQID NO:63 is the Brassica napus PGAZ-CR2 guide DNA sequence.
[0144] SEQID NO:64 is the Arabidopsis thaliana AT-UBI PRO bidirectional DNA sequence.
[0145] SEQID NO:65 is the Arabidopsis thaliana AT-UBI10 5UTR-INTRON 1 DNA seq.
[0146] SEQID NO:66 is the Arabidopsis thaliana AT-NLS (CO) DNA sequence.
[0147] SEQID NO:67 is the Zea mays SH2-CR4 target site DNA sequence.
[0148] SEQID NO:68 is the Zea mays SH2-CR5 target site DNA sequence.
[0149] SEQID NO:69 is the Zea mays SU1-CR1 target site DNA sequence.
[0150] SEQID NO:70 is the Zea mays SU1-CR4 target site DNA sequence.
[0151] SEQID NO:71 is the Sorghum bicolor OSDL1-CR3 target site DNA sequence.
[0152] SEQID NO:72 is the Sorghum bicolor OSDL3-CR1 target site DNA sequence.
[0153] SEQID NO:73 is the Sorghum bicolor REC8-CR4 target site DNA sequence.
[0154] SEQID NO:74 is the Sorghum bicolor SPO11-CR1 target site DNA sequence.
[0155] SEQID NO:75 is the Sorghum bicolor PGAZ-CR1-C target site DNA sequence.
[0156] SEQID NO:76 is the Brassica napus PGAZ-CR1-A target site DNA sequence.
[0157] SEQID NO:77 is the Brassica napus PGAZ-CR2 target site DNA sequence.
[0158] SEQID NO:78 is the artificial SH2_CR4 forward primer DNA sequence.
[0159] SEQID NO:79 is the artificial SH2_CR4 reverse primer DNA sequence.
[0160] SEQID NO:80 is the artificial SH2_CR5 forward primer DNA sequence.
[0161] SEQID NO:81 is the artificial SH2_CR5 reverse primer DNA sequence.
[0162] SEQID NO:82 is the artificial Su1-CR1 forward primer DNA sequence.
[0163] SEQID NO:83 is the artificial Su1-CR1 reverse primer DNA sequence.
[0164] SEQID NO:84 is the artificial Su1-CR4 forward primer DNA sequence.
[0165] SEQID NO:85 is the artificial Su1-CR4 reverse primer DNA sequence.
[0166] SEQID NO:86 is the artificial PGAZ-CR1-A forward primer DNA sequence.
[0167] SEQID NO:87 is the artificial PGAZ-CR1-A reverse primer DNA sequence.
[0168] SEQID NO:88 is the artificial PGAZ-CR1-C forward primer DNA sequence.
[0169] SEQID NO:89 is the artificial PGAZ-CR1-C reverse primer DNA sequence.
[0170] SEQID NO:90 is the artificial PGAZ-CR2 forward primer 1 DNA sequence.
[0171] SEQID NO:91 is the artificial PGAZ-CR2 reverse primer 1 DNA sequence.
[0172] SEQID NO:92 is the artificial OSDL1-CR3 forward primer DNA sequence.
[0173] SEQID NO:93 is the artificial OSDL1-CR3 reverse primer DNA sequence.
[0174] SEQID NO:94 is the artificial OSDL3-CR1 forward primer DNA sequence.
[0175] SEQID NO:95 is the artificial OSDL3-CR1 reverse primer DNA sequence.
[0176] SEQID NO:96 is the artificial REC8-CR4 forward primer PRT sequence.
[0177] SEQID NO:97 is the artificial REC8-CR4 reverse primer DNA sequence.
[0178] SEQID NO:98 is the artificial SPO11-CR1 forward primer DNA sequence.
[0179] SEQID NO:99 is the artificial SPO11-CR1 reverse primer DNA sequence.
DETAILED DESCRIPTION
[0180] Targeting multiple loci for editing (“multiplex” targeting) may be achieved by providing a plurality of guide RNAs to target polynucleotide(s). In some aspects, the plurality of guide RNAs are provided as DNA sequences on a single construct, separated by a sequence comprising a target site for an endoribonuclease (RN), such as but not limited to Cas6. In some aspects, the plurality of guide RNAs share complementarity to different target sites. In some aspects, the target polynucleotide(s) is(are) present in the genome of a cell.
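The arrangement described above, in which several guide RNAs sit on one transcript separated by an endoribonuclease target site, can be illustrated in silico. The sketch below is an assumption-laden toy model, not the processing mechanism of any particular enzyme: the function name `process_poly_guide`, the use of the CCCGCNNNNGCGGG motif as the separator, and the default cut position (immediately 3' of the 14-nucleotide motif) are all chosen only for illustration.

```python
import re

# Separator motif, written for RNA (U instead of T). Using this motif as the
# separator is an assumption made for the example.
SITE = re.compile(r"CCCGC[ACGU]{4}GCGGG")


def process_poly_guide(transcript, cut_offset=14):
    """Split a poly-guide transcript into fragments at each separator site.

    cut_offset is the hypothetical distance (in nt) from the start of each
    site to the cut; the default places the cut just 3' of the motif.
    """
    rna = transcript.upper().replace("T", "U")  # accept DNA or RNA input
    cuts = [m.start() + cut_offset for m in SITE.finditer(rna)]
    fragments, previous = [], 0
    for cut in cuts:
        fragments.append(rna[previous:cut])
        previous = cut
    fragments.append(rna[previous:])
    return [f for f in fragments if f]  # drop empty leading/trailing pieces


if __name__ == "__main__":
    site = "CCCGCAAAAGCGGG"
    # Toy transcript: site, spacer 1, site, spacer 2, site (invented sequences).
    toy = site + "AAUU" + site + "GGCC" + site
    for fragment in process_poly_guide(toy):
        print(fragment)
```

With this toy input each released fragment ends in the separator motif, which mirrors the idea that fixed flanking sequences remain on the processed guides; real cut positions depend on the enzyme and must be determined experimentally.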
[0181] Terms used in the claims and specification are defined as set forth below unless otherwise specified. It must be noted that, as used in the specification and the appended claims, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise.
Definitions
[0182] As used herein, “nucleic acid” means a polynucleotide and includes a single or a double-stranded polymer of deoxyribonucleotide or ribonucleotide bases. Nucleic acids may also include fragments and modified nucleotides. Thus, the terms “polynucleotide”, “nucleic acid sequence”, “nucleotide sequence” and “nucleic acid fragment” are used interchangeably to denote a polymer of RNA and/or DNA and/or RNA-DNA that is single- or double-stranded, optionally comprising synthetic, non-natural, or altered nucleotide bases. Nucleotides (usually found in their 5’-monophosphate form) are referred to by their single letter designation as follows: “A” for adenosine or deoxyadenosine (for RNA or DNA, respectively), “C” for cytosine or deoxycytosine, “G” for guanosine or deoxyguanosine, “U” for uridine, “T” for thymine or deoxythymidine, “R” for purines (A or G), “Y” for pyrimidines (C or T), “K” for G or T, “H” for A or C or T, “I” for inosine, and “N” for any nucleotide.
[0183] The term “genome” as it applies to prokaryotic and eukaryotic cells or organisms encompasses not only chromosomal DNA found within the nucleus, but also organelle DNA found within subcellular components (e.g., mitochondria, or plastid) of the cell.
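The single-letter designations listed in paragraph [0182] can be expanded mechanically when comparing a degenerate pattern with a concrete sequence. The following sketch is illustrative only: the names `CODE_TO_CLASS`, `to_regex`, and `matches` are hypothetical, the inosine code “I” is left out because it denotes a modified base rather than a set of alternatives, and letting N also match U is an assumption.

```python
import re

# Expansion of the degenerate codes named in the text into regex classes.
CODE_TO_CLASS = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "[AG]",      # purines
    "Y": "[CT]",      # pyrimidines
    "K": "[GT]",
    "H": "[ACT]",
    "N": "[ACGTU]",   # any nucleotide (U included by assumption)
}


def to_regex(pattern):
    """Translate a degenerate nucleotide pattern into a regular expression."""
    return "".join(CODE_TO_CLASS[letter] for letter in pattern.upper())


def matches(pattern, sequence):
    """True if the whole sequence is one instance of the degenerate pattern."""
    return re.fullmatch(to_regex(pattern), sequence.upper()) is not None


if __name__ == "__main__":
    print(to_regex("CCCGCNNNNGCGGG"))
    print(matches("RYN", "ACG"))
```

An unknown letter raises `KeyError`, which is a deliberate choice here so that unsupported codes are not silently ignored.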
[0184] “Open reading frame” is abbreviated ORF.
[0185] The term "selectively hybridizes" (or “selective hybridization”) includes reference to hybridization, under stringent hybridization conditions, of a nucleic acid sequence to a specified nucleic acid target sequence to a detectably greater degree (e.g., at least 2-fold over background) than its hybridization to non-target nucleic acid sequences and to the substantial exclusion of non-target nucleic acids. Selectively hybridizing sequences typically have at least about 80% sequence identity, or 90% sequence identity, up to and including 100% sequence identity (i.e., fully complementary) with each other.
[0186] The term "stringent conditions" or “stringent hybridization conditions” includes reference to conditions under which a probe will selectively hybridize to its target sequence in an in vitro hybridization assay. Stringent conditions are sequence-dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified which are 100% complementary to the probe (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing). Generally, a probe is less than about 1000 nucleotides in length, optionally less than 500 nt in length. Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salt(s)) at pH 7.0 to 8.3, and at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. Exemplary low stringency conditions include hybridization with a buffer solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulfate) at 37°C, and a wash in 1X to 2X SSC (20X SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50 to 55°C. Exemplary moderate stringency conditions include hybridization in 40 to 45% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.5X to 1X SSC at 55 to 60°C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C, and a wash in 0.1X SSC at 60°C to 65°C.
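The SSC wash strengths in paragraph [0186] are dilutions of the stated 20X stock (3.0 M NaCl/0.3 M trisodium citrate), so the salt concentration of each wash follows by simple proportion. The sketch below works through that arithmetic; the names `ssc_molarity`, `WASHES`, and `wash_nacl_range` are hypothetical and introduced only for this example.

```python
# Stock composition stated in the text: 20X SSC = 3.0 M NaCl / 0.3 M trisodium citrate.
STOCK_STRENGTH_X = 20.0
STOCK_NACL_M = 3.0
STOCK_CITRATE_M = 0.3


def ssc_molarity(strength_x):
    """Return (NaCl M, trisodium citrate M) for an SSC strength such as 0.1 for 0.1X."""
    if strength_x < 0:
        raise ValueError("SSC strength must be non-negative")
    factor = strength_x / STOCK_STRENGTH_X
    return STOCK_NACL_M * factor, STOCK_CITRATE_M * factor


# Exemplary wash conditions from the text:
# (lowest SSC strength, highest SSC strength, lowest temp in C, highest temp in C)
WASHES = {
    "low": (1.0, 2.0, 50, 55),
    "moderate": (0.5, 1.0, 55, 60),
    "high": (0.1, 0.1, 60, 65),
}


def wash_nacl_range(stringency):
    """Return the (min, max) NaCl molarity of the wash for a stringency level."""
    low_x, high_x, _, _ = WASHES[stringency]
    return ssc_molarity(low_x)[0], ssc_molarity(high_x)[0]


if __name__ == "__main__":
    for level in WASHES:
        low, high = wash_nacl_range(level)
        print(f"{level}: {low:.3f} to {high:.3f} M NaCl")
```

By this proportion, 1X SSC corresponds to 0.15 M NaCl and 0.015 M trisodium citrate, and the 0.1X high-stringency wash to about 0.015 M NaCl.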
[0187] By “homology” is meant DNA sequences that are similar. For example, a “region of homology to a genomic region” that is found on the donor DNA is a region of DNA that has a similar sequence to a given “genomic region” in the cell or organism genome. A region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target site. For example, the region of homology can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases in length such that the region of homology has sufficient homology to undergo homologous recombination with the corresponding genomic region. “Sufficient homology” indicates that two polynucleotide sequences have structural similarity such that they are capable of acting as substrates for a homologous recombination reaction. The structural similarity includes overall length of each polynucleotide fragment, as well as the sequence similarity of the polynucleotides. Sequence similarity can be described by the percent sequence identity over the whole length of the sequences, and/or by conserved regions comprising localized similarities such as contiguous nucleotides having 100% sequence identity, and percent sequence identity over a portion of the length of the sequences.
[0188] As used herein, a “genomic region” is a segment of a chromosome in the genome of a cell that is present on either side of the target site or, alternatively, also comprises a portion of the target site. The genomic region can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases such that the genomic region has sufficient homology to undergo homologous recombination with the corresponding region of homology.
[0189] As used herein, “homologous recombination” (HR) includes the exchange of DNA fragments between two DNA molecules at the sites of homology. The frequency of homologous recombination is influenced by a number of factors. Different organisms vary with respect to the amount of homologous recombination and the relative proportion of homologous to non-homologous recombination. Generally, the length of the region of homology affects the frequency of homologous recombination events: the longer the region of homology, the greater the frequency. The length of the homology region needed to observe homologous recombination is also species-variable. In many cases, at least 5 kb of homology has been utilized, but homologous recombination has been observed with as little as 25-50 bp of homology. See, for example, Singer et al., (1982) Cell 31:25-33; Shen and Huang, (1986) Genetics 112:441-57; Watt et al., (1985) Proc. Natl. Acad. Sci. USA 82:4768-72; Sugawara and Haber, (1992) Mol Cell Biol 12:563-75; Rubnitz and Subramani, (1984) Mol Cell Biol 4:2253-8; Ayares et al., (1986) Proc. Natl. Acad. Sci. USA 83:5199-203; Liskay et al., (1987) Genetics 115:161-7.
[0190] “Sequence identity” or “identity” in the context of nucleic acid or polypeptide sequences refers to the nucleic acid bases or amino acid residues in two sequences that are the same when aligned for maximum correspondence over a specified comparison window.
[0191] The term “percentage of sequence identity” refers to the value determined by comparing two optimally aligned sequences over a comparison window, wherein the portion of the polynucleotide or polypeptide sequence in the comparison window may comprise additions or deletions (i.e., gaps) as compared to the reference sequence (which does not comprise additions or deletions) for optimal alignment of the two sequences. The percentage is calculated by determining the number of positions at which the identical nucleic acid base or amino acid residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison and multiplying the result by 100 to yield the percentage of sequence identity. Useful examples of percent sequence identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95%, or any percentage from 50% to 100%. These identities can be determined using any of the programs described herein.
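By way of non-limiting illustration, the calculation described above may be sketched in Python as follows. The sketch assumes that the two sequences have already been optimally aligned, that they are supplied as equal-length strings spanning the comparison window, and that gaps are written as "-". The function name and the gap symbol are assumptions chosen for illustration and are not part of any program referenced herein.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of sequence identity over an aligned comparison window.

    Both inputs are assumed to be pre-aligned and of equal length; gap
    positions count toward the total number of positions in the window
    but never count as matched positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window must not be empty")
    # Number of positions at which the identical base or residue occurs in both.
    matched = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    # Divide by the total number of positions in the window, multiply by 100.
    return 100.0 * matched / len(aligned_a)
```

For instance, a six-position window with four matched positions and one gap yields four divided by six, multiplied by 100.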
[0192] Sequence alignments and percent identity or similarity calculations may be determined using a variety of comparison methods designed to detect homologous sequences including, but not limited to, the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI). Within the context of this application it will be understood that, where sequence analysis software is used for analysis, the results of the analysis will be based on the “default values” of the program referenced, unless otherwise specified. As used herein, “default values” will mean any set of values or parameters that originally load with the software when first initialized.
[0193] The “Clustal V method of alignment” corresponds to the alignment method labeled Clustal V (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI). For multiple alignments, the default values correspond to GAP PENALTY=10 and GAP LENGTH PENALTY=10. Default parameters for pairwise alignments and calculation of percent identity of protein sequences using the Clustal method are KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5. For nucleic acids these parameters are KTUPLE=2, GAP PENALTY=5, WINDOW=4 and DIAGONALS SAVED=4. After alignment of the sequences using the Clustal V program, it is possible to obtain a “percent identity” by viewing the “sequence distances” table in the same program.
The “Clustal W method of alignment” corresponds to the alignment method labeled Clustal W (described by Higgins and Sharp, (1989) CABIOS 5:151-153; Higgins et al., (1992) Comput Appl Biosci 8:189-191) and found in the MegAlign™ v6.1 program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI). Default parameters for multiple alignment are GAP PENALTY=10, GAP LENGTH PENALTY=0.2, Delay Divergent Seqs (%)=30, DNA Transition Weight=0.5, Protein Weight Matrix=Gonnet Series, and DNA Weight Matrix=IUB. After alignment of the sequences using the Clustal W program, it is possible to obtain a “percent identity” by viewing the “sequence distances” table in the same program.
Unless otherwise stated, sequence identity/similarity values provided herein refer to the value obtained using GAP Version 10 (GCG, Accelrys, San Diego, CA) using the following parameters: % identity and % similarity for a nucleotide sequence using a gap creation penalty weight of 50 and a gap length extension penalty weight of 3, and the nwsgapdna.cmp scoring matrix; % identity and % similarity for an amino acid sequence using a GAP creation penalty weight of 8 and a gap length extension penalty of 2, and the BLOSUM62 scoring matrix (Henikoff and Henikoff, (1989) Proc. Natl. Acad. Sci. USA 89:10915). GAP uses the algorithm of Needleman and Wunsch, (1970) J Mol Biol 48:443-53, to find an alignment of two complete sequences that maximizes the number of matches and minimizes the number of gaps. GAP considers all possible alignments and gap positions and creates the alignment with the largest number of matched bases and the fewest gaps, using a gap creation penalty and a gap extension penalty in units of matched bases.
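By way of non-limiting illustration, a global alignment score of the Needleman and Wunsch type, using a gap creation penalty and a gap extension penalty, may be sketched in Python as follows. This sketch is not the GAP program. The function name, the simple match and mismatch scores (used here in place of a scoring matrix such as BLOSUM62 or nwsgapdna.cmp), the default penalty values, and the convention that a gap of length n costs the creation penalty plus n times the extension penalty are all assumptions made for illustration.

```python
NEG_INF = float("-inf")


def global_alignment_score(a, b, match=1, mismatch=-1, gap_open=2, gap_extend=1):
    """Best global alignment score of a and b with affine gap penalties.

    Illustrative only: a gap of length n is assumed to cost
    gap_open + n * gap_extend, expressed in units of matched bases.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + gap_extend * i)
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + gap_extend * j)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Open a new gap (creation + extension) or extend an existing one.
            X[i][j] = max(
                M[i - 1][j] - gap_open - gap_extend,
                X[i - 1][j] - gap_extend,
                Y[i - 1][j] - gap_open - gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] - gap_open - gap_extend,
                Y[i][j - 1] - gap_extend,
                X[i][j - 1] - gap_open - gap_extend,
            )
    return max(M[n][m], X[n][m], Y[n][m])
```

Because every cell of the three tables is filled, all possible alignments and gap positions are considered, and the returned value is the score of the alignment that best balances matched positions against gaps under the assumed penalties.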
“BLAST” is a searching algorithm provided by the National Center for Biotechnology Information (NCBI) used to find regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches to identify sequences having sufficient similarity to a query sequence such that the similarity would not be predicted to have occurred randomly. BLAST reports the identified sequences and their local alignment to the query sequence. It is well understood by one skilled in the art that many levels of sequence identity are useful in identifying polypeptides from other species or modified naturally or synthetically wherein such polypeptides have the same or similar function or activity. Useful examples of percent identities include, but are not limited to, 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90% or 95%, or any percentage from 50% to 100%. Indeed, any amino acid identity from 50% to 100% may be useful in describing the present disclosure, such as 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%.
[0194] Polynucleotide and polypeptide sequences, variants thereof, and the structural relationships of these sequences can be described by the terms “homology”, “homologous”, “substantially identical”, “substantially similar” and “corresponding substantially” which are used interchangeably herein. These refer to polypeptide or nucleic acid sequences wherein changes in one or more amino acids or nucleotide bases do not affect the function of the molecule, such as the ability to mediate gene expression or to produce a certain phenotype. These terms also refer to modification(s) of nucleic acid sequences that do not substantially alter the functional properties of the resulting nucleic acid relative to the initial, unmodified nucleic acid. These modifications include deletion, substitution, and/or insertion of one or more nucleotides in the nucleic acid fragment. Substantially similar nucleic acid sequences encompassed may be defined by their ability to hybridize (under moderately stringent conditions, e.g., 0.5X SSC, 0.1% SDS, 60°C) with the sequences exemplified herein, or to any portion of the nucleotide sequences disclosed herein and which are functionally equivalent to any of the nucleic acid sequences disclosed herein. Stringency conditions can be adjusted to screen for moderately similar fragments, such as homologous sequences from distantly related organisms, to highly similar fragments, such as genes that duplicate functional enzymes from closely related organisms. Post-hybridization washes determine stringency conditions.
[0195] A "centimorgan" (cM) or "map unit" is the distance between two polynucleotide sequences, linked genes, markers, target sites, loci, or any pair thereof, wherein 1% of the products of meiosis are recombinant. Thus, a centimorgan is equivalent to a distance equal to a 1% average recombination frequency between the two linked genes, markers, target sites, loci, or any pair thereof.
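By way of non-limiting illustration, the definition above may be expressed as a short Python sketch that converts counts of meiotic products into a map distance. The function name and the example counts are hypothetical. The direct conversion simply restates the definition above (1% recombinant products equals 1 cM) and does not apply any correction for multiple crossovers.

```python
def map_distance_cm(recombinant_products: int, total_products: int) -> float:
    """Map distance in centimorgans, taken directly as the percentage of
    meiotic products that are recombinant between two linked loci."""
    if total_products <= 0:
        raise ValueError("total_products must be positive")
    if not 0 <= recombinant_products <= total_products:
        raise ValueError("recombinant_products must lie between 0 and total_products")
    return 100.0 * recombinant_products / total_products


# Hypothetical counts: 12 recombinant products observed among 400 scored.
distance = map_distance_cm(12, 400)
```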
[0196] An “isolated” or “purified” nucleic acid molecule, polynucleotide, polypeptide, or protein, or biologically active portion thereof, is substantially or essentially free from components that normally accompany or interact with the polynucleotide or protein as found in its naturally occurring environment. Thus, an isolated or purified polynucleotide or polypeptide or protein is substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. Optimally, an “isolated” polynucleotide is free of sequences (optimally protein encoding sequences) that naturally flank the polynucleotide (i.e., sequences located at the 5' and 3' ends of the polynucleotide) in the genomic DNA of the organism from which the polynucleotide is derived. For example, in various embodiments, the isolated polynucleotide can contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb of nucleotide sequence that naturally flanks the polynucleotide in genomic DNA of the cell from which the polynucleotide is derived. Isolated polynucleotides may be purified from a cell in which they naturally occur. Conventional nucleic acid purification methods known to skilled artisans may be used to obtain isolated polynucleotides. The term also embraces recombinant polynucleotides and chemically synthesized polynucleotides.
[0197] The term “fragment” refers to a contiguous set of nucleotides or amino acids. In one embodiment, a fragment is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous nucleotides. In one embodiment, a fragment is 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or greater than 20 contiguous amino acids. A fragment may or may not exhibit the function of a sequence sharing some percent identity over the length of said fragment.
[0198] The terms “fragment that is functionally equivalent” and “functionally equivalent fragment” are used interchangeably herein. These terms refer to a portion or subsequence of an isolated nucleic acid fragment or polypeptide that displays the same activity or function as the longer sequence from which it derives. In one example, the fragment retains the ability to alter gene expression or produce a certain phenotype whether or not the fragment encodes an active protein. For example, the fragment can be used in the design of genes to produce the desired phenotype in a modified plant. Genes can be designed for use in suppression by linking a nucleic acid fragment, whether or not it encodes an active enzyme, in the sense or antisense orientation relative to a plant promoter sequence.
[0199] “Gene” includes a nucleic acid fragment that expresses a functional molecule such as, but not limited to, a specific protein, including regulatory sequences preceding (5’ non-coding sequences) and following (3’ non-coding sequences) the coding sequence. “Native gene” refers to a gene as found in its natural endogenous location with its own regulatory sequences.
[0200] By the term “endogenous” it is meant a sequence or other molecule that naturally occurs in a cell or organism. In one aspect, an endogenous polynucleotide is normally found in the genome of a cell; that is, not heterologous.
[0201] An “allele” is one of several alternative forms of a gene occupying a given locus on a chromosome. When all the alleles present at a given locus on a chromosome are the same, that plant is homozygous at that locus. If the alleles present at a given locus on a chromosome differ, that plant is heterozygous at that locus.
[0202] “Coding sequence” refers to a polynucleotide sequence which codes for a specific amino acid sequence. “Regulatory sequences” refer to nucleotide sequences located upstream (5’ non-coding sequences), within, or downstream (3’ non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include, but are not limited to, promoters, translation leader sequences, 5’ untranslated sequences, 3’ untranslated sequences, introns, polyadenylation target sequences, RNA processing sites, effector binding sites, and stem-loop structures.
[0203] A “mutated gene” is a gene that has been altered through human intervention. Such a “mutated gene” has a sequence that differs from the sequence of the corresponding non-mutated gene by at least one nucleotide addition, deletion, or substitution. In certain embodiments of the disclosure, the mutated gene comprises an alteration that results from a guide polynucleotide/Cas endonuclease system as disclosed herein. A mutated plant is a plant comprising a mutated gene.
[0204] As used herein, a “targeted mutation” is a mutation in a gene (referred to as the target gene), including a native gene, that was made by altering a target sequence within the target gene using any method known to one skilled in the art, including a method involving a guided Cas endonuclease system as disclosed herein.
[0205] The terms “knock-out”, “gene knock-out” and “genetic knock-out” are used interchangeably herein. A knock-out represents a DNA sequence of a cell that has been rendered partially or completely inoperative by targeting with a Cas protein; for example, a DNA sequence prior to knock-out could have encoded an amino acid sequence, or could have had a regulatory function (e.g., promoter).
[0206] The terms “knock-in”, “gene knock-in”, “gene insertion” and “genetic knock-in” are used interchangeably herein. A knock-in represents the replacement or insertion of a DNA sequence at a specific DNA sequence in a cell by targeting with a Cas protein (for example by homologous recombination (HR), wherein a suitable donor DNA polynucleotide is also used). Examples of knock-ins are a specific insertion of a heterologous amino acid coding sequence in a coding region of a gene, or a specific insertion of a transcriptional regulatory element in a genetic locus.
[0207] By “domain” it is meant a contiguous stretch of nucleotides (that can be RNA, DNA, and/or RNA-DNA-combination sequence) or amino acids.
[0208] The term “conserved domain” or “motif” means a set of polynucleotides or amino acids conserved at specific positions along an aligned sequence of evolutionarily related proteins. While amino acids at other positions can vary between homologous proteins, amino acids that are highly conserved at specific positions indicate amino acids that are essential to the structure, the stability, or the activity of a protein. Because they are identified by their high degree of conservation in aligned sequences of a family of protein homologues, they can be used as identifiers, or “signatures”, to determine if a protein with a newly determined sequence belongs to a previously identified protein family.
[0209] A “codon-modified gene” or “codon-preferred gene” or “codon-optimized gene” is a gene having its frequency of codon usage designed to mimic the frequency of preferred codon usage of the host cell.
[0210] An “optimized” polynucleotide is a sequence that has been optimized for improved expression in a particular heterologous host cell.
[0211] An “optimized nucleotide sequence” is a nucleotide sequence that has been optimized for expression in a particular organism. A plant-optimized nucleotide sequence includes a codon-optimized gene. A plant-optimized nucleotide sequence can be synthesized by modifying a nucleotide sequence encoding a protein such as, for example, a Cas endonuclease as disclosed herein, using one or more plant-preferred codons for improved expression. See, for example, Campbell and Gowri (1990) Plant Physiol. 92:1-11 for a discussion of host-preferred codon usage.
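By way of non-limiting illustration, the substitution of preferred codons into a coding sequence may be sketched in Python as follows. The sketch uses the standard genetic code and replaces each codon with a single preferred synonymous codon where one is supplied, which is a simplification of matching a host's full codon usage frequencies. The function names, the example coding sequence, and the preferred-codon table are hypothetical placeholders and do not represent measured codon usage of any plant.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical placeholder preferences (amino acid -> preferred codon).
HYPOTHETICAL_PREFERRED = {"L": "CTG", "A": "GCC", "K": "AAG"}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence whose length is a multiple of three."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict) -> str:
    """Replace each codon with the preferred synonymous codon, if one is given."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        # Codons for amino acids without a listed preference are left unchanged.
        recoded.append(preferred.get(amino_acid, codon))
    return "".join(recoded)


original = "ATGTTAGCTAAATAA"  # hypothetical coding sequence
optimized = recode(original, HYPOTHETICAL_PREFERRED)
# The encoded amino acid sequence is unchanged by the substitution.
assert translate(original) == translate(optimized)
```

Only the nucleotide sequence changes in such a substitution; the encoded amino acid sequence is preserved, which the final assertion checks.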
[0212] A “promoter” is a region of DNA involved in recognition and binding of RNA polymerase and other proteins to initiate transcription. The promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. An “enhancer” is a DNA sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue-specificity of a promoter. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, and/or comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity.
[0213] Promoters that cause a gene to be expressed in most cell types at most times are commonly referred to as “constitutive promoters”. The term “inducible promoter” refers to a promoter that selectively expresses a coding sequence or functional RNA in response to the presence of an endogenous or exogenous stimulus, for example by chemical compounds (chemical inducers) or in response to environmental, hormonal, chemical, and/or developmental signals. Inducible or regulated promoters include, for example, promoters induced or regulated by light, heat, stress, flooding or drought, salt stress, osmotic stress, phytohormones, wounding, or chemicals such as ethanol, abscisic acid (ABA), jasmonate, salicylic acid, or safeners.
[0214] “Translation leader sequence” refers to a polynucleotide sequence located between the promoter sequence of a gene and the coding sequence. The translation leader sequence is present in the mRNA upstream of the translation start sequence. The translation leader sequence may affect processing of the primary transcript to mRNA, mRNA stability or translation efficiency. Examples of translation leader sequences have been described (e.g., Turner and Foster, (1995) Mol Biotechnol 3:225-236).
[0215] “3’ non-coding sequences”, “transcription terminator” or “termination sequences” refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal is usually characterized by affecting the addition of polyadenylic acid tracts to the 3’ end of the mRNA precursor. The use of 3’ non-coding sequences is exemplified by Ingelbrecht et al., (1989) Plant Cell 1:671-680.
[0216] “RNA transcript” refers to the product resulting from RNA polymerase-catalyzed transcription of a DNA sequence. When the RNA transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or pre-mRNA. An RNA transcript is referred to as the mature RNA or mRNA when it is an RNA sequence derived from post-transcriptional processing of the primary transcript pre-mRNA. “Messenger RNA” or “mRNA” refers to the RNA that is without introns and that can be translated into protein by the cell. “cDNA” refers to a DNA that is complementary to, and synthesized from, an mRNA template using the enzyme reverse transcriptase. The cDNA can be single-stranded or converted into double-stranded form using the Klenow fragment of DNA polymerase I. “Sense” RNA refers to RNA transcript that includes the mRNA and can be translated into protein within a cell or in vitro. “Antisense RNA” refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA, and that blocks the expression of a target gene (see, e.g., U.S. Patent No. 5,107,065). The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5’ non-coding sequence, 3’ non-coding sequence, introns, or the coding sequence. “Functional RNA” refers to antisense RNA, ribozyme RNA, or other RNA that may not be translated but yet has an effect on cellular processes. The terms “complement” and “reverse complement” are used interchangeably herein with respect to mRNA transcripts, and are meant to define the antisense RNA of the message.
[0217] As used herein, “poly-guide RNA” refers to a contiguous polynucleotide molecule that comprises a plurality of “discrete” guide RNA components, which are individual guide RNA molecules that are separated from each other, for example by one or more nuclease recognition sequences. In some aspects, the poly-guide RNA is encoded by a poly-DNA molecule comprising discrete DNA sequences that each encode a guide RNA that may form a functional complex with a Cas endonuclease for the targeting, recognition, binding, and optionally nicking or cleaving of one or more target polynucleotide sequence(s). In some aspects, the poly-guide RNA is an RNA molecule comprising discrete guide RNA sequences that each may form a functional complex with a Cas endonuclease for the targeting, recognition, binding, and optionally nicking or cleaving of one or more target polynucleotide sequence(s). Each of the guide RNA DNA and/or RNA sequences within the poly-guide RNA or poly-DNA molecule may be identical, share some percentage of sequence identity with each other, or be non-identical. In some aspects, one or more of the nuclease recognition sequence(s) may be the target of a Cas endoribonuclease. The nuclease recognition sequence enables a “functional interaction” between the poly-guide RNA and an endoribonuclease (for example a Cas endoribonuclease), that is, the endoribonuclease can recognize, bind to, and cleave the poly-guide RNA at the nuclease recognition sequence. In some aspects, one component of the poly-guide RNA is heterologous to another component.
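By way of non-limiting illustration, the arrangement of discrete guide RNA components separated by nuclease recognition sequences may be modeled in Python as follows. This is a string-level sketch only. The function name and all sequences are hypothetical placeholders, and the sketch assumes, purely for simplicity, that cleavage removes the recognition sequence entirely; the position at which an actual endoribonuclease cleaves relative to its recognition sequence is not modeled.

```python
def split_poly_guide(poly_guide_rna: str, recognition_sequence: str) -> list:
    """Return the discrete guide RNA components of a poly-guide RNA.

    Simplifying assumption: each occurrence of the recognition sequence
    is excised in full, leaving only the guide RNA components.
    """
    if not recognition_sequence:
        raise ValueError("recognition_sequence must not be empty")
    # Drop empty pieces produced by leading, trailing, or adjacent sites.
    return [
        part for part in poly_guide_rna.split(recognition_sequence) if part
    ]


# Hypothetical placeholder sequences, not taken from the disclosure.
site = "GGGAAACCC"
guides = ["AUAUAUAU", "UCUCUCUC", "UGUGUGUG"]
poly_guide = site.join(guides)

released = split_poly_guide(poly_guide, site)
```

In this model, a poly-guide RNA carrying three guide RNA components separated by two recognition sequences yields three discrete guide RNAs after processing.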
[0218] The term "genome" refers to the entire complement of genetic material (genes and non-coding sequences) that is present in each cell of an organism, or virus or organelle; and/or a complete set of chromosomes inherited as a (haploid) unit from one parent.
[0219] The term“operably linked” refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is regulated by the other. For example, a promoter is operably linked with a coding sequence when it is capable of regulating the expression of that coding sequence (i.e., the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in a sense or antisense orientation. In another example, the complementary RNA regions can be operably linked, directly or indirectly, 5’ to the target mRNA, or 3’ to the target mRNA, or within the target mRNA, or a first complementary region is 5’ and its complement is 3’ to the target mRNA.
[0220] Generally, “host” refers to an organism or cell into which a heterologous component (polynucleotide, polypeptide, other molecule, cell) has been introduced. As used herein, a “host cell” refers to an in vivo or in vitro eukaryotic cell, prokaryotic cell (e.g., bacterial or archaeal cell), or cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, into which a heterologous polynucleotide or polypeptide has been introduced. In some embodiments, the cell is selected from the group consisting of: an archaeal cell, a bacterial cell, a eukaryotic cell, a eukaryotic single-cell organism, a somatic cell, a germ cell, a stem cell, a plant cell, an algal cell, an animal cell, an invertebrate cell, a vertebrate cell, a fish cell, a frog cell, a bird cell, an insect cell, a mammalian cell, a pig cell, a cow cell, a goat cell, a sheep cell, a rodent cell, a rat cell, a mouse cell, a non-human primate cell, and a human cell. In some cases, the cell is in vitro. In some cases, the cell is in vivo.
[0221] The term “recombinant” refers to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis, or manipulation of isolated segments of nucleic acids by genetic engineering techniques.
[0222] The terms “plasmid”, “vector” and “cassette” refer to a linear or circular extrachromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of double-stranded DNA. Such elements may be autonomously replicating sequences, genome integrating sequences, phage, or nucleotide sequences, in linear or circular form, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a polynucleotide of interest into a cell. “Transformation cassette” refers to a specific vector comprising a gene and having elements in addition to the gene that facilitate transformation of a particular host cell. “Expression cassette” refers to a specific vector comprising a gene and having elements in addition to the gene that allow for expression of that gene in a host. In one aspect, a “Donor DNA cassette” comprises a heterologous polynucleotide to be inserted at the double-strand break site created by a double-strand-break inducing agent (e.g., a Cas endonuclease and guide RNA complex), that is operably linked to a noncoding expression regulatory element. In some aspects, the Donor DNA cassette further comprises polynucleotide sequences that are homologous to the target site, that flank the polynucleotide of interest operably linked to a noncoding expression regulatory element.
[0223] The terms “recombinant DNA molecule”, “recombinant DNA construct”, “expression construct”, “construct”, and “recombinant construct” are used interchangeably herein. A recombinant DNA construct comprises an artificial combination of nucleic acid sequences, e.g., regulatory and coding sequences that are not all found together in nature. For example, a recombinant DNA construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. Such a construct may be used by itself or may be used in conjunction with a vector. If a vector is used, then the choice of vector is dependent upon the method that will be used to introduce the vector into the host cells as is well known to those skilled in the art. For example, a plasmid vector can be used. The skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells. The skilled artisan will also recognize that different independent transformation events may result in different levels and patterns of expression (Jones et al., (1985) EMBO J 4:2411-2418; De Almeida et al., (1989) Mol Gen Genetics 218:78-86), and thus that multiple events are typically screened in order to obtain lines displaying the desired expression level and pattern. Such screening may be accomplished using standard molecular biological, biochemical, and other assays including Southern analysis of DNA, Northern analysis of mRNA expression, PCR, real time quantitative PCR (qPCR), reverse transcription PCR (RT-PCR), immunoblotting analysis of protein expression, enzyme or activity assays, and/or phenotypic analysis.
[0224] The term “heterologous” refers to the difference between the original environment, location, or composition of a particular polynucleotide or polypeptide sequence and its current environment, location, or composition. Non-limiting examples include differences in taxonomic derivation (e.g., a polynucleotide sequence obtained from Zea mays would be heterologous if inserted into the genome of an Oryza sativa plant, or of a different variety or cultivar of Zea mays; or a polynucleotide obtained from a bacterium was introduced into a cell of a plant), or sequence (e.g., a polynucleotide sequence obtained from Zea mays, isolated, modified, and re-introduced into a maize plant). As used herein, “heterologous” in reference to a sequence can refer to a sequence that originates from a different species, variety, foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. For example, a promoter operably linked to a heterologous polynucleotide is from a species different from the species from which the polynucleotide was derived, or, if from the same/analogous species, one or both are substantially modified from their original form and/or genomic locus, or the promoter is not the native promoter for the operably linked polynucleotide. Alternatively, one or more regulatory region(s) and/or a polynucleotide provided herein may be entirely synthetic. In one aspect, a discrete component of a poly-guide RNA molecule is heterologous to at least one other component, i.e., they do not occur together in nature. In another aspect, the endoribonuclease and/or one or more of the discrete guide RNAs of the poly-guide RNA molecule are heterologous to each other. Any one or more of the components of a system may be heterologous with respect to one another, meaning they do not originate from the same organism.
[0225] The term “expression”, as used herein, refers to the production of a functional end-product (e.g., an mRNA, guide RNA, or a protein) in either precursor or mature form.
[0226] A“mature” protein refers to a post-translationally processed polypeptide (i.e., one from which any pre- or propeptides present in the primary translation product have been removed).
[0227] “Precursor” protein refers to the primary product of translation of mRNA (i.e., with pre- and propeptides still present). Pre- and propeptides may be but are not limited to intracellular localization signals.
[0228] “CRISPR” (Clustered Regularly Interspaced Short Palindromic Repeats) loci refers to certain genetic loci encoding components of DNA cleavage systems, for example, used by bacterial and archaeal cells to destroy foreign DNA (Horvath and Barrangou, 2010, Science 327:167-170; WO2007025097, published 01 March 2007). A CRISPR locus can consist of a CRISPR array comprising short direct repeats (CRISPR repeats) separated by short variable DNA sequences (called spacers), which can be flanked by diverse Cas (CRISPR-associated) genes.
[0229] As used herein, an “effector” or “effector protein” is a protein that encompasses an activity including recognizing, binding to, and/or cleaving or nicking a polynucleotide target. An effector, or effector protein, may also be an endonuclease. The “effector complex” of a CRISPR system includes Cas proteins involved in crRNA and target recognition and binding. Some of the component Cas proteins may additionally comprise domains involved in target polynucleotide cleavage.
[0230] The term “Cas protein” refers to a polypeptide encoded by a Cas (CRISPR-associated) gene. A Cas protein includes but is not limited to: a Cas9 protein, a Cpf1 (Cas12) protein, a C2c1 protein, a C2c2 protein, a C2c3 protein, Cas3, Cas3-HD, Cas5, Cas7, Cas8, Cas10, or combinations or complexes of these. A Cas protein may be a “Cas endonuclease” or “Cas effector protein”, that when in complex with a suitable polynucleotide component, is capable of recognizing, binding to, and optionally nicking or cleaving all or part of a specific polynucleotide target sequence. A Cas endonuclease described herein comprises one or more nuclease domains. The endonucleases of the disclosure may include those having one or more RuvC nuclease domains. A Cas protein is further defined as a functional fragment or functional variant of a native Cas protein, or a protein that shares at least 50%, between 50% and 55%, at least 55%, between 55% and 60%, at least 60%, between 60% and 65%, at least 65%, between 65% and 70%, at least 70%, between 70% and 75%, at least 75%, between 75% and 80%, at least 80%, between 80% and 85%, at least 85%, between 85% and 90%, at least 90%, between 90% and 95%, at least 95%, between 95% and 96%, at least 96%, between 96% and 97%, at least 97%, between 97% and 98%, at least 98%, between 98% and 99%, at least 99%, between 99% and 100%, or 100% sequence identity with at least 50, between 50 and 100, at least 100, between 100 and 150, at least 150, between 150 and 200, at least 200, between 200 and 250, at least 250, between 250 and 300, at least 300, between 300 and 350, at least 350, between 350 and 400, at least 400, between 400 and 450, at least 500, or greater than 500 contiguous amino acids of a native Cas protein, and retains at least partial activity.
[0231] A “Cas endonuclease” may comprise domains that enable it to function as a double-strand-break-inducing agent. A “Cas endonuclease” may also comprise one or more modifications or mutations that abolish or reduce its ability to cleave a double-strand polynucleotide (dCas). In some aspects, the Cas endonuclease molecule may retain the ability to nick a single-strand polynucleotide (for example, a D10A mutation in a Cas9 endonuclease molecule) (nCas9).
[0232] A “functional fragment”, “fragment that is functionally equivalent” and “functionally equivalent fragment” of a Cas endonuclease are used interchangeably herein, and refer to a portion or subsequence of the Cas endonuclease of the present disclosure in which the ability to recognize, bind to, and optionally unwind, nick or cleave (introduce a single or double strand break in) the target site is retained. The portion or subsequence of the Cas endonuclease can comprise a complete or partial (functional) peptide of any one of its domains such as, for example, but not limited to, a complete or functional part of a Cas3 HD domain, a complete or functional part of a Cas3 Helicase domain, or a complete or functional part of a Cascade protein (such as but not limited to a Cas5, Cas5d, Cas7 and Cas8b1).
[0233] The terms “functional variant”, “variant that is functionally equivalent” and “functionally equivalent variant” of a Cas endonuclease or Cas effector protein are used interchangeably herein, and refer to a variant of the Cas effector protein disclosed herein in which the ability to recognize, bind to, and optionally unwind, nick or cleave all or part of a target sequence is retained.
[0234] A Cas endonuclease may also include a multifunctional Cas endonuclease. The terms “multifunctional Cas endonuclease” and “multifunctional Cas endonuclease polypeptide” are used interchangeably herein and include reference to a single polypeptide that has Cas endonuclease functionality (comprising at least one protein domain that can act as a Cas endonuclease) and at least one other functionality, such as but not limited to, the functionality to form a cascade (comprises at least a second protein domain that can form a cascade with other proteins). In one aspect, the multifunctional Cas endonuclease comprises at least one additional protein domain relative (either internally, upstream (5’), downstream (3’), or both internally 5’ and 3’, or any combination thereof) to those domains typical of a Cas endonuclease.
[0235] The terms “cascade” and “cascade complex” are used interchangeably herein and include reference to a multi-subunit protein complex that can assemble with a polynucleotide forming a polynucleotide-protein complex (PNP). Cascade is a PNP that relies on the polynucleotide for complex assembly and stability, and for the identification of target nucleic acid sequences. Cascade functions as a surveillance complex that finds and optionally binds target nucleic acids that are complementary to a variable targeting domain of the guide polynucleotide.
[0236] The terms “cleavage-ready Cascade”, “crCascade”, “cleavage-ready Cascade complex”, “crCascade complex”, “cleavage-ready Cascade system”, “CRC” and “crCascade system” are used interchangeably herein and include reference to a multi-subunit protein complex that can assemble with a polynucleotide forming a polynucleotide-protein complex (PNP), wherein one of the cascade proteins is a Cas endonuclease capable of recognizing, binding to, and optionally unwinding, nicking, or cleaving all or part of a target sequence.
[0237] The terms “5’-cap” and “7-methylguanylate (m7G) cap” are used interchangeably herein. A 7-methylguanylate residue is located on the 5’ terminus of messenger RNA (mRNA) in eukaryotes. RNA polymerase II (Pol II) transcribes mRNA in eukaryotes. Messenger RNA capping occurs generally as follows: The most terminal 5’ phosphate group of the mRNA transcript is removed by RNA terminal phosphatase, leaving two terminal phosphates. A guanosine monophosphate (GMP) is added to the terminal phosphate of the transcript by a guanylyl transferase, leaving a 5’-5’ triphosphate-linked guanine at the transcript terminus. Finally, the 7-nitrogen of this terminal guanine is methylated by a methyl transferase.
[0238] The terminology “not having a 5’-cap” herein is used to refer to RNA having, for example, a 5’-hydroxyl group instead of a 5’-cap. Such RNA can be referred to as “uncapped RNA”, for example. Uncapped RNA can better accumulate in the nucleus following transcription, since 5’-capped RNA is subject to nuclear export. One or more RNA components herein are uncapped.
[0239] As used herein, the term “guide polynucleotide” relates to a polynucleotide sequence that can form a complex with a Cas endonuclease, including the Cas endonuclease described herein, and enables the Cas endonuclease to recognize, optionally bind to, and optionally cleave a DNA target site. The guide polynucleotide sequence can be an RNA sequence, a DNA sequence, or a combination thereof (an RNA-DNA combination sequence).
[0240] The terms “functional fragment”, “fragment that is functionally equivalent” and “functionally equivalent fragment” of a guide RNA, crRNA or tracrRNA are used interchangeably herein, and refer to a portion or subsequence of the guide RNA, crRNA or tracrRNA, respectively, of the present disclosure in which the ability to function as a guide RNA, crRNA or tracrRNA, respectively, is retained.
[0241] The terms “functional variant”, “variant that is functionally equivalent” and “functionally equivalent variant” of a guide RNA, crRNA or tracrRNA (respectively) are used interchangeably herein, and refer to a variant of the guide RNA, crRNA or tracrRNA, respectively, of the present disclosure in which the ability to function as a guide RNA, crRNA or tracrRNA, respectively, is retained.
[0242] The terms “single guide RNA” and “sgRNA” are used interchangeably herein and relate to a synthetic fusion of two RNA molecules, a crRNA (CRISPR RNA) comprising a variable targeting domain (linked to a tracr mate sequence that hybridizes to a tracrRNA), fused to a tracrRNA (trans-activating CRISPR RNA). The single guide RNA can comprise a crRNA or crRNA fragment and a tracrRNA or tracrRNA fragment of the type II CRISPR/Cas system that can form a complex with a type II Cas endonuclease, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, optionally bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site.
[0243] The terms “variable targeting domain” and “VT domain” are used interchangeably herein and include a nucleotide sequence that can hybridize (is complementary) to one strand (nucleotide sequence) of a double strand DNA target site. The percent complementation between the first nucleotide sequence domain (VT domain) and the target sequence can be at least 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. The variable targeting domain can be at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 nucleotides in length. In some embodiments, the variable targeting domain comprises a contiguous stretch of 12 to 30 nucleotides. The variable targeting domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence, or any combination thereof.
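The percent complementation between a variable targeting domain and one strand of the target site can be computed as in the following minimal sketch. It assumes, for illustration only, ungapped antiparallel pairing over two sequences of equal length and counts only Watson-Crick pairs (A-T, A-U, G-C); the function name `percent_complementarity` is hypothetical and is not part of the disclosure.

```python
# Watson-Crick pairs accepted by this sketch (DNA or RNA bases on either side).
PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def percent_complementarity(vt_domain: str, target_strand: str) -> float:
    """Percent of VT-domain positions that pair with the target strand.

    Both sequences are written 5'->3'. Hybridization is antiparallel, so the
    target strand is reversed before position-by-position comparison.
    Assumption: equal lengths and no gaps or bulges.
    """
    vt = vt_domain.upper()
    target = target_strand.upper()[::-1]
    if not vt or len(vt) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum((a, b) in PAIRS for a, b in zip(vt, target))
    return 100.0 * paired / len(vt)


if __name__ == "__main__":
    # A 4-nt toy RNA domain against its exact DNA reverse complement.
    print(percent_complementarity("ACGU", "ACGT"))
    # The same domain against a strand with one mismatched position.
    print(percent_complementarity("ACGU", "ACGA"))
```

A value returned by such a function could then be compared against any of the thresholds listed above, e.g., at least 90%.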
[0244] The terms “Cas endonuclease recognition domain” and “CER domain” (of a guide polynucleotide) are used interchangeably herein and include a nucleotide sequence that interacts with a Cas endonuclease polypeptide. A CER domain comprises a (trans-acting) tracrNucleotide mate sequence followed by a tracrNucleotide sequence. The CER domain can be composed of a DNA sequence, an RNA sequence, a modified DNA sequence, a modified RNA sequence (see for example US20150059010A1, published 26 February 2015), or any combination thereof.
[0245] As used herein, the terms “guide polynucleotide/Cas endonuclease complex”, “guide polynucleotide/Cas endonuclease system”, “guide polynucleotide/Cas complex”, “guide polynucleotide/Cas system”, “guided Cas system”, “Polynucleotide-guided endonuclease”, and “PGEN” are used interchangeably herein and refer to at least one guide polynucleotide and at least one Cas endonuclease that are capable of forming a complex, wherein said guide polynucleotide/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site. A guide polynucleotide/Cas endonuclease complex herein can comprise Cas protein(s) and suitable polynucleotide component(s) of any of the known CRISPR systems (Horvath and Barrangou, 2010, Science 327:167-170; Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13).
[0246] The terms “guide RNA/Cas endonuclease complex”, “guide RNA/Cas endonuclease system”, “guide RNA/Cas complex”, “guide RNA/Cas system”, “gRNA/Cas complex”, “gRNA/Cas system”, “RNA-guided endonuclease”, and “RGEN” are used interchangeably herein and refer to at least one RNA component and at least one Cas endonuclease that are capable of forming a complex, wherein said guide RNA/Cas endonuclease complex can direct the Cas endonuclease to a DNA target site, enabling the Cas endonuclease to recognize, bind to, and optionally nick or cleave (introduce a single or double-strand break) the DNA target site.
[0247] The terms “target site”, “target sequence”, “target site sequence”, “target DNA”, “target locus”, “genomic target site”, “genomic target sequence”, “genomic target locus”, “target polynucleotide”, and “protospacer” are used interchangeably herein and refer to a polynucleotide sequence such as, but not limited to, a nucleotide sequence on a chromosome, episome, a locus, or any other DNA molecule in the genome (including chromosomal, chloroplastic, mitochondrial DNA, plasmid DNA) of a cell, at which a guide polynucleotide/Cas endonuclease complex can recognize, bind to, and optionally nick or cleave. The target site can be an endogenous site in the genome of a cell, or alternatively, the target site can be heterologous to the cell and thereby not be naturally occurring in the genome of the cell, or the target site can be found in a heterologous genomic location compared to where it occurs in nature. As used herein, the terms “endogenous target sequence” and “native target sequence” are used interchangeably to refer to a target sequence that is endogenous or native to the genome of a cell and is at the endogenous or native position of that target sequence in the genome of the cell. The terms “artificial target site” and “artificial target sequence” are used interchangeably herein and refer to a target sequence that has been introduced into the genome of a cell. Such an artificial target sequence can be identical in sequence to an endogenous or native target sequence in the genome of a cell but be located in a different position (i.e., a non-endogenous or non-native position) in the genome of a cell.
[0248] A “protospacer adjacent motif” (PAM) herein refers to a short nucleotide sequence adjacent to a target sequence (protospacer) that is recognized (targeted) by a guide polynucleotide/Cas endonuclease system described herein. The Cas endonuclease may not successfully recognize a target DNA sequence if the target DNA sequence is not followed by a PAM sequence. The sequence and length of a PAM herein can differ depending on the Cas protein or Cas protein complex used. The PAM sequence can be of any length but is typically 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 nucleotides long.
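The requirement that a target sequence be followed by a PAM can be illustrated with a minimal scanning sketch. It assumes, for illustration only, a 20-nucleotide protospacer and a 3’ NGG motif (the PAM given elsewhere herein for S. pyogenes Cas9), and it scans only the strand supplied. The function name `find_protospacers` and its parameters are hypothetical.

```python
import re
from typing import List, Tuple


def find_protospacers(dna: str,
                      pam_regex: str = "[ACGT]GG",
                      spacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each site on the given strand.

    Assumptions: the PAM lies immediately 3' of the protospacer, and the
    default pattern encodes an NGG motif. Other Cas proteins would need a
    different pattern and possibly a 5' PAM, which this sketch does not model.
    """
    # A lookahead keeps the match zero-width so overlapping sites are reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2))
            for m in pattern.finditer(dna.upper())]


if __name__ == "__main__":
    toy = "TT" + "ACGTACGTACGTACGTACGT" + "AGG" + "TT"  # toy sequence
    for start, protospacer, pam in find_protospacers(toy):
        print(start, protospacer, pam)
```

Scanning the opposite strand would require applying the same function to the reverse complement of the input.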
[0249] The terms “altered target site”, “altered target sequence”, “modified target site”, and “modified target sequence” are used interchangeably herein and refer to a target sequence as disclosed herein that comprises at least one alteration when compared to a non-altered target sequence. Such “alterations” include, for example: replacement of at least one nucleotide, deletion of at least one nucleotide, insertion of at least one nucleotide, chemical modification of at least one nucleotide, or any combination of the preceding.
[0250] A “modified nucleotide” or “edited nucleotide” refers to a nucleotide sequence of interest that comprises at least one alteration when compared to its non-modified nucleotide sequence. Such “alterations” include, for example: replacement of at least one nucleotide, deletion of at least one nucleotide, insertion of at least one nucleotide, chemical modification of at least one nucleotide, or any combination of the preceding.
[0251] Methods for “modifying a target site” and “altering a target site” are used interchangeably herein and refer to methods for producing an altered target site.
[0252] As used herein, “donor DNA” is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas endonuclease.
[0253] The term “polynucleotide modification template” includes a polynucleotide that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited. A nucleotide modification can be at least one nucleotide substitution, addition or deletion. Optionally, the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.
[0254] The term “plant” generically includes whole plants, plant organs, plant tissues, seeds, plant cells, seeds and progeny of the same. Plant cells include, without limitation, cells from seeds, suspension cultures, embryos, meristematic regions, callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen and microspores.
[0255] A "plant element" or “plant part” is intended to reference either a whole plant or a plant component, which may comprise differentiated and/or undifferentiated tissues, for example but not limited to plant tissues, parts, and cell types. In one embodiment, a plant element is one of the following: whole plant, seedling, meristematic tissue, ground tissue, vascular tissue, dermal tissue, seed, leaf, root, shoot, stem, flower, fruit, stolon, bulb, tuber, corm, keiki, shoot, bud, tumor tissue, and various forms of cells and culture (e.g., single cells, embryos, callus tissue), intact plant cells comprising a cell wall, plant protoplasts (lacking a cell wall), plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like, as well as the parts themselves. Grain is intended to mean the mature seed produced by commercial growers for purposes other than growing or reproducing the species. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the invention, provided that these parts comprise the introduced polynucleotides. The term "plant organ" refers to plant tissue or a group of tissues that constitute a morphologically and functionally distinct part of a plant. As used herein, a "plant element" is synonymous to a "portion" or “part” of a plant, and refers to any part of the plant, and can include distinct tissues and/or organs, and may be used interchangeably with the term "tissue" throughout. Similarly, a "plant reproductive element" is intended to generically reference any part of a plant that is able to initiate other plants via either sexual or asexual reproduction of that plant, for example but not limited to: seed, seedling, root, shoot, cutting, scion, graft, stolon, bulb, tuber, corm, keiki, or bud. The plant element may be in plant or in a plant organ, tissue culture, or cell culture.
[0256] “Progeny” comprises any subsequent generation of a plant.
[0257] The term “monocotyledonous” or “monocot” refers to the subclass of angiosperm plants also known as “monocotyledoneae”, whose seeds typically comprise only one embryonic leaf, or cotyledon. The term includes references to whole plants, plant elements, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, and progeny of the same.
[0258] The term “dicotyledonous” or “dicot” refers to the subclass of angiosperm plants also known as “dicotyledoneae”, whose seeds typically comprise two embryonic leaves, or cotyledons. The term includes references to whole plants, plant elements, plant organs (e.g., leaves, stems, roots, etc.), seeds, plant cells, and progeny of the same.
[0259] As used herein, a "male sterile plant" is a plant that does not produce male gametes that are viable or otherwise capable of fertilization. As used herein, a "female sterile plant" is a plant that does not produce female gametes that are viable or otherwise capable of fertilization. It is recognized that male-sterile and female-sterile plants can be female-fertile and male-fertile, respectively. It is further recognized that a male fertile (but female sterile) plant can produce viable progeny when crossed with a female fertile plant and that a female fertile (but male sterile) plant can produce viable progeny when crossed with a male fertile plant.
[0260] The term “non-conventional yeast” herein refers to any yeast that is not a Saccharomyces (e.g., S. cerevisiae) or Schizosaccharomyces yeast species (see “Non-Conventional Yeasts in Genetics, Biochemistry and Biotechnology: Practical Protocols”, K. Wolf, K.D. Breunig, G. Barth, Eds., Springer-Verlag, Berlin, Germany, 2003).
[0261] The term “crossed” or “cross” or “crossing” in the context of this disclosure means the fusion of gametes via pollination to produce progeny (i.e., cells, seeds, or plants). The term encompasses both sexual crosses (the pollination of one plant by another) and selfing (self-pollination, i.e., when the pollen and ovule (or microspores and megaspores) are from the same plant or genetically identical plants).
[0262] The term “introgression” refers to the transmission of a desired allele of a genetic locus from one genetic background to another. For example, introgression of a desired allele at a specified locus can be transmitted to at least one progeny plant via a sexual cross between two parent plants, where at least one of the parent plants has the desired allele within its genome. Alternatively, for example, transmission of an allele can occur by recombination between two donor genomes, e.g., in a fused protoplast, where at least one of the donor protoplasts has the desired allele in its genome. The desired allele can be, e.g., a transgene, a modified (mutated or edited) native allele, or a selected allele of a marker or QTL.
[0263] The term “isoline” is a comparative term, and references organisms that are genetically identical, but differ in treatment. In one example, two genetically identical maize plant embryos may be separated into two different groups, one receiving a treatment (such as the introduction of a CRISPR-Cas effector endonuclease) and one control that does not receive such treatment. Any phenotypic differences between the two groups may thus be attributed solely to the treatment and not to any inherency of the plant's endogenous genetic makeup.
[0264] "Introducing" is intended to mean presenting to a target, such as a cell or organism, a polynucleotide or polypeptide or polynucleotide-protein complex, in such a manner that the component(s) gains access to the interior of a cell of the organism or to the cell itself.
[0265] A“polynucleotide of interest” includes any nucleotide sequence that
[0266] In some aspects, a “polynucleotide of interest” encodes a protein or polypeptide that is “of interest” for a particular purpose, e.g., a selectable marker. In some aspects a trait or polynucleotide “of interest” is one that improves a desirable phenotype of a plant, particularly a crop plant, i.e., a trait of agronomic interest. Polynucleotides of interest include, but are not limited to, polynucleotides encoding important traits for agronomics, herbicide-resistance, insecticidal resistance, disease resistance, nematode resistance, herbicide resistance, microbial resistance, fungal resistance, viral resistance, fertility or sterility, grain characteristics, commercial products, phenotypic marker, or any other trait of agronomic or commercial importance. A polynucleotide of interest may additionally be utilized in either the sense or anti-sense orientation. Further, more than one polynucleotide of interest may be utilized together, or “stacked”, to provide additional benefit. In some aspects, a “polynucleotide of interest” may encode a gene expression regulatory element, for example a promoter, intron, terminator, 5’UTR, 3’UTR, or other noncoding sequence. In some aspects, a “polynucleotide of interest” may comprise a DNA sequence that encodes an RNA molecule, for example a functional RNA, siRNA, miRNA, or a guide RNA that is capable of interacting with a Cas endonuclease to bind to a target polynucleotide sequence.
[0267] A “complex trait locus” includes a genomic locus that has multiple transgenes genetically linked to each other.
[0268] The compositions and methods herein may provide for an improved "agronomic trait" or "trait of agronomic importance" or “trait of agronomic interest” to a plant, which may include, but not be limited to, the following: disease resistance, drought tolerance, heat tolerance, cold tolerance, salinity tolerance, metal tolerance, herbicide tolerance, improved water use efficiency, improved nitrogen utilization, improved nitrogen fixation, pest resistance, herbivore resistance, pathogen resistance, yield improvement, health enhancement, vigor improvement, growth improvement, photosynthetic capability improvement, nutrition enhancement, altered protein content, altered oil content, increased biomass, increased shoot length, increased root length, improved root architecture, modulation of a metabolite, modulation of the proteome, increased seed weight, altered seed carbohydrate composition, altered seed oil composition, altered seed protein composition, altered seed nutrient composition, as compared to an isoline plant not comprising a modification derived from the methods or compositions herein.
[0269] "Agronomic trait potential" is intended to mean a capability of a plant element for exhibiting a phenotype, preferably an improved agronomic trait, at some point during its life cycle, or conveying said phenotype to another plant element with which it is associated in the same plant.
[0270] The terms "decreased," "fewer," "slower" and "increased," "faster," "enhanced," "greater" as used herein refer to a decrease or increase in a characteristic of the modified plant element or resulting plant compared to an unmodified plant element or resulting plant. For example, a decrease in a characteristic may be at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, between 5% and 10%, at least 10%, between 10% and 20%, at least 15%, at least 20%, between 20% and 30%, at least 25%, at least 30%, between 30% and 40%, at least 35%, at least 40%, between 40% and 50%, at least 45%, at least 50%, between 50% and 60%, at least about 60%, between 60% and 70%, between 70% and 80%, at least 75%, at least about 80%, between 80% and 90%, at least about 90%, between 90% and 100%, at least 100%, between 100% and 200%, at least 200%, at least about 300%, at least about 400% or more lower than the untreated control and an increase may be at least 1%, at least 2%, at least 3%, at least 4%, at least 5%, between 5% and 10%, at least 10%, between 10% and 20%, at least 15%, at least 20%, between 20% and 30%, at least 25%, at least 30%, between 30% and 40%, at least 35%, at least 40%, between 40% and 50%, at least 45%, at least 50%, between 50% and 60%, at least about 60%, between 60% and 70%, between 70% and 80%, at least 75%, at least about 80%, between 80% and 90%, at least about 90%, between 90% and 100%, at least 100%, between 100% and 200%, at least 200%, at least about 300%, at least about 400% or more higher than the untreated control.
[0271] As used herein, the term “before”, in reference to a sequence position, refers to an occurrence of one sequence upstream, or 5’, to another sequence.
[0272] The meaning of abbreviations is as follows: “sec” means second(s), “min” means minute(s), “h” means hour(s), “d” means day(s), “microliters” means microliter(s), “mL” means milliliter(s), “L” means liter(s), “uM” means micromolar, “mM” means millimolar, “M” means molar, “mmol” means millimole(s), “umole” means micromole(s), “g” means gram(s), “micrograms” or “ug” means microgram(s), “ng” means nanogram(s), “U” means unit(s), “bp” means base pair(s) and “kb” means kilobase(s).
Double-Strand-Break (DSB) Inducing Agents
[0273] Double-strand breaks induced by double-strand-break-inducing agents, such as endonucleases that cleave the phosphodiester bond within a polynucleotide chain, can result in the induction of DNA repair mechanisms, including the non-homologous end-joining pathway, and homologous recombination. Endonucleases include a range of different enzymes, including restriction endonucleases (see e.g. Roberts et al., (2003) Nucleic Acids Res 1:418-20, Roberts et al., (2003) Nucleic Acids Res 31:1805-12, and Belfort et al., (2002) in Mobile DNA II, pp. 761-783, Eds. Craigie et al., (ASM Press, Washington, DC)), meganucleases (see e.g., WO 2009/114321; Gao et al. (2010) Plant Journal 1:176-187), TAL effector nucleases or TALENs (see e.g., US20110145940, Christian, M., T. Cermak, et al. 2010. Targeting DNA double-strand breaks with TAL effector nucleases. Genetics 186(2): 757-61 and Boch et al., (2009), Science 326(5959): 1509-12), zinc finger nucleases (see e.g. Kim, Y. G., J. Cha, et al. (1996). “Hybrid restriction enzymes: zinc finger fusions to FokI cleavage”), and CRISPR-Cas endonucleases (see e.g. WO2007/025097 application published March 1, 2007).
[0274] In addition to the double-strand break inducing agents, site-specific base conversions can also be achieved to engineer one or more nucleotide changes to create one or more EMEs described herein into the genome. These include, for example, a site-specific base edit mediated by a C·G to T·A or an A·T to G·C base editing deaminase enzyme (Gaudelli et al., “Programmable base editing of A·T to G·C in genomic DNA without DNA cleavage.” Nature (2017); Nishida et al. “Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems.” Science 353 (6305) (2016); Komor et al. “Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage.” Nature 533 (7603) (2016):420-4).
[0275] Any double-strand-break or -nick or -modification inducing agent may be used for the methods described herein, including for example but not limited to: Cas endonucleases, recombinases, TALENs, zinc finger nucleases, restriction endonucleases, meganucleases, and deaminases.
CRISPR systems
[0276] Methods and compositions are provided for polynucleotide modification with a CRISPR Associated (Cas) endonuclease. Class 1 Cas endonucleases comprise multisubunit effector complexes (Types I, III, and IV), while Class 2 systems comprise single protein effectors (Types II, V, and VI) (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15; Zetsche et al., 2015, Cell 163, 1-13; Shmakov et al., 2015, Molecular Cell 60, 1-13; Haft et al., 2005, Computational Biology, PLoS Comput Biol 1(6): e60; and Koonin et al. 2017, Curr Opinion Microbiology 37:67-78). In Class 2 Type II systems, the Cas endonuclease acts in complex with a guide RNA (gRNA) that directs the Cas endonuclease to the DNA target, enabling target recognition, binding, and cleavage by the Cas endonuclease. In many systems, the Cas endonuclease-guide polynucleotide complex recognizes a short nucleotide sequence adjacent to the target sequence (protospacer), called a “protospacer adjacent motif” (PAM).
[0277] Examples of a Cas endonuclease include but are not limited to Cas9 and Cpf1. Cas9 (formerly referred to as Cas5, Csn1, or Csx12) is a Class 2 Type II Cas endonuclease (Makarova et al. 2015, Nature Reviews Microbiology Vol. 13:1-15). A Cas9-gRNA complex recognizes a 3’ PAM sequence (NGG for the S. pyogenes Cas9) at the target site, permitting the spacer of the guide RNA to invade the double-stranded DNA target, and, if sufficient homology between the spacer and protospacer exists, generate a double-strand break. Cas9 endonucleases comprise RuvC and HNH domains that together produce double strand breaks, and separately can produce single strand breaks. For the S. pyogenes Cas9 endonuclease, the double-strand break leaves a blunt end. Cpf1 is a Class 2 Type V Cas endonuclease, and comprises a RuvC nuclease domain but lacks an HNH domain (Yamane et al., 2016, Cell 165:949-962). Cpf1 endonucleases create “sticky” overhang ends.
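The distinction between a blunt end and a “sticky” overhang end can be expressed as a minimal sketch that compares where each strand is cut. The coordinate convention, the function name `describe_ends`, and the example positions are assumptions made for illustration only; the disclosure does not specify cut positions.

```python
def describe_ends(top_cut: int, bottom_cut: int) -> str:
    """Classify a double-strand break from the two per-strand cut positions.

    Both positions are expressed in the same coordinate system: the number of
    base pairs lying to the left of the cut, counted along the top strand.
    Cuts at the same position give blunt ends; offset cuts leave a
    single-stranded overhang whose length equals the offset.
    """
    if top_cut < 0 or bottom_cut < 0:
        raise ValueError("cut positions must be non-negative")
    offset = abs(top_cut - bottom_cut)
    if offset == 0:
        return "blunt"
    return f"staggered ({offset}-nt overhang)"


if __name__ == "__main__":
    # Hypothetical positions: both strands cut at the same place.
    print(describe_ends(17, 17))
    # Hypothetical positions: strands cut five nucleotides apart.
    print(describe_ends(18, 23))
```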
[0278] Some uses for Cas-gRNA systems at a genomic target site include but are not limited to insertions, deletions, substitutions, or modifications of one or more nucleotides at the target site; modifying or replacing nucleotide sequences of interest (such as a regulatory elements); insertion of polynucleotides of interest; gene knock-out; gene-knock in; modification of splicing sites and/or introducing alternate splicing sites; modifications of nucleotide sequences encoding a protein of interest; amino acid and/or protein fusions; and gene silencing by expressing an inverted repeat into a gene of interest.
[0279] In some aspects, a “polynucleotide modification template” is provided that comprises at least one nucleotide modification when compared to the nucleotide sequence to be edited. A nucleotide modification can be at least one nucleotide substitution, addition, deletion, or chemical alteration. Optionally, the polynucleotide modification template can further comprise homologous nucleotide sequences flanking the at least one nucleotide modification, wherein the flanking homologous nucleotide sequences provide sufficient homology to the desired nucleotide sequence to be edited.
[0280] In some aspects, a polynucleotide of interest is inserted at a target site and provided as part of a “donor DNA” molecule. As used herein, “donor DNA” is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas endonuclease. The donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest. The first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome. The donor DNA can be tethered to the guide polynucleotide. Tethered donor DNAs can allow for co-localizing target and donor DNA, useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells where function of endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10: 957-963). The amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions.
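As a non-limiting sketch of the donor DNA arrangement described above, the following Python example builds a donor in which a polynucleotide of interest is flanked by a first and a second region of homology copied from either side of a break position, and then models the expected integration product. The function names, the use of exact string matching in place of homology-directed repair, and the toy sequences are assumptions for illustration only.

```python
def build_donor(genome, cut_index, insert, arm_len):
    """Flank `insert` with homology arms copied from either side of the cut."""
    if arm_len <= 0 or cut_index - arm_len < 0 or cut_index + arm_len > len(genome):
        raise ValueError("homology arms do not fit within the genome sequence")
    left_arm = genome[cut_index - arm_len:cut_index]
    right_arm = genome[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm


def integrate(genome, donor, arm_len):
    """Model the product of integration by matching both arms exactly.

    Exact matching is a simplification: the text allows the arms to share
    less than complete identity with the genomic regions.
    """
    if arm_len <= 0 or len(donor) < 2 * arm_len:
        raise ValueError("donor is too short for the requested arm length")
    left_arm, right_arm = donor[:arm_len], donor[-arm_len:]
    left = genome.find(left_arm)
    if left < 0:
        raise ValueError("first region of homology not found")
    right = genome.find(right_arm, left + arm_len)
    if right < 0:
        raise ValueError("second region of homology not found")
    # Everything from the start of the left arm to the end of the right arm
    # is replaced by the donor.
    return genome[:left] + donor + genome[right + arm_len:]


if __name__ == "__main__":
    genome = "AAAACCCCGGGGTTTT"
    donor = build_donor(genome, cut_index=8, insert="ATATAT", arm_len=4)
    print(donor)
    print(integrate(genome, donor, arm_len=4))
```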
[0281] The process for editing a genomic sequence at a Cas-gRNA double-strand-break site with a modification template generally comprises: providing a host cell with a Cas-gRNA complex that recognizes a target sequence in the genome of the host cell and is able to induce a single- or double-strand-break in the genomic sequence, and optionally at least one polynucleotide modification template comprising at least one nucleotide alteration when compared to the nucleotide sequence to be edited. The polynucleotide modification template can further comprise nucleotide sequences flanking the at least one nucleotide alteration, in which the flanking sequences are substantially homologous to the chromosomal region flanking the double-strand break. Genome editing using double-strand-break-inducing agents, such as Cas9-gRNA complexes, has been described, for example in US20150082478 published on 19 March 2015, WO2015026886 published on 26 February 2015, WO2016007347 published 14 January 2016, and WO2016025131 published on 18 February 2016.
[0282] To facilitate optimal expression and nuclear localization for eukaryotic cells, the gene comprising the Cas endonuclease may be optimized as described in WO2016186953 published 24 November 2016, and then delivered into cells as DNA expression cassettes by methods known in the art. In some aspects, the Cas endonuclease is provided as a polypeptide. In some aspects, the Cas endonuclease is provided as a polynucleotide encoding a polypeptide. In some aspects, the guide RNA is provided as a DNA molecule encoding one or more RNA molecules. In some aspects, the guide RNA is provided as RNA or chemically-modified RNA. In some aspects, the Cas endonuclease protein and guide RNA are provided as a ribonucleoprotein complex (RNP).
Cas Proteins
[0283] Cas endonucleases, Cas endoribonucleases, Cas effector proteins, and Cascade proteins may all be generally referred to as “Cas proteins”.
[0284] Endonucleases are enzymes that cleave the phosphodiester bond within a polynucleotide chain, and include restriction endonucleases that cleave DNA at specific sites without damaging the bases. Examples of endonucleases include restriction endonucleases, meganucleases, TAL effector nucleases (TALENs), zinc finger nucleases, and Cas (CRISPR-associated) effector endonucleases.
[0285] Cas endonucleases, either as single effector proteins or in an effector complex with other components, unwind the DNA duplex at the target sequence and optionally cleave at least one DNA strand, as mediated by recognition of the target sequence by a polynucleotide (such as, but not limited to, a crRNA or guide RNA) that is in complex with the Cas effector protein. Such recognition and cutting of a target sequence by a Cas endonuclease typically occurs if the correct protospacer-adjacent motif (PAM) is located at or adjacent to the 3' end of the DNA target sequence. Alternatively, a Cas endonuclease herein may lack DNA cleavage or nicking activity, but can still specifically bind to a DNA target sequence when complexed with a suitable RNA component.
[0286] Cas endonucleases may occur as individual effectors (Class 2 CRISPR systems) or as part of larger effector complexes (Class 1 CRISPR systems). Cas endonucleases that have been described include, but are not limited to, for example: Cas3 (a feature of Class 1 type I systems), Cas9 (a feature of Class 2 type II systems) and Cas12 (Cpf1) (a feature of Class 2 type V systems).
[0287] Cas endoribonucleases are a feature of some systems, for example Type I. The Type I-E Cas endoribonuclease is an integral subunit of the Cascade targeting complex that binds and cleaves within each repeat sequence of the precursor crRNA (pre-crRNA) transcript, generating a library of crRNAs wherein each contains a unique spacer sequence flanked by portions of the adjacent repeats (Hochstrasser and Doudna, TIBS 40(1):58-66, 2015).
[0288] Type I-E Cas endonucleases, endoribonucleases, and effector proteins can be used for targeted genome editing (via simplex and multiplex double-strand breaks and nicks) and targeted genome regulation (via tethering of epigenetic effector domains to either the Cas protein or sgRNA). A Cas endonuclease can also be engineered to function as an RNA-guided recombinase, and via RNA tethers could serve as a scaffold for the assembly of multiprotein and nucleic acid complexes (Mali et al., 2013, Nature Methods Vol. 10: 957-963).
[0289] Fragments and variants of Cas endonucleases, endoribonucleases, and effector proteins can be obtained via methods such as site-directed mutagenesis and synthetic construction. Methods for measuring endonuclease activity are well known in the art, such as, but not limited to, those described in WO2013166113 published 07 November 2013, WO2016186953 published 24 November 2016, and WO2016186946 published 24 November 2016.
[0290] A Cas endonuclease, endoribonuclease, or effector protein can comprise a modified form of the Cas polypeptide. The modified form of the Cas polypeptide can include an amino acid change (e.g., deletion, insertion, chemical alteration, or substitution) that reduces the naturally-occurring nuclease activity of the Cas protein. For example, in some instances, the modified form of the Cas protein has less than 50%, less than 40%, less than 30%, less than 20%, less than 10%, less than 5%, or less than 1% of the nuclease activity of the corresponding wild-type Cas polypeptide (US20140068797 published 06 March 2014). In some cases, the modified form of the Cas polypeptide has no substantial nuclease activity and is referred to as catalytically “inactivated Cas” or “deactivated Cas (dCas).” An inactivated Cas/deactivated Cas includes a deactivated Cas endonuclease (dCas). A catalytically inactive Cas effector protein can be fused to a heterologous sequence to induce or modify activity.
[0291] A Cas endonuclease, endoribonuclease, or effector protein can be part of a fusion protein comprising one or more heterologous protein domains (e.g., 1, 2, 3, or more domains in addition to the Cas protein). Such a fusion protein may comprise any additional protein sequence, and optionally a linker sequence between any two domains, such as between Cas and a first heterologous domain. Examples of protein domains that may be fused to a Cas protein herein include, without limitation, epitope tags (e.g., histidine [His], V5, FLAG, influenza hemagglutinin [HA], myc, VSV-G, thioredoxin [Trx]), reporters (e.g., glutathione-S-transferase [GST], horseradish peroxidase [HRP], chloramphenicol acetyltransferase [CAT], beta-galactosidase, beta-glucuronidase [GUS], luciferase, green fluorescent protein [GFP], HcRed, DsRed, cyan fluorescent protein [CFP], yellow fluorescent protein [YFP], blue fluorescent protein [BFP]), and domains having one or more of the following activities: methylase activity, demethylase activity, transcription activation activity (e.g., VP16 or VP64), transcription repression activity, transcription release factor activity, histone modification activity, RNA cleavage activity and nucleic acid binding activity. A Cas protein can also be in fusion with a protein that binds DNA molecules or other molecules, such as maltose binding protein (MBP), S-tag, LexA DNA binding domain (DBD), GAL4A DNA binding domain, and herpes simplex virus (HSV) VP16.
[0292] A catalytically active and/or inactive Cas endonuclease, endoribonuclease, or effector protein can be fused to a heterologous sequence (US20140068797 published 06 March 2014). Suitable fusion partners include, but are not limited to, a polypeptide that provides an activity that indirectly increases transcription by acting directly on the target DNA or on a polypeptide (e.g., a histone or other DNA-binding protein) associated with the target DNA. Additional suitable fusion partners include, but are not limited to, a polypeptide that provides for methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, or demyristoylation activity. Further suitable fusion partners include, but are not limited to, a polypeptide that directly provides for increased transcription of the target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription regulator, etc.).
[0293] The Cas proteins described herein can be expressed and purified by methods known in the art, for example as described in WO/2016/186953 published 24 November 2016.
[0294] A Cas endonuclease, endoribonuclease, or effector protein can comprise a heterologous nuclear localization sequence (NLS). A heterologous NLS amino acid sequence herein may be of sufficient strength to drive accumulation of a Cas protein in a detectable amount in the nucleus of a yeast cell herein, for example. An NLS may comprise one (monopartite) or more (e.g., bipartite) short sequences (e.g., 2 to 20 residues) of basic, positively charged residues (e.g., lysine and/or arginine), and can be located anywhere in a Cas amino acid sequence but such that it is exposed on the protein surface. An NLS may be operably linked to the N-terminus or C-terminus of a Cas protein herein, for example. Two or more NLS sequences can be linked to a Cas protein, for example, such as on both the N- and C-termini of a Cas protein. The Cas endonuclease gene can be operably linked to a SV40 nuclear targeting signal upstream of the Cas codon region and a bipartite VirD2 nuclear localization signal (Tinland et al. (1992) Proc. Natl. Acad. Sci. USA 89:7442-6) downstream of the Cas codon region. Non-limiting examples of suitable NLS sequences herein include those disclosed in U.S. Patent Nos. 6,660,830 and 7,309,576.
Guide polynucleotides
[0295] The guide polynucleotide molecule comprises a Cas endonuclease recognition (CER) domain that interacts with the Cas endonuclease, and a Variable Targeting (VT) domain that hybridizes to a nucleotide sequence in a target DNA. In some aspects, the guide, such as a gRNA, comprises a CRISPR RNA (crRNA) and a trans-activating CRISPR RNA (tracrRNA) to guide the Cas endonuclease to its DNA target. The crRNA comprises a spacer region complementary to one strand of the double strand DNA target and a region that base pairs with the tracrRNA, forming an RNA duplex. In some aspects, the gRNA is a “single guide RNA” (sgRNA) that comprises a synthetic fusion of crRNA and tracrRNA.
[0296] A targeting method herein can be performed in such a way that two or more DNA target sites are targeted in the method, for example. Such a method can optionally be characterized as a “multiplex” method. Two, three, four, five, six, seven, eight, nine, ten, or more target sites can be targeted at the same time in certain embodiments. A multiplex method is typically performed by a targeting method herein in which multiple different RNA components are provided, each designed to guide a guide polynucleotide/Cas endonuclease complex to a unique DNA target site.
[0297] Described herein are methods and compositions for utilizing a poly-guide RNA to provide multiple guide RNAs to a target polynucleotide sequence for editing a plurality of target polynucleotides. The guide RNAs are separated by a Cas endoribonuclease binding/cleavage site, such that a contiguous RNA sequence may comprise a plurality of gRNA sequences, for example for forming a complex with a Cas endonuclease for genomic target site cleavage. The plurality of gRNA sequences in the contiguous RNA sequence may comprise identical, similar, or dissimilar gRNA sequences, or any combination thereof.
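The arrangement of a poly-guide RNA as a contiguous RNA in which guide RNAs are separated by an endoribonuclease cleavage site can be illustrated, without limitation, by the short Python sketch below. The cleavage-site sequence shown is a hypothetical placeholder, the function names are illustrative, and processing is modeled as complete removal of the site; this is a simplification, since a native endoribonuclease may leave portions of the repeat on each processed guide.

```python
# Hypothetical placeholder; not a real endoribonuclease recognition sequence.
CLEAVAGE_SITE = "CUGAUGAC"


def build_poly_guide(guides, site=CLEAVAGE_SITE):
    """Join guide RNAs into one contiguous RNA, separated by the cleavage site."""
    guides = list(guides)
    poly_guide = site.join(guides)
    # Reject designs where the site also occurs inside a guide or across a
    # guide/site junction, since those would be processed incorrectly.
    if poly_guide.split(site) != guides:
        raise ValueError("cleavage site occurs outside the intended separators")
    return poly_guide


def process_poly_guide(poly_guide, site=CLEAVAGE_SITE):
    """Model endoribonuclease processing as cutting out every cleavage site."""
    return [guide for guide in poly_guide.split(site) if guide]


if __name__ == "__main__":
    # Identical, similar, or dissimilar guides may be combined.
    guides = ["GAUUACAGAUUACAGAUUAC", "CCGGAAUUCCGGAAUUCCGG", "GAUUACAGAUUACAGAUUAC"]
    poly = build_poly_guide(guides)
    print(poly)
    print(process_poly_guide(poly))
```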
[0298] The poly-guide RNA (polygRNA) may be used in vitro or in vivo, for example as part of an RNA sequence that comprises more than one guide RNA for a Cas endonuclease. The poly-guide RNA may be introduced to a cell, for example, as a component of a DNA vector comprising a DNA sequence that can be transcribed into the poly-guide RNA. Alternatively, the poly-guide RNA may be introduced directly as a polyribonucleotide. The method of introduction to a cell may be by any means known in the art, for example but not limited to Agrobacterium or Ochrobactrum transformation, or by particle bombardment.
[0299] The polygRNA may be provided to a target cell or target polynucleotide by any method known in the art, including but not limited to: direct contact in a solution, delivery on a solid matrix such as a particle (microparticle or nanoparticle or silicon carbide“whiskers”), via a liposome, or as part of a recombinant vector. The polygRNA may be provided as a DNA sequence encoding an RNA sequence, or provided directly as an RNA sequence, or provided as a combination RNA-DNA sequence.
[0300] The Cas endoribonuclease (RN) may be provided to the polygRNA, or to a target cell, by any method known in the art, including but not limited to: direct contact in a solution, delivery on a solid matrix such as a particle (microparticle or nanoparticle or silicon carbide “whiskers”), via a liposome, or as part of a recombinant vector. The RN may be provided as a DNA sequence encoding an RNA sequence that can optionally be translated into the RN protein, or provided as an RNA sequence that may be optionally translated into the RN protein, or provided as an RN protein.
Double-Strand-Break Repair and Polynucleotide Modification
[0301] Once a double-strand break is induced in the genome, cellular DNA repair mechanisms are activated to repair the break.
[0302] A double-strand-break-inducing agent, such as a guided Cas endonuclease, can recognize and bind to a DNA target sequence and introduce a single-strand (nick) or double-strand break. Once a single or double-strand break is induced in the DNA, the cell’s DNA repair mechanism is activated to repair the break, for example via nonhomologous end-joining (NHEJ) or Homology-Directed Repair (HDR) processes which can lead to modifications at the target site. The most common repair mechanism to bring the broken ends together is the nonhomologous end-joining (NHEJ) pathway (Bleuyard et al., (2006) DNA Repair 5: 1-12). The structural integrity of chromosomes is typically preserved by the repair, but deletions, insertions, or other rearrangements (such as chromosomal translocations) are possible (Siebert and Puchta, 2002, Plant Cell 14: 1121-31; Pacher et al., 2007, Genetics 175:21-9). NHEJ is often error-prone and can introduce small mutations in the target site. In plants, NHEJ is often the preferred pathway by which DSBs are remediated.
[0303] Modification of a target polynucleotide includes any one or more of the following: insertion of at least one nucleotide, deletion of at least one nucleotide, chemical alteration of at least one nucleotide, replacement of at least one nucleotide, or mutation of at least one nucleotide. In some aspects, the DNA repair mechanism creates an imperfect repair of the double-strand break, resulting in a change of a nucleotide at the break site. In some aspects, a polynucleotide template may be provided to the break site, wherein the repair results in a template-directed repair of the break. In some aspects, a donor polynucleotide may be provided to the break site, wherein the repair results in the incorporation of the donor polynucleotide into the break site.
[0304] In some aspects, the methods and compositions described herein improve the probability of a non-NHEJ repair mechanism outcome at a DSB. In one aspect, an increase of the HDR to NHEJ repair ratio is effected.
Homology-Directed Repair and Homologous Recombination
[0305] Homology-directed repair (HDR) is a mechanism in cells to repair double-stranded and single-stranded DNA breaks. Homology-directed repair includes homologous recombination (HR) and single-strand annealing (SSA) (Lieber. 2010 Annu. Rev. Biochem. 79: 181-211). The most common form of HDR is called homologous recombination (HR), which has the longest sequence homology requirements between the donor and acceptor DNA. Other forms of HDR include single-stranded annealing (SSA) and breakage-induced replication, and these require shorter sequence homology relative to HR. Homology-directed repair at nicks (single-stranded breaks) can occur via a mechanism distinct from HDR at double-strand breaks (Davis and Maizels. PNAS (0027-8424), 111 (10), p. E924-E932).
[0306] By “homology” is meant DNA sequences that are similar. For example, a “region of homology to a genomic region” that is found on the donor DNA is a region of DNA that has a similar sequence to a given “genomic region” in the cell or organism genome. A region of homology can be of any length that is sufficient to promote homologous recombination at the cleaved target site. For example, the region of homology can comprise at least 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, 5-60, 5-65, 5-70, 5-75, 5-80, 5-85, 5-90, 5-95, 5-100, 5-200, 5-300, 5-400, 5-500, 5-600, 5-700, 5-800, 5-900, 5-1000, 5-1100, 5-1200, 5-1300, 5-1400, 5-1500, 5-1600, 5-1700, 5-1800, 5-1900, 5-2000, 5-2100, 5-2200, 5-2300, 5-2400, 5-2500, 5-2600, 5-2700, 5-2800, 5-2900, 5-3000, 5-3100 or more bases in length such that the region of homology has sufficient homology to undergo homologous recombination with the corresponding genomic region. “Sufficient homology” indicates that two polynucleotide sequences share structural similarity to act as substrates for a homologous recombination reaction. The structural similarity includes overall length of each polynucleotide fragment, as well as the sequence similarity of the polynucleotides. Sequence similarity can be described by the percent sequence identity over the whole length of the sequences, and/or by conserved regions comprising localized similarities such as contiguous nucleotides having 100% sequence identity, and percent sequence identity over a portion of the length of the sequences.
[0307] The amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target site. These ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps. The amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity of about at least 50%, 55%, 60%, 65%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity, for example sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus. Sufficient homology can also be described by the predicted ability of two polynucleotides to specifically hybridize under high stringency conditions, see, for example, Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, NY); Current Protocols in Molecular Biology, Ausubel et al., Eds (1994) Current Protocols, (Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.); and, Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, (Elsevier, New York).
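The example criterion given above, a region of 75-150 bp having at least 80% sequence identity to a region of the target locus, can be expressed as a short calculation. The following Python sketch is offered as a non-limiting illustration; it assumes the two sequences are already aligned without gaps and are of equal length, and the function names and default thresholds are illustrative choices based on that one example rather than a required definition of sufficient homology.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over the full length of two ungapped, aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def has_sufficient_homology(donor_region, genomic_region,
                            min_len=75, max_len=150, min_identity=80.0):
    """Apply the length-plus-identity example: 75-150 bp at >= 80% identity."""
    if not min_len <= len(donor_region) <= max_len:
        return False
    return percent_identity(donor_region, genomic_region) >= min_identity


if __name__ == "__main__":
    genomic = "ACGT" * 25  # 100 bp toy genomic region
    # Change every one of the first 20 bases so that exactly 80 of 100 match.
    donor = "".join("C" if base == "A" else "A" for base in genomic[:20]) + genomic[20:]
    print(percent_identity(donor, genomic))
    print(has_sufficient_homology(donor, genomic))
```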
[0308] DNA double-strand breaks can be an effective factor to stimulate homologous recombination pathways (Puchta et al., (1995) Plant Mol Biol 28:281-92; Tzfira and White, (2005) Trends Biotechnol 23:567-9; Puchta, (2005) J Exp Bot 56: 1-14). Using DNA-breaking agents, a two- to nine-fold increase of homologous recombination was observed between artificially constructed homologous DNA repeats in plants (Puchta et al., (1995) Plant Mol Biol 28:281-92). In maize protoplasts, experiments with linear DNA molecules demonstrated enhanced homologous recombination between plasmids (Lyznik et al., (1991) Mol Gen Genet 230:209-18).
[0309] Alteration of the genome of a prokaryotic or eukaryotic cell or organism, for example, through homologous recombination (HR), is a powerful tool for genetic engineering. Homologous recombination has been demonstrated in plants (Halfter et al., (1992) Mol Gen Genet 231:186-93) and insects (Dray and Gloor, 1997, Genetics 147:689-99). Homologous recombination has also been accomplished in other organisms. For example, at least 150-200 bp of homology was required for homologous recombination in the parasitic protozoan Leishmania (Papadopoulou and Dumas, (1997) Nucleic Acids Res 25:4278-86). In the filamentous fungus Aspergillus nidulans, gene replacement has been accomplished with as little as 50 bp flanking homology (Chaveroche et al., (2000) Nucleic Acids Res 28:e97). Targeted gene replacement has also been demonstrated in the ciliate Tetrahymena thermophila (Gaertig et al., (1994) Nucleic Acids Res 22:5391-8). In mammals, homologous recombination has been most successful in the mouse using pluripotent embryonic stem cell lines (ES) that can be grown in culture, transformed, selected and introduced into a mouse embryo (Watson et al., 1992, Recombinant DNA, 2nd Ed., Scientific American Books distributed by WH Freeman & Co.).
Improving the Probability of HDR in DSB Repair
[0310] Methods and compositions for encouraging the repair of a double strand break via HDR are contemplated.
[0311] In some aspects, the fraction of HR reads relative to the number of total mutant reads (NHEJ + HR) is at least 2, 3, 4, 5, 6, 7, 8, 9, 10, between 10 and 15, 15, between 15 and 20, 20, between 20 and 25, 25, between 25 and 30, 30, between 30 and 40, 40, between 40 and 50, 50, between 50 and 60, 60, between 60 and 70, 70, between 70 and 80, 80, between 80 and 90, 90, between 90 and 100, 100, between 100 and 125, 125, between 125 and 150, greater than 150, or infinitely greater than that observed for a single cleavage strategy.
[0312] In some aspects, the percent of HR reads relative to the number of total mutant reads (NHEJ + HR) is at least 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, 31%, 32%, 33%, 34%, 35%, 36%, 37%, 38%, 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100%.
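For clarity, the read-based quantities referred to above can be computed as shown in the following non-limiting Python sketch: the percent of HR reads relative to total mutant reads (NHEJ + HR), and the fold change of that percent relative to a single cleavage strategy. The function names and the read counts in the usage example are hypothetical and are not experimental results.

```python
def hr_percent(hr_reads, nhej_reads):
    """Percent of HR reads relative to total mutant reads (NHEJ + HR)."""
    total = hr_reads + nhej_reads
    if hr_reads < 0 or nhej_reads < 0 or total == 0:
        raise ValueError("read counts must be non-negative with at least one mutant read")
    return 100.0 * hr_reads / total


def fold_change(test_percent, single_cleavage_percent):
    """Fold change of the HR percent relative to a single cleavage strategy.

    A baseline of zero with a non-zero test value is reported as infinity,
    mirroring the 'infinitely greater' case in the text.
    """
    if single_cleavage_percent == 0:
        if test_percent == 0:
            raise ValueError("fold change is undefined when both values are zero")
        return float("inf")
    return test_percent / single_cleavage_percent


if __name__ == "__main__":
    # Hypothetical read counts for illustration only.
    test = hr_percent(hr_reads=25, nhej_reads=75)
    baseline = hr_percent(hr_reads=5, nhej_reads=95)
    print(test, baseline, fold_change(test, baseline))
```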
Gene Targeting
[0313] The compositions and methods described herein can be used for gene targeting.
[0314] In general, DNA targeting can be performed by cleaving one or both strands at a specific polynucleotide sequence in a cell with a Cas endonuclease associated with a suitable guide polynucleotide component. Once a single or double-strand break is induced in the DNA, the cell’s DNA repair mechanism is activated to repair the break via nonhomologous end-joining (NHEJ) or Homology-Directed Repair (HDR) processes which can lead to modifications at the target site.
[0315] The length of the DNA sequence at the target site can vary, and includes, for example, target sites that are at least 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more than 30 nucleotides in length. It is further possible that the target site can be palindromic, that is, the sequence on one strand reads the same in the opposite direction on the complementary strand. The nick/cleavage site can be within the target sequence or the nick/cleavage site could be outside of the target sequence. In another variation, the cleavage could occur at nucleotide positions immediately opposite each other to produce a blunt end cut or, in other cases, the incisions could be staggered to produce single-stranded overhangs, also called “sticky ends”, which can be either 5' overhangs, or 3' overhangs. Active variants of genomic target sites can also be used. Such active variants can comprise at least 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the given target site, wherein the active variants retain biological activity and hence are capable of being recognized and cleaved by a Cas endonuclease.
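The definition of a palindromic target site given above, in which the sequence on one strand reads the same in the opposite direction on the complementary strand, is equivalent to the sequence being equal to its own reverse complement. A minimal, non-limiting Python sketch of that check follows; the function names are illustrative and only the unambiguous bases A, C, G, and T are handled.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site):
    """True if the site reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)


if __name__ == "__main__":
    for site in ("GAATTC", "GATTACA"):
        print(site, reverse_complement(site), is_palindromic(site))
```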
[0316] Assays to measure the single or double-strand break of a target site by an endonuclease are known in the art and generally measure the overall activity and specificity of the agent on DNA substrates comprising recognition sites.
Gene Editing
[0317] The process for editing a genomic sequence combining DSB and modification templates generally comprises: introducing into a host cell a DSB-inducing agent, or a nucleic acid encoding a DSB-inducing agent, that recognizes a target sequence in the chromosomal sequence and is able to induce a DSB in the genomic sequence, and at least one polynucleotide modification template comprising at least one nucleotide alteration when compared to the nucleotide sequence to be edited. The polynucleotide modification template can further comprise nucleotide sequences flanking the at least one nucleotide alteration, in which the flanking sequences are substantially homologous to the chromosomal region flanking the DSB. Genome editing using DSB-inducing agents, such as Cas-gRNA complexes, has been described, for example in US20150082478 published on 19 March 2015, WO2015026886 published on 26 February 2015, WO2016007347 published 14 January 2016, and WO2016025131 published on 18 February 2016.
[0318] Some uses for guide RNA/Cas endonuclease systems have been described (see for example: US20150082478 A1 published 19 March 2015, WO2015026886 published 26 February 2015, and US20150059010 published 26 February 2015) and include but are not limited to modifying or replacing nucleotide sequences of interest (such as a regulatory element), insertion of polynucleotides of interest, gene drop-out, gene knock-out, gene knock-in, modification of splicing sites and/or introducing alternate splicing sites, modifications of nucleotide sequences encoding a protein of interest, amino acid and/or protein fusions, and gene silencing by expressing an inverted repeat into a gene of interest.
[0319] Proteins may be altered in various ways including amino acid substitutions, deletions, truncations, and insertions. Methods for such manipulations are generally known. For example, amino acid sequence variants of the protein(s) can be prepared by mutations in the DNA. Methods for mutagenesis and nucleotide sequence alterations include, for example, Kunkel, (1985) Proc. Natl. Acad. Sci. USA 82:488-92; Kunkel et al., (1987) Meth Enzymol 154:367-82; U.S. Patent No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in Molecular Biology (MacMillan Publishing Company, New York) and the references cited therein. Guidance regarding amino acid substitutions not likely to affect biological activity of the protein is found, for example, in the model of Dayhoff et al., (1978) Atlas of Protein Sequence and Structure (Natl Biomed Res Found, Washington, D.C.). Conservative substitutions, such as exchanging one amino acid with another having similar properties, may be preferable. Conservative deletions, insertions, and amino acid substitutions are not expected to produce radical changes in the characteristics of the protein, and the effect of any substitution, deletion, insertion, or combination thereof can be evaluated by routine screening assays. Assays for double-strand-break-inducing activity are known and generally measure the overall activity and specificity of the agent on DNA substrates comprising target sites.
[0320] Described herein are methods for genome editing with Cleavage Ready Cascade (crCascade) Complexes. Following characterization of the guide RNA and PAM sequence, components of the cleavage ready Cascade (crCascade) complex and associated CRISPR RNA (crRNA) may be utilized to modify chromosomal DNA in other organisms including plants. To facilitate optimal expression and nuclear localization (for eukaryotic cells), the genes comprising the crCascade may be optimized as described in WO2016186953 published 24 November 2016, and then delivered into cells as DNA expression cassettes by methods known in the art. The components necessary to comprise an active crCascade complex may also be delivered as RNA with or without modifications that protect the RNA from degradation or as mRNA capped or uncapped (Zhang, Y. et al., 2016, Nat. Commun. 7:12617) or Cas protein guide polynucleotide complexes (WO2017070032 published 27 April 2017), or any combination thereof. Additionally, a part or part(s) of the crCascade complex and crRNA may be expressed from a DNA construct while other components are delivered as RNA with or without modifications that protect the RNA from degradation or as mRNA capped or uncapped (Zhang et al. 2016 Nat. Commun. 7:12617) or Cas protein guide polynucleotide complexes (WO2017070032 published 27 April 2017) or any combination thereof. To produce crRNAs in vivo, tRNA derived elements may also be used to recruit endogenous RNases to cleave crRNA transcripts into mature forms capable of guiding the crCascade complex to its DNA target site, as described, for example, in WO2017105991 published 22 June 2017. crCascade nickase complexes may be utilized separately or concertedly to generate a single or multiple DNA nicks on one or both DNA strands. Furthermore, the cleavage activity of the Cas endonuclease may be deactivated by altering key catalytic residues in its cleavage domain (Sinkunas, T. et al., 2013, EMBO J. 32:385-394) resulting in an RNA guided helicase that may be used to enhance homology-directed repair, induce transcriptional activation, or remodel local DNA structures. Moreover, the activity of the Cas cleavage and helicase domains may both be knocked-out and used in combination with other DNA cutting, DNA nicking, DNA binding, transcriptional activation, transcriptional repression, DNA remodeling, DNA deamination, DNA unwinding, DNA recombination enhancing, DNA integration, DNA inversion, and DNA repair agents.
[0321] The transcriptional direction of the tracrRNA for the CRISPR-Cas system (if present) and other components of the CRISPR-Cas system (such as variable targeting domain, crRNA repeat, loop, anti-repeat) can be deduced as described in WO2016186946 published 24 November 2016, and WO2016186953 published 24 November 2016.
[0322] As described herein, once the appropriate guide RNA requirement is established, the PAM preferences for each new system disclosed herein may be examined. If the cleavage ready Cascade (crCascade) complex results in degradation of the randomized PAM library, the crCascade complex can be converted into a nickase by disabling the ATPase dependent helicase activity either through mutagenesis of critical residues or by assembling the reaction in the absence of ATP as described previously (Sinkunas, T. et al., 2013, EMBO J. 32:385-394). Two regions of PAM randomization separated by two protospacer targets may be utilized to generate a double-stranded DNA break which may be captured and sequenced to examine the PAM sequences that support cleavage by the respective crCascade complex.
[0323] In one embodiment, the invention describes a method for modifying a target site in the genome of a cell, the method comprising introducing into a cell at least one Cas endonuclease and guide RNA, and identifying at least one cell that has a modification at the target site.
[0324] The nucleotide to be edited can be located within or outside a target site recognized and cleaved by a Cas endonuclease. In one embodiment, the at least one nucleotide modification is not a modification at a target site recognized and cleaved by a Cas endonuclease. In another embodiment, there are at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 30, 40, 50, 100, 200, 300, 400, 500, 600, 700, 900 or 1000 nucleotides between the at least one nucleotide to be edited and the genomic target site.
[0325] A knock-out may be produced by an indel (insertion or deletion of nucleotide bases in a target DNA sequence through NHEJ), or by specific removal of sequence that reduces or completely destroys the function of sequence at or near the targeting site.
[0326] A guide polynucleotide/Cas endonuclease induced targeted mutation can occur in a nucleotide sequence that is located within or outside a genomic target site that is recognized and cleaved by the Cas endonuclease.
[0327] The method for editing a nucleotide sequence in the genome of a cell can be a method that does not use an exogenous selectable marker and instead relies on restoring function to a non-functional gene product.
[0328] In one embodiment, the invention describes a method for modifying a target site in the genome of a cell, the method comprising introducing into a cell at least one PGEN described herein and at least one donor DNA, wherein said donor DNA comprises a polynucleotide of interest, and optionally, further comprising identifying at least one cell that has said polynucleotide of interest integrated in or near said target site.
[0329] In one aspect, the methods disclosed herein may employ homologous recombination (HR) to provide integration of the polynucleotide of interest at the target site.
[0330] Various methods and compositions can be employed to produce a cell or organism having a polynucleotide of interest inserted in a target site via activity of a CRISPR-Cas system component described herein. In one method described herein, a polynucleotide of interest is introduced into the organism cell via a donor DNA construct. As used herein, “donor DNA” is a DNA construct that comprises a polynucleotide of interest to be inserted into the target site of a Cas endonuclease. The donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest. The first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome.
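The donor DNA layout described above (a first region of homology, the polynucleotide of interest, then a second region of homology) can be illustrated with a minimal sketch. The function name `build_donor`, the toy genome string, the cut position, and the arm length are all hypothetical and chosen only for illustration; the sketch assumes the two homology arms are copied from the genomic sequence immediately flanking the intended cut position.

```python
def build_donor(genome: str, cut_index: int, insert: str, arm_len: int) -> str:
    """Return first homology arm + polynucleotide of interest + second homology arm.

    Hypothetical helper: the arms are taken from the arm_len bases immediately
    upstream and downstream of cut_index in the genomic sequence.
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if cut_index - arm_len < 0 or cut_index + arm_len > len(genome):
        raise ValueError("homology arms would run off the genomic sequence")
    first_arm = genome[cut_index - arm_len:cut_index]    # shares homology with the first genomic region
    second_arm = genome[cut_index:cut_index + arm_len]   # shares homology with the second genomic region
    return first_arm + insert + second_arm


# Toy example (illustrative sequences only).
genome = "ACGTTGCAAGGCTTAACCGGTTAGC"
donor = build_donor(genome, cut_index=12, insert="GGGAAACCC", arm_len=6)
print(donor)
```

In practice the arms would be far longer than six bases; the length ranges recited in this disclosure apply.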
[0331] The donor DNA can be tethered to the guide polynucleotide. Tethered donor DNAs can allow for co-localizing target and donor DNA, useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells where function of endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10: 957-963).
[0332] The amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target site. These ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps. The amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity of about at least 50%, 55%, 60%, 65%, 70%, 71%, 72%,
73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100%. Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity; for example, sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus. Sufficient homology can also be described by the predicted ability of two polynucleotides to specifically hybridize under high stringency conditions; see, for example, Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, NY); Current Protocols in Molecular Biology, Ausubel et al., Eds (1994) Current Protocols, (Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.); and Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, (Elsevier, New York).
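The example criterion above (a region of 75-150 bp having at least 80% sequence identity to a region of the target locus) can be sketched as a simple check. This is a minimal illustration, assuming the two sequences have already been aligned without gaps and are of equal length; the function names and the toy sequences are hypothetical, and a real analysis would use an alignment tool rather than a position-by-position comparison.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def is_sufficient_homology(arm: str, region: str,
                           min_len: int = 75, max_len: int = 150,
                           min_identity: float = 80.0) -> bool:
    """Apply the illustrative 75-150 bp / at least 80% identity criterion."""
    return min_len <= len(arm) <= max_len and percent_identity(arm, region) >= min_identity


# Toy 100 bp target region and two candidate homology regions.
region = "ACGT" * 25
arm_90 = "N" * 10 + region[10:]   # 10 mismatches in 100 positions
arm_75 = "N" * 25 + region[25:]   # 25 mismatches in 100 positions

print(percent_identity(arm_90, region), is_sufficient_homology(arm_90, region))
print(percent_identity(arm_75, region), is_sufficient_homology(arm_75, region))
```

The thresholds are parameters so that any of the other length and identity combinations recited above can be substituted.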
[0333] Episomal DNA molecules can also be ligated into the double-strand break, for example, integration of T-DNAs into chromosomal double-strand breaks (Chilton and Que, (2003) Plant Physiol 133:956-65; Salomon and Puchta, (1998) EMBO J. 17:6086-95). Once the sequence around the double-strand breaks is altered, for example, by exonuclease activities involved in the maturation of double-strand breaks, gene conversion pathways can restore the original structure if a homologous sequence is available, such as a homologous chromosome in non-dividing somatic cells, or a sister chromatid after DNA replication (Molinier et al., (2004) Plant Cell 16:342-52). Ectopic and/or epigenic DNA sequences may also serve as a DNA repair template for homologous recombination (Puchta, (1999) Genetics 152:1173-81).
[0334] In one embodiment, the disclosure comprises a method for editing a nucleotide sequence in the genome of a cell, the method comprising introducing into a cell at least one PGEN described herein, and a polynucleotide modification template, wherein said polynucleotide modification template comprises at least one nucleotide modification of said nucleotide sequence, and optionally further comprising selecting at least one cell that comprises the edited nucleotide sequence.
[0335] The guide polynucleotide/Cas endonuclease system can be used in combination with at least one polynucleotide modification template to allow for editing (modification) of a genomic nucleotide sequence of interest. (See also US20150082478, published 19 March 2015 and WO2015026886 published 26 February 2015).
[0336] Polynucleotides of interest and/or traits can be stacked together in a complex trait locus as described in WO2012129373 published 27 September 2012, and in WO2013112686, published 01 August 2013. The guide polynucleotide/Cas9 endonuclease system described herein provides for an efficient system to generate double-strand breaks and allows for traits to be stacked in a complex trait locus.
[0337] A guide polynucleotide/Cas system as described herein, mediating gene targeting, can be used in methods for directing heterologous gene insertion and/or for producing complex trait loci comprising multiple heterologous genes in a fashion similar to that disclosed in WO2012129373 published 27 September 2012, where instead of using a double-strand break inducing agent to introduce a gene of interest, a guide polynucleotide/Cas system as disclosed herein is used. By inserting independent transgenes within 0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2, or even 5 centimorgans (cM) from each other, the transgenes can be bred as a single genetic locus (see, for example, US20130263324 published 03 October 2013 or WO2012129373 published 14 March 2013). After selecting a plant comprising a transgene, plants each comprising (at least) one transgene can be crossed to form an F1 that comprises both transgenes. In progeny from these F1 plants (F2 or BC1), 1/500 progeny would have the two different transgenes recombined onto the same chromosome. The complex locus can then be bred as a single genetic locus with both transgene traits. This process can be repeated to stack as many traits as desired.
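The relationship between map distance and the frequency of recombined progeny can be illustrated with a short calculation. This is a sketch under stated assumptions that are not taken from the disclosure: for short distances 1 cM is treated as approximately 1% recombination, the F1 carries the two transgenes on opposite homologs, and a backcross (BC1) to a non-transgenic parent is scored, so that half of the recombinant gametes carry both transgenes. The function name is hypothetical. Under these assumptions a spacing of 0.4 cM corresponds to about 1 in 500 progeny; the disclosure does not state which spacing its 1/500 figure refers to.

```python
def coupled_progeny_fraction(distance_cm: float) -> float:
    """Approximate fraction of BC1 progeny with both transgenes on one chromosome.

    Assumes r ~ distance_cm / 100 (valid only for short map distances) and that
    half of the recombinant gametes carry both transgenes.
    """
    if distance_cm < 0:
        raise ValueError("map distance cannot be negative")
    recombination_fraction = distance_cm / 100.0
    return recombination_fraction / 2.0


for d in (0.1, 0.2, 0.3, 0.4, 0.5, 1.0, 2.0, 5.0):
    frac = coupled_progeny_fraction(d)
    print(f"{d:>4} cM -> about 1 in {round(1 / frac)} progeny")
```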
[0338] Further uses for guide RNA/Cas endonuclease systems have been described (see, for example, US20150082478 published 19 March 2015, WO2015026886 published 26 February 2015, US20150059010 published 26 February 2015, WO2016007347 published 14 January 2016, and PCT application WO2016025131 published 18 February 2016) and include but are not limited to modifying or replacing nucleotide sequences of interest (such as regulatory elements), insertion of polynucleotides of interest, gene knock-out, gene knock-in, modification of splicing sites and/or introducing alternate splicing sites, modifications of nucleotide sequences encoding a protein of interest, amino acid and/or protein fusions, and gene silencing by expressing an inverted repeat into a gene of interest.
[0339] Resulting characteristics from the gene editing compositions and methods described herein may be evaluated. Chromosomal intervals that correlate with a phenotype or trait of interest can be identified. A variety of methods well known in the art are available for identifying chromosomal intervals. The boundaries of such chromosomal intervals are drawn to encompass markers that will be linked to the gene controlling the trait of interest. In other words, the chromosomal interval is drawn such that any marker that lies within that interval (including the terminal markers that define the boundaries of the interval) can be used as a marker for a particular trait. In one embodiment, the chromosomal interval comprises at least one QTL, and furthermore, may indeed comprise more than one QTL. Close proximity of multiple QTLs in the same interval may obfuscate the correlation of a particular marker with a particular QTL, as one marker may demonstrate linkage to more than one QTL. Conversely, e.g., if two markers in close proximity show co-segregation with the desired phenotypic trait, it is sometimes unclear if each of those markers identifies the same QTL or two different QTL. The term “quantitative trait locus” or “QTL” refers to a region of DNA that is associated with the differential expression of a quantitative phenotypic trait in at least one genetic background, e.g., in at least one breeding population. The region of the QTL encompasses or is closely linked to the gene or genes that affect the trait in question. An “allele of a QTL” can comprise multiple genes or other genetic factors within a contiguous genomic region or linkage group, such as a haplotype. An allele of a QTL can denote a haplotype within a specified window wherein said window is a contiguous genomic region that can be defined, and tracked, with a set of one or more polymorphic markers. A haplotype can be defined by the unique fingerprint of alleles at each marker within the specified window.
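The definition of a haplotype as the fingerprint of alleles at each marker within a specified window can be sketched as a small data structure. The marker names, line names, and allele calls below are hypothetical and serve only to show the idea: each individual is reduced to the ordered tuple of its alleles at the window's markers, and individuals sharing that tuple share the haplotype.

```python
from typing import Dict, List, Tuple


def haplotype(calls: Dict[str, str], window: List[str]) -> Tuple[str, ...]:
    """Return the ordered alleles at the markers that define the window."""
    return tuple(calls[marker] for marker in window)


# Hypothetical window of three polymorphic markers; M4 lies outside the window.
window = ["M1", "M2", "M3"]
lines = {
    "lineA": {"M1": "A", "M2": "G", "M3": "T", "M4": "C"},
    "lineB": {"M1": "A", "M2": "G", "M3": "T", "M4": "G"},
    "lineC": {"M1": "A", "M2": "C", "M3": "T", "M4": "C"},
}

# Group lines by their haplotype within the window.
groups: Dict[Tuple[str, ...], List[str]] = {}
for name, calls in lines.items():
    groups.setdefault(haplotype(calls, window), []).append(name)

for hap, members in groups.items():
    print("-".join(hap), members)
```

Here lineA and lineB differ at M4 but carry the same haplotype, because M4 is not one of the markers that define the window.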
Recombinant Constructs and Transformation of Cells
[0340] The disclosed guide polynucleotides, Cas endonucleases, polynucleotide modification templates, donor DNAs, guide polynucleotide/Cas endonuclease systems disclosed herein, and any one combination thereof, optionally further comprising one or more polynucleotide(s) of interest, can be introduced into a cell. Cells include, but are not limited to, prokaryotic, eukaryotic, human, non-human, animal, bacterial, fungal, insect, yeast, non-conventional yeast, and plant cells, as well as whole organisms and progeny produced by the methods described herein.
[0341] Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described more fully in Sambrook et al., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory: Cold Spring Harbor, NY (1989). Transformation methods are well known to those skilled in the art and are described infra.
[0342] Vectors and constructs include circular plasmids, and linear polynucleotides, comprising a polynucleotide of interest and optionally other components including linkers, adapters, regulatory or analysis. In some examples a recognition site and/or target site can be comprised within an intron, coding sequence, 5' UTRs, 3' UTRs, and/or regulatory regions.
Components for Expression and Utilization of CRISPR-Cas Systems Cells
[0343] The invention further provides expression constructs for expressing in a prokaryotic or eukaryotic cell/organism a guide RNA/Cas system that is capable of recognizing, binding to, and optionally nicking, unwinding, or cleaving all or part of a target sequence.
[0344] In one embodiment, the expression constructs of the disclosure comprise a promoter operably linked to a nucleotide sequence encoding a Cas gene and a promoter operably linked to a guide RNA of the present disclosure. The promoter is capable of driving expression of an operably linked nucleotide sequence in a prokaryotic or eukaryotic cell/organism.
[0345] Nucleotide sequence modification of the guide polynucleotide, VT domain and/or CER domain can be selected from, but not limited to, the group consisting of a 5' cap, a 3' polyadenylated tail, a riboswitch sequence, a stability control sequence, a sequence that forms a dsRNA duplex, a modification or sequence that targets the guide polynucleotide to a subcellular location, a modification or sequence that provides for tracking, a modification or sequence that provides a binding site for proteins, a Locked Nucleic Acid (LNA), a 5-methyl dC nucleotide, a 2,6-Diaminopurine nucleotide, a 2’-Fluoro A nucleotide, a 2’-Fluoro U nucleotide, a 2’-O-Methyl RNA nucleotide, a phosphorothioate bond, linkage to a cholesterol molecule, linkage to a polyethylene glycol molecule, linkage to a spacer 18 molecule, a 5’ to 3’ covalent linkage, or any combination thereof. These modifications can result in at least one additional beneficial feature, wherein the additional beneficial feature is selected from the group of a modified or regulated stability, a subcellular targeting, tracking, a fluorescent label, a binding site for a protein or protein complex, modified binding affinity to complementary target sequence, modified resistance to cellular degradation, and increased cellular permeability.
[0346] A method of expressing RNA components such as gRNA in eukaryotic cells for performing Cas9-mediated DNA targeting has been to use RNA polymerase III (Pol III) promoters, which allow for transcription of RNA with precisely defined, unmodified, 5’- and 3’-ends (DiCarlo et al., Nucleic Acids Res. 41:4336-4343; Ma et al., Mol. Ther. Nucleic Acids 3:e161). This strategy has been successfully applied in cells of several different species including maize and soybean (US20150082478 published 19 March 2015). Methods for expressing RNA components that do not have a 5’ cap have been described (WO2016/025131 published 18 February 2016).
[0347] Various methods and compositions can be employed to obtain a cell or organism having a polynucleotide of interest inserted in a target site for a Cas endonuclease. Such methods can employ homologous recombination (HR) to provide integration of the polynucleotide of interest at the target site. In one method described herein, a polynucleotide of interest is introduced into the organism cell via a donor DNA construct.
[0348] The donor DNA construct further comprises a first and a second region of homology that flank the polynucleotide of interest. The first and second regions of homology of the donor DNA share homology to a first and a second genomic region, respectively, present in or flanking the target site of the cell or organism genome.
[0349] The donor DNA can be tethered to the guide polynucleotide. Tethered donor DNAs can allow for co-localizing target and donor DNA, useful in genome editing, gene insertion, and targeted genome regulation, and can also be useful in targeting post-mitotic cells where function of endogenous HR machinery is expected to be highly diminished (Mali et al., 2013, Nature Methods Vol. 10: 957-963).
[0350] The amount of homology or sequence identity shared by a target and a donor polynucleotide can vary and includes total lengths and/or regions having unit integral values in the ranges of about 1-20 bp, 20-50 bp, 50-100 bp, 75-150 bp, 100-250 bp, 150-300 bp, 200-400 bp, 250-500 bp, 300-600 bp, 350-750 bp, 400-800 bp, 450-900 bp, 500-1000 bp, 600-1250 bp, 700-1500 bp, 800-1750 bp, 900-2000 bp, 1-2.5 kb, 1.5-3 kb, 2-4 kb, 2.5-5 kb, 3-6 kb, 3.5-7 kb, 4-8 kb, 5-10 kb, or up to and including the total length of the target site. These ranges include every integer within the range, for example, the range of 1-20 bp includes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and 20 bps. The amount of homology can also be described by percent sequence identity over the full aligned length of the two polynucleotides which includes percent sequence identity at least of about 50%, 55%, 60%, 65%, 70%, 71%, 72%,
73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, between 98% and 99%, 99%, between 99% and 100%, or 100%. Sufficient homology includes any combination of polynucleotide length, global percent sequence identity, and optionally conserved regions of contiguous nucleotides or local percent sequence identity; for example, sufficient homology can be described as a region of 75-150 bp having at least 80% sequence identity to a region of the target locus. Sufficient homology can also be described by the predicted ability of two polynucleotides to specifically hybridize under high stringency conditions; see, for example, Sambrook et al., (1989) Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, NY); Current Protocols in Molecular Biology, Ausubel et al., Eds (1994) Current Protocols, (Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.); and Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, (Elsevier, New York).
[0351] The structural similarity between a given genomic region and the corresponding region of homology found on the donor DNA can be any degree of sequence identity that allows for homologous recombination to occur. For example, the amount of homology or sequence identity shared by the “region of homology” of the donor DNA and the “genomic region” of the organism genome can be at least 50%, 55%, 60%, 65%, 70%, 75%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or 100% sequence identity, such that the sequences undergo homologous recombination.
[0352] The region of homology on the donor DNA can have homology to any sequence flanking the target site. While in some instances the regions of homology share significant sequence homology to the genomic sequence immediately flanking the target site, it is recognized that the regions of homology can be designed to have sufficient homology to regions that may be further 5' or 3' to the target site. The regions of homology can also have homology with a fragment of the target site along with downstream genomic regions.
[0353] In one embodiment, the first region of homology further comprises a first fragment of the target site and the second region of homology comprises a second fragment of the target site, wherein the first and second fragments are dissimilar.
Polynucleotides of Interest
[0354] Polynucleotides of interest are further described herein and include polynucleotides reflective of the commercial markets and interests of those involved in the development of the crop. Crops and markets of interest change, and as developing nations open up world markets, new crops and technologies will emerge also. In addition, as our understanding of agronomic traits and characteristics such as yield and heterosis increases, the choice of genes for genetic engineering will change accordingly.
[0355] General categories of polynucleotides of interest include, for example, genes of interest involved in information, such as zinc fingers, those involved in communication, such as kinases, and those involved in housekeeping, such as heat shock proteins. More specific polynucleotides of interest include, but are not limited to, genes involved in traits of agronomic interest such as but not limited to: crop yield, grain quality, crop nutrient content, starch and carbohydrate quality and quantity as well as those affecting kernel size, sucrose loading, protein quality and quantity, nitrogen fixation and/or utilization, fatty acid and oil composition, genes encoding proteins conferring resistance to abiotic stress (such as drought, nitrogen, temperature, salinity, toxic metals or trace elements, or those conferring resistance to toxins such as pesticides and herbicides), genes encoding proteins conferring resistance to biotic stress (such as attacks by fungi, viruses, bacteria, insects, and nematodes, and development of diseases associated with these organisms).
[0356] Agronomically important traits such as oil, starch, and protein content can be genetically altered in addition to using traditional breeding methods. Modifications include increasing content of oleic acid, saturated and unsaturated oils, increasing levels of lysine and sulfur, providing essential amino acids, and also modification of starch. Hordothionin protein modifications are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389.
[0357] Polynucleotide sequences of interest may encode proteins involved in providing disease or pest resistance. By "disease resistance" or "pest resistance" is intended that the plants avoid the harmful symptoms that are the outcome of the plant-pathogen interactions. Pest resistance genes may encode resistance to pests that have great yield drag such as rootworm, cutworm, European Corn Borer, and the like. Disease resistance and insect resistance genes such as lysozymes or cecropins for antibacterial protection, or proteins such as defensins, glucanases or chitinases for antifungal protection, or Bacillus thuringiensis endotoxins, protease inhibitors, collagenases, lectins, or glycosidases for controlling nematodes or insects are all examples of useful gene products. Genes encoding disease resistance traits include detoxification genes, such as against fumonisin (U.S. Patent No. 5,792,931); avirulence (avr) and disease resistance (R) genes (Jones et al. (1994) Science 266:789; Martin et al. (1993) Science 262:1432; and Mindrinos et al. (1994) Cell 78:1089); and the like. Insect resistance genes may encode resistance to pests that have great yield drag such as rootworm, cutworm, European Corn Borer, and the like. Such genes include, for example, Bacillus thuringiensis toxic protein genes (U.S. Patent Nos. 5,366,892; 5,747,450; 5,736,514; 5,723,756; 5,593,881; and Geiser et al. (1986) Gene 48:109); and the like.
[0358] An "herbicide resistance protein" or a protein resulting from expression of an "herbicide resistance-encoding nucleic acid molecule" includes proteins that confer upon a cell the ability to tolerate a higher concentration of an herbicide than cells that do not express the protein, or to tolerate a certain concentration of an herbicide for a longer period of time than cells that do not express the protein. Herbicide resistance traits may be introduced into plants by genes coding for resistance to herbicides that act to inhibit the action of acetolactate synthase (ALS, also referred to as acetohydroxyacid synthase, AHAS), in particular the sulfonylurea (UK: sulphonylurea) type herbicides, genes coding for resistance to herbicides that act to inhibit the action of glutamine synthase, such as phosphinothricin or basta (e.g., the bar gene), glyphosate (e.g., the EPSP synthase gene and the GAT gene), HPPD inhibitors (e.g., the HPPD gene) or other such genes known in the art. See, for example, US Patent Nos. 7,626,077, 5,310,667, 5,866,775, 6,225,114, 6,248,876, 7,169,970, 6,867,293, and 9,187,762. The bar gene encodes resistance to the herbicide basta, the nptII gene encodes resistance to the antibiotics kanamycin and geneticin, and the ALS-gene mutants encode resistance to the herbicide chlorsulfuron.
[0359] Furthermore, it is recognized that the polynucleotide of interest may also comprise antisense sequences complementary to at least a portion of the messenger RNA (mRNA) for a targeted gene sequence of interest. Antisense nucleotides are constructed to hybridize with the corresponding mRNA. Modifications of the antisense sequences may be made as long as the sequences hybridize to and interfere with expression of the corresponding mRNA. In this manner, antisense constructions having 70%, 80%, or 85% sequence identity to the corresponding antisense sequences may be used. Furthermore, portions of the antisense nucleotides may be used to disrupt the expression of the target gene. Generally, sequences of at least 50 nucleotides, 100 nucleotides, 200 nucleotides, or greater may be used.
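The statement that an antisense sequence is complementary to, and therefore hybridizes with, the corresponding mRNA can be illustrated with a reverse-complement sketch. The function name and the ten-nucleotide mRNA fragment are hypothetical and far shorter than the lengths recited above; the sketch assumes an unmodified RNA alphabet (A, C, G, U) and perfect Watson-Crick pairing.

```python
# Watson-Crick complement table for RNA (assumes only A, C, G, U are present).
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna: str) -> str:
    """Return the fully complementary antisense strand, written 5' to 3'."""
    seq = mrna.upper()
    if set(seq) - set("ACGU"):
        raise ValueError("sequence contains characters other than A, C, G, U")
    return seq.translate(_RNA_COMPLEMENT)[::-1]


# Toy mRNA fragment (illustrative only).
mrna_fragment = "AUGGCCAUUG"
antisense = antisense_rna(mrna_fragment)
print(antisense)
```

Applying the same function to the antisense strand returns the original fragment, which is a quick consistency check. Partially modified antisense sequences, as contemplated above, would have less than 100% identity to this fully complementary strand.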
[0360] In addition, the polynucleotide of interest may also be used in the sense orientation to suppress the expression of endogenous genes in plants. Methods for suppressing gene expression in plants using polynucleotides in the sense orientation are known in the art. The methods generally involve transforming plants with a DNA construct comprising a promoter that drives expression in a plant operably linked to at least a portion of a nucleotide sequence that corresponds to the transcript of the endogenous gene. Typically, such a nucleotide sequence has substantial sequence identity to the sequence of the transcript of the endogenous gene, generally greater than about 65% sequence identity, about 85% sequence identity, or greater than about 95% sequence identity. See U.S. Patent Nos. 5,283,184 and 5,034,323.
[0361] The polynucleotide of interest can also be an expression regulatory element, such as but not limited to a promoter, enhancer, intron, terminator, or UTR (untranslated regulatory sequence). A UTR may be present at either the 5’ end or the 3’ end of a coding or noncoding sequence. Other examples of polynucleotides of interest include genes encoding ribonucleotide molecules, for example mRNA, siRNA, or other ribonucleotides. The regulatory element or RNA molecule may be endogenous to the cell in which the genetic modification occurs, or it may be heterologous to the cell.
[0362] The polynucleotide of interest can also be a phenotypic marker. A phenotypic marker is a screenable or a selectable marker, which includes visual markers and selectable markers, whether positive or negative. Any phenotypic marker can be used. Specifically, a selectable or screenable marker comprises a DNA segment that allows one to identify, or select for or against, a molecule or a cell that comprises it, often under particular conditions. These markers can encode an activity, such as, but not limited to, production of RNA, peptide, or protein, or can provide a binding site for RNA, peptides, proteins, inorganic and organic compounds or compositions and the like.
[0363] Examples of selectable markers include, but are not limited to, DNA segments that comprise restriction enzyme sites; DNA segments that encode products which provide resistance against otherwise toxic compounds including antibiotics, such as spectinomycin, ampicillin, kanamycin, tetracycline, Basta, neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT); DNA segments that encode products which are otherwise lacking in the recipient cell (e.g., tRNA genes, auxotrophic markers); DNA segments that encode products which can be readily identified (e.g., phenotypic markers such as β-galactosidase, GUS; fluorescent proteins such as green fluorescent protein (GFP), cyan (CFP), yellow (YFP), red (RFP), and cell surface proteins); the generation of new primer sites for PCR (e.g., the juxtaposition of two DNA sequences not previously juxtaposed); the inclusion of DNA sequences not acted upon or acted upon by a restriction endonuclease or other DNA modifying enzyme, chemical, etc.; and the inclusion of a DNA sequence required for a specific modification (e.g., methylation) that allows its identification.
[0364] Additional selectable markers include genes that confer resistance to herbicidal compounds, such as sulphonylureas, glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D). See, for example, acetolactate synthase (ALS) for resistance to sulfonylureas, imidazolinones, triazolopyrimidine sulfonamides, pyrimidinylsalicylates and sulphonylaminocarbonyl-triazolinones (Shaner and Singh, 1997, Herbicide Activity: Toxicol Biochem Mol Biol 69-110); glyphosate resistant 5-enolpyruvylshikimate-3-phosphate (EPSPS) (Saroha et al. 1998, J. Plant Biochemistry & Biotechnology Vol 7:65-72).
[0365] Polynucleotides of interest include genes that can be stacked or used in combination with other traits, such as but not limited to herbicide resistance or any other trait described herein. Polynucleotides of interest and/or traits can be stacked together in a complex trait locus as described in US20130263324 published 03 Oct 2013 and in WO/2013/112686, published 01 August 2013.
[0366] A polypeptide of interest includes any protein or polypeptide that is encoded by a polynucleotide of interest described herein.
[0367] Further provided are methods for identifying at least one plant cell, comprising in its genome, a polynucleotide of interest integrated at the target site. A variety of methods are available for identifying those plant cells with insertion into the genome at or near to the target site. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof. See, for example, US20090133152 published 21 May 2009. The method also comprises recovering a plant from the plant cell comprising a polynucleotide of interest integrated into its genome. The plant may be sterile or fertile. It is recognized that any polynucleotide of interest can be provided, integrated into the plant genome at the target site, and expressed in a plant.
Expression Elements
[0368] Any polynucleotide encoding a Cas protein, other CRISPR system component, or other polynucleotide disclosed herein may be functionally linked to a heterologous expression element, to facilitate transcription or regulation in a host cell. Such expression elements include but are not limited to: promoter, leader, intron, and terminator. Expression elements may be “minimal” - meaning a shorter sequence derived from a native source, that still functions as an expression regulator or modifier. Alternatively, an expression element may be “optimized” - meaning that its polynucleotide sequence has been altered from its native state in order to function with a more desirable characteristic in a particular host cell. Alternatively, an expression element may be “synthetic” - meaning that it is designed in silico and synthesized for use in a host cell. Synthetic expression elements may be entirely synthetic, or partially synthetic (comprising a fragment of a naturally-occurring polynucleotide sequence).
[0369] It has been shown that certain promoters are able to direct RNA synthesis at a higher rate than others. These are called “strong promoters”. Certain other promoters have been shown to direct RNA synthesis at higher levels only in particular types of cells or tissues and are often referred to as “tissue specific promoters”, or “tissue-preferred promoters” if the promoters direct RNA synthesis preferably in certain tissues but also in other tissues at reduced levels.
[0370] A plant promoter includes a promoter capable of initiating transcription in a plant cell. For a review of plant promoters, see Potenza et al., 2004, In vitro Cell Dev Biol 40:1-22; Porto et al., 2014, Molecular Biotechnology (2014), 56(1), 38-49. New promoters of various types useful in plant cells are constantly being discovered; numerous examples may be found in the compilation by Okamuro and Goldberg, (1989) In The Biochemistry of Plants, Vol. 115, Stumpf and Conn, eds (New York, NY: Academic Press), pp. 1-82.
Developmental Genes (Morphogenic Factors)
[0371] Morphogenic factors (also called “developmental genes” or “dev genes”, which are used synonymously throughout) are polynucleotides that act to enhance the rate, efficiency, and/or efficacy of targeted polynucleotide modification by a number of mechanisms, some of which are related to the capability of stimulating growth of a cell or tissue, including but not limited to promoting progression through the cell cycle, inhibiting cell death, such as apoptosis, stimulating cell division, and/or stimulating embryogenesis. The polynucleotides can fall into several categories, including but not limited to, cell cycle stimulatory polynucleotides, developmental polynucleotides, anti-apoptosis polynucleotides, hormone polynucleotides, transcription factors, or silencing constructs targeted against cell cycle repressors or pro-apoptotic factors. Methods and compositions for rapid and efficient transformation of plants by transforming cells of plant explants with an expression construct comprising a heterologous nucleotide encoding a morphogenic factor are described in US Patent Application Publication No. US2017/0121722 (published 04 May 2017).
[0372] A morphogenic factor (gene or protein) may be involved in plant metabolism, organ development, stem cell development, cell growth stimulation, organogenesis, somatic embryogenesis initiation, accelerated somatic embryo maturation, initiation and/or development of the apical meristem, initiation and/or development of shoot meristem, or a combination thereof.
[0373] In some aspects, the morphogenic factor is a molecule selected from one or more of the following categories: 1) cell cycle stimulatory polynucleotides including plant viral replicase genes such as RepA, cyclins, E2F, prolifera, cdc2 and cdc25; 2) developmental polynucleotides such as Lec1, Kn1 family, WUSCHEL, Zwille, BBM, Aintegumenta (ANT), FUS3, and members of the Knotted family, such as Kn1, STM, OSH1, and SbH1; 3) anti-apoptosis polynucleotides such as CED9, Bcl2, Bcl-X(L), Bcl-W, A1, McL-1, Mac1, Boo, and Bax-inhibitors; 4) hormone polynucleotides such as IPT, TZS, and CKI-1; and 5) silencing constructs targeted against cell cycle repressors, such as Rb, CK1, prohibitin, and wee1, or stimulators of apoptosis such as APAF-1, bad, bax, CED-4, and caspase-3, and repressors of plant developmental transitions, such as Pickle and WD polycomb genes including FIE and Medea. The polynucleotides can be silenced by any known method such as antisense, RNA interference, cosuppression, chimerplasty, or transposon insertion.
[0374] Other morphogenic factors useful in the present disclosure include, but are not limited to, Ovule Development Protein 2 (ODP2) polypeptides, and related polypeptides, e.g., Babyboom (BBM) protein family proteins. The ODP2 polypeptides share homology with several polypeptides within the AP2 family; see, e.g., FIG. 1 of US8,420,893 (incorporated herein by reference in its entirety), which provides an alignment of the maize and rice ODP2 polypeptides with eight other proteins having two AP2 domains.
[0375] In some aspects, the expression of the morphogenic factor is transient. In some aspects, the expression of the morphogenic factor is constitutive. In some aspects, the expression of the morphogenic factor is specific to a particular tissue or cell type. In some aspects, the expression of the morphogenic factor is temporally regulated. In some aspects, the expression of the morphogenic factor is regulated by an environmental condition, such as temperature, time of day, or other factor. In some aspects, the expression of the morphogenic factor is stable. In some aspects, expression of the morphogenic factor is controlled. The controlled expression may be a pulsed expression of the morphogenic factor for a particular period of time. Alternatively, the morphogenic factor may be expressed in only some transformed cells and not expressed in others. The control of expression of the morphogenic factor can be achieved by a variety of methods as disclosed herein.
Helper Plasmids
[0376] Agrobacterium, a natural plant pathogen, has been widely used for the transformation of dicotyledonous plants and more recently for transformation of monocotyledonous plants. The advantage of the Agrobacterium-mediated gene transfer system is that it offers the potential to regenerate transgenic cells at relatively high frequencies without a significant reduction in plant regeneration rates. Moreover, the process of DNA transfer to the plant genome is well characterized relative to other DNA delivery methods. DNA transferred via Agrobacterium is less likely to undergo any major rearrangements than is DNA transferred via direct delivery, and it integrates into the plant genome often in single or low copy numbers.
[0377] The most commonly used Agrobacterium-mediated gene transfer system is a binary transformation vector system where the Agrobacterium has been engineered to include a disarmed, or nononcogenic, Ti helper plasmid, which encodes the vir functions necessary for DNA transfer, and a much smaller separate plasmid called the binary vector plasmid, which carries the transferred DNA, or the T-DNA region. The T-DNA is defined by sequences at each end, called T-DNA borders, which play an important role in the production of T-DNA and in the transfer process.
[0378] Binary vectors are vectors in which the virulence genes are placed on a different plasmid than the one carrying the T-DNA region (Bevan, 1984, Nucl. Acids. Res. 12:8711-8721). The development of T-DNA binary vectors has made the transformation of plant cells easier as they do not require recombination. The finding that some of the virulence genes exhibited gene dosage effects (Jin et al., J. Bacteriol. (1987) 169:4417-4425) led to the development of a superbinary vector, which carried additional virulence genes (Komari, T., et al., Plant Cell Rep. (1990), 9:303-306). These early superbinary vectors carried a large “vir” fragment (~14.8 kbp) from the hypervirulence Ti plasmid, pTiBo542, which had been introduced into a standard binary vector (ibid). The superbinary vectors resulted in vastly improved plant transformation. For example, Hiei, Y., et al. (Plant J. (1994) 6:271-282) described efficient transformation of rice by Agrobacterium, and subsequently there were reports of using this system for maize, barley and wheat (Ishida, Y., et al., Nat. Biotech. (1996) 14:745-750; Tingay, S., et al., Plant J. (1997) 11:1369-1376; and Cheng, M., et al., Plant Physiol. (1997) 115:971-980; see also U.S. Pat. No. 5,591,616 to Hiei et al.). Examples of prior superbinary vectors include pTOK162 (Japanese Patent Appl. (Kokai) No. 4-222527, EP-A-504,869, EP-A-604,662, and U.S. Pat. No. 5,591,616) and pTOK233 (see Komari, T., ibid; and Ishida, Y., et al., ibid).
[0379] Agrobacteria with helper plasmids, such as pVIR9, pVIR7, or pVIR10, can significantly improve the transient protein expression, transient T-DNA delivery, somatic embryo phenotypes, transformation frequencies, recovery of quality events, and usable quality events in different plant lines (WO2017078836A1, published 11 May 2017).
[0380] VIR genes are also used for the improvement of transformation with Ochrobactrum, for example as disclosed in US20180216123, published 02 August 2018.
Introduction of System Components into a Cell
[0381] The methods and compositions described herein do not depend on a particular method for introducing a sequence into an organism or cell, only that the polynucleotide or polypeptide gains access to the interior of at least one cell of the organism. Introducing includes reference to the incorporation of a nucleic acid into a eukaryotic or prokaryotic cell where the nucleic acid may be incorporated into the genome of the cell, and includes reference to the transient (direct) provision of a nucleic acid, protein or ribonucleoprotein complex to the cell.
[0382] Methods for introducing polynucleotides or polypeptides or a polynucleotide-protein complex into cells or organisms are known in the art including, but not limited to, microinjection, electroporation, stable transformation methods, transient transformation methods, ballistic particle acceleration (particle bombardment), whiskers mediated transformation, Agrobacterium-mediated transformation, direct gene transfer, viral-mediated introduction, transfection, transduction, cell-penetrating peptides, mesoporous silica nanoparticle (MSN)-mediated direct protein delivery, topical applications, sexual crossing, sexual breeding, and any combination thereof. General methods for the introduction of polynucleotides into a cell for transformation, for example Agrobacterium-mediated transformation, Ochrobactrum-mediated transformation, and particle bombardment-mediated transformation of cells are known in the art.
[0383] For example, the guide polynucleotide (guide RNA, crNucleotide + tracrNucleotide, guide DNA and/or guide RNA-DNA molecule) can be introduced into a cell directly (transiently) as a single stranded or double stranded polynucleotide molecule. The guide RNA (or crRNA + tracrRNA) can also be introduced into a cell indirectly by introducing a recombinant DNA molecule comprising a heterologous nucleic acid fragment encoding the guide RNA (or crRNA + tracrRNA), operably linked to a specific promoter that is capable of transcribing the guide RNA (crRNA + tracrRNA molecules) in said cell. The specific promoter can be, but is not limited to, an RNA polymerase III promoter, which allows for transcription of RNA with precisely defined, unmodified, 5’- and 3’-ends (Ma et al., 2014, Mol. Ther. Nucleic Acids 3:e161; DiCarlo et al., 2013, Nucleic Acids Res. 41:4336-4343; WO2015026887, published 26 February 2015). Any promoter capable of transcribing the guide RNA in a cell can be used and includes a heat shock/heat inducible promoter operably linked to a nucleotide sequence encoding the guide RNA.
[0384] Protocols for introducing polynucleotides, polypeptides or polynucleotide-protein complexes into eukaryotic cells, such as plants or plant cells, are known and include microinjection (Crossway et al., (1986) Biotechniques 4:320-34 and U.S. Patent No. 6,300,543), meristem transformation (U.S. Patent No. 5,736,369), electroporation (Riggs et al., (1986) Proc. Natl. Acad. Sci. USA 83:5602-6), Agrobacterium-mediated transformation (U.S. Patent Nos. 5,563,055 and 5,981,840), whiskers mediated transformation (Ainley et al. 2013, Plant Biotechnology Journal 11:1126-1134; Shaheen A. and M. Arshad 2011 Properties and Applications of Silicon Carbide (2011), 345-358 Editor(s): Gerhardt, Rosario. Publisher: InTech, Rijeka, Croatia. CODEN: 69PQBP; ISBN: 978-953-307-201-2), direct gene transfer (Paszkowski et al., (1984) EMBO J 3:2717-22), and ballistic particle acceleration (U.S. Patent Nos. 4,945,050; 5,879,918; 5,886,244; 5,932,782; Tomes et al., (1995) "Direct DNA Transfer into Intact Plant Cells via Microprojectile Bombardment" in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg & Phillips (Springer-Verlag, Berlin); McCabe et al., (1988) Biotechnology 6:923-6; Weissinger et al., (1988) Ann Rev Genet 22:421-77; Sanford et al., (1987) Particulate Science and Technology 5:27-37 (onion); Christou et al., (1988) Plant Physiol 87:671-4 (soybean); Finer and McMullen, (1991) In vitro Cell Dev Biol 27P:175-82 (soybean); Singh et al., (1998) Theor Appl Genet 96:319-24 (soybean); Datta et al., (1990) Biotechnology 8:736-40 (rice); Klein et al., (1988) Proc. Natl. Acad. Sci. USA 85:4305-9 (maize); Klein et al., (1988) Biotechnology 6:559-63 (maize); U.S. Patent Nos. 5,240,855; 5,322,783 and 5,324,646; Klein et al., (1988) Plant Physiol 91:440-4 (maize); Fromm et al., (1990) Biotechnology 8:833-9 (maize); Hooykaas-Van Slogteren et al., (1984) Nature 311:763-4; U.S. Patent No. 5,736,369 (cereals); Bytebier et al., (1987) Proc. Natl. Acad. Sci. USA 84:5345-9 (Liliaceae); De Wet et al., (1985) in The Experimental Manipulation of Ovule Tissues, ed. Chapman et al., (Longman, New York), pp. 197-209 (pollen); Kaeppler et al., (1990) Plant Cell Rep 9:415-8 and Kaeppler et al., (1992) Theor Appl Genet 84:560-6 (whisker-mediated transformation); D'Halluin et al., (1992) Plant Cell 4:1495-505 (electroporation); Li et al., (1993) Plant Cell Rep 12:250-5; Christou and Ford (1995) Annals Botany 75:407-13 (rice) and Osjoda et al., (1996) Nat Biotechnol 14:745-50 (maize via Agrobacterium tumefaciens)).
[0385] Alternatively, polynucleotides may be introduced into cells by contacting cells or organisms with a virus or viral nucleic acids. Generally, such methods involve incorporating a polynucleotide within a viral DNA or RNA molecule. In some examples a polypeptide of interest may be initially synthesized as part of a viral polyprotein, which is later processed by proteolysis in vivo or in vitro to produce the desired recombinant protein. Methods for introducing polynucleotides into plants and expressing a protein encoded therein, involving viral DNA or RNA molecules, are known, see, for example, U.S. Patent Nos. 5,889,191, 5,889,190, 5,866,785, 5,589,367 and 5,316,931.
[0386] The methods provided herein rely upon the use of bacteria-mediated and/or biolistic-mediated gene transfer to produce regenerable plant cells. Bacterial strains useful in the methods of the disclosure include, but are not limited to, a disarmed Agrobacterium, an Ochrobactrum bacterium, or a Rhizobiaceae bacterium. Standard protocols for particle bombardment (Finer and McMullen, 1991, In Vitro Cell Dev. Biol. - Plant 27:175-182), Agrobacterium-mediated transformation (Jia et al., 2015, Int J. Mol. Sci. 16: 18552-18543; US2017/0121722 incorporated herein by reference in its entirety), or Ochrobactrum-mediated transformation (US2018/0216123 incorporated herein by reference in its entirety) can be used with the methods and compositions of the disclosure.
[0387] The polynucleotide or recombinant DNA construct can be provided to or introduced into a prokaryotic or eukaryotic cell or organism using a variety of transient transformation methods. Such transient transformation methods include, but are not limited to, the introduction of the polynucleotide construct directly into the plant.
[0388] Nucleic acids and proteins can be provided to a cell by any method, including methods using molecules to facilitate the uptake of any one or all components of a guided Cas system (protein and/or nucleic acids), such as cell-penetrating peptides and nanocarriers. See also US20110035836 published 10 February 2011, and EP2821486A1 published 07 January 2015.
[0389] Other methods of introducing polynucleotides into a prokaryotic or eukaryotic cell or organism or plant part can be used, including plastid transformation methods, and the methods for introducing polynucleotides into tissues from seedlings or mature seeds.
[0390] Stable transformation is intended to mean that the nucleotide construct introduced into an organism integrates into a genome of the organism and is capable of being inherited by the progeny thereof. Transient transformation is intended to mean that a polynucleotide is introduced into the organism and does not integrate into a genome of the organism or a polypeptide is introduced into an organism. Transient transformation indicates that the introduced composition is only temporarily expressed or present in the organism.
[0391] A variety of methods are available to identify those cells having an altered genome at or near a target site without using a screenable marker phenotype. Such methods can be viewed as directly analyzing a target sequence to detect any change in the target sequence, including but not limited to PCR methods, sequencing methods, nuclease digestion, Southern blots, and any combination thereof.
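As a minimal illustration of directly analyzing a target sequence to detect a change, the sketch below compares an unedited reference sequence with a sequenced read and reports each difference as a deletion, insertion, or replacement. The sequences and the function name `classify_edits` are hypothetical and are not taken from the disclosure; the comparison uses Python's standard `difflib` module rather than any particular genotyping pipeline.

```python
from difflib import SequenceMatcher


def classify_edits(reference: str, observed: str):
    """Return (kind, ref_start, ref_segment, obs_segment) for each difference.

    kind is one of "delete", "insert", or "replace", as reported by difflib.
    """
    matcher = SequenceMatcher(None, reference, observed, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # identical stretch, nothing to report
        edits.append((tag, i1, reference[i1:i2], observed[j1:j2]))
    return edits


# Hypothetical target-site sequences, for illustration only.
reference = "ACGTACGTTAGGCTAACGGT"
reads = {
    "deletion": "ACGTACGTTAGCTAACGGT",        # one G removed
    "insertion": "ACGTACGTTAGGAACTAACGGT",    # AA added
    "substitution": "ACGTACGTTCGGCTAACGGT",   # A changed to C
    "unedited": reference,
}

for name, read in reads.items():
    print(name, classify_edits(reference, read))
```

An empty list indicates that the read matches the reference at the analyzed site. In practice, sequencing error and alignment ambiguity would need to be handled before calling an edit.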
Cells and Organisms
[0392] The presently disclosed polynucleotides and polypeptides can be introduced into a cell. Cells include, but are not limited to, human, non-human, animal, mammalian, bacterial, protist, fungal, insect, yeast, non-conventional yeast, and plant cells, as well as plants and seeds produced by the methods described herein. In some aspects, the cell of the organism is a reproductive cell, a somatic cell, a meiotic cell, a mitotic cell, a stem cell, or a pluripotent stem cell. Any cell from any organism may be used with the compositions and methods described herein, including monocot and dicot plants, and plant elements.
Animal Cells
[0393] The presently disclosed polynucleotides and polypeptides can be introduced into an animal cell. Animal cells can include, but are not limited to: an organism of a phylum including chordates, arthropods, mollusks, annelids, cnidarians, or echinoderms; or an organism of a class including mammals, insects, birds, amphibians, reptiles, or fishes. In some aspects, the animal is human, mouse, C. elegans, rat, fruit fly (Drosophila spp.), zebrafish, chicken, dog, cat, guinea pig, hamster, Japanese ricefish, sea lamprey, pufferfish, tree frog (e.g., Xenopus spp.), monkey, or chimpanzee. Particular cell types that are contemplated include haploid cells, diploid cells, reproductive cells, neurons, muscle cells, endocrine or exocrine cells, epithelial cells, tumor cells, embryonic cells, hematopoietic cells, bone cells, germ cells, somatic cells, stem cells, pluripotent stem cells, induced pluripotent stem cells, progenitor cells, meiotic cells, and mitotic cells. In some aspects, a plurality of cells from an organism may be used.
[0394] The compositions and methods described herein may be used to edit the genome of an animal cell in various ways. In one aspect, it may be desirable to delete one or more nucleotides. In another aspect, it may be desirable to insert one or more nucleotides. In one aspect, it may be desirable to replace one or more nucleotides. In another aspect, it may be desirable to modify one or more nucleotides via a covalent or non-covalent interaction with another atom or molecule.
[0395] Genome modification may be used to effect a genotypic and/or phenotypic change on the target organism. Such a change is preferably related to an improved phenotype of interest or a physiologically-important characteristic, the correction of an endogenous defect, or the expression of some type of expression marker. In some aspects, the phenotype of interest or physiologically-important characteristic is related to the overall health, fitness, or fertility of the animal, the ecological fitness of the organism, or the relationship or interaction of the organism with other organisms in its environment.
[0396] Cells that have been genetically modified using the compositions or methods described herein may be transplanted to a subject for purposes such as gene therapy, e.g. to treat a disease, or as an antiviral, antipathogenic, or anticancer therapeutic, for the production of genetically modified organisms in agriculture, or for biological research.
Plant Cells and Plants
[0397] Examples of monocot plants that can be used include, but are not limited to, corn (Zea mays), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), wheat (Triticum species, for example Triticum aestivum, Triticum monococcum), sugarcane (Saccharum spp.), oats (Avena), barley (Hordeum), switchgrass (Panicum virgatum), pineapple (Ananas comosus), banana (Musa spp.), palm, ornamentals, turfgrasses, and other grasses.
[0398] Examples of dicot plants that can be used include, but are not limited to, soybean (Glycine max), Brassica species (for example but not limited to: oilseed rape or Canola) (Brassica napus, B. campestris, Brassica rapa, Brassica juncea), alfalfa (Medicago sativa), tobacco (Nicotiana tabacum), Arabidopsis (Arabidopsis thaliana), sunflower (Helianthus annuus), cotton (Gossypium arboreum, Gossypium barbadense), peanut (Arachis hypogaea), tomato (Solanum lycopersicum), and potato (Solanum tuberosum).
[0399] Additional plants that can be used include safflower (Carthamus tinctorius), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp), coconut (Cocos nucifera), citrus trees (Citrus spp), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), vegetables, ornamentals, and conifers. Vegetables that can be used include tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo). Ornamentals include azalea (Rhododendron spp), hydrangea (Hydrangea macrophylla), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp), tulips (Tulipa spp), daffodils (Narcissus spp), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.
[0400] In certain embodiments of the disclosure, a fertile plant is a plant that produces viable male and female gametes and is self-fertile. Such a self-fertile plant can produce a progeny plant without the contribution from any other plant of a gamete and the genetic material comprised therein. Other embodiments of the disclosure can involve the use of a plant that is not self-fertile because the plant does not produce male gametes, or female gametes, or both, that are viable or otherwise capable of fertilization.
[0401] The present disclosure finds use in the breeding of plants comprising one or more introduced traits, or edited genomes.
[0402] A non-limiting example of how two traits can be stacked into the genome at a genetic distance of, for example, 5 cM from each other is described as follows: A first plant comprising a first transgenic target site integrated into a first DSB target site within the genomic window and not having the first genomic locus of interest is crossed to a second transgenic plant, comprising a genomic locus of interest at a different genomic insertion site within the genomic window and the second plant does not comprise the first transgenic target site. About 5% of the plant progeny from this cross will have both the first transgenic target site integrated into a first DSB target site and the first genomic locus of interest integrated at different genomic insertion sites within the genomic window. Progeny plants having both sites in the defined genomic window can be further crossed with a third transgenic plant comprising a second transgenic target site integrated into a second DSB target site and/or a second genomic locus of interest within the defined genomic window and lacking the first transgenic target site and the first genomic locus of interest. Progeny are then selected having the first transgenic target site, the first genomic locus of interest and the second genomic locus of interest integrated at different genomic insertion sites within the genomic window. Such methods can be used to produce a transgenic plant comprising a complex trait locus having at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 or more transgenic target sites integrated into DSB target sites and/or genomic loci of interest integrated at different sites within the genomic window. In such a manner, various complex trait loci can be generated.
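The "about 5%" figure for loci 5 cM apart follows from the approximate equivalence of 1 cM to 1% recombination at short map distances. The sketch below makes that arithmetic explicit using Haldane's map function, which is an assumption introduced here (it presumes no crossover interference) and is not stated in the disclosure. How a recombination fraction translates into the fraction of progeny carrying both sites also depends on parental zygosity and the crossing scheme, which this sketch does not model.

```python
import math


def haldane_recombination_fraction(distance_cm: float) -> float:
    """Haldane map function: r = 0.5 * (1 - exp(-2d)), with d in Morgans."""
    d_morgans = distance_cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


# Compare the map-function value with the simple "1 cM = 1%" approximation.
for cm in (1, 5, 10, 20):
    r = haldane_recombination_fraction(cm)
    print(f"{cm:>2} cM: Haldane r = {r:.4f}, linear approximation = {cm / 100:.4f}")
```

At 5 cM the two values differ by less than a quarter of a percentage point, so the linear approximation used in the text is reasonable at this distance; the gap widens as the map distance grows.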
[0403] Some aspects of the invention include, but are not limited to:
[0404] Aspect 1: A synthetic composition comprising a Cas endoribonuclease molecule and a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules and a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease molecule.
[0405] Aspect 2: A synthetic composition comprising a poly-guide RNA molecule comprising a nuclease recognition sequence that is capable of functional interaction with a Cas endoribonuclease molecule, wherein at least one discrete component of the poly-guide RNA molecule is heterologous to the Cas endoribonuclease.
[0406] Aspect 3: The synthetic composition of Aspect 1 or Aspect 2, wherein the Cas endoribonuclease is isolated or derived from Streptococcus thermophilus.
[0407] Aspect 4: The synthetic composition of Aspect 1 or Aspect 2, further comprising a Cas endonuclease.
[0408] Aspect 5: The synthetic composition of Aspect 1 or Aspect 2, wherein the recognition sequence comprises the nucleotides CCCGCNNNNGCGGG.
[0409] Aspect 6: The synthetic composition of Aspect 1 or Aspect 2, wherein at least one component is a DNA molecule encoding the component.
[0410] Aspect 7: The synthetic composition of Aspect 1 or Aspect 2, wherein the poly-guide RNA molecule comprises RNA.
[0411] Aspect 8: The synthetic composition of Aspect 1 or Aspect 2, wherein the Cas endoribonuclease is a protein.
[0412] Aspect 9: The synthetic composition of Aspect 1 or Aspect 2, wherein the plurality of discrete RNA molecules comprise at least two non-identical guide RNA molecules that are each capable of forming a complex with a Cas endonuclease.
[0413] Aspect 10: The synthetic composition of Aspect 1 or Aspect 2, wherein the endoribonuclease molecule shares at least 85% sequence identity with SEQ ID NO:48.
[0414] Aspect 11: The synthetic composition of Aspect 1 or Aspect 2, wherein the poly-guide RNA molecule is operably linked to a promoter.
[0415] Aspect 12: The synthetic composition of Aspect 10, wherein the promoter is selected from the group consisting of: U6, Ubiquitin, and a bidirectional promoter.
[0416] Aspect 13: A synthetic composition of editing a plurality of target polynucleotides with any of the synthetic compositions of Aspects 1-11.
[0417] Aspect 14: The synthetic composition of Aspect 12, wherein the plurality of target polynucleotides are non-identical to each other.
[0418] Aspect 15: A cell comprising any of the synthetic compositions of Aspects 1-12.
[0419] Aspect 16: The cell of Aspect 14, wherein the cell comprises a polynucleotide sequence in its genome that is capable of selective hybridization with at least one of the discrete guide RNA molecules of the poly-guide RNA molecule.
[0420] Aspect 17: A method of providing a poly-guide RNA molecule to a cell that comprises a target sequence capable of selective hybridization with at least one guide RNA of the poly-guide RNA molecule.
[0421] Aspect 18: A method of generating a plurality of guide RNA molecules in a cell, the method comprising providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence; providing an endoribonuclease that cleaves the endoribonuclease recognition sequence; and thereby generating the plurality of guide RNA molecules in the cell.
[0422] Aspect 19: The method of Aspect 16 or Aspect 17, further comprising providing to the cell a Cas endonuclease.
[0423] Aspect 20: A method of editing a target polynucleotide in a cell, comprising providing to the cell a Cas endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNA molecules.
[0424] Aspect 21: The method of any of Aspects 16-19, wherein the cell is a bacterium, plant cell, or animal cell.
[0425] Aspect 22: The method of any of Aspects 16-19, wherein the Cas endonuclease, the Cas endoribonuclease, and the poly-guide RNA molecule are provided on one vector to a target cell.
[0426] Aspect 23: The method of any of Aspects 16-19, wherein the Cas endonuclease is provided to a target cell on a different vector than that comprising the poly-guide RNA or the Cas endoribonuclease.
[0427] Aspect 24: The method of any of Aspects 16-19, wherein the Cas endonuclease and/or the Cas endoribonuclease is/are delivered to the cell as proteins, and the poly-guide RNA molecule is provided to the cell as RNA.
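The arrangement recited in Aspects 5 and 18, in which each guide RNA is flanked by an endoribonuclease recognition sequence, can be sketched in silico as follows. This is an illustrative sketch only: the two guide sequences and all names in the code are hypothetical, the sequences are written in the DNA alphabet used for the motif CCCGCNNNNGCGGG, and the code simply extracts the segments lying between recognition sites. It does not model the actual cleavage position of the endoribonuclease or any flanking repeat sequence retained on the processed guides.

```python
import re

# Recognition motif from Aspect 5; N is treated as any of A, C, G, T.
RECOGNITION = re.compile(r"CCCGC[ACGT]{4}GCGGG")

# One concrete instance of the motif (NNNN = ACAC).
SITE = "CCCGCACACGCGGG"

# Hypothetical 20-nt guide sequences, for illustration only.
GUIDE_1 = "GACCTGAAGTTCATCTGCAC"
GUIDE_2 = "TTGAGCTACGTAGCATCGAT"

# Poly-guide layout: site, guide 1, site, guide 2, site.
poly_guide = SITE + GUIDE_1 + SITE + GUIDE_2 + SITE


def segments_between_sites(sequence: str) -> list[str]:
    """Return the segments lying between consecutive recognition sites."""
    sites = [m.span() for m in RECOGNITION.finditer(sequence)]
    return [
        sequence[end:next_start]
        for (_, end), (next_start, _) in zip(sites, sites[1:])
    ]


print(segments_between_sites(poly_guide))
```

With this layout, the function recovers the two guide sequences in order. A guide that itself contained the motif would be split, so a design check of this kind could be run on candidate guides before assembly.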
[0428] While the invention has been particularly shown and described with reference to a preferred embodiment and various alternate embodiments, it will be understood by persons skilled in the relevant art that various changes in form and details can be made therein without departing from the spirit and scope of the invention. For instance, while the particular examples below may illustrate the methods and embodiments described herein using a specific plant, the principles in these examples may be applied to any plant. Therefore, it will be appreciated that the scope of this invention is encompassed by the embodiments of the inventions recited herein and in the specification rather than the specific examples that are exemplified below. All cited patents, applications, and publications referred to in this application are herein incorporated by reference in their entirety, for all purposes, to the same extent as if each were individually and specifically incorporated by reference.
EXAMPLES
[0429] The following are examples of specific embodiments of some aspects of the invention. The examples are offered for illustrative purposes only, and are not intended to limit the scope of the invention in any way. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperatures, etc.), but some experimental error and deviation should, of course, be allowed for.
Example 1: Identification of a Type I-E Cas endoribonuclease protein for poly-guide RNA expression
[0430] To develop a Cas9 multi-guide RNA (mgRNA, poly-guide RNA) expression platform for eukaryotic cells, in particular plant cells, the cas endoribonuclease gene from a Type I-E CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR associated) system was identified. This was accomplished by first retrieving the genomic sequence for Sth7710 (NCBI accession number: NZ_AWVZ00000000.1) and examining it for the presence of CRISPR repeats using minCED (Bland, C. et al. (2007) BMC Bioinformatics, 8:209). In total, 4 CRISPR arrays, each with a different repeat sequence, were identified. Next, genomic sequence regions adjacent to the identified CRISPR arrays (about 10 kb 5 prime and 3 prime) were examined for the presence of open-reading frames (ORFs) encoding proteins with homology to Cas3, the signature protein of Type I CRISPR-Cas systems, by comparisons with NCBI protein databases using the PSI-BLAST program (Altschul, S. F. et al., 1997, Nucleic Acids Res. 25:3389-3402). One of the CRISPR arrays, comprising a 28 bp repeat with the consensus sequence 5’-GTTTTTCCCGCACACGCGGGGGTGATCC-3’ (SEQID NO:2), was demonstrated to be adjacent to an ORF encoding a Cas3 protein. Next, genes encoding the other proteins typical of a Type I-E CRISPR-Cas system (Makarova, K. S. et al., 2015, Nature Rev. Microbiol. 13:722-736) were also identified within the locus (SEQ ID NOs: 2-9). This was accomplished by first translating the ORFs between the cas3 gene and CRISPR array into proteins using ORF Finder (Stothard, P. (2000) Biotechniques 28:1102-1104) followed by comparisons with NCBI protein databases using the PSI-BLAST program (Altschul, S. F. et al., 1997). As shown in FIG. 1, the resulting genes produced an operon structure typical of a Type I-E CRISPR-Cas system (Makarova, K. S. et al., 2015, Nature Rev. Microbiol. 13:722-736).
The gene responsible for CRISPR RNA cleavage and maturation (endoribonuclease, SEQID NO:39) was located between the casD and cas1 genes.
Example 2: Utilization of Cas endoribonuclease from the Type I-E CRISPR system for poly-guide RNA expression
[0431] The multi-protein complex of the Type I-E system, termed Cascade, is directed by small CRISPR RNAs (referred to herein as guide RNAs) to bind DNA target sites (Jore, M. M. et al. (2011) Nat. Struct. Mol. Biol. 18:529-536 and Sinkunas, T. et al. (2013) EMBO J. 32:385-394). These guide RNAs comprise a ~33 nt variable targeting sequence flanked by fixed sequences: a ~7 nt 5 prime sequence and a ~21 nt 3 prime hairpin-comprising sequence (FIG. 2A). These fixed flanking sequences are the result of cleavage within the repeat sequences of the primary CRISPR array transcript by the Cas endoribonuclease protein in Type I-E CRISPR-Cas systems (FIG. 2B) (Gesner, E. M. et al. (2011) Nat. Struct. Mol. Biol. 18:688-692). In this example, both the Type I-E Cas endoribonuclease and its CRISPR repeat sequence are repurposed to permit Cas9 guide RNA processing in eukaryotic cells, in particular plant cells. By combining two CRISPR technologies (Cas endoribonuclease and Cas9 endonuclease) (FIG. 3A), multiple Cas9 guide RNAs may be expressed from a single promoter, simplifying the delivery of multiple gRNA expression cassettes (FIG. 3B).
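The processing described above can be illustrated with a minimal sketch. It uses the 28 nt repeat consensus reported in Example 1 and assumes, based only on the approximate ~21 nt and ~7 nt flank lengths stated above, that cleavage occurs after the 21st nucleotide of each repeat. The cut position, the function names, and the two 33 nt spacers are illustrative assumptions and not part of the disclosure.

```python
import re

# 28 nt repeat consensus from Example 1 (DNA alphabet; T stands in for U).
REPEAT = "GTTTTTCCCGCACACGCGGGGGTGATCC"

# Assumption: the first 21 nt of each repeat stay on the upstream guide
# (3' hairpin flank) and the last 7 nt stay on the downstream guide (5' flank).
CUT_OFFSET = 21


def revcomp(seq):
    """Reverse complement of a DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def hairpin_stem(repeat):
    """Return (left arm, loop, right arm) if the CCCGC-NNNN-GCGGG motif is present."""
    match = re.search(r"(CCCGC)([ACGT]{4})(GCGGG)", repeat)
    return match.groups() if match else None


def process_transcript(transcript, repeat=REPEAT, cut=CUT_OFFSET):
    """Cut inside every repeat and return the fragments flanked by two cut sites."""
    fragments, start = [], 0
    pos = transcript.find(repeat)
    while pos != -1:
        fragments.append(transcript[start:pos + cut])
        start = pos + cut
        pos = transcript.find(repeat, pos + len(repeat))
    fragments.append(transcript[start:])
    # The first and last fragments carry only one flank, so they are dropped.
    return fragments[1:-1]


# Hypothetical 33 nt spacers, used only to exercise the functions.
SPACERS = [("ACGT" * 9)[:33], ("TGCA" * 9)[:33]]
ARRAY = REPEAT + SPACERS[0] + REPEAT + SPACERS[1] + REPEAT

for guide in process_transcript(ARRAY):
    print(len(guide), guide)
```

Under these assumptions each released fragment is 7 + 33 + 21 = 61 nt long, and the two arms of the motif are reverse complements of each other, which is consistent with the hairpin described for the 3 prime flank.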
Example 3: Transformation of Type I-E Cas endoribonuclease, Cas9, and gRNA expression constructs for multiplex genome manipulation
[0432] In some aspects, the compositions disclosed herein may be utilized to modify the transcriptional status of a gene or a target polynucleotide in the genome of a cell. In some aspects, said cell is a eukaryotic cell. In one example of a eukaryotic cell, a plant cell is used. Transformation of a eukaryotic cell with a Type I-E Cas endoribonuclease and Cas9 and associated guide polynucleotide can be accomplished by various methods known to be effective in plants, including particle-mediated delivery, Agrobacterium-mediated transformation, PEG-mediated delivery, and electroporation. It is appreciated that any method known in the art may be utilized. Example methods are described below.
Particle-mediated delivery
[0433] Transformation of maize immature embryos using particle delivery was performed as follows. Media recipes follow below.
[0434] The ears were husked and surface sterilized in 30% Clorox® bleach plus 0.5%
Micro detergent for 20 minutes, and rinsed two times with sterile water. The immature embryos were isolated and placed embryo axis side down (scutellum side up), 25 embryos per plate, on bombardment medium for 4 hours and then aligned within the 2.5-cm target zone in preparation for bombardment. Alternatively, isolated embryos were placed on initiation medium and placed in the dark at temperatures ranging from 26 degrees Celsius to 37 degrees Celsius for 8 to 24 hours prior to placing on bombardment medium for 4 hours at 26 degrees Celsius prior to bombardment as described above.
[0435] Plasmids comprising genes encoding the Cas endoribonuclease, Cas9 and associated guide polynucleotide constructs were constructed using standard molecular biology techniques and co-bombarded with plasmids comprising the developmental genes ODP2 (AP2 domain transcription factor ODP2 (Ovule development protein 2); US20090328252 A1) and Wuschel (US2011/0167516). The plasmids were precipitated onto 0.6 micrometer (average diameter) gold pellets using a water-soluble cationic lipid transfection reagent as follows. DNA solution was prepared on ice using 1 microgram of plasmid DNA and optionally other constructs for co-bombardment such as 50 ng (0.5 microliters) of each plasmid comprising the developmental genes ODP2 (AP2 domain transcription factor ODP2 (Ovule development protein 2); US20090328252 A1) and Wuschel. To the pre-mixed DNA, 20 microliters of prepared gold particles (15 mg/ml) and 5 microliters of a water-soluble cationic lipid transfection reagent were added in water and mixed carefully. Gold particles were pelleted in a microfuge at 10,000 rpm for 1 min and the supernatant was removed. The resulting pellet was carefully rinsed with 100 microliters of 100% EtOH without resuspending the pellet, and the EtOH rinse was carefully removed. 105 microliters of 100% EtOH was added and the particles were resuspended by brief sonication. Then, 10 microliters was spotted onto the center of each macrocarrier and allowed to dry about 2 minutes before bombardment.
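As a rough check on the quantities in the gold-particle protocol above, the following sketch works through the per-macrocarrier arithmetic. It assumes that all of the DNA binds to the particles, that nothing is lost during the rinse, and that both optional developmental-gene plasmids are included; these assumptions and all variable names are illustrative only.

```python
# Quantities quoted in the gold-particle protocol above.
GOLD_VOLUME_UL = 20.0
GOLD_CONC_UG_PER_UL = 15.0   # 15 mg/ml equals 15 micrograms per microliter
PLASMID_NG = 1000.0          # 1 microgram of plasmid DNA
HELPER_NG_EACH = 50.0        # each developmental-gene plasmid
N_HELPERS = 2                # ODP2 and Wuschel (assumed both included)
RESUSPENSION_UL = 105.0      # final ethanol volume
ALIQUOT_UL = 10.0            # volume spotted per macrocarrier


def per_aliquot(total_amount):
    """Share of a total that ends up in one spotted aliquot (ideal mixing assumed)."""
    return total_amount * ALIQUOT_UL / RESUSPENSION_UL


gold_total_ug = GOLD_VOLUME_UL * GOLD_CONC_UG_PER_UL
dna_total_ng = PLASMID_NG + N_HELPERS * HELPER_NG_EACH
full_aliquots = int(RESUSPENSION_UL // ALIQUOT_UL)

print("full aliquots per tube:", full_aliquots)
print("gold per aliquot (micrograms):", round(per_aliquot(gold_total_ug), 1))
print("DNA per aliquot (ng):", round(per_aliquot(dna_total_ng), 1))
```

The number of full 10 microliter aliquots obtained from 105 microliters agrees with the ten aliquots per tube mentioned for the bombardment step.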
[0436] Alternatively, the plasmids and DNA of interest were precipitated onto 1.1 micrometer (average diameter) tungsten pellets using a calcium chloride (CaCl2) precipitation procedure by mixing 100 microliters prepared tungsten particles in water, 10 microliters DNA in Tris EDTA buffer (1 microgram total DNA), 100 microliters 2.5 M CaCl2, and 10 microliters 0.1 M spermidine. Each reagent was added sequentially to the tungsten particle suspension, with mixing. The final mixture was sonicated briefly and allowed to incubate under constant vortexing for 10 minutes. After the precipitation period, the tubes were centrifuged briefly, liquid was removed, and the particles were washed with 500 microliters 100% ethanol, followed by a 30 second centrifugation. Again, the liquid was removed, and 105 microliters of 100% ethanol was added to the final tungsten particle pellet. For particle gun bombardment, the tungsten/DNA particles were briefly sonicated. 10 microliters of the tungsten/DNA particles was spotted onto the center of each macrocarrier, after which the spotted particles were allowed to dry about 2 minutes before bombardment. The sample plates were bombarded at level #4 with a Biorad Helium Gun. All samples received a single shot at 450 PSI, with a total of ten aliquots taken from each tube of prepared particles/DNA. Following bombardment, the embryos were incubated on maintenance medium for 12 to 48 hours at temperatures ranging from 26C to 37C, and then placed at 26C.
[0437] The following alternative protocol is for stable transformation; rapid assays were done 2-4 days after bombardment. After 5 to 7 days the embryos are transferred to selection medium containing 3 mg/liter Bialaphos, and subcultured every 2 weeks at 26C. After approximately 10 weeks of selection, selection-resistant callus clones are transferred to medium to initiate plant
regeneration. Following somatic embryo maturation (2-4 weeks), well-developed somatic embryos are transferred to medium for germination and transferred to a lighted culture room. Approximately 7-10 days later, developing plantlets are transferred to hormone-free medium in tubes for 7-10 days until plantlets are well established. Plants are then transferred to inserts in flats (equivalent to a 2.5" pot) containing potting soil and grown for 1 week in a growth chamber, subsequently grown an additional 1-2 weeks in the greenhouse, then transferred to Classic 600 pots (1.6 gallon) and grown to maturity. Plants are monitored and scored for transformation efficiency, and/or modification of regenerative capabilities.
[0438] Initiation medium comprised 4.0 g/L N6 basal salts (SIGMA C-1416), 1.0 ml/L Eriksson's Vitamin Mix (1000X SIGMA-1511), 0.5 mg/L thiamine HCl, 20.0 g/L sucrose, 1.0 mg/L 2,4-D, and 2.88 g/L L-proline (brought to volume with DI-H2O following adjustment to pH 5.8 with KOH); 2.0 g/L Gelrite (added after bringing to volume with DI-H2O); and 8.5 mg/L silver nitrate (added after sterilizing the medium and cooling to room temperature).
[0439] Maintenance medium comprised 4.0 g/L N6 basal salts (SIGMA C-1416), 1.0 ml/L Eriksson's Vitamin Mix (1000X SIGMA-1511), 0.5 mg/L thiamine HCl, 30.0 g/L sucrose, 2.0 mg/L 2,4-D, and 0.69 g/L L-proline (brought to volume with DI-H2O following adjustment to pH 5.8 with KOH); 3.0 g/L Gelrite (added after bringing to volume with DI-H2O); and 0.85 mg/L silver nitrate (added after sterilizing the medium and cooling to room temperature).
[0440] Bombardment medium comprised 4.0 g/L N6 basal salts (SIGMA C-1416), 1.0 ml/L Eriksson's Vitamin Mix (1000X SIGMA-1511), 0.5 mg/L thiamine HCl, 120.0 g/L sucrose, 1.0 mg/L 2,4-D, and 2.88 g/L L-proline (brought to volume with DI-H2O following adjustment to pH 5.8 with KOH); 2.0 g/L Gelrite (added after bringing to volume with DI-H2O); and 8.5 mg/L silver nitrate (added after sterilizing the medium and cooling to room temperature).
[0441] Selection medium comprised 4.0 g/L N6 basal salts (SIGMA C-1416), 1.0 ml/L Eriksson's Vitamin Mix (1000X SIGMA-1511), 0.5 mg/L thiamine HCl, 30.0 g/L sucrose, and 2.0 mg/L 2,4-D (brought to volume with DI-H2O following adjustment to pH 5.8 with KOH); 3.0 g/L Gelrite (added after bringing to volume with DI-H2O); and 0.85 mg/L silver nitrate and 3.0 mg/L bialaphos (both added after sterilizing the medium and cooling to room temperature).
[0442] Plant regeneration medium comprised 4.3 g/L MS salts (GIBCO 11117-074), 5.0 ml/L MS vitamins stock solution (0.100 g/L nicotinic acid, 0.02 g/L thiamine HCl, 0.10 g/L pyridoxine HCl, and 0.40 g/L glycine brought to volume with polished DI-H2O) (Murashige and Skoog (1962) Physiol. Plant. 15:473), 100 mg/L myo-inositol, 0.5 mg/L zeatin, 60 g/L sucrose, and 1.0 ml/L of 0.1 mM abscisic acid (brought to volume with polished DI-H2O after adjusting to pH 5.6); 3.0 g/L Gelrite (added after bringing to volume with DI-H2O); and 1.0 mg/L indoleacetic acid and 3.0 mg/L bialaphos (added after sterilizing the medium and cooling to 60°C).
[0443] Hormone-free medium comprised 4.3 g/L MS salts (GIBCO 11117-074), 5.0 ml/L MS vitamins stock solution (0.100 g/L nicotinic acid, 0.02 g/L thiamine HCl, 0.10 g/L pyridoxine HCl, and 0.40 g/L glycine brought to volume with polished DI-H2O), 0.1 g/L myo-inositol, and 40.0 g/L sucrose (brought to volume with polished DI-H2O after adjusting pH to 5.6); and 6 g/L bacto-agar (added after bringing to volume with polished DI-H2O), sterilized and cooled to 60°C.
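The N6-based media above share most components and differ in only a few. A minimal sketch, with the concentrations transcribed from the initiation, maintenance, and bombardment recipes, makes those differences explicit and scales a recipe to a batch volume. The dictionary keys and function names are illustrative assumptions; the preparation order (pH adjustment, Gelrite, post-sterilization additions) is not modeled.

```python
# Concentrations per liter, transcribed from the recipes above.
INITIATION = {
    "N6_basal_salts_g": 4.0,
    "Eriksson_vitamin_mix_ml": 1.0,
    "thiamine_HCl_mg": 0.5,
    "sucrose_g": 20.0,
    "2,4-D_mg": 1.0,
    "L-proline_g": 2.88,
    "Gelrite_g": 2.0,
    "silver_nitrate_mg": 8.5,
}
# Maintenance and bombardment media restated as changes to the initiation recipe.
MAINTENANCE = dict(INITIATION, **{
    "sucrose_g": 30.0, "2,4-D_mg": 2.0, "L-proline_g": 0.69,
    "Gelrite_g": 3.0, "silver_nitrate_mg": 0.85,
})
BOMBARDMENT = dict(INITIATION, sucrose_g=120.0)


def differences(recipe_a, recipe_b):
    """Components whose concentration differs, as {name: (value_a, value_b)}."""
    names = sorted(set(recipe_a) | set(recipe_b))
    return {n: (recipe_a.get(n), recipe_b.get(n))
            for n in names if recipe_a.get(n) != recipe_b.get(n)}


def scale(recipe, liters):
    """Amount of each component needed for a batch of the given volume."""
    return {name: amount * liters for name, amount in recipe.items()}


print(differences(INITIATION, BOMBARDMENT))
print(scale(BOMBARDMENT, 0.5))
```

With the values as transcribed, the bombardment medium differs from the initiation medium only in its sucrose concentration.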
[0444] RN-Cas9 (Cas9 in control treatments), the guide RNA plasmid, and helper plasmids for BBM, WUS and YFP were delivered to maize immature embryos using a biolistic particle gun at a ratio of RN-Cas9 (Cas9):gRNA:YFP:BBM:WUS = 50:20:25:12.5:12.5 (ng/shot). Immature embryos were harvested at 2-4 days after bombardment or 6-7 days after Agrobacterium infection.
Agrobacterium-mediated transformation of maize
[0445] Agrobacterium-mediated transformation was performed essentially as described in Djukanovic et al. (2006) Plant Biotech J 4:345-57. Briefly, 10-12 day old immature embryos (0.8-2.5 mm in size) were dissected from sterilized kernels and placed into liquid medium (4.0 g/L N6 Basal Salts (Sigma C-1416), 1.0 ml/L Eriksson's Vitamin Mix (Sigma E-1511), 1.0 mg/L thiamine HCl, 1.5 mg/L 2,4-D, 0.690 g/L L-proline, 68.5 g/L sucrose, 36.0 g/L glucose, pH 5.2). After embryo collection, the medium was replaced with 1 ml Agrobacterium at a concentration of 0.35-0.45 OD550. Maize embryos were incubated with Agrobacterium for 5 min at room temperature, then the mixture was poured onto a media plate containing 4.0 g/L N6 Basal Salts (Sigma C-1416), 1.0 ml/L Eriksson's Vitamin Mix (Sigma E-1511), 1.0 mg/L thiamine HCl, 1.5 mg/L 2,4-D, 0.690 g/L L-proline, 30.0 g/L sucrose, 0.85 mg/L silver nitrate, 0.1 nM acetosyringone, and 3.0 g/L Gelrite, pH 5.8. Embryos were incubated axis down in the dark for 3 days at 20°C, then incubated 4 days in the dark at 28°C. Embryos were harvested for DNA extraction.
[0446] In another variation for stable transformation, the embryos are then transferred onto new media plates containing 4.0 g/L N6 Basal Salts (Sigma C-1416), 1.0 ml/L Eriksson's Vitamin Mix (Sigma E-1511), 1.0 mg/L thiamine HCl, 1.5 mg/L 2,4-D, 0.69 g/L L-proline, 30.0 g/L sucrose, 0.5 g/L MES buffer, 0.85 mg/L silver nitrate, 3.0 mg/L Bialaphos, 100 mg/L carbenicillin, and 6.0 g/L agar, pH 5.8. Embryos are subcultured every three weeks until transgenic events are identified. Somatic embryogenesis is induced by transferring a small amount of tissue onto regeneration medium (4.3 g/L MS salts (Gibco 11117), 5.0 ml/L MS Vitamins Stock Solution, 100 mg/L myo-inositol, 0.1 µM ABA, 1 mg/L IAA, 0.5 mg/L zeatin, 60.0 g/L sucrose, 1.5 mg/L Bialaphos, 100 mg/L carbenicillin, 3.0 g/L Gelrite, pH 5.6) and incubation in the dark for two weeks at 28°C. All material with visible shoots and roots is transferred onto media containing 4.3 g/L MS salts (Gibco 11117), 5.0 ml/L MS Vitamins Stock Solution, 100 mg/L myo-inositol, 40.0 g/L sucrose, 1.5 g/L Gelrite, pH 5.6, and incubated under artificial light at 28°C. One week later, plantlets are moved into glass tubes containing the same medium and grown until they are sampled and/or transplanted into soil.
[0447] Agrobacterium was also used to deliver all components in a single T-DNA.
Agrobacterium-mediated transformation of canola
[0448] Agrobacterium-mediated transformation was performed for canola. Surface-sterilized canola seeds were germinated on medium (2.2 g/L MS Basal Salt mixture, 20 g/L sucrose, 8 g/L Sigma Agar) at 26C with 16 hours of light at an intensity of 30-60 µmol/m2/sec for 7 days in a phytatray. Hypocotyls or stems of the sterile canola seedlings were cut into ~2 mm segments, and 40-50 segments were placed in a petri dish containing 10 ml liquid medium (4.3 g/L Basal salt mixture, 0.1 g/L myo-inositol, 5 ml/L 36J-MS vitamin stock, 0.5 g/L MES buffer, 0.1 ml/L 12N-α-NAA 1 mg/ml, 1 ml/L 13108-BAP 1 mg/ml, 0.1 ml/L 17G-gibberellic acid 0.1 mg/ml, 1 ml/L 14020-thymidine 50 mg/ml) with 10 µl acetosyringone (100 mM) and 200 µl Agrobacterium at a concentration of 0.8-1.0 OD600. Plates were incubated at 21C in dim light (16 hours of light at an intensity of 5 µmol/m2/sec) for 3 days. Segments of the hypocotyls/stems were then transferred to plates with medium (4.3 g/L MS Basal salt mixture, 0.1 g/L myo-inositol, 5 ml/L 36J-MS vitamin stock solution, 0.5 ml/L MES buffer, 0.04 g/L adenine hemisulfate salt, 20 g/L sucrose, 0.5 g/L PVP-40, 5 g/L TC Agar, 0.1 ml/L 12N-α-NAA 1 mg/ml, 1 ml/L 13108-BAP 1 mg/ml, 0.1 ml/L 17G-gibberellic acid 0.1 mg/ml, 1 ml/L 47A-silver nitrate 2 mg/ml, 10 ml/L 17D-carbenicillin 50 mg/ml, 0.1 ml/L 47-spectinomycin dihydrochloride 50 mg/ml) and placed at 26C with 16 hours of light at an intensity of 30-60 µmol/m2/sec for 3 more days. Segments were collected for DNA extraction.
Agrobacterium-mediated transformation for sorghum
[0449] Agrobacterium-mediated transformation was performed essentially as described in Che et al. (2018) Plant Biotechnology Journal 16, 1388-1395. The transformation process bypassed the callus stage because the rapid process described in Lowe et al. (2018) In Vitro Cell Dev Biol Plant 54, 240-252 was applied. Leaves of T0 plants were sampled, DNA was extracted as described above, and target site mutations were analyzed by NGS. Mutations were detected in 1, 2, 3, and all 4 target sites.
Sequence verification of genomic polynucleotide modification
[0450] Samples of a transformed plant were obtained and sequenced, using any method known in the art, and compared to the genomic sequences of an isoline plant not transformed with the Cas endoribonuclease, Cas9 and associated guide polynucleotide constructs. The presence of non-homologous end-joining (NHEJ) insertion and/or deletion (indel) mutations resulting from DNA repair can also be used as a signature to detect cleavage activity.
Example 4: The Type I-E Cas endoribonuclease cleaved a poly-guide RNA and enabled multiplex site targeting and double-strand-break creation with a Cas9 endonuclease
[0451] DNA expression cassettes were constructed according to FIG. 3. The Cas9 gene was from Streptococcus pyogenes M1 GAS (SF370), and the potato ST-LS1 intron was introduced into it in order to eliminate its expression in E. coli and Agrobacterium. To facilitate nuclear localization of the Cas9 protein in maize cells, the Simian virus 40 (SV40, SEQID NO:14) monopartite amino terminal nuclear localization signal and the bipartite nuclear localization signal from the Agrobacterium tumefaciens VirD2 endonuclease (SEQID NO:15) were incorporated at the 5 prime and 3 prime ends of the Cas9 open reading frame, respectively. The Cas9 gene was operably linked to a maize Ubiquitin promoter using standard molecular biological techniques. SEQID NO:16 represents the Cas9 cassette.
[0452] Optimized RN from the Streptococcus thermophilus DGCC7710 Type I-E system was linked to Cas9 with a glycine-serine-glycine spacer (GSG) and a porcine teschovirus-1 2A self-cleaving peptide (P2A) (SEQ ID:7). An SV40 NLS was also incorporated at the amino terminus (5 prime end) of the RN in some constructs. The RN-Cas9 fusion was also linked to a maize Ubiquitin promoter and the pinII (potato proteinase inhibitor II) terminator using standard molecular biological techniques. SEQID NO:18 represents the RN-Cas9 cassette.
[0453] To direct the Cas9 nuclease to the designated genomic target sites, a maize U6 polymerase III promoter (SEQID NO:20; see WO2015026885, WO2015026887, WO2015026883, and WO2015026886) and its cognate U6 polymerase III termination sequences were used to direct initiation and termination of gRNA expression. The guide RNA coding sequence was 77 bp long (SEQID NO:21) and comprised a 12-30 bp variable targeting domain chosen from a maize genomic target site. The guide RNA variable targeting domains used in the multiplex development targeted the maize Y1 gene and were identified as Y1-CR2 and Y1-CR3, which correspond to the genomic target sites Y1-TS2 and Y1-TS3, respectively (SEQ ID: 8, 9, 22, 23, respectively).
[0454] Constructs were prepared for particle bombardment or Agrobacterium-mediated transformation, and introduced into Zea mays embryos as described above, to determine the optimal recognition sequence for the Type I-E Cas endoribonuclease (RN). Poly-guide RNA cassettes were designed as shown in FIG. 3B, with different orders of the individual gRNAs in the poly-guide RNA for cleavage by the RN to release the gRNAs Y1-CR2 (also called “Y2”, targeting the Zea mays target site TS2) and Y1-CR3 (also called “Y3”, targeting the Zea mays target site TS3). The construct of FIG. 3A was co-bombarded with each of the construct options shown in FIG. 3B. Two reps were conducted with 25 embryos each. Immature embryos were harvested at 2-4 days after bombardment, or 6-7 days after Agrobacterium infection, respectively.
[0455] Embryos were collected after transformation and pooled into a single tube for each rep. Embryos were lyophilized overnight and ground into powder in a GenoGrinder. Genomic DNA was extracted using the IBI Genomic DNA Mini kit (plant). PCR was done with Thermo Phusion F-548 for the first PCR and NEB Q5 for the second PCR. PCR products spanning CTS2 and CTS3 (FIG. 2) were sequenced with next generation sequencing (NGS) using a two-step approach. Primary PCR was done with primers f1/r1 and f2/r2 (SEQ ID NOs: 10, 11, 12, 13) for CTS2 and CTS3, respectively; SEQID NO:49 and SEQID NO:50 were bridge sequences in the forward and reverse primers, respectively. Secondary (universal) PCR was done using primers of SEQID NO:51 and SEQID NO:52; the green sequences were for Illumina sequencing, and the 8N on the reverse primer indicated the bar code position, which corresponded to the sample location on the plate for sequencing. Target site mutation frequency was calculated as the number of mutation reads divided by the total number of reads.
[0456] To determine the optimal binding site sequence of the RN protein for guide RNA processing, five sequences (Table 2) were tested: the original 28 bp repeat sequence (RN-RS0) in the Streptococcus thermophilus DGCC7710 Type I-E CRISPR array and 4 variants derived from RN-RS0. Guide RNA gene cassettes comprised the maize U6 polymerase III promoter, Y1-CR2 and Y1-CR3, and the U6 terminator. The endoribonuclease recognition sequence (RN RS) was used to link the Y1-CR2 and Y1-CR3 guide RNAs (FIG. 3B).
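The cassette layout described above, in which a recognition sequence links the guides and both guide orders are tested, can be sketched as follows. The recognition sequence used here is the original 28 bp repeat from Example 1; the two guide sequences are hypothetical placeholders (the actual sequences are given by the SEQ ID NOs and are not reproduced here), and the `flank_ends` option is an assumption reflecting the claim language that each guide may be flanked by a recognition sequence.

```python
from itertools import permutations

# Original 28 bp repeat from Example 1, used as the recognition sequence (RS).
RS = "GTTTTTCCCGCACACGCGGGGGTGATCC"

# Hypothetical placeholder guide sequences; NOT the sequences of the SEQ ID NOs.
GUIDES = {
    "Y1-CR2": "ACGTTGCAACGTTGCAACGT",
    "Y1-CR3": "TGCAACGTTGCAACGTTGCA",
}


def build_poly_guide(order, guides=GUIDES, rs=RS, flank_ends=False):
    """Join guides in the given order with the RS between neighbors.

    With flank_ends=True an RS is also placed before the first and after
    the last guide, so that every guide is flanked on both sides.
    """
    body = rs.join(guides[name] for name in order)
    return rs + body + rs if flank_ends else body


# One cassette per guide order, e.g. Y2-Y3 and Y3-Y2 for two guides.
cassettes = {"/".join(order): build_poly_guide(order)
             for order in permutations(GUIDES)}

for name, seq in cassettes.items():
    print(name, len(seq), seq.count(RS))
```

For n guides this enumerates n! orders, so for larger guide sets a subset of orders would normally be chosen rather than all permutations.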
[0457] The mutation frequency at target sites Y1-TS2 and Y1-TS3 was used to evaluate the ability of RN-Cas9 to process the Y1-CR2 and Y1-CR3 guide RNAs. The number of mutation reads was obtained by combining the read counts of the top 10 types of indel mutation at each site. Plasmids with Y1-CR2 and Y1-CR3 each controlled by an individual maize U6 promoter were co-delivered with Zm-UBI:Cas9 as the control (FIG. 5, Vector 1). Among the 5 RN binding sites tested, RN-RS4 performed the best (FIG. 6): mutations were detected at both Y1-TS2 and Y1-TS3 regardless of the position of the Y1-CR2 and Y1-CR3 guides in the cassette. RN-RS0 also showed some multiplexing activity, RN-RS1 showed no multiplexing activity at all, RN-RS2 could only process the guides when it was adjacent to the U6 promoter, and RN-RS3 showed the opposite effect to RN-RS2. Results were consistent when RN-RS0, RN-RS3, and RN-RS4 were retested. FIG. 6 shows the frequencies of mutations identified with each of the recognition sequences from each of the two input cassette gRNA orders (Y2-Y3 and Y3-Y2), at 7 days post-bombardment. Based on these data, RS4 was selected for further experimentation.
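The mutation frequency calculation described above (reads of the top 10 indel types divided by total reads at a site) can be sketched as follows. The allele labels, the function name, and the example read counts are hypothetical; real NGS analysis would first require alignment and indel calling, which is not shown.

```python
from collections import Counter


def mutation_frequency(read_alleles, wild_type="WT", top_n=10):
    """Fraction of reads that carry one of the top_n most frequent indel types.

    read_alleles: one label per read, e.g. "WT", "-1", "+1" (labels are
    hypothetical; any hashable allele description works).
    """
    counts = Counter(read_alleles)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    indels = Counter({a: n for a, n in counts.items() if a != wild_type})
    mutant_reads = sum(n for _, n in indels.most_common(top_n))
    return mutant_reads / total


# Hypothetical reads for one target site: 90 wild type, 6 with a 1 bp
# deletion, 4 with a 1 bp insertion.
reads = ["WT"] * 90 + ["-1"] * 6 + ["+1"] * 4
print(mutation_frequency(reads))
```

Restricting the numerator to the most frequent indel types, as the text describes, tends to exclude rare sequencing artifacts, at the cost of slightly underestimating the frequency when many distinct low-abundance indels are present.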
Table 2: Recognition sequences for the Type I-E Cas endoribonuclease
Conserved sequences are indicated by underlining.
[Table 2 (columns: Name, Sequence, Seq ID) was provided as images in the source and is not reproduced here.]
[0458] Four different promoters were tested for the poly-guide RNA cassettes, as shown in FIG. 5 (Vectors 5, 6, 7, and 9). Polymerase II promoters and terminators can also drive the RN binding site-linked guide RNA. ZM-UBI, the Setaria italica ubiquitin promoter (SI-UBI) (SEQID NO:28), and the maize bidirectional ubiquitin promoter (SEQID NO:31), and the corresponding terminators ZM-UBI TERM, CAMV35S TERM, SI-UBI TERM, and ZM-U6 TERM, were used to control the RN binding site (RS4)-linked Y1-CR2 and Y1-CR3 guides.
[0459] To facilitate delivery of the genome-editing reagents into maize cells, Baby boom (BBM, also known as ovule development protein 2 (ODP2)) (SEQID NO:32) and Wuschel 2 (WUS2) (SEQID NO:33) were expressed under control of the maize UBI promoter and the In2-2 promoter (SEQID NO:34), respectively, and yellow fluorescent protein (YFP) was expressed under Zm-UBI promoter control. The RN-Cas9 expression cassette, the guide RNA cassettes, and the helper gene cassettes were in separate plasmids or constructed into a single T-DNA, depending on the transformation method used. All plasmid constructs were assembled using chemically synthesized DNA fragments with standard molecular cloning techniques. FIG. 7 shows the percent reads mutation frequencies for each of the constructs with different promoters (Zea mays ubiquitin promoter (ZmUBI), Setaria italica Ubiquitin promoter (SiUBI), Zea mays U6 promoter (ZmU6), and a Zea mays Ubiquitin bi-directional promoter (ZmUBI bidirectional)), at 4 days post bombardment.
[0460] Next, vectors were prepared for Agrobacterium-mediated transformation of corn embryos, using the methods described above, and introduced into maize embryos, using the ZmUBI, SiUBI, and ZmUBI bidirectional promoters at two different target sites, CR2 and CR3.
[0461] The results are shown in Table 3 and in FIG. 8.
Table 3: Agrobacterium-mediated introduction of RN for multiplex site targeting in maize (7 days after infection)
[Table 3 was provided as images in the source and is not reproduced here.]
[0462] Next, five sites in maize were targeted using the vector shown in FIG. 9, including: SH2-CR4 (SEQID NO:67), SH2-CR5 (SEQID NO:68), SU1-CR1 (SEQID NO:69), SU1-CR4 (SEQID NO:70). The guide sequences are given for each target as: SH2-CR4 (SEQID NO:53), SH2-CR5 (SEQID NO:54), Y1-CR2 (SEQID NO:8), Y1-CR3 (SEQID NO:9), SU1-CR1 (SEQID NO:55), and SU1-CR4 (SEQID NO:56). Mutation fractions for each site, and the primer sequence SEQIDs to verify mutations, are shown in Table 4.
Table 4: Multiplex site targeting in maize
[Table 4 was provided as an image in the source and is not reproduced here.]
[0463] Next, four sites in sorghum were targeted using the vector shown in FIG. 10: OSDL1-CR3 (SEQID NO:71), OSDL3-CR1 (SEQID NO:72), REC8-CR4 (SEQID NO:73), SPO11-CR1 (SEQID NO:74). The guide sequences are given for each target as: OSDL1-CR3 (SEQID NO:57), OSDL3-CR1 (SEQID NO:58), REC8-CR4 (SEQID NO:59) and SPO11-CR1 (SEQID NO:60). Results are shown in Table 5 (NGS = Next Generation Sequencing). Multiplex delivery of 4 guides generated knockout mutations in T0 plants at approximately 20% frequency. Mutations were detected at 1, 2, 3, and all 4 target sites. Some sites had bi-allelic mutations.
Table 5: Multiplex site targeting in sorghum
[Table 5 was provided as an image in the source and is not reproduced here.]
[0464] Next, two different target sites (CR1 and CR2) were targeted in Canola (A and C genomes), as shown in FIG. 11A, using the vector design depicted in FIG. 12A: PGAZ-CR1 C genome target site (SEQID NO:75), PGAZ-CR1 A genome target site (SEQID NO:76), PGAZ-CR2 target site (A and C genomes) (SEQID NO:77). The CR1 target site (PGAZ gene) differs by one nucleotide between the two genomes, as shown in FIG. 11B; therefore, two different guides were needed to target the PGAZ gene in both the A and C genomes (BNA-PGAZ CR1 guide and BNA-PGAZ CR10BC-A guide shown in FIG. 12A). CR2 was targeted with the BNA-PGAZ CR2 guide (FIG. 12A). The guide sequences are given as: PGAZ-CR1 C genome (SEQID NO:61), PGAZ-CR1 A genome (SEQID NO:62), PGAZ-CR2 guide (A and C genomes) (SEQID NO:63). Primers to detect mutations are given as (f = forward primer, r = reverse primer): PGAZ-CR1-Af (SEQID NO:86), PGAZ-CR1-Ar (SEQID NO:87), PGAZ-CR1-Cf (SEQID NO:88), PGAZ-CR1-Cr (SEQID NO:89), PGAZ-CR2f1 (SEQID NO:90), and PGAZ-CR2r1 (SEQID NO:91).
[0465] As shown in FIG. 12B, both the A and C genomes in Canola were successfully mutated at both the CR1 and CR2 sites.
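The reasoning behind using two CR1 guides but a single CR2 guide can be sketched as a comparison of homoeologous target sites. The sequences below are hypothetical placeholders constructed to differ at one position (the actual sites are given by the SEQ ID NOs), and the rule of one perfectly matching guide per distinct site is a simplifying assumption.

```python
def mismatches(site_a, site_b):
    """Positions (0-based) at which two equal-length target sites differ."""
    if len(site_a) != len(site_b):
        raise ValueError("target sites must have the same length")
    return [i for i, (x, y) in enumerate(zip(site_a, site_b)) if x != y]


def guides_needed(sites_by_genome):
    """One perfectly matching guide per distinct target sequence (assumed rule)."""
    return sorted(set(sites_by_genome.values()))


# Hypothetical placeholder sequences, NOT the sequences of the SEQ ID NOs.
CR1_SITES = {"A": "GACCTGAAGCTTGGATCCAG", "C": "GACCTGAAGCTTGGATCCTG"}
CR2_SITES = {"A": "TTGACCGGTAACGTCAGGCA", "C": "TTGACCGGTAACGTCAGGCA"}

print(mismatches(CR1_SITES["A"], CR1_SITES["C"]), len(guides_needed(CR1_SITES)))
print(mismatches(CR2_SITES["A"], CR2_SITES["C"]), len(guides_needed(CR2_SITES)))
```

Because a poly-guide RNA can carry several guides in one transcript, adding a second guide for a single-nucleotide variant does not require an additional expression cassette.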
[0466] Different promoters for driving the Cas6 (RN) cassette were tested (schema depicted in FIG. 13B), for a plasmid that also comprised the Cas9 cassette as shown in FIG. 13A, for two different target sites in maize, TS45 and Y1-CR2. The Cas6 construct in FIG. 13B was co-bombarded with the Cas9 and guide RNA vector in FIG. 13A. The Ubiquitin bidirectional promoter is given as SEQID NO:64, the UBI10 5’UTR and intron 1 sequence is given as SEQID NO:65, and the NLS is given as SEQID NO:66. Results are shown in FIG. 13C and FIG. 13D. Introduction of separate Cas6 cassettes was also successful in introducing mutations.
[0467] These examples demonstrate that the Cas endoribonuclease identified and derived from Streptococcus thermophilus, for example as provided by the polypeptide sequence SEQID NO:48 or as encoded by the polynucleotide sequences SEQID NO:1 or SEQID NO:39, or any functional variant or fragment of any of the preceding, may be used to cleave a poly-guide RNA molecule to release individual guide RNAs that may form a complex with a Cas endonuclease, for the recognition, binding, and optionally nicking or cleaving a DNA target, for example in the genome of a cell. It will be appreciated by one of skill in the art that the compositions and methods disclosed herein may be used in any prokaryotic or eukaryotic cell, such as a bacterial cell, an animal cell, a fungal cell, or a plant cell. The plant cell may be from any plant, for example from a monocot or dicot, for example but not limited to maize, soy, sorghum, canola, wheat, rice, cotton, or sunflower. The Cas endonuclease may be Cas9, Cpf1, part of a Cascade, or any RNA-guided Cas endonuclease or Cas endonuclease system. The method of introduction of the endonuclease, poly-guide RNA molecule, and/or endoribonuclease may be via any method known in the art, such as but not limited to Agrobacterium-mediated transformation, particle bombardment, whisker-mediated transformation, electroporation, floral dip, co-incubation, or lipofection.
[0468] Delivery of any component to a target polynucleotide or a target cell may be with all DNA components (such as encoding all components on a DNA vector), or alternatively, some or all components may be delivered as RNA (for example, the poly-guide RNA may be delivered directly to the target polynucleotide or cell as an RNA molecule), or some or all components may be delivered as a protein (for example, the endoribonuclease or endonuclease may be delivered directly to the target polynucleotide or cell as a protein). Some or all of the components (poly-guide RNA, endonuclease, endoribonuclease) may be delivered together to the target polynucleotide or target cell, or one or more components may be introduced separately. Separate introduction may occur concurrently, or be spatially or temporally distinct. The poly-guide RNA and the endoribonuclease may be introduced to each other for poly-guide RNA processing into individual discrete gRNA molecules prior to the introduction of the Cas endonuclease.
[0469] It will be appreciated by one skilled in the art that using the methods and endoribonuclease disclosed herein, any guide RNA composition or combination of gRNAs may be utilized. The examples provided herein demonstrate multiplex polynucleotide editing using a plurality of guide RNAs provided as components of a poly-guide RNA molecule that is cleaved by an endoribonuclease, and are not limited to any particular target site, target polynucleotide, cell type, or specific guide RNA composition.

Claims

WE CLAIM:
1. A composition comprising:
(a) an endoribonuclease, and
(b) a heterologous poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules, wherein a region of each of the discrete guide RNA molecules shares substantial identity with a target sequence in the genome of the cell and wherein each of the discrete guide RNA molecules is flanked by a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease molecule; wherein the endoribonuclease is Cas6.
2. The composition of Claim 1, further comprising a Cas endonuclease, wherein at least one of the discrete guide RNA molecules forms a complex with the Cas endonuclease.
3. The composition of Claim 2, wherein the plurality of discrete RNA molecules comprise at least two non-identical guide RNA molecules that are each capable of targeting at least two distinct target sequences in the genome.
4. The composition of Claim 1, wherein the nuclease recognition sequence comprises the nucleotides CCCGCNNNNGCGGG.
5. The composition of Claim 1, wherein the endoribonuclease molecule shares at least 85% sequence identity with SEQID NO:48.
6. The composition of Claim 1, wherein the poly-guide RNA molecule is provided as a DNA molecule operably linked to a promoter.
7. A cell comprising:
(a) an endoribonuclease,
(b) a composition comprising a poly-guide RNA molecule comprising a plurality of discrete guide RNA molecules, wherein a region of each of the discrete guide RNA molecules shares substantial identity with a target sequence in the genome of the cell and wherein each of the discrete guide RNA molecules is flanked by a nuclease recognition sequence that is capable of functional interaction with the endoribonuclease molecule, and
(c) a Cas endonuclease.
8. The cell of Claim 7, wherein the cell is a eukaryotic cell.
9. The cell of Claim 7, wherein the cell is a plant cell.
10. A method of targeting a plurality of target sequences in a genome of a cell, the method comprising providing the cell with the composition of Claim 1.
11. A method of generating a plurality of guide RNA molecules in a cell, the method comprising:
(a) providing a heterologous RNA molecule comprising a plurality of discrete guide RNA molecules to the cell, wherein the plurality of the guide RNA molecules are each flanked by an endoribonuclease recognition sequence;
(b) providing an endoribonuclease that cleaves the endoribonuclease recognition sequence, and thereby generating the plurality of guide RNA molecules in the cell.
12. The method of Claim 11, further comprising providing to the cell a Cas endonuclease.
13. A method of targeting a plurality of target polynucleotides in a cell, comprising providing to the cell an endoribonuclease, a Cas endonuclease, and a poly-guide RNA molecule, wherein the poly-guide RNA molecule comprises a plurality of discrete guide RNAs that are each capable of selectively hybridizing to one or more of the target polynucleotides.
14. The method of Claim 13, wherein the cell is a bacterium, a plant cell, a fungal cell, or an animal cell.
15. The method of Claim 13, wherein the Cas endonuclease, the endoribonuclease, and the poly-guide RNA molecule are provided on one vector to the cell.
16. The method of Claim 13, wherein at least two of the Cas endonuclease, the poly-guide RNA molecule, and the endoribonuclease are provided on different DNA vectors to the cell.
17. The method of Claim 13, wherein the Cas endonuclease is provided to the cell as a protein.
18. The method of Claim 13, wherein the endoribonuclease is provided to the cell as a protein.
19. The method of Claim 13, wherein the poly-guide RNA molecule is provided to the cell as RNA.
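The 85% sequence identity threshold recited in Claim 5 can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to equal length with `-` as the gap character, and it counts identity over all aligned columns. This is only one common convention; the alignment method and parameters defined elsewhere in the specification govern the claim. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pre-computed pairwise alignment.

    Assumption: both inputs are the same length and use '-' for gaps.
    Columns where both sequences have a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Keep every column except those that are gaps in both sequences.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")

    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Hypothetical ten-residue peptides differing at one position.
reference = "MKTAYIAKQR"
candidate = "MKTAYLAKQR"

identity = percent_identity(reference, candidate)
meets_threshold = identity >= 85.0
print(identity, meets_threshold)
```

Under this convention a candidate with one mismatch in ten aligned positions would satisfy an 85% threshold, while one with two mismatches would not.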
PCT/US2019/067032 2018-12-21 2019-12-18 Multiplex genome targeting WO2020131986A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201862783604P 2018-12-21 2018-12-21
US62/783,604 2018-12-21

Publications (1)

Publication Number Publication Date
WO2020131986A1 (en) 2020-06-25

Family

ID=71101684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/067032 WO2020131986A1 (en) 2018-12-21 2019-12-18 Multiplex genome targeting

Country Status (1)

Country Link
WO (1) WO2020131986A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023278802A1 (en) * 2021-07-01 2023-01-05 Cedars-Sinai Medical Center Formulations for oral delivery of nucleic acids
US11660355B2 (en) 2017-12-20 2023-05-30 Cedars-Sinai Medical Center Engineered extracellular vesicles for enhanced tissue delivery
US11759482B2 (en) 2017-04-19 2023-09-19 Cedars-Sinai Medical Center Methods and compositions for treating skeletal muscular dystrophy

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110217739A1 (en) * 2008-11-06 2011-09-08 University Of Georgia Research Foundation, Inc. Cas6 polypeptides and methods of use
US20170022499A1 (en) * 2014-04-03 2017-01-26 Massachusetts Institute Of Technology Methods and compositions for the production of guide rna
US20180251784A1 (en) * 2014-06-26 2018-09-06 Regeneron Pharmaceuticals, Inc. Methods and compositions for targeted genetic modifications and methods of use

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DATABASE Protein 10 September 2015 (2015-09-10), "Hypothetical protein MNA02_1023 [Streptococcus thermophilus]", XP055721409, retrieved from ncbi Database accession no. AKH33472.1 *
DATABASE Protein 15 June 2017 (2017-06-15), "chorion class high-cysteine HCB protein 13 [Tyzzerella sp. An114]", XP055721410, retrieved from ncbi Database accession no. WP_088108616.1 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19899339; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19899339; Country of ref document: EP; Kind code of ref document: A1)