EP4243608A1 - Fusion protein for editing endogenous DNA of a eukaryotic cell

Publication number
EP4243608A1
Authority
EP
European Patent Office
Prior art keywords
amino acid
seq
exonuclease
protein
acid sequence
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP21814696.7A
Other languages
German (de)
French (fr)
Inventor
Sylvestre Marillonnet
Alain Tissier
Tom SCHREIBER
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leibniz-Institut fur Pflanzenbiochemie (ipb)
Leibniz Institut fuer Pflanzenbiochemie
Original Assignee
Leibniz-Institut fur Pflanzenbiochemie (ipb)
Leibniz Institut fuer Pflanzenbiochemie
Application filed by Leibniz-Institut fur Pflanzenbiochemie (ipb), Leibniz Institut fuer Pflanzenbiochemie
Publication of EP4243608A1

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 - Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 - Hydrolases (3)
    • C12N9/16 - Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 - Ribonucleases RNAses, DNAses
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/10 - Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 - Mutagenizing nucleic acids
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/63 - Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 - Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 - Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 - Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8213 - Targeted insertion of genes into the plant genome by homologous recombination
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/87 - Introduction of foreign genetic material using processes not otherwise provided for, e.g. co-transformation
    • C12N15/90 - Stable introduction of foreign DNA into chromosome
    • C12N15/902 - Stable introduction of foreign DNA into chromosome using homologous recombination
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K - PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K38/00 - Medicinal preparations containing peptides
    • C - CHEMISTRY; METALLURGY
    • C07 - ORGANIC CHEMISTRY
    • C07K - PEPTIDES
    • C07K2319/00 - Fusion polypeptide
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2310/00 - Structure or type of the nucleic acid
    • C12N2310/10 - Type of nucleic acid
    • C12N2310/20 - Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2770/00 - ssRNA viruses positive-sense
    • C12N2770/00011 - Details
    • C12N2770/38011 - Tombusviridae
    • C12N2770/38041 - Use of virus, viral particle or viral elements as a vector
    • C12N2770/38043 - Use of virus, viral particle or viral elements as a vector viral genome or elements thereof as genetic vector

Definitions

  • the present invention relates to a protein, such as a fusion protein, for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, and to a nucleic acid molecule comprising a polynucleotide encoding the protein.
  • the invention also relates to a DNA construct, plasmid or vector comprising the polynucleotide of the nucleic acid molecule.
  • the invention further relates to a prokaryotic or eukaryotic cell comprising the protein, or the nucleic acid molecule, or the DNA construct, plasmid or vector.
  • the invention further relates to a kit for editing endogenous DNA at a target site in a eukaryotic cell or in a eukaryotic organism. Further, the invention relates to a method for inserting a donor sequence of interest into endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site and to a method for modifying endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site. Also provided is a cell or a eukaryotic organism generated by the methods.
  • genetic engineering does not only require methods for gene knock-out or gene knock-down. Genetic engineering also requires methods for the targeted knock-in of nucleotide sequences into the genome or the introduction of specific mutations at targeted sites in a predictable way. Such modifications are not possible through the NHEJ pathway.
  • the repair of DSBs through homology directed repair (HDR) pathways is suited for knock-in of sequences into the genome and the replacement of genes because HDR relies on a template for DNA break repair.
  • such a DNA repair template, which carries flanking arms, is also referred to herein as “donor nucleic acid” or, if DNA, as “donor DNA”.
  • HDR allows targeted integration and deletion of sequences in the genome.
  • NHEJ is the predominant pathway for repair
  • the repair of DSBs through the HDR pathway is rare in eukaryotic cells and happens with a likelihood well below the practical limit for detection and isolation. This means that gene editing applications that rely on HDR only become feasible in eukaryotic cells when the frequency of DSB repair through the HDR pathway is significantly increased.
  • US 20170175140 describes methods for using a 5’-exonuclease to increase the frequency of homologous recombination in eukaryotic cells.
  • the reported improvement was rather low so that the need remains to increase the efficiency of HDR and, generally, the efficiency of gene editing, notably via the HDR mechanism.
  • methods and tools such as proteins, nucleic acid molecules, and/or kits
  • a protein for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the 5’-3’ exonuclease is a monomeric 5’-3’ exonuclease.
  • a protein for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the 5’-3’ exonuclease is a monomeric 5’-3’ exonuclease having a 5’-3’ exonuclease catalytic efficiency k_cat/K_m of at least 0.072 (µM s)^-1 or a turnover number of at least 0.50 s^-1 in the in-vitro exonuclease assay described in the description.
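
The activity criterion above can be read as a simple either/or test on measured kinetic constants. The following is a minimal sketch of that reading, not the assay itself: the class name, function name, and the example enzymes with their values are hypothetical and only illustrate the comparison against the two thresholds quoted in the text.

```python
from dataclasses import dataclass

# Thresholds as quoted in the text (units as given there).
MIN_CATALYTIC_EFFICIENCY = 0.072  # k_cat / K_m
MIN_TURNOVER_NUMBER = 0.50        # k_cat, in s^-1


@dataclass(frozen=True)
class ExonucleaseKinetics:
    """Kinetic constants of one exonuclease (values below are hypothetical)."""
    name: str
    is_monomeric: bool
    k_cat: float  # turnover number, s^-1
    k_m: float    # Michaelis constant, in the concentration unit used in the text

    @property
    def catalytic_efficiency(self) -> float:
        return self.k_cat / self.k_m


def meets_activity_criterion(exo: ExonucleaseKinetics) -> bool:
    """True if the enzyme is monomeric and passes either activity threshold."""
    if not exo.is_monomeric:
        return False
    return (
        exo.catalytic_efficiency >= MIN_CATALYTIC_EFFICIENCY
        or exo.k_cat >= MIN_TURNOVER_NUMBER
    )


# Hypothetical enzymes, for illustration only.
candidates = [
    ExonucleaseKinetics("hypothetical_A", True, k_cat=0.60, k_m=10.0),
    ExonucleaseKinetics("hypothetical_B", True, k_cat=0.10, k_m=1.0),
    ExonucleaseKinetics("hypothetical_C", True, k_cat=0.10, k_m=5.0),
    ExonucleaseKinetics("hypothetical_D", False, k_cat=1.00, k_m=1.0),
]
passing = [exo.name for exo in candidates if meets_activity_criterion(exo)]
```

The two thresholds are combined with "or" because the text states them as alternatives; the monomer requirement is checked first because it applies in both cases.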
  • the fusion protein according to item 4 optionally as further defined in any one of items 2, 3, and 7 to 10, wherein the 5’-3’ exonuclease is fused to the N-terminal end or to the C-terminal end of the site-specific endonuclease.
  • the fusion protein according to item 4 optionally as further defined in any one of items 2, 3, and 7 to 11, wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker.
  • said polypeptide linker consisting of from 5 to 300 amino acid residues, preferably from 10 to 200, more preferably from 20 to 120 amino acid residues.
  • a nucleic acid molecule comprising a polynucleotide encoding the protein according to any one of items 1 to 13, preferably encoding the protein according to any one of items 4 and 11 to 13.
  • Nucleic acid construct, plasmid or vector comprising the polynucleotide of the nucleic acid molecule according to item 14.
  • Kit comprising: a nucleic acid molecule comprising a polynucleotide encoding said first protein subunit according to item 5 or 6 and a nucleic acid molecule comprising a polynucleotide encoding said second protein subunit according to item 5 or 6.
  • a prokaryotic or eukaryotic cell comprising i) the protein according to any one of items 1 to 13, ii) the nucleic acid molecule of item 14, iii) the nucleic acid construct, plasmid or vector according to item 15, or iv) the kit according to item 16.
  • the donor nucleic acid comprises, in the following order, a first homology arm that is homologous to a first region flanking a target site in the genome of said cell on a first side of said target site, optionally a donor sequence of interest to be inserted into genomic DNA of said cell at said target site, and a second homology arm that is homologous to a second region flanking said target site on the second side of said target site.
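
The order of elements in the donor nucleic acid can be illustrated with plain strings. This is a sketch under simplifying assumptions: the genome is a single-stranded text, the target site is one index, both homology arms are exact copies of equal length, and the function names and toy sequences are hypothetical.

```python
def build_donor(genome: str, target_index: int, arm_length: int, insert: str = "") -> str:
    """Return first homology arm + optional donor sequence + second homology arm.

    The arms are copied from the regions flanking `target_index` on either side.
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    if target_index - arm_length < 0 or target_index + arm_length > len(genome):
        raise ValueError("homology arms would extend beyond the genome sequence")
    first_arm = genome[target_index - arm_length:target_index]
    second_arm = genome[target_index:target_index + arm_length]
    return first_arm + insert + second_arm


def expected_hdr_product(genome: str, target_index: int, insert: str) -> str:
    """Sequence expected after the donor sequence is inserted at the target site."""
    return genome[:target_index] + insert + genome[target_index:]


# Toy sequences, for illustration only.
toy_genome = "AAAACCCCGGGGTTTT"
toy_donor = build_donor(toy_genome, target_index=8, arm_length=4, insert="ATG")
toy_product = expected_hdr_product(toy_genome, target_index=8, insert="ATG")
```

With an empty `insert`, the donor consists of the two homology arms only, which matches the "optionally" in the text.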
  • a non-human organism preferably a plant, comprising a cell according to any one of items 17 to 22.
  • kits for editing endogenous DNA at a target site in a eukaryotic cell or in a eukaryotic organism comprising
  • a guide RNA being capable of binding to the site-specific endonuclease and of directing the protein to the target site on the endogenous DNA of said cell or organism; or a nucleic acid molecule encoding said guide RNA.
  • a method for modifying endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site comprising providing the cell or organism with:
  • a guide RNA capable of binding to the site-specific endonuclease and of directing the protein to said target site in the endogenous DNA of said cell or organism, or with a nucleic acid (guide nucleic acid) encoding said guide RNA.
  • the invention is based on the surprising finding that there are huge differences among 5’-3’ exonucleases (i.e. exonucleases that hydrolyze DNA in the 5’ to 3’ direction) in their ability to increase the frequency of HDR and thus the efficiency of gene editing via HDR.
  • 5’-3’ exonucleases that are monomeric and display high in vitro 5’-3’ exonuclease activity are particularly suited to significantly increase HDR.
  • enzymes that are monomeric but whose 5’-exonuclease activity is too low or enzymes that are multimeric are not suited for increasing the frequency of HDR.
  • the inventors have further found that, even among monomeric 5’-3’ exonucleases, there are huge differences in their ability to increase the frequency of HDR and thus the efficiency of gene editing via HDR. Moreover, the inventors have found that improved efficiency of gene editing via HDR may be achieved by combining endonucleases with particular 5’-3’ exonucleases that have high activity with the type of single or double strand break (e.g. blunt ends or staggered ends) produced by the endonuclease.
  • the invention provides improved methods, proteins, kits, and nucleic acid molecules for gene editing via HDR. Accordingly, the invention allows HDR to become reasonably competitive with the otherwise faster NHEJ pathway of DSB repair in cells, notably eukaryotic cells.
  • Fig. 1 Identification of putative single guide RNA (sgRNA) target sites in the genomic sequence of NbPGK (phosphoglycerate kinase of Nicotiana benthamiana) close to the stop codon in a PAM-in orientation (via CRISPOR and CRISPR-P v2.0), see Example 1.
  • LB stands for T-DNA left border
  • RB stands for T-DNA right border
  • GUS is the GUS protein-encoding ORF
  • 5’HA and 3’HA indicate the homology arms on the donor DNA and the binding regions of the 5’ and 3’ homology arms of the donor DNA on the genomic DNA.
  • Arrows indicate the position of the sgRNA targets.
  • Fig. 2 Design of SpCas9 exonuclease fusions and sgRNAs.
  • A: N-terminal Exo-Cas9 fusion.
  • B: C-terminal Cas9-Exo fusion. Details are explained in Example 2.
  • Exo indicates an exonuclease encoding fragment;
  • Linker stands for a linker between Exo and SpCas9i fragment;
  • tOCS is a transcription terminator.
  • N-SpCas9i-N stands for a Cas9 version having two NLS signals (nuclear localization signals).
  • BsaI stands for the recognition sites of the type IIS restriction endonuclease BsaI; horizontal boxes containing base quadruplets indicate the BsaI cleavage sites that form the GG overhangs used for assembling adjacent fragments by ligation using the Golden Gate (GG) cloning method described by Marillonnet et al. (described inter alia in WO2011154147 A1).
  • GG stands for Golden Gate.
  • Fig. 3 Schematic presentation of sgRNA construct design using the Golden Gate (GG) cloning method.
  • SlU6 stands for the U6 promoter from Solanum lycopersicum (SlU6, pAGT5824).
  • Fig. 4 Results of gene targeting by transient expression in Nicotiana benthamiana leaves using translational GUS-fusion to NbPGK.
  • E-4LF2-Cas9 stands for N-terminal fusions, wherein the exonuclease E is on the N-terminal side of Cas9 and linked via linker 4LF2.
  • Cas9-4LF2-E stands for C-terminal fusions, wherein the exonuclease E is on the C-terminal side of Cas9 and linked via linker 4LF2.
  • the bars are labeled with abbreviations of the exonuclease of the fusion protein.
  • UL12 refers to the UL12-1 exonuclease.
  • DU stands for the DUMAS exonuclease.
  • MD stands for the MD5 exonuclease.
  • dCas9 stands for deactivated Cas9.
  • Fig. 5 Analysis of exonuclease activity by in vitro processing of a blunt ended hairpin oligonucleotide. Comparison of T5 and T7 Exonuclease activity.
  • Fig. 6 Transgenic Nicotiana benthamiana tobacco mosaic virus (TMV)-reporter line using GFP encoded in gDNA. This reporter line is used to measure the frequency and efficiency of HDR in the Examples 7 to 22.
  • gDNA nbi775
  • gDNA designates the insertion cassette in the transgenic N. benthamiana line as given in SEQ ID NO: 99, encoding a TMV with truncated RdRP with GFP, under the control of Act2p as the transcription promoter.
  • GFP replaces the coat protein (CP) of TMV.
  • MP stands for the TMV movement protein.
  • Donor stands for the donor nucleic acid comprising the donor sequence encoding the RNA-dependent RNA polymerase (RdRP) of TMV and is given in SEQ ID NO: 74. Insertion of the Donor by HDR repairs the RdRP and allows transcription of a replicating TMV expressing GFP. See Example 7 for details. The sequence stretch of the TMV transgene construct at the bottom of Fig. 6 is given as SEQ ID NO: 73.
  • Fig. 7 Exonuclease fused Cas9 leads to increased HDR in planta.
  • Cas9-4LF2-X indicates that the exonucleases were fused to the C-terminal end of Cas9 using the 4LF2 linker.
  • Cas9 WT and deactivated Cas9 (dCas9) were used as control.
  • the donor DNA is as in Figure 6.
  • Exo1: Arabidopsis exonuclease I; LaExo: Lambda phage exonuclease; T5: Bacteriophage T5 exonuclease; T7: Bacteriophage T7 exonuclease; Exo3: Exonuclease III from E. coli; TREX1: Three-prime repair exonuclease 1 from Homo sapiens. For details see Example 8.
  • Fig. 9 Genotyping of HDR events by PCR (see Example 10).
  • Donor represents the donor nucleic acid.
  • Primer pair P1 consists of primers 1F and 1R, primer pair P2 of primers 2F and 2R, and primer pair P3 of primers 3F and 3R.
  • Nb WT stands for Nicotiana benthamiana wild-type DNA.
  • Cas9 stands for wild-type Cas9 and “dCas9” for deactivated Cas9.
  • Fig. 10 Exonuclease domain of Exo1 (Exo1ΔC) fused to Cas9 only slightly increased HDR efficiency (see Example 11).
  • X-4LF2-Cas9 indicates that the exonucleases were fused to the N-terminal end of Cas9 using the 4LF2 linker.
  • Cas9 stands for wild-type Cas9 and “dCas9” for deactivated Cas9.
  • Fig. 11 Comparison of UL12-homologues in HDR (see Example 12).
  • X-4LF2-Cas9 indicates that the exonucleases were fused to the N-terminal end of Cas9 using the 4LF2 linker.
  • Cas9 stands for wild-type Cas9 and “dCas9” for deactivated Cas9.
  • the amino acid sequence of UL12 is given in SEQ ID NO: 32.
  • UL12-2 SEQ ID NO: 33
  • BGLF5 SEQ ID NO: 34
  • Dumas SEQ ID NO: 35
  • MD5 SEQ ID NO: 36
  • PapE SEQ ID NO: 43
  • PiE SEQ ID NO: 44
  • SOX SEQ ID NO: 68
  • AB4P SEQ ID NO: 69.
  • Fig. 12 The UL12 homologues PapE and PiE fused to Cas9 showed an increased HDR efficiency over UL12 in the GFP spot count analysis (see Example 12).
  • Fig. 13 Comparison of T7-homologues in HDR (see Example 13).
  • Fig. 14 The T7 homologue ME15 shows HDR efficiency comparable to UL12 (see Example 13).
  • Fig. 15 Tree of UL12 homologues (see Example 12).
  • Fig. 17 Amino acid sequence identities of UL12- and T7-homologues together with T5 (see Examples 12 and 13).
  • Fig. 18 Comparison of exonuclease activity of monomeric exonucleases using blunt end DNA substrates (see Example 14). Hairpin oligo: SEQ ID NO: 37.
  • Fig. 19 Comparison of exonuclease activity of monomeric (T5 and T7) with trimeric (LaExo) exonucleases using blunt end DNA substrates SEQ ID NO: 37 (see Example 15).
  • Fig. 20 Activity of Cas9-fused LaExo cannot be increased by coexpression of nuclear localized LaExo (termed N-LaExo or LaExo-N) (see Example 16).
  • Cas9-2LF2-X indicates that the exonucleases were fused to the C-terminal end of Cas9 using the 2LF2 linker.
  • N-LaExo indicates that a nuclear localization signal (NLS) was fused to the N-terminal end of LaExo (Lambda Exonuclease).
  • LaExo-N indicates that a nuclear localization signal (NLS) was fused to the C-terminal end of LaExo.
  • “Cas9” stands for wild-type Cas9 and “dCas9” for deactivated Cas9.
  • Fig. 22 Cas12a-exonuclease fusion leads to increased HDR in planta (see Example 18).
  • X-4LF2-Cas12a indicates the exonucleases were fused to the N-terminal end of Cas12a using the 4LF2 linker.
  • “Cas12a” stands for wild-type Cas12a.
  • Fig. 23 Estimation of HDR-efficiency of Cas12a exonuclease fusion proteins by GFP spot count (see Example 19). Labelling as in Fig. 22.
  • dCas12a means deactivated Cas12a.
  • Fig. 24 Comparative analysis of the cleavage pattern of Cas9- and Cas12a-exonuclease fusion proteins (see Example 20).
  • X-Cas9 and X-Cas12a indicate that the exonucleases tested were fused to the N-terminal end of Cas9 and Cas12a, respectively.
  • Fig. 25A-D Amplicon sequencing using Cas9-exonuclease fusion proteins and sgRNAs in PAM-out orientation (see Example 21).
  • Fig. 26A-D Amplicon sequencing using Cas12a-exonuclease fusion proteins and crRNAs in PAM-out orientation (see Example 22).
  • Fig. 27A/B/C Sequence alignment of tested alkaline exonucleases homologous to UL12 (see Example 23). Residues of 5’-phosphate coordination and residues of the catalytic triad are indicated in bold and underlined, respectively. Conserved motifs of alkaline exonucleases are underlined according to Goldstein and Weller (1998) and Buisson et al. (2009). UL12-group-specific motifs and PapE-specific amino acid residues are indicated with rectangles and lines, respectively.
  • BGLF5 SEQ ID NO: 34
  • SOX SEQ ID NO: 68
  • MD5 SEQ ID NO: 36
  • DUMAS SEQ ID NO: 35
  • AB4P SEQ ID NO: 69
  • PiE SEQ ID NO: 44
  • PapE SEQ ID NO: 43
  • UL12-1 SEQ ID NO: 32
  • UL12-2 SEQ ID NO: 33.
  • the line labeled “consensus” is not an amino acid sequence, but indicates positions of high conservation in the sequences above.
  • Fig. 28A/B Amino acid sequence alignment of sequence portions homologous to PapE (see Example 24).
  • PapE-group specific motifs I, II, III and IV correlate with increased HDR efficiency.
  • PapE-group is a sub-group of UL12-group exonucleases.
  • PapE-group specific motifs I, II, III and IV are indicated.
  • General alkaline exonuclease functional motifs II, III and IV (according to Goldstein and Weller 1998 and Buisson et al., 2009) are indicated by bars in the center.
  • the amino acid sequence of UL12 is given in SEQ ID NO: 32 and of PapE in SEQ ID NO: 43.
  • amino acid sequences of the proteins/peptides except UL12 and PapE are given in the SEQ ID NOs: 75 to 94.
  • the line labeled “consensus” is not an amino acid sequence, but indicates positions of high conservation in the sequences above.
  • T7 exonuclease homologues. Residues of the catalytic triad are underlined and in bold. T7 exonuclease-group specific motifs I and II are indicated with lines. ME15-specific motifs I and II are also indicated with a bar in the fifth portion from the top.
  • RalTLI SEQ ID NO: 72
  • PaPHBO2 SEQ ID NO: 71
  • SpiPhage SEQ ID NO: 46
  • YerO3-12 SEQ ID NO: 70
  • T7 SEQ ID NO: 30
  • ME15 SEQ ID NO: 45.
  • the line labeled “consensus” is not an amino acid sequence, but indicates positions of high conservation in the sequences above.

DETAILED DESCRIPTION OF THE INVENTION

  • the protein of the invention comprises a site-specific endonuclease (also briefly referred to herein as “endonuclease”) and a 5’-3’ exonuclease (also briefly referred to herein as “exonuclease”).
  • the endonuclease is a protein having endonuclease activity and is capable of cleaving phosphodiester bonds within a polynucleotide chain at a specific site.
  • the exonuclease is a protein having 5’-3’ exonuclease activity and is capable of cleaving nucleotides from the end of a polynucleotide chain in 5’ to 3’ direction.
  • the endonuclease and the exonuclease are enzymes and these enzymatic activities must be present in the protein of the invention.
  • the protein of the invention thus acts both as a 5’-3’ exonuclease and as a site-specific endonuclease.
  • the endonuclease and the exonuclease may be combined to form the protein of the invention. They may be bound covalently or non-covalently.
  • An example of covalent bonding is a fusion protein comprising the endonuclease and the exonuclease as domains of the fusion protein.
  • the endonuclease and the exonuclease can be bound by other covalent chemical bonds such as disulfide bridges or by chemical linkers (e.g. using glutardialdehyde optionally followed by reduction e.g. using sodium borohydride).
  • fusion proteins are preferred.
  • the protein may be an oligomeric protein (protein complex) comprising a first subunit (preferably protein subunit) comprising said endonuclease and a second subunit (preferably protein subunit) comprising said exonuclease.
  • the first subunit may comprise: the site-specific endonuclease (e.g. as a domain of the first subunit) and a first interaction domain (such as a first protein interaction domain) or first interaction nucleic acid (such as a nucleic acid comprising an aptamer);
  • the second subunit may comprise: the 5’-3’ exonuclease (e.g.
  • the first interaction domain may be such peptide epitope and the second interaction domain may be a single chain antibody binding said peptide epitope.
  • coiled-coil protein-protein interaction domains can be used for the same purpose (Lebar et al., Nat Chem Biol, 16: 513-519).
  • the first and the second interaction domains may be coiled-coil protein-protein interaction domains.
  • protein-RNA interaction domains can be used for the same purpose.
  • a peptide that specifically recognizes this aptamer will bind the gRNA and be in physical proximity to the endonuclease (Ma et al., Nature Biotech, 34: 528-531).
  • This principle can be used to bring the 5’-3’ exonucleases in proximity to the endonuclease e.g., by fusing the exonuclease to the peptide recognizing the specific RNA aptamer and fusing the RNA aptamer to the gRNA.
  • in one example, the endonuclease of the protein of the invention is a CRISPR-Cas nuclease, the gRNA comprises an aptamer, and the exonuclease comprises, as interaction domain, a peptide that binds to the aptamer.
  • the aptamer-peptide complex can serve as a non-covalent linker between the endonuclease and the exonuclease.
  • the protein may be an oligomeric protein comprising a first subunit being or comprising the endonuclease, a second subunit comprising said 5’-3’ exonuclease and (as a second interaction domain) a peptide capable of binding to the aptamer, and a nucleic acid having a segment capable of binding to the endonuclease (referred to as “interaction nucleic acid” above, e.g. a gRNA) and an aptamer (segment) capable of binding to the 5’-3’ exonuclease (notably to said peptide capable of binding to the aptamer).
  • the endonuclease is preferably a CRISPR-Cas nuclease.
  • the first interaction domain is a single chain antibody and the second interaction domain is a peptide epitope that specifically binds to the single chain antibody;
  • the second interaction domain is a single chain antibody and the first interaction domain is a peptide epitope that specifically binds to the single chain antibody;
  • the first interaction nucleic acid is a gRNA comprising an aptamer and the second interaction domain is a peptide that specifically recognizes and binds the aptamer.
  • covalent binding is preferred and fusion proteins comprising the endonuclease and the exonuclease are more preferred.
  • the fusion protein of the invention is a fusion of a site-specific endonuclease with a 5’-3’ exonuclease.
  • the endonuclease and the exonuclease represent domains of the fusion protein. Where, in the following, reference is made to the endonuclease or the exonuclease in the context of the fusion protein, the endonuclease domain or the exonuclease domain, respectively, of the fusion protein is/are meant.
  • the exonuclease may be fused to the N-terminal end or the C-terminal end of the site-specific endonuclease.
  • the fusion may be direct, i.e. without a linker.
  • the two domains are fused via a linker polypeptide in order to avoid steric hindrance between and/or for the two domains.
  • the fusion protein is a 5’-3’ exonuclease and a site-specific endonuclease (and these functions are normally present in separate domains of the fusion protein).
  • the linker is a polypeptide of at least 10, preferably at least 20, and more preferably at least 30 amino acid residues.
  • the maximum number of amino acid residues of the linker is not particularly limited, but may be defined as 250 residues, preferably at most 200, and more preferably at most 150 amino acid residues.
  • the length of the polypeptide linker is between 40 and 90 amino acids, preferably between 50 and 80 amino acids and more preferably between 60 and 70 amino acids.
  • the polypeptide linker consists of 61 amino acids.
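
The nested length ranges given above for the polypeptide linker can be checked mechanically. The sketch below is illustrative only: it assumes the ranges are inclusive at both ends, and the labels and function name are hypothetical.

```python
from typing import Optional

# Nested linker length ranges from the text, narrowest first.
# Treating the bounds as inclusive is an assumption of this sketch.
LINKER_LENGTH_RANGES = [
    ("60 to 70 residues (more preferred)", 60, 70),
    ("50 to 80 residues (preferred)", 50, 80),
    ("40 to 90 residues", 40, 90),
]


def narrowest_linker_range(length: int) -> Optional[str]:
    """Return the label of the narrowest range containing `length`, or None."""
    for label, lower, upper in LINKER_LENGTH_RANGES:
        if lower <= length <= upper:
            return label
    return None


# The 61-residue linker mentioned in the text falls into the narrowest range.
label_for_61 = narrowest_linker_range(61)
```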
  • the site-specific endonuclease provides the protein of the invention, optionally in conjunction with further components, with the ability to detect a target site on the endogenous DNA of a eukaryotic cell or a eukaryotic organism, to guide the protein of the invention (e.g. the fusion protein) including the exonuclease to the target site and to cleave the endogenous DNA at the target site.
  • the term “target site” refers to a site on the endogenous DNA intended to be cleaved by the endonuclease.
  • the endonuclease of the protein of the invention (as well as the endonuclease domain of the fusion protein) has site-specific endonuclease function and can cleave the DNA at the target site.
  • double strand breaks (DSB) are induced to the endogenous DNA.
  • the DSBs may be blunt end DSBs or staggered DSBs with sticky overhangs.
  • the type of cleavage at the target site depends on the endonuclease used. Some endonucleases like Cas9 induce blunt end DSBs to the DNA whereas other endonucleases like Cas12a (formerly Cpf1) induce staggered DSBs with sticky overhangs.
  • the endonuclease may also be an endonuclease with nickase activity, a so-called nickase.
  • the nickase may be a mutant variant of a CRISPR nuclease, such as Cas9.
  • a nickase is used to induce a DSB to an endogenous DNA molecule, wherein the nickase induces a single strand nick both at the coding strand and the template strand in proximity.
  • Two gRNAs may be used to guide the nickase to the two sites in order to produce a DSB by two nickase reactions.
  • the 5’-3’ exonuclease provides the protein of the invention with the ability to process the DNA at the target site, after cleavage by the endonuclease, by way of its 5’-3’ exonuclease activity.
  • the fusion or other bonding to the endonuclease ensures that the exonuclease is in proximity of the DNA ends generated by the endonuclease.
  • the inventors expect that the 5’-3’ exonuclease has a higher affinity to double-stranded DNA ends than the DNA repair factors from the NHEJ pathway, which prevents the latter from binding to the DNA ends.
  • the exonuclease processes the DNA in 5’ to 3’ direction and creates free 3’-overhangs preferably at both ends of the cleaved DNA.
  • the inventors assume that processing contributes to DNA break repair through the HDR pathway instead of the NHEJ pathway, especially when suitable donor DNA is present (see further below).
  • the 3’-overhangs are believed to pair with the complementary strand of the homology arms of the donor nucleic acid (preferably donor DNA) to create a complex of hybrid DNA that comprises the cleaved endogenous DNA and the donor nucleic acid.
  • the formation of this complex is also believed to contribute to the increased frequency of DNA break repair through the homology directed repair (HDR) pathway instead of the non-homologous end joining (NHEJ) pathway.
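
The proposed processing step, 5’ to 3’ resection at both ends of a break leaving free 3’-overhangs, can be pictured with a small string model. This is a simplified sketch, not a model of any particular enzyme: it assumes a blunt double strand break, a fixed number of removed nucleotides per end, and a toy sequence; all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def resect_blunt_break(top: str, cut_index: int, resected: int):
    """Model 5'->3' resection at both ends of a blunt double strand break.

    `top` is the top strand written 5'->3'. The bottom strand is kept aligned
    underneath it (so it reads 3'->5'). Removed nucleotides are shown as '-'.
    Returns ((left_top, left_bottom), (right_top, right_bottom)).
    """
    if not 0 < cut_index < len(top):
        raise ValueError("cut_index must lie inside the sequence")
    if resected < 0:
        raise ValueError("resected must not be negative")

    bottom = top.translate(COMPLEMENT)
    left_top, right_top = top[:cut_index], top[cut_index:]
    left_bottom, right_bottom = bottom[:cut_index], bottom[cut_index:]

    # Left fragment: the 5' end at the break belongs to the bottom strand.
    n_left = min(resected, len(left_bottom))
    left_bottom = left_bottom[: len(left_bottom) - n_left] + "-" * n_left

    # Right fragment: the 5' end at the break belongs to the top strand.
    n_right = min(resected, len(right_top))
    right_top = "-" * n_right + right_top[n_right:]

    return (left_top, left_bottom), (right_top, right_bottom)


# Toy example: cut "AACCGGTT" in the middle and remove 2 nt from each 5' end.
left_fragment, right_fragment = resect_blunt_break("AACCGGTT", cut_index=4, resected=2)
```

In the result, the intact strand of each fragment extends past its partner at the break, which is the single-stranded 3’-overhang that is believed to pair with a homology arm of the donor nucleic acid.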
  • HDR homology directed repair
  • NHEJ non-homologous end joining
  • the site-specific endonuclease of the fusion protein may be any endonuclease that cleaves double-stranded DNA at a target site in a site-specific manner.
  • site-specific endonucleases that can be used in the invention are Zinc-finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN), and CRISPR-endonucleases, whereby the latter are preferred because of their ease of use and wide applicability.
  • ZFN Zinc-finger nucleases
  • TALEN transcription activator-like effector nucleases
  • CRISPR-endonucleases whereby the latter are preferred because of their ease of use and wide applicability.
  • CRISPR-endonucleases include Cas9 and Cas12a (formerly Cpf1) and modified versions (e.g. mutants) thereof that have endonuclease activity.
  • the structure and use of the CRISPR-endonuclease Cas9 are described inter alia in WO2014093712 A1 and WO2014093635 A1.
  • the structure and use of the CRISPR-nuclease Cpf1 are described inter alia in WO2016205711 A1 and WO2017141173 A1.
  • the site-specific endonuclease is Cas9 or a mutant thereof having endonuclease activity.
  • a CRISPR-endonuclease such as Cas9 requires a guide RNA (gRNA) to guide the endonuclease to a target site by complementarity of the gRNA to sequences at the target DNA.
  • the gRNA has complementarity to a target nucleic acid (generally target DNA) and has the ability to bind to the endonuclease that is used for cleaving the target DNA.
  • the nuclease may be Cas9 or Cpf1 or modified versions (e.g. mutants) thereof that have endonuclease activity.
  • the invention is not limited to the Cas9 or Cpf1 endonucleases, and other CRISPR endonucleases may be used as well.
  • the gRNA comprises a guide sequence linked to a direct repeat sequence.
  • the guide sequence provides the complementarity to a target DNA for guiding the endonuclease to the target site.
  • the direct repeat sequence generally provides portions that allow binding of the gRNA to a CRISPR nuclease as, for example, in a tracrRNA.
  • the gRNA may be a single guide RNA (sgRNA), i.e.
  • a gRNA may comprise a sequence stretch complementary to the target DNA and, if required, a trans-activating CRISPR RNA (tracrRNA).
  • the sequence stretch complementary to the target DNA may have a length of from 19 to 22 contiguous nucleotides, preferably from 20 to 21 nucleotides. The succession of these elements depends on the type of CRISPR-Cas-system used.
  • the gRNA is generally a sgRNA that comprises in 5’ to 3’-direction a sequence stretch complementary to a strand of the target DNA and a trans-activating CRISPR RNA (tracrRNA).
  • tracrRNA trans-activating CRISPR RNA
  • the CRISPR endonuclease (also briefly referred to as CRISPR nuclease), for example Cas9, having bound gRNA (such as a sgRNA) can scan in the eukaryotic cell the endogenous DNA to recognize, at the target site, a target sequence adjacent to a Proto-spacer Adjacent Motif (so-called PAM-sequence).
  • PAM-sequence Proto-spacer Adjacent Motif
  • the distal part of the gRNA, which is bound to the endonuclease, can hybridize with the unwound target DNA to identify the target site as determined by the gRNA.
  • the endonuclease may exert its function and cleave or nick the target DNA near the PAM sequence.
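The target-site recognition described above can be illustrated with a minimal sketch that scans one strand of a DNA sequence for stretches followed directly by a PAM. The `NGG` PAM (the motif commonly associated with Streptococcus pyogenes Cas9), the default stretch length of 20 nucleotides (within the 19 to 22 range given above) and the function name `find_target_sites` are assumptions chosen for illustration; they are not taken from this text.

```python
import re

def find_target_sites(dna, guide_len=20, pam_regex="[ACGT]GG"):
    """Return (start, target_stretch, pam) for each stretch followed by a PAM.

    Only the given strand is scanned. The PAM pattern and the guide length
    are illustrative assumptions, not values prescribed by the text.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{guide_len}}})({pam_regex}))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]
```

Calling `find_target_sites(sequence)` returns 0-based start positions; scanning the reverse complement in the same way would cover the opposite strand.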
  • the pattern of the DNA cleavage depends on the properties of the endonuclease.
  • a CRISPR nuclease usually introduces double strand breaks (DSBs).
  • the DSBs may have blunt ends (e.g. in the case of Cas9).
  • Cpf1 may be used as the CRISPR nuclease.
  • the target DNA may be nicked, i.e. only one of the strands of the target DNA is cleaved. Nicking may be achieved by using a CRISPR nuclease having one of the two nuclease domains of a natural CRISPR nuclease inactivated by mutation.
  • both strands of the target DNA are preferably cleaved to introduce DSBs in the target DNA. Even more preferably, both strands of the target DNA are cleaved to introduce blunt-ended DSBs in the target DNA.
  • both strands of the target DNA are cleaved to introduce sticky-ended DSBs in the target DNA.
  • CRISPR nucleases are divided into different types based on their mode of operation. They originate from different bacteria and/or archaea and differ in the size, domain structure, and the PAM-sequence recognized. Nevertheless, CRISPR/Cas nucleases depend on the basic principle of an RNA-guided nuclease activity.
  • Cpf1 is an example of a CRISPR nuclease that differs from Cas9 in that it recognizes a different PAM-sequence and does not require a tracrRNA sequence in the gRNA (EP 3 009 511; Zetsche et al., Cell 163(3) (2015) 759-771).
  • Cpf1, unlike Cas9, generates double strand breaks with sticky overhangs.
  • the site-specific endonuclease may alternatively have nickase activity and introduce single strand breaks (nicks) into the endogenous DNA of the eukaryotic or prokaryotic cell.
  • Single strand nicks at both the coding strand and the template strand are required to produce DSBs to the DNA using nickase enzymes. These two nicks may be carried out by the same nickase.
  • Two different gRNAs, one directed to the coding strand and the other one directed to the template strand, may be used to obtain a DSB.
  • the two gRNAs guide the nickase enzyme to introduce single strand nicks both at the coding strand and the template strand of the DNA in proximity, wherein one guide RNA is designed in PAM-in and the other in PAM-out orientation.
  • the exonuclease of the protein of the invention is generally a monomeric 5’-3’ exonuclease.
  • An exonuclease is a monomeric 5’-3’ exonuclease in the sense of the present invention if it consists of a single protein subunit and the exonuclease activity is present in this single protein subunit.
  • many 5’-3’ exonucleases known in the art are multimeric.
  • the inventors have surprisingly found that much better results and better gene editing efficiency, notably using the HDR mechanism, can be achieved if the exonuclease is a monomeric exonuclease.
  • An exonuclease comprises 5’-3’ exonuclease activity if it hydrolyzes blunt end double stranded DNA in 5’ to 3’-direction preferably from both ends of a DSB to create 3’-overhangs of the non-hydrolyzed strand.
  • the activity is not limited to hydrolysis of blunt end double stranded DNA.
  • the 5’-3’ exonuclease is a 5’-3’ exonuclease having a 5’-3’ exonuclease catalytic efficiency kcat/Km of at least 0.072 (pM s)⁻¹ or a turnover number of at least 0.50 s⁻¹ using the exonuclease assay described herein.
  • the 5’-3’ exonuclease is a 5’-3’ exonuclease that has a 5’-3’ exonuclease catalytic efficiency kcat/Km of at least 0.10, preferably of at least 0.20 (pM s)⁻¹, and/or a turnover number of at least 0.70 s⁻¹, preferably of at least 1.4 s⁻¹.
  • the catalytic efficiency and/or the turnover number may be determined using, as substrate, the hairpin oligonucleotide of SEQ ID NO:37 that is phosphorylated at its 5’ end and carries a fluorescent dye linked to the thymine base (T) closest to the 3’-end of the oligonucleotide.
  • the underlying assay is described in Example 4. The assay is carried out at 27°C and pH 7.9 (measured at 25°C) and increasing fluorescence due to decreasing fluorescence quenching is monitored, recorded and plotted. Initial velocities using multiple different substrate concentrations are determined from the initial linear part of the plot. Turnover number and Km values are determined from a Lineweaver-Burk plot of the initial velocities against substrate concentrations. Example 4 gives further details on the assay.
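The Lineweaver-Burk evaluation mentioned above can be sketched as follows. The double-reciprocal relation 1/v = (Km/Vmax)(1/[S]) + 1/Vmax is fitted by ordinary least squares, and the turnover number is taken as kcat = Vmax/[E]. The function name, the argument names and the use of an unweighted fit are assumptions for illustration; the actual data handling of Example 4 is not reproduced here.

```python
def lineweaver_burk(substrate_conc, initial_velocities, enzyme_conc):
    """Estimate Vmax, Km, kcat and kcat/Km from a double-reciprocal fit.

    All inputs must use consistent units; this is an illustrative sketch,
    not the evaluation procedure of the assay itself.
    """
    xs = [1.0 / s for s in substrate_conc]        # 1/[S]
    ys = [1.0 / v for v in initial_velocities]    # 1/v
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares line through the (1/[S], 1/v) points.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                  # Km / Vmax
    intercept = mean_y - slope * mean_x  # 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    kcat = vmax / enzyme_conc          # turnover number
    return {"Vmax": vmax, "Km": km, "kcat": kcat, "kcat_over_Km": kcat / km}
```

Because reciprocals amplify the error of low-velocity points, a direct non-linear fit of the Michaelis-Menten equation is a common alternative; the double-reciprocal form is used here only because it is the one named in the text.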
  • turnover number and/or catalytic efficiency of the exonuclease refer to and are assayed by using the free exonuclease enzyme, i.e. exonuclease not being part of the protein of the invention.
  • the turnover number and catalytic efficiency of the free exonuclease corresponds to that of the fusion protein or the protein complex comprising the exonuclease.
  • the 5’-3’ exonuclease activity of the 5’-3’ exonuclease of the invention is higher than the 5’-3’ exonuclease activity of the T5 exonuclease (SEQ ID NO: 31) in the in-vitro exonuclease assay described in Example 4.
  • the 5’-3’ exonuclease activity of the 5’-3’ exonuclease of the invention is at least 2-fold, preferably at least 3-fold, more preferably at least 4-fold that of the 5’-3’ exonuclease activity of the T5 exonuclease (SEQ ID NO: 31) in the in-vitro exonuclease assay described in Example 4.
  • the 5’-3’ exonuclease activity of the 5’-3’ exonuclease according to the invention is the same or higher than the 5’-3’ exonuclease activity of the bacteriophage T7 exonuclease (SEQ ID NO: 30) in the in-vitro exonuclease assay described in Example 4.
  • the inventors have surprisingly found that the efficiency of gene editing using the HDR mechanism depends on the exonuclease activity of the exonuclease. Otherwise, the exonuclease is not particularly limited, except being a monomeric 5’-3’ exonuclease.
  • Natural exonucleases may be used for fusion with the endonuclease. However, a natural exonuclease may be modified, e.g. by introducing mutations, additions, insertions and/or deletions, provided the exonuclease activity is not compromised.
  • exonuclease examples include the T7 exonuclease (SEQ ID NO: 30), the UL12-1 exonuclease (SEQ ID NO: 32), and the UL12-2 exonuclease (SEQ ID NO: 33). Further examples are the BGLF5 exonuclease (SEQ ID NO: 34), DUMAS exonuclease (SEQ ID NO: 35), and the MD5 exonuclease (SEQ ID NO: 36).
  • Preferred exonucleases are PapE (SEQ ID NO: 43), a deoxyribonuclease from Papiine alphaherpesvirus 2; PiE (SEQ ID NO: 44), a deoxyribonuclease from Pteropus lylei-associated alpha-herpesvirus.
  • Further examples of exonucleases are SOX (SEQ ID NO: 68) and AB4P (SEQ ID NO: 69).
  • Other particularly suited exonucleases are ME15 (SEQ ID NO: 45) and SpiPh (SEQ ID NO: 46).
  • exonucleases are 03-12 (SEQ ID NO: 70), PhBO2 (SEQ ID NO: 71) and RaTL1 (SEQ ID NO: 72). Variants of these exonucleases as defined herein are also suitable for practicing the invention.
  • the 5’-3’ exonucleases that may be used in the invention may be grouped into the following two groups, (I) the UL-12 homologues (some of which are depicted in Fig. 15) and (II) the T7 homologues (some of which are depicted in Fig. 16).
  • group (I) are: the UL12-1 exonuclease (SEQ ID NO: 32), the UL12-2 exonuclease (SEQ ID NO: 33), the BGLF5 exonuclease (SEQ ID NO: 34), the DUMAS exonuclease (SEQ ID NO: 35), the MD5 exonuclease (SEQ ID NO: 36), the PapE exonuclease (SEQ ID NO: 43), the PiE exonuclease (SEQ ID NO: 44), the SOX exonuclease (SEQ ID NO: 68), the AB4P exonuclease (SEQ ID NO: 69); and variants of these exonucleases as defined herein.
  • UL12-1 exonuclease SEQ ID NO: 32
  • UL12-2 exonuclease SEQ ID NO: 33
  • PapE exonuclease SEQ ID NO: 43
  • PiE exonuclease SEQ ID NO: 44
  • T7 exonuclease SEQ ID NO: 30
  • ME15 exonuclease SEQ ID NO: 45
  • SpiPh exonuclease SEQ ID NO: 46
  • 03-12 exonuclease SEQ ID NO: 70
  • PhBO2 SEQ ID NO: 71
  • RaTL1 exonuclease SEQ ID NO: 72
  • the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
  • an amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 32 or SEQ ID NO: 33.
  • the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
  • amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 34.
  • the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
  • amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 35.
  • the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of: (i) the amino acid sequence of SEQ ID NO: 36 (MD5 exonuclease); or
  • amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 36.
  • the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises:
  • an amino acid sequence of from 1 to 121 preferably from 1 to 90, more preferably from 1 to 60, even more preferably from 1 to 45 and most preferably from 1 to 30 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 43.
  • the amino acid sequence of the 5’-3’ exonuclease of this embodiment preferably comprises the PapE-group specific Motif I, and/or II, and/or III (cf. Fig. 27, underlining designates motifs identified in Fig. 27) of amino acid sequence PAASVH, RRL, and APASAPAAVRAA (SEQ ID NO: 50), respectively, at positions corresponding to that/those of PapE (cf. Fig. 27). More preferably, all these three motifs are present.
  • the amino acid sequence of the 5’-3’ exonuclease of this embodiment preferably comprises one or more amino acid sequence segments selected from the group consisting of SEQ ID NO: 47 (APAESVHACGVL), SEQ ID NO: 48 (APAASVHACGVL), SEQ ID NO: 49 (AKYAFDPADAGXXVVAAHRRL), SEQ ID NO: 50 (APASAPAAVRAA) and SEQ ID NO: 51 (LIITPVRXDAA), more preferably at positions corresponding to those in SEQ ID NO: 43.
  • SEQ ID NO: 47 APAESVHACGVL
  • SEQ ID NO: 48 APAASVHACGVL
  • SEQ ID NO: 49 AKYAFDPADAGXXVVAAHRRL
  • SEQ ID NO: 50 APASAPAAVRAA
  • SEQ ID NO: 51 LIITPVRXDAA
  • the 5’-3’ exonuclease according to the invention is a protein whose amino acid sequence is or comprises
  • an amino acid sequence of from 1 to 131 preferably from 1 to 98, more preferably from 1 to 65, even more preferably from 1 to 49 and most preferably from 1 to 32 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 44.
  • exonuclease notably PapE and PiE and variants thereof as defined above with reference to SEQ ID NOs: 43 and 44
  • Group (I) exonucleases generally share certain sequence motifs that are indicated in the alignment of Fig. 27.
  • the variants of the exonucleases given above preferably have these sequence motifs, such as the UL12-group specific Motif I of amino acid sequence K/RPLMXFF/YE.
  • K/R means either K or R
  • F/Y means either F or Y.
  • X is defined as above.
  • the Group (I) exonuclease is, alternatively or additionally to the above embodiments, a 5’-3’ exonuclease comprising UL12-group specific Motif I, Motif I, the bridge region, Motif la, and the
  • the 5’-3’ exonuclease may thus be an exonuclease whose amino acid sequence comprises the amino acid sequence segment of SEQ ID NO: 56 (PXPLMXFXEAATQXQXXXQLWXLLRRGLXTAXTLXWGXXGPXFXXWLXXXXXXXXXXXXXXXX AXXFGRXNEXXARXXLFRYCVGRAD), wherein X at position no. 35 of this sequence is preferably K or R. This segment is preferably present at the position corresponding to that in UL12-1.
  • the Group (I) exonuclease may comprise Motif I, cf. Fig. 27.
  • the 5’-3’ exonuclease may be a protein whose amino acid sequence comprises an amino acid sequence segment of (and including) amino acid residues no. 9 to 37 of SEQ ID NO: 56, wherein X at position no. 35 of SEQ ID NO: 56 is preferably K or R.
  • the Group (I) exonuclease is, alternatively or additionally to the above embodiments, a 5’-3’ exonuclease comprising UL12-group specific Motif II, cf. Fig. 27.
  • the 5’-3’ exonuclease may thus be an exonuclease whose amino acid sequence comprises the amino acid sequence segment RYCV or FRYCV, preferably followed contiguously by the segment GRAD to result in segment RYCVGRAD or FRYCVGRAD, respectively.
  • sequence segments are preferably present at a position corresponding to that in UL12-1.
  • the Group (I) exonuclease may comprise Motif II, cf. Fig. 27.
  • the 5’-3’ exonuclease may be a protein whose amino acid sequence comprises an amino acid sequence segment of SEQ ID NO: 95 (GVLXDXHTGMVGASLD), wherein the X at position no. 4 is preferably M, V, L, or I; H at position 7 may alternatively be R; and/or M at position 10 may alternatively be V or L.
  • the Group (I) exonuclease may comprise Motif III, cf. Fig. 27.
  • the 5’-3’ exonuclease may be a protein whose amino acid sequence comprises an amino acid sequence segment of SEQ ID NO: 96 (EVKCRAKYAFDPXD), wherein the V at position no. 2 may alternatively be I; the A at position 9 may alternatively be L or T; and/or the D at position 14 may alternatively be E.
  • the Group (I) exonuclease may comprise Motif VI, cf. Fig. 27.
  • the 5’-3’ exonuclease may be a protein whose amino acid sequence comprises an amino acid sequence segment of SEQ ID NO: 97 (FANPRHPNFKQILVQXYVLXXHFP), wherein the K at position no. 10 may alternatively be R; and/or the X at position 16 is preferably G, A, S, or T.
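The consensus segments of SEQ ID NOs: 95 to 97, with the alternative residues stated above, can be expressed as regular expressions to screen a candidate protein sequence. In this sketch, X is treated as any residue (an assumption; the preferred residues listed for some X positions are not enforced), and the names `MOTIFS`, `motif_to_regex` and `find_motifs` are hypothetical.

```python
import re

# Consensus segments quoted in the text, with the permitted alternatives
# keyed by 1-based position. X is treated as "any residue" (assumption).
MOTIFS = {
    "SEQ ID NO: 95": ("GVLXDXHTGMVGASLD", {7: "HR", 10: "MVL"}),
    "SEQ ID NO: 96": ("EVKCRAKYAFDPXD", {2: "VI", 9: "ALT", 14: "DE"}),
    "SEQ ID NO: 97": ("FANPRHPNFKQILVQXYVLXXHFP", {10: "KR"}),
}

def motif_to_regex(consensus, alternatives):
    """Build a compiled regex from a consensus string and allowed alternatives."""
    parts = []
    for pos, residue in enumerate(consensus, start=1):
        if pos in alternatives:
            parts.append(f"[{alternatives[pos]}]")
        elif residue == "X":
            parts.append("[A-Z]")
        else:
            parts.append(residue)
    return re.compile("".join(parts))

def find_motifs(protein):
    """Return the 1-based start of each segment in the protein, or None if absent."""
    protein = protein.upper()
    hits = {}
    for name, (consensus, alternatives) in MOTIFS.items():
        match = motif_to_regex(consensus, alternatives).search(protein)
        hits[name] = match.start() + 1 if match else None
    return hits
```

An exact pattern search of this kind only reports whether a segment occurs; whether it lies at the position corresponding to a reference exonuclease would still have to be checked against an alignment.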
  • the protein for editing endogenous DNA is a fusion protein comprising a CRISPR-nuclease as endonuclease and a UL12 exonuclease (UL12-1 or UL12-2 or their variants defined herein) as exonuclease.
  • the protein for editing endogenous DNA is preferably a protein, wherein the site-specific endonuclease is a CRISPR-nuclease as defined above, and wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker, the polypeptide linker having a length of 25 amino acids or more, preferably 30 amino acids or more, more preferably 40 amino acids or more, even more preferably 50 amino acids or more and most preferably 60 amino acids or more, and wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises (i) the amino acid sequence defined in SEQ ID NO: 32 (UL12-1) or SEQ ID NO: 33 (UL12-2), or
  • amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or
  • amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence similarity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or
  • This embodiment may be combined with other preferred embodiments described herein, such as comprising the amino acid sequence segment of SEQ ID NO: 56.
  • the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
  • an amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 30.
  • the 5’-3’ exonuclease according to the invention is a protein whose amino acid sequence is or comprises
  • amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 45, or (iii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence similarity to the amino acid sequence defined in SEQ ID NO: 45, or
  • amino acid sequence of the 5’-3’ exonuclease of this embodiment may comprise one or more amino acid sequence segments selected from the group consisting of SEQ ID NO: 52 (APTESETLWDCI) and SEQ ID NO: 53 (ILRFNDYNIDT).
  • the 5’-3’ exonuclease according to the invention is a protein whose amino acid sequence is or comprises
  • amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 46, or
  • the 3’-overhangs produced by the action of the exonuclease, preferably on both ends of the cleaved endogenous DNA, are homologous to the homology arms of the donor nucleic acid and can anneal to them. This annealing is the starting point for the DNA break repair through HDR.
  • amino acid substitutions, additions, deletions, and insertions may be combined, but the given number or number range refers to the sum of all substitutions, additions, insertions and deletions of amino acid residues compared to a reference sequence defined by a SEQ ID NO.
  • among amino acid substitutions, additions, insertions and deletions, amino acid substitutions, additions, and deletions are preferred.
  • insertions relates to insertions of amino acid residues within the amino acid sequence of the reference sequence, i.e. excluding additions at the C- or N-terminal end.
  • additions means additions of amino acid residues at the C- or N-terminal end of the amino acid sequence of a reference sequence.
  • a “deletion” may be a deletion of a terminal or an internal amino acid residue of a reference sequence.
  • where the protein or a domain thereof is defined by a number or number range of amino acid substitutions, additions, deletions, and/or insertions relative to a reference sequence, the protein may, as an alternative embodiment, have from 1 to several amino acid substitutions, additions, insertions or deletions relative to the indicated amino acid sequence or segment.
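The counting rule above (the number refers to the sum of all substitutions, additions, insertions and deletions, with additions being terminal and insertions internal) can be sketched for two sequences that have already been aligned. The alignment itself is assumed to be given, with `-` as the gap character; the function name `count_changes` is hypothetical.

```python
def count_changes(ref_aln, var_aln):
    """Classify differences between an aligned reference and an aligned variant.

    Gaps are written as "-". A gap in the reference outside its first and last
    residue counts as an addition (terminal), inside as an insertion (internal);
    a gap in the variant counts as a deletion; a mismatch counts as a substitution.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    # Columns holding reference residues delimit its N- and C-terminal ends.
    ref_cols = [i for i, r in enumerate(ref_aln) if r != "-"]
    first, last = ref_cols[0], ref_cols[-1]
    counts = {"substitutions": 0, "additions": 0, "insertions": 0, "deletions": 0}
    for i, (r, v) in enumerate(zip(ref_aln, var_aln)):
        if r == "-" and v == "-":
            continue  # column empty in both sequences
        if r == "-":
            counts["insertions" if first < i < last else "additions"] += 1
        elif v == "-":
            counts["deletions"] += 1
        elif r != v:
            counts["substitutions"] += 1
    # The number compared against a claimed range is the sum of all four classes.
    counts["total"] = sum(counts.values())
    return counts
```

The result depends on the alignment supplied, so the same pair of sequences can give different counts under different alignment settings.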
  • the donor nucleic acid (sometimes also referred to as “repair template”, “donor fragment”; or as “DNA repair template” or “donor DNA” if it is DNA) for use with the protein of the invention (i.e. the fusion protein or oligomeric protein (protein complex) of the invention) is a nucleic acid molecule that generally comprises a donor sequence flanked by a first and a second homology arm, one at the 5’ end and the other at the 3’ end of the donor nucleic acid.
  • the first homology arm is generally homologous to a first region flanking a target site in the genome of said cell on a first side of said target site.
  • the second homology arm is homologous to a second region flanking said target site on the second side of said target site.
  • when provided into the cell, the donor nucleic acid may be single stranded or double stranded DNA or RNA and may be linear or circular. However, if the donor nucleic acid is provided into the cell as RNA, it generally needs to be reverse-transcribed in the cell into a donor DNA for the HDR to work.
  • the donor nucleic acid may be transcribed from RNA into DNA by a reverse transcriptase that may be provided into the cell in addition to the donor RNA.
  • the donor nucleic acid is preferably DNA and is, in this embodiment, also referred to herein as donor DNA.
  • the donor nucleic acid may be part of a DNA construct, plasmid or vector.
  • the homology arms of the donor nucleic acid comprise a nucleotide sequence that is homologous to the endogenous DNA in proximity of the target site.
  • the homology arm at the 5’ end of the donor nucleic acid may be homologous to a nucleotide sequence upstream of the target site and the homology arm at the 3’ end may be homologous to a nucleotide sequence downstream of the target site. Due to this homology, the 3’ overhangs that are generated by the 5’-3’ exonuclease of the protein of the invention can invade the homology arms of the donor nucleic acid and anneal to the complementary strand of the homology arm.
  • hybrid DNA complex comprising the endogenous DNA and the donor nucleic acid.
  • This hybrid DNA complex is also referred to as “displacement loop” (D-loop) and represents the first step of DNA DSB repair through HDR.
  • D-loop displacement loop
  • the 3’ overhangs annealed to the homology arms can serve as primers for a DNA polymerase to synthesize a new DNA strand using the homology arms as a template.
  • This process allows inserting a copy of the donor sequence comprised in the donor nucleic acid into the endogenous DNA at the target site. It is possible to delete a specific nucleotide sequence in the endogenous DNA at the target site.
  • for this purpose, a donor sequence that is shorter than the nucleotide sequence it replaces in the endogenous DNA may be used, together with a suitable choice of the homology arms.
  • the donor sequence may even be absent.
  • the protein according to the invention is provided into a eukaryotic cell in combination with a donor nucleic acid for editing endogenous DNA using HDR.
  • the protein according to the invention is provided into the cell without a donor nucleic acid.
  • the protein according to the invention may generate a deletion of at least one nucleotide in the endogenous DNA in the direct vicinity of the double strand break that was generated at the target site.
  • this embodiment generates deletions of two or more sequential nucleotides in the direct vicinity of the double strand break that was generated at the target site.
  • Such an embodiment is particularly useful for deleting one or more nucleotides in a random manner in a non-coding region of the endogenous DNA.
  • the donor nucleic acid can be introduced into the cell in many different ways that are generally known to the skilled person. Depending on the delivery method, the donor nucleic acid may be introduced into the cell as single stranded or double stranded DNA or RNA. In the case of plants, the donor nucleic acid may be introduced into the cell through Agrobacterium-mediated transformation. To provide the donor nucleic acid into a plant cell or a cell of a plant, the cell or plant is contacted with a suspension of Agrobacterium cells that carry the donor nucleic acid within the T-DNA of a T-DNA binary plasmid.
  • Agrobacterium cells secrete the T-DNA as single-stranded DNA into plant cells, which is, as demonstrated by the Examples below, sufficient for HDR to work and eventually lead to gene replacement (gene targeting) according to the present invention.
  • the donor nucleic acid, when secreted into a plant cell as part of the single-stranded T-DNA, is converted into double stranded DNA before HDR takes place. Either way, the provision of the donor nucleic acid in either single-stranded or double stranded form into a plant cell using transformation with Agrobacterium triggers HDR and eventually leads to gene replacement or editing according to the present invention.
  • the donor nucleic acid may alternatively be provided into the cell in the form of RNA, for example through delivery by an RNA virus or after expression from a transgene.
  • if the donor nucleic acid is provided into the cell in the form of RNA, it should be reverse-transcribed into DNA using a reverse transcriptase that may be coexpressed within the cell.
  • Other transformation or transfection methods generally known in the art may be employed.
  • the donor nucleic acid may be linear single stranded DNA.
  • the donor nucleic acid is preferably double stranded DNA, more preferably linear double stranded DNA.
  • the donor nucleic acid comprises first and second homology arms as described above.
  • the donor nucleic acid may optionally comprise a donor sequence that is positioned in between the two homology arms.
  • the first homology arm may be located at the 5’ end of the donor nucleic acid and comprises a nucleotide sequence that shares preferably at least 95% sequence identity to a fragment on the endogenous DNA that lies upstream of the target site.
  • the second homology arm at the 3’ end of the donor nucleic acid comprises a nucleotide sequence that shares at least 95% sequence identity to a fragment of the endogenous DNA that lies downstream of the target site.
  • the homology arms of the donor nucleic acid show no mismatch to the endogenous DNA.
  • the homology arms have a perfect match to the endogenous DNA at least towards the 5’-end and the 3’-end of the donor nucleic acid molecule.
  • the nucleotide sequence of the homology arm is at least 20 bp, preferably at least 60 bp, more preferably at least 60 bp, and most preferably at least 120 bp long and the sequence stretches with highest identity should be oriented towards the 5’-end and the 3’-end of the donor nucleic acid molecule.
  • Each homology arm should have a minimum length of 20 bp and may be up to 1000 bp long, preferably up to 500 bp long and more preferably up to 250 bp long. Preferably, the length of the homology arms is longer than 50 bp. In a specific embodiment, each of the two homology arms is between 100 and 300 bp long and allows perfect pairing to the endogenous DNA without a mismatch.
  • the donor sequence (may also be referred to herein as donor nucleotide sequence) that is optionally present in the donor nucleic acid is a nucleotide sequence that may be inserted into the endogenous DNA at the target site. If no donor sequence is present in the donor nucleic acid, no nucleotide sequence will be integrated from the donor nucleic acid into the endogenous DNA. However, a sequence segment may be deleted from the endogenous DNA, for example a sequence segment that lies between the two homology arms after the donor nucleic acid has annealed to the endogenous DNA.
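The layout of a donor nucleic acid described above (first homology arm, optional donor sequence, second homology arm) can be sketched on plain sequence strings. The function name `design_donor`, the 0-based half-open coordinates and the default arm length of 100 bp are assumptions for illustration; the sketch uses perfectly matching arms and only models the expected outcome, not the repair process.

```python
def design_donor(genome, site_start, site_end, donor_seq="", arm_len=100):
    """Build a donor (left arm + donor sequence + right arm) and the expected edit.

    genome[site_start:site_end] is the segment to be replaced; an empty
    donor_seq models a pure deletion of that segment. Coordinates are
    0-based and half-open (an assumption of this sketch).
    """
    if arm_len < 20:
        raise ValueError("homology arms should be at least 20 bp long")
    if site_start - arm_len < 0 or site_end + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the homology arms")
    left_arm = genome[site_start - arm_len:site_start]   # upstream of the target site
    right_arm = genome[site_end:site_end + arm_len]      # downstream of the target site
    donor = left_arm + donor_seq + right_arm
    # Expected sequence if the segment between the arms is replaced by donor_seq.
    edited = genome[:site_start] + donor_seq + genome[site_end:]
    return donor, edited
```

With `site_start == site_end` and a non-empty `donor_seq`, the same function models a pure insertion at the target site.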
  • the donor sequence comprises at least one nucleotide, preferably at least 10 nucleotides, more preferably at least 30 nucleotides, even more preferably at least 100 nucleotides.
  • the maximum length of the donor sequence is not particularly limited.
  • the donor sequence may be up to 15,000 nucleotides long. In another embodiment, the donor sequence may also be 20,000 nucleotides (20 kb) long. However, the insertion of long donor sequences into the endogenous DNA at the target site may be less likely than for shorter donor sequences.
  • the donor sequence of the repair template is up to 10,000 nucleotides long, preferably up to 7,000 nucleotides long, and more preferably up to 3000 nucleotides long.
  • the inventors have found that the protein according to the invention allows the insertion of unexpectedly long donor nucleotide sequences comprised in the donor nucleic acid into the endogenous DNA through HDR.
  • the protein according to the invention allows increasing the frequency of gene replacement events, preferably by several orders of magnitude compared to the absence of the exonuclease, so that unexpectedly long nucleotide sequences can be inserted into the endogenous DNA.
  • the donor sequence may be or may contain one or more open reading frames (ORFs) or an entire gene to be inserted at the target site in the endogenous DNA.
  • the protein of the invention can be used for modifying (also “editing”) endogenous DNA of a eukaryotic cell at a target site of endogenous DNA.
  • editing endogenous DNA refers to modifications of the endogenous DNA at a target site.
  • the underlying mechanism of the modifications of the invention is believed to be homology directed repair (HDR).
  • HDR homology directed repair
  • the modifications in the endogenous DNA depend on the donor nucleic acid and are selected from the group consisting of insertions into endogenous DNA, deletions from endogenous DNA, and substitutions in endogenous DNA.
  • insertion means that at least one nucleotide from the donor nucleic acid is inserted into the endogenous DNA at the target site.
  • Substitution means that at least one nucleotide (preferably a segment of two or more nucleotides) of the endogenous DNA at the target site is replaced at the target site with at least one different nucleotide or with a segment of two or more nucleotides from the donor nucleic acid.
  • Deletion means that at least one nucleotide (preferably a segment of two or more nucleotides) is deleted from the endogenous DNA at the target site.
  • the insertion, substitution or deletion of nucleotides in the endogenous DNA is achieved through appropriate design of the donor nucleic acid.
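The way the donor design determines the type of edit can be illustrated with a toy string model. This is only a sketch under simplifying assumptions: the endogenous DNA is a plain string, the donor is modelled as a left homology arm, a payload and a right homology arm, and the function name `apply_donor` and all sequences are hypothetical. It does not model the biochemistry of HDR.

```python
def apply_donor(genome: str, left_arm: str, right_arm: str, payload: str):
    """Toy model: replace whatever lies between the two homology arms
    in `genome` with `payload` and report the kind of edit.

    Returns (edited_sequence, edit_kind). Hypothetical helper, not a
    simulation of the actual repair pathway.
    """
    left_start = genome.find(left_arm)
    if left_start == -1:
        raise ValueError("left homology arm not found in genome")
    left_end = left_start + len(left_arm)

    right_start = genome.find(right_arm, left_end)
    if right_start == -1:
        raise ValueError("right homology arm not found downstream of left arm")

    replaced = genome[left_end:right_start]  # endogenous segment between the arms
    if replaced == payload:
        kind = "none"
    elif not replaced:
        kind = "insertion"      # nothing removed, donor sequence added
    elif not payload:
        kind = "deletion"       # endogenous segment removed, nothing added
    else:
        kind = "substitution"   # endogenous segment exchanged for donor sequence

    edited = genome[:left_end] + payload + genome[right_start:]
    return edited, kind


if __name__ == "__main__":
    LEFT, RIGHT = "ACGTAC", "TTGCAA"  # made-up homology arms
    print(apply_donor(LEFT + RIGHT, LEFT, RIGHT, "GGG"))
    print(apply_donor(LEFT + "CCC" + RIGHT, LEFT, RIGHT, ""))
    print(apply_donor(LEFT + "CCC" + RIGHT, LEFT, RIGHT, "T"))
```

In this model, a donor whose arms are adjacent in the genome yields an insertion, a donor with an empty payload yields a deletion, and a donor whose arms flank an endogenous segment and carry a different payload yields a substitution.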
  • the endogenous DNA of a eukaryotic cell may be the genomic DNA of the cell, but any double stranded DNA molecule (such as mitochondrial, plastid or other) that is contained within the cell can be edited.
  • the endogenous DNA may be edited at a single target site or at two or more target sites simultaneously.
  • the editing of the endogenous DNA according to the present invention may also be referred to as “gene replacement” or “gene targeting”.
  • Gene replacement according to the present invention comprises the editing of endogenous DNA as defined above.
  • gene replacement also comprises the generation of targeted gene knock-outs through the targeted deletion of one or more nucleotide sequence stretches or single base pairs in the endogenous DNA of a eukaryotic cell. This needs to be distinguished from non-targeted gene knock-outs, where the nature of the mutation cannot be predicted in advance.
  • the protein of the invention allows increasing the frequency of double strand break repair through the homology directed repair (HDR) pathway.
  • the invention also allows increasing the number of gene replacement events in a eukaryotic cell, presumably because DNA break repair through HDR is a prerequisite for gene replacement.
  • the protein of the invention, when provided with a donor nucleic acid, allows achieving a higher number (i.e. an increased frequency) of gene replacement events than the exonuclease and the endonuclease acting separately, i.e. neither fused together nor forming a protein complex.
  • the protein of the invention increases the number of gene replacement events at least 1.5-fold, more preferably at least 3-fold and even more preferably at least 5-fold over the separate provision of the exonuclease and the endonuclease (i.e. without being fused together or without forming a protein complex when provided with a donor nucleic acid). Further, the protein of the invention allows increasing the number of gene replacement events at least 1.5-fold, more preferably at least 2-fold and even more preferably at least 3-fold over fusion proteins or protein complexes comprising the T5 exonuclease as an exonuclease. The protein of the invention also allows increasing the number of gene replacement events at least 1.5-fold, more preferably at least 2-fold and even more preferably at least 3-fold over fusion proteins or protein complexes comprising a multimeric exonuclease.
  • 5’-3’ exonucleases that are monomeric and show high in vitro 5’-3’ exonuclease activity are particularly suited to significantly increase the frequency of HDR and eventually the frequency of gene replacement events when they are contained in a fusion protein or a protein complex with a site-specific endonuclease and are provided with a donor nucleic acid.
  • the editing of endogenous DNA according to the invention takes place in eukaryotic cells. Preferably, it takes place in at least one cell of a eukaryotic organism. Preferably, however, the editing takes place in multiple cells of a eukaryotic organism, such as in multiple cells of one or more leaves of a plant. The editing generally takes place in two or more cells in parallel.
  • Eukaryotic cells generally comprise a functional HDR pathway naturally, so that no genetic engineering to provide a cell with the HDR pathway is generally necessary.
  • further components that modify and/or improve the frequency and/or efficiency of DNA break repair through homologous recombination (HDR) may be provided into the cell.
  • components like the proteins Rad51 and/or Rad52 may be provided into the cell to support the DNA break repair through HDR.
  • components may be provided into the cell that downregulate the NHEJ pathway to favor DNA break repair through HDR.
  • After successful editing in a eukaryotic cell, it is possible to obtain an organism from the edited eukaryotic cell. For example, after editing an embryonic animal cell, an animal containing the edited endogenous DNA in all cells may be obtained. After editing a germ cell of an animal, the edited germ cell may be used for fertilizing another germ cell or the edited germ cell may be fertilized by another germ cell for obtaining an embryonic cell containing the edited endogenous DNA. In a further alternative, somatic animal cells comprising the edited endogenous DNA may be produced and propagated. Such cells may, for example, express a protein or other factor that is not expressed in suitable form in the starting cell. If the edited cells are administered to an organism lacking the protein or factor, the organism may be provided with the missing protein or factor.
  • the invention provides a method of treating or preventing a genetic defect in a eukaryotic organism (e.g. a human being or an animal), comprising modifying endogenous DNA of a eukaryotic cell at a target site of the endogenous DNA (according to the invention), cultivating the modified cell and/or progeny cells of the modified cell, and administering the modified cell or cells to a eukaryotic organism in need thereof.
  • After successful editing in a eukaryotic plant cell, it is possible to regenerate plants from the cell, whereby the plants contain the edited endogenous DNA in all cells of the plant. This allows producing new plant lines (e.g. of crop plants) containing the edited endogenous DNA.
  • Methods of regenerating plants from cells or tissue are generally known in the art of plant biotechnology. For example, plants may be regenerated from callus tissue using suitable media, as described e.g. in text books on plant biotechnology such as Slater, Scott and Fowler, Plant Biotechnology, second edition, Oxford University Press, 2008.
  • the editing may be carried out in germline cells of floral tissue, e.g.
  • the eukaryotic cell that may be edited according to the invention may be a fungal (e.g. yeast) cell, a plant cell, or an animal cell, such as a human cell.
  • the eukaryotic cell is a plant cell.
  • the organism of the invention may be a plant or animal organism.
  • humans are excluded and/or methods of modifying the human germline are excluded from the invention.
  • the endogenous DNA of a single cell of a eukaryotic organism is edited.
  • the endogenous DNA of several cells of a eukaryotic organism is edited.
  • the endogenous DNA of a somatic cell of a eukaryotic organism is edited.
  • the endogenous DNA of germline cells is edited so that the edited endogenous DNA is inherited by the progeny.
  • the endogenous DNA of embryonic cells is edited so that the edited endogenous DNA is present in all cells of the organism that develops from the edited embryonic cells.
  • human cells may be edited according to the invention.
  • cells of livestock animals are preferred.
  • the plant or cells thereof wherein editing according to the invention may be carried out is not particularly limited.
  • both monocot and dicot plants can be edited according to the invention.
  • the plant species for practicing this invention include, but are not restricted to, representatives of Leguminosae, Solanaceae, Chenopodiaceae, Compositae, Cucurbitaceae, Brassicaceae, and Scrophulariaceae among dicotyledons, and Poaceae, Musaceae, and Zingiberaceae among monocots. Both crop and non-crop plants can be used, whereby crop plants are preferred.
  • Common crop plants that are preferably edited with the protein according to the present invention include alfalfa, barley, beans, canola, cowpeas, cotton, corn, clover, lotus, lentils, lupine, maize, millet, oats, peas, peanuts, poplar, rice, rye, sweet clover, sunflower, sweetpea, soybean, sorghum, triticale, yam beans, velvet beans, vetch, wheat, wisteria, potato, banana, coffee, cacao, sugar beet, and nut plants.
  • Performing a method of the invention with such plants or cells of such plants allows producing cells of these plants containing the edited endogenous DNA.
  • plants containing the edited endogenous DNA in all cells of the plants may be produced therefrom.
  • Nucleic acids, DNA constructs, plasmids and vectors of the invention are provided.
  • the present invention also provides a nucleic acid molecule comprising a polynucleotide encoding the protein of the invention.
  • This nucleic acid molecule is also referred to herein as first nucleic acid molecule.
  • the polynucleotide is understood to be the coding sequence of the protein of the invention, such as the fusion protein of the invention or one subunit of the protein of the invention. If the protein of the invention is an oligomeric protein, the nucleic acid molecule may comprise two coding sequences, a first coding sequence encoding the first subunit and a second coding sequence encoding the second subunit. Alternatively, the first and the second subunit of an oligomeric protein, e.g. dimeric protein, may be encoded on separate nucleic acid molecules.
  • the invention also provides a first DNA construct comprising the polynucleotide(s) encoding the protein of the invention.
  • the invention also provides a plasmid or vector comprising the DNA construct.
  • the nucleic acid molecule and the DNA construct may contain additional genetic elements, such as genetic elements for expressing the protein in a eukaryotic cell. Examples of such genetic elements are a promoter active in the cell or organism and operably linked to the polynucleotide, optional transcription enhancers, and/or transcription terminators. For Agrobacterium-mediated transformation, left and right T-DNA border sequences may also be such genetic elements.
  • the first nucleic acid molecule, plasmid or vector may further comprise further nucleic acid segments, such as plasmid or vector backbone.
  • the nucleic acid molecule, DNA construct, plasmid or vector may be single stranded or double stranded and may be circular or linear; preferably they are double-stranded and circular.
  • the first nucleic acid molecule may, in one embodiment, further comprise the donor nucleic acid.
  • the first nucleic acid molecule may, in another embodiment, further encode the gRNA for a CRISPR nuclease as the endonuclease of the invention.
  • the first nucleic acid molecule comprises the donor nucleic acid and encodes the gRNA for a CRISPR nuclease.
  • the donor nucleic acid may be cut out from the nucleic acid molecule, e.g. using the CRISPR nuclease and additional gRNAs that guide the CRISPR nuclease to suitable cleavage sites on the nucleic acid molecule for cutting out the donor nucleic acid.
  • the first nucleic acid molecule, DNA construct, plasmid or vector of the invention may be generated by cloning together the elements to be combined.
  • a convenient cloning method that was also used in the Examples is Golden Gate (GG) cloning that makes use of type IIS restriction enzymes for restriction and seamless ligation, cf. WO 2008/095927 and WO2011154147.
  • the invention further describes a second nucleic acid molecule.
  • This second nucleic acid molecule is or comprises the donor nucleic acid of the invention as described above.
  • the invention also provides a second DNA construct that comprises the donor nucleic acid and optionally further elements such as left and right T-DNA borders for Agrobacterium-mediated transformation.
  • the second nucleic acid molecule may comprise further nucleic acid segments such as a plasmid or vector backbone.
  • the donor nucleic acid may be comprised by the first nucleic acid molecule or by the second nucleic acid molecule. Where a second nucleic acid molecule is used for the donor nucleic acid, it may further encode the gRNA for a CRISPR nuclease as the endonuclease of the invention.
  • the invention describes a third nucleic acid molecule comprising or encoding a gRNA, such as a sgRNA. If the third nucleic acid molecule is RNA, it comprises the gRNA. If it is DNA, the third nucleic acid molecule comprises a polynucleotide encoding the gRNA. For transcription in eukaryotic cells, the third nucleic acid molecule, if DNA, may comprise a (third) DNA construct containing a promoter operably linked to the polynucleotide encoding the gRNA. For transfection in plant cells using Agrobacterium-mediated transfection, the third DNA construct may further contain left and right T-DNA borders. As indicated above, additional gRNAs may be present or encoded on the third nucleic acid molecule, e.g. for cutting out the donor nucleic acid from the same or another nucleic acid molecule.
  • Promoters for expression in eukaryotic cells are generally known.
  • promoters active in plant cells are used.
  • the term “promoter active in plant cells” means a DNA sequence that is capable of controlling (initiating) transcription in a plant cell. This includes any promoter of plant origin, but also any promoter of non-plant origin which is capable of directing transcription in a plant cell, i.e., certain promoters of viral or bacterial origin such as the cauliflower mosaic virus 35S promoter (CaMV35S promoter) (Harpster et al. (1988) Mol Gen Genet.
  • the subterranean clover virus promoter No 4 or No 7 (WO9606932), or T-DNA gene promoters but also cell cycle specific (Ferreira et al., (1994) Plant Cell 6:1763-1774), tissue-specific or organ-specific promoters including but not limited to seed-specific promoters (e.g., WO89/03887), egg-cell specific promoter (Steffen et al., (2007) Plant J. 51:281-292; Sprunck et al., (2012) Science 338:1093-1097), organ-primordia specific promoters (An et al.
  • Constitutive promoters, i.e. promoters that are not developmentally regulated, are preferably used. However, constitutive promoters may be tissue-specific or organ-specific. Preferred promoters are those used in the Examples described below.
  • the editing of the endogenous DNA according to the present invention requires that the protein, the donor nucleic acid, and, in the case of a CRISPR nuclease as the endonuclease, the gRNA or gRNAs are simultaneously present in the same cell.
  • These elements are also referred to herein as components of the invention. This means that the components of the invention should be present in the same cell at the same time and, thus, may be provided to the cells in parallel or consecutively.
  • the components may be provided to eukaryotic cells, cells of an organism, or the organism transiently or stably.
  • Transient means that incorporation of nucleic acid molecule(s), or parts thereof, encoding or comprising the components into the genome of the eukaryotic cell is very unlikely and generally does not occur (e.g. because no selection pressure for incorporating the nucleic acid molecules into the genome of the eukaryotic cell or organism is applied).
  • a DNA plasmid as the nucleic acid molecule of the invention may comprise or encode a component(s) of the invention (where required operably linked to a promoter so that the component(s) can be expressed inside the cell from the DNA plasmid).
  • Stably providing the components to eukaryotic cells, cells of an organism, or the organism means that nucleic acid molecule(s), or parts thereof, encoding or comprising the component(s) are incorporated into the genome of the eukaryotic cell or organism (e.g. by application of selection pressure and selection of cells or organism wherein the incorporation has taken place or using Agrobacterium-mediated transformation).
  • Agrobacterium-mediated transformation generally integrates the T-DNA comprising or encoding the component(s) into the genome of the cell.
  • the genome then comprises or encodes the components comprised or encoded in the T-DNA so that the components can be expressed by a promoter operably linked to them or cut out (e.g. in the case of the donor nucleic acid).
  • If somatic plant cells or cells of a plant are transformed by Agrobacterium, the components are generally not passed on to the daughter plants of the transformed plant. If germline plant cells or cells of a plant are transformed by Agrobacterium, the components can be passed on to the daughter cells or organisms, whereby the components or coding sequence encoding the components are stably integrated into the genome over subsequent generations.
  • a first nucleic acid molecule encoding the protein of interest may be provided stably to the eukaryotic cell or organism.
  • Cells containing and expressing the protein of the invention stably may then be provided with the second nucleic acid molecule comprising the donor nucleic acid and optionally the third nucleic acid molecule encoding one or more gRNA(s) transiently.
  • Providing components of the invention transiently to the eukaryotic cells, cells of an organism or the organism is advantageous, as it is generally desired to limit genetic modifications to the specific desired editing event.
  • the components may, for example, be injected into or be taken up by (for example upon electroporation or PEG mediated transformation) the eukaryotic cell as a mixed solution of the protein of the invention, the donor DNA (or a DNA comprising the donor DNA), and the gRNA.
  • the fusion protein and the gRNA are provided to the cells via genetic transformation or transient transfection of nucleic acid molecules encoding these components such that they are expressed in the transformed or transfected eukaryotic cells, and the donor DNA will also be transformed or transfected into the cells.
  • Methods for introducing the nucleic acid molecule(s) into animal cells are known to the skilled person, such as electroporation, microinjection, or using transfection agents (e.g. as described in WO2014056590 or WO2014053245). These methods are particularly suitable for transiently providing the components to the cells.
  • various methods for introducing the DNA molecule(s) into plant cells, cells of a plant or a plant are known, and examples are electroporation, PEG (polyethylene glycol) transformation, microinjection, particle bombardment, and the use of viral vectors. Again, these methods are particularly suitable for transiently providing the components to plant cells.
  • the preferred method of introducing the DNA molecules of the invention into plant cells or cells of a plant is Agrobacterium-mediated transformation.
  • Agrobacterium-mediated transformation is well-established in the field of plant biotechnology, e.g. from text books on plant biotechnology such as Slater, Scott and Fowler, Plant Biotechnology, second edition, Oxford University Press, 2008. It comprises contacting living plant tissue (e.g.
  • plant tissue may be sprayed with a suspension containing Agrobacterium cells and optionally an abrasive and a surfactant, e.g. as described in WO2012019660.
  • the first nucleic acid molecule of the invention may be a plasmid (such as binary vector) containing in its T-DNA the first DNA construct encoding the protein of the invention.
  • the second nucleic acid molecule of the invention may be a plasmid (such as binary vector) containing in its T-DNA the donor nucleic acid of the invention.
  • the third nucleic acid molecule of the invention may be a plasmid (such as binary vector) containing in its T-DNA a DNA construct encoding the one or more gRNAs of the invention.
  • a binary vector that contains in its T-DNA more than one of said nucleic acids in a single molecule, such as the first and the second nucleic acid.
  • such plasmid may additionally contain the third nucleic acid encoding the gRNA(s) in its T-DNA.
  • each type of nucleic acid molecules may be separately introduced as a binary vector in Agrobacterium and cultured.
  • the one, two or more Agrobacterium cultures, each containing one binary vector may be mixed and the mixture may be used for transforming plant cells or cells of a plant.
  • the Agrobacterium may belong to the species Agrobacterium tumefaciens or Agrobacterium rhizogenes that are commonly used for plant transformation and transfection and which are known to the skilled person from general knowledge.
  • the Agrobacterium strain to be used in the processes of the invention may comprise a nucleic acid molecule (Ti-plasmid or binary vector) that may be said first, second or third nucleic acid molecule, or a nucleic acid molecule comprising two or more of the nucleic acids of the invention.
  • the DNA construct(s) is/are typically present in T-DNA of the plasmid or binary vector for introduction of the nucleic acid construct into plant cells by the secretory system of Agrobacterium.
  • the nucleic acid construct(s) is/are flanked by a T-DNA border sequence for allowing transfection of said plant(s) and introduction into plant cells or cells of a plant.
  • said DNA construct(s) is/are present in T-DNA and flanked on both sides by T-DNA border sequences.
  • DNA construct means a recombinant construct comprising or encoding one or more components of the invention.
  • the DNA constructs may be present in the T-DNA of a Ti-plasmid or binary vector of the Agrobacterium strain.
  • Ti-plasmids or binary vectors may contain a selectable marker outside of said T-DNA for allowing cloning and genetic engineering in bacteria.
  • the T-DNA that is transferred into plant cells may not contain a selectable marker that would, if present, allow selection of plant or plant cells containing said T-DNA.
  • selectable marker genes that should, in this embodiment, not be present in T-DNA of the Ti-plasmid or binary vectors are an antibiotic resistance gene or a herbicide resistance gene.
  • the process of the invention preferably makes use of transient transfection.
  • the process of the invention does not comprise a step of selecting for plant cells or plants having incorporated the nucleic acid molecule(s) of the invention by using such antibiotic resistance gene or a herbicide resistance gene. Accordingly, in this embodiment, no antibiotic resistance gene or a herbicide resistance gene needs to be incorporated into the plant cells or plants. However, it is possible to use suitable markers for selecting or identifying the editing event and cells wherein the editing has occurred.
  • Agrobacterium-mediated gene transfer and vectors therefor are known to the skilled person, e.g. from the references cited herein or from textbooks on plant biotechnology such as Slater, Scott and Fowler, Plant Biotechnology, second edition, Oxford University Press, 2008.
  • Agrobacterium strains usable in the invention are those that are generally used in the art for transfecting or transforming plants.
  • binary vector systems and binary strains are used, i.e. the vir genes required for transfer of T-DNA into plant cells on the one hand and the T-DNA on the other hand are on separate plasmids.
  • the plasmid containing the vir genes is the so-called vir helper plasmid. The plasmid containing the T-DNA to be transfected is the so-called binary vector that may be a “DNA molecule” or “vector” of the invention.
  • the invention also provides an Agrobacterium cell containing the first nucleic acid molecule of the invention.
  • the invention provides an Agrobacterium cell comprising a plasmid comprising in the T-DNA the first DNA construct containing the polynucleotide encoding the fusion protein.
  • Co-transfection by Agrobacterium can be achieved by preparing two or more different Agrobacterium cultures, a first one that contains a first nucleic acid molecule (Ti plasmid or binary vector), construct or vector encoding the fusion protein and a second Agrobacterium culture containing the second nucleic acid molecule.
  • a third Agrobacterium culture containing the third nucleic acid molecule may also be prepared. Suspensions of these Agrobacterium cultures may be separately grown and mixed prior to transfection.
  • the suspension of agrobacteria may be produced as follows.
  • a nucleic acid molecule or vector may be transformed into the Agrobacterium strain and transformed Agrobacterium cultures may be grown preferably under application of selective pressure for maintenance of the nucleic acid molecule in question.
  • the Agrobacterium strain to be used in the processes of the invention is then inoculated into a culture medium and grown to a high cell concentration.
  • Agrobacteria are generally grown up to a cell concentration corresponding to an OD at 600 nm of at least 1, typically of about 1.5.
  • Such highly concentrated agrobacterial suspensions are then diluted to achieve the desired cell concentration.
  • water or Agrobacterium infiltration medium may be used.
  • the water may contain a buffer or salts.
  • the water may further contain the surfactant or wetting agent.
  • the concentrated agrobacterial suspensions may be diluted with water, and any additives such as the surfactant and the optional buffer substances are added after or during the dilution process.
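The dilution step is simple proportional arithmetic (C1·V1 = C2·V2). The sketch below assumes that OD600 scales linearly with cell concentration over the range used, which is only an approximation at high densities. The function name `dilution_volumes` and the target OD and final volume in the example are hypothetical. Only the stock OD of about 1.5 comes from the text.

```python
def dilution_volumes(stock_od600: float, target_od600: float, final_volume_ml: float):
    """Return (stock_ml, diluent_ml) needed to reach `target_od600`.

    Uses C1 * V1 = C2 * V2 and assumes OD600 is proportional to cell
    concentration (an approximation for dense cultures).
    """
    if stock_od600 <= 0 or target_od600 <= 0 or final_volume_ml <= 0:
        raise ValueError("all inputs must be positive")
    if target_od600 > stock_od600:
        raise ValueError("cannot dilute to a higher OD than the stock")
    stock_ml = final_volume_ml * target_od600 / stock_od600
    return stock_ml, final_volume_ml - stock_ml


if __name__ == "__main__":
    # Stock grown to OD600 of about 1.5 (from the text); the target OD
    # and the final volume are made-up example values.
    stock_ml, diluent_ml = dilution_volumes(1.5, 0.2, 30.0)
    print(f"stock: {stock_ml:.2f} mL, water or infiltration medium: {diluent_ml:.2f} mL")
```

Additives such as a surfactant or buffer would count towards the diluent volume if they are added during the dilution.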
  • Separately produced suspensions for co-transfection may then be mixed and the mixed suspension be used for transfecting plant cells or cells of a plant.
  • an Agrobacterium suspension may be added to the plant cell culture. If selected parts of a plant such as plant leaves are to be transfected, the generally known agroinfiltration may be used, whereby a pressure difference is used to insert the Agrobacterium suspension into plant tissue. For example, a needle-less syringe containing the Agrobacterium suspension may be used to press an Agrobacterium suspension into plant tissue. In another agroinfiltration method, an entire plant or major parts of a plant is dipped upside down into an Agrobacterium suspension, a vacuum is applied and then quickly released, whereby an Agrobacterium suspension is inserted into plant tissue.
  • the invention also comprises a kit of parts for editing endogenous DNA at a target site in a eukaryotic cell.
  • the kit comprises at least two parts: A donor nucleic acid, donor construct or second nucleic acid molecule as described in the section “Donor nucleic acid” and a protein of the invention as described in the section “The protein and the fusion protein of the invention”.
  • the kit comprises the protein of the invention in the form of expressed protein or as nucleic acid molecule comprising a polynucleotide encoding the protein (e.g. fusion protein).
  • the kit may further comprise a eukaryotic cell.
  • the kit allows editing of the endogenous DNA of the eukaryotic cell as described herein.
  • the kit may also comprise one or more gRNA(s) or one or more nucleic acid molecule encoding the one or more gRNA that bind to the site-specific endonuclease portion of the protein.
  • the invention also provides a kit comprising a nucleic acid molecule comprising a polynucleotide encoding said first protein subunit described above and a nucleic acid molecule comprising a polynucleotide encoding said second protein subunit described above.
  • the invention also provides a kit of parts for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, wherein the kit comprises a fusion protein comprising the UL12 exonuclease and a donor nucleic acid.
  • the kit comprises a donor nucleic acid according to the invention or a donor construct comprising said donor nucleic acid according to the invention and a fusion protein for editing endogenous DNA, comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker, and wherein the polypeptide linker has a length of 25 amino acids or more, preferably 30 amino acids or more, more preferably 40 amino acids or more, even more preferably 50 amino acids or more and most preferably 60 amino acids or more, and wherein the site-specific endonuclease is preferably as defined above herein, and wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises (i) the amino acid sequence defined in SEQ ID NO: 32 (UL12-1) or SEQ ID NO: 33 (UL12-2), or (
  • Method for modifying endogenous DNA of a eukaryotic cell also comprises methods for editing endogenous DNA at a target site of the endogenous DNA.
  • One method relates to the insertion of a nucleotide sequence of interest into endogenous DNA of a eukaryotic cell at a target site and/or a deletion of a nucleotide sequence segment at the target site.
  • Another method relates to the modification of the endogenous DNA of a eukaryotic cell at a target site. All these methods comprise the provision of a donor nucleic acid and a protein according to the invention into a eukaryotic cell or organism.
  • the endonuclease of the protein e.g.
  • the fusion protein then cleaves the DNA at the target site and the exonuclease of the protein processes the cleaved DNA to generate 3’ overhangs, which can invade the homology arms of the donor nucleic acid to induce DNA modification through homology directed repair (HDR).
  • the DNA modification may lead to the insertion of the insertion sequence carried within the donor nucleic acid into the endogenous DNA.
  • the method of the invention allows the insertion, substitution and deletion of one or more base pairs in the endogenous DNA of the eukaryotic host cell.
  • the protein of the invention and the donor nucleic acid as well as the principles and processes for the insertion of the insertion sequence of the repair template into the endogenous DNA are described in the sections above.
  • the invention also provides a method of inserting nucleotide fragments of at least 5,000 base pairs into the genome of a eukaryotic cell at a target site.
  • the method comprises the provision of the protein of the invention and a donor nucleic acid into a eukaryotic cell, wherein the donor nucleic acid comprises an insertion sequence that is at least 5,000 nucleotides long and wherein the eukaryotic cell should harbor a functional HDR pathway.
  • the donor nucleic acid comprises an insertion sequence of 7,000 nucleotides and the 7,000 nucleotides of the insertion sequence may be inserted into the endogenous DNA of a eukaryotic cell at a target site.
  • the insertion sequence is 15,000 nucleotides long and the insertion sequence of 15,000 nucleotides is inserted into the endogenous DNA at a target site.
  • the insertion sequence is 20,000 nucleotides long and the insertion sequence of 20,000 nucleotides is inserted into the endogenous DNA at a target site.
  • An upper limit for the length of the insert does not exist; however, the frequency of successful insertion events may decrease with the length of the insertion sequence.
  • the present invention also provides a method for modifying endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site that does not depend on HDR and a donor nucleic acid/donor sequence.
  • this method is particularly suited to induce random deletions of one or more sequential nucleotides in the endogenous DNA at the target site, for example in non-coding regions or cis-active elements.
  • this method comprises providing a cell or an organism with: a protein according to the invention, or a nucleic acid molecule according to the invention, or a nucleic acid construct, plasmid or vector according to the invention, or a kit according to the invention.
  • the method generates modifications to the endogenous DNA, wherein the modifications involve deleting a sequence of two or more nucleotides in the endogenous DNA upstream and/or downstream of the target site in a random manner.
  • Example 1 Donor construct design for translational GUS fusion to Nicotiana benthamiana Phosphoglycerate kinase NbPGK (Niben101Scf05688g08010.1)
  • sgRNA putative single guide RNA
  • HAs 5' and 3' homology arms
  • the sgRNA target sequences were mutated in the Donor fragment to prevent Cas9-mediated cleavage of the Donor (before and/or after integration).
  • Amplified fragments were domesticated for Golden Gate (GG) cloning by removing internal BsaI or BpiI sites using corresponding oligonucleotides.
  • GG Golden Gate
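The domestication step described above can be illustrated with a short scan for internal recognition sites. This is a minimal sketch, not the authors' workflow: it assumes the standard recognition sequences GGTCTC (BsaI) and GAAGAC (BpiI), checks both strands, and uses a hypothetical example fragment that is not taken from any SEQ ID NO of this document.

```python
# Recognition sites of the two type IIS enzymes named in the text.
SITES = {"BsaI": "GGTCTC", "BpiI": "GAAGAC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_internal_sites(seq: str) -> list:
    """Return (enzyme, 0-based position, strand) for every recognition site."""
    seq = seq.upper()
    hits = []
    for enzyme, site in SITES.items():
        # A site on the bottom strand appears as its reverse complement.
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(motif)
            while start != -1:
                hits.append((enzyme, start, strand))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical fragment for illustration only.
example = "ATGGGTCTCAAACCGTCTTCTTAA"
for enzyme, position, strand in find_internal_sites(example):
    print(f"{enzyme} site at position {position} on strand {strand}")
```

Each reported site would then need to be removed, for example by a silent mutation introduced with a mutagenic oligonucleotide, before the fragment is used in a Golden Gate reaction.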
  • the stop codon of NbPGK was deleted to allow a translational read-through, leading to a translational NbPGK-GUS fusion.
  • Donor fragments were assembled via GG-cloning and delivered as T-DNAs via Agrobacterium tumefaciens (strain GV3101 pMP90)-mediated transient expression in leaves of N. benthamiana (Nb).
  • the polynucleotide sequence of the Sequence of Donor Fragment is that of the following SEQ ID NO: 1 :
  • LB T-DNA left border
  • Exonucleases were codon optimized for Nicotiana benthamiana and synthesized as Level -1 GG-modules with matching overhangs (Fig. 2).
  • LF2 linker fragments were amplified via primer extension followed by assembling into Level -1 modules leading to 2xLF2 or 4xLF2 (see sequences below).
  • a 432 bp fragment from the XTEN linker (144 aa) was synthesized as Level -1 GG-module.
  • Shorter variants of the XTEN linker (XTEN16 and XTEN40) were generated via PCR using the 432 bp XTEN fragment as template and assembled as Level -1 GG-modules.
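The relation between linker length and fragment size stated above (144 aa, 432 bp) is simply three nucleotides per residue. The sketch below works this out for the three XTEN variants; it assumes that the names XTEN16 and XTEN40 denote 16 and 40 residues and that the Golden Gate overhangs are not counted, neither of which is stated explicitly in the text.

```python
def coding_length_bp(n_residues: int) -> int:
    """Length in base pairs of the coding sequence for n_residues amino acids."""
    return 3 * n_residues


# Residue counts for XTEN16 and XTEN40 are assumed from the variant names.
linkers = {"XTEN16": 16, "XTEN40": 40, "XTEN144": 144}
for name, residues in linkers.items():
    print(f"{name}: {residues} aa -> {coding_length_bp(residues)} bp")
```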
  • Modules for exonucleases and linkers were assembled together into Level 0 modules via GG-cloning (Fig. 2). Resulting exonuclease-linker Level 0 modules were assembled together with 2x35S(short), Omega translational enhancer, SpCas9i and tOCS terminator as transcriptional unit into Level 1. Level 1 expression vectors were transformed into Agrobacterium tumefaciens (GV3101 pMP90) and used for transient expression in N. benthamiana leaves.
  • GV3101 pMP90 Agrobacterium tumefaciens
  • LbCas12a (version of LbCas12a with introns) -> tOCS (OCS transcription terminator) -
  • sequences - VYXX I- designate the overhangs used in Golden Gate cloning to assemble the different modules.
  • the polynucleotide sequence of the Cas9 module for N-terminal fusions AGGT_NLS-SpCas9i-NLS*_GCTT is shown in the following SEQ ID NO: 3 (* means stop codon; NLS stands for nuclear localization signal; sequences in bold letters correspond to the coding sequence; sequences in normal letters correspond to the introns):
  • the polynucleotide sequence of the Cas12a module for N-terminal fusions AGGT_NLS-LbCas12ai-NLS*_GCTT is shown in the following SEQ ID NO: 57 (* means stop codon; NLS stands for nuclear localization signal; sequences in bold letters correspond to the coding sequence; sequences in normal letters correspond to the introns):
  • 2x35Sshort: double Cauliflower mosaic virus 35S promoter, short version; translational enhancer: Omega translation enhancer from tobacco mosaic virus; NLS-SpCas9(intron)-NLS: version of SpCas9 with introns and with N-terminal and C-terminal nuclear localization sequences (NLS); tOCS: OCS transcription terminator; : Golden Gate cloning overhangs.
  • Level 1 construct the sequences of SEQ ID NO:2, 3 and 17 are combined with the fragment referred to as “Exo-linker” comprising the exonuclease and the polypeptide linker in Fig. 2A.
  • the Exo-linker is assembled by GG (Golden Gate) cloning from the Level -1 fragments “Exo” and “Linker” shown in Fig. 2A.
  • Table 1 Level 1 assemblies of modules for N-terminal tagged SpCas9 (Exonuclease-Linker-SpCas9).
  • any given exonuclease-linker combination was assembled first from Level -1 into Level 0.
  • Corresponding exonuclease-linker fusions (Level 0) were combined with other given Level 0 modules (2x35Ss_Ω enhancer_exonuclease-linker_NLS-SpCas9i-NLS_tOCS) and assembled into GG-compatible MoClo Level 1 T-DNA vector for Agrobacterium-mediated transient expression in planta (including LB and RB sequences for T-DNA delivery).
  • the polynucleotide sequence of the Cas9 module for C-terminal fusions AATG_NLS-SpCas9i-NLS_TTCG is shown in the following SEQ ID NO: 16:
  • AATG_NLS-LbCas12ai-NLS_TTCG is shown in the following SEQ ID NO: 62:
  • the sequences of SEQ ID NO: 2, 16 and 17 are combined with the fragment referred to as “linker-Exo” comprising the polypeptide linker and the exonuclease in Fig. 2B.
  • the linker-Exo is assembled by GG (Golden Gate) cloning from the Level -1 fragments “Linker” and “Exo” shown in Fig. 2B.
  • SEQ ID NO: 18 polynucleotide encoding linker TTCG_2xLF2_AATG
  • SEQ ID NO: 20 polynucleotide encoding linker TTCG_XTEN144_AATG
  • SEQ ID NO: 25 polynucleotide encoding UL12 exonuclease (AATG_UL12*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
  • SEQ ID NO: 26 polynucleotide encoding UL12-2 exonuclease (AATG_UL12-2*_GCTT)
  • SEQ ID NO: 63 polynucleotide encoding Stenotrophomonas phage IME15 exonuclease (AATG_ME15*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
  • SEQ ID NO: 64 polynucleotide encoding Yersinia phage phiYeO3-12 exonuclease (AATG_O3-12*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
  • SEQ ID NO: 65 polynucleotide encoding Spirochaeta bacterium exonuclease (AATG_SpiPh*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
  • SEQ ID NO: 66 polynucleotide encoding Pasteurella phage vB_PmuP_PHB02 exonuclease (AATG_PhBO2*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
  • SEQ ID NO: 67 polynucleotide encoding Ralstonia phage philTL-1 exonuclease (AATG_RaTL1*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
  • Table 2 Level 1 assemblies of modules for C-terminal tagged SpCas9 (SpCas9-Linker-Exo)
  • any given linker-exonuclease combination was assembled first from Level -1 into Level 0.
  • Corresponding linker-exonuclease fusions (Level 0) were combined with other given Level 0 modules (2x35Ss_Ω enhancer; NLS-SpCas9i-NLS; Linker-exo; tOCS) and assembled into GG-compatible MoClo Level 1 T-DNA vector for Agrobacterium-mediated transient expression in planta (including LB and RB sequences for T-DNA delivery).
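The hierarchical assembly for C-terminal fusions can be sketched as a check that the 4-nt overhangs of adjacent modules match. The overhangs below are the ones given in the module names above (AATG_NLS-SpCas9i-NLS_TTCG, TTCG_2xLF2_AATG, AATG_UL12*_GCTT); the `Module` type and `assemble` function are hypothetical helpers introduced only for illustration, and the promoter and terminator modules are left out because their overhangs are not stated in this section.

```python
from typing import NamedTuple


class Module(NamedTuple):
    name: str
    left: str   # 4-nt overhang at the 5' end
    right: str  # 4-nt overhang at the 3' end


def assemble(modules: list) -> Module:
    """Join modules in the given order, requiring matching junction overhangs."""
    for upstream, downstream in zip(modules, modules[1:]):
        if upstream.right != downstream.left:
            raise ValueError(
                f"{upstream.name} ends with {upstream.right} but "
                f"{downstream.name} starts with {downstream.left}"
            )
    name = "-".join(m.name for m in modules)
    return Module(name, modules[0].left, modules[-1].right)


cas9 = Module("NLS-SpCas9i-NLS", "AATG", "TTCG")
linker = Module("2xLF2", "TTCG", "AATG")
exo = Module("UL12*", "AATG", "GCTT")

# Level -1 parts are joined into a Level 0 linker-exonuclease module first.
linker_exo = assemble([linker, exo])
# The Level 0 module then fits directly downstream of the Cas9 module.
fusion = assemble([cas9, linker_exo])
print(fusion)
```

One reading of the two-step design is that the linker and the Cas9 module share the AATG overhang on different sides, so joining linker and exonuclease first yields a single TTCG-to-GCTT module with an unambiguous position in the Level 1 reaction.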
  • Resulting sgRNA-Terminator PCR fragments were combined with the U6 promoter from Solanum lycopersicum (SlU6, pAGT5824) via GG-cloning into Level 1 T-DNA vectors.
  • Level 1 expression vectors were transformed into Agrobacterium tumefaciens (GV3101 pMP90) and used for transient expression in N. benthamiana leaves.
  • SEQ ID NO: 28 spacer for sgR-PGK1 (as present in SEQ ID NO: 27):
  • SEQ ID NO: 29 spacer for sgR-PGK2 (alternative spacer that may replace the spacer of SEQ
  • SpCas9-Exonuclease variants we expressed all components (SpCas9, sgRNAs, Donor fragment) by Agrobacterium mediated transient expression in N. benthamiana (Nb). Successful gene targeting should lead to measurable GUS activity.
  • Corresponding Agrobacterium strains were grown on plate for 2 days at 28°C (LB Agar with corresponding antibiotics). Grown bacteria were resuspended in AIM (Agrobacterium infiltration media) to an optical density (OD600) of 0.1 and 0.2 for Cas9-variants and sgRNAs/donor, respectively. Dilutions of Agrobacterium strains were mixed equally together (1:1:1:1; Cas9-construct : sgR-PGK1 : sgR-PGK2 : Donor). Agrobacterium suspensions were inoculated into leaves of Nb using a needleless syringe.
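Adjusting each suspension to its target OD600 before mixing is a C1·V1 = C2·V2 calculation. The following sketch assumes that OD600 scales linearly with cell density over the range used, which is only an approximation; the starting OD600 of 1.0 and the 10 mL final volume are hypothetical values, not figures from this document.

```python
def dilution_volumes(od_stock: float, od_target: float, final_ml: float) -> tuple:
    """Return (mL of suspension, mL of AIM) needed to reach od_target."""
    if od_stock <= 0 or od_target <= 0 or final_ml <= 0:
        raise ValueError("all values must be positive")
    if od_stock < od_target:
        raise ValueError("stock is more dilute than the target")
    suspension_ml = od_target * final_ml / od_stock
    return suspension_ml, final_ml - suspension_ml


# Target OD600 values from the text: 0.1 for Cas9 variants, 0.2 for sgRNAs/donor.
targets = {"Cas9-construct": 0.1, "sgR-PGK1": 0.2, "sgR-PGK2": 0.2, "Donor": 0.2}
for strain, od_target in targets.items():
    # Assumed measured OD600 of 1.0 and a 10 mL final volume per strain.
    suspension_ml, aim_ml = dilution_volumes(1.0, od_target, 10.0)
    print(f"{strain}: {suspension_ml:.2f} mL suspension + {aim_ml:.2f} mL AIM")
```

Mixing the four adjusted suspensions in equal volumes then gives the 1:1:1:1 ratio described above.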
  • Example 4 Analysis of exonuclease activity by in vitro processing of a blunt ended hairpin oligonucleotide
  • the T5 and T7 exonucleases were purchased from New England Biolabs (catalog numbers M0363 and M0263, respectively).
  • the oligonucleotide has the nucleotide sequence: It is phosphorylated at the 5'-end and carries the Oregon Green fluorescent dye at the position shown in Fig. 5.
  • T7 exonuclease degrades the blunt-ended oligonucleotide much faster than the T5 exonuclease.
  • Example 5 Design of C-terminal Cas12a (D156R) exonuclease fusions
  • the polynucleotide sequence of the Cas12a module for C-terminal fusions AATG_NLS-LbCas12a(D156R)-NLS (intron)_TTCG is shown in the following SEQ ID NO: 38:
  • the formatting has the following meaning:
  • Level 1 assemblies of modules for C-terminal tagged LbCas12a (LbCas12a(D156R)i-Linker-Exo). * stands for a stop codon (as above). Any given linker-exonuclease combination was assembled first from Level -1 into Level 0.
  • Corresponding linker-exonuclease fusions (Level 0) were combined with other Level 0 modules (2x35Ss_Ω enhancer_NLS-LbCas12a(D156R)i-NLS_Linker-exo_tOCS) and assembled into GG-compatible MoClo Level 1 T-DNA vector for Agrobacterium-mediated transient expression in planta (including LB and RB sequences for T-DNA delivery).
  • Example 6 Design of an N-terminal Cas12a (D156R) exonuclease fusion
  • Exonucleases and linkers may be as described in Example 2.
  • the general structure of the construct to be assembled is: T-DNA right border (RB), 2x35Sshort, translational enhancer, Exonuclease-linker, NLS-LbCas12a(D156R)-NLS (intron), tOCS, T-DNA left border (LB).
  • Table 4 Level 1 assemblies of modules for N-terminal tagged LbCas12a (Exo-Linker-LbCas12a).
  • any given Exonuclease-Linker combination was assembled first from Level -1 into Level 0.
  • Corresponding Exonuclease-linker fusions (Level 0) were combined with other given Level 0 modules (2x35Ss_Ω enhancer_exonuclease-linker_NLS-SpCas9i-NLS_tOCS) and assembled into GG-compatible MoClo Level 1 T-DNA vector for Agrobacterium-mediated transient expression in planta (including LB and RB sequences for T-DNA delivery).
  • Example 7 Transgenic Nicotiana benthamiana tobacco mosaic virus (TMV)-Reporter line using GFP
  • a GFP-based viral reporter system was generated.
  • a schematic presentation of the TMV-based HDR reporter system using GFP is given in Fig. 6.
  • the genome of the Nicotiana benthamiana plant is transgenic and comprises an insertion cassette harboring a TMV-based HDR reporter system using GFP.
  • This reporter system leads to GFP fluorescence when a donor sequence from the donor nucleic acid (donor) is successfully and correctly integrated into the plant genome.
  • the genome of the TMV present in the insertion cassette is modified: (i) the coat protein (CP)-encoding sequence is replaced by a sequence encoding GFP, and (ii) the replicase RdRP contains a 3.8 kb deletion replaced by a 76 bp attB site.
  • the MP is intact and facilitates viral spread from cell to cell (signal propagation) which allows macroscopic observation of GFP-expression derived from single-cell HDR events (one GFP-spot equals one single cell HDR event).
  • the RdRP is essential for viral replication and production of secondary transcripts (MP and GFP) from subgenomic promoters.
  • the exchange of the CP with GFP prevents packaging of the viral genome into viral particles (non-infectious virus) and allows high, RdRP-dependent expression rates of GFP instead.
  • Viral replication (GFP-expression) only takes place if the disrupted RdRP is repaired via precise insertion of the provided donor DNA by HDR.
  • Cas9 sgRNA targets specific for the attB site are used to induce DNA double strand breaks (DSBs) with PAM-in (combination of PAM-In1 and PAM-In2 sgRNAs) or PAM-out (combination of PAM-out1 and PAM-out2 sgRNAs) orientation.
  • HDR homology directed repair
  • donor donor nucleic acid
  • RdRP TMV replicase
  • Example 8 Exonuclease fused Cas9 leads to increased HDR in planta
  • T-DNAs carrying the transcriptional units for the expression of (i) Cas9 exonuclease fusion proteins, (ii) sgRNAs and (iii) the donor DNA were delivered transiently by Agrobacterium-mediated transient transformation into leaves of Nicotiana benthamiana plants using a needleless syringe as described in Example 3.
  • GFP fluorescence was monitored at 3 or 6 days post inoculation (dpi) under UV light produced by a hand-held lamp (model Blak-Ray B100A from UVP) and pictures were taken using a digital camera (Canon EOS 700D).
  • Exonuclease-fused Cas9 was compared to WT Cas9 (denoted “Cas9”) and deactivated Cas9 (denoted “dCas9”) as controls.
  • the exonucleases were fused to the C-terminal end of Cas9 using the 4LF2 linker (Cas9-4LF2-X). Fusion of UL12 and T7 to Cas9 led to the strongest increase in HDR in Fig. 7. A high rate of HDR events could be observed with UL12-fused Cas9 shortly after 3 dpi. After 6 dpi, GFP fluorescence was saturated in tissues where UL12- and T7-fused Cas9 was expressed. At 6 dpi, expression of Cas9 alone led to only a few HDR events. See Figure 7.
  • Genomic DNA of Nicotiana benthamiana leaves was isolated 4 dpi from tissues expressing dual PAM-In sgRNAs, donor nucleic acid (donor) and the corresponding Cas9 exonuclease fusions. Details are depicted in Fig. 9.
  • Primer pair P1 (consisting of the primers 1F and 1R) was used to monitor on-target NHEJ-events (small deletion).
  • Primer pairs P2 (2F and 2R) and P3 (3F and 3R) were used to amplify upstream and downstream HDR junctions, respectively.
  • Cas9 led to a deletion of the fragment flanked by the sgRNA target sites (smaller band). Intensities of this fragment negatively correlate with GFP spot number. Junctions of HDR-events could be confirmed for both sides for UL12-, T5- and T7-fused Cas9 variants.
  • Exonuclease-fused Cas9 was compared to WT Cas9 and deactivated Cas9 (dCas9) as controls.
  • dCas9 deactivated Cas9
  • X-4LF2-Cas9 indicates that the exonucleases were fused to the N-terminal end of Cas9 using the 4LF2 linker.
  • the Cas9-fused exonuclease domain of Exo1 (Exo1AC) only led to a slight increase in HDR-events compared to Cas9.
  • UL12- and T7-fused Cas9 outperformed Exo1AC-fused Cas9. See Fig. 10.
  • Amino acid (aa) sequence identity of the analyzed UL12-homologues is given in the upper panel of Fig. 11. Analysis of HDR-efficiency using the transgenic Nicotiana benthamiana TMV reporter line according to Examples 7 and 8. UL12 homologues with a sequence identity of 50% or higher such as PiE (SEQ ID NO: 44) or PapE (SEQ ID NO: 43) show HDR-rates comparable to or higher than UL12 (SEQ ID NO: 32) or UL12-2 (SEQ ID NO: 33) in Figure 11.
  • Fig. 12 provides the quantitative GFP spot count analysis of Fig. 11. GFP-spot count was done as described in Example 9.
  • Figure 12 reveals that PapE (SEQ ID NO: 43) and PiE (SEQ ID NO: 44) fused to Cas9 show an increased HDR-efficiency over UL12 and UL12-2.
  • UL12 homologs with a sequence identity of less than 49% show HDR-rates lower than for UL12 or UL12-2. See Figs. 11, 12, 15 and 17.
  • UL12 SEQ ID NO: 32
  • UL12-2 SEQ ID NO: 33
  • PapE SEQ ID NO: 43
  • PiE SEQ ID NO: 44
  • AB4P SEQ ID NO: 69
  • MD5 SEQ ID NO: 36
  • Dumas SEQ ID NO: 35
  • BGLF5 SEQ ID NO: 34
  • SOX SEQ ID NO: 68.
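The sequence identity values that group the UL12 homologues can be computed from a pairwise alignment. The sketch below is one common definition, assumed here rather than taken from this document: identical residues divided by the number of aligned columns, ignoring columns that are gaps in both sequences and counting a gap opposite a residue as a mismatch. The two aligned strings are hypothetical and the alignment itself is presumed to come from a separate alignment tool.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not taken from any SEQ ID NO.
identity = percent_identity("MKT-AL", "MKSWAL")
print(f"{identity:.1f}% identity; at or above 50%: {identity >= 50.0}")
```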
  • T7 homologues ME15 (SEQ ID NO: 45) (87% sequence identity) and SpiPh (SEQ ID NO: 46) (65% sequence identity) show increased HDR-rates compared to T7 (SEQ ID NO: 30).
  • the T7 homologue ME15 shows HDR-efficiency comparable to UL12 (SEQ ID NO: 32) and UL12-2 (SEQ ID NO: 33) and higher than T7 in Fig. 13 and the GFP spot count analysis of Fig. 14.
  • T7 homologue SpiPh (SEQ ID NO: 46) shows a HDR-efficiency higher than T7 in Figs. 13 and 14. Also see Figs. 16 and 17.
  • UL12/UL12-1 SEQ ID NO: 32
  • UL12-2 SEQ ID NO: 33
  • T7 SEQ ID NO: 30
  • ME15/IME15 SEQ ID NO: 45
  • O3-12/YerO3-12 SEQ ID NO: 70
  • SpiPh SEQ ID NO: 46
  • PHBO2 SEQ ID NO: 71
  • RaTL1/RalTL1 SEQ ID NO: 72.
  • T5 (NEB; # M0363), T7 (NEB; M0263S), LaExo (NEB; M0262S).
  • Fast increase of fluorescence in Fig. 19 indicates high exonuclease activity and preference for blunt end DNA substrates.
  • T7 and LaExo show a higher exonuclease activity on blunt end DNA substrates than T5 (see Fig. 19).
  • fusion of the trimeric Lambda exonuclease (LaExo) to Cas9 showed lower HDR-efficiency than Cas9-fused UL12.
  • Example 18 Cas12a-exonuclease fusion leads to increased HDR in planta. Analysis of HDR-efficiency of Cas12a-exonuclease fusion proteins using the transgenic Nicotiana benthamiana TMV reporter line and methods described before. Fluorescence was monitored for 5 dpi. WT Cas12a and no endonuclease serve as controls. Note that Cas12a generates “staggered” cuts with overhangs on double stranded DNA, as opposed to the “blunt” cuts generated by Cas9. See Fig. 22. Also see Example 19.
  • Cas9-mediated cleavage leads to a deletion of the fragment flanked by the dual sgRNA target sites (additional smaller band in Fig. 24; indication of NHEJ; 44 bp deletion between e.g. PAM-In1 and PAM-In2).
  • Cas12a did not lead to a distinct deletion, indicating a broader spectrum of deletions or generally reduced NHEJ-mediated deletions.
  • Cas9-exonuclease fusions led to broader deletion sizes leading to a visible smear instead of a distinct band (Fig. 24). Reduced precise deletion frequency correlates with increased HDR. Also see Examples 21 and 22.
  • Example 21 Amplicon sequencing using Cas9-exonuclease fusion proteins See Fig. 25 A-D.
  • the same genomic DNA from Example 20 (Cas9 and Cas9-exo fusion proteins without donor) was used as template for on-target amplification using primer pairs P1 from Example 20/Fig. 24 with 5' adapters.
  • Adapters serve as binding anchors for index primers for amplicon sequencing.
  • Cas9 mediated cleavage mainly led to precise deletions between the cleavage sites.
  • Cas9-mediated cleavage in PAM out orientation also led to a significant number of 2 nt small deletions (1 nt shift per cleavage site).
  • Cas9-exonuclease fusion generally led to larger deletions.
  • the maximal size of deletions is similar between the different fused exonucleases, whereas UL12- and T7-fused Cas9 showed higher frequency. See Fig. 25A-D.
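Deletion sizes of the kind compared above can be tallied from reads that have already been aligned to the reference amplicon. This is a minimal sketch, assuming each read is supplied as a string of the same length as the reference with `-` at deleted positions; the alignment step and the example reads are hypothetical and not part of the described experiments.

```python
import re
from collections import Counter


def deletion_lengths(aligned_read: str) -> list:
    """Lengths of each contiguous run of deleted positions in one read."""
    return [len(match.group()) for match in re.finditer(r"-+", aligned_read)]


def deletion_histogram(aligned_reads: list) -> Counter:
    """Count reads by their total number of deleted nucleotides."""
    histogram = Counter()
    for read in aligned_reads:
        histogram[sum(deletion_lengths(read))] += 1
    return histogram


# Hypothetical aligned reads for illustration only.
reads = ["ACGTACGT", "AC--ACGT", "AC--ACGT", "A------T"]
for size, count in sorted(deletion_histogram(reads).items()):
    print(f"{size} nt deleted: {count} read(s)")
```

A distribution dominated by one size would correspond to the precise deletion between the two cleavage sites, whereas a broad distribution would correspond to the larger, more variable deletions reported for the exonuclease fusions.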
  • Genomic DNA from Example 20 (Cas12a and Cas12a-exonuclease fusion proteins without donor) was used as template for on-target amplification using primer pairs P1 with 5' adapters. Adapters serve as binding anchors for index primers for amplicon sequencing.
  • Cas12a led to deletion of the fragment between the dual crRNA cleavage site and a few smaller on-target deletions. Fusion of Exonucleases to Cas12a led to increased indels with larger deletions compared to Cas12a WT. See Fig. 26A-D.
  • Example 23 Sequence alignment of tested alkaline exonucleases homologous to UL12.
  • the X in SEQ ID NO: 56 represents variable amino acid residues, whereas the non-X residues in SEQ ID NO: 56 are fixed to the residue indicated in the one-letter code.
  • UL12-group specific motifs I and II could be identified. These motifs are present in all of PiE, PapE, UL12-1 and UL12-2, but absent in all of BGLF5, SOX, MD5, DUMAS and AB4P.
  • the UL12-group specific motif II consisting of SEQ ID NO: 54 (FRYCVGRAD) differentiates all of PiE, PapE, UL12-1 and UL12-2 from all of BGLF5, SOX, MD5, DUMAS and AB4P.
  • SEQ ID NO: 56 also distinguishes all of PiE, PapE, UL12-1 and UL12-2 from all of BGLF5, SOX, MD5, DUMAS and AB4P. This means that SEQ ID NO: 56 is present in all exonucleases of Example 12/Figs. 11-12 that show a high HDR efficiency and absent in all exonucleases of Example 12/Figs. 11-12 that show a low HDR efficiency. Thus, SEQ ID NO: 56 may explain why PiE, PapE, UL12-1 and UL12-2 show a higher HDR efficiency compared to BGLF5, SOX, MD5, DUMAS and AB4P.
  • Example 24 Amino acid sequence segments specific for PapE
  • PapE (SEQ ID NO: 43) showed a particularly high HDR efficiency in Example 12/Figs. 11-12. Amino acid sequences homologous to PapE were aligned to PapE (SEQ ID NO: 43) to identify sequence motifs/sequence segments that are specific for PapE-group exonucleases. As depicted in Fig. 28A and 28B, motifs specific for PapE and PapE-group exonucleases could be identified (PapE-group specific motifs I to IV).
  • the PapE-group specific motifs I to IV differentiate PapE or PapE-group exonucleases from all of UL12-1, UL12-2, PiE, BGLF5, SOX, MD5, DUMAS and AB4P as depicted in Figs. 27A to 27C.
  • the PapE-group specific motifs I, II, III and IV from Fig. 28A and 28B are comprised in at least one of SEQ ID NO: 47 (APAESVHACGVL), SEQ ID NO: 48 (APAASVHACGVL), SEQ ID NO: 49 (AKYAFDPADAGXXVVAAHRRL), SEQ ID NO: 50 (APASAPAAVRAA) and SEQ ID NO: 51 (LIITPVRXDAA).
  • any one selected from the group consisting of SEQ ID NO: 47, 48, 49, 50 and 51 is an amino acid segment that is specific for PapE (SEQ ID NO: 43) or PapE-group specific exonucleases.
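Whether a candidate exonuclease carries one of these segments can be checked with a simple pattern search. The sketch below assumes that X in SEQ ID NO: 49 and SEQ ID NO: 51 stands for any single residue, by analogy with the statement made for SEQ ID NO: 56; the function names and the test protein are hypothetical.

```python
import re


def motif_to_pattern(motif: str) -> str:
    """Turn a one-letter motif into a regex, treating X as any residue."""
    return "".join("[A-Z]" if aa == "X" else re.escape(aa) for aa in motif.upper())


def find_motif(motif: str, protein: str) -> list:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?=({motif_to_pattern(motif)}))")
    return [match.start() for match in pattern.finditer(protein.upper())]


# Motif strings as listed in the text; the protein is a made-up example.
papE_motifs = {
    "SEQ ID NO: 47": "APAESVHACGVL",
    "SEQ ID NO: 51": "LIITPVRXDAA",
}
candidate = "MSSLIITPVRQDAAKK"
for seq_id, motif in papE_motifs.items():
    print(seq_id, find_motif(motif, candidate))
```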
  • Amino acid sequences of exonucleases homologous to T7 were aligned as depicted in Fig. 29.
  • Amino acid sequence motifs/sequence segments were identified that are present in SpiPhage, T7 and ME15, but not in RalTL1, PaPHBO2 or YerO3-12 (T7 exonuclease group motifs I and II).
  • T7 and ME15 show a higher HDR efficiency in Example 13/Figs. 13-14 than RalTL1, PaPHBO2 or YerO3-12.
  • the T7 exonuclease group specific motifs I and II correlate with the higher HDR efficiency of SpiPhage, T7 and ME15.
  • ME15 (SEQ ID NO: 45) showed the highest HDR efficiency of the tested T7 homologues (Fig. 14).
  • ME15 specific motifs I and II were identified that are present in the amino acid sequence of ME15 (SEQ ID NO: 45) but absent in the amino acid sequence of T7 (SEQ ID NO: 30).
  • the ME15 specific motifs I and II are comprised in at least one of SEQ ID NO: 52 (APTESETLWDCI) and SEQ ID NO: 53 (ILRFNDYNIDT).
  • SEQ ID NO: 52 ATESETLWDCI
  • SEQ ID NO: 53 ILRFNDYNIDT
  • the amino acid sequence of SEQ ID NO: 98 (WEEEIWHRCCDHAKAR) is a sequence motif specific for T7, SpiPhage and ME15.
  • HSV-1 UL12 The exonuclease activity of HSV-1 UL12 is required for in vivo function. Virology. 1998 May 10;244(2):442-57. doi: 10.1006/viro.1998.9129. PMID: 9601512.
  • SEQ ID NO: 1 polynucleotide sequence of the sequence of Donor Fragment (PGK-GUS): sequence given above
  • SEQ ID NO: 2 polynucleotide sequence of module 2x35Ss-Ω
  • SEQ ID NO: 3 polynucleotide sequence of module AGGT_NLS-SpCas9i-NLS*_GCTT for N-terminal fusions
  • SEQ ID NO: 4 polynucleotide encoding linker 2xLF2
  • SEQ ID NO: 5 polynucleotide encoding linker 4xLF2
  • SEQ ID NO: 6 polynucleotide encoding linker XTEN144
  • SEQ ID NO: 7 polynucleotide encoding linker XTEN40
  • SEQ ID NO: 8 polynucleotide encoding linker XTEN16
  • SEQ ID NO: 9 polynucleotide encoding T7 exonuclease for N-terminal exonuclease-Cas9 fusions
  • SEQ ID NO: 10 polynucleotide encoding T5 Exonuclease for N-terminal Exonuclease-Cas9 fusions
  • SEQ ID NO: 11 polynucleotide encoding UL12 exonuclease for N-terminal Exonuclease-Cas9 fusions
  • SEQ ID NO: 12 polynucleotide encoding UL12-2 Exonuclease for N-terminal Exonuclease-
  • SEQ ID NO: 13 polynucleotide encoding BGLF5 exonuclease for N-terminal Exonuclease-
  • SEQ ID NO: 14 polynucleotide encoding DUMAS exonuclease for N-terminal Exonuclease-
  • SEQ ID NO: 15 polynucleotide encoding MD5 exonuclease for N-terminal Exonuclease-Cas9 fusions
  • SEQ ID NO: 16 polynucleotide sequence of module AATG_NLS-SpCas9i-NLS_TTCG for C-terminal fusions
  • SEQ ID NO: 17 polynucleotide sequence of module tOCS
  • SEQ ID NO: 18 polynucleotide encoding linker 2xLF2
  • SEQ ID NO: 19 polynucleotide encoding linker 4xLF2
  • SEQ ID NO: 20 polynucleotide encoding linker XTEN144
  • SEQ ID NO: 21 polynucleotide encoding linker XTEN40
  • SEQ ID NO: 22 polynucleotide encoding linker: XTEN16
  • SEQ ID NO: 23 polynucleotide encoding T7 Exonuclease for C-terminal Exonuclease-Cas9 fusions
  • SEQ ID NO: 24 polynucleotide encoding T5 Exonuclease for C-terminal Exonuclease-Cas9 fusions
  • SEQ ID NO: 25 polynucleotide encoding UL12 Exonuclease for C-terminal Exonuclease-
  • SEQ ID NO: 26 polynucleotide encoding UL12-2 Exonuclease for C-terminal Exonuclease-
  • SEQ ID NO: 27 sgRNA transcriptional unit
  • SEQ ID NO: 28 spacer for sgR-PGK1
  • SEQ ID NO: 29 spacer for sgR-PGK2
  • SEQ ID NO: 38 polynucleotide encoding AATG_NLS-LbCas12a(D156R)-NLS(intron)_TTCG
  • SEQ ID NO: 39 polynucleotide sequence of the Cas12a module for N-terminal fusions AGGT_NLS-Cas12a(D156R)-NLS (intron)_GCTT
  • SEQ ID NO: 40 amino acid sequence of NLS-LbCas12a(D156R)-NLS for C-terminal fusions
  • SEQ ID NO: 41 amino acid sequence of protein NLS-LbCas12a(D156R)-NLS (intron) for N-terminal fusions
  • SEQ ID NO: 58 Polynucleotide encoding AATG_SOX_TTCG exonuclease for N-terminal Exonuclease- Cas9 fusions
  • SEQ ID NO: 65 polynucleotide encoding AATG_SpiPh*_GCTT Exonuclease for C-terminal Exonuclease-Cas9 fusions (* means stop codon)

Abstract

A protein for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site, comprising a site-specific endonuclease and a 5'-3' exonuclease, wherein the 5'-3' exonuclease is a monomeric 5'-3' exonuclease.

Description

FUSION PROTEIN FOR EDITING ENDOGENOUS DNA OF A EUKARYOTIC CELL
FIELD OF THE INVENTION
The present invention relates to a protein, such as a fusion protein, for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, and to a nucleic acid molecule comprising a polynucleotide encoding the protein. The invention also relates to a DNA construct, plasmid or vector comprising the polynucleotide of the nucleic acid molecule. The invention further relates to a prokaryotic or eukaryotic cell comprising the protein, or the nucleic acid molecule, or the DNA construct, plasmid or vector. The invention further relates to a kit for editing endogenous DNA at a target site in a eukaryotic cell or in a eukaryotic organism. Further, the invention relates to a method for inserting a donor sequence of interest into endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site and to a method for modifying endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site. Also provided is a cell or a eukaryotic organism generated by the methods.
BACKGROUND OF THE INVENTION
Recent advances in genome editing through the CRISPR/Cas technique simplified the targeted modification of genes and genomes and created new possibilities for genetic engineering. Site-specific endonucleases like Cas9 make it comparatively easy to induce targeted modifications in genomes because they can generate double strand breaks (DSBs) at designated sites. DSBs are potentially dangerous to the genetic integrity of the cell, and the cell attempts to repair them by one of several endogenous repair pathways. In eukaryotic organisms, the non-homologous end joining (NHEJ) pathway is the most frequent form of DSB repair. The NHEJ pathway allows simple and fast repair of DSBs at the expense of accuracy because the break ends are directly ligated together without the use of a DNA template for repair. As a result, repair through NHEJ occasionally introduces mutations into the genome, typically deletions or insertions of a few base pairs. Genetic engineering exploits this phenomenon to generate knock-outs or knock-downs of target genes because the deletion or insertion of a few base pairs can cause frame-shift mutations or generate premature stop codons that can give rise to a mutant phenotype.
However, genetic engineering does not only require methods for gene knock-out or gene knock-down. Genetic engineering also requires methods for the targeted knock-in of nucleotide sequences into the genome or the introduction of specific mutations at targeted sites in a predictable way. Such modifications are not possible through the NHEJ pathway. The repair of DSBs through homology directed repair (HDR) pathways, however, is suited for knock-in of sequences into the genome and the replacement of genes because HDR relies on a template for DNA break repair. Sequences that lie between flanking arms of such a DNA repair template (also referred to herein as “donor nucleic acid” or, if DNA, as “donor DNA”) are integrated into the genome, provided that the flanking arms are homologous to the sequences flanking the site of the DSB. Thus, HDR allows targeted integration and deletion of sequences in the genome. However, as NHEJ is the predominant pathway for repair, the repair of DSBs through the HDR pathway is rare in eukaryotic cells and happens with a likelihood well below the practical limit for detection and isolation. This means that gene editing applications that rely on HDR only become feasible in eukaryotic cells when the frequency of DSB repair through the HDR pathway is significantly increased. US 20170175140 describes methods for using a 5’-exonuclease to increase the frequency of homologous recombination in eukaryotic cells. However, the reported improvement was rather low so that the need remains to increase the efficiency of HDR and, generally, the efficiency of gene editing, notably via the HDR mechanism.
Therefore, it is an object of the present invention to provide means for increasing the efficiency of HDR in gene editing in cells and molecular tools therefor. It is a further object of the present invention to provide methods and tools (such as proteins, nucleic acid molecules, and/or kits) for increasing the frequency of gene replacement events in eukaryotic cells for the editing of a desired nucleotide sequence at a genomic target site in a predictable way. It is a further object to provide a protein or polynucleotide encoding it for gene editing in eukaryotic cells and/or organisms, notably in plants.
SUMMARY OF THE INVENTION
These objects are accomplished according to the claims. Inter alia, these objects are accomplished by:
1) A protein for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the 5’-3’ exonuclease is a monomeric 5’-3’ exonuclease.
2) A protein for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the 5’-3’ exonuclease is a monomeric 5’-3’ exonuclease having 5’-3’ exonuclease catalytic efficiency kcat/Km of at least 0.072 (pM s)-1 or a turnover number of at least 0.50 s-1 in the in-vitro exonuclease assay described in the description.
3) The protein according to item 1 or 2, wherein the 5’-3’ exonuclease has the same or a higher 5’-3’ exonuclease activity in terms of catalytic efficiency kcat/Km or in terms of the turnover number than the T7 exonuclease (SEQ ID NO: 30) in the in-vitro exonuclease assay described in the description; or has a 5’-3’ exonuclease activity that is at least twice that of the T5 exonuclease (SEQ ID NO: 31) in terms of catalytic efficiency kcat/Km or in terms of the turnover number in the in-vitro exonuclease assay described in the description.
4) The protein according to any one of items 1, 2, and 3, wherein said protein is a fusion protein comprising said site-specific endonuclease and said 5’-3’ exonuclease.
5) The protein according to any one of items 1, 2, and 3, wherein said protein is an oligomeric protein (protein complex) comprising a first protein subunit comprising said endonuclease and a second protein subunit comprising said exonuclease.
6) The protein according to item 5, wherein said first subunit comprises (as a domain of said first subunit) said site-specific endonuclease and a first interaction domain, and said second subunit comprises (as a domain of said second subunit) said 5’-3’ exonuclease and a second interaction domain, wherein said first and said second interaction domain bind to each other to form said oligomeric protein (protein complex).
7) The protein according to any one of items 1 to 6, wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises
(i) the amino acid sequence of SEQ ID NO: 30 (T7 exonuclease); or
(ii) an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO: 30; or
(iii) an amino acid sequence having at least 90% sequence similarity to the amino acid sequence of SEQ ID NO: 30; or
(iv) an amino acid sequence having from 1 to 50 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 30.
8) The fusion protein according to any one of items 1 to 6, wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises
(i) the amino acid sequence of SEQ ID NO: 32 (UL12-1 exonuclease) or SEQ ID NO: 33 (UL12-2 exonuclease); or
(ii) an amino acid sequence having at least 80% sequence identity to the amino acid sequence of SEQ ID NO: 32 or SEQ ID NO: 33; or
(iii) an amino acid sequence having at least 90% sequence similarity to the amino acid sequence of SEQ ID NO: 32 or SEQ ID NO: 33; or
(iv) an amino acid sequence having from 1 to 120 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 32 or SEQ ID NO: 33.
9) The protein according to any one of items 1 to 8, wherein the site-specific endonuclease is a CRISPR nuclease capable of inducing double strand breaks to DNA, such as Cas9, or is a CRISPR nuclease with nickase activity capable of inducing single strand nicks to double stranded DNA, such as a nickase variant of Cas9.
10) The protein according to any one of items 1 to 9, wherein the protein provides a higher frequency of gene targeting events than the separate application of the site-specific endonuclease and the 5’-3’ exonuclease of the protein without being fused together or without forming a protein complex when provided with a donor nucleic acid.
11) The fusion protein according to item 4, optionally as further defined in any one of items 2, 3, and 7 to 10, wherein the 5’-3’ exonuclease is fused to the N-terminal end or to the C-terminal end of the site-specific endonuclease.
12) The fusion protein according to item 4, optionally as further defined in any one of items 2, 3, and 7 to 11, wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker.
13) The fusion protein according to item 12, said polypeptide linker consisting of from 5 to 300 amino acid residues, preferably from 10 to 200, more preferably from 20 to 120 amino acid residues.
14) A nucleic acid molecule comprising a polynucleotide encoding the protein according to any one of items 1 to 13, preferably encoding the protein according to any one of items 4 and 11 to 13.
15) Nucleic acid construct, plasmid or vector comprising the polynucleotide of the nucleic acid molecule according to item 14.
16) Kit comprising: a nucleic acid molecule comprising a polynucleotide encoding said first protein subunit according to item 5 or 6 and a nucleic acid molecule comprising a polynucleotide encoding said second protein subunit according to item 5 or 6.
17) A prokaryotic or eukaryotic cell comprising i) the protein according to any one of items 1 to 13, ii) the nucleic acid molecule of item 14, iii) the nucleic acid construct, plasmid or vector according to item 15, or iv) the kit according to item 16.
18) The cell according to item 17, wherein the cell is a eukaryotic cell that further comprises a donor nucleic acid for homology directed DNA repair.
19) The cell according to item 18, wherein the donor nucleic acid comprises, in the following order, a first homology arm that is homologous to a first region flanking a target site in the genome of said cell on a first side of said target site, optionally a donor sequence of interest to be inserted into genomic DNA of said cell at said target site, and a second homology arm that is homologous to a second region flanking said target site on the second side of said target site.
20) The cell according to item 19, said donor sequence comprising, preferably consisting of, up to 15,000, preferably up to 10,000 nucleotides in length.
21) The cell according to any one of items 17 to 20, further comprising a guide RNA or a guide construct encoding said guide RNA, said guide RNA being capable of binding to the site-specific endonuclease and of directing the site-specific endonuclease to a target site in the genome of said cell.
22) The cell according to any one of items 17 to 21, wherein said cell is a plant cell.
23) A non-human organism, preferably a plant, comprising a cell according to any one of items 17 to 22.
24) A kit for editing endogenous DNA at a target site in a eukaryotic cell or in a eukaryotic organism, the kit comprising
(a) a donor nucleic acid as defined in item 19 or a donor construct comprising the donor nucleic acid, and
(b) a protein as defined in any one of items 1 to 13, or a nucleic acid as defined in item 14, or a DNA construct, plasmid or vector according to item 15, or a kit according to item 16.
25) The kit according to item 24, further comprising
(c) a cell of a eukaryotic organism or a eukaryotic organism.
26) The kit according to item 24 or 25, further comprising
(d) a guide RNA (gRNA) being capable of binding to the site-specific endonuclease and of directing the protein to the target site on the endogenous DNA of said cell or organism; or a nucleic acid molecule encoding said guide RNA.
27) A method for modifying endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site, the method comprising providing the cell or organism with:
(a) a donor nucleic acid as defined in item 19 and
(b) a protein as defined in any one of items 1 to 13, or a nucleic acid as defined in item 14, or a DNA construct, plasmid or vector according to item 15, or a kit according to item 16, wherein said modifying of endogenous DNA involves inserting a donor sequence of interest into endogenous DNA at the target site through homology directed repair, and/or involves deleting a sequence in the endogenous DNA at the target site through homology directed repair.
28) The method according to item 27, said method being a method of inserting a donor sequence of interest into the endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site, wherein a donor sequence of interest contained in said donor nucleic acid is inserted into the endogenous DNA.
29) The method according to item 28, further comprising providing the cell or organism with:
(c) a guide RNA capable of binding to the site-specific endonuclease and of directing the protein to said target site in the endogenous DNA of said cell or organism, or with a nucleic acid (guide nucleic acid) encoding said guide RNA.
30) The method according to item 28 or 29, wherein the donor sequence has a length of up to 15 kbp.
31) Use of the fusion protein according to any one of items 1 to 13 for gene editing in a eukaryotic cell.
32) A cell or a eukaryotic organism generated by the method according to any one of items 27 to 30.
Preferred embodiments that may be combined with the subject matters defined in the claims or the above items are described in the detailed description below.
The invention is based on the surprising finding that there are huge differences among 5’-3’ exonucleases (i.e. exonucleases that hydrolyze DNA in the 5’ to 3’ direction) in their ability to increase the frequency of HDR and thus the efficiency of gene editing via HDR. Notably, the inventors have found that 5’-3’ exonucleases that are monomeric and display high in vitro 5’-3’ exonuclease activity are particularly suited to significantly increase HDR. In contrast, enzymes that are monomeric but whose 5’-exonuclease activity is too low or enzymes that are multimeric are not suited for increasing the frequency of HDR. The inventors have further found that (even among monomeric 5’-3’ exonucleases) there are huge differences among 5’- 3’ exonucleases in their ability to increase the frequency of HDR and thus the efficiency of gene editing via HDR. Moreover, the inventors have found that improved efficiency of gene editing via HDR may be achieved by combining endonucleases with particular 5’-3’ exonucleases that have high activity with the type of single or double strand break (e.g. blunt ends or staggered ends) produced by the endonuclease. Thus, the invention provides improved methods, proteins, kits, and nucleic acid molecules for gene editing via HDR. Accordingly, the invention allows HDR to become reasonably competitive with the otherwise faster NHEJ pathway of DSB repair in cells, notably eukaryotic cells.
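By way of illustration only, the activity criteria of item 2 above can be written as a short Python sketch. The function name, the parameter names and the idea of passing kcat and Km as plain numbers are assumptions made for this example and are not part of the in-vitro assay of the description. The two threshold values are taken from item 2 and are assumed to be expressed in the same units as used there.

```python
# Thresholds taken from item 2; units are assumed to be those used in item 2.
MIN_CATALYTIC_EFFICIENCY = 0.072  # kcat/Km
MIN_TURNOVER_NUMBER = 0.50        # kcat, in s^-1


def meets_item2_criteria(is_monomeric: bool, kcat: float, km: float) -> bool:
    """Return True if a 5'-3' exonuclease is monomeric and reaches at least
    one of the two activity thresholds of item 2 (hypothetical helper)."""
    if kcat < 0 or km <= 0:
        raise ValueError("kcat must be non-negative and Km must be positive")
    if not is_monomeric:
        return False
    catalytic_efficiency = kcat / km
    # Item 2 links the two thresholds with "or": either one is sufficient.
    return (catalytic_efficiency >= MIN_CATALYTIC_EFFICIENCY
            or kcat >= MIN_TURNOVER_NUMBER)
```

In this sketch a multimeric exonuclease is rejected regardless of its measured activity, which mirrors the finding stated above that multimeric enzymes are not suited for increasing the frequency of HDR.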
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 : Identification of putative single guide RNA (sgRNA) target sites in the genomic sequence of NbPGK (phosphoglycerate kinase of Nicotiana benthamiana) close to the stop codon in a PAM-in orientation (via CRISPOR and CRISPR-P v2.0), see Example 1. LB stands for T-DNA left border; RB stands for T-DNA right border; GUS is the GUS protein-encoding ORF; 5’HA and 3’HA indicate the homology arms on the donor DNA and the binding regions of the 5’ and 3’ homology arms of the donor DNA on the genomic DNA. Arrows indicate the position of the sgRNA targets.
Fig. 2: Design of SpCas9 exonuclease fusions and sgRNAs. A: N-terminal Exo-Cas9-fusion. B: C-terminal Cas9-Exo-fusion. Details are explained in Example 2. Exo indicates an exonuclease encoding fragment; Linker stands for a linker between Exo and SpCas9i fragment; tOCS is a transcription terminator. N-SpCas9i-N stands for a Cas9 version having two NLS signals (nuclear localization signals). Vertical boxes labeled “BsaI” stand for the type IIS restriction endonuclease recognition sites of restriction enzyme BsaI; horizontal boxes containing base quadruplets indicate the BsaI cleavage sites that form the GG overhangs used for assembling adjacent fragments by ligation using the Golden Gate (GG) cloning method described by Marillonnet et al. (described inter alia in WO2011154147 A1).
Fig. 3 Schematic presentation of sgRNA construct design using the Golden Gate (GG) cloning method. SlU6 stands for the U6 promoter from Solanum lycopersicum (SlU6, pAGT5824).
Fig. 4 Results of gene targeting by transient expression in Nicotiana benthamiana leaves using translational GUS-fusion to NbPGK. A) Relative GUS activity 4 days after inoculation of gene targeting constructs. Fusion of different exonucleases to Cas9 led to different GUS activities. GUS activity is relative to Act2p-GUS control construct. Relative GUS activity is indicated per pg of total protein. E-4LF2-Cas9 stands for N-terminal fusions, wherein the exonuclease E is on the N-terminal side of Cas9 and linked via linker 4LF2. Cas9-4LF2-E stands for C-terminal fusions, wherein the exonuclease E is on the C-terminal side of Cas9 and linked via linker 4LF2. The bars are labeled with abbreviations of the exonuclease of the fusion protein. UL12 refers to the UL12-1 exonuclease. DU stands for the DUMAS exonuclease. MD stands for the MD5 exonuclease. dCas9 stands for deactivated Cas9. B) This is a repetition of experiment shown in A) but in addition with the fusion of Cas9 to Lambda-exonuclease (LaExo), a multimeric (trimeric) exonuclease, showing that this fusion does not lead to increased HDR. C) Using the same assay as in A) and B), separate expression of Cas9 and UL12 was tested and shown to lead to an increase in HDR but not as strong as the fusion. D) Stained leaf discs from quantitative GUS measurements of A). The enlarged leaf discs compare a leaf disc with high GUS activity inoculated with a Cas9::UL12 fusion (left) with a Cas9 control without exonuclease (right).
Fig. 5 Analysis of exonuclease activity by in vitro processing of a blunt ended hairpin oligonucleotide. Comparison of T5 and T7 Exonuclease activity. A) Hairpin oligonucleotide (SEQ ID NO: 37) based on Nikiforov (2014) but phosphorylated at the 5' end and carrying the Oregon Green fluorescent dye. Stacking of the dye with the last G:C base pairs quenches the fluorescence. Upon exonucleolytic degradation from the 5'-end, the stacking cannot happen and quenching is released, leading to higher fluorescence. B) Measurement of fluorescence over 30 minutes at 27 °C from two concentrations of the hairpin oligonucleotide (10 and 20 µM) of the four concentrations measured. C) Measurement of fluorescence as in B) with 5 µM of the hairpin oligonucleotide and 3 enzymes (T5: T5-exonuclease; T7: T7 exonuclease; LaExo: Lambda-exonuclease; H2O: water control).
Fig. 6 Transgenic Nicotiana benthamiana tobacco mosaic virus (TMV)-reporter line using GFP encoded in gDNA. This reporter line is used to measure the frequency and efficiency of HDR in the Examples 7 to 22. “gDNA (nbi775)” designates the insertion cassette in the transgenic N. benthamiana line as given in SEQ ID NO: 99, encoding a TMV with truncated RdRP with GFP, under the control of Act2p as the transcription promoter. GFP replaces the coat protein (CP) of TMV. MP stands for the TMV movement protein. “Donor” stands for the donor nucleic acid comprising the donor sequence encoding the RNA- dependent RNA polymerase (RdRP) of TMV and is given in SEQ ID NO: 74. Insertion of the Donor by HDR repairs the RdRP and allows transcription of a replicating TMV expressing GFP. See Example 7 for details. The sequence stretch of the TMV transgene construct at the bottom of Fig. 6 is given as SEQ ID NO: 73.
Fig. 7 Exonuclease fused Cas9 leads to increased HDR in planta. Cas9-4LF2-X indicates that the exonucleases were fused to the C-terminal end of Cas9 using the 4LF2 linker. Cas9 WT and deactivated Cas9 (dCas9) were used as control. The donor DNA is as in Figure 6. Exo1: Arabidopsis exonuclease I; LaExo: Lambda phage exonuclease; T5: Bacteriophage T5 exonuclease; T7: Bacteriophage T7 exonuclease; Exo3: Exonuclease III from E. coli; TREX1: Three-prime repair exonuclease 1 from Homo sapiens. For details see Example 8.
Fig. 8 Quantification of HDR-events by GFP-spot count. For details see Example 9.
Fig. 9 Genotyping of HDR events by PCR (see Example 10). “Donor” represents the donor nucleic acid. Primer pair P1 consists of primers 1F and 1R, primer pair P2 of primers 2F and 2R, and primer pair P3 of primers 3F and 3R. “Nb WT” stands for Nicotiana benthamiana wild-type DNA. “Cas9” stands for wild-type Cas9 and “dCas9” for deactivated Cas9.
Fig. 10 Exonuclease domain of Exo1 (Exo1ΔC) fused to Cas9 only slightly increased HDR efficiency (see Example 11). X-4LF2-Cas9 indicates that the exonucleases were fused to the N-terminal end of Cas9 using the 4LF2 linker. “Cas9” stands for wild-type Cas9 and “dCas9” for deactivated Cas9.
Fig. 11 Comparison of UL12-homologues in HDR (see Example 12). X-4LF2-Cas9 indicates that the exonucleases were fused to the N-terminal end of Cas9 using the 4LF2 linker. “Cas9” stands for wild-type Cas9 and “dCas9” for deactivated Cas9. The amino acid sequence of UL12 is given in SEQ ID NO: 32. The amino acid sequences of other exonucleases are given in: UL12-2 = SEQ ID NO: 33, BGLF5 = SEQ ID NO: 34, Dumas = SEQ ID NO: 35, MD5 = SEQ ID NO: 36, PapE = SEQ ID NO: 43, PiE = SEQ ID NO: 44, SOX = SEQ ID NO: 68, AB4P = SEQ ID NO: 69.
Fig. 12: The UL12 homologues PapE and PiE fused to Cas9 showed an increased HDR-efficiency over UL12 in the GFP spot count analysis (see Example 12). Amino acid sequences are given in the following SEQ ID NOs: UL12 = SEQ ID NO: 32, UL12-2 = SEQ ID NO: 33, PapE = SEQ ID NO: 43, PiE = SEQ ID NO: 44, AB4P = SEQ ID NO: 69, MD5 = SEQ ID NO: 36, Dumas = SEQ ID NO: 35, BGLF5 = SEQ ID NO: 34, SOX = SEQ ID NO: 68.
Fig. 13 Comparison of T7-homologues in HDR (see Example 13). Amino acid sequences are given in the following SEQ ID NOs: UL12/UL12-1 = SEQ ID NO: 32, UL12-2 = SEQ ID NO: 33, T7 = SEQ ID NO: 30, ME15/IME15 = SEQ ID NO: 45, O3-12/YerO3-12 = SEQ ID NO: 70, SpiPh = SEQ ID NO: 46, PHBO2 = SEQ ID NO: 71, RaTL1/RalTL1 = SEQ ID NO: 72.
Fig. 14 The T7 homologue ME15 shows HDR-efficiency comparable to UL12 (see Example 13). Amino acid sequences are given in the following SEQ ID NOs: UL12/UL12-1 = SEQ ID NO: 32, UL12-2 = SEQ ID NO: 33, T7 = SEQ ID NO: 30, ME15/IME15 = SEQ ID NO: 45, O3-12/YerO3-12 = SEQ ID NO: 70, SpiPh = SEQ ID NO: 46, PHBO2 = SEQ ID NO: 71, RaTL1/RalTL1 = SEQ ID NO: 72.
Fig. 15: Tree of UL12 homologues (see Example 12).
Fig. 16 Tree of T7 homologues (see Example 13).
Fig. 17 Amino acid sequence identities of UL12- and T7-homologues together with T5 (see Examples 12 and 13).
Fig. 18 Comparison of exonuclease activity of monomeric exonucleases using blunt end DNA substrates (see Example 14). Hairpin oligo: SEQ ID NO: 37.
Fig. 19 Comparison of exonuclease activity of monomeric (T5 and T7) with trimeric (LaExo) exonucleases using blunt end DNA substrates SEQ ID NO: 37 (see Example 15).
Fig. 20 Activity of Cas9-fused LaExo cannot be increased by coexpression of nuclear localized LaExo (termed N-LaExo or LaExo-N) (see Example 16). Cas9-2LF2-X indicates that the exonucleases were fused to the C-terminal end of Cas9 using the 2LF2 linker. N-LaExo indicates that a nuclear localization signal (NLS) was fused to the N-terminal end of LaExo (Lambda Exonuclease). LaExo-N indicates that a nuclear localization signal (NLS) was fused to the C-terminal end of LaExo. “Cas9” stands for wild-type Cas9 and “dCas9” for deactivated Cas9.
Fig. 21 Determination of minimal homology arm-length (see Example 17).
Fig. 22 Cas12a-exonuclease fusion leads to increased HDR in planta (see Example 18). X-4LF2-Cas12a indicates the exonucleases were fused to the N-terminal end of Cas12a using the 4LF2 linker. “Cas12a” stands for wild-type Cas12a.
Fig. 23 Estimation of HDR-efficiency of Cas12a exonuclease fusion proteins by GFP spot count (see Example 19). Labelling as in Fig. 22. dCas12a means deactivated Cas12a.
Fig. 24 Comparative analysis of the cleavage pattern of Cas9- and Cas12a-exonuclease fusion proteins (see Example 20). X-Cas9 and X-Cas12a indicate that the exonucleases tested were fused to the N-terminal end of Cas9 and Cas12a, respectively.
Fig. 25A-D Amplicon sequencing using Cas9-exonuclease fusion proteins and sgRNAs in PAM-out orientation (see Example 21).
Fig. 26A-D Amplicon sequencing using Cas12a-exonuclease fusion proteins and crRNAs in PAM-out orientation (see Example 22).
Fig. 27A/B/C Sequence alignment of tested alkaline exonucleases homologous to UL12 (see Example 23). Residues of 5’-phosphate coordination and residues of catalytic triad are indicated in bold and underlined, respectively. Conserved motifs of alkaline exonucleases are underlined according to Goldstein and Weller (1998) and Buisson et al., 2009. UL12-group-specific motifs and PapE-specific amino acid residues are indicated with rectangles and lines, respectively. The amino acid sequence portions shown are as follows: BGLF5 = SEQ ID NO: 34, SOX = SEQ ID NO: 68, MD5 = SEQ ID NO: 36, DUMAS = SEQ ID NO: 35, AB4P = SEQ ID NO: 69, PiE = SEQ ID NO: 44, PapE = SEQ ID NO: 43, UL12-1 = SEQ ID NO: 32, UL12-2 = SEQ ID NO: 33. The line labeled “consensus” is not an amino acid sequence, but indicates positions of high conservation in the sequences above.
Fig. 28A/B Amino acid sequence alignment of sequence portions homologous to PapE (see Example 24). PapE-group specific motifs I, II, III and IV correlate with increased HDR efficiency. PapE-group is a sub-group of UL12-group exonucleases. PapE-group specific motifs I, II, III and IV are indicated. General alkaline exonuclease functional motifs II, III and IV (according to Goldstein and Weller 1998 and Buisson et al., 2009) are indicated by bars in the center. The amino acid sequence of UL12 is given in SEQ ID NO: 32 and of PapE in SEQ ID NO: 43. The amino acid sequences of the proteins/peptides except UL12 and PapE are given in the SEQ ID NOs: 75 to 94. The line labeled “consensus” is not an amino acid sequence, but indicates positions of high conservation in the sequences above.
Fig. 29 Alignment of tested T7 exonuclease homologues. Residues of the catalytic triad are indicated underlined bold. T7 exonuclease-group specific motifs I and II are indicated with lines. ME15-specific motifs I and II are also indicated with a bar in the fifth portion from the top. RalTL1 = SEQ ID NO: 72, PaPHBO2 = SEQ ID NO: 71, SpiPhage = SEQ ID NO: 46, YerO3-12 = SEQ ID NO: 70, T7 = SEQ ID NO: 30, ME15 = SEQ ID NO: 45. The line labeled “consensus” is not an amino acid sequence, but indicates positions of high conservation in the sequences above.
DETAILED DESCRIPTION OF THE INVENTION
The protein and the fusion protein of the invention
The protein of the invention comprises a site-specific endonuclease (also briefly referred to herein as “endonuclease”) and a 5’-3’ exonuclease (also briefly referred to herein as “exonuclease”). The endonuclease is a protein having endonuclease activity and is capable of cleaving phosphodiester bonds within a polynucleotide chain at a specific site. The exonuclease is a protein having 5’-3’ exonuclease activity and is capable of cleaving nucleotides from the end of a polynucleotide chain in 5’ to 3’ direction. Thus, the endonuclease and the exonuclease are enzymes and these enzymatic activities must be present in the protein of the invention. Thus, the protein of the invention is a 5’-3’ exonuclease and a site-specific endonuclease.
There are various possibilities regarding how the endonuclease and the exonuclease may be combined to form the protein of the invention. They may be bound covalently or non-covalently. An example of covalent bonding is a fusion protein comprising the endonuclease and the exonuclease as domains of the fusion protein. Alternatively, the endonuclease and the exonuclease can be bound by other covalent chemical bonds such as disulfide bridges or by chemical linkers (e.g. using glutardialdehyde optionally followed by reduction e.g. using sodium borohydride). Among the options for covalent bonding between the endonuclease and the exonuclease, fusion proteins are preferred.
Alternatively, the protein may be an oligomeric protein (protein complex) comprising a first subunit (preferably protein subunit) comprising said endonuclease and a second subunit (preferably protein subunit) comprising said exonuclease. The first subunit may comprise: the site-specific endonuclease (e.g. as a domain of the first subunit) and a first interaction domain (such as a first protein interaction domain) or first interaction nucleic acid (such as a nucleic acid comprising an aptamer); the second subunit may comprise: the 5’-3’ exonuclease (e.g. as a domain of the second subunit) and a second interaction domain (a second protein or peptide interaction domain); wherein said first interaction domain or first interaction nucleic acid and said second interaction domain bind to each other to form said oligomeric protein (protein complex). Thus, in this embodiment, it is possible to bring the 5’-3’ exonuclease and the endonuclease in proximity to each other by specific protein-protein or protein-RNA interaction domains. For example, specific protein-protein interactions between a peptide epitope and a single chain antibody recognizing this peptide have been used to generate strong transcription activators based on dCas9 (deactivated Cas9) (Tanenbaum et al., 2014, Cell, 159: 635-646). Thus, the first interaction domain may be such peptide epitope and the second interaction domain may be a single chain antibody binding said peptide epitope. Similarly, coiled-coil protein-protein interaction domains can be used for the same purpose (Lebar et al., Nat Chem Biol, 16: 513-519). Thus, the first and the second interaction domains may be coiled-coil protein-protein interaction domains.
Similarly, protein-RNA interaction domains can be used for the same purpose. Thus, by inserting an aptamer in the gRNA used in the invention, a peptide that specifically recognizes this aptamer will bind the gRNA and be in physical proximity to the endonuclease (Ma et al., Nature Biotech, 34: 528-531). This principle can be used to bring the 5’-3’ exonucleases in proximity to the endonuclease e.g., by fusing the exonuclease to the peptide recognizing the specific RNA aptamer and fusing the RNA aptamer to the gRNA. In the latter embodiment, the endonuclease of the protein of the invention is a CRISPR-Cas nuclease, the gRNA comprises an aptamer, and the exonuclease comprises a peptide as interaction domain that binds to the aptamer. Thus, the aptamer-peptide complex can serve as a non-covalent linker between the endonuclease and the exonuclease. In this embodiment, the protein may be an oligomeric protein comprising a first subunit being or comprising the endonuclease, a second subunit comprising said 5’-3’ exonuclease and (as a second interaction domain) a peptide capable of binding to the aptamer, and a nucleic acid having a segment capable of binding to the endonuclease (referred to as “interaction nucleic acid” above, e.g. a gRNA) and an aptamer (segment) capable of binding to the 5’-3’ exonuclease (notably to said peptide capable of binding to the aptamer). As above, the endonuclease is preferably a CRISPR-Cas nuclease.
Specific embodiments of the embodiment wherein the endonuclease and the exonuclease form a protein complex are thus as follows: the first interaction domain is a single chain antibody and the second interaction domain is a peptide epitope that specifically binds to the single chain antibody; the second interaction domain is a single chain antibody and the first interaction domain is a peptide epitope that specifically binds to the single chain antibody; the first interaction nucleic acid is a gRNA comprising an aptamer and the second interaction domain is a peptide that specifically recognizes and binds the aptamer.
Among covalent and non-covalent binding between the endonuclease and the exonuclease, covalent binding is preferred and fusion proteins comprising the endonuclease and the exonuclease are more preferred.
The fusion protein of the invention is a fusion of a site-specific endonuclease with a 5’-3’ exonuclease. The endonuclease and the exonuclease represent domains of the fusion protein. Where, in the following, reference is made to the endonuclease or the exonuclease in the context of the fusion protein, the endonuclease domain or the exonuclease domain, respectively, of the fusion protein is meant. The exonuclease may be fused to the N-terminal end or the C-terminal end of the site-specific endonuclease. The fusion may be direct, i.e. without a linker. Preferably, however, the two domains are fused via a linker polypeptide in order to avoid steric hindrance between and/or for the two domains. The fusion protein is a 5’-3’ exonuclease and a site-specific endonuclease (and these functions are normally present in separate domains of the fusion protein).
The linker is a polypeptide of at least 10, preferably at least 20, and more preferably at least 30 amino acid residues. The maximum number of amino acid residues of the linker is not particularly limited, but may be defined as 250 residues, preferably at most 200, and more preferably at most 150 amino acid residues. In a preferred embodiment, the length of the polypeptide linker is between 40 and 90 amino acids, preferably between 50 and 80 amino acids and more preferably between 60 and 70 amino acids. In a particular embodiment the polypeptide linker consists of 61 amino acids.
The site-specific endonuclease provides the protein of the invention, optionally in conjunction with further components, with the ability to detect a target site on the endogenous DNA of a eukaryotic cell or a eukaryotic organism, to guide the protein of the invention (e.g. the fusion protein) including the exonuclease to the target site and to cleave the endogenous DNA at the target site. The term “target site” refers to a site on the endogenous DNA intended to be cleaved by the endonuclease. The endonuclease of the protein of the invention (as well as the endonuclease domain of the fusion protein) has site-specific endonuclease function and can cleave the DNA at the target site. In preferred embodiments, double strand breaks (DSB) are induced to the endogenous DNA. The DSBs may be blunt end DSBs or staggered DSBs with sticky overhangs. The type of cleavage at the target site depends on the endonuclease used. Some endonucleases like Cas9 induce blunt end DSBs to the DNA whereas other endonucleases like Cas12a (formerly Cpf1) induce staggered DSBs with sticky overhangs. It is also possible to use endonucleases with nickase activity, so-called nickases. Such nickase may be a mutant variant of a CRISPR nuclease, such as Cas9. In an embodiment, a nickase is used to induce a DSB to an endogenous DNA molecule, wherein the nickase induces a single strand nick both at the coding strand and the template strand in proximity. Two gRNAs may be used to guide the nickase to the two sites in order to produce a DSB by two nickase reactions.
The 5’-3’ exonuclease provides the protein of the invention with the ability to process the DNA at the target site, after cleavage by the endonuclease, by way of its 5’-3’ exonuclease activity. The fusion or other bonding to the endonuclease ensures that the exonuclease is in proximity of the DNA ends generated by the endonuclease. The inventors expect that the 5’-3’ exonuclease has a higher affinity to double-stranded DNA ends than the DNA repair factors from the NHEJ pathway, which prevents the latter from binding to the DNA ends. This suppression of the NHEJ pathway is assumed to promote the DNA break repair through the HDR pathway and, thereby, increases the frequency of gene replacement events. The inventors also expect that differences in affinity to broken DNA ends contribute to the observation of the inventors that some 5’-3’ exonucleases lead to a higher gene replacement efficiency than other 5’-3’ exonucleases. It is conceivable that the high activity of the exonuclease used in the invention to act on the DSBs produced by the endonuclease effectively competes with the NHEJ pathway.
After binding to the DNA break ends, the exonuclease processes the DNA in 5’ to 3’ direction and creates free 3’-overhangs preferably at both ends of the cleaved DNA. The inventors assume that processing contributes to DNA break repair through the HDR pathway instead of the NHEJ pathway, especially when suitable donor DNA is present (see further below). The 3’-overhangs are believed to pair with the complementary strand of the homology arms of the donor nucleic acid (preferably donor DNA) to create a complex of hybrid DNA that comprises the cleaved endogenous DNA and the donor nucleic acid. The formation of this complex is also believed to contribute to the increased frequency of DNA break repair through the homology directed repair (HDR) pathway instead of the non-homologous end joining (NHEJ) pathway.
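The resection step described above may be sketched in Python as follows. This is a simplified string model of one blunt DNA end and not a description of enzyme kinetics; the function names, the example sequence and the number of nucleotides removed are hypothetical and chosen only for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def resect_blunt_end(top: str, n: int) -> tuple[str, str]:
    """Model 5'-3' resection at a blunt end.

    `top` is the strand (5'->3') whose 3' terminus lies at the break.
    The complementary strand therefore has its 5' terminus at the break
    and loses `n` nucleotides from that terminus.
    Returns (resected complementary strand 5'->3', free 3' overhang 5'->3').
    """
    if not 0 <= n <= len(top):
        raise ValueError("n must lie between 0 and the fragment length")
    bottom = reverse_complement(top)
    resected_bottom = bottom[n:]          # nucleotides removed from the 5' end
    overhang = top[len(top) - n:]         # now unpaired 3' tail of the top strand
    return resected_bottom, overhang


def can_pair_with(overhang: str, donor_strand: str) -> bool:
    """True if the 3' overhang is complementary to a stretch of the donor strand."""
    return reverse_complement(overhang) in donor_strand
```

For the hypothetical fragment "GATTACAGGC", removing four nucleotides from the 5' end of the complementary strand leaves the 3' overhang "AGGC", which in this model can pair with a donor strand carrying the complementary stretch "GCCT".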
The endonuclease and the exonuclease which are comprised by the protein of the invention are described in further detail in the following.
Site-specific endonuclease and optional further components thereof
The site-specific endonuclease of the fusion protein may be any endonuclease that cleaves double-stranded DNA at a target site in a site-specific manner. Examples of site-specific endonucleases that can be used in the invention are Zinc-finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN), and CRISPR-endonucleases, whereby the latter are preferred because of their ease of use and wide applicability. Examples of CRISPR-endonucleases are Cas9 and Cas12a (formerly Cpf1) and modified versions (e.g. mutants) thereof that have endonuclease activity. The structure and use of the CRISPR-endonuclease Cas9 is described inter alia in WO2014093712 A1 and WO2014093635 A1. The structure and use of the CRISPR-nuclease Cpf1 is described inter alia in WO2016205711 A1 and WO2017141173 A1. In a preferred embodiment, the site-specific endonuclease is Cas9 or a mutant thereof having endonuclease activity.
It is generally known that a CRISPR-endonuclease such as Cas9 requires a guide RNA (gRNA) to guide the endonuclease to a target site by complementarity of the gRNA to sequences of the target DNA. Thus, the gRNA has complementarity to a target nucleic acid (generally target DNA) and has the ability to bind to the endonuclease that is used for cleaving the target DNA. As mentioned above, the nuclease may be Cas9 or Cpf1 or modified versions (e.g. mutants) thereof that have endonuclease activity. However, the invention is not limited to the Cas9 or Cpf1 endonucleases, and other CRISPR endonucleases may be used as well. In some embodiments, the gRNA comprises a guide sequence linked to a direct repeat sequence. The guide sequence provides the complementarity to a target DNA for guiding the endonuclease to the target site. The direct repeat sequence generally provides portions that allow binding of the gRNA to a CRISPR nuclease as, for example, in a tracrRNA. The gRNA may be a single guide RNA (sgRNA), i.e. it may comprise a trans-activating CRISPR RNA (tracrRNA) required for certain CRISPR-Cas systems, such as the Type II CRISPR-Cas9 system. Thus, a gRNA may comprise a sequence stretch complementary to the target DNA and, if required, a tracrRNA. The sequence stretch complementary to the target DNA may have a length of from 19 to 22 contiguous nucleotides, preferably from 20 to 21 nucleotides. The succession of these elements depends on the type of CRISPR-Cas-system used.
For use of Cas9 or another endonuclease of a class 2, Type II CRISPR-Cas-system, the gRNA is generally a sgRNA that comprises in 5’ to 3’-direction a sequence stretch complementary to a strand of the target DNA and a trans-activating CRISPR RNA (tracrRNA). The use of CRISPR-Cas systems is generally known to the skilled person.
To sequence-specifically cleave a target DNA, the CRISPR endonuclease (also briefly referred to as CRISPR nuclease), for example Cas9, having bound gRNA (such as a sgRNA) can scan in the eukaryotic cell the endogenous DNA to recognize, at the target site, a target sequence adjacent to a Proto-spacer Adjacent Motif (so-called PAM-sequence). When the PAM-sequence is detected at the target site, the endonuclease binds to it and may unwind the DNA. Subsequently, the distal part of the gRNA, which is bound to the endonuclease, can hybridize with the unwound target DNA to identify the target site as determined by the gRNA. When about 20 contiguous nucleotides of the distal end of the gRNA have successfully hybridized with the separated DNA strand, the endonuclease may exert its function and cleave or nick the target DNA near the PAM sequence. The pattern of the DNA cleavage depends on the properties of the endonuclease. A CRISPR nuclease usually introduces double strand breaks (DSBs). The DSBs may have blunt ends (e.g. in the case of Cas9). If DSBs with sticky ends are desired, Cpf1 may be used as the CRISPR nuclease. In a further alternative, the target DNA may be nicked, i.e. only one of the strands of the target DNA is cleaved. Nicking may be achieved by using a CRISPR nuclease having one of the two nuclease domains of a natural CRISPR nuclease inactivated by mutation. However, in the present invention, both strands of the target DNA are preferably cleaved to introduce DSBs in the target DNA. Even more preferably, both strands of the target DNA are cleaved to introduce blunt-ended DSBs in the target DNA. In another embodiment, both strands of the target DNA are cleaved to introduce sticky-ended DSBs in the target DNA. CRISPR nucleases are divided into different types based on their mode of operation. They originate from different bacteria and/or archaea and differ in the size, domain structure, and the PAM-sequence recognized. 
Nevertheless, CRISPR/Cas nucleases depend on the basic principle of an RNA-guided nuclease activity. Cpf1 is an example of a CRISPR nuclease that differs from Cas9 in that it recognizes a different PAM-sequence and does not require a tracrRNA sequence in the gRNA (EP 3 009 511; Zetsche et al., Cell 163(3) (2015) 759-771). Cpf1, unlike Cas9, generates double strand breaks with sticky overhangs.
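The target-site recognition described above can be sketched in a few lines of Python. The sketch assumes the 5’-NGG-3’ PAM commonly used with Streptococcus pyogenes Cas9 and a 20-nucleotide target sequence located directly 5’ of the PAM; the function name find_cas9_targets and the example sequence are hypothetical and are not taken from this specification. Only the given strand is scanned.

```python
def find_cas9_targets(dna, target_len=20):
    """List candidate Cas9 target sequences on the given strand.

    Assumption: the PAM is 5'-NGG-3' and lies directly 3' of the target
    sequence. Returns a list of (start, target, pam) tuples, where start
    is the 0-based position of the first target nucleotide.
    """
    dna = dna.upper()
    hits = []
    # pam_start is the index of the "N" of a possible NGG motif.
    for pam_start in range(target_len, len(dna) - 2):
        pam = dna[pam_start:pam_start + 3]
        if pam[1:] == "GG":
            start = pam_start - target_len
            hits.append((start, dna[start:pam_start], pam))
    return hits


# Hypothetical example sequence, for illustration only.
example = "TTACGATCGATCGGCTAGCTAGCATCGTAGCTAGCTAGG"
for start, target, pam in find_cas9_targets(example):
    print(start, target, pam)
```

A nuclease with a different PAM, such as Cpf1, would need a different motif test and a different position of the PAM relative to the target sequence.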
The site-specific endonuclease may alternatively have nickase activity and introduce single strand breaks (nicks) into the endogenous DNA of the eukaryotic or prokaryotic cell. When nickase enzymes are used, single strand nicks in both the coding strand and the template strand are required to produce DSBs in the DNA. These two nicks may be introduced by the same nickase. Two different gRNAs, one directed to the coding strand and the other directed to the template strand, may be used to obtain a DSB. In one embodiment, the two gRNAs guide the nickase enzyme to introduce single strand nicks in both the coding strand and the template strand of the DNA in proximity to each other, wherein one guide RNA is designed in PAM-in and the other in PAM-out orientation.
5’-3’ exonuclease
The exonuclease of the protein of the invention (e.g. the fusion protein) is generally a monomeric 5’-3’ exonuclease. An exonuclease is a monomeric 5’-3’ exonuclease in the sense of the present invention if it consists of a single protein subunit and the exonuclease activity is present in this single protein subunit. Generally, many 5’-3’ exonucleases known in the art are multimeric. However, the inventors have surprisingly found that much better results and better gene editing efficiency, notably using the HDR mechanism, can be achieved if the exonuclease is a monomeric exonuclease.
An exonuclease comprises 5’-3’ exonuclease activity if it hydrolyzes blunt end double stranded DNA in 5’ to 3’-direction preferably from both ends of a DSB to create 3’-overhangs of the non-hydrolyzed strand. However, the activity is not limited to hydrolysis of blunt end double stranded DNA. In one embodiment of the invention, the 5’-3’ exonuclease is a 5’-3’ exonuclease having a 5’-3’ exonuclease catalytic efficiency kcat/Km of at least 0.072 (pM s)⁻¹ or a turnover number of at least 0.50 s⁻¹ using the exonuclease assay described herein. In a more preferred embodiment, the 5’-3’ exonuclease is a 5’-3’ exonuclease that has a 5’-3’ exonuclease catalytic efficiency kcat/Km of at least 0.10, preferably of at least 0.20 (pM s)⁻¹, and/or a turnover number of at least 0.70 s⁻¹, preferably of at least 1.4 s⁻¹. The catalytic efficiency and/or the turnover number may be determined using, as substrate, the hairpin oligonucleotide of SEQ ID NO: 37 that is phosphorylated at its 5’ end and carries a fluorescent dye linked to the thymine base (T) closest to the 3’-end of the oligonucleotide. The underlying assay is described in Example 4. The assay is carried out at 27°C and pH 7.9 (measured at 25°C), and the increase in fluorescence due to decreasing fluorescence quenching is monitored, recorded and plotted. Initial velocities at multiple different substrate concentrations are determined from the initial linear part of the plot. Turnover number and Km values are determined from a Lineweaver-Burk (double-reciprocal) plot of the initial velocities against the substrate concentrations. Example 4 gives further details on the assay.
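The evaluation of the kinetic parameters can be illustrated by a minimal Python sketch of a Lineweaver-Burk fit, in which 1/v is fitted linearly against 1/[S] according to 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, and kcat is obtained as Vmax divided by the enzyme concentration. The function name, the enzyme concentration and the data are hypothetical; the data are synthetic and noise-free and are not measurements from Example 4. Units are arbitrary but must be used consistently.

```python
def lineweaver_burk(substrate_conc, initial_velocities, enzyme_conc):
    """Estimate Km, Vmax, kcat and kcat/Km from a double-reciprocal fit.

    Fits 1/v = (Km/Vmax) * (1/[S]) + 1/Vmax by ordinary least squares.
    """
    xs = [1.0 / s for s in substrate_conc]
    ys = [1.0 / v for v in initial_velocities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals Km / Vmax
    intercept = mean_y - slope * mean_x    # equals 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    kcat = vmax / enzyme_conc              # turnover number
    return {"Km": km, "Vmax": vmax, "kcat": kcat, "kcat/Km": kcat / km}


# Synthetic data generated from Km = 2.0 and Vmax = 0.5 (arbitrary units).
S = [0.5, 1.0, 2.0, 4.0, 8.0]
v = [0.5 * s / (2.0 + s) for s in S]
fit = lineweaver_burk(S, v, enzyme_conc=0.1)
print(fit)
```

With real, noisy data the double-reciprocal transformation gives most weight to the lowest substrate concentrations, so the choice of concentrations affects the estimates.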
Herein, where reference is made to the turnover number and/or catalytic efficiency of the exonuclease, these values refer to and are assayed using the free exonuclease enzyme, i.e. exonuclease not being part of the protein of the invention. For the purposes of the invention, it is assumed that the turnover number and catalytic efficiency of the free exonuclease correspond to those of the fusion protein or the protein complex comprising the exonuclease.
In another embodiment, the 5’-3’ exonuclease activity of the 5’-3’ exonuclease of the invention is higher than the 5’-3’ exonuclease activity of the T5 exonuclease (SEQ ID NO: 31) in the in-vitro exonuclease assay described in Example 4. Preferably, the 5’-3’ exonuclease activity of the 5’-3’ exonuclease of the invention is at least 2-fold, preferably at least 3-fold, more preferably at least 4-fold that of the 5’-3’ exonuclease activity of the T5 exonuclease (SEQ ID NO: 31) in the in-vitro exonuclease assay described in Example 4.
In another embodiment, the 5’-3’ exonuclease activity of the 5’-3’ exonuclease according to the invention is the same or higher than the 5’-3’ exonuclease activity of the bacteriophage T7 exonuclease (SEQ ID NO: 30) in the in-vitro exonuclease assay described in Example 4.
The inventors have surprisingly found that the efficiency of gene editing using the HDR mechanism depends on the exonuclease activity of the exonuclease. Otherwise, the exonuclease is not particularly limited, except being a monomeric 5’-3’ exonuclease. Natural exonucleases may be used for fusion with the endonuclease. However, a natural exonuclease may be modified, e.g. by introducing mutations, additions, insertions and/or deletions, provided the exonuclease activity is not compromised. Examples of the exonuclease are the T7 exonuclease (SEQ ID NO: 30), the UL12-1 exonuclease (SEQ ID NO: 32), and the UL12-2 exonuclease (SEQ ID NO: 33). Further examples are the BGLF5 exonuclease (SEQ ID NO: 34), the DUMAS exonuclease (SEQ ID NO: 35), and the MD5 exonuclease (SEQ ID NO: 36). Preferred exonucleases are PapE (SEQ ID NO: 43), a deoxyribonuclease from Papiine alphaherpesvirus 2, and PiE (SEQ ID NO: 44), a deoxyribonuclease from Pteropus lylei-associated alpha-herpesvirus. Further examples of exonucleases are SOX (SEQ ID NO: 68) and AB4P (SEQ ID NO: 69). Other particularly suited exonucleases are ME15 (SEQ ID NO: 45) and SpiPh (SEQ ID NO: 46). Other suited exonucleases are 03-12 (SEQ ID NO: 70), PhBO2 (SEQ ID NO: 71) and RaTL1 (SEQ ID NO: 72). Variants of these exonucleases as defined herein are also suitable for practicing the invention.
The 5’-3’ exonucleases that may be used in the invention may be grouped into the following two groups, (I) the UL-12 homologues (some of which are depicted in Fig. 15) and (II) the T7 homologues (some of which are depicted in Fig. 16).
Members and examples of group (I) are: the UL12-1 exonuclease (SEQ ID NO: 32), the UL12-2 exonuclease (SEQ ID NO: 33), the BGLF5 exonuclease (SEQ ID NO: 34), the DUMAS exonuclease (SEQ ID NO: 35), the MD5 exonuclease (SEQ ID NO: 36), the PapE exonuclease (SEQ ID NO: 43), the PiE exonuclease (SEQ ID NO: 44), the SOX exonuclease (SEQ ID NO: 68), the AB4P exonuclease (SEQ ID NO: 69); and variants of these exonucleases as defined herein. Preferred are the UL12-1 exonuclease (SEQ ID NO: 32), the UL12-2 exonuclease (SEQ ID NO: 33), the PapE exonuclease (SEQ ID NO: 43), and the PiE exonuclease (SEQ ID NO: 44) and their variants. Most preferred are the PapE exonuclease (SEQ ID NO: 43) and the PiE exonuclease (SEQ ID NO: 44) and their variants.
Members and examples of group (II) are: the T7 exonuclease (SEQ ID NO: 30), the ME15 exonuclease (SEQ ID NO: 45), the SpiPh exonuclease (SEQ ID NO: 46), the 03-12 exonuclease (SEQ ID NO: 70), the PhBO2 exonuclease (SEQ ID NO: 71), and the RaTL1 exonuclease (SEQ ID NO: 72), and their variants as defined herein. Preferred are the ME15 exonuclease (SEQ ID NO: 45) and the SpiPh exonuclease (SEQ ID NO: 46) and their variants defined herein. Most preferred is the ME15 exonuclease and its variants defined below.
Various exonucleases for use in the invention are described in further detail in the following. The variants of items (ii) to (iv) have, like the sequences of item (i), 5’-3’ exonuclease activity, preferably at least the minimum activities defined numerically above. Here, as throughout the specification, amino acid sequences are given in the standard one-letter code according to WIPO Standard ST.25. X stands for any one of the 20 standard amino acid residues.
Group (I) exonucleases
In one embodiment, the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
(i) the amino acid sequence of SEQ ID NO: 32 (UL12-1 exonuclease) or SEQ ID NO: 33 (UL12-2 exonuclease); or
(ii) an amino acid sequence having at least 70%, preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, and most preferably at least 95% sequence identity to the amino acid sequence of SEQ ID NO: 32 or SEQ ID NO: 33; or
(iii) an amino acid sequence having at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95%, and most preferably at least 98% sequence similarity to the amino acid sequence of SEQ ID NO: 32 or SEQ ID NO: 33; or
(iv) an amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 32 or SEQ ID NO: 33.
In another embodiment, the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
(i) the amino acid sequence of SEQ ID NO: 34 (BGLF5 exonuclease); or
(ii) an amino acid sequence having at least 70%, preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, and most preferably at least 95% sequence identity to the amino acid sequence of SEQ ID NO: 34; or
(iii) an amino acid sequence having at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95%, and most preferably at least 98% sequence similarity to the amino acid sequence of SEQ ID NO: 34; or
(iv) an amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 34.
In another embodiment, the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
(i) the amino acid sequence of SEQ ID NO: 35 (DUMAS exonuclease); or
(ii) an amino acid sequence having at least 70%, preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, and most preferably at least 95% sequence identity to the amino acid sequence of SEQ ID NO: 35; or
(iii) an amino acid sequence having at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95%, and most preferably at least 98% sequence similarity to the amino acid sequence of SEQ ID NO: 35; or
(iv) an amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 35.
In another embodiment, the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
(i) the amino acid sequence of SEQ ID NO: 36 (MD5 exonuclease); or
(ii) an amino acid sequence having at least 70%, preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, and most preferably at least 95% sequence identity to the amino acid sequence of SEQ ID NO: 36; or
(iii) an amino acid sequence having at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95%, and most preferably at least 98% sequence similarity to the amino acid sequence of SEQ ID NO: 36; or
(iv) an amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 36.
The above variants of items (ii) to (iv) have 5’-3’-exonuclease activity, preferably those defined above numerically.
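Whether a candidate sequence meets one of the identity thresholds of items (ii) can be checked computationally. The following minimal Python sketch assumes that the two sequences have already been aligned (gaps written as "-") and that percent identity is counted as identical residues divided by the number of alignment columns. This convention, the function names and the toy sequences are assumptions made for illustration and do not replace the definition of sequence identity used in this specification.

```python
def percent_identity(aligned_ref, aligned_query):
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumed convention: identical residue pairs divided by the number of
    alignment columns, ignoring columns that are gaps in both sequences.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_ref, aligned_query)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def meets_identity_threshold(aligned_ref, aligned_query, threshold):
    """True if the percent identity is at least the given threshold."""
    return percent_identity(aligned_ref, aligned_query) >= threshold


# Toy sequences, not taken from the sequence listing.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
print(meets_identity_threshold("ACDEFGHIKL", "ACDEFGHIKV", 95))
```

Other conventions exist, for example dividing by the length of the shorter sequence, and they can give different values for the same alignment.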
As stated above, in a preferred embodiment, the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises:
(i) the amino acid sequence defined in SEQ ID NO: 43 (PapE), or
(ii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 43, or
(iii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence similarity to the amino acid sequence defined in SEQ ID NO: 43, or
(iv) an amino acid sequence of from 1 to 121, preferably from 1 to 90, more preferably from 1 to 60, even more preferably from 1 to 45 and most preferably from 1 to 30 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 43.
The amino acid sequence of the 5’-3’ exonuclease of this embodiment (notably of items (ii) to (iv)) preferably comprises the PapE-group specific Motif I, and/or II, and/or III (cf. Fig. 27, underlining designates motifs identified in Fig. 27) of amino acid sequence PAASVH, RRL, and APASAPAAVRAA (SEQ ID NO: 50), respectively, at positions corresponding to that/those of PapE (cf. Fig. 27). More preferably, all these three motifs are present.
Alternatively or additionally, the amino acid sequence of the 5’-3’ exonuclease of this embodiment (notably of items (ii) to (iv)) preferably comprises one or more amino acid sequence segments selected from the group consisting of SEQ ID NO: 47 (APAESVHACGVL), SEQ ID NO: 48 (APAASVHACGVL), SEQ ID NO: 49 (AKYAFDPADAGXXVVAAHRRL), SEQ ID NO: 50 (APASAPAAVRAA) and SEQ ID NO: 51 (LIITPVRXDAA), more preferably at positions corresponding to those in SEQ ID NO: 43.
In another preferred embodiment, the 5’-3’ exonuclease according to the invention is a protein whose amino acid sequence is or comprises
(i) the amino acid sequence defined in SEQ ID NO: 44 (PiE), or
(ii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 44, or
(iii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence similarity to the amino acid sequence defined in SEQ ID NO: 44, or
(iv) an amino acid sequence of from 1 to 131, preferably from 1 to 98, more preferably from 1 to 65, even more preferably from 1 to 49 and most preferably from 1 to 32 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 44.
The Group (I) exonuclease, notably PapE and PiE and variants thereof as defined above with reference to SEQ ID NOs: 43 and 44, may be a protein whose amino acid sequence comprises the amino acid sequence segments of SEQ ID NO: 54 (FRYCVGRAD) and/or SEQ ID NO: 55 (PXPLMXFFEAATQ), e.g. at positions corresponding to those in SEQ ID NO: 43.
Preferred embodiments of Group (I) exonucleases
Group (I) exonucleases generally share certain sequence motifs that are indicated in the alignment of Fig. 27. The variants of the exonucleases given above preferably have these sequence motifs, such as the UL12-group specific Motif I of amino acid sequence K/RPLMXFF/YE. In this amino acid sequence segment, K/R means either K or R and F/Y means either F or Y. X is defined as above.
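The motif notation used above (a slash between alternative residues and X for any standard residue) can be translated into a regular expression for searching a candidate sequence. The following Python sketch is a hypothetical helper written for illustration; the function name motif_to_regex and the test sequences are not part of the specification.

```python
import re

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid residues


def motif_to_regex(motif):
    """Convert a motif such as 'K/RPLMXFF/YE' into a compiled regex.

    'A/B' means either A or B at that position; 'X' means any one of
    the 20 standard amino acid residues.
    """
    parts = []
    i = 0
    while i < len(motif):
        options = motif[i]
        i += 1
        # Collect alternatives written as A/B/C for a single position.
        while i + 1 < len(motif) and motif[i] == "/":
            options += motif[i + 1]
            i += 2
        if options == "X":
            parts.append("[" + STANDARD_AA + "]")
        elif len(options) > 1:
            parts.append("[" + options + "]")
        else:
            parts.append(options)
    return re.compile("".join(parts))


pattern = motif_to_regex("K/RPLMXFF/YE")
print(pattern.pattern)
print(bool(pattern.search("AAKPLMAFFEAA")))  # toy sequence
```

The match position returned by pattern.search could additionally be compared with the expected position in an alignment such as that of Fig. 27, which this sketch does not attempt.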
The Group (I) exonuclease is, alternatively or additionally to the above embodiments, a 5’-3’ exonuclease comprising UL12-group specific Motif I, Motif I, the bridge region, Motif Ia, and the UL12-group specific Motif II, cf. Fig. 27 (underlining highlights segments indicated in Fig. 27).
The 5’-3’ exonuclease may thus be an exonuclease whose amino acid sequence comprises the amino acid sequence segment of SEQ ID NO: 56 (PXPLMXFXEAATQXQXXXQLWXLLRRGLXTAXTLXWGXXGPXFXXXWLXXXXXXXXXXXXXAXXFGRXNEXXARXXLFRYCVGRAD), wherein X at position no. 35 of this sequence is preferably K or R. This segment is preferably present at the position corresponding to that in UL12-1. Alternatively, the Group (I) exonuclease may comprise Motif I, cf. Fig. 27. Accordingly, the 5’-3’ exonuclease may be a protein whose amino acid sequence comprises an amino acid sequence segment of (and including) amino acid residues no. 9 to 37 of SEQ ID NO: 56, wherein X at position no. 35 of SEQ ID NO: 56 is preferably K or R.
The Group (I) exonuclease is, alternatively or additionally to the above embodiments, a 5’-3’ exonuclease comprising UL12-group specific Motif II, cf. Fig. 27. The 5’-3’ exonuclease may thus be an exonuclease whose amino acid sequence comprises the amino acid sequence segment RYCV or FRYCV, preferably followed contiguously by the segment GRAD to result in segment RYCVGRAD or FRYCVGRAD, respectively. As above, such sequence segments are preferably present at a position corresponding to that in UL12-1.
Alternatively or additionally, the Group (I) exonuclease may comprise Motif II, cf. Fig. 27. Accordingly, the 5’-3’ exonuclease may be a protein whose amino acid sequence comprises an amino acid sequence segment of SEQ ID NO: 95 (GVLXDXHTGMVGASLD), wherein the X at position no. 4 is preferably M, V, L, or I; H at position 7 may alternatively be R; and/or M at position 10 may alternatively be V or L.
Alternatively or additionally, the Group (I) exonuclease may comprise Motif III, cf. Fig. 27. Accordingly, the 5’-3’ exonuclease may be a protein whose amino acid sequence comprises an amino acid sequence segment of SEQ ID NO: 96 (EVKCRAKYAFDPXD), wherein the V at position no. 2 may alternatively be I; the A at position 9 may alternatively be L or T; and/or the D at position 14 may alternatively be E.
Alternatively or additionally, the Group (I) exonuclease may comprise Motif VI, cf. Fig. 27. Accordingly, the 5’-3’ exonuclease may be a protein whose amino acid sequence comprises an amino acid sequence segment of SEQ ID NO: 97 (FANPRHPNFKQILVQXYVLXXHFP), wherein the K at position no. 10 may alternatively be R; and/or the X at position 16 is preferably G, A, S, or T.
Other preferred embodiments comprising the 5’-3’ exonucleases UL12 and variants
In a preferred embodiment, the protein for editing endogenous DNA is a fusion protein comprising a CRISPR-nuclease as endonuclease and a UL12 (UL12-1 or UL12-2 or their variants defined herein) as exonuclease. In this embodiment, the protein for editing endogenous DNA is preferably a protein, wherein the site-specific endonuclease is a CRISPR-nuclease as defined above, and wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker, the polypeptide linker having a length of 25 amino acids or more, preferably 30 amino acids or more, more preferably 40 amino acids or more, even more preferably 50 amino acids or more and most preferably 60 amino acids or more, and wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises
(i) the amino acid sequence defined in SEQ ID NO: 32 (UL12-1) or SEQ ID NO: 33 (UL12-2), or
(ii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or
(iii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence similarity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or
(iv) an amino acid sequence of from 1 to 120, preferably from 1 to 93, more preferably from 1 to 62, even more preferably from 1 to 46 and most preferably from 1 to 31 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33.
This embodiment may be combined with other preferred embodiments described herein, such as comprising the amino acid sequence segment of SEQ ID NO: 56.
Group (II) exonucleases
In another preferred embodiment, the 5’-3’ exonuclease according to the invention is a protein the amino acid sequence of which comprises or consists of:
(i) the amino acid sequence of SEQ ID NO: 30 (T7 exonuclease); or
(ii) an amino acid sequence having at least 70%, preferably at least 80%, more preferably at least 85%, even more preferably at least 90%, and most preferably at least 95% sequence identity to the amino acid sequence of SEQ ID NO: 30; or
(iii) an amino acid sequence having at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95%, and most preferably at least 98% sequence similarity to the amino acid sequence of SEQ ID NO: 30; or
(iv) an amino acid sequence having from 1 to 50, preferably at most 40, more preferably at most 30, even more preferably at most 20, and most preferably at most 10 amino acid substitutions, additions, deletions and/or insertions compared to the amino acid sequence of SEQ ID NO: 30.
In a preferred embodiment, the 5’-3’ exonuclease according to the invention is a protein whose amino acid sequence is or comprises
(i) the amino acid sequence defined in SEQ ID NO: 45 (ME15), or
(ii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 45, or
(iii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence similarity to the amino acid sequence defined in SEQ ID NO: 45, or
(iv) an amino acid sequence of from 1 to 60, preferably from 1 to 45, more preferably from 1 to 30, even more preferably from 1 to 22 and most preferably from 1 to 15 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 45.
Optionally, the amino acid sequence of the 5’-3’ exonuclease of this embodiment may comprise one or more amino acid sequence segments selected from the group consisting of SEQ ID NO: 52 (APTESETLWDCI) and SEQ ID NO: 53 (ILRFNDYNIDT).
In another preferred embodiment, the 5’-3’ exonuclease according to the invention is a protein whose amino acid sequence is or comprises
(i) the amino acid sequence defined in SEQ ID NO: 46 (SpiPh), or
(ii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 46, or
(iii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence similarity to the amino acid sequence defined in SEQ ID NO: 46, or
(iv) an amino acid sequence of from 1 to 59, preferably from 1 to 44, more preferably from 1 to 29, even more preferably from 1 to 22 and most preferably from 1 to 14 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 46.
The 3’-overhangs produced by the action of the exonuclease, preferably on both ends of the cleaved endogenous DNA, are homologous to the homology arms of the donor nucleic acid and can anneal to them. This annealing is the starting point for the DNA break repair through HDR.
Where the protein is defined herein by a number or number range of amino acid substitutions, additions, deletions, and/or insertions, amino acid substitutions, additions, deletions, and insertions may be combined, but the given number or number range refers to the sum of all substitutions, additions, insertions and deletions of amino acid residues compared to a reference sequence defined by a SEQ ID NO. Among amino acid substitutions, additions, insertions and deletions, amino acid substitutions, additions, and deletions are preferred. The term “insertions” relates to insertions of amino acid residues within the amino acid sequence of the reference sequence, i.e. excluding additions at the C- or N-terminal end. The term “additions” means additions of amino acid residues at the C- or N-terminal end of the amino acid sequence of a reference sequence. A “deletion” may be a deletion of a terminal or an internal amino acid residue of a reference sequence. Herein, where the protein or a domain thereof is defined by a number or number range of amino acid substitutions, additions, deletions, and/or insertions relative to a reference sequence, the protein may, as an alternative embodiment, have from 1 to several amino acid substitutions, additions, insertions or deletions relative to the indicated amino acid sequence or segment.
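The counting rule described above can be sketched as follows. The Python function below takes a reference sequence and a variant that have already been aligned (gaps written as "-"), classifies each difference as a substitution, addition (terminal), insertion (internal) or deletion, and sums them. The function name and the toy alignment are hypothetical and serve only as an illustration of the definitions given in the text.

```python
def classify_changes(aligned_ref, aligned_var):
    """Count substitutions, additions, insertions and deletions.

    Additions are extra variant residues before the first or after the
    last reference residue; insertions are extra residues in between.
    """
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    ref_positions = [i for i, r in enumerate(aligned_ref) if r != "-"]
    if not ref_positions:
        raise ValueError("reference contains no residues")
    first, last = ref_positions[0], ref_positions[-1]
    counts = {"substitutions": 0, "additions": 0, "insertions": 0, "deletions": 0}
    for i, (r, v) in enumerate(zip(aligned_ref, aligned_var)):
        if r == "-" and v == "-":
            continue  # column carries no information
        if r == "-":
            key = "additions" if i < first or i > last else "insertions"
            counts[key] += 1
        elif v == "-":
            counts["deletions"] += 1
        elif r != v:
            counts["substitutions"] += 1
    # The number range in the text refers to the sum of all four classes.
    counts["total"] = sum(counts.values())
    return counts


# Toy alignment, not taken from the sequence listing.
print(classify_changes("--MKT-LVA", "GSMRTAL-A"))
```

The total can then be compared with the upper limit of the relevant item (iv), for example 50 or 121.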
Donor nucleic acid
The donor nucleic acid (sometimes also referred to as “repair template”, “donor fragment”; or as “DNA repair template” or “donor DNA” if it is DNA) for use with the protein of the invention (i.e. the fusion protein or oligomeric protein (protein complex) of the invention) is a nucleic acid molecule that generally comprises a donor sequence flanked by a first and a second homology arm, one at the 5’ end and the other at the 3’ end of the donor nucleic acid. The first homology arm is generally homologous to a first region flanking a target site in the genome of said cell on a first side of said target site. The second homology arm is homologous to a second region flanking said target site on the second side of said target site.
When provided into the cell, the donor nucleic acid may be single stranded or double stranded DNA or RNA and may be linear or circular. However, if the donor nucleic acid is provided into the cell as RNA, it generally needs to be reverse-transcribed in the cell from RNA into a donor DNA for the HDR to work. The donor nucleic acid may be reverse-transcribed from RNA into DNA by a reverse transcriptase that may be provided into the cell in addition to the donor RNA. In order to avoid the need for reverse transcription, the donor nucleic acid is preferably DNA and is, in this embodiment, also referred to herein as donor DNA.
The donor nucleic acid may be part of a DNA construct, plasmid or vector. The homology arms of the donor nucleic acid comprise a nucleotide sequence that is homologous to the endogenous DNA in proximity of the target site. The homology arm at the 5’ end of the donor nucleic acid may be homologous to a nucleotide sequence upstream of the target site and the homology arm at the 3’ end may be homologous to a nucleotide sequence downstream of the target site. Due to this homology, the 3’ overhangs that are generated by the 5’-3’ exonuclease of the protein of the invention can invade the homology arms of the donor nucleic acid and anneal to the complementary strand of the homology arm. The result is a hybrid DNA complex comprising the endogenous DNA and the donor nucleic acid. This hybrid DNA complex is also referred to as “displacement loop” (D-loop) and represents the first step of DNA DSB repair through HDR. To resolve the displacement loop, the 3’ overhangs annealed to the homology arms can serve as primers for a DNA polymerase to synthesize a new DNA strand using the homology arms as a template. This process allows inserting a copy of the donor sequence comprised in the donor nucleic acid into the endogenous DNA at the target site. It is possible to delete a specific nucleotide sequence in the endogenous DNA at the target site. For this purpose, a donor sequence may be used that is shorter than the nucleotide sequence it replaces in the endogenous DNA and with a suitable choice of the homology arms. For the deletion of a nucleotide sequence, e.g. of an unspecified number of nucleotides, in the endogenous DNA, the donor sequence may even be absent.
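The layout of a donor nucleic acid and the intended outcome of the repair can be sketched in Python. The sketch assumes that the region to be replaced is given by 0-based coordinates on one strand and that both homology arms have the same length; the function names, the arm length and the toy sequence are hypothetical, and no preferred arm length is implied.

```python
def build_donor(genome, start, end, donor_seq, arm_len):
    """Assemble a donor: 5' homology arm + donor sequence + 3' homology arm.

    genome[start:end] is the region to be replaced; the arms are copied
    from the sequences directly upstream and downstream of that region.
    """
    if start - arm_len < 0 or end + arm_len > len(genome):
        raise ValueError("homology arms extend beyond the sequence")
    left_arm = genome[start - arm_len:start]
    right_arm = genome[end:end + arm_len]
    return left_arm + donor_seq + right_arm


def expected_edit(genome, start, end, donor_seq):
    """Sequence expected after replacing genome[start:end] by the donor sequence."""
    return genome[:start] + donor_seq + genome[end:]


genome = "AAAACCCCGGGGTTTT"  # toy sequence
print(build_donor(genome, 6, 10, "TATA", arm_len=4))  # replacement donor
print(build_donor(genome, 6, 10, "", arm_len=4))      # deletion donor without donor sequence
print(expected_edit(genome, 6, 10, "TATA"))
```

An empty donor sequence corresponds to the deletion case mentioned above, in which the two homology arms are directly adjacent in the donor.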
In one embodiment, the protein according to the invention is provided into a eukaryotic cell in combination with a donor nucleic acid for editing endogenous DNA using HDR. In an alternative embodiment, the protein according to the invention is provided into the cell without a donor nucleic acid. In this embodiment, the protein according to the invention may generate a deletion of at least one nucleotide in the endogenous DNA in the direct vicinity of the double strand break that was generated at the target site. Preferably, this embodiment generates deletions of two or more sequential nucleotides in the direct vicinity of the double strand break that was generated at the target site. Such an embodiment is particularly useful for deleting one or more nucleotides in a random manner in a non-coding region of the endogenous DNA.
The donor nucleic acid can be introduced into the cell in many different ways that are generally known to the skilled person. Depending on the delivery method, the donor nucleic acid may be introduced into the cell as single stranded or double stranded DNA or RNA. In the case of plants, the donor nucleic acid may be introduced into the cell through Agrobacterium-mediated transformation. To provide the donor nucleic acid into a plant cell or a cell of a plant, the cell or plant is contacted with a suspension of Agrobacterium cells that carry the donor nucleic acid within the T-DNA of a T-DNA binary plasmid. According to current knowledge, Agrobacterium cells secrete the T-DNA as single-stranded DNA into plant cells, which is, as demonstrated by the Examples below, sufficient for HDR to work and eventually lead to gene replacement (gene targeting) according to the present invention. However, it is also possible that the donor nucleic acid, when secreted into a plant cell as part of the single-stranded T-DNA, is converted into double stranded DNA before HDR takes place. Either way, the provision of the donor nucleic acid in either single-stranded or double stranded form into a plant cell using transformation with Agrobacterium triggers HDR and eventually leads to gene replacement or editing according to the present invention.
As indicated above, the donor nucleic acid may alternatively be provided into the cell in the form of RNA, for example through delivery by an RNA virus or after expression from a transgene. However, if the donor nucleic acid is provided into the cell in the form of RNA, it should be reverse-transcribed into DNA using a reverse transcriptase that may be coexpressed within the cell. Other transformation or transfection methods generally known in the art may be employed. For example, it is also possible to deliver the donor nucleic acid into a eukaryotic cell, notably into an animal cell, by electroporation, chemical transfer using transfection reagents, or by gene bombardment.
The donor nucleic acid may be linear single stranded DNA. Preferably, however, the donor nucleic acid is double stranded DNA, more preferably linear double stranded DNA. In any event, the donor nucleic acid comprises first and second homology arms as described above. The donor nucleic acid may optionally comprise a donor sequence that is positioned in between the two homology arms. The first homology arm may be located at the 5’ end of the donor nucleic acid and comprises a nucleotide sequence that shares preferably at least 95% sequence identity to a fragment of the endogenous DNA that lies upstream of the target site. The second homology arm at the 3’ end of the donor nucleic acid comprises a nucleotide sequence that shares at least 95% sequence identity to a fragment of the endogenous DNA that lies downstream of the target site. In a preferred embodiment, the homology arms of the donor nucleic acid show no mismatch to the endogenous DNA. In another preferred embodiment the homology arms have a perfect match to the endogenous DNA at least towards the 5’-end and the 3’-end of the donor nucleic acid molecule. If a homology arm shares at least 95% sequence identity with a segment of the endogenous DNA, the nucleotide sequence of the homology arm is at least 20 bp, preferably at least 60 bp, and most preferably at least 120 bp long and the sequence stretches with highest identity should be oriented towards the 5’-end and the 3’-end of the donor nucleic acid molecule.
Each homology arm should have a minimum length of 20 bp and may be up to 1000 bp long, preferably up to 500 bp long and more preferably up to 250 bp long. Preferably, the length of the homology arms is longer than 50 bp. In a specific embodiment, each of the two homology arms is between 100 and 300 bp long and allows perfect pairing to the endogenous DNA without a mismatch.
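The length and identity criteria stated above for the homology arms can be illustrated with a minimal Python sketch. It is not part of the described method: the function names `percent_identity` and `check_homology_arm` are hypothetical, and the sketch assumes an ungapped, position-by-position comparison of an arm with an equally long fragment of the endogenous DNA, which is a simplification of how sequence identity is usually determined.

```python
def percent_identity(arm: str, target: str) -> float:
    """Ungapped percent identity between two equal-length sequences.

    Assumption: no alignment gaps are considered; positions are compared 1:1.
    """
    if len(arm) != len(target):
        raise ValueError("sequences must have equal length for ungapped comparison")
    if not arm:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)


def check_homology_arm(arm: str, target: str,
                       min_len: int = 20, max_len: int = 1000,
                       min_identity: float = 95.0) -> dict:
    """Check one homology arm against the ranges given in the text.

    Defaults mirror the stated values: 20 to 1000 bp and at least 95% identity.
    """
    identity = percent_identity(arm, target)
    return {
        "length": len(arm),
        "identity": identity,
        "length_ok": min_len <= len(arm) <= max_len,
        "identity_ok": identity >= min_identity,
        "perfect_match": identity == 100.0,
    }


if __name__ == "__main__":
    arm = "A" * 120                     # hypothetical 120 bp homology arm
    genomic = "A" * 114 + "C" * 6       # hypothetical genomic fragment, 6 mismatches
    print(check_homology_arm(arm, genomic))
```

In this sketch, a 120 bp arm with six mismatches sits exactly at the 95% threshold, whereas an arm shorter than 20 bp would fail the length check regardless of identity.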
The donor sequence (may also be referred to herein as donor nucleotide sequence) that is optionally present in the donor nucleic acid is a nucleotide sequence that may be inserted into the endogenous DNA at the target site. If no donor sequence is present in the donor nucleic acid, no nucleotide sequence will be integrated from the donor nucleic acid into the endogenous DNA. However, a sequence segment may be deleted from the endogenous DNA, for example a sequence segment that lies between the two homology arms after the donor nucleic acid has annealed to the endogenous DNA.
The donor sequence, if present, comprises at least one nucleotide, preferably at least 10 nucleotides, more preferably at least 30 nucleotides, even more preferably at least 100 nucleotides. The maximum length of the donor sequence is not particularly limited. The donor sequence may be up to 15,000 nucleotides long. In another embodiment, the donor sequence may also be 20,000 nucleotides (20 kb) long. However, the insertion of long donor sequences into the endogenous DNA at the target site may be less likely than for shorter donor sequences. In an embodiment, the donor sequence of the repair template is up to 10,000 nucleotides long, preferably up to 7,000 nucleotides long, and more preferably up to 3000 nucleotides long.
The inventors have found that the protein according to the invention allows the insertion of unexpectedly long donor nucleotide sequences comprised in the donor nucleic acid into the endogenous DNA through HDR. The protein according to the invention allows increasing the frequency of gene replacement events, preferably by several orders of magnitude compared to the absence of the exonuclease, so that unexpectedly long nucleotide sequences can be inserted into the endogenous DNA. The donor sequence may be or may contain one or more open reading frames (ORFs) or an entire gene to be inserted at the target site in the endogenous DNA.
Methods of modifying or editing endogenous DNA
The protein of the invention can be used for modifying (also “editing”) endogenous DNA of a eukaryotic cell at a target site of endogenous DNA. The term “editing endogenous DNA” refers to modifications of the endogenous DNA at a target site. The underlying mechanism of the modifications of the invention is believed to be homology directed repair (HDR). The modifications in the endogenous DNA depend on the donor nucleic acid and are selected from the group consisting of insertions into endogenous DNA, deletions from endogenous DNA, and substitutions in endogenous DNA. Here, insertion means that at least one nucleotide from the donor nucleic acid is inserted into the endogenous DNA at the target site. Substitution means that at least one nucleotide (preferably a segment of two or more nucleotides) of the endogenous DNA at the target site is replaced at the target site with at least one different nucleotide or with a segment of two or more nucleotides from the donor nucleic acid. Deletion means that at least one nucleotide (preferably a segment of two or more nucleotides) is deleted from the endogenous DNA at the target site. The insertion, substitution or deletion of nucleotides in the endogenous DNA is achieved through appropriate design of the donor nucleic acid.
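How the design of the donor nucleic acid determines whether an insertion, a substitution or a deletion results can be pictured with a simplified string model in Python. This is only a sketch under stated assumptions: the function name `apply_donor` is hypothetical, both homology arms are assumed to match the endogenous DNA perfectly and uniquely, and the biological HDR process is reduced to replacing whatever lies between the two arms with the donor sequence.

```python
def apply_donor(endogenous: str, left_arm: str, right_arm: str,
                donor_seq: str = "") -> tuple[str, str]:
    """Model the outcome of HDR as a string replacement.

    Assumption: the segment of endogenous DNA between the two homology arms
    is replaced by donor_seq (which may be empty).
    Returns the edited sequence and the kind of modification.
    """
    start = endogenous.find(left_arm)
    if start == -1:
        raise ValueError("left homology arm not found in endogenous DNA")
    left_end = start + len(left_arm)
    right_start = endogenous.find(right_arm, left_end)
    if right_start == -1:
        raise ValueError("right homology arm not found downstream of left arm")

    replaced = endogenous[left_end:right_start]
    edited = endogenous[:left_end] + donor_seq + endogenous[right_start:]

    if donor_seq == replaced:
        kind = "no change"
    elif not replaced:
        kind = "insertion"      # arms are adjacent, donor sequence added
    elif not donor_seq:
        kind = "deletion"       # no donor sequence, segment between arms lost
    else:
        kind = "substitution"   # segment between arms exchanged for donor
    return edited, kind


if __name__ == "__main__":
    # Hypothetical toy sequences; real homology arms are at least 20 bp long.
    genome = "AAAACCCC" + "GG" + "TTTTGGGG"
    print(apply_donor(genome, "AAAACCCC", "TTTTGGGG"))          # deletion
    print(apply_donor(genome, "AAAACCCC", "TTTTGGGG", "ACA"))   # substitution
```

The toy arms of 8 nucleotides are used only to keep the example readable.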
The endogenous DNA of a eukaryotic cell may be the genomic DNA of the cell, but any double stranded DNA molecule (such as mitochondrial, plastid or other) that is contained within the cell can be edited. The endogenous DNA may be edited at a single target site or at two or more target sites simultaneously. The editing of the endogenous DNA according to the present invention may also be referred to as “gene replacement” or “gene targeting”. Gene replacement according to the present invention comprises the editing of endogenous DNA as defined above. In particular, gene replacement also comprises the generation of targeted gene knock-outs through the targeted deletion of one or more nucleotide sequence stretches or single base pairs in the endogenous DNA of a eukaryotic cell. This needs to be distinguished from non-target gene knock-outs where the nature of the mutation cannot be predicted in advance.
The protein of the invention allows increasing the frequency of double strand break repair through the homology directed repair (HDR) pathway. The invention also allows increasing the number of gene replacement events in a eukaryotic cell, presumably because DNA break repair through HDR is a prerequisite for gene replacement. The protein of the invention allows achieving a higher number (i.e. an increased frequency) of gene replacement events when provided with a donor nucleic acid in comparison to the separate action of the exonuclease and the endonuclease not being fused together or without forming a protein complex. In a preferred embodiment, the protein of the invention increases the number of gene replacement events at least 1.5-fold, more preferably at least 3-fold and even more preferably at least 5-fold over the separate provision of the exonuclease and the endonuclease (i.e. without being fused together or without forming a protein complex when provided with a donor nucleic acid). Further, the protein of the invention allows increasing the number of gene replacement events at least 1.5-fold, more preferably at least 2-fold and even more preferably at least 3-fold over fusion proteins or protein complexes comprising the T5 exonuclease as an exonuclease. The protein of the invention also allows increasing the number of gene replacement events at least 1.5-fold, more preferably at least 2-fold and even more preferably at least 3-fold over fusion proteins or protein complexes comprising a multimeric exonuclease.
The inventors have found that 5’-3’ exonucleases that are monomeric and show high in vitro 5’-3’ exonuclease activity are particularly suited to significantly increase the frequency of HDR and eventually the frequency of gene replacement events when they are contained in a fusion protein or a protein complex with a site-specific endonuclease and are provided with a donor nucleic acid.
The editing of endogenous DNA according to the invention takes place in eukaryotic cells. Preferably, it takes place in at least one cell of a eukaryotic organism. Preferably, however, the editing takes place in multiple cells of a eukaryotic organism, such as in multiple cells of one or more leaves of a plant. The editing generally takes place in two or more cells in parallel.
The editing of endogenous DNA in a eukaryotic cell according to the invention requires that the protein of the invention and a donor nucleic acid are provided into the same cell. Eukaryotic cells generally comprise a functional HDR pathway naturally, so that no genetic engineering to provide a cell with the HDR pathway is generally necessary. Optionally, further components that modify and/or improve the frequency and/or efficiency of DNA break repair through homologous recombination (HDR) may be provided into the cell. For example, components like the proteins Rad51 and/or Rad52 may be provided into the cell to support the DNA break repair through HDR. Alternatively, components may be provided into the cell that downregulate the NHEJ pathway to favor DNA break repair through HDR.
After successful editing in a eukaryotic cell, it is possible to obtain an organism from the edited eukaryotic cell. For example, after editing an embryonic animal cell, an animal containing the edited endogenous DNA in all cells may be obtained. After editing a germ cell of an animal, the edited germ cell may be used for fertilizing another germ cell or the edited germ cell may be fertilized by another germ cell for obtaining an embryonic cell containing the edited endogenous DNA. In a further alternative, somatic animal cells comprising the edited endogenous DNA may be produced and propagated. Such cells may, for example, express a protein or other factor that is not expressed in suitable form in the starting cell. If the edited cells are administered to an organism lacking the protein or factor, the organism may be provided with the missing protein or factor. Accordingly, the invention provides a method of treating or preventing a genetic defect in a eukaryotic organism (e.g. a human being or an animal), comprising modifying endogenous DNA of a eukaryotic cell at a target site of the endogenous DNA (according to the invention), cultivating the modified cell and/or progeny cells of the modified cell, and administering the modified cell or cells to a eukaryotic organism in need thereof.
After successful editing in a eukaryotic plant cell, it is possible to regenerate plants from the cell, whereby the plants contain the edited endogenous DNA in all cells of the plant. This allows producing new plant lines (e.g. of crop plants) containing the edited endogenous DNA. Methods of regenerating plants from cells or tissue are generally known in the art of plant biotechnology. For example, plants may be regenerated from callus tissue using suitable media, as described e.g. in text books on plant biotechnology such as Slater, Scott and Fowler, Plant Biotechnology, second edition, Oxford University Press, 2008. Alternatively, the editing may be carried out in germline cells of floral tissue, e.g. using the floral dip transformation method described for example by Clough and Bent (The Plant Journal (1998) 16(6), 735-743). Seeds produced from such flowers may contain the edited endogenous DNA and can be selected and, if desired, be further bred to produce a plant line containing the edited endogenous DNA.
Eukaryotic cells and organisms editable using the present invention
The eukaryotic cell that may be edited according to the invention may be a fungal (e.g. yeast) cell, a plant cell, or an animal cell, such as a human cell. Preferably, the eukaryotic cell is a plant cell. Accordingly, the organism of the invention may be a plant or animal organism. In one embodiment, humans are excluded and/or methods of modifying the human germline are excluded from the invention. In one embodiment, the endogenous DNA of a single cell of a eukaryotic organism is edited. In another embodiment, the endogenous DNA of several cells of a eukaryotic organism (notably of a plant) is edited. In a further embodiment, the endogenous DNA of a somatic cell of a eukaryotic organism (notably of a plant) is edited. In another embodiment, the endogenous DNA of germline cells is edited so that the edited endogenous DNA is inherited by the progeny. In a further embodiment, the endogenous DNA of embryonic cells is edited so that the edited endogenous DNA is present in all cells of the organism that develops from the edited embryonic cells.
Among animal cells, human cells may be edited according to the invention. However, cells of livestock animals are preferred.
Among plants, the plant or cells thereof wherein editing according to the invention may be carried out is not particularly limited. The invention can be applied to both monocot and dicot plants. The plant species for practicing this invention include, but are not restricted to, representatives of Leguminosae, Solanaceae, Chenopodiaceae, Compositae, Cucurbitaceae, Brassicaceae, and Scrophulariaceae among dicotyledons, and Poaceae, Musaceae, and Zingiberaceae among monocots. Both crop and non-crop plants can be used, whereby crop plants are preferred. Common crop plants that are preferably edited with the protein according to the present invention include alfalfa, barley, beans, canola, cowpeas, cotton, corn, clover, lotus, lentils, lupine, maize, millet, oats, peas, peanuts, poplar, rice, rye, sweet clover, sunflower, sweetpea, soybean, sorghum, triticale, yam beans, velvet beans, vetch, wheat, wisteria, potato, banana, coffee, cacao, sugar beet, and nut plants.
Performing a method of the invention with such plants or cells of such plants allows producing cells of these plants containing the edited endogenous DNA. As mentioned above, plants containing the edited endogenous DNA in all cells of the plants may be produced therefrom.
Nucleic acids, DNA constructs, plasmids and vectors of the invention
The present invention also provides a nucleic acid molecule comprising a polynucleotide encoding the protein of the invention. This nucleic acid molecule is also referred to herein as first nucleic acid molecule. The polynucleotide is understood to be the coding sequence of the protein of the invention, such as the fusion protein of the invention or one subunit of the protein of the invention. If the protein of the invention is an oligomeric protein, the nucleic acid molecule may comprise two coding sequences, a first coding sequence encoding the first subunit and a second coding sequence encoding the second subunit. Alternatively, the first and the second subunit of an oligomeric protein, e.g. dimeric protein, may be encoded on separate nucleic acid molecules.
The invention also provides a first DNA construct comprising the polynucleotide(s) encoding the protein of the invention. The invention also provides a plasmid or vector comprising the DNA construct. The nucleic acid molecule and the DNA construct may contain additional genetic elements, such as genetic elements for expressing the protein in a eukaryotic cell. Examples of such genetic elements are a promoter active in the cell or organism and operably linked to the polynucleotide, optional transcription enhancers, and/or transcription terminators. For Agrobacterium-mediated transformation, left and right T-DNA border sequences may also be such genetic elements. The first nucleic acid molecule, plasmid or vector may further comprise further nucleic acid segments, such as plasmid or vector backbone. The nucleic acid molecule, DNA construct, plasmid or vector may be single stranded or double stranded and may be circular or linear; preferably they are double-stranded and circular.
The first nucleic acid molecule may, in one embodiment, further comprise the donor nucleic acid. The first nucleic acid molecule may, in another embodiment, further encode the gRNA for a CRISPR nuclease as the endonuclease of the invention. In a still further embodiment, the first nucleic acid molecule comprises the donor nucleic acid and encodes the gRNA for a CRISPR nuclease. After transfection into the eukaryotic cell or cells, the donor nucleic acid may be cut out from the nucleic acid molecule, e.g. using the CRISPR nuclease and additional gRNAs that guide the CRISPR nuclease to suitable cleavage sites on the nucleic acid molecule for cutting out the donor nucleic acid.
The first nucleic acid molecule, DNA construct, plasmid or vector of the invention may be generated by cloning together the elements to be combined. A convenient cloning method that was also used in the Examples is Golden Gate (GG) cloning that makes use of type IIS restriction enzymes for restriction and seamless ligation, cf. WO 2008/095927 and WO2011154147.
The invention further describes a second nucleic acid molecule. This second nucleic acid molecule is or comprises the donor nucleic acid of the invention as described above. The invention also provides a second DNA construct that comprises the donor nucleic acid and optionally further elements such as left and right T-DNA borders for Agrobacterium-mediated transformation. The second nucleic acid molecule may comprise further nucleic acid segments such as a plasmid or vector backbone. The skilled person understands that the donor nucleic acid may be comprised by the first nucleic acid molecule or by the second nucleic acid molecule. Where a second nucleic acid molecule is used for the donor nucleic acid, it may further encode the gRNA for a CRISPR nuclease as the endonuclease of the invention.
The invention describes a third nucleic acid molecule comprising or encoding a gRNA, such as a sgRNA. If the third nucleic acid molecule is RNA, it comprises the gRNA. If it is DNA, the third nucleic acid molecule comprises a polynucleotide encoding the gRNA. For transcription in eukaryotic cells, the third nucleic acid molecule, if DNA, may comprise a (third) DNA construct containing a promoter operably linked to the polynucleotide encoding the gRNA. For transfection in plant cells using Agrobacterium-mediated transfection, the third DNA construct may further contain left and right T-DNA borders. As indicated above, additional gRNAs may be present or encoded on the third nucleic acid molecule, e.g. for cutting out the donor nucleic acid from the same or another nucleic acid molecule.
Promoters for expression in eukaryotic cells are generally known. For expression in plant cells and plants, promoters active in plant cells are used. The term "promoter active in plant cells" means a DNA sequence that is capable of controlling (initiating) transcription in a plant cell. This includes any promoter of plant origin, but also any promoter of non-plant origin which is capable of directing transcription in a plant cell, i.e., certain promoters of viral or bacterial origin such as the cauliflower mosaic virus 35S promoter (CaMV35S promoter) (Harpster et al. (1988) Mol Gen Genet. 212(1):182-90), the subterranean clover virus promoter No 4 or No 7 (WO9606932), or T-DNA gene promoters but also cell cycle specific (Ferreira et al., (1994) Plant Cell 6: 1763-1774), tissue-specific or organ-specific promoters including but not limited to seed-specific promoters (e.g., WO89/03887), egg-cell specific promoter (Steffen et al., (2007) Plant J. 51:281-292; Sprunck et al., (2012) Science 338:1093-1097), organ-primordia specific promoters (An et al. (1996) Plant Cell 8(1): 15-30), stem-specific promoters (Keller et al., (1988) EMBO J. 7(12): 3625-3633), leaf specific promoters (Hudspeth et al. (1989) Plant Mol Biol. 12: 579-589), mesophyll-specific promoters (such as the light-inducible Rubisco promoters), root-specific promoters (Keller et al. (1989) Genes Dev. 3: 1639-1646), tuber-specific promoters (Keil et al. (1989) EMBO J. 8(5): 1323-1330), vascular tissue specific promoters (Peleman et al. (1989) Gene 84: 359-369), stamen-selective promoters (WO 89/10396, WO 92/13956), dehiscence zone specific promoters (WO 97/13865) and the like. For transient expression, constitutive promoters, i.e. promoters that are not developmentally regulated, are preferably used. However, constitutive promoters may be tissue-specific or organ-specific. Preferred promoters are those used in the Examples described below.
Delivery of the protein of the invention, the donor DNA, and/or the gRNA into a eukaryotic cell
The editing of the endogenous DNA according to the present invention requires that the protein, the donor nucleic acid, and, in the case of a CRISPR nuclease as the endonuclease, the gRNA or gRNAs are simultaneously present in the same cell. These elements are also referred to herein as components of the invention. This means that the components of the invention should be present in the same cell at the same time and, thus, may be provided to the cells in parallel or consecutively. The components may be provided to eukaryotic cells, cells of an organism, or the organism transiently or stably. Transient means that incorporation of nucleic acid molecule(s), or parts thereof, encoding or comprising the components into the genome of the eukaryotic cell is very unlikely and generally does not occur (e.g. because no selection pressure for incorporating the nucleic acid molecules into the genome of the eukaryotic cell or organism is applied). For example, a DNA plasmid as the nucleic acid molecule of the invention may comprise or encode a component(s) of the invention (where required operably linked to a promoter so that the component(s) can be expressed inside the cell from the DNA plasmid).
Stably providing the components to eukaryotic cells, cells of an organism, or the organism means that nucleic acid molecule(s), or parts thereof, encoding or comprising the component(s) are incorporated into the genome of the eukaryotic cell or organism (e.g. by application of selection pressure and selection of cells or organism wherein the incorporation has taken place or using Agrobacterium-mediated transformation). Agrobacterium-mediated transformation generally integrates the T-DNA comprising or encoding the component(s) into the genome of the cell. The genome then comprises or encodes the components comprised or encoded in the T-DNA so that the components can be expressed by a promoter operably linked to them or cut out (e.g. in the case of the donor nucleic acid). If somatic plant cells or cells of a plant are transformed by Agrobacterium, the components are generally not passed on to the daughter plants of the transformed plant. If germline plant cells or cells of a plant are transformed by Agrobacterium, the components can be passed on to the daughter cells or organisms, whereby the components or coding sequence encoding the components are stably integrated into the genome over subsequent generations.
It is not necessary that all components of the invention are provided to the eukaryotic cell or organism in the same way, i.e. all stably or all transiently. Instead, it is possible to provide one or more components stably and to provide one or more other components transiently. For example, a first nucleic acid molecule encoding the protein of interest may be provided stably to the eukaryotic cell or organism. Cells containing and expressing the protein of the invention stably may then be provided with the second nucleic acid molecule comprising the donor nucleic acid and optionally the third nucleic acid molecule encoding one or more gRNA(s) transiently.
Providing components of the invention transiently to the eukaryotic cells, cells of an organism or the organism is advantageous, as it is generally desired to limit genetic modifications to the specific desired editing event. However, as indicated above, it is possible to incorporate one or more of the components of the invention stably into the eukaryotic cells, cells of an organism or the organism such that they can be expressed and provide other components transiently. Any undesired stable genetic modifications may then be removed from transgenic plants by segregation and out-crossing.
Several possibilities exist to provide the eukaryotic cell with the components of the invention for practicing the methods of the invention, notably transiently. The components may, for example, be injected into or be taken up by (for example upon electroporation or PEG mediated transformation) the eukaryotic cell as a mixed solution of the protein of the invention, the donor DNA (or a DNA comprising the donor DNA), and the gRNA. These methods can be practiced both with animal and with plant cells. However, in general, the fusion protein and the gRNA are provided to the cells via genetic transformation or transient transfection of nucleic acid molecules encoding these components such that they are expressed in the transformed or transfected eukaryotic cells, and the donor DNA will also be transformed or transfected into the cells.
Methods for introducing the nucleic acid molecule(s) into animal cells are known to the skilled person, such as electroporation, microinjection, or using transfection agents (e.g. as described in WO2014056590 or WO2014053245). These methods are particularly suitable for transiently providing the components to the cells.
Also, various methods for introducing the DNA molecule(s) into plant cells, cells of a plant or a plant are known, and examples are electroporation, PEG (polyethylene glycol) transformation, microinjection, particle bombardment, and the use of viral vectors. Again, these methods are particularly suitable for transiently providing the components to plant cells. However, the preferred method of introducing the DNA molecules of the invention into plant cells or cells of a plant is Agrobacterium-mediated transformation. Agrobacterium-mediated transformation is well-established in the field of plant biotechnology, e.g. from text books on plant biotechnology such as Slater, Scott and Fowler, Plant Biotechnology, second edition, Oxford University Press, 2008. It comprises contacting living plant tissue (e.g. leaves or floral tissue) with a suspension of Agrobacterium cells containing a Ti-plasmid and a binary vector comprising T-DNA. Entry of the Agrobacterium cells into the plant tissue may be facilitated by a pressure difference (e.g. using a needleless syringe) or sucking (e.g. vacuum infiltration). Alternatively, plant tissue may be sprayed with a suspension containing Agrobacterium cells and optionally an abrasive and a surfactant, e.g. as described in WO2012019660.
For Agrobacterium-mediated transfection, the first nucleic acid molecule of the invention may be a plasmid (such as binary vector) containing in its T-DNA the first DNA construct encoding the protein of the invention. The second nucleic acid molecule of the invention may be a plasmid (such as binary vector) containing in its T-DNA the donor nucleic acid of the invention. The third nucleic acid molecule of the invention may be a plasmid (such as binary vector) containing in its T-DNA a DNA construct encoding the one or more gRNAs of the invention. However, it is possible to produce a binary vector that contains in its T-DNA more than one of said nucleic acids in a single molecule, such as the first and the second nucleic acid. Optionally, such plasmid may additionally contain the third nucleic acid encoding the gRNA(s) in its T-DNA. If two or more nucleic acid molecules are used, each type of nucleic acid molecules may be separately introduced as a binary vector in Agrobacterium and cultured. For transformation or transfection, the one, two or more Agrobacterium cultures, each containing one binary vector may be mixed and the mixture may be used for transforming plant cells or cells of a plant.
The Agrobacterium may belong to the species Agrobacterium tumefaciens or Agrobacterium rhizogenes that are commonly used for plant transformation and transfection and which are known to the skilled person from general knowledge. The Agrobacterium strain to be used in the processes of the invention may comprise a nucleic acid molecule (Ti-plasmid or binary vector) that may be said first, second or third nucleic acid molecule, or a nucleic acid molecule comprising two or more of the nucleic acids of the invention. The DNA construct(s) is/are typically present in T-DNA of the plasmid or binary vector for introduction of the nucleic acid construct into plant cells by the secretory system of Agrobacterium. On at least one side or on both sides, the nucleic acid construct(s) is/are flanked by a T-DNA border sequence for allowing transfection of said plant(s) and introduction into plant cells or cells of a plant. Preferably, said DNA construct(s) is/are present in T-DNA and flanked on both sides by T-DNA border sequences. Herein, the term “DNA construct” means a recombinant construct comprising or encoding one or more components of the invention.
The DNA constructs may be present in the T-DNA of a Ti-plasmid or binary vector of the Agrobacterium strain. Ti-plasmids or binary vectors may contain a selectable marker outside of said T-DNA for allowing cloning and genetic engineering in bacteria. However, the T-DNA that is transferred into plant cells may not contain a selectable marker that would, if present, allow selection of plant or plant cells containing said T-DNA. Examples of selectable marker genes that should, in this embodiment, not be present in T-DNA of the Ti-plasmid or binary vectors are an antibiotic resistance gene or a herbicide resistance gene. The process of the invention preferably makes use of transient transfection. In this embodiment, the process of the invention does not comprise a step of selecting for plant cells or plants having incorporated the nucleic acid molecule(s) of the invention by using such antibiotic resistance gene or a herbicide resistance gene. Accordingly, in this embodiment, no antibiotic resistance gene or a herbicide resistance gene needs to be incorporated into the plant cells or plants. However, it is possible to use suitable markers for selecting or identifying the editing event and cells wherein the editing has occurred.
Agrobacterium-mediated gene transfer and vectors therefor are known to the skilled person, e.g. from the references cited herein or from textbooks on plant biotechnology such as Slater, Scott and Fowler, Plant Biotechnology, second edition, Oxford University Press, 2008. Agrobacterium strains usable in the invention are those that are generally used in the art for transfecting or transforming plants. Generally, binary vector systems and binary strains are used, i.e. the vir genes required for transfer of T-DNA into plant cells on the one hand and the T-DNA on the other hand are on separate plasmids. Examples of usable Agrobacterium strains are given in the article of Hellens et al., Trends in Plant Science 5 (2000) 446-451 on binary Agrobacterium strains and vector systems. In the context of a binary Agrobacterium strain, the plasmid containing the vir genes is referred to as “vir plasmid” or “vir helper plasmid”. The plasmid containing the T-DNA to be transfected is the so-called binary vector that may be a “DNA molecule” or “vector” of the invention.
Accordingly, the invention also provides an Agrobacterium cell containing the first nucleic acid molecule of the invention. Notably, the invention provides an Agrobacterium cell comprising a plasmid comprising in the T-DNA the first DNA construct containing the polynucleotide encoding the fusion protein.
Co-transfection by Agrobacterium can be achieved by preparing two or more different Agrobacterium cultures, a first one that contains a first nucleic acid molecule (Ti plasmid or binary vector), construct or vector encoding the fusion protein and a second Agrobacterium culture containing the second nucleic acid molecule. A third Agrobacterium culture containing the third nucleic acid molecule may also be prepared. Suspensions of these Agrobacterium cultures may be separately grown and mixed prior to transfection. The suspension of agrobacteria may be produced as follows. A nucleic acid molecule or vector may be transformed into the Agrobacterium strain and transformed Agrobacterium cultures may be grown preferably under application of selective pressure for maintenance of the nucleic acid molecule in question. In one method, the Agrobacterium strain to be used in the processes of the invention is then inoculated into a culture medium and grown to a high cell concentration. Agrobacteria are generally grown up to a cell concentration corresponding to an OD at 600 nm of at least 1, typically of about 1.5. Such highly concentrated agrobacterial suspensions are then diluted to achieve the desired cell concentration. For diluting the highly concentrated agrobacterial suspensions, water or Agrobacterium infiltration medium may be used. The water may contain a buffer or salts. The water may further contain the surfactant or wetting agent. Alternatively, the concentrated agrobacterial suspensions may be diluted with water, and any additives such as the surfactant and the optional buffer substances are added after or during the dilution process. Separately produced suspensions for co-transfection may then be mixed and the mixed suspension be used for transfecting plant cells or cells of a plant.
If plant cells in cell culture are to be transfected, an Agrobacterium suspension may be added to the plant cell culture. If selected parts of a plant such as plant leaves are to be transfected, the generally known agroinfiltration may be used, whereby a pressure difference is used to insert the Agrobacterium suspension into plant tissue. For example, a needle-less syringe containing the Agrobacterium suspension may be used to press an Agrobacterium suspension into plant tissue. In another agroinfiltration method, an entire plant or major parts of a plant is dipped upside down into an Agrobacterium suspension, a vacuum is applied and then quickly released, whereby an Agrobacterium suspension is inserted into plant tissue.
Kit for editing endogenous DNA
The invention also comprises a kit of parts for editing endogenous DNA at a target site in a eukaryotic cell. The kit comprises at least two parts: A donor nucleic acid, donor construct or second nucleic acid molecule as described in the section “Donor nucleic acid” and a protein of the invention as described in the section “The protein and the fusion protein of the invention”. The kit comprises the protein of the invention in the form of expressed protein or as nucleic acid molecule comprising a polynucleotide encoding the protein (e.g. fusion protein). The kit may further comprise a eukaryotic cell. When the donor nucleic acid and the protein comprised in the kit of the invention are provided into the same eukaryotic cell, the kit allows editing of the endogenous DNA of the eukaryotic cell as described herein.
If necessary, the kit may also comprise one or more gRNA(s) or one or more nucleic acid molecules encoding the one or more gRNA(s) that bind to the site-specific endonuclease portion of the protein.
The invention also provides a kit comprising a nucleic acid molecule comprising a polynucleotide encoding said first protein subunit described above and a nucleic acid molecule comprising a polynucleotide encoding said second protein subunit described above.
In addition, the invention also provides a kit of parts for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, wherein the kit comprises a fusion protein comprising the UL12 exonuclease and a donor nucleic acid. In this particular embodiment, the kit comprises a donor nucleic acid according to the invention or a donor construct comprising said donor nucleic acid according to the invention and a fusion protein for editing endogenous DNA, comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker, and wherein the polypeptide linker has a length of 25 amino acids or more, preferably 30 amino acids or more, more preferably 40 amino acids or more, even more preferably 50 amino acids or more and most preferably 60 amino acids or more, and wherein the site-specific endonuclease is preferably as defined above herein, and wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises (i) the amino acid sequence defined in SEQ ID NO: 32 (UL12-1) or SEQ ID NO: 33 (UL12-2), or (ii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence identity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or (iii) an amino acid sequence that has at least 80%, preferably at least 85%, more preferably at least 90%, even more preferably at least 95% and most preferably at least 98% sequence similarity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or (iv) an amino acid sequence having from 1 to 120, preferably from 1 to 93, more preferably from 1 to 62, even more preferably from 1 to 46 and most preferably from 1 to 31 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33.
Method for modifying endogenous DNA of a eukaryotic cell
The invention also comprises methods for editing endogenous DNA at a target site of the endogenous DNA. One method relates to the insertion of a nucleotide sequence of interest into endogenous DNA of a eukaryotic cell at a target site and/or a deletion of a nucleotide sequence segment at the target site. Another method relates to the modification of the endogenous DNA of a eukaryotic cell at a target site. All these methods comprise the provision of a donor nucleic acid and a protein according to the invention into a eukaryotic cell or organism. The endonuclease of the protein (e.g. fusion protein) then cleaves the DNA at the target site and the exonuclease of the protein processes the cleaved DNA to generate 3’ overhangs, which can invade the homology arms of the donor nucleic acid to induce DNA modification through homology directed repair (HDR). The DNA modification may lead to the insertion of the insertion sequence carried within the donor nucleic acid into the endogenous DNA. Thus, depending on the design of the donor nucleic acid, the method of the invention allows the insertion, substitution and deletion of one or more base pairs in the endogenous DNA of the eukaryotic host cell. The protein of the invention and the donor nucleic acid as well as the principles and processes for the insertion of the insertion sequence of the repair template into the endogenous DNA are described in the sections above.
Method of inserting long nucleotide fragments into the genome
The invention also provides a method of inserting nucleotide fragments of at least 5,000 base pairs into the genome of a eukaryotic cell at a target site.
The method comprises the provision of the protein of the invention and a donor nucleic acid into a eukaryotic cell, wherein the donor nucleic acid comprises an insertion sequence that is at least 5,000 nucleotides long and wherein the eukaryotic cell should harbor a functional HDR pathway. In an embodiment, the donor nucleic acid comprises an insertion sequence of 7,000 nucleotides and the 7,000 nucleotides of the insertion sequence may be inserted into the endogenous DNA of a eukaryotic cell at a target site. In another embodiment, the insertion sequence is 15,000 nucleotides long and the insertion sequence of 15,000 nucleotides is inserted into the endogenous DNA at a target site. In a further embodiment, the insertion sequence is 20,000 nucleotides long and the insertion sequence of 20,000 nucleotides is inserted into the endogenous DNA at a target site. There is no upper limit for the length of the insert; however, the frequency of successful insertion events may decrease with the length of the insertion sequence.
The present invention also provides a method for modifying endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site that does not depend on HDR and a donor nucleic acid/donor sequence. This method is particularly suited to induce random deletions of one or more sequential nucleotides in the endogenous DNA at the target site, for example in non-coding regions or cis-active elements. In an embodiment, this method comprises providing a cell or an organism with: a protein according to the invention, or a nucleic acid molecule according to the invention, or a nucleic acid construct, plasmid or vector according to the invention, or a kit according to the invention. As a result, the method generates modifications to the endogenous DNA, wherein the modifications involve deleting a sequence of two or more nucleotides in the endogenous DNA upstream and/or downstream of the target site in a random manner.
EXAMPLES
Example 1: Donor construct design for translational GUS fusion to Nicotiana benthamiana Phosphoglycerate kinase NbPGK (Niben101Scf05688g08010.1)
First, we identified putative single guide RNA (sgRNA) target sites in the genomic sequence of NbPGK close to the stop codon in a PAM-in orientation (via CRISPOR (http://crispor.tefor.net/) and CRISPR-P v2.0 (http://crispr.hzau.edu.cn/CRISPR2/); Fig. 1). For the construction of the Donor construct, we amplified and cloned approx. 1 kb upstream and downstream of the stop codon as 5' and 3' homology arms (HAs), respectively. The sgRNA target sequences were mutated in the Donor fragment to prevent Cas9-mediated cleavage of the Donor (before and/or after integration). Amplified fragments were domesticated for Golden Gate (GG) cloning by removing internal BsaI or BpiI sites using corresponding oligonucleotides. In the Donor (5' HA), the stop codon of NbPGK was deleted to allow translational read-through, leading to a translational NbPGK-GUS fusion. Upstream and downstream of the donor fragment, an orthogonal target site for sgRNA_Csy4-3 was integrated in a PAM-out orientation for cleavage, processing and release of the donor fragment. Donor fragments were assembled via GG-cloning and delivered as T-DNAs via Agrobacterium tumefaciens (strain GV3101 pMP90)-mediated transient expression in leaves of N. benthamiana (Nb).
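The domestication step described above (removal of internal BsaI or BpiI sites before Golden Gate cloning) presupposes that such sites are first located in the amplified fragment. The following is a minimal sketch of such a scan in Python, not the procedure actually used in this example. The recognition sequences GGTCTC (BsaI) and GAAGAC (BpiI) are the standard ones for these enzymes; the function names and the test sequence are hypothetical and serve illustration only.

```python
# Minimal sketch (assumption: not the applicants' tool) for locating internal
# type IIS recognition sites that would need to be removed ("domesticated")
# before Golden Gate cloning.

# Standard recognition sequences (5'->3') of the two enzymes named in the text.
SITES = {"BsaI": "GGTCTC", "BpiI": "GAAGAC"}


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_type_iis_sites(seq: str):
    """Return (enzyme, strand, 0-based start) for every site on either strand."""
    seq = seq.upper()
    hits = []
    for name, site in SITES.items():
        # A site on the minus strand appears as its reverse complement on the plus strand.
        for motif, strand in ((site, "+"), (revcomp(site), "-")):
            start = seq.find(motif)
            while start != -1:
                hits.append((name, strand, start))
                start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda h: h[2])


if __name__ == "__main__":
    # Hypothetical test fragment, not taken from SEQ ID NO: 1.
    fragment = "AAGGTCTCAAGTCTTCAA"
    for enzyme, strand, pos in find_type_iis_sites(fragment):
        print(f"{enzyme} site on {strand} strand at position {pos}")
```

Each reported position marks a site that would have to be altered, for example by a silent point mutation introduced via the amplification primers, before the fragment can be used as a Golden Gate module.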
The polynucleotide sequence of the Donor fragment (PGK-GUS) is that of the following SEQ ID NO: 1:
In the polynucleotide of SEQ ID NO: 1 as represented above, the formatting has the following meaning:
LB: T-DNA left border; |sgRNA-Csy4-3 (PAM)|: target sequence for cutting by Cas9 outside the donor fragment; 5'HA + 3'HA: 5’ and 3’ homology arms; (sgR-PGK1+2 PAM): sequences of the PGK locus targeted by single-guide RNAs but mutated in the donor; GUS (introns): GUS gene with introns; RB: T-DNA right border
Example 2: Design of SpCas9 exonuclease fusions and sgRNAs
Exonucleases were codon optimized for Nicotiana benthamiana and synthesized as Level -1 GG-modules with matching overhangs (Fig. 2). LF2 linker fragments were amplified via primer extension and then assembled into Level -1 modules, yielding 2xLF2 or 4xLF2 (see sequences below). A 432 bp fragment from the XTEN linker (144 aa) was synthesized as a Level -1 GG-module. Shorter variants of the XTEN linker (XTEN16 and XTEN40) were generated via PCR using the 432 bp XTEN fragment as template and assembled as Level -1 GG-modules. Modules for exonucleases and linkers were assembled together into Level 0 modules via GG-cloning (Fig. 2). The resulting exonuclease-linker Level 0 modules were assembled together with 2x35S(short), Omega translational enhancer, SpCas9i and tOCS terminator as a transcriptional unit into Level 1. Level 1 expression vectors were transformed into Agrobacterium tumefaciens (GV3101 pMP90) and used for transient expression in N. benthamiana leaves.
N-terminal Exonuclease-Cas9 fusions
The general structure of the construct encoding the protein of interest was as follows (Fig. 2A):
RB (T-DNA right border) - GGAG - 2x35Sshort (double Cauliflower mosaic virus 35S promoter, short version) - translational enhancer (Omega translation enhancer from tobacco mosaic virus) - ATG - exonuclease - linker - GGT SpCas9 (version of SpCas9 with introns) / LbCas12a (version of LbCas12a with introns) -> tOCS (OCS transcription terminator) - CGC LB (T-DNA left border).
The sequences - VYXX I- designate the overhangs used in Golden Gate cloning to assemble the different modules.
The polynucleotide sequence of the promoter-enhancer module GGAG_2x35Ss_TACT
The polynucleotide sequence of the Cas9 module for N-terminal fusions AGGT_NLS-SpCas9i-NLS*_GCTT is shown in the following SEQ ID NO: 3 (* means stop codon; NLS stands for nuclear localization signal; sequences in bold letters correspond to the coding sequence; sequences in normal letters correspond to the introns):
The polynucleotide sequence of the Cas12a module for N-terminal fusions AGGT_NLS-LbCas12ai-NLS*_GCTT is shown in the following SEQ ID NO: 57 (* means stop codon; NLS stands for nuclear localization signal; sequences in bold letters correspond to the coding sequence; sequences in normal letters correspond to the introns):
In the polynucleotides of SEQ ID NO: 2, 3 and 17 as represented above, the formatting has the following meaning:
2x35Sshort: double Cauliflower mosaic virus 35S promoter, short version; translational enhancer: Omega translation enhancer from tobacco mosaic virus; NLS-SpCas9(intron)-NLS: version of SpCas9 with introns and with N-terminal and C-terminal nuclear localization sequences (NLS); tOCS: OCS transcription terminator; GG-overhangs: Golden Gate cloning overhangs.
In the assembled Level 1 construct, the sequences of SEQ ID NO:2, 3 and 17 are combined with the fragment referred to as “Exo-linker” comprising the exonuclease and the polypeptide linker in Fig. 2A. The Exo-linker is assembled by GG (Golden Gate) cloning from the Level -1 fragments “Exo” and “Linker” shown in Fig. 2A.
The polynucleotide sequences of the exonuclease and of various linkers are given in the following:
Table 1: Level 1 assemblies of modules for N-terminal tagged SpCas9 (Exonuclease-Linker- SpCas9).
Any given exonuclease-linker combination was assembled first from Level -1 into Level 0. Corresponding exonuclease-linker fusions (Level 0) were combined with the other given Level 0 modules (2x35Ss_Ω enhancer_exonuclease-linker_NLS-SpCas9i-NLS_tOCS) and assembled into a GG-compatible MoClo Level 1 T-DNA vector for Agrobacterium-mediated transient expression in planta (including LB and RB sequences for T-DNA delivery).
C-terminal Cas9-Exonuclease fusions
The general structure of the construct encoding the protein of the invention was as follows (Fig. 2B):
The polynucleotide sequence of the Cas9 module for C-terminal fusions AATG_NLS-SpCas9i-NLS_TTCG is shown in the following SEQ ID NO: 16:
In the polynucleotide of SEQ ID NO: 16, the formatting has the following meaning:
NLS-SpCas9 (intron)-NLS
The polynucleotide sequence of the LbCas12a(D156R) module for C-terminal fusions AATG_NLS-LbCas12ai-NLS_TTCG is shown in the following SEQ ID NO: 62:
In the assembled Level 1 construct, the sequences of SEQ ID NO: 2, 16 and 17 are combined with the fragment referred to as “linker-Exo” comprising the polypeptide linker and the exonuclease in Fig. 2B. The linker-Exo is assembled by GG (Golden Gate) cloning from the Level -1 fragments “Linker” and “Exo” shown in Fig. 2B.
Linkers for C-terminal Exonuclease-Cas9 fusions (frame underlined)
SEQ ID NO: 18: polynucleotide encoding linker TTCG_2xLF2_AATG
SEQ ID NO: 20: polynucleotide encoding linker TTCG_XTEN144_AATG
SEQ ID NO: 25: polynucleotide encoding UL12 exonuclease (AATG_UL12*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
SEQ ID NO: 26: polynucleotide encoding UL12-2 exonuclease (AATG_UL12-2*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
SEQ ID NO: 63: polynucleotide encoding Stenotrophomonas phage IME15 exonuclease (AATG_ME15*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
SEQ ID NO: 64: polynucleotide encoding Yersinia phage phiYeO3-12 exonuclease (AATG_O3-12*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
SEQ ID NO: 65: polynucleotide encoding Spirochaeta bacterium exonuclease (AATG_SpiPh*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
SEQ ID NO: 66: polynucleotide encoding Pasteurella phage vB_PmuP_PHB02 exonuclease (AATG_PhBO2*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
SEQ ID NO: 67: polynucleotide encoding Ralstonia phage phiITL-1 exonuclease (AATG_RaTL1*_GCTT) for C-terminal Exonuclease-Cas9 fusions (* means stop codon)
Table 2: Level 1 assemblies of modules for C-terminal tagged SpCas9 (SpCas9-Linker-Exo)
Any given linker-exonuclease combination was assembled first from Level -1 into Level 0. Corresponding linker-exonuclease fusions (Level 0) were combined with the other given Level 0 modules (2x35Ss_Ω enhancer; NLS-SpCas9i-NLS; Linker-exo; tOCS) and assembled into a GG-compatible MoClo Level 1 T-DNA vector for Agrobacterium-mediated transient expression in planta (including LB and RB sequences for T-DNA delivery).
Design of sgRNA
Cas9 sgRNAs were amplified as Golden Gate (GG) modules via PCR. The forward primer binds to the sgRNA scaffold and introduces the new spacer sequence together with a BsaI restriction site including the corresponding overhang (ATTG; the guanine in bold is the SlU6 transcriptional start site). A sgRNA flip extension scaffold (Chen et al., 2013) followed by the 67 bp U6-26 terminator (Castel et al., 2018) was used as template (pAGT6182). The reverse primer binds in the terminator sequence and contains a BsaI site including the corresponding overhang (CGCT, Fig. 3). The resulting sgRNA-terminator PCR fragments were combined with the U6 promoter from Solanum lycopersicum (SlU6, pAGT5824) via GG-cloning into Level 1 T-DNA vectors. Level 1 expression vectors were transformed into Agrobacterium tumefaciens (GV3101 pMP90) and used for transient expression in N. benthamiana leaves.
In the polynucleotide of SEQ ID NO: 27 as represented above, the formatting has the following meaning:
LB; SlU6: U6 promoter from Solanum lycopersicum; |Spacer|; sgRNA scaffold (SpCas9); AtU6-26-t67: 67 bp U6-26 terminator from Arabidopsis thaliana; RB; GG-overhangs
SEQ ID NO: 28: spacer for sgR-PGK1 (as present in SEQ ID NO: 27):
AGCCACATATCCACTGGTGG
SEQ ID NO: 29: spacer for sgR-PGK2 (alternative spacer that may replace the spacer of SEQ ID NO: 28 in SEQ ID NO: 27):
CCACTGATTATGCTGATGAG
Example 3: Translational GUS-fusion to NbPGK via gene targeting by transient expression in Nicotiana benthamiana leaves
To validate the efficiency of the generated SpCas9-Exonuclease variants we expressed all components (SpCas9, sgRNAs, Donor fragment) by Agrobacterium mediated transient expression in N. benthamiana (Nb). Successful gene targeting should lead to measurable GUS activity.
Corresponding Agrobacterium strains were grown on plates for 2 days at 28°C (LB agar with corresponding antibiotics). Grown bacteria were resuspended in AIM (Agrobacterium infiltration medium) to an optical density (OD600) of 0.1 and 0.2 for Cas9 variants and sgRNAs/donor, respectively. The diluted Agrobacterium strains were mixed in equal parts (1:1:1:1; Cas9 construct : sgR-PGK1 : sgR-PGK2 : Donor). Agrobacterium suspensions were inoculated into leaves of Nb using a needleless syringe. Four days post inoculation, leaf samples were harvested and analyzed for quantitative (2 leaf discs 0.9 cm) and qualitative (1 leaf disc 0.9 cm) GUS activity according to Kay et al., 2007. A GUS construct driven by the constitutive promoter of actin2 was inoculated separately and used as positive control. GUS activities produced by different Cas9-Exonuclease fusions are given relative to Act2p-GUS activity (Fig. 4A). The fusions Cas9-4LF2-UL12, UL12-2-4LF2-Cas9 and T7-4LF2-Cas9 led to the highest increase of GUS activity compared to Cas9 (Fig. 4A). GUS activity was low when dCas9 was used and was comparable to the GUS activity of the control without any nuclease (background activity). Pictures of stained leaf discs were taken with a VHX microscope (Keyence; Fig. 4B). Leaf discs and quantitative GUS data came from the same experiment, which was repeated twice with similar results.
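The adjustment of the suspensions to a target optical density and the equal-part mixing described above follow simple dilution arithmetic (C1·V1 = C2·V2). The following Python sketch illustrates this arithmetic under the assumption that OD600 scales linearly with cell concentration in the range used. The function names, the stock OD and the volumes are hypothetical and are not taken from the example.

```python
# Minimal sketch of the dilution arithmetic (assumption: OD600 is proportional
# to cell concentration over the range used).

def dilution_volumes(od_stock: float, od_target: float, final_volume_ml: float):
    """Return (stock volume, diluent volume) in mL to reach od_target."""
    if od_stock <= 0 or od_target <= 0:
        raise ValueError("optical densities must be positive")
    if od_target > od_stock:
        raise ValueError("target OD cannot exceed the stock OD")
    stock_ml = final_volume_ml * od_target / od_stock  # C1*V1 = C2*V2
    return stock_ml, final_volume_ml - stock_ml


def od_after_equal_mixing(ods):
    """Per-strain OD after mixing equal volumes of all suspensions."""
    n = len(ods)
    return [od / n for od in ods]


if __name__ == "__main__":
    # Hypothetical stock suspension of OD600 = 2.0, diluted to 0.2 in 10 mL.
    stock, diluent = dilution_volumes(od_stock=2.0, od_target=0.2, final_volume_ml=10.0)
    print(f"stock: {stock:.2f} mL, infiltration medium: {diluent:.2f} mL")
    # Cas9 construct at 0.1; sgR-PGK1, sgR-PGK2 and Donor at 0.2, mixed 1:1:1:1.
    print(od_after_equal_mixing([0.1, 0.2, 0.2, 0.2]))
```

Note that mixing four suspensions in equal parts lowers the density of each individual strain to one quarter of its pre-mix value, which may be relevant when comparing experiments with different numbers of co-infiltrated strains.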
Example 4: Analysis of exonuclease activity by in vitro processing of a blunt ended hairpin oligonucleotide
To determine the activity of the T5 and T7 bacteriophage exonucleases on blunt end dsDNA we used a hairpin oligonucleotide with a linked fluorophore (Oregon green) as described in (Nikiforov, 2014), except that the 5’ end was phosphorylated. The fluorophore is quenched by stacking to the G-C base pair at the site of the blunt end (Fig. 5). Processing of the blunt end by the exonuclease releases the fluorophore, prevents quenching and could be measured by fluorescence (Excitation 495 nm; Emission 520 nm).
Reaction (in 100 μL):
10 μL Buffer NEB4
2.5 μL Exonuclease (1:5 dilution; 5 U)
x μL Oligonucleotide (adjusted to the desired final concentration, between 5 and 20 pM)
2 μL DTT (250 mM)
y μL H2O to 100 μL
The T5 and T7 exonucleases were purchased from New England Biolabs (catalog numbers M0363 and M0263, respectively). The oligonucleotide has the nucleotide sequence: It is phosphorylated at the 5'-end and carries the Oregon Green fluorescent dye at the position shown in Fig. 5.
All components were mixed in a 96-well plate on ice to prevent initiation of the reaction. The reaction was started by placing the multi-well plate in the fluorescence reader (TECAN spark Fusion plate reader). The measurement was done at 27 °C for 30 minutes. For the calculation of kinetic parameters for the T7 exonuclease, four different concentrations of the hairpin oligonucleotide were used (5, 10, 15 and 20 pM). The fluorescence values of the no-enzyme control were subtracted and the initial velocity (V0) of the reaction was calculated by measuring the slope of the fluorescence increase, which is linear during the first 300 to 400 seconds (see Fig. 5). This set of V0 values was then used to calculate the Km and kcat values using the Lineweaver-Burk linear representation of the Michaelis-Menten equation. It should be mentioned that the kinetic parameters thus calculated are apparent and specific to this assay because the exonucleases are processive enzymes which cleave one nucleotide after the other.
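The Lineweaver-Burk evaluation mentioned above linearizes the Michaelis-Menten equation as 1/V0 = (Km/Vmax)·(1/[S]) + 1/Vmax, so that a straight-line fit of 1/V0 against 1/[S] yields Vmax from the intercept and Km from the slope. The following Python sketch illustrates this calculation; it is not the evaluation script used for this example. The function name is hypothetical, and the demonstration data are synthetic values generated from assumed parameters, not measured values. To obtain kcat, Vmax would have to be expressed in concentration units per time and divided by the enzyme concentration, which requires a fluorescence calibration not described here.

```python
# Minimal sketch of a Lineweaver-Burk fit (assumption: V0 values are already
# background-corrected initial slopes; demonstration data are synthetic).

def lineweaver_burk(substrate, v0):
    """Fit 1/V0 = (Km/Vmax) * (1/[S]) + 1/Vmax by least squares.

    Returns (Km, Vmax) in the units of `substrate` and `v0`.
    """
    if len(substrate) != len(v0) or len(substrate) < 2:
        raise ValueError("need at least two matching ([S], V0) pairs")
    x = [1.0 / s for s in substrate]   # reciprocal substrate concentrations
    y = [1.0 / v for v in v0]          # reciprocal initial velocities
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept             # intercept = 1/Vmax
    km = slope * vmax                  # slope = Km/Vmax
    return km, vmax


if __name__ == "__main__":
    # Synthetic example: assumed Km = 10 and Vmax = 2 (arbitrary units),
    # evaluated at the four substrate concentrations named in the text.
    concentrations = [5.0, 10.0, 15.0, 20.0]
    velocities = [2.0 * s / (10.0 + s) for s in concentrations]
    km, vmax = lineweaver_burk(concentrations, velocities)
    print(f"Km = {km:.3f}, Vmax = {vmax:.3f}")
```

Because the double-reciprocal transformation gives the greatest weight to the measurements at the lowest substrate concentrations, a direct non-linear fit of the Michaelis-Menten equation is a common alternative when the low-concentration data are noisy.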
The comparison of T5 and T7 (Fig. 5) clearly indicates that the T7 exonuclease degrades the blunt-ended oligonucleotide much faster than the T5 exonuclease.
Example 5: Design of C-terminal Cas12a (D156R) exonuclease fusions
Constructs were designed similarly as described in Example 2. The general structure of the assembled construct is: T-DNA right border RB, 2x35Sshort, translational enhancer, NLS-LbCas12a(D156R)-NLS (intron), Linker-Exonuclease, tOCS, T-DNA left border LB.
The polynucleotide sequence of the Cas12a module for C-terminal fusions AATG_NLS-LbCas12a(D156R)-NLS (intron)_TTCG is shown in the following SEQ ID NO: 38:
In the polynucleotide of SEQ ID NO: 38 as represented above, the formatting has the following meaning:
NLS-LbCas12a(D156R)-NLS, INTRON, GG-overhangs
Table 3: Level 1 assemblies of modules for C-terminal tagged LbCas12a (LbCas12a(D156R)i-Linker-Exo). * stands for a stop codon (as above).
Any given linker-exonuclease combination was assembled first from Level -1 into Level 0. Corresponding linker-exonuclease fusions (Level 0) were combined with other Level 0 modules (2x35Ss_Ω enhancer_NLS-LbCas12a(D156R)i-NLS_Linker-exo_tOCS) and assembled into a GG-compatible MoClo Level 1 T-DNA vector for Agrobacterium-mediated transient expression in planta (including LB and RB sequences for T-DNA delivery).
Example 6: Design of an N-terminal Cas12a (D156R) exonuclease fusion
Constructs were designed similarly as described in Example 2. Exonucleases and linkers may be as described in Example 2. The general structure of the construct to be assembled is: T-DNA right border RB, 2x35Sshort, translational enhancer, Exonuclease-linker, NLS-LbCas12a(D156R)-NLS (intron), tOCS, T-DNA left border LB.
The polynucleotide sequence of the Cas12a module for N-terminal fusions AGGT_NLS-Cas12a(D156R)-NLS (intron)_GCTT is shown in the following SEQ ID NO: 39:
In the polynucleotide of SEQ ID NO: 39 as represented above, the formatting has the following meaning:
NLS-LbCas12a(D156R)-NLS (intron), GG-overhangs
Table 4: Level 1 assemblies of modules for N-terminal tagged LbCas12a (Exo-Linker-LbCas12a).
Any given Exonuclease-Linker combination was assembled first from Level -1 into Level 0. Corresponding Exonuclease-linker fusions (Level 0) were combined with the other given Level 0 modules (2x35Ss_Ω enhancer_exonuclease-linker_NLS-SpCas9i-NLS_tOCS) and assembled into a GG-compatible MoClo Level 1 T-DNA vector for Agrobacterium-mediated transient expression in planta (including LB and RB sequences for T-DNA delivery).
Example 7: Transgenic Nicotiana benthamiana tobacco mosaic virus (TMV)-Reporter line using GFP
In addition to the GUS-based reporter system described above, a GFP-based viral reporter system was generated. A schematic presentation of the TMV-based HDR reporter system using GFP is given in Fig. 6. The genome of the Nicotiana benthamiana plant is transgenic and comprises an insertion cassette harboring a TMV-based HDR reporter system using GFP. This reporter system leads to GFP fluorescence when a donor sequence from the donor nucleic acid (donor) is successfully and correctly integrated into the plant genome. The genome of the TMV present in the insertion cassette is modified: (i) the coat protein (CP)-encoding sequence is replaced by a sequence encoding GFP, and (ii) the replicase RdRP contains a 3.8 kb deletion replaced by a 76 bp attB site. The movement protein (MP) is intact and facilitates viral spread from cell to cell (signal propagation), which allows macroscopic observation of GFP expression derived from single-cell HDR events (one GFP spot equals one single-cell HDR event). The RdRP is essential for viral replication and production of secondary transcripts (MP and GFP) from subgenomic promoters. The exchange of the CP for GFP prevents packaging of the viral genome into viral particles (non-infectious virus) and instead allows high, RdRP-dependent expression rates of GFP. Viral replication (GFP expression) only takes place if the disrupted RdRP is repaired via precise insertion of the provided donor DNA by HDR. Four Cas9 sgRNA targets specific for the attB site are used to induce DNA double strand breaks (DSBs) with PAM-in (combination of PAM-In1 and PAM-In2 sgRNAs) or PAM-out (combination of PAM-out1 and PAM-out2 sgRNAs) orientation. DSB repair through homology directed repair (HDR) by provision of a donor nucleic acid (donor) leads to reconstitution of the TMV replicase (RdRP) and consequent GFP production as HDR readout. For details see Fig. 6. This reporter system was used to measure HDR frequency in the following examples.
Example 8: Exonuclease fused Cas9 leads to increased HDR in planta
HDR efficiency was analyzed using the transgenic Nicotiana benthamiana TMV reporter line from Example 7. T-DNAs carrying the transcriptional units for the expression of (i) Cas9 exonuclease fusion proteins, (ii) sgRNAs and (iii) the donor DNA were delivered by Agrobacterium-mediated transient transformation into leaves of Nicotiana benthamiana plants using a needleless syringe as described in Example 3. GFP fluorescence was monitored at 3 or 6 days post inoculation (dpi) under UV light produced by a hand-held lamp (model Blak-Ray B100A from UVP) and pictures were taken using a digital camera (Canon EOS 700D). Exonuclease-fused Cas9 was compared to WT Cas9 (denoted “Cas9”) and deactivated Cas9 (denoted “dCas9”) as controls. The exonucleases were fused to the C-terminal end of Cas9 using the 4LF2 linker (Cas9-4LF2-X). Fusion of UL12 and T7 to Cas9 led to the strongest increase in HDR (Fig. 7). A high rate of HDR events could be observed with UL12-fused Cas9 as early as 3 dpi. At 6 dpi, GFP fluorescence was saturated in tissues where UL12- and T7-fused Cas9 was expressed. At 6 dpi, expression of Cas9 alone led to only a few HDR events. See Figure 7.
Example 9: Quantification of HDR-events by GFP-spot count
For quantification of HDR events, pictures of Nicotiana benthamiana leaves from three inoculated plants were taken at 3 dpi and 6 dpi using a digital camera while illuminating the leaves in the dark with a hand-held UV lamp as described in Example 8. All constructs were inoculated on one single leaf with scrambled positions in each leaf. To obtain comparable GFP spot counts, areas of the same size that fit into each inoculation spot were defined, and GFP fluorescence spots within these areas were counted manually (GFP spots equal HDR events). Saturation of the GFP signal is indicated as nd (not defined). The GFP spot number of Cas9-expressing tissues (6 dpi) did not exceed the GFP spot number of Cas9-UL12-expressing tissues (3 dpi). See Figure 8.
Example 10: Genotyping of HDR events by PCR
Genomic DNA of Nicotiana benthamiana leaves was isolated at 4 dpi from tissues expressing dual PAM-In sgRNAs, donor nucleic acid (donor) and the corresponding Cas9 exonuclease fusions. Details are depicted in Fig. 9. Primer pair P1 (consisting of the primers 1F and 1R) was used to monitor on-target NHEJ events (small deletion). Primer pairs P2 (2F and 2R) and P3 (3F and 3R) were used to amplify upstream and downstream HDR junctions, respectively. Cas9 led to a deletion of the fragment flanked by the sgRNA target sites (smaller band). Intensities of this fragment negatively correlate with GFP spot number. Junctions of HDR events could be confirmed on both sides for UL12-, T5- and T7-fused Cas9 variants. Sequencing confirmed precise repair (data not shown). See Fig. 9 for details.
Example 11 Exonuclease domain of Exo1 (Exo1ΔC) fused to Cas9 only slightly increased HDR efficiency
Analysis of HDR efficiency using the transgenic Nicotiana benthamiana TMV reporter line according to Examples 7 and 8. GFP fluorescence was monitored at 5 dpi. Exonuclease-fused Cas9 was compared to WT Cas9 and deactivated Cas9 (dCas9) as controls. X-4LF2-Cas9 indicates that the exonucleases were fused to the N-terminal end of Cas9 using the 4LF2 linker. The Cas9-fused exonuclease domain of Exo1 (Exo1ΔC) led to only a slight increase in HDR events compared to Cas9. UL12- and T7-fused Cas9 outperformed Exo1ΔC-fused Cas9. See Fig. 10.
Example 12 Comparison of UL12 homologues in HDR
Amino acid (aa) sequence identity of the analyzed UL12 homologues is given in the upper panel of Fig. 11. Analysis of HDR efficiency using the transgenic Nicotiana benthamiana TMV reporter line according to Examples 7 and 8. UL12 homologues with a sequence identity of 50% or higher, such as PiE (SEQ ID NO: 44) or PapE (SEQ ID NO: 43), show HDR rates comparable to or higher than those of UL12 (SEQ ID NO: 32) or UL12-2 (SEQ ID NO: 33) in Figure 11. Fig. 12 provides the quantitative GFP spot count analysis of Fig. 11. GFP spot count was done as described in Example 9. Figure 12 reveals that PapE (SEQ ID NO: 43) and PiE (SEQ ID NO: 44) fused to Cas9 show an increased HDR efficiency over UL12 and UL12-2. UL12 homologues with a sequence identity of less than 49% show HDR rates lower than those of UL12 or UL12-2. See Figs. 11, 12, 15 and 17. UL12 = SEQ ID NO: 32, UL12-2 = SEQ ID NO: 33, PapE = SEQ ID NO: 43, PiE = SEQ ID NO: 44, AB4P = SEQ ID NO: 69, MD5 = SEQ ID NO: 36, Dumas = SEQ ID NO: 35, BGLF5 = SEQ ID NO: 34, SOX = SEQ ID NO: 68.
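As an illustration of what a percent sequence identity value expresses, the sketch below computes identity over the columns of a pairwise alignment. The text does not state which alignment tool or identity definition was used for Fig. 11, so the definition here (identical residues divided by aligned columns) is an assumption, and other tools may use a different denominator. The two input strings are the short segments listed for SEQ ID NO: 47 and SEQ ID NO: 48 in Example 24, chosen only because they are spelled out in the description.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns.

    Both inputs must come from the same pairwise alignment ('-' marks a gap).
    Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


seg_47 = "APAESVHACGVL"  # SEQ ID NO: 47 segment as listed in Example 24
seg_48 = "APAASVHACGVL"  # SEQ ID NO: 48 segment as listed in Example 24
identity = percent_identity(seg_47, seg_48)
print(f"{identity:.1f}% identity")
```

The two segments differ at one of twelve positions, so the function reports 11 matching columns out of 12.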
Example 13 Comparison of T7-homologues in HDR
Amino acid (aa) sequence identity of the analyzed T7 homologues is given in the upper panel of Fig. 13. Analysis of HDR efficiency using the transgenic Nicotiana benthamiana TMV reporter line according to Examples 7, 8 and 12. The T7 homologues ME15 (SEQ ID NO: 45) (87% sequence identity) and SpiPh (SEQ ID NO: 46) (65% sequence identity) show increased HDR rates compared to T7 (SEQ ID NO: 30). The T7 homologue ME15 (SEQ ID NO: 45) shows an HDR efficiency comparable to UL12 (SEQ ID NO: 32) and UL12-2 (SEQ ID NO: 33) and higher than T7 in Fig. 13 and in the GFP spot count analysis of Fig. 14. Fig. 14 provides the quantitative GFP spot count analysis of Fig. 13. GFP spot count was done as described in Example 9. The T7 homologue SpiPh (SEQ ID NO: 46) shows an HDR efficiency higher than T7 in Figs. 13 and 14. Also see Figs. 16 and 17. UL12/UL12-1 = SEQ ID NO: 32, UL12-2 = SEQ ID NO: 33, T7 = SEQ ID NO: 30, ME15/IME15 = SEQ ID NO: 45, O3-12/YerO3-12 = SEQ ID NO: 70, SpiPh = SEQ ID NO: 46, PHBO2 = SEQ ID NO: 71, RaTL1/RalTL1 = SEQ ID NO: 72.
Example 14 Comparison of exonuclease activity of monomeric exonucleases using blunt end DNA substrates
See Fig. 18. (A) A fluorophore-labeled hairpin oligo was used to assess exonuclease activity on blunt-end DNA. The fluorophore is quenched by stacking onto the planar terminal GC base pair. 5' to 3' resection by the exonuclease leads to the release of the fluorophore and consequent fluorescence. (B) The assay was conducted according to Example 4. Labeled oligo (at a concentration of 10 pM) was incubated with different exonucleases for 30 minutes at 28 °C. T5 (NEB; # M0363) and T7 (NEB; M0263S) were used; recombinant His-tagged UL12 was purified from E. coli. The fast increase of fluorescence in Fig. 18 indicates high exonuclease activity and a preference for blunt-end DNA substrates for UL12 and T7 compared to T5. See Figure 18. UL12 = SEQ ID NO: 32, T7 = SEQ ID NO: 30, T5 = SEQ ID NO: 31.
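One way to put a number on "fast increase of fluorescence" is the initial slope of the fluorescence time course. The following is a minimal sketch under stated assumptions: the description does not say how the curves of Fig. 18 were evaluated, the least-squares fit over the first few time points is an illustrative choice, and the time courses are hypothetical placeholders rather than measured data.

```python
def initial_rate(times_min, fluorescence, window=5):
    """Least-squares slope (fluorescence units per minute) over the first `window` points."""
    t = list(times_min[:window])
    f = list(fluorescence[:window])
    n = len(t)
    if n < 2:
        raise ValueError("need at least two time points")
    t_mean = sum(t) / n
    f_mean = sum(f) / n
    numerator = sum((ti - t_mean) * (fi - f_mean) for ti, fi in zip(t, f))
    denominator = sum((ti - t_mean) ** 2 for ti in t)
    return numerator / denominator


# Hypothetical placeholder time courses (NOT data from Fig. 18).
times = [0, 1, 2, 3, 4, 5]
fast_enzyme = [100, 300, 500, 700, 900, 1000]  # signal rises quickly, then levels off
slow_enzyme = [100, 120, 140, 160, 180, 200]

rate_fast = initial_rate(times, fast_enzyme)
rate_slow = initial_rate(times, slow_enzyme)
print(rate_fast, rate_slow)
```

Restricting the fit to the early points keeps the estimate away from the plateau that appears once the substrate is consumed.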
Example 15 Comparison of exonuclease activity of monomeric (T5 and T7) with trimeric (LaExo) exonucleases using blunt end DNA substrates
See Fig. 19. (A) A fluorophore-labeled hairpin oligo was used to assess exonuclease activity on blunt-end DNA. The fluorophore is quenched by stacking onto the planar terminal GC base pair. 5' to 3' resection by the exonuclease leads to the release of the fluorophore and consequent fluorescence. (B) The assay was conducted according to Example 4 with small changes as follows: LaExo from NEB needs its own specific buffer (LaExo buffer). Experiments with T5-exo and T7-exo were done using NEB buffer 4, and experiments with LaExo were done with LaExo-specific buffer (NEB). Reactions were performed simultaneously in the same 96-well plate.
Reaction T5-Exo and T7-Exo (in 100 μL):
10 μL Buffer NEB4
1 μL Exonuclease (T5 or T7) (10 U)
x μL Oligonucleotide (adjusted to the desired final concentration of 10 pM)
2 μL DTT (250 mM)
y μL H2O to 100 μL
Reaction LaExo (in 100 μL):
10 μL Buffer LaExo
2 μL Exonuclease (LaExo) (10 U)
x μL Oligonucleotide (adjusted to the desired final concentration of 10 pM)
2 μL DTT (250 mM)
y μL H2O to 100 μL
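The variable volumes x and y in the two recipes follow from simple dilution arithmetic. A minimal sketch is given below; the stock concentration of the oligonucleotide is not stated in the description, so the value used here (100 times the final concentration) is purely hypothetical, and the function name and parameters are illustrative.

```python
def reaction_volumes(oligo_stock, oligo_final, total_ul=100.0,
                     buffer_ul=10.0, enzyme_ul=1.0, dtt_ul=2.0):
    """Return (x, y): oligonucleotide and water volumes in microliters.

    oligo_stock and oligo_final must be given in the same concentration unit.
    """
    x = oligo_final * total_ul / oligo_stock  # C1 * V1 = C2 * V2
    y = total_ul - (buffer_ul + enzyme_ul + dtt_ul + x)
    if y < 0:
        raise ValueError("components exceed the total reaction volume")
    return x, y


# Hypothetical stock at 100x the final concentration (assumption, not from the text).
x_t5, y_t5 = reaction_volumes(oligo_stock=1000.0, oligo_final=10.0)              # T5 or T7: 1 uL enzyme
x_la, y_la = reaction_volumes(oligo_stock=1000.0, oligo_final=10.0, enzyme_ul=2.0)  # LaExo: 2 uL enzyme

# Final DTT concentration: 2 uL of a 250 mM stock in 100 uL total.
dtt_final_mm = 250.0 * 2.0 / 100.0
print(x_t5, y_t5, x_la, y_la, dtt_final_mm)
```

With these assumed inputs the water volume differs by 1 μL between the two recipes, which reflects the extra microliter of LaExo enzyme.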
Labeled oligo was incubated with the different exonucleases for 30 minutes at 28 °C: T5 (NEB; # M0363), T7 (NEB; M0263S) and LaExo (NEB; M0262S). A fast increase of fluorescence in Fig. 19 indicates high exonuclease activity and a preference for blunt-end DNA substrates. T7 and LaExo (Lambda exonuclease) show a higher exonuclease activity on blunt-end DNA substrates than T5 (see Fig. 19). However, as demonstrated in Example 16 and Fig. 20, fusion of the trimeric Lambda exonuclease (LaExo) to Cas9 showed a lower HDR efficiency than Cas9-fused UL12.
Example 16 Activity of Cas9-fused LaExo cannot be increased by coexpression of nuclear localized LaExo (termed N-LaExo or LaExo-N)
Analysis of HDR efficiency using the transgenic Nicotiana benthamiana TMV reporter line and methods described before. GFP fluorescence was monitored over 5 days post inoculation (dpi). Fusion of the trimeric LaExo (Lambda exonuclease) to Cas9 showed a lower HDR efficiency than Cas9-fused UL12. Coexpression of nuclear localized LaExo (nuclear localized via a nuclear localization signal (NLS); termed N-LaExo or LaExo-N) did not increase HDR efficiency. See Fig. 20. N-LaExo and LaExo-N indicate that the nuclear localization signal (NLS) was fused to the N-terminal or the C-terminal end of LaExo, respectively.
Example 17 Determination of minimal homology arm-length
Analysis of HDR efficiency using the transgenic Nicotiana benthamiana TMV reporter line and methods described before. GFP fluorescence was monitored at 5 days post inoculation (dpi). The donor nucleic acids differ in homology arm (HA) length as indicated (1000 bp, 500 bp, 250 bp, 100 bp). A minimal homology arm length of 250 bp showed slightly increased HDR efficiency. See Fig. 21.
Example 18 Cas12a-exonuclease fusion leads to increased HDR in planta
Analysis of HDR efficiency of Cas12a-exonuclease fusion proteins using the transgenic Nicotiana benthamiana TMV reporter line and methods described before. Fluorescence was monitored for 5 dpi. WT Cas12a and no endonuclease serve as controls. Note that Cas12a generates “staggered” cuts with overhangs on double-stranded DNA, as opposed to the “blunt” cuts generated by Cas9. See Fig. 22. Also see Example 19.
Example 19 Estimation of HDR-efficiency of Cas12a exonuclease fusion proteins by GFP spot count
Quantification of HDR events from Example 18/Fig. 22 according to the method described in Example 9. For quantification of HDR events, pictures of Nicotiana benthamiana leaves from three inoculated plants were taken at 3 dpi. All constructs were inoculated on one single leaf, with different positions in each plant. To obtain comparable numbers of GFP-fluorescent spots, counting was done in defined areas of the same size. In contrast to Cas9-fused exonucleases, Cas12a-fused T5 led to increased HDR compared to Cas12a-fused T7. Cas12a-fused UL12 led to the highest increase in HDR efficiency. See Fig. 23. dCas12a means deactivated Cas12a.
Example 20 Comparative analysis of the NHEJ cleavage pattern of Cas9- and Cas12a- exonuclease fusion proteins
See Fig. 24. Corresponding Cas9- and Cas12a-exonuclease fusion proteins and sgRNAs (Cas9)/crRNAs (Cas12a) (PAM-in or PAM-out) in the presence (+D - donor DNA) and absence of the donor DNA were transiently expressed in leaves of the transgenic Nicotiana benthamiana TMV reporter line by Agrobacterium-mediated expression (as described in Examples 7 and 8). Genomic DNA was isolated at 4 dpi from plant leaf tissue expressing the corresponding nuclease-exonuclease fusion proteins and sgRNAs/crRNAs and used as template for on-target amplification using primer pair P1 (primers 1F and 1R). Cas9-mediated cleavage leads to a deletion of the fragment flanked by the dual sgRNA target sites (additional smaller band in Fig. 24; indication of NHEJ; 44 bp deletion between e.g. PAM-In1 and PAM-In2). Cas12a did not lead to a distinct deletion, indicating a broader spectrum of deletions or generally reduced NHEJ-mediated deletions. Cas9-exonuclease fusions led to broader deletion sizes, resulting in a visible smear instead of a distinct band (Fig. 24). Reduced precise deletion frequency correlates with increased HDR. Also see Examples 21 and 22.
This clearly shows that fusion of UL12 or T7exo allows 5' to 3' resection prior to NHEJ-mediated religation of the DNA lesions.
Example 21 Amplicon sequencing using Cas9-exonuclease fusion proteins
See Fig. 25A-D. The same genomic DNA from Example 20 (Cas9 and Cas9-exo fusion proteins without donor) was used as template for on-target amplification using primer pair P1 from Example 20/Fig. 24 with 5' adapters. Adapters serve as binding anchors for index primers for amplicon sequencing. Cas9-mediated cleavage mainly led to precise deletions between the cleavage sites. Cas9-mediated cleavage in PAM-out orientation also led to a significant number of small 2 nt deletions (1 nt shift per cleavage site). Cas9-exonuclease fusion generally led to larger deletions. The maximal size of deletions is similar between the different fused exonucleases, whereas UL12- and T7-fused Cas9 showed a higher frequency. See Fig. 25A-D.
Example 22 Amplicon Sequencing using Cas12a-exonuclease fusion proteins
Genomic DNA from Example 20 (Cas12a and Cas12a-exonuclease fusion proteins without donor) was used as template for on-target amplification using primer pair P1 with 5' adapters. Adapters serve as binding anchors for index primers for amplicon sequencing. In general, Cas12a led to deletion of the fragment between the dual crRNA cleavage sites and a few smaller on-target deletions. Fusion of exonucleases to Cas12a led to increased indels with larger deletions compared to WT Cas12a. See Fig. 26A-D.
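The deletion-size comparisons in Examples 21 and 22 rest on tallying deletion lengths across amplicon reads. A minimal sketch of such a tally is shown below. It assumes reads that have already been aligned to the reference amplicon with '-' marking deleted positions; the analysis pipeline actually used is not described in the text, and the reads are hypothetical placeholders, not data from Figs. 25 or 26.

```python
import re
from collections import Counter


def longest_deletion(aligned_read: str) -> int:
    """Length of the longest run of '-' gap characters in a read aligned to the reference."""
    runs = re.findall(r"-+", aligned_read)
    return max((len(run) for run in runs), default=0)


# Hypothetical aligned reads (placeholders, NOT data from Figs. 25 or 26).
reads = [
    "ACGTACGTACGT",  # no deletion
    "ACGT----ACGT",  # 4 nt deletion
    "AC--------GT",  # 8 nt deletion
    "ACGT----ACGT",  # 4 nt deletion
]

size_counts = Counter(longest_deletion(read) for read in reads)
for size, n in sorted(size_counts.items()):
    print(f"{size} nt deletion: {n} read(s)")
```

A histogram of these counts per construct is one way to compare deletion spectra between a nuclease alone and its exonuclease fusions.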
Example 23: Sequence alignment of tested alkaline exonucleases homologous to UL12.
The amino acid sequences of the exonucleases that were identified as homologous to UL12 in Fig. 15 were aligned. The alignment is given in Figs. 27A to 27C. Sequence motifs/sequence segments that are specific for the exonucleases that show a high HDR efficiency in Example 12 and in Figs. 11-12 were identified. The exonucleases PiE, PapE, UL12-1 and UL12-2 revealed the highest HDR efficiency in Example 12 and were selected for the analysis. This analysis identified that all of PiE (SEQ ID NO: 44), PapE (SEQ ID NO: 43), UL12-1 (SEQ ID NO: 32) and UL12-2 (SEQ ID NO: 33) contain the amino acid sequence of SEQ ID NO: 56 as a common sequence motif/sequence segment, wherein the X at position 2 is preferably R or K, the X at position 8 is preferably F or Y, the X at position 17 is preferably D or E and the X at position 35 is preferably R or K. The X in SEQ ID NO: 56 and in all other amino acid sequence segments comprised in this application stands for any one of the 20 amino acids of the standard genetic code, wherein an amino acid residue must be present at the position X, as X does not allow the absence of an amino acid residue. Thus, the X in SEQ ID NO: 56 represents variable amino acid residues, whereas the non-X residues in SEQ ID NO: 56 are fixed to the residue indicated in the one-letter code. As evident from Fig. 27A and 27B, UL12-group specific motifs I and II could be identified. These motifs are present in all of PiE, PapE, UL12-1 and UL12-2, but absent in all of BGLF5, SOX, MD5, DUMAS and AB4P. In particular, the UL12-group specific motif II consisting of SEQ ID NO: 54 (FRYCVGRAD) differentiates all of PiE, PapE, UL12-1 and UL12-2 from all of BGLF5, SOX, MD5, DUMAS and AB4P. As the UL12-group specific motifs I and II are also comprised in SEQ ID NO: 56, SEQ ID NO: 56 also distinguishes all of PiE, PapE, UL12-1 and UL12-2 from all of BGLF5, SOX, MD5, DUMAS and AB4P. This means that SEQ ID NO: 56 is present in all exonucleases of Example 12/Figs. 11-12 that show a high HDR efficiency and absent in all exonucleases of Example 12/Figs. 11-12 that show a low HDR efficiency. Thus, SEQ ID NO: 56 may explain why PiE, PapE, UL12-1 and UL12-2 show a higher HDR efficiency compared to BGLF5, SOX, MD5, DUMAS and AB4P.
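The meaning of X given above (exactly one residue from the 20 standard amino acids, never an absent residue) maps directly onto a pattern search. The sketch below illustrates this with the segments SEQ ID NO: 54 (FRYCVGRAD) and SEQ ID NO: 55 (PXPLMXFFEAATQ) as they are spelled out in the claims. The function names are illustrative, and the query sequences are hypothetical test strings, not sequences from the sequence listing.

```python
import re

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_to_regex(motif: str) -> re.Pattern:
    """Compile a motif in which X means exactly one of the 20 standard amino acids."""
    parts = [
        f"[{STANDARD_AMINO_ACIDS}]" if ch == "X" else re.escape(ch) for ch in motif
    ]
    return re.compile("".join(parts))


def contains_motif(sequence: str, motif: str) -> bool:
    """True if the motif occurs anywhere in the sequence."""
    return motif_to_regex(motif).search(sequence) is not None


MOTIF_54 = "FRYCVGRAD"      # SEQ ID NO: 54 as listed in the claims
MOTIF_55 = "PXPLMXFFEAATQ"  # SEQ ID NO: 55 as listed in the claims

# Hypothetical query sequences for illustration only.
with_residue = "MKPAPLMSFFEAATQGG"    # both X positions filled (A and S)
missing_residue = "MKPPLMSFFEAATQGG"  # first X position left out

print(contains_motif(with_residue, MOTIF_55))     # True
print(contains_motif(missing_residue, MOTIF_55))  # False
```

The second query fails to match because a residue is missing where the first X is required, which mirrors the statement that X does not allow the absence of an amino acid residue.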
Example 24: Amino acid sequence segments specific for PapE
PapE (SEQ ID NO: 43) showed a particularly high HDR efficiency in Example 12/Figs. 11-12. Amino acid sequences homologous to PapE were aligned to PapE (SEQ ID NO: 43) to identify sequence motifs/sequence segments that are specific for PapE-group exonucleases. As depicted in Fig. 28A and 28B, motifs specific for PapE and PapE-group exonucleases could be identified (PapE-group specific motifs I to IV). The PapE-group specific motifs I to IV differentiate PapE or PapE-group exonucleases from all of UL12-1, UL12-2, PiE, BGLF5, SOX, MD5, DUMAS and AB4P as depicted in Figs. 27A to 27C. The PapE-group specific motifs I, II, III and IV from Fig. 28A and 28B are comprised in at least one of SEQ ID NO: 47 (APAESVHACGVL), SEQ ID NO: 48 (APAASVHACGVL), SEQ ID NO: 49 (AKYAFDPADAGXXVVAAHRRL), SEQ ID NO: 50 (APASAPAAVRAA) and SEQ ID NO: 51 (LIITPVRXDAA). Thus, any one selected from the group consisting of SEQ ID NO: 47, 48, 49, 50 and 51 is an amino acid segment that is specific for PapE (SEQ ID NO: 43) or PapE-group specific exonucleases.
Example 25: Sequence alignment of tested T7 exonuclease homologues
Amino acid sequences of exonucleases homologous to T7 were aligned as depicted in Fig. 29. Amino acid sequence motifs/sequence segments were identified that are present in SpiPhage, T7 and ME15, but not in RalTL1, PaPHBO2 or YerO3-12 (T7 exonuclease group motifs I and II). As SpiPhage, T7 and ME15 show a higher HDR efficiency in Example 13/Figs. 13-14 than RalTL1, PaPHBO2 or YerO3-12, the T7 exonuclease group specific motifs I and II correlate with the higher HDR efficiency of SpiPhage, T7 and ME15. In particular, ME15 (SEQ ID NO: 45) showed the highest HDR efficiency of the tested T7 homologues (Fig. 14). ME15 specific motifs I and II were identified that are present in the amino acid sequence of ME15 (SEQ ID NO: 45) but absent in the amino acid sequence of T7 (SEQ ID NO: 30). The ME15 specific motifs I and II are comprised in at least one of SEQ ID NO: 52 (APTESETLWDCI) and SEQ ID NO: 53 (ILRFNDYNIDT). Thus, the sequence segments of SEQ ID NO: 52 and/or 53 correlate with the increased HDR frequency of ME15 over T7 and allow ME15 to be distinguished from the T7 exonuclease. On the other hand, the amino acid sequence of SEQ ID NO: 98 (WEEEIWHRCCDHAKAR) is a sequence motif specific for T7, SpiPhage and ME15.
REFERENCES
Kay, S., Hahn, S., Marois, E., Hause, G. and Bonas, U. (2007) A bacterial effector acts as a plant transcription factor and induces a cell size regulator. Science, 318, 648-651.
Nikiforov, T.T. (2014) Generic assay format for endo- and exonucleases based on fluorogenic substrates labeled with single fluorophores. Anal Biochem, 461, 67-73.
Goldstein JN, Weller SK. The exonuclease activity of HSV-1 UL12 is required for in vivo function. Virology. 1998 May 10;244(2):442-57. doi: 10.1006/viro.1998.9129. PMID: 9601512.
Buisson M, Geoui T, Flot D, Tarbouriech N, Ressing ME, Wiertz EJ, Burmeister WP. A bridge crosses the active-site canyon of the Epstein-Barr virus nuclease with DNase and RNase activities. J Mol Biol. 2009 Aug 28;391(4):717-28. doi: 10.1016/j.jmb.2009.06.034. Epub 2009 Jun 16. PMID: 19538972.
Ferreira PCG, Hemerly AS, de Almeida Engler J, Van Montagu M, Engler G, Inze D (1994) Developmental expression of the Arabidopsis cyclin gene cyc1At. Plant Cell 6: 1763-1774.
Steffen JG, Kang IH, Macfarlane J, Drews GN. Identification of genes expressed in the Arabidopsis female gametophyte. Plant J. 2007;51:281-92.
Sprunck S, Rademacher S, Vogler F, Gheyselinck J, Grossniklaus U, Dresselhaus T. Egg cell-secreted EC1 triggers sperm cell activation during double fertilization. Science. 2012;338:1093-7.
Summary of protein and amino acid sequences
SEQ ID NO: 1: polynucleotide sequence of the sequence of Donor Fragment (PGK-GUS): sequence given above
SEQ ID NO: 2: polynucleotide sequence of module 2x35Ss - Q
SEQ ID NO: 3: polynucleotide sequence of module AGGT_NLS-SpCas9i-NLS*_GCTT for N-terminal fusions
SEQ ID NO: 4: polynucleotide encoding linker 2xLF2
SEQ ID NO: 5: polynucleotide encoding linker 4xLF2
SEQ ID NO: 6: polynucleotide encoding linker XTEN144
SEQ ID NO: 7: polynucleotide encoding linker XTEN40
SEQ ID NO: 8: polynucleotide encoding linker XTEN16
SEQ ID NO: 9: polynucleotide encoding T7 exonuclease for N-terminal exonuclease-Cas9 fusions
SEQ ID NO: 10: polynucleotide encoding T5 Exonuclease for N-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 11: polynucleotide encoding UL12 exonuclease for N-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 12: polynucleotide encoding UL12-2 Exonuclease for N-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 13: polynucleotide encoding BGLF5 exonuclease for N-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 14: polynucleotide encoding DUMAS exonuclease for N-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 15: polynucleotide encoding MD5 exonuclease for N-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 16: polynucleotide sequence of module AATG_NLS-SpCas9i-NLS_TTCG for C-terminal fusions
SEQ ID NO: 17: polynucleotide sequence of module tOCS
SEQ ID NO: 18: polynucleotide encoding linker 2xLF2
SEQ ID NO: 19: polynucleotide encoding linker 4xLF2
SEQ ID NO: 20: polynucleotide encoding linker XTEN144
SEQ ID NO: 21: polynucleotide encoding linker XTEN40
SEQ ID NO: 22: polynucleotide encoding linker XTEN16
SEQ ID NO: 23: polynucleotide encoding T7 Exonuclease for C-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 24: polynucleotide encoding T5 Exonuclease for C-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 25: polynucleotide encoding UL12 Exonuclease for C-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 26: polynucleotide encoding UL12-2 Exonuclease for C-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 27: sgRNA transcriptional unit
SEQ ID NO: 28: spacer for sgR-PGK1
SEQ ID NO: 29: spacer for sgR-PGK2
SEQ ID NO: 38: polynucleotide encoding AATG_NLS-LbCas12a(D156R)-NLS(intron)_TTCG (for C-terminal fusion of exonuclease)
SEQ ID NO: 39: polynucleotide sequence of the Cas12a module for N-terminal fusions AGGT_NLS-Cas12a(D156R)-NLS(intron)_GCTT
SEQ ID NO: 40: amino acid sequence of NLS-LbCas12a(D156R)-NLS for C-terminal fusions
SEQ ID NO: 41: amino acid sequence of protein NLS-LbCas12a(D156R)-NLS (intron) for N-terminal fusions
SEQ ID NO: 58: polynucleotide encoding AATG_SOX_TTCG exonuclease for N-terminal Exonuclease-Cas9 fusions
SEQ ID NO: 65: polynucleotide encoding AATG_SpiPh*_GCTT Exonuclease for C-terminal Exonuclease-Cas9 fusions (* means stop codon)

Claims

1. A protein for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the 5’-3’ exonuclease is preferably a monomeric 5’-3’ exonuclease.
2. The protein according to claim 1, wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises
A)
(i) the amino acid sequence defined in SEQ ID NO: 43 (PapE), or
(ii) an amino acid sequence that has at least 80% sequence identity to the amino acid sequence defined in SEQ ID NO: 43, or
(iii) an amino acid sequence that has at least 90% sequence similarity to the amino acid sequence defined in SEQ ID NO: 43, or
(iv) an amino acid sequence having from 1 to 121 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 43, or
B)
(i) the amino acid sequence defined in SEQ ID NO: 44 (PiE), or
(ii) an amino acid sequence that has at least 80% sequence identity to the amino acid sequence defined in SEQ ID NO: 44, or
(iii) an amino acid sequence that has at least 90% sequence similarity to the amino acid sequence defined in SEQ ID NO: 44, or
(iv) an amino acid sequence having from 1 to 131 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 44.
3. The protein according to claim 1, wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises
C)
(i) the amino acid sequence defined in SEQ ID NO: 45 (ME15), or
(ii) an amino acid sequence that has at least 80% sequence identity to the amino acid sequence defined in SEQ ID NO: 45, or
(iii) an amino acid sequence that has at least 90% sequence similarity to the amino acid sequence defined in SEQ ID NO: 45, or
(iv) an amino acid sequence having from 1 to 60 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 45; or
D)
(i) the amino acid sequence defined in SEQ ID NO: 46 (SpiPh), or
(ii) an amino acid sequence that has at least 80% sequence identity to the amino acid sequence defined in SEQ ID NO: 46, or
(iii) an amino acid sequence that has at least 90% sequence similarity to the amino acid sequence defined in SEQ ID NO: 46, or
(iv) an amino acid sequence having from 1 to 59 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 46.
4. The protein according to any one of claims 1 to 3, wherein said protein is a fusion protein comprising said site-specific endonuclease and said 5’-3’ exonuclease.
5. The protein according to any one of claims 1 to 3, wherein said protein is an oligomeric protein (protein complex) comprising a first protein subunit comprising said site-specific endonuclease and a second protein subunit comprising said 5’-3’ exonuclease.
6. The protein according to claim 5, wherein said first protein subunit comprises (preferably as a domain of said first protein subunit) said site-specific endonuclease and a first interaction domain, and said second protein subunit comprises (preferably as a domain of said second protein subunit) said 5’-3’ exonuclease and a second interaction domain, wherein said first and said second interaction domain bind to each other to form said oligomeric protein (protein complex).
7. The protein according to any one of claims 1 to 3, wherein said protein is an oligomeric protein comprising a first subunit comprising said endonuclease and a second subunit comprising said 5’-3’ exonuclease, and a nucleic acid having a portion capable of binding to the endonuclease (such as a gRNA) and an aptamer capable of binding to the 5’-3’ exonuclease.
8. The protein according to any one of claims 1 to 7, wherein the site-specific endonuclease is a CRISPR nuclease capable of inducing double strand breaks to or in DNA, such as Cas9 or Cas12a, or is a CRISPR nuclease with nickase activity capable of inducing single strand nicks to double stranded DNA, such as a nickase variant of Cas9.
9. A protein for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the site-specific endonuclease is a CRISPR-nuclease as defined in claim 8 and wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker, the polypeptide linker having a length of 25 amino acid residues or more, preferably 30 amino acid residues or more, and wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises
(i) the amino acid sequence defined in SEQ ID NO: 32 (UL12-1) or SEQ ID NO: 33 (UL12-2), or
(ii) an amino acid sequence that has at least 80% sequence identity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or
(iii) an amino acid sequence that has at least 90% sequence similarity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or
(iv) an amino acid sequence having from 1 to 120 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33.
10. The protein according to any one of claims 1 to 9, wherein the 5’-3’ exonuclease has the same or a higher 5’-3’ exonuclease activity in terms of catalytic efficiency kcat/Km or in terms of the turnover number than the T7 exonuclease (SEQ ID NO: 30) in the in-vitro exonuclease assay described in the description; or has a 5’-3’ exonuclease activity that is at least twice that of the T5 exonuclease (SEQ ID NO: 31) in terms of catalytic efficiency kcat/Km or in terms of the turnover number in the in-vitro exonuclease assay described in the description.
11. The protein according to any one of claims 1 to 10, wherein the protein provides an increased frequency of double strand break repair through the homology directed repair (HDR) pathway and/or a higher frequency of gene targeting or gene replacement events than the separate application of the site-specific endonuclease and the 5’-3’ exonuclease of the protein without being fused together or without forming a protein complex when provided with a donor nucleic acid.
12. The protein according to any one of claims 1, 2 or 4 to 11, wherein the amino acid sequence of the 5’-3’ exonuclease comprises the amino acid sequence segment of SEQ ID NO: 56.
13. The protein according to any one of claims 1, 2 or 4 to 11, wherein the amino acid sequence of the 5’-3’ exonuclease comprises the amino acid sequence segments of SEQ ID NO: 54 (FRYCVGRAD) and SEQ ID NO: 55 (PXPLMXFFEAATQ).
14. The protein according to any one of claims 4 and 8 to 13, wherein the 5’-3’ exonuclease is fused to the N-terminal end or to the C-terminal end of the site-specific endonuclease.
15. The protein according to any one of claims 4, 8, and 10 to 14, wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker.
16. The protein according to claim 15, said polypeptide linker consisting of from 5 to 300 amino acid residues, preferably from 10 to 200, more preferably from 20 to 120 amino acid residues.
17. A nucleic acid molecule comprising a polynucleotide encoding the protein according to any one of claims 1 to 16.
18. Nucleic acid construct, plasmid or vector comprising the polynucleotide of the nucleic acid molecule according to claim 17.
19. Kit comprising: a nucleic acid molecule comprising a polynucleotide encoding said first protein subunit according to claim 5 or 6 and a nucleic acid molecule comprising a polynucleotide encoding said second protein subunit according to claim 5 or 6; or a nucleic acid molecule comprising a polynucleotide encoding the site-specific endonuclease of the protein-nucleic acid complex according to claim 7, and a nucleic acid molecule comprising a polynucleotide encoding the 5’-3’ exonuclease of the protein-nucleic acid complex according to claim 7.
20. A prokaryotic or eukaryotic cell comprising i) the protein according to any one of claims 1 to 16, ii) the nucleic acid molecule of claim 17, iii) the nucleic acid construct, plasmid or vector according to claim 18, or iv) the kit according to claim 19.
21. The cell according to claim 20, wherein the cell is a eukaryotic cell that further comprises a donor nucleic acid for homology directed DNA repair.
22. The cell according to claim 21, wherein the donor nucleic acid comprises, in the following order, a first homology arm that is homologous to a first region flanking a target site in the genome of said cell on a first side of said target site, optionally a donor sequence of interest to be inserted into genomic DNA of said cell at said target site, and a second homology arm that is homologous to a second region flanking said target site on the second side of said target site.
23. The cell according to claim 22, said donor sequence comprising, preferably consisting of, up to 15,000, preferably up to 10,000, and more preferably up to 20,000 nucleotides in length.
24. The cell according to any one of claims 20 to 23, further comprising a guide RNA (gRNA) or a guide construct encoding said guide RNA, said guide RNA being capable of binding to the site-specific endonuclease and of directing the site-specific endonuclease to a target site in the genome of said cell.
25. The cell according to any one of claims 20 to 24, wherein said cell is a plant cell.
26. A non-human organism, preferably a plant, comprising a cell according to any one of claims 20 to 25.
27. A kit for editing endogenous DNA in a eukaryotic cell or in a eukaryotic organism at a target site of the endogenous DNA, the kit comprising a donor nucleic acid as defined in claim 22 or a donor construct comprising said donor nucleic acid and a fusion protein for editing endogenous DNA, comprising a site-specific endonuclease and a 5’-3’ exonuclease, wherein the site-specific endonuclease and the 5’-3’ exonuclease are fused via a polypeptide linker, and wherein the polypeptide linker has a length of 25 amino acid residues or more, preferably 30 amino acid residues or more, and wherein the site-specific endonuclease is preferably as defined in claim 8, and wherein the 5’-3’ exonuclease is a protein whose amino acid sequence is or comprises
(i) the amino acid sequence defined in SEQ ID NO: 32 (UL12-1) or SEQ ID NO: 33 (UL12-2), or
(ii) an amino acid sequence that has at least 80% sequence identity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or
(iii) an amino acid sequence that has at least 90% sequence similarity to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33, or
(iv) an amino acid sequence having from 1 to 120 amino acid substitutions, additions, insertions and/or deletions compared to the amino acid sequence defined in SEQ ID NO: 32 or SEQ ID NO: 33.
28. The kit according to claim 27, wherein the 5’-3’ exonuclease has the same or a higher 5’-3’ exonuclease activity in terms of catalytic efficiency kcat/Km or in terms of the turnover number than the T7 exonuclease (SEQ ID NO: 30) in the in-vitro exonuclease assay described in the description; or has a 5’-3’ exonuclease activity that is at least twice that of the T5 exonuclease (SEQ ID NO: 31) in terms of catalytic efficiency kcat/Km or in terms of the turnover number in the in-vitro exonuclease assay described in the description.
29. The kit for editing endogenous DNA according to claim 27 or 28, wherein the 5’-3’ exonuclease is fused to the N-terminal end or to the C-terminal end of the site-specific endonuclease.
30. A prokaryotic or eukaryotic cell comprising the kit according to any one of claims 27 to 29.
31. The cell according to claim 30, wherein said cell is a plant cell.
32. A non-human organism, preferably a plant, comprising a cell according to claim 30 or 31.
33. A kit for editing endogenous DNA at a target site in a eukaryotic cell or in a eukaryotic organism, the kit comprising
(a) a donor nucleic acid as defined in claim 22 or a donor construct comprising said donor nucleic acid, and
(b) a protein as defined in any one of claims 1 to 16, or a nucleic acid molecule as defined in claim 17, or a nucleic acid construct, plasmid or vector according to claim 18, or a kit according to claim 19 or 27 to 29.
34. The kit according to claim 33, further comprising
(c) a cell of a eukaryotic organism or a eukaryotic organism.
35. The kit according to claim 33 or 34, further comprising
(d) a guide RNA (gRNA) being capable of binding to the site-specific endonuclease and of directing the protein to the target site on the endogenous DNA of said cell or organism; or a nucleic acid molecule encoding said guide RNA.
36. A method for modifying endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site, the method comprising providing the cell or organism with:
(a) a donor nucleic acid as defined in claim 22 and
(b) a protein as defined in any one of claims 1 to 16, or a nucleic acid molecule as defined in claim 17, or a nucleic acid construct, plasmid or vector according to claim 18, or a kit according to claim 19 or 27 to 29,
wherein said modifying of endogenous DNA involves inserting a donor sequence of interest into endogenous DNA at the target site through homology directed repair, and/or involves deleting a sequence in the endogenous DNA at the target site through homology directed repair.
37. The method according to claim 36, said method being a method of inserting a donor sequence of interest into the endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site, wherein a donor sequence of interest contained in said donor nucleic acid is inserted into the endogenous DNA.
38. The method according to claim 36 or 37, further comprising providing the cell or organism with:
(c) a guide RNA (gRNA) capable of binding to the site-specific endonuclease and of directing the protein to said target site in the endogenous DNA of said cell or organism, or with a nucleic acid (guide nucleic acid) encoding said guide RNA.
39. The method according to any one of claims 36 to 38, wherein the donor sequence has a length of up to 15 kbp, more preferably 20 kbp.
40. A method for modifying endogenous DNA of a eukaryotic cell or a eukaryotic organism at a target site, the method comprising providing the cell or organism with: a protein as defined in any one of claims 1 to 16, or a nucleic acid molecule as defined in claim 17, or a nucleic acid construct, plasmid or vector according to claim 18, or a kit according to claim 19 or 27 to 29, wherein said modifying of endogenous DNA involves deleting a sequence of two or more nucleotides in the endogenous DNA upstream and/or downstream of the target site in a random manner, wherein no donor nucleic acid or donor sequence is provided.
41. Use of the protein according to any one of claims 1 to 16 or the kit according to claims 19 or 27 to 29 for gene editing in a eukaryotic cell.
42. A cell or a eukaryotic organism generated by the method according to any one of claims 36 to 40.
EP21814696.7A 2020-11-11 2021-11-10 Fusion protein for editing endogenous dna of a eukaryotic cell Pending EP4243608A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP20206999 2020-11-11
EP21197718 2021-09-20
PCT/EP2021/081279 WO2022101286A1 (en) 2020-11-11 2021-11-10 Fusion protein for editing endogenous dna of a eukaryotic cell

Publications (1)

Publication Number Publication Date
EP4243608A1 (en) 2023-09-20

Family

ID=78770582

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21814696.7A Pending EP4243608A1 (en) 2020-11-11 2021-11-10 Fusion protein for editing endogenous dna of a eukaryotic cell

Country Status (2)

Country Link
EP (1) EP4243608A1 (en)
WO (1) WO2022101286A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230392160A1 (en) * 2022-04-12 2023-12-07 John Innes Centre Compositions and methods for increasing genome editing efficiency

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2811889A (en) 1987-10-20 1989-05-23 Plant Genetic Systems N.V. A process for the production of biologically active peptide via the expression of modified storage seed protein genes in transgenic plants
GB8810120D0 (en) 1988-04-28 1988-06-02 Plant Genetic Systems Nv Transgenic nuclear male sterile plants
AU1192392A (en) 1991-02-08 1992-09-07 Plant Genetic Systems N.V. Stamen-specific promoters from rice
DE69533037T2 (en) 1994-08-30 2005-05-04 Commonwealth Scientific And Industrial Research Organisation PLANT-TREATMENT REGULATOR OF CIRCOVIRUS
AU718082B2 (en) 1995-10-06 2000-04-06 Plant Genetic Systems N.V. Seed shattering
US20100291633A1 (en) 2007-09-03 2010-11-18 Thorsten Selmer Method of cloning at least one nucleic acid molecule of interest using type iis restriction endonucleases, and corresponding cloning vectors, kits and system using type iis restriction endonucleases
EP2395087A1 (en) 2010-06-11 2011-12-14 Icon Genetics GmbH System and method of modular cloning
EP2418283A1 (en) 2010-08-07 2012-02-15 Nomad Bioscience GmbH Process of transfecting plants
WO2012118717A2 (en) * 2011-02-28 2012-09-07 Seattle Children's Research Institute Coupling endonucleases with end-processing enzymes drive high efficiency gene disruption
ES2919864T3 (en) 2012-10-05 2022-07-28 Biontech Delivery Tech Gmbh Hydroxylated polyamine derivatives as transfection reagents
CN112370532A (en) 2012-10-08 2021-02-19 生物技术传送科技有限责任公司 Carboxylated polyamine derivatives as transfection reagents
PT2898075E (en) 2012-12-12 2016-06-16 Harvard College Engineering and optimization of improved systems, methods and enzyme compositions for sequence manipulation
PT2896697E (en) 2012-12-12 2015-12-31 Massachusetts Inst Technology Engineering of systems, methods and optimized guide compositions for sequence manipulation
WO2016025759A1 (en) * 2014-08-14 2016-02-18 Shen Yuelei Dna knock-in system
US9790490B2 (en) 2015-06-18 2017-10-17 The Broad Institute Inc. CRISPR enzymes and systems
US20170175140A1 (en) 2015-12-16 2017-06-22 Regents Of The University Of Minnesota Methods for using a 5'-exonuclease to increase homologous recombination in eukaryotic cells
US9896696B2 (en) 2016-02-15 2018-02-20 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
EP3575396A1 (en) * 2018-06-01 2019-12-04 Algentech SAS Gene targeting
US11873322B2 (en) * 2018-06-25 2024-01-16 Yeda Research And Development Co. Ltd. Systems and methods for increasing efficiency of genome editing
GB2596660B (en) * 2019-01-07 2023-09-13 Crisp Hr Therapeutics Inc A non-toxic Cas9 enzyme and application thereof

Also Published As

Publication number Publication date
WO2022101286A1 (en) 2022-05-19

Similar Documents

Publication Publication Date Title
EP3365440B1 (en) Restoring function to a non-functional gene product via guided cas systems and methods of use
De Pater et al. ZFN‐induced mutagenesis and gene‐targeting in Arabidopsis through Agrobacterium‐mediated floral dip transformation
AU2016265560B2 (en) Rapid characterization of Cas endonuclease systems, PAM sequences and guide RNA elements
EP3110945B1 (en) Compositions and methods for site directed genomic modification
Tzfira et al. Site-specific integration of Agrobacterium tumefaciens T-DNA via double-stranded intermediates
KR20210104068A (en) Novel CRISPR-CAS system for genome editing
US11584936B2 (en) Targeted viral-mediated plant genome editing using CRISPR /Cas9
EP3036327B1 (en) Genome modification using guide polynucleotide/cas endonuclease systems and methods of use
JP2018531024A6 (en) Methods and compositions for marker-free genome modification
CN106086047B (en) Targeted genome engineering
CN112020554A (en) Novel CAS9 orthologs
US20160060637A1 (en) Improved Gene Targeting and Nucleic Acid Carrier Molecule, In Particular for Use in Plants
CN112126637B (en) Adenosine deaminase and related biological material and application thereof
CN116391038A (en) Engineered Cas endonuclease variants for improved genome editing
EP4243608A1 (en) Fusion protein for editing endogenous dna of a eukaryotic cell
JP2023515116A (en) A novel CRISPR-CAS system for genome editing
US20160222395A1 (en) Agrobacterium-mediated genome modification without t-dna integration
CN116615226A (en) Fusion proteins for editing endogenous DNA of eukaryotic cells
TW201945537A (en) Cloning vector, kit, and method for specifically inducing mutagenesis in chloroplast genes, and transgenic plant cells and agrobacterium generated by the same
BR112017024535B1 (en) IN VITRO METHOD FOR IDENTIFICATION OF A PROTOSPACER ADJACENT MOTIVE SEQUENCE (PAM)

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20230322

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS