WO2019238772A1

WO2019238772A1 - Polynucleotide constructs and methods of gene editing using cpf1

Info

Publication number: WO2019238772A1
Application number: PCT/EP2019/065382
Authority: WO
Inventors: Jan Gerrit SCHAART; Marinus Johannes Maria Smulders
Original assignee: Stichting Wageningen Research
Priority date: 2018-06-13
Filing date: 2019-06-12
Publication date: 2019-12-19
Also published as: GB201809709D0

Abstract

Described is a repair process for genetic material, Complementarity Directed End Joining (CDEJ). CDEJ essentially follows the same repair mechanism as alternative Non-Homologous End Joining (aNHEJ), except that the 5' extensions provide the complementarity. A specifically designed excision construct is provided whereby, after cutting with Cpf1, the DNA ends are joined via perfectly matching complementary 5' ends. Annealing and ligation generate a precise and predictable repair. Taking advantage of precise CDEJ repair, there is provided a completely removable marker selection system which can be removed by stable or (ideally) transient expression of CRISPR Cpf1 targeting the distal ends of the marker sequence. Molecular analysis of the gene-edited products produced using the selection system shows that neither the length of the crRNA nor the nature of the target sequence affected precision or efficiency.

Description

POLYNUCLEOTIDE CONSTRUCTS AND METHODS OF GENE EDITING USING CPF1

FIELD OF ART

The present invention relates to biology, in particular to gene editing and the modification of genomes of living organisms using Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) gene editing technology. More particularly, the invention relates to gene editing and genome modification of organisms using CRISPR from Prevotella and Francisella 1 (Cpf1).

BACKGROUND

Emerging technologies such as genome editing promise the ability to modify the genomes of living organisms with unprecedented accuracy and precision. There is a wide range of applications in fields such as medicine, genetics, industry and plant breeding.

Traditionally, plant breeding has relied on incorporating traits of interest into elite lines through backcrossing; this process is time-consuming, takes multiple generations, and is of limited use in vegetatively propagated plants and plants with long generation times. One way to overcome these barriers is plant transformation, which uses Agrobacterium or particle bombardment to incorporate genes at random locations in the plant genome to obtain desirable traits, but its use is heavily limited by regulation. Another way to introduce desired traits is chemical or radiation-induced mutagenesis, but this is random and unpredictable. Genome editing promises to accelerate plant breeding by transferring or modifying genes in elite backgrounds at precise genomic locations.

Current Editing Systems

Various systems have been developed to perform genome editing, including zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), meganucleases and, most recently, clustered regularly interspaced short palindromic repeats (CRISPR)-associated (Cas) proteins. For a review see: Gaj, T., Gersbach, C. A. & Barbas III, C. F. ZFN, TALEN, and CRISPR/Cas-based methods for genome engineering. Trends in biotechnology 31, 397-405 (2013). Each of the main systems is shown schematically in Figure 1 hereto. These systems all work on the same principle: each contains a sequence-targeting domain that recognizes specific DNA sequences and a nuclease domain that creates DNA breaks. The sequence-targeting domains are, respectively, zinc fingers (ZF), modified transcription activator-like effectors (TALE), or a crRNA (CRISPR RNA) in the Cas complex. In ZFNs and TALENs, the ZFs and TALEs are linked to an endonuclease such as FokI to make DNA breaks, whereas CRISPR-Cas contains its nuclease domains within the Cas complex. DNA double-stranded breaks (DSBs) become the site of modification through low-fidelity repair mechanisms or through template/donor integration.

CRISPR Cas

CRISPR-Cas refers to a diverse group of bacterial and archaeal adaptive immune systems (see: Wright, A. V., Nunez, J. K. & Doudna, J. A. Biology and applications of CRISPR systems: harnessing nature’s toolbox for genome engineering. Cell 164, 29-44 (2016)). These systems have evolved as a defence against phages: when a cell is infected by a new phage, the CRISPR-Cas machinery cuts up the phage DNA and inserts pieces of it into the CRISPR array, which consists of repeat sequences separated by spacers. The arrays are transcribed and processed, and the resulting RNAs serve as guides that allow the nuclease to target and break up any homologous phage sequence, thereby providing immunity to that phage. The Cas complex also contains a domain that recognizes a protospacer adjacent motif (PAM), which limits cleavage of endogenous sequences within the CRISPR array: cleavage is only allowed adjacent to a specific short nucleotide sequence that is not found within the CRISPR array, which prevents self-cleavage.

Cas proteins are very diverse in structure because they occur in such a wide variety of organisms, but they have so far been classified into six types across two classes. Types I, III and IV are classified as class 1 systems, since their effector complex consists of multiple protein subunits. Types II, V and VI are classified as class 2 systems, as their effector complex consists of a single protein subunit. Class 2 proteins are the most applicable to genome editing, as the single subunit allows easier cloning and transformation as well as simpler complex assembly in vivo. The two systems that have received the most attention for genome editing are the CRISPR-Cas9 and CRISPR-Cpf1 systems, which are types II and V respectively.

CRISPR-Cas9

CRISPR-Cas9 was the first CRISPR system to be widely applied for genome editing (see: Lander, E. S. The heroes of CRISPR. Cell 164, 18-28 (2016)). It differed from previous genome editing systems in its ease of use: systems that use TALE or zinc finger proteins to target sequences require more complex design and construction of the targeting domains. In CRISPR-Cas9, the sequence targeting system consists of a PAM-interacting domain and a crRNA. The crRNA also binds to a tracrRNA (trans-activating crRNA) in the complex. Later development of the CRISPR-Cas9 system for expression in eukaryotes led to a fusion of crRNA and tracrRNA, known as a single-guide RNA (sgRNA). The two main Cas9 systems used to date are derived from the Streptococcus thermophilus and Streptococcus pyogenes Cas9 proteins and RNAs; the modified versions are codon-optimized for humans and plants, carry a nuclear localization signal and use an sgRNA. These systems have been used successfully to edit DNA in various animals, fungi, oomycetes and plants. The Cas9 systems do, however, have drawbacks. The drawbacks relevant to the marker excision system described later are that the double-stranded break (DSB) falls within the target sequence and that Cas9 creates blunt ends at DSBs; both are best explained in comparison with CRISPR-Cpf1.

CRISPR-Cpf1 (Cas12a)

Due to the various issues with Cas9, attention has turned to a more recently discovered Cas protein known as Cpf1 (CRISPR from Prevotella and Francisella 1). Three Cpf1 systems have been well characterized, from Francisella novicida U112, Acidaminococcus sp. BV3L6 and Lachnospiraceae bacterium ND2006 (see Zetsche, B. et al. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell 163, 759-771 (2015)). As mentioned earlier, this system consists of a single protein subunit like Cas9, but it differs in key ways. Cpf1 recognizes a 5’-TTTV PAM at the 5’ end of the target sequence, whereas Cas9 recognizes a 5’-NGG PAM at the 3’ end of the target sequence. The PAM positioning in Cpf1 allows DNA cleavage beyond the 3’ end of the target sequence when truncated crRNAs are used, whilst Cas9 cuts proximal to the PAM. Cpf1 also leaves a 4 to 5 nucleotide 5’ overhang after a DSB in vitro, which could be exploited for more targeted sequence insertions or ligations, whereas Cas9 leaves blunt ends.
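To make the geometry of the staggered Cpf1 cut concrete, the following minimal Python sketch computes where each strand is broken relative to a 5’-TTTV PAM and which nucleotides form the 5’ overhang. The cut positions (after protospacer nucleotide 18 on the PAM-containing strand and after nucleotide 23 on the complementary strand) are assumed defaults corresponding to the staggered cut commonly reported for FnCpf1 in vitro; actual cut positions and overhang lengths vary. The function names and the example sequence are hypothetical.

```python
"""Minimal model of a staggered Cpf1 (Cas12a) cut next to a 5'-TTTV PAM.

Assumed defaults (hypothetical, FnCpf1-like in vitro values): the
PAM-containing strand is cut after protospacer nucleotide 18 and the
complementary strand after nucleotide 23. Real cut sites vary.
"""


def is_tttv(pam: str) -> bool:
    """True if pam is 5'-TTTV, where V is A, C or G."""
    return len(pam) == 4 and pam.startswith("TTT") and pam[3] in "ACG"


def cpf1_cut(top: str, pam_start: int, pam_strand_cut: int = 18,
             other_strand_cut: int = 23) -> tuple[str, str, str]:
    """Cut a duplex whose top strand carries a 5'-TTTV PAM at pam_start.

    Returns (left_top, right_top, overhang): the two top-strand pieces and
    the 5' overhang left on the PAM-distal piece, read 5'->3' on the top strand.
    """
    pam = top[pam_start:pam_start + 4]
    if not is_tttv(pam):
        raise ValueError(f"no 5'-TTTV PAM at position {pam_start}: {pam!r}")
    protospacer = pam_start + 4              # index of protospacer nucleotide 1
    top_break = protospacer + pam_strand_cut
    bottom_break = protospacer + other_strand_cut
    # The top strand is broken before the bottom strand, so the top-strand
    # nucleotides in between are unpaired at the 5' end of the right-hand piece.
    overhang = top[top_break:bottom_break]
    return top[:top_break], top[top_break:], overhang


# Hypothetical target: PAM, 18-nt recognition sequence, 5-nt distal sequence.
site = "GG" + "TTTA" + "ACGTACGTACGTACGTAC" + "GATCC" + "AAAA"
left, right, overhang = cpf1_cut(site, pam_start=2)
print(overhang, len(overhang))  # GATCC 5
```

With these defaults the overhang is simply the 5 nt that follow an 18-nt recognition sequence, which is why a truncated crRNA places the cut beyond the 3’ end of the recognized target.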

Cpf1 also utilizes a single crRNA as a guide, while Cas9 uses either a crRNA with a tracrRNA or a fusion sgRNA.

CRISPR Cpf1 can act as an alternative to the widely used CRISPR Cas9 system.

Previously suggested applications (Zetsche et al. (2015), supra) include expanded use in organisms with AT-rich genomes, based on the T-rich PAM; insertions not dependent on Homology Directed Repair (HDR), based on the overhangs left by Cpf1; and improved HDR, based on the ability of Cpf1 to cut multiple times before deletions or HDR affect the core target site.

Various publications describing in vivo experiments in which Cpf1 was used to cut DNA show a wide range of overhangs left by LbCpf1, from large 5’ overhangs to even 3’ overhangs: Kim, D. et al. Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells. Nature biotechnology 34, 863 (2016); Yan, W. X. et al. BLISS is a versatile and quantitative method for genome-wide profiling of DNA double-strand breaks. Nature communications 8, 15058 (2017); and Lei, C. et al. The CCTL (Cpf1-assisted Cutting and Taq DNA ligase-assisted Ligation) method for efficient editing of large DNA constructs in vitro. Nucleic acids research 45, e74-e74 (2017).

LbCpf1 has been used in research on genetic modification of plants; see, for example: Wang, M., Mao, Y., Lu, Y., Tao, X. & Zhu, J.-k. Multiplex gene editing in rice using the CRISPR-Cpf1 system. Molecular plant 10, 1011-1013 (2017); Tang, X. et al. A CRISPR-Cpf1 system for efficient genome editing and transcriptional repression in plants. Nature plants 3, 17018 (2017); Xu, R. et al. Generation of targeted mutant rice using a CRISPR-Cpf1 system. Plant biotechnology journal 15, 713-717 (2017); Kim, H. et al. CRISPR/Cpf1-mediated DNA-free plant genome editing. Nature Communications 8, 14406 (2017); Begemann, M. B. et al. Precise insertion and guided editing of higher plant genomes using Cpf1 CRISPR nucleases. Scientific reports 7, 11606 (2017).

AsCpf1 and FnCpf1 have also been used successfully in plants (see, e.g., Hu X, Wang C, Liu Q, Fu Y, Wang K (2017) Targeted mutagenesis in rice using CRISPR-Cpf1 system. J Genet Genomics 44(1):71-73).

WO2017/015015 EMORY UNIVERSITY discloses general applications of Cpf1. It describes insertion of nucleic acids into genetic material, but as a result of HDR of Cpf1-induced DNA nicks or breaks.

GB2531454 SNIPR TECHNOLOGIES LTD describes a method for excision of a donor DNA fragment with homology arms from a circular vector using CRISPR-Cas to provide recombinogenic nucleic acid strands. The excised donor DNA fragment is used for HDR in target cells. The free ends of the linearized donor DNA fragments promote homologous recombination.

DNA repair systems

Currently the main system used for targeted DNA insertion is Homology Directed Repair (HDR). HDR is largely active in the S and G2 phases of the cell cycle, which limits its application in non-dividing cells (Orthwein, A. et al. A mechanism for the suppression of homologous recombination in G1 cells. Nature 528, 422 (2015)). HDR also occurs less often than other repair mechanisms (Shrivastav, M., De Haro, L. P. & Nickoloff, J. A. Regulation of DNA double-strand break repair pathway choice. Cell research 18, 134 (2008)).

Other DNA insertion techniques have been devised that exploit other repair mechanisms, such as classical Non-Homologous End Joining (cNHEJ) and alternative Non-Homologous End Joining (aNHEJ). Homology-independent targeted integration (HITI) exploits cNHEJ: DSBs are made at a target site while donor strands are provided, and the cell then integrates these strands through cNHEJ (see Auer, T. O., Duroure, K., De Cian, A., Concordet, J.-P. & Del Bene, F. Highly efficient CRISPR/Cas9-mediated knock-in in zebrafish by homology-independent DNA repair. Genome research 24, 142-153 (2014); Maresca, M., Lin, V. G., Guo, N. & Yang, Y. Obligate ligation-gated recombination (ObLiGaRe): custom-designed nuclease-mediated targeted integration through nonhomologous end joining. Genome research 23, 539-546 (2013); and Suzuki, K. et al. In vivo genome editing via CRISPR/Cas9 mediated homology-independent targeted integration. Nature 540, 144 (2016)).

Nakade, S. et al. Microhomology-mediated end-joining-dependent integration of donor DNA in cells and animals using TALENs and CRISPR/Cas9. Nature communications 5, 5560 (2014) describes another technique, called precise integration into target chromosome (PITCh), which exploits the aNHEJ pathway. This system also makes DSBs while providing donor strands; however, the donor strands are flanked by short repeats with microhomology to the sequences flanking the chromosomal break, to promote use of the aNHEJ pathway.

Orlando, S. J. et al. Zinc-finger nuclease-driven targeted integration into mammalian genomes using donors with limited chromosomal homology. Nucleic acids research 38, e152-e152 (2010) describes other attempts to exploit aNHEJ by providing donor strands flanked by 50 to 100 bp of homology. This study also explored the use of the 4 bp 5’ overhangs left by ZFNs for targeted insertion and achieved targeted insertion at rates comparable to HDR.

WO2018/099475 SHANGHAI INST FOR BIOLOGICAL SCIENCES CHINESE ACADEMY OF SCIENCES discloses a site-directed editing method for plant genomes based on Cpf1. In particular, it discloses a nucleic acid construct, vector or vector combination for Cpf1-based (AsCpf1, FnCpf1, LbCpf1) site-directed editing of plant genomes. The nucleic acid construct comprises a first expression cassette and an optional second expression cassette. The first expression cassette is a Cpf1-NLS fusion protein expression cassette; the second is a crRNA expression cassette. With this method, single-gene knockout, multiple-gene knockout, or homologous recombination and directed insertion of foreign fragments can be performed simply and efficiently at a predetermined plant genomic site.

Removable selectable markers

A limitation of some genome editing applications, such as HDR or other forms of DNA integration, is the lack of selectability unless a selectable marker is incorporated or a selectable change is made in a gene (e.g. introducing herbicide tolerance). A removable selectable marker has previously been developed for producing marker-free transgenic plants (Schaart, J. G., Krens, F. A., Pelgrom, K. T., Mendes, O. & Rouwendal, G. J. Effective production of marker-free transgenic strawberry plants using inducible site-specific recombination and a bifunctional selectable marker gene. Plant biotechnology journal 2, 233-240 (2004); see also EP 1264891 A1 Plant Research International B.V.). This system uses a fusion of the codA and nptII markers (CN), which allows both negative and positive selection, together with an inducible recombinase gene, flanked by recombination sites. When introduced together with a transgene, this fusion marker allows positive selection on kanamycin, followed by negative selection on 5-fluorocytosine after induced, recombinase-mediated removal of the marker-recombinase selection unit. This system, however, does not give complete excision: it leaves behind recombination sites, which is not desirable.

WO2018/025206 A1 Kyoto University discloses a method of producing a cell having a scarless modified genome sequence, in which an exogenous nucleic acid sequence that has been inserted into a target region of the genome is completely excised. The exogenous nucleic acid sequence comprises, at each end, a nucleic acid sequence homologous to a genome sequence in the targeted region, and one or more sequence-specific nuclease recognition site(s) between the two homologous sequences. The method comprises: (1) introducing the sequence-specific nuclease, or a nucleic acid encoding it, into a host cell whose genome contains the inserted exogenous nucleic acid sequence; and (2) culturing the cell obtained in step (1), thereby causing a double-strand break at the nuclease recognition site(s) and subsequent microhomology-mediated end joining or single-strand annealing between the resulting broken ends that contain the homologous sequences, to generate a cell with a scarlessly reverted genome sequence from which the exogenous nucleic acid sequence is completely excised. The disclosed method therefore relies on microhomology-mediated end joining following a double-strand break to excise a previously integrated exogenous nucleic acid fragment from the genome. The exogenous nucleic acid must therefore comprise spaced-apart microhomology regions outside the nuclease recognition sites.

An object of the invention is to provide an improved method of gene editing in organisms which incorporates seamless removal of the construct used.

BRIEF SUMMARY OF THE DISCLOSURE

The inventors have discovered that processed Cpf1 ends undergo precise repair. Without wishing to be bound by any particular theory, the inventors describe herein a novel repair mechanism, called Complementarity Directed End Joining (CDEJ). CDEJ essentially follows the same repair mechanism as aNHEJ, except that the 5’ extensions provide the complementarity. By specifically designing the excision construct, the inventors expect that joining of DNA ends with perfectly matching complementary 5’ ends by annealing and ligation will result in precise and predictable repair.

Taking advantage of precise CDEJ repair, the inventors have constructed a completely removable selection system designed to be removed by stable or (ideally) transient expression of CRISPR Cpf1 targeting the distal ends of the marker sequence. Molecular analysis of the gene-edited products produced using the selection system revealed, surprisingly, that neither the length of the crRNA nor the nature of the target sequence affected precision or efficiency.

The inventors have found that, by flanking the same fusion marker with Cpf1 target sequences, it is possible to exploit the ability of Cpf1 to cut outside its target site and to leave 5 bp 5’ overhangs, which then allow complete excision by precise CDEJ.

By incorporating such a marker in a template/donor for HDR or other targeted integrations, it is now possible to select for integrations as well as select for the precise targeted mutations carried by the template followed by seamless marker removal (see Figures 2A and 2B).

Accordingly, the present invention provides a double stranded DNA polynucleotide for insertion into DNA of an organism at a desired target locus, which target locus comprises a DNA sequence cleavable by a site-directed nuclease enzyme, the polynucleotide comprising, in linear order:

(a) a nucleotide sequence with homology to a portion of the target locus upstream of the sequence cleavable by the nuclease;

(b) a first Cpf1 target sequence;

(c) at least one selected nucleotide sequence;

(d) a second Cpf1 target sequence in an inverse orientation to (b); and

(e) a nucleotide sequence with homology to a portion of the target locus downstream of the sequence cleavable by the nuclease;

wherein the sequence (a) and/or the sequence (e) includes at least one change in sequence compared to the respective homologous sequence portion in the DNA of the organism; and wherein the nuclease cleavable sequence of the target locus includes the same sequence that is cleaved by a Cpf1-crRNA guide complex, when acting at (b) and (d).

In the invention, 5’ overhangs are created by Cpf1 cleavage at artificial target sites. The 5’ overhangs are used to achieve “seamless” ligation of the two cleaved sites after excision of a DNA fragment (the marker). The sequence of the 5’ overhangs is at least similar to the nucleotide sequence at the genomic target locus, so that when an inserted marker fragment is excised with Cpf1, subsequent ligation of the sticky ends restores the original genomic sequence. There is therefore accurate and precise ligation of complementary 5’ overhangs. This is what is meant by the term ‘seamless’ excision as used herein.
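The seamless excision described above can be checked on paper with the following Python sketch. It builds a hypothetical insertion in which Cpf1 site (b) is read on the bottom strand and site (d), in inverse orientation, on the top strand, so that both PAMs lie next to the marker and both cuts fall at the outer ends of the insert; it then joins the outer fragments only if their 5’ overhangs are complementary. This orientation is one consistent reading of the design, not a detail confirmed by this document, and the 5’-TTTA PAM, the 18-nt recognition sequence, the 18/23 cut positions and all sequences are assumptions. For simplicity the arms carry no edit here.

```python
"""Sketch: seamless excision of (b)-(c)-(d) by two Cpf1 cuts and CDEJ-like
joining of complementary 5' overhangs. All sequences are hypothetical."""

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def cut_pam_on_top(pam_start: int) -> tuple[int, int]:
    """(top, bottom) break positions for a PAM read on the top strand.

    A break position p means the strand is broken between p - 1 and p.
    """
    protospacer = pam_start + 4
    return protospacer + 18, protospacer + 23


def cut_pam_on_bottom(rc_pam_start: int) -> tuple[int, int]:
    """(top, bottom) break positions for a PAM read on the bottom strand.

    rc_pam_start is where the reverse-complemented PAM begins on the top
    strand; protospacer nucleotide k then sits at top index rc_pam_start - k.
    """
    return rc_pam_start - 23, rc_pam_start - 18


def excise_and_join(top: str, left_cut: tuple[int, int],
                    right_cut: tuple[int, int]) -> str:
    """Remove the fragment between two cuts and ligate the outer fragments.

    The left fragment keeps a bottom-strand 5' overhang and the right fragment
    a top-strand 5' overhang; read on the top strand they must be identical
    (i.e. the two single strands are complementary) to anneal.
    """
    (top_l, bottom_l), (top_r, bottom_r) = left_cut, right_cut
    left_overhang = top[top_l:bottom_l]
    right_overhang = top[top_r:bottom_r]
    if left_overhang != right_overhang:
        raise ValueError(f"overhangs do not anneal: {left_overhang} / {right_overhang}")
    return top[:top_l] + top[top_r:]


# Hypothetical components; OH is the native cleavable sequence of the locus.
PAM, REC, OH = "TTTA", "ACGTTGCAACGTTGCAAC", "GATCC"
arm_a, arm_e, marker = "CCCCGGGGCCCC", "AAAAGGGGAAAA", "ATGGCTAGCTAA"

native = arm_a + OH + arm_e
site_b = OH + revcomp(REC) + revcomp(PAM)   # (b): PAM read on the bottom strand
site_d = PAM + REC + OH                     # (d): inverse orientation, PAM on top
construct = arm_a + site_b + marker + site_d + arm_e

left_cut = cut_pam_on_bottom(len(arm_a) + len(OH) + len(REC))
right_cut = cut_pam_on_top(len(arm_a) + len(site_b) + len(marker))
repaired = excise_and_join(construct, left_cut, right_cut)
print(repaired == native)  # True
```

In this model the 5-nt native sequence appears once at each end of the insert, and it is exactly this duplicated stretch that becomes the two complementary overhangs; if (b) were instead the plain reverse complement of (d), the overhangs would not match and `excise_and_join` would refuse to ligate.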

The double stranded DNA polynucleotide may also be considered a DNA “insertion template”. The DNA of an organism may be that found in a cell of the organism, whether a prokaryote or a eukaryote, single-celled or multicellular. Such DNA can be in the form of chromosomal DNA. The DNA may be genomic, or it may be associated with a subcellular compartment such as a mitochondrion or, particularly in plants, a plastid. The target locus of the DNA of an organism may be of any desired length or location in the genomic material (nuclear or otherwise), and there is no real maximum length to consider. The target may be as long as entire coding regions including control elements, e.g. hundreds, thousands, tens of thousands or hundreds of thousands of contiguous nucleotides; or as short as the specific endonuclease- or CRISPR enzyme-directed cutting site plus additional contiguous nucleotides upstream and downstream that function as homology arms. These homology arms may be between 20 bp and 1000 bp long. In homology-independent targeted integration (HITI), no specific length is required and sequences are simply inserted at the location of a double-strand break.

For nucleotide sequence (a), which has homology to a portion of the target locus upstream of the sequence cleavable by the nuclease, the portion of the target locus to which there is homology may range from 100% to about 0.001% of the desired target. In some instances, the sequence cleavable by the nuclease and the portion of the target locus to which there is homology in (a) may be spaced apart by a number of contiguous nucleotides. The spacing may be any number of nucleotides, for example in the range 1 to 1000 nucleotides, optionally 1-100, 1-50 or 1-20 nucleotides. The same applies to nucleotide sequence (e), which has homology to a portion of the target locus downstream of the sequence cleavable by the nuclease. Sequences (a) and (e) may each be as short as 20 contiguous nucleotides and as long as 100,000 nucleotides.

The sequences (a) and (e) above may be the same length or different lengths.
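As an illustration of how arms (a) and (e) relate to the nuclease-cleavable sequence, the hypothetical Python helper below takes both arms from a known locus sequence, with optional spacing between each arm and the cleavable sequence, before any intended change is introduced into (a) or (e). The 20 to 100,000 nt length limits follow the range given above; the function name, parameters and example locus are illustrative assumptions.

```python
def homology_arms(locus: str, site_start: int, site_len: int, len_a: int,
                  len_e: int, spacing_a: int = 0, spacing_e: int = 0) -> tuple[str, str]:
    """Return arms (a) and (e) flanking locus[site_start:site_start + site_len].

    spacing_a / spacing_e are the numbers of locus nucleotides left between
    each arm and the nuclease-cleavable sequence.
    """
    for name, length in (("a", len_a), ("e", len_e)):
        if not 20 <= length <= 100_000:
            raise ValueError(f"arm ({name}) of {length} nt is outside 20-100,000 nt")
    a_end = site_start - spacing_a                 # arm (a) ends here (exclusive)
    e_start = site_start + site_len + spacing_e    # arm (e) starts here
    if a_end - len_a < 0 or e_start + len_e > len(locus):
        raise ValueError("locus sequence too short for the requested arms")
    return locus[a_end - len_a:a_end], locus[e_start:e_start + len_e]


# Hypothetical locus with the cleavable sequence GATCC at index 30.
locus = "G" * 10 + "A" * 20 + "GATCC" + "C" * 20 + "T" * 10
arm_a, arm_e = homology_arms(locus, site_start=30, site_len=5, len_a=20, len_e=20)
print(arm_a, arm_e)
```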

The term “homology” as used herein to define a nucleotide sequence may also be considered in terms of percentage identity to a target or reference nucleotide sequence. Sequence identity may be determined using a global alignment algorithm known in the art, such as the Needleman-Wunsch algorithm in the program GAP (GCG Wisconsin Package, Accelrys). Alignment and percentage sequence identity can also be determined in various other ways within the skill in the art, for instance using publicly available software such as BLAST, BLAST-2, ALIGN, ALIGN-2 or Megalign (DNASTAR). Appropriate alignment parameters, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared, can be determined by known methods.
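For readers unfamiliar with global alignment, the Python sketch below shows the idea behind a Needleman-Wunsch percent-identity calculation. The scoring values (match +1, mismatch -1, linear gap -2) and the identity definition (identical columns divided by alignment length, gap columns included) are assumptions; GAP, BLAST and similar programs use their own scoring schemes and identity definitions, so their figures can differ.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> float:
    """Percent identity of an optimal Needleman-Wunsch global alignment.

    Identity = identical columns / alignment columns (gap columns included).
    """
    n, m = len(a), len(b)

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back one optimal alignment, counting columns and identities.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 100.0


print(global_identity("ACGTACGT", "ACGTTCGT"))  # 87.5
```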

Where there is “at least one change in sequence” compared to the respective homologous sequence portion in the DNA of the organism, this change may be selected from: a base insertion or insertions, a base deletion or deletions, a base change or changes, or any combination thereof. This encompasses substitution, addition, deletion, inversion or insertion of one or more nucleotide residues in the DNA of the organism. These changes may involve a single nucleotide residue (a point change) or multiple contiguous or non-contiguous nucleotide residues. On a larger scale, the change can be the addition (e.g. duplication or multiplication) of coding or control regions of DNA, including genes and/or promoters, in whole or in part. Deletions of regions of DNA, including coding genes and/or control elements, are also possible. Further possible are the addition, deletion or substitution of a gene and/or control element or parts thereof, e.g. a promoter and/or enhancer upstream of a native gene sequence. In this way gene editing according to the invention may be used to increase or decrease the level of native gene expression.
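The simpler kinds of change listed above can be illustrated with Python's standard `difflib` module, which reports where one sequence must be substituted, deleted or inserted to give another. This is only a convenient approximation for short, similar sequences: `difflib`'s matching heuristic is not a biological alignment, and the function name and output wording are hypothetical.

```python
from difflib import SequenceMatcher


def describe_changes(native: str, edited: str) -> list[str]:
    """Describe the edits that turn `native` into `edited` (1-based positions)."""
    changes = []
    # autojunk=False: DNA has few distinct letters, so difflib's popularity
    # heuristic for long sequences would otherwise discard common bases.
    matcher = SequenceMatcher(None, native, edited, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            changes.append(f"substitution {native[i1:i2]}>{edited[j1:j2]} at {i1 + 1}")
        elif tag == "delete":
            changes.append(f"deletion of {native[i1:i2]} at {i1 + 1}")
        elif tag == "insert":
            changes.append(f"insertion of {edited[j1:j2]} after {i1}")
    return changes


print(describe_changes("GATTACAGATTACA", "GATTACAGTTTACA"))  # ['substitution A>T at 9']
```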

In other aspects, the present invention may allow overexpression (OX) of a gene of interest, usually a gene already present in the genome of the organism, simply by adding further copies of the native gene, with or without additional regulatory elements, which may or may not be heterologous, i.e. non-native and from another strain, variety, species or organism. In some circumstances the site of the at least one change in sequence in (a) is adjacent to (b) and/or the site of the at least one change in sequence in (e) is adjacent to (d). In other circumstances the site of the at least one change may not be adjacent to either (b) or (d), and the change may lie within sequence (a) or (e) further from the first and/or second Cpf1 target sequences.

In a preferred aspect, the at least one change in sequence comprises insertion of a polynucleotide. The inserted polynucleotide may encode a gene of interest (GOI), and a promoter for the GOI may be included upstream of it. This allows the polynucleotides of the invention to be used to introduce additional genes for overexpression, or to introduce heterologous genes for novel expression of products in an organism of interest.

Optionally there may be an additional GOI downstream of the second Cpf1 target sequence (d) and a promoter for the additional GOI upstream of the first Cpf1 target sequence (b). Where the GOI leads to an expressed product which is observable directly or indirectly by assay, this provides a way of checking for excision of the marker (c).

In any aspect of the invention, there is preferably no pair of microhomologous regions located outside of (b), (c) and (d). This is in order to exclude the possibility of microhomology-mediated end joining (MMEJ) following Cpf1 site-directed cleavage.
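The preference above can be screened for computationally. The Python sketch below reports k-mers shared between the break-proximal ends of arms (a) and (e), which would be candidate microhomologies for MMEJ. The k-mer length (5 nt) and window (25 nt) are hypothetical thresholds, not values from this document; real MMEJ can use shorter or longer microhomologies.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def flanking_microhomologies(arm_a: str, arm_e: str, k: int = 5,
                             window: int = 25) -> set[str]:
    """k-mers shared by the 3' end of arm (a) and the 5' end of arm (e).

    Only the last `window` nt of (a) and the first `window` nt of (e) are
    compared, because MMEJ uses microhomology close to the broken ends.
    """
    return kmers(arm_a[-window:], k) & kmers(arm_e[:window], k)


print(flanking_microhomologies("CCCCCCGCATG", "GCATGAAAAAA"))  # {'GCATG'}
```

An empty result for a candidate design means no such pair was found within the chosen window and k-mer length.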

Advantageously, the polynucleotides of the invention are constructed so as to harness the newly observed CDEJ.

Also provided in accordance with the invention is an isolated DNA polynucleotide comprising (b) a first Cpf1 target sequence; (c) a nucleotide sequence encoding at least one marker; and (d) a second Cpf1 target sequence; wherein (b) is upstream of (c); (d) is downstream of (c); and (d) is in inverse orientation to (b). (There is intentionally no item (a) in this paragraph).

Further provided in accordance with the invention is an isolated DNA polynucleotide consisting of (b) a first Cpf1 target sequence; (c) a nucleotide sequence encoding at least one marker; and (d) a second Cpf1 target sequence; wherein (b) is upstream of (c); (d) is downstream of (c); and (d) is in an inverse orientation to (b). (There is intentionally no item (a) in this paragraph).

In accordance with any aspect of the invention, there may be at least two markers which are selection markers, optionally operably linked to at least one promoter for expression in a cell of the organism. The term "operably linked" as used herein refers to a functional linkage between the promoter sequence and the selection marker, such that the promoter sequence is able to initiate transcription of the marker.

Examples of preferred markers are a kanamycin resistance gene, a cytosine deaminase gene, a hygromycin resistance gene and a fluorescent protein gene. They may be used in any combination appropriate to a sequential selection system.

The Cpf1 target sequences (b) and (d), which are in inverse orientation, preferably each consist of a PAM sequence, a crRNA recognition sequence and a distal sequence. The distal sequence may be 4 or 5 contiguous nucleotides; upon Cpf1 cleavage it is the origin of a 5’ overhang, preferably a 4 or 5 base overhang.

The PAM sequence is preferably TTTN. Other suitable sequences for a specific, artificial or modified Cpf1 may be determined by methods well known in the art.

The crRNA recognition sequence may be at least 16 and up to about 24 nucleotides long, e.g. 17, 18, 19, 20, 21, 22 or 23 contiguous nucleotides. It may also be longer than 24 nucleotides, e.g. up to and including 30 nucleotides.

The distal sequence may also comprise additional contiguous nucleotides between the crRNA recognition sequence and the 4-5 nucleotide cleavage sequence. These additional nucleotides are usually few in number, e.g. up to 5, 6, 7, 8 or 9. They may be present as a result of the cloning used to prepare templates, plasmids or vectors of the invention, e.g. Golden Gate cloning and assembly.
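The structural constraints described above can be collected into a small validation routine. The Python sketch below is hypothetical: the class and field names are invented, the site is written 5’ to 3’ on its own PAM strand, and the limits (TTTN PAM, 16 to 30 nt recognition sequence, at most 9 additional nucleotides, 4 or 5 nt cleavage sequence) are taken from the preferred ranges given in the preceding paragraphs.

```python
from dataclasses import dataclass

DNA = set("ACGT")


@dataclass(frozen=True)
class Cpf1TargetSite:
    """Hypothetical model of target sequence (b) or (d), written 5'->3' on
    its own PAM strand. The distal sequence is `extra` (optional, e.g. left
    over from Golden Gate cloning) followed by the 4-5 nt `cleavage` part.
    """
    pam: str
    recognition: str
    cleavage: str
    extra: str = ""

    def __post_init__(self) -> None:
        if not set(self.sequence) <= DNA:
            raise ValueError("only A, C, G and T are allowed")
        if len(self.pam) != 4 or not self.pam.startswith("TTT"):
            raise ValueError("PAM should be TTTN")
        if not 16 <= len(self.recognition) <= 30:
            raise ValueError("crRNA recognition sequence should be 16-30 nt")
        if len(self.extra) > 9:
            raise ValueError("expected at most 9 additional nucleotides")
        if len(self.cleavage) not in (4, 5):
            raise ValueError("cleavage sequence should be 4 or 5 nt")

    @property
    def sequence(self) -> str:
        return self.pam + self.recognition + self.extra + self.cleavage


site = Cpf1TargetSite("TTTA", "ACGTTGCAACGTTGCAAC", "GATCC", extra="GG")
print(len(site.sequence))  # 29
```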

In some aspects of the invention, the Cpf1 target sequence (d) may be identical or substantially identical to the Cpf1 target sequence (b); that is, the two may differ by one, two, three or four nucleotides, which are usually non-contiguous.

In other aspects of the invention, the Cpf1 target sequence (d) may differ from the Cpf1 target sequence (b), except for the 4-5 bp cleavage sequence, which remains the same.

Various PAM and recognition sequence combinations are possible; in some preferred aspects such sequences may be:

(i) 5’ TTTATGTCCCCTGTTGAC 3’ [SEQ ID NO: 1];

(ii) 5’ TTTAGGATGCCACTAAAA 3’ [SEQ ID NO: 2];

(iii) 5’ TTTAGATCGAATCTTCTA 3’ [SEQ ID NO: 3]; or

(iv) 5’ TTTGTGCTAACGCTGATG 3’ [SEQ ID NO: 4].

Other active Cpf1 target sites may be selected from the literature or with target site selection software, such as that available via https://benchling.com.
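A minimal sketch of how candidate target sites can be located: the Python function below scans both strands of a sequence for a 5’-TTTV PAM followed by a protospacer of chosen length. It is not the algorithm used by any particular target selection tool; the PAM pattern, the default protospacer length of 20 nt and the function names are assumptions, and no scoring of on-target activity or off-target risk is attempted.

```python
import re

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def find_cpf1_sites(seq: str, protospacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """Return (strand, pam_index, pam, protospacer) for each 5'-TTTV PAM
    followed by at least `protospacer_len` nucleotides.

    pam_index is counted on the strand being read: '+' is `seq` itself,
    '-' is its reverse complement.
    """
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so that overlapping PAMs are all reported.
        for m in re.finditer(r"(?=TTT[ACG])", s):
            start = m.start()
            protospacer = s[start + 4:start + 4 + protospacer_len]
            if len(protospacer) == protospacer_len:
                hits.append((strand, start, s[start:start + 4], protospacer))
    return hits


print(find_cpf1_sites("TTTT" + "A" * 30))
```

Replacing `TTT[ACG]` with `TTT[ACGT]` would implement the broader TTTN pattern instead.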

Also provided is an isolated RNA molecule encoding a polynucleotide as described herein.

In other aspects, the invention provides a plasmid or vector comprising a polynucleotide as described herein; preferably an expression vector.

In a further aspect, the invention includes a cell transformed with a polynucleotide or a plasmid or vector as described herein.

The invention is applicable to the cells of any organism, whether plants, animals including humans, bacteria or fungi. The invention does not, however, include a process for modifying the germ line genetic identity of human beings.

Also described are cells produced by such methods, and organisms (such as animals, plants, or fungi) comprising or produced from such cells.

In preferred aspects, the invention is useful in the editing and modification of plant genomes, and so the cell is a plant cell. Equally, however, the invention is useful in connection with any prokaryotic or eukaryotic organism, single-celled or multicellular. Any animal genome may be edited or modified using the polynucleotides and methods of the invention as described herein.

The invention therefore provides a method of seamless genetic modification of a cell. In the following aspects, the polynucleotide, vector or plasmid as described herein does not include a Cpf1-encoding sequence from which Cpf1 could be inducibly expressed in a transformed cell. Instead, the Cpf1 and the crRNA that together form the Cpf1 ribonucleoprotein are provided in the other ways defined below.

Accordingly, a method of seamless genetic modification of a cell comprises:

(i) making a double stranded break at a desired target locus in the DNA in a cell of an organism;

(ii) introducing a polynucleotide or a vector or plasmid as herein described into the cell;

(iii) applying a first selection screen which identifies cells wherein the polynucleotide is integrated into the DNA of the cell;

(iv) collecting the identified cells of (iii);

(v) transforming the cells of (iv) with (A) an expression vector comprising a polynucleotide sequence encoding Cpf1 and encoding a crRNA at least substantially complementary to the crRNA recognition sequence in the Cpf1 target sequence in the introduced polynucleotide; or (B) a first expression vector comprising a polynucleotide encoding Cpf1 and a second expression vector or an mRNA encoding a crRNA at least substantially complementary to the crRNA recognition sequence in the Cpf1 target sequence in the introduced polynucleotide;

(vi) applying a second selection screen which identifies cells wherein sequences (b), (c) and (d) of the integrated polynucleotide are excised from the DNA of the cell by Cpf1 cleavage; and

(vii) collecting the identified cells of (vi).
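The two selection screens act as a positive and then a negative filter on the same cell population. The toy model below assumes the codA-nptII fusion marker (exCN) of Figure 2A: nptII confers kanamycin resistance for the first screen, and codA (cytosine deaminase) converts 5-fluorocytosine (5-FC) into toxic 5-fluorouracil, so that in the second screen only cells that have lost the marker survive. The Cell class and its fields are hypothetical and model only marker presence, not any biology beyond that.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical model of one transformed cell line."""
    name: str
    marker_integrated: bool = False  # exCN cassette integrated in step (ii)
    marker_excised: bool = False     # cassette removed by Cpf1 in step (v)

    @property
    def carries_marker(self) -> bool:
        return self.marker_integrated and not self.marker_excised


def first_screen(cells):
    """Step (iii): nptII makes marker-carrying cells kanamycin resistant."""
    return [c for c in cells if c.carries_marker]


def second_screen(cells):
    """Step (vi): codA turns 5-FC into toxic 5-FU, so only marker-free cells survive."""
    return [c for c in cells if not c.carries_marker]


cells = [Cell("a", marker_integrated=True), Cell("b"), Cell("c", marker_integrated=True)]
selected = first_screen(cells)     # keeps a and c; b never integrated the template
selected[0].marker_excised = True  # suppose Cpf1 excision succeeded in a only
final = second_screen(selected)
print([c.name for c in final])     # ['a']
```

Applying the negative screen only to cells collected after the positive screen is what prevents cells that never integrated the template (cell b) from passing the 5-FC step.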

The invention requires the Cpf1 target sites to be designed to include a short stretch of genomic DNA sequence that originates from the target genome. This short stretch of genomic DNA sequence is the exact location at which the selectable marker sequences are inserted. After Cpf1-mediated excision of the marker sequences, the original genomic target sequence is restored. This is what is meant by the term “seamless” excision.

In other words, no detectable trace remains of any selectable marker, or of accompanying sequences associated with its insertion or subsequent removal. Any sequencing analysis or sequence-based probing of a modified cell of the invention would not show any difference from the native gene sequence of the cell prior to modification. A cell of the same genotype can therefore readily be kept aside as a control for comparative analysis with a modified cell of the invention. Alternatively, if the gene sequence of a modified cell is known by inference, because the starting cell used for modification in accordance with the invention is taken from a cell culture, tissue or organism whose genetic sequence information is known, then the known sequence information serves as the control comparator.

In the examples described below, deletion of a selectable marker sequence from a target site in a cell is followed by accurate repair, giving a predicted (predesigned) sequence, i.e. the original genomic sequence at which the selectable marker was inserted.

Therefore, “seamless” in accordance with the invention means that there is no discernible or detectable change in the gene sequence of the modified cell in the target region where the selectable marker had been temporarily located by insertion and then deleted by excision. In other words, the method of the invention provides for the temporary insertion of a marker sequence, followed by its removal, such that the temporary presence of the marker sequence at a target locus in the genome cannot subsequently be shown to have occurred.
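The restoration of the native sequence can be illustrated with a string model of one layout consistent with this description: both Cpf1 sites carry the same native 5-nt stretch in their cleavage region, site (b) is inverted relative to site (d), each PAM faces the marker, and the cuts fall outside the cassette. The sketch assumes the widely reported Cpf1 staggered cut (after protospacer position 18 on the PAM-containing strand and after position 23 on the other strand, leaving 5-nt 5' overhangs), 18-nt spacers and TTTA PAMs. All sequences are hypothetical; the GATAC/GTATC overhang pair from Figure 11 is reused as the native stretch for illustration only.

```python
COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMP)[::-1]


def build_cassette(left, n5, spacer_b, marker, spacer_d, right,
                   pam_b="TTTA", pam_d="TTTA"):
    """Top strand of a hypothetical integrated cassette.

    Site (b) lies on the bottom strand and site (d) on the top strand, so
    (d) is inverted relative to (b). Each PAM faces the marker, and the
    5-nt cleavage region of both sites is the native sequence n5.
    """
    assert len(spacer_b) == len(spacer_d) == 18 and len(n5) == 5
    site_b = revcomp(pam_b + spacer_b + revcomp(n5))  # reads n5 + ... on top
    site_d = pam_d + spacer_d + n5
    return left + site_b + marker + site_d + right


def excise_and_join(top, spacer_b, spacer_d, pam_b="TTTA", pam_d="TTTA"):
    """Simulate Cpf1 cutting at sites (b) and (d), then CDEJ of the ends."""
    i = top.find(revcomp(pam_b + spacer_b))  # start of the (b) spacer, top coords
    j = top.find(pam_d + spacer_d)           # start of the (d) PAM
    if i < 5 or j < 0:
        raise ValueError("Cpf1 target site not found")
    left_end = top[:i - 5]          # top strand cut 23 nt from the (b) PAM
    left_overhang = top[i - 5:i]    # single-stranded on the bottom strand
    k = j + len(pam_d) + 18         # top strand cut 18 nt from the (d) PAM
    right_overhang = top[k:k + 5]   # single-stranded on the top strand
    right_end = top[k + 5:]
    # The two 5' overhangs anneal only if they read the same on the top strand.
    if left_overhang != right_overhang:
        raise ValueError("overhangs are not complementary")
    return left_end + right_overhang + right_end


# Hypothetical example sequences (not from the patent).
left, right = "ACGTTGCA", "CCGGTTAA"
native = "GATAC"
spacer_b = "GCTAGCTAGGATCCAATG"
spacer_d = "CATGCAAGCTTGACTGAC"
marker = "ATGAAACCCGGGTAG"

locus = build_cassette(left, native, spacer_b, marker, spacer_d, right)
edited = excise_and_join(locus, spacer_b, spacer_d)
print(edited == left + native + right)  # True: one copy of the native stretch remains
print(revcomp(native))                  # GTATC, the complementary overhang
```

Because the two sites share only the 5-nt cleavage region, their spacers can differ, which matches the statement that the target sequence in (d) may differ from that in (b) apart from the cleavage sequence.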

Advantageously, in the above method a single transformation step may be used to effect the targeted Cpf1 cutting, although construction of the expression vector may be more involved.

In another way, the invention is provided as a method of seamless genetic modification of a cell, comprising:

(i) making a double stranded break at a desired target locus in the DNA of a cell of an organism;

(ii) introducing a polynucleotide or a vector or plasmid as described herein into the cell;

(iii) applying a first selection screen which identifies cells wherein the polynucleotide is integrated into a chromosome;

(iv) collecting the identified cells of (iii);

(v) transforming the cells of (iv) with an expression vector comprising a polynucleotide sequence encoding Cpf1;

(vi) introducing into the cells of (iv) or (v) a crRNA at least substantially complementary to the crRNA recognition sequence in the Cpf1 target sequence in the introduced polynucleotide;

(vii) applying a second selection screen which identifies cells wherein sequences (b), (c) and (d) of the integrated polynucleotide are excised from the DNA of the cell by Cpf1 cleavage; and

(viii) collecting the identified cells of (vii).

Advantageously this method provides a more straightforward way of effecting Cpf1 targeted cutting of the DNA, by supplying a crRNA directly to cells.

In another method, the invention provides for seamless genetic modification of a cell, the method comprising:

(i) making a double stranded break at a desired target locus in the DNA in the cell of an organism;

(ii) introducing a polynucleotide or vector or plasmid as described herein into the cell;

(iii) applying a first selection screen which identifies cells wherein the polynucleotide is integrated into the DNA of the cell;

(iv) collecting the identified cells of (iii);

(v) introducing into the cells of (iv) a Cpf1-crRNA ribonucleoprotein complex, wherein the crRNA is at least substantially complementary to the crRNA recognition sequence in the Cpf1 target sequence in the introduced polynucleotide;

(vi) applying a second selection screen which identifies cells wherein sequences (b), (c) and (d) of the integrated polynucleotide are excised from the chromosome by Cpf1 cleavage;

(vii) collecting the identified cells of (vi).

For example, step (v) in the above method aspect of the invention may include a biolistic approach on cells, cell-cell fusions, or microinjection. In connection with plants, biolistics may be applied to cells, tissues or plant somatic embryos, e.g. a maize somatic embryo. Also possible is direct gene transfer into plant protoplasts, including PEG-mediated transfer. Other methods of introducing DNA into cells of plants or animals will be readily known to a person of average skill in the art.

In connection with method aspects of the invention where there is a polynucleotide, vector or plasmid as described herein which includes a Cpf1 encoding sequence such that the Cpf1 is inducibly expressed in a transformed cell, the following is provided.

A method of seamless genetic modification of a cell comprises:

(i) making a double stranded break at a desired target locus in the DNA in a cell of an organism;

(ii) introducing a polynucleotide or a vector or plasmid as herein described into the cell;

(iii) applying a first selection screen which identifies cells wherein the polynucleotide is integrated into the DNA of the cell;

(iv) collecting the identified cells of (iii);

(v) transforming the cells of (iv) with an expression vector or mRNA comprising a polynucleotide sequence encoding a crRNA at least substantially complementary to the crRNA recognition sequence in the Cpf1 target sequence in the introduced polynucleotide;

(vi) inducing the expression of Cpf1;

(vii) applying a second selection screen which identifies cells wherein sequences (b), (c) and (d) of the integrated polynucleotide are excised from the DNA of the cell by Cpf1 cleavage; and

(viii) collecting the identified cells of (vii).

In any of the aforementioned methods of the invention, insertion of the polynucleotide into the DNA of an organism usually follows site-specific cutting of that DNA. Such cutting is usually achieved with a site-directed nuclease. Many such site-specific nucleases may be used in accordance with the invention. They include restriction endonucleases, which may be naturally occurring or artificially engineered. Any of the four types of restriction endonuclease may be used:

• Type I (EC 3.1.21.3) cleave at sites remote from a recognition site.

• Type II (EC 3.1.21.4) cleave within or at short specific distances from a recognition site. These are single function (restriction) enzymes independent of methylase.

• Type III (EC 3.1.21.5) cleave at sites a short distance from a recognition site. These enzymes form part of a complex with a modification methylase (EC 2.1.1.72).

• Type IV. These target modified DNA, e.g. methylated, hydroxymethylated and glucosyl-hydroxymethylated DNA.

Examples of such restriction enzymes are well known to a person of skill in the art and there are databases and catalogues and many commercial suppliers, e.g. New England Biolabs, Inc., Thermo Fisher Scientific, Promega Corporation and Sigma-Aldrich, to name a few.

Site-specific DNA cleavage may also be directed by artificial zinc finger nucleases (ZFNs), meganucleases or Transcription activator-like effector nucleases (TALENs).

CRISPR enzymes may also be used together with a suitable targeting RNA, and these principally include Cas9 or Cpf1.

The introduction of the nuclease into the cell in accordance with the invention preferably involves direct transfer, e.g. employing electroporation or liposomes and other methods well known to a person of average skill in the art. Where plant cells are concerned then PEG-mediated transfer into protoplasts would be used.

The resulting DNA cutting may result in blunt ends or alternatively “sticky” ends.

Following DNA cutting, insertion of the DNA polynucleotide into the DNA of the organism relies on innate DNA repair processes. These can involve any of the known mechanisms for repairing double-strand breaks (DSBs), that is to say non-homologous end joining (NHEJ), microhomology-mediated end joining (MMEJ) and homologous recombination (HR).

In accordance with various aspects of the invention, the Cpf1 and crRNA may each individually be comprised in a composition and introduced into a cell individually or collectively. Alternatively, the components may be provided in a single composition.

Therefore, the invention provides methods wherein one or more polynucleotides, such as one or more vectors or one or more transcripts thereof, and/or one or more proteins expressed therefrom, are delivered to a host cell.

In any of the aforementioned methods of the invention, one cell may be transformed to include a Cpf1 gene, optionally an inducible Cpf1 gene, and another cell may be transformed so as to express the crRNA or crRNAs. If there are multiple crRNAs, these are ideally inducibly expressed so that sequential targeted Cpf1 cutting is possible. This is important in embodiments where a segment of DNA inverted by, for example, a recombinase system is not to be removed, but the recombination sites need to be removed seamlessly.

Where plants are concerned, the separate transformation of cells with an (inducible) Cpf1 gene on the one hand, and with the crRNA or crRNAs on the other, means that the respective mature plants can be regenerated and then crossed so that the Cpf1-crRNA complex is activated in the progeny and thereby produces the seamless excision of the desired DNA segment.

In some embodiments, a nucleic acid-targeting effector protein in combination with (and optionally complexed with) a guide RNA is delivered to a cell. Conventional viral and non-viral gene transfer methods can be used to introduce nucleic acids into mammalian cells or target tissues. Such methods can be used to administer nucleic acids encoding components of a nucleic acid-targeting system to cells in culture, or in a host organism.

Non-viral vector delivery systems include DNA plasmids, RNA (e.g. a transcript of a vector described herein), naked nucleic acid, and nucleic acid complexed with a delivery vehicle, such as a liposome. Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell.

Methods of non-viral delivery of nucleic acids which may be used in accordance with the methods of the invention include lipofection, nucleofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Peptide-mediated delivery, e.g. using cell-penetrating peptides, may also be used. In accordance with the invention, the Cpf1 enzyme, optionally together with the guide RNA, may be delivered via a plasmid. In such plasmid compositions, the dosage should be a sufficient amount of plasmid to elicit a response. Plasmids for use in accordance with the invention generally comprise (i) a promoter; (ii) a sequence encoding a CRISPR enzyme, operably linked to said promoter; (iii) a selectable marker; (iv) an origin of replication; and (v) a transcription terminator downstream of and operably linked to (ii). The plasmid can also encode the RNA components of a CRISPR complex, but one or more of these may instead be encoded on a different vector.
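The five plasmid elements listed above can be expressed as a simple checklist. The sketch below uses a hypothetical Element data model with placeholder element names, and reduces "operably linked" to the promoter lying directly upstream and the terminator directly downstream of the CRISPR enzyme coding sequence, which is a simplification of real plasmid design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str  # "promoter", "cds", "marker", "ori" or "terminator"
    name: str


REQUIRED = {"promoter", "cds", "marker", "ori", "terminator"}


def check_plasmid(elements: list[Element]) -> list[str]:
    """Return the problems found; an empty list means the layout passes."""
    kinds = [e.kind for e in elements]
    missing = REQUIRED - set(kinds)
    if missing:
        return [f"missing: {sorted(missing)}"]
    problems = []
    i = kinds.index("cds")  # the CRISPR enzyme coding sequence, element (ii)
    # Simplification: "operably linked" is reduced to adjacency on one strand.
    if i == 0 or kinds[i - 1] != "promoter":
        problems.append("no promoter directly upstream of the CDS")
    if i + 1 == len(kinds) or kinds[i + 1] != "terminator":
        problems.append("no terminator directly downstream of the CDS")
    return problems


# Placeholder element names, for illustration only.
plasmid = [Element("promoter", "p35S"), Element("cds", "LbCpf1"),
           Element("terminator", "tNOS"), Element("marker", "nptII"),
           Element("ori", "ori")]
print(check_plasmid(plasmid))  # []
```

Returning a list of problems rather than a single boolean makes it easy to see which of elements (i) to (v) is missing or misplaced.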

The term "operably linked" as used herein refers to a functional linkage between the promoter sequence and the gene of interest, such that the promoter sequence is able to initiate transcription of the gene of interest.

Where guide RNA is introduced into a cell separately from the Cpf1 enzyme, the RNA molecules may be delivered in liposome or lipofectin formulations and the like, which can be prepared by methods well known to those skilled in the art.

Similarly, the polynucleotide insertion templates of the invention may be introduced into cells using liposomes or nanoparticles.

The Cpf1 and/or the RNAs of the invention may be delivered in RNA form, via microvesicles, liposomes or particles.

In some embodiments, the Cpf1 and guide RNA must be delivered to the nucleus of eukaryotic cells. In other embodiments, the complexes of the present disclosure must be delivered to organelles with genetic information (e.g., chloroplasts and/or mitochondria). In yet other embodiments, the genome-editing tools of the present disclosure are used in organisms without nuclei. Thus, in some embodiments, the present disclosure involves using chimeric Cpf1 polypeptides comprising one or more nuclear localization signals. A nuclear localization signal or sequence (NLS) is an amino acid sequence that 'tags' a protein for import into the cell nucleus by nuclear transport. In some embodiments, this signal consists of one or more short sequences of positively charged lysines or arginines exposed on the protein surface. Clusters of arginines or lysines in nucleus-targeted proteins signal the anchoring of these proteins to specialized transporter molecules found on the complex or in the cytoplasm. In some embodiments, one or more NLS can be genetically linked to a Cpf1 protein, whether within the open reading frame of the Cpf1 gene or at the C-terminus and/or the N-terminus.
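The terminal fusions and lysine/arginine clusters described above can be sketched at the amino-acid level. The code below assumes the well-characterised SV40 large T-antigen NLS (PKKKRKV) and a short hypothetical placeholder sequence in place of a real Cpf1 protein. The window rule in basic_clusters is a crude illustration of "clusters of lysines or arginines", not an NLS predictor.

```python
SV40_NLS = "PKKKRKV"  # SV40 large T-antigen NLS


def add_nls(protein: str, nls: str = SV40_NLS,
            n_term: bool = False, c_term: bool = True) -> str:
    """Fuse an NLS to the N- and/or C-terminus of a protein sequence."""
    if n_term:
        # Keep the initiator methionine as the first residue.
        body = protein[1:] if protein.startswith("M") else protein
        protein = "M" + nls + body
    if c_term:
        protein = protein + nls
    return protein


def basic_clusters(protein: str, window: int = 5, min_basic: int = 4) -> list[int]:
    """Start positions of windows containing at least min_basic K/R residues."""
    return [i for i in range(len(protein) - window + 1)
            if sum(aa in "KR" for aa in protein[i:i + window]) >= min_basic]


placeholder = "MSGTEQDLSA"  # hypothetical fragment, not a real Cpf1 sequence
fused = add_nls(placeholder, n_term=True, c_term=True)
print(fused)                  # MPKKKRKVSGTEQDLSAPKKKRKV
print(basic_clusters(fused))  # [1, 2, 3, 17, 18, 19]
```

The flagged windows fall only within the two fused NLS copies, as expected for a placeholder sequence that contains no basic residues of its own.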

Disclosures of practical relevance on using Cpf1 as a gene-editing tool, which will be known to a person of average skill in the art, include published patent applications WO2017/155407 (Wageningen Universiteit), WO2017/127807 (Broad Institute) and WO2017/181107 (Ohio State Innovation Foundation), each of which is incorporated herein by reference.

A main use of the methods of the invention described herein is to introduce a ‘change’ in the genome or genetic composition of an organism, for example allele replacement; that is to say, replacing one allelic version of a gene (or gene parts) with another or others. This may involve replacing just a single nucleotide, gene fragments or a complete gene (comprising promoter, 5’-UTR, coding sequence with introns, 3’-UTR and terminator sequences). Alternatively, parts of a gene sequence, e.g. just a promoter, or just some promoter elements, or a terminator or any other gene element (e.g., a functional domain), may be replaced in accordance with the invention. The replacement elements may be ‘natural’ sequences (i.e. gene sequences or parts thereof which exist in nature), natural sequences with an introduced small or large mutation, or novel synthetic sequences. The replacement allele may be a more beneficial version, a less beneficial version, or an inactive version. It may have an adapted functionality or a higher, lower or different pattern of expression.

Another utility of the invention may be in follow-up steps after making targeted chromosomal rearrangements with a site-specific recombinase system, for example the cre-lox or R/Rs systems. The design of the recombination sites (Rs) may include flanking Cpf1 target sequences so that, following recombinase activity and inversion of the polynucleotide sequence segment, the Rs sites may be seamlessly removed. More detail is given in the detailed description below.

All aspects of the invention may be applied to any organism, whether animals, plants, bacteria or fungi.

Genes to be replaced may for example be non-functional disease resistance genes (replaced by functional ones, or gene fragments that render them functional).

Promoters may be replaced with others that give a different gene expression level, timing and pattern.

Instead of conferring an irreversible change, the system can be used for the temporary introduction of genes or other DNA sequences with the aim of facilitating biological processes. After the biological process step, the genes can be removed by the system. This may be aimed at (temporary) gene silencing (RNAi), (temporary) gene inactivation (insertional inactivation), induced expression (artificial transcription factors), repressed expression (artificial repressors) or the introduction of new genes or DNA sequences for new activities. In more particular examples, the temporary introduction of genes or other DNA sequences may be used in plant breeding. The temporary state may last for several generations, e.g. early flowering genes intended to speed up generation time in a breeding program but to be omitted from the final product.

In preferred aspects of the invention, the cell may be comprised in a plant tissue and any vector used is preferably introduced into the tissue by agroinfiltration; so a strain of Agrobacterium is used. Therefore, an expression vector encoding Cpf1 and optionally encoding the crRNA, may be introduced into the tissue by agroinfiltration; preferably at the same time.

Where the methods of the invention are employed to edit plant genetic material, the gene-edited cells or tissue are preferably cultured to produce plant callus. The callus is then optionally regenerated to form a plantlet, which may then be grown on into a plant.

It will be apparent to a person of skill in the art that the polynucleotides, vectors, plasmids, cells and methods of the invention have broad application in plant improvement programmes. Described herein are therefore methods of plant breeding which comprise selecting a desired plant, such as an elite plant variety, isolating plant tissue therefrom and subjecting the tissue to a method of the invention for seamless gene editing. The invention therefore helps to speed up the production of new plants and plant varieties for any purpose, whether in agriculture, silviculture, viticulture or horticulture.

The invention therefore also provides kits for seamless genetic modification of a cell, comprising a container which includes a first polynucleotide or a plasmid comprising: (b) a first Cpf1 target sequence; (c) a sequence encoding at least one marker; and (d) a second Cpf1 target sequence; wherein (b) is upstream of (c); (d) is downstream of (c); and (d) is in an inverse orientation to (b).

In certain aspects the kit may comprise a marker as herein described and/or the Cpf1 target sequences (b) and (d) are as defined.

In other aspects, a kit of the invention may further comprise a container which includes a second polynucleotide or plasmid encoding Cpf1. Further, the second polynucleotide or plasmid may further comprise a sequence encoding a crRNA which recognises the Cpf1 target sequence. Alternatively, kits may further comprise a separate container which may either include a crRNA which recognises the Cpf1 target sequence, or a third polynucleotide or plasmid encoding a crRNA which recognises the Cpf1 target sequence.

Elements of any kit described herein may be provided individually or in combinations, and may be provided in any suitable container, such as a vial, a bottle, or a tube. In some embodiments, the kit includes instructions in one or more languages, for example in more than one language.

The invention herein may be used to good effect in excising Agrobacterium T-DNA which has already been integrated into plant genetic material, usually as part of a wider procedure of genetic modification or plant improvement. The plant cell containing the inserted T-DNA may already have undergone one or a number of additional genetic engineering and/or selection steps and so the present invention provides a way of tidying up the genetic material by seamlessly excising the T-DNA in such a way that the resulting genetic material carries no identifiable trace of having been artificially modified.

Agrobacterium-mediated transformation of plants results in integration of a specific part of the Agrobacterium binary vector, called the T-DNA (Transfer-DNA). The T-DNA is defined by Right Border (RB) and Left Border (LB) sequences that have a specific core sequence of 23-25 base pairs. After integration into the plant genome, parts of the RB and LB sequences are co-integrated. Agrobacterium (with binary vectors) may be used to introduce gene sequences, but these genes (or other functional DNA elements) may be unwanted or unnecessary once they have been active and have produced the desired modification. One example is the introduction of genes for sequence-specific nucleases (SSNs) (meganucleases, zinc finger nucleases, TALENs, CRISPR-Cas) for the induction of double-strand DNA breaks, resulting in NHEJ-induced mutations or HDR-mediated DNA integration. After the modifications have been achieved, the SSN genes are undesired and may be removed. A specific CRISPR-Cpf1 design targeting the T-DNA Right Border (RB) and Left Border (LB) sequences (and adjacent T-DNA sequence) may be used to excise the complete T-DNA, or a major part of it (including all integrated genes), from the plant genome.

In addition to T-DNAs with SSNs, Cpf1-mediated excision of T-DNA in accordance with aspects of the present invention may also be applied to genes that facilitate plant breeding steps, for example (1) flowering genes that induce early flowering, aimed at speeding up breeding. This may be of special interest for plant species with a long generation time, such as apple (and other tree species) and tulip. Other genes may (2) induce male sterility, which is helpful in producing hybrid seed, or (3) suppress meiotic recombination for creating homozygous plants. In these three examples, the genes may be removed using the present invention after they have facilitated the breeding step.

Another example is the Cpf1-mediated removal of marker genes in transgenic or cisgenic plants: in the first case, to be able to reuse the marker gene in a repeated transformation step with the same material; in the second case, to remove gene sequences that are not native to the transformed plant species (so as to end up with cisgenic plants, provided the other integrated genes are native to the species).

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention are further described hereinafter with reference to the accompanying drawings, in which:

Figure 1 is a schematic overview of the most widely used genome editing systems, including zinc finger nucleases (ZFN), transcription activator-like effector nucleases (TALEN), CRISPR Cas9 and CRISPR Cpf1, showing their respective DNA targeting domains (zinc fingers (ZF), transcription activator-like effectors (TALE), the Cas9 single guide RNA (sgRNA) complex and the Cpf1 CRISPR RNA (crRNA) complex) as well as their FokI or Cas protein localized cleavage sites. Also shown are well-described DNA repair mechanisms, including HDR, aNHEJ and cNHEJ.

Figure 2A is a schematic overview (not to scale) of the use of the inventors’ excisable selectable marker system for targeted changes in plant cells. The template, flanked by 500 bp of homology, carries the target changes along with the excisable codA-nptII fusion (exCN) and is incorporated into the plant genome by HDR after a double stranded break in the target DNA. Cells with a successful integration of the template are selected for on kanamycin, while cells with failed integrations are selected against. After sufficient selection on kanamycin, the cells are retransformed with a Cpf1 construct designed to cut outside the target sites, precisely excising exCN. Cells with a successful excision of the template are selected for on 5-fluorocytosine, while cells with failed excisions are selected against. The remaining cells are grown into plantlets and screened for the target changes.

Figure 2B is a schematic overview (not to scale) of CRISPR Cpf1-based removal of a selectable marker in a proof of concept linking a promoter to a visual marker coding sequence by CDEJ.

Figure 3A is a schematic view of the removal of recombination sites, showing how Cpf1-mediated seamless excision of the invention is used. Each of the two Rs sequences is flanked by Cpf1 target sites. Different Cpf1 target sites are used for sequential excision of the Rs sequences, thereby preventing excision of the complete inversion.

Figure 3B shows sequential removal of the Rs sites after induced inversion.

Figure 4 shows a Cpf1 target sequence consisting of PAM, 14 bp “specificity” sequence (analogous to a “seed” sequence in native CRISPR systems), cloning sequence and 5’ overhang sequence portions (the DNA sequence shown is random and non-specific).

Figure 5 shows in overview the experiment of seamless excision of a selectable DNA sequence (SEL), which was used to select for HDR events, making use of designed Cpf1 target sites and CRISPR-Cpf1. In the genomic DNA target sequence (top), the base pairs in red are exchanged for two other base pairs by HDR. The 5 base pairs in green mark the native locus, which is used for design of the excision construct and from which the marker is seamlessly excised after HDR.

Figure 6 shows a technique for the removal of internal restriction sites by PCR, showing the base change (boxed), primer-incorporated recognition sites (BpiI; underlined), and the restriction sites (brackets) for incorporation into a level 0 acceptor.

Figure 7 shows a technique for modifying codA and nptII for fusion by PCR, showing the base change of the codA stop codon (boxed), primer-incorporated recognition sites (BpiI; underlined), and the restriction sites (brackets) for incorporation into a level 0 acceptor.

Figure 8 shows a technique for incorporation of the codA-nptII fusion (exCN) expression unit into a coding sequence acceptor by PCR, showing the primer-incorporated PAM (boxed), recognition sites (BpiI; underlined), and the restriction sites (brackets).

Figure 9 shows a technique for incorporation of coding sequence and terminator sequence into a terminator sequence acceptor by PCR, showing the primer-incorporated overhang repeat (boxed), recognition sites (BpiI; underlined), and the restriction sites (brackets).

Figure 10 shows a technique for incorporation of the crRNA expression unit into a level 1 acceptor by PCR, showing the primer-incorporated protospacer (boxed), recognition sites (BsaI; underlined), and the restriction sites (brackets).

Figure 11 shows a (not to scale) schematic overview of CN marker excision by Cpf1 with the exCN1 construct and its corresponding Cpf1 construct with a crRNA targeting length of 23 bp. Cpf1 binds after recognition of the target sequence, including the protospacer adjacent motif (PAM) and protospacer, where it creates a double stranded break (DSB) leaving complementary GATAC/GTATC overhangs, as designed, excising the exCN marker. The remaining ends ligate by CDEJ. Also shown is the triple repeat 3’ to the excision site, which is present in the excision product but not in the positive control. Note: The crRNA sequence terminates in TTTT (not shown), which would occur further upstream in the 21 bp and 18 bp targeting lengths.

Figure 12 shows fluorescence micrographs of plants infiltrated with Agrobacterium strains carrying various combinations of CN DsRed excision constructs with differing target sequences (exCN1, exCN2 and exCN3) and corresponding Cpf1 constructs differing in crRNA targeting length (18 bp, 21 bp, 23 bp) or no Cpf1 construct, along with no-infiltration, MMA (infiltration medium without Agrobacterium) infiltration and positive DsRed controls. Note: Brightness increased 50% for greater clarity.

Figure 13 shows PCR analysis of CN fusion marker excision across the excision site between the p35slong promoter and the DsRed coding sequence or NtAn2 coding sequence, with the appropriate controls (no infiltration, MMA infiltration, positive DsRed, and water) and replicates (A, B, C, and D), testing three sets of target sites in the constructs exCN1, exCN2 and exCN3, each infiltrated with its respective Cpf1 constructs of differing crRNA targeting lengths (18 bp, 21 bp and 23 bp) or no Cpf1.

DETAILED DESCRIPTION

Cpf1 used in accordance with the invention may be any suitable Cpf1 known in the art.

For example, AsCpf1 (from Acidaminococcus) or LbCpf1 (from Lachnospiraceae) or FnCpf1 (from Francisella novicida).

Algorithms are available in the art for finding target sites and for designing guides for Cpf1; see, for example, the publication of Zetsche et al. (2015) supra and WO2018/013990 Zymergen Inc, each of which is incorporated herein by reference.

In the Cpf1 target sequence of the invention there is a sequence which provides the sequence specificity. This may be 14 nucleotides long, but may be shorter, e.g. 10, 11, 12 or 13, or longer, e.g. 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29 or 30 or more nucleotides.

Algorithmic tools are also available in the art for identifying potential off-target sites for a particular guide sequence. For example, Cas-OFFinder may be used to identify potential off-target sites for Cpf1 (see Kim et al., "Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells", published online June 06, 2016).

A preferred mode of operation is to screen and select for unique Cpf1 target sequences.

The filtration level is altered by changing both the length of the seed sequence and the number of occurrences of the sequence in the genome. Algorithms may in addition or alternatively provide the sequence of a guide sequence complementary to the reported target sequence(s), by providing the reverse complement of the identified target sequence.
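A minimal sketch of this filtering step follows. It scans both strands of a genome for TTTV PAMs, takes the downstream protospacer, and keeps sites whose seed (the first seed_len nucleotides after the PAM) occurs no more than max_hits times in the genome. The two filtration parameters are the seed length and the number of occurrences mentioned above. The TTTV rule, the 23-nt protospacer, exact-match counting (tools such as Cas-OFFinder also consider mismatches) and the toy genome are all assumptions made for illustration.

```python
import re

COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMP)[::-1]


def candidate_sites(genome: str, proto_len: int = 23):
    """Yield (strand, pam_pos, protospacer) for each TTTV PAM on both strands.

    pam_pos is counted on the strand that carries the PAM.
    """
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        for m in re.finditer(r"(?=TTT[ACG])", seq):  # lookahead: overlapping PAMs
            proto = seq[m.start() + 4:m.start() + 4 + proto_len]
            if len(proto) == proto_len:
                yield strand, m.start(), proto


def count_hits(genome: str, kmer: str) -> int:
    """Exact, overlapping occurrences of kmer on both strands."""
    pattern = re.compile(f"(?={re.escape(kmer)})")
    return sum(len(pattern.findall(s)) for s in (genome, revcomp(genome)))


def unique_guides(genome: str, seed_len: int = 14, max_hits: int = 1):
    """Keep sites whose seed occurs at most max_hits times in the genome."""
    guides = []
    for strand, pos, proto in candidate_sites(genome):
        if count_hits(genome, proto[:seed_len]) <= max_hits:
            # The crRNA spacer has the protospacer sequence (as RNA), i.e. it
            # is the reverse complement of the target strand.
            guides.append((strand, pos, proto.replace("T", "U")))
    return guides


toy = "GCGCTTTACGATCGGCTAGCCTGACGTCGAGGCGC"  # hypothetical toy genome
print(unique_guides(toy))        # [('+', 4, 'CGAUCGGCUAGCCUGACGUCGAG')]
print(unique_guides(toy + toy))  # []: each seed now occurs twice
```

Raising max_hits or shortening seed_len relaxes the filter; duplicating the toy genome shows a site being rejected once its seed is no longer unique.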

Where the homology arms of the polynucleotides of the invention are concerned, as well as expressing the homology in terms of degree of sequence identity, the homology may instead be expressed in terms of hybridization to a polynucleotide of reference sequence. Hybridization of such sequences may be carried out under stringent conditions. By "stringent conditions" or "stringent hybridization conditions" is intended conditions under which one polynucleotide will hybridize to its target sequence to a detectably greater degree than to other sequences (e.g., at least 2-fold over background). Stringent conditions are sequence dependent and will be different in different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences that are 100% complementary to the polynucleotide can be identified (homologous probing). Alternatively, stringency conditions can be adjusted to allow some mismatching in sequences so that lower degrees of similarity are detected (heterologous probing).

Generally, a polynucleotide is less than about 1000 nucleotides in length, preferably less than 500 nucleotides in length.

Typically, stringent conditions will be those in which the salt concentration is less than about 1.5 M Na⁺ ion, typically about 0.01 to 1.0 M Na⁺ ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is at least about 30 °C for short probes (e.g., 10 to 50 nucleotides) and at least about 60 °C for long probes (e.g., greater than 50 nucleotides). Duration of hybridization is generally less than about 24 hours, usually about 4 to 12 hours. Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.
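The typical ranges in the preceding paragraph can be encoded as a simple check. This sketch treats the "about" values as hard cut-offs and ignores formamide and hybridization time, both of which are simplifications; it only restates the stated ranges and does not compute a melting temperature.

```python
def is_stringent(na_molar: float, ph: float, temp_c: float, probe_len: int) -> bool:
    """Check hybridization conditions against the typical stringent ranges."""
    if probe_len < 10:
        raise ValueError("the stated ranges cover probes of 10 nt or more")
    if not (na_molar < 1.5 and 7.0 <= ph <= 8.3):
        return False
    # Short probes (10-50 nt): at least about 30 C; long probes (>50 nt): about 60 C.
    min_temp = 30.0 if probe_len <= 50 else 60.0
    return temp_c >= min_temp


print(is_stringent(0.1, 7.5, 42.0, 30))   # True
print(is_stringent(0.1, 7.5, 42.0, 200))  # False: a long probe needs at least 60 C
```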

In terms of the percentage identity characterising the extent of homology between the left and right homology arms of a template polynucleotide of the invention and the desired target sequence, the identity may be at least 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%, most preferably at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99%.
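A minimal sketch of how such a percentage identity could be computed follows. It assumes the homology arm and the target region are already aligned, have equal length and contain no gaps; real homology arms would normally first be aligned with a dedicated alignment tool. The function name is hypothetical.

```python
def percent_identity(arm: str, target: str) -> float:
    """Percentage of identical positions between two pre-aligned, equal-length sequences."""
    if len(arm) != len(target) or not arm:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(arm.upper(), target.upper()))
    return 100.0 * matches / len(arm)


identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, identity >= 90.0)  # 90.0 True
```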

A template nucleic acid of the invention may include a sequence which results in a change of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 or more nucleotides of the target sequence. In some embodiments, the template nucleic acid may be 200±50, 300±50, 400±50, 500±50, 600±50, 700±50, 800±50, 900±50, 1000±50, 1100±50, 1200±50, 1300±50, 1400±50, 1500±50, 1600±50, 1700±50, 1800±50, 1900±50, 2000±50, 2100±50, or 2250±50 nucleotides in length. In other embodiments, the template nucleic acid may be 3000±500, 4000±500, 5000±500, 6000±500, 7000±500, 8000±500, 9000±500, 10000±500, 11000±500, 12000±500, 13000±500, 14000±500, 15000±500, 16000±500, 17000±500, 18000±500, 19000±500, 20000±500, 21000±500, or 22000±500 nucleotides in length. For complete gene replacements the length may be as long as the gene in question, often longer than 10,000 bp. Template polynucleotides of the invention may be at least 10000, at least 15000, at least 20000, at least 25000, at least 30000, at least 35000, at least 40000, at least 50000, at least 60000, at least 70000, at least 80000, at least 90000 or at least 100000 nucleotides.

In the templates of the invention, any promoters used may be inducible promoters, such as the Adh1 promoter, which is inducible by hypoxia or cold stress; the Hsp70 promoter, which is inducible by heat stress; and the PPDK promoter and the PEP carboxylase promoter, which are both inducible by light. Also useful are chemically inducible promoters, such as the In2-2 promoter (see US 5,364,780), the estrogen-induced ERE promoter, and the Axig1 promoter, which is auxin induced and tapetum specific in plants but also active in callus. Alternatively, a suitable constitutive promoter may be employed, e.g. for plants the Cauliflower Mosaic Virus 35S gene promoter.

Suitable markers for use in the polynucleotide templates of the invention may include any selectable marker, for example positive or negative selectable markers. Positive selectable markers, such as antibiotic resistance genes, allow the cell to survive antibiotic selection. Negative or counter-selectable markers eliminate or inhibit growth of the cell on selection; for example, thymidine kinase makes the host sensitive to ganciclovir selection. In a specific exemplification of the present invention described later on, the codA gene is used, which expresses cytosine deaminase that converts non-toxic 5-fluorocytosine (5-FC) into the toxic agent 5-fluorouracil (5-FU).

Combined positive/negative selectable markers can serve as both a positive and a negative marker by conferring an advantage to the host under one condition but inhibiting growth under a different condition. An example would be an enzyme that can complement an auxotrophy (positive selection) and also convert a chemical to a toxic compound (negative selection).
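The dual behaviour of such a marker can be sketched as a toy survival rule, using the codA-nptII (CN) fusion described later as the example. This is a simplified model under stated assumptions: kanamycin is assumed as the positive selective agent (nptII confers kanamycin resistance), 5-FC as the negative selective agent (codA converts it to toxic 5-FU), and the `Cell` type and its flag are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    has_cn_marker: bool  # hypothetical flag: codA-nptII cassette present and expressed


def survives(cell: Cell, kanamycin: bool, five_fc: bool) -> bool:
    """Toy model of bifunctional selection with the CN (codA-nptII) marker.

    Positive selection: nptII confers kanamycin resistance, so only
    marker-carrying cells survive kanamycin. Negative selection: codA converts
    5-FC into toxic 5-FU, so only marker-free cells survive 5-FC.
    """
    if kanamycin and not cell.has_cn_marker:
        return False
    if five_fc and cell.has_cn_marker:
        return False
    return True


# Under 5-FC, only cells that no longer carry the marker would survive
print(survives(Cell(has_cn_marker=False), kanamycin=False, five_fc=True))  # True
```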

Some selectable marker choices which can be made in accordance with the invention may include phenotypes. So, for example, in relation to plants, the marker might be a dwarf phenotype or particular petal colour.

Suitable markers may also include visualisable markers, for example a fluorescent reporter protein, e.g. Green Fluorescent Protein (GFP), Yellow Fluorescent Protein (YFP), Red Fluorescent Protein (RFP), Cyan Fluorescent Protein (CFP) or mCherry. Such a fluorescent reporter gene provides a suitable marker for visualisation of protein expression, since its expression can be simply and directly assayed by fluorescence measurement. Alternatively, the reporter nucleic acid may encode a luminescent protein, such as a luciferase (e.g. firefly luciferase), or a chromogenic enzyme that generates an optical signal, such as beta-galactosidase (LacZ) or beta-glucuronidase (GUS). Reporters used for measurement of expression may also be antigen peptide tags. Other reporters or markers are known in the art and may be used as appropriate.

The polynucleotide templates, methods and kits of the invention may be used to edit the genetic material of any kind of cell, whether prokaryotic or eukaryotic: for example, bacterial cells, fungal cells, plant cells, protist cells and animal cells (including human cells but not human embryonic stem cells). Some preferred cells for use in accordance with the present invention are derived from species which typically exhibit high growth rates, are easily cultured and/or transformed or display short generation times, from species which have established genetic resources associated with them, or from species which have been selected, modified or synthesized for optimal expression of heterologous protein under specific conditions. In preferred embodiments of the invention where a protein of interest (from a GOI) is eventually to be used in specific industrial, agricultural, chemical or therapeutic contexts, an appropriate cell may be selected based on the desired specific conditions or cellular context in which the protein of interest is to be deployed. Preferably the cell will be a prokaryotic cell. In preferred embodiments the cell is a bacterial cell, for example an Escherichia coli cell.

The seamlessly edited products of the present invention may not be discernible as being genetically modified, in that the genetic change might equally be one which could arise naturally. There would therefore be no direct way of testing or determining the genetic provenance of a cell or organism which has undergone seamless editing in accordance with the invention; a documented history of the genetic material concerned would be needed to know whether gene editing had taken place. On the other hand, the present invention can also seamlessly introduce genetic changes which would not necessarily arise in nature and could be directly ascertained. Such cells and organisms would be similar to transgenic material made in other ways. By "transgenic" is meant an unnatural locus in the genome, i.e. homologous or, preferably, heterologous expression of the nucleic acids.

A selectable marker sequence in the form of an insert is therefore temporarily inserted into the genome of a cell. The insert may also contain a gene of interest to be inserted into the genome of the cell. Once the marker sequence has fulfilled its intended purpose, which is to screen for successfully transformed cells, it is removed by excision in accordance with the methods of the invention. The ends of the selectable marker insert are designed so that their specific nucleotide sequences ensure cutting by a directed Cpf1 nuclease, which excises the insert and leaves a 5' overhang on each of the two cut ends of the cell's DNA. Because the insert ends are designed with regard to the sequence of the insertion locus in the cell, these 5' overhangs are substantially or strictly complementary to each other, so that they anneal and are subsequently ligated. The seamless nature of this excision and annealing arises from the choice of the particular sequences at the ends of the insertion construct: the resulting excision leaves no trace in the ligated DNA by which any change at that locus could be identified. The insertion constructs of the invention are therefore adaptable by intelligent design and modification using known sequence information of the insertion locus, so as to provide a transient genetic change at the locus that is then fully erased, so that the transient event cannot be identified by probing or sequencing the genetic material of the resulting cell or organism.
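The excision and annealing logic described above can be illustrated with a simplified string model. This is a sketch under stated assumptions, not a biochemical simulation: only the top strand is stored, the cut positions are supplied directly rather than derived from PAM and protospacer positions, the marker and its target sites are represented by a placeholder, and all sequences are hypothetical. It shows why two complementary 5' overhangs, once annealed and ligated, restore the native locus exactly.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def excise_and_join(locus: str, left_cut: int, right_cut: int, overhang_len: int = 5) -> str:
    """Toy model of Cpf1 excision followed by complementarity-directed end joining.

    Only the top strand is stored. The left genomic end keeps its top strand up
    to left_cut and carries a 5' overhang on the bottom strand opposite
    locus[left_cut:left_cut + overhang_len]. The right genomic end starts its top
    strand at right_cut, with 5' overhang locus[right_cut:right_cut + overhang_len].
    """
    left_overhang = reverse_complement(locus[left_cut:left_cut + overhang_len])  # bottom strand, 5'->3'
    right_overhang = locus[right_cut:right_cut + overhang_len]                   # top strand, 5'->3'
    if left_overhang != reverse_complement(right_overhang):
        raise ValueError("overhangs are not complementary: no precise CDEJ join")
    # The overhangs anneal once; ligation restores one continuous top strand
    return locus[:left_cut] + locus[right_cut:]


left, overhang, right = "GGATCCTTAG", "ACGTA", "CCGGAATT"  # hypothetical native locus
marker = "T" * 10                                            # stand-in for marker + target sites
locus = left + overhang + marker + overhang + right
edited = excise_and_join(locus, len(left), len(left + overhang + marker))
print(edited == left + overhang + right)  # True: no trace of the insert remains
```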

The term "seamless" is therefore equivalent to no apparent change in sequence at a locus of insertion for an artificial construct which is inserted and then removed.

A most suitable application of the templates and methods of the invention for seamless editing is in plants, where improvements in varieties are always needed, and at an increasing pace. Transformation of plants is now a routine technique in many species. Advantageously, any of several transformation methods may be used to introduce the gene of interest into a suitable ancestor cell. The methods described for the transformation and regeneration of plants from plant tissues or plant cells may be used for transient or for stable transformation. Transformation methods include the use of liposomes, electroporation, chemicals that increase free DNA uptake, injection of the DNA directly into the plant, particle gun bombardment, transformation using viruses or pollen, and microprojection. Methods may be selected from the calcium/polyethylene glycol method for protoplasts, electroporation of protoplasts, microinjection into plant material, DNA- or RNA-coated particle bombardment, infection with (non-integrative) viruses and the like. Agrobacterium tumefaciens-mediated transformation may also be used.

Plant tissue capable of subsequent clonal propagation, whether by organogenesis or embryogenesis, may be subjected to the seamless editing of the invention and a whole plant then regenerated therefrom. The particular tissue chosen will vary depending on the clonal propagation systems available for, and best suited to, the particular species being transformed. Exemplary tissue targets include leaf disks, pollen, embryos, cotyledons, hypocotyls, megagametophytes, callus tissue, existing meristematic tissue (e.g., apical meristem, axillary buds, and root meristems), and induced meristem tissue (e.g., cotyledon meristem and hypocotyl meristem). Gene edited plants of the invention may be propagated by a variety of means, such as by clonal propagation or classical breeding techniques. For example, a first-generation (T₁) edited plant may be selfed and homozygous second-generation (T₂) edited plants selected, and the T₂ plants may then be further propagated through classical breeding techniques. The generated edited organisms may take a variety of forms: for example, they may be chimeras of edited and non-edited cells; clonal edited organisms (e.g., all cells transformed to contain the expression cassette); or grafts of edited and non-edited tissues (e.g., in plants, an edited rootstock grafted to an unedited scion).
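Selecting homozygous T₂ plants after selfing relies on ordinary Mendelian segregation. Assuming, for illustration, that a T₁ plant carries the edit at a single locus on one chromosome only, a quarter of its T₂ progeny are expected to be homozygous for the edit. The sketch below enumerates gamete combinations to show this; the allele symbols and function name are arbitrary.

```python
from fractions import Fraction
from itertools import product


def self_progeny(genotype: str) -> dict:
    """Genotype frequencies after selfing a diploid at one locus, e.g. 'Ee'.

    Each character is one allele; every gamete combination is equally likely.
    """
    counts: dict = {}
    for a, b in product(genotype, repeat=2):
        zygote = "".join(sorted(a + b))  # 'eE' and 'Ee' are the same genotype
        counts[zygote] = counts.get(zygote, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


# 'E' = edited allele, 'e' = unedited allele (hypothetical symbols)
t2 = self_progeny("Ee")
print(t2["EE"])  # 1/4 of T2 plants expected to be homozygous edited
```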

Where the present invention pertains to plants, this includes whole plants, ancestors and progeny of the plants, and plant parts, including seeds, fruit, shoots, stems, leaves, roots (including tubers), flowers, and tissues and organs, wherein each of the aforementioned comprises genetic material of interest to be edited. The term "plant" also encompasses plant cells, suspension cultures, callus tissue, embryos, meristematic regions, gametophytes, sporophytes, pollen and microspores, again wherein each of the aforementioned comprises the genetic material of interest to be edited.

Plants may include monocots or dicots. A monocot plant may, for example, be selected from the families Arecaceae, Amaryllidaceae or Poaceae. For example, the plant may be a cereal crop, such as wheat, rice, barley, maize, oat, sorghum, rye, millet, buckwheat, turf grass, Italian rye grass, sugarcane or Festuca species, or a crop such as onion, leek, yam or banana.

A dicot plant may be selected from families including, but not limited to, Asteraceae, Brassicaceae (e.g. Brassica napus), Chenopodiaceae, Cucurbitaceae, Leguminosae (Caesalpiniaceae, Aesalpiniaceae, Mimosaceae, Papilionaceae or Fabaceae), Malvaceae, Rosaceae or Solanaceae. For example, the plant may be selected from lettuce, sunflower, Arabidopsis, broccoli, spinach, watermelon, squash, cabbage, tomato, potato, yam, capsicum, tobacco, cotton, okra, apple, rose, strawberry, alfalfa, bean, soybean, field (fava) bean, pea, lentil, peanut, chickpea, apricot, pear, peach, grapevine, bell pepper, chilli or citrus species. In one embodiment, the plant is oilseed rape.

Also included are biofuel and bioenergy crops such as rape/canola, sugar cane, sweet sorghum, Panicum virgatum (switchgrass), linseed, lupin and willow, poplar, poplar hybrids, Miscanthus or gymnosperms, such as loblolly pine. Also included are crops for silage (maize), grazing or fodder (grasses, clover, sainfoin, alfalfa), fibres (e.g. cotton, flax), building materials (e.g. pine, oak), pulping (e.g. poplar), feeder stocks for the chemical industry (e.g. high erucic acid oilseed rape, linseed) and for amenity purposes (e.g. turf grasses for golf courses), ornamentals for public and private gardens (e.g. snapdragon, petunia, roses, geranium, Nicotiana sp.) and plants and cut flowers for the home (e.g. African violets, begonias, chrysanthemums, geraniums, Coleus, spider plants, Dracaena, rubber plant).

Most preferred plants are maize, rice, wheat, oilseed rape/canola, sorghum, soybean, sunflower, alfalfa, potato, tomato, tobacco, grape, barley, pea, bean, field bean, lettuce, cotton, sugar cane, sugar beet, broccoli or other vegetable brassicas or poplar.

As described in more detail in the Examples below, the inventors have constructed a completely removable selection system designed to be removed by transient expression of CRISPR Cpf1 targeted to the distal ends of the marker sequence. The constructs work to provide a viable and precise CDEJ repair. In the context of Nicotiana, the system separates a promoter sequence from a visual marker coding sequence by insertion of an excisable CN marker fusion (exCN; see Figures 2A and 2B).

Also possible in accordance with the invention is the Cpf1-mediated removal of recombination sites, as shown schematically in Figure 3. In one example, recombination sites (Rs) are introduced together with a site-specific recombinase (e.g. the Cre-lox or R/Rs system) to create targeted chromosomal rearrangements. Two recombination sites (each flanked by a marker gene and Cpf1 target sequences) are introduced at different chromosomal locations, making use of selectable homologous recombination with the selection system of the invention. Upon activation of the site-specific recombinase, recombination between the two recombination sites takes place, resulting in inversion, deletion or exchange of chromosomal DNA fragments. Because Cpf1 target sites in accordance with the invention flank each Rs, these sites can then be removed. Using a different Cpf1 target site for each Rs means that the Rs sites can be excised seamlessly one after the other without also excising the desired inversion.
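A simple design check for the sequential excision scheme above is that each distinct Cpf1 target site occurs exactly twice in the locus (once on each side of its own Rs) and nowhere else. The sketch below counts occurrences on both strands, because flanking sites may be placed in inverted orientation. It is an illustration under stated assumptions: the sequences are hypothetical 8-nt stand-ins, the function names are hypothetical, and off-target sites outside the locus are not considered.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def count_on_both_strands(locus: str, site: str) -> int:
    """Count positions where site, or its reverse complement, occurs in locus."""
    rc = reverse_complement(site)
    k = len(site)
    return sum(locus[i:i + k] in (site, rc) for i in range(len(locus) - k + 1))


def check_target_design(locus: str, targets: dict) -> bool:
    """True if every named target occurs exactly twice (flanking one Rs each)."""
    return all(count_on_both_strands(locus, s) == 2 for s in targets.values())


site_a, site_b = "TTTAGCAT", "TTTCGACA"  # hypothetical target sites
# site_a flanks the first Rs; site_b flanks the second Rs in inverted orientation
locus = ("GG" + site_a + "AAAA" + site_a + "CC"
         + site_b + "GGGG" + reverse_complement(site_b) + "TT")
print(check_target_design(locus, {"A": site_a, "B": site_b}))  # True
```

Reusing one target site for both Rs would give four occurrences, and a single crRNA could then excise the whole region, including the desired inversion.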

The inventors made a DNA construct in which the bifunctional marker gene CN (a fusion of codA and nptII; Schaart et al. (2004) supra) was flanked by Cpf1 target sites designed for complete and seamless excision of the CN marker and both Cpf1 target sites. Three different Cpf1 target sites were used, in which the PAM and the 14 bp "seed" sequence were taken from known Cpf1 target sites with high activity: TTTATGTCCCCTGTTGAC [SEQ ID NO: 5] from Kim, D. et al. Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells. Nature Biotechnology 34, 863 (2016); and TTTAGGATGCCACTAAAA [SEQ ID NO: 6] and TTTAGATCGAATCTTCTA [SEQ ID NO: 7] from Kleinstiver, B. P. et al. Genome-wide specificities of CRISPR-Cas Cpf1 nucleases in human cells. Nature Biotechnology 34, 869 (2016). In each sequence the first four nucleotides (TTTA) form the PAM. Figure 4 shows the general structure of a Cpf1 target sequence, although the DNA sequence presented in Figure 4 is random and non-specific. The inventors' design of the Cpf1 target sequence includes a 4 bp segment used for Golden Gate cloning (or another restriction enzyme/ligation-mediated cloning method) and the distal 5 bp, which form the 5'-end overhang. These sequences are homologous to the genomic sequence from which, e.g., selectable marker sequences are excised. The flexible adaptation of the 5 bp 5'-overhang sequence is important: it can, for example, be adapted to a native site from which a marker has to be removed (in the case of HDR, this native site is logically close to the genomic sequence to be replaced). This ensures that no 'foreign' DNA sequences remain at the site of HDR after excision (see Figure 5).
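The layout of one designed target site can be sketched as a 27-nt string: a 4-nt PAM, the 14-nt seed, the 4-bp cloning segment (protospacer nucleotides 15-18, as stated in the construct-building section) and the distal 5 bp. The overhang position in this sketch assumes the commonly reported Cpf1 staggered cut after protospacer nucleotide 18 on the non-target strand and nucleotide 23 on the target strand; the cloning segment and the 5-bp native sequence used below are hypothetical.

```python
PAM_LEN, SEED_LEN, CLONING_LEN, OVERHANG_LEN = 4, 14, 4, 5


def build_target_site(pam_seed: str, cloning_4bp: str, overhang_5bp: str) -> str:
    """Assemble PAM + 14-nt seed + 4-bp cloning segment (protospacer nt 15-18) + 5-bp overhang."""
    if len(pam_seed) != PAM_LEN + SEED_LEN:
        raise ValueError("expected a 4-nt PAM followed by a 14-nt seed")
    if len(cloning_4bp) != CLONING_LEN or len(overhang_5bp) != OVERHANG_LEN:
        raise ValueError("cloning segment must be 4 bp and overhang 5 bp")
    return pam_seed + cloning_4bp + overhang_5bp


def predicted_overhang(site: str) -> str:
    """Protospacer nt 19-23, assuming cuts after nt 18 (non-target) and nt 23 (target strand)."""
    start = PAM_LEN + 18
    return site[start:start + OVERHANG_LEN]


site = build_target_site("TTTATGTCCCCTGTTGAC",  # PAM + seed, SEQ ID NO: 5
                         "GGAG",                # hypothetical Golden Gate overhang
                         "ACGGT")               # hypothetical native 5-bp sequence
print(len(site), site[:PAM_LEN], predicted_overhang(site))  # 27 TTTA ACGGT
```

Because the cloning segment sits upstream of the cut, it leaves with the excised fragment, while the 5-bp overhang matches the native sequence and is all that remains at the locus.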

Each of the constructs (exCN1, exCN2, exCN3) was tested. The DNA fragment with CN and both Cpf1 target sites was placed between a cauliflower mosaic virus 35S promoter (pCaMV35S) and the coding sequence of the red fluorescent protein gene (DsRed). Excision of the CN marker gene would join the pCaMV35S and DsRed sequences, resulting in DsRed gene expression and visible red fluorescence. This construct was tested in a transient expression assay using agroinfiltration of Nicotiana benthamiana leaves. For this assay the described DNA construct for marker excision was cloned into a plant expression vector (binary vector) and transferred to Agrobacterium tumefaciens cells. A second plant expression vector, harbouring the LbCpf1 gene and a specific crRNA targeting the Cpf1 target sites used (three crRNA lengths, 18 bp, 21 bp and 23 bp, were tested), was also transferred to A. tumefaciens cells. A mixed bacterial culture containing Agrobacteria with the marker excision construct and Agrobacteria with Cpf1 was infiltrated into N. benthamiana leaves (greenhouse plants) for expression of the constructs. Four days after agroinfiltration the infiltrated leaves were harvested and analysed for red fluorescence and marker excision.

DsRed fluorescence was seen in leaf tissue agroinfiltrated with both the marker construct and Cpf1 + crRNA, showing excision of the marker sequences. Cpf1 with 23 bp crRNAs gave the highest level of DsRed expression for marker excision constructs exCN2 and exCN3.

Molecular analysis showed that a PCR amplifying the DNA sequence between pCaMV35S and DsRed yields a fragment of about 3000 bp in the 'no Cpf1' samples, indicating the presence of an intact, non-excised excision construct. All samples infiltrated with Cpf1 and 18 bp, 21 bp or 23 bp crRNAs show a PCR fragment of 500 bp, indicating excision of the marker in these samples. PCR of the positive control gives a fragment of similar length. The unexpected 500 bp fragment in the control samples was due to amplification of positive control DNA present as contamination (confirmed by sequencing of this fragment). Sequencing of the PCR products showed that in all cases accurate, seamless excision of the marker sequences had taken place.
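The band-size interpretation above can be sketched as a small classifier. It assumes the two reference sizes given in the text (about 3000 bp for the intact construct and 500 bp after excision, a difference of roughly 2.5 kb) and an arbitrary 10% sizing tolerance; the function name and tolerance are hypothetical, and, as the contaminated control shows, sequencing remains necessary to confirm a seamless junction.

```python
def classify_amplicon(size_bp: int, intact_bp: int = 3000, excised_bp: int = 500,
                      tolerance: float = 0.1) -> str:
    """Label a pCaMV35S-DsRed PCR band as intact construct, excised, or unexpected."""
    if abs(size_bp - intact_bp) <= tolerance * intact_bp:
        return "intact (no excision)"
    if abs(size_bp - excised_bp) <= tolerance * excised_bp:
        return "excised"
    return "unexpected"


for band in (2950, 510, 1500):
    print(band, classify_amplicon(band))
```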

Example 1: Seamless Cpf1 excision of a bifunctional marker gene inserted into Nicotiana benthamiana

Materials & Methods

Construct Building

Constructs were built using the Golden Gate Modular Cloning System (see: Auer, T. O., Duroure, K., De Cian, A., Concordet, J.-P. & Del Bene, F. Highly efficient CRISPR/Cas9-mediated knock-in in zebrafish by homology-independent DNA repair. Genome Research 24, 142-153 (2014)). Library Efficiency™ DH5α™ cells from Invitrogen were transformed with ligation products. Cells with level 0 constructs were plated on LB agar with 50 mg/L spectinomycin, 20 mg/mL X-gal and 10 mM IPTG. Cells with level 1 constructs were plated on LB agar with 50 mg/L carbenicillin or ampicillin, 20 mg/mL X-gal and 10 mM IPTG and grown overnight at 37 °C. Cells with level 2 constructs were plated on LB agar with 50 mg/L kanamycin and grown overnight at 37 °C. Level 0 and 1 colonies were screened by blue-white screening and level 2 colonies by red-white screening. White colonies were grown overnight at 37 °C and 250 RPM in liquid LB medium with 50 mg/L of the appropriate antibiotic. Clones were further screened by colony PCR, and the PCR products of positive clones were sequenced. Clones with correct sequences were grown out and their plasmids were isolated using the QIAprep Spin Miniprep Kit from QIAGEN and further sequenced.

Level 0 Acceptors

Level 0 acceptors used came from the Addgene plasmid repository:

Plasmid | Addgene ID* | Description
PL0A01 | pICH41308 | Level 0 acceptor for CDS1 modules
PL0A02 | pICH41276 | Level 0 acceptor for 3U + Ter modules
PL1A01 | pICH47732 | Level 1 acceptor position 1 forward
PL1A02 | pICH47742 | Level 1 acceptor position 2 forward
PL1A03 | pICH47751 | Level 1 acceptor position 3 forward
PL1A04 | pICH47761 | Level 1 acceptor position 4 forward
PL2A01 | pICSL4723 | Level 2 acceptor

*Weber, E., Engler, C., Gruetzner, R., Werner, S. & Marillonnet, S. A modular cloning system for standardized assembly of multigene constructs. PLoS ONE 6, e16765 (2011).

Level 0 codA Coding Sequence

The codA sequence in a vector was cloned into the Level 0 CDS acceptor (PL0A01). This sequence contained a BpiI restriction site which had to be removed, so the coding sequence was amplified as two components which could be ligated together into PL0A01. Two primer pairs were used: one pair (PRCL001/PRCL004) for the coding sequence upstream of the BpiI site and one pair (PRCL002/PRCL003) for the coding sequence downstream of the BpiI site. PRCL004 was designed to span the BpiI site and change one base, removing the site by a silent mutation. The primers were designed with flanking BpiI sites to facilitate ligation into a level 0 acceptor (see Figure 6). A high-fidelity PCR was used to amplify both amplicons with the codA vector as template; the amplicons were then isolated by gel electrophoresis and purified using the Zymoclean™ Gel DNA Recovery Kit from ZYMO RESEARCH. These were then used in a ligation reaction with PL0A01 as an acceptor.
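The site-removal ("domestication") step above can be checked computationally: scan a coding sequence for the BpiI recognition sequence GAAGAC on both strands, then confirm that a proposed single-base change removes every site without altering the encoded protein. The sketch below uses the standard genetic code and a toy open reading frame; the real codA sequence and the actual base changed by PRCL004 are not given here, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
BPII_SITE = "GAAGAC"  # BpiI recognition sequence (the enzyme cuts outside the site)

# Standard genetic code, codons ordered T, C, A, G at each position
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = BPII_SITE) -> list:
    """Start positions of site on either strand of seq."""
    rc = reverse_complement(site)
    k = len(site)
    return [i for i in range(len(seq) - k + 1) if seq[i:i + k] in (site, rc)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def check_domestication(cds: str, pos: int, new_base: str):
    """Apply one substitution; report whether it is silent and removes all sites."""
    mutated = cds[:pos] + new_base + cds[pos + 1:]
    ok = translate(mutated) == translate(cds) and not find_sites(mutated)
    return ok, mutated


cds = "ATGGAAGACTAA"                     # toy ORF: Met-Glu-Asp-stop with one BpiI site
print(find_sites(cds))                   # [3]
print(check_domestication(cds, 8, "T"))  # (True, 'ATGGAAGATTAA'): GAC->GAT, both Asp
```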

The high-fidelity PCR reaction mix:

Component | Amount
DreamTaq Buffer (10x) | 1.5 µL
dNTPs (5 mM) | 0.6 µL
Forward primer (10 µM) | 0.3 µL
Reverse primer (10 µM) | 0.3 µL
DreamTaq | 0.075 µL
Pfu Ultra II | 0.0075 µL
Template | 1 ng plasmid or 50 ng cDNA
Milli-Q | Up to 15 µL
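When many reactions are run, the per-reaction volumes above are usually scaled into a master mix. The sketch below does this arithmetic under stated assumptions: the template volume is not given in the text and is passed in as a parameter, a single primer pair is shared by all reactions, and the 10% pipetting excess is an arbitrary default. Names are hypothetical.

```python
# Per-reaction volumes (µL) from the table above; template is added separately
PER_REACTION_UL = {
    "DreamTaq Buffer (10x)": 1.5,
    "dNTPs (5 mM)": 0.6,
    "Forward primer (10 µM)": 0.3,
    "Reverse primer (10 µM)": 0.3,
    "DreamTaq": 0.075,
    "Pfu Ultra II": 0.0075,
}
REACTION_VOLUME_UL = 15.0


def master_mix(n_reactions: int, template_ul: float, excess: float = 0.1) -> dict:
    """Scale component volumes for n reactions; Milli-Q makes each reaction up to 15 µL."""
    water_per_rxn = REACTION_VOLUME_UL - sum(PER_REACTION_UL.values()) - template_ul
    if water_per_rxn < 0:
        raise ValueError("template volume too large for a 15 µL reaction")
    factor = n_reactions * (1 + excess)
    mix = {name: round(ul * factor, 4) for name, ul in PER_REACTION_UL.items()}
    mix["Milli-Q"] = round(water_per_rxn * factor, 4)
    return mix


for name, ul in master_mix(10, template_ul=1.0).items():
    print(f"{name}: {ul} µL")
```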

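The high-fidelity PCR thermocycling program:

Step | Temperature | Time | Cycles
Initial denaturation | 95 °C | 2:00 min | 1x
Denaturation | 95 °C | 0:30 min | 35x
Annealing | 55 °C | 0:30 min | 35x
Extension | 72 °C | 1:00 min/kb | 35x
Final extension | 72 °C | 7:00 min | 1x
Hold | 10 °C | ∞ | 1x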
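Because the extension time scales with amplicon length (1:00 min/kb), the run time of this program depends on the product being amplified. The sketch below computes the total programmed hold time, ignoring ramp times and the final 10 °C hold; rounding the extension time up to whole seconds is an assumption, and the names are hypothetical.

```python
import math

# Fixed steps from the program above, in seconds
INITIAL_DENATURATION_S = 120
DENATURATION_S = 30
ANNEALING_S = 30
FINAL_EXTENSION_S = 7 * 60


def extension_s(amplicon_bp: int, s_per_kb: int = 60) -> int:
    """Extension time at 1:00 min/kb, rounded up to whole seconds."""
    return math.ceil(amplicon_bp / 1000 * s_per_kb)


def program_minutes(amplicon_bp: int, cycles: int = 35) -> float:
    """Total programmed time, ignoring ramping and the final 10 °C hold."""
    per_cycle = DENATURATION_S + ANNEALING_S + extension_s(amplicon_bp)
    total_s = INITIAL_DENATURATION_S + cycles * per_cycle + FINAL_EXTENSION_S
    return total_s / 60


print(program_minutes(3000))  # 149.0 (about 3 kb product)
print(program_minutes(500))   # 61.5 (about 0.5 kb product)
```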

Level 0 CN Fusion

The CN sequence in the vector was cloned into the Level 0 CDS acceptor (PL0A01). The codA portion contained the same BpiI site, so a new fusion protein was created based on the sequence of the original fusion protein. Two primer pairs (PRCL005/PRCL006 and PRCL007/PRCL008) were used to amplify codA and nptII respectively. The new primers were designed to amplify the coding sequence of codA while modifying its stop codon, and to amplify nptII while adding a linking sequence upstream of the nptII start codon. The primers were designed with flanking BpiI sites to facilitate ligation into PL0A01 (see Figure 7). A high-fidelity PCR was used to amplify both amplicons with PL0C11 and PL1C001 as respective templates; the amplicons were then isolated by gel electrophoresis and purified using the Zymoclean™ Gel DNA Recovery Kit from ZYMO RESEARCH. These were then used in a ligation reaction with PL0A01 as an acceptor.

Ligation reaction mix for incorporation of PCR fragments into level 0 acceptors:

Component | Amount
Buffer G (10x) | 2 µL
BpiI (10 U/µL) | 1 µL
T4 DNA Ligase, HC (30 U/µL) | 1 µL
Level 0 acceptor | 200 ng
PCR product 1 | 2 µL
PCR product 2* | 2 µL
ATP (10 mM) | 2 µL
Milli-Q | Up to 20 µL

*Some reactions only contained a single PCR product.

Level 0 NtAn2 coding sequence

The NtAn2 coding sequence was isolated from cDNA after isolation from gDNA failed.

Pink N. tabacum flower petals were used for RNA isolation of transcripts of the NtAn2 transcription factor. RNA was isolated using the RNeasy Mini Kit from QIAGEN and checked for quality by gel electrophoresis. cDNA was reverse transcribed from the isolated RNA using the iScript™ cDNA Synthesis Kit from BIO-RAD. Primers were designed to amplify the coding sequence of the NtAn2 gene and to add flanking BpiI sites to facilitate ligation into PL0A01. A high-fidelity PCR was used to amplify the coding sequence with cDNA as a template. The PCR product was then used directly in a ligation reaction with PL0A01 as an acceptor.

Level 0 Excisable CN Fusion Expression Unit

For ease of further cloning, the excisable CN fusion expression unit (exCN) was designed to fit into the PL0A01 acceptor plasmid; this would allow the original kit promoter to be used for further ligations and would only require the creation of a coding sequence plus terminator in the PL0A02 acceptor plasmid for the visual markers. To allow incorporation of the expression unit into the PL0A01 acceptor plasmid, it had to be flanked by BpiI restriction sites, and to allow excision it also needed to be flanked by Cpf1 target sites. To ensure complete excision, the restriction sites used for cloning were incorporated inside the Cpf1 target upstream of the cleavage sites, i.e. they were placed at nucleotides 15-18 of the protospacer region. Three different upstream regions of the target sequence, including the PAM, were taken from human target sequences reported in the literature to show the highest indel rates, as shown in Table 1 below.

Table 1: Designed target sites used to flank the three different excisable codA-nptII fusion expression units (exCN), with the PAM (red) and the first 14 bases of the protospacer region (black) taken from the literature, and the restriction cleavage site (green) required for Golden Gate cloning.

¹Kim, D. et al. Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells. Nature Biotechnology 34, 863 (2016).

²Kleinstiver, B. P. et al. Genome-wide specificities of CRISPR-Cas Cpf1 nucleases in human cells. Nature biotechnology 34, 869 (2016).

Three primer pairs (PRCL011/PRCL012, PRCL013/PRCL014 and PRCL015/PRCL016) were designed to amplify the entire CN fusion expression unit; the primers were flanked by the inverse of the respective target sequences and by BpiI sites for further cloning (see Figure 8). A high-fidelity PCR was used to amplify the expression unit with PL1C006 as a template. The PCR products were then used directly in a ligation reaction with PL0A01 as an acceptor.

Level 0 Promoterless DsRed and NtAn2

To allow ligation of the promoter and exCN expression unit to the DsRed and NtAn2 coding sequences followed by a terminator, custom components had to be designed. Level 1 expression units PL1C007 and PL1C008 were first made. The coding sequence and terminator had to be put into the PL0A02 acceptor, so primer pairs (PRCL017/PRCL019 and PRCL018/PRCL019) were designed to amplify the coding sequence and terminator flanked by BpiI sites. To increase the likelihood of complete excision, PRCL017 and PRCL018 also incorporated a triple repeat of the 5 bp overhang sequence that would be left after Cpf1 excision on both sides of the excision (see Figure 9). A high-fidelity PCR was used to amplify the expression units with PL1C007 and PL1C008 as respective templates. The PCR products were then used directly in a ligation reaction with PL0A02 as an acceptor.

Level 0 Remaining Components

The remaining level 0 components came from the Addgene plasmid repository.

Level 0 Plasmids:

Level 1 Acceptors

Level 1 acceptors used came from the Addgene plasmid repository.

Level 1 mature crRNA

A base mature crRNA expression unit was made so that it could be used as a template for the production of new crRNAs by PCR. A complete mature crRNA was created using two complementary primers (PRCR02/PRCR03) which spanned the entire crRNA and were flanked by the BsaI restriction sites required for ligation into a level 1 acceptor together with the U6 promoter from PL0C04 (see Figure 10). The primers were annealed to each other in elution buffer using a thermocycler program that gradually decreased the temperature from 80 °C to 20 °C. The annealing product was used directly in a ligation reaction with PL1A03 as an acceptor.
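The crRNA set used in this example comprises three protospacer lengths (18, 21 and 23 nt) for each of six targets (left and right sites of three designs), giving the 18 plasmids named Left/Right 1-3 with suffixes -18, -21 and -23 in the level 1 component list. The sketch below enumerates such a set. It assumes, as an illustration only, that shorter spacers are obtained by trimming the PAM-distal (3') end of a 23-nt protospacer, which keeps the seed intact; the protospacer sequences are hypothetical placeholders.

```python
CRRNA_LENGTHS = (18, 21, 23)


def spacer_variants(protospacers: dict, lengths=CRRNA_LENGTHS) -> dict:
    """Derive spacers of several lengths by trimming the PAM-distal (3') end."""
    variants = {}
    for name, seq in protospacers.items():
        if len(seq) < max(lengths):
            raise ValueError(f"{name}: protospacer shorter than {max(lengths)} nt")
        for n in lengths:
            variants[f"{name}-{n}"] = seq[:n]
    return variants


# Hypothetical 23-nt protospacers for the six targets (three designs x left/right)
demo = {f"{side}{i}": "ACGT" * 5 + "ACG" for side in ("Left", "Right") for i in (1, 2, 3)}
crrnas = spacer_variants(demo)
print(len(crrnas), sorted(crrnas)[:3])  # 18 ['Left1-18', 'Left1-21', 'Left1-23']
```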

Further mature crRNAs were created by PCR according to the protocol developed by Slaman, E. Expanding the CRISPR Toolbox in Tomato. MSc thesis, Wageningen University and Research (2017). The forward primer (PRCR01) annealed to the upstream region of the U6 promoter and was flanked by a BsaI site for ligation into either PL1A03 or PL1A04. The reverse primers were designed to anneal to the upstream region of the crRNA and a portion of the downstream region of the U6 promoter; the unique protospacer sequence was incorporated into the 5' portion of the primer and was flanked by a BsaI site for further ligation. Three protospacer lengths, 18, 21 and 23 base pairs, for each of the 6 targets were incorporated into primers. Each crRNA expression unit was amplified in a high-fidelity PCR. The PCR products were then used directly in a ligation reaction with plasmid PL1A03 or PL1A04 as an acceptor.

Level 1 LbCpf1

A Level 1 position 2 LbCpf1 expression construct containing a 2xCaMV35S promoter and a NOS terminator was provided.

Level 1 Remaining Components

The remaining level 1 components were built using standard ligation reactions using the previously built or obtained level 0 components.

Plasmid | Addgene ID* or source | Description
PL1C001 | pICSL11024 | Pnos+5'UTRomega+nptII+3'UTR+Tocs
PL1C002 | pICH41744 | L2E
PL1C003 | pICH41780 | L4E
PL1C004 | Biosciences | pICH47742::2xCaMV35S+5'UTRomega+hLbCpf1+3'UTR+Tnos
PL1C005 | Assembled | pICH47732::Pnos+5'UTRomega+codA+3'UTR+Tocs
PL1C006 | Assembled | pICH47732::Pnos+codA-nptII+3'UTR+Tocs
PL1C013 | Assembled | pICH47751::pU6+maturecrRNALeft2-18
PL1C014 | Assembled | pICH47761::pU6+maturecrRNARight2-18
PL1C015 | Assembled | pICH47751::pU6+maturecrRNALeft3-18
PL1C016 | Assembled | pICH47761::pU6+maturecrRNARight3-18
PL1C017 | Assembled | pICH47751::pU6+maturecrRNALeft1-21
PL1C018 | Assembled | pICH47761::pU6+maturecrRNARight1-21
PL1C019 | Assembled | pICH47751::pU6+maturecrRNALeft2-21
PL1C020 | Assembled | pICH47761::pU6+maturecrRNARight2-21
PL1C021 | Assembled | pICH47751::pU6+maturecrRNALeft3-21
PL1C022 | Assembled | pICH47761::pU6+maturecrRNARight3-21
PL1C023 | Assembled | pICH47751::pU6+maturecrRNALeft1-23
PL1C024 | Assembled | pICH47761::pU6+maturecrRNARight1-23
PL1C025 | Assembled | pICH47751::pU6+maturecrRNALeft2-23
PL1C026 | Assembled | pICH47761::pU6+maturecrRNARight2-23
PL1C027 | Assembled | pICH47751::pU6+maturecrRNALeft3-23
PL1C028 | Assembled | pICH47761::pU6+maturecrRNARight3-23

*Weber, E., Engler, C., Gruetzner, R., Werner, S. & Marillonnet, S. A modular cloning system for standardized assembly of multigene constructs. PLoS ONE 6, e16765 (2011).

Ligation reaction mix for incorporation of PCR fragments into level 1 acceptors:

Component | Amount
Buffer G (10x) | 2 µL
BsaI (10 U/µL) | 1 µL
T4 DNA Ligase, HC (30 U/µL) | 1 µL
Level 1 acceptor | 200 ng
PCR product | 2 µL
ATP (10 mM) | 2 µL
Milli-Q | Up to 20 µL

Ligation reaction mix for incorporation of level 0 components into level 1 acceptors:

Component | Amount
Buffer G (10x) | 2 µL
BsaI (10 U/µL) | 1 µL
T4 DNA Ligase, HC (30 U/µL) | 1 µL
Level 1 acceptor | 200 ng
Promoter + 5'UTR + SP vector | Acceptor molar equivalent
Coding sequence vector | Acceptor molar equivalent
3'UTR + terminator vector | Acceptor molar equivalent
ATP (10 mM) | 2 µL
Milli-Q | Up to 20 µL

Ligation reaction mix for incorporation of level 1 components into level 2 acceptors:

Component | Amount
Buffer G (10x) | 2 µL
BpiI (10 U/µL) | 1 µL
T4 DNA Ligase, HC (30 U/µL) | 1 µL
Level 2 acceptor | 200 ng
Position 1 vector | Acceptor molar equivalent
Position 2 vector | Acceptor molar equivalent
Position 3 vector | Acceptor molar equivalent
Position 4 vector | Acceptor molar equivalent
End-linker | Acceptor molar equivalent
ATP (10 mM) | 2 µL
Milli-Q | Up to 20 µL
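"Acceptor molar equivalent" in the ligation mixes above means adding each part plasmid in the same number of molecules as the 200 ng of acceptor. Since the mass of a double-stranded DNA molecule is proportional to its length, the required mass scales with the ratio of plasmid sizes. The sketch below shows this calculation; the plasmid sizes are hypothetical, and the 650 g/mol per base pair used for the optional picomole conversion is a common approximation rather than a value from the text.

```python
def equimolar_ng(acceptor_ng: float, acceptor_bp: int, part_bp: int) -> float:
    """Mass of a part plasmid containing as many molecules as acceptor_ng of acceptor.

    Mass per dsDNA molecule is proportional to length, so the mass per base
    pair cancels out of the ratio.
    """
    return acceptor_ng * part_bp / acceptor_bp


def pmol(ng: float, bp: int, g_per_mol_per_bp: float = 650.0) -> float:
    """Approximate picomoles of dsDNA (about 650 g/mol per bp assumed)."""
    return ng * 1000 / (bp * g_per_mol_per_bp)


# Hypothetical sizes: 10 kb acceptor, 5 kb part plasmid
part_ng = equimolar_ng(200, acceptor_bp=10_000, part_bp=5_000)
print(part_ng)  # 100.0
```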

Level 2 Acceptors

The Level 2 acceptor used came from the Addgene plasmid repository.

Level 2 Final Components

Level 2 constructs were assembled by standard ligation reactions from the previously listed level 1 components.

Plant Material

N. benthamiana plants were grown under greenhouse conditions prior to agroinfiltration. N. tabacum var. Samsun was propagated in plant tissue culture prior to Agrobacterium-mediated transformation.

Agrobacterium

Agrobacterium strain AGL0 (Lazo, G. R., Stein, P. A. & Ludwig, R. A. A DNA transformation-competent Arabidopsis genomic library in Agrobacterium. Nature Biotechnology 9, 963 (1991)) was used for all agroinfiltrations and stable transformations.

For transformation, electrocompetent Agrobacteria were transformed with each respective level 1 or 2 plasmid by adding 50 ng of plasmid DNA to 50 µL of thawed cells in an electroporation cuvette and electroporating at 14 kV/cm, 200 Ω and 25 µF for 4-5 ms. Immediately after electroporation, the cells were covered with 450 µL of SOC medium (Invitrogen) and incubated at 28 °C for 60 minutes at 170 RPM. The cells were plated on LB agar with 50 mg/L rifampicin and either 50 mg/L carbenicillin for level 1 constructs or 50 mg/L kanamycin for level 2 constructs. Colonies were grown out in liquid LB medium with 50 mg/L rifampicin and 50 mg/L of the appropriate antibiotic. Clones were further screened by colony PCR.
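On capacitor-discharge electroporators, the nominal pulse length is the RC time constant τ = R·C, so the 200 Ω and 25 µF settings correspond to a pulse of about 5 ms. The value read off the instrument is usually a little lower, because the sample adds resistance in parallel with the 200 Ω resistor. A quick check in Python follows. The 0.1 cm cuvette gap used to convert the 14 kV/cm field strength into a voltage setting is an assumption, since the gap is not stated in the text.

```python
def time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Nominal RC time constant: ohms x microfarads gives microseconds; /1000 -> ms."""
    return resistance_ohm * capacitance_uf / 1000.0


def pulse_voltage_kv(field_kv_per_cm: float, gap_cm: float) -> float:
    """Voltage needed to reach a given field strength across a cuvette gap."""
    return field_kv_per_cm * gap_cm


if __name__ == "__main__":
    tau = time_constant_ms(200, 25)       # resistor and capacitor settings from the protocol
    volts = pulse_voltage_kv(14, 0.1)     # 0.1 cm gap is an assumption
    print(f"tau = {tau:.1f} ms, voltage setting = {volts:.2f} kV")
```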

Agroinfiltrations

Agrobacteria were grown overnight in 10 mL liquid LB medium with 50 mg/L rifampicin and 50 mg/L of the appropriate antibiotic at 28 °C and 170 RPM. The cultures were spun down at 3200 RPM for 10 minutes and resuspended in MMA buffer to an OD600 of 1. Cultures for coinfiltration were mixed in equal volumes. The cultures were then incubated in the dark for 1 hour.
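Resuspending a pellet to OD600 = 1 is a dilution calculation. If all cells are pelleted and OD600 scales linearly with cell density, the resuspension volume equals the culture volume multiplied by the measured OD600 and divided by the target OD600. A minimal sketch follows; the culture OD in the usage example is hypothetical.

```python
def resuspension_volume_ml(culture_ml: float, measured_od: float,
                           target_od: float = 1.0) -> float:
    """Volume of MMA buffer needed to resuspend a pelleted culture to target_od.

    Assumes complete pelleting and a linear OD600 response; dense cultures
    should be measured on a dilution and the reading multiplied back.
    """
    if measured_od <= 0 or target_od <= 0:
        raise ValueError("OD values must be positive")
    return culture_ml * measured_od / target_od


if __name__ == "__main__":
    # Hypothetical: a 10 mL overnight culture measured at OD600 = 2.5
    print(resuspension_volume_ml(10, 2.5), "mL of MMA buffer")
```

Because every culture is brought to the same OD600 before mixing, the equal-volume mixes used for coinfiltration contain equal cell densities of each strain.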

MMA Buffer:

Component Amount

MS salts - vitamins 2.5 g

Sucrose 10 g

MES 0.975 g

Acetosyringone (200 mM) 500 µL

pH 5.6

Water up to 500 mL

Leaves which were nearly fully expanded were infiltrated with the prepared cultures by using a needleless syringe to saturate the leaves as much as possible. Each plant had three leaves infiltrated fully with one of the construct combinations.

Plants infiltrated with the excision constructs were harvested after 1 week. For plants with NtAn2 excision constructs, a single leaf was sampled in 4 places for DNA isolation. For plants with DsRed excision constructs, a single leaf was first examined by fluorescence microscopy, after which 4 samples were taken from each of those leaves for DNA isolation. The samples were flash frozen in liquid nitrogen and stored at -80 °C.

In Vitro Transformation

Agrobacteria were grown overnight in 10 mL liquid LB medium with 50 mg/L rifampicin and 50 mg/L of the appropriate antibiotic at 28 °C and 170 RPM. The cultures were spun down at 3200 RPM for 10 minutes and resuspended in 40 mL MS liquid + AS.

MS liquid + AS:

Component Amount

MS salts + vitamins 2.2 g

Glucose 15 g

Acetosyringone (100 mM) 500 µL

pH 5.2

Water up to 500 mL

The midribs were removed from leaves of plants grown in tissue culture, and explants of roughly 0.25 cm² were cut from these leaves and placed on callus-inducing medium (CIM) + AS.

CIM + AS:

Component Amount

MS salts + vitamins 2.2 g

Sucrose 15 g

BAP (1 mg/mL) 500 µL

NAA (1 mg/mL) 1000 µL

Acetosyringone (100 mM) 500 µL

Daishin agar 4 g

pH 5.8

Water up to 500 mL

These explants were covered with the appropriate cultures and incubated for 20 minutes. The explants were dried, placed on Whatman filter paper on CIM + AS and cocultivated for 72 hours in reduced light at 25 °C. The explants were then transferred to CIM + KM selection medium and grown for 18 days to allow callus to develop. After 18 days, the calli were covered with the cultures carrying their corresponding excision crRNA constructs and transferred to shoot-inducing medium (SIM) + AS.

SIM + AS:

Component Amount

MS salts + vitamins 2.2 g

Sucrose 15 g

BAP (1 mg/mL) 1000 µL

Daishin agar 4 g

Kanamycin (50 mg/mL) 1000 µL

Cefotaxime (100 mg/mL) 1250 µL

pH 5.8

Water up to 500 mL

CIM + KM:

Component Amount

MS salts + vitamins 2.2 g

Sucrose 15 g

BAP (1 mg/mL) 500 µL

NAA (1 mg/mL) 1000 µL

Kanamycin (50 mg/mL) 1000 µL

Cefotaxime (100 mg/mL) 1250 µL

Daishin agar 4 g

pH 5.8

Water up to 500 mL

After three days of cocultivation, half of the calli were transferred to SIM + 5-FC selection medium and grown for 21 days, while the other half were transferred to SIM + C medium and, after 4 days, to SIM + 5-FC selection medium.

SIM + 5-FC:

Component Amount

MS salts + vitamins 2.2 g

Sucrose 15 g

BAP (1 mg/mL) 1000 µL

Daishin agar 4 g

5-fluorocytosine (25 mg/mL) 5000 µL

Cefotaxime (100 mg/mL) 1250 µL

pH 5.8

Water up to 500 mL

SIM + C:

Component Amount

MS salts + vitamins 2.2 g

Sucrose 15 g

BAP (1 mg/mL) 1000 µL

Daishin agar 4 g

Cefotaxime (100 mg/mL) 1250 µL

pH 5.8

Water up to 500 mL

Controls without a second transformation were transferred to matching media series and control SIM + KM selection media.

SIM + KM

Component Amount

MS salts + vitamins 2.2 g

Sucrose 15 g

BAP (1 mg/mL) 1000 µL

Daishin agar 4 g

Kanamycin (50 mg/mL) 1000 µL

Cefotaxime (100 mg/mL) 1250 µL

pH 5.8

Water up to 500 mL
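The media recipes above add antibiotics, growth regulators and acetosyringone from stock solutions into a 500 mL batch, so the working concentrations follow from C1·V1 = C2·V2. The text does not state these working concentrations. The helper below derives them from the listed stock concentrations and volumes only. Because the added volume is given in µL and the batch volume in mL, a stock in mg/mL yields a result in mg/L and a stock in mM yields a result in µM.

```python
def final_conc(stock_conc: float, stock_ul: float, final_ml: float) -> float:
    """C1*V1 = C2*V2 with V1 in uL and V2 in mL.

    uL/mL is a factor of 1/1000, so the numeric result is expressed in
    mg/L for a mg/mL stock and in uM for a mM stock.
    """
    return stock_conc * stock_ul / final_ml


# (additive, stock concentration, stock unit, uL added per 500 mL), from the recipes
ADDITIONS = [
    ("Kanamycin", 50, "mg/mL", 1000),
    ("Cefotaxime", 100, "mg/mL", 1250),
    ("5-fluorocytosine", 25, "mg/mL", 5000),
    ("BAP (SIM)", 1, "mg/mL", 1000),
    ("BAP (CIM)", 1, "mg/mL", 500),
    ("NAA (CIM)", 1, "mg/mL", 1000),
    ("Acetosyringone (CIM, MS liquid)", 100, "mM", 500),
    ("Acetosyringone (MMA)", 200, "mM", 500),
]
OUT_UNIT = {"mg/mL": "mg/L", "mM": "uM"}

if __name__ == "__main__":
    for name, stock, unit, ul in ADDITIONS:
        print(f"{name}: {final_conc(stock, ul, 500):g} {OUT_UNIT[unit]}")
```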

Fluorescent Microscopy

Harvested N. benthamiana leaves and developing N. tabacum shoots were examined and imaged under a fluorescence microscope to determine whether DsRed was expressed.

DNA isolation

Leaf samples were pulverized after being chilled in liquid nitrogen. DNA was then isolated using a CTAB DNA isolation protocol.

PCR Fragment Length

To determine whether marker excision had taken place, primer pairs (PRSS018/PRSS019 and PRSS018/PRSS020) were designed to amplify the sequence surrounding the excision site, and gDNA was used as the template in a standard DreamTaq™ PCR with these primers.

Cloning and Sequencing

PCR products were sequenced directly by Sanger sequencing from both ends to check for mutations at the ligation site. PCR fragments were isolated by gel electrophoresis using the Zymoclean™ Gel DNA Recovery Kit from ZYMO RESEARCH, cloned using the pGEM®-T Easy Vector System from Promega and transformed into Library Efficiency™ DH5α™ cells from Invitrogen, which were plated on LB agar with 50 mg/L ampicillin, 20 mg/mL X-gal, and 10 mM IPTG and grown overnight at 37 °C. Colony PCRs were performed on white colonies with M13 forward and reverse primers, and the PCR products were sequenced by Sanger sequencing.

Results

Transient Assay

To test marker excision, 8 constructs were assembled: 3 sets of Cpf1 target sites (exCN1, exCN2, and exCN3) for each of the DsRed and NtAn2 visual markers, plus DsRed and NtAn2 controls. Target sites were designed on the basis of target sites reported in the literature to give high mutation rates in human cells. The PAM and the first 14 bases following it were taken from the literature, bases 15-18 were changed for Golden Gate ligation, and any additional bases followed the overhang sequences. Nine Cpf1 constructs were built, 3 for each target set, differing in crRNA length and targeting the first 18, 21, or 23 bases of the protospacer. These constructs were coinfiltrated into N. benthamiana grown under greenhouse conditions, and the leaves were harvested after 7 days for fluorescence microscopy and molecular analyses. The excision constructs were designed to leave complementary overhangs followed by a repeat, which distinguishes the expected excision product sequences from those of the positive controls, as shown in Figure 11.

DsRed

DsRed is a protein-based visual marker that fluoresces under excitation light; it should be expressed after marker excision brings the promoter and coding sequence together. Fluorescence microscopy of infiltrated leaves showed very limited fluorescence, which was detectable only at higher magnification. exCN1 coinfiltrated with its respective Cpf1 constructs showed very little background fluorescence, while exCN2 and exCN3 showed background fluorescence that increased with crRNA length (see Figure 12). PCR across the excision site showed that all excision infiltrations gave bands of the expected size, and most also gave faint bands of the larger size expected when excision has not occurred and the marker remains between the primers (see Figure 13). The negative exCN controls (i.e. the exCN constructs infiltrated without Cpf1 constructs) showed bands at the size expected after excision, together with more pronounced bands of the size expected without excision (Figure 13). The positive controls showed bands of the expected size, the two sets of empty controls (MMA medium infiltrated and no infiltration) showed aspecific bands, and the water control showed no bands (Figure 13). Half of the PCR products were sequenced from both ends with the amplification primers. All excision products showed the expected sequences, including the previously mentioned repeat, as did the positive controls, which lacked this repeat. Table 2 below shows the alignment of PCR product sequences across the CN fusion marker excision site in the DsRed constructs, covering 30 bases upstream and downstream of the excision site, sequenced in the forward and reverse directions and aligned to the expected sequences. It includes 2 replicates (1 and 2) for each combination of target sequence set (exCN1, exCN2, and exCN3) and Cpf1 construct crRNA targeting length (18 bp, 21 bp, and 23 bp), as well as the positive control. The sequence at the repair site (GATACGATAC) [SEQ ID NO: 27] is absent from the control samples, which allows excised sequences to be distinguished from control sequences. All excised samples show the expected excision pattern.

Table 2:

Three of the 6 negative controls had single sequence reads with identity to the positive control, while the remaining negative sequences did not align to any expected sequences (not shown).

NtAn2

NtAn2 is a transcription factor that upregulates the anthocyanin production pathway in N. tabacum. It should be expressed after marker excision brings the promoter and coding sequence together, theoretically acting as a visual marker that gives red to purple leaves. No visible increase in anthocyanin production was seen in the leaves of infiltrated plants, including the positive control. PCR across the excision site showed that all but one of the excision infiltrations gave bands of the expected size, and most also gave faint bands of the larger size expected when excision has not occurred and the marker remains between the primers (Figure 13). The negative controls showed bands at the size expected after excision, together with more pronounced bands of the size expected without excision (Figure 13). The positive controls showed bands of the expected size, while the two sets of empty controls (MMA medium infiltrated and no infiltration) and the water control showed no bands (Figure 13). Half of the PCR products were sequenced with the amplification primers. All excision products, as well as the positive controls, showed the expected sequences (Table 3 below), except for one sequence set (exCN3-23-1), which matched the positive control.

Table 3 shows the alignment of PCR product sequences across the CN fusion marker excision site in the NtAn2 constructs, covering 30 bases upstream and downstream of the excision site, sequenced in the forward and reverse directions and aligned to the expected sequences. It includes 2 replicates (1 and 2) for each combination of target sequence set (exCN1, exCN2, and exCN3) and Cpf1 construct crRNA targeting length (18 bp, 21 bp, and 23 bp), as well as the positive control.

Table 3:

Two of the 6 negative controls had sequence identity to the positive control confirmed by reads from the opposite direction (not shown). The remaining sequences did not align to any expected sequences (not shown).

In vitro Assay

To test the selectability of the excisable marker system, N. tabacum explants were first transformed with the DsRed and NtAn2 excision constructs, testing three sets of target sites (exCN1, exCN2, and exCN3), and with the positive DsRed and NtAn2 controls. These explants were grown on callus-inducing medium with kanamycin for 21 days for positive selection. The calli were then transiently transformed with their respective Cpf1 construct with the 23 bp crRNA targeting length, or left untransformed as controls. The calli were then transferred to shoot-inducing medium with 5-FC to select against Cpf1 integration and to select for excision.

DsRed

Preliminary fluorescence microscopy of twice-transformed callus showed some DsRed fluorescence, but not yet at a level comparable to the positive control (not shown).

NtAn2

Preliminary results in twice-transformed callus showed no visible colour variation between controls and samples (not shown). Experiments with NtAn2 did not give a visual phenotype at the plant level but were positive in the PCR assay, supporting the findings with DsRed.

The experiment described herein shows precise excision of the CN selection marker using modified Cpf1 excision target sequences, based on human target sequences that showed high efficiency in human cells. Molecular analysis shows that excision is precise and is likely favoured by the complementary overhangs left by Cpf1. Fluorescence microscopy of the transient assays suggests that both crRNA length and target sequence may affect the efficiency of excision, but molecular analysis suggests that precision is unaffected by crRNA length or target sequence.

Molecular analysis of the transient assay indicated some background contamination of the control samples and of one NtAn2 excision sample. This contamination likely came from low levels of Agrobacteria carrying the respective DsRed and NtAn2 control constructs on leaf surfaces, possibly transferred through contaminated gloves during agroinfiltration in the greenhouse or during harvest. However, nearly all excision samples showed the sequences expected for precise excision, which also suggests that the level of Agrobacterial contamination was very low. In the excision samples, sequences originating from contamination were likely outcompeted in the PCR by the properly excised template sequences. In the negative controls, by contrast, the contaminating sequences were amplified without competition from properly excised templates. This does not mean the negative controls lacked competing templates: amplicons of the expected unexcised size were present, but the PCR elongation time was optimized for shorter amplicons, which favoured amplification of the contaminating sequences. Such contamination is nearly unavoidable in transient assays but can be avoided under in vitro conditions. The absence of excision sequences in the controls also suggests that the contamination did not come from cross-contamination between DNA samples.

Fluorescence microscopy showed increased fluorescence with longer crRNAs for excision and differing fluorescence between the target sequences used. This suggests that some target sequences are more effective for excision and that increasing the crRNA length also increases efficiency. Visually, the NtAn2 constructs did not show increased anthocyanin production. This is unexpected, because previous research showed visible increases in anthocyanin in leaves and callus after overexpression of the same NtAn2 gene in N. tabacum var. Samsun (see Pattanaik, S. et al. Isolation and functional characterization of a floral tissue-specific R2R3 MYB regulator from tobacco. Planta 231, 1061-1076 (2010)).

PCR and sequencing of the samples did show that the sequence was present, so it is still unclear why there was no visible increase in anthocyanin production.

Sequencing of the PCR products amplified across the excision site in the transient assay suggests that, when the marker was excised, the ends were ligated without mutation: the sequence trace files did not deviate around the excision site and showed no increased background in the read 3’ of the excision site, which would be expected if there were high levels of mutation. This suggests that a form of CDEJ is more common, relative to other well-described repair mechanisms, than previously thought (see Figure 1). In essence, Cpf1 would keep cutting the DNA repeatedly until either the marker was removed and the two flanking complementary ends were ligated, eliminating the target sites, or a large deletion occurred in the target site, as has been shown and theorized previously. The same reasoning has been used to explain why Cpf1 may increase HDR: the target is cut open repeatedly until the template is inserted. It is unknown how many cuts and repairs occur before a deletion arises or the target site is modified beyond recognition, but repeated cutting followed by precise CDEJ would favour marker excision over other repair mechanisms such as NHEJ, which are mutation-prone and could destroy the target site before ligation.

CDEJ is hypothesized herein as a repair mechanism that uses complementary overhangs for precise repair of DSBs. Such a repair mechanism is first described in this study as an explanation for the precision of excision mentioned earlier. Both cNHEJ and aNHEJ are assumed to be highly mutation-prone, cNHEJ leading to small indels and aNHEJ to large deletions. HDR could be this precise, but it would require a template, and none was provided.

Simple CDEJ as described herein is a mechanism involving no end processing, in which complementary ends anneal and are simply ligated, analogous to in vitro DNA cloning techniques that exploit the complementarity of restriction products and T4 ligase to assemble plasmids (see Figure 1). CDEJ would explain the observed ligations if LbCpf1 always cuts at the same positions and leaves 5 bp overhangs. However, as mentioned, LbCpf1 is known to leave both 5 bp and 4 bp overhangs in vitro (see Zetsche et al. (2015), supra). The sequences flanking the excision sites were designed on the assumption that LbCpf1 would leave a 5 bp overhang; any other overhang would require end processing for precise excision and ligation.
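The CDEJ argument rests on two points. A Cpf1 cut leaves a staggered end with a 5' overhang, and two 5' overhangs pair perfectly only if they have the same length and one is the reverse complement of the other. The sketch below encodes both points under stated assumptions. It assumes the PAM (taken as TTTV) lies on the top strand, and that cleavage occurs after protospacer position 18 on the PAM strand and after position 23 on the target strand. These positions are the ones commonly reported for Cpf1 and match the 5 bp overhang assumed above. The sequences are made up for illustration and are not the excision-site sequences used in this study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cpf1_overhang(top: str, pam_start: int,
                  pam_strand_cut: int = 18, target_strand_cut: int = 23) -> str:
    """Top-strand 5' overhang left by a cut at a TTTV PAM on the top strand.

    Assumed geometry: the protospacer starts right after the 4-nt PAM, the
    PAM strand is cut after protospacer nt `pam_strand_cut` and the target
    strand after nt `target_strand_cut`. The downstream fragment therefore
    keeps a 5' overhang of protospacer nts pam_strand_cut+1..target_strand_cut.
    """
    top = top.upper()
    pam = top[pam_start:pam_start + 4]
    if len(pam) != 4 or not pam.startswith("TTT") or pam[3] not in "ACG":
        raise ValueError(f"no TTTV PAM at position {pam_start}: {pam!r}")
    spacer_start = pam_start + 4
    end = spacer_start + target_strand_cut
    if end > len(top):
        raise ValueError("sequence too short for the assumed cut positions")
    return top[spacer_start + pam_strand_cut:end]


def ends_anneal(overhang_a: str, overhang_b: str) -> bool:
    """True if two 5' overhangs (each written 5'->3') pair base for base."""
    return len(overhang_a) == len(overhang_b) and overhang_a.upper() == revcomp(overhang_b)


if __name__ == "__main__":
    site = "CC" + "TTTA" + "A" * 18 + "GATAC" + "TTTT"   # made-up target site
    oh = cpf1_overhang(site, pam_start=2)
    print(oh)                                  # GATAC
    print(ends_anneal(oh, revcomp(oh)))        # True: 5-nt partner, fully complementary
    print(ends_anneal(oh, revcomp(oh)[:4]))    # False: a 4-nt partner cannot pair fully
```

In an excision construct, the partner of the right-hand end comes from the other Cpf1 site, so the two sites must be placed so that their exposed overhangs satisfy `ends_anneal`. The last line illustrates why a 4 bp overhang at either site would need end processing before a precise ligation.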

Example 2: Disruption of Nicotiana tabacum phytoene desaturase (NtPDS) gene

This experiment was performed to check the activity of the LbCpf1 system by inducing targeted mutations in the NtPDS gene.

Materials and Methods

Construct Building

Constructs were built using the Golden Gate Modular Cloning System (see Auer et al. (2014), supra). Library Efficiency™ DH5α™ cells from Invitrogen were transformed with ligation products. Cells with level 1 constructs were plated on LB agar with 50 mg/L carbenicillin or ampicillin, 20 mg/mL X-gal, and 10 mM IPTG and grown overnight at 37 °C. Cells with level 2 constructs were plated on LB agar with 50 mg/L kanamycin and grown overnight at 37 °C. Level 1 colonies were screened by blue-white screening and level 2 colonies by red-white screening. White colonies were grown overnight at 37 °C and 250 RPM in liquid LB medium with 50 mg/L of the appropriate antibiotic. Clones were further screened by colony PCR, and the PCR products of positive clones were sequenced. Clones with correct sequences were grown, and their plasmids were isolated using the QIAprep Spin Miniprep Kit from QIAGEN and sequenced again.

Components

Level 0, 1, and 2 components assembled and used are listed below. Plasmids obtained from the Addgene plasmid repository were:

Plasmid Addgene Description Ref.

PL1A03 pICH47751 Level 1 acceptor position 3 forward

PL1A04 pICH47761 Level 1 acceptor position 4 forward

PL2A01 pICSL4723 Level 2 acceptor

PL1C001 pICSL11024 Pnos+5'UTRomega+nptII+3'UTR+Tocs (Jones, unpublished)

PL1C003 pICH41780 L4E

PL1C004 pICH47742::2xCaM35s+5'UTRomega+hLbCpf1+3'UTR+Tnos Bioscience

PL1C029 pICH41766 L3E

Plasmids assembled using standard ligation techniques were:

Plasmid Description

PL1C030 pICH47751::pU6+maturecrRNAPDS1-19

PL1C031 pICH47751::pU6+maturecrRNAPDS1-22

PL1C032 pICH47751::pU6+maturecrRNAPDS1-23

PL1C033 pICH47751::pU6+maturecrRNAPDS3-18

PL1C034 pICH47751::pU6+maturecrRNAPDS3-21

PL1C035 pICH47751::pU6+maturecrRNAPDS3-22

PL1C036 pICH47751::pU6+maturecrRNAPDS3-23

PL1C037 pICH47751::pU6+maturecrRNAPDS4-19

PL1C038 pICH47751::pU6+maturecrRNAPDS4-22

PL1C039 pICH47751::pU6+maturecrRNAPDS4-24

PL1C040 pICH47751::pU6+maturecrRNAPDS5-18

PL1C041 pICH47751::pU6+maturecrRNAPDS5-20

PL1C042 pICH47751::pU6+maturecrRNAPDS5-22

PL1C043 pICH47751::pU6+maturecrRNAPDS5-23

PL1C044 pICH47761::pU6+maturecrRNAPDS1-19

PL1C045 pICH47761::pU6+maturecrRNAPDS1-22

PL1C046 pICH47761::pU6+maturecrRNAPDS1-23

PL1C047 pICH47751::pU6+prematurecrRNAPDS1-19

PL1C048 pICH47751::pU6+prematurecrRNAPDS1-22

PL1C049 pICH47751::pU6+prematurecrRNAPDS1-23

PL1C050 pICH47751::pU6+prematurecrRNAPDS3-18

PL1C051 pICH47751::pU6+prematurecrRNAPDS3-21

PL1C052 pICH47751::pU6+prematurecrRNAPDS3-22

PL1C053 pICH47751::pU6+prematurecrRNAPDS3-23

PL1C054 pICH47751::pU6+prematurecrRNAPDS4-19

PL1C055 pICH47751::pU6+prematurecrRNAPDS4-22

PL1C056 pICH47751::pU6+prematurecrRNAPDS4-24

PL1C057 pICH47751::pU6+prematurecrRNAPDS5-18

PL1C058 pICH47751::pU6+prematurecrRNAPDS5-20

PL1C059 pICH47751::pU6+prematurecrRNAPDS5-22

PL1C060 pICH47751::pU6+prematurecrRNAPDS5-23

PL1C061 pICH47761::pU6+prematurecrRNAPDS1-19

PL1C062 pICH47761::pU6+prematurecrRNAPDS1-22

PL1C063 pICH47761::pU6+prematurecrRNAPDS1-23

PL2C006 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS1-19+L3E

PL2C007 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS1-22+L3E

PL2C008 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS1-23+L3E

PL2C009 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS3-18+L3E

PL2C010 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS3-21+L3E

PL2C011 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS3-22+L3E

PL2C012 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS3-23+L3E

PL2C013 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS4-19+L3E

PL2C014 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS4-22+L3E

PL2C015 pICSL4723::nptII+hLbCpf1+maturecrRNAPDS4-24+L3E

Primers

The primers used are listed below.

Level 1 Ligation protocol

Ligations were performed with the following reaction mix for incorporation of PCR fragments into level 1 acceptors:

Component Amount

Buffer G (10x) 2 µL

BsaI (10 U/µL) 1 µL

T4 DNA Ligase, HC (30 U/µL) 1 µL

Level 1 Acceptor 200 ng

PCR Product 2 µL

ATP (10 mM) 2 µL

Milli-Q Up to 20 µL

The general ligation program:

Temperature Time Cycles

37 °C 5:00 min, then 20 °C 5:00 min 50x

37 °C 5:00 min 1x

50 °C 10:00 min 1x

80 °C 10:00 min 1x

10 °C hold
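The program alternates between 37 °C and 20 °C. In the usual Golden Gate scheme, the first temperature favours the Type IIS enzyme (BsaI or BpiI) and the second favours the T4 ligase, while the 50 °C and 80 °C steps serve as a final digestion and heat inactivation. That interpretation is an assumption here and is not stated in the text. The run time, excluding the final 10 °C hold, follows directly from the table:

```python
# Golden Gate program from the table: (temperature in deg C, minutes).
# The terminal 10 deg C step is an open-ended hold and is not counted.
CYCLED_STEPS = [(37, 5), (20, 5)]
N_CYCLES = 50
FINAL_STEPS = [(37, 5), (50, 10), (80, 10)]


def run_minutes(cycled, n_cycles, final):
    """Total program time in minutes, excluding the final hold."""
    cycle_time = sum(minutes for _, minutes in cycled)
    tail_time = sum(minutes for _, minutes in final)
    return n_cycles * cycle_time + tail_time


if __name__ == "__main__":
    total = run_minutes(CYCLED_STEPS, N_CYCLES, FINAL_STEPS)
    hours, minutes = divmod(total, 60)
    print(f"{total} min ({hours} h {minutes} min), then hold at 10 deg C")
```

With 50 cycles this comes to 525 minutes, just under 9 hours, which is convenient for an overnight run.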

LbCpf1

Level 1 position 2 LbCpf1 expression constructs were provided.

Base CRISPR RNA

A Level 1 base premature crRNA expression construct for LbCpf1 was provided. A base mature crRNA expression unit was made so that it could serve as a template for producing new crRNAs by PCR. A complete mature crRNA was created using 2 complementary primers (PRCR02/PRCR03) that spanned the entire crRNA, flanked by the BsaI restriction sites required for ligation into a level 1 acceptor together with the U6 promoter from PL0C04. The primers were annealed to each other in elution buffer using a thermocycler program that gradually decreased the temperature from 80 °C to 20 °C. The annealing product was used directly in a ligation reaction with PL1A03 as the acceptor.

NtPDS Targets

Two targets were obtained from Endo, A., Masafumi, M., Kaya, H. & Toki, S. Efficient targeted mutagenesis of rice and tobacco genomes using Cpf1 from Francisella novicida. Scientific Reports 6, 38169 (2016). Two further targets were designed by running a truncated genomic sequence, containing the mRNA exons with the adjacent 20 bases of intron sequence, through the RGEN Cas-Designer tool with the N. tabacum reference genome as the off-target reference. Two targets with a low chance of off-target activity, located early in the NtPDS coding sequence to increase the likelihood of gene disruption, were chosen. Protospacers of various lengths, ranging from 18 to 24 base pairs, were chosen for incorporation into crRNAs.
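Target selection starts from locating PAM sites. The sketch below lists every candidate protospacer of a chosen length (18 to 24 nt in this study) downstream of a TTTV PAM on either strand of a sequence. Treating TTTV as the LbCpf1 PAM is an assumption of the sketch. It does not score off-targets or position within the coding sequence, which the study assessed with RGEN Cas-Designer against the N. tabacum reference genome. The test sequences are made up.

```python
import re
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cpf1_candidates(seq: str, spacer_len: int = 23) -> List[Tuple[str, int, str, str]]:
    """Return (strand, PAM start, PAM, protospacer) for each TTTV PAM.

    Positions are 0-based on the strand being scanned, so '-' hits are
    given in reverse-complement coordinates. TTTV matches cannot overlap
    (V is never T), so a plain finditer finds them all.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for m in re.finditer(r"TTT[ACG]", s):
            pos = m.start()
            spacer = s[pos + 4:pos + 4 + spacer_len]
            if len(spacer) == spacer_len:   # skip PAMs too close to the end
                hits.append((strand, pos, m.group(0), spacer))
    return hits


if __name__ == "__main__":
    demo = "GG" + "TTTA" + "ACGTACGTACGTACGTACGTACG"   # made-up sequence
    for length in (18, 23):
        print(length, cpf1_candidates(demo, length))
```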

Cloning crRNAs

Further crRNAs were created by PCR according to the protocol developed by Slaman, E. (2017), supra. The forward primer (PRCR01) annealed to the upstream region of the U6 promoter and was flanked by a BsaI site for ligation into either PL1A03 or PL1A04. The reverse primers were designed to anneal to the upstream region of the crRNA and part of the downstream region of the U6 promoter; the unique protospacer sequence was incorporated into the 5’ portion of the primer and flanked by a BsaI site for further ligation. Various protospacer lengths for each of the 4 targets were incorporated into the primers. Each crRNA expression unit was amplified in a high-fidelity PCR. The PCR products were then used directly in a ligation reaction with plasmid PL1A03 or PL1A04 as the acceptor.

Level 2 ligation protocol

Ligations were performed with the following reaction mix for incorporation of level 1 components into level 2 acceptors:

Buffer G (10x) 2 µL

BpiI (10 U/µL) 1 µL

T4 DNA Ligase, HC (30 U/µL) 1 µL

Level 2 Acceptor 200 ng

Milli-Q Up to 20 µL

Plant Material

N. tabacum var. Xanthii was sown and grown in a greenhouse for 4 weeks prior to agroinfiltration.

Agrobacterium

Agrobacterium strain AGL0 (Lazo et al. (1991), supra) was used for all agroinfiltrations and stable transformations.

Transformation

Electrocompetent Agrobacteria were transformed with each respective level 1 or 2 plasmid by adding 50 ng of plasmid DNA to 50 µL of thawed cells in an electroporation cuvette and electroporating at 14 kV/cm, 200 Ω and 25 µF for 4-5 ms. Immediately after electroporation, the cells were covered with 450 µL of SOC medium (Invitrogen) and incubated at 28 °C for 60 minutes at 170 RPM. The cells were plated on LB agar with 50 mg/L rifampicin and either 50 mg/L carbenicillin for level 1 constructs or 50 mg/L kanamycin for level 2 constructs. Colonies were grown out in liquid LB medium with 50 mg/L rifampicin and 50 mg/L of the appropriate antibiotic. Clones were further screened by colony PCR.

Agroinfiltrations - culture

Agrobacteria were grown overnight in 10 mL liquid LB medium with 50 mg/L rifampicin and 50 mg/L of the appropriate antibiotic at 28 °C and 170 RPM. The cultures were spun down at 3200 RPM for 10 minutes and resuspended in MMA buffer to an OD600 of 1. Cultures for coinfiltration were mixed in equal volumes. The cultures were then incubated in the dark for 1 hour.

Infiltration

Nearly fully expanded leaves were infiltrated with the prepared cultures using a needleless syringe to saturate the leaves as much as possible. Two leaves per plant were infiltrated, with mature crRNA constructs on one side of the midrib and premature crRNA constructs on the other side; each plant was infiltrated with a different protospacer length and target.

Harvest

After 48 hours, two samples were taken from each side of each infiltrated leaf, totalling 8 samples per plant. The samples were flash frozen in liquid nitrogen and stored at -80 °C.

Analysis - DNA isolation

List of Primers used in Examples 1 and 2

Custom Ligation Primers:

Primer F/R Sequence SEQ ID NO: Target

PRCL001 F TTTGAAGACAAAATGGCTAATAACGCTTTACA SEQ ID NO: 79 codA Start

PRCL002 R TTTGAAGACAAAAGCTCAACGTTTGTAATCGATGG SEQ ID NO: 99 codA Stop

PRCL003 F TTTGAAGACAAATCCGCTGGGAACGGCGAATATGC SEQ ID NO: 100 codA internal

PRCL004 R TTTGAAGACAAGGATACCACGGATCGAACACATCA SEQ ID NO: 101 codA BpiI site

PRCL005 F TTTGAAGACAAAATGGCTAATAACGCTTTACAAACAATTATTAACG SEQ ID NO: 102 codA Start

PRCL006 R TTTGAAGACAATCCACGTTTGTAATCGATGGCT SEQ ID NO: 103 codA Stop

PRCL007 F TTTGAAGACAATGGATCTGAACAAGATGGATTGCACGCAG SEQ ID NO: 104 nptII Start

PRCL008 R TTTGAAGACAAAAGCTCAGAAGAACTCGTCAAGAAGGC SEQ ID NO: 105 nptII Stop

PRCL009 F TTTGAAGACAAAATGAATATTTGTACTAATAAGTCGTC SEQ ID NO: 106 NtAn2 Start

PRCL010 R TTTGAAGACAAAAGCTCAACTGAGAAGTGGCATTT SEQ ID NO: 107 NtAn2 Stop

PRCL011 F TTTGAAGACAAAATGGTCAACAGGGGACATAAAGGAGGAATTCCAATCCCACA SEQ ID NO: 108 Left 1 Pnos

PRCL012 R TTTGAAGACAAAAGCGTCAACAGGGGACATAAAAGCGTCGATCTAGTAACATA SEQ ID NO: 109 Right 1 Tocs

PRCL013 F TTTGAAGACAAAATGTTTTAGTGGCATCCTAAAGGAGGAATTCCAATCCCACA SEQ ID NO: 110 Left 2 Pnos

PRCL014 R TTTGAAGACAAAAGCTTTTAGTGGCATCCTAAAAGCGTCGATCTAGTAACATA SEQ ID NO: 111 Right 2 Tocs

PRCL015 F TTTGAAGACAAAATGTAGAAGATTCGATCTAAAGGAGGAATTCCAATCCCACA SEQ ID NO: 112 Left 3 Pnos

PRCL016 R TTTGAAGACAAAAGCTAGAAGATTCGATCTAAAAGCGTCGATCTAGTAACATA SEQ ID NO: 113 Right 3 Tocs

PRCL017 F TTTGAAGACAAGCTTGATACGATACGATACAATGGGGTCATCCAAGAATGTTATCA SEQ ID NO: 114 DsRed Start

PRCL018 F TTTGAAGACAAGCTTGATACGATACGATACAATGAATATTTGTACTAATAAGTCGTC SEQ ID NO: 115 NtAn2 Start

PRCL019 R TTTGAAGACAAAGCGGATAATTTATTTGAAAATTCAT SEQ ID NO: 116 Tmas
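The ligation primers above and the crRNA primers below carry Type IIS sites upstream of their 4-nt fusion sites. The PRCL primers use BpiI (GAAGAC, cutting 2 and 6 nt downstream of the site) and the PRCR primers use BsaI (GGTCTC, cutting 1 and 5 nt downstream). A small helper can read off the 4-nt 5' overhang that each primer would create after digestion. For example, PRCL001 yields AATG at the codA start codon, and the reverse primer PRCL002 yields AAGC, which reads GCTT on the coding strand. The sketch inspects only the first site on the given strand and ignores sites on the opposite strand. The sequences are copied from the primer tables.

```python
from typing import Optional

# enzyme: (recognition site, nt between the end of the site and the top-strand cut)
TYPE_IIS = {
    "BsaI": ("GGTCTC", 1),  # GGTCTC(N)1/(N)5 -> 4-nt 5' overhang
    "BpiI": ("GAAGAC", 2),  # GAAGAC(N)2/(N)6 -> 4-nt 5' overhang
}


def fusion_overhang(primer: str, enzyme: str) -> Optional[str]:
    """4-nt 5' overhang created by `enzyme` on a primer, or None if there is no site."""
    site, offset = TYPE_IIS[enzyme]
    seq = primer.upper().replace(" ", "")
    i = seq.find(site)
    if i < 0:
        return None
    start = i + len(site) + offset
    overhang = seq[start:start + 4]
    return overhang if len(overhang) == 4 else None


if __name__ == "__main__":
    primers = {
        "PRCL001": ("TTTGAAGACAAAATGGCTAATAACGCTTTACA", "BpiI"),
        "PRCL002": ("TTTGAAGACAAAAGCTCAACGTTTGTAATCGATGG", "BpiI"),
        "PRCR01": ("TGTGGTCTCAGGAGTGATCAAAAGTCCCACATC", "BsaI"),
    }
    for name, (seq, enzyme) in primers.items():
        print(name, enzyme, fusion_overhang(seq, enzyme))
```

Checking overhangs this way before ordering primers helps confirm that each part ends in the fusion site expected by the next part in the assembly.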

Screening and Sequencing Primers:

Primer F/R Sequence SEQ ID NO: Target

PRSS001 F AGCGAGTCAGTGAGCGAG SEQ ID NO: 117 Level 0 Acceptor Backbone

PRSS002 R AATAGGCGTATCACGAGGC SEQ ID NO: 118 Level 0 Acceptor Backbone

PRSS003 F GCTTTGTCGAAACCGTTGCT SEQ ID NO: 119 codA

PRSS004 R GTGTGCTGGCAATCACCTTG SEQ ID NO: 120 codA

PRSS005 F TGGGCACAACAGACAATCGGCTGC SEQ ID NO: 121 nptII

PRSS006 R TGCGAATCGGGAGCGGCGATACCG SEQ ID NO: 122 nptII

PRSS007 F AAAGTCCCACATCGATCAGGT SEQ ID NO: 123 pU6

PRSS008 F GGTGGCAGGATATATTGTGG SEQ ID NO: 124 Level 1/2 Acceptor Backbone

PRSS009 R TGCACATACAAATGGACGAAC SEQ ID NO: 125 Level 1 Acceptor Backbone

PRSS010 F CCACTATCCTTCGCAAGACC SEQ ID NO: 126 P35S

PRSS011 R GTGCTCCACCATGTTGACGA SEQ ID NO: 127 P35S

PRSS012 F TGGAGAGGACACGCTCGAGT SEQ ID NO: 128 P35S

PRSS013 R GCAACTGTGCTGTTAAGCTC SEQ ID NO: 129 P35S

PRSS014 R GCTGGCACATACAAATGGAC SEQ ID NO: 130 Level 2 Acceptor Backbone

PRSS015 F GCGCGCGGTGTCATCTATGT SEQ ID NO: 131 Tnos

PRSS016 F AATGTGCGTGGCTTTATCTGTC SEQ ID NO: 132 Tmas

PRSS017 F GAGCGCCACAATAACAAACA SEQ ID NO: 133 Tocs

PRSS018 F ACGAGGAGCATCGTGGAAAAA SEQ ID NO: 134 P35S

PRSS019 R CTGGTATGTCGGCAGGGTG SEQ ID NO: 135 DsRed

PRSS020 R AGCTTATGAAGCCTCAAAATGAGA SEQ ID NO: 136 NtAn2

crRNA Primers:

PRCR01 F 5'-T GT GGT CT CAGGAGT GAT CAAAAGT CCCACAT C pU6.

[SEQ ID NO: 137] PRCR02 F 5'-

TGTGGTCT CAATT GAATTT CT ACT AAGT GT AG AT GG AACT G AAAGT CAAG AT GGT CATT TTT C G CTT GAG AC C AC A Base Mature crRNA Forward. [SEQ ID NO: 138]

PRCR03 R 5'-

TGTGGT CT CAAGCGAAAAAT GACCAT CTT GACTTT CAGTT CCAT CT ACACTT AGT AGAA ATTCAATTGAGACCACA Base Mature crRNA Reverse. [SEQ ID NO: 139]

PRCR04 R 5'-

TGTGGT CT CAAGCGAAAAAAATGGT CAACAGGGGACAAT CT ACACTT AGT AGAAATT C AATCGCTATGT Target Left 1 -18 Bases. [SEQ ID NO: 140]

PRCR05 R 5'-

TGTGGTCT CAAGCG AAAAAAAT GTTTT AGT G G CAT C CAT CT ACACTT AGT AGAAATT CA ATCGCTATGT Target Left 2-18 Bases. [SEQ ID NO: 141 ]

PRCR06 R 5'-

TGTGGTCT CAAG CG AAAAAAAT GT AG AAG ATT CG AT CAT CT ACACTT AGT AGAAATT CA ATCGCTATGT Target Left 3-18 Bases. [SEQ ID NO: 142]

PRCR07 R 5'-

T GT GGT CT CAAGCG AAAAAAAG CGT CAACAGGGGACAAT CTACACTTAGTAGAAATT C AATCGCTATGT Target Right 1 -18 Bases. [SEQ ID NO: 143]

PRCR08 R 5'-

TGTGGTCT CAAG C G AAAAAAAG CTTTT AGTG G CAT C CAT CT ACACTT AGT AGAAATT C AATCGCTATGT Target Right 2-18 Bases. [SEQ ID NO: 144]

PRCR09 R 5'-

TGTGGTCT CAAG C G AAAAAAAG CT AG AAG ATT C GAT CAT CT ACACTT AGT AGAAATT C AATCGCTATGT Target Right 3-18 Bases. [SEQ ID NO: 145] PRCR10 R 5'-

TGTGGTCT CAAGCGAAAAAT ACAAT GGT CAACAGG G GACAAT CT ACACTT AGT AGAAA TTCAATCGCTATGT Target Left 1 -21 Bases. [SEQ ID NO: 146]

PRCR1 1 R 5'-

TGTGGT CT CAAGCGAAAAAT ACAAT GTTTT AGT GGCAT CCAT CT ACACTT AGT AGAAAT TCAATCGCTATGT Target Left 2-21 Bases. [SEQ ID NO: 147]

PRCR12 R 5'-

TGTGGT CT CAAGCGAAAAAT ACAAT GT AGAAGATT CGAT CAT CT ACACTT AGT AGAAAT TCAATCGCTATGT Target Left 3-21 Bases. [SEQ ID NO: 148]

PRCR13 R 5'-TGTGGTCTCAAGCGAAAAAATCAAGCGTCAACAGGGGACAATCTACACTTAGTAGAAATTCAATCGCTATGT Target Right 1-21 Bases. [SEQ ID NO: 149]

PRCR14 R 5'-TGTGGTCTCAAGCGAAAAAATCAAGCTTTTAGTGGCATCCATCTACACTTAGTAGAAATTCAATCGCTATGT Target Right 2-21 Bases. [SEQ ID NO: 150]

PRCR15 R 5'-TGTGGTCTCAAGCGAAAAAATCAAGCTAGAAGATTCGATCATCTACACTTAGTAGAAATTCAATCGCTATGT Target Right 3-21 Bases. [SEQ ID NO: 151]

PRCR16 R 5'-TGTGGTCTCAAGCGAAAAAGATACAATGGTCAACAGGGGACAATCTACACTTAGTAGAAATTCAATCGCTATGT Target Left 1-23 Bases. [SEQ ID NO: 152]

PRCR17 R 5'-TGTGGTCTCAAGCGAAAAAGATACAATGTTTTAGTGGCATCCATCTACACTTAGTAGAAATTCAATCGCTATGT Target Left 2-23 Bases. [SEQ ID NO: 153]

PRCR18 R 5'-TGTGGTCTCAAGCGAAAAAGATACAATGTAGAAGATTCGATCATCTACACTTAGTAGAAATTCAATCGCTATGT Target Left 3-23 Bases. [SEQ ID NO: 154]

PRCR19 R 5'-TGTGGTCTCAAGCGAAAAAGTATCAAGCGTCAACAGGGGACAATCTACACTTAGTAGAAATTCAATCGCTATGT Target Right 1-23 Bases. [SEQ ID NO: 155]

PRCR20 R 5'-TGTGGTCTCAAGCGAAAAAGTATCAAGCTTTTAGTGGCATCCATCTACACTTAGTAGAAATTCAATCGCTATGT Target Right 2-23 Bases. [SEQ ID NO: 156]

PRCR21 R 5'-TGTGGTCTCAAGCGAAAAAGTATCAAGCTAGAAGATTCGATCATCTACACTTAGTAGAAATTCAATCGCTATGT Target Right 3-23 Bases. [SEQ ID NO: 157]
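The crRNA primers above follow a shared layout that can be checked directly from the listed sequences. The minimal Python sketch below (all helper and variable names are hypothetical) confirms that PRCR03 is the exact reverse complement of PRCR02, so the two "Base Mature crRNA" oligos can anneal over their full 76 nt, and that each 18-base primer PRCR04 to PRCR09 carries the reverse complement of exactly one claim 20 recognition sequence. Two assumptions are made that the text does not state: the first four nucleotides of each claim 20 sequence are taken to be the PAM, and GGTCTC is read as a BsaI site (cutting 1 nt and 5 nt downstream) in order to predict the 4-nt overhangs.

```python
# Sketch: sanity checks on the crRNA primers (PRCR02-PRCR09).
from typing import Dict, List, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bsai_overhang(seq: str, site: str = "GGTCTC") -> Optional[str]:
    """Return the 4-nt 5' overhang left after a GGTCTC(1/5) cut on this strand.

    Assumption: GGTCTC acts as a BsaI site, cutting 1 nt past the site on this
    strand and 5 nt past it on the other, which exposes seq[end + 1:end + 5].
    Returns None if the site is absent from this strand.
    """
    i = seq.find(site)
    if i < 0:
        return None
    end = i + len(site)
    return seq[end + 1:end + 5]


# "Base Mature crRNA" oligos (SEQ ID NO: 138 and 139).
PRCR02 = ("TGTGGTCTCAATTGAATTTCTACTAAGTGTAGATGGAACTGAAAGTCAAG"
          "ATGGTCATTTTTCGCTTGAGACCACA")
PRCR03 = ("TGTGGTCTCAAGCGAAAAATGACCATCTTGACTTTCAGTTCCATCTACAC"
          "TTAGTAGAAATTCAATTGAGACCACA")

# PAM + recognition sequences from claim 20 (i)-(iii).
TARGETS = {
    "target 1": "TTTATGTCCCCTGTTGAC",
    "target 2": "TTTAGGATGCCACTAAAA",
    "target 3": "TTTAGATCGAATCTTCTA",
}

# 18-base "Left" and "Right" primers PRCR04-PRCR09 (SEQ ID NO: 140-145).
OLIGOS = {
    "PRCR04": "TGTGGTCTCAAGCGAAAAAAATGGTCAACAGGGGACAATCTACACTTAGTAGAAATTCAATCGCTATGT",
    "PRCR05": "TGTGGTCTCAAGCGAAAAAAATGTTTTAGTGGCATCCATCTACACTTAGTAGAAATTCAATCGCTATGT",
    "PRCR06": "TGTGGTCTCAAGCGAAAAAAATGTAGAAGATTCGATCATCTACACTTAGTAGAAATTCAATCGCTATGT",
    "PRCR07": "TGTGGTCTCAAGCGAAAAAAAGCGTCAACAGGGGACAATCTACACTTAGTAGAAATTCAATCGCTATGT",
    "PRCR08": "TGTGGTCTCAAGCGAAAAAAAGCTTTTAGTGGCATCCATCTACACTTAGTAGAAATTCAATCGCTATGT",
    "PRCR09": "TGTGGTCTCAAGCGAAAAAAAGCTAGAAGATTCGATCATCTACACTTAGTAGAAATTCAATCGCTATGT",
}

# Segment common to every target-specific primer.
SHARED_SEGMENT = "ATCTACACTTAGTAGAAATT"


def assign_targets(oligos: Dict[str, str], targets: Dict[str, str],
                   pam_len: int = 4) -> Dict[str, List[str]]:
    """List, per oligo, the targets whose recognition sequence (the site minus
    an assumed pam_len-nt PAM) occurs in the oligo as a reverse complement."""
    return {
        name: [label for label, site in targets.items()
               if revcomp(site[pam_len:]) in oligo]
        for name, oligo in oligos.items()
    }


if __name__ == "__main__":
    print("PRCR02/PRCR03 duplex:", revcomp(PRCR03) == PRCR02)
    print("overhangs:", bsai_overhang(PRCR02), bsai_overhang(PRCR03))
    for name, hits in assign_targets(OLIGOS, TARGETS).items():
        print(name, hits)
    print("shared segment in all:",
          all(SHARED_SEGMENT in o for o in OLIGOS.values()))
```

Under these assumptions PRCR02 and PRCR03 expose ATTG and AGCG overhangs, the Left and Right primers for the same target carry the same recognition sequence (for example, PRCR04 and PRCR07 both match target 1), and the shared segment ATCTACACTTAGTAGAAATT is the reverse complement of the AATTTCTACTAAGTGTAGAT stretch in PRCR02.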

Genetic Resources

• Nicotiana benthamiana plants were obtained directly from the greenhouse facility of Wageningen University & Research Radix, Bornsesteeg 48, NL-6708 PE, Wageningen, the Netherlands, where the lines of these plants have been maintained for over 30 years. The previous direct source of the plants is unknown.

• N. tabacum var. Xanthi plants were obtained directly from the greenhouse facility of Wageningen University & Research Radix, Bornsesteeg 48, NL-6708 PE, Wageningen, the Netherlands, where the lines of these plants have been maintained for over 30 years. The previous direct source of the plants is unknown.

• Pink N. tabacum flower petals. The plant material was obtained directly from the greenhouse facility of Wageningen University & Research Radix, Bornsesteeg 48, NL-6708 PE, Wageningen, the Netherlands, where the lines of these plants have been maintained for over 30 years. The previous direct source of the plants is unknown.

• N. tabacum var. Samsun plants were obtained directly from the greenhouse facility of Wageningen University & Research Radix, Bornsesteeg 48, NL-6708 PE, Wageningen, the Netherlands, where the lines of these plants have been maintained for over 30 years. The previous direct source of the plants is unknown.

• Agrobacterium tumefaciens strain AGL0 was obtained more than 25 years ago from Dr. Gerard R. Lazo, USDA-ARS Western Regional Research Center, 800 Buchanan Street, Albany, CA 94710-1105, USA. The strain has been maintained since then in the laboratory at Wageningen University & Research, Bornsesteeg 48, NL-6708 PE, Wageningen, the Netherlands. (See also Lazo GR, Stein PA, Ludwig RA. (1991) A DNA transformation-competent Arabidopsis genomic library in Agrobacterium. Biotechnology. Vol: 9, pages 963-7).

Throughout the description and claims of this specification, the words “comprise” and “contain” and variations of them mean “including but not limited to”, and they are not intended to (and do not) exclude other moieties, additives, components, integers or steps. Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.

Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive. The invention is not restricted to the details of any foregoing embodiments. The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.

The reader's attention is directed to all papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.

Claims

1. A double stranded DNA polynucleotide for insertion into DNA of an organism at a desired target locus, which target locus comprises a DNA sequence cleavable by a site-directed nuclease enzyme, the polynucleotide comprising, in linear order:

(b) a first Cpf1 target sequence;

(c) at least one selected nucleotide sequence;

(d) a second Cpf1 target sequence in an inverse orientation to (b); and

(e) a nucleotide sequence with homology to a portion of the target locus downstream of sequence cleavable by the nuclease;

2. A polynucleotide as claimed in claim 1, wherein the at least one change in sequence is selected from: a base insertion or insertions, a base deletion or deletions, base change or changes, or any combination thereof.

3. A polynucleotide as claimed in claim 1 or claim 2, wherein the site of the at least one change in sequence in (a) is adjacent to (b) and/or the site of the at least one change in sequence in (e) is adjacent to (d).

4. A polynucleotide as claimed in any of claims 1 to 3, wherein the at least one change in sequence comprises insertion of a polynucleotide.

5. A polynucleotide as claimed in claim 4, wherein the at least one inserted polynucleotide comprises a gene regulatory element, e.g. a promoter or an enhancer.

6. A polynucleotide as claimed in claim 4, wherein the at least one inserted polynucleotide comprises a sequence encoding a gene of interest (GOI); optionally further comprising a promoter for the GOI located upstream thereof.

7. A polynucleotide as claimed in any preceding claim, further comprising an additional GOI downstream of (d) and a promoter for the additional GOI upstream of (b).

8. A polynucleotide as claimed in any preceding claim, not including any pair of microhomologous regions located outside of (b), (c) and (d).

9. An isolated DNA polynucleotide comprising:

(b) a first Cpf1 target sequence;

(c) at least one selected nucleotide sequence; and

(d) a second Cpf1 target sequence;

wherein (b) is upstream of (c); (d) is downstream of (c); and (d) is in inverse orientation to (b).

10. An isolated DNA polynucleotide consisting of:

(b) a first Cpf1 target sequence;

(c) at least one selected nucleotide sequence; and

(d) a second Cpf1 target sequence;

wherein (b) is upstream of (c); (d) is downstream of (c); and (d) is in an inverse orientation to (b).
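Claims 9 and 10 place the second Cpf1 target (d) in inverse orientation to the first target (b). The minimal Python sketch below assumes that "inverse orientation" means (d) is the reverse complement of (b) (claim 18 allows the two targets to be identical or substantially the same), assembles (b), (c) and (d) in linear order, and scans both strands for the site. The target is the claim 20(i) PAM and recognition sequence without a distal sequence; the 40-nt stand-in for (c) and the helper names are hypothetical.

```python
# Sketch: an excision cassette (b)-(c)-(d) with (d) in inverse orientation,
# scanned for the Cpf1 site on both strands.
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> List[Tuple[str, int]]:
    """Return (strand, start) for every occurrence of `site` in `seq`.

    '+' hits are read on the given (top) strand; '-' hits are read on the
    complementary strand and reported by their start position on the top strand.
    """
    hits = []
    for strand, pattern in (("+", site), ("-", revcomp(site))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


TARGET_B = "TTTATGTCCCCTGTTGAC"  # claim 20(i): PAM + recognition sequence
SELECTED_C = "GCGC" * 10         # hypothetical 40-nt stand-in for (c)
TARGET_D = revcomp(TARGET_B)     # (d) in inverse orientation (assumed reverse complement)

cassette = TARGET_B + SELECTED_C + TARGET_D

if __name__ == "__main__":
    print(find_sites(cassette, TARGET_B))  # prints [('+', 0), ('-', 58)]
```

The site is found once on each strand, with the TTTA PAM at an outer end of the cassette and the recognition sequence pointing toward (c) in both cases, so in principle one crRNA could direct Cpf1 to both ends of (c) when (b) and (d) are identical.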

11. A polynucleotide as claimed in any preceding claim, wherein the at least one selected nucleotide sequence is, or comprises, one or more of a marker, a functional gene or a DNA element, e.g. a recombination site recognised by a site-specific recombinase.

12. A polynucleotide as claimed in any preceding claim, wherein the at least one selected nucleotide sequence includes a sequence encoding a Cpf1 under control of an inducible promoter.

13. A polynucleotide as claimed in claim 11, wherein there are at least two markers which are selection markers; preferably wherein the markers are under the operative control of at least one promoter for expression in a cell of the organism.

14. A polynucleotide as claimed in any preceding claim, wherein the at least one marker is selected from: a compound resistance gene, a gene encoding an observable phenotype, or a visualisable protein.

15. A polynucleotide as claimed in any preceding claim, wherein the Cpf1 target sequence (b) and the Cpf1 target sequence (d) in inverse orientation each consist of a PAM sequence, a crRNA recognition sequence, and a distal sequence.

16. A polynucleotide as claimed in claim 14, wherein the distal sequence is 4 or 5 contiguous nucleotides and upon Cpf1 cleavage forms a 5’ overhang; preferably a 4 or 5 base overhang.

17. A polynucleotide as claimed in claim 14 or claim 15, wherein the recognition sequence is at least 14 and not more than 30 contiguous nucleotides.

18. A polynucleotide as claimed in any preceding claim, wherein the Cpf1 target sequence (d) is identical or substantially the same as the Cpf1 target sequence (b).

19. A polynucleotide as claimed in any preceding claim, wherein the Cpf1 target sequence (d) is different compared to the Cpf1 target sequence (b), other than the 4 or 5 nucleotide distal sequence which is the same.

20. A polynucleotide as claimed in any of claims 15 to 19, wherein the PAM and recognition sequence is selected from:

(i) 5’ TTTATGTCCCCTGTTGAC 3’;

(ii) 5’ TTTAGGATGCCACTAAAA 3’;

(iii) 5’ TTTAGATCGAATCTTCTA 3’; or

(iv) 5’ TTTGTGCTAACGCTGATG 3’.
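Claims 15 to 17 describe each Cpf1 target as a PAM, a crRNA recognition sequence of 14 to 30 nt, and a distal sequence of 4 or 5 nt. The Python sketch below checks a candidate site against those length rules. It assumes a 4-nt PAM of the TTTV form (V = A, C or G); the claims do not state the PAM sequence, although all four claim 20 sequences begin with TTTA or TTTG. `check_target` is a hypothetical helper, not part of the disclosure.

```python
# Sketch: check a Cpf1 target against the length rules of claims 15-17,
# assuming a 4-nt TTTV PAM.
import re
from typing import List, Optional

PAM_PATTERN = re.compile(r"TTT[ACG]")  # assumed TTTV PAM (V = A, C or G)


def check_target(pam_and_recognition: str, distal: Optional[str] = None,
                 pam_len: int = 4) -> List[str]:
    """Return a list of rule violations; an empty list means the site passes."""
    pam = pam_and_recognition[:pam_len]
    recognition = pam_and_recognition[pam_len:]
    problems = []
    if not PAM_PATTERN.fullmatch(pam):
        problems.append(f"PAM {pam!r} is not TTTV")
    if not 14 <= len(recognition) <= 30:  # claim 17
        problems.append(f"recognition sequence is {len(recognition)} nt")
    if distal is not None and len(distal) not in (4, 5):  # claim 16
        problems.append(f"distal sequence is {len(distal)} nt")
    return problems


# PAM + recognition sequences (i)-(iv) of claim 20.
CLAIM_20 = [
    "TTTATGTCCCCTGTTGAC",
    "TTTAGGATGCCACTAAAA",
    "TTTAGATCGAATCTTCTA",
    "TTTGTGCTAACGCTGATG",
]

if __name__ == "__main__":
    for site in CLAIM_20:
        print(site, check_target(site) or "ok")
```

Under the 4-nt PAM assumption, each claim 20 sequence has a 14-nt recognition sequence, the minimum length permitted by claim 17.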

21. An isolated RNA molecule encoding a polynucleotide as claimed in any preceding claim.

22. A plasmid or vector comprising a polynucleotide of any of claims 1 to 19; preferably an expression vector.

23. A cell transformed with a polynucleotide of any of claims 1 to 19 or with a plasmid or vector of claim 21.

24. A cell as claimed in claim 23, which is a plant cell.

25. A method of seamless genetic modification of a cell, comprising:

(ii) introducing a polynucleotide of any of claims 1 to 20, or a vector of claim 22, into the cell;

(iii) applying a first selection screen which identifies cells wherein the polynucleotide is integrated into the DNA of the cell;

(iv) collecting the identified cells of (iii);

(v) transforming the cells of (iv) with (A) an expression vector comprising a polynucleotide sequence encoding Cpf1 and encoding a crRNA at least substantially complementary to the crRNA recognition sequence in the Cpf1 target sequence in the introduced polynucleotide; or (B) a first expression vector comprising a polynucleotide encoding Cpf1 and a second expression vector encoding a crRNA at least substantially complementary to the crRNA recognition sequence in the Cpf1 target sequence in the introduced polynucleotide;

(vii) collecting the identified cells of (vi).

26. A method of seamless genetic modification of a cell, comprising:

(iii) applying a first selection screen which identifies cells wherein the polynucleotide is integrated into a chromosome;

(iv) collecting the identified cells of (iii);

(v) transforming the cells of (iv) with an expression vector comprising a polynucleotide sequence encoding Cpf1;

(vi) introducing into the cells of (iv) or (v) a crRNA at least substantially

(vii) applying a second selection screen which identifies cells wherein sequences (b), (c) and (d) of the integrated polynucleotide are excised from the DNA of the cell by Cpf1 cleavage; and

(viii) collecting the identified cells of (vii).

27. A method of seamless genetic modification of a cell, comprising:

(iii) applying a first selection screen which identifies cells wherein the polynucleotide is integrated into the DNA of the cell;

(iv) collecting the identified cells of (iii);

(v) introducing into the cells of (iv) a Cpf1-crRNA complex, wherein the crRNA is at least substantially complementary to the crRNA recognition sequence in the Cpf1 target sequence in the introduced polynucleotide;

(vii) collecting the identified cells of (vi).

28. A method as claimed in any of claims 24 to 27, wherein a plant cell is treated to form a protoplast and wherein the polynucleotide or vector is introduced into the protoplast; optionally wherein the protoplast is then regenerated to a plant cell and further optionally cultured to form tissue.

29. A method as claimed in any of claims 24 to 27, wherein an expression vector encoding Cpf1 and optionally encoding the crRNA, is introduced into plant tissue by agroinfiltration; preferably at the same time as the vector of claim 22.

30. A method as claimed in any of claims 24 to 29, wherein the tissue is cultured to produce a plantlet, or a callus and then a plantlet.

31. A method as claimed in claim 30, wherein the plantlet is grown into a plant.

32. A method of plant breeding comprising selecting a desired plant, isolating plant tissue therefrom and subjecting the tissue to a method of any of claims 25 to 31; preferably wherein the plant is an elite plant.

33. A kit for seamless genetic modification of a cell, comprising a container which includes a first polynucleotide or a plasmid comprising:

(b) a first Cpf1 target sequence;

(c) a sequence encoding at least one marker; and

(d) a second Cpf1 target sequence; wherein (b) is upstream of (c); (d) is downstream of (c); and (d) is in an inverse orientation to (b).

34. A kit as claimed in claim 33, wherein the marker is as set forth in any of claims 12 to 14; and/or the Cpf1 target sequences (b) and (d) are as set forth in any of claims 13 to 17.

35. A kit as claimed in claim 33 or claim 34, further comprising a container which includes a second polynucleotide or plasmid encoding Cpf1.

36. A kit as claimed in claim 35, wherein the second polynucleotide or plasmid further comprises a sequence encoding a crRNA which recognises the Cpf1 target sequence.

37. A kit as claimed in any of claims 33 to 36, further comprising a container which includes a third polynucleotide or plasmid encoding a crRNA which recognises the Cpf1 target sequence.

38. A kit as claimed in any of claims 33 to 36, further comprising a container which includes a crRNA which recognises the Cpf1 target sequence.