WO2019072596A1 - Thermostable Cas9 nucleases with reduced off-target activity

Publication number
WO2019072596A1
WO2019072596A1 PCT/EP2018/076480 EP2018076480W WO2019072596A1 WO 2019072596 A1 WO2019072596 A1 WO 2019072596A1 EP 2018076480 W EP2018076480 W EP 2018076480W WO 2019072596 A1 WO2019072596 A1 WO 2019072596A1
Authority
WO
WIPO (PCT)
Prior art keywords
thermocas9
protein
seq
pam
dna
Application number
PCT/EP2018/076480
Other languages
French (fr)
Inventor
John Van Der Oost
Richard Van Kranenburg
Elleke Fenna BOSMA
Ioannis MOUGIAKOS
Prarthana MOHANRAJU
Original Assignee
Wageningen Universiteit
Application filed by Wageningen Universiteit

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K2319/00 Fusion polypeptide

Abstract

ThermoCas9 is identified and characterized from the thermophilic bacterium Geobacillus thermodenitrificans T12. Experiments show that, in vitro, ThermoCas9 is active between 20 and 70 °C, has a stringent PAM preference at lower temperatures, tolerates fewer spacer-protospacer mismatches than SpCas9, and that its activity at elevated temperatures depends on the sgRNA structure. Described are ThermoCas9-based engineering tools for gene deletion and transcriptional silencing at 55 °C in Bacillus smithii and for gene deletion at 37 °C in Pseudomonas putida.

Description

THERMOSTABLE CAS9 NUCLEASES WITH REDUCED OFF-TARGET ACTIVITY
FIELD OF THE INVENTION
The present invention relates to the field of genetic engineering and more particularly to nucleic acid editing and genome modification using CRISPR/Cas systems. The invention concerns genetic engineering tools in the form of nucleases, particularly thermostable or "thermo" Cas9 nucleases, whereby the configuration of the nuclease and adoption of conditions under which it is employed means that it is useful for sequence-directed site-specific binding, nicking, cutting and modification of genetic material. Also part of the field of the invention are associated expression constructs for delivery and expression of Cas9 nucleases and guide RNAs in vivo. Further, the invention concerns methods and systems of sequence-specific editing of nucleic acids in vitro or in vivo in any amenable organism, including prokaryotes or eukaryotes, including animals, mammals, humans and plants.
BACKGROUND TO THE INVENTION
It was first demonstrated in 2007 that CRISPR-Cas is an adaptive immune system in many bacteria and most archaea (Barrangou et al., 2007, Science 315: 1709-1712; Brouns et al., 2008, Science 321: 960-964). Based on functional and structural criteria, two classes of CRISPR-Cas systems that each comprise three types have so far been characterized, most of which use small RNA molecules as guides to target complementary DNA sequences (Makarova et al., 2015, Nat Rev Microbiol 13: 722-736; Mohanraju et al., 2016, Science 353: aad5147).
In a study by the Doudna/Charpentier labs, a thorough characterization of the effector enzyme of the class 2/type II CRISPR-Cas system (Cas9) was performed, including demonstration that the introduction of designed CRISPR RNA guides (with specific spacer sequences) targets complementary sequences (protospacers) on a plasmid, causing double strand breaks of this plasmid (Jinek et al., 2012, Science 337: 816-821). Following Jinek et al., 2012, Cas9 has been used as a tool for genome editing. Cas9 has been used to engineer the genomes of a range of eukaryotic cells (e.g. fish, plant, man) (Charpentier and Doudna, 2013, Nature 495: 50-51).
In addition, Cas9 has been used to improve yields of homologous recombination in bacteria by selecting for dedicated recombination events (Jiang et al., 2013, Nature Biotechnol 31: 233-239). To achieve this, a toxic fragment (Targeting construct) is co-transfected with a rescuing fragment carrying the desired alteration (Editing construct, carrying point mutations or deletions). The Targeting construct consists of Cas9 in combination with a designed CRISPR and an antibiotic resistance marker, defining the site of the desired recombination on the host chromosome; in the presence of the corresponding antibiotic, selective pressure ensures that the Targeting construct is stably maintained in the host. Only when the additional recombination of the Editing construct with the CRISPR target site on the host chromosome occurs can the host escape from the auto-immunity problem. Hence, in the presence of the antibiotic, only the desired (marker-free) mutants are able to survive and grow because the Cas9 target is removed or mutated. A related strategy to select for subsequent removal of the integrated Targeting construct from the chromosome is presented as well, generating a genuine marker-free mutant.
It has been established in recent years that CRISPR-Cas mediated genome editing constitutes a useful tool for genetic engineering. The prokaryotic CRISPR systems serve their hosts as adaptive immune systems (Jinek et al., 2012, Science 337: 816-821) and can be used for quick and effective genetic engineering (Mali et al., 2013, Nat Methods 10: 957-963, for example), requiring only modification of the guide sequence in order to target sequences of interest.
WO2016/198361 (WAGENINGEN UNIVERSITEIT) describes how Geobacillus thermodenitrificans was discovered during a search of a library of about 500 isolates for a thermophile capable of degrading lignocellulosic substrates under anaerobic conditions. After several selection rounds by isolation on cellulose and xylan, 110 isolates were identified. Of this library, all of the 110 isolates were Geobacillus sp., with G. thermodenitrificans representing about 79% of the library. The isolated G. thermodenitrificans strain is designated "T12". Based on T12, what is disclosed are thermostable Cas9 nucleases and variants thereof for performing gene editing at elevated temperatures. These "thermoCas9" nucleases provide novel tools for genetic engineering at elevated temperatures and are of particular value in the genetic manipulation of thermophilic microorganisms.
Recently, in Schaefer A. K. (2017) Nature Methods 14, 547-548, concerns were raised that CRISPR can have very large numbers of off-target sites. A high number of CRISPR-associated mutations were found in two mice (zygotes) subjected to coinjection with an sgRNA-expressing plasmid, a single-stranded oligodeoxynucleotide (ssODN) donor template and Cas9 protein. Comparison was made to a control (non-injected zygote) mouse. Over a thousand mutations were reported, a small number of which were predicted to be disruptive to gene expression and therefore potentially critical. Also, the off-target sites were not predictable in silico. Therefore there is an emergent problem of off-target Cas9 nuclease activity in the field of genetic modification of organisms, with potentially serious practical consequences and restrictions on the applicability of the technology.
SUMMARY OF THE INVENTION
ThermoCas9, a Cas9 orthologue from the thermophilic bacterium G. thermodenitrificans T12, as previously described in WO2016/198361, is active in vitro in a wide temperature range of 20 °C - 70 °C, which is much broader than the range of its mesophilic orthologue from Streptococcus pyogenes (SpCas9). The extended activity and stability of ThermoCas9 allows for its application in molecular biology techniques that require DNA manipulation at temperatures of 20 °C - 70 °C, as well as its exploitation in harsh environments that require robust enzymatic activity.
PAM preferences of ThermoCas9 are relatively strict for activity in the lower part of the temperature range (< 30 °C), whereas activity at moderate to optimal temperatures (37 °C - 60 °C) still takes place with more variety in the PAM sequence.
Unexpectedly, the inventors have found that, with the appropriate RNA guide, target association and ThermoCas9 activity have a temperature-dependent character. Without wishing to be bound by any particular theory, stabilization of the multi-domain ThermoCas9 protein is believed to most likely be the result of a major conformational change from an open/flexible state to a rather compact state (as described for SpCas9) upon guide binding.
ThermoCas9 has been found to be able to tolerate fewer single base mismatches at lower temperature (37 °C) with a linear DNA target, which is important for its application in genome editing of mammalian cells, and yet very limited multiple base mismatches at both higher (55 °C) and lower (37 °C) temperatures (for both plasmid and linear DNA targets). In other words, ThermoCas9 at temperatures in the lower part of the operable temperature range has been found to lack or have significantly reduced cleavage activity when there are mismatches between gRNA and target (protospacer). This means a greater potential for being more specific, i.e. less off-target binding and activity.
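The notion of single and multiple base mismatches between a guide spacer and a protospacer can be made concrete by counting position-wise differences. The following is a minimal sketch only; the function name and the short toy sequences are illustrative assumptions and are not taken from this specification.

```python
def count_mismatches(spacer: str, protospacer: str) -> int:
    """Count positions at which the spacer and the protospacer differ.

    Both sequences are read 5'->3' in the same sense (the protospacer as it
    appears on the non-target strand). RNA 'U' is treated as DNA 'T'.
    """
    a = spacer.upper().replace("U", "T")
    b = protospacer.upper().replace("U", "T")
    if len(a) != len(b):
        raise ValueError("spacer and protospacer must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Toy example (hypothetical sequences): two mismatched positions.
print(count_mismatches("ACGUACGU", "ACGAACGA"))
```

A count of 0 corresponds to a perfectly matched target, 1 to a single base mismatch, and larger values to multiple base mismatches.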
As described herein, ThermoCas9 is an RNA-guided DNA-endonuclease from the CRISPR-Cas type-IIC system of a thermophilic bacterium. A particular example of ThermoCas9 is described in WO2016/198361 and is from Geobacillus thermodenitrificans T12 and has an amino acid sequence as set forth in SEQ ID NO: 1, or a sequence of at least 77% identity therewith. However, what is meant by ThermoCas9 in the context of this specification is any Cas9 which is found in nature in any thermophilic organism, not just G. thermodenitrificans, and which may be isolated therefrom. Examples of other bacterial species which provide a ThermoCas9 are as follows. The percentage sequence identity is with respect to the ThermoCas9 of Geobacillus thermodenitrificans T12:
Species                                      % identity
Geobacillus MAS1                             88
Geobacillus stearothermophilus               88
Geobacillus stearothermophilus ATCC 12980    88
Geobacillus Sah69                            88
Geobacillus stearothermophilus               88
Geobacillus kaustophilus                     88
Geobacillus stearothermophilus               88
Geobacillus genomosp. 3                      87
Geobacillus genomosp. 3                      87
Geobacillus subterraneus                     87
Effusibacillus pohliae                       86
Also included in what is understood as "ThermoCas9" herein is any man-made variant achieved via modification of the amino acid sequence, which itself may be achieved by modification of the genetic sequence encoding the Cas9 protein and expression thereof in a suitable in vitro or in vivo expression system.
Accordingly, the present invention provides a method of modifying a desired genetic target locus, comprising delivering a composition to the locus, the composition comprising a ThermoCas9 protein which has an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 77% identity therewith, or a polypeptide fragment of such a ThermoCas9 protein, and one or more nucleic acid components, wherein the ThermoCas9 protein or polypeptide forms a complex with the one or more nucleic acid components and upon binding of the complex to a target locus that is 3' of a Protospacer Adjacent Motif (PAM), the ThermoCas9 protein or polypeptide makes a modification of the target locus, and wherein the method is performed at a temperature of less than 50 °C.
In terms of "modifying" the desired genetic target locus, this includes binding, cleaving (nuclease action), marking or chemically modifying genetic material. Advantageously, the performance of the method of the invention at a temperature of less than 50 °C results in a reduction in off-target effects that would otherwise be expected if the method of genetic modification were being undertaken using a known "mesophilic" Cas9 protein, e.g. SpCas9, which has a native temperature optimum range of from 35 - 45 °C.
The reduction in the level of off-target modification expected with the invention is to be understood in comparative terms, that is to say undesirable off-target modification is reduced compared to when the method is performed using the same ThermoCas9 at a temperature of about 55 °C.
In another way, the level of off-target modification is understood to be reduced when compared to when the method is performed at a temperature in the range of from 35 °C - 45 °C using SpCas9 instead of ThermoCas9; in particular, when using SpCas9 at a temperature of 37 °C.
The method of the invention may be performed at a temperature in the range of about 20 °C to about 50 °C; preferably in the range 35 °C - 45 °C. Ideally, the method of the invention is performed at 37 °C.
For the ThermoCas9 for use in the method of the invention, the target locus comprises or is sufficiently near to an appropriate PAM sequence for the ThermoCas9; preferably wherein the PAM comprises 5'-NNNNC-3' [SEQ ID NO: 2]. More particularly, the PAM sequence comprises 5'-NNNNCNNA-3' [SEQ ID NO: 3] or 5'-NNNNCSAA-3' [SEQ ID NO: 4].
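The degenerate PAM motifs above use IUPAC nucleotide codes (N for any base, S for C or G). A candidate sequence can be checked against such a motif position by position. The following is a minimal sketch; the function name `matches_pam` and the dictionary name are illustrative assumptions, while the motifs and example PAMs are those given in this specification.

```python
# Standard IUPAC nucleotide codes mapped to the bases they permit.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pam(candidate: str, motif: str) -> bool:
    """Return True if `candidate` begins (from its 5'-end) with `motif`.

    `motif` may contain IUPAC degenerate codes; `candidate` is plain DNA.
    """
    candidate = candidate.upper()
    motif = motif.upper()
    if len(candidate) < len(motif):
        return False
    return all(base in IUPAC[code] for base, code in zip(candidate, motif))


for pam in ("CCCCCCAA", "ATCCCCAA", "ACGGCCAA", "ACGGTCAA"):
    print(pam, matches_pam(pam, "NNNNCNNA"), matches_pam(pam, "NNNNCSAA"))
```

Because only the leading positions are compared, the same function covers the shorter 5'-NNNNC-3' motif as well as the 8-nucleotide motifs.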
The one or more nucleic acid components in the method of the invention is preferably a guide RNA (gRNA); more preferably a single guide RNA (sgRNA). The modification resulting from the method of the invention may be cleavage of the desired locus nucleic acid, e.g. DNA, by the nuclease activity of the ThermoCas9 protein or polypeptide. In certain methods of the invention, the target locus is found in a cell; optionally a bacterial cell, an animal cell, a human cell or a plant cell and so in such cases the method is carried out in vivo. Alternatively, the method of the invention may be carried out on isolated nucleic acid materials in vitro; e.g. chromosomes, plasmids or linear DNA fragments etc.
The target locus is preferably to be found within double stranded DNA and the modification is a double stranded break. In other situations, the polynucleotide comprising the target locus may be a double stranded DNA, but the ThermoCas9 protein or polypeptide lacks the ability to cut the double stranded DNA and said use results in gene silencing of the polynucleotide.
In certain methods of the invention, the ThermoCas9 protein or polypeptide contains the mutations D8A and H582A.
The ThermoCas9 protein may have the amino acid sequence of a ThermoCas9 from Geobacillus sp., preferably Geobacillus thermodenitrificans; more preferably Geobacillus thermodenitrificans T12, e.g. as set forth in SEQ ID NO: 1.
In preferred aspects, the ThermoCas9 protein used in certain methods of the invention may further comprise at least one functional moiety.
Alternatively, when the ThermoCas9 protein is provided as part of a protein complex comprising at least one further functional or non-functional protein, optionally this at least one further protein further comprises at least one functional moiety. The ThermoCas9 protein or further protein may comprise at least one functional moiety fused or linked to the N-terminus and/or the C-terminus of the ThermoCas9 protein or protein complex; preferably the C-terminus. Then the at least one functional moiety may be a protein; optionally selected from a helicase, a nuclease, a helicase-nuclease, a DNA methylase, a histone methylase, an acetylase, a phosphatase, a kinase, a transcription (co-)activator, a transcription repressor, a DNA binding protein, a DNA structuring protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein, a signal peptide, a subcellular localisation sequence, an antibody epitope or an affinity purification tag, for example a green fluorescent protein (GFP). From each of the expected activities of the aforementioned, the person of average skill in the art will readily recognize the intended function and purpose of the modification achievable by the method of the invention, whether this is marking loci, regulation of gene expression or chemical change at a locus, including base change(s) such as substitutions, deletions or indels.
In some other methods of the invention, the native nuclease activity of the ThermoCas9 protein may be inactivated and the inactivated ThermoCas9 protein is linked to at least one functional moiety. Preferably, such at least one functional moiety is a nuclease domain. Alternatively, the at least one functional moiety may be a marker protein. When there is a set of two catalytically inactive Cas9 nucleases, then each may be fused to a FokI nuclease domain, for example.
In preferred methods of the invention, the temperature is selected from a temperature of: not more than 49 °C, not more than 48 °C, not more than 47 °C, not more than 46 °C, not more than 45 °C, not more than 44 °C, not more than 43 °C, not more than 42 °C, not more than 41 °C, not more than 40 °C, not more than 39 °C, not more than 38 °C, not more than 37 °C, not more than 36 °C, not more than 35 °C, not more than 34 °C, not more than 33 °C, not more than 32 °C, not more than 31 °C, or not more than 30 °C.
In other preferred embodiments, the temperature is a temperature in the range selected from any of the following ranges, as shown by the upper and lower limit combinations marked "x" in table 1 below:
Table 1
Figure imgf000010_0001
The temperature of association between the ThermoCas9-guide RNA and the target nucleic acid may be one part in the overall process or method, but any of the temperatures and temperature ranges disclosed above may apply to the other steps in the overall process or method, including the entirety of the process or method. In some embodiments the process or method of using the ThermoCas9 for CRISPR gene modification is carried out at the same temperature throughout.
That is to say, all steps in the method of the invention may be carried out at substantially the same temperature. Alternatively, some steps may be carried out at a first temperature within any of the aforementioned ranges and other steps may be carried out at a different temperature, also within the aforementioned ranges.
The invention also includes using a ThermoCas9 as defined herein, together with a suitable targeting RNA molecule, e.g. sgRNA for binding, cleaving, marking or modifying a target polynucleotide within a target nucleic acid sequence. The targeting RNA molecule recognizes the target nucleic acid sequence on a target nucleic acid strand of the polynucleotide.
The target locus is a polynucleotide that comprises a target nucleic acid sequence and may be double stranded and so comprise a target nucleic acid strand, comprising said target nucleic acid sequence, and a non-target nucleic acid strand, comprising a "protospacer" nucleic acid sequence. The protospacer nucleic acid sequence is substantially complementary to the target nucleic acid sequence and pairs with it in the double stranded target polynucleotide. The non-target nucleic acid strand may further comprise the "protospacer adjacent motif" (PAM) sequence directly adjacent the 3' end of the protospacer sequence. The PAM sequence may be at least 6, 7, or 8 nucleic acids in length.
Preferably, the PAM sequence has a cytosine in the fifth position. Preferably the PAM sequence comprises the sequence 5'-NNNNC-3', so that from the 5'-end the PAM sequence begins 5'-NNNNC-3'. Additionally or alternatively, the PAM sequence may have an adenine in the eighth position, so that the PAM sequence comprises the sequence 5'-NNNNCNNN-3' [SEQ ID NO: 5] (preferably 5'-NNNNCNNA-3') and from the 5'-end the PAM sequence begins 5'-NNNNNNNA-3' [SEQ ID NO: 6]. Additionally or alternatively, the PAM sequence may have a cytosine in one or more of the first, second, third, fourth, and sixth positions, such that from the 5'-end the PAM sequence begins 5'-CNNNN-3' [SEQ ID NO: 7], 5'-NCNNN-3' [SEQ ID NO: 8], 5'-NNCNN-3' [SEQ ID NO: 9], 5'-NNNCN-3' [SEQ ID NO: 10], and/or 5'-NNNNNC-3' [SEQ ID NO: 11].
In a particularly preferred embodiment, the PAM sequence comprises, so that from the 5'-end the PAM sequence begins, 5'-CCCCCCNA-3' [SEQ ID NO: 12], and more particularly the PAM sequence comprises, so that from the 5'-end the PAM sequence begins, 5'-CCCCCCAA-3' [SEQ ID NO: 13].
Other preferred PAM sequences include 5'-ATCCCCAA-3' [SEQ ID NO: 14] and 5'-ACGGCCAA-3' [SEQ ID NO: 15].
In preferred aspects, a ThermoCas9 protein or polypeptide fragment of the invention comprises an amino acid sequence of at least 75% identity; preferably at least 85%; more preferably at least 90%; even more preferably at least 95% identity to SEQ ID NO: 1. The Cas protein or polypeptide may be used in combination with a targeting RNA molecule, e.g. sgRNA, that recognizes a target nucleic acid sequence on the target nucleic acid strand, where the non-target nucleic acid sequence has a PAM sequence directly adjacent the 3' end of the protospacer sequence on the non-target strand, as disclosed herein. Thus, the PAM sequence may comprise the sequence 5'-NNNNC-3', and the Cas protein may bind, cleave, mark or modify the target strand at any temperature as hereinbefore defined. Preferably from the 5'-end the PAM sequence begins 5'-NNNNC-3' and the Cas protein may bind, cleave, mark or modify the target strand at any temperature as hereinbefore defined. Preferably from the 5'-end the PAM sequence begins 5'-NNNNNNNA-3' and the Cas protein may bind, cleave, mark or modify the target strand at any temperature as hereinbefore defined. Further preferably the 5'-end of the PAM sequence begins 5'-NNNNCNNA-3' and the Cas protein may bind, cleave, mark or modify the target strand at any temperature as hereinbefore defined.
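The arrangement described above, in which a PAM lies directly adjacent the 3' end of the protospacer on the non-target strand, suggests a simple way to list candidate target sites in a DNA sequence. The sketch below is illustrative only: the function names, the default spacer length of 20 nucleotides and the toy input are assumptions, and only the IUPAC codes appearing in the PAMs of this specification (N and S) are handled.

```python
import re

# Regex fragments for the IUPAC codes used in the PAM motifs of this document.
_IUPAC_RE = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]", "N": "[ACGT]"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, pam: str = "NNNNCNNA", spacer_len: int = 20):
    """List (strand, start, protospacer, pam) for protospacers followed 3' by a PAM.

    Both strands are scanned. For the '-' strand, `start` is the offset in the
    reverse-complemented sequence, not in the input sequence.
    """
    pam_re = "".join(_IUPAC_RE[c] for c in pam.upper())
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})({pam_re}))")
    sites = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for m in pattern.finditer(s):
            sites.append((strand, m.start(), m.group(1), m.group(2)))
    return sites


# Toy input: a 20-nt protospacer followed by the preferred PAM 5'-CCCCCCAA-3'.
print(find_sites("A" * 20 + "CCCCCCAA"))
```

Each reported protospacer is the sequence a guide spacer would be designed to match; whether a site is actually cleaved is an experimental question that this sketch does not address.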
More particularly, a ThermoCas9 protein or polypeptide as employed in the present invention may comprise an amino acid sequence with a percentage identity with SEQ ID NO: 1 as follows: at least 60%, at least 61%, at least 62%, at least 63%, at least 64%, at least 65%, at least 66%, at least 67%, at least 68%, at least 69%, at least 70%, at least 71%, at least 72%, at least 73%, at least 74%, at least 75%, at least 76%, at least 77%, at least 78%, at least 79%, at least 80%, at least 81%, at least 82%, at least 83%, at least 84%, at least 85%, at least 86%, at least 87%, at least 88%, at least 89%, at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99%, at least 99.5% or at least 99.8%. The percentage identity may be at least 89%. The percentage identity may be at least 90%. Preferably the percentage identity will be at least 95%, for example 98%.
The percentage amino acid sequence identity with SEQ ID NO: 1 is determinable as a function of the number of identical positions shared by the sequences in a selected comparison window, taking into account the number of gaps, and the length of each gap, which need to be introduced for optimal alignment of the two sequences.
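As an illustration of how a percentage identity can be derived from an alignment, the sketch below counts identical positions over the aligned columns of two sequences that have already been aligned, with gaps written as "-". This is one common convention and is offered as an assumption; the specification does not prescribe a particular alignment tool or denominator, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns that hold the same residue in both sequences.

    Inputs must be pre-aligned (equal length, '-' for gaps). Columns in which
    both sequences have a gap are ignored; a column with a gap in exactly one
    sequence counts towards the length but never as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == "-" and y == "-")
    ]
    if not columns:
        raise ValueError("alignment has no comparable columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


# Toy alignment (hypothetical residues): 7 identical columns out of 9.
print(round(percent_identity("MKYKIG-LD", "MKYRIGALD"), 1))
```

A candidate sequence would then be compared against a chosen threshold, for example the 77% figure used elsewhere in this specification.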
The methods of the invention may be carried out in vivo, for example, in bacterial cells or in animal or human cells. Optionally, the uses and methods of the invention may be carried out, and the nucleoproteins of the invention formed and used, in vivo, for example in human cells which are not embryonic stem cells. Optionally, the uses and methods of the invention may be carried out, and the nucleoproteins of the invention formed and used, in vivo, for example in human cells in methods which do not involve modifying the germ line genetic identity of a human being.
Alternatively, the methods of the invention may be carried out in vitro. The ThermoCas9 protein as described herein may be provided in isolated form, for example when used in vitro or when added to cells by transfection. The ThermoCas9 protein may be heterologously expressed, for example following transient or stable transformation of the cell by nucleic acid encoding the ThermoCas9 protein. The targeting RNA molecule may be transcribed from an expression vector following transient or stable transformation of the cell by nucleic acid encoding the RNA molecule, and/or the RNA molecule may be provided in isolated form, for example when used in vitro or when added to cells by transfection. In preferred embodiments, the ThermoCas9 protein or polypeptide is expressed from the genome of a host cell, following stable integration of a nucleic acid encoding the Cas protein or polypeptide in the genome of the host cell. Such expression systems may be inducible. Thus the ThermoCas9 protein and/or RNA molecule may be added to the in vivo or in vitro environment using any artificial or contrived method for adding a protein or nucleic acid molecule to a cell in which it is not otherwise present.
The polynucleotide comprising the target nucleic acid sequence (also referred to herein as the "desired locus") may be cleaved by the ThermoCas9 protein, and optionally the cleavage may be DNA cleavage. The target nucleic acid strand comprising the target sequence may be double stranded DNA and the method or use may result in a double stranded break in the polynucleotide comprising the target nucleic acid sequence. The polynucleotide comprising the target nucleic acid sequence may be double stranded DNA, the ThermoCas9 protein may lack the ability to cut the double stranded DNA and the use or method may result in gene silencing of the polynucleotide.
Also, described herein are nucleic acids encoding any of the aforementioned ThermoCas9 proteins or polypeptides. The nucleic acids may be isolated or in the form of expression constructs.
In all aforementioned, amino acid residues in ThermoCas9 may be substituted conservatively or non-conservatively. Conservative amino acid substitutions refer to those where amino acid residues are substituted for other amino acid residues with similar chemical properties (e.g., charge or hydrophobicity) and therefore do not alter the functional properties of the resulting polypeptide.
Similarly it will be appreciated by a person of average skill in the art that nucleic acid sequences may be substituted conservatively or non-conservatively without affecting the function of the polypeptide. Conservatively modified nucleic acids are those substituted for nucleic acids which encode identical or functionally identical variants of the amino acid sequences. It will be appreciated by the skilled reader that each codon in a nucleic acid (except AUG and UGG; typically the only codons for methionine or tryptophan, respectively) can be modified to yield a functionally identical molecule. Accordingly, each silent variation (i.e. synonymous codon) of a polynucleotide or polypeptide, which encodes a polypeptide of the present invention, is implicit in each described polypeptide sequence.
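The point about silent variation can be illustrated with the standard genetic code: for each codon, the set of synonymous codons encoding the same amino acid can be listed, and AUG and UGG are the only sense codons with no alternative. The sketch below is illustrative; the function name is an assumption, and the standard (nuclear) code is assumed rather than any organism-specific variant.

```python
from itertools import product

# Standard genetic code, codons ordered by U, C, A, G at each position.
_BASES = "UCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def synonymous_codons(codon: str) -> list:
    """Return all codons (including the input) encoding the same amino acid."""
    codon = codon.upper().replace("T", "U")
    amino_acid = CODON_TABLE[codon]
    return [c for c, aa in CODON_TABLE.items() if aa == amino_acid]


for example in ("AUG", "UGG", "GCU"):
    print(example, CODON_TABLE[example], synonymous_codons(example))
```

Replacing a codon with any other member of its synonymous set leaves the encoded polypeptide unchanged, which is the sense in which each silent variation is implicit in a described polypeptide sequence.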
The invention further provides a method of binding, cleaving, marking or modifying a target nucleic acid in a cell, optionally a human cell, comprising either 1) transforming, transfecting or transducing the cell with an expression vector comprising a nucleotide sequence encoding a ThermoCas9 protein or polypeptide as hereinbefore described, and a nucleotide sequence encoding a targeting RNA molecule of the invention; or 2) transforming, transfecting or transducing the cell with an expression vector comprising a nucleotide sequence encoding a ThermoCas9 protein or polypeptide as hereinbefore described, and a further expression vector comprising a nucleotide sequence encoding a targeting RNA molecule; or 3) transforming, transfecting or transducing the cell with an expression vector comprising a nucleotide sequence encoding a ThermoCas9 protein or polypeptide as hereinbefore described, and delivering a targeting RNA molecule as provided herein to, or into the cell.
All of the aforementioned methods are substantially carried out at a temperature of not more than 50 °C. The ranges of temperature for this invention apply as hereinbefore defined.
The ThermoCas9 protein or polypeptide may be expressed from the genome of the transformed cell, for example following stable integration into the genome of a nucleotide sequence encoding the Cas protein or polypeptide. When applied to a human cell, the human cell is not an embryonic stem cell. Optionally, the method does not involve modifying the germ line genetic identity of a human being.
The invention also provides kits comprising one or more of the reagents for carrying out the uses and methods of the invention, or for generating the transformed cells, optionally transformed human cells, or nucleoprotein complexes as described herein, said kits including: a ThermoCas9 protein or polypeptide as hereinbefore described, or an expression vector comprising a nucleic acid sequence encoding a ThermoCas9 protein or polypeptide as hereinbefore described; and/or a targeting RNA molecule or an expression vector comprising a nucleic acid sequence encoding a targeting RNA molecule. The kits include instructions for carrying out the invention to reduce off-target effects by employing a temperature of not more than 50 °C, or a temperature within any of the ranges of temperature as hereinbefore defined.
RNA Guides and Target Sequences
ThermoCas9 proteins used in accordance with the invention include all those described in WO2016/198361 and allow for sequence-specific binding, cleavage, tagging, marking or modification of a locus (i.e. target nucleic acids). Target nucleic acids may be DNA (single-stranded or double-stranded), RNA or synthetic nucleic acids. A particularly useful application of the present invention is the sequence-specific targeting and modification of genomic DNA by one or more ThermoCas9 proteins as hereinbefore described in complex with one or more guide RNAs (gRNAs), e.g. sgRNAs, that complementarily bind to a targeted sequence of the genomic DNA. Consequently, the target nucleic acid is preferably double-stranded DNA. Such targeting may be performed in vitro or in vivo. Preferably such targeting is performed in vivo. In this way, ThermoCas9 proteins may be used to target and modify specific DNA sequences located in the genomic DNA of a cell, optionally a human cell. It is envisaged that the ThermoCas9 system may be used to modify genomes in a variety of cell types and/or in different organisms. Optionally, the Cas proteins and systems of the invention may be used to modify the genome of a variety of human cell types, particularly isolated human cells except embryonic stem cells.
The gRNAs, also referred to herein as targeting RNA molecules, recognize the target nucleic acid sequence on the polynucleotide target strand. The RNA molecules may be designed to recognize a target sequence in a double stranded target polynucleotide, wherein the non-target strand comprises a protospacer adjacent motif (PAM) sequence directly adjacent the 3' end of the protospacer sequence. Disclosed herein are PAM sequences that work in an optimal manner with the ThermoCas9 proteins and polypeptides required for the invention. With knowledge of these PAM sequences, gRNAs may be designed for use with the ThermoCas9 proteins and polypeptides across the range of temperatures as hereinbefore disclosed. Accordingly, a ribonucleoprotein complex comprises a ThermoCas9 protein or a polypeptide as hereinbefore described, and at least one RNA molecule which has a targeting function in that it recognizes a particular nucleotide sequence in a target polynucleotide. The present invention also provides use of at least one targeting RNA molecule and a Cas protein or polypeptide for binding, cleaving, marking or modifying a target nucleic acid strand, and a method of binding, cleaving, marking or modifying a target nucleic acid sequence in a target nucleic acid strand using a ribonucleoprotein or nucleoprotein of the invention, as well as transformed human cells having the Cas protein or polypeptide and targeting RNA molecule. The target polynucleotide may further comprise a defined PAM sequence directly adjacent the 3' end of a protospacer sequence, in accordance with a PAM sequence provided herein. The PAM sequence may be 6, 7, or 8 nucleic acids in length, or longer, preferably 8 nucleic acids in length. Preferably, the RNA molecule is a single-stranded RNA molecule, e.g. a CRISPR RNA (crRNA) and is associated, e.g. by hybridization with a tracrRNA. The targeting RNA may be a chimera of a crRNA and tracrRNA. 
The aforementioned RNA molecules may have a ribonucleotide sequence of at least 90% identity, or complementarity to a target nucleotide sequence. Optionally, the RNA molecule has a ribonucleotide sequence of at least 90%, at least 91%, at least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least 98%, at least 99% or 100% identity or complementarity to a target nucleotide sequence. The preferred target nucleotide sequence is a DNA. Optionally, the use of at least one targeting RNA molecule and a Cas protein or polypeptide for binding, cleaving, marking or modifying a target nucleic acid strand, and a method of binding, cleaving, marking or modifying a target nucleic acid sequence in a target nucleic acid strand using a ribonucleoprotein or nucleoprotein of the invention, may be used in a human cell. Optionally the human cell will be isolated. Optionally the human cell is not an embryonic stem cell. Optionally the Cas protein or polypeptide or ribonucleoprotein or nucleoprotein of the invention will not be used for modifying the germ line genetic identity of a human being. The targeting RNA molecule is preferably modeled on what are known from nature in prokaryotes as CRISPR RNA (crRNA) molecules. The structure of crRNA molecules is already established and explained in more detail in Jore et al., 2011, Nature Structural & Molecular Biology 18: 529-537. In brief, a mature crRNA of type I-E is often 61 nucleotides long and consists of a 5' "handle" region of 8 nucleotides, the "spacer" sequence of 32 nucleotides, and a 3' sequence of 21 nucleotides which form a hairpin with a tetranucleotide loop (Fig 5). Type I systems differ from type II (Cas9) and details of different systems are described in Van der Oost 2014 Nat Rev Micr 12: 479-492. In type II (Cas9) systems there is a different processing mechanism, making use of a second RNA (tracrRNA) and two ribonucleases.
Rather than a hairpin, the mature crRNA in type II remains attached to a fragment of the tracrRNA (Fig. 5). However, the RNA used in the invention does not have to be designed strictly to the design of naturally occurring crRNA, whether in length, regions or specific RNA sequences. What is clear though, is that RNA molecules for use in the invention may be designed based on gene sequence information in the public databases or newly discovered, and then made artificially, e.g. by chemical synthesis in whole or in part. The RNA molecules may also be designed and produced by way of expression in genetically modified cells or cell free expression systems and this option may include synthesis of some or all of the RNA sequence.
The structure and requirements of crRNA in type II (Cas9) has also been described in Jinek et al., 2012 ibid. In type I, there is a so-called "SEED" portion forming the 5' end of the spacer sequence and which is flanked 5' thereto by the 5' handle of 8 nucleotides. Semenova et al. (2011, PNAS 108: 10098-10103), have found that all residues of the SEED sequence should be complementary to the target sequence, although for the residue at position 6, a mismatch may be tolerated (Fig. 5). In type II, there is a SEED of 10-12 nucleotides that is located at the 3' end of the spacer (Fig. 5) (reviewed by Van der Oost 2014 ibid.). Similarly, when designing and making an RNA component of a ribonucleoprotein complex directed at a target locus (i.e. sequence), the necessary match and mismatch rules for the type II SEED sequence can be applied.
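As a rough illustration of how a type II SEED rule might be checked during guide design, the sketch below lists the mismatch positions between a spacer and a candidate protospacer and tests whether any of them falls within the 3'-terminal SEED. It is a simplification under stated assumptions: both sequences are written in the DNA alphabet on the same strand, the SEED length defaults to 12 (the upper end of the 10-12 nucleotide range given above), and any SEED mismatch is treated as disqualifying. The function names are hypothetical.

```python
def mismatch_positions(spacer, protospacer):
    """Return the 0-based positions at which the two sequences differ."""
    if len(spacer) != len(protospacer):
        raise ValueError("spacer and protospacer must have equal length")
    return [i for i, (a, b) in enumerate(zip(spacer, protospacer)) if a != b]


def seed_is_intact(spacer, protospacer, seed_len=12):
    """True when no mismatch lies within the last seed_len positions,
    i.e. within the SEED at the 3' end of the spacer.

    seed_len=12 is an illustrative default chosen from the 10-12 nt range.
    """
    seed_start = len(spacer) - seed_len
    return all(pos < seed_start
               for pos in mismatch_positions(spacer, protospacer))
```

A mismatch near the 5' end of a 20-nucleotide spacer leaves the SEED intact under this rule, whereas a mismatch two positions from the 3' end does not.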
The invention therefore includes a method of detecting and/or locating a single base change in a target nucleic acid molecule comprising contacting a nucleic acid sample with a ribonucleoprotein complex as hereinbefore described, or with a Cas protein or polypeptide and separate targeting RNA component of the invention as hereinbefore described, and wherein the sequence of the targeting RNA (including when in the ribonucleoprotein complex) is such that it discriminates between a normal allele and a mutant allele by virtue of a single base change. There are certain factors which confer the thermostability character of ThermoCas9, one of which is the PAM preferences of ThermoCas9. The PAM preferences of ThermoCas9 turn out to be strict for activity in the lower part of the temperature range (< 30°C), whereas more variety in the PAM is allowed for activity at the moderate to optimal temperatures (37°C to 60°C). As such, the PAM sequence may be altered to obtain the most efficient binding, cleavage, marking or modification of the target at a given temperature. This provides a great deal of flexibility in application of the ThermoCas9, depending on the particular application. Indeed, in the context of the invention, the optimization via selection of a particular PAM sequence can be used to further enhance the advantage of reduced off-target effect.
In all aspects of the invention, ThermoCas9 proteins or polypeptides may be obtained or derived from bacteria, archaea or viruses; or alternatively may be synthesised de novo. In preferred embodiments, a Cas protein or polypeptide is derived from a thermophilic prokaryotic organism, which may be classified as an archaea or bacterium, but is preferably a bacterium. More preferably a Cas protein or polypeptide of the invention will be derived from a thermophilic bacterium. Herein, the term "thermophilic" is to be understood as meaning capable of survival and growth at relatively high temperatures, for example in the context of the invention, capable of nucleic acid cleavage, binding or modification at a temperature between 41 and 122 °C (106 and 252 °F). Preferably a ThermoCas9 protein or polypeptide for use in the invention may be isolated from one or more thermophilic bacteria and will function above 60°C. Preferably a Cas protein or polypeptide may be isolated from one or more thermophilic bacteria and will function in the range 60°C to 80°C and optimally between 60°C and 65°C. In preferred embodiments, a Cas protein or polypeptide is derived from Geobacillus sp. More preferably, a Cas protein of the invention is derived from Geobacillus thermodenitrificans. Even more preferably, a Cas protein of the invention is derived from Geobacillus thermodenitrificans T12. A Cas protein or polypeptide of the invention may be derived from a virus.
Functional Moieties
Advantageously, the ability of ThermoCas9 proteins, polypeptides and ribonucleoprotein complexes used in the method of the invention to target any polynucleotide sequence (i.e. desired locus) in a sequence-specific manner may be exploited in order to modify the target nucleic acid in some way, for example by cleaving it and/or marking it and/or modifying it. It will therefore be appreciated that additional proteins may be provided along with the ThermoCas9 protein or polypeptide to achieve this. Accordingly, the ThermoCas9 proteins or polypeptides may further comprise at least one functional moiety and/or the Cas proteins, polypeptides or ribonucleoprotein complexes may be provided as part of a protein complex comprising at least one further protein. In a preferred ThermoCas9 protein, polypeptide or ribonucleoprotein complex, the at least one further protein further comprises at least one functional moiety. This at least one functional moiety may be fused or linked to the ThermoCas9 protein. Preferably, the at least one functional moiety may be translationally fused to the ThermoCas9 protein through expression in natural or artificial protein expression systems. Alternatively, the at least one functional moiety may be covalently linked by a chemical synthesis step to the ThermoCas9 protein. Preferably, the at least one functional moiety is fused or linked to the N-terminus and/or the C-terminus of the ThermoCas9 protein; preferably the C-terminus. Desirably, the at least one functional moiety will be a protein. It may be a heterologous protein or alternatively may be native to the bacterial species from which the ThermoCas9 protein was derived.
The at least one functional moiety may be a protein; optionally selected from a helicase, a nuclease, a helicase-nuclease, a DNA methylase, a histone methylase, an acetylase, a phosphatase, a kinase, a transcription (co-)activator, a transcription repressor, a DNA binding protein, a DNA structuring protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein, a signal peptide, a subcellular localisation sequence, an antibody epitope or an affinity purification tag.
Nuclease Activity
Cas proteins, polypeptides or ribonucleoproteins used in the method of the invention may have more than one nuclease domain. Site-specific nucleases can permit the generation of double strand breaks (DSBs) at selected positions along a strand of DNA. In a target host cell, this enables DSBs to be made at specific pre-selected positions in the genome. The creation of such breaks by site-specific nucleases prompts the endogenous cellular repair machinery to be repurposed in order to insert, delete or modify DNA at desired positions in the genome of interest.
One or more nuclease activity sites of the protein or polypeptide molecule may be inactivated, e.g. so as to allow the activity of another functional moiety linked or fused to the protein or polypeptide, e.g. a nuclease domain such as FokI nuclease. Therefore notwithstanding the fact that the Cas proteins, polypeptides and ribonucleoproteins may have endogenous nuclease activity, for certain applications it may be desirable to inactivate the native nuclease activity of the Cas protein and provide a Cas protein or a ribonucleoprotein complex wherein the native Cas9 nuclease activity is inactivated and the Cas protein is linked to at least one functional moiety. Reducing the incidence of mis-targeting events by complementation of the native Cas9 nuclease activity is one such application. This may desirably be achieved by inactivation of the native Cas9 nuclease activity of the Cas protein or ribonucleoprotein complex and provision of a heterologous nuclease, preferably fused to the Cas protein. Accordingly, in a Cas protein or a ribonucleoprotein complex at least one functional moiety may be a nuclease domain, preferably a FokI nuclease domain. In a particularly preferred aspect, the Cas protein or ribonucleoprotein complex fused to a FokI nuclease domain is provided as part of a protein complex, preferably comprising another Cas protein or ribonucleoprotein complex fused to a FokI nuclease domain, wherein the two complexes target opposite strands of the target genomic DNA.
For some applications it may be desirable to completely attenuate the nuclease activity of the Cas protein, polypeptide or ribonucleoprotein, for example in applications where the Cas protein or ribonucleoprotein complex is utilised to recognise and modify a specific target sequence in a nucleic acid, for instance to mark it as part of a diagnostic test. The nuclease activity of the Cas protein may be inactivated and the functional moiety fused to the Cas protein may be a protein; optionally selected from a helicase, a nuclease, a helicase-nuclease, a DNA methylase, a histone methylase, an acetylase, a phosphatase, a kinase, a transcription (co-)activator, a transcription repressor, a DNA binding protein, a DNA structuring protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein, a signal peptide, a subcellular localisation sequence, an antibody epitope or an affinity purification tag. In a preferred aspect, a catalytically inactive, or "dead" Cas protein or polypeptide (dCas) lacking nuclease activity may be bound to a target nucleic acid sequence and thereby sterically repress activity of that sequence. For example, a targeting RNA may be designed that is complementary to a promoter or exonic sequence of a gene, so that binding of the dCas and targeting RNA to the gene sterically represses transcriptional initiation or elongation of the gene sequence, thereby repressing expression of the gene. Alternatively, the methods and uses described herein can use modified nuclease variants of gtCas9 that are nickases. A nickase can be created via a mutation in either one of the HNH or the RuvC catalytic domains of the gtCas9 nuclease. This has been shown for S. pyogenes Cas9 (spCas9) with spCas9-mutants D10A and H840A, which have an inactive RuvC or HNH nuclease domain, respectively. The combination of these two mutations leads to a catalytically dead Cas9 variant (Standage-Beier, K. et al., 2015, ACS Synth. Biol. 4, 1217-1225; Jinek, M. et al., 2012, Science 337, 816-821; Xu, T. et al., 2015, Appl. Environ. Microbiol. 81, 4423-4431). Based on sequence homology (Figure 3), these residues can be D8 (D17 in Figure 3) and D581 or H582 (Figure 3) in gtCas9.
Preferably, the mutations D8A and H582A in gtCas9 (ThermoCas9) can be used to create a catalytically inactive, or "dead" Cas protein or polypeptide variant of ThermoCas9 (dCas) which lacks nuclease activity. Such a dCas may usefully find application as, for example, an efficient thermoactive transcriptional silencing CRISPRi tool, being able to steadily and specifically bind to DNA elements without introducing dsDNA breaks. Advantageously, such a system could, amongst other things, greatly facilitate metabolic studies of human cells. In one ThermoCas9 protein or ribonucleoprotein complex, the nuclease activity is inactivated and the at least one functional moiety is a marker protein, for example GFP. In this way it is possible to specifically target a nucleic acid sequence of interest and to visualize it using a marker which generates an optical signal. Suitable markers may include for example, a fluorescent reporter protein, e.g. Green Fluorescent Protein (GFP), Yellow Fluorescent Protein (YFP), Red Fluorescent Protein (RFP), Cyan Fluorescent Protein (CFP) or mCherry. Such a fluorescent reporter gene provides a suitable marker for visualisation of protein expression since its expression can be simply and directly assayed by fluorescence measurement. Alternatively, the reporter nucleic acid may encode a luminescent protein, such as a luciferase (e.g. firefly luciferase). Alternatively, the reporter gene may encode a chromogenic enzyme which can be used to generate an optical signal, such as beta-galactosidase (LacZ) or beta-glucuronidase (Gus). Reporters used for measurement of expression may also be antigen peptide tags. Other reporters or markers are known in the art, and they may be used as appropriate.
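By way of illustration only, the notation used for the inactivating substitutions (e.g. D8A) can be applied to a protein sequence programmatically. The sketch below parses a substitution string, confirms that the expected wild-type residue is present at the stated 1-based position, and returns the mutated sequence. The function name is hypothetical, and only the first 20 residues of SEQ ID NO: 1 are used, so the example demonstrates D8A alone; H582A would require the full-length sequence.

```python
import re


def apply_substitution(protein, mutation):
    """Apply a single substitution such as 'D8A' (1-based numbering).

    Raises ValueError if the notation is not recognised or if the
    wild-type residue named in the mutation is not found at that position.
    """
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if parsed is None:
        raise ValueError(f"unrecognised mutation: {mutation}")
    wild_type = parsed.group(1)
    position = int(parsed.group(2))
    replacement = parsed.group(3)
    index = position - 1
    if index < 0 or index >= len(protein) or protein[index] != wild_type:
        raise ValueError(f"residue {position} is not {wild_type}")
    return protein[:index] + replacement + protein[index + 1:]


# First 20 residues of SEQ ID NO: 1 (ThermoCas9), used as a short example.
N_TERMINUS = "MKYKIGLDIGITSIGWAVIN"
dead_n_terminus = apply_substitution(N_TERMINUS, "D8A")
```

The wild-type check guards against numbering offsets such as the D8/D17 difference between the protein sequence and the alignment noted above.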
Because the marker may be visualized, in certain embodiments where the target nucleic acid is RNA, specifically mRNA, it is possible to quantify the transcriptional activity of a gene by detection and quantification of the optical signal provided by the marker, particularly where the optical signal generated by the marker is directly proportionate to the quantity of the expression product. Therefore, Cas proteins or ribonucleoproteins may be used to assay expression products of a gene of interest.
In one aspect, the ThermoCas9 described herein may be used in a homologous recombination (HR) mediated genome modification method. Such methods involve HR and site-directed ThermoCas9 activity, whereby counter selection occurs by the ThermoCas9 activity removing cells which do not have a desired modification introduced by HR. Thus the methods and uses provided herein allow the process of homologous recombination to be favoured during a first step such that the genome can be modified with the desired mutation and a second step in which unmodified cells can be targeted by the ThermoCas9 ribonuclease complex to introduce a DSDB into the genomes of the unmodified cells. These methods and uses increase overall the population of cells with the desired mutation whilst eliminating any unmodified cells. Preferably, such methods and uses are used in microbes that have substantially no endogenous NHEJ repair mechanism. Alternatively, the methods and uses may be applied to cells that have an endogenous NHEJ repair mechanism. The methods and uses described herein may be applied to cells that have an endogenous NHEJ repair mechanism but wherein the NHEJ repair mechanism is either conditionally reduced or the NHEJ activity is knocked out.
The methods and uses provided herein may utilise a sequence of the homologous recombination polynucleotide that has at least one mis-match with the guide RNA, such that the guide RNA is no longer able to recognise the modified genome. This means that the ThermoCas9 ribonuclease complex will not recognise the modified genome. Therefore, no DSDB can be introduced by the ThermoCas9 ribonuclease complex and so the modified cells will survive. However, the cells with unmodified genomes will still have substantial complementarity to the guide RNA and consequently can be cleaved site-specifically by the ThermoCas9 ribonuclease complex.
In another aspect of the methods and uses of the invention, the way in which the ThermoCas9 ribonuclease complex is prevented from acting to cleave the microbial genome is not so much to modify or eliminate the sequence targeted by the guide, but rather the PAM required by the ThermoCas9 ribonuclease complex. The PAM is either modified or eliminated in order to blind the gtCas9 ribonuclease complex to the specific cutting site. Therefore, methods and uses of the invention may include those using a sequence of the homologous recombination polynucleotide that does not include a PAM sequence recognised by the ThermoCas9 ribonuclease complex. Therefore, no DSDB can be introduced by the ThermoCas9 ribonuclease complex and so the HR modified cells will survive. However, the unmodified cells will still be recognised by the ThermoCas9 ribonuclease complex and its guide and so consequently are cleaved site-specifically.
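The counter-selection logic of the two preceding paragraphs can be sketched as a simple predicate: a genome remains a target only while it carries both the protospacer matching the guide and an adjacent PAM. The sketch below is a deliberately simplified model under stated assumptions: it requires an exact protospacer match, inspects one strand only, and uses an illustrative PAM pattern based on the 5'-CCCCCNNA-3' library motif. The function name and all sequences are hypothetical.

```python
import re


def is_cleavable(genome, spacer, pam_regex):
    """True if the spacer sequence occurs immediately 5' of a PAM match.

    Simplified model: exact protospacer match, single strand only.
    """
    return re.search(re.escape(spacer) + pam_regex, genome) is not None


PAM = "CCCCC[ACGT][ACGT]A"          # illustrative pattern only
SPACER = "ACGTACGTACGTACGTACGT"     # hypothetical guide sequence

# Unmodified genome: protospacer and PAM both present.
unmodified = "GG" + SPACER + "CCCCCGTA" + "TT"
# HR product carrying a mismatch within the protospacer.
edited_protospacer = "GG" + "ACGTACGTACGTACGTACCT" + "CCCCCGTA" + "TT"
# HR product in which the PAM has been altered instead.
edited_pam = "GG" + SPACER + "CCTCCGTA" + "TT"

survivors = [name for name, genome in [
    ("unmodified", unmodified),
    ("edited_protospacer", edited_protospacer),
    ("edited_pam", edited_pam),
] if not is_cleavable(genome, SPACER, PAM)]
```

In this model only the two recombinant genomes escape cleavage, which mirrors the selection for HR-modified cells described above.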
Thus methods and uses of the invention are provided herein that rely on HR to modify the genome of the cell. Preferably, the upstream and downstream flanks are 0.5 kilobases (kb) to 1.0 kb each in length. However, recombination using larger or shorter fragments is possible as well. The homologous recombination polynucleotide may further comprise a polynucleotide sequence between the upstream and downstream flanking regions. This polynucleotide sequence could for example contain a modification that is to be introduced into the genome. Whilst homologous recombination relies upon the upstream and downstream flanks having substantial complementarity to the target regions, mismatches can be accommodated as well. Therefore, in some embodiments, homologous recombination is known to occur between DNA segments with extensive homology to the upstream and downstream flanks. In alternative embodiments, the upstream and downstream flanks have complete complementarity to the target regions. The upstream and downstream flanks need not be identical in size. However, in some instances the upstream and downstream flanks are identical in size. The efficiency of homologous recombination will vary depending on the likelihood of homologous recombination of the smallest fragment length of the flank. However, even if the homologous recombination process is inefficient, advantageously the method described herein will select for any cell that has the desired modification over the unmodified cell. Homologous recombination also allows large deletions (e.g. 50 kb or more) to be made encompassing complete gene clusters. Homologous recombination is also used for recombineering, which is a well-known method to allow for recombination over smaller fragments (45-100 nt).
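The arrangement of a homologous recombination polynucleotide described above (upstream flank, optional intervening sequence, downstream flank) may be sketched as follows. The function name, the coordinate convention and the default flank length of 500 nucleotides (the lower end of the preferred 0.5 kb to 1.0 kb range) are assumptions for illustration; equal flank lengths are used only for simplicity, since the flanks need not be identical in size.

```python
def build_hr_template(genome, start, end, payload="", flank_len=500):
    """Return upstream flank + payload + downstream flank for replacing
    genome[start:end] (0-based, end-exclusive coordinates).

    An empty payload models a deletion of the region between the flanks.
    flank_len=500 is an illustrative default.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if start - flank_len < 0 or end + flank_len > len(genome):
        raise ValueError("not enough sequence for the requested flanks")
    upstream = genome[start - flank_len:start]
    downstream = genome[end:end + flank_len]
    return upstream + payload + downstream
```

With an empty payload the returned template joins the two flanks directly, corresponding to removal of the intervening polynucleotide sequence; a non-empty payload corresponds to an insertion or replacement.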
The methods and uses described herein can optionally further comprise at least another homologous recombination polynucleotide or a polynucleotide comprising a sequence encoding a homologous recombination polynucleotide having a sequence substantially complementary to a second target region containing the target in the genome.
In preferred embodiments, the methods and uses of the invention described herein utilise a homologous recombination polynucleotide that is DNA. In some embodiments the DNA is single stranded. In other embodiments, the DNA is double stranded. In further embodiments, the DNA is double stranded and plasmid borne.
HR in the methods and uses of the invention provided herein may be used to remove a polynucleotide sequence from the genome. Alternatively, HR in the methods and uses provided herein may be used to insert one or more gene(s), or fragment(s) thereof, into the genome. As a further alternative, HR in the methods and uses provided herein may be used to modify or replace at least one nucleotide in the genome. Consequently, the methods and uses provided herein may be used for any desired kind of genome modification. Alternatively, the ThermoCas9 described herein may be used in a HR mediated genome modification method in cells, whereby the ThermoCas9 activity introduces DSDB and can induce cellular HR in cells, as has been shown for SpCas9 (Jiang et al. (2013) Nature Biotech, 31, 233-239; Xu et al. (2015) Appl Environ Microbiol, 81, 4423-4431; Huang et al. (2015) Acta Biochimica et Biophysica Sinica, 47, 231-243).
Alternatively, homologous recombination may be facilitated through recombineering, e.g., by introducing an oligonucleotide into a cell expressing a gene coding for RecT or beta protein as reviewed by Mougiakos et al. ((2016), Trends Biotechnol. 34: 575-587). In a further embodiment, the Cas9 can be combined with Multiplex Automated Genome Engineering (MAGE) as exemplified by Ronda et al. ((2016), Sci. Rep. 6: 19452).
ThermoCas9 Nuclease Activity: Divalent Cations
Previously characterized, mesophilic Cas9 endonucleases employ divalent cations to catalyze the generation of DSBs in target DNA. ThermoCas9 can mediate dsDNA cleavage in the presence of any of the following divalent cations: Mg2+, Ca2+, Mn2+, Co2+, Ni2+, and Cu2+.
ThermoCas9 Nuclease Activity: Substrates
Despite reports that certain type-IIC systems were efficient single stranded DNA cutters (Ma, et al., Mol. Cell 60, 398-407 (2015); Zhang, et al., Mol. Cell 60, 242-255 (2015)), ThermoCas9 cannot direct cleavage of ssDNA. The nuclease activity of ThermoCas9 is limited to dsDNA substrates.
Expression Vectors
In order that expression of the nucleic acid sensing construct may be carried out in a chosen cell according to methods and uses of the invention, the polynucleotide sequence encoding the ThermoCas9 protein or ribonucleoprotein will preferably be provided in an expression construct. In some embodiments, the polynucleotide encoding the Cas protein or ribonucleoprotein will be provided as part of a suitable expression vector. In certain embodiments an expression vector of the present invention (with or without nucleotide sequence encoding amino acid residues which on expression will be fused to a Cas protein) may further comprise a nucleotide sequence encoding a targeting RNA molecule as hereinbefore defined. Consequently, such expression vectors can be used in an appropriate host to generate a ribonucleoprotein complex of the invention which can target a desired nucleotide sequence. Alternatively, nucleotide sequences encoding a targeting RNA molecule as hereinbefore defined may be provided in a separate expression vector or alternatively may be delivered to a target cell by other means.
Suitable expression vectors will vary according to the recipient cell and suitably may incorporate regulatory elements which enable expression in the target cell and preferably which facilitate high-levels of expression. Such regulatory sequences may be capable of influencing transcription or translation of a gene or gene product, for example in terms of initiation, accuracy, rate, stability, downstream processing and mobility.
Such elements may include, for example, strong and/or constitutive promoters, 5' and 3' UTRs, transcriptional and/or translational enhancers, transcription factor or protein binding sequences, start sites and termination sequences, ribosome binding sites, recombination sites, polyadenylation sequences, sense or antisense sequences, sequences ensuring correct initiation of transcription and optionally poly-A signals ensuring termination of transcription and transcript stabilisation in the host cell. The regulatory sequences may be plant-, animal-, bacteria-, fungal- or virus-derived. Clearly, appropriate regulatory elements will vary according to the host cell of interest. Regulatory elements which facilitate high-level expression in human host cells might include the AOX1 or GAL1 promoter in yeast or the CMV- or SV40-promoters, CMV-enhancer, SV40-enhancer, Herpes simplex virus VP16 transcriptional activator or inclusion of a globin intron in human cells.
Suitable regulatory elements may be constitutive, whereby they direct expression under most environmental conditions or developmental stages, developmental stage specific or inducible. Preferably, the promoter is inducible, to direct expression in response to environmental, chemical or developmental cues, such as temperature, light, chemicals, drought, and other stimuli. Suitably, promoters may be chosen which allow expression of the protein of interest at particular developmental stages or in response to extra- or intra-cellular conditions, signals or externally applied stimuli. For example, a range of promoters exist for use in E. coli which give high-level expression at particular stages of growth (e.g. osmY stationary phase promoter) or in response to particular stimuli (e.g. HtpG Heat Shock Promoter).
Suitable expression vectors may comprise additional sequences encoding selectable markers which allow for the selection of said vector in a suitable host cell and/or under particular conditions. The methods and uses of the invention also include modifying a target nucleic acid in a human cell, comprising transfecting, transforming or transducing the cell with any of the expression vectors as hereinbefore described. The methods of transfection, transformation or transduction are of the types well known to a person of skill in the art. Where there is one expression vector used to generate expression of a ribonucleoprotein complex of the invention and when the targeting RNA is added directly to the cell then the same or a different method of transfection, transformation or transduction may be used. Similarly, when there is one expression vector being used to generate expression of a ribonucleoprotein complex of the invention and when another expression vector is being used to generate the targeting RNA in situ via expression, then the same or a different method of transfection, transformation or transduction may be used.
BRIEF DESCRIPTION OF THE FIGURES
The invention will now be described in detail with reference to a specific embodiment and with reference to the accompanying drawings, in which:
Figure 1 shows protein architecture of A. naeslundii Cas9 (Cas9-Ana) (Jinek et al., 2014). gtCas9 belongs to the same Type II-C CRISPR system and active site residues could be identified.
Figure 2 shows ThermoCas9 PAM analysis. (A) Schematic illustrating the in vitro cleavage assay for discovering the position and identity (5'-NNNNNNN-3' [SEQ ID NO: 16]) of the protospacer adjacent motif (PAM). Black triangles indicate the cleavage position.
(B) Sequence logo of the consensus 7nt long PAM of ThermoCas9, obtained by comparative analysis of the ThermoCas9-based cleavage of target libraries. Letter height at each position is measured by information content.
(C) Extension of the PAM identity to the 8th position by in vitro cleavage assay.
Four linearized plasmid targets, each containing a distinct 5'-CCCCCCAN-3' [SEQ ID NO: 17] PAM, were incubated with ThermoCas9 and sgRNA at 55°C for 1 hour, then analysed by agarose gel electrophoresis.
(D) In vitro cleavage assays for DNA targets with different PAMs at 30°C and 55°C.
Sixteen linearized plasmid targets, each containing one distinct 5'-CCCCCNNA-3' [SEQ ID NO: 18] PAM, were incubated with ThermoCas9 and sgRNA, then analysed for cleavage efficiency by agarose gel electrophoresis. See also Figure 5.
Figure 3 shows ThermoCas9 is active at a wide temperature range and its thermostability increases when bound to sgRNA.
(A) Schematic representation of the sgRNA and a matching target DNA. Target DNA is shown as a rectangle with black outline, and the PAM is shown as a dark grey, horizontal ellipse with black outline. The crRNA is shown as a dark grey rectangle with black outline and the site where the 3'-end of the crRNA is linked with the 5'-end of the tracrRNA is shown as a black, vertical ellipse. The black box with the white letters and the light grey box with the black letters indicate the predicted three and two loops at the 3'-side of the tracrRNA, respectively. The 41-nt truncation of the repeat/anti-repeat region, formed by the complementary 3'-end of the crRNA and the 5'-end of the tracrRNA, is indicated with a long, light grey, vertical, dotted line. The predicted 3' position of the first tracrRNA loop is marked with a black triangle and a black dotted line. The predicted 3' position of the second tracrRNA loop is marked with a white triangle and a black dotted line. The predicted 3' position of the third tracrRNA loop is marked with a white triangle and a white dotted line.
(B) The importance of the predicted three stem-loops of the tracrRNA scaffold was tested by transcribing truncated variants of the sgRNA and evaluating their ability to guide ThermoCas9 to cleave target DNA at various temperatures. Average values of at least two biological replicates are shown, with error bars representing S.D.
(C) To identify the maximum temperature, endonuclease activity of ThermoCas9:sgRNA RNP complex was assayed after incubation at 60°C, 65°C and 70°C for 5 or 10 min. The pre-heated DNA substrate was added and the reaction was incubated for 1 hour at the corresponding temperature.
(D) Comparison of active temperature range of ThermoCas9 and SpCas9 by activity assays conducted after 5 min of incubation at the indicated temperature. The pre-heated DNA substrate was added and the reaction was incubated for 1 hour at the same temperature.
Figure 4 shows in silico PAM determination results. Panel (A) shows the two hits obtained with phage genomes using CRISPRtarget6. Panel (B) shows sequence logo of the consensus 7nt long PAM of ThermoCas9, obtained by in silico PAM analysis. Letter height at each position is measured by information content.
Figure 5 shows ThermoCas9 PAM discovery. In vitro cleavage assays for DNA targets with different PAMs at 20°C, 37°C, 45°C and 60°C. Seven (20°C) or sixteen (37°C, 45°C, 60°C) linearized plasmid targets, each containing a distinct 5'-CCCCCNNA-3' PAM, were incubated with ThermoCas9 and sgRNA, then analysed by agarose gel electrophoresis.
Figure 6 shows activity of ThermoCas9 at a wide temperature range using sgRNA containing one loop. The importance of the predicted three stem loops of the tracrRNA scaffold was tested by transcribing truncated variations of the sgRNA and evaluating their ability to guide ThermoCas9 to cleave target DNA at various temperatures. Shown above is the effect of one loop on the activity of ThermoCas9 at various temperatures. Average values from at least two biological replicates are shown, with error bars representing S.D.
Figure 7 shows ThermoCas9 mediates dsDNA targeting using divalent cations as catalysts and does not cleave ssDNA. Panel (A) shows in vitro plasmid DNA cleavage by ThermoCas9 with EDTA and various metal ions. M = 1 kb DNA ladder. Panel (B) shows activity of ThermoCas9 on ssDNA substrates. M = 10 bp DNA ladder.
Figure 8 shows spacer selection for the ldhL silencing experiment. Schematic representation of the spacer (sgRNA)-protospacer annealing during the ldhL silencing process; the selected protospacer resides on the non-template strand and 39 nt downstream of the start codon of the ldhL gene.
Figure 9A shows the scheme of the generated mismatch protospacers library, employed for evaluating the ThermoCas9:sgRNA targeting specificity in vitro. The generated mismatches are indicated with white letters on black background.
Figure 9B is a graphical representation of the ThermoCas9:sgRNA cleavage efficiency over linear or plasmid targets with different mismatches at 37°C.
Figure 9C is a graphical representation of the ThermoCas9:sgRNA cleavage efficiency over linear or plasmid targets with different mismatches at 55°C.
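The sequence logo of Figure 4(B) scales letter height by information content. As a minimal sketch of how that quantity can be computed per position, the Python below uses the usual definition for a four-letter alphabet (2 bits minus the Shannon entropy of the column). The function names and the toy alignment are hypothetical and are not data from this application; no small-sample correction is applied.

```python
import math
from collections import Counter


def information_content(column):
    """Information content (bits) of one alignment column of DNA letters."""
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    # A four-letter alphabet carries at most log2(4) = 2 bits per position.
    return math.log2(4) - entropy


def logo_heights(sequences):
    """Information content for every position of equal-length sequences."""
    return [information_content(column) for column in zip(*sequences)]


# Hypothetical toy alignment (illustration only, not measured PAMs).
toy_pams = ["CCCCCCAA", "CCCCCAAA", "CCCCCGAA", "CCCCCTAA"]
heights = logo_heights(toy_pams)
print(heights)
```

Fully conserved positions reach the 2-bit maximum, while a position at which all four nucleotides are equally frequent contributes no height.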
Below are polynucleotide and amino acid sequences of Cas proteins used in accordance with the invention.
[SEQ ID NO: 1] Geobacillus thermodenitrificans T12 Cas9 protein AA sequence
MKYKIGLDIGITSIGWAVINLDIPRIEDLGVRIFDRAENPKTGESLALPRRLARSARRR LRRRKHRLERIRRLFVREGILTKEELNKLFEKKHEIDVWQLRVEALDRKLNNDELARI LLHLAKRRGFRSNRKSERTNKENSTMLKHIEENQSILSSYRTVAEMVVKDPKFSLH KRNKEDNYTNTVARDDLEREIKLIFAKQREYGNIVCTEAFEHEYISIWASQRPFASK DDIEKKVGFCTFEPKEKRAPKATYTFQSFTVWEHINKLRLVSPGGIRALTDDERRLIY KQAFHKNKITFHDVRTLLNLPDDTRFKGLLYDRNTTLKENEKVRFLELGAYHKIRKAI DSVYGKGAAKSFRPIDFDTFGYALTMFKDDTDIRSYLRNEYEQNGKRMENLADKVY DEELIEELLNLSFSKFGHLSLKALRNILPYMEQGEVYSTACERAGYTFTGPKKKQKT VLLPNIPPIANPVVMRALTQARKWNAIIKKYGSPVSIHIELARELSQSFDERRKMQK EQEGNRKKNETAIRQLVEYGLTLNPTGLDIVKFKLWSEQNGKCAYSLQPIEIERLLE PGYTEVDHVIPYSRSLDDSYTNKVLVLTKENREKGNRTPAEYLGLGSERWQQFETF VLTNKQFSKKKRDRLLRLHYDENEENEFKNRNLNDTRYISRFLANFIREHLKFADSD DKQKVYTVNGRITAHLRSRWNFNKNREESNLHHAVDAAIVACTTPSDIARVTAFYQ RREQNKELSKKTDPQFPQPWPHFADELQARLSKNPKESIKALNLGNYDNEKLESL QPVFVSRMPKRSITGAAHQETLRRYIGIDERSGKIQTWKKKLSEIQLDKTGHFPMY GKESDPRTYEAIRQRLLEHNNDPKKAFQEPLYKPKKNGELGPIIRTIKIIDTTNQVIPL NDGKTVAYNSNIVRVDVFEKDGKYYCVPIYTIDMMKGILPNKAIEPNKPYSEWKEMT EDYTFRFSLYPNDLIRIEFPREKTIKTAVGEEIKIKDLFAYYQTIDSSNGGLSLVSHDN NFSLRSIGSRTLKRFEKYQVDVLGNIYKVRGEKRVGVASSSHSKAGETIRPL*
[SEQ ID NO: 19] Geobacillus thermodenitrificans T12 Cas9 DNA Sequence ATGAAGTATAAAATCGGTCTTGATATCGGCATTACGTCTATCGGTTGGGCTGTC ATTAATTTGGACATTCCTCGCATCGAAGATTTAGGTGTCCGCATTTTTGACAGAG CGGAAAACCCGAAAACCGGGGAGTCACTAGCTCTTCCACGTCGCCTCGCCCGC TCCGCCCGACGTCGTCTGCGGCGTCGCAAACATCGACTGGAGCGCATTCGCC GCCTGTTCGTCCGCGAAGGAATTTTAACGAAGGAAGAGCTGAACAAGCTGTTT GAAAAAAAGCACGAAATCGACGTCTGGCAGCTTCGTGTTGAAGCACTGGATCG AAAACTAAATAACGATGAATTAGCCCGCATCCTTCTTCATCTGGCTAAACGGCG TGGATTTAGATCCAACCGCAAGAGTGAGCGCACCAACAAAGAAAACAGTACGAT GCTCAAACATATTGAAGAAAACCAATCCATTCTTTCAAGTTACCGAACGGTTGCA GAAATGGTTGTCAAGGATCCGAAATTTTCCCTGCACAAGCGTAATAAAGAGGAT AATTACACCAACACTGTTGCCCGCGACGATCTTGAACGGGAAATCAAACTGATT TTCGCCAAACAGCGCGAATATGGGAACATCGTTTGCACAGAAGCATTTGAACAC GAGTATATTTCCATTTGGGCATCGCAACGCCCTTTTGCTTCTAAGGATGATATC GAGAAAAAAGTCGGTTTCTGTACGTTTGAGCCTAAAGAAAAACGCGCGCCAAAA GCAACATACACATTCCAGTCCTTCACCGTCTGGGAACATATTAACAAACTTCGT CTTGTCTCCCCGGGAGGCATCCGGGCACTAACCGATGATGAACGTCGTCTTAT ATACAAGCAAGCATTTCATAAAAATAAAATCACCTTCCATGATGTTCGAACATTG CTTAACTTGCCTGACGACACCCGTTTTAAAGGTCTTTTATATGACCGAAACACCA CGCTGAAGGAAAATGAGAAAGTTCGCTTCCTTGAACTCGGCGCCTATCATAAAA TACGGAAAGCGATCGACAGCGTCTATGGCAAAGGAGCAGCAAAATCATTTCGT CCGATTGATTTTGATACATTTGGCTACGCATTAACGATGTTTAAAGACGACACCG ACATTCGCAGTTACTTGCGAAACGAATACGAACAAAATGGAAAACGAATGGAAA ATCTAGCGGATAAAGTCTATGATGAAGAATTGATTGAAGAACTTTTAAACTTATC GTTTTCTAAGTTTGGTCATCTATCCCTTAAAGCGCTTCGCAACATCCTTCCATAT ATGGAACAAGGCGAAGTCTACTCAACCGCTTGTGAACGAGCAGGATATACATTT ACAGGGCCAAAGAAAAAACAGAAAACGGTATTGCTGCCGAACATTCCGCCGAT CGCCAATCCGGTCGTCATGCGCGCACTGACACAGGCACGCAAAGTGGTCAATG CCATTATCAAAAAGTACGGCTCACCGGTCTCCATCCATATCGAACTGGCCCGG GAACTATCACAATCCTTTGATGAACGACGTAAAATGCAGAAAGAACAGGAAGGA AACCGAAAGAAAAACGAAACTGCCATTCGCCAACTTGTTGAATATGGGCTGACG CTCAATCCAACTGGGCTTGACATTGTGAAATTCAAACTATGGAGCGAACAAAAC GGAAAATGTGCCTATTCACTCCAACCGATCGAAATCGAGCGGTTGCTCGAACCA GGCTATACAGAAGTCGACCATGTGATTCCATACAGCCGAAGCTTGGACGATAG CTATACCAATAAAGTTCTTGTGTTGACAAAGGAGAACCGTGAAAAAGGAAACCG CACCCCAGCTGAATATTTAGGATTAGGCTCAGAACGTTGGCAACAGTTCGAGAC 
GTTTGTCTTGACAAATAAGCAGTTTTCGAAAAAGAAGCGGGATCGACTCCTTCG GCTTCATTACGATGAAAACGAAGAAAATGAGTTTAAAAATCGTAATCTAAATGAT ACCCGTTATATCTCACGCTTCTTGGCTAACTTTATTCGCGAACATCTCAAATTCG CCGACAGCGATGACAAACAAAAAGTATACACGGTCAACGGCCGTATTACCGCC CATTTACGCAGCCGTTGGAATTTTAACAAAAACCGGGAAGAATCGAATTTGCAT CATGCCGTCGATGCTGCCATCGTCGCCTGCACAACGCCGAGCGATATCGCCCG AGTCACCGCCTTCTATCAACGGCGCGAACAAAACAAAGAACTGTCCAAAAAGAC GGATCCGCAGTTTCCGCAGCCTTGGCCGCACTTTGCTGATGAACTGCAGGCGC GTTTATCAAAAAATCCAAAGGAGAGTATAAAAGCTCTCAATCTTGGAAATTATGA TAACGAGAAACTCGAATCGTTGCAGCCGGTTTTTGTCTCCCGAATGCCGAAGC GGAGCATAACAGGAGCGGCTCATCAAGAAACATTGCGGCGTTATATCGGCATC GACGAACGGAGCGGAAAAATACAGACGGTCGTCAAAAAGAAACTATCCGAGAT CCAACTG G ATAAAACAG GTCATTTCCCAATGTACG G G AAAG AAAGC GATCCAAG GACATATGAAGCCATTCGCCAACGGTTGCTTGAACATAACAATGACCCAAAAAA GGCGTTTCAAGAGCCTCTGTATAAACCGAAGAAGAACGGAGAACTAGGTCCTAT CATCCGAACAATCAAAATCATCGATACGACAAATCAAGTTATTCCGCTCAACGAT GGCAAAACAGTCGCCTACAACAGCAACATCGTGCGGGTCGACGTCTTTGAGAA AGATGGCAAATATTATTGTGTCCCTATCTATACAATAGATATGATGAAAGGGATC TTGCCAAACAAGGCGATCGAGCCGAACAAACCGTACTCTGAGTGGAAGGAAAT GACGGAGGACTATACATTCCGATTCAGTCTATACCCAAATGATCTTATCCGTATC GAATTTCCCCGAGAAAAAACAATAAAGACTGCTGTGGGGGAAGAAATCAAAATT AAGGATCTGTTCGCCTATTATCAAACCATCGACTCCTCCAATGGAGGGTTAAGT TTGGTTAGCCATGATAACAACTTTTCGCTCCGCAGCATCGGTTCAAGAACCCTC AAACGATTCGAGAAATACCAAGTAGATGTGCTAGGCAACATCTACAAAGTGAGA GGGGAAAAGAGAGTTGGGGTGGCGTCATCTTCTCATTCGAAAGCCGGGGAAAC TATCCGTCCGTTATAA
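As a consistency check between SEQ ID NO: 19 and SEQ ID NO: 1, the following Python sketch translates the first thirty and the last twelve nucleotides of the DNA sequence with the standard genetic code. The helper name `translate` and the choice of fragments are illustrative assumptions; only the nucleotides themselves are taken from the listing above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


five_prime_end = "ATGAAGTATAAAATCGGTCTTGATATCGGC"  # first 30 nt of SEQ ID NO: 19
three_prime_end = "CGTCCGTTATAA"                   # last 12 nt of SEQ ID NO: 19

print(translate(five_prime_end))
print(translate(three_prime_end))
```

The two fragments translate to the N-terminal residues MKYKIGLDIG and the C-terminal residues RPL followed by the stop codon, in agreement with SEQ ID NO: 1.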
DETAILED DESCRIPTION
Example 1: Materials and methods employed in subsequent examples
a. Bacterial strains and growth conditions
The moderate thermophile B. smithii ET 138 ΔsigF ΔhsdR (Mougiakos, et al., (2017) ACS Synth. Biol. 6, 849-861) was used for the gene editing and silencing experiments using ThermoCas9. It was grown in LB2 medium (Bosma, et al. Microb. Cell Fact. 14, 99 (2015)) at 55°C. For plates, 30 g of agar (Difco) per liter of medium was used in all experiments. If needed, chloramphenicol was added at a concentration of 7 µg/mL. For protein expression, E. coli Rosetta (DE3) was grown in LB medium in flasks at 37°C in a shaker incubator at 120 rpm until an OD600nm of 0.5 was reached, after which the temperature was switched to 16°C. After 30 min, expression was induced by addition of isopropyl-1-thio-β-D-galactopyranoside (IPTG) to a final concentration of 0.5 mM, after which incubation was continued at 16°C. For cloning PAM constructs for the 6th, 7th and 8th positions, DH5-alpha competent E. coli (NEB) was transformed according to the manual provided by the manufacturer and grown overnight on LB agar plates at 37°C. For cloning the degenerate 7-nt long PAM library, electro-competent DH10B E. coli cells were transformed according to standard procedures (Sambrook, Fritsch & Maniatis, T. Molecular cloning: a laboratory manual. (Cold Spring Harbor Laboratory, 1989)) and grown on LB agar plates at 37°C overnight. E. coli DH5α λpir (Invitrogen) was used for P. putida plasmid construction using the transformation procedure described by Ausubel et al. (Current Protocols in Molecular Biology. (John Wiley & Sons, Inc., 2001). doi:10.1002/0471142727). For all E. coli strains, if required, chloramphenicol was used at a concentration of 25 mg/L and kanamycin at 50 mg/L. Pseudomonas putida KT2440 (DSM 6125) strains were cultured at 37°C in LB medium unless stated otherwise. If required, kanamycin was added at a concentration of 50 mg/L and 3-methylbenzoate at a concentration of 3 mM.
b. ThermoCas9 expression and purification
ThermoCas9 was PCR-amplified from the genome of G. thermodenitrificans T12, then cloned and heterologously expressed in E. coli Rosetta (DE3) and purified using FPLC by a combination of Ni2+-affinity, anion exchange and gel filtration chromatographic steps. The gene sequence was inserted into plasmid pML-1B (obtained from the UC Berkeley MacroLab, Addgene #29653) by ligation-independent cloning using oligonucleotides (Table 2) to generate a protein expression construct encoding the ThermoCas9 polypeptide sequence (residues 1-1082) fused with an N-terminal tag comprising a hexahistidine sequence and a Tobacco Etch Virus (TEV) protease cleavage site. To express the catalytically inactive ThermoCas9 protein (Thermo-dCas9), the D8A and H582A point mutations were inserted using PCR and verified by DNA sequencing.
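The D8A substitution can be illustrated at the codon level. The sketch below, a hypothetical helper rather than the cloning procedure itself, replaces codon 8 (GAT, aspartate) in the first thirty nucleotides of SEQ ID NO: 19 with GCT (alanine), the codon that appears at this position in oligonucleotide BG9091 of Table 2. The function names are assumptions introduced for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA string with the standard genetic code."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def substitute_codon(cds, residue_number, new_codon):
    """Return the coding sequence with one codon replaced (1-based residue number)."""
    start = 3 * (residue_number - 1)
    return cds[:start] + new_codon + cds[start + 3:]


wild_type = "ATGAAGTATAAAATCGGTCTTGATATCGGC"     # first 30 nt of SEQ ID NO: 19
d8a = substitute_codon(wild_type, 8, "GCT")       # GAT (Asp) -> GCT (Ala)
bg9091 = "TGAAGTATAAAATCGGTCTTGCTATCGGCATTACGTCTATC"  # from Table 2

print(translate(wild_type), "->", translate(d8a))
print(bg9091.startswith(d8a[1:]))
```

The mutated fragment, without its first nucleotide, is identical to the 5' end of BG9091, which is consistent with that primer carrying the D8A change.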
Table 2 | Oligonucleotides used in this study.
Oligo | Sequence | Description
BG6494 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACNNNNNNNCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 20] | FW for construction of in vitro target DNA with 7-nt long random PAM sequence
BG6495 | TATGCCGGATCCTCAGACCAAGTTTACTCATATATACTTTAGATTGATTTAAAACTTCATTTTTAATTTAAAAGGATCTAG [SEQ ID NO: 21] | RV for construction of in vitro target DNA sequences
BG7356 | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-T- [SEQ ID NO: 22] | Adaptor; when annealed with BG7357, ligates to A-tailed ThermoCas9 cleaved fragments
BG7357 | CTGTCTCTTATACACATCTGACGCTGCCGACGA [SEQ ID NO: 23] | Adaptor; when annealed with BG7356, ligates to A-tailed ThermoCas9 cleaved fragments
BG7358 | TCGTCGGCAGCGTCAG [SEQ ID NO: 24] | FW sequencing adaptor for PCR amplification of the ThermoCas9 cleaved fragments
(name missing) | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACCATGATTACGCCAAGC [SEQ ID NO: 25] | RV sequencing adaptor for PCR amplification of the ThermoCas9 cleaved fragments
(name missing) | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTCATGAGATTATCAAAAAGGATCTTC [SEQ ID NO: 26] | RV sequencing adaptor for PCR amplification of the control fragments
BG8157 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCCAGCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 27] | FW for construction of in vitro target DNA with PAM "CCCCCCAG"
BG8158 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCCAACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 28] | FW for construction of in vitro target DNA with PAM "CCCCCCAA"
BG8159 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCCATCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 29] | FW for construction of in vitro target DNA with PAM "CCCCCCAT"
BG8160 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 30] | FW for construction of in vitro target DNA with PAM "CCCCCCAC"
BG8161 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACNNNNTNNCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 31] | FW for construction of in vitro target DNA with PAM "NNNNTNN"
BG8363 | ACGGTTATCCACAGAATCAG [SEQ ID NO: 32] | FW for PCR linearization of PAM identification libraries
BG8364 | CGGGATTGACTTTTAAAAAAGG [SEQ ID NO: 33] | RV for PCR linearization of PAM identification libraries
BG8763 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCAAACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 34] | FW for construction of in vitro target DNA with PAM position 6&7 "AA"
BG8764 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCATACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 35] | FW for construction of in vitro target DNA with PAM position 6&7 "AT"
BG8765 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCAGACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 36] | FW for construction of in vitro target DNA with PAM position 6&7 "AG"
BG8766 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCACACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 37] | FW for construction of in vitro target DNA with PAM position 6&7 "AC"
BG8767 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCTAACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 38] | FW for construction of in vitro target DNA with PAM position 6&7 "TA"
BG8768 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCTTACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 39] | FW for construction of in vitro target DNA with PAM position 6&7 "TT"
BG8769 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCTGACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 40] | FW for construction of in vitro target DNA with PAM position 6&7 "TG"
BG8770 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCTCACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 41] | FW for construction of in vitro target DNA with PAM position 6&7 "TC"
BG8771 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCGAACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 42] | FW for construction of in vitro target DNA with PAM position 6&7 "GA"
BG8772 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCGTACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 43] | FW for construction of in vitro target DNA with PAM position 6&7 "GT"
BG8773 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCGGACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 44] | FW for construction of in vitro target DNA with PAM position 6&7 "GG"
BG8774 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCGCACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 45] | FW for construction of in vitro target DNA with PAM position 6&7 "GC"
BG8775 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCCAACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 46] | FW for construction of in vitro target DNA with PAM position 6&7 "CA"
BG8776 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCCTACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 47] | FW for construction of in vitro target DNA with PAM position 6&7 "CT"
BG8777 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCCGACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 48] | FW for construction of in vitro target DNA with PAM position 6&7 "CG"
BG8778 | TATGCCTCATGAGATTATCAAAAAGGATCTTCACCCCCCCCACTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC [SEQ ID NO: 49] | FW for construction of in vitro target DNA with PAM position 6&7 "CC"
BG6574 | AAGCTTGAAATAATACGACTCACTATAGG [SEQ ID NO: 50] | FW for PCR amplification of the sgRNA template for the first PAM identification process (30nt long spacer)
BG6576 | AAAAAAGACCTTGACGTTTTCC [SEQ ID NO: 51] | FW for PCR amplification of the sgRNA template for the first PAM identification process
BG9307 | AAGCTTGAAATAATACGACTCACTATAGGTGAGATTATCAAAAAGGATCTTCACGTC [SEQ ID NO: 52] | RV for PCR amplification of the sgRNA template for all the PAM identification processes except the first one (25nt long spacer)
BG9309 | AAAACGCCTAAGAGTGGGGAATG [SEQ ID NO: 53] | RV for PCR amplification of the 3-hairpins long sgRNA template for all the PAM identification processes except the first one
BG9310 | AAAAGGCGATAGGCGATCC [SEQ ID NO: 54] | RV for PCR amplification of the 2-hairpins long sgRNA template for all the PAM identification processes except the first one
BG9311 | AAAACGGGTCAGTCTGCCTATAG [SEQ ID NO: 55] | RV for PCR amplification of the 1-hairpin long sgRNA template for all the PAM identification processes except the first one
BG9308 | AAGCTTGAAATAATACGACTCACTATAGGTGAGATTATCAAAAAGGATCTTCACGTC [SEQ ID NO: 56] | pT7 and 25nt spacer sgRNA Fw
BG10118 | AAGCTTGAAATAATACGACTCACTATAGGAGATTATCAAAAAGGATCTTCACGTCA [SEQ ID NO: 57] | pT7 and 24nt spacer sgRNA Fw
BG10119 | AAGCTTGAAATAATACGACTCACTATAGGAAGATTATCAAAAAGGATCTTCACGTCATAG [SEQ ID NO: 58] | pT7 and 23nt spacer sgRNA Fw
BG10120 | AAGCTTGAAATAATACGACTCACTATAGGATTATCAAAAAGGATCTTCACGTCATAGT [SEQ ID NO: 59] | pT7 and 22nt spacer sgRNA Fw
BG10121 | AAGCTTGAAATAATACGACTCACTATAGGAATTATCAAAAAGGATCTTCACGTCATAGTT [SEQ ID NO: 60] | pT7 and 21nt spacer sgRNA Fw
BG10122 | AAGCTTGAAATAATACGACTCACTATAGGTTATCAAAAAGGATCTTCACGTCATAGTT [SEQ ID NO: 61] | pT7 and 20nt spacer sgRNA Fw
BG10123 | AAGCTTGAAATAATACGACTCACTATAGGTATCAAAAAGGATCTTCACGTCATAGTTC [SEQ ID NO: 62] | pT7 and 19nt spacer sgRNA Fw
(name missing) | AAGCTTGAAATAATACGACTCACTATAGGATCAAAAAGGATCTTCACGTCATAGTTC [SEQ ID NO: 63] | pT7 and 18nt spacer sgRNA Fw
BG9312 | AAAACGCCTAAGAGTGGGGAATGCCCGAAGAAAGCGGGCGATAGGCGATCC [SEQ ID NO: 64] | 3 loops sgRNA OH Rv
BG8191 | AAGCTTGGCGTAATCATGGTC [SEQ ID NO: 65] | For the construction of the pThermoCas9_ctrl plasmid & pThermoCas9_bsΔpyrF1/2
BG8192 | TCATGAGTTCCCATGTTGTG [SEQ ID NO: 66] | For the construction of the pThermoCas9_ctrl plasmid & pThermoCas9_bsΔpyrF1/2
BG8194 | tatggcgaatcacaacatgggaactcatgaGAACATCCTCTTTCTTAG [SEQ ID NO: 67] | For the construction of the pThermoCas9_ctrl plasmid & pThermoCas9_bsΔpyrF1/2
BG8195 | gccgatatcaagaccgattttatacttcatTTAAGTTACCTCCTCGATTG [SEQ ID NO: 68] | For the construction of the pThermoCas9_ctrl plasmid & pThermoCas9_bsΔpyrF1/2
BG8196 | ATGAAGTATAAAATCGGTCTTG [SEQ ID NO: 69] | For the construction of the pThermoCas9_ctrl plasmid & pThermoCas9_bsΔpyrF1/2
BG8197 | TAACGGACGGATAGTTTC [SEQ ID NO: 70] | For the construction of the pThermoCas9_ctrl plasmid & pThermoCas9_bsΔpyrF1/2
BG8198 | gaaagccggggaaactatccgtccgttataAATCAGACAAAATGGCCTGCTTATG [SEQ ID NO: 71] | For the construction of the pThermoCas9_ctrl plasmid & pThermoCas9_bsΔpyrF1/2
BG8263 | gaactatgacactttattttcagaatggacGTATAACGGTATCCATTTTAAGAATAATCC [SEQ ID NO: 72] | For the construction of the pThermoCas9_ctrl plasmid
BG8268 | accgttatacgtccattctgaaaataaagtGTCATAGTTCCCCTGAGAT [SEQ ID NO: 73] | For the construction of the pThermoCas9_ctrl plasmid
BG8210 | aacagctatgaccatgattacgccaagcttCCCTCCCATGCACAATAG [SEQ ID NO: 74] | For the construction of the pThermoCas9_ctrl plasmid & pThermoCas9_bsΔpyrF1/2
BG8261 | gaactatgacatcatggagttttaaatccaGTATAACGGTATCCATTTTAAGAATAATCC [SEQ ID NO: 75] | For the construction of the pThermoCas9_bsΔpyrF1
BG8266 | accgttatactggatttaaaactccatgatGTCATAGTTCCCCTGAGAT [SEQ ID NO: 76] | For the construction of the pThermoCas9_bsΔpyrF2
BG8317 | gaactatgaccacccagcttacatcaacaaGTATAACGGTATCCATTTTAAGAATAATCC [SEQ ID NO: 77] | For the construction of the pThermoCas9_bsΔpyrF2
BG8320 | accgttatacttgttgatgtaagctgggtgGTCATAGTTCCCCTGAGAT [SEQ ID NO: 78] | For the construction of the pThermoCas9_bsΔpyrF2
BG9075 | CTATCGGCATTACGTCTATC [SEQ ID NO: 79] | For the construction of the pThermoCas9i_ctrl
BG9076 | GCGTCGACTTCTGTATAGC [SEQ ID NO: 80] | For the construction of the pThermoCas9i_ctrl
BG9091 | TGAAGTATAAAATCGGTCTTGCTATCGGCATTACGTCTATC [SEQ ID NO: 81] | For the construction of the pThermoCas9i_ctrl
BG9092 | CAAGCTTCGGCTGTATGGAATCACAGCGTCGACTTCTGTATAGC [SEQ ID NO: 82] | For the construction of the pThermoCas9i_ctrl
BG9077 | GCTGTGATTCCATACAG [SEQ ID NO: 83] | For the construction of the pThermoCas9i_ctrl
BG9267 | GGTGCAGTAGGTTGCAGCTATGCTTGTATAACGGTATCCAT [SEQ ID NO: 84] | For the construction of the pThermoCas9i_ctrl
BG9263 | AAGCATAGCTGCAACCTACTGCACCGTCATAGTTCCCCTGAGATTATCG [SEQ ID NO: 85] | For the construction of the pThermoCas9i_ctrl
BG9088 | TCATGACCAAAATCCCTTAACG [SEQ ID NO: 86] | For the construction of the pThermoCas9i_ctrl
BG9089 | TTAAGGGATTTTGGTCATGAGAACATCCTCTTTCTTAG [SEQ ID NO: 87] | For the construction of the pThermoCas9i_ctrl
BG9090 | GCAAGACCGATTTTATACTTCATTTAAG [SEQ ID NO: 88] | For the construction of the pThermoCas9i_ctrl
BG9548 | GGATCCCATGACGCTAGTATCCAGCTGGGTCATAGTTCCCCTGAGATTATCG [SEQ ID NO: 89] | For the construction of the pThermoCas9i_ldhL
BG9601 | TTCAATATTTTTTTTGAATAAAAAATACGATACAATAAAAATGTCTAGAAAAAGATAAAAATG [SEQ ID NO: 90] | For the construction of the pThermoCas9i_ldhL
BG9600 | TTTTTTATTCAAAAAAAATATTGAATTTTAAAAATGATGGTGCTAGTATGAAG [SEQ ID NO: 91] | For the construction of the pThermoCas9i_ldhL
BG9549 | CCAGCTGGATACTAGCGTCATGGGATCCGTATAACGGTATCCATTTTAAGAATAATCC [SEQ ID NO: 92] | For the construction of the pThermoCas9i_ldhL
BG8552 | TCGGGGGTTCGTTTCCCTTG [SEQ ID NO: 93] | FW to check genomic pyrF deletion KO check
BG8553 | CTTACACAGCCAGTGACGGAAC [SEQ ID NO: 94] | RV to check genomic pyrF deletion KO check
BG2365 | GCCGGCGTCCCGGAAAACGA [SEQ ID NO: 95] | For the construction of the pThermoCas9_ppΔpyrF
BG2366 | GCAGGTCGGGTTCCTCGCATCCATGCCCCCGAACT [SEQ ID NO: 96] | For the construction of the pThermoCas9_ppΔpyrF
BG2367 | ggcttcggaatcgttttccgggacgccggcACGGCATTGGCAAGGCCAAG [SEQ ID NO: 97] | For the construction of the pThermoCas9_ppΔpyrF
BG2368 | gacacaggcatcggtGCAGGGTCTCTTGGCAAGTC [SEQ ID NO: 98] | For the construction of the pThermoCas9_ppΔpyrF
BG2369 | gccaagagaccctgCACCGATGCCTGTGTCGAACC [SEQ ID NO: 99] | For the construction of the pThermoCas9_ppΔpyrF
BG2370 | cttggcggaaaacgtcaaggtcttttttacACGCGCATCAACTTCAAGGC [SEQ ID NO: 100] | For the construction of the pThermoCas9_ppΔpyrF
BG2371 | atgacgagctgttcaccagcagcgcTATTATTGAAGCATTTATCAGGG [SEQ ID NO: 101] | For the construction of the pThermoCas9_ppΔpyrF
BG2372 | GTAAAAAAGACCTTGACGTTTTC [SEQ ID NO: 102] | For the construction of the pThermoCas9_ppΔpyrF
BG2373 | tatgaagcgggccatTTGAAGACGAAAGGGCCTC [SEQ ID NO: 103] | For the construction of the pThermoCas9_ppΔpyrF
BG2374 | taatagcgctgctggtgaacagctcGTCATAGTTCCCCTGAGATTATCG [SEQ ID NO: 104] | For the construction of the pThermoCas9_ppΔpyrF
BG2375 | tggagtcatgaacatATGAAGTATAAAATCGGTCTTG [SEQ ID NO: 105] | For the construction of the pThermoCas9_ppΔpyrF
BG2376 | ccctttcgtcttcAAATGGCCCGCTTCATAAGCAG [SEQ ID NO: 106] | For the construction of the pThermoCas9_ppΔpyrF
BG2377 | gattttatacTTCATATGTTCATGACTCCATTATTATTG [SEQ ID NO: 107] | For the construction of the pThermoCas9_ppΔpyrF
BG2378 | gggggcatggatgCGAGGAACCCGACCTGCATTGG [SEQ ID NO: 108] | For the construction of the pThermoCas9_ppΔpyrF
BG2381 | ACACGGCGGATGCACTTACC [SEQ ID NO: 109] | FW for confirmation of plasmid integration and pyrF deletion in P. putida
BG2382 | TGGACGTGTACTTCGACAAC [SEQ ID NO: 110] | RV for confirmation of pyrF deletion in P. putida
BG2135 | ACACGGCGGATGCACTTACC [SEQ ID NO: 111] | RV for confirmation of plasmid integration in P. putida
BG8196 | TGGACGTGTACTTCGACAAC [SEQ ID NO: 112] | thermocas9 seq. 1
BG8197 | TAACGGACGGATAGTTTC [SEQ ID NO: 113] | thermocas9 seq. 2
BG6850 | GCCTCATGAATGCAGCGATGGTCCGGTGTTC [SEQ ID NO: 114] | pyrF US
BG6849 | GCCTCATGAGTTCCCATGTTGTGATTCC [SEQ ID NO: 115] | pyrF DS
BG6769 | CAATCCAACTGGGCTTGAC [SEQ ID NO: 116] | thermocas9 seq. 3
BG6841 | CAAGAACTTTATTGGTATAG [SEQ ID NO: 117] | thermocas9 seq. 4
BG6840 | TTGCAGAAATGGTTGTCAAG [SEQ ID NO: 118] | thermocas9 seq. 5
BG9215 | GAGATAATGCCGACTGTAC [SEQ ID NO: 119] | pNW33n backbone seq. 1
BG9216 | AGGGCTCGCCTTTGGGAAG [SEQ ID NO: 120] | pNW33n backbone seq. 2
BG9505 | GTTGCCAACGTTCTGAG [SEQ ID NO: 121] | thermocas9 seq. 6
BG9506 | AATCCACGCCGTTTAG [SEQ ID NO: 122] | thermocas9 seq. 7
BG8363 | ACGGTTATCCACAGAATCAG [SEQ ID NO: 123] | FW for PCR linearization of DNA target
BG8364 | CGGGATTGACTTTTAAAAAAGG [SEQ ID NO: 124] | RV for PCR linearization of DNA target
BG9302 | AAACTTCATTTTTAATTTAAAAGGATCTAGAACCCCCCGTGAAGATCCTTTTTGATAATCTCATGACCAAAATCCCTTAACGTGAGTTTTCGTTCCACTGAGCGTCAGACCCCGTAGAAA [SEQ ID NO: 125] | Non-template strand oligonucleotide for ssDNA cleavage assays
BG9303 | TTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACCCCCCCAACTAGATCCTTTTAAATTAAAAATGAAGTTT [SEQ ID NO: 126] | Template strand oligonucleotide for ssDNA cleavage assays
BG9304 | TTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGATTATCAAAAAGGATCTTCACGGGGGGTTCTAGATCCTTTTAAATTAAAAATGAAGTTT [SEQ ID NO: 127] | Template strand oligonucleotide for ssDNA cleavage assays
(name missing) | TACTTCCAATCCAATGCAAAGTATAAA |
Figure imgf000048_0001
BG9665 | ATGACGAAAGGAGTTTCTTATTATG [SEQ ID NO: 130] | RV qPCR check ldhL
BG9666 | AACGGTATTCCGTGATTAAG [SEQ ID NO: 131] | FW qPCR check ldhL
Restriction sites are shown in italics. The PAMs are colored red. Spacer regions are shown in bold. Nucleotides in lowercase letters correspond to primer overhangs for HiFi DNA Assembly. LIC: Ligase Independent cloning; FW: Forward primer; RV: Reverse primer.
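The sixteen forward oligonucleotides BG8763 to BG8778 of Table 2 share one design: a constant 5' part containing the BspHI site and protospacer, a 5'-CCCCCNNA-3' PAM in which only positions 6 and 7 vary, and a constant 3' part. The following Python sketch regenerates that set from the shared parts. The constants are copied from BG8763; the function and variable names are hypothetical and serve only to illustrate the layout.

```python
from itertools import product

# Constant parts shared by BG8763 to BG8778 (copied from Table 2).
PREFIX = "TATGCCTCATGAGATTATCAAAAAGGATCTTCAC"
SUFFIX = "CTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATC"


def pam_oligo(pam):
    """Assemble a forward target oligonucleotide around the given PAM."""
    return PREFIX + pam + SUFFIX


# Vary positions 6 and 7 of the 5'-CCCCCNNA-3' PAM over all nucleotides.
variants = {
    n6 + n7: pam_oligo("CCCCC" + n6 + n7 + "A")
    for n6, n7 in product("ACGT", repeat=2)
}

print(len(variants))
print(variants["AA"])
```

The entry for "AA" reproduces the BG8763 sequence (SEQ ID NO: 34), and the remaining fifteen entries differ from it only at PAM positions 6 and 7.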
The proteins were expressed in E. coli Rosetta 2 (DE3) strain. Cultures were grown to an OD600nm of 0.5-0.6. Expression was induced by the addition of IPTG to a final concentration of 0.5 mM and incubation was continued at 16°C overnight. Cells were harvested by centrifugation and the cell pellet was resuspended in 20 mL of Lysis Buffer (50 mM sodium phosphate pH 8, 500 mM NaCl, 1 mM DTT, 10 mM imidazole) supplemented with protease inhibitors (Roche complete, EDTA-free) and lysozyme. Once homogenized, cells were lysed by sonication (Sonoplus, Bandelin) using an ultrasonic MS72 microtip probe (Bandelin) for 5-8 minutes, with cycles of 2 s pulse and 2.5 s pause at 30% amplitude, and then centrifuged at 16,000xg for 1 hour at 4°C to remove insoluble material. The clarified lysate was filtered through 0.22 micron filters (Mdi membrane technologies) and applied to a nickel column (Histrap HP, GE Lifesciences), washed and then eluted with 250 mM imidazole. Fractions containing ThermoCas9 were pooled and dialyzed overnight into the dialysis buffer (250 mM KCl, 20 mM HEPES/KOH, and 1 mM DTT, pH 7.5). After dialysis, the sample was diluted 1:1 in 10 mM HEPES/KOH pH 8, and loaded on a heparin FF column pre-equilibrated in IEX-A buffer (150 mM KCl, 20 mM HEPES/KOH pH 8). The column was washed with IEX-A and then eluted with a gradient of IEX-C (2 M KCl, 20 mM HEPES/KOH pH 8). The sample was concentrated to 700 µL prior to loading on a gel filtration column (HiLoad 16/600 Superdex 200) via FPLC (AKTA Pure). Fractions from gel filtration were analysed by SDS-PAGE; fractions containing ThermoCas9 were pooled and concentrated to 200 µL (50 mM sodium phosphate pH 8, 2 mM DTT, 5% glycerol, 500 mM NaCl) and either used directly for biochemical assays or frozen at -80°C for storage.
c. In vitro synthesis of sgRNA
The sgRNA module was designed by fusing the predicted crRNA and tracrRNA sequences with a 5'-GAAA-3' linker. The sgRNA-expressing DNA sequence was put under the transcriptional control of the T7 promoter. It was synthesized (Baseclear, Leiden, The Netherlands) and provided in the pUC57 backbone. All sgRNAs used in the biochemical reactions were synthesized using the HiScribe™ T7 High Yield RNA Synthesis Kit (NEB). PCR fragments coding for sgRNAs, with the T7 sequence on the 5' end, were utilized as templates for the in vitro transcription reaction. T7 transcription was performed for 4 hours. The sgRNAs were run on and excised from urea-PAGE gels and purified using ethanol precipitation.
d. In vitro cleavage assay
In vitro cleavage assays were performed with purified recombinant ThermoCas9. ThermoCas9 protein, the in vitro transcribed sgRNA and the DNA substrates (generated by PCR amplification using primers described in Table 2) were incubated separately (unless otherwise indicated) at the stated temperature for 10 min, followed by combining the components and incubating them at the various assay temperatures in a cleavage buffer (100 mM sodium phosphate buffer (pH 7), 500 mM NaCl, 25 mM MgCl2, 25% (v/v) glycerol, 5 mM dithiothreitol (DTT)) for 1 hour. Each cleavage reaction contained 160 nM of ThermoCas9 protein, 4 nM of substrate DNA, and 150 nM of synthesized sgRNA. Reactions were stopped by adding 6x loading dye (NEB) and run on 1.5% agarose gels. Gels were stained with SYBR safe DNA stain (Life Technologies) and imaged with a Gel Doc™ EZ gel imaging system (Bio-rad).
e. Library construction for in vitro PAM screen
For the construction of the PAM library, a 122-bp long DNA fragment, containing the protospacer and a 7-bp long degenerate sequence at its 3'-end, was constructed by primer annealing and Klenow fragment (exo-) (NEB) based extension. The PAM-library fragment and the pNW33n vector were digested by BspHI and BamHI (NEB) and then ligated (T4 ligase, NEB). The ligation mixture was transformed into electro-competent E. coli DH10B cells and plasmids were isolated from liquid cultures. For the 7nt-long PAM determination process, the plasmid library was linearized by SapI (NEB) and used as the target. For the rest of the assays the DNA substrates were linearized by PCR amplification.
f. PAM screening assay
The PAM screening of ThermoCas9 was performed using in vitro cleavage assays, which consisted of (per reaction): 160 nM of ThermoCas9, 150 nM in vitro transcribed sgRNA, 4 nM of DNA target, 4 µL of cleavage buffer (100 mM sodium phosphate buffer pH 7.5, 500 mM NaCl, 5 mM DTT, 25% glycerol) and MQ water up to 20 µL final reaction volume. The PAM-containing cleavage fragments from the 55°C reactions were gel purified, ligated with Illumina sequencing adaptors and sent for Illumina HiSeq 2500 sequencing (Baseclear). An equimolar amount of PAM library not treated with ThermoCas9 was subjected to the same process and sent for Illumina HiSeq 2500 sequencing as a reference. HiSeq reads with perfect sequence match to the reference sequence were selected for further analysis. From the selected reads, those present more than 1000 times in the ThermoCas9 treated library and at least 10 times more in the ThermoCas9 treated library compared to the control library were employed for WebLogo analysis (Crooks et al., Genome Res. 14, 1188-1190 (2004)).
g. Editing and silencing constructs for B. smithii and P. putida
All the primers and plasmids used for plasmid construction were designed with appropriate overhangs for performing NEBuilder HiFi DNA assembly (NEB), and they are listed in Tables 2 and 3 respectively. The fragments for assembling the plasmids were obtained through PCR with Q5 Polymerase (NEB) or Phusion Flash High-Fidelity PCR Master Mix (ThermoFisher Scientific); the PCR products were subjected to 1% agarose gel electrophoresis and purified using the Zymoclean Gel DNA Recovery Kit (Zymo Research). The assembled plasmids were transformed to chemically competent E. coli DH5α cells (NEB), or to E. coli DH5α λpir (Invitrogen) in the case of P. putida constructs, the latter to facilitate direct vector integration.
Single colonies were inoculated in LB medium, plasmid material was isolated using the GeneJet plasmid miniprep kit (ThermoFisher Scientific) and sequence verified (GATC-biotech), and 1 µg of each construct was transformed into B. smithii ET 138 electro-competent cells, which were prepared according to a previously described protocol (Bosma, et al. Microb. Cell Fact. 14, 99 (2015)). The MasterPure™ Gram Positive DNA Purification Kit (Epicentre) was used for genomic DNA isolation from B. smithii and P. putida liquid cultures.
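The read-selection criteria of the PAM screening assay (section f) can be sketched as a simple filter: a PAM is kept when it occurs more than 1000 times in the ThermoCas9 treated library and at least 10 times more often there than in the control library. The Python below is a minimal illustration under the assumption that raw read counts are compared directly, without depth normalization; the function name and the toy counts are hypothetical and are not results from this application.

```python
def select_pams(treated_counts, control_counts, min_reads=1000, min_fold=10):
    """Return PAMs enriched in the treated library according to both thresholds."""
    selected = []
    for pam, treated in treated_counts.items():
        control = control_counts.get(pam, 0)
        # Criterion 1: present more than `min_reads` times in the treated library.
        # Criterion 2: at least `min_fold` times more reads than in the control.
        if treated > min_reads and treated >= min_fold * control:
            selected.append(pam)
    return sorted(selected)


# Hypothetical read counts for three 7-nt PAMs (illustration only).
treated = {"CCCCCAA": 5000, "GGGGGGG": 1500, "ATATATA": 900}
control = {"CCCCCAA": 100, "GGGGGGG": 400, "ATATATA": 10}

print(select_pams(treated, control))
```

In this toy example only the first PAM passes both thresholds; the second fails the enrichment criterion and the third fails the minimum read count.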
For the construction of the pThermoCas9_ctrl, pThermoCas9_bsΔpyrF1 and pThermoCas9_bsΔpyrF2 vectors, the pNW33n backbone together with the ΔpyrF homologous recombination flanks were PCR amplified from the pWUR_Cas9sp1_hr vector (Mougiakos, et al. ACS Synth. Biol. 6, 849-861 (2017)) (BG8191 and BG8192). The native PxylA promoter was PCR amplified from the genome of B. smithii ET 138 (BG8194 and BG8195). The thermocas9 gene was PCR amplified from the genome of G. thermodenitrificans T12 (BG8196 and BG8197). The Ppta promoter was PCR amplified from the pWUR_Cas9sp1_hr vector (Mougiakos, et al. ACS Synth. Biol. 6, 849-861 (2017)) (BG8198 and BG8261_2/BG8263_nc2/BG8317_3). The spacers followed by the sgRNA scaffold were PCR amplified from the pUC57_T7t12sgRNA vector (BG8266_2/BG8268_nc2/BG8320_3 and BG8210). A four-fragment assembly was designed and executed for the construction of the pThermoCas9i_ldhL vectors. Initially, targeted point mutations were introduced to the codons of the thermocas9 catalytic residues (mutations D8A and H582A), through a two-step PCR approach using pThermoCas9_ctrl as template. During the first PCR step (BG9075, BG9076), the desired mutations were introduced at the ends of the produced PCR fragment and during the second step (BG9091, BG9092) the produced fragment was employed as PCR template for the introduction of appropriate assembly overhangs. The part of the thermocas9 gene downstream of the second mutation along with the ldhL silencing spacer was PCR amplified using pThermoCas9_ctrl as template (BG9077 and BG9267). The sgRNA scaffold together with the pNW33n backbone was PCR amplified using pThermoCas9_ctrl as template (BG9263 and BG9088). The promoter together with the part of the thermocas9 gene upstream of the first mutation was PCR amplified using pThermoCas9_ctrl as template (BG9089, BG9090).
A two-fragment assembly was designed and executed for the construction of the pThermoCas9i_ctrl vector. The spacer sequence in the pThermoCas9i_ldhL vector was replaced with a random sequence containing BaeI restriction sites at both ends. The sgRNA scaffold together with the pNW33n backbone was PCR amplified using pThermoCas9_ctrl as template (BG9548, BG9601). The other half of the construct, consisting of Thermo-dCas9 and the promoter, was amplified using pThermoCas9i_ldhL as template (BG9600, BG9549). A five-fragment assembly was designed and executed for the construction of the P. putida KT2440 vector pThermoCas9_ppΔpyrF. The replicon from the suicide vector pEMG was PCR amplified (BG2365, BG2366). The flanking regions of pyrF were amplified from KT2440 genomic DNA (BG2367, BG2368 for the 576-bp upstream flank, and BG2369, BG2370 for the 540-bp downstream flank). The flanks were fused in an overlap extension PCR using primers BG2367 and BG2370, making use of the overlaps of primers BG2368 and BG2369. The sgRNA was amplified from the pThermoCas9_ctrl plasmid (BG2371, BG2372). The constitutive P3 promoter was amplified from pSW_I-SceI (BG2373, BG2374). This promoter fragment was fused to the sgRNA fragment in an overlap extension PCR using primers BG2372 and BG2373, making use of the overlaps of primers BG2371 and BG2374. ThermoCas9 was amplified from the pThermoCas9_ctrl plasmid (BG2375, BG2376). The inducible Pm-XylS system, to be used for 3-methylbenzoate induction of ThermoCas9, was amplified from pSW_I-SceI (BG2377, BG2378).
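The overlap used to fuse the two pyrF flanks can be checked directly from the primer sequences in Table 2. The upstream flank ends with the reverse complement of BG2368, and the downstream flank begins with BG2369, so the fusion relies on the shared sequence between these two. The Python sketch below computes the length of that shared end; the helper names are hypothetical and the code is an illustration, not part of the described protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def end_overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


# Primer sequences copied from Table 2 (lowercase overhangs upper-cased).
bg2368 = "gacacaggcatcggtGCAGGGTCTCTTGGCAAGTC".upper()  # RV primer, upstream flank
bg2369 = "gccaagagaccctgCACCGATGCCTGTGTCGAACC".upper()  # FW primer, downstream flank

upstream_end = reverse_complement(bg2368)  # 3' end of the upstream flank, top strand
overlap = end_overlap(upstream_end, bg2369)

print(overlap)
print(bg2369[:overlap])
```

For these two primers the shared end is 30 nucleotides long, which is the region through which the two flank fragments can anneal during the overlap extension PCR.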
Table 3. Plasmids used in this study
Plasmid | Description | Restriction sites used | Primers | Source
pNW33n | E. coli-Bacillus shuttle vector, cloning vector, CamR | | | BGSC
pUC57_T7sgRNAfull | pUC57 vector containing DNA encoding the sgRNA under the control of T7 promoter; serves as a template for in vitro transcription of full length Repeat/Antirepeat sgRNAs | | | Baseclear
pMA2_T7sgRNAtruncated R/AR | Vector containing DNA encoding the truncated Repeat/Antirepeat part of the sgRNA under the control of T7 promoter; serves as a template for in vitro transcription of truncated Repeat/Antirepeat sgRNAs | - | - | Gen9
pRARE | T7 RNA polymerase based expression vector, KanR | - | - | EMD Millipore
pML-1B | E. coli Rosetta™ (DE3) plasmid, encodes rare tRNAs, CamR | - | - | Macrolab, Addgene
pEMG | P. putida suicide vector, used as template for replicon and KanR | | See Table 2 | 1
pSWJ-Scel | P. putida vector containing I-SceI, used as template for xylS and Ppm | | See Table 2 | 1
pWUR_Cas9sp1_hr | pNW33n with spCas9-module containing spacer targeting the pyrF gene. This plasmid was used as a template for constructing the ThermoCas9 based constructs | - | - | 2
pThermo_Cas9 | thermocas9 with N-term. His-tag and TEV cleavage site in pML-1B. Expression vector for ThermoCas9 | SspI and Ligase Independent Cloning | BG7886 and BG7887 | This study
pThermo_dCas9 | dthermocas9 with N-term. His-tag and TEV cleavage site in pML-1B. Expression vector for catalytically inactive (dead) dThermoCas9 | SspI and Ligase Independent Cloning | BG7886 and BG7888 | This study
pNW-PAM7nt | Target sequence in pNW33n vector containing a 7-nt degenerate PAM for in vitro PAM determination assay | BamHI and BspHI | See Table 2 | This study
pNW63-pNW78 | Target sequence in pNW33n vector containing distinct nucleotides at the 6th and 7th positions of the PAM (CCCCCNNA) | BamHI and BspHI | See Table 2 | This study
pThermoCas9_ctrl | pNW33n with ThermoCas9-module¹ containing a non-targeting spacer. Used as a negative control | | See Table 2 | This study
pThermoCas9_bsΔpyrF1 | pNW33n with ThermoCas9-module¹ containing spacer 1 targeting the pyrF gene and the fused us+ds pyrF-flanks | | See Table 2 | This study
pThermoCas9_bsΔpyrF2 | pNW33n with ThermoCas9-module¹ containing spacer 2 targeting the pyrF gene and the fused us+ds pyrF-flanks | | See Table 2 | This study
pThermoCas9i_ctrl | pNW33n with Thermo-dCas9-module² containing a non-targeting spacer. Used as a wild-type control | | See Table 2 | This study
pThermoCas9i_ldhL | pNW33n with Thermo-dCas9-module² containing spacer 2 targeting the ldhL gene | | See Table 2 | This study
pThermoCas9_ppΔpyrF | pEMG with ThermoCas9-module³ for Pseudomonas putida containing a spacer targeting the pyrF gene and the fused us+ds pyrF-flanks | | See Table 2 | This study
¹ The ThermoCas9 module contains thermocas9 under the native PxylL promoter followed by the sgRNA under the B. coagulans Ppta promoter (Figure 1).
² Like the ThermoCas9 module, but with the thermo-dCas9 instead of thermocas9 (Figure 1).
³ The ThermoCas9 module for Pseudomonas putida contains thermocas9 under the transcriptional control of the inducible Pm-XylS system followed by the sgRNA under the constitutive P3 promoter.
h. Editing protocol for P. putida
Transformation of the plasmid to P. putida was performed according to Choi et al. (Choi et al., J. Microbiol. Methods 64, 391-397 (2006)). After transformation and selection of integrants, overnight cultures were inoculated. 10 μL of overnight culture was used for inoculation of 3 mL fresh selective medium and after 2 hours of growth at 37°C ThermoCas9 was induced with 3-methylbenzoate. After an additional 6 h, dilutions of the culture were plated on non-selective medium supplemented with 3-methylbenzoate. For the control culture the addition of 3-methylbenzoate was omitted in all the steps. Confirmation of plasmid integration in the P. putida chromosome was done by colony PCR with primers BG2381 and BG2135. Confirmation of pyrF deletion was done by colony PCR with primers BG2381 and BG2382.
i. RNA isolation
RNA isolation was performed by phenol extraction based on a previously described protocol (van Hijum et al. BMC Genomics 6, 77 (2005)). Overnight 10 mL cultures were centrifuged at 4°C and 4816 x g for 15 min and immediately used for RNA isolation. After removal of the medium, cells were suspended in 0.5 mL of ice-cold TE buffer (pH 8.0) and kept on ice. All samples were divided into two 2 mL screw-capped tubes containing 0.5 g of zirconium beads, 30 μL of 10% SDS, 30 μL of 3 M sodium acetate (pH 5.2), and 500 μL of Roti-Phenol (pH 4.5-5.0, Carl Roth GmbH). Cells were disrupted using a FastPrep-24 apparatus (MP Biomedicals) at 5500 rpm for 45 s and centrifuged at 4°C and 10 000 rpm for 5 min. 400 μL of the water phase from each tube was transferred to a new tube, to which 400 μL of chloroform-isoamyl alcohol (Carl Roth GmbH) was added, after which samples were centrifuged at 4°C and 18 400 x g for 3 min. 300 μL of the aqueous phase was transferred to a new tube and mixed with 300 μL of the lysis buffer from the High Pure RNA Isolation Kit (Roche). Subsequently, the rest of the procedure from this kit was performed according to the manufacturer's protocol, except for the DNase incubation step, which was performed for 45 min. Integrity and concentration of the isolated RNA were checked on a NanoDrop 1000.
j. Quantification of mRNA by RT-qPCR
First-strand cDNA synthesis was performed for the isolated RNA using SuperScript™ III Reverse Transcriptase (Invitrogen) according to the manufacturer's protocol. qPCR was performed using the PerfeCTa SYBR Green Supermix for iQ from Quanta Biosciences. 40 ng of each cDNA library was used as the template for qPCR. Two sets of primers were used: BG9665:BG9666, amplifying a 150-nt long region of the ldhL gene, and BG9889:BG9890, amplifying a 150-nt long sequence of the rpoD (RNA polymerase sigma factor) gene, which was used as the control for the qPCR. The qPCR was run on a Bio-Rad C1000 Thermal Cycler.
k. HPLC
A high-pressure liquid chromatography (HPLC) system ICS-5000 was used for lactate quantification. The system was operated with an Aminex HPX 87H column from Bio-Rad Laboratories and equipped with a UV1000 detector operating at 210 nm and a RI-150 40°C refractive index detector. The mobile phase consisted of 0.16 N H2SO4 and the column was operated at 0.8 mL/min. All samples were diluted 4:1 with 10 mM DMSO in 0.01 N H2SO4.
Example 2: ThermoCas9 PAM determination
The first step towards the characterization of ThermoCas9 was the in silico prediction of its PAM preferences for successful cleavage of a DNA target. 10 spacers of the G. thermodenitrificans T12 CRISPR locus were used to search for potential protospacers in viral and plasmid sequences using CRISPRtarget (Biswas et al. RNA Biol. 10, 817-827 (2013)). As only two hits were obtained with phage genomes (Figure 4A), an in vitro PAM determination approach was used. The predicted sgRNA sequence, which contained a spacer for ThermoCas9-based targeting of linear dsDNA substrates with a matching protospacer, was transcribed in vitro. The protospacer was flanked at its 3'-end by randomized 7-base pair (bp) sequences. After performing ThermoCas9-based cleavage assays at 55°C, the cleaved members of the library (together with a non-targeted library sample as control) were deep-sequenced and compared in order to identify the ThermoCas9 PAM preference (Figure 2A). The sequencing results revealed that ThermoCas9 introduces double stranded DNA breaks that, in analogy to mesophilic Cas9 variants, are located mostly between the 3rd and the 4th PAM proximal nucleotides. Moreover, the cleaved sequences revealed that ThermoCas9 recognizes a 5'-NNNNCNR-3' [SEQ ID NO: 132] PAM, with subtle preference for cytosine at the 1st, 3rd, 4th and 6th PAM positions (Figure 2B). Recent studies have revealed the importance of the 8th PAM position for target recognition of certain Type IIC Cas9 orthologues (Karvelis et al. Genome Biol. 16, 253 (2015); Kim et al. Genome Res. 24, 1012-9 (2014)). For this purpose, and taking into account the results from the in silico ThermoCas9 PAM prediction, additional PAM determination assays were performed. This revealed optimal targeting efficiency in the presence of an adenine at the 8th PAM position (Figure 2C).
Interestingly, despite the limited number of hits, the aforementioned in silico PAM prediction (Figure 4B) also suggested the significance of a cytosine at the 5th and an adenine at the 8th PAM positions.
To further clarify the ambiguity of the PAM at the 6th and 7th PAM positions, a set of 16 different target DNA fragments was generated, in which the matching protospacer was flanked by 5'-CCCCCNNA-3' PAMs. Cleavage assays of these fragments (each with a unique combination of the 6th and 7th nucleotide) were performed in which the different components (ThermoCas9, sgRNA guide, dsDNA target) were pre-heated separately at different temperatures (20, 30, 37, 45, 55 and 60°C) for 10 min before combining and incubating them for 1 hour at the corresponding assay temperature. When the assays were performed at temperatures between 37°C and 60°C, all the different DNA substrates were cleaved (Figure 2D, Figure 5). However, the most digested target fragments consisted of PAM sequences (5th to 8th PAM positions) 5'-CNAA-3' and 5'-CMCA-3', whereas the least digested targets contained a 5'-CAKA-3' PAM. At 30°C, only cleavage of the DNA substrates with the optimal PAM sequences (5th to 8th PAM positions) 5'-CNAA-3' and 5'-CMCA-3' was observed. Lastly, at 20°C only the DNA substrates with (5th to 8th PAM positions) 5'-CVAA-3' and 5'-CCCA-3' PAM sequences were targeted (Figure 5), making these sequences the most preferred PAMs. These findings demonstrate that at its lower temperature limit, ThermoCas9 only cleaves fragments with a preferred PAM.
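The degenerate PAM notation used above follows the standard IUPAC nucleotide codes (N = any base, R = A/G, M = A/C, K = G/T, V = A/C/G). As an illustrative sketch only, and not part of the described method, the following Python snippet expands the 5'-CCCCCNNA-3' pattern into its 16 concrete targets and groups them by the patterns reported for the 5th to 8th PAM positions. The function names (`expand`, `matches_pam`, `classify`) and the "intermediate" label are assumptions introduced for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pam(seq, pattern):
    """Return True if a concrete DNA sequence fits a degenerate IUPAC pattern."""
    if len(seq) != len(pattern):
        return False
    return all(b in IUPAC[c] for b, c in zip(seq.upper(), pattern.upper()))


def expand(pattern):
    """List every concrete sequence encoded by a degenerate pattern."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in pattern.upper()))]


def classify(pam8):
    """Group an 8-nt PAM by its 5th to 8th positions, using the patterns in the text.

    The label 'intermediate' is a hypothetical name for the remaining targets.
    """
    tail = pam8[4:8]
    if matches_pam(tail, "CNAA") or matches_pam(tail, "CMCA"):
        return "most digested"
    if matches_pam(tail, "CAKA"):
        return "least digested"
    return "intermediate"


# The 16 targets of the form 5'-CCCCCNNA-3' (6th and 7th positions varied).
library = expand("CCCCCNNA")

# Targets still cleaved at 20 degrees C according to the text: CVAA and CCCA.
cleaved_at_20 = [p for p in library
                 if matches_pam(p[4:8], "CVAA") or matches_pam(p[4:8], "CCCA")]

if __name__ == "__main__":
    for pam in library:
        print(pam, classify(pam))
```

The same `matches_pam` helper could be used to scan a candidate target site for a PAM such as 5'-NNNNCNNA-3'; the grouping here only restates the qualitative categories given in the text and carries no quantitative cleavage data.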
Example 3: Thermostability and truncations
The predicted tracrRNA consists of the anti-repeat region followed by three hairpin structures (Figure 3A). Using the tracrRNA along with the crRNA to form a sgRNA chimera resulted in successful guided cleavage of the DNA substrate. It was observed that a 41-nt long deletion of the spacer distal end of the full-length repeat-anti-repeat hairpin (Figure 3A), most likely better resembling the dual guide's native state, had little to no effect on the DNA cleavage efficiency. The effect of further truncation of the predicted hairpins (Figure 17A) on the cleavage efficiency of ThermoCas9 was evaluated by performing a cleavage time-series in which all the components (sgRNA, ThermoCas9, substrate DNA) were pre-heated separately at different temperatures (37-65°C) for 1, 2 and 5 min before combining and incubating them for 1 hour at various assay temperatures (37-65°C). The number of predicted stem-loops of the tracrRNA scaffold seemed to play a crucial role in DNA cleavage; when all three loops were present, the cleavage efficiency was the highest at all tested temperatures, whereas the efficiency decreased upon removal of the 3' hairpin (Figure 3B). Moreover, the cleavage efficiency drastically dropped upon removal of both the middle and the 3' hairpins (Figure 6). Whereas pre-heating ThermoCas9 at 65°C for 1 or 2 min resulted in detectable cleavage, the cleavage activity was abolished after 5 min incubation. The thermostability assay showed that sgRNA variants without the 3' stem-loop result in decreased stability of the ThermoCas9 protein at 65°C, indicating that a full length tracrRNA is required for optimal ThermoCas9-based DNA cleavage at elevated temperatures. Additionally, we also varied the lengths of the spacer sequence (from 25 to 18 nt) and found that spacer lengths of 23, 21, 20 and 19 nt cleaved the targets with the highest efficiency. The cleavage efficiency drops significantly when a spacer of 18 nt is used.
In vivo, the ThermoCas9:sgRNA RNP complex is probably formed within minutes. The activity and thermostability of the RNP was evaluated. Pre-assembled RNP complex was heated at 60, 65 and 70°C for 5 and 10 min before adding pre-heated DNA and subsequent incubation for 1 hour at 60, 65 and 70°C. Strikingly, the ThermoCas9 RNP was active up to 70°C, in spite of its pre-heating for 5 min at 70°C (Figure 3C). This finding confirms the assumption that the ThermoCas9 stability strongly correlates with the association of an appropriate sgRNA guide.
Example 4: Comparison of SpCas9 and ThermoCas9 activities with temperature
The ThermoCas9 temperature range was compared with that of the Streptococcus pyogenes Cas9 (SpCas9). Both Cas9 homologues were subjected to in vitro activity assays between 20 and 65°C. Both proteins were incubated for 5 min at the corresponding assay temperature prior to the addition of the sgRNA and the target DNA molecules. The mesophilic SpCas9 was active only between 25 and 44°C (Figure 3D); above this temperature range SpCas9 activity rapidly decreased to undetectable levels. In contrast, ThermoCas9 cleavage activity could be detected between 25 and 65°C (Figure 3D). This indicates the potential to use ThermoCas9 as a genome editing tool for both thermophilic and mesophilic organisms.
Previously characterized, mesophilic Cas9 endonucleases employ divalent cations to catalyze the generation of DSBs in target DNA (Jinek et al. Science 337, 816-821 (2012); Chen et al. J. Biol. Chem. 289, 13284-13294 (2014)). To evaluate which cations contribute to DNA cleavage by ThermoCas9, plasmid cleavage assays were performed in the presence of one of the following divalent cations: Mg2+, Ca2+, Mn2+, Co2+, Ni2+, and Cu2+; an assay with the cation-chelating agent EDTA was included as negative control. As expected, target dsDNA was cleaved in the presence of divalent cations and remained intact in the presence of EDTA (Figure 7A). The activity of ThermoCas9 was then tested on ssDNA substrates. However, no cleavage was observed, indicating that ThermoCas9 is a dsDNA nuclease (Figure 7B).
Example 5: ThermoCas9-based gene deletion in the thermophile B. smithii
Bacillus smithii ET 138 was cultured at 55°C. In order to use a minimum of genetic parts, a single plasmid approach was taken. A set of pNW33n-based pThermoCas9 plasmids were constructed containing the thermocas9 gene under the control of the native xylL promoter (PxylL), a homologous recombination template for repairing Cas9-induced double stranded DNA breaks within a gene of interest, and a sgRNA expressing module under control of the constitutive pta promoter (Ppta) from Bacillus coagulans.
First, the full length pyrF gene was deleted from the genome of B. smithii ET 138. The pNW33n-derived plasmids pThermoCas9_bsΔpyrF1 and pThermoCas9_bsΔpyrF2 were used for expression of different ThermoCas9 guides with spacers targeting different sites of the pyrF gene, while a third plasmid (pThermoCas9_ctrl) contained a random non-targeting spacer in the sgRNA expressing module. Transformation of B. smithii ET 138 competent cells at 55°C with the control plasmids pNW33n (no guide) and pThermoCas9_ctrl resulted in the formation of ~200 colonies each. Out of 10 screened pThermoCas9_ctrl colonies, none contained the ΔpyrF genotype, confirming findings from previous studies that homologous recombination in B. smithii ET 138 is not sufficient to obtain clean mutants (Mougiakos et al. ACS Synth. Biol. 6, 849-861 (2017); Bosma et al. Microb. Cell Fact. 14, 99 (2015)). In contrast, transformation with the pThermoCas9_bsΔpyrF1 and pThermoCas9_bsΔpyrF2 plasmids resulted in 20 and 0 colonies respectively, confirming the in vivo activity of ThermoCas9 at 55°C and verifying the above described broad in vitro temperature range of the protein. Out of ten pThermoCas9_bsΔpyrF1 colonies screened, one was a clean ΔpyrF mutant whereas the rest had a mixed wild type/ΔpyrF genotype, proving the applicability of the system, as the designed homology directed repair of the targeted pyrF gene was successful. Nonetheless, in the tightly controlled SpCas9-based counter-selection system we previously developed, the pyrF deletion efficiency was higher (Olson et al., Curr. Opin. Biotechnol. 33, 130-141 (2015)). The low number of obtained transformants and clean mutants in the ThermoCas9-based tool can be explained by the low homologous recombination efficiency in B. smithii (Olson et al., Curr. Opin. Biotechnol. 33, 130-141 (2015)) combined with the constitutive expression of highly active ThermoCas9. It is anticipated that the use of a tightly controllable promoter will increase efficiencies.
Example 6: ThermoCas9-based gene deletion in the mesophile Pseudomonas putida
The ThermoCas9-based genome editing tool was evaluated in the mesophilic Gram-negative bacterium P. putida KT2440 by combining homologous recombination and ThermoCas9-based counter-selection. For this organism, a Cas9-based tool has not been reported to date. A single plasmid approach was used. The pEMG-based pThermoCas9_ppΔpyrF plasmid was constructed containing the thermocas9 gene under the control of the 3-methylbenzoate-inducible Pm-promoter, a homologous recombination template for deletion of the pyrF gene and a sgRNA expressing module under the control of the constitutive P3 promoter. After transformation of P. putida KT2440 cells and PCR confirmation of plasmid integration, a colony was inoculated in selective liquid medium for overnight culturing at 37°C. The overnight culture was used for inoculation of selective medium and ThermoCas9 expression was induced with 3-methylbenzoate. Subsequently, dilutions were plated on non-selective medium, supplemented with 3-methylbenzoate. For comparison, a parallel experiment without inducing ThermoCas9 expression with 3-methylbenzoate was performed. The process resulted in 76 colonies for the induced culture and 52 colonies for the non-induced control culture. For the induced culture, 38 colonies (50%) had a clean deletion genotype and 6 colonies had mixed wild-type/deletion genotype. On the contrary, only 1 colony (2%) of the non-induced culture had the deletion genotype and there were no colonies with mixed wild-type/deletion genotype retrieved (Figure 8). These results show that ThermoCas9 can be used as an efficient counter-selection tool in the mesophile P. putida KT2440 when grown at 37°C.
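The percentages quoted above follow directly from the colony counts. As a small worked check, the sketch below recomputes them from the numbers given in the text; the helper name `percent` and the variable names are illustrative only.

```python
def percent(part, total):
    """Return part/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Colony counts as reported in the text.
induced_total, induced_clean, induced_mixed = 76, 38, 6
control_total, control_clean = 52, 1

induced_clean_pct = percent(induced_clean, induced_total)   # clean deletions, induced
induced_mixed_pct = percent(induced_mixed, induced_total)   # mixed genotype, induced
control_clean_pct = percent(control_clean, control_total)   # clean deletions, non-induced

if __name__ == "__main__":
    print(f"induced, clean deletion: {induced_clean_pct:.1f}%")
    print(f"induced, mixed genotype: {induced_mixed_pct:.1f}%")
    print(f"non-induced, clean deletion: {control_clean_pct:.1f}%")
```

The induced culture gives exactly 50% clean deletions (38 of 76), and the non-induced control gives about 1.9%, which the text rounds to 2%.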
Example 7: ThermoCas9-based gene silencing
An efficient thermoactive transcriptional silencing CRISPRi tool is currently not available. Such a system could be useful in a number of applications. For example, such a system would greatly facilitate metabolic studies of thermophiles. A catalytically dead variant of ThermoCas9 could serve this purpose by steadily binding to DNA elements without introducing dsDNA breaks. To this end, we identified the RuvC and HNH catalytic domains of ThermoCas9 and introduced the corresponding D8A and H582A mutations for creating a dead (d)ThermoCas9. After confirmation of the designed sequence, Thermo-dCas9 was heterologously produced, purified and used for an in vitro cleavage assay with the same DNA target as used in the aforementioned ThermoCas9 assays; no cleavage was observed, confirming the catalytic inactivation of the nuclease.
In developing the Thermo-dCas9-based CRISPRi tool, the transcriptional silencing of the highly expressed ldhL gene from the genome of B. smithii ET 138 was used. pNW33n-based vectors pThermoCas9i_ldhL and pThermoCas9i_ctrl were constructed. Both vectors contained the thermo-dCas9 gene under the control of the PxylL promoter and a sgRNA expressing module under the control of the constitutive Ppta promoter. The pThermoCas9i_ldhL plasmid contained a spacer for targeting the non-template DNA strand at the 5' end of the ldhL gene in B. smithii ET 138. The position and targeted strand selection were based on previous studies (Bikard et al. Nucleic Acids Res. 41, 7429-7437 (2013); Larson et al. Nat. Protoc. 8, 2180-2196 (2013)), aiming for the efficient down-regulation of the ldhL gene. The pThermoCas9i_ctrl plasmid contained a random non-targeting spacer in the sgRNA-expressing module. The constructs were used to transform B. smithii ET 138 competent cells at 55°C followed by plating on LB2 agar plates, resulting in equal amounts of colonies. Two out of the approximately 700 colonies per construct were selected for culturing under microaerobic lactate-producing conditions for 24 hours, as described previously (Bosma et al. Appl. Environ. Microbiol. 81, 1874-1883 (2015)). The growth of the pThermoCas9i_ldhL cultures was 50% less than the growth of the pThermoCas9i_ctrl cultures. We have previously shown that deletion of the ldhL gene leads to severe growth retardation in B. smithii ET 138 due to a lack of Ldh-based NAD+-regenerating capacity under micro-aerobic conditions (Bosma et al. Microb. Cell Fact. 14, 99 (2015)). Thus, the observed decrease in growth is likely caused by the transcriptional inhibition of the ldhL gene and subsequent redox imbalance due to loss of NAD+-regenerating capacity.
Indeed, HPLC analysis revealed a 40% reduction in lactate production of the ldhL silenced cultures, and RT-qPCR analysis showed that the transcription levels of the ldhL gene were significantly reduced in the pThermoCas9i_ldhL cultures compared to the pThermoCas9i_ctrl cultures.
Example 8: in vitro spacer-protospacer mismatch tolerance screen
The targeting specificity and spacer-protospacer mismatch tolerance of a Cas9 endonuclease provides information for the development of the Cas9 into a genome engineering tool. To investigate the targeting specificity of ThermoCas9 towards a selected protospacer, a target plasmid library was constructed by introducing either single- or multiple-mismatches to the previously employed protospacer (Figure 9A). Each member of the plasmid library, and its PCR-linearized derivative, was separately used as substrate for in vitro ThermoCas9 cleavage assays at 37 and 55°C.
For the construction of the spacer-protospacer mismatch target library, twenty pairs of 40-nt long complementary ssDNA fragments, containing the mismatch-protospacers, were annealed. The annealing products were designed to have overhangs compatible for their directional ligation (T4 ligase, NEB) into the pNW33n backbone, upon BspHI and BamHI (NEB) digestion of the vector. The ligation mixtures were transformed into chemically competent E. coli DH5α cells (NEB), plasmids were isolated from liquid cultures and verified by sequencing. Both plasmids and PCR linearized DNA substrates were employed for the mismatch-tolerance assays.
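To make the position numbering of the mismatch library concrete, the following sketch generates mismatch variants of a protospacer in silico. It rests on several assumptions that are not taken from the source: the 20-nt sequence is a placeholder rather than the protospacer actually used, mismatches are introduced by swapping a base for its complement (the text does not state which substitutions were made), and the function names are hypothetical. Position 1 is taken as the PAM-proximal base, at the 3' end of the protospacer where the PAM lies, and position 20 as the PAM-distal base.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def introduce_mismatches(protospacer, positions):
    """Return a protospacer variant with mismatches at the given positions.

    Positions are counted from the PAM-proximal (3') end, starting at 1.
    Each selected base is replaced by its complement (an assumed substitution).
    """
    bases = list(protospacer.upper())
    n = len(bases)
    for pos in positions:
        if not 1 <= pos <= n:
            raise ValueError(f"position {pos} outside 1..{n}")
        idx = n - pos  # position 1 is the last base of the 5'->3' string
        bases[idx] = COMPLEMENT[bases[idx]]
    return "".join(bases)


def mismatch_positions(reference, variant):
    """List PAM-proximal positions at which two equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    n = len(reference)
    return sorted(n - i for i, (a, b) in enumerate(zip(reference, variant)) if a != b)


# Placeholder 20-nt protospacer, written 5'->3' with the PAM to its 3' side.
REFERENCE = "ACGTACGTACGTACGTACGT"

# One single-mismatch variant per position, plus the PAM-distal double mismatch.
single_variants = {pos: introduce_mismatches(REFERENCE, [pos]) for pos in range(1, 21)}
double_19_20 = introduce_mismatches(REFERENCE, [19, 20])

if __name__ == "__main__":
    for pos, seq in single_variants.items():
        print(pos, seq)
    print("19+20", double_19_20)
```

Numbering from the PAM-proximal end matches the way the results refer to "PAM proximal position 2" and "PAM distal position 20"; each variant would in practice be ordered as a pair of complementary oligonucleotides with the cloning overhangs described above.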
Figure 9A shows the scheme of the generated mismatch protospacers library, employed for evaluating the ThermoCas9:sgRNA targeting specificity in vitro. The mismatch spacer-protospacer positions are encircled, the PAM is shown in lighter shade and with the 5th to 8th positions underlined.
Figure 9B is a graphical representation of the ThermoCas9:sgRNA cleavage efficiency over linear or plasmid targets with different mismatches at 37 °C. The percentage of cleavage was calculated based on integrated band intensities after gel electrophoresis. Average values from three biological replicates are shown, with error bars representing S.D.
Figure 9C is a graphical representation of the ThermoCas9:sgRNA cleavage efficiency over linear or plasmid targets with different mismatches at 55 °C. The percentage of cleavage was calculated based on integrated band intensities after gel electrophoresis. Average values from three biological replicates are shown, with error bars representing S.D.
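The figure descriptions state that the percentage of cleavage was calculated from integrated band intensities and averaged over three biological replicates. The exact formula is not given; a common choice, assumed here, is the summed intensity of the cleavage-product bands divided by the total intensity of cleaved plus uncleaved bands. The sketch below applies that assumed formula and reports the mean and sample standard deviation. All intensity values are invented for illustration, and whether the source used the sample or the population S.D. is not stated.

```python
from statistics import mean, stdev


def percent_cleaved(uncut, cut_bands):
    """Percentage of substrate cleaved in one gel lane.

    uncut      integrated intensity of the uncleaved substrate band
    cut_bands  integrated intensities of the cleavage-product bands
    Assumed formula: 100 * cut / (cut + uncut).
    """
    cut = sum(cut_bands)
    total = uncut + cut
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * cut / total


def summarize(replicates):
    """Return (mean, sample S.D.) of percent cleavage over replicate lanes."""
    values = [percent_cleaved(uncut, cut_bands) for uncut, cut_bands in replicates]
    return mean(values), stdev(values)


# Invented intensities for three replicate lanes: (uncut, [product bands]).
example_replicates = [
    (50.0, [30.0, 20.0]),
    (25.0, [50.0, 25.0]),
    (75.0, [15.0, 10.0]),
]

if __name__ == "__main__":
    avg, sd = summarize(example_replicates)
    print(f"cleavage: {avg:.1f}% (S.D. {sd:.1f})")
```

Normalizing each lane to its own total intensity makes the measure independent of loading differences between lanes, which is why this ratio form is a reasonable default when the source does not specify one.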
The ThermoCas9:sgRNA activity on linear dsDNA targets was abolished at 37 °C for most of the single-mismatch targets (Figure 9B). Noteworthy exceptions were the targets with single-mismatches at the PAM proximal position 2 and PAM distal position 20, which allowed for weak cleavage (Figure 9B). At 55 °C, the cleavage efficiency for single-mismatch linear targets was higher than at 37 °C; however, it was strongly hampered for most of the tested targets, especially for single-mismatches at the PAM proximal positions 4, 5 and 10 (Figure 9C). On the contrary, single-mismatches at positions 1, 2 and 20 were the most tolerated for cleavage (Figure 9C).
In complete contrast to the linear targets, all the corresponding plasmid targets with single-mismatches were cleaved by the ThermoCas9:sgRNA complex at 37 °C, regardless of the position of the mismatch, with preference for the targets with single-mismatches at the PAM proximal positions 2, 6 to 10, 15 and 20 (Figure 9B). At 55 °C, all the single-mismatch plasmid targets were completely cleaved (Figure 9C).
Remarkably, the ThermoCas9:sgRNA activity was impeded for both linear and plasmid targets with multiple-mismatches, as there was no detectable cleavage for most of these targets at the tested temperatures (Figure 9B, 9C). A notable exception was the target with a double mismatch at positions 19 and 20, which was still cleaved at both tested temperatures, again more prominently at 55 °C (Figure 9B, 9C). ThermoCas9 activity on linear DNA targets is abolished at 37 °C upon introduction of single nucleotide mismatches between the spacer and the targeted protospacer, with the exception of the second PAM-proximal protospacer position. Furthermore, the ThermoCas9 activity on linear DNA targets was drastically reduced at 55 °C upon introduction of single nucleotide mismatches between the 3rd and the 10th PAM-proximal protospacer positions, as well as the 15th. The ThermoCas9 activity, on both linear DNA and plasmid targets, is hampered or - in most of the cases - is completely abolished upon introduction of multiple nucleotide mismatches between the spacer and the targeted protospacer, starting from the PAM distal end of the spacer. These results indicate the lower in vitro spacer-protospacer mismatch tolerance of ThermoCas9 compared to SpCas9 (Jinek et al. (2012)) and highlight its potential as a genome editing tool for eukaryotic cells with enhanced target specificity. What is now provided as a consequence is a highly specific nuclease in the form of a Cas9 that does not result in polynucleotide cleavage when a mismatch between guide RNA and the polynucleotide locus occurs.

Claims

1. A method of modifying a desired genetic target locus, comprising delivering a composition to the locus, the composition comprising a ThermoCas9 protein which has an amino acid sequence of SEQ ID NO: 1 or a sequence of at least 77% identity therewith, or a polypeptide fragment of such a ThermoCas9 protein, and one or more nucleic acid components, wherein the ThermoCas9 protein or polypeptide forms a complex with the one or more nucleic acid components and upon binding of the complex to a target locus that is 3' of a Protospacer Adjacent Motif (PAM), the ThermoCas9 protein or polypeptide makes a modification of the target locus, and wherein the method is performed at a temperature of not more than 50 °C.
2. A method as claimed in claim 1, wherein the level of off-target modification is reduced compared to when the method is performed at a temperature of 55 °C.
3. A method as claimed in claim 1 or claim 2, wherein the level of off-target modification is reduced compared to when the method is performed using SpyCas9 instead of ThermoCas9 and at a temperature of 37 °C.
4. A method as claimed in any of claims 1 to 3, wherein the method is performed at a temperature in the range 20 °C to 50 °C, or at a temperature in the range 20 °C to 45 °C.
5. A method as claimed in claim 4, wherein the method is performed at 37 °C.
6. A method as claimed in any preceding claim, wherein the PAM sequence comprises 5'-NNNNCNN-3' [SEQ ID NO: 133].
7. A method as claimed in any preceding claim wherein the PAM sequence comprises 5'-NNNNCNNN-3' [SEQ ID NO: 5]; preferably 5'-NNNNCNNA-3' [SEQ ID NO: 3].
8. A method as claimed in any preceding claim, wherein the PAM sequence comprises 5'-NNNNCSAA-3' [SEQ ID NO: 4].
9. A method as claimed in claim 8, wherein the PAM sequence comprises 5'-NNNNCCAA-3' [SEQ ID NO: 134].
10. A method as claimed in any preceding claim, wherein the one or more nucleic acid components is a guide RNA (gRNA); preferably a single guide RNA (sgRNA).
11. A method as claimed in any preceding claim, wherein the modification is nuclease cleavage.
12. A method as claimed in any preceding claim, wherein the target locus is comprised in a DNA molecule; preferably a linear dsDNA molecule.
13. A method as claimed in any preceding claim, wherein the target locus is in a cell; optionally a bacterial cell, an animal cell, a human cell or a plant cell.
14. A method as claimed in any of claims 1 to 12, wherein the target locus is in vitro.
15. A method as claimed in any preceding claim wherein the target locus is within double stranded DNA and the modification is a double stranded break.
16. A method as claimed in any of claims 1 to 14, wherein the polynucleotide comprising the target locus is double stranded DNA, the ThermoCas9 protein lacks the ability to cut the double stranded DNA and said use results in gene silencing of the polynucleotide.
17. A method as claimed in claim 16, wherein the ThermoCas9 protein contains the mutations D8A and H582A.
18. A method as claimed in any preceding claim, wherein the ThermoCas9 protein has an amino acid sequence of a ThermoCas9 from Geobacillus sp., preferably Geobacillus thermodenitrificans; more preferably Geobacillus thermodenitrificans T12.
19. A method as claimed in any preceding claim, wherein the ThermoCas9 protein further comprises at least one functional moiety.
20. A method as claimed in any preceding claim wherein the ThermoCas9 protein is provided as part of a protein complex comprising at least one further functional or non-functional protein, optionally wherein the at least one further protein further comprises at least one functional moiety.
21. A method as claimed in claim 19 or claim 20, wherein the ThermoCas9 protein or further protein comprises at least one functional moiety fused or linked to the N-terminus and/or the C-terminus of the ThermoCas9 protein or protein complex; preferably the C-terminus.
22. A method as claimed in any of claims 19 to 21, wherein the at least one functional moiety is a protein; optionally selected from a helicase, a nuclease, a helicase-nuclease, a DNA methylase, a histone methylase, an acetylase, a phosphatase, a kinase, a transcription (co-)activator, a transcription repressor, a DNA binding protein, a DNA structuring protein, a marker protein, a reporter protein, a fluorescent protein, a ligand binding protein, a signal peptide, a subcellular localisation sequence, an antibody epitope or an affinity purification tag, for example a green fluorescent protein (GFP).
23. A method as claimed in claim 20, wherein the native nuclease activity of the ThermoCas9 protein is inactivated and the inactivated ThermoCas9 protein is linked to at least one functional moiety.
24. A method as claimed in claim 23, wherein the at least one functional moiety is a nuclease domain; preferably a FokI nuclease domain.
25. A method as claimed in claim 23 or claim 24, wherein the at least one functional moiety is a marker protein.
PCT/EP2018/076480 2017-10-10 2018-09-28 Thermostable cas9 nucleases with reduced off-target activity WO2019072596A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1716590.3A GB201716590D0 (en) 2017-10-10 2017-10-10 Thermostable cas9 nucleases with reduced off-target activity
GB1716590.3 2017-10-10

Publications (1)

Publication Number Publication Date
WO2019072596A1 true WO2019072596A1 (en) 2019-04-18

Family

ID=60326828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/076480 WO2019072596A1 (en) 2017-10-10 2018-09-28 Thermostable cas9 nucleases with reduced off-target activity

Country Status (2)

Country Link
GB (1) GB201716590D0 (en)
WO (1) WO2019072596A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11098297B2 (en) 2017-06-09 2021-08-24 Editas Medicine, Inc. Engineered Cas9 nucleases
US11499151B2 (en) 2017-04-28 2022-11-15 Editas Medicine, Inc. Methods and systems for analyzing guide RNA molecules
US11866726B2 (en) 2017-07-14 2024-01-09 Editas Medicine, Inc. Systems and methods for targeted integration and genome editing and detection thereof using integrated priming sites
JP7448120B2 (en) 2019-11-14 2024-03-12 国立研究開発法人農業・食品産業技術総合研究機構 Method for introducing genome editing enzymes into plant cells using plasma

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016198361A1 (en) * 2015-06-12 2016-12-15 Wageningen Universiteit Thermostable cas9 nucleases
WO2018109101A1 (en) * 2016-12-14 2018-06-21 Wageningen Universiteit Thermostable cas9 nucleases
WO2018108338A1 (en) * 2016-12-14 2018-06-21 Wageningen Universiteit Thermostable cas9 nucleases

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BENJAMIN P. KLEINSTIVER ET AL: "High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects", NATURE, vol. 529, no. 7587, 1 January 2016 (2016-01-01), London, pages 490 - 495, XP055536782, ISSN: 0028-0836, DOI: 10.1038/nature16526 *
IOANNIS MOUGIAKOS ET AL: "Characterizing a thermostable Cas9 for bacterial genome editing and silencing", BIORXIV, 18 August 2017 (2017-08-18), XP055536365, Retrieved from the Internet <URL:https://www.biorxiv.org/content/biorxiv/early/2017/08/18/177717.full-text.pdf> DOI: 10.1101/177717 *
IOANNIS MOUGIAKOS ET AL: "Characterizing a thermostable Cas9 for bacterial genome editing and silencing", NATURE COMMUNICATIONS, vol. 8, no. 1, 21 November 2017 (2017-11-21), XP055536899, DOI: 10.1038/s41467-017-01591-4 *
LUCAS B. HARRINGTON ET AL: "A thermostable Cas9 with increased lifetime in human plasma", NATURE COMMUNICATIONS, vol. 8, no. 1, 10 November 2017 (2017-11-10), XP055536900, DOI: 10.1038/s41467-017-01408-4 *
MARTINUS J. A. DAAS ET AL: "Complete Genome Sequence of Geobacillus thermodenitrificans T12, A Potential Host for Biotechnological Applications", CURRENT MICROBIOLOGY, vol. 75, no. 1, 12 September 2017 (2017-09-12), Boston, pages 49 - 56, XP055536793, ISSN: 0343-8651, DOI: 10.1007/s00284-017-1349-0 *

Also Published As

Publication number Publication date
GB201716590D0 (en) 2017-11-22

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18780102; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18780102; Country of ref document: EP; Kind code of ref document: A1)