WO2004031346A2 - Methods and compositions concerning designed highly-specific nucleic acid binding proteins - Google Patents


Info

Publication number
WO2004031346A2
Authority
WO
WIPO (PCT)
Prior art keywords
nuclease
nucleic acid
modified
amino acid
polypeptide
Application number
PCT/US2003/027875
Other languages
French (fr)
Other versions
WO2004031346A3 (en)
Inventor
Barry L. Stoddard
Raymond J. Monnat Jr.
David Baker
Brett Chevalier
Tania Kortemme
Meggan Chadsey
Original Assignee
Fred Hutchinson Cancer Research Center
Application filed by Fred Hutchinson Cancer Research Center
Priority to AU2003290518A1
Publication of WO2004031346A2
Publication of WO2004031346A3


Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/16: Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22: Ribonucleases RNAses, DNAses

Definitions

  • the present invention relates generally to the fields of biochemistry and molecular biology. More particularly, the present invention relates to designing and producing artificial nucleic acid binding proteins that specifically bind novel nucleic acid sequences.
  • homing endonucleases are used as the initial molecules for design and engineering.
  • the new nucleic acid binding proteins possess a specificity of nucleic acid sequence recognition that is sufficiently high to allow their use as gene-specific reagents (proteins that recognize and bind one site within an entire biological genome).
  • although type II restriction enzymes are common reagents in most laboratories and have revolutionized molecular biology, they have evolved to recognize and cleave relatively short sequences in foreign DNA that invades their host cell. As such, their natural restriction sites are usually 4 to 8 base pairs, which corresponds to roughly one cleavage site in every 300 to 100,000 base pairs of a random DNA target.
  • non-specific nuclease domains have been tethered to sequence-specific DNA binding modules such as zinc fingers (Smith et al., 1999; Smith et al., 2000) and used in vivo to stimulate homologous recombination (Bibikova et al., 2001) or other cellular processes.
  • Such constructs are potentially useful for the creation of gene-specific reagents (a single protein that recognizes a unique site in a genome) but lack the ability to act specifically at a single unique phosphodiester bond or base pair within the DNA target site (Smith et al., 1999).
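The 4 to 8 bp range quoted above translates into cleavage frequency through simple probability: if all four bases are equally likely and independent, a given n-bp sequence is expected about once every 4^n bp on one strand. The sketch below assumes exactly that idealized, uniform base composition (real genomes deviate from it, and palindromic or degenerate sites change the count), so its numbers are order-of-magnitude estimates only; the function name is illustrative.

```python
# Expected spacing between chance occurrences of an n-bp recognition site,
# assuming an idealized random sequence: equal (0.25) base frequencies,
# independent positions, a single strand, and a non-degenerate site.


def expected_spacing(site_length: int) -> int:
    """Average number of bp between chance occurrences of one n-bp sequence."""
    if site_length < 1:
        raise ValueError("site_length must be positive")
    return 4 ** site_length


for n in (4, 6, 8, 12, 18):
    print(f"{n:>2}-bp site: about 1 per {expected_spacing(n):,} bp")
```

Under these assumptions a 4-bp site recurs about every 256 bp and an 8-bp site about every 65,536 bp, close to the range given in the text, while an 18-bp site is expected about once per 6.9 × 10^10 bp, more than the roughly 3 × 10^9 bp of a haploid human genome. Counting both strands of a non-palindromic site roughly halves each spacing.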
  • the present invention concerns methods for designing and creating polypeptides that recognize specific nucleic acid sequences.
  • the invention concerns at least two related methods.
  • the first method allows the computational design and/or genetic selection of polypeptides that are chimeric fusions of DNA-binding domains from independent, naturally occurring DNA-binding proteins.
  • the second method allows the computational design and/or genetic selection of polypeptides with modified nucleic acid binding surfaces. These methods can be used either individually or in concert to create novel nucleic acid-binding proteins.
  • the present invention further concerns the biochemical functions of these designed and created polypeptides, whose nucleic acid binding specificity can be exploited as a reagent for conducting experiments involving nucleic acids, as a therapeutic or diagnostic agent to recognize a specific nucleic acid sequence, or as a delivery vehicle to a specific nucleic acid sequence.
  • methods of the invention involve the computer modeling of a polypeptide (native or a chimeric polypeptide) in the context of a target sequence (designed or chimeric) to identify contact points and interfaces, which refer to interactions both between a polypeptide and a nucleic acid or between different domains or parts of the polypeptide with one another.
  • a nucleic acid-protein contact point refers to the point at which a protein domain and the nucleic acid molecule favorably interact.
  • the sum of nucleic acid-protein contact points constitutes the nucleic acid interface — that is, the entire interaction between a nucleic acid binding polypeptide and its nucleic acid target sequence.
  • a protein-protein contact point refers to the point at which individual protein domains favorably interact.
  • the sum of protein-protein contact points constitutes the amino acid interface.
  • interface refers to nucleic acid and amino acid interfaces unless otherwise specified.
  • contact point refers to nucleic acid-protein contact points and protein-protein contact points, unless otherwise specified.
  • polypeptides are designed based on a naturally occurring polypeptide that has a nucleic acid binding domain that specifically binds a particular nucleic acid sequence (referred to as "nucleic acid binding polypeptide").
  • the nucleic acid binding polypeptide may also comprise other activities or functions.
  • the nucleic acid binding polypeptide is a homing endonuclease, which has a DNA binding domain.
  • the invention concerns nucleic acid binding polypeptides that possess other activities that operate at or with respect to the particular nucleic acid sequence (referred to as "target sequence").
  • the nucleic acid binding polypeptide is a nuclease that cleaves the nucleic acid within the recognized specific nucleic acid sequence.
  • the invention has particular advantages in generating polypeptides that recognize sequences not previously recognized by a known polypeptide.
  • for nucleic acid binding proteins such as restriction endonucleases, the number of sites recognized in a genome is quite high.
  • Proteins that recognize fewer sites in a genome possess particular benefits, for example the ability to act upon a single genetic site that is a unique marker of a cell phenotype, such as a neoplastic mutation, or on a similarly unique genetic marker of a viral or bacterial pathogen.
  • the action of the engineered polypeptide can be limited to binding, or can include cleavage, gene activation or inactivation, or chemical modification of the nucleic acid sequence.
  • the invention provides methods and compositions concerning proteins that recognize sites that are greater than 8 bases/base pairs.
  • the recognition site may be, be at least, or be at most 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 bases/base pairs or longer in length.
  • the generated proteins may recognize a double-stranded site that is found in a genome fewer than 10 times, and may be a gene-specific reagent (GSR) with respect to a particular genome.
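Whether a candidate site meets the "fewer than 10 times" criterion can be checked by counting its occurrences on both strands of a genome sequence. A minimal sketch, assuming the genome is available as a plain A/C/G/T string; the function names and the toy sequence are illustrative, and a genome-scale search would normally use an indexed search rather than this linear scan.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T/N sequence."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_site(genome: str, site: str) -> int:
    """Count positions where the site occurs on either strand.

    Overlapping matches are counted; a palindromic site (equal to its own
    reverse complement) is counted once per position, not twice.
    """
    genome = genome.upper()
    targets = {site.upper(), reverse_complement(site)}
    k = len(site)
    return sum(genome[i:i + k] in targets for i in range(len(genome) - k + 1))


def occurs_fewer_than(genome: str, site: str, limit: int = 10) -> bool:
    """True if the site occurs fewer than `limit` times (the text uses 10)."""
    return count_site(genome, site) < limit


toy_genome = "GGATCCAAAGGATCCTTT"  # hypothetical sequence
print(count_site(toy_genome, "GGATCC"))  # palindromic site, found twice
print(count_site(toy_genome, "AAAG"))    # once as AAAG, once as CTTT
```

A site that is truly unique in the genome (a count of 1) is the ideal gene-specific case described in the text.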
  • the site is DNA.
  • the nucleic acid binding polypeptide binds specifically to DNA, while in other embodiments it binds to RNA.
  • DNA or RNA may be in an A-form, B-form, or Z-form conformation or an intermediate conformation between these states.
  • the nucleic acid binding polypeptide binds DNA or RNA.
  • Such polypeptides have one or more "DNA binding domains," which refers to a region (one or more contiguous amino acid residues) or regions of the polypeptide that form the interface and mediate interactions between the polypeptide and the target nucleic acid sequence.
  • These same domains may contain nuclease active sites that are capable of cleaving the target sites at specific positions within their sequence. The chemical activity of these active sites can be attenuated or eliminated as part of the redesign of the novel nucleic acid-binding proteins.
  • Naturally occurring or wild-type DNA binding polypeptides can be used as a template that may subsequently be altered or modified to generate a polypeptide with the ability to bind specifically to a nucleic acid sequence that differs from the sequence recognized by the unaltered polypeptide.
  • DNA binding domains from two or more polypeptides may be combined to form a chimeric polypeptide.
  • when such a chimeric polypeptide is created and has nuclease activity, it is referred to as a chimeric nuclease.
  • Chimeric polypeptides may be further modified to alter their structure and/or activity. In some embodiments, the chimeric polypeptide recognizes the sequence identified by each individual DNA binding domain of the chimera.
  • Methods for creating such a chimeric nucleic acid-binding polypeptide with nucleic acid sequence-specific activity involve a) preparing a computational model of a complex between a first polypeptide having a nucleic acid binding domain, such as from a nuclease, and a second polypeptide having a nucleic acid binding domain, such as from the same or a different nuclease; b) evaluating and identifying amino acids that are potential protein-protein contact points between the first polypeptide and the second polypeptide; and c) identifying an amino acid change that creates or enhances an amino acid contact point to improve the amino acid interface between the first and second polypeptides, which further provides a design for the chimeric nuclease.
  • Chimeric nucleases with nucleic acid sequence-specific activity also involve computational modeling and improvement/optimization of the protein-protein interface.
  • Chimeric nucleases of the present invention may be designed so that their catalytic activity is reduced with respect to the parental native nucleases. In some embodiments, the nuclease activity is abolished, rendering a chimeric nucleic acid binding polypeptide.
  • the term "chimeric nucleic acid binding polypeptides" includes chimeric nucleases, which are chimeric nucleic acid binding polypeptides with nuclease activity.
  • potential contact point refers to a point at which a favorable interaction may occur if modifications are made to amino acid(s) in that area.
  • a potential contact point is one in which there are spatial problems, for example, where two amino acids are too far apart to interact with one another or where there is steric hindrance between several amino acids. Alternatively, there may be an unfavorable chemical interaction that requires modification of one or more of the amino acids.
  • compositions of the invention concern an existing nuclease (either a naturally occurring protein or a novel chimera as described above) that recognizes a specific sequence initially, but then is designed and modified to recognize a different specific nucleic acid sequence. The binding occurs through an interface between the protein and the specific nucleic acid sequence.
  • Modified polypeptides of the present invention may be designed to reduce or abolish the catalytic activity of the unmodified polypeptide.
  • Methods for creating such a modified nucleic acid binding polypeptide with nucleic acid sequence-specific activity involve: a) preparing a computational model of a complex between the unmodified nuclease and the nucleic acid sequence to which it is desired to bind (the "target site") to evaluate the nucleic acid interface; b) identifying potential nucleic acid-protein contact points between the nuclease and the nucleic acid sequence; and c) identifying an amino acid change that creates or enhances a nucleic acid-protein contact point to improve a nucleic acid interface between the DNA binding domain and the nucleic acid sequence, which further provides a design for the modified nuclease.
  • the term “improve” is meant to denote the design of an interface that permits the stable and specific interaction between contact points or along an interface relative to interactions between interfaces involving unmodified proteins.
  • the term includes improving an interaction to such an extent that the interaction is said to be “optimized.”
  • the term improve/optimize refers to improvement that may or may not include optimization.
  • the binding occurs through a nucleic acid-binding domain, which is a region of the protein that specifically interacts (chemically) with a nucleic acid.
  • the term “computational model” refers to a schematic or other preliminary work that is prepared using a computer algorithm or computer program that can process and provide information about protein and nucleic acid chemistry and conformation.
  • a number of such programs and algorithms are readily available and known to those of skill in the art. They can configure a protein sequence into a 3-dimensional molecule and additionally configure it with a ligand or other substrate, such as a particular nucleic acid molecule. In the context of the invention, the program or algorithm will configure and improve (in some cases, optimize) an interface, including its amino acid side chains, between separate DNA-binding domains of naturally occurring DNA-binding proteins (to create novel chimeric DNA-binding proteins). The program or algorithm will also configure and improve or optimize the interface, including its amino acid side chains, between DNA-binding domain(s) of naturally occurring DNA-binding proteins or chimeric DNA-binding proteins and nucleic acid molecules.
  • a "contact point" refers to the point at which individual protein domains, or a protein domain and nucleic acid molecules, interact. Such contact points are formed as a result of specific binding between two protein domains or between protein domains and a nucleic acid molecule. Other amino acids within the interface may also be modified to enhance or improve the interaction between individual protein domains or between protein domains and nucleic acid molecules.
  • Modifications to the interface may result in improved interaction between individual protein domains or between protein domain(s) and nucleic acid molecules present in the complex, or may result in improved stability of the protein. In this context, amino acid side chains to be altered represent "potential contact points" in the interface.
  • Interface refers to the amino acids between individual protein domains or between protein domains and nucleic acid molecules that form contact points, as well as those amino acids that are adjacent to contact points and along the planar surface between individual protein domains or between protein domains and nucleic acid molecules.
  • methods of the invention further include the step of identifying potential contact points between individual protein domains or between protein domains and nucleic acid molecules and/or identifying amino acids along the interface that can be modified to improve the interface (that is, to improve the interaction between individual protein domains or between protein domains and nucleic acid molecules).
  • Computational modeling that occurs in different embodiments of methods of the invention involves modeling of the various entities so as to show their interactions with one another, such as interactions of single or multiple protein domains with each other or with one or more target sequences.
  • the interface of the initial model of a new protein-protein complex, such as a chimeric nuclease of the present invention, or of a novel nucleic acid-binding protein-nucleic acid complex, is analyzed to identify amino acid side chains in the protein(s) to be altered in order to improve the atomic contacts made throughout the interface. This is followed by an automated computational design protocol that searches through possible interface sequence combinations corresponding to the amino acids to be altered.
  • Sets of solutions are further assessed by eliminating sequence changes likely to affect nearby active site residues or, in some cases, by reducing structural redundancy (for example, if phenylalanine and tyrosine were computationally selected at a position, only one residue was used based on whether a neighboring atom could form a hydrogen-bond).
  • the interface free energies of the best possible sequence combinations are exhaustively enumerated using optimized rotamer conformations for each sequence.
  • a potential contact point may refer to one or more amino acids that are not stably interacting with the target sequence because a strong or stable enough chemical bond cannot be formed between the amino acid(s) and the sequence. As a result, there is no contact point because of improper or inadequate bonding.
  • the inability to bind may be due to chemical constraints (incompatible reaction groups) or proximity issues (too far or too close together).
  • a specific chemical constraint can involve amino acids that repel each other or attract each other because of chemical charges.
  • a specific proximity issue is when there is steric hindrance between either amino acids or between an amino acid and the target sequence, which precludes or interferes with proper chemical bonding.
  • an amino acid(s) may be too far from the target sequence to create an interface. In such cases, there is a gap between the two, which must be reduced or eliminated to create a contact point.
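The proximity problems described above (a gap that is too wide, or atoms so close that they clash) can be flagged from a structural model by measuring interatomic distances across the interface. A minimal sketch, assuming coordinates have already been extracted from a model; the residue and base labels, the coordinates, and the 3.0 / 4.5 / 8.0 Å cutoffs are hypothetical values chosen for illustration, not parameters from the text, and a real design protocol would score hydrogen bonding, electrostatics, and packing rather than distance alone.

```python
import math
from itertools import product

# Hypothetical distance cutoffs in angstroms (illustrative only).
CLASH_CUTOFF = 3.0    # closer than this: treated as steric hindrance
CONTACT_CUTOFF = 4.5  # up to this: treated as an existing contact point
GAP_CUTOFF = 8.0      # up to this: a gap that redesign might close

Coord = tuple[float, float, float]


def classify_pair(atoms_a: list[Coord], atoms_b: list[Coord]) -> tuple[str, float]:
    """Label a residue/base pair by its closest interatomic distance."""
    d = min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))
    if d < CLASH_CUTOFF:
        return "clash", d
    if d <= CONTACT_CUTOFF:
        return "contact", d
    if d <= GAP_CUTOFF:
        return "gap", d
    return "none", d


def potential_contact_points(protein: dict[str, list[Coord]],
                             nucleic_acid: dict[str, list[Coord]]):
    """Return (residue, base, label, distance) for clashes and closable gaps."""
    flagged = []
    for (res, res_atoms), (base, base_atoms) in product(protein.items(),
                                                        nucleic_acid.items()):
        label, d = classify_pair(res_atoms, base_atoms)
        if label in ("clash", "gap"):
            flagged.append((res, base, label, round(d, 2)))
    return flagged


# Hypothetical one-atom "residues" and "bases" placed along the x axis.
example_protein = {"res1": [(0.0, 0.0, 0.0)],
                   "res2": [(10.0, 0.0, 0.0)],
                   "res3": [(20.0, 0.0, 0.0)]}
example_dna = {"base1": [(2.5, 0.0, 0.0)], "base2": [(14.0, 0.0, 0.0)]}
flags = potential_contact_points(example_protein, example_dna)
print(flags)
```

Here res1/base1 is flagged as a clash, and res2/base1 and res3/base2 as gaps, while res2/base2 (4.0 Å) already counts as a contact and is not flagged. The same scan applies unchanged to protein-protein interfaces in a chimera.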
  • a protein may be modified through one or more amino acid changes, including rotameric changes, to create an actual contact point between the protein's nucleic acid binding domain and the target sequence.
  • a protein may also be modified through one or more amino acid changes to improve the interface between the members of the complex (for example, between the protein and nucleic acid or between the DNA binding domains of a chimeric nuclease; between individual protein domains or between protein domains and nucleic acid molecules).
  • An amino acid change is a modification that is a substitution, deletion, or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more contiguous or non-contiguous amino acids.
  • methods ofthe invention further include identifying an amino acid change that creates or enhances a contact point or the interface between the nucleic acid binding domain and the nucleic acid sequence (target sequence), which can further provide a design for the modified polypeptide.
  • Enhancing a contact point or the interface means that a point at which the polypeptide and the nucleic acid sequence directly or indirectly interact is made more chemically favorable, which includes reducing entropy, increasing stability, and reducing any steric hindrance.
  • Amino acid changes to create 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
  • a change that is determined computationally refers to the use of a computer program or algorithm to identify amino acid changes that would create a desired contact point or improve the interface.
  • the change is identified based on other polypeptides that interact with sites of similar sequence. Parameters known to those of ordinary skill in the art may be employed to guide the program or algorithm, such as sequence alignments, three-dimensional structural alignments, calculations of molecular interaction energies, and docking scores based on molecular complementarity.
  • rotameric libraries of amino acid side chains are used, wherein the different backbone-dependent rotameric states comprise (i) rotamers interacting with surrounding groups; (ii) rotamers interacting with a fixed portion of the molecule including the backbone and all side chains not subject to substitution; and (iii) pairwise rotamer to rotamer energies, which may be considered in determining amino acid changes.
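The energy decomposition described above (each rotamer's interaction with the fixed part of the molecule, plus pairwise rotamer-to-rotamer terms) lets the energy of any sequence and rotamer combination be assembled from precomputed tables, so a small interface can be searched exhaustively. The following toy sketch illustrates only that decomposition: the positions, rotamer labels, and energy values are invented, and production design codes use full-atom energy functions, large backbone-dependent rotamer libraries, and search methods such as dead-end elimination or Monte Carlo when the space is too large to enumerate.

```python
from itertools import product

# A choice at a design position is (amino acid, rotamer index).
# All values below are invented, in arbitrary energy units.
choices = {
    "p1": [("R", 1), ("R", 2), ("K", 1)],
    "p2": [("D", 1), ("N", 1)],
    "p3": [("F", 1), ("Y", 1)],
}

# One-body terms: rotamer vs. the fixed template (backbone and
# side chains that are not being redesigned).
self_energy = {
    ("p1", ("R", 1)): -1.0, ("p1", ("R", 2)): -0.5, ("p1", ("K", 1)): -0.8,
    ("p2", ("D", 1)): -0.2, ("p2", ("N", 1)): -0.4,
    ("p3", ("F", 1)): -0.6, ("p3", ("Y", 1)): -0.3,
}

# Two-body terms between rotamers at different positions; keys list the
# positions in sorted order, and unlisted pairs contribute zero.
pair_energy = {
    (("p1", ("R", 1)), ("p2", ("D", 1))): -1.5,
    (("p1", ("K", 1)), ("p2", ("D", 1))): -1.0,
    (("p2", ("N", 1)), ("p3", ("Y", 1))): -0.7,
}


def total_energy(assignment):
    """Sum one-body and pairwise terms for {position: (aa, rotamer)}."""
    items = sorted(assignment.items())
    energy = sum(self_energy[item] for item in items)
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            energy += pair_energy.get((items[i], items[j]), 0.0)
    return energy


def best_assignment():
    """Enumerate every combination and return (energy, assignment) of the lowest."""
    positions = sorted(choices)
    best = None
    for combo in product(*(choices[p] for p in positions)):
        assignment = dict(zip(positions, combo))
        energy = total_energy(assignment)
        if best is None or energy < best[0]:
            best = (energy, assignment)
    return best


energy, assignment = best_assignment()
print(round(energy, 2), "".join(assignment[p][0] for p in sorted(assignment)))
# prints: -3.3 RDF
```

For the multi-backbone variant in the next item, the same enumeration could simply be repeated on each backbone model's energy tables, keeping the overall minimum.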
  • a method wherein the overall minimum free energy is enhanced by (i) providing at least two polypeptide backbone models with different relative orientations of protein sequences in the domain interface; (ii) performing sequence design assays on the polypeptide backbone (or on each polypeptide backbone when a chimeric polypeptide is employed); and (iii) obtaining sequences with different amino acid combinations at each interface.
  • the combinations of sequences obtained can be reduced by: (1) eliminating sequences that affect the activity of nearby active site residues; (2) choosing between redundant residues by their ability to form hydrogen bonds with neighboring residues; (3) screening for optimal rotamer conformation for each sequence; and/or (4) identifying the top scoring interface free energy sequences having the overall minimum free energy.
  • redundant residues refers to residues that share similar steric and/or chemical structures, differing only at individual atomic positions, such as phenylalanine vs. tyrosine, or glutamate vs. glutamine, or aspartate vs. asparagine.
  • residues are considered “redundant” with respect to other residues; a "conservative" amino acid change reflects such redundancy.
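The four reduction steps listed above can be chained as a simple filter over candidate designs. A minimal sketch under stated assumptions: each candidate is a hypothetical record holding its interface sequence and an interface energy assumed to come from an already rotamer-optimized model (so step 3 is taken as done upstream), "affecting the active site" is approximated as mutating any position flagged as near it, and only the phenylalanine/tyrosine rule from the text is encoded for redundant residues.

```python
# For each redundant pair: (choice when a neighbouring hydrogen-bond partner
# exists, choice otherwise). Only the Phe/Tyr rule comes from the text.
REDUNDANT_CHOICE = {frozenset("FY"): ("Y", "F")}


def reduce_candidates(candidates, wild_type, near_active_site, hbond_partner, top_n=2):
    """Filter and rank candidate designs (hypothetical data model)."""
    # (1) Drop designs that change any position flagged as near the active site.
    kept = [c for c in candidates
            if all(c["seq"][i] == wild_type[i] for i in near_active_site)]
    # (2) Where both members of a redundant pair were selected at a position,
    #     keep only the one favoured by the local hydrogen-bonding environment.
    for i in range(len(wild_type)):
        seen = {c["seq"][i] for c in kept}
        for pair, (with_partner, without_partner) in REDUNDANT_CHOICE.items():
            if pair <= seen:
                keep = with_partner if i in hbond_partner else without_partner
                drop = next(iter(pair - {keep}))
                kept = [c for c in kept if c["seq"][i] != drop]
    # (3) Rotamer optimization is assumed to have produced c["energy"].
    # (4) Rank by interface energy (lower is better) and keep the best few.
    return sorted(kept, key=lambda c: c["energy"])[:top_n]


wild_type = "KSLFA"  # hypothetical interface residues
candidates = [
    {"seq": "KSLYA", "energy": -12.1},
    {"seq": "KSLFA", "energy": -12.4},
    {"seq": "RSLYA", "energy": -13.0},  # mutates a flagged position
    {"seq": "KTLYA", "energy": -11.8},
    {"seq": "KTIFA", "energy": -11.0},
]
best = reduce_candidates(candidates, wild_type, near_active_site={0}, hbond_partner={3})
print([c["seq"] for c in best])
```

In this toy run the lowest-energy design (RSLYA) is rejected at step 1, and KSLFA is rejected at step 2 because a hydrogen-bond partner is assumed at position 3, so tyrosine is preferred there.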
  • a change that is determined empirically involves changing an amino acid and then evaluating the modified polypeptide.
  • the amino acid change is evaluated using a binding assay in which the ability of the modified protein to bind all or part of the target sequence is tested.
  • a binding-cleavage assay is performed to determine whether the target sequence is cleaved at the appropriate position.
  • Methods for creating modified nucleic acid binding polypeptides further include a step of obtaining a crystal structure (or at least the data regarding a crystal structure) of the nucleic acid binding polypeptide before it has been modified.
  • the crystal structure may be of the polypeptide alone or with a nucleic acid sequence to which it specifically binds or recognizes. It is further contemplated that obtaining a crystal structure or the data regarding a crystal structure may be accomplished by generating the crystal structure and gathering the relevant data. It will be understood that the relevant data regarding a crystal structure concerns information about the polypeptide's tertiary structure and/or the polypeptide's interaction with a nucleic acid sequence to which it specifically binds or recognizes.
  • after a polypeptide is designed, it may be prepared by methods readily known to those of skill in the art. In some embodiments, substitutions, deletions, or additions are introduced via the nucleic acid encoding the polypeptide. Such recombinant techniques are well known to those of skill in the art. In some embodiments, the amino acid change(s) is/are implemented by site-directed mutagenesis of a nucleic acid encoding the unmodified polypeptide. "Unmodified polypeptide" refers to the polypeptide sequence prior to being redesigned using methods of the invention. The starting point for many of the design methods is an unmodified polypeptide. The term can also apply to nucleases and homing endonucleases specifically.
  • after a modified polypeptide is prepared, it may be assayed for solubility and/or proper folding, as well as for activity.
  • the nucleic acid binding polypeptide contains, at a minimum, a nucleic acid binding domain and a catalytic domain.
  • the protein is a nuclease, which contains a nucleic acid binding domain and a domain that cleaves the specifically recognized sequence at a particular site (referred to as site-specific activity).
  • the nuclease is an endonuclease.
  • a Group I homing endonuclease is utilized and modified in methods and compositions of the invention.
  • homing endonucleases fall into the LAGLIDADG, HNH, His-Cys Box, and GIY-YIG families.
  • the invention covers all of them, though in specific embodiments, the Group I LAGLIDADG homing endonucleases are involved.
  • LAGLIDADG endonucleases that may be used include, but are not limited to: I-DmoI, I-CreI, I-CeuI, I-SceI, I-SceII, I-SceV, I-SceVI, I-LlaI.
  • I-TevIII is an example of an HNH endonuclease and I-TevI is an example of a GIY-YIG endonuclease.
  • An example of a His-Cys Box homing endonuclease is I-PpoI.
  • homing endonucleases from the LAGLIDADG or His-Cys Box families are employed. Modified nucleases of the invention may cleave both strands of a target DNA site, cleave the plus strand of a DNA target site, cleave the minus strand of a DNA target site, or bind the DNA target site specifically but not cleave either strand. It is specifically contemplated that methods and compositions of the invention discussed with respect to a nucleic acid binding polypeptide generally may be applied specifically with respect to a homing endonuclease, and vice versa.
  • a polypeptide that has been designed and/or modified to bind a specific target sequence contains an additional reactive group, which, in some embodiments, includes a cross-linking agent, a fluorophore, a chromophore, a metal chelator, or a protein domain attached to the modified polypeptide.
  • the reactive group is chemically attached to the modified polypeptide in other embodiments of the invention.
  • the modified polypeptide comprises a protein marker.
  • the protein marker comprises lacZ in additional embodiments.
  • a polypeptide that is "chimeric" refers to a polypeptide that contains two or more recognizable and distinct regions that are not found together in nature, and which may be from different polypeptides. In most cases, the regions are from different polypeptides; however, it is contemplated that a modified polypeptide may contain, for example, two nucleic acid binding regions from the same polypeptide (when that polypeptide normally has only one such region).
  • Methods of the invention are directed to creating a chimeric polypeptide that recognizes the combined target sites of multiple nucleic acid binding domains.
  • a polypeptide with 1, 2, 3, 4, 5, or more nucleic acid binding domains is created and altered to recognize the combined nucleic acid sequences.
  • in some embodiments of the invention, the target nucleic acid in the context of a chimeric polypeptide with multiple nucleic acid binding domains contains each of the sequences recognized by the individual binding domains. It is contemplated that the chimeric nuclease of the present invention may be designed and/or produced to cleave both strands of a target DNA site, to cleave the plus strand of a DNA target site, to cleave the minus strand of a DNA target site, or to bind the DNA target site specifically but not cleave either strand. Therefore, the use of the term "nuclease" in this context merely denotes the origin of the nucleic acid binding domains.
  • step a) involves preparing a computational model of a chimeric nuclease and a nucleic acid sequence, such that the chimeric nuclease comprises (i) a first nuclease, comprising a DNA binding domain and a catalytic domain, and (ii) at least a second DNA binding domain; and an additional step d) involves identifying amino acids that are potential protein-protein contact points between the second DNA binding domain and the first nuclease, wherein the substitution of the amino acid improves atomic contacts that can be hindered by, for example, steric hindrance or improper bonding, and wherein the substitution provides a design for the modified chimeric nuclease.
  • "Potential contact points" refer to amino acid residues along the interface between the domains of a chimeric polypeptide at which changes may be introduced to increase the stability of the chimeric polypeptide or to improve or optimize the polypeptide's interface with the target sequence.
  • Atomic contacts between two amino acids that hinder the formation of an optimal interface include, but are not limited to, improper or undesirable chemical interactions (such as charge-charge repulsion, burial of charged or polar amino acids, or exposure of hydrophobic amino acids) and physical interference (either steric hindrance between amino acids or poor van der Waals complementarity and interactions between amino acids). It is specifically contemplated that one or more nucleic acid binding domains in a chimeric nuclease are from a homing endonuclease.
  • a chimeric protein may contain a peptide linker molecule between regions to create a monomeric protein.
  • the peptide linker is located between the first DNA binding domain and the second DNA binding domain. In specific embodiments, a peptide linker comprises the amino acid sequence NGN, GNGN, NGNG, or GNGNG.
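The single-chain architecture described above (first DNA binding domain, short peptide linker, second DNA binding domain) amounts to sequence assembly, and recording where each part sits in the fused chain is convenient when mapping design positions back onto a model. A minimal sketch; the linker set is the one listed in the text, while the function name and the short domain strings are hypothetical placeholders, not real homing endonuclease sequences.

```python
ALLOWED_LINKERS = {"NGN", "GNGN", "NGNG", "GNGNG"}  # linkers listed in the text


def fuse_domains(first: str, second: str, linker: str = "NGN"):
    """Join two domain sequences through a peptide linker.

    Returns the fused sequence and 0-based, end-exclusive spans of each part.
    """
    if linker not in ALLOWED_LINKERS:
        raise ValueError(f"unexpected linker {linker!r}")
    fused = first + linker + second
    start_second = len(first) + len(linker)
    spans = {
        "domain1": (0, len(first)),
        "linker": (len(first), start_second),
        "domain2": (start_second, len(fused)),
    }
    return fused, spans


fused, spans = fuse_domains("MAKLE", "VSTRQ", linker="GNGN")  # placeholder domains
print(fused, spans)
```

Keeping the spans alongside the sequence makes it straightforward to renumber interface positions from each parent domain into the numbering of the fused monomer.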
  • the invention further concerns compositions generated as a result of design methods of the invention.
  • the invention concerns a modified recombinant polypeptide that recognizes a specific nucleic acid sequence that is greater than 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides in length.
  • the recombinant modified polypeptide recognizes two or more sites in tandem (back-to-back or one after the other) that are individually recognized by the polypeptide prior to being modified.
  • the invention concerns a modified nuclease that has altered sequence-specificity made by the method comprising: (a) preparing a computational model of a complex between a nuclease and a target nucleic acid sequence, wherein the nuclease comprises a catalytic domain and a DNA binding domain; (b) identifying potential nucleic acid-protein contact points between the DNA binding domain and the nucleic acid sequence; and (c) identifying an amino acid change that creates or enhances a nucleic acid-protein contact point between the DNA binding domain and the nucleic acid sequence and that further provides a design for the modified nuclease.
  • the method further includes preparing the modified nuclease.
  • the catalytic domain is catalytically inactive such that the nuclease is no longer a nuclease but is a nucleic acid binding polypeptide.
  • methods of the invention include methods of designing a modified chimeric nucleic acid binding polypeptide with sequence-specific activity comprising: (a) preparing a computational model of a complex between a chimeric nucleic acid binding polypeptide and a nucleic acid sequence, wherein the chimeric nucleic acid binding polypeptide comprises (i) a first polypeptide having a DNA binding domain and (ii) at least a second polypeptide having a DNA binding domain; (b) identifying potential nucleic acid-protein contact points between the chimeric nucleic acid binding polypeptide and the nucleic acid sequence; and (c) identifying an amino acid substitution to produce an operative interface between the chimeric nucleic acid binding polypeptide and the nucleic acid sequence, wherein the substitution provides a design for the modified chimeric nucleic acid binding polypeptide.
  • operative refers to the predicted ability of the modified chimeric nucleic acid binding polypeptide to recognize the nucleic acid sequence.
  • the modified chimeric nucleic acid binding polypeptide comprises site-specific nuclease activity.
  • a modified chimeric nuclease capable of recognizing an altered target nucleic acid sequence comprises: a) a first DNA binding domain from a first homing endonuclease; and, b) a second DNA binding domain from a second homing endonuclease, wherein the chimeric DNA binding polypeptide is capable of binding the target DNA sites of the first and second DNA binding domains.
  • the DNA binding domains from either or both of the homing endonucleases are further modified to improve the nucleic acid interface between the chimeric nuclease and the target nucleic acid sequence. There may be substitutions, deletions, or additions of up to 5, 10, 15, 20, 25, or more amino acids, provided the modified domain remains at least 50% identical to the original unmodified domain.
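As a rough illustration of the 50% identity criterion, the sketch below computes percent identity for a substitution-only variant. It assumes the modified and original domains have the same length and are already aligned; deletions or additions would first require a sequence alignment, which this hypothetical helper does not perform. The example domain is invented for illustration.

```python
def percent_identity(original: str, modified: str) -> float:
    """Percent identity of two gap-free, equal-length protein sequences (one-letter codes)."""
    if len(original) != len(modified):
        raise ValueError("substitution-only sketch: sequences must have equal length")
    if not original:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(original, modified))
    return 100.0 * matches / len(original)


# Hypothetical 100-residue domain carrying 25 substitutions at its N-terminus.
original = "ACDEFGHIKLMNPQRSTVWY" * 5
modified = "".join("A" if aa != "A" else "G" for aa in original[:25]) + original[25:]
identity = percent_identity(original, modified)
print(f"{identity:.1f}% identical; meets 50% threshold: {identity >= 50.0}")
```

Even 25 substitutions in a 100-residue domain leave it well above the 50% floor, which is why the claim can allow substitutions in the tens of residues.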
  • the chimeric DNA binding polypeptide has a domain from Dmo-
  • Other methods of the invention concern methods of designing a modified chimeric nuclease with nucleic acid sequence-specific activity comprising: (a) preparing a computational model of a complex between a chimeric nuclease and a DNA sequence, wherein said chimeric nuclease comprises (i) a first nuclease, comprising a DNA binding domain and a catalytic domain, and (ii) at least a second DNA binding domain; (b) identifying potential protein-protein contact points between the first nuclease and the second DNA binding domain; and (c) identifying an amino acid substitution to create a protein-protein contact point or to enhance a nucleic acid interface between the chimeric nuclease and the nucleic acid sequence, wherein the substitution provides a design for the modified chimeric nuclease.
  • the computational design increases the overall minimum free energy by (a) providing at least two polypeptide backbone models with different relative orientations of LAGLIDADG sequences in the domain interface; (b) performing sequence design assays on each polypeptide backbone; and (c) obtaining sequences with different amino acid combinations along the interface.
  • the number of different amino acid combinations is reduced by: (a) eliminating sequences that affect the activity of nearby active site residues; (b) choosing between redundant residues by their ability to form hydrogen bonds with neighboring residues; (c) screening for optimal rotamer conformation for each sequence; and (d) identifying the top scoring interface free energy sequences having the overall minimum free energy.
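The filtering steps above can be sketched as a small post-processing pass over candidate interface designs. This is a minimal, hypothetical sketch: the `Candidate` type, the position sets, and the rule "keep Tyr only where a neighbor can hydrogen-bond its hydroxyl" are illustrative assumptions, and step (c), rotamer re-optimization, is not modelled (energies are simply carried over).

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Candidate:
    residues: Dict[int, str]  # designed interface position -> one-letter amino acid
    energy: float             # interface free energy from the design run; lower is better


def filter_and_rank(candidates: List[Candidate], wild_type: Dict[int, str],
                    near_active_site: Set[int], hbond_partner_sites: Set[int],
                    top_n: int = 3) -> List[Candidate]:
    best: Dict[Tuple[Tuple[int, str], ...], Candidate] = {}
    for cand in candidates:
        # (a) discard designs that change any position flagged as close to the active site
        if any(cand.residues.get(p, wild_type[p]) != wild_type[p] for p in near_active_site):
            continue
        # (b) collapse Phe/Tyr redundancy: keep Tyr only where a neighbor can hydrogen-bond its OH
        residues = {
            p: ("Y" if p in hbond_partner_sites else "F") if aa in ("F", "Y") else aa
            for p, aa in cand.residues.items()
        }
        key = tuple(sorted(residues.items()))
        # keep the lowest-energy design among those that collapse to the same sequence
        if key not in best or cand.energy < best[key].energy:
            best[key] = Candidate(residues, cand.energy)
    # (c) rotamer re-optimization is not modelled; energies are carried over unchanged
    # (d) return the top-scoring sequences, most favorable (lowest) energy first
    return sorted(best.values(), key=lambda c: c.energy)[:top_n]


# Hypothetical positions 10, 20, 30; position 30 lies next to the active site.
wild_type = {10: "L", 20: "F", 30: "D"}
cands = [
    Candidate({10: "F", 20: "F", 30: "D"}, -10.0),
    Candidate({10: "Y", 20: "Y", 30: "D"}, -12.0),
    Candidate({10: "W", 20: "F", 30: "N"}, -20.0),  # alters position 30, so it is dropped
    Candidate({10: "M", 20: "F", 30: "D"}, -8.0),
]
ranked = filter_and_rank(cands, wild_type, near_active_site={30}, hbond_partner_sites={10})
for c in ranked:
    print(c.energy, c.residues)
```

The first two candidates collapse to the same sequence once the Phe/Tyr redundancy is resolved, so only the lower-energy representative survives.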
  • Other methods of the invention include methods of screening to identify a modified polypeptide with altered DNA sequence-specific activity comprising: a) generating polypeptides with one or more amino acid substitutions in their DNA binding domains; b) contacting the polypeptides with nucleic acid segments with random sequences, under conditions that allow the DNA binding domains of the polypeptides to bind specifically to the nucleic acid segments; c) identifying which polypeptides specifically bind the nucleic acid segments; and d) identifying the sequences of the nucleic acid segments. In specific embodiments, the modified polypeptides specifically bind a sequence of at least 9 residues. Any of the modified polypeptides described herein may be employed (or achieved) in any method of the invention, including any screening method.
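Step d) of the screen yields a set of bound sequences; one common way to summarize them is a per-position consensus with base frequencies. The sketch below is a hypothetical illustration that assumes the recovered sites have already been aligned to a common length; the example sites are invented.

```python
from collections import Counter
from typing import List, Tuple


def consensus(bound_sites: List[str]) -> Tuple[str, List[float]]:
    """Consensus base and its frequency at each position of pre-aligned, equal-length sites."""
    if not bound_sites:
        raise ValueError("need at least one recovered site")
    length = len(bound_sites[0])
    if any(len(site) != length for site in bound_sites):
        raise ValueError("sites must be aligned to a common length")
    letters, freqs = [], []
    for i in range(length):
        # most frequent base in column i across all recovered sites
        base, count = Counter(site[i] for site in bound_sites).most_common(1)[0]
        letters.append(base)
        freqs.append(count / len(bound_sites))
    return "".join(letters), freqs


# Hypothetical 9-bp sites recovered from bound random-sequence segments.
sites = ["GATTACAGT", "GATTACAGT", "GATAACAGT", "GATTACCGT"]
seq, freqs = consensus(sites)
print(seq, [round(f, 2) for f in freqs])
```

Positions with a frequency below 1.0 mark where the modified polypeptide tolerates variation in its target.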
  • FIG. 1 Overall design strategy and method for creation and structural analysis of E-DreI.
  • FIGS. 2A-2C Biochemical characterization of a designed endonuclease with novel specificity.
  • FIG. 2A The dmo and cre sites (targets of I-DmoI and I-CreI, respectively) differ considerably; dmo is asymmetric, while cre is nearly palindromic (2-fold symmetric positions underlined).
  • FIG. 2B E-DreI, I-CreI and I-DmoI activity on different target site DNAs.
  • FIG. 2C Mapping of E-DreI scissile phosphate positions in the dre3 target site.
  • FIGS. 3A-3B
  • FIG. 3A Location of the interface in the overall structure (gray oval).
  • FIG. 3B Detailed view of the interface highlighting the tight packing of interacting side chains.
  • FIGS. 4A-4E Comparison of E-DreI X-ray structure and computational model. Domains from I-DmoI and I-CreI in the X-ray structure are blue and gray, respectively; the designed model is red.
  • FIG. 4A Overall superposition of the backbone template used for computational design and the E-DreI crystal structure illustrates the good overall agreement between experimental and designed structures.
  • FIG. 4B Top view of the domain interface showing an overall superposition of the structure and computational model and detailed side chain packing interactions, including the three most important interface hot spot residues: Y13, W19, and F194.
  • FIG. 4C Y13 forms two hydrogen bonds across the interface to D115 and N193.
  • FIG. 4D W19 stacks across the interface against F151, and forms an unanticipated hydrogen bond to Q144 and a loose cation-π interaction with R148.
  • FIG. 4E. F194 is buried across the interface in a hydrophobic pocket lined by residues L14, L47, F52 and 152.
  • FIG. 5 Base-specific contacts made by E-DreI to target site DNA. Contacts to the dmo half site are shown at top and contacts to the cre half site at bottom. Both strands of each half site are shown, with the target site center indicated by the oval connecting each strand and the positions of the scissile phosphates by the circles in the dmo (bottom) and cre (top) strands.
  • FIG. 6 Stereo views of the E-DreI and I-CreI active sites. Indicated are the three catalytic metals (purple), DNA backbone (gray) from one base on either side of the scissile phosphates (yellow), waters (blue) and putative nucleophilic waters (red).
  • the present invention encompasses methods of designing nucleic acid binding polypeptides that are modified to alter their ability to act at a particular target nucleic acid sequence. These modified polypeptides are created from either a single unmodified nucleic acid binding polypeptide or from a combination of unmodified nucleic acid binding polypeptides (chimera). Generally, computer modeling is employed to ascertain the interaction between a region of a nucleic acid binding polypeptide and a target sequence that is altered with respect to the sequence the polypeptide binds in its unmodified form. The modeling evaluates the interface between them and aids in determining which modifications are desirable to enhance the interface between the polypeptide and the nucleic acid.
  • when a chimeric polypeptide is employed, modeling may also be used to evaluate the interaction between the regions of the polypeptide that form the chimera.
  • Modified chimeric polypeptides may be generated by recombining and fusing domains from different parent polypeptides to alter site-specific activity; by identifying interferences between the fused regions of the modified protein, such as sterically hindered or improperly bonded groups; and by identifying amino acid substitutions that reduce any negative effect of these interferences, enhance stability and improve site-specific activity.
  • nucleic acid binding polypeptide refers to any polypeptide that specifically binds a particular nucleic acid target, such as transcription factors and endonucleases. While any nucleic acid binding polypeptide may be employed, in some embodiments of the invention the methods and compositions specifically involve homing endonucleases, with respect to both chimeric and non-chimeric modified polypeptides.
  • Homing is the lateral transfer of an intervening sequence (either an intron or intein) to a homologous allele that lacks the sequence (Jacob, 1977).
  • the process is catalyzed by an endonuclease that recognizes and cleaves the target allele.
  • the homing endonuclease itself is encoded by an open reading frame (ORF) embedded within the mobile intervening sequence.
  • the mobile elements avoid disrupting host gene function by self-splicing at the RNA (introns) or protein (inteins) level.
  • Homing endonucleases have been found in Eubacteria, Archaea, and single cell eukaryotes, where they are often encoded by and promote the lateral transfer of mobile introns (Belfort and Roberts, 1997; Chevalier and Stoddard, 2001). Homing endonucleases are highly specific and have evolved to cleave target sequences within cognate alleles without being overly toxic to the organism. They tolerate some individual base variation at their homing site, which ensures their propagation despite evolutionary drift of their target sequence. Homing endonucleases tend to be small proteins of less than 40 kDa, a property likely due to length limitations of the mobile sequences in which they reside.
  • Homing endonucleases bind long (15-40 bp) DNA target sites, ensuring extremely high specificity while tolerating small numbers of single base pair polymorphisms in those sites (Chevalier and Stoddard, 2001). This combination makes homing endonucleases highly sequence-specific (recognizing as few as 1 in 10^9 random sequences) and excellent candidates from which to engineer new, sequence-specific DNA binding or catalytic proteins.
  • homing endonuclease families have been defined on the basis of conserved protein motifs (Chevalier and Stoddard, 2001). These families are often collectively termed 'group I homing endonuclease families' since they are associated with group I introns. They comprise the LAGLIDADG, GIY-YIG, H-N-H and His-Cys box (Belfort and Roberts, 1997; Jurica and Stoddard, 1999) families, but it has been suggested that the latter two comprise a single ββα-Me family of endonucleases (Kuhlmann et al, 1999).
  • The LAGLIDADG enzyme family is the largest, and its members have unique homing sites, further suggesting that these enzymes are malleable and should offer a strong foundation for engineering novel DNA-binding proteins. Because of their structures, members of the LAGLIDADG and His-Cys box families are particularly suited to the methods of the invention.
  • The range of DNA binding specificities represented in the LAGLIDADG homing endonuclease family suggests that the methods of the present invention can be used to generate novel DNA binding proteins with the ability to target many different, specific genes within complex genomes.
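The "1 in 10^9" figure follows from simple arithmetic: with four equally likely bases, an exact match to a fixed n-bp site occurs once per 4^n random sequences, and 4^15 is just over 10^9. The sketch below works this through; it assumes equal base frequencies, exact matching, and both strands scanned, and it ignores the enzymes' tolerance of single base pair polymorphisms (which lowers effective specificity). The 3.1 Gbp genome size is an illustrative assumption.

```python
def random_match_odds(site_length: int) -> int:
    """Random sequences per exact match, assuming equal base frequencies: 4**n."""
    return 4 ** site_length


def expected_random_sites(site_length: int, genome_bp: int, strands: int = 2) -> float:
    """Expected exact matches in a random genome, scanning both strands by default."""
    return strands * genome_bp / random_match_odds(site_length)


print(random_match_odds(15))  # 1073741824, about 1 in 10^9
print(round(expected_random_sites(15, 3_100_000_000), 2))
print(round(expected_random_sites(20, 3_100_000_000), 4))
```

Lengthening the site from 15 to 20 bp reduces the expected number of chance matches in a genome of this size by roughly three orders of magnitude, which illustrates why long recognition sites are attractive for targeting single loci.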
  • This endonuclease family has been variously termed 'LAGLIDADG,' 'DOD,' 'dodecapeptide,' 'dodecamer' and 'decapeptide' (Belfort and Roberts, 1997; Lambowitz and Belfort, 1993; Dalgaard et al, 1997).
  • the LAGLIDADG endonucleases are the most phylogenetically diverse of the homing endonuclease families.
  • The vast host distribution of LAGLIDADG ORFs includes, for example, the genomes of plant and algal chloroplasts, fungal and protozoan mitochondria, bacteria and archaea.
  • One reason for the wide distribution of LAGLIDADG ORFs appears to be their remarkable ability to invade unrelated types of intervening sequences, including group I introns, archaeal introns and inteins.
  • Descendants of LAGLIDADG homing endonucleases are also found as freestanding endonuclease genes (Watabe et al, 1981; Watabe et al, 1983; Kostriken et al, 1983) and as maturases that assist in RNA splicing (Schafer et al, 1994; Monteilhet et al, 2000; Ho et al, 1997).
  • Enzymes that contain a single copy of this motif, such as I-CreI (Thompson et al, 1992) and I-CeuI (Marshall and Lemieux, 1992), act as homodimers and recognize a nearly palindromic homing site (which, like a homodimeric protein, has inherent 2-fold symmetry). Enzymes that have two copies of this motif separated by 80-150 residues, such as I-DmoI (Dalgaard et al, 1993) and PI-SceI (Gimble and Thorner, 1992), act as monomers.
  • One of the best characterized LAGLIDADG endonucleases is PI-SceI from Saccharomyces cerevisiae, which is generated by autocatalytic protein splicing of the VMA intein located within the catalytic subunit of the yeast vacuolar H+-ATPase.
  • I-CreI (Heath et al, 1997) and I-DmoI (Silva et al, 1999) are intron-encoded LAGLIDADG endonucleases that lack protein splicing activity.
  • The His-Cys box family is a small family of proteins encoded within the only known mobile group I introns residing in nuclear genomes (Johansen et al, 1993). All of these mobile introns are located within highly conserved regions of nuclear small and large subunit ribosomal DNA of slime molds, fungi and amoebae.
  • the best-studied member of this family is I-PpoI from Physarum polycephalum.
  • The GIY-YIG endonucleases, a smaller family, are characterized by the conserved GIY-(X10-11)-YIG motif (Kowalski et al, 1999).
  • GIY-YIG endonucleases have been found in the T4 bacteriophage both as freestanding enzymes (F-TevI and F-TevII; Sharma et al, 1992) and encoded within mobile group I introns, such as I-TevI and I-TevII (Bell-Pedersen et al, 1990).
  • GIY-YIG ORFs have also been reported in introns of fungal mitochondria (Tian et al, 1991; Paquin et al, 1994; Saguez et al, 2000), algal mitochondria (Kroymann and Zetsche, 1997; Denovan-Wright et al, 1998) and algal chloroplasts (Paquin et al, 1995; Holloway et
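A literal reading of the GIY-(X10-11)-YIG motif translates directly into a regular expression. This is a crude first-pass sketch only: real GIY-YIG motifs are degenerate (not every family member carries the exact GIY and YIG tripeptides), so profile-based searches are normally used; the function name and test sequences are hypothetical.

```python
import re
from typing import List, Tuple

# Literal reading of GIY-(X10-11)-YIG: GIY, 10 or 11 arbitrary residues, then YIG.
GIY_YIG = re.compile(r"GIY[A-Z]{10,11}YIG")


def find_giy_yig(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping literal motif hit."""
    return [(m.start(), m.group()) for m in GIY_YIG.finditer(protein.upper())]


print(find_giy_yig("MK" + "GIY" + "A" * 10 + "YIG" + "LL"))  # one hit starting at index 2
print(find_giy_yig("GIY" + "A" * 12 + "YIG"))                # [] because the spacer is too long
```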
  • Members of the H-N-H family are the least well characterized, structurally and biochemically, of all homing endonucleases.
  • the H-N-H motif is also the least restricted of the motifs defining the four homing endonuclease families. It has been identified in non-specific endonucleases, such as the antibacterial colicins E7 and E9 (Ko et al, 1999; Kleanthous et al, 1999), and in proteins encoded by mobile group II introns, including I-SceN, I-SeeNI and l-Llal (Zimmerly et al, 1995a; Zimmerly et al, 1995b; Matsuura et al, 1997).
  • H-N-H proteins contain two pairs of conserved histidines surrounding a conserved asparagine within a 30-33 residue sequence (Shub et al, 1994; Gorbalenya, 1994).
  • Members of the H-N-H family encoded within group I introns include I-HmuI and I-HmuII from the SPO1 and SP82 introns, respectively, of two closely related Bacillus subtilis bacteriophages (Goodrich-Blair et al, 1990; Goodrich-Blair and Shub, 1994; Goodrich-Blair and Shub, 1996) and I-TevIII from the nrdB intron of RB3 bacteriophage (Eddy and Gold, 1991).
  • This motif is also contained within I-CmoeI from the psbA gene of the Chlamydomonas moewusii chloroplast (Drouin et al, 2000) and within a homologous, yet uncharacterized, ORF in the psbA gene of C. reinhardtii (Holloway et al, 1999).
  • ORFs containing HNH motifs, including inteins, have been reported but not studied (Dalgaard et al, 1997; Gorbalenya, 1998; Pietrokovski, 1998).
  • Methods of the invention involve using computer modeling to assist in the design of novel nucleic acid binding proteins.
  • Computer modeling allows the three-dimensional chemical structure of a particular protein molecule to be discerned, particularly in the context of other molecules, such as a nucleic acid to which the protein specifically binds or will be designed to specifically bind or all or part of another protein to which it will be joined.
  • the interface of the initial model of a new protein-protein complex or of a new protein-nucleic acid complex is analyzed to identify amino acid side chains in the protein(s) to be altered (in order to optimize the atomic contacts made throughout the interface). This is followed by an automated computational design protocol to search through possible interface sequence combinations corresponding to the amino acids to be altered.
  • a computer program is used, which strips the amino acid side chains at these positions, leaving only the backbone structure intact. Using a library of rotamers, the program then randomly places side chains into each stripped amino acid residue until the interface is repopulated with side chains. More specifically, a library of possible side chain conformations spanning 19 potential amino acids (all but cysteine) in different backbone-dependent rotameric states (on average about 500 rotamers per sequence position) is created.
  • An energy minimization procedure is then used to search through amino acid sequence combinations in the interface to identify particularly low free energy amino acid sequences.
  • in this search, a move consists of the random replacement of a single side chain rotamer with an alternative rotamer from the library. This iteration is performed multiple times, after which a pattern of amino acids at particular positions may emerge. Once a pattern is identified, those amino acids are fixed, and the remaining amino acids along the interface are stripped and randomized according to the iteration described above. Additional patterns may emerge, and the process is repeated.
  • Sets of solutions are further assessed by eliminating sequence changes likely to affect nearby active site residues or, in some cases, by reducing structural redundancy (for example, if phenylalanine and tyrosine were computationally selected at a position, the process was continued with only one residue based on whether a neighboring atom could form a hydrogen-bond).
  • the interface free energies of the best sequence combinations are then exhaustively enumerated using optimized rotamer conformations for each sequence.
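The rotamer search described above can be sketched as a Monte Carlo loop in which each move swaps one side-chain rotamer. This is a minimal sketch under stated assumptions: it uses a Metropolis acceptance rule (the text applies Metropolis Monte Carlo to hydrogen-bond network optimization, following Kuhlman and Baker, 2000), a toy energy function stands in for Equation (1), and the rotamer labels, `kT`, and step count are hypothetical. The `fixed` argument mimics freezing a converged pattern and re-searching the remaining positions.

```python
import math
import random
from typing import Callable, Dict, Optional, Sequence, Tuple

State = Dict[int, str]  # designed position -> rotamer label (hypothetical labels)


def monte_carlo_design(library: Dict[int, Sequence[str]],
                       energy: Callable[[State], float],
                       fixed: Optional[State] = None,
                       n_steps: int = 5000, kT: float = 0.6,
                       seed: int = 0) -> Tuple[State, float]:
    rng = random.Random(seed)
    fixed = dict(fixed or {})
    movable = [p for p in library if p not in fixed]
    # start from the fixed residues plus a random rotamer at every stripped position
    state = {**{p: rng.choice(library[p]) for p in movable}, **fixed}
    current = energy(state)
    best, best_e = dict(state), current
    if not movable:
        return best, best_e
    for _ in range(n_steps):
        pos = rng.choice(movable)
        previous = state[pos]
        state[pos] = rng.choice(library[pos])  # move: replace one side-chain rotamer
        trial = energy(state)
        # Metropolis rule: accept downhill moves; accept uphill moves with Boltzmann probability
        if trial <= current or rng.random() < math.exp(-(trial - current) / kT):
            current = trial
            if current < best_e:
                best, best_e = dict(state), current
        else:
            state[pos] = previous  # reject the move and restore the old rotamer
    return best, best_e


# Toy energy: per-rotamer self energies plus a favorable pair interaction between positions 1 and 2.
SELF = {"A": 0.0, "B": -1.0, "C": 0.5}


def toy_energy(s: State) -> float:
    e = sum(SELF[r] for r in s.values())
    return e - 2.0 if s[1] == "B" and s[2] == "B" else e


lib = {1: ["A", "B", "C"], 2: ["A", "B", "C"], 3: ["A", "B", "C"]}
best, e = monte_carlo_design(lib, toy_energy)
# second round: freeze the positions that converged and re-search the rest
best2, e2 = monte_carlo_design(lib, toy_energy, fixed={1: best[1], 2: best[2]})
print(best, e, best2, e2)
```

In a real design run the energy call would evaluate the full free energy function over the interface, and the final candidates would then be rescored exhaustively as described in the text.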
  • the computational method uses an atomic representation of the protein (including all heavy atoms as well as polar hydrogens) and a free energy function consisting of a linear combination of the attractive part of a Lennard-Jones potential ($E_{\mathrm{LJatr}}$), a linear distance-dependent repulsive term ($E_{\mathrm{LJrep}}$), an orientation-dependent side chain-backbone and side chain-side chain hydrogen bond potential ($E_{\mathrm{HB(sc-bb)}}$ and $E_{\mathrm{HB(sc-sc)}}$) (Tania Kortemme, University of Washington, Seattle, WA, USA, Alex Morozov, University of Washington, Seattle, WA, USA & David Baker, University of Washington, Seattle, WA, USA; personal communication), Coulomb electrostatics ($E_{\mathrm{Coul}}$) and an implicit solvation model ($G_{\mathrm{sol}}$) (Equation (1)).
  • $E^{\mathrm{ref}}_{aa}$ is an amino-acid-type-dependent reference energy that approximates the interactions made in the unfolded-state ensemble (Kuhlman and Baker, 2000) ($n_{aa}$ is the number of amino acids of a given type); the last two terms of Equation (1) were included to model changes in protein stability upon mutation but do not contribute to free energy changes of protein-protein interactions.
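The physical part of the free energy function is a weighted linear combination of the named terms. The sketch below shows only that structure: the term labels mirror the text, but the per-term energies and the unit weights are placeholders, not the published parameters. The two stability-related terms (including the reference energy) are left out because, as stated above, they do not contribute to protein-protein interaction free energies.

```python
from typing import Mapping

# Physical terms named in the text; weights passed in are placeholders, not the published fit.
PHYSICAL_TERMS = ("LJatr", "LJrep", "HB_sc_bb", "HB_sc_sc", "Coul", "sol")


def weighted_energy(components: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Linear combination sum_i w_i * E_i over the physical terms of the free energy function."""
    missing = [t for t in PHYSICAL_TERMS if t not in components or t not in weights]
    if missing:
        raise KeyError(f"missing terms: {missing}")
    return sum(weights[t] * components[t] for t in PHYSICAL_TERMS)


# Hypothetical per-term energies (kcal/mol-like units) and unit weights.
components = {"LJatr": -12.0, "LJrep": 3.0, "HB_sc_bb": -1.5, "HB_sc_sc": -2.0,
              "Coul": -0.5, "sol": 4.0}
weights = {t: 1.0 for t in PHYSICAL_TERMS}
print(weighted_energy(components, weights))
```

Keeping the weights separate from the term values makes it easy to refit them, as the text does when scaling the hydrogen-bond term per burial class.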
  • Atomic coordinates are taken from structures solved by X-ray crystallography.
  • Polar hydrogens were added to all structures, using CHARMM 19 standard bond lengths and angles.
  • for rotatable bonds in side chains containing polar hydrogens, several rotamers reflecting different hydrogen positions were created, including a 180 degree flip of the Asn and Gln amide groups and the two His imidazole tautomers (assumed to be uncharged).
  • Global optimization of the hydrogen bonding network was performed for each structure using a simple Metropolis Monte-Carlo procedure as described previously (Kuhlman and Baker, 2000) with the energy function given in Equation (1) and described below.
  • The simple free energy function is given in Equation (1).
  • the Lennard-Jones potential, solvation term, and backbone-dependent amino acid probabilities are as previously described (Lazaridis et al, 1999; Kuhlman and Baker, 2000).
  • Energies of side chain-backbone and side chain-side chain hydrogen bonds were determined using an empirical function (Tania Kortemme, University of Washington, Seattle, WA, USA; Alex Morozov, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA) taking into account a) the distance between the hydrogen (H) and the acceptor (A) atoms, b) the angle at the hydrogen atom (D-H···A; D: donor atom), and c) the angle at the acceptor atom (H···A-AB; AB: heavy atom bound to the acceptor atom).
  • the distance- and angle-dependent terms of the hydrogen bonding potential were derived from hydrogen bond geometries observed in high-resolution (2.0 Å or better) protein crystal structures. Only hydrogen bonds with proton positions given by the chemistry of the donor group were considered for the derivation of the energy parameters of the potential.
  • Coulomb electrostatics used CHARMM 19 partial charges (Neria et al, 1996) and a linear distance-dependent dielectric constant. Hydrogen bonding and Coulomb interactions were divided into three environment classes, depending on the extent of burial of both participating residues (class 1: exposed-exposed and exposed-intermediate; class 2: exposed-buried and intermediate-intermediate; class 3: intermediate-buried and buried-buried). The extent of burial was defined by the number of Cβ atoms within a sphere of 8 Å radius of the Cβ atom of the residue of interest (exposed: 0-8, intermediate: 9-14, buried: >14).
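The three geometric inputs to the hydrogen bond potential (H···A distance, D-H···A angle, H···A-AB angle) can be computed from atomic coordinates as sketched below. The empirical function that maps these values to an energy is not given in the text, so it is not reproduced; the helper names and example coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec = Sequence[float]


def _sub(a: Vec, b: Vec) -> Tuple[float, ...]:
    return tuple(x - y for x, y in zip(a, b))


def _norm(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))


def angle_deg(a: Vec, vertex: Vec, c: Vec) -> float:
    """Angle a-vertex-c in degrees."""
    u, w = _sub(a, vertex), _sub(c, vertex)
    cos_t = sum(x * y for x, y in zip(u, w)) / (_norm(u) * _norm(w))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def hbond_geometry(donor: Vec, hydrogen: Vec, acceptor: Vec, acceptor_base: Vec):
    """Return (H...A distance, D-H...A angle, H...A-AB angle)."""
    return (math.dist(hydrogen, acceptor),
            angle_deg(donor, hydrogen, acceptor),
            angle_deg(hydrogen, acceptor, acceptor_base))


# Hypothetical coordinates in Angstroms: a linear D-H...A with AB perpendicular at the acceptor.
d, theta, psi = hbond_geometry((0, 0, 0), (1, 0, 0), (3, 0, 0), (3, 1, 0))
print(d, theta, psi)
```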
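The burial classes and pairwise environment classes can be expressed as a neighbor count followed by a lookup. This sketch assumes the Cβ atom is the reference atom, excludes the residue's own Cβ from the count, and treats the 8 Å boundary as inclusive; these choices, the function names, and the coordinates are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def burial(index: int, cb_coords: Sequence[Point], radius: float = 8.0) -> str:
    """Burial from the count of other C-beta atoms within `radius` Angstroms of residue `index`."""
    center = cb_coords[index]
    n = sum(1 for j, xyz in enumerate(cb_coords)
            if j != index and math.dist(center, xyz) <= radius)  # self excluded (assumption)
    if n <= 8:
        return "exposed"
    if n <= 14:
        return "intermediate"
    return "buried"


# Environment classes for a residue pair, as listed in the text.
_ENV = {
    frozenset({"exposed"}): 1, frozenset({"exposed", "intermediate"}): 1,
    frozenset({"exposed", "buried"}): 2, frozenset({"intermediate"}): 2,
    frozenset({"intermediate", "buried"}): 3, frozenset({"buried"}): 3,
}


def environment_class(burial_a: str, burial_b: str) -> int:
    return _ENV[frozenset({burial_a, burial_b})]


# Hypothetical coordinates: residue 0 packed among 15 neighbors, residue 16 far away.
coords = [(0.0, 0.0, 0.0)] + [(1.0, 0.0, 0.0)] * 15 + [(50.0, 0.0, 0.0)]
print(burial(0, coords), burial(16, coords),
      environment_class(burial(0, coords), burial(16, coords)))
```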
  • Binding free energy changes upon alanine mutation ($\Delta\Delta G_{\mathrm{bind}}$) are calculated using Equations (1) and (2).
  • the hydrogen bonding term was scaled so that the maximum contribution of replacing one partner in a buried hydrogen bond by alanine was -4.5 kcal/mol.
  • the relative weights for the two other burial classes were then scaled to be proportional to the weights found in the monomeric set such that the most favorable energies of intermediate and exposed hydrogen bonds were -2.0 and -0.8 kcal/mol, respectively.
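Equation (2) is not reproduced in this text; the sketch below assumes the standard definitions, $\Delta G_{\mathrm{bind}} = G_{AB} - G_A - G_B$ and $\Delta\Delta G_{\mathrm{bind}} = \Delta G_{\mathrm{bind}}^{\mathrm{mut}} - \Delta G_{\mathrm{bind}}^{\mathrm{wt}}$, with each $G$ evaluated using the same energy function. Under these definitions, any term that depends only on amino acid composition (such as $\sum n_{aa} E^{\mathrm{ref}}_{aa}$) cancels, because the complex contains exactly the residues of its two partners. The energies shown are hypothetical.

```python
from typing import Tuple

Energies = Tuple[float, float, float]  # (G_complex, G_partner_A, G_partner_B)


def binding_free_energy(g: Energies) -> float:
    """Delta G_bind = G(AB) - G(A) - G(B)."""
    g_complex, g_a, g_b = g
    return g_complex - g_a - g_b


def ddg_bind(wild_type: Energies, mutant: Energies) -> float:
    """Delta Delta G_bind = Delta G_bind(mutant) - Delta G_bind(wild type); positive = weaker binding."""
    return binding_free_energy(mutant) - binding_free_energy(wild_type)


# Hypothetical energies (kcal/mol) for a wild-type interface and an alanine mutant.
wt = (-100.0, -40.0, -50.0)
ala = (-95.0, -40.0, -48.0)
print(ddg_bind(wt, ala))  # 3.0
```

A positive value flags the mutated residue as contributing to binding, which is how interface hot spots such as those discussed for E-DreI would be identified computationally.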
  • the amino acid or amino acid sequence may be removed and replaced with one or more amino acids that increase stability and enhance specific activity, thereby creating a modified protein.
  • the polypeptide may be altered using a variety of methods of DNA mutagenesis known to those of ordinary skill in the art. Individual amino acids may be altered, or one or more amino acids may be removed or replaced using conventional recombinant DNA technology, such as restriction enzymes or DNAses.
  • the sterically hindered or improperly bonded region may also be replaced with substitute amino acids. "Repacked or Replaced” means that an amino acid at a particular position has been substituted with a different amino acid residue or with a modified amino acid. This may be accomplished in a number of ways.
  • the sterically hindered or improperly bonded region may be first removed and then the replacement amino acid(s) incorporated into a polynucleotide encoding the modified polypeptide.
  • Recombinant DNA technology may be used to incorporate a particular coding region into a polynucleotide.
  • a region may be mutagenized using site-specific mutagenesis techniques that are well known to those of ordinary skill in the art.
  • amino acids that affect the activity of nearby active site residues may also be removed or replaced, either to facilitate the creation of a modified protein or to improve the protein in any way, such as increasing its stability and activity.
  • multiple amino acids may be replaced or removed from any region of the proteins involved in creating a modified protein; thus, exactly or at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
  • Enzymatic assays may be appropriate to evaluate the activity of an enzyme, for example.
  • One of skill in the art would be able to evaluate the activity of a modified protein relative to the native protein.
  • a modified protein may be attached (conjugated or fused) to another polypeptide, peptide, or protein.
  • One of skill in the art would also be able to evaluate any modified conjugated or fusion protein of the invention depending upon the activity or activities of the polypeptide components.
  • Modified endonucleases with enhanced stability and site specific activity can be of tremendous benefit in a variety of applications such as, but not limited to, genomic mapping, rapid identification and/or targeting of bacterial or viral pathogens, identification and/or targeting of single or multiple nucleotide polymorphisms (SNPs, MNPs), gene therapy and cancer therapy.
  • the LAGLIDADG homing endonucleases are one such example of endonucleases that can be modified to alter the site specificity of the nuclease.
  • the present invention concerns novel compositions comprising a proteinaceous molecule that has been modified relative to a native or wild-type protein. In other embodiments, amino acid residues of the proteinaceous compound have been replaced, while in further embodiments both deletions and replacements of amino acid residues in the proteinaceous compound have been made.
  • a proteinaceous compound may include an amino acid molecule comprising more than one polypeptide entity.
  • a "proteinaceous molecule,” “proteinaceous composition,” “proteinaceous compound,” “proteinaceous chain” or “proteinaceous material” generally refers, but is not limited to, a protein of greater than about 100 amino acids or the full length endogenous sequence translated from a gene; a polypeptide of greater than about 50 amino acids; and/or a peptide of from about 3 to about 50 amino acids. All the “proteinaceous” terms described above may be used interchangeably herein. Furthermore, these terms may be applied to fusion proteins or protein conjugates as well.
  • the size of the at least one proteinaceous molecule may comprise, but is not limited to, about or at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, or greater amino molecule residues, and any range derivable therein.
  • proteinaceous composition encompasses amino molecule sequences comprising at least one of the 20 common amino acids in naturally synthesized proteins, or at least one modified or unusual amino acid, including but not limited to those shown on Table 1 below.
  • an "amino molecule” refers to any amino acid, amino acid derivative or amino acid mimic as would be known to one of ordinary skill in the art.
  • the residues of the proteinaceous molecule are sequential, without any non-amino molecule interrupting the sequence of amino molecule residues.
  • the sequence may comprise one or more non-amino molecule moieties.
  • the sequence of residues of the proteinaceous molecule may be interrupted by one or more non-amino molecule moieties.
  • Proteinaceous compositions may be made by any technique known to those of skill in the art, including the expression of proteins, polypeptides or peptides through standard molecular biological techniques, the isolation of proteinaceous compounds from natural sources, or the chemical synthesis of proteinaceous materials.
  • the nucleotide and protein, polypeptide and peptide sequences for various genes have been previously disclosed, and may be found at computerized databases known to those of ordinary skill in the art.
  • One such database is the National Center for Biotechnology Information's Genbank and GenPept databases (www.ncbi.nlm.nih.gov).
  • the coding regions for these known genes may be amplified and/or expressed using the techniques disclosed herein or as would be known to those of ordinary skill in the art.
  • a proteinaceous compound may be purified.
  • "purified" will refer to a specific protein, polypeptide, or peptide composition that has been subjected to fractionation to remove various other proteins, polypeptides, or peptides, and which composition substantially retains its activity, as may be assessed, for example, by the protein assays, as would be known to one of ordinary skill in the art for the specific or desired protein, polypeptide or peptide.
  • the modified endonuclease has the ability to recognize and cleave within novel DNA target sequences.
  • the modified endonuclease of the present invention is associated with other activities, such as the ability to cleave both strands of the DNA target; to cleave only the plus strand of the DNA target; to cleave only the minus strand of the DNA target; or to bind but not cleave either strand of the DNA target.
  • the activity of the present invention refers to independent domains from separate, naturally occurring homing endonucleases that when recombined and fused, create modified hybrid proteins with unique and/or enhanced DNA target specificity.
  • the activity ofthe present invention refers to the ability of a single modified homing endonuclease to recognize and cleave an altered nucleic acid target.
  • These homing endonucleases and their activity are amenable to significant structural alterations.
  • where the present application refers to the function or activity of a modified endonuclease, one of ordinary skill in the art would understand that this includes, for example, an endonuclease that possesses an additional advantage over the unmodified endonuclease.
  • Determination of the activity of an endonuclease may be achieved using assays familiar to those of skill in the art, and may include for comparison purposes, the use of native and/or recombinant versions of either the modified or unmodified endonuclease.
  • the present invention may employ amino acid sequence variants such as substitutional, insertional or deletion variants.
  • the modified endonuclease of the present invention may possess substitutions of amino acids that alleviate steric hindrance, increase stability and enhance site specific activity.
  • these modified proteins may further include insertions or added amino acids, such as with fusion proteins or proteins with linkers, for example.
  • Substitutional or replacement variants typically contain the exchange of one amino acid for another at one or more sites within the protein and may be designed to modulate one or more properties of the polypeptide, particularly to enhance its stability and site specific activity.
  • a modified protein may possess an insertion of residues, which typically involves the addition of at least one residue in the polypeptide. This may include the insertion of a linking peptide or polypeptide or simply a single residue. Terminal additions, called fusion proteins, are discussed below.
  • "functionally equivalent codon" refers to codons that encode the same amino acid, such as the six codons for arginine or serine, and also refers to codons that encode biologically equivalent amino acids (Table 2).
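Synonymous codons, such as the six for arginine or serine, can be listed from the standard genetic code. Table 2 is not reproduced here, so this sketch builds the codon table from NCBI translation table 1 (the standard code); it covers only codons encoding the same amino acid, not the broader notion of biologically equivalent amino acids.

```python
from itertools import product
from typing import List

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order; "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_codons(amino_acid: str) -> List[str]:
    """All codons that encode the given one-letter amino acid, sorted alphabetically."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(synonymous_codons("R"))  # ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT']
print(synonymous_codons("S"))  # ['AGC', 'AGT', 'TCA', 'TCC', 'TCG', 'TCT']
```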
  • amino acid and nucleic acid sequences may include additional residues, such as additional N- or C-terminal amino acids or 5' or 3' sequences, and yet still be essentially as set forth in one of the sequences disclosed herein, so long as the sequence meets the criteria set forth above, including the maintenance of biological protein activity where protein expression is concerned.
  • the addition of terminal sequences particularly applies to nucleic acid sequences that may, for example, include various non-coding sequences flanking either of the 5' or 3' portions of the coding region or may include various internal sequences, i.e., introns, which are known to occur within genes.
  • amino acids of a protein may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as, for example, binding sites to substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid substitutions can be made in a protein sequence, and in its underlying DNA coding sequence, and nevertheless produce a protein with like properties. It is thus contemplated by the inventors that various changes may be made in the DNA sequences of genes without appreciable loss of their biological utility or activity, as discussed below. Table 2 shows the codons that encode particular amino acids.
  • the hydropathic index of amino acids may be considered.
  • the importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art (Kyte & Doolittle, 1982). It is accepted that the relative hydropathic character of the amino acid contributes to the secondary structure of the resultant protein, which in turn defines the interaction of the protein with other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, and the like.
  • As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5 ± 1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4).
  • In making such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
  • amino acid substitutions generally are based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like.
  • Exemplary substitutions that take into consideration the various foregoing characteristics are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.
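The hydrophilicity tiers and the exemplary substitution groups above can be applied mechanically to a proposed substitution. This is a minimal sketch, not a method from the source: it uses only the central values for aspartate, glutamate and proline (the source lists them as ± 1), and the function names are hypothetical.

```python
# Hydrophilicity values listed in the text (U.S. Patent 4,554,101). The source gives
# aspartate, glutamate and proline as "± 1"; only the central values are used here.
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}

# Exemplary conservative substitution groups named in the text.
CONSERVATIVE_GROUPS = [
    {"R", "K"}, {"E", "D"}, {"S", "T"}, {"Q", "N"}, {"V", "L", "I"},
]


def hydrophilicity_tier(wild_type: str, substitute: str) -> str:
    """Place a substitution in the preference tiers (±0.5, ±1, ±2) stated in the text."""
    diff = abs(HYDROPHILICITY[wild_type] - HYDROPHILICITY[substitute])
    for limit in (0.5, 1.0, 2.0):
        if diff <= limit + 1e-9:  # small tolerance for floating-point rounding
            return f"within ±{limit:g}"
    return "outside ±2"


def is_exemplary_conservative(wild_type: str, substitute: str) -> bool:
    """True if both residues fall in the same exemplary substitution group."""
    return wild_type != substitute and any(
        wild_type in group and substitute in group for group in CONSERVATIVE_GROUPS
    )


if __name__ == "__main__":
    for wt, mut in [("R", "K"), ("S", "T"), ("Y", "W"), ("R", "E"), ("K", "W")]:
        label = "listed group" if is_exemplary_conservative(wt, mut) else "not in a listed group"
        print(wt, "->", mut, hydrophilicity_tier(wt, mut), label)
```

The two criteria are not equivalent: arginine to glutamate falls within ±0.5 on hydrophilicity alone, yet it reverses charge and is not among the listed substitution pairs, which is why side-chain charge and size are considered alongside hydrophilicity.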
  • Another embodiment for the preparation of modified polypeptides according to the invention is the use of peptide mimetics. Mimetics are peptide-containing molecules that mimic elements of protein secondary structure (see Johnson, 1993).
  • The underlying rationale behind the use of peptide mimetics is that the peptide backbone of proteins exists chiefly to orient amino acid side chains in such a way as to facilitate molecular interactions, such as those of antibody and antigen. A peptide mimetic is expected to permit molecular interactions similar to the natural molecule. These principles may be used, in conjunction with the principles outlined above, to engineer second generation modified protein molecules having many of the natural properties of a native protein, but with altered and, in some cases, even improved characteristics. As is known to one of ordinary skill in the art, mutations by amino acid sequence variants may be used to generate synthetic peptides.
  • a specialized kind of insertional variant is the fusion protein. It is contemplated that some of the chimeric polypeptides of the invention may be considered fusion proteins. This molecule generally has all or a substantial portion of the native molecule, linked at the N- or C- terminus, to all or a portion of a second polypeptide, such as a second nucleic acid binding domain.
  • Another example is a fusion that contains leader sequences from other species to permit the recombinant expression of a protein in a heterologous host.
  • Another useful fusion includes the addition of an immunologically active domain, such as an antibody epitope or other tag, to facilitate targeting or purification of the fusion protein.
  • Examples of such tags include 6xHis and GST (glutathione S-transferase).
  • Other useful fusions include linking of functional domains, such as active sites from enzymes such as a hydrolase, glycosylation domains, cellular targeting signals or transmembrane regions.
  • cross-linking reagents are used to form molecular bridges that tie together functional groups of two different molecules, e.g., a stabilizing and coagulating agent.
  • hetero-bifunctional cross-linkers can be used to eliminate unwanted homopolymer formation. It is contemplated that cross-linkers may be used with the modified protein molecules of the invention.
  • Bifunctional cross-linking reagents have been extensively used for a variety of purposes including preparation of affinity matrices, modification and stabilization of diverse structures, identification of binding sites, and structural studies.
  • cross-linkers may be used to stabilize the polypeptide or to render it more useful as a gene-specific reagent, for example, by improving the modified protein's targeting capability or overall efficacy.
  • Cross-linkers may also be cleavable, such as disulfides, acid-sensitive linkers, and others.
  • Homobifunctional reagents that carry two identical functional groups may be used to induce efficient cross-linking between identical and different macromolecules or subunits of a macromolecule, and linking of polypeptides to specific binding sites on binding partners.
  • Heterobifunctional reagents contain two different functional groups. By taking advantage of the differential reactivities of the two different functional groups, cross-linking can be controlled both selectively and sequentially.
  • the bifunctional cross-linking reagents can be divided according to the specificity of their functional groups, e.g., amino, sulfhydryl, guanidino, indole, carboxyl specific groups. Of these, reagents directed to free amino groups have become especially popular because of their commercial availability, ease of synthesis and the mild reaction conditions under which they can be applied.
  • a majority of heterobifunctional cross-linking reagents contain a primary amine-reactive group and a thiol-reactive group.
  • Methods of cross-linking ligands to liposomes are described in U.S. Patent 5,603,872 and U.S. Patent 5,401,511 (each specifically incorporated herein by reference in its entirety).
  • Various ligands can be covalently bound to liposomal surfaces through the cross-linking of amine residues. In another example, heterobifunctional cross-linking reagents and methods of using the cross-linking reagents are described (U.S. Patent 5,889,155, specifically incorporated herein by reference in its entirety).
  • the cross-linking reagents combine a nucleophilic hydrazide residue with an electrophilic maleimide residue, allowing, in one example, the coupling of aldehydes to free thiols.
  • the cross-linking reagent can be modified to cross-link various functional groups and is thus useful for cross-linking polypeptides and sugars. Table 3 details certain hetero-bifunctional cross-linkers considered useful in the present invention.
  • the present invention concerns independent domains from naturally occurring nucleic acids, as well as whole proteins encoded by naturally occurring nucleic acids, isolatable from cells, that are free from total genomic DNA and that are capable of expressing all or part of a protein or polypeptide.
  • the polynucleotide may encode a native protein that may be manipulated to encode a modified protein.
  • the polynucleotide may encode a modified protein, or it may encode a polynucleotide that will be used to make a fusion protein with a modified protein. It is contemplated that a single polynucleotide molecule may encode 1, 2, or more different polypeptides (all or part).
  • Nucleic acids of the present invention may be used in expression systems to produce recombinant proteins that can be purified from expressing cells to yield active proteins.
  • nucleic acid refers to a DNA molecule that has been isolated free of total genomic DNA of a particular species. Therefore, a nucleic acid encoding a polypeptide refers to a DNA that contains wild-type, polymorphic, or modified polypeptide-coding sequences yet is isolated away from, or purified free from, total genomic DNA. Included within the term “nucleic acid” are nucleic acids encoding nucleic acid binding proteins, portions of such proteins, and recombinant vectors, including, for example, plasmids, cosmids, phage, viruses, and the like.
  • polynucleotide refers to a nucleic acid molecule that has been isolated free of total genomic nucleic acid. Therefore, a “polynucleotide encoding a native polypeptide” refers to a DNA segment that contains wild-type polypeptide-coding sequences isolated away from, or purified free from, total genomic DNA.
  • cDNA is intended to refer to DNA prepared using messenger RNA (mRNA) as template. It also is contemplated that a particular polypeptide from a given species may be represented by natural variants that have slightly different nucleic acid sequences but, nonetheless, encode the same protein (due to wobble in codons).
  • a polynucleotide comprising an isolated or purified wild-type, polymorphic, or modified polypeptide gene refers to a DNA segment including wild-type, polymorphic, or modified polypeptide coding sequences and, in certain aspects, regulatory sequences, isolated substantially away from other naturally occurring genes or protein encoding sequences.
  • the term "gene” is used for simplicity to refer to a functional protein, polypeptide, or peptide-encoding unit. As will be understood by those in the art, this functional term includes genomic sequences, cDNA sequences, and smaller engineered gene or cDNA segments that express, or may be adapted to express, proteins, polypeptides, domains, peptides, fusion proteins, and modified polypeptides.
  • a nucleic acid encoding all or part of a native or modified polypeptide may contain a contiguous nucleic acid sequence encoding all or a portion of such a polypeptide of the following lengths: about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, or more nucleotides, nucleosides, or base pairs.
  • the invention concerns isolated DNA segments and recombinant vectors incorporating DNA sequences that encode a wild-type, polymorphic, or modified polypeptide or peptide that includes within its amino acid sequence a contiguous amino acid sequence in accordance with, or essentially corresponding to a native polypeptide.
  • the term "recombinant" may be used in conjunction with a polypeptide or the name of a specific polypeptide, and this generally refers to a polypeptide produced from a nucleic acid molecule that has been manipulated in vitro or that is the replicated product of such a molecule.
  • the invention concerns isolated DNA segments and recombinant vectors incorporating DNA sequences that encode a polypeptide or peptide that includes within its amino acid sequence a contiguous amino acid sequence in accordance with, or essentially corresponding to the polypeptide.
  • nucleic acid segments used in the present invention may be combined with other nucleic acid sequences, such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably. It is therefore contemplated that a nucleic acid fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol.
  • nucleic acid constructs of the present invention may encode full-length polypeptide from any source or encode a truncated version of one or more polypeptides, such as a nucleic acid binding domain without other domains.
  • a nucleic acid sequence may encode a full-length polypeptide sequence with additional heterologous coding sequences, for example to allow for purification of the polypeptide, transport, secretion, post-translational modification, or for therapeutic benefits such as targeting or efficacy.
  • a tag or other heterologous amino acid segments may be added to the modified polypeptide-encoding sequence, wherein "heterologous" refers to an amino acid segment from another polypeptide.
  • one or more nucleic acid constructs may be prepared that include a contiguous stretch of nucleotides identical to or complementary to a particular gene, such as to a homing endonuclease.
  • a nucleic acid construct may be at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 30,000, 50,000, 100,000, 250,000, 500,000, 750,000, to at least 1,000,000 nucleotides in length, as well as constructs of greater size, up to and including chromosomal sizes (including all intermediate lengths and intermediate ranges), given that nucleic acid constructs such as yeast artificial chromosomes are known to those of ordinary skill in the art.
  • DNA segments used in the present invention encompass biologically functional equivalent modified polypeptides and peptides, for example, a modified endonuclease.
  • Such sequences may arise as a consequence of codon redundancy and functional equivalency that are known to occur naturally within nucleic acid sequences and the proteins thus encoded.
  • functionally equivalent proteins or peptides may be created via the application of recombinant DNA technology, in which changes in the protein structure may be engineered, based on considerations of the properties of the amino acids being exchanged. Changes designed by a human may be introduced through the application of site-directed mutagenesis techniques.
  • a vector in the context of the present invention refers to a carrier nucleic acid molecule into which a sequence encoding a native, modified or unmodified protein can be inserted for introduction into a cell and thereby replicated.
  • a nucleic acid sequence can be exogenous, meaning either that it is foreign to the cell into which the vector is being introduced, or that it is homologous to a sequence in the cell but positioned within the host cell nucleic acid where the sequence is ordinarily not found.
  • Vectors include but are not limited to plasmids; cosmids; viruses; artificial chromosomes such as YACs (yeast artificial chromosomes) and BACs (bacterial artificial chromosomes); and synthetic constructs such as linear/circular expression elements (LEEs/CEEs).
  • Viral vectors may be derived from viruses known to those of skill in the art, for example, bacteriophage and animal and plant viruses, including but not limited to, adenovirus, vaccinia virus (Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al, 1988), adeno-associated virus (AAV) (Ridgeway, 1988; Baichwal and Sugden, 1986; Hermonat and Muzyczka, 1984), retrovirus and herpesvirus, which offer several features for use in gene transfer into various mammalian cells (Friedmann, 1989; Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al, 1988; Horwich et al, 1990).
  • cells containing the modified polypeptides of the present invention may be identified in vitro or in vivo by including a marker in the expression vector.
  • markers would confer an identifiable change to the cell permitting easy identification of cells containing the expression vector.
  • a selectable marker is one that confers a property that allows for selection.
  • a positive selectable marker is one in which the presence of the marker allows for its selection, while a negative selectable marker is one in which its presence prevents its selection.
  • An example of a positive selectable marker is a drug resistance marker.
  • a drug selection marker aids in the cloning and identification of transformants
  • genes that confer resistance to neomycin, puromycin, hygromycin, DHFR, GPT, zeocin and histidinol are useful selectable markers.
  • In addition to markers conferring a phenotype that allows for the discrimination of transformants based on the implementation of conditions, other types of markers, including screenable markers such as GFP whose basis is colorimetric analysis, are also contemplated.
  • screenable enzymes such as herpes simplex virus thymidine kinase (tk) or chloramphenicol acetyltransferase (CAT) may be utilized.
  • the present invention contemplates the use of modified nucleic acid binding polypeptides as delivery vehicles that may be used to deliver a molecule to a specific nucleic acid site or as an inhibitor at a particular site.
  • the present invention further contemplates a number of means by which a molecule may be delivered to a cell, tissue or subject.
  • Virtually any method by which nucleic acids can be introduced into a cell, or an organism may be employed with the current invention, as described herein or as would be known to one of ordinary skill in the art.
  • Such methods include, but are not limited to direct delivery of DNA by: injection (U.S. Patents 5,994,624, 5,981,274, 5,945,100, 5,780,448, 5,736,524, 5,702,932, 5,656,610, 5,589,466 and 5,580,859, each incorporated herein by reference); microinjection (Harlan and Weintraub, 1985; U.S. Patent No.
  • a modified nucleic acid binding polypeptide can bind to the altered target site and inhibit or prevent transcription or translation of the downstream sequence.
  • FIG. 1. Structural modeling indicated that it should be possible to create a novel chimeric endonuclease by fusing the N-terminal domain of I-DmoI to an I-CreI monomer, repacking the new protein interface to facilitate efficient folding and intimate domain association, and then inserting a short peptide linker to create an enzyme monomer.
  • the domain interface of the initial protein model was analyzed to identify 14 interface residues to be redesigned, followed by an automated computational design protocol to search through 8 × 10¹⁷ possible interface sequences. All amino acids but cysteine were allowed at each position.
  • E-DreI interface variants as determined by computational redesign (16 total constructs, each containing between 8 and 12 altered residues in the interface) were then generated and screened in vivo to ensure proper folding and solubility. Biochemical characterization of several soluble E-DreI variant proteins revealed that each was able to bind and cleave a specific, 23 bp chimeric DNA target site with high specificity. The co-crystal structure of one of these active variants bound to target site DNA was determined in order to assess the accuracy of prediction of the computational redesign method and characterize the artificial endonuclease.
  • the computational interface redesign focused on the six residues exhibiting steric clashes in the original model, and was extended to include eight additional residues predicted to contribute substantially to the interface free energy (A12, Y13, L17, I19, I52, E105, Y109 and F113).
  • At each of these fourteen sites, a library of possible side chain conformations spanning 19 potential amino acids (all but cysteine) in different backbone-dependent rotameric states (on average about 500 rotamers per sequence position) was created.
  • A Monte Carlo simulated annealing procedure, in which a move consists of the random replacement of a single rotamer with an alternative rotamer from the library, was then used to search through the 8 × 10¹⁷ sequence combinations (with 6 × 10³⁷ total rotamer combinations) to identify particularly low free energy amino acid sequences. Since the Monte Carlo protocol does not guarantee finding a global free energy minimum, 1000 separate sequence design runs were performed using two polypeptide backbone models with slightly different relative orientations of the LAGLIDADG helices in the domain interface. This procedure yielded a family of sequences with different amino acid choices at each of the 14 design positions. Native residues were consistently best at three of the 14 positions, and a single new residue was consistently best at a fourth position.
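The size of the design problem follows directly from the numbers given: 19 allowed amino acids at each of 14 positions, and roughly 500 rotamers per position. The short calculation below assumes a uniform 500 rotamers everywhere, which is only an approximation of the stated average.

```python
N_POSITIONS = 14              # interface residues subjected to redesign
N_AMINO_ACIDS = 19            # all standard amino acids except cysteine
ROTAMERS_PER_POSITION = 500   # the stated average, treated as uniform (approximation)

sequence_space = N_AMINO_ACIDS ** N_POSITIONS
rotamer_space = ROTAMERS_PER_POSITION ** N_POSITIONS

print(f"sequences: {sequence_space:.1e}")  # sequences: 8.0e+17
print(f"rotamers:  {rotamer_space:.1e}")   # rotamers:  6.1e+37
```

A space of this size cannot be enumerated exhaustively, which is why a stochastic search is used instead.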
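The search strategy described here (single-rotamer moves, Metropolis acceptance, and many independent runs because one run may miss the global minimum) can be sketched on a toy problem. This is not the inventors' design code: the energy function is a hypothetical random one-body plus pairwise model, every position has the same number of rotamers, and the geometric cooling schedule is an assumption.

```python
import math
import random


def make_toy_energy(n_positions, n_rotamers, seed=0):
    """Hypothetical energy: random one-body and pairwise terms standing in for a
    real rotamer-based free energy model."""
    rng = random.Random(seed)
    one_body = [[rng.gauss(0.0, 1.0) for _ in range(n_rotamers)]
                for _ in range(n_positions)]
    pairs = {
        (i, j): [[rng.gauss(0.0, 0.5) for _ in range(n_rotamers)]
                 for _ in range(n_rotamers)]
        for i in range(n_positions) for j in range(i + 1, n_positions)
    }

    def energy(state):
        e = sum(one_body[i][r] for i, r in enumerate(state))
        e += sum(table[state[i]][state[j]] for (i, j), table in pairs.items())
        return e

    return energy


def simulated_annealing(n_positions, n_rotamers, energy, n_steps=5000,
                        t_start=5.0, t_end=0.05, rng=None):
    """Metropolis Monte Carlo with geometric cooling. Each move replaces the
    rotamer at one random position with a different rotamer."""
    rng = rng or random.Random()
    state = [rng.randrange(n_rotamers) for _ in range(n_positions)]
    e = energy(state)
    best_state, best_e = list(state), e
    for step in range(n_steps):
        t = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        pos = rng.randrange(n_positions)
        old = state[pos]
        new = rng.randrange(n_rotamers - 1)
        if new >= old:          # skip the current rotamer so every move changes the state
            new += 1
        state[pos] = new
        e_new = energy(state)
        delta = e_new - e
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            e = e_new           # accept the move
            if e < best_e:
                best_state, best_e = list(state), e
        else:
            state[pos] = old    # reject the move and restore the previous rotamer
    return best_state, best_e


if __name__ == "__main__":
    energy = make_toy_energy(n_positions=6, n_rotamers=8, seed=1)
    # Several independent runs from different random starts; keep the lowest energy.
    runs = [simulated_annealing(6, 8, energy, rng=random.Random(s)) for s in range(10)]
    best_state, best_e = min(runs, key=lambda run: run[1])
    print(best_state, round(best_e, 3))
```

Repeating the anneal from different random starts, as the text does with 1000 runs, trades compute for a better chance that at least one run reaches a near-optimal sequence, and the spread of results also shows which positions converge on the same residue.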
  • a short peptide linker was inserted between the I-DmoI and I-CreI domains to generate a monomeric protein.
  • the linker chosen for this purpose, -NGN-, resembled the -NMR- linker found in native I-DmoI, but contained glycine and asparagine residues in order to exploit the high β-turn propensity of NG- and GN-containing peptides (Hutchinson and Thornton, 1994).
  • the 16 enzyme variants described above were generated by site-directed mutagenesis and screened for in vivo folding and solubility.
  • the solubility screen utilizes a blue/white colony color difference that reflects protein solubility-dependent LacZα complementation in E. coli.
  • a lacZα peptide was fused to each E-DreI variant, and each fusion was then expressed in E. coli cells expressing a complementing LacZ protein partner.
  • constructs encoding E-DreI were subcloned into the pI-CreI vector, transformed into BL21(DE3) E. coli cells, and induced with 0.5 mM IPTG overnight at 15°C. Cells were harvested by centrifugation and lysed by sonication in 50 mM Tris pH 8.0, 100 mM NaCl and 1 mM CaCl2.
  • insoluble E-DreI/lacZα constructs form inclusion bodies, fail to complement the LacZ partner, and give rise to white colonies.
  • soluble E-DreI/lacZα constructs complement the LacZ partner to give rise to blue colonies on X-gal indicator plates (FIG. 1).
  • E-DreI is a novel endonuclease with altered specificity
  • In order to determine the binding and catalytic activities of E-DreI on different DNA target sites, three of the most highly soluble E-DreI variants as identified by the in vivo protein solubility assay were over-expressed and purified as described above. All three proteins were soluble and easily purified by heparin affinity and size exclusion chromatography. All three proteins were stable at 4°C at a concentration of ~5 mg/ml in buffer containing 5% glycerol, 150 mM NaCl, 1 mM CaCl2, and 50 mM Tris pH 8.0.
  • Because E-DreI is a two-domain chimeric monomer composed of I-DmoI and I-CreI domains, it was reasoned that the most likely E-DreI target site would be a chimera of the I-DmoI and I-CreI target sites (FIG. 2A).
  • the two native homing sites (termed 'dmo' and 'cre', respectively) can be considered as four distinct half-sites, with the center of each target site defined by the middle of the four-base overhang generated upon cleavage.
  • the native dmo site is asymmetric, and its two half-sites are referred to here as D1 and D2.
  • the cre site is pseudo-palindromic, and its two half-sites are referred to as C1 and C1'.
  • Four chimeric sites can be generated from these four half-sites (FIG. 2A): these sites were termed dre1 (D1:C1), dre2 (D1:C1'), dre3 (D2:C1) and dre4 (D2:C1').
  • the affinity of protein binding to DNA target sites was measured by gel shift analyses (also referred to as electrophoretic migration retardation analyses) using labeled oligonucleotide substrate in the presence of 20 mM Tris pH 9.0, 10 mM calcium chloride, 1 mM DTT, and 50 μg/ml BSA. Gels were imaged on a Storm Phosphorimager 840 (Molecular Dynamics, Sunnyvale, CA).
  • Cleavage activity was also evaluated.
  • the digestion of labeled oligonucleotide substrates was performed in I-CreI buffer (20 mM Tris pH 9.0, 10 mM MgCl2, 1 mM DTT, 50 μg/ml BSA) at 65°C (I-CreI digests), or in New England Biolabs Buffer 4 containing 50 μg/ml BSA at 37°C (I-DmoI digests) or 65°C (E-DreI digests).
  • 5' end-labeled primers and template DNA were used to generate both the sequencing ladders and the dsDNA substrates for E-DreI cleavage.
  • Denatured E-DreI-digested substrates (labeled as 'X' in FIG. 2C) were run alongside their corresponding sequencing reactions to map cleavage positions. Gels were imaged on a Storm Phosphorimager 840 (Molecular Dynamics, Sunnyvale, CA). Each of the three purified E-DreI variants cleaved target sites dre3 and dre4, but was unable to cleave the dre1 or dre2 target sites or the native dmo or cre target sites (FIG. 2B). Conversely, purified I-DmoI or I-CreI did not cleave any of the four dre target sites.
  • the dre3 and dre4 sites each contain the same dmo half-site (D2) and one of the two cre half-sites. Thus the N-terminal domain of I-DmoI recognizes only the D2 dmo half-site, which was not known at the outset of this project, while the C-terminal domain from I-CreI recognizes either cre half-site, as expected for a domain from an endonuclease homodimer (FIG. 2B).
  • the biochemical behavior of E-DreI on these different DNA target sites indicates that E-DreI is a novel, highly sequence-specific endonuclease that displays altered DNA target site specificity.
  • Since all three E-DreI constructs behaved identically in binding and cleavage assays, a single variant, DreI6, was chosen for more thorough characterization. This E-DreI variant contains eight computationally designed point mutations at the domain interface (I19W, H51F, L55R, E105R, L108A, F113I, K193N, L194F). DreI6 cleaves its target site precisely at one phosphodiester bond on each DNA strand; the two cut sites are separated by four base pairs, generating four-base, 3'-extended cohesive ends (FIG. 2C). This end geometry is identical to that of all other characterized LAGLIDADG homing endonucleases.
  • DreI6 displays a dissociation constant (Kd) of 100 ± 5 nM as determined by gel shift assays, ~two orders of magnitude higher than the 1 nM dissociation constant of native I-CreI (Wang et al, 1997), i.e., roughly 100-fold weaker binding. Like I-CreI, E-DreI forms a tight complex with cleavage products in which product dissociation is rate-limiting. This prevents the direct determination of steady-state kcat and KM values.
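The inference drawn from the cleavage pattern can be written out explicitly. This sketch encodes the four chimeric sites by half-site label only (no sequences) and assumes, as the text reasons, that each domain reads its own half-site independently and that cleavage requires both halves to be recognized; the function name is hypothetical.

```python
from itertools import product

DMO_HALVES = ["D1", "D2"]    # half-sites of the asymmetric dmo site
CRE_HALVES = ["C1", "C1'"]   # half-sites of the pseudo-palindromic cre site

# dre1..dre4 in the order given in the text: D1:C1, D1:C1', D2:C1, D2:C1'.
CHIMERIC_SITES = {
    f"dre{i}": pair for i, pair in enumerate(product(DMO_HALVES, CRE_HALVES), start=1)
}

# Sites cleaved by the purified E-DreI variants (FIG. 2B).
CLEAVED = {"dre3", "dre4"}


def infer_recognized_half_sites(sites, cleaved):
    """Collect the half-sites present in cleaved sites, then check that the
    independent half-site model predicts exactly the observed cleavage pattern."""
    dmo_ok = {sites[name][0] for name in cleaved}
    cre_ok = {sites[name][1] for name in cleaved}
    predicted = {name for name, (d, c) in sites.items() if d in dmo_ok and c in cre_ok}
    if predicted != set(cleaved):
        raise ValueError("independent half-site model does not explain the data")
    return dmo_ok, cre_ok


if __name__ == "__main__":
    dmo_ok, cre_ok = infer_recognized_half_sites(CHIMERIC_SITES, CLEAVED)
    print(sorted(dmo_ok), sorted(cre_ok))  # ['D2'] ['C1', "C1'"]
```

The consistency check matters: dre1 and dre2 are correctly predicted to be uncleaved because D1 never appears in a cleaved site, so the data are explained without invoking any interaction between the two half-sites.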
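The staggered-cut geometry can be made concrete with a small model: if the site center is the middle of the four-base overhang, the top-strand cut lies two bases downstream of the center and the bottom-strand cut two bases upstream, which leaves a 4-nt 3' extension on each product. The sequence used below is a hypothetical 12-bp duplex, not an actual dre or homing site, and the function name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def cleave_staggered(top: str, center: int, overhang: int = 4):
    """Cut a duplex so that each product carries a 3' overhang of `overhang` nt.

    `top` is the top strand 5'->3'; the bottom strand is its base-by-base
    complement written 3'->5' underneath. `center` indexes the site center (the
    middle of the overhang). Cutting the top strand downstream of the bottom-strand
    cut leaves 3' (not 5') extensions on both products."""
    if overhang % 2:
        raise ValueError("overhang length must be even so it can be centered")
    top_cut = center + overhang // 2     # top-strand cut, in top-strand coordinates
    bottom_cut = center - overhang // 2  # bottom-strand cut, same coordinates
    bottom = top.translate(COMPLEMENT)
    left_product = (top[:top_cut], bottom[:bottom_cut])
    right_product = (top[top_cut:], bottom[bottom_cut:])
    overhang_seq = top[bottom_cut:top_cut]  # single-stranded region on each product
    return left_product, right_product, overhang_seq


if __name__ == "__main__":
    # Hypothetical 12-bp duplex for illustration only.
    left, right, tail = cleave_staggered("AACCGTACGGTT", center=6)
    print(left, right, tail)
```

On the left product the top strand (whose right-hand end is its 3' end) runs four bases past the bottom strand; on the right product the bottom strand (whose left-hand end is its 3' end) runs four bases past the top strand, giving the 3'-extended cohesive ends described.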
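To make the affinity difference concrete, the fraction of DNA bound can be computed from each dissociation constant. This uses the textbook single-site equilibrium model and assumes protein is in large excess over labeled DNA (free protein approximately equal to total protein); it describes binding only, not the product-release-limited turnover noted above.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Equilibrium fraction of DNA bound for single-site binding, assuming the
    protein is in large excess over labeled DNA (free ≈ total protein)."""
    return protein_nM / (protein_nM + kd_nM)


KD_DREI6_NM = 100.0  # from the gel shift assays described in the text
KD_ICREI_NM = 1.0    # native I-CreI (Wang et al, 1997)

if __name__ == "__main__":
    print(f"fold weaker binding: {KD_DREI6_NM / KD_ICREI_NM:.0f}x")  # fold weaker binding: 100x
    for protein in (1.0, 10.0, 100.0, 1000.0):
        print(protein, round(fraction_bound(protein, KD_DREI6_NM), 3),
              round(fraction_bound(protein, KD_ICREI_NM), 3))
```

Under this model half of the DNA is bound when the protein concentration equals Kd, so DreI6 needs roughly 100-fold more protein than I-CreI to reach the same occupancy.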
  • The crystal structure of E-DreI bound to its DNA target site was determined in order to visualize the redesigned protein interface, to determine the accuracy of the computational interface prediction, and to visualize the endonuclease-DNA interface and its active sites.
  • In each asymmetric unit of the P3[ unit cell, four copies of the E-DreI/DNA complex are visible: two are well-ordered and have an average B of 35 Å², while the remaining two complexes were poorly ordered and have been modeled as poly-alanine/DNA (average B ~110 Å²). In the well-ordered complexes, density is present for all residues except 1-4 and 253-260, which are similarly disordered in the I-DmoI and I-CreI structures, respectively (Jurica et al, 1998; Silva et al, 1999; Chevalier et al, 2001).
  • the general topology of the E-DreI structure and its domain interface was similar to those found in the previously determined structures of the parental endonucleases: the conserved core LAGLIDADG helices pack tightly against one another, and the redesigned residues cluster to either side of these helices to form a well-packed domain interface (FIG. 3).
  • the buried surface area of the interface in the E-DreI structure (1460 Å²) is comparable to that in I-DmoI (1430 Å²) and I-CreI (1870 Å²) and is in the size range of typical protein-protein interfaces (Conte et al, 1999).
  • the computational model accurately predicted the E-DreI interface structure (FIGS. 4B-4E).
  • the apparent rigidity of the protein backbone in the interface undoubtedly facilitated the agreement of predicted side chain conformations to the actual structure.
  • the side chains in the designed and experimental interfaces superimpose well including both conserved and substantially altered residues.
  • ΔGint values are predicted changes in interface free energy for a side chain mutation to alanine, computed as described elsewhere (Tania Kortemme, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA). Positions subjected to sequence design are indicated in bold. Positions with sequence changes in the final E-DreI sequence compared to the parent sequences are highlighted in red. Designed alanine residues (A12A and L108A) are not shown.
  • the dmo half-site interface, which had not been previously visualized, includes direct contacts to DNA bases by four arginines, two acidic residues (Asp and Glu), a tyrosine and a threonine.
  • the I-DmoI domain also makes two base-specific contacts from the protein backbone and stacking between a thymine methyl group and a tyrosine ring.
  • the number of H-bonds in the DNA-protein interface of E-DreI is under-saturated: of 92 potential H-bonds that could be made in the major groove of the 23 base pair interface, only 32 direct and 16 water-mediated contacts were observed (FIG. 5).
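The degree of under-saturation follows from the reported counts. This arithmetic assumes each observed contact corresponds to one of the 92 potential major-groove H-bonds; the counting convention is an assumption, not stated in the source.

```python
POTENTIAL_MAJOR_GROOVE_HBONDS = 92
DIRECT_CONTACTS = 32
WATER_MEDIATED_CONTACTS = 16

direct_fraction = DIRECT_CONTACTS / POTENTIAL_MAJOR_GROOVE_HBONDS
total_fraction = (DIRECT_CONTACTS + WATER_MEDIATED_CONTACTS) / POTENTIAL_MAJOR_GROOVE_HBONDS

print(f"direct only: {direct_fraction:.0%}")             # direct only: 35%
print(f"direct + water-mediated: {total_fraction:.0%}")  # direct + water-mediated: 52%
```

Even counting water-mediated contacts, only about half of the potential H-bonds are made, leaving room for additional contacts in a redesigned protein-DNA interface.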
  • Each E-DreI active site contains residues from both domains. Thus, the E-DreI active sites are themselves chimeras.
  • the substrate DNA in the E-DreI complex is not cleaved because of a combination of low temperature and low pH during crystal growth.
  • metal-coordinating aspartates and residues at the periphery of the active sites were avoided.
  • a nuclease consisting of either a naturally occurring protein or an artificial chimeric protein can be modified in a variety of ways to alter the activities delivered to its DNA target sequence.
  • the initial nuclease constructs generally act to create a double strand break at or near the target site.
  • the protein can be modified by mutagenesis in such a manner that any of its active sites (of which there are usually two, but may be more in the case of more complex protein oligomers) are independently inactivated or functionally attenuated, so that one or more sites on each DNA strand are not cleaved, or are cleaved independently to varying extents.
  • a natural or redesigned nuclease can also be modified by adding or eliminating specific surface residues, including, but not limited to cysteines, lysines, arginines, and histidines, to which a wide variety of specific chemical compounds may be attached covalently or non-covalently.
  • specific chemical compounds may include, but are not limited to, fluorophores, chromophores, cross-linking reagents, differentially incorporated isotope labels, radiolabeled compounds, metal chelating agents, and other reactive species.
  • a natural or redesigned nuclease can also be modified by incorporating one or more additional protein domains at either the amino or the carboxy terminus of the modified protein.
  • additional protein domains can include, but are not limited to, enzymatic catalysts, DNA-binding proteins, transcriptional activators or repressors, DNA-remodeling factors, protein binding modules, membrane-binding proteins and/or protein fluorophores.
  • a natural or redesigned nuclease including chimeric nucleases, can also be modified by repacking and remodeling their cores in order to alter their thermal and/or chemical stability, or to introduce metal-binding sites within their structures.
  • a nuclease consisting of either a naturally occurring protein or an artificial chimeric protein can be modified in a variety of ways to alter its DNA target specificity, beyond any alteration produced as the result of recombining nuclease domains.
  • computational methods similar to those described for the redesign of a protein-protein interface can be used to redesign the protein-DNA interface, with an altered set of amino acids within that interface selected against a novel DNA target sequence.
  • the computational methods can be modified to be more appropriate for such an interface, for example by altering the energetic weighting schemes of various forms of structural and chemical interactions, by accommodating and modeling DNA conformational dynamics, and/or by incorporating explicitly modeled solvent molecules into computational redesign algorithms.
  • in vivo and in vitro selection algorithms can be used to select for a nuclease protein, from a partially or completely randomized library of nuclease protein variants, that recognizes a DNA target site of an investigator's choosing. These selection methods can be modified or extended to also ensure that the selected nuclease variant does not bind alternative target sites, also of the investigator's choosing. Selection methods can include, but are not limited to, screens based on DNA binding, DNA binding and nuclease cleavage activity, gene activation or inactivation, or any other biochemical or physiological process that is dependent on the presence and/or action of the modified protein at a targeted DNA sequence.
  • the screens can be performed in solution, in encapsulated solution systems, or in living prokaryotic or eukaryotic cells. Selection experiments can be performed in conjunction with computational redesign algorithms, or independently of them. Computational screens can be used to help direct the first generation of selected nuclease variants, or can be used independently of selection experiments.
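The combination of positive selection for a chosen target site with negative selection against alternative sites amounts to a two-condition filter over the library. In this sketch, each variant's screen result is abstracted as the set of sites on which it scored as active; the `Variant` class, `select` function and site names are all hypothetical, and a real screen would measure binding or cleavage rather than read a precomputed set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A hypothetical library member and the sites it scored as active on."""
    name: str
    active_on: frozenset


def select(library, target, counter_targets):
    """Positive selection for `target`, negative selection against `counter_targets`."""
    counter = frozenset(counter_targets)
    return [v for v in library if target in v.active_on and not (v.active_on & counter)]


if __name__ == "__main__":
    library = [
        Variant("v1", frozenset({"new_site"})),
        Variant("v2", frozenset({"new_site", "native_site"})),
        Variant("v3", frozenset({"native_site"})),
    ]
    hits = select(library, target="new_site", counter_targets={"native_site", "other_site"})
    print([v.name for v in hits])  # ['v1']
```

Variant v2 illustrates why the counter-selection step is needed: it is active on the desired site but would be discarded for also acting on a site the investigator wants to avoid.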
  • compositions and/or methods and/or apparatus disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure.
  • while the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and/or apparatus and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention.
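The selection strategy in the list above combines positive selection on the chosen target site with counter-selection against alternative sites. As a minimal sketch, assuming a screen has already produced a normalized activity value per variant and per site, the filtering step could look like the following. The variant names, site labels, thresholds, and the `select_variants` helper are hypothetical and are not taken from the source.

```python
from typing import Dict, Iterable, List

# Hypothetical screen readout: variant -> {site label: normalized activity in [0, 1]}.
Readout = Dict[str, Dict[str, float]]


def select_variants(readout: Readout, target: str, off_targets: Iterable[str],
                    min_on: float = 0.5, max_off: float = 0.05) -> List[str]:
    """Keep variants active on `target` and inactive on every off-target site."""
    off_targets = list(off_targets)
    selected = []
    for variant, activity in readout.items():
        on = activity.get(target, 0.0)
        # Sites missing from the readout are treated as inactive (an assumption).
        off = max((activity.get(site, 0.0) for site in off_targets), default=0.0)
        if on >= min_on and off <= max_off:
            selected.append(variant)
    # Strongest on-target activity first.
    return sorted(selected, key=lambda v: readout[v].get(target, 0.0), reverse=True)


readout = {
    "v1": {"new_site": 0.90, "wt_site": 0.01},
    "v2": {"new_site": 0.95, "wt_site": 0.40},  # active but not specific
    "v3": {"new_site": 0.30, "wt_site": 0.00},  # specific but too weak
    "v4": {"new_site": 0.60},
}
print(select_variants(readout, "new_site", ["wt_site"]))
```

Treating the original (wild-type) site as the off-target here is only one illustrative choice; any set of alternative sites can be passed in.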

Abstract

The present invention provides compositions and methods concerning nucleic acid binding polypeptides, such as homing endonucleases, which are modified to possess sequence-specific activity at a modified target site. Methods for creating such modified polypeptides involve computer modeling either a polypeptide with a nucleic acid binding domain and a modified target nucleic acid sequence, or a fusion of two nucleic acid binding domains (chimera) and a target nucleic acid sequence. Subsequent steps involve modification of the polypeptide or chimera to generate a protein that has site-specific activity at a particular target nucleic acid sequence. In some embodiments, the altered polypeptide has nuclease activity that is specific to the target site. In additional embodiments, the methods and compositions specifically concern homing endonucleases, such as the LAGLIDADG or His-Cys Box Group I homing endonucleases.

Description

METHODS AND COMPOSITIONS CONCERNING DESIGNED HIGHLY-SPECIFIC
NUCLEIC ACID BINDING PROTEINS
BACKGROUND OF THE INVENTION
The present application claims the benefit of priority to U.S. Provisional Application No. 60/408,847, filed on September 6, 2002, which is incorporated by reference in its entirety. The government may own rights in the present invention pursuant to grant numbers GM49857 and CA88942 from the National Institutes of Health.
1. Field of the Invention
The present invention relates generally to the fields of biochemistry and molecular biology. More particularly, the present invention relates to designing and producing artificial nucleic acid binding proteins that specifically bind novel nucleic acid sequences. In particular embodiments, homing endonucleases are used as the initial molecules for design and engineering. The new nucleic acid binding proteins possess a specificity of nucleic acid sequence recognition that is sufficiently high to allow their use as gene-specific reagents (proteins that recognize and bind one site within an entire biological genome).
2. Description of Related Art
Evolution has repeatedly generated new functional proteins through the recombination and fusion of existing proteins, leading to increased structural and functional complexity. This has been accomplished both through the linkage of independently folded protein domains with flexible peptide linkers, and through the more intimate fusion of protein domains with highly specific protein interfaces. Reproducing the process of domain linkage is relatively straightforward in the laboratory. In contrast, domain fusion, and in particular the design of novel protein interfaces, is extremely difficult.
Recent developments in computational protein design algorithms (Pokala and Handel, 2001), and the development of methods for directed protein evolution (Farinas et al, 2001; Schmidt-Dannert, 2001), offer great promise for protein engineering and molecular interface design. Computational protein design successes have included the redesign of protein cores, the introduction of metal binding sites into proteins, and increases in protein stability (Pokala and Handel, 2001; Dahiyat and Mayo, 1997). Particularly notable have been the complete redesign of a ββα protein motif (Dahiyat and Mayo, 1997), the design of novel helical bundle topologies (Harbury et al, 1998), and the rational construction of a protein with enzyme-like properties (Bolon and Mayo, 2001). There has also been substantial progress in the selection of enzyme variants that display altered substrate specificities or physical properties (reviewed in Farinas et al, 2001; Schmidt-Dannert, 2001). However, altering substrate specificity while maintaining full activity in these enzymes has often been difficult. Challenges in DNA research involving genomic mapping and gene therapy applications have generated considerable interest in the identification of new DNA-binding proteins, particularly enzymes that act on DNA, with high specificity. Many engineering attempts have centered on modifying restriction endonucleases, and in particular on altering and/or increasing their DNA target-site sequence specificity. Unfortunately, the precision of these enzymes, provided by many redundant contacts to the bases in their restriction sites, has made this venture extremely challenging. Although type II restriction enzymes are common reagents in most laboratories and have revolutionized molecular biology, they have evolved to recognize and cleave relatively short sequences in foreign DNA invading their host cell.
As such, their natural restriction sites are usually 4 to 8 base pairs long, which corresponds to roughly one cleavage site every 256 (4^4) to 65,536 (4^8) base pairs of random DNA sequence.
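The arithmetic behind these frequencies, assuming random sequence with equal base composition (an idealization; real genomes deviate from it): a specific n-bp site occurs with probability (1/4)^n at any given position, so its expected spacing is 4^n bp.

```latex
\text{expected spacing} = 4^{n}\ \text{bp}, \qquad
4^{4} = 256, \qquad 4^{6} = 4{,}096, \qquad 4^{8} = 65{,}536
```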
Despite a wealth of structural information on, for example, EcoRV, well-designed experiments to engineer this enzyme to specifically recognize novel or longer sites have been only moderately successful. Some progress has been made, however, including the generation of an EcoRV mutant that prefers deoxyuridine over thymidine and another that prefers a methylphosphonate substitution, both at one position of the restriction sequence half-site (Lanio et al, 1996; Wenz et al, 1994). Other EcoRV mutants have a nearly 100-fold preference for distinct base pairs flanking the 6-bp restriction site (Lanio et al, 1998).
In a different series of studies with the aim of designing novel DNA-binding proteins, non-specific nuclease domains have been tethered to sequence-specific DNA binding modules such as zinc-fingers (Smith et al, 1999; Smith et al, 2000) and used in vivo to stimulate homologous recombination (Bibikova et al, 2001) or other cellular processes. Such constructs are potentially useful for the creation of gene-specific reagents (a single protein that recognizes a unique site in a genome) but lack the ability to act specifically at a single unique phosphodiester bond or base pair within the DNA target site (Smith et al, 1999). Even if restriction enzymes or many other DNA-binding proteins could be successfully engineered to recognize novel target sequences, their relatively short recognition sites would be insufficient for the design of DNA-binding proteins that recognize only a single site within an entire biological genome (referred to in the literature and herein as a "gene specific reagent" or "GSR"). Thus, the demand for novel nucleic acid-binding proteins, particularly those with sufficient specificity to act at single genomic sites, has led to great interest in the possibility of engineering enzymes capable of recognizing and being active at novel target sites.
SUMMARY OF THE INVENTION
Thus, the present invention concerns methods for designing and creating polypeptides that recognize specific nucleic acid sequences. The invention concerns at least two related methods. The first method allows the computational design and/or genetic selection of polypeptides that are chimeric fusions of DNA-binding domains from independent, naturally occurring DNA-binding proteins. The second method allows the computational design and/or genetic selection of polypeptides with modified nucleic acid binding surfaces. These methods can be used either individually or in concert to create novel nucleic acid-binding proteins. Within certain embodiments, homing endonucleases are used as the initial molecules for design and engineering. The present invention further concerns the biochemical functions of these designed and created polypeptides, whose nucleic acid binding specificity can be exploited as a reagent for conducting experiments involving nucleic acids, as a therapeutic or diagnostic agent to recognize a specific nucleic acid sequence, or as a delivery vehicle to a specific nucleic acid sequence.
Generally, methods of the invention involve the computer modeling of a polypeptide (native or a chimeric polypeptide) in the context of a target sequence (designed or chimeric) to identify contact points and interfaces, which refer to interactions both between a polypeptide and a nucleic acid and between different domains or parts of the polypeptide. A nucleic acid-protein contact point refers to the point at which a protein domain and the nucleic acid molecule favorably interact. The sum of nucleic acid-protein contact points constitutes the nucleic acid interface, that is, the entire interaction between a nucleic acid binding polypeptide and its nucleic acid target sequence. A protein-protein contact point refers to the point at which individual protein domains favorably interact. Likewise, the sum of protein-protein contact points constitutes the amino acid interface. The term "interface" refers to nucleic acid and amino acid interfaces unless otherwise specified. The term "contact point" refers to nucleic acid-protein contact points and protein-protein contact points, unless otherwise specified.
In many embodiments of the invention, polypeptides are designed based on a naturally occurring polypeptide that has a nucleic acid binding domain that specifically binds a particular nucleic acid sequence (referred to as "nucleic acid binding polypeptide"). The nucleic acid binding polypeptide may also comprise other activities or functions. In certain embodiments of the invention, the nucleic acid binding polypeptide is a homing endonuclease, which has a DNA binding domain. Furthermore, it is specifically contemplated that the invention concerns nucleic acid binding polypeptides that possess other activities that operate at or with respect to the particular nucleic acid sequence (referred to as "target sequence"). In some embodiments, the nucleic acid binding polypeptide is a nuclease that cleaves the nucleic acid within the recognized specific nucleic acid sequence.
The invention has particular advantages in generating polypeptides that recognize sequences not previously recognized by a known polypeptide. As discussed earlier, because many of the known nucleic acid binding proteins, such as restriction endonucleases, recognize sites of 4 to 8 bases/base pairs, and because a genome contains so many nucleotide base pairs (roughly 3 billion in the haploid human genome), the number of sites recognized by these proteins in a genome is quite high. Proteins that recognize fewer sites in a genome possess particular benefits, for example the ability to act upon a single genetic site that is a unique marker of a cell phenotype, such as a neoplastic mutation, or on a similarly unique genetic marker of a viral or bacterial pathogen. The action of the engineered polypeptide can be limited to binding, or can include cleavage, gene activation or inactivation, or chemical modification of the nucleic acid sequence. Thus, the invention provides methods and compositions concerning proteins that recognize sites that are greater than 8 bases/base pairs in length. The recognition site may be, be at least, or be at most 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 bases/base pairs or longer in length. The generated proteins may recognize a double-stranded site that is found in a genome fewer than 10 times, and may be a GSR with respect to a particular genome. In certain embodiments, the site is DNA. In specific embodiments, the nucleic acid binding polypeptide binds specifically to DNA, while in other embodiments it binds to RNA. DNA or RNA may be in an A-form, B-form, or Z-form conformation or an intermediate conformation between these states.
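To make the link between site length and genomic uniqueness concrete, the sketch below estimates how often one specific site is expected to occur in a genome, and the shortest site length that keeps the expected count below a chosen ceiling. It assumes random DNA with equal base composition, which real genomes (with repeats and compositional bias) do not satisfy, so the numbers are idealized. The helper names and the 3.1e9 bp genome size are illustrative assumptions.

```python
def expected_sites(site_length: int, genome_size: float, palindromic: bool = False) -> float:
    """Expected occurrences of one specific site in random, equal-composition DNA.

    A non-palindromic site can be read on either strand, so both strands are counted;
    a palindromic site reads the same on both strands and is counted once.
    """
    strands = 1 if palindromic else 2
    return strands * genome_size / 4 ** site_length


def min_site_length(genome_size: float, max_expected: float = 1.0,
                    palindromic: bool = False) -> int:
    """Shortest site length whose expected occurrence count is at most `max_expected`."""
    n = 1
    while expected_sites(n, genome_size, palindromic) > max_expected:
        n += 1
    return n


HAPLOID_HUMAN_BP = 3.1e9  # approximate genome size, used only for illustration
print(round(expected_sites(6, HAPLOID_HUMAN_BP)))           # a 6-bp restriction site
print(min_site_length(HAPLOID_HUMAN_BP))                    # 17
print(min_site_length(HAPLOID_HUMAN_BP, max_expected=10))   # 15
```

Under these assumptions, a 6-bp site is expected on the order of a million times in a human-sized genome, while sites in the high teens of base pairs approach single-copy status.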
In specific embodiments, the nucleic acid binding polypeptide binds DNA or RNA. Such polypeptides have one or more "DNA binding domains," which refers to a region (one or more contiguous amino acid residues) or regions of the polypeptide that form the interface and mediate interactions between the polypeptide and the target nucleic acid sequence. These same domains may contain nuclease active sites that are capable of cleaving the target sites at specific positions within their sequence. The chemical activity of these active sites can be attenuated or eliminated as part of the redesign of the novel nucleic acid-binding proteins. Naturally occurring or wild-type DNA binding polypeptides can be used as a template that may subsequently be altered or modified to generate a polypeptide with the ability to bind specifically to a nucleic acid sequence that differs from the sequence recognized by the unaltered polypeptide.
It is specifically contemplated that multiple DNA binding domains from the same or heterologous polypeptides may be combined to form a chimeric polypeptide. When such a chimeric polypeptide is created and that polypeptide has nuclease activity, it is referred to as a chimeric nuclease. Chimeric polypeptides may be further modified to alter their structure and/or activity. In some embodiments, the chimeric polypeptide recognizes the sequence identified by each individual DNA binding domain of the chimera. Methods for creating such a chimeric nucleic acid-binding polypeptide with nucleic acid sequence-specific activity (activity at a specific nucleic acid sequence, which may be binding activity alone or binding activity in conjunction with an enzymatic activity) involve: a) preparing a computational model of a complex between a first polypeptide having a nucleic acid binding domain, such as from a nuclease, and a second polypeptide having a nucleic acid binding domain, such as from the same or a different nuclease; b) evaluating and identifying amino acids that are potential protein-protein contact points between the first polypeptide and the second polypeptide; and c) identifying an amino acid change that creates or enhances an amino acid contact point to improve the amino acid interface between the first and second polypeptides, thereby providing a design for the chimeric nuclease. Methods for creating chimeric nucleases with nucleic acid sequence-specific activity thus also involve computational modeling and improvement/optimization of the protein-protein interface. Chimeric nucleases of the present invention may be designed so that their catalytic activity is reduced with respect to the parental native nucleases. In some embodiments, the nuclease activity is abolished, yielding a chimeric nucleic acid binding polypeptide. The term "chimeric nucleic acid binding polypeptides" includes chimeric nucleases, which are chimeric nucleic acid binding polypeptides with nuclease activity.
The term "potential contact point" refers to a point at which a favorable interaction may occur if modifications are made to amino acid(s) in that area. In some cases, a potential contact point is one in which there are spatial problems, for example, where two amino acids are too far apart to interact with one another or where there is steric hindrance between several amino acids. Alternatively, there may be an unfavorable chemical interaction that requires modification of one or more of the amino acids.
It is further specifically contemplated that some methods and compositions of the invention concern an existing nuclease (either a naturally occurring protein or a novel chimera as described above) that initially recognizes a specific sequence but is then designed and modified to recognize a different specific nucleic acid sequence. The binding occurs through an interface between the protein and the specific nucleic acid sequence. Modified polypeptides of the present invention may be designed to reduce or abolish the catalytic activity of the unmodified polypeptide. Methods for creating such a modified nucleic acid binding polypeptide with nucleic acid sequence-specific activity (capable of recognizing a target nucleic acid sequence that is different from the sequence recognized by the unmodified nuclease) involve: a) preparing a computational model of a complex between the unmodified nuclease and the nucleic acid sequence to which it is desired to bind (the "target site") to evaluate the nucleic acid interface; b) identifying potential nucleic acid-protein contact points between the nuclease and the nucleic acid sequence; and c) identifying an amino acid change that creates or enhances a nucleic acid-protein contact point to improve the nucleic acid interface between the DNA binding domain and the nucleic acid sequence, thereby providing a design for the modified nuclease. The term "improve" denotes the design of an interface that permits more stable and specific interaction between contact points, or along an interface, than the corresponding interaction involving the unmodified protein. The term includes improving an interaction to such an extent that the interaction is said to be "optimized." Thus, the term improve/optimize refers to improvement that may or may not include optimization. The binding occurs through a nucleic acid-binding domain, which is a region of the protein that specifically interacts (chemically) with a nucleic acid.
For either embodiment of the methods described above, the term "computational model" refers to a schematic or other preliminary work that is prepared using a computer algorithm or computer program that can process and provide information about protein and nucleic acid chemistry and conformation. A number of such programs and algorithms are readily available and known to those of skill in the art. They can configure a protein sequence into a 3-dimensional molecule and additionally configure it with a ligand or other substrate, such as a particular nucleic acid molecule. In the context of the invention, the program or algorithm will configure and improve (in some cases, optimize) an interface, including its amino acid side chains, between separate DNA-binding domains of naturally occurring DNA-binding proteins (to create novel chimeric DNA-binding proteins). The program or algorithm will also configure and improve or optimize the interface, including its amino acid side chains, between DNA-binding domain(s) of naturally occurring DNA-binding proteins or chimeric DNA-binding proteins and nucleic acid molecules. This program or algorithm will allow the detection, identification and improvement/optimization of contact points between individual protein domains or between protein domains and nucleic acid molecules. A "contact point" refers to the point at which individual protein domains, or protein domains and nucleic acid molecules, interact. Such contact points are formed as a result of specific binding between two protein domains or between protein domains and a nucleic acid molecule. Other amino acids within the interface may also be modified to enhance or improve the interaction between individual protein domains or between protein domains and nucleic acid molecules.
Modifications to the interface may result in improved interaction between individual protein domains or between protein domain(s) and nucleic acid molecules present in the complex, or may result in improved stability of the protein. In this context, amino acid side chains to be altered represent "potential contact points" in the interface. "Interface" refers to the amino acids between individual protein domains or between protein domains and nucleic acid molecules that form contact points, as well as those amino acids that are adjacent to contact points and along the planar surface between individual protein domains or between protein domains and nucleic acid molecules.
More importantly, the algorithm or program will allow the identification of either potential contact points or residues that are not properly interacting with the target sequence, or other residues between individual protein domains or between protein domains and nucleic acid molecules that are inhibiting or reducing the overall interaction. Thus, methods of the invention further include the step of identifying potential contact points between individual protein domains or between protein domains and nucleic acid molecules, and/or identifying amino acids along the interface that can be modified to improve the interface (that is, to improve the interaction between individual protein domains or between protein domains and nucleic acid molecules). Computational modeling in different embodiments of methods of the invention involves modeling the various entities so as to show their interactions with one another, such as interactions between or among single or multiple protein domains with each other or with one or more target sequence(s).
In one example, the interface of the initial model of a new protein-protein complex, such as a chimeric nuclease of the present invention, or of a novel nucleic acid-binding protein-nucleic acid complex, is analyzed to identify amino acid side chains in the protein(s) to be altered in order to improve the atomic contacts made throughout the interface. This is followed by an automated computational design protocol that searches through possible interface sequence combinations at the amino acid positions to be altered.
At each of the positions where an amino acid side chain has been determined to display a non-optimal contact or set of contacts within the interface, a library of possible side chain conformations spanning 19 potential amino acids (the set excludes cysteine) in different backbone-dependent rotameric states (on average about 500 rotamers per sequence position) is created. The interactions of all rotamers with the surrounding fixed portion of the molecule (including the polypeptide backbone and all side chains not subjected to sequence redesign), and all pairwise rotamer-rotamer energies, are computed using a free energy function that includes van der Waals interactions, solvation effects, explicit hydrogen-bonding interactions, and statistical terms representing the backbone-dependent internal free energies of amino acid rotamers (Tania Kortemme, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA; personal communication and Kuhlman and Baker, 2000). A variety of previously described methods exist for exploiting side chain rotamer libraries for the purpose of protein engineering and redesign and are available to anyone skilled in the art. These methods include variations both in the descriptions of side chain rotamer distributions and in the terms used to quantify their energy distributions. Any of these libraries and methods can be used with respect to the methods and reagents described herein. See, e.g., a review by Dunbrack (2002). An energy minimization procedure is then used to search through amino acid sequence combinations in the interface to identify particularly low free energy amino acid sequences. In each cycle of this minimization, a move consists of the random replacement of a single side chain rotamer with an alternative rotamer from the library. As provided in certain examples herein, a Monte-Carlo protocol is used for energy minimization.
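The search just described (precomputed rotamer-scaffold and rotamer-rotamer energies, then random single-rotamer moves) can be sketched as a Metropolis Monte Carlo search. This is a minimal illustration, not the authors' implementation: the energy tables are random toy numbers standing in for the free energy function, and the geometric cooling schedule (simulated annealing), step count, and all function names are assumptions. On a problem this small, the result can be checked against exhaustive enumeration.

```python
import math
import random
from itertools import product


def total_energy(assign, e1, e2):
    """One-body (rotamer vs. fixed scaffold) plus pairwise rotamer-rotamer energy."""
    n = len(assign)
    energy = sum(e1[i][assign[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += e2[(i, j)][assign[i]][assign[j]]
    return energy


def monte_carlo_pack(e1, e2, steps=5000, t_start=10.0, t_end=0.1, seed=0):
    """Metropolis Monte Carlo with geometric cooling; each move swaps one rotamer."""
    rng = random.Random(seed)
    n = len(e1)
    assign = [rng.randrange(len(e1[i])) for i in range(n)]
    energy = total_energy(assign, e1, e2)
    best, best_energy = list(assign), energy
    for step in range(steps):
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        trial = list(assign)
        pos = rng.randrange(n)
        trial[pos] = rng.randrange(len(e1[pos]))  # random replacement of one rotamer
        # Full recomputation keeps the sketch simple; a real packer would update
        # only the terms that involve the changed position.
        trial_energy = total_energy(trial, e1, e2)
        delta = trial_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            assign, energy = trial, trial_energy
            if energy < best_energy:
                best, best_energy = list(assign), energy
    return best, best_energy


def random_problem(n_pos=3, n_rot=3, seed=42):
    """Toy energy tables standing in for a real free energy function."""
    rng = random.Random(seed)
    e1 = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_pos)]
    e2 = {(i, j): [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
          for i in range(n_pos) for j in range(i + 1, n_pos)}
    return e1, e2


def exhaustive_minimum(e1, e2):
    """Brute-force minimum; feasible only for tiny problems."""
    choices = [range(len(rotamers)) for rotamers in e1]
    return min(total_energy(list(a), e1, e2) for a in product(*choices))


e1, e2 = random_problem()
best, best_energy = monte_carlo_pack(e1, e2)
print(best, round(best_energy, 3), round(exhaustive_minimum(e1, e2), 3))
```

Tracking the best state ever visited, rather than only the final state, means a late stall in a local minimum does not discard a better solution found earlier in the run.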
Sets of solutions are further assessed by eliminating sequence changes likely to affect nearby active site residues or, in some cases, by reducing structural redundancy (for example, if phenylalanine and tyrosine were computationally selected at a position, only one residue was used, based on whether a neighboring atom could form a hydrogen bond). In a final computational step, the interface free energies of the best sequence combinations are exhaustively enumerated using optimized rotamer conformations for each sequence. A variety of methods exist for the calculation of molecular energies and for their minimization. These methods are readily known and available to the skilled artisan. See, for example, a review by Mendes et al (2002). These methods include, but are not limited to, classical, semi-empirical, and quantum mechanical energy calculations. They range from statistical analyses of known protein structures on one hand to more physically based methods on the other, and are generally designated as statistical effective energy functions, physical effective energy functions, and empirical effective energy functions. Any of these methods may be used within the methods and compositions of the present invention. Within the methods for designing modified nucleases, amino acid residues within the interface that should make atomic contact with the target nucleic acid sequence are characterized as "potential contact points." A potential contact point may refer to one or more amino acids that are not stably interacting with the target sequence because a sufficiently strong or stable chemical bond cannot be formed between the amino acid(s) and the sequence. As a result, there is no contact point because of improper or inadequate bonding. The inability to bind may be due to chemical constraints (incompatible reactive groups) or proximity issues (too far apart or too close together).
A specific chemical constraint can involve amino acids that repel each other or attract each other because of chemical charges. A specific proximity issue is when there is steric hindrance between either amino acids or between an amino acid and the target sequence, which precludes or interferes with proper chemical bonding. Alternatively, an amino acid(s) may be too far from the target sequence to create an interface. In such cases, there is a gap between the two, which must be reduced or eliminated to create a contact point.
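The proximity issues described above (steric hindrance versus a gap too wide to form a contact) can be illustrated with a simple distance screen across an interface. This sketch is a deliberate simplification and is not the source's method, which uses a free energy function; it ignores chemical constraints entirely. The cutoff values, residue and base labels, and coordinates are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]
Group = Dict[str, List[Coord]]  # residue or nucleotide label -> heavy-atom coordinates (angstroms)

# Illustrative cutoffs only (assumptions, not values from the source).
CLASH_BELOW = 2.5    # closer than this: steric hindrance
CONTACT_UP_TO = 4.0  # within this: an existing contact
GAP_UP_TO = 8.0      # beyond contact range but within this: a gap a substitution might close


def min_distance(atoms_a: List[Coord], atoms_b: List[Coord]) -> float:
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def classify_pairs(side_a: Group, side_b: Group) -> Dict[str, List[Tuple[str, str, float]]]:
    """Label cross-interface pairs as clash, contact, or gap; farther pairs are ignored."""
    result: Dict[str, List[Tuple[str, str, float]]] = {"clash": [], "contact": [], "gap": []}
    for name_a, atoms_a in side_a.items():
        for name_b, atoms_b in side_b.items():
            d = min_distance(atoms_a, atoms_b)
            if d < CLASH_BELOW:
                result["clash"].append((name_a, name_b, round(d, 2)))
            elif d <= CONTACT_UP_TO:
                result["contact"].append((name_a, name_b, round(d, 2)))
            elif d <= GAP_UP_TO:
                result["gap"].append((name_a, name_b, round(d, 2)))
    return result


# Hypothetical toy coordinates for three protein residues and two DNA bases.
protein = {"R70": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], "Q44": [(10.0, 0.0, 0.0)],
           "K28": [(20.0, 0.0, 0.0)]}
dna = {"G+3": [(3.0, 0.0, 0.0)], "T-2": [(13.5, 0.0, 0.0)]}
pairs = classify_pairs(protein, dna)
potential_contact_points = pairs["clash"] + pairs["gap"]
print(potential_contact_points)
```

In this framing, both clashes and gaps are "potential contact points": positions where an amino acid change might convert a poor interaction into a real contact.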
A protein may be modified through one or more amino acid changes, including rotameric changes, to create an actual contact point between the protein's nucleic acid binding domain and the target sequence. A protein may also be modified through one or more amino acid changes to improve the interface between the members of the complex (for example, between the protein and the nucleic acid, between the DNA binding domains of a chimeric nuclease, or more generally between individual protein domains or between protein domains and nucleic acid molecules). An amino acid change is a substitution, deletion, or addition of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more contiguous or non-contiguous amino acids. Therefore, methods of the invention further include identifying an amino acid change that creates or enhances a contact point or the interface between the nucleic acid binding domain and the nucleic acid sequence (target sequence), which can further provide a design for the modified polypeptide. Enhancing a contact point or the interface means that a point at which the polypeptide and the nucleic acid sequence directly or indirectly interact is made more chemically favorable, which includes reducing entropy, increasing stability, and reducing any steric hindrance.
Amino acid changes to create 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or more interfaces may be determined empirically or computationally, or both. A change that is determined computationally refers to the use of a computer program or algorithm to identify amino acid changes that would create a desired contact point or improve the interface. In some embodiments, the change is identified based on other polypeptides that interact with sites of similar sequence. Parameters known to those of ordinary skill in the art may be employed to guide the program or algorithm, such as sequence alignments, three-dimensional structural alignments, calculations of molecular interaction energies, and docking scores based on molecular complementarity.
In certain embodiments of the invention, rotamer libraries of amino acid side chains in different backbone-dependent rotameric states are used, and the energies considered in determining amino acid changes may include (i) interactions of rotamers with surrounding groups; (ii) interactions of rotamers with a fixed portion of the molecule, including the backbone and all side chains not subject to substitution; and (iii) pairwise rotamer-rotamer energies.
In one example, a method is described wherein the search for the overall minimum free energy is enhanced by (i) providing at least two polypeptide backbone models with different relative orientations of protein sequences in the domain interface; (ii) performing sequence design assays on the polypeptide backbone (or on each polypeptide backbone when a chimeric polypeptide is employed); and (iii) obtaining sequences with different amino acid combinations at each interface. The combinations of sequences obtained can be reduced by: (1) eliminating sequences that affect the activity of nearby active site residues; (2) choosing between redundant residues by their ability to form hydrogen bonds with neighboring residues; (3) screening for the optimal rotamer conformation for each sequence; and/or (4) identifying the top-scoring interface free energy sequences having the overall minimum free energy. The term "redundant residues" refers to residues that share similar steric and/or chemical structures, differing only at individual atomic positions, such as phenylalanine vs. tyrosine, glutamate vs. glutamine, or aspartate vs. asparagine. A person of ordinary skill in the art is well aware of which residues are considered "redundant" with respect to other residues; a "conservative" amino acid change reflects such redundancy.
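The reduction steps listed above can be sketched as a small filtering pipeline over candidate designs. This is a hypothetical illustration under stated assumptions: each candidate is a mapping of sequence position to residue plus an interface energy that is assumed to already reflect optimized rotamers (step 3), positions near the active site are supplied by the caller, and the hydrogen-bond choice between redundant residues is a caller-provided rule. All names, positions, and energies are made up.

```python
from typing import Callable, Dict, FrozenSet, List, Optional, Set, Tuple

Design = Dict[int, str]  # sequence position -> one-letter amino acid code

# Residue pairs the text describes as "redundant".
REDUNDANT_PAIRS = [frozenset("FY"), frozenset("EQ"), frozenset("DN")]


def redundant_pair(residue: str) -> Optional[FrozenSet[str]]:
    return next((pair for pair in REDUNDANT_PAIRS if residue in pair), None)


def filter_designs(candidates: List[Tuple[Design, float]], wild_type: Design,
                   near_active_site: Set[int],
                   choose: Callable[[int, FrozenSet[str]], str],
                   top_k: int = 5) -> List[Tuple[Design, float]]:
    kept: Dict[tuple, Tuple[Design, float]] = {}
    for design, energy in candidates:
        # (1) drop designs that mutate positions flagged as near the active site
        if any(design.get(p, wild_type[p]) != wild_type[p] for p in near_active_site):
            continue
        # (2) collapse redundant residues to the member picked by `choose`
        normalized: Design = {}
        for pos, aa in design.items():
            pair = redundant_pair(aa)
            normalized[pos] = choose(pos, pair) if pair else aa
        key = tuple(sorted(normalized.items()))
        # keep the lowest-energy design among those collapsing to the same sequence
        if key not in kept or energy < kept[key][1]:
            kept[key] = (normalized, energy)
    # (4) rank survivors by interface free energy (lower is better)
    return sorted(kept.values(), key=lambda item: item[1])[:top_k]


def choose_residue(pos: int, pair: FrozenSet[str]) -> str:
    # Hypothetical rule: pretend a hydrogen-bond acceptor lies next to every position,
    # so the member that can donate a side-chain hydrogen bond (Y, Q, N) is kept.
    return next(iter(pair & {"Y", "Q", "N"}))


wild_type = {10: "L", 12: "S", 15: "K"}
candidates = [({10: "F", 12: "S", 15: "K"}, -3.0), ({10: "Y", 12: "S", 15: "K"}, -3.2),
              ({10: "L", 12: "T", 15: "R"}, -5.0), ({10: "W", 12: "S", 15: "K"}, -1.0)]
results = filter_designs(candidates, wild_type, near_active_site={15}, choose=choose_residue)
for design, energy in results:
    print(design, energy)
```

Here the lowest-energy candidate is discarded because it alters position 15 near the active site, and the Phe and Tyr variants at position 10 collapse into a single Tyr design.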
A change that is determined empirically involves changing an amino acid and then evaluating the modified polypeptide. In some embodiments, the amino acid change is evaluated using a binding assay in which the ability of the modified protein to bind all or part of the target sequence is tested. In additional embodiments, a binding-cleavage assay is performed to determine whether the target sequence is cleaved at the appropriate position.
Methods for creating modified nucleic acid binding polypeptides, in some embodiments, further include a step of obtaining a crystal structure (or at least the data regarding a crystal structure) of the nucleic acid binding polypeptide before it has been modified. The crystal structure may be of the polypeptide alone or of the polypeptide with a nucleic acid sequence to which it specifically binds or that it recognizes. It is further contemplated that obtaining a crystal structure or the data regarding a crystal structure may be accomplished by generating the crystal structure and gathering the relevant data. It will be understood that the relevant data regarding a crystal structure concern information about the polypeptide's tertiary structure and/or the polypeptide's interaction with a nucleic acid sequence to which it specifically binds or that it recognizes.
After a polypeptide is designed, it may be prepared by methods readily known to those of skill in the art. In some embodiments, substitutions, deletions, or additions are introduced via the nucleic acid encoding the polypeptide. Such recombinant techniques are well known to those of skill in the art. In some embodiments, the amino acid change(s) is/are implemented by site-directed mutagenesis of a nucleic acid encoding the unmodified polypeptide. "Unmodified polypeptide" refers to the polypeptide sequence prior to its being redesigned using the methods of the invention. The starting point for many of the design methods is an unmodified polypeptide. The term can also apply specifically to nucleases and homing endonucleases.
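As a small aid to the site-directed mutagenesis step, the sketch below computes the mutated coding sequence that such a reaction should produce for a substitution written in the common "K2R" style (wild-type residue, 1-based position, new residue). It does not design mutagenic primers. It uses the standard genetic code; choosing the first codon in TCAG order for the new residue is an assumption made for simplicity, where a real design would consult the expression host's codon usage. The function names and the toy sequence are hypothetical.

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: use the first codon (in TCAG order) for each amino acid. A real
# design would pick codons from the expression host's codon-usage table.
REPLACEMENT_CODON = {}
for codon, aa in CODON_TABLE.items():
    REPLACEMENT_CODON.setdefault(aa, codon)


def translate(cds: str) -> str:
    """Translate an in-frame DNA coding sequence (5' to 3') into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def apply_substitution(cds: str, mutation: str) -> str:
    """Return the coding sequence carrying a substitution written like 'K2R' (1-based)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation: {mutation}")
    wt, position, new = match.group(1), int(match.group(2)), match.group(3)
    start = 3 * (position - 1)
    codon = cds[start:start + 3]
    if CODON_TABLE.get(codon) != wt:
        raise ValueError(f"codon {codon!r} at residue {position} does not encode {wt}")
    return cds[:start] + REPLACEMENT_CODON[new] + cds[start + 3:]


mutant = apply_substitution("ATGAAAGCTTAA", "K2R")
print(mutant, translate(mutant))  # ATGCGTGCTTAA MRA*
```

Checking that the existing codon actually encodes the stated wild-type residue catches numbering or frame errors before any mutagenesis is attempted.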
After a modified polypeptide is prepared, it may be assayed for solubility and/or proper folding, as well as for activity.
In some embodiments of the invention, the nucleic acid binding polypeptide contains, at a minimum, a nucleic acid binding domain and a catalytic domain. In specific embodiments, the protein is a nuclease, which contains a nucleic acid binding domain and a domain that cleaves the specifically recognized sequence at a particular site (referred to as site-specific activity). In still further embodiments of the invention, the nuclease is an endonuclease. In some embodiments of the invention, a Group I homing endonuclease is utilized and modified in methods and compositions of the invention. A variety of Group I homing endonucleases are known; they are divided by motif into four families: LAGLIDADG, His-Cys Box, HNH, and GIY-YIG. The invention covers all of them, though in specific embodiments the Group I LAGLIDADG homing endonucleases are involved. LAGLIDADG endonucleases that may be used include, but are not limited to: I-DmoI, I-CreI, I-CeuI, I-SceI, I-SceII, I-SceV, I-SceVI, and I-LlaI. I-TevIII is an example of an HNH endonuclease and I-TevI is an example of a GIY-YIG endonuclease. An example of a His-Cys Box homing endonuclease is I-PpoI. In certain embodiments of the invention, homing endonucleases from the LAGLIDADG or His-Cys Box families are employed. Modified nucleases of the invention may cleave both strands of a target DNA site, cleave the plus strand of a DNA target site, cleave the minus strand of a DNA target site, or bind the DNA target site specifically but not cleave either strand. It is specifically contemplated that methods and compositions of the invention discussed with respect to a nucleic acid binding polypeptide generally may be applied specifically with respect to a homing endonuclease, and vice versa.
In additional compositions and methods of the invention, a polypeptide that has been designed and/or modified to bind a specific target sequence contains an additional reactive group, which, in some embodiments, includes a cross-linking agent, a fluorophore, a chromophore, a metal chelator, or a protein domain attached to the modified polypeptide. In other embodiments of the invention, the reactive group is chemically attached to the modified polypeptide. In some embodiments, the modified polypeptide comprises a protein marker, which in additional embodiments comprises lacZ.
Many methods and compositions of the invention concern a modified polypeptide that is chimeric. A "chimeric" polypeptide is a polypeptide that contains two or more recognizable and distinct regions that are not found together in nature, and which may be from different polypeptides. In most cases, the regions are from different polypeptides; however, it is contemplated that a modified polypeptide may contain, for example, two nucleic acid binding regions from the same polypeptide (when that polypeptide normally has only one such region). Methods of the invention are directed to creating a chimeric polypeptide that recognizes the combined target sites of multiple nucleic acid binding domains. In these cases, a polypeptide with 1, 2, 3, 4, 5, or more nucleic acid binding domains is created and altered to recognize the combined nucleic acid sequences. In some embodiments of the invention, the target nucleic acid for a chimeric polypeptide with multiple nucleic acid binding domains contains each of the sequences recognized by the individual binding domains. It is contemplated that the chimeric nuclease of the present invention may be designed and/or produced to cleave both strands of a target DNA site, to cleave the plus strand of a DNA target site, to cleave the minus strand of a DNA target site, or to bind the DNA target site specifically but not cleave either strand. Therefore, the use of the term "nuclease" in this context merely denotes the origin of the nucleic acid binding domains.
Methods of the invention involve any or all of the steps previously identified, and may further include the following steps or qualifications: step a) involves preparing a computational model of a chimeric nuclease and a nucleic acid sequence, such that the chimeric nuclease comprises (i) a first nuclease, comprising a DNA binding domain and a catalytic domain, and (ii) at least a second DNA binding domain; and an additional step d) involves identifying amino acids that are potential protein-protein contact points between the second DNA binding domain and the first nuclease, wherein substitution of such an amino acid improves atomic contacts that would otherwise be hindered by, for example, steric hindrance or improper bonding, and wherein the substitution provides a design for the modified chimeric nuclease. "Potential contact points" refers to amino acid residues along the interface between the domains of a chimeric polypeptide at which changes may be introduced to increase the stability of the chimeric polypeptide or to improve or optimize the polypeptide's interface with the target sequence. Atomic contacts between two amino acids that hinder the formation of an optimal interface include, but are not limited to, improper or undesirable chemical interactions (for example, charge-charge repulsion, burial of charged or polar amino acids, or exposure of hydrophobic amino acids) and physical interference (either steric hindrance between amino acids or poor van der Waals complementarity and interactions between amino acids). It is specifically contemplated that one or more nucleic acid binding domains in a chimeric nuclease are from a homing endonuclease.
It is further contemplated that a chimeric protein may contain a peptide linker molecule between regions to create a monomeric protein. In some embodiments, the peptide linker is located between the first DNA binding domain and the second DNA binding domain. In specific embodiments, a peptide linker comprises the amino acid sequence NGN, GNGN, NGNG, or GNGNG.
The invention further concerns compositions generated by design methods of the invention. In some embodiments, the invention concerns a modified recombinant polypeptide that recognizes a specific nucleic acid sequence greater than 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 or more nucleotides in length. Alternatively, the recombinant modified polypeptide recognizes two or more sites in tandem (back-to-back or one after the other) that are individually recognized by the polypeptide prior to being modified.
In specific embodiments, the invention concerns a modified nuclease with altered sequence specificity made by a method comprising: (a) preparing a computational model of a complex between a nuclease and a target nucleic acid sequence, wherein the nuclease comprises a catalytic domain and a DNA binding domain; (b) identifying potential nucleic acid-protein contact points between the DNA binding domain and the nucleic acid sequence; and (c) identifying an amino acid change that creates or enhances a nucleic acid-protein contact point between the DNA binding domain and the nucleic acid sequence, thereby providing a design for the modified nuclease. In additional embodiments, the method further includes a step (d) of preparing the modified nuclease. In certain embodiments of the invention, the catalytic domain is catalytically inactive, such that the nuclease is no longer a nuclease but is a nucleic acid binding polypeptide.
Furthermore, methods of the invention include methods of designing a modified chimeric nucleic acid binding polypeptide with sequence-specific activity comprising: (a) preparing a computational model of a complex between a chimeric nucleic acid binding polypeptide and a nucleic acid sequence, wherein the chimeric nuclease comprises (i) a first polypeptide having a DNA binding domain and (ii) at least a second polypeptide having a DNA binding domain; (b) identifying potential nucleic acid-protein contact points between the chimeric nuclease and the nucleotide sequence; and (c) identifying an amino acid substitution to produce an operative interface between the chimeric nucleic acid binding polypeptide and the nucleic acid sequence, wherein the substitution provides a design for the modified chimeric nucleic acid binding polypeptide. The term "operative" as used herein refers to the predicted ability of the modified chimeric nucleic acid binding polypeptide to recognize the nucleic acid sequence.
In particular embodiments, the modified chimeric nucleic acid binding polypeptide comprises site-specific nuclease activity. In some compositions of the invention, a modified chimeric nuclease capable of recognizing an altered target nucleic acid sequence comprises: a) a first DNA binding domain from a first homing endonuclease; and b) a second DNA binding domain from a second homing endonuclease, wherein the chimeric DNA binding polypeptide is capable of binding the target DNA sites of the first and second DNA binding domains. It is contemplated that the DNA binding domains from either or both of the homing endonucleases are further modified to improve the nucleic acid interface between the chimeric nuclease and the target nucleic acid sequence. Such a domain may carry substitutions, deletions, or additions of up to 5, 10, 15, 20, 25, or more amino acids while remaining at least 50% identical to the original unmodified domain. In certain embodiments, the chimeric DNA binding polypeptide has a domain from I-DmoI and/or a domain from I-CreI.
Other methods of the invention concern methods of designing a modified chimeric nuclease with nucleic acid sequence-specific activity comprising: (a) preparing a computational model of a complex between a chimeric nuclease and a DNA sequence, wherein said chimeric nuclease comprises (i) a first nuclease, comprising a DNA binding domain and a catalytic domain, and (ii) at least a second DNA binding domain; (b) identifying potential protein-protein contact points between the first nuclease and the second DNA binding domain; and (c) identifying an amino acid substitution to create a protein-protein contact point or to enhance a nucleic acid interface between the chimeric nuclease and the nucleic acid sequence, wherein the substitution provides a design for the modified chimeric nuclease. Within the context of chimeric nucleases, points along the interface between the DNA binding domains that are necessary to establish a stable interaction between the domains are characterized as "potential protein-protein contact points." In additional methods of the invention, the computational design searches for the overall minimum free energy by (a) providing at least two polypeptide backbone models with different relative orientations of the LAGLIDADG sequences in the domain interface; (b) performing sequence design calculations on each polypeptide backbone; and (c) obtaining sequences with different amino acid combinations along the interface. In other embodiments, the number of different amino acid combinations is reduced by: (a) eliminating sequences that affect the activity of nearby active site residues; (b) choosing between redundant residues by their ability to form hydrogen bonds with neighboring residues; (c) screening for the optimal rotamer conformation of each sequence; and (d) identifying the top-scoring interface free energy sequences having the overall minimum free energy.
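The reduction steps (a) to (d) above can be pictured as a filter over a list of designs that have already been scored. The Python below is a minimal sketch under stated assumptions, not the inventors' implementation: the `Candidate` record, the position sets, and the rule "keep Tyr only where a neighboring atom could hydrogen bond its hydroxyl, otherwise Phe" are hypothetical stand-ins, and step (c), rotamer optimization, is assumed to have happened upstream so that each design simply carries its interface free energy.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one computationally designed interface sequence."""
    residues: dict[int, str]  # interface position -> one-letter amino acid
    interface_dg: float       # interface free energy after rotamer optimization (lower is better)


def reduce_candidates(candidates, wild_type, active_site_neighbors, hbond_positions, top_n=3):
    """Apply reduction steps (a), (b) and (d) to designs already scored in step (c).

    wild_type: position -> wild-type residue (must cover active_site_neighbors).
    active_site_neighbors: positions assumed to influence nearby active-site residues.
    hbond_positions: positions where a neighboring atom could hydrogen bond a Tyr hydroxyl.
    """
    best = {}
    for cand in candidates:
        # (a) discard designs that change any residue next to the active site
        if any(cand.residues.get(p, wild_type[p]) != wild_type[p] for p in active_site_neighbors):
            continue
        # (b) collapse Phe/Tyr redundancy: keep Tyr only where its hydroxyl can hydrogen bond
        seq = {pos: ("Y" if pos in hbond_positions else "F") if aa in ("F", "Y") else aa
               for pos, aa in cand.residues.items()}
        key = tuple(sorted(seq.items()))
        # keep the lowest-energy design per collapsed sequence; in practice the
        # collapsed sequence would be re-scored rather than inheriting this energy
        if key not in best or cand.interface_dg < best[key].interface_dg:
            best[key] = Candidate(seq, cand.interface_dg)
    # (d) rank by interface free energy and keep the top scorers
    return sorted(best.values(), key=lambda c: c.interface_dg)[:top_n]
```

Keying the deduplication on the collapsed sequence means two designs that differ only by Phe versus Tyr at one position end up as a single entry, which is the redundancy the text describes removing.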
Other methods of the invention include methods of screening to identify a modified polypeptide with altered DNA sequence-specific activity comprising: a) generating polypeptides with one or more amino acid substitutions in their DNA binding domains; b) contacting the polypeptides with nucleic acid segments having random sequences, under conditions that allow the DNA binding domains of the polypeptides to bind specifically to the nucleic acid segments; c) identifying which polypeptides specifically bind the nucleic acid segments; and d) identifying the sequences of the nucleic acid segments. In specific embodiments, the modified polypeptides specifically bind a sequence of at least 9 residues. Any of the modified polypeptides described herein may be employed (or achieved) in any method of the invention, including any screening method.
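Step d) yields a set of bound sequences, and one simple way to summarize them is a per-position base count with a majority consensus. The sketch below assumes the recovered segments have already been trimmed and aligned to equal length; `consensus` and its `min_fraction` cutoff are illustrative choices, not part of the described screen.

```python
from collections import Counter


def position_counts(sequences: list[str]) -> list[Counter]:
    """Count the bases observed at each position of equal-length, aligned sequences."""
    if not sequences:
        raise ValueError("no sequences to summarize")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [Counter(s[i] for s in sequences) for i in range(length)]


def consensus(sequences: list[str], min_fraction: float = 0.5) -> str:
    """Majority base at each position, or 'N' where no base reaches min_fraction."""
    out = []
    for counts in position_counts(sequences):
        base, n = counts.most_common(1)[0]
        out.append(base if n / sum(counts.values()) >= min_fraction else "N")
    return "".join(out)


selected = ["ACGT", "ACGA", "ACTT", "TCGT"]  # hypothetical recovered segments
print(consensus(selected))
```

Raising `min_fraction` marks weakly selected positions as `N`, which gives a rough picture of which positions the modified binding domain actually specifies.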
It is specifically contemplated that any embodiment discussed with respect to a particular method or composition may be implemented with respect to other methods and compositions of the invention.
The use of the word "a" or "an" when used in conjunction with the term "comprising" in the claims and/or the specification may mean "one," but it is also consistent with the meaning of "one or more," "at least one," and "one or more than one." The term "or" in the claims is used to mean "and/or" unless explicitly indicated to refer to alternatives only or unless the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and "and/or."
Throughout this application, the term "about" is used to indicate that a value includes the standard deviation of error for the device or method being employed to determine the value. Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating specific embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.
BRIEF DESCRIPTION OF THE DRAWINGS
The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
FIG. 1. Overall design strategy and method for creation and structural analysis of E-DreI.
FIGS. 2A-2C. Biochemical characterization of a designed endonuclease with novel specificity. FIG. 2A. The dmo and cre sites (targets of I-DmoI and I-CreI, respectively) differ considerably; dmo is asymmetric, while cre is nearly palindromic (2-fold symmetric positions underlined). FIG. 2B. E-DreI, I-CreI and I-DmoI activity on different target site DNAs. FIG. 2C. Mapping of E-DreI scissile phosphate positions in the dre3 target site.
FIGS. 3A-3B. Top and side stereoviews of the E-DreI domain interface. Blue backbone denotes the N-terminal domain from I-DmoI; gray backbone denotes the C-terminal domain from I-CreI. FIG. 3A. Location of the interface in the overall structure (gray oval). FIG. 3B. Detailed view of the interface highlighting the tight packing of interacting side chains.
FIGS. 4A-4E. Comparison of the E-DreI X-ray structure and computational model. Domains from I-DmoI and I-CreI in the X-ray structure are blue and gray, respectively; the designed model is red. FIG. 4A. Overall superposition of the backbone template used for computational design and the E-DreI crystal structure illustrates the good overall agreement between experimental and designed structures. FIG. 4B. Top view of the domain interface showing an overall superposition of the structure and computational model and detailed side chain packing interactions, including the three most important interface hot spot residues: Y13, W19, and F194. FIG. 4C. Y13 forms two hydrogen bonds across the interface to D115 and N193. FIG. 4D. W19 stacks across the interface against F151, and forms an unanticipated hydrogen bond to Q144 and a loose cation-π interaction with R148. FIG. 4E. F194 is buried across the interface in a hydrophobic pocket lined by residues L14, L47, F52 and 152.
FIG. 5. Base-specific contacts made by E-DreI to target site DNA. Contacts to the dmo half site are shown (top), and contacts to the cre half site are shown (bottom). Both strands of each half site are shown, with the target site center indicated by the oval connecting each strand and the positions of the scissile phosphates indicated by the circles in the dmo (bottom) and cre (top) strands.
FIG. 6. Stereo views of the E-DreI and I-CreI active sites. Indicated are the three catalytic metals (purple), DNA backbone (gray) from one base on either side of the scissile phosphates (yellow), waters (blue) and putative nucleophilic waters (red).
DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
A variety of experimental approaches have been used to identify homing endonuclease variants that can bind and cleave mutant DNA homing sites that contain individual nucleotide base pair substitutions (Argast et al, 1998; Bryk et al, 1993; Gruen et al, 2002; Guo et al, 2000). Such strategies allow a wild type homing endonuclease to be altered to recognize minor variants of its natural DNA target site. The expansion and diversification of the LAGLIDADG family by repeated fusion of unrelated enzyme domains (Chevalier and Stoddard, 2001) has identified this family as being ideal for reengineering.
I. The Present Invention
The present invention encompasses methods of designing nucleic acid binding polypeptides that are modified to alter their ability to act at a particular target nucleic acid sequence. These modified polypeptides are created either from a single unmodified nucleic acid binding polypeptide or from a combination of unmodified nucleic acid binding polypeptides (a chimera). Generally, computer modeling is employed to ascertain the interaction between a region of a nucleic acid binding polypeptide and a target sequence that differs from the sequence the polypeptide binds in its unmodified form. The modeling evaluates the interface between them and aids in determining which modifications are desirable to enhance the interface between the polypeptide and the nucleic acid. When a chimeric polypeptide is employed, modeling may also be used to evaluate the interaction between the regions of the polypeptide that form the chimera. Modified chimeric polypeptides may be generated by recombining and fusing regions to alter site-specific activity; identifying interferences between the fused regions of the modified protein, such as sterically hindered or improperly bonded groups; and identifying amino acid substitutions that reduce any negative effect of the interferences, enhance stability, and improve site-specific activity.
The term "nucleic acid binding polypeptide" refers to any polypeptide that specifically binds a particular nucleic acid target, such as a transcription factor or an endonuclease. While any nucleic acid binding polypeptide may be employed, in some embodiments of the invention the methods and compositions specifically involve homing endonucleases, with respect to both chimeric and non-chimeric modified polypeptides.
II. Homing Endonucleases
Homing is the lateral transfer of an intervening sequence (either an intron or intein) to a homologous allele that lacks the sequence (Jacob, 1977). The process is catalyzed by an endonuclease that recognizes and cleaves the target allele. The homing endonuclease itself is encoded by an open reading frame (ORF) embedded within the mobile intervening sequence. The mobile elements avoid disrupting host gene function by self-splicing at the RNA (introns) or protein (inteins) level.
Homing endonucleases have been found in Eubacteria, Archaea, and single-cell eukaryotes, where they are often encoded by and promote the lateral transfer of mobile introns (Belfort and Roberts, 1997; Chevalier and Stoddard, 2001). Homing endonucleases are highly specific and have evolved to cleave target sequences within cognate alleles without being overly toxic to the organism. They tolerate some individual base variation at their homing site, which ensures their propagation despite evolutionary drift of their target sequence. Homing endonucleases tend to be small proteins of less than 40 kDa, a property likely due to length limitations of the mobile sequences in which they reside.
Homing endonucleases bind long (15-40 bp) DNA target sites, ensuring extremely high specificity, while tolerating small numbers of single base pair polymorphisms in those sites (Chevalier and Stoddard, 2001). This combination makes homing endonucleases highly sequence-specific (recognizing as few as 1 in 10^9 random sequences) and excellent candidates from which to engineer new, sequence-specific DNA binding or catalytic proteins.
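The specificity figure follows from simple arithmetic: in random DNA with equal base frequencies, a particular n-bp sequence occurs at a given position with probability 4^-n, so a 15 bp site is expected about once per 4^15 ≈ 1.07 × 10^9 positions. The sketch below makes that calculation explicit; it assumes uniform, independent base composition and counts both strands, which real genomes only approximate.

```python
def site_probability(site_length: int) -> float:
    """Chance that one fixed site matches at a given position of random DNA.

    Assumes equal (25%) and independent base frequencies.
    """
    return 0.25 ** site_length


def expected_chance_sites(site_length: int, genome_bp: int) -> float:
    """Expected number of chance matches when both strands of a genome are scanned.

    Assumes a non-palindromic site and uniform base composition.
    """
    positions_per_strand = genome_bp - site_length + 1
    return 2 * positions_per_strand * site_probability(site_length)


if __name__ == "__main__":
    for n in (15, 20, 30):
        print(f"{n} bp: 1 in {1 / site_probability(n):.3g} positions; "
              f"{expected_chance_sites(n, 3_000_000_000):.3g} chance sites in 3e9 bp")
```

Under these assumptions a 15 bp site is expected roughly five to six times by chance in a 3 × 10^9 bp genome, whereas a 20 bp site is expected far less than once, which illustrates why longer homing sites translate into higher effective specificity.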
Four homing endonuclease families have been defined on the basis of conserved protein motifs (Chevalier and Stoddard, 2001). These families are often collectively termed 'group I homing endonuclease families' since they are associated with group I introns. They comprise the LAGLIDADG, GIY-YIG, H-N-H and His-Cys box families (Belfort and Roberts, 1997; Jurica and Stoddard, 1999), although it has been suggested that the latter two comprise a single ββα-Me family of endonucleases (Kuhlmann et al, 1999). Of these families, the LAGLIDADG family is the largest, and its members have unique homing sites, suggesting that these enzymes are malleable and should offer a strong foundation for engineering novel DNA-binding proteins. Because of their structures, members of the LAGLIDADG and His-Cys box families are particularly suited to the methods of the invention.
A. LAGLIDADG Family
The large number of different DNA binding specificities represented in the LAGLIDADG homing endonuclease family suggests that the methods of the present invention can be used to generate novel DNA binding proteins with the ability to target many different, specific genes within complex genomes. This endonuclease family, with more than 200 members, has been variously termed 'LAGLIDADG,' 'DOD,' 'dodecapeptide,' 'dodecamer' and 'decapeptide' (Belfort and Roberts, 1997; Lambowitz and Belfort, 1993; Dalgaard et al, 1997). The LAGLIDADG endonucleases are the most phylogenetically diverse of the homing endonuclease families. This vast host distribution includes, for example, the genomes of plant and algal chloroplasts, fungal and protozoan mitochondria, bacteria and archaea. One reason for the wide distribution of LAGLIDADG ORFs appears to be their remarkable ability to invade unrelated types of intervening sequences, including group I introns, archaeal introns and inteins. Descendants of LAGLIDADG homing endonucleases are also found as freestanding endonuclease genes (Watabe et al, 1981; Watabe et al, 1983; Kostriken et al, 1983) and as maturases that assist in RNA splicing (Schafer et al, 1994; Monteilhet et al, 2000; Ho et al, 1997). Members of this family are relatively small protein homodimers or monomers composed of separate domains (Chevalier and Stoddard, 2001; Dalgaard et al, 1997) in which LAGLIDADG motifs form structurally conserved, tightly associated α-helical pairs at the center of hydrophobic domain interfaces (Duan et al, 1997; Heath et al, 1997; Ichiyanagi et al, 2000; Jurica et al, 1998; Silva et al, 1999). Enzymes that contain a single copy of this motif, such as I-CreI (Thompson et al, 1992) and I-CeuI (Marshall and Lemieux, 1992), act as homodimers and recognize a nearly palindromic homing site (which, like a homodimeric protein, has inherent 2-fold symmetry).
Enzymes that have two copies of this motif separated by 80-150 residues, such as I-DmoI (Dalgaard et al, 1993) and PI-SceI (Gimble and Thorner, 1992), act as monomers.
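The distinction between nearly palindromic homodimer sites and less symmetric monomer sites can be checked position by position: a position is 2-fold symmetric when its base is the complement of the base at the mirror position. The helper below is a hypothetical sketch for scoring that; the example uses the EcoRI site GAATTC as a perfect palindrome and an arbitrary variant, not any homing site from this document.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def symmetric_positions(site: str) -> list[bool]:
    """Flag each position whose base pairs with the base at the mirror position.

    Position i is compared with position len(site) - 1 - i; a perfectly
    palindromic (2-fold symmetric) site is flagged everywhere. For odd-length
    sites the central base can never be flagged.
    """
    site = site.upper()
    n = len(site)
    return [COMPLEMENT[site[i]] == site[n - 1 - i] for i in range(n)]


def symmetry_fraction(site: str) -> float:
    """Fraction of positions that are 2-fold symmetric (1.0 = perfect palindrome)."""
    flags = symmetric_positions(site)
    return sum(flags) / len(flags)


print(symmetric_positions("GAATTC"))  # EcoRI site, a perfect palindrome
print(symmetry_fraction("GAATTG"))    # hypothetical variant with the outer pair broken
```

A homodimer site would be expected to score close to 1.0 on this measure, while a monomer site such as an asymmetric half-site pair would score lower; the flagged positions correspond to what FIG. 2A marks as 2-fold symmetric.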
Unlike homodimers, monomers are not constrained to highly symmetrical DNA targets, and in fact their homing sites tend to be less palindromic. All LAGLIDADG endonucleases recognize long DNA sites (14-30 bp) and cleave the DNA to leave 4-nucleotide 3' overhangs. Binding of DNA target sites is dictated by independent sets of interactions made between individual domains or subunits and individual DNA half-sites (Jurica et al, 1998). The active sites of the enzyme are directly juxtaposed at the domain interface and share a catalytic divalent cation, so the domains must maintain their physical association in order to cleave DNA substrates (Chevalier et al, 2001). One of the best characterized LAGLIDADG endonucleases is PI-SceI from Saccharomyces cerevisiae, which is generated by autocatalytic protein splicing of the VMA intein located within the catalytic subunit of the yeast vacuolar H+-ATPase. Other well-characterized examples are I-CreI (Heath et al, 1997) and I-DmoI (Silva et al, 1999), which are intron-encoded LAGLIDADG endonucleases that lack protein splicing activity.
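The 4-nucleotide 3' overhang follows from where the two strands are cut: if the top-strand cut lies four base pairs downstream of the bottom-strand cut, each product ends in a four-base single-stranded 3' extension. The sketch below expresses that geometry with a hypothetical coordinate convention (a cut at k falls between base pairs k-1 and k of the top strand) and an arbitrary example sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def staggered_cut(top: str, bottom_cut: int, stagger: int = 4) -> tuple[str, str]:
    """Model a staggered double-strand cut that leaves 3' overhangs.

    `top` is the top strand written 5'->3'. Cut positions use top-strand
    coordinates: a cut at k falls between base pairs k-1 and k. The top-strand
    cut is placed `stagger` base pairs downstream of the bottom-strand cut,
    which leaves a single-stranded 3' extension of `stagger` bases on each product.

    Returns (left_overhang, right_overhang), each written 5'->3' on the strand
    that carries it.
    """
    top_cut = bottom_cut + stagger
    if stagger < 1 or bottom_cut <= 0 or top_cut >= len(top):
        raise ValueError("both cuts must fall inside the sequence")
    region = top[bottom_cut:top_cut]
    left_overhang = region                       # 3' end of the left product's top strand
    right_overhang = reverse_complement(region)  # 3' end of the right product's bottom strand
    return left_overhang, right_overhang


print(staggered_cut("AAAACGTACCCC", 4))  # arbitrary example sequence
```

When the central four base pairs are themselves palindromic (for example GTAC), both overhangs read the same 5'->3', so the two products carry identical ends.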
B. His-Cys Box Family
This small family of proteins is encoded within the only known mobile group I introns residing in nuclear genomes (Johansen et al, 1993). All of these mobile introns are located within highly conserved regions of nuclear small and large subunit ribosomal DNA of slime molds, fungi and amoebae. The best-studied member of this family is I-PpoI from Physarum polycephalum.
C. GIY-YIG Family
This smaller family of endonucleases is characterized by the conserved GIY-(X10-11)-YIG motif (Kowalski et al, 1999). GIY-YIG endonucleases have been found in T4 bacteriophage both as freestanding enzymes (F-TevI, F-TevII; Sharma et al, 1992) and within mobile group I introns (I-TevI, I-TevII; Bell-Pedersen et al, 1990). GIY-YIG ORFs have also been reported in introns of fungal mitochondria (Tian et al, 1991; Paquin et al, 1994; Saguez et al, 2000), algal mitochondria (Kroymann and Zetsche, 1997; Denovan-Wright et al, 1998) and algal chloroplasts (Paquin et al, 1995; Holloway et al, 1999).
D. HNH Family
Members of the HNH family are the least well characterized, structurally and biochemically, of all homing endonucleases. The HNH motif is also the least restricted of the four homing endonuclease families. It has been identified in non-specific endonucleases, such as the antibacterial colicins E7 and E9 (Ko et al, 1999; Kleanthous et al, 1999), and in proteins encoded by mobile group II introns, including I-SceV, I-SceVI and I-LlaI (Zimmerly et al, 1995a; Zimmerly et al, 1995b; Matsuura et al, 1997). HNH proteins contain two pairs of conserved histidines surrounding a conserved asparagine within a 30-33 residue sequence (Shub et al, 1994; Gorbalenya, 1994). Members of the HNH family encoded within group I introns include I-HmuI and I-HmuII from the SPO1 and SP83 introns, respectively, of two closely related Bacillus subtilis bacteriophages (Goodrich-Blair et al, 1990; Goodrich-Blair and Shub, 1994; Goodrich-Blair and Shub, 1996) and I-TevIII from the nrdB intron of RB3 bacteriophage (Eddy and Gold, 1991). This motif is also contained within I-CmoeI from the psbA gene of the Chlamydomonas moewusii chloroplast (Drouin et al, 2000) and within a homologous, yet uncharacterized, ORF in the psbA gene of C. reinhardtii (Holloway et al, 1999). Other ORFs containing HNH motifs, including inteins, have been reported but not studied (Dalgaard et al, 1997; Gorbalenya, 1998; Pietrokovski, 1998).
III. Computational Model for Designing Proteins
Methods of the invention involve using computer modeling to assist in the design of novel nucleic acid binding proteins. Computer modeling allows the three-dimensional chemical structure of a particular protein molecule to be discerned, particularly in the context of other molecules, such as a nucleic acid to which the protein specifically binds (or will be designed to specifically bind), or all or part of another protein to which it will be joined. The interface of the initial model of a new protein-protein complex or of a new protein-nucleic acid complex is analyzed to identify amino acid side chains in the protein(s) to be altered in order to optimize the atomic contacts made throughout the interface. This is followed by an automated computational design protocol that searches through possible interface sequence combinations at the positions to be altered. Generally, at each position where an amino acid side chain was determined to make a non-optimal or sub-optimal contact or set of contacts within the interface, a computer program strips the amino acid side chain, leaving only the backbone structure intact. Using a library of rotamers, the program then randomly places side chains at each stripped position until the interface is repopulated with side chains. More specifically, a library of possible side chain conformations spanning 19 potential amino acids (all but cysteine) in different backbone-dependent rotameric states (on average about 500 rotamers per sequence position) is created.
The interactions of all rotamers with the surrounding, fixed portion of the molecule (including the polypeptide backbone and all side chains not subjected to sequence redesign), and all pairwise rotamer-rotamer energies, are computed using a free energy function which includes van der Waals interactions, solvation effects, explicit hydrogen-bonding interactions, and statistical terms representing the backbone-dependent internal free energies of amino acid rotamers (Tania Kortemme, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA; personal communication; Kuhlman and Baker, 2000). The free energy of the interface is then determined. A Monte Carlo procedure may be employed for this set of calculations; the program minimizes the free energy by selecting amino acids that yield a low free energy.
An energy minimization procedure is then used to search through amino acid sequence combinations in the interface to identify particularly low free energy amino acid sequences. In each cycle of this minimization, a move consists of the random replacement of a single side chain rotamer with an alternative rotamer from the library. This iteration is performed multiple times, after which a pattern of amino acids at particular positions may emerge. Once a pattern is identified, those particular amino acids are fixed, and the remaining amino acids along the interface are stripped and randomized according to the iteration described above. Additional patterns may emerge, and the process is repeated. Sets of solutions are further assessed by eliminating sequence changes likely to affect nearby active site residues or, in some cases, by reducing structural redundancy (for example, if phenylalanine and tyrosine were both computationally selected at a position, the process was continued with only one of them, chosen according to whether a neighboring atom could form a hydrogen bond). In a final computational step, the interface free energies of the best possible sequence combinations are exhaustively enumerated using optimized rotamer conformations for each sequence.
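The rotamer search described above can be sketched as a Metropolis Monte Carlo loop over precomputed one-body and pairwise energy tables, in which each move replaces one position's rotamer with a different one from its library. The Python below is an illustrative sketch, not the program referred to in the text: the table layout, the geometric temperature schedule (a simulated-annealing variant), and all parameter values are assumptions, and the energy is recomputed in full at every step for clarity rather than speed.

```python
import math
import random


def total_energy(assign, one_body, pair):
    """Energy of a rotamer assignment from precomputed tables.

    assign[i]          rotamer index chosen at designable position i
    one_body[i][r]     energy of rotamer r at position i against the fixed backbone/side chains
    pair[(i, j)][r][s] rotamer-rotamer energy between positions i and j
    """
    energy = sum(one_body[i][r] for i, r in enumerate(assign))
    for (i, j), table in pair.items():
        energy += table[assign[i]][assign[j]]
    return energy


def anneal(one_body, pair, steps=20000, kt_start=3.0, kt_end=0.1, seed=0):
    """Metropolis Monte Carlo with geometric cooling; returns (best assignment, best energy)."""
    rng = random.Random(seed)
    assign = [rng.randrange(len(rots)) for rots in one_body]
    energy = total_energy(assign, one_body, pair)
    best, best_e = list(assign), energy
    for step in range(steps):
        kt = kt_start * (kt_end / kt_start) ** (step / max(1, steps - 1))
        i = rng.randrange(len(assign))
        n_rot = len(one_body[i])
        if n_rot < 2:
            continue
        old = assign[i]
        # move: replace this position's rotamer with a different one from its library
        new = rng.randrange(n_rot - 1)
        assign[i] = new + 1 if new >= old else new
        trial_e = total_energy(assign, one_body, pair)  # full recompute, for clarity only
        if trial_e <= energy or rng.random() < math.exp((energy - trial_e) / kt):
            energy = trial_e
            if energy < best_e:
                best, best_e = list(assign), energy
        else:
            assign[i] = old  # reject: restore the previous rotamer
    return best, best_e


# Toy problem: 3 positions with 2 rotamers each (all numbers hypothetical).
ONE_BODY = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
PAIR = {(0, 1): [[2.0, 0.0], [0.0, 0.0]], (1, 2): [[0.0, 0.0], [0.0, 3.0]]}
print(anneal(ONE_BODY, PAIR, steps=2000))
```

Precomputing the one-body and pairwise tables, as the text describes, is what makes each Monte Carlo move cheap: a move only looks up energies rather than re-evaluating atomic interactions. A production version would also update the energy incrementally using only the terms involving the changed position.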
The calculations involved in optimizing the amino acid modifications with respect to a particular target nucleic acid are described in further detail below.
A. Description of Molecular Energies
The computational method uses an atomic representation of the protein (including all heavy atoms as well as polar hydrogens) and a free energy function consisting of a linear combination of the attractive part of a Lennard-Jones potential (E_LJatr), a linear distance-dependent repulsive term (E_LJrep), orientation-dependent side chain-backbone and side chain-side chain hydrogen bond potentials (E_HB(sc-bb) and E_HB(sc-sc)) (Tania Kortemme, University of Washington, Seattle, WA, USA, Alex Morozov, University of Washington, Seattle, WA, USA & David Baker, University of Washington, Seattle, WA, USA; personal communication), Coulomb electrostatics (E_Coul) and an implicit solvation model (G_sol), as given in Equation (1):
$$\Delta G = W_{\mathrm{LJatr}}E_{\mathrm{LJatr}} + W_{\mathrm{LJrep}}E_{\mathrm{LJrep}} + W_{\mathrm{HB(sc\text{-}bb)}}E_{\mathrm{HB(sc\text{-}bb)}} + W_{\mathrm{HB(sc\text{-}sc)}}E_{\mathrm{HB(sc\text{-}sc)}} + W_{\mathrm{Coul}}E_{\mathrm{Coul}} + W_{\mathrm{sol}}G_{\mathrm{sol}} + W_{\phi\psi}E_{\phi\psi}(aa) + \sum_{aa=1}^{20} n_{aa}E_{\mathrm{ref}}^{aa} \qquad (1)$$
where W are the relative weights of the different energy terms, E_φψ(aa) is an amino-acid-type (aa) dependent backbone torsion angle propensity, and E_ref^aa is an amino-acid-type dependent reference energy, which approximates the interactions made in the unfolded state ensemble (Kuhlman and Baker, 2000) (n_aa is the number of amino acids of a certain type). The last two terms were included to model changes in protein stability upon mutation, but do not contribute to free energy changes of protein-protein interactions.
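Equation (1) is a weighted sum, and its last term explains why the reference energies drop out of binding calculations: the residue counts n_aa of a complex are the sum of those of its two partners, so the n_aa·E_ref^aa contributions cancel in ΔG_complex - ΔG_A - ΔG_B. The sketch below makes that explicit; the term names, weights, and energies are hypothetical placeholders, since no weight values are given here.

```python
TERMS = ("LJatr", "LJrep", "HB_sc_bb", "HB_sc_sc", "Coul", "sol", "phipsi")


def free_energy(energies, aa_counts, weights, ref_energy):
    """Equation (1): sum_k W_k * E_k plus sum_aa n_aa * E_ref^aa.

    energies   : term name -> raw energy E_k (hypothetical names, see TERMS)
    aa_counts  : amino acid -> n_aa, the number of residues of that type
    weights    : term name -> weight W_k
    ref_energy : amino acid -> reference energy E_ref^aa
    """
    weighted = sum(weights[t] * energies[t] for t in TERMS)
    reference = sum(n * ref_energy[aa] for aa, n in aa_counts.items())
    return weighted + reference


def binding_energy(complex_, part_a, part_b, weights, ref_energy):
    """dG_bind = dG(complex) - dG(A) - dG(B); each argument is (energies, aa_counts).

    Provided the complex's residue counts equal the sum of the partners' counts,
    the reference-energy contributions cancel in this difference.
    """
    return (free_energy(*complex_, weights, ref_energy)
            - free_energy(*part_a, weights, ref_energy)
            - free_energy(*part_b, weights, ref_energy))
```

In this sketch the backbone torsion term is simply treated as another weighted term; with a fixed backbone it would take the same value in the complex and the separated partners and would likewise cancel.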
B. Atomic Coordinates and Preparation of Structures
Atomic coordinates are taken from structures solved by X-ray crystallography. Polar hydrogens were added to all structures, using CHARMM 19 standard bond lengths and angles. For rotatable bonds in polar hydrogen-containing side chains, several rotamers reflecting different hydrogen positions were created, including a 180-degree flip of Asn and Gln amide groups and the two His imidazole tautomers (assumed to be uncharged). Global optimization of the hydrogen bonding network was performed for each structure using a simple Metropolis Monte Carlo procedure as described previously (Kuhlman and Baker, 2000) with the energy function given in Equation (1) and described below.
C. The Free Energy Function
The simple free energy function is given in Equation (1). The Lennard-Jones potential, solvation term, and backbone-dependent amino acid probabilities are as previously described (Lazaridis et al, 1999; Kuhlman and Baker, 2000). Energies of side chain-backbone and side chain-side chain hydrogen bonds were determined using an empirical function (Tania Kortemme, University of Washington, Seattle, WA, USA; Alex Morozov, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA) taking into account a) the distance between the hydrogen (H) and the acceptor (A) atoms, b) the angle at the hydrogen atom (D-H···A; D: donor atom), and c) the angle at the acceptor atom (H···A-AB; AB: heavy atom bound to the acceptor atom). The distance- and angle-dependent terms of the hydrogen bonding potential were derived from hydrogen bond geometries observed in high-resolution (2.0 Å or better) protein crystal structures. Only hydrogen bonds with proton positions determined by the chemistry of the donor group were considered for the derivation of the energy parameters of the potential. Coulomb electrostatics used CHARMM 19 partial charges (Neria et al, 1996) and a linear distance-dependent dielectric constant. Hydrogen bonding and Coulomb interactions were divided into three environment classes, depending on the extent of burial of both participating residues (class 1: exposed-exposed and exposed-intermediate; class 2: exposed-buried and intermediate-intermediate; class 3: intermediate-buried and buried-buried). The extent of burial was defined by the number of Cα atoms within a sphere of 8 Å radius around the Cα atom of the residue of interest (exposed: 0-8; intermediate: 9-14; buried: >14).
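The burial and environment-class rules above can be expressed as two small functions. This is a sketch under two stated assumptions not settled by the text: the residue's own Cα is excluded from the neighbor count, and "within" 8 Å is taken to mean a distance of at most 8 Å. The function names are hypothetical.

```python
import math

RADIUS = 8.0  # angstroms, from the text


def burial_class(index, ca_coords, radius=RADIUS):
    """Classify residue `index` as exposed, intermediate or buried.

    Assumptions: the residue's own C-alpha is not counted, and a neighbor
    at exactly `radius` angstroms counts as within the sphere.
    """
    center = ca_coords[index]
    neighbors = sum(
        1 for j, xyz in enumerate(ca_coords)
        if j != index and math.dist(center, xyz) <= radius
    )
    if neighbors <= 8:
        return "exposed"
    if neighbors <= 14:
        return "intermediate"
    return "buried"


# Environment classes for a pair of interacting residues (order-independent).
_ENVIRONMENT = {
    frozenset({"exposed"}): 1,
    frozenset({"exposed", "intermediate"}): 1,
    frozenset({"exposed", "buried"}): 2,
    frozenset({"intermediate"}): 2,
    frozenset({"intermediate", "buried"}): 3,
    frozenset({"buried"}): 3,
}


def environment_class(burial_a, burial_b):
    """Return environment class 1, 2 or 3 for a hydrogen-bond or Coulomb pair."""
    return _ENVIRONMENT[frozenset({burial_a, burial_b})]
```

Using a `frozenset` key makes the lookup independent of the order of the two residues, so exposed-buried and buried-exposed map to the same class.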
D. Parameterizing the Energy Function on Monomeric Proteins
The relative contributions of the different terms of the free energy function were parameterized on the ProTherm dataset of X→Ala mutations (www.rtc.riken.go.jp/jouhou/protherm/protherm.html) by minimizing the sum of the squared differences between calculated and observed changes in stability, $\sum_i (\Delta G_{calc}(i) - \Delta G_{obs}(i))^2$ over all mutations $i$, using a conjugate-gradient-based optimization method. The Coulomb term had a negligible contribution and was excluded. The weights for the side chain-side chain hydrogen bonds showed the expected dependence on burial, with exposed hydrogen bonds contributing little energy. The amino-acid-type-dependent reference energies (Kuhlman and Baker, 2000) cancel out in the analysis of binding energy changes in interfaces, because the unbound partners are used as the reference state in this case.
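If the per-mutation term changes are held fixed while the weights are fitted, the objective above is an ordinary linear least-squares problem in the weights. The sketch below assumes exactly that and applies linear conjugate gradient to the normal equations $(A^TA)w = A^Tb$; the text does not state which conjugate-gradient variant was used, so this is one possible approach, and the function name and toy data are hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _matvec(m, v):
    return [_dot(row, v) for row in m]


def fit_weights_cg(delta_terms, delta_g_obs, max_iter=None, tol=1e-20):
    """Least-squares fit of term weights by conjugate gradient on the normal equations.

    delta_terms[i][k]: change in energy term k for mutation i (held fixed).
    delta_g_obs[i]:    observed stability change for mutation i.
    Minimizes sum_i (sum_k w_k * delta_terms[i][k] - delta_g_obs[i]) ** 2.
    """
    n = len(delta_terms[0])
    # A^T A is symmetric positive definite when the term columns are independent.
    ata = [[sum(row[a] * row[b] for row in delta_terms) for b in range(n)] for a in range(n)]
    atb = [sum(row[a] * y for row, y in zip(delta_terms, delta_g_obs)) for a in range(n)]
    w = [0.0] * n
    r = list(atb)  # residual A^T b - (A^T A) w, starting from w = 0
    p = list(r)
    rs_old = _dot(r, r)
    for _ in range(max_iter or 10 * n):
        if rs_old <= tol:
            break
        ap = _matvec(ata, p)
        alpha = rs_old / _dot(p, ap)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = _dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return w
```

In exact arithmetic conjugate gradient converges in at most as many iterations as there are weights; the extra iterations allowed here only absorb floating-point round-off.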
F. Computational Alanine Scanning Across Interfaces
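Binding free energy changes upon alanine mutation ($\Delta\Delta G_{bind}$) are calculated using Equations (1) and (2):

$$\Delta\Delta G_{bind} = \Delta G^{MUT}_{bind} - \Delta G^{WT}_{bind} = \left(\Delta G^{MUT}_{complex} - \Delta G^{MUT}_{partner\,A} - \Delta G^{MUT}_{partner\,B}\right) - \left(\Delta G^{WT}_{complex} - \Delta G^{WT}_{partner\,A} - \Delta G^{WT}_{partner\,B}\right) \qquad (2)$$

where $\Delta G_{complex}$, $\Delta G_{partner\,A}$ and $\Delta G_{partner\,B}$ are the stabilities of the complex and of the unbound partners, obtained using Equation (1), and WT and MUT denote the wild-type and mutant proteins.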
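Equation (2) amounts to two subtractions once the three stabilities are available for each variant. The sketch below assumes those stabilities have already been obtained from Equation (1); the type and function names and the numbers in the usage note are hypothetical.

```python
from typing import NamedTuple


class Stabilities(NamedTuple):
    """Stabilities from Equation (1) for one protein variant (e.g. in kcal/mol)."""
    dg_complex: float
    dg_partner_a: float
    dg_partner_b: float


def binding_free_energy(s: Stabilities) -> float:
    """dG_bind = dG_complex - dG_partnerA - dG_partnerB."""
    return s.dg_complex - s.dg_partner_a - s.dg_partner_b


def ddg_bind(wild_type: Stabilities, mutant: Stabilities) -> float:
    """Equation (2): ddG_bind = dG_bind(MUT) - dG_bind(WT).

    With this sign convention, a positive value means the mutation weakens binding.
    """
    return binding_free_energy(mutant) - binding_free_energy(wild_type)
```

For example, with made-up stabilities of (-100.0, -60.0, -30.0) for the wild type and (-96.0, -58.0, -30.0) for an alanine mutant, the binding free energies are -10.0 and -8.0, giving $\Delta\Delta G_{bind} = 2.0$.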
The hydrogen bonding term was scaled so that the maximum contribution of replacing one partner in a buried hydrogen bond by alanine was -4.5 kcal/mol. The relative weights for the two other burial classes (see above) were then scaled to be proportional to the weights found in the monomeric set such that the most favorable energies of intermediate and exposed hydrogen bonds were -2.0 and -0.8 kcal/mol, respectively.
IV. Modifying DNA Binding Domains
A. Removing and/or Replacing Sterically Hindered or Improperly Bonded Amino Acid Groups
Once a sterically hindered or improperly bonded region of a DNA binding domain is identified, the amino acid or amino acid sequence may be removed and replaced with one or more amino acids that increase stability and enhance specific activity, thereby creating a modified protein.
To remove the sterically hindered or improperly bonded amino acid region, the polypeptide may be altered using a variety of methods of DNA mutagenesis known to those of ordinary skill in the art. Individual amino acids may be altered, or one or more amino acids may be removed or replaced using conventional recombinant DNA technology, such as restriction enzymes or DNAses.
The sterically hindered or improperly bonded region may also be replaced with substitute amino acids. "Repacked or Replaced" means that an amino acid at a particular position has been substituted with a different amino acid residue or with a modified amino acid. This may be accomplished in a number of ways. The sterically hindered or improperly bonded region may be first removed and then the replacement amino acid(s) incorporated into a polynucleotide encoding the modified polypeptide. Recombinant DNA technology may be used to incorporate a particular coding region into a polynucleotide. Alternatively, a region may be mutagenized using site-specific mutagenesis techniques that are well known to those of ordinary skill in the art.
It is contemplated that amino acids that affect the activity of nearby active site residues may also be removed or replaced, either to facilitate the creation of a modified protein or to improve the protein in any way, such as by increasing its stability or activity. Furthermore, multiple amino acids may be replaced or removed from any region of the proteins involved in creating a modified protein; thus, exactly or at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, or more amino acids may be removed or replaced.
Techniques and assays to determine the stability or activity of a modified protein are described herein or are well known to those of skill in the art. Enzymatic assays may be appropriate to evaluate the activity of an enzyme, for example. One of skill in the art would be able to evaluate the activity of a modified protein relative to the native protein. As discussed above, a modified protein may be attached (conjugated or fused) to another polypeptide, peptide, or protein. One of skill in the art would also be able to evaluate any modified conjugated or fusion protein of the invention depending upon the activity or activities of the polypeptide components.
B. Proteinaceous Compounds
Modified endonucleases with enhanced stability and site-specific activity can be of tremendous benefit in a variety of applications such as, but not limited to, genomic mapping, rapid identification and/or targeting of bacterial or viral pathogens, identification and/or targeting of single or multiple nucleotide polymorphisms (SNPs, MNPs), gene therapy and cancer therapy. The LAGLIDADG homing endonucleases are one example of endonucleases that can be modified to alter the site specificity of the nuclease. Thus, methods of designing and producing such modified endonucleases, as well as compositions thereof, are described herein. In certain embodiments, the present invention concerns novel compositions comprising a proteinaceous molecule that has been modified relative to a native or wild-type protein. In other embodiments, amino acid residues of the proteinaceous compound have been replaced, while in further embodiments both deletions and replacements of amino acid residues in the proteinaceous compound have been made. Furthermore, a proteinaceous compound may include an amino acid molecule comprising more than one polypeptide entity. As used herein, a "proteinaceous molecule," "proteinaceous composition," "proteinaceous compound," "proteinaceous chain" or "proteinaceous material" generally refers, but is not limited, to a protein of greater than about 100 amino acids or the full-length endogenous sequence translated from a gene; a polypeptide of greater than about 50 amino acids; and/or a peptide of from about 3 to about 50 amino acids. All the "proteinaceous" terms described above may be used interchangeably herein. Furthermore, these terms may also be applied to fusion proteins or protein conjugates.
In certain embodiments, the size of the at least one proteinaceous molecule may comprise, but is not limited to, about or at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, or more amino molecule residues, and any range derivable therein.
Accordingly, the term "proteinaceous composition" encompasses amino molecule sequences comprising at least one of the 20 common amino acids in naturally synthesized proteins, or at least one modified or unusual amino acid, including but not limited to those shown on Table 1 below.
[Table 1: Modified and unusual amino acids]
As used herein, an "amino molecule" refers to any amino acid, amino acid derivative or amino acid mimic as would be known to one of ordinary skill in the art. In certain embodiments, the residues of the proteinaceous molecule are sequential, without any non-amino molecule interrupting the sequence of amino molecule residues. In other embodiments, the sequence may comprise one or more non-amino molecule moieties. In particular embodiments, the sequence of residues of the proteinaceous molecule may be interrupted by one or more non-amino molecule moieties. Proteinaceous compositions may be made by any technique known to those of skill in the art, including the expression of proteins, polypeptides or peptides through standard molecular biological techniques, the isolation of proteinaceous compounds from natural sources, or the chemical synthesis of proteinaceous materials. The nucleotide and protein, polypeptide and peptide sequences for various genes have been previously disclosed, and may be found in computerized databases known to those of ordinary skill in the art. One such database is the National Center for Biotechnology Information's GenBank and GenPept databases (www.ncbi.nlm.nih.gov). The coding regions for these known genes may be amplified and/or expressed using the techniques disclosed herein or as would be known to those of ordinary skill in the art. Alternatively, various commercial preparations of proteins, polypeptides and peptides are known to those of skill in the art. In certain embodiments, a proteinaceous compound may be purified. Generally, "purified" will refer to a specific protein, polypeptide, or peptide composition that has been subjected to fractionation to remove various other proteins, polypeptides, or peptides, and which composition substantially retains its activity, as may be assessed, for example, by protein assays known to one of ordinary skill in the art for the specific or desired protein, polypeptide or peptide.
1. Functional Aspects
When the present invention refers to the function or activity of a modified endonuclease, it is meant that the modified endonuclease has the ability to recognize and cleave within novel DNA target sequences. The modified endonuclease of the present invention may also have other activities, such as the ability to cleave both strands of the DNA target; to cleave only the plus strand of the DNA target; to cleave only the negative strand of the DNA target; or to bind, but not cleave, either strand of the DNA target. In particular, the activity of the present invention refers to independent domains from separate, naturally occurring homing endonucleases that, when recombined and fused, create modified hybrid proteins with unique and/or enhanced DNA target specificity. In other embodiments, the activity of the present invention refers to the ability of a single modified homing endonuclease to recognize and cleave an altered nucleic acid target. These homing endonucleases and their activity are amenable to significant structural alterations. Thus, when the present application refers to the function or activity of a modified endonuclease, one of ordinary skill in the art would understand that this includes, for example, an endonuclease that possesses an additional advantage over the unmodified endonuclease. Determination of the activity of an endonuclease may be achieved using assays familiar to those of skill in the art, and may include, for comparison purposes, the use of native and/or recombinant versions of either the modified or unmodified endonuclease.
2. Modified Nucleic Acid Binding Polypeptides
The present invention may employ amino acid sequence variants such as substitutional, insertional or deletion variants. In particular, the modified endonuclease of the present invention may possess amino acid substitutions that alleviate steric hindrance, increase stability and enhance site-specific activity. In some embodiments, these modified proteins may further include insertions or added amino acids, such as in fusion proteins or proteins with linkers, for example. Substitutional or replacement variants typically contain the exchange of one amino acid for another at one or more sites within the protein and may be designed to modulate one or more properties of the polypeptide, particularly to enhance its stability and site-specific activity.
In addition to a substitution, a modified protein may possess an insertion of residues, which typically involves the addition of at least one residue in the polypeptide. This may include the insertion of a linking peptide or polypeptide or simply a single residue. Terminal additions, called fusion proteins, are discussed below.
The term "functionally equivalent codon" is used herein to refer to codons that encode the same amino acid, such as the six codons for arginine or serine, and also refers to codons that encode biologically equivalent amino acids (Table 2).
TABLE 2 CODON TABLE
Amino Acids                  Codons
Alanine         Ala   A      GCA GCC GCG GCU
Cysteine        Cys   C      UGC UGU
Aspartic acid   Asp   D      GAC GAU
Glutamic acid   Glu   E      GAA GAG
Phenylalanine   Phe   F      UUC UUU
Glycine         Gly   G      GGA GGC GGG GGU
Histidine       His   H      CAC CAU
Isoleucine      Ile   I      AUA AUC AUU
Lysine          Lys   K      AAA AAG
Leucine         Leu   L      UUA UUG CUA CUC CUG CUU
Methionine      Met   M      AUG
Asparagine      Asn   N      AAC AAU
Proline         Pro   P      CCA CCC CCG CCU
Glutamine       Gln   Q      CAA CAG
Arginine        Arg   R      AGA AGG CGA CGC CGG CGU
Serine          Ser   S      AGC AGU UCA UCC UCG UCU
Threonine       Thr   T      ACA ACC ACG ACU
Valine          Val   V      GUA GUC GUG GUU
Tryptophan      Trp   W      UGG
Tyrosine        Tyr   Y      UAC UAU

It also will be understood that amino acid and nucleic acid sequences may include additional residues, such as additional N- or C-terminal amino acids or 5' or 3' sequences, and yet still be essentially as set forth in one of the sequences disclosed herein, so long as the sequence meets the criteria set forth above, including the maintenance of biological protein activity where protein expression is concerned. The addition of terminal sequences particularly applies to nucleic acid sequences that may, for example, include various non-coding sequences flanking either of the 5' or 3' portions of the coding region or may include various internal sequences, i.e., introns, which are known to occur within genes.
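The notion of functionally equivalent codons can be made concrete by encoding Table 2 as a lookup table. The sketch below transcribes Table 2 directly; because the table lists no stop codons, `translate` raises `KeyError` on them. The function names are hypothetical helpers, not part of any particular library.

```python
# Codon assignments transcribed from Table 2 (mRNA alphabet; stop codons not listed).
CODON_TABLE = {
    "Ala": "GCA GCC GCG GCU", "Cys": "UGC UGU", "Asp": "GAC GAU",
    "Glu": "GAA GAG", "Phe": "UUC UUU", "Gly": "GGA GGC GGG GGU",
    "His": "CAC CAU", "Ile": "AUA AUC AUU", "Lys": "AAA AAG",
    "Leu": "UUA UUG CUA CUC CUG CUU", "Met": "AUG", "Asn": "AAC AAU",
    "Pro": "CCA CCC CCG CCU", "Gln": "CAA CAG",
    "Arg": "AGA AGG CGA CGC CGG CGU", "Ser": "AGC AGU UCA UCC UCG UCU",
    "Thr": "ACA ACC ACG ACU", "Val": "GUA GUC GUG GUU", "Trp": "UGG",
    "Tyr": "UAC UAU",
}
# Reverse lookup: codon -> three-letter amino acid code.
CODON_TO_AA = {c: aa for aa, codons in CODON_TABLE.items() for c in codons.split()}


def functionally_equivalent_codons(codon):
    """Other codons encoding the same amino acid as `codon`, sorted alphabetically."""
    codon = codon.upper()
    aa = CODON_TO_AA[codon]
    return sorted(c for c in CODON_TABLE[aa].split() if c != codon)


def translate(mrna):
    """Translate an mRNA string codon by codon, ignoring any trailing partial codon."""
    mrna = mrna.upper()
    return [CODON_TO_AA[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3)]


def is_silent(original, mutant):
    """True if two equal-length coding sequences encode the same peptide."""
    return len(original) == len(mutant) and translate(original) == translate(mutant)
```

For example, replacing the arginine codon CGA with AGA is silent, whereas replacing it with AAA changes arginine to lysine.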
The following discussion concerns changing the amino acids of a protein to create an equivalent, or even an improved, second-generation molecule. For example, certain amino acids may be substituted for other amino acids in a protein structure without appreciable loss of interactive binding capacity with structures such as, for example, binding sites on substrate molecules. Since it is the interactive capacity and nature of a protein that defines that protein's biological functional activity, certain amino acid substitutions can be made in a protein sequence, and in its underlying DNA coding sequence, and nevertheless produce a protein with like properties. It is thus contemplated by the inventors that various changes may be made in the DNA sequences of genes without appreciable loss of their biological utility or activity, as discussed below. Table 2 shows the codons that encode particular amino acids.
In making such changes, the hydropathic index of amino acids may be considered. The importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art (Kyte & Doolittle, 1982). It is accepted that the relative hydropathic character of the amino acid contributes to the secondary structure of the resultant protein, which in turn defines the interaction of the protein with other molecules, for example, enzymes, substrates, receptors, DNA, antibodies, antigens, and the like.
It also is understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. U.S. Patent 4,554,101, incorporated herein by reference, states that the greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with a biological property of the protein. As detailed in U.S. Patent 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ± 1); glutamate (+3.0 ± 1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (-0.4); proline (-0.5 ± 1); alanine (-0.5); histidine (-0.5); cysteine (-1.0); methionine (-1.3); valine (-1.5); leucine (-1.8); isoleucine (-1.8); tyrosine (-2.3); phenylalanine (-2.5); tryptophan (-3.4).
It is understood that an amino acid can be substituted for another having a similar hydrophilicity value and still produce a biologically equivalent and immunologically equivalent protein. In such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
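The ±2, ±1 and ±0.5 preference tiers can be applied mechanically to the hydrophilicity values listed above. This sketch assumes the central value is used for aspartate, glutamate and proline (the ±1 ranges are ignored) and adds a small tolerance so that differences landing exactly on a boundary, such as alanine versus glycine at 0.5, are not misclassified by floating-point round-off. The function name is hypothetical.

```python
# Hydrophilicity values from U.S. Patent 4,554,101 as listed above; the +/-1
# ranges quoted for Asp, Glu and Pro are ignored and the central value is used.
HYDROPHILICITY = {
    "Arg": 3.0, "Lys": 3.0, "Asp": 3.0, "Glu": 3.0, "Ser": 0.3,
    "Asn": 0.2, "Gln": 0.2, "Gly": 0.0, "Thr": -0.4, "Pro": -0.5,
    "Ala": -0.5, "His": -0.5, "Cys": -1.0, "Met": -1.3, "Val": -1.5,
    "Leu": -1.8, "Ile": -1.8, "Tyr": -2.3, "Phe": -2.5, "Trp": -3.4,
}
_EPS = 1e-9  # guards against floating-point round-off at the tier boundaries


def substitution_tier(original, replacement):
    """Rank a substitution by the absolute difference in hydrophilicity values."""
    diff = abs(HYDROPHILICITY[original] - HYDROPHILICITY[replacement])
    if diff <= 0.5 + _EPS:
        return "even more particularly preferred"
    if diff <= 1.0 + _EPS:
        return "particularly preferred"
    if diff <= 2.0 + _EPS:
        return "preferred"
    return "outside +/-2"
```

For instance, serine to threonine (difference 0.7) falls in the "particularly preferred" tier, while lysine to serine (difference 2.7) lies outside ±2.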
As outlined above, amino acid substitutions generally are based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like. Exemplary substitutions that take into consideration the various foregoing characteristics are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.

Another embodiment for the preparation of modified polypeptides according to the invention is the use of peptide mimetics. Mimetics are peptide-containing molecules that mimic elements of protein secondary structure (see Johnson, 1993). The underlying rationale behind the use of peptide mimetics is that the peptide backbone of proteins exists chiefly to orient amino acid side chains in such a way as to facilitate molecular interactions, such as those of antibody and antigen. A peptide mimetic is expected to permit molecular interactions similar to those of the natural molecule. These principles may be used, in conjunction with the principles outlined above, to engineer second-generation modified protein molecules having many of the natural properties of a native protein, but with altered and, in some cases, even improved characteristics. As is known to one of ordinary skill in the art, mutations by amino acid sequence variants may be used to generate synthetic peptides.
3. Fusion Proteins
A specialized kind of insertional variant is the fusion protein. It is contemplated that some of the chimeric polypeptides of the invention may be considered fusion proteins. This molecule generally has all or a substantial portion of the native molecule, linked at the N- or C- terminus, to all or a portion of a second polypeptide, such as a second nucleic acid binding domain.
Another example is a fusion that contains leader sequences from other species to permit the recombinant expression of a protein in a heterologous host. Another useful fusion includes the addition of an immunologically active domain, such as an antibody epitope or other tag, to facilitate targeting or purification of the fusion protein. The use of 6xHis and GST (glutathione S-transferase) as tags is well known. Inclusion of a cleavage site at or near the fusion junction will facilitate removal of the extraneous polypeptide after purification. Other useful fusions include linking of functional domains, such as active sites from enzymes such as a hydrolase, glycosylation domains, cellular targeting signals or transmembrane regions.
4. Linkers/Coupling Agents
It can be considered as a general guideline that any biochemical cross-linker that is appropriate for use with an endonuclease will also be of use in the present context, and additional linkers may also be considered. Cross-linking reagents are used to form molecular bridges that tie together functional groups of two different molecules, e.g., a stabilizing and coagulating agent. To link two different proteins in a step-wise manner, hetero-bifunctional cross-linkers can be used that eliminate unwanted homopolymer formation. It is contemplated that cross-linkers may be implemented with the modified protein molecules of the invention. Bifunctional cross-linking reagents have been extensively used for a variety of purposes, including preparation of affinity matrices, modification and stabilization of diverse structures, identification of binding sites, and structural studies. In the context of the invention, such cross-linkers may be used to stabilize the polypeptide or to render it more useful as a gene-specific reagent, for example, by improving the modified protein's targeting capability or overall efficacy. Cross-linkers may also be cleavable, such as disulfides, acid-sensitive linkers, and others. Homobifunctional reagents that carry two identical functional groups may be used to induce efficient cross-linking between identical and different macromolecules or subunits of a macromolecule, and to link polypeptides to specific binding sites on binding partners. Heterobifunctional reagents contain two different functional groups. By taking advantage of the differential reactivities of the two different functional groups, cross-linking can be controlled both selectively and sequentially. The bifunctional cross-linking reagents can be divided according to the specificity of their functional groups, e.g., amino-, sulfhydryl-, guanidino-, indole-, or carboxyl-specific groups.
Of these, reagents directed to free amino groups have become especially popular because of their commercial availability, ease of synthesis and the mild reaction conditions under which they can be applied. A majority of heterobifunctional cross-linking reagents contain a primary amine-reactive group and a thiol-reactive group.
Exemplary methods for cross-linking ligands to liposomes are described in U.S. Patent 5,603,872 and U.S. Patent 5,401,511 (each specifically incorporated herein by reference in its entirety). Various ligands can be covalently bound to liposomal surfaces through the cross-linking of amine residues. In another example, heterobifunctional cross-linking reagents and methods of using them are described in U.S. Patent 5,889,155 (specifically incorporated herein by reference in its entirety). These cross-linking reagents combine a nucleophilic hydrazide residue with an electrophilic maleimide residue, allowing, in one example, the coupling of aldehydes to free thiols. The cross-linking reagent can be modified to cross-link various functional groups and is thus useful for cross-linking polypeptides and sugars. Table 3 details certain hetero-bifunctional cross-linkers considered useful in the present invention.
[Table 3: Hetero-bifunctional cross-linkers]
In instances where a particular polypeptide does not contain a residue amenable for a given cross-linking reagent in its native sequence, conservative genetic or synthetic amino acid changes in the primary sequence can be utilized.
C. Nucleic Acid Molecules
1. DNA Binding Domains of Homing Endonucleases
The present invention concerns independent domains from naturally occurring nucleic acids, as well as whole proteins encoded by naturally occurring nucleic acids, isolatable from cells, that are free from total genomic DNA and that are capable of expressing all or part of a protein or polypeptide. The polynucleotide may encode a native protein that may be manipulated to encode a modified protein. Alternatively, the polynucleotide may encode a modified protein, or it may encode a polynucleotide that will be used to make a fusion protein with a modified protein. It is contemplated that a single polynucleotide molecule may encode 1, 2, or more different polypeptides (all or part). Nucleic acids of the present invention may be used in expression systems to produce recombinant proteins that can be purified from expressing cells to yield active proteins.
As used herein, the term "nucleic acid" refers to a DNA molecule that has been isolated free of total genomic DNA of a particular species. Therefore, a nucleic acid encoding a polypeptide refers to a DNA that contains wild-type, polymorphic, or modified polypeptide-coding sequences yet is isolated away from, or purified free from, total genomic DNA. Included within the term "nucleic acid" are nucleic acids encoding nucleic acid binding proteins, portions of such proteins, and recombinant vectors, including, for example, plasmids, cosmids, phage, viruses, and the like. As used in this application, the term "polynucleotide" refers to a nucleic acid molecule that has been isolated free of total genomic nucleic acid. Therefore, a "polynucleotide encoding a native polypeptide" refers to a DNA segment that contains wild-type polypeptide-coding sequences isolated away from, or purified free from, total genomic DNA. The term "cDNA" is intended to refer to DNA prepared using messenger RNA (mRNA) as template. It also is contemplated that a particular polypeptide from a given species may be represented by natural variants that have slightly different nucleic acid sequences but nonetheless encode the same protein (due to wobble in codons).
Similarly, a polynucleotide comprising an isolated or purified wild-type, polymorphic, or modified polypeptide gene refers to a DNA segment including wild-type, polymorphic, or modified polypeptide coding sequences and, in certain aspects, regulatory sequences, isolated substantially away from other naturally occurring genes or protein encoding sequences. In this respect, the term "gene" is used for simplicity to refer to a functional protein, polypeptide, or peptide-encoding unit. As will be understood by those in the art, this functional term includes genomic sequences, cDNA sequences, and smaller engineered gene or cDNA segments that express, or may be adapted to express, proteins, polypeptides, domains, peptides, fusion proteins, and modified polypeptides. A nucleic acid encoding all or part of a native or modified polypeptide may contain a contiguous nucleic acid sequence encoding all or a portion of such a polypeptide of the following lengths: about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, or more nucleotides, nucleosides, or base pairs.
In particular embodiments, the invention concerns isolated DNA segments and recombinant vectors incorporating DNA sequences that encode a wild-type, polymorphic, or modified polypeptide or peptide that includes within its amino acid sequence a contiguous amino acid sequence in accordance with, or essentially corresponding to a native polypeptide. The term "recombinant" may be used in conjunction with a polypeptide or the name of a specific polypeptide, and this generally refers to a polypeptide produced from a nucleic acid molecule that has been manipulated in vitro or that is the replicated product of such a molecule.
In other embodiments, the invention concerns isolated DNA segments and recombinant vectors incorporating DNA sequences that encode a polypeptide or peptide that includes within its amino acid sequence a contiguous amino acid sequence in accordance with, or essentially corresponding to the polypeptide.
The nucleic acid segments used in the present invention, regardless of the length of the coding sequence itself, may be combined with other nucleic acid sequences, such as promoters, polyadenylation signals, additional restriction enzyme sites, multiple cloning sites, other coding segments, and the like, such that their overall length may vary considerably. It is therefore contemplated that a nucleic acid fragment of almost any length may be employed, with the total length preferably being limited by the ease of preparation and use in the intended recombinant DNA protocol.
It is contemplated that the nucleic acid constructs of the present invention may encode a full-length polypeptide from any source or encode a truncated version of one or more polypeptides, such as a nucleic acid binding domain without other domains. Alternatively, a nucleic acid sequence may encode a full-length polypeptide sequence with additional heterologous coding sequences, for example to allow for purification of the polypeptide, transport, secretion, post-translational modification, or for therapeutic benefits such as targeting or efficacy. As discussed above, a tag or other heterologous amino acid segments may be added to the modified polypeptide-encoding sequence, wherein "heterologous" refers to an amino acid segment from another polypeptide.
In a non-limiting example, one or more nucleic acid constructs may be prepared that include a contiguous stretch of nucleotides identical to or complementary to a particular gene, such as a homing endonuclease gene. A nucleic acid construct may be at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 250, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 15,000, 20,000, 30,000, 50,000, 100,000, 250,000, 500,000, 750,000, to at least 1,000,000 nucleotides in length, as well as constructs of greater size, up to and including chromosomal sizes (including all intermediate lengths and intermediate ranges), given that nucleic acid constructs such as yeast artificial chromosomes are known to those of ordinary skill in the art. It will be readily understood that "intermediate lengths" and "intermediate ranges," as used herein, mean any length or range including or between the quoted values (i.e., all integers including and between such values).
The DNA segments used in the present invention encompass biologically functional equivalent modified polypeptides and peptides, for example, a modified endonuclease. Such sequences may arise as a consequence of codon redundancy and functional equivalency that are known to occur naturally within nucleic acid sequences and the proteins thus encoded. Alternatively, functionally equivalent proteins or peptides may be created via the application of recombinant DNA technology, in which changes in the protein structure may be engineered, based on considerations of the properties of the amino acids being exchanged. Changes designed by a human may be introduced through the application of site-directed mutagenesis techniques.
V. Delivery and Uses of Modified Proteins
A. Vectors
It is contemplated by the present invention that virtually any type of vector may be employed in any known or later discovered method to deliver modified proteins of this invention. Thus, a vector in the context of the present invention refers to a carrier nucleic acid molecule into which a sequence encoding a native, modified or unmodified protein can be inserted for introduction into a cell, where it can be replicated. A nucleic acid sequence can be exogenous, meaning that it is foreign to the cell into which the vector is being introduced, or that it is homologous to a sequence in the cell but positioned within the host cell nucleic acid where the sequence is ordinarily not found. Vectors include but are not limited to plasmids; cosmids; viruses; artificial chromosomes such as YACs (yeast artificial chromosomes) and BACs (bacterial artificial chromosomes); and synthetic constructs such as linear/circular expression elements (LEEs/CEEs).
Viral vectors may be derived from viruses known to those of skill in the art, for example, bacteriophages and animal and plant viruses, including but not limited to adenovirus, vaccinia virus (Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al, 1988), adeno-associated virus (AAV) (Ridgeway, 1988; Baichwal and Sugden, 1986; Hermonat and Muzyczka, 1984), retrovirus and herpesvirus. Such viruses offer several features for use in gene transfer into various mammalian cells (Friedmann, 1989; Ridgeway, 1988; Baichwal and Sugden, 1986; Coupar et al, 1988; Horwich et al, 1990). One of skill in the art would be well equipped to construct a vector through standard recombinant techniques as described in Sambrook et al, 2001, Maniatis et al, 1990 and Ausubel et al, 1994, incorporated herein by reference.
In certain embodiments of the invention, cells containing the modified polypeptides of the present invention may be identified in vitro or in vivo by including a marker in the expression vector. Such markers would confer an identifiable change to the cell permitting easy identification of cells containing the expression vector. Generally, a selectable marker is one that confers a property that allows for selection. A positive selectable marker is one in which the presence of the marker allows for its selection, while a negative selectable marker is one in which its presence prevents its selection. An example of a positive selectable marker is a drug resistance marker.
Usually, the inclusion of a drug selection marker aids in the cloning and identification of transformants. For example, genes that confer resistance to neomycin, puromycin, hygromycin, zeocin and histidinol, as well as DHFR and GPT, are useful selectable markers. Some markers confer a phenotype that allows transformants to be discriminated under selective conditions. Other types of markers are also contemplated, including screenable markers such as GFP, whose detection is based on fluorescence. Alternatively, screenable enzymes such as herpes simplex virus thymidine kinase (tk) or chloramphenicol acetyltransferase (CAT) may be used. One of skill in the art would also know how to employ immunologic markers, possibly in conjunction with FACS analysis. The marker used is not believed to be important, so long as it is capable of being expressed simultaneously with the nucleic acid encoding a gene product. Further examples of selectable and screenable markers are well known to one of skill in the art.
B. Modified Polypeptides as Delivery Vehicles or Inhibitors
The present invention contemplates the use of modified nucleic acid binding polypeptides in two ways. They may serve as delivery vehicles that carry a molecule to a specific nucleic acid site. They may also act as inhibitors at a particular site.
Thus, the present invention further contemplates a number of means by which a molecule may be delivered to a cell, tissue or subject. Virtually any method by which nucleic acids can be introduced into a cell or an organism may be employed with the current invention, as described herein or as would be known to one of ordinary skill in the art. Such methods include, but are not limited to, direct delivery of DNA by:
- injection (U.S. Patents 5,994,624, 5,981,274, 5,945,100, 5,780,448, 5,736,524, 5,702,932, 5,656,610, 5,589,466 and 5,580,859, each incorporated herein by reference);
- microinjection (Harlan and Weintraub, 1985; U.S. Patent No. 5,789,215, incorporated herein by reference);
- electroporation (U.S. Patent No. 5,384,253, incorporated herein by reference; Tur-Kaspa et al, 1986; Potter et al, 1984);
- calcium phosphate precipitation (Graham and Van Der Eb, 1973; Chen and Okayama, 1987; Rippe et al, 1990);
- DEAE-dextran followed by polyethylene glycol (Gopal, 1985);
- direct sonic loading (Fechheimer et al, 1987);
- liposome-mediated transfection (Nicolau and Sene, 1982; Fraley et al, 1979; Nicolau et al, 1987; Wong et al, 1980; Kaneda et al, 1989; Kato et al, 1991);
- receptor-mediated transfection (Wu and Wu, 1987; Wu and Wu, 1988);
- microprojectile bombardment (PCT Application Nos. WO 94/09699 and 95/06128; U.S. Patents 5,610,042, 5,322,783, 5,563,055, 5,550,318, 5,538,877 and 5,538,880, each incorporated herein by reference);
- agitation with silicon carbide fibers (Kaeppler et al, 1990; U.S. Patents 5,302,523 and 5,464,765, each incorporated herein by reference);
- Agrobacterium-mediated transformation (U.S. Patents 5,591,616 and 5,563,055, each incorporated herein by reference);
- PEG-mediated transformation of protoplasts (Omirulleh et al, 1993; U.S. Patents 4,684,611 and 4,952,500, each incorporated herein by reference);
- desiccation/inhibition-mediated DNA uptake (Potrykus et al, 1985);
- or any combination of such methods.
As an inhibitor, a modified nucleic acid binding polypeptide can bind to the altered target site and inhibit or prevent transcription or translation of the downstream sequence.
VI. EXAMPLES
The following examples are included to demonstrate preferred embodiments of the invention. Those of skill in the art should appreciate that the techniques disclosed in these examples were discovered by the inventors to function well in the practice of the invention. They can therefore be considered to constitute preferred modes for its practice. However, in light of the present disclosure, those of skill in the art should also appreciate that many changes can be made in the specific embodiments disclosed while still obtaining a like or similar result, without departing from the spirit and scope of the invention.
EXAMPLE 1
The creation of E-DreI
The experimental strategy is outlined in FIG. 1. Structural modeling indicated that a novel chimaeric endonuclease could be created in three steps:
1. fuse the N-terminal domain of I-DmoI to an I-CreI monomer;
2. repack the new protein interface to facilitate efficient folding and intimate domain association; and
3. insert a short peptide linker to create an enzyme monomer.
The domain interface of the initial protein model was analyzed to identify 14 interface residues to be redesigned. An automated computational design protocol was then used to search through 8 x 10^17 possible interface sequences, with all amino acids except cysteine allowed at each position. Computational redesign predicted the best E-DreI interface variants (16 total constructs, each containing between 8 and 12 altered residues in the interface). These variants were then generated and screened in vivo to confirm proper folding and solubility. Biochemical characterization of several soluble E-DreI variant proteins revealed that each could bind and cleave a specific 23 bp chimeric DNA target site with high specificity. Finally, the co-crystal structure of one of these active variants bound to target site DNA was determined. This structure was used to assess the accuracy of the computational redesign method and to characterize the artificial endonuclease.
X-ray crystal structures of I-DmoI (Silva et al, 1999) and of I-CreI (Jurica et al, 1998; Chevalier et al, 2001) were available, which allowed a detailed starting model of E-DreI to be built. The N-terminal 'large' domain of I-DmoI was substituted for a single subunit of the I-CreI homodimer to create the initial scaffold for an enzyme chimaera. Superposition of the backbone atoms in the two conserved LAGLIDADG helices at the Dmo/Cre interface was used to orient and align the docked I-DmoI domain. Further analysis of this modeled protein revealed a large but unoptimized protein interface of approximately 1300 Å². The interface contained numerous steric side chain clashes and had a very unfavorable overall repulsive interface potential (Table 4). One clashing residue (L108) was located within a central LAGLIDADG helix in the domain interface. The remaining five (L47, H51, L55, K193 and L194) were located within the interface near the helices. The residues at these six positions were substituted with alanine (or, in one case, aspartate) to minimize steric clashes, in an attempt to promote the formation of a more stable domain interface. However, this alanine-substituted, side chain-truncated E-DreI variant proved to be insoluble. The most likely explanation was a failure to form a stable domain interface because of structural cavities. A complete automated redesign of the domain interface was therefore performed.
TABLE 4. PREDICTED CONTRIBUTIONS TO THE INTERFACE FREE ENERGY OF EACH STAGE OF DESIGN
TABLE CAPTION:
a) I-CreI crystal structure.
b) I-DmoI crystal structure.
c) Initial model of the I-DmoI - I-CreI chimera (no sequence changes).
d) Model of the I-DmoI - I-CreI chimera with side chains exhibiting significant van der Waals (vdW) overlaps in the interface truncated to alanine residues.
e) Computational design of E-DreI using a modeled backbone template.
f) E-DreI crystal structure.
Energies were computed by Tania Kortemme, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA; personal communication. LJattr: attractive component of the Lennard-Jones potential; LJrep: repulsive component of the Lennard-Jones potential; Sol: solvation energy; H-Bond: hydrogen bonding energy; ΔGtot: total interface free energy. Accessible surface area buried in the interface was computed with a probe radius of 1.4 Å using WHAT IF.
The computational interface redesign focused on the six residues exhibiting steric clashes in the original model. It was extended to include eight additional residues predicted to contribute substantially to the interface free energy (A12, Y13, L17, I19, I52, E105, Y109 and F113). At each of these fourteen sites, a library of possible side chain conformations was created. Each library spanned 19 potential amino acids (all but cysteine) in different backbone-dependent rotameric states, with on average about 500 rotamers per sequence position. Two sets of energies were then computed. The first was the interaction of every rotamer with the surrounding, fixed portion of the molecule, including the polypeptide backbone and all side chains not subjected to sequence design. The second was every pairwise rotamer-rotamer energy. Both used a free energy function that includes van der Waals interactions, solvation effects, explicit hydrogen-bonding interactions, and statistical terms representing the backbone-dependent internal free energies of amino acid rotamers (Tania Kortemme, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA; personal communication; Kuhlman and Baker, 2000).
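The sizes of the two search spaces follow directly from the numbers given here: 19 allowed amino acids at each of 14 positions, and roughly 500 rotamers per position. The short Python check below is an illustrative calculation, not part of the original design software. It reproduces the 8 x 10^17 sequence count and gives about 6 x 10^37 rotamer combinations. Because 500 is a stated average, the rotamer figure is only an order-of-magnitude estimate.

```python
# Order-of-magnitude check of the interface design search space.
N_POSITIONS = 14          # interface positions subjected to design
N_AMINO_ACIDS = 19        # all standard amino acids except cysteine
ROTAMERS_PER_SITE = 500   # approximate average number of rotamers per position

sequence_space = N_AMINO_ACIDS ** N_POSITIONS     # distinct amino acid sequences
rotamer_space = ROTAMERS_PER_SITE ** N_POSITIONS  # distinct rotamer assignments

print(f"sequences: {sequence_space:.1e}")  # sequences: 8.0e+17
print(f"rotamers:  {rotamer_space:.1e}")   # rotamers:  6.1e+37
```

At this scale exhaustive enumeration is impossible, which is why a stochastic search was needed before the final, much smaller set of candidates could be enumerated exhaustively.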
A Monte Carlo simulated annealing procedure was then used to search through the 8 x 10^17 sequence combinations (6 x 10^37 total rotamer combinations) for amino acid sequences with particularly low free energy. In this procedure, a move consists of the random replacement of a single rotamer with an alternative rotamer from the library. The Monte Carlo protocol does not guarantee finding the global free energy minimum. 1000 separate sequence design runs were therefore performed, using two polypeptide backbone models with slightly different relative orientations of the LAGLIDADG helices in the domain interface. This procedure yielded a family of sequences with different amino acid choices at each of the 14 design positions:
- native residues were consistently best at three of the 14 positions;
- a single new residue was consistently best at a fourth position; and
- two to five different residues were identified at each of the remaining ten positions, yielding a total of 51,840 possible combinations.
This set of solutions was further reduced in two ways. First, sequence changes likely to affect nearby active site residues were eliminated. Second, in some cases, structural redundancy was reduced: for example, if F and Y were both computationally selected at a position, only one was retained, depending on whether a neighboring atom could form an H-bond. These steps reduced the solutions from 35 possible amino acid substitutions over 10 positions to 25 substitutions over 9 positions, or 1152 position/residue combinations. In a final computational step, the interface free energies of all 1152 combinations were exhaustively enumerated using optimized rotamer conformations for each sequence. Consistently top-scoring interface free energies were found for sixteen different sequences with the following substitutions as compared with wild type: A12A or Y; Y13Y; L17L; I19W; L47L or W; H51H or F; I52I; L55R; E105R; A108A; Y109Y; F113I; K193N or Y; L194F.
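The Monte Carlo simulated annealing search can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the Kortemme/Baker design code. The one-body and pairwise energy tables are random placeholder numbers produced by a hypothetical helper, `random_problem`. The geometric cooling schedule, step count and temperatures are arbitrary choices. As in the text, a move replaces the rotamer at one randomly chosen position. Several independent runs are made because no single run is guaranteed to reach the global minimum.

```python
import itertools
import math
import random

def random_problem(n_pos, n_rot, seed=0):
    """Hypothetical energy tables: one-body terms and pairwise terms stored for i < j."""
    rng = random.Random(seed)
    e_self = [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_pos)]
    e_pair = {
        (i, j): [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_rot)]
        for i, j in itertools.combinations(range(n_pos), 2)
    }
    return e_self, e_pair

def pair_energy(e_pair, i, ri, j, rj):
    """Look up the energy of rotamer ri at position i paired with rotamer rj at j."""
    return e_pair[(i, j)][ri][rj] if i < j else e_pair[(j, i)][rj][ri]

def total_energy(assign, e_self, e_pair):
    """One-body terms (rotamer vs. fixed template) plus all pairwise rotamer terms."""
    energy = sum(e_self[i][r] for i, r in enumerate(assign))
    for i, j in itertools.combinations(range(len(assign)), 2):
        energy += e_pair[(i, j)][assign[i]][assign[j]]
    return energy

def anneal(e_self, e_pair, n_steps=20000, t_start=10.0, t_end=0.05, seed=None):
    """One simulated-annealing run; returns the lowest-energy assignment visited."""
    rng = random.Random(seed)
    n = len(e_self)
    assign = [rng.randrange(len(rots)) for rots in e_self]
    energy = total_energy(assign, e_self, e_pair)
    best, best_e = list(assign), energy
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        i = rng.randrange(n)                   # pick one design position
        new_r = rng.randrange(len(e_self[i]))  # propose an alternative rotamer
        old_r = assign[i]
        if new_r == old_r:
            continue
        # Only energy terms involving position i change.
        delta = e_self[i][new_r] - e_self[i][old_r]
        for j in range(n):
            if j != i:
                delta += (pair_energy(e_pair, i, new_r, j, assign[j])
                          - pair_energy(e_pair, i, old_r, j, assign[j]))
        # Metropolis acceptance criterion.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            assign[i] = new_r
            energy += delta
            if energy < best_e:
                best, best_e = list(assign), energy
    return best, best_e

if __name__ == "__main__":
    e_self, e_pair = random_problem(n_pos=6, n_rot=5, seed=0)
    runs = [anneal(e_self, e_pair, seed=s) for s in range(10)]  # independent runs
    best_assign, best_energy = min(runs, key=lambda run: run[1])
    print(best_assign, round(best_energy, 3))
```

Computing only the change in energy for the moved position keeps each step cheap. This is the reason the pairwise energies are precomputed, as described above.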
In addition to the redesign of the enzyme interface, a short peptide linker was inserted between the I-DmoI and I-CreI domains to generate a monomeric protein. The linker chosen for this purpose, -NGN-, resembled the -NMR- linker found in native I-DmoI. It contained glycine and asparagine residues in order to exploit the high β-turn propensity of NG- and GN-containing peptides (Hutchinson and Thornton, 1994). The 16 enzyme variants described above were generated by site-directed mutagenesis and screened for in vivo folding and solubility.

The solubility screen uses a blue/white colony color difference that reflects protein solubility-dependent LacZα complementation in E. coli (Wigley et al, 2001). To perform this screen, a lacZα peptide was fused to each E-DreI variant. Each fusion was then expressed in E. coli cells expressing a lacZω protein partner. Insoluble E-DreI/lacZα constructs form inclusion bodies, fail to complement lacZω, and give rise to white colonies. Conversely, soluble E-DreI/lacZα constructs complement lacZω and give rise to blue colonies on X-gal indicator plates (FIG. 1).

For protein expression and purification, the following protocol was used:
1. Constructs encoding E-DreI were subcloned into the pI-CreI vector, transformed into BL21[DE3] E. coli cells, and induced with 0.5 mM IPTG overnight at 15°C.
2. Cells were harvested by centrifugation and lysed by sonication in 50 mM Tris pH 8.0, 100 mM NaCl and 1 mM CaCl2.
3. Cell debris was removed by centrifugation (40,000g, 45 min, 4°C).
4. The supernatant was forced through a 0.2 μm syringe filter and applied to a heparin column (Pharmacia). E-DreI was eluted with an increasing salt gradient.
5. Collected fractions from the single peak were diluted with an equal volume of 50 mM Tris pH 8.0 and loaded back onto the heparin column. After this second elution, E-DreI was >95% pure (SDS-PAGE).
6. The protein was dialyzed overnight into 125 mM NaCl, 50 mM Tris pH 8.0, 1 mM CaCl2 and 5% glycerol, concentrated to ~4 mg/ml by centrifugation (Centriprep, Millipore), and stored at -80°C.
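The count of 16 variants follows from the final substitution list in the design description. Four positions (12, 47, 51 and 193) each admit two residues, and every other position admits one, giving 2^4 = 16 sequences. The sketch below enumerates them with `itertools.product`. The dictionary is simply a transcription of that list, keyed by residue number. The final check confirms that the eight mutations reported for variant Drel6 in Example 2 correspond to one of the 16 combinations.

```python
import itertools

# Allowed residue(s) at each designed position, transcribed from the final
# substitution list (keys are residue numbers; values are one-letter codes).
DESIGN_CHOICES = {
    12: "AY", 13: "Y", 17: "L", 19: "W", 47: "LW", 51: "HF", 52: "I",
    55: "R", 105: "R", 108: "A", 109: "Y", 113: "I", 193: "NY", 194: "F",
}

def enumerate_variants(choices):
    """Yield every combination of allowed residues as {position: residue}."""
    positions = sorted(choices)
    for residues in itertools.product(*(choices[p] for p in positions)):
        yield dict(zip(positions, residues))

variants = list(enumerate_variants(DESIGN_CHOICES))
print(len(variants))  # 16

# Drel6 (I19W, H51F, L55R, E105R, L108A, F113I, K193N, L194F) keeps A12 and
# L47, and carries F51 and N193 at the two-way positions.
drel6 = {12: "A", 47: "L", 51: "F", 193: "N"}
print(any(all(v[p] == aa for p, aa in drel6.items()) for v in variants))  # True
```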
The alanine-truncated version of E-DreI gave rise exclusively to white colonies in this protein folding and solubility screen. In contrast, different E-DreI/lacZα variants with computationally redesigned interfaces gave strongly lacZ-positive blue colonies when expressed. These colonies were indistinguishable from those generated by an I-CreI/lacZα positive control. The interface residue substitutions indicated by computational redesign were incorporated over several rounds of site-directed mutagenesis. The fraction and intensity of blue colonies increased with the incorporation of each successive substitution. The most soluble E-DreI constructs displayed predicted interface free energies similar to those of the parent structures I-CreI and I-DmoI. By contrast, the initial and alanine-truncated forms of E-DreI had poor predicted interface free energies and were largely insoluble (Table 4). These results indicate that the computational interface redesign protocol generated several protein variants that are able to fold.
EXAMPLE 2
E-DreI is a novel endonuclease with altered specificity
To determine the binding and catalytic activities of E-DreI on different DNA target sites, three of the most highly soluble E-DreI variants identified by the in vivo protein solubility assay were over-expressed and purified as described above. All three proteins were soluble and easily purified by heparin affinity and size exclusion chromatography. All three were stable at 4°C at a concentration of ~5 mg/ml in buffer containing 5% glycerol, 150 mM NaCl, 1 mM CaCl2 and 50 mM Tris pH 8.0. E-DreI is a two-domain chimeric monomer composed of I-DmoI and I-CreI domains. It was therefore reasoned that the most likely E-DreI target site would be a chimera of the I-DmoI and I-CreI target sites (FIG. 2A). The two native homing sites (termed 'dmo' and 'cre', respectively) can be considered as four distinct half-sites. The center of each target site is defined by the middle of the four-base overhang generated upon cleavage. The native dmo site is asymmetric, and its two half-sites are referred to here as D1 and D2. The cre site is pseudo-palindromic, and its two half-sites are referred to as C1 and C1'. Four chimeric sites can be generated from these four half-sites (FIG. 2A). These sites were termed dre1 (D1:C1), dre2 (D1:C1'), dre3 (D2:C1) and dre4 (D2:C1').
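The logic of the chimeric-site experiment can be made explicit with a small sketch. It builds dre1 to dre4 by pairing each dmo half-site with each cre half-site, in the order used in the text. It then infers which half-sites each domain accepts from the cleavage results reported below: dre3 and dre4 are cut, dre1 and dre2 are not. This inference rests on a simplifying assumption, namely that a site is cut only when both of its half-sites are recognized. The half-site labels stand in for the actual sequences, which are not reproduced here.

```python
import itertools

DMO_HALF_SITES = ("D1", "D2")   # asymmetric I-DmoI homing site
CRE_HALF_SITES = ("C1", "C1'")  # pseudo-palindromic I-CreI homing site

# dre1 = D1:C1, dre2 = D1:C1', dre3 = D2:C1, dre4 = D2:C1'
CHIMERIC_SITES = {
    f"dre{k}": pair
    for k, pair in enumerate(itertools.product(DMO_HALF_SITES, CRE_HALF_SITES), start=1)
}

def infer_half_sites(sites, cleaved):
    """Infer recognized half-sites, assuming a cut requires both halves to be recognized."""
    dmo = {sites[name][0] for name in cleaved}
    cre = {sites[name][1] for name in cleaved}
    # Every uncut site must contain at least one half-site outside the recognized sets.
    for name, (d, c) in sites.items():
        if name not in cleaved and d in dmo and c in cre:
            raise ValueError(f"{name} has two recognized halves but was not cut")
    return dmo, cre

dmo, cre = infer_half_sites(CHIMERIC_SITES, cleaved={"dre3", "dre4"})
print(sorted(dmo), sorted(cre))  # ['D2'] ['C1', "C1'"]
```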
Protein binding affinity for DNA target sites was measured by gel shift analyses (also referred to as electrophoretic migration retardation analyses). These assays used labeled oligonucleotide substrate in the presence of 20 mM Tris pH 9.0, 10 mM calcium chloride, 1 mM DTT and 50 μg/ml BSA. Gels were imaged on a Storm Phosphorimager 840 (Molecular Dynamics, Sunnyvale, CA).
Cleavage activity was also evaluated. Digestion of labeled oligonucleotide substrates was performed under the following conditions:
- I-CreI digests: I-CreI buffer (20 mM Tris pH 9.0, 10 mM MgCl2, 1 mM DTT, 50 μg/ml BSA) at 65°C;
- I-DmoI digests: New England Biolabs Buffer 4 containing 50 μg/ml BSA at 37°C;
- E-DreI digests: New England Biolabs Buffer 4 containing 50 μg/ml BSA at 65°C.
5' end-labeled primers and template DNA (a dre3 target site cloned into pBSIISK+) were used to generate both the sequencing ladders and the dsDNA substrates for E-DreI cleavage. Denatured E-DreI-digested substrates (labeled 'X' in FIG. 2C) were run alongside their corresponding sequencing reactions to map cleavage positions. Gels were imaged on a Storm Phosphorimager 840 (Molecular Dynamics, Sunnyvale, CA).

Each of the three purified E-DreI variants cleaved target sites dre3 and dre4. None cleaved the dre1 or dre2 target sites or the native dmo or cre target sites (FIG. 2B). Conversely, purified I-DmoI or I-CreI did not cleave any of the four dre target sites. The dre3 and dre4 sites each contain the same dmo half-site and one of the two cre half-sites. Thus the N-terminal domain of I-DmoI recognizes only the D2 dmo half-site, a preference that was not known at the outset of this project. The C-terminal domain from I-CreI recognizes either cre half-site, as expected for a domain from an endonuclease homodimer (FIG. 2B). The biochemical behavior of E-DreI on these different DNA target sites indicates that E-DreI is a novel, highly sequence-specific endonuclease with altered DNA target site specificity.
All three E-DreI constructs behaved identically in binding and cleavage assays, so a single variant, Drel6, was chosen for more thorough characterization. This E-DreI variant contains eight computationally designed point mutations at the domain interface (I19W, H51F, L55R, E105R, L108A, F113I, K193N, L194F).

Drel6 cleaves its target site precisely at one phosphodiester bond on each DNA strand. The two cleavage positions are separated by four base pairs in the target site DNA, generating four-base, 3'-extended cohesive ends (FIG. 2C). This end geometry is identical to that of all other characterized LAGLIDADG homing endonucleases.

Drel6 displays a dissociation constant (Kd) of 100 ± 5 nM as determined by gel shift assays. This is approximately two orders of magnitude higher (that is, weaker binding) than the 1 nM dissociation constant of native I-CreI (Wang et al, 1997). Like I-CreI, E-DreI forms a tight complex with cleavage products in which product dissociation is rate-limiting. This prevents direct determination of steady-state kcat and KM values. However, the estimated single-turnover catalytic rate (kcat*) (Halford et al, 1980) of E-DreI is nearly identical to that of native I-CreI: kcat* = 0.04 min^-1 for E-DreI vs. 0.03 min^-1 for I-CreI (Meg Chadsey, University of Washington, Seattle, Washington USA; Kathryn M. Stephens, University of Washington, Seattle, Washington USA; Raymond Monnat, University of Washington, Seattle, Washington USA; Monique Turmel, University of Laval, Quebec, Quebec Canada; Claude Lemieux, University of Laval, Quebec, Quebec Canada; personal communication).
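The reported constants can be put in more intuitive terms with two textbook relations. The sketch below assumes simple 1:1 binding with protein in large excess over labeled DNA, so that fraction bound = [P]/([P] + Kd). It also assumes first-order single-turnover cleavage, so that fraction cut = 1 - exp(-k t). These idealized models are illustrative assumptions, not necessarily the fitting procedures used for the measurements. The 100 nM protein concentration in the example lines is arbitrary.

```python
import math

def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fraction of DNA bound for 1:1 binding with protein in large excess."""
    return protein_nM / (protein_nM + kd_nM)

def fraction_cleaved(k_per_min: float, t_min: float) -> float:
    """Single-turnover, first-order cleavage: 1 - exp(-k t)."""
    return 1.0 - math.exp(-k_per_min * t_min)

def half_time(k_per_min: float) -> float:
    """Time to cleave half of the substrate under single-turnover conditions."""
    return math.log(2) / k_per_min

# Kd values from the text: ~100 nM (Drel6) vs ~1 nM (I-CreI), at 100 nM protein.
print(f"{fraction_bound(100, 100):.2f}")  # 0.50
print(f"{fraction_bound(100, 1):.2f}")    # 0.99
# kcat* values from the text: 0.04 min^-1 (E-DreI) vs 0.03 min^-1 (I-CreI).
print(f"{half_time(0.04):.1f} min")  # 17.3 min
print(f"{half_time(0.03):.1f} min")  # 23.1 min
```

Under these assumptions, the 100-fold weaker binding matters mainly at low protein concentrations. The single-turnover half-times of the two enzymes differ by only a few minutes.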
EXAMPLE 3
E-DreI structural analysis
The structure of E-DreI bound to its DNA target site was determined for three purposes: to visualize the redesigned protein interface, to assess the accuracy of the computational interface prediction, and to visualize the endonuclease-DNA interface and its active sites. The structure was determined by X-ray crystallography using data collected to 2.4 Å resolution at Advanced Light Source synchrotron beamline 5.0.2 (Rwork/Rfree = 0.231/0.256; FIG. 1). Each asymmetric unit of the P3₁ unit cell contains four copies of the E-DreI/DNA complex. Two are well ordered and have an average B-factor of 35 Å². The remaining two are poorly ordered and have been modeled as poly-alanine/DNA (average B ~110 Å²). In the well-ordered complexes, density is present for all residues except 1-4 and 253-260. These residues are similarly disordered in the I-DmoI and I-CreI structures, respectively (Jurica et al, 1998; Silva et al, 1999; Chevalier et al, 2001).
The general topology of the E-DreI structure and its domain interface was similar to those found in the previously determined structures of the parental endonucleases. The conserved core LAGLIDADG helices pack tightly against one another, and the redesigned residues cluster to either side of these helices to form a well-packed domain interface (FIG. 3). The buried surface area of the interface in the E-DreI structure (1460 Å²) is comparable to that in I-DmoI (1430 Å²) and I-CreI (1870 Å²), and is in the size range of typical protein-protein interfaces (Conte et al, 1999).
The crystal structure of E-DreI correlated well with the initial structural model (FIG. 4A): the Cα RMSD between predicted and actual structures is 0.8 Å. The greatest divergence is in the DNA-binding loops between β1/β2 and β3/β4 in the domain originating from I-DmoI. However, the model of these loops was derived from an I-DmoI structure lacking bound DNA, and this part of the DNA interface undergoes conformational changes upon DNA binding. Slight differences are also noticeable at the C-terminal end of α3 and the N-terminal end of α4, which are connected by the linker between the two domains. In the crystal structure, the ends of these helices are closer to one another (by ~0.4 Å) than in the model. Predicting the exact position of the top of α4 was complicated by a slight divergence (about 1.4 Å) in the backbone positions of the tops of the LAGLIDADG helices of I-DmoI and I-CreI. In E-DreI, helix α4 assumes a position intermediate between those of the same helix in native I-DmoI and I-CreI, and the slight movement of α3 helps accommodate this fit. The linker sequence -NGN- (residues 102-104) is packed against the top of the protein, and its side chains are easily seen.
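A Cα RMSD such as the 0.8 Å figure is normally computed after optimal rigid-body superposition of matched Cα coordinates. The sketch below uses the Kabsch algorithm with NumPy. It is a generic illustration: the text does not state which superposition program was used. It also assumes the two coordinate sets are already matched residue by residue. The coordinates in the example are random stand-ins, not E-DreI data.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched N x 3 coordinate sets after optimally superposing P onto Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P_c = P - P.mean(axis=0)  # remove translation
    Q_c = Q - Q.mean(axis=0)
    H = P_c.T @ Q_c           # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so that R is a proper rotation.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P_c @ R.T - Q_c    # rotate every point of P_c, then compare with Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Hypothetical "model" coordinates and a rotated, shifted copy as the "crystal".
rng = np.random.default_rng(0)
model = rng.normal(size=(50, 3)) * 10.0
theta = 0.6
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
crystal = model @ rot.T + np.array([5.0, -3.0, 2.0])
print(round(kabsch_rmsd(model, crystal), 6))  # 0.0
```

Superposing first means the RMSD reports only differences in internal geometry, such as the displaced β-hairpin loops described above, rather than overall placement.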
The computational model accurately predicted the E-DreI interface structure (FIGS. 4B-4E). The apparent rigidity of the protein backbone in the interface undoubtedly facilitated the agreement of predicted side chain conformations with the actual structure. Overall, the side chains in the designed and experimental interfaces superimpose well, including both conserved and substantially altered residues.
The high-resolution crystal structure provided an opportunity to further analyze the factors contributing to interface stabilization and the conformations of individual side chains. Computational alanine scanning (Tania Kortemme, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA; personal communication) was performed on all residues in the interface area. Each residue was separately mutated in silico to alanine, and the effect of the mutation on the interface free energy was computed (Table 5).

Consistent with the design strategy, all residues substituted in the design procedure except two (R55 and N193) are predicted to contribute significantly to the interface free energy of the E-DreI crystal structure. These two exceptions are most likely due to subtle template differences between the model and the X-ray structure, as the discrepancies disappear when the crystal structure is used to repeat the design procedure.

The computational alanine scan suggests that residues Y13, W19 and F194 are vital energetic "hot spots" for interface stabilization (Table 5):
- Y13 forms hydrogen bonds across the interface to both D110 and N193 (FIG. 4C).
- W19 interacts with three residues across the domain interface. Its main interaction, as designed, is stacking with F151. It also forms an H-bond with Q144 and a loose cation-π interaction with R148 (FIG. 4D). These last two interactions were not predicted, because Q144 and R148 are adjacent to an active site and were therefore excluded from repacking calculations (see below).
- F194 reaches across the interface into a hydrophobic pocket created by L17, L47, F51 and I52 (FIG. 4E).
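The in silico alanine scan follows a simple pattern. For each interface position, the side chain is replaced with alanine, the interface free energy is recomputed, and ΔΔG = ΔG(mutant) - ΔG(wild type) is recorded. The sketch below shows only that bookkeeping. The energy function is a stand-in: `toy_interface_energy` and its per-residue numbers are hypothetical placeholders, not the Table 5 values. The 1.0 kcal/mol hot-spot cutoff is likewise an assumed threshold, not one stated in the text.

```python
from typing import Callable, Dict, Iterable, List, Mapping

ResidueMap = Mapping[int, str]  # residue number -> one-letter amino acid code

def alanine_scan(seq: ResidueMap,
                 interface_energy: Callable[[ResidueMap], float],
                 positions: Iterable[int]) -> Dict[int, float]:
    """Return ddG = dG_int(X -> Ala) - dG_int(wild type) for each position."""
    wild_type = interface_energy(seq)
    ddg = {}
    for pos in positions:
        if seq[pos] == "A":
            continue  # already alanine; nothing to truncate
        mutant = dict(seq)
        mutant[pos] = "A"
        ddg[pos] = interface_energy(mutant) - wild_type
    return ddg

def hot_spots(ddg: Mapping[int, float], threshold: float = 1.0) -> List[int]:
    """Positions whose truncation destabilizes the interface by >= threshold (assumed cutoff)."""
    return sorted(pos for pos, value in ddg.items() if value >= threshold)

# Hypothetical per-side-chain contributions (kcal/mol); placeholders only.
TOY_CONTRIBUTIONS = {1: -2.4, 2: -0.3, 3: -1.6, 4: 0.2}

def toy_interface_energy(seq: ResidueMap) -> float:
    """Additive toy model: each non-alanine side chain adds its contribution."""
    return sum(TOY_CONTRIBUTIONS[p] for p, aa in seq.items() if aa != "A")

toy_seq = {1: "W", 2: "R", 3: "F", 4: "N"}
ddg = alanine_scan(toy_seq, toy_interface_energy, toy_seq)
print(hot_spots(ddg))  # [1, 3]
```

A real scan would pass a physics-based interface energy function in place of the toy model. The scan logic itself does not change.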
TABLE 5. PREDICTED SIDE CHAIN CONTRIBUTIONS OF RESIDUES IN THE INTERFACE VICINITY TO THE INTERFACE FREE ENERGY FOR E-DREI
Table caption:
a) E-DreI crystal structure.
b) Computational design of the E-DreI sequence using a modeled backbone template.
c) Computational design of the E-DreI sequence using the backbone of the crystal structure.
ΔΔGint values are predicted changes in interface free energy for a side chain mutation to alanine, computed as described elsewhere (Tania Kortemme, University of Washington, Seattle, WA, USA and David Baker, University of Washington, Seattle, WA, USA). Positions subjected to sequence design are indicated in bold. Positions with sequence changes in the final E-DreI sequence compared to the parent sequences are highlighted in red. Designed alanine residues (A12A and L108A) are not shown.
A fundamental assumption of the engineering strategy (FIG. 1) was that the two independent DNA-binding domains in E-DreI would continue to recognize and bind their respective native DNA half-sites. The crystal structure bore out this assumption. All substrate contacts made by E-DreI originate from β-sheets in the major groove. The contacts made across the cre half-site of the E-DreI target closely resemble those previously documented in I-CreI/DNA cocrystal structures (FIG. 5) (Jurica et al, 1998; Chevalier et al, 2001).

The two half-site interfaces contact DNA as follows:
- Cre half-site: the contacting residues include two arginines, three glutamines, an asparagine and a tyrosine.
- Dmo half-site, which had not been visualized previously: direct contacts to DNA bases are made by four arginines, two acidic residues (Asp and Glu), a tyrosine and a threonine. The I-DmoI domain also makes two base-specific contacts from the protein backbone, as well as a stacking interaction between a thymine methyl group and a tyrosine ring.

As observed in all other homing endonuclease structures, the DNA-protein interface of E-DreI is under-saturated with H-bonds. Of 92 potential H-bonds that could be made in the major groove of the 23 base pair interface, only 32 direct and 16 water-mediated contacts were observed (FIG. 5). These two DNA-protein interfaces illustrate the diversity of DNA-protein contacts employed by LAGLIDADG endonucleases.
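The degree of under-saturation follows from the counts just given. The calculation treats each observed contact as satisfying one of the 92 potential H-bonds, which is a simplifying assumption:

```latex
\frac{32 + 16}{92} = \frac{48}{92} \approx 0.52,
\qquad
\frac{32}{92} \approx 0.35
```

In other words, roughly half of the potential major-groove H-bonds are satisfied when water-mediated contacts are counted, and about a third by direct contacts alone.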
It was less clear at the outset of the project that successful redesign of the E-DreI protein interface would generate an active enzyme. The predicted active sites of E-DreI lie directly at the bottom of the redesigned protein interface and might be structurally perturbed during the engineering process. Surprisingly, all of the purified soluble E-DreI variants retained catalytic activities comparable to those of the parent endonucleases.

Activity appears to be retained because the E-DreI active sites recapitulate LAGLIDADG active site architecture (FIG. 6). This architecture consists of:
- a strictly conserved acidic residue at the base of each LAGLIDADG helix;
- three catalytic metals bound between the two active sites by these acidic residues; and
- nonspecific "pockets" in each endonuclease domain that accommodate and order bulk water around each scissile phosphate.
The ordering of water molecules appears to be important for catalysis in I-CreI. In that enzyme, protein side chains make no direct contact with the catalytic nucleophiles, the scissile phosphates or the leaving groups during catalysis (Chevalier et al, 2001).

The two active sites of E-DreI contain three Mg2+ ions, with the central metal shared by both active sites. These metal ions are coordinated by D21 and D117, each of which is located at the base of a LAGLIDADG interface helix. This is similar to what had been previously observed for I-CreI bound to target site DNA (Chevalier et al, 2001). Each E-DreI active site contains residues from both domains; thus, the E-DreI active sites are themselves chimeras. The substrate DNA in the crystallized E-DreI complex was not cleaved because of the combination of low temperature and low pH during crystal growth. During the redesign process, catalytically important residues were excluded from redesign. These included the metal-coordinating active site aspartates and residues at the periphery of the active sites (e.g., Q42, Q144, R148 and K195).
However, the cocrystal structure of E-DreI bound to target DNA revealed that W19, which was designed to stack against F151 in the protein interface, makes two important, unanticipated interactions with active site residues. The first is an H-bond across the protein interface with Q144. The second is a loose cation-π interaction with R148 (FIG. 4D). These interactions did not impede metal binding or catalysis.
EXAMPLE 4
Creating a Modified Nuclease
A nuclease consisting of either a naturally occurring protein or an artificial chimeric protein (described in Examples 1 to 3 above) can be modified in a variety of ways to alter the activities it delivers to its DNA target sequence. The initial nuclease constructs generally create a double-strand break at or near the target site. The protein can be modified by mutagenesis so that any of its active sites are independently inactivated or functionally attenuated. There are usually two active sites, but there may be more in more complex protein oligomers. As a result, one or more sites on each DNA strand are not cleaved, or are cleaved independently to varying extents. The effects of mutations within the active sites of homing endonucleases have been extensively investigated and characterized through a combination of enzymatic and structural analyses in the Stoddard and Monnat laboratories. These data can be used to design independently attenuated active sites in modified nuclease proteins (see, for example, Galburt et al, 2000; Argast et al, 1998; Galburt, 1999; Chevalier et al, 2001; Flick, 1998; Heath, 1997; Stephens et al, 1997; and Flick et al, 1997).
A natural or redesigned nuclease, including chimeric nucleases, can also be modified by adding or eliminating specific surface residues, including, but not limited to, cysteines, lysines, arginines and histidines. A wide variety of specific chemical compounds may be attached to these residues covalently or non-covalently. These compounds may include, but are not limited to, fluorophores, chromophores, cross-linking reagents, differentially incorporated isotope labels, radiolabeled compounds, metal chelating agents and other reactive species.
A natural or redesigned nuclease, including chimeric nucleases, can also be modified by adding one or more additional protein domains to the amino or carboxyl terminus of the modified protein. Such additional protein domains can include, but are not limited to, enzymatic catalysts, DNA-binding proteins, transcriptional activators or repressors, DNA-remodeling factors, protein binding modules, membrane-binding proteins and/or protein fluorophores.
A natural or redesigned nuclease, including chimeric nucleases, can also be modified by repacking and remodeling its core in order to alter its thermal and/or chemical stability, or to introduce metal-binding sites within its structure.
EXAMPLE 5
Creating a Modified Nuclease with Altered Site-Specific Activity
A nuclease consisting of either a naturally occurring protein or an artificial chimeric protein (described in examples 1 to 3 above) can be modified in a variety of ways to alter its DNA target specificity, beyond any alteration produced by recombining nuclease domains. Using the structure of a nuclease/DNA complex as a starting model, computational methods similar to those described for the redesign of a protein-protein interface can be used to redesign the protein-DNA interface, choosing an altered set of amino acids within that interface to recognize a novel DNA target sequence. The computational methods can be adapted to such an interface, for example by altering the energetic weighting of various structural and chemical interactions, by accommodating and modeling DNA conformational dynamics, and/or by incorporating explicitly modeled solvent molecules into the computational redesign algorithms.
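The interface redesign step can be illustrated with a generic rotamer-based design loop. The following is a minimal, hypothetical sketch, not the inventors' software or energy function: each designable interface position offers a discrete set of candidate rotamers (each implying an amino acid identity), the energy is split into one-body terms (a rotamer against the fixed backbone, the DNA, and non-designed side chains) and pairwise rotamer-rotamer terms, and a simulated-annealing Metropolis Monte Carlo search looks for a low-energy combination. The names `total_energy` and `anneal`, the cooling schedule, and all energies in the example are illustrative assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Key (i, ri, j, rj) with i < j: interaction energy between rotamer ri at
# position i and rotamer rj at position j. Missing keys count as 0.
PairTable = Dict[Tuple[int, int, int, int], float]


def total_energy(assign: Sequence[int], one_body: List[List[float]],
                 pair: PairTable) -> float:
    """Sum of one-body terms (rotamer vs. fixed template, including DNA)
    and pairwise rotamer-rotamer terms. Lower is better."""
    energy = sum(one_body[i][r] for i, r in enumerate(assign))
    for i in range(len(assign)):
        for j in range(i + 1, len(assign)):
            energy += pair.get((i, assign[i], j, assign[j]), 0.0)
    return energy


def anneal(one_body: List[List[float]], pair: PairTable, steps: int = 20000,
           t_start: float = 5.0, t_end: float = 0.05,
           seed: int = 0) -> Tuple[List[int], float]:
    """Metropolis Monte Carlo with a geometric cooling schedule.

    Each move reassigns the rotamer at one randomly chosen position.
    The lowest-energy assignment seen during the run is returned.
    """
    rng = random.Random(seed)
    assign = [rng.randrange(len(options)) for options in one_body]
    energy = total_energy(assign, one_body, pair)
    best, best_energy = list(assign), energy
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        pos = rng.randrange(len(assign))
        trial = list(assign)
        trial[pos] = rng.randrange(len(one_body[pos]))
        # Full recomputation keeps the sketch short; a real design code
        # would update only the terms that involve `pos`.
        trial_energy = total_energy(trial, one_body, pair)
        delta = trial_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            assign, energy = trial, trial_energy
            if energy < best_energy:
                best, best_energy = list(assign), energy
    return best, best_energy


if __name__ == "__main__":
    # Toy problem: three designable positions, two candidate rotamers each.
    one_body = [[0.0, -1.0], [0.0, -1.0], [-0.5, 0.0]]
    pair = {(0, 1, 1, 1): 3.0,    # steric clash between two favored rotamers
            (1, 1, 2, 0): -2.0}   # favorable contact, e.g. a hydrogen bond
    print(anneal(one_body, pair))  # ([0, 1, 0], -3.5)
```

In practice the one-body and pair tables would be precomputed from a backbone-dependent rotamer library and an energy function; reweighting interaction types would change how those tables are computed, whereas modeling DNA flexibility or explicit water would require degrees of freedom beyond this sketch.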
Independently or in concert with computational redesign of the protein surface at the DNA interface, a variety of in vivo and in vitro selection methods can be used to select, from a partially or completely randomized library of nuclease protein variants, a nuclease that recognizes a DNA target site of the investigator's choosing. These selection methods can be modified or extended to ensure that the selected nuclease variant does not also bind alternative target sites, likewise chosen by the investigator. Selection methods can include, but are not limited to, screens based on DNA binding, on DNA binding and nuclease cleavage activity, on gene activation or inactivation, or on any other biochemical or physiological process that depends on the presence and/or action of the modified protein at a targeted DNA sequence. The screens can be performed in solution, in encapsulated solution systems, or in living prokaryotic or eukaryotic cells. Selection experiments can be performed in conjunction with computational redesign algorithms or independently of them; computational screens can be used to help direct the first generation of selected nuclease variants, or can be used independently of selection experiments.
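Selecting for activity at a chosen site while selecting against alternative sites can be summarized as a filter over a variant library. The following is a hypothetical sketch: it assumes each variant has already been given a normalized activity score (from any binding, cleavage, or reporter screen) at the intended target and at one or more counter-target sites. The function name `select_variants`, the variant and site labels, and the thresholds are illustrative, not values from the disclosure.

```python
from typing import List, Mapping, Sequence


def select_variants(activity: Mapping[str, Mapping[str, float]],
                    target: str,
                    counter_targets: Sequence[str],
                    min_on: float,
                    max_off: float) -> List[str]:
    """Positive selection on `target`, negative selection against every
    site in `counter_targets`.

    `activity[variant][site]` is a normalized binding or cleavage signal;
    a site missing from a variant's profile is treated as 0 (no activity).
    Survivors are ranked by on-target minus worst off-target signal.
    """
    survivors = []
    for name, profile in activity.items():
        on = profile.get(target, 0.0)
        off = max((profile.get(site, 0.0) for site in counter_targets),
                  default=0.0)
        if on >= min_on and off <= max_off:
            survivors.append((on - off, name))
    # Highest margin first; ties broken alphabetically for reproducibility.
    survivors.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in survivors]


if __name__ == "__main__":
    library = {
        "v1": {"target": 0.90, "off1": 0.10},
        "v2": {"target": 0.95, "off1": 0.80},  # active but promiscuous
        "v3": {"target": 0.20, "off1": 0.00},  # too weak on target
        "v4": {"target": 0.70},                # no detectable off-target signal
    }
    print(select_variants(library, "target", ["off1"], min_on=0.5, max_off=0.2))
    # prints ['v1', 'v4']
```

Ranking by the on-target minus off-target margin is one simple choice; a ratio or a separate threshold per counter-target site would serve equally well depending on how the screen is read out.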
All of the compositions and/or methods and/or apparatus disclosed and claimed herein can be made and executed without undue experimentation in light of the present disclosure.
While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and/or apparatus and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention.
More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the invention as defined by the appended claims.
REFERENCES
The following references, to the extent that they provide exemplary procedural or other details supplementary to those set forth herein, are specifically incorporated herein by reference.
U.S. Patent 4,554,101
U.S. Patent 4,684,611
U.S. Patent 4,952,500
U.S. Patent 5,302,523
U.S. Patent 5,322,783
U.S. Patent 5,384,253
U.S. Patent 5,401,511
U.S. Patent 5,464,765
U.S. Patent 5,538,877
U.S. Patent 5,538,880
U.S. Patent 5,550,318
U.S. Patent 5,563,055
U.S. Patent 5,580,859
U.S. Patent 5,589,466
U.S. Patent 5,591,616
U.S. Patent 5,603,872
U.S. Patent 5,610,042
U.S. Patent 5,656,610
U.S. Patent 5,702,932
U.S. Patent 5,736,524
U.S. Patent 5,780,448
U.S. Patent 5,789,215
U.S. Patent 5,889,155
U.S. Patent 5,945,100
U.S. Patent 5,981,274
U.S. Patent 5,994,624
Argast et al, J. Mol. Biol, 280:345-353, 1998.
Ausubel et al, In: Current Protocols in Molecular Biology, John Wiley and Sons, Inc, NY, 1994.
Baichwal and Sugden, In: Gene Transfer, Kucherlapati (ed.), NY, Plenum Press, 117-148, 1986.
Belfort and Roberts, Nucleic Acids Res., 25:3379-3388, 1997.
Bell-Pedersen et al, Nucleic Acids Res., 18:3763-3770, 1990.
Bibikova et al, Mol. Cell. Biol., 21:289-297, 2001.
Bolon and Mayo, Proc. Natl. Acad. Sci. USA, 98:14274-14279, 2001.
Bryk et al, EMBO J., 12:4040-4041, 1993.
Chen and Okayama, Mol. Cell. Biol, 7(8):2745-2752, 1987.
Chevalier and Stoddard, Nucleic Acids Res., 29:3757-3774, 2001.
Chevalier et al, Nature Struct. Biol, 8:312-316, 2001.
Conte et al, J. Mol. Biol, 285:2177-2198, 1999.
Coupar et al, Gene, 68:1-10, 1988.
Dahiyat and Mayo, Science, 278:80-81, 1997.
Dalgaard et al, Nucleic Acids Res., 25:4626-4638, 1997.
Dalgaard et al, Proc. Natl. Acad. Sci. USA, 90:5414-5417, 1993.
Denovan-Wright et al, Plant Mol. Biol, 36:285-295, 1998.
Drouin et al, Nucleic Acids Res., 28:4566-4572, 2000.
Duan et al, Cell, 89:555-564, 1997.
Dunbrack and Cohen, Protein Sci., 6:1661-1681, 1997.
Dunbrack, "Rotamer Libraries in the 21st Century", Curr. Opin. Struct. Biol., 12:431-440, 2002.
Eddy and Gold, Genes Dev., 5:1032-1041, 1991.
Farinas et al, Curr. Opin. Biotech., 12:545-551, 2001.
Fechheimer et al, Proc. Natl. Acad. Sci. USA, 84:8463-8467, 1987.
Flick et al, Protein Sci., 6:2677-2680, 1997.
Flick et al, Nature, 394:96-10, 1998.
Fraley et al, Proc. Natl. Acad. Sci. USA, 76:3348-3352, 1979.
Friedmann, Science, 244:1275-1281, 1989.
Galburt et al, Nature Struct. Biol, 6:1096-1099, 1999.
Galburt et al, J. Mol. Biol., 300:877-887, 2000.
Gimble and Thorner, Nature, 357:301-306, 1992.
Goodrich-Blair and Shub, Cell, 84:211-221, 1996.
Goodrich-Blair and Shub, Nucleic Acids Res., 22:3715-3721, 1994.
Goodrich-Blair et al, Cell, 63:417-424, 1990.
Gopal, Mol. Cell. Biol, 5:1188-1190, 1985.
Gorbalenya, Nucleic Acids Res., 26:1741-1748, 1998.
Gorbalenya, Protein Sci, 3:1117-1120, 1994.
Graham and Van Der Eb, Virology, 52:456-467, 1973.
Gruen et al, Nucleic Acids Res., 30:29-34, 2002.
Guo et al, Science, 289:452-457, 2000.
Halford et al, Biochem. J, 191:581-592, 1980.
Harbury et al, Science, 282:1462-1467, 1998.
Harland and Weintraub, J. Cell Biol, 101:1094-1099, 1985.
Heath et al, Nature Struct. Biol, 4:468-476, 1997.
Hermonat and Muzyczka, Proc. Natl. Acad. Sci. USA, 81:6466-6470, 1984.
Ho et al, Proc. Natl. Acad. Sci. USA, 94:8994-8999, 1997.
Holloway et al, Curr. Genet., 36:69-78, 1999.
Horwich et al, J Virol, 64:642-650, 1990.
Hutchinson and Thornton, Protein Sci., 3:2207-2216, 1994.
Ichiyanagi et al, J. Molec. Biol, 300:889-901, 2000.
Jacob, Science, 196:1161-1166, 1977.
Johansen et al, Nucleic Acids Res., 21:4405, 1993.
Johnson et al, In: Biotechnology And Pharmacy, Pezzuto et al. (eds.), Chapman and Hall, NY, 1993.
Jurica and Stoddard, Cell. Mol. Life Sci., 55:1304-1326, 1999.
Jurica et al, Mol. Cell, 2:469-476, 1998.
Kaeppler et al, Plant Cell Reports, 9:415-418, 1990.
Kaneda et al, Science, 243:375-378, 1989.
Kato et al, J. Biol. Chem., 266:3361-3364, 1991.
Kleanthous et al, Nature Struct. Biol, 6:243-252, 1999.
Ko et al, Struct. Fold. Des., 7:91-102, 1999.
Kostriken et al, Cell, 35:167-174, 1983.
Kowalski et al, Nucleic Acids Res., 27:2115-2125, 1999.
Kroymann and Zetsche, Curr. Genet., 31:414-418, 1997.
Kuhlman and Baker, Proc. Natl. Acad. Sci. USA, 97:10383-10388, 2000.
Kuhlmann et al, FEBS Lett., 463:1-2, 1999.
Kyte and Doolittle, J. Mol. Biol, 157:105-132, 1982.
Lambowitz and Belfort, Annu. Rev. Biochem., 62:587-622, 1993.
Lanio et al, J. Mol. Biol, 283:59-69, 1998.
Lanio et al, Protein Eng., 9:1005-1010, 1996.
Lazaridis and Karplus, Proteins, 35:133-152, 1999.
Maniatis et al, In: Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, NY, 1990.
Marshall and Lemieux, Nucleic Acids Res., 20:6401-6407, 1992.
Matsuura et al, Genes Dev., 11:2910-2924, 1997.
Mendes et al, Curr. Opin. Struct. Biol., 12:441-446, 2002.
Monteilhet et al, Nucleic Acids Res., 28:1245-1251, 2000.
Neria et al, J. Chem. Phys., 105:1902-1921, 1996.
Nicolau and Sene, Biochim. Biophys. Acta, 721:185-190, 1982.
Nicolau et al, Methods Enzymol, 149:157-176, 1987.
Omirulleh et al, Plant Mol. Biol, 21(3):415-428, 1993.
Paquin et al, Curr. Genet., 28:97-99, 1995.
Paquin et al, Proc. Natl. Acad. Sci. USA, 91:11807-11810, 1994.
PCT Application No. WO 94/09699
PCT Application No. WO 95/06128
Pietrokovski, Protein Sci., 7:64-71, 1998.
Pokala and Handel, J. Struct. Biol, 134:269-281, 2001.
Potrykus et al, Mol. Gen. Genet, 199:183-188, 1985.
Potter et al, Proc. Natl. Acad. Sci. USA, 81:7161-7165, 1984.
Ridgeway, In: Vectors: A survey of molecular cloning vectors and their uses, Stoneham: Butterworth, 467-492, 1988.
Rippe et al, Mol. Cell. Biol, 10:689-695, 1990.
Saguez et al, Nucleic Acids Res., 28:1299-1306, 2000.
Sambrook et al, In: Molecular Cloning, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 2001.
Schafer et al, Curr. Genet., 25:336-341, 1994.
Schmidt-Dannert, Biochem., 40:13125-13135, 2001.
Sharma et al, Proc. Natl. Acad. Sci. USA, 89:6658-6662, 1992.
Shub et al, Trends Biochem. Sci., 19:402-404, 1994.
Silva et al, J Mol. Biol, 286:1123-1136, 1999.
Smith et al, Nucleic Acids Res., 27:274-281, 1999.
Smith et al, Nucleic Acids Res., 28:3361-3369, 2000.
Stephens et al, Proteins, 28:137-139, 1997.
Thompson et al, Gene, 119:247-251, 1992.
Tian et al, J. Mol. Biol, 218:747-760, 1991.
Tur-Kaspa et al, Mol. Cell Biol, 6:716-718, 1986.
Wang et al, Nucleic Acids Res., 25:3767-3776, 1997.
Watabe et al, J. Biochem. (Tokyo), 90:1623-1632, 1981.
Watabe et al, J. Biol. Chem., 258:4663-4665, 1983.
Wenz et al, Biochim. Biophys. Acta, 1219:73-80, 1994.
Wigley et al, Nature Biotech., 19:131-136, 2001.
Wong et al, Gene, 10:87-94, 1980.
Wu and Wu, Biochemistry, 27:887-892, 1988.
Wu and Wu, J Biol. Chem., 262:4429-4432, 1987.
Zimmerly et al, Cell, 82:545-554, 1995a.
Zimmerly et al, Cell, 83:529-538, 1995b.

Claims

1. A method of creating a modified nuclease with nucleic acid sequence-specific activity comprising:
(a) preparing a computational model of a complex between an unmodified nuclease and the specific nucleic acid sequence, wherein the nuclease comprises a catalytic domain and a DNA binding domain;
(b) identifying potential contact points between the DNA binding domain and the specific nucleic acid sequence; and,
(c) identifying an amino acid change that creates or enhances a contact point to improve an interface between the DNA binding domain and the nucleic acid sequence, and further provides a design for the modified nuclease.
2. The method of claim 1, wherein the nucleic acid is double-stranded DNA.
3. The method of claim 1, wherein the specific nucleic acid sequence is at least 8 residues in length.
4. The method of claim 1, further comprising preparing the modified nuclease.
5. The method of claim 1, wherein the nuclease is a homing endonuclease.
6. The method of claim 5, wherein the homing endonuclease is a member of the LAGLIDADG family.
7. The method of claim 6, wherein the LAGLIDADG endonuclease is I-DmoI, I-CreI, I-CeuI, I-SceI, I-SceII, I-SceV, I-SceVI, or I-LlaI.
8. The method of claim 1, wherein the amino acid change results in reduced steric hindrance or improper bonding between the DNA binding domain and the nucleic acid sequence.
9. The method of claim 1, wherein the amino acid change creates a contact point between the DNA binding domain and the nucleic acid sequence.
10. The method of claim 1, wherein amino acid changes are determined computationally.
10.1. The method of claim 1, further comprising obtaining a crystal structure of the unmodified nuclease.
10.2. The method of claim 10.1, wherein the crystal structure further comprises a specific nucleic acid sequence of the unmodified nuclease.
10.3. The method of claim 10.1, wherein obtaining a crystal structure of the unmodified nuclease comprises generating the crystal structure.
11. The method of claim 4, further comprising detecting nucleic acid sequence-specific activity of the modified nuclease by a binding assay.
12. The method of claim 10, wherein computationally determining amino acid changes comprises calculating the free energy of amino acid residues at each contact point or of amino acid residues along the interface.
13. The method of claim 12, wherein computationally determining amino acid changes comprises evaluating a library of amino acid side chain rotamer conformations in different backbone-dependent rotameric states for reduced free energy.
14. The method of claim 13, wherein the library excludes cysteine.
15. The method of claim 13, wherein the different backbone-dependent rotameric states comprise (i) rotamers interacting with surrounding groups; (ii) rotamers interacting with a fixed portion of the molecule including the backbone and all side chains not subject to substitution; and (iii) pairwise rotamer to rotamer energies.
16. The method of claim 12, further comprising calculating the free energy of the designed modified nuclease.
17. The method of claim 12, wherein the overall minimum free energy is increased by (i) providing one or more polypeptide backbone models with different relative orientations of LAGLIDADG sequences in the interface between the DNA binding domain and the nucleic acid sequence; (ii) performing sequence design assays on the polypeptide backbone; and (iii) obtaining different amino acid combinations at each contact point.
18. The method of claim 17, wherein the number of amino acid combinations is reduced by:
(a) eliminating sequences that affect the activity of nearby contact point residues,
(b) choosing between redundant residues by their ability to form hydrogen bonds with neighboring residues,
(c) screening for optimum rotamer conformation for each sequence; and
(d) identifying the top scoring interface free energy sequences having the overall minimum free energy.
19. The method of claim 4, wherein the modified nuclease further comprises a reactive group.
20. The method of claim 19, wherein the reactive group comprises a cross-linking agent, a fluorophore, a chromophore, a metal chelator, or a protein catalytic domain attached to the modified nuclease.
21. The method of claim 20, wherein the reactive group is chemically attached to the modified nuclease.
22. The method of claim 1, wherein the amino acid change is implemented by site-directed mutagenesis of a nucleic acid encoding the nuclease prior to modification.
23. The method of claim 4, further comprising assaying the modified nuclease for solubility and folding.
24. The method of claim 4, wherein the modified nuclease can cleave both strands of a target DNA site.
25. The method of claim 4, wherein the modified nuclease can cleave the plus strand of a DNA target site.
26. The method of claim 4, wherein the modified nuclease can cleave the minus strand of a DNA target site.
27. The method of claim 1, wherein the modified nuclease is chimeric and comprises multiple DNA binding domains.
28. The method of claim 27, wherein step (a) involves preparing a computational model of a complex between a chimeric nuclease and a nucleic acid sequence, wherein the chimeric nuclease comprises (i) a first nuclease, comprising a DNA binding domain and a catalytic domain, and (ii) at least a second DNA binding domain; and the method further comprises:
(d) substituting amino acids that are potential contact points between the second DNA binding domain and the first nuclease, wherein the substitution of the amino acid creates or improves the protein-protein contact points, and wherein the substitution provides a design for a modified chimeric nuclease.
29. The method of claim 28, wherein the nuclease is from a homing endonuclease.
30. The method of claim 29, wherein the homing endonuclease is a LAGLIDADG endonuclease.
31. The method of claim 30, wherein the LAGLIDADG endonuclease is I-DmoI, I-CreI, I-CeuI, I-SceI, I-SceII, I-SceV, I-SceVI, or I-LlaI.
32. The method of claim 27, wherein the second DNA binding domain is from a homing endonuclease.
33. The method of claim 27, wherein the chimera further comprises a peptide linker molecule to create a monomeric protein.
34. The method of claim 33, wherein the peptide linker is located between the first DNA binding domain and the second DNA binding domain.
35. A modified nuclease that has altered sequence-specificity made by the method comprising:
(a) preparing a computational model of a complex between a nuclease and an altered target nucleic acid sequence, wherein the nuclease comprises a catalytic domain and a DNA binding domain; (b) identifying potential contact points between the DNA binding domain and the nucleic acid sequence; (c) identifying an amino acid change that creates or enhances a contact point to improve an interface between the DNA binding domain and the target nucleic acid sequence, and further provides a design for the modified nuclease; and, (d) preparing the modified nuclease.
36. A method of designing a modified chimeric nucleic acid binding polypeptide with sequence-specific activity comprising:
(a) preparing a computational model of a chimeric nucleic acid binding polypeptide and a target nucleic acid sequence, wherein the chimeric nucleic acid binding polypeptide comprises (i) a first polypeptide having a nucleic acid binding domain and (ii) at least a second polypeptide having a nucleic acid binding domain; (b) identifying an amino acid interference between the nucleic acid binding domains of the first polypeptide and the second polypeptide, wherein the interference comprises sterically hindered groups or improperly bonded groups; and (c) identifying an amino acid substitution to alleviate the interference while maintaining or enhancing an interface between the chimeric nucleic acid binding polypeptide and the nucleic acid sequence, wherein the substitution provides a design for the modified chimeric nucleic acid binding polypeptide.
37. The method of claim 36, wherein the modified chimeric nucleic acid binding polypeptide further comprises a catalytic domain having site-specific nuclease activity.
38. The method of claim 36, wherein the first polypeptide comprises a DNA binding domain from a homing endonuclease.
39. The method of claim 38, wherein the homing endonuclease is a LAGLIDADG endonuclease.
40. The method of claim 36, wherein the second polypeptide has a DNA binding domain from a homing endonuclease.
41. The method of claim 40, wherein the homing endonuclease is a LAGLIDADG endonuclease.
42. The method of claim 39, wherein the first polypeptide comprises the DNA binding domain from Dmo-I.
43. The method of claim 41, wherein the second polypeptide comprises the DNA binding domain from Cre-I.
44. A modified nuclease with nucleic acid sequence-specific activity produced by the method of claim 1.
45. A modified chimeric nuclease with site-specific activity at a combined target nucleotide sequence comprising:
(a) a first DNA binding domain and a catalytic domain from a first homing endonuclease; and, (b) a second DNA binding domain from a second homing endonuclease,
wherein the chimeric nuclease is capable of binding the combined target nucleotide sequence of the first and second DNA binding domains.
46. The modified chimeric nuclease of claim 45, wherein the homing endonuclease of the first DNA binding domain is a LAGLIDADG endonuclease.
47. The modified chimeric nuclease of claim 45, wherein the homing endonuclease of the second DNA binding domain is a LAGLIDADG endonuclease.
48. The modified chimeric nuclease of claim 45, further comprising a linker peptide.
49. The modified chimeric nuclease of claim 45, further comprising a cross-linking agent, a fluorophore, a chromophore, a metal chelator, or a protein catalytic domain attached to the modified chimeric nuclease.
50. The modified chimeric nuclease of claim 45, wherein the first nuclease further comprises an amino acid substitution.
51. The modified chimeric nuclease of claim 50, wherein said amino acid substitution in the first nuclease reduces steric hindrance of the modified chimeric nuclease compared to the unmodified chimeric nuclease.
52. The modified chimeric nuclease of claim 45 wherein the second nuclease further comprises an amino acid substitution.
53. The modified chimeric nuclease of claim 52, wherein the at least one amino acid substitution in the second nuclease reduces steric hindrance of the modified chimeric nuclease compared to the unmodified chimeric nuclease.
54. The modified chimeric nuclease of claim 45, wherein the first nuclease is Dmo-I.
55. The modified chimeric nuclease of claim 45, wherein the second nuclease is Cre-I.
56. The modified chimeric nuclease of claim 45, further comprising a peptide linker molecule to create a monomeric protein.
57. The modified chimeric nuclease of claim 56, wherein the peptide linker between the first nuclease and the second nuclease comprises the amino acid sequence of NGN.
58. A modified chimeric DNA nuclease produced by the method of claim 36 comprising a) a first nuclease comprising a DNA binding domain and a catalytic domain and b) at least a second DNA binding domain.
59. A modified chimeric homing endonuclease comprising the DNA binding and catalytic domains of Dmo-I and the DNA binding domain of Cre-I, wherein the endonuclease has site-specific activity at a combined target sequence comprising the target sequences of Dmo-I and Cre-I.
60. A method of designing a modified chimeric nuclease with nucleic acid sequence-specific activity comprising:
(a) preparing a computational model of a chimeric nuclease and a DNA sequence, wherein said chimeric nuclease comprises (i) a first nuclease, comprising a DNA binding domain and a catalytic domain and (ii) at least a second DNA binding domain;
(b) identifying an amino acid interference between the first nuclease and the second DNA binding domain, the interference comprising sterically hindered groups or improperly bonded groups; and
(c) identifying an amino acid substitution that alleviates the interference yet maintains or enhances an interface between the nuclease and the DNA sequence, wherein the substitution provides a design for the modified chimeric nuclease.
61. The method of claim 60, further comprising preparing the modified chimeric nuclease.
62. The method of claim 60, wherein the first nuclease is a homing endonuclease.
63. The method of claim 62, wherein the homing endonuclease is a LAGLIDADG endonuclease.
64. The method of claim 60, wherein the second DNA binding domain is from a homing endonuclease.
65. The method of claim 64, wherein the homing endonuclease is a LAGLIDADG endonuclease.
66. The method of claim 60, wherein said amino acid substitution reduces steric hindrance at an interface.
67. The method of claim 60, wherein said amino acid substitution reduces improper bonding at an interface.
68. The method of claim 66, wherein amino acid substitutions are determined computationally.
69. The method of claim 66, wherein amino acid substitutions are determined empirically.
70. The method of claim 68, wherein determining comprises calculating the free energy of amino acid residues at each amino acid interference or at one or more amino acids along the interface.
71. The method of claim 70, wherein the free energy calculations are performed using a Monte-Carlo method.
72. The method of claim 70, wherein computational determination comprises evaluating a library of amino acid side chain conformations in different backbone-dependent rotameric states for reduced free energy.
73. The method of claim 72, wherein the library excludes cysteine.
74. The method of claim 72, wherein the different backbone-dependent rotameric states comprise (i) rotamers interacting with surrounding groups; (ii) rotamers interacting with a fixed portion of the molecule including the backbone and all side chains not subject to substitution; and (iii) pairwise rotamer to rotamer energies.
75. The method of claim 70, further comprising calculating the free energy of the complete chimera.
76. The method of claim 75, wherein the free energy calculations are performed using a Monte-Carlo method.
77. The method of claim 76, wherein the Monte-Carlo method is enhanced to increase the overall minimum free energy by (a) providing at least two polypeptide backbone models with different relative orientations of LAGLIDADG sequences in the domain interface; (b) performing sequence design assays on each polypeptide backbone; and (c) obtaining sequences with different amino acid combinations at each contact point.
78. The method of claim 77, wherein the number of different amino acid combinations is reduced by:
(a) eliminating sequences that affect the activity of nearby contact point residues,
(b) choosing between redundant residues by their ability to form hydrogen bonds with neighboring residues,
(c) screening for optimum rotamer conformation for each sequence; and
(d) identifying the top scoring interface free energy sequences having the overall minimum free energy.
79. The method of claim 60, wherein the chimera further comprises a peptide linker molecule to create a monomeric protein.
80. The method of claim 79, wherein the peptide linker is located between the first DNA binding domain and the second DNA binding domain.
81. The method of claim 60, wherein the chimera further comprises a reactive group.
82. The method of claim 81, wherein the reactive group comprises a cross-linking agent, a fluorophore, a chromophore, a metal chelator, or a protein catalytic domain attached to said chimeric nuclease.
83. The method of claim 82, wherein the reactive group is chemically attached to the modified chimeric nuclease.
84. The method of claim 62, wherein said substitution is effected by site-directed mutagenesis of a nucleic acid encoding the unmodified chimeric endonuclease.
85. The method of claim 62, further comprising assaying the modified chimeric nuclease for solubility and folding.
86. The method of claim 60, wherein the chimeric nuclease cleaves both strands of a target DNA site.
87. The method of claim 60, wherein the chimeric nuclease cleaves the plus strand of a DNA target site.
88. The method of claim 60, wherein the chimeric nuclease cleaves the minus strand of a DNA target site.
89. A method of screening to identify a modified DNA binding polypeptide with altered DNA sequence-specific activity comprising:
(a) generating polypeptides with one or more amino acid substitutions in their DNA binding domains;
(b) contacting the polypeptides with nucleic acid segments with random sequences, under conditions that allow the DNA binding domains of the polypeptides to bind specifically to the nucleic acid segments;
(c) identifying which polypeptides specifically bind the nucleic acid segments; and,
(d) identifying the sequences of the nucleic acid segments.
90. The method of claim 89, wherein the polypeptides are nucleases.
91. The method of claim 89, wherein the modified polypeptides specifically bind a sequence of at least 9 residues.
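The final step of claim 89, identifying the sequences of the bound nucleic acid segments, is commonly followed by summarizing those sequences as a position frequency matrix and a consensus site. That summarizing step is an assumption added here for illustration and is not required by the claims. The sketch assumes the recovered sequences are already aligned and of equal length; the function names and example sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List

BASES = "ACGT"


def position_frequencies(seqs: List[str]) -> List[Dict[str, float]]:
    """Per-position base frequencies for aligned, equal-length sequences."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and of equal length")
    matrix = []
    for pos in range(length):
        counts = Counter(s[pos].upper() for s in seqs)
        # Counter returns 0 for bases never observed at this position.
        matrix.append({b: counts[b] / len(seqs) for b in BASES})
    return matrix


def consensus(matrix: List[Dict[str, float]], min_freq: float = 0.5) -> str:
    """Most frequent base per position, or 'N' if it falls below min_freq."""
    out = []
    for column in matrix:
        base = max(BASES, key=lambda b: column[b])
        out.append(base if column[base] >= min_freq else "N")
    return "".join(out)


if __name__ == "__main__":
    bound = ["ACGT", "ACGA", "ACCT", "TCGT"]  # hypothetical recovered sites
    print(consensus(position_frequencies(bound)))                # ACGT
    print(consensus(position_frequencies(bound), min_freq=0.8))  # NCNN
```

Raising `min_freq` marks weakly specified positions as `N`, which gives a quick view of which positions in the target the selected polypeptides actually discriminate.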
PCT/US2003/027875 2002-09-06 2003-09-05 Methods and compositions concerning designed highly-specific nucleic acid binding proteins WO2004031346A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003290518A AU2003290518A1 (en) 2002-09-06 2003-09-05 Methods and compositions concerning designed highly-specific nucleic acid binding proteins

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US40884702P 2002-09-06 2002-09-06
US60/408,847 2002-09-06

Publications (2)

Publication Number Publication Date
WO2004031346A2 2004-04-15
WO2004031346A3 2006-10-26

Family

ID=32069688

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/027875 WO2004031346A2 (en) 2002-09-06 2003-09-05 Methods and compositions concerning designed highly-specific nucleic acid binding proteins

Country Status (2)

Country Link
AU (1) AU2003290518A1 (en)
WO (1) WO2004031346A2 (en)

WO2015200805A2 (en) 2014-06-26 2015-12-30 Regeneron Pharmaceuticals, Inc. Methods and compositions for targeted genetic modifications and methods of use
WO2016061374A1 (en) 2014-10-15 2016-04-21 Regeneron Pharmaceuticals, Inc. Methods and compositions for generating or maintaining pluripotent cells
WO2016100819A1 (en) 2014-12-19 2016-06-23 Regeneron Pharmaceuticals, Inc. Methods and compositions for targeted genetic modification through single-step multiple targeting
US9683257B2 (en) 2008-07-14 2017-06-20 Precision Biosciences, Inc. Recognition sequences for I-CreI-derived meganucleases and uses thereof
WO2018023014A1 (en) 2016-07-29 2018-02-01 Regeneron Pharmaceuticals, Inc. Mice comprising mutations resulting in expression of c-truncated fibrillin-1
EP3354732A1 (en) 2014-06-23 2018-08-01 Regeneron Pharmaceuticals, Inc. Nuclease-mediated dna assembly
WO2019043082A1 (en) 2017-08-29 2019-03-07 Kws Saat Se Improved blue aleurone and other segregation systems
EP3456831A1 (en) 2013-04-16 2019-03-20 Regeneron Pharmaceuticals, Inc. Targeted modification of rat genome
EP3460063A1 (en) 2013-12-11 2019-03-27 Regeneron Pharmaceuticals, Inc. Methods and compositions for the targeted modification of a genome
WO2019067875A1 (en) 2017-09-29 2019-04-04 Regeneron Pharmaceuticals, Inc. Non-human animals comprising a humanized ttr locus and methods of use
EP3561050A1 (en) 2013-02-20 2019-10-30 Regeneron Pharmaceuticals, Inc. Genetic modification of rats
WO2020123377A1 (en) 2018-12-10 2020-06-18 Neoimmunetech, Inc. Nrf-2 deficient cells and uses thereof
WO2020131632A1 (en) 2018-12-20 2020-06-25 Regeneron Pharmaceuticals, Inc. Nuclease-mediated repeat expansion
WO2020206134A1 (en) 2019-04-04 2020-10-08 Regeneron Pharmaceuticals, Inc. Methods for scarless introduction of targeted modifications into targeting vectors
WO2020206139A1 (en) 2019-04-04 2020-10-08 Regeneron Pharmaceuticals, Inc. Non-human animals comprising a humanized coagulation factor 12 locus
WO2020247812A1 (en) 2019-06-07 2020-12-10 Regeneron Pharmaceuticals, Inc. Non-human animals comprising a humanized albumin locus
WO2020247452A1 (en) 2019-06-04 2020-12-10 Regeneron Pharmaceuticals, Inc. Non-human animals comprising a humanized ttr locus with a beta-slip mutation and methods of use
WO2021108363A1 (en) 2019-11-25 2021-06-03 Regeneron Pharmaceuticals, Inc. Crispr/cas-mediated upregulation of humanized ttr allele
WO2022240846A1 (en) 2021-05-10 2022-11-17 Sqz Biotechnologies Company Methods for delivering genome editing molecules to the nucleus or cytosol of a cell and uses thereof
WO2022251644A1 (en) 2021-05-28 2022-12-01 Lyell Immunopharma, Inc. Nr4a3-deficient immune cells and uses thereof
WO2022256437A1 (en) 2021-06-02 2022-12-08 Lyell Immunopharma, Inc. Nr4a3-deficient immune cells and uses thereof
WO2023064924A1 (en) 2021-10-14 2023-04-20 Codiak Biosciences, Inc. Modified producer cells for extracellular vesicle production
WO2023129974A1 (en) 2021-12-29 2023-07-06 Bristol-Myers Squibb Company Generation of landing pad cell lines
EP4219731A2 (en) 2016-05-18 2023-08-02 Amyris, Inc. Compositions and methods for genomic integration of nucleic acids into exogenous landing pads
WO2023220603A1 (en) 2022-05-09 2023-11-16 Regeneron Pharmaceuticals, Inc. Vectors and methods for in vivo antibody production
WO2023225665A1 (en) 2022-05-19 2023-11-23 Lyell Immunopharma, Inc. Polynucleotides targeting nr4a3 and uses thereof
WO2024064952A1 (en) 2022-09-23 2024-03-28 Lyell Immunopharma, Inc. Methods for culturing nr4a-deficient cells overexpressing c-jun
WO2024064958A1 (en) 2022-09-23 2024-03-28 Lyell Immunopharma, Inc. Methods for culturing nr4a-deficient cells
WO2024077174A1 (en) 2022-10-05 2024-04-11 Lyell Immunopharma, Inc. Methods for culturing nr4a-deficient cells

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHEVALIER B.S. ET AL.: 'Design, Activity, and Structure of a Highly Specific Artificial Endonuclease' MOL. CELL vol. 10, no. 4, October 2002, pages 895 - 905, XP002248750 *
HADI M.Z. ET AL.: 'Determinants in Nuclease Specificity of Ape1 and Ape2, Human Homologues of Escherichia coli Exonuclease III' J. MOL. BIOL. vol. 316, no. 3, February 2002, pages 853 - 866, XP004470963 *

Cited By (122)

Publication number Priority date Publication date Assignee Title
EP1591521A1 (en) * 2004-04-30 2005-11-02 Cellectis I-Dmo I derivatives with enhanced activity at 37 degrees C and use thereof
WO2005105989A1 (en) * 2004-04-30 2005-11-10 Cellectis I-dmoi derivatives with enhanced activity at 37°c and use thereof.
US8211685B2 (en) 2004-04-30 2012-07-03 Cellectis I-DmoI derivatives with enhanced activity at 37° C and use thereof
WO2006097854A1 (en) * 2005-03-15 2006-09-21 Cellectis Heterodimeric meganucleases and use thereof
WO2007034262A1 (en) * 2005-09-19 2007-03-29 Cellectis Heterodimeric meganucleases and use thereof
US8119361B2 (en) 2005-10-18 2012-02-21 Duke University Methods of cleaving DNA with rationally-designed meganucleases
US8021867B2 (en) 2005-10-18 2011-09-20 Duke University Rationally-designed meganucleases with altered sequence specificity and DNA-binding affinity
US8304222B1 (en) 2005-10-18 2012-11-06 Duke University Rationally-designed meganucleases with altered sequence specificity and heterodimer formation
US8163514B2 (en) 2005-10-18 2012-04-24 Duke University Methods of cleaving DNA with rationally-designed meganucleases
US8148098B2 (en) 2005-10-18 2012-04-03 Duke University Methods of cleaving DNA with rationally-designed meganucleases
US8143016B2 (en) 2005-10-18 2012-03-27 Duke University Methods of cleaving DNA with rationally-designed meganucleases
US8143015B2 (en) 2005-10-18 2012-03-27 Duke University Methods of cleaving DNA with rationally-designed meganucleases
US8133697B2 (en) 2005-10-18 2012-03-13 Duke University Methods of cleaving DNA with rationally-designed meganucleases
US8129134B2 (en) 2005-10-18 2012-03-06 Duke University Methods of cleaving DNA with rationally-designed meganucleases
US8124369B2 (en) 2005-10-18 2012-02-28 Duke University Method of cleaving DNA with rationally-designed meganucleases
US8119381B2 (en) 2005-10-18 2012-02-21 Duke University Rationally-designed meganucleases with altered sequence specificity and DNA-binding affinity
US8377674B2 (en) 2005-10-18 2013-02-19 Duke University Method for producing genetically-modified cells with rationally-designed meganucleases with altered sequence specificity
EP2368980A2 (en) 2005-10-25 2011-09-28 Cellectis I-crel homing endonuclease variants having novel claevage specificity and use thereof
WO2007057781A2 (en) 2005-10-25 2007-05-24 Cellectis Laglidadg homing endonuclease variants having mutations in two functional subdomains and use thereof.
EP2365066A1 (en) 2005-10-25 2011-09-14 Cellectis Laglidadg homing endonuclease variants having mutations in two functional subdominants und use thereof
EP2368979A2 (en) 2005-10-25 2011-09-28 Cellectis I-crel homing endonuclease variants having novel claevage specificity and use thereof
EP2343369A1 (en) 2005-10-25 2011-07-13 Cellectis Laglidadg homing endonuclease variants having mutations in two functional subdominants und use thereof
EP2365065A2 (en) 2005-10-25 2011-09-14 Cellectis I-Crel homing endonuclease variants having novel cleavage specificity and use thereof
EP2343368A1 (en) 2005-10-25 2011-07-13 Cellectis Laglidadg homing endonuclease variants having mutations in two functional subdominants und use thereof
WO2008093249A3 (en) * 2007-02-01 2008-10-30 Cellectis Obligate heterodimer meganucleases and uses thereof
WO2008093152A1 (en) * 2007-02-01 2008-08-07 Cellectis Obligate heterodimer meganucleases and uses thereof
EP2433641A1 (en) * 2007-02-01 2012-03-28 Cellectis Obligate heterodimer meganucleases and uses thereof
EP2770052A1 (en) 2007-06-06 2014-08-27 Cellectis Method for enhancing the cleavage activity of I-CreI derived meganucleases
US8912392B2 (en) 2007-06-29 2014-12-16 Pioneer Hi-Bred International, Inc. Methods for altering the genome of a monocot plant cell
EP2568048A1 (en) 2007-06-29 2013-03-13 Pioneer Hi-Bred International, Inc. Methods for altering the genome of a monocot plant cell
EP2535406A1 (en) 2007-07-23 2012-12-19 Cellectis Meganuclease variants cleaving a DNA target sequence from the human hemoglobin beta gene and uses thereof
US8927247B2 (en) 2008-01-31 2015-01-06 Cellectis, S.A. I-CreI derived single-chain meganuclease and uses thereof
US8338157B2 (en) 2008-03-11 2012-12-25 Precision Biosciences, Inc. Rationally-designed meganuclease variants of lig-34 and I-crei for maize genome engineering
WO2010001189A1 (en) * 2008-07-03 2010-01-07 Cellectis The crystal structure of i-dmoi in complex with its dna target, improved chimeric meganucleases and uses thereof
US9683257B2 (en) 2008-07-14 2017-06-20 Precision Biosciences, Inc. Recognition sequences for I-CreI-derived meganucleases and uses thereof
US10273524B2 (en) 2008-07-14 2019-04-30 Precision Biosciences, Inc. Recognition sequences for I-CreI-derived meganucleases and uses thereof
US10287626B2 (en) 2008-07-14 2019-05-14 Precision Biosciences, Inc. Recognition sequences for I-CreI-derived meganucleases and uses thereof
WO2010136981A2 (en) 2009-05-26 2010-12-02 Cellectis Meganuclease variants cleaving the genome of a pathogenic non-integrating virus and uses thereof
WO2011007336A1 (en) 2009-07-17 2011-01-20 Cellectis Viral vectors encoding a dna repair matrix and containing a virion-associated site specific meganuclease for gene targeting
WO2011021166A1 (en) 2009-08-21 2011-02-24 Cellectis Meganuclease variants cleaving a dna target sequence from the human lysosomal acid alpha-glucosidase gene and uses thereof
EP2504429A4 (en) * 2009-11-27 2013-06-05 Basf Plant Science Co Gmbh Chimeric endonucleases and uses thereof
WO2011064750A1 (en) * 2009-11-27 2011-06-03 Basf Plant Science Company Gmbh Chimeric endonucleases and uses thereof
CN102686726A (en) * 2009-11-27 2012-09-19 BASF Plant Science Company GmbH Chimeric endonucleases and uses thereof
EP2504429A1 (en) * 2009-11-27 2012-10-03 BASF Plant Science Company GmbH Chimeric endonucleases and uses thereof
EP2504430A1 (en) * 2009-11-27 2012-10-03 BASF Plant Science Company GmbH Chimeric endonucleases and uses thereof
CN102686726B (en) * 2009-11-27 2015-12-16 BASF Plant Science Company GmbH Chimeric endonuclease and uses thereof
WO2011064751A1 (en) * 2009-11-27 2011-06-03 Basf Plant Science Company Gmbh Chimeric endonucleases and uses thereof
CN102762726A (en) * 2009-11-27 2012-10-31 BASF Plant Science Company GmbH Chimeric endonucleases and uses thereof
US10316304B2 (en) 2009-11-27 2019-06-11 Basf Plant Science Company Gmbh Chimeric endonucleases and uses thereof
EP2504430A4 (en) * 2009-11-27 2013-06-05 Basf Plant Science Co Gmbh Chimeric endonucleases and uses thereof
JP2013511978A (en) * 2009-11-27 2013-04-11 BASF Plant Science Company GmbH Chimeric endonuclease and use thereof
WO2011082310A2 (en) 2009-12-30 2011-07-07 Pioneer Hi-Bred International, Inc. Methods and compositions for targeted polynucleotide modification
WO2011101696A1 (en) 2010-02-18 2011-08-25 Cellectis Improved meganuclease recombination system
WO2011101811A2 (en) 2010-02-18 2011-08-25 Cellectis Improved meganuclease recombination system
WO2011141825A1 (en) 2010-05-12 2011-11-17 Cellectis Meganuclease variants cleaving a dna target sequence from the rhodopsin gene and uses thereof
WO2011141820A1 (en) 2010-05-12 2011-11-17 Cellectis Meganuclease variants cleaving a dna target sequence from the dystrophin gene and uses thereof
WO2012001527A2 (en) 2010-06-15 2012-01-05 Cellectis S.A. Method for improving cleavage of dna by endonuclease sensitive to methylation
WO2012004671A2 (en) 2010-07-07 2012-01-12 Cellectis Meganucleases variants cleaving a dna target sequence in the nanog gene and uses thereof
WO2012010976A2 (en) 2010-07-15 2012-01-26 Cellectis Meganuclease variants cleaving a dna target sequence in the tert gene and uses thereof
WO2012007848A2 (en) 2010-07-16 2012-01-19 Cellectis Meganuclease variants cleaving a dna target sequence in the was gene and uses thereof
WO2012058458A2 (en) 2010-10-27 2012-05-03 Cellectis Sa Method for increasing the efficiency of double-strand break-induced mutagenesis
WO2012129373A2 (en) 2011-03-23 2012-09-27 Pioneer Hi-Bred International, Inc. Methods for producing a complex transgenic trait locus
WO2012138901A1 (en) 2011-04-05 2012-10-11 Cellectis Sa Method for enhancing rare-cutting endonuclease efficiency and uses thereof
US11198856B2 (en) 2011-04-05 2021-12-14 Cellectis Method for the generation of compact tale-nucleases and uses thereof
EP3320910A1 (en) 2011-04-05 2018-05-16 Cellectis Method for the generation of compact tale-nucleases and uses thereof
US9315788B2 (en) 2011-04-05 2016-04-19 Cellectis, S.A. Method for the generation of compact TALE-nucleases and uses thereof
WO2012138927A2 (en) 2011-04-05 2012-10-11 Philippe Duchateau Method for the generation of compact tale-nucleases and uses thereof
US8685737B2 (en) 2011-04-27 2014-04-01 Amyris, Inc. Methods for genomic modification
WO2012149470A1 (en) 2011-04-27 2012-11-01 Amyris, Inc. Methods for genomic modification
US9701971B2 (en) 2011-04-27 2017-07-11 Amyris, Inc. Methods for genomic modification
CN103620027A (en) * 2011-06-10 2014-03-05 BASF Plant Science Company GmbH Nuclease fusion protein and uses thereof
WO2012168910A1 (en) * 2011-06-10 2012-12-13 Basf Plant Science Company Gmbh Nuclease fusion protein and uses thereof
US9758796B2 (en) 2011-06-10 2017-09-12 Basf Plant Science Company Gmbh Nuclease fusion protein and uses thereof
US9574208B2 (en) 2011-06-21 2017-02-21 Ei Du Pont De Nemours And Company Methods and compositions for producing male sterile plants
WO2013066423A2 (en) 2011-06-21 2013-05-10 Pioneer Hi-Bred International, Inc. Methods and compositions for producing male sterile plants
WO2013009525A1 (en) 2011-07-08 2013-01-17 Cellectis S.A. Method for increasing the efficiency of double-strand break-induced mutagenssis
WO2013019411A1 (en) 2011-08-03 2013-02-07 E. I. Du Pont De Nemours And Company Methods and compositions for targeted integration in a plant
US9909110B2 (en) 2012-05-04 2018-03-06 E. I. Du Pont De Nemours And Company Compositions and methods comprising sequences having meganuclease activity
US10150956B2 (en) 2012-05-04 2018-12-11 E I Du Pont De Nemours And Company Compositions and methods comprising sequences having meganuclease activity
WO2013166113A1 (en) 2012-05-04 2013-11-07 E. I. Du Pont De Nemours And Company Compositions and methods comprising sequences having meganuclease activity
US9499827B2 (en) 2012-05-04 2016-11-22 E I Du Pont De Nemours And Company Compositions and methods comprising sequences having meganuclease activity
EP3561050A1 (en) 2013-02-20 2019-10-30 Regeneron Pharmaceuticals, Inc. Genetic modification of rats
US10329574B2 (en) 2013-03-12 2019-06-25 E I Du Pont De Nemours And Company Methods for the identification of variant recognition sites for rare-cutting engineered double-strand-break-inducing agents and compositions and uses thereof
WO2014164466A1 (en) 2013-03-12 2014-10-09 E. I. Du Pont De Nemours And Company Methods for the identification of variant recognition sites for rare-cutting engineered double-strand-break-inducing agents and compositions and uses thereof
EP3456831A1 (en) 2013-04-16 2019-03-20 Regeneron Pharmaceuticals, Inc. Targeted modification of rat genome
WO2015088643A1 (en) 2013-12-11 2015-06-18 Regeneron Pharmaceuticals, Inc. Methods and compositions for the targeted modification of a genome
EP3460063A1 (en) 2013-12-11 2019-03-27 Regeneron Pharmaceuticals, Inc. Methods and compositions for the targeted modification of a genome
EP4349980A2 (en) 2013-12-11 2024-04-10 Regeneron Pharmaceuticals, Inc. Methods and compositions for the targeted modification of a genome
WO2015095804A1 (en) 2013-12-19 2015-06-25 Amyris, Inc. Methods for genomic integration
WO2015188109A1 (en) 2014-06-06 2015-12-10 Regeneron Pharmaceuticals, Inc. Methods and compositions for modifying a targeted locus
EP3708671A1 (en) 2014-06-06 2020-09-16 Regeneron Pharmaceuticals, Inc. Methods and compositions for modifying a targeted locus
EP3708663A1 (en) 2014-06-23 2020-09-16 Regeneron Pharmaceuticals, Inc. Nuclease-mediated dna assembly
EP3354732A1 (en) 2014-06-23 2018-08-01 Regeneron Pharmaceuticals, Inc. Nuclease-mediated dna assembly
EP3461885A1 (en) 2014-06-26 2019-04-03 Regeneron Pharmaceuticals, Inc. Methods and compositions for targeted genetic modifications and methods of use
WO2015200805A2 (en) 2014-06-26 2015-12-30 Regeneron Pharmaceuticals, Inc. Methods and compositions for targeted genetic modifications and methods of use
WO2016061374A1 (en) 2014-10-15 2016-04-21 Regeneron Pharmaceuticals, Inc. Methods and compositions for generating or maintaining pluripotent cells
EP3561052A1 (en) 2014-10-15 2019-10-30 Regeneron Pharmaceuticals, Inc. Methods and compositions for generating or maintaining pluripotent cells
EP3653048A1 (en) 2014-12-19 2020-05-20 Regeneron Pharmaceuticals, Inc. Methods and compositions for targeted genetic modification through single-step multiple targeting
WO2016100819A1 (en) 2014-12-19 2016-06-23 Regeneron Pharmaceuticals, Inc. Methods and compositions for targeted genetic modification through single-step multiple targeting
EP4219731A2 (en) 2016-05-18 2023-08-02 Amyris, Inc. Compositions and methods for genomic integration of nucleic acids into exogenous landing pads
WO2018023014A1 (en) 2016-07-29 2018-02-01 Regeneron Pharmaceuticals, Inc. Mice comprising mutations resulting in expression of c-truncated fibrillin-1
WO2019043082A1 (en) 2017-08-29 2019-03-07 Kws Saat Se Improved blue aleurone and other segregation systems
US11697822B2 (en) 2017-08-29 2023-07-11 KWS SAAT SE & Co. KGaA Blue aleurone and other segregation systems
EP4276185A2 (en) 2017-09-29 2023-11-15 Regeneron Pharmaceuticals, Inc. Rodents comprising a humanized ttr locus and methods of use
WO2019067875A1 (en) 2017-09-29 2019-04-04 Regeneron Pharmaceuticals, Inc. Non-human animals comprising a humanized ttr locus and methods of use
WO2020123377A1 (en) 2018-12-10 2020-06-18 Neoimmunetech, Inc. Nrf-2 deficient cells and uses thereof
WO2020131632A1 (en) 2018-12-20 2020-06-25 Regeneron Pharmaceuticals, Inc. Nuclease-mediated repeat expansion
WO2020206139A1 (en) 2019-04-04 2020-10-08 Regeneron Pharmaceuticals, Inc. Non-human animals comprising a humanized coagulation factor 12 locus
WO2020206134A1 (en) 2019-04-04 2020-10-08 Regeneron Pharmaceuticals, Inc. Methods for scarless introduction of targeted modifications into targeting vectors
WO2020247452A1 (en) 2019-06-04 2020-12-10 Regeneron Pharmaceuticals, Inc. Non-human animals comprising a humanized ttr locus with a beta-slip mutation and methods of use
WO2020247812A1 (en) 2019-06-07 2020-12-10 Regeneron Pharmaceuticals, Inc. Non-human animals comprising a humanized albumin locus
WO2021108363A1 (en) 2019-11-25 2021-06-03 Regeneron Pharmaceuticals, Inc. Crispr/cas-mediated upregulation of humanized ttr allele
WO2022240846A1 (en) 2021-05-10 2022-11-17 Sqz Biotechnologies Company Methods for delivering genome editing molecules to the nucleus or cytosol of a cell and uses thereof
WO2022251644A1 (en) 2021-05-28 2022-12-01 Lyell Immunopharma, Inc. Nr4a3-deficient immune cells and uses thereof
WO2022256437A1 (en) 2021-06-02 2022-12-08 Lyell Immunopharma, Inc. Nr4a3-deficient immune cells and uses thereof
WO2023064924A1 (en) 2021-10-14 2023-04-20 Codiak Biosciences, Inc. Modified producer cells for extracellular vesicle production
WO2023129974A1 (en) 2021-12-29 2023-07-06 Bristol-Myers Squibb Company Generation of landing pad cell lines
WO2023220603A1 (en) 2022-05-09 2023-11-16 Regeneron Pharmaceuticals, Inc. Vectors and methods for in vivo antibody production
WO2023225665A1 (en) 2022-05-19 2023-11-23 Lyell Immunopharma, Inc. Polynucleotides targeting nr4a3 and uses thereof
WO2024064952A1 (en) 2022-09-23 2024-03-28 Lyell Immunopharma, Inc. Methods for culturing nr4a-deficient cells overexpressing c-jun
WO2024064958A1 (en) 2022-09-23 2024-03-28 Lyell Immunopharma, Inc. Methods for culturing nr4a-deficient cells
WO2024077174A1 (en) 2022-10-05 2024-04-11 Lyell Immunopharma, Inc. Methods for culturing nr4a-deficient cells

Also Published As

Publication number Publication date
AU2003290518A1 (en) 2004-04-23
WO2004031346A3 (en) 2006-10-26
AU2003290518A8 (en) 2004-04-23

Similar Documents

Publication Publication Date Title
WO2004031346A2 (en) Methods and compositions concerning designed highly-specific nucleic acid binding proteins
Silva et al. From monomeric to homodimeric endonucleases and back: engineering novel specificity of LAGLIDADG enzymes
EP3004338B1 (en) A laglidadg homing endonuclease cleaving the t cell receptor alpha gene and uses thereof
US8927247B2 (en) I-CreI derived single-chain meganuclease and uses thereof
Moure et al. The crystal structure of the gene targeting homing endonuclease I-SceI reveals the origins of its target site specificity
WO2010001189A1 (en) The crystal structure of i-dmoi in complex with its dna target, improved chimeric meganucleases and uses thereof
AU2020214090B2 (en) Inhibition of unintended mutations in gene editing
Mpakali et al. Crystal structure of insulin-regulated aminopeptidase with bound substrate analogue provides insight on antigenic epitope precursor recognition and processing
WO2021042047A1 (en) C-to-g transversion dna base editors
US20110281306A1 (en) Novel Zinc Finger Nuclease and Uses Thereof
Perry et al. Structural dynamics in DNA damage signaling and repair
EP2231697B1 (en) Improved chimeric meganuclease enzymes and uses thereof
US20140148361A1 (en) Generation and Expression of Engineered I-ONUI Endonuclease and Its Homologues and Uses Thereof
EP1989299A2 (en) Meganuclease variants cleaving a dna target sequence from a xeroderma pigmentosum gene and uses thereof
JP2010528649A (en) Method for enhancing cleavage activity of I-CreI-derived meganuclease
Fajardo-Sanchez et al. Computer design of obligate heterodimer meganucleases allows efficient cutting of custom DNA sequences
Misiaszek et al. Cryo-EM structures of human RNA polymerase I
Thyme et al. Reprogramming homing endonuclease specificity through computational design and directed evolution
Schiller et al. Structural studies of DNA end detection and resection in homologous recombination
EP2027313B1 (en) Compositions and methods comprising the use of cell surface displayed homing endonucleases
Joshi et al. Evolution of I-SceI homing endonucleases with increased DNA recognition site specificity
Schormann et al. Poxvirus uracil‐DNA glycosylase—An unusual member of the family I uracil‐DNA glycosylases
US8841109B2 (en) IGA1 protease polypeptide agents and uses thereof
Németh et al. Chemical Approach to Biological Safety: Molecular‐Level Control of an Integrated Zinc Finger Nuclease
US20230399641A1 (en) Genomic editing of improved efficiency and accuracy

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP