EP1276858A1 - Non-pcr based recombination of nucleic acids - Google Patents

Non-pcr based recombination of nucleic acids

Info

Publication number
EP1276858A1
Authority
EP
European Patent Office
Prior art keywords
stranded
double
nucleic acid
dna
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP01926467A
Other languages
German (de)
French (fr)
Inventor
Alexander Volkov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Danisco US Inc
Original Assignee
Genencor International Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Genencor International Inc
Publication of EP1276858A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C12N15/1027 Mutagenizing nucleic acids by DNA shuffling, e.g. RSR, STEP, RPR
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/66 General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease

Abstract

Described herein are methods for forming recombined nucleic acids which do not require the use of PCR for purposes of assembly or recombination. As such, the recombined nucleic acids are not subjected to size limitations or amplification of undesired errors. Moreover, the present invention allows recombination of sequences with low homology which would not occur if subjected to the high temperatures of PCR. Additionally, methods of forming proteins, screening assays, genetic pools, and recombined nucleic acids and proteins are provided herein.

Description

NON-PCR BASED RECOMBINATION OF NUCLEIC ACIDS
FIELD OF THE INVENTION
The invention relates to the in vitro recombination of nucleic acids. The methods provided herein are generally based on random fragmentation of double-stranded DNA molecules, generation of single-stranded ends on both ends of the fragments and assembly of the fragments having single-stranded ends.
BACKGROUND OF THE INVENTION
Although nature has generated many commercially useful proteins, it is generally appreciated that introducing changes into naturally occurring proteins, e.g., by substituting one or more of the amino acids within such a protein, provides variant proteins that have an altered property, which may be desired over the property of the corresponding naturally occurring protein. Improvement of proteins, in particular enzymes, ligands, receptors, and antibodies is one of the important objectives of biotechnology [Harayama, Trends Biotechnol. 16(2):76-82 (1998)]. Current methods for generating variants or mutants of naturally occurring proteins generally involve the directed and random manipulation of the nucleic acids encoding them. Widely used methods are oligonucleotide-directed mutagenesis, cassette mutagenesis, and error-prone polymerase chain reactions.
In oligonucleotide-directed or site-directed mutagenesis, a short sequence within a given nucleic acid is replaced with an oligonucleotide comprising one or more desired mutations, which are thereby introduced into the nucleic acid, altering the sequence of the nucleic acid and, potentially, the protein-encoding information. Site-directed mutagenesis has been widely used in the study of protein structure and properties [for review see, e.g., DeSantis and Jones, Curr. Opin. Biotechnol. 10(4):324-30; Candy and Duggleby, Biochim. Biophys. Acta 1385(2):323-38 (1998); MacLennan et al., Ann. N.Y. Acad. Sci. 853:31-42 (1998); di Cera, Adv. Protein Chem. 51:59-119 (1998); Flower, Biochim. Biophys. Acta 1422(3):207-34 (1999); Olah and Stiles, Methods Mol. Biol. 83:25-43 (1997); Viville, Methods Mol. Biol. 57:87-95 (1996); Deng and Nickoloff, Anal. Biochem. 200(1):81-8 (1992); Bryan, Methods Mol. Biol. 40:271-89 (1995)]. Moreover, a number of studies report on the combination of site-directed mutagenesis and methods based on the polymerase chain reaction (PCR) [for review see: Ling and Robinson, Anal. Biochem. 254(2):157-78 (1997); Costa et al., Methods Mol. Biol. 57:239-48 (1996); Barik, Methods Mol. Biol. 57:203-15 (1996); Shimada, Methods Mol. Biol. 57:157-65 (1996)].
In cassette mutagenesis, generally a double-stranded oligonucleotide harboring a block of random or partially random nucleotides is inserted into a chosen position of a nucleic acid. The cassette can either be a pure insertion, without deleting or substituting any other nucleotide, or a complete or partial replacement of existing nucleotides [Oliphant et al., Gene 44(2-3):177-83 (1986); Arkin et al., Proc. Natl. Acad. Sci. U.S.A., 89(16):7811-5 (1992); for review see, Kegler-Ebo et al., Methods Mol. Biol. 57:297-310 (1996)]. Error-prone PCR uses low-fidelity polymerization conditions to introduce a low level of point mutations randomly over a long sequence. Many protein encoding genes were subjected to random mutagenesis by error-prone PCR resulting in mutagenized genes encoding variant proteins with an altered and/or improved property [Song et al., Appl. Environ. Microbiol. 66(3):890-894 (2000); Henke et al., Biol. Chem. 380(7-8):1029-33 (1999); Buchholz et al., Nat. Biotechnol. 16(7):657-62 (1998)].
All of the above-mentioned methods can be used to introduce mutations at desired positions or randomly within a nucleic acid. However, they all have certain disadvantages, particularly with respect to repeated cycles of mutagenesis and combining mutations residing within different nucleic acid molecules. Combinatorial and repeated cycle approaches are highly desired for high throughput analysis and optimization of variant proteins. Because the oligonucleotides in site-directed mutagenesis are generally designed individually, generating variants is labor intensive, and the approach is not practical for many rounds of mutagenesis or for the generation of large libraries of variants. Cassette mutagenesis using oligonucleotides with blocks of random nucleotides requires sequencing and regrouping of individual clones after each selection round, making it also a very labor intensive approach. The deficiencies of error-prone PCR protocols include low processivity of the polymerase and are further discussed below in a general discussion of PCR protocol deficiencies.
The search for proteins with optimized properties involves searching more and more sequences within large libraries and requires increased numbers of cycles of mutagenic amplification and selection. However as discussed above, the existing mutagenesis methods that are in widespread use have distinct limitations when used for repeated cycles.
In nature the evolution of most proteins and/or protein encoding genes occurs by mutation, homologous recombination and natural selection. Homologous recombination is a ubiquitous process that plays an important role in species adaptation and survival. During meiosis homologous recombination ensures mixing and combining of the genes. The importance of homologous recombination is illustrated by its duality of functions - increasing genetic diversity as well as preserving genetic integrity [Stahl, Sci. Am. 256:91-101 (1987)]. However, natural in vivo recombination mechanisms usually operate at low efficiencies. Various approaches have emerged as tools to mimic and accelerate nature's recombination strategy and direct the evolution of protein function [for review see, Kuchner and Arnold, Trends Biotechnol. 15(12):523-30 (1997)]. In one method a set of parent genes is digested with DNase I to create a pool of short random fragments (10- to 50-bp) that are reassembled into full-length genes by repeated thermocycling in the presence of DNA polymerase [Stemmer, Proc. Natl. Acad. Sci. U.S.A., 91(22):10747-51 (1994); Stemmer, Nature 370(6488):389-91 (1994)]. Reportedly, a gene or a plasmid can be assembled from a large number of oligodeoxynucleotides [Stemmer et al., Gene, 164(1):49-53 (1995)]. Other methods directed to recombining nucleic acids are published in WO 97/07205; WO 98/01581; WO 98/28416; WO 98/41622; WO 98/41623; WO 98/41653; WO 99/23236; WO 99/65927; WO 00/04190; Stemmer, US Patent No. 5,605,793; Stemmer et al., US Patent No. 5,811,238; Short, US Patent No. 5,965,408; van Kempen et al., Biochemistry 37:3459-3466 (1998); Volkov et al., Nucleic Acids Res. 27(18):i-vi (1999); Shao et al., Nucleic Acids Res. 26(2):681-683 (1998); Judo et al., Nucleic Acids Res. 26(7):1819-1825 (1998); Zhao and Arnold, Nucleic Acids Res. 25(6):1307-1308 (1997).
In yet another study it was reported that during PCR, as a result of breaking and/or nicking of the DNA template, in vitro recombination occurred [Marton et al., Nucleic Acids Res. 19(9):2423-6 (1991)]. It was reported that such damage allows the formation of hybrid duplexes containing at least one truncated strand (e.g., terminating at the position of the damage and/or nick), the 3' end of which maps within a homologous region of another DNA fragment. Extension of this 3' end by the DNA polymerase then results in a linkage of sequences identical to that arising from homologous recombination. Another study reported that PCR co-amplification of two distinct HIV1 tat gene sequences led to the formation of recombinant DNA molecules [Meyerhans et al., Nucleic Acids Res. 18(7):1687-91 (1990)]. A variation of the above methods has also been reported. In one report, a method is described which consists of priming the template sequences (e.g. nucleic acids encoding variant proteins) followed by repeated cycles of denaturation and extremely abbreviated annealing/polymerase-catalyzed extension, see, e.g., Zhao et al., Nat. Biotechnol. 16(3):258-61 (1998). Under those conditions, in each cycle, the growing fragments can anneal to different templates based on sequence complementarity and extend further. This process is repeated until the full-length sequences are formed.
Recombining nucleic acids, when combined with an activity screen or selection protocol can accelerate finding desired traits. Such methods have reportedly been used to enhance enzyme stability (Zhao et al., supra), to enhance enzyme activity [Stemmer, Nature 370:389-391 (1994); Buchholz et al., Nat. Biotechnol. 16(7):657-62 (1998); Christians et al., Nat. Biotechnol. 17(3):259-264 (1999); Merz et al., Biochemistry 39(5):880-9 (2000)], to change substrate specificity [Zhang et al., Proc. Natl. Acad. Sci. U.S.A., 94(9):4504-9 (1997)], and to improve protein folding [Crameri et al., Nat. Biotechnol. 14(3):315-9 (1996); Buchholz et al., Nat. Biotechnol. 16(7):657-662 (1998)]. In a recent adaptation of these methods, two or more naturally occurring homologous genes or gene families are used as the starting genetic material to efficiently recombine sequences from different species [Crameri et al., Nature 391(6664):288-91 (1998); Minshull and Stemmer, Curr. Opin. Chem. Biol. 3(3):284-90 (1999); Chang et al., Nat. Biotechnol. 17(8):793-7 (1999); Ness et al., Nat. Biotechnol. 17(9):893-6 (1999)].
Overall, the methods described above are either labor intensive or require PCR. For example, many of the methods discussed above require a PCR reassembly step and, in addition, a PCR procedure to amplify the desired full-length nucleic acid. During prolonged cycles of PCR the rate of down-mutations grows with the information content of the sequence (e.g., the length of a sequence, numbers of PCR cycles, library size). Eventually, the balance of down-mutations to up-mutations will statistically prevent the selection of further improvements. In addition, PCR-based methods are known to be most powerful for the manipulation of nucleic acids in the range of 1 to 5 kb and usually become very inefficient for amplification of DNA sequences above 20 kb, failing to produce a sufficient amount of full-length product for cloning and subsequent analysis. Although amplification of nucleic acids greater than 50 kb was recently reported, amplification of such long sequences may, as pointed out above, lead to the accumulation of an unacceptably high number of errors and to inactivation of biological functions encoded by these sequences. Thus, methods employing PCR have many deficiencies in in vitro homologous recombination of larger molecules, such as viral genomes and/or chromosomal fragments.
Another limitation of the currently used methods originates from the nature of the generation of small DNA fragments and their assembly process. Stemmer [Proc. Natl. Acad. Sci. U.S.A., 91(22):10747-51 (1994)] used a pool of short random fragments of 10- to 50-bp in length that needed to be assembled into full-length genes by repeated thermocycling in the presence of DNA polymerase. Two random fragments with complementary ends may be assembled together only if the region of complementarity is long enough to form stable double-stranded DNAs (hybrids) at high temperatures during PCR. This requirement may significantly limit recombination between DNA sequences with low sequence homology and as such requires an additional PCR step to amplify the full-length product. On the other hand, long nucleic acid molecules may comprise regions of internal homology (self-complementarity) that after denaturation under PCR conditions may form stable intramolecular stems, which interfere with the process of generating a full-length product.
Accordingly, there is a need in the art for an improved method of obtaining random pieces of genetic material for assembly to produce random nucleic acids and protein variants that may be screened for a particular use.
SUMMARY OF THE INVENTION
The present invention provides methods directed to the in vitro recombination of nucleic acids. The methods are based on random fragmentation of double-stranded nucleic acid molecules resulting in smaller double-stranded fragments, generation of single-stranded ends (or cohesive ends) on those fragments and assembly. The methods provided herein do not require the use of PCR amplification or PCR-like thermocycling for the assembly of recombined nucleic acids and thus offer advantages over existing in vitro recombination methods. As further discussed below, PCR may be used for amplification once the recombined nucleic acid is formed.
In one aspect of the invention a method for forming a recombined nucleic acid is provided. In one embodiment, the method comprises randomly fragmenting one or more double-stranded nucleic acid molecules, preferably DNA, to form double-stranded fragments having ends; generating single-stranded cohesive ends on each end of said fragments; and assembling together at least two of said fragments having said cohesive ends to form a recombined nucleic acid. In one embodiment, the nucleic acid molecules and said fragments remain double-stranded throughout said fragmenting, generating single-stranded cohesive ends and assembling steps. The recombined nucleic acid formed from the present invention can be further subjected to at least one more repetition of said fragmenting, generating single-stranded cohesive ends and assembling steps. Preferably, the method is repeated at least 10 or more times. Moreover, in another embodiment, one or more double-stranded nucleic acid molecules are added to the formed recombined nucleic acid and said fragmenting, generating single-stranded cohesive ends and assembling steps are repeated.
The double-stranded nucleic acid molecule or molecules which are randomly fragmented can be any variety of molecules. In a preferred embodiment, the nucleic acid molecule is not subjected to a size limitation. Preferably, the nucleic acid molecule is a gene, an operon or metabolic pathway containing more than one gene, or a chromosome. In one aspect of the invention, at least two nucleic acid molecules having sequences which differ from each other are fragmented. The nucleic acid molecules can be homologs of one another, variants of a nucleic acid, or a mix of naturally occurring nucleic acid molecules and variants of one another.
Also provided herein is a method for forming a recombined nucleic acid having a gene deleted or inverted. In one aspect, the method comprises randomly fragmenting a double-stranded nucleic acid molecule comprising a gene flanked by known sequences to form double-stranded fragments having ends; generating single-stranded cohesive ends on each end of said fragments; and assembling said fragments with a double-stranded insert having single-stranded ends complementary to said known sequences to form a recombined nucleic acid having said gene deleted or inverted. Wherein a gene is deleted, the insert does not comprise the gene.
In one embodiment for forming a recombined nucleic acid having a gene inverted, the double-stranded insert having single-stranded ends is formed by a method comprising amplifying a gene with primers that have additional nucleotides that are not natural extensions of said gene, wherein said additional nucleotides are complementary to opposing ends of said known sequences to form an amplification product. Preferably, the amplification product is treated to form said insert having single-stranded ends complementary to said known sequences.
In a further aspect of the invention, a method for forming a recombined nucleic acid having a gene inserted therein is provided. In one embodiment, the method comprises randomly fragmenting a double-stranded nucleic acid molecule; generating single-stranded cohesive ends on each end of said fragments; and assembling said fragments with a double-stranded insert comprising a gene and having single-stranded ends to form a recombined nucleic acid having a gene inserted therein. In one aspect, the double-stranded insert is formed by a method comprising amplifying said insert with primers that have additional nucleotides to form an amplification product; and treating said amplification product to form said insert having single-stranded ends. The additional nucleotides can be random or natural extensions from a selected site for insertion.
Additionally, the recombined nucleic acids formed herein can be used in conventional amplification after formation, the formation of proteins, screening assays, in formation of genetic pools, cloning or expression vectors, inserting heterologous genes into existing operons for the purpose of metabolic engineering, and have a variety of other applications. The compositions formed from the methods of the present invention are also provided herein.
DETAILED DESCRIPTION OF THE INVENTION
The present invention provides for forming recombined nucleic acids which do not require the use of PCR for purposes of assembly or recombination. As such, the recombined nucleic acids are not subjected to size limitations or amplification of undesired errors. Moreover, the present invention allows recombination of sequences with low homology which would not occur if subjected to the high temperatures of PCR. Additionally, methods of forming proteins, screening assays, genetic pools, and recombined nucleic acids and proteins are provided herein.
In one aspect of the invention a method for forming a recombined nucleic acid is provided. In one embodiment, the method comprises randomly fragmenting one or more double-stranded nucleic acid molecules to form double-stranded fragments having ends; generating single-stranded cohesive ends on each end of said fragments; and assembling together at least two of said fragments having said cohesive ends to form a recombined nucleic acid. Nucleic acid as used herein can be any nucleic acid, DNA (synthetic, genomic or cDNA), RNA, or a mix of DNA and RNA. In preferred embodiments, nucleic acid is DNA. For convenience, DNA is used herein for illustration.
The one or more double-stranded DNA molecules can be any double-stranded DNA molecule. As further described below, the DNA molecule can be recombined in a variety of ways. In a preferred embodiment, DNA molecules which differ in sequence from each other are used. In this embodiment, a heterogenous population of DNA molecules may be used. The terms "population" or "library" or grammatical equivalents thereof, as used herein, generally mean a collection of components such as nucleic acids, nucleic acid fragments, proteins, vectors, constructs, cells, etc. Usually, a population of the invention comprises from at least two components to 10^9 components. Preferred are populations comprising from at least 10 components to 10^8 components, more preferred are populations comprising from at least 50 components to 10^7 components and most preferred are populations comprising from at least 100 components to 10^6 components. Preferably, within each family of components, e.g., nucleic acids, the family members are related, but differ in at least one aspect, e.g., in their sequence, i.e., they are not identical.
In an alternative embodiment, the recombined nucleic acid is formed of one initial DNA molecule, wherein the sequence of the initial DNA molecule has been rearranged such that the recombined nucleic acid has a sequence which differs from the initial DNA molecule. In this embodiment, more than one double-stranded DNA molecule can be used wherein the DNA molecules are homogenous. In this sense, homogenous refers to DNA molecules having the same sequence.
In one embodiment, the DNA molecules are at least about 30 bp and can be any desired length. In one embodiment, the DNA molecules are genes, operons or metabolic pathways, or chromosomes. The double-stranded DNA molecules to be fragmented can be any molecules including naturally occurring molecules and variants thereof. By "naturally occurring", "wild type" or grammatical equivalents thereof, is meant a nucleic acid sequence or an amino acid sequence that is found in nature and in one embodiment, includes naturally occurring allelic variations. Alternatively, the DNA molecules are non-naturally occurring sequences. By "non-naturally occurring", or grammatical equivalents thereof, is meant a nucleic acid sequence or an amino acid sequence that is not found in nature.
Preferably, the DNA molecules are a mixture of naturally occurring and non-naturally occurring sequences.
In one aspect of this embodiment, the DNA molecule is a variant of a naturally occurring nucleic acid. A "variant" or grammatical equivalents thereof, refers to a component that is altered at one or more sites with respect to a corresponding naturally occurring component. Thus, a nucleic acid variant (or variant nucleic acid) comprises a nucleotide sequence that is altered by one or more nucleotides when compared to a nucleotide sequence of a naturally occurring nucleic acid or to a nucleotide sequence of a non-naturally-occurring sequence. Accordingly, a protein variant (or variant protein) comprises an amino acid sequence that is altered by one or more amino acid residues when compared to an amino acid sequence of a naturally occurring protein or to an amino acid sequence of a non-naturally-occurring protein. In one embodiment, a variant has one or more deletions, substitutions, insertions, truncations or combinations thereof.
In a preferred embodiment of the invention, a population of double-stranded DNA molecules comprises a naturally-occurring nucleic acid, homologs, naturally occurring allelic variations thereof as well as random and site-directed variants. Wherein all the DNA molecules are based on the same nucleic acid, being variants or homologs thereof, etc., the DNA molecules are said to be related or a family. In one aspect of the invention, homolog refers to a gene or protein which is identified as functionally equivalent but produced in a different species.
In other embodiments of the invention, a population of double-stranded DNA molecules is generated by mutagenesis. The mutagenesis methods employed may be site-directed or random and are generally known in the art. Alternatively, error-prone PCR can be used to generate the double-stranded DNA molecules. Other methods for obtaining DNA molecules can be used, such as using mutator strains, chemical mutagenesis or irradiation with X-rays or ultraviolet light using methods as known in the art. In one aspect of the invention, the DNA molecules can be represented at about the same ratio. For example, the population may comprise five different variants, 'a', 'b', 'c', 'd', and 'e' of a naturally occurring nucleic acid. These five variants may be combined in a 1:1:1:1:1 ratio. In another aspect of this embodiment, one variant (e.g., 'a', that may comprise a desired mutation) is over-represented over the other variants (e.g., 5:1:1:1:1). Each variant may be present in a different molar ratio in the population.
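As a small worked illustration of the ratios above, the following Python sketch converts a molar ratio into the fraction of the pool contributed by each variant. The function name `mixing_fractions` is hypothetical and is not taken from the specification.

```python
from fractions import Fraction

def mixing_fractions(ratio):
    """Convert a molar ratio (variant name -> parts) into pool fractions."""
    total = sum(ratio.values())
    return {name: Fraction(parts, total) for name, parts in ratio.items()}

# Equal representation of the five variants (1:1:1:1:1).
equal = mixing_fractions({"a": 1, "b": 1, "c": 1, "d": 1, "e": 1})

# Variant 'a' over-represented (5:1:1:1:1).
skewed = mixing_fractions({"a": 5, "b": 1, "c": 1, "d": 1, "e": 1})

print(equal["a"], skewed["a"], skewed["b"])
```

In the 5:1:1:1:1 case, variant 'a' makes up five ninths of the pool and each other variant one ninth.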
In a preferred embodiment, the method comprises fragmenting the double-stranded DNA molecules. In a preferred embodiment, the DNAs are randomly fragmented. Generally, each double-stranded DNA is fragmented into at least two fragments. The fragments may be of different sizes and are preferably at least about 15 base pairs and may be at least 1 kb, 5 kb, 10 kb, or preferably larger in some embodiments. Random fragmentation can be done by using enzymes including, but not limited to DNase I [Liao, J. Biol. Chem. 249:2354 (1974); Matsuda and Ogoshi, J. Biochem. 59:230 (1966); Hong, Methods Enzymol. 155:93 (1987)], P1 nuclease [Furuichi and Miura, Nature 253:374 (1975)], S1 nuclease [Noll, Nature 251:249 (1974)], T7 endonuclease [Center et al., Proc. Natl. Acad. Sci. U.S.A. 65:242 (1970); de Massy et al., J. Mol. Biol. 193:359 (1987)], mung bean nuclease, or combinations thereof, or in combinations with intercalating agents, such as ethidium bromide. Random fragmentation may be by shearing of DNA in one embodiment, and includes, but is not limited to sonication of DNA and passage of the DNA through a tube having a small orifice, such as a needle.
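Random fragmentation with a minimum fragment size can be pictured with a short in silico sketch. The Python below is only an illustration under stated assumptions: the duplex is modeled as a single string (its top strand), cuts are blunt, and the function name `random_fragments` and its parameters are hypothetical rather than part of the described method.

```python
import random

def random_fragments(seq, n_cuts, min_len=15, rng=None):
    """Cut seq at n_cuts random positions so every fragment is >= min_len long."""
    rng = rng or random.Random()
    if len(seq) < (n_cuts + 1) * min_len:
        raise ValueError("sequence too short for the requested number of cuts")
    # Length left over once every fragment has been given its minimum size.
    slack = len(seq) - (n_cuts + 1) * min_len
    # Distribute the slack at random, then space the cuts min_len apart.
    offsets = sorted(rng.randint(0, slack) for _ in range(n_cuts))
    cuts = [off + (i + 1) * min_len for i, off in enumerate(offsets)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

# Example: a 200-nt model sequence cut four times into five fragments.
seq = "ACGT" * 50
frags = random_fragments(seq, n_cuts=4, rng=random.Random(0))
print([len(f) for f in frags])
```

Joining the returned fragments in order reproduces the input sequence, which is a convenient sanity check for any such simulation.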
In one embodiment, first and second double-stranded DNAs are fragmented to generate at least 4 fragments. Generally the number of different specific nucleic acid fragments will be at least about 100, preferably at least about 500, more preferably at least about 1000 and most preferably at least about 10^4.
The DNA fragments generated by fragmentation of a population of double-stranded DNA or the double-stranded DNA population may comprise short single-stranded 5'- or 3'- protruding ends. In one embodiment of the invention the short 5'- or 3'- protruding ends of the double-stranded DNA population or the short 5'- or 3'- protruding ends of the DNA fragments are removed. Enzymatic removal of 5'- or 3'- protruding ends includes, but is not limited to using one or more of the following enzymes: Bal 31, S1 nuclease, mung bean nuclease, P1 nuclease, DNase I, exonuclease I, exonuclease VII, N. crassa nuclease [see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory Press, New York (1989); Brown, Molecular Biology LabFax, BIOS Scientific Publishers Limited; Information Press Ltd, Oxford, UK (1991)]. Alternatively, short overhanging 5'- protruding ends serve as a template for DNA polymerase, which is added to synthesize a blunt-end DNA or a blunt-end DNA fragment. Such DNA polymerases include, but are not limited to DNA polymerase I (Kornberg polymerase), DNA polymerase I (Klenow fragment), T4 DNA polymerase, T7 DNA polymerase, Taq DNA polymerase, micrococcal DNA polymerase, etc.
In another aspect of the invention, single-stranded cohesive ends are generated. By "single-stranded end", "cohesive end" or grammatical equivalents thereof, herein is meant, a nucleic acid that is protruding from the end(s) of an otherwise double-stranded nucleic acid. In one embodiment the fragments remain double-stranded throughout two or more subsequent steps of the method of the invention. In this embodiment, it is understood that the fragments remain double-stranded but for the cohesive ends.
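Whether two cohesive ends can pair during assembly comes down to base complementarity. The following Python sketch illustrates that check under simplifying assumptions: both overhangs have the same polarity (both 5' or both 3'), both are written 5' to 3', and only perfect Watson-Crick pairing is considered. The helper names are hypothetical.

```python
# Watson-Crick complement table for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def can_anneal(overhang_a, overhang_b):
    """True if two same-polarity overhangs (each written 5'->3') pair perfectly."""
    return overhang_a == reverse_complement(overhang_b)

print(can_anneal("ACGG", "CCGT"))  # complementary ends
print(can_anneal("ACGG", "ACGG"))  # identical, non-palindromic ends
print(can_anneal("GATC", "GATC"))  # palindromic end pairs with itself
```

Palindromic overhangs such as GATC are self-complementary, whereas a non-palindromic overhang pairs only with its reverse complement.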
In preferred embodiments, the cohesive ends are generated by digestion of the DNA fragments and are part of or integral to the DNA fragment prior to the generation of the cohesive ends. In such an embodiment, the methods do not comprise attachment of a single stranded oligo or PCR synthesis to form the single stranded end. Preferred embodiments utilize a type IIs restriction endonuclease or an exonuclease.
Alternatively, in some embodiments herein, cohesive ends may be formed by the addition of oligonucleotides either in a polymerase reaction or in a ligation reaction using standard methods known in the art. Additionally, cohesive ends can be formed by removing nucleotides, or by the addition and removal of nucleotides. For example, in one embodiment, the cohesive ends can be added by using primers to form a template, synthesizing a complementary end thereon, and then removing said template. Preferred embodiments exclude the use of ribonucleotides and/or ribonucleases in the generation of the cohesive ends. Alternative embodiments include nucleic acid fragments and/or the generation of cohesive ends which include the use of ribonucleotides. The nucleic acids may comprise a mix of DNA and RNA. In the generation of cohesive ends, wherein nucleotides are removed which include RNA, ribonucleases or chemical agents which degrade RNA may be used.
As used herein, a type IIs endonuclease is an enzyme which binds to a short recognition sequence, rarely palindromic, and which will cut one strand (and usually both) downstream of the recognition sequence instead of within it. The terms "cut", "cleave", or grammatical equivalents thereof, as used herein, refer to the digestion of nucleic acid using enzymes or breaking the nucleic acid or to the digestion of proteins using proteases or other known protein cleaving agents. Thus, type IIs endonucleases allow generation of cohesive ends which can have random sequences, since cleavage is based on the recognition site, whereas the cleavage site can be any sequence at all falling at the appropriate distance from the recognition site.
Typically, the type IIs endonucleases do not require a palindrome for site recognition and typically cut DNA a measured number of bases to one side of the recognition site [e.g., the MboII site is 5'...GAAGA...3', and the cut site is 8 bases 3' of the recognition site on the upper DNA strand and 7 bases 5' of the recognition site on the bottom strand; abbreviated as GAAGA(8/7)]. Some type IIs restriction enzymes, such as BaeI, BcgI, BsaXI, Bsp24I, CjeI and CjePI, cut on both sides of their recognition sequence, and thus have four cleavage sites instead of two. Some type IIs restriction enzymes recognize more than one nucleic acid sequence (e.g., BaeI, BcgI, BsaXI, BseMII, Bsp24I, CjeI, CjePI, MmeI, TaqII and Tth111II). REBASE (restriction enzyme data base) is a comprehensive database of restriction enzymes, including type IIs restriction enzymes (Roberts and Macelis, Nucleic Acids Res. 26(1):338-350 (1998); incorporated as reference in its entirety). In one embodiment, the site is 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 bases away from the recognition site. In one aspect of the invention, one or more of the following type IIs endonucleases are used (names of enzymes in parentheses are isoschizomers; an asterisk indicates that the enzyme was not commercially available as of the publication of Roberts and Macelis, supra, but can be isolated by methods known in the art): AceIII*, BaeI*, BbvI, BbvII* (BbsI, BpiI, BpuAI), Bce83I*, BcefI*, BcgI, BciVI, BfiI, BinI* (AclWI, AlwI), BsaXI*, BseMII*, BseRI, BsgI, BsmAI (Alw26I), Bsp24I*, BspMI, BsrDI (Bse3DI), BtsI, CjeI*, CjePI*, EciI*, Eco31I (BsaI), Eco57I, Esp3I (BsmBI), FauI, FinI (BsmFI), FokI (BseGI, BstF5I, StsI*), GsuI (BpmI), HgaI, HphI (AsuHPI), Ksp632I (Bsu6I, Eam1104I, EarI), MboII, MmeI*, MnlI, PleI (MlyI*), RleAI*, SapI, SfaNI (BscAI*), Sth132I, TaqII*, and Tth111II*.
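By way of illustration only, the SITE(top/bottom) notation used above can be read as two cut offsets counted from the end of the recognition sequence. The following Python sketch locates the first recognition site on the upper strand and reports the resulting single-stranded region. The function name `type_iis_cut`, the example sequences and the returned dictionary are hypothetical and are not part of the disclosure; enzymes that cut on both sides of their site are not modelled.

```python
def type_iis_cut(seq, site, top_offset, bottom_offset):
    """Read SITE(top/bottom) notation for one site on the upper strand (5'->3').

    Returns the single-stranded region, written as upper-strand sequence,
    or None if the site is absent or the cut falls beyond the molecule.
    """
    seq = seq.upper()
    start = seq.find(site.upper())
    if start < 0:
        return None
    end = start + len(site)             # first position after the recognition site
    top_cut = end + top_offset          # upper strand is cut after this many bases
    bottom_cut = end + bottom_offset    # lower-strand cut, in upper-strand coordinates
    if max(top_cut, bottom_cut) > len(seq):
        return None
    lo, hi = sorted((top_cut, bottom_cut))
    if top_cut > bottom_cut:
        kind = "3' overhang"
    elif top_cut < bottom_cut:
        kind = "5' overhang"
    else:
        kind = "blunt"
    return {"overhang": seq[lo:hi], "length": hi - lo, "kind": kind}


# Hypothetical example sequences, chosen only to exercise the notation.
mboii = type_iis_cut("TTGAAGAACGTACGTAC", "GAAGA", 8, 7)       # GAAGA(8/7)
bbvi = type_iis_cut("GCAGCAAAAAAAACCGGTT", "GCAGC", 8, 12)     # GCAGC(8/12)
```

Under these assumptions the GAAGA(8/7) example yields a one-nucleotide end and the GCAGC(8/12) example a four-nucleotide end, whose sequence is determined by the DNA flanking the site and not by the recognition sequence itself.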
Type IIs restriction enzymes leaving cohesive ends comprising only one nucleotide (such as MboII) generate a maximum of only 4 different cohesive ends ('A', 'C', 'G', and 'T'); type IIs restriction enzymes leaving cohesive ends comprising two nucleotides (such as BsgI) generate a maximum of 16 different cohesive ends ('AA', 'AG', 'AC', 'AT', 'CA', 'CC', 'CG', 'CT', 'GA', 'GC', 'GG', 'GT', 'TA', 'TC', 'TG', and 'TT'); type IIs restriction enzymes leaving cohesive ends comprising three nucleotides (such as Ksp632I) generate a maximum of 64 different cohesive ends; type IIs restriction enzymes leaving cohesive ends comprising four nucleotides (such as BbvI) generate a maximum of 256 different cohesive ends; type IIs restriction enzymes leaving cohesive ends comprising five nucleotides (such as HgaI) generate a maximum of 1024 different cohesive ends; and type IIs restriction enzymes leaving cohesive ends comprising six nucleotides (such as CjeI) generate a maximum of 4096 different cohesive ends. In one embodiment, type IIs restriction enzymes leaving cohesive ends comprising five or six nucleotides are preferred.
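The maxima given above follow from there being four possible nucleotides at each position of the cohesive end, i.e. 4 to the power n for an end of n nucleotides. A minimal Python sketch that enumerates the ends is given below for illustration; the helper name `cohesive_ends` is hypothetical.

```python
from itertools import product


def cohesive_ends(n):
    """Enumerate every possible cohesive end of n nucleotides."""
    return ["".join(bases) for bases in product("ACGT", repeat=n)]


# Maximum number of distinct cohesive ends for overhang lengths 1 to 6.
max_ends = {n: len(cohesive_ends(n)) for n in range(1, 7)}
```

Here `max_ends` reproduces the series 4, 16, 64, 256, 1024 and 4096 recited above.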
In one aspect of this embodiment, the double-stranded DNAs or the double-stranded DNA fragments, generated by fragmentation, are cloned into a vector and then released by digestion with a type IIs restriction endonuclease to generate cohesive ends. Preferably, the vector comprises a recognition site for a type II restriction endonuclease (such as EcoRV), flanked on either side by a recognition sequence for a type IIs restriction endonuclease. In another aspect of this embodiment, the recognition sequences for the type II and type IIs restriction endonucleases may overlap. The double-stranded DNAs or the double-stranded DNA fragments are cloned into the type II restriction site, thereby generating constructs comprising the double-stranded DNAs or the double-stranded DNA fragments, flanked by two type IIs restriction sites. Upon digestion of this construct with a type IIs restriction endonuclease, the double-stranded DNAs or the double-stranded DNA fragments are released, each comprising two cohesive ends.
In one embodiment, the size of the vector is larger than the average size of the DNA fragments to be cloned into the vector. This allows the double-stranded DNAs or the double-stranded DNA fragments that are not ligated into the vector to be removed by centrifugation using, e.g., a sizing filter, such as a Microcon 100 filter. The ligated DNA will remain on the filter, while non-ligated DNA passes through the filter. The ligated DNA may then be digested with a type IIs restriction endonuclease and the released double-stranded DNAs or double-stranded DNA fragments, each comprising cohesive end(s), can be separated from the vector by passing them through a Microcon 100 filter, as described above. This time, the vector will remain on the filter. The vector thus not only acts as a provider of the type IIs recognition sites, but also helps to discriminate between un-ligated, ligated and released double-stranded DNAs and/or released double-stranded DNA fragments. Alternatively, gel electrophoresis can be used instead of centrifugation for separation of the vector, un-ligated, ligated and released double-stranded DNAs and/or released double-stranded DNA fragments. The released double-stranded DNA fragments comprising cohesive ends are then used for assembling or recombining, as is described below.
In another embodiment, the ligation products are transformed into a host cell and propagated prior to further manipulation with a type IIs restriction endonuclease. Appropriate host cells are described further below.
In another embodiment, the double-stranded DNA fragments are digested with one or more type IIs restriction enzymes without first being subcloned into a vector. In this embodiment, the type IIs restriction endonuclease(s) cut at recognition site(s) located within the double-stranded DNA fragments and generate cohesive ends.
In another preferred embodiment, the double-stranded DNA population is digested with one or more type IIs restriction enzymes without first being fragmented as described above. In this embodiment, the type IIs restriction endonuclease(s) cut at recognition site(s) located within the double-stranded DNA population and generate cohesive ends. The use of more than one type IIs restriction enzyme serves simultaneously to fragment the double-stranded DNA population and to generate cohesive ends.
Depending on the size of the DNA fragments or size of the double-stranded DNAs within the double-stranded DNA population, an appropriate type IIs restriction endonuclease may be chosen for digestion to generate double-stranded DNA fragments of a desired size and comprising cohesive end(s). For example, the recognition sequence for Sth132I, CCCG(4/8), statistically occurs once every 256 nucleotides; the recognition sequence for BseMII, CTSAG(10/8), wherein S is G or C, statistically occurs once every 512 nucleotides; the recognition sequence for BbvI, GCAGC(8/12), statistically occurs once every 1024 nucleotides; the recognition sequence for MmeI, TCCRAC(20/18), wherein R is A or G, statistically occurs once every 2048 nucleotides; the recognition sequence for BspMI, ACCTGC(4/8), statistically occurs once every 4096 nucleotides; the recognition sequence for BaeI, (10/15)ACNNNNGTAYC(12/7), wherein N is A or C or G or T, and wherein Y is C or T, statistically occurs once every 8192 nucleotides; and the recognition sequence for SapI, GCTCTTC(1/4), statistically occurs once every 16384 nucleotides.
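The statistical frequencies recited above can be reproduced with a simple model, sketched below in Python for illustration. The model assumes that the four bases are equally frequent and independent, and it counts occurrences on one strand only; these simplifications, the table name `IUPAC_DEGENERACY` and the function name `expected_spacing` are assumptions made for this sketch. Each fully specified position contributes a factor of 4, a two-fold degenerate position (such as S, R or Y) a factor of 2, and N a factor of 1.

```python
from fractions import Fraction

# Number of bases matched by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def expected_spacing(site):
    """Average number of nucleotides between occurrences of `site`,
    assuming equal, independent base frequencies on a single strand."""
    spacing = Fraction(1)
    for code in site.upper():
        spacing *= Fraction(4, IUPAC_DEGENERACY[code])
    return spacing


# Recognition sequences as recited in the text (cut offsets omitted).
sites = ["CCCG", "CTSAG", "GCAGC", "TCCRAC", "ACCTGC", "ACNNNNGTAYC", "GCTCTTC"]
spacings = {site: int(expected_spacing(site)) for site in sites}
```

In real genomes the base composition is not uniform, so the observed fragment sizes may differ from these estimates.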
In one embodiment, an adaptor comprising a recognition sequence for a type IIs restriction endonuclease is ligated to both ends of the first double-stranded DNA fragments. In this embodiment, two complementary oligonucleotides comprising a recognition sequence for a type IIs restriction endonuclease are synthesized and hybridized to each other, thereby forming a double-stranded "type IIs adaptor", which is then ligated to the DNA fragment(s). Methods for the chemical synthesis of oligonucleotides are known in the art and as such are not presented herein. The recognition sequences for type IIs restriction endonucleases can be found, e.g., in Roberts and Macelis, supra. The DNA sequences of these type IIs adaptors are designed in such a way that the cleavage reaction is directed toward sequences located within the double-stranded DNA fragments. After ligating the type IIs adaptor to the double-stranded DNA fragments, the type IIs restriction endonuclease whose recognition sequence is provided by the type IIs adaptor is added and the DNA is digested, thereby generating first double-stranded DNA fragments and second double-stranded DNA fragments comprising cohesive ends. In a preferred embodiment, the cohesive ends generated on a first double-stranded DNA fragment have complementation to the cohesive ends generated on a second double-stranded DNA fragment. In another embodiment, an adaptor comprising a recognition sequence for a type IIs restriction endonuclease is ligated to both ends of the double-stranded DNAs comprised within the double-stranded DNA population, i.e., the double-stranded DNAs are not fragmented prior to the ligation of the type IIs adaptor.
In one embodiment, the type IIs adaptors are added to both ends of the double-stranded DNA fragment or double-stranded DNA simultaneously. In this aspect of the embodiment, both ends of the double-stranded DNA are accessible for ligation of the type IIs adaptors, e.g., by being blunt-ended. In this embodiment, the type IIs adaptors ligated to the ends of the respective DNAs can be identical.
In another aspect of this embodiment, the type IIs adaptors are added to both ends of the double-stranded DNA fragment or double-stranded DNA sequentially. In this aspect of the embodiment, initially, usually only one end of the double-stranded DNA is accessible for ligation of the type IIs adaptor (the 'first accessible end'), e.g., by being blunt-ended. The other end either comprises a cohesive end or is protected by other DNA, e.g., vector DNA. After the first ligation of the type IIs adaptor to the first accessible end, the other end is prepared for the second ligation, i.e., is made blunt-ended. This can be done by eliminating a cohesive end by, e.g., exonuclease digestion or by a polymerase fill-in reaction. In case the second end is protected by other DNA (e.g., vector DNA), this other DNA is digested away and blunt ends are prepared. A second type IIs adaptor is ligated to the second accessible end. In this embodiment, the type IIs adaptors ligated to the ends of the respective DNAs can be identical or different.
In one embodiment, the type IIs adaptor of the invention may be labeled. By "labeled" herein is meant that a compound (such as a type IIs adaptor) has at least one element, isotope or chemical compound attached to enable the detection of the compound. The labels may be incorporated into the compound at any position. The compound is either directly or indirectly labeled with a label which provides a detectable signal, e.g., radioisotopes, fluorescers, colored dyes, enzymes, antibodies, particles such as magnetic particles, chemiluminescers, or specific binding molecules, etc. Specific binding molecules include pairs, such as biotin and streptavidin, digoxin and antidigoxin, etc. For the specific binding members, the complementary member would normally be labeled with a molecule which provides for detection, in accordance with known procedures, as outlined above. The label can directly or indirectly provide a detectable signal. A label attached to a type IIs adaptor can be used to purify the type IIs adaptor away from the digested DNA fragments.
In another embodiment of the invention, the single-stranded ends are generated using an exonuclease. Exonucleases are commercially available and include, but are not limited to, λ exonuclease, bacteriophage T7 gene 6 exonuclease, Bal 31 nuclease, and exonuclease III [see Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory Press, New York (1989); Brown, Molecular Biology LabFax, BIOS Scientific Publishers Limited; Information Press Ltd, Oxford, UK (1991)]. The exonuclease is added to the double-stranded DNA fragments and is incubated, according to the recommendations of the supplier, under conditions sufficient for the successive removal of nucleotides from the double-stranded DNA fragments, thereby generating single-stranded or cohesive ends.
In one embodiment, the single-stranded ends are generated using a 5'-3' exonuclease, thereby generating protruding 3' cohesive ends. Protruding 3' ends ensure that the DNA fragments are not modified by DNA polymerase or ligase until they hybridize with a complementary sequence. In one aspect of this embodiment, the single-stranded ends are generated using λ exonuclease. λ exonuclease is 10-100 times more active with double-stranded DNA (blunt-ended or with a 3'-overhang) than with single-stranded DNA, but is inefficient with double-stranded DNA with a 5'-overhang. Nicks and gaps are usually not recognized as starting points. λ exonuclease catalyzes the processive, stepwise release of 5'-mononucleotides from the 5' ends of double-stranded DNA. The preferred substrate is double-stranded DNA with a terminal 5'-phosphate. Appropriate reaction conditions, following published protocols and/or protocols provided by the supplier of this commercially available enzyme, are used to generate 3' cohesive ends on the double-stranded DNAs and double-stranded DNA fragments [Little, in Gene Amplification and Analysis: Structural Analysis of Nucleic Acids (J.G. Chirikjian and T.S. Papas eds.) Elsevier, New York, Vol 2, p136 (1981); Radding, J. Mol. Biol. 18:235 (1966); Little et al., J. Biol. Chem. 242:672 (1967); Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory Press, New York (1989); Brown, Molecular Biology LabFax, BIOS Scientific Publishers Limited; Information Press Ltd, Oxford, UK (1991)]. In another aspect of this embodiment, the single-stranded ends are generated using the bacteriophage T7 gene 6 exonuclease. T7 gene 6 exonuclease is a double-strand-specific 5'-3' exonuclease that removes mononucleotides from both 5' termini of the two strands of linear DNA.
Appropriate reaction conditions, following published protocols and/or protocols provided by the supplier of this commercially available enzyme, are used to generate 3' cohesive ends on the double-stranded DNAs and double-stranded DNA fragments [Roberts et al., Biochemistry 21(23):6000-5 (1982); Brantley and Beer, Gene Anal. Tech. 6(4):75-8 (1989)]. The reaction conditions for the exonuclease treatment of double-stranded DNA and/or double-stranded DNA fragments can be varied to allow the generation of single-stranded ends of a defined length. Single-stranded ends of various lengths can be generated by changing one or more of the following reaction conditions: (i) concentration of exonuclease, (ii) ratio of exonuclease to substrate, (iii) incubation time, (iv) incubation temperature, (v) salt concentration.
In one embodiment, a single-stranded end comprises 2-300 nucleotides, that is, 2-300 nucleotides of the opposite strand are successively removed by the exonuclease. More preferably, a single-stranded end comprises 2-100 nucleotides, more preferably 2-60, 2-40 or 2-20 nucleotides, and more preferably, a single-stranded end or cohesive end comprises 2-10 nucleotides, most preferably about 10.
By increasing the length of the single-stranded ends, one can successfully recombine and assemble sequences of increasing length. A given cohesive end comprising, e.g., 10 nucleotides statistically occurs only about once in a million (4 to the power 10 is 1,048,576 possible sequences), thereby making the method of the invention suitable for the recombination of entire chromosomes.
Assembly and recombination of DNA fragments according to the method of the invention is preferably based upon complementarity of overhanging single-stranded ends that are generated on said double-stranded DNA fragments. In one embodiment, complementation does not need to be exact for hybridization. The terms "complementary" or "complementarity", or grammatical equivalents thereof, as used herein, refer to the natural binding of polynucleotides under permissive salt and temperature conditions by base-pairing. Complementarity between two single-stranded nucleic acids may be "partial", in which only some of the nucleic acid bases bind, or it may be complete, when total complementarity exists between the single-stranded nucleic acids.
The invention provides a method for the generation of recombined double-stranded DNAs. Preferably, the recombined nucleic acid has a sequence which is different from an initial DNA molecule prior to recombination. The sequence may differ by having at least one section of sequence replaced by a fragment of a variant or homolog in accordance with the methods provided herein. Alternatively, the sequence may differ by having sections within one DNA molecule rearranged in a different order. Preferably, the product encoded by the recombined nucleic acid retains the function of the wild type protein, such as catalytic activity, but has an altered property such as further discussed below. A chimeric nucleic acid or protein as used herein refers to any sequence which has been manipulated to contain at least a portion of another molecule, ranging from at least one residue to as many as all but one residue of that molecule. Generally, at least one random fragment has been incorporated. The term "assembly", or grammatical equivalents thereof, herein means combining one or more nucleic acid molecules to form one contiguous nucleic acid molecule. "Recombination", or forming a "recombined" nucleic acid, is generally the reassortment of sections of nucleic acid sequences between one, or preferably at least two, nucleic acid molecules having sequences which differ from each other.
Assembly is based on the annealing of cohesive ends between nucleic acid molecules. The cohesive ends anneal based on substantial complementation. The term substantial complementation means that the ends have at least partial to complete complementation such that they can anneal under the selected reaction conditions. For example, the reaction conditions may include further polymerase or ligation reactions. Generally, such reactions can be adjusted to favor hybridization and do not require temperature conditions such as those required in PCR.
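The notion of complete versus partial complementation between two cohesive ends may be sketched as follows. This Python example is illustrative only: both ends are written 5' to 3', only ends of equal length are compared, and the parameter `max_mismatches` is a hypothetical stand-in for whatever degree of partial complementation the selected reaction conditions tolerate. The function names are likewise assumptions of this sketch.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def can_anneal(end_a, end_b, max_mismatches=0):
    """True if two single-stranded ends (both written 5'->3') can base-pair
    over their full length with at most `max_mismatches` mispaired positions."""
    if len(end_a) != len(end_b):
        return False
    target = reverse_complement(end_a)
    mismatches = sum(x != y for x, y in zip(target, end_b.upper()))
    return mismatches <= max_mismatches


exact = can_anneal("AACG", "CGTT")                       # complete complementation
partial = can_anneal("AACG", "CGTA", max_mismatches=1)   # one mispaired position tolerated
```

With `max_mismatches=0` only completely complementary ends are accepted; raising the value models less stringent conditions.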
Generally, the methods of the invention are useful for the generation of novel chimeric polynucleotides. These novel polynucleotides may encode useful proteins, such as novel receptors, ligands, antibodies and enzymes. These novel polynucleotides may also comprise hybrid nucleic acids, wherein, for example, 5' untranslated regions of genes, 3' untranslated regions of genes, introns, exons, promoter regions, enhancer regions and other regulatory sequences for gene expression, such as dominant control regions, are recombined. The double-stranded DNA fragments comprising single-stranded ends are assembled based on the complementarity of their respective cohesive ends. The lowest level of identity required for recombination of homologous sequences is determined by the size of the cohesive ends. Identity can range from 1 to 8 nucleotides (1, 2, 3, 4, 5, 6, 7 or 8) and assists in the assembly and recombination of sequences. Cohesive ends of 1-8 nucleotides are provided on DNA fragments treated with type IIs restriction endonucleases. For assembly of genes, large gene fragments, viral genomes or entire operons, DNA fragments comprising longer cohesive ends are preferred. DNA fragments with longer cohesive ends are preferably generated using exonuclease treatment of DNA.
Thus, the methods of the invention provide for the recombination of DNA fragments ranging from 50-100 bp to several Mbp. The methods of the invention are particularly useful for the recombination of very large DNA sequences, when conventional cloning protocols fail. Assembly of large DNA sequences using, e.g., the method wherein the cohesive ends are generated by exonuclease can substitute for restriction endonuclease based DNA manipulations, for example, when these large DNA sequences lack convenient or any restriction sites and PCR assembly becomes very inefficient.
Assembly occurs by contacting a single-stranded end on a first double-stranded DNA fragment with a single-stranded end on a second double-stranded DNA fragment. As discussed above, the assembly can also be done using double-stranded DNA with cohesive ends that have not been fragmented. Only those cohesive ends having regions of at least some homology with other cohesive ends will assemble into a recombined nucleic acid. In a preferred embodiment, the assembly step is performed in a reaction mixture which only includes randomly fragmented DNA. Preferably, the assembly step in making the recombined nucleic acid does not contain extraneous DNA such as vector DNA, particularly DNA that has been cut at unique restriction sites. Extraneous DNA can hinder the assembly of randomly fragmented DNA with other randomly fragmented DNA. In one embodiment herein, the methods include the identification of linear recombined nucleic acids comprising assembled randomly fragmented pieces of double-stranded DNA, and which exclude other types of DNA. However, as further discussed below, once the recombined nucleic acid is formed, it can be cloned into a vector or amplified, etc.
Alternatively, non-randomly fragmented nucleic acids can be included in the reaction mixture with the randomly fragmented nucleic acids. Preferably, in this embodiment, two or more fragments are assembled, or preferably at least one gene is assembled or recombined. Furthermore, the method comprises identifying the recombined nucleic acids which have multiple random fragments assembled together or contiguously. Preferably, the recombined nucleic acids having more than one random fragment assembled to another random fragment are separated from other assembly products and further utilized. Such utilizations include the production of recombined proteins and their use in screening assays. Preferably, such embodiments include an increase in the ratio of randomly fragmented DNA to non-randomly fragmented DNA or vector, so as to increase the likelihood that two or more inserts will assemble with each other or contiguously.
The assembly reaction does not require any prior denaturation of the population of DNA fragments, as used by current methods known to the skilled artisan. The assembly reaction can proceed at temperatures at which the DNA fragments remain double-stranded, i.e., are not separated into two strands. In one embodiment, the assembly is performed at about 37°C. In one aspect of this embodiment, assembly is performed at temperatures lower than 37°C.
In one embodiment, the assembly reaction is accelerated by the addition of a volume excluder, such as polyethylene glycol (PEG) or other volume excluders known in the art. The concentration of PEG is preferably from 0% to about 30%, more preferably from about 5% to about 20% and most preferably from about 10% to about 15%.
In another embodiment, the assembly reaction is accelerated by the addition of a salt, including, but not limited to, sodium chloride, potassium chloride or ammonium sulphate. The salt concentration is from 0 mM to about 2500 mM, preferably from about 0 mM to 250 mM, more preferably from about 10 mM to about 200 mM and most preferably from about 10 mM to about 100 mM.
In one embodiment, the recombined nucleic acid comprising a gap is incubated with DNA polymerase and dNTPs (i.e., dATP, dCTP, dGTP, dTTP) under conditions sufficient for closing the gap, that is, the missing nucleotides are synthesized by the DNA polymerase. The DNA polymerase which can be employed herein may be any enzyme known in the art that can catalyze a DNA chain extending reaction using as a template the sequence of an existing strand. Such DNA polymerases include, but are not limited to, DNA polymerase I (Kornberg polymerase), DNA polymerase I (Klenow fragment), T4 DNA polymerase, T7 DNA polymerase, Taq DNA polymerase, micrococcal DNA polymerase, etc. After DNA synthesis, the remaining nick may be treated with DNA ligase using standard methods. DNA polymerase and DNA ligase may be added simultaneously.
The steps of the methods provided herein may constitute a cycle which favors direction toward desirable mutations leading to desirable traits or phenotypes. The recombined nucleic acid may be cloned into a vector, propagated and screened for a species or first subpopulation with a desired property. This results in the identification and isolation of, or enrichment for, a recombined nucleic acid encoding a polypeptide that has acquired a desired property.
According to the present invention, at least two nucleic acid sequences are recombined at the same time. However, preferably any number of different nucleic acids may be assembled or recombined at the same time. This is advantageous because a large number of different variants can be made rapidly without iterative procedures.
In another embodiment of the invention, after completing this cycle once, the steps of the method are repeated at least one time using the newly generated first subpopulation of chimeric DNAs as the starting material. Preferably the cycle is repeated at least 2 times, more preferably up to 5 times, more preferably up to 10 times, and most preferably up to 100 times or more.
In another embodiment, the chimeric DNA or the full-length gene generated according to one of the methods described above is amplified. The terms "amplification" or "amplify" or grammatical equivalents thereof, as used herein, refer to the production of additional copies of a nucleic acid sequence, which is generally carried out using the polymerase chain reaction (PCR). PCR technologies are well known in the art (e.g., see Dieffenbach and Dveksler in PCR Primer, A Laboratory Manual, Cold Spring Harbor Press, Princeton, N.Y.).
In one aspect of this embodiment, the first subpopulation of chimeric DNA is subjected to reiterated assembly without prior cloning into a vector, propagation or screening to identify a species with a desired property.
In another aspect of this embodiment, the first subpopulation of chimeric DNA is cloned into a vector, propagated and screened to identify a species or first subpopulation with a desired property prior to subjecting the first subpopulation to reiterated assembly or recombination. After the second round of assembly, a second subpopulation is obtained that may be screened for the same property or for a different property.
As outlined above, the invention provides chimeric DNAs and chimeric DNAs encoding variant polypeptides. The chimeric DNA and the variant polypeptide preferably have at least one property which differs from the same property of the corresponding naturally occurring polynucleotide or corresponding naturally occurring polypeptide. The property of the chimeric DNA or of the variant polypeptide is the result of assembly according to the present invention or assembly and prior mutagenesis.
The term "property" or grammatical equivalents thereof in the context of a nucleic acid, as used herein, refers to any characteristic or attribute of a polynucleotide that can be selected or detected. These properties include, but are not limited to, a property affecting binding to a polypeptide, a property conferred on a cell comprising a particular polynucleotide, a property affecting gene transcription (e.g., promoter strength, promoter recognition, promoter regulation, enhancer function), a property affecting RNA processing (e.g., RNA splicing, RNA stability, RNA conformation, and post-transcriptional modification), a property affecting translation (e.g., level, regulation, binding of mRNA to ribosomal proteins, post-translational modification).
The term "property" or grammatical equivalents thereof in the context of a polypeptide, as used herein, refers to any characteristic or attribute of a polypeptide that can be selected or detected. These properties include, but are not limited to, oxidative stability, substrate specificity, catalytic activity, thermal stability, alkaline stability, pH activity profile, resistance to proteolytic degradation, Km, kcat, Km/kcat ratio, protein folding, inducing an immune response, ability to bind to a ligand, ability to bind to a receptor, ability to be secreted, ability to be displayed on the surface of a cell, ability to oligomerize, ability to signal, ability to stimulate cell proliferation, ability to inhibit cell proliferation, ability to induce apoptosis, ability to be modified by phosphorylation or glycosylation, ability to treat disease.
As used herein, the term "screening" has its usual meaning in the art and is, in general, a multi-step process. In the first step, a recombined nucleic acid or variant polypeptide is provided. In the second step, a property of the nucleic acid or variant polypeptide is determined. In the third step, the determined property is compared to a property of the corresponding naturally occurring polynucleotide, to the property of the corresponding naturally occurring polypeptide or to the property of the starting material for the generation of the recombined nucleic acid. The latter may be a synthetic DNA.
It will be apparent to the skilled artisan that the screening for a differing or altered property depends entirely upon the property of the starting material for the generation of the chimeric DNA. The skilled artisan will therefore appreciate that the invention is not limited to any specific property to be screened for and that the following description of properties lists illustrative examples only. Unless otherwise specified, a change in any of the above-listed properties, when comparing the property of a recombined nucleic acid or protein to the property of a naturally occurring nucleic acid or naturally occurring protein, is preferably at least a 20%, more preferably at least a 50%, and more preferably at least a 2-fold increase or decrease. Generally, any change which can be detected is considered as a change in property.
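The preferred thresholds above (at least 20%, at least 50%, at least 2-fold) can be applied to a pair of measured values as sketched below. This Python example is illustrative only: the function name `classify_change`, the tier labels and the convention of expressing the percentage relative to the reference value are assumptions of this sketch, and both values are assumed to be positive measurements in the same units.

```python
def classify_change(reference, variant):
    """Place the change from `reference` to `variant` into the highest
    preference tier it satisfies (2-fold, 50%, 20%, or merely detectable)."""
    if reference <= 0 or variant <= 0:
        raise ValueError("both values must be positive")
    direction = "increase" if variant > reference else "decrease"
    fold = max(variant / reference, reference / variant)
    percent = abs(variant - reference) / reference * 100
    if fold >= 2:
        tier = "at least 2-fold"
    elif percent >= 50:
        tier = "at least 50%"
    elif percent >= 20:
        tier = "at least 20%"
    elif percent > 0:
        tier = "detectable only"
    else:
        return "no change"
    return f"{tier} {direction}"


# Hypothetical measurements in arbitrary units.
examples = [classify_change(100, v) for v in (250, 150, 120, 110, 50)]
```

Note that a halving of the property (100 to 50) is classified as a 2-fold decrease under this convention.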
A change in substrate specificity is defined as a difference between the kcat/Km ratio of the naturally occurring protein and that of the variant thereof. The kcat/Km ratio is generally a measure of catalytic efficiency. Generally, the objective will be to generate variants of naturally occurring proteins with a greater (numerically larger) kcat/Km ratio for a given substrate when compared to that of the naturally occurring protein, thereby enabling the use of the protein to act more efficiently on a target substrate. However, it may be desirable to decrease efficiency. An increase in kcat/Km ratio for one substrate may be accompanied by a reduction in kcat/Km ratio for another substrate. This is a shift in substrate specificity, and variants of naturally occurring proteins exhibiting such shifts have utility where the naturally occurring protein is undesirable, e.g., to prevent undesired hydrolysis of a particular substrate in an admixture of substrates. Km and kcat are measured in accordance with known procedures.
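A shift in substrate specificity as defined above can be illustrated by comparing kcat/Km ratios substrate by substrate. In the following Python sketch the kinetic constants, the substrate labels "A" and "B" and the function names are hypothetical values and names chosen for illustration; they are not measured data.

```python
def catalytic_efficiency(kcat, km):
    """kcat/Km ratio, e.g. in 1/(M*s) when kcat is in 1/s and Km in M."""
    return kcat / km


def efficiency_ratios(natural, variant):
    """For each substrate, the variant's kcat/Km divided by that of the
    naturally occurring protein. Inputs map substrate -> (kcat, Km)."""
    return {
        substrate: catalytic_efficiency(*variant[substrate])
        / catalytic_efficiency(*natural[substrate])
        for substrate in natural
    }


# Hypothetical kinetic constants (kcat, Km) for two substrates.
natural_protein = {"A": (10.0, 2.0), "B": (8.0, 4.0)}
variant_protein = {"A": (30.0, 2.0), "B": (4.0, 4.0)}
ratios = efficiency_ratios(natural_protein, variant_protein)
```

In this example the variant is three times more efficient on substrate A and half as efficient on substrate B, which is the kind of shift in substrate specificity described above.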
A change in oxidative stability can be evidenced, for example, by at least about 20%, more preferably at least 50% increase of enzyme activity when exposed to various oxidizing conditions. Such oxidizing conditions include, but are not limited to exposure of the protein to the organic oxidant diperdodecanoic acid (DPDA). Oxidative stability is measured by known procedures.
A change in alkaline stability is evidenced by at least about a 5% or greater increase or decrease (preferably increase) in the half life of the enzymatic activity of a variant of a naturally occurring protein when compared to that of the naturally occurring protein. In the case of e.g., subtilisins, alkaline stability can be measured as a function of autoproteolytic degradation of subtilisin at alkaline pH, e.g., 0.1 M sodium phosphate, pH 12 at 25°C or 30°C. Generally, alkaline stability is measured by known procedures.
A change in thermal stability is evidenced by at least about a 5% or greater increase or decrease (preferably increase) in the half life of the catalytic activity of a variant of naturally occurring protein when exposed to a relatively high temperature and neutral pH as compared to that of the naturally occurring protein. In the case of e.g., subtilisins, thermal stability can be measured as a function of autoproteolytic degradation of subtilisin at elevated temperatures and neutral pH, e.g., 2mM calcium chloride, 50 mM MOPS, pH 7.0 at 59°C. Generally, thermal stability is measured by known procedures.
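The half-life comparisons used above for alkaline and thermal stability can be sketched as follows. The sketch assumes that residual activity decays with first-order kinetics, which is an assumption made here for illustration and is not stated in the description; the function names are likewise hypothetical.

```python
import math


def half_life(initial_activity: float, residual_activity: float, elapsed: float) -> float:
    """Estimate the half-life from two activity measurements.

    Assumes first-order decay, A(t) = A0 * exp(-k * t), so that
    k = ln(A0 / A(t)) / t and t_half = ln(2) / k.
    """
    if not (0 < residual_activity < initial_activity) or elapsed <= 0:
        raise ValueError("need 0 < residual < initial and a positive elapsed time")
    rate_constant = math.log(initial_activity / residual_activity) / elapsed
    return math.log(2) / rate_constant


def percent_change(parent_half_life: float, variant_half_life: float) -> float:
    """Percent change in half-life of the variant relative to the parent protein."""
    return 100.0 * (variant_half_life - parent_half_life) / parent_half_life


if __name__ == "__main__":
    # Illustrative numbers only: activity in arbitrary units, time in minutes.
    parent = half_life(100.0, 50.0, 10.0)
    variant = half_life(100.0, 50.0, 12.0)
    print(percent_change(parent, variant))
```

A variant would meet the threshold given above when `percent_change` returns a magnitude of about 5 or more.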
Receptor variants, for example, are experimentally tested and validated in in vivo and in vitro assays. Suitable assays include, but are not limited to, examining their binding affinity to natural ligands and to high affinity agonists and/or antagonists. In addition to cell-free biochemical affinity tests, quantitative comparisons are made of the kinetic and equilibrium binding constants for the natural ligand to the naturally occurring receptor and to the receptor variants. The kinetic association rate (Kon) and dissociation rate (Koff), and the equilibrium binding constants (Kd) can be determined using surface plasmon resonance on a BIAcore instrument following the standard procedure in the literature [Pearce et al., Biochemistry 38:81-89 (1999)]. For most receptors described herein, the binding constant between a natural ligand and its corresponding naturally occurring receptor is well documented in the literature. Comparisons with the corresponding naturally occurring receptors are made in order to evaluate the sensitivity and specificity of the receptor variants. Preferably, binding affinity to natural ligands and agonists is expected to increase relative to the naturally occurring receptor, while antagonist affinity should decrease. Receptor variants with higher affinity to antagonists relative to the non-naturally occurring receptors may also be generated by the methods of the invention.
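The relationship between the kinetic rates and the equilibrium binding constant referred to above can be sketched as follows. The sketch assumes a simple 1:1 binding model, in which Kd = Koff/Kon; this model and the function names are assumptions made for illustration and are not prescribed by the description.

```python
def equilibrium_kd(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant for a 1:1 interaction.

    k_on is the association rate (assumed units M^-1 s^-1) and k_off the
    dissociation rate (assumed units s^-1), so Kd is returned in M.
    """
    if k_on <= 0 or k_off < 0:
        raise ValueError("k_on must be positive and k_off non-negative")
    return k_off / k_on


def fold_affinity_change(kd_parent: float, kd_variant: float) -> float:
    """Fold change in affinity; above 1 means the variant binds more tightly."""
    return kd_parent / kd_variant


if __name__ == "__main__":
    # Illustrative numbers only, not measured data.
    kd_parent = equilibrium_kd(k_on=1e5, k_off=1e-3)
    kd_variant = equilibrium_kd(k_on=1e5, k_off=1e-4)
    print(fold_affinity_change(kd_parent, kd_variant))
```

A lower Kd corresponds to higher affinity, so a variant with increased affinity for its natural binding partner gives a fold change greater than 1.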
Similarly, ligand variants, for example, are experimentally tested and validated in in vivo and in vitro assays. Suitable assays include, but are not limited to, examining their binding affinity to natural receptors and to high affinity agonists and/or antagonists. In addition to cell-free biochemical affinity tests, quantitative comparisons are made of the kinetic and equilibrium binding constants for the natural receptor to the naturally occurring ligand and to the ligand variants. The kinetic association rate (Kon) and dissociation rate (Koff), and the equilibrium binding constants (Kd) can be determined using surface plasmon resonance on a BIAcore instrument following the standard procedure in the literature [Pearce et al., Biochemistry 38:81-89 (1999)]. For most ligands described herein, the binding constant between a natural receptor and its corresponding naturally occurring ligand is well documented in the literature. Comparisons with the corresponding naturally occurring ligands are made in order to evaluate the sensitivity and specificity of the ligand variants. Preferably, binding affinity to natural receptors and agonists is expected to increase relative to the naturally occurring ligand, while antagonist affinity should decrease. Ligand variants with higher affinity to antagonists relative to the non-naturally occurring ligands may also be generated by the methods of the invention.
The methods of the invention are also useful for the specific deletion of a gene or nucleic acid. In this embodiment, a DNA comprising a gene of interest that will be deleted is provided. The gene may be flanked by known nucleotide sequences. According to the methods of the invention, the DNA comprising the gene of interest is randomly fragmented. Preferably, the gene encodes a full length protein, or a desired segment. As discussed above, single-stranded overhanging ends can be generated.
In one embodiment, at least three fragments are generated, comprising the gene and nucleic acids on opposing flanking ends of said gene. Preferably the gene is randomly fragmented into at least two fragments, preferably more. Some of the DNA fragments will have cohesive ends corresponding to the sequences that flank the gene of interest. Two oligonucleotides with overlapping, complementary sequences are designed and synthesized. When annealed to each other, these oligonucleotides comprise cohesive ends corresponding to the sequences that flank the gene of interest. The same sequences will be represented in the population of DNA fragments generated from the starting DNA. The sequences of the oligonucleotides that hybridize to each other and form a double-stranded region may be completely random or may comprise a specific sequence which is not the gene or nucleic acid to be deleted. The oligonucleotides, separately or annealed, are added to the DNA fragments and the mix is treated with DNA ligase and optionally with DNA polymerase. Some assembled chimeric DNAs will have flanking sequences joined by the sequences incorporated into the oligonucleotides. The gene of interest is deleted.
The methods of the invention are also useful for the inversion of a gene or nucleic acid relative to surrounding genes or sequences. In this embodiment, a DNA comprising a gene of interest that will be inverted is provided. The gene is flanked by known nucleotide sequences. The gene of interest is amplified by PCR using primers that comprise additional nucleotides, for example at the 5' end, and correspond to the gene flanking sequences. The "additional" nucleotides refer to the nucleotides that do not anneal to the template, but rather overhang. Generally, the additional nucleotides are as long as preferred for a cohesive end as discussed above. Primers are known in the art, and generally are at least 7 nucleotides long. Preferably, the nucleotides are not natural extensions such that the nucleic acids would be assembled back into the initial sequence. Natural extensions refer to extending the sequence, such as a gene, to have its flanking sequence in the order and orientation that would be found prior to manipulation. Rather, the additional nucleic acids, for example at the 5' end, will be complementary to, for example, the 3' end of the flanking sequence. According to the methods of the invention, the original DNA comprising the gene of interest is fragmented, preferably randomly, and treated so as to generate single-stranded overhanging ends. The amplified DNA fragment comprising the gene of interest is also treated to provide single-stranded ends. Some of the single-stranded ends generated on the fragments derived from the original DNA and those generated on the amplified gene fragment are complementary to each other. The fragments are mixed, annealed and ligated. Some assembled chimeric DNAs will have the gene of interest in inverted orientation.
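The primer design for the inversion embodiment can be illustrated in silico with a minimal sketch. The function names, the tail and annealing lengths, and the example sequences are all hypothetical and chosen only to keep the illustration short (the description indicates that primers are generally at least 7 nucleotides long); the sketch models sequences as plain strings and does not simulate fragmentation, exonuclease treatment, or ligation.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def inversion_primers(left: str, gene: str, right: str,
                      tail: int = 4, anneal: int = 6) -> tuple[str, str]:
    """Design primers whose 5' tails are not natural extensions of the gene.

    The forward primer carries a tail complementary to the start of the
    right flank; the reverse primer carries a tail matching the end of the
    left flank, so the amplicon fits between the flanks only when inverted.
    """
    forward = revcomp(right[:tail]) + gene[:anneal]
    reverse = left[-tail:] + revcomp(gene[-anneal:])
    return forward, reverse


def amplicon(gene: str, forward: str, reverse: str, anneal: int = 6) -> str:
    """Top strand of the PCR product: forward tail + gene + revcomp(reverse tail)."""
    return forward[:-anneal] + gene + revcomp(reverse[:-anneal])


def inverted_locus(left: str, gene: str, right: str) -> str:
    """Expected assembled sequence with the gene in inverted orientation."""
    return left + revcomp(gene) + right


if __name__ == "__main__":
    left, gene, right = "AAAACCGT", "ATGGCTTAA", "GACTTTTT"
    fwd, rev = inversion_primers(left, gene, right)
    product = amplicon(gene, fwd, rev)
    # Read on its other strand, the amplicon carries the flank-matching ends.
    print(revcomp(product))
    print(inverted_locus(left, gene, right))
```

Read on its opposite strand, the amplicon begins with the end of the left flank and ends with the start of the right flank, which is why annealing and ligation to the flanking fragments can yield the inverted arrangement rather than the initial sequence.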
The methods of the invention are also useful for the insertion of a gene or a nucleic acid into a known or unknown sequence. In this embodiment, a DNA comprising a known sequence into which a gene of interest will be inserted is provided. The gene of interest is amplified by PCR using primers comprising additional nucleotides, for example, on their 5' ends that are complementary to the insertion sequence. According to the methods of the invention, the original DNA comprising the gene of interest is fragmented and treated with exonuclease to provide single-stranded overhanging ends (e.g., 3'), thereby providing at least two DNA fragments, 'I' and 'II'. The amplified DNA fragment comprising the gene of interest is also treated with exonuclease to provide single-stranded ends. Some of the single-stranded ends generated on the fragments derived from the original DNA and those generated on the amplified gene fragment are complementary to each other. The fragments are mixed, annealed and ligated. Some assembled chimeric DNAs will have the gene of interest inserted into the specified sequence. In one embodiment, the additional nucleotides added to the PCR primers are random and as such the amplified gene can be inserted into any site.
In one embodiment of the invention, at least one double-stranded DNA molecule encodes a protein.
By "protein" herein is meant at least two covalently attached amino acids, which includes proteins, polypeptides, oligopeptides and peptides. The protein may be a naturally occurring protein, a variant of a naturally occurring protein or a synthetic protein. The protein may be made up of naturally occurring amino acids and peptide bonds, or synthetic peptidomimetic structures, generally depending on the method of synthesis. Thus "amino acid", or "peptide residue", as used herein means both naturally occurring and synthetic amino acids. For example, homo-phenylalanine, citrulline and norleucine are considered amino acids for the purposes of the invention. "Amino acid" also includes imino acid residues such as proline and hydroxyproline. The side chains may be in either the (R) or the (S) configuration. In the preferred embodiment, the amino acids are in the (S) or L-configuration. Stereoisomers of the twenty conventional amino acids, unnatural amino acids such as α,α-disubstituted amino acids, N-alkyl amino acids, lactic acid, and other unconventional amino acids may also be suitable components for proteins of the present invention. Examples of unconventional amino acids include, but are not limited to: 4-hydroxyproline, γ-carboxyglutamate, ε-N,N,N-trimethyllysine, ε-N-acetyllysine, O-phosphoserine, N-acetylserine, N-formylmethionine, 3-methylhistidine, 5-hydroxylysine, ω-N-methylarginine, and other similar amino acids and imino acids. If non-naturally occurring side chains are used, non-amino acid substituents may be used, for example to prevent or retard in vivo degradations. Proteins including non-naturally occurring amino acids may be synthesized or in some cases, made recombinedly; see van Hest et al., FEBS Lett. 428:(1-2) 68-70 (1998); and Tang et al., Abstr. Pap. Am. Chem. S218:U138-U138 Part 2 (1999), both of which are expressly incorporated by reference herein.
A "recombined protein" or "variant protein", as outlined further below, or grammatical equivalents thereof, as used herein, refer to a protein made using recombined techniques, i.e. through the expression of a recombined nucleic acid or chimeric nucleic acid as depicted above. A recombined or variant protein is distinguished from a naturally occurring protein by at least one or more characteristics. For example, the recombined or variant protein may be isolated or purified away from some or all of the proteins and compounds with which it is normally associated in its wild type host, and thus may be substantially pure. For example, an isolated recombined or variant protein is unaccompanied by at least some of the material with which it is normally associated in its natural state, preferably constituting at least about 0.5%, more preferably at least about 5% by weight of the total protein in a given sample. A substantially pure protein comprises at least about 75% by weight of the total protein, with at least about 80% being preferred, and at least about 90% being particularly preferred. As used herein, "substantially pure" means an object species (such as a protein or nucleic acid) is the predominant species present (i.e., on a molar basis it is more abundant than any other individual species in a composition), and preferably a substantially purified fraction is a composition, wherein the object species comprises at least about 50% (on a molar basis) of all macromolecular species present. Generally, a substantially pure composition will comprise more than about 80 to 90 percent of all macromolecular species present in the composition. Isolated nucleic acids and proteins are those taken from their native environment. 
Most preferably, the object species is purified to essential homogeneity (macromolecular contaminant species cannot be detected in the composition by conventional detection methods), wherein the composition consists essentially of a single macromolecular species.
Included within this definition are proteins whose amino acid sequence is altered by one or more amino acids when compared to the sequence of a naturally occurring protein. The definition also includes the production of a protein from one organism in a different organism or host cell. Alternatively, the recombined or variant protein may be made at a significantly higher concentration than is normally seen, through the use of an inducible promoter or high expression promoter, such that the recombined or variant protein is made at increased concentration levels. Furthermore, all of the recombined or variant proteins outlined herein are in a form not normally found in nature, as they may contain amino acid substitutions, insertions and deletions, with substitutions being preferred.
The nucleic acids may be from any number of eukaryotic or prokaryotic organisms or from archaebacteria. Particularly preferred are nucleic acids from mammals. Suitable mammals include, but are not limited to, rodents (rats, mice, hamsters, guinea pigs, etc.), primates, farm animals (including sheep, goats, pigs, cows, horses, etc.) and in the most preferred embodiment, humans. Other suitable examples of eukaryotic organisms include plant cells, such as maize, rice, wheat, cotton, soybean, sugarcane, tobacco, and Arabidopsis; fish, algae, yeast, such as Saccharomyces cerevisiae; Aspergillus and other filamentous fungi; and tissue culture cells from avian or mammalian origins. Also preferred are nucleic acids from prokaryotic organisms. Suitable examples of prokaryotic organisms include gram negative organisms and gram positive organisms. Specifically included are Enterobacteriaceae bacteria, Pseudomonas, Micrococcus, corynebacteria, Bacillus, lactobacilli, Streptomyces, and Agrobacterium. Polynucleotides encoding proteins and enzymes isolated from extremophilic organisms, including, but not limited to, hyperthermophiles, psychrophiles, psychrotrophs, halophiles, barophiles and acidophiles, are particularly preferred. Such enzymes may function at temperatures above 100°C in terrestrial hot springs and deep sea thermal vents, at temperatures below 0°C in arctic waters, in the saturated salt environment of the Dead Sea, at pH values around 0 in coal deposits and geothermal sulfur-rich springs, or at pH values greater than 11 in sewage sludge.
The proteins can be intracellular proteins, extracellular proteins, secreted proteins, enzymes, ligands, receptors, antibodies or portions thereof. In a preferred embodiment of the invention, the first double-stranded DNA encodes all or a portion of an enzyme. By "enzyme" herein is meant any of a group of proteins that catalyzes a chemical reaction.
Enzymes include, but are not limited to, (i) oxidoreductases; (ii) transferases, comprising transferases transferring one-carbon groups (e.g., methyltransferases, hydroxymethyl-, formyl-, and related transferases, carboxyl- and carbamoyltransferases, amidinotransferases), transferases transferring aldehydic or ketonic residues, acyltransferases (e.g., acyltransferases, aminoacyltransferases), glycosyltransferases (e.g., hexosyltransferases, pentosyltransferases), transferases transferring alkyl or related groups, transferases transferring nitrogenous groups (e.g., aminotransferases, oximinotransferases), transferases transferring phosphorus-containing groups (e.g., phosphotransferases, pyrophosphotransferases, nucleotidyltransferases), transferases transferring sulfur-containing groups (e.g., sulfurtransferases, sulfotransferases, CoA-transferases); (iii) hydrolases comprising hydrolases acting on ester bonds (e.g., carboxylic ester hydrolases, thioester hydrolases, phosphoric monoester hydrolases, phosphoric diester hydrolases, triphosphoric monoester hydrolases, sulfuric ester hydrolases), hydrolases acting on glycosyl compounds (e.g., glycoside hydrolases, hydrolyzing N-glycosyl compounds, hydrolyzing S-glycosyl compounds), hydrolases acting on ether bonds (e.g., thioether hydrolases), hydrolases acting on peptide bonds (e.g., α-aminoacyl-peptide hydrolases, peptidyl-amino acid hydrolases, dipeptide hydrolases, peptidyl-peptide hydrolases), hydrolases acting on C-N bonds other than peptide bonds, hydrolases acting on acid-anhydride bonds, hydrolases acting on C-C bonds, hydrolases acting on halide bonds, hydrolases acting on P-N bonds; (iv) lyases comprising carbon-carbon lyases (e.g., carboxy-lyases, aldehyde-lyases, ketoacid-lyases), carbon-oxygen lyases (e.g., hydro-lyases, other carbon-oxygen lyases), carbon-nitrogen lyases (e.g., ammonia-lyases, amidine-lyases), carbon-sulfur lyases, carbon-halide lyases, other lyases; (v) isomerases comprising racemases and epimerases, cis-trans isomerases, intramolecular oxidoreductases, intramolecular transferases, intramolecular lyases, other isomerases; (vi) ligases or synthetases comprising ligases or synthetases forming C-O bonds, forming C-S bonds, forming C-N bonds, forming C-C bonds.
Particularly preferred are carbonyl hydrolases. Carbonyl hydrolases are enzymes that hydrolyze compounds comprising O=C-X bonds, wherein X is oxygen or nitrogen. They include hydrolases, e.g., lipases and peptide hydrolases, e.g., subtilisins or metalloproteases. Peptide hydrolases include α-aminoacylpeptide hydrolase, peptidylamino-acid hydrolase, acylamino hydrolase, serine carboxypeptidase, metallocarboxy-peptidase, thiol proteinase, carboxylproteinase and metalloproteinase. Serine, metallo, thiol and acid proteases are included, as well as endo and exo-proteases. In another preferred embodiment, the first and/or second double-stranded DNA encode a variant of an enzyme.
In another preferred embodiment of the invention, the first double-stranded DNA encodes all or a portion of a receptor. By "receptor" or grammatical equivalents herein is meant a proteinaceous molecule that has an affinity for a ligand. Examples of receptors include, but are not limited to antibodies, cell membrane receptors, complex carbohydrates and glycoproteins, enzymes, and hormone receptors.
Particularly preferred are cell-surface receptors. Cell-surface receptors appear to fall into two general classes: type 1 and type 2 receptors. Type 1 receptors have generally two identical subunits associated together, either covalently or otherwise. They are essentially preformed dimers, even in the absence of ligand. The type 1 receptors include the insulin receptor and the IGF (insulin like growth factor) receptor. The type 2 receptors, however, generally are in a monomeric form, and rely on binding of one ligand to each of two or more monomers, resulting in receptor oligomerization and receptor activation. Type 2 receptors include the growth hormone receptor, the leptin receptor, the LDL (low density lipoprotein) receptor, the GCSF (granulocyte colony stimulating factor) receptor, the interleukin receptors including IL-1, IL-2, IL-3, IL-4, IL-5, IL-6, IL-7, IL-8, IL-9, IL-11, IL-12, IL-13, IL-15, IL-17, etc., receptors, EGF (epidermal growth factor) receptor, EPO (erythropoietin) receptor, TPO (thrombopoietin) receptor, VEGF (vascular endothelial growth factor) receptor, PDGF (platelet derived growth factor; A chain and B chain) receptor, FGF (basic fibroblast growth factor) receptor, T-cell receptor, transferrin receptor, prolactin receptor, CNF (ciliary neurotrophic factor) receptor, TNF (tumor necrosis factor) receptor, Fas receptor, NGF (nerve growth factor) receptor, GM-CSF (granulocyte/macrophage colony stimulating factor) receptor, HGF (hepatocyte growth factor) receptor, LIF (leukemia inhibitory factor) receptor, TGFα/β (transforming growth factor α/β) receptor, MCP (monocyte chemoattractant protein) receptor and interferon receptors (α, β and γ). Further included are T cell receptors, MHC (major histocompatibility antigen) class I and class II receptors and receptors to the naturally occurring ligands listed below.
In another preferred embodiment, the first and/or second double-stranded DNA encode a variant of a receptor.
In one preferred embodiment of the invention, the first double-stranded DNA encodes all or a portion of a ligand. By "ligand" or grammatical equivalents herein is meant a proteinaceous molecule capable of binding to a receptor.
Ligands include, but are not limited to, cytokines IL-1ra, IL-1, IL-1a, IL-1b, IL-2, IL-3, IL-4, IL-5, IL-6, IL-8, IL-10, IFN-β, IFN-γ, IFN-α-2a; IFN-α-2B, TNF-α; CD40 ligand (chk), human obesity protein leptin, GCSF, BMP-7, CNF, GM-CSF, MCP-1, macrophage migration inhibitory factor, human glycosylation-inhibiting factor, human rantes, human macrophage inflammatory protein 1 β, hGH, LIF, human melanoma growth stimulatory activity, neutrophil activating peptide-2, CC-chemokine MCP-3, platelet factor M2, neutrophil activating peptide 2, eotaxin, stromal cell-derived factor-1, insulin, IGF-I, IGF-II, TGF-β1, TGF-β2, TGF-β3, TGF-α, VEGF, acidic-FGF, basic-FGF, EGF, NGF, BDNF (brain derived neurotrophic factor), CNF, PDGF, HGF, GCDNF (glial cell-derived neurotrophic factor), EPO, other extracellular signaling moieties, including, but not limited to, hedgehog Sonic, hedgehog Desert, hedgehog Indian, hCG; coagulation factors including, but not limited to, TPA and Factor VIIa.
In another preferred embodiment, the first and/or second double-stranded DNA encode a variant of a ligand.
In one preferred embodiment of the invention, the first double-stranded DNA encodes all or a portion of an antibody. The term "antibody" or grammatical equivalents, as used herein, refers to antibodies and antibody fragments that retain the ability to bind to the epitope that the intact antibody binds and includes polyclonal antibodies, monoclonal antibodies, chimeric antibodies, and anti-idiotype (anti-ID) antibodies. Preferably, the antibodies are monoclonal antibodies. Antibody fragments include, but are not limited to, the complementarity-determining regions (CDRs), single-chain variable fragments (scFv), heavy chain variable region (VH), and light chain variable region (VL).
In another preferred embodiment, the first and/or second double-stranded DNA encode a variant of an antibody. Information with respect to nucleic acid sequences and amino acid sequences for enzymes, receptors, ligands, and antibodies is readily available from numerous publications and several data bases, such as the one from the National Center for Biotechnology Information (NCBI).
Using the nucleic acids of the present invention which encode a variant protein, a variety of expression vectors are made. The expression vectors may be either self-replicating extrachromosomal vectors or vectors which integrate into a host genome. Generally, these expression vectors include transcriptional and translational regulatory nucleic acid operably linked to the nucleic acid encoding the variant protein. The term "control sequence" or grammatical equivalents thereof, as used herein, refers to DNA sequences necessary for the expression of an operably linked coding sequence in a particular host organism. The control sequences that are suitable for prokaryotes, for example, include a promoter, optionally an operator sequence, and a ribosome binding site. Eukaryotic cells are known to utilize promoters, polyadenylation signals, and enhancers. It is understood that when screening for a particular property, or an alteration in properties, the control can be a "ground zero" control. Alternatively, two proteins may be compared against one another, rather than a control.
In one embodiment of the invention the control sequences are generated by using the methods described herein.
Nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence. For example, DNA for a presequence or secretory leader is operably linked to DNA for a polypeptide if it is expressed as a preprotein that participates in the secretion of the polypeptide; a promoter or enhancer is operably linked to a coding sequence if it affects the transcription of the sequence; or a ribosome binding site is operably linked to a coding sequence if it is positioned so as to facilitate translation. Generally, "operably linked" means that the nucleic acid sequences being linked are contiguous, and, in the case of a secretory leader, contiguous and in reading phase. However, enhancers do not have to be contiguous. Linking is accomplished by ligation at convenient restriction sites. If such sites do not exist, synthetic oligonucleotide adaptors, linkers or the recombination methods of the herein described invention are used in accordance with conventional practice. The transcriptional and translational regulatory nucleic acid will generally be appropriate to the host cell used to express the fusion protein; for example, transcriptional and translational regulatory nucleic acid sequences from Bacillus are preferably used to express the fusion protein in Bacillus. Numerous types of appropriate expression vectors, and suitable regulatory sequences, are known in the art for a variety of host cells.
In one embodiment of the invention the control sequences are operably linked to another nucleic acid by using the methods described herein.
In a preferred embodiment, when a naturally occurring secretory sequence leads to a low level of secretion of a variant protein, a replacement of the naturally occurring secretory leader sequence is desired. In this embodiment, an unrelated secretory leader sequence is operably linked to a variant protein encoding nucleic acid, leading to increased protein secretion. Thus, any secretory leader sequence resulting in enhanced secretion of the variant protein, when compared to the secretion of the naturally occurring protein and its secretory sequence, is desired. Suitable secretory leader sequences that lead to the secretion of a protein are known in the art. In another preferred embodiment, a secretory leader sequence of a naturally occurring protein or a variant protein is removed by techniques known in the art and subsequent expression results in intracellular accumulation of the recombined protein.
In general, the transcriptional and translational regulatory sequences may include, but are not limited to, promoter sequences, ribosomal binding sites, transcriptional start and stop sequences, translational start and stop sequences, and enhancer or activator sequences. In a preferred embodiment, the regulatory sequences include a promoter and transcriptional start and stop sequences.
Promoter sequences encode either constitutive or inducible promoters. The promoters may be either naturally occurring promoters or hybrid promoters. Hybrid promoters, which combine elements of more than one promoter, are also known in the art, and are useful in the present invention. In a preferred embodiment, the promoters are strong promoters, allowing high expression in cells, particularly mammalian cells, such as the ST AT or CMV promoter, particularly in combination with a Tet regulatory element. In addition, the expression vector may comprise additional elements. For example, the expression vector may have two replication systems, thus allowing it to be maintained in two organisms, for example in mammalian or insect cells for expression and in a prokaryotic host for cloning and amplification. Furthermore, for integrating expression vectors, the expression vector contains at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct. The integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. Constructs for integrating vectors are well known in the art.
In addition, in a preferred embodiment, the expression vector contains a selectable marker gene to allow the selection of transformed host cells. Selection genes are well known in the art and will vary with the host cell used.
The nucleic acids are introduced into the cells, either alone or in combination with an expression vector. By "introduced into" or grammatical equivalents herein is meant that the nucleic acids enter the cells in a manner suitable for subsequent expression of the nucleic acid. The method of introduction is largely dictated by the targeted cell type, discussed below. Exemplary methods include CaPO4 precipitation, liposome fusion, lipofectin®, electroporation, viral infection, etc. The nucleic acids may stably integrate into the genome of the host cell, or may exist either transiently or stably in the cytoplasm (i.e. through the use of traditional plasmids, utilizing standard regulatory sequences, selection markers, etc.). The proteins of the present invention are produced by culturing a host cell transformed either with an expression vector containing nucleic acid encoding the protein or with the nucleic acid encoding the protein alone, under the appropriate conditions to induce or cause expression of the protein. The conditions appropriate for protein expression will vary with the choice of the expression vector and the host cell, and will be easily ascertained by one skilled in the art through routine experimentation. For example, the use of constitutive promoters in the expression vector will require optimizing the growth and proliferation of the host cell, while the use of an inducible promoter requires the appropriate growth conditions for induction. In addition, in some embodiments, the timing of the harvest is important. For example, the baculovirus used in insect cell expression systems is a lytic virus, and thus harvest time selection can be crucial for product yield.
Appropriate host cells include yeast, bacteria, archaebacteria, fungi, and insect and animal cells, including mammalian cells. Of particular interest are Drosophila melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, Pichia pastoris, etc.
In a preferred embodiment, the proteins are expressed in mammalian cells. Mammalian expression systems are also known in the art, and include retroviral systems. A mammalian promoter is any DNA sequence capable of binding mammalian RNA polymerase and initiating the downstream (3') transcription of a coding sequence for the fusion protein into mRNA. A promoter will have a transcription initiating region, which is usually placed proximal to the 5' end of the coding sequence, and a TATA box, usually located 25-30 base pairs upstream of the transcription initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at the correct site. A mammalian promoter will also contain an upstream promoter element (enhancer element), typically located within 100 to 200 base pairs upstream of the TATA box. An upstream promoter element determines the rate at which transcription is initiated and can act in either orientation. Of particular use as mammalian promoters are the promoters from mammalian viral genes, since the viral genes are often highly expressed and have a broad host range. Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late promoter, herpes simplex virus promoter, and the CMV promoter. Typically, transcription termination and polyadenylation sequences recognized by mammalian cells are regulatory regions located 3' to the translation stop codon and thus, together with the promoter elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage and polyadenylation. Examples of transcription terminator and polyadenylation signals include those derived from SV40.
The methods of introducing exogenous nucleic acid into mammalian hosts, as well as other hosts, are well known in the art, and will vary with the host cell used. Techniques include dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated transfection, protoplast fusion, electroporation, viral infection, encapsulation of the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei. As will be appreciated by those in the art, the type of mammalian cells used in the present invention can vary widely. Basically, any mammalian cells may be used, with mouse, rat, primate and human cells being particularly preferred, although as will be appreciated by those in the art, modifications of the system by pseudotyping allow all eukaryotic cells to be used, preferably higher eukaryotes. As is more fully described below, a screen can be set up such that the cells exhibit a selectable phenotype in the presence of a bioactive peptide. As is more fully described below, cell types implicated in a wide variety of disease conditions are particularly useful, so long as a suitable screen may be designed to allow the selection of cells that exhibit an altered phenotype as a consequence of the presence of a peptide within the cell.
Accordingly, suitable mammalian cell types include, but are not limited to, tumor cells of all types (particularly melanoma, myeloid leukemia, carcinomas of the lung, breast, ovaries, colon, kidney, prostate, pancreas and testes), cardiomyocytes, endothelial cells, epithelial cells, lymphocytes (T cells and B cells), mast cells, eosinophils, vascular intimal cells, hepatocytes, leukocytes including mononuclear leukocytes, stem cells such as haemopoietic, neural, skin, lung, kidney, liver and myocyte stem cells (for use in screening for differentiation and de-differentiation factors), osteoclasts, chondrocytes and other connective tissue cells, keratinocytes, melanocytes, liver cells, kidney cells, and adipocytes. Suitable cells also include known research cells, including, but not limited to, Jurkat T cells, NIH3T3 cells, CHO, COS, etc. See the ATCC cell line catalog, hereby expressly incorporated by reference.
In one embodiment, the cells may be additionally genetically engineered, that is, they contain exogenous nucleic acid other than the chimeric nucleic acid of the invention.
In a preferred embodiment, the proteins are expressed in bacterial systems. Bacterial expression systems are well known in the art.
A suitable bacterial promoter is any nucleic acid sequence capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of the coding sequence of the protein into mRNA. A bacterial promoter has a transcription initiation region which is usually placed proximal to the 5' end of the coding sequence. This transcription initiation region typically includes an RNA polymerase binding site and a transcription initiation site. Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences. Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose, lactose and maltose, and sequences derived from biosynthetic enzymes such as tryptophan. Promoters from bacteriophage may also be used and are known in the art. In addition, synthetic promoters and hybrid promoters are also useful; for example, the tac promoter is a hybrid of the trp and lac promoter sequences. Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate transcription.
In addition to a functioning promoter sequence, an efficient ribosome binding site is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD) sequence and includes an initiation codon and a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon.
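The spacing rule above (a sequence 3-9 nucleotides long located 3-11 nucleotides upstream of the initiation codon) can be checked with a short Python sketch. The consensus `AGGAGG`, the function name `find_shine_dalgarno`, and the reading of "3-11 nucleotides upstream" as the gap between the last base of the motif and the first base of the initiation codon are assumptions for illustration only.

```python
# Assumed E. coli Shine-Dalgarno consensus; its 6 nt fall within the 3-9 nt range.
SD_CONSENSUS = "AGGAGG"


def find_shine_dalgarno(seq, start_codon_index, consensus=SD_CONSENSUS, min_len=3):
    """Look for an SD-like motif upstream of the initiation codon.

    Returns (motif_start, motif, gap) for the longest consensus substring
    whose last base lies 3-11 nt upstream of start_codon_index, or None.
    """
    seq = seq.upper().replace("U", "T")   # accept RNA or DNA notation
    for length in range(len(consensus), min_len - 1, -1):   # longest first
        for i in range(len(consensus) - length + 1):
            motif = consensus[i:i + length]
            for gap in range(3, 12):                         # 3..11 nt spacing
                end = start_codon_index - gap                # one past motif end
                begin = end - length
                if begin >= 0 and seq[begin:end] == motif:
                    return begin, motif, gap
    return None


# Hypothetical leader: SD motif, 7-nt spacer, then the ATG initiation codon.
leader = "TTTT" + "AGGAGG" + "AAAAAAA" + "ATGGCT"
print(find_shine_dalgarno(leader, leader.index("ATG")))
```

Searching the longest substrings first means a full consensus match is preferred over a partial one when both fall inside the allowed spacing window.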
The expression vector may also include a signal peptide sequence that provides for secretion of the expressed protein in bacteria. The signal sequence typically encodes a signal peptide comprised of hydrophobic amino acids, which direct the secretion of the protein from the cell, as is well known in the art. The protein is either secreted into the growth media (gram-positive bacteria) or into the periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria). For expression in bacteria, usually bacterial secretory leader sequences, operably linked to the chimeric nucleic acid, are preferred. In a preferred embodiment, the proteins of the invention are expressed in bacteria and/or are displayed on the bacterial surface. Suitable bacterial expression and display systems are known in the art [Stahl and Uhlen, Trends Biotechnol. 15:185-92 (1997); Georgiou et al., Nat. Biotechnol. 15:29-34 (1997); Lu et al., Biotechnology 13:366-72 (1995); Jung et al., Nat. Biotechnol. 16:576-80 (1998)]. The bacterial expression vector may also include a selectable marker gene to allow for the selection of bacterial strains that have been transformed. Suitable selection genes include genes which render the bacteria resistant to drugs such as ampicillin, chloramphenicol, erythromycin, kanamycin, neomycin and tetracycline. Selectable markers also include biosynthetic genes, such as those in the histidine, tryptophan and leucine biosynthetic pathways.
These components are assembled into expression vectors. Expression vectors for bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli, Streptococcus cremoris, and Streptomyces lividans, among others.
The bacterial expression vectors are transformed into bacterial host cells using techniques well known in the art, such as calcium chloride treatment, electroporation, and others.
In one embodiment, proteins are produced in insect cells. Expression vectors for the transformation of insect cells, and in particular, baculovirus-based expression vectors, are well known in the art.
In another preferred embodiment, proteins are produced in yeast cells. Yeast expression systems are well known in the art, and include expression vectors for Saccharomyces cerevisiae, Candida albicans and C. maltosa, Hansenula polymorpha, Kluyveromyces fragilis and K. lactis, Pichia guilliermondii and P. pastoris, Schizosaccharomyces pombe, and Yarrowia lipolytica. Preferred promoter sequences for expression in yeast include the inducible GAL1,10 promoter, the promoters from alcohol dehydrogenase, enolase, glucokinase, glucose-6-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, phosphofructokinase, 3-phosphoglycerate mutase, pyruvate kinase, and the acid phosphatase gene. Yeast selectable markers include ADE2, HIS4, LEU2, TRP1, and ALG7, which confers resistance to tunicamycin; the neomycin phosphotransferase gene, which confers resistance to G418; and the CUP1 gene, which allows yeast to grow in the presence of copper ions.
In a preferred embodiment, the proteins of the invention are expressed in yeast and/or are displayed on the yeast surface. Suitable yeast expression and display systems are known in the art (Boder and Wittrup, Nat. Biotechnol. 15:553-7 (1997); Cho et al., J. Immunol. Methods 220:179-88 (1998); all of which are expressly incorporated by reference). Surface display in the ciliate Tetrahymena thermophila is described by Gaertig et al., Nat. Biotechnol. 17:462-465 (1999), expressly incorporated by reference. In one embodiment, proteins are produced in viruses and/or are displayed on the surface of the viruses. Expression vectors for protein expression in viruses and for display are well known in the art and commercially available (see review by Felici et al., Biotechnol. Annu. Rev. 1:149-83 (1995)). Examples include, but are not limited to, M13 (Lowman et al., Biochemistry 30:10832-10838 (1991); Matthews and Wells, Science 260:1113-1117 (1993); Stratagene); fd (Krebber et al., FEBS Lett. 377:227-231 (1995)); T7 (Novagen, Inc.); T4 (Jiang et al., Infect. Immun. 65:4770-7 (1997)); lambda (Stolz et al., FEBS Lett. 440:213-7 (1998)); tomato bushy stunt virus (Joelson et al., J. Gen. Virol. 78:1213-7 (1997)); retroviruses (Buchholz et al., Nat. Biotechnol. 16:951-4 (1998)). All of the above references are expressly incorporated by reference. In addition, the proteins of the invention may be further fused to other proteins, if desired, for example to increase expression or increase stability.
Once made, the proteins may be covalently modified. One type of covalent modification includes reacting targeted amino acid residues of a protein with an organic derivatizing agent that is capable of reacting with selected side chains or the N- or C-terminal residues of a protein. Derivatization with bifunctional agents is useful, for instance, for crosslinking a protein to a water-insoluble support matrix or surface for use in the method for purifying anti-protein antibodies or screening assays, as is more fully described below. Commonly used crosslinking agents include, e.g., 1,1-bis(diazoacetyl)-2-phenylethane, glutaraldehyde, N-hydroxysuccinimide esters, for example, esters with 4-azidosalicylic acid, homobifunctional imidoesters, including disuccinimidyl esters such as 3,3'-dithiobis(succinimidylpropionate), bifunctional maleimides such as bis-N-maleimido-1,8-octane and agents such as methyl-3-[(p-azidophenyl)dithio]propioimidate. Other modifications include deamidation of glutaminyl and asparaginyl residues to the corresponding glutamyl and aspartyl residues, respectively, hydroxylation of proline and lysine, phosphorylation of hydroxyl groups of seryl or threonyl residues, methylation of the α-amino groups of lysine, arginine, and histidine side chains [T.E. Creighton, Proteins: Structure and Molecular Properties, W.H. Freeman & Co., San Francisco, pp. 79-86 (1983)], acetylation of the N-terminal amine, and amidation of any C-terminal carboxyl group.
Another type of covalent modification of the protein included within the scope of this invention comprises altering the native glycosylation pattern of the variant protein or of the corresponding naturally occurring protein. "Altering the native glycosylation pattern" is intended for purposes herein to mean deleting one or more carbohydrate moieties found in a protein, and/or adding one or more glycosylation sites that are not present in the respective protein.
Addition of glycosylation sites to a protein may be accomplished by altering the amino acid sequence thereof. The alteration may be made, for example, by the addition of, or substitution by, one or more serine or threonine residues to the protein (for O-linked glycosylation sites). The amino acid sequence may optionally be altered through changes at the DNA level, particularly by mutating the DNA encoding the protein at preselected bases such that codons are generated that will translate into the desired amino acids.
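As a minimal illustration of altering the sequence at the DNA level, the Python sketch below replaces one codon of an in-frame coding sequence with a serine or threonine codon. The function name `substitute_codon` and the particular codons chosen (TCT for serine, ACT for threonine) are assumptions; any synonymous codon from the standard genetic code would serve equally well.

```python
# One serine and one threonine codon from the standard genetic code.
# The specific synonymous codons are an arbitrary, illustrative choice.
SER_THR_CODONS = {"S": "TCT", "T": "ACT"}


def substitute_codon(coding_dna: str, residue_number: int, new_residue: str) -> str:
    """Return coding_dna with the codon for residue_number (1-based)
    replaced by a codon for serine ('S') or threonine ('T')."""
    if new_residue not in SER_THR_CODONS:
        raise ValueError("new_residue must be 'S' or 'T'")
    if len(coding_dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    start = (residue_number - 1) * 3
    if not 0 <= start < len(coding_dna):
        raise IndexError("residue_number is outside the coding sequence")
    return coding_dna[:start] + SER_THR_CODONS[new_residue] + coding_dna[start + 3:]


# Hypothetical example: Met-Ala-Lys becomes Met-Ser-Lys.
print(substitute_codon("ATGGCTAAA", 2, "S"))
```

The sketch only rewrites the coding sequence; whether the new residue is actually glycosylated depends on the expression host and the local sequence context.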
Another means of increasing the number of carbohydrate moieties on the protein is by chemical or enzymatic coupling of glycosides to the polypeptide. Such methods are described in the art, e.g., in WO 87/05330, published September 11, 1987, and in Aplin and Wriston, CRC Crit. Rev. Biochem., pp. 259-306 (1981).
Removal of carbohydrate moieties present on the protein may be accomplished chemically or enzymatically or by mutational substitution of codons encoding for amino acid residues that serve as targets for glycosylation. Chemical deglycosylation techniques are known in the art and described, for instance, by Hakimuddin et al., Arch. Biochem. Biophys., 259:52 (1987) and by Edge et al., Anal. Biochem., 118:131 (1981). Enzymatic cleavage of carbohydrate moieties on polypeptides can be achieved by the use of a variety of endo- and exo-glycosidases as described by Thotakura et al., Meth. Enzymol., 138:350 (1987).
Another type of covalent modification of a protein comprises linking the protein to one of a variety of non-proteinaceous polymers, e.g., polyethylene glycol, polypropylene glycol, or polyoxyalkylenes, in the manner set forth in U.S. Patent Nos. 4,640,835; 4,496,689; 4,301,144; 4,670,417; 4,791,192 or 4,179,337.
The proteins of the present invention may also be modified in a way to form chimeric molecules comprising a protein fused to another, heterologous polypeptide or amino acid sequence. In one embodiment, such a chimeric molecule comprises a fusion of a protein with a tag polypeptide which provides an epitope to which an anti-tag antibody can selectively bind. The epitope tag is generally placed at the amino- or carboxyl-terminus of the protein. The presence of such epitope-tagged forms of a protein can be detected using an antibody against the tag polypeptide. Also, provision of the epitope tag enables the protein to be readily purified by affinity purification using an anti-tag antibody or another type of affinity matrix that binds to the epitope tag. In an alternative embodiment, the chimeric molecule may comprise a fusion of a protein with an immunoglobulin or a particular region of an immunoglobulin. For a bivalent form of the chimeric molecule, such a fusion could be to the Fc region of an IgG molecule.
Various tag polypeptides and their respective antibodies are well known in the art. Examples include poly-histidine (poly-his) or poly-histidine-glycine (poly-his-gly) tags; the flu HA tag polypeptide and its antibody 12CA5 [Field et al., Mol. Cell. Biol., 8:2159-2165 (1988)]; the c-myc tag and the 8F9, 3C7, 6E10, G4, B7 and 9E10 antibodies thereto [Evan et al., Molecular and Cellular Biology, 5:3610-3616 (1985)]; and the Herpes Simplex virus glycoprotein D (gD) tag and its antibody [Paborsky et al., Protein Engineering, 3(6):547-553 (1990)]. Other tag polypeptides include the Flag-peptide [Hopp et al., BioTechnology, 6:1204-1210 (1988)]; the KT3 epitope peptide [Martin et al., Science, 255:192-194 (1992)]; tubulin epitope peptide [Skinner et al., J. Biol. Chem., 266:15163-15166 (1991)]; and the T7 gene 10 protein peptide tag [Lutz-Freyermuth et al., Proc. Natl. Acad. Sci. USA, 87:6393-6397 (1990)].
In a preferred embodiment, the protein is purified or isolated after expression. The proteins may be isolated or purified in a variety of ways known to those skilled in the art depending on what other components are present in the sample. Standard purification methods include electrophoretic, molecular, immunological and chromatographic techniques, including ion exchange, hydrophobic, affinity, and reverse-phase HPLC chromatography, and chromatofocusing. For example, the protein may be purified using a standard anti-library antibody column. Ultrafiltration and diafiltration techniques, in conjunction with protein concentration, are also useful. For general guidance in suitable purification techniques, see Scopes, R., Protein Purification, Springer-Verlag, NY (1982). The degree of purification necessary will vary depending on the use of the protein. In some instances no purification may be necessary.

Claims

1. A method for forming a recombined nucleic acid comprising: randomly fragmenting one or more double-stranded DNA molecules to form double-stranded fragments having ends; generating single-stranded cohesive ends on each end of said fragments; and assembling together at least two of said fragments having said cohesive ends to form a recombined nucleic acid; wherein said DNA molecule(s) and said fragments remain double-stranded throughout said fragmenting, generating single-stranded cohesive ends and assembling steps.
2. The method of Claim 1 wherein said formed recombined nucleic acid is subjected to at least one more repetition of said fragmenting, generating single-stranded cohesive ends and assembling steps.
3. The method of Claim 1 wherein at least one more double-stranded DNA molecule is added to said formed recombined nucleic acid and said fragmenting, generating single-stranded cohesive ends and assembling steps are repeated.
4. The method of Claim 1 wherein said double-stranded DNA molecule is a gene.
5. The method of Claim 1 wherein said double-stranded DNA molecule is an operon or a metabolic pathway comprising more than one gene.
6. The method of Claim 1 wherein said double-stranded DNA molecule is a chromosome.
7. The method of Claim 1 wherein at least two DNA molecules are fragmented and wherein said DNA molecules differ from one another in sequence.
8. The method of Claim 7 wherein said DNA molecules are homologs of one another.
9. The method of Claim 7 wherein one of said DNA molecules is naturally occurring and one is a variant of said naturally occurring DNA.
10. The method of Claim 7 wherein each of said DNA molecules are variants of a nucleic acid of interest.
11. A method for forming a recombined nucleic acid having a gene deleted or inverted comprising: randomly fragmenting a double-stranded DNA molecule comprising a gene flanked by known sequences to form double-stranded fragments having ends; generating single-stranded cohesive ends on each end of said fragments; assembling said fragments with a double-stranded insert having single-stranded ends complementary to said known sequences to form a recombined nucleic acid having said gene deleted or inverted.
12. The method of Claim 11 wherein said gene is inverted, and wherein said double-stranded insert having single-stranded ends is formed by a method comprising: amplifying said gene with primers that have additional nucleotides that are not natural extensions of said gene, wherein said additional nucleotides are complementary to opposing ends of said known sequences to form an amplification product; and generating on said insert single-stranded ends complementary to said known sequences.
13. The method of Claim 11 wherein said gene is deleted, and wherein said double-stranded insert does not comprise said gene.
14. A method for forming a recombined nucleic acid having a gene inserted therein comprising: randomly fragmenting a double-stranded DNA molecule; generating single-stranded cohesive ends on each end of said fragments; assembling said fragments with a double-stranded insert comprising a gene and having single-stranded ends to form a recombined nucleic acid having a gene inserted therein.
15. The method of Claim 14 wherein said double-stranded insert having single-stranded ends is formed by a method comprising: amplifying said insert with primers that have additional nucleotides to form an amplification product; and generating single-stranded ends on said insert.
16. The method of Claim 14 wherein said additional nucleotides are random.
17. The method of Claim 14 wherein said additional nucleotides are natural extensions from a selected site for insertion.
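The three steps recited in Claim 1 (random fragmentation, generation of single-stranded cohesive ends, and assembly) can be pictured with a toy Python model. This is only a sketch under stated assumptions, not the claimed procedure: each double-stranded molecule is represented by its top strand alone, cohesive ends are modelled as an overhang of fixed length `k`, and two fragments are treated as joinable when the last `k` bases of one equal the first `k` bases of the other. The names `random_fragments` and `cohesive_join` and both example sequences are hypothetical.

```python
import random


def random_fragments(seq: str, n_cuts: int, rng: random.Random) -> list:
    """Cut seq at n_cuts distinct random internal positions."""
    cuts = sorted(rng.sample(range(1, len(seq)), n_cuts))
    bounds = [0] + cuts + [len(seq)]
    return [seq[i:j] for i, j in zip(bounds, bounds[1:])]


def cohesive_join(left: str, right: str, k: int):
    """Join two fragments whose k-base overhangs are complementary.

    In this top-strand-only model (an assumption), complementary overhangs
    mean that the last k bases of `left` equal the first k bases of `right`.
    Returns the joined sequence, or None if the ends do not anneal.
    """
    if k <= 0 or len(left) < k or len(right) < k:
        return None
    if left[-k:] != right[:k]:
        return None
    return left + right[k:]           # the shared overhang is counted once


# Two hypothetical homologs that share the 4-base stretch "ACGT".
left_fragment = "GGCCACGT"            # fragment ending in the shared stretch
right_fragment = "ACGTTTAA"           # fragment starting with the shared stretch
print(cohesive_join(left_fragment, right_fragment, 4))

# Fragmentation never loses sequence: the pieces concatenate to the parent.
pieces = random_fragments("ACGTACGTACGTACGT", 3, random.Random(0))
print("".join(pieces) == "ACGTACGTACGTACGT")
```

Because joining depends only on the overhangs, fragments derived from different parent molecules can be combined wherever the parents share sequence, which is how the toy model yields recombined products.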
EP01926467A 2000-04-21 2001-03-28 Non-pcr based recombination of nucleic acids Withdrawn EP1276858A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US55720800A 2000-04-21 2000-04-21
US557208 2000-04-21
PCT/US2001/009971 WO2001081568A1 (en) 2000-04-21 2001-03-28 Non-pcr based recombination of nucleic acids

Publications (1)

Publication Number Publication Date
EP1276858A1 true EP1276858A1 (en) 2003-01-22

Family

ID=24224471

Family Applications (1)

Application Number Title Priority Date Filing Date
EP01926467A Withdrawn EP1276858A1 (en) 2000-04-21 2001-03-28 Non-pcr based recombination of nucleic acids

Country Status (4)

Country Link
EP (1) EP1276858A1 (en)
AU (1) AU2001253001A1 (en)
CA (1) CA2406466A1 (en)
WO (1) WO2001081568A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6951719B1 (en) 1999-08-11 2005-10-04 Proteus S.A. Process for obtaining recombined nucleotide sequences in vitro, libraries of sequences and sequences thus obtained
WO2008027558A2 (en) 2006-08-31 2008-03-06 Codon Devices, Inc. Iterative nucleic acid assembly using activation of vector-encoded traits
WO2012064975A1 (en) 2010-11-12 2012-05-18 Gen9, Inc. Protein arrays and methods of using and making the same
ES2548400T3 (en) 2010-11-12 2015-10-16 Gen9, Inc. Methods and devices for nucleic acid synthesis
EP2944693B1 (en) 2011-08-26 2019-04-24 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids
US9150853B2 (en) 2012-03-21 2015-10-06 Gen9, Inc. Methods for screening proteins using DNA encoded chemical libraries as templates for enzyme catalysis
EP4001427A1 (en) 2012-04-24 2022-05-25 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
JP6509727B2 (en) 2012-06-25 2019-05-15 ギンゴー バイオワークス, インコーポレイテッド Methods for nucleic acid assembly and high-throughput sequencing
GB2566986A (en) 2017-09-29 2019-04-03 Evonetix Ltd Error detection during hybridisation of target double-stranded nucleic acid

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3614310A1 (en) * 1986-04-28 1987-10-29 Hoechst Ag METHOD FOR ISOLATING MUTED GENES AND THE CORRESPONDING WILD-TYPE GENES
US5783431A (en) * 1996-04-24 1998-07-21 Chromaxome Corporation Methods for generating and screening novel metabolic pathways
JPH1066576A (en) * 1996-08-07 1998-03-10 Novo Nordisk As Double-stranded dna having protruding terminal and shuffling method using the same
EP1228200A1 (en) * 1999-10-27 2002-08-07 California Institute Of Technology Production of functional hybrid genes and proteins

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0181568A1 *

Also Published As

Publication number Publication date
CA2406466A1 (en) 2001-11-01
AU2001253001A1 (en) 2001-11-07
WO2001081568A1 (en) 2001-11-01

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20021112

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

17Q First examination report despatched

Effective date: 20030523

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20040616