MXPA00012409A

MXPA00012409A - Method for producing polynucleotides with desired properties

Info

Publication number: MXPA00012409A
Application number: MXPA/A/2000/012409A
Authority: MX
Inventors: Claus M Krebber
Original assignee: Maxygen Inc
Priority date: 1998-06-17
Filing date: 2000-12-14
Publication date: 2001-11-21

Abstract

The invention provides methods for the production of polynucleotides with a desired property (e.g., conferring a desired phenotype and/or encoding a polypeptide with a desired property) that is selectable or can be screened for. The methods include making insertions and/or deletions at random sites in DNA segments in a population. In some embodiments the random insertions and deletions are made recursively.

Description

METHOD FOR PRODUCING POLYNUCLEOTIDES WITH DESIRED PROPERTIES

FIELD OF THE INVENTION

The present invention relates to methods for the production of polynucleotides that confer a desired phenotype and/or encode a polypeptide having an advantageous predetermined property that is selectable or can be screened for.

BACKGROUND OF THE INVENTION

Traditional molecular biology methods for generating novel genes and proteins have usually involved rational or directed mutation. One example is the generation of a polynucleotide that encodes a fusion or chimeric protein, using known restriction sites to combine functional domains of two characterized proteins. Another example is the introduction of a point mutation at a specific site in a polypeptide. Although useful, the power of these and similar methods is limited by the requirement for restriction map or sequence information to facilitate mutagenesis, and by the limited number of variants that can be generated efficiently.

An alternative approach to the generation of variants uses random recombination techniques such as "DNA shuffling" (Patten et al., 1997, Curr. Opin. Biotech. 8: 724-733). DNA shuffling entails iterative cycles of recombination and screening or selection to "evolve" individual genes, whole plasmids or viruses, multigene clusters, or whole genomes. Such techniques do not require the extensive analysis and computation required by conventional methods to design polynucleotides and polypeptides. Moreover, DNA shuffling allows the recombination of a large number of mutations in a minimum number of selection cycles, in contrast to traditional pairwise recombination events. Thus, DNA shuffling techniques are advantageous because they provide recombination between mutations in any or all of these, thereby providing a very fast way of exploring how different combinations of mutations can affect a desired result.

The present invention provides methods that can be used alone or in combination with random recombination techniques such as DNA shuffling to generate novel polynucleotides that have, or encode a polypeptide having, a desired property or combination of properties.

SUMMARY OF THE INVENTION

In one aspect the invention provides a method for producing a DNA segment having a desired property or combination of properties by mutating a substrate population.
The method includes:
a) mutating a substrate population that includes a plurality of DNA segments by: i) making insertions at random sites in the segments (random insertion), ii) making deletions at random sites in the segments (random deletion), or both, to produce a mutated population that includes mutated DNA segments;
b) screening the mutated population to obtain a first selected population that includes at least one DNA segment with a first desired property;
c) mutating the first selected population by random insertion, random deletion, or both, to produce a recursively mutated population; and
d) screening the recursively mutated population to obtain a recursively selected population that includes at least one DNA segment with a second desired property.

In some embodiments, the method additionally includes at least one further cycle of mutation and screening after step (d) (e.g., mutating the recursively selected population and screening the resulting recursively mutated population to obtain a new recursively selected population with a desired property). In some embodiments, one polynucleotide or a combination of polynucleotides in a recursively selected population is shuffled. In various embodiments, the second desired property can be the same as or different from the first desired property, and can be a combination of properties. In some embodiments, the polynucleotides in the recursively selected population have a property that is improved compared with the polynucleotides in the first selected population. In some embodiments, the substrate population includes DNA segments that encode a polypeptide, a catalytic RNA, a promoter sequence, or a vector. In some embodiments, the substrate population is homogeneous. In some embodiments, a polynucleotide encoding a polypeptide is screened for an activity such as an enzymatic activity, a substrate specificity, or a binding activity of the polypeptide.
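The recursive cycle of steps (a) to (d) can be illustrated with a small in silico sketch. This is only a toy model under stated assumptions, not a description of any laboratory protocol: the function names, the 50 variants per segment, the 1 to 3 base deletion length, and the "property" (number of copies of a hypothetical GAATTC motif) are all invented here for illustration.

```python
import random

def random_insertion(seq, insert, rng):
    """Insert `insert` at a uniformly chosen site in `seq`."""
    pos = rng.randrange(len(seq) + 1)
    return seq[:pos] + insert + seq[pos:]

def random_deletion(seq, max_len, rng):
    """Delete 1..max_len bases starting at a uniformly chosen site."""
    if len(seq) < 2:
        return seq
    length = rng.randint(1, min(max_len, len(seq) - 1))
    pos = rng.randrange(len(seq) - length + 1)
    return seq[:pos] + seq[pos + length:]

def mutate(population, insert, rng, variants_per_segment=50):
    """Steps (a)/(c): make random insertion or deletion variants of each segment."""
    mutated = []
    for seq in population:
        for _ in range(variants_per_segment):
            if rng.random() < 0.5:
                mutated.append(random_insertion(seq, insert, rng))
            else:
                mutated.append(random_deletion(seq, 3, rng))
    return mutated

def screen(population, score, keep=5):
    """Steps (b)/(d): keep the best-scoring distinct segments (ties broken by sequence)."""
    ranked = sorted(set(population), key=lambda s: (score(s), s), reverse=True)
    return ranked[:keep]

def evolve(substrate, insert, score, cycles=2, seed=0):
    """Run `cycles` rounds of mutation followed by screening."""
    rng = random.Random(seed)
    selected = list(substrate)
    for _ in range(cycles):
        selected = screen(mutate(selected, insert, rng), score)
    return selected

def motif_count(seq):
    """Toy 'property': number of copies of a hypothetical motif."""
    return seq.count("GAATTC")

best = evolve(["ACGTACGTAC"], "GAATTC", motif_count, cycles=2, seed=1)
print(best[0], motif_count(best[0]))
```

In this sketch the second cycle screens for the same property as the first; passing a different scoring function in a later cycle would correspond to screening for a different second property.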
In another aspect, the invention provides a method for producing a DNA segment having a desired property by:
a) mutating a first substrate population that includes a plurality of DNA segments by: i) making insertions at random sites in the segments (random insertion), ii) making deletions at random sites in the segments (random deletion), or both, to produce a first mutated population of mutated DNA segments;
b) mutating a second substrate population that includes a plurality of DNA segments by: i) making insertions at random sites in the segments, ii) making deletions at random sites in the segments, or both, to produce a second mutated population of mutated DNA segments;
c) recombining the first mutated population and the second mutated population to produce a recombined population; and
d) screening the recombined population to identify at least one DNA segment with the desired property.

In one embodiment, the first and second mutated populations are screened to produce first and second selected populations, each having a desired property, and the selected populations are recombined. In various embodiments, recombination can be achieved by shuffling or by directed recombination. In some embodiments, the first desired property and the second desired property are the same. In some embodiments, the substrate population includes DNA segments that encode a polypeptide, a catalytic RNA, a promoter sequence, or a vector. In some embodiments, the substrate population is homogeneous. In some embodiments, a polynucleotide encoding a polypeptide is screened for an activity such as an enzymatic activity, a substrate specificity, or a binding activity of the polypeptide.
In another aspect, the invention provides a method for producing a DNA segment having a desired property by:
a) mutating a substrate population that includes a plurality of DNA segments by: i) making insertions at random sites in the segments, ii) making deletions at random sites in the segments, or both, to produce a mutated population of mutated DNA segments;
b) screening the mutated population to obtain a selected population that includes at least one DNA segment with the desired property;
c) shuffling at least one DNA segment from the selected population to produce a recombined population; and
d) screening the recombined population for a desired property.

In one embodiment, shuffling includes conducting a polynucleotide amplification process on overlapping segments of at least one polynucleotide of the selected population under conditions in which one segment serves as a template for extension of another segment, to generate a population of recombinant polynucleotides. In some embodiments, the substrate population includes DNA segments that encode a polypeptide, a catalytic RNA, a promoter sequence, or a vector. In some embodiments, the substrate population is homogeneous. In some embodiments, a polynucleotide encoding a polypeptide is screened for an activity such as an enzymatic activity, a substrate specificity, or a binding activity of the polypeptide.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1 provides a flow chart of an embodiment of the invention in which recursive steps of random insertion or deletion and screening are employed to produce a DNA segment with a desired property.

Figure 2 provides a flow chart of an embodiment of the invention in which random insertion or deletion is carried out on two different substrate populations, which are then recombined.

Figure 3 provides a flow chart of an embodiment of the invention in which steps of random insertion or deletion, screening, and random recombination are employed to produce a DNA segment with a desired property.

DETAILED DESCRIPTION

I. Definitions

The following terms are defined to provide additional guidance to one of skill in the practice of the invention.

The term "shuffling", as used herein, refers to techniques for random recombination between substantially homologous but non-identical polynucleotides. Various shuffling methods are described in Patten et al., 1997, Curr. Opin. Biotech. 8: 724-733; Stemmer, 1994, Nature 370: 389-391; Stemmer et al., 1994, Proc. Natl. Acad. Sci. USA 91: 10747-10751; Zhao et al., 1997, Nucleic Acids Res. 25: 1307-1308; Crameri et al., 1998, Nature 391: 288-291; Crameri et al., 1997, Nat.
Biotech. 15: 436-438; Arnold et al., 1997, Adv. Biochem. Eng. Biotechnol. 58: 2-14; Zhang et al., 1997, Proc. Natl. Acad. Sci. USA 94: 4504-4509; Crameri et al., 1996, Nat. Biotechnol. 14: 315-319; Crameri et al., 1996, Nat. Med. 2: 100-102; PCT Publications WO 95/22625; WO 97/20078; WO 97/35957; WO 97/35966; WO 98/13487; WO 98/13485; PCT 98/00852; PCT 97/24239; and the references cited therein. Shuffling techniques are also described in the following U.S. patents and patent applications: U.S. Patent No. 5,605,793; U.S. Patent Application Serial Nos. 08/537,874; 08/621,859; 08/792,409; 08/769,062; 08/822,589; 09/021,769; 60/074,294; 08/722,660; 08/938,690. Each of the patents, applications, and publications mentioned above is incorporated herein by reference in its entirety and for all purposes.

One method of shuffling comprises conducting a polynucleotide amplification process on overlapping segments of a population of variants of a polynucleotide under conditions in which one segment serves as a template for extension of another segment, to generate a population of recombinant polynucleotides, and screening or selecting a recombinant polynucleotide or an expression product thereof for a desired property. Some shuffling methods use random point mutations (typically introduced in a PCR amplification step) as a source of diversity.

The term "oligonucleotide", as used herein, generally refers to short polynucleotides of up to about 50 bases (e.g., about 6, 9, 12, 15, 18, 21, 25, 35, or 50 bases in length).

The term "polynucleotide", as used herein, refers both to oligonucleotides and to larger molecules (e.g., at least about 60, 100, 200, 300, 500, 1,000, 5,000, or 10,000 bases or base pairs). The oligonucleotides and polynucleotides used in the present invention are usually DNA molecules and are typically double-stranded.

The term "property", as used herein, refers to any characteristic or attribute of a polynucleotide (or, for example, of an encoded polypeptide or RNA) that can be selected for or detected in a screening system, including, for example, an enzymatic or binding activity of a polynucleotide or an encoded polypeptide (e.g., a new activity or an increased or decreased level of a pre-existing activity), fluorescence, properties conferred on a cell comprising a particular polynucleotide, and a binding activity (e.g., the property of binding, or being bound by, a specific target molecule such as a receptor, ligand, antibody or antibody fragment, antigen, epitope, or other biological macromolecule). The property can be an attribute of a sequence that controls transcription (e.g., promoter strength, regulation), a sequence that affects RNA processing (e.g., RNA stability or splicing), a sequence that affects translation (e.g., the level, regulation, post-transcriptional modification), or a sequence that affects another expression property of a gene or transgene; a replicative element; a protein-binding element; a vector; an encoded protein (e.g., enzymatic activity and specificity, binding activity and specificity, pI, stability to denaturation); an encoded RNA (e.g., mRNA or catalytic RNA); and the like. Additional examples are described herein or in the references incorporated herein, or will be apparent to one of skill upon reading this description.

The term "evolve", as used herein, refers to the process of introducing variation into a population of macromolecules and selecting or screening for the acquisition, or partial acquisition, of a desired property, resulting in the generation of one or more molecules different from the molecules of the starting population.

II. Summary

The present invention provides novel methods for the generation of polynucleotides having a desired property (e.g., an advantageous predetermined property that is selectable or can be screened for).
In one aspect, the invention provides methods for generating diversity in a population of polynucleotides by random insertion or deletion of sequences and identification of variants with new or improved properties. In some embodiments, multiple cycles of insertion/deletion and screening are carried out. In some embodiments, the properties of the variants are evolved by one or more of a variety of methods. Typically the mutated polynucleotides are double-stranded DNA segments. Examples of suitable DNA segments include DNAs comprising genes, gene fragments, gene clusters, vectors, polypeptide-encoding sequences, expression regulatory sequences (e.g., promoters, enhancers), and the like.

In one embodiment of the invention, a population of polynucleotides (i.e., a substrate population) is mutated by random insertion or deletion, and the resulting mutated population is screened to identify a sub-population of species with a desired property (i.e., a selected population). The selected population is then itself mutated by random insertion or deletion, and the resulting twice-mutated population is again subjected to screening to produce a new selected population. The second screening cycle can be for the same property as, or a property similar to, that screened for in the previous cycle, or for an entirely different property. For example, when a substrate population of vectors is mutated, the first screen could be for species that have acquired a sequence conferring chloramphenicol resistance not found in the substrate population, and the second screen could be for increased chloramphenicol resistance (the same or a similar property) or, alternatively, in subsequent cycles of mutation and screening, for the acquisition of a sequence that confers tetracycline resistance (a different property).
The mutation and selection process can be carried out for multiple cycles, if desired, to generate one or more new DNA segments having a specific desired property or combination of properties. For example, in some embodiments at least 2, 5, or 10 cycles of random insertion/deletion and screening will be carried out. Following two or more cycles of mutation and selection, at least one polynucleotide species having the desired property or properties (e.g., an activity not found in the initial population of polynucleotides) is isolated from the sub-population. This process is outlined in general terms in Figure 1; however, the figure is presented only to assist the reader and is not intended to limit the invention in any way.

In another embodiment, two or more different substrate populations are mutated by random insertion or deletion, producing corresponding mutated populations. In many embodiments, the two or more mutated populations are screened for particular desired properties (e.g., each mutated population is screened for a different property). Following the production of the two or more mutated populations (or following screening, if this takes place), the polynucleotide segments of each of the mutated populations are recombined to produce a single recombined population. Recombination may be carried out by DNA shuffling or, alternatively, using "classical" molecular cloning techniques, in which a selected region from one population of polynucleotides is cloned into a specific site (e.g., a restriction site) in a second population of polynucleotides. "Classical" techniques include (i) restricting two populations of DNA molecules and ligating fragments from one of the populations into a restriction site in the DNA of the second population, (ii) amplifying a region of one population of polynucleotides (e.g., by PCR or reverse PCR) and ligating it into the polynucleotides of the second population, and (iii) other methods known in the art.
The recombined population is then screened for the desired property or properties. In some embodiments, subsequent cycles of random insertion/deletion or recombination and screening are carried out. This process is outlined in Figure 2; like Figure 1, this figure is not intended to limit the invention.

In a third embodiment, a substrate population of polynucleotides is mutated by random insertion or deletion, and the resulting mutated population is screened to identify species with a desired property (i.e., a "selected population"). The selected population (or a species or several species isolated from it) is then evolved by random recombination (including random recombination combined with point mutation), which can be recursive or a single cycle of random recombination. This process is outlined in Figure 3; this figure is likewise not intended to limit the invention. The invention will now be described in greater detail.

III. Mutation of the Substrate Population

a) General

An initial step in the method of the invention is the introduction of insertions or deletions at random sites in a population of polynucleotides. Insertions and deletions are sometimes collectively referred to herein as "mutations." For convenience, a population of polynucleotides into which mutations are to be introduced can be referred to as the "substrate population". Although the method can be carried out on any polynucleotide that can be randomly mutated by insertion or deletion, as noted above the polynucleotides will most frequently be DNA molecules (including cDNAs), usually double-stranded DNA molecules. The DNA molecules that make up the substrate population can be of any of several types, including DNA molecules that comprise sequences encoding polypeptides (e.g., encoding a protein, multiple proteins, or portions of a protein), regulatory DNAs (e.g., promoters, enhancers), vectors (e.g., an expression vector), and viruses (e.g., to produce attenuated virions). These DNA molecules are also sometimes called "DNA segments". The substrate population will comprise a plurality of DNA segments, typically at least 10^2, more frequently at least 10^4, or at least 10^6 DNA segments.
In many embodiments, the DNA segments in any particular substrate population are identical to one another, being derived from a single parental DNA (e.g., plasmid DNAs prepared from the same bacterial culture). Such a population is a "homogeneous" substrate population. In some embodiments, however, the substrate population includes DNA segments that are not identical, such as the following: DNA segments that differ from one another by point mutations (e.g., molecules that have been generated from a template using error-prone PCR) or by other mutations (e.g., insertions or deletions); DNA segments that are related as homologs from different organisms; and DNA segments that are related to one another because they are products of DNA shuffling reactions (see, e.g., Patten et al., 1997, Curr. Opin. Biotech. 8: 724). In a related embodiment, the substrate population will comprise DNA segments having unrelated sequences (e.g., a substrate population comprising several different plasmid vectors), usually with a plurality (e.g., at least 10^2 or 10^6) of each species represented.

Mutations (insertions or deletions or both) are introduced into the DNA segments in the substrate population. For convenience, the population of polynucleotides that has been mutated can be referred to as the "mutated population". An important aspect of the present invention is that the mutations are introduced at random sites in the DNA segments. "Random", in this context, has its usual meaning and refers to insertions and deletions that (i) are not made at predetermined sites of a target polynucleotide and (ii) result in a population of polynucleotides (e.g., a mutated population) in which many different insertion or deletion sites are represented (i.e., different species in the mutated population include insertions or deletions at different sites).
In contrast to the random mutations used in the present invention, a mutation is "directed" when it is made at a predetermined site in the polynucleotides of a population, such as the insertion of a cassette at a particular restriction site in the DNA segments of a population, or site-specific mutagenesis.

A variety of in vitro and in vivo methods for making random insertions and/or deletions in polynucleotides are known in the art. While it will be appreciated that the invention is not limited to any specific method for making insertions or deletions, illustrative examples of these methods are provided infra.

Usually the DNA segments to be mutated in vitro are closed circular molecules isolated from cells (e.g., plasmids, circular bacteriophages, and certain vectors) or, alternatively, they can be circularized in vitro. Any circularization method can be used. For example, linear bacteriophages, eukaryotic viruses, PCR products, and other linear molecules can be circularized by treatment with DNA ligase or its equivalent. In some embodiments it will be desirable to carry out the ligation reaction at a low concentration of substrate molecules, to avoid or reduce concatemerization. In certain embodiments, supercoiled circular DNA is used in order to limit the activity of the nuclease to a single cleavage event per molecule in the subsequent random linearization step (described infra). Closed circular molecules can be supercoiled by treatment with topoisomerase II (Gellert et al., 1976, Proc. Natl. Acad. Sci. 73: 3872-3876).

In one random mutation method, closed circular molecules are randomly cleaved at a single site. A circular polynucleotide is "linearized" when it is cleaved once (in contrast to a polynucleotide that is "fragmented").
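Because a circular molecule cleaved once is linearized while one cleaved more than once is fragmented, it can be useful to estimate how a population partitions among these outcomes. The sketch below is a rough illustration only: it assumes that cleavage events are independent and Poisson-distributed across molecules, an assumption made here for illustration and not stated in this description, and the function name is hypothetical.

```python
import math

def cleavage_outcomes(mean_cuts):
    """Fractions of circular molecules left uncut, cut once, or cut more than once,
    assuming the number of cuts per molecule is Poisson with the given mean."""
    uncut = math.exp(-mean_cuts)
    linearized = mean_cuts * math.exp(-mean_cuts)
    fragmented = 1.0 - uncut - linearized
    return {"uncut": uncut, "linearized": linearized, "fragmented": fragmented}

for mean_cuts in (0.25, 0.5, 1.0, 2.0):
    o = cleavage_outcomes(mean_cuts)
    print(f"mean cuts {mean_cuts}: uncut {o['uncut']:.3f}, "
          f"linearized {o['linearized']:.3f}, fragmented {o['fragmented']:.3f}")
```

Under this assumption the linearized fraction is largest when the mean is one cut per molecule (about 37% of molecules); a lower mean reduces fragmentation at the cost of leaving more molecules uncut.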
Methods for random linearization are known, and include limited hydrolysis of double-stranded DNA using nucleases that cleave both strands (e.g., DNase I), or using a combination of enzymes that nick double-stranded DNA (e.g., DNase I in the presence of ethidium bromide, topoisomerase mutants) and single-strand-specific nucleases (e.g., S1 nuclease, P1 nuclease, mung bean nuclease). See, for example, Yokochi et al., 1996, Genes Cells 1: 1069-1075; Chaudry et al., 1995, Nucl. Acids Res. 23: 805-809. Alternatively, "pseudo-random" linearization can be carried out using a relatively non-specific restriction endonuclease (e.g., one that recognizes a common four-base sequence) under conditions in which cleavage occurs approximately once per molecule. When necessary, prior to insertion or deletion, protruding ends can be made blunt by fill-in (e.g., using polymerase and dNTPs) and/or exonuclease treatment.

In practice, the cleavage of a large population of molecules will usually result in a distribution of polynucleotides in addition to those that are linearized, including some molecules that are not cleaved and others that are fragmented by cleavage at more than one site. It is known in the art how to adjust enzyme and substrate concentrations, digestion times, and other conditions to obtain primarily singly cleaved molecules. If desired, linearized molecules can be isolated from fragments by routine methods (e.g., size selection by gel electrophoresis, chromatography, or centrifugation). However, it is not necessary to separate molecules that are singly cleaved from those that are uncleaved or multiply cleaved.

b) Random Insertions

The polynucleotide or oligonucleotide sequence(s) that are randomly inserted into a population of randomly linearized polynucleotides can be from any of a variety of sources. (The sequence(s) to be inserted may be referred to as the insertion sequence or the "insertion population".)
Thus, the oligonucleotides or polynucleotides to be inserted may have defined sequence(s) and biological function(s) (e.g., a TATA box sequence of the Drosophila cuticle gene). Polynucleotides suitable for insertion include defined functional modules or populations of modules (e.g., promoter, enhancer, or other regulatory elements, sequences encoding T or B cell epitopes, biotinylation domains, antibody-selectable peptides, protein binding domains, cellulose binding domains, selectable markers, reporter genes, protein loop sequences, functional domains of a protein, fragments of viral or bacterial genomes, and the like). Polynucleotides suitable for insertion also include undefined or defined fragments of molecules with a known function (e.g., fragments of a known promoter sequence, fragments of polypeptide-encoding sequences). The oligonucleotides or polynucleotides may be of unknown or random sequence and/or biological function, or they may have no particular biological function in nature (e.g., a library of random-sequence 12-mers). Suitable insertion polynucleotides can be generated by chemical synthesis, PCR amplification, enzymatic fragmentation, or any other means. The size of the sequence(s) to be inserted can vary over a wide range, such as from at least about 3, 6, 9, 12, 15, 18, 21, or 50 bases in length up to about 0.1, 0.5, 1, or 2 kilobases or even larger.

The insertion of the sequence between the termini of a linearized polynucleotide can be carried out by any suitable method. Typically the sequences to be joined are incubated together in the presence of a DNA ligase. In some embodiments, a single species of polynucleotide (e.g., a 12-mer of a particular sequence) is randomly inserted into a population of polynucleotides. In other embodiments, a plurality (i.e., more than one) of different polynucleotide species is introduced at a particular step in the mutation process (e.g., a set of random-sequence 12-mers, or a mixture of fragments of a promoter sequence, is inserted).
The inserted sequences can modify or supplement the properties of the substrate molecules in any of a variety of ways. They may, as will be apparent from the examples provided infra, be selected to provide a particular sequence, such as a sequence encoding a particular epitope, a protein binding or recognition site, a transcription factor binding site, an RNA splice site, or the like. Alternatively or in addition, they can act to introduce length variation into a polynucleotide or an encoded polypeptide. In an encoded polypeptide, variations in length can influence the specificity of the molecule (e.g., substrate specificity in an enzyme, antigen specificity in an antibody). In a polynucleotide, variation in length can, for example, change the spacing between transcription factor binding elements in a promoter, profoundly influencing the function of the promoter.

When insertions are made into a protein-encoding sequence of a polynucleotide, particular techniques may be used, if desired, to retain a particular open reading frame (e.g., by ensuring that deletions and/or insertions are a multiple of three nucleotide bases in length). For example, in one embodiment, a single codon (i.e., three nucleotides) is inserted. This can be done by randomly inserting an oligonucleotide having a length that is a multiple of 3 bases (e.g., Boulain et al., 1986, Mol. Gen. Genet. 20: 339-348). An alternative method involves first randomly inserting a resistance cassette (e.g., a drug resistance cassette) that can be removed by restriction endonuclease cleavage after selection (e.g., growth on selective medium). The insertion cassette can be designed to leave a single or multiple random or non-random codon(s) in the coding sequence (Wong et al., 1993, Mol. Microbiol. 10: 283-292; Dykxhoorn et al., 1997, Nucl. Acids Res. 25: 4209-4218; Hallet et al., 1997, Nucl. Acids Res. 25: 1866-1867).
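The multiple-of-three condition described above can be checked with a short sketch. The helper names and the example sequences are hypothetical, chosen only for illustration; the check compares the net length change of a mutant with its parent, so it speaks only to the reading frame downstream of the mutation site.

```python
import random

def insert_at_random_site(orf, oligo, rng):
    """Insert `oligo` at a uniformly chosen position in `orf`."""
    pos = rng.randrange(len(orf) + 1)
    return orf[:pos] + oligo + orf[pos:]

def retains_downstream_frame(parent, mutant):
    """True if the net length change is a multiple of three bases."""
    return (len(mutant) - len(parent)) % 3 == 0

rng = random.Random(0)
orf = "ATGGCTAAAGGTTAA"                              # hypothetical 5-codon reading frame
in_frame = insert_at_random_site(orf, "GCA", rng)    # one codon inserted
shifted = insert_at_random_site(orf, "GC", rng)      # two bases inserted
deleted = orf[:3] + orf[6:]                          # one codon deleted

print(retains_downstream_frame(orf, in_frame))   # one-codon insertion
print(retains_downstream_frame(orf, shifted))    # two-base insertion
print(retains_downstream_frame(orf, deleted))    # one-codon deletion
```

A filter of this kind corresponds only to the length criterion; it does not detect stop codons created at the junction.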
In addition, techniques for co-translational coupling of a reporter gene (e.g., GFP) can be used to identify or eliminate non-productive (i.e., frame-shifted) products. It will be appreciated that while retaining the original open reading frame will reduce the number of "non-productive" polynucleotides in the mutated population, and thus make screening somewhat more efficient, it is not necessary or always desirable to eliminate frame-shift mutations.

c) Random Deletions

In some embodiments of the invention, deletions are introduced at random sites in a substrate population. The introduction of deletions can be used to reduce the size of a polynucleotide sequence (e.g., to increase the insertion capacity of a vector), to change a property of a polynucleotide (e.g., by changing the spacing of functional domains in a polypeptide encoded by a DNA segment), and for other purposes. When random deletions are made in a population of polynucleotides (i.e., deletions are introduced at random sites), there will usually be variation in the extent of the deletions among the molecules in the population. The length(s) of the deletions introduced at any step will vary depending on the investigator's objectives, but will typically be less than 100 bases or base pairs (e.g., about 3, 6, 9, 12, 15, 18, 21, 25, 35, 50, or 100 bases in length). In some embodiments, however, some or all of the deletions may be larger, such as at least about 200 or 500 bases.

Deletions can be made by a variety of methods. In one embodiment, a circular or circularized molecule (e.g., a vector) is randomly linearized as described supra. The randomly linearized molecules are then reduced in size (i.e., sequence is deleted) by the use of a processive exonuclease (e.g., Bal31 or exonuclease III).
In some embodiments, the resulting linear molecules are made blunt-ended by standard methods before recircularization by ligation (Sambrook et al., 1989, MOLECULAR CLONING - A LABORATORY MANUAL, 2nd ed.). In one embodiment, the sequences to be inserted (such as those described supra) can be included in the ligation reaction (resulting in simultaneous insertion and deletion of sequences relative to the substrate population).

In one embodiment of the invention, the polynucleotide is a vector, and the introduction of random deletions followed by selection is used to reduce the size of the vector without eliminating sequences critical to the function of the vector (e.g., the origin of replication). The reduced size increases the capacity for introducing new or larger genes into the vector backbone. When using, for example, a bacteriophage vector with a limited DNA packaging length (owing to the capacity of the capsid), reducing the size of the bacteriophage genome would allow packaging of new or larger genes without affecting the essential functions of the phage. Notably, the present invention allows reduction in the size of a vector and/or the introduction of genes from other sources without a priori knowledge of the function of parts of the parental vector. Thus, it is especially useful when using an uncharacterized bacteriophage as a vector (e.g., for use with Streptomyces bacteriophage φC31).

As noted supra, it will sometimes be desirable, when mutating a polynucleotide encoding a polypeptide, to use techniques that retain a reading frame found in the parental vector. In one embodiment, for example, a single triplet is deleted from (each of) the polynucleotides of a substrate population. This can be carried out by first inserting a resistance cassette that can be removed (e.g., after selection) with the deletion of 3 nucleotides.
For example, a cassette or short oligonucleotide containing a Type IIS restriction enzyme recognition site (e.g., EarI, SapI) can be designed which, after random insertion, can be excised from the circular DNA so that a multiple of 3 nucleotides is removed. Alternatively, the mobilization of a transposon (e.g., using cre/lox) can be used to eliminate the resistance cassette.

d) Additional Methods

In another embodiment of the invention, a mutated population is generated from a substrate population by the introduction of random insertions and/or deletions generated by processive exonuclease digestion of two sub-populations of polynucleotides. The sub-populations are then ligated to produce novel combinations of sequences, as described below. According to this embodiment, the substrate population may be homogeneous (i.e., a plurality of polynucleotides having the same sequence, for example the sequence of a particular gene encoding a protein) or may be non-homogeneous (e.g., a mixture of polynucleotides having related sequences, such as a family of related genes (e.g., encoding human actins), homologues from different species (e.g., human and bovine actin genes), the product of shuffling reactions, or other non-identical polynucleotides as described supra). To produce a mutated population that has random insertions and/or deletions, the substrate population is divided into at least two sub-populations. A series of nested deletions is produced from each of the (for example, two) sub-populations by incubation with exonuclease using methods well known in the art (see, for example, Henikoff, 1984, Gene 28: 351; see also New England Biolabs Catalog 1998/99, page 129, "Exo-Size™ Deletion Kit"). Briefly, an exonuclease such as exonuclease III is used to create unidirectional deletions in the polynucleotides of each sub-population.
Preferably, restriction endonuclease digestion of the DNA segments in each sub-population is used to introduce both a nuclease-susceptible end (i.e., a 5' overhang or blunt end) and a nuclease-resistant end (i.e., a 3' overhang) so that the nuclease digests in only one direction. The at least two sub-populations differ in that the nuclease-susceptible end lies at a different site in each sub-population. After a series of deletions of varying lengths (i.e., nested deletions) has been produced in each sub-population (e.g., by incubating aliquots with exonuclease for different time periods), polynucleotides from each sub-population are ligated to produce a mixture of mutant polynucleotides that have random insertions (e.g., duplications) and/or deletions at the ligation site (a mutated population). An example will help to illustrate this embodiment of the invention. Consider a homogeneous substrate population of DNA segments that encode a polypeptide; the substrate population is divided into two sub-populations. In one embodiment of the method, the nuclease-susceptible end in one sub-population is introduced at the site of the polynucleotide corresponding to the amino terminus of the encoded polypeptide, with digestion toward the C-terminus, and the nuclease-susceptible end in the other sub-population is introduced at the site of the polynucleotide corresponding to the carboxy terminus of the encoded polypeptide, with digestion toward the N-terminus. For purposes of description, the two sub-populations in this illustrative example can be referred to as producing an "amino-terminus-deleted" product and a "carboxy-terminus-deleted" product. After a series of nested deletions has been produced in each sub-population, the polynucleotides of the two sub-populations are ligated to produce a mixture of mutated polynucleotides having random insertions (e.g., duplications) and/or deletions at the ligation site.
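The junction formed when molecules from the two sub-populations are ligated can be sketched numerically. The Python function below is a minimal illustration only: it assumes that the carboxy-terminus-deleted molecule contributes the sequence upstream of the junction, that the amino-terminus-deleted molecule contributes the sequence downstream, and that deletion sizes are given in bases. The function name and the coordinates are hypothetical.

```python
def ligation_outcome(length: int, amino_deleted: int, carboxy_deleted: int):
    """Classify the junction of two ligated nested-deletion products.

    Assumed model (not from the source text):
      - the carboxy-terminus-deleted molecule retains [0, length - carboxy_deleted)
      - the amino-terminus-deleted molecule retains [amino_deleted, length)
    Returns a (kind, size_in_bases) tuple.
    """
    if not (0 <= amino_deleted <= length and 0 <= carboxy_deleted <= length):
        raise ValueError("deletions must lie between 0 and the segment length")
    upstream_end = length - carboxy_deleted   # end of the upstream piece
    downstream_start = amino_deleted          # start of the downstream piece
    overlap = upstream_end - downstream_start
    if overlap > 0:
        return ("duplication", overlap)       # shared region appears twice
    if overlap < 0:
        return ("deletion", -overlap)         # region between the pieces is lost
    return ("unchanged", 0)


print(ligation_outcome(1000, amino_deleted=200, carboxy_deleted=700))  # ('duplication', 100)
print(ligation_outcome(1000, amino_deleted=300, carboxy_deleted=800))  # ('deletion', 100)
```

In this model the two deletions sum to less than the segment length when a duplication results, and to more than the segment length when a net deletion results.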
Thus, continuing with the example provided above, and by way of illustration and not limitation, suppose that in each of the sub-populations the deletions range from 1 base to approximately 99% of the length of the polynucleotide (including, for example, deletions of 5%, 10%, 90% and 95%). It will be appreciated that ligation of an amino-terminus-deleted molecule from which exactly 10% of the length of the molecule is deleted to a carboxy-terminus-deleted molecule from which exactly 95% of the length of the molecule is deleted results in a molecule having a 5% deletion (at the ligation junction) compared to the sequence of the substrate polynucleotide, because together the two molecules retain only 95% of the substrate sequence. Likewise, ligation of an amino-terminus-deleted molecule from which exactly 5% of the length of the molecule is deleted to a carboxy-terminus-deleted molecule from which exactly 90% of the length of the molecule is deleted results in a molecule having a 5% duplication (at the ligation junction) compared to the sequence of the substrate polynucleotide, because together the two molecules retain 105% of the substrate sequence. It will be apparent that many variations of this basic scheme are available, including, for example, the introduction of susceptible ends at sites other than those corresponding to the polypeptide termini.

It will be appreciated that the present invention is not limited to any particular method of random insertion or deletion and that methods other than those specifically described supra may be used. For example, self-inserting DNA, i.e., transposons, can be used for in vivo insertion combined with subsequent in vivo excision by mobilization, or in vitro removal by restriction endonucleases.

It will often be desirable, prior to the screening step (infra), to enrich the mutated population(s) for polynucleotides that have been mutated (i.e., by insertion or deletion).
Enrichment is desirable because even efficient methods for insertion and deletion will often result in a mutated population that contains some molecules, or even a substantial proportion of molecules, that are wild-type (i.e., do not contain an insertion or deletion). Using an enrichment step reduces the size of the population that must subsequently be screened. A variety of methods can be used for enrichment. One method, the use of resistance cassettes, is discussed above. Another suitable method for the enrichment of insertion events is carried out by denaturing the DNA of the mutated pool and subsequently annealing it to another aliquot of the inserted DNA, which is immobilized on a solid support. Unbound (e.g., wild-type) polynucleotides are washed off and the mutated molecules are eluted from the affinity matrix (e.g., using temperature, urea, etc.). Another suitable method for enrichment involves inserting an oligo- or polynucleotide containing, in addition to the sequence to be inserted, a second sequence, such as a lac operator site, which is bound by an immobilized sequence-specific DNA-binding protein (for example, the LacI repressor). After washing, the polynucleotides with the insertion can be eluted (for example, in the presence of isopropylthiogalactoside). Subsequently the oligo- or polynucleotide sequence responsible for the binding can be removed from the polynucleotide, if desired, by a variety of methods (some of which are discussed above), leaving behind the sequence to be inserted. It will be apparent from the above description that the practice of the invention involves several techniques known to those skilled in the art of molecular biology.
Sufficient instructions to direct persons of skill through appropriate techniques of cloning, sequence determination, mutation, random recombination and other techniques are found in, for example, Berger and Kimmel, Guide to Molecular Cloning Techniques, METHODS IN ENZYMOLOGY, Volume 152, Academic Press, Inc., San Diego, CA; Sambrook et al. (1989), MOLECULAR CLONING - A LABORATORY MANUAL (2nd ed.), Vol. 1-3; and CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc. (1998 Supplement), and other references cited herein and known in the art.

IV. Screening of a Mutated Population

Another step in the method of the present invention is the screening of a mutated population for a desired property. This results in the identification and isolation of, or enrichment for, DNA segments that acquire the desired property as a result of the mutation (for example, a new property), or in which an existing property is enhanced in a desired manner. As used herein, the term "screen" has its usual meaning in the art and is, in general, a two-step process. In the first step it is determined whether a DNA segment has a particular property, and in the second step the DNA segment(s) with the property are physically separated from those that do not have the property. For convenience, the polynucleotide population resulting from the screening can be referred to as the "selected population". In some forms of screening, identification and physical separation are achieved simultaneously.
For example, the identification and separation of a polynucleotide that confers drug resistance on a cell can be achieved by selection of drug-resistant cells (for example, by culturing them under conditions in which non-resistant cells do not survive). It will be clear from this example that the "separation" step of the screening does not imply or require the isolation of a biochemically pure polynucleotide with the desired property. Rather, separation means that the DNA segment of interest is separated from other DNA segments (e.g., from cells comprising other DNA segments). In some embodiments of the invention, the physical separation of the DNA segments with the property from those without it need not be absolute and, owing to methodological limitations, often is not. Thus, in some embodiments, the screening of the mutated population results in a selected population that is enriched for the DNA segments with the desired property. It will be immediately apparent to those skilled in the art that screening requires an assay to identify DNA segments that have the desired property. It will also be apparent that the specific assay will depend on the particular desired property. A variety of examples are provided below as additional guidance. Numerous additional assays suitable for use in the present invention are described in publications and disclosures describing methods of "DNA shuffling". Thus the reader is referred to the patents, applications and publications listed in Section I, above, in the description of "shuffling", each of which is incorporated herein by reference in its entirety and for all purposes. It will be appreciated, however, that the invention is not limited to any particular screening method.

V. Recursive Mutation and Screening

In one embodiment of the invention, the selected population, generated as described supra, is mutated, i.e., insertions, deletions or both are introduced at random sites in the DNA segments in the selected population. The type of mutation may be the same as or different from the mutations introduced in the substrate population (i.e., the original or first substrate population). For example, in a case in which random insertions were made in the substrate population, insertions can also be introduced in the selected population or, alternatively, deletions can be introduced. Moreover, when insertions are made, the inserted polynucleotide may be the same as or different from the polynucleotide inserted in the previous step. The resulting population of mutated DNA segments may be referred to as a "recursively mutated population", in reference to the fact that the DNA segments have been subjected to more than one cycle of mutation by insertion and/or deletion. The recursively mutated population is then screened for the desired property. The population of DNA segments resulting from this screening is referred to as the "recursively selected population" (i.e., a "first recursively selected population"). The screens used for the "selected population" and the "recursively selected population" can be the same or different. In embodiments in which the same screen is used, the stringency of the screen will be increased to identify DNA segments with increasingly enhanced properties. For example, if the desired property is the ability (of a DNA segment) to confer drug resistance on a cell, the second or subsequent screen may use a higher concentration of the drug than the initial screen (i.e., the screen of the mutated population).
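The cycle of mutation followed by increasingly stringent screening can be sketched as a simple loop. The Python code below is a toy model only: the sequences, the insert pool, the scoring function (a count of a "TATA" motif) and all parameter names are hypothetical stand-ins for a real substrate population and assay, and the code is not taken from the source.

```python
import random


def mutate(seq, rng, insert_pool, max_deletion=6):
    """Introduce one insertion or one deletion at a random site."""
    site = rng.randrange(len(seq) + 1)
    if rng.random() < 0.5:
        # Random insertion of a polynucleotide drawn from the insert pool.
        return seq[:site] + rng.choice(insert_pool) + seq[site:]
    # Random deletion of 1..max_deletion bases starting at the site.
    size = rng.randint(1, max_deletion)
    return seq[:site] + seq[site + size:]


def recursive_mutation_and_screening(substrate, score, thresholds, rng,
                                     insert_pool, variants_per_parent=50,
                                     max_selected=100):
    """Run one mutation/screening cycle per threshold (rising stringency)."""
    selected = list(substrate)
    cycles_completed = 0
    for threshold in thresholds:
        mutated = [mutate(seq, rng, insert_pool)
                   for seq in selected
                   for _ in range(variants_per_parent)]
        survivors = [seq for seq in mutated if score(seq) >= threshold]
        if not survivors:
            break  # nothing passed this screen; keep the previous population
        selected = survivors[:max_selected]
        cycles_completed += 1
    return selected, cycles_completed


def motif_count(seq):
    """Hypothetical assay: number of 'TATA' motifs in the sequence."""
    return seq.count("TATA")


rng = random.Random(0)  # fixed seed so the toy run is repeatable
population, cycles = recursive_mutation_and_screening(
    ["GGGCCCGGGCCC"], motif_count, thresholds=[1, 2, 3], rng=rng,
    insert_pool=["TATA", "GC", "A"])
print(cycles, len(population))
```

Raising the threshold from cycle to cycle plays the role of, for example, a higher drug concentration in each successive screen.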
As another example, if the desired property is the ability of a DNA segment to encode a polypeptide that is bound by a particular antibody, increasingly stringent binding conditions may be employed in the assays. As illustrated in Figure 1, additional cycles of mutation and screening can be carried out, if desired. Generally, from 1 to 50 additional cycles will be carried out, more frequently from about 3 to about 10 additional cycles. In cases in which additional cycles of mutation and screening are carried out, it is convenient to refer to the successive selected populations as the "second recursively selected population", the "third recursively selected population", etc. As is evident, each of the recursively selected populations contains DNA segments with the desired property. Although in some cases the population as a whole will be useful, more often a particular species of DNA segment will be isolated from the population and used.

VI. Mutation of Multiple Substrate Populations and Selection of Recombinants

In a related embodiment of the invention, random insertions or deletions are introduced in two (or more) different substrate populations and sequence elements of each population are combined by directed recombination or random recombination (for example, by shuffling). Typically, different insertion sequences are introduced into each of the substrate populations. One or each of the mutated substrate populations may be subjected to screening or selection for a particular property conferred by the mutation of that population before the recombination of the substrate populations. Whether or not the individual mutated substrate populations are screened, the recombined population is subjected to screening/selection for the desired property or combination of properties. As noted, random recombination methods include DNA shuffling techniques.
Shuffling can be carried out in conjunction with the introduction of point mutations (for example, by error-prone amplification), or without the introduction of point mutations (for example, by the use of proofreading polymerases). In contrast, "directed recombination" or subcloning refers to recombination methods that require knowledge of the restriction map of at least part of each substrate population and result in the insertion of a restriction fragment from one population at a particular restriction site in the second population. Examples include the insertion of particular restriction fragments (by restriction and ligation) or PCR amplicons (usually by ligation or SOE-PCR ["splicing by overlap extension PCR"]) derived from one substrate population at a specific site or location in the second substrate population, and the ligation of two randomly linearized substrate populations.

VII. Random Recombination of the Selected Population

In a different embodiment of the invention, the selected population (described in §III, supra), a recursively selected population (described in §V), or a species of DNA segment isolated from such a population is used as the starting material for methods that give rise to random recombination and point mutation, for example, DNA shuffling. It will be understood that random recombination refers to recombination methods other than the directed exchange of specific defined sequences (e.g., the transfer of a sequence from one population of DNA segments to a second population by restriction and ligation of defined restriction fragments, for example as described in Section VI, supra).
The methods of random recombination depend instead on the generation of a large pool of DNA fragments by random fragmentation of a single DNA sequence, or of a family of related DNA sequences, and the reassembly of the fragments in various combinations to produce DNA segments with a new structure (that is, new combinations of the introduced deletions, insertions and/or point mutations) and with the desired property. Recursive or non-recursive random recombination methods may be used. The term "recursive" in this context refers to the use of multiple cycles of fragmentation, recombination and selection (for example, at least 2, sometimes at least 5 cycles). Typically, when a method of random recombination is applied to a single DNA segment of a selected population, a recursive recombination method will be used (for example, Zhang et al., 1997, Proc. Natl. Acad. Sci. 94: 4504). When a population of different DNA segments is used, both recursive and non-recursive methods (i.e., a single cycle of fragmentation, recombination and selection) are suitable (see Crameri et al., 1998, Nature 391: 288-291).

VIII. Examples of Applications

This section provides several examples to illustrate various uses of the invention. Many other uses and variations will be apparent to one of skill upon reading the present disclosure.

Application Example 1: Change of Promoter Specificity

In one embodiment, the methods of the invention are used to evolve a transcriptional regulatory sequence (e.g., a promoter or enhancer sequence) so that the expression characteristics of the regulatory sequence, such as inducibility, tissue specificity or promoter strength, are changed.
The use of the methods of the invention is particularly powerful for the evolution of regulatory elements, because such elements are typically modular in structure, with different combinations of modules (or differences in relative orientation) contributing to the regulatory activity/function in unpredictable ways. Typically the mutation and selection of a promoter sequence is carried out using a vector (e.g., an expression vector) in which the target promoter is operably linked to a reporter gene (i.e., a gene encoding a gene product that can be conveniently assayed). Many suitable reporter genes are well known in the art, including green fluorescent protein (GFP), luciferase, β-glucuronidase, β-galactosidase and secreted alkaline phosphatase. An advantage of using a promoter-reporter system is that a change in the function of the promoter can be easily detected, facilitating a variety of simple screens. Once the promoter sequence has been evolved by the present method to have the desired property or combination of properties, the promoter region can be cloned into a different vector (e.g., to drive the transcription of a gene of interest other than the reporter gene). Alternatively, the reporter gene sequence can be deleted from the mutated vector and a different gene of interest inserted in its place. Methods for subcloning a promoter or coding sequence into a vector are well known to those skilled in the art (see, for example, Ausubel et al., supra). For example, the mutated promoter can be amplified by the polymerase chain reaction and the amplified sequence cloned into a region upstream of a selected coding sequence.
Thus, in an exemplary embodiment of the invention, (1) the substrate population is a population of DNA segments that have a particular promoter activity (e.g., the ability to direct the transcription of a reporter gene in a hepatocyte-specific manner) and (2) the desired property is a different promoter activity (e.g., the ability to drive expression in T lymphocytes) or a combination of activities (for example, the ability to drive expression in both T lymphocytes and hepatocytes, but not in pancreatic beta cells). The generation of a lymphocyte-specific promoter, for example, can be carried out by mutating a substrate population comprising a hepatocyte promoter operably linked to a GFP reporter gene and carrying out an appropriate screen of the resulting mutated population. The promoter sequences are mutated by random insertion and/or random deletion. As described supra, examples of polynucleotides suitable for insertion include random fragments of known promoters (e.g., a T cell-specific or hepatocyte-specific promoter, the metallothionein promoter, the constitutive adenovirus major late promoter, the dexamethasone-inducible MMTV promoter, the SV40 promoter, the MRP polIII promoter, the constitutive MPSV promoter, the constitutive CMV promoter and promoter-enhancer combinations known in the art), synthetic oligonucleotides that constitute modules of known promoters, random-sequence polynucleotides and other sequences. In embodiments in which there is more than one cycle of mutation, different polynucleotides can be inserted at different stages. For example, the substrate population can be mutated by random insertion of random fragments of an MMTV promoter element and the selected population can be mutated by random insertion of a defined fragment of a metallothionein promoter.

A suitable screen comprises transducing the mutated population of polynucleotides into cultured cells of a particular type (e.g., a Jurkat T lymphocyte cell line), assaying for expression of the reporter gene in the cells (e.g., using fluorescence-activated cell sorting to detect the expression of GFP) and selecting cells in which the reporter gene is expressed. Expression in the Jurkat cell type indicates that the mutated hepatocyte promoter segment has acquired the ability to drive transcription in the second cell type. The mutated DNA segments can then be isolated from the population of transduced cells that show the desired property (e.g., new specificity of expression), pooled (if not already isolated as a pool) and used for additional cycle(s) of random insertion/deletion mutagenesis or random recombination. Subsequent cycles of mutation and selection can be used to evolve a subpopulation with a higher level of GFP expression in Jurkat cells, or to add other elements to the promoter (for example, to confer steroid hormone inducibility).

Additional screens may be carried out, if desired, to identify novel promoters with additional desired characteristics. For example, following or concurrently with a screen for the ability of the mutated DNA segments described above to drive expression in T cells, it may be desirable to transduce the population of DNA segments into hepatocytes and assay for the ability (or lack of ability) to drive transcription in hepatocytes. Using combinations of assays, it is possible to identify novel promoter sequences that, for example, drive expression in T cells and hepatocytes, but not in beta cells. Additional panels of cell types and other variations will be apparent to one of skill upon reading this description. It will be recognized that, in the assays described above, control experiments known to those of skill will usually also be carried out. If desired, the DNA segment having the new transcription specificity can be isolated from the cell for further manipulation (e.g., it can be operably linked to a variety of coding sequences). As will be apparent to those skilled in the art, when the mutation step is carried out on a vector comprising both the promoter and the reporter gene, some of the mutations may disable the function of the reporter gene (for example, by introducing a frame shift). In such a case, the "non-productive mutants" in the mutated population will be eliminated in the selection step.
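The use of a combination of assays to pick out a desired expression profile can be sketched as a simple filter. In the Python example below, the clone names, the reporter readings and the threshold are hypothetical values invented purely for illustration; the sketch assumes that one reporter measurement per clone is available for each cell type in the panel.

```python
def matches_profile(expression, must_express, must_not_express, threshold):
    """Return True if reporter signal is on in every required cell type
    and off in every excluded cell type (hypothetical threshold-based call)."""
    on = all(expression[cell] >= threshold for cell in must_express)
    off = all(expression[cell] < threshold for cell in must_not_express)
    return on and off


# Hypothetical reporter readings (arbitrary fluorescence units) per clone.
readings = {
    "clone_A": {"T cell": 850, "hepatocyte": 40, "beta cell": 30},
    "clone_B": {"T cell": 900, "hepatocyte": 700, "beta cell": 20},
    "clone_C": {"T cell": 950, "hepatocyte": 800, "beta cell": 600},
}

selected = [name for name, expression in readings.items()
            if matches_profile(expression,
                               must_express=["T cell", "hepatocyte"],
                               must_not_express=["beta cell"],
                               threshold=100)]
print(selected)  # ['clone_B']
```

Extending the panel amounts to adding cell types to either list; in practice the on/off call would come from the assay and its controls rather than a single fixed threshold.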

Alternatively, the mutation steps can be carried out on a vector containing the promoter only and, following the mutation, the promoter sequences can be transferred as a cassette (for example, by restriction and ligation and/or by PCR amplification of the promoter sequence and insertion of the product) into an unmutated vector comprising a reporter gene. A variety of strategies will be apparent to one of skill following the guidance in this description.

Application Example 2: Change of an Enzyme Activity

In some embodiments of the invention, the substrate population is a population of DNA segments that encode a polypeptide with an enzymatic activity and the desired property is a new enzymatic activity. In one embodiment, the substrate DNA segments encode a polypeptide with β-galactosidase activity and the desired new enzymatic specificity is fucosidase activity. Recursive cycles of mutation by alternating deletions (5-20 base pairs) and insertions (from a library of random hexamers) can be combined with a selection as described in Zhang et al., 1997, Proc. Natl. Acad. Sci. 94: 4504. As noted above, in cases where DNAs encoding proteins are mutated, it will often be desirable to use mutation methods that retain the existing reading frame (e.g., deletion and/or insertion of a multiple of 3 nucleotide bases), although, if desired, non-functional frame-shift mutants can be eliminated during the selection step.

Application Example 3: Change of a Property of an Encoded RNA

The methods of the invention can be used to evolve a regulatory element (or other region) of an RNA encoded by the DNA segment. For example, RNA stability elements are known that confer increased stability on mRNAs with which they are physically associated (e.g., encoded downstream of the protein-coding sequence).
Thus, in one embodiment of the invention, the substrate population is a population of DNA segments that encode mRNA and the desired property is increased stability of the mRNA. The evolution of an mRNA-encoding sequence to encode a more stable RNA is achieved by randomly inserting DNA sequences into a substrate population that encodes an mRNA and selecting or screening for high levels of protein expression (because, in general, the expression of the protein product of the gene is proportional to the stability of the mRNA) or directly assaying the level of the mRNA. In one embodiment, the inserted sequences are fragments (e.g., defined or random fragments) of DNA sequences of known stability elements (Chan et al., 1998, Proc. Natl. Acad. Sci. 95: 6543-6547; Russell et al., 1998, Mol. Cell. Biol. 18: 2173-2183). In one embodiment, increased expression of the gene in the mutated population is detected and the resulting set of clones (or pools of 2-20 clones having the highest mRNA stability), i.e., the selected population, is used in shuffling or as a substrate population for additional mutations. The additional mutation may include the insertion of additional downstream fragments that confer mRNA stability (the same as or different from those inserted in earlier steps), deletion and selection for increased mRNA stability, or the insertion of different sequences (for example, to confer a different selectable property on the DNA segment encoding the RNA).

Application Example 4: Adding a Functional Domain to a Cloning or Expression Vector

In this example, the DNA segments of the substrate population are cloning vectors, which can be prokaryotic, eukaryotic or shuttle vectors and can be characterized vectors (for example, pUC18) or uncharacterized vectors.
Examples of vectors include artificial chromosomes, plasmids, episomes, viruses, bacteriophages and mobile elements (e.g., transposons, insertion elements). It is often desirable to add a new domain or functional element to a vector by inserting a cassette that encodes a polypeptide (for example, a resistance marker or a new gene of interest), a regulatory element, combinations of genes and regulatory elements, or other functional or structural elements. However, the optimal location for insertion is often not known. It is especially difficult to design vectors with particular or optimal properties when the vectors are complex (for example, human papilloma virus and other eukaryotic viruses) or intended for use in relatively uncharacterized species of fungi, plants, bacteria (for example, Streptomycetes), etc. By inserting the functional domain, or a fragment thereof, in a random manner, screening the resulting mutant population and optimizing the desired property or properties by recursive insertion/deletion mutation (and, optionally, shuffling), it is possible to efficiently generate vectors with novel and optimized properties. In one embodiment, an expression cassette (e.g., GFP under the control of the E. coli lac promoter) is inserted at random positions into a randomly linearized mixture of vectors (e.g., a pool of pUC19, pET11, pBR322 and pBAD24). Following transformation into host cells (e.g., E. coli), the expression of the protein is assayed (for example, by its activity, such as green fluorescence for GFP) and the clones expressing the highest levels of the reporter gene when induced by IPTG or arabinose are identified and isolated (see, for example, Crameri et al., 1996, Nature Biotech. 14: 315-319). DNA shuffling and an additional selection are then carried out.
The resulting product is a vector comprising the structural gene for GFP located in the backbone of the particular vector, at a position that provides the best expression properties for the protein.

Application Example 5: Construction of an Operon that Confers a Multigenic Phenotype on Cells

In another example, the methods of the invention are used to generate a bacterial operon that comprises several coding sequences (for example, genes encoding proteins active in a particular metabolic pathway). Thus, in one embodiment, the coding sequences for each of the polypeptides (e.g., enzymes) to be expressed are inserted in a stepwise manner (e.g., as outlined in Figure 1) into a vector comprising one or more promoters capable of driving the transcription of the polypeptide-encoding sequences. After each insertion step, a selection is made to identify cells that optimally express the phenotype conferred by the inserted polypeptide(s). The resulting multigenic operon comprises each of the polypeptide-encoding sequences located relative to each other, to regulatory elements and to other vector elements at positions that result in optimal expression (or other selected properties).

Application Example 6: Insertion of an Affinity-Selectable Fragment into a Polypeptide

In another example, a cassette encoding an affinity-selectable fragment is randomly inserted into a substrate population of DNA segments comprising a sequence encoding a polypeptide, resulting in mutant polypeptides that retain biological activity and have acquired the ability to be selected by affinity. The addition of an affinity-selectable fragment to a biologically active protein is useful, for example, for protein purification. Examples of sequences that can be randomly inserted into the polypeptide-encoding sequence of the substrate population include polynucleotides that encode affinity-selectable oligo- or polypeptide sequences (e.g., peptide epitopes recognized by an immunoglobulin), antibody fragments (for example, Vaughan et al., 1996, Nat. Biotech. 14: 309-314) and others well known in the art. Following the insertion, the mutated population is screened and/or selected by a combination of assays: typically one assay identifies mutant polypeptides that include the affinity-selectable sequence and a second assay identifies polypeptides that have a second biological property (such as catalytic activity as an enzyme). The screening for affinity (affinity selection) can be carried out by any suitable method, such as affinity chromatography, immunoprecipitation, etc. In some embodiments, a phage display system is used for affinity enrichment. In such systems, the encoded oligo- or polypeptide is displayed on the surface of a cell, virus or bacteriophage, where it is accessible to binding by the affinity partner (see, for example, Ernst et al., 1998, Nucleic Acids Res. 26: 1718-1723; and U.S. Patent Nos. 5,223,409 and 5,403,484).

Application Example 7: Production of Protein Vaccines

The production of protein vaccines is very often limited by inefficient expression of the antigenic protein or inefficient processing of the antigen for presentation on MHC complexes. This can be overcome by inserting one or several epitope sequences of the antigen into a well-expressed or efficiently processed protein. Thus, in one approach, multiple T cell and/or B cell epitopes are inserted into a known "scaffold" protein. In one embodiment, the present invention is used to produce effective vaccines by the insertion of immunodominant T cell and B cell epitopes of an immunogenic protein into the scaffold of a highly expressible protein. In an exemplary embodiment, a known B cell epitope of HIV gp120 is inserted into a human scFv protein (Vaughan et al., 1996, Nature Biotechnology 14: 309-314) and expressed in E. coli.
The presence of the B cell epitope in the chimeric protein is screened for as described in copending USSN 09/021,769 and 60/074,294. Positive clones (i.e., the selected population) are pooled, and all positive clones are used for the next cycle of insertion of additional B cell epitopes and/or T cell epitopes. DNA redistribution (shuffling) is carried out using DNA from individual clones. The resulting polypeptide comprises multiple well-expressed and well-processed immunogenic peptides and is useful as a vaccine.

IX. EXAMPLES

The following examples are provided to illustrate the practice of the invention.

EXAMPLE I
Synthesis of a Bacterial Vector Containing a New Regulatory Promoter

This example demonstrates the use of the invention to produce a vector with novel properties. Starting with a known vector (pAK400-GFP) capable of expressing the green fluorescent protein (GFP), a process is used that includes two cycles of random insertion/deletion mutation and selection or screening to produce a panel of novel vectors. The new vectors have new desired properties (compared to the parent vector) with respect to tetracycline resistance, inducibility and GFP expression levels.

A) Synthesis of Randomly Linearized pAK400-GFP

The parent vector pAK400-GFP is based on the vector pAK400 (Krebber et al., 1997, J. Immunol. Meth. 201: 35-55), but is modified by replacement of sequences that encode the tetR gene (tetracycline resistance) with the coding sequence for the green fluorescent protein (GFP). To construct pAK400-GFP, GFP is amplified by PCR using the primers "GFP.For" and "GFP.Rev" from pBADGFP cycle 3 (Crameri et al., 1996, Nature Biotech. 14: 315-319) and cloned via NdeI and HindIII, in a three-fragment ligation, into an NdeI/HindIII vector fragment of pAK400, resulting in "pAK400-GFP". In pAK400-GFP, the expression of GFP is under the control of the lac promoter and is induced by isopropylthiogalactoside (IPTG).
The vector also contains a pUC-derived ColE1 origin of replication for E. coli, a lacI gene for expression of the lac repressor (to repress the lac promoter efficiently), an f1 origin for packaging of single-stranded DNA into phagemids, and the gene for chloramphenicol acetyl transferase, which confers resistance to chloramphenicol (CamR). Supercoiled pAK400-GFP is prepared from E. coli by equilibrium centrifugation in CsCl/ethidium bromide according to standard procedures (for example, Sambrook et al., supra). The vector is linearized by random cleavage by treatment with DNase I in the presence of ethidium bromide, as described in Chaudry et al., Nucleic Acids Res. 23: 3805-3809. Following extraction with phenol/chloroform, the vector, which now carries a single random nick, is treated with S1 nuclease at low pH to cleave at the site opposite the single-strand nick (Chaudry et al., supra). The randomly linearized vector is extracted using phenol/chloroform, precipitated and treated with a polymerase (to ensure that the DNA has blunt ends) and alkaline phosphatase (to dephosphorylate the linearized molecules and so prevent self-ligation). Finally, the linearized (i.e., once-cleaved) molecule is purified on a 5% acrylamide gel or by equilibrium centrifugation in CsCl/ethidium bromide (Sambrook et al., supra).

B) Synthesis of tetR Polynucleotides for Random Insertion

The tetRA operon containing the tetR gene (tetracycline resistance) of Tn10 (Schollmeier et al., 1984, J. Bacteriol. 160: 499-503) is amplified by PCR from pAK400 (Krebber et al., J. Immunol. Meth. 201: 35-55) using the phosphorylated primers Tet.For and Tet.Rev and a proofreading polymerase (Pfu; Stratagene).

C) Random Insertion of the tet Operon into pAK400-GFP

The blunt-ended products of (A) and (B) above were ligated to each other according to standard procedures (Sambrook et al., supra).
D) Selection for Resistance to Tetracycline and Chloramphenicol and Screening for Inducibility of GFP by IPTG

The ligation reaction of step (C) was transformed into a strain of E. coli K12. The transformed cells were plated on LB agar containing chloramphenicol, tetracycline and IPTG ("IPTG plates"). After growth overnight at 37 °C, colonies were selected on the basis of green fluorescence upon exposure to UV light (Crameri et al., 1996, Nature Biotech. 14: 315-319), which indicated GFP expression. Colonies expressing GFP were replica-plated onto agar plates containing chloramphenicol, tetracycline and 2% glucose ("glucose plates") and tested for GFP expression (by inspection under UV irradiation). DNA was prepared from 100 colonies that expressed GFP on IPTG plates (the initial plating) but not on glucose plates (the replica plating). These DNA segments comprised a population of vectors that differ in the position of the tetRA operon and share the phenotype CamR, TetR, IPTG-inducible expression of GFP (i.e., an IPTG-inducible promoter). The vectors in this population can be referred to as pAK400-GFP-Tet. As noted above, the tetR gene is inserted at different positions in different species in the population.

E) Synthesis of Double-Stranded Oligonucleotides of the Tn10 tet Regulatory Unit

Non-phosphorylated double-stranded oligonucleotides (the pairs Op1.For/Op1.Rev and Op2.For/Op2.Rev) encoding the two operators of the Tn10 promoter (Bertrand et al., 1983, Gene 23: 149-156) were chemically synthesized. Together the two oligonucleotides are referred to as the "tet oligonucleotides".

F) Ligation of the tet Oligonucleotides into the Linearized Vector pAK400-GFP and Transfer of the Promoter Region into pAK400-GFP-Tet

In this and the following steps, the tet oligonucleotides are randomly inserted into linearized pAK400 vector (linearized as described for the pAK400-GFP vector in step A, supra, but not dephosphorylated) to produce a population of pAK400 vectors containing random insertions of the oligonucleotides. Subsequently, the mutated promoter region(s) of the population (containing insertions) were transferred to the pAK400-GFP-Tet vector population made in step D, supra. (An alternative strategy would be to insert randomly into the pAK400-GFP-Tet vector population. The strategy used is preferred because it requires screening fewer clones, i.e., only clones in which the tet oligonucleotides have been inserted at random sites within the lac promoter region rather than elsewhere in the vector.)

As a first step, the concentration of double-stranded tet oligonucleotides was optimized by ligating different amounts of oligonucleotide into the randomly linearized vector, followed by transformation into an appropriate strain of E. coli K12. After growth overnight at 37 °C, the colonies were counted. The optimal concentration of oligonucleotide is the concentration that just decreased the number of colonies. Although optimizing the oligonucleotide concentration increases efficiency, this step is not critical.

Having determined the optimal concentration of oligonucleotide for insertion into the randomly linearized pAK400 (above), the double-stranded tet oligonucleotides encoding parts of the tet promoter region were inserted into the randomly linearized pAK400 vector by blunt-end ligation. After extraction with phenol/chloroform, the resulting ligation product was cut with KpnI and NdeI at unique sites flanking the lac promoter of pAK400.
The resulting fragments containing the lac promoter and a tet promoter oligonucleotide were isolated by electrophoresis on a non-denaturing 8% acrylamide gel (Sambrook et al., supra). The KpnI-NdeI fragment of pAK400 is 209 bp. When an oligonucleotide of 20 base pairs is inserted, the lac promoter fragment increases in size to 229 bp. Consequently, the 229 bp band is isolated from the non-denaturing gel. The isolated fragment is cloned (ligated) into the pAK400-GFP-Tet vector pool, which has been digested with KpnI and NdeI. The result is that some (though usually not all) of the resulting ligation products will comprise a randomly mutated lac promoter (i.e., one that contains random insertions of the tet promoter oligonucleotide) in a pAK400-GFP vector that is also randomly mutated (i.e., by random insertion of the tetRA operon).

G) Selection for Resistance to tet and cam and Screening for Inducibility of GFP by IPTG and/or Tetracycline

The ligation of step (F) was transformed into an appropriate strain of E. coli K12. The transformation was plated on agar plates containing 30 μg/ml chloramphenicol, 5 μg/ml tetracycline and 2% glucose. The colonies were grown overnight at 37 °C. The recombinants were screened to identify vectors having different promoters. Expression of GFP in the presence and absence of IPTG and/or tetracycline was determined as described infra. Colonies resistant to chloramphenicol and tetracycline were selected by growth in the presence of these two antibiotics. The resistant colonies were replica-plated onto four different plates. All plates contained chloramphenicol (to select for the CamR of the pAK400 vector backbone). Plate 2 additionally contained IPTG, Plate 3 additionally contained tetracycline and Plate 4 additionally contained tetracycline and IPTG.
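The four-plate replica scheme can be read as a small truth table: each plate adds neither, one or both of IPTG and tetracycline, and a colony's fluorescence pattern across the plates indicates which compound affects GFP expression. The following Python sketch illustrates that reading only. The plate numbering follows the text, with Plate 1 assumed to contain chloramphenicol alone; the function names and category labels are hypothetical and are not part of the described procedure.

```python
from typing import Dict, FrozenSet

# Additives on each replica plate, beyond the chloramphenicol present on all plates.
# Plate 1 (chloramphenicol only) is an assumption; the text names Plates 2 to 4.
PLATES: Dict[int, FrozenSet[str]] = {
    1: frozenset(),
    2: frozenset({"IPTG"}),
    3: frozenset({"Tet"}),
    4: frozenset({"IPTG", "Tet"}),
}


def responds_to(additive: str, gfp: Dict[int, bool]) -> bool:
    """True if GFP differs between any two plates that differ only in `additive`."""
    for plate, additives in PLATES.items():
        if additive in additives:
            continue
        # Find the plate with the same additives plus the one being tested.
        partner = next(p for p, a in PLATES.items() if a == additives | {additive})
        if gfp[plate] != gfp[partner]:
            return True
    return False


def classify(gfp: Dict[int, bool]) -> str:
    """Assign a hypothetical regulation label from fluorescence on Plates 1 to 4."""
    iptg = responds_to("IPTG", gfp)
    tet = responds_to("Tet", gfp)
    if iptg and tet:
        return "regulated by IPTG and tetracycline"
    if tet:
        return "regulated by tetracycline"
    if iptg:
        return "regulated by IPTG (parent-like)"
    return "constitutive" if all(gfp.values()) else "no expression"


if __name__ == "__main__":
    # A colony fluorescing only where tetracycline is present.
    print(classify({1: False, 2: False, 3: True, 4: True}))
```

A colony that fluoresces only on Plates 2 and 4 behaves like the parent vector, whereas any pattern that changes with tetracycline points to a regulatory function absent from the parent.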
The expression of the GFP reporter gene by the colonies was detected by visual or electronic observation of the green fluorescence of the colonies exposed to UV light (Crameri et al., 1996, Nature Biotech. 14: 315-319). Colonies that expressed GFP on one plate and not on one of the others were regulated by IPTG and/or tetracycline. Compared to the parent vector (which is regulated exclusively by the presence or absence of IPTG), colonies in which the expression of GFP is either increased or decreased by the presence or absence of tetracycline have a regulatory function not present in the parent. This screen is able to identify populations of vectors with new phenotypes, i.e., CamR, TetR and GFP expression under different combinations of tetracycline and IPTG. The described properties of these vectors can be further improved by additional insertion cycles, additional deletion cycles, or by redistribution, using the same screen described above (for example, testing for increased levels of GFP expression) or other selections.

EXAMPLE II
Production of a β-lactamase Containing an In Vivo Biotinylation Peptide

This example demonstrates the generation of a high-activity beta-lactamase polypeptide containing an in vivo biotinylation sequence. The beta-lactamase gene is able to confer resistance to ampicillin when expressed in a bacterium; the biotinylation sequence can be used to detect or purify a polypeptide comprising the high-activity beta-lactamase. This example is illustrative of the creation of a novel multifunctional polypeptide using the techniques of the invention.

A) The bla gene (which encodes beta-lactamase) is amplified by PCR from pUC19 using the primers Bla.For and Bla.Rev and subsequently cloned into the SfiI restriction site of pAK200 (Krebber et al., 1997, J. Immunol. Meth. 201: 35-55).

The resulting vector, pAK200SAMP, is randomly linearized (but not phosphorylated) as described in Example I, supra. A 90 bp double-stranded polydeoxyribonucleotide is generated by annealing the 90-mers Bio.Rev and Bio.For (which encode a polypeptide having the sequence of an in vivo biotinylation site (Schatz, 1993, Bio/Technology 11: 1138-1143)), added in excess and ligated to the randomly linearized pAK200SAMP vector at random positions. The in vivo biotinylation site becomes biotinylated when the protein is expressed in strains of E. coli expressing the endogenous biotin holoenzyme synthetase encoded by birA (Barker et al., 1981, J. Mol. Biol. 146: 451-467). The pAK200SAMP vector is cleaved with SfiI. The fragment containing the bla gene and a 90 bp insert is identified by size and gel-purified by standard methods. The fragment including the biotinylation sequence is about 896 bp (compared to about 806 bp without the insert). The purified fragments were cloned into the SfiI site of the phage display vector pAK200 (Krebber et al., 1997, supra). After transformation of the phagemid library, the bacteria are spread on 2YT agar plates containing 30 μg/ml of chloramphenicol and a concentration of ampicillin that reduces the recovery of transformants to 50% of the measured complexity (the measured complexity is evaluated by plating on 2YT agar containing 30 μg/ml chloramphenicol, hereinafter "2YT-Cam30" plates). After growth overnight at 30 °C, the plates are scraped and the cells resuspended in 2YT. An aliquot is added to 100 ml of 2YT-Cam30 containing the concentration of ampicillin determined above. After coinfection with VCSM13 (Stratagene) according to Krebber et al., 1997, supra, and growth, the phages are precipitated and panned in PBS/2% dialysed skimmed milk for two to four cycles against streptavidin (Hawkins et al., 1992, J. Mol. Biol. 226: 889-896) immobilized on magnetic beads (Dynal).
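The size-based identification of insert-bearing fragments used in this example (about 896 bp with the 90 bp insert versus about 806 bp without) and in Example I (229 bp versus 209 bp) is simple addition, and the 90-mer can also be checked for being a whole number of codons without a stop codon. The Python sketch below illustrates both checks under stated assumptions: the Bio.For sequence is copied from Table 1, the helper names are hypothetical, and the frame check reads the insert in its own first frame only, which says nothing about where a random insertion actually lands in the bla reading frame.

```python
# Bio.For 90-mer as listed in Table 1 (the two printed halves joined).
BIO_FOR = (
    "GGTTCTGAAGGTGGTGGTTCTGCTCAGCGTCTGTTCCACATCCTGG"
    "ACGCTCAGAAAATCGAATGGCACGGTCCGAAAGGTGGTTCTGGT"
)

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expected_size(parent_bp: int, insert_bp: int) -> int:
    """Fragment length expected on the gel after a single insertion."""
    return parent_bp + insert_bp


def translate(dna: str) -> str:
    """Translate complete codons of `dna` in frame 1 ('*' marks a stop codon)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_whole_open_insert(insert: str) -> bool:
    """True if the insert is a multiple of 3 nt and has no stop codon in frame 1."""
    return len(insert) % 3 == 0 and "*" not in translate(insert)


if __name__ == "__main__":
    print(expected_size(806, len(BIO_FOR)))  # bla fragment plus biotinylation insert
    print(expected_size(209, 20))            # lac promoter fragment plus tet operator
    print(translate(BIO_FOR))
    print(is_whole_open_insert(BIO_FOR))
```

Because the insert is a multiple of three nucleotides, an insertion between two codons would leave the downstream bla frame unchanged; insertions at other positions are sorted out by the ampicillin selection and streptavidin binding steps described in the text.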
The binding of individual clones to streptavidin is verified by phage ELISA (Lindner et al., 1997, Biotechniques 22: 140-49). These clones (which are heterogeneous) are referred to as "pAK200-bla-bio". The combination of selection on ampicillin plates and the panning procedure identifies polynucleotides that encode an active beta-lactamase containing a biotinylation sequence.

B) The expression and activity of the beta-lactamase of pAK200-bla-bio produced in Section A, supra, are optimized by PCR redistribution (Stemmer, 1994, Nature 370: 389-391). To do this, five to ten pAK200-bla-bio species (clones) are selected based on comparatively high beta-lactamase activity (assessed by the ability to confer resistance to high concentrations of ampicillin on host bacteria). The bla-bio insert is amplified by PCR using the primers Bla.For and Bla.Rev. According to a standard PCR redistribution protocol (Stemmer, 1994, Nature, supra), the PCR products are randomly fragmented by DNase I, reassembled and cloned into the SfiI sites of pAK200SAMP. The library is grown overnight at 30 °C on 2YT agar containing 30 μg/ml of chloramphenicol and a concentration of ampicillin (the "limiting" concentration) that reduces the recovery of transformants to 25% of the complexity measured when grown on plates lacking ampicillin. As described supra, the library is scraped from the plates, grown in the presence of the limiting concentration of ampicillin and co-infected with helper phage (supra) to produce phage particles that display the bla-bio insertion fusions. These phage particles are again panned against streptavidin beads (supra). Additional cycles of redistribution are carried out using selection conditions in which the concentration of ampicillin is increased and the temperatures for growth, selection and panning are increased to 37 °C. This allows further optimization of the bla-bio insertion fusions with respect to activity, level of biotinylation, folding and stability. The fusion(s) with optimal activity may be used for the quantitation of streptavidin, for example by measuring the activity of the beta-lactamase in a sandwich ELISA.

Table 1. Primers, Oligonucleotides, Polynucleotides

GFP.For   AAGGAGATATACATATGGCTAGCAAAGGAGAAG
GFP.Rev   TTCACAGGTCAAGCTTCATTATTTGTAGAGCTCATC
Tet.For   TTAAGACCCACTTTCACATTTAAG
Tet.Rev   CTAAGCACTTGTCTCCTGTTTAC
Op1.For   CACTCTATCATTGATAGAGT
Op1.Rev   ACTCTATCAATGATAGAGTG
Op2.For   TCCCTATCAGTGATAGAGAA
Op2.Rev   TTCTCTATCACTGATAGGGA
Bla.For   TATTACTCGCGGCCCAGCCGGCCTTTGCTCACCCAGAAAC
Bla.Rev   TAGAATTCGGCCCCCGAGGCCAATGCTTAATCAGTGA
Bio.For   GGTTCTGAAGGTGGTGGTTCTGCTCAGCGTCTGTTCCACATCCTGGACGCTCAGAAAATCGAATGGCACGGTCCGAAAGGTGGTTCTGGT
Bio.Rev   ACCAGAACCACCTTTCGGACCGTGCCATTCGATTTTCTGAGCGTCCAGGATGTGGAACAGACGCTGAGCAGAACCACCACCTTCAGAACC

* * *

Many modifications and variations of this invention can be made without departing from its spirit and scope, as will be apparent to those skilled in the art. The specific embodiments described herein are offered by way of example only, and the invention is to be limited only by the terms of the appended claims, together with the full scope of equivalents to which such claims are entitled. All references cited herein are incorporated herein by reference in their entirety and for all purposes to the same extent as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference in its entirety for all purposes.
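The double-stranded oligonucleotides used in Examples I and II are formed by annealing a forward and a reverse strand, so each pair listed in Table 1 should consist of exact reverse complements of equal length, which gives blunt-ended duplexes. The Python sketch below is one way to check this; the sequences are copied from Table 1 and the pairings (Op1, Op2, Bio) follow the names used in the examples, while the function names are hypothetical.

```python
# Map each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_blunt(forward: str, reverse: str) -> bool:
    """True if the two strands pair over their full length with no overhang."""
    return len(forward) == len(reverse) and reverse_complement(forward) == reverse


# Forward and reverse strands as listed in Table 1.
PAIRS = {
    "Op1": ("CACTCTATCATTGATAGAGT", "ACTCTATCAATGATAGAGTG"),
    "Op2": ("TCCCTATCAGTGATAGAGAA", "TTCTCTATCACTGATAGGGA"),
    "Bio": (
        "GGTTCTGAAGGTGGTGGTTCTGCTCAGCGTCTGTTCCACATCCTGG"
        "ACGCTCAGAAAATCGAATGGCACGGTCCGAAAGGTGGTTCTGGT",
        "ACCAGAACCACCTTTCGGACCGTGCCATTCGATTTTCTGAGCGTCC"
        "AGGATGTGGAACAGACGCTGAGCAGAACCACCACCTTCAGAACC",
    ),
}

if __name__ == "__main__":
    for name, (fwd, rev) in PAIRS.items():
        print(name, len(fwd), anneals_blunt(fwd, rev))
```

The GFP, Tet and Bla entries are PCR primers that bind opposite ends of their templates, so they are not expected to be complementary to each other and are left out of the check.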

Claims

1. A method for producing a recombinant polynucleotide having a desired functional property, the method comprising: a) mutating a first substrate population of polynucleotides encoding a polypeptide by making deletions at a first nuclease-susceptible terminus using an exonuclease, whereby a first population of nucleic acid segments with deletions is produced; b) mutating a second substrate population of polynucleotides encoding a polypeptide by making deletions at a second nuclease-susceptible terminus using an exonuclease, whereby a second population of polynucleotide segments with deletions is produced; c) recombining the polynucleotide segments with deletions produced in (a) with the polynucleotide segments with deletions produced in (b) to produce a mixture of recombinant polynucleotides; and d) screening the mixture of recombinant polynucleotides to identify at least one recombinant polynucleotide with the desired functional property.

2. The method of claim 1, further comprising the step of subjecting one or more recombinant polynucleotides identified in step (d) to DNA redistribution.

3. The method of claim 1, wherein the screening in step (d) is for polynucleotides that encode a polypeptide having a functional activity selected from the group consisting of: a) an enzymatic activity; b) a binding activity; and c) stability to denaturation.

4. The method of claim 3, wherein the activity is an enzymatic activity.

5. The method of claim 1, wherein the substrate populations of polynucleotides comprise a plurality of polynucleotides having the same sequence.

6. The method of claim 1, wherein the substrate populations comprise a mixture of polynucleotides having related sequences.

7. The method of claim 6, wherein the related sequences comprise a family of related genes.

8. The method of claim 6, wherein the related sequences comprise homologous genes of different species.

9. The method of claim 1, wherein the recombinant polynucleotides of step (d) are subjected to at least one additional cycle of mutagenesis and screening.

10. The method of claim 9, wherein the at least one additional cycle of mutagenesis and screening comprises conducting a polynucleotide amplification process on overlapping segments of the recombinant polynucleotides of step (d) under conditions whereby one segment serves as a template for the extension of another segment, thereby generating an additional population of recombinant polynucleotides, and screening the additional population of recombinant polynucleotides for a desired property.

11. The method of claim 1, further comprising isolation of, or enrichment for, at least one selected recombinant polynucleotide sequence that has the desired functional property.