WO2008076368A2 - Fragment-rearranged nucleic acids and uses thereof - Google Patents

Fragment-rearranged nucleic acids and uses thereof

Info

Publication number
WO2008076368A2
Authority
WO
WIPO (PCT)
Prior art keywords
protein
nucleic acid
variants
exons
relative order
Prior art date
Application number
PCT/US2007/025632
Other languages
French (fr)
Other versions
WO2008076368A3 (en)
Inventor
Noubar Afeyan
Original Assignee
Codon Devices, Inc.
Priority date
Filing date
Publication date
Application filed by Codon Devices, Inc.
Publication of WO2008076368A2
Publication of WO2008076368A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1082 Preparation or screening gene libraries by chromosomal integration of polynucleotide sequences, HR-, site-specific-recombination, transposons, viral vectors

Definitions

  • the invention relates to methods for identifying engineered nucleic acids that encode polypeptide variants having one or more functional and/or structural properties of interest.
  • Natural and recombinant protein products have been developed for a wide range of medical, industrial, and agricultural applications. However, methods for identifying new or improved protein products have typically involved mutagenesis or other methods for generating large numbers of random protein variants that can be sampled by selecting or screening for one or more properties of interest. Large numbers of protein variants are required, because only rare variants have desirable structural or functional properties. Most random variants are either non-functional polypeptides or have unwanted side-effects.
  • nucleic acid libraries that express many different protein-encoding sequences.
  • a variety of mutagenesis and molecular biology techniques have been used to prepare the nucleic acid libraries.
  • Different forms of chemical, physical, and/or biological mutagenesis can be used to alter an initial protein-encoding nucleic acid and introduce a range of sequence changes that will encode different polypeptide variants.
  • the number of changes that can be introduced into a starting nucleic acid can be controlled by altering the levels of mutagenesis.
  • Different techniques may be biased towards certain types of mutations. For example, different chemical and physical mutagenesis techniques can result in preferential patterns of nucleotide insertions, deletions, transitions, and/or transversions.
  • different host cells can be used to introduce certain types of mutations into the sequence of a starting nucleic acid.
  • the resulting mutation patterns remain substantially random, and many different random nucleic acids can be generated with varying degrees of sequence similarity relative to the initial protein-encoding nucleic acid.
  • Molecular biology techniques also have been used to modify protein-encoding nucleic acid sequences and introduce mutations in relatively random patterns so that large numbers of sequence variants can be sampled.
  • error-prone nucleic acid synthesis or amplification (e.g., error-prone PCR)
  • low-fidelity polymerases or low-fidelity polymerization conditions can be used to introduce mutations into nucleic acids that are synthesized from an initial protein-encoding template.
  • random nucleotides can be introduced during the chemical synthesis of degenerate oligonucleotides that are designed to be incorporated into protein-encoding nucleic acids.
  • the resulting libraries include relatively random sequences and large numbers of variants need to be sampled to identify candidate polypeptides having one or more properties of interest. As explained above, many of the random variants may encode non-functional proteins or proteins with unwanted side-effects.
  • proteins with altered functions have been generated by making specific sequence changes in a gene or by targeting mutagenesis to a specific sequence region of a gene. For example, specific mutations at predetermined positions in a gene can be made and tested if specific amino acid changes have been identified as candidates for improved or altered protein function. Similarly, regions known to be associated with one or more protein functions can be targeted for mutagenesis in order to identify variants with altered properties. However, it is often difficult to predict which mutations should be made or which regions should be targeted in order to obtain novel or improved protein properties. As computer modeling techniques evolve, new and improved protein variants may be developed in silico based on predictable properties of amino acid sequence changes. However, variant libraries remain potentially useful sources for identifying polypeptides having one or more properties of interest.
  • libraries have been made based on sequence homology information. These libraries sample a smaller number of different sequences, but the variants are expected to have a higher probability of being functional since their sequences are based on combinations of known functional sequence variants. For example, libraries based on domain or exon shuffling have been made by replacing one or more domains or exons of a first gene with one or more homologous domains or exons from a second gene of the same species or with one or more equivalent domains or exons from the same gene of a related species.
  • variant proteins incorporate different combinations of sequences from homologous proteins.
  • Homologous genes are fragmented to generate overlapping fragments and the fragments are mixed and reassembled. Fragment reassembly is based on recombination/hybridization between regions of complementary or partially complementary sequences of overlapping fragments. The reassembled gene variants have the same relative sequence order as the initial genes, but incorporate different combinations of homologous sequences from the homologous genes.
  • aspects of the invention relate to nucleic acid and polypeptide libraries and methods for sampling a defined sequence space that represents predetermined rearrangements of one or more known nucleic acid and/or protein sequences.
  • By assaying systematic rearrangements of an initial nucleic acid or protein sequence, new classes of therapeutic products with improved and/or novel properties can be identified based on a starting molecule that may have one or more properties of interest.
  • the invention provides methods for selecting gene fragments and patterns of gene fragment rearrangements that can be used to generate variant libraries.
  • aspects of the invention also may help identify novel gene variants that are useful for industrial, agricultural, environmental, research, and other applications in addition to novel therapeutic products.
  • libraries may be designed based on information obtained by analyzing naturally-occurring splice variants of one or more genes of interest.
  • a library may be designed and/or assembled to express different natural splice variants of a gene of interest.
  • a library may express synthetic splice variants or a combination of natural and synthetic splice variants. As additional natural splice variants are identified or predicted, they may be included in expression libraries in order to screen their functional properties and/or compare them to the properties of previously known variants.
  • a library of natural splice variants may contain one or more variants that have functional and/or structural properties of interest.
  • unnatural or synthetic splice variants also may confer functional and/or structural properties of interest.
  • a library may include natural or unnatural splice variants, or a combination thereof.
  • a library of candidate therapeutic protein variants may be based on different configurations of gene fragments (e.g., different relative orders of fragments and/or different subsets of fragments) from a single gene encoding a known therapeutic protein or candidate therapeutic protein. For example, different subsets of exons and/or different relative orders of exons from an initial gene may be assembled to produce a library of protein variants.
  • the initial protein may have one or more properties of interest. For example, it may be non-immunogenic.
  • sequence variants can be made without random or targeted mutagenesis thereby avoiding new amino acid sequences that could be immunogenic.
  • sequence variants can be made without introducing new sequences from homologous genes that also could generate unwanted immunogenicity.
  • one aspect of the invention provides libraries and methods for designing and assembling libraries based on reordered fragments from a predetermined gene or protein sequence.
  • the invention provides libraries and methods for designing and assembling libraries based on subsets of fragments (i.e., by omitting one or more fragments) from a predetermined sequence.
  • a library may include reordered fragments, fragment subsets, and/or different orders of fragment subsets.
  • an individual variant may include one or more reordered fragments in addition to missing one or more fragments of an initial nucleic acid and/or protein.
  • libraries are assembled to include splicing (or other recombination) sequences that can be used to generate a pool of predetermined variants. For example, one or more intron sequences that can promote alternative patterns of intron splicing can be included in a library construct.
  • libraries are assembled without introns or other recombination sequences.
  • the libraries may be designed to include different constructs representing the plurality of predetermined fragment rearrangements.
  • the fragments may be exons, exon portions, functional domains, structural domains, secondary structural motifs, other fragments (e.g., homology motifs, etc.), or any combination thereof.
  • a library may contain known splice variants identified from transcriptomics. For example, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or all of the rearranged gene configurations in a library may correspond to known natural splice variants.
  • a library may contain at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or all of the known natural splice variants.
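The two percentages described in the preceding items (the share of library constructs that match known natural splice variants, and the share of known natural splice variants that the library contains) can be computed with simple set arithmetic. The sketch below is illustrative only: representing each variant as a tuple of exon labels, the function name `splice_variant_coverage`, and the example gene with exons A to D are all assumptions that do not come from the source.

```python
def splice_variant_coverage(library, known_natural):
    """Return two fractions:
    (library constructs that are known natural splice variants,
     known natural splice variants that are present in the library)."""
    lib = set(library)
    natural = set(known_natural)
    shared = lib & natural
    frac_library_natural = len(shared) / len(lib) if lib else 0.0
    frac_natural_covered = len(shared) / len(natural) if natural else 0.0
    return frac_library_natural, frac_natural_covered


# Hypothetical gene with exons A-D; each variant is a tuple of exon labels.
library = [
    ("A", "B", "C", "D"),   # parental arrangement
    ("A", "C", "D"),        # exon B skipped
    ("A", "B", "D"),        # exon C skipped
    ("B", "A", "C", "D"),   # reordered, not a natural variant
]
known_natural = [("A", "B", "C", "D"), ("A", "C", "D"), ("A", "D")]

in_library, covered = splice_variant_coverage(library, known_natural)
```

In this hypothetical example, half of the library corresponds to known natural variants, and the library contains two of the three known natural variants.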
  • aspects of the invention may be used to screen variants of a therapeutic polypeptide and identify new and/or improved therapeutic proteins. However, aspects of the invention also may be used to identify new and/or improved industrial, agricultural, or other useful polypeptides.
  • aspects of the invention are useful for generating protein variants with one or more properties that are related to a parent protein.
  • new functional properties, new combinations of functional properties, new subsets of functional properties, modified functional properties, or any combination thereof may be generated without introducing new sequences from a different gene, protein, or species.
  • Variant proteins are expected to share many of the properties of a parent protein. Accordingly, variant proteins of the invention may be biosimilar to a parent protein (e.g., to a parent therapeutic protein). For example, variant proteins may have similar immunogenicity profiles to those of a parent therapeutic protein.
  • the invention provides methods for selectively designing and constructing polypeptide variants that incorporate a plurality of fragments of a protein in specified combinations and/or specified permutations.
  • the invention extends to nucleic acids that encode such polypeptide variants.
  • the invention further provides nucleic acids operatively linked to appropriate expression systems, e.g., expression vectors, host cells expressing such expression vectors, and uses thereof.
  • Aspects of the invention relate to nucleic acid libraries that express polypeptide variants. Further aspects of the invention relate to methods of identifying one or more polypeptide variants having predetermined structures and/or functions of interest.
  • a library may include naturally occurring and artificial splice variants.
  • a library may include natural and artificial fragment permutations.
  • a library may include at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of all possible rearrangements of predetermined fragments (e.g., exons) of a predetermined protein or gene of interest.
  • a library may include one or more exon deletions. Some of these exon deletions may correspond to naturally occurring splice variants.
  • a library may include at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of all possible exon subsets, and/or exon permutations for a predetermined gene or protein of interest.
  • Theoretical numbers of possible different exon rearrangements can be readily calculated for different configurations of fragment subsets and fragment permutations as described in more detail herein.
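As a rough illustration of how such theoretical numbers might be calculated, the sketch below counts three kinds of rearrangement for a gene with n fragments. It assumes that all fragments are distinct and that each is used at most once per construct; the function names are hypothetical and the source does not prescribe these particular formulas.

```python
from math import factorial, perm


def count_ordered_subsets(n):
    """Non-empty fragment subsets kept in the parental order: 2^n - 1.
    This count includes the full-length parental arrangement."""
    return 2 ** n - 1


def count_full_permutations(n):
    """Orderings that use every fragment exactly once: n!.
    This count includes the parental order."""
    return factorial(n)


def count_all_arrangements(n):
    """Every ordering of every non-empty subset: sum of n!/(n-k)! for k = 1..n."""
    return sum(perm(n, k) for k in range(1, n + 1))


for n in (3, 4):
    print(n, count_ordered_subsets(n), count_full_permutations(n),
          count_all_arrangements(n))
```

For a three-exon gene such as the one in FIG. 2, this gives 7 order-preserving subsets, 6 full permutations, and 15 arrangements in total. The counts grow quickly with n, which is why a library may be designed to cover only a stated percentage of the possible rearrangements.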
  • protein variants may have rearranged functional domains.
  • the identity of the functional domains may be based on homology and/or functional studies.
  • a library may include at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of all possible rearrangements of functional domains identified for a predetermined gene or protein of interest. Other fragment rearrangements also may be prepared.
  • a library may include at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of all possible rearrangements of a plurality of predetermined fragments of a gene or protein of interest.
  • aspects of the invention are useful for screening, identifying, making, and/or using polypeptide variants having one or more desirable features, including functional features, structural features, physicochemical profiles, pharmacokinetic profiles, etc., or any combination thereof.
  • the invention is useful for making and using variants that elicit changes in one or more features which include but are not limited to: transcription, translation, expression, folding, solubility, stability, post-translational modifications, activities, size, charge, localization, degradation, antigenicity, immunogenicity, toxicity, efficacy, affinity, specificity, extractability, etc., or any combination of two or more thereof.
  • the instant invention contemplates, in one aspect, a method of designing a nucleic acid library.
  • Certain methods involve identifying a plurality of gene segments that encode non-overlapping polypeptide domains of a predetermined protein and assembling a plurality of nucleic acids, wherein each nucleic acid comprises the plurality of gene segments, and wherein the relative order of the gene segments is different in each nucleic acid.
  • each nucleic acid comprises a subset of the plurality of gene segments, excluding at least one of the plurality of gene segments.
  • the invention provides a method of making polypeptide variants comprised of a discrete set of domains such that each fragment represents a portion of a predetermined protein.
  • a fragment is encoded by an exon of the gene encoding the predetermined protein.
  • the invention further contemplates incorporating these methods, compositions, and uses into business applications.
  • business methods for identifying, selecting, screening, and marketing commercially relevant polypeptide and nucleic acid variants and/or related therapeutic, diagnostic, or industrial products are included in the invention.
  • aspects of the invention may be particularly useful for providing novel therapeutic compounds, accelerating patient access to novel compounds, reducing development costs, and simplifying regulatory approval of novel compounds.
  • the properties of existing therapeutic polypeptides may be improved or altered.
  • methods of the invention also may be used to rescue failed therapeutic candidate compounds by identifying novel variants with favorable production and/or clinical characteristics.
  • FIG. 1 illustrates non-limiting embodiments of predetermined genes showing initial intron/exon configuration and splicing patterns: exons are represented as A, B, C, and D in FIG. 1A and as A1, A2, A3, B, and C in FIG. 1B; introns are represented as 1, 2, and 3 in FIG. 1A, and as thick lines connecting A3 to B and B to C in FIG. 1B; and
  • FIG. 2 illustrates a non-limiting embodiment of an initial gene having exons A, B, and C and introns 1 and 2 in FIG. 2A
  • FIG. 2B illustrates different subset combinations of exons A-C
  • FIG. 2C illustrates different permutations of exons A-C.
  • aspects of the invention relate to novel polypeptides and intelligent methods for designing and identifying improved therapeutic polypeptides.
  • the invention provides methods for designing new gene sequence configurations and related polypeptide sequence configurations based on rearrangements of initial genes and proteins rather than by introducing mutations or homologous sequences from exogenous sources.
  • aspects of the invention are based, at least in part, on the recognition that biological systems have evolved genetic elements such as exons and introns that can be spliced into several alternative configurations (e.g., via alternative splicing) that can provide related, but varied, functional properties.
  • aspects of the invention extend the concept of alternative splicing to provide a systematic sampling of a predetermined sequence space defined by different types of rearrangements of nucleic acid fragments derived from an initial gene.
  • aspects of the invention provide methods for identifying new classes of therapeutic proteins.
  • a library can be assembled to express a plurality of RNAs or polypeptides that are enriched for variants that share one or more properties of the initial molecule.
  • properties that can be retained by genetic rearrangement include solubility, stability, immunogenicity profiles (e.g., low for a therapeutic polypeptide, higher for a vaccine), specific catalytic activities, specific binding affinities, or other properties that represent desirable features of the initial molecule.
  • novel properties can be uncovered.
  • certain variants may retain the positive features of an initial molecule while removing one or more negative traits (e.g., tissue toxicity, environmental toxicity, excessively rapid or slow serum clearance rates in a patient, unwanted side-effects, lack of substrate specificity, secondary catalytic activities unrelated to the activity of interest, etc.).
  • novel functional and/or structural properties may be created by rearranging sequence fragments. Accordingly, existing therapeutic proteins may be improved, novel therapeutic classes may be identified, and certain therapeutic candidates that have failed in clinical trials may be rescued.
  • a rearranged gene library based on a gene that expresses a failed therapeutic candidate may be subjected to one or more screens or selections to identify variants of the failed candidate drug that either have increased activity or reduced negative properties (e.g., reduced immunogenicity, reduced toxicity, etc.).
  • Aspects of the invention also may be used to sample and assay genetic rearrangements that express one or more variant therapeutic RNAs.
  • aspects of the invention also may be used to produce libraries of genes encoding RNAs and/or polypeptides that are useful for agriculture, industry, environmental applications, research, etc.
  • sequences of the nucleic acid fragments encoding predetermined polypeptide fragments may be modified without affecting the polypeptide sequences.
  • the nucleic acid sequences may be modified to optimize nucleic acid assembly, stability, and/or expression. For example, certain repeat sequences may be altered or removed (e.g., by introducing one or more degenerate codons without altering the encoded amino acid sequence) to reduce incorrect assembly and/or to stabilize the assembled nucleic acids.
  • codons may be optimized for expression in a particular host cell (e.g., by removing one or more species-specific rare codons) without altering the encoded amino acid sequence.
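The idea of changing codons without changing the encoded amino acid sequence can be sketched as follows. This is a minimal illustration, not the method of the source: the `RARE_TO_PREFERRED` mapping is a hypothetical example of rare codons and synonymous replacements for an unspecified host, and the function names are assumptions. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical mapping: codons assumed to be rare in the chosen host, each
# paired with a synonymous codon assumed to be preferred in that host.
RARE_TO_PREFERRED = {"AGA": "CGT", "AGG": "CGT", "ATA": "ATT", "CTA": "CTG"}


def translate(dna):
    """Translate complete codons in frame 0; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def recode(dna):
    """Swap rare codons for synonymous ones and verify the protein is unchanged."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    recoded = "".join(RARE_TO_PREFERRED.get(c, c) for c in codons)
    assert translate(recoded) == translate(dna), "recoding altered the protein"
    return recoded


original = "ATGAGAATACTAGGG"   # encodes M R I L G
optimized = recode(original)
```

The final check inside `recode` is the important part of the design: any substitution table, however it is derived, can be validated by confirming that the translated sequence is identical before and after recoding.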
  • the fragments that are rearranged to generate libraries of nucleic acid variants may be exons, exon portions, functional domains, structural domains, secondary structural motifs, other fragments (e.g., homology motifs, etc.), or any combination thereof.
  • a eukaryotic gene may comprise one or more exons and one or more introns.
  • FIG. 1A illustrates a non-limiting example of a gene with four exons illustrated as A, B, C, and D, separated by introns 1, 2, and 3.
  • An exon is a region of DNA within a gene that is transcribed and retained in a final messenger RNA (mRNA) molecule.
  • an intron refers to a non-coding intervening region in a gene that is precisely removed from an RNA transcript by a process termed RNA splicing, or RNA processing.
  • RNA splicing is a process that removes introns and joins exons from a primary transcript (or pre-mRNA) to form a mature mRNA (transcript).
  • a non-limiting example of a mature mRNA is shown in FIG. 1A containing exons A, B, C, and D.
  • each exon contains part of the open reading frame (ORF) that codes for a specific portion of a complete protein.
  • some exons are wholly or partly within the 5' untranslated region (5' UTR) or the 3' untranslated region (3' UTR) of each transcript.
  • the untranslated regions are important for efficient translation of the transcript and for controlling the rate of translation and half life of the transcript.
  • transcripts made from the same gene may not have the same exon structure since parts of the mRNA could be removed by the process of alternative splicing.
  • Some mRNA transcripts have exons with no ORFs and thus are sometimes referred to as non-coding RNA.
  • The process of splicing is catalyzed by a large RNA-protein complex known as a spliceosome.
  • the spliceosome is composed of five small nuclear ribonucleoproteins (snRNPs).
  • splicing requires many non-snRNP protein factors.
  • the RNA components of snRNPs interact with the intron and may be involved in catalysis.
  • Two types of spliceosomes have been identified (the major and minor) which contain different snRNPs. For example, the major spliceosome splices introns containing GU at the 5' splice site and AG at the 3' splice site.
  • U2 binds to the branch site.
  • U4 inhibits U6.
  • U5 binds to U1 and U2 to create the lariat.
  • U2-U6 forms an active catalytic complex.
  • the minor spliceosome is very similar to the major spliceosome. However, it splices rare introns with different splice site sequences. Here, the 5' and 3' splice sites are AU and AC, respectively. While the minor and major spliceosomes contain the same U5 snRNP, the minor spliceosome has different, but functionally analogous, snRNPs for U1, U2, U4, and U6, which are respectively called U11, U12, U4atac, and U6atac. Most introns start from a GU sequence and end with an AG sequence (in the 5' to 3' direction). They are referred to as the splice donor and splice acceptor sites, respectively.
  • the sequences at the two sites are not sufficient to signal the presence of an intron.
  • Another important sequence is called the branch site located 20 - 50 bases upstream of the acceptor site.
  • the consensus sequence of the branch site is "CU(A/G)A(C/U)", where A is conserved in all genes.
  • the exon sequence is (A/C)AG at the donor site, and G at the acceptor site.
  • Splicing occurs in a two-step biochemical process. Both steps involve transesterification reactions that occur between RNA nucleotides. First, a specific branch-point nucleotide within the intron reacts with the first nucleotide of the intron, forming an intron lariat. Second, the last nucleotide of the first exon reacts with the first nucleotide of the second exon, joining the exons and releasing the intron lariat.
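The sequence features described above for introns spliced by the major spliceosome (GU at the donor site, AG at the acceptor site, and a CU(A/G)A(C/U) branch site 20 to 50 bases upstream of the acceptor) can be expressed as a simple pattern check. The sketch below is a toy filter under stated assumptions, not a splice-site predictor: the function name is hypothetical, and measuring the distance from the conserved branch-site A to the A of the terminal AG is one possible reading of "20 - 50 bases upstream".

```python
import re

# Branch-site consensus CU(A/G)A(C/U); the lookahead allows overlapping matches.
BRANCH_SITE = re.compile(r"(?=CU[AG]A[CU])")


def looks_like_major_intron(intron):
    """Return True if the sequence starts with GU, ends with AG, and has a
    branch-site consensus whose conserved A lies 20-50 bases upstream of
    the acceptor AG. DNA input is accepted (T is read as U)."""
    rna = intron.upper().replace("T", "U")
    if not (rna.startswith("GU") and rna.endswith("AG")):
        return False
    acceptor = len(rna) - 2              # index of the A in the terminal AG
    for match in BRANCH_SITE.finditer(rna):
        branch_a = match.start() + 3     # conserved A is the 4th consensus base
        if 20 <= acceptor - branch_a <= 50:
            return True
    return False


# Hypothetical test sequences, constructed only to exercise the rules.
candidate = "GU" + "C" * 10 + "CUGAC" + "U" * 25 + "AG"
too_short = "GUCUGACUUAG"
```

As the source notes, the donor and acceptor sequences alone are not sufficient to signal an intron, so a check of this kind can only rule candidates out; it cannot confirm that a sequence will actually be spliced.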
  • Alternative splicing is a process that occurs in eukaryotes in which the splicing process of a pre-mRNA transcribed from one gene can lead to different mature mRNA molecules and therefore to different proteins.
  • Intron retaining mode: in this case, instead of splicing out an intron, the intron is retained in the mRNA transcript. However, the intron sequence must be coding and properly expressible; otherwise, a stop codon or a shift in the reading frame will cause the protein to be non-functional.
  • Exon cassette mode: in this case, certain exons are spliced out to alter the sequence of amino acids in the expressed protein.
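The reading-frame concern raised for these splicing modes applies to any construct in which a segment is added or removed: if the length of that segment is not a multiple of three, the downstream frame shifts and a premature stop codon may appear. The sketch below illustrates such a check. The exon sequences are invented toy data and the function names are assumptions; a real design tool would also need to handle UTRs and start-codon position.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def join_exons(exons, order):
    """Concatenate exon sequences in the given order (no introns)."""
    return "".join(exons[name] for name in order)


def frame_report(cds):
    """Return (length is a multiple of 3, indices of premature stop codons).
    A stop codon counts as premature unless it is the last complete codon."""
    cds = cds.upper()
    in_frame = len(cds) % 3 == 0
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    premature = [i for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]
    return in_frame, premature


# Hypothetical exons: B is 4 nt long, so skipping it shifts the frame.
exons = {"A": "ATGGCT", "B": "GAAC", "C": "TGACTTAA"}

full_length = join_exons(exons, ["A", "B", "C"])   # ATG GCT GAA CTG ACT TAA
exon_b_skipped = join_exons(exons, ["A", "C"])     # ATG GCT TGA CTT AA

print(frame_report(full_length))
print(frame_report(exon_b_skipped))
```

In this toy example the full-length construct is in frame with no premature stop, whereas skipping exon B both breaks the frame and exposes a TGA stop at the third codon.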
  • One aspect of the invention is a library of gene variants based on splice variants.
  • the library is based on natural splice variants. Natural splice variants are splice variants commonly found in RNA transcripts.
  • the library is based on unnatural splice variants, which are based on an extension of naturally occurring alternative splicing to produce splice variants that form junctions between exons that are not directly spliced together in natural systems.
  • the library is a combination of natural and unnatural splice variants.
  • a library of gene variants is designed to include altered introns that will promote alternative splicing patterns that are not found in nature.
  • different constructs in a library may be engineered to include altered donor, acceptor, and branch sites in one or more introns of the naturally occurring gene.
  • the splice sites may be altered to either increase or decrease splicing between different exons, thereby generating different patterns of splicing.
  • Sequence alterations may be based on conserved or consensus sequences or known intron sequences that are either efficiently or inefficiently spliced. Sequence alterations also may be introduced into different constructs to promote alternative promoter selection and/or alternative sites of polyadenylation.
  • a library can be generated with a plurality of different constructs that are predicted to produce a plurality of alternative splice variants, some of which may be natural splice variants, some of which may be novel splice variants.
  • a library may be designed so that at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the splice variants are predicted to be non-natural splice variants.
  • At least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the constructs in the library include the original exons separated by intervening introns, but at least one of the introns comprises a modified donor, acceptor, and/or branch site. In some embodiments, at least one of the introns is an exogenous intron that is introduced to replace one of the original introns.
  • the relative order of the exons in one or more constructs may be changed and the intervening introns may be the original introns or may be modified as described herein.
  • one or more exons may be omitted from some of the constructs in the library.
  • different constructs in a library may contain modified introns, novel orders of two or more exons, one or more omitted exons, or any combination of two or more thereof.
  • a host cell is engineered to include constructs with alternative splicing configurations (e.g., expressed from a construct on a vector) and grown under conditions that allow and/or promote splicing.
  • a plurality of different cells, each with a different set of altered splice domains, is designed and engineered. Each cell can give rise to a single splice variant. However, one or more cells may give rise to two or more different splice variants. Desirable properties can be selected using any suitable assay. Cells expressing a desirable variant (e.g., either as a single splice variant or as one of a few splice variants) can be isolated and the desirable splice variant can be identified.
  • a nucleic acid encoding a desirable splice variant once identified, can be synthesized as a single uninterrupted coding sequence without any introns.
  • a nucleic acid can be designed and assembled such that it will produce a single splice variant corresponding to an identified splice variant having one or more desirable properties.
  • host cells may be recombinant cells that are engineered to express altered levels of one or more splicing factors (e.g., enzymes or nucleic acids) and/or one or more altered splicing factors (e.g., enzymes or nucleic acids) having altered levels of activity.
  • an engineered host cell may over-express one or more wild-type or altered splicing factors in order to increase the overall level of splicing activity (e.g., to increase the rate and/or level of intron removal from RNA transcripts).
  • a host cell may over-express one or more wild-type or altered splicing factors in order to alter the pattern of intracellular splicing (e.g., to alter the relative rate and/or pattern of intron removal from RNA transcripts).
  • a host cell may be engineered to increase exon-skipping.
  • a host cell may be engineered to decrease exon-skipping.
  • trans-acting enzymes (e.g., protein kinases)
  • trans-acting factors that can act on one or more cis-acting nucleic acids to effect intron removal.
  • cis-acting sequences and trans-acting factors that can be used to increase or decrease exon skipping include, for example, exon sequences and serine/arginine-rich splicing factors that bind to certain exon sequences, such as those disclosed in Wheat et al., Proc. Natl. Acad. Sci., 2005, vol. 102, no. 14, 5002-5007.
  • the amount of exon skipping can be increased in a host cell.
  • the amount of exon-skipping can be kept at low levels in a cell by using a host cell that expresses sufficiently high levels of the serine/arginine-rich splicing factors.
  • nucleic acid constructs may be designed to encode different arrangements of exons without the intervening introns.
  • different constructs encode different subsets of exons arranged in the same relative order as in the naturally-occurring gene. Accordingly, the different subsets of exons correspond to different combinations of one or more exon deletions.
  • a library may be designed to encode different numbers of exon deletions. For example, a library of single exon deletions may be designed.
  • a library containing all combinations of 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or more different exon deletions may be designed.
  • a library may be designed to include all combinations of all numbers of exon deletions, wherein the remaining exons are in the same relative order as in the parent gene. In some embodiments, a library may be designed to include all combinations of all numbers of exon deletions with the exception of deletions that result in only 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 exons, or any combinations thereof, remaining in the constructs. It should be appreciated that the design considerations will be different for different genes and proteins, since different intron-containing genes can have a wide range of numbers of introns and exons.
  • the relative order of exons may be changed in an assembled nucleic acid library.
  • the relative order of two or more different exons may be changed.
  • a library may be designed to contain different constructs that represent at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the different relative orders of exons.
  • a library may be designed so at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the different constructs contain two or more exons in a different order relative to the original gene.
  • libraries may include combinations of exon deletions and exon permutations.
  • the exon deletions are on different constructs from the exon permutations.
  • one or more constructs may contain combinations of one or more exon deletions and/or exon permutations.
  • a library may include some constructs that only contain deletions, some constructs that only contain permutations, and some constructs that contain combinations of deletions and permutations. Libraries may be designed to include at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the different possible combinations of exon subsets and/or exon permutations described herein.
  • libraries may be designed so that at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the constructs contain an exon subset and/or an exon permutation described herein.
  • Certain libraries may have predetermined patterns of exon deletions and/or exon permutations. For example, most (e.g., more than 50%, more than 75%, more than 80%, more than 90%, etc.) of the exon deletions and/or exon permutations may be located in the 5' half, 3' half, or between the 5' quarter and the 3' quarter of the gene.
  • the half and quarter measures of the gene may be made with reference to the number of exons and introns rather than being based on nucleic acid lengths. These may be different since the relative lengths of different exons and introns may be very different.
  • aspects of the invention also contemplate adding one or more exons (e.g., by duplicating one or more exons of the original gene). Accordingly, the invention includes embodiments comprising methods of designing and making peptide variants by deleting and/or adding one or more exons.
  • the nucleic acid encoding a first variant comprises a subset of all available exons of a gene
  • the nucleic acid encoding a second variant comprises a different subset of all available exons of the gene, and so on, such that each variant is defined by a unique combination, or assortment, of exons of a gene, rather than by the relative order by which the exons are arranged.
  • one or more exons are deleted, or absent.
  • the following example illustrates a gene comprised of four exons, A, B, C, and D, that yields a transcript expressed as ABCD.
  • any one of the four exons may be deleted while retaining the relative order of exon arrangement to generate the following variants: ABC, ABD, ACD, BCD.
  • variants that result from exon assortment may include partially or substantially overlapping sequences.
  • a first variant may differ from a second variant by one exon.
  • two variants may share substantially the same set of exons, with an exception of a single alternative exon.
  • the first variant and the second variant contain an alternative exon, in addition to common exons, such that the former may be expressed as ABC, while the latter may be expressed as ABD.
  • more than one exon may be absent. Examples of such embodiments are: AB, AC, AD, BC, BD, CD, A, B, C, and D.
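The deletion variants listed above can be enumerated mechanically. The following is a minimal Python sketch, assuming each exon is represented by a single-letter label; the function name `deletion_variants` is hypothetical and is used only for illustration.

```python
from itertools import combinations

def deletion_variants(exons, keep):
    """Return every subset of `keep` exons, preserving the parent order."""
    # combinations() emits items in input order, so the relative
    # order of the retained exons matches the parent gene.
    return ["".join(subset) for subset in combinations(exons, keep)]

parent = "ABCD"

# Variants with exactly one exon deleted (three exons retained).
single_deletions = deletion_variants(parent, 3)

# Every variant with one, two, or three exons retained.
all_deletions = [
    variant
    for keep in range(1, len(parent))
    for variant in deletion_variants(parent, keep)
]
```

For the four-exon example, `single_deletions` holds the four variants with one exon removed, and `all_deletions` holds all fourteen order-preserving deletion variants.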
  • one or more exons are repeated, or duplicated. Examples of embodiments in which one or more exons are duplicated include but are not limited to: AABCD, ABBCD, ABCCD, ABCDD, AABBCD, ABBCCD, AABCCD, AABCDD, AABBCCDD, AAABCD, and so on.
  • one or more exons are deleted or missing, and one or more different exons are duplicated.
  • examples of such variants including both deletions and duplications include but are not limited to: AACD, in which exon A is duplicated and exon B is absent; BBDD, in which exons A and C are absent and exons B and D are each duplicated.
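Variants that combine deletions and duplications can be generated by assigning a copy number to each exon. The sketch below is illustrative only: it assumes that duplicated copies stay adjacent and that the parent order is retained (as in the examples AACD and BBDD), and the function name `copy_number_variants` and the `max_copies` parameter are hypothetical.

```python
from itertools import product

def copy_number_variants(exons, max_copies=2):
    """Enumerate variants in which each exon occurs 0..max_copies times, in parent order."""
    variants = []
    for copies in product(range(max_copies + 1), repeat=len(exons)):
        # Repeat each exon label by its copy number (0 copies = deleted).
        variant = "".join(exon * count for exon, count in zip(exons, copies))
        # Skip the empty construct and the unchanged parent configuration.
        if variant and variant != exons:
            variants.append(variant)
    return variants

variants = copy_number_variants("ABCD")
```

With four exons and up to two copies of each, there are 3**4 copy-number assignments; removing the empty construct and the parent leaves 79 variants, which include pure deletions, pure duplications, and mixtures of both.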
  • the invention includes generating peptide variants based on exon reordering.
  • exon reordering refers to the process and/or outcome of altering the relative order of exons from a natural gene, also referred to herein as exon permutations.
  • embodiments include peptide variants that are encoded by the same set of exons, but are rearranged in various relative orders.
  • a peptide encoded by a gene, fragment thereof or corresponding cDNA comprising four exons, A, B, C, and D, which is expressed as ABCD may be rearranged in various orders to yield: ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC,
  • each variant includes each of the four exons, A, B, C, and D, but in a different order.
  • variants with different relative orders of exons may include variants where the relative order of all exons is changed (e.g., a variant of ABCD having the order DCBA) or variants where the relative order of only a few exons is changed (e.g., a variant of ABCD having the order BACD, where the order of A and B is inverted, but the order of C and D is the same).
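The reordered variants of a four-exon gene can be listed exhaustively. This is a minimal Python sketch, assuming single-letter exon labels; `exon_permutations` is a hypothetical helper name.

```python
from itertools import permutations

def exon_permutations(exons):
    """Return all orderings of the given exons, excluding the parent order."""
    parent = "".join(exons)
    orders = ("".join(p) for p in permutations(exons))
    return [order for order in orders if order != parent]

new_orders = exon_permutations("ABCD")
```

For ABCD this yields 23 new orders (4! minus the parent), ranging from variants in which only two adjacent exons are swapped, such as BACD, to the fully reversed order DCBA.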
  • variants having different relative orders of other nucleic acid fragments described herein may include variants where only a few fragments are rearranged relative to each other and variants where most or all fragments are rearranged relative to each other. It should be appreciated that the number of different configurations increases exponentially as the number of different fragments increases as described herein.
  • the invention includes peptide variants having different relative orders of exons as well as different exon combinations (e.g., subsets and/or duplications).
  • complex libraries of peptide variants may encompass polypeptides generated by both assortment and permutation described above. To illustrate the model, a few of such examples are shown below.
  • a parent gene having five exons, A, B, C, D, E, is shown as: ABCDE. If exons D and E each represent an alternatively spliced exon, such that one but not both is expressed in a cell, naturally occurring variants include ABCD, in which exon E is absent, and ABCE, in which exon D is absent. Thus, in the instant example, the parent variant, ABCDE, represents an artificial (non-naturally occurring) isoform comprising each of the five exons contained in the gene or gene fragment.
  • a second variant may further contain a duplication of an exon. Suppose exon B is duplicated once, based on the same example provided above.
  • ABBCDE (according to the "parent" or largest representation of unique exons)
  • ABBCD (according to the first of the two naturally occurring alternatively spliced variants)
  • ABBCE (according to the second of the two alternative variants).
  • BABCDE wherein the first two exons of the ABBCDE variant are swapped
  • BBCAD wherein the exons of ABBCD are rearranged
  • EBBACA wherein the ABBCE variant gives rise to exon rearrangement as well as an additional duplication (that of exon A).
  • these configurations are for illustrative purposes only and are not intended to be limiting. Many initial genes have larger numbers of exons and similar patterns of exon rearrangements can be generated involving larger numbers of possible variants.
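The relationship between a variant and its parent can be summarized programmatically. The following sketch assumes single-letter exon labels and that a variant contains only exons present in the parent; `describe_variant` is a hypothetical name.

```python
from collections import Counter

def describe_variant(parent, variant):
    """Report deleted exons, duplicated exons, and whether the relative order changed."""
    counts = Counter(variant)
    deleted = [exon for exon in parent if counts[exon] == 0]
    duplicated = [exon for exon in parent if counts[exon] > 1]
    # The order is unchanged when parent positions never decrease along the variant.
    positions = [parent.index(exon) for exon in variant]
    reordered = positions != sorted(positions)
    return {"deleted": deleted, "duplicated": duplicated, "reordered": reordered}

summary = describe_variant("ABCDE", "EBBACA")
```

Applied to EBBACA, the summary reports that exon D is absent, that exons A and B are duplicated, and that the relative order differs from the parent ABCDE.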
  • exons and intron boundaries can be defined experimentally by analyzing the mRNA sequence that is expressed from a genomic locus. For example, a cDNA construct may be sequenced and the location of the intron/exon boundaries may be inferred from the genomic sequences that are absent from the cDNA.
  • putative exons and introns may be identified based on sequence analysis of a genomic locus (e.g., based on the analysis of open reading frame distributions and the identification of putative splice acceptor, donor, and branching sites).
  • synthetic libraries of the invention may be based on rearrangements of known exon sequences and/or predicted exon sequences.
  • Rearrangements of sequences encoding functional domains, structural domains, secondary structures and other gene fragments:
  • RNA or protein coding sequence may be analyzed and subdivided into a plurality of fragments based on one or more different criteria.
  • at least one of the fragments may represent a polypeptide domain capable of assuming a discrete module.
  • Such domain may represent a structural module, a functional module or both.
  • “module” or “modular” may refer to a segment of primary polypeptide sequence that is discernable at the level of secondary structure.
  • a modular domain of a protein represents a functionally independent unit.
  • a modular domain of a protein may exert relative stability. Accordingly, different coding fragments may be assigned based on different functional and/or structural domains that they encode. The identification of functional and/or structural domains in an encoded RNA or protein may be based on experimental data, sequence homology, structural data, RNA modeling, protein modeling, computer-implemented structural or folding models, or any other source of actual or predicted functional or structural information. In some embodiments, one or more coding fragments may be based on different secondary structure motifs, for example, the presence of sequences predicted or known to form alpha helices, beta sheets, or other secondary structural motifs.
  • one or more coding fragments may be based on the presence of sequences known or predicted to form certain tertiary structures. In some embodiments, one or more coding fragments may be based on their known or predicted interactions with other RNA or protein subunits (e.g., in hetero- or homo-multimeric nucleic acids, proteins, or nucleic acid-protein complexes). Coding fragments also may be defined based on sequence homologies (e.g., the presence or absence of conserved sequences) regardless of whether a known or predicted function or structure has been associated with the sequence. In certain embodiments, coding fragments may be defined based on patterns of one or more sequence elements.
  • coding fragments may be determined arbitrarily or at least not based on predicted structural and/or functional properties of the encoded fragments.
  • a coding sequence may be subdivided into a number of fragments (e.g., of approximately the same size) that is determined based on the number of different rearranged variants to be included in the library.
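Subdividing a coding sequence into fragments of approximately the same size can be sketched as follows. Keeping every fragment boundary on a codon boundary is an assumption made for this illustration (the text does not require it), and `split_coding_sequence` is a hypothetical name.

```python
def split_coding_sequence(cds, n_fragments):
    """Split a coding sequence into n_fragments codon-aligned pieces of similar size."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    n_codons = len(cds) // 3
    if not 1 <= n_fragments <= n_codons:
        raise ValueError("n_fragments must be between 1 and the number of codons")
    base, extra = divmod(n_codons, n_fragments)
    fragments, start = [], 0
    for index in range(n_fragments):
        # Spread any leftover codons over the first `extra` fragments.
        size = base + (1 if index < extra else 0)
        end = start + size * 3
        fragments.append(cds[start:end])
        start = end
    return fragments

# Toy 8-codon sequence used only to exercise the function.
fragments = split_coding_sequence("ATGGCTGCAGGTTTTCCCAAATAA", 3)
```

The fragments concatenate back to the original sequence, and their sizes differ by at most one codon.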
  • one or more fragments may be defined based on the presence of one or more convenient restriction sites spaced at suitable intervals along the coding sequence.
  • the presence of certain repeat sequences, high GC content, or other sequence motifs that are predicted to interfere with nucleic acid assembly may provide a basis for defining fragments of an initial coding sequence.
  • different fragments may be designed so that sequences predicted to interfere with each other during assembly are located on different fragments.
  • different fragments may be assembled separately and then recombined or further assembled in different configurations to generate the constructs of a library.
  • a library of rearranged fragments may contain a combination of different fragments that are defined based on different criteria (e.g., based on a combination of any one or more of the criteria described herein, including natural intron/exon boundaries).
  • the libraries of rearranged coding fragments may be assembled without including introns.
  • one or more introns may be included in some embodiments, either between exons of the initial gene and/or between nucleic acid regions encoding fragments that were defined based on one or more criteria other than exon/intron boundaries.
  • the introns may consist of natural intron sequences or contain artificial and/or modified intron sequences.
  • libraries containing different types of rearrangements may be designed and assembled as described above for exon rearrangements.
  • a library may be designed and assembled to contain and/or express different subsets of the predetermined fragments, meaning that one or more of the initial fragments are missing from each subset.
  • a library also may be designed and assembled to include only permutations of the predetermined fragments.
  • Yet further libraries may be designed and assembled to include combinations of fragment subsets and fragment permutations, including or excluding one or more permutations of fragment subsets.
  • constructs in the library may represent fragment subsets wherein the relative order is the same as the order in the initial gene.
  • Other constructs in the libraries may represent permutations of all of the predetermined fragments from the initial gene.
  • a library also may contain constructs that represent permutations of fragment subsets. Depending on the number of fragments that are being rearranged and the number of different variants present in a library, different percentages of all possible combinations of subsets and/or permutations will be represented (assuming that all variants would be assembled and represented at similar frequencies).
  • a library may be designed to represent only a predetermined subset of all possible rearrangements.
  • a library may be designed to represent and/or actually include at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or more (e.g., all or substantially all) possible fragment subsets, fragment permutations, or combinations of fragment subsets and permutations. However, in some embodiments, less than 1% of the total possible variants may be represented. It should be appreciated that a library also may include one or more other sequences. Some of these sequences may be included intentionally (e.g., as control sequences, reference sequences, or other sequences).
  • a library may be designed and/or assembled to contain any number of different variants. Each variant may be individually assembled using one or more multiplex assembly methods described herein. In some embodiments, one or more (e.g., 100%, 100%-75%, 75%-50%, 50%-25%, 25%-10%, or fewer) of the different fragments may be assembled independently and reused in different combinations to generate constructs encoding different subsets, permutations, and/or subset permutations as described herein.
  • a library may be assembled to include 5-50 constructs, 50-100 constructs, 100-1,000 constructs, 10^3-10^4 constructs, 10^4-10^5 constructs, 10^5-10^6 constructs, 10^6-10^7 constructs, 10^7-10^8 constructs, 10^8-10^9 constructs, 10^9-10^10 constructs, 10^10-10^15 constructs, or more.
  • each construct will include a single variant.
  • a single construct may be designed to include 2 or more different variants (e.g., 2, 3, 4, 5, 5-10, 10-50, 50-100, or more different variants).
  • the number of different combinations that a library can represent depends in part on the theoretical number of possible combinations. Total numbers of different combinations of fragment subsets and permutations can be calculated theoretically as described in the following paragraphs.
  • the number $C_N(r)$ of different combinations of subsets of n predetermined fragments (e.g., exons and/or other predetermined fragments) taken in groups of r, in the same relative order as in an initial gene, may be provided by the following equation: $C_N(r) = \frac{n!}{r!\,(n-r)!}$
  • the number $C_N(n-2)$ of subsets of exons that exclude only two exons may be expressed as the number of different combinations of n-2 exons: $C_N(n-2) = \frac{n!}{(n-2)!\,2!} = \frac{n\,(n-1)}{2}$
  • the number of other combinations of different numbers of subsets of exons (fragments) may be calculated.
  • the total number of different combinations of different numbers of predetermined fragments, in the same order as in the original gene, may be provided by the following general equation:
  • $\sum_{r=1}^{n} C_N(r) = \sum_{r=1}^{n} \frac{n!}{r!\,(n-r)!}$ (Equation 2)
  • This number includes the original fragment configuration.
  • a library may be prepared to include the original structure. This may be used as an internal control. However, in some embodiments, a library may be assembled to exclude the original configuration. Also, the original configuration may be provided or assembled separately for use as a control. The number of new subsets of the predetermined fragments, excluding the original fragment configuration, may be provided by the following general equation:
  • $\sum_{r=1}^{n} C_N(r) - 1 = \sum_{r=1}^{n} \frac{n!}{r!\,(n-r)!} - 1$
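The subset counts described above can be checked numerically. This is a sketch under the assumption that the sums run over r = 1 to n (that is, the empty construct is not counted); the function names are hypothetical.

```python
from math import comb

def subsets_of_size(n, r):
    """Number of order-preserving subsets of r fragments chosen from n."""
    return comb(n, r)

def total_subsets(n, include_original=True):
    """Total number of order-preserving, non-empty subsets of n fragments."""
    total = sum(comb(n, r) for r in range(1, n + 1))  # equals 2**n - 1
    return total if include_original else total - 1

four_exon_total = total_subsets(4)
four_exon_new = total_subsets(4, include_original=False)
```

For a four-exon gene this gives 15 configurations including the original ABCD, or 14 new configurations, which agrees with the explicit lists of deletion variants given earlier for exons A, B, C, and D.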
  • predetermined fragments of a gene or protein may be reordered.
  • the number $P_N$ of possible rearrangements consisting of the same number of original fragments n in different relative orders may be provided by the following equation: $P_N = n!$
  • the number of new permutations (i.e., excluding the original configuration) may be provided by the following equation: $P_N - 1 = n! - 1$
  • new permutations include many different configurations ranging from examples where the relative order of most or all of the fragments is changed to examples where the relative order of only a small number of fragments is changed.
  • new permutations include examples where the order of two adjacent fragments is changed (e.g., inverted) relative to each other, but their order relative to other surrounding fragments is not changed.
  • the total number of possible permutations, including permutations of all different subsets of r fragments, may be provided by the following equation: $\sum_{r=1}^{n} \frac{n!}{(n-r)!}$
  • the number of new rearrangements including all possible permutations of all different subsets of r fragments may be provided by the following equation: $\sum_{r=1}^{n} \frac{n!}{(n-r)!} - 1$
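The permutation counts can be computed directly. The sketch below assumes that a permutation of a subset of r fragments is an ordered selection of r distinct fragments from n, and that subsets of size 1 through n are counted; the function names are hypothetical.

```python
from math import factorial, perm

def full_length_permutations(n, include_original=True):
    """Orderings of all n fragments, optionally excluding the original order."""
    return factorial(n) - (0 if include_original else 1)

def subset_permutations(n, include_original=True):
    """Orderings of every non-empty subset of n fragments."""
    total = sum(perm(n, r) for r in range(1, n + 1))
    return total - (0 if include_original else 1)

four_fragment_orders = full_length_permutations(4)
four_fragment_all = subset_permutations(4)
```

For four fragments there are 24 full-length orderings (23 of them new) and 64 orderings across all subset sizes (4 + 12 + 24 + 24).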
  • a library may include all of the theoretically possible fragment rearrangements described herein (e.g., any of the numbers of fragment rearrangements provided by any of the above equations). However, a library may include any subset of different rearrangements. Accordingly, a library may include any percentage of the numbers of different rearrangements provided by any of equations (1) through (16). For example, at least 1%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, etc., of the numbers of possible different fragment arrangements may be included in the library. However, higher or lower percentages may be included in the library.
  • a subset of arrangements may be specifically identified and the library may be designed and assembled to include only that specific subset.
  • a library may be designed to include all of the different combinations, but only a percentage are actually assembled.
  • a library also may include variants with one or more duplicated fragments. The possible variants including duplicated fragments will be higher than outlined above and will depend on the number of fragments for which duplications are provided and the number of copies of each duplicated fragment included in the variants. It should be appreciated that any of the libraries described herein may be designed and/or assembled to include or exclude certain nucleic acids. For example, the initial nucleic acid may be included in some embodiments, or excluded in other embodiments.
  • a library may be designed and/or assembled to either include or exclude certain rearranged nucleic acids.
  • a library may be designed to exclude rearranged nucleic acids that are smaller than about 75%, smaller than about 50%, smaller than about 25%, smaller than about 10%, or smaller than about 5% of the length of the initial nucleic acid.
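A length-based exclusion rule of this kind can be sketched as a simple filter. The fragment lengths below are made-up illustrative values, and `filter_by_relative_length` is a hypothetical name.

```python
def filter_by_relative_length(variants, fragment_lengths, initial, min_fraction):
    """Keep variants whose total length is at least min_fraction of the initial nucleic acid."""
    initial_length = sum(fragment_lengths[f] for f in initial)
    kept = []
    for variant in variants:
        length = sum(fragment_lengths[f] for f in variant)
        if length >= min_fraction * initial_length:
            kept.append(variant)
    return kept

# Hypothetical fragment lengths in nucleotides (illustrative values only).
lengths = {"A": 300, "B": 100, "C": 500, "D": 100}
candidates = ["ABC", "ABD", "ACD", "BCD", "AB", "CD"]

at_least_75_percent = filter_by_relative_length(candidates, lengths, "ABCD", 0.75)
at_least_50_percent = filter_by_relative_length(candidates, lengths, "ABCD", 0.50)
```

Because fragments can differ greatly in length, deleting the same number of fragments can leave constructs on either side of a given threshold, as the two filtered lists show.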
  • a library may be assembled or designed to include certain structural and/or functional portions of an initial nucleic acid.
  • certain sequences such as signal peptides may be excluded from the library variants. This may require excluding all or part of the first exon of certain genes. However, in other embodiments, the cognate signal peptide may be retained in all variants of a protein encoding sequence.
  • a library may contain different configurations of nucleic acid rearrangements based on initial nucleic acids, initial fragment selection (e.g., based on exons, structural or functional domains, etc. or any combination thereof), the inclusion of sequence variants for one or more fragments, the inclusion of one or more controls or reference constructs, and/or the exclusion of one or more nucleic acid configurations.
  • a library may include all or most rearrangements based independently on two or more different initial fragment selection criteria (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-15, or more different initial fragment selection criteria).
  • a library may include all or most exon rearrangements, and all or most rearrangements based on one or more other initial fragment selections (e.g., structural domains, functional domains, etc. or any combination thereof). Accordingly, the equations set forth herein are not intended to be limiting but may be useful to determine the number of different constructs that may be required for a library to be representative of all (or a significant proportion) of sequence rearrangements based on one or more initial nucleic acids and one or more predetermined fragment selection.
  • a library may be designed or assembled to include from about 10% of the theoretical number of possible rearrangements to several fold the theoretical number of possible rearrangements.
  • a library may be designed or assembled to include about 1%, about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 100%, about 1.01 fold, about 1.05 fold, about 1.1 fold, about 1.2 fold, about 1.3 fold, about 1.4 fold, about 1.5 fold, about 1.6 fold, about 1.7 fold, about 1.8 fold, about 1.9 fold, about 2 fold, about 2.5 fold, about 3 fold, about 3.5 fold, about 4 fold, about 4.5 fold, about 5 fold, about 10 fold, about 10-50 fold, about 50-100 fold, or more of the theoretical number of different rearranged nucleic acids based on one or more equations provided herein.
  • aspects of the invention may involve rearranging fragments representing the entire coding sequence of a gene product of interest (e.g., a protein of interest).
  • only certain fragments may be rearranged from certain portions of a coding sequence or gene and not the entire coding sequence or gene. For example, deletions of one or more fragments (e.g., exons or other fragments) may be restricted to a portion of a coding sequence or gene (or two or more portions of a coding sequence or gene, but not the entire coding sequence or gene). Similarly, permutations may be restricted to fragments from a portion of a coding sequence or gene (or two or more portions of a coding sequence or gene, but not the entire coding sequence or gene). In certain embodiments, certain structural regions of a gene product that are known to be important may be retained and maintained in the same relative order.
  • regions or subsets of other regions may be deleted and/or permuted as described herein.
  • a theoretical number of possible rearrangements for portions of a coding sequence or gene may be calculated using the equations provided herein. Accordingly, in each equation the number n of different fragments being rearranged can be the number of fragments that represent only the portions of the coding sequences or genes that are being varied.
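Restricting rearrangement to one portion of a coding sequence can be illustrated as follows. The sketch assumes single-letter fragment labels and a contiguous varied region given by slice indices; `permute_region` is a hypothetical name.

```python
from itertools import permutations

def permute_region(fragments, start, stop):
    """Permute fragments[start:stop] while leaving the flanking fragments in place."""
    head, region, tail = fragments[:start], fragments[start:stop], fragments[stop:]
    return [head + "".join(order) + tail for order in permutations(region)]

# Fragments A and E are held fixed; only B, C, and D are rearranged.
variants = permute_region("ABCDE", 1, 4)
```

Here n in the counting equations is 3 (the fragments being varied) rather than 5, so there are 3! = 6 configurations including the original.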
  • any portions of a coding sequence or gene may be varied.
  • fragments representing functional and/or structural domains may be rearranged within one or more portions of interest.
  • protein transmembrane domains may be rearranged (e.g., deleted and/or permuted) as described herein without removing or changing the relative position of other portions of a protein.
  • one or more protein binding sites and/or active site regions may be rearranged without rearranging an entire protein.
  • libraries of nucleic acids may be assembled to represent different predetermined coding sequences for protein variants having one or more rearranged portions.
  • initial genes and proteins are described in relation to a starting or initial nucleic acid sequence (e.g., an initial gene encoding an RNA or polypeptide of interest).
  • the initial nucleic acid may be a wild-type gene encoding a molecule of interest.
  • the initial nucleic acid may be a mutant (e.g., a naturally occurring mutant or an artificial mutant of a wild-type gene).
  • a mutant gene may contain one or more point mutations, deletions, insertions, inversions, duplications, etc., or any combination thereof relative to a wild-type gene.
  • the selection of an initial gene may be based on a number of different factors.
  • the initial gene encodes for one or more properties that are desirable (e.g., low immunogenicity, solubility, stability, etc.).
  • the initial gene also may be selected based on the presence of one or more properties that are expected to give rise to variants having functional and/or structural characteristics of interest.
  • an initial gene may be selected because it encodes one or more catalytic regions and/or one or more binding regions for which modifications and/or new combinations are desired.
  • An initial gene may be selected on the basis of certain therapeutic properties, but for which modifications are desired to enhance the therapeutic activity and/or reduce certain negative physical and/or physiological characteristics.
  • an initial nucleic acid may be an entirely synthetic nucleic acid, in which case it may not include exons.
  • certain initial nucleic acids may be from prokaryotic organisms and not have any introns or exons.
  • certain eukaryotic genes do not contain any introns and may be used as an initial nucleic acid in aspects of the invention.
  • an initial gene may include exons in a genomic form. However, the initial gene used as a reference for fragment rearrangement may be based on a cDNA construct without introns.
  • a library may represent predetermined rearranged variants of a single initial nucleic acid (e.g., coding sequence or gene). However, in certain aspects, a library may represent rearranged variants of two or more different initial nucleic acids (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 10-15, 15-20, 20-25, 25-50, 50-100, or more different genes or coding sequences). The genes may be related or unrelated.
  • related nucleic acids have sequence similarities, for example 70% or more nucleic acid sequence identity (e.g., 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, about 96%, about 97%, about 98%, about 99%, or more nucleic acid sequence identity) or 70% or more amino acid sequence identity (e.g., 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, about 96%, about 97%, about 98%, about 99%, or more amino acid sequence identity).
  • related nucleic acids may have functional similarities (e.g., they encode kinases, phosphatases, receptors, growth hormones, cytokines, antibodies, etc.).
  • a library contains variants of two or more different starting nucleic acids from only one species (e.g., humans, mice, pigs, non-human primate species, rats, cows, sheep, horses, dogs, cats, etc.). However, in other embodiments, a library contains variants of starting nucleic acids from two or more different species (e.g., two or more species selected from, but not limited to, humans, mice, pigs, non-human primate species, rats, cows, sheep, horses, dogs, and cats).
  • a library may contain variants of two or more independently rearranged nucleic acids (e.g., rearranged variants of each of two or more genes are assembled independently).
  • fragments from different starting nucleic acids e.g., different genes
  • fragments from related genes e.g., encoding protein isoforms
  • gene variants e.g., polymorphic variants or mutants of the same gene
  • aspects of the invention embrace rearranged configurations of specific proteins and/or specific protein domains from related or variant proteins.
  • the invention provides methods to create libraries of rearranged configurations of specific proteins and/or specific protein domains from related or variant proteins.
  • aspects of the invention also embrace libraries of libraries, for example, combinations of multiple protein domain configurations. For instance, if a protein requires a specific element or domain, for example a hydrophobic loop, a library of sequence configurations of hydrophobic loops (e.g., from the same protein or from related or variant proteins) can be inserted as elements of the library of rearranged protein configurations. By also interchanging fragments with similar fragments having variant sequences, the theoretical number of different rearranged nucleic acids is increased.
  • initial nucleic acids e.g., different related nucleic acids or sequence variants of the same nucleic acid
  • n initial fragments based on an initial set of criteria (e.g., exons, structural or functional domains, etc.)
  • each initial fragment can be replaced with an equivalent initial fragment from one of the x different nucleic acids (e.g., from the same relative position amongst the initial fragments, but from a different initial nucleic acid), the number $C_N(r)$ of different combinations of subsets of n predetermined fragments (e.g., exons and/or other predetermined fragments) taken in groups of r, in the same relative order as in an initial gene, may be provided by the following equation: $C_N(r) = x^{r}\,\frac{n!}{r!\,(n-r)!}$
  • the total number of possible permutations, including permutations of all different subsets of r fragments, wherein any of x different variants may be interchanged for each fragment, may be provided by the following equation: $\sum_{r=1}^{n} x^{r}\,\frac{n!}{(n-r)!}$
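The effect of interchangeable sequence variants on library size can be estimated numerically. This sketch assumes that each of the r retained fragments can independently be taken from any of x initial nucleic acids, which contributes a factor of x**r; the function names are hypothetical.

```python
from math import comb, perm

def subsets_with_variants(n, r, x):
    """Order-preserving subsets of r fragments from n, each drawn from any of x sources."""
    return comb(n, r) * x ** r

def total_subset_permutations_with_variants(n, x):
    """Orderings of every non-empty subset of n fragments, each drawn from any of x sources."""
    return sum(perm(n, r) * x ** r for r in range(1, n + 1))

three_of_four_two_sources = subsets_with_variants(4, 3, 2)
two_fragments_two_sources = total_subset_permutations_with_variants(2, 2)
```

With x = 1 both functions reduce to the single-gene counts. With two fragments and two sources there are 12 constructs: 4 single-fragment constructs plus 2 orders times 4 source combinations for the two-fragment constructs.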
  • sequence variants may be interchanged for different fragments being rearranged. Certain fragments may be retained and/or maintained in the same relative order and not interchanged with any other sequence variants. However, other fragments may be interchanged with different numbers of sequence variants in addition to being rearranged. For example, certain fragments may be interchanged with any one of x' different related structural fragments (e.g., encoding similar structural domains). Similarly, certain fragments may be interchanged with x'' different related functional fragments (e.g., encoding similar functional domains). In some embodiments, x' may equal x'' (e.g., if the different structural and functional domain variants are from the same group of related genes or proteins).
  • x' may be greater than x'' or x'' may be greater than x'.
  • the number of sequence variants that are interchanged for any one fragment may be independent of the number of sequence variants exchanged for any other fragment.
  • two or more different fragments representing different structural and/or functional regions of a nucleic acid may be interchanged with different numbers of sequence variants. If the number of sequence variants that can be interchanged for each fragment is different, the theoretical total number of possible different nucleic acids may be calculated by replacing x^r in Equations 17-22 with the product of the different numbers of sequence variants that are used for each of the fragments being included in the assembled nucleic acids. As discussed above, it should be appreciated that these calculations are useful to determine a theoretical number of different rearranged nucleic acids that may be required for a library to be representative of the full range of possible variants. However, one or more predetermined sequences may be excluded (e.g., initial sequences, certain short sequences, etc., may be excluded). In certain embodiments, a library may be designed or assembled to contain only a fraction of the possible rearranged nucleic acids as described herein.
  • Suitable expression vectors can be contemplated. These include a number of expression vector plasmid constructs, containing one or more promoter elements capable of driving the transcription of the gene or genetic element of interest.
  • Suitable host cells include but are not limited to: bacterial (e.g., E. coli), yeast, insect cells, and mammalian cells.
  • the invention extends to such cells expressing the vector, either transiently or stably, that contain the nucleic acid sequence of interest as defined in the invention, operatively linked to appropriate transcription/translation systems.
  • any suitable vector (e.g., plasmid, BAC, YAC, viral vector, etc.)
  • one or more variants may be encoded on the genome of a host cell or organism.
  • genes encoding polypeptide variants may be clustered within one or a few (e.g., 2, 3, 4, or 5) genetic regions (e.g., plasmid, genomic regions, chromosomes, etc.), organized on one or a few (e.g., 2, 3, 4, or 5) operons, or distributed across many genetic regions or operons (e.g., 6-10 or more).
  • a host cell may be a unicellular organism (e.g., a bacterial or yeast cell).
  • a host cell may be a cell obtained from a multicellular organism but grown in culture (e.g., a mammalian cell grown in culture).
  • a host organism may be a multicellular organism. Examples of multicellular organisms include animals and plants, e.g., mammals, insects, reptiles, fish, birds, land plants, aquatic plants, agricultural plants, monocotyledonous and/or dicotyledonous plants, etc. The type of host chosen may depend on the application.
  • cell systems may express suitable splicing enzymes. Different species have different levels of splicing activity. Even different cell types within a species may have different levels of splicing activity. Accordingly, a suitable splicing host cell may be used (e.g., in culture) and transformed with one or more constructs designed to produce a library of engineered splice variants.
  • a host cell may be engineered to have a modified genome that is suited to the expression of variant polypeptides.
  • a host cell may be engineered to have a reduced genome size (e.g., a genome that is smaller by 10%, 20%, 30%, 40%, 50%, or more).
  • a host cell may be adapted to accommodate genetic elements encoding one or more engineered genes of interest.
  • a host cell may be engineered to encode one or more functions for importing (e.g. substrates), synthesizing, or exporting (e.g. products) metabolites, proteins, or other molecules that are useful to assay for the functional and/or structural properties of interest.
  • a host cell may be engineered to encode one or more membrane-bound transporters (e.g., pumps).
  • a host cell may also be engineered to improve growth rate and/or viability in unnatural environments, to detect the presence of a molecule in its environment, to communicate with other cells, to self-organize into patterns, to propagate or die under defined conditions, to act as a scaffold for extracellular synthesis of materials, or to degrade substances in its environment such as environmental contaminants or pathogens.
  • in vitro expression system may be used.
  • in vitro transcription and translation techniques are widely known and available in the art.
  • Expressed RNAs and polypeptides and screening and selection methods:
  • Libraries and methods described herein are particularly useful for expressing and assaying polypeptide variants. Any suitable assays may be used to determine the characteristics of expressed RNA and/or polypeptide variants and isolate one or more candidate variants having predetermined levels of one or more structural and/or functional activities.
  • nucleic acids encoding polypeptide variants may be fused to a detection moiety (e.g., a reporter molecule) and/or a purification tag.
  • a tag may include a label, probe, myr, an affinity-based tag, a charge-based tag, His, HA, AP, GST, Flag, biotin, HRP, GFP, myc, Glut, MBP, CBP, Chitin, etc.
  • Common epitope tags that may be used include c-myc, HA, FLAG, and/or V5.
  • Candidate therapeutic compounds that are isolated based on one or more structural and/or functional properties may be tested in one or more clinical assays or trials to determine their therapeutic potential.
  • Novel therapeutic compounds can be based on rearrangements of initial genes that are known to encode a therapeutic compound.
  • An initial gene may encode a hormone, a growth factor, a therapeutic antibody, a receptor, a peptide ligand, or other therapeutic polypeptide.
  • An initial gene may be a genomic gene, a cDNA, a human gene, a non-human gene, a recombinant gene, a modified gene or any other suitable gene.
  • Non-limiting examples of therapeutic polypeptides that can be varied and analyzed as described herein include calcitonin, insulin, insulinotropin, insulin-like growth factors, parathyroid hormone, nerve growth factors, TGF-β, tumor necrosis factor, glucagon, bone growth factor-2, bone growth factor-7, TSH-β, interleukin 1, interleukin 2, interleukin 3, interleukin 6, interleukin 11, interleukin 12, CSF-macrophage, immunoglobulins, catalytic antibodies, protein kinase C, superoxide dismutase, tissue plasminogen activator, urokinase, antithrombin III, DNase, tyrosine hydroxylase, blood clotting factor V, blood clotting factor VII, blood clotting factor VIII, blood clotting factor X, blood clotting factor XIII, apolipoprotein E, apolipoprotein A-I, globins, low density lipoprotein receptor, IL-2 receptor, IL
  • candidate agricultural, industrial or other polypeptides may be isolated and assayed to determine their potential under appropriate conditions.
  • aspects of the invention may be used to generate libraries of nucleic acid sequence variants enriched for peptide-expressing constructs having one or more desired properties.
  • Certain libraries represent one or more types of gene fragment rearrangements based on a gene encoding a therapeutic polypeptide. Accordingly, aspects of the invention relate to marketing the methods, compositions, kits, devices, and systems described herein for generating nucleic acid libraries of rearranged genes. These may be used for discovering a novel class of therapeutic products, increasing patient access to a wider range of therapeutic products, and also decreasing cost and time for approval and market access for novel therapeutic products.
  • aspects of the invention may be useful for reducing the time and/or cost of production, commercialization, and/or development of a range of new gene products in addition to new therapeutic products. Accordingly, aspects of the invention relate to business methods that involve collaboratively (e.g., with a partner) or independently marketing one or more methods, kits, compositions, devices, or systems for analyzing and/or assembling libraries and identifying novel polypeptides and genes encoding them. For example, certain embodiments of the invention may involve marketing a procedure and/or associated devices or systems involving techniques and assays described herein. In some embodiments, synthetic nucleic acids, libraries of synthetic nucleic acids, host cells containing synthetic nucleic acids, expressed polypeptides or proteins, etc., also may be marketed.
  • Marketing may involve providing information and/or samples relating to methods, kits, compositions, devices, and/or systems described herein.
  • Potential customers or partners may be, for example, companies in the pharmaceutical, biotechnology and agricultural industries, as well as academic centers and government research organizations or institutes.
  • Business applications also may involve generating revenue through sales and/or licenses of methods, kits, compositions, devices, and/or systems of the invention.
  • the average human gene is 28,000 nucleotides long and consists of 8.8 exons of about 120 nucleotides in length.
  • the exons of an average human gene are separated by 7.8 introns ranging from 100 to 100,000 nucleotides long. It should be appreciated that these numbers are averages and that the numbers of exons and introns in actual genes are integer-valued (whole numbers). Accordingly, a representative gene may have 9 exons separated by 8 introns. However, a representative gene may be subdivided into more than 9 different fragments based on other criteria such as structural domains, functional domains, repeat regions, one or more homologies with other genes or consensus sequences, etc., or any combination thereof.
  • a library may be assembled using one or more multiplex assembly techniques described herein to include primarily sequence variants of interest. The following paragraphs provide theoretical numbers of different rearranged nucleic acids based on alternative predetermined configurations of interest.
  • a library may be assembled to include a number of independent constructs that is several fold higher (e.g., 2 fold, 5 fold, 10 fold, or more) than the theoretical total number of different rearranged nucleic acids.
  • different subsets of the 9 exons of a representative gene may be assembled in different synthetic genes (e.g., excluding at least one exon and one or more or all of the introns). If the same relative order of exons is maintained in the synthetic genes as in the original gene, the number of different combinations of subsets of the 9 exons is provided by Equation 1 (where r is the number of exons in the synthetic gene):
  • a library of nucleic acid constructs must contain at least 510 (at least 511 if the original gene or a vector with no exons is included in the library, for example as a control, or at least 512 if the original gene and the vector with no exons are both included in the library, for example, as controls).
  • a library may be designed to consist of all or most of the different predetermined rearranged nucleic acids and be assembled to include at least about 1,000; at least 2,000; or at least 5,000 independent constructs in different embodiments.
  • the relative order of the 9 exons of a representative gene is not maintained and synthetic genes are made with any relative order of exons (including the original relative order)
  • the number of different permutations of the 9 exons is provided by Equation 8:
  • If synthetic genes are assembled having subsets of r exons out of the original 9 exons, and the relative order of the exons is not maintained, the number of different permutations of the r exons is provided by Equation 10:
  • a library of nucleic acid constructs must contain at least 986,409 (at least 986,410 if a vector with no exons is included in the library, for example as a control). Accordingly, a library may be designed to consist of all or most of the different predetermined rearranged nucleic acids and be assembled to include at least about 1,000,000; at least 2,000,000; at least 5,000,000 or at least 10,000,000 independent constructs in different embodiments.
  • exons from different isoforms of the human growth hormone gene may be interchanged.
  • Five isoforms of the human growth hormone gene each have five exons.
  • Different subsets of the 5 exons may be assembled in different synthetic genes (e.g., excluding at least one exon and one or more or all of the introns).
  • Five different synthetic genes are assembled for each exon that is included in each configuration in order to represent each of the initial exon isoforms in different configurations in the assembled library. If the same relative order of exons is maintained in the synthetic genes as in the original genes, the number of different combinations of subsets of the 5 exons accounting for the 5 isoforms of each exon is provided by Equation 17 (where r is the number of exons in the synthetic gene and 5 is the number of different isoforms for each exon):
  • the total number of different possible rearranged genes having subsets of between 1 and 4 of the original 5 exons maintained in the same relative order and accounting for 5 different isoforms for each exon is 4,650.
  • a library of nucleic acid constructs must contain at least 4,650 (at least 4,651 if a vector with no exons is included in the library, at least 4,655 if the original 5 isoforms are included in the library, or at least 4,656 if the original isoforms and the vector with no exons are both included in the library, for example, as controls).
  • a library may be designed to consist of all or most of the different predetermined rearranged nucleic acids and be assembled to include at least about 10,000; at least 20,000; or at least 50,000 independent constructs in different embodiments.
  • the number of different permutations of the 5 exons, accounting for the 5 isoforms of each exon, is provided by Equation 20:
  • If synthetic genes are assembled having subsets of r exons out of the original 5 exons, and the relative order of the exons is not maintained, the number of different permutations of the r exons accounting for 5 isoforms of each exon is provided by Equation 21:
  • a library of nucleic acid constructs must contain at least 458,025 (at least 458,026 if a vector with no exons is included in the library, for example as a control). Accordingly, a library may be designed to consist of all or most of the different predetermined rearranged nucleic acids and be assembled to include at least about 500,000; at least 1,000,000; at least 2,000,000; or at least 5,000,000 independent constructs in different embodiments.
  • an initial coding sequence or gene may be subdivided into additional non-overlapping fragments (e.g., based on secondary structures, functional domains, etc., instead of or in addition to the exon fragments) and additional rearranged nucleic acids may be assembled.
  • a library may contain a combination of rearranged fragment variants based on different sets of predetermined non-overlapping fragments.
  • a therapeutic protein may be encoded by a gene having exons A1, A2, A3, B, and C as shown in FIG. 1B. Examples of four different exon subsets also are shown in FIG. 1B along with a transcript containing all five exons. These exon subsets can be generated by alternative splicing and/or by assembling four different nucleic acid constructs, each encoding one of the exon subsets with no introns. It should be appreciated that additional exon subsets could be generated from this gene.
  • FIG. 2A shows a non-limiting example of a gene with three exons (A, B, and C) separated by two introns (1 and 2).
  • FIG. 2B shows all of the different possible exon configurations that can be generated with the original exons and different exon subsets wherein the relative order of exons is maintained.
  • FIG. 2C shows all of the possible new exon permutations that can be generated with all of the original exons (the original configuration of the three exons is not shown in FIG. 2C).
  • aspects of the invention may involve one or more nucleic acid assembly reactions in order to make gene fragments, constructs containing rearranged gene fragments, modified host cells, and/or other nucleic acids that may be used to generate biological diversity (e.g., introns or other recombination sequences) and screen or select for one or more functions of interest.
  • aspects of the invention involve assembling nucleic acids that contain one or more gene fragments. Aspects of the invention involve assembling nucleic acids that can be used to modify the genome of a host cell. For example, the genome of a host cell may be reduced in size (e.g., by 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more) in order to accommodate nucleic acids that encode different configurations of gene fragment rearrangements. Nucleic acids of the invention may be assembled using any suitable method including a combination of one or more ligation, recombination, or extension reactions. Multiplex nucleic acid assembly reactions may be used to assemble one or more nucleic acid components.
  • Multiplex nucleic acid assembly relates to the assembly of a plurality of nucleic acids to generate a longer nucleic acid product.
  • multiplex oligonucleotide assembly relates to the assembly of a plurality of oligonucleotides to generate a longer nucleic acid molecule.
  • nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.)
  • a multiplex assembly reaction (e.g., along with one or more oligonucleotides)
  • an assembled nucleic acid molecule that is longer than any of the single starting nucleic acids (e.g., oligonucleotides) that were added to the assembly reaction.
  • one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions may be combined and assembled to form a further nucleic acid that is longer than any of the input nucleic acid fragments.
  • one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions may be combined with one or more additional nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) and assembled to form a further nucleic acid that is longer than any of the input nucleic acids.
  • a target nucleic acid may have a sequence of a naturally occurring gene and/or other naturally occurring nucleic acid (e.g., a naturally occurring coding sequence, regulatory sequence, non-coding sequence, chromosomal structural sequence such as a telomere or centromere sequence, etc., any fragment thereof or any combination of two or more thereof).
  • a target nucleic acid may have a sequence that is not naturally-occurring.
  • a target nucleic acid may be designed to have a sequence that differs from a natural sequence at one or more positions.
  • a target nucleic acid may be designed to have an entirely novel sequence.
  • target nucleic acids may include one or more naturally occurring sequences, non-naturally occurring sequences, or combinations thereof.
  • multiplex assembly may be used to generate libraries of nucleic acids having different sequences.
  • a library may contain nucleic acids having random sequences.
  • a predetermined target nucleic acid may be designed and assembled to include one or more random sequences at one or more predetermined positions.
  • a target nucleic acid may be a first gene fragment that is combined with other gene fragments to generate libraries of rearranged gene fragments.
  • a target nucleic acid may include a functional sequence (e.g., a protein binding sequence, a regulatory sequence, a sequence encoding a functional protein, etc., or any combination thereof).
  • some embodiments of a target nucleic acid may lack a specific functional sequence (e.g., a target nucleic acid may include only nonfunctional fragments or variants of a protein binding sequence, regulatory sequence, or protein encoding sequence, or any other non-functional naturally-occurring or synthetic sequence, or any non-functional combination thereof).
  • Certain target nucleic acids may include both functional and non-functional sequences.
  • a target nucleic acid may be assembled in a single multiplex assembly reaction (e.g., a single oligonucleotide assembly reaction). However, a target nucleic acid also may be assembled from a plurality of nucleic acid fragments, each of which may have been generated in a separate multiplex oligonucleotide assembly reaction. It should be appreciated that one or more nucleic acid fragments generated via multiplex oligonucleotide assembly also may be combined with one or more nucleic acid molecules obtained from another source (e.g., a restriction fragment, a nucleic acid amplification product, etc.) to form a target nucleic acid.
  • a target nucleic acid that is assembled in a first reaction may be used as an input nucleic acid fragment for a subsequent assembly reaction to produce a larger target nucleic acid.
  • different strategies may be used to produce a target nucleic acid having a predetermined sequence. For example, different starting nucleic acids (e.g., different sets of predetermined nucleic acids) may be assembled to produce the same predetermined target nucleic acid sequence. Also, predetermined nucleic acid fragments may be assembled using one or more different in vitro and/or in vivo techniques.
  • nucleic acids (e.g., overlapping nucleic acid fragments)
  • an enzyme (e.g., a ligase and/or a polymerase)
  • a chemical reaction (e.g., a chemical ligation)
  • in vivo (e.g., assembled in a host cell after transfection into the host cell)
  • each nucleic acid fragment that is used to make a target nucleic acid may be assembled from different sets of oligonucleotides.
  • a nucleic acid fragment may be assembled using an in vitro or an in vivo technique (e.g., an in vitro or in vivo polymerase, recombinase, and/or ligase based assembly process).
  • an in vitro assembly reaction may involve one or more polymerases, ligases, other suitable enzymes, chemical reactions, or any combination thereof.
  • the present invention provides among other things methods for assembling large polynucleotide constructs and organisms having increased genomic stability. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.
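
As a non-limiting illustration, the theoretical library sizes quoted in the passages above (510; 986,409; 4,650; and 458,025) can be re-derived with a short Python sketch. The referenced equations are not reproduced in this text, so the sketch assumes the standard counting formulas: binomial coefficients when the relative order of fragments is maintained, partial permutations when it is not, and a factor of x^r when each of the r included fragments may be any one of x variants. The function names are hypothetical and are not taken from the specification.

```python
from itertools import combinations
from math import comb, factorial, perm, prod


def ordered_subsets(n, x=1, r_max=None):
    """Count subsets of n fragments kept in their original relative order.

    Each included fragment may be any one of x variants. r_max caps the
    subset size (e.g., n - 1 to leave out the full-length configuration).
    """
    r_max = n if r_max is None else r_max
    return sum(comb(n, r) * x**r for r in range(1, r_max + 1))


def permuted_subsets(n, x=1):
    """Count arrangements of every non-empty subset in any relative order."""
    return sum(perm(n, r) * x**r for r in range(1, n + 1))


def ordered_subsets_mixed(variants, r_max=None):
    """Like ordered_subsets, but fragment i has its own count variants[i].

    Assumption: x**r is replaced by the product of the per-fragment
    variant counts of the fragments included in each subset.
    """
    n = len(variants)
    r_max = n if r_max is None else r_max
    return sum(
        prod(chosen)
        for r in range(1, r_max + 1)
        for chosen in combinations(variants, r)
    )


# Representative gene with 9 exons
print(ordered_subsets(9, r_max=8))       # 510 (subsets of 1 to 8 exons, order kept)
print(factorial(9))                      # 362880 (orderings of all 9 exons)
print(permuted_subsets(9))               # 986409 (subsets of 1 to 9 exons, any order)

# 5 exons, 5 isoforms of each exon
print(ordered_subsets(5, x=5, r_max=4))  # 4650
print(permuted_subsets(5, x=5))          # 458025
```

Controls such as the original gene or a vector with no exons are not counted by these functions and would be added separately, which is how the figures 511, 512, 4,651, 4,655, and 4,656 arise.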

Abstract

Certain aspects of the present invention provide informed methods for designing and engineering protein variants and identifying novel functional polypeptides, including novel therapeutic products. Aspects of the invention include preparing libraries of nucleic acid variants that are based on rearrangements of predetermined fragments from an initial gene sequence. The libraries contain a high percentage of nucleic acids that express functional polypeptide variants and/or variants that have at least one predetermined property of interest. The expressed polypeptide variants are evaluated to identify novel products. Aspects of the invention can be used to identify novel classes of polypeptide variants, for example novel classes of therapeutic products. Aspects of the invention also provide medical, pharmaceutical, industrial, agricultural, environmental, and other uses for protein variants.

Description

FRAGMENT-REARRANGED NUCLEIC ACIDS AND USES THEREOF
RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. § 119(e) of United States provisional patent application, serial number 60/874,991, filed December 13, 2006, the contents of which are incorporated herein by reference in their entirety.
FIELD OF THE INVENTION
The invention relates to methods for identifying engineered nucleic acids that encode polypeptide variants having one or more functional and/or structural properties of interest.
BACKGROUND OF THE INVENTION
Natural and recombinant protein products have been developed for a wide range of medical, industrial, and agricultural applications. However, methods for identifying new or improved protein products have typically involved mutagenesis or other methods for generating large numbers of random protein variants that can be sampled by selecting or screening for one or more properties of interest. Large numbers of protein variants are required, because only rare variants have desirable structural or functional properties. Most random variants are either non-functional polypeptides or have unwanted side-effects.
Large numbers of protein variants can be sampled using nucleic acid libraries that express many different protein-encoding sequences. A variety of mutagenesis and molecular biology techniques have been used to prepare the nucleic acid libraries. Different forms of chemical, physical, and/or biological mutagenesis can be used to alter an initial protein-encoding nucleic acid and introduce a range of sequence changes that will encode different polypeptide variants. The number of changes that can be introduced into a starting nucleic acid can be controlled by altering the levels of mutagenesis. Different techniques may be biased towards certain types of mutations. For example, different chemical and physical mutagenesis techniques can result in preferential patterns of nucleotide insertions, deletions, transitions, and/or transversions. Similarly, different host cells can be used to introduce certain types of mutations into the sequence of a starting nucleic acid. However, the resulting mutation patterns remain substantially random, and many different random nucleic acids can be generated with varying degrees of sequence similarity relative to the initial protein-encoding nucleic acid.
Molecular biology techniques also have been used to modify protein-encoding nucleic acid sequences and introduce mutations in relatively random patterns so that large numbers of sequence variants can be sampled. For example, error-prone nucleic acid synthesis or amplification (e.g., error-prone PCR) using low-fidelity polymerases or low-fidelity polymerization conditions can be used to introduce mutations into nucleic acids that are synthesized from an initial protein-encoding template. Alternatively, random nucleotides can be introduced during the chemical synthesis of degenerate oligonucleotides that are designed to be incorporated into protein-encoding nucleic acids. The resulting libraries include relatively random sequences and large numbers of variants need to be sampled to identify candidate polypeptides having one or more properties of interest. As explained above, many of the random variants may encode non-functional proteins or proteins with unwanted side-effects.
In some instances, proteins with altered functions have been generated by making specific sequence changes in a gene or by targeting mutagenesis to a specific sequence region of a gene. For example, specific mutations at predetermined positions in a gene can be made and tested if specific amino acid changes have been identified as candidates for improved or altered protein function. Similarly, regions known to be associated with one or more protein functions can be targeted for mutagenesis in order to identify variants with altered properties. However, it is often difficult to predict which mutations should be made or which regions should be targeted in order to obtain novel or improved protein properties. As computer modeling techniques evolve, new and improved protein variants may be developed in silico based on predictable properties of amino acid sequence changes. However, variant libraries remain potentially useful sources for identifying polypeptides having one or more properties of interest.
Certain variant libraries have been made based on sequence homology information. These libraries sample a smaller number of different sequences, but the variants are expected to have a higher probability of being functional since their sequences are based on combinations of known functional sequence variants. For example, libraries based on domain or exon shuffling have been made by replacing one or more domains or exons of a first gene with one or more homologous domains or exons from a second gene of the same species or with one or more equivalent domains or exons from the same gene of a related species. Different library formats involving orthologous exon shuffling, paralogous exon shuffling, orthologous domain shuffling, or paralogous domain shuffling are reported in Kolkman and Stemmer, Nature Biotechnology, Volume 19, 2001, pages 423-428. In other libraries described in U.S. Patents 6,303,344 and 6,506,603, variant proteins incorporate different combinations of sequences from homologous proteins. Homologous genes are fragmented to generate overlapping fragments and the fragments are mixed and reassembled. Fragment reassembly is based on recombination/hybridization between regions of complementary or partially complementary sequences of overlapping fragments. The reassembled gene variants have the same relative sequence order as the initial genes, but incorporate different combinations of homologous sequences from the homologous genes.
However, these homology based libraries have proved of limited value in identifying improved therapeutic proteins and there remains a need for methods for identifying new and improved protein variants.
SUMMARY OF THE INVENTION
Aspects of the invention relate to nucleic acid and polypeptide libraries and methods for sampling a defined sequence space that represents predetermined rearrangements of one or more known nucleic acid and/or protein sequences. By assaying systematic rearrangements of an initial nucleic acid or protein sequence, new classes of therapeutic products with improved and/or novel properties can be identified based on a starting molecule that may have one or more properties of interest. The invention provides methods for selecting gene fragments and patterns of gene fragment rearrangements that can be used to generate variant libraries. Aspects of the invention also may help identify novel gene variants that are useful for industrial, agricultural, environmental, research, and other applications in addition to novel therapeutic products.
In some aspects, libraries may be designed based on information obtained by analyzing naturally-occurring splice variants of one or more genes of interest. A library may be designed and/or assembled to express different natural splice variants of a gene of interest. In some embodiments, a library may express synthetic splice variants or a combination of natural and synthetic splice variants. As additional natural splice variants are identified or predicted, they may be included in expression libraries in order to screen their functional properties and/or compare them to the properties of previously known variants. Based on the modularity of many proteins, it is expected that naturally-occurring splice variants may provide a pool of related proteins having different domain configurations conferring equivalent functionalities that are sufficiently different to have different relative advantages under a variety of in vivo environments (e.g., in different cell types). Accordingly, a library of natural splice variants may contain one or more variants that have functional and/or structural properties of interest. However, unnatural or synthetic splice variants also may confer functional and/or structural properties of interest. Accordingly, a library may include natural or unnatural splice variants, or a combination thereof.
In one aspect, a library of candidate therapeutic protein variants may be based on different configurations of gene fragments (e.g., different relative orders of fragments and/or different subsets of fragments) from a single gene encoding a known therapeutic protein or candidate therapeutic protein. For example, different subsets of exons and/or different relative orders of exons from an initial gene may be assembled to produce a library of protein variants. In some embodiments, the initial protein may have one or more properties of interest. For example, it may be non-immunogenic. The resulting variant pool will recapitulate a privileged space of immune evasion since the starting protein is non-immunogenic and the sequence rearrangements do not involve introducing mutant and/or exogenous sequences (other than a few novel junction sequences that may be introduced by the rearrangements). In particular, sequence variants can be made without random or targeted mutagenesis thereby avoiding new amino acid sequences that could be immunogenic. Similarly, sequence variants can be made without introducing new sequences from homologous genes that also could generate unwanted immunogenicity.
Accordingly, one aspect of the invention provides libraries and methods for designing and assembling libraries based on reordered fragments from a predetermined gene or protein sequence. In another aspect, the invention provides libraries and methods for designing and assembling libraries based on subsets of fragments (i.e., by omitting one or more fragments) from a predetermined sequence. In some embodiments, a library may include reordered fragments, fragment subsets, and/or different orders of fragment subsets. Accordingly, an individual variant may include one or more reordered fragments in addition to missing one or more fragments of an initial nucleic acid and/or protein.
Aspects of the invention involve selecting gene fragments to be rearranged and/or deleted. In some embodiments, libraries are assembled to include splicing (or other recombination) sequences that can be used to generate a pool of predetermined variants. For example, one or more intron sequences that can promote alternative patterns of intron splicing can be included in a library construct. In other embodiments, libraries are assembled without introns or other recombination sequences. The libraries may be designed to include different constructs representing the plurality of predetermined fragment rearrangements. The fragments may be exons, exon portions, functional domains, structural domains, secondary structural motifs, other fragments (e.g., homology motifs, etc.), or any combination thereof. In some embodiments, a library may contain known splice variants identified from transcriptomics. For example, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or all of the rearranged gene configurations in a library may correspond to known natural splice variants. In some embodiments, a library may contain at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 99%, or all of the known natural splice variants. Aspects of the invention may be used to screen variants of a therapeutic polypeptide and identify new and/or improved therapeutic proteins. However, aspects of the invention also may be used to identify new and/or improved industrial, agricultural, or other useful polypeptides.
Aspects of the invention are useful for generating protein variants with one or more properties that are related to a parent protein. By generating new combinations of fragments from a single protein, new functional properties, new combinations of functional properties, new subsets of functional properties, modified functional properties, or any combination thereof, may be generated without introducing new sequences from a different gene, protein, or species. Variant proteins are expected to share many of the properties of a parent protein. Accordingly, variant proteins of the invention may be biosimilar to a parent protein (e.g., to a parent therapeutic protein). For example, variant proteins may have similar immunogenicity profiles to those of a parent therapeutic protein. Accordingly, the invention provides methods for selectively designing and constructing polypeptide variants that incorporate a plurality of fragments of a protein in specified combinations and/or specified permutations. The invention extends to nucleic acids that encode such polypeptide variants. The invention further provides nucleic acids operatively linked to appropriate expression systems, e.g., expression vectors, host cells expressing such expression vectors, and uses thereof. Aspects of the invention relate to nucleic acid libraries that express polypeptide variants. Further aspects of the invention relate to methods of identifying one or more polypeptide variants having predetermined structures and/or functions of interest. A library may include naturally occurring and artificial splice variants. A library may include natural and artificial fragment permutations. A library may include at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of all possible rearrangements of predetermined fragments (e.g., exons) of a predetermined protein or gene of interest.
For example, a library may include one or more exon deletions. Some of these exon deletions may correspond to naturally occurring splice variants. However, a library may include at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of all possible exon subsets, and/or exon permutations for a predetermined gene or protein of interest. Theoretical numbers of possible different exon rearrangements can be readily calculated for different configurations of fragment subsets and fragment permutations as described in more detail herein.
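The theoretical library sizes referred to above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: every fragment is treated as distinct, duplications are ignored, and the parental configuration is excluded from each count. The function names are hypothetical and do not come from the specification.

```python
from math import factorial, perm

def count_ordered_subsets(n: int) -> int:
    """Deletion variants that keep the parental order (1 to n-1 fragments deleted)."""
    return 2 ** n - 2  # all subsets minus the empty set and the intact parent

def count_full_permutations(n: int) -> int:
    """Reorderings that use all n fragments, excluding the parental order."""
    return factorial(n) - 1

def count_subset_permutations(n: int) -> int:
    """Every ordering of every non-empty subset, excluding the parent."""
    return sum(perm(n, k) for k in range(1, n + 1)) - 1

if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, count_ordered_subsets(n), count_full_permutations(n),
              count_subset_permutations(n))
```

For a four-exon gene these functions give 14 order-preserving deletion variants and 23 reorderings of the full exon set, which agrees with the ABCD examples enumerated elsewhere in this description.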
In some embodiments, protein variants may have rearranged functional domains. The identity of the functional domains may be based on homology and/or functional studies. For example, a library may include at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of all possible rearrangements of functional domains identified for a predetermined gene or protein of interest. Other fragment rearrangements also may be prepared. Accordingly, a library may include at least 1%, at least 5%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, or at least 99% of all possible rearrangements of a plurality of predetermined fragments of a gene or protein of interest. Aspects of the invention are useful for screening, identifying, making and/or using polypeptide variants having one or more desirable features, including functional features, structural features, physicochemical profiles, pharmacokinetic profiles, etc., or any combination thereof. Thus, the invention is useful for making and using variants that elicit changes in one or more features which include but are not limited to: transcription, translation, expression, folding, solubility, stability, post-translational modifications, activities, size, charge, localization, degradation, antigenicity, immunogenicity, toxicity, efficacy, affinity, specificity, extractability, etc., or any combination of two or more thereof. The instant invention contemplates, in one aspect, a method of designing a nucleic acid library. Certain methods involve identifying a plurality of gene segments that encode non-overlapping polypeptide domains of a predetermined protein and assembling a plurality of nucleic acids, wherein each nucleic acid comprises the plurality of gene segments, and wherein the relative order of the gene segments is different in each nucleic acid.
In some embodiments, each nucleic acid comprises a subset of the plurality of gene segments, excluding at least one of the plurality of gene segments.
As disclosed herein, the invention provides a method of making polypeptide variants comprised of a discrete set of domains such that each fragment represents a portion of a predetermined protein. According to one aspect of the invention, a fragment is encoded by an exon of the gene encoding the predetermined protein.
The invention further contemplates incorporating these methods, compositions, and uses into business applications. For example, business methods for identifying, selecting, screening, and marketing commercially relevant polypeptide and nucleic acid variants and/or related therapeutic, diagnostic, or industrial products are included in the invention. Aspects of the invention may be particularly useful for providing novel therapeutic compounds, accelerating patient access to novel compounds, reducing development costs, and simplifying regulatory approval of novel compounds. In some embodiments, the properties of existing therapeutic polypeptides may be improved or altered. However, methods of the invention also may be used to rescue failed therapeutic candidate compounds by identifying novel variants with favorable production and/or clinical characteristics.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates non-limiting embodiments of predetermined genes showing initial intron/exon configuration and splicing patterns - exons are represented as A, B, C, and D in FIG. 1A and A1, A2, A3, B, and C in FIG. 1B - introns are represented as 1, 2, and 3 in FIG. 1A, and as thick lines connecting A3 to B and B to C in FIG. 1B; and
FIG. 2 illustrates a non-limiting embodiment of an initial gene having exons A, B, and C and introns 1 and 2 in FIG. 2A, FIG. 2B illustrates different subset combinations of exons A-C, and FIG. 2C illustrates different permutations of exons A-C.
DETAILED DESCRIPTION OF THE INVENTION
Accordingly, aspects of the invention relate to novel polypeptides and intelligent methods for designing and identifying improved therapeutic polypeptides. The invention provides methods for designing new gene sequence configurations and related polypeptide sequence configurations based on rearrangements of initial genes and proteins rather than by introducing mutations or homologous sequences from exogenous sources.
Aspects of the invention are based, at least in part, on the recognition that biological systems have evolved genetic elements such as exons and introns that can be spliced into several alternative configurations (e.g., via alternative splicing) that can provide related, but varied, functional properties. Aspects of the invention extend the concept of alternative splicing to provide a systematic sampling of a predetermined sequence space defined by different types of rearrangements of nucleic acid fragments derived from an initial gene. Aspects of the invention provide methods for identifying new classes of therapeutic proteins. By providing a plurality of different gene rearrangements based on an initial gene encoding a known RNA or protein, a library can be assembled to express a plurality of RNAs or polypeptides that are enriched for variants that share one or more properties of the initial molecule. Examples of properties that can be retained by genetic rearrangement include solubility, stability, immunogenicity profiles (e.g., low for a therapeutic polypeptide, higher for a vaccine), specific catalytic activities, specific binding affinities, or other properties that represent desirable features of the initial molecule. However, by sampling different genetic rearrangements, novel properties can be uncovered. For example, certain variants may retain the positive features of an initial molecule while removing one or more negative traits (e.g., tissue toxicity, environmental toxicity, excessively rapid or slow serum clearance rates in a patient, unwanted side-effects, lack of substrate specificity, secondary catalytic activities unrelated to the activity of interest, etc.). Also, in some embodiments novel functional and/or structural properties may be created by rearranging sequence fragments. 
Accordingly, existing therapeutic proteins may be improved, novel therapeutic classes may be identified, and certain therapeutic candidates that have failed in clinical trials may be rescued. For example, a rearranged gene library based on a gene that expresses a failed therapeutic candidate may be subjected to one or more screens or selections to identify variants of the failed candidate drug that either have increased activity or reduced negative properties (e.g., reduced immunogenicity, reduced toxicity, etc.). Aspects of the invention also may be used to sample and assay genetic rearrangements that express one or more variant therapeutic RNAs. However, aspects of the invention also may be used to produce libraries of genes encoding RNAs and/or polypeptides that are useful for agriculture, industry, environmental applications, research, etc.
Libraries of the invention may be assembled using multiplex nucleic acid assembly techniques described herein. In some embodiments, the sequences of the nucleic acid fragments encoding predetermined polypeptide fragments may be modified without affecting the polypeptide sequences. The nucleic acid sequences may be modified to optimize nucleic acid assembly, stability, and/or expression. For example, certain repeat sequences may be altered or removed (e.g., by introducing one or more degenerate codons without altering the encoded amino acid sequence) to reduce incorrect assembly and/or to stabilize the assembled nucleic acids. In some embodiments, codons may be optimized for expression in a particular host cell (e.g., by removing one or more species-specific rare codons) without altering the encoded amino acid sequence.
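The idea of changing codons without changing the encoded polypeptide can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the function names, the set of codons to avoid, and the preferred replacement codon are hypothetical placeholders chosen for illustration, and a real design would take them from a codon-usage table for the chosen host.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

def recode(dna: str, avoid: set, preferred: dict) -> str:
    """Swap each codon in `avoid` for the synonymous codon in `preferred`.

    `avoid` (codons to remove) and `preferred` (amino acid -> codon) are
    assumptions supplied by the caller, not values from the specification.
    """
    protein = translate(dna)  # also validates length and alphabet
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        if codon in avoid:
            replacement = preferred[CODON_TABLE[codon]]
            if CODON_TABLE[replacement] != CODON_TABLE[codon]:
                raise ValueError("preferred codon is not synonymous")
            codon = replacement
        out.append(codon)
    recoded = "".join(out)
    # The encoded amino acid sequence must be unchanged.
    assert translate(recoded) == protein
    return recoded

if __name__ == "__main__":
    # Hypothetical example: replace two arginine codons with CGT.
    print(recode("ATGAGAAGGTAA", {"AGA", "AGG"}, {"R": "CGT"}))
```

The same check (translate before and after, compare) applies equally to edits made to break up repeat sequences for assembly.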
The fragments that are rearranged to generate libraries of nucleic acid variants may be exons, exon portions, functional domains, structural domains, secondary structural motifs, other fragments (e.g., homology motifs, etc.), or any combination thereof.
Exon rearrangements:
Certain natural protein variants are generated through alternative splicing of a gene that includes exons and introns. For example, a eukaryotic gene may comprise one or more exons and one or more introns. FIG. 1A illustrates a non-limiting example of a gene with four exons illustrated as A, B, C, and D, separated by introns 1, 2, and 3. An exon is a region of DNA within a gene that is transcribed and retained in a final messenger RNA (mRNA) molecule. In contrast, an intron refers to a non-coding intervening region in a gene that is precisely removed from an RNA transcript by a process termed RNA splicing, or RNA processing. Thus, RNA splicing is a process that removes introns and joins exons from a primary transcript (or pre-mRNA) to form a mature mRNA (transcript). A non-limiting example of a mature mRNA is shown in FIG. 1A containing exons A, B, C, and D. In many genes, each exon contains part of the open reading frame (ORF) that codes for a specific portion of a complete protein. In some cases, exons form all or part of the 5' untranslated region (5' UTR) or the 3' untranslated region (3' UTR) of each transcript. The untranslated regions are important for efficient translation of the transcript and for controlling the rate of translation and half life of the transcript. Furthermore, transcripts made from the same gene may not have the same exon structure since parts of the mRNA could be removed by the process of alternative splicing. Some mRNA transcripts have exons with no ORFs and thus are sometimes referred to as non-coding RNA.
The process of splicing is catalyzed by a large RNA-protein complex known as a spliceosome. The spliceosome is composed of five small nuclear ribonucleoproteins (snRNPs). In addition to snRNPs, splicing requires many non-snRNP protein factors. The RNA components of snRNPs interact with the intron and may be involved in catalysis. Two types of spliceosomes have been identified (the major and minor) which contain different snRNPs. For example, the major spliceosome splices introns containing GU at the 5' splice site and AG at the 3' splice site. It is composed of the U1, U2, U4, U5, and U6 snRNPs. U1 binds to the 5' splice site. U2 binds to the branch site. U4 inhibits U6. U5 binds to U1 and U2 to create the lariat. U6, when activated, displaces U1 and binds U2. U2-U6 forms an active catalytic complex.
The minor spliceosome is very similar to the major spliceosome. However, it splices rare introns with different splice site sequences. Here, the 5' and 3' splice sites are AU and AC, respectively. While the minor and major spliceosomes contain the same U5 snRNP, the minor spliceosome has different, but functionally analogous snRNPs for U1, U2, U4, and U6, which are respectively called U11, U12, U4atac, and U6atac. Most introns start from a GU sequence and end with an AG sequence (in the 5' to 3' direction). They are referred to as the splice donor and splice acceptor sites, respectively. However, the sequences at the two sites are not sufficient to signal the presence of an intron. Another important sequence is called the branch site located 20 - 50 bases upstream of the acceptor site. The consensus sequence of the branch site is "CU(A/G)A(C/U)", where A is conserved in all genes. In over 60% of cases, the exon sequence is (A/C)AG at the donor site, and G at the acceptor site.
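The sequence features just described (GU donor, AG acceptor, and a CU(A/G)A(C/U) branch site 20 - 50 bases upstream of the acceptor) can be expressed as a simple pattern check. The Python sketch below is a rough screen only, consistent with the caveat that these sequences alone are not sufficient to define an intron. The function name is hypothetical, and measuring the distance from the conserved branch-site A to the first base of the terminal AG is an assumption about how the 20 - 50 base window is counted.

```python
import re

# Zero-width lookahead so that overlapping branch-site matches are all found.
BRANCH_SITE = re.compile(r"(?=CU[AG]A[CU])")

def looks_like_major_intron(intron: str) -> bool:
    """Return True if the sequence has GU...AG ends and a plausible branch site.

    The distance convention (conserved A to the 'A' of the terminal AG)
    is an assumption made for this sketch.
    """
    rna = intron.upper().replace("T", "U")
    if not (rna.startswith("GU") and rna.endswith("AG")):
        return False
    acceptor_start = len(rna) - 2          # index of the 'A' in the terminal AG
    for match in BRANCH_SITE.finditer(rna):
        branch_a = match.start() + 3       # conserved A within CU(A/G)A(C/U)
        if 20 <= acceptor_start - branch_a <= 50:
            return True
    return False

if __name__ == "__main__":
    candidate = "GU" + "C" * 10 + "CUAAC" + "U" * 25 + "AG"
    print(looks_like_major_intron(candidate))
```

A check of this kind could be used, for example, to flag engineered introns whose donor, acceptor, or branch sequences have been altered away from the consensus.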
Splicing occurs in a two-step biochemical process. Both steps involve transesterification reactions that occur between RNA nucleotides. First, a specific branch-point nucleotide within the intron reacts with the first nucleotide of the intron, forming an intron lariat. Second, the last nucleotide of the first exon reacts with the first nucleotide of the second exon, joining the exons and releasing the intron lariat.
Alternative splicing is a process that occurs in eukaryotes in which the splicing process of a pre-mRNA transcribed from one gene can lead to different mature mRNA molecules and therefore to different proteins. There are four known modes of alternative splicing: i) Alternative selection of promoters: this is the only method of splicing which can produce an alternative N-terminus domain in proteins. In this case, different sets of promoters can be spliced with certain sets of other exons. ii) Alternative selection of cleavage/polyadenylation sites: this is the only method of splicing which can produce an alternative C-terminus domain in proteins. In this case, different sets of polyadenylation sites can be spliced with the other exons. iii) Intron retaining mode: in this case, instead of splicing out an intron, the intron is retained in the mRNA transcript. However, the intron sequence must be coding and properly expressible, otherwise a stop codon or a shift in the reading frame will cause the protein to be non-functional. iv) Exon cassette mode: in this case, certain exons are spliced out to alter the sequence of amino acids in the expressed protein.
Intron-containing libraries:
One aspect of the invention is a library of gene variants based on splice variants. In some embodiments the library is based on natural splice variants. Natural splice variants are splice variants commonly found in RNA transcripts. In some embodiments the library is based on unnatural splice variants, which are based on an extension of naturally occurring alternative splicing to produce splice variants that form junctions between exons that are not directly spliced together in natural systems. In some embodiments the library is a combination of natural and unnatural splice variants. In some embodiments, a library of gene variants is designed to include altered introns that will promote alternative splicing patterns that are not found in nature. For example, different constructs in a library may be engineered to include altered donor, acceptor, and branch sites in one or more introns of the naturally occurring gene. The splice sites may be altered to either increase or decrease splicing between different exons, thereby generating different patterns of splicing. Sequence alterations may be based on conserved or consensus sequences or known intron sequences that are either efficiently or inefficiently spliced. Sequence alterations also may be introduced into different constructs to promote alternative promoter selection and/or alternative sites of polyadenylation. Accordingly, a library can be generated with a plurality of different constructs that are predicted to produce a plurality of alternative splice variants, some of which may be natural splice variants, some of which may be novel splice variants. In some embodiments, a library may be designed so that at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the splice variants are predicted to be non-natural splice variants.
In some embodiments, at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the constructs in the library include the original exons separated by intervening introns, but at least one of the introns comprises a modified donor, acceptor, and/or branch site. In some embodiments, at least one of the introns is an exogenous intron that is introduced to replace one of the original introns.
In some embodiments, the relative order of the exons in one or more constructs may be changed and the intervening introns may be the original introns or may be modified as described herein. In some embodiments, one or more exons may be omitted from some of the constructs in the library. In certain embodiments, different constructs in a library may contain modified introns, novel orders of two or more exons, one or more omitted exons, or any combination of two or more thereof.
In certain embodiments, a host cell is engineered to include constructs with alternative splicing configurations (e.g., expressed from a construct on a vector) and grown under conditions that allow and/or promote splicing. In some embodiments, a plurality of different cells, each with a different set of altered splice domains, is designed and engineered. Each cell can give rise to a single splice variant. However, one or more cells may give rise to two or more different splice variants. Desirable properties can be selected using any suitable assay. Cells expressing a desirable variant (e.g., either as a single splice variant or as one of a few splice variants) can be isolated and the desirable splice variant can be identified. In some embodiments, a nucleic acid encoding a desirable splice variant, once identified, can be synthesized as a single uninterrupted coding sequence without any introns. In some embodiments, a nucleic acid can be designed and assembled such that it will produce a single splice variant corresponding to an identified splice variant having one or more desirable properties.
In certain embodiments, host cells may be recombinant cells that are engineered to express altered levels of one or more splicing factors (e.g., enzymes or nucleic acids) and/or one or more altered splicing factors (e.g., enzymes or nucleic acids) having altered levels of activity. For example, an engineered host cell may over-express one or more wild-type or altered splicing factors in order to increase the overall level of splicing activity (e.g., to increase the rate and/or level of intron removal from RNA transcripts). In some embodiments, a host cell may over-express one or more wild-type or altered splicing factors in order to alter the pattern of intracellular splicing (e.g., to alter the relative rate and/or pattern of intron removal from RNA transcripts). In some embodiments, a host cell may be engineered to increase exon-skipping. In certain embodiments, a host cell may be engineered to decrease exon-skipping. Trans-acting nucleic acids and proteins that can increase or otherwise alter splicing rates and/or splicing patterns are known in the art. For example, US Patent No. 7,049,133, the entire contents of which are incorporated herein by reference, describes certain trans-acting enzymes (e.g., protein kinases) that can act on one or more cis-acting nucleic acids to effect intron removal. Examples of cis-acting sequences and trans-acting factors that can be used to increase or decrease exon skipping include, for example, exon sequences and serine/arginine-rich splicing factors that bind to certain exon sequences, such as those disclosed in Ibrahim et al., Proc. Natl. Acad. Sci., 2005, vol. 102, no. 14, 5002-5007.
Accordingly, by reducing the expression of certain serine/arginine-rich splicing factors, the amount of exon skipping can be increased in a host cell. However, in some embodiments, the amount of exon-skipping can be kept at low levels in a cell by using a host cell that expresses sufficiently high levels of the serine/arginine-rich splicing factors.
Intron-free libraries:
In some aspects, nucleic acid constructs may be designed to encode different arrangements of exons without the intervening introns. In some embodiments, different constructs encode different subsets of exons arranged in the same relative order as in the naturally-occurring gene. Accordingly, the different subsets of exons correspond to different combinations of one or more exon deletions. A library may be designed to encode different numbers of exon deletions. For example, a library of single exon deletions may be designed. A library containing all combinations of 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 or more different exon deletions may be designed. In some embodiments, a library may be designed to include all combinations of all numbers of exon deletions, wherein the remaining exons are in the same relative order as in the parent gene. In some embodiments, a library may be designed to include all combinations of all numbers of exon deletions with the exception of deletions that result in only 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 or 10 exons, or any combination thereof, remaining in the constructs. It should be appreciated that the design considerations will be different for different genes and proteins, since different intron-containing genes can have a wide range of numbers of introns and exons.
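An order-preserving deletion library of the kind described above can be enumerated directly. The following Python sketch assumes single-letter exon labels and hypothetical function names; the `min_remaining` parameter is an illustrative way to exclude constructs that retain too few exons.

```python
from itertools import combinations

def deletion_variants(exons: str, n_deleted: int) -> list:
    """All variants with exactly `n_deleted` exons removed, parental order kept."""
    keep = len(exons) - n_deleted
    # combinations() preserves the input order, so the remaining exons stay
    # in the same relative order as in the parent gene.
    return ["".join(kept) for kept in combinations(exons, keep)]

def deletion_library(exons: str, min_remaining: int = 1) -> list:
    """All deletion variants that retain at least `min_remaining` exons."""
    library = []
    for n_deleted in range(1, len(exons) - min_remaining + 1):
        library.extend(deletion_variants(exons, n_deleted))
    return library

if __name__ == "__main__":
    print(deletion_variants("ABCD", 1))            # single-exon deletions
    print(len(deletion_library("ABCD")))           # every deletion pattern
    print(deletion_library("ABCD", min_remaining=3))
```

Because the enumeration grows as 2^n, a library for a gene with many exons would in practice be restricted, for example to a maximum number of deletions or a minimum number of retained exons.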
In some embodiments, the relative order of exons may be changed in an assembled nucleic acid library. For example, the relative order of two or more different exons may be changed. In some embodiments, a library may be designed to contain different constructs that represent at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the different relative orders of exons. However, in some embodiments a library may be designed so at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the different constructs contain two or more exons in a different order relative to the original gene.
In some embodiments, libraries may include combinations of exon deletions and exon permutations. In some embodiments, the exon deletions are on different constructs from the exon permutations. In certain embodiments, one or more constructs may contain combinations of one or more exon deletions and/or exon permutations. A library may include some constructs that only contain deletions, some constructs that only contain permutations, and some constructs that contain combinations of deletions and permutations. Libraries may be designed to include at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the different possible combinations of exon subsets and/or exon permutations described herein. In some embodiments, libraries may be designed so that at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or all of the constructs contain an exon subset and/or an exon permutation described herein. Certain libraries may have predetermined patterns of exon deletions and/or exon permutations. For example, most (e.g., more than 50%, more than 75%, more than 80%, more than 90%, etc.) of the exon deletions and/or exon permutations may be located in the 5' half, 3' half, or between the 5' quarter and the 3' quarter of the gene. The half and quarter measures of the gene may be made with reference to the number of exons and introns rather than being based on nucleic acid lengths. These may be different since the relative lengths of different exons and introns may be very different.
Aspects of the invention also contemplate adding one or more exons (e.g., by duplicating one or more exons of the original gene). Accordingly, the invention includes embodiments comprising methods of designing and making peptide variants by deleting and/or adding one or more exons. In some cases, the nucleic acid encoding a first variant comprises a subset of all available exons of a gene, and the nucleic acid encoding a second variant comprises a different subset of all available exons of the gene, and so on, such that each variant is defined by a unique combination, or assortment, of exons of a gene, rather than by the relative order by which the exons are arranged. Accordingly, in some embodiments, one or more exons are deleted, or absent. The following example illustrates a gene comprised of four exons, A, B, C, and D, that yields a transcript expressed as ABCD. According to this model, any one of the four exons may be deleted while retaining the relative order of exon arrangement to generate the following variants: ABC, ABD, ACD, BCD. Thus, variants that result from exon assortment may include partially or substantially overlapping sequences. For example, a first variant may differ from a second variant by one exon. Accordingly, two variants may share substantially the same set of exons, with an exception of a single alternative exon. In this scenario, the first variant and the second variant contain an alternative exon, in addition to common exons, such that the former may be expressed as ABC, while the latter may be expressed as ABD. In other embodiments, more than one exon may be absent. Examples of such embodiments are: AB, AC, AD, BC, BD, CD, A, B, C, and D. In some embodiments, one or more exons are repeated, or duplicated. Examples of embodiments in which one or more exons are duplicated include but are not limited to: AABCD, ABBCD, ABCCD, ABCDD, AABBCD, ABBCCD, AABCCD, AABCDD, AABBCCDD, AAABCD, and so on.
In some cases, one or more exons are deleted or missing, and one or more different exons are duplicated. Examples of such variants including both deletions and duplications include but are not limited to: AACD, in which exon A is duplicated and exon B is absent; BBDD, in which exons A and C are absent and exons B and D are each duplicated.
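Variants that combine deletions and duplications can be annotated relative to the parent gene with a small helper. This Python sketch assumes single-letter exon labels with no repeated labels in the parent; the function name and the returned dictionary keys are hypothetical.

```python
from collections import Counter

def describe_variant(parent: str, variant: str) -> dict:
    """Report which parental exons are absent or duplicated in a variant."""
    unknown = set(variant) - set(parent)
    if unknown:
        raise ValueError(f"exons not present in parent: {sorted(unknown)}")
    counts = Counter(variant)
    # Rank each exon of the variant by its position in the parent; the
    # parental relative order is kept if these ranks never decrease.
    ranks = [parent.index(exon) for exon in variant]
    return {
        "absent": [exon for exon in parent if counts[exon] == 0],
        "duplicated": [exon for exon in parent if counts[exon] > 1],
        "parental_order": ranks == sorted(ranks),
    }

if __name__ == "__main__":
    print(describe_variant("ABCD", "AACD"))
    print(describe_variant("ABCD", "BBDD"))
```

Applied to the two examples above, the helper reports exon B absent and exon A duplicated for AACD, and exons A and C absent with B and D duplicated for BBDD.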
In some embodiments, the invention includes generating peptide variants based on exon reordering. As used herein, exon reordering refers to the process and/or outcome of altering the relative order of exons from a natural gene, also referred to herein as exon permutations. Accordingly, embodiments include peptide variants that are encoded by the same set of exons, but are rearranged in various relative orders. For example, a peptide encoded by a gene, fragment thereof or corresponding cDNA comprising four exons, A, B, C, and D, which is expressed as ABCD, may be rearranged in various orders to yield: ABDC, ACBD, ACDB, ADBC, ADCB, BACD, BADC,
BCAD, BCDA, BDAC, BDCA, CABD, CADB, CBAD, CBDA, CDAB, CDBA, DABC, DACB, DBAC, DBCA, DCAB, and DCBA. Thus, each variant includes each of the four exons, A, B, C, and D, but in a different order. Accordingly, it should be appreciated that variants with different relative orders of exons may include variants where the relative order of all exons is changed (e.g., a variant of ABCD having the order DCBA) or variants where the relative order of only a few exons is changed (e.g., a variant of ABCD having the order BACD, where the order of A and B is inverted, but the order of C and D is the same). Similarly, variants having different relative orders of other nucleic acid fragments described herein may include variants where only a few fragments are rearranged relative to each other and variants where most or all fragments are rearranged relative to each other. It should be appreciated that the number of different configurations increases exponentially as the number of different fragments increases as described herein. In yet other embodiments, the invention includes peptide variants having different relative orders of exons as well as different exon combinations (e.g., subsets and/or duplications). Thus, complex libraries of peptide variants may encompass polypeptides generated by both assortment and permutation described above. To illustrate the model, a few such examples are shown below. A parent gene having five exons, A, B, C, D, E, is shown as: ABCDE. If exons D and E each represent an alternatively spliced exon, such that one but not both is expressed in a cell, naturally occurring variants include ABCD, in which exon E is absent, and ABCE, in which exon D is absent. Thus, in the instant example, the parent variant, ABCDE, represents an artificial (non-naturally occurring) isoform comprising each of the five exons contained in the gene or gene fragment. A second variant may further contain a duplication of an exon.
Suppose exon B is duplicated once, based on the same example provided above. This gives rise to: ABBCDE (according to the "parent" or largest representation of unique exons), ABBCD (according to the first of the two naturally occurring alternatively spliced variants), and ABBCE (according to the second of the two alternative variants). Now, rearranging these variants can generate a number of possible variants, including: BABCDE, wherein the first two exons of the ABBCDE variant are swapped; BBCAD, wherein the exons of ABBCD are rearranged; and EBBACA, wherein the ABBCE variant gives rise to exon rearrangement as well as an additional duplication (that of exon A). It should be appreciated that these configurations are for illustrative purposes only and are not intended to be limiting. Many initial genes have larger numbers of exons, and similar patterns of exon rearrangements can be generated involving larger numbers of possible variants.
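The exon reordering examples above can be enumerated with a short sketch. It is offered only as an illustration, assuming single-letter exon labels; the helper name `new_orderings` is hypothetical. Duplicated exons are handled by collecting orderings in a set, so that swapping two copies of the same exon is not counted as a new variant.

```python
from itertools import permutations

def new_orderings(fragments):
    """Return all distinct orderings of `fragments` other than the original,
    sorted alphabetically."""
    original = tuple(fragments)
    distinct = set(permutations(original))  # the set collapses identical orderings
    distinct.discard(original)
    return sorted("".join(p) for p in distinct)

abcd = new_orderings("ABCD")
print(len(abcd), abcd[0], abcd[-1])  # 23 ABDC DCBA

abbcd = new_orderings("ABBCD")       # exon B duplicated once
print(len(abbcd))                    # 59, i.e. 5!/2! - 1
print("BBCAD" in abbcd)              # True
```

The 23 orderings returned for ABCD are the same 23 rearrangements listed above, from ABDC through DCBA.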
It should be appreciated that exon and intron boundaries can be defined experimentally by analyzing the mRNA sequence that is expressed from a genomic locus. For example, a cDNA construct may be sequenced and the location of the intron/exon boundaries may be inferred from the genomic sequences that are absent from the cDNA. However, putative exons and introns also may be identified based on sequence analysis of a genomic locus (e.g., based on the analysis of open reading frame distributions and the identification of putative splice acceptor, donor, and branching sites). Accordingly, synthetic libraries of the invention may be based on rearrangements of known exon sequences and/or predicted exon sequences.
Rearrangements of sequences encoding functional domains, structural domains, secondary structures and other gene fragments:
Certain aspects of the invention relate to rearrangements of non-overlapping gene segments or fragments other than exons and introns as described above. An initial RNA or protein coding sequence may be analyzed and subdivided into a plurality of fragments based on one or more different criteria. In some embodiments, at least one of the fragments may represent a polypeptide domain capable of assuming a discrete module. Such a domain may represent a structural module, a functional module or both. As used herein, the term "module" or "modular" may refer to a segment of primary polypeptide sequence that is discernable at the level of secondary structure. In some circumstances, but not always, a modular domain of a protein represents a functionally independent unit. From a physicochemical point of view, a modular domain of a protein may exhibit relative stability. Accordingly, different coding fragments may be assigned based on different functional and/or structural domains that they encode. The identification of functional and/or structural domains in an encoded RNA or protein may be based on experimental data, sequence homology, structural data, RNA modeling, protein modeling, computer-implemented structural or folding models, or any other source of actual or predicted functional or structural information. In some embodiments, one or more coding fragments may be based on different secondary structure motifs, for example, the presence of sequences predicted or known to form alpha helices, beta sheets, or other secondary structural motifs. In certain embodiments, one or more coding fragments may be based on the presence of sequences known or predicted to form certain tertiary structures. In some embodiments, one or more coding fragments may be based on their known or predicted interactions with other RNA or protein subunits (e.g., in hetero- or homo-multimeric nucleic acids, proteins, or nucleic acid-protein complexes).
Coding fragments also may be defined based on sequence homologies (e.g., the presence or absence of conserved sequences) regardless of whether a known or predicted function or structure has been associated with the sequence. In certain embodiments, coding fragments may be defined based on patterns of one or more sequence elements. For example, the presence of clusters of certain nucleotides and/or amino acids in one or more regions may form a basis of fragment determination. However, in some embodiments, coding fragments may be determined arbitrarily or at least not based on predicted structural and/or functional properties of the encoded fragments. A coding sequence may be subdivided into a number of fragments (e.g., of approximately the same size) that is determined based on the number of different rearranged variants to be included in the library. In some embodiments, one or more fragments may be defined based on the presence of one or more convenient restriction sites spaced at suitable intervals along the coding sequence. In some embodiments, the presence of certain repeat sequences, high GC content, or other sequence motifs that are predicted to interfere with nucleic acid assembly may provide a basis for defining fragments of an initial coding sequence. For example, different fragments may be designed so that sequences predicted to interfere with each other during assembly are located on different fragments. Accordingly, different fragments may be assembled separately and then recombined or further assembled in different configurations to generate the constructs of a library. It should be appreciated that a library of rearranged fragments may contain a combination of different fragments that are defined based on different criteria (e.g., based on a combination of any one or more of the criteria described herein, including natural intron/exon boundaries). Many of the libraries of rearranged coding fragments may be assembled without including introns.
However, one or more introns may be included in some embodiments, either between exons of the initial gene and/or between nucleic acid regions encoding fragments that were defined based on one or more criteria other than exon/intron boundaries. The introns may consist of natural intron sequences or contain artificial and/or modified intron sequences.
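As one illustration of arbitrary fragment definition, a coding sequence may be subdivided into a chosen number of fragments of approximately the same size. The sketch below is a minimal example under two stated assumptions that are not taken from the description above: fragment boundaries are placed on codon boundaries so that each fragment remains in frame, and the function name `split_codon_aligned` and the example sequence are hypothetical.

```python
def split_codon_aligned(cds, k):
    """Split coding sequence `cds` into `k` fragments of near-equal size,
    with every boundary falling between codons (an assumption of this sketch)."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = len(cds) // 3
    if not 1 <= k <= codons:
        raise ValueError("k must be between 1 and the number of codons")
    base, extra = divmod(codons, k)  # the first `extra` fragments get one more codon
    fragments, pos = [], 0
    for i in range(k):
        size = (base + (1 if i < extra else 0)) * 3
        fragments.append(cds[pos:pos + size])
        pos += size
    return fragments

example_cds = "ATG" + "GCT" * 8 + "TAA"  # hypothetical 10-codon sequence
parts = split_codon_aligned(example_cds, 4)
print([len(p) for p in parts])           # [9, 9, 6, 6]
```

Other criteria named above, such as restriction sites or sequences predicted to interfere with assembly, would replace the equal-size rule with a different boundary-selection step.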
Once an initial gene sequence (e.g., with or without introns) has been subdivided into different fragments, libraries containing different types of rearrangements may be designed and assembled as described above for exon rearrangements. For example, a library containing and/or expressing different subsets of the predetermined fragments (meaning that one or more of the initial fragments are missing from each subset) may be assembled with the remaining fragments in each subset being in the same relative order as the starting gene. A library also may be designed and assembled to include only permutations of the predetermined fragments. Yet further libraries may be designed and assembled to include combinations of fragment subsets and fragment permutations, including or excluding one or more permutations of fragment subsets. For example, certain constructs in the library may represent fragment subsets wherein the relative order is the same as the order in the initial gene. Other constructs in the libraries may represent permutations of all of the predetermined fragments from the initial gene. A library also may contain constructs that represent permutations of fragment subsets. Depending on the number of fragments that are being rearranged and the number of different variants present in a library, different percentages of all possible combinations of subsets and/or permutations will be represented (assuming that all variants would be assembled and represented at similar frequencies). Also, a library may be designed to represent only a predetermined subset of all possible rearrangements. Accordingly, a library may be designed to represent and/or actually include at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or more (e.g., all or substantially all) possible fragment subsets, fragment permutations, or combinations of fragment subsets and permutations.
However, in some embodiments, less than 1% of the total possible variants may be represented. It should be appreciated that a library also may include one or more other sequences. Some of these sequences may be included intentionally (e.g., as control sequences, reference sequences, or other sequences). However, due to errors during assembly, the instability of certain constructs, the presence of contaminants, etc., certain sequences in the library may not correspond to designed variants or other sequences intended to be in the library. Accordingly, in some embodiments, at least 1%, at least 5%, at least 10%, at least 20%, at least 25%, at least 50%, at least 75%, at least 80%, at least 90%, at least 95%, about 99%, or more (e.g., all or substantially all) of the constructs in the library are predetermined fragment rearrangements as described herein.
Library design and assembly:
A library may be designed and/or assembled to contain any number of different variants. Each variant may be individually assembled using one or more multiplex assembly methods described herein. In some embodiments, one or more (e.g., 100%, 100%-75%, 75%-50%, 50%-25%, 25%-10%, or fewer) of the different fragments may be assembled independently and reused in different combinations to generate constructs encoding different subsets, permutations, and/or subset permutations as described herein. A library may be assembled to include 5-50 constructs, 50-100 constructs, 100-1,000 constructs, 10^3-10^4 constructs, 10^4-10^5 constructs, 10^5-10^6 constructs, 10^6-10^7 constructs, 10^7-10^8 constructs, 10^8-10^9 constructs, 10^9-10^10 constructs, 10^10-10^15 constructs, or more. Typically, each construct will include a single variant. However, in some embodiments, a single construct may be designed to include 2 or more different variants (e.g., 2, 3, 4, 5, 5-10, 10-50, 50-100, or more different variants). The number of different combinations that a library can represent depends in part on the theoretical number of possible combinations. Total numbers of different combinations of fragment subsets and permutations can be calculated theoretically as described in the following paragraphs.
Fragment subsets:
In some embodiments, the number CN(r) of different combinations of subsets of n predetermined fragments (e.g., exons and/or other predetermined fragments) taken in groups of r, in the same relative order as in an initial gene may be provided by the following equation:
$$CN(r) = \frac{n!}{r!\,(n-r)!}$$ (Equation 1)
For example, if a gene has n exons, the number of different combinations of subsets of exons that exclude only one exon may be expressed as the number of different combinations of n-1 exons:
$$CN(n-1) = \frac{n!}{(n-1)!\,(n-(n-1))!} = \frac{n!}{(n-1)!} = n$$
Similarly, the number of different combinations of subsets of exons that exclude only two exons may be expressed as the number of different combinations of n-2 exons:
$$CN(n-2) = \frac{n!}{(n-2)!\,(n-(n-2))!} = \frac{n!}{(n-2)!\,2!} = \frac{n(n-1)}{2}$$
Similarly, the number of other combinations of different numbers of subsets of exons (fragments) may be calculated. The total number of different combinations of different numbers of predetermined fragments, in the same order as in the original gene, may be provided by the following general equation:
$$\sum_{r=1}^{n} CN(r) = \sum_{r=1}^{n} \frac{n!}{r!\,(n-r)!}$$ (Equation 2)
This number includes the original fragment configuration. In some embodiments, a library may be prepared to include the original structure. This may be used as an internal control. However, in some embodiments, a library may be assembled to exclude the original configuration. Also, the original configuration may be provided or assembled separately for use as a control. The number of new subsets of the predetermined fragments, excluding the original fragment configuration, may be provided by the following general equation:
$$\sum_{r=1}^{n} CN(r) - 1 = \sum_{r=1}^{n} \frac{n!}{r!\,(n-r)!} - 1 = \sum_{r=1}^{n-1} \frac{n!}{r!\,(n-r)!}$$ (Equation 3)
It should be appreciated that in some embodiments not all combinations of different subsets will be made. For example, subsets having a single fragment may be avoided. The number of different fragment subsets not counting those having a single fragment may be provided by Equation (2) or Equation (3) minus n (the number of different fragments) depending on whether the original combination is included. Accordingly, the number of all different subsets excluding those with a single fragment may be provided by:
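$$\sum_{r=1}^{n} CN(r) - n = \sum_{r=1}^{n} \frac{n!}{r!\,(n-r)!} - n = \sum_{r=2}^{n} \frac{n!}{r!\,(n-r)!}$$ (Equation 4)
or,
$$\sum_{r=1}^{n} CN(r) - 1 - n = \sum_{r=1}^{n} \frac{n!}{r!\,(n-r)!} - 1 - n = \sum_{r=2}^{n-1} \frac{n!}{r!\,(n-r)!}$$ (Equation 5)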
In general, the predicted number of different subset combinations that exclude those having fewer than m fragments may be expressed as the sum of all combinations having between m and n fragments:
$$\sum_{r=m}^{n} CN(r) = \sum_{r=m}^{n} \frac{n!}{r!\,(n-r)!}$$ (Equation 6)
or the sum of all combinations having between m and n-1 fragments:
$$\sum_{r=m}^{n} CN(r) - 1 = \sum_{r=m}^{n-1} \frac{n!}{r!\,(n-r)!}$$ (Equation 7)
depending on whether the original combination is included.
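The subset counts of Equations 1 through 7 can be checked numerically with a short sketch. It is illustrative only; the function name `subset_count` and its parameters are hypothetical, and the standard-library function `math.comb` is used for the binomial term n!/(r!(n-r)!).

```python
from math import comb

def subset_count(n, m=1, include_original=True):
    """Number of order-preserving subsets of n fragments that contain at least
    m fragments (Equation 6), optionally leaving out the original full-length
    configuration (Equation 7)."""
    upper = n if include_original else n - 1
    return sum(comb(n, r) for r in range(m, upper + 1))

# Four fragments (e.g., exons A, B, C, and D):
print(comb(4, 3))                                    # 4  single-fragment deletions
print(subset_count(4))                               # 15 (Equation 2)
print(subset_count(4, include_original=False))       # 14 (Equation 3)
print(subset_count(4, m=2))                          # 11 (Equation 4)
print(subset_count(4, m=2, include_original=False))  # 10 (Equation 5)
```

For the four-exon example, the 15 subsets are ABCD, the four three-exon variants, the six two-exon variants, and the four single-exon variants.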
Fragment permutations:
In some embodiments, predetermined fragments of a gene or protein may be reordered. The number PN of possible rearrangements consisting of the same number of original fragments n in different relative orders may be provided by the following equation:
$$PN = n!$$ (Equation 8)
The number of new permutations (i.e., excluding the original configuration) may be provided by the following equation:
$$PN - 1 = n! - 1$$ (Equation 9)
It should be appreciated that the new permutations include many different configurations ranging from examples where the relative order of most or all of the fragments is changed to examples where the relative order of only a small number of fragments is changed. For example, new permutations include examples where the order of two adjacent fragments is changed (e.g., inverted) relative to each other, but their order relative to other surrounding fragments is not changed.
The number of possible permutations of subsets having r fragments out of the original number of n different fragments may be provided by the following equation:
$$PN(r) = \frac{n!}{(n-r)!}$$ (Equation 10)
The total number of possible permutations, including permutations of all different subsets of r fragments, may be provided by the following equation:
$$\sum_{r=1}^{n} PN(r) = \sum_{r=1}^{n} \frac{n!}{(n-r)!}$$ (Equation 11)
If the library excludes the original configuration, the number of new rearrangements including all possible permutations of all different subsets of r fragments may be provided by the following equation:
$$\sum_{r=1}^{n} PN(r) - 1 = \sum_{r=1}^{n} \frac{n!}{(n-r)!} - 1$$ (Equation 12)
It should be appreciated that in some embodiments not all permutations of different subsets will be made. For example, subsets having a single fragment may be avoided. Accordingly, the number of permutations would be provided by Equation 11 or Equation 12 minus n (the number of different fragments that could form single fragment variants) depending on whether the original combination is included. Accordingly, the number of all different permutations, including permutations of all subsets, but excluding subsets having a single fragment, may be provided by:
$$\sum_{r=1}^{n} PN(r) - n = \sum_{r=2}^{n} \frac{n!}{(n-r)!}$$ (Equation 13)
or,
$$\sum_{r=1}^{n} PN(r) - 1 - n = \sum_{r=2}^{n} \frac{n!}{(n-r)!} - 1$$ (Equation 14)
In general, the number of all different permutations, including permutations of different subsets of r fragments, that avoid permutations having fewer than m fragments may be expressed as:
$$\sum_{r=m}^{n} PN(r) = \sum_{r=m}^{n} \frac{n!}{(n-r)!}$$ (Equation 15)
or,
$$\sum_{r=m}^{n} PN(r) - 1 = \sum_{r=m}^{n} \frac{n!}{(n-r)!} - 1$$ (Equation 16)
depending on whether the original configuration is included.
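The permutation counts of Equations 8 through 16 can be checked in the same way. The sketch below is illustrative only; the function name `permutation_count` is hypothetical, and the standard-library function `math.perm` (Python 3.8 or later) supplies the term n!/(n-r)!. A brute-force enumeration with `itertools.permutations` is included as a cross-check.

```python
from itertools import permutations
from math import factorial, perm

def permutation_count(n, m=1, include_original=True):
    """Number of ordered arrangements of r fragments drawn from n, summed over
    r = m..n (Equation 15), optionally minus the original configuration
    (Equation 16)."""
    total = sum(perm(n, r) for r in range(m, n + 1))
    return total if include_original else total - 1

print(factorial(4) - 1)                                   # 23 (Equation 9)
print(permutation_count(4))                               # 64 (Equation 11)
print(permutation_count(4, include_original=False))       # 63 (Equation 12)
print(permutation_count(4, m=2))                          # 60 (Equation 13)
print(permutation_count(4, m=2, include_original=False))  # 59 (Equation 14)

# Cross-check Equation 11 by listing every arrangement of A, B, C, and D.
brute_force = sum(len(list(permutations("ABCD", r))) for r in range(1, 5))
print(brute_force)                                        # 64
```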
According to aspects of the invention, a library may include all of the theoretically possible fragment rearrangements described herein (e.g., any of the numbers of fragment rearrangements provided by any of the above equations). However, a library may include any subset of different rearrangements. Accordingly, a library may include any percentage of the numbers of different rearrangements provided by any of Equations (1) through (16). For example, at least 1%, at least 5%, at least 10%, at least 20%, at least 30%, at least 40%, at least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95%, etc., of the numbers of possible different fragment arrangements may be included in the library. However, higher or lower percentages may be included in the library. In some embodiments, a subset of arrangements may be specifically identified and the library may be designed and assembled to include only that specific subset. In some embodiments, a library may be designed to include all of the different combinations, but only a percentage are actually assembled. In some embodiments, a library also may include variants with one or more duplicated fragments. The number of possible variants including duplicated fragments will be higher than outlined above and will depend on the number of fragments for which duplications are provided and the number of copies of each duplicated fragment included in the variants. It should be appreciated that any of the libraries described herein may be designed and/or assembled to include or exclude certain nucleic acids. For example, the initial nucleic acid may be included in some embodiments, or excluded in other embodiments. In some embodiments, a library may be designed and/or assembled to either include or exclude certain rearranged nucleic acids.
For example, a library may be designed to exclude rearranged nucleic acids that are smaller than about 75%, smaller than about 50%, smaller than about 25%, smaller than about 10%, or smaller than about 5% of the length of the initial nucleic acid. In certain embodiments, a library may be assembled or designed to include certain structural and/or functional portions of an initial nucleic acid. In some embodiments, certain sequences such as signal peptides may be excluded from the library variants. This may require excluding all or part of the first exon of certain genes. However, in other embodiments, the cognate signal peptide may be retained in all variants of a protein encoding sequence.
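The length-based exclusion described above can be sketched as a simple filter over designed variants. This is an illustration only: the fragment lengths in `FRAGMENT_LENGTHS` are hypothetical values chosen for the example, and the function name `filter_by_length` is not part of the described methods.

```python
# Hypothetical fragment lengths in base pairs (illustrative values only).
FRAGMENT_LENGTHS = {"A": 300, "B": 150, "C": 450, "D": 100}

def filter_by_length(variants, fragment_lengths, min_fraction):
    """Keep variants whose total length is at least `min_fraction` of the
    length of the initial nucleic acid (all fragments, one copy each)."""
    full_length = sum(fragment_lengths.values())
    return [
        v for v in variants
        if sum(fragment_lengths[f] for f in v) >= min_fraction * full_length
    ]

designed = ["ABCD", "ABC", "ABD", "ACD", "BCD", "AB", "AC", "AD",
            "BC", "BD", "CD", "A", "B", "C", "D"]
kept = filter_by_length(designed, FRAGMENT_LENGTHS, 0.5)
print(kept)  # ['ABCD', 'ABC', 'ABD', 'ACD', 'BCD', 'AC', 'BC', 'CD']
```

With these assumed lengths the initial nucleic acid is 1,000 base pairs, so a 50% cutoff removes every variant shorter than 500 base pairs, including all single-fragment variants.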
It should be appreciated that the equations set forth herein provide theoretical numbers of possible alternative rearranged fragments of one or more initial nucleic acids based on a predetermined fragment selection. However, a library may contain different configurations of nucleic acid rearrangements based on initial nucleic acids, initial fragment selection (e.g., based on exons, structural or functional domains, etc., or any combination thereof), the inclusion of sequence variants for one or more fragments, the inclusion of one or more controls or reference constructs, and/or the exclusion of one or more nucleic acid configurations. In some embodiments, a library may include all or most rearrangements based independently on two or more different initial fragment selection criteria (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-15, or more different initial fragment selection criteria). For example, a library may include all or most exon rearrangements, and all or most rearrangements based on one or more other initial fragment selections (e.g., structural domains, functional domains, etc., or any combination thereof). Accordingly, the equations set forth herein are not intended to be limiting but may be useful to determine the number of different constructs that may be required for a library to be representative of all (or a significant proportion) of sequence rearrangements based on one or more initial nucleic acids and one or more predetermined fragment selections. For example, a library may be designed or assembled to include from about 10% of the theoretical number of possible rearrangements to several fold the theoretical number of possible rearrangements.
For example, in some embodiments a library may be designed or assembled to include about 1%, about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, about 90%, about 100%, about 1.01 fold, about 1.05 fold, about 1.1 fold, about 1.2 fold, about 1.3 fold, about 1.4 fold, about 1.5 fold, about 1.6 fold, about 1.7 fold, about 1.8 fold, about 1.9 fold, about 2 fold, about 2.5 fold, about 3 fold, about 3.5 fold, about 4 fold, about 4.5 fold, about 5 fold, about 10 fold, about 10-50 fold, about 50-100 fold, or more of the theoretical number of different rearranged nucleic acids based on one or more equations provided herein.
Rearrangements of gene regions:
Aspects of the invention may involve rearranging fragments representing the entire coding sequence of a gene product of interest (e.g., a protein of interest).
However, in some embodiments, only certain fragments may be rearranged from certain portions of a coding sequence or gene and not the entire coding sequence or gene. For example, deletions of one or more fragments (e.g., exons or other fragments) may be restricted to a portion of a coding sequence or gene (or two or more portions of a coding sequence or gene, but not the entire coding sequence or gene). Similarly, permutations may be restricted to fragments from a portion of a coding sequence or gene (or two or more portions of a coding sequence or gene, but not the entire coding sequence or gene). In certain embodiments, certain structural regions of a gene product that are known to be important may be retained and maintained in the same relative order. However, other regions or subsets of other regions may be deleted and/or permuted as described herein. A theoretical number of possible rearrangements for portions of a coding sequence or gene may be calculated using the equations provided herein. Accordingly, in each equation the number n of different fragments being rearranged can be the number of fragments that represent only the portions of the coding sequences or genes that are being varied.
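Restricting rearrangements to one portion of a coding sequence can be illustrated as follows. This is a minimal sketch, assuming single-letter fragment labels and a single contiguous variable region; the function name `permute_region` and its index parameters are hypothetical.

```python
from itertools import permutations

def permute_region(fragments, start, stop):
    """Permute only fragments[start:stop]; the flanking fragments are retained
    in their original positions and relative order."""
    head, region, tail = fragments[:start], fragments[start:stop], fragments[stop:]
    return ["".join(head) + "".join(p) + "".join(tail)
            for p in permutations(region)]

variants = permute_region("ABCDE", 1, 4)  # A and E fixed; B, C, and D varied
print(variants)
# ['ABCDE', 'ABDCE', 'ACBDE', 'ACDBE', 'ADBCE', 'ADCBE']
```

Here the number n entering the equations is 3 (the fragments being varied) rather than 5, giving 3! = 6 configurations including the original.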
It should be appreciated that any portions of a coding sequence or gene may be varied. However, in some embodiments, fragments representing functional and/or structural domains may be rearranged within one or more portions of interest. For example, protein transmembrane domains may be rearranged (e.g., deleted and/or permuted) as described herein without removing or changing the relative position of other portions of a protein. Similarly, one or more protein binding sites and/or active site regions may be rearranged without rearranging an entire protein. Accordingly, libraries of nucleic acids may be assembled to represent different predetermined coding sequences for protein variants having one or more rearranged portions.
Initial genes and proteins:
It should be appreciated that aspects of the invention are described in relation to a starting or initial nucleic acid sequence (e.g., an initial gene encoding an RNA or polypeptide of interest). In some embodiments, the initial nucleic acid may be a wild-type gene encoding a molecule of interest. In some embodiments, the initial nucleic acid may be a mutant (e.g., a naturally occurring mutant or an artificial mutant of a wild-type gene). A mutant gene may contain one or more point mutations, deletions, insertions, inversions, duplications, etc., or any combination thereof relative to a wild-type gene.
The choice of an initial gene may be based on a number of different factors. In some embodiments, the initial gene encodes a product having one or more properties that are desirable (e.g., low immunogenicity, solubility, stability, etc.). However, the initial gene also may be selected based on the presence of one or more properties that are expected to give rise to variants having functional and/or structural characteristics of interest. For example, an initial gene may be selected because it encodes one or more catalytic regions and/or one or more binding regions for which modifications and/or new combinations are desired. An initial gene may be selected on the basis of certain therapeutic properties, but for which modifications are desired to enhance the therapeutic activity and/or reduce certain negative physical and/or physiological characteristics.
In certain embodiments, an initial nucleic acid may be an entirely synthetic nucleic acid, in which case it may not include exons. Also, certain initial nucleic acids may be from prokaryotic organisms and not have any introns or exons. In addition, certain eukaryotic genes do not contain any introns and may be used as an initial nucleic acid in aspects of the invention. In many embodiments, an initial gene may include exons in a genomic form. However, the initial gene used as a reference for fragment rearrangement may be based on a cDNA construct without introns.
Libraries containing rearrangements of two or more initial nucleic acids:
In some aspects of the invention, a library may represent predetermined rearranged variants of a single initial nucleic acid (e.g., coding sequence or gene). However, in certain aspects, a library may represent rearranged variants of two or more different initial nucleic acids (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 10-15, 15-20, 20-25, 25-50, 50-100, or more different genes or coding sequences). The genes may be related or unrelated. In some embodiments, related nucleic acids have sequence similarities, for example 70% or more nucleic acid sequence identity (e.g., 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, about 96%, about 97%, about 98%, about 99%, or more nucleic acid sequence identity) or 70% or more amino acid sequence identity (e.g., 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, about 96%, about 97%, about 98%, about 99%, or more amino acid sequence identity). In some embodiments, related nucleic acids may have functional similarities (e.g., they encode kinases, phosphatases, receptors, growth hormones, cytokines, antibodies, etc.). In some embodiments, a library contains variants of two or more different starting nucleic acids from only one species (e.g., humans, mice, pigs, non-human primate species, rats, cows, sheep, horses, dogs, cats, etc.). However, in other embodiments, a library contains variants of starting nucleic acids from two or more different species (e.g., two or more species selected from, but not limited to, humans, mice, pigs, non-human primate species, rats, cows, sheep, horses, dogs, and cats).
In certain embodiments, a library may contain variants of two or more independently rearranged nucleic acids (e.g., rearranged variants of each of two or more genes are assembled independently). However, in some embodiments, fragments from different starting nucleic acids (e.g., different genes) may be interchanged in addition to fragments being deleted and/or permuted as described herein. For example, fragments from related genes (e.g., encoding protein isoforms) or gene variants (e.g., polymorphic variants or mutants of the same gene) may be swapped in addition to being deleted and/or permuted. Accordingly, aspects of the invention embrace rearranged configurations of specific proteins and/or specific protein domains from related or variant proteins. The invention provides methods to create libraries of rearranged configurations of specific proteins and/or specific protein domains from related or variant proteins. Aspects of the invention also embrace libraries of libraries, for example, combinations of multiple protein domain configurations. For instance, if a protein requires a specific element or domain, for example a hydrophobic loop, a library of sequence configurations of hydrophobic loops (e.g., from the same protein or from related or variant proteins) can be inserted as elements of the library of rearranged protein configurations. By also interchanging fragments with similar fragments having variant sequences, the theoretical number of different rearranged nucleic acids is increased. For example, if x different initial nucleic acids (e.g., different related nucleic acids or sequence variants of the same nucleic acid) are each subdivided into the same number n of initial fragments based on an initial set of criteria (e.g., exons, structural or functional domains, etc.
or any combination thereof), and each initial fragment can be replaced with an equivalent initial fragment from one of the x different nucleic acids (e.g., from the same relative position amongst the initial fragments, but from a different initial nucleic acid), the number CN(r) of different combinations of subsets of n predetermined fragments (e.g., exons and/or other predetermined fragments) taken in groups of r, in the same relative order as in an initial gene may be provided by the following equation:
$$CN(r) = \frac{n!}{r!\,(n-r)!}\,x^{r}$$ (Equation 17)
Accordingly, the total number of different subset combinations that exclude those having fewer than m fragments and wherein any of x different variants may be interchanged for each fragment, may be provided by the following equations:
$$\sum_{r=m}^{n} CN(r) = \sum_{r=m}^{n} \frac{n!}{r!\,(n-r)!}\,x^{r}$$ (Equation 18)
$$\sum_{r=m}^{n} CN(r) - 1 = \sum_{r=m}^{n} \frac{n!}{r!\,(n-r)!}\,x^{r} - 1$$ (Equation 19)
depending on whether the original combination is included.
Similarly, the number of permutations having the same number of original fragments n in different relative orders, wherein any of x different variants may be interchanged for each fragment, may be provided by the following equation:
$$PN = n!\,x^{n}$$ (Equation 20)
The number of possible permutations of subsets having r fragments out of the original number of n different fragments, wherein any of x different variants may be interchanged for each fragment, may be provided by the following equation:
$$PN(r) = \frac{n!}{(n-r)!}\,x^{r}$$ (Equation 21)
The total number of possible permutations, including permutations of all different subsets of r fragments, wherein any of x different variants may be interchanged for each fragment, may be provided by the following equation:
$$\sum_{r=1}^{n} PN(r) = \sum_{r=1}^{n} \frac{n!}{(n-r)!}\,x^{r}$$ (Equation 22)
It should be appreciated that in other embodiments, different numbers of sequence variants may be interchanged for different fragments being rearranged. Certain fragments may be retained and/or maintained in the same relative order and not interchanged with any other sequence variants. However, other fragments may be interchanged with different numbers of sequence variants in addition to being rearranged. For example, certain fragments may be interchanged with any one of x' different related structural fragments (e.g., encoding similar structural domains). Similarly, certain fragments may be interchanged with any one of x'' different related functional fragments (e.g., encoding similar functional domains). In some embodiments, x' may equal x'' (e.g., if the different structural and functional domain variants are from the same group of related genes or proteins). However, in other embodiments, x' may be greater than x'' or x'' may be greater than x'. It should be appreciated that the number of sequence variants that are interchanged for any one fragment may be independent of the number of sequence variants exchanged for any other fragment. For example, two or more different fragments representing different structural and/or functional regions of a nucleic acid may be interchanged with different numbers of sequence variants.
If the number of sequence variants that can be interchanged for each fragment is different, the theoretical total number of possible different nucleic acids may be calculated by replacing $x^r$ in Equations 17-22 with the product of the different numbers of sequence variants that are used for each of the fragments being included in the assembled nucleic acids. As discussed above, it should be appreciated that these calculations are useful to determine a theoretical number of different rearranged nucleic acids that may be required for a library to be representative of the full range of possible variants. However, one or more predetermined sequences may be excluded (e.g., initial sequences, certain short sequences, etc., may be excluded). In certain embodiments, a library may be designed or assembled to contain only a fraction of the possible rearranged nucleic acids as described herein.
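The substitution described above can be sketched in code. This is one possible reading, offered as an assumption rather than as the method of the specification: because the product of variant numbers depends on which fragments are included, the sketch sums that product over every choice of r fragments. The function name, the `keep_order` flag, and the example variant numbers are hypothetical.

```python
from itertools import combinations
from math import factorial, prod


def count_with_unequal_variants(variant_counts, r, keep_order=True):
    """Count constructs with r fragments when fragment i has variant_counts[i] variants.

    Sums, over every choice of r fragments, the product of their variant counts.
    If keep_order is False, each choice can also be arranged in r! relative orders.
    """
    total = sum(prod(subset) for subset in combinations(variant_counts, r))
    return total if keep_order else total * factorial(r)


# Hypothetical design: fragment 0 is retained as a single sequence (1 variant),
# fragment 1 has 3 structural variants (x'), fragment 2 has 2 functional variants (x'').
counts = [1, 3, 2]
print(count_with_unequal_variants(counts, 2))                    # relative order kept
print(count_with_unequal_variants(counts, 2, keep_order=False))  # any relative order

# With the same number of variants for every fragment, the count reduces to
# the equal-variant formulas (for example, Equation 17 with 5 fragments, r = 4).
print(count_with_unequal_variants([5] * 5, 4))
```

A fragment that is never interchanged is simply given a variant count of 1, so it contributes no additional diversity to the total.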
Vectors, Host Cells and Organisms:
It should be appreciated that various expression systems, both in vivo and in vitro, can be used for carrying out aspects of the invention. When an in vivo expression system is to be used, a number of suitable expression vectors can be contemplated. These include a number of expression vector plasmid constructs, containing one or more promoter elements capable of driving the transcription of the gene or genetic element of interest. Suitable host cells include but are not limited to: bacterial (e.g., E. coli), yeast, insect cells, and mammalian cells. Thus, the invention extends to such cells expressing the vector, either transiently or stably, that contain the nucleic acid sequence of interest as defined in the invention, operatively linked to appropriate transcription/translation systems.
Any suitable vector (e.g., plasmid, BAC, YAC, viral vector, etc.) or combination of two or more vectors may be used to harbor one or more engineered genes. In some embodiments, one or more variants (or all variants) may be encoded on the genome of a host cell or organism. In some embodiments, genes encoding polypeptide variants may be clustered within one or a few (e.g., 2, 3, 4, or 5) genetic regions (e.g., plasmid, genomic regions, chromosomes, etc.), organized on one or a few (e.g., 2, 3, 4, or 5) operons, or distributed across many genetic regions or operons (e.g., 6-10 or more).
Any suitable host cell or organism may be used or modified to express an engineered protein variant. A host cell may be a unicellular organism (e.g., a bacterial or yeast cell). A host cell may be a cell obtained from a multicellular organism but grown in culture (e.g., a mammalian cell grown in culture). A host organism may be a multicellular organism. Examples of multicellular organisms include animals and plants, e.g., mammals, insects, reptiles, fish, birds, land plants, aquatic plants, agricultural plants, monocotyledonous and/or dicotyledonous plants, etc. The type of host chosen may depend on the application.
In some embodiments, cell systems may express suitable splicing enzymes. Different species have different levels of splicing activity. Even different cell types within a species may have different levels of splicing activity. Accordingly, a suitable splicing host cell may be used (e.g., in culture) and transformed with one or more constructs designed to produce a library of engineered splice variants.
In some embodiments, a host cell may be engineered to have a modified genome that is suited to the expression of variant polypeptides. For example, a host cell may be engineered to have a reduced genome size (e.g., a genome that is smaller by 10%, 20%, 30%, 40%, 50%, or more). Such a host cell may be adapted to accommodate genetic elements encoding one or more engineered genes of interest. A host cell may be engineered to encode one or more functions for importing (e.g. substrates), synthesizing, or exporting (e.g. products) metabolites, proteins, or other molecules that are useful to assay for the functional and/or structural properties of interest. For example, a host cell may be engineered to encode one or more membrane-bound transporters (e.g., pumps). A host cell may also be engineered to improve growth rate and/or viability in unnatural environments, to detect the presence of a molecule in its environment, to communicate with other cells, to self-organize into patterns, to propagate or die under defined conditions, to act as a scaffold for extracellular synthesis of materials, or to degrade substances in its environment such as environmental contaminants or pathogens.
Alternatively, an in vitro expression system may be used. In vitro transcription and translation techniques are widely known and available in the art.
Expressed RNAs and polypeptides and screening and selection methods:
Libraries and methods described herein are particularly useful for expressing and assaying polypeptide variants. Any suitable assays may be used to determine the characteristics of expressed RNA and/or polypeptide variants and isolate one or more candidate variants having predetermined levels of one or more structural and/or functional activities.
Various purification methods can be used to isolate nucleic acids or polypeptides of interest. Such technology is well known in the art. In some embodiments, nucleic acids encoding polypeptide variants may be fused to a detection moiety (e.g., a reporter molecule) and/or a purification tag. A tag may include a label, probe, myr, an affinity-based tag, a charge-based tag, His, HA, AP, GST, Flag, biotin, HRP, GFP, myc, Glut, MBP, CBP, Chitin, etc. Common epitope tags that may be used include c-myc, HA, FLAG, and/or V5. Candidate therapeutic compounds that are isolated based on one or more structural and/or functional properties may be tested in one or more clinical assays or trials to determine their therapeutic potential.
Novel therapeutic compounds can be based on rearrangements of initial genes that are known to encode a therapeutic compound. An initial gene may encode a hormone, a growth factor, a therapeutic antibody, a receptor, a peptide ligand, or other therapeutic polypeptide. An initial gene may be a genomic gene, a cDNA, a human gene, a non-human gene, a recombinant gene, a modified gene or any other suitable gene. Non-limiting examples of therapeutic polypeptides that can be varied and analyzed as described herein include calcitonin, insulin, insulinotropin, insulin-like growth factors, parathyroid hormone, nerve growth factors, TGF-β, tumor necrosis factor, glucagon, bone growth factor-2, bone growth factor-7, TSH-β, interleukin 1, interleukin 2, interleukin 3, interleukin 6, interleukin 11, interleukin 12, CSF-macrophage, immunoglobulins, catalytic antibodies, protein kinase C, superoxide dismutase, tissue plasminogen activator, urokinase, antithrombin III, DNase, tyrosine hydroxylase, blood clotting factor V, blood clotting factor VII, blood clotting factor VIII, blood clotting factor X, blood clotting factor XIII, apolipoprotein E, apolipoprotein A-I, globins, low density lipoprotein receptor, IL-2 receptor, IL-2 receptor antagonists, alpha-1 antitrypsin, immune response modifiers, α-galactosidase, glucocerebrosidase, erythropoietin, and soluble CD4, including human and recombinant forms of any of these therapeutic proteins.
Similarly, candidate agricultural, industrial or other polypeptides may be isolated and assayed to determine their potential under appropriate conditions.
Business applications:
Aspects of the invention may be used to generate libraries of nucleic acid sequence variants enriched for peptide-expressing constructs having one or more desired properties. Certain libraries represent one or more types of gene fragment rearrangements based on a gene encoding a therapeutic polypeptide. Accordingly, aspects of the invention relate to marketing the methods, compositions, kits, devices, and systems described herein for generating nucleic acid libraries of rearranged genes. These may be used for discovering a novel class of therapeutic products, increasing patient access to a wider range of therapeutic products, and also decreasing cost and time for approval and market access for novel therapeutic products.
Aspects of the invention may be useful for reducing the time and/or cost of production, commercialization, and/or development of a range of new gene products in addition to new therapeutic products. Accordingly, aspects of the invention relate to business methods that involve collaboratively (e.g., with a partner) or independently marketing one or more methods, kits, compositions, devices, or systems for analyzing and/or assembling libraries and identifying novel polypeptides and genes encoding them. For example, certain embodiments of the invention may involve marketing a procedure and/or associated devices or systems involving techniques and assays described herein. In some embodiments, synthetic nucleic acids, libraries of synthetic nucleic acids, host cells containing synthetic nucleic acids, expressed polypeptides or proteins, etc., also may be marketed.
Marketing may involve providing information and/or samples relating to methods, kits, compositions, devices, and/or systems described herein. Potential customers or partners may be, for example, companies in the pharmaceutical, biotechnology and agricultural industries, as well as academic centers and government research organizations or institutes. Business applications also may involve generating revenue through sales and/or licenses of methods, kits, compositions, devices, and/or systems of the invention.
EXAMPLES
Example 1. Exon combinations and permutations
Based on genome sequence information, the average human gene is 28,000 nucleotides long and consists of 8.8 exons of about 120 nucleotides in length. The exons of an average human gene are separated by 7.8 introns ranging from 100 to 100,000 nucleotides long. It should be appreciated that these numbers are averages and that the numbers of exons and introns in actual genes are integer-valued (whole numbers). Accordingly, a representative gene may have 9 exons separated by 8 introns. However, a representative gene may be subdivided into more than 9 different fragments based on other criteria such as structural domains, functional domains, repeat regions, one or more homologies with other genes or consensus sequences, etc., or any combination thereof. The number of different unique sequences that may be required for a library to be representative of all (or a significant portion of) possible rearrangements of the predetermined fragments will depend on the types of rearrangements and fragments that are contemplated. Regardless of the ultimate number of different rearranged nucleic acids, it will be important for the library to be assembled accurately to include the predetermined variants as opposed to random variants. Accordingly, a library may be assembled using one or more multiplex assembly techniques described herein to include primarily sequence variants of interest. The following paragraphs provide theoretical numbers of different rearranged nucleic acids based on alternative predetermined configurations of interest. In some examples, a library may be assembled to include a number of independent constructs that is several fold higher (e.g., 2 fold, 5 fold, 10 fold, or more) than the theoretical total number of different rearranged nucleic acids. In one example, different subsets of the 9 exons of a representative gene may be assembled in different synthetic genes (e.g., excluding at least one exon and one or more or all of the introns).
If the same relative order of exons is maintained in the synthetic genes as in the original gene, the number of different combinations of subsets of the 9 exons is provided by Equation 1 (where r is the number of exons in the synthetic gene):
$CN(r) = \frac{9!}{r!\,(9-r)!}$ (Equation 1)
Accordingly, the number of possible different genes having 8 of the initial 9 exons maintained in the same relative order is:
CN(8) = 362,880/40,320=9
Similarly, the numbers of possible different genes having 7, 6, 5, 4, 3, 2, or 1 of the initial 9 exons maintained in the same relative order are, respectively:
CN(7) = 362,880/10,080 = 36
CN(6) = 362,880/4,320 = 84
CN(5) = 362,880/2,880 = 126
CN(4) = 362,880/2,880 = 126
CN(3) = 362,880/4,320 = 84
CN(2) = 362,880/10,080 = 36
CN(1) = 362,880/40,320 = 9
Accordingly, the total number of different possible rearranged genes having subsets of between 1 and 8 of the original 9 exons maintained in the same relative order is 510. To represent all of these rearranged genes, a library of nucleic acid constructs must contain at least 510 different constructs (at least 511 if the original gene or a vector with no exons is included in the library, for example as a control, or at least 512 if the original gene and the vector with no exons are both included in the library, for example, as controls). Accordingly, a library may be designed to consist of all or most of the different predetermined rearranged nucleic acids and be assembled to include at least about 1,000; at least 2,000; or at least 5,000 independent constructs in different embodiments.
In another example, if the relative order of the 9 exons of a representative gene is not maintained and synthetic genes are made with any relative order of exons (including the original relative order), the number of different permutations of the 9 exons is provided by Equation 8:
$PN = 9! = 362{,}880$ (Equation 8)
If synthetic genes are assembled having subsets of r exons out of the original 9 exons, and the relative order of the exons is not maintained, the number of different permutations of the r exons is provided by Equation 10:
$PN(r) = \frac{9!}{(9-r)!}$ (Equation 10)
Accordingly, the number of possible different genes having 8 of the initial 9 exons arranged in any relative order is:
PN(8) = 362,880/1=362,880
Similarly, the numbers of possible different genes having 7, 6, 5, 4, 3, 2, or 1 of the initial 9 exons arranged in any relative order are, respectively:
PN(7) = 362,880/2 = 181,440
PN(6) = 362,880/6 = 60,480
PN(5) = 362,880/24 = 15,120
PN(4) = 362,880/120 = 3,024
PN(3) = 362,880/720 = 504
PN(2) = 362,880/5,040 = 72
PN(1) = 362,880/40,320 = 9
Accordingly, the total number of different possible rearranged genes having between 1 and 9 of the original 9 exons arranged in any relative order is 986,409. A library of nucleic acid constructs must contain at least 986,409 (at least 986,410 if a vector with no exons is included in the library, for example as a control). Accordingly, a library may be designed to consist of all or most of the different predetermined rearranged nucleic acids and be assembled to include at least about 1,000,000; at least 2,000,000; at least 5,000,000 or at least 10,000,000 independent constructs in different embodiments.
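The totals for the 9-exon example can be reproduced with a short calculation. The following is a minimal sketch that evaluates Equation 1 and Equation 10 for each r using the Python standard library (Python 3.8 or later is assumed for `math.comb` and `math.perm`); the variable names are hypothetical.

```python
from math import comb, perm

N_EXONS = 9

# Equation 1: subsets of r exons kept in the original relative order, r = 1..8.
same_order_counts = {r: comb(N_EXONS, r) for r in range(1, N_EXONS)}

# Equation 10: r exons arranged in any relative order, r = 1..9.
any_order_counts = {r: perm(N_EXONS, r) for r in range(1, N_EXONS + 1)}

total_same_order = sum(same_order_counts.values())
total_any_order = sum(any_order_counts.values())

print(same_order_counts)
print(total_same_order)  # total for r = 1..8, original order maintained
print(any_order_counts)
print(total_any_order)   # total for r = 1..9, any relative order
```

The two sums correspond to the totals of 510 and 986,409 stated in the text; controls such as the original gene or a vector with no exons would be added on top of these totals.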
In yet another example, exons from different isoforms of the human growth hormone gene may be interchanged. Five isoforms of the human growth hormone gene each have five exons. Different subsets of the 5 exons may be assembled in different synthetic genes (e.g., excluding at least one exon and one or more or all of the introns). Five different synthetic genes are assembled for each exon that is included in each configuration in order to represent each of the initial exon isoforms in different configurations in the assembled library. If the same relative order of exons is maintained in the synthetic genes as in the original genes, the number of different combinations of subsets of the 5 exons accounting for the 5 isoforms of each exon is provided by Equation 17 (where r is the number of exons in the synthetic gene and 5 is the number of different isoforms for each exon):
$CN(r) = \frac{5!}{r!\,(5-r)!}\,5^r$ (Equation 17)
Accordingly, the number of possible different genes having 4 of the initial 5 exons maintained in the same relative order and accounting for 5 different isoforms of each exon is:
CN(4) = 120*625/24=3,125
Similarly, the numbers of possible different genes having 3, 2, or 1 of the initial 5 exons maintained in the same relative order and accounting for 5 different isoforms of each exon are, respectively:
CN(3) = 120*125/12 = 1,250
CN(2) = 120*25/12 = 250
CN(1) = 120*5/24 = 25
Accordingly, the total number of different possible rearranged genes having subsets of between 1 and 4 of the original 5 exons maintained in the same relative order and accounting for 5 different isoforms for each exon is 4,650. A library of nucleic acid constructs must contain at least 4,650 (at least 4,651 if a vector with no exons is included in the library, at least 4,655 if the original 5 isoforms are included in the library, or at least 4,656 if the original isoforms and the vector with no exons are both included in the library, for example, as controls). Accordingly, a library may be designed to consist of all or most of the different predetermined rearranged nucleic acids and be assembled to include at least about 10,000; at least 20,000; or at least 50,000 independent constructs in different embodiments.
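The order-preserving designs counted above can also be listed explicitly, which may help when planning which constructs to assemble. The following is a minimal sketch under stated assumptions: the representation of a design as (exon position, isoform index) pairs with zero-based indices, and the function name, are hypothetical and are not taken from the specification.

```python
from itertools import combinations, product

N_EXONS = 5     # exon positions in the growth hormone example
N_ISOFORMS = 5  # isoforms available for each exon position


def order_preserving_designs(r):
    """Yield designs with r exons kept in their original relative order.

    Each design is a tuple of (exon_position, isoform_index) pairs.
    """
    # combinations() yields positions in ascending order, so order is preserved.
    for positions in combinations(range(N_EXONS), r):
        # Pick one isoform independently for each included exon position.
        for isoforms in product(range(N_ISOFORMS), repeat=r):
            yield tuple(zip(positions, isoforms))


counts = {r: sum(1 for _ in order_preserving_designs(r)) for r in range(1, N_EXONS)}
total = sum(counts.values())

print(counts)                             # per-r counts, matching Equation 17
print(total)                              # total for r = 1..4
print(next(order_preserving_designs(3)))  # first enumerated 3-exon design
```

Enumerating the designs rather than only counting them makes it straightforward to exclude predetermined sequences before a library is assembled.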
In another example, if the relative order of the 5 exons of the human growth hormone gene is not maintained and synthetic genes are made with any relative order of exons (including the original relative order), the number of different permutations of the 5 exons, accounting for the 5 isoforms of each exon, is provided by Equation 20:
$PN = 5! \times 5^5 = 375{,}000$ (Equation 20)
If synthetic genes are assembled having subsets of r exons out of the original 5 exons, and the relative order of the exons is not maintained, the number of different permutations of the r exons accounting for 5 isoforms of each exon is provided by Equation 21:
$PN(r) = \frac{5!}{(5-r)!}\,5^r$ (Equation 21)
Accordingly, the number of possible different genes having 4 of the initial 5 exons arranged in any relative order and accounting for 5 isoforms of each exon is:
PN(4) = 120*625/1 = 75,000
Similarly, the numbers of possible different genes having 3, 2, or 1 of the initial 5 exons arranged in any relative order and accounting for 5 isoforms of each exon are, respectively:
PN(3) = 120*125/2 = 7,500
PN(2) = 120*25/6 = 500
PN(1) = 120*5/24 = 25
Accordingly, the total number of different possible rearranged genes having between 1 and 5 of the original 5 exons arranged in any relative order and accounting for 5 isoforms of each exon is 458,025. A library of nucleic acid constructs must contain at least 458,025 (at least 458,026 if a vector with no exons is included in the library, for example as a control). Accordingly, a library may be designed to consist of all or most of the different predetermined rearranged nucleic acids and be assembled to include at least about 500,000, at least 1,000,000; at least 2,000,000; or at least 5,000,000 independent constructs in different embodiments.
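The per-subset values and the total for this example can be reproduced directly from Equation 21. The following is a minimal sketch assuming Python 3.8 or later for `math.perm`; the variable names are hypothetical.

```python
from math import perm

N_EXONS = 5
N_ISOFORMS = 5

# Equation 21 for r = 1..5; the r = 5 entry is the Equation 20 value.
counts = {r: perm(N_EXONS, r) * N_ISOFORMS**r for r in range(1, N_EXONS + 1)}
total = sum(counts.values())
total_with_empty_vector = total + 1  # adds a vector with no exons as a control

print(counts)
print(total)
print(total_with_empty_vector)
```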
In further examples, an initial coding sequence or gene may be subdivided into additional non-overlapping fragments (e.g., based on secondary structures, functional domains, etc., instead of or in addition to the exon fragments) and additional rearranged nucleic acids may be assembled. In some examples a library may contain a combination of rearranged fragment variants based on different sets of predetermined non-overlapping fragments.
Example 2. Example of patterns of exon rearrangements
In one example, a therapeutic protein may be encoded by a gene having exons A1, A2, A3, B, and C as shown in FIG. 1B. Examples of four different exon subsets also are shown in FIG. 1B along with a transcript containing all five exons. These exon subsets can be generated by alternative splicing and/or by assembling four different nucleic acid constructs, each encoding one of the exon subsets with no introns. It should be appreciated that additional exon subsets could be generated from this gene.
FIG. 2A shows a non-limiting example of a gene with three exons (A, B, and C) separated by two introns (1 and 2).
FIG. 2B shows all of the different possible exon configurations that can be generated with the original exons and different exon subsets wherein the relative order of exons is maintained.
FIG. 2C shows all of the possible new exon permutations that can be generated with all of the original exons (the original configuration of the three exons is not shown in FIG. 2C).
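The configurations described for the three-exon gene can be listed programmatically. The following is a minimal sketch based only on the text descriptions of FIG. 2B and FIG. 2C; the figures themselves are not reproduced here, so the exact set drawn in each figure (for example, whether single-exon configurations appear) is an assumption.

```python
from itertools import combinations, permutations

exons = ("A", "B", "C")

# Subsets of the exons that keep the original relative order
# (the kind of configuration described for FIG. 2B), including the full set.
same_order = [
    "".join(subset)
    for r in range(1, len(exons) + 1)
    for subset in combinations(exons, r)
]

# Reorderings of all three exons, leaving out the original order
# (the kind of configuration described for FIG. 2C).
new_orders = ["".join(p) for p in permutations(exons) if p != exons]

print(same_order)
print(new_orders)
```

For three exons this gives 7 order-preserving configurations (6 if the original ABC configuration is set aside) and 5 new permutations of the full exon set.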
Example 3. Multiplex Nucleic Acid Assembly
Aspects of the invention may involve one or more nucleic acid assembly reactions in order to make gene fragments, constructs containing rearranged gene fragments, modified host cells, and/or other nucleic acids that may be used to generate biological diversity (e.g., introns or other recombination sequences) and screen or select for one or more functions of interest.
Aspects of the invention involve assembling nucleic acids that contain one or more gene fragments. Aspects of the invention involve assembling nucleic acids that can be used to modify the genome of a host cell. For example, the genome of a host cell may be reduced in size (e.g., by 1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, or more) in order to accommodate nucleic acids that encode different configurations of gene fragment rearrangements. Nucleic acids of the invention may be assembled using any suitable method including a combination of one or more ligation, recombination, or extension reactions. Multiplex nucleic acid assembly reactions may be used to assemble one or more nucleic acid components. Multiplex nucleic acid assembly relates to the assembly of a plurality of nucleic acids to generate a longer nucleic acid product. In one aspect, multiplex oligonucleotide assembly relates to the assembly of a plurality of oligonucleotides to generate a longer nucleic acid molecule. However, it should be appreciated that other nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) may be assembled or included in a multiplex assembly reaction (e.g., along with one or more oligonucleotides) in order to generate an assembled nucleic acid molecule that is longer than any of the single starting nucleic acids (e.g., oligonucleotides) that were added to the assembly reaction. In certain embodiments, one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions (e.g., separate multiplex oligonucleotide assembly reactions) may be combined and assembled to form a further nucleic acid that is longer than any of the input nucleic acid fragments. 
In certain embodiments, one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions (e.g., separate multiplex oligonucleotide assembly reactions) may be combined with one or more additional nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) and assembled to form a further nucleic acid that is longer than any of the input nucleic acids.
In aspects of the invention, one or more multiplex assembly reactions may be used to generate target nucleic acids having predetermined sequences. In one aspect, a target nucleic acid may have a sequence of a naturally occurring gene and/or other naturally occurring nucleic acid (e.g., a naturally occurring coding sequence, regulatory sequence, non-coding sequence, chromosomal structural sequence such as a telomere or centromere sequence, etc., any fragment thereof or any combination of two or more thereof). In another aspect, a target nucleic acid may have a sequence that is not naturally-occurring. In one embodiment, a target nucleic acid may be designed to have a sequence that differs from a natural sequence at one or more positions. In other embodiments, a target nucleic acid may be designed to have an entirely novel sequence. However, it should be appreciated that target nucleic acids may include one or more naturally occurring sequences, non-naturally occurring sequences, or combinations thereof.
In one aspect of the invention, multiplex assembly may be used to generate libraries of nucleic acids having different sequences. In some embodiments, a library may contain nucleic acids having random sequences. In certain embodiments, a predetermined target nucleic acid may be designed and assembled to include one or more random sequences at one or more predetermined positions.
A target nucleic acid may be a first gene fragment that is combined with other gene fragments to generate libraries of rearranged gene fragments. In certain embodiments, a target nucleic acid may include a functional sequence (e.g., a protein binding sequence, a regulatory sequence, a sequence encoding a functional protein, etc., or any combination thereof). However, some embodiments of a target nucleic acid may lack a specific functional sequence (e.g., a target nucleic acid may include only nonfunctional fragments or variants of a protein binding sequence, regulatory sequence, or protein encoding sequence, or any other non-functional naturally-occurring or synthetic sequence, or any non-functional combination thereof). Certain target nucleic acids may include both functional and non-functional sequences. These and other aspects of target nucleic acids and their uses are described in more detail herein.
A target nucleic acid may be assembled in a single multiplex assembly reaction (e.g., a single oligonucleotide assembly reaction). However, a target nucleic acid also may be assembled from a plurality of nucleic acid fragments, each of which may have been generated in a separate multiplex oligonucleotide assembly reaction. It should be appreciated that one or more nucleic acid fragments generated via multiplex oligonucleotide assembly also may be combined with one or more nucleic acid molecules obtained from another source (e.g., a restriction fragment, a nucleic acid amplification product, etc.) to form a target nucleic acid. In some embodiments, a target nucleic acid that is assembled in a first reaction may be used as an input nucleic acid fragment for a subsequent assembly reaction to produce a larger target nucleic acid. Accordingly, different strategies may be used to produce a target nucleic acid having a predetermined sequence. For example, different starting nucleic acids (e.g., different sets of predetermined nucleic acids) may be assembled to produce the same predetermined target nucleic acid sequence. Also, predetermined nucleic acid fragments may be assembled using one or more different in vitro and/or in vivo techniques. For example, nucleic acids (e.g., overlapping nucleic acid fragments) may be assembled in an in vitro reaction using an enzyme (e.g., a ligase and/or a polymerase) or a chemical reaction (e.g., a chemical ligation) or in vivo (e.g., assembled in a host cell after transfection into the host cell), or a combination thereof. Similarly, each nucleic acid fragment that is used to make a target nucleic acid may be assembled from different sets of oligonucleotides. Also, a nucleic acid fragment may be assembled using an in vitro or an in vivo technique (e.g., an in vitro or in vivo polymerase, recombinase, and/or ligase based assembly process). In addition, different in vitro assembly reactions may be used to produce a nucleic acid fragment. 
For example, an in vitro oligonucleotide assembly reaction may involve one or more polymerases, ligases, other suitable enzymes, chemical reactions, or any combination thereof.
EQUIVALENTS
The present invention provides among other things methods for assembling large polynucleotide constructs and organisms having increased genomic stability. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.
INCORPORATION BY REFERENCE
All publications, patents and sequence database entries mentioned herein, including those items listed below, are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In the event of a conflict, the disclosure and description of the present invention shall control.
We claim:

Claims

1. A method of designing a nucleic acid library encoding a plurality of protein variants, the method comprising: selecting a first protein comprising a plurality of peptide fragments arranged in a first relative order, said first protein being encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order; designing a plurality of protein variants, wherein each of said protein variants comprises a different subset of said plurality of peptide fragments; and determining a corresponding nucleic acid sequence for each of said plurality of protein variants, thereby designing a nucleic acid library.
2. The method of claim 1, wherein said designing step comprises maintaining said first relative order within each of said subset of said plurality of peptide fragments.
3. The method of claim 1, wherein said designing step comprises rearranging the relative order of at least a portion of said plurality of peptide fragments within each of said subsets.
4. The method of claim 1, wherein said designing step comprises maintaining said first relative order within at least one of said subsets of said plurality of peptide fragments, and rearranging the relative order of at least a portion of said peptide fragments within another of said subsets.
5. The method of claim 1, wherein at least one of said protein variants comprises a subset of said peptide fragments that excludes at least about 1, 2, 3, 4, 5, 10, 25, 50 or 100 fragments from said plurality of peptide fragments.
6. The method of claim 1, wherein said designing step comprises computationally designing at least a subset of said plurality of said protein variants.
7. The method of claim 1, wherein said designing step further comprises determining at least one modification to one peptide fragment within said subset of said plurality of peptide fragments, thereby generating a modified peptide fragment; and designing at least one of said protein variants to comprise said modified peptide fragment.
8. The method of claim 1, wherein said modification comprises an amino acid substitution, deletion, duplication, translocation, rearrangement, or allelic variation.
9. The method of claim 1, further comprising codon-optimizing at least a subset of said nucleic acid sequences corresponding to said protein variants.
10. The method of claim 1, wherein said determining step comprises determining at least one corresponding nucleic acid sequence that is codon-optimized.
11. The method of claim 1, wherein said determining step comprises determining at least one corresponding nucleic acid sequence that is partially codon-optimized.
12. The method of claim 1, wherein said determining step comprises determining at least one corresponding nucleic acid sequence that comprises a plurality of oligonucleotide segments that, upon rearrangement, define a substantially identical sequence to said first nucleic acid.
13. The method of claim 1, wherein said first protein is a therapeutic protein.
14. The method of claim 1, wherein at least a portion of said first protein is a human protein or a homolog thereof.
15. The method of claim 1, wherein said first protein is a hormone, a cytokine, an antigen, an antibody, an enzyme, a clotting factor, a transport protein, a receptor, a regulatory protein, a structural protein, or a transcription factor.
16. The method of claim 28, wherein said first protein is selected from the group consisting of calcitonin, insulin, insulinotropin, insulin-like growth factors, parathyroid hormone, nerve growth factors, TGF-β, tumor necrosis factor, glucagon, bone growth factor-2, bone growth factor-7, TSH-β, interleukin 1, interleukin 2, interleukin 3, interleukin 6, interleukin 11, interleukin 12, CSF-macrophage, immunoglobulins, catalytic antibodies, protein kinase C, superoxide dismutase, tissue plasminogen activator, urokinase, antithrombin III, DNase, tyrosine hydroxylase, blood clotting factor V, blood clotting factor VII, blood clotting factor VIII, blood clotting factor X, blood clotting factor XIII, apolipoprotein E, apolipoprotein A-I, globins, low density lipoprotein receptor, IL-2 receptor, IL-2 receptor antagonists, alpha-1 antitrypsin, immune response modifiers, and soluble CD4.
17. The method of claim 1 , wherein said first protein is a growth hormone.
18. The method of claim 1 , wherein said first protein is blood clotting factor IX.
19. The method of claim 1, wherein said first protein is α-galactosidase.
20. The method of claim 1, wherein said first protein is glucocerebrosidase.
21. The method of claim 1, wherein said first protein is erythropoietin.
22. The method of claim 21, wherein the erythropoietin is human erythropoietin.
23. A method of generating a nucleic acid library, the method comprising: identifying a plurality of gene segments encoding non-overlapping peptide domains of a predetermined protein; and assembling a plurality of nucleic acids, wherein each nucleic acid comprises the plurality of gene segments, and wherein the relative order of the gene segments is different in each nucleic acid.
24. A method of generating a nucleic acid library, the method comprising: identifying a plurality of gene segments encoding non-overlapping peptide domains of a predetermined protein; and assembling a plurality of nucleic acids, wherein each nucleic acid comprises a subset of the plurality of gene segments, and wherein the subset excludes at least one of the plurality of gene segments.
25. A method of generating a nucleic acid library, the method comprising: identifying a plurality of non-overlapping peptide domains of a predetermined protein; and assembling a plurality of nucleic acids, wherein each nucleic acid comprises a plurality of gene segments that encode the non-overlapping peptide domains, and wherein the relative order of the gene segments is different in each nucleic acid.
26. A method of generating a nucleic acid library, the method comprising: identifying a plurality of non-overlapping peptide domains of a predetermined protein, assembling a plurality of nucleic acids, wherein each nucleic acid comprises a subset of a plurality of gene segments that encode the non-overlapping peptide domains, and wherein the subset excludes one or more nucleic acid sequences that encode at least one of the non-overlapping peptide domains.
27. The method of claim 25 or 26, wherein gene segments encoding identical non-overlapping peptide domains have different nucleic acid sequences in different nucleic acids in the library.
28. A method of designing a nucleic acid library encoding a plurality of protein variants, the method comprising: designing a plurality of nucleic acids, wherein each nucleic acid comprises a first plurality of exons arranged in a different relative order, and wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order.
29. A method of generating a nucleic acid library encoding a predetermined plurality of protein variants, the method comprising: selecting a protein that is encoded by a first plurality of exons that are arranged in a first relative order; and assembling a plurality of nucleic acids each comprising the first plurality of exons, wherein the relative order of the first plurality of exons is different in each nucleic acid.
30. A method of generating a nucleic acid library encoding a predetermined plurality of protein variants, the method comprising: assembling a plurality of nucleic acids, wherein each nucleic acid comprises a first plurality of exons arranged in a different relative order, and wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order.
31. A nucleic acid library encoding a predetermined plurality of protein variants, the library comprising a plurality of nucleic acids, wherein each nucleic acid comprises a first plurality of exons in a different relative order, wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order.
32. A method of identifying a nucleic acid that encodes a protein variant, the method comprising: obtaining a plurality of nucleic acids, wherein each nucleic acid comprises a first plurality of exons arranged in a different relative order, and wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order; and screening the plurality of nucleic acids to identify a nucleic acid that expresses a protein variant having a predetermined functional property.
33. A method of identifying a nucleic acid that encodes a protein variant, the method comprising: screening a plurality of nucleic acids to identify a nucleic acid that expresses a protein variant having a predetermined functional property, wherein each of the plurality of nucleic acids comprises a first plurality of exons arranged in a different relative order, and wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order.
34. A method of identifying a nucleic acid that encodes a protein variant, the method comprising: obtaining a plurality of nucleic acids, wherein each nucleic acid comprises a first plurality of exons arranged in a different relative order, and wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order; and screening the plurality of nucleic acids to identify a nucleic acid that expresses a protein variant having a predetermined structural property.
35. A method of identifying a nucleic acid that encodes a protein variant, the method comprising: screening a plurality of nucleic acids to identify a nucleic acid that expresses a protein variant having a predetermined structural property, wherein each of the plurality of nucleic acids comprises a first plurality of exons arranged in a different relative order, and wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order.
36. A nucleic acid that encodes a protein variant, the nucleic acid comprising a first plurality of exons encoding a protein having a predetermined functional property, wherein the first plurality of exons is arranged in a different relative order than a first relative order that encodes a predetermined protein.
37. A nucleic acid that encodes a protein variant, the nucleic acid comprising a first plurality of exons encoding a protein having a predetermined structural property, wherein the first plurality of exons is arranged in a different relative order than a first relative order that encodes a predetermined protein.
38. A method of designing a library of protein variants, the method comprising: selecting a protein that is encoded by a first plurality of exons arranged in a first relative order; and designing a plurality of protein variants, wherein each protein variant is encoded by the first plurality of exons arranged in a different relative order.
39. A method of designing a library of protein variants, the method comprising: designing a plurality of protein variants, wherein each protein variant is encoded by a first plurality of exons arranged in a different relative order, wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order.
40. A method of making a library of protein variants, the method comprising: selecting a protein that is encoded by a first plurality of exons; and isolating a plurality of protein variants, wherein each protein variant is encoded by the first plurality of exons arranged in a different relative order.
41. A method of making a library of protein variants, the method comprising: isolating a plurality of protein variants, wherein each protein variant is encoded by a first plurality of exons arranged in a different relative order, wherein the first plurality of exons encodes a predetermined protein when they are arranged in a first relative order.
42. A library of protein variants, wherein each protein variant is encoded by a first plurality of exons arranged in a different relative order, and wherein the first plurality of exons encodes a predetermined protein when the exons are arranged in a first relative order.
43. A method of identifying a protein variant, the method comprising: obtaining a plurality of protein variants, wherein each protein variant is encoded by a first plurality of exons arranged in a different relative order; and screening the plurality of protein variants to identify a protein variant having a predetermined functional property.
44. A method of identifying a protein variant, the method comprising: screening a plurality of protein variants to identify a protein variant having a predetermined functional property, wherein each protein variant is encoded by a first plurality of exons arranged in a different relative order.
45. A method of identifying a protein variant, the method comprising: obtaining a plurality of protein variants, wherein each protein variant is encoded by a first plurality of exons arranged in a different relative order; and screening the plurality of protein variants to identify a protein variant having a predetermined structural property.
46. A method of identifying a protein variant, the method comprising: screening a plurality of protein variants to identify a protein variant having a predetermined structural property, wherein each protein variant is encoded by a first plurality of exons arranged in a different relative order.
47. A method of obtaining a protein variant, the method comprising: obtaining a protein variant having a predetermined structural property, wherein the protein variant is encoded by a first plurality of exons arranged in a different relative order than a first relative order that encodes a predetermined protein.
48. A protein variant having a predetermined functional property, wherein the protein variant is encoded by a first plurality of exons arranged in a different relative order than a first relative order that encodes a predetermined protein.
49. A protein variant having a predetermined structural property, wherein the protein variant is encoded by a first plurality of exons arranged in a different relative order than a first relative order that encodes a predetermined protein.
50. A method for generating a nucleic acid library encoding a plurality of protein variants, the method comprising: selecting a first protein comprising a plurality of peptide fragments arranged in a first relative order, said first protein being encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order; designing a plurality of protein variants, wherein each of said protein variants comprises a different subset of said plurality of peptide fragments; determining a corresponding full-length nucleic acid sequence for each of said plurality of protein variants; generating a set of construction oligonucleotides corresponding to each of said full-length nucleic acid sequences; and assembling each said set of construction oligonucleotides by polymerization, ligation and/or recombination, thereby generating a library of assembled nucleic acids.
51. The method of claim 50, further comprising the step of optimizing fidelity of the assembled nucleic acids by subjecting the construction oligonucleotides to an error filtration, error neutralization or error correction step.
52. The method of claim 51, wherein said optimizing step is performed before, during or after said assembling step.
53. The method of claim 50, wherein at least a subset of said construction oligonucleotides are chemically synthesized.
54. The method of claim 50, wherein said library of assembled nucleic acids comprises at least one synthetic polynucleotide.
55. A nucleic acid library encoding a plurality of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of predetermined member nucleic acids, each of said member nucleic acids encoding a protein variant having a unique subset of said plurality of peptide fragments.
56. The nucleic acid library of claim 55, further comprising a plurality of host cells, wherein each of said plurality of host cells has been transformed with a recombinant DNA vector comprising a member nucleic acid.
57. A method for generating a protein variant library, the method comprising: selecting a first protein comprising a plurality of peptide fragments arranged in a first relative order, said first protein being encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order; transforming each of a plurality of host cells with a recombinant DNA vector encoding a different protein variant, wherein each of said different protein variants comprises a unique subset of said plurality of peptide fragments; and expressing said different protein variants.
58. A protein library of a plurality of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of host cells, each of said plurality of host cells containing a recombinant DNA vector encoding a protein variant comprising a unique subset of said plurality of peptide fragments.
59. A method for screening a protein variant library for a protein with specific affinity for a receptor, the method comprising: providing a library of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of host cells, each of said plurality of host cells containing a recombinant DNA vector encoding a protein variant having a unique subset of peptide fragments selected from said plurality of peptide fragments; exposing the host cells to conditions sufficient to cause expression of said protein variants; lysing the host cells under conditions sufficient to prevent denaturation of said expressed protein variants; contacting the protein variants with a receptor under conditions conducive to specific protein-receptor binding; and identifying a protein variant that binds to said receptor.
60. A method for screening a protein variant library for a functional variant of a target protein, the method comprising: providing a library of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of host cells, each of said plurality of host cells containing a recombinant DNA vector encoding a protein variant having a unique subset of peptide fragments selected from said plurality of peptide fragments; exposing the host cells to conditions sufficient to cause expression of said protein variants; lysing the host cells under conditions sufficient to prevent denaturation of said expressed protein variants; performing a functional assay on said protein variants, wherein said functional assay generates qualitative or quantitative data that is indicative of the presence or absence of said functional property, or provides a measure of the degree of said functional property; and evaluating data generated in said functional assay to identify a functional variant.
61. The method of claim 60, wherein said predetermined functional property is specific substrate binding or catalytic activity.
62. A method for screening a protein variant library for a variant having a comparable or improved structural property of a target protein, the method comprising: providing a library of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of host cells, each of said plurality of host cells containing a recombinant DNA vector encoding a protein variant having a unique subset of peptide fragments selected from said plurality of peptide fragments; exposing the host cells to conditions sufficient to cause expression of said protein variants; lysing the host cells under conditions sufficient to prevent denaturation of said expressed protein variants; performing an assay for said structural property on said protein variants; and identifying from said assay a protein variant having a comparable or improved structural property of a target protein.
63. The method of claim 62, wherein said structural property is stability or solubility.
64. The method of claim 62, wherein said structural property is an indicator of immunogenicity of said target protein.
65. A method of designing a nucleic acid library encoding a plurality of protein variants, the method comprising: selecting a first protein comprising a plurality of peptide fragments arranged in a first relative order, said first protein being encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order; designing a plurality of protein variants, wherein each of said protein variants comprises said plurality of peptide fragments in a different relative order than said first relative order; and determining a corresponding nucleic acid sequence for each of said plurality of protein variants, thereby designing a nucleic acid library.
66. The method of claim 65, wherein said designing step comprises computationally designing at least a subset of said plurality of said protein variants.
67. The method of claim 65, wherein said designing step further comprises: determining at least one modification to one peptide fragment within said plurality of peptide fragments, thereby generating a modified peptide fragment; and designing at least one of said protein variants to comprise said modified peptide fragment.
68. The method of claim 67, wherein said modification comprises an amino acid substitution, deletion, duplication, translocation, rearrangement, or allelic variation.
69. The method of claim 65, further comprising codon-optimizing at least a subset of said nucleic acid sequences corresponding to said protein variants.
70. The method of claim 65, wherein said determining step comprises determining at least one corresponding nucleic acid sequence that is codon-optimized.
71. The method of claim 65, wherein said determining step comprises determining at least one corresponding nucleic acid sequence that is partially codon-optimized.
72. The method of claim 65, wherein said determining step comprises determining at least one corresponding nucleic acid sequence that comprises a plurality of oligonucleotide segments that, upon rearrangement, define a substantially identical sequence to said first nucleic acid.
73. The method of claim 65, wherein said first protein is a therapeutic protein.
74. The method of claim 65, wherein at least a portion of said first protein is a human protein or a homolog thereof.
75. The method of claim 65, wherein said first protein is a hormone, a cytokine, an antigen, an antibody, an enzyme, a clotting factor, a transport protein, a receptor, a regulatory protein, a structural protein, or a transcription factor.
76. The method of claim 65, wherein said first protein is selected from the group consisting of calcitonin, insulin, insulinotropin, insulin-like growth factors, parathyroid hormone, nerve growth factors, TGF-β, tumor necrosis factor, glucagon, bone growth factor-2, bone growth factor-7, TSH-β, interleukin 1, interleukin 2, interleukin 3, interleukin 6, interleukin 11, interleukin 12, CSF-macrophage, immunoglobulins, catalytic antibodies, protein kinase C, superoxide dismutase, tissue plasminogen activator, urokinase, antithrombin III, DNase, tyrosine hydroxylase, blood clotting factor V, blood clotting factor VII, blood clotting factor VIII, blood clotting factor X, blood clotting factor XIII, apolipoprotein E, apolipoprotein A-I, globins, low density lipoprotein receptor, IL-2 receptor, IL-2 receptor antagonists, alpha-1 antitrypsin, immune response modifiers, and soluble CD4.
77. The method of claim 65, wherein said first protein is a growth hormone.
78. The method of claim 65, wherein said first protein is blood clotting factor IX.
79. The method of claim 65, wherein said first protein is α-galactosidase.
80. The method of claim 65, wherein said first protein is glucocerebrosidase.
81. The method of claim 65, wherein said target protein is erythropoietin.
82. The method of claim 81, wherein the erythropoietin is human erythropoietin.
83. A method for generating a nucleic acid library encoding a plurality of protein variants, the method comprising: selecting a first protein comprising a plurality of peptide fragments arranged in a first relative order, said first protein being encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order; designing a plurality of protein variants, wherein each of said protein variants comprises said plurality of peptide fragments in a different relative order than said first relative order; determining a corresponding full-length nucleic acid sequence for each of said plurality of protein variants, thereby designing a nucleic acid library; for each protein variant, generating a set of construction oligonucleotides that collectively represent its corresponding full-length nucleic acid sequence; and assembling each said set of construction oligonucleotides by polymerization, ligation and/or recombination, thereby generating a library of assembled nucleic acids.
84. The method of claim 83, further comprising the step of optimizing fidelity of the assembled nucleic acids by subjecting the construction oligonucleotides to an error filtration, error neutralization or error correction step.
85. The method of claim 84, wherein said optimizing step is performed before, during or after said assembling step.
86. The method of claim 83, wherein at least a subset of said construction oligonucleotides is chemically synthesized.
87. The method of claim 83, wherein said library of assembled nucleic acids comprises at least one synthetic polynucleotide.
88. A nucleic acid library encoding a plurality of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of predetermined member nucleic acids, each of said member nucleic acids encoding a protein variant having a unique relative order of said plurality of peptide fragments that is different from said first relative order.
89. The nucleic acid library of claim 88, further comprising a plurality of host cells, wherein each of said plurality of host cells has been transformed with a recombinant DNA vector comprising a member nucleic acid.
90. A method for generating a protein variant library, the method comprising: selecting a first protein comprising a plurality of peptide fragments arranged in a first relative order, said first protein being encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order; and transforming each of a plurality of host cells with a recombinant DNA vector encoding a different protein variant, wherein each of said protein variants comprises said plurality of peptide fragments in a different relative order than said first relative order.
91. A library of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of host cells, each of said plurality of host cells containing a recombinant DNA vector encoding a protein variant having a unique relative order of said plurality of peptide fragments that is different than said first relative order.
92. A method for screening a protein variant library for a protein with specific affinity for a receptor, the method comprising: providing a library of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of host cells, each of said plurality of host cells containing a recombinant DNA vector encoding a protein variant having a unique relative order; exposing the host cells to conditions sufficient to cause expression of said protein variants; lysing the host cells under conditions sufficient to prevent denaturation of said expressed protein variants; contacting the protein variants with a receptor under conditions conducive to specific protein-receptor binding; and identifying a protein variant that binds to said receptor.
93. A method for screening a protein variant library for a functional variant of a target protein, the method comprising: providing a library of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of host cells, each of said plurality of host cells containing a recombinant DNA vector encoding a protein variant having a unique relative order; exposing the host cells to conditions sufficient to cause expression of said protein variants; lysing the host cells under conditions sufficient to prevent denaturation of said expressed protein variants; performing a functional assay on said protein variants, wherein said functional assay generates qualitative or quantitative data that is indicative of the presence or absence of said functional property, or provides a measure of the degree of said functional property; and evaluating data generated in said functional assay to identify a functional variant.
94. The method of claim 93, wherein said predetermined functional property is specific substrate binding or catalytic activity.
95. A method for screening a protein variant library for a variant having a comparable or improved structural property of a target protein, the method comprising: providing a library of variants of a first protein, wherein said first protein comprises a plurality of peptide fragments arranged in a first relative order, and wherein said first protein is encoded by a first nucleic acid comprising a plurality of exons corresponding to said plurality of peptide fragments and defining said first relative order, the library comprising a plurality of host cells, each of said plurality of host cells containing a recombinant DNA vector encoding a protein variant having a unique relative order; exposing the host cells to conditions sufficient to cause expression of said protein variants; lysing the host cells under conditions sufficient to prevent denaturation of said expressed protein variants; performing an assay for said structural property on said protein variants; and identifying from said assay a protein variant having a comparable or improved structural property of a target protein.
96. The method of claim 95, wherein said structural property is stability or solubility.
97. The method of claim 95, wherein said structural property is an indicator of immunogenicity of said target protein.
98. A nucleic acid library encoding a plurality of synthetic splice variants of a known protein, said library comprising a predetermined set of member nucleic acids, each of which encodes a different splice variant of a predetermined protein, and together encode at least about 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90% or 95% of all splice variants for said predetermined protein.
99. A business method comprising: a. identifying a feature of commercial relevancy of a target protein; b. screening a protein library comprising synthetic splice variants of said target protein, for a variant that exhibits said feature to identify a lead variant; c. performing a feasibility analysis for commercialization of said lead variant; d. identifying one or more therapeutic, diagnostic or industrial products; and e. collaboratively or independently, marketing said therapeutic, diagnostic or industrial products.
100. The method of claim 99, wherein said library comprises at least about 500, 1000, 2500, 5000, 10,000, 25,000, 50,000, or 100,000 variants.
101. The method of claim 99, wherein said library comprises at least about 1%, 5%, 10%, 25%, 50%, 60%, 75%, 80%, 85%, 90% or 95% of possible splice variants based upon a computational analysis of intron-exon junctions.
102. The method of claim 99, wherein said feasibility study is performed with a partner.
103. The method of claim 99, wherein said target protein is a therapeutic protein.
104. The method of claim 99, further comprising the step of identifying a pharmaceutical compound comprising said therapeutic protein, and wherein said feasibility study comprises at least one clinical trial evaluating said pharmaceutical compound.
105. The method as recited in claim 104, further comprising the step of collecting royalties from sales of said pharmaceutical compound.
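By way of illustration only, the two library-design strategies recited above (exons arranged in a different relative order, and variants comprising a subset of the peptide fragments) can be sketched computationally. The exon names and sequences below are hypothetical toy values, and the helper names `rearranged_orders`, `exon_subsets` and `assemble` are assumptions introduced for this sketch; they are not taken from the specification. The sketch simply concatenates exon sequences and does not model reading frame, linkers, or codon optimization.

```python
from itertools import combinations, permutations


def rearranged_orders(exons):
    """Yield every ordering of the exons except the first (original) relative order."""
    original = tuple(exons)
    for order in permutations(original):
        if order != original:
            yield order


def exon_subsets(exons):
    """Yield every non-empty proper subset of the exons, preserving the original order.

    Each subset excludes at least one exon.
    """
    original = tuple(exons)
    for size in range(1, len(original)):
        yield from combinations(original, size)


def assemble(order, sequences):
    """Concatenate exon sequences in the given order (assumption: no linkers, frame not checked)."""
    return "".join(sequences[name] for name in order)


# Hypothetical toy exons; real exons would come from the selected first protein.
sequences = {"E1": "ATGGCT", "E2": "GGTACC", "E3": "TTCTAA"}
exons = list(sequences)

# Library in which every member has a different relative exon order.
order_library = [assemble(o, sequences) for o in rearranged_orders(exons)]

# Library in which every member comprises a different subset of the exons.
subset_library = [assemble(s, sequences) for s in exon_subsets(exons)]

print(len(order_library), len(subset_library))
```

For n distinct exons, the first library has n! - 1 members and the second has 2^n - 2 members, so full enumeration of this kind is only practical for a modest number of exons.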
PCT/US2007/025632 2006-12-13 2007-12-13 Fragment-rearranged nucleic acids and uses thereof WO2008076368A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US87499106P 2006-12-13 2006-12-13
US60/874,991 2006-12-13

Publications (2)

Publication Number Publication Date
WO2008076368A2 (en) 2008-06-26
WO2008076368A3 (en) 2008-11-27

Family

ID=39247666

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/025632 WO2008076368A2 (en) 2006-12-13 2007-12-13 Fragment-rearranged nucleic acids and uses thereof

Country Status (1)

Country Link
WO (1) WO2008076368A2 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9051666B2 (en) 2002-09-12 2015-06-09 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US9216414B2 (en) 2009-11-25 2015-12-22 Gen9, Inc. Microfluidic devices and methods for gene synthesis
US9217144B2 (en) 2010-01-07 2015-12-22 Gen9, Inc. Assembly of high fidelity polynucleotides
US10081807B2 (en) 2012-04-24 2018-09-25 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US10202608B2 (en) 2006-08-31 2019-02-12 Gen9, Inc. Iterative nucleic acid assembly using activation of vector-encoded traits
US10207240B2 (en) 2009-11-03 2019-02-19 Gen9, Inc. Methods and microfluidic devices for the manipulation of droplets in high fidelity polynucleotide assembly
US10308931B2 (en) 2012-03-21 2019-06-04 Gen9, Inc. Methods for screening proteins using DNA encoded chemical libraries as templates for enzyme catalysis
US10457935B2 (en) 2010-11-12 2019-10-29 Gen9, Inc. Protein arrays and methods of using and making the same
US11072789B2 (en) 2012-06-25 2021-07-27 Gen9, Inc. Methods for nucleic acid assembly and high throughput sequencing
US11084014B2 (en) 2010-11-12 2021-08-10 Gen9, Inc. Methods and devices for nucleic acids synthesis
US11702662B2 (en) 2011-08-26 2023-07-18 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030082630A1 (en) * 2001-04-26 2003-05-01 Maxygen, Inc. Combinatorial libraries of monomer domains

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
KITAMURA KOICHIRO ET AL: "Construction of block-shuffled libraries of DNA for evolutionary protein engineering: Y-ligation-based block shuffling." PROTEIN ENGINEERING, vol. 15, no. 10, October 2002 (2002-10), pages 843-853, XP002494710 ISSN: 0269-2139 *
KOLKMAN J A ET AL: "DIRECTED EVOLUTION OF PROTEIN BY EXON SHUFFLING" NATURE BIOTECHNOLOGY, NATURE PUBLISHING GROUP, NEW YORK, NY, US, vol. 19, May 2001 (2001-05), pages 423-428, XP001040381 ISSN: 1087-0156 *
QI-LIAN CAI ET AL: "IMMUNOGENICITY OF POLYEPITOPE LIBRARIES ASSEMBLED BY EPITOPE SHUFFLING: AN APPROACH TO THE DEVELOPMENT OF CHIMERIC GENE VACCINATION AGAINST MALARIA" VACCINE, BUTTERWORTH SCIENTIFIC. GUILDFORD, GB, vol. 23, no. 2, 1 November 2004 (2004-11-01), pages 267-277, XP009071807 ISSN: 0264-410X *
SAKABE NOBORU JO ET AL: "A bioinformatics analysis of alternative exon usage in human genes coding for extracellular matrix proteins." GENETICS AND MOLECULAR RESEARCH : GMR 2004, vol. 3, no. 4, 2004, pages 532-544, XP002475762 ISSN: 1676-5680 *
VAN DEN BERGH FRANCOISE ET AL: "Characterization of human AMP deaminase 2 (AMPD2) gene expression reveals alternative transcripts encoding variable N-terminal extensions of isoform L" BIOCHEMICAL JOURNAL, vol. 312, no. 2, 1995, pages 401-410, XP002475763 ISSN: 0264-6021 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10774325B2 (en) 2002-09-12 2020-09-15 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10640764B2 (en) 2002-09-12 2020-05-05 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10450560B2 (en) 2002-09-12 2019-10-22 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US9051666B2 (en) 2002-09-12 2015-06-09 Gen9, Inc. Microarray synthesis and assembly of gene-length polynucleotides
US10202608B2 (en) 2006-08-31 2019-02-12 Gen9, Inc. Iterative nucleic acid assembly using activation of vector-encoded traits
US10207240B2 (en) 2009-11-03 2019-02-19 Gen9, Inc. Methods and microfluidic devices for the manipulation of droplets in high fidelity polynucleotide assembly
US9968902B2 (en) 2009-11-25 2018-05-15 Gen9, Inc. Microfluidic devices and methods for gene synthesis
US9216414B2 (en) 2009-11-25 2015-12-22 Gen9, Inc. Microfluidic devices and methods for gene synthesis
US9925510B2 (en) 2010-01-07 2018-03-27 Gen9, Inc. Assembly of high fidelity polynucleotides
US9217144B2 (en) 2010-01-07 2015-12-22 Gen9, Inc. Assembly of high fidelity polynucleotides
US11071963B2 (en) 2010-01-07 2021-07-27 Gen9, Inc. Assembly of high fidelity polynucleotides
US10457935B2 (en) 2010-11-12 2019-10-29 Gen9, Inc. Protein arrays and methods of using and making the same
US10982208B2 (en) 2010-11-12 2021-04-20 Gen9, Inc. Protein arrays and methods of using and making the same
US11084014B2 (en) 2010-11-12 2021-08-10 Gen9, Inc. Methods and devices for nucleic acids synthesis
US11702662B2 (en) 2011-08-26 2023-07-18 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids
US10308931B2 (en) 2012-03-21 2019-06-04 Gen9, Inc. Methods for screening proteins using DNA encoded chemical libraries as templates for enzyme catalysis
US10081807B2 (en) 2012-04-24 2018-09-25 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US10927369B2 (en) 2012-04-24 2021-02-23 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US11072789B2 (en) 2012-06-25 2021-07-27 Gen9, Inc. Methods for nucleic acid assembly and high throughput sequencing

Also Published As

Publication number Publication date
WO2008076368A3 (en) 2008-11-27

Similar Documents

Publication Publication Date Title
WO2008076368A2 (en) Fragment-rearranged nucleic acids and uses thereof
US20200332317A1 (en) Storage through iterative dna editing
Galperin The molecular biology database collection: 2004 update
Kwasnieski et al. Complex effects of nucleotide variants in a mammalian cis-regulatory element
AU775076B2 (en) Protein scaffolds for antibody mimics and other binding proteins
Galperin The molecular biology database collection: 2005 update
EP3844272A1 (en) Methods and compositions for modulating a genome
US6080541A (en) Method for producing tagged genes, transcripts, and proteins
US20070266449A1 (en) Generation of animal models
JP5396653B2 (en) High efficiency gene transfer and expression in mammalian cells by multiple transfection procedures of MAR sequences
JP2020524490A5 (en)
CN104603286A (en) Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
EP2078077A2 (en) Nucleic acid libraries and their design and assembly
CN104937101B (en) The method for designing the divergent big reiterated DNA sequences of codon optimization
CN104685116A (en) Methods for nucleic acid assembly and high throughput sequencing
RU2469089C2 (en) Expression system for increasing gene expression level, nucleic acid molecule, transgenic animal and kit
JPH05503000A (en) Cell-free synthesis and isolation of novel genes and polypeptides
CN110730821A (en) Enhanced hAT family transposon mediated gene transfer and related compositions, systems and methods
Kohman et al. From designing the molecules of life to designing life: future applications derived from advances in DNA technologies
US10053697B1 (en) Programmable alternative splicing devices and uses thereof
CN111315883A (en) Two-component vector library system for rapid assembly and diversification of full-length T cell receptor open reading frames
Haberkorn et al. Molecular imaging and therapy—a programme based on the development of new biomolecules
US20220090051A1 (en) A method for screening of an in vitro display library within a cell
JP2017530726A (en) Antibody-like protein
JP2022513319A (en) SSI cells with predictable and stable transgene expression and methods of formation

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 07862935

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 07862935

Country of ref document: EP

Kind code of ref document: A2