WO2008027558A2 - Iterative nucleic acid assembly using activation of vector-encoded traits - Google Patents

Iterative nucleic acid assembly using activation of vector-encoded traits

Publication number
WO2008027558A2
Authority
WO
WIPO (PCT)
Prior art keywords
nucleic acid
sequence
assembly
vector
restriction
Application number
PCT/US2007/019209
Other languages
French (fr)
Other versions
WO2008027558A3 (en)
Inventor
William J. Blake
Original Assignee
Codon Devices, Inc.
Application filed by Codon Devices, Inc.
Publication of WO2008027558A2
Publication of WO2008027558A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/66 General methods for inserting a gene into a vector to form a recombinant vector using cleavage and ligation; Use of non-functional linkers or adaptors, e.g. linkers containing the sequence for a restriction endonuclease
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/64 General methods for preparing the vector, for introducing it into the cell or for selecting the vector-containing host
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6844 Nucleic acid amplification reactions

Definitions

  • Recombinant and synthetic nucleic acids have many applications in research, industry, agriculture, and medicine.
  • Recombinant and synthetic nucleic acids can be used to express and obtain large amounts of polypeptides, including enzymes, antibodies, growth factors, receptors, and other polypeptides that may be used for a variety of medical, industrial, or agricultural purposes.
  • Recombinant and synthetic nucleic acids also can be used to produce genetically modified organisms including modified bacteria, yeast, mammals, plants, and other organisms.
  • Genetically modified organisms may be used in research (e.g., as animal models of disease, as tools for understanding biological processes, etc.), in industry (e.g., as host organisms for protein expression, as bioreactors for generating industrial products, as tools for environmental remediation, for isolating or modifying natural compounds with industrial applications, etc.), in agriculture (e.g., modified crops with increased yield or increased resistance to disease or environmental stress, etc.), and for other applications.
  • Recombinant and synthetic nucleic acids also may be used as therapeutic compositions (e.g., for modifying gene expression, for gene therapy, etc.) or as diagnostic tools (e.g., as probes for disease conditions, etc.).
  • nucleic acids e.g., naturally occurring nucleic acids
  • combinations of nucleic acid amplification, mutagenesis, nuclease digestion, ligation, cloning and other techniques may be used to produce many different recombinant nucleic acids.
  • Chemically synthesized polynucleotides are often used as primers or adaptors for nucleic acid amplification, mutagenesis, and cloning.
  • Techniques also are being developed for de novo nucleic acid assembly whereby nucleic acids are made (e.g., chemically synthesized) and assembled to produce longer target nucleic acids of interest. For example, different multiplex assembly techniques are being developed for assembling oligonucleotides into larger synthetic nucleic acids that can be used in research, industry, agriculture, and/or medicine.
  • aspects of the invention relate to methods, compositions, and devices for assembling nucleic acids.
  • the invention provides nucleic acid configurations and cloning strategies for progressively assembling a long nucleic acid product using a plurality of assembly cycles. Aspects of the invention can reduce the time required for nucleic acid assembly by providing an efficient iterative assembly procedure for generating long nucleic acid products.
  • an assembly cycle involves assembling a vector and two or more nucleic acid inserts containing one or more regulatory sequences.
  • the regulatory sequence(s) activate one or more vector-encoded traits when they are assembled in a predetermined configuration. This allows a correctly assembled nucleic acid to be isolated by selecting or screening for the activated trait(s).
  • the isolated nucleic acid may contain an assembled insert that can be excised along with one or more of the regulatory sequences and combined with a further nucleic acid insert and an appropriate vector in a subsequent assembly cycle.
  • a correctly assembled product can again be isolated using one or more traits that are encoded by the vector and activated by the regulatory sequence(s) present on correctly ligated insert(s).
  • This procedure can be repeated until a final nucleic acid product of interest is assembled.
  • This final product can be used directly or further cloned (e.g., into a new vector) using any suitable technique.
  • one or more regulatory sequences used during assembly may be removed from the final nucleic acid product.
  • correctly assembled nucleic acids can be isolated directly from a pool of transformed host cells in each cycle without requiring individual clones to be isolated and analyzed. This reduces the assembly time associated with each cycle by directly expanding a transformation mix in culture and bypassing the isolation and expansion of individual host cell colonies grown from a transformation mix.
  • an excised insert from a first vector may be combined with a second vector without separating (e.g., size selecting) the excised insert from the first vector backbone or from uncut vector/insert.
  • a restriction digest of a first assembled nucleic acid product may be combined directly with a second vector and another nucleic acid fragment.
  • the restriction digest may include excised insert, empty vector backbone, and uncut vector/insert from the first assembly cycle. While the presence of the first vector backbone may interfere with the second ligation, correctly-ligated product can be selected for if the activated traits encoded by the second vector are different from those encoded by the first vector. This also may reduce assembly time by avoiding labor intensive size selection and isolation steps in each cycle. Accordingly, aspects of the invention provide assembly techniques that are i) less error-prone than simultaneous ligations of pluralities of pooled DNA fragments, and ii) less labor-intensive than iterative pairwise ligation of DNA segments separated based on size.
  • a simultaneous ligation of a plurality of pooled DNA fragments may generate mis-ligated products that typically are not identified until a subsequent sequence analysis is performed on nucleic acid retrieved from a transformed cell culture.
  • methods of the present invention may select for correctly ligated products by selecting for activation of one or more vector-encoded traits. Iterative pairwise ligation of DNA segments separated based on size may be slow and labor intensive, because it involves DNA isolation and transformant analysis in each cycle of ligation.
  • methods of the present invention may be implemented without isolating fragments based on size and without analyzing individual clones from a transformation reaction to identify those with correctly ligated inserts.
  • a size analysis may be performed as a quality control step either in parallel (e.g., to monitor the progress of the assembly reaction) or prior to performing the next assembly step (e.g., to confirm that a first assembly was successful prior to proceeding with a second assembly).
  • a nucleic acid insert that is used in any of the vector activation assembly cycles described herein may be a nucleic acid that was previously assembled in a multiplex assembly reaction.
  • nucleic acid fragments generated using any of the multiplex assembly reactions illustrated in FIGs. 1-4 or otherwise described herein may be subsequently assembled to form larger nucleic acid products using one or more cycles of a vector-encoded trait activation technique.
  • one or more vector activation assembly cycles may be included in an assembly procedure outlined in FIG. 5.
  • Non-limiting examples of nucleic acid configurations that may be used for assembly by vector activation are illustrated in FIGs. 6-8, and further described herein.
  • a plurality of assembly cycles can be performed in parallel and pairs of nucleic acid products from a first set of assembly cycles can be combined and assembled in a second set of assembly cycles.
  • pairs of assembled nucleic acids from the second set of assembly cycles can be combined and assembled in a third set of assembly cycles. This process can be repeated one or more times until a final product is assembled to contain all of the starting nucleic acids from the first plurality of assembly cycles.
  • an assembly procedure is hierarchical in that it involves a plurality of converging iterative assembly reactions wherein a first plurality (e.g., N) of pair-wise assembly reactions produces a first plurality of products that are combined in a pair-wise fashion in a second plurality (e.g., N/2) of assembly reactions to generate a second plurality of products.
  • This procedure can be repeated with the number of assembly reactions (and resulting assembly products) being twofold less at each consecutive stage (e.g., until a single final product is generated).
  • the sizes of the nucleic acid products at each stage are about twofold greater than the sizes at the prior stage (assuming that the initial nucleic acid inserts had similar sizes).
  • this hierarchical assembly procedure can produce a long insert that increases exponentially in size as a function of the number of consecutive assembly steps.
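  • The halving of reaction products and doubling of product sizes described above can be sketched as a small simulation. This is an illustrative model only: the function name `hierarchical_stages` is hypothetical, sizes are in base pairs, and an unpaired product (when a stage has an odd count) is assumed to carry over unchanged to the next stage.

```python
def hierarchical_stages(insert_sizes):
    """Return the list of product sizes at each stage of pairwise assembly.

    Stage 0 holds the starting inserts; each later stage joins neighbours
    in pairs. An unpaired final product is carried over unchanged
    (an assumption of this sketch).
    """
    stages = [list(insert_sizes)]
    current = list(insert_sizes)
    while len(current) > 1:
        # Join neighbours (0,1), (2,3), ...; a lone last item passes through.
        merged = [sum(current[i:i + 2]) for i in range(0, len(current), 2)]
        stages.append(merged)
        current = merged
    return stages


if __name__ == "__main__":
    for number, sizes in enumerate(hierarchical_stages([1000] * 8)):
        print(f"stage {number}: {len(sizes)} product(s), sizes {sizes}")
```

  • With eight inserts of similar size, the number of products falls 8, 4, 2, 1 while each product is about twice the size of those in the prior stage.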
  • iterative assembly procedures can be used in a linear assembly procedure.
  • one product of a prior assembly may be combined with a second nucleic acid insert that was not generated from a prior iterative assembly procedure.
  • the second nucleic acid insert at each step may be an oligonucleotide (e.g., a double-stranded pair of oligonucleotides) or a relatively short nucleic acid assembled in a multiplex assembly reaction (e.g., about 500 nucleotides long).
  • nucleic acid being assembled in this linear procedure grows linearly by the length of the second nucleic acid added at each consecutive step.
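  • For comparison with the hierarchical case, linear growth can be sketched as follows. This assumes, purely for illustration, that the same length is added in every cycle; the function name and parameters are hypothetical.

```python
def linear_cycles_needed(target_length, start_length, added_per_cycle):
    """Cycles needed when each cycle adds a fixed length (in bp)."""
    if added_per_cycle <= 0:
        raise ValueError("added_per_cycle must be positive")
    remaining = max(0, target_length - start_length)
    # Integer ceiling division avoids floating-point rounding.
    return -(-remaining // added_per_cycle)


if __name__ == "__main__":
    # Growing a 500 bp start to 10 kb by adding 500 bp per cycle.
    print(linear_cycles_needed(10_000, 500, 500))
```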
  • an iterative assembly of the invention may involve a combination of one or more linear and one or more exponential assembly steps and is not limited to either a hierarchical assembly or a linear assembly.
  • Design and assembly methods of the invention may be automated. Methods of the invention may reduce the cost and increase the speed and accuracy of nucleic acid assembly procedures, particularly automated assembly procedures.
  • aspects of the invention provide methods and compositions that can be used to efficiently assemble a target nucleic acid, particularly a long target nucleic acid.
  • a target nucleic acid may be amplified, sequenced or cloned after it is made.
  • a host cell may be transformed with the assembled target nucleic acid.
  • the target nucleic acid may be integrated into the genome of the host cell.
  • the target nucleic acid may encode a polypeptide.
  • the polypeptide may be expressed (e.g., under the control of an inducible promoter).
  • the polypeptide may be isolated or purified.
  • a cell transformed with an assembled nucleic acid may be stored, shipped, and/or propagated (e.g., grown in culture).
  • the invention provides methods of obtaining target nucleic acids by sending sequence information and delivery information to a remote site.
  • the sequence may be analyzed at the remote site.
  • the starting nucleic acids may be designed and/or produced at the remote site using one or more methods of the invention.
  • the starting nucleic acids, an intermediate product in the assembly reaction, and/or the assembled target nucleic acid may be shipped to the delivery address that was provided.
  • Other aspects of the invention provide systems for designing starting nucleic acids and/or for assembling the starting nucleic acids to make a target nucleic acid.
  • Other aspects of the invention relate to methods and devices for automating a multiplex oligonucleotide assembly reaction that include one or more assembly methods of the invention.
  • Yet further aspects of the invention relate to business methods of marketing one or more methods, systems, and/or automated procedures that involve assembly methods of the invention.
  • FIG. 1 illustrates certain aspects of an embodiment of a polymerase-based multiplex oligonucleotide assembly reaction
  • FIG. 2 illustrates certain aspects of an embodiment of sequential assembly of a plurality of oligonucleotides in a polymerase-based multiplex assembly reaction
  • FIG. 3 illustrates an embodiment of a ligase-based multiplex oligonucleotide assembly reaction
  • FIG. 4 illustrates several embodiments of ligase-based multiplex oligonucleotide assembly reactions on supports;
  • FIG. 5 illustrates an embodiment of a nucleic acid assembly procedure
  • FIG. 6 illustrates a non-limiting embodiment of two assembly cycles of the invention
  • FIG. 7 illustrates a non-limiting embodiment of two assembly cycles of the invention
  • FIG. 8 illustrates non-limiting embodiments of activator sequence configurations according to the invention.
  • FIG. 9 illustrates a non-limiting embodiment of a hierarchical cloning strategy that may be integrated with one or more enzyme-mediated multiplex assembly steps
  • FIG. 10 illustrates a non-limiting embodiment depicting a Pairwise Selection Assembly (PSA) procedure
  • FIG. 11 provides non-limiting diagrams of exemplary vectors for Pairwise Selection Assembly (PSA); and,
  • FIG. 12 illustrates a non-limiting example of promoter regions containing type IIS recognition sequences for use in some aspects of the invention.
  • the invention provides methods and nucleic acid configurations for activating one or more vector-encoded traits (e.g., antibiotic resistance, auxotrophy, etc.) upon correct assembly of two or more nucleic acid fragments into a vector.
  • Each nucleic acid fragment to be included in an assembly reaction may contain one or more activation sequences. These activation sequences are configured to activate vector-encoded trait(s) on a first vector only if the fragments are assembled in the correct order and orientation in the first vector. Accordingly, a nucleic acid insert that is correctly assembled in the first vector can be isolated by selecting or screening for appropriate trait activation (e.g., in a transformed host cell population).
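  • The selection logic described above can be illustrated with a toy model. Everything named here is an assumption made for the sketch: the tags `A1` and `A2` stand for two activation sequences, `trait_1` and `trait_2` for the vector-encoded traits they activate, and an activation sequence is treated as non-functional when its fragment is ligated in reverse. The model checks only the two vector junctions.

```python
from typing import NamedTuple, Optional


class Fragment(NamedTuple):
    name: str
    left_tag: Optional[str]   # activation sequence at the left end, if any
    right_tag: Optional[str]  # activation sequence at the right end, if any


def oriented_ends(fragment, forward):
    """Return ((left_tag, functional), (right_tag, functional)) as ligated."""
    if forward:
        return (fragment.left_tag, True), (fragment.right_tag, True)
    # Reversing a fragment swaps its ends and flips its activation sequences.
    return (fragment.right_tag, False), (fragment.left_tag, False)


def activated_traits(assembly):
    """assembly: ordered list of (Fragment, forward) pairs inside the vector."""
    if not assembly:
        return set()
    left_end, _ = oriented_ends(*assembly[0])
    _, right_end = oriented_ends(*assembly[-1])
    traits = set()
    if left_end == ("A1", True):
        traits.add("trait_1")
    if right_end == ("A2", True):
        traits.add("trait_2")
    return traits


def survives_double_selection(assembly):
    """True only when both vector-encoded traits are activated."""
    return activated_traits(assembly) == {"trait_1", "trait_2"}


if __name__ == "__main__":
    x = Fragment("X", "A1", None)
    y = Fragment("Y", None, "A2")
    print(survives_double_selection([(x, True), (y, True)]))   # correct order
    print(survives_double_selection([(y, True), (x, True)]))   # wrong order
    print(survives_double_selection([(x, False), (y, True)]))  # X reversed
```

  • In this model only the assembly with both fragments in the intended order and orientation activates both traits, which is the property that lets correctly assembled inserts be isolated by selection or screening.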
  • the correctly assembled nucleic acid insert can be removed from the first vector along with one or more activation sequences.
  • This assembled nucleic acid insert then can be combined in a second assembly reaction with a further nucleic acid fragment that also may have one or more activation sequences.
  • These fragments may be inserted into a second vector encoding one or more traits that are activated only if the fragments in the second assembly reaction are correctly assembled into the second vector.
  • the second vector may have the same vector backbone as the first vector.
  • the second vector may be a different vector that encodes one or more of the same traits as the first vector.
  • the second vector may encode one or more traits (e.g., traits that are activated when the correct activating sequence is introduced) that are different from the traits encoded by the first vector.
  • the second vector does not encode any of the activated trait(s) of the first vector.
  • the first vector may not encode any of the activated traits of the second vector.
  • a correctly assembled insert in the second vector may be isolated by selecting or screening for appropriate trait(s) (e.g., in a transformed host cell population). This process may be repeated in one or more additional cycles (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.) adding at least one additional fragment in each cycle. After each cycle, the insert should be larger than the insert that was generated in the previous cycle.
  • a 100 kb fragment of DNA broken into one hundred 1 kb pieces will require 7 assembly steps (100 > 64 > 32 > 16 > 8 > 4 > 2 > 1) while the same fragment broken into two hundred 500 bp pieces will require 8 assembly steps (200 > 128 > 64 > 32 > 16 > 8 > 4 > 2 > 1).
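  • The step counts in this example are consistent with merging the pieces in pairs until one product remains, which takes ceil(log2(n)) rounds for n pieces. A minimal sketch, assuming every round pairs all available products; the function name is hypothetical.

```python
def hierarchical_steps(n_pieces: int) -> int:
    """Number of pairwise assembly rounds needed to merge n_pieces into one."""
    if n_pieces < 1:
        raise ValueError("n_pieces must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for every integer n >= 1.
    return (n_pieces - 1).bit_length()


if __name__ == "__main__":
    for pieces in (100, 200):
        print(pieces, "pieces ->", hierarchical_steps(pieces), "steps")
```

  • Halving the piece size (and so doubling the piece count) adds only one assembly step, which is why the 500 bp design needs 8 steps rather than 7.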
  • the nucleic acids that are combined for assembly in each cycle may be obtained from any suitable source.
  • each nucleic acid fragment independently may have been generated in a multiplex nucleic acid assembly reaction, an amplification reaction, a prior cloning procedure, etc., or any combination thereof.
  • one or two fragments that are combined for assembly each may have been generated in a prior assembly cycle that involved vector-encoded trait activation as described herein.
  • a plurality of parallel assembly cycles involving vector-encoded trait activation may merge according to a predetermined hierarchy to generate a final assembled nucleic acid product in a hierarchical assembly procedure. Aspects of the invention may be used to generate a nucleic acid of any size.
  • the size of the final product will depend, at least in part, on the size of the fragments that are being assembled and the number of assembly cycles that are performed. For example, nucleic acids from about 20 bp to about 1 Mb long may be assembled. In some embodiments, a target nucleic acid may be between about 100 bp and 1 kb (e.g., about 200 bp, about 300 bp, about 400 bp, about 500 bp, about 600 bp, etc.).
  • a target nucleic acid may be between about 1 kb and 10 kb (e.g., about 2 kb, about 3 kb, about 4 kb, about 5 kb, about 6 kb, etc.), between about 10 kb and 100 kb (e.g., about 20 kb, about 30 kb, about 40 kb, about 50 kb, etc.), or between about 100 kb and 1 Mb in size.
  • target nucleic acids that are smaller, larger, or intermediate in size also may be assembled according to methods of the invention. Aspects of the invention may be automated.
  • a robotic liquid handling device integrated with a plurality of reaction stations may be used to automate one or more cycles of assembly described herein.
  • one or more reaction steps may be performed (and optionally automated) on a microfluidic device comprising a microfluidic substrate having one or more reaction sites connected via microfluidic channels.
  • aspects of the invention also provide vectors, nucleic acid cassettes (e.g., encoding one or more traits or comprising one or more activation sequences), enzymes, selection agents (e.g., one or more antibiotics), etc., that may be used as standard templates or reagents for assembly methods of the invention.
  • aspects of the invention relate to methods, compositions, and devices for assembling nucleic acids. Some aspects of the invention provide efficient methods for assembling nucleic acids using one or more assembly cycles, wherein at least two predetermined nucleic acids are assembled together with a vector in each assembly cycle. In each cycle, a correctly assembled nucleic acid product may be isolated using one or more appropriate traits (e.g., selectable and/or detectable traits), which become activated (e.g., functional) upon correct assembly of nucleic acid fragments. In some embodiments, one or more predetermined traits encoded on a vector may be activated in each cycle by the correct insertion of assembly nucleic acids into the vector.
  • an insert fragment carrying one or more segments of nucleic acid i.e., sequence
  • activation tags that are provided by correct assembly of another fragment (such as an insert).
  • aspects of the invention provide methods for specifically selecting correctly assembled nucleic acids rather than just the presence of certain traits. This can be used in each cycle to avoid certain cloning or validation steps and thereby reduce the time of each cycle. It should be appreciated that in each cycle, one or more nucleic acids being added to the assembly reaction may have been produced in a prior assembly cycle of the invention. Accordingly, iterative assembly using a plurality of assembly cycles can be used to generate progressively longer assembled nucleic acid products in a series of efficient assembly reactions.
  • some embodiments of the invention provide methods for assembling nucleic acid segments which include the following steps: digesting a first population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites; digesting a second population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, where the first and second populations of nucleic acids comprise a first activation sequence located between the first and second restriction sites and a second activation sequence located between the third and fourth restriction sites, and digestion of the first population results in a first population of nucleic acid segments that comprises the first activation sequence but lacks the second activation sequence, and digestion of the second population results in a second population of nucleic acid segments that lacks the first activation sequence and comprises the second activation sequence; combining (optionally in the presence of a ligase) the first and second populations of
  • any of the embodiments described herein wherein one or more vectors are digested with enzymes that cut at the first and fourth restriction sites may be practiced using a vector that is digested with one or more other enzymes that generate overhangs that are compatible with the overhangs at the first and fourth sites of the inserts being cloned into the vectors.
  • the vector may not contain the first and fourth restriction sites provided it contains sites that can be specifically cut to produce appropriate compatible ends (e.g., overhangs) to clone the regulatory ends of the inserts adjacent to the regulatable markers (e.g., activatable markers) on the vector backbone.
  • the methods may include additional steps: digesting a third population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites; digesting a fourth population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, where the third and fourth populations of nucleic acids comprise a first activation sequence located between the first and second restriction sites and a second activation sequence located between the third and fourth restriction sites, and digestion of the third population results in a third population of nucleic acid segments that comprises the first activation sequence but lacks the second activation sequence, and digestion of the fourth population results in a fourth population of nucleic acid segments that lacks a first activation sequence and comprises a second activation sequence; combining in the presence of a ligase the third and fourth populations of nucleic acid segments with a second nucleic acid
  • yet additional steps may be included for assembly: digesting a third population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites; digesting a fourth population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, wherein the third and fourth populations of nucleic acids comprise a 5' promoter sequence located between the first and second restriction sites and a 3' promoter sequence between the third and fourth restriction sites, and digestion of the third population results in a third population of nucleic acid segments that comprises the 5' promoter sequence but lacks the 3' promoter sequence, and digestion of the fourth population results in a fourth population of nucleic acid segments that lacks a 5' promoter sequence and comprises a 3' promoter sequence; combining in the presence of a ligase the third and fourth populations of nucleic acid
  • the methods for assembling nucleic acid segments may include the following steps: digesting a first population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites; digesting a second population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, wherein the first and second populations of nucleic acids comprise a 5' promoter sequence located between the first and second restriction sites and a 3' promoter sequence located between the third and fourth restriction sites, and digestion of the first population results in a first population of nucleic acid segments that comprises the 5' promoter sequence but lacks the 3' promoter sequence, and digestion of the second population results in a second population of nucleic acid segments that lacks the 5' promoter sequence and comprises the 3' promoter sequence; combining in the presence of a ligase the
  • the first, second, third and/or fourth marker genes may be selectable and/or activatable markers, such as antibiotic resistance genes.
  • the restriction enzymes used in any of the embodiments disclosed may be type II restriction enzymes or type IIS restriction enzymes.
  • the same first type IIS restriction enzyme recognition sequence is used for the first and third sites.
  • the same type IIS restriction enzyme recognition sequence is used for the second and fourth sites. Accordingly, a single type IIS enzyme may be used to cut the first and third sites and a different single type IIS enzyme may be used to cut the second and fourth sites.
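  • The two complementary digests can be pictured with a simple ordered-element model. The layout below (site 1, first activation sequence, site 2, insert, site 3, second activation sequence, site 4) is an assumption drawn from the description of where the activation sequences sit; the names are hypothetical, and the model ignores the offset between a type IIS recognition site and its cleavage position.

```python
# Assumed linear layout of one starting nucleic acid (labels are illustrative).
CONSTRUCT = ("site1", "act1", "site2", "insert", "site3", "act2", "site4")


def excise(construct, cut_sites):
    """Return the elements lying between the two named cut sites."""
    first, second = (construct.index(site) for site in cut_sites)
    start, end = sorted((first, second))
    return construct[start + 1:end]


if __name__ == "__main__":
    # Enzyme set 1 cuts at sites 1 and 3: keeps act1, loses act2.
    print(excise(CONSTRUCT, ("site1", "site3")))
    # Enzyme set 2 cuts at sites 2 and 4: loses act1, keeps act2.
    print(excise(CONSTRUCT, ("site2", "site4")))
```

  • Each digest therefore yields a segment carrying exactly one of the two activation sequences, so only a joint product of one segment of each kind can supply both.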
  • the type IIS recognition sites are located within the flanking regions of the inserts in association with the activating sequences.
  • the second and third sites are oriented in such a way that the nucleic acid of the insert is cleaved by the respective enzymes binding to the recognition sites.
  • the first and fourth sites are oriented in such a way that the nucleic acid of the vector is cleaved by the respective enzymes.
  • an insert released by cleavage at the first and third sites will have an overhang sequence at the first end that is complementary to an overhang generated on the second vector, whereas the overhang sequence at the third site will be complementary to the overhang generated at the second site of a different insert released by cleavage at the second and fourth sites, wherein both inserts are designed to be assembled in a subsequent assembly step.
  • the inserts may be designed to include an overlapping sequence (e.g., at least the length of a restriction cleavage site such as a 4-base overlap of some type IIS restriction cleavage sites).
  • the cleavage overhang sizes and orientations generated by the restriction enzymes used for cutting the second and third sites should be compatible so that they generate complementary sequences for ligation within the overlap region of two inserts designed for subsequent ligation.
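  • A compatibility check for two overhangs can be sketched as a reverse-complement comparison. This assumes both overhangs are single-stranded, of the same polarity (e.g., both 5' overhangs), and written 5' to 3'; the function names and example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def overhangs_compatible(overhang_a: str, overhang_b: str) -> bool:
    """True when two same-polarity overhangs can anneal for ligation."""
    a, b = overhang_a.upper(), overhang_b.upper()
    return len(a) == len(b) and a == reverse_complement(b)


if __name__ == "__main__":
    print(overhangs_compatible("GGAG", "CTCC"))
    print(overhangs_compatible("GGAG", "GGAG"))
```

  • Designing each junction with its own non-palindromic 4-base overlap is one way to keep fragments from ligating to themselves or in the wrong order.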
  • the type IIS recognition sites may be located within the promoter regions of the activation sequences (e.g., between or around the -35 and -10 sequences of a promoter) and located such that the digestion site is either outside the activation sequence within the insert sequence (e.g., for sites 2 and 3) or distal to the insert site and in the vector sequence (e.g., for sites 1 and 4).
  • the restriction site configurations are illustrated by the non-limiting examples provided herein.
  • sites 1, 2, 3, and 4 are recognized by the same enzyme (e.g., the same type IIS enzyme).
  • differential cutting at sites 1 and 3 versus 2 and 4 may be achieved using selective protection, methylation, and/or cleavage techniques as described herein.
  • a first set (e.g., pair) of oligonucleotides may be used to protect only sites 1 and 3 from methylation (e.g., using RecA-mediated triple helix formation).
  • selective cleavage of the unmethylated sites 1 and 3 may be obtained using an enzyme that is sensitive to the methylation of the substrate nucleic acid.
  • sites 2 and 4 may be selectively cleaved using specific oligonucleotides that protect sites 2 and 4 (but not sites 1 and 3).
  • aspects of the invention provide nucleic acid configurations and assembly strategies that involve selecting for one or more vector-encoded traits in a plurality of consecutive assembly cycles.
  • the same vector encoding the same trait(s) may be used in a plurality of consecutive assembly cycles, with one or more additional nucleic acids being added in each cycle.
  • two or more different vectors may be used in consecutive cycles.
  • two different vectors are used repeatedly in alternate cycles.
  • a first vector encoding one or more first traits may be used in a first cycle, followed by a second vector encoding one or more second traits (e.g., one, two, three, four, or more second traits) in a second cycle.
  • a first vector (e.g., pCK) may contain two selectable markers (e.g., traits), such as chloramphenicol and kanamycin resistance markers, which become functional upon ligation with a correct insert.
  • a second vector (e.g., pTS, as illustrated in FIG. 11) may contain two selectable markers (e.g., traits), such as tetracycline and spectinomycin resistance markers, which become functional upon ligation with a correct insert.
  • Each of the vectors provided in the instant invention for use in the assembly process contains a functional selection marker, e.g., ampicillin resistance, which can be used for propagation purposes, in addition to two non-functional (e.g., activatable) resistance markers as described above.
  • the vectors, such as pCK and pTS, may be constructed such that they contain either a high-copy number origin of replication or a BAC-based single-copy number origin of replication.
  • the former versions enable DNA assembly up to ~10-30 kb, while the latter BAC-based vectors may be more suitable for longer constructions up to ~300 kb. Transition from one vector type to another is seamless, as both vector types have the same non-functional markers that are activated by the same activation tags (i.e., they differ only in the origin of replication).
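The choice between the two origin types can be expressed as a simple size rule. The sketch below is illustrative only: the text gives approximate ranges, so the exact cut-offs (30 kb and 300 kb) and the function name are assumptions.

```python
# Cut-offs are assumptions based on the approximate ranges given in the text.
HIGH_COPY_MAX_KB = 30.0
BAC_MAX_KB = 300.0


def choose_origin(construct_kb: float) -> str:
    """Suggest an origin of replication for a construct of the given size (kb)."""
    if construct_kb <= 0:
        raise ValueError("construct size must be positive")
    if construct_kb <= HIGH_COPY_MAX_KB:
        return "high-copy"
    if construct_kb <= BAC_MAX_KB:
        return "BAC single-copy"
    raise ValueError("construct exceeds the ~300 kb range described for BAC-based vectors")
```

Because both vector types carry the same activatable markers, switching from one to the other changes only this choice and not the selection scheme.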
  • cloning may be done by transformation followed by growth and selection of the transformed cell population without isolating and analyzing individual colonies grown from the transformed cell population prior to subsequent expansion.
  • predetermined nucleic acids and vectors may be designed to produce one or more selectable or detectable traits when a correct assembly reaction occurs.
  • aspects of the invention may provide algorithms (e.g., computer-implemented algorithms) for designing nucleic acid configurations with appropriate selection or detection techniques that may be chosen to isolate correctly assembled nucleic acids in one or more consecutive assembly cycles.
  • a nucleic acid that is correctly assembled from two smaller nucleic acids in a first assembly cycle may be used in a second assembly cycle.
  • correct assembly with yet a further nucleic acid and an appropriate vector may generate a longer assembled nucleic acid that can be isolated using appropriate selection or detection techniques.
  • the same selectable or detectable traits may be used in each assembly cycle.
  • the invention is not limited in this respect and different traits may be used in each cycle.
  • strategies may be developed for alternating traits that are selected for in consecutive assembly cycles. This may reduce the frequency of nucleic acids that are carried over, from one assembly cycle to the next, without being assembled with an additional nucleic acid in each cycle.
  • the correct insertion of at least one nucleic acid into a vector produces a selectable or detectable trait (e.g., by increasing or decreasing the expression of a marker encoded by the vector).
  • the insertion of each of two nucleic acids into the vector produces a selectable or detectable trait (e.g., by each nucleic acid increasing or decreasing the expression of a marker encoded by the vector).
  • each nucleic acid inserted into the vector produces a different selectable or detectable trait (e.g., by increasing or decreasing the expression of a different marker encoded by the vector).
  • each nucleic acid inserted into the vector may increase or decrease the expression of the same marker, resulting in different levels of the detectable or selectable trait depending on the number of nucleic acids that are inserted into the vector. Accordingly, in each assembly cycle, the targeted insertion of predetermined nucleic acids (e.g., two predetermined nucleic acids) into a vector may be selected for by selecting for appropriate combinations and/or levels of one or more vector-encoded traits.
  • one or more (e.g., one or both) of the predetermined nucleic acids being assembled may have been assembled in a prior assembly cycle from one or more smaller nucleic acids (e.g., from the assembly of two smaller nucleic acids in a prior cycle).
  • a non-functional segment of nucleic acid on a vector may be a promoter or a fragment thereof, which then becomes turned-on in the presence of a correctly assembled insert containing a remainder of sequences necessary to activate the marker.
  • correct assembly of the vector and the insert induces transcription of a gene or a fragment thereof encoded by a nucleic acid segment following the promoter sequence.
  • a portion of a coding sequence may be provided on the insert (e.g., as part of an activation sequence).
  • a promoter may be associated with at least a start codon (ATG) in the activation sequence on the insert tag.
  • a coding region that can be used to activate one or more (e.g., 2, 3, 4, etc.) activatable markers may be included in the activation sequence on an insert.
  • Such a coding region may be, for example, a signal peptide, a multimerization domain, a stabilization domain, etc., or any combination thereof.
  • a nonfunctional segment on a vector may represent a component or part of a functional unit, which must be supplemented by additional component (on an insert) to become functional.
  • a vector may encode one subunit of a factor which by itself is inactive, and an insert nucleic acid may encode another subunit, which together with the first subunit encoded by the vector can form a functional unit.
  • a vector may encode only a portion of a functional factor, and an insert may encode a remainder of the functional unit, such that the factor becomes functional only when the fragments are correctly assembled.
  • FIGs. 6 and 7 illustrate non-limiting embodiments of two assembly cycles of the invention.
  • the activation sequences are a promoter (P) and a terminator (T).
  • the activation sequences are both promoters (P and P').
  • other combinations of promoters and terminators may be used as activation sequences (e.g., two terminators).
  • other types of activation sequences may be used.
  • an activation sequence may be any suitable cis-acting sequence (e.g., a regulatory sequence such as a promoter, terminator, enhancer, ribosome binding or other cis-acting regulatory sequence — for example, a cis-acting protein binding sequence) that can regulate the expression levels of a marker gene (e.g., a gene that produces a selectable trait when it is either up-regulated or down-regulated).
  • an activation sequence may be any suitable trans-acting sequence (e.g., a sequence that encodes a regulatory peptide or other trans-acting regulatory factor) that can regulate the expression levels of a marker gene (e.g., a gene that produces a selectable trait when it is either up-regulated or down-regulated).
  • the expression levels of a marker gene in a host cell may be up-regulated or down-regulated by an activation sequence.
  • a marker gene is not expressed in the absence of an activation sequence, and expressed in the presence of the activation sequence. However, the expression level of a marker gene may increase from a lower level to a higher level in the presence of the activation sequence.
  • a marker gene is expressed in the absence of an activation sequence and silenced in the presence of the activation sequence. However, the expression level of a marker gene may decrease from a higher level to a lower level in the presence of the activation sequence.
  • activation sequences are short sequences (e.g., DNA sequences) necessary for the activation of non-functional marker genes (e.g., antibiotic resistance markers) present on the target vector.
  • an activation sequence may be a promoter, a terminator (e.g., a terminal amino-acid/stop-codon region necessary for expression of a marker gene), or other short sequence necessary for correct expression of a marker gene.
  • a regulatory sequence may be a silencer that reduces expression of a gene resulting in a selectable or detectable trait associated with the reduced gene expression.
  • a marker gene may be any gene that confers a detectable or selectable trait on a cell when its expression levels change (e.g., increase or decrease, depending on the marker gene).
  • an antibiotic resistance marker may confer a selectable trait (antibiotic resistance) when its expression level is up-regulated.
  • Other marker genes may be auxotrophic markers, or other selectable or detectable markers.
  • An example of a detectable marker is a fluorescent marker. Such markers are well known in the art.
  • marker genes are fluorescent reporter genes (e.g., GFP,
  • Inactive fluorescent reporters encoded on the target vector would be activated upon insertion of DNA molecule(s) containing activation sequences.
  • cells may be sorted via flow cytometry.
  • Cells containing the expressed fluorescent reporter genes can be isolated.
  • the isolated cells contain the correctly ligated DNA.
  • the activation and expression of a fluorescent reporter gene may be more rapid than the activation and expression of an antibiotic resistance marker. Accordingly, in some embodiments, the isolation of correct constructs using flow cytometry may be performed earlier (e.g., after a shorter cell recovery and growth time after transformation) than a selection involving activation of antibiotic resistance markers.
  • fragments I through IV are assembled in two cycles.
  • fragments I and II of i) are assembled into vector of ii) to generate I+II of iii)
  • fragments III and IV of i') are assembled into vector of ii') to generate III+IV of iii').
  • fragments I+II of iii) and III+IV of iii') are assembled into vector of iv) to generate fragment I+II+III+IV of v).
  • fragments I and II are provided in constructs comprising two flanking activation sequences (P and T), wherein each activation sequence is flanked by two different restriction enzyme recognition sites (restriction sites 1 and 2 flank P, and restriction sites 3 and 4 flank T).
  • one or both of these constructs may be provided in a first vector (e.g., a plasmid).
  • one or both of these constructs may be provided as a linear product of a multiplex nucleic acid assembly reaction.
  • the constructs may be amplified (e.g., by PCR or LCR).
  • the construct containing I is digested with restriction enzymes that cut at 1 and 3 to generate a linear product that contains I and one of the flanking activation sequences (P).
  • the construct containing II is digested with 2 and 4 to generate a product that contains II and the other flanking activation sequence (T).
  • the digested constructs are combined with a vector ii) that contains restriction sites 1 and 4 adjacent to marker genes A and B, respectively. In ii), site 1 is 5' of marker gene A, and site 4 is 3' of marker gene B. A and B are inactive in ii).
  • the vector of ii) is digested with restriction enzymes that recognize sites 1 and 4.
  • the digested nucleic acids of i) and ii) are ligated to generate a product shown in iii).
  • the free ends generated by digestion at site 1 flanking P and at site 1 upstream of A are compatible (e.g., cohesive), and activation sequence P is ligated upstream of marker A in the vector, thereby activating A.
  • the free ends generated by digestion at site 4 flanking T and at site 4 downstream from B are compatible (e.g., cohesive), and activation sequence T is ligated downstream of marker B in the vector, thereby activating B.
  • in a correct assembly, the free ends generated by restriction digestion at sites 3 and 2, flanking I and II respectively, are compatible (e.g., cohesive) and are ligated together to generate product I+II in a vector expressing A and B as shown in iii).
  • the ligated nucleic acids of i) and ii) are transformed into a suitable host cell preparation.
  • a correct assembly may be selected for by selecting for traits associated with activation of A and B in the host cells.
  • a construct containing III is digested with restriction enzymes that recognize sites 1 and 3
  • a construct containing IV is digested with restriction enzymes that recognize sites 2 and 4.
  • the restriction products of i') are ligated into a vector of ii') that has been digested with restriction enzymes recognizing 1 and 4.
  • the resulting product III+IV is shown ligated into the vector in iii').
  • the correct assembly of III and IV may be isolated by selecting for traits associated with A and B in suitable host cells after transformation of the ligation reaction products.
  • I+II are assembled with III+IV to generate I+II+III+IV.
  • the nucleic acids of iii) may be digested with restriction enzymes that recognize 1 and 3, generating a linear product that contains I+II and one of the flanking activation sequences (P).
  • the nucleic acids of iii') may be digested with restriction enzymes that recognize 2 and 4, generating a product that contains III+IV and the other flanking activation sequence (T).
  • the digested constructs are combined with a vector of iv) that has been digested with restriction enzymes recognizing sites 1 and 4 adjacent to inactive marker genes C and D, respectively.
  • the nucleic acids are ligated and transformed into a suitable host cell preparation.
  • a correct assembly of I+II+III+IV shown in v) may be selected for by selecting for traits associated with activation of C and D in the host cells.
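The two cycles described above for FIG. 6 can be traced with a small symbolic model. This is a simplified sketch, not the disclosed method: free ends are labeled by the site that produced them, the compatibility table encodes only the pairings named in the text (vector end 1 with insert end 1, insert end 3 with insert end 2, insert end 4 with vector end 4), and inverted insertions are ignored. All function names are hypothetical.

```python
from itertools import permutations

# Assumed end compatibilities, following the description of FIG. 6.
COMPATIBLE = {("1", "1"), ("3", "2"), ("4", "4")}


def digest_1_3(name):
    """Cut at sites 1 and 3: releases activation sequence P plus the fragment."""
    return {"left": "1", "body": ["P", name], "right": "3"}


def digest_2_4(name):
    """Cut at sites 2 and 4: releases the fragment plus activation sequence T."""
    return {"left": "2", "body": [name, "T"], "right": "4"}


def ligation_products(pieces):
    """Enumerate ordered chains of pieces that close a vector linearized at sites 1 and 4."""
    products = []
    for n in range(1, len(pieces) + 1):
        for chain in permutations(pieces, n):
            ends = ["1"] + [e for p in chain for e in (p["left"], p["right"])] + ["4"]
            junctions = zip(ends[0::2], ends[1::2])
            if all(j in COMPATIBLE for j in junctions):
                products.append([x for p in chain for x in p["body"]])
    return products


def assemble_cycle(left_name, right_name, markers):
    """Return (assembled name, active markers) for products that pass double selection."""
    survivors = []
    pieces = [digest_1_3(left_name), digest_2_4(right_name)]
    for body in ligation_products(pieces):
        active = set()
        if body[0] == "P":   # P ligated next to the first marker
            active.add(markers[0])
        if body[-1] == "T":  # T ligated next to the second marker
            active.add(markers[1])
        if active == set(markers):
            survivors.append(("+".join(body[1:-1]), active))
    return survivors


first = assemble_cycle("I", "II", ("A", "B"))[0][0]
second = assemble_cycle("III", "IV", ("A", "B"))[0][0]
final = assemble_cycle(first, second, ("C", "D"))
```

In this model a single insert cannot close the vector, because its second free end (3 or 2) is not compatible with the remaining vector end, so only the paired product activates both markers.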
  • fragments I through IV are assembled in two cycles as described above for FIG. 6.
  • flanking activation sequences in FIG. 7 are both promoters
  • the coding sequences of both sets of marker genes are oriented so that sites 1 and 4 are upstream of the marker genes.
  • correct insertion of the promoter containing fragments into sites 1 and 4 activates both sets of marker genes.
  • A working example of a first assembly cycle of this configuration is provided in FIG. 7B.
  • two approximately 900 bp fragments (I and II) are being combined to make a contiguous 1800 bp fragment.
  • sites 1 and 3 are BsaI sites and sites 2 and 4 are BsmBI sites (see FIG. 7). These restriction sites are not present in the sequence being assembled.
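The requirement that the assembled sequence contain no internal BsaI or BsmBI sites can be checked by scanning both strands for the recognition sequences (GGTCTC for BsaI, CGTCTC for BsmBI). The following is a minimal sketch; the function names are hypothetical, and only exact, unambiguous DNA matches are considered.

```python
SITES = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(sequence: str, recognition: str):
    """Return sorted (position, strand) hits for a recognition sequence on both strands."""
    seq = sequence.upper()
    motifs = {recognition: "+"}
    rc = reverse_complement(recognition)
    if rc != recognition:  # avoid double counting palindromic sites
        motifs[rc] = "-"
    hits = []
    for motif, strand in motifs.items():
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


def is_site_free(sequence: str) -> bool:
    """True if the sequence contains none of the recognition sequences in SITES."""
    return all(not find_sites(sequence, site) for site in SITES.values())
```

A target that fails this check could be redesigned (for example by synonymous codon changes) or assembled with a different enzyme pair.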
  • Restriction digestion reactions can be heat-inactivated and a portion of each digestion can be combined with linearized destination vector in a ligation reaction. E. coli cells can then be transformed with the ligation reaction and correct pairs can be selected in culture. As demonstrated in FIG. 7B, showing agarose gel electrophoresis of digested DNA before and after assembly and selection, two ~900 bp fragments can be assembled to generate a contiguous 1,800 bp fragment according to the methods of the present invention. It should be appreciated that after each cycle, an assembled insert (e.g., I+II,
  • activator sequences e.g., P and T in FIG. 6, P and P' in FIG. 7, or any other combination of activator sequences.
  • the activator sequences retain flanking restriction sites 1, 2, 3, and 4 in the same configuration (e.g., as illustrated in FIGs. 6 and 7). Accordingly, the product of each assembly cycle can be used in a further assembly cycle using the same strategy. For example, product v) in FIGs. 6 and 7 can be processed as described herein (e.g., cut with 1 and 3, or with 2 and 4) and ligated into a suitable vector along with an additional fragment having the appropriate compatible free ends for ligation.
  • a vector with different marker genes is used in each cycle.
  • vectors of ii) and iv) may be used in alternate cycles. Accordingly, an insert excised from v) may be cloned into a vector of ii) along with an additional fragment, and correctly assembled inserts may be selected for by selecting for activation of A and B. In the next cycle, a vector of iv) may be used, etc.
  • the selection combination used in each cycle specifically selects for the intended fragment assembly product and selects against the vector that was used in a prior assembly cycle.
  • aspects of the invention may be readily automated.
  • ligation assembly reactions may be performed using restriction digest mixtures that contain the fragments to be assembled and also the vector backbones from the prior assembly.
  • Selection for different markers (e.g., alternate markers) in each cycle reduces or prevents the vector backbones from a prior cycle from interfering with the assembly process (e.g., avoids a background of transformed cells containing vectors from a prior cycle from being amplified). Accordingly, size separation steps (e.g., to isolate a fragment from a vector) may be omitted from these assembly cycles.
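The effect of alternating vectors can be illustrated with a short schedule calculation. This sketch uses the pCK and pTS marker pairs named earlier in the document; the function names and the rule that a carried-over backbone survives only if it already has every marker selected in the current cycle are simplifying assumptions.

```python
# Activatable marker pairs for the two example vectors named in the text.
VECTORS = {
    "pCK": ("chloramphenicol", "kanamycin"),
    "pTS": ("tetracycline", "spectinomycin"),
}


def vector_for_cycle(cycle: int, order=("pCK", "pTS")) -> str:
    """Return the vector used in a given cycle (cycles are numbered from 1)."""
    return order[(cycle - 1) % len(order)]


def prior_backbone_survives(cycle: int, order=("pCK", "pTS")) -> bool:
    """True if a vector carried over from the previous cycle would pass
    the selection applied in the current cycle (simplified model)."""
    if cycle < 2:
        return False
    previous = set(VECTORS[vector_for_cycle(cycle - 1, order)])
    current = set(VECTORS[vector_for_cycle(cycle, order)])
    return current <= previous
```

With alternating vectors the carried-over backbone never passes selection, which is why the size separation step can be omitted; reusing one vector in every cycle would not give this protection.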
  • restriction sites 1-4 and the corresponding restriction enzymes may be chosen from any suitable restriction site/enzyme combinations. However, certain enzyme selections and configurations may be particularly useful.
  • restriction sites 1 and 4 are long recognition sites (e.g., between about 8 and about 50 nucleotides long) that are recognized by rare-cutting restriction enzymes.
  • Rare-cutting restriction enzymes may be meganucleases, modified meganucleases (e.g., engineered meganucleases that include a cleavage domain from a type IIS enzyme but retain the binding domain of a meganuclease, for example from a mutant meganuclease that binds and does not cleave a target sequence; see for example meganucleases described in US Serial Number 60/925,507, filed April 19, 2007, the disclosure of which is incorporated herein by reference), or other rare-cutting enzymes (e.g., NotI).
  • Restriction sites 1 and 4 may be Type II sites that are regenerated after ligation in each cycle.
  • Restriction sites 2 and 3 may be Type IIS sites that are not regenerated after ligation in each cycle. Sites 2 and 3 may be oriented to cause cleavage within the central region of each construct (as opposed to cleavage in the flanking regions). Restriction sites 2 and 3 may be different. However, the cleavage patterns (e.g., the type of overhang, 5' or 3', and the overhang length) of the Type IIS restriction enzymes that recognize 2 and 3 may be identical or similar. As a result, the nucleic acid constructs may be designed so that cleavage by Type IIS enzymes specific for sites 2 and 3 generates free ends that are compatible (e.g., cohesive or complementary) for a subsequent ligation reaction.
  • sites 2 and 3 may be located to cause Type IIS cleavage within a sequence region that is common to the fragments being assembled (e.g., in an overlapping sequence region of I and II, or of III and IV, or of II and III illustrated in FIGs. 6 and 7).
  • the common or overlapping regions are not duplicated after assembly, because the cleavage sites may be designed to cut at a location that results in a single copy of the overlapping or common regions being reassembled upon ligation.
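The point that the shared region appears only once after ligation can be shown with a short string-level sketch. It assumes both fragments are given as top-strand sequences ending and beginning with the same overlap (4 bases by default); the function name is hypothetical.

```python
def join_at_overlap(left: str, right: str, overlap: int = 4) -> str:
    """Join two fragments that share a terminal overlap, keeping a single copy of it."""
    left, right = left.upper(), right.upper()
    if overlap <= 0 or len(left) < overlap or len(right) < overlap:
        raise ValueError("overlap must be positive and no longer than either fragment")
    if left[-overlap:] != right[:overlap]:
        raise ValueError("fragments do not share the expected overlap")
    # The overlap is contributed once: all of `left`, then `right` without its first bases.
    return left + right[overlap:]
```

For example, joining ATGCCGTA and CGTAGGCT over the shared CGTA gives ATGCCGTAGGCT, which is 4 bases shorter than simple concatenation.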
  • the cleavage sites for 2 and 3 are within the sequence of the nucleic acid being assembled.
  • the recognition sites for 2 and 3 are outside the sequence of the nucleic acid being assembled.
  • recognition sites are not regenerated upon ligation (e.g., of I and II, III and IV, or II and III).
  • additional sites 2 and 3 that are at the opposite ends of the fragments are carried over into the newly assembled product and are available for cleavage in a subsequent cycle as illustrated in FIGs. 6 and 7.
  • each insert to be progressively added may represent a plurality of nucleic acid variants.
  • insert I may represent a plurality (e.g., a pool) of variants of I
  • insert II may represent a plurality (e.g., a pool) of variants of II, and so on.
  • variants may include naturally occurring variants (such as SNPs) and other mutations.
  • each insert may encode a module of a protein (polypeptide), each containing one or more variants.
  • the protein is a naturally occurring protein or variant thereof.
  • the protein is an engineered protein (i.e., artificial protein) comprising one or more modules.
  • a module may represent a functional module, e.g., a kinase domain, etc.
  • each insert may represent a gene element, such as regulatory regions (e.g., promoters), exons, untranslated regions (e.g., 5'-UTR and 3'-UTR etc.).
  • insert I may represent a library of promoters
  • insert II may represent a library of genes or fragments thereof having similar functions (such as a functional domain)
  • insert III may represent a further fragment, and so on.
  • each insert may represent a cluster of genes or gene elements.
  • the methods and compositions of the instant invention may be used to generate a library of a plurality of nucleic acid variants.
  • the method of the invention may be useful for generating a library of variants, each of which is a relatively long nucleic acid, e.g., 1, 5, 10, 15, 20, 30, 40, 50 kb or more.
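When each insert position is a pool of variants, the number of distinct full-length products is the product of the pool sizes. The sketch below illustrates this with hypothetical variant labels; it assumes every variant at one position can be combined with every variant at the next.

```python
from itertools import product
from math import prod


def library_size(pools) -> int:
    """Number of distinct assemblies when one variant is taken from each pool."""
    return prod(len(pool) for pool in pools)


def enumerate_library(pools):
    """List every combination, joined in assembly order."""
    return ["+".join(combo) for combo in product(*pools)]


# Hypothetical variant labels for two insert positions.
pools = [["I.a", "I.b"], ["II.a", "II.b", "II.c"]]
```

Here two variants of I and three variants of II give six assemblies; adding a third pool multiplies that number again, which is how a small set of inserts can yield a large library.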
  • restriction enzyme digestion and ligation may be performed in the same reaction tube.
  • Type IIS sites that are not regenerated after ligation can drive the reaction towards the correct assembly as described in more detail herein. This also may speed up an assembly reaction by avoiding separate digestion and ligation steps and by avoiding any purification, size separation, or other processing steps in between restriction enzyme digestion and ligation. This aspect also may be readily automated, avoiding additional sample manipulations associated with separate restriction digestion and ligation steps. It should be appreciated that in some embodiments, sites 1 and 3 may be cut by the same restriction enzyme (e.g., oriented such that site 1 is retained and site 3 is not retained after the subsequent ligation reaction).
  • sites 2 and 4 may be cut by the same restriction enzyme (e.g., oriented such that site 4 is retained and site 2 is not retained after the subsequent ligation reaction). It also should be appreciated that sites 1, 2, 3, and 4 all may be recognized by different enzymes. However, in some embodiments, sites 1, 2, 3, and 4 may be recognized by the same enzyme (e.g., the sites all have the same sequence) and differential digestion at positions 1 and 3 versus 2 and 4 may be achieved using specific protection or digestion techniques described herein (e.g., using oligonucleotides to protect from methylation).
  • a target sequence for assembly may be designed so that it does not include such sites.
  • FIG. 8 illustrates non-limiting embodiments showing details of activator sequence configurations such as those in FIGs. 6 and 7. It should be appreciated that the assembly strategy and configurations of sites, activation sequences, and markers illustrated in FIGs. 6, 7, and 8 may be generalized and used for any suitable combination of enzymes, enzyme recognition sites, activation sequences, marker genes, etc. as described herein.
  • activatable auxotrophic markers may include one or more of the following non-limiting examples of yeast alleles that may be used as auxotrophic markers: ade1-14, ade2-1, ade2-101, ade2-BglII, can1-100, his3delta200, his3delta1, his3-11,15, leu2delta1, leu2-3,112, lys2-801, lys2delta202, trp1delta1, trp1delta63, trp1-1, trp1-289, ura3-52, ura3-1, ade2delta::hisG, leu2delta0, lys2delta0, met15delta0, ura3delta0.
  • auxotrophic markers and other markers that may be used are known in the art (See, for example, Brachmann et al. (1998) "Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications.” Yeast Volume 14, pp 115-132)
  • activatable markers may confer resistance to one or more of the following non-limiting antibiotics: neomycin, ampicillin, hygromycin, gentamycin, bleomycin, phleomycin, kanamycin, geneticin (or G418), paromomycin, tetracycline, beta-lactams, vancomycin, erythromycin, chloramphenicol, novobiocin, cefotaxime, coumermycin A1 and spectinomycin.
  • promoters may be used as activation sequences: bacterial promoters (e.g., T7, tRNA, rrn promoters, etc.); yeast promoters (e.g., GAL1, GAL10, ADH1, etc.); insect promoters; mammalian promoters; and/or promoters from other species.
  • the same activation sequence may be used at both ends of an insert.
  • the same promoter may be used at the left and right ends of the assembled inserts described herein.
  • the orientation of the activation sequences may be such that they only work when integrated (e.g., ligated) into the correct site.
  • the promoters may be on opposite strands relative to the insert and only work to activate the adjacent marker if cloned into the correct end of the vector.
  • a vector may have an origin of replication, a selectable marker (e.g., an active marker) different from the activatable markers (e.g., inactive markers) used for assembly as described herein.
  • the vector also may include appropriate restriction sites adjacent to the activatable markers as described herein.
  • the vectors may be prokaryotic, eukaryotic (e.g., yeast, mammalian, insect) or viral. Different vectors may be adapted for different insert sizes as described herein (e.g., BAC, YAC, etc. for larger insert sizes) or different uses. However, different vectors may include the same activatable markers and/or appropriate restriction sites for iterative assembly as described herein.
  • any suitable technique may be used to digest the nucleic acids at the appropriate sites as described herein for iterative assembly.
  • any suitable technique may be used for connecting nucleic acids (e.g., chemical or enzymatic ligation — e.g., using a suitable ligase such as T4 ligase or other ligase, or in vivo recombination as described herein for concerted assembly).
  • a sequence analysis and design strategy of the invention may be incorporated in an assembly process outlined in FIG. 5.
  • FIG. 5 illustrates a method for assembling a nucleic acid in accordance with one embodiment of the invention.
  • sequence information may be the sequence of a predetermined target nucleic acid that is to be assembled.
  • the sequence may be received in the form of an order from a customer. The order may be received electronically or on a paper copy.
  • the sequence may be received as a nucleic acid sequence (e.g., DNA or RNA).
  • the sequence may be received as a protein sequence.
  • the sequence may be converted into a DNA sequence. For example, if the sequence obtained in act 500 is an RNA sequence, the Us may be replaced with Ts to obtain the corresponding DNA sequence.
  • if the sequence obtained in act 500 is a protein sequence, it may be converted into a DNA sequence using appropriate codons for the amino acids.
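The two conversions described above can be sketched as follows. The RNA case is a direct substitution of U with T. For the protein case, the codon table below is only an illustrative choice of one valid codon per amino acid, not a validated codon-usage table for any organism; the table and function names are assumptions.

```python
# One valid standard-code codon per amino acid; an illustrative choice only.
EXAMPLE_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def rna_to_dna(rna: str) -> str:
    """Replace each U with T to obtain the corresponding DNA sequence."""
    return rna.upper().replace("U", "T")


def back_translate(protein: str, codons=EXAMPLE_CODONS) -> str:
    """Convert a protein sequence to DNA using one codon per amino acid."""
    try:
        return "".join(codons[aa] for aa in protein.upper())
    except KeyError as exc:
        raise ValueError(f"no codon defined for residue {exc.args[0]!r}") from None
```

In practice the table would be replaced by one reflecting the codon bias of the intended expression host, and the result would then be screened against the other design factors.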
  • in selecting codons for each amino acid, consideration may be given to one or more of the following factors: i) using codons that correspond to the codon bias in the organism in which the target nucleic acid may be expressed, ii) avoiding excessively high or low GC or AT contents in the target nucleic acid (for example, above 60% or below 40%; e.g., greater than 65%, 70%, 75%, 80%, 85%, or 90%; or less than 35%, 30%, 25%, 20%, 15%, or 10%), iii) avoiding sequence features that may interfere with the assembly procedure (e.g., the presence of repeat sequences, high GC content or stem loop structures), and/or iv) avoiding recognition sequences for one or more restriction enzymes that may be used in an assembly procedure (e.g., restriction enzyme sites 1-4 illustrated in FIGs.
  • a DNA sequence determination may omit one or more steps relating to the analysis of the GC or AT content of the target nucleic acid sequence (e.g., the GC or AT content may be ignored in some embodiments) or one or more steps relating to the analysis of certain sequence features (e.g., sequence repeats, inverted repeats, etc.) that could interfere with an assembly reaction performed under standard conditions but may not interfere with an assembly reaction including one or more concerted assembly steps.
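Where the GC-content analysis is performed, it reduces to a simple fraction and a range test. The sketch below uses the 40% and 60% limits mentioned above as defaults; the function names are hypothetical and the limits can be changed to any of the other thresholds listed.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_out_of_range(seq: str, low: float = 0.40, high: float = 0.60) -> bool:
    """True if the GC fraction falls below `low` or above `high`."""
    fraction = gc_fraction(seq)
    return fraction < low or fraction > high
```

This computes a single value over the whole sequence; a windowed version would be needed to detect local GC-rich or AT-rich stretches.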
  • target or insert sequences may be designed or modified to remove one or more of the restriction enzyme sites that are used for the iterative assembly.
  • the sequence information may be analyzed to determine an assembly strategy. This may involve determining whether the target nucleic acid will be assembled as a single fragment or if several intermediate fragments will be assembled separately and then combined in one or more additional rounds of assembly to generate the target nucleic acid.
  • a sequence analysis may involve deciding which fragments will be prepared to be assembled in a first vector using a vector-encoded trait activation technique of the invention.
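One possible way to plan which intermediate fragments are combined in each round is to pair adjacent fragments repeatedly, as in the two-cycle example of FIG. 6. The sketch below is an assumption about how such a plan could be computed, not the disclosed algorithm; an unpaired fragment is simply carried into the next round.

```python
def plan_pairwise(fragments):
    """Return a list of rounds; each round is a list of (left, right) pairs to assemble."""
    rounds = []
    current = list(fragments)
    while len(current) > 1:
        pairs, next_round = [], []
        for i in range(0, len(current) - 1, 2):
            pairs.append((current[i], current[i + 1]))
            next_round.append(current[i] + "+" + current[i + 1])
        if len(current) % 2:  # odd fragment out waits for the next round
            next_round.append(current[-1])
        rounds.append(pairs)
        current = next_round
    return rounds
```

Four fragments need two rounds and eight need three, so the number of cycles grows with the base-2 logarithm of the fragment count under this pairing scheme.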
  • Nucleic acids being assembled may include one or more sequences that could act as activator sequences.
  • an assembly strategy may be designed to prevent these putative activator sequences (e.g., promoters, terminators, etc.) from being located on an assembly fragment at a location (e.g., at a 5' or 3' end) where they may activate a vector-encoded trait when the fragments are incorrectly assembled (e.g., inverted or cloned to the incorrect free end of a linearized vector, etc.).
  • putative activator sequences may be buried within the central regions (e.g., within about the middle 80%) of fragments that are being assembled.
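  • The placement rule above can be checked computationally. The following Python sketch tests whether every occurrence of a putative activator motif lies within the central portion of a fragment. The function names and the example motif (TTGACA, the E. coli -35 promoter consensus) are illustrative assumptions; the specification does not prescribe a particular motif or algorithm.

```python
def motif_positions(fragment, motif):
    """Return the start index of every (possibly overlapping) motif occurrence."""
    positions = []
    start = fragment.find(motif)
    while start != -1:
        positions.append(start)
        start = fragment.find(motif, start + 1)
    return positions


def is_buried(fragment, motif, central_fraction=0.8):
    """True if every occurrence of motif lies entirely within the central
    region of the fragment (the middle 80% by default). A fragment with no
    occurrence of the motif is trivially considered safe."""
    margin = len(fragment) * (1 - central_fraction) / 2
    low, high = margin, len(fragment) - margin
    return all(
        low <= pos and pos + len(motif) <= high
        for pos in motif_positions(fragment, motif)
    )


central = "A" * 45 + "TTGACA" + "A" * 49   # motif near the middle
terminal = "TTGACA" + "A" * 94             # motif at the 5' end
```

  • A fragment that fails this check could be redesigned by moving the fragment boundary so that the motif falls in the interior.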
  • a sequence analysis also may be important for choosing the restriction enzymes, activator sequences, or vector-encoded traits that will be used.
  • one or more enzymes chosen for assembly may be ones that are not present (or only present in small numbers) in the target sequence.
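  • One way to apply this criterion is to count the recognition sites of each candidate enzyme on both strands of the target sequence. The sketch below is an assumption-laden illustration: the three type IIS recognition sequences are standard, but the candidate list, the function names, and the threshold are hypothetical.

```python
# Recognition sequences of three common type IIS enzymes.
TYPE_IIS_SITES = {"BbsI": "GAAGAC", "BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_sites(seq, site):
    """Count occurrences of a recognition sequence on either strand."""
    patterns = {site, reverse_complement(site)}  # a set, so palindromes count once
    return sum(
        1 for i in range(len(seq)) for p in patterns if seq.startswith(p, i)
    )


def usable_enzymes(target, enzymes=TYPE_IIS_SITES, max_sites=0):
    """Names of enzymes with at most max_sites recognition sites in target."""
    return sorted(
        name for name, site in enzymes.items()
        if count_sites(target, site) <= max_sites
    )


# Toy target with one BbsI site (top strand) and one BsaI site (bottom strand).
target = "ATGGAAGACCTTGAGACCAA"
candidates = usable_enzymes(target)
```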
  • Activator sequences and/or vector-encoded traits may be chosen so that they do not interfere with one or more functions (e.g., gene encoded functions) on the target nucleic acid.
  • ampicillin resistance may be avoided as an activatable marker if the target nucleic acid being assembled encodes beta lactamase or other enzyme that protects from (e.g., degrades or modifies) ampicillin.
  • input nucleic acids e.g., oligonucleotides
  • the sizes and numbers of the input nucleic acids may be based in part on the type of assembly reaction (e.g., the type of polymerase-based assembly, ligase-based assembly, chemical assembly, or combination thereof) that is being used for each fragment.
  • the input nucleic acids also may be designed to avoid 5' and/or 3' regions that may cross-react incorrectly and be assembled to produce undesired nucleic acid fragments. Other structural and/or sequence factors also may be considered when designing the input nucleic acids.
  • some of the input nucleic acids may be designed to incorporate one or more specific sequences (e.g., primer binding sequences, restriction enzyme sites, etc.) at one or both ends of the assembled nucleic acid fragment.
  • the input nucleic acids are obtained. These may be synthetic oligonucleotides that are synthesized on-site or obtained from a different site (e.g., from a commercial supplier). In some embodiments, one or more input nucleic acids may be amplification products (e.g., PCR products), restriction fragments, or other suitable nucleic acid molecules. Synthetic oligonucleotides may be synthesized using any appropriate technique as described in more detail herein. It should be appreciated that synthetic oligonucleotides often have sequence errors. Accordingly, oligonucleotide preparations may be selected or screened to remove error-containing molecules as described in more detail herein.
  • an assembly reaction may be performed for each nucleic acid fragment.
  • the input nucleic acids may be assembled using any appropriate assembly technique (e.g., a polymerase-based assembly, a ligase-based assembly, a chemical assembly, or any other multiplex nucleic acid assembly technique, or any combination thereof).
  • An assembly reaction may result in the assembly of a number of different nucleic acid products in addition to the predetermined nucleic acid fragment. Accordingly, in some embodiments, an assembly reaction may be processed to remove incorrectly assembled nucleic acids (e.g., by size fractionation) and/or to enrich correctly assembled nucleic acids (e.g., by amplification, optionally followed by size fractionation).
  • correctly assembled nucleic acids may be amplified (e.g., in a PCR reaction) using primers that bind to the ends of the predetermined nucleic acid fragment. It should be appreciated that act 530 may be repeated one or more times. For example, in a first round of assembly a first plurality of input nucleic acids (e.g., oligonucleotides) may be assembled to generate a first nucleic acid fragment. In a second round of assembly, the first nucleic acid fragment may be combined with one or more additional nucleic acid fragments and used as starting material for the assembly of a larger nucleic acid fragment.
  • this larger fragment may be combined with yet further nucleic acids and used as starting material for the assembly of yet a larger nucleic acid.
  • This procedure may be repeated as many times as needed for the synthesis of a target nucleic acid. Accordingly, progressively larger nucleic acids may be assembled.
  • nucleic acids of different sizes may be combined.
  • the nucleic acids being combined may have been previously assembled in a multiplex assembly reaction. However, at each stage, one or more nucleic acids being combined may have been obtained from different sources (e.g., PCR amplification of genomic DNA or cDNA, restriction digestion of a plasmid or genomic DNA, or any other suitable source).
  • One or more cycles of assembly may be performed using a vector-encoded trait-activation technique described herein.
  • nucleic acids generated in each cycle of assembly may contain sequence errors if they incorporated one or more input nucleic acids with sequence error(s). Accordingly, a fidelity optimization procedure may be performed after a cycle of assembly in order to remove or correct sequence errors. It should be appreciated that fidelity optimization may be performed after each assembly reaction when several consecutive cycles of assembly are performed. However, in certain embodiments fidelity optimization may be performed only after a subset (e.g., 2 or more) of consecutive assembly reactions are complete. In some embodiments, no fidelity optimization is performed. Accordingly, act 540 is an optional fidelity optimization procedure.
  • Act 540 may be used in some embodiments to remove nucleic acid fragments that seem to be correctly assembled (e.g., based on their size or restriction enzyme digestion pattern) but that may have incorporated input nucleic acids containing sequence errors as described herein. For example, since synthetic oligonucleotides may contain incorrect sequences due to errors introduced during oligonucleotide synthesis, it may be useful to remove nucleic acid fragments that have incorporated one or more error-containing oligonucleotides during assembly. In some embodiments, one or more assembled nucleic acid fragments may be sequenced to determine whether they contain the predetermined sequence or not. This procedure allows fragments with the correct sequence to be identified.
  • error-containing nucleic acids may be double-stranded homoduplexes having the error on both strands (i.e., incorrect complementary nucleotide(s), deletion(s), or addition(s) on both strands), because the assembly procedure may involve one or more rounds of polymerase extension (e.g., during assembly or after assembly to amplify the assembled product) during which an input nucleic acid containing an error may serve as a template, thereby producing a complementary strand with the complementary error.
  • a preparation of double-stranded nucleic acid fragments may be suspected to contain a mixture of nucleic acids that have the correct sequence and nucleic acids that incorporated one or more sequence errors during assembly.
  • sequence errors may be removed using a technique that involves denaturing and reannealing the double-stranded nucleic acids.
  • single strands of nucleic acids that contain complementary errors may be unlikely to reanneal together if nucleic acids containing each individual error are present in the nucleic acid preparation at a lower frequency than nucleic acids having the correct sequence at the same position. Rather, error-containing single strands may reanneal with a complementary strand that contains no errors or that contains one or more different errors.
  • error-containing strands may end up in the form of heteroduplex molecules in the reannealed reaction product.
  • Nucleic acid strands that are error-free may reanneal with error-containing strands or with other error-free strands.
  • Reannealed error-free strands form homoduplexes in the reannealed sample.
  • Any suitable method for removing heteroduplex molecules may be used, including chromatography, electrophoresis, selective binding of heteroduplex molecules, etc.
  • mismatch binding proteins that selectively (e.g., specifically) bind to heteroduplex nucleic acid molecules may be used.
  • One example includes using MutS, a MutS homolog, or a combination thereof to bind to heteroduplex molecules.
  • the MutS protein which appears to function as a homodimer, serves as a mismatch recognition factor.
  • MSH MutS Homolog
  • the MSH2-MSH6 complex (also known as MutSα) recognizes base mismatches and single nucleotide insertion/deletion loops
  • the MSH2-MSH3 complex (also known as MutSβ) recognizes insertions/deletions of up to 12-16 nucleotides, although they exert substantially redundant functions.
  • a mismatch binding protein may be obtained from recombinant or natural sources.
  • a mismatch binding protein may be heat-stable.
  • a thermostable mismatch binding protein from a thermophilic organism may be used.
  • thermostable DNA mismatch binding proteins include, but are not limited to: Tth MutS (from Thermus thermophilus); Taq MutS (from Thermus aquaticus); Apy MutS (from Aquifex pyrophilus); Tma MutS (from Thermotoga maritima); any other suitable MutS; or any combination of two or more thereof.
  • heteroduplex molecules bound to one or more MutS proteins may be removed from a sample using any suitable technique (binding to a column, a filter, a nitrocellulose filter, etc., or any combination thereof). It should be appreciated that this procedure may not be 100% efficient. Some errors may remain for at least one of the following reasons. Depending on the reaction conditions, not all of the double-stranded error-containing nucleic acids may be denatured. In addition, some of the denatured error-containing strands may reanneal with complementary error-containing strands to form an error-containing homoduplex. Also, the MutS/heteroduplex interaction and the MutS/heteroduplex removal procedures may not be 100% efficient.
  • the fidelity optimization act 540 may be repeated one or more times after each assembly reaction. For example, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more cycles of fidelity optimization may be performed after each assembly reaction.
  • the nucleic acid is amplified after each fidelity optimization procedure. It should be appreciated that each cycle of fidelity optimization will remove additional error-containing nucleic acid molecules. However, the proportion of correct sequences is expected to reach a saturation level after a few cycles of this procedure.
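  • The diminishing benefit of repeated cycles can be illustrated with a deliberately simplified model, which is an assumption made here for illustration and is not part of the specification. The model assumes complete denaturation, random reannealing, errors rare enough that error-containing strands always end up in heteroduplexes, removal of a fixed fraction of heteroduplexes per cycle, and error-free amplification between cycles. Under these assumptions, if c is the fraction of correct strands and e is the removal efficiency, correct homoduplexes form with probability c², and all other duplexes are heteroduplexes.

```python
def correct_fraction_after_cycle(c, removal_efficiency):
    """Fraction of correct strands after one denature/reanneal/removal cycle.

    c                   fraction of correct strands before the cycle (c > 0)
    removal_efficiency  fraction of heteroduplexes removed (0..1)
    Assumes random reannealing and that every error-containing strand ends up
    in a heteroduplex (an illustrative simplification).
    """
    e = removal_efficiency
    # Correct strands retained: all strands in correct homoduplexes plus the
    # correct partner strand in heteroduplexes that escaped removal.
    kept_correct = c * c + (1 - e) * c * (1 - c)
    # All strands retained: correct homoduplexes plus surviving heteroduplexes.
    kept_total = c * c + (1 - e) * (1 - c * c)
    return kept_correct / kept_total


def run_cycles(c0, removal_efficiency, cycles):
    """Correct-strand fraction before the first cycle and after each cycle."""
    history = [c0]
    for _ in range(cycles):
        history.append(
            correct_fraction_after_cycle(history[-1], removal_efficiency)
        )
    return history


# Hypothetical starting point: half of the strands correct, 80% removal.
history = run_cycles(0.5, 0.8, 5)
gains = [later - earlier for earlier, later in zip(history, history[1:])]
```

  • In this model the gain per cycle shrinks as the preparation becomes enriched, so most of the benefit is obtained in the first few cycles. The model omits incomplete denaturation, error homoduplex formation, and errors introduced during amplification, each of which would lower the achievable proportion of correct sequences.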
  • the size of an assembled nucleic acid that is fidelity optimized may be determined by the expected number of sequence errors that are suspected to be incorporated into the nucleic acid during assembly.
  • an assembled nucleic acid product should include error-free nucleic acids prior to fidelity optimization in order to be able to enrich for the error-free nucleic acids. Accordingly, error screening (e.g., using MutS or a MutS homolog) should be performed on shorter nucleic acid fragments when input nucleic acids have higher error rates.
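  • The relationship between error rate and fragment length can be made concrete with a simple calculation. Assuming, for illustration only, that errors occur independently at a uniform per-base rate p, the expected fraction of error-free molecules of length L is (1 - p)^L. The error rate used below is a hypothetical value and the function names are not from the specification.

```python
import math


def fraction_error_free(length_nt, per_base_error):
    """Expected fraction of error-free molecules, assuming independent errors
    at a uniform per-base rate (an illustrative simplification)."""
    return (1.0 - per_base_error) ** length_nt


def max_length_for_fraction(per_base_error, min_fraction):
    """Longest fragment (in nucleotides) for which at least min_fraction of
    the molecules are expected to be error-free."""
    return math.floor(math.log(min_fraction) / math.log(1.0 - per_base_error))


# Hypothetical error rate of 1 in 500 bases.
p = 0.002
short_fragment = fraction_error_free(200, p)
long_fragment = fraction_error_free(800, p)
longest_half_correct = max_length_for_fraction(p, 0.5)
```

  • Under these assumptions, doubling the error rate roughly halves the fragment length at which a given fraction of molecules is still error-free, which is consistent with screening shorter fragments when input error rates are higher.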
  • one or more nucleic acid fragments of between about 200 and about 800 nucleotides are assembled prior to fidelity optimization. After assembly, the one or more fragments may be exposed to one or more rounds of fidelity optimization as described herein. In some embodiments, several assembled fragments may be ligated together (e.g., to produce a larger nucleic acid fragment of between about 1,000 and about 5,000 bases in length, or larger), and optionally cloned into a vector, prior to fidelity optimization as described herein. At act 550, an output nucleic acid is obtained.
  • an output nucleic acid may be cloned with one or more other nucleic acids (e.g., other output nucleic acids) for subsequent applications.
  • Subsequent applications may include one or more research, diagnostic, medical, clinical, industrial, therapeutic, environmental, agricultural, or other uses.
  • each nucleic acid assembly may involve a combination of one or more extension, ligation, and/or cloning procedures.
  • a target nucleic acid may be assembled entirely in vitro using multiplex extension reactions, ligation reactions, or a combination thereof.
  • the resulting target nucleic acid product then may be transformed into a host cell (e.g., after insertion into a vector) for subsequent growth and amplification.
  • a target nucleic acid may be assembled from a plurality of intermediate nucleic acids (e.g., shorter nucleic acids that will be combined to form the final target nucleic acid product) that have been inserted into vectors and amplified in vivo in a host cell.
  • a target nucleic acid assembly may involve preparing a first plurality of intermediate nucleic acids (e.g., using an in vitro multiplex assembly reaction for each intermediate nucleic acid), cloning each of the first plurality of intermediate nucleic acids into a vector for amplification in a host cell, isolating each of the first plurality of intermediate nucleic acids after amplification in the host cell, and assembling the first plurality of intermediate nucleic acids (e.g., via ligation) to obtain the target nucleic acid.
  • This final assembly step may include cloning into an appropriate vector so that the target nucleic acid can be grown and amplified in an appropriate host cell.
  • assembly of a target nucleic acid may involve several cycles of intermediate cloning.
  • a first plurality of intermediates may be cloned into an appropriate vector and amplified in a host cell.
  • Subsets of the first plurality of intermediates e.g., pairs of nucleic acids, or groups of 3, 4, 5, 6, 7, 8, 9, 10 or more intermediate nucleic acids from the first plurality
  • This second plurality of intermediates also may be cloned into an appropriate vector and amplified in a host cell.
  • This second plurality of intermediates may be assembled directly to form the final target nucleic acid.
  • this second plurality of intermediates may be cycled through one or more additional intermediate assembly procedures (e.g., forming third, fourth, fifth, sixth, or more pluralities of progressively longer intermediates) before a final nucleic acid is assembled.
  • Each of the first plurality of intermediates may be generated by ligation or extension (e.g., in an in vitro multiplex nucleic acid assembly reaction).
  • the decision to further assemble the first plurality of intermediates using one or more cycles of cloning and amplification in host cells may be based on the properties of the intermediates (e.g., predicted or actual difficulties in further assembling them using only in vitro reactions). It should be appreciated that overall assembly time may be reduced by avoiding intermediate cloning steps that involve cell growth. Accordingly, in some embodiments, in vitro assembly techniques alone are used to generate a final nucleic acid product that subsequently may be cloned and propagated in a host cell. However, in some embodiments, one or more intermediates that are difficult to assemble correctly in an in vitro multiplex assembly reaction may be more readily assembled and amplified by cloning into a vector and transforming into a host cell.
  • nucleic acid size also may determine whether vector cloning and in vivo amplification are used for further assembly.
  • nucleic acids that are longer than about 1.5 kb may be further assembled using vector cloning and in vivo amplification.
  • nucleic acids may be predicted to be difficult to assemble using in vitro multiplex assembly (e.g., due to the presence of one or more sequence features predicted to interfere with in vitro multiplex assembly).
  • nucleic acids may be experimentally determined to be difficult to assemble correctly using in vitro multiplex assembly (e.g., a correct final product is not generated).
  • one or more assembly steps involving cloning and host cell transformation may be used to obtain a correct product of interest.
  • an assembly strategy may be designed to provide an integrated overlapping enzyme system (also known as ION) that provides one or more intermediate cloning and host cell transformation cycles that may be combined with in vitro multiplex assembly steps.
  • this assembly is hierarchical.
  • a first plurality of first intermediates may be generated by any suitable method (e.g., ligation, extension, etc., or a combination thereof).
  • Each first intermediate may be cloned into a first vector and amplified in a host cell preparation. These first intermediates then may be grouped together into subsets and the intermediates in each subset may be assembled and cloned into a second vector in a second cloning step.
  • the intermediates in each subset correspond to adjacent sequences in the target nucleic acid.
  • This second cloning step generates a second plurality of intermediates with a smaller number of larger intermediates than the first plurality.
  • the ratio of the numbers of intermediates in the first and second pluralities is related to the number of first intermediates that are cloned together in each subset during the second cloning step.
  • the number of intermediates in the second plurality will be 1/N the number of intermediates in the first plurality.
  • different subsets may contain different numbers of first intermediates that are cloned together during the second cloning step.
  • This cycle may be repeated one or more times (e.g., subsets of the second plurality of intermediates may be assembled in a third cloning step to generate a third plurality of intermediates, etc.) until a final single product is generated.
  • For example, FIG. 9 illustrates a non-limiting embodiment of an integrated overlapping enzyme cloning strategy where nine first intermediates each approximately 0.5 kb in length are assembled (e.g., using a multiplex ligase or polymerase assembly), cloned into vectors, and transformed into host cells. Subsets of three first intermediates are then cloned together in a second cloning step to generate three second intermediates each approximately 1.5 kb in length. In a third cloning step, the three second intermediates are cloned together to generate a full length target nucleic acid approximately 4.5 kb in length.
  • intermediates that are used, the number of intermediates that are cloned together in each cycle, the number of cycles, and the length of the final product may vary, as the invention is not limited in this respect.
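  • The arithmetic of such a hierarchical strategy can be sketched as follows. The function name and the assumption that all first intermediates have the same length (with no overlap lost at junctions) are illustrative simplifications and are not part of the specification.

```python
import math


def plan_hierarchy(n_first, first_len_bp, group_size):
    """Return a list of (number_of_intermediates, largest_length_bp) per stage
    when group_size adjacent intermediates are combined in each cloning step.
    Assumes equal-length first intermediates (an illustrative simplification)."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    total_len = n_first * first_len_bp
    stages = [(n_first, first_len_bp)]
    count, length = n_first, first_len_bp
    while count > 1:
        count = math.ceil(count / group_size)            # fewer intermediates
        length = min(length * group_size, total_len)     # each one larger
        stages.append((count, length))
    return stages


# Nine 0.5 kb first intermediates combined three at a time.
stages = plan_hierarchy(9, 500, 3)
```

  • For nine 0.5 kb first intermediates combined three at a time, this yields three 1.5 kb second intermediates and then a single 4.5 kb product after two cloning steps beyond the first.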
  • intermediates of about 1.5 kb in length are generated (e.g., in a polymerase-based in vitro multiplex assembly) and further assembled by cloning and host cell transformation.
  • fidelity optimization e.g., by error removal using a mismatch recognition protein, for example, MutS
  • fidelity optimization of the first intermediates may be performed (e.g., before or after they are cloned into the first vectors).
  • the cloning vectors that are used at each stage may be identical. However, different vectors may be used for different cloning reactions. For example, each cloning reaction may use a different vector. The different vectors may have different selectable markers. The different vectors may have different copy numbers. The different vectors may be adapted for inserts of different lengths. For example, vectors that are more suited for large inserts may be used at later stages in an assembly. In some embodiments, two different vectors (e.g., with different selectable markers) may be used and alternated in sequential cloning steps.
  • One or more assembly steps may be automated (e.g., using a robotic handler or a microfluidic device). Automation may be facilitated by avoiding fragment isolation (e.g., based on electrophoretic size separation) during one or more cloning steps associated with any stage of assembly described herein.
  • two or more first fragments e.g., different first fragments
  • the reaction is driven towards fragment assembly in the second vector because this integration is not reversed by the restriction enzyme.
  • Selection for fragment integration into the second vector may be performed by using different selectable markers on each of the two vectors (e.g., ampicillin resistance on the first vector and chloramphenicol resistance on the second vector). After simultaneous digestion and ligation, the reaction mixture may be transformed into a host cell that is then exposed to an appropriate selection.
  • the second vector may be provided in a linear form with incompatible free ends to avoid vector re-ligation that would generate a background of empty vectors having the second selectable marker.
  • type IIS restriction enzymes may be used to generate appropriate insert fragments from the first vectors.
  • the type IIS sites may be located on a first vector on both sides of a fragment being excised. The type IIS sites may be oriented such that the excised fragment does not contain the type IIS sites. As a result, the type IIS sites are not present in the second vector after fragment integration.
  • the backbone of the second vector may be designed, selected, or modified to avoid containing any of the type IIS restriction sites that are used to excise the first fragments from the first vector.
  • Each of the first fragments in the first vectors may be flanked by the same type IIS restriction site to allow excision of all of the fragments using the same enzyme.
  • different type IIS sites and enzymes may be used to excise fragments from different first vectors. However, they are preferably selected to generate appropriate compatible ends (e.g., complementary overhangs) so that the excised fragments can be ligated together without requiring any further processing.
  • three first vectors contain different fragments (a, b, and c) flanked by BbsI sites.
  • the first vectors all encode ampicillin resistance.
  • the first vectors are incubated in a single reaction along with BbsI, a ligase (e.g., T4 ligase), and a second vector that encodes chloramphenicol resistance.
  • the second vector may be linearized to generate free ends, each one of which is compatible with a free end of one of the first fragments.
  • a correctly ligated vector containing fragments a, b, and c in the correct order may be selected for using chloramphenicol.
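  • The excision and ordered ligation described in this example can be simulated in a few lines. The sketch below relies on the known cleavage geometry of BbsI (recognition sequence GAAGAC, cutting 2 nucleotides downstream on one strand and 6 on the other to leave a 4-nucleotide 5' overhang). The toy sequences, the overhangs, and the function names are hypothetical and chosen only for illustration.

```python
SITE = "GAAGAC"      # BbsI recognition sequence (top strand, cuts to its right)
SITE_RC = "GTCTTC"   # same site on the bottom strand (cuts to its left)


def excise(first_vector):
    """Simulate BbsI excision of an insert flanked by inward-facing sites.

    Returns (left_overhang, body, right_overhang), all written in top-strand
    sense. body includes both 4-nt overhang regions but neither BbsI site."""
    i = first_vector.find(SITE)
    j = first_vector.find(SITE_RC, i + len(SITE))
    if i == -1 or j == -1:
        raise ValueError("insert is not flanked by inward-facing BbsI sites")
    left = first_vector[i + 8 : i + 12]    # top-strand 5' overhang
    right = first_vector[j - 6 : j - 2]    # bottom-strand 5' overhang region
    body = first_vector[i + 8 : j - 2]
    return left, body, right


def order_and_join(fragments, vector_left, vector_right):
    """Chain fragments by matching overhangs, starting from the vector's left
    end. Assumes unique overhangs and distinct vector ends; raises KeyError if
    no fragment fits the current free end."""
    by_left = {left: name for name, (left, _, _) in fragments.items()}
    order, product, current = [], vector_left, vector_left
    while current != vector_right:
        name = by_left.pop(current)
        _, body, right = fragments[name]
        product += body[4:]                # shared overhang is counted once
        order.append(name)
        current = right
    if len(order) != len(fragments):
        raise ValueError("not every fragment was incorporated")
    return order, product


def flank(insert):
    """Build a toy first-vector region: site, 2-nt spacer, insert, mirror."""
    return "CC" + SITE + "TT" + insert + "AA" + SITE_RC + "CC"


fragments = {
    "c": excise(flank("TTCA" + "GGGCCC" + "GCTT")),
    "a": excise(flank("AATG" + "CCCCCC" + "GGAG")),
    "b": excise(flank("GGAG" + "ATATAT" + "TTCA")),
}
order, product = order_and_join(fragments, vector_left="AATG", vector_right="GCTT")
```

  • Because the recognition sites lie outside the excised fragments, the joined product contains no BbsI site and is not re-cut, which is why the concerted digestion and ligation reaction is driven towards assembly in the second vector.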
  • the method may be used with different restriction sites and enzymes, different ligases, different vectors with different selectable markers, and different numbers of inserts (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-20, or more), that may be assembled in a single concerted reaction according to the invention.
  • This process may be repeated through several cycles by using vectors encoding different selectable markers at each cycle.
  • a pair of vectors encoding different selectable markers may be used. Each vector is used as the second (receiving) vector in alternate cycles.
  • a quality control procedure may be performed at one or more steps in a multi-stage assembly of the invention.
  • an ION assembly may involve a quality control at one or more intermediate stages.
  • quality control may be performed at each intermediate stage.
  • a quality control procedure may include one or more techniques designed to distinguish incorrectly assembled intermediates from correctly assembled intermediates.
  • a quality control procedure may include sequencing, amplification (e.g., by PCR, LCR, etc.), restriction enzyme digestion, size analysis (e.g., using electrophoresis, mass spectrometry, etc.), any other suitable quality control technique, or any combination of two or more thereof.
  • One advantage of real-time quality control during a multi-stage assembly is the early identification of one or more incorrectly assembled intermediates before the final product is generated and analyzed.
  • An incorrectly assembled intermediate can be re-synthesized or re-assembled in a correct format and then re-introduced into an assembly process at an appropriate stage to be incorporated into a final nucleic acid product.
  • an incorrect assembly may be indicative of the presence, in the intermediate nucleic acid, of certain sequences that are difficult to assemble. Certain sequences may be difficult to assemble because they contain sequences that are unstable (e.g., because they are toxic, they contain certain direct or inverted sequence repeats, etc.).
  • one or more alternative assembly techniques may be used to generate an intermediate nucleic acid that was incorrectly assembled using a first assembly technique.
  • a different vector and/or a different host organism may be used.
  • Different assembly methods e.g., extension, ligation, or a combination thereof
  • different starting nucleic acids e.g., different oligonucleotides, etc.
  • two or more smaller fragments of an intermediate that was incorrectly assembled may be prepared.
  • a correctly assembled nucleic acid e.g., a correctly assembled intermediate
  • a correctly assembled nucleic acid may be obtained without using alternative assembly techniques, but instead by screening a larger number of potential constructs (e.g., clones) to identify a correct one.
  • a plurality of nucleic acid fragments may be assembled in a single concerted procedure wherein the plurality of fragments is mixed together under conditions that promote covalent assembly of the fragments to generate a specific longer nucleic acid.
  • concerted assembly techniques may be used in combination with iterative assembly techniques described herein (e.g., at different stages of an assembly process; or more than two inserts, for example 3, 4, 5, 6, 7, 8, 9, 10, or more, may be added at each step of an iterative assembly described herein, wherein only the outer inserts have the activation sequences).
  • a plurality of nucleic acid fragments may be covalently assembled in vivo in a host cell.
  • a plurality of nucleic acid fragments may be mixed together without ligase and transformed into a host cell where they are covalently joined together to produce a longer nucleic acid (e.g., containing the n different nucleic acid fragments covalently linked together).
  • a ligase and/or recombinase may be used in some embodiments (e.g., added to a plurality of nucleic acid fragments prior to a host cell transformation).
  • nucleic acid fragments may be assembled (e.g., in a concerted in vivo assembly without using ligase).
  • any number of nucleic acids (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, etc.) may be assembled using concerted assembly techniques.
  • Each nucleic acid fragment being assembled may be between about 100 nucleotides long and about 1,000 nucleotides long (e.g., about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900). However, longer (e.g., about 2,500 or more nucleotides long, about 5,000 or more nucleotides long, about 7,500 or more nucleotides long, about 10,000 or more nucleotides long, etc.) or shorter nucleic acid fragments may be assembled using a concerted assembly technique (e.g., shotgun assembly into a plasmid vector).
  • each nucleic acid fragment may be independent of the size of other nucleic acid fragments added to a concerted assembly. However, in some embodiments, each nucleic acid fragment may be approximately the same size (e.g., between about 400 nucleotides long and about 800 nucleotides long). It should be appreciated that the length of a double-stranded DNA fragment may be indicated by the number of base pairs. As used herein, a nucleic acid fragment referred to as "x" nucleotides long corresponds to "x" base pairs in length when used in the context of a double-stranded DNA fragment.
  • one or more nucleic acids being assembled in a concerted reaction may be codon-optimized and/or non-naturally occurring. In some embodiments, all of the nucleic acids being assembled in a concerted reaction are codon-optimized and/or non-naturally occurring.
  • nucleic acid fragments being assembled are designed to have overlapping complementary sequences. In some embodiments, the nucleic acid fragments are double-stranded DNA fragments with 3' and/or 5' single-stranded overhangs. These overhangs may be cohesive ends that can anneal to complementary cohesive ends on different DNA fragments.
  • the presence of complementary sequences (and particularly complementary cohesive ends) on two DNA fragments promotes their covalent assembly in vivo.
  • a plurality of DNA fragments with different overlapping complementary single-stranded cohesive ends are assembled and their order in the assembled nucleic acid product is determined by the identity of the cohesive ends on each fragment.
  • the nucleic acid fragments may be designed so that a first nucleic acid has a first cohesive end that is complementary to a first cohesive end of the vector and a second cohesive end that is complementary to a first cohesive end of a second nucleic acid.
  • the second cohesive end of the second nucleic acid may be complementary to a first cohesive end of a third nucleic acid.
  • the second cohesive end of the third nucleic acid may be complementary to a first cohesive end of a fourth nucleic acid. And so on through to the final nucleic acid that has a first cohesive end that may be complementary to a second cohesive end on the penultimate nucleic acid.
  • the second cohesive end of the final nucleic acid may be complementary to a second cohesive end of the vector.
  • this technique may be used to generate a vector containing nucleic acid fragments assembled in a predetermined linear order (e.g., first, second, third, fourth, ..., final).
  • the overlapping complementary regions between adjacent nucleic acid fragments are designed (or selected) to be sufficiently different to promote (e.g., thermodynamically favor) assembly of a unique alignment of nucleic acid fragments (e.g., a selected or designed alignment of fragments). It should be appreciated that overlapping regions of different length may be used. In some embodiments, longer cohesive ends may be used when higher numbers of nucleic acid fragments are being assembled. Longer cohesive ends may provide more flexibility to design or select sufficiently distinct sequences to discriminate between correct cohesive end annealing (e.g., involving cohesive ends designed to anneal to each other) and incorrect cohesive end annealing (e.g., between non-complementary cohesive ends).
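  • A simple screen for whether a set of cohesive ends is "sufficiently different" might look like the following sketch. Each junction is represented by one overhang sequence written in top-strand sense. The mismatch threshold, the rejection of self-complementary overhangs, and the function names are assumptions made for illustration and are not criteria stated in the specification.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_conflicts(overhangs, min_mismatches=2):
    """Return pairs of junction overhangs that may anneal incorrectly.

    A junction conflicts with itself if it is self-complementary (two copies
    of the same end could anneal to each other). Two junctions conflict if
    they, or one and the reverse complement of the other, differ at fewer
    than min_mismatches positions. Overhangs of different lengths are not
    compared in this sketch."""
    conflicts = [(oh, oh) for oh in overhangs if oh == reverse_complement(oh)]
    for a, b in combinations(overhangs, 2):
        if len(a) != len(b):
            continue
        if (hamming(a, b) < min_mismatches
                or hamming(a, reverse_complement(b)) < min_mismatches):
            conflicts.append((a, b))
    return conflicts


good_set = find_conflicts(["AATG", "GGAG", "TTCA", "GCTT"])
palindrome = find_conflicts(["AATT"])
too_similar = find_conflicts(["AATG", "AATC"])
```

  • Longer cohesive ends enlarge the space of available sequences, which makes it easier to find a set that passes such a screen when many fragments are assembled together.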
  • two or more pairs of complementary cohesive ends between different nucleic acid fragments may be designed or selected to have identical or similar sequences in order to promote the assembly of products containing a relatively random arrangement (and/or number) of the fragments that have similar or identical cohesive ends. This may be useful to generate libraries of nucleic acid products with different sequence arrangements and/or different copy numbers of certain internal sequence regions.
  • each of the two terminal nucleic acid fragments may be designed to have a cohesive end that is complementary to a cohesive end on a vector (e.g., on a linearized vector).
  • These cohesive ends may be identical cohesive ends that can anneal to identical complementary terminal sequences on a linearized vector.
  • the cohesive ends on the terminal fragments are different and the vector contains two different cohesive ends (one at each end of a linearized vector), each complementary to one of the terminal fragment cohesive ends.
  • the vector may be a linearized plasmid that has two cohesive ends, each of which is complementary with one end of the assembled nucleic acid fragments.
  • the nucleic acid fragments are mixed with a vector and incubated before transformation into a host cell. It should be appreciated that incubation under conditions that promote specific annealing of the cohesive ends may increase the frequency of assembly (e.g., correct assembly) upon transformation into the host organism.
  • the different cohesive ends are designed to have similar melting temperatures (e.g., within about 5 °C of each other) so that correct annealing of all of the fragments is promoted under the same conditions. Correct annealing may be promoted at a different temperature depending on the length of the cohesive ends that are used.
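A rough check that a set of cohesive ends falls within a melting-temperature window might look like the following. The Wallace rule (2 °C per A/T, 4 °C per G/C) is a swapped-in textbook approximation for short oligonucleotides, not a method taken from this document, and the function names are hypothetical.

```python
def wallace_tm(seq):
    """Approximate melting temperature in °C using the Wallace rule.

    Assumption: this rule of thumb is only a coarse estimate for short
    sequences and ignores salt and concentration effects.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_spread(overhangs):
    """Difference between the highest and lowest estimated Tm."""
    tms = [wallace_tm(s) for s in overhangs]
    return max(tms) - min(tms)


def within_window(overhangs, window_c=5):
    """True if all estimated melting temperatures lie within `window_c` °C."""
    return tm_spread(overhangs) <= window_c


ends = ["AATGCCGATTAC", "GGCTTAACGTCA"]
print(tm_spread(ends), within_window(ends))
```

A nearest-neighbor thermodynamic model would give better estimates; the simple rule is used here only to keep the sketch self-contained.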
  • cohesive ends of between about 4 and about 30 nucleotides in length may be used.
  • Incubation temperatures may range from about 20 °C to about 50 °C (including, e.g., 37 °C). However, higher or lower temperatures may be used.
  • the length of the incubation may be optimized based on the length of the overhangs, the complexity of the overhangs, and the number of different nucleic acids (and therefore the number of different overhangs) that are mixed together.
  • the incubation time also may depend on the annealing temperature and the presence or absence of other agents in the mixture.
  • a nucleic acid binding protein and/or a recombinase may be added (e.g., RecA, for example a heat stable RecA protein).
  • the resulting complex of nucleic acids may be transformed directly into a host without using a ligase.
  • One or more host functions e.g., ligation, recombination, any other suitable function, or any combination thereof
  • a ligase may be added prior to transformation.
  • the use of a ligase may be avoided by using a ligase-free concerted assembly method of the invention.
  • nucleic acid fragments and a vector are transformed into a host cell without any prior incubation period (other than the time required for mixing the nucleic acids and performing the transformation).
  • a recombinase for example RecA, e.g., a thermostable RecA
  • a nucleic acid binding protein may be mixed with the nucleic acid fragments and the vector, and optionally incubated, prior to transformation into a host cell.
  • nucleic acid fragments being assembled all may have complementary 3' overhangs, complementary 5' overhangs, or a combination thereof.
  • the complementary regions of two nucleic acid fragments that are designed to be adjacent should have the same type of overhang. For example, if nucleic acid "n" has a 5' overhang at its second end, then nucleic acid “n+1” should have a 5' overhang at its first end. However, nucleic acid “n+1” may have a 3' overhang at its second end if nucleic acid "n+2" has a 3' overhang at its first end. It should be understood that different nucleic acid assembly configurations may be designed and constructed.
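The rule that adjacent ends must carry the same type of overhang can be expressed as a simple check. The data layout (one pair of overhang types per fragment, listed in assembly order) and the function name are assumptions for illustration.

```python
def mismatched_junctions(ends):
    """Return indices n where fragment n and fragment n+1 cannot pair.

    `ends` lists, for each fragment in assembly order, a tuple of
    (overhang type at first end, overhang type at second end), where each
    type is "5'" or "3'". A junction is valid only if the second end of
    fragment n and the first end of fragment n+1 have the same type.
    """
    bad = []
    for n in range(len(ends) - 1):
        if ends[n][1] != ends[n + 1][0]:
            bad.append(n)
    return bad


# Fragment n ends with a 5' overhang, n+1 starts with 5' and ends with 3',
# and n+2 starts with 3': every junction is consistent.
design = [("5'", "5'"), ("5'", "3'"), ("3'", "5'")]
print(mismatched_junctions(design))
```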
  • a concerted assembly may involve multiple copies of certain nucleic acids and single copies of other nucleic acids.
  • one or more nucleic acid fragments being assembled may have blunt ends.
  • double-stranded blunt ends may have overlapping identical sequences on nucleic acid fragments that are designed to be adjacent to each other on an assembled nucleic acid product.
  • a vector may be a plasmid, a bacterial vector, a viral vector, a phage vector, an insect vector, a yeast vector, a mammalian vector, a BAC, a YAC, or any other suitable vector.
  • a vector may be a vector that replicates in only one type of organism (e.g., bacterial, yeast, insect, mammalian, etc.) or in only one species of organism. Some vectors may have a broad host range.
  • Some vectors may have different functional sequences (e.g., origins of replication, selectable markers, etc.) that are functional in different organisms. These may be used to shuttle the vector (and any nucleic acid fragment(s) that are cloned into the vector) between two different types of organism (e.g., between bacteria and mammals, yeast and mammals, etc.). In some embodiments, the type of vector that is used may be determined by the type of host cell that is chosen.
  • a vector may encode a detectable marker such as a selectable marker (e.g., antibiotic resistance, etc.) so that transformed cells can be selectively grown and the vector can be isolated and any insert can be characterized to determine whether it contains the desired assembled nucleic acid.
  • the insert may be characterized using any suitable technique (e.g., size analysis, restriction fragment analysis, sequencing, etc.).
  • the presence of a correctly assembled nucleic acid in a vector may be assayed by determining whether a function predicted to be encoded by the correctly assembled nucleic acid is expressed in the host cell.
  • host cells that harbor a vector containing a nucleic acid insert may be selected for or enriched by using one or more additional detectable or selectable markers that are only functional if a correct (e.g., designed) terminal nucleic acid fragment is cloned into the vector.
  • a host cell should have an appropriate phenotype to allow selection for one or more drug resistance markers encoded on a vector (or to allow detection of one or more detectable markers encoded on a vector).
  • any suitable host cell type may be used (e.g., prokaryotic, eukaryotic, bacterial, yeast, insect, mammalian, etc.).
  • the type of host cell may be determined by the type of vector that is chosen.
  • a host cell may be modified to have increased activity of one or more ligation and/or recombination functions.
  • a host cell may be selected on the basis of a high ligation and/or recombination activity.
  • a host cell may be modified to express (e.g., from the genome or a plasmid expression system) one or more ligase and/or recombinase enzymes.
  • a host cell may be transformed using any suitable technique (e.g., electroporation, chemical transformation, infection with a viral vector, etc.). Certain host organisms are more readily transformed than others.
  • all of the nucleic acid fragments and a linearized vector are mixed together and transformed into the host cell in a single step. However, in some embodiments, several transformations may be used to introduce all the fragments and vector into the cell (e.g., several consecutive transformations using subsets of the fragments).
  • the linearized vector is preferably designed to have incompatible ends so that it can only be circularized (and thereby confer resistance to a selectable marker) if the appropriate fragments are cloned into the vector in the designed configuration. This avoids or reduces the occurrence of "empty" vectors after selection.
  • Overhangs may be generated using any suitable technique.
  • a double-stranded nucleic acid fragment e.g., a fragment assembled in a multiplex assembly
  • an appropriate restriction enzyme to generate a terminal single-stranded overhang.
  • fragments that are designed to be adjacent to each other in an assembled product may be digested with the same enzyme to expose complementary overhangs.
  • overhangs may be generated using a type IIS restriction enzyme.
  • Type IIS restriction enzymes are enzymes that bind to a double stranded nucleic acid at one site, referred to as the recognition site, and make a single double stranded cut outside of the recognition site.
  • the double stranded cut, referred to as the cleavage site, is generally situated 0-20 bases away from the recognition site.
  • the recognition site is generally about 4-7 bp long.
  • All type IIS restriction enzymes exhibit at least partial asymmetric recognition. Asymmetric recognition means that 5'→3' recognition sequences are different for each strand of the nucleic acid.
  • the enzyme activity also shows polarity meaning that the cleavage sites are located on only one side of the recognition site. Thus, there is generally only one double stranded cut corresponding to each recognition site.
  • Cleavage generally produces 1-5 nucleotide single-stranded overhangs, with 5' or 3' termini, although some enzymes produce blunt ends. Either type of cut is useful in the context of the invention, although in some instances enzymes producing single-stranded overhangs are preferred.
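The offset between recognition site and cleavage site can be illustrated with a small simulation. The defaults below use the commonly documented Bsa I pattern (recognition sequence GGTCTC, top-strand cut 1 nucleotide and bottom-strand cut 5 nucleotides downstream of the site); these values come from general enzyme documentation rather than from this document, and the function itself is a hypothetical sketch that only handles a site in the forward orientation.

```python
def type_iis_cut(seq, site="GGTCTC", top_offset=1, bottom_offset=5):
    """Locate the first forward-orientation site and the resulting cut.

    Returns (top_cut, bottom_cut, overhang), where the cut positions are
    indices into the top strand and `overhang` is the top-strand sequence
    of the single-stranded region between the two cuts. Returns None if
    the site is absent or the cuts would fall beyond the sequence.
    """
    seq = seq.upper()
    start = seq.find(site)
    if start < 0:
        return None
    site_end = start + len(site)
    top_cut = site_end + top_offset
    bottom_cut = site_end + bottom_offset
    if max(top_cut, bottom_cut) > len(seq):
        return None
    lo, hi = sorted((top_cut, bottom_cut))
    return top_cut, bottom_cut, seq[lo:hi]


print(type_iis_cut("TTGGTCTCAAATGCCGT"))
```

Because the cut falls outside the recognition site, the overhang sequence is set by the neighboring bases and can be chosen freely by the designer.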
  • About 80 type IIS enzymes have been identified.
  • Examples include but are not limited to BstF5 I, BtsC I, BsrD I, Bts I, Alw I, Bcc I, BsmA I, Ear I, Mly I (blunt), Ple I, Bmr I, Bsa I, BsmB I, Fau I, Mnl I, Sap I, Bbs I, BciV I, Hph I, Mbo II, BfuA I, BspCN I, BspM I, SfaN I, Hga I, BseR I, Bbv I, Eci I, Fok I, BceA I, BsmF I, BtgZ I, BpuE I, Bsg I, Mme I, BseG I, Bse3D I, BseM I, AclW I, Alw26 I, Bst6 I, BstMA I, Eam1104 I, Ksp632 I, Pps I, Sch I (blu
  • each of a plurality of nucleic acid fragments designed for concerted assembly may have a type IIS restriction site at each end.
  • the type IIS restriction sites may be oriented so that the cleavage sites are internal relative to the recognition sequences.
  • enzyme digestion exposes an internal sequence (e.g., an overhang within an internal sequence) and removes the recognition sequences from the ends.
  • the same type IIS sites may be used for both ends of all of the nucleic acid fragments being prepared for assembly. However, different type IIS sites also may be used.
  • Two fragments that are designed to be adjacent in an assembled product each may include an identical overlapping terminal sequence and a flanking type IIS site that is appropriately located to expose complementary overhangs within the overlapping sequence upon restriction enzyme digestion. Accordingly, a plurality of nucleic acid fragments may be generated with different complementary overhangs.
  • the restriction site at each end of a nucleic acid fragment may be located such that digestion with the appropriate type IIS enzyme removes the restriction site and exposes a single-stranded region that is complementary to a single-stranded region on a nucleic acid fragment that is designed to be adjacent in the assembled nucleic acid product.
  • one end of each of the two terminal nucleic acid fragments may be designed to have a single-stranded overhang (e.g., after digestion with an appropriate restriction enzyme) that is complementary to a single-stranded overhang of a linearized vector nucleic acid.
  • the resulting nucleic acid fragments and vector may be transformed directly into a host cell.
  • the nucleic acid fragments and vector may be incubated to promote hybridization and annealing of the complementary sequences prior to transformation in the host cell.
  • a vector may be prepared using any one of the techniques described herein or any other suitable technique that produces a single-stranded overhang that would be complementary to an end of one of the terminal nucleic acid fragments.
  • a type IIS recognition site may be present within a sequence being assembled. If the corresponding type IIS restriction enzyme is used during one or more assembly steps described herein, unwanted restriction fragments may be generated and they may interfere with the yield of correctly assembled nucleic acids. One or more different strategies may be used to avoid unwanted type IIS cleavage.
  • an assembly strategy involves identifying type IIS recognition sites that are not present in a target nucleic acid of interest. One or more of these selected sites (and the corresponding enzymes) may be used in one or more assembly steps described herein without cutting the target nucleic acid at unwanted sites.
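Identifying type IIS sites that are absent from a target can be sketched as a scan of both strands. The recognition sequences listed are the commonly documented ones for four of the enzymes named above and are supplied here as assumptions, not taken from this document; the function name is hypothetical.

```python
# Commonly documented recognition sequences (assumed, 5'->3').
SITES = {
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "BbsI": "GAAGAC",
    "SapI": "GCTCTTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def enzymes_absent(target, sites=SITES):
    """Return enzyme names whose site occurs on neither strand of `target`."""
    target = target.upper()
    absent = []
    for name, site in sites.items():
        # A site on the bottom strand appears as its reverse complement on top.
        if site not in target and reverse_complement(site) not in target:
            absent.append(name)
    return absent


print(enzymes_absent("ATGGGTCTCAAGTCTTCAA"))
```

The reverse-complement check matters because type IIS sites are asymmetric, so a site on the opposite strand is not visible as the forward sequence.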
  • a nucleic acid sequence may be designed to remove any type IIS recognition sites.
  • the removal of such sites may be achieved while preserving the integrity of the nucleic acid sequence code.
  • the degeneracy of certain amino acid codons may allow nucleotide base substitutions to be made to remove a type IIS recognition site while retaining a codon for the same amino acid (e.g., replace one codon for another). Such substitutions are known to those of ordinary skill in the art and can be made using no more than routine methods.
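Removing a recognition site by synonymous codon substitution can be sketched as a greedy search. This is one possible routine under stated assumptions: the coding sequence is in frame from its first base, the standard genetic code applies, and all function names are hypothetical. Codon usage preferences of the host are ignored.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def translate(cds):
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def count_sites(seq, site):
    """Count occurrences of `site` on both strands (overlaps included)."""
    motifs = (site, reverse_complement(site))
    return sum(seq.startswith(m, i) for m in motifs for i in range(len(seq)))


def _find_swap(cds, site, before):
    """Return cds with one synonymous codon swap that lowers the site count."""
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        for alt, aa in CODON_TABLE.items():
            if alt != codon and aa == CODON_TABLE[codon]:
                trial = cds[:i] + alt + cds[i + 3:]
                if count_sites(trial, site) < before:
                    return trial
    return None


def remove_site(cds, site):
    """Apply synonymous swaps until `site` is absent from both strands."""
    cds = cds.upper()
    while count_sites(cds, site):
        swapped = _find_swap(cds, site, count_sites(cds, site))
        if swapped is None:
            raise ValueError("no single synonymous swap removes the site")
        cds = swapped
    return cds


original = "ATGGGTCTCTAA"
cleaned = remove_site(original, "GGTCTC")
print(cleaned, translate(cleaned) == translate(original))
```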
  • a type IIS restriction enzyme recognition site on a nucleic acid being assembled may be masked to prevent unwanted cleavage at that site.
  • a recognition site may be masked by using a masking molecule (e.g., a nucleic acid) that binds to the recognition site (e.g., because it is complementary to one of the strands in the recognition site).
  • a masking molecule e.g., a nucleic acid
  • Blocking as in "a blocking oligonucleotide”
  • a recognition site may be masked with a molecule capable of masking a restriction enzyme recognition site by preventing cleavage without preventing the enzyme from binding to its recognition site (for example, pcPNA).
  • complete or partial methylation of a nucleic acid may be performed to prevent unwanted cleavage at the methylated site.
  • a masking molecule e.g., nucleic acid
  • one or more assembly recognition sites may be masked from methylation while the rest of the nucleic acid molecule is methylated.
  • Methylation sensitive restriction enzymes may be used to cleave the cleavage site which is unmethylated.
  • E. coli host strains should be selected according to the type of masking strategy being employed. For example, SssI-methylated DNA is mcr sensitive. In such situations, an E. coli strain lacking mcrA, mcrBC and Mrr must be used or the DNA will be degraded. It is understood that the skilled artisan is familiar with how to select suitable host strains. Examples of suitable host strains include but are not limited to:
  • DH10B genotype: F⁻ mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 recA1 endA1 araΔ139 Δ(ara, leu)7697 galU galK λ⁻ rpsL (StrR) nupG; and TOP10 genotype: F⁻ mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 recA1 araD139 Δ(ara-leu)7697 galU galK rpsL (StrR) endA1 nupG.
  • Any suitable masking molecule may be used to prevent cleavage or to prevent methylation.
  • Any suitable nucleic acid may be used, including, for example, DNA, peptide nucleic acid (PNA), pseudocomplementary peptide nucleic acid (pcPNA), locked nucleic acid (LNA), etc.
  • a masking nucleic acid should be long enough to bind to a site with sufficient affinity to specifically prevent cleavage or methylation at that site.
  • a masking nucleic acid may be between about 15 and about 50 nucleotides long (or shorter or longer depending on the context).
  • a masking nucleic acid may be about 60 nucleotides long, about 30 nucleotides long, or any other suitable length.
  • a masking molecule may be capable of binding to a nucleic acid molecule at more than one location and on either one or both strands of the molecule.
  • a different sequence specific masking nucleic acid may be used for each site that is being protected.
  • masking of a cleavage site may be achieved by forming a complex with a specific protein or using Hoogsteen base pairing to mask the cleavage site.
  • RecA or any other suitable recombinase may be included to assist the binding of a masking molecule (e.g., nucleic acid) to a nucleic acid site being protected from cleavage or methylation.
  • site specific cleavage may be obtained using specific cleavage of DNA molecules at RecA-mediated triple-stranded structures.
  • specific cleavage may be obtained using an enzyme capable of cleaving a nucleic acid molecule specifically at a site where a triple-stranded DNA structure is located (e.g., using S1 or BAL31).
  • a triple-stranded DNA structure may be generated using a nucleic acid (e.g., oligonucleotide) that is complementary to one strand of a double-stranded target sequence of interest. The formation of a triple-stranded structure may be promoted by RecA or other suitable recombinase enzyme.
  • Certain enzymes may then be used to cut both strands of the double-stranded target nucleic acid at the location of the triple-stranded structure.
  • S1 nuclease cuts both strands of the double-stranded target nucleic acid in the context of a triple-stranded structure towards the 5' end of the nucleic acid (e.g., oligonucleotide) that was added to form the structure.
  • Triple-stranded DNA may be formed at any location in a double stranded nucleic acid molecule.
  • a complementary nucleic acid molecule may be used to form a triple-stranded DNA molecule.
  • a homologous deoxynucleotide may be used to form a triple-stranded DNA molecule.
  • formation of a triple-stranded DNA molecule is performed in the presence of RecA protein.
  • Further examples may be found in Shigemori et al. (2004, Nucleic Acids Research, 32(1): 1-8), the entire contents of which are incorporated herein by reference. Accordingly, targeted triple-helix cleavage may be used instead of a type IIS cleavage in certain assembly reactions described herein to avoid cleavage at unwanted sites within a target nucleic acid.
  • a meganuclease restriction enzyme may be used to cleave a nucleic acid molecule at a rare position. Meganuclease restriction enzymes specifically recognize long nucleic acid target sites. In some embodiments, a meganuclease restriction enzyme cleaves both strands of a nucleic acid at its specific cleavage site. In some embodiments, a meganuclease recognition site may be about 12-45 base pairs. In other embodiments, a meganuclease recognition site may be about 10, about 15, about 20, about 25, about 30, about 35, about 40 or about 45 base pairs. Restriction enzymes with longer recognition sites also may be used.
  • meganuclease is a homing endonuclease which may be found in phages, bacteria, archaebacteria and various eukaryotes (see for example Epinat et al., 2003, Nucleic
  • a meganuclease or rare-cutter recognition site may be used instead of a type IIS site in certain assembly reactions described herein (along with the appropriate meganucleases and/or rare-cutter enzymes) to avoid cleavage at unwanted sites within a target nucleic acid.
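Why a longer recognition site is less likely to occur at an unwanted position can be shown with a back-of-the-envelope estimate. The calculation assumes a random sequence with equal base frequencies and an asymmetric site that may lie on either strand; real sequences deviate from this model, and the function name is hypothetical.

```python
def expected_sites(site_len, target_len):
    """Expected number of chance occurrences of one specific site.

    Assumes independent, equally frequent bases and counts both strands,
    so the estimate is 2 * (number of start positions) / 4**site_len.
    """
    positions = max(target_len - site_len + 1, 0)
    return 2 * positions / 4 ** site_len


# A 6 bp site versus an 18 bp site in targets of different size.
print(expected_sites(6, 4101))
print(expected_sites(18, 1_000_000))
```

Under this model a 6 bp site is expected about twice in roughly 4 kb of sequence, whereas an 18 bp site is expected far less than once even in a megabase.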
  • Enzymatic digestions of DNA with type II or site-specific restriction enzymes typically generate an overhang of four to six nucleotides. These short cohesive ends may be sufficient for ligating two fragments of DNA containing complementary termini. However, when joining multiple DNA fragments together, longer complementary cohesive termini are preferred to facilitate assembly and to ensure specificity. Accordingly, other techniques may be used to expose longer single-stranded overhangs.
  • uracil DNA glycosylase may be used to hydrolyze a uracil-glycosidic bond in a nucleic acid thereby removing uracil and creating an alkali-sensitive abasic site in the DNA which can be subsequently hydrolyzed by endonuclease, heat or alkali treatment.
  • UDG uracil DNA glycosylase
  • a portion of one strand of a double-stranded nucleic acid may be removed thereby exposing the complementary sequence in the form of a single-stranded overhang.
  • This approach requires the deliberate incorporation of one or more uracil bases on one strand of a double-stranded nucleic acid fragment.
  • This may be accomplished, for example, by amplifying a nucleic acid fragment using an amplification primer that contains a 3' terminal uracil.
  • the region of the primer 5' to the uracil may be released (e.g., upon dilution, incubation, exposure to mild denaturing conditions, etc.) thereby exposing the complementary sequence as a single-stranded overhang.
  • the length of the overhang may be determined by the position of the uracil on the amplifying primer and by the length of the amplifying primer.
  • UDG is commercially available from suppliers such as Roche Applied Science.
  • a technique for exposing a single-stranded overhang may involve a polymerase (e.g., T4 DNA polymerase) that has a suitable editing function.
  • T4 DNA polymerase e.g., T4 DNA polymerase
  • T4 DNA polymerase possesses 3' -> 5' exonuclease activity. While this activity favors single-stranded regions, it can function, albeit somewhat less efficiently, on blunt ends. Accordingly, in the absence of any exogenous nucleotides, the 3' ends of a nucleic acid fragment contacted with T4 DNA polymerase will be progressively digested. The 5'->3' polymerase activity of T4 may attempt to replace an excised nucleotide.
  • progressive excision on a 3' -> 5' strand may be halted at the first occurrence (in the 3' -> 5' direction) of one of the four nucleotides by providing that nucleotide in sufficient amounts in the reaction mixture. The presence of the nucleotide in the reaction will result in an equilibrium being reached between the excision of the nucleotide and its re-incorporation by T4.
  • a single-stranded overhang may be generated at both ends of a nucleic acid fragment (e.g., if each 3' end does not contain the nucleotide that is added in the T4 polymerase reaction).
  • the length of the overhang generated at each end is a function of the sequence at each end (e.g., the length of the 3' sequence that is free of the nucleotide that is added in the T4 polymerase reaction).
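The dependence of overhang length on end sequence can be sketched with an idealized model of the polymerase reaction. The model assumes excision proceeds from each 3' end and stops exactly at the first occurrence of the supplied nucleotide; the function name and the representation of the fragment by its top strand are assumptions for illustration.

```python
_PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}


def t4_overhangs(top, dntp):
    """Return (left_len, right_len) of the 5' overhangs after processing.

    `top` is the top strand written 5'->3' and `dntp` is the single
    nucleotide supplied in the reaction. The top strand is digested from
    its 3' end (right side) up to its last `dntp`; the bottom strand is
    digested from its 3' end (left side) up to its first `dntp`, which
    pairs with the complement of `dntp` on the top strand.
    """
    top = top.upper()
    last = top.rfind(dntp)          # stop point on the top strand
    first = top.find(_PAIR[dntp])   # stop point on the bottom strand
    if last < 0 or first < 0 or first > last:
        raise ValueError("no duplex region would remain in this idealized model")
    left_len = first                 # top-strand bases left unpaired on the left
    right_len = len(top) - 1 - last  # bottom-strand bases left unpaired on the right
    return left_len, right_len


# G-free region on the left, C-free region on the right, dCTP supplied.
print(t4_overhangs("ATTAACTTATGCGCCGGCTTAGGATTGA", "C"))
```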
  • single-stranded overhangs may be generated by incubating a double-stranded nucleic acid with a polymerase that has an editing function (e.g., T4 DNA polymerase) without adding any nucleotides.
  • the length of the overhangs may be a function of the incubation time. Accordingly, suitable incubation conditions (including suitable incubation times, for example) may be determined to obtain suitable average overhangs (e.g., about 10, about 20, about 30, about 40, about 50 nucleotides long, etc.).
  • Sequence analysis and fragment design and selection for concerted assembly may include analyzing the sequence of a target nucleic acid and designing an assembly strategy based on the identification of regions, within the target nucleic acid sequence, that can be used to generate appropriate cohesive ends (e.g., single-stranded overhangs). These regions may be used to define the ends of fragments that can be assembled (e.g., in a concerted reaction) to generate the target nucleic acid. The fragments can then be provided or made (e.g., in a multiplex assembly reaction).
  • a target nucleic acid sequence may be analyzed to identify regions that contain at most three different types of nucleotide (i.e., they are missing at least one of G, A, T, or C) on one strand of the target nucleic acid. These regions may be used to generate cohesive ends using a polymerase (e.g., T4 DNA polymerase) processing technique described herein.
  • a polymerase e.g., T4 DNA polymerase
  • the length of a cohesive end is preferably sufficient to provide specificity.
  • cohesive ends may be long enough to have sufficiently different sequences to prevent or reduce mispairing between similar cohesive ends. However, their length is preferably not long enough to stabilize mispairs between similar cohesive sequences. In some embodiments, a length of about 9 to about 15 bases may be used.
  • any suitable length may be selected for a region that is to be used to generate a cohesive overhang.
  • the importance of specificity may depend on the number of different fragments that are being assembled simultaneously.
  • the appropriate length required to avoid stabilizing mispaired regions may depend on the conditions used for annealing different cohesive ends.
  • a target nucleic acid sequence may be analyzed to identify potential cohesive end regions as follows.
  • One or more regions (e.g., about 9-15 base long regions) free of either G, A, T, or C may be identified on one strand of a target nucleic acid.
  • One or more regions (e.g., about 9-15 base regions) free of the complementary nucleotide may be identified on the same strand.
  • regions free of C and regions free of G may be identified on one strand of the target nucleic acid.
  • Alternating regions (e.g., alternating C-free and G-free regions) may be selected to define the ends of nucleic acid fragments to be used for assembly so that both ends of each fragment can be processed to generate cohesive ends.
  • a fragment with a C-free region at one end and a G-free region at the other end of each strand can be processed to generate cohesive overhangs at each end.
  • the C-free region is the 3' region on both strands and the overhang is generated by adding C to the T4 polymerase reaction. Similar configurations may be used with any one of G, A, T, or C.
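Locating candidate regions that lack one nucleotide can be sketched with a regular-expression scan of one strand. The function name and the default minimum length of 9 (the lower bound of the range mentioned above) are choices made for this example.

```python
import re


def free_regions(seq, base, min_len=9):
    """Return (start, end) spans of maximal runs in `seq` lacking `base`.

    Only runs of at least `min_len` bases are reported. Spans use Python
    slice conventions, so seq[start:end] is the region itself.
    """
    pattern = re.compile(f"[^{base}]{{{min_len},}}")
    return [m.span() for m in pattern.finditer(seq.upper())]


strand = "ATTAACTTATGCGCCGGCTTAGGATTGA"
print(free_regions(strand, "G"))  # candidate G-free regions
print(free_regions(strand, "C"))  # candidate C-free regions
```

Alternating G-free and C-free spans found this way could then be filtered by the spacing required between fragment ends.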
  • alternating regions may be selected if they are separated by distances that define fragments with suitable lengths for the assembly design. In some embodiments, the alternating regions may be separated by about 200 to about 1,500 bases. However, any suitable shorter or longer distance may be selected. For example, the cohesive regions may be separated by about 500 to about 5,000 bases. It should be appreciated that different patterns of alternating regions may be available depending on several factors (e.g., depending on the sequence of the target nucleic acid, the chosen length of the cohesive ends, and the desired fragment length). In some embodiments, if several options are available, the regions may be selected to maximize the sequence differences between different cohesive ends.
  • the fragment size may be between about 200 and about 1,500 base pairs long, between about 500 and about 5,000 bases long, or shorter or longer depending on the target nucleic acid.
  • each fragment may be generated or obtained using any suitable technique.
  • each fragment may be assembled (e.g., in a multiplex oligonucleotide assembly reaction) so that it is flanked by double stranded regions that will be used to generate the cohesive single-stranded regions.
  • a fragment may be amplified in vitro (e.g., by PCR, LCR, etc.).
  • a fragment may be amplified in vivo.
  • a nucleic acid may be cloned into a vector having suitable flanking restriction sites.
  • the restriction sites may be used to excise a fragment with appropriate end sequences that can be used to generate cohesive ends (e.g., with appropriate single-stranded lengths).
  • type IIS restriction enzymes may be used to cut out an appropriate fragment.
  • a type IIS restriction site may be provided by the vector into which a nucleic acid is cloned.
  • a type IIS restriction site may be provided at the end of a nucleic acid that is cloned into a vector (e.g., at the end of a fragment that is assembled in a multiplex oligonucleotide assembly reaction).
  • a type IIS fragment may be isolated and processed as described herein to generate the cohesive ends. It should be appreciated that any type IIS enzyme may be used, provided that its restriction site is placed at a suitable distance from the cohesive region so that the type IIS fragment can be appropriately processed.
  • a fragment may be processed to generate cohesive ends regardless of whether the type IIS digestion generates overhangs or blunt ends.
  • the overhangs generated by a type IIS enzyme may not be long enough to provide sufficient specificity.
  • each fragment is assembled and fidelity optimized to remove error-containing nucleic acids (e.g., using one or more post-assembly fidelity optimization techniques described herein) before being processed to generate cohesive ends.
  • the fidelity optimization may be performed on the synthesized fragments after they are ligated into a first vector used for amplification.
  • the fragments may not be fidelity optimized, or they may be fidelity optimized after treatment to generate cohesive ends.
  • the different nucleic acid fragments that are used to assemble a target nucleic acid may be obtained or synthesized using different techniques. However, in some embodiments they are all produced using the same technique (e.g., assembled in a multiplex oligonucleotide assembly reaction, cloned into a vector, digested with a type IIS enzyme, and processed with T4 DNA polymerase).
  • the resulting fragments may be assembled in a single step concerted reaction and, for example, cloned into a vector that has a selectable marker.
  • the assembly may include an in vitro ligation. However, in some embodiments, the assembly may be an in vivo shotgun assembly wherein the fragments are transformed into a host cell without undergoing an in vitro ligation.
  • fragments are amplified in a first vector that has a first selectable marker and are then combined and assembled into a second vector that has a second selectable marker.
  • selection for the second selectable marker avoids contamination with the first vector.
  • the reactions may be performed in a procedure that does not require removal (e.g., by purification) of the first vector sequence.
  • aspects of the invention may include automating one or more acts described herein.
  • sequence analysis, the identification of interfering sequence features, assembly strategy selection (including fragment design and selection, the choice of a particular combination of extension-based and ligation-based assembly reactions, etc.), fragment production, single-stranded overhang production, and/or concerted assembly may be automated in order to generate the desired product automatically.
  • Acts of the invention may be automated using, for example, a computer system.
  • aspects of the invention may be used in conjunction with any suitable multiplex nucleic acid assembly procedure.
  • vector-encoded trait activation may be used in connection with one or more of the multiplex nucleic acid assembly procedures described below.
  • aspects of the invention may involve an assembly procedure wherein a plurality of nucleic acids each assembled in a multiplex assembly procedure (e.g., from oligonucleotides) are combined to form a larger nucleic acid using an iterative assembly procedure described herein.
  • multiplex nucleic acid assembly relates to the assembly of a plurality of nucleic acids to generate a longer nucleic acid product.
  • multiplex oligonucleotide assembly relates to the assembly of a plurality of oligonucleotides to generate a longer nucleic acid molecule.
  • nucleic acids e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.
  • a multiplex assembly reaction e.g., along with one or more oligonucleotides
  • an assembled nucleic acid molecule that is longer than any of the single starting nucleic acids (e.g., oligonucleotides) that were added to the assembly reaction.
  • one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions may be combined and assembled to form a further nucleic acid that is longer than any of the input nucleic acid fragments.
  • one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions may be combined with one or more additional nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) and assembled to form a further nucleic acid that is longer than any of the input nucleic acids.
  • additional nucleic acids e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.
  • a target nucleic acid may have a sequence of a naturally occurring gene and/or other naturally occurring nucleic acid (e.g., a naturally occurring coding sequence, regulatory sequence, non-coding sequence, chromosomal structural sequence such as a telomere or centromere sequence, etc., any fragment thereof or any combination of two or more thereof).
  • a target nucleic acid may have a sequence that is not naturally-occurring.
  • a target nucleic acid may be designed to have a sequence that differs from a natural sequence at one or more positions.
  • a target nucleic acid may be designed to have an entirely novel sequence.
  • target nucleic acids may include one or more naturally occurring sequences, non-naturally occurring sequences, or combinations thereof.
  • multiplex assembly may be used to generate libraries of nucleic acids having different sequences.
  • a library may contain nucleic acids having random sequences.
  • a predetermined target nucleic acid may be designed and assembled to include one or more random sequences at one or more predetermined positions.
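As a rough illustration of a predetermined target that carries random sequence at predetermined positions, the sketch below fills placeholder positions with randomly chosen bases. The use of `N` as the placeholder character, the names `randomize_positions` and `make_library`, the example template, and the fixed seed are all assumptions made for this example; they are not taken from the description.

```python
import random

BASES = "ACGT"

def randomize_positions(template: str, rng: random.Random) -> str:
    """Return a copy of `template` in which every 'N' placeholder is
    replaced by a randomly chosen base; all other positions are kept."""
    return "".join(rng.choice(BASES) if base == "N" else base
                   for base in template)

def make_library(template: str, size: int, seed: int = 0) -> list[str]:
    """Build a small in-silico library of variants of one template."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability
    return [randomize_positions(template, rng) for _ in range(size)]

# Hypothetical template: fixed flanks with three randomized positions.
library = make_library("ATGNNNGGATCC", size=5)
```

Every member of `library` keeps the fixed flanking positions and differs only at the placeholder positions.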
  • a target nucleic acid may include a functional sequence (e.g., a protein binding sequence, a regulatory sequence, a sequence encoding a functional protein, etc., or any combination thereof).
  • a target nucleic acid may lack a specific functional sequence (e.g., a target nucleic acid may include only non-functional fragments or variants of a protein binding sequence, regulatory sequence, or protein encoding sequence, or any other non-functional naturally-occurring or synthetic sequence, or any non-functional combination thereof).
  • Certain target nucleic acids may include both functional and non-functional sequences.
  • a target nucleic acid may be assembled in a single multiplex assembly reaction (e.g., a single oligonucleotide assembly reaction). However, a target nucleic acid also may be assembled from a plurality of nucleic acid fragments, each of which may have been generated in a separate multiplex oligonucleotide assembly reaction. It should be appreciated that one or more nucleic acid fragments generated via multiplex oligonucleotide assembly also may be combined with one or more nucleic acid molecules obtained from another source (e.g., a restriction fragment, a nucleic acid amplification product, etc.) to form a target nucleic acid. In some embodiments, a target nucleic acid that is assembled in a first reaction may be used as an input nucleic acid fragment for a subsequent assembly reaction to produce a larger target nucleic acid.
  • different strategies may be used to produce a target nucleic acid having a predetermined sequence.
  • different starting nucleic acids e.g., different sets of predetermined nucleic acids
  • predetermined nucleic acid fragments may be assembled using one or more different in vitro and/or in vivo techniques.
  • nucleic acids e.g., overlapping nucleic acid fragments
  • an enzyme e.g., a ligase and/or a polymerase
  • a chemical reaction e.g., a chemical ligation
  • in vivo e.g., assembled in a host cell after transfection into the host cell
  • each nucleic acid fragment that is used to make a target nucleic acid may be assembled from different sets of oligonucleotides.
  • a nucleic acid fragment may be assembled using an in vitro or an in vivo technique (e.g., an in vitro or in vivo polymerase, recombinase, and/or ligase based assembly process).
  • an in vitro assembly reaction may involve one or more polymerases, ligases, other suitable enzymes, chemical reactions, or any combination thereof.
  • a predetermined nucleic acid fragment may be assembled from a plurality of different starting nucleic acids (e.g., oligonucleotides) in a multiplex assembly reaction (e.g., a multiplex enzyme-mediated reaction, a multiplex chemical assembly reaction, or a combination thereof).
  • the assembly reactions described herein may be performed using starting nucleic acids obtained from one or more different sources (e.g., synthetic or natural polynucleotides, nucleic acid amplification products, nucleic acid degradation products, oligonucleotides, etc.).
  • the starting nucleic acids may be referred to as assembly nucleic acids (e.g., assembly oligonucleotides).
  • an assembly nucleic acid has a sequence that is designed to be incorporated into the nucleic acid product generated during the assembly process.
  • the description of the assembly reactions in the context of single-stranded nucleic acids is not intended to be limiting.
  • one or more of the starting nucleic acids illustrated in the figures and described herein may be provided as double stranded nucleic acids. Accordingly, it should be appreciated that where the figures and description illustrate the assembly of single-stranded nucleic acids, the presence of one or more complementary nucleic acids is contemplated. Accordingly, one or more double-stranded complementary nucleic acids may be included in a reaction that is described herein in the context of a single-stranded assembly nucleic acid. However, in some embodiments the presence of one or more complementary nucleic acids may interfere with an assembly reaction by competing for hybridization with one of the input assembly nucleic acids.
  • an assembly reaction may involve only single-stranded assembly nucleic acids (i.e., the assembly nucleic acids may be provided in a single-stranded form without their complementary strand) as described or illustrated herein.
  • the presence of one or more complementary nucleic acids may have no or little effect on the assembly reaction.
  • complementary nucleic acid(s) may be incorporated during one or more steps of an assembly.
  • assembly nucleic acids and their complementary strands may be assembled under the same assembly conditions via parallel assembly reactions in the same reaction mixture.
  • a nucleic acid product resulting from the assembly of a plurality of starting nucleic acids may be identical to the nucleic acid product that results from the assembly of nucleic acids that are complementary to the starting nucleic acids (e.g., in some embodiments where the assembly steps result in the production of a double-stranded nucleic acid product).
  • an oligonucleotide may be a nucleic acid molecule comprising at least two covalently bonded nucleotide residues. In some embodiments, an oligonucleotide may be between 10 and 1,000 nucleotides long.
  • an oligonucleotide may be between 10 and 500 nucleotides long, or between 500 and 1,000 nucleotides long. In some embodiments, an oligonucleotide may be between about 20 and about 100 nucleotides long (e.g., from about 30 to 90, 40 to 85, 50 to 80, 60 to 75, or about 65 or about 70 nucleotides long), between about 100 and about 200, between about 200 and about 300 nucleotides, between about 300 and about 400, or between about 400 and about 500 nucleotides long. However, shorter or longer oligonucleotides may be used. An oligonucleotide may be a single-stranded nucleic acid.
  • a double-stranded oligonucleotide may be used as described herein.
  • an oligonucleotide may be chemically synthesized as described in more detail below.
  • an input nucleic acid e.g., oligonucleotide
  • the resulting product may be double-stranded.
  • one of the strands of a double-stranded nucleic acid may be removed before use so that only a predetermined single strand is added to an assembly reaction.
  • each oligonucleotide may be designed to have a sequence that is identical to a different portion of the sequence of a predetermined target nucleic acid that is to be assembled. Accordingly, in some embodiments each oligonucleotide may have a sequence that is identical to a portion of one of the two strands of a double-stranded target nucleic acid.
  • the two complementary strands of a double stranded nucleic acid are referred to herein as the positive (P) and negative (N) strands. This designation is not intended to imply that the strands are sense and anti-sense strands of a coding sequence.
  • a P strand may be a sense strand of a coding sequence
  • a P strand may be an anti-sense strand of a coding sequence
  • a target nucleic acid may be either the P strand, the N strand, or a double-stranded nucleic acid comprising both the P and N strands. It should be appreciated that different oligonucleotides may be designed to have different lengths.
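The relationship between the P and N strands can be sketched in a few lines: the N strand is taken here as the reverse complement of the P strand, both written 5' to 3'. The function name and the example sequence are hypothetical, and only unambiguous DNA bases are handled.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

p_strand = "ATGGCCAAGT"                  # hypothetical P strand, 5' to 3'
n_strand = reverse_complement(p_strand)  # corresponding N strand, 5' to 3'
```

Applying `reverse_complement` twice returns the original sequence, which reflects that either strand may be designated P.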
  • one or more different oligonucleotides may have overlapping sequence regions (e.g., overlapping 5' regions or overlapping 3' regions). Overlapping sequence regions may be identical (i.e., corresponding to the same strand of the nucleic acid fragment) or complementary (i.e., corresponding to complementary strands of the nucleic acid fragment).
  • the plurality of oligonucleotides may include one or more oligonucleotide pairs with overlapping identical sequence regions, one or more oligonucleotide pairs with overlapping complementary sequence regions, or a combination thereof.
  • Overlapping sequences may be of any suitable length. For example, overlapping sequences may encompass the entire length of one or more nucleic acids used in an assembly reaction.
  • Overlapping sequences may be between about 5 and about 500 nucleotides long (e.g., between about 10 and 100, between about 10 and 75, between about 10 and 50, about 20, about 25, about 30, about 35, about 40, about 45, about 50, etc.). However, shorter, longer or intermediate overlapping lengths may be used. It should be appreciated that overlaps between different input nucleic acids used in an assembly reaction may have different lengths.
  • the combined sequences of the different oligonucleotides in the reaction may span the sequence of the entire nucleic acid fragment on either the positive strand, the negative strand, both strands, or a combination of portions of the positive strand and portions of the negative strand.
  • the plurality of different oligonucleotides may provide either positive sequences, negative sequences, or a combination of both positive and negative sequences corresponding to the entire sequence of the nucleic acid fragment to be assembled.
  • the plurality of oligonucleotides may include one or more oligonucleotides having sequences identical to one or more portions of the positive sequence, and one or more oligonucleotides having sequences that are identical to one or more portions of the negative sequence of the nucleic acid fragment.
  • One or more pairs of different oligonucleotides may include sequences that are identical to overlapping portions of the predetermined nucleic acid fragment sequence as described herein (e.g., overlapping sequence portions from the same or from complementary strands of the nucleic acid fragment).
  • the plurality of oligonucleotides includes a set of oligonucleotides having sequences that combine to span the entire positive sequence and a set of oligonucleotides having sequences that combine to span the entire negative sequence of the predetermined nucleic acid fragment.
  • the plurality of oligonucleotides may include one or more oligonucleotides with sequences that are identical to sequence portions on one strand (either the positive or negative strand) of the nucleic acid fragment, but no oligonucleotides with sequences that are complementary to those sequence portions.
  • a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the positive sequence of the predetermined nucleic acid fragment. In one embodiment, a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the negative sequence of the predetermined nucleic acid fragment. These oligonucleotides may be assembled by sequential ligation or in an extension-based reaction (e.g., if an oligonucleotide having a 3' region that is complementary to one of the plurality of oligonucleotides is added to the reaction).
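One of the many possible oligonucleotide layouts described above can be sketched as follows: oligonucleotides of a fixed length tile the target, alternate between the positive and negative strands, and share a fixed-length complementary overlap with each neighbor. The fixed length, the fixed overlap, the strict alternation, and the names `design_oligos` and `reverse_complement` are assumptions chosen for simplicity; the description allows variable lengths, variable overlaps, and same-strand designs.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the complementary strand of `seq`, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]

def design_oligos(target, oligo_len, overlap):
    """Tile `target` (the P strand, 5' to 3') with alternating P and N oligos.

    Returns a list of (strand, start, end, sequence) tuples, where start/end
    are half-open coordinates on the P strand and sequence is 5' to 3'.
    """
    if not target:
        raise ValueError("target must not be empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        end = min(start + oligo_len, len(target))
        segment = target[start:end]
        strand = "P" if len(oligos) % 2 == 0 else "N"
        seq = segment if strand == "P" else reverse_complement(segment)
        oligos.append((strand, start, end, seq))
        if end == len(target):
            return oligos
        start += step

target = "ACGT" * 25  # hypothetical 100-nt target
oligos = design_oligos(target, oligo_len=40, overlap=20)
```

In this layout each P oligonucleotide anneals through its 3' region to the 3' region of the following N oligonucleotide, so the pair can be extended by a polymerase.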
  • a nucleic acid fragment may be assembled in a polymerase-mediated assembly reaction from a plurality of oligonucleotides that are combined and extended in one or more rounds of polymerase-mediated extensions.
  • a nucleic acid fragment may be assembled in a ligase-mediated reaction from a plurality of oligonucleotides that are combined and ligated in one or more rounds of ligase-mediated ligations.
  • a nucleic acid fragment may be assembled in a non-enzymatic reaction (e.g., a chemical reaction) from a plurality of oligonucleotides that are combined and assembled in one or more rounds of non-enzymatic reactions.
  • a nucleic acid fragment may be assembled using a combination of polymerase, ligase, and/or non-enzymatic reactions.
  • polymerase(s) and ligase(s) may be included in an assembly reaction mixture.
  • a nucleic acid may be assembled via coupled amplification and ligation or ligation during amplification.
  • the resulting nucleic acid fragment from each assembly technique may have a sequence that includes the sequences of each of the plurality of assembly oligonucleotides that were used as described herein.
  • primerless assemblies since the target nucleic acid is generated by assembling the input oligonucleotides rather than being generated in an amplification reaction where the oligonucleotides act as amplification primers to amplify a pre-existing template nucleic acid molecule corresponding to the target nucleic acid.
  • Polymerase-based assembly techniques may involve one or more suitable polymerase enzymes that can catalyze a template-based extension of a nucleic acid in a 5' to 3' direction in the presence of suitable nucleotides and an annealed template.
  • a polymerase may be thermostable.
  • a polymerase may be obtained from recombinant or natural sources.
  • a thermostable polymerase from a thermophilic organism may be used.
  • a polymerase may include a 3'→5' exonuclease/proofreading activity.
  • a polymerase may have no, or little, proofreading activity (e.g., a polymerase may be a recombinant variant of a natural polymerase that has been modified to reduce its proofreading activity).
  • thermostable DNA polymerases include, but are not limited to: Taq (a heat-stable DNA polymerase from the bacterium Thermus aquaticus); Pfu (a thermophilic DNA polymerase with a 3'→5' exonuclease/proofreading activity from Pyrococcus furiosus, available from, for example, Promega); VentR® DNA Polymerase and VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3'→5' exonuclease/proofreading activity from Thermococcus litoralis; also known as Tli polymerase); Deep VentR® DNA Polymerase and Deep VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or
  • coli DNA Polymerase I which retains polymerase activity, but has lost the 5'→3' exonuclease activity, available from, for example, Promega and NEB); Sequenase™ (T7 DNA polymerase deficient in 3'→5' exonuclease activity); Phi29 (bacteriophage 29 DNA polymerase, may be used for rolling circle amplification, for example, in a TempliPhi™ DNA Sequencing Template Amplification Kit, available from Amersham Biosciences); TopoTaq™ (a hybrid polymerase that combines hyperstable DNA binding domains and the DNA unlinking activity of Methanopyrus topoisomerase, with no exonuclease activity, available from Fidelity Systems); TopoTaq HiFi which incorporates a proofreading domain with exonuclease activity; Phusion™ (a Pyrococcus-like enzyme with a processivity-enhancing domain, available from New England Biolabs); any other suitable DNA polymerase, or
  • Ligase-based assembly techniques may involve one or more suitable ligase enzymes that can catalyze the covalent linking of adjacent 3' and 5' nucleic acid termini (e.g., a 5' phosphate and a 3' hydroxyl of nucleic acid(s) annealed on a complementary template nucleic acid such that the 3' terminus is immediately adjacent to the 5' terminus).
  • a ligase may catalyze a ligation reaction between the 5' phosphate of a first nucleic acid and the 3' hydroxyl of a second nucleic acid if the first and second nucleic acids are annealed next to each other on a template nucleic acid.
  • a ligase may be obtained from recombinant or natural sources.
  • a ligase may be a heat-stable ligase.
  • a thermostable ligase from a thermophilic organism may be used.
  • thermostable DNA ligases include, but are not limited to: Tth DNA ligase (from Thermus thermophilus, available from, for example, Eurogentec and GeneCraft); Pfu DNA ligase (a hyperthermophilic ligase from Pyrococcus furiosus); Taq ligase (from Thermus aquaticus), any other suitable heat-stable ligase, or any combination thereof.
  • one or more lower temperature ligases may be used (e.g., T4 DNA ligase).
  • a lower temperature ligase may be useful for shorter overhangs (e.g., about 3, about 4, about 5, or about 6 base overhangs) that may not be stable at higher temperatures.
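The adjacency requirement for template-directed ligation described above (a 3' terminus immediately next to a 5' phosphate on a complementary template) can be expressed as a simple check. The `AnnealedOligo` type, its coordinate convention, and the assumption that every 3' end carries a free hydroxyl are hypothetical simplifications introduced for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnealedOligo:
    """An oligonucleotide annealed to a template.

    `start` and `end` are half-open positions along the oligo's own
    5' to 3' direction, so the 5' terminus sits at `start` and the
    3' terminus sits just before `end`.
    """
    start: int
    end: int
    five_prime_phosphate: bool

def can_ligate(upstream: AnnealedOligo, downstream: AnnealedOligo) -> bool:
    """True if the 3' end of `upstream` directly abuts a phosphorylated
    5' end of `downstream` (no gap and no overlap on the template)."""
    return (upstream.end == downstream.start
            and downstream.five_prime_phosphate)

a = AnnealedOligo(0, 30, five_prime_phosphate=False)
b = AnnealedOligo(30, 60, five_prime_phosphate=True)   # abuts a, phosphorylated
c = AnnealedOligo(32, 60, five_prime_phosphate=True)   # leaves a 2-nt gap
```

Here `can_ligate(a, b)` holds, while `can_ligate(a, c)` does not because of the gap; a gap of that kind would first need to be filled, for example by polymerase extension.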
  • Non-enzymatic techniques can be used to ligate nucleic acids.
  • a 5'-end e.g., the 5' phosphate group
  • a 3'-end e.g., the 3' hydroxyl
  • non-enzymatic techniques may offer certain advantages over enzyme-based ligations.
  • non-enzymatic techniques may have a high tolerance of non-natural nucleotide analogues in nucleic acid substrates, may be used to ligate short nucleic acid substrates, may be used to ligate RNA substrates, and/or may be cheaper and/or more suited to certain automated (e.g., high throughput) applications.
  • Non-enzymatic ligation may involve a chemical ligation.
  • nucleic acid termini of two or more different nucleic acids may be chemically ligated.
  • nucleic acid termini of a single nucleic acid may be chemically ligated (e.g., to circularize the nucleic acid).
  • both strands at a first double-stranded nucleic acid terminus may be chemically ligated to both strands at a second double-stranded nucleic acid terminus.
  • only one strand of a first nucleic acid terminus may be chemically ligated to a single strand of a second nucleic acid terminus.
  • the 5' end of one strand of a first nucleic acid terminus may be ligated to the 3' end of one strand of a second nucleic acid terminus without the ends of the complementary strands being chemically ligated.
  • a chemical ligation may be used to form a covalent linkage between a 5' terminus of a first nucleic acid end and a 3' terminus of a second nucleic acid end, wherein the first and second nucleic acid ends may be ends of a single nucleic acid or ends of separate nucleic acids.
  • chemical ligation may involve at least one nucleic acid substrate having a modified end (e.g., a modified 5' and/or 3' terminus) including one or more chemically reactive moieties that facilitate or promote linkage formation.
  • chemical ligation occurs when one or more nucleic acid termini are brought together in close proximity (e.g., when the termini are brought together due to annealing between complementary nucleic acid sequences).
  • annealing between complementary 3' or 5' overhangs e.g., overhangs generated by restriction enzyme cleavage of a double-stranded nucleic acid
  • any combination of complementary nucleic acids that results in a 3' terminus being brought into close proximity with a 5' terminus e.g., the 3' and 5' termini are adjacent to each other when the nucleic acids are annealed to a complementary template nucleic acid
  • Examples of chemical reactions may include, but are not limited to, condensation, reduction, and/or photochemical ligation reactions.
  • chemical ligation can be used to produce naturally-occurring phosphodiester internucleotide linkages, non-naturally-occurring phosphamide pyrophosphate internucleotide linkages, and/or other non-naturally-occurring internucleotide linkages.
  • the process of chemical ligation may involve one or more coupling agents to catalyze the ligation reaction.
  • a coupling agent may promote a ligation reaction between reactive groups in adjacent nucleic acids (e.g., between a 5'-reactive moiety and a 3'-reactive moiety at adjacent sites along a complementary template).
  • a coupling agent may be a reducing reagent (e.g., ferricyanide), a condensing reagent (e.g., cyanoimidazole, cyanogen bromide, carbodiimide, etc.), or irradiation (e.g., UV irradiation for photo-ligation).
  • a chemical ligation may be an autoligation reaction that does not involve a separate coupling agent.
  • in autoligation, the presence of a reactive group on one or more nucleic acids may be sufficient to catalyze a chemical ligation between nucleic acid termini without the addition of a coupling agent (see, for example, Xu Y & Kool ET, 1997, Tetrahedron Lett. 38:5595-8).
  • Non-limiting examples of these reagent-free ligation reactions may involve nucleophilic displacements of sulfur on bromoacetyl, tosyl, or iodo-nucleoside groups (see, for example, Xu Y et al., 2001, Nat Biotech 19:148-52).
  • Nucleic acids containing reactive groups suitable for autoligation can be prepared directly on automated synthesizers (see, for example, Xu Y & Kool ET, 1999, Nuc. Acids Res. 27:875-81).
  • a phosphorothioate at a 3' terminus may react with a leaving group (such as tosylate or iodide) on a thymidine at an adjacent 5' terminus.
  • two nucleic acid strands bound at adjacent sites on a complementary target strand may undergo auto-ligation by displacement of a 5'-end iodide moiety (or tosylate) with a 3'-end sulfur moiety.
  • the product of an autoligation may include a non-naturally-occurring internucleotide linkage (e.g., a single oxygen atom may be replaced with a sulfur atom in the ligated product).
  • a synthetic nucleic acid duplex can be assembled via chemical ligation in a one step reaction involving simultaneous chemical ligation of nucleic acids on both strands of the duplex.
  • a mixture of 5'-phosphorylated oligonucleotides corresponding to both strands of a target nucleic acid may be chemically ligated by a) exposure to heat (e.g., to 97 °C) and slow cooling to form a complex of annealed oligonucleotides, and b) exposure to cyanogen bromide or any other suitable coupling agent under conditions sufficient to chemically ligate adjacent 3' and 5' ends in the nucleic acid complex.
  • a synthetic nucleic acid duplex can be assembled via chemical ligation in a two step reaction involving separate chemical ligations for the complementary strands of the duplex.
  • each strand of a target nucleic acid may be ligated in a separate reaction containing phosphorylated oligonucleotides corresponding to the strand that is to be ligated and non-phosphorylated oligonucleotides corresponding to the complementary strand.
  • the non-phosphorylated oligonucleotides may serve as a template for the phosphorylated oligonucleotides during a chemical ligation (e.g., using cyanogen bromide).
  • the resulting single-stranded ligated nucleic acid may be purified and annealed to a complementary ligated single-stranded nucleic acid to form the target duplex nucleic acid (see, for example, Shabarova ZA et al., 1991, Nuc. Acids Res. 19:4247-51).
  • aspects of the invention may be used to enhance different types of nucleic acid assembly reactions (e.g., multiplex nucleic acid assembly reactions). Aspects of the invention may be used in combination with one or more assembly reactions described in, for example, Carr et al., 2004, Nucleic Acids Research, Vol. 32, No 20, e162 (9 pages); Richmond et al., 2004, Nucleic Acids Research, Vol. 32, No 17, pp. 5011-5018; Caruthers et al., 1972, J. Mol. Biol. 72, 475-492; Hecker et al., 1998, Biotechniques 24:256-260; Kodumal et al., 2004, PNAS Vol. 101, No. 44, pp.
  • synthesis and assembly methods described herein may be performed in any suitable format, including in a reaction tube, in a multi-well plate, on a surface, on a column, in a microfluidic device (e.g., a microfluidic tube), a capillary tube, etc.
  • FIG. 1 shows one embodiment of a plurality of oligonucleotides that may be assembled in a polymerase-based multiplex oligonucleotide assembly reaction.
  • FIG. 1A shows two groups of oligonucleotides (Group P and Group N) that have sequences of portions of the two complementary strands of a nucleic acid fragment to be assembled.
  • Group P includes oligonucleotides with positive strand sequences (P1, P2, ..., Pn-1, Pn, Pn+1, ..., PT, shown from 5'→3' on the positive strand).
  • Group N includes oligonucleotides with negative strand sequences (NT, ..., Nn+1, Nn, Nn-1, ..., N2, N1, shown from 5'→3' on the negative strand).
  • one or more of the oligonucleotides within the P or N group may overlap.
  • FIG. 1A shows gaps between consecutive oligonucleotides in Group P and gaps between consecutive oligonucleotides in Group N.
  • FIG. 1B shows a structure of an embodiment of a Group P or Group N oligonucleotide represented in FIG. 1A.
  • This oligonucleotide includes a 5' region that is complementary to a 5' region of a first oligonucleotide from the other group, a 3' region that is complementary to a 3' region of a second oligonucleotide from the other group, and a core or central region that is not complementary to any oligonucleotide sequence from the other group (or its own group).
  • This central region is illustrated as the B region in FIG. 1B.
  • the sequence of the B region may be different for each different oligonucleotide.
  • the B region of an oligonucleotide in one group corresponds to a gap between two consecutive oligonucleotides in the complementary group of oligonucleotides.
  • the 5'-most oligonucleotide in each group does not have a 5' region that is complementary to the 5' region of any other oligonucleotide in either group. Accordingly, the 5'-most oligonucleotides (P1 and NT) that are illustrated in FIG. 1A each have a 3' complementary region and a 5' non-complementary region (the B region of FIG. 1B), but no 5' complementary region.
  • any one or more of the oligonucleotides in Group P and/or Group N can be designed to have no B region.
  • a 5'-most oligonucleotide has only the 3' complementary region (meaning that the entire oligonucleotide is complementary to the 3' region of the 3'-most oligonucleotide from the other group (e.g., the 3' region of N1 or PT shown in FIG. 1A)).
  • one of the other oligonucleotides in either Group P or Group N has only a 5' complementary region and a 3' complementary region (meaning that the entire oligonucleotide is complementary to the 5' and 3' sequence regions of the two overlapping oligonucleotides from the complementary group).
  • only a subset of oligonucleotides in an assembly reaction may include B regions. It should be appreciated that the length of the 5', 3', and B regions may be different for each oligonucleotide.
  • the length of the 5' region is the same as the length of the complementary 5' region in the 5' overlapping oligonucleotide from the other group.
  • the length of the 3' region is the same as the length of the complementary 3' region in the 3' overlapping oligonucleotide from the other group.
  • a 3'-most oligonucleotide may be designed with a 3' region that extends beyond the 5' region of the 5'-most oligonucleotide.
  • an assembled product may include the 5' end of the 5'-most oligonucleotide, but not the 3' end of the 3'-most oligonucleotide that extends beyond it.
  • FIG. 1C illustrates a subset of the oligonucleotides from FIG. 1A, each oligonucleotide having a 5', a 3', and an optional B region. Oligonucleotide Pn is shown with a 5' region that is complementary to (and can anneal to) the 5' region of oligonucleotide Nn-1.
  • Oligonucleotide Pn also has a 3' region that is complementary to (and can anneal to) the 3' region of oligonucleotide Nn.
  • Nn is also shown with a 5' region that is complementary to (and can anneal to) the 5' region of oligonucleotide Pn+1.
  • This pattern could be repeated for all of oligonucleotides P2 to PT and N1 to NT-1 (with the 5'-most oligonucleotides only having 3' complementary regions as discussed herein).
  • oligonucleotides from Group P and Group N may anneal to form a long chain such as the oligonucleotide complex illustrated in FIG. 1A.
  • subsets of the oligonucleotides may form shorter chains and even oligonucleotide dimers with annealed 5' or 3' regions. It should be appreciated that many copies of each oligonucleotide are included in a typical reaction mixture. Accordingly, the resulting hybridized reaction mixture may contain a distribution of different oligonucleotide dimers and complexes.
  • Polymerase-mediated extension of the hybridized oligonucleotides results in a template-based extension of the 3' ends of oligonucleotides that have annealed 3' regions. Accordingly, polymerase-mediated extension of the oligonucleotides shown in FIG. 1C would result in extension of the 3' ends only of oligonucleotides Pn and Nn, generating extended oligonucleotides containing sequences that are complementary to all the regions of Nn and Pn, respectively. Extended oligonucleotide products with sequences complementary to all of Nn-1 and Pn+1 would not be generated unless oligonucleotides P n .
  • the plurality of oligonucleotides should include 5'-most oligonucleotides that are at least complementary to the entire 3' regions of the 3'-most oligonucleotides.
  • the 5'-most oligonucleotides also may have 5' regions that extend beyond the 3' ends of the 3'-most oligonucleotides as illustrated in FIG. 1A.
  • a ligase also may be added to ligate adjacent 5' and 3' ends that may be formed upon 3' extension of annealed oligonucleotides in an oligonucleotide complex such as the one illustrated in FIG. 1A.
  • a single cycle of polymerase extension extends oligonucleotide pairs with annealed 3' regions. Accordingly, if a plurality of oligonucleotides were annealed to form an annealed complex such as the one illustrated in FIG. 1A, a single cycle of polymerase extension would result in the extension of the 3' ends of the P1/N1, P2/N2, ..., Pn-1/Nn-1, Pn/Nn, Pn+1/Nn+1, ..., PT/NT oligonucleotide pairs.
  • a single molecule could be generated by ligating the extended oligonucleotide dimers. In one embodiment, a single molecule incorporating all of the oligonucleotide sequences may be generated by performing several polymerase extension cycles.
  • FIG. 1D illustrates two cycles of polymerase extension (separated by a denaturing step and an annealing step) and the resulting nucleic acid products. It should be appreciated that several cycles of polymerase extension may be required to assemble a single nucleic acid fragment containing all the sequences of an initial plurality of oligonucleotides. In one embodiment, a minimal number of extension cycles for assembling a nucleic acid may be calculated as log₂ n, where n is the number of oligonucleotides being assembled. In some embodiments, progressive assembly of the nucleic acid may be achieved without using temperature cycles.
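The base-2 logarithm estimate of the minimal cycle count can be computed directly. Rounding up to a whole number of cycles when n is not a power of two is an assumption of this sketch, as is the function name; the estimate is a lower bound for an idealized reaction in which every product is extended in every cycle.

```python
def min_extension_cycles(n_oligos: int) -> int:
    """Smallest whole number of cycles c with 2**c >= n_oligos,
    i.e. the base-2 logarithm of n_oligos rounded up."""
    if n_oligos < 1:
        raise ValueError("n_oligos must be at least 1")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using exact
    # integer arithmetic rather than floating point.
    return (n_oligos - 1).bit_length()

cycle_counts = {n: min_extension_cycles(n) for n in (2, 4, 8, 20, 64)}
```

For example, 8 oligonucleotides need at least 3 cycles, while 20 oligonucleotides need at least 5 under this rounding.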
  • an enzyme capable of rolling circle amplification may be used (e.g., phi 29 polymerase) when a circularized nucleic acid (e.g., oligonucleotide) complex is used as a template to produce a large amount of circular product for subsequent processing using MutS or a MutS homolog as described herein.
  • annealed oligonucleotide pairs P n /N n and P n+ i/N ⁇ + i are extended to form oligonucleotide dimer products incorporating the sequences covered by the respective oligonucleotide pairs.
  • N n is extended to incorporate sequences that are complementary to the B and 5' regions of N n (indicated as N ' n in FIG. ID).
  • Nn+1 is extended to incorporate sequences that are complementary to the 5' and B regions of Pn+1 (indicated as P'n+1 in FIG. 1D).
  • These dimer products may be denatured and reannealed to form the starting material of step 2 where the 3' end of the extended Pn oligonucleotide is annealed to the 3' end of the extended Nn+1 oligonucleotide.
  • This product may be extended in a polymerase-mediated reaction to form a product that incorporates the sequences of the four oligonucleotides (Pn, Nn, Pn+1, Nn+1).
  • One strand of this extended product has a sequence that includes (in 5' to 3' order) the 5', B, and 3' regions of Pn, the complement of the B region of Nn, the 5', B, and 3' regions of Pn+1, and the complements of the B and 5' regions of Nn+1.
  • the other strand of this extended product has the complementary sequence.
  • reaction products shown in FIG. 1D are a subset of the reaction products that would be obtained using all of the oligonucleotides of Group P and Group N.
  • a first polymerase extension reaction using all of the oligonucleotides would result in a plurality of overlapping oligonucleotide dimers from P1/N1 to PT/NT.
  • Each of these may be denatured and at least one of the strands could then anneal to an overlapping complementary strand from an adjacent (either 3' or 5') oligonucleotide dimer and be extended in a second cycle of polymerase extension as shown in FIG. 1D.
  • Subsequent cycles of denaturing, annealing, and extension produce progressively larger products including a nucleic acid fragment that includes the sequences of all of the initial oligonucleotides. It should be appreciated that these subsequent rounds of extension also produce many nucleic acid products of intermediate length.
  • the reaction product may be complex since not all of the 3' regions may be extended in each cycle.
  • unextended oligonucleotides may be available in each cycle to anneal to other unextended oligonucleotides or to previously extended oligonucleotides.
  • extended products of different sizes may anneal to each other in each cycle.
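  • The minimal cycle count of log2 n described for this pairwise extension scheme can be illustrated numerically. The following is a minimal sketch, assuming that log2 n is rounded up to the next whole cycle when n is not a power of two; the rounding rule and the function name `min_extension_cycles` are illustrative assumptions and are not part of the description.

```python
def min_extension_cycles(n_oligos: int) -> int:
    """Smallest whole number of cycles c such that 2**c >= n_oligos.

    Assumption: each ideal cycle doubles the number of oligonucleotide
    sequences joined in the longest product, so c = ceil(log2(n)).
    """
    if n_oligos < 1:
        raise ValueError("need at least one oligonucleotide")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1 and avoids
    # floating-point rounding near exact powers of two.
    return (n_oligos - 1).bit_length()


for n in (2, 4, 8, 20, 64):
    print(n, "oligonucleotides ->", min_extension_cycles(n), "cycles")
```

  • Because not every 3' region may be extended in each cycle, a practical reaction may need more cycles than this lower bound.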
  • FIG. 2 shows an embodiment of a plurality of oligonucleotides that may be assembled in a directional polymerase-based multiplex oligonucleotide assembly reaction. In this embodiment, only the 5'-most oligonucleotide of Group P may be provided. In contrast to the example shown in FIG. 1, the remainder of the sequence of the predetermined nucleic acid fragment is provided by oligonucleotides of Group N.
  • the 3'-most oligonucleotide of Group N (N1) has a 3' region that is complementary to the 3' region of P1 as shown in FIG. 2B. However, the remainder of the oligonucleotides in Group N have overlapping (but non-complementary) 3' and 5' regions as illustrated in FIG. 2B for oligonucleotides N1-N3.
  • Each Group N oligonucleotide (e.g., Nn) overlaps with two adjacent oligonucleotides: one overlaps with the 3' region (Nn-1) and one with the 5' region (Nn+1), except for N1 that overlaps with the 3' regions of P1 (complementary overlap) and N2 (non-complementary overlap), and NT that overlaps only with NT-1. It should be appreciated that all of the overlaps shown in FIG.
  • each oligonucleotide may have 3', B, and 5' regions of different lengths (including no B region in some embodiments). In some embodiments, none of the oligonucleotides may have B regions, meaning that the entire sequence of each oligonucleotide may overlap with the combined 5' and 3' region sequences of its two adjacent oligonucleotides.
  • Assembly of a predetermined nucleic acid fragment from the plurality of oligonucleotides shown in FIG. 2A may involve multiple cycles of polymerase-mediated extension. Each extension cycle may be separated by a denaturing and an annealing step.
  • FIG. 2C illustrates the first two steps in this assembly process.
  • In step 1, annealed oligonucleotides P1 and N1 are extended to form an oligonucleotide dimer.
  • P1 is shown with a 5' region that is non-complementary to the 3' region of N1 and extends beyond the 3' region of N1 when the oligonucleotides are annealed.
  • P1 may lack the 5' non-complementary region and include only sequences that overlap with the 3' region of N1.
  • the product of P1 extension is shown after step 1 containing an extended region that is complementary to the 5' end of N1.
  • the single strand illustrated in FIG. 2C may be obtained by denaturing the oligonucleotide dimer that results from the extension of P1/N1 in step 1.
  • the product of P1 extension is shown annealed to the 3' region of N2. This annealed complex may be extended in step 2 to generate an extended product that now includes sequences complementary to the B and 5' regions of N2.
  • cycles of extension may be obtained by denaturing the oligonucleotide dimer that results from the extension reaction of step 2. Additional cycles of extension may be performed to further assemble a predetermined nucleic acid fragment. In each cycle, extension results in the addition of sequences complementary to the B and 5' regions of the next Group N oligonucleotide. Each cycle may include a denaturing and annealing step. However, the extension may occur under the annealing conditions. Accordingly, in one embodiment, cycles of extension may be obtained by alternating between denaturing conditions (e.g., a denaturing temperature) and annealing/extension conditions (e.g., an annealing/extension temperature).
  • T (the number of group N oligonucleotides) may determine the minimal number of temperature cycles used to assemble the oligonucleotides.
  • progressive extension may be achieved without temperature cycling.
  • an enzyme capable of promoting rolling circle amplification may be used (e.g., TempliPhi).
  • a reaction mixture containing an assembled predetermined nucleic acid fragment also may contain a distribution of shorter extension products that may result from incomplete extension during one or more of the cycles or may be the result of a P1/N1 extension that was initiated after the first cycle.
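  • The directional extension described for FIG. 2C can be sketched as a string simulation. This is an idealized illustration under stated assumptions: annealing is modeled as an exact match between the 3' end of the growing strand and the complement of the 3' region of the next Group N oligonucleotide, every cycle goes to completion, and the target sequence, oligonucleotide lengths, and function names (`design_oligos`, `assemble`) are hypothetical.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def design_oligos(target: str, p1_len: int, n_len: int, overlap: int):
    """Hypothetical tiling: P1 is the 5' end of the positive strand; each
    Group N oligonucleotide is the reverse complement of a window that
    starts `overlap` bases before the end of the strand built so far."""
    if not 0 < overlap <= p1_len or n_len <= overlap:
        raise ValueError("need 0 < overlap <= p1_len and n_len > overlap")
    p1 = target[:p1_len]
    group_n = []
    end = p1_len
    while end < len(target):
        start = end - overlap
        window = target[start:start + n_len]
        group_n.append(revcomp(window))  # N1 first, then N2, and so on
        end = start + len(window)
    return p1, group_n


def assemble(p1: str, group_n: list, overlap: int):
    """Run one idealized extension cycle per Group N oligonucleotide."""
    strand, cycles = p1, 0
    for n_oligo in group_n:
        template = revcomp(n_oligo)  # positive-strand sequence copied from this oligo
        if strand[-overlap:] != template[:overlap]:
            raise ValueError("3' end of the growing strand does not anneal")
        strand += template[overlap:]  # complement of the B and 5' regions
        cycles += 1
    return strand, cycles


target = "ATGGCTAGCTAGGATCCGATCGTTAGCCATGCAAGCTTGA"  # made-up 40-nt target
p1, group_n = design_oligos(target, p1_len=12, n_len=14, overlap=6)
product, cycles = assemble(p1, group_n, overlap=6)
print(product == target, cycles, len(group_n))
```

  • In this idealized model the number of cycles equals the number of Group N oligonucleotides, which matches the statement that T may determine the minimal number of temperature cycles.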
  • FIG. 2D illustrates an example of a sequential extension reaction where the 5'-most P1 oligonucleotide is bound to a support and the Group N oligonucleotides are unbound.
  • the reaction steps are similar to those described for FIG. 2C.
  • an extended predetermined nucleic acid fragment will be bound to the support via the 5'-most P1 oligonucleotide.
  • the complementary strand (the negative strand) may readily be obtained by denaturing the bound fragment and releasing the negative strand.
  • the attachment to the support may be labile or readily reversed (e.g., using light, a chemical reagent, a pH change, etc.) and the positive strand also may be released.
  • FIG. 2E illustrates an example of a sequential reaction where P1 is unbound and the Group N oligonucleotides are bound to a support. The reaction steps are similar to those described for FIG. 2C. However, an extended predetermined nucleic acid fragment will be bound to the support via the 5'-most NT oligonucleotide. Accordingly, the complementary strand (the positive strand) may readily be obtained by denaturing the bound fragment and releasing the positive strand.
  • the attachment to the support may be labile or readily reversed (e.g., using light, a chemical reagent, a pH change, etc.) and the negative strand also may be released. Accordingly, either the positive strand, the negative strand, or the double-stranded product may be obtained.
  • oligonucleotides may be used to assemble a nucleic acid via two or more cycles of polymerase-based extension. In many configurations, at least one pair of oligonucleotides has complementary 3' end regions.
  • FIG. 2F illustrates an example where an oligonucleotide pair with complementary 3' end regions is flanked on either side by a series of oligonucleotides with overlapping non-complementary sequences.
  • the oligonucleotides illustrated to the right of the complementary pair have overlapping 3' and 5' regions (with the 3' region of one oligonucleotide being identical to the 5' region of the adjacent oligonucleotide) that correspond to a sequence of one strand of the target nucleic acid to be assembled.
  • the oligonucleotides illustrated to the left of the complementary pair have overlapping 3' and 5' regions (with the 3' region of one oligonucleotide being identical to the 5' region of the adjacent oligonucleotide) that correspond to a sequence of the complementary strand of the target nucleic acid.
  • These oligonucleotides may be assembled via sequential polymerase-based extension reactions as described herein (see also, for example, Xiong et al., 2004, Nucleic Acids Research, Vol. 32, No. 12, e98, 10 pages, the disclosure of which is incorporated by reference herein). It should be appreciated that different numbers and/or lengths of oligonucleotides may be used on either side of the complementary pair.
  • the illustration of the complementary pair as the central pair in FIG. 2F is not intended to be limiting as other configurations of a complementary oligonucleotide pair flanked by a different number of non-complementary pairs on either side may be used according to methods of the invention.
  • FIG. 3 shows an embodiment of a plurality of oligonucleotides that may be assembled in a ligase reaction.
  • FIG. 3A illustrates the alignment of the oligonucleotides showing that they do not contain gaps (i.e., no B region as described herein). Accordingly, the oligonucleotides may anneal to form a complex with no nucleotide gaps between the 3' and 5' ends of the annealed oligonucleotides in either Group P or Group N. These oligonucleotides provide a suitable template for assembly using a ligase under appropriate reaction conditions.
  • FIG. 3B shows two individual ligation reactions. These reactions are illustrated in two steps. However, it should be appreciated that these ligation reactions may occur simultaneously or sequentially in any order and may occur as such in a reaction maintained under constant reaction conditions (e.g., with no temperature cycling) or in a reaction exposed to several temperature cycles. For example, the reaction illustrated in step 2 may occur before the reaction illustrated in step 1. In each ligation reaction illustrated in FIG.
  • a Group N oligonucleotide is annealed to two adjacent Group P oligonucleotides (due to the complementary 5' and 3' regions between the P and N oligonucleotides), providing a template for ligation of the adjacent P oligonucleotides.
  • ligation of the N group oligonucleotides also may proceed in similar manner to assemble adjacent N oligonucleotides that are annealed to their complementary P oligonucleotide. Assembly of the predetermined nucleic acid fragment may be obtained through ligation of all of the oligonucleotides to generate a double stranded product.
  • a single stranded product of either the positive or negative strand may be obtained.
  • a plurality of oligonucleotides may be designed to generate only single-stranded reaction products in a ligation reaction.
  • a first group of oligonucleotides (of either Group P or Group N) may be provided to cover the entire sequence on one strand of the predetermined nucleic acid fragment (on either the positive or negative strand).
  • a second group of oligonucleotides may be designed to be long enough to anneal to complementary regions in the first group but not long enough to provide adjacent 5' and 3' ends between oligonucleotides in the second group.
  • This provides substrates that are suitable for ligation of oligonucleotides from the first group but not the second group.
  • the result is a single-stranded product having a sequence corresponding to the oligonucleotides in the first group.
  • a ligase reaction mixture that contains an assembled predetermined nucleic acid fragment also may contain a distribution of smaller fragments resulting from the assembly of a subset of the oligonucleotides.
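  • The single-stranded ligation design described above (a first group that covers one strand without gaps, and a second group that bridges each junction without providing adjacent ends of its own) can be sketched as follows. This is a minimal illustration; the target sequence, the equal-length tiling, the `arm` parameter, and the function names are assumptions made for the example.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def tile(target: str, length: int) -> list:
    """First group: consecutive oligonucleotides covering one strand, no gaps."""
    return [target[i:i + length] for i in range(0, len(target), length)]


def bridging_oligos(first_group: list, arm: int) -> list:
    """Second group: one oligonucleotide per junction, complementary to the
    last `arm` bases of the upstream oligo and the first `arm` bases of the
    downstream oligo. Requiring 2 * arm < len(oligo) leaves a gap between
    neighboring bridging oligos, so they cannot be ligated to each other.
    (The check is stricter than needed for the two terminal oligos.)"""
    for oligo in first_group:
        if 2 * arm >= len(oligo):
            raise ValueError("arms this long would place second-group ends adjacent")
    bridges = []
    for upstream, downstream in zip(first_group, first_group[1:]):
        junction = upstream[-arm:] + downstream[:arm]
        bridges.append(revcomp(junction))
    return bridges


target = "ATGGCTAGCTAGGATCCGATCGTTAGCCATGCAAGCTTGA"  # made-up 40-nt target
first_group = tile(target, 10)
second_group = bridging_oligos(first_group, arm=4)

# Ligating the first group in order reproduces the target strand.
print("".join(first_group) == target, len(first_group), len(second_group))
```

  • Only the first group presents adjacent 5' and 3' ends on a template, so only that strand is joined into a single-stranded product.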
  • FIG. 4 shows an embodiment of a ligase-based assembly where one or more of the plurality of oligonucleotides is bound to a support.
  • the 5' most oligonucleotide of the P group oligonucleotides is bound to a support.
  • Ligation of adjacent oligonucleotides in the 5' to 3' direction results in the assembly of a predetermined nucleic acid fragment.
  • FIG. 4A illustrates an example where adjacent oligonucleotides P2 and P3 are added sequentially. However, the ligation of any two adjacent oligonucleotides from Group P may occur independently and in any order in a ligation reaction mixture.
  • when P1 is ligated to the 5' end of N2, N2 may be in the form of a single oligonucleotide or it already may be ligated to one or more downstream oligonucleotides (N3, N4, etc.). It should be appreciated that for a ligation assembly bound to a support, either the 5'-most (e.g., P1 for Group P, or NT for Group N) or the 3'-most (e.g., PT for Group P, or N1 for Group N) oligonucleotide may be bound to a support since the reaction can proceed in any direction.
  • a predetermined nucleic acid fragment may be assembled with a central oligonucleotide (i.e., neither the 5'-most nor the 3'-most) that is bound to a support provided that the attachment to the support does not interfere with ligation.
  • FIG. 4B illustrates an example where a plurality of N group oligonucleotides are bound to a support and a predetermined nucleic acid fragment is assembled from P group oligonucleotides that anneal to their complementary support-bound N group oligonucleotides.
  • FIG. 4B illustrates a sequential addition.
  • adjacent P group oligonucleotides may be ligated in any order.
  • the bound oligonucleotides may be attached at their 5' end, 3' end, or at any other position provided that the attachment does not interfere with their ability to bind to complementary 5' and 3' regions on the oligonucleotides that are being assembled.
  • This reaction may involve one or more reaction condition changes (e.g., temperature cycles) so that ligated oligonucleotides bound to one immobilized N group oligonucleotide can be dissociated from the support and bind to a different immobilized N group oligonucleotide to provide a substrate for ligation to another P group oligonucleotide.
  • support-bound ligase reactions that generate a full length predetermined nucleic acid fragment also may generate a distribution of smaller fragments resulting from the assembly of subsets of the oligonucleotides.
  • a support used in any of the assembly reactions described herein may include any suitable support medium.
  • a support may be solid, porous, a matrix, a gel, beads, beads in a gel, etc.
  • a support may be of any suitable size.
  • a solid support may be provided in any suitable configuration or shape (e.g., a chip, a bead, a gel, a microfluidic channel, a planar surface, a spherical shape, a column, etc.).
  • oligonucleotide assembly reactions may be used to assemble a plurality of overlapping oligonucleotides (with overlaps that are either 5'/5', 3'/3', 5'/3', complementary, non-complementary, or a combination thereof).
  • Many of these reactions include at least one pair of oligonucleotides (the pair including one oligonucleotide from a first group or P group of oligonucleotides and one oligonucleotide from a second group or N group of oligonucleotides) that have overlapping complementary 3' regions.
  • a predetermined nucleic acid may be assembled from non-overlapping oligonucleotides using blunt-ended ligation reactions.
  • the order of assembly of the non-overlapping oligonucleotides may be biased by selective phosphorylation of different 5' ends.
  • size purification may be used to select for the correct order of assembly.
  • the correct order of assembly may be promoted by sequentially adding appropriate oligonucleotide substrates into the reaction (e.g., the ligation reaction).
  • a purification step may be used to remove starting oligonucleotides and/or incompletely assembled fragments.
  • a purification step may involve chromatography, electrophoresis, or other physical size separation technique.
  • a purification step may involve amplifying the full length product. For example, a pair of amplification primers (e.g., PCR primers) that correspond to the predetermined 5' and 3' ends of the nucleic acid fragment being assembled will preferentially amplify full length product in an exponential fashion.
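  • The selectivity of end-specific amplification primers for full-length product can be illustrated with a short sketch. It assumes exact-match priming and uses a made-up fragment sequence and primer length; `end_primers` and `amplifies_exponentially` are hypothetical helper names.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def end_primers(fragment: str, primer_len: int):
    """Forward primer matches the 5' end of the fragment; reverse primer is
    complementary to its 3' end (both written 5' to 3')."""
    forward = fragment[:primer_len]
    reverse = revcomp(fragment[-primer_len:])
    return forward, reverse


def amplifies_exponentially(product: str, forward: str, reverse: str) -> bool:
    """Exponential amplification needs both primer sites on the product.
    Simplification: a site counts only if it matches the primer exactly."""
    return forward in product and revcomp(reverse) in product


full_length = "ATGGCTAGCTAGGATCCGATCGTTAGCCATGCAAGCTTGA"  # made-up fragment
forward, reverse = end_primers(full_length, primer_len=8)

truncated = full_length[:30]  # incompletely assembled product missing its 3' end
print(amplifies_exponentially(full_length, forward, reverse))
print(amplifies_exponentially(truncated, forward, reverse))
```

  • A product carrying only one primer site can at most be copied linearly, which is why the full-length fragment comes to dominate after several cycles.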
  • the sequence of the predetermined fragment will be provided by the oligonucleotides as described herein.
  • the oligonucleotides may contain additional sequence information that may be removed during assembly or may be provided to assist in subsequent manipulations of the assembled nucleic acid fragment. Examples of additional sequences include, but are not limited to, primer recognition sequences for amplification (e.g., PCR primer recognition sequences), restriction enzyme recognition sequences, recombination sequences, other binding or recognition sequences, labeled sequences, etc.
  • one or more of the 5'-most oligonucleotides, one or more of the 3'-most oligonucleotides, or any combination thereof may contain one or more additional sequences.
  • the additional sequence information may be contained in two or more adjacent oligonucleotides on either strand of the predetermined nucleic acid sequence.
  • an assembled nucleic acid fragment may contain additional sequences that may be used to connect the assembled fragment to one or more additional nucleic acid fragments (e.g., one or more other assembled fragments, fragments obtained from other sources, vectors, etc.) via ligation, recombination, polymerase-mediated assembly, etc.
  • purification may involve cloning one or more assembled nucleic acid fragments.
  • the cloned product may be screened (e.g., sequenced, analyzed for an insert of the expected size, etc.).
  • a nucleic acid fragment assembled from a plurality of oligonucleotides may be combined with one or more additional nucleic acid fragments using a polymerase-based and/or a ligase-based extension reaction similar to those described herein for oligonucleotide assembly. Accordingly, one or more overlapping nucleic acid fragments may be combined and assembled to produce a larger nucleic acid fragment as described herein. In certain embodiments, double-stranded overlapping oligonucleotide fragments may be combined. However, single-stranded fragments, or combinations of single-stranded and double-stranded fragments may be combined as described herein.
  • a nucleic acid fragment assembled from a plurality of oligonucleotides may be of any length depending on the number and length of the oligonucleotides used in the assembly reaction.
  • a nucleic acid fragment (either single-stranded or double-stranded) assembled from a plurality of oligonucleotides may be between 50 and 1,000 nucleotides long (for example, about 70 nucleotides long, between 100 and 500 nucleotides long, between 200 and 400 nucleotides long, about 200 nucleotides long, about 300 nucleotides long, about 400 nucleotides long, etc.).
  • One or more such nucleic acid fragments (e.g., with overlapping 3' and/or 5' ends) may be assembled to form a larger nucleic acid fragment (single-stranded or double-stranded) as described herein.
  • a full length product assembled from smaller nucleic acid fragments also may be isolated or purified as described herein (e.g., using a size selection, cloning, selective binding or other suitable purification procedure).
  • any assembled nucleic acid fragment (e.g., full-length nucleic acid fragment) described herein may be amplified (prior to, as part of, or after, a purification procedure) using appropriate 5' and 3' amplification primers.
  • P Group and N Group oligonucleotides are used herein for clarity purposes only, and to illustrate several embodiments of multiplex oligonucleotide assembly.
  • the Group P and Group N oligonucleotides described herein are interchangeable, and may be referred to as first and second groups of oligonucleotides corresponding to sequences on complementary strands of a target nucleic acid fragment.
  • Oligonucleotides may be synthesized using any suitable technique.
  • oligonucleotides may be synthesized on a column or other support (e.g., a chip).
  • chip-based synthesis techniques include techniques used in synthesis devices or methods available from Combimatrix, Agilent, Affymetrix, or other sources.
  • a synthetic oligonucleotide may be of any suitable size, for example between 10 and 1,000 nucleotides long (e.g., between 10 and 200, 200 and 500, 500 and 1,000 nucleotides long, or any combination thereof).
  • An assembly reaction may include a plurality of oligonucleotides, each of which independently may be between 10 and 200 nucleotides in length (e.g., between 20 and 150, between 30 and 100, 30 to 90, 30-80, 30-70, 30-60, 35-55, 40-50, or any intermediate number of nucleotides). However, one or more shorter or longer oligonucleotides may be used in certain embodiments. Oligonucleotides may be provided as single stranded synthetic products. However, in some embodiments, oligonucleotides may be provided as double-stranded preparations including an annealed complementary strand. Oligonucleotides may be molecules of DNA, RNA, PNA, or any combination thereof.
  • a double-stranded oligonucleotide may be produced by amplifying a single-stranded synthetic oligonucleotide or other suitable template (e.g., a sequence in a nucleic acid preparation such as a nucleic acid vector or genomic nucleic acid). Accordingly, a plurality of oligonucleotides designed to have the sequence features described herein may be provided as a plurality of single-stranded oligonucleotides having those features, or also may be provided along with complementary oligonucleotides.
  • an oligonucleotide may be phosphorylated (e.g., with a 5' phosphate). In some embodiments, an oligonucleotide may be non-phosphorylated.
  • an oligonucleotide may be amplified using an appropriate primer pair with one primer corresponding to each end of the oligonucleotide (e.g., one that is complementary to the 3' end of the oligonucleotide and one that is identical to the 5' end of the oligonucleotide).
  • an oligonucleotide may be designed to contain a central assembly sequence (designed to be incorporated into the target nucleic acid) flanked by a 5' amplification sequence (e.g., a 5' universal sequence) and a 3' amplification sequence (e.g., a 3' universal sequence).
  • Amplification primers corresponding to the flanking amplification sequences may be used to amplify the oligonucleotide (e.g., one primer may be complementary to the 3' amplification sequence and one primer may have the same sequence as the 5' amplification sequence).
  • the amplification sequences then may be removed from the amplified oligonucleotide using any suitable technique to produce an oligonucleotide that contains only the assembly sequence.
  • a plurality of different oligonucleotides may have identical 5' amplification sequences and identical 3' amplification sequences. These oligonucleotides can all be amplified in the same reaction using the same amplification primers.
  • a preparation of an oligonucleotide designed to have a certain sequence may include oligonucleotide molecules having the designed sequence in addition to oligonucleotide molecules that contain errors (e.g., that differ from the designed sequence at least at one position).
  • a sequence error may include one or more nucleotide deletions, additions, substitutions (e.g., transversion or transition), inversions, duplications, or any combination of two or more thereof.
  • Oligonucleotide errors may be generated during oligonucleotide synthesis. Different synthetic techniques may be prone to different error profiles and frequencies. In some embodiments, error rates may vary from 1/10 to 1/200 errors per base depending on the synthesis protocol that is used.
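  • The practical effect of a per-base error rate can be estimated with a short calculation. The sketch below assumes that errors occur independently and with the same probability at every position, which is a simplification not stated in the description; under that assumption the expected fraction of error-free molecules of length L is (1 - e)^L for an error rate of e errors per base. The function name and the example lengths are illustrative.

```python
def error_free_fraction(errors_per_base: float, length: int) -> float:
    """Expected fraction of molecules with no error at any of `length`
    positions, assuming independent errors at a uniform per-base rate."""
    if not 0.0 <= errors_per_base <= 1.0:
        raise ValueError("error rate must be between 0 and 1")
    return (1.0 - errors_per_base) ** length


# Compare the two ends of the 1/10 to 1/200 range for a few oligo lengths.
for rate in (1 / 10, 1 / 200):
    for length in (30, 50, 100):
        fraction = error_free_fraction(rate, length)
        print(f"rate={rate:.4f} length={length} error-free={fraction:.4f}")
```

  • Under this model, even at 1/200 errors per base a substantial share of 50-nucleotide molecules carries at least one error, which motivates the error-removal processing described herein.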
  • one or more oligonucleotide preparations may be processed to remove (or reduce the frequency of) error-containing oligonucleotides.
  • a hybridization technique may be used wherein an oligonucleotide preparation is hybridized under stringent conditions one or more times to an immobilized oligonucleotide preparation designed to have a complementary sequence. Oligonucleotides that do not bind may be removed in order to selectively or specifically remove oligonucleotides that contain errors that would destabilize hybridization under the conditions used.
  • this processing may not remove all error-containing oligonucleotides since many have only one or two sequence errors and may still bind to the immobilized oligonucleotides with sufficient affinity for a fraction of them to remain bound through this selection processing procedure.
  • a nucleic acid binding protein or recombinase may be included in one or more of the oligonucleotide processing steps to improve the selection of error free oligonucleotides. For example, by preferentially promoting the hybridization of oligonucleotides that are completely complementary with the immobilized oligonucleotides, the amount of error containing oligonucleotides that are bound may be reduced.
  • this oligonucleotide processing procedure may remove more error-containing oligonucleotides and generate an oligonucleotide preparation that has a lower error frequency (e.g., with an error rate of less than 1/50, less than 1/100, less than 1/200, less than 1/300, less than 1/400, less than 1/500, less than 1/1,000, or less than 1/2,000 errors per base).
  • a plurality of oligonucleotides used in an assembly reaction may contain preparations of synthetic oligonucleotides, single-stranded oligonucleotides, double-stranded oligonucleotides, amplification products, oligonucleotides that are processed to remove (or reduce the frequency of) error-containing variants, etc., or any combination of two or more thereof.
  • synthetic oligonucleotides synthesized on an array are not amplified prior to assembly.
  • a polymerase-based or ligase-based assembly using non-amplified oligonucleotides may be performed in a microfluidic device.
  • a synthetic oligonucleotide may be amplified prior to use. Either strand of a double-stranded amplification product may be used as an assembly oligonucleotide and added to an assembly reaction as described herein.
  • a synthetic oligonucleotide may be amplified using a pair of amplification primers (e.g., a first primer that hybridizes to the 3' region of the oligonucleotide and a second primer that hybridizes to the 3' region of the complement of the oligonucleotide).
  • the oligonucleotide may be synthesized on a support such as a chip (e.g., using an ink-jet-based synthesis technology).
  • the oligonucleotide may be amplified while it is still attached to the support.
  • the oligonucleotide may be removed or cleaved from the support prior to amplification.
  • the two strands of a double-stranded amplification product may be separated and isolated using any suitable technique.
  • the two strands may be differentially labeled (e.g., using one or more different molecular weight, affinity, fluorescent, electrostatic, magnetic, and/or other suitable tags).
  • the different labels may be used to purify and/or isolate one or both strands.
  • biotin may be used as a purification tag.
  • the strand that is to be used for assembly may be directly purified (e.g., using an affinity or other suitable tag).
  • the complementary strand is removed (e.g., using an affinity or other suitable tag) and the remaining strand is used for assembly.
  • a synthetic oligonucleotide may include a central assembly sequence flanked by 5' and 3' amplification sequences.
  • the central assembly sequence is designed for incorporation into an assembled nucleic acid.
  • the flanking sequences are designed for amplification and are not intended to be incorporated into the assembled nucleic acid.
  • the flanking amplification sequences may be used as universal primer sequences to amplify a plurality of different assembly oligonucleotides that share the same amplification sequences but have different central assembly sequences.
  • the flanking sequences are removed after amplification to produce an oligonucleotide that contains only the assembly sequence.
  • one of the two amplification primers may be biotinylated.
  • the nucleic acid strand that incorporates this biotinylated primer during amplification can be affinity purified using streptavidin (e.g., bound to a bead, column, or other surface).
  • the amplification primers also may be designed to include certain sequence features that can be used to remove the primer regions after amplification in order to produce a single-stranded assembly oligonucleotide that includes the assembly sequence without the flanking amplification sequences.
  • the non-biotinylated strand may be used for assembly.
  • the assembly oligonucleotide may be purified by removing the biotinylated complementary strand.
  • the amplification sequences may be removed if the non-biotinylated primer includes a dU at its 3' end, and if the amplification sequence recognized by (i.e., complementary to) the biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence.
  • the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3' nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the biotinylated primer is removed. The biotinylated strand is then removed.
  • the remaining non-biotinylated strand is then treated with uracil-DNA glycosylase (UDG) to remove the non-biotinylated primer sequence.
  • This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
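  • The flank-removal logic described above can be sketched as string operations on the non-biotinylated strand. This is an idealized model and not a description of enzyme kinetics: the T4 DNA polymerase step is modeled as trimming from the 3' end until the supplied (fourth) nucleotide is reached, the UDG step is modeled as dropping everything up to and including a single dU (written as U), and the fourth nucleotide is taken to sit exactly at the junction. All sequences and function names are hypothetical.

```python
def fourth_nucleotide(assembly: str, three_prime_amp: str) -> str:
    """Return the nucleotide to supply, after checking the design rule:
    the 3' amplification sequence lacks at least one nucleotide and the
    assembly sequence ends with a nucleotide that it lacks."""
    absent = set("ACGT") - set(three_prime_amp)
    junction_base = assembly[-1]
    if junction_base not in absent:
        raise ValueError("junction base also occurs in the amplification sequence")
    return junction_base


def t4_chew_back(strand: str, supplied_nt: str) -> str:
    """Trim 3' nucleotides until the 3'-terminal base is the supplied one."""
    end = len(strand)
    while end > 0 and strand[end - 1] != supplied_nt:
        end -= 1
    return strand[:end]


def udg_remove_primer(strand: str) -> str:
    """Drop the primer-derived 5' region up to and including its dU."""
    _, marker, remainder = strand.partition("U")
    if not marker:
        raise ValueError("no dU found in strand")
    return remainder


primer_with_du = "GACCAGCACAGU"      # hypothetical 5' flank ending in dU
assembly = "ATGGCTAGCCAT"            # hypothetical assembly sequence ending in T
three_prime_amp = "GGACGCAGCAGG"     # hypothetical 3' flank containing no T

strand = primer_with_du + assembly + three_prime_amp
supplied = fourth_nucleotide(assembly, three_prime_amp)
trimmed = t4_chew_back(strand, supplied)
processed = udg_remove_primer(trimmed)
print(supplied, processed == assembly)
```

  • Because the design check depends only on the shared flanking sequences and the junction base, the same steps apply to a pool of different assembly oligonucleotides that share those features.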
  • the biotinylated strand may be used for assembly.
  • the assembly oligonucleotide may be obtained directly by isolating the biotinylated strand.
  • the amplification sequences may be removed if the biotinylated primer includes a dU at its 3' end, and if the amplification sequence recognized by (i.e., complementary to) the non-biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence.
  • the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the non-biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3' nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the non-biotinylated primer is removed. The biotinylated strand is then isolated (and the non-biotinylated strand is removed).
  • the isolated biotinylated strand is then treated with UDG to remove the biotinylated primer sequence.
  • This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
  • biotinylated primer may be designed to anneal to either the synthetic oligonucleotide or to its complement for the amplification and purification reactions described above.
  • non-biotinylated primer may be designed to anneal to either strand provided it anneals to the strand that is complementary to the strand recognized by the biotinylated primer.
  • an oligonucleotide may be modified by incorporating a modified-base (e.g., a nucleotide analog) during synthesis, by modifying the oligonucleotide after synthesis, or any combination thereof.
  • modifications include, but are not limited to, one or more of the following: universal bases such as nitroindoles, dP and dK, inosine, uracil; halogenated bases such as BrdU; fluorescent labeled bases; non-radioactive labels such as biotin (as a derivative of dT) and digoxigenin (DIG); 2,4-dinitrophenyl (DNP); radioactive nucleotides; post-coupling modifications such as dR-NH2 (deoxyribose-NH2); acridine (6-chloro-2-methoxyacridine); and spacer phosphoramidites, which are used during synthesis to add a spacer 'arm' into the sequence, such as C3, C8 (octanediol), C9, C12, HEG (hexaethylene glycol) and C18.
  • nucleic acid binding proteins or recombinases are preferably not included in a post-assembly fidelity optimization technique (e.g., a screening technique using a MutS or MutS homolog), because the optimization procedure involves removing error-containing nucleic acids via the production and removal of heteroduplexes. Accordingly, any nucleic acid binding proteins or recombinases (e.g., RecA) that were included in the assembly steps are preferably removed (e.g., by inactivation, column purification or other suitable technique) after assembly and prior to fidelity optimization.
  • the invention provides methods for assembling synthetic nucleic acids with increased efficiency.
  • the resulting assembled nucleic acids may be amplified in vitro (e.g., using PCR, LCR, or any suitable amplification technique), amplified in vivo (e.g., via cloning into a suitable vector), isolated and/or purified.
  • An assembled nucleic acid (alone or cloned into a vector) may be transformed into a host cell (e.g., a prokaryotic, eukaryotic, insect, mammalian, or other host cell).
  • the host cell may be used to propagate the nucleic acid.
  • the nucleic acid may be integrated into the genome of the host cell.
  • the nucleic acid may replace a corresponding nucleic acid region on the genome of the cell (e.g., via homologous recombination). Accordingly, nucleic acids may be used to produce recombinant organisms.
  • a target nucleic acid may be an entire genome or large fragments of a genome that are used to replace all or part of the genome of a host organism. Recombinant organisms also may be used for a variety of research, industrial, agricultural, and/or medical applications.
  • concerted assembly may be used to assemble oligonucleotide duplexes and nucleic acid fragments of less than 100 to more than 10,000 base pairs in length (e.g., 100 mers to 500 mers, 500 mers to 1,000 mers, 1,000 mers to 5,000 mers, 5,000 mers to 10,000 mers, 25,000 mers, 50,000 mers, 75,000 mers, 100,000 mers, etc.).
  • methods described herein may be used during the assembly of an entire genome (or a large fragment thereof, e.g., about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more) of an organism (e.g., of a viral, bacterial, yeast, or other prokaryotic or eukaryotic organism), optionally incorporating specific modifications into the sequence at one or more desired locations.
  • nucleic acid products e.g., including nucleic acids that are amplified, cloned, purified, isolated, etc.
  • any of the nucleic acid products may be packaged in any suitable format (e.g., in a stable buffer, lyophilized, etc.) for storage and/or shipping (e.g., for shipping to a distribution center or to a customer).
  • any of the host cells e.g., cells transformed with a vector or having a modified genome
  • cells may be prepared in a suitable buffer for storage and/or transport (e.g., for distribution to a customer).
  • cells may be frozen.
  • other stable cell preparations also may be used.
  • Host cells may be grown and expanded in culture.
  • Host cells may be used for expressing one or more RNAs or polypeptides of interest (e.g., therapeutic, industrial, agricultural, and/or medical proteins).
  • the expressed polypeptides may be natural polypeptides or non-natural polypeptides.
  • the polypeptides may be isolated or purified for subsequent use. Accordingly, nucleic acid molecules generated using methods of the invention can be incorporated into a vector.
  • the vector may be a cloning vector or an expression vector.
  • a vector may comprise an origin of replication and one or more selectable markers (e.g., antibiotic resistant markers, auxotrophic markers, etc.).
  • the vector may be a viral vector.
  • a viral vector may comprise nucleic acid sequences capable of infecting target cells.
  • a prokaryotic expression vector operably linked to an appropriate promoter system can be used to transform target cells.
  • a eukaryotic vector operably linked to an appropriate promoter system can be used to transfect target cells or tissues.
  • RNAs or polypeptides may be isolated or purified.
  • Nucleic acids of the invention also may be used to add detection and/or purification tags to expressed polypeptides or fragments thereof.
  • polypeptide-based fusions/tags include, but are not limited to, hexa-histidine (His6), Myc and HA, and other polypeptides with utility, such as GFP, GST, MBP, chitin and the like.
  • polypeptides may comprise one or more unnatural amino acid residue(s).
  • antibodies can be made against polypeptides or fragment(s) thereof encoded by one or more synthetic nucleic acids.
  • synthetic nucleic acids may be provided as libraries for screening in research and development (e.g., to identify potential therapeutic proteins or peptides, to identify potential protein targets for drug development, etc.)
  • a synthetic nucleic acid may be used as a therapeutic (e.g., for gene therapy, or for gene regulation).
  • a synthetic nucleic acid may be administered to a patient in an amount sufficient to express a therapeutic amount of a protein.
  • a synthetic nucleic acid may be administered to a patient in an amount sufficient to regulate (e.g., down-regulate) the expression of a gene.
  • an assembly procedure may involve a combination of acts that are performed at one site (in the United States or outside the United States) and acts that
  • aspects of the invention may include automating one or more acts described herein.
  • a sequence analysis may be automated in order to generate a synthesis strategy automatically.
  • the synthesis strategy may include i) the design of the starting nucleic acids that are to be assembled into the target nucleic acid, ii) the choice of the assembly technique(s) to be used, iii) the number of rounds of assembly and error screening or sequencing steps to include, and/or iv) decisions relating to subsequent processing of an assembled target nucleic acid.
  • one or more steps of an assembly reaction may be automated using one or more automated sample handling devices (e.g., one or more automated liquid or fluid handling devices).
  • reaction reagents including one or more of the following: starting nucleic acids, buffers, enzymes (e.g., one or more ligases and/or polymerases), nucleotides, nucleic acid binding proteins or recombinases, salts, and any other suitable agents such as stabilizing agents.
  • reaction reagents may include one or more reagents or reaction conditions suitable for extension-based assembly, ligation-based assembly, or combinations thereof.
  • Automated devices and procedures also may be used to control the reaction conditions.
  • an automated thermal cycler may be used to control reaction temperatures and any temperature cycles that may be used.
  • a thermal cycler may be automated to provide one or more reaction temperatures or temperature cycles suitable for incubating nucleic acid fragments prior to transformation.
  • subsequent purification and analysis of assembled nucleic acid products may be automated.
  • fidelity optimization steps e.g., a MutS error screening procedure
  • Sequencing also may be automated using a sequencing device and automated sequencing protocols. Additional steps (e.g., amplification, cloning, etc.) also may be automated using one or more appropriate devices and related protocols.
  • assembly reaction mixtures e.g., liquid reaction samples
  • automated devices and procedures e.g., robotic manipulation and/or transfer of samples and/or sample containers, including automated pipetting devices, etc.
  • the system and any components thereof may be controlled by a control system.
  • acts of the invention may be automated using, for example, a computer system (e.g., a computer controlled system).
  • a computer system on which aspects of the invention can be implemented may include a computer for any type of processing (e.g., sequence analysis and/or automated device control as described herein).
  • processing steps may be provided by one or more of the automated devices that are part of the assembly system.
  • a computer system may include two or more computers.
  • one computer may be coupled, via a network, to a second computer.
  • One computer may perform sequence analysis.
  • the second computer may control one or more of the automated synthesis and assembly devices in the system.
  • additional computers may be included in the network to control one or more of the analysis or processing acts.
  • Each computer may include a memory and processor.
  • the computers can take any form, as the aspects of the present invention are not limited to being implemented on any particular computer platform.
  • the network can take any form, including a private network or a public network (e.g., the Internet).
  • Display devices can be associated with one or more of the devices and computers.
  • a display device may be located at a remote site and connected for displaying the output of an analysis in accordance with the invention. Connections between the different components of the system may be via wire, wireless transmission, satellite transmission, any other suitable transmission, or any combination of two or more of the above.
  • sequence information e.g., a target sequence, a processed analysis of the target sequence, etc.
  • a public network such as the Internet
  • a remote location to be processed by computer to produce any of the various types of outputs discussed herein (e.g., in connection with oligonucleotide design).
  • the aspects of the present invention described herein are not limited in that respect, and that numerous other configurations are possible.
  • all of the analysis and processing described herein can alternatively be implemented on a computer that is attached locally to a device, an assembly system, or one or more components of an assembly system.
  • a communication medium e.g., the network
  • the information can be loaded onto a computer readable medium that can then be physically transported to another computer for processing in the manners described herein.
  • a combination of two or more transmission/delivery techniques may be used.
  • computer implementable programs for performing a sequence analysis or controlling one or more of the devices, systems, or system components described herein also may be transmitted via a network or loaded onto a computer readable medium as described herein. Accordingly, aspects of the invention may involve performing one or more steps within the United States and additional steps outside the United States.
  • sequence information (e.g., a customer order) may be received at one location (e.g., in one country) and sent to a remote location for processing (e.g., in the same country or in a different country), for example, for sequence analysis to determine a synthesis strategy and/or design oligonucleotides.
  • a portion of the sequence analysis may be performed at one site (e.g., in one country) and another portion at another site (e.g., in the same country or in another country).
  • different steps in the sequence analysis may be performed at multiple sites (e.g., all in one country or in several different countries). The results of a sequence analysis then may be sent to a further site for synthesis.
  • different synthesis and quality control steps may be performed at more than one site (e.g., within one country or in two or more countries).
  • An assembled nucleic acid then may be shipped to a further site (e.g., either to a central shipping center or directly to a client).
  • each of the different aspects, embodiments, or acts of the present invention described herein can be independently automated and implemented in any of numerous ways.
  • each aspect, embodiment, or act can be independently implemented using hardware, software or a combination thereof.
  • the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
  • any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions.
  • the one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
  • one implementation of the embodiments of the present invention comprises at least one computer-readable medium (e.g., a computer memory, a floppy disk, a compact disk, a tape, etc.) encoded with a computer program (i.e., a plurality of instructions), which, when executed on a processor, performs one or more of the above-discussed functions of the present invention.
  • the computer-readable medium can be transportable such that the program stored thereon can be loaded onto any computer system resource to implement one or more functions of the present invention discussed herein.
  • the reference to a computer program which, when executed, performs the above-discussed functions is not limited to an application program running on a host computer.
  • computer program is used herein in a generic sense to reference any type of computer code (e.g., software or microcode) that can be employed to program a processor to implement the above-discussed aspects of the present invention. It should be appreciated that in accordance with several embodiments of the present invention wherein processes are implemented in a computer readable medium, the computer implemented processes may, during the course of their execution, receive input manually (e.g., from a user).
  • a system controller which may provide control signals to the associated nucleic acid synthesizers, liquid handling devices, thermal cyclers, sequencing devices, associated robotic components, as well as other suitable systems for performing the desired input/output or other control functions.
  • the system controller along with any device controllers together form a controller that controls the operation of a nucleic acid assembly system.
  • the controller may include a general purpose data processing system, which can be a general purpose computer, or network of general purpose computers, and other associated devices, including communications devices, modems, and/or other circuitry or components necessary to perform the desired input/output or other functions.
  • the controller can also be implemented, at least in part, as a single special purpose integrated circuit (e.g., ASIC) or an array of ASICs, each having a main or central processor section for overall, system-level control, and separate sections dedicated to performing various different specific computations, functions and other processes under the control of the central processor section.
  • the controller can also be implemented using a plurality of separate dedicated programmable integrated or other electronic circuits or devices, e.g., hard wired electronic or logic circuits such as discrete element circuits or programmable logic devices.
  • the controller can also include any other components or devices, such as user input/output devices (monitors, displays, printers, a keyboard, a user pointing device, touch screen, or other user interface, etc.), data storage devices, drive motors, linkages, valve controllers, robotic devices, vacuum and other pumps, pressure sensors, detectors, power supplies, pulse sources, communication devices or other electronic circuitry or components, and so on.
  • the controller also may control operation of other portions of a system, such as automated client order processing, quality control, packaging, shipping, billing, etc., to perform other suitable functions known in the art but not described in detail herein.
  • aspects of the invention may be useful to streamline nucleic acid assembly reactions. Accordingly, aspects of the invention relate to marketing methods, compositions, kits, devices, and systems for increasing nucleic acid assembly throughput involving combinations of one or more extension-based and/or ligation-based assembly techniques described herein. Aspects of the invention may be useful for reducing the time and/or cost of production, commercialization, and/or development of synthetic nucleic acids, and/or related compositions. Accordingly, aspects of the invention relate to business methods that involve collaboratively (e.g., with a partner) or independently marketing one or more methods, kits, compositions, devices, or systems for analyzing and/or assembling synthetic nucleic acids as described herein.
  • certain embodiments of the invention may involve marketing a procedure and/or associated devices or systems involving nucleic acid assembly techniques described herein.
  • synthetic nucleic acids, libraries of synthetic nucleic acids, host cells containing synthetic nucleic acids, expressed polypeptides or proteins, etc. also may be marketed.
  • Marketing may involve providing information and/or samples relating to methods, kits, compositions, devices, and/or systems described herein.
  • Potential customers or partners may be, for example, companies in the pharmaceutical, biotechnology and agricultural industries, as well as academic centers and government research organizations or institutes.
  • Business applications also may involve generating revenue through sales and/or licenses of methods, kits, compositions, devices, and/or systems of the invention.
  • Example 1 Nucleic acid fragment assembly.
  • step (1) a primerless assembly of oligonucleotides is performed and in step (2) an assembled nucleic acid fragment is amplified in a primer-based amplification.
  • a 993 base long promoter>EGFP construct was assembled from 50-mer abutting oligonucleotides using a 2-step PCR assembly.
  • oligonucleotide pools were prepared as follows: 36 overlapping 50-mer oligonucleotides and two 5' terminal 59-mers were separated into 4 pools, each corresponding to overlapping 200-300 nucleotide segments of the final construct. The total oligonucleotide concentration in each pool was 5 µM.
  • a primerless PCR extension reaction was used to stitch (assemble) overlapping oligonucleotides in each pool.
  • the PCR extension reaction mixture was as follows: oligonucleotide pool (5 µM total), 1.0 µl (~25 nM final each); dNTP (10 mM each), 0.5 µl (250 µM final each)
  • Pfu polymerase (2.5 U/µl), 0.5 µl; dH2O to 20 µl. Assembly was achieved by cycling this mixture through several rounds of denaturing, annealing, and extension reactions as follows: start, 2 min at 95°C
  • primerless PCR product, 1.0 µl; primer 5' (1.2 µM), 5 µl (300 nM final); primer 3' (1.2 µM), 5 µl (300 nM final); dNTP (10 mM each), 0.5 µl (250 µM final each)
  • the amplified sub-segments were assembled using another round of primerless PCR as follows.
  • a diluted amplification product was prepared for each sub-segment by diluting each amplified sub-segment PCR product 1:10 (4 µl mix + 36 µl dH2O). This diluted mix was used as follows: diluted sub-segment mix, 1.0 µl; dNTP (10 mM each), 0.5 µl (250 µM final each)
  • an assembly cycle using activation of one or more vector-encoded traits to isolate correctly assembled constructs will involve the following steps. 1 - DNA preparation;
  • This protocol selects for correctly-ligated insert DNA to propagate, removing any background or contaminating vector DNA. This enables an automated, 'hands-off' assembly scheme that eliminates colony grow-up during the assembly phase. This process may involve approximately log2 N assembly steps, where N is the total number of fragments to be combined (e.g., building a 50 kb final construct from 50 sequence-verified 1 kb fragments would involve six steps).
  • Example 4 Methods and Use of Assembly by Marker Activation.
  • PSA vectors constructed for use in the assembly process contain a functional ampicillin resistance marker and two non-functional resistance markers. These vectors are illustrated in FIG. 11. These non-functional resistance markers are either chloramphenicol and kanamycin (pCK) or tetracycline and spectinomycin (pTS) (see Figure 11).
  • pCK and pTS vectors have been constructed such that they contain either a high-copy number origin of replication, or a BAC-based single-copy number origin of replication.
  • the former versions enable DNA assembly up to ~10 kb (although we have successfully assembled up to 22 kb), while the latter BAC-based vectors enable construction up to ~300 kb.
  • Transition from one vector type to another is seamless, as both vector types have the same non-functional markers that are activated by the same activation tags (i.e., they differ only in the origin of replication).
  • Step I Polymerize blocking oligos with RecA
  • Step II Substrate addition, synapsis
  • methyl-donor e.g., S-adenosylmethionine (SAM)
  • Step IV Digestion (NOTE: all constructs that DO NOT require blocking start at this step.)
  • the digested sample was diluted 1:2 (add 30 µl dH2O).
  • Transform competent DH10B (or similar strain - must be sensitive to cam, kan, tc, and spn, and must be deficient in E. coli mcr, mrr restriction systems). Standard transformation protocols were followed (e.g., add 3 µl ligation reaction to 50 µl competent cells, heat shock 30 sec at 42°C, recover in 350 µl SOC 1 hr at 37°C, 250 rpm).
  • the resulting culture was diluted 1:50 in selective media and was grown overnight at 37°C, with shaking at ~300 rpm. Subsequently, ~200 µl of the culture was plated on selective plates and grown overnight at 37°C.
  • appropriate antibiotics were added as follows: for pTS vector, 5 µg/ml Tc and 100 µg/ml Spn were added; for pCK vector, 12.5 µg/ml Cam and 25 µg/ml Kan were added.
  • the left activation sequence contains BsmBI and BtgZI sites (corresponding to sites 1 and 2 described herein, respectively) within a modified promoter region.
  • the right activation sequence contains BsmBI and BtgZI sites (corresponding to sites 3 and 4 described herein, respectively) within a modified promoter region.
  • Sequences represented by "Ui " represent the insert specific sequences.
  • the left TTCA and the right ACTC overhangs are compatible with overhangs on the vector that is being used.
  • a single blocking oligonucleotide e.g., ctp-60 or ctd-60
  • a single oligonucleotide e.g., ktd-60 or ktp-60
  • the present invention provides among other things methods for assembling large polynucleotide constructs and organisms having increased genomic stability. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.
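
As a rough illustration of the step count quoted in the assembly-cycle protocol above (approximately log2 N assembly steps for N fragments), the following Python sketch counts rounds under the assumption that each round joins neighbouring fragments strictly in pairs. The function names are hypothetical and are not part of the described methods. The specification also contemplates cycles that combine more than two inserts, which would need fewer rounds.

```python
def assembly_rounds(n_fragments: int) -> int:
    """Rounds of pairwise joining needed to combine n fragments (ceil(log2 n))."""
    if n_fragments < 1:
        raise ValueError("need at least one fragment")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without float rounding.
    return (n_fragments - 1).bit_length()


def pairwise_schedule(lengths_kb):
    """Return the list of fragment lengths present after each pairwise round.

    Assumption: neighbours are joined two at a time; an odd fragment
    is carried forward unchanged to the next round.
    """
    current = list(lengths_kb)
    rounds = []
    while len(current) > 1:
        current = [sum(current[i:i + 2]) for i in range(0, len(current), 2)]
        rounds.append(current)
    return rounds


if __name__ == "__main__":
    schedule = pairwise_schedule([1] * 50)  # fifty 1 kb fragments
    for number, fragments in enumerate(schedule, start=1):
        print(f"round {number}: {len(fragments)} fragment(s), longest {max(fragments)} kb")
```

For fifty 1 kb fragments the fragment count falls 50, 25, 13, 7, 4, 2, 1, which matches the six steps given in the example for a 50 kb construct.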

Abstract

Certain aspects of the present invention provide methods for assembling nucleic acid molecules using iterative activation of one or more vector-encoded traits to progressively assemble a longer nucleic acid insert. Aspects of the invention also provide kits, compositions, devices, and systems for assembling synthetic nucleic acids using iterative activation of one or more vector-encoded traits.

Description

ITERATIVE NUCLEIC ACID ASSEMBLY USING ACTIVATION OF VECTOR-ENCODED TRAITS
RELATED APPLICATION
This application claims the benefit under 35 U.S.C. § 119(e) of United States provisional patent application serial number 60/841,843, filed August 31, 2006, the contents of which are incorporated herein by reference in their entirety.
FIELD OF THE INVENTION
Aspects of the application relate to nucleic acid assembly methods.
BACKGROUND
Recombinant and synthetic nucleic acids have many applications in research, industry, agriculture, and medicine. Recombinant and synthetic nucleic acids can be used to express and obtain large amounts of polypeptides, including enzymes, antibodies, growth factors, receptors, and other polypeptides that may be used for a variety of medical, industrial, or agricultural purposes. Recombinant and synthetic nucleic acids also can be used to produce genetically modified organisms including modified bacteria, yeast, mammals, plants, and other organisms. Genetically modified organisms may be used in research (e.g., as animal models of disease, as tools for understanding biological processes, etc.), in industry (e.g., as host organisms for protein expression, as bioreactors for generating industrial products, as tools for environmental remediation, for isolating or modifying natural compounds with industrial applications, etc.), in agriculture (e.g., modified crops with increased yield or increased resistance to disease or environmental stress, etc.), and for other applications. Recombinant and synthetic nucleic acids also may be used as therapeutic compositions (e.g., for modifying gene expression, for gene therapy, etc.) or as diagnostic tools (e.g., as probes for disease conditions, etc.).
Numerous techniques have been developed for modifying existing nucleic acids (e.g., naturally occurring nucleic acids) to generate recombinant nucleic acids. For example, combinations of nucleic acid amplification, mutagenesis, nuclease digestion, ligation, cloning and other techniques may be used to produce many different recombinant nucleic acids. Chemically synthesized polynucleotides are often used as primers or adaptors for nucleic acid amplification, mutagenesis, and cloning. Techniques also are being developed for de novo nucleic acid assembly whereby nucleic acids are made (e.g., chemically synthesized) and assembled to produce longer target nucleic acids of interest. For example, different multiplex assembly techniques are being developed for assembling oligonucleotides into larger synthetic nucleic acids that can be used in research, industry, agriculture, and/or medicine.
SUMMARY OF THE INVENTION
Aspects of the invention relate to methods, compositions, and devices for assembling nucleic acids. The invention provides nucleic acid configurations and cloning strategies for progressively assembling a long nucleic acid product using a plurality of assembly cycles. Aspects of the invention can reduce the time required for nucleic acid assembly by providing an efficient iterative assembly procedure for generating long nucleic acid products. In certain embodiments, an assembly cycle involves assembling a vector and two or more nucleic acid inserts containing one or more regulatory sequences. According to the invention, the regulatory sequence(s) activate one or more vector-encoded traits when they are assembled in a predetermined configuration. This allows a correctly assembled nucleic acid to be isolated by selecting or screening for the activated trait(s). The isolated nucleic acid may contain an assembled insert that can be excised along with one or more of the regulatory sequences and combined with a further nucleic acid insert and an appropriate vector in a subsequent assembly cycle. In this subsequent cycle, a correctly assembled product can again be isolated using one or more traits that are encoded by the vector and activated by the regulatory sequence(s) present on correctly ligated insert(s). This procedure can be repeated until a final nucleic acid product of interest is assembled. This final product can be used directly or further cloned (e.g., into a new vector) using any suitable technique. In some embodiments, one or more regulatory sequences used during assembly may be removed from the final nucleic acid product.
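
The selection logic in a single assembly cycle can be sketched as a simple filter. This is a minimal Python model under stated assumptions, not the procedure itself: it assumes a vector with two inactive traits, each switched on only when the corresponding regulatory sequence ends up in its predetermined position. The `Product` class, its field names, and the listed ligation products are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """One species in a ligation mix (hypothetical model)."""
    name: str
    left_trait_active: bool   # regulatory sequence correctly placed for trait 1
    right_trait_active: bool  # regulatory sequence correctly placed for trait 2


def survives_selection(product: Product) -> bool:
    # Selecting for both traits at once keeps only the configuration
    # in which both regulatory sequences are correctly positioned.
    return product.left_trait_active and product.right_trait_active


ligation_mix = [
    Product("empty vector", False, False),
    Product("vector + left insert only", True, False),
    Product("vector + right insert only", False, True),
    Product("vector + inserts in wrong order", False, False),
    Product("vector + left insert + right insert", True, True),
]

survivors = [p.name for p in ligation_mix if survives_selection(p)]
print(survivors)
```

In this toy model only the product carrying both inserts in the predetermined configuration passes the filter, which is the property that lets a correctly assembled nucleic acid be isolated by selecting or screening for the activated trait(s).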
In some embodiments, correctly assembled nucleic acids can be isolated directly from a pool of transformed host cells in each cycle without requiring individual clones to be isolated and analyzed. This reduces the assembly time associated with each cycle by directly expanding a transformation mix in culture and bypassing the isolation and expansion of individual host cell colonies grown from a transformation mix. In some embodiments, an excised insert from a first vector may be combined with a second vector without separating (e.g., size selecting) the excised insert from the first vector backbone or from uncut vector/insert. Indeed, a restriction digest of a first assembled nucleic acid product may be combined directly with a second vector and another nucleic acid fragment. The restriction digest may include excised insert, empty vector backbone, and uncut vector/insert from the first assembly cycle. While the presence of the first vector backbone may interfere with the second ligation, correctly ligated product can be selected for if the activated traits encoded by the second vector are different from those encoded by the first vector. This also may reduce assembly time by avoiding labor-intensive size selection and isolation steps in each cycle. Accordingly, aspects of the invention provide assembly techniques that are i) less error-prone than simultaneous ligations of pluralities of pooled DNA fragments, and ii) less labor-intensive than iterative pairwise ligation of DNA segments separated based on size. A simultaneous ligation of a plurality of pooled DNA fragments may generate mis-ligated products that typically are not identified until a subsequent sequence analysis is performed on nucleic acid retrieved from a transformed cell culture. In contrast, methods of the present invention may select for correctly ligated products by selecting for activation of one or more vector-encoded traits.
Iterative pairwise ligation of DNA segments separated based on size may be slow and labor-intensive, because it involves DNA isolation and transformant analysis in each cycle of ligation. In contrast, methods of the present invention may be implemented without isolating fragments based on size and without analyzing individual clones from a transformation reaction to identify those with correctly ligated inserts. However, it should be appreciated that a size analysis may be performed as a quality control step either in parallel (e.g., to monitor the progress of the assembly reaction) or prior to performing the next assembly step (e.g., to confirm that a first assembly was successful prior to proceeding with a second assembly).
Aspects of the invention can be used in combination with one or more multiplex nucleic acid assembly techniques in order to assemble a long nucleic acid product from small starting nucleic acids (e.g., from a plurality of oligonucleotides). One or more of the nucleic acid inserts that are used in any of the vector activation assembly cycles described herein may be a nucleic acid that was previously assembled in a multiplex assembly reaction. For example, nucleic acid fragments generated using any of the multiplex assembly reactions illustrated in FIGs. 1-4 or otherwise described herein may be subsequently assembled to form larger nucleic acid products using one or more cycles of a vector-encoded trait activation technique. Accordingly, one or more vector activation assembly cycles may be included in an assembly procedure outlined in FIG. 5. Non-limiting examples of nucleic acid configurations that may be used for assembly by vector activation are illustrated in FIGs. 6-8, and further described herein. In some embodiments, a plurality of assembly cycles can be performed in parallel and pairs of nucleic acid products from a first set of assembly cycles can be combined and assembled in a second set of assembly cycles. In turn, pairs of assembled nucleic acids from the second set of assembly cycles can be combined and assembled in a third set of assembly cycles. This process can be repeated one or more times until a final product is assembled to contain all of the starting nucleic acids from the first plurality of assembly cycles. It should be appreciated that in some embodiments an assembly procedure is hierarchical in that it involves a plurality of converging iterative assembly reactions wherein a first plurality (e.g., N) of pair-wise assembly reactions produces a first plurality of products that are combined in a pair-wise fashion in a second plurality (e.g., N/2) of assembly reactions to generate a second plurality of products.
This procedure can be repeated with the number of assembly reactions (and resulting assembly products) being twofold less at each consecutive stage (e.g., until a single final product is generated). In some embodiments, the sizes of the nucleic acid products at each stage are about twofold greater than the sizes at the prior stage (assuming that the initial nucleic acid inserts had similar sizes). Accordingly, this hierarchical assembly procedure can produce a long insert that increases exponentially in size as a function of the number of consecutive assembly steps. However, it also should be appreciated that iterative assembly procedures can be used in a linear assembly procedure. For example, at each consecutive step one product of a prior assembly may be combined with a second nucleic acid insert that was not generated from a prior iterative assembly procedure. In some embodiments, the second nucleic acid insert at each step may be an oligonucleotide (e.g., a double-stranded pair of oligonucleotides) or a relatively short nucleic acid assembled in a multiplex assembly reaction (e.g., about 500 nucleotides long). Accordingly, the nucleic acid being assembled in this linear procedure grows linearly by the length of the second nucleic acid added at each consecutive step. It should be appreciated that an iterative assembly of the invention may involve a combination of one or more linear and one or more exponential assembly steps and is not limited to either a hierarchical assembly or a linear assembly. Design and assembly methods of the invention may be automated. Methods of the invention may reduce the cost and increase the speed and accuracy of nucleic acid assembly procedures, particularly automated assembly procedures.
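The contrast between hierarchical (exponential) and linear growth described above can be illustrated with a short calculation. The following is a minimal sketch only: the function names are hypothetical, all starting inserts are assumed to have the same size, the first-stage reaction count is assumed to be a power of two, and the short overlaps shared by adjacent inserts are ignored.

```python
def hierarchical_sizes(initial_bp, stages):
    """Product size after each pairwise stage; size doubles at every stage.

    Assumes equal-size starting inserts and ignores junction overlaps.
    """
    return [initial_bp * 2 ** stage for stage in range(stages + 1)]


def linear_sizes(initial_bp, added_bp, steps):
    """Product size after each linear step; grows by added_bp per step."""
    return [initial_bp + added_bp * step for step in range(steps + 1)]


def reactions_per_stage(n_first_stage):
    """Number of pairwise reactions at each stage, halving until one remains.

    Assumes n_first_stage is a power of two (an illustrative simplification).
    """
    counts = []
    n = n_first_stage
    while n >= 1:
        counts.append(n)
        n //= 2
    return counts


if __name__ == "__main__":
    print(hierarchical_sizes(500, 3))   # sizes for three doubling stages
    print(linear_sizes(500, 500, 3))    # sizes for three 500-nt additions
    print(reactions_per_stage(8))       # reactions at each converging stage
```

Under these assumptions, three hierarchical stages starting from 500-nucleotide inserts reach 4000 nucleotides, whereas three linear additions of 500 nucleotides reach 2000 nucleotides.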
Accordingly, aspects of the invention provide methods and compositions that can be used to efficiently assemble a target nucleic acid, particularly a long target nucleic acid. In some embodiments, a target nucleic acid may be amplified, sequenced or cloned after it is made. In some embodiments, a host cell may be transformed with the assembled target nucleic acid. The target nucleic acid may be integrated into the genome of the host cell. In some embodiments, the target nucleic acid may encode a polypeptide. The polypeptide may be expressed (e.g., under the control of an inducible promoter). The polypeptide may be isolated or purified. A cell transformed with an assembled nucleic acid may be stored, shipped, and/or propagated (e.g., grown in culture).
In another aspect, the invention provides methods of obtaining target nucleic acids by sending sequence information and delivery information to a remote site. The sequence may be analyzed at the remote site. The starting nucleic acids may be designed and/or produced at the remote site using one or more methods of the invention. In some embodiments, the starting nucleic acids, an intermediate product in the assembly reaction, and/or the assembled target nucleic acid may be shipped to the delivery address that was provided. Other aspects of the invention provide systems for designing starting nucleic acids and/or for assembling the starting nucleic acids to make a target nucleic acid. Other aspects of the invention relate to methods and devices for automating a multiplex oligonucleotide assembly reaction that include one or more assembly methods of the invention. Yet further aspects of the invention relate to business methods of marketing one or more methods, systems, and/or automated procedures that involve assembly methods of the invention.
Other features and advantages of the invention will be apparent from the following detailed description, and from the claims. The claims provided below are hereby incorporated into this section by reference.
BRIEF DESCRIPTION OF THE FIGURES
FIG. 1 illustrates certain aspects of an embodiment of a polymerase-based multiplex oligonucleotide assembly reaction;
FIG. 2 illustrates certain aspects of an embodiment of sequential assembly of a plurality of oligonucleotides in a polymerase-based multiplex assembly reaction;
FIG. 3 illustrates an embodiment of a ligase-based multiplex oligonucleotide assembly reaction;
FIG. 4 illustrates several embodiments of ligase-based multiplex oligonucleotide assembly reactions on supports;
FIG. 5 illustrates an embodiment of a nucleic acid assembly procedure;
FIG. 6 illustrates a non-limiting embodiment of two assembly cycles of the invention;
FIG. 7 illustrates a non-limiting embodiment of two assembly cycles of the invention;
FIG. 8 illustrates non-limiting embodiments of activator sequence configurations according to the invention;
FIG. 9 illustrates a non-limiting embodiment of a hierarchical cloning strategy that may be integrated with one or more enzyme-mediated multiplex assembly steps;
FIG. 10 illustrates a non-limiting embodiment depicting a Pairwise Selection Assembly (PSA) procedure;
FIG. 11 provides non-limiting diagrams of exemplary vectors for Pairwise Selection Assembly (PSA); and,
FIG. 12 illustrates a non-limiting example of promoter regions containing type IIS recognition sequences for use in some aspects of the invention.
DETAILED DESCRIPTION OF THE INVENTION
Aspects of the invention relate to iterative methods for assembling nucleic acid molecules. The invention provides methods and nucleic acid configurations for activating one or more vector-encoded traits (e.g., antibiotic resistance, auxotrophy, etc.) upon correct assembly of two or more nucleic acid fragments into a vector. Each nucleic acid fragment to be included in an assembly reaction may contain one or more activation sequences. These activation sequences are configured to activate vector-encoded trait(s) on a first vector only if the fragments are assembled in the correct order and orientation in the first vector. Accordingly, a nucleic acid insert that is correctly assembled in the first vector can be isolated by selecting or screening for appropriate trait activation (e.g., in a transformed host cell population). Once isolated, the correctly assembled nucleic acid insert can be removed from the first vector along with one or more activation sequences. This assembled nucleic acid insert then can be combined in a second assembly reaction with a further nucleic acid fragment that also may have one or more activation sequences. These fragments may be inserted into a second vector encoding one or more traits that are activated only if the fragments in the second assembly reaction are correctly assembled into the second vector. The second vector may have the same vector backbone as the first vector. The second vector may be a different vector that encodes one or more of the same traits as the first vector. However, in some embodiments, the second vector may encode one or more traits (e.g., traits that are activated when the correct activating sequence is introduced) that are different from the traits encoded by the first vector. In some embodiments, the second vector does not encode any of the activated trait(s) of the first vector. Similarly, the first vector may not encode any of the activated traits of the second vector.
Accordingly, a correctly assembled insert in the second vector may be isolated by selecting or screening for appropriate trait(s) (e.g., in a transformed host cell population). This process may be repeated in one or more additional cycles (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, etc.) adding at least one additional fragment in each cycle. After each cycle, the insert should be larger than the insert that was generated in the previous cycle. For example, a 100 kb fragment of DNA broken into one hundred 1 kb pieces will require 7 assembly steps (100 > 64 > 32 > 16 > 8 > 4 > 2 > 1) while the same fragment broken into two hundred 500 bp pieces will require 8 assembly steps (200 > 128 > 64 > 32 > 16 > 8 > 4 > 2 > 1). It should be noted that in the first assembly cycle, a subset of pieces may be paired such that the product of this pairing combined with the remaining pieces will yield a total number of pieces for the second round that is a power of two (e.g., 100 = 72 + 28; 72/2 + 28 = 64). It should be appreciated that the nucleic acids that are combined for assembly in each cycle may be obtained from any suitable source. For example, each nucleic acid fragment independently may have been generated in a multiplex nucleic acid assembly reaction, an amplification reaction, a prior cloning procedure, etc., or any combination thereof. In some embodiments, one or two fragments that are combined for assembly each may have been generated in a prior assembly cycle that involved vector-encoded trait activation as described herein. In certain embodiments, a plurality of parallel assembly cycles involving vector-encoded trait activation may merge according to a predetermined hierarchy to generate a final assembled nucleic acid product in a hierarchical assembly procedure. Aspects of the invention may be used to generate a nucleic acid of any size. 
The size of the final product will depend, at least in part, on the size of the fragments that are being assembled and the number of assembly cycles that are performed. For example, nucleic acids from about 20 bp to about 1 Mb long may be assembled. In some embodiments, a target nucleic acid may be between about 100 bp and 1 kb (e.g., about 200 bp, about 300 bp, about 400 bp, about 500 bp, about 600 bp, etc.). In some embodiments, a target nucleic acid may be between about 1 kb and 10 kb (e.g., about 2 kb, about 3 kb, about 4 kb, about 5 kb, about 6 kb, etc.), between about 10 kb and 100 kb (e.g., about 20 kb, about 30 kb, about 40 kb, about 50 kb, etc.), or between about 100 kb and 1 Mb in size. However, target nucleic acids that are smaller, larger, or intermediate in size also may be assembled according to methods of the invention. Aspects of the invention may be automated. For example, a robotic liquid handling device integrated with a plurality of reaction stations may be used to automate one or more cycles of assembly described herein. In some embodiments, one or more reaction steps may be performed (and optionally automated) on a microfluidic device comprising a microfluidic substrate having one or more reaction sites connected via microfluidic channels.
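The step counts given above for pairwise assembly (7 steps for one hundred pieces, 8 steps for two hundred pieces) and the first-round pairing example (100 = 72 + 28; 72/2 + 28 = 64) can be reproduced with a short calculation. The following is a minimal sketch; the function names are hypothetical, and it assumes strictly pairwise assembly in which the first round reduces the piece count to the largest power of two below the starting count.

```python
def assembly_steps(n_pieces):
    """Number of pairwise assembly rounds needed to merge n_pieces into one.

    Equivalent to ceil(log2(n_pieces)), computed with integer arithmetic.
    """
    if n_pieces < 1:
        raise ValueError("n_pieces must be at least 1")
    return (n_pieces - 1).bit_length()


def first_round_split(n_pieces):
    """Return (paired, unpaired, second_round_total) for the first round.

    'paired' pieces are joined two at a time; 'unpaired' pieces are carried
    forward unchanged, so that paired / 2 + unpaired is a power of two.
    """
    if n_pieces < 2:
        raise ValueError("n_pieces must be at least 2")
    # Largest power of two strictly below n_pieces (or n_pieces / 2 when
    # n_pieces is itself a power of two).
    target = 1 << ((n_pieces - 1).bit_length() - 1)
    paired = 2 * (n_pieces - target)
    unpaired = n_pieces - paired
    return paired, unpaired, target


if __name__ == "__main__":
    for n in (100, 200):
        print(n, assembly_steps(n), first_round_split(n))
```

For one hundred pieces this yields 7 steps with 72 pieces paired and 28 carried forward, giving 64 pieces for the second round, in agreement with the example in the text.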
Aspects of the invention also provide vectors, nucleic acid cassettes (e.g., encoding one or more traits or comprising one or more activation sequences), enzymes, selection agents (e.g., one or more antibiotics), etc., that may be used as standard templates or reagents for assembly methods of the invention. One or more of these nucleic acids and/or reagents may be packaged in a kit. A kit also may comprise instructions for performing one or more assembly cycles involving activation of a vector-encoded trait. These and other aspects of the invention are described in more detail in the following paragraphs.
Aspects of the invention relate to methods, compositions, and devices for assembling nucleic acids. Some aspects of the invention provide efficient methods for assembling nucleic acids using one or more assembly cycles, wherein at least two predetermined nucleic acids are assembled together with a vector in each assembly cycle. In each cycle, a correctly assembled nucleic acid product may be isolated using one or more appropriate traits (e.g., selectable and/or detectable traits), which become activated (e.g., functional) upon correct assembly of nucleic acid fragments. In some embodiments, one or more predetermined traits encoded on a vector may be activated in each cycle by the correct insertion of assembly nucleic acids into the vector. This may be accomplished by designing an insert fragment carrying one or more segments of nucleic acid (i.e., sequence) which function as "activation tags." Thus, according to the invention, selection and/or detection of a trait is at least in part based on activating an otherwise nonfunctional marker carried on a fragment (such as a vector) by activation tags that are provided by correct assembly of another fragment (such as an insert). Accordingly, aspects of the invention provide methods for specifically selecting correctly assembled nucleic acids rather than just the presence of certain traits. This can be used in each cycle to avoid certain cloning or validation steps and thereby reduce the time of each cycle. It should be appreciated that in each cycle, one or more nucleic acids being added to the assembly reaction may have been produced in a prior assembly cycle of the invention. Accordingly, iterative assembly using a plurality of assembly cycles can be used to generate progressively longer assembled nucleic acid products in a series of efficient assembly reactions.
Thus, some embodiments of the invention provide methods for assembling nucleic acid segments which include the following steps: digesting a first population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites; digesting a second population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, where the first and second populations of nucleic acids comprise a first activation sequence located between the first and second restriction sites and a second activation sequence located between the third and fourth restriction sites, and digestion of the first population results in a first population of nucleic acid segments that comprises the first activation sequence but lacks the second activation sequence, and digestion of the second population results in a second population of nucleic acid segments that lacks the first activation sequence and comprises the second activation sequence; combining (optionally in the presence of a ligase) the first and second populations of nucleic acid segments with a first nucleic acid vector that is digested with one or more enzymes to generate overhangs that are compatible with the overhangs generated at the first and fourth sites (e.g., the vectors may be digested with restriction enzymes that cleave at the first and fourth restriction sites; however, any other enzymes that produce compatible overhangs may be used), wherein the first nucleic acid vector comprises a coding sequence of a first marker gene 5' of the first restriction site and a coding sequence of a second marker gene 3' of the fourth restriction site; and finally isolating ligated first nucleic acid vectors that express the first and the second marker genes, where expression of the first and the second marker genes is indicative of correct assembly of the first and second populations of nucleic acid segments. It should be appreciated that any of the embodiments described herein wherein one or more vectors are digested with enzymes that cut at the first and fourth restriction sites (or equivalents thereof) may be practiced using a vector that is digested with one or more other enzymes that generate overhangs that are compatible with the overhangs at the first and fourth sites of the inserts being cloned into the vectors. Accordingly, the vector may not contain the first and fourth restriction sites provided it contains sites that can be specifically cut to produce appropriate compatible ends (e.g., overhangs) to clone the regulatory ends of the inserts adjacent to the regulatable markers (e.g., activatable markers) on the vector backbone.
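The selection logic of the steps above can be illustrated with a toy model. The following is a sketch under stated assumptions, not a representation of any actual sequence: each nucleic acid is reduced to an ordered list of labeled features, a digest is modeled as keeping the features strictly between the two cut sites, and a marker is treated as expressed only when the matching activation sequence lies directly adjacent to it. All names (`make_source`, `digest`, `markers_expressed`, `survives_double_selection`) are hypothetical.

```python
def make_source(payload):
    """A source nucleic acid: two activation sequences flanking a payload."""
    return ["site1", "act1", "site2", payload, "site3", "act2", "site4"]


def digest(features, left_site, right_site):
    """Toy digest: return the features strictly between the two cut sites."""
    i = features.index(left_site)
    j = features.index(right_site)
    return features[i + 1:j]


def markers_expressed(insert):
    """Return (marker1_on, marker2_on) for an insert cloned into the vector.

    Marker 1 lies 5' of the insert and is modeled as needing 'act1' at the
    insert's 5' end; marker 2 lies 3' and needs 'act2' at the 3' end.
    """
    marker1_on = bool(insert) and insert[0] == "act1"
    marker2_on = bool(insert) and insert[-1] == "act2"
    return marker1_on, marker2_on


def survives_double_selection(insert):
    """True only if both vector-encoded markers are activated."""
    return all(markers_expressed(insert))


if __name__ == "__main__":
    seg_a = digest(make_source("A"), "site1", "site3")  # keeps act1, loses act2
    seg_b = digest(make_source("B"), "site2", "site4")  # loses act1, keeps act2
    candidates = {
        "A+B (correct)": seg_a + seg_b,
        "B+A (wrong order)": seg_b + seg_a,
        "A only": seg_a,
        "B only": seg_b,
        "empty vector": [],
    }
    for name, insert in candidates.items():
        print(name, survives_double_selection(insert))
```

In this model only the correctly ordered product activates both markers, and that product again carries the first activation sequence at one end and the second at the other, which is what allows it to serve as a source in a further cycle.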
In some embodiments, the methods may include additional steps: digesting a third population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites; digesting a fourth population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, where the third and fourth populations of nucleic acids comprise a first activation sequence located between the first and second restriction sites and a second activation sequence located between the third and fourth restriction sites, and digestion of the third population results in a third population of nucleic acid segments that comprises the first activation sequence but lacks the second activation sequence, and digestion of the fourth population results in a fourth population of nucleic acid segments that lacks a first activation sequence and comprises a second activation sequence; combining in the presence of a ligase the third and fourth populations of nucleic acid segments with a second nucleic acid vector that is digested with restriction enzymes that cleave at the first and fourth restriction sites, wherein the second nucleic acid vector comprises a coding sequence of a first marker gene 5' of the first restriction site and a coding sequence of a second marker gene 3' of the fourth restriction site; selecting for ligated second nucleic acid vectors that express the first and the second marker genes, wherein expression of the first and the second marker genes is indicative of correct assembly of the third and fourth populations of nucleic acid segments; digesting the ligated second nucleic acid vector with restriction enzymes that cleave at the second and fourth restriction sites to release a fifth population of nucleic acid segments lacking a first activation sequence and comprising a second activation sequence; digesting the ligated first nucleic acid vector with restriction enzymes that cleave at the first and third restriction sites to release a sixth population of nucleic acid segments comprising a first activation sequence and lacking a second activation sequence; combining in the presence of a ligase the fifth and sixth populations of nucleic acid segments with a third nucleic acid vector digested with restriction enzymes that cleave at the first and fourth restriction sites and having a third marker gene coding sequence 5' of the first restriction site and a fourth marker gene coding sequence 3' of the fourth restriction site; and selecting for ligated third nucleic acid vectors that express the third and fourth marker genes, where expression of the third and fourth marker genes is indicative of correct assembly of the fifth and sixth populations of nucleic acid segments.
In yet further embodiments, additional steps may be included for assembly: digesting a third population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites; digesting a fourth population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, wherein the third and fourth populations of nucleic acids comprise a 5' promoter sequence located between the first and second restriction sites and a 3' promoter sequence between the third and fourth restriction sites, and digestion of the third population results in a third population of nucleic acid segments that comprises the 5' promoter sequence but lacks the 3' promoter sequence, and digestion of the fourth population results in a fourth population of nucleic acid segments that lacks a 5' promoter sequence and comprises a 3' promoter sequence; combining in the presence of a ligase the third and fourth populations of nucleic acid segments with a second nucleic acid vector that is digested with restriction enzymes that cleave at the first and fourth restriction sites, wherein the second nucleic acid vector comprises a coding sequence of a first marker gene 5' of the first restriction site and a coding sequence of a second marker gene 3' of the fourth restriction site; selecting for ligated second nucleic acid vectors that express the first and the second marker genes, wherein expression of the first and the second marker genes is indicative of correct assembly of the third and fourth populations of nucleic acid segments; digesting the ligated second nucleic acid vector with restriction enzymes that cleave at the second and fourth restriction sites to release a fifth population of nucleic acid segments lacking a 5' promoter sequence and comprising a 3' promoter sequence; digesting the ligated first nucleic acid vector with restriction enzymes that cleave at the first and third restriction sites to release a sixth population of nucleic acid segments comprising a 5' promoter sequence and lacking a 3' promoter sequence; combining in the presence of a ligase the fifth and sixth populations of nucleic acid segments with a third nucleic acid vector digested with restriction enzymes that cleave at the first and fourth restriction sites and having a third marker gene coding sequence 5' of the first restriction site and a fourth marker gene coding sequence 3' of the fourth restriction site; and selecting for ligated third nucleic acid vectors that express the third and fourth marker genes, wherein expression of the third and fourth marker genes is indicative of correct assembly of the fifth and sixth populations of nucleic acid segments.
In further embodiments according to the invention, the methods for assembling nucleic acid segments may include the following steps: digesting a first population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites; digesting a second population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, wherein the first and second populations of nucleic acids comprise a 5' promoter sequence located between the first and second restriction sites and a 3' promoter sequence located between the third and fourth restriction sites, and digestion of the first population results in a first population of nucleic acid segments that comprises the 5' promoter sequence but lacks the 3' promoter sequence, and digestion of the second population results in a second population of nucleic acid segments that lacks the 5' promoter sequence and comprises the 3' promoter sequence; combining in the presence of a ligase the first and second populations of nucleic acid segments with a first nucleic acid vector that is digested with restriction enzymes that cleave at the first and fourth restriction sites, wherein the first nucleic acid vector comprises a coding sequence of a first marker gene 5' of the first restriction site and a coding sequence of a second marker gene 3' of the fourth restriction site; and selecting for ligated first nucleic acid vectors that express the first and the second marker genes, wherein expression of the first and the second marker genes is indicative of correct assembly of the first and second populations of nucleic acid segments.
In any of the embodiments exemplified above, the first, second, third and/or fourth marker genes may be selectable and/or activatable markers, such as antibiotic resistance genes. In addition, as discussed in more detail herein, the restriction enzymes used in any of the embodiments disclosed may be type II restriction enzymes or type IIS restriction enzymes. In some embodiments, the same first type IIS restriction enzyme recognition sequence is used for the first and third sites. Similarly, in some embodiments, the same type IIS restriction enzyme recognition sequence is used for the second and fourth sites. Accordingly, a single type IIS enzyme may be used to cut the first and third sites and a different single type IIS enzyme may be used to cut the second and fourth sites. It should be appreciated that in some embodiments, the type IIS recognition sites are located within the flanking regions of the inserts in association with the activating sequences. The second and third sites are oriented in such a way that the nucleic acid of the insert is cleaved by the respective enzymes binding to the recognition sites. In contrast, the first and fourth sites are oriented in such a way that the nucleic acid of the vector is cleaved by the respective enzymes. As a result, an insert released by cleavage at the first and third sites will have an overhang sequence at the first end that is complementary with an overhang generated on the second vector, whereas the overhang sequence at the third site will be complementary to the overhang generated at the second site of a different insert released by cleavage at the second and fourth sites, wherein both inserts are designed to be assembled in a subsequent assembly step. Accordingly, it should be appreciated that the inserts may be designed to include an overlapping sequence (e.g., at least the length of a restriction cleavage site such as a 4-base overlap of some type IIS restriction cleavage sites).
Also, it should be appreciated that the cleavage overhang sizes and orientations generated by the restriction enzymes used for cutting the second and third sites should be compatible so that they generate complementary sequences for ligation within the overlap region of two inserts designed for subsequent ligation. These and other aspects of the vector and insert configurations are illustrated by the non-limiting examples provided herein. In some embodiments, the type IIS recognition sites may be located within the promoter regions of the activation sequences (e.g., between or around the -35 and -10 sequences of a promoter) and located such that the digestion site is either outside the activation sequence within the insert sequence (e.g., for sites 2 and 3) or distal to the insert site and in the vector sequence (e.g., for sites 1 and 4). These and other aspects of the restriction site configurations are illustrated by the non-limiting examples provided herein.
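The overlap requirement described above can be checked computationally when designing adjacent inserts. The following is a minimal sketch assuming a 4-base overlap and top-strand sequences written 5' to 3'; the function names and the example sequences are hypothetical. The test for a self-complementary overhang is an added design consideration not stated in the text: a palindromic overhang can anneal to another copy of itself, which may permit unintended ligation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def shared_overlap(left, right, length=4):
    """Return the junction overlap if left ends with what right starts with."""
    tail = left[-length:]
    return tail if len(tail) == length and tail == right[:length] else None


def is_self_complementary(overhang):
    """True if the overhang equals its own reverse complement (palindromic)."""
    return overhang == reverse_complement(overhang)


def join(left, right, length=4):
    """Join two inserts across their shared overlap, counting it once."""
    if shared_overlap(left, right, length) is None:
        raise ValueError("inserts do not share the required overlap")
    return left + right[length:]


if __name__ == "__main__":
    left_insert = "ATGCCGTA"    # hypothetical 3' end of the upstream insert
    right_insert = "CGTAGGCT"   # hypothetical 5' end of the downstream insert
    print(shared_overlap(left_insert, right_insert))
    print(is_self_complementary("CGTA"), is_self_complementary("GATC"))
    print(join(left_insert, right_insert))
```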
It should be appreciated that in some embodiments described herein all sites 1, 2, 3, and 4 are recognized by the same enzyme (e.g., the same type IIS enzyme). In these embodiments, differential cutting at sites 1 and 3 versus 2 and 4 may be achieved using selective protection, methylation, and/or cleavage techniques as described herein. For example, a first set (e.g., pair) of oligonucleotides may be used to protect only sites 1 and 3 from methylation (e.g., using a RecA mediated triple helix formation). Subsequently, after removal of the protecting oligonucleotides, selective cleavage of the unmethylated sites 1 and 3 may be obtained using an enzyme that is sensitive to the methylation of the substrate nucleic acid. Similarly, sites 2 and 4 may be selectively cleaved using specific oligonucleotides that protect sites 2 and 4 (but not sites 1 and 3).
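The protect, methylate, and cleave sequence described above can be summarized as simple set logic. The following is a toy sketch with hypothetical names: sites are represented only by their numbers, methylation is assumed to be complete at every unprotected site, and the methylation-sensitive enzyme is assumed to cut every unmethylated site and no methylated site.

```python
def cleavable_sites(all_sites, protected_sites):
    """Return the sites that are cut after selective protection.

    Step 1: oligonucleotides shield the protected sites from the methylase,
            so every other site becomes methylated.
    Step 2: the protecting oligonucleotides are removed and a
            methylation-sensitive enzyme cuts only unmethylated sites.
    """
    protected = set(protected_sites)
    methylated = {site for site in all_sites if site not in protected}
    return sorted(site for site in all_sites if site not in methylated)


if __name__ == "__main__":
    sites = [1, 2, 3, 4]
    print(cleavable_sites(sites, [1, 3]))  # first digest: sites 1 and 3
    print(cleavable_sites(sites, [2, 4]))  # second digest: sites 2 and 4
```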
Aspects of the invention provide nucleic acid configurations and assembly strategies that involve selecting for one or more vector-encoded traits in a plurality of consecutive assembly cycles. In some embodiments, the same vector encoding the same trait(s) may be used in a plurality of consecutive assembly cycles, with one or more additional nucleic acids being added in each cycle. However, in some embodiments, two or more different vectors may be used in consecutive cycles. In some embodiments, two different vectors are used repeatedly in alternate cycles. For example, a first vector encoding one or more first traits (e.g., one, two, three, four, or more first traits) may be used in a first cycle, followed by a second vector encoding one or more second traits (e.g., one, two, three, four, or more second traits) in a second cycle. By using alternate assembly vectors having different selectable traits in consecutive cycles, background carryover from one cycle to the next (e.g., due to vectors that were not cut or vectors that were re-circularized without the insertion of an additional fragment) is reduced or avoided. In some embodiments, for example, a first vector may contain two selectable markers (e.g., traits), such as chloramphenicol and kanamycin resistance, which become functional upon ligation with a correct insert. A non-limiting example of such a vector is pCK, as illustrated in FIG. 11. Similarly, a second vector may contain two selectable markers (e.g., traits), such as tetracycline and spectinomycin resistance, which become functional upon ligation with a correct insert. A non-limiting example of such a vector is pTS, as illustrated in FIG. 11. Thus, the invention includes methods of assembly using these vectors repeatedly in alternate cycles to generate progressively longer nucleic acid fragments.
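The effect of alternating vectors on background carryover can be sketched as follows. This is an illustrative model only: the marker assignments for pCK and pTS follow the text above, while the function names and the rule that a carried-over plasmid survives only if it confers resistance to every antibiotic used in the current cycle are simplifying assumptions.

```python
# Activatable resistance markers per vector, as named in the text above.
VECTOR_MARKERS = {
    "pCK": ("chloramphenicol", "kanamycin"),
    "pTS": ("tetracycline", "spectinomycin"),
}


def selection_schedule(n_cycles, order=("pCK", "pTS")):
    """List (cycle, vector, antibiotics) with vectors alternating per cycle."""
    schedule = []
    for cycle in range(1, n_cycles + 1):
        vector = order[(cycle - 1) % len(order)]
        schedule.append((cycle, vector, VECTOR_MARKERS[vector]))
    return schedule


def carryover_survives(previous_vector, current_vector):
    """True if an uncut or re-circularized product of the previous cycle
    would grow under the current cycle's selection.

    Assumes the carried-over plasmid has its own markers activated and
    confers no resistance to any other antibiotic used for selection.
    """
    needed = set(VECTOR_MARKERS[current_vector])
    available = set(VECTOR_MARKERS[previous_vector])
    return needed <= available


if __name__ == "__main__":
    for entry in selection_schedule(4):
        print(entry)
    print(carryover_survives("pCK", "pTS"))  # alternating vectors
    print(carryover_survives("pCK", "pCK"))  # same vector reused
```

Under these assumptions, a plasmid carried over from a pCK cycle does not survive selection in a pTS cycle, whereas it would survive if pCK were reused.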
Each of the vectors provided in the instant invention for use in the assembly process contains a functional selection marker, e.g., ampicillin resistance, which can be used for propagation purposes, in addition to two non-functional (e.g., activatable) resistance markers as described above. The vectors, such as pCK and pTS, may be constructed such that they contain either a high-copy number origin of replication, or a BAC-based single-copy number origin of replication. The former versions enable DNA assembly up to ~10-30 kb, while the latter BAC-based vectors may be more suitable for longer construction up to ~300 kb. Transition from one vector type to another is seamless, as both vector types have the same non-functional markers that are activated by the same activation tags (i.e., they differ only in the origin of replication).
Therefore, cloning may be done by transformation followed by growth and selection of the transformed cell population without isolating and analyzing individual colonies grown from the transformed cell population prior to subsequent expansion. In some embodiments, predetermined nucleic acids and vectors may be designed to produce one or more selectable or detectable traits when a correct assembly reaction occurs. Accordingly, aspects of the invention may provide algorithms (e.g., computer-implemented algorithms) for designing nucleic acid configurations with appropriate selection or detection techniques that may be chosen to isolate correctly assembled nucleic acids in one or more consecutive assembly cycles. According to aspects of the invention, a nucleic acid that is correctly assembled from two smaller nucleic acids in a first assembly cycle may be used in a second assembly cycle. In the second cycle, correct assembly with yet a further nucleic acid and an appropriate vector may generate a longer assembled nucleic acid that can be isolated using appropriate selection or detection techniques. In some embodiments, the same selectable or detectable traits may be used in each assembly cycle. However, the invention is not limited in this respect and different traits may be used in each cycle. In some embodiments, strategies may be developed for alternating traits that are selected for in consecutive assembly cycles. This may reduce the frequency of nucleic acids that are carried over, from one assembly cycle to the next, without being assembled with an additional nucleic acid in each cycle.
In each assembly cycle, the correct insertion of at least one nucleic acid into a vector produces a selectable or detectable trait (e.g., by increasing or decreasing the expression of a marker encoded by the vector). In some embodiments, the insertion of each of two nucleic acids into the vector produces a selectable or detectable trait (e.g., by each nucleic acid increasing or decreasing the expression of a marker encoded by the vector). In certain embodiments, each nucleic acid inserted into the vector produces a different selectable or detectable trait (e.g., by increasing or decreasing the expression of a different marker encoded by the vector). However, it should be appreciated that each nucleic acid inserted into the vector may increase or decrease the expression of the same marker, resulting in different levels of the detectable or selectable trait depending on the number of nucleic acids that are inserted into the vector. Accordingly, in each assembly cycle, the targeted insertion of predetermined nucleic acids (e.g., two predetermined nucleic acids) into a vector may be selected for by selecting for appropriate combinations and/or levels of one or more vector-encoded traits. It should be appreciated that, in any assembly cycle, one or more (e.g., one or both) of the predetermined nucleic acids being assembled may have been assembled in a prior assembly cycle from one or more smaller nucleic acids (e.g., from the assembly of two smaller nucleic acids in a prior cycle).
For example, a segment of nucleic acid residing on a vector, which on its own is non-functional, becomes functional (e.g., activated) only when combined (e.g., assembled) with a segment present on an insert (e.g., an activation tag). As discussed in more detail below, a number of such activation configurations are contemplated herein. In some circumstances, a non-functional segment of nucleic acid on a vector may be a promoter or a fragment thereof, which then becomes turned on in the presence of a correctly assembled insert containing the remainder of the sequences necessary to activate the marker. In this case, correct assembly of the vector and the insert induces transcription of a gene or a fragment thereof encoded by a nucleic acid segment following the promoter sequence. In some cases, a portion of a coding sequence may be provided on the insert (e.g., as part of an activation sequence). For example, a promoter may be associated with at least a start codon (ATG) in the activation sequence on the insert tag. In some embodiments, a coding region that can be used to activate one or more (e.g., 2, 3, 4, etc.) activatable markers may be included in the activation sequence on an insert. Such a coding region may be, for example, a signal peptide, a multimerization domain, a stabilization domain, etc., or any combination thereof. In other applications, a non-functional segment on a vector may represent a component or part of a functional unit, which must be supplemented by an additional component (on an insert) to become functional. For instance, a vector may encode one subunit of a factor which by itself is inactive, and an insert nucleic acid may encode another subunit, which together with the first subunit encoded by the vector can form a functional unit. Similarly, a vector may encode only a portion of a functional factor, and an insert may encode the remainder of the functional unit, such that the factor becomes functional only when the fragments are correctly assembled.
FIGs. 6 and 7 illustrate non-limiting embodiments of two assembly cycles of the invention. In FIG. 6, the activation sequences are a promoter (P) and a terminator (T). In FIG. 7, the activation sequences are both promoters (P and P'). However, other combinations of promoters and terminators may be used as activation sequences (e.g., two terminators). Also, other types of activation sequences may be used. In certain embodiments, an activation sequence may be any suitable cis-acting sequence (e.g., a regulatory sequence such as a promoter, terminator, enhancer, ribosome binding or other cis-acting regulatory sequence — for example, a cis-acting protein binding sequence) that can regulate the expression levels of a marker gene (e.g., a gene that produces a selectable trait when it is either up-regulated or down-regulated). In some embodiments, an activation sequence may be any suitable trans-acting sequence (e.g., a sequence that encodes a regulatory peptide or other trans-acting regulatory factor) that can regulate the expression levels of a marker gene (e.g., a gene that produces a selectable trait when it is either up-regulated or down-regulated). It should be appreciated that the expression levels of a marker gene in a host cell may be up-regulated or down-regulated by an activation sequence. In some embodiments, a marker gene is not expressed in the absence of an activation sequence, and expressed in the presence of the activation sequence. However, the expression level of a marker gene may increase from a lower level to a higher level in the presence of the activation sequence. In certain embodiments, a marker gene is expressed in the absence of an activation sequence and silenced in the presence of the activation sequence. However, the expression level of a marker gene may decrease from a higher level to a lower level in the presence of the activation sequence. 
In some embodiments, activation sequences are short sequences (e.g., DNA sequences) necessary for the activation of non-functional marker genes (e.g., antibiotic resistance markers) present on the target vector. For example, an activation sequence may be a promoter, a terminator (e.g., a terminal amino-acid/stop-codon region necessary for expression of a marker gene), or other short sequence necessary for correct expression of a marker gene. In some embodiments, a regulatory sequence may be a silencer that reduces expression of a gene resulting in a selectable or detectable trait associated with the reduced gene expression.
A marker gene may be any gene that confers a detectable or selectable trait on a cell when its expression levels change (e.g., increase or decrease, depending on the marker gene). For example, an antibiotic resistance marker may confer a selectable trait (antibiotic resistance) when its expression level is up-regulated. Other marker genes may be auxotrophic markers, or other selectable or detectable markers. An example of a detectable marker is a fluorescent marker. Such markers are well known in the art. In some embodiments, marker genes are fluorescent reporter genes (e.g., GFP, DsRed, YFP, CFP, etc.). Inactive fluorescent reporters encoded on the target vector would be activated upon insertion of DNA molecule(s) containing activation sequences. After transformation and recovery, cells may be sorted via flow cytometry. Cells containing the expressed fluorescent reporter genes can be isolated. The isolated cells contain the correctly ligated DNA. In some embodiments, the activation and expression of a fluorescent reporter gene may be more rapid than the activation and expression of an antibiotic resistance marker. Accordingly, in some embodiments, the isolation of correct constructs using flow cytometry may be performed earlier (e.g., after a shorter cell recovery and growth time after transformation) than a selection involving activation of antibiotic resistance markers.
In FIG. 6, fragments I through IV are assembled in two cycles. In a first assembly cycle, fragments I and II of i) are assembled into vector of ii) to generate I+II of iii), and fragments III and IV of i') are assembled into vector of ii') to generate III+IV of iii'). In a second assembly cycle, fragments I+II of iii) and III+IV of iii') are assembled into vector of iv) to generate fragment I+II+III+IV of v). In i), fragments I and II are provided in constructs comprising two flanking activation sequences (P and T), wherein each activation sequence is flanked by two different restriction enzyme recognition sites (restriction sites 1 and 2 flank P, and restriction sites 3 and 4 flank T). It should be appreciated that one or both of these constructs may be provided in a first vector (e.g., a plasmid). However, in some embodiments, one or both of these constructs may be provided as a linear product of a multiplex nucleic acid assembly reaction. In some embodiments, the constructs may be amplified (e.g., by PCR or LCR). The construct containing I is digested with restriction enzymes that cut at 1 and 3 to generate a linear product that contains I and one of the flanking activation sequences (P). The construct containing II is digested with 2 and 4 to generate a product that contains II and the other flanking activation sequence (T). The digested constructs are combined with a vector ii) that contains restriction sites 1 and 4 adjacent to marker genes A and B, respectively. In ii), site 1 is 5' of marker gene A, and site 4 is 3' of marker gene B. A and B are inactive in ii). The vector of ii) is digested with restriction enzymes that recognize sites 1 and 4. The digested nucleic acids of i) and ii) are ligated to generate a product shown in iii). 
In a correct assembly, the free ends generated by digestion at site 1 flanking P and at site 1 upstream of A are compatible (e.g., cohesive), and activation sequence P is ligated upstream of marker A in the vector, thereby activating A. In a correct assembly, the free ends generated by digestion at site 4 flanking T and at site 4 downstream from B are compatible (e.g., cohesive), and activation sequence T is ligated downstream of marker B in the vector, thereby activating B. In a correct assembly, the free ends generated by restriction digestion at sites 3 and 2, flanking I and II respectively, are compatible (e.g., cohesive) and are ligated together to generate product I+II in a vector expressing A and B as shown in iii). The ligated nucleic acids of i) and ii) are transformed into a suitable host cell preparation. A correct assembly may be selected for by selecting for traits associated with activation of A and B in the host cells.
Similarly, in i') a construct containing III is digested with restriction enzymes that recognize sites 1 and 3, and a construct containing IV is digested with restriction enzymes that recognize sites 2 and 4. The restriction products of i') are ligated into a vector of ii') that has been digested with restriction enzymes recognizing 1 and 4. The resulting product III+IV is shown ligated into the vector in iii'). As described for I and II, the correct assembly of III and IV may be isolated by selecting for traits associated with A and B in suitable host cells after transformation of the ligation reaction products.
In a second assembly cycle, I+II are assembled with III+IV to generate I+II+III+IV. The nucleic acids of iii) may be digested with restriction enzymes that recognize 1 and 3, generating a linear product that contains I+II and one of the flanking activation sequences (P). The nucleic acids of iii') may be digested with restriction enzymes that recognize 2 and 4, generating a product that contains III+IV and the other flanking activation sequence (T). The digested constructs are combined with a vector of iv) that has been digested with restriction enzymes recognizing sites 1 and 4 adjacent to inactive marker genes C and D, respectively. The nucleic acids are ligated and transformed into a suitable host cell preparation. A correct assembly of I+II+III+IV shown in v) may be selected for by selecting for traits associated with activation of C and D in the host cells.
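The way sites 1-4 and the flanking activation sequences are carried through the two cycles of FIG. 6 can be traced with a purely symbolic model. The following Python sketch is illustrative only: it assumes each construct can be written as an ordered list of tokens (site 1, P, site 2, insert, site 3, T, site 4), and the helper names `make_construct`, `left_part`, `right_part`, and `assemble` are hypothetical and are not part of the described method.

```python
from typing import List


def make_construct(insert: str) -> List[str]:
    """Symbolic construct: sites 1 and 2 flank P, sites 3 and 4 flank T."""
    return ["1", "P", "2", insert, "3", "T", "4"]


def left_part(construct: List[str]) -> List[str]:
    """Model digestion at sites 1 and 3: keep site 1 through the insert."""
    return construct[: construct.index("3")]


def right_part(construct: List[str]) -> List[str]:
    """Model digestion at sites 2 and 4: keep the insert through site 4."""
    return construct[construct.index("2") + 1:]


def assemble(left: List[str], right: List[str]) -> List[str]:
    """Join the two digestion products at their compatible inner ends."""
    l, r = left_part(left), right_part(right)
    joined = l[-1] + "+" + r[0]  # the inner sites 3 and 2 are not regenerated
    return l[:-1] + [joined] + r[1:]


# First cycle: I with II, and III with IV. Second cycle: the two products.
first = assemble(make_construct("I"), make_construct("II"))
second = assemble(make_construct("III"), make_construct("IV"))
final = assemble(first, second)
print(final)
```

In this model the assembled product has the same token layout as the starting constructs, which mirrors why the product of one cycle can serve as an input to the next cycle.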
In FIG. 7, fragments I through IV are assembled in two cycles as described above for FIG. 6. However, the flanking activation sequences in FIG. 7 are both promoters (e.g., P and P'). Accordingly, the coding sequences of both sets of marker genes (A and B in ii) and ii'), and C and D in iv)) are oriented so that sites 1 and 4 are upstream of the marker genes. As a result, correct insertion of the promoter-containing fragments into sites 1 and 4 activates both sets of marker genes. A working example of a first assembly cycle of this configuration is provided in FIG. 7B. Here, two approximately 900 bp fragments (I and II) are being combined to make a contiguous 1800 bp fragment. In this example, sites 1 and 3 are BsaI sites and sites 2 and 4 are BsmBI sites (see FIG. 7). These restriction sites are not present in the sequence being assembled. Restriction digestion reactions can be heat-inactivated and a portion of each digestion can be combined with linearized destination vector in a ligation reaction. E. coli cells can then be transformed with the ligation reaction and correct pairs can be selected in culture. As demonstrated in FIG. 7B, showing agarose gel electrophoresis of digested DNA before and after assembly and selection, two ~900 bp fragments can be assembled to generate a contiguous 1,800 bp fragment according to the methods of the present invention.
It should be appreciated that after each cycle, an assembled insert (e.g., I+II, III+IV, and I+II+III+IV) is flanked by activator sequences (e.g., P and T in FIG. 6, P and P' in FIG. 7, or any other combination of activator sequences). Also, the activator sequences retain flanking restriction sites 1, 2, 3, and 4 in the same configuration (e.g., as illustrated in FIGs. 6 and 7). Accordingly, the product of each assembly cycle can be used in a further assembly cycle using the same strategy. For example, product v) in FIGs. 6 and 7 can be processed as described herein (e.g., cut with 1 and 3, or with 2 and 4) and ligated into a suitable vector along with an additional fragment having the appropriate compatible free ends for ligation. In some embodiments, a vector with different marker genes is used in each cycle. For example, vectors of ii) and iv) may be used in alternate cycles. Accordingly, an insert excised from v) may be cloned into a vector of ii) along with an additional fragment, and correctly assembled inserts may be selected for by selecting for activation of A and B. In the next cycle, a vector of iv) may be used, etc. By alternating vectors in consecutive cycles, the selection combination used in each cycle (e.g., for A and B or for C and D) specifically selects for the intended fragment assembly product and selects against the vector that was used in a prior assembly cycle. As a result, aspects of the invention may be readily automated. In some embodiments, ligation assembly reactions may be performed using restriction digest mixtures that contain the fragments to be assembled and also the vector backbones from the prior assembly. Selection for different markers (e.g., alternate markers) in each cycle reduces or prevents the vector backbones from a prior cycle from interfering with the assembly process (e.g., prevents a background of transformed cells containing vectors from a prior cycle from being amplified). Accordingly, size separation steps (e.g., to isolate a fragment from a vector) may be omitted from these assembly cycles.
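The alternation of vectors between consecutive cycles can be laid out as a simple schedule. The Python sketch below is a hedged illustration, assuming a strictly pairwise scheme in which adjacent fragments are joined in each cycle and the selected marker pair alternates between (A, B) and (C, D) as in FIGs. 6 and 7. The function name `plan_cycles` and the handling of an unpaired fragment (it is simply carried to the next cycle) are assumptions made for illustration.

```python
from typing import List, Tuple

# Marker pairs named after the vectors of ii) and iv) in FIGs. 6 and 7.
MARKER_SETS = [("A", "B"), ("C", "D")]


def plan_cycles(fragments: List[str]) -> List[Tuple[Tuple[str, str], List[str]]]:
    """Return, per cycle, the marker pair selected for and the products made."""
    plan = []
    current = list(fragments)
    cycle = 0
    while len(current) > 1:
        markers = MARKER_SETS[cycle % 2]  # alternate vectors between cycles
        products = []
        for i in range(0, len(current) - 1, 2):
            products.append(current[i] + "+" + current[i + 1])
        if len(current) % 2 == 1:
            # Assumption: an unpaired fragment waits for the next cycle.
            products.append(current[-1])
        plan.append((markers, products))
        current = products
        cycle += 1
    return plan


for markers, products in plan_cycles(["I", "II", "III", "IV"]):
    print("select for", markers, "->", products)
```

Because each cycle selects for a different marker pair than the previous one, a vector carried over from the prior cycle would not confer the traits being selected in the current cycle.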
It should be appreciated that restriction sites 1-4 and the corresponding restriction enzymes may be chosen from any suitable restriction site/enzyme combinations. However, certain enzyme selections and configurations may be particularly useful.
In some embodiments, restriction sites 1 and 4 are long recognition sites (e.g., between about 8 and about 50 nucleotides long) that are recognized by rare-cutting restriction enzymes. Rare-cutting restriction enzymes may be meganucleases, modified meganucleases (e.g., engineered meganucleases that include a cleavage domain from a type IIS enzyme but retain the binding domain of a meganuclease, for example from a mutant meganuclease that binds and does not cleave a target sequence; see for example meganucleases described in US Serial Number 60/925,507, filed April 19, 2007, the disclosure of which is incorporated herein by reference), or other rare-cutting enzymes (e.g., NotI). Restriction sites 1 and 4 may be Type II sites that are regenerated after ligation in each cycle.
Restriction sites 2 and 3 may be Type IIS sites that are not regenerated after ligation in each cycle. Sites 2 and 3 may be oriented to cause cleavage within the central region of each construct (as opposed to cleavage in the flanking regions). Restriction sites 2 and 3 may be different. However, the cleavage patterns (e.g., the type of overhang, 5' or 3', and the overhang length) of the Type IIS restriction enzymes that recognize 2 and 3 may be identical or similar. As a result, the nucleic acid constructs may be designed so that cleavage by Type IIS enzymes specific for sites 2 and 3 generates free ends that are compatible (e.g., cohesive or complementary) for a subsequent ligation reaction. For example, sites 2 and 3 may be located to cause Type IIS cleavage within a sequence region that is common to the fragments being assembled (e.g., in an overlapping sequence region of I and II, or of III and IV, or of II and III illustrated in FIGs. 6 and 7). In some embodiments, the common or overlapping regions are not duplicated after assembly, because the cleavage sites may be designed to cut at a location that results in a single copy of the overlapping or common regions being reassembled upon ligation. It should be appreciated that, in this non-limiting configuration, the cleavage sites for 2 and 3 are within the sequence of the nucleic acid being assembled. In contrast, the recognition sites for 2 and 3 are outside the sequence of the nucleic acid being assembled. Therefore, these recognition sites are not regenerated upon ligation (e.g., of I and II, III and IV, or II and III). However, in each cycle, additional sites 2 and 3 that are at the opposite ends of the fragments (e.g., at the ends cut at 1 and 4, respectively) are carried over into the newly assembled product and are available for cleavage in a subsequent cycle as illustrated in FIGs. 6 and 7.
In some circumstances, the invention may be useful for generating a library of variants. For example, each insert to be progressively added (e.g., insert I, II, III, and IV as shown in FIGs. 6 and 7) may represent a plurality of nucleic acid variants. For example, insert I may represent a plurality (e.g., a pool) of variants of I, and insert II may represent a plurality (e.g., a pool) of variants of II, and so on. Such variants may include naturally occurring variants (such as SNPs) and other mutations. In some embodiments, each insert may encode a module of a protein (polypeptide), each containing one or more variants. In some cases, the protein is a naturally occurring protein or variant thereof. Such variants may represent functional variants, structural variants, sequence variants, etc. In some circumstances, the protein is an engineered protein (i.e., artificial protein) comprising one or more modules. In some embodiments, a module may represent a functional module, e.g., a kinase domain, etc. In further embodiments, each insert may represent a gene element, such as regulatory regions (e.g., promoters), exons, untranslated regions (e.g., 5'-UTR and 3'-UTR), etc. Thus, the invention contemplates that in some embodiments insert I may represent a library of promoters, insert II may represent a library of genes or fragments thereof having similar functions (such as a functional domain), and insert III may represent a further fragment, and so on. In yet further embodiments, each insert may represent a cluster of genes or gene elements. Thus, the methods and compositions of the instant invention may be used to generate a library of a plurality of nucleic acid variants. In particular, the method of the invention may be useful for generating a library of variants, each of which is a relatively long nucleic acid, e.g., 1, 5, 10, 15, 20, 30, 40, 50 kb or more.
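When each insert position is a pool of variants, the number of distinct full-length assemblies is the product of the pool sizes. The short Python sketch below illustrates that arithmetic under the assumption that every variant of one position can be combined with every variant of the others. The pool contents (`I-a`, `II-b`, and so on) are placeholder labels, and `library_size` and `enumerate_library` are hypothetical helper names.

```python
from itertools import product
from math import prod
from typing import Iterator, List, Sequence


def library_size(pools: Sequence[Sequence[str]]) -> int:
    """Number of distinct combinations, one variant taken from each pool."""
    return prod(len(pool) for pool in pools)


def enumerate_library(pools: Sequence[Sequence[str]]) -> Iterator[str]:
    """Yield each combination as a '+'-joined label, in pool order."""
    for combo in product(*pools):
        yield "+".join(combo)


# Placeholder pools for inserts I through IV.
pools: List[List[str]] = [
    ["I-a", "I-b", "I-c"],
    ["II-a", "II-b"],
    ["III-a"],
    ["IV-a", "IV-b"],
]
print(library_size(pools))
print(next(enumerate_library(pools)))
```

Enumerating the combinations is practical only for small pools; for large libraries the size calculation alone indicates how many transformants would be needed to sample the library.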
In some embodiments, restriction enzyme digestion and ligation may be performed in the same reaction tube. The use of Type IIS sites that are not regenerated after ligation can drive the reaction towards the correct assembly as described in more detail herein. This also may speed up an assembly reaction by avoiding separate digestion and ligation steps and by avoiding any purification, size separation, or other processing steps in between restriction enzyme digestion and ligation. This aspect also may be readily automated, avoiding additional sample manipulations associated with separate restriction digestion and ligation steps. It should be appreciated that in some embodiments, sites 1 and 3 may be cut by the same restriction enzyme (e.g., oriented such that site 1 is retained and site 3 is not retained after the subsequent ligation reaction). Similarly, sites 2 and 4 may be cut by the same restriction enzyme (e.g., oriented such that site 4 is retained and site 2 is not retained after the subsequent ligation reaction). It also should be appreciated that sites 1, 2, 3, and 4 all may be recognized by different enzymes. However, in some embodiments, sites 1, 2, 3, and 4 may be recognized by the same enzyme (e.g., the sites all have the same sequence) and differential digestion at positions 1 and 3 versus 2 and 4 may be achieved using specific protection or digestion techniques described herein (e.g., using oligonucleotides to protect from methylation). It should be appreciated that the presence, within the sequence of an insert, of one or more of the recognition sites for an enzyme used for iterative assembly (e.g., a restriction enzyme that cleaves at site 1, 2, 3, and/or 4) may complicate the assembly process by resulting in cleavage at unwanted positions. Accordingly, in some embodiments, a target sequence for assembly may be designed so that it does not include such sites. 
However, in some embodiments, specific recognition and cleavage at sites 1, 2, 3, and 4 illustrated in FIGs. 6 and 7, without unwanted cutting at any identical sites 1, 2, 3, and 4 within the fragments being assembled, may be achieved using different strategies described herein in more detail (e.g., using rare-cutting enzymes, masking unwanted cleavage sites, methylating unwanted cleavage or recognition sites while masking targeted cleavage or recognition sites to protect them from methylation, etc.). In some embodiments, specific digestion at one or more of locations 1, 2, 3, and 4 may be achieved using targeted nuclease digestion (e.g., using targeted triple helix formation as described in more detail herein).
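Whether an insert contains internal copies of the recognition sites used for assembly can be checked before choosing among these strategies. The Python sketch below is an illustration, assuming the BsaI/BsmBI pairing mentioned for FIG. 7B and the commonly documented recognition sequences for those enzymes (GGTCTC and CGTCTC). The function names are hypothetical, and the scan reports recognition sites only, not cleavage positions.

```python
from typing import Dict, List, Tuple

# Recognition sequences for the two Type IIS enzymes named for FIG. 7B.
RECOGNITION_SITES: Dict[str, str] = {"BsaI": "GGTCTC", "BsmBI": "CGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_internal_sites(
    insert: str, sites: Dict[str, str] = RECOGNITION_SITES
) -> List[Tuple[str, int, str]]:
    """Return (enzyme, 0-based start, strand) for every site in the insert.

    Both strands are scanned. The sites used here are non-palindromic, so a
    hit is reported once per strand on which it occurs.
    """
    insert = insert.upper()
    hits = []
    for enzyme, site in sites.items():
        for strand, motif in (("+", site), ("-", reverse_complement(site))):
            start = insert.find(motif)
            while start != -1:
                hits.append((enzyme, start, strand))
                start = insert.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


print(find_internal_sites("AAAGGTCTCAAAGAGACGTTT"))
```

An empty result suggests the insert can be used as is; any hit marks a position that would need to be redesigned, masked, or methylated as described above.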
FIG. 8 illustrates non-limiting embodiments showing details of activator sequence configurations such as those in FIGs. 6 and 7. It should be appreciated that the assembly strategy and configurations of sites, activation sequences, and markers illustrated in FIGs. 6, 7, and 8 may be generalized and used for any suitable combination of enzymes, enzyme recognition sites, activation sequences, marker genes, etc. as described herein.
In some embodiments, activatable auxotrophic markers may include one or more of the following non-limiting examples of yeast alleles that may be used as auxotrophic markers: ade1-14, ade2-1, ade2-101, ade2-BglII, can1-100, his3delta200, his3delta1, his3-11,15, leu2delta1, leu2-3,112, lys2-801, lys2delta202, trp1delta1, trp1delta63, trp1-1, trp1-289, ura3-52, ura3-1, ade2delta::hisG, leu2delta0, lys2delta0, met15delta0, ura3delta0. Additional auxotrophic markers and other markers that may be used are known in the art (See, for example, Brachmann et al. (1998) "Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications." Yeast Volume 14, pp 115-132).
In some embodiments, activatable markers may confer resistance to one or more of the following non-limiting antibiotics: neomycin, ampicillin, hygromycin, gentamycin, bleomycin, phleomycin, kanamycin, geneticin (or G418), paromomycin, tetracycline, beta-lactams, vancomycin, erythromycin, chloramphenicol, novobiocin, cefotaxime, coumermycin A1, and spectinomycin.
In some embodiments, one or more of the following non-limiting list of promoters may be used as activation sequences: bacterial promoters (e.g., T7, tRNA, rrn promoters, etc.); yeast promoters (e.g., GAL1, GAL10, ADH1, etc.); insect promoters; mammalian promoters; and/or promoters from other species. In some embodiments, a natural promoter sequence may be modified to incorporate one or more restriction sites (e.g., type IIS restriction sites).
It should be appreciated that the same activation sequence may be used at both ends of an insert. For example, the same promoter may be used at the left and right ends of the assembled inserts described herein. However, the orientation of the activation sequences (e.g., promoters) may be such that they only work when integrated (e.g., ligated) into the correct site. For example, the promoters may be on opposite strands relative to the insert and only work to activate the adjacent marker if cloned into the correct end of the vector.
It should be appreciated that any suitable vector may be used as described herein. For example, a vector may have an origin of replication and a selectable marker (e.g., an active marker) different from the activatable markers (e.g., inactive markers) used for assembly as described herein. The vector also may include appropriate restriction sites adjacent to the activatable markers as described herein. The vectors may be prokaryotic, eukaryotic (e.g., yeast, mammalian, insect) or viral. Different vectors may be adapted for different insert sizes as described herein (e.g., BAC, YAC, etc. for larger insert sizes) or different uses. However, different vectors may include the same activatable markers and/or appropriate restriction sites for iterative assembly as described herein. It should be appreciated that any suitable technique (e.g., chemical or enzymatic) may be used to digest the nucleic acids at the appropriate sites as described herein for iterative assembly. Similarly, any suitable technique may be used for connecting nucleic acids (e.g., chemical or enzymatic ligation, for example using a suitable ligase such as T4 ligase or other ligase, or in vivo recombination as described herein for concerted assembly). In some embodiments, a sequence analysis and design strategy of the invention may be incorporated in an assembly process outlined in FIG. 5.
FIG. 5 illustrates a method for assembling a nucleic acid in accordance with one embodiment of the invention. Initially, in act 500, sequence information is obtained. The sequence information may be the sequence of a predetermined target nucleic acid that is to be assembled. In some embodiments, the sequence may be received in the form of an order from a customer. The order may be received electronically or on a paper copy. In some embodiments, the sequence may be received as a nucleic acid sequence (e.g., DNA or RNA). In some embodiments, the sequence may be received as a protein sequence. The sequence may be converted into a DNA sequence. For example, if the sequence obtained in act 500 is an RNA sequence, the Us may be replaced with Ts to obtain the corresponding DNA sequence. If the sequence obtained in act 500 is a protein sequence, it may be converted into a DNA sequence using appropriate codons for the amino acids. When choosing codons for each amino acid, consideration may be given to one or more of the following factors: i) using codons that correspond to the codon bias in the organism in which the target nucleic acid may be expressed, ii) avoiding excessively high or low GC or AT contents in the target nucleic acid (for example, above 60% or below 40%; e.g., greater than 65%, 70%, 75%, 80%, 85%, or 90%; or less than 35%, 30%, 25%, 20%, 15%, or 10%), iii) avoiding sequence features that may interfere with the assembly procedure (e.g., the presence of repeat sequences, high GC content or stem loop structures), and/or iv) avoiding recognition sequences for one or more restriction enzymes that may be used in an assembly procedure (e.g., restriction enzyme sites 1-4 illustrated in FIGs. 6-8). Similar factors may be considered when designing non-coding nucleic acid sequences. However, these factors may be ignored in some embodiments as the invention is not limited in this respect. 
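The sequence conversion and GC-content considerations of act 500 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the one-codon-per-residue table below is a small placeholder drawn from the standard genetic code and is not a real codon-usage table for any host, the 40%-60% window is taken from the example thresholds given above, and the function names are hypothetical.

```python
# Placeholder table (standard genetic code, one codon per residue). A real
# design would use a full codon-usage table for the intended host organism.
PREFERRED_CODON = {
    "M": "ATG", "A": "GCG", "K": "AAA", "L": "CTG", "G": "GGC",
    "S": "AGC", "T": "ACC", "W": "TGG", "*": "TAA",
}


def rna_to_dna(rna: str) -> str:
    """Replace every U with T to obtain the corresponding DNA sequence."""
    return rna.upper().replace("U", "T")


def back_translate(protein: str) -> str:
    """Convert a protein sequence to DNA using the placeholder codon table."""
    return "".join(PREFERRED_CODON[residue] for residue in protein.upper())


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a non-empty DNA sequence."""
    dna = dna.upper()
    return (dna.count("G") + dna.count("C")) / len(dna)


def gc_in_range(dna: str, low: float = 0.40, high: float = 0.60) -> bool:
    """True if the GC fraction lies within the window [low, high]."""
    return low <= gc_content(dna) <= high


dna = back_translate("MAK*")
print(dna, round(gc_content(dna), 3), gc_in_range(dna))
```

A fuller design routine would choose among synonymous codons to satisfy codon bias, GC content, and restriction-site constraints at the same time; this sketch only shows the individual checks.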
Also, aspects of the invention may be used to reduce errors caused by one or more of these factors. Accordingly, a DNA sequence determination (e.g., a sequence determination algorithm or an automated process for determining a target DNA sequence) may omit one or more steps relating to the analysis of the GC or AT content of the target nucleic acid sequence (e.g., the GC or AT content may be ignored in some embodiments) or one or more steps relating to the analysis of certain sequence features (e.g., sequence repeats, inverted repeats, etc.) that could interfere with an assembly reaction performed under standard conditions but may not interfere with an assembly reaction including one or more concerted assembly steps. In some embodiments, target or insert sequences may be designed or modified to remove one or more of the restriction enzyme sites that are used for the iterative assembly.
In act 510, the sequence information may be analyzed to determine an assembly strategy. This may involve determining whether the target nucleic acid will be assembled as a single fragment or if several intermediate fragments will be assembled separately and then combined in one or more additional rounds of assembly to generate the target nucleic acid.
A sequence analysis may involve deciding which fragments will be prepared to be assembled in a first vector using a vector-encoded trait activation technique of the invention. Nucleic acids being assembled may include one or more sequences that could act as activator sequences. In some embodiments, an assembly strategy may be designed to prevent these putative activator sequences (e.g., promoters, terminators, etc.) from being located on an assembly fragment at a location (e.g., at a 5' or 3' end) where they may activate a vector-encoded trait when the fragments are incorrectly assembled (e.g., inverted or cloned to the incorrect free end of a linearized vector, etc.). Accordingly, such putative activator sequences may be buried within the central regions (e.g., within about the middle 80%) of fragments that are being assembled.
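By way of non-limiting illustration, the placement rule described above (keeping a putative activator sequence within about the middle 80% of a fragment) may be checked as sketched below. The function name, the coordinate convention (0-based, end-exclusive), and the default fraction are assumptions made for this example only.

```python
def is_buried(fragment_length: int, feature_start: int, feature_end: int,
              central_fraction: float = 0.8) -> bool:
    """Return True if a feature lies entirely within the central region.

    Coordinates are 0-based and end-exclusive positions on the fragment.
    With central_fraction = 0.8, the outer 10% at each end is excluded.
    """
    margin = fragment_length * (1.0 - central_fraction) / 2.0
    return feature_start >= margin and feature_end <= fragment_length - margin
```

A fragment boundary that leaves a putative promoter or terminator outside the central region could then be shifted during design of the assembly strategy.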
A sequence analysis also may be important for choosing the restriction enzymes, activator sequences, or vector-encoded traits that will be used. For example, one or more enzymes chosen for assembly may be ones that are not present (or only present in small numbers) in the target sequence. Activator sequences and/or vector-encoded traits may be chosen so that they do not interfere with one or more functions (e.g., gene encoded functions) on the target nucleic acid. In some embodiments, ampicillin resistance may be avoided as an activatable marker if the target nucleic acid being assembled encodes beta lactamase or other enzyme that protects from (e.g., degrades or modifies) ampicillin. Once the overall assembly strategy has been determined, input nucleic acids (e.g., oligonucleotides) for assembling the one or more nucleic acid fragments may be designed. The sizes and numbers of the input nucleic acids may be based in part on the type of assembly reaction (e.g., the type of polymerase-based assembly, ligase-based assembly, chemical assembly, or combination thereof) that is being used for each fragment. The input nucleic acids also may be designed to avoid 5' and/or 3' regions that may cross-react incorrectly and be assembled to produce undesired nucleic acid fragments. Other structural and/or sequence factors also may be considered when designing the input nucleic acids. In certain embodiments, some of the input nucleic acids may be designed to incorporate one or more specific sequences (e.g., primer binding sequences, restriction enzyme sites, etc.) at one or both ends of the assembled nucleic acid fragment.
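By way of non-limiting illustration, the selection of restriction enzymes whose recognition sequences are absent from (or rare in) the target sequence may be sketched as follows. The function names and the short enzyme list are assumptions for this example. The recognition sequences shown are the commonly documented ones for these enzymes, and both strands are searched by also scanning for the reverse complement.

```python
# Illustrative sketch only; names and the enzyme list are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

EXAMPLE_ENZYMES = {
    "BbsI": "GAAGAC",
    "BsaI": "GGTCTC",
    "EcoRI": "GAATTC",
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def count_sites(target: str, site: str) -> int:
    """Count occurrences of a recognition site on either strand."""
    target = target.upper()
    # A set avoids double counting palindromic sites such as GAATTC.
    patterns = {site.upper(), reverse_complement(site)}
    total = 0
    for pattern in patterns:
        start = target.find(pattern)
        while start != -1:
            total += 1
            start = target.find(pattern, start + 1)
    return total


def rank_enzymes(target: str, enzymes: dict) -> list:
    """Return enzyme names sorted by site count in the target, fewest first."""
    return sorted(enzymes, key=lambda name: count_sites(target, enzymes[name]))
```

Enzymes at the head of the ranked list (ideally with a count of zero) would be candidates for use in the assembly; alternatively, the target sequence could be modified to remove the sites.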
In act 520, the input nucleic acids are obtained. These may be synthetic oligonucleotides that are synthesized on-site or obtained from a different site (e.g., from a commercial supplier). In some embodiments, one or more input nucleic acids may be amplification products (e.g., PCR products), restriction fragments, or other suitable nucleic acid molecules. Synthetic oligonucleotides may be synthesized using any appropriate technique as described in more detail herein. It should be appreciated that synthetic oligonucleotides often have sequence errors. Accordingly, oligonucleotide preparations may be selected or screened to remove error-containing molecules as described in more detail herein. In act 530, an assembly reaction may be performed for each nucleic acid fragment. For each fragment, the input nucleic acids may be assembled using any appropriate assembly technique (e.g., a polymerase-based assembly, a ligase-based assembly, a chemical assembly, or any other multiplex nucleic acid assembly technique, or any combination thereof). An assembly reaction may result in the assembly of a number of different nucleic acid products in addition to the predetermined nucleic acid fragment. Accordingly, in some embodiments, an assembly reaction may be processed to remove incorrectly assembled nucleic acids (e.g., by size fractionation) and/or to enrich correctly assembled nucleic acids (e.g., by amplification, optionally followed by size fractionation). In some embodiments, correctly assembled nucleic acids may be amplified (e.g., in a PCR reaction) using primers that bind to the ends of the predetermined nucleic acid fragment. It should be appreciated that act 530 may be repeated one or more times. For example, in a first round of assembly a first plurality of input nucleic acids (e.g., oligonucleotides) may be assembled to generate a first nucleic acid fragment. 
In a second round of assembly, the first nucleic acid fragment may be combined with one or more additional nucleic acid fragments and used as starting material for the assembly of a larger nucleic acid fragment. In a third round of assembly, this larger fragment may be combined with yet further nucleic acids and used as starting material for the assembly of yet a larger nucleic acid. This procedure may be repeated as many times as needed for the synthesis of a target nucleic acid. Accordingly, progressively larger nucleic acids may be assembled. At each stage, nucleic acids of different sizes may be combined. At each stage, the nucleic acids being combined may have been previously assembled in a multiplex assembly reaction. However, at each stage, one or more nucleic acids being combined may have been obtained from different sources (e.g., PCR amplification of genomic DNA or cDNA, restriction digestion of a plasmid or genomic DNA, or any other suitable source).
One or more cycles of assembly may be performed using a vector-encoded trait-activation technique described herein.
It should be appreciated that nucleic acids generated in each cycle of assembly may contain sequence errors if they incorporated one or more input nucleic acids with sequence error(s). Accordingly, a fidelity optimization procedure may be performed after a cycle of assembly in order to remove or correct sequence errors. It should be appreciated that fidelity optimization may be performed after each assembly reaction when several consecutive cycles of assembly are performed. However, in certain embodiments fidelity optimization may be performed only after a subset (e.g., 2 or more) of consecutive assembly reactions are complete. In some embodiments, no fidelity optimization is performed. Accordingly, act 540 is an optional fidelity optimization procedure. Act 540 may be used in some embodiments to remove nucleic acid fragments that seem to be correctly assembled (e.g., based on their size or restriction enzyme digestion pattern) but that may have incorporated input nucleic acids containing sequence errors as described herein. For example, since synthetic oligonucleotides may contain incorrect sequences due to errors introduced during oligonucleotide synthesis, it may be useful to remove nucleic acid fragments that have incorporated one or more error-containing oligonucleotides during assembly. In some embodiments, one or more assembled nucleic acid fragments may be sequenced to determine whether they contain the predetermined sequence or not. This procedure allows fragments with the correct sequence to be identified. However, in some embodiments, other techniques may be used to remove error containing nucleic acid fragments. 
It should be appreciated that error-containing nucleic acids may be double-stranded homoduplexes having the error on both strands (i.e., incorrect complementary nucleotide(s), deletion(s), or addition(s) on both strands), because the assembly procedure may involve one or more rounds of polymerase extension (e.g., during assembly or after assembly to amplify the assembled product) during which an input nucleic acid containing an error may serve as a template, thereby producing a complementary strand with the complementary error. In certain embodiments, a preparation of double-stranded nucleic acid fragments may be suspected to contain a mixture of nucleic acids that have the correct sequence and nucleic acids that incorporated one or more sequence errors during assembly. In some embodiments, sequence errors may be removed using a technique that involves denaturing and reannealing the double-stranded nucleic acids. In some embodiments, single strands of nucleic acids that contain complementary errors may be unlikely to reanneal together if nucleic acids containing each individual error are present in the nucleic acid preparation at a lower frequency than nucleic acids having the correct sequence at the same position. Rather, error-containing single strands may reanneal with a complementary strand that contains no errors or that contains one or more different errors. As a result, error-containing strands may end up in the form of heteroduplex molecules in the reannealed reaction product. Nucleic acid strands that are error-free may reanneal with error-containing strands or with other error-free strands. Reannealed error-free strands form homoduplexes in the reannealed sample. Accordingly, by removing heteroduplex molecules from the reannealed preparation of nucleic acid fragments, the amount or frequency of error-containing nucleic acids may be reduced. Any suitable method for removing heteroduplex molecules may be used, including chromatography, electrophoresis, selective binding of heteroduplex molecules, etc. In some embodiments, mismatch binding proteins that selectively (e.g., specifically) bind to heteroduplex nucleic acid molecules may be used. One example includes using MutS, a MutS homolog, or a combination thereof to bind to heteroduplex molecules. In E. coli, the MutS protein, which appears to function as a homodimer, serves as a mismatch recognition factor. In eukaryotes, at least three MutS Homolog (MSH) proteins have been identified; namely, MSH2, MSH3, and MSH6, and they form heterodimers. For example, in the yeast Saccharomyces cerevisiae, the MSH2-MSH6 complex (also known as MutSα) recognizes base mismatches and single nucleotide insertion/deletion loops, while the MSH2-MSH3 complex (also known as MutSβ) recognizes insertions/deletions of up to 12-16 nucleotides, although they exert substantially redundant functions. A mismatch binding protein may be obtained from recombinant or natural sources. A mismatch binding protein may be heat-stable. In some embodiments, a thermostable mismatch binding protein from a thermophilic organism may be used. Examples of thermostable DNA mismatch binding proteins include, but are not limited to: Tth MutS (from Thermus thermophilus); Taq MutS (from Thermus aquaticus); Apy MutS (from Aquifex pyrophilus); Tma MutS (from Thermotoga maritima); any other suitable MutS; or any combination of two or more thereof.
According to aspects of the invention, protein-bound heteroduplex molecules
(e.g., heteroduplex molecules bound to one or more MutS proteins) may be removed from a sample using any suitable technique (binding to a column, a filter, a nitrocellulose filter, etc., or any combination thereof). It should be appreciated that this procedure may not be 100% efficient. Some errors may remain for at least one of the following reasons. Depending on the reaction conditions, not all of the double-stranded error-containing nucleic acids may be denatured. In addition, some of the denatured error-containing strands may reanneal with complementary error-containing strands to form an error containing homoduplex. Also, the MutS/heteroduplex interaction and the MutS/heteroduplex removal procedures may not be 100% efficient. Accordingly, in some embodiments the fidelity optimization act 540 may be repeated one or more times after each assembly reaction. For example, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more cycles of fidelity optimization may be performed after each assembly reaction. In some embodiments, the nucleic acid is amplified after each fidelity optimization procedure. It should be appreciated that each cycle of fidelity optimization will remove additional error-containing nucleic acid molecules. However, the proportion of correct sequences is expected to reach a saturation level after a few cycles of this procedure.
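By way of non-limiting illustration, the effect of repeated denaturation, reannealing, and heteroduplex removal may be explored with a simple toy model. This sketch rests on assumptions that are not part of the disclosure: strands pair at random, every duplex containing at least one error-containing strand is a heteroduplex (i.e., two strands bearing the same complementary error never reanneal, and denaturation is complete), and heteroduplexes are removed with a fixed efficiency. The function names and numerical values are hypothetical.

```python
def enrich_once(p_correct: float, removal_efficiency: float) -> float:
    """Return the fraction of error-free strands after one cycle.

    p_correct: fraction of error-free strands before the cycle.
    removal_efficiency: fraction of heteroduplexes removed (0.0 to 1.0).
    """
    p = p_correct
    keep = 1.0 - removal_efficiency  # fraction of heteroduplexes that escape
    # Error-free strands kept: all correct/correct homoduplexes, plus the
    # correct strand of each escaping correct/error heteroduplex.
    correct_kept = p * p + keep * p * (1.0 - p)
    # All strands kept: homoduplexes plus every escaping heteroduplex.
    total_kept = p * p + keep * (1.0 - p * p)
    return correct_kept / total_kept


def enrich(p_correct: float, removal_efficiency: float, cycles: int) -> list:
    """Return the error-free fraction before and after each cycle."""
    history = [p_correct]
    for _ in range(cycles):
        history.append(enrich_once(history[-1], removal_efficiency))
    return history
```

Under these assumptions, a call such as `enrich(0.5, 0.8, 5)` yields a sequence in which each cycle removes additional error-containing molecules but the gain per cycle shrinks, which is consistent with the expectation stated above that the proportion of correct sequences levels off after a few cycles.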
In some embodiments, the size of an assembled nucleic acid that is fidelity optimized (e.g., using MutS or a MutS homolog) may be determined by the expected number of sequence errors that are suspected to be incorporated into the nucleic acid during assembly. For example, an assembled nucleic acid product should include error free nucleic acids prior to fidelity optimization in order to be able to enrich for the error free nucleic acids. Accordingly, error screening (e.g., using MutS or a MutS homolog) should be performed on shorter nucleic acid fragments when input nucleic acids have higher error rates. In some embodiments, one or more nucleic acid fragments of between about 200 and about 800 nucleotides (e.g., about 200, about 300, about 400, about 500, about 600, about 700 or about 800 nucleotides in length) are assembled prior to fidelity optimization. After assembly, the one or more fragments may be exposed to one or more rounds of fidelity optimization as described herein. In some embodiments, several assembled fragments may be ligated together (e.g., to produce a larger nucleic acid fragment of between about 1,000 and about 5,000 bases in length, or larger), and optionally cloned into a vector, prior to fidelity optimization as described herein. At act 550, an output nucleic acid is obtained. As discussed herein, several rounds of act 530 and/or 540 may be performed to obtain the output nucleic acid, depending on the assembly strategy that is implemented. The output nucleic acid may be amplified, cloned, stored, etc., for subsequent uses at act 560. In some embodiments, an output nucleic acid may be cloned with one or more other nucleic acids (e.g., other output nucleic acids) for subsequent applications. Subsequent applications may include one or more research, diagnostic, medical, clinical, industrial, therapeutic, environmental, agricultural, or other uses.
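By way of non-limiting illustration, the relationship noted above between the error rate of the input nucleic acids and the fragment length suitable for fidelity optimization may be estimated as sketched below. The sketch assumes, purely for illustration, that errors occur independently at a uniform per-base rate, so that the expected fraction of error-free molecules of length L is (1 - r)^L. The function names and the example rates are hypothetical and are not values taken from this disclosure.

```python
import math


def error_free_fraction(length_nt: int, per_base_error_rate: float) -> float:
    """Expected fraction of molecules of the given length with no errors."""
    return (1.0 - per_base_error_rate) ** length_nt


def max_length_for_fraction(min_fraction: float,
                            per_base_error_rate: float) -> int:
    """Longest fragment whose expected error-free fraction is at least
    min_fraction, under the independent-error assumption."""
    return math.floor(math.log(min_fraction) /
                      math.log(1.0 - per_base_error_rate))
```

In this simplified picture, a higher per-base error rate lowers the maximum length returned, which corresponds to performing error screening on shorter fragments when the input nucleic acids have higher error rates.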
It should be appreciated that each nucleic acid assembly may involve a combination of one or more extension, ligation, and/or cloning procedures. For example, in one embodiment, a target nucleic acid may be assembled entirely in vitro using multiplex extension reactions, ligation reactions, or a combination thereof. The resulting target nucleic acid product then may be transformed into a host cell (e.g., after insertion into a vector) for subsequent growth and amplification. However, in certain embodiments, a target nucleic acid may be assembled from a plurality of intermediate nucleic acids (e.g., shorter nucleic acids that will be combined to form the final target nucleic acid product) that have been inserted into vectors and amplified in vivo in a host cell. In some embodiments, a target nucleic acid assembly may involve preparing a first plurality of intermediate nucleic acids (e.g., using an in vitro multiplex assembly reaction for each intermediate nucleic acid), cloning each of the first plurality of intermediate nucleic acids into a vector for amplification in a host cell, isolating each of the first plurality of intermediate nucleic acids after amplification in the host cell, and assembling the first plurality of intermediate nucleic acids (e.g., via ligation) to obtain the target nucleic acid. This final assembly step may include cloning into an appropriate vector so that the target nucleic acid can be grown and amplified in an appropriate host cell.
In some embodiments, assembly of a target nucleic acid may involve several cycles of intermediate cloning. For example, a first plurality of intermediates may be cloned into an appropriate vector and amplified in a host cell. Subsets of the first plurality of intermediates (e.g., pairs of nucleic acids, or groups of 3, 4, 5, 6, 7, 8, 9, 10 or more intermediate nucleic acids from the first plurality) may subsequently be combined and assembled (e.g., by ligation) to form a second plurality of intermediates that are longer than the first plurality of intermediates (each of the second plurality of intermediates is assembled from one subset of the first plurality of intermediates described above). This second plurality of intermediates also may be cloned into an appropriate vector and amplified in a host cell. This second plurality of intermediates may be assembled directly to form the final target nucleic acid. Alternatively, this second plurality of intermediates may be cycled through one or more additional intermediate assembly procedures (e.g., forming third, fourth, fifth, sixth, or more pluralities of progressively longer intermediates) before a final nucleic acid is assembled. Each of the first plurality of intermediates may be generated by ligation or extension (e.g., in an in vitro multiplex nucleic acid assembly reaction). The decision to further assemble the first plurality of intermediates using one or more cycles of cloning and amplification in host cells may be based on the properties of the intermediates (e.g., predicted or actual difficulties in further assembling them using only in vitro reactions). It should be appreciated overall that assembly time may be reduced by avoiding intermediate cloning steps that involve cell growth. Accordingly, in some embodiments, in vitro assembly techniques alone are used to generate a final nucleic acid product that subsequently may be cloned and propagated in a host cell. 
However, in some embodiments, one or more intermediates that are difficult to assemble correctly in an in vitro multiplex assembly reaction may be more readily assembled and amplified by cloning into a vector and transforming into a host cell. For example, the presence of one or more direct or inverted repeats, high GC content, etc., may cause assembly errors in an in vitro multiplex assembly reaction that can be avoided using one or more rounds of vector cloning and in vivo amplification. In some embodiments, nucleic acid size also may determine whether vector cloning and in vivo amplification are used for further assembly. In some embodiments, nucleic acids that are longer than about 1.5 kb may be further assembled using vector cloning and in vivo amplification. In some embodiments, nucleic acids may be predicted to be difficult to assemble using in vitro multiplex assembly (e.g., due to the presence of one or more sequence features predicted to interfere with in vitro multiplex assembly). In some embodiments, nucleic acids may be experimentally determined to be difficult to assemble correctly using in vitro multiplex assembly (e.g., a correct final product is not generated). In any of these situations, one or more assembly steps involving cloning and host cell transformation may be used to obtain a correct product of interest. Accordingly, an assembly strategy may be designed to provide an integrated overlapping enzyme system (also known as ION) that provides one or more intermediate cloning and host cell transformation cycles that may be combined with in vitro multiplex assembly steps. In some embodiments, this assembly is hierarchical. A first plurality of first intermediates may be generated by any suitable method (e.g., ligation, extension, etc., or a combination thereof). Each first intermediate may be cloned into a first vector and amplified in a host cell preparation. 
These first intermediates then may be grouped together into subsets and the intermediates in each subset may be assembled and cloned into a second vector in a second cloning step. The intermediates in each subset correspond to adjacent sequences in the target nucleic acid. This second cloning step generates a second plurality of intermediates with a smaller number of larger intermediates than the first plurality. The ratio of the numbers of intermediates in the first and second pluralities is related to the number of first intermediates that are cloned together in each subset during the second cloning step. For example, if subsets of N first intermediates are combined and cloned together in the second cloning step, the number of intermediates in the second plurality will be 1/N of the number of intermediates in the first plurality. However, it should be appreciated that different subsets may contain different numbers of first intermediates that are cloned together during the second cloning step. This cycle may be repeated one or more times (e.g., subsets of the second plurality of intermediates may be assembled in a third cloning step to generate a third plurality of intermediates, etc.) until a final single product is generated. For example, FIG. 9 illustrates a non-limiting embodiment of an integrated overlapping enzyme cloning strategy where nine first intermediates each approximately 0.5 kb in length are assembled (e.g., using a multiplex ligase or polymerase assembly), cloned into vectors, and transformed into host cells. Subsets of three first intermediates are then cloned together in a second cloning step to generate three second intermediates each approximately 1.5 kb in length. In a third cloning step, the three second intermediates are cloned together to generate a full length target nucleic acid approximately 4.5 kb in length.
It should be appreciated that the sizes of the intermediates that are used, the number of intermediates that are cloned together in each cycle, the number of cycles, and the length of the final product may vary, as the invention is not limited in this respect. In some embodiments, intermediates of about 1.5 kb in length are generated (e.g., in a polymerase-based in vitro multiplex assembly) and further assembled by cloning and host cell transformation.
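By way of non-limiting illustration, the hierarchical grouping described above may be sketched as follows. The function name and the representation of intermediates as a list of lengths (in base pairs) are assumptions made for this example. Adjacent intermediates are grouped into subsets of a fixed size at each cloning step, with a smaller final subset permitted when the count does not divide evenly.

```python
def plan_hierarchy(first_lengths: list, subset_size: int) -> list:
    """Return the intermediate lengths at each stage of a hierarchical assembly.

    first_lengths: lengths (bp) of the first intermediates, in target order.
    subset_size: number of adjacent intermediates combined per cloning step.
    """
    if subset_size < 2:
        raise ValueError("subset_size must be at least 2")
    stages = [list(first_lengths)]
    while len(stages[-1]) > 1:
        current = stages[-1]
        # Combine adjacent intermediates; the last subset may be smaller.
        merged = [sum(current[i:i + subset_size])
                  for i in range(0, len(current), subset_size)]
        stages.append(merged)
    return stages
```

For example, `plan_hierarchy([500] * 9, 3)` reproduces the scheme described for FIG. 9: nine intermediates of about 0.5 kb, then three of about 1.5 kb, then a single product of about 4.5 kb.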
It should be appreciated that fidelity optimization (e.g., by error removal using a mismatch recognition protein, for example, MutS) may be performed at any one or more stages during the assembly. For example, fidelity optimization of the first intermediates may be performed (e.g., before or after they are cloned into the first vectors).
The cloning vectors that are used at each stage may be identical. However, different vectors may be used for different cloning reactions. For example, each cloning reaction may use a different vector. The different vectors may have different selectable markers. The different vectors may have different copy numbers. The different vectors may be adapted for inserts of different lengths. For example, vectors that are more suited for large inserts may be used at later stages in an assembly. In some embodiments, two different vectors (e.g., with different selectable markers) may be used and alternated in sequential cloning steps.
One or more assembly steps may be automated (e.g., using a robotic handler or a microfluidic device). Automation may be facilitated by avoiding fragment isolation (e.g., based on electrophoretic size separation) during one or more cloning steps associated with any stage of assembly described herein. In some embodiments, two or more first fragments (e.g., different first fragments) may be removed from two or more first vectors and cloned together into a second vector in a single reaction mixture that comprises one or more restriction enzymes and one or more ligases. Transfer of nucleic acid fragment inserts from the first vectors to the second vector may be promoted in a single reaction mixture containing the first and second vectors, a ligase, and one or more restriction enzymes. For example, if the restriction enzyme(s) excise the fragments from the first vector and the ligase ligates the fragments into the second vector in a form that is not cleaved by the restriction enzyme(s), the reaction is driven towards fragment assembly in the second vector because this integration is not reversed by the restriction enzyme. Selection for fragment integration into the second vector may be performed by using different selectable markers on each of the two vectors (e.g., ampicillin resistance on the first vector and chloramphenicol resistance on the second vector). After simultaneous digestion and ligation, the reaction mixture may be transformed into a host cell that is then exposed to an appropriate selection. Any combination of different selectable markers and selections may be used if it enables the second vector containing the assembled first fragments to be selected over the first vectors. In the assembly reaction, the second vector may be provided in a linear form with incompatible free ends to avoid vector re-ligation that would generate a background of empty vectors having the second selectable marker.
It should be appreciated that type IIS restriction enzymes may be used to generate appropriate insert fragments from the first vectors. The type IIS sites may be located on a first vector on both sides of a fragment being excised. The type IIS sites may be oriented such that the excised fragment does not contain the type IIS sites. As a result, the type IIS sites are not present in the second vector after fragment integration. It should be appreciated that the backbone of the second vector may be designed, selected, or modified to avoid containing any of the type IIS restriction sites that are used to excise the first fragments from the first vector. Each of the first fragments in the first vectors may be flanked by the same type IIS restriction site to allow excision of all of the fragments using the same enzyme. In some embodiments, different type IIS sites and enzymes may be used to excise fragments from different first vectors. However, they are preferably selected to generate appropriate compatible ends (e.g., complementary overhangs) so that the excised fragments can be ligated together without requiring any further processing. In one embodiment, three first vectors contain different fragments (a, b, and c) flanked by BbsI sites. The first vectors all encode ampicillin resistance. The first vectors are incubated in a single reaction along with BbsI, a ligase (e.g., T4 ligase), and a second vector that encodes chloramphenicol resistance. The second vector may be linearized to generate free ends, each one of which is compatible with a free end of one of the first fragments. After transformation of the digestion/ligation reaction into a chloramphenicol sensitive host cell, a correctly ligated vector containing fragments a, b, and c in the correct order may be selected for using chloramphenicol.
However, it should be appreciated that the method may be used with different restriction sites and enzymes, different ligases, different vectors with different selectable markers, and different numbers of inserts (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 10-20, or more), that may be assembled in a single concerted reaction according to the invention. This process may be repeated through several cycles by using vectors encoding different selectable markers at each cycle. In some embodiments, a pair of vectors encoding different selectable markers may be used. Each vector is used as the second (receiving) vector in alternate cycles.
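By way of non-limiting illustration, excision of an insert flanked by outward-cutting type IIS sites may be modeled as sketched below. The sketch assumes the commonly documented BbsI specificity (recognition sequence GAAGAC, cutting 2 nucleotides downstream on the top strand and 6 nucleotides downstream on the bottom strand, leaving a 4-nucleotide 5' overhang). It further assumes a simplified representation in which the first vector is given as a linear top-strand string with exactly one forward-oriented site to the left of the insert and one reverse-oriented site to the right. The function names are hypothetical.

```python
# Illustrative sketch only; names and the linear-string model are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

BBSI_SITE = "GAAGAC"      # recognition sequence (top strand, forward)
TOP_CUT_OFFSET = 2        # top-strand cut, nucleotides past the site
BOTTOM_CUT_OFFSET = 6     # bottom-strand cut, nucleotides past the site


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def excise_insert(first_vector: str) -> tuple:
    """Return (left_overhang, fragment, right_overhang) as top-strand text.

    The fragment spans both overhangs; the recognition sites stay behind
    on the first vector because each enzyme cuts outside its own site.
    """
    seq = first_vector.upper()
    left = seq.find(BBSI_SITE)
    right = seq.rfind(reverse_complement(BBSI_SITE))
    if left == -1 or right == -1 or right < left:
        raise ValueError("expected one flanking site on each side of the insert")
    start = left + len(BBSI_SITE) + TOP_CUT_OFFSET
    end = right - TOP_CUT_OFFSET
    fragment = seq[start:end]
    overhang_len = BOTTOM_CUT_OFFSET - TOP_CUT_OFFSET
    return fragment[:overhang_len], fragment, fragment[-overhang_len:]
```

Because the excised fragment carries neither recognition site, its ligation into the second vector is not reversed by the enzyme, which is the property relied upon above to drive the concerted digestion/ligation reaction toward the assembled product.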
It should be appreciated that a quality control procedure may be performed at one or more steps in a multi-stage assembly of the invention. For example, an ION assembly may involve a quality control at one or more intermediate stages. In some embodiments, quality control may be performed at each intermediate stage. A quality control procedure may include one or more techniques designed to distinguish incorrectly assembled intermediates from correctly assembled intermediates. For example, a quality control procedure may include sequencing, amplification (e.g., by PCR, LCR, etc.), restriction enzyme digestion, size analysis (e.g., using electrophoresis, mass spectrometry, etc.), any other suitable quality control technique, or any combination of two or more thereof. One advantage of real-time quality control during a multi-stage assembly is the early identification of one or more incorrectly assembled intermediates before the final product is generated and analyzed. An incorrectly assembled intermediate can be re-synthesized or re-assembled in a correct format and then re-introduced into an assembly process at an appropriate stage to be incorporated into a final nucleic acid product. In some embodiments, an incorrect assembly may be indicative of the presence, in the intermediate nucleic acid, of certain sequences that are difficult to assemble. Certain sequences may be difficult to assemble because they contain sequences that are unstable (e.g., because they are toxic, they contain certain direct or inverted sequence repeats, etc.). However, certain sequences may be difficult to assemble due to sequence features that interfere with an assembly reaction (e.g., because they contain certain direct or inverted sequence repeats, they contain high percentages of certain bases, for example they have a high GC content, etc.).
In some embodiments, one or more alternative assembly techniques may be used to generate an intermediate nucleic acid that was incorrectly assembled using a first assembly technique. For example, a different vector and/or a different host organism may be used. Different assembly methods (e.g., extension, ligation, or a combination thereof) and/or different starting nucleic acids (e.g., different oligonucleotides, etc.) may be used. In some embodiments, two or more smaller fragments of an intermediate that was incorrectly assembled may be prepared. This may identify a smaller region that contains challenging sequences that can then be assembled using one or more alternative techniques. It should be appreciated that for some sequences, a correctly assembled nucleic acid (e.g., a correctly assembled intermediate) may be obtained without using alternative assembly techniques, but instead by screening a larger number of potential constructs (e.g., clones) to identify a correct one.
Concerted Assembly
According to aspects of the invention, a plurality of nucleic acid fragments may be assembled in a single concerted procedure wherein the plurality of fragments is mixed together under conditions that promote covalent assembly of the fragments to generate a specific longer nucleic acid. In some embodiments, concerted assembly techniques may be used in combination with iterative assembly techniques described herein (e.g., at different stages of an assembly process; or more than two inserts, for example 3, 4, 5, 6, 7, 8, 9, 10, or more, may be added at each step of an iterative assembly described herein, wherein only the outer inserts have the activation sequences). According to aspects of the invention, a plurality of nucleic acid fragments may be covalently assembled in vivo in a host cell. In some embodiments, a plurality of nucleic acid fragments (e.g., n different nucleic acid fragments) may be mixed together without ligase and transformed into a host cell where they are covalently joined together to produce a longer nucleic acid (e.g., containing the n different nucleic acid fragments covalently linked together). However, a ligase and/or recombinase may be used in some embodiments (e.g., added to a plurality of nucleic acid fragments prior to a host cell transformation). In some embodiments, 5 or more (e.g., 10 or more, 15 or more, 15 to 20, 20 to 25, 25 to 30, 30 to 35, 35 to 40, 40 to 45, 45 to 50, 50 or more, etc.) different nucleic acid fragments may be assembled (e.g., in a concerted in vivo assembly without using ligase). However, it should be appreciated that any number of nucleic acids (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, etc.) may be assembled using concerted assembly techniques. Each nucleic acid fragment being assembled may be between about 100 nucleotides long and about 1,000 nucleotides long (e.g., about 200, about 300, about 400, about 500, about 600, about 700, about 800, about 900).
However, longer (e.g., about 2,500 or more nucleotides long, about 5,000 or more nucleotides long, about 7,500 or more nucleotides long, about 10,000 or more nucleotides long, etc.) or shorter nucleic acid fragments may be assembled using a concerted assembly technique (e.g., shotgun assembly into a plasmid vector). It should be appreciated that the size of each nucleic acid fragment may be independent of the size of other nucleic acid fragments added to a concerted assembly. However, in some embodiments, each nucleic acid fragment may be approximately the same size (e.g., between about 400 nucleotides long and about 800 nucleotides long). It should be appreciated that the length of a double-stranded DNA fragment may be indicated by the number of base pairs. As used herein, a nucleic acid fragment referred to as "x" nucleotides long corresponds to "x" base pairs in length when used in the context of a double-stranded DNA fragment.
In some embodiments, one or more nucleic acids being assembled in a concerted reaction (e.g., 1-5, 5-10, 10-15, 15-20, etc.) may be codon-optimized and/or non-naturally occurring. In some embodiments, all of the nucleic acids being assembled in a concerted reaction are codon-optimized and/or non-naturally occurring. In some aspects of the invention, nucleic acid fragments being assembled are designed to have overlapping complementary sequences. In some embodiments, the nucleic acid fragments are double-stranded DNA fragments with 3' and/or 5' single-stranded overhangs. These overhangs may be cohesive ends that can anneal to complementary cohesive ends on different DNA fragments. According to aspects of the invention, the presence of complementary sequences (and particularly complementary cohesive ends) on two DNA fragments promotes their covalent assembly in vivo. In some embodiments, a plurality of DNA fragments with different overlapping complementary single-stranded cohesive ends are assembled and their order in the assembled nucleic acid product is determined by the identity of the cohesive ends on each fragment. For example, the nucleic acid fragments may be designed so that a first nucleic acid has a first cohesive end that is complementary to a first cohesive end of the vector and a second cohesive end that is complementary to a first cohesive end of a second nucleic acid. The second cohesive end of the second nucleic acid may be complementary to a first cohesive end of a third nucleic acid. The second cohesive end of the third nucleic acid may be complementary to a first cohesive end of a fourth nucleic acid. And so on through to the final nucleic acid that has a first cohesive end that may be complementary to a second cohesive end on the penultimate nucleic acid. The second cohesive end of the final nucleic acid may be complementary to a second cohesive end of the vector.
According to aspects of the invention, this technique may be used to generate a vector containing nucleic acid fragments assembled in a predetermined linear order (e.g., first, second, third, fourth, ..., final).
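By way of non-limiting illustration, the ordering principle described above (each fragment's first cohesive end pairing with the second cohesive end of its predecessor, and the terminal fragments pairing with the vector) may be sketched computationally. The following Python sketch is an assumption-laden illustration and not part of the described method: the function names, the representation of each cohesive end as a 5'->3' string, and the example sequences are all hypothetical.

```python
# Hypothetical sketch: deduce the linear order of fragments from their cohesive ends.
# Assumption: each overhang is written 5'->3', and two overhangs are treated as
# complementary when one is the reverse complement of the other.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def assembly_order(vector_ends, fragments):
    """Chain fragments starting from the vector's first cohesive end.

    vector_ends: (first_end, second_end) of the linearized vector.
    fragments:   dict mapping a fragment name to (first_end, second_end).
    Returns (ordered_names, complete), where `complete` is True only if every
    fragment was used and the last fragment pairs with the vector's second end.
    """
    by_first_end = {ends[0]: name for name, ends in fragments.items()}
    if len(by_first_end) != len(fragments):
        raise ValueError("first cohesive ends are not unique; order is ambiguous")

    order = []
    needed = reverse_complement(vector_ends[0])
    while needed in by_first_end:
        name = by_first_end.pop(needed)
        order.append(name)
        needed = reverse_complement(fragments[name][1])

    complete = bool(order) and not by_first_end and needed == vector_ends[1]
    return order, complete


# Illustrative (made-up) cohesive ends for a vector and three fragments.
vector = ("ACGTA", "AACTG")
fragments = {
    "third": (reverse_complement("TTGCA"), reverse_complement("AACTG")),
    "first": (reverse_complement("ACGTA"), "GGATC"),
    "second": (reverse_complement("GGATC"), "TTGCA"),
}
print(assembly_order(vector, fragments))  # (['first', 'second', 'third'], True)
```

In this sketch a palindromic overhang would pair with itself, which is one reason distinct, non-palindromic cohesive ends may be preferable when a unique order is desired.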
In certain embodiments, the overlapping complementary regions between adjacent nucleic acid fragments are designed (or selected) to be sufficiently different to promote (e.g., thermodynamically favor) assembly of a unique alignment of nucleic acid fragments (e.g., a selected or designed alignment of fragments). It should be appreciated that overlapping regions of different length may be used. In some embodiments, longer cohesive ends may be used when higher numbers of nucleic acid fragments are being assembled. Longer cohesive ends may provide more flexibility to design or select sufficiently distinct sequences to discriminate between correct cohesive end annealing (e.g., involving cohesive ends designed to anneal to each other) and incorrect cohesive end annealing (e.g., between non-complementary cohesive ends).
In some embodiments, two or more pairs of complementary cohesive ends between different nucleic acid fragments may be designed or selected to have identical or similar sequences in order to promote the assembly of products containing a relatively random arrangement (and/or number) of the fragments that have similar or identical cohesive ends. This may be useful to generate libraries of nucleic acid products with different sequence arrangements and/or different copy numbers of certain internal sequence regions.
As illustrated above, each of the two terminal nucleic acid fragments (e.g., the terminal fragment at each end of an assembled product) may be designed to have a cohesive end that is complementary to a cohesive end on a vector (e.g., on a linearized vector). These cohesive ends may be identical cohesive ends that can anneal to identical complementary terminal sequences on a linearized vector. However, in some embodiments the cohesive ends on the terminal fragments are different and the vector contains two different cohesive ends (one at each end of a linearized vector), each complementary to one of the terminal fragment cohesive ends. Accordingly, the vector may be a linearized plasmid that has two cohesive ends, each of which is complementary with one end of the assembled nucleic acid fragments.
In some embodiments, the nucleic acid fragments are mixed with a vector and incubated before transformation into a host cell. It should be appreciated that incubation under conditions that promote specific annealing of the cohesive ends may increase the frequency of assembly (e.g., correct assembly) upon transformation into the host organism. In some embodiments, the different cohesive ends are designed to have similar melting temperatures (e.g., within about 5 °C of each other) so that correct annealing of all of the fragments is promoted under the same conditions. Correct annealing may be promoted at a different temperature depending on the length of the cohesive ends that are used. In some embodiments, cohesive ends of between about 4 and about 30 nucleotides in length (e.g., cohesive ends of about 5, about 10, about 15, about 20, about 25, or about 30 nucleotides in length) may be used. Incubation temperatures may range from about 20 °C to about 50 °C (including, e.g., 37 °C). However, higher or lower temperatures may be used. The length of the incubation may be optimized based on the length of the overhangs, the complexity of the overhangs, and the number of different nucleic acids (and therefore the number of different overhangs) that are mixed together. The incubation time also may depend on the annealing temperature and the presence or absence of other agents in the mixture. For example, a nucleic acid binding protein and/or a recombinase may be added (e.g., RecA, for example a heat stable RecA protein). The resulting complex of nucleic acids may be transformed directly into a host without using a ligase. One or more host functions (e.g., ligation, recombination, any other suitable function, or any combination thereof) then form the covalently linked structure. In some embodiments, a ligase may be added prior to transformation.
However, it should be appreciated that the expense of a ligase (including, for example, the expense of storing and dispensing the ligase, e.g., automatically) may be avoided by using a ligase-free concerted assembly method of the invention.
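By way of non-limiting illustration, the design goal of giving the different cohesive ends similar melting temperatures (e.g., within about 5 °C of each other) may be checked with a simple estimate. The following Python sketch assumes the Wallace rule (2 °C per A or T, 4 °C per G or C), which is only a rough approximation for short oligonucleotides and is not specified in this description; the function names and example sequences are hypothetical, and a nearest-neighbor model may be more appropriate in practice.

```python
# Hypothetical sketch: compare rough melting temperatures of cohesive ends.
# Assumption: the Wallace rule, Tm ~ 2*(A+T) + 4*(G+C) in degrees C, is an
# acceptable first-pass estimate for short overhangs.


def wallace_tm(seq):
    """Rough melting temperature (degrees C) of a short oligonucleotide."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def tm_spread_ok(overhangs, max_spread=5):
    """True if all estimated melting temperatures lie within max_spread degrees."""
    tms = [wallace_tm(s) for s in overhangs]
    return max(tms) - min(tms) <= max_spread


print(wallace_tm("ATGCATGCAT"))                    # 28
print(tm_spread_ok(["ATGCATGCAT", "ATGCATGCAA"]))  # True
print(tm_spread_ok(["ATGCATGCAT", "GGCCGGCCGG"]))  # False
```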
In some embodiments, nucleic acid fragments and a vector are transformed into a host cell without any prior incubation period (other than the time required for mixing the nucleic acids and performing the transformation). In some embodiments, a recombinase (for example RecA, e.g., a thermostable RecA) and/or a nucleic acid binding protein may be mixed with the nucleic acid fragments and the vector, and optionally incubated, prior to transformation into a host cell.
It should be appreciated that a plurality of nucleic acid fragments being assembled all may have complementary 3' overhangs, complementary 5' overhangs, or a combination thereof. However, the complementary regions of two nucleic acid fragments that are designed to be adjacent should have the same type of overhang. For example, if nucleic acid "n" has a 5' overhang at its second end, then nucleic acid "n+1" should have a 5' overhang at its first end. However, nucleic acid "n+1" may have a 3' overhang at its second end if nucleic acid "n+2" has a 3' overhang at its first end. It should be understood that different nucleic acid assembly configurations may be designed and constructed. For example, a concerted assembly may involve multiple copies of certain nucleic acids and single copies of other nucleic acids. In some embodiments, one or more nucleic acid fragments being assembled may have blunt ends. In some embodiments, double-stranded blunt ends may have overlapping identical sequences on nucleic acid fragments that are designed to be adjacent to each other on an assembled nucleic acid product.
Any suitable vector may be used for any assembly method described herein (e.g., concerted assembly, iterative assembly, etc., or any combination thereof) as the invention is not so limited. For example, a vector may be a plasmid, a bacterial vector, a viral vector, a phage vector, an insect vector, a yeast vector, a mammalian vector, a BAC, a YAC, or any other suitable vector. In some embodiments, a vector may be a vector that replicates in only one type of organism (e.g., bacterial, yeast, insect, mammalian, etc.) or in only one species of organism. Some vectors may have a broad host range. Some vectors may have different functional sequences (e.g., origins of replication, selectable markers, etc.) that are functional in different organisms. These may be used to shuttle the vector (and any nucleic acid fragment(s) that are cloned into the vector) between two different types of organism (e.g., between bacteria and mammals, yeast and mammals, etc.). In some embodiments, the type of vector that is used may be determined by the type of host cell that is chosen.
It should be appreciated that a vector may encode a detectable marker such as a selectable marker (e.g., antibiotic resistance, etc.) so that transformed cells can be selectively grown and the vector can be isolated and any insert can be characterized to determine whether it contains the desired assembled nucleic acid. The insert may be characterized using any suitable technique (e.g., size analysis, restriction fragment analysis, sequencing, etc.). In some embodiments, the presence of a correctly assembled nucleic acid in a vector may be assayed by determining whether a function predicted to be encoded by the correctly assembled nucleic acid is expressed in the host cell.
In some embodiments, host cells that harbor a vector containing a nucleic acid insert may be selected for or enriched by using one or more additional detectable or selectable markers that are only functional if a correct (e.g., designed) terminal nucleic acid fragment is cloned into the vector.
Accordingly, a host cell should have an appropriate phenotype to allow selection for one or more drug resistance markers encoded on a vector (or to allow detection of one or more detectable markers encoded on a vector). However, any suitable host cell type may be used (e.g., prokaryotic, eukaryotic, bacterial, yeast, insect, mammalian, etc.). In some embodiments, the type of host cell may be determined by the type of vector that is chosen. A host cell may be modified to have increased activity of one or more ligation and/or recombination functions. In some embodiments, a host cell may be selected on the basis of a high ligation and/or recombination activity. In some embodiments, a host cell may be modified to express (e.g., from the genome or a plasmid expression system) one or more ligase and/or recombinase enzymes.
A host cell may be transformed using any suitable technique (e.g., electroporation, chemical transformation, infection with a viral vector, etc.). Certain host organisms are more readily transformed than others. In some embodiments, all of the nucleic acid fragments and a linearized vector are mixed together and transformed into the host cell in a single step. However, in some embodiments, several transformations may be used to introduce all the fragments and vector into the cell (e.g., several consecutive transformations using subsets of the fragments). It should be appreciated that the linearized vector is preferably designed to have incompatible ends so that it can only be circularized (and thereby confer resistance to a selectable marker) if the appropriate fragments are cloned into the vector in the designed configuration. This avoids or reduces the occurrence of "empty" vectors after selection.
Single-stranded Overhangs
Certain aspects of the invention involve double-stranded nucleic acids with single-stranded overhangs. Overhangs may be generated using any suitable technique. In some embodiments, a double-stranded nucleic acid fragment (e.g., a fragment assembled in a multiplex assembly) may be digested with an appropriate restriction enzyme to generate a terminal single-stranded overhang. In some embodiments, fragments that are designed to be adjacent to each other in an assembled product may be digested with the same enzyme to expose complementary overhangs. In some embodiments, overhangs may be generated using a type IIS restriction enzyme. Type IIS restriction enzymes are enzymes that bind to a double stranded nucleic acid at one site, referred to as the recognition site, and make a single double stranded cut outside of the recognition site. The double stranded cut, referred to as the cleavage site, is generally situated 0-20 bases away from the recognition site. The recognition site is generally about 4-7 bp long. All type IIS restriction enzymes exhibit at least partial asymmetric recognition. Asymmetric recognition means that 5'→3' recognition sequences are different for each strand of the nucleic acid. The enzyme activity also shows polarity meaning that the cleavage sites are located on only one side of the recognition site. Thus, there is generally only one double stranded cut corresponding to each recognition site. Cleavage generally produces 1-5 nucleotide single-stranded overhangs, with 5' or 3' termini, although some enzymes produce blunt ends. Either cut is useful in the context of the invention, although in some instances those producing single-stranded overhangs are preferred. To date, ~80 type IIS enzymes have been identified.
Examples include but are not limited to BstF5 I, BtsC I, BsrD I, Bts I, Alw I, Bcc I, BsmA I, Ear I, Mly I (blunt), Ple I, Bmr I, Bsa I, BsmB I, Fau I, Mnl I, Sap I, Bbs I, BciV I, Hph I, Mbo II, BfuA I, BspCN I, BspM I, SfaN I, Hga I, BseR I, Bbv I, Eci I, Fok I, BceA I, BsmF I, BtgZ I, BpuE I, Bsg I, Mme I, BseG I, Bse3D I, BseM I, AclW I, Alw26 I, Bst6 I, BstMA I, Eam1104 I, Ksp632 I, Pps I, Sch I (blunt), Bfi I, Bso31 I, BspTN I, Eco31 I, Esp3 I, Smu I, Bfu I, Bpi I, BpuA I, BstV2 I, AsuHP I, Acc36 I, Lwe I, Aar I, BseM II, TspDT I, TspGW I, BseX I, BstV1 I, Eco57 I, Eco57M I, Gsu I, and Bcg I. Such enzymes and information regarding their recognition and cleavage sites are available from commercial suppliers such as NEB. In some embodiments, each of a plurality of nucleic acid fragments designed for concerted assembly may have a type IIS restriction site at each end. The type IIS restriction sites may be oriented so that the cleavage sites are internal relative to the recognition sequences. As a result, enzyme digestion exposes an internal sequence (e.g., an overhang within an internal sequence) and removes the recognition sequences from the ends. Accordingly, the same type IIS sites may be used for both ends of all of the nucleic acid fragments being prepared for assembly. However, different type IIS sites also may be used. Two fragments that are designed to be adjacent in an assembled product each may include an identical overlapping terminal sequence and a flanking type IIS site that is appropriately located to expose complementary overhangs within the overlapping sequence upon restriction enzyme digestion. Accordingly, a plurality of nucleic acid fragments may be generated with different complementary overhangs.
The restriction site at each end of a nucleic acid fragment may be located such that digestion with the appropriate type IIS enzyme removes the restriction site and exposes a single-stranded region that is complementary to a single-stranded region on a nucleic acid fragment that is designed to be adjacent in the assembled nucleic acid product. In some embodiments, one end of each of the two terminal nucleic acid fragments may be designed to have a single-stranded overhang (e.g., after digestion with an appropriate restriction enzyme) that is complementary to a single-stranded overhang of a linearized vector nucleic acid. Accordingly, the resulting nucleic acid fragments and vector may be transformed directly into a host cell. Alternatively, the nucleic acid fragments and vector may be incubated to promote hybridization and annealing of the complementary sequences prior to transformation in the host cell. It should be appreciated that a vector may be prepared using any one of the techniques described herein or any other suitable technique that produces a single-stranded overhang that would be complementary to an end of one of the terminal nucleic acid fragments.
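By way of non-limiting illustration, the effect of flanking, inward-facing type IIS sites (digestion removes the recognition sequences and exposes internal overhangs) may be sketched as follows. The Python sketch below assumes Bsa I with the recognition sequence GGTCTC and cleavage 1 nucleotide (top strand) and 5 nucleotides (bottom strand) past the site, leaving a 4-nucleotide 5' overhang; these specifics come from supplier documentation rather than this description and should be verified. The function name and example sequence are hypothetical, and exactly one inward-facing site at each end is assumed.

```python
# Hypothetical sketch: release an insert flanked by inward-facing Bsa I sites.
# Assumption: Bsa I recognizes GGTCTC and cuts N1 (same strand) / N5 (opposite
# strand), leaving a 4-nt 5' overhang; verify against supplier data.

COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITE = "GGTCTC"
OFFSET = 1      # nucleotides between the site and the nearer cut
OVERHANG = 4    # length of the 5' overhang


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def release_insert(top):
    """Return (left_overhang, duplex_core, right_overhang) after digestion.

    top: top strand, 5'->3', with a forward site near the left end and a
         reverse-oriented site near the right end.
    left_overhang is on the top strand; right_overhang is on the bottom
    strand; both are written 5'->3'.
    """
    top = top.upper()
    left = top.find(SITE)
    right = top.rfind(reverse_complement(SITE))
    if left < 0 or right < 0:
        raise ValueError("expected one forward and one reverse site")
    left_start = left + len(SITE) + OFFSET   # first base kept on the top strand
    right_end = right - OFFSET               # one past the last base kept on the bottom strand
    if left_start + OVERHANG > right_end - OVERHANG:
        raise ValueError("sites are too close together")
    left_overhang = top[left_start:left_start + OVERHANG]
    core = top[left_start + OVERHANG:right_end - OVERHANG]
    right_overhang = reverse_complement(top[right_end - OVERHANG:right_end])
    return left_overhang, core, right_overhang


fragment = "AA" + "GGTCTC" + "A" + "ACCA" + "TTTTGGGG" + "CAGT" + "T" + "GAGACC" + "AA"
print(release_insert(fragment))  # ('ACCA', 'TTTTGGGG', 'ACTG')
```

In this sketch, a downstream fragment whose top strand begins with the same four bases (CAGT) at its left cleavage position would expose an overhang complementary to the right overhang shown, consistent with adjacent fragments sharing an identical overlapping terminal sequence.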
It should be appreciated that a type IIS recognition site may be present within a sequence being assembled. If the corresponding type IIS restriction enzyme is used during one or more assembly steps described herein, unwanted restriction fragments may be generated and they may interfere with the yield of correctly assembled nucleic acids. One or more different strategies may be used to avoid unwanted type IIS cleavage. In certain embodiments, an assembly strategy involves identifying type IIS recognition sites that are not present in a target nucleic acid of interest. One or more of these selected sites (and the corresponding enzymes) may be used in one or more assembly steps described herein without cutting the target nucleic acid at unwanted sites.
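By way of non-limiting illustration, identifying type IIS recognition sites that are absent from a target nucleic acid may be sketched as a scan of both strands. In the Python sketch below, the recognition sequences listed for Bsa I, BsmB I, Bbs I, Sap I, and Fok I are taken from supplier documentation rather than from this description and should be verified before use; the function names and the example target are hypothetical.

```python
# Hypothetical sketch: list type IIS enzymes whose recognition site does not
# occur on either strand of a target sequence.
# Assumption: recognition sequences below are as published by suppliers.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

TYPE_IIS_SITES = {
    "BsaI": "GGTCTC",
    "BsmBI": "CGTCTC",
    "BbsI": "GAAGAC",
    "SapI": "GCTCTTC",
    "FokI": "GGATG",
}


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def enzymes_absent_from(target, sites=TYPE_IIS_SITES):
    """Return sorted names of enzymes with no recognition site in the target."""
    top = target.upper()
    bottom = reverse_complement(top)
    return sorted(
        name for name, site in sites.items()
        if site not in top and site not in bottom
    )


# Made-up target: contains GGTCTC (top strand) and CATCC, i.e. GGATG on the bottom strand.
target = "ATGGGTCTCAAAGCATCCTTT"
print(enzymes_absent_from(target))  # ['BbsI', 'BsmBI', 'SapI']
```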
In some embodiments, a nucleic acid sequence may be designed to remove any type IIS recognition sites. In a coding sequence, the removal of such sites may be achieved while preserving the integrity of the nucleic acid sequence code. For example, the degeneracy of certain amino acid codons may allow nucleotide base substitutions to be made to remove a type IIS recognition site while retaining a codon for the same amino acid (e.g., replace one codon for another). Such substitutions are known to those of ordinary skill in the art and can be made using no more than routine methods. In certain embodiments, a type IIS restriction enzyme recognition site on a nucleic acid being assembled may be masked to prevent unwanted cleavage at that site. A recognition site may be masked by using a masking molecule (e.g., a nucleic acid) that binds to the recognition site (e.g., because it is complementary to one of the strands in the recognition site). "Masking" of a cleavage site to prevent unwanted cleavage may in some circumstances be referred to as "blocking" (as in "a blocking oligonucleotide"), and it should be understood to mean the same. See, for example, Example 4. In certain embodiments, a recognition site may be masked with a molecule capable of masking a restriction enzyme recognition site by preventing cleavage without preventing the enzyme from binding to its recognition site (for example, pcPNA).
In certain embodiments, complete or partial methylation of a nucleic acid may be performed to prevent unwanted cleavage at the methylated site. However, in order to maintain cleavage at certain type IIS sites that are used during assembly (assembly recognition sites), a masking molecule (e.g., nucleic acid) may be used to prevent methylation at certain sites. Accordingly, one or more assembly recognition sites may be masked from methylation while the rest of the nucleic acid molecule is methylated. Methylation sensitive restriction enzymes may be used to cleave the cleavage site which is unmethylated. In some embodiments, complete methylation of a nucleic acid molecule may be followed by selective demethylation to allow cleavage only at the type IIS recognition site. In some instances, appropriate E. coli host strains should be selected according to the type of masking strategy being employed. For example, SssI-methylated DNA is mcr sensitive. In such situations, an E. coli strain lacking mcrA, mcrBC and mrr must be used or the DNA will be degraded. It is understood that the skilled artisan is familiar with how to select suitable host strains. Examples of suitable host strains include but are not limited to:
DH10B genotype: F- mcrA Δ(mrr-hsdRMS-mcrBC) Φ80lacZΔM15 ΔlacX74 recA1 endA1 araΔ139 Δ(ara, leu)7697 galU galK λ- rpsL (StrR) nupG; and TOP10 genotype: F- mcrA Δ(mrr-hsdRMS-mcrBC) Φ80lacZΔM15 ΔlacX74 recA1 araΔ139 Δ(ara-leu)7697 galU galK rpsL (StrR) endA1 nupG.
Any suitable masking molecule may be used to prevent cleavage or to prevent methylation. Any suitable nucleic acid may be used, including, for example, DNA, peptide nucleic acid (PNA), pseudocomplementary peptide nucleic acid (pcPNA), locked nucleic acid (LNA), etc. A masking nucleic acid should be long enough to bind to a site with sufficient affinity to specifically prevent cleavage or methylation at that site. For example, a masking nucleic acid may be between about 15 and about 50 nucleotides long (or shorter or longer depending on the context). For example, a masking nucleic acid may be about 60 nucleotides long, about 30 nucleotides long, or any other suitable length. A masking molecule may be capable of binding to a nucleic acid molecule at more than one location and on either one or both strands of the molecule. In some embodiments, a different sequence specific masking nucleic acid may be used for each site that is being protected. In some embodiments, masking of a cleavage site may be achieved by forming a complex with a specific protein or using Hoogsteen base pairing to mask the cleavage site. In some embodiments, RecA or any other suitable recombinase may be included to assist the binding of a masking molecule (e.g., nucleic acid) to a nucleic acid site being protected from cleavage or methylation.
In some embodiments, instead of using a type IIS restriction enzyme, site specific cleavage may be obtained using specific cleavage of DNA molecules at RecA-mediated triple-stranded structures. For example, specific cleavage may be obtained using an enzyme capable of cleaving a nucleic acid molecule specifically at a site where a triple-stranded DNA structure is located (e.g., using S1 or BAL31). A triple-stranded DNA structure may be generated using a nucleic acid (e.g., oligonucleotide) that is complementary to one strand of a double-stranded target sequence of interest. The formation of a triple-stranded structure may be promoted by RecA or other suitable recombinase enzyme. Certain enzymes may then be used to cut both strands of the double-stranded target nucleic acid at the location of the triple-stranded structure. For example, S1 nuclease cuts both strands of the double-stranded target nucleic acid in the context of a triple-stranded structure towards the 5' end of the nucleic acid (e.g., oligonucleotide) that was added to form the structure. Triple-stranded DNA may be formed at any location in a double stranded nucleic acid molecule. In some embodiments, a complementary nucleic acid molecule may be used to form a triple-stranded DNA molecule. In other embodiments a homologous deoxynucleotide may be used to form a triple-stranded DNA molecule. In certain embodiments, formation of a triple-stranded DNA molecule is performed in the presence of RecA protein. Further examples may be found in Shigemori et al. (2004, Nucleic Acids Research, 32(1): 1-8), the entire contents of which are incorporated herein by reference. Accordingly, targeted triple-helix cleavage may be used instead of a type IIS cleavage in certain assembly reactions described herein to avoid cleavage at unwanted sites within a target nucleic acid.
In certain embodiments, a meganuclease restriction enzyme may be used to cleave a nucleic acid molecule at a rare position. Meganuclease restriction enzymes specifically recognize long nucleic acid target sites. In some embodiments, a meganuclease restriction enzyme cleaves both strands of a nucleic acid at its specific cleavage site. In some embodiments, a meganuclease recognition site may be about 12-45 base pairs. In other embodiments, a meganuclease recognition site may be about 10, about 15, about 20, about 25, about 30, about 35, about 40 or about 45 base pairs. Restriction enzymes with longer recognition sites also may be used. An example of a meganuclease is a homing endonuclease which may be found in phages, bacteria, archaebacteria and various eukaryotes (see for example Epinat et al., 2003, Nucleic Acids Research, 31(11):2953-2962; the entire contents of which are herein incorporated by reference). In certain embodiments, other rare-cutter enzymes may be used (e.g., NotI etc.). Accordingly, a meganuclease or rare-cutter recognition site may be used instead of a type IIS site in certain assembly reactions described herein (along with the appropriate meganucleases and/or rare-cutter enzymes) to avoid cleavage at unwanted sites within a target nucleic acid.
Enzymatic digestions of DNA with type II or site-specific restriction enzymes typically generate an overhang of four to six nucleotides. These short cohesive ends may be sufficient for ligating two fragments of DNA containing complementary termini. However, when joining multiple DNA fragments together, longer complementary cohesive termini are preferred to facilitate assembly and to ensure specificity. Accordingly, other techniques may be used to expose longer single-stranded overhangs. In some embodiments, uracil DNA glycosylase (UDG) may be used to hydrolyze a uracil-glycosidic bond in a nucleic acid thereby removing uracil and creating an alkali-sensitive abasic site in the DNA which can be subsequently hydrolyzed by endonuclease, heat or alkali treatment. As a result, a portion of one strand of a double-stranded nucleic acid may be removed thereby exposing the complementary sequence in the form of a single-stranded overhang. This approach requires the deliberate incorporation of one or more uracil bases on one strand of a double-stranded nucleic acid fragment. This may be accomplished, for example, by amplifying a nucleic acid fragment using an amplification primer that contains a 3' terminal uracil. After treatment with UDG, the region of the primer 5' to the uracil may be released (e.g., upon dilution, incubation, exposure to mild denaturing conditions, etc.) thereby exposing the complementary sequence as a single-stranded overhang. It should be appreciated that the length of the overhang may be determined by the position of the uracil on the amplifying primer and by the length of the amplifying primer. UDG is commercially available from suppliers such as Roche Applied Science. In other embodiments, a technique for exposing a single-stranded overhang may involve a polymerase (e.g., T4 DNA polymerase) that has a suitable editing function. For example, T4 DNA polymerase possesses 3' -> 5' exonuclease activity.
While this activity favors single-stranded regions, it can function, albeit somewhat less efficiently, on blunt ends. Accordingly, in the absence of any exogenous nucleotides, the 3' ends of a nucleic acid fragment contacted with T4 DNA polymerase will be progressively digested. The 5'->3' polymerase activity of T4 may attempt to replace an excised nucleotide. However, by limiting the type of nucleotides available for incorporation, it is possible to avoid incorporation and favor further excision. In some embodiments, progressive excision on a 3' -> 5' strand may be halted at the first occurrence (in the 3' -> 5' direction) of one of the four nucleotides by providing that nucleotide in sufficient amounts in the reaction mixture. The presence of the nucleotide in the reaction will result in an equilibrium being reached between the excision of the nucleotide and its re-incorporation by T4. In some embodiments, a single-stranded overhang may be generated at both ends of a nucleic acid fragment (e.g., if each 3' end does not contain the nucleotide that is added in the T4 polymerase reaction). In some embodiments, the length of the overhang generated at each end is a function of the sequence at each end (e.g., the length of the 3' sequence that is free of the nucleotide that is added in the T4 polymerase reaction).
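By way of non-limiting illustration, the dependence of overhang length on the terminal sequence, when a single nucleotide is supplied to the T4 DNA polymerase reaction, may be sketched as follows. The Python sketch below is an idealized model and not a description of actual enzyme kinetics: it assumes a blunt-ended duplex, assumes that excision on each strand stops exactly at the first occurrence (from the 3' end) of the supplied nucleotide, and uses hypothetical function names and a made-up sequence.

```python
# Hypothetical sketch: predict 5' overhang lengths after idealized T4 DNA
# polymerase chew-back in the presence of a single supplied nucleotide.
# Assumption: excision halts exactly at the first matching base from each 3' end.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def chew_back_length(strand, dntp):
    """Number of 3'-terminal nucleotides removed from `strand` (written 5'->3').

    If the supplied nucleotide never occurs, the whole strand length is returned.
    """
    removed = 0
    for base in reversed(strand.upper()):
        if base == dntp.upper():
            break
        removed += 1
    return removed


def overhang_lengths(top, dntp):
    """Return (left, right) overhang lengths for a blunt duplex with this top strand.

    The left overhang is exposed by digesting the bottom strand's 3' end;
    the right overhang is exposed by digesting the top strand's 3' end.
    """
    return (chew_back_length(reverse_complement(top), dntp),
            chew_back_length(top, dntp))


# Made-up fragment: G-free at the 5' end and C-free at the 3' end of the top strand.
top = "ATTATAATTAC" + "GGCCGG" + "CTAAGTTGAAT"
print(overhang_lengths(top, "C"))  # (11, 10)
```

A practical design would additionally confirm that the two digested regions do not meet in the middle of the fragment.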
In some embodiments, single-stranded overhangs may be generated by incubating a double-stranded nucleic acid with a polymerase that has an editing function (e.g., T4 DNA polymerase) without adding any nucleotides. The length of the overhangs may be a function of the incubation time. Accordingly, suitable incubation conditions (including suitable incubation times, for example) may be determined to obtain suitable average overhangs (e.g., about 10, about 20, about 30, about 40, about 50 nucleotides long, etc.).
Sequence analysis and fragment design and selection for concerted assembly
Aspects of the invention may include analyzing the sequence of a target nucleic acid and designing an assembly strategy based on the identification of regions, within the target nucleic acid sequence, that can be used to generate appropriate cohesive ends (e.g., single-stranded overhangs). These regions may be used to define the ends of fragments that can be assembled (e.g., in a concerted reaction) to generate the target nucleic acid. The fragments can then be provided or made (e.g., in a multiplex assembly reaction). In some embodiments, a target nucleic acid sequence may be analyzed to identify regions that contain at most three different types of nucleotide (i.e., they are missing at least one of G, A, T, or C) on one strand of the target nucleic acid. These regions may be used to generate cohesive ends using a polymerase (e.g., T4 DNA polymerase) processing technique described herein. It should be appreciated that the length of a cohesive end is preferably sufficient to provide specificity. For example, cohesive ends may be long enough to have sufficiently different sequences to prevent or reduce mispairing between similar cohesive ends. However, their length is preferably not long enough to stabilize mispairs between similar cohesive sequences. In some embodiments, a length of about 9 to about 15 bases may be used. However, any suitable length may be selected for a region that is to be used to generate a cohesive overhang. The importance of specificity may depend on the number of different fragments that are being assembled simultaneously. Also, the appropriate length required to avoid stabilizing mispaired regions may depend on the conditions used for annealing different cohesive ends.
In some embodiments, a target nucleic acid sequence may be analyzed to identify potential cohesive end regions as follows. One or more regions (e.g., about 9-15 base long regions) free of either G, A, T, or C may be identified on one strand of a target nucleic acid. One or more regions (e.g., about 9-15 base regions) free of the complementary nucleotide may be identified on the same strand. For example, regions free of C and regions free of G may be identified on one strand of the target nucleic acid. Alternating regions (e.g., alternating C-free and G-free regions) may be selected to define the ends of nucleic acid fragments to be used for assembly so that both ends of each fragment can be processed to generate cohesive ends. For example, a fragment with a C-free region at one end and a G-free region at the other end of each strand can be processed to generate cohesive overhangs at each end. In this embodiment, the C-free region is the 3' region on both strands and the overhang is generated by adding C to the T4 polymerase reaction. Similar configurations may be used with any one of G, A, T, or C.
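By way of non-limiting illustration, the identification of candidate cohesive end regions (stretches of one strand that are free of a given nucleotide) may be sketched as a simple scan. The Python sketch below is a hypothetical illustration: the function name, the default minimum length of 9 bases (the lower end of the about 9-15 base range mentioned above), and the example sequence are assumptions, and the sketch reports maximal nucleotide-free runs rather than selecting among them.

```python
# Hypothetical sketch: find maximal runs on one strand that lack a given base.
# Assumption: runs of at least `min_len` bases are candidate cohesive end regions.


def base_free_regions(seq, base, min_len=9):
    """Return (start, end) index pairs (end exclusive) of runs free of `base`."""
    seq = seq.upper()
    base = base.upper()
    regions = []
    start = 0
    # Appending `base` as a sentinel closes the final run.
    for i, b in enumerate(seq + base):
        if b == base:
            if i - start >= min_len:
                regions.append((start, i))
            start = i + 1
    return regions


# Made-up target strand, written 5'->3'.
target = "ATTATAATTAC" + "GGCCGG" + "CTAAGTTGAAT"
print(base_free_regions(target, "C"))  # [(0, 10), (18, 28)]
print(base_free_regions(target, "G"))  # [(0, 11)]
```

The C-free and G-free runs found in this way could then be compared to choose alternating regions separated by suitable distances.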
In some embodiments, alternating regions may be selected if they are separated by distances that define fragments with suitable lengths for the assembly design. In some embodiments, the alternating regions may be separated by about 200 to about 1,500 bases. However, any suitable shorter or longer distance may be selected. For example, the cohesive regions may be separated by about 500 to about 5,000 bases. It should be appreciated that different patterns of alternating regions may be available depending on several factors (e.g., depending on the sequence of the target nucleic acid, the chosen length of the cohesive ends, and the desired fragment length). In some embodiments, if several options are available, the regions may be selected to maximize the sequence differences between different cohesive ends.
Selection of the cohesive regions defines the fragments that will be assembled to generate the target nucleic acid. Accordingly, the fragment size may be between about 200 and about 1,500 base pairs long, between about 500 and about 5,000 bases long, or shorter or longer depending on the target nucleic acid.
The fragments may be generated or obtained using any suitable technique. In some embodiments, each fragment may be assembled (e.g., in a multiplex oligonucleotide assembly reaction) so that it is flanked by double stranded regions that will be used to generate the cohesive single-stranded regions.
A fragment may be amplified in vitro (e.g., by PCR, LCR, etc.). In some embodiments, a fragment may be amplified in vivo. For in vivo amplification, a nucleic acid may be cloned into a vector having suitable flanking restriction sites. The restriction sites may be used to excise a fragment with appropriate end sequences that can be used to generate cohesive ends (e.g., with appropriate single-stranded lengths). In some embodiments, type IIS restriction enzymes may be used to cut out an appropriate fragment. A type IIS restriction site may be provided by the vector into which a nucleic acid is cloned. Alternatively or additionally, a type IIS restriction site may be provided at the end of a nucleic acid that is cloned into a vector (e.g., at the end of a fragment that is assembled in a multiplex oligonucleotide assembly reaction). After amplification in vivo, a type IIS fragment may be isolated and processed as described herein to generate the cohesive ends. It should be appreciated that any type IIS enzyme may be used, provided that its restriction site is placed at a suitable distance from the cohesive region so that the type IIS fragment can be appropriately processed. A fragment may be processed to generate cohesive ends regardless of whether the type IIS digestion generates overhangs or blunt ends. In some embodiments, the overhangs generated by a type IIS enzyme may not be long enough to provide sufficient specificity. In some embodiments, each fragment is assembled and fidelity optimized to remove error-containing nucleic acids (e.g., using one or more post-assembly fidelity optimization techniques described herein) before being processed to generate cohesive ends. In some embodiments, the fidelity optimization may be performed on the synthesized fragments after they are ligated into a first vector used for amplification. However, in some embodiments, the fragments may not be fidelity optimized, or they may be fidelity optimized after treatment to generate cohesive ends.
It should be appreciated that the different nucleic acid fragments that are used to assemble a target nucleic acid may be obtained or synthesized using different techniques. However, in some embodiments they are all produced using the same technique (e.g., assembled in a multiplex oligonucleotide assembly reaction, cloned into a vector, digested with a type IIS enzyme, and processed with T4 DNA polymerase). The resulting fragments may be assembled in a single step concerted reaction and, for example, cloned into a vector that has a selectable marker. The assembly may include an in vitro ligation. However, in some embodiments, the assembly may be an in vivo shotgun assembly wherein the fragments are transformed into a host cell without undergoing an in vitro ligation.
In some embodiments, fragments are amplified in a first vector that has a first selectable marker and are then combined and assembled into a second vector that has a second selectable marker. As a result, selection for the second selectable marker avoids contamination with the first vector. Accordingly, the reactions may be performed in a procedure that does not require removal (e.g., by purification) of the first vector sequence. Aspects of the invention may include automating one or more acts described herein. For example, sequence analysis, the identification of interfering sequence features, assembly strategy selection (including fragment design and selection, the choice of a particular combination of extension-based and ligation-based assembly reactions, etc.), fragment production, single-stranded overhang production, and/or concerted assembly may be automated in order to generate the desired product automatically. Acts of the invention may be automated using, for example, a computer system.
Aspects of the invention may be used in conjunction with any suitable multiplex nucleic acid assembly procedure. For example, vector-encoded trait activation may be used in connection with one or more of the multiplex nucleic acid assembly procedures described below.
Multiplex Nucleic Acid Assembly
Aspects of the invention may involve an assembly procedure wherein a plurality of nucleic acids each assembled in a multiplex assembly procedure (e.g., from oligonucleotides) are combined to form a larger nucleic acid using an iterative assembly procedure described herein. In aspects of the invention, multiplex nucleic acid assembly relates to the assembly of a plurality of nucleic acids to generate a longer nucleic acid product. In one aspect, multiplex oligonucleotide assembly relates to the assembly of a plurality of oligonucleotides to generate a longer nucleic acid molecule. However, it should be appreciated that other nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) may be assembled or included in a multiplex assembly reaction (e.g., along with one or more oligonucleotides) in order to generate an assembled nucleic acid molecule that is longer than any of the single starting nucleic acids (e.g., oligonucleotides) that were added to the assembly reaction. In certain embodiments, one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions (e.g., separate multiplex oligonucleotide assembly reactions) may be combined and assembled to form a further nucleic acid that is longer than any of the input nucleic acid fragments. In certain embodiments, one or more nucleic acid fragments that each were assembled in separate multiplex assembly reactions (e.g., separate multiplex oligonucleotide assembly reactions) may be combined with one or more additional nucleic acids (e.g., single or double-stranded nucleic acid degradation products, restriction fragments, amplification products, naturally occurring small nucleic acids, other polynucleotides, etc.) and assembled to form a further nucleic acid that is longer than any of the input nucleic acids.
In aspects of the invention, one or more multiplex assembly reactions may be used to generate target nucleic acids having predetermined sequences. In one aspect, a target nucleic acid may have a sequence of a naturally occurring gene and/or other naturally occurring nucleic acid (e.g., a naturally occurring coding sequence, regulatory sequence, non-coding sequence, chromosomal structural sequence such as a telomere or centromere sequence, etc., any fragment thereof or any combination of two or more thereof). In another aspect, a target nucleic acid may have a sequence that is not naturally-occurring. In one embodiment, a target nucleic acid may be designed to have a sequence that differs from a natural sequence at one or more positions. In other embodiments, a target nucleic acid may be designed to have an entirely novel sequence. However, it should be appreciated that target nucleic acids may include one or more naturally occurring sequences, non-naturally occurring sequences, or combinations thereof.
In one aspect of the invention, multiplex assembly may be used to generate libraries of nucleic acids having different sequences. In some embodiments, a library may contain nucleic acids having random sequences. In certain embodiments, a predetermined target nucleic acid may be designed and assembled to include one or more random sequences at one or more predetermined positions.
In certain embodiments, a target nucleic acid may include a functional sequence (e.g., a protein binding sequence, a regulatory sequence, a sequence encoding a functional protein, etc., or any combination thereof). However, some embodiments of a target nucleic acid may lack a specific functional sequence (e.g., a target nucleic acid may include only non-functional fragments or variants of a protein binding sequence, regulatory sequence, or protein encoding sequence, or any other non-functional naturally-occurring or synthetic sequence, or any non-functional combination thereof). Certain target nucleic acids may include both functional and non-functional sequences. These and other aspects of target nucleic acids and their uses are described in more detail herein.
A target nucleic acid may be assembled in a single multiplex assembly reaction (e.g., a single oligonucleotide assembly reaction). However, a target nucleic acid also may be assembled from a plurality of nucleic acid fragments, each of which may have been generated in a separate multiplex oligonucleotide assembly reaction. It should be appreciated that one or more nucleic acid fragments generated via multiplex oligonucleotide assembly also may be combined with one or more nucleic acid molecules obtained from another source (e.g., a restriction fragment, a nucleic acid amplification product, etc.) to form a target nucleic acid. In some embodiments, a target nucleic acid that is assembled in a first reaction may be used as an input nucleic acid fragment for a subsequent assembly reaction to produce a larger target nucleic acid.
Accordingly, different strategies may be used to produce a target nucleic acid having a predetermined sequence. For example, different starting nucleic acids (e.g., different sets of predetermined nucleic acids) may be assembled to produce the same predetermined target nucleic acid sequence. Also, predetermined nucleic acid fragments may be assembled using one or more different in vitro and/or in vivo techniques. For example, nucleic acids (e.g., overlapping nucleic acid fragments) may be assembled in an in vitro reaction using an enzyme (e.g., a ligase and/or a polymerase) or a chemical reaction (e.g., a chemical ligation) or in vivo (e.g., assembled in a host cell after transfection into the host cell), or a combination thereof. Similarly, each nucleic acid fragment that is used to make a target nucleic acid may be assembled from different sets of oligonucleotides. Also, a nucleic acid fragment may be assembled using an in vitro or an in vivo technique (e.g., an in vitro or in vivo polymerase, recombinase, and/or ligase based assembly process). In addition, different in vitro assembly reactions may be used to produce a nucleic acid fragment. For example, an in vitro oligonucleotide assembly reaction may involve one or more polymerases, ligases, other suitable enzymes, chemical reactions, or any combination thereof.
Multiplex oligonucleotide assembly
A predetermined nucleic acid fragment may be assembled from a plurality of different starting nucleic acids (e.g., oligonucleotides) in a multiplex assembly reaction (e.g., a multiplex enzyme-mediated reaction, a multiplex chemical assembly reaction, or a combination thereof). Certain aspects of multiplex nucleic acid assembly reactions are illustrated by the following description of certain embodiments of multiplex oligonucleotide assembly reactions. It should be appreciated that the description of the assembly reactions in the context of oligonucleotides is not intended to be limiting. The assembly reactions described herein may be performed using starting nucleic acids obtained from one or more different sources (e.g., synthetic or natural polynucleotides, nucleic acid amplification products, nucleic acid degradation products, oligonucleotides, etc.). The starting nucleic acids may be referred to as assembly nucleic acids (e.g., assembly oligonucleotides). As used herein, an assembly nucleic acid has a sequence that is designed to be incorporated into the nucleic acid product generated during the assembly process. However, it should be appreciated that the description of the assembly reactions in the context of single-stranded nucleic acids is not intended to be limiting. In some embodiments, one or more of the starting nucleic acids illustrated in the figures and described herein may be provided as double stranded nucleic acids. Accordingly, it should be appreciated that where the figures and description illustrate the assembly of single-stranded nucleic acids, the presence of one or more complementary nucleic acids is contemplated. Accordingly, one or more double-stranded complementary nucleic acids may be included in a reaction that is described herein in the context of a single-stranded assembly nucleic acid. 
However, in some embodiments the presence of one or more complementary nucleic acids may interfere with an assembly reaction by competing for hybridization with one of the input assembly nucleic acids. Accordingly, in some embodiments an assembly reaction may involve only single-stranded assembly nucleic acids (i.e., the assembly nucleic acids may be provided in a single-stranded form without their complementary strand) as described or illustrated herein. However, in certain embodiments the presence of one or more complementary nucleic acids may have no or little effect on the assembly reaction. In some embodiments, complementary nucleic acid(s) may be incorporated during one or more steps of an assembly. In yet further embodiments, assembly nucleic acids and their complementary strands may be assembled under the same assembly conditions via parallel assembly reactions in the same reaction mixture. In certain embodiments, a nucleic acid product resulting from the assembly of a plurality of starting nucleic acids may be identical to the nucleic acid product that results from the assembly of nucleic acids that are complementary to the starting nucleic acids (e.g., in some embodiments where the assembly steps result in the production of a double-stranded nucleic acid product). As used herein, an oligonucleotide may be a nucleic acid molecule comprising at least two covalently bonded nucleotide residues. In some embodiments, an oligonucleotide may be between 10 and 1,000 nucleotides long. For example, an oligonucleotide may be between 10 and 500 nucleotides long, or between 500 and 1,000 nucleotides long. In some embodiments, an oligonucleotide may be between about 20 and about 100 nucleotides long (e.g., from about 30 to 90, 40 to 85, 50 to 80, 60 to 75, or about 65 or about 70 nucleotides long), between about 100 and about 200, between about 200 and about 300 nucleotides, between about 300 and about 400, or between about 400 and about 500 nucleotides long.
However, shorter or longer oligonucleotides may be used. An oligonucleotide may be a single-stranded nucleic acid. However, in some embodiments a double-stranded oligonucleotide may be used as described herein. In certain embodiments, an oligonucleotide may be chemically synthesized as described in more detail below. In some embodiments, an input nucleic acid (e.g., oligonucleotide) may be amplified before use. The resulting product may be double-stranded. In some embodiments, one of the strands of a double-stranded nucleic acid may be removed before use so that only a predetermined single strand is added to an assembly reaction. In certain embodiments, each oligonucleotide may be designed to have a sequence that is identical to a different portion of the sequence of a predetermined target nucleic acid that is to be assembled. Accordingly, in some embodiments each oligonucleotide may have a sequence that is identical to a portion of one of the two strands of a double-stranded target nucleic acid. For clarity, the two complementary strands of a double stranded nucleic acid are referred to herein as the positive (P) and negative (N) strands. This designation is not intended to imply that the strands are sense and anti-sense strands of a coding sequence. They refer only to the two complementary strands of a nucleic acid (e.g., a target nucleic acid, an intermediate nucleic acid fragment, etc.) regardless of the sequence or function of the nucleic acid. Accordingly, in some embodiments a P strand may be a sense strand of a coding sequence, whereas in other embodiments a P strand may be an anti-sense strand of a coding sequence. According to the invention, a target nucleic acid may be either the P strand, the N strand, or a double-stranded nucleic acid comprising both the P and N strands. It should be appreciated that different oligonucleotides may be designed to have different lengths. 
In some embodiments, one or more different oligonucleotides may have overlapping sequence regions (e.g., overlapping 5' regions or overlapping 3' regions). Overlapping sequence regions may be identical (i.e., corresponding to the same strand of the nucleic acid fragment) or complementary (i.e., corresponding to complementary strands of the nucleic acid fragment). The plurality of oligonucleotides may include one or more oligonucleotide pairs with overlapping identical sequence regions, one or more oligonucleotide pairs with overlapping complementary sequence regions, or a combination thereof. Overlapping sequences may be of any suitable length. For example, overlapping sequences may encompass the entire length of one or more nucleic acids used in an assembly reaction. Overlapping sequences may be between about 5 and about 500 nucleotides long (e.g., between about 10 and 100, between about 10 and 75, between about 10 and 50, about 20, about 25, about 30, about 35, about 40, about 45, about 50, etc.) However, shorter, longer or intermediate overlapping lengths may be used. It should be appreciated that overlaps between different input nucleic acids used in an assembly reaction may have different lengths.
In a multiplex oligonucleotide assembly reaction designed to generate a predetermined nucleic acid fragment, the combined sequences of the different oligonucleotides in the reaction may span the sequence of the entire nucleic acid fragment on either the positive strand, the negative strand, both strands, or a combination of portions of the positive strand and portions of the negative strand. The plurality of different oligonucleotides may provide either positive sequences, negative sequences, or a combination of both positive and negative sequences corresponding to the entire sequence of the nucleic acid fragment to be assembled. In some embodiments, the plurality of oligonucleotides may include one or more oligonucleotides having sequences identical to one or more portions of the positive sequence, and one or more oligonucleotides having sequences that are identical to one or more portions of the negative sequence of the nucleic acid fragment. One or more pairs of different oligonucleotides may include sequences that are identical to overlapping portions of the predetermined nucleic acid fragment sequence as described herein (e.g., overlapping sequence portions from the same or from complementary strands of the nucleic acid fragment). In some embodiments, the plurality of oligonucleotides includes a set of oligonucleotides having sequences that combine to span the entire positive sequence and a set of oligonucleotides having sequences that combine to span the entire negative sequence of the predetermined nucleic acid fragment. However, in certain embodiments, the plurality of oligonucleotides may include one or more oligonucleotides with sequences that are identical to sequence portions on one strand (either the positive or negative strand) of the nucleic acid fragment, but no oligonucleotides with sequences that are complementary to those sequence portions.
In one embodiment, a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the positive sequence of the predetermined nucleic acid fragment. In one embodiment, a plurality of oligonucleotides includes only oligonucleotides having sequences identical to portions of the negative sequence of the predetermined nucleic acid fragment. These oligonucleotides may be assembled by sequential ligation or in an extension-based reaction (e.g., if an oligonucleotide having a 3' region that is complementary to one of the plurality of oligonucleotides is added to the reaction).
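One of the oligonucleotide designs described above, in which consecutive oligonucleotides alternate between the positive (P) and negative (N) strands and overlap through complementary regions, may be sketched as follows. The function names `reverse_complement` and `tile_oligos`, the fixed oligonucleotide length and overlap, and the example target are hypothetical simplifications; as noted above, oligonucleotides and overlaps may have different lengths in practice.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(target, oligo_len=50, overlap=20):
    """Tile `target` (P strand, 5'->3') with oligos that alternate between
    the P and N strands. Returns (strand, start, end, sequence) tuples,
    where start/end are P-strand coordinates and each sequence is written
    5'->3' on its own strand."""
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    index = 0
    while True:
        end = min(start + oligo_len, len(target))
        piece = target[start:end]
        strand = "P" if index % 2 == 0 else "N"
        seq = piece if strand == "P" else reverse_complement(piece)
        oligos.append((strand, start, end, seq))
        if end == len(target):
            break
        start += step
        index += 1
    return oligos


# Hypothetical 30-base target used only for illustration.
target = "ATGGCTAGCTTAGGCATCGATCGGATTACA"
for oligo in tile_oligos(target, oligo_len=12, overlap=4):
    print(oligo)
```

In this sketch, whenever the overlap is shorter than half the oligonucleotide length, consecutive oligonucleotides on the same strand are separated by gaps, so a design of this kind would rely on an extension-based reaction rather than on ligation alone.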
In one aspect, a nucleic acid fragment may be assembled in a polymerase-mediated assembly reaction from a plurality of oligonucleotides that are combined and extended in one or more rounds of polymerase-mediated extensions. In another aspect, a nucleic acid fragment may be assembled in a ligase-mediated reaction from a plurality of oligonucleotides that are combined and ligated in one or more rounds of ligase-mediated ligations. In another aspect, a nucleic acid fragment may be assembled in a non-enzymatic reaction (e.g., a chemical reaction) from a plurality of oligonucleotides that are combined and assembled in one or more rounds of non-enzymatic reactions. In some embodiments, a nucleic acid fragment may be assembled using a combination of polymerase, ligase, and/or non-enzymatic reactions. For example, both polymerase(s) and ligase(s) may be included in an assembly reaction mixture. Accordingly, a nucleic acid may be assembled via coupled amplification and ligation or ligation during amplification. The resulting nucleic acid fragment from each assembly technique may have a sequence that includes the sequences of each of the plurality of assembly oligonucleotides that were used as described herein. These assembly reactions may be referred to as primerless assemblies, since the target nucleic acid is generated by assembling the input oligonucleotides rather than being generated in an amplification reaction where the oligonucleotides act as amplification primers to amplify a pre-existing template nucleic acid molecule corresponding to the target nucleic acid.
Polymerase-based assembly techniques may involve one or more suitable polymerase enzymes that can catalyze a template-based extension of a nucleic acid in a 5' to 3' direction in the presence of suitable nucleotides and an annealed template. A polymerase may be thermostable. A polymerase may be obtained from recombinant or natural sources. In some embodiments, a thermostable polymerase from a thermophilic organism may be used. In some embodiments, a polymerase may include a 3'→5' exonuclease/proofreading activity. In some embodiments, a polymerase may have no, or little, proofreading activity (e.g., a polymerase may be a recombinant variant of a natural polymerase that has been modified to reduce its proofreading activity). Examples of thermostable DNA polymerases include, but are not limited to: Taq (a heat-stable DNA polymerase from the bacterium Thermus aquaticus); Pfu (a thermophilic DNA polymerase with a 3'→5' exonuclease/proofreading activity from Pyrococcus furiosus, available from, for example, Promega); VentR® DNA Polymerase and VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3'→5' exonuclease/proofreading activity from Thermococcus litoralis; also known as Tli polymerase); Deep VentR® DNA Polymerase and Deep VentR® (exo-) DNA Polymerase (thermophilic DNA polymerases with or without a 3'→5' exonuclease/proofreading activity from Pyrococcus species GB-D; available from New England Biolabs); KOD HiFi (a recombinant Thermococcus kodakaraensis KOD1 DNA polymerase with a 3'→5' exonuclease/proofreading activity, available from Novagen); BIO-X-ACT (a mix of polymerases that possesses 5'→3' DNA polymerase activity and 3'→5' proofreading activity); Klenow Fragment (an N-terminal truncation of E. coli DNA Polymerase I which retains polymerase activity, but has lost the 5'→3' exonuclease activity, available from, for example, Promega and NEB); Sequenase™ (T7 DNA polymerase deficient in 3'→5' exonuclease activity); Phi29 (bacteriophage Φ29 DNA polymerase, may be used for rolling circle amplification, for example, in a TempliPhi™ DNA Sequencing Template Amplification Kit, available from Amersham Biosciences); TopoTaq™ (a hybrid polymerase that combines hyperstable DNA binding domains and the DNA unlinking activity of Methanopyrus topoisomerase, with no exonuclease activity, available from Fidelity Systems); TopoTaq HiFi which incorporates a proofreading domain with exonuclease activity; Phusion™ (a Pyrococcus-like enzyme with a processivity-enhancing domain, available from New England Biolabs); any other suitable DNA polymerase, or any combination of two or more thereof.
Ligase-based assembly techniques may involve one or more suitable ligase enzymes that can catalyze the covalent linking of adjacent 3' and 5' nucleic acid termini (e.g., a 5' phosphate and a 3' hydroxyl of nucleic acid(s) annealed on a complementary template nucleic acid such that the 3' terminus is immediately adjacent to the 5' terminus). Accordingly, a ligase may catalyze a ligation reaction between the 5' phosphate of a first nucleic acid and the 3' hydroxyl of a second nucleic acid if the first and second nucleic acids are annealed next to each other on a template nucleic acid. A ligase may be obtained from recombinant or natural sources. A ligase may be a heat-stable ligase. In some embodiments, a thermostable ligase from a thermophilic organism may be used. Examples of thermostable DNA ligases include, but are not limited to: Tth DNA ligase (from Thermus thermophilus, available from, for example, Eurogentec and GeneCraft); Pfu DNA ligase (a hyperthermophilic ligase from Pyrococcus furiosus); Taq ligase (from Thermus aquaticus), any other suitable heat-stable ligase, or any combination thereof. In some embodiments, one or more lower temperature ligases may be used (e.g., T4 DNA ligase). A lower temperature ligase may be useful for shorter overhangs (e.g., about 3, about 4, about 5, or about 6 base overhangs) that may not be stable at higher temperatures.
Non-enzymatic techniques can be used to ligate nucleic acids. For example, a 5'-end (e.g., the 5' phosphate group) and a 3'-end (e.g., the 3' hydroxyl) of one or more nucleic acids may be covalently linked together without using enzymes (e.g., without using a ligase). In some embodiments, non-enzymatic techniques may offer certain advantages over enzyme-based ligations. For example, non-enzymatic techniques may have a high tolerance of non-natural nucleotide analogues in nucleic acid substrates, may be used to ligate short nucleic acid substrates, may be used to ligate RNA substrates, and/or may be cheaper and/or more suited to certain automated (e.g., high throughput) applications. Non-enzymatic ligation may involve a chemical ligation. In some embodiments, nucleic acid termini of two or more different nucleic acids may be chemically ligated. In some embodiments, nucleic acid termini of a single nucleic acid may be chemically ligated (e.g., to circularize the nucleic acid). It should be appreciated that both strands at a first double-stranded nucleic acid terminus may be chemically ligated to both strands at a second double-stranded nucleic acid terminus. However, in some embodiments only one strand of a first nucleic acid terminus may be chemically ligated to a single strand of a second nucleic acid terminus. For example, the 5' end of one strand of a first nucleic acid terminus may be ligated to the 3' end of one strand of a second nucleic acid terminus without the ends of the complementary strands being chemically ligated.
Accordingly, a chemical ligation may be used to form a covalent linkage between a 5' terminus of a first nucleic acid end and a 3' terminus of a second nucleic acid end, wherein the first and second nucleic acid ends may be ends of a single nucleic acid or ends of separate nucleic acids. In one aspect, chemical ligation may involve at least one nucleic acid substrate having a modified end (e.g., a modified 5' and/or 3' terminus) including one or more chemically reactive moieties that facilitate or promote linkage formation. In some embodiments, chemical ligation occurs when one or more nucleic acid termini are brought together in close proximity (e.g., when the termini are brought together due to annealing between complementary nucleic acid sequences).
Accordingly, annealing between complementary 3' or 5' overhangs (e.g., overhangs generated by restriction enzyme cleavage of a double-stranded nucleic acid) or between any combination of complementary nucleic acids that results in a 3' terminus being brought into close proximity with a 5' terminus (e.g., the 3' and 5' termini are adjacent to each other when the nucleic acids are annealed to a complementary template nucleic acid) may promote a template-directed chemical ligation. Examples of chemical reactions may include, but are not limited to, condensation, reduction, and/or photochemical ligation reactions. It should be appreciated that in some embodiments chemical ligation can be used to produce naturally-occurring phosphodiester internucleotide linkages, non-naturally-occurring phosphamide pyrophosphate internucleotide linkages, and/or other non-naturally-occurring internucleotide linkages.
In some embodiments, the process of chemical ligation may involve one or more coupling agents to catalyze the ligation reaction. A coupling agent may promote a ligation reaction between reactive groups in adjacent nucleic acids (e.g., between a 5'-reactive moiety and a 3'-reactive moiety at adjacent sites along a complementary template). In some embodiments, a coupling agent may be a reducing reagent (e.g., ferricyanide), a condensing reagent (e.g., cyanoimidazole, cyanogen bromide, carbodiimide, etc.), or irradiation (e.g., UV irradiation for photo-ligation). In some embodiments, a chemical ligation may be an autoligation reaction that does not involve a separate coupling agent. In autoligation, the presence of a reactive group on one or more nucleic acids may be sufficient to catalyze a chemical ligation between nucleic acid termini without the addition of a coupling agent (see, for example, Xu Y & Kool ET, 1997, Tetrahedron Lett. 38:5595-8). Non-limiting examples of these reagent-free ligation reactions may involve nucleophilic displacements of sulfur on bromoacetyl, tosyl, or iodo-nucleoside groups (see, for example, Xu Y et al., 2001, Nat Biotech 19:148-52). Nucleic acids containing reactive groups suitable for autoligation can be prepared directly on automated synthesizers (see, for example, Xu Y & Kool ET, 1999, Nuc. Acids Res. 27:875-81). In some embodiments, a phosphorothioate at a 3' terminus may react with a leaving group (such as tosylate or iodide) on a thymidine at an adjacent 5' terminus. In some embodiments, two nucleic acid strands bound at adjacent sites on a complementary target strand may undergo auto-ligation by displacement of a 5'-end iodide moiety (or tosylate) with a 3'-end sulfur moiety. Accordingly, in some embodiments the product of an autoligation may include a non-naturally-occurring internucleotide linkage (e.g., a single oxygen atom may be replaced with a sulfur atom in the ligated product).
In some embodiments, a synthetic nucleic acid duplex can be assembled via chemical ligation in a one step reaction involving simultaneous chemical ligation of nucleic acids on both strands of the duplex. For example, a mixture of 5'-phosphorylated oligonucleotides corresponding to both strands of a target nucleic acid may be chemically ligated by a) exposure to heat (e.g., to 97 °C) and slow cooling to form a complex of annealed oligonucleotides, and b) exposure to cyanogen bromide or any other suitable coupling agent under conditions sufficient to chemically ligate adjacent 3' and 5' ends in the nucleic acid complex.
In some embodiments, a synthetic nucleic acid duplex can be assembled via chemical ligation in a two step reaction involving separate chemical ligations for the complementary strands of the duplex. For example, each strand of a target nucleic acid may be ligated in a separate reaction containing phosphorylated oligonucleotides corresponding to the strand that is to be ligated and non-phosphorylated oligonucleotides corresponding to the complementary strand. The non-phosphorylated oligonucleotides may serve as a template for the phosphorylated oligonucleotides during a chemical ligation (e.g. using cyanogen bromide). The resulting single-stranded ligated nucleic acid may be purified and annealed to a complementary ligated single-stranded nucleic acid to form the target duplex nucleic acid (see, for example, Shabarova ZA et al., 1991, Nuc. Acids Res. 19:4247-51).
Aspects of the invention may be used to enhance different types of nucleic acid assembly reactions (e.g., multiplex nucleic acid assembly reactions). Aspects of the invention may be used in combination with one or more assembly reactions described in, for example, Carr et al., 2004, Nucleic Acids Research, Vol. 32, No. 20, e162 (9 pages); Richmond et al., 2004, Nucleic Acids Research, Vol. 32, No. 17, pp. 5011-5018; Caruthers et al., 1972, J. Mol. Biol. 72, 475-492; Hecker et al., 1998, Biotechniques 24:256-260; Kodumal et al., 2004, PNAS Vol. 101, No. 44, pp. 15573-15578; Tian et al., 2004, Nature, Vol. 432, pp. 1050-1054; and US Patent Nos. 6,008,031 and 5,922,539, the disclosures of which are incorporated herein by reference. Certain embodiments of multiplex nucleic acid assembly reactions for generating a predetermined nucleic acid fragment are illustrated with reference to FIGs. 1-4. It should be appreciated that synthesis and assembly methods described herein (including, for example, oligonucleotide synthesis, multiplex nucleic acid assembly, concerted assembly of nucleic acid fragments, or any combination thereof) may be performed in any suitable format, including in a reaction tube, in a multi-well plate, on a surface, on a column, in a microfluidic device (e.g., a microfluidic tube), a capillary tube, etc. FIG. 1 shows one embodiment of a plurality of oligonucleotides that may be assembled in a polymerase-based multiplex oligonucleotide assembly reaction. FIG. 1A shows two groups of oligonucleotides (Group P and Group N) that have sequences of portions of the two complementary strands of a nucleic acid fragment to be assembled. Group P includes oligonucleotides with positive strand sequences (P1, P2, ..., Pn-1, Pn, Pn+1, ..., PT, shown from 5'->3' on the positive strand). Group N includes oligonucleotides with negative strand sequences (NT, ..., Nn+1, Nn, Nn-1, ..., N2, N1, shown from 5'->3' on the negative strand).
In this example, none of the P group oligonucleotides overlap with each other and none of the N group oligonucleotides overlap with each other. However, in some embodiments, one or more of the oligonucleotides within the P or N group may overlap. Furthermore, FIG. 1A shows gaps between consecutive oligonucleotides in Group P and gaps between consecutive oligonucleotides in Group N. However, each P group oligonucleotide (except for P1) and each N group oligonucleotide (except for NT) overlaps with complementary regions of two oligonucleotides from the complementary group of oligonucleotides. P1 and NT overlap with a complementary region of only one oligonucleotide from the other group (the complementary 3'-most oligonucleotides N1 and PT, respectively). FIG. 1B shows a structure of an embodiment of a Group P or Group N oligonucleotide represented in FIG. 1A. This oligonucleotide includes a 5' region that is complementary to a 5' region of a first oligonucleotide from the other group, a 3' region that is complementary to a 3' region of a second oligonucleotide from the other group, and a core or central region that is not complementary to any oligonucleotide sequence from the other group (or its own group). This central region is illustrated as the B region in FIG. 1B. The sequence of the B region may be different for each different oligonucleotide. As defined herein, the B region of an oligonucleotide in one group corresponds to a gap between two consecutive oligonucleotides in the complementary group of oligonucleotides. It should be noted that the 5'-most oligonucleotide in each group (P1 in Group P and NT in Group N) does not have a 5' region that is complementary to the 5' region of any other oligonucleotide in either group. Accordingly, the 5'-most oligonucleotides (P1 and NT) that are illustrated in FIG. 1A each have a 3' complementary region and a 5' non-complementary region (the B region of FIG. 1B), but no 5' complementary region.
However, it should be appreciated that any one or more of the oligonucleotides in Group P and/or Group N (including all of the oligonucleotides in Group P and/or Group N) can be designed to have no B region. In the absence of a B region, a 5'-most oligonucleotide has only the 3' complementary region (meaning that the entire oligonucleotide is complementary to the 3' region of the 3'-most oligonucleotide from the other group, e.g., the 3' region of N1 or PT shown in FIG. 1A). In the absence of a B region, one of the other oligonucleotides in either Group P or Group N has only a 5' complementary region and a 3' complementary region (meaning that the entire oligonucleotide is complementary to the 5' and 3' sequence regions of the two overlapping oligonucleotides from the complementary group). In some embodiments, only a subset of oligonucleotides in an assembly reaction may include B regions. It should be appreciated that the length of the 5', 3', and B regions may be different for each oligonucleotide. However, for each oligonucleotide the length of the 5' region is the same as the length of the complementary 5' region in the 5' overlapping oligonucleotide from the other group. Similarly, the length of the 3' region is the same as the length of the complementary 3' region in the 3' overlapping oligonucleotide from the other group. However, in certain embodiments a 3'-most oligonucleotide may be designed with a 3' region that extends beyond the 5' region of the 5'-most oligonucleotide. In this embodiment, an assembled product may include the 5' end of the 5'-most oligonucleotide, but not the 3' end of the 3'-most oligonucleotide that extends beyond it. FIG. 1C illustrates a subset of the oligonucleotides from FIG. 1A, each oligonucleotide having a 5', a 3', and an optional B region. Oligonucleotide Pn is shown with a 5' region that is complementary to (and can anneal to) the 5' region of oligonucleotide Nn-1.
Oligonucleotide Pn also has a 3' region that is complementary to (and can anneal to) the 3' region of oligonucleotide Nn. Nn is also shown with a 5' region that is complementary to (and can anneal to) the 5' region of oligonucleotide Pn+1. This pattern could be repeated for all of oligonucleotides P2 to PT and N1 to NT-1 (with the 5'-most oligonucleotides only having 3' complementary regions as discussed herein). If all of the oligonucleotides from Group P and Group N are mixed together under appropriate hybridization conditions, they may anneal to form a long chain such as the oligonucleotide complex illustrated in FIG. 1A. However, subsets of the oligonucleotides may form shorter chains and even oligonucleotide dimers with annealed 5' or 3' regions. It should be appreciated that many copies of each oligonucleotide are included in a typical reaction mixture. Accordingly, the resulting hybridized reaction mixture may contain a distribution of different oligonucleotide dimers and complexes. Polymerase-mediated extension of the hybridized oligonucleotides results in a template-based extension of the 3' ends of oligonucleotides that have annealed 3' regions. Accordingly, polymerase-mediated extension of the oligonucleotides shown in FIG. 1C would result in extension of the 3' ends only of oligonucleotides Pn and Nn, generating extended oligonucleotides containing sequences that are complementary to all the regions of Nn and Pn, respectively. Extended oligonucleotide products with sequences complementary to all of Nn-1 and Pn+1 would not be generated unless oligonucleotides Pn-1 and Nn+1 were included in the reaction mixture. Accordingly, if all of the oligonucleotide sequences in a plurality of oligonucleotides are to be incorporated into an assembled nucleic acid fragment using a polymerase, the plurality of oligonucleotides should include 5'-most oligonucleotides that are at least complementary to the entire 3' regions of the 3'-most oligonucleotides.
In some embodiments, the 5'-most oligonucleotides also may have 5' regions that extend beyond the 3' ends of the 3'-most oligonucleotides as illustrated in FIG. 1A. In some embodiments, a ligase also may be added to ligate adjacent 5' and 3' ends that may be formed upon 3' extension of annealed oligonucleotides in an oligonucleotide complex such as the one illustrated in FIG. 1A.
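The alternating layout described for FIG. 1A may be easier to follow as a short sketch. The Python code below is illustrative only and is not part of the disclosed methods. It assumes a single fixed oligonucleotide length and a single fixed overlap length (the description allows each region to differ in length), and the function names `reverse_complement` and `design_overlapping_oligos` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def design_overlapping_oligos(target, oligo_len, overlap_len):
    """Tile the positive strand `target` with alternating P and N oligonucleotides.

    Illustrative assumptions: every oligonucleotide has the same length and
    every complementary region has the same length. Each sequence is returned
    5'->3'. The B region of each internal oligonucleotide has length
    oligo_len - 2 * overlap_len and spans the gap in the other group.
    """
    if overlap_len < 1 or oligo_len < 2 * overlap_len:
        raise ValueError("oligo_len must be at least 2 * overlap_len")
    step = oligo_len - overlap_len
    group_p, group_n = [], []
    for index, start in enumerate(range(0, len(target) - overlap_len, step)):
        segment = target[start:start + oligo_len]  # last segment may be shorter
        if index % 2 == 0:
            group_p.append(segment)  # positive-strand sequence
        else:
            group_n.append(reverse_complement(segment))  # negative-strand sequence
    return group_p, group_n
```

In this sketch the 3' region of each P oligonucleotide is complementary to the 3' region of the next N oligonucleotide, and the 5' region of that N oligonucleotide is complementary to the 5' region of the following P oligonucleotide, matching the pattern described above. Setting `oligo_len` equal to `2 * overlap_len` gives oligonucleotides with no B region.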
When assembling a nucleic acid fragment using a polymerase, a single cycle of polymerase extension extends oligonucleotide pairs with annealed 3' regions. Accordingly, if a plurality of oligonucleotides were annealed to form an annealed complex such as the one illustrated in FIG. 1A, a single cycle of polymerase extension would result in the extension of the 3' ends of the P1/N1, P2/N2, ..., Pn-1/Nn-1, Pn/Nn, Pn+1/Nn+1, ..., PT/NT oligonucleotide pairs. In one embodiment, a single molecule could be generated by ligating the extended oligonucleotide dimers. In one embodiment, a single molecule incorporating all of the oligonucleotide sequences may be generated by performing several polymerase extension cycles.
In one embodiment, FIG. 1D illustrates two cycles of polymerase extension (separated by a denaturing step and an annealing step) and the resulting nucleic acid products. It should be appreciated that several cycles of polymerase extension may be required to assemble a single nucleic acid fragment containing all the sequences of an initial plurality of oligonucleotides. In one embodiment, a minimal number of extension cycles for assembling a nucleic acid may be calculated as log2(n), where n is the number of oligonucleotides being assembled. In some embodiments, progressive assembly of the nucleic acid may be achieved without using temperature cycles. For example, an enzyme capable of rolling circle amplification may be used (e.g., phi 29 polymerase) when a circularized nucleic acid (e.g., oligonucleotide) complex is used as a template to produce a large amount of circular product for subsequent processing using MutS or a MutS homolog as described herein. In step 1 of FIG. 1D, annealed oligonucleotide pairs Pn/Nn and Pn+1/Nn+1 are extended to form oligonucleotide dimer products incorporating the sequences covered by the respective oligonucleotide pairs. For example, Pn is extended to incorporate sequences that are complementary to the B and 5' regions of Nn (indicated as N'n in FIG. 1D). Similarly, Nn+1 is extended to incorporate sequences that are complementary to the 5' and B regions of Pn+1 (indicated as P'n+1 in FIG. 1D). These dimer products may be denatured and reannealed to form the starting material of step 2 where the 3' end of the extended Pn oligonucleotide is annealed to the 3' end of the extended Nn+1 oligonucleotide.
This product may be extended in a polymerase-mediated reaction to form a product that incorporates the sequences of the four oligonucleotides (Pn, Nn, Pn+1, Nn+1). One strand of this extended product has a sequence that includes (in 5' to 3' order) the 5', B, and 3' regions of Pn, the complement of the B region of Nn, the 5', B, and 3' regions of Pn+1, and the complements of the B and 5' regions of Nn+1. The other strand of this extended product has the complementary sequence. It should be appreciated that the 3' regions of Pn and Nn are complementary, the 5' regions of Nn and Pn+1 are complementary, and the 3' regions of Pn+1 and Nn+1 are complementary. It also should be appreciated that the reaction products shown in FIG. 1D are a subset of the reaction products that would be obtained using all of the oligonucleotides of Group P and Group N. A first polymerase extension reaction using all of the oligonucleotides would result in a plurality of overlapping oligonucleotide dimers from P1/N1 to PT/NT. Each of these may be denatured and at least one of the strands could then anneal to an overlapping complementary strand from an adjacent (either 3' or 5') oligonucleotide dimer and be extended in a second cycle of polymerase extension as shown in FIG. 1D. Subsequent cycles of denaturing, annealing, and extension produce progressively larger products including a nucleic acid fragment that includes the sequences of all of the initial oligonucleotides. It should be appreciated that these subsequent rounds of extension also produce many nucleic acid products of intermediate length. The reaction product may be complex since not all of the 3' regions may be extended in each cycle. Accordingly, unextended oligonucleotides may be available in each cycle to anneal to other unextended oligonucleotides or to previously extended oligonucleotides. Similarly, extended products of different sizes may anneal to each other in each cycle.
Accordingly, a mixture of extended products of different sizes covering different regions of the sequence may be generated along with the nucleic acid fragment covering the entire sequence. This mixture also may contain any remaining unextended oligonucleotides. FIG. 2 shows an embodiment of a plurality of oligonucleotides that may be assembled in a directional polymerase-based multiplex oligonucleotide assembly reaction. In this embodiment, only the 5'-most oligonucleotide of Group P may be provided. In contrast to the example shown in FIG. 1, the remainder of the sequence of the predetermined nucleic acid fragment is provided by oligonucleotides of Group N. The 3'-most oligonucleotide of Group N (N1) has a 3' region that is complementary to the 3' region of P1 as shown in FIG. 2B. However, the remainder of the oligonucleotides in Group N have overlapping (but non-complementary) 3' and 5' regions as illustrated in FIG. 2B for oligonucleotides N1-N3. Each Group N oligonucleotide (e.g., Nn) overlaps with two adjacent oligonucleotides: one overlaps with the 3' region (Nn-1) and one with the 5' region (Nn+1), except for N1 that overlaps with the 3' regions of P1 (complementary overlap) and N2 (non-complementary overlap), and NT that overlaps only with NT-1. It should be appreciated that all of the overlaps shown in FIG. 2A between adjacent oligonucleotides N2 to NT-1 are non-complementary overlaps between the 5' region of one oligonucleotide and the 3' region of the adjacent oligonucleotide illustrated in a 3' to 5' direction on the N strand of the predetermined nucleic acid fragment. It also should be appreciated that each oligonucleotide may have 3', B, and 5' regions of different lengths (including no B region in some embodiments). In some embodiments, none of the oligonucleotides may have B regions, meaning that the entire sequence of each oligonucleotide may overlap with the combined 5' and 3' region sequences of its two adjacent oligonucleotides.
Assembly of a predetermined nucleic acid fragment from the plurality of oligonucleotides shown in FIG. 2A may involve multiple cycles of polymerase-mediated extension. Each extension cycle may be separated by a denaturing and an annealing step. FIG. 2C illustrates the first two steps in this assembly process. In step 1, annealed oligonucleotides P1 and N1 are extended to form an oligonucleotide dimer. P1 is shown with a 5' region that is non-complementary to the 3' region of N1 and extends beyond the 3' region of N1 when the oligonucleotides are annealed. However, in some embodiments, P1 may lack the 5' non-complementary region and include only sequences that overlap with the 3' region of N1. The product of P1 extension is shown after step 1 containing an extended region that is complementary to the 5' end of N1. The single strand illustrated in FIG. 2C may be obtained by denaturing the oligonucleotide dimer that results from the extension of P1/N1 in step 1. The product of P1 extension is shown annealed to the 3' region of N2. This annealed complex may be extended in step 2 to generate an extended product that now includes sequences complementary to the B and 5' regions of N2. Again, the single strand illustrated in FIG. 2C may be obtained by denaturing the oligonucleotide dimer that results from the extension reaction of step 2. Additional cycles of extension may be performed to further assemble a predetermined nucleic acid fragment. In each cycle, extension results in the addition of sequences complementary to the B and 5' regions of the next Group N oligonucleotide. Each cycle may include a denaturing and annealing step. However, the extension may occur under the annealing conditions. Accordingly, in one embodiment, cycles of extension may be obtained by alternating between denaturing conditions (e.g., a denaturing temperature) and annealing/extension conditions (e.g., an annealing/extension temperature).
In one embodiment, T (the number of group N oligonucleotides) may determine the minimal number of temperature cycles used to assemble the oligonucleotides. However, in some embodiments, progressive extension may be achieved without temperature cycling. For example, an enzyme capable of promoting rolling circle amplification may be used (e.g., TempliPhi). It should be appreciated that a reaction mixture containing an assembled predetermined nucleic acid fragment also may contain a distribution of shorter extension products that may result from incomplete extension during one or more of the cycles or may be the result of a P1/N1 extension that was initiated after the first cycle.
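The two cycle-count estimates given herein (a base-2 logarithm of the number of oligonucleotides for the assembly of FIG. 1, and T for the directional assembly of FIG. 2) can be compared with a small calculation. The Python sketch below is illustrative only. The function names are hypothetical, and rounding the logarithm up to a whole number of cycles is an assumption, since the description gives only the logarithm itself.

```python
def min_cycles_pairwise(n_oligos):
    """Smallest whole number of cycles c such that 2**c >= n_oligos.

    Assumption: the log2(n) estimate is rounded up to a whole cycle.
    """
    if n_oligos < 1:
        raise ValueError("n_oligos must be at least 1")
    # For a positive integer n, (n - 1).bit_length() equals ceil(log2(n)).
    return (n_oligos - 1).bit_length()


def min_cycles_directional(t_group_n):
    """Minimal cycles for directional assembly: one per Group N oligonucleotide."""
    if t_group_n < 1:
        raise ValueError("t_group_n must be at least 1")
    return t_group_n
```

Under these assumptions, the pairwise estimate grows logarithmically with the number of oligonucleotides while the directional estimate grows linearly. Both are lower bounds, since incomplete extension in any cycle may require additional cycles in practice.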
FIG. 2D illustrates an example of a sequential extension reaction where the 5'-most P1 oligonucleotide is bound to a support and the Group N oligonucleotides are unbound. The reaction steps are similar to those described for FIG. 2C. However, an extended predetermined nucleic acid fragment will be bound to the support via the 5'-most P1 oligonucleotide. Accordingly, the complementary strand (the negative strand) may readily be obtained by denaturing the bound fragment and releasing the negative strand. In some embodiments, the attachment to the support may be labile or readily reversed (e.g., using light, a chemical reagent, a pH change, etc.) and the positive strand also may be released. Accordingly, either the positive strand, the negative strand, or the double-stranded product may be obtained. FIG. 2E illustrates an example of a sequential reaction where P1 is unbound and the Group N oligonucleotides are bound to a support. The reaction steps are similar to those described for FIG. 2C. However, an extended predetermined nucleic acid fragment will be bound to the support via the 5'-most NT oligonucleotide. Accordingly, the complementary strand (the positive strand) may readily be obtained by denaturing the bound fragment and releasing the positive strand. In some embodiments, the attachment to the support may be labile or readily reversed (e.g., using light, a chemical reagent, a pH change, etc.) and the negative strand also may be released. Accordingly, either the positive strand, the negative strand, or the double-stranded product may be obtained. It should be appreciated that other configurations of oligonucleotides may be used to assemble a nucleic acid via two or more cycles of polymerase-based extension. In many configurations, at least one pair of oligonucleotides have complementary 3' end regions.
FIG. 2F illustrates an example where an oligonucleotide pair with complementary 3' end regions is flanked on either side by a series of oligonucleotides with overlapping non-complementary sequences. The oligonucleotides illustrated to the right of the complementary pair have overlapping 3' and 5' regions (with the 3' region of one oligonucleotide being identical to the 5' region of the adjacent oligonucleotide) that correspond to a sequence of one strand of the target nucleic acid to be assembled. The oligonucleotides illustrated to the left of the complementary pair have overlapping 3' and 5' regions (with the 3' region of one oligonucleotide being identical to the 5' region of the adjacent oligonucleotide) that correspond to a sequence of the complementary strand of the target nucleic acid. These oligonucleotides may be assembled via sequential polymerase-based extension reactions as described herein (see also, for example, Xiong et al., 2004, Nucleic Acids Research, Vol. 32, No. 12, e98, 10 pages, the disclosure of which is incorporated by reference herein). It should be appreciated that different numbers and/or lengths of oligonucleotides may be used on either side of the complementary pair. Accordingly, the illustration of the complementary pair as the central pair in FIG. 2F is not intended to be limiting, as other configurations of a complementary oligonucleotide pair flanked by a different number of non-complementary pairs on either side may be used according to methods of the invention.
FIG. 3 shows an embodiment of a plurality of oligonucleotides that may be assembled in a ligase reaction. FIG. 3A illustrates the alignment of the oligonucleotides showing that they do not contain gaps (i.e., no B region as described herein). Accordingly, the oligonucleotides may anneal to form a complex with no nucleotide gaps between the 3' and 5' ends of the annealed oligonucleotides in either Group P or Group N. These oligonucleotides provide a suitable template for assembly using a ligase under appropriate reaction conditions. However, it should be appreciated that these oligonucleotides also may be assembled using a polymerase-based assembly reaction as described herein. FIG. 3B shows two individual ligation reactions. These reactions are illustrated in two steps. However, it should be appreciated that these ligation reactions may occur simultaneously or sequentially in any order and may occur as such in a reaction maintained under constant reaction conditions (e.g., with no temperature cycling) or in a reaction exposed to several temperature cycles. For example, the reaction illustrated in step 2 may occur before the reaction illustrated in step 1. In each ligation reaction illustrated in FIG. 3B, a Group N oligonucleotide is annealed to two adjacent Group P oligonucleotides (due to the complementary 5' and 3' regions between the P and N oligonucleotides), providing a template for ligation of the adjacent P oligonucleotides. Although not illustrated, ligation of the N group oligonucleotides also may proceed in similar manner to assemble adjacent N oligonucleotides that are annealed to their complementary P oligonucleotide. Assembly of the predetermined nucleic acid fragment may be obtained through ligation of all of the oligonucleotides to generate a double stranded product. However, in some embodiments, a single stranded product of either the positive or negative strand may be obtained.
In certain embodiments, a plurality of oligonucleotides may be designed to generate only single-stranded reaction products in a ligation reaction. For example, a first group of oligonucleotides (of either Group P or Group N) may be provided to cover the entire sequence on one strand of the predetermined nucleic acid fragment (on either the positive or negative strand). In contrast, a second group of oligonucleotides (from the complementary group to the first group) may be designed to be long enough to anneal to complementary regions in the first group but not long enough to provide adjacent 5' and 3' ends between oligonucleotides in the second group. This provides substrates that are suitable for ligation of oligonucleotides from the first group but not the second group. The result is a single-stranded product having a sequence corresponding to the oligonucleotides in the first group. Again, as with other assembly reactions described herein, a ligase reaction mixture that contains an assembled predetermined nucleic acid fragment also may contain a distribution of smaller fragments resulting from the assembly of a subset of the oligonucleotides.
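The single-stranded ligation design described above can be sketched as follows. This Python code is illustrative only and is not part of the disclosed methods. It assumes fixed-length first-group oligonucleotides that tile one strand without gaps, and short second-group oligonucleotides centered on each junction. The names `design_single_strand_ligation` and `arm_len` are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def design_single_strand_ligation(target, oligo_len, arm_len):
    """Design a first group that tiles `target` and a second group of splints.

    Each splint is complementary to `arm_len` nucleotides on either side of a
    junction in the first group. Requiring arm_len < oligo_len / 2 leaves a
    gap between consecutive splints, so the splints themselves do not present
    adjacent 5' and 3' ends for ligation.
    """
    if not 0 < arm_len < oligo_len / 2:
        raise ValueError("arm_len must be positive and less than oligo_len / 2")
    # First group: contiguous, gap-free tiling of one strand.
    first_group = [target[i:i + oligo_len]
                   for i in range(0, len(target), oligo_len)]
    # Second group: one splint per junction between first-group oligonucleotides.
    junctions = range(oligo_len, len(target), oligo_len)
    splints = [reverse_complement(target[j - arm_len:j + arm_len])
               for j in junctions]
    return first_group, splints
```

Concatenating the first group in order reproduces the target strand, which corresponds to the single-stranded ligation product described above.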
FIG. 4 shows an embodiment of a ligase-based assembly where one or more of the plurality of oligonucleotides is bound to a support. In FIG. 4A, the 5'-most oligonucleotide of the P group oligonucleotides is bound to a support. Ligation of adjacent oligonucleotides in the 5' to 3' direction results in the assembly of a predetermined nucleic acid fragment. FIG. 4A illustrates an example where adjacent oligonucleotides P2 and P3 are added sequentially. However, the ligation of any two adjacent oligonucleotides from Group P may occur independently and in any order in a ligation reaction mixture. For example, when P1 is ligated to the 5' end of N2, N2 may be in the form of a single oligonucleotide or it already may be ligated to one or more downstream oligonucleotides (N3, N4, etc.). It should be appreciated that for a ligation assembly bound to a support, either the 5'-most (e.g., P1 for Group P, or NT for Group N) or the 3'-most (e.g., PT for Group P, or N1 for Group N) oligonucleotide may be bound to a support since the reaction can proceed in any direction. In some embodiments, a predetermined nucleic acid fragment may be assembled with a central oligonucleotide (i.e., neither the 5'-most nor the 3'-most) that is bound to a support provided that the attachment to the support does not interfere with ligation.
FIG. 4B illustrates an example where a plurality of N group oligonucleotides are bound to a support and a predetermined nucleic acid fragment is assembled from P group oligonucleotides that anneal to their complementary support-bound N group oligonucleotides. Again, FIG. 4B illustrates a sequential addition. However, adjacent P group oligonucleotides may be ligated in any order. Also, the bound oligonucleotides may be attached at their 5' end, 3' end, or at any other position provided that the attachment does not interfere with their ability to bind to complementary 5' and 3' regions on the oligonucleotides that are being assembled. This reaction may involve one or more reaction condition changes (e.g., temperature cycles) so that ligated oligonucleotides bound to one immobilized N group oligonucleotide can be dissociated from the support and bind to a different immobilized N group oligonucleotide to provide a substrate for ligation to another P group oligonucleotide.
As with other assembly reactions described herein, support-bound ligase reactions (e.g., those illustrated in FIG. 4B) that generate a full length predetermined nucleic acid fragment also may generate a distribution of smaller fragments resulting from the assembly of subsets of the oligonucleotides. A support used in any of the assembly reactions described herein (e.g., polymerase-based, ligase-based, or other assembly reaction) may include any suitable support medium. A support may be solid, porous, a matrix, a gel, beads, beads in a gel, etc. A support may be of any suitable size. A solid support may be provided in any suitable configuration or shape (e.g., a chip, a bead, a gel, a microfluidic channel, a planar surface, a spherical shape, a column, etc.).
As illustrated herein, different oligonucleotide assembly reactions may be used to assemble a plurality of overlapping oligonucleotides (with overlaps that are either 5'/5', 3'/3', 5'/3', complementary, non-complementary, or a combination thereof). Many of these reactions include at least one pair of oligonucleotides (the pair including one oligonucleotide from a first group or P group of oligonucleotides and one oligonucleotide from a second group or N group of oligonucleotides) that have overlapping complementary 3' regions. However, in some embodiments, a predetermined nucleic acid may be assembled from non-overlapping oligonucleotides using blunt-ended ligation reactions. In some embodiments, the order of assembly of the non-overlapping oligonucleotides may be biased by selective phosphorylation of different 5' ends. In some embodiments, size purification may be used to select for the correct order of assembly. In some embodiments, the correct order of assembly may be promoted by sequentially adding appropriate oligonucleotide substrates into the reaction (e.g., the ligation reaction).
In order to obtain a full-length nucleic acid fragment from a multiplex oligonucleotide assembly reaction, a purification step may be used to remove starting oligonucleotides and/or incompletely assembled fragments. In some embodiments, a purification step may involve chromatography, electrophoresis, or other physical size separation technique. In certain embodiments, a purification step may involve amplifying the full length product. For example, a pair of amplification primers (e.g., PCR primers) that correspond to the predetermined 5' and 3' ends of the nucleic acid fragment being assembled will preferentially amplify full length product in an exponential fashion. It should be appreciated that smaller assembled products may be amplified if they contain the predetermined 5' and 3' ends. However, such smaller-than-expected products containing the predetermined 5' and 3' ends should only be generated if an error occurred during assembly (e.g., resulting in the deletion or omission of one or more regions of the target nucleic acid) and may be removed by size fractionation of the amplified product. Accordingly, a preparation containing a relatively high amount of full length product may be obtained directly by amplifying the product of an assembly reaction using primers that correspond to the predetermined 5' and 3' ends. In some embodiments, additional purification (e.g., size selection) techniques may be used to obtain a more purified preparation of amplified full-length nucleic acid fragment.
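The use of end-specific amplification primers described above can be sketched as follows. This Python code is illustrative only and is not part of the disclosed methods. The primer length of 20 nucleotides is an assumed default, the function names are hypothetical, and the check uses exact string matching rather than any model of primer annealing.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def end_primers(target, primer_len=20):
    """Return (forward, reverse) primers matching the ends of `target`.

    `target` is the positive strand, 5'->3'. The forward primer copies its
    5' end; the reverse primer is the reverse complement of its 3' end.
    """
    if not 0 < primer_len <= len(target):
        raise ValueError("primer_len must be between 1 and len(target)")
    forward = target[:primer_len]
    reverse = reverse_complement(target[-primer_len:])
    return forward, reverse


def has_both_ends(product, forward, reverse):
    """True if the positive-strand `product` carries both predetermined ends."""
    return (product.startswith(forward)
            and product.endswith(reverse_complement(reverse)))
```

In this sketch a truncated intermediate that lacks the 3' end fails the check, whereas a product with an internal deletion still passes it, which is consistent with the statement above that such products are amplified and are removed by size fractionation instead.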
When designing a plurality of oligonucleotides to assemble a predetermined nucleic acid fragment, the sequence of the predetermined fragment will be provided by the oligonucleotides as described herein. However, the oligonucleotides may contain additional sequence information that may be removed during assembly or may be provided to assist in subsequent manipulations of the assembled nucleic acid fragment. Examples of additional sequences include, but are not limited to, primer recognition sequences for amplification (e.g., PCR primer recognition sequences), restriction enzyme recognition sequences, recombination sequences, other binding or recognition sequences, labeled sequences, etc. In some embodiments, one or more of the 5'-most oligonucleotides, one or more of the 3'-most oligonucleotides, or any combination thereof, may contain one or more additional sequences. In some embodiments, the additional sequence information may be contained in two or more adjacent oligonucleotides on either strand of the predetermined nucleic acid sequence.
Accordingly, an assembled nucleic acid fragment may contain additional sequences that may be used to connect the assembled fragment to one or more additional nucleic acid fragments (e.g., one or more other assembled fragments, fragments obtained from other sources, vectors, etc.) via ligation, recombination, polymerase-mediated assembly, etc. In some embodiments, purification may involve cloning one or more assembled nucleic acid fragments. The cloned product may be screened (e.g., sequenced, analyzed for an insert of the expected size, etc.).
In some embodiments, a nucleic acid fragment assembled from a plurality of oligonucleotides may be combined with one or more additional nucleic acid fragments using a polymerase-based and/or a ligase-based extension reaction similar to those described herein for oligonucleotide assembly. Accordingly, one or more overlapping nucleic acid fragments may be combined and assembled to produce a larger nucleic acid fragment as described herein. In certain embodiments, double-stranded overlapping oligonucleotide fragments may be combined. However, single-stranded fragments, or combinations of single-stranded and double-stranded fragments may be combined as described herein. A nucleic acid fragment assembled from a plurality of oligonucleotides may be of any length depending on the number and length of the oligonucleotides used in the assembly reaction. For example, a nucleic acid fragment (either single-stranded or double-stranded) assembled from a plurality of oligonucleotides may be between 50 and 1,000 nucleotides long (for example, about 70 nucleotides long, between 100 and 500 nucleotides long, between 200 and 400 nucleotides long, about 200 nucleotides long, about 300 nucleotides long, about 400 nucleotides long, etc.). One or more such nucleic acid fragments (e.g., with overlapping 3' and/or 5' ends) may be assembled to form a larger nucleic acid fragment (single-stranded or double-stranded) as described herein.
A full length product assembled from smaller nucleic acid fragments also may be isolated or purified as described herein (e.g., using a size selection, cloning, selective binding or other suitable purification procedure). In addition, any assembled nucleic acid fragment (e.g., full-length nucleic acid fragment) described herein may be amplified (prior to, as part of, or after, a purification procedure) using appropriate 5' and 3' amplification primers.
Synthetic Oligonucleotides:
It should be appreciated that the terms P Group and N Group oligonucleotides are used herein for clarity purposes only, and to illustrate several embodiments of multiplex oligonucleotide assembly. The Group P and Group N oligonucleotides described herein are interchangeable, and may be referred to as first and second groups of oligonucleotides corresponding to sequences on complementary strands of a target nucleic acid fragment.
Oligonucleotides may be synthesized using any suitable technique. For example, oligonucleotides may be synthesized on a column or other support (e.g., a chip). Examples of chip-based synthesis techniques include techniques used in synthesis devices or methods available from Combimatrix, Agilent, Affymetrix, or other sources. A synthetic oligonucleotide may be of any suitable size, for example between 10 and 1,000 nucleotides long (e.g., between 10 and 200, 200 and 500, 500 and 1,000 nucleotides long, or any combination thereof). An assembly reaction may include a plurality of oligonucleotides, each of which independently may be between 10 and 200 nucleotides in length (e.g., between 20 and 150, between 30 and 100, 30 to 90, 30-80, 30-70, 30-60, 35-55, 40-50, or any intermediate number of nucleotides). However, one or more shorter or longer oligonucleotides may be used in certain embodiments. Oligonucleotides may be provided as single-stranded synthetic products. However, in some embodiments, oligonucleotides may be provided as double-stranded preparations including an annealed complementary strand. Oligonucleotides may be molecules of DNA, RNA, PNA, or any combination thereof. A double-stranded oligonucleotide may be produced by amplifying a single-stranded synthetic oligonucleotide or other suitable template (e.g., a sequence in a nucleic acid preparation such as a nucleic acid vector or genomic nucleic acid). Accordingly, a plurality of oligonucleotides designed to have the sequence features described herein may be provided as a plurality of single-stranded oligonucleotides having those features, or also may be provided along with complementary oligonucleotides. In some embodiments, an oligonucleotide may be phosphorylated (e.g., with a 5' phosphate). In some embodiments, an oligonucleotide may be non-phosphorylated.
In some embodiments, an oligonucleotide may be amplified using an appropriate primer pair with one primer corresponding to each end of the oligonucleotide (e.g., one that is complementary to the 3' end of the oligonucleotide and one that is identical to the 5' end of the oligonucleotide). In some embodiments, an oligonucleotide may be designed to contain a central assembly sequence (designed to be incorporated into the target nucleic acid) flanked by a 5' amplification sequence (e.g., a 5' universal sequence) and a 3' amplification sequence (e.g., a 3' universal sequence). Amplification primers (e.g., between 10 and 50 nucleotides long, between 15 and 45 nucleotides long, about 25 nucleotides long, etc.) corresponding to the flanking amplification sequences may be used to amplify the oligonucleotide (e.g., one primer may be complementary to the 3' amplification sequence and one primer may have the same sequence as the 5' amplification sequence). The amplification sequences then may be removed from the amplified oligonucleotide using any suitable technique to produce an oligonucleotide that contains only the assembly sequence.
In some embodiments, a plurality of different oligonucleotides (e.g., about 5, 10, 50, 100, or more) with different central assembly sequences may have identical 5' amplification sequences and identical 3' amplification sequences. These oligonucleotides can all be amplified in the same reaction using the same amplification primers.
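As a non-limiting illustration of the flanked oligonucleotide design described above, the following Python sketch builds oligonucleotides that share universal 5' and 3' amplification sequences, derives the primer pair (one primer identical to the 5' amplification sequence, one complementary to the 3' amplification sequence), and removes the flanks in silico. All function names and the example sequences are hypothetical and chosen only for illustration; the sketch handles unmodified DNA letters (A, C, G, T) only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_flanked_oligo(assembly_seq: str, amp5: str, amp3: str) -> str:
    """Place the central assembly sequence between the two amplification sequences."""
    return amp5 + assembly_seq + amp3


def amplification_primers(amp5: str, amp3: str) -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'.

    The forward primer has the same sequence as the 5' amplification
    sequence; the reverse primer is complementary to the 3' amplification
    sequence.
    """
    return amp5, reverse_complement(amp3)


def strip_flanks(oligo: str, amp5: str, amp3: str) -> str:
    """Recover the assembly sequence by removing both flanks in silico."""
    if not (oligo.startswith(amp5) and oligo.endswith(amp3)):
        raise ValueError("oligonucleotide does not carry the expected flanks")
    return oligo[len(amp5):len(oligo) - len(amp3)]


if __name__ == "__main__":
    amp5 = "GTCACGTAGCATCGA"   # hypothetical 5' universal sequence
    amp3 = "CTGATCGGATTCAGC"   # hypothetical 3' universal sequence
    assembly_seqs = ["ATGGCTAGCAAAGGAGAAG", "TTCACTGGAGTTGTCCCAA"]
    oligos = [build_flanked_oligo(s, amp5, amp3) for s in assembly_seqs]
    print(amplification_primers(amp5, amp3))
    print([strip_flanks(o, amp5, amp3) for o in oligos] == assembly_seqs)
```

Because every oligonucleotide in the list carries the same flanks, a single primer pair serves the whole pool, which mirrors the single-reaction amplification described above.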
A preparation of an oligonucleotide designed to have a certain sequence may include oligonucleotide molecules having the designed sequence in addition to oligonucleotide molecules that contain errors (e.g., that differ from the designed sequence at least at one position). A sequence error may include one or more nucleotide deletions, additions, substitutions (e.g., transversion or transition), inversions, duplications, or any combination of two or more thereof. Oligonucleotide errors may be generated during oligonucleotide synthesis. Different synthetic techniques may be prone to different error profiles and frequencies. In some embodiments, error rates may vary from 1/10 to 1/200 errors per base depending on the synthesis protocol that is used.
However, in some embodiments lower error rates may be achieved. Also, the types of errors may depend on the synthetic techniques that are used. For example, in some embodiments chip-based oligonucleotide synthesis may result in relatively more deletions than column-based synthetic techniques.
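To illustrate what a per-base error rate implies for a whole oligonucleotide, the following Python sketch estimates the fraction of molecules expected to be error-free. It rests on a simplifying assumption that is not stated in this description, namely that errors occur independently and at the same rate at every position; the function name `error_free_fraction` is hypothetical.

```python
def error_free_fraction(length: int, error_rate: float) -> float:
    """Expected fraction of oligonucleotides of `length` bases with no errors.

    Assumes independent, uniform per-base errors (a simplification), so the
    probability that all positions are correct is (1 - error_rate) ** length.
    """
    if length < 0 or not 0.0 <= error_rate <= 1.0:
        raise ValueError("length must be >= 0 and error_rate within [0, 1]")
    return (1.0 - error_rate) ** length


if __name__ == "__main__":
    # Compare several per-base error rates for a 50-nucleotide oligonucleotide.
    for rate in (1 / 50, 1 / 200, 1 / 2000):
        print(f"rate 1/{round(1 / rate)}: {error_free_fraction(50, rate):.3f}")
```

Under this model, a 50-mer synthesized at 1/200 errors per base is error-free in about 78% of molecules, whereas at 1/2,000 errors per base the figure rises to roughly 97.5%, which suggests why both oligonucleotide length and error-reduction processing matter for assembly fidelity.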
In some embodiments, one or more oligonucleotide preparations may be processed to remove (or reduce the frequency of) error-containing oligonucleotides. In some embodiments, a hybridization technique may be used wherein an oligonucleotide preparation is hybridized under stringent conditions one or more times to an immobilized oligonucleotide preparation designed to have a complementary sequence. Oligonucleotides that do not bind may be removed in order to selectively or specifically remove oligonucleotides that contain errors that would destabilize hybridization under the conditions used. It should be appreciated that this processing may not remove all error-containing oligonucleotides since many have only one or two sequence errors and may still bind to the immobilized oligonucleotides with sufficient affinity for a fraction of them to remain bound through this selection processing procedure.
In some embodiments, a nucleic acid binding protein or recombinase (e.g., RecA) may be included in one or more of the oligonucleotide processing steps to improve the selection of error-free oligonucleotides. For example, by preferentially promoting the hybridization of oligonucleotides that are completely complementary with the immobilized oligonucleotides, the amount of error-containing oligonucleotides that are bound may be reduced. As a result, this oligonucleotide processing procedure may remove more error-containing oligonucleotides and generate an oligonucleotide preparation that has a lower error frequency (e.g., with an error rate of less than 1/50, less than 1/100, less than 1/200, less than 1/300, less than 1/400, less than 1/500, less than 1/1,000, or less than 1/2,000 errors per base).
A plurality of oligonucleotides used in an assembly reaction may contain preparations of synthetic oligonucleotides, single-stranded oligonucleotides, double-stranded oligonucleotides, amplification products, oligonucleotides that are processed to remove (or reduce the frequency of) error-containing variants, etc., or any combination of two or more thereof.
In some aspects, synthetic oligonucleotides synthesized on an array (e.g., a chip) are not amplified prior to assembly. In some embodiments, a polymerase-based or ligase-based assembly using non-amplified oligonucleotides may be performed in a microfluidic device. In some aspects, a synthetic oligonucleotide may be amplified prior to use. Either strand of a double-stranded amplification product may be used as an assembly oligonucleotide and added to an assembly reaction as described herein. A synthetic oligonucleotide may be amplified using a pair of amplification primers (e.g., a first primer that hybridizes to the 3' region of the oligonucleotide and a second primer that hybridizes to the 3' region of the complement of the oligonucleotide). The oligonucleotide may be synthesized on a support such as a chip (e.g., using an ink-jet-based synthesis technology). In some embodiments, the oligonucleotide may be amplified while it is still attached to the support. In some embodiments, the oligonucleotide may be removed or cleaved from the support prior to amplification. The two strands of a double-stranded amplification product may be separated and isolated using any suitable technique. In some embodiments, the two strands may be differentially labeled (e.g., using one or more different molecular weight, affinity, fluorescent, electrostatic, magnetic, and/or other suitable tags). The different labels may be used to purify and/or isolate one or both strands. In some embodiments, biotin may be used as a purification tag. In some embodiments, the strand that is to be used for assembly may be directly purified (e.g., using an affinity or other suitable tag). In some embodiments, the complementary strand is removed (e.g., using an affinity or other suitable tag) and the remaining strand is used for assembly. In some embodiments, a synthetic oligonucleotide may include a central assembly sequence flanked by 5' and 3' amplification sequences.
The central assembly sequence is designed for incorporation into an assembled nucleic acid. The flanking sequences are designed for amplification and are not intended to be incorporated into the assembled nucleic acid. The flanking amplification sequences may be used as universal primer sequences to amplify a plurality of different assembly oligonucleotides that share the same amplification sequences but have different central assembly sequences. In some embodiments, the flanking sequences are removed after amplification to produce an oligonucleotide that contains only the assembly sequence.
In some embodiments, one of the two amplification primers may be biotinylated. The nucleic acid strand that incorporates this biotinylated primer during amplification can be affinity purified using streptavidin (e.g., bound to a bead, column, or other surface). In some embodiments, the amplification primers also may be designed to include certain sequence features that can be used to remove the primer regions after amplification in order to produce a single-stranded assembly oligonucleotide that includes the assembly sequence without the flanking amplification sequences.
In some embodiments, the non-biotinylated strand may be used for assembly. The assembly oligonucleotide may be purified by removing the biotinylated complementary strand. In some embodiments, the amplification sequences may be removed if the non-biotinylated primer includes a dU at its 3' end, and if the amplification sequence recognized by (i.e., complementary to) the biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence. After amplification, the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3' nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the biotinylated primer is removed. The biotinylated strand is then removed. The remaining non-biotinylated strand is then treated with uracil-DNA glycosylase (UDG) to remove the non-biotinylated primer sequence. This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
In some embodiments, the biotinylated strand may be used for assembly. The assembly oligonucleotide may be obtained directly by isolating the biotinylated strand. In some embodiments, the amplification sequences may be removed if the biotinylated primer includes a dU at its 3' end, and if the amplification sequence recognized by (i.e., complementary to) the non-biotinylated primer includes at most three of the four nucleotides and the fourth nucleotide is present in the assembly sequence at (or adjacent to) the junction between the amplification sequence and the assembly sequence. After amplification, the double-stranded product is incubated with T4 DNA polymerase (or other polymerase having a suitable editing activity) in the presence of the fourth nucleotide (without any of the nucleotides that are present in the amplification sequence recognized by the non-biotinylated primer) under appropriate reaction conditions. Under these conditions, the 3' nucleotides are progressively removed through to the nucleotide that is not present in the amplification sequence (referred to as the fourth nucleotide above). As a result, the amplification sequence that is recognized by the non-biotinylated primer is removed. The biotinylated strand is then isolated (and the non-biotinylated strand is removed). The isolated biotinylated strand is then treated with UDG to remove the biotinylated primer sequence. This technique generates a single-stranded assembly oligonucleotide without the flanking amplification sequences. It should be appreciated that this technique may be used to process a single amplified oligonucleotide preparation or a plurality of different amplified oligonucleotides in a single reaction if they share the same amplification sequence features described above.
It should be appreciated that the biotinylated primer may be designed to anneal to either the synthetic oligonucleotide or to its complement for the amplification and purification reactions described above. Similarly, the non-biotinylated primer may be designed to anneal to either strand provided it anneals to the strand that is complementary to the strand recognized by the biotinylated primer.
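The sequence constraint underlying the T4 DNA polymerase trimming step described above (a 3' amplification sequence that uses at most three of the four nucleotides, with the fourth nucleotide at the junction) can be checked and modeled in silico. The following Python sketch is a deliberately idealized model offered for illustration only: it treats the editing activity as removing 3' bases until the strand ends in the single supplied nucleotide, requires the fourth nucleotide exactly at the junction (the "adjacent to" case is not covered), and does not model the UDG step or the complementary strand. All names and sequences are hypothetical.

```python
def missing_nucleotides(amp_seq: str) -> set[str]:
    """Nucleotides that never occur in an amplification sequence."""
    return set("ACGT") - set(amp_seq)


def check_trim_design(strand: str, amp3_len: int, stop_nt: str) -> bool:
    """Check the design rule on a strand written 5'->3'.

    The last `amp3_len` bases are taken as the 3' amplification sequence.
    The rule holds if that sequence lacks `stop_nt` and the assembly
    sequence ends with `stop_nt` at the junction.
    """
    junction = len(strand) - amp3_len
    amp3 = strand[junction:]
    upstream = strand[:junction]
    return stop_nt not in amp3 and upstream.endswith(stop_nt)


def exo_trim_3prime(strand: str, stop_nt: str) -> str:
    """Idealized 3'->5' trimming that halts at the first `stop_nt` reached."""
    end = len(strand)
    while end > 0 and strand[end - 1] != stop_nt:
        end -= 1  # remove one 3'-terminal base
    return strand[:end]


if __name__ == "__main__":
    assembly = "ATGGCTAGCTAGGATCCG"   # hypothetical assembly sequence ending in G
    amp3 = "ATCATTCAACTTACA"          # hypothetical 3' flank containing no G
    strand = assembly + amp3
    print(missing_nucleotides(amp3))
    print(check_trim_design(strand, len(amp3), "G"))
    print(exo_trim_3prime(strand, "G") == assembly)
```

In this model the trimmed strand ends exactly at the junction only when the design rule holds; if the supplied nucleotide also occurred inside the flank, trimming would halt early and leave part of the amplification sequence attached.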
In certain embodiments, it may be helpful to include one or more modified oligonucleotides in an assembly reaction. An oligonucleotide may be modified by incorporating a modified base (e.g., a nucleotide analog) during synthesis, by modifying the oligonucleotide after synthesis, or any combination thereof. Examples of modifications include, but are not limited to, one or more of the following: universal bases such as nitroindoles, dP and dK, inosine, uracil; halogenated bases such as BrdU; fluorescent labeled bases; non-radioactive labels such as biotin (as a derivative of dT) and digoxigenin (DIG); 2,4-Dinitrophenyl (DNP); radioactive nucleotides; post-coupling modification such as dR-NH2 (deoxyribose-NH2); Acridine (6-chloro-2-methoxyacridine); and spacer phosphoramides which are used during synthesis to add a spacer 'arm' into the sequence, such as C3, C8 (octanediol), C9, C12, HEG (hexaethylene glycol) and C18.
It should be appreciated that one or more nucleic acid binding proteins or recombinases are preferably not included in a post-assembly fidelity optimization technique (e.g., a screening technique using a MutS or MutS homolog), because the optimization procedure involves removing error-containing nucleic acids via the production and removal of heteroduplexes. Accordingly, any nucleic acid binding proteins or recombinases (e.g., RecA) that were included in the assembly steps are preferably removed (e.g., by inactivation, column purification or other suitable technique) after assembly and prior to fidelity optimization.
Applications:
Aspects of the invention may be useful for a range of applications involving the production and/or use of synthetic nucleic acids. As described herein, the invention provides methods for assembling synthetic nucleic acids with increased efficiency. The resulting assembled nucleic acids may be amplified in vitro (e.g., using PCR, LCR, or any suitable amplification technique), amplified in vivo (e.g., via cloning into a suitable vector), isolated and/or purified. An assembled nucleic acid (alone or cloned into a vector) may be transformed into a host cell (e.g., a prokaryotic, eukaryotic, insect, mammalian, or other host cell). In some embodiments, the host cell may be used to propagate the nucleic acid. In certain embodiments, the nucleic acid may be integrated into the genome of the host cell. In some embodiments, the nucleic acid may replace a corresponding nucleic acid region on the genome of the cell (e.g., via homologous recombination). Accordingly, nucleic acids may be used to produce recombinant organisms. In some embodiments, a target nucleic acid may be an entire genome or large fragments of a genome that are used to replace all or part of the genome of a host organism. Recombinant organisms also may be used for a variety of research, industrial, agricultural, and/or medical applications.
Many of the techniques described herein can be used together, applying combinations of one or more extension-based and/or ligation-based assembly techniques at one or more points to produce long nucleic acid molecules. For example, concerted assembly may be used to assemble oligonucleotide duplexes and nucleic acid fragments of less than 100 to more than 10,000 base pairs in length (e.g., 100 mers to 500 mers, 500 mers to 1,000 mers, 1,000 mers to 5,000 mers, 5,000 mers to 10,000 mers, 25,000 mers, 50,000 mers, 75,000 mers, 100,000 mers, etc.). In an exemplary embodiment, methods described herein may be used during the assembly of an entire genome (or a large fragment thereof, e.g., about 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or more) of an organism (e.g., of a viral, bacterial, yeast, or other prokaryotic or eukaryotic organism), optionally incorporating specific modifications into the sequence at one or more desired locations.
Any of the nucleic acid products (e.g., including nucleic acids that are amplified, cloned, purified, isolated, etc.) may be packaged in any suitable format (e.g., in a stable buffer, lyophilized, etc.) for storage and/or shipping (e.g., for shipping to a distribution center or to a customer). Similarly, any of the host cells (e.g., cells transformed with a vector or having a modified genome) may be prepared in a suitable buffer for storage and/or transport (e.g., for distribution to a customer). In some embodiments, cells may be frozen. However, other stable cell preparations also may be used. Host cells may be grown and expanded in culture. Host cells may be used for expressing one or more RNAs or polypeptides of interest (e.g., therapeutic, industrial, agricultural, and/or medical proteins). The expressed polypeptides may be natural polypeptides or non-natural polypeptides. The polypeptides may be isolated or purified for subsequent use. Accordingly, nucleic acid molecules generated using methods of the invention can be incorporated into a vector. The vector may be a cloning vector or an expression vector. A vector may comprise an origin of replication and one or more selectable markers (e.g., antibiotic resistance markers, auxotrophic markers, etc.). In some embodiments, the vector may be a viral vector. A viral vector may comprise nucleic acid sequences capable of infecting target cells. Similarly, in some embodiments, a prokaryotic expression vector operably linked to an appropriate promoter system can be used to transform target cells. In other embodiments, a eukaryotic vector operably linked to an appropriate promoter system can be used to transfect target cells or tissues.
Transcription and/or translation of the constructs described herein may be carried out in vitro (i.e., using cell-free systems) or in vivo (i.e., expressed in cells). In some embodiments, cell lysates may be prepared. In certain embodiments, expressed RNAs or polypeptides may be isolated or purified. Nucleic acids of the invention also may be used to add detection and/or purification tags to expressed polypeptides or fragments thereof. Examples of polypeptide-based fusions/tags include, but are not limited to, hexa-histidine (His6), Myc and HA, and other polypeptides with utility, such as GFP, GST, MBP, chitin and the like. In some embodiments, polypeptides may comprise one or more unnatural amino acid residue(s). In some embodiments, antibodies can be made against polypeptides or fragment(s) thereof encoded by one or more synthetic nucleic acids.
In certain embodiments, synthetic nucleic acids may be provided as libraries for screening in research and development (e.g., to identify potential therapeutic proteins or peptides, to identify potential protein targets for drug development, etc.).
In some embodiments, a synthetic nucleic acid may be used as a therapeutic (e.g., for gene therapy, or for gene regulation). For example, a synthetic nucleic acid may be administered to a patient in an amount sufficient to express a therapeutic amount of a protein. In other embodiments, a synthetic nucleic acid may be administered to a patient in an amount sufficient to regulate (e.g., down-regulate) the expression of a gene.
It should be appreciated that different acts or embodiments described herein may be performed independently and may be performed at different locations in the United States or outside the United States. For example, each of the acts of receiving an order for a target nucleic acid, analyzing a target nucleic acid sequence, identifying an assembly strategy, designing one or more starting nucleic acids (e.g., oligonucleotides), synthesizing starting nucleic acid(s), purifying starting nucleic acid(s), assembling starting nucleic acid(s), isolating assembled nucleic acid(s), confirming the sequence of assembled nucleic acid(s), manipulating assembled nucleic acid(s) (e.g., amplifying, cloning, inserting into a host genome, etc.), and any other acts or any parts of these acts may be performed independently either at one location or at different sites within the United States or outside the United States. In some embodiments, an assembly procedure may involve a combination of acts that are performed at one site (in the United States or outside the United States) and acts that are performed at one or more remote sites (within the United States or outside the United States).
Automated applications:
Aspects of the invention may include automating one or more acts described herein. For example, a sequence analysis may be automated in order to generate a synthesis strategy automatically. The synthesis strategy may include i) the design of the starting nucleic acids that are to be assembled into the target nucleic acid, ii) the choice of the assembly technique(s) to be used, iii) the number of rounds of assembly and error screening or sequencing steps to include, and/or iv) decisions relating to subsequent processing of an assembled target nucleic acid. Similarly, one or more steps of an assembly reaction may be automated using one or more automated sample handling devices (e.g., one or more automated liquid or fluid handling devices). For example, the synthesis and optional selection of starting nucleic acids (e.g., oligonucleotides) may be automated using a nucleic acid synthesizer and automated procedures. Automated devices and procedures may be used to mix reaction reagents, including one or more of the following: starting nucleic acids, buffers, enzymes (e.g., one or more ligases and/or polymerases), nucleotides, nucleic acid binding proteins or recombinases, salts, and any other suitable agents such as stabilizing agents. In some embodiments, reaction reagents may include one or more reagents or reaction conditions suitable for extension-based assembly, ligation-based assembly, or combinations thereof. Automated devices and procedures also may be used to control the reaction conditions. For example, an automated thermal cycler may be used to control reaction temperatures and any temperature cycles that may be used. In some embodiments, a thermal cycler may be automated to provide one or more reaction temperatures or temperature cycles suitable for incubating nucleic acid fragments prior to transformation. Similarly, subsequent purification and analysis of assembled nucleic acid products may be automated.
For example, fidelity optimization steps (e.g., a MutS error screening procedure) may be automated using appropriate sample processing devices and associated protocols. Sequencing also may be automated using a sequencing device and automated sequencing protocols. Additional steps (e.g., amplification, cloning, etc.) also may be automated using one or more appropriate devices and related protocols. It should be appreciated that one or more of the device or device components described herein may be combined in a system (e.g., a robotic system). Assembly reaction mixtures (e.g., liquid reaction samples) may be transferred from one component of the system to another using automated devices and procedures (e.g., robotic manipulation and/or transfer of samples and/or sample containers, including automated pipetting devices, etc.). The system and any components thereof may be controlled by a control system.
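As one hedged example of how the automated design of starting nucleic acids mentioned above might be sketched in software, the following Python code tiles a target sequence with overlapping oligonucleotides that alternate between the two strands (labeled "P" and "N" here for the first and second groups). This particular tiling scheme, the default lengths, and all names (`Oligo`, `design_oligos`, `oligo_len`, `overlap`) are illustrative assumptions and are not taken from this description; an actual design would likely also weigh factors such as melting temperature and sequence repeats.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Oligo:
    start: int      # 0-based start on the target's top strand
    end: int        # exclusive end on the target's top strand
    group: str      # "P" (top strand) or "N" (complementary strand)
    sequence: str   # written 5'->3'


def design_oligos(target: str, oligo_len: int = 50, overlap: int = 20) -> list[Oligo]:
    """Tile `target` with overlapping oligos on alternating strands."""
    if not target:
        raise ValueError("target sequence is empty")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos: list[Oligo] = []
    start = 0
    while True:
        end = min(start + oligo_len, len(target))
        window = target[start:end]
        if len(oligos) % 2 == 0:
            oligos.append(Oligo(start, end, "P", window))
        else:
            oligos.append(Oligo(start, end, "N", reverse_complement(window)))
        if end == len(target):
            break
        start += step
    return oligos


if __name__ == "__main__":
    target = "ACGT" * 25  # hypothetical 100-base target
    for oligo in design_oligos(target):
        print(oligo.group, oligo.start, oligo.end, len(oligo.sequence))
```

The final oligonucleotide is simply truncated at the end of the target in this sketch, so it may be shorter than the others; a production design tool would likely rebalance lengths instead.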
Accordingly, acts of the invention may be automated using, for example, a computer system (e.g., a computer controlled system). A computer system on which aspects of the invention can be implemented may include a computer for any type of processing (e.g., sequence analysis and/or automated device control as described herein). However, it should be appreciated that certain processing steps may be provided by one or more of the automated devices that are part of the assembly system. In some embodiments, a computer system may include two or more computers. For example, one computer may be coupled, via a network, to a second computer. One computer may perform sequence analysis. The second computer may control one or more of the automated synthesis and assembly devices in the system. In other aspects, additional computers may be included in the network to control one or more of the analysis or processing acts. Each computer may include a memory and processor. The computers can take any form, as the aspects of the present invention are not limited to being implemented on any particular computer platform. Similarly, the network can take any form, including a private network or a public network (e.g., the Internet). Display devices can be associated with one or more of the devices and computers. Alternatively, or in addition, a display device may be located at a remote site and connected for displaying the output of an analysis in accordance with the invention. Connections between the different components of the system may be via wire, wireless transmission, satellite transmission, any other suitable transmission, or any combination of two or more of the above.
In accordance with one embodiment of the present invention for use on a computer system it is contemplated that sequence information (e.g., a target sequence, a processed analysis of the target sequence, etc.) can be obtained and then sent over a public network, such as the Internet, to a remote location to be processed by computer to produce any of the various types of outputs discussed herein (e.g., in connection with oligonucleotide design). However, it should be appreciated that the aspects of the present invention described herein are not limited in that respect, and that numerous other configurations are possible. For example, all of the analysis and processing described herein can alternatively be implemented on a computer that is attached locally to a device, an assembly system, or one or more components of an assembly system. As a further alternative, as opposed to transmitting sequence information (e.g., a target sequence, a processed analysis of the target sequence, etc.) over a communication medium (e.g., the network), the information can be loaded onto a computer readable medium that can then be physically transported to another computer for processing in the manners described herein. In another embodiment, a combination of two or more transmission/delivery techniques may be used. It also should be appreciated that computer implementable programs for performing a sequence analysis or controlling one or more of the devices, systems, or system components described herein also may be transmitted via a network or loaded onto a computer readable medium as described herein. Accordingly, aspects of the invention may involve performing one or more steps within the United States and additional steps outside the United States. 
In some embodiments, sequence information (e.g., a customer order) may be received at one location (e.g., in one country) and sent to a remote location for processing (e.g., in the same country or in a different country), for example, for sequence analysis to determine a synthesis strategy and/or design oligonucleotides. In certain embodiments, a portion of the sequence analysis may be performed at one site (e.g., in one country) and another portion at another site (e.g., in the same country or in another country). In some embodiments, different steps in the sequence analysis may be performed at multiple sites (e.g., all in one country or in several different countries). The results of a sequence analysis then may be sent to a further site for synthesis. However, in some embodiments, different synthesis and quality control steps may be performed at more than one site (e.g., within one country or in two or more countries). An assembled nucleic acid then may be shipped to a further site (e.g., either to a central shipping center or directly to a client).
Each of the different aspects, embodiments, or acts of the present invention described herein can be independently automated and implemented in any of numerous ways. For example, each aspect, embodiment, or act can be independently implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers. It should be appreciated that any component or collection of components that perform the functions described above can be generically considered as one or more controllers that control the above-discussed functions. The one or more controllers can be implemented in numerous ways, such as with dedicated hardware, or with general purpose hardware (e.g., one or more processors) that is programmed using microcode or software to perform the functions recited above.
In this respect, it should be appreciated that one implementation of the embodiments of the present invention comprises at least one computer-readable medium (e.g., a computer memory, a floppy disk, a compact disk, a tape, etc.) encoded with a computer program (i.e., a plurality of instructions), which, when executed on a processor, performs one or more of the above-discussed functions of the present invention. The computer-readable medium can be transportable such that the program stored thereon can be loaded onto any computer system resource to implement one or more functions of the present invention discussed herein. In addition, it should be appreciated that the reference to a computer program which, when executed, performs the above-discussed functions, is not limited to an application program running on a host computer. Rather, the term computer program is used herein in a generic sense to reference any type of computer code (e.g., software or microcode) that can be employed to program a processor to implement the above-discussed aspects of the present invention. It should be appreciated that in accordance with several embodiments of the present invention wherein processes are implemented in a computer readable medium, the computer implemented processes may, during the course of their execution, receive input manually (e.g., from a user).
Accordingly, overall system-level control of the assembly devices or components described herein may be performed by a system controller which may provide control signals to the associated nucleic acid synthesizers, liquid handling devices, thermal cyclers, sequencing devices, associated robotic components, as well as other suitable systems for performing the desired input/output or other control functions. Thus, the system controller along with any device controllers together form a controller that controls the operation of a nucleic acid assembly system. The controller may include a general purpose data processing system, which can be a general purpose computer, or network of general purpose computers, and other associated devices, including communications devices, modems, and/or other circuitry or components necessary to perform the desired input/output or other functions. The controller can also be implemented, at least in part, as a single special purpose integrated circuit (e.g., ASIC) or an array of ASICs, each having a main or central processor section for overall, system-level control, and separate sections dedicated to performing various different specific computations, functions and other processes under the control of the central processor section. The controller can also be implemented using a plurality of separate dedicated programmable integrated or other electronic circuits or devices, e.g., hard wired electronic or logic circuits such as discrete element circuits or programmable logic devices. The controller can also include any other components or devices, such as user input/output devices (monitors, displays, printers, a keyboard, a user pointing device, touch screen, or other user interface, etc.), data storage devices, drive motors, linkages, valve controllers, robotic devices, vacuum and other pumps, pressure sensors, detectors, power supplies, pulse sources, communication devices or other electronic circuitry or components, and so on.
The controller also may control operation of other portions of a system, such as automated client order processing, quality control, packaging, shipping, billing, etc., to perform other suitable functions known in the art but not described in detail herein.
Business applications: Aspects of the invention may be useful to streamline nucleic acid assembly reactions. Accordingly, aspects of the invention relate to marketing methods, compositions, kits, devices, and systems for increasing nucleic acid assembly throughput involving combinations of one or more extension-based and/or ligation-based assembly techniques described herein. Aspects of the invention may be useful for reducing the time and/or cost of production, commercialization, and/or development of synthetic nucleic acids, and/or related compositions. Accordingly, aspects of the invention relate to business methods that involve collaboratively (e.g., with a partner) or independently marketing one or more methods, kits, compositions, devices, or systems for analyzing and/or assembling synthetic nucleic acids as described herein. For example, certain embodiments of the invention may involve marketing a procedure and/or associated devices or systems involving nucleic acid assembly techniques described herein. In some embodiments, synthetic nucleic acids, libraries of synthetic nucleic acids, host cells containing synthetic nucleic acids, expressed polypeptides or proteins, etc., also may be marketed. Marketing may involve providing information and/or samples relating to methods, kits, compositions, devices, and/or systems described herein. Potential customers or partners may be, for example, companies in the pharmaceutical, biotechnology and agricultural industries, as well as academic centers and government research organizations or institutes. Business applications also may involve generating revenue through sales and/or licenses of methods, kits, compositions, devices, and/or systems of the invention.
EXAMPLES
Example 1. Nucleic acid fragment assembly.
Gene assembly via a 2-step PCR method: In step (1), a primerless assembly of oligonucleotides is performed and in step (2) an assembled nucleic acid fragment is amplified in a primer-based amplification.
A 993 base long promoter>EGFP construct was assembled from 50-mer abutting oligonucleotides using a 2-step PCR assembly.
Mixed oligonucleotide pools were prepared as follows: 36 overlapping 50-mer oligonucleotides and two 5' terminal 59-mers were separated into 4 pools, each corresponding to overlapping 200-300 nucleotide segments of the final construct. The total oligonucleotide concentration in each pool was 5 μM.
A primerless PCR extension reaction was used to stitch (assemble) overlapping oligonucleotides in each pool. The PCR extension reaction mixture was as follows:
oligonucleotide pool (5 μM total)   1.0 μl (~25 nM final each)
dNTP (10 mM each)                   0.5 μl (250 μM final each)
Pfu buffer (10x)                    2.0 μl
Pfu polymerase (2.5 U/μl)           0.5 μl
dH2O                                to 20 μl
Assembly was achieved by cycling this mixture through several rounds of denaturing, annealing, and extension reactions as follows:
start                  95°C 2 min.
30 cycles of           95°C 30 sec, 65°C 30 sec, 72°C 1 min.
final extension step   72°C 2 min.
The resulting product was exposed to amplification conditions to amplify the desired nucleic acid fragments (sub-segments of 200-300 nucleotides). The following PCR mix was used:
primerless PCR product              1.0 μl
primer 5' (1.2 μM)                  5 μl (300 nM final)
primer 3' (1.2 μM)                  5 μl (300 nM final)
dNTP (10 mM each)                   0.5 μl (250 μM final each)
Pfu buffer (10x)                    2.0 μl
Pfu polymerase (2.5 U/μl)           0.5 μl
dH2O                                to 20 μl
The following PCR cycle conditions were used:
start                  95°C 2 min.
35 cycles of           95°C 30 sec, 65°C 30 sec, 72°C 1 min.
final extension step   72°C 2 min.
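The final concentrations quoted in the reaction mixes above follow from simple dilution into the 20 μl reaction volume. The short Python sketch below reproduces that arithmetic; the function name and the figure of roughly ten oligonucleotides per pool (38 oligonucleotides split over 4 pools) are illustrative assumptions rather than values stated in the protocol.

```python
def final_concentration(stock_conc, added_ul, total_ul=20.0):
    """Concentration after diluting `added_ul` of stock into `total_ul`.

    The result is in the same unit as `stock_conc` (C1 * V1 = C2 * V2).
    """
    return stock_conc * added_ul / total_ul


# Primer: 5 ul of a 1.2 uM stock in 20 ul, expressed in nM.
primer_nm = final_concentration(1.2, 5.0) * 1000.0

# dNTP: 0.5 ul of a 10 mM (each) stock in 20 ul, expressed in uM.
dntp_um = final_concentration(10.0, 0.5) * 1000.0

# Oligonucleotide pool: 1 ul of a 5 uM (total) pool in 20 ul, in nM total.
pool_total_nm = final_concentration(5.0, 1.0) * 1000.0

# Assumption: about 10 oligonucleotides per pool (38 oligos over 4 pools).
OLIGOS_PER_POOL = 10
per_oligo_nm = pool_total_nm / OLIGOS_PER_POOL

print(primer_nm, dntp_um, per_oligo_nm)
```

Under these assumptions the sketch gives approximately 300 nM primer, 250 μM of each dNTP, and about 25 nM of each oligonucleotide, in line with the values listed in the mixes.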
The amplified sub-segments were assembled using another round of primerless PCR as follows. A diluted amplification product was prepared for each sub-segment by diluting each amplified sub-segment PCR product 1:10 (4 μl mix + 36 μl dH2O). This diluted mix was used as follows:
diluted sub-segment mix             1.0 μl
dNTP (10 mM each)                   0.5 μl (250 μM final each)
Pfu buffer (10x)                    2.0 μl
Pfu polymerase (2.5 U/μl)           0.5 μl
dH2O                                to 20 μl
The following PCR cycle conditions were used:
start                  95°C 2 min.
30 cycles of           95°C 30 sec, 65°C 30 sec, 72°C 1 min.
final extension step   72°C 2 min.
The full-length 993 nucleotide long promoter>EGFP was amplified in the following PCR mix:
assembled sub-segments              1.0 μl
primer 5' (1.2 μM)                  5 μl (300 nM final)
primer 3' (1.2 μM)                  5 μl (300 nM final)
dNTP (10 mM each)                   0.5 μl (250 μM final each)
Pfu buffer (10x)                    2.0 μl
Pfu polymerase (2.5 U/μl)           0.5 μl
dH2O                                to 20 μl
The following PCR cycle conditions were used:
start                  95°C 2 min.
35 cycles of           95°C 30 sec, 65°C 30 sec, 72°C 1 min.
final extension step   72°C 2 min.
Example 2: General protocol.
In this example, an assembly cycle using activation of one or more vector-encoded traits to isolate correctly assembled constructs will involve the following steps.
1- DNA preparation;
2- Digestion of insert DNA (e.g., at alternating sites 1, 3 and 2, 4 as described for FIGs. 6-8);
3- Paired ligation of 1-3 cut fragments with 2-4 cut fragments together with vector DNA;
4- Transformation of host cells with ligation reaction mixture;
5- Recovery growth of the transformed cells in SOC (1-2 h); and
6- Transfer of the transformed cell culture to selective liquid media for growth in suspension (10-15 h).
This protocol selects for correctly-ligated insert DNA to propagate, removing any background or contaminating vector DNA. This enables an automated, 'hands-off' assembly scheme that eliminates colony grow-up during the assembly phase. This process may involve approximately log2(N) assembly steps, where N is the total number of fragments to be combined (e.g., building a 50 kb final construct from 50 sequence-verified 1 kb fragments would involve six steps).
Due to stringent selection for correctly-ligated insert DNA in the transformed host cell culture, this process will not require colony isolation and grow-up, thereby decreasing assembly time. For example, assuming that each step could be completed in one day, construction of a 50 kb construct from 50 sequence-verified 1 kb segments would take six days. One additional day would be required for assembly of a 100 kb construct from 100 sequence-verified 1 kb segments. This protocol may be automated (e.g., using a microfluidic device).
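The step counts quoted above (six steps for 50 fragments, seven for 100) can be checked by counting pairwise rounds. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes that an unpaired fragment in a round with an odd count is simply carried forward to the next round.

```python
import math


def assembly_rounds(n_fragments: int) -> int:
    """Count pairwise assembly rounds needed to merge n fragments into one.

    Assumption: in a round with an odd number of fragments, the unpaired
    fragment is carried over unchanged to the next round.
    """
    if n_fragments < 1:
        raise ValueError("need at least one fragment")
    rounds = 0
    remaining = n_fragments
    while remaining > 1:
        remaining = math.ceil(remaining / 2)  # each pair becomes one product
        rounds += 1
    return rounds


for n in (50, 100):
    print(n, "fragments:", assembly_rounds(n), "rounds")
```

At one round per day, this reproduces the six-day and seven-day estimates for 50 and 100 starting fragments, respectively.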
Example 3: Automation.
In this example of an automated assembly scheme, the following steps will be automated.
1- Transfer of enzyme mix (1, 3 or 2, 4) to prepped DNA;
2- Temperature-controlled incubation for digest followed by heat inactivation (e.g., on a block);
3- Mixing of 1, 3 and 2, 4 digests;
4- Transfer of 1, 4-linearized vector and ligase to digest mix;
5- Cell transformation;
6- Suspension growth in selective media; and
7- Automated DNA prep (e.g., CosMC prep on Biomek FX).
One or more of these steps may be automated on a microfluidic device.
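One way to picture a system controller sequencing the seven automated steps listed above is as an ordered list of step records handed to a device-level callback. The Python sketch below is a minimal illustration; the class, the device labels, and the `run_cycle` function are hypothetical names and do not correspond to any particular instrument API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    name: str
    device: str  # hypothetical device label, for illustration only


# The seven automated steps, in the order given in the example.
AUTOMATED_STEPS: List[Step] = [
    Step("Transfer of enzyme mix to prepped DNA", "liquid_handler"),
    Step("Incubation for digest and heat inactivation", "thermal_block"),
    Step("Mixing of 1, 3 and 2, 4 digests", "liquid_handler"),
    Step("Transfer of linearized vector and ligase", "liquid_handler"),
    Step("Cell transformation", "transformation_station"),
    Step("Suspension growth in selective media", "shaking_incubator"),
    Step("Automated DNA prep", "prep_robot"),
]


def run_cycle(steps: List[Step], execute: Callable[[Step], None]) -> List[str]:
    """Run each step in order through `execute` and return a simple log."""
    log = []
    for number, step in enumerate(steps, start=1):
        execute(step)  # a device controller would act here
        log.append(f"{number}. {step.name} [{step.device}]")
    return log


if __name__ == "__main__":
    for line in run_cycle(AUTOMATED_STEPS, lambda step: None):
        print(line)
```

Keeping the step list as data makes it straightforward to repeat the same cycle for each assembly round or to swap the callback for a different device back end.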
Example 4: Methods and Use of Assembly by Marker Activation.
Assembly by marker activation (referred to operationally as pairwise selection assembly (PSA)) was used in the construction of large fragments. In this example, this approach was used to construct a 22 kb fragment (target product) from ~400 bp starting fragments. In this process, individually cloned DNA sequences (or linear fragments produced from an amplification or assembly process) contain ~65 bp activation tags at both termini. These tags are short DNA sequences necessary for the activation of non-functional antibiotic resistance markers present on the target vector. In one example, both tags incorporate promoter regions necessary for the activation of two independent non-functional markers on the target vector. Two fragments to be combined are digested such that only one tag is retained for each fragment. After ligation with vector, selection based on the two activated markers yields correctly-ligated insert DNA. This process is repeated, switching between two vectors with different markers, until the desired DNA sequence has been constructed. The process is shown schematically in FIG. 10.
PSA vectors constructed for use in the assembly process contain a functional ampicillin resistance marker and two non-functional resistance markers. These vectors are illustrated in FIG. 11. The non-functional resistance markers are either chloramphenicol and kanamycin (pCK) or tetracycline and spectinomycin (pTS). pCK and pTS vectors have been constructed such that they contain either a high-copy number origin of replication or a BAC-based single-copy number origin of replication. The former versions enable DNA assembly up to ~10 kb (although we have successfully assembled up to 22 kb), while the latter BAC-based vectors enable construction up to ~300 kb. Transition from one vector type to another is seamless, as both vector types have the same non-functional markers that are activated by the same activation tags (i.e., they differ only in the origin of replication). Following the protocol presented below, the following were demonstrated: (1) selection of correct ligation products in vivo, obviating the need for individual clone purification through colony isolation; (2) assembly of unpurified PCR products containing activation tags; (3) assembly from cloned fragments flanked by activation tags; and (4) sequence-independent assembly where internal restriction sites are blocked from digestion via site methylation. The use of methyl-sensitive restriction enzymes and RecA-mediated site-blocking enables the assembly of DNA molecules without the need for modification (e.g., to remove restriction sites).
An example of a PSA process flow chart used for the experiment is as follows:
1. Input fragments constructed with flanking activation tags
2. RecA-bound oligo hybridization to substrate DNA (if necessary)
3. methylation of non-blocked substrate DNA (if necessary)
4. digestion of substrate DNA
5. pair mixing and ligation
6. cell transformation
7. in vivo selection by marker activation
8. DNA extraction
Repeat steps 2-8
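The pairwise logic of the flow chart above can be sketched as a small simulation: each input carries both activation tags, one partner is digested so that it keeps only its left tag, the other so that it keeps only its right tag, and only a ligation product carrying both tags activates both markers. The Python below is a toy model under stated assumptions; the names are hypothetical, the choice of pCK for the first round is arbitrary, and carrying an unpaired fragment forward is a simplification.

```python
from dataclasses import dataclass
from typing import List, Tuple

VECTORS = ("pCK", "pTS")  # alternated between rounds; starting vector is an assumption


@dataclass(frozen=True)
class Fragment:
    parts: Tuple[str, ...]
    left_tag: bool = True   # left activation tag present
    right_tag: bool = True  # right activation tag present


def digest_keep_left(f: Fragment) -> Fragment:
    """Digest so that only the left activation tag is retained."""
    return Fragment(f.parts, True, False)


def digest_keep_right(f: Fragment) -> Fragment:
    """Digest so that only the right activation tag is retained."""
    return Fragment(f.parts, False, True)


def ligate(left: Fragment, right: Fragment) -> Fragment:
    """Join two digested fragments; the product inherits the outer tags."""
    return Fragment(left.parts + right.parts, left.left_tag, right.right_tag)


def survives_selection(insert: Fragment) -> bool:
    """Both vector markers are activated only if both tags are present."""
    return insert.left_tag and insert.right_tag


def assemble(names: List[str]):
    pool = [Fragment((n,)) for n in names]
    history = []
    round_no = 0
    while len(pool) > 1:
        vector = VECTORS[round_no % 2]
        products = []
        for i in range(0, len(pool) - 1, 2):
            product = ligate(digest_keep_left(pool[i]), digest_keep_right(pool[i + 1]))
            if survives_selection(product):
                products.append(product)
        if len(pool) % 2:  # simplification: unpaired fragment is carried forward
            products.append(pool[-1])
        history.append((vector, len(products)))
        pool = products
        round_no += 1
    return pool[0], history


final, history = assemble(list("ABCDEFGH"))
print(final.parts, history)
```

In this model a single digested fragment ligated alone into the vector fails selection, because it carries only one of the two tags, which is the property the in vivo selection step relies on.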
An example of a PSA protocol is summarized below. All steps outlined below occurred after DNA preparation and concentration normalization (50 ng/μl). It should be noted that Steps I-III apply only to constructs that require blocking ('L' fragments with internal BsmBI sites or 'R' fragments with internal BtgZI sites). The specific blocking oligonucleotides used in the experiment are listed at the end of this protocol.
Step I. Polymerize blocking oligos with RecA
Figure imgf000094_0001
Figure imgf000095_0001
Step II. Substrate addition, synapsis
Figure imgf000095_0002
Step III. Substrate methylation
Figure imgf000095_0003
It should be noted that the ratio of methyl-donor, e.g., S-adenosylmethionine (SAM), to methyl-acceptor (2xCpG sites on substrate) should be taken into account at this step. If a random nucleotide composition is assumed, then the reaction conditions indicated below would yield >60-fold excess methyl-donor for most insert sizes >1 kb.
Figure imgf000095_0004
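The donor-to-acceptor ratio discussed above can be estimated with a short calculation. The Python sketch below assumes a random nucleotide composition (so each dinucleotide position is CpG with probability 1/16), two methyl acceptors per CpG site (one cytosine per strand), and an average mass of about 650 g/mol per base pair. The function names and all input values in the example call are hypothetical placeholders, not the reaction conditions of this protocol.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; a commonly used approximation


def expected_cpg_sites(length_bp: int) -> float:
    """Expected CpG sites in a duplex of random composition.

    There are length_bp - 1 dinucleotide positions, each CpG with
    probability 1/16.
    """
    return (length_bp - 1) / 16.0


def methyl_donor_excess(sam_um: float, reaction_ul: float,
                        dna_ng: float, length_bp: int) -> float:
    """Molar ratio of methyl donor (SAM) to methyl acceptors on the DNA."""
    sam_pmol = sam_um * reaction_ul  # uM x ul = pmol
    dna_pmol = dna_ng * 1000.0 / (length_bp * AVG_BP_MASS)  # ng -> pmol of duplex
    acceptor_pmol = dna_pmol * 2.0 * expected_cpg_sites(length_bp)  # 2 per site
    return sam_pmol / acceptor_pmol


# Hypothetical inputs, for illustration only.
ratio_1kb = methyl_donor_excess(sam_um=160.0, reaction_ul=30.0,
                                dna_ng=500.0, length_bp=1000)
ratio_5kb = methyl_donor_excess(sam_um=160.0, reaction_ul=30.0,
                                dna_ng=500.0, length_bp=5000)
print(ratio_1kb, ratio_5kb)
```

Under this model, for a fixed mass of DNA the ratio is nearly independent of insert length, since longer inserts contribute proportionally more CpG sites per molecule but fewer molecules.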
Step IV. Digestion (NOTE: all constructs that DO NOT require blocking start at this step.)
Figure imgf000096_0001
For constructs that DO NOT require blocking:
Figure imgf000096_0002
For constructs that require blocking, 1 unit BsmBI (1 μl of a 1:10 dilution of the 10 units/μl stock) was added directly to the 30 μl blocking reaction for 'L' fragments only, and 2 units BtgZI (1 μl of the 2 units/μl stock) for 'R' fragments only; subsequently, the reaction was incubated as indicated above (55°C for 50 min, then 85°C for 25 min).
Step V. Ligation
Materials: T4 DNA ligase (NEB, 400,000 units/ml)
The digested sample was diluted 1:2 (by adding 30 μl dH2O).
Figure imgf000096_0003
Step VI. Transformation
Transform competent DH10B (or similar strain - must be sensitive to cam, kan, tc, and spn, and must be deficient in E. coli mcr, mrr restriction systems). Standard transformation protocols were followed (e.g., add 3 μl ligation reaction to 50 μl competent cells, heat shock 30 sec at 42°C, recover in 350 μl SOC 1 hr at 37°C, 250 rpm).
Following the transformation, the resulting culture was diluted 1:50 in selective media and was grown overnight at 37°C, with shaking at ~300 rpm. Subsequently, ~200 μl of the culture was plated on selective plates and grown overnight at 37°C. For selection, appropriate antibiotics were added as follows: for pTS vector, 5 μg/ml Tc and 100 μg/ml Spn were added; for pCK vector, 12.5 μg/ml Cam and 25 μg/ml Kan were added.
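The selection conditions above lend themselves to a small lookup table plus a dilution calculation. The Python sketch below is illustrative: the final antibiotic concentrations are taken from the text, whereas the function name, the reading of "1:50" as one part culture in fifty parts total, the recovery volume, and the stock concentrations in the example call are assumptions.

```python
# Final selective concentrations from the protocol, in ug/ml.
SELECTION_UG_PER_ML = {
    "pTS": {"Tc": 5.0, "Spn": 100.0},
    "pCK": {"Cam": 12.5, "Kan": 25.0},
}


def selective_culture_plan(vector, recovery_ul, stock_mg_per_ml, dilution=50):
    """Volumes needed to dilute a recovery culture into selective media.

    Assumption: a 1:50 dilution means 1 part culture in 50 parts total.
    `stock_mg_per_ml` maps each antibiotic to a (hypothetical) stock
    concentration in mg/ml.
    """
    final_ul = recovery_ul * dilution
    additions = {}
    for drug, final_ug_ml in SELECTION_UG_PER_ML[vector].items():
        stock_ug_ml = stock_mg_per_ml[drug] * 1000.0
        additions[drug] = final_ul * final_ug_ml / stock_ug_ml
    return {
        "final_ul": final_ul,
        "media_ul": final_ul - recovery_ul,
        "antibiotic_ul": additions,
    }


# Hypothetical example: 400 ul recovery culture, illustrative stock strengths.
plan = selective_culture_plan("pCK", 400.0, {"Cam": 25.0, "Kan": 50.0})
print(plan)
```

The antibiotic volumes are small relative to the culture volume, so the sketch ignores their contribution to the final volume.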
The following table provides the sequences of blocking oligonucleotides that can be used in this protocol in combination with activation sequences illustrated in FIG. 12.
Figure imgf000097_0001
In FIG. 12, the left activation sequence contains BsmBI and BtgZI sites (corresponding to sites 1 and 2 described herein, respectively) within a modified promoter region. Similarly, the right activation sequence contains BsmBI and BtgZI sites (corresponding to sites 3 and 4 described herein, respectively) within a modified promoter region. Sequences shown as "Ui ..." are the insert-specific sequences. The left TTCA and the right ACTC overhangs are compatible with overhangs on the vector that is being used. It should be appreciated that a single blocking oligonucleotide (e.g., ctp-60 or ctd-60) may be used to block the left sequence from methylation. Similarly, a single oligonucleotide (e.g., ktd-60 or ktp-60) may be used to block the right sequence from methylation.
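Overhang compatibility of the kind mentioned above can be checked computationally: two single-stranded overhangs can anneal when one is the reverse complement of the other. The Python sketch below assumes each overhang is written 5' to 3' on its own protruding strand; the function names are hypothetical, and the partner sequences it computes are what base pairing implies under that convention, not sequences quoted from the figure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhangs_compatible(a: str, b: str) -> bool:
    """True if overhang `a` can base-pair with overhang `b`.

    Assumption: both overhangs are written 5' to 3' on their own
    protruding strands.
    """
    return len(a) == len(b) and reverse_complement(a) == b.upper()


# Partner overhangs implied by base pairing for the two insert overhangs.
left_partner = reverse_complement("TTCA")
right_partner = reverse_complement("ACTC")
print(left_partner, right_partner)
print(overhangs_compatible("TTCA", left_partner))
print(overhangs_compatible("TTCA", "ACTC"))
```

Because the left and right overhangs are not complementary to each other, an insert pair cannot circularize on itself and the vector can only accept the insert in one orientation, which is consistent with the directional ligation the scheme depends on.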
EQUIVALENTS
The present invention provides among other things methods for assembling large polynucleotide constructs and organisms having increased genomic stability. While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.
INCORPORATION BY REFERENCE
All publications, patents and sequence database entries mentioned herein, including those items listed below, are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.

Claims

1. A method for assembling nucleic acid segments, the method comprising digesting a first population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites, digesting a second population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, wherein the first and second populations of nucleic acids comprise a first activation sequence located between the first and second restriction sites and a second activation sequence located between the third and fourth restriction sites, and digestion of the first population results in a first population of nucleic acid segments that comprises the first activation sequence but lacks the second activation sequence, and digestion of the second population results in a second population of nucleic acid segments that lacks the first activation sequence and comprises the second activation sequence, combining the first and second populations of nucleic acid segments with a first nucleic acid vector that is digested with one or more restriction enzymes that produce restriction site overhangs that are complementary to the overhangs generated by the first and fourth restriction enzymes on the first and second populations of nucleic acid segments, wherein the first nucleic acid vector comprises a coding sequence of a first marker gene 5' of the first restriction site and a coding sequence of a second marker gene 3' of the fourth restriction site, and isolating ligated first nucleic acid vectors that express the first and the second marker genes, wherein expression of the first and the second marker genes is indicative of correct assembly of the first and second populations of nucleic acid segments.
2. The method of claim 1, further comprising digesting a third population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites, digesting a fourth population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, wherein the third and fourth populations of nucleic acids comprise a first activation sequence located between the first and second restriction sites and a second activation sequence located between the third and fourth restriction sites, and digestion of the third population results in a third population of nucleic acid segments that comprises the first activation sequence but lacks the second activation sequence, and digestion of the fourth population results in a fourth population of nucleic acid segments that lacks a first activation sequence and comprises a second activation sequence, combining in the presence of a ligase the third and fourth populations of nucleic acid segments with a second nucleic acid vector that is digested with one or more restriction enzymes that produce restriction site overhangs that are complementary to the overhangs generated by the first and fourth restriction enzymes on the third and fourth populations of nucleic acid segments, wherein the second nucleic acid vector comprises a coding sequence of a first marker gene 5' of the first restriction site and a coding sequence of a second marker gene 3' of the fourth restriction site, selecting for ligated second nucleic acid vectors that express the first and the second marker genes, wherein expression of the first and the second marker genes is indicative of correct assembly of the third and fourth populations of nucleic acid segments, digesting the ligated second nucleic acid vector with restriction enzymes that cleave at the second and fourth restriction sites to release a fifth population of nucleic acid segments lacking a first activation sequence and comprising a second activation sequence, digesting the ligated first nucleic acid vector with restriction enzymes that cleave at the first and third restriction sites to release a sixth population of nucleic acid segments comprising a first activation sequence and lacking a second activation sequence, and combining the fifth and sixth populations of nucleic acid segments with a third nucleic acid vector digested with one or more restriction enzymes that produce restriction site overhangs that are complementary to the overhangs generated by the first and fourth restriction enzymes on the fifth and sixth populations of nucleic acid segments, wherein the third nucleic acid vector comprises a coding sequence of a third marker gene 5' of the first restriction site and a coding sequence of a fourth marker gene 3' of the fourth restriction site, and selecting for ligated third nucleic acid vectors that express the third and fourth marker genes, wherein expression of the third and fourth marker genes is indicative of correct assembly of the fifth and sixth populations of nucleic acid segments.
3. A method for assembling nucleic acid segments, the method comprising digesting a first population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites, digesting a second population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, wherein the first and second populations of nucleic acids comprise a 5' promoter sequence located between the first and second restriction sites and a 3' promoter sequence located between the third and fourth restriction sites, and digestion of the first population results in a first population of nucleic acid segments that comprises the 5' promoter sequence but lacks the 3' promoter sequence, and digestion of the second population results in a second population of nucleic acid segments that lacks the 5' promoter sequence and comprises the 3' promoter sequence, combining in the presence of a ligase the first and second populations of nucleic acid segments with a first nucleic acid vector that is digested with restriction enzymes that cleave at the first and fourth restriction sites, wherein the first nucleic acid vector comprises a coding sequence of a first marker gene 5' of the first restriction site and a coding sequence of a second marker gene 3' of the fourth restriction site, and selecting for ligated first nucleic acid vectors that express the first and the second marker genes, wherein expression of the first and the second marker genes is indicative of correct assembly of the first and second populations of nucleic acid segments.
4. The method of claim 2, further comprising digesting a third population of nucleic acids having at least first, second, third and fourth restriction sites, using a first set of restriction enzymes that cleave the nucleic acids at the first and third sites, digesting a fourth population of nucleic acids having at least first, second, third and fourth restriction sites, using a second set of restriction enzymes that cleave the nucleic acids at the second and fourth sites, wherein the third and fourth populations of nucleic acids comprise a 5' promoter sequence located between the first and second restriction sites and a 3' promoter sequence between the third and fourth restriction sites, and digestion of the third population results in a third population of nucleic acid segments that comprises the 5' promoter sequence but lacks the 3' promoter sequence, and digestion of the fourth population results in a fourth population of nucleic acid segments that lacks a 5' promoter sequence and comprises a 3' promoter sequence, combining in the presence of a ligase the third and fourth populations of nucleic acid segments with a second nucleic acid vector that is digested with restriction enzymes that cleave at the first and fourth restriction sites, wherein the second nucleic acid vector comprises a coding sequence of a first marker gene 5' of the first restriction site and a coding sequence of a second marker gene 3' of the fourth restriction site, selecting for ligated second nucleic acid vectors that express the first and the second marker genes, wherein expression of the first and the second marker genes is indicative of correct assembly of the third and fourth populations of nucleic acid segments, digesting the ligated second nucleic acid vector with restriction enzymes that cleave at the second and fourth restriction sites to release a fifth population of nucleic acid segments lacking a 5' promoter sequence and comprising a 3' promoter sequence, digesting the ligated first nucleic acid vector with restriction enzymes that cleave at the first and third restriction sites to release a sixth population of nucleic acid segments comprising a 5' promoter sequence and lacking a 3' promoter sequence, and combining the fifth and sixth populations of nucleic acid segments with a third nucleic acid vector digested with restriction enzymes that cleave at the first and fourth restriction sites and having a third marker gene coding sequence 5' of the first restriction site and a fourth marker gene coding sequence 3' of the fourth restriction site, and selecting for ligated third nucleic acid vectors that express the third and fourth marker genes, wherein expression of the third and fourth marker genes is indicative of correct assembly of the fifth and sixth populations of nucleic acid segments.
5. The method of claim 1 or 3, wherein the first and second marker genes are antibiotic resistance genes.
6. The method of claim 2 or 4, wherein the first, second, third and fourth marker genes are antibiotic resistance genes.
7. The method of claim 1, 2, 3 or 4, wherein the restriction enzymes are type II restriction enzymes or type IIS restriction enzymes.
8. The method of claim 1 or 2, wherein the restriction enzymes that cleave the first restriction and fourth restriction sites are type II restriction enzymes, and the restriction enzymes that cleave the second and third restriction sites are type IIS restriction enzymes.
9. The method of claim 1 or 2, wherein the restriction enzymes that cleave the first, second, third, and fourth sites are type II restriction enzymes.
10. The method of claim 1, 2, 3 or 4, wherein the first, second, third and fourth populations of nucleic acids are cloned nucleic acids or PCR-derived nucleic acids.
11. The method of claim 1, 2, 3 or 4, wherein the first, second, third and fourth populations of nucleic acids comprise nucleic acids of about 1 kb in length.
12. The method of claim 2, further comprising digesting the ligated first nucleic acid vectors that express the first and the second marker genes using restriction enzymes that cleave at the first and fourth restriction sites in order to release an assembled nucleic acid.
13. The method of claim 3, further comprising digesting the ligated third nucleic acid vectors that express the third and fourth marker genes using restriction enzymes that cleave at the first and fourth restriction sites in order to release an assembled nucleic acid.
14. The method of claim 12 or 13, wherein the assembled nucleic acid is about 50 kb in length.
15. The method of claim 12 or 13, wherein the assembled nucleic acid is about 100 kb in length.
16. The method of claim 1, wherein the first activation sequence is a promoter, terminator or other activation sequence.
17. The method of claim 1, wherein the second activation sequence is a promoter, terminator or other activation sequence.
18. The method of claim 1, wherein the first marker gene is a selectable marker.
19. The method of claim 1, wherein the second marker gene is a selectable marker.
20. The method of claim 1, wherein the first marker gene is a detectable marker.
21. The method of claim 1, wherein the second marker gene is a detectable marker.
22. The method of claim 2, wherein the first and third marker genes are identical, and wherein the second and fourth marker genes are identical.
23. The method of claim 2, wherein the first and third restriction sites are identical.
24. The method of claim 2, wherein the second and fourth restriction sites are identical.
25. The method of claim 1, wherein specific digestion at the second restriction site is promoted by specifically methylating one or more additional second restriction sites present in the nucleic acid being assembled and digesting the nucleic acid with a methylation sensitive nuclease.
26. The method of claim 1, wherein specific digestion at the third restriction site is promoted by specifically methylating one or more additional third restriction sites present in the nucleic acid being assembled and digesting the nucleic acid with a methylation sensitive nuclease.
27. The method of any one of claims 1 to 4, further comprising repeating the digesting, combining, and selecting steps for 2-50 cycles to progressively assemble a final nucleic acid.
28. The method of claim 27, wherein the digesting, combining, and selecting steps are repeated for 2-10 cycles.
29. The method of claim 28, wherein the digesting, combining, and selecting steps are repeated for about 5 cycles.
30. A method of assembling a nucleic acid, the method comprising a plurality of consecutive alternating assembly cycles, wherein each alternating assembly cycle comprises: a) combining a first nucleic acid insert with a second nucleic acid insert and a first vector, wherein the first nucleic acid insert comprises a first portion of a target sequence and a first activation sequence, the second nucleic acid insert comprises a second portion of the target sequence and a second activation sequence, and the first vector comprises first and second activatable markers; b) selecting for correct assembly of the first and second nucleic acid inserts in the first vector by selecting for activation of the first and second activatable markers, wherein correct assembly results in activation of the first activatable marker by the first activation sequence and activation of the second activatable marker by the second activation sequence; c) isolating from step b) an assembled nucleic acid comprising the first and second portions of the target sequence and the first activation sequence, but not the second activation sequence; d) combining the nucleic acid of step c) with a third nucleic acid insert and a second vector, wherein the third nucleic acid insert comprises a third portion of the target sequence and the second activation sequence, and the second vector comprises third and fourth activatable markers; e) selecting for correct assembly of the nucleic acid of step c) and the third nucleic acid insert in the second vector by selecting for activation of the third and fourth activatable markers, wherein correct assembly results in activation of the third activatable marker by the first activation sequence and activation of the fourth activatable marker by the second activation sequence; and f) isolating from step e) an assembled nucleic acid comprising the first, second, and third portions of the target sequence and the first activation sequence, but not the second activation sequence; wherein the nucleic acid of step f) can be combined in a subsequent assembly cycle starting at step a) with a fourth nucleic acid insert and the first vector, wherein the fourth nucleic acid insert comprises a fourth portion of the target sequence and the second activation sequence.
31. The method of claim 30, wherein 2-50 alternating assembly cycles are performed.
32. The method of claim 31, wherein 2-10 alternating assembly cycles are performed.
33. The method of claim 32, wherein about 5 alternating assembly cycles are performed.
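The alternating cycle structure recited in claim 30 can be illustrated with a small bookkeeping sketch. This is only an illustrative model, not the claimed laboratory method: the names Insert, assembly_cycle, iterative_assembly, the activation labels, and the vector labels are all hypothetical, and the biology (digestion, ligation, marker activation, host selection) is reduced to simple checks on labels.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical labels for the two activation sequences named in claim 30.
FIRST_ACT = "activation-1"
SECOND_ACT = "activation-2"


@dataclass(frozen=True)
class Insert:
    portions: Tuple[str, ...]   # ordered portions of the target sequence
    activation: Optional[str]   # activation sequence carried by this insert


def assembly_cycle(left: Insert, right: Insert) -> Optional[Insert]:
    """Model one combine/select/isolate cycle.

    Selection passes only if the left insert supplies the first activation
    sequence and the right insert supplies the second one, so that both
    vector markers would be activated. The isolated product keeps the first
    activation sequence and drops the second.
    """
    if left.activation != FIRST_ACT or right.activation != SECOND_ACT:
        return None  # modelled as failing selection
    return Insert(left.portions + right.portions, FIRST_ACT)


def iterative_assembly(inserts: List[Insert], vectors: Tuple[str, str]):
    """Add one insert per cycle, alternating between the two vectors."""
    product = inserts[0]
    vectors_used = []
    for cycle, next_insert in enumerate(inserts[1:]):
        vector = vectors[cycle % 2]  # first vector, second vector, first, ...
        assembled = assembly_cycle(product, next_insert)
        if assembled is None:
            raise ValueError(f"cycle {cycle + 1} failed selection in {vector}")
        product = assembled
        vectors_used.append(vector)
    return product, vectors_used


# Four portions of a target: the first carries the first activation sequence,
# every later insert carries the second activation sequence.
inserts = [Insert(("P1",), FIRST_ACT)] + [
    Insert((name,), SECOND_ACT) for name in ("P2", "P3", "P4")
]
product, vectors_used = iterative_assembly(inserts, ("vector-1", "vector-2"))
print(product.portions, vectors_used)
```

In this model, the product of every cycle carries only the first activation sequence, which is why it can be paired again with a new insert carrying the second activation sequence in the other vector.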
PCT/US2007/019209 2006-08-31 2007-08-31 Iterative nucleic acid assembly using activation of vector-encoded traits WO2008027558A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US84184306P 2006-08-31 2006-08-31
US60/841,843 2006-08-31

Publications (2)

Publication Number Publication Date
WO2008027558A2 WO2008027558A2 (en) 2008-03-06
WO2008027558A3 WO2008027558A3 (en) 2008-04-24

Family

ID=39030879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/019209 WO2008027558A2 (en) 2006-08-31 2007-08-31 Iterative nucleic acid assembly using activation of vector-encoded traits

Country Status (2)

Country Link
US (3) US8053191B2 (en)
WO (1) WO2008027558A2 (en)

Cited By (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8999679B2 (en) 2008-12-18 2015-04-07 Iti Scotland Limited Method for assembly of polynucleic acid sequences
US9555388B2 (en) 2013-08-05 2017-01-31 Twist Bioscience Corporation De novo synthesized gene libraries
US9677067B2 (en) 2015-02-04 2017-06-13 Twist Bioscience Corporation Compositions and methods for synthetic gene assembly
US9777305B2 (en) 2010-06-23 2017-10-03 Iti Scotland Limited Method for the assembly of a polynucleic acid sequence
US9895673B2 (en) 2015-12-01 2018-02-20 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
US9981239B2 (en) 2015-04-21 2018-05-29 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US10053688B2 (en) 2016-08-22 2018-08-21 Twist Bioscience Corporation De novo synthesized nucleic acid libraries
US10081807B2 (en) 2012-04-24 2018-09-25 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US10202608B2 (en) 2006-08-31 2019-02-12 Gen9, Inc. Iterative nucleic acid assembly using activation of vector-encoded traits
EP2944693B1 (en) * 2011-08-26 2019-04-24 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids
US10308931B2 (en) 2012-03-21 2019-06-04 Gen9, Inc. Methods for screening proteins using DNA encoded chemical libraries as templates for enzyme catalysis
US10417457B2 (en) 2016-09-21 2019-09-17 Twist Bioscience Corporation Nucleic acid based data storage
US10457935B2 (en) 2010-11-12 2019-10-29 Gen9, Inc. Protein arrays and methods of using and making the same
US10669304B2 (en) 2015-02-04 2020-06-02 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
US10696965B2 (en) 2017-06-12 2020-06-30 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
WO2020201434A1 (en) * 2019-04-02 2020-10-08 Oxford University Innovation Limited Universal dna assembly
US10844373B2 (en) 2015-09-18 2020-11-24 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US10894959B2 (en) 2017-03-15 2021-01-19 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US10894242B2 (en) 2017-10-20 2021-01-19 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
US10907274B2 (en) 2016-12-16 2021-02-02 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US10936953B2 (en) 2018-01-04 2021-03-02 Twist Bioscience Corporation DNA-based digital information storage with sidewall electrodes
US11072789B2 (en) 2012-06-25 2021-07-27 Gen9, Inc. Methods for nucleic acid assembly and high throughput sequencing
US11084014B2 (en) 2010-11-12 2021-08-10 Gen9, Inc. Methods and devices for nucleic acids synthesis
US11332738B2 (en) 2019-06-21 2022-05-17 Twist Bioscience Corporation Barcode-based nucleic acid sequence assembly
US11377676B2 (en) 2017-06-12 2022-07-05 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US11407837B2 (en) 2017-09-11 2022-08-09 Twist Bioscience Corporation GPCR binding proteins and synthesis thereof
US11492727B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for GLP1 receptor
US11492728B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for antibody optimization
US11492665B2 (en) 2018-05-18 2022-11-08 Twist Bioscience Corporation Polynucleotides, reagents, and methods for nucleic acid hybridization
US11512347B2 (en) 2015-09-22 2022-11-29 Twist Bioscience Corporation Flexible substrates for nucleic acid synthesis
US11550939B2 (en) 2017-02-22 2023-01-10 Twist Bioscience Corporation Nucleic acid based data storage using enzymatic bioencryption
US12091777B2 (en) 2019-09-23 2024-09-17 Twist Bioscience Corporation Variant nucleic acid libraries for CRTH2

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011056872A2 (en) 2009-11-03 2011-05-12 Gen9, Inc. Methods and microfluidic devices for the manipulation of droplets in high fidelity polynucleotide assembly
US9216414B2 (en) 2009-11-25 2015-12-22 Gen9, Inc. Microfluidic devices and methods for gene synthesis
WO2011085075A2 (en) 2010-01-07 2011-07-14 Gen9, Inc. Assembly of high fidelity polynucleotides
WO2011150168A1 (en) 2010-05-28 2011-12-01 Gen9, Inc. Methods and devices for in situ nucleic acid synthesis
US9752176B2 (en) 2011-06-15 2017-09-05 Ginkgo Bioworks, Inc. Methods for preparative in vitro cloning
EP2912587A4 (en) 2012-10-24 2016-12-07 Complete Genomics Inc Genome explorer system to process and present nucleotide variations in genome sequence data
WO2014092886A2 (en) * 2012-12-10 2014-06-19 Agilent Technologies, Inc. Pairing code directed assembly
US9561323B2 (en) 2013-03-14 2017-02-07 Fresenius Medical Care Holdings, Inc. Medical fluid cassette leak detection methods and devices
WO2014165818A2 (en) 2013-04-05 2014-10-09 T Cell Therapeutics, Inc. Compositions and methods for preventing and treating prostate cancer
US9988624B2 (en) 2015-12-07 2018-06-05 Zymergen Inc. Microbial strain improvement by a HTP genomic engineering platform
US11208649B2 (en) 2015-12-07 2021-12-28 Zymergen Inc. HTP genomic engineering platform
WO2024132094A1 (en) 2022-12-19 2024-06-27 Thermo Fisher Scientific Geneart Gmbh Retrieval of sequence-verified nucleic acid molecules

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998038299A1 (en) * 1997-02-27 1998-09-03 Gesher-Israel Advanced Biotecs (1996) Ltd. Single step assembly of multiple dna fragments
US20020025561A1 (en) * 2000-04-17 2002-02-28 Hodgson Clague Pitman Vectors for gene-self-assembly

Family Cites Families (362)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4500707A (en) 1980-02-29 1985-02-19 University Patents, Inc. Nucleosides useful in the preparation of polynucleotides
US5096825A (en) * 1983-01-12 1992-03-17 Chiron Corporation Gene for human epidermal growth factor and synthesis and expression thereof
DE3301833A1 (en) 1983-01-20 1984-07-26 Gesellschaft für Biotechnologische Forschung mbH (GBF), 3300 Braunschweig METHOD FOR SIMULTANEOUS SYNTHESIS OF SEVERAL OLIGONUCLEOTIDES IN A SOLID PHASE
DE3329892A1 (en) 1983-08-18 1985-03-07 Köster, Hubert, Prof. Dr., 2000 Hamburg METHOD FOR PRODUCING OLIGONUCLEOTIDES
US4888286A (en) 1984-02-06 1989-12-19 Creative Biomolecules, Inc. Production of gene and protein analogs through synthetic gene design using double stranded synthetic oligonucleotides
US4683195A (en) 1986-01-30 1987-07-28 Cetus Corporation Process for amplifying, detecting, and/or-cloning nucleic acid sequences
US4965188A (en) 1986-08-22 1990-10-23 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences using a thermostable enzyme
US4683202A (en) 1985-03-28 1987-07-28 Cetus Corporation Process for amplifying nucleic acid sequences
CA1293460C (en) 1985-10-07 1991-12-24 Brian Lee Sauer Site-specific recombination of dna in yeast
US4800159A (en) 1986-02-07 1989-01-24 Cetus Corporation Process for amplifying, detecting, and/or cloning nucleic acid sequences
US5093251A (en) * 1986-05-23 1992-03-03 California Institute Of Technology Cassette method of gene synthesis
US5017692A (en) 1986-09-04 1991-05-21 Schering Corporation Truncated human interleukin-a alpha
US5525464A (en) * 1987-04-01 1996-06-11 Hyseq, Inc. Method of sequencing by hybridization of oligonucleotide probes
US4999294A (en) 1987-12-17 1991-03-12 New England Biolabs, Inc. Method for producing the FokI restriction endonuclease and methylase
US5700637A (en) 1988-05-03 1997-12-23 Isis Innovation Limited Apparatus and method for analyzing polynucleotide sequences and method of generating oligonucleotide arrays
AU3869289A (en) 1988-07-14 1990-02-05 Baylor College Of Medicine Solid phase assembly and reconstruction of biopolymers
US5132215A (en) * 1988-09-15 1992-07-21 Eastman Kodak Company Method of making double-stranded dna sequences
GB8822228D0 (en) 1988-09-21 1988-10-26 Southern E M Support-bound oligonucleotides
US5047524A (en) 1988-12-21 1991-09-10 Applied Biosystems, Inc. Automated system for polynucleotide synthesis and purification
US5556750A (en) * 1989-05-12 1996-09-17 Duke University Methods and kits for fractionating a population of DNA molecules based on the presence or absence of a base-pair mismatch utilizing mismatch repair systems
US5459039A (en) * 1989-05-12 1995-10-17 Duke University Methods for mapping genetic mutations
US6008031A (en) * 1989-05-12 1999-12-28 Duke University Method of analysis and manipulation of DNA utilizing mismatch repair systems
US5104789A (en) 1989-05-15 1992-04-14 The United States Of America, As Represented By The Secretary Of Agriculture Monoclonal antibodies which discriminate between strains of citrus tristeza virus
US5527681A (en) 1989-06-07 1996-06-18 Affymax Technologies N.V. Immobilized molecular synthesis of systematically substituted compounds
US5143854A (en) 1989-06-07 1992-09-01 Affymax Technologies N.V. Large scale photolithographic solid phase synthesis of polypeptides and receptor binding screening thereof
US5424186A (en) 1989-06-07 1995-06-13 Affymax Technologies N.V. Very large scale immobilized polymer synthesis
US5104792A (en) 1989-12-21 1992-04-14 The United States Of America As Represented By The Department Of Health And Human Services Method for amplifying unknown nucleic acid sequences
CA2036946C (en) 1990-04-06 2001-10-16 Kenneth V. Deugau Indexing linkers
US6582908B2 (en) 1990-12-06 2003-06-24 Affymetrix, Inc. Oligonucleotides
WO1992015694A1 (en) 1991-03-08 1992-09-17 The Salk Institute For Biological Studies Flp-mediated gene modification in mammalian cells, and compositions and cells useful therefor
US5512463A (en) 1991-04-26 1996-04-30 Eli Lilly And Company Enzymatic inverse polymerase chain reaction library mutagenesis
US5474796A (en) 1991-09-04 1995-12-12 Protogene Laboratories, Inc. Method and apparatus for conducting an array of chemical reactions on a support surface
US5639603A (en) 1991-09-18 1997-06-17 Affymax Technologies N.V. Synthesizing and screening molecular diversity
ES2097925T3 (en) 1991-09-18 1997-04-16 Affymax Tech Nv METHOD FOR SYNTHESIZING DIFFERENT OLIGOMER COLLECTIONS.
US5605662A (en) 1993-11-01 1997-02-25 Nanogen, Inc. Active programmable electronic devices for molecular biological analysis and diagnostics
US6017696A (en) 1993-11-01 2000-01-25 Nanogen, Inc. Methods for electronic stringency control for molecular biological analysis and diagnostics
US5846708A (en) 1991-11-19 1998-12-08 Massachusetts Institute Of Technology Optical and electrical methods and apparatus for molecule detection
US5384261A (en) 1991-11-22 1995-01-24 Affymax Technologies N.V. Very large scale immobilized polymer synthesis using mechanically directed flow paths
EP1382386A3 (en) 1992-02-19 2004-12-01 The Public Health Research Institute Of The City Of New York, Inc. Novel oligonucleotide arrays and their use for sorting, isolating, sequencing, and manipulating nucleic acids
US5395750A (en) * 1992-02-28 1995-03-07 Hoffmann-La Roche Inc. Methods for producing proteins which bind to predetermined antigens
GB9207381D0 (en) 1992-04-03 1992-05-13 Ici Plc Synthesis of oligonucleotides
US5356802A (en) 1992-04-03 1994-10-18 The Johns Hopkins University Functional domains in flavobacterium okeanokoites (FokI) restriction endonuclease
US5436150A (en) 1992-04-03 1995-07-25 The Johns Hopkins University Functional domains in flavobacterium okeanokoites (foki) restriction endonuclease
US5916794A (en) * 1992-04-03 1999-06-29 Johns Hopkins University Methods for inactivating target DNA and for detecting conformational change in a nucleic acid
WO1993022457A1 (en) * 1992-04-24 1993-11-11 Massachusetts Institute Of Technology Screening for genetic variation
US5541061A (en) 1992-04-29 1996-07-30 Affymax Technologies N.V. Methods for screening factorial chemical libraries
US5639423A (en) 1992-08-31 1997-06-17 The Regents Of The University Of California Microfabricated reactor
US5288514A (en) 1992-09-14 1994-02-22 The Regents Of The University Of California Solid phase and combinatorial synthesis of benzodiazepine compounds on a solid support
US5795714A (en) * 1992-11-06 1998-08-18 Trustees Of Boston University Method for replicating an array of nucleic acid probes
US5368823A (en) 1993-02-11 1994-11-29 University Of Georgia Research Foundation, Inc. Automated synthesis of oligonucleotides
US5498531A (en) * 1993-09-10 1996-03-12 President And Fellows Of Harvard College Intron-mediated recombinant techniques and reagents
US6150141A (en) * 1993-09-10 2000-11-21 Trustees Of Boston University Intron-mediated recombinant techniques and reagents
US6027877A (en) * 1993-11-04 2000-02-22 Gene Check, Inc. Use of immobilized mismatch binding protein for detection of mutations and polymorphisms, purification of amplified DNA samples and allele identification
DE4343591A1 (en) 1993-12-21 1995-06-22 Evotec Biosystems Gmbh Process for the evolutionary design and synthesis of functional polymers based on shape elements and shape codes
US5834252A (en) * 1995-04-18 1998-11-10 Glaxo Group Limited End-complementary polymerase reaction
US6165793A (en) 1996-03-25 2000-12-26 Maxygen, Inc. Methods for generating polynucleotides having desired characteristics by iterative selection and recombination
KR100491810B1 (en) 1994-02-17 2005-10-11 맥시겐, 인크. Method of inducing DNA mutations by random fragmentation and reassembly
US6335160B1 (en) * 1995-02-17 2002-01-01 Maxygen, Inc. Methods and compositions for polypeptide engineering
US5928905A (en) 1995-04-18 1999-07-27 Glaxo Group Limited End-complementary polymerase reaction
US6117679A (en) 1994-02-17 2000-09-12 Maxygen, Inc. Methods for generating polynucleotides having desired characteristics by iterative selection and recombination
US5605793A (en) * 1994-02-17 1997-02-25 Affymax Technologies N.V. Methods for in vitro recombination
US5514789A (en) 1994-04-21 1996-05-07 Barrskogen, Inc. Recovery of oligonucleotides by gas phase cleavage
WO1996000378A1 (en) 1994-06-23 1996-01-04 Affymax Technologies N.V. Photolabile compounds and methods for their use
US5641658A (en) 1994-08-03 1997-06-24 Mosaic Technologies, Inc. Method for performing amplification of nucleic acid with two primers bound to a single solid support
US5604097A (en) 1994-10-13 1997-02-18 Spectragen, Inc. Methods for sorting polynucleotides using oligonucleotide tags
US5556752A (en) 1994-10-24 1996-09-17 Affymetrix, Inc. Surface-bound, unimolecular, double-stranded DNA
US5871902A (en) * 1994-12-09 1999-02-16 The Gene Pool, Inc. Sequence-specific detection of nucleic acid hybrids using a DNA-binding molecule or assembly capable of discriminating perfect hybrids from non-perfect hybrids
US5766550A (en) * 1995-03-15 1998-06-16 City Of Hope Disposable reagent storage and delivery cartridge
KR19990008000A (en) 1995-04-24 1999-01-25 로버트 에스. 화이트 헤드 How to create a new metabolic pathway and screen it
US5624711A (en) 1995-04-27 1997-04-29 Affymax Technologies, N.V. Derivatization of solid supports and methods for oligomer synthesis
US5700642A (en) 1995-05-22 1997-12-23 Sri International Oligonucleotide sizing using immobilized cleavable primers
US5830655A (en) 1995-05-22 1998-11-03 Sri International Oligonucleotide sizing using cleavable primers
US5877280A (en) * 1995-06-06 1999-03-02 The Mount Sinai School Of Medicine Of The City University Of New York Thermostable muts proteins
US6406847B1 (en) 1995-10-02 2002-06-18 The Board Of Trustees Of The Leland Stanford Junior University Mismatch repair detection
US6537776B1 (en) * 1999-06-14 2003-03-25 Diversa Corporation Synthetic ligation reassembly in directed evolution
US5922539A (en) * 1995-12-15 1999-07-13 Duke University Methods for use of mismatch repair systems for the detection and removal of mutant sequences that arise during enzymatic amplification
US6261797B1 (en) * 1996-01-29 2001-07-17 Stratagene Primer-mediated polynucleotide synthesis and manipulation techniques
US6013440A (en) * 1996-03-11 2000-01-11 Affymetrix, Inc. Nucleic acid affinity columns
US6096548A (en) 1996-03-25 2000-08-01 Maxygen, Inc. Method for directing evolution of a virus
US6242211B1 (en) * 1996-04-24 2001-06-05 Terragen Discovery, Inc. Methods for generating and screening novel metabolic pathways
US5851804A (en) * 1996-05-06 1998-12-22 Apollon, Inc. Chimeric kanamycin resistance gene
SE9602062D0 (en) 1996-05-29 1996-05-29 Pharmacia Biotech Ab Method for detection of mutations
US6277632B1 (en) * 1996-06-17 2001-08-21 Vectorobjects, Llc Method and kits for preparing multicomponent nucleic acid constructs
US6495318B2 (en) 1996-06-17 2002-12-17 Vectorobjects, Llc Method and kits for preparing multicomponent nucleic acid constructs
ZA975891B (en) 1996-07-05 1998-07-23 Combimatrix Corp Electrochemical solid phase synthesis of polymers
JPH1066576A (en) 1996-08-07 1998-03-10 Novo Nordisk As Double-stranded dna having protruding terminal and shuffling method using the same
DE69739121D1 (en) 1996-08-23 2009-01-02 Peter Ruhdal Jensen ARTIFICIAL PROMOTIVE LIBRARIES FOR SELECTED ORGANISMS AND PROMOTERS THEREOF
US6444650B1 (en) 1996-10-01 2002-09-03 Geron Corporation Antisense compositions for detecting and inhibiting telomerase reverse transcriptase
US6110668A (en) * 1996-10-07 2000-08-29 Max-Planck-Gesellschaft Zur Forderung Der Wissenschaften E.V. Gene synthesis method
US5953469A (en) 1996-10-29 1999-09-14 Xeotron Corporation Optical device utilizing optical waveguides and mechanical light-switches
EP0937096B1 (en) 1996-11-06 2004-02-04 Sequenom, Inc. Method of mass spectrometry analysis
IL120339A0 (en) * 1997-02-27 1997-06-10 Gesher Israel Advanced Biotecs Improved DNA assembly method
US6410220B1 (en) * 1997-02-28 2002-06-25 Nature Technology Corp Self-assembling genes, vectors and uses thereof
US6410225B1 (en) 1997-06-27 2002-06-25 Yale University Purification of oligomers
US6489141B1 (en) * 1997-07-09 2002-12-03 The University Of Queensland Nucleic acid sequence and methods for selectively expressing a protein in a target cell or tissue
CA2297661A1 (en) * 1997-07-22 1999-02-04 Darwin Molecular Corp. Amplification and other enzymatic reactions performed on nucleic acid arrays
US6326489B1 (en) 1997-08-05 2001-12-04 Howard Hughes Medical Institute Surface-bound, bimolecular, double-stranded DNA arrays
US6031098A (en) 1997-08-11 2000-02-29 California Institute Of Technology Detection and treatment of duplex polynucleotide damage
DE19736591A1 (en) * 1997-08-22 1999-02-25 Peter Prof Dr Hegemann Preparing long nucleic acid polymers from linkable oligonucleotides
WO1999010369A1 (en) 1997-08-28 1999-03-04 Thomas Jefferson University Compositions, kits, and methods for effecting adenine nucleotide modulation of dna mismatch recognition proteins
US6136568A (en) * 1997-09-15 2000-10-24 Hiatt; Andrew C. De novo polynucleotide synthesis using rolling templates
ES2340857T3 (en) * 1997-09-16 2010-06-10 Centocor Ortho Biotech Inc. METHOD FOR COMPLETE CHEMICAL SYNTHESIS AND EXEMPTION OF GENES AND GENOMES.
US6670127B2 (en) 1997-09-16 2003-12-30 Egea Biosciences, Inc. Method for assembly of a polynucleotide encoding a target polypeptide
WO1999019510A1 (en) 1997-10-10 1999-04-22 President And Fellows Of Harvard College Surface-bound, double-stranded dna protein arrays
JP2001519538A (en) 1997-10-10 2001-10-23 プレジデント・アンド・フェローズ・オブ・ハーバード・カレッジ Replica amplification of nucleic acid arrays
US20020127552A1 (en) 1997-10-10 2002-09-12 Church George M Replica amplification of nucleic acid arrays
US6177558B1 (en) 1997-11-13 2001-01-23 Protogene Laboratories, Inc. Method and composition for chemical synthesis using high boiling point organic solvents to control evaporation
US6042211A (en) 1997-11-25 2000-03-28 Hewlett-Packard Company Ink drop volume variance compensation for inkjet printing
WO1999028505A1 (en) 1997-12-03 1999-06-10 Curagen Corporation Methods and devices for measuring differential gene expression
ES2337027T3 (en) * 1997-12-05 2010-04-20 Europaisches Laboratorium Fur Molekularbiologie (Embl) NEW DNA CLONING METHOD BASED ON THE RECE-RECT RECOMBINATION SYSTEM OF E. COLI.
EP1037909B1 (en) 1997-12-11 2007-06-13 University of Saskatchewan Postweaning multisystemic wasting syndrome virus from pigs
US6093302A (en) 1998-01-05 2000-07-25 Combimatrix Corporation Electrochemical solid phase synthesis
US20030050437A1 (en) 1998-01-05 2003-03-13 Montgomery Donald D. Electrochemical solid phase synthesis
US6150102A (en) * 1998-02-03 2000-11-21 Lucent Technologies Inc. Method of generating nucleic acid oligomers of known composition
EP1054726B1 (en) 1998-02-11 2003-07-30 University of Houston, Office of Technology Transfer Apparatus for chemical and biochemical reactions using photo-generated reagents
WO1999042813A1 (en) * 1998-02-23 1999-08-26 Wisconsin Alumni Research Foundation Method and apparatus for synthesis of arrays of dna probes
US5912129A (en) * 1998-03-05 1999-06-15 Vinayagamoorthy; Thuraiayah Multi-zone polymerase/ligase chain reaction
DE19812103A1 (en) 1998-03-19 1999-09-23 Bernauer Annette Iteratively synthesizing nucleic acid from oligonucleotides which contain cleavable sites, useful for preparation of genes, ribozymes etc.
US6670605B1 (en) 1998-05-11 2003-12-30 Halliburton Energy Services, Inc. Method and apparatus for the down-hole characterization of formation fluids
US6271957B1 (en) 1998-05-29 2001-08-07 Affymetrix, Inc. Methods involving direct write optical lithography
EP1632496A1 (en) 1998-06-22 2006-03-08 Affymetrix, Inc. Reagents and methods for solid phase synthesis and display
US6846655B1 (en) 1998-06-29 2005-01-25 Phylos, Inc. Methods for generating highly diverse libraries
US6287825B1 (en) * 1998-09-18 2001-09-11 Molecular Staging Inc. Methods for reducing the complexity of DNA sequences
US5942609A (en) 1998-11-12 1999-08-24 The Perkin-Elmer Corporation Ligation assembly and detection of polynucleotides on solid-support
DE19854946C2 (en) * 1998-11-27 2002-01-03 Guenter Von Kiedrowski Cloning and copying on surfaces
EP1141275B1 (en) 1999-01-05 2009-08-12 Trustees Of Boston University Improved nucleic acid cloning
US20040005673A1 (en) * 2001-06-29 2004-01-08 Kevin Jarrell System for manipulating nucleic acids
US6358712B1 (en) * 1999-01-05 2002-03-19 Trustee Of Boston University Ordered gene assembly
US20030054390A1 (en) * 1999-01-19 2003-03-20 Maxygen, Inc. Oligonucleotide mediated nucleic acid recombination
US6376246B1 (en) * 1999-02-05 2002-04-23 Maxygen, Inc. Oligonucleotide mediated nucleic acid recombination
US6565727B1 (en) 1999-01-25 2003-05-20 Nanolytics, Inc. Actuators for microfluidics without moving parts
US6372484B1 (en) 1999-01-25 2002-04-16 E.I. Dupont De Nemours And Company Apparatus for integrated polymerase chain reaction and capillary electrophoresis
JP2002535995A (en) 1999-02-03 2002-10-29 ザ チルドレンズ メディカル センター コーポレイション Gene repair involving induction of double-stranded DNA breaks at chromosomal target sites
AU767606B2 (en) * 1999-02-19 2003-11-20 Synthetic Genomics, Inc. Method for producing polymers
JP2002538790A (en) * 1999-03-08 2002-11-19 プロトジーン・ラボラトリーズ・インコーポレーテッド Methods and compositions for economically synthesizing and assembling long DNA sequences
US6511849B1 (en) 1999-04-23 2003-01-28 The Sir Mortimer B. Davis - Jewish General Hospital Microarrays of biological materials
US6824866B1 (en) 1999-04-08 2004-11-30 Affymetrix, Inc. Porous silica substrates for polymer synthesis and assays
DE19925862A1 (en) 1999-06-07 2000-12-14 Diavir Gmbh Process for the synthesis of DNA fragments
US8137906B2 (en) 1999-06-07 2012-03-20 Sloning Biotechnology Gmbh Method for the synthesis of DNA fragments
US6355412B1 (en) * 1999-07-09 2002-03-12 The European Molecular Biology Laboratory Methods and compositions for directed cloning and subcloning using homologous recombination
US6653151B2 (en) 1999-07-30 2003-11-25 Large Scale Proteomics Corporation Dry deposition of materials for microarrays using matrix displacement
US6613581B1 (en) 1999-08-26 2003-09-02 Caliper Technologies Corp. Microfluidic analytic detection assays, devices, and integrated systems
GB9920711D0 (en) 1999-09-03 1999-11-03 Hd Technologies Limited High dynamic range mass spectrometer
AU1075701A (en) * 1999-10-08 2001-04-23 Protogene Laboratories, Inc. Method and apparatus for performing large numbers of reactions using array assembly
US6315958B1 (en) 1999-11-10 2001-11-13 Wisconsin Alumni Research Foundation Flow cell for synthesis of arrays of DNA probes and the like
US6800439B1 (en) 2000-01-06 2004-10-05 Affymetrix, Inc. Methods for improved array preparation
EP1130113A1 (en) 2000-02-15 2001-09-05 Johannes Petrus Schouten Multiplex ligation dependent amplification assay
WO2001062968A2 (en) 2000-02-25 2001-08-30 General Atomics Mutant nucleic binding enzymes and use thereof in diagnostic, detection and purification methods
US6833450B1 (en) 2000-03-17 2004-12-21 Affymetrix, Inc. Phosphite ester oxidation in nucleic acid array preparation
US6365355B1 (en) 2000-03-28 2002-04-02 The Regents Of The University Of California Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches
US6969587B2 (en) 2000-04-04 2005-11-29 Taylor Paul D Detection of nucleic acid heteroduplex molecules by anion-exchange chromatography
CA2406466A1 (en) 2000-04-21 2001-11-01 Genencor International, Inc. Non-pcr based recombination of nucleic acids
DE10022995C2 (en) 2000-05-11 2003-11-27 Wavelight Laser Technologie Ag Device for photorefractive corneal surgery
US6479262B1 (en) 2000-05-16 2002-11-12 Hercules, Incorporated Solid phase enzymatic assembly of polynucleotides
AU2001264802A1 (en) * 2000-05-21 2001-12-03 University Of North Carolina At Chapel Hill Assembly of large viral genomes and chromosomes from subclones
JP2004509609A (en) 2000-06-02 2004-04-02 ブルー ヘロン バイオテクノロジー インコーポレイテッド Methods for improving sequence fidelity of synthetic double-stranded oligonucleotides
US6605451B1 (en) 2000-06-06 2003-08-12 Xtrana, Inc. Methods and devices for multiplexing amplification reactions
US20020012616A1 (en) 2000-07-03 2002-01-31 Xiaochuan Zhou Fluidic methods and devices for parallel chemical reactions
US20030118486A1 (en) 2000-07-03 2003-06-26 Xeotron Corporation Fluidic methods and devices for parallel chemical reactions
GB0016472D0 (en) 2000-07-05 2000-08-23 Amersham Pharm Biotech Uk Ltd Sequencing method and apparatus
AU2001271893A1 (en) 2000-07-07 2002-01-21 Nimblegen Systems, Inc. Method and apparatus for synthesis of arrays of dna probes
US20030017552A1 (en) * 2000-07-21 2003-01-23 Jarrell Kevin A. Modular vector systems
GB0018876D0 (en) 2000-08-01 2000-09-20 Applied Research Systems Method of producing polypeptides
ES2394877T3 (en) 2000-08-14 2013-02-06 The Government Of The United States Of America As Represented By The Secretary Of The Department Of Health And Human Services Improved homologous recombination mediated by lambda recombination proteins
WO2002016583A2 (en) 2000-08-24 2002-02-28 Maxygen, Inc. Constructs and their use in metabolic pathway engineering
US6610499B1 (en) 2000-08-31 2003-08-26 The Regents Of The University Of California Capillary array and related methods
US6966945B1 (en) 2000-09-20 2005-11-22 Goodrich Corporation Inorganic matrix compositions, composites and process of making the same
US6666541B2 (en) 2000-09-25 2003-12-23 Picoliter Inc. Acoustic ejection of fluids from a plurality of reservoirs
US20020037359A1 (en) 2000-09-25 2002-03-28 Mutz Mitchell W. Focused acoustic energy in the preparation of peptide arrays
US6658802B2 (en) 2000-10-24 2003-12-09 Northwest Rubber Extruders, Inc. Seal for fixed window of motor vehicle with curved concave corners
EP1205548A1 (en) 2000-11-09 2002-05-15 Kabushiki Kaisha Toyota Chuo Kenkyusho Methods and apparatus for successively ligating double-stranded DNA molecules on a solid phase
US6709861B2 (en) 2000-11-17 2004-03-23 Lucigen Corp. Cloning vectors and vector components
WO2002044425A2 (en) 2000-12-01 2002-06-06 Visigen Biotechnologies, Inc. Enzymatic nucleic acid synthesis: compositions and methods for altering monomer incorporation fidelity
US6596239B2 (en) 2000-12-12 2003-07-22 Edc Biosystems, Inc. Acoustically mediated fluid transfer methods and uses thereof
US6660475B2 (en) 2000-12-15 2003-12-09 New England Biolabs, Inc. Use of site-specific nicking endonucleases to create single-stranded regions and applications thereof
US20020133359A1 (en) 2000-12-26 2002-09-19 Appareon System, method and article of manufacture for country and regional treatment in a supply chain system
KR100860291B1 (en) 2001-01-19 2008-09-25 에게아 바이오사이언시스, 인크. Computer-directed assembly of a polynucleotide encoding a target polypeptide
US6514704B2 (en) 2001-02-01 2003-02-04 Xerox Corporation Quality control mechanism and process for a biofluid multi-ejector system
EP1368501A4 (en) 2001-02-21 2005-01-19 Gene Check Inc Mutation detection using muts and reca
CA2468425A1 (en) 2001-03-08 2003-09-19 Applera Corporation Reagents for oligonucleotide cleavage and deprotection
US7211654B2 (en) 2001-03-14 2007-05-01 Regents Of The University Of Michigan Linkers and co-coupling agents for optimization of oligonucleotide synthesis and purification on solid supports
ATE472609T1 (en) * 2001-04-26 2010-07-15 Amgen Mountain View Inc COMBINATORIAL LIBRARIES OF MONOMER DOMAIN
US6951720B2 (en) * 2001-05-10 2005-10-04 San Diego State University Foundation Use of phosphorothiolate polynucleotides in ligating nucleic acids
WO2002095073A1 (en) * 2001-05-18 2002-11-28 Wisconsin Alumni Research Foundation Method for the synthesis of dna sequences
AU2002314997A1 (en) * 2001-06-05 2002-12-16 Gorilla Genomics, Inc. Methods for low background cloning of dna using long oligonucleotides
US6905827B2 (en) 2001-06-08 2005-06-14 Expression Diagnostics, Inc. Methods and compositions for diagnosing or monitoring auto immune and chronic inflammatory diseases
RU2004100107A (en) 2001-06-08 2005-06-10 Шанхай Мендель Дна Сентер Ко., Лтд (Cn) LOW TEMPERATURE CYCLIC EXTENSION OF DNA WITH HIGH SPECIFICITY OF PRIMING
US7179423B2 (en) 2001-06-20 2007-02-20 Cytonome, Inc. Microfluidic system including a virtual wall fluid interface port for interfacing fluids with the microfluidic system
US6416164B1 (en) 2001-07-20 2002-07-09 Picoliter Inc. Acoustic ejection of fluids using large F-number focusing elements
US20050130140A1 (en) 2001-07-23 2005-06-16 Bovenberg Roelof A.L. Process for preparing variant polynucleotides
US6734436B2 (en) 2001-08-07 2004-05-11 Sri International Optical microfluidic devices and methods
WO2003033718A1 (en) 2001-10-17 2003-04-24 Global Genomics Ab Synthesis of oligonucleotides on solid support and assembly into doublestranded polynucleotides
WO2003040410A1 (en) 2001-11-02 2003-05-15 Nimblegen Systems, Inc. Detection of hybridization oligonucleotide microarray through covalently labeling microarray probe
ATE414767T1 (en) 2001-11-22 2008-12-15 Sloning Biotechnology Gmbh NUCLEIC ACID LINKERS AND THEIR USE IN GENE SYNTHESIS
US20030099952A1 (en) 2001-11-26 2003-05-29 Roland Green Microarrays with visible pattern detection
US7057026B2 (en) 2001-12-04 2006-06-06 Solexa Limited Labelled nucleotides
AU2002357249A1 (en) 2001-12-13 2003-07-09 Blue Heron Biotechnology, Inc. Methods for removal of double-stranded oligonucleotides containing sequence errors using mismatch recognition proteins
US20030171325A1 (en) * 2002-01-04 2003-09-11 Board Of Regents, The University Of Texas System Proofreading, error deletion, and ligation method for synthesis of high-fidelity polynucleotide sequences
US6897025B2 (en) 2002-01-07 2005-05-24 Perlegen Sciences, Inc. Genetic analysis systems and methods
JP2005514927A (en) 2002-01-14 2005-05-26 ディヴァーサ コーポレイション Method for producing polynucleotide and method for purifying double-stranded polynucleotide
US20030215837A1 (en) * 2002-01-14 2003-11-20 Diversa Corporation Methods for purifying double-stranded nucleic acids lacking base pair mismatches or nucleotide gaps
US20030219781A1 (en) * 2002-01-14 2003-11-27 Diversa Corporation Compositions and methods for making polynucleotides by iterative assembly of codon building blocks
US20040096826A1 (en) 2002-01-30 2004-05-20 Evans Glen A. Methods for creating recombination products between nucleotide sequences
US20040126757A1 (en) 2002-01-31 2004-07-01 Francesco Cerrina Method and apparatus for synthesis of arrays of DNA probes
US7037659B2 (en) 2002-01-31 2006-05-02 Nimblegen Systems Inc. Apparatus for constructing DNA probes having a prismatic and kaleidoscopic light homogenizer
US7422851B2 (en) 2002-01-31 2008-09-09 Nimblegen Systems, Inc. Correction for illumination non-uniformity during the synthesis of arrays of oligomers
US7157229B2 (en) 2002-01-31 2007-01-02 Nimblegen Systems, Inc. Prepatterned substrate for optical synthesis of DNA probes
US7083975B2 (en) 2002-02-01 2006-08-01 Roland Green Microarray synthesis instrument and method
US7399590B2 (en) 2002-02-21 2008-07-15 Asm Scientific, Inc. Recombinase polymerase amplification
ATE428805T1 (en) 2002-02-28 2009-05-15 Wisconsin Alumni Res Found METHOD FOR ERROR REDUCTION IN NUCLEIC ACID POPULATIONS
CN1656234B (en) 2002-03-20 2012-02-01 创新生物公司 Microcapsules encapsulating a nucleic acid amplification reaction mixture and their use as reaction compartment for parallels reactions
US20030182172A1 (en) 2002-03-25 2003-09-25 Claggett Stuart Lee System and method to build project management processes
US7482119B2 (en) * 2002-04-01 2009-01-27 Blue Heron Biotechnology, Inc. Solid phase methods for polynucleotide production
CA2478983A1 (en) 2002-04-01 2003-10-16 Blue Heron Biotechnology, Inc. Solid phase methods for polynucleotide production
CA2480200A1 (en) 2002-04-02 2003-10-16 Caliper Life Sciences, Inc. Methods and apparatus for separation and isolation of components from a biological sample
DK1501947T3 (en) 2002-04-22 2008-11-17 Genencor Int Method of Creating a Library of Bacterial Clones with Varying Levels of Gene Expression
US20040171047A1 (en) 2002-05-22 2004-09-02 Dahl Gary A. Target-dependent transcription
AU2003240795A1 (en) 2002-05-24 2003-12-12 Invitrogen Corporation Nested pcr employing degradable primers
WO2003100012A2 (en) 2002-05-24 2003-12-04 Nimblegen Systems, Inc. Microarrays and method for running hybridization reaction for multiple samples on a single microarray
US6932097B2 (en) 2002-06-18 2005-08-23 Picoliter Inc. Acoustic control of the composition and/or volume of fluid in a reservoir
JP2006507921A (en) 2002-06-28 2006-03-09 プレジデント・アンド・フェロウズ・オブ・ハーバード・カレッジ Method and apparatus for fluid dispersion
US20040101444A1 (en) 2002-07-15 2004-05-27 Xeotron Corporation Apparatus and method for fluid delivery to a hybridization station
GB0221053D0 (en) 2002-09-11 2002-10-23 Medical Res Council Single-molecule in vitro evolution
US7563600B2 (en) 2002-09-12 2009-07-21 Combimatrix Corporation Microarray synthesis and assembly of gene-length polynucleotides
US6911132B2 (en) 2002-09-24 2005-06-28 Duke University Apparatus for manipulating droplets by electrowetting-based techniques
WO2004029220A2 (en) * 2002-09-26 2004-04-08 Kosan Biosciences, Inc. Synthetic genes
AU2003270898A1 (en) 2002-09-27 2004-04-19 Nimblegen Systems, Inc. Microarray with hydrophobic barriers
WO2004031399A2 (en) 2002-09-30 2004-04-15 Nimblegen Systems, Inc. Parallel loading of arrays
US20040110212A1 (en) 2002-09-30 2004-06-10 Mccormick Mark Microarrays with visual alignment marks
WO2004031351A2 (en) 2002-10-01 2004-04-15 Nimblegen Systems, Inc. Microarrays having multiple oligonucleotides in single array features
EP1560642A4 (en) 2002-10-09 2006-05-03 Univ Illinois Microfluidic systems and components
DE60227525D1 (en) 2002-10-18 2008-08-21 Sloning Biotechnology Gmbh Process for the preparation of nucleic acid molecules
US7267984B2 (en) * 2002-10-31 2007-09-11 Rice University Recombination assembly of large DNA fragments
US7932025B2 (en) * 2002-12-10 2011-04-26 Massachusetts Institute Of Technology Methods for high fidelity production of long nucleic acid molecules with error control
US7879580B2 (en) 2002-12-10 2011-02-01 Massachusetts Institute Of Technology Methods for high fidelity production of long nucleic acid molecules
US7575865B2 (en) 2003-01-29 2009-08-18 454 Life Sciences Corporation Methods of amplifying and sequencing nucleic acids
EP2159285B1 (en) 2003-01-29 2012-09-26 454 Life Sciences Corporation Methods of amplifying and sequencing nucleic acids
US8206913B1 (en) 2003-03-07 2012-06-26 Rubicon Genomics, Inc. Amplification and analysis of whole genome and whole transcriptome libraries generated by a DNA polymerization process
EP1613776A1 (en) 2003-04-02 2006-01-11 Blue Heron Biotechnology, Inc. Error reduction in automated gene synthesis
US7262031B2 (en) * 2003-05-22 2007-08-28 The Regents Of The University Of California Method for producing a synthetic gene or other DNA sequence
US20040241655A1 (en) 2003-05-29 2004-12-02 Yuchi Hwang Conditional touchdown multiplex polymerase chain reaction
US8133670B2 (en) 2003-06-13 2012-03-13 Cold Spring Harbor Laboratory Method for making populations of defined nucleic acid molecules
ES2385828T3 (en) * 2003-06-20 2012-08-01 Exiqon A/S Probes, libraries and kits for analysis of nucleic acid mixtures and procedures for building them
WO2005017153A2 (en) * 2003-08-04 2005-02-24 Blue Heron Biotechnology, Inc. Methods for synthesis of defined polynucleotides
US7198900B2 (en) 2003-08-29 2007-04-03 Applera Corporation Multiplex detection compositions, methods, and kits
US7169560B2 (en) 2003-11-12 2007-01-30 Helicos Biosciences Corporation Short cycle methods for sequencing polynucleotides
US7582858B2 (en) 2004-01-23 2009-09-01 Sri International Apparatus and method of moving micro-droplets using laser-induced thermal gradients
DK1557464T3 (en) 2004-01-23 2011-01-24 Sloning Biotechnology Gmbh Enzymatic preparation of nucleic acid molecules
JP2007534320A (en) 2004-02-27 2007-11-29 プレジデント・アンド・フェロウズ・オブ・ハーバード・カレッジ Polynucleotide synthesis method
US7432055B2 (en) 2004-03-05 2008-10-07 Uchicago Argonne Llc Dual phase multiplex polymerase chain reaction
US20050208503A1 (en) * 2004-03-16 2005-09-22 Handy Yowanto Chemical ligation of nucleic acids
US20050227316A1 (en) 2004-04-07 2005-10-13 Kosan Biosciences, Inc. Synthetic genes
US8825411B2 (en) 2004-05-04 2014-09-02 Dna Twopointo, Inc. Design, synthesis and assembly of synthetic nucleic acids
JP2005319407A (en) 2004-05-10 2005-11-17 Hitachi Ltd Instrument using piezoelectric device
JP2008502367A (en) 2004-06-09 2008-01-31 ウイスコンシン アラムニ リサーチ ファンデーション Fast generation of oligonucleotides
US20060008833A1 (en) * 2004-07-12 2006-01-12 Jacobson Joseph M Method for long, error-reduced DNA synthesis
DE602005026128D1 (en) * 2004-08-27 2011-03-10 Wisconsin Alumni Res Found METHOD FOR ERROR REDUCTION IN NUCLEIC ACID POPULATIONS
WO2006031745A2 (en) 2004-09-10 2006-03-23 Sequenom, Inc. Methods for long-range sequence analysis of nucleic acids
AU2005295351A1 (en) 2004-10-18 2006-04-27 Codon Devices, Inc. Methods for assembly of high fidelity synthetic polynucleotides
US20070122817A1 (en) 2005-02-28 2007-05-31 George Church Methods for assembly of high fidelity synthetic polynucleotides
US20060194214A1 (en) * 2005-02-28 2006-08-31 George Church Methods for assembly of high fidelity synthetic polynucleotides
WO2006049843A1 (en) 2004-11-01 2006-05-11 Parallele Bioscience, Inc. Multiplex polynucleotide synthesis
US7699979B2 (en) 2005-01-07 2010-04-20 Board Of Trustees Of The University Of Arkansas Separation system and efficient capture of contaminants using magnetic nanoparticles
WO2006076679A1 (en) 2005-01-13 2006-07-20 Codon Devices, Inc. Compositions and methods for protein design
JP2008528040A (en) 2005-02-01 2008-07-31 アジェンコート バイオサイエンス コーポレイション Reagents, methods and libraries for bead-based sequencing
US7393665B2 (en) 2005-02-10 2008-07-01 Population Genetics Technologies Ltd Methods and compositions for tagging and identifying polynucleotides
WO2006086209A2 (en) 2005-02-10 2006-08-17 Compass Genetics, Llc Genetic analysis by sequence-specific sorting
US7285835B2 (en) 2005-02-24 2007-10-23 Freescale Semiconductor, Inc. Low power magnetoelectronic device structures utilizing enhanced permeability materials
JP2008539759A (en) 2005-05-11 2008-11-20 ナノリティックス・インコーポレイテッド Method and apparatus for performing biochemical or chemical reactions at multiple temperatures
WO2006127423A2 (en) 2005-05-18 2006-11-30 Codon Devices, Inc. Methods of producing polynucleotide libraries using scarless ligation
ES2387878T3 (en) 2005-06-23 2012-10-03 Keygene N.V. Strategies for the identification of high performance and the detection of polymorphisms
WO2007005053A1 (en) 2005-06-30 2007-01-11 Codon Devices, Inc. Hierarchical assembly methods for genome engineering
US20070184487A1 (en) 2005-07-12 2007-08-09 Baynes Brian M Compositions and methods for design of non-immunogenic proteins
JP2009501522A (en) 2005-07-12 2009-01-22 コドン デバイシズ インコーポレイテッド Compositions and methods for biocatalytic engineering
GB0514910D0 (en) 2005-07-20 2005-08-24 Solexa Ltd Method for sequencing a polynucleotide template
DK1915446T3 (en) * 2005-08-11 2017-09-11 Synthetic Genomics Inc IN VITRO RECOMBINATION PROCEDURE
US7303320B1 (en) 2005-12-07 2007-12-04 Ashley David M Pick-up truck cab accessory mounting bracket
WO2007075438A2 (en) 2005-12-15 2007-07-05 Codon Devices, Inc. Polypeptides comprising unnatural amino acids, methods for their production and uses therefor
WO2007087312A2 (en) 2006-01-23 2007-08-02 Population Genetics Technologies Ltd. Molecular counting
WO2007087347A2 (en) 2006-01-24 2007-08-02 Codon Devices, Inc. Methods, systems, and apparatus for facilitating the design of molecular constructs
EP1999276A4 (en) 2006-03-14 2010-08-04 Genizon Biosciences Inc Methods and means for nucleic acid sequencing
WO2007107710A1 (en) 2006-03-17 2007-09-27 Solexa Limited Isothermal methods for creating clonal single molecule arrays
WO2007123742A2 (en) 2006-03-31 2007-11-01 Codon Devices, Inc. Methods and compositions for increasing the fidelity of multiplex nucleic acid assembly
US20070231805A1 (en) 2006-03-31 2007-10-04 Baynes Brian M Nucleic acid assembly optimization using clamped mismatch binding proteins
EP1842915A1 (en) 2006-04-04 2007-10-10 Libragen Method of in vitro polynucleotide sequences shuffling by recursive circular DNA molecules fragmentation and ligation
WO2007120624A2 (en) 2006-04-10 2007-10-25 Codon Devices, Inc. Concerted nucleic acid assembly reactions
WO2007136736A2 (en) 2006-05-19 2007-11-29 Codon Devices, Inc. Methods for nucleic acid sorting and synthesis
US20070281309A1 (en) 2006-05-19 2007-12-06 Massachusetts Institute Of Technology Microfluidic-based Gene Synthesis
WO2007136833A2 (en) 2006-05-19 2007-11-29 Codon Devices, Inc. Methods and compositions for aptamer production and uses thereof
WO2007136835A2 (en) 2006-05-19 2007-11-29 Codon Devices, Inc Methods and cells for creating functional diversity and uses thereof
US20090087840A1 (en) * 2006-05-19 2009-04-02 Codon Devices, Inc. Combined extension and ligation for nucleic acid assembly
WO2008054543A2 (en) 2006-05-20 2008-05-08 Codon Devices, Inc. Oligonucleotides for multiplex nucleic acid assembly
WO2007136840A2 (en) 2006-05-20 2007-11-29 Codon Devices, Inc. Nucleic acid library design and assembly
ATE536424T1 (en) 2006-06-06 2011-12-15 Gen Probe Inc LABELED OLIGONUCLEOTIDES AND THEIR USE IN NUCLEIC ACID AMPLIFICATION PROCEDURES
WO2008024319A2 (en) 2006-08-20 2008-02-28 Codon Devices, Inc. Microfluidic devices for nucleic acid assembly
DE102006039479A1 (en) 2006-08-23 2008-03-06 Febit Biotech Gmbh Programmable oligonucleotide synthesis
WO2008027558A2 (en) 2006-08-31 2008-03-06 Codon Devices, Inc. Iterative nucleic acid assembly using activation of vector-encoded traits
US20080287320A1 (en) 2006-10-04 2008-11-20 Codon Devices Libraries and their design and assembly
US20080214408A1 (en) 2006-10-18 2008-09-04 Government Of The United States Of America, Represented By The Secretary Dept. In situ assembly of protein microarrays
WO2008076368A2 (en) 2006-12-13 2008-06-26 Codon Devices, Inc. Fragment-rearranged nucleic acids and uses thereof
WO2008109176A2 (en) 2007-03-07 2008-09-12 President And Fellows Of Harvard College Assays and other reactions involving droplets
WO2008130629A2 (en) 2007-04-19 2008-10-30 Codon Devices, Inc. Engineered nucleases and their uses for nucleic acid assembly
US20090305233A1 (en) 2007-07-03 2009-12-10 Arizona Board Of Regents, A Body Corporate Of The State Of Arizona Methods and Reagents for Polynucleotide Assembly
ATE494061T1 (en) 2007-07-10 2011-01-15 Hoffmann La Roche MICROFLUIDIC DEVICE, MIXING METHOD AND USE OF THE DEVICE
WO2009032167A1 (en) 2007-08-29 2009-03-12 Illumina Cambridge Method for sequencing a polynucleotide template
KR100957057B1 (en) 2007-12-03 2010-05-13 래플진(주) Method for Detection of Nucleic Acids by Simultaneous Isothermal Amplification of Nucleic Acids and Signal Probe
US9409177B2 (en) 2008-03-21 2016-08-09 Lawrence Livermore National Security, Llc Chip-based device for parallel sorting, amplification, detection, and identification of nucleic acid subsequences
CN101577391B (en) 2008-05-07 2011-03-23 富士康(昆山)电脑接插件有限公司 Plug connector and manufacture method thereof
CA2723242A1 (en) 2008-05-14 2009-11-19 British Columbia Cancer Agency Branch Gene synthesis by convergent assembly of oligonucleotide subsets
US8808986B2 (en) 2008-08-27 2014-08-19 Gen9, Inc. Methods and devices for high fidelity polynucleotide synthesis
CA2747535C (en) 2008-12-18 2020-01-14 Iti Scotland Limited Method for assembly of polynucleic acid sequences
US8691509B2 (en) 2009-04-02 2014-04-08 Fluidigm Corporation Multi-primer amplification method for barcoding of target nucleic acids
CN104404134B (en) 2009-04-03 2017-05-10 莱弗斯基因股份有限公司 Multiplex nucleic acid detection methods and systems
US9309557B2 (en) 2010-12-17 2016-04-12 Life Technologies Corporation Nucleic acid amplification
EP2464753A4 (en) 2009-08-12 2013-12-11 Harvard College Biodetection methods and compositions
EP3418387B1 (en) 2009-08-31 2020-11-25 Basf Plant Science Company GmbH Regulatory nucleic acid molecules for enhancing seed-specific gene expression in plants promoting enhanced polyunsaturated fatty acid synthesis
PL389135A1 (en) 2009-09-28 2011-04-11 Michał Lower Method and DNA probe for obtaining restriction endonucleases, especially with the desired sequence specificity
US20120315670A1 (en) 2009-11-02 2012-12-13 Gen9, Inc. Compositions and Methods for the Regulation of Multiple Genes of Interest in a Cell
WO2011056872A2 (en) 2009-11-03 2011-05-12 Gen9, Inc. Methods and microfluidic devices for the manipulation of droplets in high fidelity polynucleotide assembly
EP3597771A1 (en) 2009-11-25 2020-01-22 Gen9, Inc. Methods and apparatuses for chip-based dna error reduction
US9216414B2 (en) 2009-11-25 2015-12-22 Gen9, Inc. Microfluidic devices and methods for gene synthesis
US8835358B2 (en) 2009-12-15 2014-09-16 Cellular Research, Inc. Digital counting of individual molecules by stochastic attachment of diverse labels
WO2011085075A2 (en) 2010-01-07 2011-07-14 Gen9, Inc. Assembly of high fidelity polynucleotides
US8716467B2 (en) 2010-03-03 2014-05-06 Gen9, Inc. Methods and devices for nucleic acid synthesis
WO2011143556A1 (en) 2010-05-13 2011-11-17 Gen9, Inc. Methods for nucleotide sequencing and high fidelity polynucleotide synthesis
US8850219B2 (en) 2010-05-13 2014-09-30 Salesforce.Com, Inc. Secure communications
WO2011150168A1 (en) 2010-05-28 2011-12-01 Gen9, Inc. Methods and devices for in situ nucleic acid synthesis
GB2481425A (en) 2010-06-23 2011-12-28 Iti Scotland Ltd Method and device for assembling polynucleic acid sequences
CN101921840B (en) 2010-06-30 2014-06-25 深圳华大基因科技有限公司 DNA molecular label technology and DNA incomplete interrupt policy-based PCR sequencing method
WO2012058488A1 (en) 2010-10-27 2012-05-03 President And Fellows Of Harvard College Compositions of toehold primer duplexes and methods of use
EP4039363A1 (en) 2010-11-12 2022-08-10 Gen9, Inc. Protein arrays and methods of using and making the same
EP2638157B1 (en) 2010-11-12 2015-07-22 Gen9, Inc. Methods and devices for nucleic acids synthesis
DE102010056289A1 (en) 2010-12-24 2012-06-28 Geneart Ag Process for the preparation of reading frame correct fragment libraries
US9809904B2 (en) 2011-04-21 2017-11-07 University Of Washington Through Its Center For Commercialization Methods for retrieval of sequence-verified DNA constructs
EP2530159A1 (en) 2011-06-03 2012-12-05 Sandoz Ag Transcription terminator sequences
US9752176B2 (en) 2011-06-15 2017-09-05 Ginkgo Bioworks, Inc. Methods for preparative in vitro cloning
CN103732744A (en) 2011-06-15 2014-04-16 Gen9股份有限公司 Methods for preparative in vitro cloning
US20150203839A1 (en) 2011-08-26 2015-07-23 Gen9, Inc. Compositions and Methods for High Fidelity Assembly of Nucleic Acids
LT2944693T (en) 2011-08-26 2019-08-26 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids
US9322023B2 (en) 2011-10-06 2016-04-26 Cornell University Constructs and methods for the assembly of biological pathways
EP2780460B1 (en) 2011-11-16 2018-07-11 Sangamo Therapeutics, Inc. Modified dna-binding proteins and uses thereof
US8939620B2 (en) 2011-12-21 2015-01-27 Kawasaki Jukogyo Kabushiki Kaisha Saddle-riding type automotive vehicle
US9150853B2 (en) 2012-03-21 2015-10-06 Gen9, Inc. Methods for screening proteins using DNA encoded chemical libraries as templates for enzyme catalysis
EP4001427A1 (en) 2012-04-24 2022-05-25 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US20130281308A1 (en) 2012-04-24 2013-10-24 Gen9, Inc. Methods for sorting nucleic acids and preparative in vitro cloning
CN104685116A (en) 2012-06-25 2015-06-03 Gen9股份有限公司 Methods for nucleic acid assembly and high throughput sequencing
ES2731528T3 (en) 2012-11-30 2019-11-15 Ixogen Ltd Oncolytic adenoviruses that have an increased proportion of the 156R cleavage isoform of E1 B protein
KR102243092B1 (en) 2012-12-06 2021-04-22 시그마-알드리치 컴퍼니., 엘엘씨 Crispr-based genome modification and regulation
US20140310830A1 (en) 2012-12-12 2014-10-16 Feng Zhang CRISPR-Cas Nickase Systems, Methods And Compositions For Sequence Manipulation in Eukaryotes
CN105121641A (en) 2012-12-17 2015-12-02 哈佛大学校长及研究员协会 RNA-guided human genome engineering
WO2014160004A1 (en) 2013-03-13 2014-10-02 Gen9, Inc. Compositions, methods and apparatus for oligonucleotides synthesis
US10053719B2 (en) 2013-03-13 2018-08-21 Gen9, Inc. Compositions and methods for synthesis of high fidelity oligonucleotides
CA2906556C (en) 2013-03-15 2022-07-05 Gen9, Inc. Compositions and methods for multiplex nucleic acids synthesis
US20140272345A1 (en) 2013-03-15 2014-09-18 Rubicon Technology, Inc. Method of growing aluminum oxide onto substrates by use of an aluminum source in an environment containing partial pressure of oxygen to create transparent, scratch-resistant windows
IL289396B2 (en) 2013-03-15 2023-12-01 The General Hospital Corporation Using truncated guide rnas (tru-grnas) to increase specificity for rna-guided genome editing
CA2913865C (en) 2013-05-29 2022-07-19 Cellectis A method for producing precise dna cleavage using cas9 nickase activity
US10421957B2 (en) 2013-07-29 2019-09-24 Agilent Technologies, Inc. DNA assembly using an RNA-programmable nickase
LT3027771T (en) 2013-07-30 2019-04-25 Gen9, Inc. Methods for the production of long length clonal sequence verified nucleic acid constructs
US9388430B2 (en) 2013-09-06 2016-07-12 President And Fellows Of Harvard College Cas9-recombinase fusion proteins and uses thereof
CA2931989C (en) 2013-11-27 2023-04-04 Gen9, Inc. Libraries of nucleic acids and methods for making the same
US20170198268A1 (en) 2014-07-09 2017-07-13 Gen9, Inc. Compositions and Methods for Site-Directed DNA Nicking and Cleaving
EP3209777A4 (en) 2014-10-21 2018-07-04 Gen9, Inc. Methods for nucleic acid assembly

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998038299A1 (en) * 1997-02-27 1998-09-03 Gesher-Israel Advanced Biotecs (1996) Ltd. Single step assembly of multiple dna fragments
US20020025561A1 (en) * 2000-04-17 2002-02-28 Hodgson Clague Pitman Vectors for gene-self-assembly

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
PACHUK C J ET AL: "Chain reaction cloning: a one-step method for directional ligation of multiple DNA fragments" GENE, ELSEVIER, AMSTERDAM, NL, vol. 243, no. 1-2, February 2000 (2000-02), pages 19-25, XP004187670 ISSN: 0378-1119 *
STEMMER W P ET AL: "Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides" GENE, ELSEVIER, AMSTERDAM, NL, vol. 164, no. 1, 16 October 1995 (1995-10-16), pages 49-53, XP002301505 ISSN: 0378-1119 *
ZHOU XIAOCHUAN ET AL: "Microfluidic PicoArray synthesis of oligodeoxynucleotides and simultaneous assembling of multiple DNA sequences" NUCLEIC ACIDS RESEARCH, OXFORD UNIVERSITY PRESS, SURREY, GB, vol. 32, no. 18, 2004, pages 5409-5417, XP002393873 ISSN: 0305-1048 *

Cited By (64)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10202608B2 (en) 2006-08-31 2019-02-12 Gen9, Inc. Iterative nucleic acid assembly using activation of vector-encoded traits
US8999679B2 (en) 2008-12-18 2015-04-07 Iti Scotland Limited Method for assembly of polynucleic acid sequences
US9777305B2 (en) 2010-06-23 2017-10-03 Iti Scotland Limited Method for the assembly of a polynucleic acid sequence
US11084014B2 (en) 2010-11-12 2021-08-10 Gen9, Inc. Methods and devices for nucleic acids synthesis
US10982208B2 (en) 2010-11-12 2021-04-20 Gen9, Inc. Protein arrays and methods of using and making the same
US11845054B2 (en) 2010-11-12 2023-12-19 Gen9, Inc. Methods and devices for nucleic acids synthesis
US10457935B2 (en) 2010-11-12 2019-10-29 Gen9, Inc. Protein arrays and methods of using and making the same
US11702662B2 (en) 2011-08-26 2023-07-18 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids
EP2944693B1 (en) * 2011-08-26 2019-04-24 Gen9, Inc. Compositions and methods for high fidelity assembly of nucleic acids
US10308931B2 (en) 2012-03-21 2019-06-04 Gen9, Inc. Methods for screening proteins using DNA encoded chemical libraries as templates for enzyme catalysis
US10927369B2 (en) 2012-04-24 2021-02-23 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US10081807B2 (en) 2012-04-24 2018-09-25 Gen9, Inc. Methods for sorting nucleic acids and multiplexed preparative in vitro cloning
US11072789B2 (en) 2012-06-25 2021-07-27 Gen9, Inc. Methods for nucleic acid assembly and high throughput sequencing
US10773232B2 (en) 2013-08-05 2020-09-15 Twist Bioscience Corporation De novo synthesized gene libraries
US10618024B2 (en) 2013-08-05 2020-04-14 Twist Bioscience Corporation De novo synthesized gene libraries
US10384188B2 (en) 2013-08-05 2019-08-20 Twist Bioscience Corporation De novo synthesized gene libraries
US11559778B2 (en) 2013-08-05 2023-01-24 Twist Bioscience Corporation De novo synthesized gene libraries
US11185837B2 (en) 2013-08-05 2021-11-30 Twist Bioscience Corporation De novo synthesized gene libraries
US9555388B2 (en) 2013-08-05 2017-01-31 Twist Bioscience Corporation De novo synthesized gene libraries
US10583415B2 (en) 2013-08-05 2020-03-10 Twist Bioscience Corporation De novo synthesized gene libraries
US11452980B2 (en) 2013-08-05 2022-09-27 Twist Bioscience Corporation De novo synthesized gene libraries
US10632445B2 (en) 2013-08-05 2020-04-28 Twist Bioscience Corporation De novo synthesized gene libraries
US10639609B2 (en) 2013-08-05 2020-05-05 Twist Bioscience Corporation De novo synthesized gene libraries
US9833761B2 (en) 2013-08-05 2017-12-05 Twist Bioscience Corporation De novo synthesized gene libraries
US9889423B2 (en) 2013-08-05 2018-02-13 Twist Bioscience Corporation De novo synthesized gene libraries
US10272410B2 (en) 2013-08-05 2019-04-30 Twist Bioscience Corporation De novo synthesized gene libraries
US9839894B2 (en) 2013-08-05 2017-12-12 Twist Bioscience Corporation De novo synthesized gene libraries
US10669304B2 (en) 2015-02-04 2020-06-02 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
US11697668B2 (en) 2015-02-04 2023-07-11 Twist Bioscience Corporation Methods and devices for de novo oligonucleic acid assembly
US9677067B2 (en) 2015-02-04 2017-06-13 Twist Bioscience Corporation Compositions and methods for synthetic gene assembly
US9981239B2 (en) 2015-04-21 2018-05-29 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US10744477B2 (en) 2015-04-21 2020-08-18 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US11691118B2 (en) 2015-04-21 2023-07-04 Twist Bioscience Corporation Devices and methods for oligonucleic acid library synthesis
US10844373B2 (en) 2015-09-18 2020-11-24 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US11807956B2 (en) 2015-09-18 2023-11-07 Twist Bioscience Corporation Oligonucleic acid variant libraries and synthesis thereof
US11512347B2 (en) 2015-09-22 2022-11-29 Twist Bioscience Corporation Flexible substrates for nucleic acid synthesis
US10384189B2 (en) 2015-12-01 2019-08-20 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
US10987648B2 (en) 2015-12-01 2021-04-27 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
US9895673B2 (en) 2015-12-01 2018-02-20 Twist Bioscience Corporation Functionalized surfaces and preparation thereof
US10053688B2 (en) 2016-08-22 2018-08-21 Twist Bioscience Corporation De novo synthesized nucleic acid libraries
US10975372B2 (en) 2016-08-22 2021-04-13 Twist Bioscience Corporation De novo synthesized nucleic acid libraries
US10754994B2 (en) 2016-09-21 2020-08-25 Twist Bioscience Corporation Nucleic acid based data storage
US11263354B2 (en) 2016-09-21 2022-03-01 Twist Bioscience Corporation Nucleic acid based data storage
US10417457B2 (en) 2016-09-21 2019-09-17 Twist Bioscience Corporation Nucleic acid based data storage
US11562103B2 (en) 2016-09-21 2023-01-24 Twist Bioscience Corporation Nucleic acid based data storage
US12056264B2 (en) 2016-09-21 2024-08-06 Twist Bioscience Corporation Nucleic acid based data storage
US10907274B2 (en) 2016-12-16 2021-02-02 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US11550939B2 (en) 2017-02-22 2023-01-10 Twist Bioscience Corporation Nucleic acid based data storage using enzymatic bioencryption
US10894959B2 (en) 2017-03-15 2021-01-19 Twist Bioscience Corporation Variant libraries of the immunological synapse and synthesis thereof
US11332740B2 (en) 2017-06-12 2022-05-17 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US10696965B2 (en) 2017-06-12 2020-06-30 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US11377676B2 (en) 2017-06-12 2022-07-05 Twist Bioscience Corporation Methods for seamless nucleic acid assembly
US11407837B2 (en) 2017-09-11 2022-08-09 Twist Bioscience Corporation GPCR binding proteins and synthesis thereof
US10894242B2 (en) 2017-10-20 2021-01-19 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
US11745159B2 (en) 2017-10-20 2023-09-05 Twist Bioscience Corporation Heated nanowells for polynucleotide synthesis
US10936953B2 (en) 2018-01-04 2021-03-02 Twist Bioscience Corporation DNA-based digital information storage with sidewall electrodes
US12086722B2 (en) 2018-01-04 2024-09-10 Twist Bioscience Corporation DNA-based digital information storage with sidewall electrodes
US11492665B2 (en) 2018-05-18 2022-11-08 Twist Bioscience Corporation Polynucleotides, reagents, and methods for nucleic acid hybridization
US11732294B2 (en) 2018-05-18 2023-08-22 Twist Bioscience Corporation Polynucleotides, reagents, and methods for nucleic acid hybridization
US11492727B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for GLP1 receptor
US11492728B2 (en) 2019-02-26 2022-11-08 Twist Bioscience Corporation Variant nucleic acid libraries for antibody optimization
WO2020201434A1 (en) * 2019-04-02 2020-10-08 Oxford University Innovation Limited Universal dna assembly
US11332738B2 (en) 2019-06-21 2022-05-17 Twist Bioscience Corporation Barcode-based nucleic acid sequence assembly
US12091777B2 (en) 2019-09-23 2024-09-17 Twist Bioscience Corporation Variant nucleic acid libraries for CRTH2

Also Published As

Publication number Publication date
US10202608B2 (en) 2019-02-12
US8053191B2 (en) 2011-11-08
WO2008027558A3 (en) 2008-04-24
US20200231976A1 (en) 2020-07-23
US20090155858A1 (en) 2009-06-18
US20120270754A1 (en) 2012-10-25

Similar Documents

Publication Publication Date Title
US20200231976A1 (en) Iterative nucleic acid assembly using activation of vector-encoded traits
US11702662B2 (en) Compositions and methods for high fidelity assembly of nucleic acids
US20090087840A1 (en) Combined extension and ligation for nucleic acid assembly
US11408020B2 (en) Methods for in vitro joining and combinatorial assembly of nucleic acid molecules
US20210395724A1 (en) Methods for nucleic acid assembly and high throughput sequencing
WO2008054543A2 (en) Oligonucleotides for multiplex nucleic acid assembly
US20070231805A1 (en) Nucleic acid assembly optimization using clamped mismatch binding proteins
US20150203839A1 (en) Compositions and Methods for High Fidelity Assembly of Nucleic Acids
WO2007120624A2 (en) Concerted nucleic acid assembly reactions
WO2007123742A2 (en) Methods and compositions for increasing the fidelity of multiplex nucleic acid assembly
WO2007136835A2 (en) Methods and cells for creating functional diversity and uses thereof
CA3213037A1 (en) Blocking oligonucleotides for the selective depletion of non-desirable fragments from amplified libraries

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 07837632; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase (Ref country code: DE)
NENP Non-entry into the national phase (Ref country code: RU)
122 EP: PCT application non-entry in European phase (Ref document number: 07837632; Country of ref document: EP; Kind code of ref document: A2)