WO2023056379A2

WO2023056379A2 - Sorting of oligonucleotide-directed combinatorial libraries

Info

Publication number: WO2023056379A2
Application number: PCT/US2022/077291
Authority: WO
Inventors: Richard Edward Watts; Divya KANICHAR
Original assignee: Insitro, Inc.
Priority date: 2021-09-30
Filing date: 2022-09-29
Publication date: 2023-04-06
Also published as: WO2023056379A3

Abstract

The present disclosure relates to the electrophoretic sorting of oligonucleotides. The sorted oligonucleotides may be serially enriched and/or used for the synthesis of encoded molecules.

Description

SORTING OF OLIGONUCLEOTIDE-DIRECTED COMBINATORIAL LIBRARIES

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority benefit to U.S. Provisional Application No. 63/250,689, filed on September 30, 2021, the contents of which are hereby incorporated by reference in their entirety for all purposes.

FIELD OF THE INVENTION

[0002] The present disclosure relates to systems for sorting oligonucleotides and related methods.

BACKGROUND OF THE INVENTION

[0003] The field of combinatorial chemistry has made it possible to prepare a large number of compounds in a single process. These combinatorial libraries are synthesized from successive chemical subunits (e.g., building blocks) that can be assembled on nucleic acids encoding the addition of these chemical subunits. The resulting library compounds may be tested for possession of desired properties (including, for example, binding to a target molecule). Despite the success of many of these methods, existing methods of library synthesis have difficulty sorting libraries of oligonucleotides, resulting in slow partitioning of the library before synthesis of the encoded molecules. Thus, there is a need in the art for improved systems and methods for sorting of libraries to increase DNA encoded library synthesis efficiency and yield.

SUMMARY OF THE INVENTION

[0004] Described herein are methods of sorting nucleic acids. The sorted nucleic acids may be used to synthesize encoded molecules. The provided methods may improve the sorting of DNA encoded libraries to ultimately increase DNA encoded library synthesis efficiency and yield.

[0005] In some aspects, provided herein is a method of sorting a plurality of oligonucleotides G, wherein each oligonucleotide G comprises a plurality of codons and a reactive site, the method comprising: (a) providing a system comprising a hybridization array, wherein the hybridization array comprises a first feature and a second feature, wherein the first and second features are electrophoretically coupled in series by an aqueous separation medium; (b) loading the separation medium at a position upstream of the first and second features with the plurality of oligonucleotides G; (c) applying an electric current across the system for at least a time sufficient to migrate a first portion of the plurality of oligonucleotides G to the first feature and a second portion of the plurality of oligonucleotides G to the second feature; wherein the first feature comprises a multiplicity of first capture oligonucleotides and the second feature comprises a multiplicity of second capture oligonucleotides; wherein the first portion of the plurality of oligonucleotides G comprise a codon which specifically hybridizes with the first capture oligonucleotides; wherein the second portion of the plurality of oligonucleotides G comprise a codon which specifically hybridizes with the second capture oligonucleotides.

[0006] In some embodiments according to the method described above, the electric current is applied for at least a time sufficient to migrate a third portion of the plurality of oligonucleotides G to a position downstream of the first and second features; wherein the third portion of the plurality of oligonucleotides G does not comprise codons which can specifically hybridize with either of the first or second capture oligonucleotides. In some embodiments, (a) the system comprises a third feature, wherein the first, the second, and the third features are electrophoretically coupled in series by the aqueous separation medium; (b) wherein the electric current is applied for at least a time sufficient to migrate the third portion of the plurality of oligonucleotides G to the third feature; wherein the third feature comprises a multiplicity of third capture oligonucleotides; wherein the third portion of the plurality of oligonucleotides G comprise a codon which specifically hybridizes with the third capture oligonucleotides.

[0007] In some embodiments according to any of the methods described above, the electric current is applied for at least a time sufficient to migrate a fourth portion of the plurality of oligonucleotides G to a position downstream of the first, the second, and the third features; wherein the fourth portion of the plurality of oligonucleotides G does not comprise codons which can specifically hybridize with any of the first, the second, or the third capture oligonucleotides. In some embodiments, (a) the system comprises a fourth feature, wherein the first, the second, the third, and the fourth features are electrophoretically coupled in series by the aqueous separation medium; (b) wherein the electric current is applied for at least a time sufficient to migrate the fourth portion of the plurality of oligonucleotides G to the fourth feature; wherein the fourth feature comprises a plurality of fourth capture oligonucleotides; wherein the fourth portion of the plurality of oligonucleotides G comprise a codon which specifically hybridizes with the fourth capture oligonucleotides. In some embodiments, the electric current is applied for at least a time sufficient to migrate a fifth portion of the plurality of oligonucleotides G to a position downstream of the first, the second, the third, and the fourth features; wherein the fifth portion of the plurality of oligonucleotides G does not comprise codons which can specifically hybridize with any of the first, the second, the third, or the fourth capture oligonucleotides.

[0008] In some embodiments according to any of the methods described above, the method further comprises separating a feature from the system. In some embodiments, the method further comprises eluting a portion of the plurality of oligonucleotides G from a feature. In some embodiments, the eluted portion is eluted from the separated feature.

[0009] In some aspects, provided herein is a method of synthesizing an encoded molecule, the method comprising: (a) introducing a charged positional building block to a hybridization array comprising a feature, wherein the feature comprises a portion of oligonucleotides G after sorting according to the method of any one of claims 1-7; (b) reacting the reactive site present on the portion of the oligonucleotides G with the charged positional building block to form a covalent bond between the reactive site and the charged positional building block.

[0010] In some aspects, provided herein is a method of synthesizing an encoded molecule, the method comprising: (a) immobilizing the eluted portion of the plurality of oligonucleotides G of claim 8 or claim 9 on an immobilization array; (b) introducing a positional building block comprising a reactive site to the immobilization array; (c) reacting the immobilized serially enriched plurality of oligonucleotides G with the reactive site of the positional building block to form a covalent bond between the reactive site of the oligonucleotide G and the reactive site of the positional building block. In some embodiments, the immobilization array is an ion exchange resin.

[0011] In some aspects, provided herein is a method of serially enriching a plurality of oligonucleotides G, the method comprising: (a) obtaining a first set of oligonucleotides G that is the eluted portion of the plurality of oligonucleotides G according to claim 8 or claim 9; (b) repeating the method of claim 8 or claim 9 at least once using the first set of oligonucleotides G to obtain a serially enriched plurality of oligonucleotides G. In some embodiments, the method of claim 8 is repeated at least twice to obtain the serially enriched plurality of oligonucleotides G. In some embodiments, the eluted portion of the plurality of oligonucleotides G was sorted by hybridization of a first codon to a first capture oligonucleotide and the serially enriched plurality of oligonucleotides G was sorted at least by hybridization of a second codon to a second capture oligonucleotide, wherein the first and second codons are different.

[0012] In some aspects, provided herein is a method of synthesizing an encoded molecule, the method comprising: (a) immobilizing the serially enriched plurality of oligonucleotides G of any one of claims 13-15 on an immobilization array; (b) introducing a positional building block comprising a reactive site to the immobilization array; (c) reacting the immobilized serially enriched plurality of oligonucleotides G with the reactive site of the positional building block to form a covalent bond between the reactive site of the oligonucleotide G and the reactive site of the positional building block. In some embodiments, the immobilization array is an ion exchange resin.

[0013] In some embodiments according to any of the methods of synthesizing an encoded molecule described herein, the building block is not a nucleic acid or nucleic acid analog.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] Representative embodiments of the invention are disclosed by reference to the following figures. It should be understood that the embodiments depicted are not limited to the precise details shown.

[0015] FIG. 1 shows an exemplary system for electrophoretic routing of an oligonucleotide based on sequence-specific hybridization.

[0016] FIGs. 2A-2B show exemplary electrophoretic routing of oligonucleotides based on sequence-specific hybridization of an oligonucleotide G. FIG. 2A illustrates the sequence-specific hybridization of a codon of a first oligonucleotide to a capture oligonucleotide on a first feature of a system; the first oligonucleotide is retained on the first feature. FIG. 2B shows the sequence-specific hybridization of a codon of a second oligonucleotide to a capture oligonucleotide on a second feature of the system; the second oligonucleotide is retained on the second hybridization array.

[0017] FIG. 3 shows an exemplary method of synthesizing an encoded molecule on a hybridization array.

[0018] FIG. 4 shows an exemplary method of synthesizing an encoded molecule on an immobilization array.

DETAILED DESCRIPTION OF THE INVENTION

[0019] In some aspects, provided herein are methods of sorting a plurality of oligonucleotides G. The sorting methods described herein allow for increased speed of synthesis and greater synthetic yield of encoded molecules (e.g., molecules which comprise building blocks encoded by an oligonucleotide G) formed from the plurality of oligonucleotides G. Several features can be electrophoretically coupled in series by an aqueous separation medium to form a system. The plurality of oligonucleotides G, each comprising a plurality of codons (of a plurality of identities) and optionally a reactive site (such as a first building block comprising the reactive site), may then be loaded to the system. An electrical current is then applied across the system for at least a time sufficient to migrate a first portion of the plurality of oligonucleotides G to a first feature comprising capture oligonucleotides. A portion of the plurality of oligonucleotides G with codons complementary to the capture oligonucleotides on a particular feature hybridize with the capture oligonucleotides. The portion of the plurality of oligonucleotides G hybridized with the capture oligonucleotides may be eluted from the feature and separated, and used to produce a library of encoded molecules. Pools of these encoded molecules may be screened downstream for binding to targets. These methods of electrophoretic migration are faster than traditional diffusion techniques.

[0020] FIG. 1 shows an exemplary system 100 for sorting molecules (such as oligonucleotides) based on sequence-specific hybridization of oligonucleotides comprising codons to capture oligonucleotides on a feature (e.g., features 103, 105, 106, 107, or 108) of a hybridization array. A plurality of oligonucleotides G each comprise a plurality of codons and a reactive site, and are loaded to the system at container 101 in an aqueous separation medium 104. The exemplary system of FIG. 1 comprises a hybridization array, wherein a hybridization array is comprised of various features, such as features 103, 105, 106, 107, and 108 (i.e., a first feature 103, a second feature 105, a third feature 106, a fourth feature 107, and a fifth feature 108) that are electrophoretically coupled in series by the aqueous separation medium 104. Any suitable number of features may be selected, depending on the number of codons to be sorted by in one system. The system may be configured such that each component of the system (e.g., container 101 and/or container 102, and/or features 103, 105, 106, 107, and 108 of a hybridization array) may be separated from the system. An electric current may be applied across the system 100 to migrate a first portion of the plurality of oligonucleotides G from the container 101 to the first feature 103, a second portion of the plurality of oligonucleotides G to the second feature 105, a third portion of the plurality of oligonucleotides G to the third feature 106, a fourth portion of the plurality of oligonucleotides G to the fourth feature 107, and a fifth portion of the plurality of oligonucleotides G to the fifth feature 108. In this way, the starting plurality of oligonucleotides G is divided into portions of oligonucleotides G based on the sequence-specific hybridization with each feature of a hybridization array. Further application of the electric current may migrate a portion of the original plurality of oligonucleotides G beyond the final feature. This final portion would comprise codons that do not efficiently hybridize with any of the capture oligonucleotides on any feature, for example.

[0021] The capture of the oligonucleotides G by capture oligonucleotides of the hybridization array may not be 100% efficient. For example, although the capture oligonucleotides of a feature efficiently capture complementary codons of oligonucleotides G, it is possible that the entire portion of oligonucleotides G may not be completely captured by the feature. Therefore, some oligonucleotides G (e.g., a final portion of oligonucleotides) may pass through beyond the final feature of a hybridization array.
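The effect of incomplete capture can be sketched numerically. The following is a minimal illustration only, assuming (hypothetically) that each sorting round captures the same fixed fraction of matching oligonucleotides G and that rounds are independent; the function names and the example efficiency value are not taken from the disclosure.

```python
def flow_through_fraction(capture_efficiency: float) -> float:
    """Fraction of matching oligonucleotides that pass a feature uncaptured."""
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must be between 0 and 1")
    return 1.0 - capture_efficiency


def recovered_fraction(capture_efficiency: float, rounds: int) -> float:
    """Fraction of matching oligonucleotides still retained after a number
    of serial sorting rounds, assuming the same (hypothetical) capture
    efficiency in every round and no other losses."""
    if not 0.0 <= capture_efficiency <= 1.0:
        raise ValueError("capture_efficiency must be between 0 and 1")
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    return capture_efficiency ** rounds


if __name__ == "__main__":
    efficiency = 0.9  # hypothetical value chosen for illustration only
    for n in range(1, 4):
        print(n, recovered_fraction(efficiency, n))
```

Under these assumptions the retained fraction falls geometrically with the number of rounds, which is one reason a portion of oligonucleotides G may be found beyond the final feature.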

[0022] As illustrated in FIG. 1, the electrophoretic migration of oligonucleotides G across the system is directional, and, in this example, the oligonucleotides G move from the negative side of the system near container 101 to the positive side of the system near container 102 across the hybridization array, based on the net charge of the oligonucleotides G. The features comprise a multiplicity of capture oligonucleotides (i.e., the same capture oligonucleotides), which are configured to specifically hybridize to a codon of a portion of the plurality of oligonucleotides G, thereby sorting the plurality of oligonucleotides G. To increase the interaction time of the oligonucleotides G with the feature, the polarity of the electric current may be temporarily reversed or the electric current may be temporarily turned off.

[0023] After binding a portion of the original plurality of oligonucleotides G to a feature, the feature may be removed to elute the portion or the portion may be eluted from the feature while still connected to the system. The eluted portion of oligonucleotides G may then be serially enriched/sorted, by repeating the method of sorting a plurality of oligonucleotides G described herein (e.g., adding the eluted portion of oligonucleotides G to another system 100 as in FIG. 1, applying an electrical current across the system to migrate the oligonucleotides G to a feature, and capturing a portion of the eluted portion of oligonucleotides G on a feature), based on hybridization with another codon. To serially enrich, the features in subsequent sorting steps would typically have different capture oligonucleotides to sort the portion of oligonucleotides G into further sub-portions. For example, the oligonucleotide G will typically comprise a plurality of codons, such as 2 to about 20 codons, and the system may be used to sort a plurality of oligonucleotides based on the presence of particular codons. The plurality of oligonucleotides G may be sorted through a system once based on one codon to yield a portion of the plurality of oligonucleotides G. This portion may then be further sorted in a second system based on other codons.

[0024] As a clarifying example, a plurality of oligonucleotides G may comprise codons arbitrarily designated A, B, C, D, E, F, G, H, I, and J. Each of codons A-J may have a different sequence and specifically hybridize with a different feature. In this illustrative example, assume the plurality of oligonucleotides G comprise oligonucleotides which each have 5 codons selected randomly from codons A-J. For example, if the plurality of oligonucleotides contains four oligonucleotides as follows:

(i) one oligonucleotide G contains codons A, B, C, E, and G;

(ii) another oligonucleotide G contains codons A, B, C, D, and J;

(iii) yet another oligonucleotide G contains codons A, G, H, I, and J; and

(iv) yet another oligonucleotide G contains codons B, C, D, E, F, and G.

In this example, a user may configure a system, as in system 100 of FIG. 1, where the first feature 103 contains capture oligonucleotides which specifically hybridize with codon A (and do not efficiently hybridize with codons B, C, D, E, F, G, H, I, and J). After applying the electric current, the first three exemplary oligonucleotides (i), (ii), and (iii) would be efficiently captured (meaning a substantial portion of the oligonucleotides are captured). Because the efficiency of capture may not be exactly 100%, a portion of oligonucleotide (i), (ii), and (iii) may flow through and not be captured. The oligonucleotide G, (iv), containing codons B, C, D, E, F, and G would not be efficiently captured, and continued application of the electric current would cause this oligonucleotide (iv) to pass the first feature 103. The first feature 103 may then be separated from the system and the portion of oligonucleotides G (containing oligonucleotides each having codon A) may then be eluted. Alternatively, the portion of oligonucleotides G (containing oligonucleotides each having codon A) may be eluted directly from feature 103 while still attached to the system. The eluted portion may then be added to another system, such as a system 100 according to FIG. 1, with different features to sort based on other codons than codon A. For example, this system may contain:

(a) a first feature 103 containing capture oligonucleotides specific for codon E;

(b) a second feature 105 containing capture oligonucleotides specific for codon D; and

(c) a third feature 106 containing capture oligonucleotides specific for codon I, as in FIG. 1.

In this example, exemplary oligonucleotides (i), (ii), (iii), and (iv) would be efficiently separated (i.e., sorted) after two rounds. The system, and any subsequent system, may be configured with particular features based on the desired sorting and specific codons in the plurality of oligonucleotides G.
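The two-round example above can be traced with a short simulation. This is a sketch under simplifying assumptions, not the disclosed method itself: each oligonucleotide G is reduced to a set of codon labels, capture is treated as 100% efficient, and an oligonucleotide is retained by the first feature in the series whose codon it carries. The names `LIBRARY` and `sort_through_system` are hypothetical.

```python
# Codon content of the four exemplary oligonucleotides (i)-(iv).
LIBRARY = {
    "i": {"A", "B", "C", "E", "G"},
    "ii": {"A", "B", "C", "D", "J"},
    "iii": {"A", "G", "H", "I", "J"},
    "iv": {"B", "C", "D", "E", "F", "G"},
}


def sort_through_system(oligos, feature_codons):
    """Route oligonucleotides through features coupled in series.

    oligos: dict mapping an oligonucleotide name to its set of codons.
    feature_codons: list of codons, one per feature, in upstream-to-
        downstream order.
    Returns (captured, flow_through): captured maps each feature codon to
    the names retained on that feature; flow_through lists names that
    passed every feature. Capture is idealized as 100% efficient.
    """
    captured = {codon: [] for codon in feature_codons}
    flow_through = []
    for name, codons in oligos.items():
        for codon in feature_codons:
            if codon in codons:
                captured[codon].append(name)
                break  # retained here; does not migrate further
        else:
            flow_through.append(name)
    return captured, flow_through


# Round 1: a single feature specific for codon A.
round1, passed1 = sort_through_system(LIBRARY, ["A"])

# Round 2: the codon A eluate is sorted on features for codons E, D, and I.
eluate = {name: LIBRARY[name] for name in round1["A"]}
round2, passed2 = sort_through_system(eluate, ["E", "D", "I"])

print(round1, passed1)
print(round2, passed2)
```

In this idealized run, oligonucleotide (iv) passes the codon A feature in the first round, and (i), (ii), and (iii) end up on the codon E, D, and I features respectively in the second round.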

[0025] As a further clarifying example, a plurality of oligonucleotides G may comprise different coding regions arbitrarily designated A, Bi, Ci, Bf, and Cf wherein: (i) a plurality of different codon sequences can be used in each of the different coding regions (i.e., A, Bi, Ci, Bf, and Cf); and, (ii) the set of sequences chosen to be codons at a coding region is unique to that region. For example, the set of codons used at coding region A may be unique to coding region A and not used in any other coding region (e.g., the set of codons will not be present in coding regions Bi, Ci, Bf, and Cf). Each of the codons used in a particular oligonucleotide G may have a different sequence, and specifically hybridizes with different capture oligonucleotides on a different feature of a hybridization array. In an illustrative example, the plurality of oligonucleotides G comprises oligonucleotides wherein there are, for example, 1152 unique codons in the A set, 32 unique codons in the Bi set, 32 codons in the Ci set, 192 codons in the Bf set, and 192 codons in the Cf set. The sets of codons can be arbitrarily designated using the name of the coding region and a number for the unique codon. For example, the A codon set can comprise codons designated A0001 through A1152, the Bi set can comprise codons Bi01 through Bi32, etc. If all the codons are used, and if all are assembled combinatorially, then the number of different sequences possible may be calculated, and totals approximately 4.349 x 10¹⁰ sequences (e.g., 1152 (codons of the A set) x 32 (codons of the Bi set) x 32 (codons of the Ci set) x 192 (codons of the Bf set) x 192 (codons of the Cf set) = approximately 4.349 x 10¹⁰ sequences).
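The combinatorial count stated above can be checked directly. The snippet below is only an arithmetic check using the codon set sizes given in the example; the variable names are illustrative.

```python
from math import prod

# Number of unique codons per coding region, as given in the example.
codon_set_sizes = {"A": 1152, "Bi": 32, "Ci": 32, "Bf": 192, "Cf": 192}

# Full combinatorial assembly uses one codon from every coding region.
library_size = prod(codon_set_sizes.values())

print(f"{library_size:,}")    # 43,486,543,872
print(f"{library_size:.3e}")  # 4.349e+10
```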

[0026] Similarly, hybridization arrays could be constructed in which capture oligonucleotides complementary to each of the codons are immobilized on a solid support, and all the capture oligonucleotides comprising a coding region can be assembled into separate features of a single hybridization array. In an illustrative example, the Bi hybridization array would have 32 features, and the Cf hybridization array would have 192 features, etc. In this illustrative example, a user may construct a system in which a pool of several billion oligonucleotides G are sorted into pools on the basis of codon sequences. The user could construct a system in which the pool of oligonucleotides G is migrated by electrophoresis through all 32 features of the Bi hybridization array. When an electric current is applied, all the oligonucleotides G will begin to traverse the system, and the oligonucleotides possessing a Bi01 codon would be captured on the feature possessing the Bi01-complement capture oligonucleotides; all those not possessing the Bi01 codon would not be efficiently captured, and would move to the next feature of the hybridization array. If the next feature of the hybridization array has the Bi02-complement capture oligonucleotides, then oligonucleotides G possessing the Bi02 codon can be efficiently captured on this second feature. The pool of oligonucleotides G captured on the Bi01-complement feature could be eluted from the corresponding feature. This Bi01 pool could be then sorted into 192 sub-pools on a Bf array. All 32 of the Bi pools could be eluted from the 32 features of the Bi hybridization array and sorted on 32 copies of the 192 feature Bf hybridization array. These sub-pools can all be eluted independently. This would produce a total of 6,144 sub-pools (e.g., 32 sub-pools of a Bi hybridization array x 192 sub-pools of a Bf hybridization array). Similarly, a group of Bi elutions could be combined and sorted on the same Bf hybridization array. For example, the Bi01 through Bi04 elutions could be pooled and sorted on one Bf hybridization array; Bi05 through Bi08 could be sorted on a second Bf hybridization array, etc.
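The sub-pool bookkeeping in this example can be sketched as follows. The codon labels follow the Bi01 through Bi32 designation used above; the group size of four and its extension to all 32 Bi elutions are assumptions extrapolated from the "etc." in the text, and the variable names are hypothetical.

```python
# Labels for the 32 Bi codons (Bi01 ... Bi32) and the size of the Bf array.
bi_codons = [f"Bi{n:02d}" for n in range(1, 33)]
bf_features = 192

# Case 1: every Bi elution is sorted on its own copy of the Bf array.
independent_subpools = len(bi_codons) * bf_features

# Case 2 (assumed grouping): Bi elutions are pooled four at a time, and
# each pooled group is sorted on one Bf array.
group_size = 4
bi_groups = [
    bi_codons[i:i + group_size]
    for i in range(0, len(bi_codons), group_size)
]
pooled_subpools = len(bi_groups) * bf_features

print(independent_subpools)  # 6144
print(bi_groups[0])          # ['Bi01', 'Bi02', 'Bi03', 'Bi04']
print(bi_groups[1])          # ['Bi05', 'Bi06', 'Bi07', 'Bi08']
print(len(bi_groups), pooled_subpools)  # 8 1536
```

Pooling reduces the number of Bf arrays needed (8 rather than 32 under this assumed grouping), at the cost of each resulting sub-pool containing a mixture of four Bi codons.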

[0027] FIG. 2A illustrates a plurality of oligonucleotides G comprising a first oligonucleotide 203 and a second oligonucleotide 204 to be sorted using a method described herein, each comprising a coding region that comprises a plurality of codons (such as codons 201a, 201b, 201c, 201d, and 201e of the first oligonucleotide 203). An oligonucleotide G (e.g., 203 or 204) may comprise additional non-coding regions (not shown), which typically intersperse/separate the codons. The oligonucleotide G further comprises a reactive site 202. The reactive site comprises a chemical structural unit (e.g., a first building block) that is capable of being chemically linked to other chemical structural units (e.g., other building blocks) in sequence to form an encoded region. The building block of the reactive site 202 may be linked to the oligonucleotide G 203 or 204 via a linker 210, such as a peptide linker or an alkyl chain linker.

[0028] When a plurality of oligonucleotides G are loaded onto a system, such as the system 100 of FIG. 1, an electric current may be applied across the system to migrate the first oligonucleotide 203 and second oligonucleotide 204 (of FIG. 2A) of the plurality of oligonucleotides to a first feature 205. The first feature 205 comprises a multiplicity of first capture oligonucleotides 207, which are capable of hybridizing with a codon (in this example, codon 201e).

[0029] The first oligonucleotide 203 contains a codon (codon 201e in this example) that specifically hybridizes with a capture oligonucleotide of the multiplicity of capture oligonucleotides 207 on the first feature 205. The second oligonucleotide 204 does not contain a codon capable of efficiently hybridizing with the multiplicity of capture oligonucleotides 207, so it is not bound to the first feature 205, and thus remains free to continue migrating in the aqueous separation medium.

[0030] As shown in FIG. 2B, which continues the example of FIG. 2A, the electrical current may further be applied across the system to migrate the second oligonucleotide 204 to the second feature 208. The second feature 208, in this example, comprises a multiplicity of second capture oligonucleotides 209. The multiplicity of second capture oligonucleotides are capable of specifically hybridizing with a codon of the second oligonucleotide 204. Because the first oligonucleotide 203 is immobilized on the first feature 205, it does not migrate further through the system. In other words, the first oligonucleotide 203 and the second oligonucleotide 204 are sorted and immobilized to corresponding features of the hybridization array based on sequence specific hybridization between particular codons and particular capture oligonucleotides.

[0031] The first feature and the second feature are electrophoretically coupled with other features of the hybridization array in the system. The features may also be configured such that they can each be separated from the system. When a feature is removed from the system, the captured portion of oligonucleotides G (e.g., the portion of oligonucleotides G comprising a codon that specifically hybridizes with the capture oligonucleotides of the feature) are likewise removed from the system. The captured portion of the plurality of oligonucleotides G can be eluted from a feature of the system.

[0032] The sorted portion of oligonucleotides G may be used to synthesize an encoded molecule at the encoded region of G. An encoded molecule can be synthesized on the feature itself or after immobilization of the oligonucleotide G to an immobilization array. FIG. 3 shows an exemplary method of synthesizing an encoded molecule from a sorted oligonucleotide G 301 that has been sorted according to the methods of sorting a plurality of oligonucleotides G described herein, on a feature 303. The oligonucleotide G 301 is immobilized on the feature 303 by hybridization of a codon with the multiplicity of capture oligonucleotides 302. A charged positional building block 305 is added to the feature 303. The charged positional building block 305 comprises a reactive site 306 and an anti-codon 307. The charged positional building block 305 is capable of hybridizing (through the anti-codon 307) with a codon of the plurality of codons of the oligonucleotide G 301 that is specifically hybridized to feature 303. The reactive site 308 of oligonucleotide G 301 reacts with the reactive site 306 of the charged positional building block 305 to form a covalent bond between the reactive site 308 and the charged positional building block 305. After the reaction, the anti-codon 307 may be removed, leaving the oligonucleotide G 301 comprising two building blocks (a first building block and a positional building block; see right side of FIG. 3).

[0033] Alternatively, a charged positional building block may be added in solution or a free building block may be added to react with the oligonucleotide G that is immobilized on a feature to form a covalent bond at the encoded region. In other examples, the oligonucleotide that is immobilized on a feature may be eluted or otherwise separated from the feature. A charged positional building block may then be added in solution or a free building block may be added to react with the oligonucleotide G to form a covalent bond at the encoded region.

[0034] In a further alternative, an encoded molecule can be synthesized while G is immobilized on an immobilization array. The eluted portion of the plurality of oligonucleotides G (i.e., eluted from a feature, following sorting using any of the methods of sorting a plurality of oligonucleotides G described herein) are immobilized on an immobilization array. As shown in FIG. 4, an oligonucleotide G 401 (which was sorted by the methods described herein, and then eluted from a feature) is immobilized on immobilization array 403. The immobilization array comprises a solid support, such as the ion exchange resin 402. The ion exchange resin can be an anion exchange resin that binds with the eluted portion of the plurality of oligonucleotides G (e.g., the positively charged anion exchange resin binds with the negatively charged DNA backbone of the sorted portion of the oligonucleotide G). A building block 405 is added to the immobilization array 403. The building block comprises a reactive site 406. The reactive site 407 of the oligonucleotide G 401 reacts with the reactive site 406 of the building block 405 to form a covalent bond between the reactive site 407 and the reactive site 406, leaving the oligonucleotide G 401 comprising two building blocks (a first building block and a positional building block; see right side of FIG. 4).

[0035] The synthesized encoded molecule corresponds to and may be identified by the coding region of the serially enriched oligonucleotide G. The encoded molecules may be subjected to downstream analysis for selection of encoded molecules possessing specific properties (e.g., binding to a particular target molecule). The coding region of the encoded molecules selected for said properties can be PCR amplified and sequenced to determine the identity of the building blocks of the encoded molecules.

Definitions

[0036] As used herein, the singular forms “a,” “an,” and “the” include the plural references unless the context clearly dictates otherwise.

[0037] Reference to “about” a value or parameter herein includes (and describes) variations that are directed to that value or parameter per se. For example, description referring to “about X” includes description of “X”.

[0038] It is understood that aspects and variations of the invention described herein include “consisting” and/or “consisting essentially of” aspects and variations.

[0039] Unless otherwise noted, the terms “hybridize,” “hybridizing,” and “hybridized” include Watson-Crick base pairing, which includes guanine-cytosine and adenine-thymine (G-C and A-T) pairing for DNA and guanine-cytosine and adenine-uracil (G-C and A-U) pairing for RNA. These terms are used in the context of the selective recognition of a strand of nucleotides for a complementary strand of nucleotides, called a codon, which is complementary and hybridizes to a capture oligonucleotide on a feature.

[0040] The terms “end” and “terminus”, in the context of describing the position of a feature of the nucleic acids described herein, are used synonymously to mean a position that is near the absolute end or absolute terminus of a linear nucleic acid molecule. For example, an initial building block linked to any one of the 20 nucleic acids at the 5’ end of a nucleic acid may be described as being at a position at the “5’ end” or “5’ terminus” of the nucleic acid.

[0041] An “encoded molecule” is synthesized from the eluted portion of the plurality of oligonucleotides G. The encoded molecule is formed when the reactive site present on an oligonucleotide G reacts with a charged positional building block to form a covalent bond between the reactive site and the charged positional building block. The charged positional building block comprises any of the building blocks described herein.

[0042] The “encoded region” of an encoded molecule refers to the portion of the molecule that comprises one or more building blocks.

[0043] As used herein, the terms “upstream” and “downstream” are used to refer to relative positions of features, such as a first feature and a second feature, in a system. “Upstream” indicates that said feature of a system is positioned before another feature of the system in series, and “downstream” indicates that said feature of a system is positioned after another feature of the system in series.

[0044] The term “coding region” is used to describe a region of an oligonucleotide G that is used to identify the building blocks of the encoded molecule. For example, the coding region may be an oligonucleotide comprising a plurality of codons that encodes and directs the synthesis of an encoded molecule, wherein the coding region determines which charged positional building blocks comprising anti-codons may hybridize to a codon of the coding region of oligonucleotide G, thereby synthesizing an encoded molecule.

[0045] As used herein, a “plurality” of x means two or more of x. As used herein, a “multiplicity” of x means a plurality of x wherein each x is identical.

[0046] When a range of values is provided, it is to be understood that each intervening value between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the scope of the present disclosure. Where the stated range includes upper or lower limits, ranges excluding either of those included limits are also included in the present disclosure.

[0047] The section headings used herein are for organization purposes only and are not to be construed as limiting the subject matter described. The description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the described embodiments will be readily apparent to those persons skilled in the art and the generic principles herein may be applied to other embodiments. Thus, the present invention is not intended to be limited to the embodiment shown but is to be accorded the widest scope consistent with the principles and features described herein.

[0048] The disclosures of all publications, patents, and patent applications referred to herein are each hereby incorporated by reference in their entireties. To the extent that any reference incorporated by reference conflicts with the instant disclosure, the instant disclosure shall control.

[0049] All publications, comprising patent documents, scientific articles and databases, referred to in this application are incorporated by reference in their entirety for all purposes to the same extent as if each individual publication were individually incorporated by reference. If a definition set forth herein is contrary to or otherwise inconsistent with a definition set forth in the patents, applications, published applications and other publications that are herein incorporated by reference, the definition set forth herein prevails over the definition that is incorporated herein by reference. The section headings used herein are for organizational purposes only and are not to be construed as limiting the subject matter described.

Methods of routing/sorting a plurality of oligonucleotides

[0050] Combinatorial chemistry using DNA-directed synthesis relies on the addition of building blocks to encoded molecules under the direction of codons. Oligonucleotides used to encode encoded molecules are typically prepared initially as a complex library of oligonucleotides (each an instance of “G”) which include portions comprising different combinations of codons. Each codon is designed to direct the addition of a building block to the molecule.

[0051] Typically, capture oligonucleotides are immobilized on different pools of beads. The differently labeled beads (comprising capture oligonucleotides) are then positioned in a spatially addressable array (e.g., hybridization array). The complex mixture of oligonucleotides (i.e., the library of oligonucleotides, each an instance of G) is then added to the array. Through simple mixing and diffusion, the oligonucleotides then, by chance, find the bead containing appropriate capture oligonucleotides for pull-down of an oligonucleotide containing an appropriate codon (which hybridizes with the capture oligonucleotide). This process is slow and inefficient, and often involves hours of mixing and diffusion time. Once a sufficient number of oligonucleotides have been sorted to their respective capture oligonucleotides, they may be further sorted based on another codon and/or used to synthesize an encoded molecule under the direction of the codons. In the synthesis process, building blocks are added sequentially to the encoded portion of the molecule.

[0052] The present application relates, in general, to improved methods of sorting oligonucleotides (each according to “G”) to their respective capture oligonucleotides. The methods use a system comprising one or more features and include a step of applying an electric current across the system to migrate a plurality of oligonucleotides G to various features of a hybridization array. The methods more efficiently direct the oligonucleotides G to their respective capture oligonucleotides on a feature and also improve the speed of the hybridization. Once a portion of the oligonucleotides G have bound the capture oligonucleotides of a first feature of a hybridization array, the portion of oligonucleotides G that do not hybridize can then be moved to a next feature (containing capture oligonucleotides capable of hybridizing with another codon) or removed (such as by continuing to apply the electric current to migrate them past the feature or by washing). The portion of oligonucleotides G bound to the feature may then be eluted and either: (1) used to synthesize an encoded molecule, or (2) sorted further by passage through another system.

[0053] Thus, in one aspect, provided herein is a method of sorting a plurality of oligonucleotides (i.e., a plurality of oligonucleotides G) using a system comprising one or more features and including a step of applying an electric current. FIGs. 2A-2B show an exemplary method of sorting a plurality of oligonucleotides G using an electrophoretic routing system.

[0054] The oligonucleotides “G” described herein refer to molecules comprising both the oligonucleotide itself (comprising a coding region comprising a plurality of codons) and, depending on context, an encoded region (if present). The oligonucleotides G each comprise at least a plurality of codons. In some embodiments, the plurality of oligonucleotides G comprise a first building block comprising a reactive site. The reactive site allows for the sequential addition of further building blocks to the molecule. In some embodiments, the oligonucleotides G are applied to a system comprising one or more features, wherein the oligonucleotides G are loaded upstream of the one or more features of a hybridization array in separation medium (wherein upstream is with respect to the direction the oligonucleotides move upon application of an electric current to the system). In some embodiments, an electrical current is applied across the system for a time sufficient to migrate the oligonucleotides G to a first feature. The first feature comprises a multiplicity of first capture oligonucleotides (e.g., a plurality of the same capture oligonucleotides).

[0055] In some embodiments, the system comprises more than one feature, such as two features, three features, four features, five features, or more, of a hybridization array. In some embodiments, each feature of a plurality of features comprises a specific set of capture oligonucleotides. For example, in some embodiments, the first feature comprises capture oligonucleotides specific for one codon and the second feature comprises capture oligonucleotides specific for a different codon. In some embodiments, each feature of a hybridization array in the system comprises capture oligonucleotides capable of specifically hybridizing with different codons.

[0056] The plurality of oligonucleotides G are migrated through the system by application of an electric current across the system. Portions of the plurality of oligonucleotides G will cease migration upon hybridization of a codon with the capture oligonucleotides of a feature. The time that the electric current is applied (and the voltage) determines how far other portions of the plurality of oligonucleotides G migrate through the system in the event they do not hybridize with capture oligonucleotides of any feature. In some embodiments, the plurality of oligonucleotides G comprises a portion of oligonucleotides G that do not specifically hybridize with the capture oligonucleotides of any of the features of the system. For example, a portion of the plurality of oligonucleotides G may not comprise a codon that specifically hybridizes with the capture oligonucleotides of any of the features of the system.

[0057] In some embodiments, the system may be configured such that one or more of the features of a hybridization array can be separated from the system. In some embodiments, a feature is separated from the system following the electrophoretic migration and capture of a portion of the oligonucleotides G on the feature (e.g., via the sequence specific hybridization of a codon of a portion of the plurality of the oligonucleotides G with the capture oligonucleotides of the feature). In some embodiments, the portion of the plurality of the oligonucleotides G are eluted from the feature. In some embodiments, the eluted portion of the plurality of the oligonucleotides G are eluted from a feature that has been previously separated from the system.

[0058] Once a portion of the plurality of oligonucleotides G has been sorted by hybridizing with capture oligonucleotides of a feature of a hybridization array, it may be eluted and sorted further based on other codons. In other words, the portion of oligonucleotides G will typically still be a complex population comprising different codons after a single round of sorting. The first sorting step merely sorts this portion based on the presence of one codon. Substantially all of this first portion of oligonucleotides G would comprise a codon which specifically hybridized with the particular feature of a hybridization array which they were associated with. Once the portion of oligonucleotides G are eluted from this feature, they may then be further sorted based on other codons. Accordingly, in some aspects, provided herein is a method of serially enriching a plurality of oligonucleotides G. A first set of oligonucleotides G can be obtained from any of the methods described herein from an original plurality of oligonucleotides G. The first set of oligonucleotides G (e.g., the portion of oligonucleotides G that was eluted from the first feature) can be applied to a system in a separation medium, at a position upstream of a first feature. Following the loading of the first set of oligonucleotides G to the system, the oligonucleotides G are electrophoretically migrated through the system by the application of an electric current. A portion of the set of oligonucleotides G comprises a codon that specifically hybridizes with the capture oligonucleotides of the first feature of the system and is thereby captured at the first feature. Finally, the hybridized oligonucleotides G can be eluted and separated from the first feature, thus generating an enriched plurality of oligonucleotides G that specifically hybridize with capture oligonucleotides of the first feature of the system.

[0059] The first set of oligonucleotides G (e.g., the portion of oligonucleotides G eluted from the feature) may be serially enriched. In some embodiments, the first set of oligonucleotides G is obtained, wherein the first set of oligonucleotides G is the eluted portion of the plurality of oligonucleotides G from any feature. In some embodiments, the first set of oligonucleotides G in a separation medium are loaded onto a system comprising at least one feature, at a position upstream of the feature. In some embodiments, a portion of the first set of oligonucleotides G specifically hybridizes with capture oligonucleotides of the feature when an electric current is applied across the system for a time sufficient to migrate a portion of the first set of oligonucleotides G to the feature. In some embodiments, the portion of the first set of oligonucleotides G are eluted from the feature to generate a first set of oligonucleotides G, thereby further serially enriching the plurality of oligonucleotides G. In some embodiments, the feature is separated from the system prior to elution of the portion of the first set of oligonucleotides G from the feature.

System

[0060] Provided herein are systems comprising at least a separation medium and a first feature of a hybridization array. An exemplary system is shown in FIG. 1. The methods described herein use the system to sort a plurality of oligonucleotides G.

[0061] The system of the present invention comprises at least one feature of a hybridization array. In some embodiments, the system comprises a first feature and a second feature of a hybridization array. In some embodiments, the system comprises more than 2 features, such as any of 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000 or more features of a hybridization array. In some embodiments, the system comprises about 96 features of a hybridization array. In some embodiments, the system comprises about 384 features of a hybridization array. In some embodiments, the system comprises about 3072 features of a hybridization array.

[0062] The system may comprise a power supply to produce an electric current across the system. The electric current drives the migration of the plurality of oligonucleotides G across the system, allowing the plurality of oligonucleotides G to migrate to, and interact with, one or more features in the system. In some embodiments, the power supply can be controlled such that the rate of electrophoretic migration across the system can be adjusted to regulate the migration of the plurality of oligonucleotides G. In some embodiments, the power supply is a linear, switched, or battery-based power supply. In some embodiments, the power supply is a DC power supply. In some embodiments, the DC power supply is a battery eliminator, a constant voltage supply, a constant voltage/constant current supply, a multiple output supply, a programmable supply, or a multi-range supply.

[0063] In some embodiments, the system comprises a pump. The pump allows for forcing aqueous solution through the system, for example, to wash the system, exchange buffer, or elute portions of the oligonucleotides G from the system.

[0064] In some embodiments, the system comprises containers for various solutions (e.g., buffers such as the aqueous separation medium). In some embodiments, the system comprises two containers for buffers. The containers may be any suitable container known in the art that are compatible with the methods described herein. In some embodiments, the buffers mediate the electrophoretic migration of a plurality of oligonucleotides G across the system. In some embodiments, the buffers comprise an aqueous separation medium. The aqueous separation medium facilitates the separation (e.g., migration) of a plurality of oligonucleotides G across the system. In some embodiments, the separation medium is compatible with nucleic acids, such that nucleic acids are not degraded in the system. In some embodiments, the separation medium is compatible with non-nucleic acids. For example, the separation medium may be compatible with chemical building blocks.

[0065] In some embodiments, the system comprises a wash solution. In some embodiments, the wash solution may be used to remove unbound portions of the oligonucleotides G (i.e., a portion of the oligonucleotides G that are not captured by any feature) from the system. In some embodiments, the unbound oligonucleotides G do not comprise a codon that is complementary to or capable of hybridizing with a capture oligonucleotide of a feature of the system. In some embodiments, the wash solution is loaded onto the system following the addition of a plurality of oligonucleotides G to the system.

Hybridization array

[0066] The systems and methods described herein for sorting a plurality of oligonucleotides G comprise at least one hybridization array. A “hybridization array” comprises a plurality of features, wherein a feature comprises a multiplicity of capture oligonucleotides (e.g., a plurality of the same capture oligonucleotides) that are capable of specifically hybridizing with a codon of an oligonucleotide G. To sort the plurality of oligonucleotides G (which each comprise a plurality of various codons), the feature should not specifically hybridize with at least one other codon in the plurality of oligonucleotides G. In certain embodiments of the methods, a feature includes a substrate of at least two separate areas having immobilized capture oligonucleotides on their surface. In some embodiments, each area of the feature contains a different immobilized capture oligonucleotide, wherein the capture oligonucleotide is an oligonucleotide sequence that is capable of hybridizing with one or more codons of the coding regions of a portion of the plurality of oligonucleotides G. In some embodiments, the feature uses two or more chambers. In some embodiments, the chambers of the feature contain particles, such as beads, that have immobilized capture oligonucleotides on the surface of the beads.

[0067] By immobilizing a capture oligonucleotide on a feature, the plurality of oligonucleotides G may be sorted or selectively separated into sub-pools (i.e., portions) of oligonucleotides G on the basis of the particular oligonucleotide sequence of each coding region comprising a plurality of codons specifically hybridizing with the capture oligonucleotides. In some embodiments, the separated sub-pools of oligonucleotides G can then be separately released or removed from the feature into reaction chambers for further chemical processing. In some embodiments, the step of releasing is optional, not generally limited, and can include dissociating the molecules by heating, using denaturing agents, or exposing the molecules to buffer of pH>12. In some embodiments, the chambers or areas of the array containing different immobilized oligonucleotides can be positioned to allow the contents of each chamber or area to flow into an array of wells for further chemical processing.

[0068] The feature comprises a multiplicity of capture oligonucleotides (e.g., a plurality of the same capture oligonucleotides), which, in some embodiments, is particular to one feature in the system. For example, in some embodiments, a first feature comprises a multiplicity of first capture oligonucleotides capable of specifically hybridizing with a plurality of codons used in a first coding position in oligonucleotide G. In some embodiments, a second feature comprises a multiplicity of second capture oligonucleotides capable of specifically hybridizing with a plurality of codons used in a second coding position in oligonucleotide G. In some embodiments, the multiplicity of first capture oligonucleotides are different from the multiplicity of second capture oligonucleotides. In some embodiments, each feature of the system comprises a different multiplicity of capture oligonucleotides from the multiplicity of capture oligonucleotides of the other features. In other words, in some embodiments, each feature is capable of specifically hybridizing with a different set of codons within G.

[0069] In some embodiments, the capture oligonucleotide comprises between about 6 and about 50 nucleotides, such as between any of about 6 to about 20, about 8 to about 30, about 15 to about 25, and about 30 to about 50 nucleotides. In some embodiments, the anti-codon comprises less than about 50 nucleotides, such as less than any of about 45, 40, 35, 30, 25, 20, 15, 10, or 6 nucleotides. In some embodiments, the anti-codon comprises about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides. In some embodiments, the anti-codon comprises between about 8 and about 30 nucleotides. In some embodiments, the length of the anti-codon is dependent on the length of the codon. In some embodiments, the length of the anti-codon is about the same as the length of the codon.

[0070] In some embodiments, the capture oligonucleotides are attached to the feature by a linker. The linker may serve to anchor the capture oligonucleotide to the feature of a hybridization array.

[0071] In some embodiments, the hybridization array comprises a plurality of features (i.e., more than one feature). For example, the hybridization array may comprise a first feature and a second feature. In some embodiments, the hybridization array comprises more than 2 features, such as any of 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000 or more features. In some embodiments, the hybridization array comprises about 96 features. In some embodiments, the hybridization array comprises about 384 features. In some embodiments, the hybridization array comprises about 3072 features. In some embodiments, a feature of a hybridization array comprises different capture oligonucleotides than another feature of a hybridization array. In some embodiments, each feature of a hybridization array comprises different capture oligonucleotides.

[0072] In some embodiments, the plurality of features of a hybridization array are electrophoretically coupled in the system. The features are “electrophoretically coupled” when charged molecules (such as oligonucleotides) migrate upon application of an electric current across the system and through the features. For example, in the exemplary system of FIG. 1, if an oligonucleotide is added to container 101 in aqueous medium 104, the features 103, 105, 106, 107, and 108 of the hybridization array are electrophoretically coupled if the oligonucleotide is capable of migrating across the system through each feature upon application of an electric current of sufficient voltage and for sufficient time across the system. In some embodiments, the plurality of features are connected in series (that is, in sequence) in the system; a first feature is upstream of a second feature, a second feature is upstream of a third feature, and a third feature is upstream of a fourth feature (such as, for example, the exemplary system 100 of FIG. 1). As a plurality of oligonucleotides G migrates through the system upon application of the electric current, any portion of the oligonucleotides G capable of specifically hybridizing with a particular feature will be immobilized at the feature and will substantially cease migrating further through the system.
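By way of non-limiting illustration only, the series arrangement described above can be modeled in Python as follows. The sketch assumes idealized behavior: each oligonucleotide G is represented as a tuple of codon labels, it is captured by the first (most upstream) feature whose capture oligonucleotides recognize one of its codons, and capture is 100% efficient. The class name, function name, and codon labels are hypothetical and are not part of the disclosed system.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One feature of the array; `codons` are the codons its capture oligonucleotides bind."""
    name: str
    codons: set
    captured: list = field(default_factory=list)


def sort_library(library, features):
    """Pass each oligonucleotide through the features in upstream-to-downstream order.

    An oligonucleotide stops at the first feature that recognizes one of its
    codons; otherwise it migrates past every feature.
    Returns the oligonucleotides that no feature captured.
    """
    unbound = []
    for oligo in library:
        for feature in features:
            if feature.codons & set(oligo):
                feature.captured.append(oligo)
                break  # immobilized here; it does not migrate further
        else:
            unbound.append(oligo)
    return unbound


# Hypothetical two-feature system and four-member library.
features = [Feature("feature_1", {"A1"}), Feature("feature_2", {"A2"})]
library = [("A1", "B1"), ("A2", "B1"), ("A1", "B2"), ("A3", "B2")]
unbound = sort_library(library, features)
for f in features:
    print(f.name, f.captured)
print("unbound", unbound)
```

In this model the oligonucleotide carrying codon A3 is not recognized by either feature and is returned as unbound, which corresponds to the portion that continues migrating or is removed by washing.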

[0073] In some embodiments, the features are arranged within the system to optimize the sorting of a plurality of oligonucleotides G. In some embodiments, the features of the system are substantially equidistant from one another, such that the migration time between each feature is approximately the same. In some embodiments, the features of the system are not equidistant from one another.

[0074] In some embodiments, one or more, or all, of the features are coupled to the system such that they may be removed from the system. In some embodiments, each feature of the system may be configured to be separated from the system. In some embodiments, one feature of a plurality of features may be configured to be separated from the system, while the other features of the plurality of features may remain attached to the system. A particular feature may be separated from the system in order to separate a portion of oligonucleotides G possessing particular codons from the plurality of oligonucleotides G.

[0075] In some embodiments, the features are separated from the system following sorting of a plurality of oligonucleotides G. In some embodiments, a hybridization array comprising a plurality of features is separated from the system following sorting of a plurality of oligonucleotides G. The captured portion of oligonucleotides G (i.e., the portion of oligonucleotides G specifically hybridized to capture oligonucleotides of the particular feature) are separated from the system with the feature. The captured portion of oligonucleotides G may be eluted from the feature. In some embodiments, the captured portion of oligonucleotides G are eluted from the feature while it is attached to the system. In some embodiments, the captured portion of oligonucleotides G are eluted from the feature that has been previously separated from the system. In some embodiments, the portion of the plurality of oligonucleotides G are eluted from the feature using an elution solution. Any suitable elution solution may be used that disrupts the association of the portion of oligonucleotides G with the capture oligonucleotides, or allows the dissociation of the portion of oligonucleotides G from the capture oligonucleotides, while otherwise maintaining the stability of the oligonucleotides G, including features of the oligonucleotides G such as a first building block and reactive site. Those skilled in the art may readily select such suitable solutions.

[0076] The eluted portion of oligonucleotides G may be serially enriched. In some embodiments, serial enriching comprises eluting a portion of the plurality of oligonucleotides G from a feature of the hybridization array. In some embodiments, the feature is separated from the system before eluting the portion of the plurality of oligonucleotides G. In some embodiments, the eluted portion of oligonucleotides G are loaded onto another system (comprising different features than the preceding system) following elution from the feature of the preceding system. In some embodiments, the eluted portion of oligonucleotides G are sorted according to any of the sorting methods described herein. In some embodiments, the sorting is repeated to generate a serially enriched portion of the plurality of oligonucleotides G. In some embodiments, the method of sorting is repeated once to generate a serially enriched portion of a plurality of oligonucleotides G. In some embodiments, the method of sorting is repeated twice, three times, four times, or more, to generate a serially enriched portion of a plurality of oligonucleotides G. In some embodiments, the serially enriched oligonucleotides G may be used to synthesize an encoded molecule.
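By way of non-limiting illustration only, repeated rounds of sorting can be sketched in Python as successive filters, where the eluate of one round is the input of the next. The sketch assumes each round selects on a single codon at a single coding position and that capture and elution are lossless. The function names, codon labels, and library size are hypothetical.

```python
from itertools import product


def enrich(pool, position, codon):
    """One sorting round: keep oligonucleotides whose codon at `position` matches."""
    return [oligo for oligo in pool if oligo[position] == codon]


def serially_enrich(pool, targets):
    """Apply one round per (position, codon) pair, feeding each eluate into the next round."""
    for position, codon in targets:
        pool = enrich(pool, position, codon)
    return pool


# Hypothetical library: every combination of 3 codons at each of 3 coding positions.
library = list(product(["A1", "A2", "A3"], ["B1", "B2", "B3"], ["C1", "C2", "C3"]))

after_round_1 = serially_enrich(library, [(0, "A1")])
after_round_3 = serially_enrich(library, [(0, "A1"), (1, "B2"), (2, "C3")])
print(len(library), len(after_round_1), len(after_round_3))
```

Each round narrows the population by one coding position, so a library with several coding positions reaches a single codon combination only after one round per position.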

[0077] The capture of the oligonucleotides G by capture oligonucleotides of the hybridization array may not be 100% efficient. For example, even though a portion of oligonucleotides G is capable of binding capture oligonucleotides of a feature of the hybridization array, the entire portion of oligonucleotides G may not be completely captured by the feature. In some embodiments, substantially all of a portion of oligonucleotides G binds to a feature of the hybridization array comprising capture oligonucleotides specific for the portion of oligonucleotides G. In some embodiments, at least about 50% of a portion of oligonucleotides G binds to a feature of the hybridization array comprising capture oligonucleotides specific for the portion of oligonucleotides G, such as at least any of about 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, 99%, and values and ranges therebetween.
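By way of non-limiting illustration only, the effect of incomplete capture can be estimated with simple arithmetic, sketched here in Python. The sketch assumes that capture efficiency is a single fraction per round and that rounds are independent, so the overall recovery after several rounds is the product of the per-round efficiencies. These assumptions, the function names, and the example numbers are hypothetical and are not measurements from the present disclosure.

```python
import math


def expected_captured(n_matching: float, efficiency: float) -> float:
    """Expected number of matching oligonucleotides retained by a feature."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be a fraction between 0 and 1")
    return n_matching * efficiency


def cumulative_recovery(efficiencies) -> float:
    """Overall fraction recovered after several rounds, assuming independent rounds."""
    return math.prod(efficiencies)


# Hypothetical numbers: 1000 matching molecules captured at 50% efficiency,
# and three serial rounds at 90% efficiency each.
print(expected_captured(1000, 0.5))
print(cumulative_recovery([0.9, 0.9, 0.9]))
```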

Electric current

[0078] An electric current is applied across the system comprising a hybridization array comprising a plurality of features, provided herein, to migrate oligonucleotides G through the system. A separation medium conducts the electric current through the system. The time the electric current is applied may be selected based on the distance the oligonucleotides G need to migrate through the system. In some embodiments, the electric current is applied across the system for at least a time sufficient to migrate a first portion of the plurality of oligonucleotides G to a first feature of the system. In some embodiments, the electric current is applied across the system for at least a time sufficient to migrate a first portion of the plurality of oligonucleotides G to the first feature and a second portion of the plurality of oligonucleotides G to a second feature of the system. In some embodiments, the electric current is applied across the system for at least a time sufficient to migrate a first portion of the plurality of oligonucleotides G to the first feature, a second portion of the plurality of oligonucleotides G to a second feature, and a third portion of the plurality of oligonucleotides G to a third feature of the system. In some embodiments, the electric current is applied across the system for at least a time sufficient to migrate a first portion of the plurality of oligonucleotides G to the first feature, a second portion of the plurality of oligonucleotides G to a second feature, a third portion of the plurality of oligonucleotides G to the third feature, and a fourth portion of the plurality of oligonucleotides G to a fourth feature of the system. In some embodiments, the system comprises up to about 3072 features, or more.

[0079] In some embodiments, the electric current is applied to the system for a duration that corresponds to the voltage used. For example, if a higher voltage is used, the electric current may be applied to the system for a shorter time compared to a lower voltage. The voltage selected should be limited by the ability of the system to dissipate the additional heat generated as the voltage increases.
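By way of non-limiting illustration only, the trade-off between voltage, run time, and heating can be sketched in Python using a simplified textbook model: a uniform field E = V / L between the electrodes, a constant electrophoretic mobility μ so that velocity is μE, and resistive heating P = V² / R. The model, the function names, and all numeric values (including the mobility) are assumptions chosen for illustration and are not parameters of the disclosed system.

```python
def migration_time_s(distance_cm, voltage_v, electrode_gap_cm, mobility_cm2_per_vs):
    """Time to migrate `distance_cm`, assuming a uniform field and constant mobility."""
    field_v_per_cm = voltage_v / electrode_gap_cm
    velocity_cm_per_s = mobility_cm2_per_vs * field_v_per_cm
    return distance_cm / velocity_cm_per_s


def joule_heating_w(voltage_v, resistance_ohm):
    """Resistive heating power, assuming a fixed resistance for the separation medium."""
    return voltage_v ** 2 / resistance_ohm


# Placeholder values for illustration only.
MOBILITY = 1e-4      # cm^2 / (V * s), hypothetical
GAP_CM = 10.0        # distance between electrodes, hypothetical
DISTANCE_CM = 5.0    # distance from loading position to the feature, hypothetical
RESISTANCE = 1000.0  # ohms, hypothetical

for volts in (100.0, 200.0):
    t = migration_time_s(DISTANCE_CM, volts, GAP_CM, MOBILITY)
    p = joule_heating_w(volts, RESISTANCE)
    print(f"{volts:.0f} V: {t / 60:.1f} min, {p:.1f} W")
```

Under these assumptions, doubling the voltage halves the migration time but quadruples the heat generated, which is consistent with the voltage being limited by the ability of the system to dissipate heat.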

[0080] In some embodiments, the electric current is applied to the system for about 5 minutes to about 1.5 hours, such as any of about 5 minutes to about 30 minutes, 20 minutes to about 40 minutes, about 30 minutes to about 1 hour, and about 45 minutes to about 1.5 hours. In some embodiments, the electric current is applied to the system for less than about 1.5 hours, such as less than any of about 1.25 hours, 1 hour, 45 minutes, 30 minutes, 20 minutes, 10 minutes, 5 minutes, or less.

[0081] In some embodiments, the electric current comprises a constant voltage that is sufficient to migrate the plurality of oligonucleotides G to the features (e.g., a first feature, a second feature, etc.) of the system. In some embodiments, the voltage of the electric current is varied during the migration of the plurality of oligonucleotides G to the features of the system. In some embodiments, the electric current may be turned on and turned off during the migration of the plurality of oligonucleotides G to the features of the system. In some embodiments, the electric current may use reverse polarity separation.

[0082] In some embodiments, the electric current comprises a voltage that does not damage the plurality of oligonucleotides G. In some embodiments, the electric current comprises a voltage of between about 5 V to about 500 V, such as between any of about 5 V to about 100 V, 80 V to about 120 V, about 100 V to about 250 V, and about 200 V to about 500 V. In some embodiments, the voltage is less than about 500 V, such as less than any of about 450 V, 400 V, 350 V, 300 V, 250 V, 200 V, 180 V, 160 V, 140 V, 130 V, 120 V, 110 V, 100 V, 90 V, and 80 V. In some embodiments, the voltage is at least about 5 V, such as at least any of about 20 V, 30 V, 40 V, 50 V, 60 V, 70 V, 80 V, 90 V, 100 V, 110 V, 120 V, 130 V, 140 V, 160 V, 180 V, 200 V, 250 V, 300 V, 350 V, 400 V, 450 V, or 500 V.

[0083] In some embodiments, the system is cooled during the application of the electric current. In some embodiments, the system is cooled following the application of the electric current. A benefit of cooling the system is that the plurality of oligonucleotides G, and other components of the system, are not damaged during the application of the electric current across the system due to excessive heating. Cooling the system also allows for higher voltage to be used, thus increasing the speed of the sorting. In some embodiments, the cooling comprises applying ice to the system. In some embodiments, the system is placed in a cold room (e.g., a room at no more than about 10 °C ambient temperature). In some embodiments, the system is cooled with a water bath, a water jacket, or by incorporation of a device including a water jacket or water bath.

Methods of synthesizing an encoded molecule

[0084] In some aspects, further provided herein are methods of synthesizing an encoded molecule from a portion of a plurality of oligonucleotides G. In some embodiments, the portion of a plurality of oligonucleotides G are obtained from a sorting method described herein. In some embodiments, the plurality of oligonucleotides G each comprise a first building block comprising a reactive site. The first building block comprising a reactive site is configured to react with a positional building block comprising a reactive site. The two reactive sites react, producing an oligonucleotide G comprising a first building block and a positional building block. Additional building blocks can then be successively added through additional rounds of reactions. The positional building blocks are identified by, and their synthesis is directed by, the codons of the oligonucleotide G. Thus, the oligonucleotide G comprises both a coding region (containing a plurality of codons which identifies and directs the synthesis of the encoded region) and an encoded region which comprises at least a first building block and, after one or more rounds of synthesis, one or more positional building blocks. The methods of sorting otherwise described herein allow for isolating portions of oligonucleotides G containing particular codons (from a complex library of oligonucleotides G which contain many different combinations of codons) in advance of synthesizing the encoded region (i.e., adding positional building blocks to the oligonucleotides G).

[0085] In certain embodiments, the method of synthesizing an encoded molecule includes providing at least one feature (e.g., one feature separated from a system used for sorting/routing a plurality of oligonucleotides G). The features that are amenable to a method of synthesizing an encoded molecule are described herein. In some embodiments, the feature used for synthesizing an encoded molecule is a hybridization that has been separated from a system used for sorting a plurality of oligonucleotides G. In some embodiments, the feature used for synthesizing an encoded molecule is a hybridization that has not been separated from the system. In some embodiments, the feature comprises a portion of oligonucleotides G hybridized to capture oligonucleotides of the feature.

[0086] In some embodiments, the method of synthesizing an encoded molecule uses a series of “sort and react” steps. In some embodiments, the plurality of oligonucleotides G containing different combinations of encoding regions are sorted into sub-pools (e.g., portions) by sorting and selective hybridization of one or more coding regions of the oligonucleotides G with capture oligonucleotide of a feature, as provided herein.

[0087] One benefit of sorting the oligonucleotides G into portions is that the sorting allows for each portion to be reacted with a charged positional building block under separate reaction conditions before the portions of oligonucleotides G are combined or mixed for further chemical processing. In some embodiments, each codon uniquely identifies a positional building block, because the identity of the coding region (comprising a plurality of codons) can be correlated to the identity of the reaction process used to add each positional building block, which would include the identity of the positional building block added to the chain of building blocks on an oligonucleotide G. It is understood that the plurality of oligonucleotides G can include one or more coding regions comprising codons that are identical between or among molecules in a pool, but it is also understood that the vast majority, if not all, of the molecules in the pool would have a different combination of coding regions comprising codons. In some embodiments, there may be millions of copies of different combinations of coding regions comprising codons.
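
The "sort" half of a "sort and react" step can be pictured with a short sketch. It models each oligonucleotide G as a tuple of codon sequences and each feature as a single capture sequence, and it treats hybridization as an exact reverse-complement match. That exact-match rule, the names `sort_into_portions` and `feature_1`/`feature_2`, and the example sequences are assumptions introduced for illustration and do not appear in the disclosure.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def sort_into_portions(oligos, capture_oligos, position):
    """Group oligos by the feature whose capture sequence pairs with one codon.

    oligos: list of tuples of codon sequences (one tuple per oligonucleotide G)
    capture_oligos: dict mapping a feature name to its capture sequence
    position: index of the codon used for sorting in this round
    """
    # Pre-compute which codon each feature would capture.
    feature_for_codon = {
        reverse_complement(capture): name
        for name, capture in capture_oligos.items()
    }
    portions = defaultdict(list)
    for oligo in oligos:
        feature = feature_for_codon.get(oligo[position])
        if feature is not None:  # oligos with no matching feature stay unsorted
            portions[feature].append(oligo)
    return dict(portions)


# Hypothetical library of three oligos with two codon positions each.
library = [
    ("AAAACCCC", "GATTACAG"),
    ("GATTACAG", "AAAACCCC"),
    ("AAAACCCC", "AAAACCCC"),
]
features = {"feature_1": "GGGGTTTT", "feature_2": "CTGTAATC"}
print(sort_into_portions(library, features, position=0))
```

Each resulting portion could then be reacted with its own positional building block under separate conditions before the portions are recombined and sorted again on the next codon position.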

[0088] As shown in FIG. 3, in some embodiments the method comprises introducing a charged positional building block 305 (comprising a reactive site 306 and an anti-codon 307) to a feature 303. In some embodiments, the feature may be a feature that has been separated from a system following sorting of a plurality of oligonucleotides G using the methods described herein. In some embodiments, the feature comprises a portion of the plurality of oligonucleotides G that are specifically hybridized with capture oligonucleotides of the feature. In some embodiments, the separation of the feature from the system does not disturb the hybridization of the portion of oligonucleotides G with the capture oligonucleotides of the feature. In some embodiments, a codon of the plurality of codons of the coding region of the portion of the plurality of oligonucleotides G is specifically hybridized with a capture oligonucleotide of the feature (as in FIG. 3).

[0089] In some embodiments, a positional building block comprising a reactive site is introduced to the portion of the plurality of oligonucleotides G on a feature. In some embodiments, the positional building block comprising a reactive site is introduced to the portion of the plurality of oligonucleotides G on a feature that has been separated from a hybridization array. The reactive site of the positional building block reacts with the portion of the plurality of oligonucleotides G at a reactive site in the encoded region of the oligonucleotide G. In some embodiments, the reaction forms a covalent bond between the reactive site of the eluted portion of the plurality of oligonucleotides G and the positional building block.

[0090] Part or all of the coding region of the oligonucleotides G may be single stranded to facilitate hybridization with the anti-codon of the charged positional building block during synthesis of an encoded molecule. In some embodiments, the hybridization of the anti-codon of the charged positional building block with the captured oligonucleotides G is encoded by the coding region of the oligonucleotides G.

[0091] It is understood that different solvents and co-reactants may be used, under acidic, basic, or neutral conditions, depending on the coupling chemistry that is used to react the charged positional building block with the reactive portion of the oligonucleotides G.

[0092] In some embodiments, the method of synthesis further comprises eluting the captured portion of oligonucleotides G from the feature. In some embodiments, the feature is separated from the system prior to elution of the captured portion of oligonucleotides G from the feature. In some embodiments, the captured portion of oligonucleotides G are eluted from the feature using an elution solution. In some embodiments, the elution solution comprises a detergent (e.g., Triton® X-100). In some embodiments, heat is used to elute the captured oligonucleotides G.

[0093] In some embodiments, the eluted portion of oligonucleotides G is immobilized on an immobilization array. The immobilization array allows for non-sequence-specific immobilization of the oligonucleotides G (as compared with the sequence specific immobilization of the features), which may improve the efficiency of reaction between the reactive site of the oligonucleotide G and the charged positional building block. In some embodiments, the immobilization array comprises a solid phase. In some embodiments, the solid phase comprises a bead. In some embodiments, the solid phase is an ion exchange resin. In some embodiments, the solid phase is an anion exchange resin (e.g., a positively charged resin). The anion exchange resin is capable of binding with the negatively charged eluted portion of oligonucleotides G. In some embodiments, the anion exchange resin is a strong anion exchange resin. In some embodiments, the anion exchange resin is a weak anion exchange resin. In some embodiments, the anion exchange resin comprises polyethyleneimine (mixed amine), dimethylaminopropyl, quaternized polyethyleneimine (mixed amine), or a fully quaternized amine. In some embodiments, the anion exchange resin comprises an ammonium ion. In some embodiments, the anion exchange resin is a SuperQ 650M resin.

[0094] In some embodiments, the eluted portion of oligonucleotides G is immobilized on the immobilization array comprising the resin (an exemplary scheme is provided in FIG. 4). In some embodiments, the eluted portion of oligonucleotides G are coupled to the immobilization array when the negatively charged backbone of the DNA oligonucleotide interacts with a positively charged resin of the immobilization array.

[0095] In some embodiments, a positional building block comprising a reactive site is introduced to the immobilization array comprising the immobilized eluted portion of oligonucleotides G. The reactive site of the positional building block reacts with the immobilized eluted portion of the plurality of oligonucleotides G at a reactive site in the encoded region of the oligonucleotide G. In some embodiments, the reaction forms a covalent bond between the reactive site of the eluted portion of the plurality of oligonucleotides G and the positional building block.

[0096] In some embodiments, the reaction between the charged positional building block and the reactive site of the oligonucleotides G (e.g., oligonucleotides G on a feature or on an immobilization array) produces a synthesized encoded molecule. In some embodiments, the synthesized encoded molecule corresponds to and may be identified by the coding region of the oligonucleotide G. In some embodiments, the encoded molecules may be subjected to downstream analysis for selection of encoded molecules possessing specific properties (e.g., binding to a particular target molecule). The coding region of the encoded molecules selected for said properties can be PCR amplified and sequenced to determine the identity of the building blocks of the encoded molecules.

Coding region and optional non-coding region

[0097] The nucleic acids (i.e., an oligonucleotide G) described herein comprise a coding region comprising a plurality of codons, and optionally non-coding regions. Non-coding regions may intersperse the codons of the coding region, for example (and thus the non-coding regions would be within the coding region itself). The coding region, by function of the codons, may be used to identify the building blocks (such as the first building block and the positional building blocks) of the encoded region synthesized using a method described herein, during downstream analyses. The plurality of codons of the coding region may also be used to sort a plurality of oligonucleotides G using a system comprising at least a first feature, wherein each feature comprises a multiplicity of capture oligonucleotides that hybridize with a codon of a portion of the oligonucleotides G.

[0098] The coding region may direct the sorting of a plurality of oligonucleotides G; the coding region, specifically a codon therein, determines which features of a system comprising features comprising capture oligonucleotides may hybridize to a portion of the plurality of oligonucleotides G. The coding region encodes and directs the synthesis of the encoded region of the oligonucleotide G; the coding region determines which charged positional building blocks comprising anti-codons may hybridize to the oligonucleotide G, and therefore which building blocks react with a reactive site on the oligonucleotide G and/or reactive sites on building blocks extending therefrom, to synthesize the encoded region. Additional description of coding region(s) and optional non-coding region(s) can be found in US 2020/0263163 A1 and US 2019/0169607 A1, which are hereby incorporated by reference in their entirety for all purposes.

[0099] The coding region may be partially or entirely single stranded. In some embodiments, the coding region contains from about 1% to 100%, such as any of about 50% to about 100% or about 90% to about 100%, single stranded oligonucleotide. In some embodiments, the coding region is at least partially single stranded.

[0100] The oligonucleotide G comprising a reactive site may be used to synthesize an encoded molecule. The oligonucleotide G may be prepared with a first building block which comprises the reactive site. The encoded molecules are formed by reacting charged positional building blocks with the reactive site (which may be part of a first building block attached to the oligonucleotide G) of the oligonucleotide G. Additional positional building blocks are added by successively reacting the encoded region with additional positional building blocks under the direction of the coding region.

[0101] The building blocks are charged by attaching them to an anti-codon and then hybridizing the charged positional building block to a codon in the coding region of the sorted portion of the oligonucleotides G. The reactive site on the positional building block then reacts with the reactive site of the oligonucleotide G to form a covalent bond. Generally, any suitable building block can be attached to any suitable anti-codon to form a charged positional building block. Thus, if the sequence of the oligonucleotide G is known (which can be determined by PCR amplification and sequencing), and the building blocks used for each unique anti-codon during synthesis of the encoded molecule are known, then the identity of the encoded region of the synthesized encoded molecule can be determined.
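
The identification step described above amounts to a table lookup, which the following sketch illustrates. It assumes that sequencing has already been reduced to an ordered list of codons and that a synthesis record maps each codon to the building block that was attached to the corresponding anti-codon. The function name, the record layout, and the example entries are hypothetical.

```python
def identify_building_blocks(sequenced_codons, block_for_codon):
    """Translate an ordered list of codons into the building blocks they record.

    sequenced_codons: codon sequences read from the coding region, in order
    block_for_codon: synthesis record mapping each codon to a building block
    """
    blocks = []
    for position, codon in enumerate(sequenced_codons):
        if codon not in block_for_codon:
            raise ValueError(
                f"codon {codon!r} at position {position} "
                "is not in the synthesis record"
            )
        blocks.append(block_for_codon[codon])
    return blocks


# Hypothetical synthesis record for two codons.
record = {
    "AAAACCCC": "building block 1",
    "GATTACAG": "building block 2",
}
print(identify_building_blocks(["GATTACAG", "AAAACCCC"], record))
```

A codon that is absent from the record raises an error rather than being silently skipped, since an unrecognized codon means the encoded region cannot be assigned with confidence.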

[0102] In some embodiments, the oligonucleotide G comprises a coding region comprising at least two codons, wherein the at least two codons correspond to and can be used to identify the positional building blocks in the oligonucleotide G or molecules synthesized therefrom. In some embodiments, the coding region can be amplified by PCR to produce copies of the coding region and the original and/or copies can be sequenced to determine the sequence of the coding region of the oligonucleotide G. The determined sequence can be used to identify the positional building blocks. In some embodiments, the sequence of the coding regions can be correlated to the series of combinatorial chemistry steps used to synthesize the encoded region (such as the initial building blocks and positional building blocks extending therefrom).

[0103] In some embodiments, the coding region is double stranded. In some embodiments, the coding region is single stranded. In some embodiments, the coding region is partially single stranded. The coding region comprises a plurality of codons. The number of codons in the coding region determines how many unique anti-codons (e.g., anti-codons of charged positional building blocks) the coding region can specifically hybridize with. If the number of codons is below 2, the encoded portion may be too small to be practical. If the number of codons is too far above 20, synthetic inefficiencies may interfere with accurate synthesis. Thus, the number of codons is typically a value between these lower and upper quantities. In some embodiments, the coding region comprises between about 2 to about 21 codons, such as between any of about 2 to about 20 codons, about 5 to about 15 codons, and about 10 to about 21 codons. In some embodiments, the coding region comprises less than about 21 codons, such as less than about any of about 20, 15, 5, or 3 codons. In some embodiments, the coding region comprises about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, or 21 codons. In some embodiments, the coding region comprises between about 5 to about 20 codons. In some embodiments, the codons of the coding regions may overlap with one another.
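
The relationship between the number of codon positions and the number of distinct codon combinations in a library can be sketched as a simple product. The sketch assumes that one codon is chosen independently at each position from a fixed set of alternatives. The disclosure does not state how many alternatives are used per position, so the counts below are purely illustrative.

```python
import math


def library_size(alternatives_per_position):
    """Count distinct codon combinations, choosing one codon per position."""
    return math.prod(alternatives_per_position)


# Hypothetical library: three codon positions with 96 alternatives each.
print(library_size([96, 96, 96]))
```

Because the count grows multiplicatively with each added position, even a coding region near the lower end of the stated range can encode a large number of combinations.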

[0104] DNA-encoded synthesis uses the above-described codons to hybridize with anti-codons (e.g., anti-codons attached to charged positional building blocks). The codons used in DNA-encoded synthesis are typically longer than those used in nature (i.e., those which are scanned by a ribosome along an mRNA). If a codon is less than about 6 nucleotides in length, the codon may not accurately direct synthesis of the encoded region. If a codon is too long, such as more than about 50 nucleotides, the codon may become cross-reactive. Such cross reactivity would interfere with the ability of the coding regions to accurately direct and identify the synthesis steps used to synthesize the encoded region of the oligonucleotide G. Thus, in some embodiments, each codon of the plurality of codons of a coding region comprises between about 6 to about 50 nucleotides, such as between any of about 6 to about 20, about 8 to about 30, about 15 to about 25, and about 30 to about 50 nucleotides. In some embodiments, each codon comprises less than about 50 nucleotides, such as less than any of about 45, 40, 35, 30, 25, 20, 15, 10, or 6 nucleotides. In some embodiments, each codon comprises about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, or 50 nucleotides. In some embodiments, each codon comprises between about 8 and about 30 nucleotides.
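
A candidate codon set could be screened against the length range given above with a check along the following lines. The length bounds are taken from the text. The second check, which flags pairs of equal-length codons that differ at only a few positions, is a crude stand-in for cross-reactivity that is assumed here for illustration; the mismatch threshold and the function name are hypothetical, and a practical screen would more likely rely on hybridization thermodynamics.

```python
from itertools import combinations

MIN_LENGTH = 6   # approximate lower bound stated in the text
MAX_LENGTH = 50  # approximate upper bound stated in the text


def check_codons(codons, min_mismatches=3):
    """Return a list of human-readable problems found in a codon set."""
    problems = []
    for codon in codons:
        if not MIN_LENGTH <= len(codon) <= MAX_LENGTH:
            problems.append(
                f"{codon}: length {len(codon)} outside "
                f"{MIN_LENGTH}-{MAX_LENGTH}"
            )
    # Assumed proxy for cross-reactivity: near-identical codons of equal length.
    for first, second in combinations(codons, 2):
        if len(first) == len(second):
            mismatches = sum(a != b for a, b in zip(first, second))
            if mismatches < min_mismatches:
                problems.append(
                    f"{first}/{second}: only {mismatches} mismatched positions"
                )
    return problems


print(check_codons(["AAAACCCC", "AAAACCCG", "ACGT"]))
```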

[0105] In some embodiments, the codons of the coding region overlap. In some embodiments, at least two of the codons of the coding region overlap so as to be coextensive, provided that the overlapping codons only share from about 30% to 1% of the same nucleotides, including about 20% to 1%, including from about 10% to 2%. In some embodiments of the oligonucleotide G, the coding region is from about 30% to 100%, including from about 60% to 100%, including from about 80% to 100%, single stranded. In some embodiments, the oligonucleotide G comprises at least two coding regions comprising at least one codon each, wherein at least two of the coding regions are adjacent. In some embodiments, the oligonucleotide G comprises at least two coding regions, wherein the at least two coding regions are separated by regions of nucleotides that do not direct or record synthesis of an encoded portion of the synthesized encoded molecule.
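
The overlap limit described above can be checked with a short calculation. The sketch represents each codon as a half-open interval of nucleotide positions within the coding region and expresses the shared nucleotides as a fraction of each codon's own length. The text does not specify the denominator for the percentage, so that choice, the interval representation, and the function names are assumptions.

```python
def shared_fractions(codon_a, codon_b):
    """Return the shared nucleotides as a fraction of each codon's length.

    Each codon is a (start, end) pair of positions, with end excluded.
    """
    start_a, end_a = codon_a
    start_b, end_b = codon_b
    shared = max(0, min(end_a, end_b) - max(start_a, start_b))
    return shared / (end_a - start_a), shared / (end_b - start_b)


def within_overlap_limit(codon_a, codon_b, max_fraction=0.30):
    """True if neither codon shares more than max_fraction of its length."""
    return all(f <= max_fraction for f in shared_fractions(codon_a, codon_b))


# Two hypothetical 20-nucleotide codons sharing 4 nucleotides.
print(shared_fractions((0, 20), (16, 36)))
print(within_overlap_limit((0, 20), (16, 36)))
```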

[0106] The oligonucleotide G may direct the synthesis of an encoded molecule by selectively hybridizing to a complementary anti-codon comprising a building block (i.e., a charged positional building block). In some embodiments, a codon of the coding region is unique to (e.g., corresponds to) the identity of a building block that is attached to a reactive site of the oligonucleotide G. In some embodiments, the charged positional building block comprises a building block and at least one corresponding anti-codon which hybridizes with at least one of the plurality of codons in the coding region.

[0107] In some embodiments, at least one codon in the coding region of the oligonucleotide G encodes the addition of a building block (e.g., a building block of a charged positional building block) to a reactive site. In some embodiments, at least one codon encodes the addition of a positional building block to the reactive site. In some embodiments, at least one codon of a plurality of codons encodes for the addition of one building block of a plurality of building blocks. In some embodiments, each codon of a plurality of codons encodes for the addition of one building block of a plurality of building blocks. In some embodiments, a plurality of codons encodes for the addition of a plurality of building blocks.

[0108] The coding region can contain natural and unnatural nucleotides. Suitable nucleotides include the natural nucleotides of DNA (deoxyribonucleic acid), including adenine (A), guanine (G), cytosine (C), and thymine (T), and the natural nucleotides of RNA (ribonucleic acid), adenine (A), uracil (U), guanine (G), and cytosine (C). Other suitable bases include natural bases, such as deoxyadenosine, deoxythymidine, deoxyguanosine, deoxycytidine, inosine, diamino purine; base analogs, such as 2-aminoadenosine, 2-thiothymidine, inosine, pyrrolo-pyrimidine, 3-methyl adenosine, C5-propynylcytidine, C5-propynyluridine, C5-bromouridine, C5-fluorouridine, C5-iodouridine, C5-methylcytidine, 7-deazaadenosine, 7-deazaguanosine, 8-oxoadenosine, 8-oxoguanosine, O(6)-methylguanine, 4-((3-(2-(2-(3-aminopropoxy)ethoxy)ethoxy)propyl)amino)pyrimidin-2(1H)-one, 4-amino-5-(hepta-1,5-diyn-1-yl)pyrimidin-2(1H)-one, 6-methyl-3,7-dihydro-2H-pyrrolo[2,3-d]pyrimidin-2-one, 3H-benzo[b]pyrimido[4,5-e][1,4]oxazin-2(10H)-one, and 2-thiocytidine; modified nucleotides, such as 2'-substituted nucleotides, including 2'-O-methylated bases and 2'-fluoro bases; and modified sugars, such as 2'-fluororibose, ribose, 2'-deoxyribose, arabinose, and hexose; and/or modified phosphate groups, such as phosphorothioates and 5'-N-phosphoramidite linkages. It is understood that an oligonucleotide is a polymer of nucleotides. In certain embodiments, the coding region does not have to contain contiguous bases. In certain embodiments, the coding region can be interspersed with linker moieties or non-nucleotide molecules.

[0109] In some embodiments, the coding region of the oligonucleotide G contains from about 5% to 100%, including from about 5% to about 50%, about 40% to about 80%, about 80% to 99%, about 90% to about 99%, or about 100% DNA nucleotides. In some embodiments, the coding region contains from about 5% to about 100%, including from about 5% to about 50%, about 40% to about 80%, about 80% to about 99%, about 90% to about 99%, or about 100% RNA nucleotides. In some embodiments, wherein the coding region comprises a specified percentage of DNA nucleotides or RNA nucleotides, respectively, the remaining percentage comprises RNA nucleotides or DNA nucleotides, respectively.

[0110] In some embodiments, the oligonucleotide G may further comprise a non-coding region or a plurality of non-coding regions. The term "non-coding region," when present, refers to a region of the oligonucleotide G that does not correspond to any anti-coding nucleic acid used to synthesize a compound on the encoded region of the oligonucleotide G. In some embodiments, non-coding regions are optional. In some embodiments, the oligonucleotide G contains from 1 to about 20 non-coding regions, including from 2 to about 9 non-coding regions, including from 2 to about 4 non-coding regions. In some embodiments, the non-coding regions contain from about 4 to about 50 nucleotides, including from about 12 to about 40 nucleotides, and including from about 8 to about 30 nucleotides. In some embodiments, one or more of the non-coding regions are double stranded, which reduces cross-hybridization.

[0111] The addition of non-coding regions can separate codons in the coding region to avoid or reduce cross-hybridization, because cross-hybridization would interfere with accurate encoding of a compound synthesized from the oligonucleotide G. Further, the non-coding regions can add functionality to the coding region of the oligonucleotide G other than just hybridization with anti-codons or encoding. The non-coding regions may be interspersed with the codons of the coding region. For example, two codons of the coding region may be separated by a non-coding region. Thus, in some embodiments, a coding region comprises one or more non-coding regions. In some embodiments, one or more of the non-coding regions can be modified with a label, such as a fluorescent label or a radioactive label. Such labels can facilitate the visualization or quantification of the oligonucleotide G. In some embodiments, one or more of the non-coding regions are modified with a functional group or tether which facilitates processing. In some embodiments, one or more of the non-coding regions are double stranded (e.g., “blocked”), which reduces cross-hybridization. Suitable non-coding regions are typically selected that do not interfere with PCR amplification of the nucleic acid portion of the oligonucleotide G (e.g., non-coding regions do not interfere with identification of the building blocks used to synthesize an encoded molecule).

Building blocks

[0112] A "building block" as used herein is a chemical structural unit capable of being chemically linked to other chemical structural units (e.g., other building blocks). The oligonucleotide G may comprise at least a first building block, which is added during creation of the complex library of oligonucleotides G (which is then sorted by the sorting methods described herein). Positional building blocks may then be added to the encoded region of the oligonucleotide G during synthesis. The methods of synthesizing a molecule from an oligonucleotide G (e.g., a sorted portion of oligonucleotides G sorted according to the methods described herein, optionally a serially enriched portion of oligonucleotides G) described herein, in some aspects, require one or more building blocks.

[0113] In some embodiments, the first building block is not a nucleic acid or nucleic acid analog. In some embodiments, the positional building blocks are not nucleic acids or nucleic acid analogs. In some embodiments, a building block has one, two, or more reactive chemical groups that allow the building block to undergo a chemical reaction that links the building block to other chemical structural units (e.g., other chemical structural units present in other building block, such as positional building blocks). In some embodiments, the building block is linked to other chemical structural units (e.g., other building blocks) by a covalent bond.

[0114] It is understood that part or all of the reactive chemical group of a building block may be lost when the building block undergoes a reaction to form a chemical linkage. For example, a building block in solution may have two reactive chemical groups. In this example, the building block in solution can be reacted with the reactive chemical group of a building block that is part of a chain of building blocks to increase the length of a chain, or extend a branch from the chain. When a building block is referred to in the context of a solution or as a reactant, then the building block will be understood to contain at least one reactive chemical group, but may contain two or more reactive chemical groups. When a building block is referred to in the context of a polymer, oligomer, or molecule larger than the building block by itself, then the building block will be understood to have the structure of the building block as a (monomeric) unit of a larger molecule, even though one or more of the chemical reactive groups will have been reacted.

[0115] The types of molecule or compound that can be used as a building block are not generally limited, so long as one building block is capable of reacting together with another building block to form a covalent bond. In some embodiments, the building block is not a nucleic acid or nucleic acid analog. In some embodiments, the building block is a chemical structural unit.

[0116] In some embodiments, the building block has one chemical reactive group to serve as a terminal unit. In some embodiments, the building block has 1, 2, 3, 4, 5, or 6 suitable reactive chemical groups. In some embodiments, a first initiator building block, a second initiator building block, and a polymer building block each independently have 1, 2, 3, 4, 5, or 6 suitable reactive chemical groups. Suitable reactive chemical groups for building blocks include a primary amine, a secondary amine, a carboxylic acid, a primary alcohol, an ester, a thiol, an isocyanate, a chloroformate, a sulfonyl chloride, a thionocarbonate, a heteroaryl halide, an aldehyde, a haloacetate, an aryl halide, an azide, a halide, a triflate, a diene, a dienophile, a boronic acid, an alkyne, and an alkene.

[0117] Any coupling chemistry can be used to connect building blocks, provided that the coupling chemistry is compatible with the presence of an oligonucleotide.

[0118] Exemplary coupling chemistry includes formation of amides by reaction of an amine, such as a DNA-linked amine, with an Fmoc-protected amino acid or other variously substituted carboxylic acids; formation of ureas by reaction of an amine, including a DNA-linked amine, with an isocyanate and another amine (ureation); formation of a carbamate by reaction of amine, including a DNA-linked amine, with a chloroformate (carbamoylation) and an alcohol; formation of a sulfonamide by reaction of an amine, including a DNA-linked amine, with a sulfonyl chloride; formation of a thiourea by reaction of an amine, including a DNA-linked amine, with thionocarbonate and another amine (thioureation); formation of an aniline by reaction of an amine, including a DNA-linked amine, with a heteroaryl halide (SNAr); formation of a secondary amine by reaction of an amine, including a DNA-linked amine, with an aldehyde followed by reduction (reductive amination); formation of a peptoid by acylation of an amine, including a DNA-linked amine, with chloroacetate followed by chloride displacement with another amine (an SN2 reaction); formation of an alkyne containing compound by acylation of an amine, including a DNA-linked amine, with a carboxylic acid substituted with an aryl halide, followed by displacement of the halide by a substituted alkyne (a Sonogashira reaction); formation of a biaryl compound by acylation of an amine, including a DNA-linked amine, with a carboxylic acid substituted with an aryl halide, followed by displacement of the halide by a substituted boronic acid (a Suzuki reaction); formation of a substituted triazine by reaction of an amine, including a DNA-linked amine, with a cyanuric chloride followed by reaction with another amine, a phenol, or a thiol (cyanurylation, Aromatic Substitution); formation of secondary amines by acylation of an amine, including a DNA-linked amine, with a carboxylic acid substituted with a suitable leaving group like a halide or triflate, followed by displacement of the leaving group with another amine (SN2/SN1 reaction); and formation of cyclic compounds by substituting an amine with a compound bearing an alkene or alkyne and reacting the product with an azide, or alkene (Diels-Alder and Huisgen reactions). In certain embodiments of the reactions, the molecule reacting with the amine group, including a primary amine, a secondary amine, a carboxylic acid, a primary alcohol, an ester, a thiol, an isocyanate, a chloroformate, a sulfonyl chloride, a thionocarbonate, a heteroaryl halide, an aldehyde, a chloroacetate, an aryl halide, an alkene, halides, a boronic acid, an alkyne, and an alkene, has a molecular weight of from about 30 to about 1330 Daltons.

[0119] In some embodiments of the coupling reaction, the building block might be added by substituting an amine, including a DNA-linked amine, using any of the chemistries above with molecules bearing secondary reactive groups like amines, thiols, halides, boronic acids, alkynes, or alkenes. Then the secondary reactive groups can be reacted with building blocks bearing appropriate reactive groups. Exemplary secondary reactive group coupling chemistries include acylation of the amine, including a DNA-linked amine, with an Fmoc-amino acid followed by removal of the protecting group and reductive amination of the newly deprotected amine with an aldehyde and a borohydride; reductive amination of the amine, including a DNA-linked amine, with an aldehyde and a borohydride followed by reaction of the now-substituted amine with cyanuric chloride, followed by displacement of another chloride from triazine with a thiol, phenol, or another amine; acylation of the amine, including a DNA-linked amine, with a carboxylic acid substituted by a heteroaryl halide followed by an SNAr reaction with another amine or thiol to displace the halide and form an aniline or thioether; and acylation of the amine, including a DNA-linked amine, with a carboxylic acid substituted by a haloaromatic group followed by substitution of the halide by an alkyne in a Sonogashira reaction; or substitution of the halide by an aryl group in a boronic ester-mediated Suzuki reaction.

[0120] In some embodiments, the coupling chemistries are based on suitable bond-forming reactions known in the art. See, for example, March, Advanced Organic Chemistry, fourth edition, New York: John Wiley and Sons (1992), Chapters 10 to 16; Carey and Sundberg, Advanced Organic Chemistry, Part B, Plenum (1990), Chapters 1-11; Goodnow et al., A Handbook for DNA-Encoded Chemistry: Theory and Applications for Exploring Chemical Space and Drug Discovery, New York: John Wiley and Sons (2014);

[0121] and Collman et al., Principles and Applications of Organotransition Metal Chemistry, University Science Books, Mill Valley, Calif. (1987), Chapters 13 to 20; each of which is incorporated herein by reference in its entirety.

[0122] In some embodiments, the building block can include one or more functional groups in addition to the reactive group or groups employed to attach (e.g., react) a building block. One or more of these additional functional groups can be protected to prevent undesired reactions of these functional groups. Suitable protecting groups are known in the art for a variety of functional groups (Greene and Wuts, Protective Groups in Organic Synthesis, second edition, New York: John Wiley and Sons (1991), incorporated herein by reference in its entirety). Particularly useful protecting groups include Fmoc-groups, t-butyl esters and carbamates, acetals, trityl ethers and amines, acetyl esters, trimethylsilyl ethers, trichloroethyl ethers and esters and carbamates.

[0123] The type of building block is not generally limited, so long as the building block is compatible with one or more reactive groups capable of forming a covalent bond with other building blocks. In some embodiments, the building block is not a nucleic acid or nucleic acid analog.

[0124] Suitable building blocks include, but are not limited to, a peptide, a saccharide, a glycolipid, a lipid, a proteoglycan, a glycopeptide, a sulfonamide, a nucleoprotein, a urea, a carbamate, a vinylogous polypeptide, an amide, a vinylogous sulfonamide peptide, an ester, a saccharide, a carbonate, a peptidylphosphonate, an azatide, a peptoid (oligo N-substituted glycine), an ether, an ethoxyformacetal oligomer, a thioether, an ethylene, an ethylene glycol, a disulfide, an arylene sulfide, a nucleotide, a morpholino, an imine, a pyrrolinone, an ethyleneimine, an acetate, a styrene, an acetylene, a vinyl, a phospholipid, a siloxane, an isocyanide, an isocyanate, and a methacrylate. In certain embodiments, the (B1)M or (B2)K of formula (I) each independently represents a polymer of these building blocks having M or K units, respectively, including a polypeptide, a polysaccharide, a polyglycolipid, a polylipid, a polyproteoglycan, a polyglycopeptide, a polysulfonamide, a polynucleoprotein, a polyurea, a polycarbamate, a polyvinylogous polypeptide, a polyamide, a polyvinylogous sulfonamide peptide, a polyester, a polysaccharide, a polycarbonate, a polypeptidylphosphonate, a polyazatide, a polypeptoid (oligo N-substituted glycine), a polyether, a polyethoxyformacetal oligomer, a polythioether, a polyethylene, a polyethylene glycol, a polydisulfide, a polyarylene sulfide, a polynucleotide, a polymorpholino, a polyimine, a polypyrrolinone, a polyethyleneimine, a polyacetate, a polystyrene, a polyacetylene, a polyvinyl, a polyphospholipid, a polysiloxane, a polyisocyanide, a polyisocyanate, and a polymethacrylate. In certain embodiments, from about 50% to about 100%, including from about 60% to about 95%, and including from about 70% to about 90% of the building blocks have a molecular weight of from about 30 to about 500 Daltons, including from about 40 to about 350 Daltons, including from about 50 to about 200 Daltons.
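By way of illustration only, the molecular weight criterion above can be checked for a given set of building blocks with a short calculation. The following is a minimal sketch, assuming the molecular weights are already known; the function name `fraction_in_range` and the listed weights are hypothetical and are not taken from the disclosure, and the hard bounds are a simplification of the "about" ranges recited above.

```python
def fraction_in_range(weights, low, high):
    """Return the fraction of molecular weights (in Daltons) within [low, high]."""
    if not weights:
        raise ValueError("no molecular weights supplied")
    in_range = sum(1 for w in weights if low <= w <= high)
    return in_range / len(weights)


# Hypothetical molecular weights in Daltons, for illustration only.
weights = [75.07, 89.09, 131.17, 204.23, 612.7]

# Fraction of this hypothetical set falling in the 30 to 500 Dalton window.
fraction = fraction_in_range(weights, 30, 500)
```

The same function may be called with the narrower windows (e.g., 40 to 350 Daltons) to compare a candidate set against each recited range.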

[0125] It is understood that building blocks having two reactive groups would form a linear oligomeric or polymeric structure, or a linear non-polymeric molecule, containing each building block as a unit. It is also understood that building blocks having three or more reactive groups could form molecules with branches at each building block having three or more reactive groups.

[0126] In some embodiments, the first building block is attached to an oligonucleotide G by a linker. In some embodiments, the oligonucleotide G comprises a first linker and a second linker. In some embodiments, the first linker is different from the second linker. In some embodiments, the oligonucleotide G comprises two or more linkers. The term "linker" as used herein refers to a bifunctional molecule or a portion thereof, which attaches a building block to the oligonucleotide G. In some embodiments, the building block is attached to the linker by a covalent bond.

[0127] Various commercially available linkers are amenable to the applications of the present methods. Examples of linkers may include, but are not limited to, PEG (e.g., azido-PEG-NHS, or azido-PEG-amine, or di-azido-PEG), or an alkane acid chain moiety (e.g., 5-azidopentanoic acid, (S)-2-(azidomethyl)-1-Boc-pyrrolidine, 4-azidoaniline, or 4-azido-butan-1-oic acid N-hydroxysuccinimide ester); thiol-reactive linkers, such as PEG-based linkers (e.g., SM(PEG)n NHS-PEG-maleimide) or alkane chains (e.g., 3-(pyridin-2-yldisulfanyl)-propionic acid-OSu or sulfosuccinimidyl 6-(3'-[2-pyridyldithio]-propionamido)hexanoate); and amidites for oligonucleotide synthesis, such as amino modifiers (e.g., 6-(trifluoroacetylamino)-hexyl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite), thiol modifiers (e.g., S-trityl-6-mercaptohexyl-1-[(2-cyanoethyl)-(N,N-diisopropyl)]-phosphoramidite), or chemically co-reactive pair modifiers (e.g., 6-hexyn-1-yl-(2-cyanoethyl)-(N,N-diisopropyl)-phosphoramidite, 3-dimethoxytrityloxy-2-(3-(3-propargyloxypropanamido)propanamido)propyl-1-O-succinoyl, long chain alkylamino CPG, or 4-azido-butan-1-oic acid N-hydroxysuccinimide ester); and compatible combinations thereof.

[0128] Many kinds of chemistry are available for use in this invention (e.g., for reaction of a building block with another building block). In theory, any chemical reaction could be used that does not chemically alter DNA. Reactions that are known to be sufficiently DNA compatible include but are not limited to: Wittig reactions, Heck reactions, Horner-Wadsworth-Emmons reactions, Henry reactions, Suzuki couplings, Sonogashira couplings, Huisgen reactions, reductive aminations, reductive alkylations, peptide bond reactions, peptoid bond forming reactions, acylations, SN2 reactions, SNAr reactions, sulfonylations, ureations, thioureations, carbamoylations, formation of benzimidazoles, imidazolidinones, quinazolinones, isoindolinones, thiazoles, imidazopyridines, diol cleavages to form glyoxals, Diels-Alder reactions, indole-styrene couplings, Michael additions, alkene-alkyne oxidative couplings, aldol reactions, Fmoc-deprotections, trifluoroacetamide deprotections, Alloc-deprotections, Nvoc-deprotections and Boc-deprotections. (See Handbook for DNA-Encoded Chemistry (Goodnow R. A., Jr., Ed.) pp 319-347, 2014 Wiley, N.Y.; March, Advanced Organic Chemistry, fourth edition, New York: John Wiley and Sons (1992), Chapters 10 to 16; Carey and Sundberg, Advanced Organic Chemistry, Part B, Plenum (1990), Chapters 1-11; and Collman et al., Principles and Applications of Organotransition Metal Chemistry, University Science Books, Mill Valley, Calif. (1987), Chapters 13 to 20; each of which is incorporated herein by reference in its entirety.)

EXAMPLES

[0129] The application may be better understood by reference to the following non-limiting examples, which are provided as exemplary embodiments of the application. The following examples are presented in order to more fully illustrate embodiments and should in no way be construed, however, as limiting the broad scope of the application. While certain embodiments of the present application have been shown and described herein, it will be obvious that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the spirit and scope of the invention. It should be understood that various alternatives to the embodiments described herein may be employed in practicing the methods described herein.

Example 1. Serial enrichment of an oligonucleotide and use of the oligonucleotide in the synthesis of an encoded molecule

[0130] This example demonstrates the serial enrichment of a portion of a plurality of oligonucleotides G from a pool of oligonucleotides G. Further, this example demonstrates the use of the serially enriched oligonucleotide G in the synthesis of an encoded molecule.

[0131] A pool of oligonucleotides G is provided, wherein each of the oligonucleotides G comprises a coding region and a reactive portion. The pool of oligonucleotides G comprises portions that have combinations of codons that are distinct from other portions of the pool. The pool of oligonucleotides G is loaded in a separation medium onto a system (e.g., a system such as a system 100 shown in FIG. 1) comprising one or more features comprising capture oligonucleotides. An electric current is applied to the system for at least a time sufficient to migrate a first portion of the pool of oligonucleotides G to a first feature of the system. The first portion of the pool of oligonucleotides G may hybridize with the first capture oligonucleotides on the first feature.
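The capture step described above may be pictured with a minimal sketch, assuming that specific hybridization is modeled as an exact reverse-complement match between a codon and a capture oligonucleotide. The sequences, the fixed codon length, and the names `reverse_complement` and `sort_pool` are hypothetical and chosen only for illustration; actual hybridization depends on temperature, buffer, and sequence context, none of which is modeled here.

```python
# Illustrative toy model only: hybridization is reduced to exact
# reverse-complement matching, which is an assumption of this sketch.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sort_pool(pool, features):
    """Route each oligonucleotide to the first feature in series that captures it.

    pool     -- list of oligonucleotides, each given as a list of codon strings
    features -- ordered list of (feature_name, capture_sequence) pairs
    Returns a dict mapping feature names (plus "flow-through") to captured oligos.
    """
    sorted_pool = {name: [] for name, _ in features}
    sorted_pool["flow-through"] = []
    for oligo in pool:
        for name, capture in features:
            # A feature captures the oligo if any codon pairs with its capture sequence.
            if reverse_complement(capture) in oligo:
                sorted_pool[name].append(oligo)
                break
        else:
            # No feature matched: the oligo migrates past all features.
            sorted_pool["flow-through"].append(oligo)
    return sorted_pool


# Hypothetical codons and capture sequences.
pool = [
    ["ACGTAC", "TTGACC"],
    ["GGTTCA", "TTGACC"],
    ["CATCAT", "AGAGAG"],
]
features = [
    ("feature_1", reverse_complement("ACGTAC")),
    ("feature_2", reverse_complement("GGTTCA")),
]
result = sort_pool(pool, features)
```

In this sketch an oligonucleotide with no matching codon is collected under "flow-through", which loosely corresponds to a portion of the pool that is not captured on any feature.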

[0132] Following hybridization of the first portion of the pool of oligonucleotides G with the first feature, the first feature may be separated from the system. The captured portion of oligonucleotides G remains associated with the capture oligonucleotides of the feature following separation of the feature from the system. Upon separation of the feature from the system, the captured portion of oligonucleotides G may be eluted from the feature. Before elution, the feature may be washed with wash buffer to remove unbound portions of the oligonucleotide G.

[0133] In order to further enrich the captured portion of oligonucleotides G, the eluted portion of oligonucleotides G may be re-loaded in a separation medium onto a system (e.g., a system comprising one or more features comprising capture oligonucleotides, as exemplified in FIG. 1). An electric current is applied to the system for at least a time sufficient to migrate a first portion of the eluted portion of oligonucleotides G to a first feature of the system. The first portion of the eluted portion of oligonucleotides G may hybridize with at least one of the first capture oligonucleotides on the first feature.

[0134] Following hybridization of the first portion of the eluted portion of oligonucleotides G with the first feature, the first feature may be separated from the system. The captured portion of the eluted portion of oligonucleotides G remains associated with the capture oligonucleotides of the feature following separation of the feature from the system, thereby generating a serially enriched portion of oligonucleotides G that specifically hybridize with the first feature (e.g., the capture oligonucleotides of the first feature). Optionally, the serial enrichment steps may be repeated to further enrich a portion of the plurality of oligonucleotides G.
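The effect of repeating the capture and elution steps may be illustrated with a simple numerical sketch. The model below assumes that each round recovers a fixed fraction of the specifically hybridizing oligonucleotides and carries over a smaller fixed fraction of the remainder. The values of `capture_eff` and `carryover`, the starting amounts, and the function names are hypothetical assumptions made for illustration and are not measurements from this disclosure.

```python
def enrich(target, off_target, capture_eff=0.9, carryover=0.05):
    """Model one capture-and-elute round.

    Returns the recovered (target, off_target) amounts, assuming a fixed
    capture efficiency for the target and a fixed nonspecific carryover.
    """
    return target * capture_eff, off_target * carryover


def serial_enrichment(target, off_target, rounds, capture_eff=0.9, carryover=0.05):
    """Apply repeated rounds and record the target fraction after each round."""
    fractions = []
    for _ in range(rounds):
        target, off_target = enrich(target, off_target, capture_eff, carryover)
        fractions.append(target / (target + off_target))
    return fractions


# Hypothetical starting pool: 1 part target to 99 parts other oligonucleotides.
fractions = serial_enrichment(target=1.0, off_target=99.0, rounds=3)
```

Under these assumed parameters the target fraction rises with each round while the absolute amount of target recovered falls, which is the trade-off a practitioner would weigh when choosing how many rounds to perform.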

[0135] The serially enriched oligonucleotides G may then be used to synthesize an encoded molecule. The encoded molecule may be synthesized on a feature (e.g., the feature that has been separated from the system used to electrophoretically sort and serially enrich a portion of the pool of oligonucleotides G). At least one charged positional building block (e.g., a positional building block comprising an anti-codon) may be introduced to the feature comprising the serially enriched oligonucleotides G. The anti-codon of the charged positional building block may hybridize with a codon of the serially enriched oligonucleotides G that are captured on the feature. The reactive site present on the serially enriched oligonucleotides G may be reacted with the building block of the charged positional building block to form a covalent bond between the reactive site and the charged positional building block, thereby synthesizing an encoded molecule.

[0136] Alternatively, the encoded molecule may be synthesized on an immobilization array. The serially enriched portion of the pool of oligonucleotides that is captured on the feature may be separated from the system. Upon separation from the system, the serially enriched oligonucleotides G may be eluted from the feature. The eluted serially enriched oligonucleotides G may be immobilized on an immobilization array (e.g., an immobilization array comprising a solid phase that is an anion exchange resin). The eluted serially enriched oligonucleotides G can bind to the solid phase on the immobilization array. At least one positional building block comprising a reactive site may be introduced to the immobilization array comprising the serially enriched oligonucleotides G. The reactive site present on the serially enriched oligonucleotides G may be reacted with the reactive site of the positional building block to form a covalent bond between the reactive site of the oligonucleotide G (in the encoded region) and the reactive site of the positional building block. Repeating the synthesis steps allows for synthesizing an encoded molecule at the encoded region of the oligonucleotide G.

[0137] The synthesized encoded molecule corresponds to and may be identified by the coding region of the serially enriched oligonucleotide G. The encoded molecules may be subjected to downstream analysis for selection of encoded molecules possessing specific properties (e.g., binding to a particular target molecule). The coding region of the encoded molecules selected for said properties can be PCR amplified to determine the identity of the building blocks of the encoded molecules.
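The correspondence between a coding region and the building blocks of an encoded molecule may be sketched as a positional lookup. The example below assumes contiguous, fixed-length codons with no intervening sequence and one lookup table per codon position. The codon length, the sequences, the building block names, and the function `decode` are hypothetical and are used for illustration only.

```python
# Hypothetical codon length and per-position codon tables (assumptions).
CODON_LENGTH = 6
CODON_TABLES = [
    {"ACGTAC": "amine A", "GGTTCA": "amine B"},   # position 1
    {"TTGACC": "acid C", "AGAGAG": "acid D"},     # position 2
]


def decode(coding_region):
    """Return the building blocks encoded by a sequenced coding region."""
    expected = CODON_LENGTH * len(CODON_TABLES)
    if len(coding_region) != expected:
        raise ValueError(f"expected {expected} bases, got {len(coding_region)}")
    blocks = []
    for position, table in enumerate(CODON_TABLES):
        start = position * CODON_LENGTH
        codon = coding_region[start:start + CODON_LENGTH]
        # A KeyError here means the codon is not in the table for this position.
        blocks.append(table[codon])
    return blocks


# Decode one hypothetical coding region read.
blocks = decode("GGTTCATTGACC")
```

A per-position table is used so that the same codon sequence could, in principle, denote different building blocks at different positions; a single shared table would serve equally well if codons are unique across positions.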

Claims

CLAIMS In the claims:

1. A method of sorting a plurality of oligonucleotides G, wherein each oligonucleotide G comprises a plurality of codons and a reactive site, the method comprising:

(a) providing a system comprising a hybridization array, wherein the hybridization array comprises a first feature and a second feature, wherein the first and second features are electrophoretically coupled in series by an aqueous separation medium;

(b) loading the separation medium at a position upstream of the first and second features with the plurality of oligonucleotides G;

(c) applying an electric current across the system for at least a time sufficient to migrate a first portion of the plurality of oligonucleotides G to the first feature and a second portion of the plurality of oligonucleotides G to the second feature; wherein the first feature comprises a multiplicity of first capture oligonucleotides and the second feature comprises a multiplicity of second capture oligonucleotides; wherein the first portion of the plurality of oligonucleotides G comprise a codon which specifically hybridizes with the first capture oligonucleotides; wherein the second portion of the plurality of oligonucleotides G comprise a codon which specifically hybridizes with the second capture oligonucleotides.

2. The method of claim 1, wherein the electric current is applied for at least a time sufficient to migrate a third portion of the plurality of oligonucleotides G to a position downstream of the first and second features; wherein the third portion of the plurality of oligonucleotides G does not comprise codons which can specifically hybridize with either of the first or second capture oligonucleotides.

3. The method of claim 2, wherein:

(a) the system comprises a third feature, wherein the first, the second, and the third features are electrophoretically coupled in series by the aqueous separation medium;

(b) wherein the electric current is applied for at least a time sufficient to migrate the third portion of the plurality of oligonucleotides G to the third feature; wherein the third feature comprises a multiplicity of third capture oligonucleotides;

wherein the third portion of the plurality of oligonucleotides G comprise a codon which specifically hybridizes with the third capture oligonucleotides.

4. The method of claim 3, wherein the electric current is applied for at least a time sufficient to migrate a fourth portion of the plurality of oligonucleotides G to a position downstream of the first, the second, and the third features; wherein the fourth portion of the plurality of oligonucleotides G does not comprise codons which can specifically hybridize with any of the first, the second, or the third capture oligonucleotides.

5. The method of claim 4, wherein:

(a) the system comprises a fourth feature, wherein the first, the second, the third, and the fourth features are electrophoretically coupled in series by the aqueous separation medium;

(b) wherein the electric current is applied for at least a time sufficient to migrate the fourth portion of the plurality of oligonucleotides G to the fourth feature; wherein the fourth feature comprises a plurality of fourth capture oligonucleotides; wherein the fourth portion of the plurality of oligonucleotides G comprise a codon which specifically hybridizes with the fourth capture oligonucleotides.

6. The method of claim 5, wherein the electric current is applied for at least a time sufficient to migrate a fifth portion of the plurality of oligonucleotides G to a position downstream of the first, the second, the third, and the fourth features; wherein the fifth portion of the plurality of oligonucleotides G does not comprise codons which can specifically hybridize with any of the first, the second, the third, or the fourth capture oligonucleotides.

7. The method of any one of claims 1-6, further comprising separating a feature from the system.

8. The method of any one of claims 1-7, further comprising eluting a portion of the plurality of oligonucleotides G from a feature.

9. The method of claim 8, wherein the eluted portion is eluted from the separated feature.

10. A method of synthesizing an encoded molecule, the method comprising:

(a) introducing a charged positional building block to a hybridization array comprising a feature, wherein the feature comprises a portion of oligonucleotides G after sorting according to the method of any one of claims 1-7;

(b) reacting the reactive site present on the portion of the oligonucleotides G with the charged positional building block to form a covalent bond between the reactive site and the charged positional building block.

11. A method of synthesizing an encoded molecule, the method comprising:

(a) immobilizing the eluted portion of the plurality of oligonucleotides G of claim 8 or claim 9 on an immobilization array;

(b) introducing a positional building block comprising a reactive site to the immobilization array;

(c) reacting the immobilized serially enriched plurality of oligonucleotides G with the reactive site of the positional building block to form a covalent bond between the reactive site of the oligonucleotide G and the reactive site of the positional building block.

12. The method of claim 11, wherein the immobilization array is an ion exchange resin.

13. A method of serially enriching a plurality of oligonucleotides G, the method comprising:

(a) obtaining a first set of oligonucleotides G that is the eluted portion of the plurality of oligonucleotides G according to claim 8 or claim 9;

(b) repeating the method of claim 8 or claim 9 at least once using the first set of oligonucleotides G to obtain a serially enriched plurality of oligonucleotides G.

14. The method of claim 13, wherein the method of claim 8 is repeated at least twice to obtain the serially enriched plurality of oligonucleotides G.

15. The method of claim 13 or claim 14, wherein the eluted portion of the plurality of oligonucleotides G was sorted by hybridization of a first codon to a first capture oligonucleotide and the serially enriched plurality of oligonucleotides G was sorted at least by hybridization of a second codon to a second capture oligonucleotide, wherein the first and second codons are different.

16. A method of synthesizing an encoded molecule, the method comprising:

(a) immobilizing the serially enriched plurality of oligonucleotides G of any one of claims 13-15 on an immobilization array;

17. The method of claim 16, wherein the immobilization array is an ion exchange resin.

18. The method of any one of claims 11, 12, 16, or 17, wherein the building block is not a nucleic acid or nucleic acid analog.