METHODS AND COMPOSITIONS FOR PRODUCTION OF DIRECTED
SEQUENCE LIBRARIES
CROSS REFERENCE TO RELATED APPLICATIONS [0001] This application claims the benefit of U.S. Provisional Patent Application
Serial No. 60/383,208, filed May 24, 2002, the disclosure of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD [0002] The invention provides methods and reagents for producing a directed library that includes sequences corresponding to portions of a polynucleotide target of interest.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR
DEVELOPMENT [0003] This invention was made in part during work supported by grant no. 65236-
01-1-5437 from the Defense Advanced Research Projects Agency (DARPA). The government has certain rights in the invention.
BACKGROUND OF THE INVENTION [0004] The chemical synthesis tools of molecular biology and recombinant DNA technology have made possible the construction of libraries which contain pools of different nucleotide sequences. Libraries that contain small fragments containing sequences of a gene of interest or its complement are useful for a variety of applications. A library may be constructed from fragments generated by enzymatic digestion of the gene (Matveeva et al. (1997) Nucleic Acids Res. 25:5010-16; Pierce and Ruffner (1998) Nucleic Acids Res. 26:5093-5101; WO 99/50457).
[0005] The most important requirements for diagnostic and therapeutic methods based on hybridization between probe and target nucleic acids are sequence specificity and strong probe-target interactions. Such applications include microarrays (Southern et al. (1999) Nat. Genet. 21:5-9), competitive RT-PCR (Ishibashi (1997) J. Biochem. Biophys. Methods 35:203-207), and ligation-mediated amplification assays (Landegren et al. (1996) Methods 9:84-90), as well as antisense (Bruice & Lima (1997) Biochemistry 36:5004-5019;
Sohail & Southern (2000) Adv. Drug Deliv. Rev. 44:23-34), ribozyme (Scarabino & Tocchini-Valentini (1996) FEBS Lett. 383:185-190; Amarzguioui et al. (2000) Nucleic Acids Res. 28:4113-4124) and siRNA (Holen et al. (2002) Nucleic Acids Res. 30:1757-1766; Shi (2003) TRENDS Genetics 19:9-12; Bohula et al. (2003) J. Biol. Chem. 278:15991-15997) approaches to gene inhibition. The specificity and efficacy of probe-target interactions depend on parameters such as target accessibility, hybridization rate, and the stability of the formed duplex (Sczakiel and Far (2002) Curr. Opin. Mol. Ther. 4:149-153). Because of the complexity of these interactions, the rational design methods, both experimental and theoretical, that have been developed for predicting optimal probe sequences and target site accessibility have had only limited success (Sczakiel & Far (2002) Curr. Opin. Mol. Ther. 4:149-153; Sohail & Southern (2000) Adv. Drug Deliv. Rev. 44:23-34). Also, the common notion that sequences that are less involved in internal hydrogen bonding interactions represent more favorable target sequences is an oversimplification (Sczakiel & Far (2002) Curr. Opin. Mol. Ther. 4:149-153; Fakler et al. (1994) J. Biol. Chem. 269:16187-16194; Laptev et al. (1994) Biochemistry 33:11033-11039). Target RNAs are often folded differently in the cell than in vitro (Lindell et al. (2002) RNA 8:534-541), and may be complexed with proteins that further reduce target site accessibility (Lieber & Strauss (1995) Mol. Cell Biol. 15:540-551). Conversely, some cellular factors may promote probe hybridization with target sites that are not accessible in vitro (Laptev et al. (1994) Biochemistry 33:11033-11039; Bertrand & Rossi (1994) EMBO J. 13:2904-2912).
[0006] As a consequence of this complexity, optimal hybridizing sequences (both sense and antisense) cannot be rationally selected based on sequence data and predicted or experimentally-determined target accessibility. To address this problem, several in vitro and in vivo methods for selecting target sequences from sequence libraries have been developed, using 5-30 nucleotide long variable sequences (Lieber & Strauss (1995) Mol. Cell. Biol. 15:540-551; Allawi et al. (2001) RNA 7:314-327; Lloyd et al. (2001) Nucleic Acids Res. 29:3664-3673; Ho et al. (1998) Nat. Biotechnol. 16:59-63; Birikh et al. (1997) RNA 3:429-437; Lima et al. (1997) J. Biol. Chem. 272:626-638; Wrzesinski et al. (2000) Nucleic Acids Res. 28:1785-1793; Scherr et al. (2001) Mol. Ther. 4:454-460; Milner et al. (1997) Nat. Biotechnol. 15:537-541; Patzel & Sczakiel (2000) Nucleic Acids Res. 28:2462-2466; Yu et al. (1998) J. Biol. Chem. 273:23524-23533; WO 00/43538; WO 02/24950). An additional advantage of such libraries is that they can be used in a "reverse" genomics approach, which can identify genes responsible for a specific phenotype without prior knowledge of any sequence information (Li et al. (2000) Nucleic Acids Res. 28:2605-2612; Kawasaki & Taira (2002) Nucleic Acids Res. 30:3609-3614).
[0007] Screening for polynucleotide-based drugs, diagnostic probes, primers, and genomics tools may be performed by using a library of random polynucleotide sequences. However, this approach is very inefficient due to the large number of sequences which must be screened (i.e., >4^17, or ~10^10, molecules for >17-nucleotide hybridizing regions), which prevents or restricts use of these libraries in many applications, such as, for example, cell-based screening studies and microarrays. The preparation and screening of such libraries is very expensive, as well as time- and labor-intensive. Further, a completely random 11-nucleotide library has been shown to exhibit hybridization patterns that are far less intense than those generated with semi-random libraries, due to the greater likelihood of cross-hybridization among members of the library (Ho et al. (1996) Nucleic Acids Res. 24:1901-07). Random libraries also have the undesirable potential to target all cellular nucleic acids, rather than just the specific target of interest. Therefore, in in vivo applications, such libraries can block expression of housekeeping genes or other unintended targets, as well as inhibiting the activities of structural and functional RNAs such as rRNA, tRNA, snRNA, H1 RNA, etc.
[0008] Directed sequence libraries, which contain sequences that correlate with and/or are complementary to the sequence of a target polynucleotide of interest, offer a useful alternative to screening entirely random libraries. Such directed libraries are useful for gene-specific applications, and offer a superior alternative to computer-assisted, theoretical design of polynucleotide probes and primers. The use of a directed sequence library significantly simplifies the screening process, since comparatively small libraries may be prepared and assayed. Further, non-specific, toxic effects on non-target genes are significantly diminished or eliminated, allowing directed sequence libraries to be cloned and expressed in intact cells, resulting in identification of optimal oligonucleotide probes and target sites under intracellular conditions.
[0009] The complexity of directed sequence libraries is typically 1000-10,000 (essentially the length of the target), whereas the complexity of completely random libraries is 4^n, where n is the length of the library sequence, which for a 20-mer library is in the range of 10^12. Because of the huge reduction in complexity, screening of directed
sequence libraries for useful sequences is enormously more efficient than screening of the random libraries.
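By way of illustration only, the complexity comparison above can be checked with a short calculation. The following Python sketch is not part of the disclosed method; the function names are hypothetical, and the directed-library estimate assumes, as a simplification, one library member per start position on a single strand of the target.

```python
def random_library_complexity(n: int) -> int:
    """Number of distinct sequences in a fully random n-mer library (4 bases per position)."""
    return 4 ** n


def directed_library_complexity(target_length: int, insert_length: int) -> int:
    """Rough upper bound on distinct directed inserts.

    Assumption (not from the source): one insert per start position on one
    strand of the target, so the count is approximately the target length.
    """
    return max(target_length - insert_length + 1, 0)


if __name__ == "__main__":
    for n in (17, 20):
        print(f"random {n}-mer library: {random_library_complexity(n):.2e} sequences")
    # Hypothetical 5,000-nt target and 20-nt inserts
    directed = directed_library_complexity(5000, 20)
    ratio = random_library_complexity(20) / directed
    print(f"directed library: {directed} sequences, about {ratio:.1e}-fold smaller")
```

Under these assumptions, a 17-mer random library exceeds 10^10 members and a 20-mer random library is on the order of 10^12, while the directed library remains close to the target length.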
[0010] One method that has been used for preparation of a directed sequence library is a multi-stage process for making a directed antisense library against a target transcript (Pierce and Ruffner (1998) Nucleic Acids Res. 26:5093-101; WO 99/50457). This method includes digesting a cloned cDNA target of interest with Exonuclease III/Mung bean nuclease, resulting in serial deletions extending into the cDNA sequence from one end. The resulting fragments with single-stranded ends are blunted with Pfu polymerase, followed by circularization of the DNA vector with T4 DNA ligase, and cloning. The resulting deletion library is digested with BsmFI and BbsI restriction (type IIS) endonucleases to recover 14 base pair segments of the cDNA, followed by blunting of the ends with Pfu polymerase. After gel purification, the DNA vector containing the 14 base pair directed library is circularized with T4 DNA ligase and cloned. This is a very complicated and time- and labor-intensive method. Also, Exo III exhibits a preference for stopping at particular sequence positions, e.g., AT base pairs, and certain local sequences and/or structures cause Exo III to either stall or fall off of the template. Consequently, not all sequences are equally represented in a library obtained by this method. Moreover, about 500 base pairs from each end of a target are missing in the library because of Exo III actions. Further, the size of such libraries is restricted in length (e.g., 14 base pairs) by the properties of the known type IIS restriction enzymes.
[0011] Another method for producing a directed library, described in WO 00/43538 and Bruckner et al. (2002) Biotechniques 33:874-882, includes hybridization of an immobilized DNA target with a randomized sequence of uniform length (20 nucleotides), flanked on each end by a defined primer sequence masked by complementary blocking oligonucleotides. This method suffers from several serious drawbacks. The complexity of the initial random library (4^20, or ~10^12) is higher than any target gene complexity (and even the entire human genome). The preparation and screening of such libraries is very time- and labor-intensive. It also requires immobilization of target polynucleotides. Hybridization with an immobilized target requires large volumes of hybridization solution and suffers from slow, inefficient binding of probes to target. This method also suffers from the cumbersome requirement of extra steps to separate bound from unbound probes and to elute bound probe from the target prior to amplification of the bound sequences. In addition, hybridization patterns obtained with a completely random 20-nucleotide library are expected to be far less intense than those obtained with shorter libraries, due to formation of complementary complexes among members of the library (see, e.g., Ho et al. (1996) Nucleic Acids Res. 24:1901-07). Even when a high initial concentration of the 20-nucleotide random library is used, the concentration of individual sequences in the random pool is not high enough to provide efficient hybridization with a DNA target (see, e.g., Wetmur (1991) Critical Rev. Biochem. Mol. Biol. 26:227-59). The method disclosed in WO 00/43538 is restricted by the use of long, immobilized DNA targets, which hybridize to oligonucleotide probes less efficiently than shorter, non-immobilized oligonucleotide fragments in solution (see, e.g., Armour et al. (2000) Nucleic Acids Res. 28:605-09; Southern et al. (1999) Nature Genet. Suppl. 21:5-9). Also, solid-phase hybridization methods produce high background due to the high stability of partially complementary GC-rich duplexes (e.g., with several mismatches) and nonspecific surface effects. In fact, WO 00/43538 suggests that the majority of the 20-mer sequences captured on an immobilized DNA target from the random oligonucleotide pool at 52°C will contain 4-8 mismatches, making them virtually insensitive to single nucleotide polymorphisms (SNPs).
[0012] Another method that has been used is described in Boiziau et al. (1999) J. Biol. Chem. 274:12730-12737, using a "template-assisted combinatorial strategy". Boiziau et al. selected DNA aptamers targeting an accessible binding site in an RNA hairpin, using both completely random libraries and libraries "enriched" in target-specific sequences. The "enriched sequences" were produced by ligation of "half-candidates" in the presence of an RNA hairpin using RNA ligase. The half-candidates were designed as hemi-random probes containing defined primer sequences and comparatively long 15-nt terminal random sequences, and were used without masking oligonucleotides in the ligation reaction. Both ligation methods showed low efficiency and target-specificity, which is a consequence of the preference of RNA ligase to ligate sequence motifs that are not aligned in complementary complexes (Harada and Orgel (1993) Proc. Natl. Acad. Sci. USA 90:1576-1579). Also, due to the lack of masking oligonucleotides, most ligation products were unrelated to the RNA target. Consequently, the authors found no benefit to using libraries prepared from hemi-random probes versus using probes with completely random 30-mer libraries without a ligation step.
[0013] In view of the foregoing, there is a need for an improved procedure for generating a directed sequence library which is highly specific for the target sequence from
which the library is generated and which does not suffer from the limitations of the methods described above.
BRIEF SUMMARY OF THE INVENTION
[0014] The invention provides methods, compositions, and kits for preparing directed sequence libraries. The invention includes hemi-random oligonucleotide probes with defined sequences and random ends, and masking sequences hybridized to the defined sequences of the probes. The random sequences of pairs of probes, when hybridized to adjacent positions on a polynucleotide target of interest, are ligated together. The target-dependent ligation products are amplified to provide a directed sequence library.
[0015] Accordingly, in one aspect, the invention provides a method for preparing a directed sequence library, including (a) combining a polynucleotide target and hemi-random oligonucleotide probes, (b) ligating probes which are hybridized to adjacent sequences on the target, and (c) amplifying ligated pairs of probes. Each hemi-random probe includes a defined nucleotide sequence along at least a portion of its length and a random sequence at its 5' or 3' end, and further includes a masking oligonucleotide hybridized to at least a portion of the defined sequence. The probes and the target are combined in solution under conditions that allow the random sequences of the probes to hybridize to the target. In some embodiments, random polynucleotides that can compete for hybridization with the target, but do not ligate to hemi-random probes, are included. The length of the competing polynucleotides is shorter than the length of the random region of the hemi-random probes, for example about 5 to about 9, often about 6 to about 8 nucleotides, when a hemi-random probe with a random sequence of about 10 nucleotides is used.
[0016] In embodiments of the methods of the invention, the nucleotide sequence of the target may be either known or unknown. In various embodiments, the target includes RNA, DNA, cDNA, mRNA, total cellular RNA, or genomic DNA.
In some embodiments, the target includes a viral genome, a bacterial genome, or a eukaryotic genome, or a portion thereof. In various embodiments, the target may be double stranded, denatured (e.g., by heat, alkali, or other means), single stranded, and/or fragmented. In some embodiments, the target may be extracted from an entire organism or a tissue thereof, from a microorganism or from a cell, such as a normal cell, an infected cell, or a cancer cell.
[0017] In embodiments of the methods of the invention, the random sequences of the hemi-random probes are of fixed and/or variable length. In some embodiments, the
random sequence of a probe is about 3 to about 100, about 3 to about 50, about 4 to about 15, or about 5 to about 10 nucleotides in length. In various embodiments, the random sequences are fully random or partially random.
[0018] In embodiments of the methods of the invention, the defined sequences of the hemi-random probes include a cleavage site for a restriction endonuclease. In some embodiments, the defined sequences include binding sites for primers, such as for example PCR amplification primers.
[0019] In embodiments of the methods of the invention, the masking oligonucleotides are on different oligonucleotides than the hemi-random probes. In other embodiments, each masking oligonucleotide is a part of the same oligonucleotide as a hemi-random probe and includes sequences complementary to and capable of self-hybridizing with at least a portion of the defined sequence of the probe.
[0020] In some embodiments of the methods of the invention, a fully random oligonucleotide that includes a 5'-terminal phosphate is included which, when hybridized to a position intermediate and immediately adjacent to two hemi-random probes, is ligated at each end to a random sequence of a hemi-random probe in a target-dependent manner. The random oligonucleotide may be about 3 to about 100, about 3 to about 50, about 4 to about 15, or about 5 to about 10 nucleotides in length.
[0021] In embodiments of the method of the invention, hemi-random probes that are hybridized to adjacent positions on the target are ligated with a ligase enzyme, for example a DNA ligase. In some embodiments, the ligase enzyme is a thermostable ligase. In other embodiments, ligation is performed chemically, for example, using a 5'-end activating group or a chemical condensing agent.
[0022] In some embodiments, the method of the invention further includes separating target and target-bound hemi-random probes from probes that are not bound to the target. Such embodiments typically include providing a target that is derivatized with one member of a pair of affinity ligands, e.g., biotin, followed by purification of targets and targets hybridized to probes on an affinity matrix that includes the other member of the pair of affinity ligands, e.g., avidin or streptavidin. In one embodiment, the target is biotinylated and the affinity matrix includes avidin- or streptavidin-conjugated magnetic beads, which may be collected by applying a magnetic field. Alternatively, target-bound and unbound probes may be separated by non-denaturing gel electrophoresis.
[0023] In embodiments of the methods of the invention, ligated hemi-random probes are amplified, for example by polymerase chain reaction. In some embodiments, the method includes addition of two primers suitable for polymerase chain reaction, wherein one primer is complementary to at least a portion of the defined sequence of one of the ligated probes and the other primer is complementary to at least a portion of the complement of the defined sequence of the other ligated probe. In some embodiments, the amplified ligated hemi-random probes are inserted into a vector, such as a cloning vector or an expression vector. In further embodiments, the vectors are introduced into host cells. In some embodiments, the amplified polynucleotides are inserted into an expression template/cassette that allows transcription of the directed library in vitro and then transfection of the transcripts into the cells.
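The primer arrangement described above may be easier to follow with a concrete sketch. The Python example below is illustrative only: the defined and random sequences are hypothetical placeholders, and the layout (a left probe with its random region at the 3' end joined to a right probe with its random region at the 5' end) is an assumed arrangement consistent with the description, not a required one.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ligate(left_defined: str, left_random: str,
           right_random: str, right_defined: str) -> str:
    """Model a ligated probe pair as 5'-[defined][random][random][defined]-3'."""
    return left_defined + left_random + right_random + right_defined


def design_primers(left_defined: str, right_defined: str) -> tuple[str, str]:
    """Forward primer matches the left defined sequence (it is complementary to
    the complement of that sequence); reverse primer is complementary to the
    right defined sequence."""
    return left_defined, reverse_complement(right_defined)


def extract_insert(product: str, forward: str, reverse: str):
    """Return the sequence between the primer sites, or None when the product
    lacks either site and so would not be amplified exponentially."""
    tail = reverse_complement(reverse)
    if not (product.startswith(forward) and product.endswith(tail)):
        return None
    return product[len(forward):len(product) - len(tail)]


if __name__ == "__main__":
    # Hypothetical sequences chosen only for illustration
    left_def, right_def = "GACTGCGTACCATGC", "GCATCGTTAGCCTAG"
    product = ligate(left_def, "ACGTTGCAAT", "GGCATTACGA", right_def)
    fwd, rev = design_primers(left_def, right_def)
    print(extract_insert(product, fwd, rev))
```

In this sketch, a ligation product carrying only one of the two defined sequences returns no insert, which mirrors the requirement that both primer sites be present for amplification.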
[0024] In one aspect, the invention provides a host cell that includes a vector with amplified ligated hemi-random probes prepared according to a method of the invention. In another aspect, the invention provides a directed sequence library prepared according to a method of the invention.
[0025] In one aspect, the invention provides compositions for preparing directed sequence libraries. In one embodiment, compositions of the invention provide a hemi-random oligonucleotide probe which includes a defined sequence along at least a portion of its length and a random sequence at its 5' or 3' end, and further includes a masking oligonucleotide that is hybridized to at least a portion of the defined sequence. In various embodiments, the masking oligonucleotide is contained within a different or the same oligonucleotide as the hemi-random probe. In some embodiments, compositions of the invention include a mixture of probes with some probes including random sequences at their 5' ends and other probes including random sequences at their 3' ends. In some embodiments, compositions of the invention further include a polynucleotide target or targets and/or a ligase enzyme, for example a DNA ligase enzyme, such as a thermostable ligase enzyme. In some embodiments, compositions of the invention include a directed sequence library prepared by a method of the invention. In other embodiments, compositions include a pair of amplified ligated probes prepared by a method of the invention. In still further embodiments, compositions include a vector that includes a ligated amplification product produced by a method of the invention and/or a host cell including such a vector.
[0026] In one aspect, the invention provides a microarray that includes one or more directed sequence libraries, or directed sequence inserts that include amplified random sequences from pairs of hemi-random probes ligated in a target-dependent manner, prepared according to methods of the invention. In various embodiments, the microarray is in the form of RT-PCR primers, antisense, ribozymes, or small interfering RNA (siRNA, shRNA and miRNA).
[0027] In one aspect, the invention provides kits for producing a directed sequence library. In some embodiments, kits include hemi-random probes, masking oligonucleotides, ligases, buffers, primers, reagents for PCR amplification, and combinations thereof. In another aspect, the invention provides kits that include one or more directed sequence libraries prepared by methods of the invention. In a still further aspect, kits include microarrays that include a directed sequence library, or directed sequence inserts that include amplified random sequences from pairs of hemi-random probes ligated in a target-dependent manner, prepared according to methods of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0028] Fig. 1 schematically depicts several examples of designs for hemi-random probes. Fig. 1A depicts hemi-random probes with constant regions and 10-nucleotide random regions. Fig. 1B depicts hemi-random probes with masking oligonucleotides that are complementary to and hybridized to the constant regions. Fig. 1C depicts hemi-random probes with self-complementary hairpin masking regions. PS: defined primer sequence. RS: restriction site.
[0029] Fig. 2 schematically depicts examples of target-dependent ligation of hemi-random probes hybridized with a target. Fig. 2A depicts ligation of two hemi-random probes. Fig. 2B depicts ligation of two hemi-random probes to a bridging random oligonucleotide. Among multiple possible ligation products, only those containing both (left and right) primer sequences would be amplified by PCR.
[0030] Fig. 3 schematically depicts the binding of probes of the invention to polynucleotide targets. Fig. 3A depicts hemi-random probes with masking oligonucleotides hybridized to single-stranded regions of a denatured double-stranded DNA or single-stranded DNA, or RNA target. Fig. 3B depicts hemi-random probes with masking oligonucleotides hybridized to a fragmented single-stranded polynucleotide target.
[0031] Fig. 4 schematically depicts production of a directed sequence library that includes target-dependent ligation of hemi-random probes hybridized to a polynucleotide target. After ligation of the probes with a ligase, pairs of ligated probes are PCR amplified. Further treatment of the amplified polynucleotides with at least one restriction endonuclease releases amplified directed sequence (both sense and antisense) inserts, yielding a directed sequence library of sequences corresponding to the original target.
[0032] Fig. 5 shows a gel analysis of ligation products derived using the scheme in Fig. 4 with a denatured dsDNA target and T4 DNA ligase. Samples loaded on the gel were PCR amplification products of the ligation reaction. The gel was a native 10% polyacrylamide gel stained with ethidium bromide. Two types of probes were used: a 36-mer that included a 26-nt constant region and a 10-nt random region, and a 28-mer that included a 20-nt constant region and a 7-nt randomized region containing one fixed nucleotide position (depicted as "T"; G, C, or A could also be used) to reduce cross-hybridization among hemi-random probes. Ligations were performed at two different probe concentrations, plus or minus masking oligonucleotides, and plus or minus DNA target. The bottom of the figure schematically depicts examples of possible target-independent ligation of probes complementary to other probes, and target-dependent specific ligation.
[0033] Fig. 6 (A-F) represents nucleotide sequencing results for ligations of hemi-random 36-mers performed at various temperature and salt conditions. Regions corresponding to the sequences of the DNA target are highlighted. The target was a 1:1 mixture of SFV (Semliki Forest Virus) DNA fragment (7,378 bp) and SFV Helper DNA (5,092 bp) obtained by PCR from the plasmids pSFV, expressing the replicon genome, and pSFV Helper, expressing structural genes. Together, these sequences represent the full-length 12 kb SFV genomic target of interest. Both DNAs were double stranded and heat denatured before hybridization with the probes.
[0034] Fig. 7 is a chart representing the length distribution of target-matching library sequences found in the ligated probes. The diagram is based on results shown in Figs. 6A-F. Only sequences with 0-1 mismatches with the target sequences were scored.
[0035] Fig. 8 shows two histograms representing the distribution of 42 library sequences obtained for two different SFV DNA targets. The histograms are based on the results shown in Figs. 6A-F. Fig. 8A shows results obtained with a 7 Kb SFV DNA fragment. Fig. 8B shows results obtained with a 5 Kb SFV Helper DNA.
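The 0-1 mismatch criterion used for scoring sequenced library members can be sketched computationally. The Python example below is a simplified illustration under stated assumptions: it compares an insert against every same-length window on both strands of the target by Hamming distance, without gaps, and the function names and threshold parameter are hypothetical. It is not the scoring procedure actually used to prepare the figures.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def min_mismatches(insert: str, target: str):
    """Smallest mismatch count between the insert and any same-length window
    on either strand of the target; None if the insert is longer than the target."""
    k = len(insert)
    best = None
    for strand in (target, reverse_complement(target)):
        for i in range(len(strand) - k + 1):
            d = hamming(insert, strand[i:i + k])
            if best is None or d < best:
                best = d
    return best


def matches_target(insert: str, target: str, max_mismatches: int = 1) -> bool:
    """Apply a 0-1 mismatch style cutoff (threshold is an adjustable assumption)."""
    best = min_mismatches(insert, target)
    return best is not None and best <= max_mismatches


if __name__ == "__main__":
    target = "ATGGCGTACGTTAGCCTAGGCTA"  # hypothetical target fragment
    print(matches_target(target[3:13], target))
    print(matches_target("TTTTTTTTTT", target))
```

A gapless window scan is the simplest choice here; an alignment-based comparison would be needed to handle insertions or deletions.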
[0036] Fig. 9 schematically depicts examples of probe sequences bound to a target with full or partial complementarity. Fig. 9A depicts full complementarity with all 20 bases of an antisense region complementary to the target. Fig. 9B depicts partial complementarity. Fig. 9C depicts an internal mismatch.
[0037] Fig. 10 schematically shows use of a "tagged" target DNA fragment (e.g., biotinylated) capable of binding to an affinity matrix (e.g., streptavidin-coated magnetic beads), which allows removal of self-ligated probes by affinity purification of target-bound ligation products.
[0038] Fig. 11 represents nucleotide sequencing results for ligations of hemi-random 36-mers on a single-stranded cDNA target, corresponding to part of mouse TNF-alpha mRNA. Regions complementary to the sequences of the DNA target are highlighted. Fig. 11A represents the results using commonly used ligation conditions including 10 mM MgCl2 and 100 mM NaCl, and a reaction temperature of 25°C. Fig. 11B represents the results when the ligation reaction was performed in the presence of a 100-fold excess of competing random oligonucleotide hexamers. CS: length of complementary sequences; IMS: number of internally mismatched base-pairs.
[0039] Fig. 12 schematically depicts examples of competitive binding of hemi-random probes to a target in the presence of random oligonucleotides. Fig. 12A depicts binding of hemi-random probes with perfect or near perfect complementarity to the target. Fig. 12B depicts displacement of hemi-random probes with only partial complementarity to the target by random oligomers.
[0040] Fig. 13 schematically depicts binding interactions between hemi-random probes and a structured RNA target. After hybridization of the probes to a folded RNA target, only probes hybridized to adjacent accessible positions can be ligated and amplified. Since the hybridization events occur only in single-stranded and looped regions, this method identifies accessible target sites.
[0041] Fig. 14 schematically depicts insertion of a directed library into an expression vector (left-hand side of figure) and conversion of a directed library into an expression cassette/template (right-hand side of figure). Examples of types of constructs that may be expressed include antisense and triplex forming (both DNA and RNA), ribozyme (RNA) and deoxyribozyme (DNA), and small interfering RNA, including short double-stranded RNA (siRNA), short hairpin RNA (shRNA), and hairpin micro RNA (miRNA).
DETAILED DESCRIPTION OF THE INVENTION [0042] The invention provides methods and reagents for producing a directed sequence library. As used herein, "directed sequence library" or "directed library" refers to a plurality of nucleic acid molecules that includes sense and/or antisense sequences corresponding to portions of a polynucleotide sequence or sequences of interest. For example, a 17-20-mer directed sequence library produced using the actin gene as a polynucleotide target contains sequences 17-20 nucleotides in length that correspond to sequences contained within the actin gene. A directed sequence library may contain sequences of the target and/or their complements. The invention described herein includes rapid solution hybridization and target-dependent ligation of sequences selected from a random pool as an efficient, highly-specific means of generating a directed sequence library.
[0043] The present invention provides methods and compositions for greatly enriching completely random libraries in sequences related to the target(s) of interest. The directed libraries produced using methods of the invention allow efficient and cost-effective identification or selection of molecules having optimal hybridization characteristics to the target of interest. Sequences so selected may differ in length, G-C content, and number of sequence matches with the target; however, they will generally have in common the hybridization characteristics for which they were selected. Directed sequence libraries provide starting materials for a multitude of applications, including design of antisense and ribozyme-based oligonucleotide genomics tools and therapeutics, design of oligonucleotide diagnostic reagents for detection of infectious agents, genetic traits and diseases, production of microarrays that contain sequences corresponding to a gene of interest or its complement, production of affinity reagents for purifying or enriching a sequence of interest from a mixture, and selection and optimization of sequences that may be used to produce siRNA (small interfering RNA) molecules to inhibit a known or unknown gene of interest.
General Techniques
[0044] The practice of the present invention will employ, unless otherwise indicated, conventional techniques of molecular biology (including recombinant techniques), microbiology, cell biology, and biochemistry, which are within the skill of the art. Such techniques are explained fully in the literature, such as: "Molecular Cloning: A
Laboratory Manual," vol. 1-3, third edition (Sambrook et al., 2001); "Oligonucleotide Synthesis" (M.J. Gait, ed., 1984); "Methods in Enzymology" (Academic Press, Inc.); "Current Protocols in Molecular Biology" (F.M. Ausubel et al., eds., 1987); "PCR Cloning Protocols," (Yuan and Janes, eds., 2002, Humana Press).
Methods for Production of Directed Sequence Libraries
[0045] In one aspect, the invention provides methods for producing a directed sequence library. Methods of the invention include the use of hemi-random probes, with random ends and defined, non-random sequences that are complementary to and hybridized to other oligonucleotides called masking oligonucleotides. After hybridization of the random ends of the hemi-random probes to a target polynucleotide of interest, the random ends of pairs of probes that are hybridized to adjacent sequences on the target are ligated, and ligated probes are then amplified, resulting in a library of sequences that represent portions of the target that was used. Inclusion of masking sequences renders the defined sequences of the hemi-random probes double-stranded and serves to promote target-dependent ligation, rather than target-independent ligation of hemi-random probes hybridized to unmasked randomized sequences of other hemi-random probes. A schematic depiction of an illustrative embodiment of the method of the invention is shown in Fig. 4. Generally, methods of the invention are performed in very small volumes with targets and probes free in solution (i.e., not bound to a solid support), which allows very rapid and highly specific solution hybridization.
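As a conceptual aid, the outcome of the method can be modeled in silico. The Python sketch below enumerates an idealized directed library under strong simplifying assumptions that are not taken from the source: every pair of adjacent positions on the target is occupied by perfectly matched probes, each random region has the same fixed length, and both strands are recovered after amplification. Actual libraries are expected to vary in insert length and may contain mismatches; the function name and example target are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def idealized_directed_library(target: str, random_len: int = 10) -> set[str]:
    """Enumerate inserts formed by two adjacent, perfectly matched random regions.

    Each ligated pair spans 2 * random_len contiguous target nucleotides.
    Both the sense window and its antisense complement are recorded, since
    amplification yields both strands.
    """
    span = 2 * random_len
    library = set()
    for start in range(len(target) - span + 1):
        window = target[start:start + span]
        library.add(window)                      # sense insert
        library.add(reverse_complement(window))  # antisense insert
    return library


if __name__ == "__main__":
    target = "ATGGCGTACGTTAGCCTAGGCTAAC"  # hypothetical 25-nt target
    library = idealized_directed_library(target, random_len=10)
    print(len(library), "distinct 20-nt inserts")
```

Under these assumptions the number of distinct inserts grows linearly with target length, in contrast to the exponential growth of a fully random library with insert length.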
Targets
[0046] Methods of the invention include a polynucleotide target. As used herein, the term "target" refers to a polynucleotide or plurality (i.e., a set) of polynucleotides from which a directed sequence library is prepared.
[0047] As used herein, "polynucleotide" refers to a polymeric form of nucleotides of any length and any three-dimensional structure (e.g., single-stranded, double-stranded, triple-helical, etc.), which contain deoxyribonucleotides, ribonucleotides, and/or analogs or modified forms of deoxyribonucleotides or ribonucleotides. The term polynucleotide also includes peptide nucleic acids (PNA). Polynucleotides may be naturally occurring or non-naturally occurring. The terms "polynucleotide," "nucleic acid," and "oligonucleotide" are used interchangeably herein.
[0048] A target may be of known or unknown sequence. Suitable targets include any polynucleotides of interest. Targets may contain DNA, RNA or a combination of DNA and RNA, and may be single-stranded or double-stranded. Examples of targets include mRNA or cDNA (for example, either a single species or multiple species transcribed from a set of genes expressed under conditions of interest), DNA, single-stranded DNA, rRNA, a gene or gene fragment of interest, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, a viral genome or fragment of a viral genome, a bacterial genome or fragment of a bacterial genome, eukaryotic genomic DNA, mammalian genomic DNA, precursor (unprocessed) mRNA and tRNA (pre-mRNA and pre-tRNA), RNA-protein complexes, and DNA-protein complexes.
[0049] Polynucleotide targets may be prepared by a variety of methods. For example, a full-length ssDNA target may be obtained by asymmetric PCR (i.e., using only one PCR primer). Alternatively, a full-length ssDNA target may be obtained by standard PCR using both unmodified and biotinylated primers. After amplification, dsDNA may be loaded on an avidin column (or avidin-conjugated magnetic beads), and after a treatment that denatures the DNA (and not the avidin-biotin complex), only the biotinylated strand will remain bound to the column, while the non-biotinylated strand will flow through and be removed from the column. Another example of target preparation includes denaturation of a dsDNA target by heating at, for example, 95°C for about 2 minutes, followed by immediate cooling on ice. Denaturation may alternatively be accomplished in alkaline media, such as 0.25 M NaOH, followed by neutralization by HCl (Nickerson et al. (1990) Proc. Natl. Acad. Sci. USA 87:8923-27). Target fragments may be prepared using a "random PCR" technique (see, e.g., Wong et al. (1996) Nucleic Acids Res. 24:3778-83), which includes addition of hemi-random primers, typically including a 20 nucleotide constant primer-binding region and a 6 nucleotide random region, to a DNA of interest. Upon addition of Klenow fragment (or Taq polymerase), the hybridized primers are elongated, producing fragments of varied length that can be further amplified by PCR, if desired. If the polynucleotide target sequence is known, shorter fragments can be obtained by PCR or RT-PCR using several different primer pairs. Other methods of target preparation are available and are well known to those of skill in the art.
Hemi-Random Probes
[0050] The methods of the invention also include a plurality of hemi-random probes. As used herein, "hemi-random probe" or "probe" refers to an oligonucleotide containing both a defined, predetermined sequence and a random sequence.
[0051] The "defined sequence" or "predetermined sequence" is a sequence that is known and may include sequences that have practical utility, such as restriction sites or primer sites for amplification purposes. The defined sequence may be single-stranded, or may include a complementary sequence that self-hybridizes by forming a hairpin loop. The defined sequence may include DNA, RNA or a combination of DNA and RNA (i.e., a DNA-RNA chimera).
[0052] The "random" sequence is a random nucleotide sequence at either the 5' or 3' end of a probe. In methods of the invention, hemi-random probes are provided as a mixture of probes with 5' and 3' random end sequences. A random sequence at the 5' end of a probe contains a terminal phosphate group, while a random sequence at the 3' end of a probe contains a terminal hydroxyl group. The random sequence is generally single-stranded, and generally includes DNA rather than RNA at the ligation site. Probes of the invention are generally provided as a mixture of probes with 5' random end sequences and 3' random end sequences. The random sequence may be fully random or may be partially random (i.e., interrupted by at least one "fixed" nucleotide at a particular position within the random sequence). The random sequence may be of fixed or variable length.
[0053] Methods for producing the random sequences of the hemi-random probes are well known in the art. Such methods include, for example, randomized synthesis on a DNA synthesizer using standard protected nucleotide phosphoramidite chemistry and deprotection protocols to extend the nucleotide polymer from the 3' end, in the presence of a mixture of four phosphoramidite bases (e.g., A, C, G, T).
[0054] The length of the random sequence may be fixed or variable and is generally chosen such that the total length of directed sequence of a pair of ligated probes (i.e., random sequences that hybridized to the target) is of sufficient length to represent a unique sequence. For example, 17-nt random sequences are generally expected to be unique in the human genome (see, e.g., Saha et al. (2002) Nature Biotech. 19:508-12). One of skill in the art may readily determine a suitable length for the random sequence of the probes based on characteristics of the target, such as length and complexity and the purpose for which the library is to be used. The length of the random sequence is sometimes from about 3 to about 50, often from about 4 to about 15, more often from about 5 to about 10 nucleotides in length.
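As a rough illustration of the length considerations above, the number of distinct sequences of a given length, and the expected number of chance occurrences of one such sequence in a large genome, can be estimated as follows. This is a minimal sketch that assumes equiprobable, independent bases and an approximate genome size of 3 x 10^9 nucleotides; the function names and the genome-size constant are illustrative assumptions and are not taken from the methods described herein.

```python
def distinct_sequences(length: int) -> int:
    """Number of distinct sequences of the given length over A, C, G, T."""
    return 4 ** length


def expected_chance_matches(length: int, genome_size: int) -> float:
    """Expected chance occurrences of one specific sequence on one strand,
    assuming equiprobable, independent bases (a deliberate simplification)."""
    positions = max(genome_size - length + 1, 0)
    return positions / distinct_sequences(length)


# Assumed approximate genome size, used only for illustration.
HUMAN_GENOME_NT = 3_000_000_000

for n in (10, 17, 20):
    print(n, distinct_sequences(n), expected_chance_matches(n, HUMAN_GENOME_NT))
```

Under these simplifying assumptions, a 20-nt directed sequence formed from two ligated 10-nt random sequences would be expected to occur by chance well under once in a genome of this size, whereas a single 10-nt sequence would occur many times.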
Masking Oligonucleotides
[0055] The methods of the invention include masking oligonucleotides. As used herein, "masking oligonucleotide" refers to an oligonucleotide which contains a sequence that is complementary along at least a portion of its length with at least a portion of the defined sequence of a probe. The masking oligonucleotide may include any suitable nucleic acid or analog that is capable of hybridizing with at least a portion of the defined sequence of the probe, including, for example, DNA, RNA, and PNA. The masking oligonucleotide may be provided as a separate single-stranded oligonucleotide, or may be part of the probe sequence. When the masking oligonucleotide is part of the probe sequence, it is generally at the end opposite the random sequence and is capable of self-hybridizing with the defined sequence, for example by forming a hairpin loop. In methods of the invention, a masking oligonucleotide and a defined sequence of a probe hybridize to form a double-stranded sequence. (See Figs. 1B and 1C.)
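For a separate, fully complementary DNA masking oligonucleotide, the sequence is simply the reverse complement of the defined sequence of the probe. The following minimal sketch illustrates this; the defined sequence shown is a hypothetical example, and the function names are illustrative assumptions rather than part of the described methods.

```python
# Translation table for Watson-Crick complements of DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def masking_oligo(defined_sequence: str) -> str:
    """Fully complementary DNA masking oligonucleotide, written 5'->3'."""
    return reverse_complement(defined_sequence)


defined = "GGATCCTAATACGACTC"  # hypothetical defined sequence
mask = masking_oligo(defined)
print(mask)
```

A masking oligonucleotide that covers only a portion of the defined sequence, as the definition above permits, could be obtained by passing the corresponding slice of the defined sequence instead.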
Hybridization of Hemi-Random Probes to Target
[0056] The hemi-random probes, with hybridized masking sequences, are mixed with the polynucleotide target. If the target is double-stranded, it may be denatured, for example by heating to a high temperature followed by rapid cooling, prior to addition of the probes. The probes and target are mixed under conditions of appropriate stringency to allow random single-stranded sequences at the 5' or 3' end of probes to hybridize to complementary sequences on the target. In one embodiment, a pair of hemi-random probes hybridizes to adjacent positions on the polynucleotide target with 5' and 3' ends of random sequences adjacent to one another. In another embodiment, a third random "bridging" oligonucleotide is included that hybridizes to a position intermediate but directly adjacent to two hemi-random probes (i.e., the 5' end of the random sequence of one hemi-random probe and the 3' end of the random sequence of another hemi-random probe) (Fig. 2). The third, bridging oligonucleotide includes a 5' terminal phosphate which may be ligated.
[0057] "Hybridization" as used herein refers to association between two single-stranded polynucleotides to form a duplex via hydrogen bonding. Optimal hybridization conditions depend on a variety of factors, including the length and base compositions of the polynucleotides, the extent of base mismatching between the two polynucleotides, the presence of salt and organic solvents, polynucleotide concentration, and temperature. Generally, the higher the "stringency" of the hybridization conditions, the higher the sequence complementarity must be between two polynucleotides to allow them to hybridize. Appropriate hybridization conditions of varying stringency are widely known and published in the art (see, for example, Sambrook et al. (2001), "Molecular Cloning: A Laboratory Manual," third edition). Generally, high stringency hybridization conditions may be selected at about 2-5°C lower than the thermal melting point (Tm) for a specific double-stranded sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength and pH conditions) at which 50% of a polynucleotide sequence hybridizes to a perfectly matched (i.e., complementary) sequence. Typically, stringent conditions will be those in which the ionic strength is about 0.02 molar or lower at pH 7 and the temperature is at least about 60°C (although the hybridization temperature depends on the length of the oligonucleotides used). As other factors may significantly affect the stringency of hybridization, including, for example, nucleotide base composition and size of the complementary strands, the presence of organic solvents, salt, formamide, DMSO, or glycerol, and the extent of base mismatching, the combination of parameters is more important than the absolute measure of any one factor.
Examples of relevant hybridization conditions include (in order of increasing stringency): incubation temperatures of 25°C, 30°C, 35°C and 37°C; buffer concentrations of 10 X SSC, 6 X SSC, 1 X SSC, 0.1 X SSC (where SSC is 0.15 M NaCl and 15 mM citrate buffer) and their equivalents using other buffer systems; formamide concentrations of 0%, 25%, 50%, and 75%; incubation times from 5 minutes to 24 hours; 1, 2, or more washing steps; wash incubation times of 1, 2, or 15 minutes; and wash solutions of 6 X SSC, 1 X SSC, 0.1 X SSC, or deionized water.
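The SSC concentrations listed above can be converted to component concentrations by simple scaling from the stated 1 X composition (0.15 M NaCl and 15 mM citrate). The following minimal sketch performs that arithmetic; the function and constant names are illustrative assumptions.

```python
SSC_1X_NACL_M = 0.15      # NaCl in 1 X SSC, mol/L (as stated in the text)
SSC_1X_CITRATE_MM = 15.0  # citrate in 1 X SSC, mmol/L (as stated in the text)


def ssc_composition(fold: float) -> tuple[float, float]:
    """Return (NaCl in M, citrate in mM) for an X-fold SSC buffer."""
    if fold < 0:
        raise ValueError("fold concentration must be non-negative")
    return fold * SSC_1X_NACL_M, fold * SSC_1X_CITRATE_MM


# Listed in order of increasing stringency (decreasing salt).
for fold in (10, 6, 1, 0.1):
    nacl, citrate = ssc_composition(fold)
    print(f"{fold} X SSC: {nacl:.3f} M NaCl, {citrate:.1f} mM citrate")
```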
[0058] Generally, in methods of the invention, temperatures below 45 °C are used when the random sequence of the hemi-random probes is 10 nucleotides or less. The melting temperature of a 10-mer is within the range of 10-50°C, depending on GC-content and nucleotide sequence, and at high temperatures generally only a few 10-mers will be capable of hybridizing to a limited number of positions on the target. Elevated temperatures may be used when probes with longer (e.g., 15-30 nucleotide) random sequences are used.
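To illustrate how the melting temperature of a short random sequence varies with base composition, the following sketch applies the Wallace rule, a common rule of thumb for short oligonucleotides (about 2°C per A or T and 4°C per G or C). This rule is an assumption introduced here for illustration only; it is not the method used to obtain the range stated above, and it ignores salt, probe concentration, and nearest-neighbor effects.

```python
def wallace_tm(seq: str) -> int:
    """Approximate Tm (deg C) of a short DNA oligonucleotide by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G, T")
    return 2 * at + 4 * gc


# Hypothetical 10-mers spanning low, intermediate, and high GC content.
for ten_mer in ("AAAAAAAAAA", "ACGTACGTAC", "GCGCGCGCGC"):
    print(ten_mer, wallace_tm(ten_mer))
```

By this estimate, 10-mers span roughly 20°C to 40°C, which falls within the 10-50°C range given above and is consistent with the use of hybridization temperatures below 45°C for random sequences of 10 nucleotides or less.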
[0059] An example of hybridization of probes to a long double-stranded heat-denatured DNA target is depicted in Fig. 3A, and hybridization to a fragmented DNA target is depicted in Fig. 3B. Use of fragmented single-stranded DNA is desirable in some embodiments as a means of minimizing the masking effect of secondary structures of the target, which prevents or reduces hybridization of the probes.
[0060] Sequence-specifically hybridized probes may be fully or partially complementary to the target (Fig. 9). To produce a library that includes sequences that are directly complementary or identical to sequences of the target, high stringency conditions are used. To introduce a degree of diversity that is greater than the diversity of the sequences of the target, lower stringency conditions are used. Probes are typically used at low enough concentrations that target-independent ligation of probes is minimized but at high enough concentrations that target-dependent ligation is efficient. This is typically accomplished by using an excess of target.
[0061] Methods of the invention provide both uniform and non-uniform lengths of sequences complementary to target. In embodiments in which lengths are uniform, the uniform length minimizes or eliminates redundant sequences, reduces complexity, and makes libraries of the invention self-amplifiable and easily subjectable to subtractive hybridization and gel purification techniques. In previous methods, use of uniform length probes was subject to the disadvantageous limitation of non-uniform hybridization for polynucleotides with different GC contents or different sequences (see, e.g., Breslauer, et al. (1986) Proc. Natl. Acad. Sci. USA 83: 3746-50). The present invention circumvents this problem by allowing for non-uniform length of hybridizing regions formed by the ligation of hemi-random probes under uniform hybridization conditions.
Ligation
[0062] Hemi-random probes, with 5' and 3' ends of random sequences hybridized to adjacent positions on the target, are ligated together. In some embodiments, a third random oligonucleotide is included which is ligated to a hemi-random probe on each end when hybridized to a position intermediate but directly adjacent to two probes (Fig. 2). Ligation is "target dependent" because only sequences hybridized to adjacent positions on the target will be ligated. Unhybridized sequences, or those hybridized to non-adjacent positions, will not be ligated. Inclusion of masking sequences helps to prevent undesirable target-independent ligation of the probes which can occur when the probes hybridize to each other rather than to the target. Inclusion of a masking sequence which forms a duplex with the defined sequence of a probe prevents the defined sequence from being available to hybridize with a random sequence of another probe which happens to be complementary, and also prevents the defined sequence from being available to hybridize with the target. The masking sequences thus serve to promote target-dependent ligation of random probe sequences that are complementary to the target. Ligation may be performed either after, or concurrent with, hybridization of probes to the target.
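The adjacency requirement for target-dependent ligation can be modeled in a simplified way as follows. This is a minimal sketch that assumes a single-stranded DNA target, perfect base pairing, and no tolerance for mismatches; the target sequence, the function names, and the idealized model itself are illustrative assumptions rather than a description of actual ligase behavior.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ligatable(target: str, rand_3p: str, rand_5p: str) -> bool:
    """True if the two random sequences pair perfectly at directly adjacent
    target positions, so the 3'-OH of rand_3p abuts the 5'-phosphate of rand_5p.
    All sequences are written 5'->3'."""
    return reverse_complement(rand_3p + rand_5p) in target


def ideal_inserts(target: str, n: int) -> set[str]:
    """All perfectly matched antisense inserts formed from two n-nt random ends."""
    width = 2 * n
    return {reverse_complement(target[i:i + width])
            for i in range(len(target) - width + 1)}


target = "ATGGCCAAGTTCACG"  # hypothetical single-stranded target
print(ligatable(target, "GGC", "CAT"))  # pair hybridized to adjacent positions
print(ligatable(target, "GGC", "GAA"))  # pair hybridized to non-adjacent positions
print(len(ideal_inserts(target, 3)))
```

In this idealized model, a target of length L yields at most L - 2n + 1 distinct perfectly matched inserts, which indicates the scale of enrichment relative to the 4^(2n) sequences possible in a completely random library.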
[0063] Most DNA ligases can discriminate single-base mismatches in the vicinity of the ligation site (see, e.g., U.S. Pat. No. 4,988,617). This feature makes ligase-mediated assays superior with regard to sequence specificity when compared with simple hybridization assays (see, e.g., Barany (1991) PCR Methods Applications 1:5-16). Further, the use of hemi-random probes with short random hybridizing sequences, e.g., 10 nucleotides, increases sensitivity to internal mismatches in comparison to longer hybridizing sequences. In contrast, longer sequences, e.g., 20 nucleotides, allow an average of 4-8 mismatches (WO 00/43538; Bruckner et al., supra).
[0064] Appropriate conditions for ligation are well known in the art. As used herein, "ligation" refers to the formation of a phosphodiester bond between adjacent 3'-OH and 5'-phosphate termini of two polynucleotides to form an uninterrupted polynucleotide strand, such that there are no gaps (i.e., missing nucleotides) at the junction. Ligation is generally catalyzed by a ligase enzyme. Ligase enzymes that are capable of catalyzing ligation reactions between adjacent nucleotides on one strand of a duplex are well known and widely available. Often, a DNA ligase, such as T4 DNA ligase, is used. Examples of ligases include T4 DNA ligase, E. coli DNA ligase, AMPLIGASE, Taq DNA ligase, Thermus thermophilus DNA ligase, Thermus scotoductus DNA ligase, and Rhodothermus marinus DNA ligase (see U.S. Pat. No. 6,316,229). Other useful ligases include ribozyme ligases, such as template-dependent ribozymes or deoxyribozyme ligases prepared by modification of naturally-occurring ribozymes (e.g., modified Tetrahymena ribozyme ligase) or selected by SELEX (Systematic Evolution of Ligands by Exponential Enrichment) (see, e.g., U.S. Pat. No. 5,652,107). A thermostable ligase may be used. "Thermostable ligase" refers to an enzyme which is stable to heat, is heat resistant (i.e., does not become denatured or inactivated at elevated temperatures), and catalyzes ligation at high temperatures, for example 50°C to 90°C. An example of a thermostable ligase is the ligase derived from Thermus aquaticus (Takahashi et al. (1984) J. Biol. Chem. 259:10041-47), which may also be prepared recombinantly (WO 91/17239). Alternatively, a chemical ligation method may be used (e.g., Harada and Orgel (1994) J. Mol. Evol. 38:558-560; James and Ellington (1997) Chem. Biol. 4:595-605; Zuber and Behr (1994) Biochemistry 33:8122-8127). Examples of chemical ligation reagents include, but are not limited to, water-soluble carbodiimides, cyanogen bromide, and N-cyanoimidazole. Chemical ligation can be as discriminative to mismatches around the ligation site as DNA ligase (Harada and Orgel, supra; James and Ellington, supra).
[0065] Appropriate conditions for ligation will depend on the particular ligase used and on the length of the hybridized random sequences to be ligated. In one embodiment, ligation is performed at a temperature of about 16-25°C. Often, ligation is performed at a temperature that is in the upper half of the melting temperature range for a hybridized random portion of a hemi-random probe. This serves to avoid partial complementarity between probes and target sequences. For example, hemi-random probes with 10-mer random sequences often may be ligated at a temperature of about 30-35°C. Probes with longer random sequences may be ligated at a higher temperature, e.g., about 35-40°C, or may be ligated using a thermostable ligase at an elevated temperature appropriate for the ligase being used (see, e.g., U.S. Pat. No. 6,054,564).
[0066] A ligation approach employing a ligase enzyme and ligation of short randomized sequences to form a target-specific sequence provides a higher quality library than previous approaches which have employed only hybridization, without the ligation step (see, e.g., Paquin et al., supra). The methods of the present invention, in addition to yielding perfectly target-matched sequences equal in length to the random sequences of the probes, also may yield shorter fully complementary sequences, or sequences with a mismatch (see, e.g., Figs. 6, 9 and 11). Mismatches frequently occur upon hybridization, and some ligase enzymes, such as T4 DNA ligase, are known to tolerate mismatches, albeit infrequently. Both internal and terminal mismatches, as well as mismatches at the ligation site, can be produced as minor ligation products (Wu and Wallace (1989) Gene 76:245-254; Goffin et al. (1987) Nucleic Acids Res. 15:8755-8771; Harada and Orgel (1993) Nucleic Acids Res. 21:2287-91). The accuracy of a ligase enzyme can be enhanced under certain conditions, e.g., at elevated temperatures, in the presence of certain solutes, and/or at low ligase concentrations (Wu and Wallace, supra). In addition, the quality of a directed library may be enhanced by using a second selection step, based on hybridization to an immobilized DNA target, often with about 100 to about 500 nucleotide fragments, followed by purification using, for example, affinity chromatography, magnetic beads, or non-denaturing gel electrophoresis.
[0067] In some embodiments, addition of random oligonucleotides that do not include a 5' terminal phosphate, for example having terminal 5'-hydroxyl groups, to a ligation mixture that includes hemi-random probes with masking oligonucleotides and a polynucleotide target can also improve the quality of the directed library, in terms of the length of complementary antisense sequence obtained. Such random oligonucleotides compete with hemi-random probes for binding to the target, thus permitting only perfectly matched probes to hybridize with the target and be subject to ligation. Typically, the length of a random polynucleotide is about 5 to about 9 nucleotides, often about 6 to about 8 nucleotides in length, for a hemi-random probe with a 10-nt random sequence. When using other length random sequences, the length of the competing random oligonucleotides may be adjusted accordingly, generally such that it does not exceed the length of the random sequence. The competing random oligonucleotide can be DNA or RNA. RNA has the advantages that it is poorly ligated by and a poor template for DNA ligase (Moore and Sharp (1992) Science 256:992-997; Nilsson et al. (2001) Nucleic Acids Res. 29:578-581), and, therefore, would provide less target-independent ligation products and would compete less for the ligation with 5'-end phosphorylated hemi-random probe than competing DNA oligonucleotides. Fig. 11B depicts an example of anti-TNF directed library sequences that were prepared by ligation of hemi-random probes in the presence of competing random hexamers, present in about a 100-fold excess over the hemi-random probes, at sub-optimal conditions. Conditions could be readily optimized by one of skill in the art by varying experimental parameters, for example, the ratio of random oligonucleotide to hemi-random probe, the length of the competing random oligonucleotide, concentration of salt and other solutes in the reaction mixture, and the ligation reaction temperature.
[0068] In some embodiments, target-bound ligation products may be separated from unbound probes and non-target-dependent ligated probes. Methods useful for such a purification are well-known to those of skill in the art. For example, the polynucleotide target may be derivatized with one member of a pair of affinity ligands, e.g., biotin, and target polynucleotides with or without bound ligation products may be purified with a matrix conjugated to the other member of the affinity pair, e.g., an avidin or streptavidin affinity column or avidin or streptavidin-conjugated magnetic beads, prior to amplification of the ligated probes (Fig. 10). Affinity purification may be performed either prior to or after ligation of hybridized hemi-random probes, and may further enrich the directed library for members with uniform hybridization characteristics.
Amplification
[0069] Ligated probe sequences may be amplified by a number of methods that are well-known to those of skill in the art to produce a directed sequence library. As used herein, "amplification" refers to the process of producing multiple copies of a desired polynucleotide sequence.
[0070] An exemplary method for amplification is the polymerase chain reaction, or "PCR." As used herein, "PCR" refers to a process of amplifying one or more specific polynucleotide sequences, wherein (1) oligonucleotide primers which determine the ends of the sequences to be amplified are annealed to single-stranded polynucleotides, (2) a polymerase extends the 3' ends of the annealed primers to create a polynucleotide strand complementary in sequence to the polynucleotide to which the primers were annealed, (3) the resulting double-stranded polynucleotide is denatured to yield two single-stranded polynucleotides, and (4) the processes of primer annealing, primer extension, and product denaturation are repeated enough times to generate easily identified and measured amounts of the sequences defined by the primers. DNA amplification procedures by PCR are well known and are described in U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188, and PCR Protocols: A Guide to Methods and Applications (1990) Innis et al., eds., Academic Press, San Diego.
[0071] In methods of the invention, the products of target-dependent ligation, described above, may be amplified by PCR using methods well known in the art (see, e.g., Barany (1991) PCR Methods Applications 1:5-16). Primers are used which are complementary to at least a portion of the defined sequence of each of the probes or its complement. Generally, in methods of the invention, a pair of primers is used, one of which is specific for and hybridizable to the defined sequence of one of the two probes which have been ligated together, and the other of which is specific for and hybridizable to the complement of the defined sequence of the other probe. By using a pair of specific bidirectional primers, and due to the exponential nature of PCR amplification, the target-dependent ligation products are preferentially amplified versus target and unbound probes. Typically, only a small portion of the ligation reaction mixture is used for PCR, so that the polynucleotide target, and non-bound and non-ligated probes, do not interfere with PCR.
PCR amplification of a ligated pair of probes results in a double-stranded product, in which one strand contains a sequence of the target (sense) and the other strand contains a sequence complementary to the target (antisense).
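The relationship between a ligated probe pair and its primer pair, described above, can be sketched as follows. This is a minimal illustration that assumes all-DNA probes, with one probe carrying its defined sequence at the 5' end and the other at the 3' end; the sequences and function names are hypothetical, and the copy-number estimate assumes ideal doubling in every cycle, which real reactions do not achieve.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ligation_product(defined_a: str, rand_3p: str, rand_5p: str, defined_b: str) -> str:
    """Ligated strand: 5'-defined_a + random | random + defined_b-3'."""
    return defined_a + rand_3p + rand_5p + defined_b


def primer_pair(defined_a: str, defined_b: str) -> tuple[str, str]:
    """One primer anneals to defined_b; the other anneals to the complement of defined_a."""
    reverse = reverse_complement(defined_b)  # hybridizes to defined_b
    forward = defined_a                      # hybridizes to complement of defined_a
    return forward, reverse


def ideal_copies(templates: int, cycles: int) -> int:
    """Upper bound on copy number, assuming perfect doubling each cycle."""
    return templates * 2 ** cycles


product = ligation_product("GGATCCTAATACGACTC", "GGC", "CAT", "CTGCAGGTCGACTCTAG")
forward, reverse = primer_pair("GGATCCTAATACGACTC", "CTGCAGGTCGACTCTAG")
print(product, forward, reverse, ideal_copies(1, 30))
```

Because unligated probes each carry only one of the two defined sequences, they can support at most linear copying by a single primer, which is why the ligated products dominate after repeated cycles.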
[0072] For efficient copying of the products of ligation of hemi-random probes, whether comprising DNA, RNA, or a DNA-RNA chimera, reverse transcriptase may be used instead of DNA polymerase in step (2) above. It is well known that reverse transcriptase enzymes, e.g., avian myeloblastosis virus reverse transcriptase (AMV RT), catalyze the synthesis of DNA strands using templates comprising DNA, RNA, or RNA-DNA chimeras (Kacian (1977) Meth. Virol. 6:143). The resulting RNA-DNA/DNA duplexes can then proceed through steps (3) and (4) above for efficient amplification by PCR. In some cases, there may be enough material following the ligation step to allow cloning directly without amplification, permitting omission of steps (3) and (4).
Cloning of Amplified Target-Dependent Ligation Products
[0073] Amplified target-dependent ligation products may be inserted into a plasmid or vector by means well known to one of skill in the art. If the defined sequences of the hemi-random probes contain restriction sites, a restriction digest may be performed on the amplified products, followed by cloning of a directed sequence insert into an appropriate vector using standard techniques of molecular biology. If the defined sequences do not include restriction sites, a blunt-end ligation method may be used. As used herein, a "directed sequence insert" includes the random sequences of a pair of probes that originally hybridized to adjacent sites on the target and were ligated together, using the methods described above.
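When analyzing amplified or cloned products, the directed sequence insert can be recovered by removing the flanking defined sequences. The following minimal sketch assumes the sense strand of the product is available as a string and that both defined sequences are present intact; the sequences and the function name are hypothetical.

```python
def extract_insert(amplicon: str, defined_a: str, defined_b: str) -> str:
    """Return the sequence lying between the two flanking defined sequences."""
    if len(amplicon) < len(defined_a) + len(defined_b):
        raise ValueError("amplicon is shorter than the two defined sequences")
    if not (amplicon.startswith(defined_a) and amplicon.endswith(defined_b)):
        raise ValueError("amplicon is not flanked by the expected defined sequences")
    return amplicon[len(defined_a):len(amplicon) - len(defined_b)]


# Hypothetical defined sequences and a product carrying a 6-nt insert.
defined_a = "GGATCCTAATACGACTC"
defined_b = "CTGCAGGTCGACTCTAG"
amplicon = defined_a + "GGCCAT" + defined_b
print(extract_insert(amplicon, defined_a, defined_b))
```

Inserts shorter than twice the random-sequence length would correspond to the shorter complementary sequences that the methods may also yield.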
[0074] As used herein, a "vector" refers to a polynucleotide that is capable of transferring an inserted polynucleotide into and/or between host cells. Some vectors are capable of autonomous replication in a host cell into which they are introduced (e.g., bacterial vectors having a bacterial origin of replication and episomal mammalian vectors). Other vectors (e.g., non-episomal mammalian vectors) are integrated into the genome of a host cell upon introduction into the host cell and thereby are replicated along with the host genome. Other vectors function as "expression vectors" which, when introduced into an appropriate host cell, can be transcribed and translated into a polypeptide. Directed sequence inserts produced by the methods of the invention may be introduced into expression vectors, from which they are transcribed and then translated into polypeptides.
Expression vectors may be derived from bacteriophage, including all DNA and RNA phage (e.g., cosmids), or viral vectors derived from all eukaryotic viruses, such as baculoviruses and retroviruses, adenoviruses and adeno-associated viruses, Herpes viruses, Vaccinia viruses and all single-stranded, double-stranded, and partially double-stranded DNA viruses, all positive and negative stranded RNA viruses, and replication defective retroviruses. Another example of a vector is a yeast artificial chromosome (YAC), which contains both a centromere and two telomeres, allowing YACs to replicate as small linear chromosomes.
[0075] Recombinant vectors containing directed sequence inserts may be used to transform or transfect host cells for further amplification of a directed sequence library. "Transformation" or "transfection" refers to the insertion of an exogenous polynucleotide into a host cell, irrespective of the method used for the insertion, for example, lipofection, transduction, infection, electroporation, etc. The exogenous polynucleotide may be maintained as a non-integrated vector, for example, a plasmid, or alternatively, may be integrated into the host cell genome. A "host cell" includes an individual cell or cell culture which can be or has been a recipient for vector(s) or for incorporation of polynucleotides or proteins. Host cells include progeny of a single host cell, and the progeny may not necessarily be completely identical (in morphology or in total genomic DNA complement) to the original parent cell due to natural, accidental, or deliberate mutation. A host cell includes cells transfected or transformed in vivo with a polynucleotide(s) of this invention. Examples of suitable host cells include bacterial cells or eukaryotic cells grown in culture.
Compositions
[0076] The invention includes compositions that are useful for preparing a directed sequence library. Such compositions include hemi-random probes, polynucleotide targets, and/or reagents for ligating and/or amplifying sequences which bind to the target. The invention also includes compositions containing polynucleotides produced by the methods of the invention and/or polypeptides encoded by polynucleotides produced by the methods of the invention. Other compositions of the invention include directed sequence libraries produced by the methods described above. Other compositions of the invention include host cells containing directed libraries or polynucleotides (e.g., directed sequence inserts produced by the methods of the invention).
[0077] In one aspect, the invention provides a composition containing at least one hemi-random probe with a random sequence and a defined sequence, and hybridized to a masking oligonucleotide, as described above. The composition may further include a target polynucleotide of known or unknown sequence, as described above. The ability to make directed libraries of unknown gene sequences may be useful, for example, for identifying sequences that represent potential drug targets on a virus or collection of viruses of unknown sequence. The composition may further include reagents and enzymes for ligating probes that bind to adjacent sequences on the target, and may further contain reagents, enzymes, and/or primers for amplification of the ligated sequences, for example by PCR. Compositions may also include buffers and other reagents suitable for hybridization of the masking oligonucleotide to the defined sequence of the probe and for hybridization of the random sequences of the hemi-random probes to the target polynucleotide. Compositions may also include a ligase, a DNA polymerase, and/or other enzymes useful in the ligation and/or amplification reactions of the methods for preparing a directed sequence library described above. Compositions may also include buffers and other reagents suitable for catalytic activity of these enzymes.
[0078] In another aspect, compositions of the invention include amplified target-dependent ligation products and reagents such as restriction enzymes, vectors, and/or host cells, and appropriate buffers or media, for cloning of amplified target-dependent ligation products.
[0079] In another aspect, the invention provides compositions including at least one polynucleotide prepared by the methods described above. Some compositions include polynucleotides produced by ligation of pairs of probes that hybridize to adjacent positions on a target. Other compositions include polynucleotides produced by amplifying the ligation products. Still further compositions include isolated polynucleotides obtained from the reaction products of the ligation or amplification reactions, or obtained from vectors into which these products have been introduced. Isolated polynucleotides may be obtained by digestion of the surrounding nucleic acid with a restriction endonuclease or by other methods of molecular biology that are well known to those of skill in the art.
[0080] In a further aspect, the invention provides polypeptides encoded by polynucleotides prepared by the methods of the invention. For example, pairs of probes that hybridize to adjacent sequences on a target may be ligated and amplified and the resulting amplified sequences may be introduced into an expression vector, from which they may be transcribed and translated into polypeptides in a host cell or in vitro. Such polypeptides may be used for immunological studies or for diagnostic or therapeutic purposes.
Kits
[0081] The reagents and libraries described herein can be packaged in kit form. In one aspect, the present invention provides a kit that includes reagents useful for preparation and/or use of directed sequence libraries, in suitable packaging. Kits of the invention include any of the following, separately or in combination: hemi-random probes, target polynucleotides, masking oligonucleotides, PCR primers, enzymes and reagents for ligation and/or amplification reactions, vectors, reagents, enzymes, host cells and/or growth medium for cloning of the amplified products of the ligation reaction, or libraries produced by the methods of the invention.
[0082] Each reagent is supplied in a solid form or liquid buffer that is suitable for inventory storage, and later for exchange or addition into a reaction or culture medium. Suitable packaging is provided. As used herein, "packaging" refers to a solid matrix or material customarily used in a system and capable of holding within fixed limits one or more of the reagent components for use in a method of the present invention or one or more libraries produced by the method of the present invention. Such materials include glass and plastic (e.g., polyethylene, polypropylene, and polycarbonate) bottles, vials, paper, plastic, and plastic-foil laminated envelopes and the like.
[0083] A kit may optionally also provide additional components that are useful in the procedure. These optional components include buffers, reacting surfaces, means for detection, control samples, instructions, and interpretive information.
Uses for Directed Sequence Libraries
[0084] Directed sequence libraries and methods of the present invention may be used as starting materials for a multitude of applications, including development of diagnostic reagents, therapeutic reagents (e.g., polynucleotide therapeutics), and genomics tools, and as affinity reagents.
[0085] In one aspect, libraries of the invention may be used for development and optimization of sequences for antisense- and ribozyme-based polynucleotide genomics tools (e.g., gene knockdown, gene-target validation, etc.) and therapeutics. For example, a directed sequence library may be prepared from a gene sequence that provides a particular cellular function. Antisense sequences that block that function may be determined by screening the library for sequences that inhibit gene function. Target accessibility, hybridization parameters, and inhibitory effects may also be assessed.
[0086] Rationally-designed nucleic acid viral therapeutics currently in use, including antisense, ribozymes, deoxyribozymes, siRNA, shRNA, and miRNA, target only one rationally-selected sequence on a viral RNA. In cases where a virus mutates rapidly (e.g., HIV or influenza), the rationally-selected target sequences mutate over time, and the therapeutic becomes ineffective. The same is true for nucleic acid therapeutics directed at cancer targets, where mutation can lead to resistance to the drug. In contrast, nucleic acid therapeutics selected de novo from a pool of directed sequence libraries are advantageous in comparison to rationally-designed defined sequence therapeutics for nucleic acid-based anti-viral (and anti-cancer) applications. Therapeutics selected from a directed sequence library of the invention target multiple sites on a viral nucleic acid or cancer target simultaneously, allowing effective expression and down-regulation of a rapidly mutating virus or cancer cell. Knowledge of the genetic sequence, molecular, and structural biology of the virus or cancer cell is unnecessary, in contrast to rational drug design methods.
[0087] In another aspect, libraries of the invention may be used for selection and optimization of sequences useful for RNA interference, such as siRNA (small interfering RNA) molecules capable of inhibiting known or unknown genes. "siRNA" refers to a double-stranded RNA molecule that inhibits expression of a complementary known or unknown gene(s) (see, e.g., Tuschl (2002) Nature Biotechnology 20:446-48).
[0088] In another aspect, libraries of the invention may be used to select optimal probes for microarrays of immobilized polynucleotide sequences for microarray-based diagnostics and gene expression analysis, including detection of the presence of bacterial and viral infectious agents, genetic traits and diseases, SNPs, etc. (see, e.g., Rampal, ed. (2001) DNA Arrays, Methods and Protocols (Humana Press)). In an embodiment, target-dependent ligation products may be prepared by the methods of the invention to include overlapping sequences of a viral genome and immobilized on a solid support. Such an array may be used to distinguish between viral strains. In another embodiment, target-dependent ligation products prepared by methods of the invention may be immobilized on a solid support and used to detect or quantify related polynucleotide sequences. As used herein, "microarray" refers to a surface with an array of putative binding (e.g., by hybridization) sites for a biochemical sample. Typically, a microarray refers to an assembly of distinct polynucleotides or polypeptides immobilized at defined positions on a substrate. Microarrays are formed on substrates fabricated with materials such as paper, glass, plastic (e.g., polypropylene, nylon), polyacrylamide, nitrocellulose, silicon, optical fiber, or any other suitable solid or semi-solid support, and configured in a planar (e.g., glass plates, silicon chips) or three-dimensional (e.g., pins, fibers, beads, particles, microtiter wells, capillaries) configuration. Polynucleotides or polypeptides may be attached to the substrate by a number of means, including (i) in situ synthesis (e.g., high-density polynucleotide arrays) using photolithographic techniques (see Fodor et al., Science (1991) 251:767-73; Pease et al., Proc. Natl. Acad. Sci. USA (1994) 91:5022-5026; Lockhart et al., Nature Biotechnology (1996) 14:1654; U.S. Pat. Nos. 5,578,832; 5,556,752; and 5,510,270); (ii) spotting/printing at medium to low density on glass, nylon, or nitrocellulose (see Schena et al., Science (1995) 270:467-70; DeRisi et al., Nature Genetics (1996) 14:457-60; Shalon et al., Genome Res. (1996) 6:639-45; and Schena et al., Proc. Natl. Acad. Sci. USA (1992) 20:1679-84); and (iii) by dot-blotting on a nylon or nitrocellulose hybridization membrane (see, e.g., Sambrook et al., Eds. (2001) Molecular Cloning: A Laboratory Manual, 3rd ed., Vol. 1-3, Cold Spring Harbor Laboratory (Cold Spring Harbor, N.Y.)). Polynucleotides or polypeptides may also be noncovalently immobilized on the substrate by hybridization to anchors, by means of beads, or in a fluid phase such as in microtiter wells or capillaries. Arrays may include polynucleotide sequences prepared by the methods of the invention.
[0089] In another aspect, libraries of the invention may be used for development of diagnostic or forensic reagents for detection of the presence of bacterial and viral infectious agents, genetic traits and diseases, SNPs, etc. For example, libraries of the invention may be used to select and optimize adjacent pairs of oligonucleotide probe sequences that are useful in ligase-mediated detection methods. In another example, libraries of the invention may be used to select and optimize polynucleotide sequences useful for hybridization-mediated DNA detection (i.e., affinity complementation). In a further example, libraries of the invention may be used to select and optimize polynucleotide primer sequences for PCR-based detection methods.
[0090] In another aspect, libraries of the invention may be used for development of affinity reagents. For example, a directed sequence library or a portion thereof, prepared by methods of the invention, may be coupled to a solid support and used for enrichment or purification of a polynucleotide sequence or nucleoprotein complex of interest from a mixture. Means for attachment of polynucleotides to a solid support are well known in the art. For example, amino-modified polynucleotides can be attached to an aldehyde-functionalized surface via reaction with free aldehyde groups using Schiff's base chemistry. In another example, amino-terminal polynucleotides can be coupled to isothiocyanate-activated glass, to aldehyde-activated glass, or to a glass surface modified with epoxide.
[0091] In another aspect, methods of the invention may be used for mapping accessible sites on an RNA target (see Fig. 13). Often, knowledge of site accessibility in mRNA is required for the design of optimal RT-PCR primers, gene microarray experiments, and antisense and ribozyme therapeutics. Since T4 DNA ligase can ligate DNA sequences on RNA templates (Nilson et al. (2001) supra), the hemi-random probes can be hybridized with a folded RNA target, e.g., in buffer solution or in cell lysates, and ligated with high sequence specificity. The products of target-dependent ligation can then be amplified, cloned, and sequenced. Since hybridization events occur only in the single-stranded and looped regions (and not in double-stranded stems), the method of the invention will provide information regarding accessible sites on the target. Although several experimental and theoretical methods are already available for the study of accessible sites in RNA, they all have serious drawbacks. Therefore, there is a need for expanding the arsenal of existing methods. Recently a new method by Liang et al. (WO0224950A2), based on the hybridization of "di-tag" hybridization probes similar to those described in Paquin et al., WO0043538, supra, has been suggested. Liang and co-workers use an approach that is similar to the Paquin method with one exception: the target origin. Liang et al. use folded single-stranded RNA with partially accessible sites for hybridization, whereas Paquin et al. use denatured double-stranded DNA that presumably has most sequences accessible for hybridization.
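The mapping step described in this paragraph can be illustrated with a minimal sketch. It assumes that each sequenced clone has already been reduced to its target-derived insert (the two ligated random regions), that the RNA target is written in the DNA alphabet, and that inserts match the target exactly. The function name `coverage_profile` and the toy sequences are hypothetical; a real analysis would more likely use an alignment tool that tolerates mismatches.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def coverage_profile(target, inserts):
    """Count, for each target position, how many inserts cover it.

    Each insert is tried in both orientations because a cloned,
    double-stranded product can be sequenced from either strand.
    Only the first exact match of an insert is counted (assumption).
    """
    target = target.upper()
    coverage = [0] * len(target)
    for insert in inserts:
        for candidate in (insert.upper(), reverse_complement(insert)):
            start = target.find(candidate)
            if start != -1:
                for i in range(start, start + len(candidate)):
                    coverage[i] += 1
                break
    return coverage


# Hypothetical toy data, not taken from the Examples.
toy_target = "ACGTTGCAAGGCTTAACCGGTATCG"
toy_inserts = ["TTGCAAGG", "CTTGCAAC", "AACCGGTA"]
profile = coverage_profile(toy_target, toy_inserts)
```

Positions with high counts in `profile` would be candidate accessible sites; positions with zero counts were either inaccessible or simply not sampled, which this sketch cannot distinguish.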
[0092] In other aspects, libraries of the invention may be used for preparative extraction of specific genes (including mRNA, genomic DNA, or fragments thereof), and as probes for specific sequences in Northern blots, in situ hybridization, and genomics mapping and annotation procedures.
[0093] In another aspect, libraries of the invention may be prepared from more than one target simultaneously (i.e., in a single reaction vessel). After cloning of directed sequence inserts obtained from multiple targets into vectors, the individual inserts may be sequenced and aligned to the appropriate target by, e.g., computer-assisted sequence alignment, to select desirable probe sequences for each target used in the mixture. These methods may be used to significantly enhance and accelerate genomics-related studies. Further, they can be used to generate cocktails of inhibitors of the expression of one or more genes, according to the targets used to generate the directed libraries. These cocktails can be generated by expressing the libraries in cells of interest, selecting for a desired phenotype, and recovering the sequences of the library that conferred the phenotype by PCR and sequencing (see Li et al. (2000) supra; Kawasaki & Taira (2002), supra).
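As a rough illustration of assigning sequenced inserts from a multi-target reaction back to their targets, the sketch below uses exact substring matching in both orientations as a stand-in for computer-assisted sequence alignment. The function name `assign_inserts`, the target names, and all sequences are hypothetical; inserts that match no target, or more than one, are set aside rather than guessed.

```python
from collections import defaultdict


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def assign_inserts(targets, inserts):
    """Group inserts by the name of the single target that contains them."""
    assigned = defaultdict(list)
    for insert in inserts:
        forward = insert.upper()
        reverse = reverse_complement(forward)
        hits = [
            name
            for name, seq in targets.items()
            if forward in seq.upper() or reverse in seq.upper()
        ]
        # Unmatched or ambiguous inserts go to a separate bucket.
        key = hits[0] if len(hits) == 1 else "unassigned"
        assigned[key].append(forward)
    return dict(assigned)


# Hypothetical toy data, not taken from the Examples.
toy_targets = {
    "target_1": "ACGTTGCAAGGCTTAACCGG",
    "target_2": "GGATACCATTGACCGTAGCA",
}
toy_inserts = ["TTGCAAGGCT", "TACGGTCAAT", "AAAAAAAAAA"]
groups = assign_inserts(toy_targets, toy_inserts)
```

Exact matching is the simplest possible choice here; it would miss inserts carrying sequencing errors or mismatched ligation events, which a tolerant aligner would recover.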
EXAMPLES
[0094] The following examples are provided to illustrate but not limit the present invention.
Example 1: Directed sequence library production by ligation of pairs of hemi-random probes with randomized positions on a SFV DNA target
DNA Target
[0095] A 1:1 mixture of full-length heat denatured PCR-amplified fragments containing two groups of genes from SFV (Semliki Forest Virus), ds SFV (7 kb) and ds SFV Helper (5 kb), or 1 kb or 20-30 bp fragments of SFV, was used as the target. SFV and SFV Helper sequences were derived from separate plasmids available from Life Technologies, Inc. (SFV Gene Expression System).
Hemi-Random Probes, Masking Oligonucleotides, and PCR Primers
[0096] Hemi-random probes, masking oligonucleotides, and PCR primers were synthesized by IDT (Integrated DNA Technologies, Coralville, IA).
[0097] Hemi-random probes contained 10-mer random regions and 26-mer defined sequences that contained a primer binding site and a restriction site, as follows:
Hemi-Random Probe A:
5'-pNNNNNNNNNNGGATCCCTGCTGACGACTAGACTGTG-3'
(SEQ ID NO:1)
Hemi-Random Probe B:
5'-CAGTCTAGCAAGTATGCGTCCTCGAGNNNNNNNNNN-3'
(SEQ ID NO:2)
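To give a sense of scale for the 10-mer random regions, the following sketch computes the number of distinct random sequences per probe and the approximate number of copies of each sequence in a given amount of probe. It assumes equal incorporation of all four bases at every N position, which is an idealization, and it uses 2 pmol of a single probe (the lower bound stated for the ligation reaction in this Example) purely as an illustrative input. The function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def random_region_complexity(n_random):
    """Number of distinct sequences in a fully random region of n_random nt."""
    return 4 ** n_random


def copies_per_sequence(pmol, n_random):
    """Average copies of each distinct sequence, assuming uniform synthesis."""
    molecules = pmol * 1e-12 * AVOGADRO
    return molecules / random_region_complexity(n_random)


complexity = random_region_complexity(10)  # distinct 10-mers per probe
copies_low = copies_per_sequence(2, 10)    # copies of each 10-mer in 2 pmol
```

Under these assumptions each probe pool holds about a million distinct 10-mers, each present at roughly a million copies in 2 pmol.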
[0098] Masking oligonucleotides contained sequences complementary to and masking the 26-nt long defined sequences of the probes. Masking oligonucleotides were used to prevent hybridization of the defined sequences of the probes to target sequences and to prevent parasitic ligation of probe sequences to each other. The sequences of the masking oligonucleotides were as follows:
Masking Oligonucleotide for Hemi-Random Probe A:
5'-CACAGTCTAGTCGTCAGCAGGGATCC-3' (SEQ ID NO:3)
Masking Oligonucleotide for Hemi-Random Probe B:
5'-CTCGAGGACGCATACTTGCTAGACTG-3' (SEQ ID NO:4)
[0099] Primers used for PCR amplification of ligation products were as follows:
Primer 1: 5'-CACAGTCTAGTCGTCAGCAG-3' (SEQ ID NO:5)
Primer 2: 5'-CAGTCTAGCAAGTATGCGTC-3' (SEQ ID NO:6)
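The relationships among the probes, masking oligonucleotides, and primers listed above can be checked with a short sketch. The sequences are copied from SEQ ID NOs 1-6; the variable names are hypothetical, and the orientation of the ligation product (3' end of Probe B joined to the 5' phosphate of Probe A) is an inference from the 5'-p notation rather than an explicit statement in the text.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (N stays N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


PROBE_A = "NNNNNNNNNNGGATCCCTGCTGACGACTAGACTGTG"  # SEQ ID NO:1 (5' phosphate)
PROBE_B = "CAGTCTAGCAAGTATGCGTCCTCGAGNNNNNNNNNN"  # SEQ ID NO:2
MASK_A = "CACAGTCTAGTCGTCAGCAGGGATCC"             # SEQ ID NO:3
MASK_B = "CTCGAGGACGCATACTTGCTAGACTG"             # SEQ ID NO:4
PRIMER_1 = "CACAGTCTAGTCGTCAGCAG"                 # SEQ ID NO:5
PRIMER_2 = "CAGTCTAGCAAGTATGCGTC"                 # SEQ ID NO:6

defined_a = PROBE_A[10:]   # 26-nt defined region of Probe A
defined_b = PROBE_B[:26]   # 26-nt defined region of Probe B

# Assumed orientation: Probe B (3'-OH) ligated to Probe A (5'-phosphate).
ligation_product = PROBE_B + PROBE_A

checks = {
    "mask_a_complements_probe_a": MASK_A == reverse_complement(defined_a),
    "mask_b_complements_probe_b": MASK_B == reverse_complement(defined_b),
    "primer_1_anneals_to_probe_a_end": reverse_complement(PRIMER_1) == PROBE_A[-20:],
    "primer_2_matches_probe_b_start": PRIMER_2 == PROBE_B[:20],
    "product_length": len(ligation_product),
}
```

With this orientation the product is 26 + 10 + 10 + 26 = 72 nt long, which agrees with the 72 bp ligation product amplified later in this Example.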
Hybridization and Ligation
[00100] The hemi-random probes were pre-hybridized with their corresponding masking oligonucleotides in T4 DNA ligase reaction buffer for 5 min at room temperature. The target was added and the mixture was then incubated for 30 min at varying temperatures (25-42°C) to allow the probes to hybridize to the target. T4 DNA ligase was then added and the mixture was incubated at room temperature for 1 hour. The ligation reaction mixture contained the following:
Hemi-Random Probes A and B 0.1-1 μM (2-20 pmol, 2-4 μl)
Masking Oligonucleotides for Hemi-Random Probes A and B 0.1-1 μM (2-20 pmol, 2-4 μl)
DNA target 0.01-1 μM (0.2-20 pmol, 2 μl)
T4 DNA ligase buffer (30 mM Tris-HCl, pH 7.8, 5-10 mM MgCl2, 10 mM DTT, 1 mM ATP) (2 μl of 10x), 50-200 mM NaCl
T4 DNA ligase 0.1 U/μl (2 units, 1 μl)
H2O up to 20 μl
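The amounts and concentrations in the reaction mixture above are mutually consistent for a 20 μl reaction, as the following sketch shows. It relies only on the identity that 1 pmol/μl equals 1 μM; the function and variable names are hypothetical.

```python
REACTION_VOLUME_UL = 20.0  # final reaction volume stated in the mixture


def micromolar(pmol, volume_ul=REACTION_VOLUME_UL):
    """Convert an amount in pmol to a concentration in µM (pmol/µl == µM)."""
    return pmol / volume_ul


probe_range_um = (micromolar(2), micromolar(20))      # stated as 0.1-1 µM
target_range_um = (micromolar(0.2), micromolar(20))   # stated as 0.01-1 µM
ligase_u_per_ul = 2 / REACTION_VOLUME_UL              # stated as 0.1 U/µl
buffer_fold = 10 * 2 / REACTION_VOLUME_UL             # 2 µl of 10x -> 1x
```

The same conversion can be reused when scaling the reaction to a different final volume.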
[00101] In parallel, controls were run without masking oligonucleotides, without DNA target, or using hemi-random probes with different or shorter defined sequence or shorter random sequence (e.g., 20-mer defined sequence and 7-mer random sequence plus one fixed position).
Amplification by PCR
[00102] After the ligation reaction was complete, 1 μl of the 20 μl ligation mixture was used for PCR amplification of the 72 bp ligation product. Typical cycles were: 94°C 30 sec - 54°C 30 sec - 72°C 15 sec (20 cycles).
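For planning purposes, the cycling conditions above can be written as a small data structure. This sketch computes the total hold time and the theoretical upper bound on amplification; the step labels (denature, anneal, extend) are the conventional reading of the three temperatures and are an assumption, as is the omission of ramp times, initial denaturation, and final extension, none of which are stated in the text.

```python
# (label, temperature in °C, hold time in seconds); labels are assumed.
CYCLE_STEPS = [("denature", 94, 30), ("anneal", 54, 30), ("extend", 72, 15)]
N_CYCLES = 20


def hold_time_seconds(steps, n_cycles):
    """Total programmed hold time, ignoring ramping between temperatures."""
    return n_cycles * sum(seconds for _, _, seconds in steps)


def max_fold_amplification(n_cycles):
    """Upper bound on amplification, assuming perfect doubling each cycle."""
    return 2 ** n_cycles


total_seconds = hold_time_seconds(CYCLE_STEPS, N_CYCLES)
max_fold = max_fold_amplification(N_CYCLES)
```

Real reactions fall short of perfect doubling, so `max_fold` is only a ceiling.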
[00103] After PCR, 10 μl of the reaction mixture was mixed with 3 μl of 6x loading buffer (0.25% bromphenol blue, 0.25% xylene cyanol, 30% glycerol in water) and loaded onto a 10% native polyacrylamide gel in 1x TBE. The gel was run at room temperature at 25 V/cm field. After electrophoresis, the gel was stained with ethidium bromide and visualized under UV light.
Cloning and Sequencing
[00104] The 72 bp ligation products were PCR amplified on a large scale, gel purified, and cloned into the pT7Blue-3 vector (Novagen). E. coli were transformed with the recombinant vector and colonies were used for mini-preps. DNA was isolated using the Wizard Plus Minipreps Purification System (Promega) or QIAprep Spin Miniprep Kit (Qiagen), and sent to Marshall University DNA Core Facility for dye-primer sequencing.
Results
[00105] The results of the target-dependent ligation experiments described above are shown in Figs. 5, 6, 7, and 8.
Example 2: Directed sequence library production using a TNF DNA target
[00106] An anti-TNF directed library was prepared. Experimental conditions and hemi-random probes were as described in Example 1, with the following differences:
[00107] The DNA target was a single-stranded murine TNFα cDNA. The target was prepared by amplification from a pGEM-4/TNF plasmid which included sequences for the murine TNFα gene with the full-length 5'-UTR and part of the 3'-UTR, totaling 1 kb. Amplification was by asymmetric PCR, using only a single primer, allowing production of single-stranded DNA. The single-stranded DNA was purified away from primers using a GeneClean III kit, ethanol precipitated, and used in experiments as a target for preparation of a directed library.
[00108] After ligation, samples were sent for sequencing to MWG Biotech (mwgbiotech.com). The effect of competing random hexamers was also evaluated. The results of these experiments are shown in Fig. 11.
***
[00109] Although the foregoing invention has been described in some detail by way of illustration and examples for purposes of clarity of understanding, it will be apparent to those skilled in the art that certain changes and modifications may be practiced without departing from the spirit and scope of the invention. Therefore, the description should not be construed as limiting the scope of the invention, which is delineated by the appended claims.
[00110] All publications, patents and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes to the same extent as if each individual publication, patent or patent application were specifically and individually indicated to be so incorporated by reference.