WO2007098130A9 - Innovative deconvolution and pooling strategy for large-scale screening - Google Patents

Innovative deconvolution and pooling strategy for large-scale screening

Info

Publication number
WO2007098130A9
Authority
WO
WIPO (PCT)
Prior art keywords
bait
items
prey
library
experiment
Prior art date
Application number
PCT/US2007/004313
Other languages
English (en)
Other versions
WO2007098130A2 (fr)
WO2007098130A3 (fr)
Inventor
Jing Huang
Fulai Jin
Tony R Hazbun
Original Assignee
Univ California
Purdue Research Foundation
Jing Huang
Fulai Jin
Tony R Hazbun
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ California, Purdue Research Foundation, Jing Huang, Fulai Jin, Tony R Hazbun
Publication of WO2007098130A2
Publication of WO2007098130A9
Publication of WO2007098130A3

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20 Screening of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/60 In silico combinatorial chemistry
    • G16C20/64 Screening of libraries

Definitions

  • the present invention relates generally to methods, systems, kits, and apparatus for data analysis and more particularly to the efficient detection of interactions between atoms, molecules and/or cells.
  • proteome-wide screening platforms (such as the array of yeast two-hybrid strains [2] and proteome microarrays [11,12]), for instance, allow an entire subject library to be screened at once and hits to be decoded without the need for DNA or protein sequencing.
  • the second challenge concerns obtaining protein-protein interaction (PPI) data with high accuracy and high coverage [14, 35-39].
  • False-positive (FP) and false-negative (FN) interactions can be detected for both extrinsic (experimental) and intrinsic (methodology) reasons.
  • Inherent method limitations can only be remedied technologically.
  • Experimental errors in large-scale analyses, on the other hand, can only be eliminated through repeated and reciprocal PPI detection. Repeating the screen, however, demands far more resources than screening once.
  • each proteome-wide probing normally employs one bait protein and identifies on average five prey proteins (network neighbors) [21]. 6,144 (size of the yeast proteome) experiments are required to cover the whole interactome in one pass.
  • the proteome-wide platforms (e.g., the arrayed yeast two-hybrid library [2] and protein microarrays [40,11-12]) have the physical capacity to detect thousands to tens of thousands of proteins, far more than five preys per experiment.
  • the efficiency of interactome mapping can be increased by screening multiple baits together, if the relationship between mixed baits and their interacting preys can be deconvoluted.
  • One possibility is to label baits with distinct fluorescent dyes (or potentially quantum dots). This "one color, one bait" approach, however, would be quite limiting for both technical and economic reasons.
  • the present invention relates generally to data analysis.
  • a method for analyzing interactions between items in a bait library and items in a prey library is provided.
  • the items of a library may be atoms, molecules and/or cells.
  • a unique N-digit base-X code is assigned to each of M items in a bait library.
  • Each one of the N-digits is associated with a group of experiments.
  • Each experiment of a group is associated with a value of the digit associated with that group, wherein each experiment involves the same prey items and involves the bait items whose codes have the value of the digit associated with that experiment.
  • a result of each experiment is analyzed to determine if a prey item interacts with one or more of the bait items in each experiment. Based on the values of the digit associated with the experiments in which the prey item interacts with a bait, a subset of the M bait items that potentially interact with the prey item is determined. In one embodiment, for each prey item: a result of each experiment is analyzed to determine if the prey item interacts with one or more of the bait items in each experiment; and a subset of the M bait items that potentially interact with the prey item is determined based on the values of the digits associated with the experiments in which the prey item interacts with a bait.
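The base-X coding scheme described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names (`assign_codes`, `build_experiments`, `candidate_baits`), the 0-based bait numbering, and the choice to give bait i the base-X digits of i are all hypothetical.

```python
def assign_codes(num_baits, base, digits):
    """Give each bait a unique N-digit base-X code (assumed: the digits of its index)."""
    if num_baits > base ** digits:
        raise ValueError("not enough distinct codes for this many baits")
    codes = []
    for bait in range(num_baits):
        value, code = bait, []
        for _ in range(digits):
            code.append(value % base)  # code[k] is digit k, least significant first
            value //= base
        codes.append(tuple(code))
    return codes


def build_experiments(codes, base, digits):
    """Map (digit position k, digit value v) to the baits pooled in that experiment."""
    return {
        (k, v): [bait for bait, code in enumerate(codes) if code[k] == v]
        for k in range(digits)
        for v in range(base)
    }


def candidate_baits(positive_experiments, codes, digits):
    """Baits whose code is consistent with every digit group's positive experiments."""
    return [
        bait
        for bait, code in enumerate(codes)
        if all((k, code[k]) in positive_experiments for k in range(digits))
    ]


# Example: 9 baits, base 3, 2 digits, so 2 groups of 3 experiments each.
codes = assign_codes(num_baits=9, base=3, digits=2)
experiments = build_experiments(codes, base=3, digits=2)
# A prey detected in the experiment for digit 0 = 2 and the one for digit 1 = 1.
hits = candidate_baits({(0, 2), (1, 1)}, codes, digits=2)
```

If a prey is positive in more than one experiment of the same group, `candidate_baits` returns every bait consistent with those calls, which corresponds to the "subset of the M bait items that potentially interact" in the text.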
  • a unique N-bit binary code is assigned to each of M items in a bait library.
  • Each one of the N bits is associated with one pair of experiments, for 2N experiments in total.
  • Each experiment of a pair is associated with a binary state of the bit associated with that pair.
  • Each experiment involves the bait items whose binary codes have the binary state of the bit associated with that experiment and may involve the same prey items.
  • a result of each experiment is analyzed to determine if a prey item interacts with one or more of the bait items in each experiment. Based on the binary states of the bits associated with the experiments in which the first prey item interacts with a bait, a subset of the M bait items that potentially interact with the first prey item is determined.
  • M items in a prey library are associated with a first well on a first plate.
  • a result of a first experiment involving a bait and the first well is analyzed to determine if the bait interacts with one of the prey items in the first well.
  • One of the M items in the first well is associated with a second well on the first or other plate.
  • a result of a second experiment involving the bait and the second well is analyzed to determine if the bait interacts with one of the prey items in the second well.
  • Which one of the M items in the prey library interacts with the bait is determined from the results of at least the first and second experiments.
  • the invention provides a kit comprising an information storage medium (e.g., a computer readable medium such as a CD, DVD, or diskette) and a library.
  • the kit includes both a prey library and a bait library.
  • Figure 1 illustrates a method for detecting interactions according to an embodiment of the present invention.
  • FIG. 1 Scheme for PI-Deconvolution.
  • (a) Graph representation of a hypothetical 32-protein network. Yellow filled circles, proteins (nodes); boxed lines, interactions (edges). For simplicity, only nodes and edges concerning proteins 1-16 are shown.
  • Every pair contains a "+" pool and a "-" pool, each employing 8 baits (half the batch size).
  • 2n experiments rows
  • Each column represents the profile of a prey; positive signal (red), negative signal (black). All valid preys (columns outlined in red) and their possible baits are listed. If a prey binds to only one bait in a batch, the prey should be detected only once in each pair of experiments. The degenerate symbols "n" and "?" indicate that neither or both experiments in a pair, respectively, give a positive call (such as prey 5 or prey 13). Preys with degenerate profiles can still be partially deconvoluted, and further narrowing-down can be achieved by reciprocal confirmation. (d) A graph can be drawn according to the result in c.
  • FIG. 3 PI-Deconvolution applied to protein interaction mapping. (a) Yeast proteome microarray screening. 15 bait proteins are encoded as shown and 8 bait pools are prepared accordingly. Each image column represents the result of a pooling screen, and each image row represents the same spot of the array. A positive signal indicates the presence of one or more binding proteins in the pool. Signals from "+" pools are false-colored red and "-" pools green. For example, the prey spots representing CMD1 (first row) were positive when probed with the "+" pools of pairs 1 and 2 (in red), and the "-" pools of pairs 0 and 3 (in green). The profile of CMD1 is thus read as "-++-", which equals the encoding tag for the bait CMK1.
  • the results obtained by the PI-Deconvolution analysis are identical to those obtained from single-bait probing (using 15 arrays). Only reciprocally confirmed interactions (red bidirectional arrows) and self interactions (black arrows) are shown (bottom). Detailed explanation of hit recognition is described in Methods, (b) Yeast two-hybrid array screening.
  • the whole library array consists of 16 plates with 384 strains each. Shown are images of one representative library plate screened with 16 baits using PI-Deconvolution; each image is the result of a pooling screen with 8 baits.
  • DIP yeast interactome
  • FIG. 6 Pooling scheme. 16 strains (a) will be pooled into 4 pairs of pools according to (b). Each pair has a "-" pool and a "+" pool. For example, strain 4 will be pooled into the "+" pools of pair 0 and pair 1, and the "-" pools of pair 2 and pair 3. If strain 4 is two-hybrid positive (red), it will make 4 of the 8 pools positive. The identity of the positive strain can be deconvoluted because the only strain that can cause these 4 positive pools is strain 4 (c).
  • Figure 7. Yeast genome two-hybrid screening using SPA arrays. (a) Every spot on the SPA_6 array represents a pool of 32 yeast AD strains. The original 6,144-strain yeast AD-array was compressed into a SPA array with 96 12-pool sets, and each set represents 64 AD strains. The 12 spots in the 3 squares at the same position of 3 plates belong to the same group (middle panel). Green squares highlight the groups containing two-hybrid positive strains; red squares highlight a group with one false positive spot. Four examples of deconvolution are shown, numbered 1~4. (b) Visualization of common positive hits across different SPA arrays.
  • FIG. 9 Method of plate pooling.
  • the AD strains were kept in 64 96-well plates.
  • Three pool arrays with different screen redundancy were constructed as shown in Table 5. Taking the pool array with screen redundancy 4 as an example (orange), the 64 plates were divided into 4 batches (A~D) of 16 plates. For each batch, 4 pairs of pool plates were prepared, and each pair contained a "+" plate and a "-" plate.
  • This table shows how the plates with single strains were mixed into the pool plates. For example, plates 1~8 were mixed into pool plate "-" of pair 3 using 96-channel robots (96 pools are generated at once into 96 wells), and plates 9~16 were mixed into pool plate "+" of pair 3. Pool arrays with higher screen redundancy are made in a similar way.
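The plate-mixing rule in this example can be reproduced with a short Python sketch. It assumes that plate p of a 16-plate batch carries the binary code of p - 1 and that bit k of that code selects the "+" or "-" pool plate of pair k. That assumption matches the pair 3 example given here (plates 1~8 in "-", plates 9~16 in "+"), but the layout of the other pairs is inferred, and the function name `pool_plate_members` is hypothetical.

```python
def pool_plate_members(n_plates, pair, sign):
    """Return the 1-based source plates mixed into the given pool plate.

    Assumes plate p is tagged with the binary code of p - 1, and that
    bit `pair` of the tag decides between the "+" and "-" pool plate.
    """
    wanted_bit = 1 if sign == "+" else 0
    return [
        plate
        for plate in range(1, n_plates + 1)
        if (((plate - 1) >> pair) & 1) == wanted_bit
    ]


minus_pair3 = pool_plate_members(16, pair=3, sign="-")  # plates 1..8
plus_pair3 = pool_plate_members(16, pair=3, sign="+")   # plates 9..16
```

Under this assumption every pool plate receives exactly half of the batch (8 plates), so each pooled well holds 8 strains.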
  • FIG. 10 Positive hits on different SPA arrays. Pool families on SPA_4 (orange) and SPA_5 (blue) are indexed by Row-Col-Batch, and pool families on SPA_6 (purple) are indexed by Row-Col. The profiles of positive pool families are shown in the "Profile" column. Pos_pair# indicates the number of positive pairs in this pool family. Positive pools in the same row in this table are the parallel pool families. Profiles of parallel pool families are put in the same row only if they are consistent with each other; otherwise they are put into different rows. The deconvolution results are shown only if there are no more than 4 deconvolution possibilities. The results of cross-validation between uniquely deconvoluted hits and four independent sets of protein interaction data are also shown. Results of cross-validating hits with deconvolution ambiguity can be found in Table 11.
  • Figure 13 Deconvolution performance of SPA arrays. This describes the number of ambiguous results within each of the different test arrays.
  • Figure 14 (a) Cross validation of the ambiguous hits on SPA_4. These ambiguous hits were first cross-validated with the results from other SPAs for further narrowing-down (9th column), and all putative interactions were then retested experimentally in quadruplicate (10th column, number reproduced out of 4 repetitions). These putative hits were also compared to one dataset from previous duplicate screening using the original array with single AD strains (11th column), one dataset from previous bait pooling screening (12th column) and one dataset from the literature (13th column). (b) Cross validation of the ambiguous hits on SPA_5. (c) Cross validation of the ambiguous hits on SPA_6.
  • One embodiment of the present invention provides for a novel pooling-deconvolution strategy that can dramatically decrease the effort required to generate large-scale data sets.
  • This "PI-Deconvolution" strategy employs imaginary tagging and allows the screening of 2^N probe proteins (baits) in 2N pools, with N replicates for each bait. Deconvolution of baits with their binding partners (preys) can be achieved by reading the prey's profile from the 2N experiments.
  • Embodiments of the invention have aspects of binary coding (imaginary tagging) of baits, combinatorial mix-bait screening, and built-in prey-bait tracking and cross-validation. The number of bits is not limited and can be any number, but is preferably 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10.
  • PI-Deconvolution can identify protein-small molecule interactions inferred from profiling the yeast deletion collection. PI-Deconvolution should be applicable to a wide range of library-against-library approaches (e.g., for aptamers, siRNA, antibodies, peptides, and small organic molecules), and can also be used to optimize array designs.
  • a molecule is composed of two or more atoms.
  • a molecule may be electrically neutral.
  • a molecule may be part of a cell or larger structure, or it may exist independent of any larger structure.
  • a cell is the structural and functional unit of all living organisms.
  • a living cell can take in nutrients, convert these nutrients into energy, carry out specialized functions, and reproduce as necessary.
  • a living cell may be considered dead when it cannot proceed with some or all of these functions.
  • a cell may be either dead or living.
  • a cell can be eukaryotic or prokaryotic.
  • a library is a collection of one or more atoms, molecules, and/or cells. Atoms, molecules, and/or cells are collectively called items. There are many libraries. One library may have none of the items in another library. Thus, the libraries do not overlap. One library may have a portion of its items that are the same as some or all of the items in another library. Thus, the libraries do overlap. Also, all of the items in one library may also be in another library. Two libraries may be exact copies of each other.
  • A bait and a prey are each an item. In some instances, a bait and a prey may interact. In some instances, baits may be organized into one library, and preys are organized in another library.
  • bait and prey libraries may be of any of the types described above.
  • a prey is attached to a known position ("indexed") on a plate or given a specified location, e.g., in a well.
  • a bait is attached on a plate.
  • the prey attached to the plate may be called the bait.
  • the bait becomes a single prey experiment.
  • Multiple prey may be placed in each well of a plate.
  • An array may be a physical distribution of items on or within one or more trays, plates, or other surfaces or structures.
  • An array may also correspond to a distribution or organization of data associated with each item. This latter type of array may be an array of numbers as might be found in computer software or logic.
  • An experiment occurs when one or more prey and one or more baits are put into an environment where an interaction between a bait and prey may be detected.
  • An experiment may utilize one or more plates.
  • An experiment may also refer to each well of a plate.
  • a "label" or a "detectable moiety or marker" is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means.
  • useful labels include 32P, fluorescent dyes and proteins (e.g., used in FRET), electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins which can be made detectable, e.g., by incorporating a radiolabel into the peptide or used to detect antibodies specifically reactive with the peptide.
  • a marker can also be phenotypic change in a cell.
  • a detector is used to detect the label. Typical detectors include spectrophotometers, phototubes and photodiodes, microscopes, scintillation counters, cameras, film and the like, as well as combinations thereof.
  • "test compound" or "drug candidate" or "modulator" or grammatical equivalents as used herein describes any molecule, either naturally occurring or synthetic, e.g., protein, oligopeptide (e.g., from about 5 to about 25 amino acids in length, preferably from about 10 to 20 or 12 to 18 amino acids in length, preferably 12, 15, or 18 amino acids in length), oligopeptidomimetic, small organic molecule, polysaccharide, lipid, fatty acid, polynucleotide, RNAi or siRNA, oligonucleotide (including antisense and triplex forming oligonucleotides), oligonucleotide mimetic, ribozyme, aptamer, etc.
  • the test compound can be in the form of a library of test compounds, such as a combinatorial or randomized library that provides a sufficient range of diversity.
  • Test compounds are optionally linked to a fusion partner, e.g., targeting compounds, rescue compounds, dimerization compounds, stabilizing compounds, addressable compounds, and other functional moieties.
  • new chemical entities with useful properties are generated by identifying a test compound (called a "lead compound") with some desirable property or activity, e.g., inhibiting activity, creating variants of the lead compound, and evaluating the property and activity of those variant compounds.
  • HTS high throughput screening
  • a "small organic molecule” refers to an organic molecule, either naturally occurring or synthetic, that has a molecular weight of more than about 50 Daltons and less than about 2500 Daltons, preferably less than about 2000 Daltons, preferably between about 100 to about 1000 Daltons, more preferably between about 200 to about 500 Daltons.
  • An "siRNA" or "RNAi" refers to a nucleic acid that forms a double stranded RNA, which double stranded RNA has the ability to reduce or inhibit expression of a gene or target gene when the siRNA is expressed in the same cell as the gene or target gene. "siRNA" or "RNAi" thus refers to the double stranded RNA formed by the complementary strands.
  • an siRNA refers to a nucleic acid that has substantial or complete identity to a target gene and forms a double stranded siRNA.
  • the siRNA is at least about 15-50 nucleotides in length (e.g., each complementary sequence of the double stranded siRNA is 15-50 nucleotides in length, and the double stranded siRNA is about 15-50 base pairs in length, preferably about 20-30 nucleotides, preferably about 20-25 or about 24-29 nucleotides in length, e.g., 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, or 30 nucleotides in length).
  • Aptamers are DNA or RNA molecules that have been selected from random pools based on their ability to bind other molecules with high affinity and specificity (see, e.g., Cox and Ellington, Bioorg. Med. Chem. 9:2525-2531 (2001); Lee et al., Nuc. Acids Res. 32:D95-D100 (2004)). Aptamers have been selected which bind nucleic acid, proteins, small organic compounds, vitamins, inorganic compounds, cells, and even entire organisms.
  • Antibody refers to a polypeptide substantially encoded by an immunoglobulin gene or immunoglobulin genes, or fragments thereof which specifically bind and recognize an analyte (antigen).
  • the recognized immunoglobulin genes include the kappa, lambda, alpha, gamma, delta, epsilon and mu constant region genes, as well as the myriad immunoglobulin variable region genes.
  • Light chains are classified as either kappa or lambda.
  • Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which in turn define the immunoglobulin classes, IgG, IgM, IgA, IgD and IgE, respectively.
  • An exemplary immunoglobulin (antibody) structural unit comprises a tetramer.
  • Each tetramer is composed of two identical pairs of polypeptide chains, each pair having one "light” (about 25 kD) and one "heavy” chain (about 50-70 kD).
  • the N-terminus of each chain defines a variable region of about 100 to 110 or more amino acids primarily responsible for antigen recognition.
  • the terms variable light chain (VL) and variable heavy chain (VH) refer to these light and heavy chains respectively.
  • Antibodies exist, e.g., as intact immunoglobulins or as a number of well-characterized fragments produced by digestion with various peptidases.
  • pepsin digests an antibody below the disulfide linkages in the hinge region to produce F(ab)'2, a dimer of Fab which itself is a light chain joined to VH-CH1 by a disulfide bond.
  • the F(ab)'2 may be reduced under mild conditions to break the disulfide linkage in the hinge region, thereby converting the F(ab)'2 dimer into an Fab' monomer.
  • the Fab' monomer is essentially an Fab with part of the hinge region (see, Paul (Ed.) Fundamental Immunology, Third Edition, Raven Press, NY (1993)). While various antibody fragments are defined in terms of the digestion of an intact antibody, one of skill will appreciate that such fragments may be synthesized de novo either chemically or by utilizing recombinant DNA methodology. Thus, the term antibody, as used herein, also includes antibody fragments either produced by the modification of whole antibodies or those synthesized de novo using recombinant DNA methodologies (e.g., single chain Fv).
  • peptidomimetic and “mimetic” refer to a synthetic chemical compound that has substantially the same structural and functional characteristics of the polynucleotides, polypeptides, antagonists or agonists of the invention.
  • Peptide analogs are commonly used in the pharmaceutical industry as non-peptide drugs with properties analogous to those of the template peptide. These types of non-peptide compound are termed "peptide mimetics" or "peptidomimetics" (Fauchere, Adv. Drug Res. 15:29 (1986); Veber and Freidinger TINS p. 392 (1985); and Evans et al., J. Med. Chem. 30:1229 (1987), which are incorporated herein by reference).
  • Peptide mimetics that are structurally similar to therapeutically useful peptides may be used to produce an equivalent or enhanced therapeutic or prophylactic effect.
  • the mimetic can be either entirely composed of synthetic, non-natural analogues of amino acids, or a chimeric molecule of partly natural peptide amino acids and partly non-natural analogs of amino acids.
  • the mimetic can also incorporate any amount of natural amino acid conservative substitutions as long as such substitutions also do not substantially alter the mimetic's structure and/or activity.
  • a mimetic composition is within the scope of the invention if it is capable of carrying out the binding or enzymatic activities of a polypeptide or polynucleotide of the invention or inhibiting or increasing the enzymatic activity or expression of a polypeptide or polynucleotide of the invention.
  • nucleic acid or “polynucleotide” refers to deoxyribonucleotides or ribonucleotides and polymers thereof in either single- or double-stranded form. Unless specifically limited, the term encompasses nucleic acids containing naturally occurring and synthetic analogues of nucleotides that have similar binding properties as the reference nucleic acid and are metabolized in a manner similar to naturally occurring nucleotides.
  • nucleic acids include, for example and without limitation, phosphorothioates, phosphoramidates, methyl phosphonates, chiral-methyl phosphonates, 2-O-methyl ribonucleotides, peptide-nucleic acids (PNAs), and the like.
  • amino acid polymers in which one or more amino acid residue is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers and non-naturally occurring amino acid polymers.
  • amino acid chains of any length, including full-length proteins (i.e., antigens), wherein the amino acid residues are linked by covalent peptide bonds.
  • amino acid refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids.
  • Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine.
  • Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid.
  • Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that functions in a manner similar to a naturally occurring amino acid.
  • biomolecule is a molecule found in nature. Examples of biomolecules include, but are not limited to, polypeptides, polynucleotides, and carbohydrates.
  • a "customer” is any individual, institution, corporation, university, or organization seeking to obtain products and/or services.
  • a "provider” is any individual, institution, corporation, university, or organization that provides products and/or services. Typically, providers sell the products and/or services to customers.
  • FIG. 1 illustrates a pooling and deconvolution method 100, PI-Deconvolution (PID), for determining an interaction between baits and preys according to an embodiment of the invention.
  • the baits and preys are organized into a bait library and a prey library. These libraries may be identical as described above.
  • the interactions between the baits and the preys occur during one or more experiments.
  • Step 105 assigns a unique N-digit base-X code to each of M items in a bait library.
  • M may be the total number of items in the bait library or a lesser amount.
  • M is $X^{N}$, but M can be any value lower than this.
  • M will fall somewhere between $X^{N}$ and $X^{N-1}$, so that $\alpha X^{N} \geq M \geq \beta X^{N-1}$.
  • $\alpha$ and $\beta$ are form factors depending on how the experiments are organized. For instance, $\alpha$ could be the number of wells in an experiment, as described in section 4 below.
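The relationship between M, X and N can be turned into a small sizing helper. The sketch below ignores the form factors (it treats both as 1) and simply finds the smallest N whose code space holds M baits; the names `min_digits` and `experiments_needed` are hypothetical, and the count N times X follows from having one experiment per digit value in each of the N groups.

```python
def min_digits(num_baits, base):
    """Smallest N such that base**N >= num_baits (form factors taken as 1)."""
    if num_baits < 1 or base < 2:
        raise ValueError("need at least one bait and a base of 2 or more")
    digits = 1
    while base ** digits < num_baits:
        digits += 1
    return digits


def experiments_needed(num_baits, base):
    """N groups of experiments, one experiment per digit value in each group."""
    return min_digits(num_baits, base) * base


n_binary = min_digits(16, 2)            # 16 baits need 4 bits
pools_binary = experiments_needed(16, 2)  # 4 pairs, 8 pooled experiments
```

Integer arithmetic is used instead of a logarithm so that exact powers such as 16 = 2^4 are not affected by floating-point rounding.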
  • the number of bait in an experiment can vary from one experiment to another.
  • the maximum bait in any experiment would be given by X.
  • the number of baits per experiment could be kept constant if dummy baits were used. These would be baits that do not physically exist but are created artificially as placeholders. This could be done simply for computational purposes.
  • Step 115 associates each experiment of a group with a value of the Kth-digit associated with that group.
  • experiment 12 would be associated with the fourth value in the octal system.
  • the fourth value would normally be designated as 3, again since 0 is the first value. Note that experiment 12 would be associated with the second of the four groups of experiments.
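The octal example can be checked with a short Python sketch. It assumes that experiments are numbered consecutively from 1, group by group, with X experiments per group; that numbering and the function name `locate_experiment` are assumptions made for illustration.

```python
def locate_experiment(experiment_number, base):
    """Map a 1-based experiment number to (1-based group, 0-based digit value).

    Assumes experiments are numbered consecutively, `base` per group.
    """
    index = experiment_number - 1
    group = index // base + 1   # which digit group the experiment belongs to
    value = index % base        # which digit value it represents (0 is first)
    return group, value


group, value = locate_experiment(12, base=8)
```

With base 8, experiment 12 falls in the second group and carries digit value 3, i.e. the fourth value when counting from 0.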
  • Each experiment involves the same prey items.
  • the number of prey items could be all of the items in the prey library or a lesser amount.
  • the number of preys can equal M, particularly when the bait and prey libraries are identical.
  • Step 120 analyzes a result of each experiment to determine if a prey item interacts with one or more of the bait items in each experiment. Typically, step 120 would determine if any prey in each experiment interacts with one or more of the bait items.
  • Step 125 determines a first subset of the M bait items that potentially interact with the first prey item. This determination is based on the experiments in which the first prey item interacts with a bait. In one embodiment, the determination is based on the values of the digits associated with each experiment.
  • 2^N baits are distinguished by their assigned N-bit binary codes, which are text strings consisting of "+" and "-" symbols (Fig. 2b).
  • the baits are assigned to (associated with) N pairs of experiments (each pair containing one "+” and one "-” experiment) corresponding to the binary bits (Fig. 2c).
  • Whether a bait is used for the "+" or "-" experiment in a pair is determined by its symbol (value) in the coding string at the corresponding bit (Fig. 2b-c). For example, bait 6 is represented by the string "-+-+". At bit 2 (third digit from the right in the string), its symbol is "+". Thus in pair 2, bait 6 is included in the "+" experiment. This "+" experiment also includes all the other baits with a "+" sign in the column for bit 2.
  • Each prey's interacting bait(s) can be revealed by the prey's profile in all the N pairs of experiments. For example, prey 2 binds to only bait 5 among baits 1-16. In Fig. 2c, prey 2 is detected in 4 experiments, which are "-" of pair 0, "-" of pair 1, "+" of pair 2, and "-" of pair 3. Accordingly, prey 2 can be represented by the profile "-+--", denoting its readout in each of the 4 experiment pairs. Since pair numbering corresponds to bit numbering in the tag of a bait, the prey's profile can allow a direct track back to its own bait(s). In this case, the profile of prey 2 is identical to the bit tag for bait 5; thus, prey 2 binds to bait 5.
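The tagging and read-back described here can be sketched in Python. The sketch assumes that bait k (numbered from 1) is tagged with the binary form of k - 1, written from the highest bit down to bit 0, with "+" for 1 and "-" for 0. That convention is inferred from the examples in the text (bait 5 matching a profile that is "+" only in pair 2, and bait 6 being "+" at bit 2); the function names `bait_tag` and `decode_profile` are hypothetical.

```python
def bait_tag(bait_number, n_bits):
    """Return the +/- tag of a 1-based bait, highest bit first, bit 0 last."""
    value = bait_number - 1  # assumed encoding: bait k carries the code k - 1
    return "".join(
        "+" if (value >> bit) & 1 else "-"
        for bit in reversed(range(n_bits))
    )


def decode_profile(profile, n_baits):
    """Return the baits whose tag equals a fully resolved prey profile."""
    n_bits = len(profile)
    return [
        bait
        for bait in range(1, n_baits + 1)
        if bait_tag(bait, n_bits) == profile
    ]


tag_bait6 = bait_tag(6, 4)
# Prey 2: "-" in pair 3, "+" in pair 2, "-" in pair 1, "-" in pair 0.
partners_of_prey2 = decode_profile("-+--", 16)
```

Because every tag is unique, a profile with exactly one positive call per pair maps back to at most one bait.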
  • N is the number of proteins (bait or prey).
  • prey 3 will be judged invalid for this batch of baits. Prey 31 is rejected for the same reason.
  • a protein like prey 5, although not identified as a positive interactor in pair 1 due to random error, does light up in all the other pairs and thus will be scored as a valid prey.
  • a cutoff-of-occurrence is used in this prey-validation process.
  • a typical range for a cutoff value is up to three.
  • a typical range for a cutoff value is up to two. Any cutoff value should be flexible and dependent on the particular screen employed.
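  • One way to picture the prey-validation step is the sketch below. It assumes, purely for illustration, that the cutoff-of-occurrence is the minimum number of experiment pairs in which a prey must be detected; the function name and the example values are hypothetical and would need to be tuned to the particular screen.

```python
def is_valid_prey(pair_hits, cutoff):
    """Return True if the prey is detected in at least `cutoff` pairs.

    pair_hits: one boolean per experiment pair, True when the prey was
    detected in the "+" or the "-" experiment of that pair.
    Assumption: the cutoff counts positive pairs, not individual spots.
    """
    return sum(pair_hits) >= cutoff


# A prey missed in pair 1 (random error) but seen in the other three pairs.
print(is_valid_prey([True, False, True, True], cutoff=3))
# A prey seen in only one of four pairs.
print(is_valid_prey([True, False, False, False], cutoff=3))
```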
  • proteins 1-16 have been used both as baits and as preys, therefore interactions among them can be confirmed by reciprocal (pair-wise) confirmation (Fig. 2d, red arrows); interactions with other proteins can only be confirmed after these proteins have been used as baits (Fig. 2d, green arrows).
  • method 100 is used to pool what is normally called preys, instead of pooling baits.
  • the bait simply becomes a prey.
  • method 100 can be used to screen a single bait.
  • the definitions of bait and prey are broad enough to encompass this idea since either one of two interacting items may be described as a bait or a prey.
  • a plate is any structure or surface that has areas (wells) to confine an item.
  • the idea of PI-Deconvolution can also be used to re-design prey arrays and maximize the efficiency of single bait screens.
  • the yeast two-hybrid array consists of 16 plates, each containing 384 AD-fusion strains (6,144 strains).
  • Each unique binary code would be associated with one of the original plates.
  • the preys in well "1" of half of the plates are put onto the "+" plate of that pair.
  • the preys in well "1" on the other half of the original plates are put on the "-" plate.
  • Method 100 proceeds to find out which original plate (and of course which well) has a prey that interacts with the bait.
  • the 16 plates can be compressed into 8 plates using the PI-Deconvolution scheme (8 AD strains per well). This compressed library can be maintained and screened against single baits, equivalent to screening the original 16-plate array in quadruplicate (total reduction to 12.5%).
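  • The plate arithmetic behind this compression can be checked with a short sketch. The function name is hypothetical, and the sketch assumes the number of original plates is a power of two so that every plate receives a complete binary code.

```python
def pooled_plate_layout(n_plates):
    """Return (pooled plates, strains per well, fraction of replicate effort).

    Assumption: n_plates is a power of two, each plate gets a unique
    binary code, and each code bit yields one "+" and one "-" pooled plate.
    """
    n_bits = n_plates.bit_length() - 1
    if 2 ** n_bits != n_plates:
        raise ValueError("n_plates must be a power of two")
    pooled_plates = 2 * n_bits            # one +/- pair per bit
    strains_per_well = n_plates // 2      # half the plates feed each pool
    replicate_plates = n_plates * n_bits  # original array screened n_bits times
    return pooled_plates, strains_per_well, pooled_plates / replicate_plates


print(pooled_plate_layout(16))
```

  • For 16 plates this gives 8 pooled plates with 8 strains per well, and 8 / 64 = 12.5% of the plates needed to screen the original array in quadruplicate.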
  • each well on a plate may be viewed as an experiment. If a maximum 8 AD strains are kept for each well, then the maximum value for X is 8.
  • the original 16 plates may be compressed to 4 plates. In this embodiment, M equals 64.
  • the unique code has 2 digits and has a base of 8. After the first 8 experiments, which in total involve two plates and all 6,144 strains, each of the 8 strains in a well is placed in the same well of a different quarter of a plate, which may be the same plate or the second plate involved in the group of experiments associated with the second digit of the code.
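  • The base-8, two-digit code generalizes the binary tagging: each digit position corresponds to a group of experiments, and the digit value selects one experiment in that group. The sketch below illustrates this under stated assumptions (0-based item indices, hypothetical function names); it is not taken from the patent.

```python
def to_digits(index, base, n_digits):
    """Return the digits of a 0-based item index, least significant first."""
    digits = []
    for _ in range(n_digits):
        index, digit = divmod(index, base)
        digits.append(digit)
    return digits


def experiments_for(index, base, n_digits):
    """List the (digit position, digit value) experiments containing an item."""
    return list(enumerate(to_digits(index, base, n_digits)))


def decode(positive_experiments, base):
    """Recover the item index from one positive experiment per digit position."""
    return sum(digit * base ** pos for pos, digit in positive_experiments)


BASE, N_DIGITS = 8, 2
print(BASE ** N_DIGITS)                     # items that can be coded (M)
print(BASE * N_DIGITS)                      # experiments needed
print(experiments_for(29, BASE, N_DIGITS))  # experiments holding item 29
print(decode([(0, 5), (1, 3)], BASE))       # item recovered from the hits
```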
  • Protein networks are best modeled as scale-free networks [25], in which the majority of nodes have only a few neighbors while a small number of nodes (“hubs”) have many. As expected, the coverage of interactions between proteins of high connectivity is lower than that between proteins of low connectivity (coverage for highly-connected nodes can be improved when the bit number is decreased). PID is especially useful for mapping interactions of low-degree nodes, which account for the majority of nodes in protein networks and require the largest number of experiments using traditional single-bait methods.
  • PID will be able to cover a human interactome network with the same efficiency as for yeast (using same pool size). However, since average degree appears to be conserved among different organisms [21,22,26], even fewer experiments may in fact be needed, because a larger pool of baits can be accommodated on a larger proteome array. Likewise, PID should be amenable to increasing the throughput of high-content and/or high-dimensional screening/mapping projects [27,28].
  • PI-Deconvolution a novel pooling and deconvolution strategy
  • PID improves coverage and accuracy simultaneously, without the necessity of secondary screens.
  • most hits are at least partially deconvoluted (92% of the hits are narrowed down to at most 4 baits); further deconvolution can be achieved by pair-wise confirmation.
  • PID is generally applicable to both two-hybrid array and proteome microarray platforms.
  • PID is very flexible because it can be easily scaled up or down by setting different N values.
  • PID can be useful for other library-against-library scenarios, particularly if most probes in the query library have only a few targets in the subject library.
  • the imaginary tagging (coding) is universally applicable regardless of the nature of the query (molecules, cells, organisms, etc).
  • the data set may be provided to the intelligence module in real time as the data is being collected, or it may be stored in a memory unit or buffer and provided to the intelligence module after completion of the experiment(s).
  • the data set may be provided to a separate system such as a desktop computer system or other computer system, via a network connection (e.g., LAN, VPN, intranet, Internet, etc.) or direct connection (e.g., USB or other direct wired or wireless connection) to the acquiring device, or provided on a portable medium such as a CD, DVD, floppy disk or the like.
  • the data set includes if and where on a prey holding device (plate) an interaction has occurred. After the data set has been received or acquired in step 120, the data set may be analyzed to determine the baits that potentially interact with the prey.
  • the PID process may be implemented in computer code running on a processor of a computer system.
  • the code includes instructions for controlling a processor to implement various aspects and steps of the PID process.
  • the code is typically stored on a hard disk, RAM or portable medium such as a CD, DVD, etc.
  • the processes may be implemented in a label detection device including one or more processors executing instructions stored in a memory unit coupled to the processor(s). Code including such instructions may be embodied in a carrier signal, which may be downloaded or transmitted to the memory unit over a wired or wireless network connection or direct connection with a code source, or the code may be provided using a portable medium as is well known.
  • PID process of the present invention can be coded using a variety of programming languages such as C, C++, C#, Fortran,
  • the bait and prey items described herein are often members of a library, which can be organized in the form of an array.
  • the bait and prey can be members of the same library, overlapping libraries, or different libraries.
  • Libraries can be used, e.g., to assay for protein-protein interactions, to identify enzymatic substrates, for example protein kinase substrates, to identify nucleic acids, including SNPs and allelic variants, to identify proteins and antibodies, to assay for pharmacogenetic effects, and to identify drugs that affect the function of genes or proteins, e.g., by investigating the effect of the drug on a molecule or cell.
  • Libraries can also be used to identify non-high-frequency events in any multiplexed format, such as in screening disease or biomarkers, in which case the libraries can be sample banks or collections.
  • Libraries include chemical (organic or inorganic) molecules, e.g., protein, sugar, nucleic acid or lipid, and cells. Essentially any molecule or cell can be used in the assays of the invention. After the bait and prey are allowed to interact, they are washed at suitable stringencies known in the art to remove non-specific interactions. Typically, a marker or label is used as an indicator that the desired interaction between the bait and prey has occurred.
  • the marker can be a fluorescent molecule, an enzyme, a cellular phenotype, etc., as described herein.
  • the marker can be linked to the bait, the prey, or to both the bait and the prey. In one embodiment, the marker is linked to the bait.
  • the marker can be the same or a different marker.
  • the prey library is organized in an array, such that each item or member of the library is affixed to a specified location on a solid-state platform, such that the location and identity of the prey is known.
  • a plate containing wells is used to organize the prey into defined locations, such that the location and identity of the prey is known.
  • One or more prey can be located in each well.
  • the libraries of the invention are nucleotide or protein arrays, including yeast-two hybrid arrays, antibody arrays, oligonucleotide arrays, siRNA arrays, peptide arrays, aptamer arrays, etc.
  • Such arrays can include a subset of genes or proteins expressed in a cell, or can contain the entire genome, transcriptome, or proteome, or a random assortment thereof (see, e.g., Kilker et al., PNAS 6:2099-2104 (2005)).
  • the arrays contain protein or nucleic acid from a healthy cell or a pathologic cell, or cells at a specific stage of a cell cycle, maturation, or differentiation pathway, or under specified environmental or developmental conditions (see, e.g., Ely et al., Eur. J. of Cell. Biol. 84:431-444 (2005)).
  • nucleic acid arrays such as oligonucleotide arrays are used (see, e.g., Fodor et al., Science 251:767-773 (1991); Brown & Botstein, Nature Genet. 21:33-37 (1999); Eberwine, Biotechniques 20:584-591 (1996)).
  • These arrays are collections of specifically chosen oligonucleotides that are bound to a solid support at predetermined and addressable locations.
  • these arrays comprise an oligonucleotide that specifically identifies each of the known genes in a genome. Messenger RNAs or cDNAs derived from a cell are applied to the array.
  • each mRNA or cDNA hybridizes with an oligonucleotide that corresponds to the particular gene from which it was transcribed. Because the identity and location of each immobilized oligonucleotide is predetermined, each hybridization event indicates that a particular gene has been expressed by the cell.
  • One commercialized version of an oligonucleotide array is the GeneChip™ from Affymetrix.
  • beads coated with an array, or cells are each attached to an optical sensor molecule. To provide an address, the beads are then drawn into wells at the end of fibers in a fiber optic bundle (see, e.g., BeadArray™ (Illumina)).
  • arrays can be made from EST libraries.
  • the members of the library are intended to function to reduce the level of mRNA (e.g., antisense molecules, ribozymes, DNAzymes and the like) or the level of translation from an mRNA.
  • Detection of interaction between the bait and prey can be accomplished, for example, by using a labeled detection moiety that binds specifically to the library item (e.g., an antibody that is specific for RNA-DNA duplexes).
  • One example uses an antibody that recognizes DNA-RNA heteroduplexes in which the antibody is linked to an enzyme (typically by recombinant or covalent chemical bonding). The antibody is detected when the enzyme reacts with its substrate, producing a detectable product.
  • the library is a cellular library.
  • the library is a chemical library, e.g., a drug library or a peptide library.
  • high throughput screening methods involve providing a combinatorial library, e.g., a chemical or peptide library, containing a large number of potential therapeutic compounds.
  • Such "combinatorial chemical libraries" or "ligand libraries" are then screened.
  • the compounds thus identified can serve as conventional "lead compounds" or can themselves be used as potential or actual therapeutics.
  • a combinatorial chemical library is a collection of diverse chemical compounds generated by either chemical synthesis or biological synthesis, by combining a number of chemical "building blocks" such as reagents.
  • a linear combinatorial chemical library such as a polypeptide library is formed by combining a set of chemical building blocks (amino acids) in every possible way for a given compound length (i.e., the number of amino acids in a polypeptide compound). Millions of chemical compounds can be synthesized through such combinatorial mixing of chemical building blocks.
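  • The size of such a linear library grows as (number of building blocks) raised to the compound length. The sketch below enumerates a toy library to make this concrete; the three-letter alphabet and the function name are illustrative assumptions only.

```python
from itertools import product


def linear_library(building_blocks, length):
    """Enumerate every ordered combination of building blocks of a given length."""
    return ["".join(combo) for combo in product(building_blocks, repeat=length)]


toy = linear_library("AGL", 2)  # three hypothetical amino acids, dipeptides
print(len(toy), toy)

# With the 20 standard amino acids, hexapeptides already number in the millions.
print(20 ** 6)
```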
  • combinatorial chemical libraries include, but are not limited to, peptide libraries (see, e.g., U.S. Patent 5,010,175, Furka, Int. J. Pept. Prot. Res. 37:487-493 (1991) and Houghton et al., Nature 354:84-88 (1991)).
  • Other chemistries for generating chemical diversity libraries can also be used. Such chemistries include, but are not limited to: peptoids (e.g., PCT Publication No. WO 91/19735), encoded peptides (e.g., PCT Publication WO 93/20242), random bio-oligomers (e.g., PCT Publication No. WO 92/00091), benzodiazepines (e.g., U.S. Pat. No. 5,288,514), diversomers such as hydantoins, benzodiazepines and dipeptides (Hobbs et al., Proc. Nat. Acad. Sci. USA 90:6909-6913 (1993)), vinylogous polypeptides (Hagihara et al., J. Amer. Chem. Soc.
  • nucleic acid libraries (see Ausubel, Berger and Sambrook, all supra)
  • peptide nucleic acid libraries (see, e.g., U.S. Patent 5,539,083)
  • antibody libraries (see, e.g., Vaughn et al., Nature Biotechnology, 14(3):309-314 (1996) and PCT/US96/10287)
  • carbohydrate libraries (see, e.g., Liang et al., Science, 274:1520-1522 (1996) and U.S. Patent 5,593,853)
  • small organic molecule libraries see, e.g., benzodiazepines, Baum C&EN, Jan 18, page 33 (1993); isoprenoids, U.S.
  • the molecule or cell of interest is placed in a well.
  • the molecule or cell of interest can be bound to the solid state component, directly or indirectly, via covalent or non-covalent linkage, e.g., via a tag.
  • the tag can be any of a variety of components.
  • a molecule that binds the tag (a tag binder) is fixed to a solid support, and the tagged molecule of interest is attached to the solid support by interaction of the tag and the tag binder.
  • a number of tags and tag binders can be used, based upon known molecular interactions well described in the literature.
  • a tag has a natural binder, for example, biotin, protein A, or protein G
  • tag binders avidin, streptavidin, neutravidin, the Fc region of an immunoglobulin, etc.
  • Antibodies to molecules with natural binders such as biotin are also widely available and appropriate tag binders (see SIGMA Immunochemicals 1998 catalogue, SIGMA, St. Louis MO).
  • any haptenic or antigenic compound can be used in combination with an appropriate antibody to form a tag/tag binder pair.
  • Thousands of specific antibodies are commercially available and many additional antibodies are described in the literature.
  • the tag is a first antibody and the tag binder is a second antibody which recognizes the first antibody.
  • receptor-ligand interactions are also appropriate as tag and tag-binder pairs, such as agonists and antagonists of cell membrane receptors (e.g., cell receptor-ligand interactions such as transferrin, c-kit, viral receptor ligands, cytokine receptors, chemokine receptors, interleukin receptors, immunoglobulin receptors and antibodies, the cadherin family, the integrin family, the selectin family, and the like; see, e.g., Pigott & Power, The Adhesion Molecule Facts Book I (1993)).
  • toxins and venoms, hormones (e.g., opiates, steroids, etc.), intracellular receptors (e.g., which mediate the effects of various small ligands, including steroids, thyroid hormone, retinoids and vitamin D; peptides), drugs, lectins, sugars, nucleic acids (both linear and cyclic polymer configurations), oligosaccharides, proteins, phospholipids and antibodies can all interact with various cell receptors.
  • Synthetic polymers such as polyurethanes, polyesters, polycarbonates, polyureas, polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes, polyimides, and polyacetates can also form an appropriate tag or tag binder. Many other tag/tag binder pairs are also useful in assay systems described herein, as would be apparent to one of skill upon review of this disclosure.
  • Common linkers such as peptides, polyethers, and the like can also serve as tags, and include polypeptide sequences, such as poly-Gly sequences of between about 5 and 200 amino acids.
  • Such flexible linkers are known to those of skill in the art.
  • poly(ethylene glycol) linkers are available from Shearwater Polymers, Inc., Huntsville, Alabama. These linkers optionally have amide linkages, sulfhydryl linkages, or heterofunctional linkages.
  • Tag binders are fixed to solid substrates using any of a variety of methods currently available.
  • Solid substrates are commonly derivatized or functionalized by exposing all or a portion of the substrate to a chemical reagent which fixes a chemical group to the surface which is reactive with a portion of the tag binder.
  • groups which are suitable for attachment to a longer chain portion would include amines, hydroxyl, thiol, and carboxyl groups.
  • Aminoalkylsilanes and hydroxyalkylsilanes can be used to functionalize a variety of surfaces, such as glass surfaces. The construction of such solid phase biopolymer arrays is well described in the literature (see, e.g., Merrifield, J. Am. Chem. Soc.
  • Non-chemical approaches for fixing tag binders to substrates include other common methods, such as heat, cross-linking by UV radiation, and the like.
  • a method for identifying one or more interactions between biomolecules includes the following: contacting each of a population of copies of a prey library with each of a series of pools of subsets of a bait library, wherein the bait molecules included in the pools are determined using a pooling-deconvolution strategy; detecting interactions between members of the prey library and pools of the bait library; and analyzing the detected interactions using a pooling-deconvolution data analysis, thereby identifying the one or more interactions between biomolecules.
  • the bait library is a library of kinases, and the prey library includes biomolecules, typically polypeptides.
  • the method is used to identify kinase substrates.
  • the pools for this embodiment preferably include 10 or fewer kinases. Exemplary pools include 9, 8, 7, 6, 5, 4, 3, or 2 kinases. More preferred pools include 5, 4, 3, or 2 kinases. Illustrative embodiments include pools of 4, 3, or 2 kinases.
  • the bait library can include 10, 20, 25, 50, 75, 100, 200, 250, or 500 kinases, which can be from different species, but are preferably from the same species.
  • the prey library can include 10, 20, 25, 50, 100, 200, 250, 500, 750, or 1000 polypeptides, which can be from a different species or the same species.
  • the prey library can be immobilized on a microarray.
  • the proteins of the prey library can be from different species than the kinases of the bait library, but in illustrative examples are from the same species.
  • the prey library includes 100, 200, 250, 500, 750, 1000, 2000, 2500, 5000, 7500, or 10,000 human proteins and the kinases are human kinases.
  • the prey library includes at least 100 polypeptides immobilized on a solid substrate at a density of at least 100/cm².
  • kits for identifying interactions typically include a bait library and/or a prey library, as well as an information storage medium having a plurality of instructions adapted to direct an information processing device to perform an operation for analyzing interactions between items in a bait library and items in a prey library.
  • the operation includes the steps for carrying out a pooling-deconvolution analysis.
  • the items in the library are atoms, molecules and/or cells.
  • the information storage medium can be physically present within the kit, or access to the plurality of instructions can be provided via a computer network, such as a wide area network, for example the Internet. Accordingly, the information storage medium is connected to a computer server, for example an internal hard drive on the computer server.
  • the information storage medium can be other than an internal hard drive on a server. In other aspects, the information storage medium is physically present in the kit.
  • the information storage medium can be a portable storage medium that is inserted into a drive or external port of a computer.
  • the kit includes a series of pools that include members of the bait library, wherein the members of the pools are determined using a pooling-deconvolution experimental strategy.
  • the kit can include 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 25, 50, or 100 pools.
  • the kit includes a series of pools that include 2-4 kinases.
  • the pools are set up such that 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, or 100 biomolecules, for example polypeptides, that are part of a pooling-deconvolution strategy are intentionally not included in the pools.
  • kits will give a party receiving the kits, such as a customer that purchases a kit, the ability to add their own polypeptide(s) to certain pools to perform a pooling-deconvolution experiment.
  • kits are provided that include pooling-deconvolution coding information within the kit or otherwise accessible to a customer, that includes coding information for 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, or 100 additional biomolecules that are not included in the bait pools.
  • a receiver of a kit can add biomolecules to a prey library included in a kit.
  • a service function of a provider of prey libraries and/or bait libraries may perform the analysis discussed herein, including those involving 1 or more biomolecules received from a customer.
  • a provider selling the kit of the invention to a customer will typically provide information to the customer regarding the identities of the members of the pools. This information can be provided in a package insert physically associated with kit, or can be provided on a computer that is accessible by the customer over a network, especially a wide area network such as the Internet. Furthermore, the kit can include, or a customer can be given access to, specific coding information for each pool using the pooling-deconvolution method provided herein. In certain aspects, wherein the information storage medium is optionally present in the kit, the kit includes coding information regarding the bait molecules present in the pools. Alternatively a provider of the kit can provide access via a computer network to coding information regarding the pools of bait molecules.
  • the coding information can include information for 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 additional biomolecules that are not included in pools sent to a customer, but that a customer can add to certain pools for the pooling-deconvolution analysis, for example according to the coding provided by the provider.
  • the members of the bait library and/or the prey library of the kit are biomolecules. All of the members of the prey library and/or the bait library can be immobilized on a solid substrate using techniques well known in the art. All of the members of the prey library can be immobilized on a solid substrate at a density of at least 100/cm². In certain aspects, the prey library includes 50, 100, 200, 250, 500, 750, 1000, 2000, 2500, 5000, 7500, or 10000 different biomolecules, such as proteins. The proteins can be from the same species.
  • a method for offering a protein interaction identification kit to a customer includes the following: a. presenting the customer with an identity of each of a population of biomolecules; b. accepting from the customer, an identification of target bait molecules from the population of biomolecules; and c.
  • kits comprising a prey library comprising the population of biomolecules, a bait library comprising the target bait molecules, and an information storage medium having a plurality of instructions adapted to direct an information processing device to perform an operation for analyzing interactions between items in a bait library and items in a prey library, wherein the items in a library are atoms, molecules and/or cells, the operation comprising the steps for carrying out a pooling-deconvolution analysis.
  • the kit includes a series of pools of the target bait molecules that are identified by the customer and that are constructed by the provider using a pooling-deconvolution strategy.
  • an online system such as a series of Internet pages that include an input function for the customer, such as an online form, can be used to present a library available from the provider to a customer, and can be used by the customer to select members of the library, to include in a bait library.
  • the provider can then optionally construct the pools of the bait library using a pooling-deconvolution strategy.
  • the provider can then either provide the pools of the bait library to the customer along with the prey library, or can perform a pooling deconvolution method using the prey library and the pools of the bait library.
  • the prey library can be less than the entire library presented to the customer, but in preferred examples, includes the entire library.
  • the kit provided to a customer in this aspect of the invention can include any of the features of the kits described above.
  • yeast proteome microarrays [11,12], which contain 4,088 purified Saccharomyces cerevisiae proteins (as glutathione S-transferase fusions) immobilized on nitrocellulose-coated glass slides.
  • a small network of protein interactions (Fig. 4a, bottom) was derived. This is a "gold standard" network, because all the interactions in the network have been reciprocally confirmed (bi-directional red arrows in Fig. 4a).
  • PI-Deconvolution was also used to screen a genome-wide two-hybrid array consisting of ~6,000 yeast strains, each designed to contain one of the ~6,000 S. cerevisiae open reading frames (ORFs) fused to the Gal4 activation domain (AD) [2].
  • 16 two-hybrid bait strains that each express a full length ORF fused to the Gal4 DNA binding domain (Fig. 4b) were used. Thirteen of these bait strains have previously been screened against the genome-wide array. Because of experimental variability, these single bait screens required each bait to be screened in duplicate, resulting in a total of 32 screens for the 16 baits [2,14].
  • the 16 bait strains were mixed into 8 pools and screened against the two-hybrid array. In this procedure, two 8-bait pools in the same pair cover all of the 16 baits. Therefore, 4 pairs of PI-Deconvolution screens represent 4 independent screens of all the 16 baits. This protocol is a significant advantage over the individual bait procedure because it reduces the number of screens from 32 to 8, yet each bait is screened in quadruplicate.
  • In the 13 single bait screens, 484 preys were observed and defined as two-hybrid positive colonies [2,14].
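  • The screen-count argument for the 16-bait experiment (8 pooled screens instead of 32, with every bait still screened four times) can be verified with a short sketch. The tagging rule (bait k carries the binary digits of k-1) and all names are assumptions made for illustration.

```python
def paired_pools(n_bits):
    """Return, for each bit, the ("+" pool, "-" pool) of 1-based bait numbers.

    Assumption: bait k is tagged with the binary digits of k - 1.
    """
    all_baits = set(range(1, 2 ** n_bits + 1))
    pairs = []
    for bit in range(n_bits):
        plus = {b for b in all_baits if ((b - 1) >> bit) & 1}
        pairs.append((plus, all_baits - plus))
    return pairs


pairs = paired_pools(4)
pooled_screens = 2 * len(pairs)  # one screen per pool
single_bait_screens = 16 * 2     # each bait screened in duplicate

# Every pair of pools covers all 16 baits, so each bait is screened once per pair.
coverage = [len(plus | minus) for plus, minus in pairs]
times_screened = {
    b: sum((b in plus) + (b in minus) for plus, minus in pairs)
    for b in range(1, 17)
}

print(pooled_screens, single_bait_screens)
print(coverage)
print(set(times_screened.values()))
```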
  • the 188 PI-Deconvolution non-reproducible hits contain only 2 (of 125, 1.6%) reproducible hits from the single bait data (Table 3), suggesting that the PI-Deconvolution 155 reproducible hits might represent almost complete (saturated) coverage, subject to the detection sensitivity of the current system.
  • the increased coverage is due to the high repetitions inherent in PI-Deconvolution screening.
  • 57 could be assigned to a single bait, 34 to two baits, and 51 to four possible baits (see discussion above about further deconvolution of positives assigned to more than one bait).
  • 56 belong to the 13 previously screened baits.
  • 38 were previously classified as reproducible positives in single bait screens; 11 were previously classified as non-reproducible positives (i.e., appearing only once in duplicate screens), but can now be considered reproducible because they appeared all four times in PI-Deconvolution screens; and 7 are novel interactions that had eluded detection in single bait screens.
  • One example of a novel interaction is an interaction between Gac1 and Glc7, which are regulatory and catalytic subunits, respectively, of a type 1 phosphatase (PP1) involved in the regulation of glycogen synthesis [15].
  • PI-Deconvolution increases screening efficiency by making better use of the physical capacity of whole-proteome platforms for parallel detection.
  • the PI-Deconvolution approach was also tested on an assay independent of protein interaction mapping, namely the identification of yeast mutants resistant to specific drugs.
  • the S. cerevisiae deletion collection is a set of ~4,500 strains, each deleted for one of the non-essential ORFs [16,17].
  • n value (i.e., screening a larger pool)
  • acceptable pool size is also determined by the sensitivity and background of the detection method (as is true for any pooling strategy).
  • because pooled screening generally relies on the gain of a signal, drug hypersensitivity cannot be scored in a pooling screen using fitness as a readout.
  • preferred pool sizes for a bait library include 10 or less, 9 or less, 8 or less, 7 or less, 6 or less, or especially 5 or less, 4 or less, 3 or less, or 2. This is particularly true for methods in which the prey and bait are polypeptides, and especially when the prey include at least 100, 200, 250, 500, 1000, 2000, 2500, 5000, 7500, or 10000 proteins immobilized at a density of at least 100/cm² on an addressable microarray.
  • Example 5 illustrates a bait pool size of 3.
  • a method for detecting a molecule that affects the phosphorylation of a polypeptide or protein by a kinase wherein the polypeptide or protein identified is a substrate for the kinase.
  • the polypeptide or protein is contacted with the kinase in the presence of a test molecule, under conditions permissive for phosphorylation of the substrate by the kinase. Phosphorylation of the substrate by the kinase is then detected. A difference in phosphorylation in the presence versus absence of the test molecule indicates that the test molecule affects phosphorylation of the substrate by the kinase.
  • A YEAST TWO-HYBRID SMART POOL ARRAY SYSTEM FOR PROTEIN INTERACTION MAPPING
  • A novel two-hybrid smart pool array (SPA) system was prepared in which, instead of individual AD strains, well-designed AD pools were screened in an array format that enables built-in replication and prey-bait deconvolution. Using this method, a Saccharomyces cerevisiae genome SPA increases Y2H screening efficiency by an order of magnitude.
  • Bait pooling does not provide as large a benefit to most investigator-initiated research programs, which often focus on screening only one or a few select baits. However, instead of pooling baits, the same pooling-deconvolution principle can be applied to pool prey (AD) strains, enabling efficient screening of individual baits with high accuracy and coverage.
  • Prey-based pools are advantageous over bait-based pools because once prey pool arrays are prepared they can be maintained indefinitely and reused for new screens. Another advantage of prey pooling (over bait pooling) is apparent upon considering the established two-hybrid selection procedures. Due to fortuitous activating sequences, a significant fraction of bait-BD fusions can activate the two-hybrid reporter gene (e.g., the HIS3 gene in this paper) without the presence of any prey-AD fusion protein. Addition of 3-amino-triazole (AT), an enzymatic inhibitor of His3, can compensate for auto-activation.
  • AT 3-amino-triazole
  • deconvolution is possible because every strain is pooled into 4 different pools (one from each pair), so if one of the 16 strains is two-hybrid positive (for a given bait) then 4 of the 8 pools will yield a positive colony.
  • identity of the two-hybrid positive strain can be deconvoluted by its presence only in a specific combination of 4 pools and absence in the other pools.
  • the unit of robotic pooling in use is a whole (96-well) plate (instead of each well), enabling many (in this case, 96) pools to be made at once (see Figure 9). [0140] Pooling is facilitated by a 96-channel pipetting robot (Biomek FX, Beckman Coulter). As shown in Figure 9, 6,144 strains were kept in 64 96-well plates.
  • the identity of the positive strain in a 64-strain set can be uniquely deconvoluted (to "+" or "-" profiles only) from the pattern of the corresponding 12 spots (e.g., Example 1 of Figure 7).
  • When a 64-strain set contains more than one positive AD strain, there will be deconvolution ambiguity ("?" profiles). False positive or false negative spots can also cause "?" or "n" in the profile, but the profile can still be partially deconvoluted (e.g., Examples 2 and 3 of Figure 7) as described below.
  • a pair may also give a "?" profile if more than one two-hybrid positive strain is present in the corresponding set of 2^n strains. However, this is unlikely to happen on the SPA arrays. It has been estimated that on average one yeast protein binds to only 3 to 10 other proteins.
  • M # of preys to which one bait binds
  • N # of pool families on a SPA array
  • S size of pool family (number of preys in a pool family)
  • x # of pool families having more than one positive AD strain in their corresponding 2^n-strain set.
  • If a pool family on SPA_6 is positive (i.e., at least one positive exists in the corresponding 64 AD strains), then all 6 pairs should be positive (i.e., at least one pool in a pair is positive). Pool families with too few positive pairs (e.g., Example 4 in Fig. 7a) will be removed as false positives because false positive spots usually lack reproducibility.
  • the second step of analyzing SPA array data is deconvolution. Because most yeast proteins bind to only 3 to 10 other proteins, each set of 64 strains on SPA_6 most likely contains zero or only one two-hybrid positive strain. Therefore, the identity of the positive strain in a 64-strain set can be uniquely deconvoluted (to "+" or "-" profiles only) from the pattern of the corresponding 12 spots (e.g., Example 1 of Figure 7a). However, when a 64-strain set contains more than one positive AD strain, there will be deconvolution ambiguity ("?" profiles). False positive or false negative spots can also cause "?" or "n" in the profile, but the profile can still be partially deconvoluted (e.g., Examples 2 and 3 of Figure 7a).
  • pool family "C-E-5" on SPA_4 is constructed from wells E5 of plates 33-48; all 16 wells at the same well position are covered in pool family "B-E-5" on SPA_5, which is derived from wells E5 of plates 33-64. Therefore, pool family "B-E-5" on SPA_5 is a parallel pool family of "C-E-5" on SPA_4.
  • This pool family "B-E-5" (on SPA_5) corresponds to two parallel pool families on SPA_4, i.e., "C-E-5" (wells E5 of plates 33-48) and "D-E-5" (wells E5 of plates 49-64).
  • every pool family on SPA_6 is parallel to two pool families on SPA_5 and four on SPA_4 (Figure 9).
  • a hit's profile on one SPA array should be predictable from its profile on another SPA array.
  • pool family "C-E-5" on SPA_4 should be positive and give a profile of "- - ++".
  • the profile of its parallel pool family "B-E-5" on SPA_5 should be " ++"
  • the profile of parallel pool family "E-5" on SPA_6 should be "+ ++".
  • pool family "B-E-5" on SPA_5 gives profile " ++"
  • the parallel pool family "C-E-5" on SPA_4 should give profile "- - ++".
  • the advantage of the SPA scheme is that we can directly determine if a positive pool family on one SPA array is reproduced in the other SPA arrays.
  • pool family "B-E-5" on SPA_5 as an example.
  • this pool family contains one two-hybrid positive strain and gives the degenerate profile "- ? -n+" due to experimental errors.
  • Although the identity of the positive strain cannot be uniquely deconvoluted, we can still determine if this positive pool family is observed in SPA_4 or SPA_6. From the profile "- ? -n+", we predict that "C-E-5" on SPA_4 should give a four-digit profile which does not conflict with "? -n+".
  • Ambiguity can also be resolved experimentally by testing the unresolved preys individually or, more efficiently, by employing a "reshuffled" pooling configuration (analogous to bait reshuffling as we previously described [43]).
  • SPA yeast two-hybrid smart pool arrays
  • Gac1p a regulatory subunit of protein phosphatase type I involved in glycogen accumulation in Saccharomyces cerevisiae. Mol Genet Genomics 265, 622-635 (2001).
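
The pair-based pooling and profile reading described in the excerpts above can be illustrated with a short Python sketch. It is not the applicants' implementation. All function names are hypothetical, and several conventions are assumptions made for illustration: each strain index is written as an n-digit binary number, digit k selects one of the two pools in pair k, "-" and "+" mean that only the first or only the second pool of a pair is positive, "?" means both are positive, and "n" means neither is. The rule that a parallel pool family on a smaller array is predicted by dropping the leading digit is inferred from the "B-E-5" / "C-E-5" example.

```python
from itertools import product


def pools_for(strain, n_pairs):
    """Return the (pair, bit) pools that hold `strain`, one pool per pair."""
    bits = format(strain, f"0{n_pairs}b")
    return [(pair, int(bit)) for pair, bit in enumerate(bits)]


def read_profile(positive_pools, n_pairs):
    """Collapse the two spots of each pair into one profile symbol."""
    symbols = []
    for pair in range(n_pairs):
        zero = (pair, 0) in positive_pools
        one = (pair, 1) in positive_pools
        if zero and one:
            symbols.append("?")  # both pools positive: ambiguous (assumed meaning)
        elif one:
            symbols.append("+")
        elif zero:
            symbols.append("-")
        else:
            symbols.append("n")  # neither pool positive: no call (assumed meaning)
    return "".join(symbols)


def candidates(profile):
    """List every strain index consistent with a profile; spaces are ignored."""
    choices = []
    for symbol in profile.replace(" ", ""):
        # '+' fixes the bit to 1, '-' fixes it to 0, '?' and 'n' leave it open.
        choices.append({"+": "1", "-": "0"}.get(symbol, "01"))
    return sorted(int("".join(bits), 2) for bits in product(*choices))


def conflicts(predicted, observed):
    """True if two profiles disagree where both carry a definite '+' or '-'."""
    pairs = zip(predicted.replace(" ", ""), observed.replace(" ", ""))
    return any(p in "+-" and o in "+-" and p != o for p, o in pairs)


# One positive strain among 16 (4 pairs, 8 pools): it sits in 4 of the 8 pools.
single = read_profile(set(pools_for(3, 4)), 4)
# Two positive strains in the same set produce an ambiguous profile.
double = read_profile(set(pools_for(3, 4)) | set(pools_for(5, 4)), 4)

print(single, candidates(single))
print(double, candidates(double))
print(candidates("- ? -n+"))          # partial deconvolution of a degenerate profile
print(conflicts("? -n+", "- - ++"))   # predicted SPA_4 profile vs. an observed one
```

Under these assumptions a clean profile narrows the candidates to a single strain, while each "?" or "n" doubles the number of strains that would need individual retesting or a reshuffled pooling configuration.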

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biophysics (AREA)
  • Medicinal Chemistry (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Computing Systems (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

Exemplary embodiments of the present invention relate to methods, systems, kits and data-analysis apparatus for detecting interactions between atoms, molecules and/or cells. One exemplary embodiment of the present invention relates to an innovative pooling-deconvolution strategy that can dramatically reduce the effort required to create large-scale data sets. This "PI-deconvolution" strategy uses an imaginary base-X, N-digit coding of X^N probe proteins (baits or preys) and allows the baits to be screened in X*N pools, with N replicates for each bait. Deconvolution of proteins with their binding partners can be achieved by reading the prey's profile from the X*N experiments. The method can be used to screen X^N bait or prey proteins.
PCT/US2007/004313 2006-02-16 2007-02-16 Innovative pooling and deconvolution strategy for large-scale screening WO2007098130A2 (fr)
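
The base-X, N-digit coding summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the method as claimed: the function names are hypothetical, items are numbered from 0 to X^N - 1, and pool (position, digit) is assumed to contain every item whose code has that digit at that position. With these conventions there are X*N pools, each item appears in exactly N of them, and a single positive item is recovered by reading one positive pool per digit position.

```python
def to_digits(index, base, n_digits):
    """Write `index` as a list of `n_digits` base-`base` digits, most significant first."""
    digits = []
    for _ in range(n_digits):
        index, digit = divmod(index, base)
        digits.append(digit)
    return digits[::-1]


def build_pools(base, n_digits):
    """Assign each of base**n_digits items to one pool per digit position."""
    pools = {(pos, d): [] for pos in range(n_digits) for d in range(base)}
    for item in range(base ** n_digits):
        for pos, d in enumerate(to_digits(item, base, n_digits)):
            pools[(pos, d)].append(item)
    return pools


def decode(positive_pools, base, n_digits):
    """Recover the single positive item, or None if any position is ambiguous or silent."""
    index = 0
    for pos in range(n_digits):
        hits = [d for d in range(base) if (pos, d) in positive_pools]
        if len(hits) != 1:
            return None
        index = index * base + hits[0]
    return index


# Example with X = 3 and N = 4: 81 items screened in 12 pools.
pools = build_pools(base=3, n_digits=4)
positive = {key for key, members in pools.items() if 50 in members}
print(len(pools), len(positive), decode(positive, 3, 4))
```

Returning `None` for an ambiguous or silent position is a simplification; the excerpts above describe partial deconvolution in such cases instead of discarding the result.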

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US77431706P 2006-02-16 2006-02-16
US60/774,317 2006-02-16

Publications (3)

Publication Number Publication Date
WO2007098130A2 WO2007098130A2 (fr) 2007-08-30
WO2007098130A9 WO2007098130A9 (fr) 2007-10-18
WO2007098130A3 WO2007098130A3 (fr) 2008-11-06

Family

ID=38437936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2007/004313 WO2007098130A2 (fr) 2006-02-16 2007-02-16 Innovative pooling and deconvolution strategy for large-scale screening

Country Status (1)

Country Link
WO (1) WO2007098130A2 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10208347B2 (en) * 2016-05-25 2019-02-19 Bioinventors & Entrepreneurs Network, Llc Attribute sieving and profiling with sample enrichment by optimized pooling

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002070662A2 (fr) * 2001-03-02 2002-09-12 Gpc Biotech Ag Systeme de dosage a trois hybrides
WO2002074901A2 (fr) * 2001-03-19 2002-09-26 Hybrigenics Inference de carte d'interaction proteine-proteine faisant appel a des paires de profil de domaine d'interaction

Also Published As

Publication number Publication date
WO2007098130A2 (fr) 2007-08-30
WO2007098130A3 (fr) 2008-11-06

Similar Documents

Publication Publication Date Title
Chanda et al. Fulfilling the promise: drug discovery in the post-genomic era
Bader et al. Functional genomics and proteomics: charting a multidimensional map of the yeast cell
Wilson et al. Recent developments in protein microarray technology
Buck et al. ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments
Lay Jr et al. Problems with the “omics”
Braun Interactome mapping for analysis of complex phenotypes: insights from benchmarking binary interaction assays
Espadaler et al. Prediction of protein–protein interactions using distant conservation of sequence patterns and structure relationships
Scholtens et al. Local modeling of global interactome networks
Wang et al. In vitro DNA-binding profile of transcription factors: methods and new insights
Jin et al. A pooling-deconvolution strategy for biological network elucidation
Furka Forty years of combinatorial technology
Benegas et al. Robust and annotation-free analysis of alternative splicing across diverse cell types in mice
Naidu et al. Current knowledge on microarray technology-an overview
WO2007098130A9 (fr) Innovative pooling and deconvolution strategy for large-scale screening
Alexandari et al. De novo distillation of thermodynamic affinity from deep learning regulatory sequence models of in vivo protein-DNA binding
US20040067539A1 (en) Method of making and using microarrays of biological materials
Braberg et al. Genetic interaction analysis of point mutations enables interrogation of gene function at a residue‐level resolution: Exploring the applications of high‐resolution genetic interaction mapping of point mutations
Johnston The yeast genome: on the road to the Golden Age
Haverty et al. Limited agreement among three global gene expression methods highlights the requirement for non-global validation
US20040096840A1 (en) Validated design for microarrays
Uttamchandani et al. The expanding world of small molecule microarrays
Yao et al. Exploiting antigen receptor information to quantify index switching in single-cell transcriptome sequencing experiments
Frueh et al. Large-scale molecular profiling approaches facilitating translational medicine: Genomics, transcriptomics, proteomics, and metabolomics
US6994965B2 (en) Method for displaying results of hybridization experiment
US20040073527A1 (en) Method, system and computer software for predicting protein interactions

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 07751098; Country of ref document: EP; Kind code of ref document: A2)