WO2005016230A2 - Evaluation of protein signatures - Google Patents

Evaluation of protein signatures

Info

Publication number
WO2005016230A2
WO2005016230A2 PCT/US2003/017979 US0317979W WO2005016230A2 WO 2005016230 A2 WO2005016230 A2 WO 2005016230A2 US 0317979 W US0317979 W US 0317979W WO 2005016230 A2 WO2005016230 A2 WO 2005016230A2
Authority
WO
WIPO (PCT)
Prior art keywords
sample
array
test
substrate
amino acid
Prior art date
Application number
PCT/US2003/017979
Other languages
English (en)
Other versions
WO2005016230A3 (fr)
Inventor
Joshua Labaer
Original Assignee
President And Fellows Of Harvard College
Priority date
Filing date
Publication date
Application filed by President And Fellows Of Harvard College
Priority to AU2003304409A (patent AU2003304409A1)
Priority to US10/910,718 (patent US8609344B2)
Publication of WO2005016230A2
Publication of WO2005016230A3

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/53 Immunoassay; Biospecific binding assay; Materials therefor
    • G01N33/569 Immunoassay; Biospecific binding assay; Materials therefor for microorganisms, e.g. protozoa, bacteria, viruses
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N33/00 Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
    • G01N33/48 Biological material, e.g. blood, urine; Haemocytometers
    • G01N33/50 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
    • G01N33/68 Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving proteins, peptides or amino acids
    • G01N33/6803 General methods of protein analysis not limited to specific proteins or families of proteins
    • G01N33/6845 Methods of identifying protein-protein interactions in protein mixtures

Definitions

  • HBsAg: a protein indicating the presence of active type B hepatitis
  • p24: used to detect the presence of HIV
  • CA-125: used to detect ovarian cancer and some lung cancers
  • CMV Antigen: cytomegalovirus detection
  • Cryptococcal Antigen: detection of cryptococcal infection
  • Rheumatoid Factor: rheumatoid arthritis
  • the invention features a method that includes: preparing a binding pattern signature for the target by contacting a first sample of a test array with a positive control for the target.
  • the test array has a plurality of proteins affixed to a substrate at replicable locations, wherein identification of compositions of the proteins need not be known.
  • a second sample of the array is contacted with a negative control not containing the target.
  • Conditions can be used such that components in the positive control detectably bind to locations of the first array and produce a target pattern, and components in the negative control detectably bind to locations of the second array and produce a control pattern.
  • the signature for the target biological material has locations present in the target pattern and absent from the control pattern.
  • a third sample of the array is contacted with the specimen, to obtain a specimen pattern of locations and the specimen pattern is compared with the target signature, so that presence of the target in the specimen is indicated by the presence of the target signature in the specimen pattern.
  • the target material can be of biological origin, such as a macromolecule, such as all or part of a protein, carbohydrate, lipid, or a monomer thereof, or a molecule having components of more than one of these, such as a lipoprotein.
  • the method can be used, e.g., for evaluating a target, e.g., a sample or a target material in a specimen.
  • the proteins affixed to the substrate are from a mammal, for example, the proteins affixed to the substrate are from a human, for example, the proteins affixed to the substrate are from a cancer cell.
  • Proteins on the array include portions of proteins, such as peptides and oligopeptides, which terms are used interchangeably and have the same meaning.
  • the cancer cell is, for example, a primary or metastatic cancer of lung, skin, leukemia, lymphoma, brain, breast, prostate, bowel, esophagus, liver, pancreas, and head or neck cancers.
  • the proteins affixed to the substrate are from a pathogen.
  • the pathogen is a virus, a bacterium, a fungus, or a protozoan.
  • the target can be a prion.
  • the protein on the array from a bacterium can be any species of a genus Actinobacillus, Bacillus, Borrelia, Brucella, Chlamydia, Clostridium, Coxiella, Enterococcus, Escherichia, Francisella, Hemophilus, Legionella, Mycobacterium, Neisseria, Pasteurella, Pseudomonas, Salmonella, Shigella, Staphylococcus, Streptococcus, Treponema, or Yersinia.
  • the Bacillus is B. anthracis
  • the Escherichia is E. coli O157:H7
  • the Mycobacterium is M. tuberculosis
  • the Borrelia is B.
  • virus is influenza, human immunodeficiency, Venezuelan equine encephalitis, West Nile, smallpox, rhinovirus, Ebola, Rift Valley fever, Lassa fever, measles, mumps, Marburg, yellow fever, herpes, hantavirus, hepatitis A, hepatitis B, hepatitis C, rotavirus, parvovirus, rabies, respiratory syncytial, rubella, Epstein Barr, Newcastle disease, hoof and mouth, tobacco mosaic, Glycine mosaic comovirus, or wheat American striate.
  • the protein on an array from a fungus is a species from the group of genera Aspergillus, Candida, Phytophthora, Puccinia, Lichen, and Trichophyton.
  • the Aspergillus is A. flavus.
  • the target material can be a bacterial or fungal toxin.
  • the protein on an array from a protozoan is Plasmodium, Leishmania, Entamoeba, Enterocytozoan, Cryptosporidium, and Giardia.
  • the proteins affixed to the substrate are random proteins.
  • the specimen can be a biological fluid sample, for example, urine, saliva, lacrymal secretions, nasal discharge, blood, serum, plasma, lymph, perspiration, amniotic fluid, cerebrospinal fluid, ascites fluid, semen, vaginal secretions, feces, or cell extract.
  • the specimen is an environmental sample, for example, the environmental sample is a soil suspension, air infusion, pond water, lake water, river water, ocean water, sewage, industrial effluent, food, beverages, consumable goods, packaged goods, mail, baggage, a fluid extract of a rubbing or an instrument or object, or any physical sample in any phase (e.g., liquid, solid, or vapor).
  • the preparing step in certain embodiments is re-iterated to obtain a statistically significant number of signature binding patterns for the target material.
  • the method can be re-iterated for one or more additional target materials, for example, the signature binding pattern for the additional biological material is obtained using a fourth sample of the replicable test array.
  • the preparing step can in certain embodiments be preparing a binding signature for a target material which is a component of a novel organism. Iteration can be performed in parallel or sequentially.
  • the biological fluid can be obtained from a patient with an acute medical condition, for example, the acute medical condition is a cardiac condition, such as myocardial infarction or stroke.
  • the biological fluid can be obtained from a patient with an autoimmune disease, such as multiple sclerosis, myasthenia gravis, Hashimoto's disease, systemic lupus erythematosus, uveitis, Guillain-Barré syndrome, Graves' disease, idiopathic myxedema, autoimmune oophoritis, chronic immune thrombocytopenic purpura, colitis, diabetes, psoriasis, pemphigus vulgaris, and rheumatoid arthritis.
  • the sample can be obtained from a patient having an inflammatory condition, for example, asthma, allergy, and inflammatory bowel syndrome.
  • the signature for the target biological material additionally comprises at least one location present in the control pattern and absent from the target pattern.
  • An embodiment of the invention is thus a universal test array for detecting an unwanted cell, disease, or organism in a biological specimen from a mammal by a method described herein (e.g., the above method), wherein the sample of the test array comprises a plurality of proteins of the mammal.
  • the invention also provides a kit for detecting a pathogen, a kit for detecting a cancer cell, a kit for detecting an acute medical condition, and a kit for detecting an autoimmune disease, e.g., a kit provided by a method described herein.
  • the kit includes: (1) an array comprising a plurality of addresses, wherein each address of the plurality comprises a handle and (2) a vector nucleic acid comprising (i) a promoter; (ii) an entry site; and (iii) a tag encoding sequence, wherein the tag can be attached to the handle.
  • the kit can further include software and/or a database, e.g., in computer memory or a computer readable medium (e.g., a CD-ROM, a magnetic disc, flash memory).
  • Each record of the database can include a field for the polypeptide (e.g., a randomized polypeptide) encoded by the nucleic acid sequence and a descriptor or reference for the physical location of the encoding nucleic acid sequence in the kit, e.g., location in a microtitre plate.
  • the record also includes a field representing a result (e.g., a qualitative or quantitative result) of detecting the polypeptide encoded by the nucleic acid sequence.
  • the database can include a record for each address of the plurality present on the array.
  • the records can be clustered or have a reference to other records (e.g., including hierarchical groupings) based on the result.
  • the software can contain computer readable code to configure a computer-controlled robotic apparatus to manipulate nucleic acids encoding test amino acid sequences and vector nucleic acids in order to insert the encoding nucleic acids into the vector nucleic acids and further to manipulate the insertion products onto addresses of the array.
  • the kit can also include instructions for use of the array or a link or indication of a network resource (e.g., a web site) having instructions for use of the array or the above database of records describing the addresses of the array.
  • a method of providing an array can include: providing the aforementioned kit, and a plurality of nucleic acid sequences, each encoding a unique test amino acid sequence and an excision site.
  • the method further includes removing each of the plurality of nucleic acid sequences from the excision site and inserting it into the entry site of the vector nucleic acid to thereby generate a test nucleic acid sequence encoding a test polypeptide comprising the test amino acid sequence and the tag; and disposing each of the plurality of test nucleic acid sequences at an address of the array.
  • the proteins on the array are randomized or include a randomized segment of at least 10 amino acids in which at least four, five, eight, nine or ten positions are randomized. Randomization can be generated, e.g., using degenerate oligonucleotides or a random number generator (e.g., on a computer).
  • the randomized segment is between 10-50, 10-20, or 10-200 amino acids in length. Longer segments can also be used. The degree of randomization can vary at a given position and by the number of positions (e.g., between 10-100, 30-100, 60-100, 80-100 or 70-90% of the positions). Randomization can include biased compositions of starting material.
  • the randomized segment is within a domain of a folded protein, e.g., a binding loop of an extracellular protein or domain thereof, e.g., a CDR of an immunoglobulin domain.
  • the invention features a method of providing an interaction profile.
  • the method includes providing an array of capture probes, contacting a sample to the array, and identifying probes with which the sample interacts (e.g., with which one or more molecules in the sample interact), thus providing an interaction profile.
  • the array includes a plurality of capture probes. Each capture probe is positionally distinguishable from the other probes. In one embodiment, each probe includes a unique region. In another embodiment, each probe includes a randomized region.
  • the interaction of the compound with the probe results in a covalent modification of the probe, e.g., a covalent bond of the probe can be broken or formed.
  • the interaction of the compound with the capture probe is a binding interaction wherein neither the compound nor the probe has a covalent bond broken or formed.
  • the interaction profile is a list of objects, each object representing a unique capture probe, and having an associated parameter, e.g., a numerical value. The list can contain two, three, four, five, six, seven, eight, nine, ten, 15, 20, 50, 100, 1000 or more objects.
  • each unique capture probe is represented by an object. In this embodiment, the list includes as many objects as unique capture probes.
  • the list includes the capture probes which interact with the compound.
  • the list can contain only those capture probes for which an interaction was detected, or only those capture probes for which an interaction met a predetermined condition.
  • Such a list has fewer objects as members than the number of unique capture probes.
  • the interaction profile is stored in computer memory, such as random access memory or flash memory, or on computer readable media, such as magnetic (e.g., a diskette, removable hard drive, or internal hard drive) or optical media (e.g., a compact disk (CD), DVD, or holographic media).
  • a profile stored in this manner can be on a personal computer, server, e.g., a network server, or mainframe, and can be accessed from another device across a network, e.g., an intranet or internet.
  • the interaction profile is printed onto a medium such as a plastic, a paper or a label, e.g., as a bar code or variation thereof.
  • the parameter associated with each object of an interaction profile can be obtained from a quantitative observation, or a qualitative observation, preferably a quantitative observation.
  • the associated parameter is a function of the amount of interaction between the compound and the probe.
  • the amount of interaction can be the amount of binding, the amount of probe modification, or affinity.
  • the associated parameter is a function of the amount of binding between the compound and the probe.
  • the parameter can be a function of the amount of a quantitative observation such as a fluorescent signal, a radioactive signal, or a phosphorescent signal of a contacted capture probe.
  • the parameter can be provided by an instrument, e.g., a CCD camera.
  • the parameter is a function of the surface plasmon resonance at the site of a contacted capture probe.
  • the associated parameters are adjusted for a background signal.
  • the associated parameter is a function of moles of bound compound.
  • the associated parameter is an affinity, relative affinity, apparent affinity, association constant, dissociation constant, logarithm of an affinity, or free energy for binding, of the compound for the capture probe.
  • the associated parameters in the list differ.
  • the list contains more than one object, i.e., the associated parameters of the objects in the list are not all the same.
  • the associated parameters are values.
  • the values can provide a range.
  • the values can be distributed in the range. In some embodiments, the values can approximate a Poisson distribution.
  • the list can contain objects whose associated values are zero, or null.
  • the list can contain objects whose associated values are positive or negative. In one embodiment, the list does not contain any objects whose associated values are zero or null.
  • interaction profiles are provided for a sample using varying amounts of the sample, i.e. an interaction profile is provided for a sample at a first concentration, at a second concentration, etc.
  • interaction profiles are provided for a sample for interaction with varying concentrations of capture probes.
  • an array can have more than one unit, the compositions of the units being identical, but the first unit having the probes at a first concentration, and the second unit having the probes at a second concentration, etc.
  • interaction profiles are provided for a sample for various intervals after contacting the sample to the array. For example, a first profile can be provided after a first interval of time has elapsed after application, and a second profile can be provided after a second interval, etc.
  • the capture probes contain a unique region.
  • the unique region is an interaction site, an interaction site variant, or putative interaction site.
  • the unique region is random.
  • the array of the present invention can contain at least two, four, preferably 16, 64, 96, 128, 384, 1536, or more unique capture probes.
  • the array is a solid silica support.
  • the plurality of nucleic acid probes can be stably attached to the support by a covalent bond.
  • a probe can be stably attached with a silane compound reactive with primary amines, or acrylamide moieties in the oligonucleotide or PCR products, or an amino silane or polylysine or other polymer capable of UV crosslinking to single-stranded or double-stranded DNA.
  • the array is a glass slide.
  • the slide can be treated with an activating agent, e.g., 1,4-diphenylene-diisothiocyanate (PDC).
  • the capture probes are small organic chemicals, e.g., compounds with a molecular weight of 10,000, 5,000, 3,000, or 1,000 Daltons or less.
  • the capture probes on the array can be biological polymers such as nucleic acids, polypeptides, complex sugars, and combinations thereof.
  • a polypeptide can be covalently linked to a DNA or RNA.
  • the capture probes are polypeptides.
  • the polypeptides can contain 2, 10, 20, 30, 50, or 100 or more amino acids.
  • the polypeptides are antibodies.
  • the capture probes are nucleic acids.
  • the capture probes can be a deoxyribonucleic acid (DNA), a ribonucleic acid (RNA), a peptide nucleic acid (PNA), or any combination thereof.
  • the unique region is preferably a binding site.
  • the term "database" refers to at least one table of information, containing at least one record. A record is a row in the table. A record can have one or more fields or attributes.
  • a record in a database of interaction profiles can have fields describing the location of a capture probe on an array, the composition, e.g., nucleic acid sequence, of the capture probe at the location, and/or a value, e.g., a numerical value, which is a function of the extent of interaction of the capture probe with a sample.
  • the invention features an array including a substrate having a plurality of addresses.
  • Each address of the plurality includes (1) a nucleic acid (e.g., a DNA or an RNA) encoding a hybrid amino acid sequence which includes a test amino acid sequence and an affinity tag, and, optionally, (2) a binding agent that recognizes the affinity tag.
  • each address of the plurality also includes one or both of (i) an RNA polymerase; and (ii) a translation effector.
  • each test amino acid sequence in the plurality of addresses is unique.
  • a test amino acid sequence can differ from all other test amino acid sequences of the plurality by 1 or more amino acid differences (e.g., about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 256, 128, 64, or 32, 16, 8, 4, or fewer differences).
  • the test amino acid sequence encoded by the nucleic acid at each address of the plurality is identical to all other test amino acid sequences in the plurality of addresses.
  • the affinity tag encoded by the nucleic acid at each address of the plurality is the same, or substantially identical to all other affinity tags in the plurality of addresses.
  • the nucleic acid at each address of the plurality encodes more than one affinity tag.
  • the affinity tag encoded by the nucleic acid at an address of the plurality differs from at least one other affinity tag in the plurality of addresses.
  • the affinity tag is fused directly to the test amino acid sequence, e.g., directly amino-terminal, or directly carboxy-terminal.
  • the affinity tag is separated from the test amino acid by one or more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, preferably about 1 to 20, or about 3 to 12 amino acids.
  • the linker amino acids can include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably glycine), and/or polar amino acids.
  • the linker and affinity tag can be amino-terminal or carboxy-terminal to the test amino acid sequence.
  • the nucleic acid can be an RNA, or a DNA (e.g., a single-stranded DNA, or a double-stranded DNA).
  • the nucleic acid includes a plasmid DNA or a fragment thereof; an amplification product (e.g., a product generated by RCA, PCR, NASBA); or a synthetic DNA.
  • the nucleic acid can further include one or more of: a transcription promoter; a transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional terminator; and an internal ribosome entry site.
  • the nucleic acid sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the sequence is dicistronic or polycistronic.
  • the nucleic acid also includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be quantitated and can provide an indication of the quantity of test polypeptide fixed to the plate.
  • the reporter protein can be attached to the test polypeptide, e.g., covalently attached, e.g., attached as a translational fusion.
  • the reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth.
  • the reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the like), and luciferase.
  • the transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, or a viral promoter.
  • the promoter is the T7 RNA polymerase promoter.
  • the regulatory components e.g., the transcription promoter, can vary among nucleic acids at different addresses of the plurality. For example, different promoters can be used to vary the amount of polypeptide produced at different addresses.
  • the nucleic acid also includes at least one site for recombination, e.g., homologous recombination or site-specific recombination, e.g., a lambda att site or variant thereof; a lox site; or a FLP site.
  • the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a test amino acid sequence.
  • the recombination site includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid sequence.
  • the nucleic acid includes a sequence encoding a cleavage site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic acid)).
  • the nucleic acid can include a sequence encoding a second polypeptide tag in addition to the affinity tag.
  • the second tag can be C-terminal to the test amino acid sequence and the affinity tag can be N-terminal to the test amino acid sequence; the second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be C-terminal to the test amino acid sequence; the second tag and the affinity tag can be adjacent to one another, or separated by a linker sequence, both being N-terminal or C-terminal to the test amino acid sequence.
  • the second tag is an additional affinity tag, e.g., the same or different from the first tag.
  • the second tag is a recognition tag.
  • the recognition tag can report the presence and/or amount of test polypeptide at an address.
  • the recognition tag has a sequence other than the sequence of the affinity tag.
  • a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20 tags) are encoded in addition to the first affinity tag.
  • Each polypeptide tag of the plurality can be the same as or different from the first affinity tag.
  • the nucleic acid sequence can further include an identifier sequence, e.g., a non-coding nucleic acid sequence, e.g., one that is synthetically inserted, and allows for uniquely identifying the nucleic acid sequence.
  • the identifier sequence can be sufficient in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500, 10 to 100, 10 to 50, or about 10 to 30 nucleotides in length.
  • the identifier can be selected so that it is not complementary or identical to another identifier or any region of each nucleic acid sequence of the plurality on the array.
  • the test amino acid sequence can further include a protein splicing sequence or intein.
  • the intein can be inserted in the middle of a test amino acid sequence.
  • the intein can be a naturally-occurring intein or a mutated intein.
  • the nucleic acids encoding the test amino acid sequences can be obtained from a collection of full-length expressed genes (e.g., a repository of clones), a cDNA library, or a genomic library.
  • the encoding nucleic acids can be nucleic acids (e.g., an mRNA or cDNA) expressed in a tissue, e.g., a normal or diseased tissue.
  • the test polypeptides i.e., test amino acid sequences
  • the test polypeptides are random amino acid sequences, patterned amino acids sequences, or designed amino acids sequences (e.g., sequence designed by manual, rational, or computer-aided approaches).
  • the plurality of test amino acid sequences can include a plurality from a first source, and plurality from a second source.
  • the test amino acid sequences on half the addresses of an array are from a diseased tissue or a first species, whereas the sequences on the remaining half are from a normal tissue or a second species.
  • each address of the plurality further includes one or more second nucleic acids, e.g., a plurality of unique nucleic acids.
  • the plurality in toto can encode a plurality of test sequences.
  • each address of the plurality can encode a pool of test polypeptide sequences, e.g., a subset of a library or clone bank
  • a second array can be provided in which each address of the plurality of the second array includes a single or subset of members of the pool present at an address of the first array.
  • the first and the second array can be used consecutively.
  • each address of the plurality further includes a second nucleic acid encoding a second amino acid sequence.
  • each address of the plurality includes a first test amino acid sequence that is common to all addresses of the plurality, and a second test amino acid sequence that is unique among all the addresses of the plurality.
  • the second test amino acid sequences can be query sequences whereas the first test amino acid sequence can be a target sequence.
  • each address of the plurality includes a first test amino acid sequence that is unique among all the addresses of the plurality, and a second test amino acid sequence that is common to all addresses of the plurality.
  • the first test amino acid sequences can be query sequences whereas the second test amino acid sequence can be a target sequence.
  • the second nucleic acid encoding the second test amino acid sequence can include a sequence encoding a recognition tag and/or an affinity tag.
  • the first and second amino acid sequences can be such that they interact with one another. In one preferred embodiment, they are capable of binding to each other.
  • the second test amino acid sequence is optionally fused to a detectable amino acid sequence, e.g., an epitope tag, an enzyme, a fluorescent protein (e.g., GFP, BFP, variants thereof).
  • the second test amino acid sequence can be itself detectable (e.g., an antibody is available which specifically recognizes it).
  • one is capable of modifying the other (e.g., making or breaking a bond, preferably a covalent bond, of the other).
  • the first amino acid sequence is a kinase capable of phosphorylating the second amino acid sequence; the first is a methylase capable of methylating the second; the first is a ubiquitin ligase capable of ubiquitinating the second; the first is a protease capable of cleaving the second; and so forth.
  • These embodiments can be used to identify an interaction or to identify a compound that modulates, e.g., inhibits or enhances, an interaction.
  • the binding agent can be attached to the substrate.
  • the substrate can be derivatized and the binding agent covalently attached thereto.
  • the binding agent can be attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a first member of a specific binding pair, and the binding agent is linked to the second member of the binding pair, the
  • an insoluble substrate e.g., a bead or particle
  • the binding agent is attached to the insoluble substrate.
  • the insoluble substrate can further contain information encoding its identity, e.g., a reference to the address on which it is disposed.
  • the insoluble substrate can be tagged using a chemical tag, or an electronic tag (e.g., a transponder).
  • the insoluble substrate can be disposed such that it can be removed for later analysis.
  • Each record of the database can include a field for the amino acid sequence encoded by the nucleic acid sequence and a descriptor or reference for the physical location of the nucleic acid sequence on the array.
  • the record also includes a field representing a result (e.g., a qualitative or quantitative result) of detecting the polypeptide encoded by the nucleic acid sequence.
  • the database can include a record for each address of the plurality present on the array.
  • the records can be clustered or have a reference to other records (e.g., including hierarchical groupings) based on the result.
  • the invention features an array including a substrate having a plurality of addresses.
  • Each address of the plurality includes: (1) an RNA encoding a hybrid amino acid sequence comprising a test amino acid sequence and an affinity tag, and (2) a binding agent that recognizes the affinity tag.
  • each address of the plurality also includes one or both of (i) a transcription effector, and (ii) a translation effector.
  • the array can include other features described herein.
  • the invention features a method of providing an array of polypeptides. The method includes: (1) providing or obtaining a substrate with a plurality of addresses, each address of the plurality including (i) a nucleic acid encoding an amino acid sequence comprising a test amino acid sequence and an affinity tag, and (ii) a binding agent that recognizes the affinity tag; (2) contacting each address of the plurality with a translation effector to thereby translate the hybrid amino acid sequence; and (3) maintaining the substrate under conditions permissive for the amino acid sequence to bind the binding agent.
  • the substrate can be contacted to a sample, e.g., as described here.
  • the nucleic acid provided on the substrate is synthesized in situ, e.g., by light-directed chemistry.
  • each address of the plurality is provided with a nucleic acid, e.g., by pipetting, spotting, printing (e.g., with pins), piezoelectric delivery, or, e.g., other means of mechanical delivery.
  • the provided nucleic acid is a template nucleic acid, and the method further includes amplifying the template, e.g., by PCR, NASBA, or RCA.
  • the method can further include transcribing the nucleic acid to produce one or more RNA molecules encoding the test amino acid sequence.
  • the method can further include washing the substrate, e.g., after sufficient contact with a translation effector.
  • the wash step can be repeated, e.g., one or more times, e.g., until a translation effector or translation effector component is removed.
  • the wash step can remove unbound proteins.
  • the stringency of the wash step can vary, e.g., the salt, pH, and buffer composition of the wash buffer can vary.
  • the substrate can be washed with a chaotrope, (e.g., guanidinium hydrochloride, or urea).
  • the chaotrope can itself be washed from the array, and the polypeptides renatured.
  • the nucleic acid sequence also encodes a cleavage site, e.g., a protease site, e.g., between the test amino acid sequence and the affinity tag.
  • the method can further include contacting an address of the array with a protease that specifically recognizes the site.
  • the method can further include contacting the substrate with a second substrate.
  • in an embodiment wherein the substrate is a gel, the gel can be contacted with a second gel, and the contents of one gel can be transferred to the other (e.g., by diffusion or electrophoresis).
  • the method can include disrupting the binding between the affinity tag and the binding agent or between the binding agent and the substrate prior to transfer.
  • the method can further include contacting the substrate with living cells, and detecting an address wherein a parameter of the cell is altered relative to another address.
  • each test amino acid sequence in the plurality of addresses is unique.
  • a test amino acid sequence can differ from all other test amino acid sequences of the plurality by 1 or more amino acid differences (e.g., about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 256, 128, 64, or 32, 16, 8, 4, or fewer differences).
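As a rough illustration of counting amino acid differences between test sequences, the sketch below compares sequences position by position. It assumes, purely for the example, that all sequences have the same length (so no alignment is needed); the function names are hypothetical.

```python
from itertools import combinations


def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles sequences of equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def min_pairwise_differences(sequences) -> int:
    """Smallest difference count over all pairs; needs at least two sequences."""
    return min(count_differences(a, b) for a, b in combinations(sequences, 2))
```

If the minimum over all pairs is at least 1, every test sequence in the set is unique.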
  • the test amino acid sequence encoded by the nucleic acid at each address of the plurality is identical to all other test amino acid sequences in the plurality of addresses.
  • the affinity tag encoded by the nucleic acid at each address of the plurality is the same, or substantially identical to all other affinity tags in the plurality of addresses.
  • the nucleic acid at each address of the plurality encodes more than one affinity tag.
  • the affinity tag encoded by the nucleic acid at an address of the plurality differs from at least one other affinity tag in the plurality of addresses.
  • the affinity tag is fused directly to the test amino acid sequence, e.g., directly amino-terminal, or directly carboxy-terminal.
  • the affinity tag is separated from the test amino acid by one or more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, preferably about 1 to 20, or about 3 to 12 amino acids.
  • the linker amino acids can include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably glycine), and/or polar amino acids.
  • the linker and affinity tag can be amino-terminal or carboxy-terminal to the test amino acid sequence.
  • the nucleic acid can further include one or more of: a transcription promoter; a transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional terminator; and an internal ribosome entry site.
  • the nucleic acid sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the sequence is dicistronic or polycistronic.
  • the nucleic acid also includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be quantitated and can provide an indication of the quantity of test polypeptide fixed to the plate.
  • the reporter protein can be attached to the test polypeptide, e.g., covalently attached, e.g., attached as a translational fusion.
  • the reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth.
  • the reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the like), and luciferase.
  • the transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, or a viral promoter. In a preferred embodiment, the promoter is the T7 RNA polymerase promoter.
  • the regulatory components, e.g., the transcription promoter can vary among nucleic acids at different addresses of the plurality.
  • the nucleic acid also includes at least one site for recombination, e.g., homologous recombination or site-specific recombination, e.g., a lambda att site or variant thereof; a lox site; or a FLP site.
  • the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a test amino acid sequence.
  • the recombination site includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid sequence.
  • the nucleic acid includes a sequence encoding a cleavage site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic acid)).
  • the nucleic acid can include a sequence encoding a second polypeptide tag in addition to the affinity tag.
  • the second tag can be C-terminal to the test amino acid sequence and the affinity tag can be N-terminal to the test amino acid sequence; the second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be C-terminal to the test amino acid sequence; the second tag and the affinity tag can be adjacent to one another, or separated by a linker sequence, both being N-terminal or C-terminal to the test amino acid sequence.
  • the second tag is an additional affinity tag, e.g., the same or different from the first tag.
  • the second tag is a recognition tag.
  • the recognition tag can report the presence and/or amount of test polypeptide at an address.
  • the recognition tag has a sequence other than the sequence of the affinity tag.
  • a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20 tags) are encoded in addition to the first affinity tag.
  • Each polypeptide tag of the plurality can be the same as or different from the first affinity tag.
  • the nucleic acid sequence can further include an identifier sequence, e.g., a non-coding nucleic acid sequence, e.g., one that is synthetically inserted, and allows for uniquely identifying the nucleic acid sequence.
  • the identifier sequence can be sufficient in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500, 10 to 100, 10 to 50, or about 10 to 30 nucleotides in length.
  • the identifier can be selected so that it is not complementary or identical to another identifier or any region of each nucleic acid sequence of the plurality on the array.
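The selection criterion for identifiers (not identical or complementary to another identifier or to any region of an arrayed sequence) can be checked mechanically. The sketch below is one possible check under stated assumptions: sequences are DNA over A/C/G/T, "complementary" is read as exact reverse complement, and the `sequences` argument holds the arrayed sequences without their own identifier. The function names are hypothetical.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def identifier_conflicts(identifiers, sequences):
    """Return (identifier, reason) pairs for identifiers that violate the criterion."""
    ids = [ident.upper() for ident in identifiers]
    seqs = [seq.upper() for seq in sequences]
    conflicts = []
    for i, ident in enumerate(ids):
        rc = reverse_complement(ident)
        for j, other in enumerate(ids):
            if i == j:
                continue
            if other == ident:
                conflicts.append((ident, "identical to another identifier"))
            elif other == rc:
                conflicts.append((ident, "complementary to another identifier"))
        # An identifier should not match any region of an arrayed sequence on either strand.
        if any(ident in seq or rc in seq for seq in seqs):
            conflicts.append((ident, "matches a region of an arrayed sequence"))
    return conflicts
```

An empty result means the candidate identifiers pass this simplified check; partial complementarity, which can also cause cross-hybridization, is not considered here.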
  • the test amino acid sequence can further include a protein splicing sequence or intein.
  • the intein can be inserted in the middle of a test amino acid sequence.
  • the intein can be a naturally-occurring intein or a mutated intein.
  • the nucleic acid sequences encoding the test amino acid sequences can be obtained from a collection of full-length expressed genes (e.g., a repository of clones), a cDNA library, or a genomic library.
  • the test amino acid sequences can be genes expressed in a tissue, e.g., a normal or diseased tissue.
  • the test polypeptides can be mutants or variants of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide hormone etc.).
  • the test polypeptides are random amino acid sequences, patterned amino acids sequences, or designed amino acids sequences (e.g., sequence designed by manual, rational, or computer-aided approaches).
  • the plurality of test amino acid sequences can include a plurality from a first source, and plurality from a second source.
  • the test amino acid sequences on half the addresses of an array are from a diseased tissue or a first species, whereas the sequences on the remaining half are from a normal tissue or a second species.
  • each address of the plurality further includes one or more second nucleic acids, e.g., a plurality of unique nucleic acids.
  • the plurality in toto can encode a plurality of test sequences.
  • each address of the plurality can encode a pool of test polypeptide sequences, e.g., a subset of a library or clone bank.
  • a second array can be provided in which each address of the plurality of the second array includes a single or subset of members of the pool present at an address of the first array.
  • the first and the second array can be used consecutively.
  • each address of the plurality further includes a second nucleic acid encoding a second amino acid sequence.
  • each address of the plurality includes a first test amino acid sequence that is common to all addresses of the plurality, and a second test amino acid sequence that is unique among all the addresses of the plurality.
  • the second test amino acid sequences can be query sequences whereas the first test amino acid sequence can be a target sequence.
  • each address of the plurality includes a first test amino acid sequence that is unique among all the addresses of the plurality, and a second test amino acid sequence that is common to all addresses of the plurality.
  • the first test amino acid sequences can be query sequences whereas the second test amino acid sequence can be a target sequence.
  • the second nucleic acid encoding the second test amino acid sequence can include a sequence encoding a recognition tag and/or an affinity tag.
  • the first and second amino acid sequences can be such that they interact with one another. In one preferred embodiment, they are capable of binding to each other.
  • the second test amino acid sequence is optionally fused to a detectable amino acid sequence, e.g., an epitope tag, an enzyme, a fluorescent protein (e.g., GFP, BFP, variants thereof).
  • the second test amino acid sequence can be itself detectable (e.g., an antibody is available which specifically recognizes it).
  • the method can further include detecting the second test amino acid sequence at each address of the plurality, e.g., by detecting the detectable amino acid sequence (e.g., the epitope tag, enzyme or fluorescent protein).
  • one is capable of modifying the other (e.g., making or breaking a bond, preferably a covalent bond, of the other).
  • the first amino acid sequence is a kinase capable of phosphorylating the second amino acid sequence; the first is a methylase capable of methylating the second; the first is a ubiquitin ligase capable of ubiquitinating the second; the first is a protease capable of cleaving the second; and so forth.
  • the method can further include detecting the modification at each address of the plurality. These embodiments can be used to identify an interaction or to identify a compound that modulates, e.g., inhibits or enhances, an interaction.
  • the binding agent can be attached to the substrate.
  • the substrate can be derivatized and the binding agent covalently attached thereto.
  • the binding agent can be attached via a bridging moiety, e.g., a specific binding pair, (e.g., the substrate contains a first member of a specific binding pair, and the binding agent is linked to the second member of the binding pair, the second member being attached to the substrate).
  • the binding agent can be attached to an insoluble substrate, e.g., a bead or particle.
  • the insoluble substrate can further contain information encoding its identity, e.g., a reference to the address on which it is disposed.
  • the insoluble substrate can be tagged using a chemical tag, or an electronic tag (e.g., a transponder).
  • the insoluble substrate can be disposed such that it can be removed for later analysis.
  • the invention features a method of providing an array across a network, e.g., a computer network, or a telecommunications network.
  • the method includes: providing a substrate comprising a plurality of addresses, each address of the plurality having a binding agent; providing a plurality of nucleic acid sequences, each nucleic acid sequence comprising a sequence encoding a test amino acid sequence and an affinity tag that is recognized by the binding agent; providing on a server a list of either (i) nucleic acid sequences of the plurality or (ii) subsets of the plurality (e.g., sets of randomized sequences); transmitting the list across a network to a user; receiving at least one selection of the list from the user; disposing the one or more nucleic acid sequence corresponding to the selection on an address of the plurality; and providing the substrate to the user.
  • the plurality of nucleic acid sequences includes a random segment, e.g., a segment encoding a randomized polypeptide sequence.
  • each nucleic acid sequence is disposed at a unique address. For example, if a subset is selected, each nucleic acid sequence of the subset is disposed at a unique address.
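The step of disposing each selected nucleic acid sequence at a unique address can be sketched as a simple layout function. The grid geometry (rows by columns), the row-major fill order, and the name `assign_addresses` are assumptions for illustration; the text does not prescribe a particular layout.

```python
def assign_addresses(selected_ids, rows: int, cols: int):
    """Map each selected sequence identifier to its own (row, column) address."""
    if len(set(selected_ids)) != len(selected_ids):
        raise ValueError("each selected sequence should appear only once")
    if len(selected_ids) > rows * cols:
        raise ValueError("more sequences selected than addresses available")
    layout = {}
    for index, seq_id in enumerate(selected_ids):
        # divmod gives (row, column) when the grid is filled row by row.
        layout[divmod(index, cols)] = seq_id
    return layout
```

The resulting mapping could be handed to whatever system (robotic or manual) performs the spotting, and kept as the record of which sequence occupies which address.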
  • a plurality of nucleic acid sequences are disposed at each address. The method can further include contacting each address of the plurality with one or more of (i) a transcription effector, and (ii) a translation effector.
  • the substrate is maintained under conditions permissive for the amino acid sequence to bind the binding agent.
  • One or more addresses can then be washed, e.g., to remove at least one of (i) the nucleic acid, (ii) the transcription effector, (iii) the translation effector, and/or (iv) an unwanted polypeptide, e.g., an unbound polypeptide or unfolded polypeptide.
  • the array can optionally be contacted with a compound, e.g., a chaperone; a protease; a protein-modifying enzyme; a small molecule, e.g., a small organic compound (e.g., of molecular weight less than 5000, 3000, 1000, 700, 500, or 300 Daltons); nucleic acids; or other complex macromolecules e.g., complex sugars, lipids, or matrix molecules.
  • the array can be further processed, e.g., prepared for storage. It can be enclosed in a package, e.g., an air- or water-resistant package.
  • the array can be desiccated, frozen, or contacted with a storage agent (e.g., a cryoprotectant, an antibacterial, an anti-fungal).
  • an array can be rapidly frozen after being optionally contacted with a cryoprotectant. This step can be done at any point in the process (e.g., before or after contacting the array with an RNA polymerase; before or after contacting the array with a translation effector; or before or after washing the array).
  • the packaged product can be supplied to a user with or without additional contents, e.g., a transcription effector, a translation effector, a vector nucleic acid, an antibody, and so forth.
  • each test amino acid sequence in the plurality of addresses is unique.
  • a test amino acid sequence can differ from all other test amino acid sequences of the plurality by 1 or more amino acid differences (e.g., about 2, 3, 4, 5, 8, 16, 32, 64 or more differences; and, by way of example, has about 800, 256, 128, 64, or 32, 16, 8, 4, or fewer differences).
  • the test amino acid sequence encoded by the nucleic acid at each address of the plurality is identical to all other test amino acid sequences in the plurality of addresses.
  • the affinity tag encoded by the nucleic acid at each address of the plurality is the same, or substantially identical to all other affinity tags in the plurality of addresses.
  • the nucleic acid at each address of the plurality encodes more than one affinity tag.
  • the affinity tag encoded by the nucleic acid at an address of the plurality differs from at least one other affinity tag in the plurality of addresses.
  • the affinity tag is fused directly to the test amino acid sequence, e.g., directly amino-terminal, or directly carboxy-terminal.
  • the affinity tag is separated from the test amino acid by one or more linker amino acids, e.g., 1, 2, 3, 4, 5, 6, 8, 10, 12, 20, 30 or more amino acids, preferably about 1 to 20, or about 3 to 12 amino acids.
  • the linker amino acids can include a cleavage site, flexible amino acids (e.g., glycine, alanine, or serine, preferably glycine), and/or polar amino acids.
  • the linker and affinity tag can be amino-terminal or carboxy-terminal to the test amino acid sequence.
  • the nucleic acid can be an RNA, or a DNA (e.g., a single-stranded DNA, or a double-stranded DNA).
  • the nucleic acid includes a plasmid DNA or a fragment thereof; an amplification product (e.g., a product generated by RCA, PCR, NASBA); or a synthetic DNA.
  • the nucleic acid can further include one or more of: a transcription promoter; a transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a cleavage site; a recombination site; a 3' untranslated sequence; a transcriptional terminator; and an internal ribosome entry site.
  • the nucleic acid sequence includes a plurality of cistrons (also termed "open reading frames"), e.g., the sequence is dicistronic or polycistronic.
  • the nucleic acid also includes a sequence encoding a reporter protein, e.g., a protein whose abundance can be quantitated and can provide an indication of the quantity of test polypeptide fixed to the plate.
  • the reporter protein can be attached to the test polypeptide, e.g., covalently attached, e.g., attached as a translational fusion.
  • the reporter protein can be an enzyme, e.g., β-galactosidase, chloramphenicol acetyl transferase, β-glucuronidase, and so forth.
  • the reporter protein can produce or modulate light, e.g., a fluorescent protein (e.g., green fluorescent protein, variants thereof, red fluorescent protein, variants thereof, and the like), and luciferase.
  • the transcription promoter can be a prokaryotic promoter, a eukaryotic promoter, or a viral promoter.
  • the promoter is the T7 RNA polymerase promoter.
  • the regulatory components e.g., the transcription promoter, can vary among nucleic acids at different addresses of the plurality. For example, different promoters can be used to vary the amount of polypeptide produced at different addresses.
  • the nucleic acid also includes at least one site for recombination, e.g., homologous recombination or site-specific recombination, e.g., a lambda att site or variant thereof; a lox site; or a FLP site.
  • the recombination site lacks stop codons in the reading frame of a nucleic acid encoding a test amino acid sequence.
  • the recombination site includes a stop codon in the reading frame of a nucleic acid encoding a test amino acid sequence.
  • the nucleic acid includes a sequence encoding a cleavage site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic acid)).
  • the nucleic acid can include a sequence encoding a second polypeptide tag in addition to the affinity tag.
  • the second tag can be C-terminal to the test amino acid sequence and the affinity tag can be N-terminal to the test amino acid sequence; the second tag can be N-terminal to the test amino acid sequence, and the affinity tag can be C-terminal to the test amino acid sequence; the second tag and the affinity tag can be adjacent to one another, or separated by a linker sequence, both being N-terminal or C-terminal to the test amino acid sequence.
  • the second tag is an additional affinity tag, e.g., the same or different from the first tag.
  • the second tag is a recognition tag.
  • the recognition tag can report the presence and/or amount of test polypeptide at an address.
  • the recognition tag has a sequence other than the sequence of the affinity tag.
  • a plurality of polypeptide tags (e.g., less than 3, 4, 5, about 10, or about 20 tags) are encoded in addition to the first affinity tag.
  • Each polypeptide tag of the plurality can be the same as or different from the first affinity tag.
  • the nucleic acid sequence can further include an identifier sequence, e.g., a non-coding nucleic acid sequence, e.g., one that is synthetically inserted, and allows for uniquely identifying the nucleic acid sequence.
  • the identifier sequence can be sufficient in length to uniquely identify each sequence in the plurality; e.g., it is about 5 to 500, 10 to 100, 10 to 50, or about 10 to 30 nucleotides in length.
  • the identifier can be selected so that it is not complementary or identical to another identifier or any region of each nucleic acid sequence of the plurality on the array.
  • the test amino acid sequence can further include a protein splicing sequence or intein.
  • the intein can be inserted in the middle of a test amino acid sequence.
  • the intein can be a naturally-occurring intein or a mutated intein.
  • the nucleic acid sequences of the plurality can be obtained from a collection of full-length expressed genes (e.g., a repository of clones), a cDNA library, or a genomic library.
  • the test amino acid sequences can be genes expressed in a tissue, e.g., a normal or diseased tissue.
  • the test polypeptides can be mutants or variants of a scaffold protein (e.g., an antibody, zinc-finger, polypeptide hormone etc.).
  • the test polypeptides are random amino acid sequences, patterned amino acids sequences, or designed amino acids sequences (e.g., sequence designed by manual, rational, or computer-aided approaches).
  • the plurality of test amino acid sequences can include a plurality from a first source, and plurality from a second source.
  • the server can be provided with lists of test amino acid sequences associated with a diseased tissue or a first species in addition to lists of test amino acid sequences associated with a normal tissue or a second species.
  • the binding agent can be attached to the substrate.
  • the substrate can be derivatized and the binding agent covalently attached thereto.
  • the binding agent can be attached via a bridging moiety, e.g., a specific binding pair, (e.g., the substrate contains a first member of a specific binding pair, and the binding agent is linked to the second member of the binding pair, the second member being attached to the substrate).
  • the binding agent is attached to an insoluble substrate, e.g., a bead or particle.
  • the insoluble substrate can further contain information encoding its identity, e.g., a reference to the address on which it is disposed.
  • the insoluble substrate can be tagged using a chemical tag, or an electronic tag (e.g., a transponder).
  • the insoluble substrate can be disposed such that it can be removed for later analysis.
  • the invention also features a computer system including (i) a server storing a list of amino acid sequences and/or their descriptors, and (ii) software configured to: (1) send a list of amino acid sequences and/or their descriptors to a client; (2) receive from the client a plurality of selected amino acid sequences from the list; and (3) interface with an array provider (e.g., a robotic system, or a technician) so as to dispose on a substrate nucleic acids encoding the selected amino acid sequences, each at a plurality of addresses.
  • the invention also features a computer system including (i) a server storing a list of amino acid sequences and/or their descriptors, and (ii) software configured to: (1) receive information (e.g., from a client, e.g., a remote client) about interactions between the amino acid sequences and a sample (e.g., a sample including an unknown); (2) compare the information about the interactions to a database of interactions observed for other samples (e.g., other unknowns or other controls); and (3) send results of the comparison to a user (e.g., the client).
  • randomized refers to one or more sequences in which any subunit (e.g., nucleotide, ribonucleotides, or amino acid) can be present at one, more than one or all specified or unspecified positions; therefore, for such positions as are randomized, the sequence of the finished molecule is not pre-determined, but is left to at least some degree of chance.
  • a process of randomizing a protein or nucleic acid can refer to a synthetic method in which the incorporation of a subunit is left to at least some degree of chance.
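The definition of "randomized" can be illustrated with a short sketch in which only specified positions are left to chance while the remaining positions stay fixed. The template notation (an `X` marking each randomized position), the uniform choice among the 20 standard amino acids, and the function name are assumptions for the example; a synthetic randomization process would have its own biases.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids, one-letter codes


def randomize(template: str, rng: random.Random, wildcard: str = "X") -> str:
    """Fill each wildcard position with a randomly chosen amino acid; keep other positions fixed."""
    return "".join(
        rng.choice(AMINO_ACIDS) if residue == wildcard else residue
        for residue in template
    )
```

For instance, `randomize("MXXGXK", random.Random(0))` returns a six-residue sequence whose first, fourth, and sixth positions are always M, G, and K, while the other three positions vary with the random state.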
  • the user uses a plurality of protein ligands (e.g., random ligands on an array) to rapidly detect the presence of an agent that is wholly or partially composed of macromolecules (virus, bacteria, parasite, cancer antigen, disease markers, etc.) in a sample derived from a patient (e.g., from blood, urine, exhaled air, etc.) or in an environment (air, water supply, etc.).
  • Exemplary situations for rapid diagnosis include situations in which: (1) the patient is recently admitted to a hospital with a fever and other signs of infection, a bacterial or viral agent is suspected, but a specific diagnosis is required; (2) a group of individuals are traveling in a foreign country and become ill with a common set of symptoms suspected of being a possible infection, the identity of the agent is not known, and the ability to rapidly detect the infection and identify the agent becomes essential; (3) an air or water supply is suspected of contamination by a bioterrorist agent, and detection of this agent becomes essential.
  • the specific and rapid diagnosis of a particular variation of cancer will allow a more appropriate specifically-tailored chemotherapy.
  • This invention will address these situations and will enable the appropriate detection of signature patterns that point to a specific diagnosis (or detection) in a field environment and in real time.
  • Virtually all pathogens (and diseases) have several or more (usually many) macromolecules that are unique to the pathogen (or disease) and that are not normally found in healthy blood (or other tested sources).
  • infected (or affected) hosts produce response proteins that could also signal disease. Detection of these marker proteins enables identification of the pathogen (or specific diagnosis of the disease).
  • The use of marker proteins has already been demonstrated in a number of cases, for example: HBsAg (a protein indicating the presence of active type B hepatitis), p24 (used to detect the presence of HIV), CA-125 (used to detect ovarian cancer though also found in some lung cancers), CMV Antigen (Cytomegalovirus detection), Cryptococcal Antigen (detection of cryptococcal infection), Rheumatoid Factor (rheumatoid arthritis), etc.
  • Some advantages of some embodiments described herein include: 1) there is no need to select or identify proteins or macromolecules that are specific enough to point to a particular pathogen or disease, and 2) more than one protein associated with a particular pathogen or disease can be detected by a profiling method.
  • some antigens are associated with more than one disease. For example, CA-125 is unusual in healthy women and is often elevated in ovarian cancer. But the same antigen is also elevated in some lung cancers. Thus, integration of additional information should improve the specificity of diagnosis.
  • a signature pattern of several or more proteins is identified on a test array and this is used to make the diagnosis.
  • One implementation uses a set of identical arrays comprising a collection of specific capture probes (e.g., each with a unique binding property).
  • the capture probe can have a chemistry or structure that has a high likelihood of binding to macromolecules.
  • each element in the array is known, but its binding specificity does not need to be known.
  • the test array is probed with a specimen and the macromolecules in the specimen will bind to various elements of the test array.
  • the test array is then examined for a signature pattern that specifically differentiates between individuals infected with the agent and normal individuals.
  • the pattern for the signature can be determined heuristically by training the test array on test sets and then testing unknowns.
  • fuzzy logic, genetic algorithms, or multi-dimensional distance metrics can be used to compare signatures or profiles, e.g., to classify profiles as related or unrelated and so forth.
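Of the comparison approaches named above, a multi-dimensional distance metric is the simplest to sketch. The example below treats each profile as a vector with one value per array element and assigns an unknown profile to the label of the nearest reference profile by Euclidean distance. The choice of Euclidean distance, the nearest-reference rule, and the function names are assumptions for illustration, not the method prescribed by the text.

```python
import math


def euclidean(profile_a, profile_b) -> float:
    """Euclidean distance between two profiles with one value per array element."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same array elements")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))


def classify(profile, references):
    """Return the label of the reference profile closest to the given profile.

    `references` maps a label (e.g., "infected") to a reference profile,
    such as an average over the training arrays for that label.
    """
    return min(references, key=lambda label: euclidean(profile, references[label]))
```

A distance cutoff could be added so that profiles far from every reference are reported as unrelated rather than forced into the nearest class.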
  • Test array: This array contains many elements to which macromolecules will bind. Each element in the array can be identified well enough to reproducibly place it on an array whenever desired. A collection of arrays with the same elements will be needed for the training set. The elements in the array can be varied to allow a broad range of binding specificities. The choice of test arrays can be adjusted depending on the application. This invention works particularly well in conjunction with NAPPA, which allows the simple adjustment of a protein array to include any desired protein elements by simply spotting different samples of DNA. Examples of possible arrays:
  • a. Pathogen proteome array. Among the best arrays in this context would be a collection of proteins present in the targeted pathogen. Because most macromolecules in an organism will bind to other proteins in the organism, there will be a high hit rate and many spots will light. Not all proteins in the proteome are required, just a large sample.
  • b. Host proteome. Another good choice will be a large collection of host proteins. Because the pathogen interacts with the host, a collection of host proteins that interact with pathogen macromolecules can be used.
  • c. Collection of random proteins. There is enough variation in protein chemistry that a well-randomized collection of proteins or peptides will bind some fraction of the pathogen macromolecules and create a signature pattern.
  • d. Small molecules. A well-randomized set of small molecules with varied chemistries could also be used here.
  • Detection system This is a system that detects binding of any macromolecule to any element of the array.
  • the detection system need not detect any specific protein; it can merely detect that something has bound to certain elements. This can be accomplished in several ways, not limited to the following: a. Surface Plasmon Resonance: a change in index of refraction is detected, which indicates macromolecule binding. b.
  • Surface Plasmon Enhanced Illumination: a resonance is set up by an array of holes or features. A change of index of refraction at the binding surface shifts the resonant wavelength and demonstrates macromolecule binding. c.
  • Sample labeling The macromolecules in the sample are labeled with fluorescent or radioactive markers.
  • the detector measures the presence of the marker at specific positions on the microarray and indicates that a macromolecule has bound.
  • Samples: Samples can be acquired from affected individuals and from normal control individuals. Enough samples are acquired to provide an opportunity to train the algorithms to differentiate a normal sample from an affected sample.
  • An exemplary process (see attached figure): 1. A collection of identical test arrays is created. 2. Samples from individuals with or without the pathogen are each reacted with a test array. 3. Test arrays are appropriately washed to eliminate non-specific binding. 4. Raw data are collected on each test array showing the elements with specific binding. 5. Heuristic analysis compares infected to normal individuals to find specific patterns. 6. Several specific patterns may be expected: a.
  • Constant background: patterns illuminated in all samples. b. Protein signature: patterns illuminated only in infected samples. c. Individual variation: elements that light in some individuals and not others. 7. Complete a statistical analysis to find those elements with good predictability in evaluating unknowns. 8. Unknown samples are then read and compared to the determined patterns to make diagnoses.
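Steps 5 and 6 of the process above can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed analysis: each profile is assumed to be a list of 0/1 binding calls per address, the function name is hypothetical, and the strict all-or-none rules used for "constant background" and "protein signature" are a simplification that a real statistical analysis (step 7) would relax.

```python
def categorize_elements(infected, normal):
    """Label each array address by how it behaves across the two training groups.

    infected, normal: lists of binary profiles, one profile per individual,
    all covering the same addresses in the same order.
    """
    n_addresses = len(infected[0])
    for profile in infected + normal:
        if len(profile) != n_addresses:
            raise ValueError("all profiles must cover the same addresses")

    categories = []
    for i in range(n_addresses):
        infected_hits = sum(p[i] for p in infected)
        normal_hits = sum(p[i] for p in normal)
        if infected_hits == len(infected) and normal_hits == len(normal):
            categories.append("constant background")   # lights in every sample
        elif infected_hits == len(infected) and normal_hits == 0:
            categories.append("protein signature")     # lights only in infected samples
        elif infected_hits == 0 and normal_hits == 0:
            categories.append("no binding")            # never lights
        else:
            categories.append("individual variation")  # lights in some individuals only
    return categories


# Hypothetical training data: three infected and three normal individuals, four addresses.
infected_profiles = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 1, 0, 1]]
normal_profiles = [[1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 0, 0]]

labels = categorize_elements(infected_profiles, normal_profiles)
```

Addresses labeled "protein signature" are the candidates that step 7 would then test for predictive value on unknown samples.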
  • Some advantages of certain aspects may include: 1. The signature pattern does not need to be a single absolute pattern, only a set of patterns that statistically indicate the presence of the pathogen. 2. The same set of arrays may be used to identify more than one pathogen. 3. This tool can be used to find a pattern for a pathogen, even if the pathogen has never before been identified.
  • if a group of affected individuals and a separate group of control individuals can be identified, the heuristic algorithms can be used to find a specific pattern. 4.
  • the method does not use mass spectrometry to identify the proteins bound to the array. Detection of binding could be done with relatively simple instruments that could be fairly compact in size. 5.
  • Some pathogens have the ability to mutate and change their phenotype.
  • the continued inclusion of new data into the training sets enables the signature patterns to evolve with the changing pathogens. Thus the arrays may never become obsolete. 6.
  • Expected future uses and/or commercial applications may include: 1. Clinical diagnosis of patients with suspected infections of known pathogens. 2. Clinical diagnosis of specific variants of pathogens for rapid adjustment of antibiotic therapy 3. Rapid clinical diagnosis of patients with other disorders where marker proteins may be helpful, e.g., specific forms of cancer, specific types of rheumatic diseases, acute myocardial infarction 4. Clinical evaluation of populations with suspected pathogen of unknown character (requires a control population of unaffected individuals). 5.
  • FIG. 1 Evaluation of samples (water supply, food, air, etc.) for the presence of microorganisms or toxic macromolecules. 6. Evaluation of samples (e.g., water supply, food, air, etc.) for the presence of a threat, e.g., bioterrorist contamination.
  • Corresponding samples from normal individuals are illustrated in 2a, 2b, ..., and 2e.
  • the profiles are compared, e.g., using heuristic analysis to identify locations common to all individuals (horizontal hatching), specific to infected individuals, or specific to normal individuals (vertical hatching) or without correlation (diagonal hatching).
  • FIG. 3 the patterns are summarized.
  • Macromolecular arrays of proteins can be used to detect an interaction profile (or binding signature pattern) for a biological sample specimen, e.g., to detect a pathological condition in a subject
  • the invention provides in various embodiments methods to enable the user to rapidly detect the presence of an agent by identifying macromolecules such as proteins from a virus, a bacterium, a fungus, a parasite, a cancer antigen, etc., in a biological sample specimen from a patient (for example, blood, urine, perspiration, amniotic fluid, lachrymal secretions, vaginal secretions, semen, exhaled air, saliva, sweat, cerebrospinal fluid, tears, feces, or extracts of cells or tissue) or in an environment (for example, air, water supply, soil, vegetation, etc.).
  • a variety of public health situations would benefit from an ability to more rapidly diagnose the presence of a specific pathogen, including: (1) a patient is recently admitted to a hospital with fever and other signs of infection, a bacterial or viral agent is suspected, and a specific diagnosis is required; (2) a group of individuals is traveling in a foreign country and become ill with a common set of symptoms suspected of infectious etiology, the identity of the agent is not known, and the ability to rapidly detect and identify the agent of infection is essential; (3) a group of dead livestock or wild animals are found, and distinction between a pathogen or another causative factor must be made; (4) an air or water supply is suspected of contamination by a bioterrorist infectious agent, and rapid detection and identification of this agent become essential.
  • aspects of this disclosure address these situations, providing methods of detection of appropriate signature patterns that point to a specific diagnosis or detection in a clinical setting or in a field environment, and enabling availability of diagnostic data, e.g., in a matter of hours.
  • a profile or signature pattern
  • a test array that includes a plurality of proteins, and this profile or signature is then used to make the diagnosis.
  • the test array is replicable, i.e., it is one of a set of identical arrays comprising a collection of specific binding sites, each site having a unique binding chemistry likely to bind to at least one macromolecule, such as a protein, peptide or oligopeptide.
  • the identity of the source of each element in the array is known, as are reproducible methods of obtaining and applying each element, but neither the specific identity (function, sequence, etc.) of an element nor its binding specificity need be known. Elements of known identity can be included.
  • a "sample" of each of the replicable test arrays is probed with biological specimens.
  • the method thus uses a plurality of duplicated test arrays, or can re-use samples of arrays, or can use a combination.
  • Components present in the specimen bind to various individual proteins positioned at "addressable" locations (the locations being replicable for each "sample" of the test array) on the test array.
  • Each test array is then examined for a signature pattern that differentiates between specimens, for example, from contrasting sets of individuals, for example, between a set of individuals infected with a pathogenic agent and a set of uninfected individuals.
  • the signature pattern for each target such as a pathogen is determined heuristically, by training the test array on a set of known positive specimens, and then testing unknown specimens.
  • An advantage of this method is that it is not necessary to identify and analyze the chemistry of any of the proteins in the array, or to use known proteins, including positive marker proteins, as long as the specific proteins in a signature pattern to which specimen molecules bind on the test array are replicable, that is, can be replicated to the same addressable location for each of the additional plurality of samples of the arrays. Each sample of the test array is then identical with respect to the components present at addressable locations. Moreover, because the diagnosis is comprised of a signature pattern of binding of several components of the specimen to locations of the array, the sensitivity of the approach is significantly increased compared to use of a single marker.
  • a test array is in general a two-dimensional substrate which contains a plurality of binding components, such as proteins, each at an addressable location; a subset of the components at certain characteristic locations can bind molecules in a biological sample specimen.
  • a positionally addressable array can comprise a plurality of different substances, for example proteins, polypeptide, peptide or oligopeptide molecules comprising functional domains of the proteins, protein-containing cellular material, or even whole cells or viruses, on a solid support (or substrate).
  • a test array comprises from about 5 to about 1,000 locations, or about 50 to about 5,000 locations, or about 100 to about 10,000 locations. Each component in the array is identified only sufficiently well to reproducibly deposit it at the addressable location in a consistent quantity, and to affix it to the location on the array under scale-up conditions of production required for the large numbers of arrays for commercial use. Proteins can be affixed to the substrate, for example, to an aldehyde-treated glass slide (MacBeath et al., Science 289: 1760, 2000).
  • a plurality of arrays can be used as a "training set."
  • the number and variety of protein components in the array should be sufficient to allow a broad range of specimen component binding specificities and can be subsequently reduced in second generation arrays for a specific diagnostic application.
  • the choice of source and number of components to deposit in a test array can thus be adjusted for the application.
  • a NAPPA (nucleic acid programmable protein array), which enables a protein array to include a plurality of protein elements by spotting different samples of DNA, is used.
  • a “nucleic acid programmable protein array” or “NAPPA” refers to an array having a plurality of nucleic acids disposed at addressable locations on the array, on which synthesis of a polypeptide encoded by the nucleic acid is conducted such that the polypeptide remains bound to the array.
  • Examples of potential sources of compositions for arrays include all or part of a pathogen proteome array, or a host proteome array. Also suitable are a random protein array and a small molecule array.
  • a pathogen proteome array or subset, i.e., a collection of proteins present in a targeted pathogen, is a preferred embodiment because many macromolecules in a target organism will bind other proteins in the same organism with high affinity.
  • a complete set of proteins in the proteome is not required, as a large subset of pathogen proteins provides an initial array sufficient for the training set.
  • detectably bound means that the presence of a component from the target specimen can be detected bound to an addressable location on the array, and indicates that a component of the specimen has bound.
  • a “detection system” is a system that detects binding of a macromolecule to a protein at any location on the array. The detection method need not require detection of the presence of a specific protein or interaction, rather the method detects that a composition has bound to an addressable location in an array. Detection can be accomplished in several ways, including Surface Plasmon Resonance, in which a change in index of refraction is detected as a result of macromolecule binding.
  • a change of index of refraction at the binding surface can also be detected by Surface Plasmon Enhanced Illumination, in which a resonance is set up by an array of holes or features, and a shift in the resonant wavelength demonstrates macromolecule binding.
  • Sample labeling can be used, i.e., macromolecules in the sample are labeled with one or more fluorescent or radioactive markers.
  • Specimen samples can include, e.g., a biological fluid, cell, or tissue, or an environmental sample. Positive control samples are acquired from affected subjects, for example, subjects having an infection or a cancer, in contrast to control samples from unaffected individuals, or are acquired from an environment.
  • a plurality of samples and control specimens can be used to provide for statistically significant training of the algorithms to differentiate a normal sample from an affected sample.
  • the plurality can be, for example, at least 2, or 3 to 10 samples, or 5 to 12, or 50-200 samples for any particular target, disease or disorder.
  • Test arrays allow the direct analysis of discrete protein binding and other activities without the complications of adverse in vivo effects.
  • a low-density (96 well format) protein array has been developed in which proteins were spotted onto a nitrocellulose membrane and biomolecular interactions were visualized by autoradiography (Ge, H. 2000 Nucleic Acids Res. 28:e3, 1-VII).
  • a high-density protein array (100,000 samples within 222 × 222 mm) that was used for antibody screening was formed by spotting proteins onto polyvinylidene difluoride (PVDF; Lueking et al. 1999 Anal. Biochem. 270:103-111). Proteins have been printed on a flat glass plate that contained wells formed by an enclosing hydrophobic Teflon mask, and the arrayed antigens were detected using enzyme-linked immunosorbent assay (ELISA) techniques (Mendoza et al. (1999) Biotechniques 27:778-788).
  • the binding component is in one embodiment attached to the substrate of the test array.
  • the substrate can be derivatized and the binding component covalently attached thereto.
  • the binding agent can be attached via a bridging moiety, e.g., a specific binding pair (e.g., the substrate contains a first member of a specific binding moiety, and the binding component is linked to the second member of the binding pair, the second member being attached to the substrate).
  • test array can also refer to a set of micro-wells with a plurality of addresses in which the binding components are deposited.
  • a database (e.g., in a computer memory or on a computer readable medium) of the collection of signatures for each test array can be included.
  • the database can have a field representing a result (e.g., a qualitative or quantitative result).
  • the database includes a record for each address of the plurality present on the array.
  • the records can be clustered or have a reference to other records (e.g., including hierarchical groupings) based on the result.
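One way such a database could be laid out is sketched below: one record per address, a result field, and a simple grouping of records based on the result. The class name, field names, and the `threshold` used for grouping are assumptions made for illustration; the disclosure does not specify a schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AddressRecord:
    """One database record for one address on one test array (hypothetical schema)."""
    array_id: str
    x: int                 # column coordinate of the address
    y: int                 # row coordinate of the address
    result: float          # quantitative result; a qualitative call could be stored as 0 or 1
    related: list = field(default_factory=list)  # references to other records


def cluster_by_result(records, threshold):
    """Group address coordinates into 'bound' and 'unbound' based on the result field."""
    clusters = defaultdict(list)
    for record in records:
        key = "bound" if record.result >= threshold else "unbound"
        clusters[key].append((record.x, record.y))
    return dict(clusters)


# Hypothetical readings from three addresses of one array.
records = [
    AddressRecord("array-1", 0, 0, 0.9),
    AddressRecord("array-1", 0, 1, 0.1),
    AddressRecord("array-1", 1, 0, 0.7),
]
clusters = cluster_by_result(records, threshold=0.5)
```

Hierarchical groupings could be layered on top by storing cluster identifiers in the `related` field.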
  • each test location in the plurality of addresses features a composition that is unique.
  • a portion of the addresses can be redundant, providing internal controls. Redundancy is controlled by knowledge of the complexity of the components, e.g., a proteome, a random protein library, etc. For example, a proteome having about 35,000 unique gene products, if represented by a test array of
  • a test array having components from a target organism or a disease cell at addressable locations can be used to identify an interaction or to identify a compound that modulates, e.g., inhibits or enhances, an interaction.
  • a polypeptide or protein at an addressable location includes a cleavage site, e.g., a protease site, e.g., a site cleaved by a site-specific protease (e.g., a thrombin site, an enterokinase site, a PreScission site, a factor Xa site, or a TEV site), or a chemical cleavage site (e.g., a methionine, preferably a unique methionine (cleavage by cyanogen bromide) or a proline (cleavage by formic acid)).
  • test amino acid sequence can further include a protein splicing sequence or intein.
  • the intein can be inserted in the middle of a test amino acid sequence.
  • the intein can be a naturally-occurring intein or a mutated intein.
  • a variety of test amino acid sequences can be disposed at different addresses of the plurality.
  • the test array can include proteins that are expressed in a tissue, e.g., a normal or diseased tissue.
  • the test polypeptides are random amino acid sequences, patterned amino acid sequences, or designed amino acid sequences (e.g., sequences designed by manual, rational, or computer-aided approaches).
  • the proteins can include a plurality from a first source, and a plurality from a second source.
  • an address of the test array can further include one or a plurality of additional polypeptides.
  • an address can include a pool of test polypeptides, e.g., a subset of polypeptides encoded by a library or clone bank.
  • a second test array can be provided in which an address of the plurality of the second test array includes a single member or a subset of members of the pool present at an address of the first array. The first and the second test arrays can be used simultaneously or consecutively.
  • each address of the plurality includes a first test amino acid sequence that is common to addresses of the plurality, and a second test component that is unique among the addresses of the plurality.
  • the second test component can be a query composition, whereas the first test amino acid sequence can be a target sequence.
  • a test array can be stored for use at a later time, for example, can be rapidly frozen after being optionally contacted with a cryoprotectant.
  • the packaged product can be supplied to a user with or without additional components. Proteins at addressable locations can be obtained from a collection of full-length expressed genes (e.g., a repository of clones), for example, expressed in a tissue, e.g., a normal or diseased tissue.
  • the method can further include washing the substrate, e.g., after sufficient contact with a specimen.
  • the wash step can be repeated, e.g., one or more times, e.g., until an excess of a component is removed.
  • the wash step can remove unbound proteins.
  • the stringency of the wash step can vary, e.g., the salt, pH, and buffer composition of the wash buffer can vary.
  • the substrate can be washed with a chaotrope (e.g., guanidinium hydrochloride or urea).
  • the chaotrope can itself be washed from the array, and the compositions can be renatured.
  • contacting the specimen can be performed under conditions of sufficient stringency that only limited washing is necessary prior to continuing with the method.
  • the method can further include contacting the substrate with a second substrate.
  • the substrate is a gel
  • the gel can be contacted with a second gel, and the contents of one gel can be transferred to another (e.g., by diffusion or electrophoresis).
  • the addressable locations can have a composition further containing an epitope (e.g., recognized by a monoclonal antibody), or a binding agent (e.g., avidin or streptavidin, GST, or chitin binding protein).
  • Detection can entail contacting each address of the plurality with a binding agent, e.g., a labeled biotin moiety, labeled glutathione, labeled chitin, a labeled antibody, etc.
  • each address of the plurality is contacted with an antibody specific to an amino acid sequence.
  • the antibody can be labeled, e.g., with a fluorophore.
  • Kits provided herein can further include a database, e.g., in computer memory or a computer readable medium (e.g., a CD-ROM, a magnetic disc, flash memory). Each record of the database can include a descriptor or reference for the physical location of the signature pattern on the array.
  • the records can be clustered or have a reference to other records (e.g., including hierarchical groupings) based on the result.
  • the kit can also include instructions for use of the test array, or a link or indication of a network resource (e.g., a web site) having instructions for use of the arrays or the above database of records describing the addresses of the signature patterns for each application.
  • the invention provides a method of providing an array across a network, e.g., a computer network, or a telecommunications network. The method includes transmitting a list (e.g., of test arrays or their descriptors) across a network to a user; receiving at least one selection from the list from the user; transmitting at least one signature pattern corresponding to the selection of an application; and providing the substrate to the user.
  • the invention can include a computer system including a server storing a list of test arrays or their descriptors, and software configured to: send a list of test arrays and/or their descriptors to a client; receive from the client one or a plurality of applications desired for synthesis, or selected from the list; and interface with an array provider (e.g., a robotic system, or a technician) so as to dispose on a substrate proteins or other compositions, each at a plurality of addresses.
  • the term "address," as referred to herein, is a positionally distinct portion of a substrate in an array. Thus, a component at a first address can be positionally distinguished from a component at a second address. The address is located in and/or on the substrate or in micro-wells.
  • the address can be distinguished by two coordinates (e.g., x-y) in embodiments using two-dimensional arrays, or by three coordinates (e.g., x-y-z) in embodiments using three-dimensional arrays or multiple.
  • substrate refers to a composition in or on which a set of proteins, polypeptides, or small molecules is disposed.
  • the substrate may be discontinuous.
  • An illustrative case of a discontinuous substrate is a set of gel pads separated by a partition.
  • the terms "peptide," "polypeptide," and "protein" are used interchangeably.
  • Unique reagent refers to a component that differs from other components at other addresses within the plurality of addresses.
  • (An array can include additional pluralities of addresses in addition to the plurality being described; a plurality can include, e.g., at least 10, 100, or 1000 addresses.)
  • the component can differ from the components at other addresses in terms of recognition and binding of one or more different specimens.
  • a unique component can be a molecule, e.g., a biological macromolecule (e.g., a protein, a polypeptide, or a carbohydrate), or a small organic compound. In the case of biological polymers, a structural difference can be a difference in sequence at at least one position.
  • a structural difference, e.g., for polymers having the same sequence, can be a difference in conformation (e.g., due to allosteric modification; meta-stable folding; alternative native folded states; prion or prion-like properties) or a modification (e.g., covalent and non-covalent modifications (e.g., a bound ligand)).
  • Both solid and porous substrates are suitable as recipients for the encoding nucleic acids described herein.
  • a substrate material can be selected and/or optimized to be compatible with the spot size (e.g., density) required for the application.
  • the substrate is a solid substrate.
  • Solid substrates include: mass spectroscopy plates (e.g., for MALDI), glass (e.g., functionalized glass, a glass slide, porous silicate glass, a single crystal silicon, quartz, UV-transparent quartz glass), plastics and polymers (e.g., polystyrene, polypropylene, polyvinylidene difluoride, poly-tetrafluoroethylene, polycarbonate, PDMS, acrylic), metal coated substrates (e.g., gold), silicon substrates, latex, membranes (e.g., nitrocellulose, nylon), and a glass slide suitable for surface plasmon resonance (SPR).
  • the substrate is porous, e.g., a gel or matrix.
  • porous substrates include: agarose gels, acrylamide gels, sintered glass, dextran, meshed polymers (e.g., macroporous crosslinked dextran, sephacryl, and sepharose), and so forth.
  • Substrates can have properties such as being opaque, translucent, or transparent.
  • the addresses can be distributed on the substrate in one dimension, e.g., a linear array; in two dimensions, e.g., a planar array; or in three dimensions, e.g., a three-dimensional array.
  • the solid substrate may be of any convenient shape or form, e.g., square, rectangular, ovoid, or circular. In another embodiment, the solid substrate can be disc shaped and attached to a means of rotation. In one embodiment, the substrate contains at least 1, 10, 100, 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ or more addresses per cm².
  • the center-to-center distance of each address can be at least about 5 mm, 1 mm, 100 µm, or can be less than about 10 µm, 1 µm, or 100 nm.
  • the longest diameter of each address can be at least about 5 mm, 1 mm, or less than about 100 µm, 10 µm, 1 µm, or 100 nm.
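The center-to-center distances and address densities quoted above are linked by simple geometry. The sketch below assumes addresses laid out on a regular square grid, which the text does not require; the function names are hypothetical.

```python
def addresses_per_cm2(pitch_um):
    """Addresses per square centimetre for a square grid with the given pitch.

    pitch_um is the center-to-center distance in micrometres.
    """
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    per_cm = 10_000 / pitch_um  # 1 cm = 10,000 micrometres
    return per_cm ** 2


def spots_overlap(diameter_um, pitch_um):
    """Neighbouring spots overlap when the spot diameter exceeds the pitch."""
    return diameter_um > pitch_um


density_1mm = addresses_per_cm2(1000)   # 1 mm pitch
density_100um = addresses_per_cm2(100)  # 100 micrometre pitch
density_10um = addresses_per_cm2(10)    # 10 micrometre pitch
```

Under this assumption a 1 mm pitch gives 100 addresses per cm², a 100 µm pitch gives 10⁴, and a 10 µm pitch gives 10⁶, which matches the range of densities listed.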
  • each address contains at least about 1 µg, for example, 10 µg, or each address contains less than about 100 ng, 10 ng, 1 ng, 100 pg, 10 pg, 1 pg, or 0.1 pg of the protein.
  • each address contains at least about 100, 10³, 10⁴, 10⁵, 10⁶, 10⁷, 10⁸, or 10⁹ or more molecules of the composition.
  • the substrate can be modified to facilitate the stable attachment of linkers, capture probes, or binding agents.
  • a surface can be amidated, e.g., by silylating the substrate, e.g., with trialkoxyaminosilane.
  • Silane-treated surfaces can also be derivatized with homobifunctional and heterobifunctional linkers.
  • the substrate can be derivatized, e.g., so it has a hydroxy, an amino (e.g., alkylamine), carboxyl group, N-hydroxysuccinimidyl ester, photoactivatable group, sulfhydryl, ketone, or other functional group available for reaction.
  • the substrates can be derivatized with a mask in order to derivatize only limited areas; a chemical etch or UV light can be used to remove derivatization from selected regions.
  • Substrates can be partitioned.
  • each address is partitioned from all other addresses in order to maintain separation of molecules at each of the addresses.
  • the substrate can be partitioned, e.g., by depressions, grooves, or photoresist.
  • the substrate can be a microchip with microchannels and reservoirs etched therein, e.g., by photolithography.
  • Other non-limiting examples of substrates include multi-welled plates, e.g., 96-, 384-, 1536-, 6144-well plates, and polydimethylsiloxane (PDMS) plates. Such high-density plates are commercially available, often with specific surface treatments. Depending on the optimal volume required for each application, a plate of appropriate density is selected.
  • the partitions are generated by a hydrophobic substance, e.g., a Teflon mask, grease, or a marking pen (e.g., from Snowman, Japan).
  • the substrate is designed with reservoirs isolated by protected regions, e.g., a layer of photoresist.
  • a mask can be focused or placed on the substrate, and a photoresist barrier separating the two reservoirs can be removed by illumination.
  • the method can also include moving the substrate in order to facilitate mixing.
  • Substrates can have a planar surface.
  • the addresses are not physically partitioned, but diffusion is limited on the planar substrate, e.g., by increasing the viscosity of the solution, by providing a matrix with small pore size which excludes large macromolecules, and/or by tethering at least one of the aforementioned macromolecules.
  • modest or even substantial diffusion to neighboring addresses is permitted.
  • Results e.g., a signal of a label
  • the address can be accurately determined.
  • Substrates are not limited to two-dimensional.
  • a three-dimensional substrate can be generated, e.g., by successively applying layers of a gel matrix on a substrate. Each layer contains a plurality of addresses. The porosity of the layers can vary, e.g., so that alternating layers have reduced porosity.
  • a three-dimensional substrate includes stacked two-dimensional substrates, e.g., in a tower format. Each two-dimensional substrate is accessible to a dispenser and detector.
  • a substrate can be a micromachined chip. Chips are made with glass and plastic materials, using rectangular or circular geometry. Wells and fluid channels are machined into the chip, and then the surfaces are derivatized.
  • a humidity-controlled chamber can be used to control evaporation.
  • a disk geometry (also termed "CD format") is another suitable substrate for the microarray. Sample addition and reactions are performed while the disk is spinning (see PCT WO 00/40750; WO 97/21090; GB patent application 9809943.5; "The next small thing" (Dec. 9, 2000) Economist Technology Quarterly p. 8; PCT WO 91/16966; Duffy et al. (1999) Analytical Chemistry 71(20):4669-4678).
  • the disc can include sample-loading areas, reagent-loading areas, reaction chambers, and detection chambers.
  • Such microfluidic structures are arranged radially on the disc with the originating chambers located towards the disc center.
  • Samples from a microtiter plate can be loaded using a liquid train and a piezo dispenser. Multiple samples can be separated in the liquid train by air gaps or an inert solution.
  • the piezo dispenser then dispenses each sample onto appropriate application areas on the CD surface, e.g., a rotating CD surface.
  • the volume dispensed can vary, e.g., less than about 10 pL, 50 pL, 100 pL, 500 pL, 1 nL, 5 nL, or 50 nL.
  • the centripetal force conveys the dispensed sample into appropriate reaction chambers.
  • a master CD can be made by deep reactive ion etching (DRIE) on a 6-inch silicon wafer. This master disk can be plated and used as a model to manufacture additional CDs by injection molding (e.g., mic AB, Uppsala, Sweden).
  • a stroboscope can be used to synchronize the detector with the rotation of the CD in order to track individual detection chambers.
  • Components of the test array or of the specimen sample can have an affinity tag
  • An amino acid sequence that encodes a member of a specific binding pair can be used as an affinity tag.
  • the other member of the specific binding pair is attached to the substrate, either directly or indirectly.
  • One class of specific binding pair is a peptide epitope and the monoclonal antibody specific for it. Any epitope to which a specific antibody is or can be made available can serve as an affinity tag. See Kolodziej and Young (1991) Methods Enz. 194:508-519 for general methods of providing an epitope tag.
  • Exemplary epitope tags include HA (influenza haemagglutinin; Wilson et al. (1984) Cell 37:767), myc (e.g., Mycl-9E10, Evan et al. (1985) Mol. Cell.
  • An antibody can be coupled to a substrate of an array, e.g., indirectly using Staphylococcus aureus protein A, or streptococcal protein G.
  • the antibody can be covalently bound to a derivatized substrate, e.g., using a crosslinker, e.g., N-hydroxysuccinimidyl ester.
  • the test polypeptides with epitopes such as Flag, HA, or myc are bound to antibody-coated plates.
  • Another class of specific binding pair is a small organic molecule or simple polymer, and a polypeptide sequence that specifically binds it.
  • Specific binding pairs include glutathione and glutathione-S-transferase, chitin binding protein and chitin, cellulase and cellulose, methotrexate and dihydrofolate reductase, and FK506 and FKBP.
  • Art-known methods of tethering components such as proteins, e.g., the use of specific binding pairs, are suitable for the affinity or chemical capture of polypeptides on the array.
  • Appropriate substrates include commercially available streptavidin and avidin-coated plates, for example, 96-well Pierce Reacti-Bind Metal Chelate Plates or Reacti-Bind Glutathione Coated Plates (Pierce, Rockford, IL). Histidine- or GST-tagged test polypeptides are immobilized on either 96-well Pierce Reacti-Bind Metal Chelate Plates or Reacti-Bind Glutathione Coated Plates, respectively, and unbound proteins are optionally washed away.
  • Yet another class of specific binding pair is a metal, and a polypeptide sequence which can chelate the metal.
  • An exemplary pair is Ni 2+ and the hexa-histidine sequence (see U.S. Patent No.
  • An affinity tag can be a dimerization sequence, e.g., a homodimerization or heterodimerization sequence, preferably a heterodimerization sequence.
  • the affinity tag is a coiled-coil sequence, e.g., the heptad repeat region of Fos.
  • the binding agent coupled to the array is the heptad repeat region of Jun.
  • the test polypeptide is tethered to the substrate by heterodimerization of the Fos and Jun heptad repeat regions to form a coiled-coil.
  • the affinity tag is provided by an unnatural amino acid, e.g., with a side chain having functional properties different from a naturally occurring amino acid.
  • the binding agent attached to the substrate functions as a chemical handle to either bind or react with the affinity tag.
  • the affinity tag is a free cysteine which can be oxidized with a thiol group attached to the substrate to create a disulfide bond that tethers the test polypeptide.
  • Recognition Tags. A variety of recognition tags can be used. For example, an epitope to which an antibody is available can be used as a recognition tag. The tag can be located N- or C-terminal to the sequence of interest.
  • the tag is recognized, e.g., directly, or indirectly (e.g., by binding of an antibody).
  • Green fluorescent protein. Coding regions of interest are taken from the FLEX repository and transferred into fusion vectors encoding either an N- or C-terminal green fluorescent protein (GFP) tag.
  • Complexes are detected by fluorescence spectroscopy (Spectra Max Gemini, Molecular Devices). The environment of a fluorophore has a strong effect on the quantum yield of fluorescence (i.e., the ratio of emitted to absorbed photons) through collisional processes and resonance energy transfer (a non-radiative process), and the concentration of target-query complexes that gives an acceptable signal-to-noise ratio is determined experimentally.
  • HRP horseradish peroxidase
  • AP alkaline phosphatase
  • Mass spectroscopy (MS). Recognition is achieved by mass spectroscopy, e.g., MALDI-TOF, which is indicative of the presence of a bound polypeptide.
  • a patient specimen is contacted to a test array.
  • patient samples include serum proteins, proteins extracted from a biopsy obtained from the patient, and so forth as described herein.
  • cells or cell extracts can be contacted to the array in order to query for components displayed on the cell surface.
  • the specimen is modified with a compound prior to being contacted to the array.
  • the components in the specimen can be biotinylated.
  • Addresses that bind proteins in the specimen are then identified by contacting the array with labeled streptavidin or labeled avidin.
  • the sample is unlabeled.
  • MALDI, SPR, or another technique is used to determine whether a protein is bound at each address.
  • Arrays can be designed to identify proteins associated with various pathologies, e.g., to detect antigens associated with cancer at various stages (for example, early, pre-metastatic stages or late stage cancer) or to provide a prediction (for example, to quantitate the abundance of an antigen correlated with a condition).
  • the subject can be a human patient, an animal, a forensic sample, or an environmental sample (e.g., from a waste system).
  • Detection of binding of a test sample macromolecule to one or more addressable locations can be achieved also by labeling the macromolecules in the test sample with a label which is radioactive, or a fluorophore, or a chemical, an epitope (to be identified by a specific antibody), or by labeling with a nucleic acid (to be amplified and identified for example by a labeled complementary nucleic acid for hybridization).
  • Such labeling of the test sample macromolecules can be achieved chemically, for example, via an -SH group of a cysteine residue.
  • RNA-directed RNA polymerases and DNA-directed RNA polymerases are both suitable transcription effectors.
  • DNA-directed RNA polymerases include those of bacteriophage T7, phage T3, phage φII, Salmonella phage SP6, or Pseudomonas phage gh-1, as well as archaeal RNA polymerases, bacterial RNA polymerase complexes, and eukaryotic RNA polymerase complexes.
  • T7 polymerase is a preferred polymerase. It recognizes a specific sequence, the T7 promoter (see e.g., U.S. Patent No. 4,952,496), which can be appropriately positioned upstream of an encoding nucleic acid sequence. Although a DNA duplex is required for recruitment and initiation of T7 polymerase, the remainder of the template can be single stranded. In embodiments utilizing other RNA polymerases, appropriate promoters and initiation sites are selected according to the specificity of the polymerase. RNA-directed RNA polymerases can include Qβ replicase, and RNA-dependent
  • the transcription/translation mix is in a minimal volume, and this volume is optimized for each application.
  • the volume of translation effector at each address can be less than about 10⁻⁴, 10⁻⁵, 10⁻⁶, 10⁻⁷, 10⁻⁸, or 10⁻⁹ L.
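  • For orientation, the per-address volume limits listed above can be restated in laboratory units. The following is a minimal sketch; the helper name `format_volume` and the unit table are illustrative assumptions, not part of the described method.

```python
# Illustrative helper (hypothetical name): restate a volume given in liters
# using the largest common laboratory unit that keeps the value at or above 1.
UNITS = [(1e-3, "mL"), (1e-6, "uL"), (1e-9, "nL"), (1e-12, "pL")]


def format_volume(liters: float) -> str:
    """Return the volume as a short string such as '100 uL'."""
    for scale, name in UNITS:
        # Small tolerance so that floating-point powers of ten land in the right unit.
        if liters >= scale * (1 - 1e-9):
            return f"{liters / scale:g} {name}"
    return f"{liters:g} L"


# The six upper limits named in the text: 10^-4 L down to 10^-9 L.
for exponent in range(4, 10):
    print(f"10^-{exponent} L = {format_volume(10.0 ** -exponent)}")
```

  • On this scale the listed limits run from 100 microliters down to 1 nanoliter per address.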
  • the array can be maintained in an environment to prevent evaporation, e.g., by covering the wells or by maintaining a humid atmosphere.
  • the entire substrate can be coated or immersed in the translation effector.
  • One possible translation effector is a translation extract prepared from cells.
  • the translation extract can be prepared from a variety of cells, e.g., yeast, bacteria, mammalian cells (e.g., rabbit reticulocytes), plant cells (e.g., wheat germ), and archaebacteria.
  • the translation extract is a wheat germ extract or a rabbit reticulocyte lysate.
  • the translation extract also includes a transcription system, e.g., a eukaryotic, prokaryotic, or viral RNA polymerase, e.g., T7 RNA polymerase.
  • the translation extract is disposed on the substrate such that it can be removed by simple washing.
  • the translation extract can be supplemented, e.g., with additional amino acids, tRNAs, tRNA synthetases, and energy regenerating systems.
  • the translation extract also includes an amber, ochre, or opal suppressing tRNA.
  • the tRNA can be modified to contain an unnatural amino acid.
  • the translation extract further includes a chaperone, e.g., an agent which unfolds or folds polypeptides (e.g., a recombinant purified chaperone, such as heat shock factors, GroEL/ES and related chaperones, and so forth).
  • the translation extract includes additives (e.g., glycerol, polymers, etc.) to alter the viscosity of the extract.
  • Affinity Tags. An amino acid sequence that encodes a member of a specific binding pair can be used as an affinity tag. The other member of the specific binding pair is attached to the substrate, either directly or indirectly.
  • One class of specific binding pair is a peptide epitope and the monoclonal antibody specific for it. Any epitope to which a specific antibody is or can be made available can serve as an affinity tag. See Kolodziej and Young (1991) Methods Enz. 194:508-519 for general methods of providing an epitope tag.
  • Exemplary epitope tags include HA (influenza haemagglutinin; Wilson et al. (1984) Cell 37:767), myc (e.g., Myc1-9E10, Evan et al. (1985) Mol. Cell. Biol. 5:3610-3616), VSV-G, FLAG, and 6-histidine (see, e.g., German Patent No. DE 19507 166).
  • An antibody can be coupled to a substrate of an array, e.g., indirectly using Staphylococcus aureus protein A, or streptococcal protein G.
  • the antibody can be covalently bound to a derivatized substrate, e.g., using a crosslinker, e.g., N-hydroxysuccinimidyl ester.
  • the test polypeptides with epitopes such as Flag, HA, or myc are bound to antibody-coated plates.
  • Another class of specific binding pair is a small organic molecule, and a polypeptide sequence that specifically binds it. See, for example, the specific binding pairs listed in Table 1.
  • Table 1. Specific binding pairs (polypeptide: ligand)
  • cellulase (CBD): cellulose
  • maltose binding protein: amylose or maltose
  • dihydrofolate reductase: methotrexate
  • Art-known methods of tethering proteins, e.g., the use of specific binding pairs, are suitable for the affinity or chemical capture of polypeptides on the array.
  • Appropriate substrates include commercially available streptavidin and avidin-coated plates, for example, 96-well Pierce Reacti-Bind Metal Chelate Plates or Reacti-Bind Glutathione Coated Plates (Pierce, Rockford, IL). Histidine- or GST-tagged test polypeptides are immobilized on either 96-well Pierce Reacti-Bind Metal Chelate Plates or Reacti-Bind Glutathione Coated Plates, respectively, and unbound proteins are optionally washed away.
  • the polypeptide is an enzyme, e.g., an inactive enzyme, and the ligand is its substrate.
  • the enzyme is modified so as to form a covalent bond with its substrate.
  • the polypeptide is an enzyme, and the ligand is an enzyme inhibitor.
  • Yet another class of specific binding pair is a metal, and a polypeptide sequence which can chelate the metal.
  • An exemplary pair is Ni2+ and the hexa-histidine sequence (see U.S. Patent Nos. 4,877,830; 5,047,513; 5,284,933; and 5,130,663).
  • the affinity tag is a dimerization sequence, e.g., a homodimerization or heterodimerization sequence, preferably a heterodimerization sequence.
  • the affinity tag is a coiled-coil sequence, e.g., the heptad repeat region of Fos.
  • the binding agent coupled to the array is the heptad repeat region of Jun.
  • the test polypeptide is tethered to the substrate by heterodimerization of the Fos and Jun heptad repeat regions to form a coiled-coil.
  • the affinity tag is provided by an unnatural amino acid, e.g., with a side chain having functional properties different from a naturally occurring amino acid.
  • the binding agent attached to the substrate functions as a chemical handle to either bind or react with the affinity tag.
  • the affinity tag is a free cysteine which can be oxidized with a thiol group attached to the substrate to create a disulfide bond that tethers the test polypeptide.
  • the substrate and the liquid-handling equipment are selected with consideration for required liquid volume, positional accuracy, evaporation, and cross-contamination.
  • the density of spots can depend on the liquid volume required for a particular application, and on the substrate, e.g., how much a liquid drop spreads on the substrate due to surface tension, and the positional accuracy of the dispensing equipment.
  • Numerous methods are available for dispensing small volumes of liquid onto substrates.
  • U.S. Patent No. 6,112,605 describes a device for dispensing small volumes of liquid.
  • U.S. Patent No. 6,110,426 describes a capillary action-based method of dispensing known volumes of a sample onto an array.
  • Nucleic acid spotted onto slides can be allowed to dry by evaporation. Dry air can be used to accelerate the process.
  • Capture Probes. The substrate can include an attached nucleic acid capture probe at each address.
  • capture probes can be used to create a self-assembling array.
  • a unique capture probe at each address selectively hybridizes to a nucleic acid encoding a test amino acid sequence, thereby organizing each encoding nucleic acid to a unique address.
  • the capture nucleic acid can be covalently attached or bound, e.g., to a polycationic surface on the substrate.
  • the capture probe can itself be synthesized in situ, e.g., by a light-directed method (see, e.g., U.S. Patent No.
  • the capture probe can hybridize to the nucleic acid encoding the test polypeptide.
  • the capture probe anneals to the T7 promoter region of a single stranded nucleic acid encoding the test amino acid sequence.
  • the capture probe is ligated to the encoding nucleic acid sequence.
  • the capture probe is a padlock probe.
  • the capture probe hybridizes to a nucleic acid encoding a test amino acid sequence, e.g., a unique region of the nucleic acid, or to a nucleic acid sequence tag provided on the nucleic acid for the purposes of identification.
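  • The self-assembly idea above, in which a unique capture probe directs each encoding nucleic acid to one address, can be sketched in code. This is a simplified illustration that treats hybridization as exact reverse-complement matching; the function names, the example sequences, and the exact-match rule are assumptions made for the sketch, not the method of the disclosure.

```python
# Simplified model: a probe "captures" a single-stranded nucleic acid when the
# nucleic acid contains the exact reverse complement of the probe sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def assign_addresses(probes: dict, encoders: dict) -> dict:
    """Map each encoding nucleic acid to the one address whose probe matches it.

    probes:   address -> capture probe sequence (5'->3')
    encoders: name    -> encoding nucleic acid sequence (5'->3')
    Nucleic acids matching no probe, or more than one, are left unassigned.
    """
    assignment = {}
    for name, seq in encoders.items():
        hits = [addr for addr, probe in probes.items()
                if reverse_complement(probe) in seq.upper()]
        if len(hits) == 1:
            assignment[name] = hits[0]
    return assignment


# Hypothetical probes and sequence tags, chosen only for illustration.
probes = {"A1": "TTGACC", "A2": "CGATGA"}
encoders = {"orf1": "ATGGGTCAAGCT", "orf2": "ATGTCATCGTAA"}
print(assign_addresses(probes, encoders))
```

  • Real probe design would also account for melting temperature, mismatches, and cross-hybridization, which this sketch ignores.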
  • the insoluble substrates having a binding agent attached can be disposed at each address of the array.
  • the insoluble substrates can further include a unique identifier, such as a chemical, nucleic acid, or electronic tag.
  • Chemical tags include, e.g., those used for recursive identification in "split and pool" combinatorial syntheses. See Kerr et al. (1993) J. Am. Chem. Soc. 115:2529-2531; Nikolaiev et al. (1993) Peptide Res. 6:161-170; and Ohlmeyer et al. (1993) Proc. Natl. Acad. Sci.
  • a nucleic acid tag can be a short oligonucleotide sequence that is unique for a given address.
  • the nucleic acid tag can be coupled to the particle.
  • the encoding nucleic acid provides a unique identifier.
  • the encoding nucleic acid can be coupled or attached to the particle.
  • Electronic tags include transponders as mentioned below.
  • the insoluble substrate can be a particle (e.g., a nanoparticle, or a transponder), or a bead.
  • Beads. The disposed particle can be a bead, e.g., constructed from latex, polystyrene, agarose, a dextran (sepharose, sephacryl), and so forth.
  • Transponders. U.S. Patent No. 5,736,332 describes methods of using small particles containing a transponder on which a handle or binding agent can be affixed. The identity of the particle is discerned by a read-write scanner device which can encode and decode data, e.g., an electronic identifier, on the particle (see also Nicolaou et al. (1995) Angew. Chem. Int. Ed. Engl. 34: 2289-2291). Test polypeptides are bound to the transponder by attaching to the handle or binding agent.
  • the nucleic acid can be an RNA, single stranded DNA, a double stranded DNA, or combinations thereof.
  • a single-stranded DNA can include a hairpin loop at its 5' end which anneals to the T7 promoter sequence to form a duplex in that region.
  • the nucleic acid can be an amplification product, e.g., from PCR (U.S. Patent Nos. 4,683,196 and 4,683,202); rolling circle amplification ("RCA," U.S. Patent No. 5,714,320), isothermal RNA amplification or NASBA (U.S. Patent Nos.
  • the sequence of the encoding nucleic acid is known prior to being disposed at an address.
  • the sequence of the encoding nucleic acid is unknown prior to disposal at an address.
  • the nucleic acid can be randomly obtained from a library.
  • the nucleic acid can be sequenced after the address on which it is placed has been identified as encoding a polypeptide of interest.
  • Amplification in situ. A nucleic acid disposed on the array can be amplified directly on the array, by a variety of methods, e.g., PCR (U.S. Patent No.
  • RNA amplification is well described in the art (see, e.g., U.S. Patent Nos. 5,130,238; 5,409,818; and 5,554,517; Romano et al. (1997) Immunol. Invest. 26:15-28; and the technical literature for "RnampliFire™" (Qiagen, CA)).
  • Isothermal RNA amplification is particularly suitable as reactions are homogeneous, can be performed at ambient temperatures, and produce RNA templates suitable for translation.
  • Vectors for Expression. Coding regions of interest can be taken from a source plasmid, e.g., containing a full length gene and convenient restriction sites, or sites for homologous or site-specific recombination, and transferred to an expression vector.
  • the expression vector includes a promoter and an operably linked coding region, e.g., encoding an affinity tag, such as one described herein.
  • the tag can be N or C terminal.
  • the vector carries a cap-independent translation enhancer (CITE, or IRES, internal ribosome entry site) for increased in vitro translation of RNA prepared from cloned DNA sequences.
  • the fusion proteins will be generated with commercially available in vitro transcription/translation kits such as the Promega TNT Coupled Reticulocyte Lysate Systems or TNT Coupled Wheat Germ Extract Systems.
  • Cell-free extracts containing translation components derived from microorganisms, such as a yeast or a bacterium, can also be used.
  • the vector can include a number of regulatory sequences such as a transcription promoter; a transcription regulatory sequence; an untranslated leader sequence; a sequence encoding a protease site; a recombination site; a 3' untranslated sequence; a transcriptional terminator; and an internal ribosome entry site.
  • the vector or encoding nucleic acid can also include a sequence encoding an intein.
  • Inteins can be used to cyclize, ligate, and/or polymerize polypeptides, e.g., as described in Evans et al. (1999) J Biol Chem 274:3923 and Evans et al. (1999) J Biol Chem 274: 18359.
  • Useful sets of proteins for creating test arrays include naturally occurring proteomic sets, randomized versions thereof, and artificial proteins (e.g., artificial variants of polypeptides that include a folded domain). Such proteins can be stored in a repository, see below.
  • Proteins
  • Naturally occurring sequences. Naturally occurring sequences can be procured from cells of species from the kingdoms of animals, bacteria, archaebacteria, plants, and fungi.
  • Non-limiting examples of eukaryotic species include: mammals such as human, mouse (Mus musculus), and rat; insects such as Drosophila melanogaster; nematodes such as Caenorhabditis elegans; other vertebrates such as Brachydanio rerio; parasites such as Plasmodium falciparum and Leishmania major; fungi such as yeasts, Histoplasma, Cryptococcus, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, and the like; and plants such as Arabidopsis thaliana, rice, maize, wheat, tobacco, tomato, potato, and flax.
  • Non-limiting examples of bacterial species include E.
  • amino acid sequences encoded by viral genomes can be used, e.g., a sequence from rotavirus, hepatitis A virus, hepatitis B virus, hepatitis C virus, herpes virus, papilloma virus, or a retrovirus (e.g., HIV-1, HIV-2, HTLV, SIV, and STLV).
  • a cDNA library is prepared from a desired tissue of a desired species in a vector described herein. Colonies from the library are picked, e.g., using a robotic colony picker. DNA is prepared from each colony and used to program a NAPPA array.
  • Artificial sequences. The encoding nucleic acid sequence can encode artificial amino acid sequences.
  • Artificial sequences can be randomized amino acid sequences, patterned amino acid sequences, computer-designed amino acid sequences, and combinations of the above with each other or with naturally occurring sequences.
  • Cho et al (2000) J Mol Biol 297:309-19 describes methods for preparing libraries of randomized and patterned amino acid sequences. Similar techniques using randomized oligonucleotides can be used to construct libraries of random sequences. Individual sequences in the library (or pools thereof) can be used to program a NAPPA array. Dahiyat and Mayo (1997) Science 278:82-7 describe an artificial sequence designed by a computer system using the dead-end elimination theorem.
  • Similar systems can be used to design amino acid sequences, e.g., based on a desired structure, such that they fold stably.
  • computer systems can be used to modify naturally occurring sequences in order
  • Mutagenesis. The array can be used to display the products of a mutagenesis or selection. Examples of mutagenesis procedures include cassette mutagenesis (see e.g., Reidhaar-Olson and Sauer (1988) Science 241:53-7), PCR mutagenesis (e.g., using manganese to decrease polymerase fidelity), in vivo mutagenesis (e.g., by transfer of the nucleic acid in a repair deficient host cell), and DNA shuffling (see U.S. Patent No.
  • Examples of selection procedures include complementation screens and phage display screens.
  • Mutagenic methods can be used to introduce randomization.
  • more methodical variation can be achieved. For example, an amino acid position or positions of a naturally occurring protein can be systematically varied, such that each possible substitution is present at a unique position on the array.
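  • The systematic variation described above, with each possible substitution at a chosen position placed at its own array address, can be enumerated programmatically. The sketch below is illustrative only; the function names, the three-residue example sequence, and the row/column addressing scheme are assumptions.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(sequence, positions):
    """Yield (label, variant) for every substitution at the given 0-based positions.

    Labels follow the common wild-type/position/replacement style, e.g. 'K2A'.
    """
    for pos in positions:
        wild_type = sequence[pos]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                yield f"{wild_type}{pos + 1}{aa}", sequence[:pos] + aa + sequence[pos + 1:]


def layout(variants, columns):
    """Assign each variant to a unique (row, column) address, filling row by row."""
    return {(i // columns, i % columns): v for i, v in enumerate(variants)}


# Hypothetical three-residue sequence, varied at its second position.
variants = list(single_site_variants("MKT", [1]))
plate = layout(variants, columns=12)
print(len(variants), plate[(0, 0)])
```

  • Varying one position gives 19 variants; restricting `AMINO_ACIDS` to a smaller set corresponds to limiting the range of variation to a reduced amino acid set.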
  • all the residues of a binding interface can be varied to all possible other combinations.
  • the range of variation can be restricted to reasonable or limited amino acid sets.
  • Collections. Additional collections include arrays having at different addresses one of the following combinations: combinatorial variants of a bioactive peptide; specific variants of a single polypeptide species (splice variants, isolated domains, domain deletions, point mutants); polypeptide orthologs from different species; polypeptide components of a cellular pathway (e.g., a signalling pathway, a regulatory pathway, or a metabolic pathway); and the entire polypeptide complement of an organism.
  • the computer system can be networked to receive data, e.g., raw data or processed data, from a data acquisition apparatus, e.g., a microchip slide scanner, a fluorescence microscope, or surface plasmon resonance.
  • the computer system includes a relational database.
  • the database houses all data from multiple interaction profiles, e.g., using the same or different arrays.
  • One table contains a row for each array contacting evaluation, e.g., describing one or more of the array production number, experiment date, array contents, experimental conditions, and so forth.
  • the raw data from an interaction microarray experiment, for example, is stored in a second table with a row for each address on the array.
  • This data includes the signature/profile information.
  • the second table can have fields for observed fluorescence, background fluorescence, the amino acid sequences present at the microarray address, other annotations, links, cross-references and so forth.
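  • The two-table layout described above can be sketched with an in-memory SQLite database. The table names, column names, and sample values below are hypothetical and chosen only to mirror the fields listed in the text; the text does not prescribe a particular schema or database engine.

```python
import sqlite3

# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE array_experiment (          -- one row per array contacting evaluation
    experiment_id     INTEGER PRIMARY KEY,
    production_number TEXT,
    experiment_date   TEXT,
    conditions        TEXT
);
CREATE TABLE address_reading (           -- one row per address on the array
    experiment_id           INTEGER REFERENCES array_experiment(experiment_id),
    address                 TEXT,
    amino_acid_sequence     TEXT,
    observed_fluorescence   REAL,
    background_fluorescence REAL,
    annotation              TEXT,
    PRIMARY KEY (experiment_id, address)
);
""")

# Made-up example data.
conn.execute(
    "INSERT INTO array_experiment VALUES (1, 'LOT-001', '2003-06-09', 'serum specimen')"
)
conn.executemany(
    "INSERT INTO address_reading VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "A1", "MKTAYIAK", 1500.0, 200.0, None),
        (1, "A2", "MSGLVPRG", 250.0, 200.0, "negative control"),
    ],
)

# Background-corrected signal per address: the raw material for a profile.
rows = conn.execute(
    "SELECT address, observed_fluorescence - background_fluorescence "
    "FROM address_reading WHERE experiment_id = ? ORDER BY address",
    (1,),
).fetchall()
print(rows)
```

  • Keeping per-experiment metadata separate from per-address readings lets profiles from the same or different arrays be queried together by joining on the experiment key.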
  • the system is designed to facilitate digital access to the data in order to interface the experimental results with predictive models of interactions.
  • the system can be accessed in real time, e.g., as profile data is acquired, and from multiple network stations, e.g., multiple users within a company (e.g., using an Intranet), multiple customers of a data provider (e.g., using secure Internet communication protocols), or multiple individuals across the globe (e.g., using the Internet).
  • Clustering algorithms can be applied to profiles in the database that are associated with particular information (e.g., a diagnosis, detection, species, etc.).
  • See Eisen et al. ((1998) Proc. Natl. Acad. Sci. USA 95:14863) and Golub et al. ((1999) Science 286:531) for exemplary methods of clustering signatures/profiles.
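  • As a rough illustration of profile clustering, the sketch below groups profiles by average-linkage agglomerative clustering on Pearson correlation. It is a generic textbook procedure written for this example and is not claimed to reproduce the cited methods; the function names, the stopping threshold, and the sample profiles are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes non-constant profiles)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster(profiles, threshold):
    """Merge the most correlated pair of clusters until none reaches the threshold.

    profiles: name -> list of signal values, one per array address.
    Linkage between two clusters is the mean pairwise correlation of their members.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        total = sum(pearson(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = max(
            ((pair, linkage(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Made-up profiles over four addresses: two "patient-like" and two "control-like".
profiles = {
    "p1": [9, 1, 8, 1],
    "p2": [8, 2, 9, 1],
    "c1": [1, 9, 1, 8],
    "c2": [2, 8, 1, 9],
}
groups = cluster(profiles, threshold=0.5)
print(groups)
```

  • The resulting groups can then be compared against the information associated with each profile, such as a diagnosis, to see whether the clusters track it.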
  • Other methods of comparing profiles include training a recognition algorithm.
  • the recognition algorithm can use one or more of: statistical analysis, fuzzy logic, hidden Markov models, regression, decision trees, neural networks, and genetic algorithms. See, variously, US published applications 2002-0146724; 2003-0059792; 2003-0049701; and 2003-0023385.
  • the information set can be used to send information that assigns a descriptor to incoming information to a user, e.g., a remote client.
  • the process in one embodiment includes: providing a plurality of identical test arrays, each array having a predetermined number of samples of proteins from a pathogen or an affected subject, the samples being identically arrayed at addressable locations; reacting each of the plurality of specimen biological samples from affected subjects and from unaffected controls with a test array; washing test arrays appropriately to eliminate non-specific binding; collecting raw data on each test array showing the pattern of elements that have specific binding of at least one component with a specimen sample; and analyzing heuristically to compare binding patterns of specimens of affected individuals to that of normal unaffected individuals, to find disease specific patterns. It is envisioned that many of the steps in these processes will be eliminated with further development.
  • washing steps may be reduced or eliminated by establishing conditions that are sufficiently stringent that specific binding is obtained ab initio.
  • Several types of patterns are expected for binding of a specimen macromolecule to any given spot at an addressable location in an array. At some locations, a "constant background" is observed, because the same patterns are illuminated in all specimen samples regardless of origin from an affected subject or a control. See Figure 1 for spots present in all individuals. At another set of locations, a "protein signature" of disease-specific "detectably bound" addressable locations, or illuminated patterns, is observed in samples from affected individuals only. See Figure 1. Finally, individual variation is observed as components that are illuminated in some individuals and not others, with no disease state correlation.
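  • The three kinds of pattern described above can be separated mechanically once each specimen is reduced to the set of addresses it detectably binds. The sketch below uses deliberately strict rules (bound in every specimen, or in every affected specimen and no control); those rules, the function name, and the sample data are assumptions for illustration, and a practical analysis would use statistical thresholds instead.

```python
def classify_addresses(affected, controls):
    """Label each address by its binding pattern across specimens.

    affected, controls: lists of sets; each set holds the addresses detectably
    bound by one specimen from an affected subject or an unaffected control.
    """
    all_addresses = set().union(*affected, *controls)
    labels = {}
    for addr in sorted(all_addresses):
        n_affected = sum(addr in specimen for specimen in affected)
        n_controls = sum(addr in specimen for specimen in controls)
        if n_affected == len(affected) and n_controls == len(controls):
            labels[addr] = "constant background"
        elif n_affected == len(affected) and n_controls == 0:
            labels[addr] = "protein signature"
        else:
            labels[addr] = "individual variation"
    return labels


# Made-up example: two affected specimens and two controls.
affected = [{"A1", "B2", "C3"}, {"A1", "B2"}]
controls = [{"A1"}, {"A1", "C3"}]
labels = classify_addresses(affected, controls)
print(labels)
```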
  • a statistical analysis of detectably bound, illuminated components at their addressable locations in affected and unaffected control samples is performed, to find those components that correlate with a disease state with good predictability, for evaluating unknown samples. Unknown samples can then be read, and compared to the previously determined patterns, to provide diagnoses.
  • the signature pattern need not be a single absolute pattern.
  • a set of patterns that statistically indicate the presence of the pathogen is sufficient. For example, each field in the pattern can have associated with it a variance or standard deviation.
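  • One way to read "each field in the pattern can have associated with it a variance or standard deviation" is as a per-address mean and standard deviation against which an unknown sample is scored. The sketch below takes that reading; the z-score cutoff, the function names, and the training values are assumptions, not parameters given in the text.

```python
from statistics import mean, stdev


def build_signature(training_profiles):
    """Return address -> (mean, sample standard deviation) over the training profiles.

    training_profiles: list of dicts mapping address -> signal, all with the same keys.
    """
    addresses = training_profiles[0].keys()
    return {
        addr: (
            mean(p[addr] for p in training_profiles),
            stdev(p[addr] for p in training_profiles),
        )
        for addr in addresses
    }


def matches(signature, unknown, max_z=2.0):
    """True if every address of the unknown lies within max_z standard deviations."""
    for addr, (mu, sigma) in signature.items():
        if sigma == 0:
            if unknown[addr] != mu:
                return False
        elif abs(unknown[addr] - mu) / sigma > max_z:
            return False
    return True


# Made-up training profiles from three affected specimens.
training = [
    {"A1": 100, "B2": 10},
    {"A1": 110, "B2": 12},
    {"A1": 90, "B2": 8},
]
signature = build_signature(training)
print(matches(signature, {"A1": 105, "B2": 11}))
print(matches(signature, {"A1": 150, "B2": 10}))
```

  • Because the signature stores a spread as well as a central value, an unknown sample need not reproduce a single absolute pattern to be called a match.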
  • the same set of test arrays having components such as proteins at addressable locations may be used to identify more than one pathogen, if each pathogen has a reproducible pattern distinct from the pattern of other pathogens.
  • An array can further be used to establish a pattern for a novel pathogen, even if the pathogen was not heretofore identified.
  • the heuristic algorithms can be used to find a specific pattern.
  • Embodiments of the invention do not require use of mass spectrometry to identify the proteins bound to the array. Detection of binding is accomplished with relatively simple instruments that are compact in size. A number of pathogens, including HIV, influenza virus, and many protozoans, have the ability to mutate and change their expressed proteomic phenotype.
  • Methods provided herein allow the continued inclusion of new data into the training sets, so that the signature patterns can evolve with the changing pathogens, and can maintain utility. Once patterns are identified by the methods herein, the number of locations of the test array can be reduced to those that show a strong positive predictive value and good negative predictive value. This reduction creates simpler test arrays for an application, which can be more readily deployed for field use.
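  • The reduction step above can be illustrated by computing positive and negative predictive values per address from training counts and keeping only addresses that clear both cutoffs. The cutoffs of 0.9, the function names, and the counts are assumptions for this sketch; note also that predictive values computed this way reflect the proportion of affected specimens in the training set.

```python
def predictive_values(bound_affected, n_affected, bound_controls, n_controls):
    """Return (PPV, NPV) for one address, treating 'bound' as a positive test."""
    tp = bound_affected                 # affected specimens that bind
    fn = n_affected - bound_affected    # affected specimens that do not bind
    fp = bound_controls                 # controls that bind
    tn = n_controls - bound_controls    # controls that do not bind
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    return ppv, npv


def select_addresses(counts, n_affected, n_controls, min_ppv=0.9, min_npv=0.9):
    """Keep addresses whose PPV and NPV both meet the cutoffs.

    counts: address -> (number of affected specimens bound, number of controls bound)
    """
    kept = []
    for addr, (bound_affected, bound_controls) in counts.items():
        ppv, npv = predictive_values(bound_affected, n_affected, bound_controls, n_controls)
        if ppv >= min_ppv and npv >= min_npv:
            kept.append(addr)
    return kept


# Made-up counts from 10 affected specimens and 10 controls.
counts = {"A1": (10, 10), "B2": (9, 0), "C3": (5, 4)}
reduced = select_addresses(counts, n_affected=10, n_controls=10)
print(reduced)
```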
  • the methods herein are envisioned to provide commercial applications that include: clinical diagnosis of patients with suspected infections of known pathogens; clinical diagnosis of specific variants of pathogens for rapid adjustment of antibiotic therapy; rapid clinical diagnosis of patients with other disorders where marker proteins may be helpful, specific forms of cancer, specific types of autoimmune diseases such as rheumatic diseases, diseases such as acute myocardial infarction; clinical evaluation of human, animal or plant populations with suspected pathogen of unknown character (requires a control population of unaffected individuals); evaluation of samples (for example, water supply, food, air) for the presence of microorganisms or toxic macromolecules; and evaluation of samples for the presence of bioterrorist contamination.
  • Food can be examined for the presence of a food poisoning agent such as a bacterium, and the bacterium can be identified by the methods herein, for example, the bacterium can be identified as a species of the genus Salmonella or the genus Staphylococcus.

Abstract

The invention relates to test arrays of capture probes used to identify characteristic information about a sample. For example, the methods described herein can be used to identify the presence of a cancer cell or a pathogen in a sample taken from a subject, or the presence of a target molecule in an environmental sample.
PCT/US2003/017979 2001-01-23 2003-06-09 Evaluating protein signatures WO2005016230A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2003304409A AU2003304409A1 (en) 2002-06-07 2003-06-09 Evaluating protein signatures
US10/910,718 US8609344B2 (en) 2001-01-23 2004-08-03 Nucleic-acid programmable protein arrays

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US38703402P 2002-06-07 2002-06-07
US60/387,034 2002-06-07

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US10/055,432 Continuation-In-Part US6800453B2 (en) 2001-01-23 2002-01-22 Nucleic-acid programmable protein arrays

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US10/910,718 Continuation-In-Part US8609344B2 (en) 2001-01-23 2004-08-03 Nucleic-acid programmable protein arrays

Publications (2)

Publication Number Publication Date
WO2005016230A2 true WO2005016230A2 (fr) 2005-02-24
WO2005016230A3 WO2005016230A3 (fr) 2006-01-26

Family

ID=34192918

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2003/017979 WO2005016230A2 (fr) 2001-01-23 2003-06-09 Evaluating protein signatures

Country Status (2)

Country Link
AU (1) AU2003304409A1 (fr)
WO (1) WO2005016230A2 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007070181A1 (fr) * 2005-12-15 2007-06-21 Kimberly-Clark Worldwide, Inc. Method for detecting bacterial conjunctivitis
US7282349B2 (en) 2003-12-16 2007-10-16 Kimberly-Clark Worldwide, Inc. Solvatochromatic bacterial detection
US7300770B2 (en) 2004-12-16 2007-11-27 Kimberly-Clark Worldwide, Inc. Detection of microbe contamination on elastomeric articles
US7399608B2 (en) 2003-12-16 2008-07-15 Kimberly-Clark Worldwide, Inc. Microbial detection and quantification
WO2010010213A1 (fr) 2008-07-22 2010-01-28 Equipo Ivi Investigacion Sl Profil d'expression génique utilisé comme marqueur de la réceptivité endométriale
JP2015514396A (ja) * 2012-03-09 2015-05-21 マサチューセッツ インスティテュート オブ テクノロジー 接着シグネチャー

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001040803A1 (fr) * 1999-12-03 2001-06-07 Diversys Limited Direct screening method

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7282349B2 (en) 2003-12-16 2007-10-16 Kimberly-Clark Worldwide, Inc. Solvatochromatic bacterial detection
US7399608B2 (en) 2003-12-16 2008-07-15 Kimberly-Clark Worldwide, Inc. Microbial detection and quantification
US7687245B2 (en) 2003-12-16 2010-03-30 Kimberly-Clark Worldwide, Inc. Microbial detection and quantification
US8338128B2 (en) 2003-12-16 2012-12-25 Kimberly-Clark Worldwide, Inc. Microbial detection and quantification
US7300770B2 (en) 2004-12-16 2007-11-27 Kimberly-Clark Worldwide, Inc. Detection of microbe contamination on elastomeric articles
WO2007070181A1 (fr) * 2005-12-15 2007-06-21 Kimberly-Clark Worldwide, Inc. Method for screening for bacterial conjunctivitis
US7727513B2 (en) 2005-12-15 2010-06-01 Kimberly-Clark Worldwide, Inc. Method for screening for bacterial conjunctivitis
WO2010010213A1 (fr) 2008-07-22 2010-01-28 Equipo Ivi Investigacion Sl Gene expression profile as a marker of endometrial receptivity
JP2015514396A (ja) * 2012-03-09 2015-05-21 Massachusetts Institute of Technology Adhesion signatures
EP2823035A4 (fr) * 2012-03-09 2015-08-05 Massachusetts Inst Technology Adhesion signatures

Also Published As

Publication number Publication date
WO2005016230A3 (fr) 2006-01-26
AU2003304409A1 (en) 2005-03-07
AU2003304409A8 (en) 2005-03-07

Similar Documents

Publication Publication Date Title
US8609344B2 (en) Nucleic-acid programmable protein arrays
Tomizaki et al. Protein‐detecting microarrays: current accomplishments and requirements
CN110475864B (zh) Methods and compositions for identifying or quantifying targets in a biological sample
Hu et al. Functional protein microarray technology
CN105189749B (zh) Methods and compositions for labeling and analyzing samples
Talapatra et al. Protein microarrays: challenges and promises
US20200217850A1 (en) Heterogeneous single cell profiling using molecular barcoding
Uttamchandani et al. Protein and small molecule microarrays: powerful tools for high-throughput proteomics
CN101918590B (zh) Nucleic acid sequencing
JP2004536565A5 (fr)
US20010031468A1 (en) Analyte assays employing universal arrays
US20050255491A1 (en) Small molecule and peptide arrays and uses thereof
US20100240544A1 (en) Aptamer biochip for multiplexed detection of biomolecules
US20020168692A1 (en) Biosensor detector array
US20210016283A1 (en) Ultrahigh throughput protein discovery
Huels et al. The impact of protein biochips and microarrays on the drug development process
US20060228735A1 (en) Multiplex assay systems
WO2005016230A2 (fr) Evaluating protein signatures
US20020187464A1 (en) Microarray-based method for rapid identification of cells, microorganisms, or protein mixtures
WO2019148001A1 (fr) Methods and composition for high-throughput single-molecule protein detection systems
EP1403641B1 (fr) Method for calculating association and dissociation constants using a polymer chip for identifying ionic polymers
AU2007254676A1 (en) Nucleic-acid programmable protein arrays
AU2002241943A1 (en) Nucleic-acid programmable protein arrays
del Campo et al. Diagnostics and high throughput screening
WO2024059655A1 (fr) Characterization of the accessibility of macromolecular structures

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: The EPO has been informed by WIPO that EP was designated in this application
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP

WWW WIPO information: withdrawn in national office

Country of ref document: JP