EP1255859A2 - Tissue specific genes of diagnostic import - Google Patents

Tissue specific genes of diagnostic import

Info

Publication number
EP1255859A2
EP1255859A2 EP00976921A EP00976921A EP1255859A2 EP 1255859 A2 EP1255859 A2 EP 1255859A2 EP 00976921 A EP00976921 A EP 00976921A EP 00976921 A EP00976921 A EP 00976921A EP 1255859 A2 EP1255859 A2 EP 1255859A2
Authority
EP
European Patent Office
Prior art keywords
polynucleotides
protein
seq
nos
fragments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00976921A
Other languages
German (de)
French (fr)
Inventor
Thierry Sornasse
Jeffrey J. Seilhamer
George A. Watson
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Incyte Corp
Original Assignee
Incyte Genomics Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Incyte Genomics Inc
Publication of EP1255859A2

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6809 Methods for determination or identification of nucleic acids involving differential detection
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6883 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/158 Expression markers

Definitions

  • the present invention relates to a composition
  • a composition comprising a plurality of polynucleotides which are cell and/or tissue specific. These polynucleotides may be used to define and direct a metabolic or developmental process, to identify or to monitor the progression of a condition, disease, or disorder, or to evaluate and monitor the efficacy of a treatment protocol.
  • array technology can provide a simple way to explore the expression of a single polymorphic gene or the expression profile of a large number of related or unrelated genes.
  • arrays are employed to detect the expression of a specific gene or its variants.
  • arrays provide a platform for examining which genes are tissue specific, direct the differentiation of a cell type or tissue, carry out housekeeping functions, function as parts of a signaling cascade, or characterize a particular genetic predisposition, condition, disease, or disorder.
  • gene expression profiling is particularly relevant to improving diagnosis and prognosis of disease.
  • tissue and cell specific genes against which genes expressed during the disease process may be compared.
  • both the levels and sequences expressed in brain tumors may be compared with the levels and sequences expressed in normal brain tissue.
  • These comparisons may be made on a single array by incorporating a particular tissue or cell specific reference set alongside novel sequences or on multiple arrays, each of which contains at least some subset of the known reference set.
  • the present invention satisfies a need in the art in that it provides such a reference set.
  • the reference set may be used in its entirety or in part to produce an expression profile that may be used to define and direct a metabolic or developmental process, to identify or to monitor the progression of a condition, disease, or disorder, or to evaluate and monitor the efficacy of a treatment protocol.
  • the present invention provides a plurality of tissue or cell specific polynucleotides which may be used on an array to produce an expression profile.
  • This profile may define expression of the polynucleotides in normal tissue, during a particular metabolic or developmental process or during the onset, progression, or treatment of a human condition, disease, or disorder.
  • these polynucleotides are selected from SEQ ID NOs:1-416.
  • the invention also provides a plurality of polynucleotides which display tissue or cell specific expression and are selected from: a) SEQ ID NOs:209-218 and 1-10, cell specific polynucleotides of heart and fragments thereof; b) SEQ ID NOs:219-249 and 11-41, cell specific polynucleotides of skeletal muscle and fragments thereof; c) SEQ ID NOs:250-251 and 42-43, cell specific polynucleotides of uterus and fragments thereof; d) SEQ ID NOs:252-256 and 44-48, cell specific polynucleotides of ovary and fragments thereof; e) SEQ ID NOs:257-263 and 49-55, cell specific polynucleotides of stomach and fragments thereof; f) SEQ ID NOs:264-283 and 56-75, cell specific polynucleotides of intestine and fragments thereof; g) SEQ ID NOs:284-293 and 76-85
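The tissue groupings listed above can be expressed as a small lookup table. The following is a minimal sketch covering only items a) through f), since the list is truncated after g) in this extract. The names `REFERENCE_SETS` and `tissue_for_seq_id` are hypothetical. The constant offset of 208 between the two ranges in each pair is an observation drawn from the listed numbers, not something the text states.

```python
from typing import Optional

# Tissue reference sets as listed in the text, items a) through f) only.
# Each entry holds the two SEQ ID ranges given for that tissue.
REFERENCE_SETS = {
    "heart": (range(209, 219), range(1, 11)),
    "skeletal muscle": (range(219, 250), range(11, 42)),
    "uterus": (range(250, 252), range(42, 44)),
    "ovary": (range(252, 257), range(44, 49)),
    "stomach": (range(257, 264), range(49, 56)),
    "intestine": (range(264, 284), range(56, 76)),
}


def tissue_for_seq_id(seq_id: int) -> Optional[str]:
    """Return the tissue whose reference set contains seq_id, or None."""
    for tissue, (upper, lower) in REFERENCE_SETS.items():
        if seq_id in upper or seq_id in lower:
            return tissue
    return None


def offsets() -> set:
    """Collect the numeric offset between the paired ranges of each tissue."""
    return {upper.start - lower.start for upper, lower in REFERENCE_SETS.values()}
```

For example, `tissue_for_seq_id(212)` and `tissue_for_seq_id(4)` both resolve to the heart set under this table.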
  • the plurality of polynucleotides are immobilized on a substrate.
  • the expression of a plurality of polynucleotides is used to detect
  • the tissue is embryonic stem cells which are differentiating into brain, heart, kidney, liver, lung, muscle or pancreatic tissues.
  • the tissue is a biopsy from diseased brain, heart, kidney, liver, lung, muscle, ovarian, pancreatic, small intestine, stomach, or uterine tissues which is being diagnosed for a cancer or immune or inflammatory disease or subjected to forensic analysis.
  • the point of origin of a metastatic cancer is determined.
  • the polynucleotides are used in high throughput methods of screening molecules or compounds to identify a ligand, the method comprising combining a polynucleotide with molecules or compounds under conditions to allow specific binding and detecting specific binding, thereby identifying a ligand which specifically binds to the polynucleotide.
  • the molecules or compounds to be screened are selected from DNA molecules, RNA molecules, PNAs, mimetics,
  • the invention provides a substantially purified polynucleotide selected from SEQ ID NOs:212, 228, 233, 259, 271, 287, 316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof, SEQ ID NO:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and 204.
  • 316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof, SEQ ID NO:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and 204 is used in an expression vector transformed into a host cell to produce a protein or a portion thereof by culturing the host cell under conditions for the expression of protein and recovering the protein from the host cell culture.
  • the invention provides a protein or a portion thereof.
  • the protein is used in a high throughput method to screen large numbers of molecules or compounds to identify at least one ligand which specifically binds the protein, the method comprising combining the protein with the molecules or compounds under conditions to allow specific binding and detecting specific binding, thereby identifying a ligand which specifically binds the protein.
  • the protein is used to purify a ligand, the method comprising combining the protein with a sample under conditions to allow specific binding, recovering the bound protein, and separating the protein from the ligand, thereby obtaining purified ligand.
  • the molecules or compounds screened or purified may be selected from DNA molecules, RNA molecules, PNAs, mimetics, peptides, proteins, agonists, antagonists, antibodies or their fragments, immunoglobulins, inhibitors, drug compounds, and pharmaceutical agents. Any of these molecules or compounds may have diagnostic or therapeutic applications.
  • Sequence Listing is a compilation of polynucleotides obtained by sequencing and extension of clone inserts of different cDNAs. Each sequence is identified by a sequence identification number (SEQ ID NO or SEQ ID) and by the clone number (Incyte ID) from which it was obtained.
  • Table 1 lists the fragments and extended polynucleotides by their SEQ ID NO and cDNA respectively, tissue, and by the description associated with at least a fragment of a homologous polynucleotide in GenBank. The descriptions were obtained using the sequences of the Sequence Listing and BLAST analysis.
  • Table 2 lists the source of the RNAs used to produce target polynucleotides for hybridization to the UNIGEM V microarray (Incyte Genomics, Palo Alto CA).
  • the columns present the Source No, Tissue, Age, Ethnicity/Sex, Cause of Death, and Conditions or Diseases, as known for each donor.
  • Table 3 shows the data for each of the clones across each of the tissues used in the experiments.
  • the columns present Clone ID and the tissues (with source number)-heart, skeletal muscle, uterus, stomach, small intestine, lung, liver, kidney, pancreas, spleen and brain. This data was produced using GEMTOOLS software (Incyte Genomics).
  • Table 4 presents the analysis of variance (ANOVA) for the data.
  • the columns present Clone
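The text does not say which ANOVA design was used for Table 4. As a minimal sketch, assuming a one-way design with one group of measurements per tissue, the F statistic could be computed as below. The function name and any sample values are hypothetical and are not taken from the patent's tables.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                      # number of groups (e.g. tissues)
    n = sum(len(g) for g in groups)      # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within
```

A large F value indicates that expression differs between tissues by more than the variation within each tissue would explain.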
  • Table 5 shows the cell and tissue specificity of the polynucleotides across tissues (heart, skeletal muscle, uterus, stomach, small intestine, lung, liver, kidney, pancreas, spleen and brain). The cell and tissue specific groupings were produced using mean values [mean (tissue)- mean (entire set)] and grouped using EXCEL98 software (Microsoft).
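The grouping statistic described for Table 5, mean (tissue) minus mean (entire set), can be sketched as follows. This assumes that "entire set" means all measurements for one clone pooled across every tissue, which the text does not spell out. The function name and the example values are hypothetical.

```python
from statistics import mean


def specificity_scores(expression):
    """Map each tissue to mean(tissue) - mean(entire set) for one clone.

    `expression` maps a tissue name to a list of measurements for the clone.
    """
    pooled = [value for values in expression.values() for value in values]
    overall = mean(pooled)
    return {tissue: mean(values) - overall
            for tissue, values in expression.items()}


def most_specific_tissue(expression):
    """Return the tissue with the largest positive score."""
    scores = specificity_scores(expression)
    return max(scores, key=scores.get)
```

A clone with a score well above zero in one tissue and near or below zero elsewhere would be grouped with that tissue.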
  • array refers to an ordered arrangement of hybridizable polynucleotides. These are arranged so that there are a "plurality" of polynucleotides, preferably at least one polynucleotide, preferably at least 100 polynucleotides, and more preferably at least 1,000 polynucleotides, and even more preferably at least 10,000 polynucleotides on a 1 cm² substrate.
  • the maximum number of polynucleotides is unlimited, but is at least 100,000.
  • the signal from each of the hybridized polynucleotides is individually distinguishable.
  • a “polynucleotide” refers to a chain of nucleotides. Preferably, the chain has from about 15 to 10,000 nucleotides and more preferably from about 400 to 6,000 nucleotides.
  • the term “probe” refers to a probe polynucleotide capable of hybridizing with a target polynucleotide to form a hybridization complex. In most instances, the sequences of the probe and target polynucleotides will be complementary (no mismatches) when aligned. In some instances, there may be up to a 10% mismatch.
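The 10% mismatch allowance in the probe definition can be illustrated with a short check. This is a minimal sketch that assumes an ungapped, equal-length alignment of the probe against the reverse complement of the target. The function names and the example sequences are hypothetical, and real hybridization depends on conditions that a sequence comparison does not capture.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatch_fraction(probe: str, target: str) -> float:
    """Fraction of probe positions that do not pair with the target."""
    if not probe or len(probe) != len(target):
        raise ValueError("probe and target must be non-empty and equal in length")
    expected = reverse_complement(target)
    mismatches = sum(p != e for p, e in zip(probe.upper(), expected))
    return mismatches / len(probe)


def within_mismatch_limit(probe: str, target: str, limit: float = 0.10) -> bool:
    """True when the mismatch fraction does not exceed the limit (10% by default)."""
    return mismatch_fraction(probe, target) <= limit
```

With a 10-nucleotide probe, one mismatched position is exactly 10% and still passes, while two mismatches do not.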
  • “Fragment” refers to any part of an Incyte clone or polynucleotide which retains a useful characteristic. Useful fragments may be used in hybridization technologies, to identify or purify ligands, or as a therapeutic to regulate replication, transcription or translation.
  • “Ligand” refers to any agent, molecule, or compound which will bind specifically to a complementary site on a polynucleotide or protein. Such ligands stabilize or modulate the activity of polynucleotides or proteins and may be composed of at least one of the following: inorganic and organic substances including nucleic acids, proteins, carbohydrates, fats, and lipids.
  • “Purified” refers to any molecule or compound that is removed, isolated, or separated from its natural environment and is at least about 60% free, and more preferably about 90% free, from other components with which it is naturally associated.
  • Specific binding refers to a special and precise interaction between two molecules which is dependent upon a particular structure such as molecular side groups. For example, the hydrogen bonding between two single stranded nucleic acids or the binding between an epitope or a protein and an agonist, antagonist, or antibody.
  • sample is used in its broadest sense.
  • a sample containing polynucleotides may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; a cell; a tissue; a tissue print; a finger print, a hair, and the like.
  • Portion refers to any part of a protein used for any purpose, but especially for the screening of molecules or compounds to identify those which specifically bind to that portion and for producing antibodies.
  • polynucleotide encoding a protein refers to nucleic acid sequence that closely aligns with a sequence which encodes a conserved protein motif or domain that was identified by employing analyses well known in the art. These analyses include Hidden Markov Models (HMMs) such as PFAM (Krogh (1994) J Mol Biol 235:1501-1531; Sonnhammer et al. (1998) Nucl Acids Res 26:320-322), BLAST (Basic Local Alignment Search Tool; Altschul (1993) J Mol Evol 36: 290-300; and Altschul et al. (1990) J Mol Biol 215:403-410), or other analytical tools such as BLIMPS (Henikoff et al. (1998) Nucl Acids Res 26:309-12). Additionally, “polynucleotide encoding a protein” may refer to a polynucleotide that is expressed in or associated with specific human metabolic processes, conditions, disorders, or diseases.
  • Cell specific refers to those polynucleotides which occur at a statistically significant level in more than one tissue.
  • the commonality between the tissues may be ascribed to the types of cells that are an integral part of or would be expected to be found in a particular tissue, e.g., blood cells, nerve cells, endothelial cells, and the like.
  • the present invention provides a plurality of tissue or cell specific polynucleotides which may be used on an array to produce an expression profile.
  • This profile may define expression of these polynucleotides in normal tissue, during a particular metabolic or developmental process or during the onset, progression, or treatment of a human condition, disease, or disorder.
  • These polynucleotides represent known and novel genes normally expressed in the cells or tissues of the brain, heart, intestine, kidney, liver, lung, smooth muscle, ovary, pancreas, spleen, stomach, or uterus. The expression of these polynucleotides may be compared to the expression of other known or novel genes found on an array.
  • the plurality of polynucleotides comprises SEQ ID NOs:1-416.
  • Tissue or cell-specific reference sets may be selected from a) SEQ ID NOs:209-218 and 1-10, cell specific polynucleotides of heart and fragments thereof; b) SEQ ID NOs:219-249 and 11-41, cell specific polynucleotides of skeletal muscle and fragments thereof; c) SEQ ID NOs:250-251 and 42-43, cell specific polynucleotides of uterus and fragments thereof; d) SEQ ID NOs:252-256 and 44-48, cell specific polynucleotides of ovary and fragments thereof; e) SEQ ID NOs:257-263 and 49-55, cell specific polynucleotides of stomach and fragments thereof; f) SEQ ID NOs:264-283 and 56-75, cell specific polynucleotides of intestine and fragments thereof; g) SEQ ID NOs:2
  • the invention also provides a substantially purified polynucleotide selected from SEQ ID NOs:212, 228, 233, 259, 271, 287, 316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof, SEQ ID NO:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and 204.
  • These polynucleotides may be used in an expression vector transformed into a host cell to produce a protein or a portion thereof by culturing the host cell under conditions for the expression of protein and recovering the protein from
  • the microarray can be used for large scale genetic or gene expression analysis of a large number of novel target polynucleotides.
  • targets are prepared by methods well known in the art and are from mammalian cells or tissues which are in a certain stage of development or differentiation; have been treated with a known molecule or compound, such as a cytokine, growth factor, a drug, and the like; or have been extracted or biopsied from a mammal with a known or unknown condition, disorder, or disease before or after treatment.
  • the plurality of polynucleotides are useful to determine the differentiation of embryonic stem cells toward brain, heart, kidney, liver, lung, muscle or pancreatic tissues or to determine whether a cancer is metastatic or its source by analyzing biopsied tissue from diseased brain, heart, kidney, liver, lung, muscle, ovarian, pancreatic, small intestine, stomach, or uterine tissues.
  • the plurality of polynucleotides may be used during the diagnosis of a cancer, an immunopathology, a neuropathology, and the like.
  • the target polynucleotides are hybridized to the probe polynucleotides for the purpose of defining a novel gene profile associated with that developmental stage, treatment, condition, disorder or disease.
  • the gene profile can be used for diagnosis, prognosis, or monitoring of treatments where altered expression of known and novel genes is associated with a cancer, an immunopathology, a neuropathology, and the like.
  • a gene profile can be used to investigate an individual's predisposition to a condition, disorder or disease such as a cancer, an immunopathology, a neuropathology, and the like.
  • the polynucleotides of the invention are employed as hybridizable polynucleotides on a microarray, the polynucleotides are organized in an ordered fashion so that each polynucleotide is present at a specified location on the substrate. Because the probe polynucleotides are at specified locations on the substrate, their hybridization patterns and intensities can be compared with the hybridization patterns and intensities of other known and novel polynucleotides to create an expression profile. Such a profile, interpreted in terms of expression levels of the cell and tissue specific, known, and novel genes can be correlated with a particular metabolic process, developmental stage, treatment, condition, disorder, disease, or stage of disease.
  • the plurality of polynucleotides can also be used to identify or purify a molecule or compound which specifically binds to at least one of the polynucleotides. These molecules may be identified from a sample or in high throughput mode from a large number of molecules and compounds including mRNAs, cDNAs, genomic fragments, and the like. Typically, the molecules or compounds will be of particular diagnostic or therapeutic interest. If nucleic acid molecules in a sample enhance the hybridization background, it may be advantageous to remove the offending molecules. One method for removing such molecules is by hybridizing the sample with immobilized probe polynucleotides and washing away those molecules that do not form hybridization complexes. At a later point, hybridization complexes can be dissociated, thereby releasing those molecules which specifically bind the probe polynucleotides.
  • Method for Selecting Polynucleotide Probes: There are numerous different ways to select polynucleotides. Some of the more common ones include selecting probes from genes which are well known in the literature to have an association with a particular condition, disorder, or disease, which have a common functional characteristic such as the presence of a particular motif or domain or a signal peptide, which are expressed in a particular cell type or tissue such as blood or bone marrow, and the like.
  • the probes are non-redundant; therefore, no more than one probe represents a particular gene. Control sequences, however, may be selected specifically for their redundancy.
  • Polynucleotides of the composition may be manipulated to optimize their performance in hybridization technologies. Polynucleotide selection may be optimized by examining the sequences using a computer algorithm to identify fragments lacking potential secondary structure. Computer algorithms such as those employed in Vector NTI software (Informax, N. Bethesda MD) or LASERGENE software (DNASTAR, Madison WI) are well known in the art. These programs search nucleic acid sequences to identify stem loop structures and tandem repeats and to analyze G+C content of the sequence. In mammalian arrays, those sequences with a G+C content greater than 60% may be excluded. Alternatively, polynucleotides can be optimized under experimental conditions to determine whether polynucleotide probes and their complementary targets hybridize optimally.
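The G+C screen mentioned above is straightforward to sketch. The code below is not the algorithm used by Vector NTI or LASERGENE; it only illustrates the stated rule that sequences with a G+C content greater than 60% may be excluded. The function names and example sequences are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(seq: str, max_gc: float = 0.60) -> bool:
    """True when G+C content is not greater than max_gc (60% by default)."""
    return gc_fraction(seq) <= max_gc


def filter_candidates(candidates):
    """Keep only the candidate probe sequences that pass the G+C screen."""
    return [seq for seq in candidates if passes_gc_filter(seq)]
```

A fuller screen would also look for stem loop structures and tandem repeats, which this sketch leaves out.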
  • the polynucleotides may be compared with clustered or assembled sequences to assure that each polynucleotide is derived from a different gene.
  • the polynucleotide may be physically extended utilizing the partial nucleotide sequences derived from the Incyte clone and employing the XL-PCR kit (Applied Biosystems, Foster City CA) or other means known in the art.
  • Polynucleotide probes can be genomic DNA or cDNA or mRNA, or any RNA-like or DNA-like material, such as peptide nucleic acids, branched DNAs and the like. They may be the sense or antisense strand. Where targets are double stranded, probes may be either sense or antisense strands. Where targets are single stranded, probes are complementary single strands.
  • polynucleotide probes are cDNAs.
  • the size of the cDNAs may vary and is preferably from 15 to 10,000 nucleotides, more preferably from 60 to 4000 nucleotides, and most preferably from 200-600 nucleotides.
  • probes are plasmids.
  • the cDNA sequence of interest is the insert sequence. Excluding the vector DNA and regulatory sequences, cDNA size may vary preferably from 15 to 10,000 nucleotides, more preferably from 60 to 4000 nucleotides, and most preferably from 200-600 nucleotides.
  • Probes can be prepared by a variety of synthetic or enzymatic methods well known in the art. Probes can be synthesized, in whole or in part, using chemical methods well known in the art (Caruthers et al. (1980) Nucleic Acids Symp Ser (7):215-233). Alternatively, probes can be produced enzymatically or recombinantly, by in vitro or in vivo transcription.
  • Nucleotide analogues can be incorporated into the probes by methods well known in the art. The only requirement is that the incorporated nucleotide analogues of the probe must base pair with target nucleotides. For example, certain guanine nucleotides can be substituted with hypoxanthine which base pairs with cytosine residues. However, these base pairs are less stable than those between guanine and cytosine. Alternatively, adenine nucleotides can be substituted with 2,6-diaminopurine which can form stronger base pairs than those between adenine and thymidine.
  • probes can include nucleotides that have been derivatized chemically or enzymatically. Typical chemical modifications include derivatization with acyl, alkyl, aryl or amino groups.
  • Probes can be synthesized on a substrate. Synthesis on the surface of a substrate may be accomplished using a chemical coupling procedure and a piezoelectric printing apparatus as described by Baldeschweiler et al. (PCT/WO95/251116). Alternatively, the probe can be synthesized on a substrate surface using a self-addressable electronic device that controls when reagents are added as described by Heller et al. (USPN 5,605,662).
  • Probes can be immobilized by covalent means such as by chemical bonding procedures or UV.
  • a cDNA is bound to a glass surface which has been modified to contain epoxide or aldehyde groups.
  • a cDNA probe is placed on a polylysine coated surface and then UV cross-linked as described by Shalon et al. (PCT/WO95/35505; incorporated herein by reference).
  • a DNA is actively transported from a solution to a given position on a substrate by electrical means (Heller et al. supra).
  • probes, clones, plasmids or cells can be arranged on a filter.
  • cells are lysed, proteins and cellular components degraded, and the DNA is coupled to the filter by UV cross-linking.
  • probes do not have to be directly bound to the substrate, but rather can be bound to the substrate through a linker group.
  • the linker groups are typically about 6 to 50 atoms long to provide exposure of the attached probe.
  • Preferred linker groups include ethylene glycol oligomers, diamines, diacids and the like.
  • Reactive groups on the substrate surface react with a terminal group of the linker to bind the linker to the substrate. The other terminus of the linker is then bound to the probe.
  • Probes can be attached to a substrate by sequentially dispensing reagents for probe synthesis on the substrate surface or by dispensing preformed DNA fragments to the substrate surface.
  • Typical dispensers include a micropipette delivering solution to the substrate with a robotic system to control the position of the micropipette with respect to the substrate. There can be a multiplicity of dispensers so that reagents can be delivered to the reaction regions efficiently.
  • a sample containing targets is provided.
  • the samples can be any sample containing targets and obtained from any bodily fluid (blood, urine, saliva, phlegm, gastric juices, etc.), cultured cells, biopsies, or other tissue or forensic preparations.
  • DNA or RNA can be isolated from a sample according to any of a number of methods well known to those of skill in the art. For example, methods of purification of nucleic acids are described in Tijssen (1993; Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation. Elsevier Science, New York NY). In one case, total RNA is isolated using TRIZOL reagent (Life Technologies, Gaithersburg MD), and mRNA is isolated using oligo d(T) column chromatography or glass beads.
  • when targets are derived from an mRNA, targets can be a DNA reverse transcribed from an mRNA, an RNA transcribed from that DNA, a DNA amplified from that DNA, an RNA transcribed from the amplified DNA, and the like.
  • when target is derived from DNA, target can be DNA amplified from DNA, or RNA transcribed from DNA.
  • targets are prepared by more than one method.
  • Total mRNA can be amplified by reverse transcription using a reverse transcriptase and a primer consisting of oligo d(T) and a sequence encoding the phage T7 promoter to provide a single stranded DNA template.
  • the second DNA strand is polymerized using a DNA polymerase and an RNAse which assists in breaking up the DNA/RNA hybrid.
  • T7 RNA polymerase can be added, and RNA transcribed from the second DNA strand template as described by Van Gelder et al. (USPN 5,545,522).
  • RNA can be amplified in vitro, in situ or in vivo (Eberwine, USPN 5,514,545). It is also advantageous to include quantitation controls to assure that amplification and labeling procedures do not change the true abundance of transcripts in a sample.
  • a sample is spiked with a known amount of control nucleic acid, and the probes include control probes which specifically hybridize with the control nucleic acid. After hybridization and processing, the hybridization signals should reflect accurately the amounts of control nucleic acid added to the sample. Prior to hybridization, it may be desirable to fragment the nucleic acids of the sample.
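One way to check that control signals "reflect accurately" the spiked amounts is to test whether signal is proportional to amount across the control probes. The sketch below is an assumption about how such a check might be done, not a procedure from the text. The function name, the 20% tolerance, and the example numbers are hypothetical.

```python
def controls_proportional(amounts, signals, tolerance=0.20):
    """True when every signal/amount ratio lies within `tolerance`
    (as a fraction) of the mean ratio across all control probes."""
    if len(amounts) != len(signals) or not amounts:
        raise ValueError("amounts and signals must be non-empty and equal in length")
    if any(a <= 0 for a in amounts):
        raise ValueError("spiked amounts must be positive")

    ratios = [s / a for a, s in zip(amounts, signals)]
    mean_ratio = sum(ratios) / len(ratios)
    if mean_ratio == 0:
        return False  # no signal from any control probe
    return all(abs(r - mean_ratio) / mean_ratio <= tolerance for r in ratios)
```

If the check fails, amplification or labeling may have distorted relative abundance, and signals from the sample should be interpreted with caution.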
  • Fragmentation improves hybridization by minimizing secondary structure and cross-hybridization among the nucleic acids in the sample or with noncomplementary probes. Fragmentation can be performed by mechanical or chemical means.
  • the nucleic acids may be labeled with one or more labeling moieties to allow for detection and quantitation of hybridization complexes.
  • the labeling moieties can include compositions that can be detected by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means.
  • the labeling moieties include radioisotopes, such as ³²P, ³³P or ³⁵S; chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers such as fluorescent markers and dyes; magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like.
  • Exemplary dyes include quinoline dyes, triarylmethane dyes, phthaleins, azo dyes, cyanine dyes, and the like.
  • fluorescent markers absorb light above about 300 nm, more preferably above 400 nm, and usually emit light at wavelengths at least greater than 10 nm above the wavelength of the light absorbed.
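The wavelength criteria for fluorescent markers reduce to two comparisons. The following minimal sketch encodes them, reading "about 300 nm" as a strict threshold of 300 nm, which is an assumption. The function name is hypothetical and the wavelengths in the usage note are illustrative values, not measurements of any particular dye.

```python
def meets_marker_criteria(absorb_nm: float, emit_nm: float,
                          min_absorb_nm: float = 300.0,
                          min_shift_nm: float = 10.0) -> bool:
    """True when the marker absorbs above min_absorb_nm and emits more than
    min_shift_nm above its absorption wavelength."""
    absorbs_high_enough = absorb_nm > min_absorb_nm
    shift_large_enough = (emit_nm - absorb_nm) > min_shift_nm
    return absorbs_high_enough and shift_large_enough
```

Passing `min_absorb_nm=400.0` applies the more preferred absorption threshold; a hypothetical marker absorbing at 490 nm and emitting at 520 nm satisfies both versions.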
  • Preferred fluorescent markers include fluorescein, phycoerythrin, rhodamine, lissamine, and Cy3 and Cy5.
  • Labeling can be carried out during an amplification reaction, such as polymerase chain and in vitro transcription reactions; by nick translation, or by 5' or 3'-end-labeling reactions.
  • labeled nucleotides are used in an in vitro transcription reaction.
  • when the label is incorporated after or without an amplification step, the label is incorporated either by using a terminal transferase or a kinase on the 5' end of the target polynucleotide and then incubating overnight with a labeled oligonucleotide in the presence of T4 RNA ligase.
  • the labeling moiety can be incorporated after hybridization once a probe/target complex has formed.
  • biotin is first incorporated during an amplification step as described above. After the hybridization reaction, unbound nucleic acids are rinsed away so that the only biotin remaining bound to the substrate is that attached to targets that are hybridized to probes. Then, an avidin-conjugated fluorophore, such as avidin-phycoerythrin, that binds with high affinity to biotin is added.
  • the labeling moiety is incorporated by intercalation into preformed target/probe complexes. In this case, an intercalating dye such as a psoralen-linked dye can be employed.
  • Probes or polynucleotides may be used to screen a library of molecules or compounds for specific binding affinity.
  • the libraries may be DNA molecules, RNA molecules, PNAs, peptides, proteins such as transcription factors, enhancers, repressors, and other organic or inorganic ligands which regulate activities such as replication, transcription, or translation of polynucleotides in the biological system.
  • the assay involves combining the probe with the library of molecules or compounds under conditions that allow specific binding, and detecting specific binding to a ligand which specifically binds the probe.
  • a protein or a portion thereof transcribed and translated from a probe may be used to screen Ubraries of molecules or compounds in any of a variety of screening assays.
  • the protein or portion thereof may be free in solution, affixed to an abiotic or biotic substrate, borne on a cell surface, or located intracellularly. Specific binding between the protein and a Ugand may be measured.
  • the assay may be used to identify DNA, RNA, or PNAs, agonists, antagonists, antibodies, immunoglobulins, inhibitors, mimetics, peptides, proteins, drugs, or any other Ugand, that specifically binds the protein.
  • Purification of Ligand Probes may be used to purify a Ugand from a sample. A method for using a probe to purify a
  • Ugand would involve combining the probe with a sample under conditions to allow specific binding, detecting specific binding, recovering the bound protein, and using an appropriate agent to separate the polynucleotide from the purified ligand.
  • the encoded protein or a portion thereof may be used to purify a ligand from a sample.
  • a method for using a protein or a portion thereof to purify a ligand would involve combining the protein or a portion thereof with a sample under conditions to allow specific binding, detecting specific binding between the protein and Ugand, recovering the bound protein, and using an appropriate agent to separate the protein from the purified Ugand.
  • Hybridization and Detection: Hybridization causes a denatured polynucleotide probe and a denatured complementary target to form a stable duplex through base pairing. Hybridization methods are well known to those skilled in the art.
  • Conditions can be selected for hybridization where completely complementary probe and target can hybridize, i.e., each base pair must interact with its complementary base pair. Alternatively, conditions can be selected where probe and target have mismatches of up to about 10% but are still able to hybridize. Suitable conditions can be selected by varying the concentrations of salt in the prehybridization, hybridization, and wash solutions or by varying the hybridization and wash temperatures. With some substrates, temperature can be decreased by adding formamide to the prehybridization and hybridization solutions.
  • Hybridization can be performed at low stringency with buffers, such as 5xSSC with 1% sodium dodecyl sulfate (SDS) at 60°C, which permits hybridization between probe and target sequences that contain some mismatches to form probe/target complexes. Subsequent washes are performed at higher stringency with buffers such as 0.2xSSC with 0.1% SDS at either 45°C (medium stringency) or 68°C (high stringency), to maintain hybridization of only those probe/target complexes that contain completely complementary sequences. Background signals can be reduced by the use of detergents such as SDS, Sarcosyl, or TRITON X-100 (Sigma-Aldrich, St. Louis MO) or a blocking agent, such as salmon sperm DNA.
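The buffer strengths quoted above can be compared by converting them to salt molarity. The following is a minimal sketch only; it assumes the standard composition of 1x SSC (0.15 M sodium chloride and 0.015 M sodium citrate), which the text does not state, and the names `STRINGENCY_CONDITIONS` and `ssc_molarity` are illustrative.

```python
# Assumption: 1x SSC is 0.15 M NaCl and 0.015 M sodium citrate (standard recipe,
# not stated in the text above).
NACL_M_PER_X = 0.15
CITRATE_M_PER_X = 0.015

# Conditions quoted in the text: (label, SSC strength, percent SDS, temperature in deg C).
STRINGENCY_CONDITIONS = [
    ("hybridization, low stringency", 5.0, 1.0, 60),
    ("wash, medium stringency", 0.2, 0.1, 45),
    ("wash, high stringency", 0.2, 0.1, 68),
]


def ssc_molarity(strength):
    """Return (NaCl, sodium citrate) molarity for an SSC buffer of the given strength."""
    return (strength * NACL_M_PER_X, strength * CITRATE_M_PER_X)


if __name__ == "__main__":
    for label, strength, sds, temp_c in STRINGENCY_CONDITIONS:
        nacl, citrate = ssc_molarity(strength)
        print(f"{label}: {strength}x SSC = {nacl:.3f} M NaCl, "
              f"{citrate:.4f} M citrate, {sds}% SDS, {temp_c} deg C")
```

Lower salt and higher temperature both raise stringency, which is why the washes use a more dilute buffer than the hybridization step.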
  • Hybridization specificity can be evaluated by comparing the hybridization of control probe to target sequences that are added to a sample in a known amount.
  • The control probe may have one or more sequence mismatches compared with the corresponding target. In this manner, it is possible to evaluate whether only complementary probes are hybridizing to the targets or whether mismatched hybrid duplexes are forming.
  • Hybridization reactions can be performed in absolute or differential hybridization formats.
  • In an absolute hybridization format, probes from one sample are hybridized to microarray probes, and signals are detected after hybridization complexes form. Signal strength correlates with probe levels in a sample.
  • In a differential hybridization format, differential expression of a set of genes in two biological samples is analyzed. Probes from the two samples are prepared and labeled with different labeling moieties. A mixture of the two labeled targets is hybridized to the microarray probes, and signals are examined under conditions in which the emissions from the two different labels are individually detectable. Targets in the microarray that are hybridized to substantially equal numbers of probes derived from both biological samples give a distinct combined fluorescence (Shalon,
  • The labels are fluorescent labels with distinguishable emission spectra, such as a lissamine conjugated nucleotide analog and a fluorescein conjugated nucleotide analog.
  • Cy3 and Cy5 fluorophores are employed. After hybridization, the microarray is washed to remove nonhybridized polynucleotides, and complex formation between the hybridizable array probes and the targets is examined. Methods for detecting complex formation are well known to those skilled in the art.
  • The probes are labeled with a fluorescent label, and measurement of levels and patterns of fluorescence indicative of complex formation is accomplished by fluorescence microscopy, preferably confocal fluorescence microscopy.
  • An argon ion laser excites the fluorescent label, emissions are directed to a photomultiplier, and the amount of emitted light is detected and quantitated.
  • The detected signal should be proportional to the amount of probe/target complexes at each position of the microarray.
  • The fluorescence microscope can be associated with a computer-driven scanner device to generate a quantitative two-dimensional image of hybridization intensity. The scanned image is examined to determine the abundance/expression level of hybridized probe.
  • Microarray fluorescence intensities can be normalized to take into account variations in hybridization intensities when more than one microarray is used under similar test conditions.
  • Individual polynucleotide probe/target complex hybridization intensities are normalized using the intensities derived from internal normalization controls contained on each microarray.
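Normalization against internal controls can be sketched as a simple rescaling. The example below is a minimal illustration, not the procedure actually used: it assumes that each array is scaled so that the mean intensity of its control elements equals a common reference level, and the function name `normalize_array`, the element identifiers, and the intensity values are all hypothetical.

```python
from statistics import mean


def normalize_array(intensities, control_ids, reference_level=1.0):
    """Scale one microarray so its internal controls average `reference_level`.

    intensities: dict mapping element id -> raw hybridization intensity.
    control_ids: ids of the internal normalization controls on this array.
    """
    control_mean = mean(intensities[c] for c in control_ids)
    if control_mean <= 0:
        raise ValueError("control intensities must have a positive mean")
    factor = reference_level / control_mean
    return {element: value * factor for element, value in intensities.items()}


if __name__ == "__main__":
    # Two hypothetical arrays of the same sample, one scanned twice as bright.
    array_a = {"ctrl1": 100.0, "ctrl2": 300.0, "geneA": 400.0}
    array_b = {"ctrl1": 50.0, "ctrl2": 150.0, "geneA": 200.0}
    controls = ["ctrl1", "ctrl2"]
    print(normalize_array(array_a, controls)["geneA"])
    print(normalize_array(array_b, controls)["geneA"])
```

After rescaling, the same element measured on two arrays of different overall brightness yields comparable values.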
  • This section describes an expression profile using the polynucleotides of this invention.
  • The reference set can be used as part of an expression profile which detects changes in the expression of novel genes whose transcripts are modulated in a particular metabolic response, treatment, condition, disorder, or disease. These genes will include genes whose altered expression is correlated with a cancer, an immunopathology, a neuropathology, and the like.
  • The expression profile comprises a plurality of detectable hybridization complexes. Each complex is formed by hybridization of one or more probes to one or more complementary targets. At least one of the probes, preferably a plurality of probes, is hybridized to a complementary target, forming at least one, and preferably a plurality, of complexes. A complex is detected by incorporating at least one labeling moiety.
  • The expression profiles provide "snapshots" that can show unique expression patterns that are characteristic of a metabolic process, treatment, condition, disorder or disease.
  • After performing hybridization experiments and detecting signals from a microarray, particular probes can be identified and selected based on their expression patterns. Such probes can be used to clone a full length sequence for the gene, to screen a library for a closely related homolog, to screen for or purify ligands, or to produce a protein.
  • The plurality of polynucleotides can be used as hybridizable elements in a microarray.
  • A microarray can be employed in several applications including diagnostics, prognostics and treatment regimens, and drug discovery and development for conditions, disorders, and diseases such as cancer, an immunopathology, a neuropathology and the like.
  • The microarray is used to monitor the progression of disease.
  • The differences in gene expression between healthy and diseased tissues or cells can be assessed and cataloged.
  • Disease can be diagnosed at earlier stages before the patient is symptomatic.
  • The invention can be used to formulate a prognosis and to design a treatment regimen.
  • The invention can also be used to monitor the efficacy of treatment.
  • The microarray is employed to "fine tune" the treatment regimen. A dosage is established that causes a change in genetic expression patterns indicative of successful treatment. Expression patterns associated with the onset of undesirable side effects are avoided. This approach may be more sensitive and rapid than waiting for the patient to show inadequate improvement, or to manifest side effects, before altering the course of treatment.
  • Animal models which mimic a human disease can be used to characterize expression profiles associated with a particular condition, disorder or disease or the treatment of the condition, disorder or disease.
  • Experimental treatment regimens may be tested in these animal models using microarrays to establish and then follow expression profiles over time.
  • Microarrays may be used with cell cultures or tissues removed from animal models to rapidly screen large numbers of candidate drug molecules, looking for ones that produce an expression profile similar to those of known therapeutic drugs, with the expectation that molecules with the same expression profile will likely have similar therapeutic effects.
  • The invention provides the means to rapidly determine the molecular mode of action of a drug.
  • Embryonic stem (ES) cells isolated from rodent or human embryos retain the potential to form embryonic tissues.
  • When ES cells such as the mouse 129/SvJ cell line are placed in a blastocyst from the C57BL/6 mouse strain, they resume normal development and contribute to tissues of the live-born animal.
  • ES cells are preferred for use in the creation of experimental knockout and knockin animals.
  • The method for this process is well known in the art and the steps are: the cDNA is introduced into a vector, the vector is transformed into ES cells, transformed cells are identified and microinjected into mouse cell blastocysts, and blastocysts are surgically transferred to pseudopregnant dams.
  • ES cells are also used for the treatment of victims of Parkinson's disease, stroke, and other neuropathologies (The Engineer, 14(18):1ff; September 2000). Pharmaceutical companies are also targeting disorders of the liver, kidney, and pancreas, specifically alpha-1 antitrypsin, polycystic kidney disease, and diabetes, respectively.
  • Traumatic damage to the nervous system and internal organs may also be treated by transplantation of cells or organs which are differentiated from embryonic stem cells.
  • The present invention may be used to characterize the developmental pathways of the differentiation processes that give rise to brain, heart, kidney, liver, lung, muscle, ovarian, pancreatic, small intestine, stomach, or uterine tissues.
  • Knockout Analysis: A region of a gene is enzymatically modified to include a non-natural intervening sequence such as the neomycin phosphotransferase gene (neo; Capecchi (1989) Science 244:1288-1292).
  • The modified gene is transformed into cultured ES cells and integrates into the endogenous genome by homologous recombination.
  • The inserted sequence disrupts transcription and translation of the endogenous gene.
  • ES cells can be used to create knockin humanized animals or transgenic animal models of human diseases.
  • In knockin technology, a region of a human gene is injected into animal ES cells, and the human sequence integrates into the animal cell genome.
  • Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on the progression and treatment of the analogous human condition.
  • As described herein, the uses of the cDNAs, provided in the Sequence Listing of this application, and their encoded proteins are exemplary of known techniques and are not intended to reflect any limitation on their use in any technique that would be known to the person of average skill in the art.
  • The cDNAs provided in this application may be used in molecular biology techniques that have not yet been developed, provided the new techniques rely on properties of nucleotide sequences that are currently known to the person of ordinary skill in the art, e.g., the triplet genetic code, specific base pair interactions, and the like.
  • Reference to a method may include combining more than one method for obtaining, assembling or expressing cDNAs that will be known to those skilled in the art.
  • The BRAINON01 normalized cDNA library was constructed from cancerous brain tissue obtained from a 26-year-old Caucasian male during cerebral meningeal excision following diagnosis of grade 4 oligoastrocytoma localized in the right fronto-parietal part of the brain.
  • The tumor had been irradiated (5800 rads).
  • Patient history included hemiplegia, epilepsy, ptosis of eyelid, and common migraine, and medications included Dilantin® (Parke-Davis, Morris Plains NJ).
  • The frozen tissue was homogenized and lysed using a POLYTRON homogenizer (PT-3000; Brinkmann Instruments, Westbury NY) in guanidinium isothiocyanate solution.
  • The lysate was extracted with acid phenol, pH 4.7, per Stratagene RNA isolation protocol (Stratagene, San Diego CA).
  • The RNA was extracted with an equal volume of acid phenol, reprecipitated using 0.3 M sodium acetate and 2.5 volumes of ethanol, resuspended in DEPC-treated water, and treated with DNase for 25 min at 37°C.
  • The RNA extraction was repeated with phenol, pH 8.7, and precipitated with sodium acetate and ethanol as before.
  • The mRNA was isolated with the OLIGOTEX kit (Qiagen, Chatsworth CA) and used to construct the cDNA library.
  • The mRNA was handled according to the recommended protocols in the SUPERSCRIPT plasmid system (Life Technologies).
  • cDNAs were fractionated on a SEPHAROSE CL4B column (Amersham Pharmacia Biotech), and those cDNAs exceeding 400 bp were ligated into PSPORT I plasmid (Life Technologies).
  • The plasmid was transformed into DH5 competent cells (Life Technologies) to construct the BRAINOT03 library.
  • The library was normalized in a single round according to the procedure of Soares et al. (1994, Proc Natl Acad Sci 91:9928-9932) with the following modifications: 1) the primer to template ratio in the primer extension reaction was increased from 2:1 to 10:1, 2) the ddNTP concentration was reduced to 150 μM to allow generation of longer (400-1000 nt) primer extension products, and 3) the reannealing hybridization was extended from 13 to 48 hours.
  • Plasmid DNA was released from bacterial cells and purified using the REAL Prep 96 plasmid kit (Qiagen). This kit enabled the simultaneous purification of 96 samples in a 96-well block using multi-channel reagent dispensers. The recommended protocol was employed except for the following changes: 1) the bacteria were cultured in 1 ml of sterile TERRIFIC BROTH (BD Biosciences, Sparks MD) with carbenicillin at 25 mg/L and glycerol at 0.4%; 2) the cultures were inoculated, incubated for 19 hours, and then lysed with 0.3 ml of lysis buffer; and 3) following isopropanol precipitation, the plasmid DNA pellet was resuspended in 0.1 ml of distilled water.
  • The cDNAs were prepared using a MICROLAB 2200 system (Hamilton, Reno NV) in combination with DNA ENGINE thermal cyclers (PTC200; MJ Research, Waltham MA).
  • The cDNAs were sequenced by the method of Sanger and Coulson (1975; J Mol Biol 94:441f) using ABI PRISM 377 DNA sequencing systems (Applied Biosystems). Most of the sequences were sequenced using standard ABI protocols and kits (Applied Biosystems) at solution volumes of 0.25x - 1.0x. In the alternative, some of the sequences were sequenced using solutions and dyes from Amersham Pharmacia Biotech.
  • Incyte clones were mapped to non-redundant Unigene clusters (Unigene database (build 46), NCBI; Shuler (1997) J Mol Med 75:694-698), and the 5' clone with the strongest BLAST alignment (at least 90% identity and 100 bp overlap) was chosen, verified, and used in the construction of the microarray.
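The clone selection rule above (at least 90% identity and 100 bp overlap, then the strongest alignment) can be sketched as a filter followed by a maximum. This is a hedged illustration only: it assumes that "strongest" means the highest alignment score, it does not model the 5' position of the clone, and the function name and dictionary keys are hypothetical.

```python
def pick_representative_clone(alignments, min_identity=90.0, min_overlap=100):
    """Return the id of the qualifying clone with the highest score, or None.

    alignments: list of dicts with the hypothetical keys
    'clone_id', 'percent_identity', 'overlap_bp', and 'score'.
    """
    qualifying = [
        a for a in alignments
        if a["percent_identity"] >= min_identity and a["overlap_bp"] >= min_overlap
    ]
    if not qualifying:
        return None
    # Assumption: the "strongest" alignment is the one with the highest score.
    return max(qualifying, key=lambda a: a["score"])["clone_id"]


if __name__ == "__main__":
    hits = [
        {"clone_id": "c1", "percent_identity": 95.0, "overlap_bp": 80, "score": 900},
        {"clone_id": "c2", "percent_identity": 92.0, "overlap_bp": 150, "score": 600},
        {"clone_id": "c3", "percent_identity": 99.0, "overlap_bp": 120, "score": 550},
    ]
    print(pick_representative_clone(hits))
```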
  • The UNIGEM V microarray (Incyte Genomics) contains 7075 array elements which represent 4610 annotated genes and 2184 unannotated clusters. Table 1 shows the GenBank 119 annotations for SEQ ID NOs:1-416 of this invention as produced by BLAST analysis.
  • BLAST involves finding similar segments between the query sequence and a database sequence, evaluating the statistical significance of any similarities, and reporting only those matches that satisfy a user-selectable threshold of significance. BLAST produces alignments of both nucleotide and amino acid sequences to determine sequence similarity.
  • HSP: High-scoring Segment Pair.
  • The basis of the search is the product score, which is defined as:
  • The product score takes into account both the degree of identity between two sequences and the length of the sequence match as reflected in the BLAST score.
  • The BLAST score is calculated by scoring +5 for every base that matches in an HSP and -4 for every mismatch. For a product score of
  • the match will be exact within a 1% to 2% error and for a product score of 70, the match will be exact.
  • Homologous molecules usually show product scores between 15 and 40, although lower scores may identify related molecules.
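The +5/-4 scoring rule stated above can be illustrated for a single ungapped HSP. The sketch below is minimal and covers only the raw BLAST score; it does not compute the product score, because its definition is not reproduced in this section. The function name `hsp_score` is an assumption made for illustration.

```python
MATCH_SCORE = 5      # +5 for every matching base, as stated in the text
MISMATCH_SCORE = -4  # -4 for every mismatch, as stated in the text


def hsp_score(query, subject):
    """Score an ungapped, equal-length aligned segment pair base by base."""
    if len(query) != len(subject):
        raise ValueError("aligned segments must have the same length")
    return sum(
        MATCH_SCORE if q == s else MISMATCH_SCORE
        for q, s in zip(query.upper(), subject.upper())
    )


if __name__ == "__main__":
    # Seven matches and one mismatch over eight aligned bases.
    print(hsp_score("ACGTACGT", "ACGTACGA"))
```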
  • The P-value for any given HSP is a function of its expected frequency of occurrence and the number of HSPs observed against the same database sequence with scores at least as high.
  • Percent sequence identity is found in a comparison of two or more amino acid or nucleic acid sequences. Percent identity can be determined electronically using the MEGALIGN program, a component of LASERGENE software (DNASTAR). The percent similarity between two amino acid sequences is calculated by dividing the length of sequence A, minus the number of gap residues in sequence A, minus the number of gap residues in sequence B, into the sum of the residue matches between sequence A and sequence B, times one hundred. Gaps of low or of no homology between the two amino acid sequences are not included in determining percentage similarity.
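The percent similarity calculation described above reduces to a single division. The following sketch simply restates that sentence as arithmetic; the function name `percent_similarity` and its argument names are illustrative, and the example counts are invented.

```python
def percent_similarity(matches, length_a, gaps_a, gaps_b):
    """Percent similarity as described in the text.

    matches:  sum of residue matches between sequence A and sequence B.
    length_a: length of sequence A.
    gaps_a:   number of gap residues in sequence A.
    gaps_b:   number of gap residues in sequence B.
    """
    denominator = length_a - gaps_a - gaps_b
    if denominator <= 0:
        raise ValueError("length of A minus gap residues must be positive")
    return 100.0 * matches / denominator


if __name__ == "__main__":
    # Invented example: 45 matches, sequence A of length 60, 5 gap residues in each sequence.
    print(percent_similarity(45, 60, 5, 5))
```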
  • Sequences with conserved protein motifs may be searched using the BLOCKS search program.
  • This program analyses sequence information contained in the Swiss-Prot and PROSITE databases and is useful for determining the classification of uncharacterized proteins translated from genomic or cDNA sequences (Bairoch et al. (1997) Nucleic Acids Res 25:217-221; Attwood et al. (1997) J Chem Inf Comput Sci 37:417-424).
  • The PROSITE database is a useful source for identifying functional or structural domains that are not detected using motifs due to extreme sequence divergence. Using weight matrices, these domains are calibrated against the SWISS-PROT database to obtain a measure of the chance distribution of the matches.
  • The PRINTS database can be searched using the BLIMPS search program to obtain protein family "fingerprints".
  • The PRINTS database complements the PROSITE database by exploiting groups of conserved motifs within sequence alignments to build characteristic signatures of different protein families.
  • Nucleic acid sequences of the Sequence Listing designated F, R, or T were produced by extension of an appropriate fragment of the original clone insert using oligonucleotide primers designed from this fragment.
  • One primer was synthesized to initiate 5' extension of the known sequence, and the other primer, to initiate 3' extension of the known sequence.
  • The initial primers were designed using OLIGO software (Molecular Insights, Cascade CO), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68°C to about 72°C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations was avoided.
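Two of the primer criteria above, length and GC content, are easy to check programmatically. The sketch below is an illustration under stated assumptions and is not the OLIGO software: it treats the "about" ranges as hard limits, and it does not evaluate annealing temperature, hairpins, or primer-primer dimers. The function names are hypothetical.

```python
MIN_LENGTH, MAX_LENGTH = 22, 30  # "about 22 to 30 nucleotides", treated as hard limits
MIN_GC_FRACTION = 0.50           # "GC content of about 50% or more"


def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def passes_basic_checks(primer):
    """True if the primer meets the length and GC criteria only."""
    return (MIN_LENGTH <= len(primer) <= MAX_LENGTH
            and gc_fraction(primer) >= MIN_GC_FRACTION)


if __name__ == "__main__":
    candidate = "ATGCGCTAGCCGATCGGCTAGCAT"  # invented 24-mer for illustration
    print(len(candidate), round(gc_fraction(candidate), 2), passes_basic_checks(candidate))
```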
  • Selected human cDNA libraries were used to extend the sequence. If more than one extension was necessary or desired, additional or nested sets of primers were designed.
  • PCR was performed in 96-well plates using the DNA ENGINE thermal cycler (MJ Research).
  • The reaction mix contained DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
  • The parameters for primer pair T7 and SK+ were as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
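The two cycling programs above differ only in annealing temperature, which a small data structure makes explicit. This is a sketch under one stated assumption: "Steps 2, 3, and 4 repeated 20 times" is read as one initial pass plus 20 repeats, that is 21 passes in total. Ramp times between temperatures are ignored, and all names are illustrative.

```python
def make_program(anneal_temp_c, repeats=20):
    """Build a cycling program; each step is (temperature in deg C, hold time in seconds)."""
    return {
        "initial_denaturation": (94, 180),                         # Step 1
        "cycle": [(94, 15), (anneal_temp_c, 60), (68, 120)],       # Steps 2 to 4
        "repeats": repeats,                                        # Step 5
        "final_extension": (68, 300),                              # Step 6
        "storage_temp_c": 4,                                       # Step 7
    }


PROGRAM_PCI = make_program(anneal_temp_c=60)    # primer pair PCI A and PCI B
PROGRAM_T7_SK = make_program(anneal_temp_c=57)  # primer pair T7 and SK+


def total_hold_seconds(program):
    """Sum of programmed hold times, excluding ramping and the final storage step."""
    # Assumption: the cycle runs once and is then repeated `repeats` times.
    passes = 1 + program["repeats"]
    cycle_seconds = sum(seconds for _, seconds in program["cycle"])
    return (program["initial_denaturation"][1]
            + passes * cycle_seconds
            + program["final_extension"][1])


if __name__ == "__main__":
    for name, program in (("PCI A/PCI B", PROGRAM_PCI), ("T7/SK+", PROGRAM_T7_SK)):
        print(name, total_hold_seconds(program) / 60, "min")
```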
  • The concentration of DNA in each well was determined by dispensing 100 μl PICOGREEN reagent (0.25% v/v PICOGREEN (Molecular Probes, Eugene OR) dissolved in 1x TE) and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar, Acton MA), allowing the DNA to bind to the reagent.
  • The plate was scanned in a Fluoroskan II (Labsystems Oy, Helsinki FI) to measure the fluorescence of the sample and to quantify the concentration of DNA.
  • A 5 μl to 10 μl aliquot of the reaction mixture was analyzed by electrophoresis on a 1% agarose minigel to determine which reactions were successful in extending the sequence.
  • The extended nucleotides were desalted and concentrated, transferred to 384-well plates, digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC18 vector (Amersham Pharmacia Biotech).
  • The digested nucleotides were separated on 0.6% to 0.8% agarose gels, fragments were excised, and agar digested with AGARACE (Promega).
  • Extended clones were religated using T4 ligase (New England Biolabs, Beverly MA) into pUC18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells were selected on antibiotic-containing media, and individual colonies were picked and cultured overnight at 37°C in 384-well plates in LB/2x carbenicillin liquid media.
  • The cells were lysed, and DNA was amplified using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA was quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries were reamplified using the conditions described above.
  • mRNA for Target Polynucleotides: The mRNAs or tissues for preparing target polynucleotides were obtained from Biochain
  • Probe polynucleotides were amplified from bacterial vectors by thirty cycles of PCR using primers complementary to vector sequences flanking the insert and purified using SEPHACRYL-400 beads (Amersham Pharmacia Biotech). Purified polynucleotides were robotically arrayed onto a glass microscope slide (Corning Science Products, Corning NY) previously coated with 0.05% aminopropyl silane (Sigma-Aldrich) and cured at 110°C. The microarray was exposed to UV irradiation in a STRATALINKER UV-crosslinker (Stratagene).
  • mRNA sample shown in Table 2, was reverse transcribed using MMLV reverse transcriptase in the presence of dCTP-Cy3 or dCTP-Cy5 (Amersham Pharmacia Biotech) according to standard protocol. After incubation at 37°C, the reaction was stopped with 0.5 M sodium hydroxide, and RNA was degraded at 85°C.
  • The target polynucleotides were purified using CHROMASPIN 30 columns (Clontech, Palo Alto CA) and ethanol precipitation.
  • The hybridization mixture, containing 0.2 mg of each of the Cy3 and Cy5 labeled target polynucleotides, was heated to 65°C and dispensed onto the UNIGEM V microarray (Incyte Genomics) surface.
  • The microarray was covered with a coverslip and incubated at 60°C.
  • The microarrays were sequentially washed at 45°C in moderate stringency buffer (1xSSC and 0.1% SDS) and high stringency buffer (0.1xSSC) and dried.
  • A confocal laser microscope was used to detect the fluorescence-labeled hybridization complexes. Excitation wavelengths were 488 nm for Cy3 and 632 nm for Cy5. Each array was scanned twice, one scan per fluorophore. The emission maxima were 565 nm for Cy3 and 650 nm for Cy5. The emitted light was split into two photomultiplier tube detectors based on wavelength. The output of the photomultiplier tube was digitized and displayed as an image, where the signal intensity was represented using a linear 20-color transformation, with red representing a high signal and blue a low signal. The fluorescence signal for each probe was integrated to obtain a numerical value corresponding to the signal intensity using GEMTOOLS expression analysis software (Incyte Genomics).
  • Variance within (Vw) categories.
  • The ratio of Vb divided by Vw (F ratio) was compared to the F distribution for a population of equal degrees of freedom (DF), and the probability of the F ratio was returned.
  • Vbetween ⁇ Vwithin Vwithin
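The F ratio described above can be sketched as a one-way analysis of variance. This is a minimal illustration, assuming that Vb and Vw are the usual between-category and within-category mean squares, which the text does not spell out. The function name `f_ratio`, the input layout (one list of expression values per tissue category), and the example values are hypothetical.

```python
from statistics import mean


def f_ratio(groups):
    """Return Vb / Vw for a list of groups, each a list of numeric values."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = mean(v for g in groups for v in g)
    # Between-category sum of squares, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-category sum of squares around each group's own mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    v_between = ss_between / (k - 1)        # Vb, with k - 1 degrees of freedom
    v_within = ss_within / (n_total - k)    # Vw, with n_total - k degrees of freedom
    if v_within == 0:
        raise ValueError("within-category variance is zero; F ratio is undefined")
    return v_between / v_within


if __name__ == "__main__":
    # Invented expression values for one gene in three tissue categories.
    print(f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

Converting the ratio to a probability requires the F distribution with (k - 1, n_total - k) degrees of freedom; a statistics library such as SciPy (`scipy.stats.f.sf`) could supply it, which is left out here to keep the sketch dependency-free.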
  • Genes were associated with a primary tissue category according to the highest differential average value.
  • A minimum differential average value of 1.5 was required to associate a gene with a tissue category.
  • Genes were associated with a secondary, tertiary, and even quaternary tissue category according to the second, third, and fourth highest differential average values, respectively.
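The ranking rule above can be sketched as a sort with a cutoff. The example assumes that the 1.5 minimum applies at every rank (primary through quaternary) and that the differential average values have already been computed, since their calculation is not given in this section. The function name and the example values are hypothetical.

```python
MIN_DIFFERENTIAL_AVERAGE = 1.5  # minimum value stated in the text
MAX_CATEGORIES = 4              # primary, secondary, tertiary, quaternary


def assign_tissue_categories(differential_averages):
    """Return tissues ordered from primary to quaternary association.

    differential_averages: dict mapping tissue name -> differential average value.
    """
    ranked = sorted(differential_averages.items(),
                    key=lambda item: item[1], reverse=True)
    # Assumption: the 1.5 minimum is required at every rank, not only the first.
    kept = [tissue for tissue, value in ranked if value >= MIN_DIFFERENTIAL_AVERAGE]
    return kept[:MAX_CATEGORIES]


if __name__ == "__main__":
    example = {"brain": 3.2, "heart": 1.7, "liver": 0.9, "kidney": 1.4}
    print(assign_tissue_categories(example))
```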
  • The polynucleotide or fragments thereof and the protein or portions thereof are labeled with 32P-dCTP, Cy3-dCTP, Cy5-dCTP (Amersham Pharmacia Biotech), or BODIPY or FITC (Molecular Probes), respectively.
  • Candidate molecules or compounds previously arranged on a substrate are incubated in the presence of labeled nucleic or amino acid. After incubation under conditions for either a polynucleotide or protein, the substrate is washed, and any position on the substrate retaining label, which indicates specific binding or complex formation, is assayed. The binding molecule is identified by its arrayed position on the substrate.

Abstract

The present invention relates to a composition comprising a plurality of polynucleotides which are cell and/or tissue specific and which may be used in their entirety or in part as references in producing an expression profile that defines a metabolic or developmental process, treatment, condition, disease, or disorder.

Description

TISSUE SPECIFIC GENES OF DIAGNOSTIC IMPORT
TECHNICAL FIELD
The present invention relates to a composition comprising a plurality of polynucleotides which are cell and/or tissue specific. These polynucleotides may be used to define and direct a metabolic or developmental process, to identify or to monitor the progression of a condition, disease, or disorder, or to evaluate and monitor the efficacy of a treatment protocol.
BACKGROUND ART
Array technology can provide a simple way to explore the expression of a single polymorphic gene or the expression profile of a large number of related or unrelated genes. When the expression of a single gene is examined, arrays are employed to detect the expression of a specific gene or its variants. When an expression profile is examined, arrays provide a platform for examining which genes are tissue specific, direct the differentiation of a cell type or tissue, carry out housekeeping functions, function as parts of a signaling cascade, or characterize a particular genetic predisposition, condition, disease, or disorder.
The application of gene expression profiling is particularly relevant to improving diagnosis and prognosis of disease. However, in order to determine whether expression of a particular gene in a particular disease is significant, it is useful to provide a reference set of tissue and cell specific genes against which genes expressed during the disease process may be compared. For example, both the levels and sequences expressed in brain tumors may be compared with the levels and sequences expressed in normal brain tissue. These comparisons may be made on a single array by incorporating a particular tissue or cell specific reference set alongside novel sequences or on multiple arrays, each of which contains at least some subset of the known reference set.
The present invention satisfies a need in the art in that it provides such a reference set. The reference set may be used in its entirety or in part to produce an expression profile that may be used to define and direct a metabolic or developmental process, to identify or to monitor the progression of a condition, disease, or disorder, or to evaluate and monitor the efficacy of a treatment protocol.
SUMMARY
The present invention provides a plurality of tissue or cell specific polynucleotides which may be used on an array to produce an expression profile. This profile may define expression of the polynucleotides in normal tissue, during a particular metabolic or developmental process or during the onset, progression, or treatment of a human condition, disease, or disorder. In one embodiment, these polynucleotides are selected from SEQ ID NOs:1-416.
The invention also provides a plurality of polynucleotides which display tissue or cell specific expression and are selected from: a) SEQ ID NOs:209-218 and 1-10, cell specific polynucleotides of heart and fragments thereof; b) SEQ ID NOs:219-249 and 11-41, cell specific polynucleotides of skeletal muscle and fragments thereof; c) SEQ ID NOs:250-251 and 42-43, cell specific polynucleotides of uterus and fragments thereof; d) SEQ ID NOs:252-256 and 44-48, cell specific polynucleotides of ovary and fragments thereof; e) SEQ ID NOs:257-263 and 49-55, cell specific polynucleotides of stomach and fragments thereof; f) SEQ ID NOs:264-283 and 56-75, cell specific polynucleotides of intestine and fragments thereof; g) SEQ ID NOs:284-293 and 76-85, cell specific polynucleotides of lung and fragments thereof; h) SEQ ID NOs:294-345 and 86-137, cell specific polynucleotides of liver and fragments thereof; i) SEQ ID NOs:346-356 and 138-148, cell specific polynucleotides of kidney and fragments thereof; j) SEQ ID NOs:357-374 and 149-166, cell specific polynucleotides of pancreas and fragments thereof; and k) SEQ ID NOs:375-416 and 167-208, cell specific polynucleotides of brain and fragments thereof. In one aspect, the plurality of polynucleotides are immobilized on a substrate.
In another embodiment, the expression of a plurality of polynucleotides is used to detect expression in a tissue. In one aspect, the tissue is embryonic stem cells which are differentiating into brain, heart, kidney, liver, lung, muscle or pancreatic tissues. In a second aspect, the tissue is a biopsy from diseased brain, heart, kidney, liver, lung, muscle, ovarian, pancreatic, small intestine, stomach, or uterine tissues which is being diagnosed for a cancer or immune or inflammatory disease or subjected to forensic analysis. In a third aspect, the point of origin of a metastatic cancer is determined.
In another embodiment, the polynucleotides are used in high throughput methods of screening molecules or compounds to identify a ligand, the method comprising combining a polynucleotide with molecules or compounds under conditions to allow specific binding and detecting specific binding, thereby identifying a ligand which specifically binds to the polynucleotide. The molecules or compounds to be screened are selected from DNA molecules, RNA molecules, PNAs, mimetics, peptides, and proteins.
In another embodiment, the invention provides a substantially purified polynucleotide selected from SEQ ID NOs:212, 228, 233, 259, 271, 287, 316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof, SEQ ID NOs:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and 204. In one aspect, the polynucleotide selected from SEQ ID NOs:212, 228, 233, 259, 271, 287, 316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof, SEQ ID NOs:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and 204 is used in an expression vector transformed into a host cell to produce a protein or a portion thereof by culturing the host cell under conditions for the expression of protein and recovering the protein from the host cell culture.
In a third embodiment, the invention provides a protein or a portion thereof. In one aspect, the protein is used in a high throughput method to screen large numbers of molecules or compounds to identify at least one ligand which specifically binds the protein, the method comprising combining the protein with the molecules or compounds under conditions to allow specific binding and detecting specific binding, thereby identifying a ligand which specifically binds the protein. In a second aspect, the protein is used to purify a ligand, the method comprising combining the protein with a sample under conditions to allow specific binding, recovering the bound protein, and separating the protein from the ligand, thereby obtaining purified ligand. The molecules or compounds screened or purified may be selected from DNA molecules, RNA molecules, PNAs, mimetics, peptides, proteins, agonists, antagonists, antibodies or their fragments, immunoglobulins, inhibitors, drug compounds, and pharmaceutical agents. Any of these molecules or compounds may have diagnostic or therapeutic applications.
DESCRIPTION OF THE SEQUENCE LISTING AND TABLES A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The Sequence Listing is a compilation of polynucleotides obtained by sequencing and extension of clone inserts of different cDNAs. Each sequence is identified by a sequence identification number (SEQ ID NO or SEQ ID) and by the clone number (Incyte ID) from which it was obtained.
Table 1 lists the fragments and extended polynucleotides by their SEQ ID NO and cDNA respectively, tissue, and by the description associated with at least a fragment of a homologous polynucleotide in GenBank. The descriptions were obtained using the sequences of the Sequence Listing and BLAST analysis.
Table 2 lists the source of the RNAs used to produce target polynucleotides for hybridization to the UNIGEM V microarray (Incyte Genomics, Palo Alto CA). The columns present the Source No, Tissue, Age, Ethnicity/Sex, Cause of Death, and Conditions or Diseases, as known for each donor.
Table 3 shows the data for each of the clones across each of the tissues used in the experiments. The columns present Clone ID and the tissues (with source number): heart, skeletal muscle, uterus, stomach, small intestine, lung, liver, kidney, pancreas, spleen and brain. This data was produced using GEMTOOLS software (Incyte Genomics).
Table 4 presents the analysis of variance (ANOVA) for the data. The columns present Clone ID, Var. Betw (variance between), Var. Within (variance within), F (value), and Probability. These values were produced using batch ANOVA (Sokal and Rohlf (1969) Biometry; the Principles and Practice of Statistics in Biological Research. WH Freeman, San Francisco CA) and EXCEL98 software (Microsoft, Seattle WA).
Table 5 shows the cell and tissue specificity of the polynucleotides across tissues (heart, skeletal muscle, uterus, stomach, small intestine, lung, liver, kidney, pancreas, spleen and brain). The cell and tissue specific groupings were produced using mean values [mean (tissue) - mean (entire set)] and grouped using EXCEL98 software (Microsoft).
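The quantities named for Tables 4 and 5 can be illustrated with a minimal Python sketch. It assumes a standard one-way ANOVA in which each tissue is one group of replicate measurements for a single clone, and it assumes that "mean (entire set)" is the mean over all tissues for that clone; the function names and the sample values are hypothetical and are not taken from the tables. The Probability column would additionally require the F distribution, which is omitted here.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (variance_between, variance_within, F) for one clone.

    `groups` maps a tissue name to a list of replicate expression values.
    This is the textbook one-way ANOVA, assumed here for illustration.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)       # number of tissues
    n = len(all_values)   # total number of measurements
    ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2
                     for vals in groups.values())
    ss_within = sum((v - mean(vals)) ** 2
                    for vals in groups.values() for v in vals)
    var_between = ss_between / (k - 1)
    var_within = ss_within / (n - k)
    return var_between, var_within, var_between / var_within


def specificity_scores(groups):
    """Return mean(tissue) - mean(entire set) for each tissue."""
    grand_mean = mean(v for values in groups.values() for v in values)
    return {tissue: mean(vals) - grand_mean for tissue, vals in groups.items()}


# Hypothetical replicate values for a single clone (not data from Table 3).
example = {"heart": [10.0, 12.0], "liver": [2.0, 4.0], "brain": [3.0, 5.0]}
print(one_way_anova(example))
print(specificity_scores(example))
```

A clone with a large F value and a large positive score in a single tissue would be a candidate for that tissue's specific grouping.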
DESCRIPTION OF THE INVENTION Definitions
The term "array" refers to an ordered arrangement of hybridizable polynucleotides. These are arranged so that there are a "plurality" of polynucleotides, preferably at least one polynucleotide, preferably at least 100 polynucleotides, and more preferably at least 1,000 polynucleotides, and even more preferably at least 10,000 polynucleotides on a 1 cm2 substrate. The maximum number of polynucleotides is unlimited, but is at least 100,000. Furthermore, the signal from each of the hybridized polynucleotides is individually distinguishable.
A "polynucleotide" refers to a chain of nucleotides. Preferably, the chain has from about 15 to 10,000 nucleotides and more preferably from about 400 to 6,000 nucleotides. The term "probe" refers to a probe polynucleotide capable of hybridizing with a target polynucleotide to form a hybridization complex. In most instances, the sequences of the probe and target polynucleotides will be complementary (no mismatches) when aligned. In some instances, there may be up to a 10% mismatch.
"Fragment" refers to any part of an Incyte clone or polynucleotide which retains a useful characteristic. Useful fragments may be used in hybridization technologies, to identify or purify ligands, or as a therapeutic to regulate replication, transcription or translation. "Ligand" refers to any agent, molecule, or compound which will bind specifically to a complementary site on a polynucleotide or protein. Such ligands stabilize or modulate the activity of polynucleotides or proteins and may be composed of at least one of the following: inorganic and organic substances including nucleic acids, proteins, carbohydrates, fats, and lipids.
"Purified" refers to any molecule or compound that is removed, isolated, or separated from its natural environment and is at least about 60% free, and more preferably about 90% free, from other components with which it is naturally associated.
"Specific binding" refers to a special and precise interaction between two molecules which is dependent upon a particular structure such as molecular side groups. For example, the hydrogen bonding between two single stranded nucleic acids or the binding between an epitope or a protein and an agonist, antagonist, or antibody.
"Sample" is used in its broadest sense. A sample containing polynucleotides may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; a cell; a tissue; a tissue print; a finger print, a hair, and the like. "Portion" refers to any part of a protein used for any purpose, but especially for the screening of molecules or compounds to identify those which specifically bind to that portion and for producing antibodies.
The phrase "polynucleotide encoding a protein" refers to a nucleic acid sequence that closely aligns with a sequence which encodes a conserved protein motif or domain identified by employing analyses well known in the art. These analyses include Hidden Markov Models (HMMs) such as PFAM (Krogh (1994) J Mol Biol 235:1501-1531; Sonnhammer et al. (1998) Nucl Acids Res 26:320-322), BLAST (Basic Local Alignment Search Tool; Altschul (1993) J Mol Evol 36:290-300; and Altschul et al. (1990) J Mol Biol 215:403-410), or other analytical tools such as BLIMPS (Henikoff et al. (1998) Nucl Acids Res 26:309-12). Additionally, "polynucleotide encoding a protein" may refer to a polynucleotide that is expressed in or associated with specific human metabolic processes, conditions, disorders, or diseases.
"Cell specific", as defined herein, refers to those polynucleotides which occur at a statistically significant level in more than one tissue. The commonality between the tissues may be ascribed to the types of cells that are an integral part of or would be expected to be found in a particular tissue, e.g., blood cells, nerve cells, endothelial cells, and the like.
The Invention
The present invention provides a plurality of tissue or cell specific polynucleotides which may be used on an array to produce an expression profile. This profile may define expression of these polynucleotides in normal tissue, during a particular metabolic or developmental process or during the onset, progression, or treatment of a human condition, disease, or disorder. These polynucleotides represent known and novel genes normally expressed in the cells or tissues of the brain, heart, intestine, kidney, liver, lung, smooth muscle, ovary, pancreas, spleen, stomach, or uterus. The expression of these polynucleotides may be compared to the expression of other known or novel genes found on an array. The plurality of polynucleotides, the entire reference set, comprises SEQ ID NOs:1-416. Tissue or cell-specific reference sets may be selected from a) SEQ ID NOs:209-218 and 1-10, cell specific polynucleotides of heart and fragments thereof; b) SEQ ID NOs:219-249 and 11-41, cell specific polynucleotides of skeletal muscle and fragments thereof; c) SEQ ID NOs:250-251 and 42-43, cell specific polynucleotides of uterus and fragments thereof; d) SEQ ID NOs:252-256 and 44-48, cell specific polynucleotides of ovary and fragments thereof; e) SEQ ID NOs:257-263 and 49-55, cell specific polynucleotides of stomach and fragments thereof; f) SEQ ID NOs:264-283 and 56-75, cell specific polynucleotides of intestine and fragments thereof; g) SEQ ID NOs:284-293 and 76-85, cell specific polynucleotides of lung and fragments thereof; h) SEQ ID NOs:294-345 and 86-137, cell specific polynucleotides of liver and fragments thereof; i) SEQ ID NOs:346-356 and 138-148, cell specific polynucleotides of kidney and fragments thereof; j) SEQ ID NOs:357-374 and 149-166, cell specific polynucleotides of pancreas and fragments thereof; and k) SEQ ID NOs:375-416 and 167-208, cell specific polynucleotides of brain and fragments thereof.
The plurality of polynucleotides is arrayed on a substrate, preferably a microarray or used as probes.
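The tissue groupings listed above can be captured as a small lookup table. The following Python sketch is only an illustration: the dictionary layout and the function name `tissue_for_seq_id` are hypothetical, while the SEQ ID NO ranges are copied from the list (fragments first, extended polynucleotides second).

```python
# Inclusive SEQ ID NO ranges per tissue: (fragments, extended polynucleotides).
REFERENCE_SETS = {
    "heart": ((1, 10), (209, 218)),
    "skeletal muscle": ((11, 41), (219, 249)),
    "uterus": ((42, 43), (250, 251)),
    "ovary": ((44, 48), (252, 256)),
    "stomach": ((49, 55), (257, 263)),
    "intestine": ((56, 75), (264, 283)),
    "lung": ((76, 85), (284, 293)),
    "liver": ((86, 137), (294, 345)),
    "kidney": ((138, 148), (346, 356)),
    "pancreas": ((149, 166), (357, 374)),
    "brain": ((167, 208), (375, 416)),
}


def tissue_for_seq_id(seq_id):
    """Return the tissue whose reference set contains the given SEQ ID NO."""
    for tissue, ranges in REFERENCE_SETS.items():
        for low, high in ranges:
            if low <= seq_id <= high:
                return tissue
    raise ValueError(f"SEQ ID NO:{seq_id} is outside the reference set 1-416")


def set_size(tissue):
    """Return the number of sequences in one tissue's reference set."""
    return sum(high - low + 1 for low, high in REFERENCE_SETS[tissue])


print(tissue_for_seq_id(212), set_size("liver"))
```

In this layout each extended polynucleotide sits 208 positions above its fragment, so SEQ ID NO:4 and SEQ ID NO:212 fall in the same tissue set.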
The invention also provides a substantially purified polynucleotide selected from SEQ ID NOs:212, 228, 233, 259, 271, 287, 316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof, SEQ ID NO:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and 204. These polynucleotides may be used in an expression vector transformed into a host cell to produce a protein or a portion thereof by culturing the host cell under conditions for the expression of protein and recovering the protein from the host cell culture.
The microarray can be used for large scale genetic or gene expression analysis of a large number of novel target polynucleotides. These targets are prepared by methods well known in the art and are from mammalian cells or tissues which are in a certain stage of development or differentiation; have been treated with a known molecule or compound, such as a cytokine, growth factor, a drug, and the like; or have been extracted or biopsied from a mammal with a known or unknown condition, disorder, or disease before or after treatment. Specifically, the plurality of polynucleotides are useful to determine the differentiation of embryonic stem cells toward brain, heart, kidney, liver, lung, muscle or pancreatic tissues or to determine whether a cancer is metastatic or its source by analyzing biopsied tissue from diseased brain, heart, kidney, liver, lung, muscle, ovarian, pancreatic, small intestine, stomach, or uterine tissues. The plurality of polynucleotides may be used during the diagnosis of a cancer, an immunopathology, a neuropathology, and the like. The target polynucleotides are hybridized to the probe polynucleotides for the purpose of defining a novel gene profile associated with that developmental stage, treatment, condition, disorder or disease. Subsequently, the gene profile can be used for diagnosis, prognosis, or monitoring of treatments where altered expression of known and novel genes is associated with a cancer, an immunopathology, a neuropathology, and the like. In some cases, a gene profile can be used to investigate an individual's predisposition to a condition, disorder or disease such as a cancer, an immunopathology, a neuropathology, and the like.
When the polynucleotides of the invention are employed as hybridizable polynucleotides on a microarray, the polynucleotides are organized in an ordered fashion so that each polynucleotide is present at a specified location on the substrate. Because the probe polynucleotides are at specified locations on the substrate, their hybridization patterns and intensities can be compared with the hybridization patterns and intensities of other known and novel polynucleotides to create an expression profile. Such a profile, interpreted in terms of expression levels of the cell and tissue specific, known, and novel genes can be correlated with a particular metabolic process, developmental stage, treatment, condition, disorder, disease, or stage of disease.
The plurality of polynucleotides can also be used to identify or purify a molecule or compound which specifically binds to at least one of the polynucleotides. These molecules may be identified from a sample or in high throughput mode from a large number of molecules and compounds including mRNAs, cDNAs, genomic fragments, and the like. Typically, the molecules or compounds will be of particular diagnostic or therapeutic interest. If nucleic acid molecules in a sample enhance the hybridization background, it may be advantageous to remove the offending molecules. One method for removing such molecules is by hybridizing the sample with immobilized probe polynucleotides and washing away those molecules that do not form hybridization complexes. At a later point, hybridization complexes can be dissociated, thereby releasing those molecules which specifically bind the probe polynucleotides.
Method for Selecting Polynucleotide Probes
There are numerous different ways to select polynucleotides. Some of the more common ones include selecting probes from genes which are well known in the literature to have an association with a particular condition, disorder, or disease, which have a common functional characteristic such as the presence of a particular motif or domain or a signal peptide, which are expressed in a particular cell type or tissue such as blood or bone marrow, and the like.
Preferably, the probes are non-redundant; therefore, no more than one probe represents a particular gene. Control sequences, however, may be selected specifically for their redundancy.
Polynucleotides of the composition may be manipulated to optimize their performance in hybridization technologies. Polynucleotide selection may be optimized by examining the sequences using a computer algorithm to identify fragments lacking potential secondary structure. Computer algorithms such as those employed in Vector NTI software (Informax, N. Bethesda MD) or LASERGENE software (DNASTAR, Madison WI) are well known in the art. These programs search nucleic acid sequences to identify stem loop structures and tandem repeats and to analyze G+C content of the sequence. In mammalian arrays, those sequences with a G+C content greater than 60% may be excluded. Alternatively, polynucleotides can be optimized under experimental conditions to determine whether polynucleotide probes and their complementary targets hybridize optimally.
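The G+C criterion described above can be sketched in a few lines of Python. This is an illustration only, not the algorithm of any named software package; the function names are hypothetical, and only the 60% threshold comes from the text. Stem loop and tandem repeat screening are not covered.

```python
def gc_fraction(sequence):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def passes_gc_filter(sequence, max_gc=0.60):
    """Return True unless the G+C content is greater than max_gc."""
    return gc_fraction(sequence) <= max_gc


# Hypothetical candidate probes; the second is G+C rich and is excluded.
candidates = ["ATGCATGCAT", "GGGCCCGGAT"]
kept = [seq for seq in candidates if passes_gc_filter(seq)]
print(kept)
```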
Where the greatest numbers of non redundant polynucleotides are desired, the polynucleotides may be compared with clustered or assembled sequences to assure that each polynucleotide is derived from a different gene. To obtain a longer or different probe for a particular gene, the polynucleotide may be physically extended utilizing the partial nucleotide sequences derived from the Incyte clone and employing the XL-PCR kit (Applied Biosystems, Foster City CA) or other means known in the art.
Polynucleotide Probes
Polynucleotide probes can be genomic DNA or cDNA or mRNA, or any RNA-like or DNA-like material, such as peptide nucleic acids, branched DNAs and the like. They may be the sense or antisense strand. Where targets are double stranded, probes may be either sense or antisense strands. Where targets are single stranded, probes are complementary single strands.
In one embodiment, polynucleotide probes are cDNAs. The size of the cDNAs may vary and is preferably from 15 to 10,000 nucleotides, more preferably from 60 to 4000 nucleotides, and most preferably from 200 to 600 nucleotides. In another embodiment, probes are plasmids. In this case, the cDNA sequence of interest is the insert sequence. Excluding the vector DNA and regulatory sequences, cDNA size may vary preferably from 15 to 10,000 nucleotides, more preferably from 60 to 4000 nucleotides, and most preferably from 200 to 600 nucleotides.
Polynucleotide probes can be prepared by a variety of synthetic or enzymatic methods well known in the art. Probes can be synthesized, in whole or in part, using chemical methods well known in the art (Caruthers et al. (1980) Nucleic Acids Symp Ser (7):215-233). Alternatively, probes can be produced enzymatically or recombinantly, by in vitro or in vivo transcription.
Nucleotide analogues can be incorporated into the probes by methods well known in the art. The only requirement is that the incorporated nucleotide analogues of the probe must base pair with target nucleotides. For example, certain guanine nucleotides can be substituted with hypoxanthine which base pairs with cytosine residues. However, these base pairs are less stable than those between guanine and cytosine. Alternatively, adenine nucleotides can be substituted with 2,6-diaminopurine which can form stronger base pairs than those between adenine and thymidine.
Additionally, probes can include nucleotides that have been derivatized chemically or enzymatically. Typical chemical modifications include derivatization with acyl, alkyl, aryl or amino groups.
Probes can be synthesized on a substrate. Synthesis on the surface of a substrate may be accomplished using a chemical coupling procedure and a piezoelectric printing apparatus as described by Baldeschweiler et al. (PCT/WO95/251116). Alternatively, the probe can be synthesized on a substrate surface using a self-addressable electronic device that controls when reagents are added as described by Heller et al. (USPN 5,605,662).
Complementary DNA (cDNA) can be arranged and then immobilized on a substrate. Probes can be immobilized by covalent means such as by chemical bonding procedures or UV. In one such method, a cDNA is bound to a glass surface which has been modified to contain epoxide or aldehyde groups. In another case, a cDNA probe is placed on a polylysine coated surface and then UV cross-linked as described by Shalon et al. (PCT/WO95/35505; incorporated herein by reference). In yet another method, a DNA is actively transported from a solution to a given position on a substrate by electrical means (Heller et al. supra). Alternatively, probes, clones, plasmids or cells can be arranged on a filter. In the latter case, cells are lysed, proteins and cellular components degraded, and the DNA is coupled to the filter by UV cross-linking. Furthermore, probes do not have to be directly bound to the substrate, but rather can be bound to the substrate through a linker group. The linker groups are typically about 6 to 50 atoms long to provide exposure of the attached probe. Preferred linker groups include ethylene glycol oligomers, diamines, diacids and the like. Reactive groups on the substrate surface react with a terminal group of the linker to bind the linker to the substrate. The other terminus of the linker is then bound to the probe.
Probes can be attached to a substrate by sequentially dispensing reagents for probe synthesis on the substrate surface or by dispensing preformed DNA fragments to the substrate surface. Typical dispensers include a micropipette delivering solution to the substrate with a robotic system to control the position of the micropipette with respect to the substrate. There can be a multiplicity of dispensers so that reagents can be delivered to the reaction regions efficiently.
Sample Preparation
In order to conduct sample analysis, a sample containing targets is provided. The samples can be any sample containing targets and obtained from any bodily fluid (blood, urine, saliva, phlegm, gastric juices, etc.), cultured cells, biopsies, or other tissue or forensic preparations.
DNA or RNA can be isolated from a sample according to any of a number of methods well known to those of skill in the art. For example, methods of purification of nucleic acids are described in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With Nucleic Acid Probes, Part I. Theory and Nucleic Acid Preparation. Elsevier Science, New York NY. In one case, total RNA is isolated using TRIZOL reagent (Life Technologies, Gaithersburg MD), and mRNA is isolated using oligo d(T) column chromatography or glass beads. In one alternative, when targets are derived from an mRNA, targets can be a DNA reverse transcribed from an mRNA, an RNA transcribed from that DNA, a DNA amplified from that DNA, an RNA transcribed from the amplified DNA, and the like. When target is derived from DNA, target can be DNA amplified from DNA, or RNA transcribed from DNA. In yet another alternative, targets are prepared by more than one method.
When targets are amplified it is desirable to amplify the nucleic acids in the sample and to maintain their relative abundances, including low abundance transcripts. Total mRNA can be amplified by reverse transcription using a reverse transcriptase and a primer consisting of oligo d(T) and a sequence encoding the phage T7 promoter to provide a single stranded DNA template. The second DNA strand is polymerized using a DNA polymerase and an RNase which assists in breaking up the DNA/RNA hybrid. After synthesis of the double stranded DNA, T7 RNA polymerase can be added, and RNA transcribed from the second DNA strand template as described by Van Gelder et al. (USPN 5,545,522). RNA can be amplified in vitro, in situ or in vivo (Eberwine, USPN 5,514,545). It is also advantageous to include quantitation controls to assure that amplification and labeling procedures do not change the true abundance of transcripts in a sample. For this purpose, a sample is spiked with a known amount of control nucleic acid, and the probes include control probes which specifically hybridize with the control nucleic acid. After hybridization and processing, the hybridization signals should reflect accurately the amounts of control nucleic acid added to the sample. Prior to hybridization, it may be desirable to fragment the nucleic acids of the sample.
Fragmentation improves hybridization by minimizing secondary structure and cross-hybridization among the nucleic acids in the sample or with noncomplementary probes. Fragmentation can be performed by mechanical or chemical means.
The nucleic acids may be labeled with one or more labeling moieties to allow for detection and quantitation of hybridization complexes. The labeling moieties can include compositions that can be detected by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical means. The labeling moieties include radioisotopes, such as 32P, 33P or 35S; chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers such as fluorescent markers and dyes; magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer donors and acceptors, and the like.
Exemplary dyes include quinoline dyes, triarylmethane dyes, phthaleins, azo dyes, cyanine dyes, and the like. Preferably, fluorescent markers absorb light above about 300 nm, more preferably above 400 nm, and usually emit light at wavelengths at least greater than 10 nm above the wavelength of the light absorbed. Preferred fluorescent markers include fluorescein, phycoerythrin, rhodamine, lissamine, and Cy3 and Cy5.
Labeling can be carried out during an amplification reaction, such as polymerase chain and in vitro transcription reactions; by nick translation, or by 5'- or 3'-end-labeling reactions. In one case, labeled nucleotides are used in an in vitro transcription reaction. When the label is incorporated after or without an amplification step, the label is incorporated either by using a terminal transferase or a kinase on the 5' end of the target polynucleotide and then incubating overnight with a labeled oligonucleotide in the presence of T4 RNA ligase.
Alternatively, the labeling moiety can be incorporated after hybridization once a probe/target complex has formed. In one case, biotin is first incorporated during an amplification step as described above. After the hybridization reaction, unbound nucleic acids are rinsed away so that the only biotin remaining bound to the substrate is that attached to targets that are hybridized to probes. Then, an avidin-conjugated fluorophore, such as avidin-phycoerythrin, that binds with high affinity to biotin is added. In another case, the labeling moiety is incorporated by intercalation into preformed target/probe complexes. In this case, an intercalating dye such as a psoralen-linked dye can be employed.
Screening Assays
Probes or polynucleotides may be used to screen a library of molecules or compounds for specific binding affinity. The libraries may be DNA molecules, RNA molecules, PNAs, peptides, proteins such as transcription factors, enhancers, repressors, and other organic or inorganic ligands which regulate activities such as replication, transcription, or translation of polynucleotides in the biological system. The assay involves combining the probe with the library of molecules or compounds under conditions that allow specific binding, and detecting specific binding to a ligand which specifically binds the probe.
Similarly, a protein or a portion thereof transcribed and translated from a probe may be used to screen libraries of molecules or compounds in any of a variety of screening assays. The protein or portion thereof may be free in solution, affixed to an abiotic or biotic substrate, borne on a cell surface, or located intracellularly. Specific binding between the protein and a ligand may be measured. Depending on the kind of library being screened, the assay may be used to identify DNA, RNA, or PNAs, agonists, antagonists, antibodies, immunoglobulins, inhibitors, mimetics, peptides, proteins, drugs, or any other ligand, that specifically binds the protein.
Purification of Ligand
Probes may be used to purify a ligand from a sample. A method for using a probe to purify a ligand would involve combining the probe with a sample under conditions to allow specific binding, detecting specific binding, recovering the bound probe, and using an appropriate agent to separate the polynucleotide from the purified ligand.
Similarly, the encoded protein or a portion thereof may be used to purify a ligand from a sample. A method for using a protein or a portion thereof to purify a ligand would involve combining the protein or a portion thereof with a sample under conditions to allow specific binding, detecting specific binding between the protein and ligand, recovering the bound protein, and using an appropriate agent to separate the protein from the purified ligand.
Hybridization and Detection
Hybridization causes a denatured polynucleotide probe and a denatured complementary target to form a stable duplex through base pairing. Hybridization methods are well known to those skilled in the art. (See Ausubel, supra, units 2.8-2.11, 3.18-3.19 and 4.6-4.9.) Conditions can be selected for hybridization where completely complementary probe and target can hybridize, i.e., each base pair must interact with its complementary base pair. Alternatively, conditions can be selected where probe and target have mismatches of up to about 10% but are still able to hybridize. Suitable conditions can be selected by varying the concentrations of salt in the prehybridization, hybridization, and wash solutions or by varying the hybridization and wash temperatures. With some substrates, temperature can be decreased by adding formamide to the prehybridization and hybridization solutions.
Hybridization can be performed at low stringency with buffers, such as 5xSSC with 1% sodium dodecyl sulfate (SDS) at 60°C, which permits hybridization between probe and target sequences that contain some mismatches to form probe/target complexes. Subsequent washes are performed at higher stringency with buffers such as 0.2xSSC with 0.1% SDS at either 45°C (medium stringency) or 68°C (high stringency), to maintain hybridization of only those probe/target complexes that contain completely complementary sequences. Background signals can be reduced by the use of detergents such as SDS, Sarcosyl, or TRITON X-100 (Sigma-Aldrich, St. Louis MO) or a blocking agent, such as salmon sperm DNA.
Hybridization specificity can be evaluated by comparing the hybridization of control probe to target sequences that are added to a sample in a known amount. The control probe may have one or more sequence mismatches compared with the corresponding target. In this manner, it is possible to evaluate whether only complementary probes are hybridizing to the targets or whether mismatched hybrid duplexes are forming.
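The idea of a mismatched control probe can be made concrete with a short Python sketch that counts mismatches between a probe and the reverse complement of its target and compares the fraction with the roughly 10% tolerance mentioned in the text. It assumes equal-length, ungapped DNA sequences written 5' to 3'; the function names and sequences are hypothetical, and the sketch says nothing about whether a given duplex is actually stable under particular wash conditions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def mismatch_fraction(probe, target):
    """Fraction of probe positions that do not pair with the target.

    Assumes the probe anneals antiparallel to the target over its full
    length, with no gaps (a simplification made for illustration).
    """
    expected = reverse_complement(target)
    if len(probe) != len(expected):
        raise ValueError("probe and target must have the same length")
    mismatches = sum(1 for p, e in zip(probe.upper(), expected) if p != e)
    return mismatches / len(probe)


def within_tolerance(probe, target, max_mismatch=0.10):
    """Return True if the mismatch fraction is at most max_mismatch."""
    return mismatch_fraction(probe, target) <= max_mismatch


# Hypothetical 20-nucleotide target and two control probes.
target = "ATGCATGCATATGCATGCAT"
perfect_probe = reverse_complement(target)
one_mismatch_probe = "C" + perfect_probe[1:]
print(mismatch_fraction(perfect_probe, target),
      mismatch_fraction(one_mismatch_probe, target))
```

Comparing the signal from a perfect control probe with that from a deliberately mismatched one indicates whether mismatched duplexes are surviving the washes.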
Hybridization reactions can be performed in absolute or differential hybridization formats. In the absolute hybridization format, targets from one sample are hybridized to microarray probes, and signals are detected after hybridization complexes form. Signal strength correlates with target levels in a sample. In the differential hybridization format, differential expression of a set of genes in two biological samples is analyzed. Targets from the two samples are prepared and labeled with different labeling moieties. A mixture of the two labeled targets is hybridized to the microarray probes, and signals are examined under conditions in which the emissions from the two different labels are individually detectable. Probes in the microarray that are hybridized to substantially equal numbers of targets derived from both biological samples give a distinct combined fluorescence (Shalon,
PCT/WO95/35505). In a preferred embodiment, the labels are fluorescent labels with distinguishable emission spectra, such as a lissamine conjugated nucleotide analog and a fluorescein conjugated nucleotide analog. In another embodiment Cy3 and Cy5 fluorophores (Amersham Pharmacia Biotech, Piscataway NJ) are employed. After hybridization, the microarray is washed to remove nonhybridized polynucleotides, and complex formation between the hybridizable array probes and the targets is examined. Methods for detecting complex formation are well known to those skilled in the art. In a preferred embodiment, the targets are labeled with a fluorescent label, and measurement of levels and patterns of fluorescence indicative of complex formation is accomplished by fluorescence microscopy, preferably confocal fluorescence microscopy. An argon ion laser excites the fluorescent label, emissions are directed to a photomultiplier, and the amount of emitted light is detected and quantitated. The detected signal should be proportional to the amount of probe/target complexes at each position of the microarray. The fluorescence microscope can be associated with a computer-driven scanner device to generate a quantitative two-dimensional image of hybridization intensity. The scanned image is examined to determine the abundance/expression level of hybridized target. Typically, microarray fluorescence intensities can be normalized to take into account variations in hybridization intensities when more than one microarray is used under similar test conditions. In a preferred embodiment, individual polynucleotide probe/target complex hybridization intensities are normalized using the intensities derived from internal normalization controls contained on each microarray.
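The normalization step can be sketched as follows. The text does not give a formula, so this Python example assumes the simplest scheme: each array's intensities are divided by the mean intensity of its internal control spots, after which two arrays (or two label channels) can be compared as a log2 ratio. All names and intensity values are hypothetical.

```python
import math
from statistics import mean


def normalize_to_controls(intensities, control_ids):
    """Scale spot intensities so the mean control intensity becomes 1.0.

    `intensities` maps a spot identifier to a raw fluorescence value;
    `control_ids` lists the internal normalization control spots.
    """
    control_mean = mean(intensities[c] for c in control_ids)
    if control_mean <= 0:
        raise ValueError("control intensities must be positive")
    return {spot: value / control_mean for spot, value in intensities.items()}


def log2_ratios(sample_a, sample_b):
    """Return log2(a / b) for each spot present in both normalized sets."""
    return {spot: math.log2(sample_a[spot] / sample_b[spot])
            for spot in sample_a if spot in sample_b}


# Hypothetical raw intensities from two arrays sharing the same controls.
controls = ["ctrl1", "ctrl2"]
array_1 = {"ctrl1": 100.0, "ctrl2": 300.0, "geneX": 400.0}
array_2 = {"ctrl1": 50.0, "ctrl2": 150.0, "geneX": 100.0}

norm_1 = normalize_to_controls(array_1, controls)
norm_2 = normalize_to_controls(array_2, controls)
print(log2_ratios(norm_1, norm_2))
```

Dividing by the control mean removes a uniform brightness difference between arrays, so that the remaining ratio reflects a change in relative abundance rather than in overall signal.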
Expression Profiles
This section describes an expression profile using the polynucleotides of this invention. The reference set can be used as part of an expression profile which detects changes in the expression of novel genes whose transcripts are modulated in a particular metabolic response, treatment, condition, disorder, or disease. These genes will include genes whose altered expression is correlated with a cancer, an immunopathology, a neuropathology, and the like.
The expression profile comprises a plurality of detectable hybridization complexes. Each complex is formed by hybridization of one or more probes to one or more complementary targets. At least one of the probes, preferably a plurality of probes, is hybridized to a complementary target, forming at least one and preferably a plurality of complexes. A complex is detected by incorporating at least one labeling moiety. The expression profiles provide "snapshots" that can show unique expression patterns that are characteristic of a metabolic process, treatment, condition, disorder or disease.
After performing hybridization experiments and detecting signals from a microarray, particular probes can be identified and selected based on their expression patterns. Such probes can be used to clone a full length sequence for the gene, to screen a library for a closely related homolog, to screen for or purify ligands, or to produce a protein.
Utility of the Invention
The plurality of polynucleotides can be used as hybridizable elements in a microarray. Such a microarray can be employed in several applications including diagnostics, prognostics and treatment regimens, and drug discovery and development for conditions, disorders, and diseases such as cancer, an immunopathology, a neuropathology and the like.
Expression Profiles
In one situation, the microarray is used to monitor the progression of disease. The differences in gene expression between healthy and diseased tissues or cells can be assessed and cataloged. By analyzing changes in patterns of gene expression, disease can be diagnosed at earlier stages before the patient is symptomatic. The invention can be used to formulate a prognosis and to design a treatment regimen. The invention can also be used to monitor the efficacy of treatment. For treatments with known side effects, the microarray is employed to "fine tune" the treatment regimen. A dosage is established that causes a change in genetic expression patterns indicative of successful treatment. Expression patterns associated with the onset of undesirable side effects are avoided. This approach may be more sensitive and rapid than waiting for the patient to show inadequate improvement, or to manifest side effects, before altering the course of treatment.
Alternatively, animal models which mimic a human disease can be used to characterize expression profiles associated with a particular condition, disorder or disease or the treatment of the condition, disorder or disease. Experimental treatment regimens may be tested in these animal models using microarrays to establish and then follow expression profiles over time. In addition, microarrays may be used with cell cultures or tissues removed from animal models to rapidly screen large numbers of candidate drug molecules, looking for ones that produce an expression profile similar to those of known therapeutic drugs, with the expectation that molecules with the same expression profile will likely have similar therapeutic effects. Thus, the invention provides the means to rapidly determine the molecular mode of action of a drug.
Embryonic Stem Cells
Embryonic stem (ES) cells isolated from rodent or human embryos retain the potential to form embryonic tissues. When ES cells such as the mouse 129/SvJ cell line are placed in a blastocyst from the C57BL/6 mouse strain, they resume normal development and contribute to tissues of the live-born animal. ES cells are preferred for use in the creation of experimental knockout and knockin animals. In mice, the method for this process is well known in the art and the steps are: the cDNA is introduced into a vector, the vector is transformed into ES cells, transformed cells are identified and microinjected into mouse cell blastocysts, and the blastocysts are surgically transferred to pseudopregnant dams. The resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. ES cells are also used for the treatment of victims of Parkinson's disease, stroke, and other neuropathologies (The Scientist, 14(18):1ff; September 2000). Pharmaceutical companies are also targeting disorders of the liver, kidney, and pancreas, specifically alpha-1 antitrypsin, polycystic kidney disease, and diabetes, respectively. In time, traumatic damage to the nervous system and internal organs may also be treated by transplantation of cells or organs which are differentiated from embryonic stem cells. The present invention may be used to characterize the developmental pathways of the differentiation processes that give rise to brain, heart, kidney, liver, lung, muscle, ovarian, pancreatic, small intestine, stomach, or uterine tissues.
Knockout Analysis
In gene knockout analysis, a region of a gene is enzymatically modified to include a non-natural intervening sequence such as the neomycin phosphotransferase gene (neo; Capecchi (1989) Science 244:1288-1292). The modified gene is transformed into cultured ES cells and integrates into the endogenous genome by homologous recombination. The inserted sequence disrupts transcription and translation of the endogenous gene.
Knockin Analysis
ES cells can be used to create knockin humanized animals or transgenic animal models of human diseases. With knockin technology, a region of a human gene is injected into animal ES cells, and the human sequence integrates into the animal cell genome. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical agents to obtain information on the progression and treatment of the analogous human condition.
As described herein, the uses of the cDNAs, provided in the Sequence Listing of this application, and their encoded proteins are exemplary of known techniques and are not intended to reflect any limitation on their use in any technique that would be known to the person of average skill in the art. Furthermore, the cDNAs provided in this application may be used in molecular biology techniques that have not yet been developed, provided the new techniques rely on properties of nucleotide sequences that are currently known to the person of ordinary skill in the art, e.g., the triplet genetic code, specific base pair interactions, and the like. Likewise, reference to a method may include combining more than one method for obtaining, assembling or expressing cDNAs that will be known to those skilled in the art. It is also to be understood that this invention is not limited to the particular methodology, protocols, and reagents described, as these may vary. It is also understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention which will be limited only by the appended claims. The examples below are provided to illustrate the subject invention and are not included for the purpose of limiting the invention.
EXAMPLES
For purposes of example, the preparation and sequencing of the BRAINON01 cDNA library is described. Preparation and sequencing of other cDNAs in libraries in the LIFESEQ database (Incyte Genomics) have varied over time, and the gradual changes involved use of kits, plasmids, and machinery available at the particular time the library was made and analyzed.
I cDNA Library Construction
The BRAINON01 normalized cDNA library was constructed from cancerous brain tissue obtained from a 26-year-old Caucasian male during cerebral meningeal excision following diagnosis of grade 4 oligoastrocytoma localized in the right fronto-parietal part of the brain. The tumor had been irradiated (5800 rads). Patient history included hemiplegia, epilepsy, ptosis of eyelid, and common migraine, and medications included Dilantin® (Parke-Davis, Morris Plains NJ).
The frozen tissue was homogenized and lysed using a POLYTRON homogenizer (PT-3000; Brinkmann Instruments, Westbury NY) in guanidinium isothiocyanate solution. The lysate was extracted with acid phenol, pH 4.7, per Stratagene RNA isolation protocol (Stratagene, San Diego CA). The RNA was extracted with an equal volume of acid phenol, reprecipitated using 0.3 M sodium acetate and 2.5 volumes of ethanol, resuspended in DEPC-treated water, and treated with DNase for 25 min at 37°C. The RNA extraction was repeated with phenol, pH 8.7, and precipitated with sodium acetate and ethanol as before. The mRNA was isolated with the OLIGOTEX kit (Qiagen, Chatsworth CA) and used to construct the cDNA library. The mRNA was handled according to the recommended protocols in the SUPERSCRIPT plasmid system (Life Technologies). cDNAs were fractionated on a SEPHAROSE CL4B column (Amersham Pharmacia Biotech), and those cDNAs exceeding 400 bp were ligated into PSPORT I plasmid (Life Technologies). The plasmid was transformed into DH5 competent cells (Life Technologies) to construct the BRAINOT03 library.
II Normalization of the cDNA Library
4.9 x 10^6 independent clones of the BRAINOT03 library were grown in liquid culture under carbenicillin (25 mg/L) and methicillin (1 mg/ml) selection following transformation by electroporation into DH12S competent cells (Life Technologies). The culture was monitored using a DU-7 spectrophotometer (Beckman Coulter, Fullerton CA) until it reached an OD600 of 0.2, and then superinfected with a 5-fold excess of the helper phage M13K07 (Vieira et al. (1987) Methods Enzymol 153:3-11).
To reduce the number of highly expressed cDNAs, the library was normalized in a single round according to the procedure of Soares et al. (1994, Proc Natl Acad Sci 91:9928-9932) with the following modifications: 1) the primer to template ratio in the primer extension reaction was increased from 2:1 to 10:1, 2) the ddNTP concentration was reduced to 150 μM to allow generation of longer (400-1000 nt) primer extension products, and 3) the reannealing hybridization was extended from 13 to 48 hours. After the single stranded DNA circles were purified by hydroxyapatite chromatography and converted to partially double-stranded by random priming, the cDNAs were electroporated into DH10B competent bacteria (Life Technologies) to construct the BRAINON01 normalized library.
III Isolation and Sequencing of cDNA Clones
Plasmid DNA was released from bacterial cells and purified using the REAL Prep 96 plasmid kit (Qiagen). This kit enabled the simultaneous purification of 96 samples in a 96-well block using multi-channel reagent dispensers. The recommended protocol was employed except for the following changes: 1) the bacteria were cultured in 1 ml of sterile TERRIFIC BROTH (BD Biosciences, Sparks MD) with carbenicillin at 25 mg/L and glycerol at 0.4%; 2) the cultures were inoculated, incubated for 19 hours, and then lysed with 0.3 ml of lysis buffer; and 3) following isopropanol precipitation, the plasmid DNA pellet was resuspended in 0.1 ml of distilled water.
The cDNAs were prepared using a MICROLAB 2200 system (Hamilton, Reno NV) in combination with DNA ENGINE thermal cyclers (PTC200; MJ Research, Waltham MA). The cDNAs were sequenced by the method of Sanger and Coulson (1975; J Mol Biol 94:441f) using ABI PRISM 377 DNA sequencing systems (Applied Biosystems). Most of the sequences were sequenced using standard ABI protocols and kits (Applied Biosystems) at solution volumes of 0.25x - 1.0x. In the alternative, some of the sequences were sequenced using solutions and dyes from Amersham Pharmacia Biotech.
IV Selection of Sequences for the Microarray
Incyte clones were mapped to non-redundant Unigene clusters (Unigene database (build 46), NCBI; Shuler (1997) J Mol Med 75:694-698), and the 5' clone with the strongest BLAST alignment (at least 90% identity and 100 bp overlap) was chosen, verified, and used in the construction of the microarray. The UNIGEM V microarray (Incyte Genomics) contains 7075 array elements which represent 4610 annotated genes and 2,184 unannotated clusters. Table 1 shows the GenBank 119 annotations for SEQ ID NOs:1-416 of this invention as produced by BLAST analysis.
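The clone selection rule stated above (at least 90% identity, at least 100 bp overlap, strongest alignment wins) can be sketched in Python as follows. The record layout, the function name, and the candidate values are hypothetical, and "strongest" is assumed here to mean the highest BLAST score. The sketch does not model the 5' orientation requirement or the verification step.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Alignment:
    """One clone-to-cluster BLAST alignment (hypothetical record layout)."""
    clone_id: str
    percent_identity: float
    overlap_bp: int
    blast_score: float


def pick_clone(alignments: Sequence[Alignment],
               min_identity: float = 90.0,
               min_overlap: int = 100) -> Optional[Alignment]:
    """Return the eligible alignment with the highest BLAST score, or None."""
    eligible = [a for a in alignments
                if a.percent_identity >= min_identity
                and a.overlap_bp >= min_overlap]
    if not eligible:
        return None
    return max(eligible, key=lambda a: a.blast_score)


# Hypothetical candidates for a single cluster.
candidates = [
    Alignment("clone_A", 95.0, 250, 1100.0),
    Alignment("clone_B", 99.0, 80, 390.0),    # overlap below 100 bp
    Alignment("clone_C", 88.0, 400, 1500.0),  # identity below 90%
    Alignment("clone_D", 92.0, 300, 1250.0),
]
best = pick_clone(candidates)
```

Here clone_C has the highest raw score but is excluded by the identity threshold, so the selection falls to clone_D.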
V Homology Searching of Polynucleotides and Proteins
BLAST involves finding similar segments between the query sequence and a database sequence, evaluating the statistical significance of any similarities, and reporting only those matches that satisfy a user-selectable threshold of significance. BLAST produces alignments of both nucleotide and amino acid sequences to determine sequence similarity.
The fundamental unit of the analysis is the High-scoring Segment Pair (HSP). An HSP consists of two sequence fragments of arbitrary, but equal lengths, whose alignment is locally maximal and for which the alignment score meets or exceeds a threshold of significance set by the user.
The basis of the search is the product score, which is defined as:
\[ \text{product score} = \frac{\%\ \text{sequence identity} \times \%\ \text{maximum BLAST score}}{100} \]
The product score takes into account both the degree of identity between two sequences and the length of the sequence match as reflected in the BLAST score. The BLAST score is calculated by scoring +5 for every base that matches in an HSP and -4 for every mismatch. For a product score of 40, the match will be exact within a 1% to 2% error and for a product score of 70, the match will be exact. Homologous molecules usually show product scores between 15 and 40, although lower scores may identify related molecules. The P-value for any given HSP is a function of its expected frequency of occurrence and the number of HSPs observed against the same database sequence with scores at least as high.
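As an illustration of the scoring arithmetic, the following Python sketch computes a BLAST score from match and mismatch counts (+5 and -4, as stated above) and then a product score. The text does not define how the "maximum BLAST score" is obtained, so it is passed in as a parameter; the example assumes, purely for illustration, that it equals 5 times the number of aligned bases. Function names and numbers are hypothetical.

```python
def blast_score(matches: int, mismatches: int) -> int:
    """Score an HSP at +5 per matching base and -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(percent_identity: float, score: float, max_score: float) -> float:
    """(% sequence identity x % maximum BLAST score) / 100.

    max_score is supplied by the caller because the specification does not
    say how the maximum BLAST score is determined.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    percent_of_max = 100.0 * score / max_score
    return percent_identity * percent_of_max / 100.0


# Hypothetical HSP: 100 aligned bases, 95 matches and 5 mismatches.
score = blast_score(matches=95, mismatches=5)   # 5*95 - 4*5 = 455
assumed_max = 5 * 100                           # assumption: every base matches
ps = product_score(percent_identity=95.0, score=score, max_score=assumed_max)
```

Under these assumptions the HSP scores 455 of a possible 500, that is 91% of the maximum, and the product score is 95 x 91 / 100 = 86.45.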
Percent sequence identity is found in a comparison of two or more amino acid or nucleic acid sequences. Percent identity can be determined electronically using the MEGALIGN program, a component of LASERGENE software (DNASTAR). The percent similarity between two amino acid sequences is calculated by dividing the length of sequence A, minus the number of gap residues in sequence A, minus the number of gap residues in sequence B, into the sum of the residue matches between sequence A and sequence B, times one hundred. Gaps of low or of no homology between the two amino acid sequences are not included in determining percentage similarity.
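The percent similarity calculation described above reduces to a single division, sketched below in Python. This is a reading of the sentence in the text and not the MEGALIGN implementation; the function name and the example counts are hypothetical.

```python
def percent_similarity(length_a: int, gaps_a: int, gaps_b: int, matches: int) -> float:
    """100 x matches / (length of A - gap residues in A - gap residues in B)."""
    denominator = length_a - gaps_a - gaps_b
    if denominator <= 0:
        raise ValueError("no comparable residues remain after removing gaps")
    return 100.0 * matches / denominator


# Hypothetical alignment: sequence A has 120 residues, with 5 gap residues
# in A, 3 gap residues in B, and 98 matching residues.
similarity = percent_similarity(length_a=120, gaps_a=5, gaps_b=3, matches=98)
```

For these counts the denominator is 120 - 5 - 3 = 112, giving 98 / 112 x 100 = 87.5%.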
Sequences with conserved protein motifs may be searched using the BLOCKS search program. This program analyses sequence information contained in the Swiss-Prot and PROSITE databases and is useful for determining the classification of uncharacterized proteins translated from genomic or cDNA sequences (Bairoch et al. (1997) Nucleic Acids Res 25:217-221; Attwood et al. (1997) J Chem Inf Comput Sci 37:417-424). PROSITE database is a useful source for identifying functional or structural domains that are not detected using motifs due to extreme sequence divergence. Using weight matrices, these domains are calibrated against the SWISS-PROT database to obtain a measure of the chance distribution of the matches.
The PRINTS database can be searched using the BLIMPS search program to obtain protein family "fingerprints". The PRINTS database complements the PROSITE database by exploiting groups of conserved motifs within sequence alignments to build characteristic signatures of different protein families. For both BLOCKS and PRINTS analyses, the cutoff scores for local similarity were: >1300=strong, 1000-1300=suggestive; for global similarity were: p<exp-3; and for strength (degree of correlation) were: >1300=strong, 1000-1300=weak.
VI Extension of cDNA Clones
Some of the nucleic acid sequences of the Sequence Listing, designated F, R, or T, were produced by extension of an appropriate fragment of the original clone insert using oligonucleotide primers designed from this fragment. One primer was synthesized to initiate 5' extension of the known sequence, and the other primer, to initiate 3' extension of the known sequence. The initial primers were designed using OLIGO software (Molecular Insights, Cascade CO), or another appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to the target sequence at temperatures of about 68°C to about 72°C. Any stretch of nucleotides which would result in hairpin structures and primer-primer dimerizations was avoided.
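Two of the primer design criteria above, length and GC content, are simple enough to check programmatically. The Python sketch below is illustrative only and is not the OLIGO software; it does not model annealing temperature, hairpin formation, or primer-primer dimerization. The function names and the example sequences are hypothetical.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    if not seq:
        raise ValueError("primer must not be empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_criteria(primer: str,
                          min_len: int = 22,
                          max_len: int = 30,
                          min_gc: float = 0.50) -> bool:
    """Check only the length (22 to 30 nt) and GC content (50% or more) criteria."""
    return min_len <= len(primer) <= max_len and gc_fraction(primer) >= min_gc


# Hypothetical sequences, not taken from the Sequence Listing.
good_primer = "GCGTACCGGATCCGTTAGCAGGTC"     # 24 nt, 15 of 24 bases are G or C
at_rich_primer = "ATATATATATATATATATATATGC"  # 24 nt, 2 of 24 bases are G or C
short_primer = "GCGCGCGCGC"                  # 10 nt, too short
```

A production design tool would add the thermodynamic checks that this sketch leaves out.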
Selected human cDNA libraries were used to extend the sequence. If more than one extension was necessary or desired, additional or nested sets of primers were designed.
High fidelity amplification was obtained by PCR using methods well known in the art. PCR was performed in 96-well plates using the DNA ENGINE thermal cycler (MJ Research). The reaction mix contained DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the alternative, the parameters for primer pair T7 and SK+ were as follows: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C.
The concentration of DNA in each well was determined by dispensing 100 μl PICOGREEN reagent (0.25% v/v PICOGREEN (Molecular Probes, Eugene OR) dissolved in 1x TE) and 0.5 μl of undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar, Acton MA), allowing the DNA to bind to the reagent. The plate was scanned in a Fluoroskan II (Labsystems Oy, Helsinki FI) to measure the fluorescence of the sample and to quantify the concentration of DNA. A 5 μl to 10 μl aliquot of the reaction mixture was analyzed by electrophoresis on a 1% agarose minigel to determine which reactions were successful in extending the sequence.
The extended nucleotides were desalted and concentrated, transferred to 384-well plates, digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For shotgun sequencing, the digested nucleotides were separated on 0.6% to 0.8% agarose gels, fragments were excised, and agar digested with AGARACE (Promega). Extended clones were religated using T4 ligase (New England Biolabs, Beverly MA) into pUC 18 vector (Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill-in restriction site overhangs, and transfected into competent E. coli cells. Transformed cells were selected on antibiotic-containing media, and individual colonies were picked and cultured overnight at 37°C in 384-well plates in LB/2x carbenicillin liquid media.
The cells were lysed, and DNA was amplified using Taq DNA polymerase (Amersham Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA was quantified by PICOGREEN reagent (Molecular Probes) as described above. Samples with low DNA recoveries were reamplified using the conditions described above. Samples were diluted with 20% dimethylsulphoxide (1:2 v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE terminator kit (Applied Biosystems).
VII mRNA for Target Polynucleotides
The mRNAs or tissues for preparing target polynucleotides were obtained from Biochain Institute (San Leandro CA), International Institute for Advanced Medicine (Exeter PA), and Oncormed (Gaithersburg MD). RNA was extracted from tissue samples using the extraction protocol and purification procedures described above.
VIII Microarray Preparation, Labeling of Targets, and Hybridization Analyses
Substrate Preparation
Probe polynucleotides were amplified from bacterial vectors by thirty cycles of PCR using primers complementary to vector sequences flanking the insert and purified using SEPHACRYL-400 beads (Amersham Pharmacia Biotech). Purified polynucleotides were robotically arrayed onto a glass microscope slide (Corning Science Products, Corning NY) previously coated with 0.05% aminopropyl silane (Sigma-Aldrich) and cured at 110°C. The microarray was exposed to UV irradiation in a STRATALINKER UV-crosslinker (Stratagene).
Target Preparation
Each mRNA sample, shown in Table 2, was reverse transcribed using MMLV reverse transcriptase in the presence of dCTP-Cy3 or dCTP-Cy5 (Amersham Pharmacia Biotech) according to standard protocol. After incubation at 37°C, the reaction was stopped with 0.5 M sodium hydroxide, and RNA was degraded at 85°C. The target polynucleotides were purified using CHROMASPIN 30 columns (Clontech, Palo Alto CA) and ethanol precipitation.
Hybridization
The hybridization mixture, containing 0.2 mg of each of the Cy3 and Cy5 labeled target polynucleotides, was heated to 65°C, and dispensed onto the UNIGEM V microarray (Incyte Genomics) surface. The microarray was covered with a coverslip and incubated at 60°C. The microarrays were sequentially washed at 45°C in moderate stringency buffer (1xSSC and 0.1% SDS) and high stringency buffer (0.1xSSC) and dried.
Detection
A confocal laser microscope was used to detect the fluorescence-labeled hybridization complexes. Excitation wavelengths were 488 nm for Cy3 and 632 nm for Cy5. Each array was scanned twice, one scan per fluorophore. The emission maxima were 565 nm for Cy3 and 650 nm for Cy5. The emitted light was split into two photomultiplier tube detectors based on wavelength. The output of the photomultiplier tube was digitized and displayed as an image, where the signal intensity was represented using a linear 20 color transformation, with red representing a high signal and blue a low signal. The fluorescence signal for each probe was integrated to obtain a numerical value corresponding to the signal intensity using GEMTOOLS expression analysis software (Incyte Genomics).
IX Data Analysis and Results
Out of the 7075 genes present on UNIGEM V, 3627 genes or 51% were expressed at a significant level across all 30 tissue samples. Significance was defined as a signal to background ratio exceeding 2.5 and an area hybridization exceeding 40% for both probes. All data were transformed so that differential gene expression values were on a log base 2 scale.
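The significance filter and log transformation described above can be sketched in Python as follows. Two points are assumptions rather than statements from the text: that "both probes" refers to the Cy3 and Cy5 channels of an array element, and that the differential expression value is the base 2 logarithm of the ratio of the two channel signals. The function names and example values are hypothetical.

```python
import math


def is_significant(sb_cy3: float, sb_cy5: float,
                   area_cy3: float, area_cy5: float,
                   min_sb: float = 2.5, min_area: float = 40.0) -> bool:
    """True when both channels exceed the signal/background and area cutoffs.

    Assumption: "both probes" means the Cy3 and Cy5 channels of one element.
    Area values are percentages.
    """
    return min(sb_cy3, sb_cy5) > min_sb and min(area_cy3, area_cy5) > min_area


def log2_differential(signal_a: float, signal_b: float) -> float:
    """Differential expression on a log base 2 scale (assumed to be a ratio)."""
    if signal_a <= 0 or signal_b <= 0:
        raise ValueError("signals must be positive")
    return math.log2(signal_a / signal_b)


# Hypothetical array elements.
kept = is_significant(sb_cy3=3.0, sb_cy5=2.6, area_cy3=55.0, area_cy5=41.0)
dropped = is_significant(sb_cy3=3.0, sb_cy5=2.4, area_cy3=55.0, area_cy5=41.0)
fold = log2_differential(800.0, 200.0)   # a four-fold difference
```

On this scale a four-fold difference corresponds to a value of 2, and equal signals correspond to 0.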
Analysis of Variance
For each gene, an ANOVA test was run using the tissue categories as the grouping variable. The ANOVA tested whether measurements across samples belonging to known categories were associated with those categories. ANOVA compares the Variance between (Vb) categories to the Variance within (Vw) categories. The ratio of Vb divided by Vw (F ratio) was compared to the F distribution for a population of equal degree of freedom (DF) and the probability of the F ratio was returned.
Anova Computation
\[ F = \frac{V_{between}}{V_{within}}, \qquad V_{between} = \frac{\sum N_G (X_G - X_T)^2}{k - 1}, \qquad V_{within} = \frac{\sum (X_i - X_G)^2}{N - k} \]
\[ DF = (N - k) * (k - 1) \]
$X_i$: Individual value; $N_G$: Number of Individuals in Category; $X_G$: Category Mean; $X_T$: Population Mean; $N$: Number of Individuals; $k$: Number of Categories
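The F ratio defined above can be computed directly from grouped measurements, as in the following Python sketch. It returns the F ratio together with the between and within degrees of freedom (k - 1 and N - k); converting F to a probability requires the F distribution and is omitted here. The function name and the example data are hypothetical.

```python
def anova_f(groups):
    """One-way ANOVA F ratio for a list of groups of measurements.

    Returns (F, k - 1, N - k), where k is the number of categories and N the
    number of individual values.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    population_mean = sum(sum(g) for g in groups) / n
    category_means = [sum(g) / len(g) for g in groups]
    # Sum over categories of N_G * (X_G - X_T)^2
    ss_between = sum(len(g) * (m - population_mean) ** 2
                     for g, m in zip(groups, category_means))
    # Sum over all values of (X_i - X_G)^2
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, category_means) for x in g)
    v_between = ss_between / (k - 1)
    v_within = ss_within / (n - k)
    if v_within == 0:
        raise ValueError("variance within categories is zero")
    return v_between / v_within, k - 1, n - k


# Hypothetical log2 expression values for one gene in three tissue categories.
f_ratio, df_between, df_within = anova_f([[1.0, 2.0, 3.0],
                                          [2.0, 3.0, 4.0],
                                          [5.0, 6.0, 7.0]])
```

For this example the category means are 2, 3, and 6, the between-category sum of squares is 26 (variance 13 on 2 degrees of freedom), the within-category sum of squares is 6 (variance 1 on 6 degrees of freedom), and the F ratio is 13.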
The null hypothesis states that if the measurement variations between samples are due to chance only, the variance within categories and variance between categories should be the same. Therefore, in the absence of any significant association between gene expression and tissue categories, the probability returned by ANOVA is equal to 1. Reciprocally, a strong association between gene expression and tissue categories implies that the variance between samples is significantly greater than the variance within categories, and therefore the probability returned by ANOVA is small. The data for the 340 genes shown in Table 3 was used to produce Table 4 which shows that each gene selected for annotation scored an ANOVA probability equal to or below 10^-5.
Gene Annotation
Since the selection criterion required that the variances of measurement within tissue categories be small (see above), it was acceptable to summarize these measurements as the average of the measurements within each tissue category. Furthermore, in order to emphasize differences between tissue categories for each gene, the differences between each tissue average and the all-tissues average were computed; formula and values are shown in Table 5.
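The differential averages described above, and the tissue association rule stated in the paragraph that follows (a minimum differential average of 1.5, with up to four ranked categories), can be sketched in Python. The exact formula is given in Table 5, which is not reproduced here, so two points are assumptions: the all-tissues average is taken as the mean of the per-tissue averages, and the 1.5 minimum is applied to every ranked category. Function names and values are hypothetical.

```python
from statistics import mean


def differential_averages(values_by_tissue):
    """Per-tissue average minus the all-tissues average.

    Assumption: the all-tissues average is the mean of the tissue averages.
    """
    tissue_avg = {t: mean(v) for t, v in values_by_tissue.items()}
    overall = mean(tissue_avg.values())
    return {t: avg - overall for t, avg in tissue_avg.items()}


def rank_tissues(diff_avg, minimum=1.5, max_ranks=4):
    """Tissues ordered by differential average, keeping those at or above minimum."""
    ranked = sorted(diff_avg.items(), key=lambda item: item[1], reverse=True)
    return [tissue for tissue, value in ranked[:max_ranks] if value >= minimum]


# Hypothetical log2 values for one gene.
gene_values = {
    "heart": [4.0, 4.2],
    "liver": [0.1, -0.1],
    "lung": [0.3, 0.1],
}
diffs = differential_averages(gene_values)
assigned = rank_tissues(diffs)
```

In this example only the heart differential average (about 2.67) reaches the minimum, so the gene is associated with a single, primary tissue category.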
Using these differential average values, genes were associated with a primary tissue category according to the highest differential average value. A minimum differential average value of 1.5 was required to associate a gene with a tissue category. When possible, genes were associated with a secondary, tertiary, and even quaternary tissue category according to the second, third, and fourth highest differential average values, respectively.
X Screening Molecules for Specific Binding with the Polynucleotide or Protein
The polynucleotide or fragments thereof and the protein or portions thereof are labeled with 32P-dCTP, Cy3-dCTP, Cy5-dCTP (Amersham Pharmacia Biotech), or BIODIPY or FITC (Molecular Probes), respectively. Candidate molecules or compounds previously arranged on a substrate are incubated in the presence of labeled nucleic or amino acid. After incubation under conditions for either a polynucleotide or protein, the substrate is washed, and any position on the substrate retaining label, which indicates specific binding or complex formation, is assayed. The binding molecule is identified by its arrayed position on the substrate. Data obtained using different concentrations of the nucleic acid or protein are used to calculate affinity between the labeled nucleic acid or protein and the bound molecule. High throughput screening using very small assay volumes and very small amounts of test compound is fully described in Burbaum et al. USPN 5,876,946.
All patents and publications mentioned in the specification are incorporated herein by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention that are obvious to those skilled in the field of molecular biology or related fields are intended to be within the scope of the following claims.
TABLE 1
SEQ ID cDNA SEQ ID cDNA Tissue Description of GenBank Homolog
1 2380381 209 1345551CB1 heart g602702 mitochondrial 2,4-dienoyl-CoA
2 1618422 210 1618422.con heart g285990 mRNA for KIAK0002 gene
3 2672064 211 4646418CB1 heart g1399027 cysteine rich prot 2
4 608361 212 608361.con heart g6808282 mRNA; cDNA DKFZp58600221
5 1922596 213 2483470CB1 heart g3452378 glutamate oxaloacetate transaminase, exon 9
6 1850033 214 1850033CB1 heart g189010 HUMMLC2At
7 986987 215 3030106CB1 heart g809558 mRNA for cardiac myosin bindin
8 718807 216 718807.con heart g1526977 mRNA for ryanodine receptor 2
9 2880435 217 2880435.con heart g1841371 MURR1 mRNA
10 187326 218 187326.con heart g4071059 TNNI3
11 3208425 219 3208425.con skel muscle g28596 aldolase A
12 1668474 220 1668474.con skel muscle g178645 erythroid ankyrin mRNA
13 1622542 221 1622542.con skel muscle g3882276 mRNA for KIAA0778 protein
14 4014318 222 4014318.con skel muscle g306472 DHP-sensitive calcium channel
15 2394888 223 1485879CB1 skel muscle g791039 mRNA for skeletal muscle-specific calpain
16 1345550 224 2637261CB1 skel muscle g179787 carbonic anhydrase III
17 1719955 225 1719955.con skel muscle g34788 mRNA for muscle specific enolase
18 2256026 226 2256026.con skel muscle g1021572 CpG island DNA genomic MseI fragment
19 1538086 227 2501821CB1 skel muscle g1212945 mRNA for guanidinoacetate N-methyltransferase
20 958633 228 1532783CB1 skel muscle g7297634 CG5676 gene product
21 2635943 229 3013501CB1 skel muscle g3153910 muscle glycogen phosphorylase
22 121888 230 3405838CB1 skel muscle g4808812 myosin heavy chain IIa
23 1627492 231 1627492.con skel muscle g3127082 FIP2 alternatively translated mRNA
24 4073867 232 3354111CB1 skel muscle g4426911 phytanoil-CoA alpha hydroxylase
25 2190170 233 1866437CB1 skel muscle g5759308 putative glialblastoma cell differentiation-related protein
26 972224 234 972224.con skel muscle g1220345 myosin light chain 2
27 1413644 235 1413644CB1 skel muscle g409928 adenylyl cyclase-associated prot (CAP2)
28 1538224 236 1538224.con skel muscle g439602 Rad mRNA
29 2623268 237 2623268.con skel muscle g6523810 FEZ2 protein
30 1665533 238 1665533.con skel muscle g1620035 XIB mRNA, complete cds
31 981484 239 981484CB1 skel muscle g34837 MYF6 for muscle determination factor
32 973629 240 973629.con skel muscle g188591 alkali myosin light chain 1
33 1539638 241 1539638CB1 skel muscle g188582 myosin light chain 1 slow a
34 3015758 242 3010791CB1 skel muscle g402646 mRNA for fast MyBP-C
35 2832314 243 2832314.con skel muscle g338826 TCB gene encoding cytosolic thyroid hormone-binding protein
36 1702996 244 912973CB1 skel muscle g180621 cytoplasmic linker protein-170 alpha 2
37 839947 245 122365CB1 skel muscle g337721 ryanodine receptor mRNA
38 1867522 246 974936CB1 skel muscle g1943766 sarcolipin
39 1987831 247 1987831.con skel muscle g180708 calcineurin A2
40 2639708 248 1642009CB1 skel muscle g339964 slow-twitch skeletal troponin I
41 973815 249 2148260CB1 skel muscle g546020 troponin T
42 2079906 250 1852756CB1 Uterus g190153 replication factor C
43 2852042 251 2852042.con Uterus g181070 cysteine-rich peptide mRNA
44 2368282 252 2665890CB1 Ovary g181375 cholesterol side-chain cleavage
45 2831248 253 2831248.con Ovary g32344 gene for heterogeneous RNP
46 182802 254 182802.con Ovary g257052 3 beta-hydroxysteroid dehydrogenase
47 1003884 255 1520287CB1 Ovary g186836 laminin B1 chain
48 1 120 256 1 120.con Ovary g35902 mRNA for ribosomal protein L7
49 1285380 257 1516165CB1 Stomach g7339519 mRNA for procathepsin E
50 1636639 258 1636639.con Stomach g3005731 clone 24747 mRNA sequence
51 1985870 259 1985870.con Stomach g8346840 partial LGALS9 gene for galectin 9 exons
52 1677936 260 3665933CB1 Stomach g31771 mRNA for gastric lipase
53 910612 261 910612.con Stomach g38068 Japanese macaque pepsinogen A-2/3
54 2594407 262 807530CB1 Stomach g1658285 gastricsin mRNA
55 963536 263 963536.con Stomach g35706 pS2 mRNA induced by estrogen
56 434377 264 434377.con Intestine g599833 VE-cadherin mRNA
57 2121863 265 2121863.con Intestine g719268 cysteine-rich heart protein (hCRHP)
58 1597231 266 1597231CB1 Intestine g1185451 cytochrome P450 monooxygenase
59 4174437 267 4174437.con Intestine g181532 defensin 5 gene
60 2182901 268 2182901.con Intestine g181546 defensin 6 mRNA
61 1747979 269 1811382CB1 Intestine g30822 mRNA for diacylglycerol kinase
62 1630553 270 1630553.con Intestine g178285 angiotensin I-converting enzym
63 478960 271 155179CB1 Intestine g6647301 matriptase mRNA
64 2132487 272 2132487.con Intestine g391772 regenerating protein I
65 2921152 273 2921152CB1 Intestine g971462 mRNA for I-15P (I-BABP)
66 1846428 274 1800311CB1 Intestine g183414 guanylin mRNA
67 2796143 275 610574CB1 Intestine g2924619 mRNA for hepatocyte growth factor activator inhibitor type 2
68 1805613 276 1805613 con Intestine g1814276 A33 antigen precursor
69 1431273 277 989613CB1 Intestine g535474 N-benzoyl-L-tyrosyl-p-amino-benzoic acid hydrolase
70 1804662 278 1804662 con Intestine g2058317 mRNA for putative carboxylesterase
71 2921194 279 2921194 con Intestine g2385451 mRNA for galectin-3
72 395368 280 395368 con Intestine g2826520 maltase-glucoamylase
73 2182861 281 1845979CB1 Intestine g454153 intestinal mucin (MUC2)
74 1806436 282 1751028CB1 Intestine g187468 P-glycoprotein (PGY1)
75 2922143 283 1501077CB1 Intestine g36644 si mRNA for sucrase-isomaltase
76 876720 284 3130321CB1 Lung g190845 receptor for advanced glycosylation end products (RAGE)
77 1910091 285 1910091 con Lung g1699037 ABC3 mRNA
78 2174130 286 2174130 con Lung g181467 decay-accelerating factor
79 2219077 287 g6580818 Lung g6580814 indolethylamine N-methyltransferase
80 1965041 288 1965041 con Lung g3882236 mRNA for KIAA0758 protein
81 1649959 289 1649959 con Lung g186729 mesothelial keratin K7
82 1222317 290 1222317CB1 Lung g179916 CAPL protein mRNA
83 2510171 291 939088CB1 Lung g36490 secretory leucocyte protease inhibitor
84 1988674 292 1988674 con Lung g190673 pulmonary surfactant-associated prot B
85 1672640 293 1672640 con Lung g37946 mRNA for pre-pro-von Willebrand
86 1926543 294 g4884115 Liver g23875 3-oxoacyl-CoA thiolase
87 1504934 295 1504934 con Liver g28560 peroxisomal L-alanine glyoxylate aminotransferase
88 2512879 296 2512879 con Liver g178089 class I alcohol dehydrogenase (ADH1) alpha subunit
89 1359832 297 1359832 con Liver g5002378 alcohol dehydrogenase beta2 subunit
90 1583076 298 1583076 con Liver g178147 alcohol dehydrogenase class I gamma subunit
91 139838 299 139838CB1 Liver g178120 class II alcohol dehydrogenase (ADH4)
92 1344654 300 1344654 con Liver g219409 mRNA for alpha-2-plasmin inhibitor
93 2513979 301 2513979 con Liver g28747 mRNA for apolipoprotein AII prec
94 2369312 302 2369312 con Liver g28802 mRNA for precursor of apolipoprotein CI
95 2048364 303 2514629CB1 Liver g28805 mRNA for lipoprotein apoCII
96 85246 304 85246 con Liver g178856 apolipoprotein H
97 166337 305 139825CB1 Liver g178994 liver arginase mRNA
98 138274 306 138274CB1 Liver g179078 asialoglycoprotein receptor H1
99 1633340 307 4285165CB1 Liver g5020419 carbamyl phosphate synthetase I
100 1982416 308 630729CB1 Liver g180255 ceruloplasmin
101 946822 309 946822 con Liver g182389 coagulation factor X
102 2517330 310 2517330 con Liver g179721 complement component C8-gamma
103 2516489 311 272669CB1 Liver g179970 corticosteroid binding globulin
104 88741 312 138361CB1 Liver g180986 cytochrome P450IIA3 (CYP2A3)
105 168865 313 168865CB1 Liver g263688 cytochrome P450 2C [Macaca
106 231779 314 271684CB1 Liver g510085 (clone NF 10) cytochrome P450 nifedipine oxidase
107 234123 315 2513588CB1 Liver g181394 cytosolic epoxide hydrolase
108 1833801 316 1626663CB1 Liver g8164183 22kDa peroxisomal membrane p
109 1923613 317 1923613 con Liver g6523808 carbonyl reductase mRNA
110 2058620 318 2058620 con Liver g7023255 cDNA FLJ10913 fis
111 1930954 319 1965888CB1 Liver g7023313 cDNA FLJ10948 fis
112 1511658 320 1486348 con Liver g182406 fibrinogen alpha subunit
113 2590673 321 2590673 con Liver g188630 flavin-containing monooxygenase form II
114 1995380 322 1995380 con Liver g183655 glutathione S-transferase
115 167409 323 2078240CB1 Liver g31675 mRNA for group-specific component
116 1846226 324 1846226 con Liver g6759555 mRNA for putative progesterone bp
117 2052185 325 185986CB1 Liver g184487 hemopexin mRNA
118 2517389 326 085596CB1 Liver g184391 histidine-rich glycoprotein
119 911015 327 1544305CB1 Liver g2865608 homogentisate 1,2-dioxygenas
120 604856 328 149832CB1 Liver g494988 nicotinamide N-methyltransferase
121 1448718 329 1448718 con Liver g183117 insulin-like growth factor bp
122 2517268 330 2517268 con Liver g33988 mRNA for inter-alpha-trypsin inhibitor
123 167134 331 085011CB1 Liver g33984 second protein of inter-alpha-trypsin inhibitor complex
124 2843638 332 2843638 con Liver g3236285 leptin receptor short form
125 1813269 333 1297817CB1 Liver g180947 carboxylesterase mRNA
126 1861971 334 2517374CB1 Liver g24444 mRNA for alpha 1-acid glycoprotein (orosomucoid)
127 2005973 335 2005973CB1 Liver g189410 oxytocin mRNA
128 2515729 336 2515729CB1 Liver g35896 mRNA for retinol binding protein
129 2132356 337 2132356 con Liver g35689 liver mRNA for protein C
130 1001726 338 2614869CB1 Liver g5834471 mRNA for regucalcin
131 2631845 339 2631845 con Liver g1160968 serum amyloid A
132 86390 340 086390CB1 Liver g337749 serum amyloid A protein
133 1287840 341 2881975CB1 Liver g432974 sterol carrier protein X
134 2516905 342 g5596369 Liver g5596369 transferrin receptor 2 alpha
135 606122 343 606122 con Liver g36712 mRNA for tyrosine aminotransferase
136 3553733 344 2515740CB1 Liver g4530276 lipopolysaccharide-binding p
137 1813381 345 1272023CB1 Liver g36574 mRNA for S-protein
138 1634342 346 1634342 con Kidney g2707821 aldehyde reductase (ALDR1)
139 1418871 347 629242CB1 Kidney g3523100 Ksp-cadherin (CDH16)
140 3766382 348 3766382 con Kidney g2708638 carbonic anhydrase precursor
141 943181 349 3485891CB1 Kidney g521073 mRNA for chloride channel
142 603761 350 3321896CB1 Kidney g1809239 glycoprotein receptor gp330
143 1297562 351 1297562 con Kidney g2213812 podocalyxin-like protein
144 2910715 352 2910715 con Kidney g7768681 genomic DNA, chromosome 21 q
145 196975 353 1612344CB1 Kidney g296365 mRNA for propionyl-CoA carboxylase a-chain
146 1453049 354 1453049 con Kidney g452649 mRNA for lung amiloride sensitive Na+ channel
147 1968695 355 1881237CB1 Kidney g339204 (clone V6) transcobalamin II
148 958344 356 3669695CB1 Kidney g340165 uromodulin (Tamm-Horsfall glycoprotein)
149 254081 357 2776408CB1 Pancreas g537511 alpha-amylase mRNA
150 1330674 358 1330674 con Pancreas g187149 bile salt-activated lipase (BAL)
151 2377834 359 2377834 con Pancreas g35329 mRNA for procarboxypeptidase A1
152 2075464 360 1307376CB1 Pancreas g790226 preprocarboxypeptidase A2
153 2383235 361 4166960CB1 Pancreas g180885 colipase mRNA
154 1285503 362 g180331 Pancreas g180331 cystic fibrosis mRNA, CFTR
155 2383205 363 2383205 con Pancreas g182057 pancreatic elastase IIA mRNA
156 2015871 364 2015871 con Pancreas g607029 elastase III B mRNA
157 2374046 365 2088868CB1 Pancreas g163497 PDI (EC 5.3.4.1)
158 1709828 366 1709828 con Pancreas g325464 endogenous retrovirus type C oncovirus sequence
159 2061119 367 1515152CB1 Pancreas g31107 mRNA for elongation factor 2
160 3665105 368 3665105 con Pancreas g1244511 pancreatic zymogen granule membrane protein GP-2
161 2068983 369 2068983 con Pancreas g893381 mRNA for Reg-related sequence
162 2242648 370 3526170CB1 Pancreas g7023457 cDNA FLJ11041 fis
163 885032 371 5070239CB1 Pancreas g187231 pancreatic lipase related prote
164 2383830 372 949518CB1 Pancreas g190012 phospholipase A-2
165 2085191 373 2085191 con Pancreas g521215 pancreatic trypsin 1
166 2792982 374 2792982 con Pancreas g3928429 mRNA for trypsinogen IV a-form
167 243123 375 787351CB1 Brain g1709300 amyloid precursor-like protei
168 382416 376 382416 con Brain g182736 cerebellar degeneration-assoc prot
169 1852659 377 1852659 con Brain g397934 a2-chimaerin
170 3220181 378 3220181 con Brain g251801 glial fibrillary acidic protein
171 1726307 379 1726307 con Brain g7669991 mRNA, cDNA DKFZp761L0516
172 1904244 380 1904244 con Brain Incyte Unique
173 2039955 381 2039955 con Brain g600118 extensin-like protein
174 2675641 382 2675641 con Brain g189982 testis-specific cAMP-dependent prot kinase catalytic subunit
175 1412749 383 1412749 con Brain g6523828 PI 9 protein mRNA
176 1963854 384 1963854 con Brain g9588045 BRI3
177 2949085 385 2949085 con Brain g3892873 mRNA for GABA-B Rib receptor
178 2963196 386 2963196 con Brain g251801 glial fibrillary acidic protein
179 1505977 387 3493359CB1 Brain g493133 glutamate receptor 2
180 1674985 388 1674985 con Brain g2894085 mRNA for p40
181 2109054 389 2109054 con Brain g5689336 mRNA for EB3 protein
182 3317039 390 3317039 con Brain g3451335 F22162_l
183 2838551 391 2838551 con Brain g4426596 islet-brain 1 mRNA
184 1477568 392 1477568 con Brain g4322560 cell-line OV177 DRR1
185 2963871 393 2963871 con Brain g2865218 integrin binding protein Del-1
186 1740547 394 2847104CB1 Brain g1263035 neuronal membrane glycoprot
187 2292011 395 2292011 con Brain g1710283 neuronal olfactomedin-related ER localized prot
188 1349484 396 1349484 con Brain g3882192 mRNA for KIAA0736 protein
189 1674253 397 1674253 con Brain g1665814 mRNA for KIAA0275 gene
190 1932189 398 1932189 con Brain g307306 neuroendocrine-specific protein A
191 1403041 399 1558165CB1 Brain g687589 (AF1q) mRNA
192 1486358 400 1486358 con Brain g35958 beta-tubulin gene (5-beta)
193 1439065 401 3869211CB1 Brain g2645406 calmodulin-stimulated phosphodiesterase PDE1B1
194 530629 402 530629 con Brain g1710192 clone 23586 mRNA sequence
195 1672676 403 g559331 Brain g559331 mRNA for KIAA0080 gene
196 1989129 404 1989129 con Brain g1503987 mRNA for KIAA0202 gene
197 1486348 405 1486348 con Brain g662277 mRNA for MOBP
198 1397294 406 1397294 con Brain g1236938 transcriptional activator mRN
199 2844322 407 2844322 con Brain g1927201 FEZ1 mRNA
200 1481440 408 1481440 con Brain 3' of g1403054
201 26459 409 026459CB1 Brain g3290199 peanut-like 2 (PNUTL2) mRNA
202 1406786 410 1406786 con Brain g7669991 mRNA, cDNA DKFZp761L0516
203 1485846 411 1485846 con Brain g190084 proteolipid protein
204 2153242 412 2153242 con Brain g5817080
205 2157981 413 3335607CB1 Brain g2921407 EEN-B1 mRNA
206 3244361 414 3244361 con Brain g31657 GAT1 mRNA for GABA transporter
207 1986737 415 1289007CB1 Brain g307287 (clone CCG-B7) mRNA sequence
208 2506867 416 1286746CB1 Brain g35439 mRNA for protein gene product
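The numbering in Table 1 follows a regular pattern: each fragment SEQ ID n is listed next to SEQ ID n + 208, and the tissue groups occupy contiguous SEQ ID ranges (the same ranges recited in the claims). The following is a minimal sketch of that lookup, not part of the application itself; the function and variable names are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: names below are hypothetical, not from the patent.
# Fragment SEQ ID ranges per tissue, as recited in the claims and Table 1.
FRAGMENT_RANGES = {
    "heart": (1, 10),
    "skeletal muscle": (11, 41),
    "uterus": (42, 43),
    "ovary": (44, 48),
    "stomach": (49, 55),
    "intestine": (56, 75),
    "lung": (76, 85),
    "liver": (86, 137),
    "kidney": (138, 148),
    "pancreas": (149, 166),
    "brain": (167, 208),
}

# Each fragment SEQ ID n is paired with SEQ ID n + OFFSET in Table 1.
OFFSET = 208


def partner_seq_id(seq_id: int) -> int:
    """Return the SEQ ID paired with seq_id (fragment <-> second SEQ ID column)."""
    if not 1 <= seq_id <= 2 * OFFSET:
        raise ValueError(f"SEQ ID {seq_id} is outside 1-{2 * OFFSET}")
    return seq_id + OFFSET if seq_id <= OFFSET else seq_id - OFFSET


def tissue_for_seq_id(seq_id: int) -> str:
    """Return the tissue group for any SEQ ID in 1-416."""
    if not 1 <= seq_id <= 2 * OFFSET:
        raise ValueError(f"SEQ ID {seq_id} is outside 1-{2 * OFFSET}")
    fragment_id = seq_id if seq_id <= OFFSET else seq_id - OFFSET
    for tissue, (low, high) in FRAGMENT_RANGES.items():
        if low <= fragment_id <= high:
            return tissue
    raise ValueError(f"SEQ ID {seq_id} is not assigned to a tissue")
```

For example, `partner_seq_id(38)` gives 246, matching the Table 1 row for cDNA 1867522, and both IDs resolve to skeletal muscle.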
Table 4
Table 5
Mean(tissue) - Mean(Entire Set)
Clone ID, followed by twelve per-tissue values and the tissues ranked First, Second, Third, Fourth
2380381 1.65 0.20 -0.89 -0.30 -0.30 0.12 -0.77 0.85 0.83 -0.92 -0.84 -0.84 Heart
1618422 1.52 -1.16 0.93 0.04 -0.17 0.70 0.17 -1.08 -1.13 -0.18 0.05 0.48 Heart
2672064 2.10 -0.40 0.69 -0.74 -0.08 -0.44 0.40 -1.42 0.24 -1.66 -0.49 0.40 Heart
608361 1.98 0.20 0.65 -0.87 0.32 -0.02 -0.06 -0.51 -1.07 0.82 -0.15 -1.00 Heart
1922596 1.91 1.44 -1.81 -1.69 -0.60 -0.68 -1.61 1.42 0.77 -1.35 -1.47 0.83 Heart
1850033 3.78 0.67 0.02 -0.40 -0.12 -0.31 -0.67 -0.95 -0.87 -0.47 -0.56 -0.73 Heart
986987 5.33 0.26 -0.34 -0.64 -0.41 -0.70 -0.54 -0.81 -0.95 -0.75 -0.56 -1.07 Heart
718807 3.97 -0.17 -0.83 -0.31 -0.45 -0.62 -0.50 -0.83 -1.35 0.24 -0.82 1.20 Heart
2880435 2.07 -0.58 0.23 -0.49 -0.34 -0.16 -0.31 -0.55 -0.04 -0.49 -0.18 0.14 Heart
187326 5.39 1.41 -0.40 -0.66 -0.69 -0.88 -0.77 -0.78 -1.37 -0.50 -0.82 -1.07 Heart
1997963 2.95 2.64 -0.95 -0.90 -0.41 -0.48 -0.93 -1.25 0.31 -0.95 -1.64 -0.07 Heart Sk Muscle
467700 4.14 2.42 0.61 -1.24 0.49 -0.07 -1.16 -1.32 -1.39 -0.97 -1.06 -1.56 Heart Sk Muscle
57382 4.06 3.92 -0.77 -0.88 -0.29 -0.63 -1.72 -1.75 -0.46 -1.55 -1.72 -0.17 Heart Sk Muscle
1222442 2.19 1.68 -1.66 -1.60 -0.29 -0.13 -1.52 0.27 0.90 -1.02 -1.06 -0.16 Heart Sk Muscle
4013105 4.01 2.66 0.37 -0.76 -1.22 -1.77 -1.43 -1.92 0.39 -1.66 -0.50 -0.07 Heart Sk Muscle
924319 5.61 3.10 -0.95 -0.92 -1.24 -1.38 0.98 -1.30 -1.73 -1.16 -1.26 -1.87 Heart Sk Muscle
1645119 2.75 1.84 -0.69 -0.66 -0.80 -0.51 -0.84 -0.30 -0.46 -0.59 -0.65 -0.42 Heart Sk Muscle
1379925 3.79 3.38 -0.81 -1.08 -0.95 -1.21 -1.08 -0.90 -0.79 -1.19 -0.46 -0.78 Heart Sk Muscle Uterus
1900961 3.18 2.30 1.54 -0.62 -0.28 0.35 -1.44 -1.16 -0.96 -1.05 -1.24 -1.33 Heart Sk Muscle Spleen
3506985 2.79 2.50 -1.72 -2.06 -0.44 0.21 0.37 -1.24 -1.79 -1.64 2.04 -2.23 Heart Uterus
551403 1.78 1.40 1.66 -0.15 0.50 -1.20 -1.18 -1.21 -1.16 0.17 -1.32 1.45 Heart Uterus Brain Sk Muscle
3948420 2.03 1.78 1.93 0.00 0.87 -1.17 -1.05 -2.16 -1.31 0.19 -1.90 1.84 Heart Ovary
1722853 2.41 -0.27 -0.48 1.86 0.26 -0.55 -0.14 -0.54 0.03 -0.94 -1.14 0.03 Heart Brain
1557490 2.70 -0.60 0.95 -0.84 -0.11 -1.18 -0.74 -1.75 -0.25 -1.04 -0.07 1.94 Heart Brain
3208425 1.31 2.87 -0.39 -0.70 -0.34 -0.05 -0.86 -1.59 0.11 -1.26 -1.36 0.69 Sk Muscle
1668474 1.37 3.73 -0.50 -0.36 -0.22 -0.86 -0.84 -0.87 -1.10 -0.42 -0.51 -0.18 Sk Muscle
1622542 -0.24 2.06 -0.66 -0.66 -0.34 0.05 -0.68 -0.79 -0.66 -0.43 0.09 1.19 Sk Muscle
4014318 -0.24 3.10 0.17 0.10 -0.16 -0.35 -0.41 -0.43 -0.66 -0.16 -0.05 -0.96 Sk Muscle
2394888 -1.00 3.41 -0.90 -0.84 -1.11 -1.14 -0.12 0.02 0.46 -1.39 -0.77 1.23 Sk Muscle
1345550 -0.59 6.89 0.09 -0.84 -0.69 -0.92 -0.35 -0.84 -1.39 -1.15 -0.62 -1.12 Sk Muscle
1719955 1.25 4.16 -1.39 -1.25 -1.08 -1.21 -0.40 -0.09 0.16 -1.84 -1.49 0.30 Sk Muscle
2256026 0.01 3.03 0.13 -0.10 -0.46 -0.37 -0.48 -0.32 -0.75 0.13 -0.27 -0.64 Sk Muscle
1538086 0.13 2.04 -0.91 -0.74 -0.65 -0.73 -1.04 1.46 0.17 0.26 -0.81 -0.01 Sk Muscle
958633 0.99 1.57 0.44 0.74 -0.61 -0.63 -0.53 -0.77 0.06 0.30 -0.77 -0.16 Sk Muscle
2635943 0.59 5.72 -0.26 -0.76 -0.34 -0.47 -1.10 -0.92 -1.21 -1.32 -0.94 -0.57 Sk Muscle
121888 1.23 5.92 -0.61 -0.55 -0.43 -1.24 -0.92 -0.72 -1.13 -0.90 -0.96 -0.99 Sk Muscle
1627492 0.30 2.40 0.10 0.85 -0.41 -0.15 -0.69 -0.90 -0.11 -0.83 -0.73 0.08 Sk Muscle
4073867 0.35 1.62 -0.56 -0.18 -0.45 -0.18 -1.16 1.37 0.61 -0.67 -0.73 -0.92 Sk Muscle
2190170 0.20 2.24 0.39 0.43 -0.31 -0.25 -0.28 -1.05 -0.29 -1.01 -0.44 0.01 Sk Muscle
972224 0.44 6.70 0.13 -0.38 -0.09 -0.68 -0.89 -1.14 -1.21 -1.45 -1.08 -1.56 Sk Muscle
1413644 0.61 3.15 0.36 -0.74 -0.26 -0.14 -1.19 -1.37 -0.85 -1.26 -0.96 1.36 Sk Muscle
1538224 1.11 1.71 -0.07 -0.33 -0.82 -0.61 0.06 -0.11 -0.11 -0.65 -0.70 -0.44 Sk Muscle
2623268 0.14 2.07 0.97 0.23 0.65 -0.15 -0.61 -1.52 -0.23 -1.65 -0.74 0.44 Sk Muscle
1665533 0.46 1.93 -0.12 -0.24 -0.46 -0.41 0.02 -0.83 0.02 -0.47 -0.14 -0.41 Sk Muscle
981484 -0.04 2.58 -0.54 -0.60 -0.05 -0.33 -0.40 -0.41 -0.49 -0.10 -0.28 0.00 Sk Muscle
973629 0.70 6.01 -0.53 -0.77 -0.35 -0.67 -1.03 -1.15 -1.02 -1.01 -0.55 -1.09 Sk Muscle
1539638 -0.34 1.69 0.64 0.51 0.05 0.28 -0.04 -1.22 0.04 -1.21 -0.58 -0.06 Sk Muscle
3015758 0.08 5.84 -0.58 -0.53 -0.48 -0.99 -0.83 -0.67 -1.21 0.29 -0.45 -0.98 Sk Muscle
2832314 0.76 1.81 0.00 -0.19 -0.19 -0.23 -0.02 -2.79 0.85 -2.07 -0.55 1.05 Sk Muscle
1702996 0.08 2.32 1.21 -0.20 0.46 -0.04 -0.39 -1.14 -0.50 -0.64 -0.92 -0.25 Sk Muscle
839947 -0.11 2.43 -1.06 -1.33 -0.14 -0.37 -0.19 -0.25 -0.44 -0.37 -0.19 0.48 Sk Muscle
1867522 0.02 7.47 -1.55 -1.81 -0.56 -1.54 -0.75 -0.41 -0.97 -1.23 -0.73 -0.67 Sk Muscle
1987831 0.25 1.63 0.00 X -0.32 -0.37 -0.50 -1.00 -0.76 X -0.28 1.24 Sk Muscle
2639708 0.36 5.54 -0.42 -0.82 -0.66 -0.56 -0.83 -0.92 -1.06 -0.53 -0.56 -0.79 Sk Muscle
973815 0.84 5.96 -0.04 -0.41 -0.44 -0.88 -0.80 -1.33 -1.43 -0.90 -0.80 -0.80 Sk Muscle
169884 2.50 4.51 -1.30 X -0.70 -0.90 -1.18 -0.81 -1.64 X -0.79 -0.38 Sk Muscle Heart
2638235 2.16 2.34 -0.60 -0.88 0.18 -0.67 0.02 -0.69 -0.48 -0.49 -0.92 -1.02 Sk Muscle Heart
1635004 -1.17 -0.42 -1.16 -0.93 2.81 4.64 -0.85 0.03 -1.58 0.01 -0.20 -1.23 Intestine Stomach
2132752 -1.41 -0.38 -0.65 -0.11 3.20 4.19 -0.36 -1.12 -0.55 -0.62 -0.60 -1.22 Intestine Stomach
1734393 -0.90 -0.40 -1.03 -0.46 3.07 4.66 -0.97 -0.75 -1.25 0.05 -0.88 -0.74 Intestine Stomach
4179338 -0.66 -1.35 -2.01 -2.02 -0.67 5.83 -1.52 5.72 -1.81 -1.66 -1.65 -1.55 Intestine Liver
1427623 -0.78 -0.68 -0.96 X -0.38 3.63 -1.29 2.56 -0.33 X -1.11 -1.11 Intestine Liver
3320987 -0.84 -0.75 0.84 -1.27 -1.16 3.89 -1.23 -0.15 2.47 -1.62 -0.22 -1.99 Intestine Kidney
2239819 -0.98 -0.11 -1.12 -1.14 -0.85 3.63 -1.12 -0.88 2.68 2.13 -1.02 -1.21 Intestine Kidney Pancreas
876720 -0.41 -0.37 -0.42 -0.58 -0.57 -0.44 3.45 -0.26 -0.50 -0.42 0.01 -0.49 Lung
1910091 -0.31 -0.05 -0.23 -0.64 -0.47 -0.31 2.13 -0.60 -0.47 -0.28 -0.43 0.81 Lung
2174130 -0.54 -0.18 -0.05 0.96 0.69 0.19 1.53 -0.72 -0.44 -0.67 0.40 -0.76 Lung
2219077 0.35 -0.33 -0.02 0.05 -0.32 -0.42 2.07 -0.17 -0.36 -0.44 -0.16 -0.62 Lung
1965041 0.63 0.06 0.31 -0.75 -0.84 -1.16 1.83 -0.93 0.76 -0.75 0.54 -0.88 Lung
1649959 -0.98 -0.10 -0.62 -0.99 0.86 0.64 2.91 -0.32 0.71 -0.38 -1.27 -1.30 Lung
1222317 0.01 -0.03 0.56 -0.98 0.16 0.13 2.15 -1.40 -0.21 -0.91 0.61 -1.11 Lung
2510171 -0.48 -0.41 -1.08 -1.20 0.71 -0.67 3.38 0.43 0.16 -0.20 -0.94 -0.74 Lung
1988674 -0.75 -0.48 -0.40 -0.64 -0.16 0.02 2.93 -0.47 -0.67 -0.51 0.42 -0.25 Lung
1672640 1.48 0.55 1.07 0.52 -0.27 -0.50 1.62 -1.65 -1.31 -1.22 0.51 -1.00 Lung
1749417 -1.09 -1.44 -0.37 -0.36 0.29 -0.28 2.35 -0.21 2.23 -0.89 -0.37 -0.72 Lung Kidney
1926543 -0.06 -0.67 -0.86 -0.64 -0.25 -0.01 -0.41 2.06 0.89 -0.32 -0.37 -0.37 Liver
1504934 -0.39 0.09 -0.47 -0.20 -0.82 -0.85 -0.90 4.76 0.13 -0.53 -0.75 -0.99 Liver
2512879 -0.69 -0.39 -0.31 -0.51 0.76 0.37 0.99 3.69 -0.67 -1.60 -0.49 -2.40 Liver
1359832 -0.16 -0.67 0.00 -0.27 0.65 0.47 1.33 4.09 -0.31 -1.78 -0.93 -3.57 Liver
1583076 -0.46 -0.73 -0.49 -0.69 1.12 0.66 0.83 3.73 -0.60 -1.46 -0.74 -2.40 Liver
139838 -0.37 -0.49 -1.25 -0.60 -0.40 -0.70 -0.61 5.66 -0.76 -0.52 -0.51 -0.75 Liver
1344654 -0.71 -0.46 -0.67 -0.71 -0.83 -0.70 -0.25 3.71 1.41 -0.51 -0.58 -1.02 Liver
2513979 -0.73 0.06 -0.56 -0.32 -0.56 -1.00 -0.56 6.13 -0.83 -0.91 -0.90 -1.02 Liver
2369312 -1.30 -0.46 -0.91 0.66 -1.07 -1.77 0.95 5.81 -1.91 -1.74 1.06 -0.70 Liver
2048364 -0.83 -0.10 -1.45 -1.20 -1.00 -0.63 -0.30 6.12 -0.95 -0.93 -1.18 0.20 Liver
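The quantity tabulated in Table 5, Mean(tissue) - Mean(Entire Set), can be illustrated with a short sketch. It assumes, without confirmation from the text reproduced here, that "Entire Set" means all samples pooled across tissues; the function names, the input layout, and the `threshold` cutoff are hypothetical and serve only to show how a ranked First/Second/Third tissue list could be derived from such differences.

```python
from statistics import mean


def tissue_minus_set(expr_by_tissue):
    """Return {tissue: mean(tissue samples) - mean(all samples)} for one clone.

    expr_by_tissue maps a tissue name to a list of expression values for that
    clone (hypothetical input layout; units are whatever the array reports).
    """
    all_values = [v for values in expr_by_tissue.values() for v in values]
    overall = mean(all_values)  # assumed meaning of "Mean(Entire Set)"
    return {t: mean(values) - overall for t, values in expr_by_tissue.items()}


def ranked_tissues(differences, threshold=1.0):
    """List tissues whose difference meets the (hypothetical) threshold,
    largest difference first, mirroring the First/Second/... columns."""
    hits = [(t, d) for t, d in differences.items() if d >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return [t for t, _ in hits]
```

With toy values `{"heart": [5.0, 7.0], "liver": [1.0, 1.0], "lung": [2.0, 2.0]}`, the pooled mean is 3.0, the heart difference is 3.0, and only heart passes a threshold of 1.0.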

Claims

What is claimed is:
1. A plurality of cell and tissue specific polynucleotides selected from SEQ ID NOs:1-416 or the complement thereof.
2. A subset of the polynucleotides of claim 1, wherein the subset is selected from at least one of the groups consisting of:
a) SEQ ID NOs:209-218 and 1-10, cell specific polynucleotides of heart and fragments thereof;
b) SEQ ID NOs:219-249 and 11-41, cell specific polynucleotides of skeletal muscle and fragments thereof;
c) SEQ ID NOs:250-251 and 42-43, cell specific polynucleotides of uterus and fragments thereof;
d) SEQ ID NOs:252-256 and 44-48, cell specific polynucleotides of ovary and fragments thereof;
e) SEQ ID NOs:257-263 and 49-55, cell specific polynucleotides of stomach and fragments thereof;
f) SEQ ID NOs:264-283 and 56-75, cell specific polynucleotides of intestine and fragments thereof;
g) SEQ ID NOs:284-293 and 76-85, cell specific polynucleotides of lung and fragments thereof;
h) SEQ ID NOs:294-345 and 86-137, cell specific polynucleotides of liver and fragments thereof;
i) SEQ ID NOs:346-356 and 138-148, cell specific polynucleotides of kidney and fragments thereof;
j) SEQ ID NOs:357-374 and 149-166, cell specific polynucleotides of pancreas and fragments thereof; and
k) SEQ ID NOs:375-416 and 167-208, cell specific polynucleotides of brain and fragments thereof.
2. The composition of claim 1, wherein the polynucleotides are immobilized on a substrate.
3. A high throughput method for detecting expression of a polynucleotide in a sample, the method comprising: a) hybridizing the polynucleotides of claim 1 with the nucleic acids of the sample under conditions to form a hybridization complex; and b) detecting the hybridization complex, wherein the presence of the hybridization complex indicates expression of the polynucleotide in the sample.
4. The method of claim 3 wherein the nucleic acids of the sample are amplified prior to hybridization.
5. The method of claim 3 wherein hybridization complex formation indicates the differentiation of embryonic stem cells into a tissue selected from the group consisting of brain, heart, kidney, liver, lung, muscle or pancreatic tissues.
6. A high throughput method of screening molecules or compounds to identify a ligand, the method comprising: a) combining the polynucleotides of claim 1 with molecules or compounds under conditions to allow specific binding; and b) detecting specific binding, thereby identifying a ligand which specifically binds to the composition.
7. The method of claim 6 wherein the molecules or compounds are selected from DNA molecules, RNA molecules, peptide nucleic acids, mimetics, peptides, and proteins.
8. An isolated polynucleotide selected from SEQ ID NOs:212, 228, 233, 259, 271, 287, 316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof.
9. The polynucleotides of claim 8 wherein the fragments are SEQ ID NOs:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and 204, respectively.
10. An expression vector containing a polynucleotide of claim 8.
11. A host cell containing the expression vector of claim 10.
12. A method for producing a protein, the method comprising the steps of: (a) culturing the host cell of claim 11 under conditions for the expression of protein; and
(b) recovering the protein from the host cell culture.
13. A protein produced by the method of claim 12.
14. A high-throughput method for screening a library of molecules or compounds to identify at least one ligand which specifically binds a protein, the method comprising: (a) combining the protein of claim 13 with the library under conditions to allow specific binding; and
(b) detecting specific binding between the protein and a molecule or compound, thereby identifying a ligand which specifically binds the protein.
15. The method of claim 14 wherein the library is selected from DNA molecules, RNA molecules, peptide nucleic acids, mimetics, peptides, proteins, agonists, antagonists, antibodies or their fragments, immunoglobulins, inhibitors, drug compounds, and pharmaceutical agents.
16. A method of purifying a ligand from a sample, the method comprising: a) combining the protein of claim 13 with a sample under conditions to allow specific binding; b) recovering the bound protein; and c) separating the protein from the ligand, thereby obtaining purified ligand.
17. A composition comprising the protein of claim 13 in conjunction with a pharmaceutical carrier.
18. A purified antibody that specifically binds to the protein of claim 13.
EP00976921A 1999-11-04 2000-11-02 Tissue specific genes of diagnostic import Withdrawn EP1255859A2 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16350899P 1999-11-04 1999-11-04
US163508P 1999-11-04
PCT/US2000/030396 WO2001032927A2 (en) 1999-11-04 2000-11-02 Tissue specific genes of diagnostic import

Publications (1)

Publication Number Publication Date
EP1255859A2 true EP1255859A2 (en) 2002-11-13

Family

ID=22590327

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00976921A Withdrawn EP1255859A2 (en) 1999-11-04 2000-11-02 Tissue specific genes of diagnostic import

Country Status (5)

Country Link
EP (1) EP1255859A2 (en)
JP (1) JP2004507206A (en)
AU (1) AU1462801A (en)
CA (1) CA2388511A1 (en)
WO (1) WO2001032927A2 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6962779B1 (en) 1998-10-02 2005-11-08 Diadexus, Inc. Method of diagnosing, monitoring, staging, imaging and treating gastrointestinal cancers
JP2004515220A (en) * 2000-07-21 2004-05-27 インサイト・ゲノミックス・インコーポレイテッド Protease
US20050112568A1 (en) * 2001-06-05 2005-05-26 Lori Friedman Dgks as modifiers of the p53 pathway and methods of use
US6905827B2 (en) 2001-06-08 2005-06-14 Expression Diagnostics, Inc. Methods and compositions for diagnosing or monitoring auto immune and chronic inflammatory diseases
US6815181B2 (en) 2001-07-09 2004-11-09 Applera Corporation Nucleic acid molecules encoding human secreted hemopexin-related proteins
GB0117631D0 (en) * 2001-07-19 2001-09-12 Syngenta Ltd Improvements in or relating to organic compounds
EP1281756A1 (en) * 2001-07-31 2003-02-05 GENOPIA Biomedical GmbH Regulator of calcineurin
EP1434876A4 (en) * 2001-09-11 2005-05-25 Univ Colorado Regents Expression profiling in the intact human heart
US7504222B2 (en) 2001-10-31 2009-03-17 Millennium Pharmaceuticals, Inc. Compositions, kits, and methods for identification, assessment, prevention, and therapy of breast cancer
AU2003241897A1 (en) * 2002-05-29 2003-12-12 Kyowa Hakko Kogyo Co., Ltd. Novel ubiquitin ligase
AU2003285877A1 (en) * 2002-11-14 2004-06-03 Eli Lilly And Company Novel proteins and their uses
WO2005068657A2 (en) * 2004-01-20 2005-07-28 Yissum Research Development Company Of The Hebrew University Of Jerusalem Genetically profiled cell lines (gpcl) and methods of utilizing same for genetic dissection of cellular phenotypes
CA2554585A1 (en) * 2004-01-27 2005-08-04 Compugen Ltd. Novel nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of cardiac disease
US7345142B2 (en) 2004-01-27 2008-03-18 Compugen Ltd. Nucleotide and amino acid sequences, and assays and methods of use thereof for diagnosis of cardiac disease
EP1851543A2 (en) 2005-02-24 2007-11-07 Compugen Ltd. Novel diagnostic markers, especially for in vivo imaging, and assays and methods of use thereof

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5663315A (en) * 1994-12-06 1997-09-02 Alphagene, Inc. Isolated DNA encoding human GP2
EP1027456B1 (en) * 1997-10-31 2005-03-16 Affymetrix, Inc. (a Delaware Corporation) Expression profiles in adult and fetal organs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO0132927A2 *

Also Published As

Publication number Publication date
AU1462801A (en) 2001-05-14
WO2001032927A2 (en) 2001-05-10
CA2388511A1 (en) 2001-05-10
WO2001032927A3 (en) 2002-08-15
JP2004507206A (en) 2004-03-11


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020522

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20050203