MXPA96003077A

MXPA96003077A - Comparative analysis of the transcription of the

Info

Publication number: MXPA96003077A
Application number: MXPA/A/1996/003077A
Authority: MX
Inventors: Jeffrey J. Seilhamer; Randal W. Scott
Original assignee: Incyte Pharmaceuticals Inc
Priority date: 1994-01-27
Filing date: 1996-07-29
Publication date: 2000-01-01

Abstract

A method and system for quantifying the relative abundance of gene transcripts in a biological specimen. One embodiment of the method performs high-throughput sequence-specific analysis of multiple RNAs or their corresponding cDNAs (gene transcription image analysis). Another embodiment of the method produces a gene transcription image analysis through high-throughput cDNA sequence analysis. In addition, gene transcription images can be compared and used to detect or diagnose a particular condition, disease or biological state that correlates with the relative abundance of gene transcripts in a given cell or cell population. The invention provides a method for comparing the gene transcription image analyses of two or more different biological specimens in order to distinguish between the specimens and to identify one or more genes that are differentially expressed between them.

Description

COMPARATIVE ANALYSIS OF GENETIC TRANSCRIPTION

1. FIELD OF THE INVENTION

The present invention pertains to the fields of molecular biology and computer science; more particularly, it describes methods for analyzing gene transcripts and diagnosing the gene expression of cells and tissues.

2. BACKGROUND OF THE INVENTION

Until very recently, molecular biology had described one gene at a time. Scientists observed physical changes in the cell, isolated mixtures from the cell or its medium, purified proteins, sequenced those proteins, and from these sequences constructed probes to search for the corresponding gene. Recently, several countries have launched massive projects to sequence the billions of bases in the human genome. These projects typically begin by dividing the genome into large chromosomal portions and then determining the sequences of these pieces, which are then analyzed for identity with known proteins or portions of them, known as motifs. Unfortunately, most genomic DNA does not encode proteins, and although it is postulated to have some effect on the cell's ability to make protein, its importance in medical applications is currently not understood. A third approach involves sequencing only the transcripts that encode the cellular machinery actively involved in making protein, namely mRNA.

The advantage is that the cell has already edited out all the noncoding DNA, and it is relatively easy to identify the protein-encoding RNA. The usefulness of this approach was not immediately obvious to genome researchers. In fact, when sequencing of cDNA was first proposed, the method was strongly denounced by those in charge of sequencing genes. For example, the head of the Human Genome Project in the United States dismissed cDNA sequencing as not valuable and refused to approve financial support for such projects. In this description, we show methods for analyzing DNA, including cDNA libraries. Based on our analysis and research, we view each individual gene product as a "pixel" of information, which relates to the expression of that, and only that, gene. We show herein methods by which individual "pixels" of gene expression information can be combined into a single "image" of gene transcription, in which each of the individual genes can be visualized simultaneously and the relationships between the gene pixels are easily visualized and understood. We also show a new method that we call electronic subtraction. Electronic subtraction allows the gene researcher to convert a single image into a moving image, which describes the temporality or dynamics of gene expression at the level of a cell or a whole tissue. It is this sense of "movement" of the cellular machinery on the scale of a cell or organ that constitutes the new invention herein. This constitutes a new view of the physiology of the living cell and holds great promise for revealing and discovering new therapeutic and diagnostic approaches in medicine. We show another method that we call the "electronic Northern", which tracks the expression of a single gene across many types of cells and tissues.
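The "electronic Northern" can be pictured as a query over per-library transcript counts: fix one gene and read its relative abundance in every library. The sketch below is a minimal illustration under stated assumptions, not the patent's own program: it assumes each library has already been reduced to a mapping from gene annotation to clone count, and the function name `electronic_northern`, the library names and the counts are all hypothetical.

```python
from collections import Counter


def electronic_northern(libraries: dict[str, Counter], gene: str) -> dict[str, float]:
    """Return the relative abundance of one gene in each library.

    `libraries` maps a library name to a Counter of clone counts per gene
    annotation (a hypothetical data layout). Relative abundance is the gene's
    count divided by the total number of clones in that library; empty
    libraries are skipped to avoid dividing by zero.
    """
    profile = {}
    for name, counts in libraries.items():
        total = sum(counts.values())
        if total == 0:
            continue
        profile[name] = counts.get(gene, 0) / total
    return profile


if __name__ == "__main__":
    # Hypothetical toy counts, not data from the patent.
    libs = {
        "monocyte": Counter({"GENE_A": 30, "GENE_B": 70}),
        "macrophage": Counter({"GENE_A": 5, "GENE_B": 45}),
        "endothelial": Counter({"GENE_B": 20}),
    }
    for lib, freq in electronic_northern(libs, "GENE_A").items():
        print(f"{lib}\t{freq:.3f}")
```

Normalizing by library size is a design choice of this sketch: it keeps libraries of different depths comparable, which raw clone counts would not.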
Nucleic acids (DNA and RNA) carry the hereditary information within their sequences and are therefore the primary molecules of life. Nucleic acids are found in all living organisms, including bacteria, fungi, viruses, plants and animals. It is of interest to determine the relative abundance of different discrete nucleic acids in different cells, tissues and organisms over time under different conditions, treatments and regimens. All dividing cells in the human body contain the same set of 23 pairs of chromosomes. It is estimated that these autosomal and sex chromosomes encode approximately 100,000 genes. The differences between cell types are believed to reflect the differential expression of these approximately 100,000 genes. Fundamental questions of biology could be answered by understanding which genes are transcribed and knowing the abundance of the transcripts in different strains. Previously, the art measured only a few genes known at the time, using standard molecular biology techniques such as the polymerase chain reaction (PCR), Northern blot analysis, or other types of DNA probe analysis such as in situ hybridization. Each of these methods allows one to analyze the transcription of only known genes and/or a small number of genes at a time. Nucl. Acids Res. 19, 7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18, 2789-92 (1989); European J. Neuroscience 2, 1063-1973 (1990); Analytical Biochem. 187, 364-73 (1990); Genet. Anal. Tech. Appl. 7, 64-70 (1990); GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988); Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose transcription is induced or otherwise regulated during cellular processes such as activation, differentiation, aging, viral transformation, morphogenesis and mitosis have been pursued for many years using a variety of methodologies. One of the oldest methods was to isolate and analyze protein levels in a cell, tissue, organ system, or even an organism before and after the process of interest. One method for analyzing multiple proteins in a sample is two-dimensional gel electrophoresis, in which the proteins can, in principle, be identified and quantified as individual bands and ultimately reduced to a discrete signal. Currently, two-dimensional analysis resolves only approximately 15 percent of proteins. To positively analyze the bands that are resolved, each band must be excised from the membrane and subjected to protein sequence analysis by Edman degradation. Many of the bands were present in quantities too small to obtain a reliable sequence, and many of those bands contained more than one discrete protein. An additional difficulty is that many of the proteins were blocked at the amino terminus, further complicating the sequencing process.

Analyzing differences in the level of gene transcription has overcome many of these disadvantages and drawbacks, since the power of recombinant DNA technology allows signals from very small amounts of material to be amplified. The most common method, called "subtraction hybridization", involves isolating mRNA from the biological specimen before (B) and after (A) the developmental process of interest, transcribing one set of mRNA into cDNA, subtracting specimen B from specimen A (mRNA:cDNA) by hybridization, and constructing a cDNA library from the non-hybridizing mRNA fraction.
Many different groups have used this strategy successfully, and a variety of procedures have been published and improved using the same basic scheme. Nucl. Acids Res. 19, 7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18, 2789-92 (1989); European J. Neuroscience 2, 1063-1973 (1990); Analytical Biochem. 187, 364-73 (1990); Genet. Anal. Tech. Appl. 7, 64-70 (1990); GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular strengths and weaknesses, these methods still have some limitations and undesirable aspects. First, the time and effort required to build these libraries is quite large. Typically, a trained molecular biologist might expect the construction and characterization of such a library to require 3 to 6 months, depending on skill, experience and luck. Second, subtraction libraries are typically inferior to libraries constructed using standard methodology. A typical conventional cDNA library should have a clone complexity of at least 10^6 clones and an average insert size of 1-3 kb. In contrast, subtraction libraries can have complexities of 10^2 or 10^3 and average insert sizes of 0.2 kb. Therefore, there may be a significant loss of clones and sequence information associated with these libraries. Third, this approach allows the researcher to capture only the genes induced in specimen A relative to specimen B, not vice versa, nor does it easily allow comparison with a third specimen of interest (C). Fourth, this approach requires large quantities (hundreds of micrograms) of "driver" mRNA (specimen B), which significantly limits the number and type of subtractions that are possible, since many tissues and cells are very difficult to obtain in large amounts.

Fifth, the resolution of the subtraction depends on the physical properties of DNA:DNA or RNA:DNA hybridization. The ability of a given sequence to find a hybridization partner depends on its own C0t value. The C0t value is a function of the number of copies (concentration) of the particular sequence multiplied by the hybridization time. It follows that, for abundant sequences, hybridization events occur very rapidly (low C0t value), while rare sequences form duplexes only at very high C0t values. The C0t values that allow these rare sequences to form duplexes, and therefore be selected effectively, are difficult to achieve in a convenient time frame. Therefore, subtraction hybridization is simply not a useful technique for studying the relative levels of rare mRNA species. Sixth, this problem is further complicated by the fact that duplex formation also depends on the nucleotide base composition of a given sequence. Sequences rich in G + C form stronger duplexes than those with a high A + T content. Therefore, G + C-rich sequences will tend to be selectively removed by subtraction hybridization. Seventh, hybridization between inexact matches may occur. When this happens, the expression of a homologous gene can "mask" the expression of a gene of interest, artificially biasing the results for that particular gene.

Matsubara and Okubo proposed using partial cDNA sequences to establish gene expression profiles that could be used in functional analysis of the human genome. Matsubara and Okubo warned of the danger of using random priming, because it creates multiple unique DNA fragments from individual mRNAs and can thus bias the count of particular mRNAs per library. They sequenced randomly selected members of a 3'-directed cDNA library and established the frequency of appearance of the various ESTs. They proposed comparing EST lists from various cell types to classify genes.
Genes expressed in many cell types were termed housekeeping genes, and those expressed in certain cells were called cell-specific genes, even in the absence of the complete sequence of the gene or knowledge of the biological activity of the gene product. The present invention avoids the drawbacks of the prior art by providing a method for quantifying the relative abundance of the transcripts of multiple genes in a given biological specimen through high-throughput sequence-specific analysis of individual RNAs and/or their corresponding cDNAs. The present invention offers many advantages over current protein discovery methods that attempt to isolate individual proteins based on biological effects. The method of the present invention provides detailed diagnostic comparisons of cell profiles that reveal numerous changes in the expression of individual transcripts. The present invention also offers many advantages over current subtraction methods, including analysis of more complex libraries (10^6 to 10^7 clones compared with 10^3 clones), which allows the identification of low-abundance messages as well as messages that increase or decrease in abundance. These large libraries are routine to construct, in contrast to the libraries of previous methods. Further, homologs can easily be distinguished with the method of the present invention. This method is very convenient because it organizes a large amount of data in a digestible, understandable format.

Electronic subtraction highlights the most significant differences, making in-depth analyses more convenient. The present invention provides many advantages over previous methods of electronic cDNA analysis. The method is particularly powerful when analyzing more than 100, and preferably more than 1,000, gene transcripts. In this case, new low-frequency transcripts are discovered and tissues are typed. High-resolution analyses of gene expression can be used directly as a diagnostic profile or to identify specific disease genes for the development of more classical diagnostic approaches. This process is defined as gene transcription frequency analysis. The resulting quantitative analysis of gene transcripts is defined as comparative gene transcript analysis.

3. SUMMARY OF THE INVENTION

The invention is a method for analyzing a specimen containing gene transcripts, comprising the steps of (a) producing a library of biological sequences; (b) generating a set of transcript sequences, wherein each transcript sequence in said set is indicative of a different one of the biological sequences of the library; (c) processing the transcript sequences in a programmed computer, in which a reference database of transcript sequences indicative of reference sequences is stored, to generate an identified sequence value for each of the transcript sequences, wherein each identified sequence value is indicative of a sequence annotation and of a degree of match between one of the biological sequences of the library and at least one of the reference sequences; and (d) processing each identified sequence value to generate final data values indicative of the number of times each identified sequence value is present in the library. The invention also includes a method for comparing two specimens containing gene transcripts. The first specimen is processed as described above.

The second specimen is used to produce a second library of biological sequences, which is used to generate a second set of transcript sequences, wherein each transcript sequence in the second set is indicative of one of the biological sequences of the second library. The second set of transcript sequences is then processed in a programmed computer to generate a second set of identified sequence values, the additional identified sequence values, each of which is indicative of a sequence annotation and of a degree of match between one of the biological sequences of the second library and at least one of the reference sequences. The additional identified sequence values are processed to generate additional final data values [...] representative of clones transfected with DNA. Each clone in the population is identified by a sequence-specific method that identifies the gene from which the individual mRNA was transcribed. The number of times each gene is identified with a clone is counted to evaluate the abundance of gene transcripts. Genes and their abundances are listed in order of abundance to produce a gene transcription image. In an additional embodiment, the relative abundance of gene transcripts in one cell type or tissue is compared with the relative abundance of gene transcripts in a second cell type or tissue in order to identify differences and similarities.

In another embodiment, the invention includes a system for analyzing a library of biological sequences, comprising means for receiving a set of transcript sequences, wherein each transcript sequence is indicative of a different one of the biological sequences of the library; and means for processing the transcript sequences in a computer system in which a database of reference transcript sequences indicative of reference sequences is stored, wherein the computer is programmed with software to generate an identified sequence value for each of the transcript sequences, each identified sequence value being indicative of a sequence annotation and of the degree of match between one of the biological sequences of the library and at least one of the reference sequences, and to process each identified sequence value to generate final data values indicative of the number of times each identified sequence value is present in the library. In essence, the invention is a method and system for quantifying the relative abundance of gene transcripts in a biological specimen.
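Steps (c) and (d) of the summary (assign each transcript sequence an identified sequence value, then count how often each value occurs) can be sketched in a few lines. This is an illustration under loudly stated assumptions, not the program of Table 5: similarity is scored with the standard-library `difflib.SequenceMatcher.ratio()` as a stand-in for a real sequence search against GENBANK, the 0.95 and 0.70 cut-offs separating "exact", "similar" and "mismatch" are hypothetical, and `REFERENCE_DB` is a made-up toy reference database.

```python
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy reference database: annotation -> reference transcript sequence.
REFERENCE_DB = {
    "GENE_A": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "GENE_B": "ATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCA",
}

# Hypothetical cut-offs on the similarity score for the degree of match.
EXACT_CUTOFF = 0.95
SIMILAR_CUTOFF = 0.70


def identify(transcript: str, reference_db: dict[str, str]) -> tuple[str, str]:
    """Step (c): return an identified sequence value (annotation, degree of match)."""
    best_name, best_score = None, 0.0
    for name, ref in reference_db.items():
        # ratio() returns 2*M/T, where M is the number of matched characters
        # and T is the combined length of both sequences.
        score = SequenceMatcher(None, transcript, ref).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < SIMILAR_CUTOFF:
        return ("unidentified", "mismatch")
    return (best_name, "exact" if best_score >= EXACT_CUTOFF else "similar")


def transcript_image(transcripts: list[str], reference_db: dict[str, str]) -> list[tuple[tuple[str, str], int]]:
    """Step (d): count how often each identified sequence value occurs, most abundant first."""
    counts = Counter(identify(t, reference_db) for t in transcripts)
    return counts.most_common()


if __name__ == "__main__":
    reads = [REFERENCE_DB["GENE_A"], REFERENCE_DB["GENE_A"], REFERENCE_DB["GENE_B"], "T" * 16]
    for value, n in transcript_image(reads, REFERENCE_DB):
        print(value, n)
```

Keeping the degree of match inside the identified value means that an "exact" hit and a "similar" hit to the same locus are tallied separately, in line with the summary's description of each value as carrying both an annotation and a degree of match.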
The invention provides a method for comparing the gene transcription images of two or more different biological specimens in order to distinguish between the specimens and identify one or more genes that are differentially expressed between them. Thus, a gene transcription image and its comparison can be used as a diagnostic. One embodiment of the method performs high-throughput sequence-specific analysis of multiple RNAs or their corresponding cDNAs to produce a gene transcription image. Another embodiment of the method produces gene transcription image analysis through high-throughput cDNA sequence analysis. In addition, two or more gene transcription images can be compared and used to detect or diagnose a particular condition, disease, or biological state that correlates with the relative abundance of gene transcripts in a given cell or cell population.

4. Description of the Tables and Drawings

4.1. Tables

Table 1 gives a detailed explanation of the letter codes used in Tables 2-5. Table 2 lists the hundred most common gene transcripts. It is a partial list of isolates from the HUVEC cDNA library prepared and sequenced as described below. The left-hand column gives the order of abundance of the sequences in this table. The next column, entitled "number", is the clone number of the first HUVEC sequence identification reference that matches the sequence in the "entry" column. Isolates that were not well sequenced are not present in Table 2. The next column, entitled "N", indicates the total number of cDNAs that have the same degree of match with the reference transcript sequence in the "entry" column. The column entitled "entry" gives the NIH GENBANK locus name corresponding to the library sequence numbers. The "s" column indicates, in a few cases, the species of the reference sequence. The code for the "s" column is given in Table 1.
The column entitled "descriptor" provides a full English description of the identity of the sequence corresponding to the NIH GENBANK locus name in the "entry" column. Table 3 compares the 15 most abundant gene transcripts in normal monocytes and activated macrophage cells. Table 4 is a detailed summary of the library subtraction analysis comparing THP-1 and human macrophage cDNA sequences. Table 4 uses the same codes as Table 2. The additional columns are "bgfreq" (abundance number in the subtraction library), "rfend" (abundance number in the target library) and "quotient" (the target abundance number divided by the subtraction abundance number). As a careful reading of the table makes clear, when the abundance number in the subtraction library is "0", the target abundance number is divided by 0.05. This avoids division by zero and distinguishes such results from quotients computed with a subtraction abundance number of 1. Table 5 is the computer program, in source code, for generating gene transcription subtraction profiles. Table 6 is a partial list of the database entries used in the electronic Northern blot analysis provided by the present invention.

4.2. Brief Description of the Drawings

Figure 1 is a diagram summarizing the data collected and stored for the library construction portion of sequence preparation and analysis. Figure 2 is a diagram representing the sequence of operations performed by the "abundance classification" software in a class of preferred embodiments of the method of the invention. Figure 4 is a more detailed block diagram of the bioinformatics process, from a new sequence (one that has been sequenced but not yet identified) to the printing of the transcript image analysis and the provision of database subscriptions.
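The quotient rule described for Table 4 (target abundance divided by subtraction abundance, with 0.05 substituted for a subtraction count of 0) is easy to express in code. The sketch below mirrors only that stated rule; it is not the Table 5 source code, and the function names and dictionary inputs (gene name mapped to clone count) are hypothetical.

```python
ZERO_SUBSTITUTE = 0.05  # divisor used when the subtraction-library count is 0


def quotient(target_count: int, subtraction_count: int) -> float:
    """Target abundance divided by subtraction abundance, as described for Table 4."""
    divisor = subtraction_count if subtraction_count > 0 else ZERO_SUBSTITUTE
    return target_count / divisor


def subtraction_profile(target: dict[str, int], subtraction: dict[str, int]) -> list[tuple[str, int, int, float]]:
    """Rows of (gene, bgfreq, rfend, quotient), highest quotient first.

    bgfreq is the count in the subtraction library and rfend the count in the
    target library, following the Table 4 column names. Ties are broken by
    gene name so that the output order is deterministic.
    """
    rows = []
    for gene in sorted(set(target) | set(subtraction)):
        bg = subtraction.get(gene, 0)
        rf = target.get(gene, 0)
        rows.append((gene, bg, rf, quotient(rf, bg)))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


if __name__ == "__main__":
    # Hypothetical toy counts, not the THP-1 / macrophage data of Table 4.
    for row in subtraction_profile({"A": 3, "B": 2}, {"A": 1, "C": 4}):
        print(row)
```

With this rule a gene absent from the subtraction library scores 20 times its target count, so it ranks above a gene with the same target count seen once in the subtraction library.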

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method for comparing the relative abundance of gene transcripts in different biological specimens by high-specificity sequence analysis of individual RNAs or their corresponding cDNAs (or, alternatively, of data representing other biological sequences). This process is denoted herein as a gene transcription image. The quantitative analysis of relative abundance for a set of gene transcripts is denoted herein as "gene transcription image analysis" or "gene transcription frequency analysis". The present invention makes it possible to obtain a gene transcription profile for any given cell population or tissue from any type of organism. The method can be used to obtain a profile of a specimen consisting of a single cell (or clones of a single cell), of many cells, or of tissue more complex than a single cell and containing multiple cell types, such as the liver. The invention has significant advantages in the fields of diagnostics, toxicology and pharmacology, to name a few. A highly sophisticated diagnostic test can be performed on an ill patient who has not yet been diagnosed. A biological specimen consisting of the patient's fluids or tissues is obtained, and the gene transcripts are isolated and expanded to the amount necessary to determine their identity. Optionally, the gene transcripts can be converted to cDNA. A sample of the gene transcripts is subjected to sequence-specific analysis and quantified. These gene transcript sequence abundances are compared against the sequence abundances in a reference database, including normal data sets for diseased and healthy patients. The patient has the disease(s) with which the patient's data set most closely correlates.
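The final diagnostic step (the patient is assigned the condition whose reference data set correlates most closely with the patient's data set) could be sketched as choosing the reference profile with the highest correlation. The text does not name a correlation measure, so this sketch assumes Pearson correlation over relative abundances on a shared gene list, with genes missing from a profile treated as abundance 0. The function names, labels and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-empty vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; treated here as no correlation
    return cov / (sx * sy)


def closest_reference(patient: dict[str, float], references: dict[str, dict[str, float]]) -> tuple[str, float]:
    """Return (reference label, correlation) for the best-correlated reference profile.

    Raises ValueError if `references` is empty.
    """
    genes = sorted(set(patient).union(*references.values()))
    p = [patient.get(g, 0.0) for g in genes]
    scored = ((label, pearson(p, [prof.get(g, 0.0) for g in genes])) for label, prof in references.items())
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical relative-abundance profiles, not clinical data.
    refs = {
        "healthy": {"A": 0.5, "B": 0.3, "C": 0.2},
        "disease_X": {"A": 0.1, "B": 0.3, "C": 0.6},
    }
    print(closest_reference({"A": 0.15, "B": 0.25, "C": 0.6}, refs))
```

Any other similarity measure (rank correlation, distance on log abundances) could be substituted in `pearson`'s place; the choice here is only for illustration.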
For example, gene transcription frequency analysis can be used to differentiate normal cells or tissues from diseased cells or tissues, just as it highlights the differences between normal monocytes and activated macrophages in Table 3. In toxicology, a fundamental question is which tests are most effective in predicting or detecting a toxic effect. Gene transcription imaging provides very detailed information about the cell and tissue environment, some of which would not be evident from conventional methods of analysis. The gene transcription image is a more powerful method for predicting drug toxicity and efficacy. Similar benefits accrue from the use of this tool in pharmacology. The gene transcription image can be used selectively to examine the categories of proteins that are expected to be affected, for example, enzymes that detoxify toxins. In an alternative embodiment, comparative gene transcription frequency analysis is used to differentiate between cancer cells that respond to anticancer agents and those that do not. Examples of anticancer agents are tamoxifen, vincristine, vinblastine, podophyllotoxins, etoposide, teniposide, cisplatin, biological response modifiers such as interferon, IL-2 and GM-CSF, enzymes, hormones and the like. This method also provides a means for classifying gene transcripts by functional category. In the case of cancer cells, transcription factors or other essential regulatory molecules are very important categories to analyze across different libraries. In yet another embodiment, comparative gene transcription frequency analysis is used to differentiate between control liver cells and liver cells isolated from patients treated with experimental drugs such as FIAU, in order to distinguish between the pathology caused by the underlying disease and that caused by the drug.
In yet another embodiment, comparative gene transcription frequency analysis is used to differentiate between brain tissue from patients treated with lithium and from untreated patients. In a further embodiment, comparative gene transcription frequency analysis is used to differentiate between cells treated with cyclosporin and FK506 and normal cells.

In a further embodiment, comparative gene transcription frequency analysis is used to differentiate between virally infected human cells (including HIV-infected cells) and uninfected human cells. Gene transcription frequency analysis is also used for the rapid recognition of gene transcripts in HIV-resistant, HIV-infected and HIV-sensitive cells. Comparing gene transcription abundances will indicate the success of treatment and/or new avenues of study. In a further embodiment, comparative gene transcription frequency analysis is used to differentiate between bronchial lavage fluids from healthy patients and from patients with a variety of diseases. In a further embodiment, comparative gene transcription frequency analysis is used to differentiate between mutant and wild-type cells, plants, microbes and animals. In addition, the transcript abundance program is adapted to allow the scientist to evaluate the transcription of a gene in many different tissues. These comparisons could identify null mutants that do not produce a gene product and point mutants that produce a less abundant or otherwise altered message. Such mutations can affect basic biochemical and pharmacological processes, such as mineral nutrition and metabolism, and can be isolated by means known to those skilled in the art. Thus, crops with improved yields, pest resistance and other traits can be developed. In a further embodiment, comparative gene transcription frequency analysis is used for cross-species comparison, allowing the selection of better pharmacological animal models. In this embodiment, humans and other animals (such as the mouse), or their cultured cells, are treated with a specific test agent. The sequence-specific abundance of each cDNA population is determined.
If the animal test system is a good model, the homologous genes in the animal's cDNA population should change expression in a manner similar to those in human cells. If side effects are detected with the drug, a detailed analysis of transcript abundance is performed to recognize changes in gene transcription. The models should then be evaluated by comparing the basic physiological changes. In a further embodiment, comparative gene transcription frequency analysis is used in the clinical setting to give a very detailed gene transcription profile of a patient's cells or tissue (e.g., a blood sample). In particular, gene transcription frequency analysis is used to give a high-resolution gene expression profile of a disease state or condition. In the preferred embodiment, the method uses high-throughput cDNA sequencing to identify specific transcripts of interest. The generated DNA sequences and the deduced amino acid sequences are then compared extensively with GENBANK and other sequence databanks as described below. The method offers many advantages over current protein discovery by two-dimensional gel methods, which attempt to identify the individual proteins involved in a particular biological effect. Here, detailed comparisons of the profiles of activated and inactive cells reveal numerous changes in the expression of individual transcripts. After it is determined whether the sequence is an "exact" match, a similar match or a mismatch, the sequence is entered into a database. Next, the numbers of cDNA copies corresponding to each gene are tabulated. Although this can be done by hand from a printout of all entries, slowly and with difficulty if at all, a computer program is a useful and fast way to tabulate this information. The numbers of cDNA copies (optionally divided by the total number of sequences in the data set) provide an image of the relative abundance of transcripts for each corresponding gene.
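The tabulation just described (cDNA copies per gene, optionally divided by the total number of sequences, then listed by abundance) might look like the following. It is a minimal sketch that assumes the identification step has already labeled each sequence with a gene name; the function name `relative_abundance` is hypothetical. The same division by a total also fits the array-based alternative described next, where each hybrid amount is divided by the population number.

```python
from collections import Counter


def relative_abundance(gene_labels: list[str]) -> list[tuple[str, int, float]]:
    """Tabulate copies per gene, divide by the total, and sort by abundance.

    Returns (gene, copies, fraction) rows, most abundant first; ties are broken
    alphabetically so the output order is deterministic. An empty input gives
    an empty list (no division is attempted).
    """
    counts = Counter(gene_labels)
    total = sum(counts.values())
    rows = [(gene, n, n / total) for gene, n in counts.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Hypothetical gene labels for six identified cDNA clones.
    for gene, copies, fraction in relative_abundance(["B", "A", "B", "C", "B", "A"]):
        print(f"{gene}\t{copies}\t{fraction:.3f}")
```

Reporting both the raw copy number and the fraction keeps the original counts available, which matters when libraries of different sizes are later compared.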
The list of represented genes should then be sorted by abundance in the cDNA population. A multitude of additional types of comparisons or dimensions are possible and are exemplified below. An alternative method for producing a gene transcription image includes the steps of obtaining a mixture of test mRNA and providing a representative array of unique probes whose sequences are complementary to at least some of the test mRNAs. Next, a fixed amount of the test mRNA is added to the arrayed probes. The test mRNA is incubated with the probes for a time sufficient to allow hybridization of the test mRNA and the probes. The mRNA-probe hybrids are detected and quantified. Hybrids are identified by their position in the probe array. The amounts of all hybrids are summed to give a population number. Each hybrid amount is then divided by the population number to provide a set of relative abundance data called a gene transcription image analysis.

6. EXAMPLES

The following examples are provided to illustrate the subject matter of the invention. They are provided by way of illustration and are not included for the purpose of limiting the invention.

6.1. Origin of Tissues and Cell Lines

For analysis with the computer program claimed herein, biological sequences can be obtained from virtually any source. The most common are tissues obtained from the human body. Tissues can be obtained from any organ of the body, from a donor of any age, with any abnormality, or from any immortalized cell line. Immortalized cell lines may be preferred in some cases because of their cell-type purity; other tissue samples invariably include mixed cell types. A special technique is available for taking a single cell (for example, a brain cell) and harnessing its cellular machinery to produce enough cDNA for sequencing using the techniques and analyses described herein (cf.

Patent of the United States of North America Nos. 5, 021, 35 and 0 5,168,038, which are incorporated by reference). The examples given herein used the following immortalized cell lines: U937 monocyte-like cells, TPH-macrophage-like cells, vascular-induced endothelial cells (HUVEC cells) and cells 5 as masts HMC-1.

The U-937 cell line is a human histiocytic lymphoma cell line with monocytic characteristics, established from malignant cells obtained from the pleural effusion of a patient with diffuse histiocytic lymphoma (Sundstrom, C. and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is one of only a few human cell lines with the morphology, cytochemistry, surface receptors and monocyte-like characteristics of histiocytic cells. These cells can be induced to terminal monocytic differentiation and will express new molecules on the cell surface when activated with supernatants of human mixed lymphocyte cultures. With this type of in vitro activation, the cells undergo morphological and functional changes, including increased antibody-dependent cellular cytotoxicity (ADCC) against erythroid and tumor target cells (one of the main functions of macrophages). In vitro activation of U-937 cells with phorbol 12-myristate 13-acetate (PMA) stimulates the production of many compounds, including prostaglandins, leukotrienes and platelet-activating factor (PAF), which are potent inflammatory mediators. Thus, U-937 is a cell line well suited for the identification and isolation of gene transcripts associated with normal monocytes.

The HUVEC cell line is a normal, homogeneous, well-characterized early-passage culture of endothelial cells from the human umbilical vein (Cell Systems Corp., 12815 NE 124th Street, Kirkland, WA 98034). Only gene transcripts of induced or treated HUVEC cells were sequenced. One batch of 1 x 10^8 cells was treated for 5 hours with 1 U/ml rIL-1b and 100 ng/ml E. coli lipopolysaccharide endotoxin (LPS) before harvesting. A separate batch of 2 x 10 cells was treated at confluence with 4 U/ml TNF and 2 U/ml interferon-gamma (IFN-gamma) before harvesting.

THP-1 is a human leukemic cell line with distinctive monocytic features.
This cell line was derived from the blood of a 1-year-old child with acute monocytic leukemia (Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The following cytological and cytochemical criteria were used to establish the monocytic nature of the cell line: 1) the presence of alpha-naphthyl butyrate esterase activity that could be inhibited with sodium fluoride; 2) the production of lysozyme; 3) phagocytosis of latex particles and sensitized SRBC (sheep red blood cells); and 4) the ability of mitomycin C-treated THP-1 cells to activate T lymphocytes following ConA (concanavalin A) treatment.

Morphologically, the cytoplasm contained small azurophilic granules and the nucleus was indented and irregularly shaped, with deep folds. The cell line has Fc and C3b receptors, which probably function in phagocytosis. THP-1 cells treated with the tumor promoter 12-O-tetradecanoylphorbol-13-acetate (TPA) stop proliferating and differentiate into macrophage-like cells that mimic native monocyte-derived macrophages in many respects. Morphologically, as the cells change shape, the nucleus becomes more irregular and additional phagocytic vacuoles appear in the cytoplasm. Differentiated THP-1 cells also exhibit increased adherence to tissue culture plastic.

HMC-1 cells (a human mast cell line) were established from the peripheral blood of a Mayo Clinic patient with mast cell leukemia (Leukemia Res. (1988) 12:345-55). The cultured cells looked similar to immature cloned murine mast cells, contained histamine, and stained positively for chloroacetate esterase, aminocaproate esterase, eosinophil major basic protein (MBP) and tryptase. HMC-1 cells, however, have lost the ability to synthesize normal IgE receptors. HMC-1 cells also carry a 10;16 translocation, which was present in the cells initially collected from the patient by leukapheresis and is therefore not a culture artifact. Thus, HMC-1 cells are a good model for mast cells.

6.2 Construction of cDNA libraries

For comparisons between libraries, the libraries should be prepared in similar ways. Some parameters appear to be particularly important to control. One of these is the method of mRNA isolation: it is important to use the same conditions to remove DNA and heterogeneous nuclear RNA for all libraries to be compared. The size fractionation of the cDNA must also be carefully controlled. Preferably, the same vector should be used to prepare the libraries to be compared; at a minimum, the same type of vector (for example, a unidirectional vector) should be used to ensure a valid comparison. A unidirectional vector may be preferred because it makes the product easier to analyze. Priming only with a unidirectional oligo dT primer is preferred, in order to obtain a single clone per mRNA transcript. However, it is recognized that using a mixture of oligo dT and random primers can also be advantageous, because the mixture yields greater sequence diversity when gene discovery is also a goal. Similar effects can be obtained with DR2 (Clontech) and HXLOX (US Biochemical) and also with Invitrogen and Novagen vectors. These vectors have two requirements. First, there must be priming sites for commercially available primers such as the T3 and M13 reverse primers. Second, the vector must accept inserts of up to 10 kb. It is also important that clones be drawn at random and that a significant population of clones be sampled. Data have been generated with 5,000 clones; however, if very rare genes and/or their relative abundance are to be obtained, as many as 100,000 clones from a single library may need to be sampled. Alternatively, plaques can be selected instead of clones. Aside from the Uni-ZAP™ vector system by Stratagene described below, it is believed that other unidirectional vectors can also be used in a similar manner.
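Why rare transcripts need far more clones can be made concrete with a short calculation. The sketch below is not part of the described method; it assumes clones are drawn independently from a library large enough that a transcript keeps a fixed relative frequency `freq` (a binomial approximation), so the chance of seeing it at least once in n clones is 1 - (1 - freq)^n. The frequencies used are illustrative, not measured values.

```python
import math

def detection_probability(freq, n_clones):
    """Probability that a transcript making up fraction `freq` (0 < freq < 1)
    of the library is picked at least once among `n_clones` random clones."""
    return 1.0 - (1.0 - freq) ** n_clones

def clones_needed(freq, confidence=0.95):
    """Smallest number of random clones giving at least `confidence`
    probability of sampling that transcript at least once."""
    n = math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))
    return max(n, 1)

for freq in (1e-3, 1e-4, 1e-5):
    print(f"freq={freq:g}: P(seen in 5,000 clones)={detection_probability(freq, 5000):.3f}, "
          f"clones for 95%={clones_needed(freq):,}")
```

Under these assumptions a transcript at one copy per thousand is almost always seen in 5,000 clones, whereas one at one copy per ten thousand is missed more often than not, which is consistent with the text's recommendation of up to 100,000 clones for very rare genes.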

Other unidirectional vectors that are believed to be usable include, but are not limited to, DR2 (Clontech) and HXLOX (U.S. Biochemical). Preferably, the details of library construction (as shown in Figure 1) are collected and stored in a database for later retrieval in relation to the sequences being compared. Figure 1 shows important information regarding the collaborator or provider of the cDNA library or cells, pretreatment, biological origin, culture, RNA preparation and cDNA construction. Similarly, detailed information about other steps benefits in-depth analysis of the sequences and libraries. RNA must be harvested from the cell and tissue samples, and the cDNA libraries are subsequently constructed. The cDNA libraries can be constructed according to techniques known in the art (see, for example, Maniatis, T. et al. (1982) Molecular Cloning, Cold Spring Harbor Laboratory, New York). cDNA libraries can also be purchased. The U-937 cDNA library (catalog No. 937207) was obtained from Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA 92037. The THP-1 cDNA library was custom-constructed by Stratagene from THP-1 cells cultured for 48 hours with 100 nM TPA and for 4 hours with 1 μg/ml LPS. The HMC-1 human mast cell cDNA library was also custom-constructed by Stratagene from cultured HMC-1 cells. The HUVEC cDNA library was custom-made by Stratagene from two batches of induced HUVEC cells that were processed separately. Essentially all libraries were prepared in the same way. First, poly(A+) RNA (mRNA) was purified. For U-937 and HMC-1 RNA, cDNA synthesis was primed only with oligo dT. For THP-1 and HUVEC RNA, cDNA synthesis was primed separately with both oligo dT and random hexamers, and the two cDNA libraries were treated separately.
Synthetic adapter oligonucleotides were ligated onto the cDNA ends, enabling insertion into the Uni-ZAP™ vector system (Stratagene), which allows high-efficiency unidirectional (sense orientation) lambda library construction and the convenience of a plasmid system with blue-white color screening to detect clones with cDNA inserts. Finally, the two libraries were combined into a single library by mixing equal numbers of bacteriophages. The libraries can be screened with either DNA probes or antibody probes, and the pBluescript® phagemid (Stratagene) can be rapidly excised in vivo. The phagemid allows the use of a plasmid system for easy insert characterization, sequencing, site-directed mutagenesis, the creation of unidirectional deletions and the expression of fusion proteins. The phage particles of the custom-made libraries were used to infect the E. coli host strain XL1-Blue® (Stratagene), which has a high transformation efficiency, increasing the probability of obtaining rare, under-represented clones from the cDNA library.

6.3. Isolation of cDNA clones

The phagemid forms of individual cDNA clones were obtained by an in vivo excision process, in which the host bacterial strain was co-infected with both the lambda library phage and an f1 helper phage. Proteins derived from both the library-containing phage and the helper phage nicked the lambda DNA, initiated new DNA synthesis from defined sequences on the target lambda DNA and created a smaller, single-stranded circular phagemid DNA molecule that included all the DNA sequences of the pBluescript® plasmid and the cDNA insert. The phagemid DNA was secreted from the cells and purified, then used to reinfect fresh host cells, where double-stranded phagemid DNA was produced. Because the phagemid carries the gene for beta-lactamase, the newly transformed bacteria were selected on medium containing ampicillin.
The phagemid DNA was purified using the Magic Minipreps™ DNA Purification System (Promega catalog No. A7100; Promega Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This small-scale process provides a simple and reliable method for lysing bacterial cells and rapidly isolating purified phagemid DNA using a DNA-binding resin. The DNA was eluted from the purification resin ready for DNA sequencing and other analytical manipulations. The phagemid DNA was also purified using the QIAGEN® QIAwell-8 Plasmid Purification System (QIAGEN Inc., 9259 Eton Ave., Chatsworth, CA 91311). This product line provides a convenient, reliable and fast high-throughput method for lysing bacterial cells and isolating highly purified phagemid DNA using QIAGEN anion-exchange resin particles with EMPORE™ membrane technology from 3M in a multiwell format. The DNA was eluted from the purification resin ready for DNA sequencing and other analytical manipulations. An alternative method for phagemid purification has recently become available. It uses the Miniprep Kit (catalog No. 77468, available from Advanced Genetic Technologies Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This kit is in a 96-well format and provides enough reagents for 960 purifications. Each kit is provided with a recommended protocol, which was followed except for the following changes. First, each of the 96 wells is filled with only 1 milliliter of sterile terrific broth containing carbenicillin at 25 milligrams/liter and 0.4% glycerol. After the wells are inoculated, the bacteria are cultured for 24 hours and lysed with 60 μl of lysis buffer. A centrifugation step (2900 rpm for 5 minutes) is carried out before the contents of the block are added to the primary filter tray. The optional step of adding isopropanol to the TRIS buffer is not routinely performed.
After the last step in the protocol, the samples are transferred to a 96-well Beckman block for storage. Another new DNA purification system is the WIZARD™ product line, available from Promega (catalog No. A7071), which can be adapted to the 96-well format.

6.4. Sequencing of cDNA clones

The cDNA inserts from random isolates of the U-937 and THP-1 libraries were partially sequenced. Methods for DNA sequencing are well known in the art. Conventional enzymatic methods employ the Klenow fragment of DNA polymerase, Sequenase™ or Taq polymerase to extend DNA strands from an oligonucleotide primer annealed to the DNA template of interest. Methods have been developed for both single- and double-stranded templates. The chain termination reaction products are usually electrophoresed on urea-polyacrylamide gels and detected either by autoradiography (for radionuclide-labeled precursors) or by fluorescence (for fluorescently labeled precursors). Recent advances in mechanized reaction preparation, sequencing and analysis using fluorescent detection have expanded the number of sequences that can be determined per day (for example, with the Applied Biosystems 373 and 377 DNA sequencers and the Catalyst 800). Currently, with the system as described, read lengths range from 250 to 400 bases and are clone-dependent. Read length also varies with the length of time the gel is run; in general, shorter runs tend to truncate the sequence. A minimum of only about 25 to 50 bases is needed to establish the identity and degree of homology of the sequence. Sequencing methods include, but are not limited to, hybridization, mass spectroscopy, capillary electrophoresis and gel electrophoresis.

6.5. Homology searching of cDNA clones and deduced proteins (and subsequent steps)

Using the nucleotide sequences derived from the cDNA clones as query sequences (the sequences of the Sequence Listing), databases containing previously identified sequences are searched for areas of homology (similarity). Examples of such databases include GenBank and EMBL. We next describe examples of two homology search algorithms that can be used, and then describe the subsequent computer-implemented steps for carrying out the invention according to its preferred embodiments. In the following description of the computer-implemented steps of the invention, the word "library" denotes a set (or population) of nucleic acid sequences from a biological specimen. A "library" may consist of cDNA sequences, RNA sequences or the like that characterize a biological specimen. The biological specimen may consist of cells of a single human cell type (or may be any of the other types of specimens mentioned above). We assume that the sequences in a library have been determined so as to represent or accurately characterize a biological specimen (for example, they can consist of cDNA sequences of clones representing the RNA of a single human cell). In the following description of the computer-implemented steps of the invention, the term "database" denotes a set of stored data representing a collection of sequences, which in turn represent a collection of biological reference materials. For example, a database may consist of data representing many stored cDNA sequences that are, in turn, representative of human cells infected by various viruses, human cells of various ages, cells of different mammalian species, and so on.
In the preferred embodiments, the invention employs a computer programmed with software (to be described) to perform the following steps: (a) processing data indicative of the cDNA sequences of a library (generated by high-throughput cDNA sequencing or another method) to determine whether each sequence in the library matches a DNA sequence in a reference database of DNA sequences (and, if so, identifying the reference database entry that matches the sequence and indicating the degree of match between the reference sequence and the library sequence), and assigning to each sequence in the library an identified sequence value based on the sequence annotation and the degree of match; (b) for some or all of the database entries, tabulating the number of identified sequence values in the library that match the entry (although this can be done by hand from a printout of all the entries, we prefer to carry out this step using the computer software described below), thereby generating a set of final data values or "abundance numbers"; and (c) if the libraries are of different sizes, dividing each abundance number by the total number of sequences in the library to obtain a relative abundance number for each identified sequence value (i.e., a relative abundance of each gene transcript). The list of identified sequence values (or the genes corresponding to them) can be sorted by abundance in the cDNA population. A multitude of additional types of comparisons or dimensions is possible. For example (as described later in greater detail), steps (a) and (b) may be repeated for two different libraries (sometimes referred to as a "target" library and a "subtraction" library).
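Steps (b) and (c) can be illustrated with a minimal Python sketch. It assumes step (a), the homology search, has already produced one identified sequence value per library sequence; the `IdentifiedSequence` record, its field names and the entry identifiers are hypothetical and are not taken from the FoxBASE listing in Table 5.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IdentifiedSequence:
    clone_id: str
    entry_id: Optional[str]  # matching reference-database entry, or None
    match: str               # degree of match, e.g. "exact" or "homologous"

def abundance_numbers(library):
    """Step (b): how many library sequences match each database entry."""
    return Counter(s.entry_id for s in library if s.entry_id is not None)

def relative_abundance(library):
    """Step (c): abundance numbers divided by the total library size
    (unmatched sequences still count toward the total), sorted by abundance."""
    total = len(library)
    return [(entry, n / total) for entry, n in abundance_numbers(library).most_common()]

library = [
    IdentifiedSequence("c1", "ENTRY_A", "exact"),
    IdentifiedSequence("c2", "ENTRY_A", "homologous"),
    IdentifiedSequence("c3", "ENTRY_B", "exact"),
    IdentifiedSequence("c4", None, "none"),
]
print(relative_abundance(library))  # [('ENTRY_A', 0.5), ('ENTRY_B', 0.25)]
```

Dividing by the whole library size, including unmatched sequences, follows the text's instruction to divide by "the total number of sequences in the library"; a different denominator would change every relative abundance proportionally.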

Then, for each identified sequence value (or gene transcript), a "quotient" value is obtained by dividing the abundance number for that identified sequence value in the target library by the abundance number for the same identified sequence value in the subtraction library. In fact, the subtraction can be performed across multiple libraries. It is possible to add the transcripts of several libraries (for example, three) and then divide them by the transcripts of another set of multiple libraries (again, for example, three). The notation for this operation can be abbreviated as (A + B + C) / (D + E + F), where each uppercase letter denotes a complete library. Optionally, the abundance numbers of the transcripts in the summed libraries can be divided by the total sample size before the subtraction. Unlike standard hybridization technology, which allows a single subtraction of two libraries, once a set or library of transcript sequences has been processed and stored in the computer, any number of library subtractions can be performed. For example, using this method, quotient values can be obtained by dividing the relative abundance values in a first library by the corresponding values in a second library, and vice versa. In variations of step (a), the library consists of nucleotide sequences derived from cDNA clones. Examples of databases that can be searched for areas of homology (similarity) in step (a) include the commercially available databases known as GenBank (NIH), EMBL (European Molecular Biology Laboratory, Germany) and GENESEQ (Intelligenetics, Mountain View, California). One homology search algorithm that can be used to implement step (a) is the algorithm described by D.J. Lipman and W.R. Pearson in "Rapid and Sensitive Protein Similarity Searches," Science, 227:1435 (1985). In this algorithm, homologous regions are found in two steps.
In the first step, the most homologous regions are determined by calculating a match score using a homology score table. The "Ktup" parameter is used in this step to set the minimum window size to be moved when comparing the two sequences. Ktup also sets the number of bases that must match to extract the region of greatest homology between the sequences. In this step, no insertions or deletions are applied, and the homology is reported as an initial score (INIT). In the second step, the homologous regions are aligned to obtain the maximum match score by inserting gaps to accommodate probable deleted portions. The match score obtained in the first step is recalculated using the homology score table and the insertion score table to give an optimized score (OPT) in the final output. DNA homologies between two sequences can be examined graphically using the Harr method to construct homology matrix plots (Needleman, S.B. and Wunsch, C.D., J. Mol. Biol. 48:443 (1970)). This method produces a two-dimensional plot that can be useful for distinguishing regions of homology from repeat regions.
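The idea behind the first step can be sketched in Python. This is a simplified illustration only, not the published FASTA program: it finds all shared words of length `ktup` (these word hits are exactly the dots of a word-based dot matrix plot) and tallies them by diagonal to locate the densest ungapped run, but it computes no INIT score from a substitution table and omits the gapped second step. The default `ktup=4` is an arbitrary choice for the example.

```python
from collections import defaultdict

def ktup_hits(query, subject, ktup=4):
    """Return every (i, j) at which the ktup-long words query[i:i+ktup] and
    subject[j:j+ktup] are identical; plotted, these are dot-matrix dots."""
    positions = defaultdict(list)              # word -> start positions in subject
    for j in range(len(subject) - ktup + 1):
        positions[subject[j:j + ktup]].append(j)
    return [(i, j)
            for i in range(len(query) - ktup + 1)
            for j in positions.get(query[i:i + ktup], [])]

def best_diagonal(query, subject, ktup=4):
    """Count word hits per diagonal (j - i) and return the diagonal with the
    most hits, i.e. the densest ungapped run of shared words."""
    counts = defaultdict(int)
    for i, j in ktup_hits(query, subject, ktup):
        counts[j - i] += 1
    if not counts:
        return None, 0
    diagonal = max(counts, key=counts.get)
    return diagonal, counts[diagonal]

print(best_diagonal("GATTACAGATTACA", "CCGATTACAGATTACATT"))  # (2, 11)
```

The internal repeat of GATTACA also produces weaker off-diagonal hits (for example on diagonals 9 and -5), which is the kind of repeat signal a dot matrix plot helps separate from true homology.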

However, in one class of preferred embodiments, step (a) is implemented by processing the library data with the commercially available computer program known as the INHERIT 670 Sequence Analysis System, available from Applied Biosystems Inc. (Foster City, California), including the software known as Factura (also available from Applied Biosystems Inc.). The Factura program preprocesses each library sequence to "edit out" portions that do not appear to be of interest, such as the vector used to prepare the library. Additional sequences that can be edited out or masked (ignored by the search tools) include, but are not limited to, the poly-A tail and repetitive GAG and CCC sequences. A simple search program can be written to mask such low-information sequences, or programs such as BLAST can ignore them. In the algorithm implemented by the INHERIT 670 sequence analysis system, the pattern specification language (developed by TRW Inc.) is used to determine regions of homology. "There are three parameters that determine how INHERIT analysis runs sequence comparisons: window size, window offset and error tolerance. Window size specifies the length of the segments into which the query sequence is subdivided. Window offset specifies where the next segment [to be compared] begins, counting from the beginning of the previous segment. Error tolerance specifies the total number of insertions, deletions and/or substitutions that are tolerated over the specified word length. Error tolerance may be set to any integer between 0 and 6. The default values are window size = 20, window offset = 10 and error tolerance = 3." INHERIT Analysis Users Manual, pp. 2-15, Version 1.0, Applied Biosystems, Inc., October 1991.
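How the three quoted parameters interact can be illustrated with a Python sketch. INHERIT's internal algorithm and its pattern specification language are not reproduced here; as a stand-in, each window is matched with a standard dynamic-programming approximate search (Sellers' method, edit distance against any substring of the subject). The function names are hypothetical, and the defaults mirror the quoted defaults. Bases after the last full window are not scanned in this simplified version.

```python
def min_edits_in_text(pattern, text):
    """Fewest insertions, deletions and substitutions needed to match
    `pattern` to any substring of `text` (Sellers' dynamic programming)."""
    prev = [0] * (len(text) + 1)          # empty pattern matches anywhere, free
    for i, p in enumerate(pattern, start=1):
        cur = [i] + [0] * len(text)
        for j, t in enumerate(text, start=1):
            cur[j] = min(prev[j - 1] + (p != t),  # match or substitution
                         prev[j] + 1,             # pattern base left unmatched
                         cur[j - 1] + 1)          # extra base in text
        prev = cur
    return min(prev)

def windowed_search(query, subject, window=20, offset=10, tolerance=3):
    """Cut `query` into `window`-base segments starting every `offset` bases
    and report (start, edits) for each segment found in `subject` with at
    most `tolerance` insertions, deletions and substitutions."""
    if not (isinstance(tolerance, int) and 0 <= tolerance <= 6):
        raise ValueError("error tolerance must be an integer from 0 to 6")
    hits = []
    for start in range(0, max(len(query) - window, 0) + 1, offset):
        edits = min_edits_in_text(query[start:start + window], subject)
        if edits <= tolerance:
            hits.append((start, edits))
    return hits

subject = "ACGTTGCAAGGCTTACCGATAGCTTAGGCATCGA"
query = "T" + subject[1:]                  # one substitution in the first window
print(windowed_search(query, subject))     # [(0, 1), (10, 0)]
```

With `tolerance=0` the first window would no longer be reported, which shows how the error tolerance trades sensitivity against specificity.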
Using a combination of these three parameters, a database (such as a DNA database) can be searched for sequences containing regions of homology, and the matching sequences are scored with an initial value. These homologous regions are then examined using dot matrix homology plots to distinguish regions of homology from repeat regions. Smith-Waterman alignments can be used to display the results of the homology search. The INHERIT software can be run on a Sun computer system running the UNIX operating system. Search alternatives to INHERIT include the BLAST program, GCG (available from Genetics Computer Group, WI) and the Dasher program (Temple Smith, Boston University, Boston, MA). Nucleotide sequences can be searched against GenBank, EMBL or other databases such as GENESEQ (available from Intelligenetics, Mountain View, CA), or against other gene databases. In addition, we have searched some sequences against our own in-house database. In the preferred embodiments, the transcript sequences are analyzed by the INHERIT software for the best match to a reference gene transcript, in order to assign a sequence identifier and a degree of homology, which together constitute the identified sequence value. These values are entered into, and subsequently processed by, a Macintosh personal computer (available from Apple) programmed with an "abundance classification and subtraction analysis" computer program (to be described later).
Before processing by the subtraction analysis and abundance classification program (also referred to as the "abundance classification" program), the identified sequences of the cDNA clones are assigned values (according to the parameters given above) reflecting the degree of match, in the following categories: "exact" match (regions with a high degree of identity), homologous human match, homologous non-human match (regions of high similarity present in species other than humans), or no match (no significant regions of homology with previously identified nucleotide sequences stored in the database). Alternatively, the degree of match may be a numerical value, as described below. Returning to the step of identifying matches between the library sequences and the database entries, protein and peptide sequences can also be deduced from the nucleic acid sequences. For the deduced polypeptides, match identification can be performed in a manner analogous to that used for the cDNA sequences. A protein sequence is used as the query sequence and compared with previously identified sequences contained in databases such as SwissProt, PIR and the NBRF Protein database to find homologous proteins. These proteins are initially scored for homology using a homology score table (Orcutt, B.C. and Dayhoff, M.O., Scoring Matrices, PIR Report MAT-0285 (February 1985)), resulting in an INIT score.

The homologous regions are aligned to obtain the maximum match score by inserting gaps to accommodate probable deleted portions. The match score is recalculated using the homology score table and the insertion score table, resulting in an optimized score (OPT). Even without knowledge of the correct reading frame of an isolated sequence, the protein homology search described above can be carried out by searching all three reading frames. The homologies of peptide and protein sequences can also be determined using the INHERIT 670 sequence analysis system in a manner analogous to that used for DNA sequence homologies. The pattern specification language and parameter windows are used to search protein databases for sequences containing regions of homology, which are scored with an initial value. Subsequent examination with dot matrix homology plots distinguishes regions of homology from repeat regions. Additional search tools available for pattern database searches include PLsearch Blocks (available from Henikoff & Henikoff, University of Washington, Seattle), Dasher and GCG. Pattern databases include, but are not limited to, Protein Blocks (available from Henikoff & Henikoff, University of Washington, Seattle), Brookhaven Protein (available from Brookhaven National Laboratory, Upton, NY), PROSITE (available from Amos Bairoch, University of Geneva, Switzerland), ProDo (available from Temple Smith, Boston University) and PROTEIN MOTIF FINGERPRINT (available from the University of Leeds, United Kingdom). The ABI Assembler application software, part of the INHERIT DNA analysis system (available from Applied Biosystems, Inc., Foster City, CA), can be used to create and manage sequence assembly projects by assembling data from selected sequence fragments into a longer sequence.
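The three-frame protein search mentioned above relies on translating the cDNA in each forward reading frame. A minimal Python sketch using the standard genetic code (NCBI translation table 1) follows; it assumes the insert is in sense orientation, as unidirectional cloning is meant to ensure, so the three reverse-strand frames are omitted. Each resulting peptide could then serve as a protein query.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons are ordered
# TTT, TTC, TTA, TTG, TCT, ... and '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

def translate(seq, frame=0):
    """Translate one forward reading frame; codons containing characters
    other than A, C, G, T (or U) are rendered as 'X'."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X")
                   for p in range(frame, len(seq) - 2, 3))

def three_frame_translation(seq):
    """Peptides for forward frames 0, 1 and 2, each usable as a protein query."""
    return [translate(seq, frame) for frame in range(3)]

print(three_frame_translation("ATGGCCTAA"))  # ['MA*', 'WP', 'GL']
```

Incomplete trailing codons are dropped, so each frame yields only whole codons.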
The Assembler software combines two advanced computing technologies that maximize the ability to assemble sequenced DNA fragments into Assemblies, a special grouping of data in which the relationships between sequences are displayed through graphical overlays, alignment and statistical views. The process is based on the Myers-Kececioglu model of fragment assembly (INHERIT™ Assembler User's Manual, Applied Biosystems, Inc., Foster City, CA) and uses graph theory as the foundation of a highly rigorous multiple sequence alignment engine for assembling DNA sequence fragments. Other assembly programs that can be used include MEGALIGN (available from Roger Staden, Cambridge, England). Next, with reference to Figure 2, we describe in more detail the "abundance classification" program, which implements "step (b)" mentioned above to tabulate the number of library sequences that match each database entry (the "abundance number" for each database entry). Figure 2 is a flow diagram of a preferred embodiment of the abundance classification program. A source code listing of this embodiment of the abundance classification program is shown in Table 5. In the implementation of Table 5, the abundance classification program is written in the FoxBASE programming language, commercially available from Microsoft Corporation. Although FoxBASE was the language chosen for the first iteration of this technology, it should not be considered limiting. Many other programming languages can also be used, Sybase being a particularly desirable alternative, as will be obvious to those skilled in the art. The subroutine names specified in Figure 2 correspond to the subroutines listed in Table 5. Referring again to Figure 2, the "identified sequences" are transcript sequences that represent each library sequence together with the identification of the database entry (if any) that it matches.
In other words, the "identified sequences" are transcript sequences that represent the output of "step (a)" described above. Figure 3 is a block diagram of a system for implementing the invention. The system of Figure 3 includes library generation unit 2, which generates a library and outputs a stream of transcript sequences indicative of the biological sequences that make up the library. Programmed processor 4 receives the output data stream from unit 2 and processes this data according to "step (a)" described above to generate the identified sequences. Processor 4 may be a processor programmed with the commercially available computer programs known as the INHERIT 670 sequence analysis system and the Factura program (both available from Applied Biosystems Inc.) and running the UNIX operating system. Still with reference to Figure 3, the identified sequences are loaded into processor 6, which is programmed with the abundance classification program. Processor 6 generates the final transcript sequences indicated in both Figure 2 and Figure 3. Figure 4 shows a more detailed block diagram of a planned relational computing system, which includes several search techniques that can be implemented, together with a database to be queried.

Referring to Figure 2, the abundance classification program first performs an operation known as "Tempnum" on the identified sequences, to discard all identified sequences except those that match database entries of selected types. For example, the Tempnum process can select identified sequences representing the following types of matches with database entries (see the definitions above): "exact" match, "homologous" human match, "other species" match (representing genes present in species other than humans), "no" match (no significant regions of homology with database entries representing previously identified nucleotide sequences), "I" match (Incyte, for previously unknown DNA sequences) or "X" match (matches to ESTs in the reference database). This eliminates the U, S, M, V, A, R and D sequences (see the definitions in Table 1). The identified sequence values selected during the "Tempnum" process then pass to a further selection (weeding) operation known as "Tempred". This operation can, for example, discard all identified sequence values that represent matches with selected database entries.
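The two weeding operations can be pictured as successive filters. The Python sketch below only mirrors the roles of "Tempnum" and "Tempred"; it is not the FoxBASE code of Table 5, which is not reproduced here. The letters "I" and "X" and the eliminated letters U, S, M, V, A, R and D come from the text, while "E", "H", "O" and "N" are hypothetical stand-ins for the exact, human-homolog, other-species and no-match codes whose real letters are defined in Table 1.

```python
from typing import NamedTuple

class IdentifiedValue(NamedTuple):
    entry_id: str   # matched database entry ("" when there is no match)
    code: str       # one-letter match category

# "I", "X" and the eliminated letters come from the text; "E", "H", "O" and
# "N" are hypothetical placeholders for codes defined in Table 1.
SELECTED_CODES = {"E", "H", "O", "N", "I", "X"}

def tempnum(values):
    """Keep only values whose match category is one of the selected types,
    which drops the U, S, M, V, A, R and D categories."""
    return [v for v in values if v.code in SELECTED_CODES]

def tempred(values, unwanted_entries):
    """Further weeding: drop matches to a chosen set of database entries."""
    return [v for v in values if v.entry_id not in unwanted_entries]

values = [IdentifiedValue("G1", "E"), IdentifiedValue("G2", "V"),
          IdentifiedValue("G3", "X"), IdentifiedValue("G4", "H")]
print(tempred(tempnum(values), {"G4"}))
```

Here G2 is removed by the category filter and G4 by the entry filter, leaving the G1 and G3 values.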

The identified sequence values selected during the "Tempred" process are then sorted by library during the "Tempdesign" operation. It is contemplated that the "identified sequences" can represent sequences from a single library, or from two or more libraries. Consider first the case in which the identified sequence values represent sequences from a single library. In this case, all the identified sequence values determined during "Tempred" are sorted in the "Templib" operation, further sorted in the "Libsort" operation, and finally sorted again in the "Temptarsort" operation. Now consider the case in which the transcript sequences produced during the "Tempred" operation represent sequences from two libraries (which we will call the "target" library and the "subtraction" library). For example, the target library may consist of cDNA sequences from clones of a diseased cell, while the subtraction library may consist of cDNA sequences from clones of the diseased cell after treatment by exposure to a drug. As another example, the target library may consist of cDNA sequences of clones of one cell type from a young human, while the subtraction library may consist of clone sequences of the same cell type from the same human at a different age. In this case, the "Tempdesig" operation directs all the transcript sequences that represent the target library to be processed by "Templib" (and then "Libsort" and "Temptarsort"), and directs all the transcript sequences that represent the subtraction library to be processed by "Tempsub" (and then "Subsort" and "Tempsubsort").
For example, the consecutive sorting operations "Templib", "Libsort", and "Temptarsort" sort the identified sequences of the target library in descending order of abundance number, generating either one list of decreasing abundance numbers (each abundance number corresponding to a database entry) or several lists of decreasing abundance numbers (the abundance numbers in each list corresponding to database entries of a selected type), with redundancies removed from each sorted list. The consecutive sorting operations "Tempsub", "Subsort", and "Tempsubsort" sort the identified sequences of the subtraction library in the same way, producing one or more lists of decreasing abundance numbers with redundancies removed. The output of the "Temptarsort" operation typically represents sorted lists from which a histogram can be generated, in which position along one axis (e.g., horizontal) indicates the abundance number (of target library sequences) and position along another axis (e.g., vertical) indicates the identified sequence value (e.g., human or non-human gene type). Similarly, the output of the "Tempsubsort" operation typically represents sorted lists from which a histogram can be generated, in which position along one axis (e.g., horizontal) indicates the abundance number (of subtraction library sequences) and position along another axis (e.g., vertical) indicates the identified sequence value (e.g., human or non-human gene type).
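The "redundancies removed" step and the descending sort can be sketched in a few lines. This is a hedged illustration, not the Table 5 program: hits are assumed to be `(entry, designation)` tuples, and `compress_and_sort` and `text_histogram` are hypothetical names. Collapsing repeated pairs into one row with a count corresponds to what the compression loop in Table 5 stores in its RFEND field; ties are broken alphabetically here, whereas Table 5 breaks them by clone number.

```python
from collections import Counter


def compress_and_sort(hits):
    """Collapse duplicate (entry, designation) pairs and sort by decreasing abundance.

    Each distinct pair becomes one row whose count is the number of clones
    that matched it. Ties are broken by entry name, then designation.
    """
    counts = Counter(hits)
    rows = [(entry, designation, n) for (entry, designation), n in counts.items()]
    rows.sort(key=lambda row: (-row[2], row[0], row[1]))
    return rows


def text_histogram(rows, width=30):
    """Render one bar per row, scaled so the most abundant row is `width` wide."""
    if not rows:
        return []
    top = rows[0][2]
    return [f"{entry:<10} {'#' * max(1, round(n * width / top))} {n}"
            for entry, _, n in rows]


if __name__ == "__main__":
    hits = [("HUMIL1", "E")] * 3 + [("HSOP", "E")] + [("NCY011050", "I")] * 2
    for line in text_histogram(compress_and_sort(hits)):
        print(line)
```

The text bars stand in for the histogram described above: one axis is the entry, the other the abundance number.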
The outputs (sorted lists) of the Tempsubsort and Temptarsort sorting operations are combined in the operation identified as "Cruncher". The "Cruncher" process identifies pairs of corresponding target and subtraction abundance numbers (both representing the same identified sequence value), divides one by the other to generate a "quotient" value for each pair of corresponding abundance numbers, and then sorts the quotient values in decreasing order. The data output of the "Cruncher" operation (the final transcript sequence in Figure 2) is typically a sorted list from which a histogram can be generated, in which position along one axis indicates the magnitude of the quotient of the abundance numbers (for the corresponding identified sequence values of the target and subtraction libraries) and position along another axis indicates the identified sequence value (e.g., gene type). Preferably, before taking the quotient of the two abundance values, the Cruncher operation also divides each abundance value by the total number of sequences in one or both of the target and subtraction libraries. The resulting lists of "relative" quotient values generated by the Cruncher operation are useful for many medical, scientific and industrial applications. Also preferably, the output of the Cruncher operation is a set of lists, each list representing a sequence of decreasing quotient values for a different selected subset (e.g., protein family) of the database entries.
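The preferred "relative" quotient can be sketched as below. This is a minimal sketch under assumptions: counts are dicts mapping a database entry to its clone count, library totals are passed in explicitly, and `cruncher` is a hypothetical name. Note that the quotients printed in Table 4 are raw count ratios (for example 121/3 = 40.333), so the normalized form shown here corresponds to the preferred variant described in the paragraph above. Entries missing from either library are simply skipped; one way of scoring such entries is the 1/2-clone substitution described in Example 6.8.

```python
def cruncher(target_counts, subtraction_counts, target_total, subtraction_total):
    """Rank entries found in both libraries by their normalized abundance quotient.

    Each count is first divided by its own library's total number of sequences,
    then the target fraction is divided by the subtraction fraction.
    Returns (entry, target count, subtraction count, quotient) rows,
    largest quotient first.
    """
    rows = []
    for entry, t in target_counts.items():
        s = subtraction_counts.get(entry, 0)
        if s == 0:
            continue  # pairing needs the entry in both libraries
        quotient = (t / target_total) / (s / subtraction_total)
        rows.append((entry, t, s, quotient))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows


if __name__ == "__main__":
    # Hypothetical counts and library sizes, for illustration only.
    target = {"HUMIL1": 40, "HSRPS8": 10}
    subtraction = {"HUMIL1": 5, "HSRPS8": 20}
    for row in cruncher(target, subtraction, 2000, 4000):
        print(row)
```

Normalizing by library size matters when the two libraries were sampled to different depths; with equal totals the normalized and raw quotients coincide.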

In one example, the abundance classification program of the invention tabulates, for a library, the number of mRNA transcripts corresponding to each gene identified in a database. These numbers are divided by the total number of clones sampled. The results of the division reflect the relative abundance of the mRNA transcripts in the cell or tissue type from which they were obtained; this final data set is referred to herein as a "gene transcription image analysis". When two such images are subtracted, the resulting data show in considerable detail which proteins and genes are up-regulated and which are down-regulated.

6.6. HUVEC cDNA library

Table 2 is an abundance table listing the transcripts of several genes in an induced HUVEC library, in descending order of abundance. This computerized sorting simplifies tissue analysis and accelerates the identification of significant new proteins that are specific to this cell type. Endothelial cells of this type line the tissues of the cardiovascular system, and the more that is known about their composition, particularly in response to activation, the greater the opportunity to find target proteins that can be used to treat disorders of this tissue, including the highly common arteriosclerosis.

6.7. Monocyte cell and mast cell cDNA libraries

Tables 3 and 4 show truncated comparisons of two libraries. In Tables 3 and 4, the "normal monocytes" are HMC-1 cells, and the "activated macrophages" are THP-1 cells previously treated with PMA and activated with LPS. Table 3 lists, in descending order of abundance, the most abundant gene transcripts for both cell types. Because it shows only the 15 most abundant gene transcripts of each cell type, this table allows a rapid qualitative comparison of the most common transcripts. This abundance sorting, with its convenient side-by-side display, provides an immediately useful research tool.
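A gene transcription image, and a Table 3 style top-15 comparison of two images, can be sketched as follows. The input format (a list with one matched database entry per sampled clone) and the function names are assumptions for illustration; the division by the total number of clones sampled follows the text above.

```python
from collections import Counter
from itertools import zip_longest


def transcription_image(entries):
    """Map each database entry to its clone count divided by the total clones sampled."""
    counts = Counter(entries)
    total = sum(counts.values())
    return {entry: n / total for entry, n in counts.items()}


def top_n(image, n):
    """Return the n most abundant entries, most abundant first (ties broken by name)."""
    ranked = sorted(image.items(), key=lambda item: (-item[1], item[0]))
    return [entry for entry, _ in ranked[:n]]


def side_by_side(image_a, image_b, n=15):
    """Pair the top-n lists of two images rank by rank and list the entries common to both."""
    a, b = top_n(image_a, n), top_n(image_b, n)
    rows = list(zip_longest(a, b, fillvalue=""))
    shared = sorted(set(a) & set(b))
    return rows, shared


if __name__ == "__main__":
    normal = transcription_image(["EF1A", "EF1A", "RPS8", "PABP"])
    activated = transcription_image(["IL1B", "IL1B", "IL1B", "PABP", "EF1A"])
    rows, shared = side_by_side(normal, activated, n=3)
    for rank, (left, right) in enumerate(rows, start=1):
        print(rank, left, right)
    print("in both top lists:", shared)
```

The entry names in the usage example are placeholders, not data from Table 3.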
In this example, the Table 3 comparison shows that 1) only one of the 15 most abundant transcripts in activated macrophages (poly-A binding protein) is also among the 15 most abundant transcripts in normal cells; and 2) a new gene transcript (not previously reported in any other database) is represented relatively highly in activated macrophages but is not similarly prominent in normal macrophages. Such a research tool gives researchers a shortcut to new proteins, such as receptors and cell-surface and intracellular signaling molecules, that can serve as drug targets in commercial drug screening programs. Such a tool can save considerable time compared with a trial-and-error program aimed at identifying important proteins in and around cells, because proteins that carry out everyday cellular functions and are represented as steady-state RNA are quickly eliminated from further characterization. This illustrates how gene transcript profiles change with altered cell function. Those skilled in the art know that the biochemical composition of cells also changes with other functional changes, such as cancer (including its various stages) and exposure to toxic agents. A gene transcript subtraction profile such as that in Table 3 is useful as a first screening tool for these kinds of gene expression and protein studies.

6.8. Subtraction analysis of normal monocyte cell and activated monocyte cell cDNA libraries

Once the cDNA data were in the computer, the computer program of Table 5 was used to obtain ratios for all gene transcripts in the two libraries described in Example 6.7, and the gene transcripts were sorted by descending ratio. Where the abundance of a gene transcript in one library is unknown but presumably less than 1, genes represented in only one of the two libraries are assigned an abundance of 1/2 as an approximation (and so that a quotient can be obtained, which would not be possible if the unrepresented gene had an abundance of zero). Using 1/2 for unrepresented clones increases the relative importance of genes that are switched "on" or "off", whose products would be drug candidates. The resulting printout is called a subtraction table and is an extremely valuable screening method, as the following data show.

Table 4 is a subtraction table in which the normal monocyte library was electronically "subtracted" from the activated macrophage library. This table highlights the changes in gene transcript abundance upon macrophage activation more effectively. Even among the first 20 transcripts listed, there are several transcripts of unknown genes. Electronic subtraction is thus a useful tool that helps researchers identify biological changes between two cell types more quickly. Such a tool can save universities and pharmaceutical companies, which spend billions of dollars on research, valuable research time and laboratory resources at the early discovery stage, and can accelerate the drug development cycle, which in turn allows researchers to establish drug screening programs much sooner. This research tool thus offers a way to bring new drugs to the public faster and more economically. A subtraction table can also be used for the diagnosis of patients: a sample from an individual patient (such as monocytes obtained from a blood sample) can be compared with the data provided herein to diagnose conditions associated with macrophage activation. Table 4 uncovers many new gene transcripts (called Incyte clones). Note that many genes are turned on in the activated macrophage (i.e., the monocyte library had 0 in the bgfreq column).
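The 1/2-clone substitution can be checked against a few rows of Table 4. This sketch is an illustration, not the Table 5 program itself: the dict inputs and the name `subtraction_table` are assumptions. Like the fusion routine in Table 5, it lists only entries present in the target library, reports a background count (bgfreq) of 0 for entries absent from the background library while dividing by 1/2, and orders rows by decreasing quotient and then decreasing bgfreq.

```python
def subtraction_table(target_counts, background_counts, absent=0.5):
    """Build a subtraction table in the style of Table 4.

    For every entry in the target library, the quotient is the target count
    (rfend) divided by the background count (bgfreq). When the entry is
    absent from the background library, `absent` (1/2 in the text) is used
    as the divisor so that a quotient can still be formed, while bgfreq is
    reported as 0. Rows are ordered by decreasing quotient, then decreasing
    bgfreq, then entry name.
    """
    rows = []
    for entry, rfend in target_counts.items():
        bgfreq = background_counts.get(entry, 0)
        divisor = bgfreq if bgfreq > 0 else absent
        rows.append((entry, bgfreq, rfend, rfend / divisor))
    rows.sort(key=lambda row: (-row[3], -row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Counts taken from four rows of Table 4.
    target = {"HUMIL1": 131, "HUMMIP1A": 121, "HUMCAPPRO": 10, "HSCD44E": 5}
    background = {"HUMMIP1A": 3, "HUMCAPPRO": 1}
    for entry, bgfreq, rfend, quotient in subtraction_table(target, background):
        print(f"{entry:<10} {bgfreq:>3} {rfend:>4} {quotient:8.3f}")
```

Run as written, it reproduces the quotients 262.000, 40.333, 10.000 and 10.000 shown for these entries in Table 4, with HUMCAPPRO (bgfreq 1) listed ahead of HSCD44E (bgfreq 0) at the same quotient.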
The subtraction screening method is superior to other screening techniques, such as the Western blot, which cannot uncover such a multitude of discrete new gene transcripts. The subtraction screening technique has also uncovered a number of transcripts of cancer-related genes (the rho oncogene, ETS2, the ras-related, YPT1-related rab-2, and acute myeloid leukemia mRNA) in activated macrophages. These transcripts can be attributed to the use of immortalized cell lines and are inherently interesting for that reason. This screening technique offers a detailed picture of up-regulated transcripts, including oncogenes, which help explain why anticancer drugs interfere with the patient's immunity mediated by activated macrophages. Armed with the knowledge gained from this screening method, those skilled in the art can establish more targeted and more effective drug screening programs to identify drugs that are differentially effective against 1) both the relevant cancers and the activated-macrophage conditions that share the same gene transcription profile; 2) the cancer alone; and 3) the activated-macrophage conditions alone. The smooth muscle senescence protein (22 kDa) was up-regulated in the activated macrophage, indicating that it is a candidate target to block in order to control inflammation.

6.9. Subtraction analysis of normal liver cell libraries and hepatitis-infected liver cell libraries

In this example, rats are exposed to the hepatitis virus and kept in the colony until they show definitive signs of hepatitis. Of the rats diagnosed with hepatitis, half are treated with a new anti-hepatitis agent. Liver samples are obtained from all rats before exposure to the hepatitis virus and at the end of the study, whether or not they received the anti-hepatitis agent. In addition, liver samples can be obtained from rats with hepatitis just before treatment with the anti-hepatitis agent.
The liver tissue is processed as described in Examples 6.2 and 6.3 to obtain mRNA and subsequently to sequence the cDNA. The cDNA of each sample is processed and analyzed for abundance with the computer program of Table 5. The resulting gene transcript images provide detailed pictures of the baseline (control) state of each animal and of the infected and/or treated states of the animals. The cDNA data for a group of samples can be combined into a summary gene transcript profile for all control samples, for all samples from the infected rats, and for all samples from the rats treated with the anti-hepatitis agent. Subtractions are made between the appropriate individual libraries and the pooled libraries. For individual animals, control and post-study samples can be subtracted. Also, if samples are obtained before and after treatment with the anti-hepatitis agent, the data of individual animals and of the treatment groups can be subtracted. In addition, the data for all control samples can be combined and averaged. The control average can be subtracted from the averaged post-study cDNA samples of both the treated and the untreated groups. If pre- and post-treatment samples are available, they can be compared individually (or averaged electronically) and subtracted. These subtraction tables are used in two general ways. First, the differences are analyzed for gene transcripts associated with continued hepatic impairment or with healing. Subtraction tables are tools for isolating the effects of drug treatment from the underlying pathology of hepatitis. Because hepatitis affects many parameters, additional liver toxicity has been difficult to detect with blood tests for the usual enzymes alone.
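The pooling and group comparisons above can be sketched as averaging relative-abundance profiles and then dividing one averaged profile by another. This is a hedged sketch: the dict-of-fractions layout, the function names, and the `floor` value that replaces a zero reference are assumptions. The text calls the comparison a "subtraction", which the Table 5 program implements as a quotient, so a quotient is used here as well.

```python
def average_profile(profiles):
    """Average several relative-abundance profiles entry by entry.

    Each profile maps a database entry to its relative abundance in one
    sample; an entry missing from a sample counts as 0 for that sample.
    """
    if not profiles:
        raise ValueError("need at least one profile")
    entries = set().union(*profiles)
    return {e: sum(p.get(e, 0.0) for p in profiles) / len(profiles) for e in entries}


def fold_change(profile, reference, floor=1e-4):
    """Quotient of two profiles for every entry in `profile`, largest first.

    `floor` stands in for a zero reference value so that a quotient can be
    formed, analogous to the 1/2-clone substitution used for clone counts.
    """
    rows = []
    for entry, value in profile.items():
        ref = reference.get(entry, 0.0)
        rows.append((entry, value / (ref if ref > 0 else floor)))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


if __name__ == "__main__":
    # Hypothetical profiles for two control rats and one infected group average.
    controls = [{"ALB": 0.20, "CYP": 0.10}, {"ALB": 0.40}]
    infected = {"ALB": 0.30, "CYP": 0.02, "IFI": 0.01}
    for entry, ratio in fold_change(infected, average_profile(controls), floor=0.001):
        print(entry, round(ratio, 2))
```

The same two functions cover the individual-animal comparisons (pass single-sample profiles) and the group comparisons (pass averaged profiles).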
Gene transcription and subtraction profiles provide the much more complex biochemical picture that researchers have needed to analyze these difficult problems. Second, the subtraction tables provide a tool for identifying clinical markers: individual proteins or other biochemical determinants that are used to predict and/or evaluate a clinical end point, such as disease, drug-related improvement, or even additional drug-induced pathology. The subtraction tables specifically highlight the genes that are turned on or off. Thus, the subtraction tables provide a first screen for a set of candidate gene transcripts to be used as clinical markers. Subsequent electronic subtractions against additional cell and tissue libraries reveal which of the potential markers are also found in other cell and tissue libraries. Candidate gene transcripts found in additional libraries are removed from the pool of potential clinical markers. Relevant samples known to have the condition and samples known to lack it are then compared to validate the selection of the clinical marker. In this method, the particular physiological function of the protein encoded by the transcript does not need to be determined for the gene transcript to qualify as a clinical marker.

6.10. Electronic Northern blot

A limitation of electronic subtraction is that it is difficult to compare more than one pair of images at a time. Once products are identified as relevant for further study (via electronic subtraction or other methods), it is useful to study the expression of single genes in a multitude of different tissues. In the laboratory, the "Northern" blot hybridization technique is used for this purpose. In this technique, a single cDNA, or a probe corresponding to it, is labeled and then hybridized against a blot containing RNA samples prepared from a multitude of tissues or cell types.
Upon autoradiography, the expression pattern of that particular gene can be quantified, one gene at a time, across all included samples. In contrast, another embodiment of this invention is the computerized form of this process, referred to herein as an "electronic Northern blot." In this variation, the expression of a single gene is queried against a multitude of prepared and sequenced libraries present in the database. In this way, the expression pattern of any candidate gene can be examined quickly and with little effort, so that more candidate genes can be explored, leading to more frequent relevant discoveries. The computer program included as Table 5 includes a program to perform this function, and Table 6 is a partial list of the database entries used in the electronic Northern blot analysis.

6.11. Phase I clinical trials

Based on the safety and efficacy established in the preceding animal tests, Phase I clinical trials were undertaken. Normal subjects undergo preliminary clinical laboratory tests. In addition, appropriate specimens are taken and subjected to gene transcription analysis. Additional specimens are taken from the subjects at predetermined intervals during the trial and are subjected to gene transcription analysis as described above. In addition, the gene transcript changes noted in the earlier rat toxicity study are carefully evaluated as clinical markers in the monitored subjects. Changes in the gene transcription analyses are evaluated as indicators of toxicity by correlation with clinical signs and symptoms and other laboratory results. In addition, subtraction is performed on specimens from individual patients and on averaged patient specimens. The subtraction analysis highlights any toxicological change in the treated patients, making it a highly sensitive determinant of toxicity. The subtraction method also evaluates clinical markers.
Other subgroups can be analyzed by subtraction analysis, including, for example, 1) segregation by occurrence and type of adverse effects; and 2) segregation by dose.

6.12. Gene transcription image analysis in clinical studies

A gene transcription image analysis (or multiple gene transcription image analyses) is a useful tool in other clinical studies. For example, differences between gene transcript images before and after treatment can be examined in patients receiving the drug and in patients receiving placebo. This method also effectively selects clinical markers for monitoring the continued clinical use of the drug.

6.13. Comparative analysis of gene transcription between species

The subtraction method can be used to screen cDNA libraries of diverse origins. For example, the same cell types from different species can be compared by gene transcription analysis to identify specific differences, such as differences in detoxification enzyme systems. These tests help in the selection and validation of an animal model for drugs intended for human or animal use. When the comparison between animals of different species is shown in columns, one for each species, we refer to this as an interspecies comparison, or "zoo blot".

The embodiments of this invention may employ databases such as those written using the FoxBASE programming language commercially available from Microsoft Corporation. Other embodiments of the invention employ other databases, such as a random peptide database, a polymer database, an oligomer database, or an oligonucleotide database of the type described in United States Patent No. 5,270,170, issued December 14, 1993 to Cull et al.; PCT International Publication No. WO 93/22684, published November 11, 1993; PCT International Publication No. WO 93/06121, published April 1, 1993; or PCT International Publication No. WO 91/19818, published December 26, 1991.
These four references (the texts of which are incorporated herein by reference) include descriptions that may be applied to implement those other embodiments of the present invention. All references cited in the foregoing text are expressly incorporated herein by reference. Various modifications and variations of the described method and system of the invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. Although the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments.

TABLE 1

Designation (D): E = exact; H = homologous; O = other species; N = no match; I = Incyte clone; X = matches EST; D = non-coding gene; U = unreadable; R = repetitive DNA; A = poly-A only; V = vector only; S = skip
Distribution (F): C = not specific; P = cell/tissue specific; U = unknown
Species (S): H = human; A = monkey; P = pig; D = dog; V = bovine; B = rabbit; R = rat; M = mouse; H = hamster; C = chicken; F = amphibian; I = invertebrate; Z = protozoon; G = fungus
Library (L): U = U937; M = HMC; T = THP-1; H = HUVEC; S = spleen; L = lung; Y = T and B cells; A = adenoid
Location (Z): N = nuclear; C = cytoplasmic; K = cytoskeleton; E = cell surface; Z = intracellular membrane; N = mitochondria; S = secreted; U = unknown; X = other
Function (R): T = translation; L = protein processing; R = ribosomal protein; O = oncogene; G = GTP-binding protein; V = viral element; Y = kinase/phosphatase; A = tumor-related antigen; I = binding proteins; D = DNA binding/transcription; B = surface molecule/receptor; C = Ca++-binding protein; S = ligands/effectors; H = voltage response protein; E = enzyme; F = ferroprotein; P = protease/inhibitor; Z = oxidative phosphorylation; Q = sugar metabolism; M = amino acid metabolism; N = nucleic acid metabolism; W = lipid metabolism; K = structural; X = other; U = unknown
State (I): 0 = no current interest; 1 = do first analysis; 2 = first analysis done; 3 = full-length sequence; 4 = secondary analysis; 5 = tissue Northern; 6 = obtain full length

TABLE 2

Clone numbers 15000 to 20000
Library: HUVEC, sorted by ABUNDANCE
Total clones analyzed: 5000
319 genes, for a total of 1713 clones

No.  number  N   entry      S  descriptor
1    15365   67  HSRPL41       Riboptn L41
2    15004   65  NCY015004     INCYTE 015004
3    15638   63  NCY015638     INCYTE 015638
4    15390   50  NCY015390     INCYTE 015390
5    15193   47  HSFIB1        Fibronectin
6    15220   47  RRRPL9        Riboptn L9
7    15280   47  NCY015280     INCYTE 015280
8    15583   33  M62060        EST HHCH09 (IGR)
9    15662   31  HSACTCGR      Actin, gamma
10   15026   29  NCY015026     INCYTE 015026
11   15279   24  HSEF1AR       Elf 1-alpha
12   15027   23  NCY015027     INCYTE 015027
13   15033   20  NCY015033     INCYTE 015033
14   15198   20  NCY015198     INCYTE 015198
15   15809   20  HSCOLL1       Collagenase
16   15221   19  NCY015221     INCYTE 015221
17   15263   19  NCY015263     INCYTE 015263
18   15290   19  NCY015290     INCYTE 015290
19   15350   18  NCY015350     INCYTE 015350
20   15030   17  NCY015030     INCYTE 015030
21   15234   17  NCY015234     INCYTE 015234
22   15459   16  NCY015459     INCYTE 015459
23   15353   15  NCY015353     INCYTE 015353
24   15378   15  S76965        Ptn kinase inhib
25   15255   14  HUMTHYB4      Thymosin beta-4
26   15401   14  HSLIPCR       Lipocortin I
27   15425   14  HSPOLYAB      Poly-A bp
28   18212   14  HUMTHYMA      Thymosin, alpha
29   18216   14  HSMRP1        Motility relat ptn; MRP-1; CD-9
30   15189   13  HS18D         Interferon induc ptn 1-8D
31   15031   12  HUMFKBP       FK506 bp
32   15306   12  HSH2AZ        Histone H2A
33   15621   12  HUM EC        Lectin, B-galbp, 14kDa
34   15789   11  NCY015789     INCYTE 015789
35   16578   11  HSRPS11       Riboptn S11
36   16632   11  M61984        EST HHCA13 (IGR)
37   18314   11  NCY018314     INCYTE 018314
38   15367   10  NCY015367     INCYTE 015367
39   15415   10  HSIFNIN1      Interferon induc mRNA
40   15633   10  HSLDHAR       Lactate dehydrogenase
41   15813   10  CHKNMHCB   C  Myosin heavy chain B
42   18210   10  NCY018210     INCYTE 018210
43   18233   10  HSRPII140     RNA polymerase II
44   18996   10  NCY018996     INCYTE 018996
45   15088   9   HUMFERL       Ferritin, light chain
46   15714   9   NCY015714     INCYTE 015714
47   15720   9   NCY015720     INCYTE 015720
48   15863   9   NCY015863     INCYTE 015863
49   16121   9   HSET          Endothelin
50   18252   9   NCY018252     INCYTE 018252
51   15351   8   HUMALBP       Lipid bp, adipocyte
52   15370   8   NCY015370     INCYTE 015370
53   15670   8   BTCIASI    V  NADH-ubiq oxidoreductase
54   15795   8   NCY015795     INCYTE 015795
55   16245   8   NCY016245     INCYTE 016245
56   18262   8   NCY018262     INCYTE 018262
57   18321   8   HSRPL17       Riboptn L17
58   15126   7   XLRPL1BR   F  Riboptn L1b
59   15133   7   HSAC07        Actin, beta
60   15245   7   NCY015245     INCYTE 015245
61   15288   7   NCY015288     INCYTE 015288
62   15294   7   HSGAPDR       G-3-PD
63   15442   7   HUMLAMB       Laminin receptor, 54kDa
64   15485   7   HSNGMRNA      Uracil DNA glycosylase
65   16646   7   NCY016646     INCYTE 016646
66   18003   7   HUMPAIA       Plsmnogen activ gene
67   15032   6   HUMUB         Ubiquitin
68   15267   6   HSRPS8        Riboptn S8
69   15295   6   NCY015295     INCYTE 015295
70   15458   6   RNRPS10R   R  Riboptn S10
71   15832   6   RSGALEM    R  UDP-galactose epimerase
72   15928   6   HUMAPOJ       Apolipoptn J
73   16598   6   HUMTBBM40     Tubulin, beta
74   18218   6   NCY018218     INCYTE 018218
75   18499   6   HSP27         Hydrophobic ptn p27
76   18963   6   NCY018963     INCYTE 018963
77   18997   6   NCY018997     INCYTE 018997
78   15432   5   HSAGALAR      Galactosidase A, alpha
79   15475   5   NCY015475     INCYTE 015475
80   15721   5   NCY015721     INCYTE 015721
81   15865   5   NCY015865     INCYTE 015865
82   16270   5   NCY016270     INCYTE 016270
83   16886   5   NCY016886     INCYTE 016886
84   18500   5   NCY018500     INCYTE 018500
85   18503   5   NCY018503     INCYTE 018503
86   19672   5   RRRPL34    R  Riboptn L34
87   15086   4   XLRPL1AR   F  Riboptn L1a
88   15113   4   HUMIFNWRS     tRNA synthetase, trp
89   15242   4   NCY015242     INCYTE 015242
90   15249   4   NCY015249     INCYTE 015249
91   15377   4   NCY015377     INCYTE 015377
92   15407   4   NCY015407     INCYTE 015407
93   15473   4   NCY015473     INCYTE 015473
94   15588   4   HSRPS12       Riboptn S12
95   15684   4   HSEF1G        Elf 1-gamma
96   15782   4   NCY015782     INCYTE 015782
97   15916   4   HSRPS18       Riboptn S18
98   15930   4   NCY015930     INCYTE 015930
99   16108   4   NCY016108     INCYTE 016108
100  16133   4   NCY016133     INCYTE 016133

TABLE 3

NORMAL MONOCYTE VERSUS ACTIVATED MACROPHAGE: THE 15 MOST ABUNDANT GENES

Rank  Normal monocyte                            Activated macrophage
1     Elongation factor 1 alpha                  Interleukin-1 beta
2     Ribosomal phosphoprotein                   Macrophage inflammatory protein
3     Ribosomal protein S8 homolog               Interleukin
4     Beta-globin                                Lymphocyte activation gene
5     Ferritin H chain                           Elongation factor 1 alpha
6     Ribosomal protein L7                       Actin, beta
7     Nucleoplasmin                              Prololine specifies T-cells of variance
8     Ribosomal protein S20                      Poly-A binding protein homolog
9     Transferrin receptor                       Osteopontin; nephropontin
10    Poly-A binding protein                     Tumor necrosis factor alpha
11    Translationally controlled tumor protein   INCYTE clone 011050
12    Ribosomal protein S25                      Cu/Zn superoxide dismutase
13    Signal recognition particle SRP9           Adenylate cyclase (yeast homolog)
14    Histone H2A.Z                              B-cell activation molecule, NGF-related
15    Ribosomal protein Ke-3                     Glial-derived protease nexin-I

TABLE 4

Library: THP-1
Subtraction: HMC
Sorted by ABUNDANCE
Total clones analyzed: 7375
1057 genes, for a total of 2151 clones

number  entry      descriptor                    bgfreq  rfend  quotient
10022   HUMIL1     IL 1-beta                     0       131    262.000
10036   HSMDNCF    IL-8                          0       119    238.000
10089   HSLAG1CDN  Lymphocyte activ gene         0       71     142.000
10060   HUMTCSM    RANTES                        0       23     46.000
10003   HUMMIP1A   MIP-1                         3       121    40.333
10689   HSOP       Osteopontin                   0       20     40.000
11050   NCY011050  INCYTE 011050                 0       17     34.000
10937   HSTNFR     TNF-alpha                     0       17     34.000
10176   HSSOD      Superoxide dismutase          0       14     28.000
10886   HSCDW40    B-cell activ, NGF-relat       0       10     20.000
10186   HUMAPR     Early resp PMA-induc          0       9      18.000
10967   HUMGDN     PN-1, glial-deriv             0       9      18.000
11353   NCY011353  INCYTE 011353                 0       8      16.000
10298   NCY010298  INCYTE 010298                 0       7      14.000
10215   HUM4COLA   Collagenase, type IV          0       6      12.000
10276   NCY010276  INCYTE 010276                 0       6      12.000
10488   NCY010488  INCYTE 010488                 0       6      12.000
11138   NCY011138  INCYTE 011138                 0       6      12.000
10037   HUMCAPPRO  Adenylate cyclase             1       10     10.000
10840   HUMADCY    Adenylate cyclase             0       5      10.000
10672   HSCD44E    Cell adhesion glptn           0       5      10.000
12837   HUMCYCLOX  Cyclooxygenase-2              0       5      10.000
10001   NCY010001  INCYTE 010001                 0       5      10.000
10005   NCY010005  INCYTE 010005                 0       5      10.000
10294   NCY010294  INCYTE 010294                 0       5      10.000
10297   NCY010297  INCYTE 010297                 0       5      10.000
10403   NCY010403  INCYTE 010403                 0       5      10.000
        NCY010699  INCYTE 010699                 0       5      10.000
        NCY010966  INCYTE 010966                 0       5      10.000
        NCY012092  INCYTE 012092                 0       5      10.000
12549   HSRHOB     Oncogene rho                  0       5      10.000
10691   HUMARF1BA  ADP-ribosylation fctr         0       4      8.000
12106   HSADSS     Adenylosuccinate synthetase   0       4      8.000
10194   HSCATHL    Cathepsin L                   0       4      8.000
10479   CLMCYCA    Cyclin A                      0       4      8.000
10031   NCY010031  INCYTE 010031                 0       4      8.000
10203   NCY010203  INCYTE 010203                 0       4      8.000
10288   NCY010288  INCYTE 010288                 0       4      8.000
10372   NCY010372  INCYTE 010372                 0       4      8.000
10471   NCY010471  INCYTE 010471                 0       4      8.000
10484   NCY010484  INCYTE 010484                 0       4      8.000
10859   NCY010859  INCYTE 010859                 0       4      8.000
10890   NCY010890  INCYTE 010890                 0       4      8.000
11511   NCY011511  INCYTE 011511                 0       4      8.000
11868   NCY011868  INCYTE 011868                 0       4      8.000
12820   NCY012820  INCYTE 012820                 0       4      8.000
10133   HSI1RAP    IL-1 antagonist               0       4      8.000
10516   HUMP2A     Phosphatase, regul 2A         0       4      8.000
11063   HUMB94     TNF-induced response          0       4      8.000
11140   HSHB15RNA  HB15 gene; new Ig             0       3      6.000
10788   NCY001713  INCYTE 001713                 0       3      6.000
10033   NCY010033  INCYTE 010033                 0       3      6.000
10035   NCY010035  INCYTE 010035                 0       3      6.000
10084   NCY010084  INCYTE 010084                 0       3      6.000
10236   NCY010236  INCYTE 010236                 0       3      6.000
10383   NCY010383  INCYTE 010383                 0       3      6.000
10450   NCY010450  INCYTE 010450                 0       3      6.000
10470   NCY010470  INCYTE 010470                 0       3      6.000
10504   NCY010504  INCYTE 010504                 0       3      6.000
10507   NCY010507  INCYTE 010507                 0       3      6.000
10598   NCY010598  INCYTE 010598                 0       3      6.000
10779   NCY010779  INCYTE 010779                 0       3      6.000
10909   NCY010909  INCYTE 010909                 0       3      6.000
10976   NCY010976  INCYTE 010976                 0       3      6.000
10985   NCY010985  INCYTE 010985                 0       3      6.000
11052   NCY011052  INCYTE 011052                 0       3      6.000
11068   NCY011068  INCYTE 011068                 0       3      6.000
11134   NCY011134  INCYTE 011134                 0       3      6.000
11136   NCY011136  INCYTE 011136                 0       3      6.000
11191   NCY011191  INCYTE 011191                 0       3      6.000
11219   NCY011219  INCYTE 011219                 0       3      6.000
11386   NCY011386  INCYTE 011386                 0       3      6.000
11403   NCY011403  INCYTE 011403                 0       3      6.000
11460   NCY011460  INCYTE 011460                 0       3      6.000
11618   NCY011618  INCYTE 011618                 0       3      6.000
11686   NCY011686  INCYTE 011686                 0       3      6.000
12021   NCY012021  INCYTE 012021                 0       3      6.000
12025   NCY012025  INCYTE 012025                 0       3      6.000
12320   NCY012320  INCYTE 012320                 0       3      6.000
12330   NCY012330  INCYTE 012330                 0       3      6.000
12853   NCY012853  INCYTE 012853                 0       3      6.000
14386   NCY014386  INCYTE 014386                 0       3      6.000
14391   NCY014391  INCYTE 014391                 0       3      6.000

TABLE 5

* Master menu for SUBTRACTION output
SET TALK OFF
SET SAFETY OFF
SET EXACT CN SET TYPEAKEAD TO 0 CLEAR 'SET DEVTCCE TO SCREEN USS - "SnHxtGuysFo? BAS? + Mac: fox files sClonea .dbí" qo TOP 'STORE NUMBER TO HOTIATE with BOTGOM STCRB NUMBER TO'TERMIN? TB STORE' 'TO Targecl STORE' 'TO Tárgßt2 STORE' 'TO Targee3 STORE.' • TO objectl STORE '• TO 0bject2 STORE' 'TO Objssct3 STORE 0 TO A AL' • 'STORE 0 TO EMATCH STORE 0 TO HMATCH STORE 0 TO CMATCH STORE 0 TO IMATCH STORE 0 TO JTF STCR? 1 TO BAIL EO WHILE .T. '* • program. i • Subtraction 2.f t '»Data .10 / 11/94. ... * Version, i Fo? BASE + / Kac, zevÍBÍc 1.10 * Nstea ... . \ Fsppafc file SubtraCt'iO? 3. * SCREEN 1 TYPE 0 HEADIN3 '9cxe «n 1 * AT 40, * 2 SIZS 286,492 PIXELS FCNT' G« neva \ 9 COLOR 0,0,0, PIXSLS 75,120 TO 178,241 STY23871 COLOR 0,0, -1,24610 , -1,8947 3 PIXELS 27, 134 SAY 'Subtxac ± on Menu * STYLE 65536 FOOT "Gßpßv»', 274 COLOR 0,0, -1, -1, -1, -1 ß PIXELS 117,126 G? T EMATCH STYLE 65536 FOOT 'Chicago' 12 PICIUR? 'C * Exact' SIZE15; 62 'CO' PIXELS 135.125 GET HMATCH STYLS 65536 FONT 'Chicago * .12.PICTURB' VC Homologous' SIZE .15,1 9 PIXELS 153,126 GET OM? TCH STYLE 55536 FOOT 'Chicago', 12 PICTURE "Í« C Other epc 'SIZE 15.84 8 PIX? LS 90,152 SAY "Matehesi'.8T? LE 65535 FONT« G «nev * M2 COLOR 0,0, tl, -l, -1, -1 ß PIXELS 171,126 G? T match STYLE 65536 FONT • Chicago ', 12 PICTURS "ß * C Tncy e 'SIZE 15.55 CO @ PIXELS 252,137 GET start STYLE 0 FCNT 'Geneva', 12 SIZE 15.70 COLOR 0,0, -1, -1, -1, -1 (I PIXELS 252,236 GET TERMINATE STYLE 0 FOMT 'Gßneva.', 12 SIZE 15,70 COLOR 0,0, -1, -1, -1, -1 ß PIXELS 252,35 BAY "laclu or clones'" STYLS 65536 FCNT 'Gßn' goes * M2 COLOR 0,0 -1, -1, -1, -1 Q FIXELS- 252,2156AY "-> 'STYLE 65536 FOOT' Genßva ', 14 COLOR 0,0, -1, -1, -1, -1. @ PIXELS 198,126 GET PTF ETYLE S5536 CNT' Chidago ', 12 PICTURE "ß'CPrilJt CO file 'SIZE 15', 9 ß 'PIXELS 90.9 TO 181,109 STYLE 3T71 COLOR 0,0, -1, -25500, -1, -1 ß PIXELS 90.288 TO'181.397 STYL23871 COLOR 0,0, -1, -25600, -1, -1 ß PIXELS di.296 SñY "Backgrcund: * ST? 
LE 65536 FCNT * G« n «va.s, 27a COLOR 0,0, -1, -1, -1, -1 ß FIX L L? 45.135 GBT ANAL STYLE 65536 FCOT 'Chicago', .12 PICTURE '«' R OvfralljFuacticn * S1ZE 4 ß PIXELS 81,26 SAY 'Targßtj' ST? LE 65536 FONT" Gßnßva ', 270 COLOR 0, 0, -1, -1, -1, -1 * PKCSL6108,20 GET targetl STYLE 0 K3OT "Genéva" -, 9 SIZE 12.79 COLOR 0,0, íl, -1, -1, -1 • ß PIXELS 135,20 GET carg «t2 STYLE 0 PCNT * Ceneva", 9 8IZE 12.79 COLOR 0,0, -1, -1, -1, -1 .ß PIXELS 162,20 GET targett3 STYLS 0 FCOT "0« n «V *" "9 5I2J.12,79 COLOR 0,0, -1, -1, -1, -1 ß PIXELS 108,299 GET objectl STYLE 0 FCNT 'G« to «va', 9 SIZE 12.79-COLOR 0, 0, -1, -1, -1, -1 ß? IXELS 135,299 GET object2 STYLE 0. FOTT 'G nßva', 9 SIZE 12.79 COLOR 0,0, -1, -1, -1, -1 8 PIXELS 162,299 G? T? Bject3 STYLE 0 FCNT "Geneva", 9 SIZE 12.79 COLOR 0, 0, -1, -1 , -1, -1 • »PIXELS 27 €, 324, GET Bail STYLS 65536 FCNT 'Chicago', 12 PICTURE * ß * R BunjS &il out" B1ZS 4112 • * EOFs Subtractian. . fmt READ 11? Bail-2 CL2AR CLOSS DATABASES USE "SmartGu; FoxBASB + / Mac? Fcx fi ßi: clones.dbf" .SET SAEGGT ON SCREEN.1 OFF HEIURN ENDT7 STORE VAL (5YS (2)) TO STARTIÍIE STORE OTPERjTargetl.) TO Targetl STORE UPFER (Targ * t2) TO Targßt2 ^ STORE UPPE (Targßt3) TO Target3 STORE UPPE (Objectl) TO Objectl STORE UPPER (objectS) T0Object2 STORE U? PER (Objšct3) TO Objšct3 clear SET TALX Cíí GAP s TEB? TE-INTTIATE + I GO INITIATE COPY NEXT GAP FIELDS NUM3ER, release .D.P, 2,, ENTRY, S, DESCRIPTOR, START.RF? ND, I TO t? MPNUM USE TEHEtlUM CCUNT TO TOT COPY TO TE24PRED FOR D- 'E'. OR.D- 'O' .OR. D ^ H 1. OR. D- 'N1 .OR. D »'! 1- USE TEKPRSD IF EaiatchiO .AND. Kmacch-Q .AND. Onatch-O .AND. HíATCH »0 COPY TO TE4PDESIG COPY STRUCTJRS TO TEMPDESIG USE TEMPDESIG IX Ercatch »l APPEND FRCM TEMPNUM FOR D * 'B' ENDF IF'Hmatch *! APPEND FECW TEMPÍTOM FOR D- 'H' ENDTF XF cpatch * L AFP? ND FR? JM, 'TEMPNUM FOR D ^' O 'ENDIF XT Uratcha l? PH = ND FRCM TEKENUM FOX D * 'I' .O ^. D »'X'« .OR.D- 'N' EHDIF COONT TO STAKXOT COPY STRUCTORE TO TEMPLIB USE TEKFLI? 
APPEND FROM TEMPDESIG FOR LIBRARY = UPPER(Target1)
IF Target2 <> ''
APPEND FROM TEMPDESIG FOR LIBRARY = UPPER(Target2)
ENDIF
IF Target3 <> ''
APPEND FROM TEMPDESIG FOR LIBRARY = UPPER(Target3)
ENDIF
COUNT TO ANALTOT
USE TEMPDESIG
COPY STRUCTURE TO TEMPSUB
USE TEMPSUB
APPEND FROM TEMPDESIG FOR LIBRARY = UPPER(Object1)
IF Object2 <> ''
APPEND FROM TEMPDESIG FOR LIBRARY = UPPER(Object2)
ENDIF
IF Object3 <> ''
APPEND FROM TEMPDESIG FOR LIBRARY = UPPER(Object3)
ENDIF
COUNT TO SUBTRACTOT
SET TALK OFF
* COMPRESSION SUBROUTINE A
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
SORT ON ENTRY,NUMBER TO LIBSORT
USE LIBSORT
COUNT TO IDGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= IDGENE
PACK
COUNT TO AUNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
STORE D TO DESIGA
SW = 0
DO WHILE SW = 0 TEST
SKIP
STORE ENTRY TO TESTB
STORE D TO DESIGB
IF TESTA = TESTB .AND. DESIGA = DESIGB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
SORT ON RFEND/D,NUMBER TO TEMPTARSORT
USE TEMPTARSORT
* REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO
* COMPRESSION SUBROUTINE B
? 'COMPRESSING TARGET LIBRARY'
USE TEMPSUB
SORT ON ENTRY,NUMBER TO SUBSORT
USE SUBSORT
COUNT TO SUBGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= SUBGENE
PACK
COUNT TO BUNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
STORE D TO DESIGA
SW = 0
DO WHILE SW = 0 TEST
SKIP
STORE ENTRY TO TESTB
STORE D TO DESIGB
IF TESTA = TESTB .AND. DESIGA = DESIGB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
SORT ON RFEND/D,NUMBER TO TEMPSUBSORT
USE TEMPSUBSORT
* REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPSUBCO
************************************************************
* FUSION ROUTINE
? 'SUBTRACTING LIBRARIES'
USE SUBTRACTION
COPY STRUCTURE TO CRUNCHER
SELECT 2
USE TEMPSUBSORT
SELECT 1
USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0
DO WHILE .T.
SELECT 1
MARK = MARK + 1
IF MARK > BAILOUT
EXIT
ENDIF
GO MARK
STORE ENTRY TO SCANNER
SELECT 2
LOCATE FOR ENTRY = SCANNER
IF FOUND()
STORE RFEND TO BIT1
STORE RFEND TO BIT2
ELSE
STORE 1/2 TO BIT1
STORE 0 TO BIT2
ENDIF
SELECT 1
REPLACE BGFREQ WITH BIT2
REPLACE ACTUAL WITH BIT1
LOOP
ENDDO
SELECT 1
REPLACE ALL RATIO WITH RFEND/ACTUAL
SORT ON RATIO/D,BGFREQ/D,DESCRIPTOR TO FINAL
************************************************************
DO CASE
CASE PTF = 0
SET DEVICE TO PRINT
SET PRINT ON
EJECT
CASE PTF = 1
SET ALTERNATE TO 'A enoíd .Patent Figures:Subtraction.txt'
SET ALTERNATE ON
ENDCASE
STORE VAL(SYS(2)) TO FINTIME
IF FINTIME < STARTIME
STORE FINTIME + 86400 TO FINTIME
ENDIF
STORE FINTIME - STARTIME TO COMPSEC
STORE COMPSEC/60 TO COMPMIN
************************************************************
SET MARGIN TO 10
@ 1,1 SAY "Library Subtraction Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1
?
? DATE()
tch = 0 .AND. OMATCH = 0 .AND. IMATCH = 0
IF HMATCH = 1
?? 'Human, '
ENDIF
IF OMATCH = 1
?? 'Other sp. '
ENDIF
IF IMATCH = 1
?? 'INCYTE'
ENDIF
IF ANAL = 1
? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL = 2
? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(TOT,5,0)
? 'Total clones analyzed: '
?? STR(STARTOT,5,0)
? 'Total computation time: '
?? STR(COMPMIN,5,2)
?? ' Minutes'
? 'd = designation f = distribution z = location r = function s = species i = interest'
************************************************************
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0
? STR(AUNIQUE,4,0)
?? ' genes, for a total of '
?? STR(ANALTOT,4,0)
?? ' clones'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I
SET PRINT OFF
CLOSE DATABASES
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
CASE ANAL = 2
* arrange/function
SET PRINT ON
SET HEADING ON
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
?
? 'BINDING PROTEINS'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Surface molecules and receptors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'B'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Calcium-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'C'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ligands and effectors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'S'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'I'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? 'ONCOGENES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'General oncogenes:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'G'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'GTP-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'O'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Viral elements:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'V'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Kinases and phosphatases:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'Y'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Tumor-related antigens:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'A'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? 'PROTEIN SYNTHETIC MACHINERY PROTEINS'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Transcription and nucleic acid-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'D'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Translation:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'T'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ribosomal proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'R'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Protein processing:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'L'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? 'ENZYMES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ferroproteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'F'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Proteases and inhibitors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'P'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Oxidative phosphorylation:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'Z'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Sugar metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'Q'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Amino acid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'M'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Nucleic acid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'N'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Lipid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'W'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other enzymes:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'E'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? 'MISCELLANEOUS CATEGORIES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Stress response:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'H'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Structural:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'K'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other clones:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'X'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Clones of unknown function:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R = 'U'
ENDCASE
DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
ERASE TEMPLIB.DBF
ERASE TEMPNUM.DBF
ERASE TEMPDESIG.DBF
SET MARGIN TO 0
CLEAR
LOOP
ENDDO
* Northern (single), version 11-25-94
CLOSE DATABASES
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO zog
STORE 1 TO Bail
DO WHILE .T.
* Program: Northern (single).fmt
* Date: 8/8/94
* Version: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,81 TO 46,397 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,79 TO 192,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 115,98 SAY "Entry" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,89 SAY "Description" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva",12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1
@ PIXELS 35,89 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Continue;Bail out" SIZE 9
@ PIXELS 175,98 SAY "Clone #" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1
@ PIXELS 80,152 SAY "Enter ONE of the following" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
* EOF: Northern (single).fmt
READ
IF Bail = 2
CLEAR
SCREEN 1 OFF
RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF
IF Dobject <> ' '
SET EXACT OFF
SET SAFETY OFF
SORT ON descriptor TO "Lookup descriptor.dbf"
SET SAFETY ON
USE "Lookup descriptor.dbf"
LOCATE FOR UPPER(TRIM(descriptor)) = UPPER(TRIM(Dobject))
IF .NOT. FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF
IF Numb <> 0
USE "SmartGuy:FoxBASE+/Mac:Fox files:Clones.dbf"
GO Numb
BROWSE
STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK) <> 'Y'
SCREEN 1 OFF
RETURN
ENDIF
* COMPRESSION SUBROUTINE FOR LIBRARY.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:Libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf" FOR entered > 0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered = 0
PACK
COUNT TO TOT
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
SW2 = 1
LOOP
ENDIF
GO MARK1
STORE library TO TESTA
SKIP
STORE library TO TESTB
IF TESTA = TESTB
DELETE
ENDIF
MARK1 = MARK1 + 1
LOOP
ENDDO ROLL
* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:Clones.dbf"
COPY TO "Hits.dbf" FOR entry = Searchval
SET SAFETY ON
* MASTER ANALYSIS 3, VERSION 12-5-94
* Master menu for analysis output
CLOSE DATABASES
SET TALK OFF
SET SAFETY OFF
CLEAR
SET DEVICE TO SCREEN
SET DEFAULT TO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:"
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTF
DO WHILE .T.
* Program: Master analysis.fmt
* Date: 12/9/94
* Version: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0
@ PIXELS 39,255 TO 277,430 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 75,120 TO 178,241 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 27,98 SAY "Customized Output Menu" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 45,54 GET CONDEN STYLE 65536 FONT "Chicago",12 PICTURE "@*C Condensed format"
@ PIXELS 54,261 GET ANAL STYLE 65536 FONT "Chicago",12 PICTURE "@*RV Sort/number;Sort/entry"
@ PIXELS 117,126 GET EMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Exact" SIZE 15,62
@ PIXELS 135,126 GET HMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Homologous" SIZE 15,1
@ PIXELS 153,126 GET OMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Other spc" SIZE 15,84
@ PIXELS 90,152 SAY "Matches:" STYLE 65536 FONT "Geneva",268 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 53,54 GET PRINTON STYLE 65536 FONT "Chicago",12 PICTURE "@*C Include clone listing"
@ PIXELS 171,126 GET IMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Incyte" SIZE 15,65
@ PIXELS 252,146 GET INITIATE STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 270,146 GET TERMINATE STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 234,134 SAY "Include clones" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 270,125 SAY "-" STYLE 65536 FONT "Geneva",14 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 198,126 GET PTF STYLE 65536 FONT "Chicago",12 PICTURE "@*C Print to file" SIZE 15,9
@ PIXELS 189,0 TO 257,120 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 209,8 SAY "Library selection" STYLE 65536 FONT "Geneva",266 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 227,18 GET ENTIRE STYLE 65536 FONT "Chicago",12 PICTURE "@*RV All;Selected" SIZE 16
* EOF: Master analysis.fmt
READ
IF ANAL = 9
CLEAR
CLOSE DATABASES
ERASE TEMPMASTER.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF
CLEAR
? INITIATE
? TERMINATE
? CONDEN
? ANAL
? EMATCH
? HMATCH
? OMATCH
? IMATCH
SET TALK ON
IF ENTIRE = 2
USE "Unique libraries.dbf"
REPLACE ALL I WITH ' '
BROWSE FIELDS I,libname,library,total,entered AT 0,0
ENDIF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COPY TO TEMPNUM FOR NUMBER >= INITIATE .AND. NUMBER <= TERMINATE
USE TEMPNUM
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
IF ENTIRE = 1
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
ENDIF
IF ENTIRE = 2
USE "Unique libraries.dbf"
COPY TO SELECTED FOR UPPER(I) = 'Y'
USE SELECTED
STORE RECCOUNT() TO STOPIT
MARK = 1
DO WHILE .T.
IF MARK > STOPIT
CLEAR
EXIT
ENDIF
USE SELECTED
GO MARK
STORE library TO THISONE
? 'COPYING '
?? THISONE
USE TEMPLIB
APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" FOR library = THISONE
STORE MARK + 1 TO MARK
LOOP
ENDDO
ENDIF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COUNT TO STARTOT
COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF EMATCH = 0 .AND. HMATCH = 0 .AND. OMATCH = 0 .AND. IMATCH = 0
APPEND FROM TEMPLIB
ENDIF
IF EMATCH = 1
APPEND FROM TEMPLIB FOR D = 'E'
ENDIF
IF HMATCH = 1
APPEND FROM TEMPLIB FOR D = 'H'
ENDIF
IF OMATCH = 1
APPEND FROM TEMPLIB FOR D = 'O'
ENDIF
IF IMATCH = 1
APPEND FROM TEMPLIB FOR D = 'I' .OR. D = 'X' .OR. D = 'N'
ENDIF
IF XMATCH = 1
APPEND FROM TEMPLIB FOR D = 'X'
ENDIF
SET TALK OFF
DO CASE
CASE PTF = 0
SET DEVICE TO PRINT
SET PRINT ON
EJECT
CASE PTF = 1
SET ALTERNATE TO "Total function sort.txt"
* SET ALTERNATE TO "H and O function sort.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance sort.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance con.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Function sort.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Distribution sort.txt"
* SET ALTERNATE TO "Shear stress HUVEC 1:Clone list.txt"
* SET ALTERNATE TO "Shear Stress HUVEC 2:Location sort.txt"
SET ALTERNATE ON
ENDCASE
?
? DATE()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'From:'
IF ENTIRE = 1
?? ' All libraries'
ENDIF
IF ENTIRE = 2
MARK = 1
DO WHILE .T.
IF MARK > STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
? ' '
?? TRIM(libname)
STORE MARK + 1 TO MARK
LOOP
ENDDO
ENDIF
? 'Designations:'
IF EMATCH = 0 .AND. HMATCH = 0 .AND. OMATCH = 0 .AND. IMATCH = 0
?? ' All'
ENDIF
IF EMATCH = 1
?? ' Exact,'
ENDIF
IF HMATCH = 1
?? ' Human,'
ENDIF
IF OMATCH = 1
?? ' Other sp.'
ENDIF
IF IMATCH = 1
?? ' INCYTE'
ENDIF
IF XMATCH = 1
?? '< SS7'
ENDIF
IF CONDEN = 1
? 'Condensed form analysis'
ENDIF
IF ANAL = 1
? 'Sorted by NUMBER'
ENDIF
IF ANAL = 2
? 'Sorted by ENTRY'
ENDIF
IF ANAL = 3
? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL = 4
? 'Sorted by INTEREST'
ENDIF
IF ANAL = 5
? 'Arranged by LOCATION'
ENDIF
IF ANAL = 6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL = 7
? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
? 'l = library d = designation f = distribution z = location r = function c = cer'
************************************************************
USE TEMPDESIG
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
DO CASE
CASE ANAL = 1
* sort/number
SET HEADING ON
IF CONDEN = 1
SORT TO TEMP1 ON ENTRY,NUMBER
DO "COMPRESSION number.PRG"
ELSE
SORT TO TEMP1 ON NUMBER
USE TEMP1
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
* list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF
CASE ANAL = 2
* sort/DESCRIPTOR
SET HEADING ON
* SORT TO TEMP1 ON DESCRIPTOR,ENTRY,NUMBER/S FOR D = 'E' .OR. D = 'H' .OR. D = 'O' .OR. D = 'X' .OR. D = 'I'
* SORT TO TEMP1 ON ENTRY,DESCRIPTOR,NUMBER/S FOR D = 'E' .OR. D = 'H' .OR. D = 'O' .OR. D = 'X' .OR. D = 'I'
SORT TO TEMP1 ON ENTRY,START/S FOR D = 'E' .OR. D = 'H'
IF CONDEN = 1
DO "COMPRESSION entry.PRG"
ELSE
USE TEMP1
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF
CASE ANAL = 3
* sort by abundance
SET HEADING ON
SORT TO TEMP1 ON ENTRY,NUMBER FOR D = 'E' .OR. D = 'H' .OR. D = 'O' .OR. D = 'X' .OR. D = 'I'
DO "COMPRESSION abundance.PRG"
CASE ANAL = 4
* sort/interest
SET HEADING ON
IF CONDEN = 1
SORT TO TEMP1 ON ENTRY,NUMBER FOR I > 0
DO "COMPRESSION interest.PRG"
ELSE
SORT ON I/D,ENTRY TO TEMP1 FOR I > 1
USE TEMP1
list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
CLOSE DATABASES
ERASE TEMP1.DBF
ENDIF
CASE ANAL = 5
* arrange/location
SET HEADING ON
STORE 4 TO AMPLIFIER
? 'Nuclear:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Cell surface:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Intracellular membrane:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Mitochondrial:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Secreted:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Other:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression location.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
IF CONDEN = 1
SET DEVICE TO PRINTER
SET PRINTER ON
EJECT
DO "Output heading.prg"
USE "Analysis location.dbf"
DO "Create bargraph.prg"
SET HEADING OFF
? 'FUNCTIONAL CLASS TOTAL UNIQUE NEW % TOTAL'
LIST OFF FIELDS Z,NAME,CLONES,GENES,NEW,PERCENT,GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON
* USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF
CASE ANAL = 6
* arrange/distribution
SET HEADING ON
STORE 3 TO AMPLIFIER
? 'Cell/tissue specific distribution:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Non-specific distribution:'

SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Unknown distribution:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression distrib.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
IF CONDEN = 1
SET DEVICE TO PRINTER
SET PRINTER ON
EJECT
DO "Output heading.prg"
USE "Analysis distribution.dbf"
DO "Create bargraph.prg"
SET HEADING OFF
? 'FUNCTIONAL CLASS TOTAL UNIQUE % TOTAL'
LIST OFF FIELDS F,NAME,CLONES,GENES,PERCENT,GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON
* USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF
CASE ANAL = 7
* arrange/function
SET HEADING ON
STORE 10 TO AMPLIFIER
? 'BINDING PROTEINS'
?
? 'Surface molecules and receptors:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Calcium-binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Ligands and effectors:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Other binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
EJECT
? 'ONCOGENES'
?
? 'General oncogenes:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'GTP-binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Viral elements:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Kinases and phosphatases:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Tumor-related antigens:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'PROTEIN SYNTHETIC MACHINERY PROTEINS'
?
? 'Transcription and nucleic acid-binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Translation:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Ribosomal proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Protein processing:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
EJECT
? 'ENZYMES'
? 'Ferroproteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Proteases and inhibitors:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Oxidative phosphorylation:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Sugar metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Amino acid metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Nucleic acid metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Lipid metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Other enzymes:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
EJECT
? 'MISCELLANEOUS CATEGORIES'
?
? 'Stress response:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Structural:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Other clones:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
? 'Clones of unknown function:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN = 1
DO "Compression function.prg"
ELSE
DO "Normal subroutine 1"
ENDIF
IF CONDEN = 1
EJECT
* SET DEVICE TO PRINTER
* SET PRINT ON
DO "Output heading.prg"
USE "Analysis function.dbf"
DO "Create bargraph.prg"
SET HEADING OFF
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
? 'TOTAL TOTAL NEW DIST'
? 'FUNCTIONAL CLASS CLONES GENES GENES FUNCTIONAL CLASS'
?
* LIST OFF FIELDS R,NAME,CLONES,GENES,NEW,PERCENT,GRAPH,COMPANY
LIST OFF FIELDS R,NAME,CLONES,GENES,NEW,PERCENT,GRAPH
CLOSE DATABASES
ERASE TEMP2.DBF
SET HEADING ON
* USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF
CASE ANAL = 8
DO "Subgroup summary 3.prg"
ENDCASE
DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
* ERASE TEMPLIB.DBF
* ERASE TEMPNUM.DBF
* ERASE TEMPDESIG.DBF
* ERASE SELECTED.DBF
CLEAR
LOOP
ENDDO
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
COUNT TO NEWGENES FOR D = 'H' .OR. D = 'O'
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW = 0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE Z TO LOC
USE "Analysis location.dbf"
LOCATE FOR Z = LOC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? 'Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
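The compression subroutine repeated throughout these listings (sort by ENTRY, walk each run of identical entries, store the run length in RFEND on the first record, DELETE the rest and PACK) amounts to counting clones per distinct transcript. A rough Python equivalent follows, offered as a sketch and not as the patent's code: it assumes each clone has been reduced to a (NUMBER, ENTRY, D) tuple, `compress` is a hypothetical name, and NEWGENES is counted from the 'H' and 'O' designations, as in the versions of the subroutine that fill the NEW column.

```python
from itertools import groupby
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (NUMBER, ENTRY, D) for one clone


def compress(records: List[Record]) -> Tuple[int, int, int, List[Dict[str, object]]]:
    """Collapse clones sharing an ENTRY into one row whose RFEND is the clone count."""
    # SORT ON ENTRY,NUMBER: identical entries become adjacent, lowest NUMBER first.
    ordered = sorted(records, key=lambda r: (r[1], r[0]))
    rows = []
    for entry, run in groupby(ordered, key=lambda r: r[1]):
        run = list(run)
        first = run[0]  # the listing keeps the first record of each run and deletes the rest
        rows.append({"number": first[0], "entry": entry,
                     "designation": first[2], "rfend": len(run)})
    total = len(records)  # COUNT TO TOT
    unique = len(rows)    # COUNT TO UNIQUE after PACK
    new_genes = sum(1 for row in rows if row["designation"] in ("H", "O"))
    # SORT ON RFEND/D: most abundant transcripts first (ties keep entry order).
    rows.sort(key=lambda row: row["rfend"], reverse=True)
    return total, unique, new_genes, rows
```

Because Python's sort is stable even with `reverse=True`, transcripts with equal abundance stay in entry order, which keeps the output deterministic.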
* SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW = 0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
* BROWSE
COUNT TO P3 FOR I = 3
IF P3 > 0
? STR(P3,3,0)
?? 'genes with priority = 3 (Full insert sequence:)'
LIST OFF FIELDS NUMBER, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I = 3
ENDIF
COUNT TO P2 FOR I = 2
IF P2 > 0
? STR(P2,3,0)
?? 'genes with priority = 2 (Primary analysis complete:)'
LIST OFF FIELDS NUMBER, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I = 2
ENDIF
COUNT TO P1 FOR I = 1
IF P1 > 0
? STR(P1,3,0)
?? 'genes with priority = 1 (Primary analysis needed:)'
LIST OFF FIELDS NUMBER, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT FOR I = 1
ENDIF
* SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
* BROWSE
TEMP2 total of '
? 'V Coincidence'
LIST OFF FIELDS NUMBER, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I
* SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
COUNT TO NEWGENES FOR D = 'H' .OR. D = 'O'
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW = 0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND /D TO TEMP2
USE TEMP2
SET HEADING ON
?? STR(UNIQUE,5,0)
?? 'genes, for a total of'
?? STR(TOT,5,0)
?? 'clones'
? 'V Coincidence'
LIST OFF FIELDS NUMBER, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva", 12 COLOR 0,0,0
LIST OFF FIELDS RFEND, S, DESCRIPTOR
SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW = 0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
STORE F TO DIST
USE "Analysis distribution.dbf"
LOCATE FOR P = DIST
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
USE TEMP1
SORT ON RFEND /D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? 'genes, for a total of'
?? STR(TOT,5,0)
?? 'clones'
? 'V Coincidence'
LIST OFF FIELDS NUMBER, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRI
* SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW = 0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
GO TOP
USE TEMP1
?? STR(UNIQUE,5,0)
?? 'genes, for a total of'
?? STR(TOT,5,0)
?? 'clones'
? 'V Match'
LIST OFF FIELDS NUMBER, RFEND, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, LENGTH, INIT, I
* SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COPY TO TEMP1 FOR
USE TEMP1
COUNT TO IDGENE FOR D = 'E' .OR. D = 'O' .OR. D = 'H' .OR. D = 'N' .OR. D = 'R' .OR. D = 'A'
DELETE FOR D = 'N' .OR. D = 'D' .OR. D = 'A' .OR. D = 'U' .OR. D = 'S' .OR. D = 'M' .OR. D = 'R' .OR. D = 'V'
PACK
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2 = 1
LOOP
ENDIF
GO MARK1
DUP = 1
STORE ENTRY TO TESTA
SW = 0
DO WHILE SW = 0 TEST
SKIP
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
* SET PRINTER ON
SORT ON RFEND /D, NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND / IDGENE * 10000
?? STR(UNIQUE,5,0)
?? 'genes, for a total of'
?? STR(TOT,5,0)
?? 'clones'
? 'Coincidence V V Clones / 10000'
SET HEADING OFF
?
?

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva", 7 COLOR 0,0,0
LIST FIELDS NUMBER, RFEND, START, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, INIT, I
* SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO IDGENE FOR D = 'E' .OR. D = 'O' .OR. D = 'H' .OR. D = 'N' .OR. D = 'R' .OR. D = 'A'
DELETE FOR D = 'N' .OR. D = 'D' .OR. D = 'A' .OR. D = 'U' .OR. D = 'S' .OR. D = 'M' .OR. D = 'R' .OR. D = 'V'
PACK
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2 = 0 ROLL
IF MARK1 >= TOT
PACK
COUNT TO UNIQUE
SW2 = 1
LOOP
ENDIF
STORE ENTRY TO TESTB
IF TESTA = TESTB
DELETE
DUP = DUP + 1
LOOP
ENDIF
GO MARK1
REPLACE RFEND WITH DUP
MARK1 = MARK1 + DUP
SW = 1
LOOP
ENDDO TEST
LOOP
ENDDO ROLL
* BROWSE
SET PRINTER ON
SORT ON RFEND /D, NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND / IDGENE * 10000
?? STR(UNIQUE,5,0)
?? 'genes, for a total of'
?? STR(TOT,5,0)
?? 'clones'
? 'Coincidence V V Clones / 10000'
SET HEADING OFF
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva", 7 COLOR 0,0,0
LIST FIELDS NUMBER, RFEND, START, L, D, F, Z, R, C, ENTRY, S, DESCRIPTOR, INIT, I
* SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
* USE TEMP1
COUNT TO TOT
?? 'Total of'
?? STR(TOT,4,0)
?? 'clones'
?
* LIST OFF FIELDS NUMBER, L, D, F, Z, R, C, ENTRY, DESCRIPTOR, LENGTH, RFEND, INIT, I
LIST OFF FIELDS NUMBER, L, D, F, Z, R, C, ENTRY, DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
* Lifeseq menu, version 8-7-94
SET TALK OFF
SET DEVICE TO SCREEN
CLEAR
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
STORE LUPDATE() TO Update
GO BOTTOM
STORE RECNO() TO clonepo
STORE 6 TO Chooser
DO WHILE .T.
* Program: Lifeseq menu.fmt
* Date: 1/11/95
* Version: FoxBASE+/Mac, revision 1.10
* Notes: Format file Lifeseq menu
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva", 268 COLOR 0,0,0
@ PIXELS 18,126 TO 77,365 STYLE 28479 COLOR 32767,-25600,-1,-16223,-16721,-15725
@ PIXELS 110,29 TO 188,217 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 45,161 SAY "LIFESEQ" STYLE 65536 FONT "Geneva", 536 COLOR 0,0,-1,-1,7135,5884
@ PIXELS 36,269 SAY "TM" STYLE 65536 FONT "Geneva", 12 COLOR 0,0,-1,-1,7135,5884
@ PIXELS 63,143 SAY "Molecular Biology Desktop" STYLE 65536 FONT "Helvetica", 18 COLOR 0,0,0
@ PIXELS 90,252 TO 251,467 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 117,270 GET Chooser STYLE 65536 FONT "Chicago", 12 PICTURE "@*RV Transcript profiles
@ PIXELS 135,128 SAY Update STYLE 0 FONT "Geneva", 12 SIZE 15,79 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 171,128 SAY clonepo STYLE 0 FONT "Geneva", 12 SIZE 15,79 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 135,44 SAY "Last update:" STYLE 65536 FONT "Geneva", 12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 171,44 SAY "Total clones:" STYLE 65536 FONT "Geneva", 12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 45,296 SAY "v1.30" STYLE 65536 FONT "Geneva", 782 COLOR 0,0,-1,-1,-1,-1
* EOF: Lifeseq menu.fmt
READ
DO CASE
CASE Chooser = 1
DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Master analysis 3.prg"
files:Output programs:Subtraction 2.prg"
files:Output programs:Northern (single).prg"
files:Output programs:See individual clone.prg"
files:Libraries:Output programs:Menu.prg"
CLEAR
SCREEN 1 OFF
RETURN
ENDCASE
LOOP
ENDDO
1 os YES, 0 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva", 274 COLOR 0,0,0,-1,-1,-1
?
? DATE()
?? ' '
?? TIME()
? 'Clone numbers'
?? STR(INITIATE,6,0)
?? 'through'
?? STR(TERMINATE,6,0)
? 'Libraries:'
IF ENTIRE = 1
? 'All libraries'
ENDIF
IF ENTIRE = 2
MARK = 1
DO WHILE .T.
IF MARK > STOPIT
EXIT
ENDIF
USE SELECTED
GO MARK
? ' '
?? TRIM(libname)
STORE MARK + 1 TO MARK
LOOP
ENDDO
ENDIF
? 'Designations:'
IF Ematch = 0 .AND. Hmatch = 0 .AND. Cmatch = 0
?? 'All'
ENDIF
IF Ematch = 1
?? 'Exact,'
ENDIF
IF Hmatch = 1
?? 'Human,'
ENDIF
IF Cmatch = 1
?? 'Other sp.'
ENDIF
IF CONDEN = 1
? 'Condensed format analysis'
ENDIF
IF ANAL = 1
? 'Sorted by NUMBER'
ENDIF
IF ANAL = 2
? 'Sorted by ENTRY'
ENDIF
IF ANAL = 3
? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL = 4
? 'Sorted by INTEREST'
ENDIF
IF ANAL = 5
? 'Arranged by LOCATION'
ENDIF
IF ANAL = 6
? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL = 7
? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented:'
?? STR(STARTOT,6,0)
? 'Total clones analyzed:'
?? STR(ANALTOT,6,0)
USE TEMP1
COUNT TO TOT
?? 'Total of'
?? STR(TOT,4,0)
?? 'clones'
?
* LIST OFF FIELDS NUMBER, L, D, F, Z, R, C, ENTRY, DESCRIPTOR, LENGTH, RFEND, INIT, I
LIST OFF FIELDS NUMBER, L, D, F, Z, R, C, ENTRY, DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
* USE TEMP1
COUNT TO TOT
?? 'Total of'
?? STR(TOT,4,0)
* Northern (single), version 11-25-94
CLOSE DATABASES
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
CLEAR
STORE '' TO Eobject
STORE '' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program: Northern (single).fmt
* Date: 3/8/94
* Version: FoxBASE+/Mac, revision 1.10
* Notes: Format file Northern (single)
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva", 12 COLOR 0,0,0
@ PIXELS 15,81 TO 46,397 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,79 TO 192,422 STYLE 28447 COLOR 0,0,-25600,-1,-1
@ PIXELS 115,98 SAY "Entry #:" STYLE 65536 FONT "Geneva", 12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva", 12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,89 SAY "Description:" STYLE 65536 FONT "Geneva", 12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva", 12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1
@ PIXELS 35,89 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva", 274 COLOR 0,0,
@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago", 12 PICTURE "@*R Continue;Bail out" SIZE
@ PIXELS 175,98 SAY "Clone #:" STYLE 65536 FONT "Geneva", 12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva", 12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1
@ PIXELS 80,152 SAY "Enter any ONE of the following:" STYLE 65536 FONT "Geneva", U COLOR -1,
* EOF: Northern (single).fmt
READ
IF Bail = 2
CLEAR
SCREEN 1 OFF
files:Lookup.dbf"
IF Eobject <> ''
STORE UPPER(Eobject) TO Eobject
SET SAFETY OFF
SORT ON Entry TO "Lookup entry.dbf"
SET SAFETY ON
USE "Lookup entry.dbf"
LOCATE FOR Look = Eobject
IF .NOT. FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup entry.dbf"
ENDIF
IF Dobject <> ''
SET EXACT OFF
SET SAFETY OFF
SORT ON descriptor TO "Lookup descriptor.dbf"
SET SAFETY ON
USE "Lookup descriptor.dbf"
LOCATE FOR UPPER(TRIM(descriptor)) = UPPER(TRIM(Dobject))
IF .NOT. FOUND()
CLEAR
LOOP
ENDIF
BROWSE
STORE Entry TO Searchval
CLOSE DATABASES
ERASE "Lookup descriptor.dbf"
SET EXACT ON
ENDIF
IF Numb <> 0
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
GO Numb
BROWSE
STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry'
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO ok
CLEAR
IF UPPER(ok) <> 'Y'
SCREEN 1 OFF
RETURN
ENDIF
is.dbf"
PACK
SW2 = 1
LOOP
ENDIF
GO MARK1
STORE library TO TESTA
SKIP
STORE library TO TESTB
IF TESTA = TESTB
DELETE
ENDIF
MARK1 = MARK1 + 1
LOOP
ENDDO ROLL
* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry = searchval
SET SAFETY ON
CLOSE DATABASES
SELECT 1
USE "Compressed libraries.dbf"
STORE RECCOUNT() TO Entries
SELECT 2
USE "Hits.dbf"
Mark = 1
DO WHILE .T.
SELECT 1
IF Mark > Entries
EXIT
ENDIF
GO Mark
STORE library TO Jigger
SELECT 2
COUNT TO Zog FOR library = Jigger
SELECT 1
REPLACE hits WITH Zog
Mark = Mark + 1
LOOP
ENDDO
SELECT 1
BROWSE FIELDS LIBRARY, LIBNAME, ENTERED, HITS AT 0,0
CLEAR
? 'Enter Y to print:'
WAIT TO PRINSET
IF UPPER(PRINSET) = 'Y'
SET PRINT ON
CLEAR
EJECT
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva", 14 COLOR 0,0,0
? 'DATABASE ENTRIES MATCHING ENTRY'
?? Searchval
? DATE()
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva", 7 COLOR 0,0,0
LIST OFF FIELDS library, libname, entered, hits
?
SELECT 2
LIST OFF FIELDS NUMBER, LIBRARY, D, S, F, Z, R, ENTRY, DESCRIPTOR, RSTART, START, RFEND
SET TALK OFF
SET PRINT OFF
ENDIF
CLOSE DATABASES
SET TALK OFF
CLEAR
DO "Test print.prg"
RETURN

library  libname
ADENINB01  Inflamed adenoid
ADRENOR01  Adrenal gland (r)
ADRENOT01  Adrenal gland (T)
AMLBNOT01  AML blast cells (T)
BMARNOT01  Bone marrow
BMARNOT02  Bone marrow (T)
CARDNOT01  Cardiac muscle (T)
CHAONOT01  Chin, oyster shell
CORNNOT01  Corneal stroma
FIBRAGT01  Fibroblast, AT 5
FIBRAGT02  Fibroblast, AT 30
FIBRANT01  Fibroblast AT
FIBRNGT01  Fibroblast, uv 5
FIBRNGT02  Fibroblast, uv 30
FIBRNOT01  Fibroblast
FIBRNOT02  Normal Fibroblast
HMC1NOT01  Mast cell line HMC-1
HUVELPB01  HUVEC IFN,TNF,LPS
HUVENOB01  HUVEC control
HUVESTB01  HUVEC shear stress
HYPONOB01  Hypothalamus
KIDNNOT01  Kidney (T)
LIVRNOT01  Liver (T)
LUNGNOT01  Lung (T)
MUSCNOT01  Skeletal muscle (T)
OVIDNOB01  Oviduct
PANCNOT01  Pancreas, normal
PITUNOR01  Pituitary (r)
PITUNOT01  Pituitary (T)
PLACNOB01  Placenta
SINTNOT02  Small intestine (T)
SPLNFET01  Spleen/liver, fetal
SPLNNOTOS  Spleen (T)
STOMNOT01  Stomach
SYNORAB01  Rheum. synovium
TBLYNOT01  T + B lymphoblast
TESTNOT01  Testis (T)
THP1NOB01  THP-1 control
THP1PEB01  THP phorbol
THP1PLB01  THP-1 phorbol LPS
U937NOT01  U937, monocyte leu

number library d s f z r entry descriptor rstart start rfend
2304 U937NOT01 E H C C T HUMEF1B Elongation factor 1-beta 0 0 773
3240 HMC1NOT01 E H C 0 T HUMEF1B Elongation factor 1-beta 0 370 773
3259 HMC1NOT01 E H C C T HUMEF1B Elongation factor 1-beta 0 371 773
«93 HMC1NOT01 E H C C T HUMEF1B Elongation factor 1-beta 0 470 773
3959 HMC1NOT01 E H O C T HUMEF1B Elongation factor 1-beta 0 327 773
9139 HMC1NOT01 E H C 0 T HUMEF1B Elongation factor 1-beta 0 375 773
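The COMPRESSION SUBROUTINE that recurs throughout the listing above walks a clone table sorted on ENTRY, deletes every record in a run of identical ENTRY values except the first, and writes the run length into RFEND, so RFEND ends up holding the number of clones (transcripts) matched to each database entry. Some variants then scale RFEND to clones per 10,000 (`RFEND / IDGENE * 10000`), and the Northern (single) program counts, for one chosen entry, how many matching clones each library contributes. The Python sketch below restates those ideas under stated assumptions: records are modeled as plain tuples, and `Clone`, `compress_entries`, `per_10000` and `northern_hits` are hypothetical names that do not appear in the original program.

```python
from __future__ import annotations

from collections import Counter
from itertools import groupby
from typing import Iterable, NamedTuple


class Clone(NamedTuple):
    """One sequenced clone (hypothetical record layout, loosely after clones.dbf)."""
    number: int
    library: str
    entry: str  # identifier of the best-matching database entry


def compress_entries(clones: Iterable[Clone]) -> list[tuple[str, int]]:
    """Collapse clones into (entry, count) pairs, most abundant first.

    Same idea as the COMPRESSION SUBROUTINE: group identical ENTRY values,
    keep one record per run, and keep the run length (the RFEND value).
    The FoxBASE code expects pre-sorted input; this sketch sorts itself.
    """
    ordered = sorted(clones, key=lambda c: c.entry)
    counts = [(entry, sum(1 for _ in run))
              for entry, run in groupby(ordered, key=lambda c: c.entry)]
    # Equivalent of SORT ON RFEND /D; ties are broken by entry name here.
    counts.sort(key=lambda pair: (-pair[1], pair[0]))
    return counts


def per_10000(count: int, identified_total: int) -> float:
    """Scale a count to clones per 10,000 identified clones (RFEND / IDGENE * 10000)."""
    return count / identified_total * 10000


def northern_hits(clones: Iterable[Clone], entry: str) -> Counter:
    """Count, per library, the clones that match one entry (cf. Northern (single))."""
    return Counter(c.library for c in clones if c.entry == entry)


if __name__ == "__main__":
    # Hypothetical records; entry and library names are placeholders.
    demo = [
        Clone(1, "LIB1", "GENE_A"),
        Clone(2, "LIB2", "GENE_A"),
        Clone(3, "LIB2", "GENE_A"),
        Clone(4, "LIB2", "GENE_B"),
    ]
    for entry, n in compress_entries(demo):
        print(entry, n, per_10000(n, len(demo)))
    print(northern_hits(demo, "GENE_A"))
```

Using `groupby` on sorted input reproduces the run-length logic of the MARK1/DUP loop without manual record pointers; the denominator passed to `per_10000` is an assumption, since the listing derives IDGENE from designation codes rather than from the total clone count.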

Claims

# CLAIMS

1. A method for analyzing a specimen containing gene transcripts, the method comprising the steps of: (a) producing a library of biological sequences; (b) generating a set of transcript sequences, wherein each of the transcript sequences in the set is indicative of a different one of the biological sequences of the library; (c) processing the transcript sequences in a programmed computer in which a database of reference biological sequences is stored, to generate an identified sequence value for each of the transcript sequences, wherein each identified sequence value is indicative of a sequence annotation and a degree of match between one of the biological sequences and at least one of the reference transcript sequences; and (d) processing each of the identified sequence values to generate final data values indicative of the number of times each identified sequence value is present in the library.

2. The method of claim 1, wherein step (a) includes the steps of: obtaining an mRNA mixture; making cDNA copies of the mRNA; isolating a representative population of clones transfected with the cDNA; and producing from them the library of biological sequences.

3. The method of claim 1, wherein the biological sequences are cDNA sequences.

4. The method of claim 1, wherein the biological sequences are RNA sequences.

5. The method of claim 1, wherein the biological sequences are protein sequences.

6. The method of claim 1, wherein a first value of the degree of match is indicative of an exact match, and a second value of the degree of match is indicative of a non-exact match.

7. A method of comparing two specimens containing gene transcripts, said method comprising: (a) analyzing a first specimen according to the method of claim 1; (b) producing a second library of biological sequences; (c) generating a second set of transcript sequences, wherein each of the sequences of the second set is indicative of one of the sequences of the second library; (d) processing the second set of transcript sequences in the programmed computer to generate a second set of identified sequence values, referred to as additional identified sequence values, wherein each of the additional identified sequence values is indicative of a sequence annotation and a degree of match between one of the biological sequences of the second library and at least one of the reference sequences; (e) processing each additional identified sequence value to generate additional final data values indicative of the number of times each additional identified sequence value is present in the second library; and (f) processing the final data values from the first specimen and the additional identified sequence values from the second specimen to generate transcript sequence ratios, each ratio being indicative of differences in numbers of gene transcripts between the two specimens.

8. A method for quantifying the relative abundance of mRNA in a biological specimen, said method comprising the steps of: (a) isolating a population of mRNA transcripts from the biological specimen; (b) identifying the genes from which the mRNA was transcribed by a sequence-specific method; (c) determining numbers of mRNA transcripts corresponding to each of the genes; and (d) using the mRNA transcript numbers to determine the relative abundance of mRNA transcripts within the population of mRNA transcripts.

9. A diagnostic method comprising producing a gene transcription image, said method comprising the steps of: (a) isolating a population of mRNA transcripts from the biological specimen; (b) identifying the genes from which the mRNA was transcribed by a sequence-specific method; (c) determining numbers of mRNA transcripts corresponding to each of the genes; and (d) using the mRNA transcript numbers to determine the relative abundance of mRNA transcripts within the population of mRNA transcripts, wherein the data determining the relative abundance values of the mRNA transcripts constitute the gene transcription image of the biological specimen.

10. The method of claim 9, further comprising: (e) providing a set of diseased and normal standard gene transcription images; and (f) comparing the gene transcription image of the biological specimen with the standard gene transcription images of step (e) to identify at least one of the standard gene transcription images that most closely approximates the gene transcription image of the biological specimen.

11. The method of claim 9, wherein the biological specimen is biopsy tissue, saliva, blood or urine.

12. A method for producing a gene transcription image, the method comprising the steps of: (a) obtaining a mixture of mRNA; (b) making cDNA copies of the mRNA; (c) inserting the cDNA into a vector by adaptation, and using the vector to transfect cells of a suitable host strain, which are plated and allowed to grow into clones, each clone representing a single mRNA; (d) isolating a representative population of recombinant clones; (e) identifying amplified cDNAs from each clone in the population by a sequence-specific method that identifies the gene from which the unique mRNA was transcribed; (f) determining the number of times each gene is represented within the population of clones as an indication of relative abundance; and (g) listing the genes and their relative abundance in order of abundance, thereby producing the gene transcription image.

13. The method of claim 12, further including the step of diagnosing disease by: repeating steps (a) to (g) on biological specimens from a random sample of normal and diseased humans, encompassing a variety of diseases, to produce reference sets of normal and diseased gene transcript images; obtaining a test specimen from a human and producing a test gene transcript image by performing steps (a) through (g) on that test specimen; comparing the test gene transcript image with the reference sets of gene transcript images; and identifying at least one of the reference gene transcript images that closely approximates the test gene transcript image.

14. A computer system for analyzing a library of biological sequences, the system including: an element for receiving a set of transcript sequences, wherein each of the transcript sequences is indicative of a different one of the biological sequences of the library; and an element for processing the transcript sequences in the computer system, in which a database of reference transcript sequences is stored, wherein the computer is programmed with software to generate an identified sequence value for each of the transcript sequences, wherein each identified sequence value is present in the library.

15. The system of claim 14, further including: a library-generating element for producing the library of biological sequences and generating the set of transcript sequences from that library.

16. The system of claim 15, wherein the library-generating element includes: an element for obtaining an mRNA mixture; an element for making cDNA copies of the mRNA; an element for inserting the cDNA copies into cells and allowing the cells to develop into clones; and an element for isolating a representative population of the clones and producing from them the library of biological sequences.

SUMMARY

A method and system for quantifying the relative abundance of gene transcripts in a biological specimen. One embodiment of the method generates a high-throughput analysis of specific sequences of multiple RNAs or their corresponding cDNAs (gene transcription image analysis). Another embodiment of the method produces a gene transcription image analysis by using high-throughput analysis of cDNA sequences. In addition, gene transcription image projection can be used to detect or diagnose a particular condition, disease or biological state that correlates with the relative abundance of gene transcripts in a given cell or population of cells. The invention provides a method for comparing the gene transcription image analyses of two or more different biological specimens in order to distinguish between the two specimens and to identify one or more genes that are differentially expressed between the two specimens.
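The claims describe three computational steps: turning the identified genes of a specimen into relative abundances (the gene transcription image of claims 8 and 9), forming ratios of transcript numbers between two specimens (claim 7), and finding the reference image that most closely approximates a test image (claims 10 and 13). The claims fix neither a data format, a ratio convention, nor a similarity measure, so the Python sketch below is illustrative only. The function names, the pseudocount used in the ratios, and the Euclidean distance are assumptions, not the patented implementation.

```python
from __future__ import annotations

import math
from collections import Counter
from collections.abc import Iterable, Mapping


def transcript_image(gene_ids: Iterable[str]) -> dict[str, float]:
    """Relative abundance of each identified gene: its count over the total (cf. claims 8, 9)."""
    counts = Counter(gene_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}


def transcript_ratios(first: Mapping[str, int], second: Mapping[str, int],
                      pseudocount: float = 1.0) -> dict[str, float]:
    """Per-gene ratio of transcript counts, first specimen over second (cf. claim 7).

    The pseudocount is an assumption added here so that a gene seen in only
    one specimen does not cause a division by zero.
    """
    genes = set(first) | set(second)
    return {g: (first.get(g, 0) + pseudocount) / (second.get(g, 0) + pseudocount)
            for g in genes}


def closest_reference(test: Mapping[str, float],
                      references: Mapping[str, Mapping[str, float]]) -> str:
    """Name of the reference image nearest the test image (cf. claims 10, 13).

    Euclidean distance over the union of genes is an assumed metric;
    a gene missing from an image counts as abundance 0.
    """
    def distance(a: Mapping[str, float], b: Mapping[str, float]) -> float:
        genes = set(a) | set(b)
        return math.sqrt(sum((a.get(g, 0.0) - b.get(g, 0.0)) ** 2 for g in genes))

    return min(references, key=lambda name: distance(test, references[name]))
```

For example, `transcript_image(["A", "A", "B", "C"])` assigns gene A a relative abundance of 0.5 and genes B and C 0.25 each. Other distance or correlation measures could replace the Euclidean one without changing the overall flow.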