EP1230383A2

EP1230383A2 - Quantitative assay for expression of genes in microarray

Info

Publication number: EP1230383A2
Application number: EP00921995A
Authority: EP
Inventors: Eugenia Wang
Original assignee: Sir Mortimer B Davis Jewish General Hospita
Current assignee: Sir Mortimer B Davis Jewish General Hospita
Priority date: 1999-04-08
Filing date: 2000-04-10
Publication date: 2002-08-14
Also published as: JP2004502129A; AU4224400A; WO2000060126A3; WO2000060126A2; IL145826A0

Abstract

A method has been developed for detection of gene expression or hybridization in microarrays, for example, in combinatorial libraries where quantities are very small and spots are located very close together, so that an intense reaction can spill over into adjacent spots and obscure the accuracy of the reaction at neighboring sites. The assay uses a digoxigenin enzyme assay for detection. A method for enhancing the reliability of analysis of expression of DNA in microarray formats has also been developed, using software analysis that normalizes the spots. This process uses deformable template techniques to quantify large-scale array data automatically, despite possible spatial distortion of the arrays. Each node in the deformable template represents a gene spot, and iterates according to the gradient descent rule, which minimizes an energy function combining data mismatch energy and template deformation energy.

Description

QUANTITATIVE ASSAY FOR EXPRESSION OF GENES IN MICROARRAY

Background of the Invention

The United States government has rights in this invention by virtue of grant R01 AG09278 from the U.S. National Institute on Aging to Eugenia Wang and from the Defense Advanced Research Projects Agency of the U.S. Department of Defense to E. Wang.

The present invention is in the area of a method for detecting quantitative as well as qualitative levels of expression of genes in a microarray, for example, in combinatorial libraries.

Microarray techniques offer biologists a systematic way to survey DNA and RNA variation. Until recently, the only tools available to scientists were Northern blot analysis, RNase protection or RT-PCR to assay differential expression. These techniques are limited to use with a few genes at a time. In contrast, microarray techniques provide a means of generating a global view of huge numbers of gene expressions simultaneously which has attracted great interest, and they are becoming standard tools of both molecular biology research and clinical diagnostics. As the first step of expression profiling experiments, the analysis and quantification of the array images exert an important impact on the accuracy of the subsequent data mining and exploration.

Microarrays typically contain at separate sites nanomolar (less than picogram) quantities of individual genes, cDNAs, or Expressed Sequence Tags ("ESTs") (partial gene sequences) on a substrate such as a nitrocellulose or silicon plate, or photolithographically prepared glass substrate. Microarrays containing approximately a thousand ESTs are commercially available from Affymetrix. Clontech sells arrays of gene-specific cDNA fragments, with approximately half the number of Affymetrix's ESTs, designed for specific research areas such as tumor research or broad applications. Once fabricated, the arrays are hybridized to cDNA probes using standard techniques with gene-specific primer mixes. The nucleic acid to be analyzed (the target) is isolated, amplified and labeled, typically with a fluorescent reporter group, radiolabel or phosphorous label probe. After the hybridization reaction is completed, the array is inserted into the scanner, where patterns of hybridization are detected. The hybridization data are collected as light emitted from the reporter groups already incorporated into the target, which is now bound to the probe array. Probes that perfectly match the target generally produce stronger signals than those that have mismatches. Since the sequence and position of each probe on the array are known, by complementarity, the identity of the target nucleic acid applied to the probe array can be determined.

There are a variety of labels that are used. cDNAs and ESTs can be detected by autoradiography or phosphorimaging ( P). Fluorescent dyes are also used, and are commercially available from suppliers such as Clontech.

The mapping and sequencing phase of the human genome project is well ahead of schedule. So far, complete genomic sequences of 17 model organisms, including the eukaryotes S. cerevisiae and C. elegans, have been finished. The complete human genome sequence is expected to be available this year. However, of the genes already sequenced, the function of only approximately 20% of 53,000 human genes, or 25% of 6,200 yeast open reading frames, is currently known. The next phase of the human genome project will deal with understanding the functions of the remaining 80% of the genes. The main approach to studying a new gene's function is to determine its pattern of expression: when, where and how strongly it is expressed.

Techniques currently implemented to investigate gene expression levels include Northern blots (Alwine, et al. Proc. Natl. Acad. Sci. USA 74, 5350-5354 (1977)), RT-PCR (Martorana, et al. BioTechniques 27, 136-144 (1999), Wen, et al. Proc. Natl. Acad. Sci. USA 95, 334-339 (1998)), differential display (Liang and Pardee Science 257, 967-971 (1992)), sequencing of DNAs from cDNA libraries (Okubo, K. et al. Nature Genet. 2, 173-179 (1992)), and serial analysis of gene expression - SAGE (Velculescu, V.E. et al. Cell 88, 243-251 (1997)). All of these approaches except SAGE can process simultaneously only several dozens or hundreds of samples. Keeping in mind the huge volume of information that should be received in the near future (expression patterns of tens of thousands of genes under different conditions, and in different organisms), it seems that only high-throughput methods like cDNA or oligonucleotide microarrays will be powerful enough to resolve this challenge. cDNA microarrays have already been successfully implemented in a number of cases, including: light- and dark-grown A. thaliana seedlings (Desprez, et al. Plant J. 14, 643-652 (1998)), heat shock and phorbol ester-regulated genes in human T cells (Schena, et al. Proc. Natl. Acad. Sci. USA 93, 10614-10619 (1996)), young and old mice (Lee, Science 285, 1390-1393 (1999)), and temporal programs of gene expression in human fibroblasts in response to serum stimulation (Iyer, V.R. et al. Science 283, 83-87 (1999)). Many aspects of microarray implementation were recently reviewed in a special issue of Nature Genetics 21, suppl. (1999). So far, in most cases cDNA arrays have been printed on glass slides. Glass as a support has many advantages; it is a durable, non-porous material which has low intrinsic fluorescence. But on the other hand, the load of each dot in a microarray on glass is lower than on nylon membranes.
As a consequence, with microarrays on glass it is necessary to use very high concentrations of fluorescently labeled probe, typically 50-200 μg of total RNA or several μg of mRNA per array, in each hybridization (Duggan, et al. Nature Genetics 21, suppl., 10-14 (1999)). Such quantities of RNA are usually not available, and this fact limits possible cDNA microarray implementations. Another problem concerning the cDNAs comprising the microarray is that PCR products extracted from clone inserts of cDNA libraries (Gerhold, et al. TIBS 24, 168-173 (1999)) are probably not optimal hybridization probes, because clone inserts can be of very different lengths (0.3-3.0 kb) and GC content, and therefore have different melting temperatures. This means that the efficacy of hybridization of different probes in microarrays under particular hybridization conditions can differ very significantly, making expression profiles dependent on experimental conditions. Finally, if cDNA targets for hybridization on arrays on glass are fluorescently labeled, while this approach allows direct comparison between control and test samples in one experiment (Cheung, et al. Nature Genetics 21, suppl., 15-19 (1999)), the sensitivity of fluorescent probes is lower than that of radioactive or enzyme-coupled probes. Fluorescent probes can also bleach during analysis, making it impossible to rescan an array; and last but not least, the price of laser scanners and other equipment necessary to analyze fluorescence still remains too high for broader implementation of cDNA microarrays. Microarrays provide a simple and comprehensive way to describe huge numbers of genes simultaneously. While array preparation techniques have matured in recent years, as reported by Cheung, et al., Nature Genet. 21, 15-19 (1999) and Brown and Botstein, Nature Genet. 21, 33-37 (1999), the techniques for quantifying and analyzing the array data are still in an evolving stage.
As noted above, arrays in general have found a wide range of applications, such as investigating normal biological and disease processes, and profiling differential gene expression. The analysis of array data has therefore attracted considerable research interest. As the first step of expression profiling experiments, the analysis and quantification of the array images exert an important impact on the accuracy of the subsequent data mining and exploration. However, despite the high cost of commercial packages (Bowtell, Nature Genet. 21, 25-32 (1999)), the procedures for quantifying and analyzing remain time-consuming and tedious. Some commercial packages use a rigid template grid to extract the gene expressions, and require an accuracy of up to one pixel in overlaying the template. A simple comparison between two arrays may take several hours for a first-time user. In addition, most commercially available software packages rely on the human operator to not only align the array, but also evaluate the reliability of the results. Moreover, all of the currently available systems for detection of gene expression have disadvantages. Bleeding due to an overabundance of a specific gene's presence or expression is a problem with some commercially available filters, with bleeding from one spot to adjacent spots, obscuring the results for the adjacent spots. Fluorescent labels fade, so that permanent records must be made by alternative techniques. Sensitivity of detection is low.

Some of these problems have been minimized using more specific primer mixes. A cDNA probe generated by random priming distributes the isotopic label among nearly all RNA species. Many labeled species will contribute only to non-specific background hybridization, or will cross-hybridize to many different cDNA fragments. The proportion of label that is in sequences complementary to genes represented on the array is minimal. Label can be concentrated into poly A+ RNA using oligo(dT) primers. However, most cells contain mRNA from many thousands of genes at any given time, so the probe primarily consists of sequences not represented on the array that can contribute to undesirable cross-hybridization. Selection of probes to exclude common sequences helps. Another approach to address these problems is to decrease the number of genes on the array. Still another approach is to apply multiple samples to each array, so that results can be averaged, and anomalous results readily identified.

However, none of these techniques have eliminated problems with analysis of the microarrays. There remains a need for sensitive, accurate, reproducible means for analysis of microarrays of genes and ESTs. There is also a need for quantitative detection, not just qualitative detection.

It is therefore an object of the present invention to provide a method and materials for accurately detecting gene expression in microarrays of large numbers of genes, cDNAs or ESTs. It is a further object of the present invention to provide a method and materials for quantitating as well as detecting expression of genes in microarrays.

It is another object of the present invention to provide a non-radioactive or non-fluorescent assay for gene expression for use in microarrays, Northern blots, Southern blots or other techniques using DNA hybridization.

Summary of the Invention

A method has been developed for detection of gene expression or hybridization in microarrays, for example, in combinatorial libraries where quantities are very small and spots are located very close together, so that an intense reaction can spill over into adjacent spots and obscure the accuracy of the reaction at neighboring sites. The assay uses a digoxigenin enzyme assay for detection.

The examples demonstrate the utility of the enzymatic detection system. The transcriptionally regulated profile of E-box-related genes specific to a given cultured cell sample was determined by unique digoxigenin (DIG)-labeled cDNAs produced from RNAs isolated from the culture of interest. This specific enzymatic labeling probe allows the hybridization reaction intensity to be detected by colorimetric evaluation of alkaline phosphatase-coupled antibody to DIG. The enzymatic deposit on each locus of the E-box microarray is readily analyzed by an upright microscope attached to a CCD camera, without the problem of the long delay needed for exposure time with radioactive probes, or the photobleaching and high background reaction problem associated with the fluorescent probe approach. The enzymatic approach provides a user-friendly designer approach to custom-adapt the gene screening task to analyze a subgroup of gene expressions controlled by the same molecular modality. The assay is very sensitive, enabling detection of as little as 0.02 nanograms.

A method for enhancing the reliability of analysis of expression of DNA in microarray formats has also been developed, using software analysis that normalizes the spots. This process uses deformable template techniques to quantify large-scale array data automatically, despite possible spatial distortion of the arrays. Each node in the deformable template represents a gene spot, and iterates according to the gradient descent rule, which minimizes an energy function combining data mismatch energy and template deformation energy.
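The node iteration described above can be illustrated with a minimal Python sketch. The specific energy terms here are assumptions chosen for illustration and are not taken from the patent's equations: the data mismatch energy is taken as the squared distance between each node and a hypothetical observed spot centre (`targets`), and the deformation energy as the squared change in the offset between neighbouring nodes relative to the rigid prototype grid (`rest`). The names `template_energy`, `relax_template`, `lam` and `step` are likewise hypothetical.

```python
def template_energy(pos, rest, targets, lam):
    """Assumed energy: data mismatch plus lam times template deformation."""
    energy = 0.0
    for key, (x, y) in pos.items():
        tx, ty = targets[key]
        energy += (x - tx) ** 2 + (y - ty) ** 2  # data mismatch term
        r, c = key
        for nb in ((r + 1, c), (r, c + 1)):  # visit each grid edge once
            if nb in pos:
                dx = (x - pos[nb][0]) - (rest[key][0] - rest[nb][0])
                dy = (y - pos[nb][1]) - (rest[key][1] - rest[nb][1])
                energy += lam * (dx * dx + dy * dy)  # deformation term
    return energy


def relax_template(rest, targets, lam=1.0, step=0.05, iters=300):
    """Move every node down the gradient of the assumed energy.

    rest:    {(row, col): (x, y)} positions of the rigid prototype grid.
    targets: {(row, col): (x, y)} observed spot centres (assumed known).
    """
    pos = dict(rest)
    for _ in range(iters):
        updated = {}
        for key, (x, y) in pos.items():
            tx, ty = targets[key]
            gx, gy = 2.0 * (x - tx), 2.0 * (y - ty)
            r, c = key
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in pos:
                    gx += 2.0 * lam * ((x - pos[nb][0]) - (rest[key][0] - rest[nb][0]))
                    gy += 2.0 * lam * ((y - pos[nb][1]) - (rest[key][1] - rest[nb][1]))
            updated[key] = (x - step * gx, y - step * gy)
        pos = updated  # synchronous update of all nodes
    return pos
```

With a larger `lam` the template stays closer to the rigid grid; with a smaller `lam` each node follows its own spot more freely. The step size must be small enough for the iteration to remain stable.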

The utility of the normalization method for analysis was demonstrated in a study to identify families of genes in the mouse liver whose transcript levels are altered by aging. Commercially available cDNA microarrays were used to analyze liver mRNA levels from young versus aged C57BL/6 male mice. Hybridization of RT-cDNAs from young and aged mouse livers to a mouse cDNA microarray of 588 genes (ClonTech Atlas Mouse cDNA Expression Array) shows specific, coordinated age-associated changes in the expression of certain families of genes whose activities are critical to tissue function and repair and maintenance of the normal physiological state. These gene families include tumor suppressors, cell cycle regulators, and various stress response and signaling pathway components. ³²P-labeled ssRT-cDNA was used as the probes for the microarray hybridization assays, with mRNA levels ranging from very high to very low abundance. The process described herein was demonstrated to yield superior results.

Brief Description of the Drawings

Figures 1a, 1b, 1c, 1d and 1e show an apparatus for making microarrays.

Figures 2a-e demonstrate all the steps of template definition, iteration, unreliable region detection and spot labeling. It is easy to distinguish the over-expressed spots, labeled as '*', and those distorted by the shadow of over-expressed spots, labeled as '?'. The labels of the spots are confirmed by visual inspection on the original image, where arrows indicate the positions of spots in the shadow of the over-expressed ones. Figure 2a is a raw image of part of a ClonTech cDNA array. The array quantification method was tested by analyzing ClonTech Atlas™ cDNA Filter Arrays, which comprise 588 cDNA elements spotted in duplicate. Following the user's manual provided by the manufacturer, an autoradiograph was obtained after the procedures of radioactivity labeling, array hybridization and rinsing. The digital images are obtained by scanning the autoradiographs upright at 300 DPI, with 8-bit gray scale and gamma correction disabled. A 3×3 out-range pixel smoothing filter was applied to reduce salt-and-pepper noise. The initial position overlays a grid template on the original image, which is one block of a ClonTech filter. Users are required to provide the position of spots in the top-left and bottom-right corners via a graphical user interface. In equations (5) and (6), parameters are set as λ = φ = 1 and η = 50. Figure 2b is the initial alignment with a prototype template. Figure 2c is the fine alignment by automatic iteration of a deformable template. Figure 2d shows the unreliable regions (yellow) detected by mathematical morphology. The structuring element is chosen as a disk with the same size as an ideal sample spot. Figure 2e shows automatic labeling of the spots, corresponding to the unreliable regions ('*' - over-expressed spots; '?' - spots distorted by the shadow; V - normal spots).

Figures 3a-3d show the qualitative and quantitative evaluation of repeatability and reliability.
Two blocks (transcription factors and general DNA-binding proteins) of ClonTech Atlas™ mouse cDNA arrays were quantified using the same deformable template method. The imaging conditions are the same as in Figure 2a. Two independent operators are required to provide the initial position (top-left and bottom-right corners) of the prototype template (Figure 3a). Each operator repeats the same procedure on one sample three times. For the sake of simplicity, a normalized spot intensity in [0, 0.33] is defined as low-abundance, [0.33, 0.67] as medium-abundance, and [0.67, 1] as high-abundance. Relative error versus intensity demonstrates that the errors in quantifying low-abundance spots are higher than in quantifying medium- and high-abundance spots. Maximal error in ratio versus intensity demonstrates that the errors in ratios among low-abundance spots are higher than for the rest of the spots (Figure 3b). An example of undesirable normalization compares the gene expressions in the two samples being compared to control genes (Figure 3c). Line L1 is the ideal case of two identical samples: $S_{1i} = S_{2i}$, $i = 1, \ldots, N$; line L2 is the central line of the real samples; line L3 is the central line of the real controls. The normalization procedure can be understood as rotating the coordinate space by the angle θ(L2, L3). Thus the reliability of normalization can be evaluated by the angles θ(L1, L3) and θ(L2, L3) (Figure 3d). The larger the angles, the less reliable the normalization. In this case, θ(L1, L3) = -29.30° and θ(L2, L3) = -14.58°. An example of desirable normalization is seen in the figure: θ(L1, L3) = -0.96° and θ(L2, L3) = 2.13°.
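The angle-based reliability check can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's procedure: the "central line" of a set of paired intensities is approximated here by a least-squares line through the origin, and the sign convention of the angle is chosen arbitrarily. The function names `central_line_slope` and `line_angle_deg` are hypothetical.

```python
import math


def central_line_slope(sample1, sample2):
    """Slope of a least-squares line through the origin for paired values.

    Assumption: this stands in for the "central line" of the sample or
    control points; the source does not state how that line is fitted.
    """
    denom = sum(x * x for x in sample1)
    if denom == 0:
        raise ValueError("sample1 must contain a non-zero intensity")
    return sum(x * y for x, y in zip(sample1, sample2)) / denom


def line_angle_deg(slope_a, slope_b):
    """Angle in degrees from line a to line b (sign convention assumed)."""
    return math.degrees(math.atan(slope_b) - math.atan(slope_a))


L1_SLOPE = 1.0  # ideal case of two identical samples, S1i = S2i
```

Under these assumptions, θ(L1, L3) would be `line_angle_deg(L1_SLOPE, central_line_slope(controls1, controls2))`, and a value near zero suggests that the control genes behave like two identical samples.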

Figure 4 shows cDNA microarray hybridization for evaluation of E-box binding-related gene expression. The matrix position, with each gene's abbreviation, is written underneath each locus of three repeats of dots with identical amounts deposited; the X-coordinates denote the number 1, 2, 3, 4, and 5 positions, and Y-coordinates denote the "a" through "o" positions. The matrix location for each gene triplet is then identified by its XY coordinates. For example, 5k denotes the position of N-Myc, and 3d denotes the position of Mad. The same coordinates are also included in Table 4.
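The coordinate scheme above (a number from 1 to 5 followed by a letter from "a" to "o") can be decoded with a short Python sketch. The function name `parse_locus` and the choice to return a 1-based row index are illustrative assumptions.

```python
import string

ROW_LETTERS = string.ascii_lowercase[:15]  # "a" through "o"
COLUMNS = range(1, 6)                      # positions 1 through 5


def parse_locus(label):
    """Split a label such as "5k" into (column, 1-based row index)."""
    column_text, row_letter = label[:-1], label[-1]
    if not column_text.isdigit():
        raise ValueError(f"bad column in {label!r}")
    column = int(column_text)
    if column not in COLUMNS or row_letter not in ROW_LETTERS:
        raise ValueError(f"locus {label!r} is outside the 5 x 15 matrix")
    return column, ROW_LETTERS.index(row_letter) + 1
```

For instance, `parse_locus("5k")` gives column 5, row 11, the N-Myc position named in the text.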

Figures 5a and 5b show the expression profiles of E-box binding-related gene expressions in Hela cells. Figure 5a - total RNA was labeled with digoxigenin in RT reaction with gene specific primers; Figure 5b - mRNA was labeled with digoxigenin in RT reaction with oligo(dT) primers. Arrows within the matrix show positions of: I- Hela DNA (positive control); II- lambda DNA (negative control); III- UBC; IV- RPL-13A; V- MBP-1; VI- HPRT1. The distance between dots can be measured by the bar of 1 mm.

Figure 6 shows the hybridization of products of multiplex PCR with 5 pairs of primers with a cDNA microarray. Arrows within the matrix point to: I- Mrdb; II- c-Myc p64; III- TFII-1; IV- ODC1; V- cdc25A; VI- Hela genomic DNA.

Figure 7 shows the relationship between concentrations of 5 genes including Mrdb, c-Myc, TFII-1, ODC1, and cdc25a, and intensity of hybridization signals. Logarithmic approximation is shown. Dot intensity is represented by the arbitrary units on the Y-axis; concentration is measured as ng/ml on the X-axis.
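A logarithmic approximation of this kind can be fitted by ordinary least squares on the logarithm of concentration. The Python sketch below assumes the model intensity = a + b·ln(concentration); the function name `fit_log_curve` is hypothetical, and no values from Figure 7 are reproduced here.

```python
import math


def fit_log_curve(concentrations, intensities):
    """Least-squares fit of intensity = a + b * ln(concentration).

    Concentrations must be positive (ng/ml on the X-axis); intensities
    are in arbitrary units. Returns the pair (a, b).
    """
    if len(concentrations) != len(intensities) or len(concentrations) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(c) for c in concentrations]
    n = len(logs)
    mean_u = sum(logs) / n
    mean_y = sum(intensities) / n
    sxx = sum((u - mean_u) ** 2 for u in logs)
    if sxx == 0:
        raise ValueError("concentrations must not all be equal")
    sxy = sum((u - mean_u) * (y - mean_y) for u, y in zip(logs, intensities))
    b = sxy / sxx
    a = mean_y - b * mean_u
    return a, b
```

Once `a` and `b` are known, an unknown concentration can be estimated from a measured dot intensity as `math.exp((intensity - a) / b)`, provided `b` is non-zero and the intensity lies within the calibrated range.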

Figures 8a and 8b show the expression profiles of E-box-related genes in Hela cells (Figure 8a), and normal human lymphocytes (Figure 8b). Arrows within the matrix show positions of: I- Aldolase C; II- Mad4; III- MBP-1.

Figures 9a, 9b and 9c depict a pairwise comparison of E-box gene expression in Hela cells and human lymphocytes. Two independent hybridizations are averaged for each type of cell. Figure 9a - three-dimensional, and Figure 9b - two-dimensional, representations of differences in gene expression. Each panel corresponds to one column in Figure 8a, and each bar represents an individual gene. Figure 9c - Distribution of genes with common gain or loss of expression in dependence on relative ratio value. The relative fold ratio between samples S1 and S2 is computed as

$$R_{DM}(S_1, S_2) = \frac{S_2 - S_1}{\max(S_1, S_2)}$$

which yields a value in the range of [-1, +1]. Positive values correspond to up-regulation, and negative values correspond to down-regulation, of genes in sample S2. The relative fold ratio has a similar meaning to that of conventional fold ratio, except that the value is normalized and symmetric, with clear physical interpretation. $R_{DM}(S_1, S_2) = \pm 0.5$ corresponds to a two-fold up- or down-regulation. In normalizing the two samples, a set of housekeeping genes of relatively constant expression levels were selected as controls, and linear normalization was applied.
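The relative fold ratio can be checked numerically with a short Python sketch. It assumes the ratio is (S2 - S1) / max(S1, S2), which is the form consistent with the stated range of [-1, +1], the sign convention for sample S2, and the two-fold example. The function names `relative_fold_ratio` and `to_conventional_fold` are hypothetical.

```python
def relative_fold_ratio(s1, s2):
    """Relative fold ratio of sample S2 against sample S1.

    Assumed form: (s2 - s1) / max(s1, s2), for non-negative intensities
    that are not both zero. Positive means up-regulation in S2.
    """
    if s1 < 0 or s2 < 0:
        raise ValueError("intensities must be non-negative")
    largest = max(s1, s2)
    if largest == 0:
        raise ValueError("at least one intensity must be positive")
    return (s2 - s1) / largest


def to_conventional_fold(ratio):
    """Convert a relative fold ratio to a conventional fold change.

    A ratio of +0.5 or -0.5 maps to 2.0 (two-fold); the sign of the
    ratio tells whether it is up- or down-regulation.
    """
    if abs(ratio) >= 1:
        raise ValueError("fold change is unbounded when one sample is zero")
    return 1.0 / (1.0 - abs(ratio))
```

Swapping the two samples only flips the sign of the ratio, which is the symmetry the text refers to; a conventional fold ratio would instead change from 2 to 0.5.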

Detailed Description of the Invention

Two methodologies have been developed to enhance resolution and sensitivity of microarray analysis of gene detection. The first is a chromophore detection assay to facilitate determination of the amount of expression in microarrays, Northern blots, Southern blots, and other techniques for determination of DNA hybridization; the second is a fast and reliable approach to analyzing generic arrays using a deformable template to extract the expression spots in the array, which is capable of identifying the unreliable expressions automatically. This automated iteration reduces the human error in quantification to a large extent, and optimizes the processing time in comparing arrays ten-fold compared to the existing packages.

Apparatus for Making Microarrays

Figures 1a-1e show an apparatus for forming microarrays of biological materials. As shown in Figures 1a and 1b, the base of the apparatus is a vibration isolation table 1. Mounted on the table 1 by means of a first horizontal linear guide 2, is a platform 14. The platform 14 is connected through a carriage (not shown) to a drive mechanism (not shown) such as a lead screw in the guide 2. The horizontal linear guide 2 carries a computer controlled motor 5a connected to the drive mechanism, which effects movement of the platform 14 back and forth along the first linear guide 2. The motor 5a is linked to a computer 13 via an amplifier 15 and a motion control board 28.

The platform 14 is designed to hold detachable sample reservoirs 11 in predetermined positions 18. As shown in Figure 1b, a sample reservoir 11 is a 96-well microtiter plate. The platform 14 also holds a series of substrates 12 which are held in place by means of suction, created by drawing a vacuum through the holes 21 on the platform underneath the substrate.

As seen in Figures 1a and 1b, at opposite sides of the table 1 are vertical risers 6, having upper ends that carry a second horizontal linear guide 3 mounted substantially transversely, above and straddling the platform 14 and the first linear guide 2. The second horizontal linear guide 3 carries a computer controlled motor 5b connected to a drive mechanism (not shown) such as a lead screw which is in threaded engagement with a carriage (not shown) which can be moved along the guide 3 by operation of the motor 5b, linked to the computer 13 via an amplifier 16 and the motion control board 28.

A third linear guide 4 is attached to the second linear guide 3 by means of the carriage such that the third linear guide 4 is substantially perpendicular to the first linear guide 2 and the second linear guide 3. By means of the computer controlled motor 5b, the third linear guide 4 can be moved back and forth by the carriage along the axis of the second linear guide 3. A drive mechanism within the third linear guide 4, e.g. a lead screw that is meshed with the carriage, enables the third linear guide 4 to be moved vertically by a further computer controlled motor 5c and positioned in any desired vertical location within the range of movement. Computer control is achieved by connection of motor 5c to an amplifier 17 which is connected to the motion control board 28. Use of linear guide 2, linear guide 3, linear guide 4 and the three carriages provides for motion in three dimensions.

As shown in Figure 1c, near the lower end of the third linear guide 4, is attached a sampling manifold 9 which contains four sampling needles 8 spaced linearly along the manifold at intervals to allow for simultaneous sample pick-up by all four sampling needles 8 from four sample locations in sample reservoirs 11. The sampling manifold 9 can be moved between two positions by activation of a pneumatic cylinder 10 connected between the sampling manifold 9 and third linear guide 4. As shown in Figure 1c in solid lines, the sampling manifold 9 is in the "down" position, for sampling and cleaning. When the third linear guide 4 is being re-positioned, the sampling manifold is pivoted to the "up" position, as shown by the broken lines.

The base of the third linear guide 4 has piezoelectric inkjets 7 mounted thereon, the sampling needles 8 being connected to the piezoelectric inkjets 7 by microline tubing conduits 34 (Figure 1c). Each sampling needle 8 is connected through a conduit 34 to a micropump 35 and thence to a microvalve 36. Each microvalve 36 is adjustable so that the fluid delivered from the pump 35 can be directed selectively to the corresponding piezoelectric inkjet 7 or to the waste.

As shown in Figure 1b, there are two gravity overflow reservoirs 24, 25, positioned on opposite sides of the first linear guide 2 at the rear of the platform, on isolation table 1. The reservoirs 24, 25 contain cleaning solutions that can, when desired, be pumped through the interior of the apparatus. As shown in Figure 1d, such overflow reservoirs 24, 25 are provided with a fluid in-feed aperture 40 in the lower portion of the overflow reservoir, into which is pumped the solution of interest from a fluid reservoir 41, through a pump 44. In the upper portion of the overflow reservoir is a fluid overflow aperture 42 out of which the liquid in the overflow reservoir returns to the fluid reservoir 41. The overflow reservoir is further provided with openings 43 on the top to allow for insertion of the sampling needles 8.

Also mounted to the vibration isolation table 1 are two cleaning boxes 26, 27 (Figure 1b) positioned on opposite sides of the first linear guide 2 outwardly of the gravity overflow reservoirs 24, 25. As shown in Figure 1e, the top of each box is provided with openings 51, 52 to accommodate the sampling needles 8 and the piezoelectric inkjets 7, respectively. Each box 26, 27 is provided with nozzles 50 on the interior sides thereof for the delivery of wash fluid from a wash fluid reservoir (not shown) onto the exterior surfaces of the sampling needles 8 and the piezoelectric inkjets 7. The lower part of each box is provided with an exit port 53 through which the waste wash fluid is sucked off into a vacuum trap.

In operation, a number of substrates 12 are placed on the platform 14 and selected samples are loaded into the sample reservoirs 11. The platform 14 mounted on the first linear guide 2 is moved into position such that the sampling manifold 9 is in position for sampling. The sampling manifold 9 is lowered by actuation of the pneumatic cylinder 10, and the third linear guide 4 is lowered, to place the sampling needles 8 into the sample reservoirs 11. The quantities of sample to be placed on the microarray are taken up through the sampling needles 8 by way of the micropumps 35 and delivered through the conduits 34 to the piezoelectric inkjets 7, and the sampling manifold 9 retracted. The piezoelectric inkjets 7 are positioned over the substrate 12 and the piezoelectric inkjets 7 deliver the samples onto the substrate 12. This process is repeated such that multiple samples are delivered to the selected substrates 12 at the predetermined locations to form a microarray.

To prevent cross-contamination of samples, the sampling needles 8, the conduit 34, the micropump 35, the microvalves 36, and the piezoelectric inkjets 7 are cleaned between take-up of different samples. This is done using the two gravity overflow reservoirs, one 24 containing saline and the other 25 containing water. Saline or water is taken up by sampling needle 8 and delivered through the conduit 34 to the piezoelectric inkjets 7 to flush the system. Following this the cleaning boxes 26, 27, are used to spray water and/or air on the exterior surfaces of the sampling needles 8 and the piezoelectric inkjets 7.
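The operating and cleaning cycle described above can be laid out as an ordered step list, as in the Python sketch below. It is only a planning illustration under stated assumptions: samples are grouped four at a time to match the four sampling needles, and the saline flush is assumed to precede the water flush, which the text does not specify. The function name `plan_spotting_run` and the step labels are hypothetical and do not correspond to any real controller interface.

```python
def plan_spotting_run(samples, needles=4):
    """Return the ordered steps for spotting a list of sample wells.

    Each batch of up to `needles` samples is aspirated and dispensed,
    then the fluid path is flushed and the exterior surfaces sprayed
    before the next batch is taken up.
    """
    if needles < 1:
        raise ValueError("at least one sampling needle is required")
    steps = []
    for start in range(0, len(samples), needles):
        batch = tuple(samples[start:start + needles])
        steps.append(("aspirate", batch))    # needles into sample reservoir
        steps.append(("dispense", batch))    # inkjets over the substrate
        steps.append(("flush", "saline"))    # overflow reservoir 24
        steps.append(("flush", "water"))     # overflow reservoir 25
        steps.append(("spray", "exterior"))  # cleaning boxes 26, 27
    return steps
```

A full 96-well plate would be covered by 24 such batches under these assumptions.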

The whole of the process may be automated. Motion control, digital actuation, sample processing, and micropumping can all be controlled by means of a computer using specialized computer programs designed therefor. Certain functions such as the recirculation of fluid through the gravity overflow reservoirs can be run continuously during operation and do not require computer control.

A variety of liquid reagents can be dispensed using the described apparatus. For example, the liquids may contain DNA, RNA, modified nucleic acids and nucleic acid analogues, peptides, antibodies, antigens, enzymes, or cells. The apparatus can also dispense activator or inhibitor fluids. An activator fluid is one which makes possible coupling to the substrate, or causes a synthesis reaction with a previously deposited reagent. An inhibitor fluid protects an area on the substrate to prevent the material in the area from reacting.

Piezoelectric inkjets 7 preferably are drop-on-demand printer heads which are able to deliver small metered amounts of liquids quickly and accurately. The amount of material delivered will depend on the specific use, and may be, for example, 10 to 1000 picolitres (pl), preferably 20 to 100 pl, and most preferred 35 pl. Examples of sample reservoirs include 96-well and 384-well microtiter plates and Eppendorf™ tubes. Examples of sampling devices include sampling needles, which may be made of stainless steel bore tubing and may include syringe tips. The gravity overflow reservoirs and the cleaning boxes may be located in any suitable position.
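As a rough illustration of the volumes involved, the number of drops needed to deposit a given volume can be computed as below. The function name `drops_needed` is hypothetical, and the default drop size is simply the most preferred 35 pl figure quoted above.

```python
import math


def drops_needed(volume_pl, drop_pl=35.0):
    """Smallest whole number of drops that delivers at least volume_pl."""
    if volume_pl < 0 or drop_pl <= 0:
        raise ValueError("volume must be >= 0 and drop size must be > 0")
    return math.ceil(volume_pl / drop_pl)
```

For example, one nanolitre (1000 pl) takes 29 drops of 35 pl each.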

The micropumps 35 may be activated intermittently or continuously. Intermittent activation may be achieved using an AC-DC relay under the control of the motion control board. Components of the apparatus may be provided separately for assembly, together with instructions for assembly and use of the apparatus.

Digoxigenin Enzymatic Detection Assay

As demonstrated in Example 2, an extremely sensitive and accurate means for visualizing the extent of hybridization between nucleic acid molecules has been developed. The assay utilizes digoxigenin (DIG) to label target nucleic acid molecules, such as cDNA produced from gene-specific primers, with subsequent incubation with anti-digoxigenin antibody conjugated with an enzyme such as alkaline phosphatase (AP), and colorimetric or chemiluminescent detection.

The method includes the steps of labeling the nucleic acid molecules, typically cDNA, hybridizing the labelled molecules, rinsing to remove material that did not hybridize, incubating the hybridized material with enzyme conjugated anti-digoxigenin antibody, typically alkaline phosphatase, staining for revelation of bound enzyme, and scanning for data acquisition. This method is fast, requiring a maximum of two days, and can detect samples of four micrograms or less.

The assay can be used to detect as little as 0.02 nanograms of nucleic acid. This means that the starting material, i.e., the DNA or RNA isolated from the targeted tissues, can be as little as one to two micrograms. The starting materials can be labeled by the digoxigenin method, and the labeled nucleic acid then used for hybridization with the microarrays. Methods currently used by Affymetrix and ClonTech both require mRNA as the starting material, which is difficult to obtain, requiring at least 100 to 1,000 micrograms of total RNA. This is in contrast to the one to two micrograms of total RNA required for the 0.02 nanograms which can be detected with the assay described herein.

The assay uses an enzyme that can cleave a chromogenic, chemiluminescent or colorimetric substrate. In the preferred embodiment, the enzyme is alkaline phosphatase and the substrate is Disodium 3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1³,⁷]decan}-4-yl) phenyl phosphate - CSPD. In the preferred embodiment, following antibody incubation, the reacted substrate is stained with 5-Bromo-4-chloro-3-indolyl-phosphate, toluidine salt (BCIP), and Nitro blue tetrazolium chloride (NBT) in detection buffer.

Extraction of gene expression from array images

In an ideal array image, the gene expressions are represented as equally spaced spots of the same size. However, due to random disturbance in printing the spots, or the warping of the membranes caused by undesirable temperatures, the spots can be displaced from their ideal positions. Simple grid templates are insufficient to extract the gene expressions. To overcome the variability of these spots, a deformable template, with shape-varying ability, has been developed to keep track of the distortion of gene spots. Active shape models (Cootes, et al., Computer Vision and Image Understanding 61, 38-59 (1995); Otsu, N. IEEE Trans. on Syst. Man & Cybern. 8, 62-66 (1978); and Adryan, et al., BioTech. 26, 1174-1179 (1999)) and deformable templates (Serra, J. Image Analysis and Mathematical Morphology, Academic Press, London, 1982; Haralick, et al. IEEE Trans. on Patt. Anal. Mach. Intell. 9, 532-550 (1987); Otsu, N. IEEE Trans. on Syst. Man & Cybern. 8, 62-66 (1978)) were developed to detect and locate distorted objects by incorporating prior information concerning the shape of desired objects into the development of an active 'snake' model.

Generally, deformable template matching techniques integrate model-driven and data-driven analysis by an energy function and a set of regularization parameters. Two factors are taken into account: data mismatch and template deformation. Usually, criterion functions are defined to quantify these two factors: one measures how much the input pattern differs from the deformed template, and the other measures the degree to which the template is deformed. In template matching or classification applications, optimal matching is achieved by minimizing a weighted sum of these two criteria. The weighting factors are called regularization parameters, which provide a trade-off between template deformation and data mismatch.

Template representation

As described in the example, a raw image is scanned from a ClonTech cDNA mouse array. Each sample spot in the three-dimensional view of the input image of the array corresponds to a local minimum in the gray level space. The following model is then used in order to extract pertinent information.

An array of gene samples is represented as a set of N circular spots of the same size. Each spot is a circle represented by $S(C(P, r), I)$, where $C(P, r)$ is a circle centered at Euclidean coordinate $P = (x, y)$ with radius $r$. The intensity of the spot is the average value $I$ of the pixel intensities inside the circle. The radius of a microarray spot is determined by the printing and imaging conditions, and can be measured and set as a constant over the whole array. Therefore, the prototype template is based on prior knowledge about a generic array, which can be set as a grid of evenly spaced circles around the array. The objective of using a deformable template is to find the optimal position of the centers of all spots regarding both data mismatch and template deformation factors.

Data mismatch energy

The data mismatch energy measures the fitness of the deformed template to the input pattern. Since the input image is in gray level, and the gene expressions are represented by the integral intensities of local regions, data mismatch energy can be defined in terms of the integral intensities:

Defining a potential function at each spot, the data mismatch energy is simplified to:

Here $\varphi_i$ is the weight of the $i$-th node, and $r$ is a predefined radius for the sample spots. The potential function is an integration of gray level values over a local region, which has smoothing abilities. Experimental results exhibit its robustness to minor perturbation and noise.

Template deformation energy

The template deformation energy measures the deviation of the deformed template from the prototype template. Template deformation by translation arises from warping of the membranes. In order to quantify the degree of deformation, the deformation energy for each spot is defined as the Euclidean distance of the deformed control node $P_i$ from its predefined position $P_{i0}$:

$$E_2 = \sum_{i=1}^{N} \lambda_i \left[ (x_i - x_{i0})^2 + (y_i - y_{i0})^2 \right] \qquad (3)$$

Here, $\lambda_i$ is the flexibility of the $i$-th node. The minimization of the template deformation energy helps to prevent the nodes from being attracted by perturbations and noise in local backgrounds, and thus improves the robustness of the proposed algorithm.

Relaxation

Combining the template deformation and data mismatch energies, one can define an overall energy function as:

$$E = \alpha E_1 + E_2 \qquad (4)$$

where $\alpha$ is a regularization parameter.

The localization procedure of sample spots corresponds to the minimization of the overall energy. The minima of this function can be obtained by using global optimization methods, such as dynamic programming, greedy searching algorithms, neural networks, etc., yet at the cost of excessive computing. Since each spot of an array is printed at a predefined position, it is assumed that the gene spots are presented in a regular array, for which the initial position can be chosen by predetermined sites according to the number of loci in the entire matrix. It is also assumed that the prototype array template can be placed close enough to the input array. Thus the array spots can be localized by finding a local minimum around the initial grid. This can be realized by a deterministic gradient descent technique:

$$\Delta x_i = -\eta \frac{\partial E}{\partial x_i} = -\eta \left( \alpha \, \varphi_i \, \frac{\partial P(R_i)}{\partial x_i} + \lambda_i (x_i - x_{i0}) \right) \qquad (5)$$

$$\Delta y_i = -\eta \frac{\partial E}{\partial y_i} = -\eta \left( \alpha \, \varphi_i \, \frac{\partial P(R_i)}{\partial y_i} + \lambda_i (y_i - y_{i0}) \right) \qquad (6)$$

where $i = 1, 2, ..., N$, $P(R_i)$ is the potential function evaluated over the region $R_i$ of the $i$-th spot, and $\eta$ is the learning rate. The constant factor arising from differentiating the quadratic deformation term is absorbed into $\lambda_i$.
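The relaxation can be illustrated with a minimal sketch. It is not the Array Analyzer implementation: the function names, the disk-sum potential, the finite-difference gradient, and the rule that a step is accepted only when it lowers the energy are all assumptions made for illustration. Spots are taken to be dark on a light background, so a node is attracted to local minima of the gray level, and each node moves by one pixel per coordinate per iteration, in the direction given by the sign of the gradient.

```python
def disk_offsets(r):
    """Integer pixel offsets (dx, dy) lying inside a disk of radius r."""
    return [(dx, dy) for dx in range(-r, r + 1) for dy in range(-r, r + 1)
            if dx * dx + dy * dy <= r * r]


def potential(img, x, y, offsets):
    """Sum of gray levels over the disk centred at (x, y).
    Pixels outside the image are treated as white (255); this is an assumption."""
    h, w = len(img), len(img[0])
    total = 0
    for dx, dy in offsets:
        px, py = x + dx, y + dy
        total += img[py][px] if 0 <= px < w and 0 <= py < h else 255
    return total


def sign(v):
    return (v > 0) - (v < 0)


def relax(img, grid, r=2, alpha=1.0, phi=1.0, lam=1.0, max_iter=20):
    """Move each node one pixel at a time toward a local energy minimum.
    grid holds the prototype positions (x0, y0); returns the relaxed positions."""
    offsets = disk_offsets(r)
    nodes = [tuple(p) for p in grid]
    for _ in range(max_iter):
        moved = False
        for i, (x0, y0) in enumerate(grid):
            def energy(px, py):
                # alpha * (data mismatch) + (template deformation) for one node
                mismatch = phi * potential(img, px, py, offsets)
                deformation = lam * ((px - x0) ** 2 + (py - y0) ** 2)
                return alpha * mismatch + deformation

            x, y = nodes[i]
            # Central finite difference approximates dE/dx; step against its sign.
            nx = x - sign(energy(x + 1, y) - energy(x - 1, y))
            if energy(nx, y) < energy(x, y):
                x, moved = nx, True
            # Same for the y coordinate.
            ny = y - sign(energy(x, y + 1) - energy(x, y - 1))
            if energy(x, ny) < energy(x, y):
                y, moved = ny, True
            nodes[i] = (x, y)
        if not moved:
            break
    return nodes
```

Because a step is kept only if it lowers the energy, the loop stops as soon as no node can improve, which mirrors the observation that the nodes settle after a few iterations.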

In an isotopic array, the flexibility and weight of each spot are set to be the same. The positions of the target centers are regressed to minimize the energy defined by (4), which can be interpreted as a set of nodes on a rubber band; the positions of the nodes tend to gravitate to the bottom of a local minimum of the energy field. In order to restrict the movement of the nodes and avoid tuning an optimal value for the learning rate, the nodes are moved one unit in each iteration, according to the sign of Δx and Δy. For example, to extract gene expressions from one function group of an Atlas cDNA array, a 3×3 out-range pixel smoothing filter (Ekstrom, M. P. Digital Image Processing Techniques, Academic Press, London, 1984) is applied to reduce salt-and-pepper noise. The membrane is warped during the procedure of hybridization and washing, so that it is impossible to find an exact match between the membrane under investigation and a predefined grid template. However, after less than five iterations, all nodes in the deformable template come to a stable position, which minimizes the overall energy function defined by (4). Visual inspection confirms good correspondence between the deformed grid template and the gene spots.

Detection of unreliable regions

In array hybridization, some strong signals are observed in autoradiographs, colloquially called 'bleeding' spots. These over-expressed signals cause problems in quantification of not only themselves, but also the genes in their neighborhoods. Most commercial packages for analyzing array data either require the users to check bleeding genes and their neighbors visually, or simply ignore the existence of these spots. Failure to detect over-expressed spots may lead to erroneous results in comparing two gene expression profiles. The objective of the approach described herein is to identify the bleeding regions automatically, and alert the users to potential errors in the quantification result. Thus, the users are relieved from tedious and subjective evaluation of the problematic regions in their membranes. The following basic assumptions have been made about the spots in a generic array:

The spots are mutually exclusive.

The change of intensity within each spot has a certain range.

Each spot is a connected component with limited size.

Therefore, the identification of over-expressed spots can be simplified as filtering out the spots larger than a predefined size. Based on shape, mathematical morphology provides an efficient approach to process digital images, and has been widely used in solving image processing problems which were difficult to solve by linear filters. Appropriately used, mathematical morphological operations tend to simplify image data by preserving their essential shape characteristics and eliminating irrelevancies. The basic mathematical morphological operations are erosion and dilation. Based on the composition of erosion and dilation, opening and closing are defined. Considering the case of a binary image, let A be the set of points representing the binary 'on' pixels of the original binary image, and B be the set of points representing binary 'on' pixels of the structuring element. The basic morphological operations are defined below:

Dilation of a binary image A by binary structuring element B,

$$A \oplus B = \{ b + a \mid \text{for some } b \in B \text{ and } a \in A \}$$

Erosion of a binary image A by binary structuring element B,

$$A \ominus B = \{ p \mid b + p \in A \text{ for every } b \in B \}$$

Opening of a binary image A by binary structuring element B,

$$A \circ B = (A \ominus B) \oplus B$$

Closing of a binary image A by binary structuring element B,

$$A \bullet B = (A \oplus B) \ominus B$$

In order to distinguish the over-expressed spots from the normally expressed spots, a disk structuring element, C, whose size is the upper limit of a spot on the array being studied, was selected.

$$C = \{ (x, y) \mid x^2 + y^2 < R^2 \}$$

where R is the maximal radius of an ideal spot. The procedure of identifying the over-expressed spots can be described as follows:

Step 1. Binarize the input image $F = \{ f(x, y) \mid 0 \le x < M, 0 \le y < N \}$ by global thresholding techniques, such as Otsu's thresholding method (Otsu, N.A. IEEE Trans. on Syst. Man & Cybern. 8, 62-66 (1978)) based on optimal discriminant analysis:

$$G = \{ g(x, y) \mid 0 \le x < M, 0 \le y < N \}, \qquad g(x, y) = \begin{cases} 1 & \text{if } f(x, y) > T \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

where T is Otsu's optimal threshold.

Step 2. Filter out the normal size spots by morphological opening with structuring element C:

$$G' = G \circ C = \{ g'(x, y) \} \qquad (8)$$

Consequently, the unreliable regions are represented as a set:

$$UR = \{ (x, y) \mid g'(x, y) = 1, (x, y) \in F \} \qquad (9)$$

Step 3. Identify control node $P_i = (x_i, y_i)$ as an over-expressed spot if $(x_i, y_i) \in UR$.

In fact, quantification problems reside not only in over-expressed spots, but also in those spots in their neighborhoods. During the iterations of locating the real positions of the spots, some control nodes are quickly attracted by the over-expressed spots, since a low data mismatch energy field is formed around these spots. In this system, a gene expression spot is labeled as unreliable if the corresponding control node moves far from its initial position.
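Steps 1 to 3 can be sketched in plain Python as follows. This is an illustrative sketch, not the code of the described system: the function names are hypothetical, the image is assumed to be an 8-bit gray-level array in which the signal is brighter than the background (so that `f(x, y) > T` selects spots), and the threshold routine is a textbook between-class-variance implementation of Otsu's method.

```python
def otsu_threshold(img, levels=256):
    """Threshold T that maximizes the between-class variance of the histogram."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]          # pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0        # pixels with value > t
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def disk(radius):
    """Structuring element C = {(x, y) | x^2 + y^2 < R^2} as pixel offsets."""
    return [(dx, dy) for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if dx * dx + dy * dy < radius * radius]


def erode(g, se):
    h, w = len(g), len(g[0])
    return [[int(all(0 <= y + dy < h and 0 <= x + dx < w and g[y + dy][x + dx]
                     for dx, dy in se))
             for x in range(w)] for y in range(h)]


def dilate(g, se):
    h, w = len(g), len(g[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if g[y][x]:
                for dx, dy in se:
                    if 0 <= y + dy < h and 0 <= x + dx < w:
                        out[y + dy][x + dx] = 1
    return out


def over_expressed_nodes(img, nodes, max_radius):
    """Return the control nodes (x, y) that fall inside the unreliable region UR."""
    t = otsu_threshold(img)                              # Step 1: binarize
    g = [[int(v > t) for v in row] for row in img]
    se = disk(max_radius)
    opened = dilate(erode(g, se), se)                    # Step 2: opening G o C
    return [(x, y) for x, y in nodes if opened[y][x]]    # Step 3: nodes in UR
```

Spots too small to contain the structuring element vanish under the opening, so only oversized ('bleeding') regions remain and flag the nodes that lie in them.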

Target measurement

In quantitative analysis with microarrays, it is assumed that the amount of fluorescent light or radioactivity emitted from each spot is representative of the amount of labeled nucleic acid probe associated with that spot (Atlas™ cDNA Expression Arrays User Manual, Protocol #PT3140-1, Version #PR91208, pp7-9, 1998). The intensity of each spot displayed in a 3D plot shows that the assumption of hemispheric shape for a sample spot is not sufficient for quantification purposes. The following integration function is used to calculate the volume of each spot:

$$I(P_i) = \iiint dx \, dy \, dz = \iint f(P) \, dx \, dy$$

In discrete images, the integration is replaced by summation, and uses the area of a circle to normalize the volume:

in which $P_j$ is the target node obtained from the deformable template matching iterations. The normalized volume is a real value within a range $[0, I_{max}]$, where $I_{max}$ is the upper limit of a pixel value. For 8-bit and 16-bit gray-scale images, $I_{max}$ equals 255 and 65535 respectively.
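As a rough sketch of the discrete measurement, the sum of pixel values inside the circle around a target node can be divided by the number of pixels in that circle. Using the pixel count as the discrete "area of a circle" is an assumption, as is the function name; the source does not specify how the area is discretized.

```python
def normalized_volume(img, cx, cy, r):
    """Mean gray level over the pixels within radius r of the node (cx, cy).
    The result lies in [0, I_max], where I_max is the largest pixel value."""
    h, w = len(img), len(img[0])
    total = 0
    count = 0
    for y in range(max(0, cy - r), min(h, cy + r + 1)):
        for x in range(max(0, cx - r), min(w, cx + r + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                total += img[y][x]   # summation replaces the integral
                count += 1           # discrete area of the circle
    return total / count
```

For a uniformly saturated 8-bit spot this returns 255.0, the upper end of the stated range.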

Performance Test: Automatic Processing vs. Manual Operation

The system was trained based on 24 sub-images from two ClonTech filters. The training stage includes selection of the optimal regularization parameter in the definition of energy. The system was tested on 204 sub-images from 17 ClonTech filters, and 50 bio-chip microarray images printed in house. Testing results show that the proposed model can extract the spots satisfactorily, when they are mutually exclusive. However, some spots are attracted by the over-expressed ones, and thus cause distortion to the quantification procedure. Although the system is able to identify these erroneous spots automatically, an ultimate solution requires improvement in the design of optimal probes and experimental conditions. Table 1 lists the comparison among three systems in analyzing Atlas™ Arrays, including AtlasImage™ 1.0 released by ClonTech (http://www.clontech.com), EstBlot (Adryan, et al., BioTech 26, 1174-1179 (1999)) developed by Johannes Gutenberg University, and Array Analyzer developed in our lab. Compared with the existing systems, our system performs several times faster. The system error is measured by repeating the quantification procedure on one array six times. Two factors are considered in evaluating the system:

Average error $\mu_{err} = \mathrm{Avg}_{i=1}^{N} \left( \frac{\sigma_i}{\mu_i} \right)$ and Maximal error $M_{err} = \mathrm{Max}_{i=1}^{N} \left( \frac{\sigma_i}{\mu_i} \right)$, in which $\mu_i$ and $\sigma_i$ are the average value and standard deviation in measuring the $i$-th spot among several repeats. Table 2 lists the comparison of two systems on both ClonTech filters and the microarrays printed in house, and it is clear that the system described herein outperforms the existing system.
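The two error measures can be computed as in the following sketch. The function name and data layout are hypothetical, and the use of the sample standard deviation is an assumption, since the source does not say whether the sample or population form was used. Spot means are assumed to be positive.

```python
from statistics import mean, stdev


def system_errors(repeats):
    """repeats[k][i] is the measured value of spot i in repeat k.
    Returns (average error, maximal error) of sigma_i / mu_i over all spots."""
    ratios = []
    for values in zip(*repeats):      # all repeats of one spot
        mu = mean(values)
        sigma = stdev(values)         # sample standard deviation (assumption)
        ratios.append(sigma / mu)
    return mean(ratios), max(ratios)
```

For example, three repeats of two spots measured as `[[10, 100], [10, 110], [10, 90]]` give relative errors of 0 and 0.1 for the two spots.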

In order to quantify the system error involved in the proposed array image analysis method, two samples are compared, and the following aspects of the comparison procedure analyzed: error in quantification, error in ratio, and error in normalization. The relative error in quantification is defined as $\sigma_i / \mu_i$, where $\sigma_i$ and $\mu_i$ are the standard deviation and average value of the $i$-th spot among the six independent tests. To compare the gene expression of two samples, we define a relative fold ratio instead of the conventional fold ratio. For samples $S_1$ and $S_2$, the relative fold ratio is computed as

$$R(S_1, S_2) = \frac{S_2 - S_1}{\max(S_1, S_2)}$$

which is a value in the range $[-1, 1]$; positive values indicate up-regulation and negative values indicate down-regulation from $S_1$ to $S_2$. The relative fold ratio has a similar meaning to that of the conventional one, except that the value is normalized and easy to understand.

For instance, $R = \pm 0.5$ corresponds to two-fold up- or down-regulation. The relationship between the abundance of the genes and the average of maximum error in ratio comparison was also examined. Regarding genes with normalized abundance in [0, 0.33], (0.33, 0.67], (0.67, 1] as low, medium, and high abundance, the maximum error in ratio for each category is 1.89%, 4.89% and 14.54%, respectively. It is clear that the ratios among low abundance genes are more sensitive to the initial positions of the prototype template in the quantification. Therefore, the system is more reliable in dealing with medium- and high-abundance spots. Table 3 lists the relationship between the automatic evaluation (reliability labels) of the spots and the quantitative errors. Quantitative studies confirm that the automatically identified 'unreliable' loci suffer from more errors than the other two categories. Actually, the percentage of 'unreliable' spots can be used as an indicator of the quality of the hybridization procedure.
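A short sketch of the relative fold ratio follows, taking it as (S2 - S1) / max(S1, S2), which matches the stated range of [-1, 1] and the sign convention described above. The function names, the handling of two zero signals, and the conversion back to a conventional fold change are illustrative assumptions.

```python
def relative_fold_ratio(s1, s2):
    """Relative fold ratio from sample s1 to sample s2, in [-1, 1].
    Positive means up-regulation, negative means down-regulation."""
    largest = max(s1, s2)
    if largest == 0:
        return 0.0  # both signals absent: treated as no change (assumption)
    return (s2 - s1) / largest


def conventional_fold(r):
    """Conventional fold change implied by a relative fold ratio with |r| < 1."""
    return 1.0 / (1.0 - abs(r))
```

A ratio of 0.5 or -0.5 maps back to a two-fold change, and a ratio of 0.75 to a four-fold change.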

Array Analyzer automates most of the quantification and comparison functions. It processes different function groups of an Atlas cDNA array separately, and therefore enables the system to be readily generalized to the processing of all types of arrays. Meanwhile, Array Analyzer is robust to variation among different users. The quantification of spots largely depends on the nature of the input image, rather than the subjective judgement of the users. While most commercial systems rely on human operators to evaluate the reliability of the results, Array Analyzer automatically identifies the potential error caused by over-expressed spots and their shadow. One of the unique features of this system is to discard unreliable quantification and exclude misleading results in advance. It is worth noting that due to sequence-dependent hybridization characteristics and variations inherent in any hybridization reaction, Atlas data and any other array data should be considered only semi-quantitative. It is always necessary to verify any interesting results of array experiments with other assays to measure the level of RNA via Northern and/or semi-quantitative RT-PCR methods.

In summary, the process defined by the computer program presented here addresses some of the problems of unreliability inherent in indiscriminate microarray design. A further solution is to design a microarray in a cluster pattern, in a "divide-and-conquer" fashion, to group the genes according to their intrinsic level of abundance, and then array them in the template. The "divide-and-conquer" approach also provides versatility, allowing follow-up study of a selected cluster of genes in a focused effort. Emerging reports show the power of the microarray analysis approach to determine gene expression changes in a template of 18,000 to 20,000 genes. The method described herein allows the detection of areas where most changes occur, permitting an in-depth follow-up verification via other molecular methodologies such as Northern blotting analysis and/or semi-quantitative RT-PCR analysis. Furthermore, selected areas can also be used for in-depth analysis, to compare computer automated analysis versus manual densitometric tracing. Ultimately, the computational work involved in any microarray analysis should document multigene changes with maximal reliability and repeatability, and minimal human error. This problem can be dealt with by high-powered mathematical modeling and computer programs, but more importantly it needs to be borne in mind when the microarrays are designed. Therefore, "designer biochips" to cluster genes according to their level of abundance in a biological system, and the use of computerized automatic processing of data analysis, ease the task of bioinformatics in the current race of high-throughput technology.

Example 1: Comparison of DNA from young and old animals using a Deformable Template.

Total RNA was isolated from frozen liver by extracting homogenized liver with guanidinium/phenol solution, according to the technique described in P. Chomczynski and N. Sacchi, Anal. Biochem. 162, 156 (1987). The aqueous phase was further extracted with (25:24:1) phenol:CHCl₃:IAA, and glycogen was removed by centrifugation. RNA was recovered by ethanol precipitation, quantified by spectroscopy, and qualified by gel electrophoresis. 1.5 μg of high quality DNase-treated total RNA was used in the [alpha-³²P] dATP labeled MMLV reverse transcriptase cDNA synthesis, using 588 optimized primers (ClonTech). Unincorporated ³²P-labeled nucleotides were removed from the labeled cDNA probes by profiling with Chroma Spin spin columns (ClonTech), using gravity elution. Fractions containing the cDNA/RNA probes were converted to single stranded cDNA at 68°C with 1 M NaOH, and neutralized with 1 M NaH₂PO₄ [pH 7.0]. The probes were hybridized overnight at 68°C to prehybridized (68°C for 30 minutes) ClonTech mouse microarrays, containing 588 PCR gene products, in the presence of C₀t-1 DNA and sheared salmon sperm DNA, using ExpressHyb hybridization solution (ClonTech). The membranes were washed four times with prewarmed 2X SSC, 1% SDS, followed by two additional washes with prewarmed 0.1X SSC, 0.5% SDS. The membranes were then wrapped wet in plastic wrap for autoradiography and phosphorimaging.

In analyzing a ClonTech Atlas mouse cDNA expression array, the autoradiograph film is scanned at 300 DPI, 8-bit gray scale, by a typical commercially available flatbed scanner, Saphir Linotype-Hell. The acquired image is enhanced and analyzed by a software package, Array Analyzer, developed in-house. In order to deal with membranes with undesirable warping, a deformable template is defined to quantify the gene expressions automatically. Each node in the deformable template iterates according to the gradient descent rule, which minimizes an energy function combining data mismatch energy and template deformation energy. An ideal gene spot is modeled as a circle with a predefined radius; thus the gene expressions can be quantified by integrated intensities. Array Analyzer is also capable of identifying "bleeding" regions and thus alerting the user to potential errors in the quantification. In the later analysis, these regions are carefully checked, and are excluded from the discussion. To compare the gene expression of two samples, a relative fold ratio was defined instead of the conventional fold ratio; the relative fold ratio between samples shows that, taking account of human errors in the operation, the pairwise comparison is more reliable in medium and high abundance genes. The experimental results described in the text are based on the average of three repeats on each filter. In normalizing the two samples, Lambda DNA was selected as a negative control, and HPRT, MOD, and G3PDH were selected as positive controls. A linear normalization method is applied.

Table 1 Performance of three systems in quantifying two ClonTech Atlas arrays

Software | AtlasImage™ 1.0 | EstBlot | Array Analyzer
Overall alignment | 20 min (1), M | 15 min, M | 1*9 sec (2), M
Fine tune the alignment | ~30 min, M | 5 min, M | 0.5*9 sec, A
Adjust background | 15 min, M | 3 min, A | 0.5 sec, A
Check the reliability of samples | 15 min, M | N/A | 20 sec, A
Repeat for the second array | 1 hour, M | 0, A | 1.5*9 sec, M & A
Compare the two arrays | 15 min, M & A | 4 min, A | 1*6 sec, A
Customizing report (tabular data) | 10 min, M & A | 0.5 min, A | 0.5 min, A
Customizing report (graphical data) | N/A | N/A | 5 min, M & A

(1) The processing time of AtlasImage™ 1.0 is that estimated in the manufacturer's instruction.

(2) The processing times of EstBlot and Array Analyzer are tested on a Pentium® 400, 384MB RAM.

In this table, 'M' refers to manual operation, and 'A' refers to automatic processing. Operator '*' in measuring the processing time of Array Analyzer means the repeat times of the same procedure. Since Array Analyzer is designed for generic array analysis, each group of genes and housekeeping genes in ClonTech's cDNA array is treated as an individual array. Thus, the quantification function is applied nine times for each array (6 times for functional groups and 3 times for control genes), and the comparison function is applied six times.

Table 2 Comparison of system errors in quantification

Displacement (3-10 pixels) | EstBlot (ClonTech filters) | Array Analyzer (ClonTech filters) | Array Analyzer (microarrays printed in house)
μ_err (%) | 11.49 | 6.97 | 0.93
M_err (%) | 55.86 | 33.10 | 5.01

Two independent operators are required to provide the initial position (left-top and right-bottom corners) of the prototype template. Each operator repeats the same procedure on one sample three times. During testing, it was observed that human operators could displace the initial position of a grid template from three up to ten pixels in aligning. For the six repeats, two factors are considered in evaluating the system: Average error $\mu_{err} = \mathrm{Avg}_{i=1}^{N} \left( \frac{\sigma_i}{\mu_i} \right)$ and Maximal error $M_{err} = \mathrm{Max}_{i=1}^{N} \left( \frac{\sigma_i}{\mu_i} \right)$, in which $\mu_i$ and $\sigma_i$ are the average value and standard deviation in measuring the $i$-th spot.

Table 3 Comparison of errors for spots with different labels

Label | Number (total number is 98) | μ_err (%) | μ(Max(R) - Min(R))
O (normal) | 69 (70.4%) | 6.56 | 0.1039
? (unreliable) | 22 (22.45%) | 9.46 | 0.2177
(over-expressed) | 7 (7.14%) | 3.18 | 0.04

The experimental conditions are the same as in Table 3. This table lists the relationship between the automatic evaluation (reliability labels) of the spots and the quantitative errors. Two factors are considered: average error $\mu_{err} = \mathrm{Avg}_{i=1}^{N} \left( \frac{\sigma_i}{\mu_i} \right)$, and the average of maximum error in ratio comparison. It is shown that the automatically identified 'unreliable' loci suffer from more errors than the other two categories. Actually, the percentage of 'unreliable' spots can be used as an indicator of the quality of the hybridization procedure.

Example 2: Digoxigenin Enzymatic Detection for Microarray Analysis of E-Box Binding Related Gene Expression.

Realizing the advantages and problems of cDNA microarrays for expression profiling, a new approach was developed based on utilizing digoxigenin (DIG) to label target cDNA produced from gene-specific primers, with subsequent incubation with anti-digoxigenin antibody conjugated with alkaline phosphatase (AP), and colorimetric or chemiluminescent detection. A set of genes containing the E-box binding element (CACGTG), located in promoter regions of many genes, was selected as the probes. Probably the best-known representative of E-box-binding genes is c-Myc, whose transactivating activity plays crucial roles in the regulation of cell cycle, proliferation and apoptosis (Eilers, M. Mol. Cells 9, 1-6 (1999); Dang, C.V. Mol. Cell Biol. 19, 1-11 (1999); Facchini and Penn FASEB Journal 12, 633-651 (1998)). Genes interacting with or regulating expression for c-Myc, as well as some target genes whose expression is E-box-binding-dependent, are included in this microarray. These custom-designed microarrays, combined with the enzymatic approach to label hybridization probes, allow the development of an inexpensive, user-friendly system for high-throughput gene screening assay of specific subgroups of gene expressions.

Materials and Methods

Selection of probes for arraying

E-box-binding proteins, as well as c-Myc-regulating, -interacting and target genes, were chosen from different databases - GeneAtlas (http://www.citi2.fr/GENATLAS), GeneCards (http://bioinfo.weizmann.ac.il/cards), GenBank (http://www.ncbi.nkm.nih.gov/Web/Genbank) and PubMed (http://www.ncbi.nlm.nih.gov/PubMed). Unigene (http://www.ncbi.nlm.nih.gov/UniGene/index.html) cluster numbers and sequences were used to identify genes and verify their uniqueness. Nine housekeeping genes, as well as HeLa cell DNA, were selected as positive controls; as negative controls, lambda DNA and 2xSSC (2x standard salt solution - 0.3 M NaCl, 30 mM Na citrate, pH 7.0) were chosen. For each gene, a pair of primers was generated with the help of Primer3 software (Rosen and Skaletsky (1998) Primer3. Code available at http://www-genome.wi.mit.edu/ genome software/other/primer3.html.). The program parameters were chosen in such a way that the melting temperature of the amplicon should be close to 80°C but not more than 88°C or less than 75°C, the length of the amplicon was to be generally around 450 bp (with a few outliers between 300 and 700 bp), with primer annealing temperature about 60°C, and average length of primers 23 bp. Sequences of all amplicons have been carefully verified using proprietary software (BLASTN, FASTA), to avoid homology with repetitive elements and other related sequences, and also to distinguish between genes from the same family. A full list of all selected genes is presented in Table 4.

DNA, RNA and mRNA isolation

Total RNA and DNA were isolated from approximately 10 HeLa cell cultures and human peripheral lymphocytes isolated from fresh blood aliquots using Trizol reagent (Gibco BRL, Burlington, ON). DNA and RNA concentrations and quality were determined by spectrophotometric and gel electrophoresis analysis in 0.8 or 2% agarose gels, respectively. Poly(A)⁺ RNA was isolated from 150 μg of total RNA using the Oligotex mRNA kit (Qiagen, Mississauga, ON), according to the manufacturer's instructions.

Amplification and purification of probes

10 μg of total RNA was reverse-transcribed in a 40 μl reaction, using 200 U of MMLV (Gibco BRL, Burlington, ON) according to the manufacturer's instructions. Two PCR reactions for each pair of primers were conducted in a total volume of 100 μl, in a GeneAmp PCR system 9700 (PE Applied Biosystems, Norwalk, CT). Each 50 μl reaction (10 mM Tris-HCl, pH 8.6, 50 mM KCl, 0.1% Triton X-100, 1.5 mM MgCl₂, 0.5 mM of each dNTP, 20 pM of each primer, 1.25 U of Taq DNA polymerase (Amersham Pharmacia Biotech, Baie d'Urfe, QC) and 10 μl of RT reaction or 100 ng of genomic DNA) was thermal-cycled as follows: first cycle at 94°C for 5 min, 35 cycles at 94°C for 45 sec, at 60°C for 1 min and at 72°C for 30 sec, the last cycle at 72°C for 7 min. Probes that could not be amplified in RT-PCR were extracted from genomic DNA, with the condition that the primers were selected in the 3' region of a gene. Size and yield of PCR products were determined by gel electrophoresis in 2% agarose. Then PCR products were purified from solution or agarose gel bands, following preparative agarose gel electrophoresis (if byproducts were determined), using GFX columns (Amersham Pharmacia Biotech, Baie d'Urfe, QC). After purification, concentrations of all probes were estimated by agarose gel electrophoresis, and adjusted to approximately 100 ng/μl.

Robotic arraying

Purified PCR products in 2x standard salt solution (SSC) were arrayed in triplicates from 384-well plates, utilizing a GeneMachines™ OmniGrid microarrayer (Genomic Instrumentation Services, San Carlos, CA) equipped with ChipMaker2 tips (Telechem International, San Jose, CA). The spacing between dots was 400 μm. Microarrays were printed on Hybond-N or Hybond-N+ nylon membranes (Amersham Pharmacia Biotech, Baie d'Urfe, QC), attached to standard glass slides with tape. Before and after each 10 slides with membranes, regular slides were inserted to inspect printing quality. After arraying, membranes were UV irradiated at 50 mJ (GS Gene linker, Bio-Rad, Hercules, CA) to immobilize the DNA; then fragments of membranes containing arrays (approximately 1 x 1.5 cm) were cut off, denatured in boiling water for 5 min, rinsed in 0.1% SDS for 5 min, and used for prehybridization. After the UV irradiation step, membranes can be stored attached to glass slides.

Preparation of DIG-labeled cDNA for hybridization An initial mix of gene-specific primers (GSP) was produced. For this puφose, 1 nM of each primer that was used in RT-PCR reactions to prepare probes was mixed in a total volume of 250 μl. Digoxigenin (DIG)-labeled targets were produced in RT reaction as follows: 1 μl of GSP, 4 μg of total RNA, and RNAse-free water in total volume of 14 μl were heated at 65°C for 15 min to denature the RNA, and then kept at room temperature for 5 min for primer annealing. Alternatively, 2 μg of mRNA and 400 ng of oligo(dT)_{1 -}ι₈ primers were used. The reaction mix, containing 8 μl of 5x first strand buffer supplied by the enzyme's manufacturer, 2 μl of 10 mM mix of dATP, dCTP and dGTP (final concentration 500 μM each), 4 μl of 0.1 M DDT, 0.7 μl RNAguard, 31 U/μl (Amersham Pharmacia Biotech, Baie d'Urfe, QC), 10 μl of a 2 mM mix of 19:1 dTTP:DIG-l 1-dUTP (Roche, Laval, QC) and 2 μl (200 U/μl) of Moloney murine leukemia virus reverse transcriptase (MMLV RT) (Gibco BRL, Burlington, ON), was added. Reaction was carried out at 37°C for 1 h, followed by enzyme degradation at 94°C for 5 min in GeneAmp 9700. Alternatively, Omniscript reverse transcriptase (Qiagen, Mississauga, ON) was used according to the manufacturer's instructions. Labeling reactions were purified on GFX columns; this step eliminates all labeled products shorter than 100 bp, as well as unincoφorated nucleotides, primers and protein. 
After purification, efficacy of labeling was estimated as follows: 1 μl of 1:100, 1:1000, 1:10000 and 1:100000 dilutions were spotted on Hybond-N membrane, together with dilutions of control DIG-labeled DNA at known concentrations (10-0.01 pg/μl) as standardization for our assays (Roche, Laval, QC); after immobilization with UV, the membrane was incubated with alkaline phosphatase (AP)-conjugated antibody to DIG (Anti-DIG-AP), rinsed, and stained with the chemiluminescent substrate disodium 3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1^{3,7}]decan}-4-yl) phenyl phosphate (CSPD) (Roche, Laval, QC), according to the manufacturer's instructions.

Hybridization and processing

For hybridization and pre-hybridization, DIG Easy Hyb buffer (Roche, Laval, QC), or formamide buffer containing 50% deionized formamide, 5x SSC, 2% blocking solution (Roche, Laval, QC), 0.1% N-lauroylsarcosine, 0.02% SDS and 100 μg/ml denatured salmon DNA, were used. Membranes were pre-hybridized at 42°C for 2 h in a hybridization oven (Autoblot, Bellco, Vineland, NJ). Hybridization was performed at 42°C overnight in 1 ml or less of hybridization solution, in 5-ml Falcon tubes. The concentration of labeled probes in the hybridization mix was 10 ng/ml. Before hybridization the probes were denatured at 65°C for 10 min in hybridization solution.

Afterwards, hybridization membranes were rinsed (unless otherwise noted) twice with 1x SSC, 0.1% SDS for 15 min at room temperature, and then with prewarmed 0.1x SSC, 0.1% SDS for 15 min at 68°C. Alternatively, membranes were rinsed in more stringent conditions, i.e. twice in 2x SSC, 0.1% SDS at 68°C for 30 min, and twice in 0.1x SSC, 0.1% SDS at 68°C for 30 min. After equilibration for 5 min in rinsing buffer (0.3% Tween 20 in maleic buffer (0.1 M maleic acid, 0.15 M NaCl, pH 7.5)), membranes were blocked for 1.5 h in 1% blocking solution under slight agitation, and then treated for 30 min in 10 ml of alkaline phosphatase-conjugated sheep anti-digoxigenin antibody (Roche, Laval, QC), diluted 1:1000 for colorimetric staining, or 1:10000 for chemiluminescent detection. Following antibody incubation, membranes were rinsed three times for 15 min in rinsing buffer, equilibrated for 2 min in detection buffer (0.1 M Tris-HCl, 0.15 M NaCl, pH 9.5), and stained with 175 μg/ml 5-bromo-4-chloro-3-indolyl-phosphate, toluidine salt (BCIP), and 330 μg/ml nitro blue tetrazolium chloride (NBT) in detection buffer. Alternatively, a 1:100 dilution of CSPD was applied, and chemiluminescence was detected according to the manufacturer's recommendations (Roche, Laval, QC) using BioMax MR Kodak film.

Scanning and evaluation of arrays

Arrays were scanned on an Olympus microscope equipped with a Multiscan-4 System (Applied Scientific Instrumentation, Eugene, OR) and a color CCD Sony 950 camera. Data acquisition and montage of different fields of view into one file were accomplished with the help of the Northern Eclipse Imaging System (EMPIX Imaging, Mississauga, ON). Quantitative measurements of intensity of enzymatic reaction at each dot, background subtraction, normalization to housekeeping genes, and comparison of paired hybridizations were all performed with an in-house software program.

Results

Selection of probes and primers

After careful evaluation of different databases, 61 genes were selected for arraying, including 9 housekeeping genes. This set of genes contains 38 E-box binding genes, together with the Myc (c-, N-, L1 and L2) family, 5 c-Myc regulating factors (ZFP161, nm23-H2S, MBP-1, RBMS1 and RBMS2), 5 c-Myc interacting genes (YY1, TFII-1, PAM, MM-1 and alpha-tubulin), and 4 c-Myc target genes (prothymosin alpha, MRDB, ODC1, and cdc25A). Positive controls include 9 housekeeping genes with different levels of expression (UBC, beta-actin, GAPDH, HPRT1, phospholipase A2, HLA-C, RPS9, aldolase C, and RPL13A), and also HeLa genomic DNA. Lambda DNA and 2x SSC (2x standard salt solution), which was used as solvent for all probes, were selected as negative controls.

Primers for all genes were selected with the help of Primer3 software, provided that they corresponded to the same conditions for PCR reaction, and produced products of similar melting temperature. Most products were produced from HeLa or lymphocyte cDNA. Where PCR amplification from cDNA failed, primers were selected in the 3' region of these genes, and amplicons were produced from HeLa genomic DNA. The average annealing temperature of primers was 60.1±0.9°C, which allowed all PCR reactions to be run in the 96-well format. Sizes and melting temperatures of products, and annealing temperatures of primers, are presented in Table 4. The average size of PCR products for arraying, and their melting temperature, were 441±58 bp and 80±3°C, respectively. Selecting these parameters allowed hybridization and post-hybridization rinsing in stringent conditions, drastically decreasing the possibility of cross-hybridization and the background level.

Scrupulous selection of primers may be used to distinguish in some cases between very close members of gene families (for example, USF1 and 2, ID2, 3 and 4, members of the Myc family, and so on), or between two different transcripts of c-Myc. As is well known, there are several different transcription forms of c-Myc, transcribed from different promoters, with varying regulation properties (Bodescot and Brison, Gene 174, 115-120 (1996)). Selecting primers in the 1st exon and the 2nd-3rd exons allowed discrimination between full-size and truncated forms of c-Myc.

Conditions influencing hybridization

Several parameters that are likely to influence the results of hybridization with cDNA microarrays printed on nylon membranes were carefully tested. First of all, gene profiling results were examined using either mRNA or total HeLa RNA. Surprisingly, the whole pattern of expression was very similar, with the exception of a few genes (UBC, RPL-13A, MBP-1) for which the signals from mRNA were several times higher; the most prominent difference was found in UBC, where it approached 5-fold. Conversely, signals for HPRT1 and phospholipase A2 were higher with total RNA. In conditions where the quantity of mRNA is a limiting factor, total RNA can be used instead, without significant differences in the results of expression profiling.

Comparison of two reverse transcription enzymes, Moloney murine leukemia virus (MMLV) (Gibco BRL, Burlington, ON) and OmniScript (Qiagen, Mississauga, ON), used for production of digoxigenin-labeled targets for hybridization, did not reveal any difference in expression profile when gene-specific primers were used; but signal intensity was stronger after labeling with MMLV, especially after a day of staining (Table 5). When oligo(dT) primers were used with mRNA, some significant differences in expression levels of several genes were detected. Labeling with OmniScript produced 2-3 times more intense signals for RP-S9, RP-L13A, enolase 1, N-Myc and MAD4.

To decide which buffer is better for hybridization with microarrays, EasyHyb™ (Roche, Laval, QC) and formamide-based buffers were compared. The expression profile of HeLa mRNA was found to be independent of buffer composition, but signals were higher after hybridization in formamide buffer (Table 5), and addition of 2% blocking reagent further reduced background in comparison with EasyHyb™, thereby facilitating subsequent scanning and image evaluation.

No substantial differences were found in the expression profile of HeLa mRNA when rinsing conditions of different stringency were used (see Materials and Methods). More stringent rinsing evenly lowered all signals, and produced signals with sharper borders, rendering them easier to scan and evaluate. Standard rinsing conditions are probably already stringent enough in hybridizations with cDNA microarrays and gene-specific primers; therefore standard rinsing is preferred, because it is less time-consuming.

Comparison of positively charged (Hybond-N+) with neutral (Hybond-N) nylon membranes revealed no differences in sensitivity. Aside from this consideration, the neutral (Hybond-N) nylon membrane is preferable due to its stronger texture for printing support. This strength was not found in the positively charged Hybond-N+ membrane, which was found to retain visible printing footprints, causing complications in image analysis and increased background.

As may be seen from Table 5, increasing the staining time from overnight to 1 day usually increased the overall strength of signals by only 10%. Longer staining time increased the background level of the reaction, which compromised the possible advantage of higher sensitivity. Variations in hybridization conditions can increase overall signal intensity by 30-40%. However, the positive effects are not additive, and the maximum difference in total intensity of microarrays approaches only 50%.

The following conditions for hybridization of DIG-labeled targets with the cDNA microarray are optimal: printing probes on neutral nylon membrane, reverse transcription reaction with total RNA, gene-specific primers and MMLV reverse transcriptase, hybridization in formamide buffer, and standard rinsing conditions. These conditions were implemented in the experiments described in the following paragraphs.

Specificity, sensitivity and reproducibility of hybridization

To evaluate the specificity of cDNA microarray hybridization, 5 genes (MRDB, ODC, TFII-1, cdc25A and c-Myc), covering the entire range of length (368-711 bp) of arrayed products, were labeled in a multiplex PCR reaction and hybridized with cDNA arrays (Figure 6). As expected, only these 5 samples on the array were positive, together with the HeLa genomic DNA control: signal was detected at position 51, where HeLa genomic DNA was spotted at the highest concentration, and little or no signal was detected at positions 1a and 1b, where HeLa genomic DNA was spotted in low quantity. In all, these experiments demonstrate no signs of cross-hybridization (Figure 4).

To estimate the sensitivity and derive a calibration curve for cDNA microarray hybridization, different concentrations of this 5-gene PCR mix (10, 4, 1, 0.4, 0.1 and 0.04 ng/ml) were hybridized with arrays. The results of this experiment are presented in Figure 5. Linear dependence in semi-logarithmic coordinates, with an obvious plateau in the region of 4-10 ng/ml, was observed for all genes, with the same slope of 45±2. The lower limit of detection varies slightly for different probes in the array, and corresponds to 40-100 pg/ml per individual gene. These results are close to the detection limit of the digoxigenin system (10-30 pg/ml), according to the manufacturer (Roche, Laval, QC). This level of sensitivity allows detection of mRNAs of intermediate abundance, each representing more than 0.04% of total cell mRNA. Taking into account this detection level, it is estimated that for hybridization with a microarray containing about 70 genes of intermediate abundance, 7 ng of labeled probe produced from gene-specific primers should suffice. For the next hybridizations, a concentration of labeled probes of 10 ng/ml was selected.
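The semi-logarithmic calibration described above can be illustrated with a short least-squares fit. The following Python sketch is not the authors' procedure: the signal values are hypothetical numbers chosen to lie near a line with a slope of 45 per tenfold change in concentration, the function names are invented for illustration, and points at or above 4 ng/ml are excluded from the fit on the assumption that they belong to the plateau.

```python
import math

# Hypothetical (concentration in ng/ml, signal in arbitrary units) pairs.
calibration = [
    (0.04, 17.1), (0.1, 35.0), (0.4, 62.1), (1.0, 80.0),
    (4.0, 100.0), (10.0, 101.0),
]
PLATEAU_START = 4.0  # ng/ml; the plateau region is left out of the fit


def fit_semilog(points):
    """Least-squares fit of signal = intercept + slope * log10(concentration)."""
    xs = [math.log10(c) for c, _ in points]
    ys = [s for _, s in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_conc(signal, slope, intercept):
    """Invert the calibration line; only meaningful below the plateau."""
    return 10 ** ((signal - intercept) / slope)


linear_points = [(c, s) for c, s in calibration if c < PLATEAU_START]
slope, intercept = fit_semilog(linear_points)

# One reading of the probe estimate in the text: 70 genes at 100 pg each.
probe_needed_ng = 70 * 100 / 1000

print(f"slope per decade: {slope:.1f}, intercept: {intercept:.1f}")
print(f"signal 35 corresponds to {estimate_conc(35.0, slope, intercept):.2f} ng/ml")
print(f"probe needed: {probe_needed_ng} ng")
```

The last calculation shows that the 7 ng estimate is consistent with the upper detection limit of 100 pg per gene multiplied by about 70 genes.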
The yield of a standard reverse transcription labeling reaction with gene-specific primers is about 20-40 ng; therefore, one labeling reaction yields enough product for 2-4 independent hybridization reactions. In contrast to unstable radioactive probes, DIG-labeled probes can be stored and reused several times. Reusing hybridization mixes 2-3 times, after storing at -20°C for several months, gave results quite concordant with the original ones.

The arrays were also scanned with a scanner at a resolution of 3600 dpi, and results were compared with results of microscope scanning. In general, variability between replicated dots was higher in the case of the scanner, and linearity may be influenced by the scanner's software. The scanner can be used for initial evaluation of hybridization results, especially when chemiluminescence detection is implemented.
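The quantification steps named in the Methods (background subtraction, normalization to housekeeping genes, and comparison of replicated dots) can be sketched as follows. The in-house program itself is not described in this document, so this Python example is only an assumption-laden illustration: the function name, the intensity values, the use of a single global background value, and normalization by the mean of the housekeeping genes are all hypothetical choices.

```python
from statistics import mean, stdev


def quantify(raw, background, housekeeping):
    """Return normalized mean intensity and replicate CV for each gene.

    raw          -- dict mapping gene name to a list of replicate dot intensities
    background   -- one background value subtracted from every dot (assumption)
    housekeeping -- names of genes used as the normalization reference
    """
    # Subtract background and clip negative values to zero.
    corrected = {
        gene: [max(value - background, 0.0) for value in values]
        for gene, values in raw.items()
    }
    # Reference level: mean of the housekeeping gene means.
    reference = mean([mean(corrected[gene]) for gene in housekeeping])

    result = {}
    for gene, values in corrected.items():
        gene_mean = mean(values)
        # Coefficient of variation between replicated dots.
        if gene_mean > 0 and len(values) > 1:
            cv = stdev(values) / gene_mean
        else:
            cv = float("nan")
        result[gene] = {"normalized": gene_mean / reference, "cv": cv}
    return result


# Hypothetical triplicate intensities (arbitrary units), not measured data.
raw_intensities = {
    "beta-actin": [110.0, 120.0, 130.0],
    "GAPDH": [210.0, 220.0, 230.0],
    "c-Myc": [60.0, 70.0, 80.0],
}
summary = quantify(raw_intensities, background=20.0,
                   housekeeping=["beta-actin", "GAPDH"])

for gene, values in summary.items():
    print(f"{gene}: normalized={values['normalized']:.3f}, CV={values['cv']:.2f}")
```

Comparing the coefficient of variation obtained from microscope images with that obtained from scanner images is one simple way to express the difference in replicate variability noted above.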

Expression profiling of HeLa cells in comparison with human lymphocytes

Expression profiles of E-box genes were determined in replicating HeLa cells (Figure 8a) and normal human lymphocytes (Figure 8b). In lymphocytes, the most prominent alteration consisted of more than 2-fold up-regulation of E-box-related genes TCF4, MAD4 and Aldolase C. In addition, down-regulation of c-Myc-regulating genes MBP1 and Nm23-H2S, and small down-regulation of c-Myc and up-regulation of N-Myc, were registered in lymphocytes in comparison with HeLa cells. Expression of some c-Myc interacting and target genes was down- (MM-1, ODC1) or up-regulated (PAM, MrDb) in lymphocytes. Also, small up- (MITF, ID2) and down- (TFEB) regulation was detected in expression of several E-box-binding genes in lymphocytes, in comparison with HeLa cells. These results are shown in Figures 9a, 9b and 9c.

Summary

cDNA and oligonucleotide microarrays are becoming an increasingly powerful technique for investigating gene expression patterns. In spite of the fast progress in this field, some limitations of the technique persist. One of the major obstacles is the requirement for a large amount of mRNA. Another problem with existing microarray systems is data mining; while information on expression of tens of thousands of genes is absolutely vital to estimate the functions of new genes, in some instances a researcher is interested in the expression profile of only a subset of genes, in many physiological conditions. Significant differences in expression of 3-6 genes out of 61 are much more manageable than the differences detected with ordinary microarrays containing massive numbers of genes, in the hundreds or thousands. For example, in SAGE analysis of 45,000 genes, it was found that only about 1% are differentially expressed in normal and cancerous human cells. A similar estimation resulted from analysis of expression profiles in young and old mice; expression of only 1.8% of about 6,000 genes changed more than 2-fold.
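The twofold criterion used in these comparisons can be sketched as a simple ratio test on normalized intensities. The Python example below is illustrative only: the function name is hypothetical, the gene labels and values are placeholders rather than data from the figures, and the symmetric threshold (a ratio of at least 2 for up-regulation, at most 0.5 for down-regulation) is an assumption.

```python
def classify_changes(reference, sample, threshold=2.0):
    """Label each gene present in both profiles as up, down or unchanged.

    reference, sample -- dicts mapping gene name to normalized intensity
    threshold         -- fold change required to call a difference
    """
    calls = {}
    for gene in reference.keys() & sample.keys():
        if reference[gene] <= 0:
            continue  # a ratio is undefined without a reference signal
        ratio = sample[gene] / reference[gene]
        if ratio >= threshold:
            calls[gene] = "up"
        elif ratio <= 1.0 / threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Placeholder normalized intensities, not taken from the experiments.
hela_profile = {"gene_A": 1.0, "gene_B": 1.0, "gene_C": 1.0}
lymphocyte_profile = {"gene_A": 2.5, "gene_B": 0.4, "gene_C": 1.2}

calls = classify_changes(hela_profile, lymphocyte_profile)
for gene in sorted(calls):
    print(gene, calls[gene])
```

With a small, focused array, the list of genes passing such a threshold stays short enough to inspect individually.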
Printing microarrays on nylon filters, and using digoxigenin to label the cDNA with gene-specific primers, permits use of as little as 1 to 4 μg of total RNA per hybridization. This is the same sensitivity that can be attained with radioactivity in the Clontech protocol, and it is much more sensitive than ordinary microarrays, which need several μg of mRNA, and therefore require 100 to 1,000 micrograms of total RNA to begin with. In addition, DIG-labeled probes of high labeling sensitivity can be stored for a long time, and reused several times, in contrast to fluorescently or radioactively labeled ones.

Careful selection of genes for inclusion in a microarray, and using digoxigenin for labeling, also helps avoid another disadvantage of radioactive labeling: genes in the E-box microarray are all in the same category of abundance (intermediate or low abundance). Excluding highly abundant genes eliminates the problem of merging of strong signals. Merged signals in some circumstances substantially complicate the process of scanning, and create unreliable results during the data acquisition step.

Other advantages of the enzymatic labeling approach over both the radioactive and fluorescent probe approaches are time savings and repeatability. In general this process from start to finish, including the steps needed for labeling cDNA, hybridization, rinsing, incubation with the alkaline phosphatase conjugated anti-digoxigenin antibody, staining to reveal bound alkaline phosphatase, and scanning for data acquisition, requires a maximum of two days. This is a considerable time saving, compared with the up to eight days' exposure required for radioactive ³²P or ³³P labeled probes. The advantage of the enzyme-labeled probes over fluorescent-labeled probes is the cost savings in the data evaluation step, where the method requires an inexpensive routine upright microscope, whereas the fluorescent-labeled probes require the use of an expensive laser detection system or a confocal microscope set-up. This, plus the well-known fact that fluorescence can be easily bleached after the scanning process, makes our enzymatic approach far superior, due to the ability to scan an array repeatedly with an inexpensive microscope, without losing the original signal intensity.

These results can be compared to the commercially available gene expression array systems as follows:

Table 6: Comparative Characteristics of Gene Expression Array Systems.

| | Affymetrix* | Clontech | Fluorescent | DIG-Array |
|---|---|---|---|---|
| Printed spot, ng of cDNA | NA | 10 | 1-15 | 0.5-2 |
| Total RNA per hybridization | 5 (0.5)** | 2-5 | 50-200 | 2-4 |
| Sensitivity | 1:100,000 (1:2,000,000) | 1:20,000 | 1:100,000 | 1:100,000 |
| Difference detection | twofold (10%) | twofold | twofold | 40-50% |

* routine use, current limit in parenthesis

** before amplification in an in vitro transcription reaction (typically 30-100 fold).

TABLE 4

List of E-box transcription factors, c-Myc interacting genes and housekeeping genes represented in microarray.


TABLE 5

Influence of different hybridization conditions on intensity of microarray signals.

* - total intensity of all microarray dots after subtraction of background; the same quantity of DIG-labeled HeLa cDNA was used in all hybridizations.

Claims

1. A method for detecting DNA hybridization in a microarray, Northern or Southern blot comprising adding digoxigenin labelled gene-specific primers to target nucleic acid molecules, allowing the primers to react with the target nucleic acid molecules to produce labelled target nucleic acid molecules, incubating the labelled target nucleic acid molecules with anti-digoxigenin antibody conjugated with an enzyme cleaving a chromogenic substrate, and detecting the reaction of the target nucleic acid molecules with the antibody using detection means for a colorimetric, chromogenic or chemiluminescent assay.

2. The method of claim 1 wherein the nucleic acid molecules are DNA.

3. The method of claim 1 wherein the target DNA is on a microarray and is present in picogram quantities.

4. The method of claim 1 wherein the target DNA is on a Northern blot.

5. The method of claim 1 wherein the target DNA is on a Southern blot.

6. The method of claim 1 wherein the primers are reacted with the target nucleic acid molecules, the nucleic acid molecules are DNA and the target DNA is amplified in a polymerase chain reaction.

7. The method of claim 1 wherein the detection means is the chemiluminescent substrate disodium 3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1^{3,7}]decan}-4-yl) phenyl phosphate (CSPD).

8. The method of claim 1 further comprising following antibody incubation, staining with 5-Bromo-4-chloro-3-indolyl-phosphate, toluidine salt (BCIP), and Nitro blue tetrazolium chloride (NBT) in detection buffer.

9. A method for enhancing the resolution of hybridization reactions between probes and target DNA in a microarray format comprising providing test samples in the microarray in a quantity of less than 200 test spots.

10. The method of claim 8 wherein at least three spots are provided for each test sample.

11. The method of claim 10 wherein the three spots are located randomly throughout the microarray.

12. The method of claim 9 wherein at least nine housekeeping genes are provided for normalization of data.

13. The method of claim 12 further comprising providing means for normalizing the size of the detection reaction for each test sample relative to the size of the other test samples and housekeeping genes.

14. The method of claim 13 wherein each test sample detection reaction is fitted into a circle of the same diameter.

15. The method of claim 14 wherein the fitting is performed by a computer that measures and normalizes the data for each test sample.

16. The method of claim 15 wherein the data is further normalized relative to the detection reaction for each of the housekeeping genes.

17. A kit for use in the method of claim 1 comprising digoxigenin and reagents to label nucleic acid primers.

18. The kit of claim 17 further comprising a substrate selected from the group consisting of colorimetric, chromogenic, and chemiluminescent substrates.

19. An informatics system for use in the method of claim 9.