WO2004111647A1

WO2004111647A1 - Analysis of a microarray data set

Info

Publication number: WO2004111647A1
Application number: PCT/EP2004/006245
Authority: WO
Inventors: Lisa A. Gilhuijs-Pederson
Original assignee: Academisch Ziekenhuis Bij De Universiteit Van Amsterdam
Priority date: 2003-06-16
Filing date: 2004-06-09
Publication date: 2004-12-23

Abstract

A system for analyzing a microarray data set 300 includes a measurement device 310 and an analysis device 350. The set includes a plurality of microarrays, where each microarray A_i includes a plurality of spots G_g for interaction with a biological target within a biological variety V_k. Each spot on a microarray is associated with a corresponding spot on any of the other microarrays of the set. For each microarray, the spots have been hybridized with biological material from a common reference variety V_0 and with biological material from at least one further variety V_k (k>0). The measurement device obtains fluorescence intensities Y_ikg on each spot of each microarray for each variety that hybridized to the respective array. The analysis device performs an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays.

Description

Analysis of a microarray data set

Field of the invention

The invention relates to a method for the analysis of microarray data sets, for example, from high-throughput gene expression experiments. The invention further relates to a system for analyzing the data set and to an analysis device.

Background of the invention

Microarray technologies are commonly utilized to answer biological questions about gene expression levels across organisms, tissues, states of health, etc. A variety of microarray technologies are currently available, also for biological material other than genes. Differences between these technologies are generally due to the nature of the probe being fixed to the array (cDNA, oligos, proteins, antibodies, etc.) and the manner in which these probes are affixed to the array (pin-tip spotted, ink-jet sprayed, built directly on the array one nucleotide at a time, etc.).

In a typical DNA microarray experiment, a microarray is prepared by fixing or synthesizing known nucleic acids to a suitable substrate in a grid pattern. Each spot in the microarray consists of a purified nucleic acid, and each nucleic acid may be placed in one or more spots on the microarray. Each spot is referred to as a "probe" or "gene". The microarray may have thousands of nucleic acid spots. The microarrays are usually produced by an automated mechanical printing process so that the same nucleic acid is spotted on the same location in each array. Alternatively, the microarray may be produced by synthesizing the nucleic acids directly on the surface of the substrate. To analyze biological samples with microarrays, pools of purified mRNA are prepared from cell populations under study and reverse-transcribed into cDNA. The cDNA samples are labeled with fluorescent dyes such as the red fluorescent dye, Cy5, and the green fluorescent dye, Cy3. In principle, other dyes may also be used to tag the cDNA targets. The labeled cDNA samples are referred to as the "target" or "variety". The purpose of a typical cDNA microarray experiment is to quantify the amount of original mRNAs in the sample through specific/selective binding between the genes on the microarray and the labeled cDNA varieties. The process of allowing the cDNA varieties under favorable conditions to bind selectively to the complementary gene/probe on the microarray is called hybridization.

Typically a set of arrays is hybridized in one operation. The microarray set is incubated for a set period and temperature wherein the cDNA of the variety or varieties applied to the arrays are allowed to hybridize with the complementary gene sequences on the microarray spots. After incubation, the microarray is washed to remove any unhybridized cDNA. The washed microarray is illuminated by light that causes the colored (e.g. red and green) tags to emit fluorescent light. The intensities of the fluorescent light emitted by the dyes are measured (e.g. using a photo-multiplier tube PMT) and recorded for each region of the microarray. A separate scan, at the appropriate excitation wavelength, is done for each fluorophore. Some modern scanners allow for the option to scan more than just two channels (two dyes) allowing the possibility to competitively hybridize more than two samples to the microarray. The intensities of the fluorescent dye signals depend, in part, on the abundance of the corresponding mRNAs in the sample.

Some microarray technologies are more reliable than others. Some of these technologies have a larger variance in spot size, spot shape, amount of probe reliably being affixed to the array, etc. than others. Traditionally, when there is reason to suspect large variances during the microarray fabrication step, the microarrays will have two (or more) different samples, each labeled with a separate fluorescent dye, hybridized against them, and only the ratio of the separate dye channel intensities is trusted. In the so-called reference design, a common reference variety V_0 is uniformly labeled with one color dye (e.g. green) and the samples of the varieties of interest are individually labeled with another color dye (e.g. red). The common reference variety V_0 may, for example, be taken from a healthy cell line, and each additional variety may, for example, be taken from a respective animal known to have a certain disease (e.g. a type of cancer). In this example, information can be gathered on which genes may contribute to that type of cancer. Each microarray is hybridized with equal quantities of this common reference sample and one of the variety samples. In principle each array may include more than one additional variety by using more than two dyes. Varieties are represented by V_k, where V_k (k=0) typically denotes the common reference sample variety and V_k (k≥1) denotes each of the variety sample classes. In traditional ANOVA analysis of microarrays, V_k is representative of more than one variety sample within the same variety class (for example, many patient samples of liver cancer) and hence is calculated from intensity signals from more than one array.

The outcome of the hybridization is measured for each array A_i, for each spot G_g, for each variety class V_k and each dye D_j, giving a microarray data set with measurements Y_ijkg reflecting the hybridization of spot G_g of array A_i with the reference variety V_0 and with the microarray-specific variety V_k (k≥1). Microarray experiments contain a large amount of variability due to the methods used for preparing and purifying the gene and cDNA samples, spotting the polynucleotides on the microarray, scanning the washed microarray after incubation, and the variability that arises from the inherent complexity of biological systems. In particular, spotted cDNA arrays are subject to many errors and large sources of noise. Therefore, it has become common practice not to operate directly on the intensities of the dyes but to calculate the ratio between the red and green dye intensities for each spot on the microarray. Only this ratio of variety sample fluorescence to reference sample fluorescence is trusted as a measure of reference-variety difference of expression for any given spot. Assuming that the red and green dyes have similar labeling efficiencies and fluorescence efficiencies, if equal amounts of a specific mRNA exist within both targets hybridized to the probe, the red and green intensities for the spot would be roughly equal and the red/green ratio would be about one. If, on the other hand, more of the specific mRNA exists within the red target than the green target, the measured red intensity signal would be larger than the green intensity signal and the red/green ratio would be larger than one. Conversely, if more of the specific mRNA exists within the green target than the red target, the red/green ratio would be a fraction smaller than one. If more than one specific variety is used on an array, the ratio of each specific variety to the reference variety is determined separately. The goal of the above experiment is to identify statistically significant differences between the expressions of the varieties.

Analysis of the fluorescence intensities of hybridized microarrays typically includes spot segmentation, background determination (and possible subtraction), elimination of bad spots, followed by a method of normalization to correct for any remaining noise. Some common normalization techniques include global normalization on all spots or a subset of the spots such as housekeeping genes, prelog shifting to obtain better baseline matches, or in the case of two (or more) channel hybridizations finding the best fit that helps to give an M vs. A plot that is centered about M=0 and/or that helps to give a log(Red) vs. log(Green) plot that is centered about the diagonal with the smallest spread. The M vs. A plot is also referred to as the R vs. I plot, where R is a ratio, such as R = log₂(Red/Green), and I is an intensity, such as I = log √(Red × Green). Scaling, shifting, best fits through scatter plots, etc. are techniques currently utilized to normalize microarray datasets and to give better footing for subsequent analysis. Most of these normalization methods have some underlying hypothesis behind them (such as "most genes within the study do not vary much").
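By way of illustration only, the ratio R and intensity I of the R vs. I (M vs. A) plot can be computed per spot as sketched below. The function name `ratio_intensity` and the example intensities are hypothetical, and the use of base 2 for the intensity term I is an assumption made here to match the base of R.

```python
import math


def ratio_intensity(red, green):
    """Return (R, I) for one spot.

    R = log2(red / green)        -- the log ratio (M)
    I = log2(sqrt(red * green))  -- the log intensity (A); base 2 is assumed
    """
    r = math.log2(red / green)
    i = math.log2(math.sqrt(red * green))
    return r, i


# Hypothetical background-corrected (red, green) intensities for three spots.
spots = [(1200.0, 1200.0), (4000.0, 1000.0), (500.0, 2000.0)]
for red, green in spots:
    r, i = ratio_intensity(red, green)
    print(f"red={red:.0f} green={green:.0f} R={r:+.2f} I={i:.2f}")
```

A spot with equal red and green intensities gives R = 0, so a well-normalized array yields a cloud of points centered about R = 0 over the whole range of I.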

Many sources of noise (such as laser nonlinearities, dye labeling efficiencies, etc.) have been shown to give error terms that are sometimes nonlinear and intensity-dependent. Many of the common normalization techniques are not able to accurately correct for such terms. Lowess fit normalization, although designed to correct for such nonlinearities between the Red and Green channel on a single array, will fail in regions of the array where there is significant nonspecific hybridization of one dye versus the other (i.e. dye and/or labeled sample sticking to the array itself for reasons other than a complementary match of the sample target to the specific probe spot). Here, shifting the scatter plot such that it is centered about M=0 shifts the noise of the excess sticking dye down only to the detriment of the other dye channel (whose measures were presumably accurate before shifting). If the spread about M=0 is deemed to still be too large, subsequent scaling only serves to reduce all measures (both good and bad), potentially hiding true signal while reducing this inaccurately shifted noise.

Normalization strategies are typically applied to each array separately to attempt to correct for variances between the separate dye channels. Analysis of Variance (ANOVA) has been applied to microarray datasets as described in Kerr et al., "Analysis of Variance for Gene Expression Microarray Data", Journal of Computational Biology 7, 819-837 (2000) and US 2002/0177132. ANOVA, as applied by Kerr et al., is reasonably accurate at separating noise effects from true signal effects by modeling all effects deemed relevant within a specifically designed microarray dataset (essentially accounting for the effects that would be corrected for with traditional normalization methods within terms such as A_i, etc., and leaving the true biological signal within the (VG)_kg term of interest). ANOVA assumes a linear data model that represents the data set. A typical data model for a reference design is:

log(y_ijkg) = μ + A_i + V_k + G_g + (VG)_kg + ε_ijkg

where

• y_ijkg is the measured fluorescent light intensity from the g-th gene of the i-th array, j-th dye and k-th variety; a logarithmic transformation is typically applied to make the intensity distribution more normal and to put low intensity and high intensity genes on more equal footing.

• μ is the average of all log scaled measurements

• A_i is the effect of the i-th array, accounting for variations in overall fluorescence of any given array

• V_k is the effect of the k-th variety, accounting for overall variations in fluorescence of any given variety class (which unfortunately also includes any non-separable dye effects in the case of V_0 versus V_k where k>0)

• G_g is the effect of the g-th gene, accounting for overall variations in expression levels of different mRNAs within the samples

• (VG)_kg is the effect of interest (i.e. which gene within a variety class is differentially expressed compared to the same gene in the other variety classes)

• ε_ijkg is the mean zero independent error term of the model.

It should be noted that in this approach a given variety term typically includes samples applied to several arrays with a common feature, such as gender, healthy/diseased, treated with drugs yes/no, etc. The samples per array are identified using knowledge of both the array index i and the dye index j but are pooled in the analysis into a variety V_k. The terms of interest are the interactions between varieties and genes (VG)_kg. These terms capture departures from the overall averages that are attributable to the specific combination of a variety k and a gene g. The terms A and V effectively normalize the data as an integral process with the data analysis. Typically, an ANOVA model is fitted to the data using a least-squares minimization. Current ANOVA models applied to reference design microarrays may account for a "spot" correction term (AG)_ig based on an average of both channels (i.e. the common reference variety and the array-specific variety). This undesirably mixes both "spot noise" and "biological variance" of the two (or more) samples.

Summary of the invention

It is an object of the invention to provide an improved method, software, system and device for analysis of a microarray data set. To meet the object of the invention, a method of analyzing a microarray data set includes performing an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays. The microarray set includes a plurality of microarrays. Each microarray A_i of the set includes a plurality of spots G_g for interaction with a biological target within a biological variety V_k. Each spot on a microarray is associated with a corresponding spot on any of the other microarrays of the set. For each microarray, the spots are subjected to a biological interaction, such as a hybridization, with biological material from a common reference variety V_0 and with biological material from at least one further variety V_k (k>0). The microarray data set provides measurements Y_ijkg, such as fluorescence intensities, on each spot of each microarray for each variety that interacted with the respective array. The normalization may be followed by a conventional further analysis of the normalized data set. The normalization algorithm may also be used to verify whether one or more other normalization strategies perform well enough. If the normalization method according to the invention is only used for verification, no actual normalization needs to take place.

In general, there may be many causes for arrays within a set or individual spots not giving reliable results. Such causes include defects in the array surface, irregular spot shapes and sizes, hairs and dust specks giving high fluorescence, non-specific binding of dye to large regions of the array, etc. Even if the quality of the array manufacturing and array processing increases, there will always be a certain variation. The reference design gives a lot of additional information, not taken advantage of to date, that can aid in first detecting and ultimately correcting many of these defects. According to the invention, the data set is normalized based on the measurements involving the common reference material. Since a large amount of the reference sample is labeled uniformly and then small portions are aliquotted for each array, this sample is in principle the same on each array. If the spots do not become saturated (all cDNA sequences bound and mRNA still left in solution unable to find a place to bind) and if the dynamics of binding are not extremely different on each of the arrays (possible temperature and concentration dependencies), etc., the intensities of the dye labeled reference samples should be identical. Variations in intensity across the arrays are then attributed to defects in the array surface, irregular spot shapes and sizes, hairs and dust specks giving high fluorescence, nonspecific binding of dye to large regions of the array, etc. In principle, first ignoring the fluorescence of the variety samples V_k (k>0) and analyzing the reference sample V_0 intensities as a complete study of its own gives valuable information on these sources of noise, which in principle can subsequently be corrected for uniformly in both the reference and variety channels. Preferably, the reference channel and the variety channel have reasonably similar fluorescence intensity distributions. Even if this condition is not optimally met, preprocessing the dataset appropriately might make the dataset amenable to outlier detection and removal and to spot correction. Experiments reveal that performing a separate normalization step based on the reference variety before performing a further analysis can significantly increase the reliability of the analysis and thus leads to a more discriminative analysis.

It will be appreciated that the biological variety V_k (also referred to as sample) may in principle be in any suitable form. In general the variety includes a plurality of biological targets, such as individual genes, proteins, etc., for interaction with a respective spot/probe on the array. The probe is formed by material able to interact with a single target of the biological sample, e.g. for an mRNA analysis the probe may be cDNA or an oligo (single-strand nucleotide - SSN); for a protein analysis the probe may be a specifically binding antibody. In the remainder, the term gene will usually be used for a target in a variety, since the varieties being tested in the most commonly used cDNA and oligo microarrays are collections of genes. It will be appreciated that this does not exclude in any way other distinguishable biological targets. The varieties may be identified/labeled using any suitable technique. Currently, dye labeling is used in microarray analysis to be able to distinguish between the at least two varieties per array. It will also be appreciated that for testing one gene, in principle, more than one spot on an array may be used. This may be done by introducing a spot index s that can be correlated with a gene index g. Alternatively, G_g can be used on a per spot basis even if one gene is spotted more than once, thus treating a gene spotted multiple times as separate genes. Each array includes the common reference variety V_0 and at least one further variety. V_0, the term for the common reference sample, is thus representative of fluorescent intensities from multiple arrays. In the detailed description given below for a preferred embodiment, multiple samples within the same variety class will be treated as separate varieties V_k. It will be appreciated that the concept can also be applied where a variety in fact is a variety class covering multiple arrays, as is used in the conventional ANOVA approach. In this conventional approach, a variety (i.e. pooled class) is indicated by an index k. Since many samples can be labeled with the same index k, knowledge of both indices k and i is required for unique sample identification. In principle, more than one variety sample could be present on a same array. So, in addition to the array index i and variety class index k, the dye index j was also used to distinguish between the separate variety samples, each labeled with a different dye j. In the description here, each sample is regarded as a separate variety, indicated by index k, making the index j redundant (and occasionally omitted in the text that follows). To simplify the description further, a sample (other than the common reference sample) that is applied to more than one array is seen as a separate variety for each array to which it is applied.

According to the measure of the dependent claim 2, the step of normalizing the data set includes selecting all measurements Y_ig from the data set that relate to the common reference variety; and estimating a term (AG)_ig indicative of the spot variance of the common reference variety for the g-th spot on the i-th array. This gives a "per spot" assessment of reliability of the measurement (e.g. fluorescence intensity) not contaminated by true biological variance.

In a preferred embodiment as described in the measure of the dependent claim 3, the step of estimating the term (AG)_ig includes fitting a linear data model to the selected measurements that includes the following terms μ + A_i + G_g + (AG)_ig, where:

• μ is the average of all selected measurements,

• A_i is the average intensity of all spots for the common reference variety on array i minus μ, and

• G_g is the average intensity of the g-th spot within the common reference variety over all arrays minus μ.

The fitting is preferably done using a least-squares minimization, but other techniques may also be used.
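By way of illustration only, the terms defined above can be estimated as sketched below for a complete table of common reference measurements (every spot measured on every array). For such a complete table the row and column means are the least-squares estimates, and (AG)_ig is what remains after μ, A_i and G_g are subtracted. The function name `fit_reference_model`, the data layout `y[i][g]` and the example values are assumptions of this sketch.

```python
def fit_reference_model(y):
    """Fit mu + A_i + G_g + (AG)_ig to reference measurements y[i][g].

    y is a list of arrays (rows), each a list of spot measurements (columns),
    for example log-transformed reference-channel intensities.
    Returns (mu, a, g, ag).
    """
    n_arrays, n_spots = len(y), len(y[0])
    # mu: average of all selected measurements
    mu = sum(sum(row) for row in y) / (n_arrays * n_spots)
    # A_i: average over all spots on array i, minus mu
    a = [sum(row) / n_spots - mu for row in y]
    # G_g: average of spot g over all arrays, minus mu
    g = [sum(row[s] for row in y) / n_arrays - mu for s in range(n_spots)]
    # (AG)_ig: what is left after removing mu, A_i and G_g
    ag = [[y[i][s] - mu - a[i] - g[s] for s in range(n_spots)]
          for i in range(n_arrays)]
    return mu, a, g, ag


# Hypothetical reference-channel data: 3 arrays x 4 spots. The last spot on
# the third array is disturbed (11.0 instead of the 8.5 an additive table
# would give).
y_ref = [
    [10.0, 8.0, 12.0, 9.0],
    [10.5, 8.5, 12.5, 9.5],
    [9.5, 7.5, 11.5, 11.0],
]
mu, a, g, ag = fit_reference_model(y_ref)
for i, row in enumerate(ag):
    print(f"array {i}:", [round(v, 3) for v in row])
```

In this example the largest (AG)_ig in absolute value belongs to the disturbed spot, while the remaining entries absorb smaller compensating amounts.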

According to the measure of the dependent claim 4, the method of analyzing a microarray data set includes removing outlying measurements by applying an outlier detection criterion based on the spot variation term (AG)_ig for selecting measurements Y_ig that are unreliable; and removing measurements Y_ig for each spot g that has been identified to have an outlier Y_ig on at least one array i.

The traditional ANOVA analysis is heavily influenced by outlier spots. Causes of such outliers include dust, hair, scratches on the surface of the array, etc. The normalization according to the invention provides, for each spot, the term (AG)_ig indicative of the spot variance of the common reference variety for the g-th spot on the i-th array, enabling detection of outliers. Such outliers are then removed from the data set for the non-reference varieties as well, since in many cases the cause of the disturbance in the results for the reference variety applies equally to the specific variety. In this way unreliable measurements are removed and do not negatively influence the further analysis.
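By way of illustration only, the removal step can be sketched as follows. The flat threshold on the absolute value of (AG)_ig is a deliberately simple, hypothetical stand-in for the outlier detection criterion, which the text leaves open; the function names and the example values are likewise assumptions of this sketch.

```python
def find_outlier_spots(ag, threshold):
    """Return the spot indices g with |(AG)_ig| > threshold on at least one array i."""
    n_spots = len(ag[0])
    return sorted(g for g in range(n_spots)
                  if any(abs(row[g]) > threshold for row in ag))


def drop_spots(channel, bad_spots):
    """Remove the listed spots (columns) from a channel matrix channel[i][g]."""
    bad = set(bad_spots)
    return [[v for g, v in enumerate(row) if g not in bad] for row in channel]


# Hypothetical (AG)_ig estimates for 3 arrays x 4 spots.
ag = [
    [0.2083, 0.2083, 0.2083, -0.6250],
    [0.2083, 0.2083, 0.2083, -0.6250],
    [-0.4167, -0.4167, -0.4167, 1.2500],
]
# Hypothetical reference-channel and variety-channel measurements.
reference = [[10.0, 8.0, 12.0, 9.0], [10.5, 8.5, 12.5, 9.5], [9.5, 7.5, 11.5, 11.0]]
variety = [[10.4, 7.1, 12.2, 9.3], [11.0, 8.9, 12.1, 9.0], [9.1, 7.7, 11.9, 11.6]]

bad = find_outlier_spots(ag, threshold=1.0)
# The spot is removed on every array and for every variety, not only where it was flagged.
reference_clean = drop_spots(reference, bad)
variety_clean = drop_spots(variety, bad)
print("removed spots:", bad)
```

Removing the whole spot across arrays keeps the table complete, so the model terms can be re-estimated on the remaining spots.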

According to the measure of the dependent claim 5, the step of applying the outlier detection criterion includes comparing (AG)_ig with measurements Y_ig from the data set that relate to the common reference variety. In this way a per spot and per array comparison can be made of original intensities with estimated noise. Noise could be intensity dependent. It is, therefore, preferred not to consider a spot ig as noise merely because the (AG)_ig term is relatively large, but rather to consider spots as outliers for extreme (AG)_ig terms that fall out of the consensus cluster as for example visualized in an (AG)_ig versus Y_ig plot. Instead of using the original Y_ig, a corrected version of Y_ig may be used, for example a normalized Y_ig or a spot corrected Y_ig, using spot correction according to the invention. The original Y_ig may be advantageously used to detect poorly hybridized dark spot outliers, whereas the spot corrected Y_ig terms may be advantageously used to detect bright spot outliers. Using spot corrected Y_ig, any bright outliers that could not be observed in the original plot of (AG)_ig versus Y_ig are further separated away from the majority of spots in the cluster. Alternatively, (AG)_ig may be compared with G_g.

Preferably, as described in the measure of the dependent claim 6, a human operator can adjust the outlier detection criterion. For example, the operator may successively apply a stricter criterion, removing more outliers, until, for example based on experience, most outliers have been removed and a further tightening of the criterion might result in removing correct measurements. To assist the human operator, preferably, (AG)_ig and measurements Y_ig from the data set that relate to the reference material are visually represented to the operator. In this way, the operator can more easily identify areas of outlying measurements. Experiments have shown that clusters are easily visible within an (AG)_ig vs. Y_ig scatter plot, which helps to distinguish between typical corrections and unusual corrections. Preferably, the selection criterion is also visually represented (e.g. by separating an area of outlying measurements and an area of acceptable measurements).

According to the measure of the dependent claim 8, the steps of estimating the term (AG)_ig and removing outlying measurements are performed iteratively until a stopping criterion has been reached. The iterative operation results in an improved estimation of the terms μ, A_i, and G_g. This in turn results in a more accurate spot correction term (AG)_ig. In a scatter plot this results in a denser cluster of points, making the distinction between trustworthy (AG)_ig corrections and outlier spots easier. In principle, a human operator may determine when the iteration can stop. For example, points that fall far away from all other points in a scatter plot are considered to be outliers. The iterations will improve the clustering. When, empirically, a certain "strength in numbers" is achieved, the iteration process can be stopped. Alternatively, automatic and/or statistical criteria may be used.
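By way of illustration only, the iterative estimate-and-remove loop can be sketched as follows. The automatic criterion used here (flag a spot when |(AG)_ig| exceeds k times the root mean square of all (AG)_ig terms, with a floor `min_threshold`, and stop when nothing is flagged) is a hypothetical example of an automatic and/or statistical criterion; all names, parameter values and data are assumptions of this sketch.

```python
import math


def fit_ag(y):
    """Return the (AG)_ig terms of the model mu + A_i + G_g + (AG)_ig for y[i][g]."""
    n_i, n_g = len(y), len(y[0])
    mu = sum(map(sum, y)) / (n_i * n_g)
    a = [sum(row) / n_g - mu for row in y]
    g = [sum(row[s] for row in y) / n_i - mu for s in range(n_g)]
    return [[y[i][s] - mu - a[i] - g[s] for s in range(n_g)] for i in range(n_i)]


def iterative_outlier_removal(y, k=3.0, min_threshold=0.05, max_rounds=20):
    """Alternate between estimating (AG)_ig and removing outlier spots.

    Returns (kept, removed): original spot indices retained and removed.
    """
    kept = list(range(len(y[0])))
    removed = []
    for _ in range(max_rounds):
        if len(kept) < 2:
            break
        sub = [[row[s] for s in kept] for row in y]
        ag = fit_ag(sub)
        rms = math.sqrt(sum(v * v for row in ag for v in row) / (len(sub) * len(kept)))
        threshold = max(k * rms, min_threshold)  # assumed criterion, see lead-in
        bad = [kept[c] for c in range(len(kept))
               if any(abs(row[c]) > threshold for row in ag)]
        if not bad:  # stopping criterion: nothing left to remove
            break
        removed.extend(bad)
        kept = [s for s in kept if s not in bad]
    return kept, removed


# Hypothetical data: additive table (spot level + array offset) with a strong
# disturbance on array 0, spot 5 and a mild one on array 1, spot 4.
levels = [10.0, 8.0, 12.0, 9.0, 11.0, 7.0]
offsets = [0.0, 0.5, -0.5, 0.25]
y_ref = [[lv + off for lv in levels] for off in offsets]
y_ref[0][5] += 4.0
y_ref[1][4] += 1.0

kept, removed = iterative_outlier_removal(y_ref)
print("kept:", kept, "removed (in order):", removed)
```

In this example the mild disturbance is masked in the first round by the spread that the strong one causes, and only stands out once the strong outlier has been removed and the terms have been re-estimated.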

According to the measure of the dependent claim 9, spot correction is performed by correcting the respective measurements for all varieties for the i-th array, the j-th dye and the g-th spot in dependence on the corresponding estimated term (AG)_ig, preferably by subtracting either the term (AG)_ig or the term ((AG)_ig / Y_ig) * Y_ijkg. Errors that are not outliers but are good candidates for spot correction include spot size/shape variances and subtle differences in overall hybridization quality; even some non-linear laser effects can be corrected if they are not too large.
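By way of illustration only, the two ways of subtracting the spot term can be sketched as follows. It is assumed here that the denominator of the proportional form is the common reference measurement on the same array and spot; the function name `spot_correct`, the data layout and the example values are hypothetical.

```python
def spot_correct(channel, ag, reference=None):
    """Apply spot correction to one channel of measurements channel[i][g].

    reference is None : additive form, Y - (AG)_ig
    reference given   : proportional form, Y - ((AG)_ig / Y_ref_ig) * Y,
                        with Y_ref_ig the common reference measurement (assumed)
    """
    corrected = []
    for i, row in enumerate(channel):
        if reference is None:
            corrected.append([y - ag[i][g] for g, y in enumerate(row)])
        else:
            corrected.append([y - (ag[i][g] / reference[i][g]) * y
                              for g, y in enumerate(row)])
    return corrected


# Hypothetical values for one array with two spots.
ag = [[0.5, -0.25]]
reference = [[10.0, 8.0]]
variety = [[12.0, 6.0]]

additive = spot_correct(variety, ag)
proportional = spot_correct(variety, ag, reference)
print("additive:", additive)
print("proportional:", proportional)
```

Applied to the reference channel itself, the proportional form reduces to the additive one, since ((AG)_ig / Y_ig) * Y_ig equals (AG)_ig; for a variety channel the correction is scaled by that channel's own intensity.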

Additionally, the (AG)_ig term can be estimated for alternatively designed datasets, including 1-dye datasets. In that case, (AG)_ig will include the measurement of biological variance. Depending on the design of the datasets and the biological questions of interest, the (AG)_ig terms across multiple arrays can provide useful information on noise vs. biological differential expression.

Additionally, determining (AG)_ig on background measures (either 2-dye or 1-dye, any hybridization design) can give insight into estimates of distinct background features/trends that can be utilized for optimization of background subtraction (yielding more reliable spot intensity measures).

To meet the object of the invention, a computer program product is provided operative to cause a processor to perform the method as described above.

To meet the object of the invention, a system is provided for analyzing a microarray data set that includes a plurality of microarrays, where each microarray A_i includes a plurality of spots G_g for interaction with a biological target within a biological variety V_k; each spot on a microarray being associated with a corresponding spot on any of the other microarrays of the set; for each microarray, the spots having been subjected to a biological interaction, such as a hybridization, with biological material from a common reference variety V_0 and with biological material from at least one further variety V_k (k>0). The system includes: a measurement device for obtaining measurements Y_ijkg, such as fluorescence intensities, on each spot of each microarray for each variety that interacted with the respective array; and an analysis device for performing an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays. To meet an object of the invention, an analysis device for use in the system includes: an input for receiving from a measurement device measurements Y_ijkg, such as fluorescence intensities, on each spot of each microarray for each variety that interacted with the respective array; a processor for, under control of a program, performing an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays; and an output for providing an analysis outcome.

These and other aspects of the invention are apparent from and will be elucidated with reference to the embodiments described hereinafter.

Brief description of the drawings

In the drawings:

Fig. 1 shows a reference design microarray set;

Fig. 2 shows more details of a microarray;

Fig. 3 shows a block diagram of an analysis system;

Fig. 4 shows normalization based on an estimated 'spot variation' term (AG)_ig;

Fig. 5 shows a scatter plot of (AG)_ig;

Figs. 6 to 8 show three alternative scatter plots of (AG)_ig;

Fig. 9 shows a subset of Pritchard et al.'s Variable genes in the Testis;

Figs. 10A and 10B illustrate outliers on a sample array;

Fig. 11 shows the overall mean of four distinct sample arrays;

Fig. 12 shows applying the spot correction algorithm in a loop design;

Fig. 13 compares the traditional ANOVA analysis with the PAR-ANOVA analysis according to the invention;

Fig. 14 compares the PAR-ANOVA analysis with the DEF-ANOVA according to the invention; and

Figs. 15 to 17 illustrate differences in distribution in the measured intensities of the dyes.

Detailed description of the preferred embodiment

In the remainder, a description is given of five main inventions relating to the analysis of microarray data sets. All of these analysis methods can in principle be applied to any and all microarray technologies (cDNA, natural oligo, synthetic oligo (e.g. Affymetrix arrays), protein, antibody, print tip spotted, ink jet sprayed, fabricated directly on the array, etc.) if a two (or more) dye common reference design is utilized. However, invention 3 (DEF-ANOVA), as described in more detail below, can also be used for 1-dye microarrays, such as Affymetrix arrays. Invention 4 (Feature Discovery) can be applied to any microarray dataset independent of the number of dye channels or the hybridization design. A fifth invention relates to scanning of the microarrays in order to optimize the input for the analysis, which could in principle be used for any microarray dataset independent of the number of dye channels or the hybridization design. The inventions can be used together but may also be used independently of each other in combination with other analysis techniques.

In summary, the inventions are:

1. Normalization

A separate normalization step is performed that includes an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays. Main uses of this normalization are: a. detection and removal of outliers, b. performing spot correction.

2. PAR-ANOVA (Per-Array Reference design ANOVA)

The ANOVA-style analysis of microarrays to determine statistically significant genes is applied on a "per array" basis, instead of a variety class covering several arrays. Only the genes that are in consensus across arrays within a traditional variety class (when such replicates exist) are accepted to be statistically significant true positives.

3. DEF-ANOVA (Dye Effect Free ANOVA)

In a microarray set with multiple arrays belonging to a same variety class, such a class is used as variety V_0. Other varieties and/or variety class samples which are labeled with the same dye as used for V_0 are analyzed with respect to V_0, preferably using the PAR-ANOVA approach. This effectively results in a one dye microarray study, side-stepping an immediate need for dye correction.

4. Feature Discovery

The equations utilized for detecting noise, biases, and outliers within the first invention can be applied universally to all datasets (even if they are not of a 2-dye common reference design). Application to background measures allows feature detection within background measures to determine improved estimates for optimal background subtraction. Application to intensity measures (including but not limited to background subtracted and/or normalized measures) allows the identification of regions/spots/pixels upon the arrays which differ. These regions/spots/pixels can subsequently be analyzed in conjunction with either background measures, background (AG)_ig estimates, (AG)_ig estimates for same variety class samples, (AG)_ig estimates for different variety class samples, or any standard microarray measure for improved feature identification, classification, and processing.

5. Scanner control

A target intensity profile is determined, for example by taking an average of intensity profiles of pre-scanned arrays. The scanning device (laser settings, PMT gain) is then set for each channel of each array separately to obtain a scan with the desired target profile. Using measurements with better profiles obtained in this way results in a better performance of the spot correction. Additionally, using similar intensity distributions lays a foundation for reducing experimental errors and nonlinearities for each and every microarray design/study (not just the common reference design).

A strategy that includes all five main inventions might entail:

1. Scanner control, according to invention 5, to obtain optimal measurements as input for the normalization and further analysis.

2. Outlier removal by the common reference channel detection methodology (invention 1a). Additionally, techniques such as image analysis software and visual inspection of the images may be used.

3. Verification of the validity of standard background subtraction and/or improving background measures for subtraction (invention 4).

4. Normalization by a common normalization method and/or spot correction normalization (invention 1b), either separately or combined. It is sometimes useful to perform some preprocessing, such as shifting of baselines and/or global scaling, before the spot-correction-based outlier detection of step 2. Such preprocessing can give tighter clusters of AG_ig corrections, make it easier to distinguish true outliers from diffuse clusters, and make the final AG_ig more representative of spot noise as opposed to large fold variances in total fluorescence intensities across channels/arrays, etc.

5. A statistical significance test is applied. Preferably, the analysis of PAR-ANOVA (invention 2) or DEF-ANOVA (invention 3) is used. Other analysis methods, such as a t-test, traditional ANOVA, etc., may also be used. Ideally, multiple techniques will provide consensus and/or be unique tools offering insight into the types of noise that still might remain in the datasets.

Description of the system

Fig.1 shows a microarray set that, in this example, includes three arrays A₁, A₂ and A₃. Spot correction and across array normalization are utilized in reference design experiments. PAR-ANOVA is typically applied in a reference design, but can be used for other experiments as well. DEF-ANOVA is not dependent on reference designs nor on 2-dye studies. It can be applied whenever there is at least one variety class set of arrays labeled with one dye and other arrays labeled with the same dye (whether the second dye reference channel is ignored or not). Scanner control can be applied to any microarray design.

In reference design experiments, a common reference sample (variety), indicated as V₀ in Fig.1, is labeled in unison (as opposed to separate labeling of the reference sample for each separate array) and/or pooled together to form one uniform reference sample after separate labelings. In Fig.1 this is illustrated by V₀ being applied to all three arrays. In addition to the reference variety, at least one additional variety is applied to each respective array. In Fig.1 exactly one additional variety is applied to each array. In the remainder the additional varieties will be indicated as V_k, where k>0. In this way there are at least two varieties on each array, one of which is the common reference variety. The interaction (e.g. hybridization) of each respective variety with the array can be separately measured, typically by using different dyes. Normally, all reference material is labeled using one dye, whereas all additional varieties are labeled using another dye. If more than one additional variety is applied to an array, more dyes need to be used. In the remainder, most emphasis will be on a two-dye system (i.e. one reference variety and one additional variety per array), but the invention is in no way limited to such a system.

Fig.2 schematically provides more details on an array. It shows an array 210 with a plurality of spots. In this example, 45 spots are shown. Spots G₁, G₂ and G₃ are specifically shown in the Figure. After incubation, a certain amount (or none at all) of the reference variety, and similarly of the array-specific variety, may have hybridized to each spot. Using a red and a green dye, a spot may have any red intensity and any green intensity up to a certain maximum. By scanning the colors separately, an image can be obtained that reflects the hybridization of the 'green' variety, illustrated as 220, and an image that reflects the hybridization of the 'red' variety, illustrated as 230.
Details of the arrays, such as size, shape, number of spots, are outside the scope of the invention. Any suitable array may be used.

Fig.3 shows a block diagram of a system according to the invention. The system includes a microarray scanning device 310 (also referred to as measurement device) and an analysis device 350. The scanning device 310 includes an excitation radiation source (or is attached to such a source), for example a laser 312. The laser 312 is preferably capable of optimally exciting each dye channel used on the microarray set 300. In the example of Fig.3 the set includes eight arrays 301 to 308 that are typically scanned sequentially. The scanning device 310 also includes a detector, such as a photo-multiplier (PMT) 314, for measuring the fluorescence intensity of the spot excited by the laser. Using the approach where each sample on an array is seen as a separate variety, the scanning device provides fluorescence intensities Y_ijkg. The dye index j is redundant when only varieties 0 and k are hybridized to any given array, so the notation can be simplified to Y_ikg for each spot g on each array i and each variety k. For a typical 2-dye approach this gives the following measurements (assuming normal sequential numbering):

- array 1: Y_{1,0,g} for V₀ and Y_{1,1,g} for the specific variety of array 1, being V₁,

- array 2: Y_{2,0,g} for V₀ and Y_{2,2,g} for the specific variety of array 2, being V₂,

- etc.

In this example, the array index i and variety index k are the same for the variety samples where k>0. Using a 3-dye approach, the measurements may be arranged as:

- array 1: Y_{1,0,g} for V₀; Y_{1,1,g} for the first specific variety of array 1, being V₁; and Y_{1,2,g} for the second specific variety of array 1, being V₂,

- array 2: Y_{2,0,g} for V₀; Y_{2,3,g} for the first specific variety of array 2, being V₃; and Y_{2,4,g} for the second specific variety of array 2, being V₄,

- etc.

It will be appreciated that persons skilled in the art can easily adapt this to situations wherein a same sample (other than the reference sample) is applied to multiple arrays and to the conventional ANOVA approach of pooling samples to a class referred to as the variety.

Microarray scanning devices are known per se and can be used in the system according to the invention. The excitation radiation source 312 and the detector 314 are controlled by a processor 315. This processor, under control of a suitable program, can set settings of the source and/or detector. Typically, an operator of the scanning device 310 can influence such settings. The program may be stored in a storage 316, such as a non-volatile memory (ROM, hard disk, etc.). The processor 315 also receives the data from the detector 314. It may permanently or temporarily store this data in a storage, such as the storage 316.

Preferably, it can provide the data to an analysis device 350 via an interface 318, such as a network interface like Ethernet.

The analysis device 350 is typically a conventional computer, such as a personal computer or workstation, loaded with specialized statistical analysis software to analyze the fluorescence intensities of the material samples. The analysis device receives the data from the scanning device via an interface 352. Preferably, the interface 352 is a network interface. The data may also be received in any other suitable way, for example on a record carrier such as a CD-R. A processor 354 is loaded with the analysis software and analyzes the data. Preferably, the processor enables a human operator to control the analysis. To this end, a conventional user interface 356 may be used. For example, a display may be used to provide information on the analysis, e.g. in the form of tables, scatter plots, etc. The operator may provide input using input devices, such as a keyboard, mouse and tablet. The analysis program, data to be analyzed, and the analysis outcome may be stored in a memory/storage such as a hard disk 358. During execution the program may be stored in a volatile memory, such as RAM.

In a preferred embodiment, as will be described in more detail below in the section 'scanner control', a target intensity profile is determined. Such a profile may be a histogram of the intensities per pixel of the scanner output. For each of the arrays, the settings of the scanner are then adjusted until the profile of the array optimally matches the target profile. The target intensity profile is preferably determined by creating, for one array of the set, an intensity profile for each channel. The settings of the scanner are adjusted to bring the channel-specific profiles together. The scanner may be controlled automatically to obtain an optimal match of a newly scanned intensity distribution with the target intensity distribution. The automatic feedback given between the prescan and the actual scan is preferably based on per pixel analysis (i.e. prior to manual segmentation of the image).
Target profiles that may have been obtained under control of a human operator are preferably stored in storage of the scanner device. Such a target profile can be retrieved for subsequent scanning operations. The target profile may have been created under control of a human operator on a separate computer. The outcome of such an analysis may then be supplied back (e.g. downloaded on demand) to the scanner. The storage in the scanner or analysis computer is preferably arranged as a database, allowing user-friendly storing and retrieval of the data in association with further attributes, such as identifiers of arrays, manufacturers, etc. The scanner control, including operations like determining a target profile, removing landmarks that might negatively influence the target profile, and adjusting settings to obtain for each array a scan with a profile matching the target profile, is preferably executed by the processor 315 of the scanning device 310. To this end, suitable programs or suitable program modules are loaded in the processor for causing the processor to perform the described operations.
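The matching of a scan against a target intensity profile can be illustrated with a minimal sketch. It assumes that the profile is a normalized histogram of 16-bit pixel intensities, that the mismatch is measured as the sum of absolute bin differences, and that the only adjustable setting is a gain factor chosen from a few candidates. The function `simulated_scan` is a purely hypothetical stand-in for the scanning device; none of these names or choices are prescribed by the description.

```python
def intensity_profile(pixels, n_bins=8, max_value=65535):
    """Normalized histogram of pixel intensities (fractions summing to 1)."""
    counts = [0] * n_bins
    for p in pixels:
        b = min(int(p * n_bins / (max_value + 1)), n_bins - 1)
        counts[b] += 1
    return [c / len(pixels) for c in counts]


def profile_distance(p, q):
    """Sum of absolute bin differences (an assumed mismatch measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def simulated_scan(pixels, gain, max_value=65535):
    """Hypothetical scanner model: linear gain with saturation."""
    return [min(p * gain, max_value) for p in pixels]


def best_gain(prescan, target, gains):
    """Pick the candidate gain whose scan profile is closest to the target."""
    return min(
        gains,
        key=lambda g: profile_distance(
            intensity_profile(simulated_scan(prescan, g)), target
        ),
    )


# Made-up prescan pixels and a target profile taken from a scan at gain 2.0.
prescan = [1000, 5000, 9000, 14000, 20000, 30000]
target = intensity_profile(simulated_scan(prescan, 2.0))
best = best_gain(prescan, target, [1.0, 1.5, 2.0, 3.0])
```

In a real device the candidate settings would cover laser power and PMT gain per channel, and the search would be driven by actual prescans rather than by a simulated response.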

Normalization with Outlier Detection and Spot Correction

According to the first mentioned invention, analysis of the microarray data set includes performing an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays. After the normalization has been performed, further statistical analysis may be performed on the normalized data set using any suitable analysis technique, not described further here. It will be appreciated that the normalization data, such as the spot correction data, obtained by the method according to the invention may be used without actually performing the final normalization (e.g. outlier removal or AG normalization according to the invention). The spot correction data may be used to quantify the noise in the measurements, for example, to help determine which existing normalization strategy performs the best. By first considering just the common reference sample channel within this two (or more) dye common reference design study, valuable information can be gained about the sources and sizes of noise and nonlinearities both within and across arrays within the entire study. Since the common reference sample was labeled with one color dye in one uniform labeling procedure and since this common reference sample is hybridized to each of the identically fabricated arrays (or pertaining to the identical subset of probes if they are fabricated differently), in principle, fluorescence intensity of just the reference sample for any given spot (i.e. fluorescent dye labeled reference sample bound to unique probe) should be identical on every single array (excluding saturated points). Reasons that the fluorescence intensity of the common reference sample channel might be different across the arrays include, but are not limited to, the following sources of error: print-tip variances, spot size and shape variances, microarray coating inhomogeneities, dust, hairs, bubbles under the coverslip, well-plate effects, etc. 
To the extent that the fluorescence intensity of just the reference sample for any given spot is not the same, the amount of variation for any given spot can be determined across each of the arrays. In a preferred embodiment, the step of normalizing the data set includes selecting all measurements Y_i0g (simplified to Y_ig) from the data set that relate to the common reference variety. Since the common reference variety is labeled with one unique dye, these measurements can be easily extracted from the entire set of measurements. Next, a 'spot variation' term (AG)_ig is estimated that is indicative of the spot variance of the common reference variety for the g-th spot on the i-th array, as is illustrated in Fig.4.

Obvious large variations indicated by the spot variation term can be flagged as outliers and eliminated from the dataset. This is referred to as 'outlier detection and removal'. Upon identification and removal of outlier spots, the estimation of the terms within the model becomes more reliable. The remaining smaller corrections, (AG)_ig, can be uniformly subtracted on a per spot basis from both the reference sample channel and the variety sample channel, assuming a correlation between the dye channels. This is referred to as 'spot correction'. In this way, all sources of variation (including laser nonlinearities, etc.) can be corrected for perfectly within the common reference sample channel across all arrays, and sources of variation that are common to both the reference and variety channels are removed from the variety channel. In regions of the array where there is significant nonspecific hybridization of one dye versus the other, this 'spot correction' will not be able to eliminate the dye imbalance, but it often helps to amplify the variance in the variety channel, which can then be detected and classified as outliers in other ways. Furthermore, upon identification and removal of outlier spots, the variance in the remaining spots is significantly reduced by subtracting the 'spot variation' term uniformly from both the common reference sample channel and the variety of interest sample channel(s) when there is no detectably large dye imbalance between the two channels. Thus, the 'spot variation' is used for correction at spot level. In view of this, the term (AG)_ig is also referred to as the 'spot correction' term. In the remainder, the term (AG)_ig is sometimes indicated as AG_ig as well.

Definitions, Notation, Etc.

• "I" number of arrays each denoted by "A_i" where (1 ≤ i ≤ I)

• "G" number of genes (or spots if not duplicated) each denoted by "G_g" where (1 ≤ g ≤ G)

• "AG_ig" is the second order interaction term between the array effects "A_i" and the gene effects "G_g" representing spot variations/noise.

• y_ig denotes the measured fluorescence intensity for spot g on array i for the reference channel (thus, the variety is V₀ by convention; in the definitions and in the equation the subscript k has been omitted)

• μ is the overall average of all spots, all arrays but only for the reference channel intensities.

In a preferred embodiment, this spot variation term is estimated by applying Analysis of Variance (ANOVA) techniques to just the common reference sample channel, resulting in an easy identification of outlier spots that are not always detected during image analysis. This can be done by fitting the following ANOVA equation to just the common reference channel on all arrays:

$$y_{ig} = \mu + A_i + G_g + (AG)_{ig} \qquad (1)$$

It should be noted that all degrees of freedom are used in this 1 to 1 mapping / perfect fit of the terms within this equation to all reference channel intensities on all arrays (unless genes are spotted in duplicate, triplicate, etc. and G_g truly represents unique genes/probes as opposed to individual spots - either can be modeled for within the context of spot correction). In a preferred embodiment, a best fit of this equation to the measured fluorescence intensities is achieved by minimizing the residual sum of squares RSS (the sum of the difference between the measured fluorescence and the terms within the ANOVA model quantity squared):

$$\mathrm{RSS} = \sum_{i}\sum_{g}\left(y_{ig} - \mu - A_i - G_g - (AG)_{ig}\right)^2$$

The minimum of the residual sum of squares can be obtained by setting the partial derivative with respect to each of the ANOVA terms equal to zero:

$$\frac{\partial\,\mathrm{RSS}}{\partial \mu} = 0 \;\Rightarrow\; \mu = \frac{1}{I\,G}\sum_{i}\sum_{g} y_{ig} = \frac{y_{\cdot\cdot}}{I\,G}$$

where Σ_i Σ_g y_ig is denoted by y_·· (the dots indicating that a summation is performed over each index respectively). This equation indicates that the overall average term, μ, within the ANOVA equation is determined by summing over the reference sample intensities on all arrays with index i and all spots/genes with index g, divided by the total number of arrays I and total number of spots/genes G.

Similarly for the remaining terms:

$$\frac{\partial\,\mathrm{RSS}}{\partial A_i} = 0 \;\Rightarrow\; A_i = \frac{y_{i\cdot}}{G} - \mu$$

where Σ_g y_ig is denoted by y_i· (the dot indicating a summation over the index g only). This equation indicates that the array term, A_i, is determined for any given array with index i by summing over the reference sample intensities for all spots/genes with index g on just array i, divided by the total number of spots/genes G, minus the previously determined overall average term. Similarly, the equations for G_g and AG_ig are determined:

$$\frac{\partial\,\mathrm{RSS}}{\partial G_g} = 0 \;\Rightarrow\; G_g = \frac{y_{\cdot g}}{I} - \mu$$

$$\frac{\partial\,\mathrm{RSS}}{\partial (AG)_{ig}} = 0 \;\Rightarrow\; (AG)_{ig} = y_{ig} - \mu - A_i - G_g$$
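The closed-form estimates above can be illustrated with a minimal sketch in plain Python. The function name `fit_reference_anova` and the toy intensities are assumptions made for illustration only; the input is taken to be the reference-channel intensities y_ig arranged as one list of spot intensities per array, with one spot per gene.

```python
def fit_reference_anova(y):
    """Fit y[i][g] = mu + A[i] + G[g] + AG[i][g] to reference-channel data.

    y is a list of I arrays, each a list of G spot intensities.
    """
    n_arrays = len(y)
    n_spots = len(y[0])
    # Overall mean over all arrays and all spots.
    mu = sum(sum(row) for row in y) / (n_arrays * n_spots)
    # Array effects: per-array mean minus the overall mean.
    A = [sum(row) / n_spots - mu for row in y]
    # Gene (spot) effects: per-spot mean across arrays minus the overall mean.
    G = [sum(y[i][g] for i in range(n_arrays)) / n_arrays - mu
         for g in range(n_spots)]
    # Spot variation terms: what remains after mu, A and G are removed.
    AG = [[y[i][g] - mu - A[i] - G[g] for g in range(n_spots)]
          for i in range(n_arrays)]
    return mu, A, G, AG


# Made-up reference-channel intensities: 3 arrays x 4 spots.
y_ref = [
    [100.0, 200.0, 300.0, 400.0],
    [110.0, 210.0, 310.0, 410.0],
    [90.0, 190.0, 290.0, 900.0],  # last spot is suspiciously bright
]
mu, A, G, AG = fit_reference_anova(y_ref)
```

In this toy set the largest spot variation term belongs to the last spot of the third array, which is the kind of value the outlier detection is meant to expose.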

These AG_ig terms capture all of the variance of the common reference sample that cannot be modeled correctly by the overall average, array, and gene terms within the ANOVA equation. This will include, but is not limited to, such sources of variation as spot size, spot shape, nonlinear dye/laser effects within the reference channel, regions of nonspecific hybridization upon any given array, different efficiencies of probe sticking to the array during manufacturing (due to such things as plate effects, concentration variances of probe sources, drying time of the array before the hybridization, etc.), dust, hairs, bubbles under the coverslip, scratches on the array, etc.

If all of the AG_ig terms are reasonably small, one might choose to immediately perform a spot correction in dependence on this term, e.g. subtract this correction term, or (AG_ig / Y_i00g) * Y_ijkg, from both the common reference channel and the variety channel for any given gene/spot g on array i, trusting it to be true spot noise. In a formula:

$$y'_{ijkg} = y_{ijkg} - AG_{ig} \qquad \left[\text{or}\quad y'_{ijkg} = y_{ijkg} - \frac{AG_{ig}}{y_{i00g}}\; y_{ijkg}\right]$$

where:

• y_ijkg denotes the original measured fluorescence intensity for spot g on array i for variety k

• y'_ijkg denotes the corrected fluorescence intensity for spot g on array i for variety k

• AG_ig is the spot correction term as estimated above
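As a minimal sketch of the two correction variants, the following applies the absolute and the relative (percentage) spot correction to one spot. The function name `spot_correct` and the numbers are illustrative assumptions; `ag` stands for AG_ig and `y_ref` for the reference-channel intensity of the same spot on the same array.

```python
def spot_correct(y, ag, y_ref=None):
    """Return the spot-corrected intensity for one channel of one spot.

    y     : measured intensity of the channel being corrected
    ag    : spot correction term AG_ig estimated from the reference channel
    y_ref : reference-channel intensity of the same spot; if given, the
            correction is applied as a fraction of the reference intensity
    """
    if y_ref is None:
        return y - ag               # absolute correction
    return y - (ag / y_ref) * y     # relative correction


# Made-up values for one spot: reference and variety channel intensities.
ag = 50.0
reference, variety = 500.0, 800.0

reference_abs = spot_correct(reference, ag)
variety_abs = spot_correct(variety, ag)
reference_rel = spot_correct(reference, ag, y_ref=reference)
variety_rel = spot_correct(variety, ag, y_ref=reference)
```

Both variants give the same corrected reference intensity. They differ in the variety channel, where the relative form scales the correction with the variety intensity.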

In practice, however, different sources of noise can contribute varying amounts of noise, sometimes quite large. An outlier detection criterion is therefore applied to the spot variation term (AG)_ig for selecting measurements Y_ig that are unreliable. For each spot g that has been identified as having an outlier Y_ig on at least one array i, the measurements Y_ig are then removed (thus also for the additional variety on the array).

It is preferred to enable a human operator to adjust the outlier detection criterion. This can, for example, be achieved by presenting to the operator on a display a scatter plot of AG_ig versus the original fluorescence intensity for any given gene/spot g on array i for all genes/spots on all arrays (or a subset thereof). Fig.5 shows an example of such a scatter plot. For each spot of all arrays (thus I*G spots), the intensity level Y_ig of the reference variety is indicated horizontally and the AG_ig value vertically. For each spot a small dot is placed in the scatter plot at location (Y_ig, AG_ig). From this typical example of a scatter plot, it can be observed that most points fall within a characteristic cluster for any given microarray dataset. Typically, the majority of AG_ig corrections are relatively small (both positive and negative) for low intensity genes/spots and the corrections can get larger (both positive and negative) for increasing intensity genes/spots. Some of this increase could reflect large overall differences in the quality of the hybridizations across the arrays that are not effectively captured in a linear A_i term, and/or nonlinearities within the dye fluorescence and/or lasers. It may or may not be trustworthy to subtract this correction term from both channels, and it is preferred to determine this on a dataset specific basis.

A region for concern within the scatter plot is where low intensity genes/spots have large correction values (typically large negative AG_ig terms). This is indicative of the spot g having very low intensity on one or more of the arrays and extremely large intensity on one or more of the other arrays. Corrections should not be made on this spot, as it is usually indicative of the array(s) with the high intensity for spot g being contaminated by dust, hairs, scratches, etc. and/or the amount of probe sticking to spot g being highly variable across arrays during manufacturing of the array. Even fluorescence ratios should not be trusted for these spots. Instead, the identified spots g should be regarded as true outlier spots that probably cannot be reliably corrected, and should be eliminated from the dataset. Removing these genes/spots from all arrays also reduces the number of large AG_ig correction terms within the high intensity region of the scatter plot, making it easier to determine the thresholds for trustworthy corrections.

It is preferred to iteratively estimate the terms (AG)_ig and remove outlying measurements until a stopping criterion has been reached. One way of doing this is to iteratively fit an upper threshold line and a lower threshold line to the scatter plot to separate obvious outliers from the dense cluster of presumably trustworthy AG_ig corrections within the scatter plot. Fig.5 shows both threshold lines. By iteratively determining thresholds, picking off the most extreme outliers and refitting the ANOVA equation in each iteration, any bias in the AG_ig terms from the previous iteration is minimized before removing a new set of outliers. It will be appreciated that the thresholds for outlier removal may also be automatically determined.
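The iterative procedure can be sketched as follows, under simplifying assumptions: the two threshold lines are given as fixed (intercept, slope) pairs in the (Y_ig, AG_ig) plane rather than being refitted per iteration, all flagged spots are removed from every array in each pass, and the loop stops when no spot is flagged. The function names and the threshold values are hypothetical.

```python
def spot_terms(y):
    """Return AG[i][g] for the fit y[i][g] = mu + A[i] + G[g] + AG[i][g]."""
    n_arrays, n_spots = len(y), len(y[0])
    mu = sum(map(sum, y)) / (n_arrays * n_spots)
    A = [sum(row) / n_spots - mu for row in y]
    G = [sum(y[i][g] for i in range(n_arrays)) / n_arrays - mu
         for g in range(n_spots)]
    return [[y[i][g] - mu - A[i] - G[g] for g in range(n_spots)]
            for i in range(n_arrays)]


def remove_outlier_spots(y, lower, upper, max_iter=10):
    """Iteratively drop spots whose AG term falls outside the threshold lines.

    lower, upper : (intercept, slope) of the threshold lines in the
                   (intensity, AG) plane; assumed fixed in this sketch.
    Returns the indices of the spots that are kept.
    """
    keep = list(range(len(y[0])))
    for _ in range(max_iter):
        sub = [[row[g] for g in keep] for row in y]
        ag = spot_terms(sub)                      # refit on remaining spots
        flagged = set()
        for i, row in enumerate(sub):
            for c, intensity in enumerate(row):
                lo = lower[0] + lower[1] * intensity
                hi = upper[0] + upper[1] * intensity
                if not lo <= ag[i][c] <= hi:
                    flagged.add(keep[c])          # remove spot on all arrays
        remaining = [g for g in keep if g not in flagged]
        if not flagged or not remaining:
            break
        keep = remaining
    return keep


# Made-up reference-channel intensities: 3 arrays x 4 spots.
y_ref = [
    [100.0, 200.0, 300.0, 400.0],
    [110.0, 210.0, 310.0, 410.0],
    [90.0, 190.0, 290.0, 900.0],
]
kept = remove_outlier_spots(y_ref, lower=(-100.0, -0.1), upper=(100.0, 0.1))
```

With these toy values only the last spot is flagged in the first pass. After refitting on the remaining three spots, all AG terms are close to zero and the loop stops.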

In a preferred embodiment, (AG)_ig is compared with the measurements Y_ig from the data set that relate to the common reference variety, as shown in Fig.5. Other comparisons may also be used. Each such comparison may perform better for certain types of errors. Figs. 6 to 8 show three comparisons based on the same set of arrays. Fig.6 shows a scatter plot of (AG)_ig versus Y_ig as described above. Two threshold lines 610 and 620 are indicated. The threshold line 610 is used for singling out the poorly hybridized dark spot outliers in the lower left corner, separated from the main cluster by line 610. The threshold line 620 is used for singling out the bright spot outliers in the upper left corner, separated from the main cluster by line 620. Instead of using the original Y_ig, a corrected version of Y_ig may be used, for example a normalized Y_ig. Preferably, a spot corrected Y_ig is used, where the spot correction according to the invention is applied. This scatter plot is shown in Fig.7. As can also be observed in the figure, this scatter plot is very suitable for singling out bright spot outliers (those in the upper left corner separated from the main cluster by line 710). Using the spot corrected Y_ig, bright outliers that cannot easily be observed in the original plot of (AG)_ig versus Y_ig are further separated from the majority of spots in the cluster. The original Y_ig may be advantageously used to detect poorly hybridized dark spot outliers, whereas the spot corrected Y_ig terms may be advantageously used to detect bright spot outliers. Fig.8 shows an alternative scatter plot wherein (AG)_ig is compared with G_g. This plot is also suitable, in particular for detecting bright spot outliers (those in the upper left corner separated from the main cluster by line 810).

The outlier detection based on the analysis of just the common reference channel has been applied to the CAMDA02 "Project Normal" dataset. Fig.9 shows a subset of Pritchard et al.'s variable genes in the testis. The analysis according to the invention revealed some genes (LPAAT-4, ApoCI, and an EST, originally cited as the highest, second highest, and 4th highest statistically significant genes within the testis tissue set) that appeared to be statistically significant due solely to a gross imbalance of reference sample intensities across different arrays. In the following two tables, intensities with unusually large values are indicated in bold. In the top row, F635 indicates that a "fluorescence" of the dye with wavelength = 635 nm is used; M#T#_# indicates Mouse number #, T = testis (variety sample), # = 3 or 5 referring to which dye the sample (or reference) was labeled with, and _# = tissue sample replicate number (each sample was run in duplicate, i.e. on two arrays).

Table 1. Reference Sample Intensities for Arrays with Reference Sample Labeled Green.

Errors within the database that were inconsistent with image brightness, missed saturated spots, etc. were also easily identified with the ANOVA outlier detection algorithm according to the invention. Figs. 10A and 10B illustrate outliers on array M6K3_1. Fig.10A shows pin block 15. Fig.10B shows pin block 8. A pin block is a distinct region on the array that is spotted with the same pin. The circles that are not specifically indicated represent spots which had anomalously low values within the Excel spreadsheet of fluorescence intensities but appear reasonably bright within the image (all other arrays had larger intensities reported within the spreadsheet). The circle indicated in Fig.10A by a white arrow 1010 marks an anomalously large value within the Excel spreadsheet but low brightness within the image (all other arrays had low intensities reported within the spreadsheet). The circle indicated in Fig.10B by a white arrow 1020 marks a saturated spot that was not detected within the original study and which is clearly due to an array artifact.

Furthermore, a correlation was detected between the average mouse variance as reported by Pritchard et al. (0.038, 0.018, 0.054) for (Kidney, Liver, Testis) respectively and the overall mean of all spots (both dye channels) as determined within an appropriate Kerr et al. style ANOVA analysis. This is illustrated in Fig.11, showing the overall mean of the 4 distinct sample arrays for mice 2, 3, 5, and 6. The overall mean is shown for the raw data, Lowess normalized data and data corrected using the spot correction according to the invention. Original raw data and Lowess normalized data have the highest overall averages for Kidney and Testis samples. Spot corrected data has more uniform overall averages across arrays and tissues. This is strong proof that incorrect conclusions are being drawn from microarray studies with the use of current normalization strategies, such as Lowess normalization, which normalize each array separately. It is preferred not to trust Red/Green ratios until and unless across array normalization (such as is proposed here with the ANOVA analysis of just the common reference channel, leading to improved outlier detection and spot correction normalization) is implemented.

Performing the same spot correction in dependence on the AG_ig term for both the common reference sample channel and the variety sample channel for each corresponding spot/gene g and array i typically removes the majority of the "peacock" spread of low intensity genes/spots, coalescing those spots/genes into a distinct tight cluster at the low intensity region of the M vs. A plot with nominal spread about M=0. This effectively gets rid of the majority of the baseline noise in the intensity measures, allowing for more trustworthy determination of statistical significance for increasingly lower intensity genes/spots. In situations where the correlation between red and green intensities for any given amount of sample is too poor (due to such things as preferential dye incorporation biases, imbalance between fluorescence efficiencies of the two dyes, or drastic differences in intensity ranges due to laser settings or PMT settings), spot correction normalization can introduce biases (by over or under correcting in the variety sample channel). In situations where the common reference sample is extremely different from the variety sample (i.e. some genes having high expression in one sample and extremely low or no expression in the other), the spot correction might similarly over or under correct intensities within the variety sample channel. Sometimes this over/under correction is not a problem, namely if the general trends between variety sample classes are preserved (true positives still being found in spite of over/under correction) and/or if subsequent statistical significance tests are able to detect the false positives when general trends are not faithfully preserved during the over/under correction. Given the typical number of spots/genes on a microarray (hundreds to tens of thousands), it can be anticipated that a few false positives could still exist, depending on the overall quality and number of arrays within the entire dataset.

Statistical tests to detect channel correlation thresholds and/or subtracting the spot correction as a percentage of the original reference channel spot intensity, i.e. ((AG)_ig / Y_i00g) * Y_ijkg instead of the absolute measure (AG)_ig, may be applied for optimization of the spot correction normalization methodology.
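For readers unfamiliar with the M vs. A plot mentioned above, the following sketch uses the conventional definitions M = log2(R/G) and A = ½·log2(R·G) for the red and green intensities R and G of a spot. These definitions are the commonly used ones and are not spelled out in this description. The numbers are made up and only illustrate how an additive term shared by both channels shifts M for a low intensity spot.

```python
from math import log2


def m_and_a(red, green):
    """Return (M, A) for one spot: log-ratio and average log-intensity."""
    m = log2(red / green)
    a = 0.5 * log2(red * green)
    return m, a


# Hypothetical spot whose two channels share an additive spot term of 64.
spot_term = 64.0
red_raw, green_raw = 192.0, 96.0

m_raw, a_raw = m_and_a(red_raw, green_raw)
m_cor, a_cor = m_and_a(red_raw - spot_term, green_raw - spot_term)
```

Here the raw log-ratio is 1, while the log-ratio after subtracting the shared term from both channels is 2, so an uncorrected shared term distorts M most strongly where intensities are low.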

Application of the spot correction algorithm to a loop design

Fig.12 shows a known loop design wherein each variety sample gets split. One half is labeled with the green dye, the other half with the red dye. Each array has a red variety and a green variety hybridized to it, albeit from different varieties. Therefore, each variety of interest is measured twice without the redundancy of a common reference sample being applied to all arrays. Another advantage of the loop design is that it already contains dye correction information. In the example of Fig.12, array 1 has variety 0 (indicated with V₀) labeled with red dye and variety 1 labeled with green; array 2 has variety 1 labeled in red and variety 2 labeled in green, etc., until array 6, where variety 5 is labeled with red and variety 0 is labeled with green, closing the loop. Traditionally, comparing varieties that are further separated on the loop becomes difficult because of the accumulated errors throughout the intermediate arrays.

According to the invention, the spot correction algorithm of equation (1) completely describes all variance across the two measures of any given variety. Thus, the algorithm is applied first to the red and green measures of variety 0, then to the red and green measures of variety 1, etc. The algorithm can here also be used for spot correction and/or outlier detection and removal. In this way, outliers can be removed, noise can be reduced, more information can be obtained from a traditional loop analysis, direct comparison of variances across the loop will become more reliable, and loops can be made larger.

The application of equation (1) is valid because:

- There is no V_k term, since it is always the same variety

- There is no D_j term, since it is redundant with A_i

Applying the equation to variety 0 (v₀) will give (AG)_ig (v₀, red) for array 1 and (AG)_ig (v₀, green) for array 6. In this way, for each array an (AG)_ig green and an (AG)_ig red measure is obtained. Correlation between (AG)_ig green and (AG)_ig red is indicative of correlated spot noise that can be corrected for. Poor correlation between (AG)_ig green and (AG)_ig red is indicative of dye variance or biological variance. Dye corrections can be obtained from comparing all green measures to all red measures. These dye corrections can be applied in addition to the spot noise corrections, preferentially leaving the biological signal.
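The per-variety application described above may be sketched as follows. This is a minimal sketch under stated assumptions: equation (1) is not reproduced in this passage, so it is assumed here to reduce to the two-way model y_ig = μ + A_i + G_g + (AG)_ig, for which the interaction estimate is the usual double-centered residual. The names `interaction_residuals`, `pearson` and `loop_spot_terms`, the 0-based array indexing, and the toy data are all hypothetical.

```python
from statistics import mean


def interaction_residuals(rows):
    """Assumed (AG)_ig estimate: y_ig - row mean - column mean + grand mean."""
    row_means = [mean(r) for r in rows]
    col_means = [mean(c) for c in zip(*rows)]
    grand = mean(row_means)
    return [[y - rm - cm + grand for y, cm in zip(r, col_means)]
            for r, rm in zip(rows, row_means)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def loop_spot_terms(red, green):
    """red[i][g], green[i][g]: array i (0-based) carries variety i in red
    and variety (i + 1) % n in green, closing the loop on the last array."""
    n = len(red)
    ag_red, ag_green = [None] * n, [None] * n
    for v in range(n):
        red_array = v               # variety v is red on array v
        green_array = (v - 1) % n   # and green on the previous array of the loop
        res = interaction_residuals([red[red_array], green[green_array]])
        ag_red[red_array], ag_green[green_array] = res
    return ag_red, ag_green


# Toy data: three varieties, four spots, a smudge of +100 on spot 2 of array 0.
signal = [[100, 200, 300, 400], [150, 250, 350, 450], [120, 220, 320, 420]]
noise = [[0, 0, 100, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
red = [[s + e for s, e in zip(signal[i], noise[i])] for i in range(3)]
green = [[s + e for s, e in zip(signal[(i + 1) % 3], noise[i])] for i in range(3)]

ag_red, ag_green = loop_spot_terms(red, green)
correlations = [pearson(r, g) for r, g in zip(ag_red, ag_green)]
print(correlations)
```

In the toy data the smudge affects both channels of array 0, so the red and green estimates of that array are strongly correlated, whereas the other arrays show no such correlation.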

Per Array Reference design ANOVA (PAR-ANOVA)

Outliers can also be caused by significant nonspecific hybridization of one dye vs. the other (i.e. not due to specific binding of the target to the probe, but rather to one dye (or both) sticking to the spot and/or the background due to non-biological (nonspecific) binding properties). The key to dealing with such outliers lies in correctly analyzing replicates. Most researchers, when analyzing replicate samples or replicate spots, trust average measures of the replicates without correctly detecting and eliminating outliers (assuming that a reasonably symmetric and/or normal distribution of noise components averages out). Within traditional ANOVA analysis of microarrays, biological sample replicates are pooled into the same variety term. Fig.13 illustrates the pooling. For example, assume that the sample on array A₁ represents a first male with a specific type of cancer and not treated with drugs; the sample on array A₂ represents a second male with the same type of cancer and also not treated with drugs; and the sample on array A₃ represents a third male with the same type of cancer and also not treated with drugs. Traditionally, these three samples are pooled into a class (male, cancer, no drugs), indicated in Fig.13A as class V₂. This class is regarded as one variety in the traditional analysis. Fig.13A shows a further variety class V₁, also being a pool of samples, of the three arrays A₄ to A₆. Class V₁ may, for example, represent samples of a healthy human. Having one extreme outlier averaged into this variety term can produce false positives and/or false negatives when determining statistically significant genes. Even if the outlier detection described above has been applied, there is no guarantee that all outliers have been removed correctly from the variety channels. Fig.13A also shows that, as in the invention, a common reference sample V₀ is applied to all arrays.

According to the invention, a method of analyzing a microarray data set includes, for each array of a set of microarrays, determining those spots for which the further variety V_k (k>0) shows a statistically significant differential expression with respect to the common reference variety V₀. The microarray set includes a plurality of microarrays, where each microarray A_i includes a plurality of spots G_g for interaction with a biological target within a biological variety V_k. Each spot on a microarray is associated with a corresponding spot on any of the other microarrays of the set. For each microarray, the spots are subjected to a biological interaction, such as a hybridization, with biological material from a common reference variety V₀ and with biological material from at least one further variety V_k (k>0), where the further varieties relate to the variety samples of interest. The microarray data set includes measurements Y_ikg, such as fluorescence intensities, on each spot of each microarray for each variety that interacted with the respective array. According to the invention, an analysis is performed for the variety sample of each array separately and not for an average of the class. This results in detecting, per array, spots that show a statistically significant differential expression.

In a preferred embodiment, the microarray set includes at least two arrays each with at least one respective further variety V_k (k>0) belonging to a same variety class. The analysis method includes accepting, from the spots that have been determined as showing a statistically significant differential expression with respect to the common reference variety V₀, only those spots as truly differential spots that are in consensus across a large percentage of the arrays of the set that relate to the same variety class. A spot is then only trusted if it has a same statistically significant differential expression for the corresponding spots on substantially all of the arrays within a sample class. 'Same' here means 'not opposite', i.e. all corresponding spots are up-regulated or all are down-regulated. So, if a spot G_g shows a statistically significant differential expression on array A_j, it is only accepted if the same expression is found for G_g on substantially all other arrays A_i with a further variety of the same class, i.e. for i ∈ {i | A_i has a variety sample within class k}. This can easily be checked by counting the number of times a spot is statistically significantly down-regulated, not statistically significant, or statistically significantly up-regulated. By assigning the values -1, 0, and 1, respectively, to these three possibilities, the sum over all corresponding spots directly shows whether a spot is down-regulated or up-regulated over all arrays (e.g. if the sum equals the number of arrays, then the spot is up-regulated on all arrays). Ideally, all arrays with a sample of the same class should give the same differential expression for each corresponding spot for the spot to be accepted as a differential gene. It will be understood that, depending on the overall quality of the hybridized arrays, a certain number or percentage (e.g. 10%) of disagreement in expression can be accepted as still giving a sufficiently high confidence that a truly differential gene has been identified. In a preferred embodiment, an assessment of the quality of the array set is made. For a high quality batch, a higher percentage of consensus can be used than for a lower quality batch.

Preferably, for each array a term VG_kg is determined that represents a second order interaction between variety effects V_k and spot effects G_g. A spot is, preferably, determined as having a statistically significant differential expression if VG_0g − VG_kg is determinably always positive or always negative within an error model estimate (e.g. bootstrapping from residuals; bootstrapping is a technique that is known and can be easily applied by persons skilled in the art). So, the array-specific variety V_k (k>0) is compared with the reference variety V₀ for each gene. This is illustrated in Fig.13B. In the example shown here, each array contains the common reference sample V₀ and one further sample that belongs to a class present on multiple arrays, as described already for Fig.13A. Each of the further samples is treated as a distinct variety, indicated as V₁ to V₆. Each of those varieties is compared to V₀. If more than one array-specific variety is used per array, this comparison is done for each of those additional varieties.
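The consensus check by summing the values -1, 0 and 1 can be sketched as follows. The function name `consensus_calls`, the parameter `max_disagreement` and the example calls are hypothetical; the per-array significance calls are assumed to have been determined beforehand.

```python
def consensus_calls(calls_per_array, max_disagreement=0.0):
    """calls_per_array[i][g] is -1 (down), 0 (not significant) or +1 (up)
    for spot g on array i; all arrays belong to ONE variety class."""
    n_arrays = len(calls_per_array)
    allowed = int(max_disagreement * n_arrays)  # arrays allowed to be non-significant
    consensus = []
    for spot_calls in zip(*calls_per_array):
        total = sum(spot_calls)
        # 'Same' means 'not opposite': a single opposite call rejects the spot.
        if total >= n_arrays - allowed and min(spot_calls) >= 0:
            consensus.append(1)
        elif -total >= n_arrays - allowed and max(spot_calls) <= 0:
            consensus.append(-1)
        else:
            consensus.append(0)
    return consensus


calls = [
    [1, 1, -1, 1, 0],   # array 1
    [1, 0, -1, -1, 0],  # array 2
    [1, 1, -1, 1, 0],   # array 3
]
result = consensus_calls(calls)
print(result)
```

With the default of no tolerated disagreement, only the first spot (up on all three arrays) and the third spot (down on all three arrays) are accepted.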

It should be noted that PAR-ANOVA can also be applied, for determination of statistically significant genes in each sample, to datasets which do not have replicates within a sample class; in that case, however, there is no further check for consensus to verify the reliability of any genes that are found to be statistically significant. This allows the application of Per Array Reference design ANOVA to such datasets as time-series datasets (where no time sample is hybridized in replicate).

The following description indicates how VG_kg can be determined.

Definitions, notation, etc.

• "I" number of arrays, each denoted by "A_i" where (1 ≤ i ≤ I)

• "K" number of varieties, each denoted by "V_k" where (1 ≤ k ≤ K). Note that Kerr et al. pool distinct samples as replicates within a class (such as diseased tissue versus healthy tissue, each class having more than one biological sample, hence more than one array contributing to the variety). Within PAR-ANOVA, the variety sample channel of each array is considered to be a distinct class - only the consensus of differential expression is then considered across a pooled class.

• "G" number of genes (or spots if not duplicated), each denoted by "G_g" where (1 ≤ g ≤ G)

• "VG_kg" is the second order interaction term between the variety effects "V_k" and the gene effects "G_g". This is the effect of interest. It indicates which genes are differentially expressed across distinct varieties. Other second order effects are currently omitted from the model but can be included if such effects are deemed significant within a dataset. AG_ig is omitted because it is anticipated to be small and/or zero after applying spot correction normalization. AV_ik is typically small when hybridizations are performed reasonably accurately and no gross mistakes are made with laser and PMT gain setting choices.

• μ is the overall average of all spots, both channels, all arrays, etc. that are included in any given model.

• y_ikg denotes the measured fluorescence intensity for spot g on array i, variety k. (Note that for the reference channel, the variety is k=0 by convention. Also note that the dye index j was omitted for the 2-dye example, where k=0 is redundant with j=0 and k>0 is redundant with j=1.)

In a preferred embodiment, for each array a term VG_kg is determined by fitting the following ANOVA equation to both channels on all arrays:

y_ikg = μ + A_i + V_k + G_g + VG_kg + ε_ikg

Continuing with the minimization of the Residual Sum of Squares yields:

for k>0.

The equations for V_k can easily be substituted back into the equation for A_i.

Solving for G_g is straightforward:

where the open circle subscript for the k index of y now indicates that for each array i that is summed over, both channels of the array are summed over (i.e. k=0 and the corresponding k>0).

Hence the summation is not over I+K channels, but rather over 2I channels.

Solving for VG_kg, the effects of interest:

∂RSS/∂VG_kg = 0 ⇒ VG_0g = ȳ_·0g − μ − V_0 − G_g for k=0, and

VG_kg = y_i(k)kg − μ − A_i(k) − V_k − G_g for k>0,

where ȳ_·0g is the average of y_i0g over all arrays i, and i(k) is the array to which variety k is hybridized.

Values for A_i, V_k, and G_g can easily be substituted into these equations.
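The contrast VG_0g − VG_kg can be illustrated without solving for each effect separately. The following is a minimal sketch, assuming the usual sum-to-zero constraints (the A_i sum to zero over arrays, and G_g and VG_kg sum to zero over spots), which are not stated in this passage. Under these assumptions, averaging the expressions for VG_kg over g gives μ + V_0 as the grand mean of the reference channel and μ + A_i + V_k as the mean of the variety channel of array i, so the contrast reduces to (ȳ_·0g − ȳ_·0·) − (y_ikg − ȳ_ik·). The function name `par_anova_contrasts` and the toy data are hypothetical.

```python
from statistics import mean


def par_anova_contrasts(ref, var):
    """ref[i][g]: reference channel (k = 0) of array i.
    var[i][g]: array-specific variety channel (k > 0) of array i.
    Returns d[i][g] = VG_0g - VG_kg under the assumed sum-to-zero constraints."""
    ref_gene_mean = [mean(col) for col in zip(*ref)]  # average of y_i0g over arrays
    ref_grand = mean(ref_gene_mean)                   # estimates mu + V_0
    contrasts = []
    for row in var:
        channel_mean = mean(row)                      # estimates mu + A_i + V_k
        contrasts.append([(rg - ref_grand) - (y - channel_mean)
                          for rg, y in zip(ref_gene_mean, row)])
    return contrasts


ref = [[10, 20, 30], [12, 22, 32]]
var = [[10, 20, 60], [11, 21, 31]]
contrasts = par_anova_contrasts(ref, var)
print(contrasts)
```

In the toy data the third spot of the first array gives a strongly negative contrast, reflecting a variety signal well above the reference pattern, while the second array shows no deviation. Whether such a contrast is statistically significant would still have to be judged within an error model estimate, which this sketch does not include.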

Dye Effect Free ANOVA (DEF-ANOVA)

In this method, exactly the same equations as in PAR-ANOVA are used, with the identical aims of "not creating false positives by averaging outliers into effects", "not missing true positives by averaging outliers into effects" and preferably "only trusting a consensus of individual effects". Whereas in PAR-ANOVA V₀ is obtained from the common reference signal channel of all arrays, labeled with a same color dye (for example, green) that differs from the dye of the variety samples (for example, red), here V₀ is obtained from arrays within a distinct variety class channel labeled with the same color dye as the other variety samples within the study (as seen below). Experiments show that analysis within a "Dye Effect Free" framework (DEF-ANOVA) according to the invention yields greater consensus (fewer false positives due to the averaged influence of outliers), improves threshold sensitivities (1.2-fold increased/decreased instead of the traditional 2- or 3-fold), and allows one to trust statistical significance for lower intensity genes/spots.

The DEF-ANOVA approach is illustrated in Fig.14, where V₀ is formed by the samples of a same class on arrays A₁ to A₃. The other samples (in the example all belonging to a same second class and applied to arrays A₄ to A₆, identified using the same dye as used for V₀) are analyzed with respect to the pooled variety class V₀, preferably using the per-array equations of PAR-ANOVA. Because fewer arrays are pooled into V₀ (as opposed to all arrays with the common reference channel in the PAR-ANOVA application), the DEF-ANOVA application is much more sensitive to baseline noise, outliers, etc. The strong advantage of utilizing the DEF-ANOVA application is that it is no longer required to correct for any dye biases (i.e. it is not required to question whether statistically significant genes in the PAR-ANOVA are due to true expression differences or to differences in dye incorporation efficiencies).

Thus, according to the invention, a microarray set includes at least a disjunct first and second subset of microarrays. At least the first subset includes a plurality of microarrays. The second subset includes at least one microarray. Each microarray A_i includes a plurality of spots G_g. Each spot on a microarray is associated with a corresponding spot on any of the other microarrays of the set. For each microarray, the spots are subjected to a biological interaction, such as hybridization, with biological material from at least one sample. Each of the spots of the microarrays of the first subset is subjected to a biological interaction, such as hybridization, with a biological target within a biological sample belonging to a same first class. Each of the spots of the microarrays of the second subset is subjected to a biological interaction with a biological target within a biological sample belonging to a same second class, distinct from the first class. A microarray data set includes measurements Y_ikg, such as fluorescence intensities, on each spot of each microarray for each sample that interacted with the respective array. The method of analyzing the microarray data set includes:

- pooling samples of the first subset to a variety class V₀;

- for each array of the second subset, determining those spots that show a statistically significant differential expression with respect to the first subset; and

- accepting from the determined spots, only those spots as truly differential spots that are in consensus across a large percentage of all arrays of the second subset.
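The three steps listed above may be sketched as follows. This is an illustrative sketch only: the significance test is replaced by a simple per-spot z-score against the spread within the pooled class, which is an assumption made for brevity and is not the error model estimate of the description. Array effects are removed by centering each array on its own mean, and full consensus is required. The names `center`, `def_anova_sketch` and the parameter `z` are hypothetical.

```python
from statistics import mean, stdev


def center(row):
    """Remove the array effect by subtracting the array mean."""
    m = mean(row)
    return [y - m for y in row]


def def_anova_sketch(first_subset, second_subset, z=3.0):
    # Step 1: pool the arrays of the first subset into the variety class V_0.
    pooled = [center(r) for r in first_subset]
    v0_mean = [mean(c) for c in zip(*pooled)]
    v0_sd = [stdev(c) for c in zip(*pooled)]
    # Step 2: per array of the second subset, call each spot -1, 0 or +1.
    # The z-score threshold is a stand-in for a proper error model.
    calls = []
    for row in second_subset:
        diffs = [y - m for y, m in zip(center(row), v0_mean)]
        calls.append([(d > z * s) - (d < -z * s) for d, s in zip(diffs, v0_sd)])
    # Step 3: accept only spots on which all arrays of the second subset agree.
    n = len(calls)
    return [sum(c) // n if abs(sum(c)) == n else 0 for c in zip(*calls)]


first_subset = [[100, 200, 300, 400], [110, 190, 310, 390], [90, 210, 290, 410]]
second_subset = [[100, 100, 300, 500], [105, 95, 305, 495]]
consensus = def_anova_sketch(first_subset, second_subset)
print(consensus)
```

Because all samples share one dye, no dye term appears anywhere in the sketch.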

DEF-ANOVA can also be applied in the situation where there exist replicates within the first subset / variety class but no replicates forming a second class - but then subsequent verification of the statistically significant genes cannot be performed by looking for consensus across replicates within a class.

In a preferred embodiment, in addition to the samples mentioned above, a common sample V_common is also applied to each array of each subset. This sample is labeled in a different way (it uses a different dye) than the other samples. As described for invention 1, first an across array normalization of the microarray data set is performed based on the interaction of the common sample V_common on the respective microarrays. The common sample V_common plays the role of V₀ described in invention 1. After this spot correction normalization, with optional outlier removal as described for invention 1, the DEF-ANOVA analysis is performed within the single dye as described above.

It will be understood that DEF-ANOVA can be applied for more than one dye. So, in a situation where one dye is used for V_common, it is possible to apply to each array samples with a second dye and with a third dye. DEF-ANOVA can then be applied to the second dye samples and, separately, also to the third dye samples. DEF-ANOVA can also be applied to any 1-dye study with at least one variety class consisting of multiple arrays (to form V₀) and any other number of additional arrays, whether within variety classes or distinct varieties. This allows DEF-ANOVA to be applied to many different study designs and many different array technologies (e.g. Affymetrix arrays).

Feature Discovery

Equation 1 within "Normalization with Outlier Detection and Spot Correction" can also be used to identify regions/spots/pixels that differ within intensity measures or background measures across the arrays within a dataset, irrespective of the number of channels (i.e. dyes) or the hybridization design of the study. Application to background measures allows feature detection within background measures to determine improved estimates for optimal background subtraction. For the application to background measures, the term y_ig in equation 1 does not denote the measured fluorescence intensity for spot g on array i but instead denotes the measured fluorescence intensity of an area around a corresponding spot g on array i (such an area being referred to as a background area). In principle such an area may be chosen in any suitable way. For example, the area may be substantially circular, centered on the spot, leaving out the spot itself (resulting in a donut-like area). Areas of other shapes may also be used. Preferably, the respective background areas of the spots do not overlap. Application to intensity measures (including but not limited to original, background subtracted or normalized intensity measures) allows the identification of regions/spots/pixels upon the arrays which differ with respect to identical regions on other arrays. A region refers to a set of spot intensities, background intensities or pixels that are neighboring and display similar feature patterns. No predefined grouping is made; rather, (AG)_ig estimates are analyzed to see which spots/pixels are neighbors to other spots/pixels displaying a similar trend. These regions/spots/pixels can subsequently be analyzed in conjunction with either background measures, background (AG)_ig estimates, (AG)_ig estimates for same variety class samples, (AG)_ig estimates for different variety class samples, or any standard microarray measure for improved feature identification, classification, and processing.
Ultimately, it is anticipated that a combination of noise reduction techniques will provide the optimal normalized dataset, and application of equation 1 provides unique insight into which noise reduction techniques perform best for each unique type of noise/bias. Thus, according to the invention the (AG)_ig estimate(s) that can be obtained using equation 1 (such as background (AG)_ig estimates, (AG)_ig estimates for same variety class samples, (AG)_ig estimates for different variety class samples) are used, separately or in combination, for selecting one or more noise reduction techniques from a predetermined set of such techniques. This is preferably done by first applying one of the noise reduction techniques to the measurements (of the spots and/or background areas), applying the estimation according to equation 1 of the invention to the noise corrected measurements, and comparing the outcome with that obtained using a second noise reduction technique, a third technique, etc. The technique giving good results (preferably, the best results of all available techniques) is used. The best results are those that optimize the signal while minimizing noise patterns most. Relatively low AG values are an indication of this. If so desired, several techniques giving good results may be combined (i.e. different techniques for different subsets of spots), where the equation is used to determine whether the combination gives an improved outcome compared to using fewer techniques. Additionally, application of equation 1 to datasets [in addition to those amenable to PAR-ANOVA and DEF-ANOVA] provides unique insight into differentially expressed genes of interest.
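The selection among noise reduction techniques can be sketched as follows. This is a minimal sketch under stated assumptions: equation 1 is again assumed to reduce to the two-way model y_ig = μ + A_i + G_g + (AG)_ig, the mean absolute (AG)_ig value is used as the score (a choice made here for illustration, reflecting only that relatively low AG values are an indication of good results), and the three candidate techniques and the toy data are hypothetical.

```python
from statistics import mean


def interaction_residuals(rows):
    """Assumed (AG)_ig estimate: y_ig - row mean - column mean + grand mean."""
    row_means = [mean(r) for r in rows]
    col_means = [mean(c) for c in zip(*rows)]
    grand = mean(row_means)
    return [[y - rm - cm + grand for y, cm in zip(r, col_means)]
            for r, rm in zip(rows, row_means)]


def mean_abs_ag(rows):
    """Score of a dataset: average magnitude of the (AG)_ig estimates."""
    return mean([abs(v) for r in interaction_residuals(rows) for v in r])


def pick_technique(raw, techniques):
    """Apply every candidate technique and keep the one with the lowest score."""
    scores = {name: mean_abs_ag(fn(raw)) for name, fn in techniques.items()}
    return min(scores, key=scores.get), scores


# Toy reference-channel data: a local smear raises spot 2 on array 0 by 50.
raw = [[100, 200, 350, 400], [100, 200, 300, 400], [100, 200, 300, 400]]
background = [[10, 10, 60, 10], [10, 10, 10, 10], [10, 10, 10, 10]]

techniques = {
    "none": lambda y: y,
    "global offset": lambda y: [[v - 10 for v in r] for r in y],
    "local background": lambda y: [[v - b for v, b in zip(r, br)]
                                   for r, br in zip(y, background)],
}
best, scores = pick_technique(raw, techniques)
print(best, scores)
```

A global offset leaves the (AG)_ig estimates unchanged, since it is absorbed by μ, whereas the local background subtraction removes the smear and is therefore selected in this toy case.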

Scanner control

Spot correction, as described above, finds the perfect mapping to bring all reference channel signals in correspondence with one another on each of the arrays. However the arrays with the smallest range of intensities will reduce the information from the arrays with the largest range of intensities. Any information loss is best avoided if possible. Spot correction will intuitively work best when both channels of all arrays yield a similar range of intensity values (being careful to make this range as high as is reasonable without saturating too many if any spots). Some microarray facilities take great care to optimize laser settings and PMT gain on a per array basis to satisfy certain known or expected criteria (removal of low intensity banana curves, centering the M vs. A plot about M=0, matching known behavior of landmark spots, etc.). Many microarray facilities however are more likely to trust recommended scanner settings and PMT gain that were supplied with the equipment and/or might unknowingly choose criteria for adjustments that are not optimal. Better tools are needed to help researchers choose laser settings and PMT gain that optimize the information gain within their studies.

Assuming that a research facility has sources of error in one or more of many possible stages (mRNA quantification, dye labeling, inconsistencies in hybridization conditions, etc.), it might be necessary to revert back to the premise that "most genes don't change in expression" in order to normalize the arrays. In such a case, selecting laser settings and PMT gain on such global criteria as "eliminating the banana curve" or "getting most data points centered about M=0" might not be the most optimal. Rather, when being forced to assume that "most genes don't change in expression", it might be best to get similar expression ranges/distributions in all channels of all arrays (hence similar to one dye microarray studies). Sometimes, when things go wrong enough within one or more steps of the analysis, a wide distribution of intensity ranges can be obtained that are not biologically justified and which can have nonlinearities that are not easily corrected for. This is illustrated in Figs.15 to 17. In each of those figures the intensity profiles are shown for a red and green dye. To distinguish the profiles in the figures encapsulating curves have been added. Curves indicated with 10 are used for the red dye intensities; curves indicated with 20 are used for the green dye intensities.

Fig. 15 shows an example, wherein the red dye channel has a distribution with many low intensity values in a tighter spread around the peak than the green channel. A simple shifting of the red distribution will not yield similar distributions. An intensity dependent scaling will also be necessary. Alternatively, the laser intensity and/or PMT gain for the red channel could have been set higher obviating the need for such a large scaling. Fig.16 shows an example wherein the red dye channel and green dye channel have similar distributions of intensity values slightly shifted from one another with the red channel having slightly higher intensities but similar spread. A simple shifting of the red distribution will probably yield similar distributions. Here, the laser intensity and PMT gain for the red and green channels are in good balance with one another. Fig. 17 shows an example wherein the red dye channel has a broad distribution with many high intensity values spread off of the scale of Figs. 15, 16 and 17. A large scaling will be necessary to make these distributions more similar - but this large scaling will be nonlinear (i.e. dependent on intensity). Alternatively, the laser intensity and/or PMT gain for the red channel could have been set lower obviating the need for such a large scaling.

According to the invention, a target intensity profile is determined. For each of the arrays, the settings of the scanner are then adjusted until the profile of the array optimally matches the target profile. Normally, the scanner output is a 'photo' of the array, as shown for example in Fig. 10. Preferably, the intensity distribution is determined per pixel of the scanner output. The profile may be formed by making a histogram of the intensities. The target intensity profile may be determined by, for one array of the set, creating an intensity profile for each channel. Next, the settings of the scanner are adjusted to bring the channel-specific profiles together. This may be done iteratively until an optimal match of the profiles is achieved. The combined profile that has been determined in this way can then be used for all arrays of the set; it acts as the target profile. In a preferred embodiment, instead of determining the target profile from the 'red' and 'green' profile of one array, all arrays of the set may be pre-scanned. Next, an average intensity profile is calculated, for example with the Quantile Normalization technique of Bolstad et al., and used as the target profile. For determining the target intensity distribution, preferably, obvious landmark intensities that might bias the distribution are removed from the distribution. In a preferred embodiment, the scanner is controlled automatically to obtain an optimal match of the newly scanned intensity distribution with the target intensity distribution. Any suitable mechanism may be used for this, for example determining in which direction a setting should be changed (e.g. an increase of laser intensity or PMT amplification) and choosing a next setting. A large step may also be taken at first, using a binary search approach of halving the step each time in the right direction. The next setting may also be chosen in dependence on the change in intensity profile achieved by the previous setting.
In general it should be avoided to perform an excessive number of scans, since each scan tends to degrade the quality of the array. Any suitable control algorithm may be used. Such algorithms are generally known and may be applied. By using automatic control to achieve an optimal fit of the intensity profile with the target profile it is avoided that due to imperfect manual intervention one or more arrays are not optimally scanned.
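By way of example only, the automatic matching to a target profile with a limited number of scans can be sketched as follows. The scanner is replaced by a hypothetical simulation (`simulated_scan`) in which intensity grows linearly with a single gain setting and saturates; the target profile is taken as the rank-wise mean of the sorted pre-scan intensities, as in quantile normalization; and matching is simplified to comparing medians, which is an assumption made for brevity rather than a full profile comparison. The names `average_profile`, `tune_gain` and `max_scans` are likewise hypothetical.

```python
from statistics import mean, median


def average_profile(prescans):
    """Target profile: rank-wise mean of the sorted intensities of all pre-scans."""
    ranked = [sorted(scan) for scan in prescans]
    return [mean(values) for values in zip(*ranked)]


def simulated_scan(signal, gain, saturation=65535):
    """Hypothetical stand-in for a scanner: linear in gain, clipped at saturation."""
    return [min(saturation, s * gain) for s in signal]


def tune_gain(scan, target_profile, lo, hi, max_scans=6):
    """Binary search on one setting; max_scans bounds the wear on the array."""
    target_mid = median(target_profile)
    best_gain, best_miss = None, None
    for _ in range(max_scans):
        gain = (lo + hi) / 2.0
        mid = median(scan(gain))
        miss = abs(mid - target_mid)
        if best_miss is None or miss < best_miss:
            best_gain, best_miss = gain, miss
        if mid < target_mid:
            lo = gain   # profile too dim: search the upper half
        else:
            hi = gain   # profile too bright: search the lower half
    return best_gain


signal = [100, 200, 400, 800, 1600]
prescans = [simulated_scan(signal, g) for g in (2.0, 3.0, 4.0)]
target = average_profile(prescans)
gain = tune_gain(lambda g: simulated_scan(signal, g), target, lo=1.0, hi=8.0)
print(target, gain)
```

The bound on the number of scans reflects the observation that each scan tends to degrade the array; the setting with the smallest mismatch seen so far is returned.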

The scanner control described above aims to achieve optimal conditions for application of the spot correction normalization methodology. To this end it is desired that both the red and green channels give similar distributions/profiles for red and green intensities across all arrays. It makes sense at such a point to optimize experimental parameters for the most effective application of spot correction normalization, including the choice of laser and PMT gain settings. Best correlation between the red and green channels is not necessarily achieved when the M vs. A plot is centered about M=0 for low intensities (as is one of the current standards). It is preferred not to use visual inspection of the M vs. A plot for manual adjustment of laser and PMT gain settings. In such an approach, settings are typically chosen based on the low intensity/baseline/blank spots, and a proper intensity distribution is not obtained for the medium and high intensity spots. Large variation for the high intensity spots/genes can be unknowingly accepted within an M vs. A plot while trying to minimize low intensity variations. Large variations for the high intensity spots/genes indicate probable nonlinearities between the red and green channels (more so than small variations for the low intensity spots/genes would have yielded in an M vs. A plot that does not perfectly center the low intensity spots/genes about M=0). Nonlinearities between the red and green channels that are too large will demand further novel adjustments/corrections prior to the application of spot correction normalization.

It is preferred to base the automatic feedback of laser and PMT gain settings on well correlated intensity range distributions for both channels of all arrays within a given microarray study. This feedback, in order to be automatically given between the prescan and the actual scan, is preferably based on per pixel analysis (i.e. prior to manual segmentation of the image). Preferably, a model is used to determine changes to laser settings and PMT gain to obtain a scan with a profile that closely matches the target profile. This model may be empirically determined, for example by changing a scan with a typical distribution to a typical target profile in order to determine the effect of changing either setting. Techniques such as Quantile Normalization can be used to correct datasets that have not been optimized for spot correction normalization criteria prior to the application of spot correction normalization according to the invention - though Quantile Normalization is sensitive to high intensity outliers and could reduce information within high quality arrays by the inclusion of too many low quality arrays.

In a preferred embodiment, target profiles that may have been obtained under control of a human operator are stored in the scanner device. Such a target profile can be retrieved for subsequent scanning operations. The target profile can be used effectively for arrays with similar characteristics (e.g. from a same manufacturing batch and same incubation operation), but can also be applied more broadly for an array and sample selection (both variety and reference) that has the characteristic that "most genes don't change much" such that it makes sense to compare intensity distributions across both channels of all arrays within the study. The target profile may be stored in combination with identifying information, such as batch of manufacturing and incubation batch. This enables automatic retrieval of target profiles and makes it easier to analyze profiles in order to improve models for automatic control of the scanning device.

Care must be taken not to bias the intensity range distributions with landmark intensities (Ex: Cy3 fixed directly to a distinct set of spots in varying intensities, sometimes quite high). Without segmentation of the raw per pixel intensities, anticipated landmark intensity distributions for any given microarray should ideally be subtracted.

General observations

Traditionally, ANOVA models applied to microarray datasets are fit to log scaled datasets t_ikg = log(y_ikg) instead of y_ikg. As indicated in the equations above, in an embodiment the fitting is done for the unscaled data to overcome numerical instabilities with log scaling.

Mathematically, a large number divided by a large number yields a tighter range of ratios than when small numbers are divided. Furthermore, comparing the logs of small numbers tends to amplify the differences between the small numbers, whereas comparing the logs of large numbers tends to shrink the differences between the large numbers. It is still recommended to sometimes analyze the log scale dataset to check for consensus between scaled and nonscaled datasets. As an example, only the derivation of the preferred nonscaled equations is shown here. A person skilled in the art can easily derive similar equations for scaled measurements.
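The effect of log scaling on small versus large numbers can be illustrated with a short numerical example. The intensity values are arbitrary and chosen only for illustration; the same absolute difference of 10 is compared at a low and at a high intensity.

```python
import math

# Same absolute difference (10 intensity units) at a low and a high intensity.
small_diff = math.log2(20) - math.log2(10)        # about 1.0
large_diff = math.log2(10010) - math.log2(10000)  # about 0.0014

print(f"low intensity:  log2 difference = {small_diff:.4f}")
print(f"high intensity: log2 difference = {large_diff:.4f}")
```

On the log scale the low intensity pair differs by a full doubling, while the high intensity pair is almost indistinguishable, which is why baseline noise at low intensities can dominate a log scaled analysis.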

Traditionally, background measures are subtracted from spot intensities to correct for array specific local variations in binding efficiency. Sometimes, however, the background measures might be inaccurate or too large (ex: due to diffuse light scattering from neighboring bright spots). In such a case, subtracting background measures from spot intensities would erroneously subtract too much from the intensity signal. It is insightful in either case to analyze both the raw intensities and the background subtracted intensities (and even just the local background intensities) as separate datasets to gain useful information on such things as spot outliers vs. background outliers, the cumulative effect of intensity baseline noise and background baseline noise, etc. Analysis of just the background measures is a useful tool for automatically delineating smears, smudges, scratches, bubbles, etc. on any given array - making it easier and more time efficient to flag poor quality regions.

Further novel applications of the algorithms herein are also being tested for concurrency of principle. Just one example is the application of the Spot Correction algorithms to a set of arrays which always have variety class A labeled with red and variety class B labeled with green. Here, neither the red nor the green channel consists of a common reference sample; rather, each consists of similar biological samples within a variety class. Although some biological variance is expected across samples within a variety class, transforming all signals (both red and green channels) relative to the average signal within one channel (of the respective variety class) [i.e. performing spot correction normalization from and on a variety class] preserves any per array biological variance while minimizing any local noise.

It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. Use of the verbs "comprise" and "include" and their conjugations does not exclude the presence of elements or steps other than those stated in a claim. The article "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. A computer program product may be stored/distributed on a suitable medium, such as optical storage, but may also be distributed in other forms, such as being distributed via the Internet or wired or wireless telecommunication systems. In a system/device/apparatus claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

The detailed derivations of outlier detection, spot correction normalization, PAR-ANOVA and DEF-ANOVA are included for illustration purposes only. All modifications of the actual equations which still uphold the essence of the developed claims are implicitly included within these claims. Further adaptations to the equations are also implicitly included within these claims. These include, but are not limited to: spot/gene replication upon the arrays; explicit modeling of pin-block effects; explicit modeling of other anticipated effects; the separation of the noise into more than just the (AG)_ig term, with analysis of either and/or both (all) terms for outlier determination and/or subsequent normalization; adaptation of the equations for hybridizations with more than one sample variety and more than two dyes; and the future possibility of introducing nonlinear terms/effects within the ANOVA equations. Such adaptations are included as long as the essence of utilizing the common reference channel to elicit noise information, outlier detection, and/or normalization corrections is implemented within the new spot correction normalization designs/algorithms, and the essence of statistical significance determination on a "per array basis" within the applicable design constructs (trusting results that are in consensus when multiple arrays within a variety class exist, as opposed to first pooling the arrays within the class) is implemented within the new PAR-ANOVA and/or DEF-ANOVA designs/algorithms.

Claims

CLAIMS:

1. A method of analyzing a microarray data set, the microarray set including a plurality of microarrays, where each microarray A_i includes a plurality of spots G_g for interaction with a biological target within a biological variety V_k, each spot on a microarray being associated with a corresponding spot on any of the other microarrays of the set; for each microarray, the spots being subjected to a biological interaction, such as a hybridization, with biological material from a common reference variety V_0 and with biological material from at least one further variety V_k (k>0); the microarray data set providing measurements Y_ikg, such as fluorescence intensities, on each spot of each microarray for each variety that interacted with the respective array; the method including performing an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays.

2. A method of analyzing a microarray data set as claimed in claim 1, wherein the step of normalizing the data set includes: selecting all measurements Y_ig from the data set that relate to the common reference variety; and estimating a term (AG)_ig indicative of the spot variance of the common reference variety for the g-th spot on the i-th array.

3. A method of analyzing a microarray data set as claimed in claim 2, wherein the step of estimating the term (AG)_ig includes fitting a linear data model to the selected measurements that includes the following terms μ + A_i + G_g + (AG)_ig, where:

• μ is the average of all selected measurements,

• A_i is the average intensity of all spots for the common reference variety and the at least one further variety on the i-th array, and

• G_g is the average intensity of the g-th spot within the common reference variety and the at least one further variety over all arrays.

4. A method of analyzing a microarray data set as claimed in claim 2, including removing outlying measurements by: applying an outlier detection criterion based on the spot variation term (AG)_ig for selecting measurements Y_ig that are unreliable; and removing measurements Y_ig for each spot g that has been identified to have an outlier Y_ig on at least one array i.

5. A method of analyzing a microarray data set as claimed in claim 4, wherein applying the outlier detection criterion includes comparing (AG)_ig with measurements Y_ig from the data set that relate to the common reference variety.

6. A method as claimed in claim 4 or 5, including enabling a human operator to adjust the outlier detection criterion.

7. A method as claimed in claim 5 or 6, including visually representing to a human operator (AG)_ig and measurements Y_ig from the data set that relate to the common reference variety.

8. A method of analyzing a microarray data set as claimed in any one of the claims 2 to 7, including the steps of iteratively estimating the term (AG)_ig and removing outlying measurements until a stopping criterion has been reached.

9. A method of analyzing a microarray data set as claimed in any one of the claims 2 to 8, including performing a spot correction by adjusting the measurements for all varieties for the i-th array and the g-th spot in dependence on the corresponding term (AG)_ig.

10. A computer program product operative to cause a processor to perform the method as claimed in any one of the preceding claims.

11. A system for analyzing a microarray data set that includes a plurality of microarrays, where each microarray A_i includes a plurality of spots G_g for interaction with a biological target within a biological variety V_k, each spot on a microarray being associated with a corresponding spot on any of the other microarrays of the set; for each microarray, the spots having been subjected to a biological interaction, such as a hybridization, with biological material from a common reference variety V_0 and with biological material from at least one further variety V_k (k>0); the system including: a measurement device for obtaining measurements Y_ikg, such as fluorescence intensities, on each spot of each microarray for each variety that interacted with the respective array; and an analysis device for performing an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays.

12. An analysis device for use in the system as claimed in claim 11; the analysis device including: an input for receiving from a measurement device measurements Y_ikg, such as fluorescence intensities, on each spot of each microarray for each variety that interacted with the respective array; a processor for, under control of a program, performing an across array normalization of the microarray data set based on the interaction of the common reference variety on the respective microarrays; and an output for providing an analysis outcome.