WO2008062855A1 - Method of detecting defects in DNA microarray data - Google Patents


Info

Publication number
WO2008062855A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
data
cell
dna microarray
values
Application number
PCT/JP2007/072605
Other languages
English (en)
Inventor
Tomokazu Konishi
Original Assignee
Akita Prefectural University
Application filed by Akita Prefectural University
Priority to JP2009520720A (JP5147084B2)
Publication of WO2008062855A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813: Hybridisation assays
    • C12Q1/6834: Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837: Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00: Oligonucleotides characterized by their use
    • C12Q2600/158: Expression markers

Definitions

  • The present invention relates to a method of detecting hybridization problems in DNA microarray data.
  • Hybridization is the basis of microarray analysis and, while widely used, is not free from technical problems. For example, some hybridizations form a doughnut-like geometric pattern around the center of chip images. Such patterns often result in reduced signals from certain areas of the chip, appearing similar to surface scratching that may be attributed to the entrainment of dust. Although analytical programs that identify such problems have been proposed, the methods are destructive, resulting in the total cancellation of the array chip data when large defects are present.
  • The dChip package implements several automated algorithms for recognizing and removing outliers during model-based data normalization.
  • These algorithms find patterns in the responses of the perfect match (PM) and mismatch (MM) probes for each gene, and cells and probe sets that disagree with the resulting patterns are identified as outliers.
  • This approach is based on a series of mathematical models derived from a very simplified view of both the biological fundamentals and the composition of the data.
  • The appropriateness of the models and calculation methods is difficult to check rigorously, because there is no objective indicator of how well the models, which inevitably contain parameters for handling noise, describe the experimental system.
  • Microarray data often include problems caused by uneven hybridization and dust contamination. Such problems should be removed before analysis to prevent degraded analytical accuracy and false positive results.
  • The present invention provides a method that finds such problems as local tendencies of the cell data when each array is compared with an ideal standard of hybridization. Cells at the identified problem locations are cancelled before data normalization. Because the cancellations are independent of signal intensity, they do not affect the original distribution of the array data, and the remaining data can still be used for analysis. According to one aspect of the present invention, a method of detecting defects in DNA microarray data comprises: providing target DNA microarray data that are a set of cell values obtained from a DNA microarray; providing standard data that are a set of standard values, each standard value corresponding to a cell value of the DNA microarray data; obtaining a difference value between each cell value of the DNA microarray data and the corresponding standard value of the standard data; obtaining a pseudo image by replacing each cell value of the DNA microarray data with its difference value; calculating a representative value for a small region corresponding to a predetermined number of cells in the pseudo image, based on the difference values of those cells, and repeating the calculation while moving the small region over the pseudo image cell by cell to obtain a set of representative values for the small regions; and detecting one or more small regions having an outlying representative value, based on the distribution of the set of representative values compared with its expected normal distribution, wherein the detected small regions include cells having defective cell values.
  • The target DNA microarray data and the standard data are normalized.
  • The standard value is a representative value for each cell obtained from a plurality of normalized DNA microarray data sets.
  • The representative value for each cell is a measure of central tendency, such as the mean, median or mode.
  • The measure of central tendency is, for example, a trimmed mean, median, or weighted mean.
  • The plurality of normalized DNA microarray data sets are obtained with the same type of DNA microarray as the target DNA microarray data.
  • The number of DNA microarray data sets may be, for example, six to ten.
  • The DNA microarray data sets for the standard data are obtained from the same tissue.
  • Alternatively, the DNA microarray data sets for the standard data are obtained from a plurality of different tissues. In the latter case, a number of DNA microarray data sets for a variety of different tissues are preferably prepared.
  • The size of the window, namely the size of the small region, is between 3 x 3 cells and 10 x 10 cells. In one preferable example, the window (small region) is 5 x 5 cells.
  • The representative value for a window is a measure of central tendency of the difference values of its cells. Specifically, the measure of central tendency is the median, a trimmed mean, or a weighted mean.
  • The set of representative values for the small regions is normalized to obtain a set of indices, and one or more small regions whose index exceeds a critical value, predetermined according to the expected normal distribution of the set of indices, are detected.
  • Both the indices and the critical value are z-scores.
  • The cell values of cells belonging to the detected one or more windows are cancelled.
  • The present invention also relates to a computer program for causing a computer to execute the above-described steps for detecting defects in DNA microarray data.
  • The present invention also relates to a computer program for causing a computer to execute the above-described steps for detecting and removing defects in DNA microarray data.
  • The present invention also relates to a computer-readable medium storing the above-described programs.
  • Figure 1 shows a histogram of standard deviations for medians of moving windows.
  • The mode is 0.31, larger than the expected value of 0.25.
  • Figure 2 shows coincidences between two sets of ideal standards for leaves analyzed by two different laboratories.
  • Figures 3A, 3B, and 3C show distributions of differences between hybridizations and standards.
  • The straight line y = x denotes the normal distribution.
  • Data are denser at the center of the plots. Only 2.3%, 0.1%, and 0.003% of the data have z-scores greater than 2, 3, and 4, respectively.
  • Figures 4A, 4B, and 4C show distributions of index values.
  • Figure 6 shows numbers of cancelled cells at expectations of 2 and 20 windows (50 and 500 cells, respectively).
  • Figure 7 shows standard deviations of differences among cell data in reproducibility measurements.
  • Figure 9 shows positions of cancelled windows in a chip. Four typical examples at the indicated expectations are shown. Upper left: hybridization with relatively small numbers of cancellations. Upper right: uneven hybridization. Lower left: regular shapes with straight boundaries. Lower right: clusters at symmetric positions.
  • Figure 10 shows the generation of standard data.
  • Figure 11 shows how a difference value between DNA microarray data and standard data is obtained.
  • Figure 12 shows a window scanning over a pseudo image.
  • Figure 13 shows a window of the present invention provided in a pseudo image.
  • Figure 14 shows a flow chart explaining the present invention.
  • The hardware configuration for executing the method of the present invention comprises a computer apparatus (not shown) including an input device, an output device, a display device, a storage device (which may include a hard disk drive, a memory device, a computer-readable medium, or any other storage means), and a processor.
  • Various data for the present invention, including measured data and calculated data, are stored in the memory device.
  • Various calculations are conducted by the processor.
  • Various data, including measured data and calculated data, may be displayed on the display device in various forms.
  • Target DNA microarray data are provided (Fig. 14 S1).
  • The target DNA microarray data are a set of cell values.
  • The target DNA microarray data are originally obtained as a set of signal intensities of the probe cells of a DNA microarray.
  • Each cell value is a normalized logarithmic value (z-score), obtained by taking the logarithm of the signal intensity and z-normalizing it. Median-based normalization can be used.
  • The target DNA microarray data are stored in the storage device.
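A minimal Python sketch of step S1, assuming the raw signal intensities are held in a NumPy array in the physical order of the chip. Here "median-based normalization" is taken to mean dividing by the array median before taking logarithms, as in the simple normalization described for the parametric scanning algorithm. The robust z-scaling (median for location, IQR / 1.349 for scale) is a choice made for this sketch, and the helper name `normalize_log_z` is hypothetical.

```python
import numpy as np


def normalize_log_z(intensities):
    """Median-normalize raw intensities, take logarithms, and z-score robustly.

    Hypothetical helper. Location = median, scale = IQR / 1.349, which equals
    the standard deviation for normally distributed values.
    """
    x = np.asarray(intensities, dtype=float)
    # Median-based normalization: divide by the array median, then take logs.
    logs = np.log(x / np.median(x))
    # Robust z-normalization of the log values.
    q1, q3 = np.percentile(logs, [25, 75])
    return (logs - np.median(logs)) / ((q3 - q1) / 1.349)


# Usage on a synthetic 50 x 50 chip of lognormal intensities.
rng = np.random.default_rng(0)
chip = rng.lognormal(mean=6.0, sigma=1.0, size=(50, 50))
z = normalize_log_z(chip)
```

After scaling, the median of `z` is 0 and its IQR is 1.349, as it would be for a standard normal distribution.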
  • Standard data are provided.
  • The standard data are a set of standard values, each corresponding to a probe cell of the DNA microarray.
  • The standard data are hypothetical or reference data, and the standard values are typically obtained by calculation. Ideally, the standard data are a set of expected hypothetical values, namely the most average or most probable values.
  • The standard data set is prepared as z-scores so that it corresponds to the z-normalized cell values of the target DNA microarray data.
  • The standard data are stored in the storage device.
  • The standard data are obtained from a plurality of normalized data sets (e.g. 6 to 10 sets) measured with the same type of DNA microarray as the target data.
  • Each standard value is obtained by calculating a representative value of the corresponding cell values across the plurality of normalized data sets.
  • The representative value is a measure of central tendency, such as the mean, median or mode.
  • The representative value may be a trimmed mean, median or weighted mean.
  • The standard data may be found from a plurality of normalized DNA microarray data sets for the same tissue as that used for the target DNA microarray data. However, the tissue used for the standard data set is not limited to the same tissue.
  • The standard data may also be found as a representative value (for example, a trimmed mean, median, or weighted mean) of a plurality of normalized DNA microarray data sets for a variety of different tissues.
  • Median-based normalization can be used.
  • In one example, the target DNA microarray data are GeneChip® data.
  • In that case, the standard data are preferably obtained from a plurality of GeneChip® data sets. If perfect DNA microarray data with no defects or errors existed, a single DNA microarray data set could be used as the standard data.
  • The standard data are not limited to those obtained from actual measurements.
  • All standard values of the standard data may be the same value.
  • For example, each standard value may be zero. In this case, each difference value is equal to the corresponding cell value of the target DNA microarray data. Alternatively, the standard values may be a set of pseudo-random numbers with small variance.
  • Normalization techniques for the target DNA microarray data and the standard data are not restricted to the median-based method; other methods that make the normalized data (typically z-scores) comparable with each other can be used.
  • For example, the three-parameter method can be used (Konishi, T., Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment. BMC Bioinformatics, 5:5, 2004, which is incorporated herein by reference).
  • Other normalization techniques known to the person skilled in the art can also be used.
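The per-cell standard can be sketched as a trimmed mean across normalized arrays. The 10% trimming proportion and the helper name `trimmed_mean_standard` are assumptions, since the text fixes neither. If SciPy is available, `scipy.stats.trim_mean(a, proportiontocut, axis=0)` computes the same quantity.

```python
import numpy as np


def trimmed_mean_standard(arrays, proportion=0.1):
    """Per-cell trimmed mean across normalized arrays (hypothetical helper).

    arrays: normalized log data, shape (n_arrays, rows, cols).
    proportion: share of values cut from EACH end of every cell's values.
    """
    data = np.sort(np.asarray(arrays, dtype=float), axis=0)
    n = data.shape[0]
    k = int(proportion * n)  # number of values trimmed from each end
    if 2 * k >= n:
        raise ValueError("proportion is too large for the number of arrays")
    return data[k:n - k].mean(axis=0)


# Ten 2 x 2 arrays of zeros, one of which carries a large artifact.
stack = np.zeros((10, 2, 2))
stack[3] = 100.0
standard = trimmed_mean_standard(stack, proportion=0.1)  # artifact trimmed away
plain_mean = stack.mean(axis=0)                           # artifact leaks in
```

The usage lines show why trimming matters. A single artifact array shifts the plain mean of every cell to 10.0, while the trimmed mean stays at 0.0.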
  • A difference value between each cell value of the target DNA microarray data and the corresponding standard value of the standard data is obtained (Fig. 14 S2). Cell values with a large difference value cannot be cancelled at this stage, because a large difference may be biologically significant.
  • The difference value may also be a ratio of each cell value to the standard value.
  • The difference values are calculated by the processor and stored in the storage device. Each difference value corresponds to a cell of the DNA microarray.
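Here is a small sketch of steps S2 and S3. It assumes the target and standard values are given as flat vectors in row-major physical order, so that reshaping to M x N recovers the chip layout. The helper `pseudo_image` and its `use_ratio` switch are hypothetical.

```python
import numpy as np


def pseudo_image(target, standard, shape, use_ratio=False):
    """Build the pseudo image of difference values (hypothetical helper).

    target, standard: per-cell values listed in the physical (row-major) order
    of the chip. With use_ratio=True the target/standard ratio is used instead
    of the difference, which only makes sense for data on a linear scale.
    """
    t = np.asarray(target, dtype=float)
    s = np.asarray(standard, dtype=float)
    diff = t / s if use_ratio else t - s
    return diff.reshape(shape)  # M x N cells, one difference value per cell


img = pseudo_image(np.arange(6.0), np.ones(6), (2, 3))
```

For log-scale data, the difference of logarithms equals the logarithm of the ratio, so the two options describe the same comparison on different scales.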
  • A pseudo image is obtained by replacing each cell value of the DNA microarray data with its difference value (Fig. 14 S3). That is, if the DNA microarray consists of M x N probe cells (also known as spots), the pseudo image also consists of M x N cells, each holding its respective difference value. As shown in Fig. 13, the pseudo image is an image whose cells hold the difference values Δz1, Δz2, Δz3, Δz4, Δz5, .... Displaying the pseudo image on the display device is optional. A window of a predetermined size, corresponding to a predetermined number of cells of a small region in the pseudo image, is provided.
  • The window moves over the pseudo image one cell at a time while a representative value for each window is calculated from the difference values of its cells, yielding a set of representative values for the windows (Fig. 14 S4).
  • Fig. 12 is merely a schematic representation of the moving window; the window may move (scan) in any direction on the pseudo image, including horizontally and vertically.
  • For example, a window W moves horizontally over the pseudo image one cell at a time (W_t, W_(t+1), W_(t+2), ...).
  • The representative values for the windows are calculated by the processor and stored in the storage device.
  • The window operation of the present invention is similar to a neighborhood (local) operation in image processing: each cell C in the pseudo image is visited, and a representative value is calculated for a neighborhood (small region) of the pseudo image that includes C. Unlike a neighborhood operation in image processing, however, the window operation does not need to replace the value of cell C with the representative value; its aim is only to obtain a value that represents a small region of the pseudo image. In addition, cell C need not be at the center of the window; it can be at any predetermined position within the small region.
  • The representative value for a window is a measure of central tendency, such as the mean, median or mode.
  • The representative value may be the median, a trimmed mean, or a weighted mean.
  • The size of the small region (window) is between 3 x 3 cells and 10 x 10 cells. In one preferable aspect, the small region (window) is 5 x 5 cells.
  • For example, the window W corresponds to a small region of 5 x 5 cells.
  • A representative value (for example, the median) of the 25 pseudo-image cell values in the window is obtained, and this value represents the window (small region) of interest.
  • The representative value for each small region (5 x 5 cells, for example) is calculated while the window W is moved over the pseudo image one cell at a time.
  • The representative value obtained for each window (small region) is stored in the memory device.
  • Calculating the representative value for a window does not require the pseudo image to be actually displayed on the display device.
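The moving-window step (S4) can be sketched with NumPy's `sliding_window_view`, which exposes every size x size window that fits entirely inside the pseudo image, shifted one cell at a time. Three choices are assumptions of this sketch. The top-left cell serves as the reference cell C. Only windows lying fully inside the image are used. `np.nanmedian` skips already-cancelled (NaN) cells.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def window_medians(image, size=5):
    """Median of every size x size window, moved one cell at a time.

    Output element [i, j] summarizes the window whose top-left cell is
    image[i, j]; NaN cells are ignored by np.nanmedian.
    """
    windows = sliding_window_view(np.asarray(image, dtype=float), (size, size))
    return np.nanmedian(windows, axis=(-2, -1))


img = np.arange(36.0).reshape(6, 6)
med = window_medians(img, size=5)  # 2 x 2 set of representative values
```

For an M x N pseudo image this yields (M - size + 1) x (N - size + 1) representative values.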
  • Because the probe cells of a DNA microarray are arranged randomly, neighboring probe cells have no biological relationship. The representative values for windows (small regions of a predetermined number of cells) should therefore be normally distributed, according to the central limit theorem.
  • The set of representative values is normalized to obtain z-scores of the representative values so that they can be compared with a critical value (Fig. 14 S5).
  • The z-scores of the representative values are indices that are compared with a predetermined critical value or cut-off value (also prepared as a z-score).
  • The width of the distribution of the representative values, which this normalization requires, can be obtained indirectly from the width of the distribution of the difference value set.
  • In that case, the width of the distribution of the difference value set is obtained and then corrected with a compensation factor.
  • The compensation factor may vary depending on the representative value. For example, if the representative value is the mean, the compensation factor may be 1/√n, where n is the number of cells in the window.
  • The compensation factor may also be obtained by simulation, for example with the Monte Carlo method.
  • Alternatively, the width of the distribution of the representative value set can be obtained directly from that distribution. For example, the IQR (interquartile range) or MAD (mean absolute deviation) can be used as the width of the distribution.
  • The width can also be obtained from the slope of a linear regression that approximates the Q-Q plot of the representative value set.
  • The width can also be predetermined based on various actual measurements. Specifically, a set of widths obtained from various measurements is prepared, and the mode of that set is used as the predetermined width. The standard deviation may be used for normalizing the representative value set.
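The following sketch contrasts two compensation factors for 5 x 5 windows. The first is 1/√n for the window mean. The second is a Monte Carlo estimate for the window median, which comes out close to the 0.25 mentioned later in the text. The sketch also shows one way to estimate the cell width robustly from the IQR and to scale window medians into indices. The function names, the simulation size, and the IQR-based width are assumptions, not the patent's implementation.

```python
import numpy as np


def median_compensation_factor(n_cells=25, n_sim=20000, seed=0):
    """Monte Carlo SD of the median of n_cells independent standard-normal values."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n_sim, n_cells))
    return float(np.median(samples, axis=1).std())


def iqr_width(values):
    """Width (SD-equivalent) of a roughly normal distribution: IQR / 1.349."""
    q1, q3 = np.percentile(values, [25, 75])
    return float((q3 - q1) / 1.349)


def indices_from_window_medians(window_medians, cell_differences, factor):
    """Scale window medians by the *cell* width times the compensation factor.

    Assumes the cell differences are centred on zero (e.g. already z-normalized).
    """
    return np.asarray(window_medians) / (iqr_width(cell_differences) * factor)


factor_median = median_compensation_factor()  # close to 0.25 for 5 x 5 windows
factor_mean = 1 / np.sqrt(25)                  # exactly 0.2 for the window mean
```

Taking the width from the cell differences rather than from the window medians keeps it robust to large defects, which inflate the spread of the window medians. When the cell differences are already z-normalized (width about 1), passing `factor=0.31` reproduces the empirical constant used in the text.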
  • One or more windows that may include defective cell values are detected by comparing each index with a critical value that is predetermined according to the expected normal distribution of the indices.
  • A window whose index exceeds the predetermined critical value is regarded as a small region that includes defective cell values (Fig. 14 S6).
  • The critical value is predetermined by an operator as a z-score based on the normal distribution. For example, if two windows are expected to exceed the critical value under an ideal normal distribution, a z-score of 4.61 may be predetermined. In practice, more than two windows are generally detected with a critical value of 4.61, because windows affected by problems do not obey the normal distribution. All cell values of the detected windows are cancelled or rejected (Fig. 14 S7).
  • The data remaining after cancellation and/or the cancelled data may be displayed on the display device in various forms, for example as a graph or as an image of the arrayed probe cells. Alternatively, the cell values of the detected windows may be corrected instead of cancelled.
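The critical value can be derived from the expected number of windows, as the text describes. This sketch assumes a two-sided test, since problems can hide or add signal, and about half a million windows. Under those assumptions an expectation of two windows gives a cut-off close to the 4.61 quoted above. The helper names `critical_z` and `cancel_cells` are hypothetical, as is the convention that an index belongs to the window whose top-left cell has the same row and column. Cancelled cells are marked as NaN.

```python
import numpy as np
from statistics import NormalDist


def critical_z(expected_windows, n_windows):
    """Two-sided cut-off at which `expected_windows` normal indices exceed |z|."""
    tail = expected_windows / (2.0 * n_windows)
    return NormalDist().inv_cdf(1.0 - tail)


def cancel_cells(cells, indices, z_crit, size=5):
    """Return a copy of `cells` with every cell of a flagged window set to NaN.

    indices[i, j] is the index of the window whose top-left cell is cells[i, j].
    """
    out = np.array(cells, dtype=float, copy=True)
    for i, j in zip(*np.nonzero(np.abs(indices) > z_crit)):
        out[i:i + size, j:j + size] = np.nan
    return out


z_crit = critical_z(expected_windows=2, n_windows=500_000)  # about 4.61
cells = np.zeros((6, 6))
indices = np.zeros((2, 2))
indices[1, 1] = 10.0  # one outlying window, top-left cell at row 1, col 1
cleaned = cancel_cells(cells, indices, z_crit, size=5)
```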
  • In the description above, a window of a single predetermined size is used.
  • Alternatively, windows of different predetermined sizes may be used. For example, a window of 3 x 3 cells and another window of 7 x 7 cells are used and the results are combined; that is, windows are examined based on two sets of representative values. Detected windows are compared between the results for the different window sizes, and overlapping cells may be cancelled. For example, if detected windows overlap completely, the cell data for the 3 x 3 cells are cancelled.
  • The following section explains an algorithm that finds and removes such problems.
  • Problems are distinguished from biological effects by means of the data distribution.
  • The algorithm is based on several verifiable assumptions, whose appropriateness is tested in the Results section using GeneChip® data as a non-limiting example.
  • The validity of the algorithm and the effects of data cancellation are tested using GeneChip® data obtained from a series of experiments.
  • The algorithm is demonstrated to greatly improve the reproducibility of measurements while removing only a small amount of faultless data.
  • The proposed method for identifying microarray problems, hereinafter referred to as the parametric scanning algorithm, is as follows.
  • The present invention provides a parametric scanning algorithm that detects such defects on the basis of the character of data distributions.
  • The cell data are thoroughly scanned using a window algorithm, and windows with an index value greater than a critical value (also called a threshold) are recognized as defects and removed from the array data.
  • The index is found from the differences between the target and an ideal standard of hybridization, obtained as a trimmed mean among experiments, and represents the statistical center of the differences in each section.
  • The threshold is derived from a screening level designated by the operator, but has only a limited effect on the effectiveness of data cancellation.
  • A standard, ideal array is selected, and indices representing the size of distinct regions in each chip are determined. Regions whose indices, relative to the standard, are larger than a threshold value are recognized as problem areas.
  • The standard is found as a set of trimmed means among hybridizations.
  • The experiments are simply normalized by dividing by their respective median values (including both PM and MM cells) and taking logarithms.
  • The trimmed mean of the data for each cell in the array is calculated, and the resulting set of means is adopted as the ideal standard of hybridization. If the means are calculated from a sufficiently large number of arrays, the values can be considered stable and suitable as a standard. No particular distribution is expected in the ideal standard.
  • Differences between the simply normalized array data and the standard are then found for each cell. These differences may represent both biological responses and experimental noise.
  • The distribution of the differences is expected to be approximately normal, since the logarithms of biological changes, when appropriately measured and normalized, obey a normal distribution.
  • The differences are therefore z-normalized using robust estimators of the distribution parameters, and their distributions are checked on quantile-quantile (Q-Q) plots. Normalizing the differences serves to analyze their characteristics and is an optional step of the present invention.
  • The indices are found using the medians of the z-normalized differences among neighboring cells, i.e. cells within a window on an array.
  • The matrix of differences is rearranged to reflect the physical order of the chip, and the data are collected through a moving window that simulates scanning through a pseudo image of the chip to find the medians.
  • The window median is robust to biological responses, since neighboring cells on a chip have no biological relationship. In contrast, experimental problems that hide or add signals in the window will affect the window median.
  • The window medians will closely obey a normal distribution, as described by the central limit theorem. Although this model does not assume any particular distribution for problems, affected windows will produce outliers in the normal distribution of the window medians.
  • The indices are found by normalizing the window medians. This normalization has a difficulty: the width of the distribution of window medians is not robust to problems and may increase with the number of problems. If the distribution were simply z-normalized, the number of recognized problems would be reduced. This effect can readily be avoided by deriving the width from that of the distribution of the cell differences. In principle, a width of 0.25 was predicted in the present study for the median of a window of 25 cells. The width of the distribution of cell differences is robust to problems, since large problems produce outliers that do not affect the central quantiles of the distribution. In practice, the distributions for cells are not perfectly normal, having long tails, possibly due to systematic additive noise in the data.
  • The proper width can therefore be estimated robustly from appropriate quantiles. Consequently, the effect of problems can be excluded by estimating the width of the distribution of indices from the distribution for cells.
  • Systematic noise as well as hybridization problems may increase the compensation factor of 0.25 somewhat.
  • In practice, a constant of 0.31 was used. It was obtained as the mode of actual measurements and is smaller than many other values that may have been inflated by numerous problems (Fig. 1). All indices were normalized by dividing by this constant.
  • The threshold is derived from a test level decided before the analysis, similar to screening levels in other statistical tests. The parametric nature of the data handling makes it possible to estimate how many indices will be larger (and smaller) than a given value among half a million results.
  • The program therefore asks the operator how many windows should be expected. If an array is problem-free, that expected number of windows will be recognized purely because of the random neighboring of biological responses on the chip. In practice, affected indices do not obey the normal distribution and are more likely to take values that exceed the threshold.
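The whole parametric scanning pipeline described above can be strung together in one function. This is a sketch on synthetic data, not the author's R function. The 10% trimming, the IQR-based robust z-scaling, the two-sided cut-off, and the simulated defect are assumptions made for illustration. Only the 5 x 5 window, the width constant of 0.31, and the expected-window threshold come from the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from statistics import NormalDist


def parametric_scan(arrays, target_idx, size=5, width=0.31, expected=2.0, trim=0.1):
    """Sketch of parametric scanning for one target array (not the original R code).

    arrays: raw intensities, shape (n_arrays, rows, cols), in physical chip order.
    Returns (mask, indices): mask marks cells to cancel in arrays[target_idx].
    """
    raw = np.asarray(arrays, dtype=float)
    # 1. Simple normalization: divide each array by its median, take logarithms.
    logs = np.log(raw / np.median(raw, axis=(1, 2), keepdims=True))
    # 2. Ideal standard of hybridization: per-cell trimmed mean across arrays.
    ordered = np.sort(logs, axis=0)
    n = ordered.shape[0]
    k = int(trim * n)
    standard = ordered[k:n - k].mean(axis=0)
    # 3. Differences from the standard, z-normalized with median and IQR / 1.349.
    diff = logs[target_idx] - standard
    q1, q3 = np.percentile(diff, [25, 75])
    z = (diff - np.median(diff)) / ((q3 - q1) / 1.349)
    # 4. Medians of all size x size windows, divided by the width constant.
    indices = np.median(sliding_window_view(z, (size, size)), axis=(-2, -1)) / width
    # 5. Two-sided cut-off giving `expected` windows under a normal distribution.
    z_crit = NormalDist().inv_cdf(1.0 - expected / (2.0 * indices.size))
    # 6. Cancel every cell of every window whose |index| exceeds the cut-off.
    mask = np.zeros(z.shape, dtype=bool)
    for i, j in zip(*np.nonzero(np.abs(indices) > z_crit)):
        mask[i:i + size, j:j + size] = True
    return mask, indices


# Synthetic example: eight noisy hybridizations of one expression pattern,
# with a simulated low-signal defect on the target array.
rng = np.random.default_rng(1)
base = rng.lognormal(mean=6.0, sigma=1.0, size=(60, 60))
arrays = base * rng.lognormal(mean=0.0, sigma=0.2, size=(8, 60, 60))
arrays[0, 20:30, 20:30] *= 0.2
mask, indices = parametric_scan(arrays, target_idx=0)
```

With eight arrays and 10% trimming, no values are actually trimmed (`int(0.1 * 8)` is 0), so the standard falls back to a plain mean. Larger array sets make the trimmed mean more robust.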
  • B-2 Program. A program for the parametric scanning method is provided as a function for R.
  • The function requires the library "affy", which is available from Bioconductor (http://www.bioconductor.org/).
  • An outsourcing service is available as a part of data normalization (http://www.super-norm.com).
  • PM data for the arrays were normalized according to the three-parameter method (Konishi, T., Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment. BMC Bioinformatics, 5:5, 2004).
  • C-1-1 Stability of hybridization standard. The method compares each datum with the ideal standard of hybridization, which should represent a stable pattern of the sample tissue. If the pattern is truly stable, it will coincide with other standards determined from different sets of data on identical tissue. To confirm this coincidence, standards obtained from the data of two research groups were compared. Both groups determined the transcriptome of leaves, one as part of an atlas of plants and the other as a control for infection experiments. Standards were obtained as trimmed means of the median-normalized log data. The results were compared on a scatter plot of 1000 corresponding cell values (Fig. 2), confirming the coincidence between laboratories. Some other examples of inter- and intra-laboratory comparisons are presented, showing similar correspondences.
  • The proposed method assumes that the differences between each datum and the ideal standard of hybridization are roughly normally distributed. This assumption was confirmed with Q-Q plots of the data distribution. The distributions had long tails, which may reflect the systematic additive noise of measurement. However, all of the distributions coincided with the theoretical values between -1.5 and 1.5 (Figs. 3A, 3B and 3C), indicating that more than 85% of the data obeyed the normal distribution. Because problems and noise influence the distribution, hybridizations with large problems had a narrower range of coincidence, as in the case shown in Fig. 3C (ATGE 14C).
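The percentages quoted in this paragraph and in the caption of Figures 3A to 3C can be checked against the standard normal distribution with Python's standard library. The interval from -1.5 to 1.5 holds about 86.6% of a normal distribution, which is consistent with "more than 85%". The one-sided tails above 2, 3 and 4 match the caption's 2.3%, 0.1% (0.135%, rounded) and 0.003%.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
# Share of a normal distribution between -1.5 and 1.5 (about 86.6%).
central_share = std_normal.cdf(1.5) - std_normal.cdf(-1.5)
# One-sided shares above z = 2, 3 and 4 (about 2.3%, 0.13% and 0.003%).
upper_tails = {z: 1.0 - std_normal.cdf(z) for z in (2, 3, 4)}
```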
  • the method also assumes that the indices, which are derived from the medians of the moving windows, will be distributed normally when large problems are not present. This assumption was also confirmed by means of QQ plots (Figs. 4A, 4B and 4C). The distributions observed were roughly normal, as expected from the central limiting theorem. The standard deviation of 0.31, determined from many hybridizations (Fig. 1), afforded good compensation for the width of the distribution and slope of the plot (Figs. 4A and 4B). As expected, the width of the distribution increased with the severity of the problems (Fig. AC, ATGE 14C).
  • Figure 6 compares the numbers of cancelled data under different expectations. Data sources of Figure 6 are as follows: rectangles (Schmid, M., Davison, T. S., Henz, S. R., Pape, U. J., Demar, M., Vingron, M., Sch ⁇ lkopf, B., Weigel, D., and Lohmann, J., A gene expression map of Arabidopsis development.
  • the problems detected had patterns indicative of surface polishing, uneven hybridization, and errors in the fabricated cell structure.
  • Symmetric patterns of clusters surrounding the center of the chip (Fig. 9, lower right) can be identified as polishing artifacts.
  • the signals in the affected area are always distinctively lower and thus insensitive to the expectation value.
  • Cases with advanced degree of surface polishing will form the common doughnut-like cluster pattern.
  • clusters with indefinite shape are more likely indicative of uneven hybridization.
  • data has a tendency to increase or decrease, producing diffusion in the scatter plot with experimental reproducibility (Fig. 5).
  • Such unevenness can be derived from several sources, and some of the distinctive regions are insensitive to the expectation value while some are not (Fig. 9, ATGE 14 C).
  • the differences in sensitivity correspond to the differences in the magnitude of the defect.
  • Defects detected as smaller clusters or isolated windows may have been formed by dust. Again, some of these features are distinct while others are not.
  • Errors in the chip structure can be identified as repeated clusters in the same parts of multiple chips, forming regular shapes often surrounded by straight lines. Many such defects are not problems but control cells designed and placed on the chip, although some may be caused by problems, appearing in all chips with similar batch numbers (i.e., same manufacturing lot). Such problems might be caused by product errors that have not been detected in quality controls and can result in serious problems. In the case shown in Fig. 5, the huge upward diffusion is attributed to this sort of failure (Fig. 9, lower left). The proposed method will reduce false positives in microarray data analyses.
  • The proposed method also rescues clean data from the failure-free regions of a hybridization; the data remaining after cancellation can be normalized and used for further analysis.
  • The resulting data set agreed reasonably well with the corresponding pairs in the reproducibility experiments (Fig. 5, center). Compared with ad hoc cancellation of genes within arrays, or of entire arrays, this reduces the total cost of experiments.
  • The R program can be affected by a tissue effect when it determines the ideal standard of hybridization: the standard differs according to the differentiation state of the cells in the sample. This effect arises when a small number of arrays is processed together with a large number of arrays from a different tissue. In addition, processing fewer than four arrays is not recommended, since the resulting standard cannot be considered stable.
  • The stability of the standard can be checked using the approach shown in Fig. 2. The tissue effect can be recognized as a marked increase in cancellations that does not produce the clusters of cancelled windows seen in Fig. 9. Such problems can be avoided by determining the standard separately from the recognition process. In practice, the ideal standard can be found in two alternative ways: from randomly selected samples drawn from many arrays covering various tissues, or by determining tissue-specific standards and applying each to the corresponding arrays.
  • The present invention can be used in microarray analyses that detect nucleotide hybridization, for example measuring mRNA levels or finding SNPs.
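Structural errors that recur at the same positions on chips from one manufacturing lot can be screened for by comparing detection results across chips. The Python sketch below is an illustration under assumptions, not part of the described method: it takes, for each chip, the set of flagged window positions (for example, from a sliding-window detector) and a hypothetical mapping from chip to batch number, and reports positions flagged on at least a chosen fraction of a batch's chips. The function names and input format are hypothetical. Positions that recur in every batch are more likely designed control cells than lot-specific faults; that judgement is left to the analyst.

```python
from collections import Counter, defaultdict


def recurring_windows(flags_by_chip, batch_of, min_fraction=1.0, min_chips=2):
    """Per batch, window positions flagged on at least min_fraction of its chips.

    flags_by_chip: {chip_id: iterable of (row, col) window positions}
    batch_of:      {chip_id: batch number}   (hypothetical input format)
    Batches with fewer than min_chips chips get an empty set, because a
    single chip cannot show a repeated pattern.
    """
    by_batch = defaultdict(list)
    for chip, flags in flags_by_chip.items():
        by_batch[batch_of[chip]].append(set(flags))

    recurring = {}
    for batch, flag_sets in by_batch.items():
        if len(flag_sets) < min_chips:
            recurring[batch] = set()
            continue
        # Count on how many chips of this batch each position was flagged.
        counts = Counter(pos for flags in flag_sets for pos in flags)
        needed = min_fraction * len(flag_sets)
        recurring[batch] = {pos for pos, n in counts.items() if n >= needed}
    return recurring


def shared_across_batches(recurring):
    """Positions recurring in every non-empty batch: candidate control cells."""
    sets = [s for s in recurring.values() if s]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    flags = {"A": [(0, 0), (5, 5)], "B": [(0, 0), (3, 3)],
             "C": [(0, 0), (3, 3)], "D": [(0, 0)], "E": [(9, 9)]}
    batch = {"A": "lot1", "B": "lot1", "C": "lot2", "D": "lot2", "E": "lot3"}
    rec = recurring_windows(flags, batch)
    print(rec["lot1"], rec["lot3"])    # {(0, 0)} set()
    print(shared_across_batches(rec))  # {(0, 0)}
```

Lowering `min_fraction` tolerates chips on which a lot defect was too faint to cross the critical value, at the cost of picking up coincidental dust spots.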
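The guidance on building the standard can be sketched as follows. This section does not state how the R program computes the ideal standard, so the sketch assumes a per-cell median across arrays; the names `build_standard`, `tissue_standards`, `random_standard` and the `MIN_ARRAYS` guard are hypothetical. The guard encodes the advice against using fewer than four arrays, and the two helpers mirror the two alternatives named above: tissue-specific standards, and a standard from arrays drawn at random across many tissues. Each array is assumed to be a nested list of cell values (rows by columns) with a common layout.

```python
import random
from statistics import median

# The text advises against deriving a standard from fewer than four arrays.
MIN_ARRAYS = 4


def build_standard(arrays):
    """Per-cell median across arrays (ASSUMED estimator, not the patent's own)."""
    if len(arrays) < MIN_ARRAYS:
        raise ValueError(f"need at least {MIN_ARRAYS} arrays, got {len(arrays)}")
    n_rows, n_cols = len(arrays[0]), len(arrays[0][0])
    return [[median(a[r][c] for a in arrays) for c in range(n_cols)]
            for r in range(n_rows)]


def tissue_standards(arrays, tissues):
    """One standard per tissue label; arrays[i] was hybridized with tissues[i]."""
    groups = {}
    for array, tissue in zip(arrays, tissues):
        groups.setdefault(tissue, []).append(array)
    return {tissue: build_standard(group) for tissue, group in groups.items()}


def random_standard(arrays, k, seed=None):
    """Standard from k arrays drawn at random from a pool mixing many tissues."""
    rng = random.Random(seed)
    return build_standard(rng.sample(arrays, k))


if __name__ == "__main__":
    leaf = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]], [[7.0, 8.0]]]
    root = [[[2.0, 2.0]]] * 4
    stds = tissue_standards(leaf + root, ["leaf"] * 4 + ["root"] * 4)
    print(stds["leaf"])  # [[4.0, 5.0]]
```

The median is used here only because it is robust to an occasional defective cell. Whatever estimator is chosen, building the standard in a separate step, per tissue or from a broad random pool, is what keeps a small group of arrays from one tissue from being judged against a standard dominated by another.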

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Organic Chemistry (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Analytical Chemistry (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Biochemistry (AREA)
  • General Engineering & Computer Science (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The invention aims to overcome problems caused by uneven hybridization and dust contamination. To this end, a difference value is obtained between each cell value of DNA microarray data and the corresponding standard value of standard data, and a pseudo-image is produced by replacing each cell value of the DNA microarray data with its difference value. A window corresponding to a predetermined number of cells is defined on the pseudo-image, and a median value is calculated for each window position while the window is moved sequentially over the pseudo-image, giving a set of values representative of the windows. Any window whose index value exceeds a critical value is detected and then cancelled.
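The procedure in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's R program: cell values are assumed to be log-scale intensities held in nested lists, the window is assumed square (`size` by `size` cells) and slides one cell at a time, and the index of a window is assumed to be the absolute value of its median difference, compared against a user-chosen `critical` value. The abstract does not say how the index or the critical value (which the description ties to an expectation value) is derived, so both are placeholders, and all function names are hypothetical.

```python
from statistics import median


def pseudoimage(data, standard):
    """Replace every cell value by its difference from the matching standard value."""
    return [[d - s for d, s in zip(row_d, row_s)]
            for row_d, row_s in zip(data, standard)]


def window_medians(img, size):
    """Median of each size x size window, sliding the window one cell at a time."""
    n_rows, n_cols = len(img), len(img[0])
    medians = {}
    for r in range(n_rows - size + 1):
        for c in range(n_cols - size + 1):
            cells = [img[r + i][c + j] for i in range(size) for j in range(size)]
            medians[(r, c)] = median(cells)
    return medians


def detect_windows(medians, critical):
    """Top-left positions of windows whose index exceeds the critical value.

    ASSUMPTION: the index is |window median|; the patent's own index may differ.
    """
    return {pos for pos, m in medians.items() if abs(m) > critical}


def cancel_cells(data, flagged, size):
    """Return a copy of data with every cell inside a flagged window set to None."""
    cleaned = [list(row) for row in data]
    for r, c in flagged:
        for i in range(size):
            for j in range(size):
                cleaned[r + i][c + j] = None
    return cleaned


def detect_defects(data, standard, size=2, critical=1.0):
    """Pipeline: pseudo-image -> sliding medians -> detection -> cancellation."""
    img = pseudoimage(data, standard)
    flagged = detect_windows(window_medians(img, size), critical)
    return cancel_cells(data, flagged, size), flagged


if __name__ == "__main__":
    standard = [[10.0] * 6 for _ in range(6)]
    data = [row[:] for row in standard]
    for r in (2, 3):
        for c in (2, 3):
            data[r][c] = 15.0  # a 2 x 2 spot, e.g. dust
    cleaned, flagged = detect_defects(data, standard, size=2, critical=3.0)
    print(flagged)  # {(2, 2)}
```

Using the window median rather than the mean makes a single aberrant cell unable to flag a window on its own, so only spatially coherent deviations, such as the clusters produced by polishing, uneven hybridization or dust, are cancelled.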

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2009520720A JP5147084B2 (ja) 2006-11-21 2007-11-15 Method for detecting defects in DNA microarray data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US86068006P 2006-11-21 2006-11-21
US60/860,680 2006-11-21

Publications (1)

Publication Number Publication Date
WO2008062855A1 (fr) 2008-05-29

Family

ID=39429785

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/072605 WO2008062855A1 (fr) 2006-11-21 2007-11-15 Method for detecting defects in DNA microarray data

Country Status (2)

Country Link
JP (1) JP5147084B2 (fr)
WO (1) WO2008062855A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3213915A1 (fr) * 2016-09-22 2018-03-29 Illumina, Inc. Detection of somatic copy number variation

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2006030822A1 (fr) * 2004-09-14 2006-03-23 Toudai TLO, Ltd. Method and program for processing gene expression data

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20030182066A1 (en) * 2000-06-28 2003-09-25 Tomokazu Konishi Method and processing gene expression data, and processing programs
US7715990B2 (en) * 2002-01-18 2010-05-11 Syngenta Participations Ag Probe correction for gene expression level detection
JP4266575B2 (ja) * 2002-06-07 2009-05-20 Toudai TLO, Ltd. Method and program for processing gene expression data

Non-Patent Citations (1)

Title
KONISHI T.: "Detection and restoration of hybridization problems in Affymetrix GeneChip data by parametric scanning", GENOME INFORMATICS, vol. 17, December 2006 (2006-12-01), pages 100 - 109 *

Also Published As

Publication number Publication date
JP2010510557A (ja) 2010-04-02
JP5147084B2 (ja) 2013-02-20

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 07832334

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2009520720

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 07832334

Country of ref document: EP

Kind code of ref document: A1