WO2008056693A1 - DNA microarray data processing method, processing device, and processing program - Google Patents

DNA microarray data processing method, processing device, and processing program

Info

Publication number
WO2008056693A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
dna microarray
cell
probe
value
Prior art date
Application number
PCT/JP2007/071620
Other languages
English (en)
Japanese (ja)
Inventor
Tomokazu Konishi
Original Assignee
Akita Prefectural University
Application filed by Akita Prefectural University
Priority to JP2008543099A (patent JP5147073B2)
Publication of WO2008056693A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/30 Microarray design

Definitions

  • The present invention relates to a DNA microarray data processing method, processing apparatus, and processing program.
  • Because DNA microarray data are obtained as incomplete relative values, the data must be standardized before they can be compared between microarrays. There are several techniques for standardizing DNA microarray data.
  • One technique is provided by Affymetrix and is built into the software called MAS or GCOS. It consists of complicated steps and includes many selectable options. Basically, starting from PM-MM data, it standardizes the data by setting the median to a constant value across chips (Non-patent Document 1).
  • Nonparametric methods find differences in data based on rank.
  • The “ideal data distribution” is used as a scale for evaluating the difference between the rankings.
  • Many currently used “more advanced” analysis methods use this.
  • Quantile normalization: there are two methods of quantile normalization.
  • In one method, all data are sorted in order of intensity within each data set, and the values of the same rank are averaged across data sets. The resulting collection of averages is taken as the ideal data distribution. All data are then replaced with the values of the ideal data distribution while the ranking within each data set is preserved. As a result, all standardized data sets have the same distribution (Non-patent Document 2).
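  • The rank-averaging procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of Non-patent Document 2: the function name `quantile_normalize` is hypothetical, all data sets are assumed to have the same length, and tied values are not treated specially.

```python
from statistics import mean


def quantile_normalize(arrays):
    """Replace each value by the cross-array average of the values sharing its rank."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    # Sort every array, then average the values that share a rank.
    sorted_arrays = [sorted(a) for a in arrays]
    ideal = [mean(column) for column in zip(*sorted_arrays)]
    result = []
    for a in arrays:
        # order[k] is the position of the k-th smallest value in a.
        order = sorted(range(n), key=lambda j: a[j])
        out = [0.0] * n
        for rank, j in enumerate(order):
            out[j] = ideal[rank]  # keep the ranking, swap in the ideal value
        result.append(out)
    return result


normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

  • After this step both data sets contain exactly the same set of values, which is why all of the data must be recomputed whenever a data set is added.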
  • RMA treats the average value of the measurement group as the desirable value. Probably because of this, various kinds of data corruption can occur. In addition, RMA gives no indication of how far the noise range extends. RMA also always needs to calculate all data together, so if data are added, the calculations basically must be repeated.
  • Standardization is performed using a three-parameter lognormal distribution model of DNA chip data (Patent Document 1, Patent Document 2, Non-Patent Document 5, the contents of these documents are incorporated herein by reference).
  • Data often has this distribution over a wide range of signal strengths. The distribution is mathematically described and easily reproduced.
  • A background value γ, which is an unknown, is obtained for each data set; the other two parameters are then derived automatically, and the data are standardized by fitting them to this distribution.
  • Once the optimal background value γ is subtracted, the data are lognormally distributed. Since a lognormal distribution is one in which the logarithm of the data is normally distributed, z-standardizing the logarithmic values puts the data on a common scale.
  • The above parametric method is based on the observed distribution of the data and uses an operation that finds γ, which cannot be measured directly but is important, so the method is sufficiently valid. The accuracy of the resulting analysis is also high, as shown, for example, by the improved reproducibility of measurements.
  • Estimating the γ parameter required for standardizing measured data with the three-parameter lognormal distribution requires repeated calculation while observing how well the data converge on the model.
  • The data are standardized by curve fitting on the QQ plot, increasing or decreasing the γ parameter until the data are closest to the lognormal distribution.
  • An approximate value of γ can be obtained by a single calculation, but iterative calculation is necessary to obtain the required accuracy.
  • In some cases a human judges the shape of the QQ plot visually; in others a fixed criterion (Merkmal) is decided in advance.
  • The first problem is the speed of the calculation. Of course, there is no speed at which the machine will repeat.
  • Comparing the data and the candidate γ values with a lognormal distribution is quite a heavy calculation even on the latest computers.
  • Another problem is that it is difficult to define what determines the difference from the lognormal distribution.
  • The measurement data have a noise level unique to each measurement and a signal level at which the response becomes non-linear.
  • One measurement captures the strong-signal region accurately, while another captures the weak-signal region accurately. The range (“from where to where”) that should be matched precisely therefore varies slightly from measurement to measurement. If this range is fixed in advance, it is difficult to do ideal work. However, if the decision is made case by case, the operator's personal tendencies may be reflected in the data.
  • Patent Document 1: WO02/01477 (US2003/0182066)
  • Patent Document 2: JP 2004-13573
  • Non-Patent Document 1: GeneChip Expression Analysis, Data Analysis Fundamentals (by Affymetrix)
  • Non-Patent Document 2: Bolstad, B., Irizarry, R., Astrand, M. and Speed, T. (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185-193.
  • Non-Patent Document 3: Li, C. and Wong, W. (2001) Model-based analysis of oligonucleotide arrays: expression index computation and outlier detection. Proc Natl Acad Sci USA, 98, 31-36.
  • Non-Patent Document 4: Li, C. and Hung, W.W. (2001) Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol, 2.
  • Non-Patent Document 5: Konishi, T., "Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment", BMC Bioinformatics 2004, 5:5
  • An object of the present invention is to perform standardization of DNA microarray data at high speed by simple calculation.
  • An object of the present invention is to provide a standardization method that avoids both the distortion from linearity that arises in standardization by the conventional parametric method and the need for the iterative operation required for parameter estimation. The present invention also provides an estimate of the additive noise.
  • A method for processing DNA microarray data according to the present invention includes:
  • rearranging the data in order of size to determine the rank of each data point; and
  • inputting the rank of a given data point into the inverse of the standard normal cumulative distribution function and outputting a standardized predicted value of that data point.
  • The rank is input as a quantile. Since the predicted value is an output of the inverse of the standard normal cumulative distribution function, it is obtained as a z-score. The data are therefore standardized by replacing each data point with its predicted value.
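  • A minimal sketch of the two steps above, using only Python's standard library (`statistics.NormalDist.inv_cdf` is the inverse of the normal cumulative distribution function). The function name `rank_to_z` is hypothetical, and the plotting position (k - 0.5)/n is an assumption made here so that the quantile never reaches 0 or 1, where the inverse function is undefined; the patent text itself speaks of uniform order statistic medians or, for large n, i/n.

```python
from statistics import NormalDist


def rank_to_z(values):
    """Replace each value by the z-score predicted from its rank alone."""
    n = len(values)
    # order[k-1] is the position of the k-th smallest value.
    order = sorted(range(n), key=lambda j: values[j])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z = [0.0] * n
    for k, j in enumerate(order, start=1):
        p = (k - 0.5) / n  # assumed plotting position, strictly inside (0, 1)
        z[j] = std_normal.inv_cdf(p)
    return z


z_scores = rank_to_z([10.0, 1000.0, 100.0])
```

  • Only the ordering of the signal intensities is used, so any monotone distortion of the measured values, such as saturation at strong signals, leaves the output unchanged.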
  • each hybridized data is obtained as the signal intensity of each probe cell or each spot. Therefore, each data can be considered as the signal intensity of the probe cell.
  • The data set subject to standardization in the present invention is not necessarily limited to the observed signal intensities themselves; those skilled in the art will understand that, depending on the configuration or system of the target DNA microarray, the data subject to standardization can be values obtained by applying some processing to the observed values.
  • The data to be standardized according to the present invention are not necessarily limited to all of the observed hybridization data; those skilled in the art will also appreciate that, depending on the configuration or system of the target DNA microarray, the data subject to standardization can be selected data. For example, when a DNA microarray includes PM (Perfect Match) probes and MM (Mismatch) probes, the data to be standardized are the PM probe cell data or the MM probe cell data, preferably the PM probe cell data. A typical microarray containing PM and MM probes is the GeneChip (registered trademark). Those skilled in the art will also understand that external control cell data may be removed, noise-containing cell data discarded, or insensitive probes excluded before the standardization step is applied.
  • DNA microarray data is standardized by replacing all data for which ranking has been determined with the predicted values.
  • data of a part of probe cells is replaced with the predicted value.
  • The probe cells to be replaced are cells with strong signal intensity. In this way, hybridization measurement values that are strong enough to have lost linearity in standardization by the parametric method can be estimated and replaced.
  • The DNA microarray data processing method according to the present invention may further include a step of estimating the signal intensity region in which additive noise is dominant.
  • The data are distributed with parameters specific to that noise.
  • When these are grouped by gene (probe set), for example using a trimmed average, the resulting numbers tend to be normally distributed. This amounts to randomly selecting normally distributed noise and taking its trimmed average.
  • The logarithm of the gene expression level (if it can be measured properly) is also normally distributed. Because normally distributed data are selected by gene and a trimmed average is taken, the selection is not random, so the distribution is wider than that of noise and σ is larger.
  • The estimation of additive noise according to the present invention is based on this viewpoint.
  • The DNA microarray is composed of a plurality of probe sets, each probe set is composed of a plurality of probe cells, and each probe set corresponds to a gene.
  • A typical example of a microarray provided with such probe sets is the GeneChip.
  • The representative value is selected from the group consisting of a trimmed average, a median, and a weighted average.
  • the step comprises
  • The data smaller than the obtained inflection point are estimated to be data affected by additive noise.
  • the step of obtaining the inflection point includes
  • An intersection of the approximate straight line of the second part and the approximate straight line of the first part is acquired, and the acquired intersection is used as an inflection point.
  • the DNA microarray data processing method according to the present invention further comprises a pre-processing step of rejecting defective probe cell data, the step comprising:
  • preparing a window corresponding to a predetermined number of probe cells in a small region, calculating the representative value of the window from the values of those probe cells, and moving the window one cell at a time to scan the pseudo image and obtain the representative value of each window; and
  • comparing the z-score of each representative value with a critical value set in advance on the assumption that the representative values of the windows are normally distributed, determining one or more windows estimated to include defective probe cell data, and rejecting the probe cell data corresponding to the determined windows.
  • the numerical value includes not only the difference value obtained by subtracting the other data from one data but also the ratio of both data.
  • the present invention is also provided as a processing device for DNA microarray data.
  • The DNA microarray data processing apparatus further comprises storage means for storing part or all of each data item, and display means for processing and displaying each data item according to a desired format.
  • the processing apparatus may further include means for executing the steps recited in each claim.
  • a DNA microarray data processing apparatus is typically composed of one or more computers having an input unit, an output unit, a display unit, a calculation unit, and a storage unit.
  • the present invention is also provided as a computer program for causing a computer to execute the steps recited in each claim or as a computer-readable medium recording the program.
  • To process DNA microarray data (typically for standardization), the program causes a computer to function as means for rearranging the data of a data set obtained from a single DNA microarray in order of size and determining the rank of each data point;
  • When the present invention is applied to specific DNA microarray data, those skilled in the art will understand that improvements and modifications can be made, without departing from the spirit of the present invention, according to the particular characteristics of the DNA microarray system and the DNA microarray data, and that such improvements and modifications are included in the present invention. Those skilled in the art will likewise understand that the data handled in the present invention can be converted as appropriate using known statistical methods and microarray data processing methods, and that such improvements and modifications are also included in the present invention. In addition, as a post-processing step, the present invention may include a step of converting the standardized data into non-logarithmic values suitable for the subsequent data analysis.
  • The present invention can avoid the two problems of the parametric method mentioned above, namely distortion from linearity and the need for the iterative operation required for parameter estimation.
  • The calculation is completed in a single pass while the distortion is corrected, so the calculation time is shortened and errors caused by mistaken parameter estimation are avoided. The calculation requires no repeated parameter estimation and, as long as the above assumptions hold, no further correction.
  • In the standardization method of the present invention, all data are standardized by estimating, from the rank of each data point, the value it should originally have taken.
  • The present invention and the parametric method (SuperNORM) described above are similar in that both aim for a lognormal distribution.
  • SuperNORM obtains this distribution by increasing or decreasing one parameter, γ.
  • The present invention obtains a lognormal distribution under the assumption that the data should follow one. It can be said that objectivity is traded for convenience.
  • The present invention may rely on this assumption because the distribution has been verified many times with SuperNORM; it is hard to imagine that, when the same chip is used in the same kind of experiment as before, the distribution would one day suddenly and for no reason fail to be observed.
  • DNA microarray data can be standardized by assigning the corresponding lognormal distribution model values based on the rank of the data, without obtaining γ (the background value). This is advantageous over the parametric method (SuperNORM) in that it eliminates the repeated calculations and the confirmation of agreement with the mathematical model, thereby reducing calculation time and labor.
  • the principle of the present invention is as follows.
  • The present invention is based on two assumptions: that if each measurement were made without any problems, the data would follow a three-parameter lognormal distribution, and that measurement problems do not affect the rank of the data.
  • Standardization is performed on the assumption that the data should follow a lognormal distribution. For each data set, the rank of each cell's data is checked, and the normal distribution value corresponding to that rank is returned.
  • Each value to be taken is a function of rank. For example, it can be expressed by the following formula.
  • The inverse of the standard normal cumulative distribution function is a function that returns a z-score from a proportion of the distribution.
  • One example is the "normsinv" function; it is also sometimes called the percent point function of the normal distribution, the fractile function, or the quantile function.
  • The value that a signal of a given rank should take according to the underlying distribution can thus be predicted. It is then also possible to replace the measured value with this predicted value according to its rank.
  • The replaced values are necessarily normally distributed and can be compared across measurements, so the purpose of data standardization is also achieved by this replacement. The rank that a value takes in the population can be calculated at high speed, and assigning a specific z-score to that rank is not difficult.
  • the DNA microarray data processing method of the present invention is executed by a processing device including a computer.
  • the processing apparatus is mainly composed of one or a plurality of computers, and has input means, output means, storage means for storing various data and programs, and calculation means.
  • Examples of the input means include a mouse and a keyboard.
  • the input means is not limited to these.
  • The input means may also be, for example, a means by which data transferred from another computer are input directly into the computer without using a user interface such as a mouse or keyboard.
  • The output means is exemplified by a display device, that is, a display, but the output means is not limited to the display device.
  • It may be a printing means (with or without a display device) or a means for transferring the calculated data as they are to another computer.
  • The storage means include a hard disk drive, main memory (ROM and RAM), and a storage medium.
  • The storage means stores various data and programs for executing calculations.
  • For example, the storage means stores the observed signal intensity of each probe cell and the various values obtained by processing the signal intensity.
  • The calculation means is mainly composed of a CPU (Central Processing Unit) and is configured to calculate predetermined values from input information and/or information stored in the storage means under the control of a control program.
  • The DNA microarray data processing method of the present invention will be further described with reference to the flowchart of the standardization method. First, one DNA microarray data set is prepared (S1).
  • This DNA microarray data set consists of the signal intensity of each probe cell obtained according to the gene expression level.
  • A data set including the signal intensity of each probe cell is prepared as numerical data stored in a storage unit of the processing apparatus or on a storage medium readable by the processing apparatus.
  • The signal intensities of the probe cells are sorted in descending order (ascending order may also be used), and the rank of each probe cell's data is thereby determined (S2).
  • The rank of each PM cell's data may be checked. If the data contain external controls (such as a cross pattern on the chip or a calibration curve), those skilled in the art will readily understand that these can simply be removed from the data.
  • The rank of each data point is acquired as a quantile.
  • The specific calculation method is as follows. For each data point in a single data set, its quantile (the uniform order statistic median) among all the data is found. For example, if there are n data points and a given data point is the i-th from the largest, its quantile is computed from i and n.
  • For data with a large n, such as microarray data, the quantile can in practice be replaced by i/n.
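  • As an illustration of the quantile step, the sketch below computes uniform order statistic medians with Filliben's approximation, which is what the term usually denotes in normal probability plotting; whether the patent uses exactly these constants is an assumption. The function name is hypothetical, and index i here counts from the smallest value; counting from the largest, as in the text, only flips the sign of the resulting z-scores.

```python
def uniform_order_statistic_medians(n):
    """Filliben's approximation; element i-1 belongs to the i-th smallest value."""
    if n < 1:
        raise ValueError("n must be at least 1")
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)  # largest order statistic
    m[0] = 1.0 - m[-1]        # smallest order statistic, by symmetry
    return m


n_points = 1000
medians = uniform_order_statistic_medians(n_points)
# Largest gap between the order statistic medians and the simple ratio i/n.
max_gap = max(abs(p - i / n_points) for i, p in enumerate(medians, start=1))
```

  • The gap stays below 1/n, which is consistent with the remark that i/n is an adequate substitute when n is as large as the number of cells on a microarray. Unlike i/n, the medians never reach exactly 1, so the inverse normal function can be applied to every rank.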
  • the rank of each data is input to the inverse function of the standard normal cumulative distribution function, and the standardized predicted value of each data is output (S3).
  • the specific calculation method is as follows.
  • This output value is a predicted value after standardization of the corresponding data.
  • each data is replaced with the corresponding output value N (i), and z-standardization of each data is performed (S4).
  • the z-score of each data obtained in this way is stored in the storage unit of the processing device. Then, other processes are further performed as necessary.
  • creation of pseudo data is exemplified.
  • The standardized value of the DNA microarray data is a z-score, that is, a value that has been compressed logarithmically and then had its σ corrected. This is quite different from the form the data originally have, so many processing programs that run after data standardization cannot accept it.
  • The pseudo data compensate for this. They are non-logarithmic values rather than logarithms, and they are calculated to have a common σ. That is, from the normal distribution value (logarithm), a lognormal distribution value (non-logarithmic) is returned, and the same value of σ is used across experiments. This value is found from the original data group.
  • For example, the background value approximated by the formula (UQ × LQ − M × M) / (UQ + LQ − 2M) is subtracted from each data point, the logarithm is taken, and the σ value is then determined using a method such as the IQR.
  • UQ and LQ are the 75th and 25th percentiles and M is the median, but other percentile pairs with the 50th between them, such as 60 and 40 or 80 and 20, may be used.
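  • The background formula and the IQR-based σ estimate can be sketched as below. For data that follow a shifted lognormal distribution exactly, (UQ − γ)(LQ − γ) = (M − γ)², and solving for γ gives the formula quoted above. All function names are hypothetical. Skipping values at or below the background, and setting the location parameter of the pseudo data to zero, are assumptions made only for this illustration.

```python
import math
from statistics import NormalDist, quantiles


def approx_background(values):
    """Background estimate (UQ*LQ - M*M) / (UQ + LQ - 2M) from the quartiles."""
    lq, m, uq = quantiles(values, n=4)  # 25th, 50th and 75th percentiles
    denom = uq + lq - 2.0 * m
    if denom == 0:
        raise ValueError("quartiles are symmetric; the formula is undefined")
    return (uq * lq - m * m) / denom


def sigma_from_iqr(log_values):
    """Sigma of a normal sample from its interquartile range (IQR = 1.349 sigma)."""
    lq, _, uq = quantiles(log_values, n=4)
    return (uq - lq) / (2.0 * NormalDist().inv_cdf(0.75))


def log_signal_sigma(values):
    background = approx_background(values)
    # Assumption: values at or below the background have no logarithm and are skipped.
    logs = [math.log(v - background) for v in values if v > background]
    return sigma_from_iqr(logs)


def pseudo_data(z_scores, common_sigma):
    # Assumption: location parameter 0, so a z-score of 0 maps to 1.0.
    return [math.exp(common_sigma * z) for z in z_scores]
```

  • Using one `common_sigma` for every experiment is what makes the pseudo data comparable across arrays while still being on a non-logarithmic scale that downstream programs accept.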
  • the additive noise level of the data can also be examined using the properties of the system.
  • The individual N(i) values obtained can be extracted for each gene, and their median or average value (trimmed average, weighted average) can be used as an estimate of the expression level of that gene.
  • This distribution can be used to investigate the confidence range of the measurement. In the following, the estimation of the level of additive noise is described in detail.
  • The present invention can correct values in the strong signal region, but it does not correct for additive noise in the weak signal region. Therefore, in the present invention, it is desirable to estimate the level of additive noise.
  • the average value is a normal distribution.
  • 10 to 20 probe cell data are collected and used as a group (so-called probe set) for each gene.
  • A median is used to summarize the data of each probe set. The median has basically the same characteristics as the average, so taking the median should make the value of σ smaller than that of the population, though not by as much.
  • The actual measurement data also give a biphasic plot, as expected.
  • For actual data, the normal probability plot (QQ plot) representing the distribution is biphasic, shaped like an inverted “ku” (the Japanese character く), and consists of a first part on the upper-data side and a second part on the lower-data side with different slopes.
  • The lower, second part is a region that may be dominated by noise. In these data there is an inflection point near zero. The noise level can be found from here.
  • The first part and the second part, which have different slopes, may each be approximated by a straight line, and the intersection of the two lines determined. A z-score below this intersection is considered to be under the influence of noise. In one aspect, data that are affected by noise can be rejected.
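  • The two-line approximation can be sketched as follows. The text does not say how the boundary between the two parts is chosen, so this illustration assumes the caller supplies a split index; the function names and the use of ordinary least squares are likewise assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def inflection_point(theoretical, observed, split):
    """Intersection of the lines fitted to the lower and upper parts of a QQ plot.

    Both sequences are sorted in ascending order of the theoretical quantile;
    points before `split` form the lower (second) part, the rest the upper
    (first) part.
    """
    slope2, icpt2 = fit_line(theoretical[:split], observed[:split])
    slope1, icpt1 = fit_line(theoretical[split:], observed[split:])
    if slope1 == slope2:
        raise ValueError("the two parts are parallel; no inflection point")
    x = (icpt2 - icpt1) / (slope1 - slope2)
    return x, slope1 * x + icpt1


# Toy biphasic plot: slope 0.5 below zero, slope 2 above zero.
qq_x = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
qq_y = [-1.5, -1.0, -0.5, 2.0, 4.0, 6.0]
bend_x, bend_y = inflection_point(qq_x, qq_y, split=3)
```

  • Probe-set values whose z-score falls below the returned point would then be treated as noise-dominated.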
  • a high quantile is desirable, but if it is too high, it may be affected by noise. Therefore, a quantile of about 10% from the top is desirable. It is also possible to use a weighted average that weights the upper data instead of the quantile.
  • A representative value is obtained for each cell, giving a set of M × N representative values, and these are aggregated to draw a histogram. The values are roughly normally distributed, and cells with extremely low sensitivity form a lower long tail. It will be apparent to those skilled in the art that actually drawing the histogram is not necessarily required in order to find the long tail.
  • The sensitivity of each probe can be estimated. Taking the z-scores obtained by standardization for the probe set of one gene, the median or average value (trimmed average, weighted average) is computed as the estimated expression level of that gene. The difference between each z-score and this estimate gives the sensitivity of each probe cell. The sensitivity is roughly normally distributed, and probes that are virtually insensitive form a lower long tail in the distribution (Figure 6A). Desirably, a probe cell that frequently falls in this long tail across multiple measurement data sets obtained with the same type of chip is regarded as a probe cell without sensitivity.
  • Probe cells without sensitivity do not exhibit sensitivity as long as hybridization experiments are performed under the same conditions. Therefore, once such a cell is found, it can be excluded from the outset in subsequent analyses.
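  • A small sketch of the sensitivity estimate described above. The function name and the dictionary layout are hypothetical, and the sign convention (z-score minus gene estimate, so that insensitive probes come out strongly negative and land in the lower tail) is an assumption.

```python
from statistics import median


def probe_sensitivities(probe_sets):
    """Map each gene to (estimated expression level, per-probe sensitivities).

    probe_sets maps a gene name to the z-scores of the probe cells in its
    probe set.
    """
    result = {}
    for gene, z_scores in probe_sets.items():
        estimate = median(z_scores)  # representative value of the probe set
        result[gene] = (estimate, [z - estimate for z in z_scores])
    return result


example = probe_sensitivities({"geneA": [1.0, 1.2, 0.8, -2.0, 1.1]})
estimate, sensitivities = example["geneA"]
weakest_probe = sensitivities.index(min(sensitivities))
```

  • Repeating this over several arrays of the same chip type and counting how often each probe falls in the lower tail would give the list of probes to exclude in later analyses.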
  • the signal intensity of each probe cell of one microarray is prepared (S10).
  • The signal intensity of each probe cell is logarithmically converted, then standardized and prepared as a z-score.
  • For the standardization here, other well-known methods may be adopted as well as the standardization method according to the present invention.
  • The z-score is simply divided by the median or average value.
  • A reference data set including reference data corresponding to each probe cell is prepared.
  • The reference data are virtual data (or calculation results) that can serve as a reference, ideally the most average or most standard data.
  • The reference data are typically prepared as z-scores.
  • The reference data are obtained, for example, by standardizing a number of measurement results (for example, 6 to 10 sets) for tissue measured on the microarray in use and taking their representative values (trimmed average, median, weighted average, etc.).
  • The tissue is the same as the source tissue of the microarray data set to be processed, but is not limited to the same tissue.
  • The reference data can also be found as representative values (trimmed average, median, weighted average, etc.) of standardized versions of a large number of data sets measured on tissues that are as diverse as possible. For heterogeneous tissues, the number of data sets should be large.
  • These data sets are standardized by a method different from the standardization of the present invention, for example by taking a logarithm and dividing by the median. Alternatively, standardization may be performed by subtracting the background value.
  • In this pre-processing step, it is desirable to create the reference data from multiple GeneChips. If there is a chip data set known to be free of defects, that single data set can be used as the reference data.
  • the reference data is not limited to the above-described one.
  • the reference data set may be composed of values (including 0) common to the probe cells.
  • the reference data set may be generated from a random number having a small deviation close to zero by simulation.
  • In the pseudo image of the acquired differences, a window corresponding to a predetermined number of probe cells in a small area is prepared, the representative value of the window is calculated from the values of those probe cells, and the window is moved one cell at a time to scan the pseudo image and obtain the representative value of each window (S30, Fig. 10, Fig. 11).
  • The representative value of the small area is selected from the group consisting of the median, trimmed average, weighted average, average, and sum.
  • The small area is 3 × 3 to 10 × 10 cells.
  • The small area is, for example, 5 × 5 cells.
  • As shown in the figure, the pseudo image is a diagram in which the differences from the reference data are given as the value of each probe cell of the DNA microarray.
  • Window W is a small area of 5 × 5 cells.
  • The representative value (for example, the median) of the values given to the 25 probe cells in the small area is obtained and used as the representative value of window W (i.e., of the small area).
  • Calculating the representative value of the window does not necessarily require that the pseudo image actually be displayed on the display means.
  • the representative value of a given number of cells in a small area should be normally distributed according to the central limit theorem.
  • Figure 12 shows a normal probability plot of the representative value (median) of each window.
  • Even for a DNA microarray data set that contains windows corresponding to dust or dirt spots, the window values are approximately normally distributed. In the QQ plot, however, a few values in the high region are too high and a few in the low region are too low, deviating from the theoretical values (they do not lie on the straight line). In addition, the σ of the normal distribution becomes slightly larger owing to the influence of dust.
  • The representative value of each acquired window is standardized to obtain a z score of the representative value (S40).
  • This z score of the representative value serves as an index for comparison with a preset critical value or cutoff value (z score).
  • The standardized representative value is compared with a rejection criterion (critical value, cutoff value) set in advance on the assumption that the representative values of the windows are normally distributed, and the defective probe cell data are identified.
  • This value is compared with the z score of the representative value of each window, and the z scores to be rejected are determined.
  • the critical value can be selected as appropriate by the operator.
  • cell data contained in the determined window (a predetermined number of cells in the small area) are rejected (S60). If the window corresponds to 5 x 5 cells, 25 cell data are discarded per window. If two completely separated windows are determined, 50 cell data are discarded. If the two determined windows are next to each other (shifted by one cell), the result is a rejection area of 5 x 6 cells.
  • the present invention can be used for standardization of DNA chip data.
  • FIG. 1A is a QQ plot of data obtained by the parametric method.
  • FIG. 2A is a diagram showing a comparison of the data in FIG. 1A and FIG. 1B using a scatter plot.
  • FIG. 2B is an enlarged view of the strong signal intensity region of FIG. 2A.
  • FIG. 3A is a scatter plot showing the data in FIG. 2A by rank.
  • FIG. 3B is the same figure as FIG. 2B, shown for comparison with FIG. 3A.
  • FIG. 5 is a flowchart showing a standardization method of the present invention.
  • FIG. 6A is a histogram showing the sensitivity of each cell of GeneChip. The arrow indicates the long lower tail.
  • FIG. 6B is a normal probability plot of GeneChip PM-MM data.
  • FIG. 7 is a flowchart of a pre-processing step for rejecting cell data estimated to be defective.
  • FIG. 9 is a diagram for explaining step 20 in FIG. 7.
  • FIG. 10 is a conceptual diagram showing window scanning on a pseudo image.
  • the window dimensions are different from the actual dimensions.
  • FIG. 12 is a normal probability plot showing the distribution of the representative value (median) of each window.
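
The window-based rejection described in the items above (representative value per window, z score in S40, comparison with a cutoff, rejection in S60) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the function names `window_medians` and `rejected_cells` are hypothetical, the window is assumed to advance one cell at a time, the z score is assumed to use the plain mean and population standard deviation of all window medians, and the two-sided cutoff of 3.0 is only a placeholder for the operator-selected critical value.

```python
from statistics import mean, median, pstdev


def window_medians(image, size=5):
    """Slide a size x size window over the pseudo image one cell at a time.

    Returns a dict mapping the top-left cell (row, col) of each window to the
    median of the cell values inside that window.
    """
    rows, cols = len(image), len(image[0])
    medians = {}
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            cells = [image[r + i][c + j]
                     for i in range(size) for j in range(size)]
            medians[(r, c)] = median(cells)
    return medians


def rejected_cells(image, size=5, cutoff=3.0):
    """Return the set of (row, col) cells covered by any rejected window.

    Assumption: a window is rejected when the absolute z score of its median
    exceeds `cutoff`; the source leaves the critical value to the operator.
    """
    medians = window_medians(image, size)
    values = list(medians.values())
    mu, sigma = mean(values), pstdev(values)
    rejected = set()
    if sigma == 0:
        return rejected  # all windows identical, nothing stands out
    for (r, c), m in medians.items():
        if abs((m - mu) / sigma) > cutoff:
            # Discard every cell of the window; overlapping windows merge.
            rejected.update((r + i, c + j)
                            for i in range(size) for j in range(size))
    return rejected
```

Because rejected cells are collected in a set, two flagged windows shifted by one cell merge into a single 5 x 6 rejection area, while two fully separated windows contribute 50 cells, consistent with the description above.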

Landscapes

  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

Provided is a method for rapidly standardizing DNA microarray data by means of a simple calculation. The method comprises: a step of sorting the data sets obtained from a DNA microarray (i.e., a plurality of hybridization data) in descending order so as to assign a rank to each datum; a step of inputting the rank of a given datum into the inverse function of the standard normal cumulative distribution function and outputting a predicted value of the datum after standardization; and a step of replacing the datum with the predicted value.
PCT/JP2007/071620 2006-11-08 2007-11-07 DNA microarray data processing method, processing device, and processing program WO2008056693A1 (fr)
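
The three steps of the abstract (rank the data in descending order, pass the rank through the inverse of the standard normal cumulative distribution function, replace each datum with the result) can be sketched as below. The abstract does not say how a rank is converted into a probability before the inverse function is applied, so the mapping p = (n - rank + 0.5) / n used here is an assumption, as are the function name `rank_standardize` and the handling of ties by original order.

```python
from statistics import NormalDist


def rank_standardize(values):
    """Replace each value with the standard normal quantile of its rank.

    Rank 1 is the largest value (descending order, as in the abstract).
    Assumption: rank -> probability uses p = (n - rank + 0.5) / n, which
    keeps p strictly between 0 and 1.
    """
    n = len(values)
    # Indices ordered so that the largest value comes first.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    result = [0.0] * n
    for rank, i in enumerate(order, start=1):
        p = (n - rank + 0.5) / n
        result[i] = std_normal.inv_cdf(p)
    return result


# Usage: three signal intensities; the output keeps the input order.
z = rank_standardize([10.0, 1000.0, 100.0])
```

Only sorting and one quantile lookup per datum are needed, which fits the abstract's emphasis on a simple, fast calculation. The output depends only on the ordering of the input, not on its scale.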

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2008543099A JP5147073B2 (ja) 2006-11-08 2007-11-07 DNA microarray data processing method, processing device, and processing program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US85811506P 2006-11-08 2006-11-08
US60/858,115 2006-11-08
US86068006P 2006-11-21 2006-11-21
US60/860,680 2006-11-21

Publications (1)

Publication Number Publication Date
WO2008056693A1 (fr) 2008-05-15

Family

ID=39364507

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2007/071620 WO2008056693A1 (fr) 2006-11-08 2007-11-07 DNA microarray data processing method, processing device, and processing program

Country Status (2)

Country Link
JP (1) JP5147073B2 (fr)
WO (1) WO2008056693A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002001477A1 (fr) * 2000-06-28 2002-01-03 Center For Advanced Science And Technology Incubation, Ltd. Method for processing gene expression data, and processing programs
WO2003062450A2 (fr) * 2002-01-18 2003-07-31 Syngenta Participations Ag Probe correction for gene expression level detection
JP2004013573A (ja) * 2002-06-07 2004-01-15 Center For Advanced Science & Technology Incubation Ltd Gene expression data processing method and processing program
WO2006030822A1 (fr) * 2004-09-14 2006-03-23 Toudai Tlo, Ltd. Method and program for processing gene expression data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KONISHI T.: "Atarashii Microarray Data no Kaiseki Hoho", PLANT ORGANELLES NEWS LETTER, no. 2, August 2005 (2005-08-01), pages 17 - 20 *
KONISHI T.: "DNA Chip Data no Atarashii Kaisekiho ga Motarasu Chiken", THE JAPANESE JOURNAL OF CLINICAL PATHOLOGY, vol. 54, no. 1, 25 January 2006 (2006-01-25), pages 37 - 44 *
TOMOKAZU KONISHI: "Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment", BMC BIOINFORMATICS, vol. 5, 2004, pages 5 *

Also Published As

Publication number Publication date
JP5147073B2 (ja) 2013-02-20
JPWO2008056693A1 (ja) 2010-02-25

Similar Documents

Publication Publication Date Title
EP2745110B1 (fr) Extrapolation of interpolated sensor data to increase sample throughput
WO2017110753A1 (fr) Number analysis method, number analysis device, and storage medium for number analysis
CZ20003884A3 (cs) Method for evaluating chemical and biological samples obtained from array hybridization assays
Thompson Towards a unified model of errors in analytical measurement. Based on papers read by the author at SAC99 in Dublin, July 1999, and at FACSS in Vancouver, October 1999.
AU2020203717A1 (en) Detecting a transient error in a body fluid sample
WO2006030822A1 (fr) Method and program for processing gene expression data
Todorov Robust selection of variables in linear discriminant analysis
CN113707219A (zh) Analysis method and system for analyzing nucleic acid amplification reactions
US10973467B2 (en) Method and system for automated diagnostics of none-infectious illnesses
WO2008056693A1 (fr) DNA microarray data processing method, processing device, and processing program
KR101684742B1 (ko) Drug virtual screening method, focused screening library construction method, and system therefor
JP6280910B2 (ja) Method for measuring the performance of a spectroscopy system
Bell-Glenn et al. Calculating detection limits and uncertainty of reference-based deconvolution of whole-blood DNA methylation data
Dror et al. Bayesian estimation of transcript levels using a general model of array measurement noise
CN109920474A (zh) Absolute quantification method, apparatus, computer device, and storage medium
JP4266575B2 (ja) Gene expression data processing method and processing program
KR100469608B1 (ko) DNA microarray data analysis method and system
WO2022025104A1 (fr) Information distribution device, system, method, and program
Rueda Image Processing of Affymetrix Microarrays
US20030023403A1 (en) Process for estimating random error in chemical and biological assays when random error differs across assays
Arteaga-Salas 9 Image Processing of Affymetrix Microarrays
D’Aucelli et al. A MATLAB Framework for measurement system analysis based on ISO 5725 Standard
US20110029439A1 (en) High throughput research workflow
JP2022186254A5 (fr)
Myllykangas Validation of OS-Seq panels for clinical diagnostics of inherited disorders

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07831351

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2008543099

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 07831351

Country of ref document: EP

Kind code of ref document: A1